Sqoop Import

You can use the Sqoop Import job entry to import data from a relational database into the Hadoop Distributed File System (HDFS) by using Apache Sqoop.

You can create, edit, and select a Hadoop cluster configuration. Cluster configuration settings can be reused in other transformation steps and job entries that support Hadoop.

This job entry has two setup modes:

  • Quick Setup (default): Provides the minimum options needed to run a Sqoop import.

  • Advanced Options: Provides additional options to manage your import, including a command-line view where you can paste an existing Sqoop command.

For more information about Apache Sqoop, see http://sqoop.apache.org/.

General

  • Name: Specify the unique name of the job entry on the canvas. You can customize the name or leave it as the default.

  • Advanced Options: Select Advanced Options to switch to Advanced Options mode. In Advanced Options mode, select Quick Setup to return to Quick Setup mode.

Quick Setup mode

Sqoop Import step Quick Setup mode

Source

The source refers to the database that contains the data you want to import.

Option
Definition

Database Connection

Select Choose Available to select an existing database connection.

If you do not have an existing connection, select New. To modify an existing connection, select Edit.

Edit

Edit the selected database connection.

New

Create a new database connection. For more information, see Define data connections.

Table

The source table name. If your database requires a schema, use SCHEMA.TABLE_NAME. The table must exist in the source database.

Browse

Browse configured database connections by using the Database Explorer.

Target

The target refers to the Hadoop cluster and HDFS directory where you want to write the imported data.

Option
Definition

Hadoop Cluster

Select an existing Hadoop cluster configuration or create a new one.

For Hadoop information, see Use Hadoop with Pentaho.

Target Directory

The HDFS directory where you want to write the imported data.

Browse

Browse the cluster file system and select a target directory.

Note: Browse works only when you have a valid cluster connection configured.
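As a rough illustration of how the Quick Setup values relate to a Sqoop command, the following Python sketch assembles a basic `sqoop import` argument list. It assumes the source and target values correspond to the standard Sqoop arguments `--connect`, `--table`, and `--target-dir`. The function name, JDBC URL, table, and directory are hypothetical and do not come from the product.

```python
import shlex


def build_sqoop_import_args(jdbc_url, table, target_dir, username=None):
    """Return the argument list for a basic `sqoop import` call.

    Assumption: Quick Setup values map onto the standard Sqoop
    arguments --connect, --table, and --target-dir.
    """
    args = [
        "sqoop", "import",
        "--connect", jdbc_url,       # JDBC URL of the source database
        "--table", table,            # source table, optionally SCHEMA.TABLE_NAME
        "--target-dir", target_dir,  # HDFS directory for the imported data
    ]
    if username:
        args += ["--username", username]
    return args


if __name__ == "__main__":
    # Hypothetical example values.
    example = build_sqoop_import_args(
        "jdbc:mysql://db.example.com/sales",
        "SALES.ORDERS",
        "/user/pentaho/orders",
    )
    # Print the arguments as a single shell-safe command line.
    print(shlex.join(example))
```

The job entry builds and runs the import for you. The sketch only shows which pieces of information the Source and Target options supply.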

Open File dialog box

When you have a valid cluster connection, select Browse to open the Open File dialog box.

Option
Definition

Open from Folder

The path and name of the HDFS directory you are browsing. This directory becomes the active directory.

Up One Level

Display the parent directory of the active directory.

Delete

Delete a folder from the active directory.

Create Folder

Create a new folder in the active directory.

Active Directory Contents

Display the contents of the active directory.

Filter

Filter the items displayed in the active directory contents.

Advanced Options mode

The Advanced Options mode displays List View by default.

Note: If you configured values in Quick Setup mode, those values display in List View.

Sqoop Import step Advanced Options mode

Option
Definition

List View

View and edit settings as Argument/Value pairs on the Default tab.

Use the Custom tab to add your own argument/value pairs.

Command Line View

Enter command-line arguments. A typical use case is pasting an existing Sqoop command line into this field.
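To show how a command line relates to Argument/Value pairs, the following Python sketch splits an existing Sqoop command into such pairs. It is an illustration only and not the job entry's actual parsing logic. The function name and the sample command (host, table, and directory) are hypothetical. The arguments `--connect`, `--table`, `--target-dir`, and `--as-textfile` are standard Sqoop import arguments.

```python
import shlex


def parse_sqoop_args(command_line):
    """Split a Sqoop command line into (argument, value) pairs.

    Simplification: a token that starts with "-" is treated as an
    argument name, and the next token is its value unless that token
    also starts with "-". Arguments without a value get an empty string.
    """
    tokens = shlex.split(command_line)
    # Drop the leading "sqoop import" if the full command was pasted.
    if tokens[:2] == ["sqoop", "import"]:
        tokens = tokens[2:]

    pairs = []
    i = 0
    while i < len(tokens):
        token = tokens[i]
        if not token.startswith("-"):
            raise ValueError(f"Value without an argument name: {token}")
        has_value = i + 1 < len(tokens) and not tokens[i + 1].startswith("-")
        pairs.append((token, tokens[i + 1] if has_value else ""))
        i += 2 if has_value else 1
    return pairs


if __name__ == "__main__":
    # Hypothetical command of the kind you might paste into Command Line View.
    command = (
        "sqoop import --connect jdbc:mysql://db.example.com/sales "
        "--table ORDERS --target-dir /user/pentaho/orders --as-textfile"
    )
    for argument, value in parse_sqoop_args(command):
        print(argument, value)
```

Because of the simplification noted in the docstring, values that begin with a hyphen (for example, negative numbers) are not handled by this sketch.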
