Job settings tab

Job settings tab, Amazon Hive Job Executor

This tab includes the following options:


Hive job flow name

Specify the name of the Hive job flow to execute.

S3 staging directory

Specify the Amazon Simple Storage Service (S3) address of the bucket (the container of stored objects) in which your job flow logs are stored. Artifacts required for execution (for example, the Hive script) are also copied to this bucket before execution. See the sketch following this table for an illustration of how these fields map to the underlying Amazon EMR API.

Hive script

Specify the address of the Hive script to execute within Amazon S3 or on your local file system.

Command line arguments

Enter any command-line arguments you want to pass to the specified Hive script. Separate multiple arguments with spaces.

Keep job flow alive

Select this option to keep your job flow active after the PDI entry finishes. If this option is cleared, the job flow terminates when the PDI entry finishes.

Enable blocking

Select this option if you want the PDI entry to wait until the EMR Hive job completes. Blocking is the only way for PDI to be aware of the status of a Hive job, and selecting this option also enables proper error handling and routing (a sketch of this polling behavior follows this table).

If you clear this option, the Hive job is executed without monitoring and PDI immediately moves on to the next entry.

Logging interval

If Enable blocking is selected, specify the number of seconds between status log messages.
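Under the hood, this entry drives an Amazon EMR job flow. The following sketch is only an illustration of roughly how the fields above could map onto the AWS EMR API using the boto3 Python SDK; it is not the entry's actual implementation, and all bucket names, paths, instance types, IAM roles, and the EMR release label are hypothetical placeholders.

```python
import boto3  # AWS SDK for Python; assumes credentials are already configured

emr = boto3.client("emr", region_name="us-east-1")  # region is an assumption

# Hypothetical values standing in for the dialog fields.
S3_STAGING = "s3://example-bucket/emr/staging"        # S3 staging directory
HIVE_SCRIPT = "s3://example-bucket/scripts/etl.q"     # Hive script (already uploaded to S3)
SCRIPT_ARGS = ["-d", "INPUT=s3://example-bucket/in"]  # Command line arguments

response = emr.run_job_flow(
    Name="pdi-hive-job-flow",            # Hive job flow name
    LogUri=S3_STAGING,                   # job flow logs are written under this bucket
    ReleaseLabel="emr-6.15.0",           # assumed EMR release
    Applications=[{"Name": "Hive"}],
    Instances={
        "InstanceGroups": [
            {"InstanceRole": "MASTER", "InstanceType": "m5.xlarge", "InstanceCount": 1},
            {"InstanceRole": "CORE", "InstanceType": "m5.xlarge", "InstanceCount": 2},
        ],
        # "Keep job flow alive": the cluster stays up after the step finishes.
        "KeepJobFlowAliveWhenNoSteps": True,
    },
    JobFlowRole="EMR_EC2_DefaultRole",   # assumed default EMR roles
    ServiceRole="EMR_DefaultRole",
    Steps=[{
        "Name": "hive-script-step",
        "ActionOnFailure": "CONTINUE",
        "HadoopJarStep": {
            # command-runner.jar runs the Hive script on the cluster.
            "Jar": "command-runner.jar",
            "Args": ["hive-script", "--run-hive-script",
                     "--args", "-f", HIVE_SCRIPT, *SCRIPT_ARGS],
        },
    }],
)
cluster_id = response["JobFlowId"]
print("Started job flow:", cluster_id)
```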

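When Enable blocking is selected, the entry waits for the Hive step to reach a terminal state, emitting a status message on each logging interval. The sketch below illustrates that kind of polling loop with boto3; the cluster ID and interval are placeholders, and the loop is an assumption about the behavior rather than the entry's actual code.

```python
import time

import boto3  # AWS SDK for Python; assumes credentials are already configured

emr = boto3.client("emr", region_name="us-east-1")  # region is an assumption
cluster_id = "j-XXXXXXXXXXXXX"                      # hypothetical job flow (cluster) ID
LOGGING_INTERVAL = 30                               # Logging interval: seconds between status messages

# Assume the Hive step is the most recently added step on the cluster.
step_id = emr.list_steps(ClusterId=cluster_id)["Steps"][0]["Id"]

while True:
    state = emr.describe_step(ClusterId=cluster_id,
                              StepId=step_id)["Step"]["Status"]["State"]
    print(f"Hive step state: {state}")              # periodic status log message
    if state in ("COMPLETED", "CANCELLED", "FAILED", "INTERRUPTED"):
        break
    time.sleep(LOGGING_INTERVAL)

# Because the entry waited for completion, a non-successful terminal state
# can be routed as an error to the next hop in the PDI job.
if state != "COMPLETED":
    raise RuntimeError(f"Hive step ended in state {state}")
```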