Job settings tab

Job settings tab, Amazon EMR Job Executor

This tab includes the following options:

EMR job flow name

Specify the name of the Amazon EMR job flow (cluster) to execute.

S3 staging directory

Specify the Amazon Simple Storage Service (S3) address of the working directory for this Hadoop job. This directory will contain the MapReduce JAR and log files.
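The staging directory is given as an S3 URI. For example (the bucket name and prefix here are hypothetical placeholders):

```
s3://my-bucket/pdi/emr-staging
```

The MapReduce JAR and the job's log files end up under this location.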

MapReduce Jar

Specify the address of the Java JAR that contains your Hadoop mapper and reducer classes. The job must be configured and submitted using a static main method in any class of the JAR.

Command line arguments

Enter any command line arguments you want to pass to the static main method of the specified MapReduce Jar. Separate multiple arguments with spaces.
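The field's value is split on whitespace and passed to the JAR's static main method as individual argv entries. A minimal sketch of a driver class follows; the class name and arguments are hypothetical, and the actual Hadoop job setup is elided because it depends on your mapper and reducer:

```java
// Hypothetical driver class: the Job Executor invokes a static main
// method like this one in your MapReduce JAR.
public class WordCountDriver {

    // The "Command line arguments" field is split on whitespace;
    // e.g. "-input /data -output /out" arrives as four elements.
    static String[] splitArgs(String field) {
        return field.trim().split("\\s+");
    }

    public static void main(String[] args) {
        String input = args[0];
        String output = args[1];
        // Configure and submit the Hadoop job here using these paths
        // (org.apache.hadoop.mapreduce.Job setup omitted; it requires
        // the Hadoop client libraries on the classpath).
        System.out.println("input=" + input + " output=" + output);
    }
}
```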

Keep job flow alive

Select if you want to keep your job flow active after the PDI entry finishes. If this option is not selected, the job flow will terminate when the PDI entry finishes.

Enable blocking

Select if you want to force the job to wait until each PDI entry completes before continuing to the next entry. Blocking is the only way for PDI to be aware of the status of a Hadoop job, and selecting this option also enables proper error handling and routing.

When you clear this option, the Hadoop job is submitted without any status monitoring and PDI immediately moves on to the next entry.

Logging interval

If Enable blocking is selected, specify the number of seconds between status log messages.
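Conceptually, with blocking enabled the entry polls the Hadoop job's status at the configured logging interval until the job completes. The sketch below is illustrative only, not PDI's actual implementation; `waitForCompletion` and `jobIsComplete` are invented names:

```java
import java.util.function.BooleanSupplier;

// Illustrative sketch of "Enable blocking" plus "Logging interval":
// poll the job's status at a fixed interval until it completes.
public class BlockingPoll {

    // Checks the job's status every intervalMillis until it reports
    // complete; returns the number of status checks performed.
    static int waitForCompletion(BooleanSupplier jobIsComplete,
                                 long intervalMillis) throws InterruptedException {
        int polls = 0;
        while (true) {
            polls++;
            if (jobIsComplete.getAsBoolean()) {
                return polls;
            }
            System.out.println("Hadoop job still running...");
            Thread.sleep(intervalMillis);
        }
    }
}
```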
