Amazon EMR Job Executor

The Amazon EMR Job Executor job entry runs Hadoop jobs in Amazon Elastic MapReduce (EMR). You can use this entry to access job flows in your Amazon Web Services (AWS) account.

Before you begin

You must have an AWS account configured for EMR.
You must have a Java JAR file that controls the remote job.
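
As a rough illustration of the second prerequisite, the sketch below shows the general shape of a driver class inside such a JAR: a static main method that reads its command-line arguments before the job is configured. The class name `ExampleJobDriver`, the `JobArgs` holder, and the two-argument input/output layout are hypothetical. The Hadoop-specific setup is left as a comment because it depends on your job.

```java
import java.util.Arrays;

// Hypothetical driver class; the name and argument layout are assumptions for illustration.
final class ExampleJobDriver {

    // Holds the two arguments this sketch expects: an input path and an output path.
    static final class JobArgs {
        final String inputPath;
        final String outputPath;

        JobArgs(String inputPath, String outputPath) {
            this.inputPath = inputPath;
            this.outputPath = outputPath;
        }
    }

    // Validates the arguments passed to main and returns them in a typed form.
    static JobArgs parseArgs(String[] args) {
        if (args == null || args.length != 2) {
            throw new IllegalArgumentException(
                "Expected 2 arguments (input path, output path) but got: "
                    + Arrays.toString(args));
        }
        return new JobArgs(args[0], args[1]);
    }

    public static void main(String[] args) {
        JobArgs jobArgs = parseArgs(args);
        // A real driver would configure and submit the Hadoop job here,
        // using the mapper and reducer classes packaged in the same JAR.
        System.out.println("Input:  " + jobArgs.inputPath);
        System.out.println("Output: " + jobArgs.outputPath);
    }
}
```

Under these assumptions, two space-separated values supplied as command-line arguments would arrive in the main method as `args[0]` and `args[1]`.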

Entry name

Entry name specifies the unique name of the job entry on the canvas. You can change it.

Configure the entry (tabs)

EMR settings tab

Use this tab to connect to your AWS account and select or create the EMR cluster.

AWS connection

| Option | Description |
| --- | --- |
| Access key | Unique identifier for your AWS account. The access key and secret key are used to sign requests, identify the sender, and help prevent request tampering. |
| Secret key | Secret key associated with the access key. The access key and secret key are used to sign requests, identify the sender, and help prevent request tampering. |
| Region | Amazon EC2 region where the job flow runs. Available regions depend on your AWS account. See the AWS documentation for regions and availability zones. |

Select Connect to establish the connection.
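
To illustrate why a signed request helps prevent tampering, the following sketch computes a keyed hash (HMAC-SHA256) over a request string. It is a simplified illustration only and is not the actual AWS signing algorithm, which involves additional steps. The class name `RequestSigningSketch`, the placeholder key, and the request strings are all hypothetical.

```java
import java.nio.charset.StandardCharsets;
import java.security.GeneralSecurityException;
import javax.crypto.Mac;
import javax.crypto.spec.SecretKeySpec;

// Simplified illustration of keyed request signing; this is NOT the actual AWS signing algorithm.
final class RequestSigningSketch {

    // Computes an HMAC-SHA256 signature of the request text, keyed by the secret key,
    // and returns it as a lowercase hexadecimal string.
    static String sign(String secretKey, String request) {
        try {
            Mac mac = Mac.getInstance("HmacSHA256");
            mac.init(new SecretKeySpec(secretKey.getBytes(StandardCharsets.UTF_8), "HmacSHA256"));
            byte[] digest = mac.doFinal(request.getBytes(StandardCharsets.UTF_8));
            StringBuilder hex = new StringBuilder();
            for (byte b : digest) {
                hex.append(String.format("%02x", b));
            }
            return hex.toString();
        } catch (GeneralSecurityException e) {
            throw new IllegalStateException("HMAC-SHA256 is not available", e);
        }
    }

    public static void main(String[] args) {
        String secretKey = "EXAMPLE-SECRET-KEY"; // placeholder, not a real credential
        String original = "action=list&cluster=demo";      // hypothetical request text
        String tampered = "action=terminate&cluster=demo"; // same request, altered in transit
        String signature = sign(secretKey, original);
        System.out.println("Matches original: " + signature.equals(sign(secretKey, original)));
        System.out.println("Matches tampered: " + signature.equals(sign(secretKey, tampered)));
    }
}
```

Because only the holder of the secret key can produce a matching signature, a receiver that recomputes the signature can detect a request that was altered after signing.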

Cluster

Select New to create a new job flow (cluster), or Existing if you already have a job flow ID.

If you select New, configure these options:

| Option | Description |
| --- | --- |
| EC2 role | Amazon EC2 instance profile role for the cluster. Processes running on cluster instances use this role when calling other AWS services. Available roles depend on your AWS account. |
| EMR role | Role that permits Amazon EMR to call other AWS services (for example, Amazon EC2) on your behalf. See the AWS documentation for EMR IAM roles. Available roles depend on your AWS account. |
| Master instance type | Amazon EC2 instance type used as the Hadoop master (handles task distribution). |
| Slave instance type | Amazon EC2 instance type used as one or more Hadoop workers. Valid only when Number of instances is greater than 1. |
| EMR release | EMR release version (defines service components and versions). |
| Number of instances | Number of EC2 instances for the job flow. |

If you select Existing, specify the existing ID in Existing JobFlow ID.
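
For a new cluster, the relationship between Number of instances and Slave instance type can be sketched as follows. The sketch assumes that one instance acts as the master and the remaining instances act as workers, which is an interpretation of the option descriptions rather than a documented rule. The class and method names are hypothetical and are not part of PDI or any AWS library.

```java
// Hypothetical helper; the names here are illustrative and not part of PDI or the AWS SDK.
final class ClusterSizingSketch {

    // Assumes one instance acts as the master and the remainder act as workers.
    static int workerCount(int numberOfInstances) {
        if (numberOfInstances < 1) {
            throw new IllegalArgumentException("Number of instances must be at least 1");
        }
        return numberOfInstances - 1;
    }

    // The slave instance type only matters when at least one worker exists.
    static boolean slaveTypeApplies(int numberOfInstances) {
        return workerCount(numberOfInstances) > 0;
    }

    public static void main(String[] args) {
        for (int n : new int[] {1, 3}) {
            System.out.println(n + " instance(s): workers=" + workerCount(n)
                + ", slave type applies=" + slaveTypeApplies(n));
        }
    }
}
```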

Job settings tab

| Option | Description |
| --- | --- |
| EMR job flow name | Name of the EMR job flow to execute. |
| S3 staging directory | Amazon S3 location of the working directory for this job. This directory contains the MapReduce JAR and log files. |
| MapReduce Jar | Location of the JAR that contains your Hadoop mapper and reducer classes. The job must be configured and submitted using a static main method in a class in the JAR. |
| Command line arguments | Command-line arguments passed to the static main method of the specified JAR. Separate multiple arguments with spaces. |
| Keep job flow alive | Keeps the job flow active after the entry finishes. If not selected, the job flow terminates when the entry finishes. |
| Enable blocking | Waits for the EMR job to complete before continuing to the next entry. Blocking is required for PDI to track job status and to support error handling and routing. If cleared, the job is submitted and PDI continues immediately. |
| Logging interval | When Enable blocking is selected, number of seconds between status log messages. |
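
The interaction between Enable blocking and Logging interval can be pictured as a polling loop: check the job status, log it, and wait for the interval before checking again until the job reaches a final state. The sketch below is a conceptual illustration only and does not reflect how PDI implements blocking. The class name `BlockingPollSketch`, the status strings, and the simulated status source are hypothetical, and the interval is given in milliseconds here so the example finishes quickly.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Iterator;
import java.util.List;
import java.util.function.Supplier;

// Conceptual sketch of blocking with periodic status logging; not PDI's implementation.
final class BlockingPollSketch {

    // Polls the status source until it reports a final state, recording one
    // log line per poll and sleeping for the interval between polls.
    static List<String> waitForCompletion(Supplier<String> statusSource, long intervalMillis) {
        List<String> log = new ArrayList<>();
        while (true) {
            String status = statusSource.get();
            log.add("Job status: " + status);
            // "COMPLETED" and "FAILED" are placeholder names for final states.
            if (status.equals("COMPLETED") || status.equals("FAILED")) {
                return log;
            }
            try {
                Thread.sleep(intervalMillis);
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt();
                throw new IllegalStateException("Interrupted while waiting for the job", e);
            }
        }
    }

    public static void main(String[] args) {
        // Simulated status sequence standing in for calls to the remote service.
        Iterator<String> simulated = Arrays.asList("RUNNING", "RUNNING", "COMPLETED").iterator();
        List<String> log = waitForCompletion(simulated::next, 10);
        log.forEach(System.out::println);
    }
}
```

Without blocking, the equivalent flow would submit the job and return immediately, so no final status would be available for error handling or routing.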
