Job Setup tab

The following table describes the options for setting up the inputs and outputs of the job:

Option

Definition

Input path

Enter the path of the input directory on your Hadoop cluster where the source data for the MapReduce job is stored, such as /wordcount/input. To read from multiple input directories, enter a comma-separated list. If you want to read input from S3 storage, you must use the S3A connector with the s3a:// protocol; the s3 and s3n connectors are not supported. See the Hadoop documentation for details.

Output path

Enter the path of the directory on your Hadoop cluster where you want the output of the MapReduce job to be stored, such as /wordcount/output. The output directory must not already exist when the MapReduce job runs. To specify S3 storage as the destination, you must use the S3A connector with the s3a:// protocol.

Remove output path before job

Select to remove the specified output path before the MapReduce job is scheduled. Note: This option is not for use with S3. If you need to clear the output path for an S3 destination, use another job entry, such as Delete folders, to clear the output folder.

Input format

Enter the Apache Hadoop class name that describes the input specification for the MapReduce job. See InputFormat for more information.

Output format

Enter the Apache Hadoop class name that describes the output specification for the MapReduce job. See OutputFormat for more information.

Ignore output of map key

Select to ignore the key output from the mapper transformation and replace it with NullWritable.

Ignore output of map value

Select to ignore the value output from the mapper transformation and replace it with NullWritable.

Ignore output of reduce key

Select to ignore the key output from the combiner and/or reducer transformations and replace it with NullWritable. This option requires a reducer transformation; it does not work with the Identity Reducer.

Ignore output of reduce value

Select to ignore the value output from the combiner and/or reducer transformations and replace it with NullWritable. This option requires a reducer transformation; it does not work with the Identity Reducer.
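The path rules in the Input path, Output path, and Remove output path before job options can be modeled as a short Python sketch. This is a hypothetical illustration, not Pentaho or Hadoop code. The helpers parse_input_paths and plan_output_path are invented names. The exists flag is an assumed stand-in for a real existence check against HDFS or S3. The sketch splits a comma-separated Input path value and rejects s3:// and s3n:// locations. It then decides what happens to an existing output directory, depending on whether Remove output path before job is selected.

```python
from urllib.parse import urlparse

# Schemes that are not supported for S3 access; the s3a:// protocol is required instead.
UNSUPPORTED_S3_SCHEMES = {"s3", "s3n"}


def parse_input_paths(value: str) -> list[str]:
    """Split an Input path value into directories (hypothetical helper).

    Accepts a comma-separated list and rejects s3:// and s3n:// locations.
    """
    paths = [part.strip() for part in value.split(",") if part.strip()]
    if not paths:
        raise ValueError("at least one input path is required")
    for path in paths:
        scheme = urlparse(path).scheme.lower()
        if scheme in UNSUPPORTED_S3_SCHEMES:
            raise ValueError(f"{path}: use the s3a:// protocol instead of {scheme}://")
    return paths


def plan_output_path(path: str, exists: bool, remove_before_job: bool) -> str:
    """Decide what happens to the Output path before the job (hypothetical helper).

    `exists` is an assumed stand-in for a real existence check on the cluster.
    Returns "use" or "delete-then-use"; raises if the job could not start.
    """
    scheme = urlparse(path).scheme.lower()
    if scheme in UNSUPPORTED_S3_SCHEMES:
        raise ValueError(f"{path}: S3 destinations must use the s3a:// protocol")
    if remove_before_job and scheme == "s3a":
        # The option is not for use with S3; clear the folder with another
        # entry, such as Delete folders, before the job runs.
        raise ValueError("Remove output path before job is not for use with S3")
    if not exists:
        return "use"
    if remove_before_job:
        return "delete-then-use"
    # The output directory must not already exist when the job runs.
    raise FileExistsError(f"output directory {path} already exists")


if __name__ == "__main__":
    print(parse_input_paths("/wordcount/input,/wordcount/archive"))
    print(plan_output_path("/wordcount/output", exists=True, remove_before_job=True))
```

The sketch raises FileExistsError when the output directory already exists and the option is not selected. On a real cluster, Hadoop's FileOutputFormat typically rejects this case with a FileAlreadyExistsException when it checks the job's output specification.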
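The following Python sketch shows what the Ignore output options change in the written files. It assumes the job's Output format is Hadoop's TextOutputFormat; other output formats may behave differently. NullWritableStandIn, apply_ignore_options, and text_output_line are hypothetical names for illustration, not Pentaho or Hadoop APIs. The line-rendering rule follows TextOutputFormat. That format omits a NullWritable key or value together with the tab separator, and it skips the record entirely when both are NullWritable.

```python
from typing import Any, Optional, Tuple


class NullWritableStandIn:
    """Python stand-in for Hadoop's NullWritable (illustration only)."""

    def __repr__(self) -> str:
        return "NullWritable"


NULL = NullWritableStandIn()


def apply_ignore_options(key: Any, value: Any, ignore_key: bool, ignore_value: bool) -> Tuple[Any, Any]:
    """Replace the key and/or value with NullWritable, as the Ignore output options do."""
    return (NULL if ignore_key else key, NULL if ignore_value else value)


def text_output_line(key: Any, value: Any, separator: str = "\t") -> Optional[str]:
    """Model how TextOutputFormat renders one record as a line of text.

    A null key or value is dropped together with the separator;
    when both are null, no line is written (None is returned).
    """
    null_key = key is None or key is NULL
    null_value = value is None or value is NULL
    if null_key and null_value:
        return None
    if null_key:
        return str(value)
    if null_value:
        return str(key)
    return f"{key}{separator}{value}"


if __name__ == "__main__":
    # One word-count record, with Ignore output of reduce key selected.
    key, value = apply_ignore_options("hadoop", 3, ignore_key=True, ignore_value=False)
    print(repr(text_output_line(key, value)))
```

For example, with Ignore output of reduce key selected, the word count record ("hadoop", 3) is written as the line 3, rather than hadoop followed by a tab and 3.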
