Job Setup tab

The following table describes the options for setting up the inputs and outputs of the job:

Option

Definition

Input path

Enter the path of the input directory on your Hadoop cluster, such as /wordcount/input, where the source data for the MapReduce job is stored. To specify multiple input directories, enter a comma-separated list.
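As an illustration of the comma-separated form, the following minimal Python sketch splits such a value into individual directories. The function name split_input_paths and the second path /wordcount/archive are hypothetical, and trimming whitespace around each entry is a convenience of this sketch rather than documented behavior of the job entry, so it is safest to enter the list without spaces.

```python
def split_input_paths(value):
    """Split a comma-separated Input path value into individual directories."""
    # Assumption: surrounding whitespace is trimmed and empty entries
    # (for example, from a trailing comma) are dropped.
    return [part.strip() for part in value.split(",") if part.strip()]


# /wordcount/archive is a made-up second directory for illustration.
paths = split_input_paths("/wordcount/input,/wordcount/archive")
print(paths)
```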

Output path

Enter the path of the directory, such as /wordcount/output, on your Hadoop cluster where you want the output from the MapReduce job to be stored.

Note: The output directory must not exist before the MapReduce job runs.

Remove output path before job

Select to remove the specified output path before the MapReduce job is scheduled.
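The interaction between the output path rule and the Remove output path before job option can be sketched as follows. This is a minimal Python model that uses a temporary local directory as a stand-in for the cluster file system. The function prepare_output_dir and its remove_before_job parameter are hypothetical names, and the sketch does not show how the job entry is actually implemented.

```python
import shutil
import tempfile
from pathlib import Path


def prepare_output_dir(output_dir, remove_before_job):
    """Ensure output_dir does not exist before the job starts."""
    if output_dir.exists():
        if not remove_before_job:
            # Models the rule that the output directory must not already exist.
            raise FileExistsError(f"Output directory already exists: {output_dir}")
        # Models the "Remove output path before job" option being selected.
        shutil.rmtree(output_dir)


# Demo: a temporary local directory stands in for /wordcount/output.
with tempfile.TemporaryDirectory() as tmp:
    out = Path(tmp) / "wordcount" / "output"
    out.mkdir(parents=True)
    prepare_output_dir(out, remove_before_job=True)
    print(out.exists())
```

With the option cleared and the directory present, the sketch raises an error instead of deleting anything, which mirrors the note that the job expects a fresh output directory.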

Input format

Enter the Apache Hadoop class name that describes the input specification for the MapReduce job. See InputFormat for more information.

Output format

Enter the Apache Hadoop class name that describes the output specification for the MapReduce job. See OutputFormat for more information.

Ignore output of map key

Select to ignore the key output from the mapper transformation and replace it with NullWritable.

Ignore output of map value

Select to ignore the value output from the mapper transformation and replace it with NullWritable.

Ignore output of reduce key

Select to ignore the key output from the combiner or reducer transformations and replace it with NullWritable. This option requires a reducer transformation; it does not work with the Identity Reducer.

Ignore output of reduce value

Select to ignore the value output from the combiner or reducer transformations and replace it with NullWritable. This option requires a reducer transformation; it does not work with the Identity Reducer.
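To make the effect of the four Ignore output options concrete, the following minimal Python sketch models what happens to key/value pairs when a key or value is replaced with a placeholder. The NullWritable class here is only a stand-in for the Hadoop type of the same name, and apply_ignore_options and the sample word counts are hypothetical names and data chosen for illustration.

```python
class NullWritable:
    """Stand-in for Hadoop's NullWritable: a placeholder that carries no data."""

    def __repr__(self):
        return "NullWritable"


NULL = NullWritable()


def apply_ignore_options(pairs, ignore_key=False, ignore_value=False):
    """Replace keys and/or values in (key, value) pairs with the placeholder."""
    return [
        (NULL if ignore_key else key, NULL if ignore_value else value)
        for key, value in pairs
    ]


# Made-up reducer output for a word count job.
reducer_output = [("hadoop", 3), ("pentaho", 1)]
print(apply_ignore_options(reducer_output, ignore_key=True))
```

In this model, selecting an ignore option for the key keeps each value intact and discards only the key, and selecting it for the value does the reverse.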
