PDI steps and entries included in the Hadoop add-on installation

If you want to use the PDI client to access and manipulate your data on a Hadoop cluster, you must also install the Hadoop add-on. See Install PDI tools and plugins for instructions on including the add-on.

With the Hadoop add-on installed, you can use the following transformation steps and job entries from your PDI client:

  • Transformation steps

    • Annotate Stream

    • CouchDB Input

    • Hadoop File Input

    • Hadoop File Output

    • HBase Input

    • HBase Output

    • HBase Row Decoder

    • MapReduce Input

    • MapReduce Output

    • ORC Input

    • ORC Output

    • Parquet Input

    • Parquet Output

    • Shared Dimension

  • Job entries

    • Amazon EMR Job Executor

    • Amazon Hive Job Executor

    • Build Model

    • Hadoop Copy Files

    • Hadoop Job Executor

    • Oozie Job Executor

    • Pentaho MapReduce

    • Publish Model

    • Spark Submit

    • Sqoop Export

    • Sqoop Import

    • Start a PDI cluster on YARN

    • Stop a PDI cluster on YARN

See the Pentaho Data Integration document for details on PDI transformations, jobs, steps, and entries.
