PDI steps and entries included in the Hadoop add-on installation
If you want to use the PDI client to access and manipulate your data on a Hadoop cluster, you must also install the Hadoop add-on. See Install PDI tools and plugins for instructions on installing the add-on.
With the Hadoop add-on installed, you can use the following transformation steps and job entries in your PDI client:
Transformation steps
Annotate Stream
CouchDB Input
Hadoop File Input
Hadoop File Output
HBase Input
HBase Output
HBase Row Decoder
MapReduce Input
MapReduce Output
ORC Input
ORC Output
Parquet Input
Parquet Output
Shared Dimension
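A transformation built from these steps runs like any other PDI transformation. As a minimal sketch (not part of the add-on itself), the following launches one programmatically with the Kettle Java API, assuming the PDI engine libraries and the Hadoop add-on plugins are on the classpath; the file read_from_hdfs.ktr is a hypothetical example.

import org.pentaho.di.core.KettleEnvironment;
import org.pentaho.di.trans.Trans;
import org.pentaho.di.trans.TransMeta;

public class RunHadoopTransformation {
    public static void main(String[] args) throws Exception {
        // Initialize the Kettle environment; this also loads installed
        // plugins, including the Hadoop add-on steps listed above.
        KettleEnvironment.init();

        // Hypothetical transformation file that uses steps such as
        // Hadoop File Input.
        TransMeta transMeta = new TransMeta("/path/to/read_from_hdfs.ktr");

        // Run the transformation and block until it completes.
        Trans trans = new Trans(transMeta);
        trans.execute(null);
        trans.waitUntilFinished();

        if (trans.getErrors() > 0) {
            throw new RuntimeException("Transformation completed with errors.");
        }
    }
}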
Job entries
Amazon EMR Job Executor
Amazon Hive Job Executor
Build Model
Hadoop Copy Files
Hadoop Job Executor
Oozie Job Executor
Pentaho MapReduce
Publish Model
Spark Submit
Sqoop Export
Sqoop Import
Start a PDI cluster on YARN
Stop a PDI cluster on YARN
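A job that uses these entries can be launched the same way. This sketch makes the same assumptions as the one above; copy_to_hdfs.kjb is a hypothetical job file.

import org.pentaho.di.core.KettleEnvironment;
import org.pentaho.di.job.Job;
import org.pentaho.di.job.JobMeta;

public class RunHadoopJob {
    public static void main(String[] args) throws Exception {
        // Initialize the Kettle environment and installed plugins.
        KettleEnvironment.init();

        // Hypothetical job file that uses entries such as Hadoop Copy Files.
        JobMeta jobMeta = new JobMeta("/path/to/copy_to_hdfs.kjb", null);

        // Run the job and wait for it to finish.
        Job job = new Job(null, jobMeta);
        job.start();
        job.waitUntilFinished();

        if (job.getResult().getNrErrors() > 0) {
            throw new RuntimeException("Job completed with errors.");
        }
    }
}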
See the Pentaho Data Integration document for details on PDI transformations, jobs, steps, and entries.