PDI steps and entries included in the Hadoop add-on installation

If you want to use the PDI client to access and manipulate your data on a Hadoop cluster, you must also install the Hadoop add-on. See Install PDI tools and plugins for instructions on including the add-on.

With the Hadoop add-on installed, you can use the following transformation steps and job entries from your PDI client:

  • Transformation steps

    • Annotate Stream

    • CouchDB Input

    • Hadoop File Input

    • Hadoop File Output

    • HBase Input

    • HBase Output

    • HBase Row Decoder

    • MapReduce Input

    • MapReduce Output

    • ORC Input

    • ORC Output

    • Parquet Input

    • Parquet Output

    • Shared Dimension

  • Job entries

    • Amazon EMR Job Executor

    • Amazon Hive Job Executor

    • Build Model

    • Hadoop Copy Files

    • Hadoop Job Executor

    • Oozie Job Executor

    • Pentaho MapReduce

    • Publish Model

    • Spark Submit

    • Sqoop Export

    • Sqoop Import

    • Start a PDI cluster on YARN

    • Stop a PDI cluster on YARN

See the Pentaho Data Integration document for details on PDI transformations, jobs, steps, and entries.
