Advanced Pentaho Data Integration topics

The following topics help to extend your knowledge of PDI beyond basic setup and use:

  • PDI and Hitachi Content Platform (HCP)

    You can use PDI transformation steps to improve your HCP data quality before storing the data in other formats, such as JSON, XML, or Parquet.

  • Hierarchical data

    You can manipulate structured, complex, and nested data types, and load filtered subsets of large JSON files.

  • PDI and Snowflake

    Using PDI job entries for Snowflake, you can load your data into Snowflake and orchestrate warehouse operations.

  • Metadata discovery

    You can use metadata discovery to automate the otherwise tedious, manual process of identifying and determining metadata from COBOL copybooks and databases.

  • Use Streamlined Data Refinery (SDR)

    You can use SDR to build a simplified, purpose-specific ETL refinery composed of a series of PDI jobs that take raw data, augment and blend it through the request form, and then publish it for use in Analyzer.

  • Use Command Line Tools

    You can use PDI's command line tools to execute PDI content from outside of the PDI client.
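As a sketch of what running PDI content from the command line looks like: Pan executes transformations (.ktr files) and Kitchen executes jobs (.kjb files). The file paths and parameter name below are hypothetical placeholders; substitute your own content.

```shell
# Run a transformation with Pan (use Pan.bat on Windows).
# /path/to/sales_cleanup.ktr is a hypothetical example file.
./pan.sh -file=/path/to/sales_cleanup.ktr -level=Basic

# Run a job with Kitchen, passing a named parameter
# (INPUT_DIR is an assumed parameter defined in the job).
./kitchen.sh -file=/path/to/nightly_load.kjb -param:INPUT_DIR=/data/incoming
```

Both tools return a non-zero exit code on failure, which makes them straightforward to wrap in cron jobs or other schedulers.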

  • Metadata Injection

    You can insert metadata from various sources into a transformation at runtime.

  • Use Carte Clusters

    You can use Carte, a simple web server, to run transformations and jobs remotely.
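A minimal sketch of starting a Carte server and checking that it is up, assuming a default installation (the hostname, port, and the default cluster/cluster credentials are assumptions; change them for any real deployment).

```shell
# Start a Carte server on this machine, listening on port 8081
# (use Carte.bat on Windows).
./carte.sh localhost 8081

# From another shell, query the status page to confirm the server
# is running. cluster/cluster are the default credentials and
# should be changed in production.
curl -u cluster:cluster http://localhost:8081/kettle/status/
```

Once a Carte server (or a cluster of them) is running, you can target it from the PDI client as a remote execution environment instead of running content locally.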

  • Connecting to a Hadoop cluster with the PDI client

    Use transformation steps to connect to a variety of Big Data sources, including Hadoop, NoSQL stores such as MongoDB, and analytical databases.

  • Partition Data

    Split a data set into a number of subsets according to a rule applied to each row of data.

  • Use a Data Service

    Query the output of a step as if the data were stored in a physical table by turning a transformation into a data service.

  • Use Data Lineage

    Track your data from source systems to target applications and take advantage of third-party tools, such as Meta Integration Technology (MITI) and yEd, to track and view specific data.

  • Use the Marketplace

    Download, install, and share plugins developed by Pentaho and members of the user community.

Note: If you want to develop custom plugins that extend PDI functionality or embed the engine into your own Java applications, see the Administer Pentaho Data Integration and Analytics document.
