Pentaho Data Integration

Pentaho Data Integration (PDI) provides Extract, Transform, and Load (ETL) capabilities for capturing, cleansing, and storing data in a uniform, consistent format that is accessible and relevant to end users and IoT technologies. If you or your administrator has not already installed PDI on your system, see the Install Pentaho Data Integration and Analytics document for details.

Get Started with the PDI client

The PDI client (also known as Spoon) is a desktop application that enables you to build transformations and to schedule and run jobs.

Common uses of the PDI client include:

  • Data migration between different databases and applications

  • Loading large data sets into databases, taking full advantage of cloud, clustered, and massively parallel processing environments

  • Data cleansing with steps ranging from very simple to very complex transformations

  • Data integration, including the ability to leverage real-time ETL as a data source for Pentaho Reporting

  • Data warehouse population, with built-in support for slowly changing dimensions and surrogate key creation
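
To make the last item more concrete, the following Python sketch illustrates the general idea behind a slowly changing dimension with surrogate keys. It is not PDI code and does not describe how PDI implements this feature; the names DimRow, apply_scd2, customer_id, and city are hypothetical and chosen only for illustration. When a tracked attribute changes, the current row is closed and a new row with a fresh surrogate key is added, so history is preserved.

```python
from dataclasses import dataclass
from datetime import date
from typing import Optional


@dataclass
class DimRow:
    """One version of a customer in a hypothetical dimension table."""
    surrogate_key: int               # warehouse-generated key, independent of the source system
    customer_id: str                 # natural (business) key from the source
    city: str                        # attribute whose history is tracked
    valid_from: date
    valid_to: Optional[date] = None  # None marks the current version


def apply_scd2(dim: list[DimRow], customer_id: str, city: str, as_of: date) -> DimRow:
    """Insert or version one source record, keeping earlier versions."""
    current = next(
        (r for r in dim if r.customer_id == customer_id and r.valid_to is None),
        None,
    )
    if current is not None and current.city == city:
        return current               # attribute unchanged: no new version
    if current is not None:
        current.valid_to = as_of     # close the previous version
    # Assumption: surrogate keys are simple increasing integers.
    next_key = max((r.surrogate_key for r in dim), default=0) + 1
    row = DimRow(next_key, customer_id, city, as_of)
    dim.append(row)
    return row


dim: list[DimRow] = []
apply_scd2(dim, "C-100", "Lisbon", date(2023, 1, 1))
apply_scd2(dim, "C-100", "Lisbon", date(2023, 6, 1))   # unchanged, no new row
apply_scd2(dim, "C-100", "Porto", date(2024, 3, 15))   # changed, new version added

for row in dim:
    print(row)
```

Fact tables would reference surrogate_key rather than customer_id, so each fact stays tied to the version of the customer that was current when it was recorded.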

To get started with Pentaho Data Integration, see the following topics:
