# OpenLineage Plugin

The Pentaho Data Integration (PDI) OpenLineage plugin enables PDI to emit rich, standardized OpenLineage events that Pentaho Data Catalog (PDC) can consume to capture how data moves and is transformed in PDI ETL pipelines. PDC uses the information it captures to provide visual, end-to-end transparency of data flows, which improves data observability, strengthens compliance and governance, aids in troubleshooting data issues, and enhances data trust and quality for business users.

When PDI runs a supported transformation, the plugin discovers the transformation's input and output datasets, generates column-level lineage when possible, and emits OpenLineage events.

The OpenLineage plugin emits events for:

* **Start**: transformation starts
* **Complete**: transformation finishes successfully
* **Abort**: transformation was stopped without errors
* **Fail**: transformation ended with errors
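
These event types presumably correspond to the `START`, `COMPLETE`, `ABORT`, and `FAIL` values of the `eventType` field defined by the OpenLineage specification. As a rough check, the following bash sketch tallies the event types in the file written by a `file` consumer. It assumes each event is written as JSON text that contains a pair such as `"eventType": "START"`; that output format and the helper name `count_event_types` are assumptions, not documented plugin behavior.

```bash
# count_event_types: hypothetical helper that tallies OpenLineage event types
# in a file written by the plugin's `file` consumer.
# Assumption: each event is JSON text containing a pair like "eventType": "START".
count_event_types() {
  local file="$1"
  # Extract every "eventType": "<TYPE>" pair, keep only <TYPE>,
  # then count occurrences and print them as "<TYPE> <COUNT>".
  grep -o '"eventType" *: *"[A-Z]*"' "$file" \
    | sed 's/.*"\([A-Z]*\)"$/\1/' \
    | sort \
    | uniq -c \
    | awk '{ print $2, $1 }'
}
```

For example, run `count_event_types /<path_to_file>/openlineage.json`. After a transformation finishes successfully, you would expect the output to include at least `START` and `COMPLETE`.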

## Compatibility matrix

OpenLineage plugin functionality is certified to work as intended for the following versions of PDI:

* 10.2.0.1 (SP1)
* 10.2.0.2 (SP2)
* 10.2.0.3 (SP3)
* 10.2.0.4 (SP4)
* 10.2.0.5 (SP5)
* 10.2.0.6 (SP6)
* 11.0

## Setting up the plugin

Before you begin, verify that you have a valid license for the OpenLineage plugin. For information about licenses, see [Acquire and install enterprise licenses](/install/pentaho-installation-overview-cp/acquire-and-install-enterprise-licenses.md).

To set up the OpenLineage plugin, you must complete the following tasks:

* [Download the plugin](#download-the-plugin)
* [Install the plugin](#install-the-plugin)
* [Create a configuration file for the plugin](#create-a-configuration-file-for-the-plugin)
* [Enable the plugin](#enable-the-plugin)
* [Validate the plugin works](#validate-the-plugin-works)

### Download the plugin

Download the OpenLineage plugin from the Pentaho Support Portal.

1. On the [Support Portal](https://support.pentaho.com/hc/en-us) home page, sign in using the Pentaho support username and password provided in your Pentaho Welcome Packet.
2. In the **Pentaho** card, click **Download**. The **Downloads** page opens.
3. In the **\<version>.x** list, click **Pentaho \<version> EE Marketplace Plugins Release**.
4. Scroll to the bottom of the page.
5. In the **Marketplace Plugins \<version>** section, click **Open Lineage**.
6. Download the [pdi-openlineage-plugin-\<plugin\_version>-\<build number>.zip](https://download.pentaho.com/PDI/Marketplace+Plugins+10.2/Open+Lineage#) file.

### Install the plugin

Install the OpenLineage plugin in the PDI client and Pentaho Server by running commands appropriate for your operating system.

{% hint style="info" %}
**Note:** The plugin can be installed in the PDI client, Pentaho Server, or both.
{% endhint %}

Installation commands include the following placeholders that must be replaced:

* \<path-to-data-integration>: Replace with full path to the PDI client.
* \<path-to-pentaho-server>: Replace with full path to the Pentaho Server.
* \<version\_check\_option>: Replace with one of the following options:
  * `none`: Installs the plugin on any version of Pentaho. If the Pentaho version is unsupported, an error is shown.
  * `loose`: Default option. Installs the plugin on certified Pentaho versions and on newer, compatible Pentaho versions.
  * `strict`: Installs the plugin only on certified Pentaho versions.

To install the OpenLineage plugin, complete the following steps:

1. Stop the PDI client and Pentaho Server.
2. Extract the [pdi-openlineage-plugin-\<plugin\_version>-\<build number>.zip](https://download.pentaho.com/PDI/Marketplace+Plugins+10.2/Open+Lineage#) file to a folder on the computer where the PDI client or Pentaho Server is installed.
3. In the [pdi-openlineage-plugin-\<plugin\_version>-\<build number>](https://download.pentaho.com/PDI/Marketplace+Plugins+10.2/Open+Lineage#) folder, open a command prompt as an administrator.
4. In the command prompt, run the following installation commands for your operating system, replacing the placeholders for paths and version check options.
   * Windows
     * PDI client

       `install.bat -t <path-to-data-integration> --platformVersionCheck <version_check_option>`
     * Pentaho Server

       `install.bat -t <path-to-pentaho-server> --platformVersionCheck <version_check_option>`
   * Linux
     * PDI client

       `./install.sh -t <path-to-data-integration> --platformVersionCheck <version_check_option>`
     * Pentaho Server

       `./install.sh -t <path-to-pentaho-server> --platformVersionCheck <version_check_option>`
5. Start the PDI client and Pentaho Server.
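
If you install the plugin on Linux repeatedly, for example across several environments, you might wrap `install.sh` in a small bash function that validates the version check option and the target path before running the script. The following is a minimal sketch, not part of the plugin: the function name `install_openlineage` and the `DRY_RUN` variable are hypothetical, and the function must be run from the extracted plugin folder.

```bash
# install_openlineage: hypothetical wrapper around the plugin's install.sh.
# Run it from the extracted pdi-openlineage-plugin-<plugin_version>-<build number> folder.
# Usage: install_openlineage <target-dir> [none|loose|strict]
install_openlineage() {
  local target="$1"
  local check="${2:-loose}"   # loose is the documented default

  # Reject anything other than the three documented version check options.
  case "$check" in
    none|loose|strict) ;;
    *) echo "invalid version check option: $check" >&2; return 2 ;;
  esac

  # Fail early if the PDI client or Pentaho Server path is wrong.
  if [ ! -d "$target" ]; then
    echo "target directory not found: $target" >&2
    return 1
  fi

  # When DRY_RUN is set to a non-empty value, print the command instead of running it.
  ${DRY_RUN:+echo} ./install.sh -t "$target" --platformVersionCheck "$check"
}

# Example (paths are hypothetical):
#   install_openlineage /opt/pentaho/design-tools/data-integration
#   install_openlineage /opt/pentaho/server/pentaho-server strict
```

Stop the PDI client and Pentaho Server before calling the function and start them afterward, as in the steps above.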

### Generate an encrypted password

If you plan to emit events to PDC and want to secure your password so that it's not in plain text, you can generate an encrypted password to authenticate to PDC. The encrypted password is used in the configuration file for the OpenLineage plugin.

1. On the computer where the PDI client or Pentaho Server is installed, open a command prompt.
2. Run one of the following commands for your operating system:

   * **Windows**
     * To generate a password using the default Pentaho encryption seed, run the following commands:

       ```bat
       rem Use <path-to-pentaho-server> instead for the Pentaho Server
       cd <path-to-data-integration>
       encr.bat -kettle <your_password>
       ```
     * To generate a password using your own custom encryption seed, run the following commands:

       ```bat
       set KETTLE_TWO_WAY_PASSWORD_ENCODER_SEED=<your_custom_seed>
       cd <path-to-data-integration>
       encr.bat -kettle <your_password>
       ```
   * **Linux**
     * To generate a password using the default Pentaho encryption seed, run the following commands:

       ```bash
       cd /<path-to-data-integration> # or <path-to-pentaho-server>
       sh encr.sh -kettle <your_password>
       ```
     * To generate a password using your own custom encryption seed, run the following commands:

       ```bash
       export KETTLE_TWO_WAY_PASSWORD_ENCODER_SEED="<your_custom_seed>"
       cd /<path-to-data-integration> # or <path-to-pentaho-server>
       sh encr.sh -kettle <your_password>
       ```

   An encrypted password is generated and displayed in the command prompt, like the following example:

   ```
   Encrypted 2be98afc86aa7f297a414ab3dce93bcc9
   ```
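
Typing the password directly on the command line can leave it in your shell history. The following bash sketch prompts for the password without echoing it and then runs `encr.sh`. The function name `encrypt_pdc_password` is hypothetical, and the `-kettle` option assumes `encr.sh` behaves like the standard Kettle `Encr` tool, which prints output in the `Encrypted <hex>` form shown above.

```bash
# encrypt_pdc_password: hypothetical helper that prompts for the PDC password
# without echoing it, so the plain-text password is not saved in shell history.
# Note: the password is still passed to encr.sh as an argument, so it is
# briefly visible in the process list while encr.sh runs.
# Usage: encrypt_pdc_password <path-to-data-integration-or-pentaho-server>
encrypt_pdc_password() {
  local pentaho_dir="$1"
  local pw
  # -s hides typed characters; -r keeps backslashes in the password as typed.
  read -r -s -p "PDC password: " pw
  echo >&2   # move to a new line after the hidden input
  # Run encr.sh from its own directory in a subshell, leaving the caller's
  # working directory unchanged.
  (cd "$pentaho_dir" && sh encr.sh -kettle "$pw")
}
```

If you use a custom seed, export `KETTLE_TWO_WAY_PASSWORD_ENCODER_SEED` before calling the function.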

### Create a configuration file for the plugin

After you install the plugin, create a configuration file that specifies where to send OpenLineage events. You can create a simple configuration file for testing or a custom configuration file to use in production.

1. In a text editor, create a configuration file with content from one of the following examples, based on your needs:
   * To create a simple configuration file that you can use to quickly validate that the plugin is working, include only the following content:

     ```yaml
     version: 0.0.1
     consumers:
       console:
     ```
   * To create a custom configuration file that includes OpenLineage event consumers in your Pentaho deployment, such as a PDC Server, include the following content:

     ```yaml
     version: 0.0.1
     localHostname: <localhostName>   # optional
     debugMode: false               # PDI client (Spoon) only
     consumers:
       console:
       file:
         - path: /<path_to_file>/openlineage.json
       http:
         - name: PDC
           url: https://<pdc_server_host_name>
           endpoint: /lineage/api/events
           authenticationParameters:
             endpoint: /keycloak/realms/pdc/protocol/openid-connect/token
             username: <pdc_server_username>
             password: <pdc_server_password>
             client_id: pdc-client
             scope: openid
     ```
2. Save the file as `openlineageConfig.yml` in the PDI directory that contains your user-specific configuration files.

   **Notes:**

   * By default, user-specific configuration files are stored in the `.kettle` directory, which is usually in one of the following locations:

     * Windows: `C:\Users\example_user\.kettle`
     * Linux: `~/.kettle`

     However, if you run PDI in a container, configuration files might resolve to the `/root/.kettle` directory.
   * You can add multiple `http` consumers in the configuration file.
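
On Linux, you can also create the simple configuration file from a shell. The following bash sketch writes a `console` consumer and, optionally, a `file` consumer. The function name `write_openlineage_config` is hypothetical, and the sketch assumes PDI resolves the `.kettle` directory under `$KETTLE_HOME` when that variable is set and under your home directory otherwise.

```bash
# write_openlineage_config: hypothetical helper that writes a minimal
# openlineageConfig.yml into the .kettle directory.
# Assumption: .kettle lives under $KETTLE_HOME if set, otherwise under $HOME.
# Usage: write_openlineage_config [path-to-openlineage.json]
write_openlineage_config() {
  local kettle_dir="${KETTLE_HOME:-$HOME}/.kettle"
  local cfg="$kettle_dir/openlineageConfig.yml"
  mkdir -p "$kettle_dir"
  {
    echo "version: 0.0.1"
    echo "consumers:"
    echo "  console:"
    # Add a file consumer only when an output path is given.
    if [ -n "${1:-}" ]; then
      echo "  file:"
      echo "    - path: $1"
    fi
  } > "$cfg"
  # Print the location so it can be used for KETTLE_OPEN_LINEAGE_CONFIG_FILE.
  echo "$cfg"
}
```

For example, `write_openlineage_config /<path_to_file>/openlineage.json` writes the file and prints its path, which you can use when you enable the plugin.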

### Enable the plugin

After you install the OpenLineage plugin and create its configuration file, you must enable the plugin so that it can send OpenLineage events to the consumers you specified in the configuration file.

#### Enable in PDI client

Enable the plugin in the PDI client by completing the following steps:

1. Log in to the PDI client and click **Edit** > **Edit the Kettle.properties file**. The Kettle properties window opens.
2. To make the plugin active, add the following variable and value: `KETTLE_OPEN_LINEAGE_ACTIVE=true`
3. To point PDI to your `openlineageConfig.yml` file, add the following variable, replacing the *\<path-to-config-file>* placeholder with the full path to your configuration file directory: `KETTLE_OPEN_LINEAGE_CONFIG_FILE=/<path-to-config-file>/openlineageConfig.yml`
4. Click **OK**. The `kettle.properties` file is saved and the OpenLineage plugin is enabled.

#### Enable in Pentaho Server

Enable the plugin in the Pentaho Server by completing the following steps:

1. Navigate to the `kettle.properties` file.

   **Note:** The `kettle.properties` file is usually in one of the following locations:

   * Windows: `C:\Users\example_user\.kettle`
   * Linux: `~/.kettle`

   If you run PDI in a container, the `kettle.properties` file is in the `/root/.kettle` directory.
2. Open the `kettle.properties` file in a text editor.
3. Enable the plugin with its configuration file by adding the following variables and values:

   `KETTLE_OPEN_LINEAGE_ACTIVE=true`

   `KETTLE_OPEN_LINEAGE_CONFIG_FILE=/<path-to-config-file>/openlineageConfig.yml`
4. Save the `kettle.properties` file.
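
If you manage `kettle.properties` with scripts rather than a text editor, the following bash sketch sets both variables and replaces any earlier values instead of adding duplicate lines. The function name `enable_openlineage` is hypothetical; it rewrites the file in place, so consider backing up `kettle.properties` first.

```bash
# enable_openlineage: hypothetical helper that sets the two OpenLineage
# variables in kettle.properties. Re-running it replaces earlier values
# instead of adding duplicates.
# Usage: enable_openlineage <path-to-kettle.properties> <path-to-openlineageConfig.yml>
enable_openlineage() {
  local props="$1" config="$2"
  touch "$props"
  # Keep every line except existing OpenLineage settings.
  # grep -v exits 1 when it outputs nothing, which is not an error here.
  grep -v -e '^KETTLE_OPEN_LINEAGE_ACTIVE=' -e '^KETTLE_OPEN_LINEAGE_CONFIG_FILE=' \
    "$props" > "$props.tmp" || true
  mv "$props.tmp" "$props"
  # Append the two settings described in the steps above.
  printf 'KETTLE_OPEN_LINEAGE_ACTIVE=true\nKETTLE_OPEN_LINEAGE_CONFIG_FILE=%s\n' \
    "$config" >> "$props"
}
```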

### Validate the plugin works

You can validate that the plugin is working by verifying that text related to OpenLineage appears in the appropriate logs and files.

To validate that the plugin is working, complete the following steps:

1. In the PDI client, click **File** > **Open**, and then navigate to the sample transformations in your Pentaho folder. For example, in Windows the samples are in `<path_to_Pentaho>\Pentaho\design-tools\data-integration\samples\transformations`.
2. Select the sample transformation, `TextInput and Output using variables.ktr`, and click **Open**.
3. To run the transformation, click **Action** > **Run**, and then in the **Run Options** window, click **Run**. The transformation runs and the **Execution Results** pane appears at the bottom of the PDI client.
4. Validate that consumers you have enabled are receiving OpenLineage events by taking one of the following actions:
   * If the `console` consumer is enabled, in the **Execution Results** pane of the PDI client, click the **Logging** tab and verify that the log contains lines with the text, "`OpenLineage-Plugin`".
   * If a `file` consumer is enabled, open the `openlineage.json` file in a text editor and verify that it contains lines with the text, "`OpenLineage-Plugin`". The location of the `openlineage.json` file is defined in the `openlineageConfig.yml` file.
   * If an `http` consumer is enabled, confirm that OpenLineage events are arriving at that consumer. For example, if PDC is a configured consumer, verify that the events arrive in PDC.
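
For the `file` consumer check, the following bash sketch reports how many lines in the output file contain `OpenLineage-Plugin`. The function name `check_lineage_file` is hypothetical; pass it the `path` value from your `openlineageConfig.yml` file.

```bash
# check_lineage_file: hypothetical helper for the file consumer check.
# Usage: check_lineage_file <path-from-openlineageConfig.yml>
check_lineage_file() {
  local file="$1" count
  if [ ! -f "$file" ]; then
    echo "not found: $file" >&2
    return 1
  fi
  # grep -c prints 0 and exits non-zero when nothing matches; keep the 0.
  count=$(grep -c 'OpenLineage-Plugin' "$file" || true)
  if [ "$count" -eq 0 ]; then
    echo "no OpenLineage-Plugin lines in $file" >&2
    return 1
  fi
  echo "$count matching lines"
}
```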

#### Troubleshoot plugin

If you are unable to validate that the plugin is working, perform the following troubleshooting actions:

* Verify that the events show the expected dataset lineage (input text file -> output text file) and column-level lineage mappings.
* Validate that the `kettle.properties` file contains the following variable and value: `KETTLE_OPEN_LINEAGE_ACTIVE=true`, and that `KETTLE_OPEN_LINEAGE_CONFIG_FILE` points to your `openlineageConfig.yml` file.
* Verify that the credentials specified in the `openlineageConfig.yml` file are correct.
* Check your network and firewall settings.
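
The `kettle.properties` check can be scripted. The following bash sketch confirms that `kettle.properties` contains `KETTLE_OPEN_LINEAGE_ACTIVE=true` and that `KETTLE_OPEN_LINEAGE_CONFIG_FILE` points to an existing file. The function name `check_openlineage_props` is hypothetical, and the sketch does not validate credentials or network access.

```bash
# check_openlineage_props: hypothetical helper for the kettle.properties checks.
# Usage: check_openlineage_props <path-to-kettle.properties>
check_openlineage_props() {
  local props="$1" cfg
  # Whole-line match; a trailing space or Windows line ending makes this check fail.
  if ! grep -qx 'KETTLE_OPEN_LINEAGE_ACTIVE=true' "$props"; then
    echo "KETTLE_OPEN_LINEAGE_ACTIVE=true not found" >&2
    return 1
  fi
  # If the key appears more than once, use the last entry.
  cfg=$(sed -n 's/^KETTLE_OPEN_LINEAGE_CONFIG_FILE=//p' "$props" | tail -n 1)
  if [ -z "$cfg" ] || [ ! -f "$cfg" ]; then
    echo "config file missing or not set: ${cfg:-<unset>}" >&2
    return 1
  fi
  echo "ok: $cfg"
}
```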

## Supported steps

Note: This list of supported steps is for version 0.5.0 of the plugin.

#### Steps that support dataset lineage and column-level lineage

* Abort
* Append Streams
* Block this step until steps finish
* Blocking Step
* Data Grid
* Delay Row
* Delete
* Dummy
* Filter Rows
* Generate Rows
* Get Variables
* Group By
* Java Filter
* Mail
* Merge Join
* Microsoft Excel Input

  Lineage is supported for local files, AWS, MinIO, HCP, and other S3-compatible connections.
* Microsoft Excel Output (deprecated)

  Lineage is supported for local files, AWS, MinIO, HCP, and other S3-compatible connections. \[1]
* Microsoft Excel Writer

  Lineage is supported for local files, AWS, MinIO, HCP, and other S3-compatible connections. \[1]
* Prioritize streams
* S3 CSV Input
* S3 File Output \[1]
* Send message to syslog
* Set Variables
* Sort Rows
* Switch/Case
* Table input

  Lineage is supported for the following connections, using the listed SQL functions and clauses:

  * Connection types: MySQL, PostgreSQL, Denodo, Sybase, Oracle, Vertica, SQL Server, Snowflake, Google BigQuery, Redshift, and Generic Connection \[2]
  * SQL functions: aliases, joins, subqueries, functions, aggregations, constants, expressions, cases, window functions, CTEs, and the set operators: unions, intersects, and excepts.
  * Clauses: GROUP BY, ORDER BY, WHERE, WITH, and HAVING.
* Table output

  Lineage is supported for the following connections: MySQL, PostgreSQL, Denodo, Sybase, Oracle, Vertica, SQL Server, Snowflake, Redshift, and Generic Connection. \[2]
* Text file input

  Lineage is supported for local files, AWS, MinIO, HCP, and other S3-compatible connections. The Fixed file type is not supported.
* Text file output

  Lineage is supported for local files, AWS, MinIO, HCP, and other S3-compatible file systems. \[1] The Fixed file type is not supported.
* Write to Log

#### Steps that support only dataset lineage, not column-level lineage

* Combination lookup/update

  Lineage is supported for the following connections: MySQL, PostgreSQL, Denodo, Sybase, Oracle, Vertica, SQL Server, Snowflake, Redshift, and Generic Connection. \[2]
* CSV File Input
* Database Lookup

  Lineage is supported for the following connections: MySQL, PostgreSQL, Denodo, Sybase, Oracle, Vertica, SQL Server, Snowflake, Redshift, and Generic Connection. \[2]
* De-serialize from file
* Dimension lookup/update

  Lineage is supported for the following connections: MySQL, PostgreSQL, Denodo, Sybase, Oracle, Vertica, SQL Server, Snowflake, Redshift, and Generic Connection. \[2]
* Fixed file input
* Gzip Csv Input
* Insert/Update

  Lineage is supported for the following connections: MySQL, PostgreSQL, Denodo, Sybase, Oracle, Vertica, SQL Server, Snowflake, Redshift, and Generic Connection. \[2]
* JSON Input
* JSON Output \[1]
* LDIF Input
* Load file content in memory
* Property Input
* Properties Output \[1]
* Sql File Output \[1]
* Synchronize after merge

  Lineage is supported for the following connections: MySQL, PostgreSQL, Denodo, Sybase, Oracle, Vertica, SQL Server, Snowflake, Redshift, and Generic Connection. \[2]
* Update

  Lineage is supported for the following connections: MySQL, PostgreSQL, Denodo, Sybase, Oracle, Vertica, SQL Server, Snowflake, Redshift, and Generic Connection. \[2]
* XBase Input

**Notes:**

\[1] This step can create multiple output files. To record the name of each file in lineage, enable the `Add filenames to result` option for the step. For example, if the option is enabled, the output is recorded in lineage as `<filename>_001.csv`, `<filename>_002.csv`, `<filename>_003.csv`, and so on. If the option is disabled, only a single, generic target is recorded, such as `<filename>.csv`.

\[2] This step accepts generic connections, but lineage works only with the generic connections that are listed as supported for the step.

{% hint style="info" %}
**Note:** The Google BigQuery connection is not supported in the Table output step. OpenLineage events do not include output datasets for Google BigQuery storage.
{% endhint %}

## Uninstall plugin

Uninstall the OpenLineage plugin from the PDI client and Pentaho Server by running commands appropriate for your operating system.

Before you begin, you must download the OpenLineage plugin from the Pentaho Support Portal, which contains script files for uninstalling the plugin. For details, see [Download the plugin](#download-the-plugin).

{% hint style="info" %}
**Note:** The plugin can be uninstalled from the PDI client, Pentaho Server, or both.
{% endhint %}

Commands for uninstalling the plugin include the following placeholders that must be replaced:

* \<path-to-data-integration>: Replace with full path to the PDI client.
* \<path-to-pentaho-server>: Replace with full path to the Pentaho Server.
* \<version\_check\_option>: Replace with one of the following options:
  * `none`: Uninstalls the plugin from any version of Pentaho. If the Pentaho version is unsupported, an error is shown.
  * `loose`: Default option. Uninstalls the plugin from certified Pentaho versions and from newer, compatible Pentaho versions.
  * `strict`: Uninstalls the plugin only from certified Pentaho versions.

To uninstall the OpenLineage plugin, complete the following steps:

1. Stop the PDI client and Pentaho Server.
2. Extract the [pdi-openlineage-plugin-\<plugin\_version>-\<build number>.zip](https://download.pentaho.com/PDI/Marketplace+Plugins+10.2/Open+Lineage#) file to a folder on the computer where the PDI client or Pentaho Server is installed.
3. In the [pdi-openlineage-plugin-\<plugin\_version>-\<build number>](https://download.pentaho.com/PDI/Marketplace+Plugins+10.2/Open+Lineage#) folder, open a command prompt as an administrator.
4. In the command prompt, run the following uninstallation commands for your operating system, replacing the placeholders for paths and version check options.
   * Windows
     * PDI client

       `uninstall.bat -t <path-to-data-integration> --platformVersionCheck <version_check_option>`
     * Pentaho Server

       `uninstall.bat -t <path-to-pentaho-server> --platformVersionCheck <version_check_option>`
   * Linux
     * PDI client

       `./uninstall.sh -t <path-to-data-integration> --platformVersionCheck <version_check_option>`
     * Pentaho Server

       `./uninstall.sh -t <path-to-pentaho-server> --platformVersionCheck <version_check_option>`
5. Start the PDI client and Pentaho Server.

## Upgrade plugin

{% hint style="info" %}
**Important**: Do not install a new version of the OpenLineage plugin over an existing installation of the plugin.
{% endhint %}

To upgrade the OpenLineage plugin, you must uninstall the plugin and then download and install the new version of the plugin. For details, see the following sections:

* [Uninstall the plugin](#uninstall-plugin)
* [Download the plugin](#download-the-plugin)
* [Install the plugin](#install-the-plugin)
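
The upgrade order can be sketched as a bash function that runs the uninstall script and, only if it succeeds, the install script. This is a minimal sketch under stated assumptions: the function name `upgrade_openlineage` and the `DRY_RUN` variable are hypothetical, `old_pkg` and `new_pkg` are assumed to be the extracted folders of the current and new plugin versions, and the PDI client and Pentaho Server must be stopped before you run it and started afterward.

```bash
# upgrade_openlineage: hypothetical helper that follows the documented order:
# uninstall the current plugin, then install the new version.
# Assumption: old_pkg contains uninstall.sh and new_pkg contains install.sh.
# Usage: upgrade_openlineage <old_pkg> <new_pkg> <target-dir> [none|loose|strict]
upgrade_openlineage() {
  local old_pkg="$1" new_pkg="$2" target="$3" check="${4:-loose}"
  # && stops at the first failure, so the new version is never installed
  # over a plugin that was not removed. Each cd runs in a subshell.
  (cd "$old_pkg" && ${DRY_RUN:+echo} ./uninstall.sh -t "$target" --platformVersionCheck "$check") &&
    (cd "$new_pkg" && ${DRY_RUN:+echo} ./install.sh -t "$target" --platformVersionCheck "$check")
}
```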


