OpenLineage Plugin
The Pentaho Data Integration (PDI) OpenLineage plugin enables PDI to emit rich, standardized OpenLineage events that Pentaho Data Catalog (PDC) can consume to capture how data moves and is transformed in PDI ETL pipelines. PDC uses the information it captures to provide visual, end-to-end transparency of data flows, which improves data observability, strengthens compliance and governance, aids in troubleshooting data issues, and enhances data trust and quality for business users.
When a supported transformation runs, PDI emits OpenLineage events that identify the input and output datasets it discovers and, when possible, the column-level lineage between them.
The OpenLineage plugin emits events for:
Start: transformation starts
Complete: transformation ends
Abort: transformation was stopped without errors
Fail: transformation ended with errors
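As a rough illustration of what these four events carry, the following Python sketch builds a minimal event as a plain dictionary. The field names (eventType, eventTime, run.runId, job, inputs, outputs) and the eventType values START, COMPLETE, ABORT, and FAIL come from the public OpenLineage specification. The outcome keys, job namespace and name, and dataset names are hypothetical, and the events the plugin actually emits also include fields (such as producer and schemaURL) and facets that are omitted here.

```python
import uuid
from datetime import datetime, timezone

# Hypothetical outcome keys mapped to the OpenLineage eventType values
# that correspond to the four events listed above.
EVENT_TYPES = {
    "start": "START",        # transformation starts
    "complete": "COMPLETE",  # transformation ends
    "abort": "ABORT",        # transformation was stopped without errors
    "fail": "FAIL",          # transformation ended with errors
}


def make_run_event(outcome, run_id, job_namespace, job_name, inputs=(), outputs=()):
    """Return a minimal OpenLineage-style run event as a dict (illustrative sketch).

    inputs and outputs are sequences of (namespace, name) pairs.
    """
    return {
        "eventType": EVENT_TYPES[outcome],
        "eventTime": datetime.now(timezone.utc).isoformat(),
        "run": {"runId": run_id},
        "job": {"namespace": job_namespace, "name": job_name},
        "inputs": [{"namespace": ns, "name": name} for ns, name in inputs],
        "outputs": [{"namespace": ns, "name": name} for ns, name in outputs],
    }


if __name__ == "__main__":
    # All events for one execution share the same runId.
    run_id = str(uuid.uuid4())
    start = make_run_event("start", run_id, "pdi", "example_transformation")
    done = make_run_event(
        "complete", run_id, "pdi", "example_transformation",
        inputs=[("file", "/data/input.txt")],
        outputs=[("file", "/data/output.txt")],
    )
    print(start["eventType"], done["eventType"], done["outputs"][0]["name"])
```

Because every event for one execution carries the same runId, an OpenLineage consumer can correlate a START event with the matching COMPLETE, ABORT, or FAIL event.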
Compatibility matrix
OpenLineage plugin functionality is certified to work as intended for the following versions of PDI:
10.2.0.1 (SP1)
10.2.0.2 (SP2)
10.2.0.3 (SP3)
10.2.0.4 (SP4)
10.2.0.5 (SP5)
10.2.0.6 (SP6)
11.0
Setting up the plugin
Before you begin, verify that you have a valid license for the OpenLineage plugin. For information about licenses, see Acquire and install enterprise licenses.
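To set up the OpenLineage plugin, complete the following tasks:
Download the plugin
Install the plugin
Generate an encrypted password (optional, for sending events to PDC)
Create a configuration file for the plugin
Enable the plugin
Validate the plugin works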
Download the plugin
Download the OpenLineage plugin from the Pentaho Support Portal.
On the Support Portal home page, sign in using the Pentaho support username and password provided in your Pentaho Welcome Packet.
In the Pentaho card, click Download. The Downloads page opens.
In the <version>.x list, click Pentaho <version> EE Marketplace Plugins Release.
Scroll to the bottom of the page.
In the Marketplace Plugins <version> section, click Open Lineage.
Download the pdi-openlineage-plugin-<plugin_version>-<build number>.zip file.
Install the plugin
Install the OpenLineage plugin in the PDI client and Pentaho Server by running commands appropriate for your operating system.
Installation commands include the following placeholders that must be replaced:
<path-to-data-integration>: Replace with full path to the PDI client.
<path-to-pentaho-server>: Replace with full path to the Pentaho Server.
<version_check_option>: Replace with one of the following options:
none: Installs the plugin on any version of Pentaho. If the Pentaho version is unsupported, an error is shown.
loose: Default option. Installs the plugin on certified Pentaho versions and on compatible, newer Pentaho versions.
strict: Installs the plugin only on certified Pentaho versions.
To install the OpenLineage plugin, complete the following steps:
Stop the PDI client and Pentaho Server.
Extract the pdi-openlineage-plugin-<plugin_version>-<build number>.zip file to a folder on the computer where the PDI client or Pentaho Server is installed.
In the pdi-openlineage-plugin-<plugin_version>-<build number> folder, open a command prompt (Windows) or terminal (Linux) as an administrator.
In the command prompt, run the following installation commands for your operating system, replacing the placeholders for paths and version check options.
Windows
PDI client
install.bat -t <path-to-data-integration> --platformVersionCheck <version_check_option>
Pentaho Server
install.bat -t <path-to-pentaho-server> --platformVersionCheck <version_check_option>
Linux
PDI client
./install.sh -t <path-to-data-integration> --platformVersionCheck <version_check_option>
Pentaho Server
./install.sh -t <path-to-pentaho-server> --platformVersionCheck <version_check_option>
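If you script the installation, for example across several machines, a small helper can pick the correct script for the operating system. The following Python sketch only assembles the commands shown above; the -t and --platformVersionCheck options and the three version check values come from this page, while the function names, the plugin_dir parameter, and the example paths are hypothetical. The same helper can assemble the commands described in Uninstall plugin.

```python
import os
import platform
import subprocess

VERSION_CHECK_OPTIONS = ("none", "loose", "strict")


def build_plugin_command(action, target_dir, plugin_dir, version_check="loose", system=None):
    """Assemble the install/uninstall command for the given OS (hypothetical helper).

    action: "install" or "uninstall"
    target_dir: full path to the PDI client or the Pentaho Server
    plugin_dir: folder where the plugin .zip file was extracted
    """
    if action not in ("install", "uninstall"):
        raise ValueError(f"unknown action: {action!r}")
    if version_check not in VERSION_CHECK_OPTIONS:
        raise ValueError(f"version_check must be one of {VERSION_CHECK_OPTIONS}")
    system = system or platform.system()
    # The plugin download provides .bat scripts for Windows and .sh scripts for Linux.
    script = f"{action}.bat" if system == "Windows" else f"{action}.sh"
    return [os.path.join(plugin_dir, script), "-t", target_dir,
            "--platformVersionCheck", version_check]


def run_plugin_command(cmd, plugin_dir):
    """Run the command from the extracted plugin folder; raises if it exits non-zero."""
    subprocess.run(cmd, cwd=plugin_dir, check=True)


if __name__ == "__main__":
    cmd = build_plugin_command(
        "install", "/opt/pentaho/data-integration", "/tmp/pdi-openlineage-plugin",
        version_check="strict",
    )
    print(" ".join(cmd))  # only prints the command; call run_plugin_command to execute it
```

As with the manual steps, stop the PDI client and Pentaho Server first and run the resulting command with administrator privileges.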
Start the PDI client and Pentaho Server.
Generate an encrypted password
If you plan to emit events to PDC and want to avoid storing your password in plain text, you can generate an encrypted password to authenticate to PDC. You then use the encrypted password in the configuration file for the OpenLineage plugin.
On the computer where the PDI client or Pentaho Server is installed, open a command prompt.
Run one of the following commands for your operating system:
Windows
To generate a password using the default Pentaho encryption seed, run the following command:
To generate a password using your own custom encryption seed, run the following command:
Linux
To generate a password using the default Pentaho encryption seed, run the following command:
To generate a password using your own custom encryption seed, run the following command:
An encrypted password is generated and displayed in the command prompt, like the following example:
Create a configuration file for the plugin
After you install the plugin, create a configuration file that specifies where to send OpenLineage events. You can create a simple configuration file for testing or a custom configuration file to use in production.
In a text editor, create a configuration file with content from one of the following examples, based on your needs:
To create a simple configuration file that you can use to quickly validate that the plugin is working, include only the following content:
To create a custom configuration file that includes OpenLineage event consumers in your Pentaho deployment, such as a PDC Server, include the following content:
Save the file as openlineageConfig.yml in the PDI directory that contains your user-specific configuration files.
Notes:
By default, user-specific configuration files are stored in the .kettle directory, which is usually in one of the following locations:
Windows: C:\Users\example_user\.kettle
Linux: ~/.kettle
However, if you run PDI in a container, configuration files might resolve to the /root/.kettle directory.
You can add multiple HTTP consumers in the configuration file.
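Before pointing an HTTP consumer at PDC, you might want to confirm that the plugin sends events at all. The following Python sketch starts a throwaway local receiver that accepts HTTP POST requests on any path and keeps each body in memory. It is not part of the plugin or PDC. It assumes that the plugin sends each event as a JSON body with a Content-Length header, which you should verify for your setup; the port and the function and class names are hypothetical. The keys for pointing a test HTTP consumer at this address depend on the plugin's configuration format and are not shown here.

```python
import json
import threading
from http.server import BaseHTTPRequestHandler, HTTPServer

received = []  # request bodies in arrival order (parsed JSON when possible)


class LineageReceiver(BaseHTTPRequestHandler):
    """Test-only endpoint: records every POSTed body and answers 200 OK."""

    def do_POST(self):
        # Assumes the sender sets Content-Length (no chunked transfer encoding).
        length = int(self.headers.get("Content-Length", 0))
        body = self.rfile.read(length)
        try:
            event = json.loads(body)
        except ValueError:  # invalid JSON or invalid UTF-8
            event = body.decode("utf-8", errors="replace")
        received.append(event)
        self.send_response(200)
        self.end_headers()

    def log_message(self, format, *args):
        pass  # silence the default per-request access log


def start_receiver(host="127.0.0.1", port=0):
    """Serve on a background thread; port 0 lets the OS choose a free port."""
    server = HTTPServer((host, port), LineageReceiver)
    threading.Thread(target=server.serve_forever, daemon=True).start()
    return server
```

For example, call start_receiver(port=5000), point a test HTTP consumer at http://127.0.0.1:5000, run a transformation, and then inspect the received list. If PDI runs in a container, bind to host="0.0.0.0" and use an address that the container can reach.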
Enable the plugin
After you install the OpenLineage plugin and create its configuration file, you must enable the plugin so that it can send OpenLineage events to the consumers you specified in the configuration file.
Enable in PDI client
Enable the plugin in the PDI client by completing the following steps:
Log in to the PDI client and click Edit > Edit the kettle.properties file. The Kettle properties window opens.
To make the plugin active, add the following variable and value:
KETTLE_OPEN_LINEAGE_ACTIVE=true
To point PDI to your openlineageConfig.yml file, add the following variable, replacing the <path-to-config-file> placeholder with the full path to your configuration file directory:
KETTLE_OPEN_LINEAGE_CONFIG_FILE=/<path-to-config-file>/openlineageConfig.yml
Click OK. The kettle.properties file is saved and the OpenLineage plugin is enabled.
Enable in Pentaho Server
Enable the plugin in the Pentaho Server by completing the following steps:
Navigate to the kettle.properties file.
Note: The kettle.properties file is usually in one of the following locations:
Windows: C:\Users\example_user\.kettle
Linux: ~/.kettle
If you run PDI in a container, the kettle.properties file is in the /root/.kettle directory.
Open the kettle.properties file in a text editor.
Enable the plugin with its configuration file by adding the following variables and values:
KETTLE_OPEN_LINEAGE_ACTIVE=true
KETTLE_OPEN_LINEAGE_CONFIG_FILE=/<path-to-config-file>/openlineageConfig.yml
Save the kettle.properties file.
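On a server, or when you build a container image, you might prefer to script this edit. The following Python sketch sets both variables in kettle.properties, replacing any existing values and leaving all other lines unchanged. It is a simplification that assumes plain key=value lines (no line continuations or ':' separators), and the function names are hypothetical. Because kettle.properties is a Java properties file, in which a backslash is an escape character, the sketch writes the configuration path with forward slashes, like the example values on this page.

```python
from pathlib import Path

# Read and write as Latin-1, the default encoding of Java properties files;
# this also round-trips any other bytes in the file unchanged.
ENCODING = "latin-1"


def _property_key(line):
    """Return the key of a key=value line, or None for blanks and comments."""
    stripped = line.strip()
    if not stripped or stripped[0] in "#!" or "=" not in stripped:
        return None
    return stripped.split("=", 1)[0].strip()


def set_kettle_properties(props_file, updates):
    """Set each key in updates, replacing existing entries and keeping all other lines."""
    path = Path(props_file)
    lines = path.read_text(encoding=ENCODING).splitlines() if path.exists() else []
    out, written = [], set()
    for line in lines:
        key = _property_key(line)
        if key in updates:
            if key not in written:
                out.append(f"{key}={updates[key]}")
                written.add(key)
            continue  # drop later duplicates so they cannot override the new value
        out.append(line)
    out.extend(f"{key}={value}" for key, value in updates.items() if key not in written)
    path.parent.mkdir(parents=True, exist_ok=True)
    path.write_text("\n".join(out) + "\n", encoding=ENCODING)


def enable_openlineage(props_file, config_file):
    """Add the two OpenLineage variables described above to kettle.properties."""
    config_path = str(config_file).replace("\\", "/")  # avoid backslash escapes
    set_kettle_properties(props_file, {
        "KETTLE_OPEN_LINEAGE_ACTIVE": "true",
        "KETTLE_OPEN_LINEAGE_CONFIG_FILE": config_path,
    })
```

Running enable_openlineage twice leaves a single copy of each variable, so the sketch is safe to include in repeatable provisioning scripts.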
Validate the plugin works
You can validate that the plugin is working by verifying that text related to OpenLineage appears in the appropriate logs and files.
To validate that the plugin is working, complete the following steps:
In the PDI client, click File > Open, and then navigate to the sample transformations in your Pentaho folder. For example, on Windows the samples are in <path_to_Pentaho>\Pentaho\design-tools\data-integration\samples\transformations.
Select the sample transformation, TextInput and Output using variables.ktr, and click Open.
To run the transformation, click Action > Run, and then in the Run Options window, click Run. The transformation runs and the Execution Results pane appears at the bottom of the PDI client.
Validate that the consumers you have enabled are receiving OpenLineage events by taking one of the following actions:
If the console consumer is enabled, in the Execution Results pane of the PDI client, click the Logging tab and verify that the log contains lines with the text "OpenLineage-Plugin".
If a file consumer is enabled, open the openlineage.json file in a text editor and verify that it contains lines with the text "OpenLineage-Plugin". The location of the openlineage.json file is defined in the openlineageConfig.yml file.
If an HTTP consumer is enabled, confirm that OpenLineage events are arriving at that consumer. For example, if PDC is a configured consumer, verify that the events arrive in PDC.
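If a file consumer is enabled, you can also check openlineage.json from a script instead of a text editor. The following Python sketch counts the lines that contain "OpenLineage-Plugin" and, assuming that each such line holds a single JSON event (an assumption about the file consumer's output format that this page does not specify), tallies their eventType values. Lines that are not valid JSON still count as matches. The function name is hypothetical.

```python
import json
from collections import Counter
from pathlib import Path

MARKER = "OpenLineage-Plugin"


def summarize_lineage_file(path, marker=MARKER):
    """Return (number of lines containing marker, Counter of eventType values)."""
    matches = 0
    event_types = Counter()
    text = Path(path).read_text(encoding="utf-8", errors="replace")
    for line in text.splitlines():
        if marker not in line:
            continue
        matches += 1
        try:
            event = json.loads(line)
        except ValueError:
            continue  # not a one-event-per-line JSON record; count the match only
        if isinstance(event, dict) and "eventType" in event:
            event_types[event["eventType"]] += 1
    return matches, event_types
```

After you run the sample transformation, a nonzero match count indicates that the file consumer is writing events, and a START count paired with an equal COMPLETE count suggests that the runs finished normally.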
Troubleshoot plugin
If you are unable to validate that the plugin is working, perform the following troubleshooting actions:
Verify the dataset lineage (input text file -> output text file) and the column-level lineage mappings recorded for the sample transformation.
Validate that the kettle.properties file contains the variable and value KETTLE_OPEN_LINEAGE_ACTIVE=true, and that KETTLE_OPEN_LINEAGE_CONFIG_FILE points to your openlineageConfig.yml file.
Verify that the credentials specified in the openlineageConfig.yml file are correct.
Check your network and firewall settings.
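To automate the kettle.properties part of these checks, the following Python sketch reports two kinds of problems: the activation variable missing or not set to the documented value true, and the configuration file variable missing or pointing to a file that does not exist. The function names are hypothetical, and the parser handles only plain key=value lines. It does not check credentials, network, or firewall settings.

```python
from pathlib import Path


def read_simple_properties(path):
    """Parse key=value lines, skipping blanks and # or ! comments (later keys win)."""
    props = {}
    for line in Path(path).read_text(encoding="latin-1").splitlines():
        stripped = line.strip()
        if not stripped or stripped[0] in "#!" or "=" not in stripped:
            continue
        key, value = stripped.split("=", 1)
        props[key.strip()] = value.strip()
    return props


def diagnose_openlineage(props_file):
    """Return a list of human-readable problems; an empty list means none were found."""
    problems = []
    props = read_simple_properties(props_file)
    # Compare against the exact value documented on this page.
    if props.get("KETTLE_OPEN_LINEAGE_ACTIVE") != "true":
        problems.append("KETTLE_OPEN_LINEAGE_ACTIVE is missing or not set to true")
    config = props.get("KETTLE_OPEN_LINEAGE_CONFIG_FILE")
    if not config:
        problems.append("KETTLE_OPEN_LINEAGE_CONFIG_FILE is missing")
    # Variables such as ${...} in the path are not expanded by this sketch.
    elif not Path(config).is_file():
        problems.append(f"configuration file not found: {config}")
    return problems
```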
Supported steps
Note: This list of supported steps is for version 0.5.0 of the plugin.
Steps that support dataset lineage and column-level lineage
Abort
Append Streams
Block this step until steps finish
Blocking Step
Data Grid
Delay Row
Delete
Dummy
Filter Rows
Generate Rows
Get Variables
Group By
Java Filter
Mail
Merge Join
Microsoft Excel Input
Lineage is supported for local files, AWS, MinIO, HCP, and other S3-compatible connections.
Microsoft Excel Output (deprecated)
Lineage is supported for local files, AWS, MinIO, HCP, and other S3-compatible connections. [1]
Microsoft Excel Writer
Lineage is supported for local files, AWS, MinIO, HCP, and other S3-compatible connections. [1]
Prioritize streams
S3 CSV Input
S3 File Output [1]
Send message to syslog
Set Variables
Sort Rows
Switch/Case
Table input
Lineage is supported for the following connections, using the listed SQL constructs and clauses:
Connection types: MySQL, PostgreSQL, Denodo, Sybase, Oracle, Vertica, SQL Server, Snowflake, Google BigQuery, Redshift, and Generic Connection [2]
SQL constructs: aliases, joins, subqueries, functions, aggregations, constants, expressions, CASE expressions, window functions, CTEs, and the set operators UNION, INTERSECT, and EXCEPT.
Clauses: GROUP BY, ORDER BY, WHERE, WITH, and HAVING.
Table output
Lineage is supported for the following connections: MySQL, PostgreSQL, Denodo, Sybase, Oracle, Vertica, SQL Server, Snowflake, Redshift, and Generic Connection. [2]
Text file input
Lineage is supported for local files, AWS, MinIO, HCP, and other S3-compatible connections. The Fixed file type is not supported.
Text file output
Lineage is supported for local files, AWS, MinIO, HCP, and other S3-compatible file systems. [1] The Fixed file type is not supported.
Write to Log
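For the steps listed above, column-level lineage links each output column to the input columns it was derived from. In the public OpenLineage specification, this mapping travels in a columnLineage facet on each output dataset, under fields.<output column>.inputFields. The following Python sketch flattens that facet into readable rows, which can help when you verify column mappings during troubleshooting. The function name, the example event, and its dataset and column names are hypothetical, modeled on a Table input query such as SELECT o.id, UPPER(c.name) AS customer FROM orders o JOIN customers c ON o.customer_id = c.id whose result is written to a text file; the namespaces and names that the plugin actually emits may differ.

```python
def column_lineage_rows(event):
    """Yield (output dataset, output column, input dataset, input column) tuples
    from the columnLineage facets of an OpenLineage run event dict."""
    for output in event.get("outputs", []):
        facet = output.get("facets", {}).get("columnLineage", {})
        for out_col, mapping in facet.get("fields", {}).items():
            for src in mapping.get("inputFields", []):
                yield (output["name"], out_col, src["name"], src["field"])


# Hypothetical event fragment for the query described above.
EXAMPLE_EVENT = {
    "eventType": "COMPLETE",
    "outputs": [{
        "namespace": "file",
        "name": "/data/customers_out.txt",
        "facets": {"columnLineage": {"fields": {
            "id": {"inputFields": [
                {"namespace": "postgres://db:5432", "name": "sales.public.orders", "field": "id"}]},
            "customer": {"inputFields": [
                {"namespace": "postgres://db:5432", "name": "sales.public.customers", "field": "name"}]},
        }}},
    }],
}

if __name__ == "__main__":
    for row in column_lineage_rows(EXAMPLE_EVENT):
        print(" | ".join(row))
```

Here the output column customer traces back to customers.name even though UPPER() changes its values, which is the kind of mapping column-level lineage is meant to preserve.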
Steps that support only dataset lineage, not column-level lineage:
Combination lookup/update
Lineage is supported for the following connections: MySQL, PostgreSQL, Denodo, Sybase, Oracle, Vertica, SQL Server, Snowflake, Redshift, and Generic Connection. [2]
CSV File Input
Database Lookup
Lineage is supported for the following connections: MySQL, PostgreSQL, Denodo, Sybase, Oracle, Vertica, SQL Server, Snowflake, Redshift, and Generic Connection. [2]
De-serialize from file
Dimension lookup/update
Lineage is supported for the following connections: MySQL, PostgreSQL, Denodo, Sybase, Oracle, Vertica, SQL Server, Snowflake, Redshift, and Generic Connection. [2]
Fixed file input
Gzip Csv Input
Insert/Update
Lineage is supported for the following connections: MySQL, PostgreSQL, Denodo, Sybase, Oracle, Vertica, SQL Server, Snowflake, Redshift, and Generic Connection. [2]
JSON Input
JSON Output [1]
LDIF Input
Load file content in memory
Property Input
Properties Output [1]
Sql File Output [1]
Synchronize after merge
Lineage is supported for the following connections: MySQL, PostgreSQL, Denodo, Sybase, Oracle, Vertica, SQL Server, Snowflake, Redshift, and Generic Connection. [2]
Update
Lineage is supported for the following connections: MySQL, PostgreSQL, Denodo, Sybase, Oracle, Vertica, SQL Server, Snowflake, Redshift, and Generic Connection. [2]
XBase Input
Notes:
[1] This step can create multiple output files. To record the name of each file in lineage, enable the step's Add filenames to result option. If the option is disabled, only a single, generic target is recorded in lineage. For example, with the option enabled, the output is recorded in lineage as <filename>_001.csv, <filename>_002.csv, <filename>_003.csv, and so on; with the option disabled, the output is recorded only as <filename>.csv.
[2] The step allows generic connections, but lineage is captured only for generic connections to the database types listed as supported.
Uninstall plugin
Uninstall the OpenLineage plugin from the PDI client and Pentaho Server by running commands appropriate for your operating system.
Before you begin, download the OpenLineage plugin from the Pentaho Support Portal; the download contains the script files for uninstalling the plugin. For details, see Download the plugin.
Commands for uninstalling the plugin include the following placeholders that must be replaced:
<path-to-data-integration>: Replace with full path to the PDI client.
<path-to-pentaho-server>: Replace with full path to the Pentaho Server.
<version_check_option>: Replace with one of the following options:
none: Uninstalls the plugin from any version of Pentaho. If the Pentaho version is unsupported, an error is shown.
loose: Default option. Uninstalls the plugin from certified Pentaho versions and from compatible, newer Pentaho versions.
strict: Uninstalls the plugin only from certified Pentaho versions.
To uninstall the OpenLineage plugin, complete the following steps:
Stop the PDI client and Pentaho Server.
Extract the pdi-openlineage-plugin-<plugin_version>-<build number>.zip file to a folder on the computer where the PDI client or Pentaho Server is installed.
In the pdi-openlineage-plugin-<plugin_version>-<build number> folder, open a command prompt (Windows) or terminal (Linux) as an administrator.
In the command prompt, run the following uninstallation commands for your operating system, replacing the placeholders for paths and version check options.
Windows
PDI client
uninstall.bat -t <path-to-data-integration> --platformVersionCheck <version_check_option>
Pentaho Server
uninstall.bat -t <path-to-pentaho-server> --platformVersionCheck <version_check_option>
Linux
PDI client
./uninstall.sh -t <path-to-data-integration> --platformVersionCheck <version_check_option>
Pentaho Server
./uninstall.sh -t <path-to-pentaho-server> --platformVersionCheck <version_check_option>
Start the PDI client and Pentaho Server.
Upgrade plugin
To upgrade the OpenLineage plugin, uninstall the plugin, and then download and install the new version of the plugin. For details, see the following sections: Uninstall plugin, Download the plugin, and Install the plugin.