ETL metadata injection

The ETL metadata injection step inserts data from multiple sources into another transformation at runtime. This insertion reduces the need to repeat the same tasks each time a different input source is used.

In PDI, you can create a transformation to use as a template for your repetitive tasks. This transformation is known as the template transformation. The template transformation is a child transformation that is reused by the ETL metadata injection step, using metadata created from various input sources.

You create another transformation to prepare the values that you want to inject as metadata. That transformation passes the values to the ETL metadata injection step, which injects the metadata into the template transformation, as shown in the following diagram:

ETL Metadata Injection Process

For example, you might have a simple transformation to load transaction values from a supplier, filter specific values, and output them to a file. If you have more than one supplier, you would need to run this simple transformation for each supplier. With metadata injection, you can keep the repetitive transformation as a template and inject supplier-specific metadata at runtime. See Example.

The following procedure is recommended for using this step:

  1. Optimize your data for injection, such as preparing folder structures and inputs.

  2. Develop transformations for the following tasks:

    • The repetitive process (the template transformation)

    • Metadata injection through the ETL metadata injection step

    • Handling multiple inputs (as needed)

The metadata is injected into the template transformation through steps that support metadata injection. See Steps supporting metadata injection.

Step name and template transformation

Field
Description

Step name

Specify the unique name of the ETL metadata injection step on the canvas. You can customize the name or leave it as the default.

Transformation

Path to the transformation you want to use as a template for your repetitive tasks.

Select Browse to navigate to your template transformation in the VFS browser.

If you select a transformation that shares the same root path as the current transformation, PDI automatically inserts ${Internal.Entry.Current.Directory} in place of the common root path.

Example:

  • Current transformation: /home/admin/transformation.ktr

  • Selected transformation: /home/admin/path/sub.ktr

  • Converted path: ${Internal.Entry.Current.Directory}/path/sub.ktr

If you are working with a repository, specify the name of the template transformation in your repository. If you are not working with a repository, specify the XML file name of the template transformation on your system.

Template transformations previously specified by reference are automatically converted to be specified by name within the Pentaho Repository. |

Options

The ETL metadata injection step includes two tabs: Inject metadata and Options.

Inject metadata tab

ETL Metadata Injection Step Inject Metadata Tab

The columns in the table on this tab specify which fields in the template transformation are injected with metadata.

Column
Description

Target injection step key

Available fields in each step of the template transformation that can be injected with metadata.

Target description

How the target fields relate to their target steps.

Required

Source fields that are required when you use the Delete or Insert/Update steps with ETL metadata injection.

Source step

Step associated with the fields to inject into the target fields as metadata.

Source field

Fields to inject into the target fields as metadata.

Specify the source field

  1. In the Target injection step key column, double-click the field for which you want to specify a source field.

    The Source field dialog box opens.

    PDI ETL Metadata Source field dialog box
  2. Select a source field and then select OK.

  3. (Optional) In Filter, enter a term and then select the search icon to search for fields with that name.

    Select the Regex icon to search using Java regular expressions.

  4. (Optional) Select Use constant value to inject a constant value. You can:

    • Enter a value.

    • Use an internal variable, such as ${Internal.Step.Unique.Count}.

    • Use a combination of values and variables, such as ${FILE_PREFIX}_${FILE_DATE}.txt.

Injecting metadata into the ETL metadata injection step

For injecting metadata into the ETL metadata injection step itself, the following exceptions apply:

  • To inject a method for how to specify a transformation (such as FILENAME, REPOSITORY_BY_NAME, or REPOSITORY_BY_REFERENCE), set a TRANS_SPECIFICATION_METHOD constant to the field of an input step. You can then map the field as a source to the TRANS_SPECIFICATION_METHOD constant in the ETL metadata injection step.

  • The target field for the ETL metadata injection step inserting metadata into the original injection is defined by [GROUP NAME].[FIELD NAME]. For example, if the group name is OUTPUT_FIELDS and the field name is OUTPUT_FIELDNAME, set the target field to OUTPUT_FIELDS.OUTPUT_FIELDNAME.

Options tab

ETL Metadata Injection Step Options Tab

Enter the following optional settings:

Option
Description

Step to read from (optional)

(Optional) Step in your template transformation to pass data directly to a step following the ETL metadata injection step in your current transformation.

Field name

If Step to read from (optional) is specified, name of the field passed directly from the step in the template transformation.

Type

If Step to read from (optional) is specified, data type of the field passed directly from the step in the template transformation.

Length

If Step to read from (optional) is specified, length of the field passed directly from the step in the template transformation.

Precision

If Step to read from (optional) is specified, precision of the field passed directly from the step in the template transformation.

Optional target file (KTR after injection)

For initial transformation development or debugging, an optional file where PDI saves a copy of the template transformation after metadata injection occurs. The resulting transformation is your template transformation with the metadata already injected as constant values.

Streaming source step

Source step in your current transformation to directly pass data to the Streaming target step in the template transformation.

Streaming target step

Target step in your template transformation to receive data directly from the Streaming source step.

Run resulting transformation

Select to inject metadata and run the template transformation. If this option is cleared, metadata injection occurs, but the template transformation does not run.

Example

In this example, you have a template transformation to load transaction values from a supplier spreadsheet, filter specific values, and output them to a text file. The template transformation is injected with metadata values stored in Microsoft spreadsheets.

The example is in the pentaho/design-tools/data-integration/samples/transformations/metadata-injection-example folder of your PDI distribution.

Metadata injection example folder strucutre

Microsoft spreadsheets containing input data are stored in the metadata-injection-example/data/in folder. Metadata values are stored in spreadsheets within the metadata-injection-example/metadata folder. The template and the transformation for injecting the metadata are in the metadata-injection-example/transformations folder.

circle-info

This example assumes a basic understanding of working with transformations and steps.

Input data

Data files are frequently uploaded from multiple sources. This example models a situation where two suppliers have uploaded spreadsheets into the metadata-injection-example/data/in folder.

When you use metadata injection, you typically focus on a subset of data values common to all your input files. In this example, metadata is used for:

  • Transaction date

  • Transaction invoice number

  • Net value of the transaction

  • Currency used in the transaction

The metadata for these values and the output target text file are created and maintained in the metadata-injection-example/metadata folder.

Transformations

Metadata injection involves a main repetitive process. For this example:

  • 03_process_supplier_file is the template transformation.

  • 02_process_supplier contains the ETL metadata injection step and injects metadata into the template transformation.

  • 01_process_all_suppliers calls 02_process_supplier for each supplier file.

This example contains three transformations:

  • Transformation for all input sources (01_process_all_suppliers)

    Iterates through all supplier spreadsheets, calls the metadata injection transformation for each supplier, and logs the entire process. Each input source is specified through a variable in a Transformation Executor step, which calls the 02_process_supplier transformation.

  • Metadata injection transformation (02_process_supplier)

    Defines the metadata structure and how it is injected into the template transformation. In this example, the metadata values are stored in separate spreadsheet files. This transformation extracts the values, prepares them for injection, and injects them through the ETL metadata injection step.

  • Template transformation (03_process_supplier_file)

    Processes each supplier spreadsheet. Each step is configured to accept injected metadata instead of hardcoded values for a single source. For example, a supplier field might be set from a variable, depending on which supplier is being processed.

Results

You run the entire process by running 01_process_all_suppliers, which calls 02_process_supplier for each supplier input file. The 02_process_supplier transformation then calls the 03_process_supplier_file template transformation through the ETL metadata injection step.

To run this example:

  1. In PDI, open 01_process_all_suppliers.ktr from the metadata-injection-example/transformation folder.

  2. Run the 01_process_all_suppliers transformation.

  3. Examine:

    • processed_data_{today’s date}.txt in metadata-injection-example/data/out

    • log_{timestamp}.txt in metadata-injection-example/logging

The output text file contains values from the input files for the following common values:

  • Transaction date

  • Transaction invoice number

  • Net value of the transaction

  • Currency used in the transaction

Below are links to articles and videos about using the ETL metadata injection step in PDI.

Articles

Video

Steps supporting metadata injection

The ETL metadata injection step injects metadata into your template transformation through a step that supports metadata injection.

chevron-rightList of steps that support metadata injectionhashtag

The following table specifies which steps contain fields that support metadata injection:

Step
Version introduced
Fields supporting metadata injection

Add XML

7.0

All fields

8.2

All fields

7.0

All fields

8.2

All fields

8.2

All fields

Append Streams

7.0

All fields

7.0

All fields

8.0

All fields

Combination Lookup / Update

7.0

All fields

Concat Fields

5.1

All fields

9.0

All fields

7.0

All fields

4.1

See CSV File Input for a list of supported fields

Data Grid

5.1

All fields

Data Validator

7.0

All fields

9.0

All fields

Dimension Lookup-Update

7.1

All fields

ETL metadata injection

7.0

8.2

All fields

Filter Rows

7.0

All fields

Fixed File Input

4.1

See Fixed File Input for a list of supported fields

Get Data from XML

5.0

See Get Data from XML for a list of supported fields

8.2

All fields

Get Table Names

7.0

All fields

Get Variable

7.0

All fields

Google Analytics

6.1

All fields

Greenplum Load

7.0

All fields

5.0

All fields

GZIP CSV Input

5.1

All fields

7.0

All fields

7.0

All fields

7.0

All fields

7.0

All fields

7.0

All fields

If Field Value is Null

7.0

All fields

Insert/Update

7.0

All fields

8.2

All fields

8.2

All fields

Join Rows (Cartesian product)

7.0

All fields

7.0

All fields

JSON Output

5.2

All fields

8.0

All fields

8.0

All fields

8.3

All fields

8.3

All fields

7.0

All fields

7.0

All fields

7.0

All fields

Merge Join

7.0

All fields

7.0

All fields

7.0

All fields

7.0

All fields

8.1

All fields

8.1

All fields

Multiway Merge Join

7.0

All fields

MySQL Bulk Loader

7.0

All fields

Null If

7.0

All fields

Oracle Bulk Loader

7.0

All fields

8.1

All fields

8.1

All fields

8.0

All fields

8.0

All fields

PostgreSQL Bulk Loader

5.1

All fields

Replace in String

7.0

All fields

4.2

All fields (as of 7.0)

4.2

See Row Normaliser for a list of supported fields

6.1

All fields

6.1

All fields

8.3

All fields

4.1

All fields

8.2

All fields

7.0

All fields

5.0

All fields

Sorted Merge

7.0

All fields

5.0

All fields (as of 6.1)

7.0

All fields

7.0

All fields

Stream Lookup

6.1

All fields

9.0

All fields

8.3

All fields

Switch / Case

7.0

All fields

Synchronize After Merge

7.0

All fields

5.2

See Table Input for a list of supported fields

5.1

See Table Output for a list of supported fields

5.0

All fields

5.2

All fields

Update

7.0

All fields

User Defined Java Expression

5.2

All fields

Value Mapper

6.1

All fields

Vertica Bulk Loader

7.0

All fields

XML Join

7.0

All fields

6.1

All fields

Last updated

Was this helpful?