ETL metadata injection
The ETL metadata injection step inserts data from multiple sources into another transformation at runtime. This insertion reduces the need to repeat the same tasks each time a different input source is used.
In PDI, you can create a transformation to use as a template for your repetitive tasks. This transformation is known as the template transformation. The template transformation is a child transformation that is reused by the ETL metadata injection step, using metadata created from various input sources.
You create another transformation to prepare the values that you want to inject as metadata. That transformation passes the values to the ETL metadata injection step, which injects the metadata into the template transformation, as shown in the following diagram:

For example, you might have a simple transformation to load transaction values from a supplier, filter specific values, and output them to a file. If you have more than one supplier, you would need to run this simple transformation for each supplier. With metadata injection, you can keep the repetitive transformation as a template and inject supplier-specific metadata at runtime. See Example.
The following procedure is recommended for using this step:
Optimize your data for injection, such as preparing folder structures and inputs.
Develop transformations for the following tasks:
The repetitive process (the template transformation)
Metadata injection through the ETL metadata injection step
Handling multiple inputs (as needed)
The metadata is injected into the template transformation through steps that support metadata injection. See Steps supporting metadata injection.
Step name and template transformation
Step name
Specify the unique name of the ETL metadata injection step on the canvas. You can customize the name or leave it as the default.
Transformation
Path to the transformation you want to use as a template for your repetitive tasks.
Select Browse to navigate to your template transformation in the VFS browser.
If you select a transformation that shares the same root path as the current transformation, PDI automatically inserts ${Internal.Entry.Current.Directory} in place of the common root path.
Example:
Current transformation:
/home/admin/transformation.ktrSelected transformation:
/home/admin/path/sub.ktrConverted path:
${Internal.Entry.Current.Directory}/path/sub.ktr
If you are working with a repository, specify the name of the template transformation in your repository. If you are not working with a repository, specify the XML file name of the template transformation on your system.
Template transformations previously specified by reference are automatically converted to be specified by name within the Pentaho Repository. |
Options
The ETL metadata injection step includes two tabs: Inject metadata and Options.
Inject metadata tab

The columns in the table on this tab specify which fields in the template transformation are injected with metadata.
Target injection step key
Available fields in each step of the template transformation that can be injected with metadata.
Target description
How the target fields relate to their target steps.
Required
Source fields that are required when you use the Delete or Insert/Update steps with ETL metadata injection.
Source step
Step associated with the fields to inject into the target fields as metadata.
Source field
Fields to inject into the target fields as metadata.
Specify the source field
In the Target injection step key column, double-click the field for which you want to specify a source field.
The Source field dialog box opens.

PDI ETL Metadata Source field dialog box Select a source field and then select OK.
(Optional) In Filter, enter a term and then select the search icon to search for fields with that name.
Select the Regex icon to search using Java regular expressions.
(Optional) Select Use constant value to inject a constant value. You can:
Enter a value.
Use an internal variable, such as
${Internal.Step.Unique.Count}.Use a combination of values and variables, such as
${FILE_PREFIX}_${FILE_DATE}.txt.
Injecting metadata into the ETL metadata injection step
For injecting metadata into the ETL metadata injection step itself, the following exceptions apply:
To inject a method for how to specify a transformation (such as
FILENAME,REPOSITORY_BY_NAME, orREPOSITORY_BY_REFERENCE), set aTRANS_SPECIFICATION_METHODconstant to the field of an input step. You can then map the field as a source to theTRANS_SPECIFICATION_METHODconstant in the ETL metadata injection step.The target field for the ETL metadata injection step inserting metadata into the original injection is defined by
[GROUP NAME].[FIELD NAME]. For example, if the group name isOUTPUT_FIELDSand the field name isOUTPUT_FIELDNAME, set the target field toOUTPUT_FIELDS.OUTPUT_FIELDNAME.
Options tab

Enter the following optional settings:
Step to read from (optional)
(Optional) Step in your template transformation to pass data directly to a step following the ETL metadata injection step in your current transformation.
Field name
If Step to read from (optional) is specified, name of the field passed directly from the step in the template transformation.
Type
If Step to read from (optional) is specified, data type of the field passed directly from the step in the template transformation.
Length
If Step to read from (optional) is specified, length of the field passed directly from the step in the template transformation.
Precision
If Step to read from (optional) is specified, precision of the field passed directly from the step in the template transformation.
Optional target file (KTR after injection)
For initial transformation development or debugging, an optional file where PDI saves a copy of the template transformation after metadata injection occurs. The resulting transformation is your template transformation with the metadata already injected as constant values.
Streaming source step
Source step in your current transformation to directly pass data to the Streaming target step in the template transformation.
Streaming target step
Target step in your template transformation to receive data directly from the Streaming source step.
Run resulting transformation
Select to inject metadata and run the template transformation. If this option is cleared, metadata injection occurs, but the template transformation does not run.
Example
In this example, you have a template transformation to load transaction values from a supplier spreadsheet, filter specific values, and output them to a text file. The template transformation is injected with metadata values stored in Microsoft spreadsheets.
The example is in the pentaho/design-tools/data-integration/samples/transformations/metadata-injection-example folder of your PDI distribution.

Microsoft spreadsheets containing input data are stored in the metadata-injection-example/data/in folder. Metadata values are stored in spreadsheets within the metadata-injection-example/metadata folder. The template and the transformation for injecting the metadata are in the metadata-injection-example/transformations folder.
This example assumes a basic understanding of working with transformations and steps.
Input data
Data files are frequently uploaded from multiple sources. This example models a situation where two suppliers have uploaded spreadsheets into the metadata-injection-example/data/in folder.
When you use metadata injection, you typically focus on a subset of data values common to all your input files. In this example, metadata is used for:
Transaction date
Transaction invoice number
Net value of the transaction
Currency used in the transaction
The metadata for these values and the output target text file are created and maintained in the metadata-injection-example/metadata folder.
Transformations
Metadata injection involves a main repetitive process. For this example:
03_process_supplier_fileis the template transformation.02_process_suppliercontains the ETL metadata injection step and injects metadata into the template transformation.01_process_all_supplierscalls02_process_supplierfor each supplier file.
This example contains three transformations:
Transformation for all input sources (
01_process_all_suppliers)Iterates through all supplier spreadsheets, calls the metadata injection transformation for each supplier, and logs the entire process. Each input source is specified through a variable in a Transformation Executor step, which calls the
02_process_suppliertransformation.Metadata injection transformation (
02_process_supplier)Defines the metadata structure and how it is injected into the template transformation. In this example, the metadata values are stored in separate spreadsheet files. This transformation extracts the values, prepares them for injection, and injects them through the ETL metadata injection step.
Template transformation (
03_process_supplier_file)Processes each supplier spreadsheet. Each step is configured to accept injected metadata instead of hardcoded values for a single source. For example, a
supplierfield might be set from a variable, depending on which supplier is being processed.
Results
You run the entire process by running 01_process_all_suppliers, which calls 02_process_supplier for each supplier input file. The 02_process_supplier transformation then calls the 03_process_supplier_file template transformation through the ETL metadata injection step.
To run this example:
In PDI, open
01_process_all_suppliers.ktrfrom themetadata-injection-example/transformationfolder.Run the
01_process_all_supplierstransformation.Examine:
processed_data_{today’s date}.txtinmetadata-injection-example/data/outlog_{timestamp}.txtinmetadata-injection-example/logging
The output text file contains values from the input files for the following common values:
Transaction date
Transaction invoice number
Net value of the transaction
Currency used in the transaction
Reference links
Below are links to articles and videos about using the ETL metadata injection step in PDI.
Articles
"Metadata Injection – Examples for Special Scenarios" by Jens Bleuel
"Parse nasty XLS with dynamic ETL" by Matt Casters
"Non-Native Metadata Injection" by Dan Keeley
Video
Steps supporting metadata injection
The ETL metadata injection step injects metadata into your template transformation through a step that supports metadata injection.
List of steps that support metadata injection
The following table specifies which steps contain fields that support metadata injection:
Add XML
7.0
All fields
Append Streams
7.0
All fields
Combination Lookup / Update
7.0
All fields
Concat Fields
5.1
All fields
Data Grid
5.1
All fields
Data Validator
7.0
All fields
Dimension Lookup-Update
7.1
All fields
ETL metadata injection
7.0
All fields. See Injecting metadata into the ETL metadata injection step for exceptions.
Filter Rows
7.0
All fields
Fixed File Input
4.1
See Fixed File Input for a list of supported fields
Get Data from XML
5.0
See Get Data from XML for a list of supported fields
Get Table Names
7.0
All fields
Get Variable
7.0
All fields
Google Analytics
6.1
All fields
Greenplum Load
7.0
All fields
GZIP CSV Input
5.1
All fields
If Field Value is Null
7.0
All fields
Insert/Update
7.0
All fields
Join Rows (Cartesian product)
7.0
All fields
JSON Output
5.2
All fields
Merge Join
7.0
All fields
Multiway Merge Join
7.0
All fields
MySQL Bulk Loader
7.0
All fields
Null If
7.0
All fields
Oracle Bulk Loader
7.0
All fields
PostgreSQL Bulk Loader
5.1
All fields
Replace in String
7.0
All fields
Sorted Merge
7.0
All fields
Stream Lookup
6.1
All fields
Switch / Case
7.0
All fields
Synchronize After Merge
7.0
All fields
Update
7.0
All fields
User Defined Java Expression
5.2
All fields
Value Mapper
6.1
All fields
Vertica Bulk Loader
7.0
All fields
XML Join
7.0
All fields
Last updated
Was this helpful?

