Working with transformations
Create, configure, and run transformations to perform ETL tasks as part of a workflow.
After you run a transformation, review results to validate output and troubleshoot issues.
You can schedule a transformation to run once or on a recurring schedule. See Schedule a transformation or job.
In this topic:
Create a transformation
Create a transformation to arrange a network of logical ETL tasks, called steps, into a data workflow.
To create a transformation in Pipeline Designer, complete these steps:
Sign in to the Pentaho User Console.
Open Pipeline Designer:
If you are using Modern Design, in the left menu, select Pipeline Designer.
If you are using Classic Design, select Switch to the Modern Design. Then select Pipeline Designer.
Pipeline Designer opens with the Quick Access section expanded.
In the Transformation card, select Create Transformation.
A new, blank transformation opens with the Design pane selected.
Add steps to the transformation:
In the Design pane, search for or browse to each step you want to use.
You may need to expand sections in the Design pane to find steps.
Drag the steps you want to use onto the canvas.
Hover over a step to open the step menu.
Select one of these options:
Delete: Delete the step from the canvas.
Edit: Open the step window and configure step properties.
To learn about the step, in the lower-left corner, select Help.
Duplicate: Add a copy of the step to the canvas.
More Actions > Change Number of Copies: Set how many parallel copies run.
In Number of copies (1 or higher), you can enter a number or a variable.
To insert a variable, select the Select variable to insert icon.
More Actions > Data Movement: Choose how rows move to next steps.
Round-Robin: Distribute rows evenly across step copies.
Load Balance: Route rows to the step copy with the lightest load.
Copy Data to Next Steps: Send each row to all parallel step copies.
Add hops between steps:
Hover over a step handle until a plus sign (+) appears. Drag the connection to the handle of another step.
Optional: Add a note on the canvas:
In the canvas toolbar, select Add Note. In Notes, enter your note, then select Save.
You can format the note in Notes. Select Style, then set font, color, and shadow options.
Save the transformation:
Select Save.
The Select File or Directory dialog box opens.
Search for or browse to the repository folder where you want to save.
Optional: Create a folder.
Select New Folder, enter a folder name, then select Save.
Optional: Delete a folder.
Select the folder, then select Delete.
In Select File or Directory, select Save.
The Save Change dialog box opens.
Select Yes to confirm.
Edit transformation properties
Transformation properties describe the transformation and control its behavior. Use properties to customize processing and control output.
To edit transformation properties, complete these steps:
Sign in to the Pentaho User Console.
Open Pipeline Designer:
If you are using Modern Design, in the left menu, select Pipeline Designer.
If you are using Classic Design, select Switch to the Modern Design. Then select Pipeline Designer.
Pipeline Designer opens with the Quick Access section expanded.
In the table at the bottom of the screen, select Recently opened or Favorites.
Open the transformation:
Search for or browse to the transformation, then select Open.
Or select Open files. Then in Select File or Directory, browse to the transformation and select Open.
In the Canvas Action toolbar, select Settings.
The Transformation Properties window opens.
Edit properties in each tab.
For details, see Transformation properties.
If you updated options in the Logging tab, generate SQL for the logging table:
Select SQL.
The Simple SQL editor opens. It shows DDL for a new table or details for an existing table.
Optional: Edit the SQL statements.
See Use the SQL Editor.
Optional: Select Clear cache to remove cached query results and metadata.
Select Execute.
Select Save.
Transformation properties
The Transformation Properties window has these sections:
Transformation tab
Use the Transformation tab to specify general properties.
Transformation name
The name of the transformation. This field is required to save settings to the repository.
Transformation filename
The file name (*.ktr) of the transformation.
Description
Short description of the transformation that displays in the repository explorer.
Extended description
Long extended description of the transformation.
Status
Draft or production status.
Version
Version description.
Directory
The repository directory where the transformation is stored.
Created by
Name of the original creator.
Created at
Date and time when the transformation was created.
Last modified by
Username of the last user who modified the transformation.
Last modified at
Date and time when the transformation was last modified.
Parameters tab
Use the Parameters tab to add parameters.
Parameter
A local variable that can be shared across all steps in a transformation.
Default Value
Value that is used if the parameter is not set elsewhere.
Description
Description of the user-defined parameter.
Logging tab
Use the Logging tab to configure how and where logging information is captured.
In the left navigation pane, select which type of logging you want to use.
Log Connection
Database connection used for logging. To configure a new connection, select New.
Log table schema
Schema name, if supported by your database.
Log table name
Name of the log table. If you also use job logging, use a different table name for transformation logging.
Logging interval (seconds)
Interval in which logs are written to the table. Applies only to Transformation and Performance logging types.
Log record timeout (in days)
Number of days to keep log entries before deletion. Applies only to Transformation and Performance logging types. If log data is not deleted as expected, see Log table data is not deleted.
Log size limit (in lines)
Limit for the number of lines stored in LOG_FIELD (CLOB). Applies only to Transformation logging.
Fields to log
Select the fields to log in Fields to log. For effective deletion of expired logs, LOGDATE and TRANSNAME are enabled by default. Monitoring LOG_FIELD can negatively impact Pentaho Server performance. If you do not select all fields, including LOG_FIELD, you will not see information about this transformation in Operations Mart logging.
Dates tab
Use the Dates tab to configure date ranges and limits for this connection.
Maxdate connection
Get the upper limit for a date range on this connection.
Maxdate table
Get the upper limit for a date range in this table.
Maxdate field
Get the upper limit for a date range in this field.
Maxdate offset (seconds)
Increase the upper date limit by this amount. Example: Set -60 to subtract 60 seconds.
Maximum date difference (seconds)
Set the maximum date difference in the obtained date range. Use it to limit job sizes.
Dependencies tab
Use the Dependencies tab to specify transformation dependencies.
The Dependencies tab lets you enter transformation dependencies.
Example: If a dimension depends on three lookup tables, you can track those tables. If values in the lookup tables change, extend the date range to force a full refresh.
Dependencies also help you detect change when a table has a “data last changed” column.
Select Get dependencies to detect dependencies automatically.
Connection
Select a database connection that is already defined for the transformation.
Table
Table from the selected database connection.
Field
Field within the selected table.
Miscellaneous tab
Use the Miscellaneous tab to configure buffer and feedback size. You can also perform administrative tasks.
Number of rows in rowset
Change buffer size between connected steps. Change it only if you are running low on memory.
Show a feedback row in transformation steps?
Add feedback entries to the log file during execution. Enabled by default. Default frequency is every 5000 rows.
The feedback size
Set the number of rows to process before adding a feedback entry. Increase it for large datasets.
Make the transformation database transactional
Open one unique connection per defined and used database connection in the transformation. Enable it to allow a failed transformation to be fully rolled back. Enable it when you need to alter connection settings before a query using an Execute SQL script step. See Database transactions in jobs and transformations. Transformation-wide commit happens when the last step finishes. When the transformation fails, a rollback happens. Commit sizes are ignored.
Shared objects file
Location of the XML file that stores shared objects, like database connections and clustering schemas.
Manage thread priorities?
Enable or disable changing Java thread priorities based on buffer levels. Disable it if the overhead outweighs the benefit.
Monitoring tab
Use the Monitoring tab to enable step performance monitoring.
Enable step performance monitoring?
Enable metrics for each step. Metrics show in the Performance Graph tab in the Execution Results panel.
Step performance measurement interval (ms)
Snapshot interval in milliseconds. Example: 10 ms.
Maximum number of snapshots in memory
Maximum number of measurement snapshots held in memory during runtime.
Run a transformation
Run a transformation to step through ETL tasks in the order specified by the transformation.
Complete these steps:
Sign in to the Pentaho User Console.
Open Pipeline Designer:
If you are using Modern Design, in the left menu, select Pipeline Designer.
If you are using Classic Design, select Switch to the Modern Design. Then select Pipeline Designer.
Pipeline Designer opens with the Quick Access section expanded.
In the table at the bottom of the screen, select Recently opened or Favorites.
Open the transformation:
Search for or browse to the transformation, then select Open.
Or select Open files. Then in Select File or Directory, browse to the transformation and select Open.
In the Canvas Action toolbar, select Run.
Select one of these options:
Select Run.
Or select Run Options.
In Run Options, set options, then select Run. For details, see Transformation run options.
The transformation runs. The Preview panel opens with the Logging tab selected.
To stop a transformation while it is running, see Stop transformations and jobs.
Transformation run options
Select configuration
All transformations run using the Pentaho server configuration.
Clear log before running
Clear existing logs before execution. Use it if your log is large.
Enable safe mode
Validate that every row has an identical layout. Rows with a different layout generate an error.
Gather performance metrics
Monitor execution performance. You can view metrics in the Pentaho Data Integration client. See Use performance graphs.
Log level
Amount of logging and detail captured. Options: Nothing (no logging), Error (errors only), Minimal, Basic (default), Detailed, Debug (very detailed), Row Level (very detailed) (logging per row). Debug and Row Level can expose sensitive data. Consider data sensitivity when selecting these levels.
Parameters
Apply parameter values at runtime. For details, see Parameters.
Variables
Apply temporary values for user-defined and environment variables at runtime. For details, see Variables.
Arguments
Apply named, user-supplied, single-value inputs. Arguments require transformation support. Each transformation supports up to 10 arguments. For details, see Arguments.
Analyze transformation results
Analyze results to identify errors, preview data, and assess performance.
After you run a transformation, Execution Results open at the bottom of the canvas.
You can take these actions:
To expand results, select Expand Preview.
To delete logs, select Delete all logs.
To close the pane, select Close.
If the pane is closed, you can open it from the Canvas Action toolbar. Select Logs.
Logging
The Logging tab shows details for the most recent execution. You can drill in to determine where errors occur. Error lines are highlighted in red.
Preview data
Use Preview Data to preview data per step. Select a step to see its output rows.
Step metrics
The Step Metrics tab shows per-step statistics. It includes rows read, rows written, errors, and rows per second.
If a step causes the transformation to fail, it is highlighted in red.
Transformation steps in Pipeline Designer
Steps extend and expand transformation functionality. Use the following steps in Pipeline Designer.
Steps: A - F
Transform
Add a sequence based on field value changes. When at least one field changes, PDI resets the sequence.
Flow
Block flow until all incoming rows have been processed. Subsequent steps only receive the last input row.
Steps: G - L
Steps: M - R
Transform
Denormalise rows by grouping and assigning values to new fields. Requires sorted input.
Steps: S - Z
Last updated
Was this helpful?

