Transformation steps in Pipeline Designer

Steps extend the functionality of transformations. You can use the following steps in Pipeline Designer.

Steps: A - F

​Abort​

Flow

Abort a transformation.

​Add a checksum​

Transform

Add a checksum column for each input row.

​Add constants​

Transform

Add one or more constants to the input rows.

​Add sequence​

Transform

Get the next value from a sequence.

Add value fields changing sequence

Transform

Add a sequence that depends on field value changes. Each time the value of at least one field changes, PDI resets the sequence.

​Analytic query​

Statistics

Execute analytic queries over a sorted dataset (LEAD/LAG/FIRST/LAST).
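
These functions behave like SQL window functions. Purely as a conceptual sketch (the step itself is configured in a dialog, not with SQL, and the table and field names below are hypothetical), LAG over a stream sorted by customer and date corresponds to:

```sql
-- Previous order amount per customer; NULL for the first row of each group.
SELECT customer_id,
       order_date,
       amount,
       LAG(amount, 1) OVER (PARTITION BY customer_id ORDER BY order_date) AS prev_amount
FROM orders;
```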

​Append streams​

Flow

Append two streams in an ordered way.

Block this step until steps finish

Flow

Block this step until the selected steps finish.

​Blocking step​

Flow

Block flow until all incoming rows have been processed. Subsequent steps only receive the last input row to this step.

​Calculator​

Transform

Create new fields by performing simple calculations.
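
Calculations are defined in the step's grid rather than in code. An illustrative sketch of one grid row (the field names revenue and cost are hypothetical), defining a new margin field as the difference of two existing fields:

```
New field   Calculation   Field A    Field B
margin      A - B         revenue    cost
```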

​Concat fields​

Transform

Concatenate multiple fields into one target field. The fields can be separated by a separator and the enclosure logic is completely compatible with the Text File Output step.

​Copy rows to result​

Job

Write rows to the executing job. The information will then be passed to the next entry in this job.

​CSV file input​

Input

Read data from a simple CSV file.

​Data Grid​

Input

Enter rows of static data in a grid, usually for testing, reference, or demo purposes.

​Data validator​

Validation

Validate incoming data based on a set of rules.

​Database lookup​

Lookup

Look up values in a database using field values.

​Delete​

Output

Permanently remove rows from a database.

​Dummy (do nothing)​

Flow

Do nothing. This step is useful, however, for testing or in situations where you want to split streams.

​Filter rows​

Flow

Filter rows using simple equations.
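
Conditions are assembled in the step dialog from a field, a comparison function, and a value or second field. An illustrative sketch (the field names status and amount are hypothetical):

```
status = [ACTIVE]  AND  amount > [100]
```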

​Formula​

Scripting

Calculate a formula using Pentaho's libformula.
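
libformula uses an OpenFormula-style syntax in which input fields are referenced in square brackets and function arguments are separated by semicolons. A minimal sketch (the field names total and status are hypothetical):

```
IF([total]>1000;"large";"small")
UPPER([status])
```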

Steps: G - L

​Generate Random Value​

Input

Generate random values.

​Generate rows​

Input

Generate a number of empty or identical rows.

​Get Repository Names​

Input

List detailed information about transformations and/or jobs in a repository.

​Get Rows From Result​

Job

Read rows from a previous entry in a job.

​Get subfolder names​

Input

Read a parent folder and return all subfolders.

​Get System Info​

Input

Get information from the system, such as the system date, arguments, and so on.

​Get Table Names​

Input

Get table names from a database connection and send them to the next step.

​Get variables​

Job

Determine the values of certain (environment or Kettle) variables and put them in field values.
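
For reference, PDI variables are written elsewhere in a transformation with either of the following notations (the variable name ETL_START_DATE is hypothetical):

```
${ETL_START_DATE}
%%ETL_START_DATE%%
```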

​Google Analytics v4​

Input

Fetch data from a Google Analytics account.

​Group by​

Statistics

Build aggregates in a group-by fashion. This works only on sorted input. If the input is not sorted, only consecutive duplicate rows are grouped correctly.

​HTTP client​

Lookup

Call a web service over HTTP by supplying a base URL and allowing parameters to be set dynamically.
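
An illustrative sketch of the resulting call (the URL, parameter, and field names are hypothetical): for each incoming row, the parameter value is taken from a field and appended to the base URL.

```
Base URL:     https://api.example.com/rates
Parameter:    currency  <-  field currency_code
Resulting:    https://api.example.com/rates?currency=EUR
```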

​Insert / update​

Output

Update or insert rows in a database based upon keys.

​Java filter​

Flow

Filter rows using Java code.
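
The condition is an ordinary Java boolean expression in which incoming fields are available as variables. A minimal sketch, assuming the stream carries a hypothetical String field country and a numeric field amount:

```java
// Keep only US rows with an amount above 1000.
// "country" and "amount" are hypothetical field names; amount is
// assumed to arrive as a java.lang.Long, hence the explicit unboxing.
country != null && country.equals("US")
    && amount != null && amount.longValue() > 1000
```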

​Job executor​

Flow

Run a PDI job, passing parameters and rows to it.

Join rows (cartesian product)

Joins

Output the Cartesian product of the input streams. The number of output rows is the product of the row counts of the input streams.

​Json Input​

Input

Extract relevant portions out of JSON structures (file or incoming field) and output rows.
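
Fragments are selected with JSONPath expressions. A minimal sketch with a hypothetical document; the path emits one row per matching customer name:

```
Input:  { "orders": [ { "customer": { "name": "Ada"   } },
                      { "customer": { "name": "Grace" } } ] }
Path:   $.orders[*].customer.name
Output: one row per match -> "Ada", "Grace"
```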

​Json Output​

Output

Create a JSON block and output it to a field or a file.

Steps: M - R

​Merge join​

Joins

Join two streams on a given key and output a joined set. The input streams must be sorted on the join key.

​Merge rows (diff)​

Joins

Merge two streams of rows, sorted on a certain key. The two streams are compared, and the identical, changed, deleted, and new rows are flagged.

​Microsoft Excel Input​

Input

Read data from Excel and OpenOffice Workbooks (XLS, XLSX, ODS).

​Microsoft Excel writer​

Output

Write or append data to an Excel file.

​MongoDB input​

Big Data

Read all entries from a MongoDB collection in the specified database.

​MongoDB output​

Big Data

Write to a MongoDB collection.

​Null if​

Utility

Set a field value to null if it is equal to a constant value.

​Python Executor​

Scripting

Map upstream data from a PDI input step or execute a Python script to generate data. When you send all rows, the dataset is stored in a Python variable that is available to your script.
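
A minimal sketch of a script body for this step, assuming it is configured to expose the incoming rows as a pandas DataFrame named df and to read the result from a variable named output_df (both variable names are set in the step dialog and are hypothetical here):

```python
# "df" is supplied by the step when all rows are sent as one dataset.
# Keep rows with sales above 1000 and add a derived column.
output_df = df[df["sales"] > 1000].copy()
output_df["sales_k"] = output_df["sales"] / 1000.0
```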

​REST client​

Lookup

Consume RESTful services. REpresentational State Transfer (REST) is a key design idiom that embraces a stateless client-server architecture, in which web services are viewed as resources and can be identified by their URLs.

​Row Denormaliser​

Transform

Denormalise rows by looking up key-value pairs and by assigning them to new fields in the output rows. This method aggregates and needs the input rows to be sorted on the grouping fields.

​Row Normaliser​

Transform

Normalise de-normalised information.

Steps: S - Z

​Salesforce delete​

Output

Delete records in a Salesforce module.

​Salesforce input​

Input

Read information from Salesforce.

​Salesforce insert​

Output

Insert records in a Salesforce module.

​Salesforce update​

Output

Update records in a Salesforce module.

​Salesforce upsert​

Output

Insert or update records in a Salesforce module.

​Select values​

Transform

Select or remove fields in a row. Optionally, set the field meta-data: type, length and precision.

​Set variables​

Job

Set environment variables based on a single input row.

​Sort rows​

Transform

Sort rows based upon field values (ascending or descending).

​Sorted Merge​

Joins

Merge rows coming from multiple input steps, provided these rows are themselves sorted on the given key fields.

Split field to rows

Transform

Split a single string field by delimiter and create a new row for each split term.

Split fields

Transform

Split a single field into more than one.
