Transformation steps in Pipeline Designer

Steps extend and expand the functionality of transformations. You can use the following steps in Pipeline Designer.

Steps: A - F

| Name | Category | Description |
| --- | --- | --- |
| Abort | Flow | Abort a transformation. |
| Add a checksum | Transform | Add a checksum column for each input row. |
| Add constants | Transform | Add one or more constants to the input rows. |
| Add sequence | Transform | Get the next value from a sequence. |
| Add value fields changing sequence | Transform | Add a sequence that depends on field values: each time the value of at least one field changes, PDI resets the sequence. |
| Analytic query | Statistics | Execute analytic queries over a sorted dataset (LEAD/LAG/FIRST/LAST); see the sketch after this table. |
| Append streams | Flow | Append two streams in an ordered way. |
| Block this step until steps finish | Flow | Block this step until selected steps finish. |
| Blocking step | Flow | Block the flow until all incoming rows have been processed. Subsequent steps only receive the last input row of this step. |
| Calculator | Transform | Create new fields by performing simple calculations. |
| Concat fields | Transform | Concatenate multiple fields into one target field. The fields can be separated by a separator, and the enclosure logic is fully compatible with the Text file output step. |
| Copy rows to result | Job | Write rows to the executing job. The information will then be passed to the next entry in this job. |
| CSV file input | Input | Read from a simple CSV file. |
| Data grid | Input | Enter rows of static data in a grid, usually for testing, reference, or demo purposes. |
| Data validator | Validation | Validate data passing through the step against a set of rules. |
| Database lookup | Lookup | Look up values in a database using field values. |
| Delete | Output | Permanently remove rows from a database. |
| Dummy (do nothing) | Flow | Do nothing. Useful for testing, or when you want to split streams. |
| Filter rows | Flow | Filter rows using simple equations. |
| Formula | Scripting | Calculate a formula using Pentaho's libformula. |
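
The Analytic query step computes LEAD/LAG-style values by looking at neighbouring rows of an already-sorted stream. As a rough illustration only (the `day` and `sales` fields are invented, not part of the step's configuration), the semantics in plain Python look like this:

```python
# Hypothetical sorted input rows; LEAD/LAG only make sense on sorted data.
rows = [{"day": 1, "sales": 10}, {"day": 2, "sales": 12},
        {"day": 3, "sales": 9}, {"day": 4, "sales": 15}]

for i, row in enumerate(rows):
    # LAG(sales, 1): value from the previous row, None on the first row.
    row["sales_lag1"] = rows[i - 1]["sales"] if i > 0 else None
    # LEAD(sales, 1): value from the next row, None on the last row.
    row["sales_lead1"] = rows[i + 1]["sales"] if i < len(rows) - 1 else None

for row in rows:
    print(row)
```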

Steps: G - L

| Name | Category | Description |
| --- | --- | --- |
| Generate random value | Input | Generate random values. |
| Generate rows | Input | Generate a number of empty or equal rows. |
| Get repository names | Input | List detailed information about transformations and/or jobs in a repository. |
| Get rows from result | Job | Read rows from a previous entry in a job. |
| Get subfolder names | Input | Read a parent folder and return all subfolders. |
| Get system info | Input | Get information from the system, such as the system date and arguments. |
| Get table names | Input | Get table names from a database connection and send them to the next step. |
| Get variables | Job | Determine the values of certain (environment or Kettle) variables and put them in field values. |
| Google Analytics | Input | Fetch data from a Google Analytics account. |
| Group by | Statistics | Build aggregates in a group-by fashion. This works only on sorted input; if the input is not sorted, only consecutive rows with equal keys are grouped correctly (see the first sketch after this table). |
| HTTP client | Lookup | Call a web service over HTTP, supplying a base URL and allowing parameters to be set dynamically. |
| Insert / update | Output | Update or insert rows in a database based upon keys. |
| Java filter | Flow | Filter rows using Java code. |
| Job executor | Flow | Run a PDI job, passing parameters and rows. |
| Join rows (cartesian product) | Joins | Output the Cartesian product of the input streams. The number of output rows is the product of the row counts of the input streams. |
| JSON input | Input | Extract relevant portions out of JSON structures (a file or an incoming field) and output rows (see the second sketch after this table). |
| JSON output | Output | Create a JSON block and output it in a field or to a file. |
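
Why the Group by step needs sorted input: it only merges consecutive rows with equal keys, much like Python's `itertools.groupby`. A minimal sketch with invented rows:

```python
from itertools import groupby

# Unsorted input: the trailing "A" row is not adjacent to the other "A" rows.
rows = [("A", 1), ("A", 2), ("B", 3), ("A", 4)]

for key, group in groupby(rows, key=lambda r: r[0]):
    print(key, sum(value for _, value in group))
# Prints: A 3 / B 3 / A 4 -- the second run of "A" becomes its own group.
# Sorting the rows on the key first yields the expected result: A 7 / B 3.
```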
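
The JSON input step is configured with path expressions that pull values out of a JSON document and turn them into rows. A simplified sketch of that idea using only the standard library (the document and field names are made up; the step itself uses JSONPath-style expressions such as `$.orders[*].total`):

```python
import json

doc = json.loads('{"orders": [{"id": 1, "total": 9.5}, {"id": 2, "total": 3.0}]}')

# Roughly what extracting $.orders[*].id and $.orders[*].total into rows does.
rows = [{"id": order["id"], "total": order["total"]} for order in doc["orders"]]
print(rows)  # [{'id': 1, 'total': 9.5}, {'id': 2, 'total': 3.0}]
```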

Steps: M - R

| Name | Category | Description |
| --- | --- | --- |
| Merge join | Joins | Join two streams on a given key and output a joined set. The input streams must be sorted on the join key (see the sketch after this table). |
| Merge rows (diff) | Joins | Merge two streams of rows sorted on a certain key. The two streams are compared, and the identical, changed, deleted, and new rows are flagged. |
| Microsoft Excel input | Input | Read data from Excel and OpenOffice workbooks (XLS, XLSX, ODS). |
| Microsoft Excel writer | Output | Write or append data to an Excel file. |
| MongoDB input | Big Data | Read all entries from a MongoDB collection in the specified database. |
| MongoDB output | Big Data | Write to a MongoDB collection. |
| Null if | Utility | Set a field value to null if it is equal to a constant value. |
| Python executor | Scripting | Map upstream data from a PDI input step, or execute a Python script to generate data. When you send all rows, Python stores the dataset in a variable that kicks off your Python script. |
| REST client | Lookup | Consume RESTful services. REpresentational State Transfer (REST) is a key design idiom that embraces a stateless client-server architecture in which web services are viewed as resources identifiable by their URLs. |
| Row denormaliser | Transform | Denormalise rows by looking up key-value pairs and assigning them to new fields in the output rows. This method aggregates, so the input rows must be sorted on the grouping fields. |
| Row normaliser | Transform | Normalise de-normalised information. |
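
The Merge join step can stream through both inputs in a single pass precisely because they arrive sorted on the join key. A sketch of that one-pass inner join (the function and the sample rows are illustrative, not the step's actual implementation):

```python
def merge_join(left, right, key):
    """One-pass inner join of two row lists that are pre-sorted on `key`."""
    out, i, j = [], 0, 0
    while i < len(left) and j < len(right):
        lk, rk = left[i][key], right[j][key]
        if lk < rk:
            i += 1
        elif lk > rk:
            j += 1
        else:
            # Pair the current left row with the whole run of matching right rows.
            j2 = j
            while j2 < len(right) and right[j2][key] == lk:
                out.append({**left[i], **right[j2]})
                j2 += 1
            i += 1
    return out

left = [{"id": 1, "name": "a"}, {"id": 2, "name": "b"}]
right = [{"id": 2, "city": "Rome"}, {"id": 3, "city": "Oslo"}]
print(merge_join(left, right, "id"))  # [{'id': 2, 'name': 'b', 'city': 'Rome'}]
```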

Steps: S - Z

| Name | Category | Description |
| --- | --- | --- |
| Salesforce delete | Output | Delete records in a Salesforce module. |
| Salesforce input | Input | Read information from Salesforce. |
| Salesforce insert | Output | Insert records in a Salesforce module. |
| Salesforce update | Output | Update records in a Salesforce module. |
| Salesforce upsert | Output | Insert or update records in a Salesforce module. |
| Select values | Transform | Select or remove fields in a row. Optionally, set the field metadata: type, length, and precision. |
| Set variables | Job | Set environment variables based on a single input row. |
| Sort rows | Transform | Sort rows based upon field values (ascending or descending). |
| Sorted merge | Joins | Merge rows coming from multiple input steps, provided the rows are themselves sorted on the given key fields. |
| Split field to rows | Transform | Split a single string field by delimiter, creating a new row for each split term. |
| Split fields | Transform | Split a single field into more than one. |
| Stream lookup | Lookup | Look up values coming from another stream in the transformation. |
| String operations | Transform | Apply operations such as trimming and padding to a string value. |
| Strings cut | Transform | Cut out a snippet of a string. |
| Switch / case | Flow | Switch a row to a certain target step based on the case value in a field. |
| Table exists | Lookup | Check whether a table exists on a specified connection. |
| Table input | Input | Read information from a database table. |
| Table output | Output | Write information to a database table. |
| Text file input | Input | Read data from a text file in several formats. This data can then be passed to your next step(s). |
| Text file output | Output | Write rows to a text file. |
| Transformation executor | Flow | Run a PDI transformation, setting parameters and passing rows. |
| Unique rows | Transform | Remove duplicate rows, leaving only unique occurrences. This works only on sorted input; if the input is not sorted, only consecutive duplicate rows are removed (see the sketch after this table). |
| Update | Output | Update data in a database table based upon keys. |
| Write to log | Utility | Write data to the log. |
| Zip file | Utility | Create a standard ZIP archive from the data stream fields. |
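
Like Group by, the Unique rows step only compares each row with the one before it, which is why unsorted input leaves non-adjacent duplicates in place. A small sketch of that behaviour with invented values:

```python
def unique_rows(rows):
    """Keep a row only if it differs from the immediately preceding row."""
    out, prev = [], object()  # sentinel that compares unequal to any row
    for row in rows:
        if row != prev:
            out.append(row)
        prev = row
    return out

print(unique_rows(["a", "a", "b", "a"]))          # ['a', 'b', 'a'] -- unsorted input
print(unique_rows(sorted(["a", "a", "b", "a"])))  # ['a', 'b']      -- sorted first
```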
