Transformation steps in Pipeline Designer

Steps extend the functionality of transformations. You can use the following steps in Pipeline Designer.

Steps: A - F

​Abort​

Flow

Abort a transformation.

​Add a checksum​

Transform

Add a checksum column for each input row.

​Add constants​

Transform

Add one or more constants to the input rows.

​Add sequence​

Transform

Get the next value from a sequence.

Add value fields changing sequence

Transform

Add a sequence that depends on field value changes. Each time the value of at least one field changes, PDI resets the sequence.

​Analytic query​

Statistics

Execute analytic queries over a sorted dataset (LEAD/LAG/FIRST/LAST).
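
These functions behave like SQL window functions. Purely as a conceptual sketch (the step itself is configured in a dialog, not with SQL, and the table and field names below are hypothetical), LAG over a stream sorted by customer and date corresponds to:

```sql
-- Previous order amount per customer; NULL for the first row of each group.
SELECT customer_id,
       order_date,
       amount,
       LAG(amount, 1) OVER (PARTITION BY customer_id ORDER BY order_date) AS prev_amount
FROM orders;
```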

​Append streams​

Flow

Append two streams in an ordered way.

Block this step until steps finish

Flow

Block this step until the selected steps finish.

​Blocking step​

Flow

Block flow until all incoming rows have been processed. Subsequent steps only receive the last input row to this step.

​Calculator​

Transform

Create new fields by performing simple calculations.
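
Calculations are defined in the step's grid rather than in code. An illustrative sketch of one grid row (the field names revenue and cost are hypothetical), defining a new margin field as the difference of two existing fields:

```
New field   Calculation   Field A    Field B
margin      A - B         revenue    cost
```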

​Concat fields​

Transform

Concatenate multiple fields into one target field. The fields can be separated by a separator and the enclosure logic is completely compatible with the Text File Output step.

​Copy rows to result​

Job

Write rows to the executing job. The information will then be passed to the next entry in this job.

​CSV file input​

Input

Read data from a simple CSV file.

​Data Grid​

Input

Enter rows of static data in a grid, usually for testing, reference, or demo purposes.

​Data validator​

Validation

Validate incoming data based on a set of rules.

​Database lookup​

Lookup

Look up values in a database using field values.

​Delete​

Output

Permanently remove rows from a database.

​Dummy (do nothing)​

Flow

Do nothing. This step is useful, however, for testing or in situations where you want to split streams.

​Filter rows​

Flow

Filter rows using simple equations.
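
Conditions are assembled in the step dialog from a field, a comparison function, and a value or second field. An illustrative sketch (the field names status and amount are hypothetical):

```
status = [ACTIVE]  AND  amount > [100]
```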

​Formula​

Scripting

Calculate a formula using Pentaho's libformula.
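
libformula uses an OpenFormula-style syntax in which input fields are referenced in square brackets and function arguments are separated by semicolons. A minimal sketch (the field names total and status are hypothetical):

```
IF([total]>1000;"large";"small")
UPPER([status])
```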

Steps: G - L

​Generate Random Value​

Input

Generate random values.

​Generate rows​

Input

Generate a number of empty or identical rows.

​Get Repository Names​

Input

List detailed information about transformations and/or jobs in a repository.

​Get Rows From Result​

Job

Read rows from a previous entry in a job.

​Get subfolder names​

Input

Read a parent folder and return all subfolders.

​Get System Info​

Input

Get information from the system, such as the system date, arguments, and so on.

​Get Table Names​

Input

Get table names from a database connection and send them to the next step.

​Get variables​

Job

Determine the values of certain (environment or Kettle) variables and put them in field values.
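
For reference, PDI variables are written elsewhere in a transformation with either of the following notations (the variable name ETL_START_DATE is hypothetical):

```
${ETL_START_DATE}
%%ETL_START_DATE%%
```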

​Google Analytics v4​

Input

Fetch data from a Google Analytics account.

​Group by​

Statistics

Build aggregates in a group-by fashion. This works only on sorted input. If the input is not sorted, only consecutive duplicate rows are grouped correctly.

​HTTP client​

Lookup

Call a web service over HTTP by supplying a base URL and allowing parameters to be set dynamically.
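
An illustrative sketch of the resulting call (the URL, parameter, and field names are hypothetical): for each incoming row, the parameter value is taken from a field and appended to the base URL.

```
Base URL:     https://api.example.com/rates
Parameter:    currency  <-  field currency_code
Resulting:    https://api.example.com/rates?currency=EUR
```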

​Insert / update​

Output

Update or insert rows in a database based upon keys.

​Java filter​

Flow

Filter rows using Java code.
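
The condition is an ordinary Java boolean expression in which incoming fields are available as variables. A minimal sketch, assuming the stream carries a hypothetical String field country and a numeric field amount:

```java
// Keep only US rows with an amount above 1000.
// "country" and "amount" are hypothetical field names; amount is
// assumed to arrive as a java.lang.Long, hence the explicit unboxing.
country != null && country.equals("US")
    && amount != null && amount.longValue() > 1000
```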

​Job executor​

Flow

Run a PDI job, passing parameters and rows to it.

Join rows (cartesian product)

Joins

Output the Cartesian product of the input streams. The number of output rows is the product of the row counts of the input streams.

​Json Input​

Input

Extract relevant portions out of JSON structures (file or incoming field) and output rows.
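
Fragments are selected with JSONPath expressions. A minimal sketch with a hypothetical document; the path emits one row per matching customer name:

```
Input:  { "orders": [ { "customer": { "name": "Ada"   } },
                      { "customer": { "name": "Grace" } } ] }
Path:   $.orders[*].customer.name
Output: one row per match -> "Ada", "Grace"
```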

​Json Output​

Output

Create a JSON block and output it to a field or a file.

Steps: M - R

​Merge join​

Joins

Join two streams on a given key and output a joined set. The input streams must be sorted on the join key.

​Merge rows (diff)​

Joins

Merge two streams of rows, sorted on a certain key. The two streams are compared, and the identical, changed, deleted, and new rows are flagged.

​Microsoft Excel Input​

Input

Read data from Excel and OpenOffice Workbooks (XLS, XLSX, ODS).

​Microsoft Excel writer​

Output

Write or append data to an Excel file.

​MongoDB input​

Big Data

Read all entries from a MongoDB collection in the specified database.

​MongoDB output​

Big Data

Write to a MongoDB collection.

​Null if​

Utility

Set a field value to null if it is equal to a constant value.

​Python Executor​

Scripting

Map upstream data from a PDI input step or execute a Python script to generate data. When you send all rows, the dataset is stored in a Python variable that is available to your script.
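
A minimal sketch of a script body for this step, assuming it is configured to expose the incoming rows as a pandas DataFrame named df and to read the result from a variable named output_df (both variable names are set in the step dialog and are hypothetical here):

```python
# "df" is supplied by the step when all rows are sent as one dataset.
# Keep rows with sales above 1000 and add a derived column.
output_df = df[df["sales"] > 1000].copy()
output_df["sales_k"] = output_df["sales"] / 1000.0
```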

​REST client​

Lookup

Consume RESTful services. REpresentational State Transfer (REST) is a key design idiom that embraces a stateless client-server architecture, in which web services are viewed as resources and can be identified by their URLs.

​Row Denormaliser​

Transform

Denormalise rows by looking up key-value pairs and by assigning them to new fields in the output rows. This method aggregates and needs the input rows to be sorted on the grouping fields.

​Row Normaliser​

Transform

Normalise de-normalised information.

Steps: S - Z

​Salesforce delete​

Output

Delete records in a Salesforce module.

​Salesforce input​

Input

Read information from Salesforce.

​Salesforce insert​

Output

Insert records in a Salesforce module.

​Salesforce update​

Output

Update records in a Salesforce module.

​Salesforce upsert​

Output

Insert or update records in a Salesforce module.

​Select values​

Transform

Select or remove fields in a row. Optionally, set the field meta-data: type, length and precision.

​Set variables​

Job

Set environment variables based on a single input row.

​Sort rows​

Transform

Sort rows based upon field values (ascending or descending).

​Sorted Merge​

Joins

Merge rows coming from multiple input steps, provided these rows are themselves sorted on the given key fields.

Split field to rows

Transform

Split a single string field by delimiter and create a new row for each split term.

Split fields

Transform

Split a single field into more than one.
