Hierarchical JSON Input

You can use the Hierarchical JSON input step to load JSON data into PDI from a file. You can use filters to load only the desired data. The data can be split on a hierarchical data path using wildcards.

You can specify the input file directly in this step or use a list of files from an input field. For an overview of hierarchical data in Pentaho, see Hierarchical data.

You can use filters on the input even if you do not use Split rows across path, but the filters must then be rooted at the top level of the hierarchical data type (HDT) value you want to load.

When you use Split rows across path, you must specify all filter paths rooted at the split path. If you do not use Split rows across path, a normal HDT extraction path is used. See the Hierarchical data path specifications.

Step name

  • Step name: Specifies the unique name of the Hierarchical JSON input step on the canvas. You can customize the name or leave it as the default.

Options

The Hierarchical JSON input step features the following tabs.

Source tab

Figure: Hierarchical JSON Input step dialog box showing the Source tab

  • From file: Select to specify the file path and name of the JSON file you want to load into PDI.

  • File name: File path and name of the JSON file to load.

  • From field: Select to use an incoming field as the JSON file path.

  • Field with file name: The incoming field containing the JSON file path.

Output tab

Figure: Hierarchical JSON Input step dialog box showing the Output tab

  • Output field: Specify the field name for the output column.

  • Split rows across path: Specify the JSON path to be parsed. See Hierarchical data path specifications.

Note: Split rows across path is especially useful when loading JSON array objects within large JSON files.

Filters tab

Figure: Hierarchical JSON Input step dialog box showing the Filters tab

Use the optional Path field to specify the filters to apply when you use Split rows across path to fetch a subset of a JSON file.

For details, see Hierarchical data path specifications.

Examples

The following data is example JSON data in a file that you can load into PDI:

Example 1

The following data is extracted from this JSON file when you set Split rows across path to $.employees[*] and do not specify any filters.

Figure: Hierarchical JSON Input step example output
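
To make the split concrete, the following Python sketch mimics a split path of $.employees[*] with no filters. It is not PDI code and not the step's implementation: the sample document, its field values, and the helper name split_rows are hypothetical, chosen only for illustration.

```python
import json

# Hypothetical sample document; names and values are invented for illustration.
SAMPLE = """
{
  "employees": [
    {"name": "Alice", "age": 30, "department": "Sales"},
    {"name": "Bob", "age": 45, "department": "Engineering"}
  ]
}
"""


def split_rows(document, array_key):
    """Imitate a split path of $.<array_key>[*]: one row per array element."""
    return list(document[array_key])


rows = split_rows(json.loads(SAMPLE), "employees")
for number, row in enumerate(rows, start=1):
    print(f"Row {number}: {json.dumps(row)}")
```

With this sample, each element of the employees array becomes its own row, and every field of the element is kept because no filter is applied.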

Example 2

If you configure the step with a split path of $.employees[*] and you want only the name and age fields, use filters of $.name and $.age on the Filters tab.

This produces two rows on the stream of the Hierarchical JSON Input step:

Row 1

Row 2
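
As a rough illustration of Example 2, this Python sketch imitates a split path of $.employees[*] combined with the filters $.name and $.age, which are rooted at each split element. The sample document and the helper name split_and_filter are hypothetical, and the code only approximates the behavior described above; it is not how the step is implemented.

```python
import json

# Hypothetical sample document; names and values are invented for illustration.
SAMPLE = """
{
  "employees": [
    {"name": "Alice", "age": 30, "department": "Sales"},
    {"name": "Bob", "age": 45, "department": "Engineering"}
  ]
}
"""


def split_and_filter(document, array_key, filter_keys):
    """Imitate split path $.<array_key>[*] with filters $.<key> per element."""
    rows = []
    for element in document[array_key]:
        # Keep only the fields named by the filters, relative to the element.
        rows.append({key: element[key] for key in filter_keys if key in element})
    return rows


rows = split_and_filter(json.loads(SAMPLE), "employees", ["name", "age"])
for number, row in enumerate(rows, start=1):
    print(f"Row {number}: {json.dumps(row)}")
# Row 1: {"name": "Alice", "age": 30}
# Row 2: {"name": "Bob", "age": 45}
```

The department field is dropped from both rows because no filter path selects it.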

Example 3

If you want a filtered entry in a single HDT row, leave Split rows across path blank, and use the filter paths:

This results in a single row with one HDT that does not have the input split:
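
The unsplit case can be sketched in Python as well. The filter paths assumed here, $.employees[*].name and $.employees[*].age, are a guess that follows the rule that filters are rooted at the top level of the data when no split path is set; the sample document and the helper name filter_without_split are likewise hypothetical.

```python
import json

# Hypothetical sample document; names and values are invented for illustration.
SAMPLE = """
{
  "employees": [
    {"name": "Alice", "age": 30, "department": "Sales"},
    {"name": "Bob", "age": 45, "department": "Engineering"}
  ]
}
"""


def filter_without_split(document, array_key, filter_keys):
    """Imitate root-level filters $.<array_key>[*].<key> with no split path."""
    kept = [
        {key: element[key] for key in filter_keys if key in element}
        for element in document[array_key]
    ]
    # A single row that still holds the whole (filtered) structure.
    return [{array_key: kept}]


rows = filter_without_split(json.loads(SAMPLE), "employees", ["name", "age"])
print(len(rows))
print(json.dumps(rows[0]))
```

Under these assumptions the result is one row whose value still contains the employees array, with each entry reduced to the filtered fields.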
