XML Input Stream (StAX)

The XML Input Stream (StAX) step reads data from XML files using the Streaming API for XML (StAX) parser.

This step is designed for fast processing of large and complex XML structures. Unlike the Get Data from XML step (which uses in-memory processing), the XML Input Stream (StAX) step streams the XML and lets you implement the processing logic in the transformation.

This step is useful when you need to parse XML and:

  • load data quickly, with memory use that does not depend on file size

  • read different parts of the XML in different ways without parsing the file more than once

Because some XML processing logic can be complex, you should be familiar with common PDI steps before using this step.
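
To make the streaming model described above concrete, the following is a minimal sketch of a StAX event loop in plain Java that turns each parser event into one text row. It is not the step's implementation: the class name StaxEventRows, the method toRows, and the row format (level, event type, name or value) are assumptions made for illustration, loosely modeled on the element level, data type, data name, and data value fields listed under Options.

```java
import java.io.StringReader;
import java.util.ArrayList;
import java.util.List;
import javax.xml.stream.XMLInputFactory;
import javax.xml.stream.XMLStreamConstants;
import javax.xml.stream.XMLStreamException;
import javax.xml.stream.XMLStreamReader;

// Hypothetical helper, not part of PDI: emits one row per StAX event.
final class StaxEventRows {

    static List<String> toRows(String xml) {
        XMLInputFactory factory = XMLInputFactory.newInstance();
        // Merge adjacent text events and ignore DTDs to keep the sketch simple.
        factory.setProperty(XMLInputFactory.IS_COALESCING, Boolean.TRUE);
        factory.setProperty(XMLInputFactory.SUPPORT_DTD, Boolean.FALSE);

        List<String> rows = new ArrayList<>();
        int level = 0; // 0 = document level, 1 = root element (assumed convention)
        try {
            XMLStreamReader reader = factory.createXMLStreamReader(new StringReader(xml));
            try {
                while (reader.hasNext()) {
                    switch (reader.next()) {
                        case XMLStreamConstants.START_ELEMENT:
                            level++;
                            rows.add(level + " START_ELEMENT " + reader.getLocalName());
                            // Attributes are reported as separate rows after their element.
                            for (int i = 0; i < reader.getAttributeCount(); i++) {
                                rows.add(level + " ATTRIBUTE " + reader.getAttributeLocalName(i)
                                        + "=" + reader.getAttributeValue(i));
                            }
                            break;
                        case XMLStreamConstants.CHARACTERS:
                            String text = reader.getText().trim();
                            if (!text.isEmpty()) {
                                rows.add(level + " CHARACTERS " + text);
                            }
                            break;
                        case XMLStreamConstants.END_ELEMENT:
                            rows.add(level + " END_ELEMENT " + reader.getLocalName());
                            level--;
                            break;
                        default:
                            break; // other event types are ignored in this sketch
                    }
                }
            } finally {
                reader.close();
            }
        } catch (XMLStreamException e) {
            throw new IllegalStateException("Could not parse XML", e);
        }
        return rows;
    }
}
```

Only the current event is held by the parser at any time, which is why this style of reading does not need the whole document in memory. In a real transformation the rows would flow to downstream steps instead of being collected in a list.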

Options

XML Input Stream (StAX) step

| Option | Description | Default value / Data type |
|---|---|---|
| Step name | Unique name of the XML Input Stream (StAX) step on the canvas. | |
| Filename | Path to the input XML file. Select Browse to choose a file. If the step is connected to a previous step, you can select an incoming field that contains the file path (and Browse is hidden). You can use internal variables in the path. | |
| Source is from a previous step | Select to accept XML data from a previous step. | |
| Source field name | Incoming field to use as XML data. | |
| Add filename to result? | Adds the processed XML file name to the transformation result. | No |
| Skip (Elements/Attributes) | Number of elements or attributes to skip before producing rows. | 0 |
| Limit (Elements/Attributes) | Limits the number of elements or attributes to process. Together with Skip, this supports chunk loading in an outer loop. | 0 |
| Default String Length | Default string length for XML data name/value fields. | 1024 |
| Encoding | Encoding of the XML file. | UTF-8 |
| Add Namespace information? | Adds the XML data type NAMESPACE to the stream, including optional prefix (in name) and URI (in value). Enabling this can reduce throughput due to extra namespace handling. | No |
| Trim strings? | Trims whitespace, tabs, carriage returns, and line feeds from the start and end of name/value strings. | Yes |
| Include filename in output? / Fieldname | Adds the processed file name to the specified field. | xml_filename (String 256) |
| Row number in output? / Fieldname | Adds the processed row number (starting at 1). | xml_row_number (Integer) |
| XML data type (numeric) in output? / Fieldname | Adds the processed XML data type as a numeric value. | xml_data_type_numeric (Integer) |
| XML data type (description) in output? / Fieldname | Adds the processed XML data type as text. This is easier to read but can be slower and consume more memory than numeric types. | xml_data_type_description (String 25) |
| XML location line in output? / Fieldname | Adds the source XML line number. | xml_location_line (Integer) |
| XML location column in output? / Fieldname | Adds the source XML column number. | xml_location_column (Integer) |
| XML element ID in output? / Fieldname | Adds the element number (starting at 0). This increments per new element (not per row) and preserves nesting across levels. | xml_element_id (Integer) |
| XML parent element ID in output? / Fieldname | Adds the parent element number. Together with element ID, you can reconstruct the element tree. | xml_parent_element_id (Integer) |
| XML element level in output? / Fieldname | Adds the element nesting level, starting at 0 for the root START_DOCUMENT and END_DOCUMENT events. | xml_element_level (Integer) |
| XML path in output? / Fieldname | Adds the XML path. | xml_path (String 1024) |
| XML parent path in output? / Fieldname | Adds the parent XML path. | xml_parent_path (String 1024) |
| XML data name in output? / Fieldname | Adds the element/attribute name and optional namespace prefix to the output. | xml_data_name (String 1024 or Default String Length) |
| XML data value in output? / Fieldname | Adds the element/attribute value and optional namespace URI to the output. | xml_data_value (String 1024 or Default String Length) |
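
The options above note that the element ID and parent element ID together let you reconstruct the element tree. The following is a minimal sketch of that idea in plain Java, outside PDI. The class ElementTreeSketch, the method childrenByParent, and the sample ID values used with it are hypothetical; in particular, the parent ID that the step assigns to the top-level element is not stated here, so treat the value 0 in the usage note as an assumption.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

// Hypothetical helper, not part of PDI: groups element IDs under their parent ID.
final class ElementTreeSketch {

    /**
     * Each input pair is {elementId, parentElementId}, one pair per output row.
     * Several rows can share one element ID (for example an element's start,
     * attribute, and end rows), so duplicates are recorded only once.
     */
    static Map<Long, List<Long>> childrenByParent(long[][] idPairs) {
        Map<Long, List<Long>> children = new LinkedHashMap<>();
        for (long[] pair : idPairs) {
            long elementId = pair[0];
            long parentId = pair[1];
            List<Long> siblings = children.computeIfAbsent(parentId, k -> new ArrayList<>());
            if (!siblings.contains(elementId)) {
                siblings.add(elementId); // keeps document order of first appearance
            }
        }
        return children;
    }
}
```

For example, the pairs {1,0}, {1,0}, {2,1}, {3,1}, {3,1} yield element 1 as the only child of 0, and elements 2 and 3 as the children of 1. Walking this map from the top-level ID downward gives the nesting of the original document.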

If you need Set/Reset functionality, you can use Modified Java Script Value or User Defined Java Class. User Defined Java Class is typically faster.

Samples

Sample transformations are included in design-tools/data-integration/samples/transformations:

  • XML Input Stream (StAX) Test 1 - Basic Tests.ktr

  • XML Input Stream (StAX) Test 2 - Element Blocks.ktr

  • XML Input Stream (StAX) Test 3 - Attribute Groups.ktr

  • XML Input Stream (StAX) Test 4 - Hierarchies.ktr

  • XML Input Stream (StAX) Test 5 - Performance Test Data for Element Blocks.ktr

  • XML Input Stream (StAX) Test 6 - Namespaces.ktr

Example: element blocks

This example parses the XML Input Stream (StAX) Test 2 - Element Blocks.xml file, which includes two main blocks: Analyzer Lists and Products.

The transformation separates blocks by splitting the parent XML path into levels using Switch/Case steps. In more complex flows, consider using mappings (sub-transformations) so each block is clearly represented.
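
The routing idea above, splitting the parent XML path into levels and branching on one of them, can be sketched in plain Java as follows. This only mirrors the logic; in the sample transformation the branching is done with Switch/Case steps. The class PathRouterSketch, the element names Root, AnalyzerLists, and Products, the target labels, and the assumption that paths start with "/" and use "/" as separator are all hypothetical and may not match the sample file.

```java
// Hypothetical helper, not part of PDI: picks a target block from a parent path.
final class PathRouterSketch {

    /** Returns the path segment at the given 1-based level, or "" if the path is shallower. */
    static String segmentAt(String xmlPath, int level) {
        String[] parts = xmlPath.split("/");
        // A leading "/" produces an empty first part, so level N sits at index N.
        return (level > 0 && level < parts.length) ? parts[level] : "";
    }

    /** Chooses a target block from the second path level, like a Switch/Case on that field. */
    static String route(String parentPath) {
        switch (segmentAt(parentPath, 2)) {
            case "AnalyzerLists":
                return "analyzer-lists";
            case "Products":
                return "products";
            default:
                return "other"; // rows that belong to neither block
        }
    }
}
```

With these assumed names, a row whose parent path is /Root/Products/Product is routed to the products branch, and a row at /Root falls through to the default branch.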

Sample XML:

Preview examples:

  • Step preview

  • Example transformation

  • Analyzer lists results

  • Products results

Metadata injection support

All fields of this step support metadata injection. You can use this step with ETL metadata injection to pass metadata to your transformation at runtime.
