XML Input Stream (StAX)
The XML Input Stream (StAX) step reads data from XML files using the Streaming API for XML (StAX) parser.
This step is designed for fast processing of large and complex XML structures. Unlike the Get Data from XML step (which uses in-memory processing), the XML Input Stream (StAX) step streams the XML and lets you implement the processing logic in the transformation.
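As a rough illustration of the streaming model, the sketch below uses the JDK's standard StAX API (`javax.xml.stream`) directly. It turns a document into one "row" per parsing event and only ever reads forward, without building an in-memory tree. This is not the step's source code. The class name `StaxEventRows` and the row labels are hypothetical: the labels are loosely modeled on the step's XML data type descriptions, and the step's real row layout may differ. Text trimming only approximates the Trim strings? option.

```java
import java.io.StringReader;
import java.util.ArrayList;
import java.util.List;
import javax.xml.stream.XMLInputFactory;
import javax.xml.stream.XMLStreamConstants;
import javax.xml.stream.XMLStreamException;
import javax.xml.stream.XMLStreamReader;

// Hypothetical illustration of StAX streaming: one "row" per parsing event.
class StaxEventRows {

    static List<String> rows(String xml) {
        XMLInputFactory factory = XMLInputFactory.newInstance();
        // Merge adjacent text chunks so each text node arrives as one CHARACTERS event
        factory.setProperty(XMLInputFactory.IS_COALESCING, Boolean.TRUE);
        List<String> rows = new ArrayList<>();
        try {
            XMLStreamReader reader = factory.createXMLStreamReader(new StringReader(xml));
            // A new reader is positioned on START_DOCUMENT before the first next()
            if (reader.getEventType() == XMLStreamConstants.START_DOCUMENT) {
                rows.add("START_DOCUMENT");
            }
            while (reader.hasNext()) {
                switch (reader.next()) {
                    case XMLStreamConstants.START_ELEMENT:
                        rows.add("START_ELEMENT " + reader.getLocalName());
                        // Attributes belong to the START_ELEMENT event; emit each as its own row
                        for (int i = 0; i < reader.getAttributeCount(); i++) {
                            rows.add("ATTRIBUTE " + reader.getAttributeLocalName(i)
                                    + "=" + reader.getAttributeValue(i));
                        }
                        break;
                    case XMLStreamConstants.CHARACTERS:
                        // Trim, and skip whitespace-only text between elements
                        String text = reader.getText().trim();
                        if (!text.isEmpty()) {
                            rows.add("CHARACTERS " + text);
                        }
                        break;
                    case XMLStreamConstants.END_ELEMENT:
                        rows.add("END_ELEMENT " + reader.getLocalName());
                        break;
                    case XMLStreamConstants.END_DOCUMENT:
                        rows.add("END_DOCUMENT");
                        break;
                    default:
                        break; // comments, processing instructions, etc. are ignored here
                }
            }
            reader.close();
        } catch (XMLStreamException e) {
            throw new IllegalStateException("Malformed XML", e);
        }
        return rows;
    }

    public static void main(String[] args) {
        rows("<a id=\"1\"><b> x </b></a>").forEach(System.out::println);
    }
}
```

Memory use stays flat because only the current event is held. Any logic that needs context from earlier events (parents, paths, remembered values) has to be carried along explicitly, which is why the step leaves that logic to the transformation.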
This step is useful when you need to parse XML and:
you need fast data loads that do not depend on available memory, regardless of file size
you need flexibility to read different parts of the XML in different ways without repeatedly parsing the file
Because some XML processing logic can be complex, you should be familiar with common PDI steps before using this step.
Options

Step name
Unique name of the XML Input Stream (StAX) step on the canvas.
Filename
Path to the input XML file. Select Browse to choose a file. If the step is connected to a previous step, you can instead select an incoming field that contains the file path; in that case, Browse is hidden. You can use internal variables in the path.
Source is from a previous step
Select to accept XML data from a previous step.
Source field name
Incoming field to use as XML data.
Add filename to result?
Adds the processed XML file name to the transformation result. Default: No.
Skip (Elements/Attributes)
Number of elements or attributes to skip before producing rows. Default: 0.
Limit (Elements/Attributes)
Maximum number of elements or attributes to process. Together with Skip, this supports loading a file in chunks from an outer loop. Default: 0.
Default String Length
Default string length for the XML data name and value fields. Default: 1024.
Encoding
Encoding of the XML file. Default: UTF-8.
Add Namespace information?
Adds XML data type NAMESPACE entries to the stream, including the optional prefix (in the data name) and the URI (in the data value). Enabling this option can reduce throughput because of the extra namespace handling. Default: No.
Trim strings?
Trims spaces, tabs, carriage returns, and line feeds from the start and end of name and value strings. Default: Yes.
Include filename in output? / Fieldname
Adds the processed file name to the specified field. Default field: xml_filename (String 256).
Row number in output? / Fieldname
Adds the processed row number, starting at 1. Default field: xml_row_number (Integer).
XML data type (numeric) in output? / Fieldname
Adds the processed XML data type as a numeric value. Default field: xml_data_type_numeric (Integer).
XML data type (description) in output? / Fieldname
Adds the processed XML data type as text. This is easier to read but can be slower and use more memory than the numeric data type. Default field: xml_data_type_description (String 25).
XML location line in output? / Fieldname
Adds the source XML line number. Default field: xml_location_line (Integer).
XML location column in output? / Fieldname
Adds the source XML column number. Default field: xml_location_column (Integer).
XML element ID in output? / Fieldname
Adds the element number, starting at 0. The ID increments with each new element (not with each row) and preserves nesting across levels. Default field: xml_element_id (Integer).
XML parent element ID in output? / Fieldname
Adds the parent element number. Together with the element ID, this lets you reconstruct the element tree. Default field: xml_parent_element_id (Integer).
XML element level in output? / Fieldname
Adds the element nesting level, starting at 0 for the root-level START_DOCUMENT and END_DOCUMENT rows. Default field: xml_element_level (Integer).
XML path in output? / Fieldname
Adds the XML path. Default field: xml_path (String 1024).
XML parent path in output? / Fieldname
Adds the parent XML path. Default field: xml_parent_path (String 1024).
XML data name in output? / Fieldname
Adds the element or attribute name, with the optional namespace prefix. Default field: xml_data_name (String, length 1024 or the Default String Length value).
XML data value in output? / Fieldname
Adds the element or attribute value, with the optional namespace URI. Default field: xml_data_value (String, length 1024 or the Default String Length value).
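The element ID, parent element ID, level, path, and parent path fields can all be derived in a single forward pass by keeping a stack of open elements. The sketch below illustrates this with the JDK StAX API; it is not the step's implementation. It makes two assumptions, labeled as such in the code. First, the document itself counts as element 0 at level 0, so the root element gets ID 1 at level 1. Second, the document's own path is an empty string. The step's exact numbering and path format may differ. The names `ElementTree` and `ElementRow` are hypothetical.

```java
import java.io.StringReader;
import java.util.ArrayDeque;
import java.util.ArrayList;
import java.util.Deque;
import java.util.List;
import javax.xml.stream.XMLInputFactory;
import javax.xml.stream.XMLStreamConstants;
import javax.xml.stream.XMLStreamException;
import javax.xml.stream.XMLStreamReader;

// Hypothetical sketch: derive ID, parent ID, level, and paths per element while streaming.
class ElementTree {

    // One row per START_ELEMENT event; field names are illustrative only
    record ElementRow(int id, int parentId, int level, String path, String parentPath) {}

    static List<ElementRow> elements(String xml) {
        List<ElementRow> out = new ArrayList<>();
        Deque<Integer> idStack = new ArrayDeque<>();   // IDs of currently open elements
        Deque<String> pathStack = new ArrayDeque<>();  // paths of currently open elements
        int nextId = 0;
        idStack.push(nextId++);  // assumption: the document itself is element 0 at level 0
        pathStack.push("");      // assumption: the document has an empty path
        try {
            XMLStreamReader reader = XMLInputFactory.newInstance()
                    .createXMLStreamReader(new StringReader(xml));
            while (reader.hasNext()) {
                int event = reader.next();
                if (event == XMLStreamConstants.START_ELEMENT) {
                    int parentId = idStack.peek();
                    String parentPath = pathStack.peek();
                    String path = parentPath + "/" + reader.getLocalName();
                    int id = nextId++;           // increments per element, not per row
                    int level = idStack.size();  // document = 0, root element = 1, ...
                    out.add(new ElementRow(id, parentId, level, path, parentPath));
                    idStack.push(id);
                    pathStack.push(path);
                } else if (event == XMLStreamConstants.END_ELEMENT) {
                    // Closing an element makes its parent the current context again
                    idStack.pop();
                    pathStack.pop();
                }
            }
            reader.close();
        } catch (XMLStreamException e) {
            throw new IllegalStateException("Malformed XML", e);
        }
        return out;
    }

    public static void main(String[] args) {
        elements("<r><a/><b><c/></b></r>").forEach(System.out::println);
    }
}
```

Because each row carries its own ID and its parent's ID, a downstream step can rebuild the hierarchy, for example by joining rows on parentId = id, without ever holding the whole document.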
If you need Set/Reset functionality, use the Modified Java Script Value step or the User Defined Java Class step. User Defined Java Class is typically faster.
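Below is a minimal sketch of Set/Reset row logic of the kind you might port into a User Defined Java Class. It is written as plain Java over a simplified row type and is not the User Defined Java Class API. It assumes "Set/Reset" means two things. "Set" remembers a value from one row, here the text at a hypothetical `/Products/Product/Id` path. "Reset" clears that value when the enclosing element ends, and in between the value is attached to the intervening rows. The `Row` and `Enriched` types, the paths, and the method names are all hypothetical.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical Set/Reset sketch: plain row logic, not the User Defined Java Class API.
class SetResetSketch {

    // Simplified input row: XML data type, XML path, and data value
    record Row(String dataType, String path, String value) {}

    // Output row: the remembered key plus the original path and value
    record Enriched(String key, String path, String value) {}

    static List<Enriched> apply(List<Row> rows, String setPath, String resetPath) {
        List<Enriched> out = new ArrayList<>();
        String key = null; // nothing remembered yet
        for (Row row : rows) {
            if (row.dataType().equals("END_ELEMENT") && row.path().equals(resetPath)) {
                key = null;                  // reset: leaving the element that scoped the key
            } else if (row.dataType().equals("CHARACTERS")) {
                if (row.path().equals(setPath)) {
                    key = row.value();       // set: remember the value for later rows
                } else {
                    out.add(new Enriched(key, row.path(), row.value()));
                }
            }
        }
        return out;
    }

    public static void main(String[] args) {
        List<Row> rows = List.of(
                new Row("START_ELEMENT", "/Products/Product", ""),
                new Row("CHARACTERS", "/Products/Product/Id", "P1"),
                new Row("CHARACTERS", "/Products/Product/Name", "Widget"),
                new Row("END_ELEMENT", "/Products/Product", ""),
                new Row("CHARACTERS", "/Products/Note", "n"));
        apply(rows, "/Products/Product/Id", "/Products/Product").forEach(System.out::println);
    }
}
```

The state is a single variable carried between rows, which is why a compiled Java step tends to handle this pattern cheaply.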
Samples
Sample transformations are included in design-tools/data-integration/samples/transformations:
XML Input Stream (StAX) Test 1 - Basic Tests.ktr
XML Input Stream (StAX) Test 2 - Element Blocks.ktr
XML Input Stream (StAX) Test 3 - Attribute Groups.ktr
XML Input Stream (StAX) Test 4 - Hierarchies.ktr
XML Input Stream (StAX) Test 5 - Performance Test Data for Element Blocks.ktr
XML Input Stream (StAX) Test 6 - Namespaces.ktr
Example: element blocks
This example parses the XML Input Stream (StAX) Test 2 - Element Blocks.xml file, which includes two main blocks: Analyzer Lists and Products.
The transformation separates blocks by splitting the parent XML path into levels using Switch/Case steps. In more complex flows, consider using mappings (sub-transformations) so each block is clearly represented.
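The routing idea can be sketched in plain Java: split the parent path into its levels and branch on one level, as a chain of Switch/Case steps would. This is an illustration only. The wrapper root element, the element names `AnalyzerLists` and `Products`, and the branch labels are hypothetical; check the actual sample XML for the real names.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical sketch: route rows by the levels of their parent path, like Switch/Case steps.
class BlockRouter {

    // Splits "/a/b/c" into ["a", "b", "c"]; an empty path gives an empty list
    static List<String> levels(String parentPath) {
        List<String> parts = new ArrayList<>();
        for (String part : parentPath.split("/")) {
            if (!part.isEmpty()) {
                parts.add(part);
            }
        }
        return parts;
    }

    // Branches on the second path level; element names and labels are hypothetical
    static String route(String parentPath) {
        List<String> lv = levels(parentPath);
        if (lv.size() < 2) {
            return "top_level"; // rows directly under the root or the document
        }
        switch (lv.get(1)) {
            case "AnalyzerLists": return "analyzer_lists";
            case "Products":      return "products";
            default:              return "other";
        }
    }

    public static void main(String[] args) {
        System.out.println(route("/Data/Products/Product"));
        System.out.println(route("/Data/AnalyzerLists/List"));
    }
}
```

Nested blocks can be handled the same way by branching again on a deeper level, which corresponds to chaining further Switch/Case steps or moving each block into its own mapping.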
[Figures: sample XML; step preview; example transformation; Analyzer lists results; Products results]
Metadata injection support
All fields of this step support metadata injection. You can use this step with ETL metadata injection to pass metadata to your transformation at runtime.