CSV File Input
The CSV File Input step reads data from delimited text files into a PDI transformation. Although this step is called CSV File Input, you can use it with many delimiter types, such as pipes (|), tabs, and semicolons (;).
The semicolon (;) is the default delimiter for this step.
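Outside PDI, the same idea applies: a delimited reader only needs to know the delimiter character to handle any of these formats. A minimal Python sketch (the sample data here is made up for illustration):

```python
import csv
import io

# Hypothetical pipe-delimited content; any single-character
# delimiter (|, ;, tab) is handled the same way.
raw = "id|name|city\n1|Alice|Lisbon\n2|Bob|Porto\n"

rows = list(csv.reader(io.StringIO(raw), delimiter="|"))
print(rows[0])  # header row: ['id', 'name', 'city']
print(rows[1])  # first data row: ['1', 'Alice', 'Lisbon']
```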
The options for this step are a subset of those for the Text File Input step. CSV File Input differs from Text File Input in the following ways:
NIO (non-blocking I/O)
The step uses native system calls to read files faster, but it is limited to local files. It does not support VFS.
Parallel running
If you configure this step to run in multiple copies (or in clustered mode) and enable Running in parallel?, each copy reads a separate block of a single file.
Lazy conversion
You can use Lazy conversion? to avoid unnecessary data type conversions. This is most useful when you pass most fields through the transformation without changing them (for example, from a text file to a text file or database).
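The idea behind lazy conversion can be sketched as deferring parsing until a field value is actually needed. This is an illustrative Python sketch of the concept, not PDI's implementation:

```python
class LazyField:
    """Holds the raw text of a field and parses it only on first access."""

    def __init__(self, raw, parse):
        self.raw = raw        # unparsed text as read from the file
        self._parse = parse   # conversion function, e.g. int or float
        self._value = None
        self._parsed = False

    @property
    def value(self):
        if not self._parsed:  # convert at most once, and only on demand
            self._value = self._parse(self.raw)
            self._parsed = True
        return self._value

# A field that is merely copied to the output never pays the
# conversion cost; one that is used in a calculation does.
amount = LazyField("1250.75", float)
print(amount.raw)    # passthrough uses the raw text: 1250.75
print(amount.value)  # arithmetic triggers parsing: 1250.75
```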
An example transformation (CSV Input - Reading customer data.ktr) is available in the data-integration/samples/transformations directory.
Options

The CSV File Input step includes the following options.
Step name
Specify the unique name of the CSV File Input step on the canvas. You can customize the name or leave it as the default.
Filename
Specify the input CSV file name, or select Browse to locate it. If your source is from a previous step, Browse is hidden; instead, use the drop-down list in this field to select the stream field that contains the CSV file name (or names).

Include the filename in the output? (Only appears if your source is from a previous step)
If your source is from a previous step, select this option to include the input source file name in the output.
Delimiter
Specify the delimiter character used in the source file. You can set special characters (for example, CHAR HEX01) by using the format $[value], such as $[01] or $[6F,FF,00,1F].
Enclosure
Specify the enclosure character used in the source file. You can set special characters (for example, CHAR HEX01) by using the format $[value], such as $[01] or $[6F,FF,00,1F].
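The $[value] notation names one or more characters by their hexadecimal byte codes. A rough Python sketch of how such a token could be decoded (the helper function here is hypothetical, not part of PDI):

```python
def decode_special(token):
    """Turn a '$[..]' token such as '$[01]' or '$[6F,FF,00,1F]'
    into the corresponding byte string."""
    if token.startswith("$[") and token.endswith("]"):
        hex_codes = token[2:-1].split(",")
        return bytes(int(code, 16) for code in hex_codes)
    return token.encode()  # ordinary characters pass through unchanged

print(decode_special("$[01]"))           # b'\x01' (CHAR HEX01)
print(decode_special("$[6F,FF,00,1F]"))  # b'o\xff\x00\x1f'
print(decode_special(";"))               # b';'
```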
NIO buffer size
Specify the read buffer size in bytes (how many bytes are read at one time from the source).
Lazy conversion?
Select this option to use lazy conversion for better performance. Lazy conversion can provide significant performance improvements when you read data from a text file and write it back out without transforming most fields.
Header row present?
Select this option if the first row in the source file contains column names.
Add filename to result
Adds the CSV source file name (or names) to the result of the transformation.
The row number field name (optional)
Specify the name of the field that contains the row number in the step output.
Running in parallel?
Select this option if you will run multiple copies of this step and you want each copy to read a separate part of the CSV file (or files).
When reading multiple files, the step uses the total size of all files to split the workload. Make sure all step copies receive the complete file list. Otherwise, the parallel algorithm might not work as expected.
Caution: Parallel reading is supported only for files that do not contain fields with line breaks or carriage returns.
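The split can be pictured as dividing the total byte size by the number of step copies, with each copy starting at the next line boundary after its block begins. A simplified Python sketch of the concept (PDI's actual algorithm also handles multiple files and header rows):

```python
import os

def block_for_copy(path, copy_nr, n_copies):
    """Return the (start, end) byte range a given step copy reads.
    A copy whose block starts mid-line skips forward to the next line
    boundary; the previous copy finishes that straddling line.
    Assumes no newlines or carriage returns inside fields."""
    size = os.path.getsize(path)
    block = size // n_copies
    start = copy_nr * block
    end = size if copy_nr == n_copies - 1 else start + block
    if start > 0:
        with open(path, "rb") as f:
            f.seek(start - 1)
            f.readline()      # consume the line straddling the boundary
            start = f.tell()  # no-op if the block began on a boundary
    return start, end

# Example: two copies over a small file of four 4-byte lines.
with open("demo.csv", "wb") as f:
    f.write(b"a,1\nb,2\nc,3\nd,4\n")
print(block_for_copy("demo.csv", 0, 2))  # (0, 8)
print(block_for_copy("demo.csv", 1, 2))  # (8, 16)
```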
New line possible in fields?
Select this option if fields can contain newline characters.
Format
Select the file line-ending format: DOS, UNIX, or mixed. UNIX files end lines with line feeds. DOS files use carriage returns and line feeds. If you select mixed, the step does not verify line endings.
File encoding
Specify the source file encoding.
Fields
Use the Fields table to define which fields to read from your CSV file.
Select Get fields to populate the table based on the current settings (such as Delimiter and Enclosure).
Select Preview to review the output rows.
The table contains the following columns.
Name
The field name.
Type
The field type (for example, String, Date, or Number).
Format
Optional format mask to convert the field value. See Common Formats for common date and numeric formats.
Length
Field length by type:
Number: total number of significant figures.
String: total string length.
Date: length of printed output (for example, yyyy for a four-digit year).
Precision
Number of floating-point digits for number fields.
Currency
Currency symbol (for example, $ or €).
Decimal
Decimal symbol (dot . or comma ,).
Group
Grouping symbol (comma , or dot .).
Trim Type
Trimming method applied to strings.
For guidance on choosing data types and field metadata, see Understanding PDI data types and field metadata.
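PDI format masks follow Java conventions (for example, yyyy/MM/dd for a Date field). As an illustration, the equivalent parsing in Python uses strptime codes instead; the sample values below are made up:

```python
from datetime import datetime

# Java-style mask 'yyyy/MM/dd' corresponds to '%Y/%m/%d' in Python.
value = datetime.strptime("2024/03/15", "%Y/%m/%d")
print(value.date())  # 2024-03-15

# A Number field with Group ',' and Decimal '.' is read by
# stripping the grouping symbol before conversion.
amount = float("1,250.75".replace(",", ""))
print(amount)  # 1250.75
```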
Metadata injection support
You can use ETL metadata injection with CSV File Input to pass metadata to your transformation at runtime.
The following CSV File Input fields support metadata injection:
Options: Filename, Delimiter, Enclosure, NIO buffer size, Lazy conversion?, Header row present?, Add filename to result, The row number field name, Running in parallel?, and File encoding.
Values: Name, Length, Decimal, Type, Precision, Group, Format, Currency, and Trim Type.