Text File Input

The Text File Input step reads data from a variety of text file types, including formats generated by spreadsheets and fixed-width flat files.

You can:

  • Read from a list of files or directories.

  • Use regular expressions to include or exclude files.

  • Accept file names from previous steps.

Step name and preview

  • Step name: Specifies the unique name of the step on the canvas. You can change it.

  • Preview rows: Displays the rows generated by this step based on your configuration. Use preview to validate that the configuration matches the rows you intend to read.

Configure the step (tabs)

The Text File Input step includes these tabs:

  • File

  • Content

  • Error Handling

  • Filters

  • Fields

  • Additional output fields

File tab

Use the File tab to specify the input file(s).

Option
Description

File or directory

Source file or directory. Select Browse to locate the file or folder, then select Add to include it in Selected files. For supported file system types, see Connecting to Virtual File Systems.

Regular expression

Regular expression to match files within the specified directory.

Exclude regular expression

Regular expression to exclude files within the specified directory.

Regular expression examples

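Enter a regular expression in the Regular expression field. It is shown in the Wildcard (RegExp) column of the Selected files table. The following examples show which files each expression selects.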

Directory
Regular expression
Files selected

/dirA/

.*userdata.*\.txt

All files in /dirA/ with names containing userdata and ending with .txt.

/dirB/

AAA.*

All files in /dirB/ with names starting with AAA.

/dirC/

[A-Z][0-9].*

All files in /dirC/ with names that start with a capital letter followed by a digit (A0–Z9).
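To try an expression before adding it, you can test it outside PDI. The following Python sketch applies an include expression and an optional exclude expression to a list of file names. It assumes, without confirmation from this page, that an expression must match the whole file name (as Java's Matcher.matches() does), so a pattern needs a leading or trailing .* to allow other characters. The helper select_files and the sample file names are hypothetical; Python and Java regular expressions agree for simple patterns like these.

```python
import re


def select_files(names, include=None, exclude=None):
    """Return names that fully match `include` and do not fully match `exclude`.

    Assumption: matching is against the whole file name, not a substring.
    """
    selected = []
    for name in names:
        if include and not re.fullmatch(include, name):
            continue  # does not match the Regular expression
        if exclude and re.fullmatch(exclude, name):
            continue  # matches the Exclude regular expression
        selected.append(name)
    return selected


files = ["userdata_2024.txt", "userdata.csv", "AAA_report.log",
         "B7_notes.txt", "b7_notes.txt"]
print(select_files(files, r".*userdata.*\.txt"))             # ['userdata_2024.txt']
print(select_files(files, r"AAA.*"))                         # ['AAA_report.log']
print(select_files(files, r"[A-Z][0-9].*"))                  # ['B7_notes.txt']
print(select_files(files, r".*\.txt", exclude=r"[a-z].*"))   # ['B7_notes.txt']
```

The last call shows how an exclude expression removes files that the include expression already selected.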

Selected files table

The Selected files table is populated when you select Add after specifying File or directory.

Column
Description

File/Directory

Source location from File or directory.

Wildcard (RegExp)

Regular expression used to match file names within a directory.

Exclude wildcard

Regular expression used to exclude file names.

Required

Whether the source is required.

Include subfolders

Whether subfolders are included.

Select Delete to remove a source from the table. Select Edit to remove a source from the table and return it to File or directory.

Accept file names from previous steps

Use these options to read the file name from the incoming stream.

Option
Description

Accept filenames from previous step

Gets file names from a previous step.

Pass through fields from previous step

Passes fields from the previous step through this step unchanged.

Step to read file names from

The step that provides the file name(s).

Field in the input to use as filename

The field that contains the file name to read.

Show action buttons

After you configure sources, you can inspect the resolved file list and sample content.

Button
Description

Show filename(s)

Lists the files that the current source definitions resolve to.

Show file content

Shows raw content of the selected file.

Show content from first data line

Shows content starting at the first data line for the selected file.

Content tab

Use the Content tab to describe the file format.

Option
Description

Filetype

Select CSV or Fixed length. The behavior of Get Fields on the Fields tab depends on this selection.

Separator

Field delimiter (commonly semicolon or tab). Select Insert Tab to insert a tab character. Default: semicolon (;).

Enclosure

Optional enclosure character used when a field contains the separator character. Default: double quote (").

Allow breaks in enclosed fields

Not implemented.

Escape

Escape character(s) indicating that the next character is literal. Example: with escape \ and enclosure ', the enclosed text 'Not the nine o\'clock news' is parsed as Not the nine o'clock news.

Header

Indicates the file has header lines. Use Number of header lines to specify how many.

Footer

Indicates the file has footer lines. Use Number of footer lines to specify how many.

Wrapped lines

Indicates that data lines wrap beyond a page limit. Use Number of times wrapped to specify how many times each data line wraps.

Paged layout (printout)

Use for files designed for line printers. Use Document header lines and Number of lines per page to position data lines.

Compression

Select when the source is in a ZIP or GZip archive. Only the first file in the archive is read.

No empty rows

Do not send empty rows to downstream steps.

Include filename in output

Adds the file name to the output, in the field named by Filename fieldname.

Rownum in output

Adds the row number to the output, in the field named by Rownum fieldname. Select Rownum by file to restart numbering for each file.

Format

Line ending format: DOS (lines end with CR LF), UNIX (lines end with LF), or mixed. If mixed, no verification is performed.

Encoding

File encoding. Leave blank to use the system default. To use Unicode, specify UTF-8 or UTF-16.

Length

Length unit for fields: Characters or Bytes.

Limit

Limits the number of records generated. 0 means unlimited.

Be lenient when parsing dates?

When selected, invalid dates can be normalized (for example, Jan 32nd becomes Feb 1st). Clear for strict parsing.

The date format Locale

Locale to use when parsing dates written in full (for example, February 2nd, 2006).

Add filenames to result

Adds file names to the transformation result file list.
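Two of the Content options above can be illustrated with a short Python sketch: splitting a line with a separator, enclosure, and escape character, and lenient date handling. Python's csv module stands in for the step's parser here, so edge cases (for example, an unterminated enclosure) may behave differently in PDI. The helpers split_line and lenient_date are hypothetical, and lenient_date is a simplified model that only lets the day of month overflow.

```python
import csv
import datetime


def split_line(line, separator=";", enclosure='"', escape=None):
    """Split one delimited line into field strings using the csv module."""
    reader = csv.reader([line], delimiter=separator, quotechar=enclosure,
                        escapechar=escape)
    return next(reader)


def lenient_date(year, month, day):
    """Roll an out-of-range day of month into the following days.

    Simplified model of lenient parsing: the month must still be 1-12.
    """
    return datetime.date(year, month, 1) + datetime.timedelta(days=day - 1)


# Escape \ and enclosure ', as in the Escape description.
fields = split_line("'Not the nine o\\'clock news';42", enclosure="'", escape="\\")
print(fields)                      # ["Not the nine o'clock news", '42']
print(lenient_date(2006, 1, 32))   # 2006-02-01 (Jan 32nd becomes Feb 1st)
```

With strict parsing (Be lenient when parsing dates? cleared), a value like Jan 32nd would be rejected instead of normalized.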

Error Handling tab

Use the Error Handling tab to control parsing behavior when the step encounters malformed records or unexpected file content.

Option
Description

Ignore errors?

Ignores errors during parsing.

Skip error files?

Skips files that contain errors. Optionally generates a file listing the files where errors occur.

Error file field name

Output field name to capture the error file name.

File error message field name

Output field name for the error message associated with the error file.

Skip error lines?

Skips lines that contain errors. Optionally generates a file listing the failing line numbers.

Error count fieldname

Output field name for the number of errors on the line.

Error fields fieldname

Output field name for the names of the fields where errors occurred.

Error text fieldname

Output field name for descriptions of parsing errors.

Warning files directory

Directory for warning files. File name format: <warning dir>/<filename>.<date_time>.<warning extension>.

Error files directory

Directory for error files. File name format: <error dir>/<filename>.<date_time>.<error extension>.

Failing line numbers files directory

Directory for failing line numbers files. File name format: <error line dir>/<filename>.<date_time>.<error line extension>.
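The three file name formats share one pattern: directory, source file name, timestamp, extension. The Python sketch below assembles such a path. It assumes that <filename> is the source file's base name and uses a hypothetical timestamp pattern; PDI's actual <date_time> format is not documented on this page. The helper side_file_path and the "err" extension are illustrative only.

```python
import datetime
import posixpath


def side_file_path(directory, source_file, extension, when):
    """Build <directory>/<filename>.<date_time>.<extension>.

    Assumptions: <filename> is the base name of the source file, and the
    strftime pattern below is a stand-in for PDI's real timestamp format.
    """
    stamp = when.strftime("%Y%m%d-%H%M%S")  # hypothetical timestamp format
    name = posixpath.basename(source_file)
    return posixpath.join(directory, f"{name}.{stamp}.{extension}")


run = datetime.datetime(2024, 5, 1, 13, 45, 0)
print(side_file_path("/var/pdi/errors", "/data/in/sales.txt", "err", run))
# /var/pdi/errors/sales.txt.20240501-134500.err
```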

Filters tab

Use the Filters tab to skip specific lines in the source file.

Column
Description

Filter string

String to search for.

Filter position

Position where the filter string must appear. 0 is the first position. Values below 0 search the entire line.

Stop on filter

Y stops processing the current file when the filter string is found. N continues processing.

Positive match

Y processes only lines that match the filter. N skips lines that match the filter.
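The following Python sketch models a single filter row applied to the lines of one file, using the Filter position, Stop on filter, and Positive match semantics described above. It is a simplified model, not PDI's implementation: it handles one rule at a time, compares case-sensitively, and assumes the line that triggers Stop on filter is not emitted. The helpers filter_hit and read_with_filter are hypothetical.

```python
def filter_hit(line, text, position):
    """True if `text` occurs at 0-based `position`, or anywhere if position < 0."""
    if position < 0:
        return text in line
    return line.startswith(text, position)  # assumption: case-sensitive


def read_with_filter(lines, text, position, stop_on_filter, positive_match):
    """Apply one filter rule to the lines of a single file."""
    kept = []
    for line in lines:
        hit = filter_hit(line, text, position)
        if hit and stop_on_filter:
            break  # assumption: the stopping line itself is not emitted
        keep = hit if positive_match else not hit
        if keep:
            kept.append(line)
    return kept


lines = ["H;header", "D;1;apple", "D;2;pear", "T;trailer", "D;3;plum"]
print(read_with_filter(lines, "D;", 0, False, True))     # data lines only
print(read_with_filter(lines, "T;", 0, True, False))     # stop at the trailer
print(read_with_filter(lines, "pear", -1, False, False)) # drop lines containing "pear"
```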

Fields tab

Use the Fields tab to define the fields to read from each line.

  • Select Get Fields to auto-populate fields based on your current Filetype, delimiter/enclosure settings (for CSV), and/or fixed-length configuration.

  • Select Preview to validate parsing.


When Filetype is Fixed length, you typically define field positions and lengths. When Filetype is CSV, you typically define field types and conversion formats.

For guidance on choosing data types and field metadata, see Understanding PDI data types and field metadata.
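For Fixed length files, each field is cut from the line by position and length. The Python sketch below shows that idea and how the Length setting on the Content tab (Characters or Bytes) changes what a width counts when the file contains multi-byte characters. The helper split_fixed is hypothetical; trimming trailing padding and assuming no field boundary falls inside a multi-byte character are simplifications.

```python
def split_fixed(line, widths, unit="characters", encoding="utf-8"):
    """Cut a fixed-length line into consecutive fields of the given widths.

    unit="characters" counts characters; unit="bytes" counts encoded bytes.
    Assumes no field boundary falls inside a multi-byte character.
    """
    data = line if unit == "characters" else line.encode(encoding)
    fields, pos = [], 0
    for width in widths:
        chunk = data[pos:pos + width]
        if unit == "bytes":
            chunk = chunk.decode(encoding)
        fields.append(chunk.rstrip())  # assumption: trailing padding is trimmed
        pos += width
    return fields


line = "Zo\u00eb  0042"  # "ë" is 1 character but 2 bytes in UTF-8
print(split_fixed(line, [5, 4]))                # ['Zoë', '0042']
print(split_fixed(line, [6, 4], unit="bytes"))  # ['Zoë', '0042']
print(split_fixed(line, [5, 4], unit="bytes"))  # ['Zoë', ' 004'] (widths off by one byte)
```

The last call shows why field lengths must be defined in the same unit that Length specifies.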

Additional output fields tab

Use the Additional output fields tab to add file metadata to the output.

Option
Description

Short filename field

File name without path, with extension.

Extension field

File name extension.

Path field

Path in operating system format.

Size field

File size.

Is hidden field

Whether the file is hidden (Boolean).

Last modification field

Last modified date/time.

Uri field

File URI.

Root uri field

Root part of the URI.
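To see roughly what these fields contain, the Python sketch below derives similar values for a local file. It is an approximation under stated assumptions, not PDI's implementation: Path is taken to be the parent directory, Is hidden uses only the Unix dot-file convention, Extension omits the leading dot, and Root uri is the file: URI of the filesystem root. The helper additional_fields is hypothetical.

```python
import datetime
import tempfile
from pathlib import Path


def additional_fields(path_str):
    """Derive values like the Additional output fields for one local file."""
    p = Path(path_str).resolve()
    st = p.stat()
    return {
        "short_filename": p.name,               # name with extension, no path
        "extension": p.suffix.lstrip("."),      # assumption: no leading dot
        "path": str(p.parent),                  # assumption: parent directory, OS separators
        "size": st.st_size,                     # bytes
        "is_hidden": p.name.startswith("."),    # assumption: Unix dot-file rule only
        "last_modified": datetime.datetime.fromtimestamp(st.st_mtime),
        "uri": p.as_uri(),
        "root_uri": Path(p.anchor).as_uri(),    # e.g. file:/// on Unix
    }


with tempfile.TemporaryDirectory() as tmp:
    sample = Path(tmp, "sales.txt")
    sample.write_bytes(b"id;name\n")
    meta = additional_fields(str(sample))
    print(meta["short_filename"], meta["extension"], meta["size"])  # sales.txt txt 8
```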

Metadata injection support

This step supports metadata injection. You can use it with ETL metadata injection to pass metadata to your transformation at runtime.
