Text File Input
The Text File Input step reads data from a variety of text file types, including formats generated by spreadsheets and fixed-width flat files.
You can:
Read from a list of files or directories.
Use regular expressions to include or exclude files.
Accept file names from previous steps.
Step name and preview
Step name: Specifies the unique name of the step on the canvas. You can change it.
Preview rows: Displays the rows generated by this step based on your configuration. Use preview to validate that the configuration matches the rows you intend to read.
Configure the step (tabs)
The Text File Input step includes these tabs:
File
Content
Error Handling
Filters
Fields
Additional output fields
File tab
Use the File tab to specify the input file(s).
File or directory
Source file or directory. Select Browse to locate the file or folder, then select Add to include it in Selected files. For supported file system types, see Connecting to Virtual File Systems.
Regular expression
Regular expression to match files within the specified directory.
Exclude regular expression
Regular expression to exclude files within the specified directory.
Regular expression examples
You can use the Wildcard (RegExp) field to search using regular expressions.
File or directory: /dirA/
Wildcard (RegExp): .*userdata.*\.txt
Files selected: All files in /dirA/ with names containing userdata and ending with .txt.
File or directory: /dirB/
Wildcard (RegExp): AAA.*
Files selected: All files in /dirB/ with names starting with AAA.
File or directory: /dirC/
Wildcard (RegExp): [A-Z][0-9].*
Files selected: All files in /dirC/ with names that start with a capital letter followed by a digit (A0–Z9).
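The examples above can be checked outside the step with a short Python sketch. It assumes that a pattern must match the whole file name rather than part of it, and that the exclude pattern is applied after the include pattern. The helper `select_files` and the sample file names are hypothetical, and Python's `re` module stands in for the regex engine the step uses.

```python
import re

def select_files(names, include, exclude=None):
    """Return the names fully matched by `include` and not by `exclude`."""
    inc = re.compile(include)
    exc = re.compile(exclude) if exclude else None
    return [
        name for name in names
        if inc.fullmatch(name) and not (exc and exc.fullmatch(name))
    ]

# Hypothetical directory listing used only for illustration.
names = [
    "2024_userdata_eu.txt",
    "userdata.csv",
    "AAA_report.log",
    "B7_notes.txt",
    "b7_notes.txt",
]

print(select_files(names, r".*userdata.*\.txt"))  # contains userdata, ends with .txt
print(select_files(names, r"AAA.*"))              # starts with AAA
print(select_files(names, r"[A-Z][0-9].*"))       # capital letter, then a digit
print(select_files(names, r".*\.txt", exclude=r".*userdata.*"))
```

Testing a pattern this way before entering it in Wildcard (RegExp) helps catch a missing `.*` or an unescaped dot early.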
Selected files table
The Selected files table is populated when you select Add after specifying File or directory.
File/Directory
Source location from File or directory.
Wildcard (RegExp)
Regular expression used to match file names within a directory.
Exclude wildcard
Regular expression used to exclude file names.
Required
Whether the source is required.
Include subfolders
Whether subfolders are included.
Select Delete to remove a source from the table. Select Edit to remove a source from the table and return it to File or directory.
Accept file names from previous steps
Use these options to read the file name from the incoming stream.
Accept filenames from previous step
Gets file names from a previous step.
Pass through fields from previous step
Passes fields from the previous step through this step unchanged.
Step to read file names from
The step that provides the file name(s).
Field in the input to use as filename
The field that contains the file name to read.
Show action buttons
After you configure sources, you can inspect the resolved file list and sample content.
Show filename(s)
Shows the file names of sources connected to the step.
Show file content
Shows raw content of the selected file.
Show content from first data line
Shows content starting at the first data line for the selected file.
Content tab
Use the Content tab to describe the file format.
Filetype
Select CSV or Fixed length. Based on this selection, the Get Fields behavior in the Fields tab changes.
Separator
Field delimiter (commonly semicolon or tab). Select Insert Tab to insert a tab character. The default is a semicolon (;).
Enclosure
Optional enclosure character used when a field contains the separator character. The default is a double quote (").
Allow breaks in enclosed fields
This option is not implemented.
Escape
Escape character(s) indicating that the next character is literal. Example: with escape \ and enclosure ', the text Not the nine o\'clock news is parsed as Not the nine o'clock news.
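How the separator, enclosure, and escape settings interact can be illustrated with Python's standard `csv` module. This is a sketch of the general technique, not the step's own parser: the sample line is made up, and `csv.reader` may differ from the step in edge cases.

```python
import csv

# One raw line using ; as separator, ' as enclosure, and \ as escape.
line = "1;'Not the nine o\\'clock news';'semi;colon inside'"

reader = csv.reader([line], delimiter=";", quotechar="'", escapechar="\\")
fields = next(reader)

print(fields)
# The escaped enclosure stays inside the second field, and the separator
# inside the third field does not split it.
```

The enclosure protects the separator inside a field, and the escape character protects the enclosure itself.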
Header
Indicates the file has header lines. Use Number of header lines to specify how many.
Footer
Indicates the file has footer lines. Use Number of footer lines to specify how many.
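As a minimal sketch of what the Header and Footer settings mean for the rows that are read, the following Python function drops a fixed number of lines from the start and end of a file. The function name `data_lines` and the sample content are hypothetical.

```python
def data_lines(lines, header_lines=0, footer_lines=0):
    """Return only the data lines, dropping header and footer lines."""
    end = max(len(lines) - footer_lines, 0)
    return lines[header_lines:end]

# Hypothetical file content: one header line, two data lines, one footer line.
lines = ["id;name", "1;Ada", "2;Linus", "TOTAL;2"]

print(data_lines(lines, header_lines=1, footer_lines=1))
```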
Wrapped lines
Indicates that data lines wrap beyond a page limit. Use Number of times wrapped to specify how many times a line wraps.
Paged layout (printout)
Use for files designed for line printers. Use Document header lines and Number of lines per page to position data lines.
Compression
Select when the source is in a ZIP or GZip archive. Only the first file in the archive is read.
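The "only the first file in the archive is read" behavior can be pictured with Python's `zipfile` and `gzip` modules. This sketch assumes "first" means the first entry in the archive's own listing order; the helper name and the in-memory sample archive are hypothetical.

```python
import gzip
import io
import zipfile

def read_first_zip_entry(data, encoding="utf-8"):
    """Return (entry name, text) for the first entry of a ZIP archive."""
    with zipfile.ZipFile(io.BytesIO(data)) as archive:
        first = archive.namelist()[0]
        return first, archive.read(first).decode(encoding)

# Build a small ZIP archive in memory with two entries.
buffer = io.BytesIO()
with zipfile.ZipFile(buffer, "w") as archive:
    archive.writestr("a.csv", "id;name\n1;Ada\n")
    archive.writestr("b.csv", "id;name\n2;Linus\n")

entry_name, entry_text = read_first_zip_entry(buffer.getvalue())
print(entry_name)  # b.csv is never read

# A GZip stream holds a single compressed payload, so there is nothing to pick.
gz_text = gzip.decompress(gzip.compress(b"id;name\n3;Grace\n")).decode("utf-8")
print(gz_text)
```

If an archive holds several input files, extract them first or package each file separately so that none is silently skipped.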
No empty rows
Do not send empty rows to downstream steps.
Include filename in output
Adds the file name to the output. Specify the output field name in Filename fieldname.
Rownum in output
Adds the row number to the output. Specify the output field name in Rownum fieldname. Select Rownum by file to reset the counter for each file.
Format
Line ending format: DOS, UNIX, or mixed. If mixed, no verification is performed.
Encoding
File encoding. Leave blank to use the system default. To use Unicode, specify UTF-8 or UTF-16.
Length
Length unit for fields: Characters or Bytes.
Limit
Limits the number of records generated. 0 means unlimited.
Be lenient when parsing dates?
When selected, invalid dates can be normalized (for example, Jan 32nd becomes Feb 1st). Clear for strict parsing.
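The difference between lenient and strict date parsing can be sketched in Python. Python's `datetime` has no lenient mode, so the sketch emulates the rollover arithmetic by hand; the function `resolve_date` is hypothetical and the year 2006 is chosen only for illustration.

```python
from datetime import date, timedelta

def resolve_date(year, month, day, lenient=True):
    """Build a date; in lenient mode, overflow days roll into later months."""
    if lenient:
        # Start at the first of the month and add the remaining days.
        return date(year, month, 1) + timedelta(days=day - 1)
    # Strict mode: date() raises ValueError for an impossible day.
    return date(year, month, day)

print(resolve_date(2006, 1, 32))  # 2006-02-01

try:
    resolve_date(2006, 1, 32, lenient=False)
except ValueError as error:
    print("strict parsing rejected the value:", error)
```

Lenient parsing keeps rows flowing but can hide bad input, so strict parsing is usually the safer choice when the source data is expected to be clean.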
The date format Locale
Locale to use when parsing dates written in full (for example, February 2nd, 2006).
Add filenames to result
Adds file names to the transformation result file list.
Error Handling tab
Use the Error Handling tab to control parsing behavior when the step encounters malformed records or unexpected file content.
Ignore errors?
Ignores errors during parsing.
Skip error files?
Skips files that contain errors. Optionally generates a file listing the files where errors occur.
Error file field name
Output field name to capture the error file name.
File error message field name
Output field name to capture the error message in the error file.
Skip error lines?
Skips lines that contain errors. Optionally generates a file listing the failing line numbers.
Error count fieldname
Output field name for the number of errors on the line.
Error fields fieldname
Output field name for the names of the fields where errors occurred.
Error text fieldname
Output field name for descriptions of parsing errors.
Warning files directory
Directory for warning files. File name format: <warning dir>/filename.<date_time>.<warning extension>.
Error files directory
Directory for error files. File name format: <errorfile_dir>/filename.<date_time>.<errorfile_extension>.
Failing line numbers files directory
Directory for failing line numbers files. File name format: <errorline dir>/filename.<date_time>.<errorline extension>.
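The file name formats above all follow the same directory/filename.date_time.extension pattern, which the following Python sketch assembles. The timestamp layout, the choice to keep the source file's own extension in the name, and the helper `side_file_path` are all assumptions made for illustration; check the files the step actually writes for the exact format.

```python
from datetime import datetime
from pathlib import PurePosixPath

def side_file_path(directory, source_file, extension, now,
                   stamp_format="%Y%m%d-%H%M%S"):
    """Build <directory>/<filename>.<date_time>.<extension> (assumed layout)."""
    stamp = now.strftime(stamp_format)  # timestamp format is an assumption
    name = f"{PurePosixPath(source_file).name}.{stamp}.{extension}"
    return str(PurePosixPath(directory) / name)

example = side_file_path(
    "/tmp/errors", "/data/in/orders.txt", "error",
    datetime(2024, 1, 15, 9, 30, 0),
)
print(example)  # /tmp/errors/orders.txt.20240115-093000.error
```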
Filters tab
Use the Filters tab to skip specific lines in the source file.
Filter string
String to search for.
Filter position
Position where the filter string must appear. 0 is the first position in the line. A value below 0 matches the filter string anywhere in the line.
Stop on filter
Y stops processing the current file when the filter string is encountered. N continues processing.
Positive match
Y processes matching lines. N ignores matching lines.
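The four filter settings can be read as a small decision procedure. The Python sketch below models a single filter under stated assumptions: with Positive match set to Y only matching lines are kept, with N only non-matching lines are kept, and Stop on filter ends the file at the first matching line without emitting it. The function names are hypothetical, and the step's behavior with several filters combined is not modeled.

```python
def line_matches(line, filter_string, position):
    """Check the filter string at a fixed position, or anywhere if position < 0."""
    if position < 0:
        return filter_string in line
    return line.startswith(filter_string, position)

def apply_filter(lines, filter_string, position=-1,
                 stop_on_filter=False, positive_match=False):
    """Return the lines that survive one filter (assumed semantics)."""
    kept = []
    for line in lines:
        matched = line_matches(line, filter_string, position)
        if matched and stop_on_filter:
            break  # stop processing the current file
        if matched == positive_match:
            kept.append(line)
    return kept

lines = ["# comment", "1;Ada", "2;Linus", "END", "3;ignored"]

print(apply_filter(lines, "#", position=0))                         # drop comment lines
print(apply_filter(lines, "END", position=0, stop_on_filter=True))  # stop at END
print(apply_filter(lines, ";", position=-1, positive_match=True))   # keep lines with ;
```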
Fields tab
Use the Fields tab to define the fields to read from each line.
Select Get Fields to auto-populate fields based on your current Filetype, delimiter/enclosure settings (for CSV), and/or fixed-length configuration.
Select Preview to validate parsing.
When Filetype is Fixed length, you typically define field positions and lengths. When Filetype is CSV, you typically define field types and conversion formats.
For guidance on choosing data types and field metadata, see Understanding PDI data types and field metadata.
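For the Fixed length case, defining a field by position and length amounts to slicing each line. The following Python sketch shows the idea; the field layout, the zero-based positions, the trimming of padding, and the helper `parse_fixed` are assumptions for illustration.

```python
def parse_fixed(line, fields):
    """Slice a fixed-width line into named fields; each field is (name, position, length)."""
    row = {}
    for name, position, length in fields:
        # Trimming the padding is an assumption; it corresponds to a trim setting.
        row[name] = line[position:position + length].strip()
    return row

# Hypothetical layout: id in columns 0-3, name in 4-13, city in 14-21.
fields = [("id", 0, 4), ("name", 4, 10), ("city", 14, 8)]
line = "0042" + "Ada".ljust(10) + "London".ljust(8)

print(parse_fixed(line, fields))
```

Values come out as strings here; converting them to numbers or dates is the job of the field type and format settings.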
Additional output fields tab
Use the Additional output fields tab to add file metadata to the output.
Short filename field
File name without path, with extension.
Extension field
File name extension.
Path field
Path in operating system format.
Size field
File size.
Is hidden field
Whether the file is hidden (Boolean).
Last modification field
Last modified date/time.
Uri field
File URI.
Root uri field
Root part of the URI.
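To give a feel for the values these fields carry, the Python sketch below derives comparable metadata with `pathlib`. The dictionary keys are hypothetical names, not the step's field names. Whether the step reports the extension with or without the leading dot is not stated above, so the sketch strips it. Treating dot-prefixed names as hidden is a Unix convention, and the root URI is omitted.

```python
import tempfile
from datetime import datetime
from pathlib import Path

def file_metadata(path):
    """Collect file metadata similar to the additional output fields."""
    p = Path(path).resolve()
    info = p.stat()
    return {
        "short_filename": p.name,              # name with extension, no path
        "extension": p.suffix.lstrip("."),     # assumed: without the dot
        "path": str(p.parent),                 # operating system format
        "size": info.st_size,                  # bytes
        "is_hidden": p.name.startswith("."),   # Unix convention only
        "last_modified": datetime.fromtimestamp(info.st_mtime),
        "uri": p.as_uri(),
    }

# Create a small temporary file so the example is self-contained.
with tempfile.TemporaryDirectory() as folder:
    sample = Path(folder) / "orders.txt"
    sample.write_text("1;Ada", encoding="utf-8")
    meta = file_metadata(sample)

for key, value in meta.items():
    print(key, "=", value)
```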
Metadata injection support
This step supports metadata injection. You can use it with ETL metadata injection to pass metadata to your transformation at runtime.

