# Text File Input

The **Text File Input** step reads data from a variety of text file types, including formats generated by spreadsheets and fixed-width flat files.

You can:

* Read from a list of files or directories.
* Use regular expressions to include or exclude files.
* Accept file names from previous steps.

### Step name and preview

* **Step name**: Specifies the unique name of the step on the canvas. You can change it.
* **Preview rows**: Displays the rows generated by this step based on your configuration. Use preview to validate that the configuration matches the rows you intend to read.

### Configure the step (tabs)

The **Text File Input** step includes these tabs:

* **File**
* **Content**
* **Error Handling**
* **Filters**
* **Fields**
* **Additional output fields**

#### File tab

Use the **File** tab to specify the input file(s).

| Option                         | Description                                                                                                                                                                                                                                                                                   |
| ------------------------------ | --------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------- |
| **File or directory**          | Source file or directory. Select **Browse** to locate the file or folder, then select **Add** to include it in **Selected files**. For supported file system types, see [Connecting to Virtual File Systems](/pdia-data-integration/extracting-data-into-pdi/virtual-file-system-browser.md). |
| **Regular expression**         | Regular expression to match files within the specified directory.                                                                                                                                                                                                                             |
| **Exclude regular expression** | Regular expression to exclude files within the specified directory.                                                                                                                                                                                                                           |

**Regular expression examples**

The **Regular expression** value (shown as **Wildcard (RegExp)** in the **Selected files** table) selects files by name within the specified directory.

| File name | Regular expression               | Files selected                                                                                     |
| --------- | -------------------------------- | -------------------------------------------------------------------------------------------------- |
| `/dirA/`  | `.*userdata.*\.txt`              | All files in `/dirA/` with names containing `userdata` and ending with `.txt`.                     |
| `/dirB/`  | `AAA.*`                          | All files in `/dirB/` with names starting with `AAA`.                                              |
| `/dirC/`  | `[A-Z][0-9].*`                   | All files in `/dirC/` with names that start with a capital letter followed by a digit (`A0`–`Z9`). |
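To check a pattern before running the transformation, you can test it against sample file names outside PDI. The following Python sketch uses the plain-regex forms that correspond to the **Files selected** descriptions in the table. It assumes that the pattern must match the whole file name (not just part of it) and that the exclude pattern is applied after the include pattern; `select_files` and the sample names are hypothetical and are not part of PDI.

```python
import re


def select_files(names, include, exclude=None):
    """Return the names that fully match `include` and do not fully match `exclude`.

    Assumption: the step matches the pattern against the whole file name.
    """
    include_re = re.compile(include)
    exclude_re = re.compile(exclude) if exclude else None
    selected = []
    for name in names:
        if not include_re.fullmatch(name):
            continue
        if exclude_re and exclude_re.fullmatch(name):
            continue
        selected.append(name)
    return selected


# Hypothetical directory listing used only for illustration.
names = [
    "userdata_2024.txt",
    "my_userdata.txt.bak",
    "AAA_report.csv",
    "B7_notes.txt",
    "b7_notes.txt",
]

print(select_files(names, r".*userdata.*\.txt"))
print(select_files(names, r"AAA.*"))
print(select_files(names, r"[A-Z][0-9].*"))
print(select_files(names, r".*\.txt", exclude=r"b7.*"))
```

Under these assumptions, `my_userdata.txt.bak` is not selected by the first pattern because the name does not end with `.txt`.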

**Selected files table**

The **Selected files** table is populated when you select **Add** after specifying **File or directory**.

| Column                 | Description                                                     |
| ---------------------- | --------------------------------------------------------------- |
| **File/Directory**     | Source location from **File or directory**.                     |
| **Wildcard (RegExp)**  | Regular expression used to match file names within a directory. |
| **Exclude wildcard**   | Regular expression used to exclude file names.                  |
| **Required**           | Whether the source is required.                                 |
| **Include subfolders** | Whether subfolders are included.                                |

Select **Delete** to remove a source from the table. Select **Edit** to remove a source from the table and return it to **File or directory**.

**Accept file names from previous steps**

Use these options to read the file name from the incoming stream.

| Option                                     | Description                                                       |
| ------------------------------------------ | ----------------------------------------------------------------- |
| **Accept filenames from previous step**    | Gets file names from a previous step.                             |
| **Pass through fields from previous step** | Passes fields from the previous step through this step unchanged. |
| **Step to read file names from**           | The step that provides the file name(s).                          |
| **Field in the input to use as filename**  | The field that contains the file name to read.                    |

**Show action buttons**

After you configure sources, you can inspect the resolved file list and sample content.

| Button                                | Description                                                          |
| ------------------------------------- | -------------------------------------------------------------------- |
| **Show filename(s)**                  | Shows the file names of sources connected to the step.               |
| **Show file content**                 | Shows raw content of the selected file.                              |
| **Show content from first data line** | Shows content starting at the first data line for the selected file. |

#### Content tab

Use the **Content** tab to describe the file format.

| Option                              | Description                                                                                                                                                                                 |
| ----------------------------------- | ------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------- |
| **Filetype**                        | Select **CSV** or **Fixed length**. Based on this selection, the **Get Fields** behavior in the **Fields** tab changes.                                                                     |
| **Separator**                       | Field delimiter (commonly semicolon or tab). Select **Insert Tab** to insert a tab character. Default: `;`.                                                                                 |
| **Enclosure**                       | Optional enclosure character used when a field contains the separator character. Default: `"`.                                                                                              |
| **Allow breaks in enclosed fields** | Not implemented.                                                                                                                                                                            |
| **Escape**                          | Escape character(s) indicating the next character is literal. Example: with escape `\` and enclosure `'`, the text `Not the nine o\'clock news` is parsed as `Not the nine o'clock news`. |
| **Header**                          | Indicates the file has header lines. Use **Number of header lines** to specify how many.                                                                                                    |
| **Footer**                          | Indicates the file has footer lines. Use **Number of footer lines** to specify how many.                                                                                                    |
| **Wrapped lines**                   | Indicates data lines wrap beyond a page limit. Use **Number of times wrapped**.                                                                                                             |
| **Paged layout (printout)**         | Use for files designed for line printers. Use **Document header lines** and **Number of lines per page** to position data lines.                                                            |
| **Compression**                     | Select when the source is in a ZIP or GZip archive. Only the first file in the archive is read.                                                                                             |
| **No empty rows**                   | Do not send empty rows to downstream steps.                                                                                                                                                 |
| **Include filename in output**      | Adds file name to the output. Specify **Filename fieldname**.                                                                                                                               |
| **Rownum in output**                | Adds row number to the output. Specify **Rownum fieldname**. Select **Rownum by file** to reset per file.                                                                                   |
| **Format**                          | Line ending format: **DOS**, **UNIX**, or **mixed**. If **mixed**, no verification is performed.                                                                                            |
| **Encoding**                        | File encoding. Leave blank to use the system default. To use Unicode, specify `UTF-8` or `UTF-16`.                                                                                          |
| **Length**                          | Length unit for fields: **Characters** or **Bytes**.                                                                                                                                        |
| **Limit**                           | Limits the number of records generated. `0` means unlimited.                                                                                                                                |
| **Be lenient when parsing dates?**  | When selected, invalid dates can be normalized (for example, `Jan 32nd` becomes `Feb 1st`). Clear for strict parsing.                                                                       |
| **The date format Locale**          | Locale to use when parsing dates written in full (for example, `February 2nd, 2006`).                                                                                                       |
| **Add filenames to result**         | Adds file names to the transformation result file list.                                                                                                                                     |
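The interaction between **Separator**, **Enclosure**, and **Escape** can be illustrated outside PDI. The sketch below uses Python's standard `csv` module as a stand-in for the step's parser; it is only an approximation of the behavior described in the table, and `parse_lines` is a hypothetical helper. The defaults mirror the documented defaults (`;` and `"`).

```python
import csv


def parse_lines(lines, separator=";", enclosure='"', escape=None):
    """Split lines into fields using a separator, an enclosure, and an optional escape.

    Python's csv module is used as an approximation of the step's parser;
    the real step may differ in edge cases.
    """
    reader = csv.reader(
        lines,
        delimiter=separator,
        quotechar=enclosure,
        escapechar=escape,
        # Without an escape character, a doubled enclosure stands for a literal one.
        doublequote=escape is None,
    )
    return [row for row in reader]


# The enclosure protects a separator that appears inside a field.
print(parse_lines(['id;name', '1;"Smith; John"']))

# With escape \ and enclosure ', the escaped quote is kept as a literal character.
print(parse_lines(["'Not the nine o\\'clock news';42"], enclosure="'", escape="\\"))
```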

#### Error Handling tab

Use the **Error Handling** tab to control parsing behavior when the step encounters malformed records or unexpected file content.

| Option                                   | Description                                                                                                               |
| ---------------------------------------- | ------------------------------------------------------------------------------------------------------------------------- |
| **Ignore errors?**                       | Ignores errors during parsing.                                                                                            |
| **Skip error files?**                    | Skips files that contain errors. Optionally generates a file listing the files where errors occur.                        |
| **Error file field name**                | Output field name to capture the error file name.                                                                         |
| **File error message field name**        | Output field name to capture the error message in the error file.                                                         |
| **Skip error lines?**                    | Skips lines that contain errors. Optionally generates a file listing the failing line numbers.                            |
| **Error count fieldname**                | Output field name for the number of errors on the line.                                                                   |
| **Error fields fieldname**               | Output field name for the names of the fields where errors occurred.                                                      |
| **Error text fieldname**                 | Output field name for descriptions of parsing errors.                                                                     |
| **Warning files directory**              | Directory for warning files. File name format: `<warning_dir>/filename.<date_time>.<warning_extension>`.                  |
| **Error files directory**                | Directory for error files. File name format: `<errorfile_dir>/filename.<date_time>.<errorfile_extension>`.                |
| **Failing line numbers files directory** | Directory for failing line numbers files. File name format: `<errorline_dir>/filename.<date_time>.<errorline_extension>`. |
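As a rough mental model of the three per-line error fields (**Error count fieldname**, **Error fields fieldname**, and **Error text fieldname**), the following Python sketch converts one line's raw values and records which fields failed. It is an illustration only, not the step's implementation: `parse_line`, the schema format, the output field names, and the separators used inside the error fields are all assumptions.

```python
def parse_line(values, schema):
    """Convert raw string values according to schema and collect conversion errors.

    schema is a list of (field_name, converter) pairs. The output field names
    error_count, error_fields, and error_text are hypothetical.
    """
    row = {}
    error_fields = []
    error_texts = []
    for (name, convert), raw in zip(schema, values):
        try:
            row[name] = convert(raw)
        except ValueError as exc:
            # Keep the row, but leave the failing field empty.
            row[name] = None
            error_fields.append(name)
            error_texts.append(f"{name}: {exc}")
    row["error_count"] = len(error_fields)
    row["error_fields"] = ",".join(error_fields)
    row["error_text"] = "; ".join(error_texts)
    return row


schema = [("id", int), ("quantity", int)]
print(parse_line(["1", "abc"], schema))
```

In this model, a downstream step could route rows with a non-zero `error_count` to a separate branch instead of discarding them.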

#### Filters tab

Use the **Filters** tab to skip specific lines in the source file.

| Column              | Description                                                                                                       |
| ------------------- | ----------------------------------------------------------------------------------------------------------------- |
| **Filter string**   | String to search for.                                                                                             |
| **Filter position** | Position where the filter string must appear. `0` is the first position. Values below `0` search the entire line. |
| **Stop on filter**  | `Y` stops processing the current file when encountered. `N` continues.                                            |
| **Positive match**  | `Y` processes matching lines. `N` ignores matching lines.                                                         |
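The four filter columns combine as sketched below in Python. This is one reading of the table, not the step's actual code: it assumes that with **Positive match** set to `Y` only matching lines are kept, that with `N` matching lines are dropped and all others kept, and that the line that triggers **Stop on filter** is itself not emitted. `line_matches` and `apply_filter` are hypothetical names.

```python
def line_matches(line, filter_string, position):
    """Check the filter string at a fixed position, or anywhere if position < 0."""
    if position < 0:
        return filter_string in line
    return line.startswith(filter_string, position)


def apply_filter(lines, filter_string, position=-1,
                 stop_on_filter=False, positive_match=False):
    """Return the lines of one file that survive a single filter row."""
    kept = []
    for line in lines:
        matched = line_matches(line, filter_string, position)
        if matched and stop_on_filter:
            # Assumption: stop reading this file and drop the matching line.
            break
        if matched == positive_match:
            kept.append(line)
    return kept


lines = ["# comment", "a;1", "END", "b;2"]

print(apply_filter(lines, "#", position=0))                     # skip comment lines
print(apply_filter(lines, "END", position=0, stop_on_filter=True))
print(apply_filter(lines, ";", position=-1, positive_match=True))
```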

#### Fields tab

Use the **Fields** tab to define the fields to read from each line.

* Select **Get Fields** to auto-populate fields based on your current **Filetype**, delimiter/enclosure settings (for CSV), and/or fixed-length configuration.
* Select **Preview** to validate parsing.

{% hint style="info" %}
When **Filetype** is **Fixed length**, you typically define field **positions** and **lengths**. When **Filetype** is **CSV**, you typically define field **types** and **conversion formats**.
{% endhint %}

For guidance on choosing data types and field metadata, see [Understanding PDI data types and field metadata](/pdia-data-integration/understanding-pdi-data-types-and-field-metadata.md).
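For the **Fixed length** case, each field is located by its position and length rather than by a delimiter. The Python sketch below shows the idea; it assumes 0-based positions and trims padding spaces, neither of which is confirmed by this page, and `parse_fixed` and the sample layout are hypothetical.

```python
def parse_fixed(line, fields):
    """Slice one fixed-width line into named fields.

    fields is a list of (name, position, length) tuples.
    Assumption: position is 0-based and padding is whitespace.
    """
    return {
        name: line[position:position + length].strip()
        for name, position, length in fields
    }


# Hypothetical layout: 4-character id, 10-character name, 4-character quantity.
layout = [("id", 0, 4), ("name", 4, 10), ("quantity", 14, 4)]
line = "0001" + "Alice".ljust(10) + "0042"

print(parse_fixed(line, layout))
```

The values come back as strings; converting them to numbers or dates corresponds to the type and format settings you define for each field.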

#### Additional output fields tab

Use the **Additional output fields** tab to add file metadata to the output.

| Option                      | Description                             |
| --------------------------- | --------------------------------------- |
| **Short filename field**    | File name without path, with extension. |
| **Extension field**         | File name extension.                    |
| **Path field**              | Path in operating system format.        |
| **Size field**              | File size.                              |
| **Is hidden field**         | Whether the file is hidden (Boolean).   |
| **Last modification field** | Last modified date/time.                |
| **Uri field**               | File URI.                               |
| **Root uri field**          | Root part of the URI.                   |
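To get a feel for the kind of values these fields carry, the following Python sketch derives comparable metadata for a local file with `pathlib`. It is an analogy only: the exact formats PDI emits (for example, whether the extension includes the dot, or how hidden files are detected) are assumptions here, the root URI is omitted, and `file_metadata` and its keys are hypothetical names.

```python
from datetime import datetime
from pathlib import Path


def file_metadata(path):
    """Collect file metadata similar to the step's additional output fields."""
    p = Path(path).resolve()
    stat = p.stat()
    return {
        "short_filename": p.name,                # file name without path
        "extension": p.suffix.lstrip("."),       # assumption: no leading dot
        "path": str(p.parent),                   # operating system format
        "size": stat.st_size,                    # size in bytes
        "is_hidden": p.name.startswith("."),     # assumption: Unix-style convention
        "last_modified": datetime.fromtimestamp(stat.st_mtime),
        "uri": p.as_uri(),
    }
```

For example, calling `file_metadata("sales.txt")` on an existing file returns a dictionary whose `short_filename` is `sales.txt` and whose `uri` starts with `file://`.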

### Metadata injection support

This step supports metadata injection. You can use it with [ETL metadata injection](/pdia-data-integration/pdi-transformation-steps-reference-overview/etl-metadata-injection.md) to pass metadata to your transformation at runtime.

### See also

* [CSV File Input](/pdia-data-integration/pdi-transformation-steps-reference-overview/csv-file-input.md)
* [Common Formats](/pdia-data-integration/pdi-transformation-steps-reference-overview/common-formats.md)


