# Microsoft Excel Input

You can use the **Microsoft Excel Input** step to read data from Microsoft Excel files.

### Before you begin

The default spreadsheet engine is **Excel 97-2003 XLS**. If you are reading other file types (for example, OpenOffice ODS or Excel 2007+) or using features such as protected worksheets, change **Spread sheet type (engine)** in the **Files** tab.

{% hint style="info" %}
If you are using password-protected worksheets, set **Spread sheet type (engine)** to **Excel 2007 XLSX (Apache POI)**.
{% endhint %}

### Step name

**Step name** specifies the unique name of the step on the canvas. You can change it.

### Preview rows

Select **Preview rows** to display the rows generated by this step. Preview helps you confirm that your settings return the rows you expect.

### Configure the step (tabs)

#### Files tab

Use the **Files** tab to define the source files.

| Option                                   | Description                                                                                                                                                                                              |
| ---------------------------------------- | -------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------- |
| **Spread sheet type (engine)**           | Spreadsheet engine to use. Supported types include **Excel 97-2003 XLS** (default), **Excel 2007 XLSX (Apache POI)**, **Excel 2007 XLSX (Apache POI Streaming)** (large files), and **Open Office ODS**. |
| **File or directory**                    | Source file or directory. Select **Browse** to locate the source, then select **Add** to add it to **Selected files**.                                                                                   |
| **Regular expression**                   | Regular expression used to match file names within a directory.                                                                                                                                          |
| **Exclude regular expression**           | Regular expression used to exclude file names within a directory.                                                                                                                                        |
| **Password**                             | Password to open an Excel file when using Excel 2007 XLSX (Apache POI).                                                                                                                                  |
| **Accept filenames from previous steps** | Reads file names from a previous step and an input field.                                                                                                                                                |

**Selected files table**

The **Selected files** table lists files and directories used as input sources.

| Column                 | Description                                     |
| ---------------------- | ----------------------------------------------- |
| **File/Directory**     | Source file or directory added to the table.    |
| **Wildcard (RegExp)**  | Wildcards from **Regular expression**.          |
| **Exclude wildcard**   | Wildcards from **Exclude regular expression**.  |
| **Required**           | Marks the source as required.                   |
| **Include subfolders** | Includes subfolders under the directory source. |

Select **Delete** to remove a source. Select **Edit** to remove a source from the table and return it to **File or directory**.

Select **Show filename(s)** to display the file names of sources that the step can connect to.

#### Sheets tab

Use the **Sheets** tab to specify which worksheets and starting positions to read.

| Column           | Description                           |
| ---------------- | ------------------------------------- |
| **Sheet name**   | Sheet name to read from the workbook. |
| **Start row**    | Starting row (zero-based).            |
| **Start column** | Starting column (zero-based).         |

To read all sheets in a workbook, leave **Sheet name** blank and specify only the start row and column. In this case, the field structure of each sheet must be the same.

Select **Get sheetname(s)** to populate the table with all sheets from the source specified in the **Files** tab.

#### Content tab

Use the **Content** tab to control which values are read.

| Option                 | Description                                                         |
| ---------------------- | ------------------------------------------------------------------- |
| **Header**             | Skips the header row in the sheets specified in the **Sheets** tab. |
| **No empty rows**      | Excludes empty rows from the output.                                |
| **Stop on empty rows** | Stops reading the current sheet when an empty row is encountered.   |
| **Limit**              | Limits the number of rows read. `0` means no limit.                 |
| **Encoding**           | Text encoding to use. Leave blank to use the system default.        |

#### Error Handling tab

Use the **Error Handling** tab to control parsing behavior.

| Option                                   | Description                                                                                                                                                               |
| ---------------------------------------- | ------------------------------------------------------------------------------------------------------------------------------------------------------------------------- |
| **Strict types?**                        | Reports data type errors while reading.                                                                                                                                   |
| **Ignore errors?**                       | Ignores errors during parsing. Optionally writes errors to files in the directories specified below. If cleared, lines with errors appear as `NULL` values in the output. |
| **Skip error lines?**                    | Skips rows that contain errors.                                                                                                                                           |
| **Warnings file directory**              | Directory for warnings files: `<warning dir>/filename.<date_time>.<warning extension>`.                                                                                   |
| **Error files directory**                | Directory for error files: `<errorfile_dir>/filename.<date_time>.<errorfile_extension>`.                                                                                  |
| **Failing line numbers files directory** | Directory for failing line numbers files: `<errorline dir>/filename.<date_time>.<errorline extension>`.                                                                   |

#### Fields tab

Use the **Fields** tab to define the fields extracted from the spreadsheet.

| Column        | Description                                                                                                                           |
| ------------- | ------------------------------------------------------------------------------------------------------------------------------------- |
| **Name**      | Output field name.                                                                                                                    |
| **Type**      | Field data type.                                                                                                                      |
| **Length**    | Field length.                                                                                                                         |
| **Precision** | Floating-point digits for number-type fields.                                                                                         |
| **Trim type** | Trimming method to apply to strings.                                                                                                  |
| **Repeat**    | Repeats the last row’s value if a row is empty.                                                                                       |
| **Format**    | Optional conversion mask. See [Common Formats](/pdia-data-integration/pdi-transformation-steps-reference-overview/common-formats.md). |
| **Currency**  | Currency symbol (for example, `$` or `€`).                                                                                            |
| **Decimal**   | Decimal symbol (for example, `.` in `5,000.00` or `,` in `5.000,00`).                                                                 |
| **Group**     | Grouping symbol (for example, `,` in `10,000.00` or `.` in `5.000,00`).                                                               |

Select **Get fields from header row** to populate the table from the workbook.

See [Understanding PDI data types and field metadata](/pdia-data-integration/understanding-pdi-data-types-and-field-metadata.md) to maximize the efficiency of your transformation and job results.

#### Additional output fields tab

Use the **Additional output fields** tab to add file and sheet metadata to the output.

| Option                   | Description                             |
| ------------------------ | --------------------------------------- |
| **Full filename field**  | Full file name, including extension.    |
| **Sheetname field**      | Worksheet name.                         |
| **Sheet row nr field**   | Current sheet row number.               |
| **Row nr written field** | Number of rows written.                 |
| **Short filename field** | File name without path, with extension. |
| **Extension field**      | File name extension.                    |
| **Path field**           | Path in operating system format.        |
| **Size field**           | File size.                              |
| **Is hidden field**      | Whether the file is hidden (Boolean).   |
| **Uri field**            | File URI.                               |
| **Root uri field**       | Root part of the URI.                   |

### Metadata injection support

This step supports metadata injection. You can use it with [ETL metadata injection](/pdia-data-integration/pdi-transformation-steps-reference-overview/etl-metadata-injection.md) to pass metadata to your transformation at runtime.

Supported fields include:

* **Options**: File and directory, Regular expression, Exclude regular expression, Is file required, Include subfolders, Spreadsheet type, Sheet name, Sheet start row, Sheet start col
* **Values**: Name, Type, Length, Precision, Trim type, Repeat, Format, Currency, Decimal, Grouping


---

# Agent Instructions: Querying This Documentation

If you need additional information that is not directly available in this page, you can query the documentation dynamically by asking a question.

Perform an HTTP GET request on the current page URL with the `ask` query parameter:

```
GET https://docs.pentaho.com/pdia-data-integration/pdi-transformation-steps-reference-overview/microsoft-excel-input.md?ask=<question>
```

The question should be specific, self-contained, and written in natural language.
The response will contain a direct answer to the question and relevant excerpts and sources from the documentation.

Use this mechanism when the answer is not explicitly present in the current page, you need clarification or additional context, or you want to retrieve related documentation sections.
