Microsoft Excel Input

You can use the Microsoft Excel Input step to read data from Microsoft Excel files.

Before you begin

The default spreadsheet engine is Excel 97-2003 XLS. If you are reading other file types (for example, OpenOffice ODS or Excel 2007+) or using features such as protected worksheets, change Spread sheet type (engine) in the Files tab.

circle-info

If you are using password-protected worksheets, set Spread sheet type (engine) to Excel 2007 XLSX (Apache POI).

Step name

Step name specifies the unique name of the step on the canvas. You can change it.

Preview rows

Select Preview rows to display the rows generated by this step. Preview helps you confirm that your settings return the rows you expect.

Configure the step (tabs)

Files tab

Use the Files tab to define the source files.

Option
Description

Spread sheet type (engine)

Spreadsheet engine to use. Supported types include Excel 97-2003 XLS (default), Excel 2007 XLSX (Apache POI), Excel 2007 XLSX (Apache POI Streaming) (large files), and Open Office ODS.

File or directory

Source file or directory. Select Browse to locate the source, then select Add to add it to Selected files.

Regular expression

Regular expression used to match file names within a directory.

Exclude regular expression

Regular expression used to exclude file names within a directory.

Password

Password to open an Excel file when using Excel 2007 XLSX (Apache POI).

Accept filenames from previous steps

Reads file names from a previous step and an input field.

Selected files table

The Selected files table lists files and directories used as input sources.

Column
Description

File/Directory

Source file or directory added to the table.

Wildcard (RegExp)

Wildcards from Regular expression.

Exclude wildcard

Wildcards from Exclude regular expression.

Required

Marks the source as required.

Include subfolders

Includes subfolders under the directory source.

Select Delete to remove a source. Select Edit to remove a source from the table and return it to File or directory.

Select Show filename(s) to display the file names of sources that the step can connect to.

Sheets tab

Use the Sheets tab to specify which worksheets and starting positions to read.

Column
Description

Sheet name

Sheet name to read from the workbook.

Start row

Starting row (zero-based).

Start column

Starting column (zero-based).

To read all sheets in a workbook, leave Sheet name blank and specify only the start row and column. In this case, the field structure of each sheet must be the same.

Select Get sheetname(s) to populate the table with all sheets from the source specified in the Files tab.

Content tab

Use the Content tab to control which values are read.

Option
Description

Header

Skips the header row in the sheets specified in the Sheets tab.

No empty rows

Excludes empty rows from the output.

Stop on empty rows

Stops reading the current sheet when an empty row is encountered.

Limit

Limits the number of rows read. 0 means no limit.

Encoding

Text encoding to use. Leave blank to use the system default.

Error Handling tab

Use the Error Handling tab to control parsing behavior.

Option
Description

Strict types?

Reports data type errors while reading.

Ignore errors?

Ignores errors during parsing. Optionally writes errors to files in the directories specified below. If cleared, lines with errors appear as NULL values in the output.

Skip error lines?

Skips rows that contain errors.

Warnings file directory

Directory for warnings files: <warning dir>/filename.<date_time>.<warning extension>.

Error files directory

Directory for error files: <errorfile_dir>/filename.<date_time>.<errorfile_extension>.

Failing line numbers files directory

Directory for failing line numbers files: <errorline dir>/filename.<date_time>.<errorline extension>.

Fields tab

Use the Fields tab to define the fields extracted from the spreadsheet.

Column
Description

Name

Output field name.

Type

Field data type.

Length

Field length.

Precision

Floating-point digits for number-type fields.

Trim type

Trimming method to apply to strings.

Repeat

Repeats the last row’s value if a row is empty.

Format

Optional conversion mask. See Common Formats.

Currency

Currency symbol (for example, $ or ).

Decimal

Decimal symbol (for example, . in 5,000.00 or , in 5.000,00).

Group

Grouping symbol (for example, , in 10,000.00 or . in 5.000,00).

Select Get fields from header row to populate the table from the workbook.

See Understanding PDI data types and field metadata to maximize the efficiency of your transformation and job results.

Additional output fields tab

Use the Additional output fields tab to add file and sheet metadata to the output.

Option
Description

Full filename field

Full file name, including extension.

Sheetname field

Worksheet name.

Sheet row nr field

Current sheet row number.

Row nr written field

Number of rows written.

Short filename field

File name without path, with extension.

Extension field

File name extension.

Path field

Path in operating system format.

Size field

File size.

Is hidden field

Whether the file is hidden (Boolean).

Uri field

File URI.

Root uri field

Root part of the URI.

Metadata injection support

This step supports metadata injection. You can use it with ETL metadata injection to pass metadata to your transformation at runtime.

Supported fields include:

  • Options: File and directory, Regular expression, Exclude regular expression, Is file required, Include subfolders, Spreadsheet type, Sheet name, Sheet start row, Sheet start col

  • Values: Name, Type, Length, Precision, Trim type, Repeat, Format, Currency, Decimal, Grouping

Last updated

Was this helpful?