Microsoft Excel Input
You can use the Microsoft Excel Input step to read data from Microsoft Excel files.
Before you begin
The default spreadsheet engine is Excel 97-2003 XLS. If you are reading other file types (for example, OpenOffice ODS or Excel 2007+) or using features such as protected worksheets, change Spread sheet type (engine) in the Files tab.
If you are using password-protected worksheets, set Spread sheet type (engine) to Excel 2007 XLSX (Apache POI).
Step name
Step name specifies the unique name of the step on the canvas. You can change it.
Preview rows
Select Preview rows to display the rows generated by this step. Preview helps you confirm that your settings return the rows you expect.
Configure the step (tabs)
Files tab
Use the Files tab to define the source files.
Spread sheet type (engine)
Spreadsheet engine to use. Supported types include Excel 97-2003 XLS (default), Excel 2007 XLSX (Apache POI), Excel 2007 XLSX (Apache POI Streaming) (large files), and Open Office ODS.
File or directory
Source file or directory. Select Browse to locate the source, then select Add to add it to Selected files.
Regular expression
Regular expression used to match file names within a directory.
Exclude regular expression
Regular expression used to exclude file names within a directory.
Password
Password to open an Excel file when using Excel 2007 XLSX (Apache POI).
Accept filenames from previous steps
Reads file names from a previous step and an input field.
Selected files table
The Selected files table lists files and directories used as input sources.
File/Directory
Source file or directory added to the table.
Wildcard (RegExp)
Wildcards from Regular expression.
Exclude wildcard
Wildcards from Exclude regular expression.
Required
Marks the source as required.
Include subfolders
Includes subfolders under the directory source.
Select Delete to remove a source. Select Edit to remove a source from the table and return it to File or directory.
Select Show filename(s) to display the file names of sources that the step can connect to.
Sheets tab
Use the Sheets tab to specify which worksheets and starting positions to read.
Sheet name
Sheet name to read from the workbook.
Start row
Starting row (zero-based).
Start column
Starting column (zero-based).
To read all sheets in a workbook, leave Sheet name blank and specify only the start row and column. In this case, the field structure of each sheet must be the same.
Select Get sheetname(s) to populate the table with all sheets from the source specified in the Files tab.
Content tab
Use the Content tab to control which values are read.
Header
Skips the header row in the sheets specified in the Sheets tab.
No empty rows
Excludes empty rows from the output.
Stop on empty rows
Stops reading the current sheet when an empty row is encountered.
Limit
Limits the number of rows read. 0 means no limit.
Encoding
Text encoding to use. Leave blank to use the system default.
Error Handling tab
Use the Error Handling tab to control parsing behavior.
Strict types?
Reports data type errors while reading.
Ignore errors?
Ignores errors during parsing. Optionally writes errors to files in the directories specified below. If cleared, lines with errors appear as NULL values in the output.
Skip error lines?
Skips rows that contain errors.
Warnings file directory
Directory for warnings files: <warning dir>/filename.<date_time>.<warning extension>.
Error files directory
Directory for error files: <errorfile_dir>/filename.<date_time>.<errorfile_extension>.
Failing line numbers files directory
Directory for failing line numbers files: <errorline dir>/filename.<date_time>.<errorline extension>.
Fields tab
Use the Fields tab to define the fields extracted from the spreadsheet.
Name
Output field name.
Type
Field data type.
Length
Field length.
Precision
Floating-point digits for number-type fields.
Trim type
Trimming method to apply to strings.
Repeat
Repeats the last row’s value if a row is empty.
Format
Optional conversion mask. See Common Formats.
Currency
Currency symbol (for example, $ or €).
Decimal
Decimal symbol (for example, . in 5,000.00 or , in 5.000,00).
Group
Grouping symbol (for example, , in 10,000.00 or . in 5.000,00).
Select Get fields from header row to populate the table from the workbook.
See Understanding PDI data types and field metadata to maximize the efficiency of your transformation and job results.
Additional output fields tab
Use the Additional output fields tab to add file and sheet metadata to the output.
Full filename field
Full file name, including extension.
Sheetname field
Worksheet name.
Sheet row nr field
Current sheet row number.
Row nr written field
Number of rows written.
Short filename field
File name without path, with extension.
Extension field
File name extension.
Path field
Path in operating system format.
Size field
File size.
Is hidden field
Whether the file is hidden (Boolean).
Uri field
File URI.
Root uri field
Root part of the URI.
Metadata injection support
This step supports metadata injection. You can use it with ETL metadata injection to pass metadata to your transformation at runtime.
Supported fields include:
Options: File and directory, Regular expression, Exclude regular expression, Is file required, Include subfolders, Spreadsheet type, Sheet name, Sheet start row, Sheet start col
Values: Name, Type, Length, Precision, Trim type, Repeat, Format, Currency, Decimal, Grouping
Last updated
Was this helpful?

