S3 CSV Input
The S3 CSV Input step loads a CSV file from an Amazon Simple Storage Service (Amazon S3) bucket into your transformation.
For technical reasons, parallel reading of S3 files is supported only for files that do not contain line breaks or carriage returns inside fields.
Options

Step name
Specify the unique name of the S3 CSV Input step on the canvas. You can customize the name or leave it as the default.
S3 bucket
Specify the S3 bucket where the CSV object is stored. You can also use Select bucket to browse for and choose a bucket.
Filename
Input file name.
You can enter the S3 object path directly.
Or, if this step receives rows from another step, you can select an incoming field that contains the S3 object path at runtime.
S3 file paths use the following format:
s3n://s3_bucket_name/absolute_path_to_file
Delimiter
Field delimiter character. Default is ;. Select Insert Tab to use a tab delimiter. You can specify special characters using $[value] (for example, $[01] or $[6F,FF,00,1F]).
Enclosure
Field enclosure character. Default is ". You can specify special characters using $[value] (for example, $[01] or $[6F,FF,00,1F]).
Max line size
Maximum number of characters read per line. Default is 5000.
Lazy conversion?
Select to delay converting row data until it is needed.
Header row present?
Select if the source file contains a header row with column names.
The row number field name
Name of the output field that contains the row number.
Running in parallel
Select if you run multiple copies of this step and you want each copy to read a separate part of the S3 file(s). When reading multiple files, the step uses the total size of all files to split the workload. In that case, ensure all step copies receive all file names; otherwise, parallel reading might not work correctly.
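The size-based split described above can be sketched as follows. This is an illustrative sketch only, assuming the combined byte count is divided evenly across step copies; the function byte_ranges is hypothetical and is not PDI's internal code:

```python
# Illustrative sketch (not PDI's implementation): how a size-based split
# across parallel step copies could be computed. Each copy reads its own
# contiguous byte range of the combined input, which is why every copy
# needs the full list of file names (and hence sizes).

def byte_ranges(file_sizes, copies):
    """Split the total size of all files into one byte range per copy."""
    total = sum(file_sizes)
    chunk = total // copies
    ranges = []
    for i in range(copies):
        start = i * chunk
        # The last copy also takes any remainder bytes.
        end = total if i == copies - 1 else (i + 1) * chunk
        ranges.append((start, end))
    return ranges

# Two files of 6000 and 4000 bytes split across 4 step copies:
print(byte_ranges([6000, 4000], 4))
# [(0, 2500), (2500, 5000), (5000, 7500), (7500, 10000)]
```

A copy that receives only some of the file names would compute a different total and therefore different ranges, which is the failure mode the note above warns about.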
Fields
Use the Fields table to define the fields to read from the S3 CSV file.
Select Get fields to populate the table using the current parsing settings (for example, Delimiter and Enclosure).
Select Preview to preview the incoming data.
Name
Field name.
Type
Field data type.
Format
Format mask for date and numeric fields. See Common Formats.
Length
Field length.
Number: Total number of significant digits.
String: Total length of the string.
Date: Length of printed output (for example, 4 for a year).
Precision
Number of floating point digits for number fields.
Currency
Currency symbol (for example, $ or €).
Decimal
Decimal point character (. or ,).
Group
Thousands separator character (. or ,).
Trim type
Trimming method (none, left, right, or both). Trimming works only when Length is not specified.
For more information, see Understanding PDI data types and field metadata.
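How the Delimiter, Enclosure, and Trim type settings interact when a line is parsed can be sketched with Python's csv module as a stand-in for PDI's parser. The helpers below (decode_special, parse_line) are hypothetical, exist only for this illustration, and ignore the Length column; they show the $[value] hex notation and trimming:

```python
import csv
import io
import re

def decode_special(value):
    """Decode $[value]-style hex notation (e.g. $[09] or $[6F,FF]) into
    characters. Plain values pass through unchanged. Illustrative only."""
    m = re.fullmatch(r"\$\[([0-9A-Fa-f,]+)\]", value)
    if not m:
        return value
    return "".join(chr(int(h, 16)) for h in m.group(1).split(","))

def parse_line(line, delimiter=";", enclosure='"', trim="both"):
    """Split one CSV line using the configured delimiter and enclosure,
    then apply the trim type to each field."""
    d = decode_special(delimiter)
    q = decode_special(enclosure)
    fields = next(csv.reader(io.StringIO(line), delimiter=d, quotechar=q))
    trims = {"none": lambda s: s, "left": str.lstrip,
             "right": str.rstrip, "both": str.strip}
    return [trims[trim](f) for f in fields]

# Enclosure protects the delimiter inside a field; trim removes padding:
print(parse_line('"a; 1"; 2 '))              # ['a; 1', '2']
# $[09] resolves to a tab delimiter:
print(parse_line('a\tb', delimiter='$[09]')) # ['a', 'b']
```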
AWS credentials
The S3 CSV Input step provides credentials to the AWS SDK for Java using a credential provider chain. By default, the chain looks for credentials in the following locations (in this order):
Environment variables: AWS_ACCESS_KEY_ID, AWS_SECRET_ACCESS_KEY, and AWS_SESSION_TOKEN
AWS credentials file (for example, ~/.aws/credentials or %UserProfile%\.aws\credentials)
AWS CLI configuration file (for example, ~/.aws/config)
ECS container credentials
EC2 instance profile credentials
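The lookup order above can be sketched as follows. This is an illustrative sketch only; inside PDI the actual lookups are performed by the AWS SDK for Java, and the function resolve_credentials below is hypothetical and covers just the first two providers:

```python
import configparser
import os
from pathlib import Path

def resolve_credentials():
    """Illustrative sketch of the default provider-chain order: each
    provider is tried in turn, and the first one that yields credentials
    wins. Only the first two providers are mimicked here."""
    # 1. Environment variables
    key = os.environ.get("AWS_ACCESS_KEY_ID")
    secret = os.environ.get("AWS_SECRET_ACCESS_KEY")
    if key and secret:
        return {"source": "environment", "access_key": key}

    # 2. AWS credentials file (~/.aws/credentials)
    cred_file = Path.home() / ".aws" / "credentials"
    if cred_file.exists():
        cfg = configparser.ConfigParser()
        cfg.read(cred_file)
        if cfg.has_option("default", "aws_access_key_id"):
            return {"source": "credentials file",
                    "access_key": cfg.get("default", "aws_access_key_id")}

    # 3-5. The CLI configuration file, ECS container, and EC2 instance
    # profile providers would be consulted next; omitted in this sketch.
    return None
```

Because the environment variables are checked first, setting them overrides any credentials file entry for this step.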
Metadata injection support
All fields of this step support metadata injection. You can use this step with ETL metadata injection to pass metadata to your transformation at runtime.