S3 CSV Input
The S3 CSV Input step loads a CSV file from an Amazon Simple Storage Service (Amazon S3) bucket into your transformation.
For technical reasons, parallel reading of S3 files is supported only for files that do not contain line breaks or carriage returns inside fields.
Options

Step name
Specify the unique name of the S3 CSV Input step on the canvas. You can customize the name or leave it as the default.
S3 bucket
Specify the S3 bucket where the CSV object is stored. You can also use Select bucket to browse for and choose a bucket.
Filename
Input file name.
You can enter the S3 object path directly.
Or, if this step receives rows from another step, you can select an incoming field that contains the S3 object path at runtime.
S3 file paths use the following format:
s3n://s3_bucket_name/absolute_path_to_file
Delimiter
Field delimiter character. Default is ;. Select Insert Tab to use a tab delimiter. You can specify special characters using $[value] (for example, $[01] or $[6F,FF,00,1F]).
Enclosure
Field enclosure character. Default is ". You can specify special characters using $[value] (for example, $[01] or $[6F,FF,00,1F]).
Max line size
Maximum number of characters read per line. Default is 5000.
Lazy conversion?
Select to delay converting row data until it is needed.
Header row present?
Select if the source file contains a header row with column names.
The row number field name
Name of the output field that contains the row number.
Running in parallel
Select if you run multiple copies of this step and you want each copy to read a separate part of the S3 file(s). When reading multiple files, the step uses the total size of all files to split the workload. In that case, ensure all step copies receive all file names; otherwise, parallel reading might not work correctly.
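The size-based split described above can be sketched as follows. This is an illustrative sketch only, assuming the combined byte count is divided evenly across step copies; the function byte_ranges is hypothetical and is not PDI's internal code:

```python
# Illustrative sketch (not PDI's implementation): how a size-based split
# across parallel step copies could be computed. Each copy reads its own
# contiguous byte range of the combined input, which is why every copy
# needs the full list of file names (and hence sizes).

def byte_ranges(file_sizes, copies):
    """Split the total size of all files into one byte range per copy."""
    total = sum(file_sizes)
    chunk = total // copies
    ranges = []
    for i in range(copies):
        start = i * chunk
        # The last copy also takes any remainder bytes.
        end = total if i == copies - 1 else (i + 1) * chunk
        ranges.append((start, end))
    return ranges

# Two files of 6000 and 4000 bytes split across 4 step copies:
print(byte_ranges([6000, 4000], 4))
# [(0, 2500), (2500, 5000), (5000, 7500), (7500, 10000)]
```

A copy that receives only some of the file names would compute a different total and therefore different ranges, which is the failure mode the note above warns about.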
Fields
Use the Fields table to define the fields to read from the S3 CSV file.
Select Get fields to populate the table using the current parsing settings (for example, Delimiter and Enclosure).
Select Preview to preview the incoming data.
Name
Field name.
Type
Field data type.
Format
Format mask for date and numeric fields. See Common Formats.
Length
Field length.
Number: Total number of significant digits.
String: Total length of the string.
Date: Length of printed output (for example, 4 for a year).
Precision
Number of floating point digits for number fields.
Currency
Currency symbol (for example, $ or €).
Decimal
Decimal point character (. or ,).
Group
Thousands separator character (. or ,).
Trim type
Trimming method (none, left, right, or both). Trimming works only when Length is not specified.
For more information, see Understanding PDI data types and field metadata.
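How the Delimiter, Enclosure, and Trim type settings interact when a line is parsed can be sketched with Python's csv module as a stand-in for PDI's parser. The helpers below (decode_special, parse_line) are hypothetical, exist only for this illustration, and ignore the Length column; they show the $[value] hex notation and trimming:

```python
import csv
import io
import re

def decode_special(value):
    """Decode $[value]-style hex notation (e.g. $[09] or $[6F,FF]) into
    characters. Plain values pass through unchanged. Illustrative only."""
    m = re.fullmatch(r"\$\[([0-9A-Fa-f,]+)\]", value)
    if not m:
        return value
    return "".join(chr(int(h, 16)) for h in m.group(1).split(","))

def parse_line(line, delimiter=";", enclosure='"', trim="both"):
    """Split one CSV line using the configured delimiter and enclosure,
    then apply the trim type to each field."""
    d = decode_special(delimiter)
    q = decode_special(enclosure)
    fields = next(csv.reader(io.StringIO(line), delimiter=d, quotechar=q))
    trims = {"none": lambda s: s, "left": str.lstrip,
             "right": str.rstrip, "both": str.strip}
    return [trims[trim](f) for f in fields]

# Enclosure protects the delimiter inside a field; trim removes padding:
print(parse_line('"a; 1"; 2 '))              # ['a; 1', '2']
# $[09] resolves to a tab delimiter:
print(parse_line('a\tb', delimiter='$[09]')) # ['a', 'b']
```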
AWS credentials
The S3 CSV Input step provides credentials to the AWS SDK for Java using a credential provider chain. By default, the chain looks for credentials in the following locations (in this order):
Environment variables: AWS_ACCESS_KEY_ID, AWS_SECRET_ACCESS_KEY, and AWS_SESSION_TOKEN
AWS credentials file (for example, ~/.aws/credentials or %UserProfile%\.aws\credentials)
AWS CLI configuration file (for example, ~/.aws/config)
ECS container credentials
EC2 instance profile credentials
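The lookup order above can be sketched as follows. This is an illustrative sketch only; inside PDI the actual lookups are performed by the AWS SDK for Java, and the function resolve_credentials below is hypothetical and covers just the first two providers:

```python
import configparser
import os
from pathlib import Path

def resolve_credentials():
    """Illustrative sketch of the default provider-chain order: each
    provider is tried in turn, and the first one that yields credentials
    wins. Only the first two providers are mimicked here."""
    # 1. Environment variables
    key = os.environ.get("AWS_ACCESS_KEY_ID")
    secret = os.environ.get("AWS_SECRET_ACCESS_KEY")
    if key and secret:
        return {"source": "environment", "access_key": key}

    # 2. AWS credentials file (~/.aws/credentials)
    cred_file = Path.home() / ".aws" / "credentials"
    if cred_file.exists():
        cfg = configparser.ConfigParser()
        cfg.read(cred_file)
        if cfg.has_option("default", "aws_access_key_id"):
            return {"source": "credentials file",
                    "access_key": cfg.get("default", "aws_access_key_id")}

    # 3-5. The CLI configuration file, ECS container, and EC2 instance
    # profile providers would be consulted next; omitted in this sketch.
    return None
```

Because the environment variables are checked first, setting them overrides any credentials file entry for this step.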
Metadata injection support
All fields of this step support metadata injection. You can use this step with ETL metadata injection to pass metadata to your transformation at runtime.