# S3 CSV Input

The S3 CSV Input step loads a CSV file from an [Amazon Simple Storage Service (Amazon S3)](https://docs.aws.amazon.com/AmazonS3/latest/dev/Welcome.html) bucket into your transformation.

{% hint style="warning" %}
For technical reasons, parallel reading of S3 files is supported only for files that do not contain line breaks or carriage returns inside fields.
{% endhint %}
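This restriction follows from how parallel byte-range reads locate row boundaries: each reader seeks to its start offset, then scans forward to the next line break, which it assumes ends a row. A line break inside an enclosed field violates that assumption. A minimal Python sketch of the failure mode (not PDI's actual implementation; `rows_from_offset` and the sample data are illustrative):

```python
import csv
import io

def rows_from_offset(data: bytes, offset: int):
    """Start at a byte offset, skip forward to the next newline
    (assumed to be a row boundary), then parse the rest as CSV."""
    if offset > 0:
        nl = data.find(b"\n", offset)
        offset = len(data) if nl == -1 else nl + 1
    text = data[offset:].decode("utf-8")
    return list(csv.reader(io.StringIO(text), delimiter=";"))

# A file whose quoted field contains a line break:
data = b'id;name\n1;"line\nbreak"\n2;plain\n'

# A copy starting inside the quoted field treats the embedded newline
# as a row boundary and mis-parses the remainder of that row.
print(rows_from_offset(data, 10))
```

Reading from offset 0 parses the quoted line break correctly, while a reader whose offset lands inside the enclosed field produces a garbled first row, which is why PDI disallows such files for parallel reading.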

### Options

![S3 CSV Input](/files/hWiTR2AWIOvgncLzIXYE)

| Option | Description |
| --- | --- |
| **Step name** | Specify the unique name of the S3 CSV Input step on the canvas. You can customize the name or leave it as the default. |
| **S3 bucket** | S3 bucket where the CSV object is stored. You can also select **Select bucket** to browse and choose a bucket. |
| **Filename** | Input file name. You can enter the S3 object path directly, or, if this step receives rows from another step, you can select an incoming field that contains the S3 object path at runtime. S3 file paths use the following schema: `s3n://s3_bucket_name/absolute_path_to_file` |
| **Delimiter** | Field delimiter character. Default is `;`. Select **Insert Tab** to use a tab delimiter. You can specify special characters using `$[value]` (for example, `$[01]` or `$[6F,FF,00,1F]`). |
| **Enclosure** | Field enclosure character. Default is `"`. You can specify special characters using `$[value]` (for example, `$[01]` or `$[6F,FF,00,1F]`). |
| **Max line size** | Maximum number of characters read per line. Default is `5000`. |
| **Lazy conversion?** | Select to delay converting row data until it is needed. |
| **Header row present?** | Select if the source file contains a header row with column names. |
| **The row number field name** | Name of the output field that contains the row number. |
| **Running in parallel** | Select if you run multiple copies of this step and you want each copy to read a separate part of the S3 file(s). When reading multiple files, the step uses the total size of all files to split the workload. In that case, ensure all step copies receive all file names; otherwise, parallel reading might not work correctly. |
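The `$[value]` syntax for **Delimiter** and **Enclosure** encodes characters as hexadecimal values. As a rough illustration (assuming the bracketed value is a list of comma-separated hex character codes; `decode_special` is a hypothetical helper, not part of PDI):

```python
import re

def decode_special(value: str) -> str:
    """Expand $[..] hex escapes, e.g. '$[09]' -> a tab character and
    '$[6F,FF,00,1F]' -> the four corresponding characters. The
    interpretation as comma-separated hex codes is an assumption."""
    def expand(match):
        codes = match.group(1).split(",")
        return "".join(chr(int(code, 16)) for code in codes)
    return re.sub(r"\$\[([0-9A-Fa-f,]+)\]", expand, value)

print(repr(decode_special("$[09]")))  # '\t'
```

Under this reading, `$[01]` denotes the single control character `0x01`, which is useful for delimiters that cannot be typed directly into the dialog.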

### Fields

Use the **Fields** table to define the fields to read from the S3 CSV file.

* Select **Get fields** to populate the table using the current parsing settings (for example, **Delimiter** and **Enclosure**).
* Select **Preview** to preview the incoming data.

| Column | Description |
| --- | --- |
| **Name** | Field name. |
| **Type** | Field data type. |
| **Format** | Format mask for date and numeric fields. See [Common Formats](/pdia-data-integration/pdi-transformation-steps-reference-overview/common-formats.md). |
| **Length** | Field length. For **Number**, the total number of significant digits; for **String**, the total length of the string; for **Date**, the length of the printed output (for example, 4 for a year). |
| **Precision** | Number of floating point digits for number fields. |
| **Currency** | Currency symbol (for example, `$` or `€`). |
| **Decimal** | Decimal point character (`.` or `,`). |
| **Group** | Thousands separator character (`.` or `,`). |
| **Trim type** | Trimming method (**none**, **left**, **right**, or **both**). Trimming works only when **Length** is not specified. |
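As an illustration of the **Trim type** behavior, the four options map naturally onto leading/trailing whitespace removal. A sketch under that assumption (`trim` is a hypothetical helper, not a PDI API):

```python
def trim(value: str, trim_type: str) -> str:
    """Apply a PDI-style trim type to a field value. Mapping the
    options to Python's strip functions is an illustrative assumption."""
    if trim_type == "left":
        return value.lstrip()
    if trim_type == "right":
        return value.rstrip()
    if trim_type == "both":
        return value.strip()
    return value  # "none": leave the value untouched

print(repr(trim("  padded  ", "both")))  # 'padded'
```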

For more information, see [Understanding PDI data types and field metadata](/pdia-data-integration/understanding-pdi-data-types-and-field-metadata.md).

### AWS credentials

The S3 CSV Input step provides credentials to the AWS SDK for Java using a credential provider chain. By default, the chain looks for credentials in the following locations (in this order):

1. Environment variables: `AWS_ACCESS_KEY_ID`, `AWS_SECRET_ACCESS_KEY`, and `AWS_SESSION_TOKEN`
2. AWS credentials file (for example, `~/.aws/credentials` or `%UserProfile%\.aws\credentials`)
3. AWS CLI configuration file (for example, `~/.aws/config`)
4. ECS container credentials
5. EC2 instance profile credentials
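The first-match lookup order can be sketched as a pure function covering the first two providers (environment variables, then the shared credentials file); the remaining providers are omitted, and `resolve_credentials` is an illustrative name, not an SDK API:

```python
import configparser

def resolve_credentials(env: dict, credentials_file_text: str):
    """Resolve credentials in simplified provider-chain order:
    environment variables win; otherwise fall back to the [default]
    profile of the shared credentials file. Config file, ECS, and
    EC2 instance profile providers are omitted from this sketch."""
    if "AWS_ACCESS_KEY_ID" in env and "AWS_SECRET_ACCESS_KEY" in env:
        return {
            "access_key": env["AWS_ACCESS_KEY_ID"],
            "secret_key": env["AWS_SECRET_ACCESS_KEY"],
            "token": env.get("AWS_SESSION_TOKEN"),
        }
    parser = configparser.ConfigParser()
    parser.read_string(credentials_file_text)
    if parser.has_section("default"):
        section = parser["default"]
        return {
            "access_key": section.get("aws_access_key_id"),
            "secret_key": section.get("aws_secret_access_key"),
            "token": section.get("aws_session_token"),
        }
    return None  # chain exhausted (in this simplified sketch)
```

Because environment variables are checked first, they override any values in the credentials file, which matches the ordering of the list above.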

For details, see:

* [AWS environment variables](https://docs.aws.amazon.com/cli/latest/userguide/cli-environment.html)
* [AWS configuration and credential files](https://docs.aws.amazon.com/cli/latest/userguide/cli-config-files.html)
* [Working with AWS credentials](https://docs.aws.amazon.com/sdk-for-java/v1/developer-guide/credentials.html)

### Metadata injection support

All fields of this step support metadata injection. You can use this step with [ETL metadata injection](/pdia-data-integration/pdi-transformation-steps-reference-overview/etl-metadata-injection.md) to pass metadata to your transformation at runtime.

### See also

* [S3 File Output](/pdia-data-integration/pdi-transformation-steps-reference-overview/s3-file-output-cp.md)


---

# Agent Instructions: Querying This Documentation

If you need additional information that is not directly available in this page, you can query the documentation dynamically by asking a question.

Perform an HTTP GET request on the current page URL with the `ask` query parameter:

```
GET https://docs.pentaho.com/pdia-data-integration/pdi-transformation-steps-reference-overview/s3-csv-input-cp.md?ask=<question>
```

The question should be specific, self-contained, and written in natural language.
The response will contain a direct answer to the question and relevant excerpts and sources from the documentation.

Use this mechanism when the answer is not explicitly present in the current page, when you need clarification or additional context, or when you want to retrieve related documentation sections.
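For example, the query URL can be built with standard percent-encoding so the question survives as a single `ask` parameter (a sketch; `ask_docs` is an illustrative helper):

```python
from urllib.parse import urlencode

def ask_docs(question: str) -> str:
    """Build the documentation-query URL for this page's ask endpoint."""
    base = ("https://docs.pentaho.com/pdia-data-integration/"
            "pdi-transformation-steps-reference-overview/s3-csv-input-cp.md")
    return f"{base}?{urlencode({'ask': question})}"

print(ask_docs("How do I set the delimiter?"))
```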
