S3 File Output

The S3 File Output step writes rows to a text file in Amazon Simple Storage Service (Amazon S3).

Performance and memory considerations

Because S3 does not support append mode, and because generated output may be buffered, you might see an out-of-memory error followed by java.io.IOException: Read end dead when a transformation closes the file.

To help avoid these errors:

  • Increase the Java heap space (-Xmx) for Spoon.

  • Set the Kettle property s3.vfs.useTempFileOnUploadData=Y to write a temporary file locally and then upload it to S3.
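The temporary-file option can be pictured with the following minimal Java sketch. It streams rows to a local temporary file, so memory use does not grow with the size of the output, closes the file, and only then hands it to an uploader. The class TempFileSpill, the method writeThenUpload, and the uploader callback are hypothetical names used for illustration; this is not how PDI implements s3.vfs.useTempFileOnUploadData.

```java
import java.io.BufferedWriter;
import java.io.IOException;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.List;
import java.util.function.Consumer;

// Hypothetical sketch of the "write locally, then upload" pattern.
// Not PDI's implementation of s3.vfs.useTempFileOnUploadData.
final class TempFileSpill {

    // Streams rows to a local temporary file so memory use stays bounded,
    // then passes the finished file to an uploader (hypothetical callback).
    static Path writeThenUpload(Iterable<String> rows, Consumer<Path> uploader) throws IOException {
        Path tmp = Files.createTempFile("s3-output-", ".txt");
        try (BufferedWriter out = Files.newBufferedWriter(tmp, StandardCharsets.UTF_8)) {
            for (String row : rows) {
                out.write(row);
                out.newLine();
            }
        }
        // try-with-resources has closed the file, so it is complete before the upload starts.
        uploader.accept(tmp);
        return tmp;
    }

    public static void main(String[] args) throws IOException {
        Path p = writeThenUpload(List.of("id;name", "1;alpha"),
                path -> System.out.println("would upload " + path));
        System.out.println(Files.readAllLines(p, StandardCharsets.UTF_8));
        Files.deleteIfExists(p);
    }
}
```

The trade-off is local disk space in exchange for a bounded heap: only the writer's small buffer is held in memory while rows are produced.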

For multipart uploads, the default part size is 5 MB, which helps avoid upload inactivity timeouts. To override it, set the s3.vfs.partSize property to a value between 5 MB and 1 GB, using a format such as 5MB, 5.5MB, or 1GB.
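The following hypothetical Java helper shows how part-size strings such as 5MB, 5.5MB, and 1GB could be parsed and checked against the documented 5 MB to 1 GB range, and how the part size determines the number of parts for a file. It assumes binary units (1 MB = 1,048,576 bytes) and case-insensitive MB/GB suffixes; PDI's own parsing of s3.vfs.partSize may differ.

```java
import java.math.BigDecimal;
import java.util.Locale;

// Hypothetical parser for values such as "5MB", "5.5MB", or "1GB".
// Assumes binary units (1 MB = 1024 * 1024 bytes); PDI's own parsing may differ.
final class PartSize {
    static final long MB = 1024L * 1024L;
    static final long GB = 1024L * MB;
    static final long MIN = 5 * MB;   // documented lower bound
    static final long MAX = GB;       // documented upper bound

    static long parseBytes(String value) {
        String v = value.trim().toUpperCase(Locale.ROOT);
        long unit;
        if (v.endsWith("GB")) {
            unit = GB;
        } else if (v.endsWith("MB")) {
            unit = MB;
        } else {
            throw new IllegalArgumentException("Expected a value ending in MB or GB: " + value);
        }
        BigDecimal number = new BigDecimal(v.substring(0, v.length() - 2).trim());
        long bytes = number.multiply(BigDecimal.valueOf(unit)).longValue();
        if (bytes < MIN || bytes > MAX) {
            throw new IllegalArgumentException("Part size must be between 5MB and 1GB: " + value);
        }
        return bytes;
    }

    // Number of parts needed to upload a file of the given size (ceiling division).
    static long partCount(long fileBytes, long partBytes) {
        return (fileBytes + partBytes - 1) / partBytes;
    }

    public static void main(String[] args) {
        System.out.println(parseBytes("5MB"));    // 5242880
        System.out.println(parseBytes("5.5MB"));  // 5767168
        System.out.println(parseBytes("1GB"));    // 1073741824
        // A 100 MB file split into default 5 MB parts needs 20 parts.
        System.out.println(partCount(100 * MB, parseBytes("5MB")));  // 20
    }
}
```

Smaller parts mean more requests per file but less data in flight per request, which is why a small default helps avoid inactivity timeouts.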

General

  • Step name: Specify the unique name of the S3 File Output step on the canvas. You can customize the name or leave it as the default.

File tab

Use the File tab to define the output file location and naming.

Option
Description

Filename

Output file name.

S3 file paths use the following schema:

s3n://s3_bucket_name/absolute_path_to_file

Do not create file at start

Select to create the file only at the end of processing.

Accept file name from field?

Select to set the file name from an incoming field.

File name field

Incoming field that contains the file name (available only when Accept file name from field? is selected).

Extension

File extension to append. Default is .txt.

Include stepnr in filename?

Select to include the step copy number in the file name (for example, _0).

Include partition nr in file name?

Select to include the partition number in the file name.

Include date in file name?

Select to include the system date (for example, _20181231).

Include time in file name?

Select to include the system time (for example, _235959).

Specify date time format

Select to include date/time using Date time format.

Date time format

Date/time format to use.

Show filename(s)

Displays a simulated list of files that will be generated.

Add filenames to result

Clear if you do not want to add generated file names to the transformation result.
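To illustrate how the naming options combine, here is a minimal Java sketch. The order of the suffixes (step number, partition number, date, time, then the extension) is an assumption based on the examples above, and the class OutputFileName and the bucket path my-bucket are hypothetical. Use Show filename(s) to see the names the step actually generates.

```java
import java.time.LocalDateTime;
import java.time.format.DateTimeFormatter;

// Hypothetical sketch: combines File tab options into an output file name.
// The suffix order (stepnr, partition nr, date, time, extension) is an assumption.
final class OutputFileName {
    private static final DateTimeFormatter DATE = DateTimeFormatter.ofPattern("yyyyMMdd");
    private static final DateTimeFormatter TIME = DateTimeFormatter.ofPattern("HHmmss");

    static String build(String baseName, String extension, Integer stepNr, Integer partitionNr,
                        boolean includeDate, boolean includeTime, LocalDateTime now) {
        StringBuilder name = new StringBuilder(baseName);
        if (stepNr != null) {
            name.append('_').append(stepNr);                 // e.g. _0
        }
        if (partitionNr != null) {
            name.append('_').append(partitionNr);
        }
        if (includeDate) {
            name.append('_').append(now.format(DATE));       // e.g. _20181231
        }
        if (includeTime) {
            name.append('_').append(now.format(TIME));       // e.g. _235959
        }
        if (extension != null && !extension.isEmpty()) {
            // Accept the extension with or without a leading dot.
            name.append(extension.startsWith(".") ? extension : "." + extension);
        }
        return name.toString();
    }

    public static void main(String[] args) {
        LocalDateTime t = LocalDateTime.of(2018, 12, 31, 23, 59, 59);
        System.out.println(build("s3n://my-bucket/out/sales", "txt", 0, null, true, true, t));
        // s3n://my-bucket/out/sales_0_20181231_235959.txt
    }
}
```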

Content tab

Use the Content tab to define how the text output is formatted.

Option
Description

Append

Select to append lines to the end of the file.

Separator

Character used to separate fields in a line (for example, ; or a tab). Default is ;. Select Insert Tab to insert a tab.

Enclosure

Optional character to enclose fields (for example, "). Helps preserve separators inside field values.

Force the enclosure around fields?

Select to enclose all fields using the enclosure character.

Header

Clear if the output file should not include a header line.

Format

Line ending format: DOS (CR + LF) or UNIX (LF). Default is DOS (CR + LF).

Compression

Output compression: .zip or .gzip. Only one file is included in the archive. Default is None.

Encoding

Text encoding. Leave blank to use the system default. Use UTF-8 or UTF-16 for Unicode.

Right pad fields

Select to pad fields with spaces (or truncate) to reach the Length specified on the Fields tab.

Fast data dump (no formatting)

Select to improve performance by omitting formatting.

Split every ... rows

Splits output into files of N rows (when N is greater than 0).

Add ending line of file

Optional final line to add to the file.
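The following minimal Java sketch shows how Separator, Enclosure, Force the enclosure around fields?, Format, and Split every ... rows could interact when lines are written. Enclosing a value only when it contains the separator or the enclosure character (unless enclosure is forced) follows the descriptions above; doubling an enclosure character that appears inside a value is an assumption borrowed from common CSV practice. LineFormatter and its methods are hypothetical and are not PDI's writer.

```java
import java.util.List;
import java.util.stream.Collectors;

// Hypothetical sketch of Content tab options. Not PDI's actual writer.
final class LineFormatter {
    static final String DOS = "\r\n";   // CR + LF
    static final String UNIX = "\n";    // LF

    static String formatLine(List<String> fields, String separator, String enclosure,
                             boolean forceEnclosure, String lineEnding) {
        return fields.stream()
                .map(f -> enclose(f, separator, enclosure, forceEnclosure))
                .collect(Collectors.joining(separator)) + lineEnding;
    }

    // Encloses a value when forced, or when it contains the separator or the enclosure.
    private static String enclose(String value, String separator, String enclosure, boolean force) {
        if (enclosure == null || enclosure.isEmpty()) {
            return value;
        }
        boolean needed = force || value.contains(separator) || value.contains(enclosure);
        if (!needed) {
            return value;
        }
        // Doubling embedded enclosure characters is an assumption (CSV convention).
        return enclosure + value.replace(enclosure, enclosure + enclosure) + enclosure;
    }

    // Zero-based index of the output file that a zero-based row lands in
    // when splitting every N rows; N <= 0 means no splitting.
    static long fileIndex(long rowNumber, long splitEvery) {
        return splitEvery > 0 ? rowNumber / splitEvery : 0;
    }

    public static void main(String[] args) {
        System.out.print(formatLine(List.of("1", "a;b", "c"), ";", "\"", false, UNIX));
        // 1;"a;b";c
        System.out.print(formatLine(List.of("1", "c"), ";", "\"", true, DOS));
        // "1";"c" followed by CR + LF
        System.out.println(fileIndex(2500, 1000)); // 2
    }
}
```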

Fields tab

Use the Fields tab to define properties for exported fields.

Column
Description

Name

Field name.

Type

Field data type.

Format

Format mask for date and numeric fields. See Common Formats.

Length

Field length.

  • Number: Total number of significant digits.

  • String: Total length of the string.

  • Date: Length of printed output (for example, 4 for a year).

Precision

Number of floating point digits for number fields.

Currency

Currency symbol (for example, $ or €).

Decimal

Decimal point character (. or ,).

Group

Thousands separator character (. or ,).

Trim type

Trimming method (none, left, right, or both). Trimming works only when Length is not specified.

Null

String to write when the field value is null.

Get fields

Retrieves fields from the incoming stream.

Minimal width

Minimizes field lengths by removing unnecessary padding.

For more information, see Understanding PDI data types and field metadata.
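The Format, Decimal, Group, Currency, and Null columns can be pictured with Java's java.text.DecimalFormat. This sketch assumes DecimalFormat-style masks such as #,##0.00; NumberFieldFormatter is a hypothetical helper, and PDI's own formatting also accounts for Length, Precision, and trimming, which are not shown here.

```java
import java.math.BigDecimal;
import java.text.DecimalFormat;
import java.text.DecimalFormatSymbols;
import java.util.Locale;

// Hypothetical sketch: applies Format, Decimal, Group, Currency, and Null
// to a Number field value, assuming java.text.DecimalFormat-style masks.
final class NumberFieldFormatter {

    static String format(BigDecimal value, String mask, char decimal, char group,
                         String currency, String nullString) {
        if (value == null) {
            return nullString;                          // Null column
        }
        DecimalFormatSymbols symbols = DecimalFormatSymbols.getInstance(Locale.ROOT);
        symbols.setDecimalSeparator(decimal);           // Decimal column
        symbols.setMonetaryDecimalSeparator(decimal);   // used instead when the mask has a currency sign
        symbols.setGroupingSeparator(group);            // Group column
        symbols.setCurrencySymbol(currency);            // Currency column, shown where the mask has the currency sign
        // The mask itself is always written with '.' and ',' (a non-localized pattern).
        return new DecimalFormat(mask, symbols).format(value);
    }

    public static void main(String[] args) {
        BigDecimal amount = new BigDecimal("1234.5");
        System.out.println(format(amount, "#,##0.00", ',', '.', "\u20AC", ""));     // 1.234,50
        System.out.println(format(amount, "\u00A4#,##0.00", '.', ',', "$", ""));
        System.out.println(format(null, "#,##0.00", '.', ',', "$", "NULL"));        // NULL
    }
}
```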

AWS credentials

The S3 File Output step provides credentials to the AWS SDK for Java using a credential provider chain. By default, the chain looks for credentials in the following locations (in this order):

  1. Environment variables: AWS_ACCESS_KEY_ID, AWS_SECRET_ACCESS_KEY, and AWS_SESSION_TOKEN

  2. AWS credentials file (for example, ~/.aws/credentials or %UserProfile%\.aws\credentials)

  3. AWS CLI configuration file (for example, ~/.aws/config)

  4. ECS container credentials

  5. EC2 instance profile credentials
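The provider-chain idea (try each source in order and use the first one that returns credentials) can be sketched in plain Java. This is a simplified, hypothetical illustration covering only the first two sources: it reads AWS_ACCESS_KEY_ID and AWS_SECRET_ACCESS_KEY, ignores AWS_SESSION_TOKEN, and parses only the [default] profile of the credentials file. In real code, rely on the AWS SDK's default chain (for example, DefaultAWSCredentialsProviderChain in the AWS SDK for Java 1.x) rather than this sketch.

```java
import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.List;
import java.util.Map;
import java.util.Optional;
import java.util.function.Supplier;

// Hypothetical, simplified provider chain: the first provider that returns
// credentials wins. Not the AWS SDK's implementation.
final class CredentialChainSketch {

    record Credentials(String accessKeyId, String secretAccessKey, String source) {}

    // 1. Environment variables (session token ignored in this sketch).
    static Optional<Credentials> fromEnv(Map<String, String> env) {
        String id = env.get("AWS_ACCESS_KEY_ID");
        String secret = env.get("AWS_SECRET_ACCESS_KEY");
        return (id != null && secret != null)
                ? Optional.of(new Credentials(id, secret, "environment"))
                : Optional.empty();
    }

    // 2. [default] profile of a credentials file (minimal INI parsing).
    static Optional<Credentials> fromCredentialsFile(Path file) {
        if (!Files.isReadable(file)) return Optional.empty();
        try {
            String id = null, secret = null;
            boolean inDefault = false;
            for (String raw : Files.readAllLines(file)) {
                String line = raw.trim();
                if (line.startsWith("[")) {
                    inDefault = line.equals("[default]");
                } else if (inDefault && line.contains("=")) {
                    String key = line.substring(0, line.indexOf('=')).trim();
                    String value = line.substring(line.indexOf('=') + 1).trim();
                    if (key.equals("aws_access_key_id")) id = value;
                    if (key.equals("aws_secret_access_key")) secret = value;
                }
            }
            return (id != null && secret != null)
                    ? Optional.of(new Credentials(id, secret, "credentials file"))
                    : Optional.empty();
        } catch (IOException e) {
            return Optional.empty();
        }
    }

    // Tries each provider in order and returns the first credentials found.
    static Optional<Credentials> resolve(List<Supplier<Optional<Credentials>>> chain) {
        for (Supplier<Optional<Credentials>> provider : chain) {
            Optional<Credentials> c = provider.get();
            if (c.isPresent()) return c;
        }
        return Optional.empty();
    }

    public static void main(String[] args) {
        Path home = Path.of(System.getProperty("user.home"));
        Optional<Credentials> creds = resolve(List.of(
                () -> fromEnv(System.getenv()),
                () -> fromCredentialsFile(home.resolve(".aws").resolve("credentials"))
                // Config file, ECS, and EC2 providers are omitted in this sketch.
        ));
        // Print only where the credentials came from, never the secret itself.
        System.out.println(creds.map(Credentials::source).orElse("no credentials found"));
    }
}
```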

Metadata injection support

All fields of this step support metadata injection. You can use this step with ETL metadata injection to pass metadata to your transformation at runtime.
