Hadoop File Output

Use the Hadoop File Output step to write data to text files stored on a Hadoop cluster.

This step is commonly used to generate comma-separated values (CSV) files that are easily read by spreadsheet applications. You can also generate fixed-width files by setting field lengths on the Fields tab.
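The difference between the two layouts can be illustrated with a minimal Python sketch. It is not how the step is implemented; the sample rows, the `lengths` tuple, and both helper functions are hypothetical names used only to show delimited versus fixed-width output.

```python
# Hypothetical sample rows; field values and lengths are illustrative only.
rows = [("Alice", "Oslo", 42), ("Bob", "Lima", 7)]
lengths = (8, 6, 4)  # assumed per-field lengths, as would be set on the Fields tab


def to_csv_line(row, separator=";"):
    """Join the values with a separator (delimited output)."""
    return separator.join(str(value) for value in row)


def to_fixed_width_line(row, lengths):
    """Pad or truncate every value to its field length (fixed-width output)."""
    return "".join(str(value).ljust(n)[:n] for value, n in zip(row, lengths))


csv_lines = [to_csv_line(r) for r in rows]
fixed_lines = [to_fixed_width_line(r, lengths) for r in rows]
```

In the delimited form the separator marks the field boundaries; in the fixed-width form the column positions do, so every line has the same length.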

Step name

  • Step name: Specify the unique name of the Hadoop File Output step on the canvas. You can customize the name or leave the default.

Options

The Hadoop File Output step includes the following tabs: File, Content, and Fields.

File tab

Use the File tab to define the basic properties for the output file.

Option
Description

Hadoop Cluster

Hadoop cluster configuration to use.

You can specify host names and ports for HDFS, Job Tracker, and other components in the Hadoop Cluster configuration dialog box. Select Edit to edit an existing configuration or New to create a new one.

For details, see Connecting to a Hadoop cluster with the PDI client.

Folder/File

Location and/or name of the output text file on the cluster. Select Browse to locate a folder or file in the VFS browser.

Create Parent Folder

Select to create the parent folder for the output file.

Do not create file at start

Select to avoid creating empty files when no rows are processed.

Accept file name from field?

Select to specify the output file name in a field in the input stream.

This setting can be fine-tuned with kettle.properties. See Improving performance when writing multiple files.

File name field

Field that contains the output file name at runtime.

Extension

File extension. Default: .txt.

Include stepnr in filename

Includes the copy number in the file name (for example, _0) when the step runs in multiple copies.

Include partition nr in file name?

Includes the partition number in the file name.

Include date in file name

Includes the system date in the file name (for example, _20181231).

Include time in file name

Includes the system time in the file name (for example, _235959).

Specify Date time format

Select to choose a custom date-time format in Date time format.

Date time format

Date-time format to use.

Show file name(s)

Displays a simulation of generated file names based on the step settings.

Add filenames to result

Adds the file name to the internal result file set.
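As a rough illustration of how the naming options combine, the following Python sketch builds a file name from a base path plus the optional date, time, copy number, and partition number suffixes. The function name, its parameters, the cluster path, and the order in which the suffixes are appended are all assumptions made for this example; use Show file name(s) to see the names the step actually generates.

```python
from datetime import datetime


def build_file_name(base, extension="txt", copy_nr=None, partition_nr=None,
                    when=None, include_date=False, include_time=False):
    """Compose an output file name from File tab style options (illustrative)."""
    when = when or datetime.now()
    name = base
    if include_date:
        name += "_" + when.strftime("%Y%m%d")   # for example, _20181231
    if include_time:
        name += "_" + when.strftime("%H%M%S")   # for example, _235959
    if copy_nr is not None:
        name += f"_{copy_nr}"                   # copy number, for example, _0
    if partition_nr is not None:
        name += f"_{partition_nr}"              # partition number
    if extension:
        name += "." + extension
    return name


example = build_file_name(
    "hdfs://namenode:8020/out/sales",           # hypothetical cluster path
    copy_nr=0,
    when=datetime(2018, 12, 31, 23, 59, 59),
    include_date=True,
    include_time=True,
)
```

With these inputs, `example` is `hdfs://namenode:8020/out/sales_20181231_235959_0.txt`.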

Content tab

Use the Content tab to describe the content written to the output text file.

Option
Description

Append

Appends lines to the end of the specified file.

Separator

Character that separates fields in a line. Typically semicolon (;) or tab.

Select Insert TAB to insert a tab character.

Enclosure

Optional string used to enclose fields (to allow separator characters within fields).

Force the enclosure around fields?

Select to enclose all fields using the value in Enclosure.

Header

Select if the output file includes a header row.

Footer

Select if the output file includes a footer row.

Format

Line ending format: DOS or UNIX.

Compression

Compression type: ZIP or GZIP. Only one file is placed in a single archive.

Encoding

Text encoding. Leave blank to use the default system encoding. For Unicode, specify UTF-8 or UTF-16.

Right pad fields

Adds spaces to the end of fields (or truncates) until the length specified in the Fields tab is reached.

Fast data dump (no formatting)

Improves performance when dumping large amounts of data by omitting formatting.

Split every ... rows

If N is greater than 0, splits output into multiple parts of N rows.

Add Ending line of file

Specifies an alternate ending line for the output file.
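The interplay of Separator, Enclosure, and row splitting can be sketched in Python as follows. This is a simplified model, not the step's implementation: the function names are hypothetical, enclosing a field only when it contains the separator is an assumption about the non-forced case, and escaping of enclosure characters inside values is left out.

```python
def format_line(values, separator=";", enclosure='"', force_enclosure=False):
    """Join values, enclosing a field when forced or when it contains the separator."""
    out = []
    for value in values:
        text = str(value)
        if enclosure and (force_enclosure or separator in text):
            text = enclosure + text + enclosure
        out.append(text)
    return separator.join(out)


def split_rows(rows, n):
    """Group rows into parts of at most n rows; n <= 0 keeps a single part."""
    rows = list(rows)
    if n <= 0:
        return [rows]
    return [rows[i:i + n] for i in range(0, len(rows), n)]


line = format_line(["a;b", "c"])           # only the first field needs enclosing
forced = format_line(["a", 1], force_enclosure=True)
parts = split_rows([1, 2, 3, 4, 5], 2)     # three parts: 2 + 2 + 1 rows
```

Here `line` is `"a;b";c`, `forced` is `"a";"1"`, and `parts` contains three lists, the last one holding the single leftover row.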

Fields tab

Use the Fields tab to define properties for exported fields.

Field
Description

Name

Field name.

Type

Field type: String, Date, or Number.

Format

Optional mask to convert the original field format.

Length

For Number: total number of significant figures. For String: string length. For Date: printed output length (for example, 4 for a year).

Precision

Number of digits after the decimal point for number fields.

Currency

Currency symbol, such as $ or € (for example, $5,000.00 or €5.000,00).

Decimal

Decimal symbol (period . or comma ,).

Group

Grouping symbol (comma , or period .).

Trim Type

String trimming method. Trimming only works when no field length is specified.

Null

String to write when the input field value is null.
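To show how Precision, Currency, Decimal, Group, and Null shape a written number, here is a small Python sketch. It only mimics the effect of these settings; `format_number` and its parameters are hypothetical, and the step itself applies the Format mask rather than this logic.

```python
def format_number(value, precision=2, decimal=".", group=",", currency="", null=""):
    """Render a number with the given symbols, or the null string for None."""
    if value is None:
        return null
    # Python's default output uses "," for grouping and "." for decimals.
    text = f"{value:,.{precision}f}"
    # Swap in the configured symbols through a placeholder so they cannot collide.
    text = text.replace(",", "\0").replace(".", decimal).replace("\0", group)
    return currency + text


us_style = format_number(5000, currency="$")
eu_style = format_number(5000, decimal=",", group=".", currency="€")
missing = format_number(None, null="N/A")
```

The two calls reproduce the examples from the Currency row, `$5,000.00` and `€5.000,00`, and `missing` is the configured null string `N/A`.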
