# Hadoop File Output

Use the **Hadoop File Output** step to write data to text files stored on a Hadoop cluster.

This step is commonly used to generate comma-separated values (CSV) files that are easily read by spreadsheet applications. You can also generate fixed-width files by setting field lengths on the **Fields** tab.

### Step name

* **Step name**: Specify the unique name of the Hadoop File Output step on the canvas. You can customize the name or leave the default.

### Options

The Hadoop File Output step includes the following tabs: **File**, **Content**, and **Fields**.

#### File tab

![File tab](/files/qwkED3tuQr33SIRGOUFc)

Use the **File** tab to define the basic properties for the output file.

| Option | Description |
| ------ | ----------- |
| **Hadoop Cluster** | Hadoop cluster configuration to use. You can specify host names and ports for HDFS, Job Tracker, and other components in the Hadoop Cluster configuration dialog box. Select **Edit** to edit an existing configuration or **New** to create a new one. For details, see [Connecting to a Hadoop cluster with the PDI client](/pdia-data-integration/extracting-data-into-pdi/connecting-to-a-hadoop-cluster-with-the-pdi-client-article.md). |
| **Folder/File** | Location and/or name of the output text file on the cluster. Select **Browse** to locate a folder or file in the [VFS browser](/pdia-data-integration/archived-merged-pages/connecting-to-virtual-file-systems-archive/vfs-browser-connecting-to-virtual-file-systems.md). |
| **Create Parent Folder** | Select to create the parent folder for the output file. |
| **Do not create file at start** | Select to avoid creating empty files when no rows are processed. |
| **Accept file name from field?** | Select to specify the output file name in a field in the input stream. This setting can be fine-tuned with `kettle.properties`. See [Improving performance when writing multiple files](/pdia-data-integration/data-integration-issues/improving-performance-when-writing-multiple-files.md). |
| **File name field** | Field that contains the output file name at runtime. |
| **Extension** | File extension. Default: `.txt`. |
| **Include stepnr in filename** | Includes the copy number in the file name (for example, `_0`) when the step runs in multiple copies. |
| **Include partition nr in file name?** | Includes the partition number in the file name. |
| **Include date in file name** | Includes the system date in the file name (for example, `_20181231`). |
| **Include time in file name** | Includes the system time in the file name (for example, `_235959`). |
| **Specify Date time format** | Select to choose a custom date-time format in **Date time format**. |
| **Date time format** | Date-time format to use. |
| **Show file name(s)** | Displays a simulation of generated file names based on the step settings. |
| **Add filenames to result** | Adds the file name to the internal result file set. |

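To make the file-name options above more concrete, the following Python sketch composes a name from a base path, optional date, time, copy-number, and partition-number suffixes, and an extension. It is an illustration only, not PDI code: the function `build_output_name`, its parameters, and the order in which the suffixes are appended are all assumptions. Use **Show file name(s)** in the step to see the names PDI actually generates.

```python
from datetime import datetime


def build_output_name(base, extension="txt", *, copy_nr=None, partition_nr=None,
                      add_date=False, add_time=False, now=None):
    """Compose an output file name from optional suffixes (hypothetical helper)."""
    now = now or datetime.now()
    name = base
    if add_date:
        # Mirrors "Include date in file name", for example _20181231
        name += "_" + now.strftime("%Y%m%d")
    if add_time:
        # Mirrors "Include time in file name", for example _235959
        name += "_" + now.strftime("%H%M%S")
    if copy_nr is not None:
        # Mirrors "Include stepnr in filename", for example _0
        name += f"_{copy_nr}"
    if partition_nr is not None:
        # Mirrors "Include partition nr in file name?"
        name += f"_{partition_nr}"
    if extension:
        # Accept the extension with or without a leading dot
        name += "." + extension.lstrip(".")
    return name
```

For example, calling `build_output_name("/out/sales", copy_nr=0, add_date=True, add_time=True)` at the last second of 2018 would yield `/out/sales_20181231_235959_0.txt` under the suffix order assumed here.
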
#### Content tab

![Content tab](/files/Yoncn6tuJucgkbTe7IYM)

Use the **Content** tab to describe the content written to the output text file.

| Option | Description |
| ------ | ----------- |
| **Append** | Appends lines to the end of the specified file. |
| **Separator** | Character that separates fields in a line. Typically semicolon (`;`) or tab. Select **Insert TAB** to insert a tab character. |
| **Enclosure** | Optional string used to enclose fields (to allow separator characters within fields). |
| **Force the enclosure around fields?** | Select to enclose all fields using the value in **Enclosure**. |
| **Header** | Select if the output file includes a header row. |
| **Footer** | Select if the output file includes a footer row. |
| **Format** | Line ending format: DOS or UNIX. |
| **Compression** | Compression type: ZIP or GZIP. Only one file is placed in a single archive. |
| **Encoding** | Text encoding. Leave blank to use the default system encoding. For Unicode, specify **UTF-8** or **UTF-16**. |
| **Right pad fields** | Adds spaces to the end of fields (or truncates) until the length specified in the **Fields** tab is reached. |
| **Fast data dump (no formatting)** | Improves performance when dumping large amounts of data by omitting formatting. |
| **Split every ... rows** | If `N` is greater than `0`, splits output into multiple parts of `N` rows. |
| **Add Ending line of file** | Specifies an alternate ending line for the output file. |

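The effect of **Separator**, **Enclosure**, **Force the enclosure around fields?**, **Header**, and **Format** can be approximated with Python's standard `csv` module. This is a minimal sketch of the resulting text layout, not the step's implementation: the function `render_rows` and its parameters are hypothetical, and PDI's exact quoting rules may differ from those of `csv.writer`.

```python
import csv
import io


def render_rows(header, rows, separator=";", enclosure='"',
                force_enclosure=False, line_ending="\n"):
    """Render rows as delimited text (hypothetical helper, not PDI code)."""
    buffer = io.StringIO()
    writer = csv.writer(
        buffer,
        delimiter=separator,          # Separator
        quotechar=enclosure,          # Enclosure
        # Force enclosure: quote every field; otherwise only when needed
        quoting=csv.QUOTE_ALL if force_enclosure else csv.QUOTE_MINIMAL,
        lineterminator=line_ending,   # "\r\n" for DOS, "\n" for UNIX
    )
    if header:
        writer.writerow(header)       # Header row
    writer.writerows(rows)
    return buffer.getvalue()
```

With the defaults, a value that contains the separator, such as `a;b`, is enclosed so that it still reads back as one field. With `force_enclosure=True`, every field is enclosed regardless of its content.
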
#### Fields tab

Use the **Fields** tab to define properties for exported fields.

| Field         | Description                                                                                                                                            |
| ------------- | ------------------------------------------------------------------------------------------------------------------------------------------------------ |
| **Name**      | Field name.                                                                                                                                            |
| **Type**      | Field type: **String**, **Date**, or **Number**.                                                                                                       |
| **Format**    | Optional mask to convert the original field format.                                                                                                    |
| **Length**    | For **Number**: total number of significant figures. For **String**: string length. For **Date**: printed output length (for example, `4` for a year). |
| **Precision** | Number of digits after the decimal point for number fields.                                                                                            |
| **Currency**  | Currency symbol (for example, `$5,000.00` or `€5.000,00`).                                                                                             |
| **Decimal**   | Decimal symbol (period `.` or comma `,`).                                                                                                              |
| **Group**     | Grouping symbol (comma `,` or period `.`).                                                                                                             |
| **Trim Type** | String trimming method. Trimming only works when no field length is specified.                                                                         |
| **Null**      | String to write when the input field value is null.                                                                                                    |

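The interaction of **Length**, **Null**, **Decimal**, and **Group** with the **Right pad fields** option can be sketched in Python as follows. The helpers `format_number` and `fit_field` are hypothetical names used for illustration only; they approximate the described behavior and do not reproduce PDI's format masks.

```python
def format_number(value, precision=2, decimal=".", group=","):
    """Render a number with custom decimal and grouping symbols."""
    text = f"{value:,.{precision}f}"  # Python default layout: "," groups, "." decimal
    # Swap symbols through a placeholder so the two replacements do not collide.
    return text.replace(",", "\0").replace(".", decimal).replace("\0", group)


def fit_field(value, length, null_text=""):
    """Pad with trailing spaces, or truncate, to exactly `length` characters."""
    text = null_text if value is None else str(value)
    return text.ljust(length)[:length]
```

For example, `format_number(5000, 2, decimal=",", group=".")` produces `5.000,00`, and `fit_field("abcdefg", 5)` truncates the value to `abcde`, which is how a fixed-width column of length 5 would be filled in this sketch.
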
