# ORC Output

The ORC Output step serializes data from the PDI data stream into the ORC format and writes it to a file. [ORC](https://orc.apache.org/) (Optimized Row Columnar) is a columnar storage format designed for fast data access.

The fields written to the ORC output file are defined by the input fields from the PDI data stream. In the **Fields** tab, you can delete input fields that you do not want to write, write fields under alternate names, or assign default values.

### General tab

Enter the following information in the transformation step fields:

| Option                             | Description                                                                                                                                                                                                                                                                                                                                                               |
| ---------------------------------- | ------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------- |
| **Step name**                      | Specify the unique name of the ORC Output step on the canvas. You can customize the name or leave it as the default.                                                                                                                                                                                                                                                      |
| **Folder/File name**               | Specify the location and name of the file or folder. Click **Browse** to display the **Open File** window and navigate to the destination file or folder. For the supported file system types, see [Connecting to Virtual File Systems](/pdia-data-integration/extracting-data-into-pdi/virtual-file-system-browser.md). ORC files are created in the specified location. |
| **Overwrite existing output file** | Select to overwrite an existing file that has the same file name and extension.                                                                                                                                                                                                                                                                                           |

### Fields tab

![ORC Output step](/spaces/YwnJ6Fexn4LZwKRHghPK/files/QRFCQBnPYxNHAMBnhfjw)

In the **Fields** tab, you can define fields that make up the ORC type description created by this step.

| Field             | Description                                                                                                      |
| ----------------- | ---------------------------------------------------------------------------------------------------------------- |
| **ORC path**      | Specify the name of the field as it will appear in the ORC data file or files.                                   |
| **Name**          | Specify the name of the PDI field.                                                                               |
| **ORC type**      | Define the data type of the field.                                                                               |
| **Precision**     | Specify the total number of digits in the number. Applies only to the Decimal ORC type. The default is `20`.     |
| **Scale**         | Specify the number of digits after the decimal point. Applies only to the Decimal ORC type. The default is `10`. |
| **Default value** | Specify the default value of the field if it is null or empty.                                                   |
| **Null**          | Specify whether the field can contain null values.                                                               |

{% hint style="warning" %}
To help prevent a transformation failure, enter a value in **Default value** for every field where **Null** is set to `No`.
{% endhint %}
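To make the interaction between **Precision**, **Scale**, **Default value**, and **Null** concrete, here is a minimal Python sketch of how one incoming value might be prepared for a Decimal field with the default precision `20` and scale `10`. The `prepare_decimal` helper is hypothetical and is not PDI code; the half-up rounding and the choice to reject out-of-range values are assumptions, so PDI's actual behavior may differ.

```python
from decimal import Decimal, ROUND_HALF_UP, localcontext
from typing import Optional


def prepare_decimal(value: Optional[str],
                    precision: int = 20,
                    scale: int = 10,
                    default: Optional[str] = None,
                    nullable: bool = True) -> Optional[Decimal]:
    """Hypothetical helper: shape one value for a Decimal(precision, scale) field.

    Illustrative only; this is not PDI's implementation.
    """
    # Null or empty input falls back to the Default value, if one is set.
    if value is None or value == "":
        if default not in (None, ""):
            value = default
        elif nullable:
            return None  # Null = Yes: write a null
        else:
            raise ValueError("null in a non-nullable field with no default value")

    # At most (precision - scale) digits may appear before the decimal point.
    limit = Decimal(10) ** (precision - scale)
    with localcontext() as ctx:
        ctx.prec = precision + 2  # room for the rounded coefficient
        d = Decimal(value)
        if abs(d) >= limit:
            raise ValueError(f"{value} does not fit Decimal({precision},{scale})")
        # Keep exactly `scale` digits after the decimal point (assumed half-up).
        d = d.quantize(Decimal(1).scaleb(-scale), rounding=ROUND_HALF_UP)
        if abs(d) >= limit:  # rounding can carry into a new integer digit
            raise ValueError(f"{value} does not fit Decimal({precision},{scale})")
    return d
```

For example, `prepare_decimal("3.14159")` returns `Decimal('3.1415900000')`, while `prepare_decimal(None, nullable=False)` raises an error because the field has neither a value nor a default, which mirrors the failure the warning describes.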

You can define the fields manually, or you can click **Get Fields** to populate them from the incoming PDI data stream.

#### PDI type to ORC type mapping

When you click **Get Fields**, PDI converts each PDI type to an applicable ORC type as follows:

| PDI type    | ORC type  |
| ----------- | --------- |
| InetAddress | String    |
| String      | String    |
| TimeStamp   | TimeStamp |
| Binary      | Binary    |
| BigNumber   | Decimal   |
| Boolean     | Boolean   |
| Date        | Date      |
| Integer     | Integer   |
| Number      | Double    |
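The mapping can be expressed as a simple lookup. This is a minimal Python sketch for illustration only: the `PDI_TO_ORC` dictionary and the `orc_type_for` and `map_fields` helpers are hypothetical and are not part of PDI, and the keys and values are the labels from the table rather than identifiers from any PDI API.

```python
from typing import List, Tuple

# Hypothetical lookup mirroring the documented PDI-to-ORC type mapping.
# Keys and values are the labels shown in the table, not PDI class names.
PDI_TO_ORC = {
    "InetAddress": "String",
    "String": "String",
    "TimeStamp": "TimeStamp",
    "Binary": "Binary",
    "BigNumber": "Decimal",
    "Boolean": "Boolean",
    "Date": "Date",
    "Integer": "Integer",
    "Number": "Double",
}


def orc_type_for(pdi_type: str) -> str:
    """Return the ORC type label for a PDI type label."""
    try:
        return PDI_TO_ORC[pdi_type]
    except KeyError:
        raise ValueError(f"no documented ORC mapping for {pdi_type!r}") from None


def map_fields(fields: List[Tuple[str, str]]) -> List[Tuple[str, str]]:
    """Map (field name, PDI type) pairs to (field name, ORC type) pairs."""
    return [(name, orc_type_for(pdi_type)) for name, pdi_type in fields]
```

For example, `map_fields([("amount", "BigNumber"), ("ip", "InetAddress")])` returns `[("amount", "Decimal"), ("ip", "String")]`. Note that two PDI types (InetAddress and String) collapse onto the same ORC type, so the mapping is not reversible.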

### Options tab

![ORC Output step Options tab](/spaces/YwnJ6Fexn4LZwKRHghPK/files/F2qRSi4bg9l36hmqBvt4)

The options in the **Options** tab define how the ORC output file is created.

#### Compression

Specify which codec to use to compress the ORC output file:

* **None**: No compression is used. (Default)
* **Zlib**: Writes data blocks using the deflate algorithm, as specified in [RFC 1951](https://www.ietf.org/rfc/rfc1951.txt), typically implemented using the zlib library.
* **LZO**: Writes data blocks using LZO encoding. LZO works well for `CHAR` and `VARCHAR` columns that store very long character strings.
* **Snappy**: Uses Google’s [Snappy](http://google.github.io/snappy/) compression library.

{% hint style="warning" %}
Due to licensing constraints, ORC does not ship with LZO compression libraries. Install the libraries manually on each node if you want to use LZO compression.
{% endhint %}
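The **Zlib** option refers to the deflate algorithm from RFC 1951. The following minimal Python sketch uses the standard `zlib` module to produce and read back a raw deflate stream (no zlib header or checksum), which is our understanding of how ORC stores Zlib-compressed data; treat that detail as an assumption. The helper names are hypothetical, and PDI and the ORC library perform this compression for you.

```python
import zlib


def deflate_raw(data: bytes, level: int = 6) -> bytes:
    """Compress to a raw DEFLATE stream (RFC 1951): no zlib header or checksum."""
    # A negative wbits value (-15) tells zlib to omit the RFC 1950 wrapper.
    comp = zlib.compressobj(level, zlib.DEFLATED, -15)
    return comp.compress(data) + comp.flush()


def inflate_raw(data: bytes) -> bytes:
    """Decompress a raw DEFLATE stream produced by deflate_raw."""
    return zlib.decompress(data, -15)


# Repetitive, column-style data compresses well.
sample = b"2018-12-31," * 10_000
packed = deflate_raw(sample)
print(len(sample), "->", len(packed), "bytes")
assert inflate_raw(packed) == sample
```

Columnar formats benefit from codecs like this because each column stores values of one type next to each other, which tends to produce the long repeated patterns that deflate exploits.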

#### Stripe size (MB)

Define the stripe size in megabytes. An ORC file has one or more stripes. Each stripe is composed of rows of data, an index of the data, and a footer containing metadata about the stripe’s contents. Large stripe sizes enable efficient reads from HDFS. The default is `64`.

For more information, see the Hive ORC documentation: <https://cwiki.apache.org/confluence/display/Hive/LanguageManual+ORC>
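As rough arithmetic for choosing a stripe size, the minimal Python sketch below estimates how many stripes a given amount of encoded data produces. It assumes that "MB" means 2^20 bytes, and the `estimate_stripes` helper is hypothetical. The real ORC writer decides when to close a stripe based on its estimate of buffered memory, so actual stripe counts will differ; the sketch only shows the trade-off that larger stripes mean fewer, larger reads.

```python
MB = 1024 * 1024  # assumption: the dialog's MB means 2**20 bytes


def estimate_stripes(encoded_bytes: int, stripe_size_mb: int = 64) -> int:
    """Rough count of stripes for a given amount of encoded column data."""
    stripe_bytes = stripe_size_mb * MB
    # Ceiling division: a partly filled final stripe still counts as a stripe.
    return -(-encoded_bytes // stripe_bytes)


for size_mb in (64, 256):
    print(size_mb, "MB stripes ->", estimate_stripes(1024 * MB, size_mb), "stripes")
```

With 1 GB of encoded data, the default `64` MB stripe size gives about 16 stripes, while 256 MB stripes give about 4.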

#### Compress size (KB)

Define the number of kilobytes in each compression chunk. The default is `256`.
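Compression is applied per chunk rather than to the whole file at once. The minimal Python sketch below, using the standard `zlib` module and assuming the Zlib codec, splits a byte stream into 256 KB chunks and deflates each one. The 3-byte chunk header (`length * 2 + is_original`, little-endian) and the rule of storing a chunk uncompressed when compression does not make it smaller follow our reading of the ORC file format specification; the function names are hypothetical, and the ORC library handles all of this inside PDI.

```python
import zlib

CHUNK_KB = 256  # default Compress size (KB)


def chunk_header(length: int, is_original: bool) -> bytes:
    """3-byte little-endian header: length * 2 + (1 if stored uncompressed)."""
    value = length * 2 + (1 if is_original else 0)
    if value >= 1 << 24:
        raise ValueError("chunk too large for a 3-byte header")
    return value.to_bytes(3, "little")


def compress_stream(data: bytes, chunk_kb: int = CHUNK_KB) -> bytes:
    """Deflate `data` chunk by chunk, prefixing each chunk with its header."""
    chunk_size = chunk_kb * 1024
    out = bytearray()
    for start in range(0, len(data), chunk_size):
        chunk = data[start:start + chunk_size]
        comp = zlib.compressobj(6, zlib.DEFLATED, -15)  # raw deflate
        packed = comp.compress(chunk) + comp.flush()
        if len(packed) < len(chunk):
            out += chunk_header(len(packed), False) + packed
        else:
            # Compression did not help, so keep the original bytes.
            out += chunk_header(len(chunk), True) + chunk
    return bytes(out)


def decompress_stream(stream: bytes) -> bytes:
    """Reverse compress_stream by walking the chunk headers."""
    out = bytearray()
    pos = 0
    while pos < len(stream):
        value = int.from_bytes(stream[pos:pos + 3], "little")
        length, is_original = value >> 1, value & 1
        body = stream[pos + 3:pos + 3 + length]
        out += body if is_original else zlib.decompress(body, -15)
        pos += 3 + length
    return bytes(out)


data = b"x" * (600 * 1024)  # spans three chunks: 256 KB + 256 KB + 88 KB
stream = compress_stream(data)
assert decompress_stream(stream) == data
print(len(data), "->", len(stream), "bytes")
```

Smaller chunks let a reader decompress less data to reach a given position, at the cost of slightly worse compression and more header overhead.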

#### Inline indexes

If selected, rows are indexed when written for faster filtering and random access on read.

#### Rows between entries

Define the stride size (number of rows between index entries). The value must be `1000` or greater. The default is `10000`.
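The effect of the stride on index size and seeking can be shown with simple integer arithmetic. This minimal Python sketch assumes **Inline indexes** is selected and uses hypothetical helper names; it only illustrates the documented minimum (`1000`) and default (`10000`) and is not how PDI or ORC validate the setting internally.

```python
MIN_STRIDE = 1_000       # documented minimum for Rows between entries
DEFAULT_STRIDE = 10_000  # documented default


def check_stride(stride: int) -> int:
    """Reject strides below the documented minimum."""
    if stride < MIN_STRIDE:
        raise ValueError(f"Rows between entries must be >= {MIN_STRIDE}, got {stride}")
    return stride


def index_entries(rows_in_stripe: int, stride: int = DEFAULT_STRIDE) -> int:
    """Number of index entries (row groups) needed for one stripe."""
    check_stride(stride)
    return -(-rows_in_stripe // stride)


def row_group_of(row: int, stride: int = DEFAULT_STRIDE) -> int:
    """0-based row group holding a 0-based row number within the stripe."""
    check_stride(stride)
    return row // stride


# A reader looking for row 123,456 can seek to row group 12 and then skip
# at most stride - 1 rows, instead of scanning from the start of the stripe.
print(index_entries(1_000_000), row_group_of(123_456))
```

A smaller stride gives finer-grained seeking and filtering but more index entries to store; a larger stride does the opposite.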

#### Include date in file name

Select to add the system date to the file name in `yyyyMMdd` format (for example, `20181231`).

#### Include time in file name

Select to add the system time to the file name in `HHmmss` format (for example, `235959`).
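The date and time options can be sketched as follows. This minimal Python example is hypothetical: the underscore separator, the order of the date and time parts, and appending them before an `.orc` extension are assumptions, so check the file names PDI actually produces. The Java patterns `yyyyMMdd` and `HHmmss` correspond to the `strftime` codes `%Y%m%d` and `%H%M%S`.

```python
from datetime import datetime


def output_filename(base: str, now: datetime, add_date: bool = False,
                    add_time: bool = False, extension: str = ".orc") -> str:
    """Hypothetical: build base[_yyyyMMdd][_HHmmss] + extension."""
    parts = [base]
    if add_date:
        parts.append(now.strftime("%Y%m%d"))  # Java pattern yyyyMMdd
    if add_time:
        parts.append(now.strftime("%H%M%S"))  # Java pattern HHmmss
    return "_".join(parts) + extension


stamp = datetime(2018, 12, 31, 23, 59, 59)
print(output_filename("sales", stamp, add_date=True, add_time=True))
```

The timestamp is passed in explicitly to keep the example reproducible; the step itself uses the system date and time when it runs (`datetime.now()` in this sketch's terms).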

#### Specify date time format

Select to choose the date and time format for the file name from the drop-down list.

### Metadata injection support

All fields of this step support metadata injection. You can use this step with [ETL metadata injection](/pdia-data-integration/pdi-transformation-steps-reference-overview/etl-metadata-injection.md) to pass metadata to your transformation at runtime.


