# ORC Input

The ORC Input step reads field data from an [Apache ORC](https://orc.apache.org/) (Optimized Row Columnar) file into the PDI data stream.

### Before you begin

Before using the ORC Input step, you must configure a named connection for your distribution, even if you set your **Location** to `Local`. For more information, see [Connecting to a Hadoop cluster with the PDI client](/pdia-data-integration/extracting-data-into-pdi/connecting-to-a-hadoop-cluster-with-the-pdi-client-article.md).

### General tab

Enter the following information in the ORC Input step fields:

| Field                | Description                                                                                                                                                                                                                                                                                                                                                                                          |
| -------------------- | ---------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------- |
| **Step name**        | Specify the unique name of the ORC Input step on the canvas. You can customize the name or leave it as the default.                                                                                                                                                                                                                                                                                  |
| **Folder/File name** | Specify the fully qualified URL of the source file or folder. Click **Browse** to open the **Open File** window and navigate to the file or folder. For the supported file system types, see [Connecting to Virtual File Systems](/pdia-data-integration/extracting-data-into-pdi/virtual-file-system-browser.md). The Pentaho engine reads a single ORC file as input. |

### Fields tab

The **Fields** tab contains the following items:

* **Pass through fields from the previous step**, which lets you read the fields from the input file without redefining any fields.
* A table defining data about the columns to read from the ORC file.

![ORC Input step](/spaces/YwnJ6Fexn4LZwKRHghPK/files/QBwyTXHiJJiOpbS7ZVYD)

The table on the **Fields** tab defines the fields to read as input from the ORC file, the associated PDI field name, and the data type.

| Field                   | Description                                                                                                                                |
| ----------------------- | ------------------------------------------------------------------------------------------------------------------------------------------ |
| **ORC path (ORC type)** | Specify the name of the field as it appears in the ORC data file and the ORC data type.                                                    |
| **Name**                | Specify the name of the input field.                                                                                                       |
| **Type**                | Specify the data type of the input field.                                                                                                  |
| **Format**              | Specify the [date format](/pdia-data-integration/pdi-transformation-steps-reference-overview/common-formats.md) when **Type** is **Date**. |

You can define the fields manually, or you can provide a path to an ORC data file and click **Get Fields** to populate the table. When the fields are retrieved, each ORC type is converted to a PDI type. You can change the PDI type by using the **Type** drop-down list or by entering the type manually.

To preview the data in the ORC file, click **Preview**.

#### ORC to PDI type mapping

The ORC to PDI data type mappings are shown in the following table:

| ORC type  | PDI type  |
| --------- | --------- |
| String    | String    |
| Timestamp | Timestamp |
| Binary    | Binary    |
| Decimal   | BigNumber |
| Boolean   | Boolean   |
| Date      | Date      |
| Integer   | Integer   |
| Double    | Number    |
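
As a rough illustration of the conversion in the table above, the following Python sketch looks up the PDI type for an ORC type name and builds rows shaped like the **Fields** table. It is not PDI's internal code: the names `ORC_TO_PDI`, `pdi_type_for`, and `suggest_fields` are hypothetical, and using the ORC path as the default field name is an assumption made for the example.

```python
# Illustrative sketch only; the helper names are hypothetical.
# The mapping itself is copied from the ORC to PDI type table.
ORC_TO_PDI = {
    "string": "String",
    "timestamp": "Timestamp",
    "binary": "Binary",
    "decimal": "BigNumber",
    "boolean": "Boolean",
    "date": "Date",
    "integer": "Integer",
    "double": "Number",
}


def pdi_type_for(orc_type: str) -> str:
    """Return the PDI type listed for an ORC type name (case-insensitive)."""
    key = orc_type.strip().lower()
    if key not in ORC_TO_PDI:
        raise ValueError(f"No PDI mapping listed for ORC type {orc_type!r}")
    return ORC_TO_PDI[key]


def suggest_fields(orc_columns):
    """Build rows shaped like the Fields table from (ORC path, ORC type) pairs.

    Assumption: the PDI field name defaults to the ORC path.
    """
    return [
        {"orc_path": path, "name": path, "type": pdi_type_for(orc_type)}
        for path, orc_type in orc_columns
    ]


if __name__ == "__main__":
    for row in suggest_fields([("order_id", "Integer"), ("amount", "Decimal")]):
        print(row)
```

An ORC type that is not listed in the table raises a `ValueError` in this sketch rather than guessing a PDI type.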

### Metadata injection support

All fields of this step support metadata injection. You can use this step with [ETL metadata injection](/pdia-data-integration/pdi-transformation-steps-reference-overview/etl-metadata-injection.md) to pass metadata to your transformation at runtime.

