# Parquet Input

The Parquet Input step decodes Parquet data files and extracts fields using the schema defined in the Parquet source files. Together, the Parquet Input and [Parquet Output](https://docs.pentaho.com/pdia-data-integration/pdi-transformation-steps-reference-overview/parquet-output) steps gather data from various sources and move that data into the Hadoop ecosystem in the Parquet format.

### Before you begin

Before using the Parquet Input step, you must configure a named connection for your distribution, even if your **Location** is set to `Local`. For more information about named connections, see [Connecting to a Hadoop cluster with the PDI client](https://docs.pentaho.com/pdia-data-integration/extracting-data-into-pdi/connecting-to-a-hadoop-cluster-with-the-pdi-client-article).

### General tab

The following fields are general to this transformation step:

| Field                   | Description                                                                                                                                                                                                                                                                                                                                                                                                                   |
| ----------------------- | ----------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------- |
| **Step name**           | Specify the unique name of the Parquet Input step on the canvas. You can customize the name or leave it as the default.                                                                                                                                                                                                                                                                                                       |
| **Folder/File name**    | Specify the fully qualified URL of the source file or folder name for the input fields. Click **Browse** to display the **Open File** window and navigate to the file or folder. For the supported file system types, see [Connecting to Virtual File Systems](https://docs.pentaho.com/pdia-data-integration/extracting-data-into-pdi/virtual-file-system-browser). The Pentaho engine reads a single Parquet file as input. |
| **Ignore empty folder** | Select to allow the transformation to proceed when the specified source file is not found in the designated location. If not selected, the specified source file is required for the transformation to proceed.                                                                                                                                                                                                               |

### Fields tab

![Parquet Input step](https://773338310-files.gitbook.io/~/files/v0/b/gitbook-x-prod.appspot.com/o/spaces%2FYwnJ6Fexn4LZwKRHghPK%2Fuploads%2Fgit-blob-36f4ac7f91b4de654f1e709d4a0d2cebf04a9815%2FPDI_ParquetInput_Fields_PentahoEngine.png?alt=media)

The **Fields** section contains the following items:

* **Pass through fields from the previous step**, which lets you read the fields from the input file without redefining any fields.
* A table defining data about the columns to read from the Parquet file.

The table in the **Fields** section defines the fields to read as input from the Parquet file, the associated PDI field name, and the data type.

| Field      | Description                                                                                                                                                     |
| ---------- | --------------------------------------------------------------------------------------------------------------------------------------------------------------- |
| **Path**   | Specify the name of the field as it appears in the Parquet data file and the Parquet data type.                                                                 |
| **Name**   | Specify the name of the input field.                                                                                                                            |
| **Type**   | Specify the type of the input field.                                                                                                                            |
| **Format** | Specify the [date format](https://docs.pentaho.com/pdia-data-integration/pdi-transformation-steps-reference-overview/common-formats) when **Type** is **Date**. |

Provide a path to a Parquet data file and click **Get Fields**. When the fields are retrieved, each Parquet type is converted to its corresponding PDI type.

You can preview the data in the Parquet file by clicking **Preview**. You can change the type by using the **Type** drop-down list or by entering the type manually.

#### Parquet to PDI type mapping

The Parquet to PDI data type mappings are shown in the following table:

| Parquet type         | PDI type  |
| -------------------- | --------- |
| ByteArray            | Binary    |
| Boolean              | Boolean   |
| Double               | Number    |
| Float                | Number    |
| FixedLengthByteArray | Binary    |
| Decimal              | BigNumber |
| Date                 | Date      |
| Enum                 | String    |
| Int8                 | Integer   |
| Int16                | Integer   |
| Int32                | Integer   |
| Int64                | Integer   |
| Int96                | Timestamp |
| UInt8                | Integer   |
| UInt16               | Integer   |
| UInt32               | Integer   |
| UInt64               | Integer   |
| UTF8                 | String    |
| TimeMillis           | Timestamp |
| TimestampMillis      | Timestamp |
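For reference, the mapping table above can be expressed as a simple lookup. The following is an illustrative Python sketch only, not Pentaho's actual implementation (PDI performs this conversion internally, in Java):

```python
# Illustrative sketch of the Parquet-to-PDI type conversion table above.
# This is NOT Pentaho's implementation; PDI applies this mapping internally.

PARQUET_TO_PDI = {
    "ByteArray": "Binary",
    "Boolean": "Boolean",
    "Double": "Number",
    "Float": "Number",
    "FixedLengthByteArray": "Binary",
    "Decimal": "BigNumber",
    "Date": "Date",
    "Enum": "String",
    "Int8": "Integer",
    "Int16": "Integer",
    "Int32": "Integer",
    "Int64": "Integer",
    "Int96": "Timestamp",
    "UInt8": "Integer",
    "UInt16": "Integer",
    "UInt32": "Integer",
    "UInt64": "Integer",
    "UTF8": "String",
    "TimeMillis": "Timestamp",
    "TimestampMillis": "Timestamp",
}

def pdi_type(parquet_type: str) -> str:
    """Return the default PDI type for a given Parquet type name."""
    return PARQUET_TO_PDI[parquet_type]
```

Note that several distinct Parquet types (for example, all the signed and unsigned integer widths) collapse to a single PDI type, which is why you may want to adjust the **Type** column manually after clicking **Get Fields**.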

### Metadata injection support

All fields of this step support metadata injection. You can use this step with [ETL metadata injection](https://docs.pentaho.com/pdia-data-integration/pdi-transformation-steps-reference-overview/etl-metadata-injection) to pass metadata to your transformation at runtime.
