# Avro Input

The Avro Input step decodes binary or JSON Avro data and extracts fields from the structure it defines for use in the PDI stream. [Apache Avro](http://avro.apache.org/docs/current/index.html) is a data serialization system.

### General

The following field and button are general to this transformation step:

* **Step name**: Specify the unique name of the Avro Input step on the canvas. You can customize the name or leave it as the default.

You can use **Preview** to display the rows generated by this step.

The Avro Input step determines which rows to input based on the information you provide on the option tabs. Preview helps you decide whether the information you provided accurately models the rows you are trying to retrieve.

### Options

The Avro Input transformation step features several tabs with fields. Each tab is described below.

#### Source tab

![Avro Input step Source tab](/files/nuse855tdDU8VdMPOzCe)

Use the **Source** tab to specify the location of the source data and its related schema.

The schema that defines the Avro data is either embedded with the data or stored in a separate location.

Use **Format** to select one of the following formats:

* **Avro file**: The source material is in a single location. The schema is embedded with the data.
* **JSON datum**: The source material is in different locations. The data is contained in a JSON format, and the schema is separate from the data.
* **Binary datum**: The source material is in different locations. The data is contained in a binary format, and the schema is separate from the data.
* **Avro file (use alternate schema)**: The source material is in different locations. The schema is separate from the data.

The options presented in the **Source** tab depend on whether the schema is embedded with or separate from the data.
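
As a quick reference, the relationship between **Format** and schema location can be written as a small lookup. This is an illustrative sketch only; `SCHEMA_EMBEDDED` and `needs_separate_schema` are hypothetical names, not part of PDI.

```python
# Hypothetical lookup mirroring the Format list above; not a PDI API.
SCHEMA_EMBEDDED = {
    "Avro file": True,
    "JSON datum": False,
    "Binary datum": False,
    "Avro file (use alternate schema)": False,
}


def needs_separate_schema(format_name: str) -> bool:
    """Return True when the Source tab also asks for a schema location."""
    return not SCHEMA_EMBEDDED[format_name]
```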

**Embedded schema**

![Avro Input Embedded Schema dialog](/files/joLN3ChaENoxzaMciV7s)

If you select **Avro file** as your **Format**, the Avro Input step assumes the schema is embedded with your data.

The location of the source can be either a file or a PDI field.

**Source**

* **From file**: In **Folder/File name**, specify the fully qualified URL of the source file or of a folder containing multiple files. You can also select **Browse** to navigate to the source file or folder through your VFS connection. For details, see [Connecting to Virtual File Systems](/pdia-data-integration/extracting-data-into-pdi/virtual-file-system-browser.md).

  Specifying a file name reads a single Avro file as input.
* **From field**: Select the **Field name** containing the location of your source. The list of available fields comes from any PDI step connected to the Avro Input step.

**Separate schema**

![Avro Input Separate Schema dialog](/files/f7VdYSQSFluTnMCobF8i)

If you select **JSON datum**, **Binary datum**, or **Avro file (use alternate schema)**, the Avro Input step assumes the schema is in a separate location from your data.

The location of the data and its schema can be either a file or a PDI field.

**Source**

* **From file**: In **Folder/File name**, specify the fully qualified URL of the source file or of a folder containing multiple files. You can also select **Browse** to navigate to the source file or folder through your VFS connection. For details, see [Connecting to Virtual File Systems](/pdia-data-integration/extracting-data-into-pdi/virtual-file-system-browser.md).

  Specifying a file name reads a single Avro file as input.
* **From field**: Select the **Field name** containing the location of your source. The list of available fields comes from any PDI step connected to the Avro Input step.

**Schema**

* **From file**: In **File name**, specify the fully qualified URL of the schema file. You can also select **Browse** to navigate to the schema file on your file system through your VFS connection.
* **From field**: Select the **Field name** containing the location of the schema. The list of available fields comes from any PDI step connected to the Avro Input step.

#### Avro Fields tab

![Avro Input Avro Fields Tab](/files/tBfOMHDodA1OvIeGMr5O)

The table in the **Avro Fields** tab defines the following properties for the input fields from the Avro source:

| Field property                | Description                                                                                                                                                                                                                                                                                                                                                                                                                                                   |
| ----------------------------- | ------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------- |
| **Avro path** (**Avro type**) | The location of the Avro source (and its format type).                                                                                                                                                                                                                                                                                                                                                                                                        |
| **Indexed values**            | <p>The index key to use in an Avro path collection. You can use this field for map or array expansion, which expands array or map values to return multiple rows of data.</p><ul><li>To return map elements, specify an index key.</li><li>To return array elements, specify the array index number, or use the asterisk wildcard (\*) to return all elements of an array.</li></ul><p>When this field is left blank, data is not returned for the field.</p> |
| **Name**                      | The name of the input field.                                                                                                                                                                                                                                                                                                                                                                                                                                  |
| **Type**                      | The type of the input field, such as String or Date.                                                                                                                                                                                                                                                                                                                                                                                                          |
| **Format**                    | The format of the input field.                                                                                                                                                                                                                                                                                                                                                                                                                                |

The **Avro Fields** tab also contains the following options for specifying how certain fields behave in this step:

| Option                                            | Description                                                                                                                                                                                                                              |
| ------------------------------------------------- | ---------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------- |
| **Pass through fields from previous step**        | <p>Specify how fields pass through this step:</p><ul><li>Select to pass the fields from the previous step along with the fields in the current step to the next step.</li><li>Clear to not pass these fields to the next step.</li></ul> |
| **Allow null values for missing paths or fields** | <p>Specify how missing fields should be replaced:</p><ul><li>Select to replace missing fields in the incoming data with null values.</li><li>Clear to not replace missing fields with null values.</li></ul>                             |
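
The behavior of **Indexed values** and **Allow null values for missing paths or fields** described in the tables above can be modeled roughly as follows. This is a simplified sketch, not PDI's implementation: the `expand` function is hypothetical, and raising an error when the null option is cleared is an assumption, because the table only states that missing fields are not replaced.

```python
from typing import Any, List, Union


def expand(collection: Union[dict, list], indexed_value: str,
           allow_null: bool = True) -> List[Any]:
    """Return the selected values, one list entry per output row."""
    if indexed_value == "":
        return []  # blank: no data is returned for the field
    if isinstance(collection, list):
        if indexed_value == "*":
            return list(collection)  # wildcard: one row per array element
        index = int(indexed_value)
        if 0 <= index < len(collection):
            return [collection[index]]  # single array element
    elif indexed_value in collection:
        return [collection[indexed_value]]  # map lookup by key
    # The key or index does not exist in the incoming data.
    if allow_null:
        return [None]
    raise KeyError(f"No value at {indexed_value!r}")  # assumed behavior
```

For example, `expand(["a", "b"], "*")` yields two rows, while `expand({"k": 1}, "missing")` yields a single null row when the option is selected.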

After you provide a path to an Avro data file or Avro schema, select **Get fields** to populate the fields.

These fields represent the Avro schema. When the schema field is retrieved, the Avro type is converted to an appropriate PDI type. You can change the PDI type.

Below is the Avro-to-PDI data type conversion table.

| Avro type | PDI type  |
| --------- | --------- |
| String    | String    |
| TimeStamp | TimeStamp |
| Bytes     | Binary    |
| Decimal   | BigNumber |
| Boolean   | Boolean   |
| Date      | Date      |
| Long      | Integer   |
| Double    | Number    |
| Int       | Integer   |
| Float     | Number    |

{% hint style="info" %}
The default format mask for the date type is `yyyy-MM-dd`. The default format mask for the timestamp type is `yyyy-MM-dd HH:mm:ss.SSS`.

If a date or timestamp was stored as a string in a different format, the step cannot retrieve the column data and returns null for that column.
{% endhint %}
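
To make the conversion table and the default format masks concrete, the following sketch restates them in Python. It is an approximation under stated assumptions: `AVRO_TO_PDI` and `parse_or_null` are hypothetical names, and the `strptime` patterns are only rough equivalents of the `yyyy-MM-dd` and `yyyy-MM-dd HH:mm:ss.SSS` masks.

```python
from datetime import datetime
from typing import Optional

# The Avro-to-PDI conversion table above, keyed by lowercase Avro type name.
AVRO_TO_PDI = {
    "string": "String", "timestamp": "TimeStamp", "bytes": "Binary",
    "decimal": "BigNumber", "boolean": "Boolean", "date": "Date",
    "long": "Integer", "double": "Number", "int": "Integer", "float": "Number",
}

DATE_MASK = "%Y-%m-%d"                    # approximates yyyy-MM-dd
TIMESTAMP_MASK = "%Y-%m-%d %H:%M:%S.%f"   # approximates yyyy-MM-dd HH:mm:ss.SSS


def parse_or_null(text: str, mask: str = DATE_MASK) -> Optional[datetime]:
    """Parse a string with the given mask; return None (null) when it does not match."""
    try:
        return datetime.strptime(text, mask)
    except ValueError:
        return None
```

A string such as `03/01/2024` does not match the default date mask, so `parse_or_null` returns `None`, mirroring the null column value described in the hint.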

#### Lookup Fields tab

![Avro Input Lookup Fields Tab](/files/OBy9ffmeGk8T3Q5fcoOs)

You can use the **Lookup Fields** tab to create variables and map each one to an incoming field. The step uses these variables as lookups into the Avro structure at decoding time.

The table in this tab defines the following field properties:

| Field property    | Description                                                     |
| ----------------- | --------------------------------------------------------------- |
| **Name**          | The name of the incoming field.                                 |
| **Variable**      | The variable you want to use as the value of an incoming field. |
| **Default value** | The value to use when the incoming field value is null.         |

Select **Get fields** to populate the **Name** column with names of the incoming fields.
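
The mapping defined in this table can be pictured as a per-row variable assignment. The sketch below is a simplified model rather than PDI code; `resolve_variables` and the `LookupRow` tuple layout are hypothetical.

```python
from typing import Dict, List, Optional, Tuple

# One row of the Lookup Fields table: (Name, Variable, Default value).
LookupRow = Tuple[str, str, Optional[str]]


def resolve_variables(incoming_row: Dict[str, Optional[str]],
                      lookups: List[LookupRow]) -> Dict[str, Optional[str]]:
    """Assign each variable the incoming field value, or the default when that value is null."""
    variables: Dict[str, Optional[str]] = {}
    for name, variable, default in lookups:
        value = incoming_row.get(name)
        variables[variable] = default if value is None else value
    return variables
```

For instance, with a lookup row of `("atm", "atm_id", "atm1")`, where the default `atm1` is chosen purely for illustration, an incoming row whose `atm` field is `atm2` sets `atm_id` to `atm2`, and a row whose `atm` field is null sets it to `atm1`.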

<details>

<summary>Example transformation walkthrough: Use a lookup field</summary>

The following example transformation demonstrates how to use a lookup field.

The transformation processes a CSV file and feeds its data into the Avro Input step. The Avro Input step decodes the Avro structure using a lookup field consisting of an `atm_id` variable mapped to the *atm* field.

1. Save the following code block in a text file as `atm.schema`:

   ```
   {
     "type": "map",
     "values": {
       "type": "record",
       "name": "ATM",
       "fields": [
         {"name": "serial_no", "type": "string"},
         {"name": "location", "type": "string"}
       ]
     }
   }
   ```
2. Save the following code block in a text file as `simpleexample.csv`:

   ```
   atm|atms
   atm1|{"atm1": {"serial_no": "zxy555", "location": "Uptown"}, "atm2": {"serial_no": "vvv242", "location": "Downtown"}, "atm4": {"serial_no": "zzz111", "location": "Central"}, "atm6": {"serial_no": "piu786", "location": "Eastside"}, "atm10": {"serial_no": "hbc999", "location": "Westside"}, "atm20": {"serial_no": "mmm456", "location": "Lunar city"}}
   atm2|{"atm1": {"serial_no": "zxy555", "location": "Uptown"}, "atm2": {"serial_no": "vvv242", "location": "Downtown"}, "atm4": {"serial_no": "zzz111", "location": "Central"}, "atm6": {"serial_no": "piu786", "location": "Eastside"}, "atm10": {"serial_no": "hbc999", "location": "Westside"}, "atm20": {"serial_no": "mmm456", "location": "Lunar city"}}
   atm4|{"atm1": {"serial_no": "zxy555", "location": "Uptown"}, "atm2": {"serial_no": "vvv242", "location": "Downtown"}, "atm4": {"serial_no": "zzz111", "location": "Central"}, "atm6": {"serial_no": "piu786", "location": "Eastside"}, "atm10": {"serial_no": "hbc999", "location": "Westside"}, "atm20": {"serial_no": "mmm456", "location": "Lunar city"}}
   ```
3. Create a transformation with a CSV File Input step, an Avro Input step, and a hop from the CSV File Input step to the Avro Input step.

   ![Avro Input Sample CSV Transform](/files/EDIRBoWobuIh3TaQ41iE)
4. Configure the CSV File Input step as shown below, where the file name is the path to the `simpleexample.csv` file on your system:

   ![Avro Input Sample CSV File Input](/files/XlFXMmKpyUTyMkL31kcH)

   **Note:** Make sure that the delimiter is the pipe character.
5. Configure the Avro Input step as shown below, where the schema is the path to the `atm.schema` file on your system:

   ![Avro Input Sample Source Config](/files/nuse855tdDU8VdMPOzCe)
6. Select **Get fields** to populate the **Avro Fields** table. Enter the **Indexed values** value as shown below:

   ![Avro Input Sample Avro Fields Config](/files/kopsbdfXvBxHVN9pm1MZ)

   **Note:** Make sure to select the **Pass through fields from previous step** option.
7. Enter the following values in the **Lookup Fields** tab:

   ![Avro Input Sample Lookup Fields Config](/files/VO5c5Hyj69OoitDku8fB)
8. Select **Preview** to view the data.

   You should see results similar to those shown below:

   ![Avro Input Sample Preview Data](/files/1LoHcR5VDw9IkodwRaji)
9. Save your transformation.
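
The lookup performed in this walkthrough can be approximated outside PDI with a few lines of Python. This is a rough emulation under stated assumptions: the `atms` map is abbreviated to its first three entries, `decode` is a hypothetical helper, and the output columns are chosen for illustration, so the preview image remains the reference for what the step actually returns.

```python
import json

# Abbreviated copy of the `atms` column from simpleexample.csv (first three entries only).
ATMS_JSON = json.dumps({
    "atm1": {"serial_no": "zxy555", "location": "Uptown"},
    "atm2": {"serial_no": "vvv242", "location": "Downtown"},
    "atm4": {"serial_no": "zzz111", "location": "Central"},
})
CSV_LINES = ["atm|atms"] + [f"{key}|{ATMS_JSON}" for key in ("atm1", "atm2", "atm4")]


def decode(lines):
    """Use each row's `atm` value as the key into that row's `atms` map."""
    header = lines[0].split("|")
    rows = []
    for line in lines[1:]:
        row = dict(zip(header, line.split("|", 1)))    # pipe-delimited, two columns
        atm_id = row["atm"]                            # variable mapped on the Lookup Fields tab
        record = json.loads(row["atms"]).get(atm_id)   # map lookup by key
        rows.append((row["atm"], record["serial_no"], record["location"]))
    return rows


for decoded in decode(CSV_LINES):
    print(decoded)
```

Each input row selects a different record from the same map because the value of `atm` changes from row to row.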

</details>


