# Avro Output

The Avro Output step serializes data from the PDI stream into Avro binary or JSON format, then writes it to a file. [Apache Avro](http://avro.apache.org/docs/current/index.html) is a data serialization system that relies on a schema for decoding binary data and extracting fields.

This step creates the following files:

* A file containing output data in the Avro format
* An Avro schema file defined by the fields in this step

You can define fields manually or retrieve them from incoming steps.

### General

* **Step name**: Specify the unique name of the Avro Output step on the canvas.
* **Folder/File name**: Specify the location and name of the file or folder.

  Select **Browse** to navigate to the destination through your VFS connection. For details, see [Connecting to Virtual File Systems](/pdia-data-integration/extracting-data-into-pdi/virtual-file-system-browser.md).
* **Overwrite existing output file**: Select to overwrite an existing file that has the same name and extension.

### Options

The Avro Output step includes the following tabs.

#### Fields tab

![Avro Output Fields tab](/files/OAsTXkUEWNWSuQT6BPfj)

The table in the **Fields** tab defines the fields that make up the Avro schema created by this step.

| Field             | Description                                                                                                  |
| ----------------- | ------------------------------------------------------------------------------------------------------------ |
| **Avro path**     | The name of the field as it will appear in the Avro data and schema files.                                   |
| **Name**          | The name of the PDI field.                                                                                   |
| **Avro type**     | The Avro data type of the field.                                                                             |
| **Precision**     | Applies only to the **Decimal** Avro type. The total number of digits in the number. The default is `10`.    |
| **Scale**         | Applies only to the **Decimal** Avro type. The number of digits after the decimal point. The default is `0`. |
| **Default value** | The value written for the field when the incoming value is null or empty.                                    |
| **Null**          | Specify whether the field can contain null values.                                                           |

{% hint style="warning" %}
To prevent transformation failure, set **Default value** for any field where **Null** is set to **No**.
{% endhint %}
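
As a rough illustration of how the columns above could translate into Avro schema field entries, the following Python sketch builds one entry per table row. The helper name `avro_field` is hypothetical, and the JSON layout follows the general Avro specification (nullable fields as a union with `null`, decimals as a logical type on `bytes`). It is not confirmed to match the exact schema file PDI writes.

```python
import json

def avro_field(avro_path, avro_type, nullable, default=None, precision=10, scale=0):
    """Build one Avro schema field entry (hypothetical helper, not a PDI API)."""
    if avro_type == "decimal":
        # Avro models decimals as a logical type layered on bytes.
        base = {"type": "bytes", "logicalType": "decimal",
                "precision": precision, "scale": scale}
    else:
        base = avro_type

    field = {"name": avro_path}
    if nullable:
        # A nullable field is a union that includes "null"; its default is null.
        field["type"] = ["null", base]
        field["default"] = None
    else:
        field["type"] = base
        if default is not None:
            # Non-nullable fields carry the configured default value.
            field["default"] = default
    return field

fields = [
    avro_field("name", "string", nullable=False, default="unknown"),
    avro_field("price", "decimal", nullable=True, precision=10, scale=2),
]
print(json.dumps(fields, indent=2))
```

In this sketch, the `name` field mirrors the warning above: it does not allow nulls, so it is given a default value.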

Select **Get Fields** to populate the table from the incoming PDI stream.

During field retrieval, PDI converts PDI field types to Avro types. You can change the converted Avro type.

| PDI type        | Avro type |
| --------------- | --------- |
| **InetAddress** | String    |
| **String**      | String    |
| **TimeStamp**   | TimeStamp |
| **Binary**      | Bytes     |
| **BigNumber**   | Decimal   |
| **Boolean**     | Boolean   |
| **Date**        | Date      |
| **Integer**     | Long      |
| **Number**      | Double    |
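
The conversion table can also be expressed as a simple lookup, which may be handy when checking a transformation's field list outside of PDI. This is a minimal sketch: the dictionary copies the table above, and `default_avro_type` is a hypothetical helper, not part of PDI.

```python
# Conversion table from this page (PDI type -> Avro type proposed by Get Fields).
PDI_TO_AVRO = {
    "InetAddress": "String",
    "String": "String",
    "TimeStamp": "TimeStamp",
    "Binary": "Bytes",
    "BigNumber": "Decimal",
    "Boolean": "Boolean",
    "Date": "Date",
    "Integer": "Long",
    "Number": "Double",
}

def default_avro_type(pdi_type):
    """Return the documented Avro type for a PDI type (hypothetical helper)."""
    try:
        return PDI_TO_AVRO[pdi_type]
    except KeyError:
        raise ValueError(f"No documented Avro conversion for PDI type {pdi_type!r}")

print(default_avro_type("Integer"))
print(default_avro_type("BigNumber"))
```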

#### Schema tab

![Avro Output Schema tab](/files/306EB6xeQtTIzULuCWkL)

Use the **Schema** tab to define how the [Avro schema](https://avro.apache.org/docs/1.8.1/spec.html#schema_record) file is created.

* **File name**: Specify the fully qualified URL where the Avro schema file is written.

  Select **Browse** to locate the schema file through your file system.

  If a schema file already exists, it is overwritten.

  If you do not specify a separate schema file, PDI writes an embedded schema in the Avro data file.
* **Namespace**: Specify the namespace used with **Record name** to define the full name of the schema (for example, `example.avro`).
* **Record name**: Specify the name of the Avro record (for example, `User`).
* **Doc value**: Specify documentation to include in the schema.
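
To show how these settings relate to each other, the sketch below assembles a record schema from a namespace, record name, and doc value, then derives the full name. The function names and the sample field are assumptions for illustration. The top-level keys (`type`, `namespace`, `name`, `doc`, `fields`) come from the Avro specification, and the file PDI generates may differ in detail.

```python
import json

def record_schema(namespace, record_name, doc, fields):
    """Assemble an Avro record schema (sketch; the field list is illustrative)."""
    return {
        "type": "record",
        "namespace": namespace,
        "name": record_name,
        "doc": doc,
        "fields": fields,
    }

def full_name(schema):
    # Per the Avro specification, the full name is the namespace, a dot, and the name.
    namespace = schema.get("namespace")
    return f"{namespace}.{schema['name']}" if namespace else schema["name"]

schema = record_schema(
    "example.avro",
    "User",
    "A registered user",
    [{"name": "name", "type": "string"}],
)
print(full_name(schema))  # example.avro.User
print(json.dumps(schema, indent=2))
```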

#### Options tab

![Avro Output Step Options tab](/files/NGpLwe2zqN4NelSL01Kh)

* **Compression**: Specify the codec used to compress data blocks in the Avro output file:

  * **None**: No compression (default).
  * **Deflate**: The step uses the deflate algorithm specified in [RFC 1951](https://www.ietf.org/rfc/rfc1951.txt) (commonly implemented with `zlib`).
  * **Snappy**: The step uses Google’s [Snappy](http://google.github.io/snappy/) compression. Each block is followed by a 4-byte, big-endian CRC32 checksum of the uncompressed data.

  For details, see [Object Container Files](https://avro.apache.org/docs/1.8.1/spec.html#Object+Container+Files).
* **Include date in filename**: Add the system date to the output file name in the `yyyyMMdd` format (for example, `20181231`).
* **Include time in filename**: Add the system time to the output file name in the `HHmmss` format (for example, `235959`).
* **Specify date time format**: Select a date/time format from the drop-down list to add to the output file name.
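
The options above can be approximated in a few lines of Python using only the standard library. This is a sketch under stated assumptions. All function names are hypothetical. The deflate example assumes a raw RFC 1951 stream with no `zlib` header. Snappy compression itself is omitted because it needs a third-party library, so only the CRC32 trailer is shown. The underscore separator and the position of the date and time in the file name are guesses, not confirmed PDI behavior.

```python
import struct
import zlib
from datetime import datetime

def deflate_block(data: bytes) -> bytes:
    # wbits=-15 requests a raw RFC 1951 stream without the zlib header and trailer.
    compressor = zlib.compressobj(level=6, method=zlib.DEFLATED, wbits=-15)
    return compressor.compress(data) + compressor.flush()

def inflate_block(block: bytes) -> bytes:
    # Reverse of deflate_block, again using the raw stream form.
    return zlib.decompress(block, wbits=-15)

def snappy_crc_trailer(uncompressed: bytes) -> bytes:
    # 4-byte, big-endian CRC32 checksum of the uncompressed data.
    return struct.pack(">I", zlib.crc32(uncompressed) & 0xFFFFFFFF)

def stamped_name(base, ext, when, include_date=False, include_time=False):
    # Separator and ordering are assumptions made for this illustration.
    parts = [base]
    if include_date:
        parts.append(when.strftime("%Y%m%d"))  # yyyyMMdd
    if include_time:
        parts.append(when.strftime("%H%M%S"))  # HHmmss
    return "_".join(parts) + ext

payload = b"avro data block " * 20
compressed = deflate_block(payload)
print(len(payload), len(compressed), inflate_block(compressed) == payload)
print(snappy_crc_trailer(payload).hex())

when = datetime(2018, 12, 31, 23, 59, 59)
print(stamped_name("orders", ".avro", when, include_date=True, include_time=True))
# orders_20181231_235959.avro
```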

