# Avro Output

The Avro Output step serializes data from the PDI stream into Avro binary or JSON format, then writes it to a file. [Apache Avro](http://avro.apache.org/docs/current/index.html) is a data serialization system that relies on a schema for decoding binary data and extracting fields.

This step creates the following files:

* A file containing output data in the Avro format
* An Avro schema file defined by the fields in this step

You can define fields manually or retrieve them from incoming steps.

### General

* **Step name**: Specify the unique name of the Avro Output step on the canvas.
* **Folder/File name**: Specify the location and name of the file or folder.

  Select **Browse** to navigate to the destination through your VFS connection. For details, see [Connecting to Virtual File Systems](https://docs.pentaho.com/pdia-data-integration/extracting-data-into-pdi/virtual-file-system-browser).
* **Overwrite existing output file**: Select to overwrite an existing file that has the same name and extension.

### Options

The Avro Output step includes the following tabs.

#### Fields tab

![Avro Output Fields tab](https://773338310-files.gitbook.io/~/files/v0/b/gitbook-x-prod.appspot.com/o/spaces%2FYwnJ6Fexn4LZwKRHghPK%2Fuploads%2Fgit-blob-cfc60398332eedfa27948f4fa6e297287cdd5415%2FPDI_AvroOutput_FieldsTab_PentahoEngine.png?alt=media)

The table in the **Fields** tab defines the fields that make up the Avro schema created by this step.

| Field             | Description                                                                                                  |
| ----------------- | ------------------------------------------------------------------------------------------------------------ |
| **Avro path**     | The name of the field as it will appear in the Avro data and schema files.                                   |
| **Name**          | The name of the PDI field.                                                                                   |
| **Avro type**     | The Avro data type of the field.                                                                             |
| **Precision**     | Applies only to the **Decimal** Avro type. The total number of digits in the number. The default is `10`.    |
| **Scale**         | Applies only to the **Decimal** Avro type. The number of digits after the decimal point. The default is `0`. |
| **Default value** | The value used for the field when the incoming value is null or empty.                                       |
| **Null**          | Specify whether the field can contain null values.                                                           |

{% hint style="warning" %}
To prevent transformation failure, set **Default value** for any field where **Null** is set to **No**.
{% endhint %}
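
To show how these columns relate to an Avro schema, here is a minimal Python sketch that builds schema field entries from rows shaped like the **Fields** table. It follows the Avro specification (nullable fields as a union with `null`, decimals as a logical type over `bytes`). The `avro_field` helper and the sample field names are hypothetical, and the schema file that PDI actually writes may differ in detail.

```python
import json


def avro_field(path, avro_type, nullable=True, default=None,
               precision=10, scale=0):
    """Build one Avro schema field entry from a Fields-tab style row.

    Hypothetical helper for illustration; not part of PDI.
    """
    if avro_type == "decimal":
        # Avro models decimals as a logical type layered over bytes.
        base = {"type": "bytes", "logicalType": "decimal",
                "precision": precision, "scale": scale}
    else:
        base = avro_type

    field = {"name": path}
    if not nullable:
        field["type"] = base
        if default is not None:
            field["default"] = default
    elif default is None:
        # "null" listed first so that the field can default to null.
        field["type"] = ["null", base]
        field["default"] = None
    else:
        # A union default must match the first branch of the union.
        field["type"] = [base, "null"]
        field["default"] = default
    return field


fields = [
    avro_field("id", "long", nullable=False, default=0),
    avro_field("name", "string"),
    avro_field("balance", "decimal", precision=10, scale=2),
]
print(json.dumps(fields, indent=2))
```

The non-nullable `id` field carries a default, which mirrors the recommendation in the warning above.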

Select **Get Fields** to populate the table from the incoming PDI stream.

During field retrieval, PDI converts each PDI field type to an Avro type as shown in the following table. You can change the resulting type in the **Avro type** column.

| PDI type        | Avro type |
| --------------- | --------- |
| **InetAddress** | String    |
| **String**      | String    |
| **TimeStamp**   | TimeStamp |
| **Binary**      | Bytes     |
| **BigNumber**   | Decimal   |
| **Boolean**     | Boolean   |
| **Date**        | Date      |
| **Integer**     | Long      |
| **Number**      | Double    |
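
The conversion table can be read as a plain lookup. The following Python sketch encodes the rows above in a dictionary and applies them to a list of stream fields. The `default_avro_types` function and the sample field names are hypothetical; the sketch only illustrates the default mapping and is not how PDI implements **Get Fields**.

```python
# PDI type -> Avro type, copied from the conversion table above.
PDI_TO_AVRO = {
    "InetAddress": "String",
    "String": "String",
    "TimeStamp": "TimeStamp",
    "Binary": "Bytes",
    "BigNumber": "Decimal",
    "Boolean": "Boolean",
    "Date": "Date",
    "Integer": "Long",
    "Number": "Double",
}


def default_avro_types(stream_fields):
    """Return (name, Avro type) pairs for a list of (name, PDI type) pairs."""
    return [(name, PDI_TO_AVRO[pdi_type]) for name, pdi_type in stream_fields]


# Hypothetical incoming stream fields.
stream = [("id", "Integer"), ("client_ip", "InetAddress"), ("amount", "BigNumber")]
print(default_avro_types(stream))
```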

#### Schema tab

![Avro Output Schema tab](https://773338310-files.gitbook.io/~/files/v0/b/gitbook-x-prod.appspot.com/o/spaces%2FYwnJ6Fexn4LZwKRHghPK%2Fuploads%2Fgit-blob-2c567fb32e1854069c3d6e612c8fd4dcdc32da51%2FPDI_TransStep_Avro_Output_Schema_tab.png?alt=media)

Use the **Schema** tab to define how the [Avro schema](https://avro.apache.org/docs/1.8.1/spec.html#schema_record) file is created.

* **File name**: Specify the fully qualified URL where the Avro schema file is written.

  Select **Browse** to navigate to the schema file location through your file system.

  If a schema file already exists, it is overwritten.

  If you do not specify a separate schema file, PDI writes an embedded schema in the Avro data file.
* **Namespace**: Specify the namespace used with **Record name** to define the full name of the schema (for example, `example.avro`).
* **Record name**: Specify the name of the Avro record (for example, `User`).
* **Doc value**: Specify documentation to include in the schema.
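
As an illustration of how **Namespace**, **Record name**, and **Doc value** fit together, the following Python sketch assembles an Avro record schema and derives its full name. In Avro, the full name of a record is the namespace and the name joined by a dot. The helper functions, the doc text, and the single sample field are assumptions made for the example; the exact layout of the file PDI writes may differ.

```python
import json


def record_schema(namespace, record_name, doc, fields):
    """Assemble an Avro record schema as a plain dictionary."""
    return {
        "type": "record",
        "namespace": namespace,
        "name": record_name,
        "doc": doc,
        "fields": fields,
    }


def full_name(schema):
    """Join namespace and name with a dot, as Avro does for named types."""
    namespace = schema.get("namespace")
    return f"{namespace}.{schema['name']}" if namespace else schema["name"]


schema = record_schema(
    "example.avro",
    "User",
    "Hypothetical user record",           # example Doc value
    [{"name": "name", "type": "string"}],  # example field
)
print(full_name(schema))
print(json.dumps(schema, indent=2))
```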

#### Options tab

![Avro Output Options tab](https://773338310-files.gitbook.io/~/files/v0/b/gitbook-x-prod.appspot.com/o/spaces%2FYwnJ6Fexn4LZwKRHghPK%2Fuploads%2Fgit-blob-f8935f9caab0e0f8eaa2b1778ca20f912b214a2a%2FPDI_TransStep_Avro_Output_Options_tab.png?alt=media)

* **Compression**: Specify the codec used to compress data blocks in the Avro output file:

  * **None**: No compression (default).
  * **Deflate**: The step uses the deflate algorithm specified in [RFC 1951](https://www.ietf.org/rfc/rfc1951.txt) (commonly implemented with `zlib`).
  * **Snappy**: The step uses Google’s [Snappy](http://google.github.io/snappy/) compression. Each compressed block is followed by a 4-byte, big-endian CRC32 checksum of the uncompressed data in that block.

  For details, see [Object Container Files](https://avro.apache.org/docs/1.8.1/spec.html#Object+Container+Files).
* **Include date in filename**: Add the system date to the output file name in the `yyyyMMdd` format (for example, `20181231`).
* **Include time in filename**: Add the system time to the output file name in the `HHmmss` format (for example, `235959`).
* **Specify date time format**: Select a date/time format from the drop-down list to add to the output file name.
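
To make the **Compression** codec descriptions more concrete, the following Python sketch reproduces two details with the standard library alone: a raw RFC 1951 deflate stream (no `zlib` header or trailer) and the 4-byte, big-endian CRC32 value that follows a Snappy-compressed block. The function names are hypothetical, Snappy compression itself is omitted because it needs a third-party library, and the sketch does not produce a complete Avro container file.

```python
import struct
import zlib


def deflate_block(data: bytes) -> bytes:
    """Compress data as a raw RFC 1951 deflate stream."""
    # wbits=-15 selects raw deflate: no zlib header and no checksum trailer.
    compressor = zlib.compressobj(wbits=-15)
    return compressor.compress(data) + compressor.flush()


def inflate_block(block: bytes) -> bytes:
    """Decompress a raw deflate stream produced by deflate_block."""
    return zlib.decompress(block, -15)


def crc32_trailer(uncompressed: bytes) -> bytes:
    """Return the 4-byte, big-endian CRC32 of the uncompressed data."""
    return struct.pack(">I", zlib.crc32(uncompressed))


payload = b"example row data " * 100
block = deflate_block(payload)
print(len(payload), len(block))
print(crc32_trailer(payload).hex())
```

Because the checksum covers the uncompressed data, a reader can verify a block only after decompressing it.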
