Avro Output

The Avro Output step serializes data from the PDI stream into Avro binary or JSON format, then writes it to a file. Apache Avro is a data serialization system that relies on a schema for decoding binary data and extracting fields.

This step creates the following files:

  • A file containing output data in the Avro format

  • An Avro schema file defined by the fields in this step

You can define fields manually or retrieve them from incoming steps.

General

  • Step name: Specify the unique name of the Avro Output step on the canvas.

  • Folder/File name: Specify the location and name of the file or folder.

    Select Browse to navigate to the destination through your VFS connection. For details, see Connecting to Virtual File Systems.

  • Overwrite existing output file: Select to overwrite an existing file that has the same name and extension.

Options

The Avro Output step includes the following tabs.

Fields tab


The table in the Fields tab defines the fields that make up the Avro schema created by this step.

Field | Description
Avro path | The name of the field as it will appear in the Avro data and schema files.
Name | The name of the PDI field.
Avro type | The Avro data type of the field.
Precision | Applies only to the Decimal Avro type. The total number of digits in the number. The default is 10.
Scale | Applies only to the Decimal Avro type. The number of digits after the decimal point. The default is 0.
Default value | The default value of the field when the field is null or empty.
Null | Specify whether the field can contain null values.
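To make Precision and Scale concrete, the following Python sketch checks whether a number fits a given precision (total digits) and scale (digits after the decimal point). The function name fits_decimal is hypothetical and is not part of PDI or Avro; its defaults mirror the step defaults of 10 and 0. It only illustrates the definitions above and does not claim to reproduce how the step validates values.

```python
from decimal import Decimal


def fits_decimal(value: Decimal, precision: int = 10, scale: int = 0) -> bool:
    """Return True if value has at most `scale` fractional digits and
    at most `precision` digits in total (illustrative helper only)."""
    if not value.is_finite():
        return False  # NaN and Infinity cannot be stored as a decimal
    # Shift the decimal point right by `scale` places.
    shifted = value * (Decimal(10) ** scale)
    # Any remaining fraction means the value needs more than `scale` digits.
    if shifted != shifted.to_integral_value():
        return False
    # The unscaled integer must fit in `precision` digits.
    return abs(int(shifted)) < 10 ** precision


if __name__ == "__main__":
    print(fits_decimal(Decimal("1234567890")))   # True: 10 digits, scale 0
    print(fits_decimal(Decimal("123.45")))       # False: scale 0 allows no fraction
    print(fits_decimal(Decimal("123.45"), 5, 2)) # True: 5 digits, 2 after the point
```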


Select Get Fields to populate the table from the incoming PDI stream.

During field retrieval, PDI converts PDI field types to Avro types. You can change the converted Avro type.

PDI type    | Avro type
InetAddress | String
String      | String
TimeStamp   | TimeStamp
Binary      | Bytes
BigNumber   | Decimal
Boolean     | Boolean
Date        | Date
Integer     | Long
Number      | Double
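The default conversions listed above can be expressed as a simple lookup. The Python sketch below copies the mapping from this page; the names PDI_TO_AVRO and default_avro_type are hypothetical and are not part of any PDI API. It is useful mainly as a reference when checking which Avro type Get Fields is expected to propose.

```python
# PDI type -> default Avro type, copied from the mapping on this page.
PDI_TO_AVRO = {
    "InetAddress": "String",
    "String": "String",
    "TimeStamp": "TimeStamp",
    "Binary": "Bytes",
    "BigNumber": "Decimal",
    "Boolean": "Boolean",
    "Date": "Date",
    "Integer": "Long",
    "Number": "Double",
}


def default_avro_type(pdi_type: str) -> str:
    """Look up the default Avro type for a PDI type (illustrative helper)."""
    try:
        return PDI_TO_AVRO[pdi_type]
    except KeyError:
        raise ValueError(f"No default Avro type listed for {pdi_type!r}") from None


if __name__ == "__main__":
    for pdi_type in ("Integer", "BigNumber", "InetAddress"):
        print(pdi_type, "->", default_avro_type(pdi_type))
```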

Schema tab


Use the Schema tab to define how the Avro schema file is created.

  • File name: Specify the fully qualified URL where the Avro schema file is written.

    Select Browse to locate the schema file through your file system.

    If a schema file already exists, it is overwritten.

    If you do not specify a separate schema file, PDI writes an embedded schema in the Avro data file.

  • Namespace: Specify the namespace used with Record name to define the full name of the schema (for example, example.avro).

  • Record name: Specify the name of the Avro record (for example, User).

  • Doc value: Specify documentation to include in the schema.
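As a rough illustration of how Namespace, Record name, and Doc value fit into an Avro record schema, the Python sketch below assembles a schema as JSON. The helper build_schema and the sample fields (name, favorite_number) are hypothetical, and the exact JSON that PDI writes may differ. The sketch assumes standard Avro conventions: primitive type names are lowercase, and a nullable field is a union with "null".

```python
import json


def build_schema(namespace, record_name, doc, fields):
    """Assemble an Avro record schema as a dict.

    fields is a list of (avro_path, avro_type, nullable) tuples; this
    shape is an assumption made for the example.
    """
    avro_fields = []
    for name, avro_type, nullable in fields:
        # A nullable field is expressed as a union that includes "null".
        field_type = ["null", avro_type] if nullable else avro_type
        avro_fields.append({"name": name, "type": field_type})
    return {
        "type": "record",
        "namespace": namespace,
        "name": record_name,
        "doc": doc,
        "fields": avro_fields,
    }


if __name__ == "__main__":
    schema = build_schema(
        "example.avro",
        "User",
        "A sample user record",
        [("name", "string", False), ("favorite_number", "long", True)],
    )
    # The full name of the schema is the namespace plus the record name.
    print(schema["namespace"] + "." + schema["name"])
    print(json.dumps(schema, indent=2))
```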

Options tab

  • Compression: Specify the codec used to compress data blocks in the Avro output file:

    • None: No compression (default).

    • Deflate: The step uses the deflate algorithm specified in RFC 1951 (commonly implemented with zlib).

    • Snappy: The step uses Google’s Snappy compression. Each block is followed by a 4-byte, big-endian CRC32 checksum of the uncompressed data.

    For details, see Object Container Files.

  • Include date in filename: Add the system date to the output file name in the yyyyMMdd format (for example, 20181231).

  • Include time in filename: Add the system time to the output file name in the HHmmss format (for example, 235959).

  • Specify date time format: Select a date/time format from the drop-down list to add to the output file name.
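The two compression details described above, raw deflate per RFC 1951 and the 4-byte big-endian CRC32 that follows each Snappy block, can be sketched with the Python standard library. The function names are hypothetical, and this is not the code the step runs. Snappy compression itself is not in the standard library, so only the checksum trailer is shown.

```python
import struct
import zlib


def deflate_block(data: bytes) -> bytes:
    """Compress data as a raw deflate stream (RFC 1951)."""
    # wbits=-15 selects raw deflate: no zlib header and no trailing checksum.
    compressor = zlib.compressobj(level=6, method=zlib.DEFLATED, wbits=-15)
    return compressor.compress(data) + compressor.flush()


def inflate_block(block: bytes) -> bytes:
    """Decompress a raw deflate stream produced by deflate_block."""
    return zlib.decompress(block, wbits=-15)


def snappy_crc_trailer(uncompressed: bytes) -> bytes:
    """Return the 4-byte, big-endian CRC32 of the uncompressed data."""
    return struct.pack(">I", zlib.crc32(uncompressed) & 0xFFFFFFFF)


if __name__ == "__main__":
    payload = b"avro block payload " * 50
    block = deflate_block(payload)
    assert inflate_block(block) == payload
    print(len(payload), "bytes ->", len(block), "bytes after deflate")
    print("CRC32 trailer:", snappy_crc_trailer(payload).hex())
```

The negative wbits value matters here: with the default setting, zlib adds an RFC 1950 header and an Adler-32 checksum, which is not the raw deflate stream that RFC 1951 describes.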
