Avro Output
The Avro Output step serializes data from the PDI stream into Avro binary or JSON format, then writes it to a file. Apache Avro is a data serialization system that relies on a schema for decoding binary data and extracting fields.
This step creates the following files:
A file containing output data in the Avro format
An Avro schema file defined by the fields in this step
You can define fields manually or retrieve them from incoming steps.
General
Step name: Specify the unique name of the Avro Output step on the canvas.
Folder/File name: Specify the location and name of the file or folder.
Select Browse to navigate to the destination through your VFS connection. For details, see Connecting to Virtual File Systems.
Overwrite existing output file: Select to overwrite an existing file that has the same name and extension.
Options
The Avro Output step includes the following tabs.
Fields tab

The table in the Fields tab defines the fields that make up the Avro schema created by this step.
Avro path: The name of the field as it will appear in the Avro data and schema files.
Name: The name of the PDI field.
Avro type: The Avro data type of the field.
Precision: Applies only to the Decimal Avro type. The total number of digits in the number. The default is 10.
Scale: Applies only to the Decimal Avro type. The number of digits after the decimal point. The default is 0.
Default value: The default value of the field when the field is null or empty.
Null: Specify whether the field can contain null values.
To prevent transformation failure, set Default value for any field where Null is set to No.
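The interaction between Default value and Null can be sketched as follows. This is only an illustration of the rule described above, not the step's actual implementation; the function name `resolve_value` and its parameters are hypothetical.

```python
from typing import Any, Optional


def resolve_value(value: Any, default: Optional[Any], nullable: bool) -> Any:
    """Pick the value to write for one field (hypothetical helper).

    Mirrors the documented rule: a null or empty value falls back to the
    Default value; without a default, the field must allow nulls.
    """
    is_missing = value is None or value == ""
    if not is_missing:
        return value
    if default is not None:
        return default
    if nullable:
        return None
    # Null is set to No and no Default value exists: the write cannot succeed.
    raise ValueError("field is null, Null is No, and no Default value is set")
```

In this sketch, a missing value on a field where Null is No and no default is set raises an error, which corresponds to the transformation failure the note above warns about.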
Select Get Fields to populate the table from the incoming PDI stream.
During field retrieval, PDI converts PDI field types to Avro types. You can change the converted Avro type.
PDI type -> Avro type
InetAddress -> String
String -> String
TimeStamp -> TimeStamp
Binary -> Bytes
BigNumber -> Decimal
Boolean -> Boolean
Date -> Date
Integer -> Long
Number -> Double
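As a rough sketch, the conversions listed above can be written as a lookup that also shows how each Avro type might appear in a schema. The schema encodings follow the Avro specification's primitive and logical types; using `timestamp-millis` for TimeStamp is an assumption, and `avro_field_type` is a hypothetical helper, not part of PDI.

```python
# PDI type -> Avro type, taken from the conversion list in this section.
PDI_TO_AVRO = {
    "InetAddress": "String",
    "String": "String",
    "TimeStamp": "TimeStamp",
    "Binary": "Bytes",
    "BigNumber": "Decimal",
    "Boolean": "Boolean",
    "Date": "Date",
    "Integer": "Long",
    "Number": "Double",
}


def avro_field_type(pdi_type, precision=10, scale=0):
    """Return an Avro schema type for a PDI type (hypothetical helper)."""
    avro_type = PDI_TO_AVRO[pdi_type]
    if avro_type == "Decimal":
        # Precision and Scale apply only to Decimal; defaults are 10 and 0.
        return {"type": "bytes", "logicalType": "decimal",
                "precision": precision, "scale": scale}
    if avro_type == "Date":
        return {"type": "int", "logicalType": "date"}
    if avro_type == "TimeStamp":
        # Assumption: millisecond precision; the step may use another unit.
        return {"type": "long", "logicalType": "timestamp-millis"}
    # The remaining types map to Avro primitives with lowercase names.
    return avro_type.lower()
```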
Schema tab

Use the Schema tab to define how the Avro schema file is created.
File name: Specify the fully qualified URL where the Avro schema file is written.
Select Browse to locate the schema file through your file system.
If a schema file already exists, it is overwritten.
If you do not specify a separate schema file, PDI writes an embedded schema in the Avro data file.
Namespace: Specify the namespace used with Record name to define the full name of the schema (for example, example.avro).
Record name: Specify the name of the Avro record (for example, User).
Doc value: Specify documentation to include in the schema.
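To show how Namespace, Record name, and Doc value fit together, here is a minimal sketch that assembles an Avro record schema as JSON. The field names `name` and `favorite_number` and the doc text are hypothetical; the exact JSON the step writes may differ.

```python
import json


def build_schema(namespace, record_name, doc, fields):
    """Assemble an Avro record schema as a plain dict (illustrative only)."""
    return {
        "type": "record",
        "namespace": namespace,
        "name": record_name,
        "doc": doc,
        "fields": fields,
    }


schema = build_schema(
    "example.avro",
    "User",
    "A user record",  # hypothetical Doc value
    [
        {"name": "name", "type": "string"},
        # A nullable field is expressed as a union that includes "null".
        {"name": "favorite_number", "type": ["null", "long"], "default": None},
    ],
)

# The full name of the schema is the namespace joined to the record name.
full_name = f'{schema["namespace"]}.{schema["name"]}'
schema_json = json.dumps(schema, indent=2)
```

With the example values above, the full name resolves to `example.avro.User`.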
Options tab

Compression: Specify the codec used to compress data blocks in the Avro output file:
None: No compression (default).
Deflate: The step uses the deflate algorithm specified in RFC 1951 (commonly implemented with zlib).
Snappy: The step uses Google’s Snappy compression. Each block is followed by a 4-byte, big-endian CRC32 checksum of the uncompressed data.
For details, see Object Container Files.
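The two codecs can be illustrated with Python's standard library. This sketch compresses a block as a raw RFC 1951 deflate stream and computes the 4-byte checksum that follows a Snappy block; Snappy compression itself is not in the standard library and is omitted. The function names are hypothetical, and this is not the step's own code.

```python
import struct
import zlib


def deflate_block(data: bytes) -> bytes:
    """Compress data as a raw deflate stream (RFC 1951)."""
    # wbits=-15 requests raw deflate, without the zlib header and trailer.
    compressor = zlib.compressobj(zlib.Z_DEFAULT_COMPRESSION, zlib.DEFLATED, -15)
    return compressor.compress(data) + compressor.flush()


def inflate_block(block: bytes) -> bytes:
    """Reverse deflate_block."""
    return zlib.decompress(block, wbits=-15)


def snappy_checksum(uncompressed: bytes) -> bytes:
    """Return the CRC32 of the uncompressed data as 4 big-endian bytes."""
    return struct.pack(">I", zlib.crc32(uncompressed) & 0xFFFFFFFF)
```

A reader can use the checksum to verify a Snappy block after decompression by recomputing the CRC32 over the decompressed bytes and comparing the result.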
Include date in filename: Add the system date to the output file name in the yyyyMMdd format (for example, 20181231).
Include time in filename: Add the system time to the output file name in the HHmmss format (for example, 235959).
Specify date time format: Add a date/time format to the output file name from the drop-down list.
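The date and time options can be approximated as shown below. The `%Y%m%d` and `%H%M%S` patterns correspond to yyyyMMdd and HHmmss. Where the stamp is placed in the name and the underscore separator are assumptions made for illustration; the step may position it differently. `stamped_name` is a hypothetical helper.

```python
from datetime import datetime
from pathlib import PurePosixPath


def stamped_name(path, now, include_date=False, include_time=False):
    """Insert date and/or time stamps before the file extension.

    Assumption: stamps are joined with underscores ahead of the extension.
    """
    p = PurePosixPath(path)
    parts = [p.stem]
    if include_date:
        parts.append(now.strftime("%Y%m%d"))  # yyyyMMdd
    if include_time:
        parts.append(now.strftime("%H%M%S"))  # HHmmss
    return str(p.with_name("_".join(parts) + p.suffix))
```

For example, calling `stamped_name("/tmp/out.avro", datetime(2018, 12, 31, 23, 59, 59), True, True)` yields `/tmp/out_20181231_235959.avro` under these assumptions.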