Parquet options

In the Options tab, you can define properties for the Parquet file output.

Option

Description

Compression

Specify the codec to use to compress the Parquet Output file:

  • None: No compression is used (default).

  • Snappy: Compresses the data with Google's Snappy compression library. Each data block is followed by a 4-byte, big-endian CRC32 checksum of the uncompressed data in that block.

  • GZIP: Uses a compression format that is based on the Deflate algorithm.
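The block-plus-checksum layout described for Snappy can be illustrated with a minimal Python sketch. This is an illustration only, not the Parquet writer's implementation: it uses zlib's Deflate (the algorithm behind GZIP) as a stand-in compressor because Snappy is not in the Python standard library, it assumes zlib's standard CRC-32 variant, and the function names `frame_block` and `read_block` are hypothetical.

```python
import struct
import zlib


def frame_block(data: bytes) -> bytes:
    """Compress one block and append a 4-byte, big-endian CRC32
    of the *uncompressed* data."""
    compressed = zlib.compress(data)  # Deflate stand-in for Snappy (assumption)
    checksum = zlib.crc32(data) & 0xFFFFFFFF
    return compressed + struct.pack(">I", checksum)


def read_block(framed: bytes) -> bytes:
    """Split off the trailing checksum, decompress, and verify the data."""
    compressed, trailer = framed[:-4], framed[-4:]
    (expected,) = struct.unpack(">I", trailer)
    data = zlib.decompress(compressed)
    if zlib.crc32(data) & 0xFFFFFFFF != expected:
        raise ValueError("checksum mismatch: block is corrupt")
    return data


block = b"parquet " * 1000
framed = frame_block(block)
assert read_block(framed) == block
print(len(block), len(framed))  # uncompressed size vs. framed size
```

Because the checksum covers the uncompressed bytes, a reader can detect corruption only after decompressing the block.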

Version

Specify the version of Parquet you want to use:

  • Parquet 1.0

  • Parquet 2.0

Row group size (MB)

Specify the size of each row group, in megabytes. The default value is 0.

Data page size (KB)

Specify the size of each data page, in kilobytes. The default value is 0.
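As a rough illustration of how the row group size and data page size relate, the sketch below converts hypothetical non-default settings into bytes and estimates how many row groups and data pages a file would contain. The 128 MB and 1024 KB values, the binary unit conversion, and the function name `estimate_layout` are assumptions chosen for the example, not defaults or recommendations. A real writer sizes pages per column, so actual counts will differ.

```python
import math

MB = 1024 * 1024  # assumes binary megabytes
KB = 1024         # assumes binary kilobytes


def estimate_layout(total_bytes: int, row_group_mb: int, data_page_kb: int) -> tuple[int, int]:
    """Return (row_groups, pages_per_full_row_group) for the given sizes."""
    row_group_bytes = row_group_mb * MB
    page_bytes = data_page_kb * KB
    row_groups = math.ceil(total_bytes / row_group_bytes)
    pages_per_group = math.ceil(row_group_bytes / page_bytes)
    return row_groups, pages_per_group


# Hypothetical: 1024 MB of data, 128 MB row groups, 1024 KB data pages.
print(estimate_layout(1024 * MB, row_group_mb=128, data_page_kb=1024))  # (8, 128)
```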

Dictionary encoding

Specifies dictionary encoding, which builds a dictionary of the values encountered in a column. The dictionary page is written first, before the data pages of the column. If the dictionary grows larger than the Page size, whether in bytes or in number of distinct values, the encoding reverts to plain encoding.

Page size (KB)

Specify the dictionary page size, in kilobytes, when using dictionary encoding. The default value is 1024.
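The dictionary encoding behavior described above can be sketched in a few lines of Python. This is a simplified model under stated assumptions, not the actual Parquet encoder: the function name `encode_column` and the returned dictionary layout are hypothetical, values are assumed to be strings, and only the byte-size limit is checked (not the number of distinct values).

```python
def encode_column(values, page_size_kb=1024):
    """Dictionary-encode a column of strings, falling back to plain
    encoding if the dictionary outgrows the page size."""
    values = list(values)
    limit = page_size_kb * 1024
    dictionary = {}  # value -> index, in first-seen order
    indices = []
    dict_bytes = 0
    for value in values:
        if value not in dictionary:
            dict_bytes += len(value.encode("utf-8"))
            if dict_bytes > limit:
                # Dictionary too large: give up and store the raw values.
                return {"encoding": "PLAIN", "values": values}
            dictionary[value] = len(dictionary)
        indices.append(dictionary[value])
    return {
        "encoding": "DICTIONARY",
        "dictionary": list(dictionary),  # conceptually the dictionary page, written first
        "indices": indices,              # conceptually the data pages
    }


print(encode_column(["US", "DE", "US", "US", "DE"]))
# {'encoding': 'DICTIONARY', 'dictionary': ['US', 'DE'], 'indices': [0, 1, 0, 0, 1]}
```

Columns with few distinct values benefit most, because each repeated value is stored once in the dictionary and referenced by a small integer afterwards.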
