Options tab

The following options in the Options tab define how the ORC output file will be created.
| Field | Description |
| --- | --- |
| Compression | Specifies which codec is used to compress the ORC output file. **None**: No compression is used (default). **Zlib**: Writes the data blocks using the deflate algorithm, as specified in RFC 1951, typically implemented with the zlib library. **LZO**: Writes the data blocks using LZO encoding, which works well for CHAR and VARCHAR columns that store very long character strings. **Snappy**: Writes the data blocks using Google's Snappy compression library. |
| Stripe size (MB) | Defines the stripe size in megabytes. An ORC file has one or more stripes. Each stripe is composed of rows of data, an index of the data, and a footer containing metadata about the stripe's contents. Large stripe sizes enable efficient reads from HDFS. The default is 64. See https://cwiki.apache.org/confluence/display/Hive/LanguageManual+ORC for additional information. |
| Compress size (KB) | Defines the size, in kilobytes, of each compression chunk. The default is 256. |
| Inline Indexes | If checked, rows are indexed as they are written, enabling faster filtering and random access on read. |
| Rows between entries | Defines the stride size, that is, the number of rows between index entries (must be at least 1000). The stride is the block of rows that the ORC reader can skip during a read, based on the indexes. The default is 10000. |
| Include date in file name | Adds the system date to the file name in `yyyyMMdd` format (for example, `20181231`). |
| Include time in file name | Adds the system time to the file name in `HHmmss` format (for example, `235959`). |
| Specify date time format | Select to choose the date time format from the drop-down list. |
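The Compression and Compress size (KB) options work together: data is compressed in chunks of the configured size. The Python sketch below models that for the Zlib codec, assuming each chunk is compressed independently. It uses the standard `zlib` module with `wbits=-15`, which emits a raw RFC 1951 deflate stream. The helpers `deflate_chunks` and `inflate_chunks` are hypothetical names, and this is a simplified model rather than the ORC writer's implementation; it ignores how a real ORC file lays out streams and chunk headers.

```python
import zlib
from typing import List


def deflate_chunks(data: bytes, compress_size_kb: int = 256) -> List[bytes]:
    """Hypothetical helper: split data into compress_size_kb chunks and
    raw-deflate each chunk independently (simplified model, not ORC code)."""
    chunk_size = compress_size_kb * 1024  # 256 KB -> 262,144 bytes
    chunks = []
    for start in range(0, len(data), chunk_size):
        # wbits=-15: raw RFC 1951 deflate, no zlib (RFC 1950) header or trailer
        comp = zlib.compressobj(level=6, wbits=-15)
        chunks.append(comp.compress(data[start:start + chunk_size]) + comp.flush())
    return chunks


def inflate_chunks(chunks: List[bytes]) -> bytes:
    """Decompress each chunk on its own and concatenate the results."""
    return b"".join(zlib.decompress(c, wbits=-15) for c in chunks)


if __name__ == "__main__":
    payload = b"ORC" * 200_000           # 600,000 bytes of sample data
    chunks = deflate_chunks(payload)     # default compress size of 256 KB
    print(len(chunks))                   # -> 3
    print(inflate_chunks(chunks) == payload)  # -> True
```

Because 600,000 bytes exceeds two 262,144-byte chunks, the sample payload is split into three chunks; a smaller compress size produces more, smaller chunks.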
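Inline Indexes and Rows between entries determine how much data a reader can skip. As a simplified model, assume the writer records the minimum and maximum value of a column for every stride of rows, and a reader consults those entries to skip row groups that cannot match a filter. The sketch assumes integer column values and only min/max statistics; `build_row_index` and `groups_to_read` are hypothetical names, and real ORC index entries carry more information than this.

```python
from typing import List, Tuple

MIN_ROWS_BETWEEN_ENTRIES = 1000  # lower bound stated for "Rows between entries"


def build_row_index(values: List[int],
                    rows_between_entries: int = 10000) -> List[Tuple[int, int]]:
    """Hypothetical helper: one (min, max) entry per group of
    rows_between_entries rows (simplified model of a row index)."""
    if rows_between_entries < MIN_ROWS_BETWEEN_ENTRIES:
        raise ValueError("rows_between_entries must be >= 1000")
    index = []
    for start in range(0, len(values), rows_between_entries):
        group = values[start:start + rows_between_entries]
        index.append((min(group), max(group)))
    return index


def groups_to_read(index: List[Tuple[int, int]], target: int) -> List[int]:
    """Row groups whose [min, max] range may contain target; all others can be skipped."""
    return [g for g, (lo, hi) in enumerate(index) if lo <= target <= hi]


if __name__ == "__main__":
    column = list(range(250_000))        # 250,000 sorted values in one stripe
    index = build_row_index(column)      # default stride of 10,000 rows
    print(len(index), groups_to_read(index, 123_456))  # -> 25 [12]
```

With sorted data, as here, a point lookup reads one row group out of 25; unsorted data produces overlapping ranges, so fewer groups can be skipped.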
**Important:** Due to licensing constraints, ORC does not ship with LZO compression libraries; these must be manually installed on each node if you want to use LZO compression.
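The date and time file-name options use Java-style pattern letters (`yyyyMMdd`, `HHmmss`). A minimal Python sketch of building such a name follows; `%Y%m%d` and `%H%M%S` are the `strftime` equivalents. The helper `orc_output_filename`, the underscore separators, and the `.orc` extension are assumptions for illustration, not the step's documented naming rule.

```python
from datetime import datetime


def orc_output_filename(base: str, when: datetime,
                        include_date: bool = False,
                        include_time: bool = False,
                        extension: str = "orc") -> str:
    """Hypothetical helper: append optional date/time stamps to a base name.
    Separator and extension are illustrative assumptions."""
    parts = [base]
    if include_date:
        parts.append(when.strftime("%Y%m%d"))  # yyyyMMdd, e.g. 20181231
    if include_time:
        parts.append(when.strftime("%H%M%S"))  # HHmmss (24-hour), e.g. 235959
    return "_".join(parts) + "." + extension


if __name__ == "__main__":
    stamp = datetime(2018, 12, 31, 23, 59, 59)
    print(orc_output_filename("sales", stamp, include_date=True, include_time=True))
    # -> sales_20181231_235959.orc
```

To mirror "system date" and "system time", pass `datetime.now()` as `when`.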