Sort rows

This step sorts rows by the fields you specify, each in either ascending or descending order.

If you run multiple copies of this step in parallel, each copy sorts only the rows it receives, so you must merge the sorted blocks to get the correct overall sort sequence. To do so, add a Sorted Merge step immediately after the last Sort rows step.

You can create this type of transformation locally (using Change number of copies to start) or in a clustered environment.
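The following Python sketch illustrates why sorted blocks from parallel copies need a merge. It is not PDI code: the function name `merge_sorted_blocks` and the sample lists are hypothetical, and the standard-library `heapq.merge` stands in for what a merge step does conceptually.

```python
import heapq


def merge_sorted_blocks(blocks, key=None):
    """Merge several already-sorted blocks into one sorted list.

    Each block plays the role of the output of one parallel sort copy.
    """
    return list(heapq.merge(*blocks, key=key))


# Hypothetical output of two parallel copies: each block is sorted on its own.
copy_1 = [1, 4, 9]
copy_2 = [2, 3, 10]

# Simply appending the blocks does not give a sorted stream.
concatenated = copy_1 + copy_2

# Merging the blocks restores the overall sort sequence.
merged = merge_sorted_blocks([copy_1, copy_2])
print(merged)
```

The merge only works if every block is already sorted on the same keys, which is why the merge belongs directly after the sort.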

Step name

  • Step name: Specify the unique name of the step on the canvas. You can customize the name or leave it as the default.

Options

The Sort rows step includes the following options.

Option
Description

Sort directory

Directory where temporary files can be stored, if needed. If blank, the step uses the system temporary directory.

TMP-file prefix

Prefix for temporary files.

Sort size (rows in memory)

Number of rows to sort in memory. Default is 1000000. A larger number can improve speed, but might increase memory use.

Free memory threshold (in %)

If free memory drops below this threshold, the sort algorithm begins paging data to disk. The step rechecks free memory every 1000 rows.

Compress TMP Files

Select to compress temporary files.

Only pass unique rows? (verifies keys only)

Select to pass only unique rows to the output stream. Uniqueness is checked on the sort key fields only, not on the whole row.

Fields table

Fields and sort direction. You can also specify case sensitivity.

Get Fields

Populates the fields table from the incoming stream.
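As a rough illustration of how the options above interact, the Python sketch below sorts a stream in memory-sized chunks, spills each full chunk to a temporary file, and merges the chunks at the end. It is a conceptual model only, not the step's actual implementation: the names `external_sort`, `_spill`, and `_read`, the default prefix `sort_`, and the use of `pickle` and `gzip` are all assumptions, and the free memory threshold is not modeled.

```python
import gzip
import heapq
import os
import pickle
import tempfile
from operator import itemgetter

_MISSING = object()  # marker meaning "no previous key seen yet"


def _spill(chunk, directory, prefix, compress):
    """Write one sorted chunk to a temporary file and return its path."""
    # directory=None makes tempfile fall back to the system temporary directory.
    fd, path = tempfile.mkstemp(prefix=prefix, suffix=".tmp", dir=directory)
    os.close(fd)
    opener = gzip.open if compress else open
    with opener(path, "wb") as handle:
        for row in chunk:
            pickle.dump(row, handle)
    return path


def _read(path, compress):
    """Yield rows back from a temporary file in the order they were written."""
    opener = gzip.open if compress else open
    with opener(path, "rb") as handle:
        while True:
            try:
                yield pickle.load(handle)
            except EOFError:
                return


def external_sort(rows, key, sort_size=1_000_000, directory=None,
                  prefix="sort_", compress=False, unique=False):
    """Sort rows by key, keeping at most sort_size rows in memory at a time."""
    paths, chunk = [], []
    for row in rows:
        chunk.append(row)
        if len(chunk) >= sort_size:
            chunk.sort(key=key)
            paths.append(_spill(chunk, directory, prefix, compress))
            chunk = []
    chunk.sort(key=key)  # the last, partial chunk stays in memory
    try:
        streams = [_read(path, compress) for path in paths] + [iter(chunk)]
        previous = _MISSING
        for row in heapq.merge(*streams, key=key):
            current = key(row)
            # "Unique" compares the sort key only, not the whole row.
            if unique and previous is not _MISSING and current == previous:
                continue
            previous = current
            yield row
    finally:
        for path in paths:
            os.remove(path)


rows = [(3, "c"), (1, "a"), (2, "b"), (1, "z"), (5, "e")]
print(list(external_sort(rows, key=itemgetter(0), sort_size=2, unique=True)))
```

A larger `sort_size` means fewer temporary files and less merging, at the cost of holding more rows in memory, which mirrors the trade-off described for Sort size (rows in memory).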

Field settings

Use the following settings to refine the sort behavior for individual fields.

Setting
Description

Field names

Name of the field on the stream.

Ascending

Y for ascending order, N for descending order.

Case sensitive compare?

Y to treat uppercase and lowercase letters as different when comparing values, N to ignore case.

Sort based on current locale?

Y to sort based on the system locale, N to sort based on standard UTF-8 ordering.

Collator strength

If you selected Y for Sort based on current locale?, specify an integer from 0 to 3:

  • 0 (Primary): base letter differences (for example, a vs b).

  • 1 (Secondary): accent differences (for example, a vs ä).

  • 2 (Tertiary): case differences (for example, A vs a).

  • 3 (Identical): strings must match exactly (for example, \u0001 vs \u0002).

Presorted?

Select Y if the field is already sorted. Presorting can improve efficiency.
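The field settings above can be pictured with a small Python sketch. It is an approximation for illustration, not how the step is implemented: `field_key` and `sort_rows` are hypothetical names, and the `strength` handling only imitates collator strengths with Unicode normalization rather than using a real locale-aware collator.

```python
import unicodedata


def field_key(value, case_sensitive=True, strength=None):
    """Build a comparison key for one field value.

    strength is a rough stand-in for collator strength:
    0 ignores accents and case, 1 ignores case only,
    2 and 3 compare the text as written.
    """
    if not isinstance(value, str):
        return value
    # Decompose accented letters into base letter plus combining mark.
    text = unicodedata.normalize("NFD", value)
    if strength == 0:
        text = "".join(ch for ch in text if not unicodedata.combining(ch))
    if not case_sensitive or (strength is not None and strength <= 1):
        text = text.casefold()
    return text


def sort_rows(rows, fields):
    """Sort dict rows by several fields.

    fields is a list of (name, ascending, case_sensitive) tuples,
    most significant field first.
    """
    result = list(rows)
    # Python's sort is stable, so sorting by the least significant field
    # first and the most significant field last gives a multi-field order.
    for name, ascending, case_sensitive in reversed(fields):
        result.sort(key=lambda row: field_key(row[name], case_sensitive),
                    reverse=not ascending)
    return result


rows = [
    {"city": "berlin", "n": 2},
    {"city": "Amsterdam", "n": 1},
    {"city": "Berlin", "n": 3},
]
# city ascending and case-insensitive, then n descending
print(sort_rows(rows, [("city", True, False), ("n", False, True)]))
```

With case-insensitive comparison, "berlin" and "Berlin" count as equal on the first field, so the second field decides their order.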

Metadata injection support

This step supports metadata injection. You can use it with ETL metadata injection to pass metadata to your transformation at runtime.
