Memory Group By
The Memory Group By step groups the rows it receives from a source step in memory, based on a specified field or collection of fields. It generates one new row for each group.
Unlike the Group By step, this step processes all rows in memory and does not require sorted input.
If the number of rows you want to group is too large to fit into memory, use a combination of Sort rows and Group By.
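The difference between the two approaches can be sketched in Python. This is only an illustration of the general technique, not the step's actual implementation; the function names, field names, and sample rows are hypothetical.

```python
from collections import defaultdict
from itertools import groupby
from operator import itemgetter

# Hypothetical sample rows; note that they are NOT sorted by "region".
rows = [
    {"region": "EU", "sales": 10},
    {"region": "US", "sales": 5},
    {"region": "EU", "sales": 7},
]

def memory_group_sum(rows, key, field):
    """In-memory grouping: keeps every group in a dict, so input order does not matter."""
    totals = defaultdict(int)
    for row in rows:
        totals[row[key]] += row[field]
    return dict(totals)

def sorted_group_sum(rows, key, field):
    """Sort first, then aggregate runs of adjacent rows, like Sort rows followed by Group By."""
    ordered = sorted(rows, key=itemgetter(key))
    return {
        k: sum(r[field] for r in grp)
        for k, grp in groupby(ordered, key=itemgetter(key))
    }

in_memory = memory_group_sum(rows, "region", "sales")
sort_then_group = sorted_group_sum(rows, "region", "sales")
```

Both functions produce the same totals. The second only ever needs the current group at aggregation time, which is why a sort followed by a streaming group-by suits inputs that are too large to hold in memory.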
General
Step name
Specify the unique name of the Memory Group By step on the canvas. You can customize the name or leave it as the default.
Always give back a result row
Select to return a result row even when there is no input row. If no input rows exist, this option returns a count of 0.
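As a rough sketch of this behavior in Python (the function and the `always_give_back_row` flag are hypothetical names used only for illustration):

```python
def count_rows(rows, always_give_back_row):
    """Return a list of result rows holding the input row count.

    always_give_back_row is a hypothetical flag mirroring the option above.
    """
    rows = list(rows)
    if not rows and not always_give_back_row:
        return []  # no input rows and the option is off: no result row
    return [{"count": len(rows)}]
```

With the flag set, an empty input yields a single row with a count of 0; with the flag cleared, an empty input yields no rows at all.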

Group fields
Use this table to specify the fields you want to group by.
Select Get fields to add all fields from the incoming stream.
Leave this table blank to calculate aggregate functions over the entire dataset.
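The effect of leaving the table blank can be illustrated with a minimal Python sketch. The function name, field names, and sample rows are assumptions made for the example; an empty list of group fields puts every row into one group.

```python
from collections import defaultdict

def group_sum(rows, group_fields, subject):
    """Sum `subject` per group; the group key is a tuple of the group field values."""
    totals = defaultdict(int)
    for row in rows:
        key = tuple(row[f] for f in group_fields)  # () when no group fields are given
        totals[key] += row[subject]
    return dict(totals)

# Hypothetical sample rows.
rows = [
    {"region": "EU", "sales": 10},
    {"region": "US", "sales": 5},
    {"region": "EU", "sales": 7},
]

by_region = group_sum(rows, ["region"], "sales")
overall = group_sum(rows, [], "sales")
```

Here `by_region` holds one total per region, while `overall` holds a single total for the entire dataset.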
Aggregates
Use this table to specify the aggregation method and the name of the resulting field.
Name
The name of the aggregate field.
Subject
The incoming field to which you want to apply the aggregation method.
Type
The aggregation method. Available methods include:
Sum
Average (Mean)
Median
Percentile
Minimum
Maximum
Number of values (N)
Concatenate strings separated by , (comma)
First non-null value
Last non-null value
First value (including null)
Last value (including null)
Standard deviation
Concatenate strings separated by the character specified in the Value column
Number of distinct values
Number of rows (without field argument)
Value
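The value used by aggregation methods that take an argument. For example, the Concatenate strings separated by the character specified in the Value column method reads its separator from this column.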
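Several of these methods differ only in how they treat nulls or separators. The Python sketch below illustrates the intended distinctions for a single group; it is not the step's implementation. The helper names are hypothetical, `None` stands in for null, and skipping nulls when concatenating or counting distinct values is an assumption of this sketch.

```python
# Hypothetical subject values for one group, in arrival order.
values = [None, "a", "b", "a", None]

def first_non_null(vals):
    return next((v for v in vals if v is not None), None)

def last_non_null(vals):
    return next((v for v in reversed(vals) if v is not None), None)

def first_value(vals):
    # Includes null: simply the first value seen.
    return vals[0] if vals else None

def last_value(vals):
    # Includes null: simply the last value seen.
    return vals[-1] if vals else None

def concatenate(vals, separator=","):
    # Assumption: nulls are skipped. The separator plays the role of the Value column.
    return separator.join(str(v) for v in vals if v is not None)

def count_distinct(vals):
    # Assumption: nulls are not counted as a distinct value.
    return len({v for v in vals if v is not None})
```

For the sample values, the non-null variants return `"a"`, while the variants that include null return `None` because the group both starts and ends with a null.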
Metadata injection support
This step supports metadata injection. You can use it with ETL metadata injection to pass metadata to your transformation at runtime.

