# Understand repartitioning logic

Data distribution in the steps is shown in the following table.

![Data distributions by steps](/files/8GgXFZsZDbTKJGxnwKao)

As you can see, the CSV file input step divides the work between two step copies, and each copy reads 50 rows of data. However, these two step copies must also make sure that each row ends up on the correct `count by state` step copy, where the rows arrive in a 43/57 split. For this reason, as a general rule, the step that performs the repartitioning (row redistribution) of the data, that is, a non-partitioned step placed before a partitioned one, has internal buffers from every source step copy to every target step copy, as shown below.

![Work division between step copies with partitioning](/files/88WIzPesqd7AYUbugp2S)
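The buffer layout described above can be sketched in a few lines of Python. This is only an illustration, not PDI code: the `repartition` function, the `by_state` routing rule, and the sample rows are all hypothetical, and the resulting row counts are not meant to reproduce the 43/57 split from the sample transformation. The point is that two source copies and two target copies give four buffers, one per (source, target) pair.

```python
from collections import defaultdict

def repartition(source_rows, n_targets, pick_target):
    """Route rows from each source copy into one buffer per (source, target) pair."""
    buffers = defaultdict(list)
    for src, rows in enumerate(source_rows):
        # Create every buffer up front, even if it ends up empty.
        for tgt in range(n_targets):
            buffers[(src, tgt)]
        for row in rows:
            buffers[(src, pick_target(row))].append(row)
    return dict(buffers)

# Hypothetical input: 100 rows split evenly over two reader copies.
states = ["NY", "CA", "TX", "FL"]
rows = [{"id": i, "state": states[i % 4]} for i in range(100)]
source_rows = [rows[:50], rows[50:]]

# Hypothetical routing rule: which target copy owns which state.
def by_state(row):
    return 0 if row["state"] in ("NY", "CA") else 1

buffers = repartition(source_rows, n_targets=2, pick_target=by_state)
for (src, tgt), buf in sorted(buffers.items()):
    print(f"source copy {src} -> target copy {tgt}: {len(buf)} rows")
```

With N source copies and M target copies, this layout needs N × M buffers, which is why repartitioning grows more expensive as the number of copies increases.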

This is where partitioning data becomes a useful concept: it applies rule-based direction for aggregation, sending all rows from the same state to the same step copy so that the rows are not split arbitrarily. In the example below, a partition schema called `State` was applied to the `count by state` step, and the Remainder of division partitioning rule was applied to the `State` field. Now the `count by state` aggregation step produces consistently correct results because the rows were split up according to the partition schema and rule, as shown in the preview data.

![Partitioning data using rule-based aggregation](/files/kiwbd5C87VThZ9OCSuOV)
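As a rough sketch of why a remainder-of-division rule keeps each state on a single step copy, the Python example below reduces the field value to an integer and takes it modulo the number of partitions. The helper names are hypothetical, and the use of `zlib.crc32` to turn a string into an integer is an assumption made for illustration; it is not necessarily the hash that PDI applies internally.

```python
import zlib
from collections import Counter

def partition_for(value, n_partitions):
    """Pick a partition as (integer key) mod (number of partitions)."""
    if isinstance(value, int):
        key = value
    else:
        # Assumption: a stable stand-in hash for non-integer values.
        key = zlib.crc32(str(value).encode("utf-8"))
    return key % n_partitions

def count_by_state(rows, n_partitions):
    """Aggregate counts separately in each partition (one Counter per step copy)."""
    per_copy = [Counter() for _ in range(n_partitions)]
    for row in rows:
        copy_nr = partition_for(row["state"], n_partitions)
        per_copy[copy_nr][row["state"]] += 1
    return per_copy

# Hypothetical sample rows.
rows = [{"state": s} for s in ["NY", "CA", "NY", "TX", "CA", "NY"]]
per_copy = count_by_state(rows, n_partitions=2)
for copy_nr, counts in enumerate(per_copy):
    print(f"step copy {copy_nr}: {dict(counts)}")
```

Because the same state value always yields the same remainder, every row for that state lands on the same copy, so each copy's count for a state is the complete count.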

**Note:** To view this transformation in the PDI client, open the `Pentaho/…/design-tools/data-integration/samples/transformations/General - parallel reading and aggregation.ktr` sample file.

