# Executing on the Spark engine

The Spark engine groups data into partitions and creates executors, which process those partitions. In some cases, you can improve performance by adding executors or by increasing their memory size. The number and size of executors are limited by your cluster resources.

Not all the memory set aside in the **Spark memory model** is available for data processing. The memory allotted in the Spark memory model is broken down into the following segments:

| Memory type     | Amount                                                | Note                                                                                            |
| --------------- | ----------------------------------------------------- | ----------------------------------------------------------------------------------------------- |
| Reserved memory | 300 MB hard coded for Spark.                          | Cannot be adjusted and is not available to any executor.                                        |
| User memory     | 25% of the memory left after the reserved memory.     | Non-dataset storage used for the executors.                                                     |
| Spark memory    | 75% of the memory left after the reserved memory.     | Roughly half of this memory is used for data storage, and the other half is used for execution. |

The memory available for partitions comes from the data storage part of the Spark memory model. The rest of the Spark memory model is available to the executor. ETL tasks usually require more data storage, while AI and machine learning tasks need more execution memory.
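As a rough illustration of the breakdown in the table, the following Python sketch computes each segment for a given executor memory size. The function name `memory_breakdown` and the 4096 MB input are hypothetical, the percentages are taken from the table as stated, and the even storage/execution split is only the "roughly half" approximation, so treat the results as estimates rather than exact values for your cluster.

```python
RESERVED_MB = 300        # hard-coded reserved memory, per the table
USER_FRACTION = 0.25     # share of the remainder used as user memory
SPARK_FRACTION = 0.75    # share of the remainder used as Spark memory


def memory_breakdown(executor_memory_mb: float) -> dict:
    """Split an executor's memory into the segments of the Spark memory model."""
    if executor_memory_mb <= RESERVED_MB:
        raise ValueError("executor memory must exceed the reserved 300 MB")
    remainder = executor_memory_mb - RESERVED_MB
    spark = remainder * SPARK_FRACTION
    return {
        "reserved": RESERVED_MB,
        "user": remainder * USER_FRACTION,
        "spark": spark,
        # Assumption: "roughly half" is modeled as an exact 50/50 split.
        "storage": spark / 2,
        "execution": spark / 2,
    }


if __name__ == "__main__":
    # Hypothetical 4 GB executor.
    for name, mb in memory_breakdown(4096).items():
        print(f"{name:>9}: {mb:8.1f} MB")
```

For the hypothetical 4096 MB executor, about 1.4 GB ends up in the data storage part that holds partitions, which is why a larger executor does not translate one-to-one into more partition memory.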

The Spark engine changes the state of data in the memory model through the following types of Spark transformations:

* **Narrow Spark transformation**

  A Spark task that only requires data from a single partition. The data can be input, transformed, and output all within the same partition.
* **Wide Spark transformation**

  A Spark task that requires data from multiple partitions. In a wide transformation, the data must be shuffled between partitions. The partition used to input the data is not the same as the partition used to transform and output the data.

A single PDI transformation may map to one or more narrow or wide Spark transformations.
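The difference between the two types can be sketched in plain Python, without Spark, by modeling partitions as lists. This is a toy model under stated assumptions: `narrow_map`, `wide_group_sum`, and the byte-sum partitioner are hypothetical names invented for illustration and are not part of the Spark or PDI APIs.

```python
from collections import defaultdict

# Toy dataset already split into three partitions of (key, value) rows.
partitions = [
    [("a", 1), ("b", 2)],
    [("a", 3), ("c", 4)],
    [("b", 5), ("c", 6)],
]


def narrow_map(parts, fn):
    """Narrow: each output partition depends on exactly one input partition."""
    return [[fn(row) for row in part] for part in parts]


def wide_group_sum(parts, num_partitions):
    """Wide: rows are shuffled so that equal keys land in the same partition."""
    shuffled = [defaultdict(int) for _ in range(num_partitions)]
    for part in parts:
        for key, value in part:
            # Hypothetical deterministic partitioner: byte sum of the key.
            target = sum(key.encode()) % num_partitions
            shuffled[target][key] += value
    return [sorted(p.items()) for p in shuffled]


if __name__ == "__main__":
    # Rows stay in their original partitions.
    print(narrow_map(partitions, lambda kv: (kv[0], kv[1] * 2)))
    # Rows with the same key are moved into one partition before summing.
    print(wide_group_sum(partitions, 2))
```

In the narrow case the partition layout is unchanged, while in the wide case every input partition may contribute rows to every output partition, which is the data movement that makes shuffles expensive.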

Narrow Spark transformations are more efficient than wide Spark transformations. Wide Spark transformations can trigger repartitioning, which can cause slow data transfers, transfer failures, and recalculations. Examples of wide transformations are join, sort, and grouping operations. You can improve execution by coalescing the partitions (reducing the number of partitions), which consolidates splits without shuffling data.
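To show why coalescing avoids a shuffle, here is a minimal plain-Python sketch that merges whole partitions into fewer ones. The `coalesce` function below is a hypothetical stand-in written for illustration, not Spark's implementation, and it assumes that adjacent partitions are simply concatenated.

```python
def coalesce(parts, num_partitions):
    """Merge adjacent partitions whole; no row is re-routed by key."""
    if not 1 <= num_partitions <= len(parts):
        # Assumption of this sketch: only reductions are accepted.
        raise ValueError("coalesce can only reduce the partition count")
    merged = [[] for _ in range(num_partitions)]
    for index, part in enumerate(parts):
        # Each input partition is assigned, intact, to one output partition.
        merged[index * num_partitions // len(parts)].extend(part)
    return merged


if __name__ == "__main__":
    parts = [[1, 2], [3], [4, 5], [6]]
    print(coalesce(parts, 2))
```

Because each input partition moves as a unit to a single output partition, no individual row has to be redistributed, in contrast to the key-based routing that a wide transformation requires.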

