
# Pentaho MapReduce

This job entry executes transformations as part of a Hadoop MapReduce job in place of a traditional Hadoop Java class. A Hadoop MapReduce job is made up of any combination of the following types of transformations:

* The Mapper transformation takes a set of data and converts it into another set of data, where individual elements are broken down into tuples (key/value pairs). It performs filtering and sorting (such as sorting students by first name into queues, one queue for each name). It applies a given function to each element of a list, returning a list of results in the same order.
* The Combiner transformation summarizes the map output records that share the same key, which helps reduce the amount of data written to disk and transmitted over the network.
* The Reducer transformation performs a summary operation (such as counting the number of students in each queue, yielding name frequencies). It analyzes a recursive data structure and, using a given combining operation, recombines the results of recursively processing its constituent parts to build up a return value.
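The three roles described above can be illustrated with a small in-memory sketch. This is plain Python, not PDI or Hadoop code: the function names (`mapper`, `combiner`, `reducer`, `run_job`) and the sample student records are hypothetical and exist only to show how key/value pairs flow from one stage to the next in the student-name example.

```python
from collections import defaultdict

def mapper(record):
    """Turn one input record into a (key, value) pair: (first name, 1)."""
    first_name = record.split()[0]
    return (first_name, 1)

def combiner(pairs):
    """Pre-aggregate pairs with the same key from a single map task."""
    totals = defaultdict(int)
    for key, value in pairs:
        totals[key] += value
    return list(totals.items())

def reducer(key, values):
    """Summarize all values collected for one key."""
    return (key, sum(values))

def run_job(splits):
    """Simulate map, combine, shuffle, and reduce over in-memory input splits."""
    combined = []
    for split in splits:
        mapped = [mapper(record) for record in split]
        # The combiner shrinks each map task's output before it is shuffled.
        combined.extend(combiner(mapped))

    # Shuffle: group the intermediate values by key.
    grouped = defaultdict(list)
    for key, value in combined:
        grouped[key].append(value)

    return dict(reducer(key, values) for key, values in sorted(grouped.items()))

# Hypothetical sample data: two input splits of "first-name last-name" records.
splits = [["Ana Lopez", "Ben Ito", "Ana Kim"], ["Ben Roy", "Ana Diaz"]]
result = run_job(splits)
print(result)  # {'Ana': 3, 'Ben': 2}
```

In this sketch the combiner sends two pairs per split to the shuffle instead of one pair per record, which mirrors the reduction in disk and network traffic mentioned above. In an actual Pentaho MapReduce job, each stage would be a PDI transformation rather than a function.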

**Note:** This entry was formerly known as Hadoop Transformation Job Executor.

With the Pentaho MapReduce entry, you specify PDI transformations to use for the mapper, combiner, and/or reducer through their related tabs. The mapper transformation is required. The combiner and reducer transformations are optional. See [Pentaho MapReduce workflow](/pdia-data-integration/10.2-data-integration/pdi-job-entries-reference-overview/pentaho-mapreduce/use-pdi-outside-and-inside-the-hadoop-cluster/pentaho-mapreduce-workflow.md) for details on how PDI works with Hadoop clusters.

**Note:** The **Hadoop job name** field in the **Cluster** tab is required. The Pentaho MapReduce entry does not work unless it is specified.

