> For the complete documentation index, see [llms.txt](https://docs.pentaho.com/llms.txt). Markdown versions of documentation pages are available by appending `.md` to page URLs; this page is available as [Markdown](https://docs.pentaho.com/pdia-data-integration/10.2-data-integration/pdi-job-entries-reference-overview/pentaho-mapreduce/use-pdi-outside-and-inside-the-hadoop-cluster/pentaho-mapreduce-workflow/pdi-transformation.md).

# PDI Transformation

Start by deciding what you want to do with your data, open a PDI transformation, and drag the appropriate steps onto the canvas, configuring the steps to meet your data requirements. Drag the specifically-designed Hadoop MapReduce Input and Hadoop MapReduce Output steps onto the canvas. PDI provides these steps to completely avoid the need to write Java classes for this functionality. Configure both of these steps as needed. Once you have configured all the steps, add hops to sequence the steps as a transformation. Follow the workflow as shown in this sample transformation in order to properly communicate with Hadoop. Name this transformation Mapper.

![Big Data Key Value Pair Example](/files/oBabnIO8XFSud5XI5p6F)

Hadoop communicates in key/value pairs. PDI uses the MapReduce Input step to define how key/value pairs from Hadoop are interpreted by PDI. The MapReduce Input dialog box enables you to configure the MapReduce Input step.

![MapReduce Input Step dialog](/files/WVLXypIRSUVE42U2Jiwc)

PDI uses a MapReduce Output step to pass the output back to Hadoop. The **MapReduce Output** dialog box enables you to configure the MapReduce Output step.

![MapReduce Output Step dialog](/files/3fXdng1EOfzUI91dEnbq)

What happens in the middle is entirely up to you. Pentaho provides many sample steps you can alter to create the functionality you need.


---

# Agent Instructions
This documentation is published with GitBook. GitBook is the documentation platform designed so that both humans and AI agents can read, navigate, and reason over technical content effectively. Learn more at gitbook.com.

## Querying This Documentation
If you need additional information that is not directly available in this page, you can query the documentation dynamically by asking a question.

Perform an HTTP GET request on the current page URL with the `ask` query parameter, and the optional `goal` query parameter:

```
GET https://docs.pentaho.com/pdia-data-integration/10.2-data-integration/pdi-job-entries-reference-overview/pentaho-mapreduce/use-pdi-outside-and-inside-the-hadoop-cluster/pentaho-mapreduce-workflow/pdi-transformation.md?ask=<question>&goal=<endgoal>
```

`ask` is the immediate question: it should be specific, self-contained, and written in natural language.
`goal` is optional and describes the broader end goal you are ultimately trying to accomplish on behalf of the user. GitBook uses it to tailor the answer towards what is most useful for that goal.

The response will contain a direct answer to the question and relevant excerpts and sources from the documentation.

Use this mechanism when the answer is not explicitly present in the current page, you need clarification or additional context, or you want to retrieve related documentation sections.
