# Use YARN with S3

A transformation that reads data from an Amazon S3 bucket fails when you run it on a cluster managed with the [Start a PDI cluster on YARN](https://pentaho-public.atlassian.net/wiki/spaces/EAI/pages/388312923/Start+a+PDI+Cluster+on+YARN) and [Stop a PDI cluster on YARN](https://pentaho-public.atlassian.net/wiki/spaces/EAI/pages/388312925/Stop+a+PDI+Cluster+on+YARN) job entries. The transformation fails because the Pentaho metastore is not accessible to PDI on the cluster.

Perform the following steps to make the Pentaho metastore accessible to PDI:

1. Navigate to the `<user>/.pentaho/metastore` directory on the machine with the PDI client.
2. On the cluster where the YARN server is located, create a new directory in the `design-tools/data-integration/plugins/pentaho-big-data-plugin` directory, then copy the metastore directory into this location. The path to the new directory is the value of the *\<NEW\_META\_FOLDER\_LOCATION>* variable.
3. Navigate to the `design-tools/data-integration` directory and open the `carte.sh` file with any text editor.
4. Add the following line immediately before the `export OPT` line, then save and close the file: `OPT="$OPT -DPENTAHO_METASTORE_FOLDER=<NEW_META_FOLDER_LOCATION>"`
5. Create a zip file containing the contents of the `data-integration` directory.
6. In your Start a PDI cluster on YARN job entry, go to the **Files** tab of the Properties window, then locate the **PDI Client Archive** field. Enter the file path of the zip file.
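
The `carte.sh` edit in step 4 can also be scripted. The following is a minimal sketch, not an official Pentaho utility: the function name `add_metastore_opt` and every path in the usage comments are hypothetical placeholders, and it assumes that `carte.sh` contains a line beginning with `export OPT`, as step 4 describes.

```shell
# Sketch only: add_metastore_opt is a hypothetical helper, not part of PDI.
set -eu

# Insert the metastore option on the line before the first `export OPT` line.
#   $1 = path to carte.sh
#   $2 = value for <NEW_META_FOLDER_LOCATION>
add_metastore_opt() {
  carte_sh="$1"
  meta_dir="$2"

  # Do nothing if the option is already present, so reruns are harmless.
  if grep -q 'DPENTAHO_METASTORE_FOLDER' "$carte_sh"; then
    return 0
  fi

  # Print the new OPT line once, just before the first `export OPT` line.
  awk -v dir="$meta_dir" '
    /^[ \t]*export OPT/ && !inserted {
      print "OPT=\"$OPT -DPENTAHO_METASTORE_FOLDER=" dir "\""
      inserted = 1
    }
    { print }
  ' "$carte_sh" > "$carte_sh.tmp"

  # Write back through redirection so carte.sh keeps its file permissions.
  cat "$carte_sh.tmp" > "$carte_sh"
  rm -f "$carte_sh.tmp"
}

# Example usage (placeholder paths; adjust to your environment):
#   NEW_META="/opt/pentaho/design-tools/data-integration/plugins/pentaho-big-data-plugin/yarn-metastore"
#   mkdir -p "$NEW_META" && cp -r "$HOME/.pentaho/metastore" "$NEW_META/"
#   add_metastore_opt /opt/pentaho/design-tools/data-integration/carte.sh "$NEW_META"
```

After running the helper, open `carte.sh` and confirm that the new line sits directly above `export OPT` before you create the zip file in step 5.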

This task resolves S3 access issues for the following transformation steps:

* Avro Input
* Avro Output
* Orc Input
* Orc Output
* Parquet Input
* Parquet Output
* Text File Input
* Text File Output

