
# Using the pre-installed Apache Hadoop driver

You can use the pre-installed Apache Hadoop driver for HDFS file copy operations and for running transformations and jobs that read and write data. The driver works with both secure and unsecured clusters. Because the driver is pre-installed, you do not need to install a KAR file.

The supported big data steps in Pentaho include:

* [Avro Input](/pdia-data-integration/10.2-data-integration/pdi-transformation-steps-reference-overview/avro-input.md)
* [Avro Output](/pdia-data-integration/10.2-data-integration/pdi-transformation-steps-reference-overview/avro-output.md)
* [ORC Input](/pdia-data-integration/10.2-data-integration/pdi-transformation-steps-reference-overview/orc-input.md)
* [ORC Output](/pdia-data-integration/10.2-data-integration/pdi-transformation-steps-reference-overview/orc-output.md)
* [Parquet Input](/pdia-data-integration/10.2-data-integration/pdi-transformation-steps-reference-overview/parquet-input.md)
* [Parquet Output](/pdia-data-integration/10.2-data-integration/pdi-transformation-steps-reference-overview/parquet-output.md)

The driver supports both operating system file browsers and the Pentaho virtual file system (VFS) browsers, as well as basic HDFS and VFS operations. For more information, see [Connecting to Virtual File Systems](/pdia-data-integration/10.2-data-integration/data-integration-perspective-in-the-pdi-client/virtual-file-system-browser.md).

**Note:** The Apache Hadoop driver works only with Hadoop clusters that conform to standard Hadoop connection rules. For example, EMR clusters may work, but MapR clusters do not, because MapR does not follow standard connection rules. The Apache Hadoop driver is not intended to support higher-level Hadoop operations such as Hive, HBase, Sqoop, and Oozie. If you require these operations, install the KAR file for the applicable vendor.

