# Using the pre-installed Apache Hadoop driver

You can use the pre-installed Apache Hadoop driver for HDFS file copy operations, as well as for running input and output transformations and jobs. The driver works with both secure and unsecured clusters. Because the driver is pre-installed, you do not have to install a KAR file.

The supported big data steps in Pentaho include:

* [Avro Input](/pdia-data-integration/10.2-data-integration/pdi-transformation-steps-reference-overview/avro-input.md)
* [Avro Output](/pdia-data-integration/10.2-data-integration/pdi-transformation-steps-reference-overview/avro-output.md)
* [ORC Input](/pdia-data-integration/10.2-data-integration/pdi-transformation-steps-reference-overview/orc-input.md)
* [ORC Output](/pdia-data-integration/10.2-data-integration/pdi-transformation-steps-reference-overview/orc-output.md)
* [Parquet Input](/pdia-data-integration/10.2-data-integration/pdi-transformation-steps-reference-overview/parquet-input.md)
* [Parquet Output](/pdia-data-integration/10.2-data-integration/pdi-transformation-steps-reference-overview/parquet-output.md)

Both operating system file browsers and the Pentaho virtual file system browsers are supported, as well as basic HDFS and VFS operations. For more information, see [Connecting to Virtual File Systems](/pdia-data-integration/10.2-data-integration/data-integration-perspective-in-the-pdi-client/virtual-file-system-browser.md).
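As a rough illustration of the kind of location a basic HDFS operation points at, the sketch below splits an `hdfs://` URL into its host, port, and path using only the Python standard library. It is not part of Pentaho: the helper name `parse_hdfs_url`, the host `namenode.example.com`, the port `8020`, and the file path are all assumptions chosen for the example, so substitute the values for your own cluster.

```python
from urllib.parse import urlsplit


def parse_hdfs_url(url: str) -> dict:
    """Split an hdfs:// URL into host, port, and path.

    Hypothetical helper for illustration only; it is not a Pentaho API.
    """
    parts = urlsplit(url)
    if parts.scheme != "hdfs":
        raise ValueError(f"expected an hdfs:// URL, got scheme {parts.scheme!r}")
    if not parts.hostname:
        raise ValueError("missing NameNode host")
    # An empty path is treated as the file system root.
    return {"host": parts.hostname, "port": parts.port, "path": parts.path or "/"}


# Host, port, and path below are assumed example values.
info = parse_hdfs_url(
    "hdfs://namenode.example.com:8020/user/pentaho/input/sales.parquet"
)
print(info["host"], info["port"], info["path"])
```

A check like this can catch a mistyped scheme or a missing host before the location is entered in a step or file browser dialog.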

**Note:** Only Hadoop clusters that conform to standard Hadoop connection rules work with the Apache Hadoop driver. For example, EMR clusters may work, but MapR does not work with this driver because its connection rules are not standard. The Apache Hadoop driver is not intended to support higher-level Hadoop operations such as Hive, HBase, Sqoop, and Oozie. If you require these operations, install the KAR file for the applicable vendor.

