# Manually add a cluster connection

You can manually create a cluster connection by supplying the `site.xml` files, which are typically provided by the cluster administrator.

**Note:** If you are using high availability (HA) clusters, you must manually add the connection information to create the cluster connection.

Perform the following steps to manually add a cluster connection.

1. In the PDI client, create a new job or transformation or open an existing one.
2. Click the **View** tab and then right-click the **Hadoop Clusters** folder.
3. Click **New cluster**.

   The Hadoop Cluster dialog box appears.

   ![Hadoop New Cluster dialog box](/files/eGz58wVXI9ZzCxOSi8Fb)
4. Enter the connection information from the cluster administrator in the Hadoop Cluster dialog box.

   **Note:** As a best practice, use Kettle variables for each connection parameter value to reduce risks associated with running jobs and transformations in environments that are disconnected from the repository.

| Option                     | Description                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                |
| -------------------------- | ---------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------- |
| **Cluster Name**           | <p>Enter the name you want to assign to the cluster connection. <strong>Note:</strong> Valid cluster names may include uppercase and lowercase letters and numbers. In addition, the only special character allowed is a dash (<code>-</code>). To ensure a valid cluster name, do not use any other symbols, punctuation characters, or blank spaces.</p><p>After you create the connection, you can locate this named connection in the <strong>View</strong> tab on the PDI client.</p>                                                                                                                                                                 |
| **Driver** and **Version** | Select the distribution of Hadoop on the cluster and its version number.                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                   |
| **Site XML files**         | <p>Enter the location of the <code>site.xml</code> files provided by the cluster administrator. Click <strong>Browse to add file(s)</strong> and browse to the directory containing the <code>site.xml</code> files. Pentaho creates the applicable directory on the machine where the PDI client is located and copies the <code>site.xml</code> files to that directory.</p><p>Alternatively, if you leave this option blank, Pentaho creates the directory for the distribution and version of Hadoop you selected in the <strong>Driver</strong> and <strong>Version</strong> options. You must then copy the <code>site.xml</code> files to that directory.</p> |
| **HDFS**                   | <p>Provide the following information for the HDFS node:</p><ul><li>Enter the <strong>Hostname</strong> for the HDFS node in the Hadoop cluster.</li><li>Enter the <strong>Port</strong> for the HDFS node in the Hadoop cluster. If the cluster is enabled for high availability (HA), a port number is not needed, and you should clear the port number.</li><li>Enter the <strong>Username</strong> and <strong>Password</strong> for the HDFS node, which are typically provided by the cluster administrator.</li></ul> |
| **JobTracker**             | <p>If you have a separate JobTracker node, provide the following information:</p><ul><li>Enter the <strong>Hostname</strong> for the JobTracker node in the Hadoop cluster.</li><li>Enter the <strong>Port</strong> for the JobTracker node in the Hadoop cluster.</li></ul> |
| **ZooKeeper**              | <p>If you have a ZooKeeper node and want to connect to a ZooKeeper service, provide the following information:</p><ul><li>Enter the <strong>Hostname</strong> for the ZooKeeper node in the Hadoop cluster.</li><li>Enter the <strong>Port</strong> for the ZooKeeper node in the Hadoop cluster.</li></ul> |
| **Oozie**                  | Enter the Oozie client address in the **Hostname** field. Supply this URL only if you want to connect to the Oozie service.                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                |
| **Kafka**                  | Enter the host:port pair(s) for the initial connection to the Kafka cluster in the **Bootstrap servers** field. Use a comma-separated list for multiple servers, for example, `host1:port1,host2:port2`. Although there is no need to include all servers used for Kafka, you might want to include more than one in case a server is down.                                                                                                                                                                                                                                                                                                                |

5. Click **Next** and specify the security option for the cluster.

   - If the Hadoop cluster is non-secure, select **None** and click **Next** to [test the connection](Test%20the%20cluster%20connection%20(Add%20Hadoop%20cluster%20connection).md).
   - If the Hadoop cluster is secure, you need to add security to the cluster connection. See [Add security to cluster connections](Secure%20cluster%20connections%20(Add%20Hadoop%20cluster%20connection).md) for instructions.
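Before you enter values in the Hadoop Cluster dialog box, it can help to check them against the rules described in the options table. The following Python sketch is not part of Pentaho. The function names `is_valid_cluster_name` and `parse_bootstrap_servers` are hypothetical, and the port range check is an assumption based on standard TCP port numbering. The sketch checks that a cluster name uses only letters, numbers, and dashes, and that a Kafka **Bootstrap servers** value is a comma-separated list of `host:port` pairs. It applies to literal values only, so a value that contains a Kettle variable would need to be resolved first.

```python
from __future__ import annotations

import re

# Rule from the Cluster Name option: uppercase and lowercase letters,
# numbers, and the dash character only.
_CLUSTER_NAME = re.compile(r"[A-Za-z0-9-]+")


def is_valid_cluster_name(name: str) -> bool:
    """Return True if name contains only letters, digits, and dashes."""
    return _CLUSTER_NAME.fullmatch(name) is not None


def parse_bootstrap_servers(value: str) -> list[tuple[str, int]]:
    """Split 'host1:port1,host2:port2' into (host, port) pairs.

    Raises ValueError if an entry is not host:port, or if the port is
    outside 1-65535 (assumed range, based on TCP port numbering).
    """
    servers = []
    for entry in value.split(","):
        # Split on the last colon so the port is always the final field.
        host, sep, port = entry.strip().rpartition(":")
        if not sep or not host or not port.isdigit():
            raise ValueError(f"expected host:port, got {entry!r}")
        port_number = int(port)
        if not 1 <= port_number <= 65535:
            raise ValueError(f"port out of range in {entry!r}")
        servers.append((host, port_number))
    return servers
```

For example, `is_valid_cluster_name("prod-cluster-01")` returns `True`, while `is_valid_cluster_name("prod cluster")` returns `False` because of the blank space. Calling `parse_bootstrap_servers("host1:9092,host2:9093")` returns `[("host1", 9092), ("host2", 9093)]`.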


