# Manually add a cluster connection

You can manually create a cluster connection by supplying the `site.xml` files, which are typically provided by the cluster administrator.

**Note:** If you are using high availability (HA) clusters, you must manually add the connection information to create the cluster connection.

Perform the following steps to manually add a cluster connection.

1. In the PDI client, create a new job or transformation or open an existing one.

2. Click the **View** tab and then right-click the **Hadoop Clusters** folder.

3. Click **New cluster**.

   The Hadoop Cluster dialog box appears.

   <figure><img src="https://773338310-files.gitbook.io/~/files/v0/b/gitbook-x-prod.appspot.com/o/spaces%2FYwnJ6Fexn4LZwKRHghPK%2Fuploads%2FF6gVPAyrPs357uxPy1by%2FNewHadoopClusterDialog.png?alt=media&#x26;token=8a2304d3-3989-409e-a181-0e5754cf5f8c" alt=""><figcaption></figcaption></figure>

4. Enter the connection information from the cluster administrator in the Hadoop Cluster dialog box.

   **Note:** As a best practice, use Kettle variables for each connection parameter value to reduce risks associated with running jobs and transformations in environments that are disconnected from the repository.

| Option                                        | Description                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                |
| --------------------------------------------- | ---------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------- |
| **Cluster Name**                              | <p>Enter the name you want to assign to the cluster connection. <strong>Note:</strong> Valid cluster names may include uppercase and lowercase letters and numbers. In addition, the only special character allowed is a dash (<code>-</code>). To ensure a valid cluster name, do not use any other symbols, punctuation characters, or blank spaces.</p><p>After you create the connection, you can locate this named connection in the <strong>View</strong> tab on the PDI client.</p>                                                                                                                                                                 |
| Current Configured **Driver** and **Version** | Read-only information about the Hadoop distribution on the cluster and its version number. |
| **Site XML files**                            | <p>Enter the location of the <code>site.xml</code> files provided by the cluster administrator. Click <strong>Browse to add file(s)</strong> and browse to the directory containing the <code>site.xml</code> files. Pentaho creates the applicable directory on the machine where the PDI client is located and copies the <code>site.xml</code> files to that directory.</p><p>Alternatively, if you leave this option blank, Pentaho creates the directory for the distribution and version of Hadoop you selected in the <strong>Driver</strong> and <strong>Version</strong> options. You must then copy the <code>site.xml</code> files to that directory.</p> |
| **HDFS**                                      | <p>Provide the following information for the HDFS node:</p><ul><li>Enter the <strong>Hostname</strong> for the HDFS node in the Hadoop cluster.</li><li>Enter the <strong>Port</strong> for the HDFS node in the Hadoop cluster. If the cluster is enabled for high availability (HA), a port number is not needed, and you should clear the port number.</li><li>Enter the <strong>Username</strong> and <strong>Password</strong> for the HDFS node, which are typically provided by the cluster administrator.</li></ul> |
| **JobTracker**                                | <p>If you have a separate JobTracker node, provide the following information:</p><ul><li>Enter the <strong>Hostname</strong> for the JobTracker node in the Hadoop cluster.</li><li>Enter the <strong>Port</strong> for the JobTracker node in the Hadoop cluster.</li></ul> |
| **ZooKeeper**                                 | <p>If you have a ZooKeeper node and want to connect to a ZooKeeper service, provide the following information:</p><ul><li>Enter the <strong>Hostname</strong> for the ZooKeeper node in the Hadoop cluster.</li><li>Enter the <strong>Port</strong> for the ZooKeeper node in the Hadoop cluster.</li></ul> |
| **Oozie**                                     | Enter the Oozie client address in the **Hostname** field. Supply this URL only if you want to connect to the Oozie service.                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                |
| **Kafka**                                     | Enter the host:port pair(s) for the initial connection to the Kafka cluster in the **Bootstrap servers** field. Use a comma-separated list for multiple servers, for example, `host1:port1,host2:port2`. Although there is no need to include all servers used for Kafka, you might want to include more than one in case a server is down.                                                                                                                                                                                                                                                                                                                |

5. Click **Next** and specify the security option for the cluster.

   - If the Hadoop cluster is non-secure, select **None** and click **Next** to [test the connection](Test%20the%20cluster%20connection%20(Add%20Hadoop%20cluster%20connection).md).
   - If the Hadoop cluster is secure, you need to add security to the cluster connection. See [Add security to cluster connections](Secure%20cluster%20connections%20(Add%20Hadoop%20cluster%20connection).md) for instructions.
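
The **Cluster Name** rule and the **Bootstrap servers** format described in the option table can be checked before you enter the values in the dialog box. The following is a minimal Python sketch and is not part of PDI. The helper names `is_valid_cluster_name` and `parse_bootstrap_servers` are hypothetical, and the sketch assumes that "letters" means ASCII letters and that every Kafka entry must carry an explicit numeric port.

```python
import re
from typing import List, Tuple

# Assumption: only ASCII letters, digits, and the dash are accepted,
# matching the Cluster Name rule in the option table.
_CLUSTER_NAME = re.compile(r"[A-Za-z0-9-]+")


def is_valid_cluster_name(name: str) -> bool:
    """Return True if the name has only letters, digits, and dashes."""
    return _CLUSTER_NAME.fullmatch(name) is not None


def parse_bootstrap_servers(value: str) -> List[Tuple[str, int]]:
    """Split a comma-separated host:port list into (host, port) pairs.

    Raises ValueError if an entry is not in host:port form.
    """
    servers = []
    for entry in value.split(","):
        # rpartition splits on the last colon, so the port is the final piece.
        host, sep, port = entry.strip().rpartition(":")
        if not sep or not host or not (port.isascii() and port.isdigit()):
            raise ValueError(f"expected host:port, got {entry!r}")
        servers.append((host, int(port)))
    return servers
```

For example, `is_valid_cluster_name("prod cluster")` returns `False` because of the blank space, and `parse_bootstrap_servers("host1:9092,host2:9092")` returns one pair per server so that you can confirm that more than one server is listed.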
