# Use Carte Clusters

Carte is a lightweight web server for running PDI transformations and jobs remotely.

It receives the transformation or job (as XML) plus the run configuration. It also exposes endpoints to monitor, start, and stop executions.

* [Carte clusters](#carte-clusters)
* [Set up servers](#set-up-servers)
* [Security and advanced server settings](#security-and-advanced-server-settings)
* [Configure the PDI client](#configure-the-pdi-client)
* [Run transformations in a cluster](#run-transformations-in-a-cluster)
* [Schedule and run remotely](#schedule-and-run-remotely)
* [Stop Carte](#stop-carte)

### Carte clusters

Use a Carte cluster to distribute transformation processing across multiple Carte servers.

A cluster includes:

* One **master** node that tracks execution.
* Two or more **slave** nodes that do the work.

You can also run a single Carte instance as a standalone remote execution engine. Define one or more Carte servers in the PDI client (Spoon), then send jobs and transformations to them.

{% hint style="info" %}
You can cluster Pentaho Server for failover. If you use Pentaho Server as the cluster master (dynamic cluster), enable the proxy trusting filter. See [Schedule jobs to run on a remote Carte server](#schedule-jobs-to-run-on-a-remote-carte-server).
{% endhint %}

#### Cluster types

**Static cluster**

Static clusters have a fixed schema.

You define the master and slave nodes at design time.

Static clusters fit smaller, stable environments.

**Dynamic cluster**

Dynamic clusters discover slave nodes at run time.

Slave nodes register themselves with the master. PDI checks every 30 seconds whether each slave is still available.

Dynamic clusters fit cloud-like environments where nodes come and go.

### Set up servers

#### Prerequisites

* Copy required JDBC drivers and PDI plugins from your dev system to each Carte instance.
* If you will run content from a Pentaho Repository, copy `repositories.xml` from your workstation’s `.kettle` directory to the same location on each Carte server.

#### Set up a static cluster (start slave servers)

1. Start each slave server with the host and port you want to expose:

   ```sh
   ./carte.sh 127.0.0.1 8081
   ```
2. Verify each server is reachable from your PDI client.
3. (Optional) Create an init/startup script to start Carte on boot.
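
The reachability check in step 2 can be scripted by requesting the Carte status page. The sketch below assumes `curl` is installed and that the server accepts the `cluster`/`cluster` credentials used in the configuration templates in this guide. The function names are illustrative, not part of PDI.

```sh
# Build the status URL for a Carte server (xml=Y requests XML output).
carte_status_url() {
  printf 'http://%s:%s/kettle/status/?xml=Y\n' "$1" "$2"
}

# Return 0 if the server answers the status request, non-zero otherwise.
# Usage: check_carte HOST PORT [USER] [PASSWORD]
# Assumption: credentials default to cluster/cluster.
check_carte() {
  curl --silent --fail --max-time 5 \
    --user "${3:-cluster}:${4:-cluster}" \
    "$(carte_status_url "$1" "$2")" >/dev/null
}
```

For example, `check_carte 127.0.0.1 8081 && echo "Carte is up"` reports whether the server started in step 1 responds.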

{% hint style="info" %}
When Carte runs embedded in Pentaho Server, configuration is controlled by `slave-server-config.xml` under `.../pentaho-solutions/system/kettle/`. Stop Pentaho Server before editing that file.
{% endhint %}

#### Set up a dynamic cluster

Dynamic clusters use two configuration files:

* `carte-master-config.xml` for the master.
* `carte-slave-config.xml` for each slave.

You can rename the files. Keep the required XML structure and values.

**Configure a Carte master server**

1. Copy required JDBC drivers and plugins to the master host.
2. Create `carte-master-config.xml` using this template:

   ```xml
   <slave_config>
     <!-- On a master server, the slaveserver node describes this Carte instance -->
     <slaveserver>
       <name>Master</name>
       <hostname>yourhostname</hostname>
       <port>9001</port>
       <username>cluster</username>
       <password>cluster</password>
       <master>Y</master>
     </slaveserver>
   </slave_config>
   ```

   The master `<name>` must be unique in the cluster.
3. Start Carte using the master config file:

   ```sh
   ./carte.sh carte-master-config.xml
   ```
4. Verify the master is running.
5. (Optional) Create an init/startup script for boot-time startup.

**Configure Carte slave servers**

1. Ensure the master is running.
2. Copy required JDBC drivers and plugins to each slave host.
3. Create `carte-slave-config.xml` using this template:

   ```xml
   <slave_config>
     <!-- The masters node defines the load-balancing Carte instance(s) managing this slave -->
     <masters>
       <slaveserver>
         <name>Master</name>
         <hostname>yourhostname</hostname>
         <port>9000</port>
         <!-- Uncomment if you want Pentaho Server to act as the load balancer -->
         <!-- <webAppName>pentaho</webAppName> -->
         <username>cluster</username>
         <password>cluster</password>
         <master>Y</master>
       </slaveserver>
     </masters>

     <report_to_masters>Y</report_to_masters>

     <!-- The slaveserver node describes this slave instance -->
     <slaveserver>
       <name>SlaveOne</name>
       <hostname>yourhostname</hostname>
       <port>9001</port>
       <username>cluster</username>
       <password>cluster</password>
       <master>N</master>
     </slaveserver>
   </slave_config>
   ```

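   Each slave `<name>` must be unique in the cluster. The `<hostname>` and `<port>` inside `<masters>` must match the master server's own `<slaveserver>` settings; the values in these templates are placeholders.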
4. (Optional) To use the master’s Kettle properties on a slave, add these tags inside the slave’s `<slaveserver>`:

   ```xml
   <get_properties_from_master>Master</get_properties_from_master>
   <override_existing_properties>Y</override_existing_properties>
   ```
5. Start Carte using the slave config file:

   ```sh
   ./carte.sh carte-slave-config.xml
   ```
6. If you use Pentaho Repository content, copy `repositories.xml` to each slave’s `.kettle` directory.
7. Restart the master and slave servers. Restart Pentaho Server if it participates.
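
After the restart, you may want to confirm that the slaves registered with the master. The sketch below assumes the master exposes the Carte `getSlaves` servlet at `/kettle/getSlaves` and accepts the `cluster`/`cluster` credentials from the templates above. The function names are illustrative.

```sh
# Build the URL that lists the slaves registered with a master.
carte_slaves_url() {
  printf 'http://%s:%s/kettle/getSlaves\n' "$1" "$2"
}

# Print the master's XML list of registered slaves.
# Usage: list_slaves MASTER_HOST MASTER_PORT [USER] [PASSWORD]
# Assumption: credentials default to cluster/cluster.
list_slaves() {
  curl --silent --fail --max-time 5 \
    --user "${3:-cluster}:${4:-cluster}" \
    "$(carte_slaves_url "$1" "$2")"
}
```

For example, `list_slaves yourhostname 9001 | grep SlaveOne` checks whether the slave named in the template appears in the response.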

{% hint style="info" %}
Carte and PDI track object age for transformations and jobs. Objects are purged only when servers are idle. Purge verification runs every 20 seconds.
{% endhint %}

#### Configure schedule and remote execution log cleanup

These settings live in `slave-server-config.xml`.

Stop Pentaho Server before editing this file.

* `max_log_lines`: Max log lines per execution. Use `0` for no limit.
* `max_log_timeout_minutes`: Remove log lines older than this value. Use `0` for no timeout.
* `object_timeout_minutes`: Remove execution entries older than this value. Use `0` for no timeout.

Example:

```xml
<slave_config>
  <max_log_lines>0</max_log_lines>
  <max_log_timeout_minutes>0</max_log_timeout_minutes>
  <object_timeout_minutes>0</object_timeout_minutes>
</slave_config>
```

### Security and advanced server settings

#### Configure Carte servers for SSL

Carte SSL uses the JKS keystore format.

Keep the keystore in a restricted-access directory. Carte runs on Jetty.

For Jetty SSL details, see: <https://wiki.eclipse.org/Jetty/Howto/Configure_SSL>.

1. Stop Carte.
2. Open `carte-master-config.xml`.
3. Add an `<sslConfig>` block inside the master server `<slaveserver>` element with these values:

   * `keyStore` (required): Path to the keystore file.
   * `keyStorePassword` (required): Keystore password.
   * `keyPassword` (optional): Private key password. Omit if it matches `keyStorePassword`.

   Example:

   ```xml
   <sslConfig>
     <keyStore>D:\KEY_STORE\Pentaho</keyStore>
     <keyStorePassword>OBF:...</keyStorePassword>
     <keyPassword>OBF:...</keyPassword>
   </sslConfig>
   ```

   <div data-gb-custom-block data-tag="hint" data-style="info" class="hint hint-info"><p>Use the <code>encr</code> tool in the <code>data-integration</code> directory to obfuscate passwords: <code>encr.bat -carte &#x3C;password></code> (Windows) or <code>encr.sh -carte &#x3C;password></code> (Linux).</p></div>
4. Add the same `<sslConfig>` block to each `carte-slave-config.xml`.
5. Start Carte.
6. Access Carte over HTTPS:

   ```
   https://<host>:<port>/
   ```
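
If you do not have a keystore yet, the JDK `keytool` utility can create one for testing. The sketch below generates a self-signed key pair in JKS format and then checks the HTTPS endpoint with `curl`. The alias `carte`, the validity period, and the function names are assumptions chosen for illustration. Use a certificate from your own certificate authority in production.

```sh
# Create a self-signed key pair in a JKS keystore.
# keytool prompts for the keystore and key passwords.
# Usage: make_carte_keystore /secure/dir/carte.jks carte.example.com
make_carte_keystore() {
  keytool -genkeypair -alias carte -keyalg RSA -keysize 2048 \
    -validity 365 -storetype JKS \
    -dname "CN=$2" -keystore "$1"
}

# Build the HTTPS status URL for a Carte server.
carte_https_status_url() {
  printf 'https://%s:%s/kettle/status/\n' "$1" "$2"
}

# Return 0 if Carte answers over HTTPS.
# --insecure skips certificate validation, which is only appropriate
# for a self-signed test certificate.
# Usage: check_carte_https HOST PORT [USER] [PASSWORD]
check_carte_https() {
  curl --silent --fail --insecure --max-time 5 \
    --user "${3:-cluster}:${4:-cluster}" \
    "$(carte_https_status_url "$1" "$2")" >/dev/null
}
```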

#### Configure Carte servers for JAAS

You can use JAAS for user authentication.

1. Create a JAAS config file (example below) and save it as `carte-ldap.jaas.conf` on the Carte host:

   ```conf
   Kettle {
     org.eclipse.jetty.jaas.spi.LdapLoginModule required
     debug="true"
     contextFactory="com.sun.jndi.ldap.LdapCtxFactory"
     hostname="localhost"
     port="389"
     bindDn="cn=admin,dc=example,dc=com"
     bindPassword="admin"
     authenticationMethod="simple"
     forceBindingLogin="true"
     userBaseDn="ou=People,dc=example,dc=com"
     userRdnAttribute="uid"
     userIdAttribute="uid"
     userPasswordAttribute="userPassword"
     userObjectClass="inetOrgPerson";
   };

   Kettle2 {
     org.eclipse.jetty.jaas.spi.PropertyFileLoginModule required
     debug="true"
     file="/installs/common/carte.users";
   };
   ```

   <div data-gb-custom-block data-tag="hint" data-style="info" class="hint hint-info"><p>Set <code>debug="false"</code> in production environments.</p></div>
2. Add these Java options to `Spoon.bat` (Windows) or `spoon.sh` (Linux), updating the path:

   ```
   -Djava.security.auth.login.config=<install path>/openldap/carte-ldap.jaas.conf -Dloginmodulename=Kettle
   ```
3. Start Carte. Verify the server does not prompt for BASIC authentication.
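
The Java options in step 2 can be assembled with a small helper, which keeps the path and the login module name in one place. This is a sketch: the function name is hypothetical, and the idea that `loginmodulename` selects an entry such as `Kettle2` by name is an assumption based on the two-entry example above.

```sh
# Build the JVM options that point Carte at a JAAS config file.
# $1: path to the JAAS config file
# $2: entry name from that file (defaults to Kettle)
carte_jaas_opts() {
  printf '%s %s\n' \
    "-Djava.security.auth.login.config=$1" \
    "-Dloginmodulename=${2:-Kettle}"
}
```

For example, `carte_jaas_opts /opt/pdi/carte-ldap.jaas.conf Kettle2` prints the options for the property-file entry, with `/opt/pdi` standing in for your install path.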

#### Change Jetty server parameters

Carte uses an embedded Jetty server.

Only change these settings if you need to tune connection handling.

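* `acceptors`: Number of threads dedicated to accepting incoming connections. Keep it at or below the number of CPUs.
* `acceptQueueSize`: Number of connection requests that can queue before the operating system starts rejecting them.
* `lowResourcesMaxIdleTime`: Idle timeout applied to connections while the server is low on resources, so that idle connections close sooner under high load.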

Jetty docs:

* <http://wiki.eclipse.org/Jetty/Howto/Configure_Connectors#Configuration_Options>
* <https://wiki.eclipse.org/Jetty/Howto/High_Load>

**Set Jetty parameters in a Carte config file**

Add this block inside `<slave_config>` in `carte-slave-config.xml`:

```xml
<jetty_options>
  <acceptors>2</acceptors>
  <acceptQueueSize>2</acceptQueueSize>
  <lowResourcesMaxIdleTime>2</lowResourcesMaxIdleTime>
</jetty_options>
```

Adjust values, then save the file.

**Set Jetty parameters in `kettle.properties`**

Set these variables to numeric values:

* `KETTLE_CARTE_JETTY_ACCEPTORS`
* `KETTLE_CARTE_JETTY_ACCEPT_QUEUE_SIZE`
* `KETTLE_CARTE_JETTY_RES_MAX_IDLE_TIME`
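
A small helper can set these variables without creating duplicate entries. The sketch below assumes `kettle.properties` uses plain `KEY=value` lines with no spaces around `=`. The function name and the example values are illustrative.

```sh
# Set KEY=value in a properties file, replacing any existing KEY= line.
# Usage: set_kettle_prop FILE KEY VALUE
set_kettle_prop() {
  file=$1
  key=$2
  value=$3
  touch "$file" || return 1
  tmp=$(mktemp) || return 1
  # Keep every line except an existing definition of this key.
  grep -v "^${key}=" "$file" > "$tmp" || true
  printf '%s=%s\n' "$key" "$value" >> "$tmp"
  # Write back in place so the file keeps its original permissions.
  cat "$tmp" > "$file"
  rm -f "$tmp"
}

# Example (assumes the file lives in the user's .kettle directory):
# set_kettle_prop "$HOME/.kettle/kettle.properties" KETTLE_CARTE_JETTY_ACCEPTORS 2
```

Restart Carte after changing the file so that the new values take effect.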

### Configure the PDI client

#### Initialize slave servers

1. Open a transformation.
2. In **Explorer View**, select the **Slave** tab.
3. Select **New**.
4. Enter the slave server connection details:

   * Server name
   * Hostname or IP address
   * Port (leave blank for port 80)
   * Web App Name (required only for Pentaho Server)
   * User name and password
   * **Is the master**

   <div data-gb-custom-block data-tag="hint" data-style="info" class="hint hint-info"><p>For clustered executions, define one master and the rest as slaves.</p></div>
5. Select **OK**.

#### Create a cluster schema

In **Explorer View**, right-click **Kettle cluster schemas**, then select **New**.

Configure:

* **Schema name**
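* **Port**: First port used for communication between clustered steps on the slave servers. Each additional clustered step on a slave uses the next port number.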
* **Sockets buffer size**
* **Sockets flush interval rows**
* **Sockets data compressed?**
* **Dynamic cluster**: Select this option when a master Carte server manages the slaves and performs failover.
* **Slave Servers**: Add one master and any number of slaves.

### Run transformations in a cluster

* Open the **Run Options** window (toolbar **Run** context menu or `F8`).
* Select a run configuration that runs the transformation in clustered mode.
* To run a clustered transformation from a job, open the **Transformation** job entry, then set **Run this transformation in a clustered mode?** on the **Advanced** tab.
* To assign a cluster to a step, right-click the step, select **Clusters**, then pick a cluster schema.
* When running clustered transformations, enable **Show transformations** to see the generated transformations that run on the cluster.

### Schedule and run remotely

#### Schedule jobs to run on a remote Carte server

These changes are required to schedule a job to run on a remote Carte server.

They are also required if Pentaho Server acts as the load balancer in a dynamic Carte cluster.

1. Stop Pentaho Server and the remote Carte server.
2. Copy `repositories.xml` from your workstation’s `.kettle` directory to the same location on the Carte host.
3. Open `.../tomcat/webapps/pentaho/WEB-INF/web.xml`.
4. In the **Proxy Trusting Filter** section, add the Carte server IP to `TrustedIpAddrs`.
5. Uncomment the proxy trusting filter mappings between the `<!-- begin trust -->` and `<!-- end trust -->` markers.
6. Save `web.xml`.
7. Add `-Dpentaho.repository.client.attemptTrust=true` to the Carte startup script:
   * **Windows (`Carte.bat`)**: add to the `OPT` line.
   * **Linux (`carte.sh`)**: add to the `OPT` variable before `export OPT`.
8. Start the Carte server and Pentaho Server.
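
On Linux, the change in step 7 could look like the lines below. This is a sketch that assumes the script collects its JVM options in an `OPT` variable, as step 7 describes. The surrounding lines vary by PDI version.

```sh
# Append the trust flag to the existing JVM options, then export them.
OPT="${OPT:-} -Dpentaho.repository.client.attemptTrust=true"
export OPT
```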

#### Run transformations and jobs from a repository on the Carte server

Copy `repositories.xml` from the user’s `.kettle` directory to the Carte host’s `$HOME/.kettle` directory.

Carte also looks for `repositories.xml` in the directory where you started Carte.
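
Once `repositories.xml` is in place, a repository transformation can be started over HTTP. The sketch below assumes the Carte `executeTrans` servlet with its `rep`, `user`, `pass`, and `trans` parameters. The function names and all example values are hypothetical. The values are inserted as-is, so they must already be URL-safe, and the repository password travels in the query string.

```sh
# Build an executeTrans URL for a transformation stored in a repository.
# $1 host, $2 port, $3 repository name as defined in repositories.xml,
# $4 repository user, $5 repository password, $6 transformation path
carte_execute_trans_url() {
  printf 'http://%s:%s/kettle/executeTrans/?rep=%s&user=%s&pass=%s&trans=%s\n' \
    "$1" "$2" "$3" "$4" "$5" "$6"
}

# Run the transformation. Carte credentials are read from CARTE_USER and
# CARTE_PASS (assumed to default to cluster/cluster).
run_repo_trans() {
  curl --silent --fail \
    --user "${CARTE_USER:-cluster}:${CARTE_PASS:-cluster}" \
    "$(carte_execute_trans_url "$@")"
}
```

For example, `run_repo_trans localhost 8080 myrepo admin secret /home/admin/my_trans` requests the transformation at that repository path.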

### Stop Carte

You can stop Carte from the command line or from a URL.

#### Stop from the CLI

Arguments:

```
Carte <Interface address> <Port> [-s] [-p <arg>] [-u <arg>]
```

Example:

```
Carte 127.0.0.1 8080 -s -p amidala4ever -u dvader
```

Options:

* `-h, --help`: Help text.
* `-s, --stop`: Stop the running Carte server.
* `-u, --username <arg>`: Admin user name.
* `-p, --password <arg>`: Admin password.

#### Stop from a URL

```
http://localhost:8080/kettle/stopCarte
```
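
From a script, the stop URL can be requested with `curl`. The sketch below assumes the server requires the same admin credentials as the CLI options above. The function names are illustrative.

```sh
# Build the stop URL for a Carte server.
carte_stop_url() {
  printf 'http://%s:%s/kettle/stopCarte\n' "$1" "$2"
}

# Ask a running Carte server to shut down.
# Usage: stop_carte HOST PORT USER PASSWORD
stop_carte() {
  curl --silent --fail --max-time 10 \
    --user "$3:$4" \
    "$(carte_stop_url "$1" "$2")"
}
```

For example, `stop_carte 127.0.0.1 8080 dvader amidala4ever` targets the same server as the CLI example.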


