Use Carte Clusters

Carte is a lightweight web server for running PDI transformations and jobs remotely.

It receives the transformation or job (as XML) plus the run configuration. It also exposes endpoints to monitor, start, and stop executions.
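As a rough illustration of the monitoring side, the sketch below builds a status URL for a Carte server and prepares an authenticated request without sending it. The /kettle/status/ path is Carte's status servlet. The host, port, credentials, and helper names are assumptions made for this example.

```python
import base64
import urllib.parse
import urllib.request


def carte_url(host, port, path, **params):
    """Build a URL for a Carte servlet such as /kettle/status/."""
    query = urllib.parse.urlencode(params)
    base = f"http://{host}:{port}{path}"
    return f"{base}?{query}" if query else base


def carte_request(url, username, password):
    """Prepare (but do not send) a request carrying HTTP Basic credentials."""
    token = base64.b64encode(f"{username}:{password}".encode()).decode()
    return urllib.request.Request(url, headers={"Authorization": f"Basic {token}"})


# Hypothetical server and example credentials; replace with your own.
status_url = carte_url("localhost", 8081, "/kettle/status/", xml="Y")
request = carte_request(status_url, "cluster", "cluster")
print(status_url)

# To query a running server:
# with urllib.request.urlopen(request, timeout=10) as reply:
#     print(reply.read().decode())
```

The request is only built here, so the snippet runs without a Carte server. Uncomment the last lines to fetch the status XML from a live instance.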

Carte clusters

Use a Carte cluster to distribute transformation processing across multiple Carte servers.

A cluster includes:

  • One master node that tracks execution.

  • Two or more slave nodes that do the work.

You can also run a single Carte instance as a standalone remote execution engine. Define one or more Carte servers in the PDI client (Spoon), then send jobs and transformations to them.

You can cluster Pentaho Server for failover. If you use Pentaho Server as the cluster master (dynamic cluster), enable the proxy trusting filter. See Schedule jobs to run on a remote Carte server.

Cluster types

Static cluster

Static clusters have a fixed schema.

You define the master and slave nodes at design time.

Static clusters fit smaller, stable environments.

Dynamic cluster

Dynamic clusters discover slave nodes at run time.

Slave nodes are registered with the master. PDI monitors slaves every 30 seconds to see if they are available.

Dynamic clusters fit cloud-like environments where nodes come and go.

Set up servers

Prerequisites

  • Copy required JDBC drivers and PDI plugins from your dev system to each Carte instance.

  • If you will run content from a Pentaho Repository, copy repositories.xml from your workstation’s .kettle directory to the same location on each Carte server.

Set up a static cluster (start slave servers)

  1. Start each slave server with the host and port you want to expose, for example carte.sh 127.0.0.1 8081 on Linux or Carte.bat 127.0.0.1 8081 on Windows.

  2. Verify each server is reachable from your PDI client.

  3. (Optional) Create an init/startup script to start Carte on boot.
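The steps above can be sketched as a small helper that prints the start command for each slave and checks whether a started server accepts TCP connections. The slave addresses and function names are hypothetical, and the reachability check only tests the port. It does not confirm that Carte itself is healthy.

```python
import socket


def carte_start_command(host, port, script="./carte.sh"):
    """Argument list for starting Carte on the given interface and port."""
    return [script, host, str(port)]


def is_reachable(host, port, timeout=3.0):
    """Return True if a TCP connection to host:port succeeds."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        return False


# Hypothetical slave hosts and ports; replace with your own.
slaves = [("192.168.1.221", 8081), ("192.168.1.222", 8081)]
for host, port in slaves:
    print(" ".join(carte_start_command(host, port)))
    # After starting Carte on that host, check it from the PDI client machine:
    # print(host, port, is_reachable(host, port))
```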

When Carte runs embedded in Pentaho Server, configuration is controlled by slave-server-config.xml under .../pentaho-solutions/system/kettle/. Stop Pentaho Server before editing that file.

Set up a dynamic cluster

Dynamic clusters use two configuration files:

  • carte-master-config.xml for the master.

  • carte-slave-config.xml for each slave.

You can rename the files. Keep the required XML structure and values.

Configure a Carte master server

  1. Copy required JDBC drivers and plugins to the master host.

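  2. Create carte-master-config.xml. Inside the root <slave_config> element, define one <slaveserver> with <name>, <hostname>, <port>, <username>, <password>, and <master>Y</master>.

    The master <name> must be unique in the cluster.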

  3. Start Carte using the master config file, for example carte.sh carte-master-config.xml.

  4. Verify the master is running.

  5. (Optional) Create an init/startup script for boot-time startup.
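For reference, a master configuration could be generated with a sketch like the following. The element layout (a <slaveserver> inside <slave_config> with <master> set to Y) follows the sample configuration files shipped with PDI as understood here. Compare the output with the template in your installation. The host name, port, and credentials are placeholders.

```python
import xml.etree.ElementTree as ET


def slaveserver(name, hostname, port, username, password, master):
    """Build a <slaveserver> element describing one Carte instance."""
    node = ET.Element("slaveserver")
    for tag, value in [
        ("name", name),
        ("hostname", hostname),
        ("port", str(port)),
        ("username", username),
        ("password", password),
        ("master", "Y" if master else "N"),
    ]:
        ET.SubElement(node, tag).text = value
    return node


def master_config(name, hostname, port, username, password):
    """Build the <slave_config> document for a master server."""
    root = ET.Element("slave_config")
    root.append(slaveserver(name, hostname, port, username, password, master=True))
    return root


# Placeholder values; the name must be unique in the cluster.
root = master_config("Master", "localhost", 9001, "cluster", "cluster")
ET.indent(root)  # pretty-print; requires Python 3.9+
print(ET.tostring(root, encoding="unicode"))
```

Redirect the printed XML to carte-master-config.xml, or write it with ET.ElementTree(root).write(...), before starting Carte with that file.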

Configure Carte slave servers

  1. Ensure the master is running.

  2. Copy required JDBC drivers and plugins to each slave host.

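  3. Create carte-slave-config.xml. Inside <slave_config>, list the master's <slaveserver> under <masters>, set <report_to_masters> to Y, and define this slave in its own <slaveserver> with <master>N</master>.

    Each slave <name> must be unique in the cluster.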

  4. (Optional) To use the master’s Kettle properties on a slave, add <get_properties_from_master> (set to the master's name) and <override_existing_properties> (set to Y) inside the slave’s <slaveserver>.

  5. Start Carte using the slave config file, for example carte.sh carte-slave-config.xml.

  6. If you use Pentaho Repository content, copy repositories.xml to each slave’s .kettle directory.

  7. Restart the master and slave servers. Restart Pentaho Server if it participates.
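The slave configuration described in these steps can be sketched the same way. The element names used here (masters, report_to_masters, get_properties_from_master, override_existing_properties) are taken from the PDI sample files as understood here. Treat them as assumptions to check against your version. All host names, ports, and credentials are placeholders.

```python
import xml.etree.ElementTree as ET


def server_node(name, hostname, port, username, password, is_master):
    """Build one <slaveserver> element."""
    node = ET.Element("slaveserver")
    fields = {"name": name, "hostname": hostname, "port": str(port),
              "username": username, "password": password,
              "master": "Y" if is_master else "N"}
    for tag, value in fields.items():
        ET.SubElement(node, tag).text = value
    return node


def slave_config(master, slave, inherit_properties=False):
    """Build <slave_config> for a slave that reports to one master.

    master and slave are dicts with name, hostname, port, username, password.
    """
    root = ET.Element("slave_config")
    masters = ET.SubElement(root, "masters")
    masters.append(server_node(**master, is_master=True))
    ET.SubElement(root, "report_to_masters").text = "Y"
    node = server_node(**slave, is_master=False)
    if inherit_properties:
        # Optional step: take Kettle properties from the named master.
        ET.SubElement(node, "get_properties_from_master").text = master["name"]
        ET.SubElement(node, "override_existing_properties").text = "Y"
    root.append(node)
    return root


master = dict(name="Master", hostname="master.example.com", port=9000,
              username="cluster", password="cluster")
slave = dict(name="SlaveOne", hostname="slave1.example.com", port=9001,
             username="cluster", password="cluster")
root = slave_config(master, slave, inherit_properties=True)
ET.indent(root)  # pretty-print; requires Python 3.9+
print(ET.tostring(root, encoding="unicode"))
```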

Carte and PDI track object age for transformations and jobs. Objects are purged only when servers are idle. Purge verification runs every 20 seconds.

Configure schedule and remote execution log cleanup

These settings live in slave-server-config.xml.

Stop Pentaho Server before editing this file.

  • max_log_lines: Max log lines per execution. Use 0 for no limit.

  • max_log_timeout_minutes: Remove log lines older than this value. Use 0 for no timeout.

  • object_timeout_minutes: Remove execution entries older than this value. Use 0 for no timeout.

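Example: to keep at most 10,000 log lines per execution and drop log lines and execution entries after one day, set max_log_lines to 10000 and both timeout values to 1440.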
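A minimal sketch of applying these three settings programmatically is shown below. It assumes the elements sit directly under <slave_config>. The EXISTING string is a stand-in for the real file, and the values (10000 lines, 1440 minutes) are illustrative only.

```python
import xml.etree.ElementTree as ET

# Stand-in for the contents of slave-server-config.xml; a real file holds more.
EXISTING = "<slave_config><max_log_lines>0</max_log_lines></slave_config>"


def set_cleanup(root, max_log_lines, max_log_timeout_minutes, object_timeout_minutes):
    """Create or update the three cleanup elements under <slave_config>."""
    settings = {
        "max_log_lines": max_log_lines,
        "max_log_timeout_minutes": max_log_timeout_minutes,
        "object_timeout_minutes": object_timeout_minutes,
    }
    for tag, value in settings.items():
        if value < 0:
            raise ValueError(f"{tag} must be 0 (no limit) or a positive number")
        node = root.find(tag)
        if node is None:
            node = ET.SubElement(root, tag)
        node.text = str(value)
    return root


root = set_cleanup(ET.fromstring(EXISTING), 10000, 1440, 1440)
ET.indent(root)  # pretty-print; requires Python 3.9+
print(ET.tostring(root, encoding="unicode"))
```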

Security and advanced server settings

Configure Carte servers for SSL

Carte SSL uses the JKS keystore format.

Keep the keystore in a restricted-access directory. Carte runs on Jetty.

For Jetty SSL details, see: https://wiki.eclipse.org/Jetty/Howto/Configure_SSL

  1. Stop Carte.

  2. Open carte-master-config.xml.

  3. Add an <sslConfig> element with these values inside the master server <slaveserver>:

    • keyStore (required): Path to the keystore file.

    • keyStorePassword (required): Keystore password.

    • keyPassword (optional): Private key password. Omit if it matches keyStorePassword.

    Use the encr tool in the data-integration directory to obfuscate passwords: encr.bat -carte <password> (Windows) or encr.sh -carte <password> (Linux).

  4. Add the same <sslConfig> block to each carte-slave-config.xml.

  5. Start Carte.

  6. Access Carte over HTTPS, for example https://localhost:8081/kettle/status/ (adjust the host and port to match your server).
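The <sslConfig> block from step 3 might be assembled as in this sketch. The keystore path and password string are placeholders, and the helper name is hypothetical. In practice the password would be the output of the encr tool.

```python
import xml.etree.ElementTree as ET


def ssl_config(key_store, key_store_password, key_password=None):
    """Build an <sslConfig> element; keyPassword is omitted when not given."""
    node = ET.Element("sslConfig")
    ET.SubElement(node, "keyStore").text = key_store
    ET.SubElement(node, "keyStorePassword").text = key_store_password
    if key_password is not None:
        ET.SubElement(node, "keyPassword").text = key_password
    return node


# Minimal <slaveserver> stand-in; a real one also carries hostname, port, etc.
server = ET.Element("slaveserver")
ET.SubElement(server, "name").text = "Master"
server.append(ssl_config("/path/to/keystore.jks", "REPLACE_WITH_ENCR_OUTPUT"))
ET.indent(server)  # pretty-print; requires Python 3.9+
print(ET.tostring(server, encoding="unicode"))
```

Leaving key_password unset mirrors the rule above: the private key password can be omitted when it matches the keystore password.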

Configure Carte servers for JAAS

You can use JAAS for user authentication.

  1. Create a JAAS config file and save it as carte-ldap.jaas.conf on the Carte host.

    Set debug="false" in production environments.

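  2. Add these Java options to Spoon.bat (Windows) or spoon.sh (Linux), updating the path: -Djava.security.auth.login.config=<path>/carte-ldap.jaas.conf and -Dloginmodulename=<login module name defined in the JAAS config file>.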

  3. Start Carte. Verify the server does not prompt for BASIC authentication.

Change Jetty server parameters

Carte uses an embedded Jetty server.

Only change these settings if you need to tune connection handling.

  • acceptors: Threads dedicated to accepting connections. Keep it at or below CPU count.

  • acceptQueueSize: Backlog size before the OS starts rejecting connections.

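  • lowResourcesMaxIdleTime: Maximum idle time, in milliseconds, applied to connections when the server is low on resources. A lower value closes idle connections faster under high load.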

Set Jetty parameters in a Carte config file

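Add a <jetty_options> block inside <slave_config> in carte-slave-config.xml, with one child element per setting: <acceptors>, <acceptQueueSize>, and <lowResourcesMaxIdleTime>.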

Adjust values, then save the file.

Set Jetty parameters in kettle.properties

Set these variables to numeric values:

  • KETTLE_CARTE_JETTY_ACCEPTORS

  • KETTLE_CARTE_JETTY_ACCEPT_QUEUE_SIZE

  • KETTLE_CARTE_JETTY_RES_MAX_IDLE_TIME
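As an illustration, the following sketch renders the three variables as kettle.properties lines and warns when the acceptor count exceeds the CPU count, per the guidance above. The numeric values are arbitrary examples, not recommendations. Rejecting negative numbers is an assumption of this sketch.

```python
import os


def jetty_properties(acceptors, accept_queue_size, res_max_idle_time):
    """Render the Carte Jetty variables as kettle.properties lines."""
    values = {
        "KETTLE_CARTE_JETTY_ACCEPTORS": acceptors,
        "KETTLE_CARTE_JETTY_ACCEPT_QUEUE_SIZE": accept_queue_size,
        "KETTLE_CARTE_JETTY_RES_MAX_IDLE_TIME": res_max_idle_time,
    }
    for key, value in values.items():
        if not isinstance(value, int) or value < 0:
            raise ValueError(f"{key} must be a non-negative integer")
    cpu_count = os.cpu_count() or 1
    if acceptors > cpu_count:
        print(f"warning: {acceptors} acceptors exceeds {cpu_count} CPUs")
    return "\n".join(f"{key}={value}" for key, value in values.items())


# Example values only; tune them for your own workload.
print(jetty_properties(1, 5000, 1000))
```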

Configure the PDI client

Initialize slave servers

  1. Open a transformation.

  2. In Explorer View, select the Slave tab.

  3. Select New.

  4. Enter the slave server connection details:

    • Server name

    • Hostname or IP address

    • Port (leave blank for port 80)

    • Web App Name (required only for Pentaho Server)

    • User name and password

    • Is the master

    For clustered executions, define one master and the rest as slaves.

  5. Select OK.

Create a cluster schema

In Explorer View, right-click Kettle cluster schemas, then select New.

Configure:

  • Schema name

  • Port: Starting port from which ports are numbered for the slave servers. Each additional clustered step running on a slave uses one more port.

  • Sockets buffer size

  • Sockets flush interval rows

  • Sockets data compressed?

  • Dynamic cluster: Enable if a master Carte server performs failover.

  • Slave Servers: Add one master and any number of slaves.

Run transformations in a cluster

  • Open the Run Options window (toolbar Run context menu or F8).

  • Select a run configuration that runs the transformation in clustered mode.

  • To run a clustered transformation from a job, open the Transformation job entry, then set Run this transformation in a clustered mode? on the Advanced tab.

  • To assign a cluster to a step, right-click the step, select Clusters, then pick a cluster schema.

  • When running clustered transformations, enable Show transformations to see the generated transformations that run on the cluster.

Schedule and run remotely

Schedule jobs to run on a remote Carte server

These changes are required to schedule a job to run on a remote Carte server.

They are also required if Pentaho Server acts as the load balancer in a dynamic Carte cluster.

  1. Stop Pentaho Server and the remote Carte server.

  2. Copy repositories.xml from your workstation’s .kettle directory to the same location on the Carte host.

  3. Open .../tomcat/webapps/pentaho/WEB-INF/web.xml.

  4. In the Proxy Trusting Filter section, add the Carte server IP to TrustedIpAddrs.

  5. Uncomment the proxy trusting filter mappings between the <!-- begin trust --> and <!-- end trust --> markers.

  6. Save web.xml.

  7. Add -Dpentaho.repository.client.attemptTrust=true to the Carte startup script:

    • Windows (Carte.bat): add to the OPT line.

    • Linux (Carte.sh): add to the OPT variable before export OPT.

  8. Start the Carte server and Pentaho Server.
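Steps 4 and 7 both amount to appending a value to an existing setting without duplicating it. The sketch below shows that logic on plain strings. The sample TrustedIpAddrs value, IP address, and existing OPT content are hypothetical, and the helpers do not edit web.xml or the startup script themselves.

```python
def add_trusted_ip(param_value, ip):
    """Append ip to a comma-separated TrustedIpAddrs value if it is absent."""
    entries = [entry.strip() for entry in param_value.split(",") if entry.strip()]
    if ip not in entries:
        entries.append(ip)
    return ",".join(entries)


def add_jvm_option(opt, option):
    """Append a JVM option to an OPT string unless it is already present."""
    if option in opt.split():
        return opt
    return f"{opt} {option}".strip()


TRUST_FLAG = "-Dpentaho.repository.client.attemptTrust=true"

# Hypothetical starting values for illustration.
print(add_trusted_ip("127.0.0.1", "192.168.1.221"))
print(add_jvm_option("-Xmx2048m", TRUST_FLAG))
```

Both helpers are idempotent, so running them again on their own output leaves the value unchanged.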

Run transformations and jobs from a repository on the Carte server

Copy repositories.xml from the user’s .kettle directory to the Carte host’s $HOME/.kettle directory.

Carte also looks for repositories.xml in the directory where you started Carte.
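The copy step could be scripted along these lines. This sketch assumes the target .kettle directory is reachable as a local or mounted path. Between separate hosts you would transfer the file with a tool such as scp instead. The function name and the temporary directories in the demonstration are illustrative.

```python
import shutil
import tempfile
from pathlib import Path


def install_repositories_xml(source_kettle_dir, target_kettle_dir):
    """Copy repositories.xml from one .kettle directory into another."""
    source = Path(source_kettle_dir) / "repositories.xml"
    if not source.is_file():
        raise FileNotFoundError(source)
    target_dir = Path(target_kettle_dir)
    target_dir.mkdir(parents=True, exist_ok=True)
    return Path(shutil.copy2(source, target_dir / "repositories.xml"))


# Demonstration with temporary directories standing in for the two .kettle folders.
with tempfile.TemporaryDirectory() as tmp:
    workstation = Path(tmp) / "workstation" / ".kettle"
    carte_home = Path(tmp) / "carte" / ".kettle"
    workstation.mkdir(parents=True)
    (workstation / "repositories.xml").write_text("<repositories/>")
    copied = install_repositories_xml(workstation, carte_home)
    copied_text = copied.read_text()
    print(copied.name)
```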

Stop Carte

You can stop Carte from the command line or from a URL.

Stop from the CLI

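Arguments: the interface address and port of the running Carte server, followed by the options below.

Example: carte.sh 127.0.0.1 8080 -s -u <username> -p <password>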

Options:

  • -h, --help: Help text.

  • -s, --stop: Stop the running Carte server.

  • -u, --username <arg>: Admin user name.

  • -p, --password <arg>: Admin password.

Stop from a URL
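Carte exposes a stop servlet at /kettle/stopCarte. As a hedged sketch, the code below builds that URL and an opener that answers HTTP Basic authentication. The host, port, credentials, and helper names are assumptions. The actual request is left commented out because it shuts the server down.

```python
import urllib.request


def stop_carte_url(host, port, use_https=False):
    """URL of the Carte stop servlet on the given server."""
    scheme = "https" if use_https else "http"
    return f"{scheme}://{host}:{port}/kettle/stopCarte"


def authenticated_opener(url, username, password):
    """Opener that answers HTTP Basic challenges for the given URL."""
    passwords = urllib.request.HTTPPasswordMgrWithDefaultRealm()
    passwords.add_password(None, url, username, password)
    handler = urllib.request.HTTPBasicAuthHandler(passwords)
    return urllib.request.build_opener(handler)


# Hypothetical server and example credentials; replace with your own.
url = stop_carte_url("localhost", 8080)
opener = authenticated_opener(url, "cluster", "cluster")
print(url)

# Sending the request stops the server, so it is left commented out:
# opener.open(url, timeout=10).read()
```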
