Docker container deployment of Carte, Pan, and Kitchen
You can use Docker Compose to run Pentaho Data Integration (PDI) command-line tools (Carte, Kitchen, and Pan) within containers in the following supported environments:
On‑premises
Amazon Web Services (AWS)
Google Cloud Platform (GCP)
Microsoft Azure
Download PDI Docker files
To run PDI command-line tools (Carte, Kitchen, and Pan) in containers, you must first download both the .gz file that contains the PDI Docker image and the ZIP file that contains the configuration files for your environment.
Complete the following steps to download the files that you need for running PDI command-line tools in containers:
On the Support Portal home page, sign in using the Pentaho Support username and password provided in your Pentaho Welcome Packet.
In the Pentaho card, click Download. The Downloads page opens.
In the 11.x list, click Pentaho 11.0 GA Release.
Scroll to the bottom of the Pentaho 11.0 GA Release page.
In the file component section, navigate to the Docker Image Configurator/Images directory.
Download the pdi-11.0.0.0-<build number>.tar.gz file.
In the file component section, navigate to the Docker Image Configurator/Environment Config directory.
Download the ZIP file that contains the configuration files for your environment:
aws-11.0.0.0-<build number>.zip
azure-11.0.0.0-<build number>.zip
gcp-11.0.0.0-<build number>.zip
on-prem-11.0.0.0-<build number>.zip
Running PDI tools in containers on premises
Run Pentaho Data Integration (PDI) command-line tools (Carte, Kitchen, and Pan) in containers on an on‑premises host using Docker and Docker Compose.
Prepare to run Carte, Kitchen, and Pan in containers
Before running PDI tools in containers on an on‑premises host, you must complete the following tasks:
Create a Docker account.
Note: For help with Docker and Docker Hub, see the online documentation at https://docs.docker.com/.
On the host, install Docker Engine and Docker Compose.
Download both the .gz file that contains the PDI Docker image and the ZIP file that contains the configuration files for your environment. For instructions on downloading the Docker files, see the previous section, Download PDI Docker files.
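Before Docker Compose can start the tools, the downloaded image archive must be loaded into the local Docker image store. A minimal sketch (the archive name depends on your build number):

```shell
# Load the PDI image archive into the local Docker image store
docker load -i pdi-11.0.0.0-<build number>.tar.gz

# List local images to confirm the image name and tag that were loaded
docker image ls
```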
Procedure
On your host, extract the on-prem-11.0.0.0-<build number>.zip file to a working directory.
In the working directory, navigate to the on-prem-11.0.0.0-<build number>/dist/on-prem/pdi subdirectory.
In a text editor, open the .env file and configure the variables for your environment, including PENTAHO_IMAGE_NAME, which must match the name of the image loaded into Docker from the pdi-11.0.0.0-<build number>.tar.gz file. You can get the correct image name by using the docker image ls command.
Important: You must enter the URL for your Pentaho license.
The contents of the .env file vary slightly for each database and include comments to assist you with editing the file. The following code is an example of the .env file contents used for a PostgreSQL database.
Save and close the .env file.
Add PDI files to one or more of the following subdirectories:
config: Contains Pentaho files, like .kettle and .pentaho files, as well as configuration files, such as .ssh files.
softwareOverride: Contains configuration files that override the default settings in the PDI installation.
solutionFiles: Contains project solution files, including transformations (.ktr), jobs (.kjb), and any related .kettle files.
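The PostgreSQL .env example referenced above is illustrative only; aside from PENTAHO_IMAGE_NAME, which the steps name explicitly, every variable name and value below is an assumption, not the shipped file:

```shell
# Must match an image name shown by `docker image ls` after loading the archive
PENTAHO_IMAGE_NAME=<image name>:<tag>
# URL for your Pentaho license (required; variable name is an assumption)
PENTAHO_LICENSE_URL=<license url>
# PostgreSQL connection settings (illustrative variable names)
DB_HOST=<database host>
DB_PORT=5432
DB_NAME=<database name>
DB_USER=<database user>
DB_PASSWORD=<database password>
```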
The host is prepared for running Carte, Kitchen, and Pan in containers.
Run Carte, Kitchen, and Pan in containers
Before you begin, verify that you have uploaded your transformation (.ktr) files, job (.kjb) files, and any related .kettle files to the on-prem-11.0.0.0-<build number>/dist/on-prem/pdi/solutionFiles subdirectory.
In the on-prem-11.0.0.0-<build number>/dist/on-prem/pdi subdirectory, open a command prompt.
Run one or more of the following PDI tools in containers:
To run the Carte Server in a container, complete the following substeps:
Run the Carte Server using the following command:
To verify that Carte is running, log in to the Carte home page at http://<host>:<port>/kettle/status or the external IP address assigned to the deployment.
(Optional) To execute a transformation (.ktr) or job (.kjb) that you added to the solutionFiles directory, go to one of the following URLs in your browser:
To run Kitchen in a container, use the following command:
To run Pan in a container, use the following command:
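The exact commands ship with the package; assuming the extracted docker-compose.yml defines services named carte, kitchen, and pan (the service names are assumptions), typical invocations would look like:

```shell
# Start the Carte server as a long-running background service
docker compose up -d carte

# Run a Kitchen job to completion in a one-off container
docker compose run --rm kitchen

# Run a Pan transformation to completion in a one-off container
docker compose run --rm pan
```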
Running PDI tools in cloud containers
Run Pentaho Data Integration (PDI) command-line tools (Carte, Kitchen, and Pan) in cloud-based containers. Use a supported cloud platform to deploy, manage, and run the tools with Kubernetes.
Prepare to run Carte, Kitchen, and Pan in containers
To prepare for running PDI command-line tools (Carte, Kitchen, and Pan) in containers, complete the tasks for your environment in the following sections.
Before you begin
Complete the following tasks:
Create a Docker account.
Note: For help with Docker and Docker Hub, see the online documentation at https://docs.docker.com/.
Install kubectl.
Download both the .gz file that contains the PDI Docker image and the ZIP file that contains the configuration files for your environment. For instructions on downloading the Docker files, see the previous section, Download PDI Docker files.
Amazon Web Services tasks
Verify that you have access to a standard Amazon EKS cluster.
Verify that you have an approved S3 CSI approach. (Used for configs, logs, and overrides and required if your package YAMLs reference S3 buckets for configuration or storage paths.)
If you plan to pull Pentaho images from Amazon Elastic Container Registry (ECR), or a mirrored ECR, confirm that the node instance role or IRSA-enabled service account has permission to pull images.
Set up your local kubeconfig so that kubectl can communicate with the cluster by running the following command, replacing <name> and <region>:
Install the AWS CLI.
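The kubeconfig setup referenced above uses the standard AWS CLI subcommand, for example:

```shell
# Merge the EKS cluster credentials into your local kubeconfig
aws eks update-kubeconfig --name <name> --region <region>

# Confirm that kubectl can reach the cluster
kubectl get nodes
```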
Google Cloud Platform tasks
Verify that you have access to a standard mode GKE cluster (Autopilot may have memory constraints).
Install and authenticate to the gcloud CLI.
Verify that you have a GCS bucket for persistent storage (mounted via the GCS FUSE CSI driver) or an alternative persistent storage class.
Microsoft Azure tasks
Verify that you have access to a standard Azure Kubernetes Service (AKS) cluster.
Verify that your Azure account has the required permissions to create resource groups, databases, container registries, storage accounts, namespaces, and Kubernetes services.
If you plan to pull Pentaho images from Azure Container Registry (ACR), or a mirrored ACR, verify that your Azure account is assigned the AcrPull role so it can pull images from ACR.
Install Azure CLI.
Set up your local kubeconfig so that kubectl can communicate with the cluster by running the following command:
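For AKS, the standard Azure CLI subcommand merges the cluster credentials into your local kubeconfig, for example (the resource group and cluster names are placeholders):

```shell
# Merge the AKS cluster credentials into your local kubeconfig
az aks get-credentials --resource-group <resource group> --name <cluster name>

# Confirm that kubectl can reach the cluster
kubectl get nodes
```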
Tag and push PDI image to the cloud
To tag the PDI image and upload it to the cloud, complete the following steps:
Go to the working directory that contains the pdi-11.0.0.0-<build number>.tar file and open a command prompt.
Note: For instructions on downloading the pdi-11.0.0.0-<build number>.tar.gz file, see the previous section, Download PDI Docker files.
(Optional) If you are not logged in, log in to Docker Hub using the docker login command.
Authenticate to your cloud provider.
For AWS, authenticate to ECR by running the following commands, replacing <aws region> and <aws account id>:
For GCP, authenticate to Google Artifact Registry (GAR) with a specific Artifact Registry host by running the following commands, replacing <region> with the value for your region:
For Azure, authenticate to ACR by running the following commands, replacing <registryName> with the name of your registry:
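The standard authentication calls for each registry look like the following (account IDs, regions, and registry names are placeholders):

```shell
# AWS: authenticate Docker to ECR
aws ecr get-login-password --region <aws region> | \
  docker login --username AWS --password-stdin <aws account id>.dkr.ecr.<aws region>.amazonaws.com

# GCP: register Docker credentials for a specific Artifact Registry host
gcloud auth configure-docker <region>-docker.pkg.dev

# Azure: authenticate Docker to ACR
az acr login --name <registryName>
```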
Tag and push the PDI Docker image to your cloud provider.
For AWS, run the following command, replacing <build number>, <aws account id>, and <region> with the values from the downloaded file and your AWS account:
For GCP, run the following command, replacing <build number>, <project id>, and <region> with the values from the downloaded file and your GAR account:
For Azure, run the following commands, replacing <image name>, <registry name>, and <image name on acr> with the values from the downloaded file and your Azure account:
The PDI image for Carte, Kitchen, and Pan is tagged and pushed to your cloud provider.
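The tag-and-push pattern is the same for every provider; an AWS sketch (the local image name and the repository name are assumptions):

```shell
# Tag the loaded image with the full ECR repository URI
docker tag pdi:11.0.0.0-<build number> \
  <aws account id>.dkr.ecr.<region>.amazonaws.com/pdi:11.0.0.0-<build number>

# Push the tagged image to the registry
docker push <aws account id>.dkr.ecr.<region>.amazonaws.com/pdi:11.0.0.0-<build number>
```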
Configure storage in the cloud
To configure storage in the cloud, complete the following steps:
On your local workstation, extract the ZIP file that contains the configuration files for your environment to a temporary working directory:
aws-11.0.0.0-<build number>.zip
gcp-11.0.0.0-<build number>.zip
azure-11.0.0.0-<build number>.zip
Note: You downloaded the ZIP file for your environment in the previous section, Download PDI Docker files.
In your working directory, go to the pdi directory for your environment:
aws-11.0.0.0-<build number>/dist/aws/pdi
gcp-11.0.0.0-<build number>/dist/gcp/pdi
azure-11.0.0.0-<build number>/dist/azure/pdi
In a text editor, edit the volumes.yaml file with the values for your environment.
Save and close the volumes.yaml file.
Create storage for each subdirectory that appears in the pdi directory for your environment.
For AWS, in your EKS cluster, create an S3 bucket for each of the following subdirectories:
config
logs
softwareOverride
solutionFiles
For GCP, in your Google Kubernetes Engine (GKE) cluster, create a GCS bucket for each of the following subdirectories:
config
logs
softwareOverride
solutionFiles
For Azure, in your Azure Kubernetes Service (AKS) cluster, create a file share for each of the following subdirectories:
config
logs
softwareOverride
solutionFiles
Go back to the pdi directory and open a command prompt.
In the command prompt, create persistent volumes and persistent volume claims by running the following command:
Storage is configured so that you can run Pentaho command-line tools (Carte, Kitchen, and Pan) in containers.
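The persistent volume command referenced in the last step follows the usual kubectl pattern, for example:

```shell
# Create the persistent volumes and persistent volume claims
kubectl apply -f volumes.yaml

# Verify that the volumes and claims exist and are bound
kubectl get pv,pvc
```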
Edit YAML files for deploying to the cloud
In your working directory, go to the pdi directory for your environment:
aws-11.0.0.0-<build number>/dist/aws/pdi
gcp-11.0.0.0-<build number>/dist/gcp/pdi
azure-11.0.0.0-<build number>/dist/azure/pdi
In a text editor, edit the carte.yaml, kitchen.yaml, and pan.yaml files for your environment.
For AWS, you must update the following values before deploying command-line tools in containers:
The following code is an example of the .yaml file contents used for Carte when deploying to AWS.
For GCP, you must update the following values before deploying command-line tools in containers:
The following code is an example of the .yaml file contents used for Carte when deploying to GCP.
For Azure, you must update the following values before deploying command-line tools in containers:
The following code is an example of the .yaml file contents used for Carte when deploying to Azure.
In the pan.yaml file, update the file location for the transformation (.ktr) file that you plan to run. The following code example has the file location for the Rounding.ktr sample file:
In the kitchen.yaml file, update the file location for the job (.kjb) file that you plan to run. The following code example has the file location for the Set arguments on a transformation.kjb sample file:
Save and close the carte.yaml, kitchen.yaml, and pan.yaml files for your environment.
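As a rough structural sketch only (every name, image path, port, and claim name below is an assumption, not the shipped file), a carte.yaml typically pairs a Deployment with a LoadBalancer Service:

```yaml
apiVersion: apps/v1
kind: Deployment
metadata:
  name: carte
spec:
  replicas: 1
  selector:
    matchLabels:
      app: carte
  template:
    metadata:
      labels:
        app: carte
    spec:
      containers:
        - name: carte
          image: <registry>/pdi:11.0.0.0-<build number>
          ports:
            - containerPort: 8080   # assumed Carte port
          volumeMounts:
            - name: solution-files
              mountPath: /pentaho/solutionFiles   # assumed mount path
      volumes:
        - name: solution-files
          persistentVolumeClaim:
            claimName: solution-files-pvc   # assumed claim name from volumes.yaml
---
apiVersion: v1
kind: Service
metadata:
  name: carte
spec:
  type: LoadBalancer
  selector:
    app: carte
  ports:
    - port: 8080
      targetPort: 8080
```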
Upload PDI files to the cloud
In your working directory, go to the pdi directory for your environment:
aws-11.0.0.0-<build number>/dist/aws/pdi
gcp-11.0.0.0-<build number>/dist/gcp/pdi
azure-11.0.0.0-<build number>/dist/azure/pdi
Add PDI files to one or more of the following subdirectories:
config: Contains Pentaho files, like .kettle and .pentaho files, as well as configuration files, such as .aws and .ssh files.
softwareOverride: Contains configuration files that override the default settings in the PDI installation.
solutionFiles: Contains project solution files, including transformations (.ktr), jobs (.kjb), and any related .kettle files.
Upload the contents of each subdirectory to the corresponding bucket or file share in your cloud storage.
Run Carte in a cloud container
Before you begin
Verify that you have uploaded your transformation (.ktr) files, job (.kjb) files, and any related .kettle files to the solutionFiles bucket or file share in your cloud storage.
Verify that your carte.yaml file is updated with the values for your environment.
Procedure
In your working directory, go to the pdi directory for your environment and open a command prompt:
aws-11.0.0.0-<build number>/dist/aws/pdi
gcp-11.0.0.0-<build number>/dist/gcp/pdi
azure-11.0.0.0-<build number>/dist/azure/pdi
Run the following commands:
To verify that Carte is running, log in to the Carte home page at http://<host>:<port>/kettle/status or the external IP address assigned to the deployment. You can obtain the URL by running the following command:
(Optional) To execute a transformation (.ktr) or job (.kjb) that you added to the solutionFiles directory and uploaded to your cloud storage, go to one of the following URLs in your browser:
To verify the Carte deployment in a container, open a command prompt and run the following command, replacing <name of carte pod>:
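The deployment and verification commands follow the standard kubectl pattern, for example:

```shell
# Deploy Carte to the cluster
kubectl apply -f carte.yaml

# Find the external IP or hostname assigned to the Carte service
kubectl get service

# Inspect the Carte pod logs
kubectl logs <name of carte pod>
```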
Run Kitchen in a cloud container
Before you begin
Verify that you have uploaded your job (.kjb) files and any related .kettle files to the solutionFiles bucket or file share in your cloud storage.
Verify that your kitchen.yaml file is updated with the correct values for your environment, and that it specifies the correct file location for the job (.kjb) file you plan to run.
Procedure
In your working directory, go to the pdi directory for your environment and open a command prompt:
aws-11.0.0.0-<build number>/dist/aws/pdi
gcp-11.0.0.0-<build number>/dist/gcp/pdi
azure-11.0.0.0-<build number>/dist/azure/pdi
Run the following commands:
To verify the Kitchen deployment in a container, open a command prompt and run the following command, replacing <name of kitchen pod>:
Notes:
To rerun a job with a new parameter, you must delete the existing job, recreate it with the new parameter, and then run it again. You can delete a job by running the following command, replacing <job name> and <namespace>:
If the path to the job changes, you must update it in the kitchen.yaml file and apply the YAML file again.
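The Kitchen commands and the job deletion referenced in the notes follow the standard kubectl pattern, for example:

```shell
# Deploy the Kitchen job to the cluster
kubectl apply -f kitchen.yaml

# Inspect the Kitchen pod logs
kubectl logs <name of kitchen pod>

# Delete a completed job before rerunning it with a new parameter
kubectl delete job <job name> -n <namespace>
```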
Run Pan in a cloud container
Before you begin
Verify that you have uploaded your transformation (.ktr) files and any related .kettle files to the solutionFiles bucket or file share in your cloud storage.
Verify that your pan.yaml file is updated with the correct values for your environment, and that it specifies the correct file location for the transformation (.ktr) file you plan to run.
Procedure
To run Pan in a Docker container, complete the following steps:
In your working directory, go to the pdi directory for your environment and open a command prompt:
aws-11.0.0.0-<build number>/dist/aws/pdi
gcp-11.0.0.0-<build number>/dist/gcp/pdi
azure-11.0.0.0-<build number>/dist/azure/pdi
Run the following commands:
To verify the Pan deployment in a container, open a command prompt and run the following command, replacing <name of pan pod>:
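The Pan commands follow the same kubectl pattern, for example:

```shell
# Deploy the Pan job to the cluster
kubectl apply -f pan.yaml

# Inspect the Pan pod logs
kubectl logs <name of pan pod>
```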