Docker container deployment of Carte, Pan, and Kitchen

You can use Docker Compose to run Pentaho Data Integration (PDI) command-line tools (Carte, Kitchen, and Pan) within containers in the following supported environments:

  • On‑premises

  • Amazon Web Services (AWS)

  • Google Cloud Platform (GCP)

  • Microsoft Azure

Download PDI Docker files

To run PDI command-line tools (Carte, Kitchen, and Pan) in containers, you must first download both the .tar.gz file that contains the PDI Docker image and the ZIP file that contains the configuration files for your environment.

Complete the following steps to download the files that you need for running PDI command-line tools in containers:

  1. On the Support Portal home page, sign in using the Pentaho Support username and password provided in your Pentaho Welcome Packet.

  2. In the Pentaho card, click Download. The Downloads page opens.

  3. In the 11.x list, click Pentaho 11.0 GA Release.

  4. Scroll to the bottom of the Pentaho 11.0 GA Release page.

  5. In the file component section, navigate to the Docker Image Configurator/Images directory.

  6. Download the pdi-11.0.0.0-<build number>.tar.gz file.

  7. In the file component section, navigate to the Docker Image Configurator/Environment Config directory.

  8. Download one of the following ZIP files that contain the configuration files for your environment:

    • aws-11.0.0.0-<build number>.zip

    • azure-11.0.0.0-<build number>.zip

    • gcp-11.0.0.0-<build number>.zip

    • on-prem-11.0.0.0-<build number>.zip

Running PDI tools in containers on premises

Run Pentaho Data Integration (PDI) command-line tools (Carte, Kitchen, and Pan) in containers on an on‑premises host using Docker and Docker Compose.

Prepare to run Carte, Kitchen, and Pan in containers

Before running PDI tools in containers on an on‑premises host, you must complete the following tasks:

  • Create a Docker account.

    Note: For help with Docker and Docker Hub, see the online documentation at https://docs.docker.com/.

  • On the host, install Docker Engine and Docker Compose.

  • Download both the .tar.gz file that contains the PDI Docker image and the ZIP file that contains the configuration files for your environment. For instructions on downloading these files, see the previous section, Download PDI Docker files.


Procedure

  1. On your host, extract the on-prem-11.0.0.0-<build number>.zip to a working directory.

  2. In the working directory, navigate to the on-prem-11.0.0.0-237/dist/on-prem/pdi subdirectory.

  3. In a text editor, open the .env file and configure the variables for your environment, including PENTAHO_IMAGE_NAME, which must match the name of the image loaded into Docker (for example, the image loaded from pdi-11.0.0.0-<build number>.tar.gz). You can get the correct image name by running the docker image ls command.

    Important: You must enter the URL for your Pentaho license.

    The contents of the .env file vary slightly for each database and include comments to assist you with editing the file. The following code is an example of the .env file contents used for a PostgreSQL database.
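As an illustration only, such a file might carry entries along these lines; every variable name except PENTAHO_IMAGE_NAME is an assumption, so rely on the comments in the shipped .env file for the actual names:

```shell
# Name of the loaded PDI image (must match the output of `docker image ls`)
PENTAHO_IMAGE_NAME=pdi:11.0.0.0-<build number>

# URL of your Pentaho license (required) -- variable name is an assumption
PENTAHO_LICENSE_URL=https://example.com/licenses/pentaho.lic

# PostgreSQL connection settings -- variable names are assumptions
DB_HOST=postgres
DB_PORT=5432
```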

  4. Save and close the .env file.

  5. Add PDI files to one or more of the following subdirectories:

    • config: Contains Pentaho files, like .kettle and .pentaho files, as well as configuration files, such as .ssh files.

    • softwareOverride: Contains configuration files that override the default settings in the PDI installation.

    • solutionFiles: Contains project solution files, including transformations (.ktr), jobs (.kjb), and any related .kettle files.

    The host is prepared for running Carte, Kitchen, and Pan in containers.

Run Carte, Kitchen, and Pan in containers

Before you begin, verify that you have uploaded your transformation (.ktr) files, job (.kjb) files, and any related .kettle files to the on-prem-11.0.0.0-237/dist/on-prem/pdi/solutionFiles subdirectory.

  1. In the on-prem-11.0.0.0-237/dist/on-prem/pdi subdirectory, open a command prompt.

  2. Run one or more of the following PDI tools in containers:

    • To run the Carte Server in a container, complete the following substeps:

      1. Run the Carte Server using the following command:

      2. To verify that Carte is running, log into the Carte homepage at http://<host>:<port>/kettle/status or the external IP address assigned to the deployment.

      3. (Optional) To execute a transformation (.ktr) or job (.kjb) that you added to the solutionFiles directory, go to one of the following URLs in your browser:

    • To run Kitchen in a container, use the following command:

    • To run Pan in a container, use the following command:
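Assuming the Compose file defines services named carte, kitchen, and pan (an assumption; check the service names in your docker-compose.yml), these invocations typically take a form like:

```shell
# Start the Carte server in the background (service name assumed)
docker compose up -d carte

# Run Kitchen and Pan as one-off containers (service names assumed)
docker compose run --rm kitchen
docker compose run --rm pan
```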

Running PDI tools in cloud containers

Run Pentaho Data Integration (PDI) command-line tools (Carte, Kitchen, and Pan) in cloud-based containers. Use a supported cloud platform to deploy, manage, and run the tools with Kubernetes.

Prepare to run Carte, Kitchen, and Pan in containers

Before you begin

Before running PDI command-line tools (Carte, Kitchen, and Pan) in cloud containers, complete the following tasks:

  • Create a Docker account.

    Note: For help with Docker and Docker Hub, see the online documentation at https://docs.docker.com/.

  • Install kubectl.

  • Download both the .tar.gz file that contains the PDI Docker image and the ZIP file that contains the configuration files for your environment. For instructions on downloading these files, see the previous section, Download PDI Docker files.

  • Amazon Web Services tasks

    • Verify that you have access to a standard Amazon EKS cluster.

    • Verify that you have an approved S3 CSI driver approach. This is used for configs, logs, and overrides, and is required if your package YAMLs reference S3 buckets for configuration or storage paths.

    • If you plan to pull Pentaho images from Amazon Elastic Container Registry (ECR), or a mirrored ECR, confirm that the node instance role or IRSA-enabled service account has permission to pull images.

    • Install the AWS CLI.

    • Set up your local kubeconfig so that kubectl can communicate with the cluster by running the following command, replacing <name> and <region>:

  • Google Cloud Platform tasks

    • Verify that you have access to a standard mode GKE cluster (Autopilot may have memory constraints).

    • Install and authenticate to gcloud CLI.

    • Verify that you have a GCS bucket for persistent storage (mounted via GCS FUSE CSI driver) or an alternative persistent storage class.

  • Microsoft Azure tasks

    • Verify that you have access to a standard Azure Kubernetes Service (AKS) cluster.

    • Verify that your Azure account has the required permissions to create resource groups, databases, container registries, storage accounts, namespaces, and Kubernetes services.

    • If you plan to pull Pentaho images from Azure Container Registry (ACR), or a mirrored ACR, verify that your Azure account is assigned the AcrPull role so it can pull images from ACR.

    • Install Azure CLI.

    • Set up your local kubeconfig so that kubectl can communicate with the cluster by running the following command:
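The kubeconfig setup steps above use each provider's standard CLI command; cluster names, resource groups, and regions are placeholders:

```shell
# AWS: point kubectl at your EKS cluster
aws eks update-kubeconfig --name <name> --region <region>

# GCP: fetch GKE credentials (serves the same purpose for GKE clusters)
gcloud container clusters get-credentials <cluster name> --region <region>

# Azure: fetch AKS credentials
az aks get-credentials --resource-group <resource group> --name <cluster name>
```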

Tag and push PDI image to the cloud

To tag the PDI image and upload it to the cloud, complete the following steps:

  1. Go to your working directory that contains the pdi-11.0.0.0-<build number>.tar.gz file and open a command prompt.

    Note: For instructions on downloading the pdi-11.0.0.0-<build number>.tar.gz file, see the previous section, Download PDI Docker files.

  2. (Optional) If you are not logged in, log into Docker Hub using the following command: docker login.

  3. Authenticate to your cloud provider.

    • For AWS, authenticate to ECR by running the following commands, replacing <aws region> and <aws account id>:

    • For GCP, authenticate to Google Artifact Registry (GAR) with a specific Artifact Registry host by running the following commands, replacing <region> with the value for your region:

    • For Azure, authenticate to ACR by running the following commands, replacing <registryName> with the name of your registry:
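These authentications use each provider's standard CLI commands:

```shell
# AWS: log Docker in to ECR
aws ecr get-login-password --region <aws region> | \
  docker login --username AWS --password-stdin <aws account id>.dkr.ecr.<aws region>.amazonaws.com

# GCP: register the Artifact Registry host with Docker
gcloud auth configure-docker <region>-docker.pkg.dev

# Azure: log in to ACR
az acr login --name <registryName>
```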

  4. Tag and push the PDI Docker image to your cloud provider.

    • For AWS, run the following command, replacing <build number>, <aws account id>, and <region> with the values from the downloaded file and your AWS account:

    • For GCP, run the following command, replacing <build number>, <project id>, and <region> with the values from the downloaded file and your GCP project:

    • For Azure, run the following commands, replacing <image name>, <registry name>, and <image name on acr> with the values from the downloaded file and your Azure account:

    The PDI image for Carte, Kitchen, and Pan is tagged and pushed to your cloud provider.
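As a sketch of the AWS variant (the local image name and the pdi repository name are assumptions), the fully qualified ECR reference is built from your account ID, region, and build number:

```shell
# Assumed example values; substitute your own
AWS_ACCOUNT_ID=123456789012
REGION=us-east-1
BUILD=237

# ECR references take the form <account>.dkr.ecr.<region>.amazonaws.com/<repo>:<tag>
ECR_IMAGE="${AWS_ACCOUNT_ID}.dkr.ecr.${REGION}.amazonaws.com/pdi:11.0.0.0-${BUILD}"

# The tag-and-push pair would then be:
echo "docker tag pdi:11.0.0.0-${BUILD} ${ECR_IMAGE}"
echo "docker push ${ECR_IMAGE}"
```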

Configure storage in the cloud

To configure storage in the cloud, complete the following steps:

  1. On your local workstation, extract the ZIP file that contains the configuration files for your environment to a temporary working directory.

    • aws-11.0.0.0-<build number>.zip

    • gcp-11.0.0.0-<build number>.zip

    • azure-11.0.0.0-<build number>.zip

    Note: You downloaded the ZIP file for your environment in the previous section, Download PDI Docker files.

  2. In your working directory, go to the pdi directory for your environment:

    • aws-11.0.0.0-<build number>/dist/aws/pdi

    • gcp-11.0.0.0-<build number>/dist/gcp/pdi

    • azure-11.0.0.0-<build number>/dist/azure/pdi

  3. In a text editor, edit the volumes.yaml file with the values for your environment.

  4. Save and close the volumes.yaml file.

  5. Create storage for each subdirectory that appears in the pdi directory for your environment.

    • For AWS, in your EKS cluster, create an S3 bucket for each of the following subdirectories:

      • config

      • logs

      • softwareOverride

      • solutionFiles

    • For GCP, in your Google Kubernetes Engine (GKE) cluster, create a GCS bucket for each of the following subdirectories:

      • config

      • logs

      • softwareOverride

      • solutionFiles

    • For Azure, in your Azure Kubernetes cluster, create a file share for each of the following subdirectories:

      • config

      • logs

      • softwareOverride

      • solutionFiles

  6. Go back to the pdi directory and open a command prompt.

  7. In the command prompt, create persistent volumes and persistent volume claims by running the following command:
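Since the volume definitions live in volumes.yaml, this is presumably the standard kubectl apply (a sketch, not the packaged command):

```shell
kubectl apply -f volumes.yaml
```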

    Storage is configured so that you can run Pentaho command-line tools (Carte, Kitchen, and Pan) in containers.

Edit YAML files for deploying to the cloud

  1. In your working directory, go to the pdi directory for your environment:

    • aws-11.0.0.0-<build number>/dist/aws/pdi

    • gcp-11.0.0.0-<build number>/dist/gcp/pdi

    • azure-11.0.0.0-<build number>/dist/azure/pdi

  2. In a text editor, edit the carte.yaml, kitchen.yaml, and pan.yaml files for your environment.

    • For AWS, you must update the following values before deploying command-line tools in containers:

      The following code is an example of the .yaml file contents used for Carte when deploying to AWS.

    • For GCP, you must update the following values before deploying command-line tools in containers:

      The following code is an example of the .yaml file contents used for Carte when deploying to GCP.

    • For Azure, you must update the following values before deploying command-line tools in containers:

      The following code is an example of the .yaml file contents used for Carte when deploying to Azure.

  3. In the pan.yaml file, update the file location for the transformation (.ktr) file that you plan to run. The following code example has the file location for the Rounding.ktr sample file:

  4. In the kitchen.yaml file, update the file location for the job (.kjb) file that you plan to run. The following code example has the file location for the Set arguments on a transformation.kjb sample file:
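Assuming the solutionFiles storage is mounted at /solutionFiles inside the containers and the YAMLs pass Pan/Kitchen-style -file arguments (both assumptions; keep to the structure of the shipped files), the relevant fragments might resemble:

```yaml
# pan.yaml (hypothetical fragment): point Pan at the transformation
args: ["-file=/solutionFiles/Rounding.ktr"]

# kitchen.yaml (hypothetical fragment): point Kitchen at the job
args: ["-file=/solutionFiles/Set arguments on a transformation.kjb"]
```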

  5. Save and close the carte.yaml, kitchen.yaml, and pan.yaml files for your environment.

Upload PDI files to the cloud

  1. In your working directory, go to the pdi directory for your environment:

    • aws-11.0.0.0-<build number>/dist/aws/pdi

    • gcp-11.0.0.0-<build number>/dist/gcp/pdi

    • azure-11.0.0.0-<build number>/dist/azure/pdi

  2. Add PDI files to one or more of the following subdirectories:

    • config: Contains Pentaho files, like .kettle and .pentaho files, as well as configuration files, such as .aws and .ssh files.

    • softwareOverride: Contains configuration files that override the default settings in the PDI installation.

    • solutionFiles: Contains project solution files, including transformations (.ktr), jobs (.kjb), and any related .kettle files.

  3. Upload the contents of each subdirectory to the corresponding bucket or file share in your cloud storage.
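Each provider's standard CLI can perform these uploads; for example, for the solutionFiles subdirectory (bucket, share, and account names are placeholders):

```shell
# AWS: copy the directory to its S3 bucket
aws s3 cp ./solutionFiles s3://<solutionFiles bucket>/ --recursive

# GCP: copy the directory to its GCS bucket
gcloud storage cp -r ./solutionFiles gs://<solutionFiles bucket>/

# Azure: upload the directory's contents to its file share
az storage file upload-batch --destination <solutionFiles share> \
  --source ./solutionFiles --account-name <storage account>
```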

Run Carte in a cloud container

Before you begin

  • Verify that you have uploaded your transformation (.ktr) files, job (.kjb) files, and any related .kettle files to the solutionFiles bucket or file share in your cloud storage.

  • Verify that your carte.yaml file is updated with the values for your environment.

Procedure

  1. In your working directory, go to the pdi directory for your environment and open a command prompt:

    • aws-11.0.0.0-<build number>/dist/aws/pdi

    • gcp-11.0.0.0-<build number>/dist/gcp/pdi

    • azure-11.0.0.0-<build number>/dist/azure/pdi

  2. Run the following commands:

  3. To verify that Carte is running, log into the Carte homepage at http://<host>:<port>/kettle/status or the external IP address assigned to the deployment. You can obtain the URL by running the following command:

  4. (Optional) To execute a transformation (.ktr) or job (.kjb) that you added to the solutionFiles directory and uploaded to your cloud storage, go to one of the following URLs in your browser:

  5. To verify Carte deployment in a container, open a command prompt and run the following command, replacing <name of carte pod>:
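Assuming standard kubectl usage and that carte.yaml names its service carte (an assumption), steps 2, 3, and 5 would take a form like:

```shell
# Step 2: deploy Carte
kubectl apply -f carte.yaml

# Step 3: find the external IP and port of the Carte service
kubectl get service carte

# Step 5: check the Carte pod's output
kubectl logs <name of carte pod>
```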

Run Kitchen in a cloud container

Before you begin

  • Verify that you have uploaded your job (.kjb) files and any related .kettle files to the solutionFiles bucket or file share in your cloud storage.

  • Verify that your kitchen.yaml file is updated with the correct values for your environment, and that it specifies the correct file location for the job (.kjb) file you plan to run.

Procedure

  1. In your working directory, go to the pdi directory for your environment and open a command prompt:

    • aws-11.0.0.0-<build number>/dist/aws/pdi

    • gcp-11.0.0.0-<build number>/dist/gcp/pdi

    • azure-11.0.0.0-<build number>/dist/azure/pdi

  2. Run the following commands:

  3. To verify Kitchen deployment in a container, open a command prompt and run the following command, replacing <name of kitchen pod>:
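Assuming standard kubectl usage (a sketch, not the packaged commands), steps 2 and 3 would take a form like:

```shell
# Step 2: launch the Kitchen job
kubectl apply -f kitchen.yaml

# Step 3: verify the job by reading the pod's output
kubectl logs <name of kitchen pod>
```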

Notes:

  • To rerun a job with a new parameter, you must delete the existing job, recreate the job with the new parameter, and then run it again. You can delete a job by running the following command, replacing <job name> and <namespace>:

  • If the path to the job changes, you must update it in the kitchen.yaml file and apply the YAML file again.
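The delete step above uses the standard kubectl form:

```shell
kubectl delete job <job name> -n <namespace>
```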

Run Pan in a cloud container

Before you begin

  • Verify that you have uploaded your transformation (.ktr) files and any related .kettle files to the solutionFiles bucket or file share in your cloud storage.

  • Verify that your pan.yaml file is updated with the correct values for your environment, and that it specifies the correct file location for the transformation (.ktr) file you plan to run.

Procedure

To run Pan in a cloud container, complete the following steps:

  1. In your working directory, go to the pdi directory for your environment and open a command prompt:

    • aws-11.0.0.0-<build number>/dist/aws/pdi

    • gcp-11.0.0.0-<build number>/dist/gcp/pdi

    • azure-11.0.0.0-<build number>/dist/azure/pdi

  2. Run the following commands:

  3. To verify Pan deployment in a container, open a command prompt and run the following command, replacing <name of pan pod>:
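Assuming standard kubectl usage (a sketch, not the packaged commands), steps 2 and 3 would take a form like:

```shell
# Step 2: launch the Pan job
kubectl apply -f pan.yaml

# Step 3: verify the job by reading the pod's output
kubectl logs <name of pan pod>
```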
