Troubleshooting Pentaho Data Catalog

The Pentaho Data Catalog log files contain information that can help you determine the root cause of error messages you might see. Refer to the following topics for information on how to resolve the issues causing the error messages.

Low disk space message

If you see a Low disk space message from Pentaho Data Catalog while loading images into the Docker repository, you can resolve this issue by moving the Docker root directory to a directory with more free space and linking it back to its original location.

Important: The other directory should have at least 100 GB of free space.
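
You can check how much free space a candidate directory's file system has before moving the Docker root. For example, with the standard df utility:

    df -h <dir with min 100 GB free>

The Avail column shows the free space remaining on that file system.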

Use the following steps to resolve this issue:

  1. Enter the following commands to link the /var/lib/docker directory to a directory with at least 100 GB of free space.

    Note: In this example, the directory with at least 100 GB of free space is <dir with min 100 GB free>. You should replace <dir with min 100 GB free> in the command with the full path to your directory with a minimum of 100 GB of free space.

    sudo systemctl stop docker
    sudo mv /var/lib/docker <dir with min 100 GB free>
    sudo ln -s <dir with min 100 GB free> /var/lib/docker
    sudo systemctl start docker
  2. Repeat the action that produced the Low disk space message.

The action should succeed without producing a Low disk space message.
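
To confirm that the relocation took effect, you can check that /var/lib/docker is now a symbolic link pointing to the new directory:

    ls -ld /var/lib/docker

The output should show /var/lib/docker followed by an arrow (->) and the directory you chose.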

The service "um-alpine-init" didn't complete successfully message

If you see the message service "um-alpine-init" didn't complete successfully when using Keycloak with Pentaho Data Catalog, there are two possible causes.

When you use the Keycloak identity and access management (IAM) tool for user authentication with Pentaho Data Catalog, the um-alpine-init service checks whether Keycloak is up and running. To determine the cause of the service "um-alpine-init" didn't complete successfully message, check the log files for other messages that appear alongside it. Typically, either the GLOBAL_SERVER_HOST_NAME variable or the security certificates need to be updated.
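
As a quick manual check, you can also test whether the host configured in GLOBAL_SERVER_HOST_NAME is reachable from the PDC server. The following command is a sketch; it assumes Keycloak is served over HTTPS on that host and uses the -k flag because the certificates might be self-signed:

    curl -vk https://<GLOBAL_SERVER_HOST_NAME>/

If the connection is refused or times out, the hostname or certificates probably need to be updated, as described in the steps below.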

Use the following steps on the PDC server to determine the cause of the message and resolve the issue.

  1. Enter the following command to search the logs for the message about the um-alpine-init service:

    ./pdc.sh logs um-alpine-init

    When you find the service "um-alpine-init" didn't complete successfully message, the steps to follow depend on the other messages that appear with it. Use whichever of the following optional steps applies.

  2. (Optional) If you see the message Max retries reached. Exiting…, this means that Data Catalog cannot connect to Keycloak. Use the following steps on the PDC server to establish a connection to Keycloak:

    1. Edit the conf/.env file to update the GLOBAL_SERVER_HOST_NAME variable with your hostname or IP address, as in the following example:

      GLOBAL_SERVER_HOST_NAME="myhost.pdc.eng.example.com"

    2. Enter the following command to restart PDC services:

      ./pdc.sh restart

    PDC should start up and run without an error message.
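
    To confirm the fix, you can search the um-alpine-init logs again and verify that the retry message is gone. For example:

      ./pdc.sh logs um-alpine-init | grep -i "max retries"

    If grep returns no output, Data Catalog connected to Keycloak successfully.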

  3. (Optional) If you see the message Update TokenResp error 401, use the following steps:

    1. Use the following command to look for "um-css-admin-api" in the logs:

      ./pdc.sh logs um-css-admin-api

      Sample output:

      um-css-admin-api-1  | [2024-03-28T16:56:27.558Z] [css-admin-api : css-admin-api] - [error]: Failed to fetch public key self-signed certificate
      um-css-admin-api-1  | Error: self-signed certificate
      um-css-admin-api-1  |     at TLSSocket.onConnectSecure (node:_tls_wrap:1659:34)
      um-css-admin-api-1  |     at TLSSocket.emit (node:events:517:28)
      um-css-admin-api-1  |     at TLSSocket._finishInit (node:_tls_wrap:1070:8)
      um-css-admin-api-1  |     at ssl.onhandshakedone (node:_tls_wrap:856:12)
      um-css-admin-api-1  |     at TLSWrap.callbackTrampoline (node:internal/async_hooks:128:17) {
      um-css-admin-api-1  |   code: 'DEPTH_ZERO_SELF_SIGNED_CERT'
      um-css-admin-api-1  | }

      This output includes the message Failed to fetch public key self-signed certificate, which points to an error with the PDC self-signed certificates.

    2. Stop PDC services by entering the following command:

      ./pdc.sh stop

    3. Edit the conf/.env file to change the GLOBAL_SERVER_HOST_NAME variable to the fully qualified domain name (FQDN) for the host server, as in the following example:

      GLOBAL_SERVER_HOST_NAME="myhost.pdc.eng.example.com"

    4. Remove all certificates by entering the following command:

      rm -rf conf/{https,extra-certs,mongodb}

    5. Restart PDC services by entering the following command:

      ./pdc.sh restart

      The server generates new self-signed certificates based on the FQDN that you provided.
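
If you want to confirm that the regenerated certificate was issued for your FQDN, you can inspect it with openssl. The certificate file name under conf/https depends on your installation, so the path below is a placeholder:

    openssl x509 -in conf/https/<certificate file> -noout -subject -ext subjectAltName

The subject or subject alternative name should contain the FQDN you set in GLOBAL_SERVER_HOST_NAME.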

opensearch-cluster-init service fails to start on existing deployment

The opensearch-cluster-init service may fail to start when you run ./pdc.sh up on an existing Pentaho Data Catalog deployment. To identify the cause:

  1. Log in to the deployment server where Pentaho Data Catalog is running.

  2. Run the following command to get the container ID of the OpenSearch service:

    docker ps | grep opensearch

  3. Check the OpenSearch logs:

    docker logs <opensearch-container-id>

Possible causes include a crashed process that left a lock file behind, insufficient disk space, high disk I/O, excessive CPU or memory load, and permission or access errors. Use the following procedures to identify and resolve the issue based on the cause reported in the OpenSearch logs.
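
Before working through the cases below, a few standard Linux checks can help narrow down the cause. The following is a minimal triage sketch using common utilities:

    # Check free disk space on the Docker data directory
    df -h /var/lib/docker
    # Check memory usage and system load
    free -h
    uptime
    # Check live resource usage of the running containers
    docker stats --no-stream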

Case 1: Lock file prevents startup

If the OpenSearch logs show a message about being unable to acquire the node lock (for example, "failed to obtain node locks"), OpenSearch probably crashed previously without removing its lock file, which prevents a new process from accessing the data directory.
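
For example, you can confirm the lock error directly from the container logs, using the container ID returned by the docker ps command in the previous section:

    docker logs <opensearch-container-id> 2>&1 | grep -i "failed to obtain node locks"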

Perform the following steps to fix the issue:

  1. Confirm that the logs mention a lock file (node.lock) preventing startup.

  2. Stop the services:

    ./pdc.sh stop
  3. Remove the OpenSearch lock file:

    sudo rm -rf /var/lib/docker/volumes/pdc_opensearch_data/_data/nodes/0/node.lock
  4. Restart the services:

    ./pdc.sh start
  5. Verify that the OpenSearch service starts successfully:

    docker ps | grep opensearch
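
If you prefer a more focused check than grepping the full docker ps output, docker ps also supports filtering and formatting. For example:

    docker ps --filter "name=opensearch" --format "{{.Names}}: {{.Status}}"

The status should report the container as up (and healthy, if a health check is configured).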

Case 2: Unable to connect to OpenSearch during upgrade from 10.2.1 to 10.2.5

When upgrading Pentaho Data Catalog from version 10.2.1 to 10.2.5, you may encounter an error indicating that the system is unable to connect to OpenSearch. This typically happens when the required network port is blocked at the load balancer.

Note: This issue occurs only when upgrading from Pentaho Data Catalog 10.2.1 to 10.2.5, and it doesn’t occur in a fresh installation of Pentaho Data Catalog 10.2.5 or later, because opening port 9200 is part of the installation prerequisites. See Install Pentaho Data Catalog for more information.
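
If you want a TCP-level check that is independent of HTTP, you can also test connectivity to port 9200 with netcat, assuming it is installed on the server:

    nc -zv <opensearch-host> 9200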

Perform the following steps to resolve the issue:

  1. Log in to the deployment server where Pentaho Data Catalog is installed.

  2. Verify the OpenSearch connection by checking whether port 9200 is accessible from the server:

    curl -v http://<opensearch-host>:9200

  3. If the connection fails or times out, check the load balancer configuration that routes traffic to the OpenSearch service.

  4. Open port 9200 on the load balancer to allow traffic to reach the OpenSearch service.

  5. After updating the load balancer settings, test the connection again using the curl command to confirm that port 9200 is now accessible.

  6. Restart the Pentaho Data Catalog services to ensure the application connects properly to OpenSearch:

    ./pdc.sh restart

  7. Verify that the upgrade completed successfully and the system is functioning as expected.

Case 3: Unable to connect to OpenSearch using HTTPS

When Pentaho Data Catalog is installed, the system may fail to connect to OpenSearch over HTTPS. This happens when the OpenSearch Security plugin is enabled but not yet initialized because the .opendistro_security index is missing. Without this index, OpenSearch cannot load users, roles, TLS settings, and other security configurations.

You may see an error like this in the logs:

[2025-06-16T16:53:35,311][ERROR][o.o.s.c.ConfigurationLoaderSecurity7] [af38e3ceb454] Failure no such index [.opendistro_security] retrieving configuration for [ACTIONGROUPS, ALLOWLIST, AUDIT, CONFIG, INTERNALUSERS, NODESDN, ROLES, ROLESMAPPING, TENANTS, WHITELIST] (index=.opendistro_security)
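
You can also confirm the diagnosis by querying the indices API for the security index. The following command reuses the environment variables from the verification step at the end of this procedure:

    curl -s --user ${OPENSEARCH_USERNAME}:${OPENSEARCH_PASSWORD} --cacert ${OPENSEARCH_HTTP_TLS_CA_CERT_LOCATION} -X GET "${OPENSEARCH_URL}/_cat/indices/.opendistro_security?v"

If the Security plugin is not initialized, this request typically fails with a security-not-initialized error instead of listing the index.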

Perform the following steps to resolve the issue:

  1. Log in to the deployment server where Pentaho Data Catalog is running.

  2. Stop all PDC containers:

    ./pdc.sh stop

  3. List all Docker volumes related to OpenSearch to confirm their presence:

    docker volume ls | grep pdc_opensearch

    Typical volumes you will see:

    pdc_opensearch_data
    pdc_opensearch_snapshots

  4. Delete the pdc_opensearch_data volume:

    docker volume rm pdc_opensearch_data

  5. Delete the pdc_opensearch_snapshots volume:

    docker volume rm pdc_opensearch_snapshots

  6. Start the PDC services again:

    ./pdc.sh up

  7. Verify that the OpenSearch service is up and healthy by checking the cluster status:

    curl -s --user ${OPENSEARCH_USERNAME}:${OPENSEARCH_PASSWORD} --cacert ${OPENSEARCH_HTTP_TLS_CA_CERT_LOCATION} -X GET "${OPENSEARCH_URL}/_cluster/health?pretty" | grep status

Ensure that the status is green or yellow.
