Troubleshooting Pentaho Data Catalog
The Pentaho Data Catalog log files contain information that can help you determine the root cause of error messages you might see. Refer to the following topics for information on how to resolve the issues causing the error messages.
Low disk space message
If you see a Low disk space message from Pentaho Data Catalog while loading images into the Docker repository, you can resolve this issue by linking the Docker root directory to another directory.
Important: The other directory should have at least 100 GB of free space.
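Optionally, you can confirm that a candidate directory has enough free space before relinking. This is a quick check only; <dir with min 100 GB free> is the same placeholder used in the steps below.
df -h <dir with min 100 GB free>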
Use the following steps to resolve this issue:
Enter the following commands to link the /var/lib/docker directory to a directory with at least 100 GB of free space.
Note: In this example, the directory with at least 100 GB of free space is <dir with min 100 GB free>. Replace <dir with min 100 GB free> in the commands with the full path to your directory with a minimum of 100 GB of free space.
sudo systemctl stop docker
sudo mv /var/lib/docker <dir with min 100 GB free>
sudo ln -s <dir with min 100 GB free> /var/lib/docker
sudo systemctl start docker
Repeat the action that produced the Low disk space message.
The action should succeed without producing a Low disk space message.
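Optionally, you can confirm that Docker is using the relocated directory. This is a minimal check, assuming a systemd-based host and the symlink created in the steps above; docker info reports the root directory that the daemon is actually using.
ls -l /var/lib/docker
sudo systemctl status docker --no-pager
docker info --format '{{.DockerRootDir}}'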
The service "um-alpine-init" didn't complete successfully message
If you see the message service "um-alpine-init" didn't complete successfully when using Keycloak with Pentaho Data Catalog, there are two possible causes.
When you use the Keycloak identity and access management (IAM) tool for user authentication with Pentaho Data Catalog, the um-alpine-init service checks whether Keycloak is up and running. You can determine the cause of the service "um-alpine-init" didn't complete successfully message by checking the log files for other messages that appear with it. The problem could be that the GLOBAL_SERVER_HOST_NAME variable or the security certificates need to be updated.
Use the following steps on the PDC server to determine the cause of the message and resolve the issue.
Enter the following command to search the logs for the message about the um-alpine-init service:
./pdc.sh logs um-alpine-init
When you find the service "um-alpine-init" didn't complete successfully message, you need to follow different steps depending on what you see. Use one of the optional steps below.
(Optional) If you see the message Max retries reached. Exiting…, this means that Data Catalog cannot connect to Keycloak. Use the following steps on the PDC server to establish a connection to Keycloak:
Edit the conf/.env file to update the GLOBAL_SERVER_HOST_NAME variable with your hostname or IP address, as in the following example:
GLOBAL_SERVER_HOST_NAME="myhost.pdc.eng.example.com"
Enter the following command to restart PDC services:
./pdc.sh restart
PDC should start up and run without an error message.
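Optionally, you can confirm that the host name you set in GLOBAL_SERVER_HOST_NAME responds before retrying. This is a minimal sketch that assumes conf/.env can be sourced as a shell file and that the host answers HTTPS requests; the -k option is used because the deployment may rely on self-signed certificates.
source conf/.env
curl -k -s -o /dev/null -w "%{http_code}\n" "https://${GLOBAL_SERVER_HOST_NAME}/"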
(Optional) If you see the message Update TokenResp error 401, use the following steps:
Use the following command to look for "um-css-admin-api" in the logs:
./pdc.sh logs um-css-admin-api
Sample output:
um-css-admin-api-1 | [2024-03-28T16:56:27.558Z] [css-admin-api : css-admin-api] - [error]: Failed to fetch public key self-signed certificate
um-css-admin-api-1 | Error: self-signed certificate
um-css-admin-api-1 |     at TLSSocket.onConnectSecure (node:_tls_wrap:1659:34)
um-css-admin-api-1 |     at TLSSocket.emit (node:events:517:28)
um-css-admin-api-1 |     at TLSSocket._finishInit (node:_tls_wrap:1070:8)
um-css-admin-api-1 |     at ssl.onhandshakedone (node:_tls_wrap:856:12)
um-css-admin-api-1 |     at TLSWrap.callbackTrampoline (node:internal/async_hooks:128:17) {
um-css-admin-api-1 |   code: 'DEPTH_ZERO_SELF_SIGNED_CERT'
um-css-admin-api-1 | }
This output includes the message Failed to fetch public key self-signed certificate, which points to an error with the PDC self-signed certificates.
Stop PDC services by entering the following command:
./pdc.sh stop
Edit the conf/.env file to change the GLOBAL_SERVER_HOST_NAME variable to the fully qualified domain name (FQDN) for the host server, as in the following example:
GLOBAL_SERVER_HOST_NAME="myhost.pdc.eng.example.com"
Remove all certificates by entering the following command:
rm -rf conf/{https,extra-certs,mongodb}
Restart PDC services by entering the following command:
./pdc.sh restart
The server generates new self-signed certificates based on the FQDN that you provided.
PDC should start up and run without displaying any error messages.
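Optionally, you can inspect the regenerated certificate to confirm that its subject matches the FQDN you set. This is a sketch only; the exact file name under conf/https depends on your deployment, so substitute whichever certificate file the restart created there.
openssl x509 -in conf/https/<certificate file> -noout -subject -issuer -enddate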
opensearch-cluster-init service fails to start on existing deployment
The opensearch-cluster-init service may fail to start when you run ./pdc.sh up on an existing Pentaho Data Catalog deployment. To find the possible causes:
1. Log in to the deployment server where Pentaho Data Catalog is running.
2. Run the following command to get the container ID of the OpenSearch service:
docker ps | grep opensearch
3. Check the OpenSearch logs:
docker logs <opensearch-container-id>
The possible causes include a crashed process leaving a lock file, insufficient disk space, high disk I/O, excessive CPU or memory load, permissions or access errors, and more. Use the following procedures to identify and resolve the issue based on the cause reported in the OpenSearch logs.
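As a quick first pass, you can filter the OpenSearch logs for the most common failure indicators. This is a sketch only; the search terms are examples, and <opensearch-container-id> is the ID returned by docker ps | grep opensearch.
docker logs <opensearch-container-id> 2>&1 | grep -iE 'node locks|lock|no space|disk|denied|permission|out of memory'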
Case 1: Lock file prevents startup
If the OpenSearch logs show a message about being unable to acquire the lock (for example, "failed to obtain node locks"), OpenSearch most likely crashed previously and did not remove the lock file, which prevents new processes from accessing the data folder.
Perform the following steps to fix the issue:
Confirm that the logs mention a lock file (node.lock) preventing startup.
Stop the services:
./pdc.sh stop
Remove the OpenSearch lock file:
sudo rm -rf /var/lib/docker/volumes/pdc_opensearch_data/_data/nodes/0/node.lock
Restart the services:
./pdc.sh start
Verify that the OpenSearch service starts successfully:
docker ps | grep opensearch
The OpenSearch service starts successfully.
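Optionally, tail the recent OpenSearch logs and confirm that the "failed to obtain node locks" message no longer appears. <opensearch-container-id> is the new container ID shown by docker ps | grep opensearch.
docker logs --tail 50 <opensearch-container-id>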
Case 2: Unable to connect to OpenSearch during upgrade from 10.2.1 to 10.2.5
When upgrading Pentaho Data Catalog from version 10.2.1 to 10.2.5, you may encounter an error indicating that the system is unable to connect to OpenSearch. This typically happens when the required network port is blocked at the load balancer.
Note: This issue occurs only when upgrading from Pentaho Data Catalog 10.2.1 to 10.2.5, and it doesn’t occur in a fresh installation of Pentaho Data Catalog 10.2.5 or later, because opening port 9200 is part of the installation prerequisites. See Install Pentaho Data Catalog for more information.
Perform the following steps to resolve the issue:
Log in to the deployment server where Pentaho Data Catalog is installed.
Verify the OpenSearch connection by checking if port 9200 is accessible from the server:
curl -v http://<opensearch-host>:9200
If the connection fails or times out, check the load balancer configuration that routes traffic to the OpenSearch service.
Open port 9200 on the load balancer to allow traffic to reach the OpenSearch service.
After updating the load balancer settings, test the connection again using the curl command to confirm that port 9200 is now accessible.
Restart the Pentaho Data Catalog services to ensure the application connects properly to OpenSearch:
./pdc.sh restart
Verify that the upgrade completed successfully and that the system is functioning as expected.
Result
After opening port 9200 on the load balancer and restarting the services, Pentaho Data Catalog connects successfully to OpenSearch.
Case 3: Unable to connect to OpenSearch using HTTPS
When Pentaho Data Catalog is installed, the system may fail to connect to OpenSearch over HTTPS. This happens because the OpenSearch Security plugin is enabled but not yet initialized: the .opendistro_security index is missing. Without this index, OpenSearch cannot load users, roles, TLS settings, and other security configurations.
You may see an error like this in the logs:
[2025-06-16T16:53:35,311][ERROR][o.o.s.c.ConfigurationLoaderSecurity7] [af38e3ceb454] Failure no such index [.opendistro_security] retrieving configuration for [ACTIONGROUPS, ALLOWLIST, AUDIT, CONFIG, INTERNALUSERS, NODESDN, ROLES, ROLESMAPPING, TENANTS, WHITELIST] (index=.opendistro_security)
Perform the following steps to resolve the issue:
Log in to the deployment server where Pentaho Data Catalog is running.
Stop all PDC containers:
./pdc.sh stop
List all Docker volumes related to OpenSearch to confirm their presence:
docker volume ls | grep pdc_opensearch
Typical volumes you will see:
pdc_opensearch_data
pdc_opensearch_snapshots
Important: Do not delete or modify these volumes unless explicitly instructed by Customer Support. Deleting these volumes permanently removes OpenSearch data and should only be performed under support supervision. If you need to keep a copy of the data first, see the optional backup sketch below.
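One possible way to keep a copy of the data before deleting the volumes is to archive each volume to the current directory with a throwaway container. This is a sketch only, assuming the volume names listed above and that an alpine image is available locally or can be pulled.
docker run --rm -v pdc_opensearch_data:/source:ro -v "$(pwd)":/backup alpine tar czf /backup/pdc_opensearch_data.tar.gz -C /source .
docker run --rm -v pdc_opensearch_snapshots:/source:ro -v "$(pwd)":/backup alpine tar czf /backup/pdc_opensearch_snapshots.tar.gz -C /source .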
Delete the pdc_opensearch_data volume:
docker volume rm pdc_opensearch_data
Delete the pdc_opensearch_snapshots volume:
docker volume rm pdc_opensearch_snapshots
Start the PDC services again:
./pdc.sh up
Verify the OpenSearch service is up and healthy by checking the cluster status:
curl -s --user ${OPENSEARCH_USERNAME}:${OPENSEARCH_PASSWORD} --cacert ${OPENSEARCH_HTTP_TLS_CA_CERT_LOCATION} -X GET "${OPENSEARCH_URL}/_cluster/health?pretty" | grep status
Ensure that the status is green or yellow.
After deleting the OpenSearch volumes and restarting the services, the .opendistro_security index is reinitialized, and OpenSearch works correctly over HTTPS.
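Optionally, you can confirm that the security index was recreated by listing it directly. This check reuses the same connection variables as the cluster health command above.
curl -s --user ${OPENSEARCH_USERNAME}:${OPENSEARCH_PASSWORD} --cacert ${OPENSEARCH_HTTP_TLS_CA_CERT_LOCATION} -X GET "${OPENSEARCH_URL}/_cat/indices/.opendistro_security?v"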