# Troubleshooting

## The "um-alpine-init" service didn't complete successfully

When you use the Keycloak identity and access management (IAM) tool for user authentication with Pentaho Data Catalog, the `um-alpine-init` service checks whether Keycloak is up and running. If this service fails, you see the message `service "um-alpine-init" didn't complete successfully`.

To determine the cause, check the log files for other messages that appear with the `um-alpine-init` message. Typically, the *GLOBAL\_SERVER\_HOST\_NAME* variable or the security certificates need to be updated, or the services need to be started again.

Use the following steps on the PDC server to determine the cause of the message and resolve the issue.

1. Enter the following command to search the logs for the message about the `um-alpine-init` service:

   ```bash
   ./pdc.sh logs um-alpine-init
   ```

   In the output, find the messages that appear with `service "um-alpine-init" didn't complete successfully`, and then perform the optional step below that matches them.
2. (Optional) If you see the message `Max retries reached. Exiting…`, this means that Data Catalog cannot connect to Keycloak. Use the following steps on the PDC server to establish a connection to Keycloak:

   1. Edit the `conf/.env` file to update the *GLOBAL\_SERVER\_HOST\_NAME* variable with your hostname or IP address, as in the following example:

      ```bash
      GLOBAL_SERVER_HOST_NAME="myhost.pdc.eng.example.com"
      ```
   2. Enter the following command to restart PDC services:

      ```bash
      ./pdc.sh restart
      ```

   PDC should start up and run without an error message.
3. (Optional) If the log shows the following, run the following command again:

   * The container attempted to obtain a token from Keycloak and update a token lifespan, but received a 400 error response.
   * The process exited with code 1 after logging `Update TokenResp error 400`.

   ```bash
   ./pdc.sh up
   ```
4. (Optional) If you see the message `Update TokenResp error 401`, use the following steps:
   1. Check the logs of the `um-css-admin-api` service by entering the following command:

      ```bash
      ./pdc.sh logs um-css-admin-api
      ```

      Sample output:

      ```
      um-css-admin-api-1  | [2024-03-28T16:56:27.558Z] [css-admin-api : css-admin-api] - [error]: Failed to fetch public key self-signed certificate
      um-css-admin-api-1  | Error: self-signed certificate
      um-css-admin-api-1  |     at TLSSocket.onConnectSecure (node:_tls_wrap:1659:34)
      um-css-admin-api-1  |     at TLSSocket.emit (node:events:517:28)
      um-css-admin-api-1  |     at TLSSocket._finishInit (node:_tls_wrap:1070:8)
      um-css-admin-api-1  |     at ssl.onhandshakedone (node:_tls_wrap:856:12)
      um-css-admin-api-1  |     at TLSWrap.callbackTrampoline (node:internal/async_hooks:128:17) {
      um-css-admin-api-1  |   code: 'DEPTH_ZERO_SELF_SIGNED_CERT'
      um-css-admin-api-1  | }
      ```

      This output includes the message `Failed to fetch public key self-signed certificate`, which points to an error with the PDC self-signed certificates.
   2. Stop PDC services by entering the following command:

      ```bash
      ./pdc.sh stop
      ```
   3. Edit the `conf/.env` file to change the *GLOBAL\_SERVER\_HOST\_NAME* variable to the fully qualified domain name (FQDN) for the host server, as in the following example:

      ```bash
      GLOBAL_SERVER_HOST_NAME="myhost.pdc.eng.example.com"
      ```
   4. Remove the existing certificate directories (`conf/https`, `conf/extra-certs`, and `conf/mongodb`) by entering the following command:

      ```bash
      rm -rf conf/{https,extra-certs,mongodb}
      ```
   5. Restart PDC services by entering the following command:

      ```bash
      ./pdc.sh restart
      ```

      The server generates new self-signed certificates based on the FQDN that you provided.

{% hint style="success" %}
PDC should start up and run without displaying any error messages.
{% endhint %}
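
The log checks in steps 1 to 4 can be sketched as two shell functions that you source into your shell from the PDC installation directory. This is a minimal sketch, not a PDC tool: the names `pdc_um_alpine_init_cause` and `pdc_set_host_name` are hypothetical, the first function recognizes only the three messages described in this section, and the second assumes that `conf/.env` stores the variable as a single `GLOBAL_SERVER_HOST_NAME=` line.

```bash
# Hypothetical helpers (not shipped with PDC) for the um-alpine-init checks above.

# Reads um-alpine-init log text on stdin and prints which step applies.
# Only the messages documented in this section are recognized.
pdc_um_alpine_init_cause() {
  local log
  log=$(cat)
  if printf '%s\n' "$log" | grep -q "Max retries reached"; then
    echo "step 2: set GLOBAL_SERVER_HOST_NAME and restart PDC"
  elif printf '%s\n' "$log" | grep -q "Update TokenResp error 400"; then
    echo "step 3: run ./pdc.sh up again"
  elif printf '%s\n' "$log" | grep -q "Update TokenResp error 401"; then
    echo "step 4: check um-css-admin-api and regenerate certificates"
  else
    echo "none of the documented messages found"
    return 1
  fi
}

# Sets GLOBAL_SERVER_HOST_NAME in the given .env file and keeps a .bak copy.
# Arguments: path to the .env file, host name (FQDN or IP address).
pdc_set_host_name() {
  local env_file="$1" host="$2"
  if grep -q '^GLOBAL_SERVER_HOST_NAME=' "$env_file"; then
    # Replace the existing assignment in place; -i.bak works with GNU and BSD sed.
    sed -i.bak "s|^GLOBAL_SERVER_HOST_NAME=.*|GLOBAL_SERVER_HOST_NAME=\"${host}\"|" "$env_file"
  else
    # No assignment yet: back up the file, then append one.
    cp "$env_file" "${env_file}.bak"
    printf 'GLOBAL_SERVER_HOST_NAME="%s"\n' "$host" >> "$env_file"
  fi
}
```

For example, pipe the logs into the first function with `./pdc.sh logs um-alpine-init 2>&1 | pdc_um_alpine_init_cause`, and set the host name with `pdc_set_host_name conf/.env myhost.pdc.eng.example.com` before restarting the services. The second function keeps a `conf/.env.bak` copy so that you can roll back the change.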

## `opensearch-cluster-init` service fails to start on an existing deployment

The `opensearch-cluster-init` service may fail to start when you run `./pdc.sh up` on an existing Pentaho Data Catalog deployment. To find the possible causes:

1. Log in to the deployment server where Pentaho Data Catalog is running.
2. Run the following command to get the container ID of the OpenSearch service:

   ```
   docker ps | grep opensearch
   ```
3. Check the OpenSearch logs:

   ```
   docker logs <opensearch-container-id>
   ```

Possible causes include a lock file left behind by a crashed process, insufficient disk space, high disk I/O, excessive CPU or memory load, and permission or access errors. Use the following procedures to identify and resolve the issue based on the cause reported in the OpenSearch logs.
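
For a quick first pass over the logs, a filter like the following sketch reports which of the documented cases a log matches. The function name `opensearch_log_cause` is hypothetical, and it recognizes only the two messages quoted in Case 1 and Case 3; for other causes, such as disk or memory pressure, read the full logs. Because `docker logs` sends the container's stderr output to your terminal's stderr, redirect it with `2>&1` before piping.

```bash
# Hypothetical helper: reads OpenSearch log text on stdin and reports which
# documented case it matches. Returns 1 when no known message is found.
opensearch_log_cause() {
  local log
  log=$(cat)
  if printf '%s\n' "$log" | grep -qi "failed to obtain node locks"; then
    echo "case 1: stale node.lock file"
  # -F treats the brackets literally instead of as a regex character class.
  elif printf '%s\n' "$log" | grep -qF "no such index [.opendistro_security]"; then
    echo "case 3: security index not initialized"
  else
    echo "no documented cause found; review the full log"
    return 1
  fi
}
```

For example: `docker logs <opensearch-container-id> 2>&1 | opensearch_log_cause`.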

### Case 1: Lock file prevents startup

This issue occurs when OpenSearch crashed previously and did not remove its lock file, which prevents new processes from accessing the data folder. In this case, the OpenSearch logs show a message about being unable to acquire the lock, for example `failed to obtain node locks`.

Perform the following steps to fix the issue:

1. Confirm that the logs mention a lock file (`node.lock`) preventing startup.
2. Stop the services:

   ```bash
   ./pdc.sh stop
   ```
3. Remove the OpenSearch lock file:

   ```bash
   sudo rm -rf /var/lib/docker/volumes/pdc_opensearch_data/_data/nodes/0/node.lock
   ```
4. Restart the services:

   ```bash
   ./pdc.sh start
   ```
5. Verify that the OpenSearch service starts successfully:

   ```bash
   docker ps | grep opensearch
   ```

{% hint style="success" %}
The OpenSearch service starts successfully.
{% endhint %}
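
Steps 2 and 3 can be made more cautious with a helper that checks for the lock file before deleting it and reports what it did. This is a hypothetical sketch: the function name and the `SUDO` variable are illustrative, and the default path is the one used in step 3, which assumes the default Docker data root (`/var/lib/docker`) and the `pdc_opensearch_data` volume name. Run it only after `./pdc.sh stop`.

```bash
# Hypothetical helper: removes the OpenSearch node.lock file only if it exists.
# SUDO defaults to "sudo" because the Docker data directory is typically owned
# by root; set SUDO to an empty string when you already run as root.
remove_opensearch_node_lock() {
  local lock="${1:-/var/lib/docker/volumes/pdc_opensearch_data/_data/nodes/0/node.lock}"
  local sudo_cmd="${SUDO-sudo}"
  # Check with the same privileges that the removal uses.
  if $sudo_cmd test -e "$lock"; then
    $sudo_cmd rm -f "$lock" && echo "removed $lock"
  else
    echo "no lock file at $lock; nothing to remove"
  fi
}
```

For example, `remove_opensearch_node_lock` uses `sudo` and the default path, while `SUDO= remove_opensearch_node_lock` in Bash runs the commands directly when you are already root.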

### Case 2: Unable to connect to OpenSearch during upgrade from 10.2.1 to 10.2.5

When upgrading Pentaho Data Catalog from version 10.2.1 to 10.2.5, you may encounter an error indicating that the system is unable to connect to OpenSearch. This typically happens when port 9200, which the OpenSearch service uses, is blocked at the load balancer.

{% hint style="info" %}
This issue occurs only when upgrading from Pentaho Data Catalog 10.2.1 to 10.2.5. It does not occur in a fresh installation of Pentaho Data Catalog 10.2.5 or later because opening port 9200 is part of the installation prerequisites. See [Install Pentaho Data Catalog](/pdc-10.2-install/pdc-10.2-install/install-pentaho-data-catalog.md) for more information.
{% endhint %}

Perform the following steps to resolve the issue:

1. Log in to the deployment server where Pentaho Data Catalog is installed.
2. Verify the OpenSearch connection by checking whether port 9200 is accessible from the server:

   ```
   curl -v http://<opensearch-host>:9200
   ```
3. If the connection fails or times out, check the load balancer configuration that routes traffic to the OpenSearch service.
4. Open port 9200 on the load balancer to allow traffic to reach the OpenSearch service.
5. After updating the load balancer settings, run the `curl` command again to confirm that port 9200 is now accessible.
6. Restart the Pentaho Data Catalog services to ensure the application connects properly to OpenSearch:

   ```
   ./pdc.sh restart
   ```
7. Verify that the upgrade completed successfully and that the system functions as expected.

{% hint style="success" %}
After opening port 9200 on the load balancer and restarting the services, Pentaho Data Catalog connects successfully to OpenSearch.
{% endhint %}
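
Steps 2 and 5 can be sketched as a helper that probes port 9200 and translates curl's exit code into a short diagnosis. This is a minimal sketch under assumptions: the function names are hypothetical, it uses plain HTTP as in step 2, and it relies on curl's documented exit codes 6 (could not resolve host), 7 (failed to connect), and 28 (operation timed out). Without the `-f` option, curl exits with 0 for any HTTP response, including authentication errors, so exit code 0 means that the port is reachable.

```bash
# Hypothetical helper: maps a curl exit code to a short diagnosis.
describe_curl_exit() {
  case "$1" in
    0)  echo "reachable" ;;
    6)  echo "unreachable: host name did not resolve" ;;
    7)  echo "unreachable: connection refused or port blocked" ;;
    28) echo "unreachable: connection timed out" ;;
    *)  echo "failed for another reason (curl exit $1); see the curl exit-code documentation" ;;
  esac
}

# Hypothetical helper: probes http://<host>:9200 with a 5-second connect timeout.
# Returns 0 only when curl received an HTTP response.
check_opensearch_port() {
  local host="$1" rc=0
  curl -sS -o /dev/null --connect-timeout 5 "http://${host}:9200" || rc=$?
  describe_curl_exit "$rc"
  [ "$rc" -eq 0 ]
}
```

For example, `check_opensearch_port <opensearch-host>` prints `unreachable: connection timed out` when the load balancer silently drops traffic on port 9200, and `unreachable: connection refused or port blocked` when the connection is actively rejected.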

### Case 3: Unable to connect to OpenSearch using HTTPS

After Pentaho Data Catalog is installed, it may fail to connect to OpenSearch over HTTPS. This happens when the OpenSearch Security plugin is enabled but not yet initialized, so the `.opendistro_security` index is missing. Without this index, OpenSearch cannot load users, roles, TLS settings, and other security configurations.

You may see an error like this in the logs:

```
[2025-06-16T16:53:35,311][ERROR][o.o.s.c.ConfigurationLoaderSecurity7] [af38e3ceb454] Failure no such index [.opendistro_security] retrieving configuration for [ACTIONGROUPS, ALLOWLIST, AUDIT, CONFIG, INTERNALUSERS, NODESDN, ROLES, ROLESMAPPING, TENANTS, WHITELIST] (index=.opendistro_security)
```

Perform the following steps to resolve the issue:

1. Log in to the deployment server where Pentaho Data Catalog is running.
2. Stop all PDC containers:

   ```
   ./pdc.sh stop
   ```
3. List all Docker volumes related to OpenSearch to confirm their presence:

   ```
   docker volume ls | grep pdc_opensearch
   ```

   The output typically includes the following volumes:

   ```
   pdc_opensearch_data
   pdc_opensearch_snapshots
   ```

   <div data-gb-custom-block data-tag="hint" data-style="warning" class="hint hint-warning"><p>Do not delete or modify these volumes unless explicitly instructed by Customer Support. Deleting these volumes will permanently remove OpenSearch data and should only be performed under support supervision.</p></div>
4. Delete the `pdc_opensearch_data` volume:

   ```
   docker volume rm pdc_opensearch_data
   ```
5. Delete the `pdc_opensearch_snapshots` volume:

   ```
   docker volume rm pdc_opensearch_snapshots
   ```
6. Start the PDC services again:

   ```
   ./pdc.sh up
   ```
7. Verify the OpenSearch service is up and healthy by checking the cluster status:

   ```
   curl -s --user ${OPENSEARCH_USERNAME}:${OPENSEARCH_PASSWORD} --cacert ${OPENSEARCH_HTTP_TLS_CA_CERT_LOCATION} -X GET "${OPENSEARCH_URL}/_cluster/health?pretty" | grep status
   ```

   Ensure that the status is `green` or `yellow`.

{% hint style="success" %}
After deleting the OpenSearch volumes and restarting the services, the `.opendistro_security` index is reinitialized, and OpenSearch works correctly over HTTPS.
{% endhint %}
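
The status check in step 7 can be scripted with a small filter that extracts the `status` field from the `_cluster/health` response and succeeds only for `green` or `yellow`. This is a hypothetical sketch that parses the JSON with `sed` so that it needs no extra tools; if `jq` is installed, `jq -r .status` is a more robust way to read the field.

```bash
# Hypothetical helper: reads _cluster/health JSON on stdin, prints the status,
# and returns 0 only when the status is green or yellow.
opensearch_health_ok() {
  local status
  # Capture the value of the "status" key; works with and without ?pretty.
  status=$(sed -n 's/.*"status"[[:space:]]*:[[:space:]]*"\([a-z]*\)".*/\1/p' | head -n 1)
  echo "status: ${status:-unknown}"
  case "$status" in
    green|yellow) return 0 ;;
    *) return 1 ;;
  esac
}
```

For example, replace `| grep status` in step 7 with `| opensearch_health_ok`, and then check the exit code with `echo $?`.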


