Install Data Catalog

This procedure describes how to install Pentaho Data Catalog using the online release package. Use the Data Catalog and Pentaho Data Optimizer installation bundle provided through the Support Portal.

Perform the following steps to install Data Catalog:

Prerequisites

Before you begin, ensure that:

  • You have the license code identifier provided by your Pentaho representative.

  • The server has outbound internet connectivity to complete license activation.

  • Docker and Docker Compose are installed and configured.

  • System requirements are met.
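
The tool prerequisites above can be checked from a terminal before you start. The following is a minimal sketch, not part of the Data Catalog package: the function names have_cmd and check_prereqs are hypothetical, and it only confirms that the commands are on the PATH, not that Docker is configured correctly or that system requirements are met.

```shell
#!/bin/sh
# Return 0 if the named command is available on PATH, 1 otherwise.
have_cmd() {
    command -v "$1" >/dev/null 2>&1
}

# Print one status line per prerequisite tool used in this procedure.
check_prereqs() {
    for tool in docker tar; do
        if have_cmd "$tool"; then
            echo "ok: $tool"
        else
            echo "missing: $tool"
        fi
    done
    # Docker Compose may be installed as the "docker compose" plugin
    # or as the standalone docker-compose binary.
    if have_cmd docker && docker compose version >/dev/null 2>&1; then
        echo "ok: docker compose (plugin)"
    elif have_cmd docker-compose; then
        echo "ok: docker-compose (standalone)"
    else
        echo "missing: docker compose"
    fi
}

check_prereqs
```

Any line that starts with "missing" points to a tool that should be installed before you continue.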

Important

If you are upgrading or reinstalling, back up the existing conf/.env file before you start the installation so that you can restore any environment customizations if the file is overwritten. During installation, Data Catalog checks for the PDC_DATA_ENCRYPTION_KEY variable in the conf/.env file.

  • If the variable exists, the current conf/.env file is retained.

  • If the variable does not exist, a new .env file is generated that includes a new PDC_DATA_ENCRYPTION_KEY. If needed, you can add any custom environment variable settings back into the new .env file from your saved file.
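
The backup and the variable check described above can be sketched as follows. This is an illustration under assumptions: the function names are hypothetical, the check assumes the variable appears at the start of a line as PDC_DATA_ENCRYPTION_KEY=..., and the example path assumes the default /opt/pentaho/pdc-docker-deployment location used later in this procedure.

```shell
#!/bin/sh
# Return 0 if the given env file defines PDC_DATA_ENCRYPTION_KEY, 1 otherwise.
# Assumes the variable starts a line in the form PDC_DATA_ENCRYPTION_KEY=...
has_encryption_key() {
    [ -f "$1" ] && grep -q '^PDC_DATA_ENCRYPTION_KEY=' "$1"
}

# Copy the env file to a timestamped backup next to it and print the backup path.
backup_env_file() {
    env_file=$1
    [ -f "$env_file" ] || { echo "no such file: $env_file" >&2; return 1; }
    backup="${env_file}.bak.$(date +%Y%m%d%H%M%S)"
    cp -p "$env_file" "$backup" && echo "$backup"
}

# Example usage (path is an assumption based on the default deployment location):
# backup_env_file /opt/pentaho/pdc-docker-deployment/conf/.env
# has_encryption_key /opt/pentaho/pdc-docker-deployment/conf/.env \
#     && echo "PDC_DATA_ENCRYPTION_KEY found: the current .env file is retained"
```

Keeping the backup next to the original makes it easy to compare the two files after installation and copy custom settings back if a new .env file was generated.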

Procedure

  1. Verify that you have root privileges or have the necessary permissions to run Docker.

  2. Open a terminal window on your dedicated Data Catalog deployment server.

  3. Save the Data Catalog release package on the Data Catalog server.

    You now have two files:

    • The Docker image: pdc-<version>-images.tgz (example: pdc-10.2.5-images.tgz)

    • The PDC deployment bundle: pdc-<DEPLOYMENT_PACKAGE_TYPE>-<version>-compose.tgz

    The <DEPLOYMENT_PACKAGE_TYPE> placeholder corresponds to the type of PDC service and <version> is the PDC version you want to deploy. For example:

    • For PDC full services, use: pdc-full-10.2.5-compose.tgz

    • For PDC with Pentaho Data Optimizer services, use: pdc-pdo-10.2.5-compose.tgz

    • For PDC with Pentaho Data Mastering services, use: pdc-pdm-10.2.5-compose.tgz

    Note: If you are unsure which deployment package to use, contact Pentaho Support for guidance.
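
The naming pattern for the two files can be expressed as a small helper. This is a sketch only: the function names are hypothetical, and the accepted package types (full, pdo, pdm) are taken from the examples above, so other types may exist.

```shell
#!/bin/sh
# Print the Docker image archive name for a PDC version.
pdc_images_name() {
    printf 'pdc-%s-images.tgz\n' "$1"
}

# Print the deployment bundle name for a package type and PDC version.
# Only the package types shown in this procedure are accepted here.
pdc_bundle_name() {
    case $1 in
        full|pdo|pdm) printf 'pdc-%s-%s-compose.tgz\n' "$1" "$2" ;;
        *) echo "unknown package type: $1" >&2; return 1 ;;
    esac
}

# Example usage:
# pdc_images_name 10.2.5        prints pdc-10.2.5-images.tgz
# pdc_bundle_name full 10.2.5   prints pdc-full-10.2.5-compose.tgz
```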

  4. Extract the files from the PDC deployment bundle to the /opt directory using the following command:

    tar -xvf [name of release package].tgz -C /opt

    Example: tar -xvf pdc-full-10.2.5-compose.tgz -C /opt

    The command creates a pentaho directory and extracts the contents of the deployment into a pdc-docker-deployment subdirectory.
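
If you want to preview what the bundle will create under /opt before extracting it, you can list its top-level entries. The helper below is a hypothetical sketch that relies only on the standard tar listing option (-t). If the bundle matches the description above, the listing should show pentaho.

```shell
#!/bin/sh
# Print the unique top-level entries of a tar archive without extracting it.
# $1 = path to the archive (for example, a PDC deployment bundle).
bundle_top_level() {
    tar -tf "$1" | sed -e 's|^\./||' -e '/^$/d' | cut -d/ -f1 | sort -u
}

# Example usage (bundle name taken from the example in this step):
# bundle_top_level pdc-full-10.2.5-compose.tgz
```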

  5. Load the required installation images (saved on the Data Catalog server) into Docker, and then run the pdc.sh script from the deployment directory:

    docker load -i pdc-<version>-images.tgz
    cd /opt/pentaho/pdc-docker-deployment
    sudo ./pdc.sh

    Example: docker load -i pdc-10.2.5-images.tgz

  6. (Optional) If the message GLOBAL_SERVER_HOST_NAME env is not set appears, enter the number of the value that you want to assign to the environment variable from the listing, and then press Enter:

    1.	IP address
    2.	Hostname
    3.	Hostname.localhost.localdomain
    4.	Hostname.localhost.localdomain
    5.	Other 
    #?

    If you select 1, the script sets the GLOBAL_SERVER_HOST_NAME variable to the IP address in the conf/.env file.

    Note: If you select 5, then enter the valid IP address or the fully qualified domain name.
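
After the script finishes, you can confirm which value was written to the conf/.env file. The following sketch uses a hypothetical helper, get_env_value, and assumes the file uses plain NAME=value lines. If the value is quoted in the file, the quotes are printed as well.

```shell
#!/bin/sh
# Print the value of a variable from an env file.
# $1 = env file, $2 = variable name. If the variable is defined more than
# once, the last definition is printed. Prints nothing if it is not defined.
get_env_value() {
    sed -n "s/^$2=//p" "$1" | tail -n 1
}

# Example usage from the deployment directory:
# get_env_value conf/.env GLOBAL_SERVER_HOST_NAME
```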

  7. Edit the conf/.env file to apply the license for your product, add email domains, and perform Keycloak SMTP configuration:

    Note: This SMTP configuration sets up forgot password functionality only. To set up Data Catalog to email users when they are tagged in a comment, data pipes template, or other notification, see Set up email server to send Data Catalog notifications in the Administer Pentaho Data Catalog guide.

    sudo vi conf/.env

    1. Replace the License Server ID placeholder in the licensing server URL with the provided license code identifier, as follows:

      LICENSING_SERVER_URL=https://pentaho.compliance.flexnetoperations.com/instances/<License Server ID>/request

    2. Add allowed email domains. By default, Data Catalog includes users with hv.com and hitachivantara.com email addresses. You can add your own domain to this list:

      EMAIL_DOMAINS='["hv.com", "hitachivantara.com", "abc.com"]'

      Note: Do not overwrite or delete hv.com or hitachivantara.com as these email domains are used to create the default users in the deployment.

    3. Add configuration for Keycloak SMTP. In the example value below, SMTP configuration is set to use hitachivantara.com emails, but you can change these to point to your company’s SMTP server configuration:

      KEYCLOAK_SMTP='{"replyToDisplayName" : "[email protected]","starttls" : "true","auth" : "true","envelopeFrom" : "[email protected]", "ssl" : "true","password" : "fwjx mpvb hcdb yofp","port" : "465","host" : "smtp.hitachivantara.com","replyTo" : "[email protected]","from" : "[email protected]","fromDisplayName" : "[email protected]","user" : "[email protected]"}'
    4. Save the file.

    The licensing, email, and SMTP settings are complete.

    Note:

    • To update email or SMTP settings after installation, you must use the IAM APIs.

    • During installation, you can also modify COMPOSE_PROFILES or other variables in the .env file to enable or disable specific services (for example, Physical Assets). For details, see Disable the Physical Assets feature from Data Catalog deployment in the Administer Pentaho Data Catalog guide.
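
Before you start the containers, a quick check of the edited file can catch two easy mistakes: a licensing URL that still contains the placeholder, and a domain list that no longer includes the default domains. The sketch below is illustrative only. The function name check_env_settings is hypothetical, and it assumes plain NAME=value lines in conf/.env.

```shell
#!/bin/sh
# Check the licensing URL and email domain settings in an env file.
# $1 = env file. Prints one line per problem; returns 0 if none are found.
check_env_settings() {
    env_file=$1
    status=0

    url=$(sed -n 's/^LICENSING_SERVER_URL=//p' "$env_file" | tail -n 1)
    case $url in
        '') echo "LICENSING_SERVER_URL is not set"; status=1 ;;
        *'<'*|*'>'*) echo "LICENSING_SERVER_URL still contains a placeholder"; status=1 ;;
    esac

    domains=$(sed -n 's/^EMAIL_DOMAINS=//p' "$env_file" | tail -n 1)
    for d in hv.com hitachivantara.com; do
        case $domains in
            *"\"$d\""*) ;;  # default domain is present as a quoted list entry
            *) echo "EMAIL_DOMAINS is missing default domain $d"; status=1 ;;
        esac
    done

    return $status
}

# Example usage from the deployment directory:
# check_env_settings conf/.env && echo "licensing and email domain settings look complete"
```

The check does not validate the KEYCLOAK_SMTP value or contact the licensing server. It only inspects the text of the file.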

  8. Start all the Docker containers using the following command:

    sh pdc.sh up

Result

The installation is ready for use after all the Docker containers have successfully started.
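
One way to confirm that the containers have started is to inspect the output of docker ps. The sketch below is an assumption-based illustration: the helper name not_running is hypothetical, it treats any status that does not begin with "Up" as not running, and docker ps -a lists every container on the host, not only those that belong to Data Catalog.

```shell
#!/bin/sh
# Read "name|status" lines on stdin and print the names of containers
# whose status does not start with "Up".
not_running() {
    awk -F'|' '$2 !~ /^Up/ { print $1 }'
}

# Example usage on the deployment server (requires Docker):
# docker ps -a --format '{{.Names}}|{{.Status}}' | not_running
```

An empty result means that every listed container reports an "Up" status. A container can still report "Up" while its health check is failing, so also review the status text if the application does not respond.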

Next steps

Access Data Catalog through your browser (Chrome is recommended) using the value of GLOBAL_SERVER_HOST_NAME (set in step 6), and confirm that the application is installed and running.

For new installations, you are redirected to the PDC login page.

Data Catalog provides a set of default users for demonstration and testing purposes. These default users have the following specific roles assigned:

  • Business User: A user who needs to view business-specific glossaries and dictionaries.

  • Data User: A user who needs to use Data Catalog to find data for a business operation.

  • Business Steward: A user who needs to maintain business-specific glossaries and dictionaries.

  • Data Steward: A user who needs to update and process data in Data Catalog for use for a business operation, including migrating data if you have a license for Pentaho Data Optimizer.

  • Admin: A user who needs to configure the product.

  • Data Developer: A user who needs to create and update business or metadata rules.

  • Data Storage Administrator: A user who monitors and manages storage utilization across data sources, folders, and schemas.

For more information, see User roles and permissions in Data Catalog in the Use Pentaho Data Catalog guide.

Refer to the installation package for credential details for the default users. This information is found in an encrypted file.

Important: In Development and Production environments, it is a best practice to create users during installation and deprecate the default users.

After installing Data Catalog, you may need to set up additional components, depending on your environment. For more information, see Advanced configuration in the Administer Pentaho Data Catalog guide.
