Install Data Catalog

This article describes how to install Data Catalog from a release package. The steps use the Data Catalog and Pentaho Data Optimizer online installation package.

You need the license code identifier provided by your sales or other Pentaho representative and a connection to the internet to complete this installation.

Important: Before installing Data Catalog, save a copy of your conf/.env file so that you keep any environment customizations if the file is overwritten during installation. During installation, Data Catalog checks the conf/.env file for a PDC_DATA_ENCRYPTION_KEY environment variable. If the variable exists, the conf/.env file is retained. If the variable does not exist, Data Catalog generates a new .env file that contains a PDC_DATA_ENCRYPTION_KEY environment variable. If needed, you can then copy any custom environment variable settings from your saved file into the new .env file.
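
The backup described above can be scripted. The following is a minimal sketch, not part of the Data Catalog package: backup_env_file is a hypothetical helper name, the .bak suffix is an assumed naming scheme, and the check assumes the variable appears at the start of a line in NAME=value form.

```shell
#!/bin/sh
# Sketch: back up conf/.env before installing and report whether the
# encryption key variable is already present.
# backup_env_file is a hypothetical helper, not part of pdc.sh.
backup_env_file() {
    env_file="$1"                    # path to the existing conf/.env
    backup_file="${env_file}.bak"    # assumed backup naming scheme

    if [ ! -f "$env_file" ]; then
        echo "no existing file at $env_file; nothing to back up"
        return 0
    fi

    # Copy the file, preserving its mode and timestamps.
    cp -p "$env_file" "$backup_file" || return 1

    # Per the note above, the installer keeps conf/.env only if this
    # variable is defined (assumed here to be at the start of a line).
    if grep -q '^PDC_DATA_ENCRYPTION_KEY=' "$env_file"; then
        echo "key present; installer should retain $env_file"
    else
        echo "key missing; installer will generate a new .env"
    fi
}
```

For an existing deployment extracted under /opt, a call might look like: backup_env_file /opt/pentaho/pdc-docker-deployment/conf/.env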

Perform the following steps to install Data Catalog:

  1. Verify that you have root privileges or have the necessary permissions to run Docker.

  2. Open a terminal window on your dedicated Data Catalog deployment server.

  3. Save the Data Catalog release package on the Data Catalog server.

    You should now have two files:

    • The Docker image: pdc-<version>-images.tgz (example: pdc-10.2.5-images.tgz)

    • The PDC deployment bundle: pdc-<DEPLOYMENT_PACKAGE_TYPE>-<version>-compose.tgz

      The <DEPLOYMENT_PACKAGE_TYPE> placeholder corresponds to the type of PDC service, and <version> is the PDC version you want to deploy. For example:

      • For PDC full services, use: pdc-full-10.2.5-compose.tgz

      • For PDC with Pentaho Data Optimizer services, use: pdc-pdo-10.2.5-compose.tgz

      • For PDC with Pentaho Data Mastering services, use: pdc-pdm-10.2.5-compose.tgz

    Note: If you are unsure which deployment package to use, contact Pentaho Support for guidance.

  4. Extract the files from the PDC deployment bundle to the /opt directory using the following command:

    tar -xvf <name of release package>.tgz -C /opt

    Example: tar -xvf pdc-full-10.2.5-compose.tgz -C /opt

    The command creates a pentaho directory and extracts the contents of the deployment into a pdc-docker-deployment subdirectory.

  5. Load the required installation images (saved on the Data Catalog server) into Docker, and then run the pdc.sh script from the deployment directory:

    docker load -i pdc-<version>-images.tgz
    cd /opt/pentaho/pdc-docker-deployment
    sudo ./pdc.sh

    Example: docker load -i pdc-10.2.5-images.tgz

  6. (Optional) If the message GLOBAL_SERVER_HOST_NAME env is not set appears, enter the number of the value that you want to assign to the environment variable, and then press Enter:

    1.	IP address
    2.	Hostname
    3.	Hostname.localhost.localdomain
    4.	Hostname.localhost.localdomain
    5.	Other 
    #?

    If you select 1, the script sets the GLOBAL_SERVER_HOST_NAME variable to the IP address in the conf/.env file.

    Note: If you select 5, then enter the valid IP address or the fully qualified domain name.

  7. Edit the conf/.env file to apply the license for your product, add email domains, and configure Keycloak SMTP:

    Note: This SMTP configuration sets up forgot password functionality only. To set up Data Catalog to email users when they are tagged in a comment, data pipes template, or other notification, see Set up email server to send Data Catalog notifications in the Administer Pentaho Data Catalog document.

    sudo vi conf/.env

    1. Add the provided license code identifier in place of the License Server ID placeholder in the Licensing Server URL, as follows:

      LICENSING_SERVER_URL=https://pentaho.compliance.flexnetoperations.com/instances/<License Server ID>/request

    2. Add new allowed email domains. By default, Data Catalog accepts users with hv.com and hitachivantara.com email addresses. You can add your own domain to this list:

      EMAIL_DOMAINS='["hv.com", "hitachivantara.com", "abc.com"]'

      Note: Do not overwrite or delete hv.com or hitachivantara.com as these email domains are used to create the default users in the deployment.

    3. Add configuration for Keycloak SMTP. In the example value below, SMTP configuration is set to use hitachivantara.com emails, but you can change these to point to your company’s SMTP server configuration:

      KEYCLOAK_SMTP='{"replyToDisplayName" : "[email protected]","starttls" : "true","auth" : "true","envelopeFrom" : "[email protected]", "ssl" : "true","password" : "fwjx mpvb hcdb yofp","port" : "465","host" : "smtp.hitachivantara.com","replyTo" : "[email protected]","from" : "[email protected]","fromDisplayName" : "[email protected]","user" : "[email protected]"}'
    4. Save the file.

    The licensing, email, and SMTP settings are complete.

    Note: To update email or SMTP settings after installation, you must use the IAM APIs.

  8. Start all the Docker containers using the following command:

    sh pdc.sh up
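
Before starting the containers, the conf/.env values from step 7 can be sanity-checked. The following is a minimal sketch, not part of the Data Catalog package: check_env_settings is a hypothetical helper, and it assumes each variable sits on its own line in NAME=value form.

```shell
#!/bin/sh
# Sketch: verify the licensing URL and email domains in conf/.env.
# check_env_settings is a hypothetical helper, not part of pdc.sh.
check_env_settings() {
    env_file="$1"
    status=0

    # The license ID placeholder must have been replaced with a real value.
    url=$(grep '^LICENSING_SERVER_URL=' "$env_file" | cut -d= -f2-)
    case "$url" in
        "" | *"<"* | *">"*)
            echo "LICENSING_SERVER_URL is unset or still has a placeholder"
            status=1
            ;;
    esac

    # The default domains must remain in EMAIL_DOMAINS.
    domains=$(grep '^EMAIL_DOMAINS=' "$env_file" | cut -d= -f2-)
    for d in hv.com hitachivantara.com; do
        case "$domains" in
            *"\"$d\""*) ;;    # domain found as a quoted list entry
            *)
                echo "EMAIL_DOMAINS is missing $d"
                status=1
                ;;
        esac
    done

    return "$status"
}
```

Run from the deployment directory, check_env_settings conf/.env prints one line per problem and returns a non-zero status if any check fails. It does not validate the Keycloak SMTP value.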

The installation is ready for use after all the Docker containers have successfully started.
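
One way to confirm that the containers have started is to inspect the output of docker ps. The following is a minimal sketch: list_not_running is a hypothetical helper, and the "|" separator is an arbitrary choice made here for the --format template.

```shell
#!/bin/sh
# Sketch: report containers that are not in an "Up" state.
# Reads "name|status" lines, as printed by
#   docker ps -a --format '{{.Names}}|{{.Status}}'
# and prints the names whose status does not start with "Up".
# list_not_running is a hypothetical helper, not part of pdc.sh.
list_not_running() {
    awk -F '|' '$2 !~ /^Up/ { print $1 }'
}
```

Usage: docker ps -a --format '{{.Names}}|{{.Status}}' | list_not_running. Empty output suggests that every container on the host reports an Up status. The listing covers all containers on the host, not only those belonging to Data Catalog.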

Access Data Catalog through your browser (Chrome is recommended) using the value set for GLOBAL_SERVER_HOST_NAME (in step 6), and confirm that the application is installed and running.
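
The address to open can be read back from conf/.env. The following is a minimal sketch: pdc_base_url is a hypothetical helper, the https scheme is an assumption about how the deployment is exposed, and the variable is assumed to be stored in NAME=value form.

```shell
#!/bin/sh
# Sketch: build the browser address from GLOBAL_SERVER_HOST_NAME.
# pdc_base_url is a hypothetical helper, not part of pdc.sh.
pdc_base_url() {
    env_file="$1"

    # Take the last definition of the variable in the file.
    host=$(grep '^GLOBAL_SERVER_HOST_NAME=' "$env_file" | tail -n 1 | cut -d= -f2-)

    # Strip optional surrounding quotes from the value.
    host=$(printf '%s' "$host" | tr -d "\"'")

    [ -n "$host" ] || return 1

    # Assumption: the application is served over HTTPS on the default port.
    echo "https://$host"
}
```

From the deployment directory, curl -kI "$(pdc_base_url conf/.env)" would then request the response headers from that address. The -k option skips certificate verification, so use it only if the server presents a self-signed certificate.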

Note: For new installations, you are redirected to the PDC login page.

Data Catalog provides a set of default users for demonstrating and testing. These default users have the following specific roles assigned:

Role: Actions

• Business User: A user who needs to view business-specific glossaries and dictionaries.

• Data User: A user who needs to use Data Catalog to find data for a business operation.

• Business Steward: A user who needs to maintain business-specific glossaries and dictionaries.

• Data Steward: A user who needs to update and process data in Data Catalog for use in a business operation, including migrating data if you have a license for Pentaho Data Optimizer.

• Admin: A user who needs to configure the product.

• Data Developer: A user who needs to create and update business or metadata rules.

• Data Storage Administrator: A user who monitors and manages storage utilization across data sources, folders, and schemas.

For more information, see the User roles and permissions in Data Catalog section in Use Pentaho Data Catalog.

Refer to the installation package for credential details for the default users. This information is found in an encrypted file.

Important: For Development and Production environments, it is a best practice to create your own users after installation and deprecate the default users.

After installing Data Catalog, there may be other components you need to set up, depending on your environment. For more information, see the Advanced configuration section in the Administer Pentaho Data Catalog document.
