Install Data Catalog
This article covers installing Data Catalog from a release package. Use the Data Catalog and Pentaho Data Optimizer online installation package.
You need the license code identifier provided by your sales or other Pentaho representative and a connection to the internet to complete this installation.
Important: Before installing Data Catalog, it is a best practice to save a copy of your conf/.env file so that you retain any environment customizations if the file is overwritten during installation. During installation, Data Catalog checks for a PDC_DATA_ENCRYPTION_KEY environment variable in the conf/.env file. If the variable exists, the conf/.env file is retained. If the variable does not exist, Data Catalog generates a new .env file containing a PDC_DATA_ENCRYPTION_KEY environment variable. If needed, you can add any custom environment variable settings back into the new .env file from your saved copy.
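The backup advice above can be sketched as a pair of small shell helpers. This is a minimal sketch, not part of the Data Catalog package: the function names backup_env_file and has_encryption_key are hypothetical, and the path in the usage comment assumes an existing deployment under /opt/pentaho/pdc-docker-deployment, the location used later in this article.

```shell
# backup_env_file ENV_FILE
# Copies ENV_FILE to a timestamped file next to it and prints the backup path.
# (Hypothetical helper, not shipped with Data Catalog.)
backup_env_file() {
  backup_path="$1.bak.$(date +%Y%m%d%H%M%S)"
  cp -p "$1" "$backup_path" && printf '%s\n' "$backup_path"
}

# has_encryption_key ENV_FILE
# Succeeds if ENV_FILE contains a PDC_DATA_ENCRYPTION_KEY assignment.
has_encryption_key() {
  grep -q '^[[:space:]]*PDC_DATA_ENCRYPTION_KEY=' "$1"
}

# Usage (the path is an assumption; adjust it to your deployment):
#   env_file=/opt/pentaho/pdc-docker-deployment/conf/.env
#   [ -f "$env_file" ] && backup_env_file "$env_file"
#   has_encryption_key "$env_file" && echo "key found: conf/.env should be retained"
```

Keeping the backup next to the original makes it easy to compare the two files after installation and copy any custom settings back.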
Perform the following steps to install Data Catalog:
Verify that you have root privileges or the necessary permissions to run Docker.
Open a terminal window on your dedicated Data Catalog deployment server.
Save the Data Catalog release package on the Data Catalog server.
You now have two files:
The Docker image: pdc-<version>-images.tgz (example: pdc-10.2.5-images.tgz)
The PDC deployment bundle: pdc-<DEPLOYMENT_PACKAGE_TYPE>-<version>-compose.tgz
The <DEPLOYMENT_PACKAGE_TYPE> placeholder corresponds to the type of PDC service, and <version> is the PDC version you want to deploy. For example:
For PDC full services, use: pdc-full-10.2.5-compose.tgz
For PDC with Pentaho Data Optimizer services, use: pdc-pdo-10.2.5-compose.tgz
For PDC with Pentaho Data Mastering services, use: pdc-pdm-10.2.5-compose.tgz
Note: If you are unsure which deployment package to use, contact Pentaho Support for guidance.
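As an illustration of the naming pattern above, the following sketch builds the two expected file names from a package type and a version. The function names bundle_name and images_name are hypothetical, and the sketch accepts only the three package types named in this article; your release may include others.

```shell
# bundle_name TYPE VERSION
# Prints the deployment bundle file name for a package type
# (full, pdo, or pdm) and a PDC version.
bundle_name() {
  case "$1" in
    full|pdo|pdm) printf 'pdc-%s-%s-compose.tgz\n' "$1" "$2" ;;
    *) echo "unknown package type: $1" >&2; return 1 ;;
  esac
}

# images_name VERSION
# Prints the Docker image archive file name for a PDC version.
images_name() {
  printf 'pdc-%s-images.tgz\n' "$1"
}

# Usage: report any expected file that is missing from the current directory.
#   for f in "$(images_name 10.2.5)" "$(bundle_name full 10.2.5)"; do
#     [ -f "$f" ] || echo "missing: $f"
#   done
```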
Extract the files from the PDC deployment bundle to the /opt directory using the following command:
tar -xvf <name of release package>.tgz -C /opt
Example:
tar -xvf pdc-full-10.2.5-compose.tgz -C /opt
The command creates a pentaho directory and extracts the contents of the deployment into a pdc-docker-deployment subdirectory.
Load the required installation images (which are saved on the Data Catalog server) into Docker:
docker load -i pdc-<version>-images.tgz
cd /opt/pentaho/pdc-docker-deployment
sudo ./pdc.sh
Example:
docker load -i pdc-10.2.5-images.tgz
(Optional) If you get the message GLOBAL_SERVER_HOST_NAME env is not set, select the number of the value that you want to set for the environment variable from the list, or type the number, and then press Enter:
1. IP address
2. Hostname
3. Hostname.localhost.localdomain
4. Hostname.localhost.localdomain
5. Other
#?
If you select 1, the script sets the GLOBAL_SERVER_HOST_NAME variable in the conf/.env file to the IP address.
Note: If you select 5, enter a valid IP address or the fully qualified domain name.
Edit the conf/.env file to apply the license for your product, add email domains, and perform Keycloak SMTP configuration:
sudo vi conf/.env
Note: This SMTP configuration sets up forgot password functionality only. To set up Data Catalog to email users when they are tagged in a comment, data pipes template, or other notification, see Set up email server to send Data Catalog notifications in the Administer Pentaho Data Catalog document.
Add the provided license code identifier in place of the <License Server ID> placeholder in the Licensing Server URL, as follows:
LICENSING_SERVER_URL=https://pentaho.compliance.flexnetoperations.com/instances/<License Server ID>/request
Add new allowed email domains. By default, Data Catalog includes users that use hv.com and hitachivantara.com emails. You can add your own domain to this list:
EMAIL_DOMAINS='["hv.com", "hitachivantara.com", "abc.com"]'
Note: Do not overwrite or delete hv.com or hitachivantara.com, as these email domains are used to create the default users in the deployment.
Add configuration for Keycloak SMTP. In the example value below, SMTP configuration is set to use hitachivantara.com emails, but you can change these values to point to your company’s SMTP server configuration:
KEYCLOAK_SMTP='{"replyToDisplayName" : "[email protected]","starttls" : "true","auth" : "true","envelopeFrom" : "[email protected]", "ssl" : "true","password" : "fwjx mpvb hcdb yofp","port" : "465","host" : "smtp.hitachivantara.com","replyTo" : "[email protected]","from" : "[email protected]","fromDisplayName" : "[email protected]","user" : "[email protected]"}'
Save the file.
The licensing, email, and SMTP settings are complete.
Note: To update email or SMTP settings after installation, you must use the IAM APIs.
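Before starting the containers, you might sanity-check the edited file. The following sketch checks only what this article states: that the <License Server ID> placeholder has been replaced and that the two default email domains are still listed. The helpers env_value and check_env are hypothetical; they do not validate the license itself or the KEYCLOAK_SMTP value.

```shell
# env_value KEY FILE
# Prints the value assigned to KEY in FILE (the last assignment wins).
env_value() {
  sed -n "s/^[[:space:]]*$1=//p" "$2" | tail -n 1
}

# check_env FILE
# Prints one line per problem found and fails if any problem was found.
check_env() {
  problems=0
  url=$(env_value LICENSING_SERVER_URL "$1")
  case "$url" in
    ''|*'<'*|*'>'*)
      echo "LICENSING_SERVER_URL is unset or still contains a placeholder"
      problems=1 ;;
  esac
  domains=$(env_value EMAIL_DOMAINS "$1")
  for d in hv.com hitachivantara.com; do
    case "$domains" in
      *"\"$d\""*) ;;  # default domain is still present
      *) echo "EMAIL_DOMAINS is missing default domain $d"; problems=1 ;;
    esac
  done
  return "$problems"
}

# Usage, from the pdc-docker-deployment directory:
#   check_env conf/.env && echo "conf/.env looks consistent"
```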
Start all the Docker containers using the following command:
sh pdc.sh up
The installation is ready for use after all the Docker containers have successfully started.
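One way to see which containers have not started is to filter the output of docker ps. This is a sketch under stated assumptions: it assumes the dedicated server runs only Data Catalog containers, and it treats any status that does not begin with Up as not started. The helper not_running is hypothetical.

```shell
# not_running
# Reads "name|status" lines on stdin and prints the names of containers
# whose status does not start with "Up".
not_running() {
  awk -F '|' '$2 !~ /^Up/ { print $1 }'
}

# Usage on the deployment server (requires permission to run Docker):
#   docker ps -a --format '{{.Names}}|{{.Status}}' | not_running
# No output means every listed container reports an "Up" status.
```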
Access Data Catalog through your browser (Chrome is recommended) using the value set for the GLOBAL_SERVER_HOST_NAME variable (in step 7), and confirm that the application is successfully installed and running.
Note: For new installations, you are redirected to the PDC login page.
Data Catalog provides a set of default users for demonstration and testing. These default users have the following roles assigned:
Business User
A user who needs to view business-specific glossaries and dictionaries
Data User
A user who needs to use Data Catalog to find data for a business operation
Business Steward
A user who needs to maintain business-specific glossaries and dictionaries
Data Steward
A user who needs to update and process data in Data Catalog for use in a business operation, including migrating data if you have a license for Pentaho Data Optimizer
Admin
A user who needs to configure the product
Data Developer
A user who needs to create and update business or metadata rules
Data Storage Administrator
A user who monitors and manages storage utilization across data sources, folders, and schemas.
For more information, see the User roles and permissions in Data Catalog section in Use Pentaho Data Catalog.
Refer to the installation package for credential details for the default users. This information is found in an encrypted file.
Important: For Development and Production environments, it is a best practice to create users upon installation and deprecate these default users.
After installing Data Catalog, there may be other components you need to set up, depending on your environment. For more information, see the Advanced configuration section in the Administer Pentaho Data Catalog document.