Install Data Catalog
This article covers installing Data Catalog from a release package, using the Data Catalog and Pentaho Data Optimizer online installation package.
You need the license code identifier provided by your sales or other Pentaho representative and a connection to the internet to complete this installation.
Perform the following steps to install Data Catalog:
Verify that you have root privileges or the necessary permissions to run Docker.
Open a terminal window on your dedicated Data Catalog deployment server.
Save the Data Catalog release package on the Data Catalog server.
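The permission check in the first step can be sketched as a small shell function. This is a minimal sketch, not part of the installation package: `can_run_docker` is a hypothetical helper, and treating membership in the `docker` group as sufficient is an assumption based on common Docker setups rather than something this guide states.

```shell
#!/bin/sh
# can_run_docker UID GROUPS
# Succeeds when UID is 0 (root) or when GROUPS, a space-separated list of
# group names, contains "docker".
# Assumption: docker group membership is what grants non-root Docker access
# on this server. Check your own Docker configuration.
can_run_docker() {
    check_uid="$1"
    check_groups="$2"
    [ "$check_uid" -eq 0 ] && return 0
    case " $check_groups " in
        *" docker "*) return 0 ;;
    esac
    return 1
}

# Check the account that is running this script.
if can_run_docker "$(id -u)" "$(id -Gn)"; then
    echo "This account is root or belongs to the docker group."
else
    echo "This account is neither root nor in the docker group."
fi
```

Passing the user ID and group list as arguments keeps the function easy to check against other accounts without switching users.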
You now have two files:
The Docker image: pdc-<version>-images.tgz (example: pdc-10.2.5-images.tgz)
The PDC deployment bundle: pdc-<DEPLOYMENT_PACKAGE_TYPE>-<version>-compose.tgz
The <DEPLOYMENT_PACKAGE_TYPE> placeholder corresponds to the type of PDC service, and <version> is the PDC version you want to deploy. For example:
For PDC full services, use: pdc-full-10.2.5-compose.tgz
For PDC with Pentaho Data Optimizer services, use: pdc-pdo-10.2.5-compose.tgz
For PDC with Pentaho Data Mastering services, use: pdc-pdm-10.2.5-compose.tgz
Note: If you are unsure which deployment package to use, contact Pentaho Support for guidance.
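The naming pattern above can be turned into a quick check that both files are in place before you continue. This is a sketch under stated assumptions: `bundle_name`, `images_name`, and `check_release_files` are hypothetical helper names, and the three accepted package types are only the ones listed in this article.

```shell
#!/bin/sh
# bundle_name TYPE VERSION
# Prints the deployment bundle file name for one of the listed package types.
bundle_name() {
    case "$1" in
        full|pdo|pdm) printf 'pdc-%s-%s-compose.tgz\n' "$1" "$2" ;;
        *) echo "unknown deployment package type: $1" >&2; return 1 ;;
    esac
}

# images_name VERSION
# Prints the Docker image archive file name.
images_name() {
    printf 'pdc-%s-images.tgz\n' "$1"
}

# check_release_files DIR TYPE VERSION
# Succeeds only when both expected files exist in DIR.
check_release_files() {
    rel_dir="$1"
    rel_bundle=$(bundle_name "$2" "$3") || return 1
    rel_images=$(images_name "$3")
    rel_missing=0
    for rel_file in "$rel_images" "$rel_bundle"; do
        if [ ! -f "$rel_dir/$rel_file" ]; then
            echo "missing: $rel_dir/$rel_file" >&2
            rel_missing=1
        fi
    done
    [ "$rel_missing" -eq 0 ]
}

# Example: look in the current directory for the 10.2.5 full-services package.
if check_release_files . full 10.2.5; then
    echo "Both release files are present."
else
    echo "Release files are incomplete."
fi
```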
Extract the files from the PDC deployment bundle to the /opt directory using the following command:
tar -xvf <name of release package>.tgz -C /opt
Example:
tar -xvf pdc-full-10.2.5-compose.tgz -C /opt
The command creates a pentaho directory and extracts the contents of the deployment into a pdc-docker-deployment subdirectory.
Load the required installation images (saved on the Data Catalog server) into Docker, and then run the deployment script:
docker load -i pdc-<version>-images.tgz
cd /opt/pentaho/pdc-docker-deployment
sudo ./pdc.sh
Example:
pdc-10.2.5-images.tgz
(Optional) If you get this message:
GLOBAL_SERVER_HOST_NAME env is not set
Select the number of the value that you want to set for the environment variable from the listing, or enter the number using the keyboard, and then press Enter:
1. IP address
2. Hostname
3. Hostname.localhost.localdomain
4. Hostname.localhost.localdomain
5. Other
#?
If you select 1, the script sets the GLOBAL_SERVER_HOST_NAME variable in the conf/.env file to the IP address.
Note: If you select 5, enter a valid IP address or fully qualified domain name.
Edit the conf/.env file to apply the license for your product, add email domains, and configure Keycloak SMTP:
Note: This SMTP configuration sets up forgot password functionality only. To set up Data Catalog to email users when they are tagged in a comment, data pipes template, or other notification, see Set up email server to send Data Catalog notifications in the Administer Pentaho Data Catalog guide.
sudo vi conf/.env
Add the provided license code identifier in place of <License Server ID> in the Licensing Server URL, as follows:
LICENSING_SERVER_URL=https://pentaho.compliance.flexnetoperations.com/instances/<License Server ID>/request
Add allowed email domains. By default, Data Catalog allows users with hv.com and hitachivantara.com email addresses. You can add your own domain to this list:
EMAIL_DOMAINS='["hv.com", "hitachivantara.com", "abc.com"]'
Note: Do not overwrite or delete hv.com or hitachivantara.com, as these email domains are used to create the default users in the deployment.
Add configuration for Keycloak SMTP. In the example value below, the SMTP configuration uses hitachivantara.com emails, but you can change these values to point to your company’s SMTP server:
KEYCLOAK_SMTP='{"replyToDisplayName" : "[email protected]","starttls" : "true","auth" : "true","envelopeFrom" : "[email protected]", "ssl" : "true","password" : "fwjx mpvb hcdb yofp","port" : "465","host" : "smtp.hitachivantara.com","replyTo" : "[email protected]","from" : "[email protected]","fromDisplayName" : "[email protected]","user" : "[email protected]"}'
Save the file.
The licensing, email, and SMTP settings are complete.
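The licensing and email domain edits can also be scripted instead of made by hand in vi. The following is a minimal sketch, assuming the variables are plain KEY=value lines: `set_env_var` is a hypothetical helper, `ABC123EXAMPLE` is a made-up license identifier, and the demonstration writes to a temporary file rather than the real conf/.env.

```shell
#!/bin/sh
# set_env_var FILE KEY VALUE
# Replaces any existing KEY=... line in FILE with KEY=VALUE, or adds the line
# if the key is absent. The new line is written at the end of the file.
set_env_var() {
    env_file="$1"
    env_key="$2"
    env_value="$3"
    [ -f "$env_file" ] || { echo "no such file: $env_file" >&2; return 1; }
    env_tmp=$(mktemp) || return 1
    # Keep every line except an existing assignment of this key.
    # grep exits non-zero when nothing is left to print, which is fine here.
    grep -v "^${env_key}=" "$env_file" > "$env_tmp" || true
    printf '%s=%s\n' "$env_key" "$env_value" >> "$env_tmp"
    # Write back through the original path so its owner and mode are kept.
    cat "$env_tmp" > "$env_file" && rm -f "$env_tmp"
}

# Demonstration on a temporary file that stands in for conf/.env.
demo_env=$(mktemp)
printf '%s\n' 'LICENSING_SERVER_URL=https://pentaho.compliance.flexnetoperations.com/instances/<License Server ID>/request' > "$demo_env"

license_id="ABC123EXAMPLE"   # hypothetical value; use the identifier you were given
set_env_var "$demo_env" LICENSING_SERVER_URL \
    "https://pentaho.compliance.flexnetoperations.com/instances/${license_id}/request"
set_env_var "$demo_env" EMAIL_DOMAINS \
    "'[\"hv.com\", \"hitachivantara.com\", \"abc.com\"]'"

cat "$demo_env"
```

Rewriting the whole line, rather than substituting inside it with sed, avoids having to escape the slashes and quotes that appear in these values.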
Note:
- To update email or SMTP settings after installation, use the IAM APIs.
- During installation, you can also modify COMPOSE_PROFILES or other variables in the .env file to enable or disable specific services (for example, Physical Assets). For details, see Disable the Physical Assets feature from Data Catalog deployment in the Administer Pentaho Data Catalog guide.
Start all the Docker containers using the following command:
sh pdc.sh up
The installation is ready for use after all the Docker containers have successfully started.
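One way to confirm that the containers have started is to inspect the status column reported by `docker ps`. The sketch below is an illustration rather than a documented part of the product: `report_not_running` is a hypothetical helper, and the check covers every container on the host, so a container that is meant to exit after a one-time task would also be reported.

```shell
#!/bin/sh
# report_not_running
# Reads "name status..." lines on standard input, prints each container whose
# status does not start with "Up", and fails if any such container is found.
report_not_running() {
    not_up=0
    while read -r c_name c_status; do
        [ -n "$c_name" ] || continue
        case "$c_status" in
            Up*) ;;
            *) echo "not running: $c_name ($c_status)"; not_up=1 ;;
        esac
    done
    [ "$not_up" -eq 0 ]
}

if command -v docker >/dev/null 2>&1; then
    # List all containers, including stopped ones, as "name status".
    if ps_out=$(docker ps -a --format '{{.Names}} {{.Status}}'); then
        if printf '%s\n' "$ps_out" | report_not_running; then
            echo "All listed containers report an Up status."
        fi
    else
        echo "docker ps failed; check your Docker permissions."
    fi
else
    echo "docker not found; skipping the live check."
fi
```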
Open Data Catalog in your browser (Chrome is recommended) using the value of GLOBAL_SERVER_HOST_NAME set earlier in the conf/.env file, and confirm that the application is installed and running.
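If you are unsure which host name was recorded during installation, you can read it back from the .env file. This is a minimal sketch: `read_env_value` and the `PDC_ENV_FILE` override are hypothetical names, the value is assumed to be stored as a plain KEY=value line, and the `https://` scheme in the printed address is an assumption rather than something this article specifies.

```shell
#!/bin/sh
# read_env_value FILE KEY
# Prints the value assigned to KEY in FILE. If the key appears more than
# once, the last assignment is printed. Prints nothing if the key is absent.
read_env_value() {
    sed -n "s/^$2=//p" "$1" | tail -n 1
}

# PDC_ENV_FILE is a hypothetical override; by default the script looks for
# conf/.env relative to the current directory.
env_path="${PDC_ENV_FILE:-conf/.env}"

if [ -f "$env_path" ]; then
    pdc_host=$(read_env_value "$env_path" GLOBAL_SERVER_HOST_NAME)
    # The https scheme is an assumption; adjust it to match your deployment.
    echo "Try opening https://${pdc_host}/ in your browser."
else
    echo "No .env file found at $env_path"
fi
```

Run it from the pdc-docker-deployment directory so that the default conf/.env path resolves.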
Data Catalog provides a set of default users for demonstration and testing purposes. These default users have the following specific roles assigned:
Business User
A user who needs to view business-specific glossaries and dictionaries
Data User
A user who needs to use Data Catalog to find data for a business operation
Business Steward
A user who needs to maintain business-specific glossaries and dictionaries
Data Steward
A user who needs to update and process data in Data Catalog for use in a business operation, including migrating data if you have a license for Pentaho Data Optimizer
Admin
A user who needs to configure the product
Data Developer
A user who needs to create and update business or metadata rules
Data Storage Administrator
A user who monitors and manages storage utilization across data sources, folders, and schemas
For more information, see the User roles and permissions in Data Catalog section in the Use Pentaho Data Catalog guide.
Credentials for the default users are provided in an encrypted file in the installation package.
After installing Data Catalog, you may need to set up additional components, depending on your environment. For more information, see the Advanced configuration section in the Administer Pentaho Data Catalog guide.

