Install Pentaho Data Catalog

Includes setup steps for PDC, Remote Workers, and optional components.

Pentaho Data Catalog (PDC) is a solution for data governance, discovery, and cataloging. Installing Data Catalog in your environment enables you to manage structured and unstructured data using intelligent automation and machine learning, and lays the groundwork for advanced features such as Pentaho Data Optimizer and Pentaho Data Mastering (if licensed).

This guide helps you install PDC and its optional components across various deployment scenarios, ranging from a single-node server to a distributed, containerized environment. You can also deploy Remote Workers to support scalable and secure metadata processing across different network zones.

What’s included with installation

When you install PDC, the following components are also installed, depending on your license:

  • Pentaho Data Optimizer (PDO): Automates intelligent data tiering to object storage.

  • Pentaho Data Mastering (PDM): Enables advanced data mastering and curation workflows.

To download the release package for your server, use the credentials and download URL that Pentaho provides. For more information, contact Pentaho support.

To install Data Catalog, see the following topics:

To install and configure a remote worker, see Install and configure a Remote Worker.

For Data Catalog upgrade instructions, see Upgrade Data Catalog. To upgrade to a patch version, see Upgrade Data Catalog to a patch version.

For cloud-based deployments, see Hyperscalers.

Use the Chrome browser to access the PDC user interface after installation. Default users are available for demo environments. For production environments, we strongly recommend that you create your own user accounts and disable the default credentials.

For more help or access to downloads, visit the Pentaho Support Portal.

