PDC API Documentation

Explore the Pentaho Data Catalog (PDC) API, its capabilities, endpoint groups, and key use cases for developers, admins, stewards, and analysts.

Overview

The Pentaho Data Catalog API provides programmatic access to all major capabilities of the catalog. It enables developers, administrators, and data stewards to integrate catalog functions into applications, scripts, or automation pipelines without relying solely on the Data Catalog user interface.

The Data Catalog API follows a RESTful architecture, uses JSON for request and response bodies, and secures access with OAuth2 bearer tokens. All endpoints share a consistent URL structure, and resource endpoints are versioned.
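
As a rough illustration of these conventions, the sketch below builds (without sending) a JSON request that carries a bearer token. The host name, the token placeholder, the choice of POST for requests with a body, and the search payload are all assumptions made for the example; the actual request schemas are defined by the OpenAPI document of your deployment.

```python
import json
import urllib.request


def build_request(base_url, path, token, payload=None):
    """Build (but do not send) a JSON request carrying a bearer token."""
    headers = {
        "Authorization": f"Bearer {token}",
        "Accept": "application/json",
    }
    data = None
    if payload is not None:
        # JSON request bodies are sent as UTF-8 encoded bytes.
        data = json.dumps(payload).encode("utf-8")
        headers["Content-Type"] = "application/json"
    # Assumption for this sketch: requests with a body use POST, others GET.
    method = "POST" if data is not None else "GET"
    return urllib.request.Request(
        base_url.rstrip("/") + path, data=data, headers=headers, method=method
    )


# Hypothetical host and search body, used only to show the request shape.
req = build_request(
    "https://pdc.example.com",
    "/api/public/v1/search",
    token="<access-token>",
    payload={"query": "customer"},
)
# Sending it would be: urllib.request.urlopen(req)
```

Keeping request construction separate from sending makes it easy to inspect headers and bodies before any call reaches the catalog.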

Capabilities

The Data Catalog API exposes a wide range of endpoints that mirror the functionality available in the Pentaho Data Catalog user interface. By calling these endpoints, you can automate catalog operations, integrate metadata into external systems, and embed discovery and governance features directly into your applications. Using the PDC API, you can:

  • Check system health: Verify that your PDC instance is running and accessible.

  • Authenticate and authorize: Obtain a JWT bearer token to securely access all other endpoints.

  • Search assets: Query datasets, entities, and collections using search terms, filters, and facets.

  • Manage data sources: Create, retrieve, and manage data source connections.

  • Work with entities: Get metadata, update attributes, filter entities, and retrieve profiling information.

  • Run and monitor jobs: Trigger profiling, ingestion, or other background tasks and monitor their progress.

  • Manage datasets and collections: Create, update, and organize datasets, collections, categories, and groups.

  • Consume notifications: Subscribe to and retrieve system notifications.
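
Because the token obtained during authentication is a JWT, a long-running script can inspect it locally to decide when to authenticate again. The sketch below is one possible approach, not part of the PDC API: it assumes the token carries the standard `exp` claim (expiry in Unix seconds), and the helper names `jwt_expiry` and `needs_refresh` are hypothetical. It decodes the payload only and does not verify the signature.

```python
import base64
import json
import time


def jwt_expiry(token):
    """Return the 'exp' claim (Unix seconds) of a JWT, or None if absent.

    Only the payload segment is decoded; the signature is NOT verified.
    """
    payload_b64 = token.split(".")[1]
    # base64url segments omit padding; restore it before decoding.
    payload_b64 += "=" * (-len(payload_b64) % 4)
    claims = json.loads(base64.urlsafe_b64decode(payload_b64))
    return claims.get("exp")


def needs_refresh(token, now=None, leeway=60):
    """Return True if the token expires within `leeway` seconds.

    If the token has no 'exp' claim (an assumption that may not hold for
    your deployment), return False and rely on the server rejecting it.
    """
    exp = jwt_expiry(token)
    if exp is None:
        return False
    current = time.time() if now is None else now
    return exp - current <= leeway
```

A script would typically call `needs_refresh(token)` before each batch of requests and repeat the authentication step when it returns True.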

API base URL

All PDC API endpoints are exposed under the /api/public path of your PDC deployment. This path provides a consistent entry point for accessing versioned resources (for example, /api/public/v1/...) and unversioned system checks (for example, /api/public/health).
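
The path layout described above can be captured in a small helper so that scripts do not hard-code full URLs. This is a minimal sketch; the function name `public_url` and its parameters are illustrative and not part of the PDC API.

```python
PUBLIC_PREFIX = "/api/public"


def public_url(domain, resource, version="v1"):
    """Return a URL under /api/public.

    Pass version=None for unversioned paths such as the health check.
    """
    parts = [PUBLIC_PREFIX.strip("/")]
    if version:
        parts.append(version)
    parts.append(resource.strip("/"))
    return f"https://{domain}/" + "/".join(parts)


search_url = public_url("<your-domain>", "search")
health_url = public_url("<your-domain>", "health", version=None)
```

With `<your-domain>` replaced by a real host, `search_url` points at the versioned search endpoint and `health_url` at the unversioned health check.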

Entry points         URL pattern                                    Example
Swagger UI           https://<your-domain>/api/public/swagger/      https://10.177.176.228/api/public/swagger/
OpenAPI JSON         https://<your-domain>/api/public/swagger/json  https://10.177.176.228/api/public/swagger/json
Base prefix          https://<your-domain>/api/public               https://10.177.176.228/api/public
Versioned endpoints  https://<your-domain>/api/public/v1/...        https://10.177.176.228/api/public/v1/search
Health check         https://<your-domain>/api/public/health        https://10.177.176.228/api/public/health
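
A script can probe the health check entry point before issuing other calls. The sketch below assumes that a healthy instance answers a GET request with HTTP 200 and that the endpoint needs no token; neither is stated on this page, so confirm both in the Swagger UI. The `opener` parameter is a hypothetical hook that lets the function be exercised without a live server.

```python
import urllib.request


def is_healthy(domain, opener=urllib.request.urlopen, timeout=5):
    """Return True if GET /api/public/health answers with HTTP 200."""
    url = f"https://{domain}/api/public/health"
    try:
        with opener(url, timeout=timeout) as response:
            return response.status == 200
    except OSError:
        # urllib's URLError and HTTPError are OSError subclasses, so this
        # covers refused connections, DNS failures, timeouts, and HTTP
        # error statuses raised by urlopen.
        return False
```

Calling `is_healthy("<your-domain>")` with a real host returns False on any connection or HTTP error, which lets an automation pipeline skip or retry its catalog calls instead of crashing.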

Endpoint groups

For easier navigation, the PDC API organizes its endpoints into logical domains. Each group focuses on a specific area of functionality, such as authentication, searching, or managing data sources, so that developers can quickly locate the endpoints they need. The main groups are:

Who should use the API

The PDC API is designed to support a variety of roles across your data ecosystem:

  • 👩‍💻 Developers can integrate catalog capabilities directly into applications, extend existing solutions, or build automation workflows.

  • 🛠️ Administrators can programmatically manage data sources, run and monitor jobs, and enforce governance policies at scale.

  • 🗂️ Data stewards can streamline metadata curation by automating tagging, labeling, and business term assignments.

  • 📊 Analysts can query curated data assets and bring them directly into their analytics and reporting pipelines.

  • 👥 Business users can consume notifications, track the status of data assets, or run lightweight scripts to stay informed without using the full UI.

