PDC API Documentation
Explore the Pentaho Data Catalog (PDC) API, its capabilities, endpoint groups, and key use cases for developers, admins, stewards, and analysts.
Overview
The Pentaho Data Catalog API provides programmatic access to the major capabilities of the catalog. It enables developers, administrators, and data stewards to integrate catalog functions into applications, scripts, or automation pipelines without relying solely on the Data Catalog user interface.
The Data Catalog API follows a RESTful architecture, uses JSON for request and response bodies, and secures access with OAuth2 bearer tokens. All endpoints are exposed through a consistent, versioned URL structure.
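As a rough illustration of the conventions above (JSON bodies plus a bearer token), the following Python sketch prepares a request without sending it. The helper name build_json_request, the Accept header, and the sample payload are assumptions made for illustration; the real request schemas are defined in the API reference.

```python
import json
import urllib.request


def build_json_request(url, token, payload=None, method="GET"):
    """Prepare (but do not send) a JSON request that carries a bearer token.

    Hypothetical helper: the name and parameters are not part of the PDC API.
    """
    headers = {
        # Bearer token obtained from the Auth endpoint group.
        "Authorization": f"Bearer {token}",
        "Accept": "application/json",
    }
    data = None
    if payload is not None:
        # Request bodies are JSON, so serialize and label them accordingly.
        data = json.dumps(payload).encode("utf-8")
        headers["Content-Type"] = "application/json"
    return urllib.request.Request(url, data=data, headers=headers, method=method)
```

The returned object can be passed to urllib.request.urlopen once a valid token is available. Which HTTP method and body fields a given endpoint expects is not stated on this page, so check the API reference before sending.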
Capabilities
The Data Catalog API exposes a wide range of endpoints that mirror the functionality available in the Pentaho Data Catalog user interface. By calling these endpoints, you can automate catalog operations, integrate metadata into external systems, and embed discovery and governance features directly into your applications. Using the PDC API, you can:
Check system health: Verify that your PDC instance is running and accessible.
Authenticate and authorize: Obtain a JWT bearer token to securely access all other endpoints.
Search assets: Query datasets, entities, and collections using search terms, filters, and facets.
Manage data sources: Create, retrieve, and manage data source connections.
Work with entities: Get metadata, update attributes, filter entities, and retrieve profiling information.
Run and monitor jobs: Trigger profiling, ingestion, or other background tasks and monitor their progress.
Manage datasets and collections: Create, update, and organize datasets, collections, categories, and groups.
Consume notifications: Subscribe to and retrieve system notifications.
API base URL
All PDC API endpoints are exposed under the /api/public path of your PDC deployment. This path provides a consistent entry point for accessing versioned resources (for example, /api/public/v1/...) and unversioned system checks (for example, /api/public/health).
Swagger UI: https://<your-domain>/api/public/swagger/ (example: https://10.177.176.228/api/public/swagger/)
OpenAPI JSON: https://<your-domain>/api/public/swagger/json (example: https://10.177.176.228/api/public/swagger/json)
Base prefix: https://<your-domain>/api/public (example: https://10.177.176.228/api/public)
Versioned endpoints: https://<your-domain>/api/public/v1/... (example: https://10.177.176.228/api/public/v1/search)
Health check: https://<your-domain>/api/public/health (example: https://10.177.176.228/api/public/health)
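The URL structure above can be captured in a small helper. This is a minimal sketch, assuming HTTPS and the /api/public prefix described in this section; the function name pdc_url and its parameters are hypothetical and not part of the PDC API.

```python
def pdc_url(domain, path="", version="v1"):
    """Build a PDC API URL from a domain and a relative path.

    Hypothetical helper. Pass version=None for unversioned resources
    such as the health check.
    """
    parts = [f"https://{domain}/api/public"]
    if version:
        parts.append(version)
    if path:
        # Tolerate leading or trailing slashes in the caller's path.
        parts.append(path.strip("/"))
    return "/".join(parts)


# Versioned endpoint, matching the example column above.
print(pdc_url("10.177.176.228", "search"))
# Unversioned system check.
print(pdc_url("10.177.176.228", "health", version=None))
```

Keeping the version as a parameter means a script only changes in one place if a later API version is introduced.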
Endpoint groups
For easier navigation, the PDC API organizes its endpoints into logical domains. Each group focuses on a specific area of functionality, such as authentication, searching, or managing data sources, so that developers can quickly locate the endpoints they need. The main groups are:
Health: Verify that the PDC API is running and reachable; useful for monitoring or automation scripts. Endpoint: /api/public/health
Auth: Authenticate with username and password to obtain a bearer token for secure API access. Endpoint: /api/public/v1/auth
Notifications: Retrieve or create notifications to track system events, data changes, or catalog activity. Endpoint: /api/public/v1/notifications
Search: Search catalog assets and retrieve facets to filter results for discovery and analytics. Endpoints: /api/public/v1/search, /api/public/v1/search/facets
Data Sources: Create, retrieve, and manage data source connections across databases, files, and cloud stores. Endpoint: /api/public/v1/data-sources
Data Entities: Get, update, or filter entities and fetch profiling information for metadata analysis. Endpoints: /api/public/v1/entities/... (single, bulk, filter, profiling)
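As an example of using the Health group from a monitoring script, the sketch below probes /api/public/health and reports whether the instance responded. It assumes that an HTTP 200 status means healthy, which this page does not confirm; the function name is_healthy and the injectable opener parameter are illustrative choices.

```python
import urllib.error
import urllib.request


def is_healthy(base_url, opener=urllib.request.urlopen, timeout=5.0):
    """Return True if the health endpoint answers with HTTP 200.

    Assumption: a 200 status signals a healthy instance. The response
    body format is not described here, so it is not inspected.
    """
    url = base_url.rstrip("/") + "/api/public/health"
    try:
        with opener(url, timeout=timeout) as response:
            return response.status == 200
    except (urllib.error.URLError, OSError):
        # Connection refused, DNS failure, timeout, or an HTTP error status.
        return False
```

A call such as is_healthy("https://<your-domain>") can gate later steps in an automation pipeline. The opener parameter exists so the function can be exercised with a stub instead of a live PDC instance.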
Who should use the API
The PDC API is designed to support a variety of roles across your data ecosystem:
👩‍💻 Developers can integrate catalog capabilities directly into applications, extend existing solutions, or build automation workflows.
🛠️ Administrators can programmatically manage data sources, run and monitor jobs, and enforce governance policies at scale.
🗂️ Data stewards can streamline metadata curation by automating tagging, labeling, and business term assignments.
📊 Analysts can query curated data assets and bring them directly into their analytics and reporting pipelines.
👥 Business users can consume notifications, track the status of data assets, or run lightweight scripts to stay informed without using the full UI.
Next steps
Get started with PDC API — Try your first requests in minutes.
Authentication — Learn how to obtain and refresh tokens.
Errors — Explore error codes, error response schema, and troubleshooting.
Troubleshooting guide — Diagnose common issues, interpret error responses, and apply best practices to resolve problems.
API Reference — Browse detailed endpoints by domain.