# PDC API Documentation v2

## Overview

The Pentaho Data Catalog API provides programmatic access to all major capabilities of the catalog. It enables developers, administrators, and data stewards to integrate catalog functions into applications, scripts, or automation pipelines without relying solely on the Data Catalog user interface.

The Data Catalog API follows a **RESTful architecture**, uses **JSON** for request and response bodies, and secures access with **OAuth2 bearer tokens**. All endpoints are exposed through a consistent, versioned URL structure.
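
As a rough illustration of these conventions, the Python sketch below builds (but does not send) a JSON request that carries a bearer token in the `Authorization` header. The host `pdc.example.com`, the token value, and the request body are placeholders, and `build_request` is a hypothetical helper, not part of any PDC client library. Check the API reference for the actual request schemas.

```python
import json
import urllib.request


def build_request(url, token, payload=None):
    """Build (without sending) a JSON request that carries a bearer token.

    Hypothetical helper for illustration only.
    """
    headers = {
        "Authorization": f"Bearer {token}",
        "Accept": "application/json",
    }
    data = None
    if payload is not None:
        # Request bodies are JSON, so serialize and declare the content type.
        data = json.dumps(payload).encode("utf-8")
        headers["Content-Type"] = "application/json"
    method = "POST" if data is not None else "GET"
    return urllib.request.Request(url, data=data, headers=headers, method=method)


# Placeholder host, token, and body; none of these values come from a real deployment.
request = build_request(
    "https://pdc.example.com/api/public/v1/search",
    token="<your-token>",
    payload={"example": "body"},  # assumed shape, see the API reference
)
print(request.get_method(), request.full_url)
# urllib.request.urlopen(request) would send it to a reachable PDC instance.
```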

## Capabilities

The Data Catalog API exposes endpoints that mirror the functionality available in the Pentaho Data Catalog user interface. By calling these endpoints, you can automate catalog operations, integrate metadata into external systems, and embed discovery and governance features directly into your applications. Using the PDC API, you can:

* **Check system health**\
  Verify that your PDC instance is running and accessible.
* **Authenticate and authorize**\
  Obtain a JWT bearer token to securely access all other endpoints.
* **Search assets**\
  Query datasets, entities, and collections using search terms, filters, and facets.
* **Manage data sources**\
  Create, retrieve, and manage data source connections.
* **Work with entities**\
  Get metadata, update attributes, filter entities, and retrieve profiling information.
* **Run and monitor jobs**\
  Trigger profiling, ingestion, or other background tasks and monitor their progress.
* **Manage datasets and collections**\
  Create, update, and organize datasets, collections, categories, and groups.
* **Consume notifications**\
  Subscribe to and retrieve system notifications.
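
The "run and monitor jobs" capability usually comes down to a polling loop. The Python sketch below shows one way to structure it, assuming you supply a `fetch_status` callable that returns the job's current state. The state names `COMPLETED` and `FAILED`, the interval, and the timeout are all assumptions for illustration. Consult the jobs reference for the real status values.

```python
import time


def wait_for_job(fetch_status, done_states=("COMPLETED", "FAILED"),
                 interval=5.0, timeout=600.0,
                 sleep=time.sleep, clock=time.monotonic):
    """Poll fetch_status() until it returns a terminal state or time runs out.

    fetch_status is a caller-supplied function (for example, one that calls
    the job status endpoint). The default state names are hypothetical.
    """
    deadline = clock() + timeout
    while True:
        status = fetch_status()
        if status in done_states:
            return status
        if clock() >= deadline:
            raise TimeoutError(f"job still in state {status!r} after {timeout}s")
        sleep(interval)


# Usage with a stand-in status source instead of a real API call.
fake_statuses = iter(["RUNNING", "RUNNING", "COMPLETED"])
final = wait_for_job(lambda: next(fake_statuses), interval=0)
print(final)
```

Passing `sleep` and `clock` as parameters keeps the loop easy to test without real delays.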

## API base URL

All PDC API endpoints are exposed under the `/api/public` path of your PDC deployment. This path provides a consistent entry point for accessing versioned resources, such as `/api/public/v2/...`.

<table><thead><tr><th width="188">Entry points</th><th>URL pattern</th><th>Example</th></tr></thead><tbody><tr><td><strong>Swagger UI</strong></td><td><code>https://&#x3C;your-domain>/api/public/v2/swagger/</code></td><td><code>https://10.177.176.228/api/public/v2/swagger/</code></td></tr><tr><td><strong>OpenAPI JSON</strong></td><td><code>https://&#x3C;your-domain>/api/public/v2/swagger/json</code></td><td><code>https://10.177.176.228/api/public/v2/swagger/json</code></td></tr><tr><td><strong>Base prefix</strong></td><td><code>https://&#x3C;your-domain>/api/public/v2/</code></td><td><code>https://10.177.176.228/api/public/v2/</code></td></tr></tbody></table>
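
The URL patterns in the table can be assembled with a small helper. The following Python sketch is one possible approach. `pdc_url` is a hypothetical function name, and the domain is the example address from the table.

```python
def pdc_url(domain, *segments, version="v2", trailing_slash=False):
    """Join path segments onto https://<domain>/api/public/<version>/.

    Hypothetical helper; it only formats strings and makes no network calls.
    """
    base = f"https://{domain}/api/public/{version}/"
    path = "/".join(s.strip("/") for s in segments if s.strip("/"))
    url = base + path
    if trailing_slash and not url.endswith("/"):
        url += "/"
    return url


domain = "10.177.176.228"  # example address from the table above
print(pdc_url(domain))                                  # base prefix
print(pdc_url(domain, "swagger", trailing_slash=True))  # Swagger UI
print(pdc_url(domain, "swagger", "json"))               # OpenAPI JSON
```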

## Endpoint groups

For easier navigation, the PDC API organizes its endpoints into logical domains. Each group focuses on a specific area of functionality, such as authentication, searching, or managing data sources, so that developers can quickly locate the endpoints they need. The main groups are:

<table data-view="cards"><thead><tr><th>Group</th><th>Description</th><th data-hidden>Endpoint(s)</th><th data-hidden data-card-target data-type="content-ref"></th></tr></thead><tbody><tr><td><a data-mention href="pdc-api-ref-v2/auth">auth</a></td><td>Authenticate with username and password to obtain a bearer token for secure API access.</td><td><code>/api/public/v1/auth</code></td><td><a href="../pdc-api-ref-v2/auth#post-api-public-v1-auth">#post-api-public-v1-auth</a></td></tr><tr><td><a data-mention href="pdc-api-ref-v2/data-entities">data-entities</a></td><td>Get, update, or filter entities and fetch profiling information for metadata analysis.</td><td><code>/api/public/v1/entities/...</code> (single, bulk, filter, profiling)</td><td><a href="../pdc-api-ref-v2/data-entities#get-api-public-v1-entities-id">#get-api-public-v1-entities-id</a></td></tr><tr><td><a data-mention href="pdc-api-ref-v2/data-sources">data-sources</a></td><td>Create, retrieve, and manage data source connections across databases, files, and cloud stores.</td><td><code>/api/public/v1/data-sources</code></td><td><a href="../pdc-api-ref-v2/data-sources#get-api-public-v1-data-sources-id">#get-api-public-v1-data-sources-id</a></td></tr><tr><td><a data-mention href="pdc-api-ref-v2/data-collections">data-collections</a></td><td>Manage datasets, collections, categories, and groups to organize catalog content.</td><td><code>/api/public/v1/data-collections</code></td><td><a href="../pdc-api-ref-v2/data-collections#get-api-public-v1-data-collections-id">#get-api-public-v1-data-collections-id</a></td></tr><tr><td><a data-mention href="pdc-api-ref-v2/jobs">jobs</a></td><td>Run and monitor background jobs such as profiling, ingestion, or schema scanning.</td><td><code>/api/public/v1/job/execution</code></td><td><a href="../pdc-api-ref-v2/jobs#get-api-public-v1-jobs-id-status">#get-api-public-v1-jobs-id-status</a></td></tr><tr><td><a data-mention href="pdc-api-ref-v2/licensing">licensing</a></td><td>Retrieve licensing information and manage offline licenses in Data Catalog.</td><td><code>/api/public/v1/licensing/licenses</code></td><td><a href="pdc-api-ref-v2/licensing">licensing</a></td></tr><tr><td><a data-mention href="pdc-api-ref-v2/notifications">notifications</a></td><td>Retrieve or create notifications to track system events, data changes, or catalog activity.</td><td><code>/api/public/v1/notifications</code></td><td><a href="../pdc-api-ref-v2/notifications#get-api-public-v1-notifications">#get-api-public-v1-notifications</a></td></tr><tr><td><a data-mention href="pdc-api-ref-v2/search">search</a></td><td>Search catalog assets and retrieve facets to filter results for discovery and analytics.</td><td><code>/api/public/v1/search</code>, <code>/api/public/v1/search/facets</code></td><td><a href="../pdc-api-ref-v2/search#post-api-public-v1-search">#post-api-public-v1-search</a></td></tr></tbody></table>
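
To see how endpoints fall into these groups on your own deployment, you could group the paths from the OpenAPI JSON by their first segment after the version. The Python sketch below assumes the spec's `paths` keys carry the `/api/public/<version>/` prefix, which may not hold for every deployment. `group_paths` is a hypothetical helper, and `sample_spec` is a hand-written stand-in built from paths listed in the table, not a real response.

```python
from collections import defaultdict

# Operation keys allowed on an OpenAPI path item.
HTTP_METHODS = {"get", "put", "post", "delete", "options", "head", "patch", "trace"}


def group_paths(spec, prefix="/api/public/"):
    """Group OpenAPI paths by the segment that follows /api/public/<version>/."""
    groups = defaultdict(list)
    for path, operations in spec.get("paths", {}).items():
        if not path.startswith(prefix):
            continue
        parts = path[len(prefix):].split("/")
        # parts[0] is the version (for example "v1"); parts[1] names the group.
        if len(parts) < 2 or not parts[1]:
            continue
        methods = sorted(m.upper() for m in operations if m in HTTP_METHODS)
        groups[parts[1]].append((path, methods))
    return dict(groups)


# Stand-in for the parsed OpenAPI JSON document.
sample_spec = {
    "paths": {
        "/api/public/v1/auth": {"post": {}},
        "/api/public/v1/search": {"post": {}},
        "/api/public/v1/search/facets": {"post": {}},  # method assumed
        "/api/public/v1/notifications": {"get": {}},
    }
}

for group, entries in group_paths(sample_spec).items():
    print(group, entries)
```

In practice you would load the spec with `json.load` from the OpenAPI JSON entry point of your deployment and pass the resulting dictionary to `group_paths`.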

## Who should use the API

The PDC API is designed to support a variety of roles across your data ecosystem:

* 👩‍💻 **Developers** can integrate catalog capabilities directly into applications, extend existing solutions, or build automation workflows.
* 🛠️ **Administrators** can programmatically manage data sources, run and monitor jobs, and enforce governance policies at scale.
* 🗂️ **Data stewards** can streamline metadata curation by automating tagging, labeling, and business term assignments.
* 📊 **Analysts** can query curated data assets and bring them directly into their analytics and reporting pipelines.
* 👥 **Business users** can consume notifications, track the status of data assets, or run lightweight scripts to stay informed without using the full UI.

***

## Next steps

* [Get started with PDC API](https://docs.pentaho.com/pdc-api-docs/get-started-with-pdc-api-v2) — Try your first requests in minutes.
* [Authentication](https://docs.pentaho.com/pdc-api-docs/get-started-with-pdc-api-v2/authentication) — Learn how to obtain and refresh tokens.
* [Errors](https://docs.pentaho.com/pdc-api-docs/get-started-with-pdc-api-v2/errors) — Explore error codes, error response schema, and troubleshooting.
* [Troubleshooting guide](https://docs.pentaho.com/pdc-api-docs/get-started-with-pdc-api-v2/troubleshooting-guide) — Diagnose common issues, interpret error responses, and apply best practices to resolve problems.
* [API Reference](https://docs.pentaho.com/pdc-api-docs/pdc-api-ref-v2) — Browse detailed endpoints by domain.
