Resource properties

Pentaho Data Catalog lets you modify discovered resource properties and add user-defined metadata to a resource. This metadata can capture business value or communicate the data quality of the resource.

Properties

Pentaho Data Catalog discovers metadata properties of resources during the Data Profiling (for structured data) and Data Discovery (for unstructured data) processes. Data Catalog uses standard properties as the default, giving you general information about the resource based on metadata standards.

After you change any properties, save your changes and rerun the Data Profiling and Data Discovery processes.

Custom properties

Custom properties are user-defined metadata that extend the standard metadata available in Data Catalog. They help capture additional business, technical, or operational information that contributes to the overall business value of your data assets.

Custom properties appear in the Custom Properties pane on the Summary tab when you select a resource. You can use them to describe how a resource is used, who owns it, or any other business-specific context. The description below each field typically explains how the property is used or what values it can take. For example:

Category         | Example custom property | Example values
Business context | Business Unit           | Retail, Finance, HR
Data management  | Data Owner              | John Smith, Jane Doe
Compliance       | Regulatory Zone         | GDPR, HIPAA, CCPA
Operations       | Criticality Level       | High, Medium, Low
Technical        | Source System           | SAP, Salesforce, Snowflake

Custom properties can be defined for multiple asset types in Pentaho Data Catalog, including:

  • Data resources (datasets, files, or tables)

  • Business glossaries and terms

  • Applications

  • Policies and standards

  • Physical assets

  • Machine learning (ML) models

For more information on adding, editing, or removing custom properties, see Manage custom properties in the Administer Pentaho Data Catalog guide.

Benefits of using custom properties

Implementing custom properties offers several key advantages:

  • Enhanced Metadata Management: Capture and organize additional information.

  • Improved Data Discovery: Enable users to search, filter, and group assets using business-defined attributes.

  • Stronger Governance: Support ownership tracking, compliance alignment, and data stewardship accountability.

  • Business Context Integration: Bridge the gap between technical metadata and business terminology, making data more accessible across teams.

  • Flexible Reporting & Analysis: Enrich metadata insights for dashboards, lineage views, and other catalog features.
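The search, filter, and group benefit can be pictured with a small sketch. The Python below is not the Data Catalog API: it models assets as plain dictionaries, and the asset names, the `custom_properties` key, and both helper functions are hypothetical. Only the property names and values come from the example table earlier in this section.

```python
from collections import defaultdict

# Hypothetical in-memory assets; this structure is an assumption for
# illustration and does not reflect how Data Catalog stores metadata.
assets = [
    {"name": "orders", "custom_properties": {"Business Unit": "Retail", "Criticality Level": "High"}},
    {"name": "payroll", "custom_properties": {"Business Unit": "HR", "Criticality Level": "High"}},
    {"name": "web_logs", "custom_properties": {"Business Unit": "Retail", "Criticality Level": "Low"}},
]


def filter_assets(assets, prop, value):
    """Return the assets whose custom property `prop` equals `value`."""
    return [a for a in assets if a["custom_properties"].get(prop) == value]


def group_assets(assets, prop):
    """Group asset names by the value of custom property `prop`."""
    groups = defaultdict(list)
    for a in assets:
        # Assets that lack the property are collected under "(unset)".
        groups[a["custom_properties"].get(prop, "(unset)")].append(a["name"])
    return dict(groups)


high_criticality = [a["name"] for a in filter_assets(assets, "Criticality Level", "High")]
by_business_unit = group_assets(assets, "Business Unit")
```

Because every asset carries the same business-defined keys, filtering and grouping reduce to simple lookups, which is the practical payoff of keeping custom property names consistent.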

Data labels

Data labels in Data Catalog are structured metadata elements defined as key-value pairs. They attach standardized, machine-readable labels, derived from data content, to data assets such as tables, columns, datasets, and files. Data labels are intended for consumers of data such as ML models. Although they share the key-value format of custom properties, data labels serve the specific purpose of data labeling, a concept required for supervised machine learning.

This structured approach helps you manage metadata more effectively, particularly for use cases involving AI, machine learning, and data governance. With data labels, you can classify data using consistent terminology, such as 'Sensitivity: Confidential' or 'Data Quality: High', which improves model training, search accuracy, and compliance.
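As a rough illustration of how key-value data labels could feed a supervised learning workflow, the following Python sketch turns labeled assets into (asset, label) pairs for one label key. It is an assumption-laden example, not Data Catalog functionality: the asset names, the `labels` field, the 'Public' value, and both helper functions are hypothetical. The 'Sensitivity' and 'Data Quality' keys come from the paragraph above.

```python
# Hypothetical assets with data labels stored as key-value pairs.
labeled_assets = [
    {"asset": "customers.email", "labels": {"Sensitivity": "Confidential", "Data Quality": "High"}},
    {"asset": "products.sku", "labels": {"Sensitivity": "Public", "Data Quality": "High"}},
    {"asset": "staff.salary", "labels": {"Sensitivity": "Confidential"}},
]


def training_pairs(assets, label_key):
    """Build (asset name, label value) pairs for one label key.

    Assets that do not carry `label_key` are skipped, since a supervised
    model needs a target value for every training example.
    """
    return [(a["asset"], a["labels"][label_key]) for a in assets if label_key in a["labels"]]


def label_vocabulary(assets, label_key):
    """Return the sorted set of distinct values used for `label_key`."""
    return sorted({a["labels"][label_key] for a in assets if label_key in a["labels"]})


sensitivity_pairs = training_pairs(labeled_assets, "Sensitivity")
quality_pairs = training_pairs(labeled_assets, "Data Quality")
sensitivity_classes = label_vocabulary(labeled_assets, "Sensitivity")
```

Keeping label keys and values consistent across assets matters here: a misspelled value would silently become a separate class in `sensitivity_classes`.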
