Machine Learning (ML) Models
In today's data-driven landscape, Machine Learning (ML) models play a crucial role in extracting insights, making predictions, and automating decision-making. Pentaho Data Catalog introduces ML Model Tracking, which allows organizations to manage, analyze, and track ML model metadata across the entire lifecycle—from experimentation to production deployment.
With ML Model Tracking, you can:
Discover and manage ML model metadata within Data Catalog.
Track model performance metrics across multiple experiments and runs.
Ensure traceability by viewing model dependencies, artifacts, and lineage.
Enhance collaboration by enabling teams to log, retrieve, and analyze ML model metadata.
Ensure governance and compliance by monitoring data drift and performance degradation.
In ML Model Tracking, Data Catalog integrates with MLflow to import and catalog experiments, runs, model versions, parameters, metrics, and artifacts into a structured hierarchy known as ML Models. This enables organizations to:
Centralize ML metadata management by storing and organizing ML Models in one place.
Ensure experiment reproducibility by tracking datasets, parameters, and artifacts for each model version.
Enhance collaboration by enabling data scientists, analysts, and business users to easily access ML model metadata.
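In practice, the metadata that Data Catalog imports from MLflow (experiments, runs, parameters, metrics, artifacts, and model versions) is produced by standard MLflow tracking calls in training code. The following is a minimal Python sketch of such a training script; the tracking URI, experiment name, and registered model name are illustrative placeholders rather than values required by Data Catalog.

# Minimal MLflow tracking sketch. It logs the kinds of metadata
# (experiment, run, parameters, metrics, tags, artifacts, model version)
# that Data Catalog can later import into the ML Models hierarchy.
# The tracking URI and all names below are hypothetical placeholders.
import mlflow
from sklearn.datasets import load_iris
from sklearn.linear_model import LogisticRegression

mlflow.set_tracking_uri("http://mlflow.example.com:5000")   # hypothetical MLflow server
mlflow.set_experiment("churn-prediction")                   # hypothetical experiment name

X, y = load_iris(return_X_y=True)

with mlflow.start_run(run_name="baseline"):
    params = {"C": 1.0, "max_iter": 200}
    model = LogisticRegression(**params).fit(X, y)

    mlflow.log_params(params)                          # hyperparameters
    mlflow.log_metric("accuracy", model.score(X, y))   # evaluation metric
    mlflow.set_tag("stage", "experimentation")         # run tag

    # Store the trained model as an artifact and register it as a model version.
    mlflow.sklearn.log_model(
        model,
        artifact_path="model",
        registered_model_name="churn-classifier",      # hypothetical model name
    )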
Data Catalog supports two categories of ML model servers:
Pre-Production Model Servers
Integrating pre-production model servers such as MLflow into Data Catalog allows organizations to systematically capture and manage metadata from model development, testing, and validation. This integration improves early-stage lifecycle management by ensuring that experiments, runs, model versions, and associated datasets are traceable and reproducible.
For Data Scientists – It simplifies experiment tracking by centralizing runs, parameters, and metrics, enabling efficient comparisons between models and faster selection of the best-performing candidates.
For ML Engineers – It ensures that models moving toward deployment are well-documented, with clear lineage of datasets, artifacts, and configurations that can be validated before production release.
For Compliance Officers and Governance Teams – It creates an audit trail for training activities, ensuring that sensitive data is used appropriately and that experiment history is available for regulatory review.
By providing visibility into training and experimentation, Data Catalog helps organizations build a strong foundation of trust before models progress into production environments.
Production Model Servers
The integration of production model servers such as NVIDIA Triton into Data Catalog extends ML Model Tracking from experimentation to real-world deployment. This unified approach enhances model lifecycle management by integrating training metadata from pre-production servers, such as MLflow, with inference performance and health data from Triton.
For Data Scientists – It ensures experiment reproducibility while providing feedback from live production models, enabling faster iteration and validation of hypotheses.
For ML Engineers – It streamlines deployment monitoring with visibility into operational metrics such as latency, inference counts, and memory usage, helping optimize production workloads.
For Compliance Officers and Governance Teams – It strengthens oversight by linking business terms, policies, and lineage across both training and production stages, ensuring transparency and regulatory compliance.
By combining pre-production and production perspectives, Data Catalog becomes a single trusted source for managing machine learning assets across their entire lifecycle.
ML Models components
The ML Models section in Data Catalog organizes machine learning metadata in a hierarchical structure. This hierarchy differs depending on whether the metadata originates from a Pre-Production Model Server (such as MLflow) or a Production Model Server (such as NVIDIA Triton).
By supporting both pre-production hierarchies such as MLflow and production hierarchies such as Triton, Pentaho Data Catalog delivers a complete view of the machine learning lifecycle. It provides upstream visibility through experiment tracking, lineage, and reproducibility, while also offering downstream visibility into real-world inference performance, deployment status, and operational health. Governance and compliance are strengthened by enabling monitoring across both training and production environments. At the same time, collaboration is enhanced by giving data scientists, ML engineers, and business teams access to a single, trusted catalog of ML assets.
To configure and import ML model server components, see the Configure for ML model servers topic in the Administer Pentaho Data Catalog guide.
Pre-Production Model Servers (MLflow)
Pre-production servers capture the metadata generated during model development, testing, and validation. In Data Catalog, MLflow integration provides a structured hierarchy to help users easily navigate, manage, and track machine learning models, their versions, experiments, and associated metadata. The hierarchy ensures that every component of an ML model server, such as ML models, versions, experiments, and runs, is well-organized and accessible.
The following table lists the components of the ML Models hierarchy in Data Catalog:

ML Models
└── Model Server (MLflow)
    └── Model
        ├── Version
        └── Experiment
            └── Run

ML model server
A centralized repository where machine learning (ML) models are trained, stored, and deployed, enabling model tracking, serving, and monitoring.
ML model
A trained machine learning artifact that makes predictions based on data. Each model is created using algorithms, datasets, and hyperparameters, and it produces versions, runs, and artifacts during training.
Version
An iteration of a model that evolves over time through continuous training and experimentation.
Experiment
A systematic process of training and evaluating a model with various configurations, datasets, or hyperparameters to identify the most effective version.
Run
An individual training attempt of an ML model, where various parameters and datasets are used to evaluate different approaches.
Each run records:
ML Run Tags: Key–value pairs associated with a specific ML run that provide additional metadata for tracking, searching, and organizing experiments in ML Model Tracking.
Parameters: Hyperparameters used for training, such as learning rate and batch size.
Metrics: Model evaluation results, such as accuracy, RMSE, and MSE.
Run ID: A unique identifier for the execution.
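The run attributes listed above map directly to the metadata stored by an MLflow tracking server. As an illustration only, a minimal Python sketch using the MLflow client API might retrieve them as follows; the tracking URI and run ID are hypothetical placeholders.

# Sketch of reading back run-level metadata from MLflow. The fields shown
# correspond to the Run attributes described above (tags, parameters,
# metrics, and run ID). The tracking URI and run ID are placeholders.
from mlflow.tracking import MlflowClient

client = MlflowClient(tracking_uri="http://mlflow.example.com:5000")
run = client.get_run("0a1b2c3d4e5f")       # hypothetical run ID

print(run.info.run_id)      # Run ID: unique identifier for the execution
print(run.info.status)      # run status, for example FINISHED or FAILED
print(run.data.tags)        # ML run tags (key-value pairs)
print(run.data.params)      # parameters, for example learning rate and batch size
print(run.data.metrics)     # metrics, for example accuracy, RMSE, and MSE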
Production Model Servers (NVIDIA Triton)
Production servers capture metadata about models actively deployed for inference. In Data Catalog, NVIDIA Triton integration provides a hierarchy focused on deployment and operational metrics. The hierarchy contains the model server, models, and respective versions. This hierarchy allows teams to monitor performance, validate deployment health, and optimize production inference workloads.
The following table lists the components of the ML Models hierarchy (imported from Production model server) in Data Catalog:

ML model server
Represents the Triton inference server. Includes environment (Development, QA, Production), status (Draft, Review, Accepted, Imported), health check, and server liveness.
ML model
A deployed model registered in Triton, with details such as model name, backend framework (for example, TensorRT, TensorFlow, ONNX), supported inputs/outputs, and batch configuration.
Version
A specific deployed version of the model. Each version includes:
Configuration: platform, input/output tensors, max batch size, version policy, and instance group.
Statistics: last inference timestamp, inference count, execution count, batch statistics, and memory usage.
Metrics: request counts, latency breakdowns (queue, compute input, compute, compute output), success/failure counters, and histograms for response times.
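The configuration, statistics, and metrics described above are exposed by the Triton Inference Server's own client APIs, which is where integrations typically read them from. The following minimal Python sketch uses the tritonclient package to query this information; the server URL and model name are hypothetical placeholders, and the exact response fields may vary by Triton version.

# Sketch of querying a Triton Inference Server for the kinds of operational
# metadata described above: liveness, model configuration, and per-version
# inference statistics. The URL and model name are placeholders.
import tritonclient.http as httpclient

client = httpclient.InferenceServerClient(url="triton.example.com:8000")

print(client.is_server_live())    # server liveness
print(client.is_server_ready())   # server health / readiness

# Model configuration: platform, input/output tensors, max batch size, ...
config = client.get_model_config("churn-classifier")
print(config.get("platform"), config.get("max_batch_size"))

# Per-version statistics: inference count, execution count, latency breakdown, ...
stats = client.get_inference_statistics(model_name="churn-classifier", model_version="1")
print(stats["model_stats"][0]["inference_stats"])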
Tour the ML Models page
In Data Catalog, the ML Models page provides a user-friendly interface for managing and viewing ML model servers and their components. To access and explore the ML Models page, in the left navigation menu, click ML Models. This page is divided into two primary areas: the navigation pane and the content pane.

ML Models navigation pane
In Data Catalog, on the left navigation pane, you see a list of ML Model Servers and their components, organized in a hierarchical tree structure. The hierarchy differs depending on the type of server:
Pre-Production Model Servers (for example, MLflow) include ML Models, Versions, Experiments, and Runs. These nodes allow you to track training jobs, compare experiments, and analyze model versions.
Production Model Servers (for example, NVIDIA Triton) include ML Models and Versions. These nodes allow you to view deployment details, configuration, and operational metrics for models actively serving inference.
You can explore this hierarchy to locate specific ML Model components for further analysis. Additionally, you can choose View Table or View Galaxy under Actions to view the ML Models hierarchy in a tabular or spatial layout. To configure and import ML Model server components, see Configure for ML Model Servers in the Administer Pentaho Data Catalog guide.
If required, you can also manually create ML model server components using the following options available under Actions:
Add New Model Server: Creates a new ML model server in Data Catalog to track and manage ML Models and their metadata.
Add New Model: Creates a new ML model under a registered model server, which can be tracked, versioned, and managed within Data Catalog. For MLflow, this model can be versioned and linked to experiments and runs. For Triton, this model represents a deployed asset with configuration and performance metadata.
Add New Experiment (Pre-Production only): Creates a new ML experiment within an ML model to log multiple training runs and compare different model configurations.
Add New Version: Adds a new ML model version, representing an improved or modified iteration of an existing model. In MLflow, this represents an iteration of a model with updated parameters or datasets. In Triton, this represents a deployed model version with inference statistics and metrics.
Add New Run (Pre-Production only): Logs a new ML run under an experiment, capturing metadata such as hyperparameters, metrics, artifacts, and execution details for tracking and evaluation.
Additionally, similar to other hierarchy assets in Data Catalog, you can import and export ML model components. For more information, see the Manage Machine Learning (ML) Models section in the Administer Pentaho Data Catalog guide.
ML Models content pane
You can view detailed information about the selected ML Models component, including metadata specific to the chosen component. The details shown depend on whether the component originates from a pre-production model server (such as MLflow) or a production model server (such as NVIDIA Triton). For example, when a version is selected, the content pane shows its associated metadata, giving you a clear view of its attributes and context.
The following table identifies the key details available in the content pane for an ML model:

Data banner
Shows the name, path, and type icon identifying the ML model server and its components. For MLflow, this includes the server, model, version, experiment, or run. For Triton, this includes the server, model, and version nodes. With these details, you can gain a clear understanding of the component's position within the hierarchy and its classification.
Actions menu
Shows a menu of actionable options tailored to the selected ML Models component type. You can perform tasks such as copying the application path (hierarchy) to reference the asset’s location or switching to the Galaxy view for an alternative visualization of the data. For more information, see ML Models Galaxy view.
Data tabs
Provides access to detailed information and metadata related to the selected ML Models component through the following tabs:
Summary
Custom
Business Terms
Data Elements
Comment
Applications
Policies
Each tab provides insights into specific attributes or relationships of the ML Models component, helping you analyze and understand the ML Models hierarchy. To learn more about each tab, see ML Models hierarchy view.
Different views of the ML Models
In the ML Models section of Data Catalog, you can explore both pre-production and production ML Models in multiple views to understand the structure and relationships between models, versions, experiments, and runs in a way that best suits your needs.
By default, Data Catalog displays ML model server components in a hierarchical tree-structured format. For pre-production model servers such as MLflow, this hierarchy includes model servers, models, versions, experiments, and runs. For production model servers such as NVIDIA Triton, the hierarchy includes model servers, models, and versions, with additional performance and configuration metadata available at each level. This clear parent–child relationship helps users understand how ML components are interconnected across different lifecycle stages.
You can also switch to Table View or Galaxy View by selecting the View Table or View Galaxy option from the Actions menu in the navigation pane. In Table View, you can see ML model server components presented in a tabular format. For MLflow, the table includes models, versions, experiments, and runs, allowing detailed comparison of training jobs and results. For Triton, the table focuses on models and versions, displaying operational metadata such as status, inference counts, latency, and memory usage for easy monitoring and analysis.
In Galaxy View, you can visualize ML model server components in a spatial format. This visualization highlights relationships and patterns between components. For MLflow, Galaxy View helps trace lineage between experiments, runs, and versions. For Triton, it helps identify dependencies and connections between deployed models and their versions, making it ideal for gaining a broader, interconnected perspective of ML assets across both pre-production and production environments.
ML Models hierarchy view
In Data Catalog, you can configure ML model servers and import their components into the ML Models section. In the hierarchy view of the ML Models page, you can manage them visually and intuitively. Additionally, you can sync ML model server components and maintain their associated details, ensuring clarity and consistency in data-related discussions. The following options are available on the ML Models page.
ML Models component name
The name of the ML Models component is displayed in the hierarchy view. For pre-production servers such as MLflow, this includes components like the model server, model, version, experiment, and run. For production servers such as NVIDIA Triton, the hierarchy includes the model server, model, and version nodes. This makes it easy to quickly identify and understand the specific ML Models component you are working with, whether it relates to development and testing or active production inference.

Actions
The Actions menu provides quick access to features that help you work with ML Model components.
Copy Path
Copies the hierarchical path of the ML Models component for quick reference or to share it with others.
View Galaxy
Takes you to the Galaxy view of the selected ML Models component. In Galaxy view, you can see the ML Models components and their related assets in a spatial layout. For MLflow, this highlights lineage between experiments, runs, and versions. For Triton, this highlights relationships between production models and their deployed versions. See Galaxy view for more details.
Summary tab
The Summary tab gives a consolidated view of the selected ML Models component. The information displayed depends on whether the component originates from a pre-production server such as MLflow or a production server such as NVIDIA Triton, and on which node in the hierarchy is selected (server, model, version, experiment, or run).
Note: The visible information depends on the ML Models component you have selected.
Definition
Update with a custom description of the registered model server that helps users understand its purpose, configuration, and role within their machine learning (ML) workflows.
Assets
Displays the child components associated with the selected ML Models component, such as ML Models, Versions, Experiments, and Runs. The table displays key details such as component name, type, associated ML tags, parent component, status, created by, and updated time, helping users track relationships, navigate the ML hierarchy, and manage metadata in Data Catalog.
Additional Details (only for versions and runs)
Displays detailed metadata about the selected version and run.
For a version, it displays the following version-specific metadata so that you can identify which ML run produced the model version, locate the stored artifacts, and verify whether the version is ready for deployment:
Run ID: Unique identifier for the ML run that generated this version.
Version Source: Path to the model artifacts stored in MLflow or an external system.
Version Stage: Indicates the deployment stage of the model.
Version Status: Shows whether the version is ready for use.
Custom Properties: Displays any associated custom properties of the selected version.
Created By: The user who created the selected version entry in Data Catalog.
Updated By: The user who last updated the selected version.
For a run, it displays the following additional metadata about the specific ML run:
Artifact URI: A clickable link to access artifacts generated during the ML run.
Run ID: Unique identifier for the ML run.
Run Status: Indicates the completion status of the run (for example, FINISHED or FAILED).
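In MLflow, these version-level and run-level details correspond to fields on the registered model version and run entities. A minimal, illustrative Python sketch of retrieving them with the MLflow client API is shown below; the tracking URI, model name, and version number are hypothetical placeholders.

# Sketch of fetching the version metadata shown under Additional Details
# (Run ID, Version Source, Version Stage, Version Status) from MLflow.
# The tracking URI, model name, and version are placeholders.
from mlflow.tracking import MlflowClient

client = MlflowClient(tracking_uri="http://mlflow.example.com:5000")
mv = client.get_model_version(name="churn-classifier", version="3")

print(mv.run_id)          # Run ID: the ML run that produced this version
print(mv.source)          # Version Source: URI of the stored model artifacts
print(mv.current_stage)   # Version Stage: for example Staging or Production
print(mv.status)          # Version Status: for example READY
print(mv.tags)            # custom properties attached to the version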
Ratings
Highlights the popularity of the resource by providing the average rating of all users for the selected ML Models component.
Properties
Shows the properties of the selected ML Models component.
Domain: The specific area or domain within the organization to which the ML Models component belongs.
Status: The current state of the selected ML Models component, such as Imported, Accepted, Draft, Reviewed, Deprecated, or Unknown.
Created By: The user who created the ML Models component entry in Data Catalog.
Last Updated: The date and time when the ML Models component data was last updated.
Tags
Lists the tags that are linked to the selected ML Models component. You can also add your own tags, which will assist in identifying the resource using specific keywords.
Style
Displays the icon and color associated with the selected ML Models component, if any. Click Change to edit the associated icon or color.
Custom tab
You can use custom properties to annotate and categorize the ML models and components with additional context-specific information, enhancing the metadata available for data assets within Data Catalog. This tab lists the custom properties and values added to the component. You can also apply filters to refine the list.
For pre-production servers such as MLflow, you can apply custom properties to servers, models, versions, experiments, and runs. This helps add training-related context, such as the business use case, dataset source, or experimental conditions.
For production servers such as Triton, you can apply custom properties to servers, models, and versions. This enables you to capture deployment-specific information, such as resource allocation, deployment owner, or monitoring requirements.
To add a custom property, click Add Custom Property and provide the required information, such as Field Label, Field Type, and Default Value. For more information, see Manage custom properties in the Administer Pentaho Data Catalog guide.
Business Terms tab
In the Business Terms tab, you can associate ML Models components with relevant business terms to define and categorize the components, ensuring consistency across the organization. It shows the names of associated terms, their parent categories, and the owners responsible for them. You can also apply filters to refine the list.
For pre-production servers such as MLflow, associating business terms helps provide context for experiments, runs, and model versions. For example, you can link a model run to a business term like Churn Prediction or Fraud Detection, ensuring that training activities are clearly aligned with business objectives.
For production servers such as Triton, associating business terms helps connect deployed models to their intended use cases and governance requirements. For example, you can tag a deployed version with business terms like Regulatory Compliance or Customer Segmentation, which makes it easier to monitor production assets against organizational standards.
To add a business term, click Add Terms and choose the business term you want to associate with the ML Models component. After you add any business term to the component, it creates a relationship between the term and the component. To get detailed information, click the term to view the business term in the canvas view with a highlighted focus. You can also click Delete, which only removes the association between the component and the business term but does not delete the actual business term in Data Catalog.
Data Elements tab
The Data Elements tab shows a detailed view of the data elements associated with a selected ML Models component. This tab shows data asset details such as the data source, item name, item type, parent, and associated tags, which help you understand the data structure, maintain metadata, and perform actions like adding, viewing, or deleting data elements. You can also apply filters to refine the list.
For pre-production servers, such as MLflow, data elements typically represent training datasets, input features, or artifacts used to create and validate model runs and experiments. Associating these elements with MLflow components ensures clear lineage and reproducibility of experiments.
For production servers such as Triton, data elements represent deployed input and output schemas. These include the input tensors that the model expects and the output tensors it generates, along with their types, shapes, and constraints. Associating data elements with Triton models or versions enables the monitoring of how deployed models interact with real-world data and ensures compatibility with downstream systems.
To add new data elements, click Add Data Elements and choose the data element you want to associate with the component. To get detailed information, click View to view the data element in the canvas view with a highlighted focus. You can also click Delete, which only removes the association between the component and the data element but does not delete the actual data element.
Comment tab
The Comment tab is a collaborative feature that allows users to discuss and provide feedback on specific data assets within Data Catalog. You can add comments, share suggestions, or ask questions directly in the tab using the provided text box, which includes basic formatting options like bold, italic, and bullet points. In addition, you can tag other users by mentioning them with the "@" symbol followed by their username. Then the specific user, or users, are notified of the comment through email and in the Mentions tab on the Data Catalog landing page, prompting them to respond if necessary. For more information, see Tour of the Home page.
Note: In the Comment tab, you can:
Tag users who have been configured in Data Catalog.
Only delete the comments you posted.
Delete any comment if you are an admin.
Applications tab
The Applications tab lists the third-party or external applications associated with the selected ML Models component, along with additional information about each application. By linking third-party applications and ML Models components, you can understand how external applications interact with specific ML Models components and see the relationship between the data and external systems, helping you assess each component's purpose and relevance. You can also apply filters to refine the list.
For pre-production servers such as MLflow, applications may include experiment tracking dashboards, training pipelines, or external notebooks that provide additional visibility into model development and evaluation. Associating these applications ensures that the context of experimentation is accessible from within Data Catalog.
For production servers such as Triton, applications may include monitoring tools, deployment orchestrators, or downstream systems that consume model predictions. Associating these applications helps users see how deployed models are integrated into production workflows and business processes.
To add an application, click Add Applications and choose the application you want to link to the component. You can also click Delete, which removes the association between the application and the ML Models component but does not delete the actual application in Data Catalog.
Policies tab
In the context of an ML Models component, policies and standards are properties of the component, meaning a set of rules applied to it. In the Policies tab, you can explore the standards and policies related to the component, along with additional information such as name, parent, and owner. By associating policies and standards with ML Models components, you clarify the rules governing data usage and management and reduce the risk of non-compliance. You can also apply filters to refine the list.
For pre-production servers such as MLflow, policies typically govern training data usage, model lineage, and experiment reproducibility. Associating policies at this stage ensures that experiments are conducted with proper oversight, data quality requirements are met, and sensitive datasets are handled in accordance with organizational standards.
For production servers such as Triton, policies often focus on deployment, inference monitoring, and compliance with operational or regulatory standards. Associating policies with deployed models or versions helps organizations monitor live inference workloads, enforce performance thresholds, and ensure that production AI systems remain trustworthy and compliant.
To add a policy, click Add Policy and choose the standard and policy you want to link to the component. After you add any policy to the component, it creates a relationship between the policy and the component. You can also click Delete, which removes the association between the component and policy but does not delete the actual policy in Data Catalog.
ML Models table view
In Data Catalog, the ML Models table view shows a structured, spreadsheet-like layout for browsing and managing ML Models components. This view enhances the way you interact with ML Models components by providing a centralized overview of all machine learning assets in a single interface.
For pre-production servers such as MLflow, the table view displays model servers, models, versions, experiments, and runs. This helps you review training history, compare experimental runs, and manage model iterations with clarity.
For production servers, such as NVIDIA Triton, the table view displays model servers, models, and their corresponding versions. In addition to general metadata, the table highlights operational details such as environment, status, health, inference counts, latency, and memory usage, enabling you to monitor deployed models effectively.
With this consolidated view, you can navigate easily and quickly search, sort, and filter through large volumes of ML Models components to locate the exact asset you need.
To access the Table View for ML Models, click ML Models in the left navigation. Then, the ML Models landing page appears. In the Navigation pane, click Actions and select View Table from the menu options. The ML Models table view appears, displaying all ML Models components in a grid layout for easier visibility and comparison.

The ML Models table view is organized into multiple tabs based on ML Models component types:
All
Displays all ML Models components, including model servers, models, versions, experiments, and runs (for MLflow), or model servers, models, and versions (for Triton).
Model Server
Lists all registered ML model servers with their associated metadata. For MLflow, this includes experiment and run tracking configurations. For Triton, this includes environment, health, and liveness details.
Model
Displays ML models and their associated properties and tags. For MLflow, these models are linked to experiments and versions. For Triton, models include deployment metadata such as platform, input/output schemas, and maximum batch size.
Experiment (Pre-production only)
Shows all experiments logged under various models or versions.
Version
Lists individual model versions and related tracking metadata. For MLflow, versions include linked parameters, datasets, and artifacts. For Triton, versions include production statistics such as inference counts, execution counts, latency, and memory usage.
Run (Pre-production only)
Displays detailed execution-level run information for experiments.
Each tab in the table view displays common and component-specific attributes in column format. The following table lists such attributes:
Name
The name of the ML Models component.
Type
Indicates the component type such as model, run, experiment, and so on.
ML Tags
Metadata tags logged during training.
Parent
The parent entity to which the component belongs, such as a server or model.
Custom Properties
User-defined metadata fields.
Created By
The user who created the asset.
Updated By
The user who last modified the asset.
In the ML Models table view, you can customize and personalize the display of ML Models components, making it easier to focus on relevant metadata and streamline workflows. You can click the filter icon and use the available filter inputs beneath each column header to search or select values, such as filter by model name, type, or ML model server. Additionally, you can tailor the table view to display only the information most relevant to your role by clicking the configure icon and using the checkboxes to show or hide columns based on your preference. You can also rearrange column order using the drag-and-drop handles.
ML Models Galaxy view
In the Navigation pane, under Actions, selecting View Galaxy displays the ML Models components in the Galaxy view. The Galaxy view shows a different visual layout that is useful for exploring relationships and connections among ML Models components. You can use the Galaxy view feature to view the structure of the data and its details quickly. This feature is especially useful when you want to view information that is not easily visualized using the navigation tree. When you open ML Models components in the Galaxy view, you can see the relationships in the data from a bird's eye view and drill down into the data for specific details.

For pre-production servers such as MLflow, the Galaxy view highlights lineage between experiments, runs, versions, and models. This helps you trace how training datasets, parameters, and artifacts contribute to model development and evaluation.
For production servers such as NVIDIA Triton, the Galaxy view emphasizes the relationship between deployed models and their versions. It enables you to visualize operational connections, such as which versions are currently live, their configuration, and how they are linked to inference performance metrics.
By supporting both perspectives, the Galaxy view allows you to navigate and analyze ML models across the entire lifecycle, from experimental runs in development to deployed models serving predictions in production.
To learn more about Galaxy view and its available functions, see Galaxy View.