Lineage

Lineage in Pentaho Data Catalog helps you understand the complete journey of your data, from its origin to its final use in reports, dashboards, and downstream systems. It provides a visual representation of how data flows across tables, files, datasets, and reports, enabling users to trace relationships, validate data trust, and assess impact across the data landscape.

Lineage is a critical capability for:

  • Data governance and audit readiness

  • Regulatory compliance (such as GDPR, CCPA, or IFRS)

  • Impact analysis during data model or schema changes

  • Root cause analysis and troubleshooting

In Data Catalog, you can have two types of lineage views:

Data lineage

With the Data Lineage view, you can visualize how structured and unstructured data assets are connected across systems. It displays the upstream sources and downstream consumers of a selected data resource, such as a table, file, or dataset, helping you understand how data is transformed and used throughout the organization. Data lineage helps you to:

  • Trace data origin and usage paths

  • Explore relationships between datasets, schemas, and files

  • Validate data integrity and identify dependencies To learn more, see Data lineage.

Report lineage

Similar to Data lineage, the Report Lineage view focuses on business intelligence (BI) components, including reports, dashboards, datasets, and charts, from connected BI servers such as Tableau. It shows how reports are built from datasets, which in turn are derived from data entities, tables, or files. Report lineage helps you to:

  • Understand the data flow into and within reports

  • Support compliance and data privacy requirements

  • Perform impact analysis before modifying source data To learn more, see Report lineage.

Last updated

Was this helpful?