Contribute additional step and job entry analyzers to the Pentaho Metaverse

The Pentaho Metaverse provides metadata lineage capabilities for the Pentaho universe, and Pentaho Data Integration (PDI) is a major source of its lineage information. The metaverse mines metadata and builds a connected relationship model among all the pieces it knows about. The end result is a graph model that supports lineage (finding where or what contributed to something) and impact analysis (determining what would be affected downstream if something were changed). The metaverse leverages OSGi (blueprints) for modularity and extensibility. Therefore, if something is not supported out of the box, the metaverse can accept components via OSGi bundles that extend its capabilities.
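
To make the difference between lineage and impact analysis concrete, the following is a minimal sketch of both queries over a small directed graph. It is not the metaverse API: the class `LineageGraphSketch`, its methods, and the node names are all hypothetical and chosen only for illustration. Edges point from a contributor to the thing it feeds, so lineage walks edges backward and impact analysis walks them forward.

```java
import java.util.ArrayDeque;
import java.util.ArrayList;
import java.util.Collections;
import java.util.Deque;
import java.util.HashMap;
import java.util.LinkedHashSet;
import java.util.List;
import java.util.Map;
import java.util.Set;

// Hypothetical illustration only; this is not a metaverse class.
final class LineageGraphSketch {
    // Edges point from a contributor to the thing it feeds.
    private final Map<String, List<String>> downstream = new HashMap<>();
    private final Map<String, List<String>> upstream = new HashMap<>();

    void addEdge(String from, String to) {
        downstream.computeIfAbsent(from, k -> new ArrayList<>()).add(to);
        upstream.computeIfAbsent(to, k -> new ArrayList<>()).add(from);
    }

    // Lineage: every node that directly or indirectly contributed to the given node.
    Set<String> lineage(String node) {
        return reach(node, upstream);
    }

    // Impact analysis: every node that would be affected if the given node changed.
    Set<String> impact(String node) {
        return reach(node, downstream);
    }

    // Breadth-first traversal that collects all nodes reachable from the start node.
    private static Set<String> reach(String start, Map<String, List<String>> edges) {
        Set<String> seen = new LinkedHashSet<>();
        Deque<String> queue = new ArrayDeque<>();
        queue.add(start);
        while (!queue.isEmpty()) {
            String current = queue.remove();
            for (String next : edges.getOrDefault(current, Collections.<String>emptyList())) {
                if (seen.add(next)) {
                    queue.add(next);
                }
            }
        }
        return seen;
    }

    public static void main(String[] args) {
        LineageGraphSketch graph = new LineageGraphSketch();
        // Made-up node names standing in for a file, two steps, and a table.
        graph.addEdge("customers.csv", "Read customers");
        graph.addEdge("Read customers", "Filter rows");
        graph.addEdge("Filter rows", "customers table");

        System.out.println("Lineage of 'customers table': " + graph.lineage("customers table"));
        System.out.println("Impact of 'customers.csv': " + graph.impact("customers.csv"));
    }
}
```

Both queries share one traversal and differ only in edge direction, which is why a single connected graph model can answer both kinds of question.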

Kettle supports transformations and jobs, each of which is composed of smaller bite-sized operations. A transformation is made up of steps and a job is made up of job entries. Conceptually, these can be thought of as analogs. Kettle provides hundreds of unique steps and job entries which each perform a specific task. As far as the metaverse is concerned, each one of these steps and job entries is a potential source of metadata with respect to lineage.

The metaverse is composed of analyzers, each responsible for mining lineage information from a specific "thing." Document analyzers know how to extract the lineage information from documents. PDI produces two document types, transformations (KTR) and jobs (KJB), and each has a corresponding document analyzer. Each document analyzer examines its subcomponents (the steps comprising a transformation or the job entries comprising a job) and assigns each subcomponent a specific step analyzer or job entry analyzer, if one exists for the implementation of BaseStepMeta.

The out-of-the-box set of analyzers is limited. When a step or job entry has no corresponding analyzer, a generic fallback analyzer is used instead. To contribute a new step or job entry analyzer, implement the required interface(s) and register it as a service via OSGi (blueprints) so that it becomes available to the system.
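
The lookup-with-fallback behavior described above can be sketched in plain Java. This is only an illustration under stated assumptions: `StepAnalyzerSketch`, `AnalyzerRegistrySketch`, and the step type names are hypothetical and are not the real metaverse interfaces, and the `register` call merely stands in for a service that an OSGi bundle would contribute.

```java
import java.util.HashMap;
import java.util.Map;

// Hypothetical analyzer contract for illustration; the real metaverse interfaces differ.
interface StepAnalyzerSketch {
    String analyze(String stepName);
}

// Hypothetical registry that maps a step type to its analyzer.
final class AnalyzerRegistrySketch {
    private final Map<String, StepAnalyzerSketch> analyzersByStepType = new HashMap<>();

    // Generic fallback used when no specific analyzer has been contributed.
    private final StepAnalyzerSketch fallback = stepName -> "generic analysis of " + stepName;

    // Stands in for a bundle registering its analyzer as a service.
    void register(String stepType, StepAnalyzerSketch analyzer) {
        analyzersByStepType.put(stepType, analyzer);
    }

    // Returns the specific analyzer if one is registered, otherwise the fallback.
    StepAnalyzerSketch analyzerFor(String stepType) {
        return analyzersByStepType.getOrDefault(stepType, fallback);
    }

    public static void main(String[] args) {
        AnalyzerRegistrySketch registry = new AnalyzerRegistrySketch();

        // Before any contribution, every step type resolves to the fallback.
        System.out.println(registry.analyzerFor("MyCustomStep").analyze("Load orders"));

        // After a contribution, the specific analyzer takes over for that step type.
        registry.register("MyCustomStep", stepName -> "custom analysis of " + stepName);
        System.out.println(registry.analyzerFor("MyCustomStep").analyze("Load orders"));
    }
}
```

In this sketch, contributing an analyzer changes the result only for its own step type; every other step type continues to receive the generic fallback.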
