Copybook steps in PDI
Pentaho Data Integration (PDI) supports simplified integration with fixed-length records in binary mainframe data files, so more users can ingest, integrate, and blend mainframe data as part of their data integration pipelines. This capability is critical if your business relies on large volumes of customer and transactional data generated on mainframes that you want to search and query to create reports.
Mainframe file records are typically defined by a COBOL copybook. A COBOL copybook is a section of code that defines the data layout of items from a data source, including records, segments, fields, and keys. Copybooks allow developers to reuse the same data structures in multiple programs.
Copybook data is usually extracted from the mainframe as a block of records and then stored in binary files, along with a definition file, that PDI can read. Based on the definition file, the Copybook input step and the Read metadata from Copybook step read the binary content in the data files and convert it to PDI rows, which makes the data easy to integrate into your transformations.
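To make the conversion concrete, the following Python sketch shows the general idea of slicing a block of fixed-length binary records into rows. It is not how PDI implements the steps. The record layout (a 10-byte name and a 5-byte ID, both EBCDIC text), the choice of code page 037, and the names `LAYOUT`, `RECORD_LENGTH`, and `read_records` are all assumptions made for illustration.

```python
# Hypothetical layout: (field name, offset, length) for a 15-byte record.
LAYOUT = [("CUST_NAME", 0, 10), ("CUST_ID", 10, 5)]
RECORD_LENGTH = 15


def read_records(data: bytes, layout=LAYOUT, record_length=RECORD_LENGTH):
    """Split a block of fixed-length records into a list of dict rows."""
    if len(data) % record_length != 0:
        raise ValueError("data is not a whole number of records")
    rows = []
    for start in range(0, len(data), record_length):
        record = data[start:start + record_length]
        row = {}
        for name, offset, length in layout:
            raw = record[offset:offset + length]
            # cp037 is one common EBCDIC code page (an assumption here);
            # the correct one depends on the source system.
            row[name] = raw.decode("cp037").rstrip()
        rows.append(row)
    return rows


# Build two sample records by encoding padded text to EBCDIC, then read them back.
sample = (
    "ALICE".ljust(10) + "00001" + "BOB".ljust(10) + "00002"
).encode("cp037")
rows = read_records(sample)
print(rows)
```

In a real pipeline the offsets and lengths would come from the copybook definition rather than being hard-coded, which is the work the two steps do for you.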
These steps handle challenging conversion issues for you, such as packed decimal numbers and multibyte data type storage, which are typical of COBOL copybooks. The steps can also handle REDEFINES clauses, which change how some of the fields in a record are interpreted based on other values in the record.
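As an illustration of why packed decimal fields need special handling, here is a minimal Python sketch of decoding one such field (COMP-3 in COBOL terms). Each byte stores two 4-bit digits, and the last 4 bits hold the sign. The function name `unpack_comp3` and the `scale` parameter are hypothetical and are not part of PDI; the sketch treats sign nibbles 0xB and 0xD as negative and anything else as positive or unsigned.

```python
from decimal import Decimal


def unpack_comp3(raw: bytes, scale: int = 0) -> Decimal:
    """Decode a packed decimal (COMP-3) field into a Decimal.

    raw   -- the bytes of the field
    scale -- number of implied decimal places (the V in a PIC clause)
    """
    if not raw:
        raise ValueError("empty field")
    # Split every byte into its high and low 4-bit nibbles.
    nibbles = []
    for byte in raw:
        nibbles.append(byte >> 4)
        nibbles.append(byte & 0x0F)
    # The final nibble is the sign; the rest are decimal digits.
    sign_nibble = nibbles.pop()
    if any(digit > 9 for digit in nibbles):
        raise ValueError("invalid digit nibble")
    value = 0
    for digit in nibbles:
        value = value * 10 + digit
    if sign_nibble in (0x0B, 0x0D):
        value = -value
    # Shift the decimal point left by `scale` places.
    return Decimal(value).scaleb(-scale)


# Bytes 0x12 0x34 0x5C hold the digits 12345 with a positive sign.
print(unpack_comp3(bytes([0x12, 0x34, 0x5C]), scale=2))
# Bytes 0x00 0x12 0x3D hold the digits 00123 with a negative sign.
print(unpack_comp3(bytes([0x00, 0x12, 0x3D])))
```

Returning a `Decimal` rather than a float avoids rounding errors in monetary values, which is a common reason these fields are packed in the first place.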
Copybook transformation steps in PDI
PDI has two transformation steps you can use to read mainframe records from a file and transform them into PDI rows. These steps are intended for data scientists with mainframe file and copybook experience.
Copybook input: This step reads the mainframe files that were originally created using the copybook definition file and outputs the records to the PDI stream for use in transformations.
Read metadata from Copybook: This step reads the metadata of a copybook definition file to use with ETL Metadata Injection in PDI.
Metadata discovery
You can use metadata discovery to automate the tedious process of manually identifying and determining metadata from COBOL copybooks and JDBC databases.
The Read metadata from Copybook step reads a binary fixed-length copybook definition file and outputs the file and column descriptor information as fields to Pentaho Data Integration rows.
Query metadata from a database
The Query metadata from a database step discovers metadata from six different JDBC metadata types to use with ETL Metadata Injection in PDI.

