# Copybook Input

The **Copybook Input** step reads binary data files that are mapped by a fixed-length COBOL copybook definition file. COBOL definition and binary files are used in IT scenarios that include data stored on mainframes. You can extract the binary data files and the definition files from the mainframe for data transformation and analysis, and avoid using mainframe cycles for complex data analysis tasks.

{% hint style="info" %}
The Copybook Input step performs self-contained extraction of the data in binary format.

If you only need to perform ETL metadata injection, use the [Read metadata from Copybook](/pdia-data-integration/pdi-transformation-steps-reference-overview/read-metadata-from-copybook.md) step. You are not required to use both copybook steps in the same transformation.

For more information, see [Copybooks in PDI](/pdia-data-integration/extracting-data-into-pdi/copybook-steps-in-pdi-cp.md).
{% endhint %}

### Before you begin

Review these prerequisites before using the Copybook Input step.

* For PDI to process binary data files, you must first download both the copybook definition file and the binary data files from the mainframe environment. For example, you can use FTP or an SFTP server to download the files to a staging area accessible from PDI. You can also use an [SFTP VFS](/pdia-data-integration/archived-merged-pages/transforming-data-with-pdi-archive/pdi-run-modifiers/parameters/vfs-properties.md) path to connect to and read data directly from the mainframe at runtime.
* The binary data file must remain in binary format when used as input to this step. If you are using FTP to download the files, ensure that the data file is not converted to ASCII.
* This step works with fixed-length COBOL records only. Variable record types such as `VB`, `VBS`, and `OCCURS DEPENDING ON` are not supported.
* Your mainframe administrator can provide more details about the environment-specific copybook file definitions and structures this step requires for reading binary data.
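
As a rough illustration of the binary-transfer prerequisite, the following Python sketch uses the standard `ftplib` module to download a data file without ASCII conversion, and checks that the file size is a whole multiple of the record length. The function names, host, file names, and record length are hypothetical placeholders, not part of PDI.

```python
from ftplib import FTP


def download_binary(ftp, remote_name, local_path):
    """Download remote_name to local_path in binary (image) mode."""
    with open(local_path, "wb") as fh:
        # retrbinary switches the transfer to TYPE I, so bytes arrive unchanged.
        ftp.retrbinary(f"RETR {remote_name}", fh.write)


def is_whole_number_of_records(size_in_bytes, record_length):
    """Fixed-length data should divide evenly by the record length."""
    return record_length > 0 and size_in_bytes % record_length == 0


# Hypothetical usage (host, credentials, and names are placeholders):
# with FTP("mainframe.example.com") as ftp:
#     ftp.login("user", "password")
#     download_binary(ftp, "'PROD.CUSTOMER.DATA'", "customer.bin")
```

A size that does not divide evenly by the record length often means the file was converted during transfer or contains variable-length records.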

### Step name

* **Step name**: Specifies the unique name of the Copybook Input step on the canvas. You can customize the name or leave it as the default.

### Options

The Copybook Input step includes settings for locating the files to read, a table to define data fields on the PDI output stream, and additional options.

#### Input tab

![Input tab, Copybook Input step](/files/R6pT8Y7bNhgGJvv43bTJ)

The **Input** tab has the following sections.

**Source**

These options specify the location of the binary data.

| Option                                    | Description                                                                                                                                                                                                                                                                                                                                                                                                                                               |
| ----------------------------------------- | --------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------- |
| **Predefined file**                       | Select this option to specify a path to a binary data file that contains the data you want to read into the PDI stream. You can type any VFS path directly into the **File** field (including variables), or you can select **Browse** to locate the binary data file.                                                                                                                                                                                    |
| **File defined in a field**               | Select this option to read the names of the binary files from a field name in the previous step. Select the name of the field from the list.                                                                                                                                                                                                                                                                                                              |
| **Data already loaded in a binary field** | Select this option if the binary data is passed into the step from a binary field on the PDI stream. Select the step generating the binary field from the list. You can use this option to process records that another Copybook Input step has stored as a binary field. Using this method, you can selectively process fields and avoid conversion errors in definition files that include **REDEFINES**. See **Store record as a binary field** in the **Options** tab. |

**Schema**

These options define the location of the copybook definition file and include mapping options for the binary data files.

* **COBOL Copybook file path**: Specify the file path to the copybook definition file. You can enter any VFS or SFTP file path or select **Browse** to open the system file browser. After selecting a file, select **Validate** to verify that the definition file can be accessed and parsed.
* **COBOL Copybook line structure**: Specify the line structure of the definition file.
  * **Standard columns (6 to 72)**: Select this option when the definition file contains line numbers. The first 6 columns of text from each line are ignored. Any data beyond column 72 is ignored.
  * **Full line**: Select this option when the definition file does not contain line numbers.
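
The difference between the two line structures can be sketched in Python as follows. This is only an illustration of the column rule described above, not the step's actual parser, and the function name is hypothetical.

```python
def copybook_source_text(line, standard_columns=True):
    """Return the part of a copybook line that carries COBOL source text."""
    line = line.rstrip("\r\n")
    if standard_columns:
        # Ignore the first 6 columns and anything beyond column 72.
        return line[6:72]
    # Full line: the whole line is treated as source text.
    return line
```

For example, a line that starts with the sequence number `000100` and carries an identifier after column 72 yields only the text between those two areas when `standard_columns` is true.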

**Binary format**

Use these options to describe the binary format of the selected file.

<table data-header-hidden><thead><tr><th></th><th></th></tr></thead><tbody><tr><td>Option</td><td>Description</td></tr><tr><td><strong>Source architecture</strong></td><td><p>Select the machine architecture of the binary data source files:</p><ul><li><strong>Big endian (mainframe)</strong>: The most significant byte first and the least significant byte last.</li><li><strong>Little endian</strong>: The least significant byte first and the most significant byte last.</li></ul></td></tr><tr><td><strong>Source charset name</strong></td><td><p>Select the character encoding set of the binary data file. Mainframe EBCDIC is typically encoded using IBM037 or cp1047 character sets.</p><p>For more information about character sets and their aliases, see <a href="https://docs.oracle.com/javase/8/docs/technotes/guides/intl/encoding.doc.html">Supported Encodings</a> in the Oracle® documentation.</p></td></tr><tr><td><strong>Packed decimal (COMP-3) sign convention</strong></td><td><p>Select how COMP-3 packed decimals are parsed from the binary data as it relates to sign convention. For a given field, if validation occurs and fails, a conversion error occurs at runtime. See <a href="#use-error-handling">Use error handling</a> for details.</p><ul><li><strong>Strict</strong>: Must follow the IBM S370FPD specification to avoid validation errors. Validation is performed to verify that all nibbles (half-bytes), except the sign nibble, are decimal digits (0-9). This is the default value.<ul><li>For signed packed decimals, the sign nibble must be C (positive) or D (negative).</li><li>For unsigned packed decimals, the sign nibble must be F.</li></ul></li><li><strong>Lenient</strong>: Validation is performed to verify that all nibbles contain decimal digits and the sign nibble contains a hexadecimal value of A-F. The sign nibble is only used to interpret a negative number if the value is D.</li><li><strong>Lenient - unchecked</strong>: No validation is performed on the source bytes. The sign nibble may contain any hexadecimal value 0-F, and the last nibble is not included in the result. The sign nibble is only used to interpret a negative number if the value is D.</li></ul></td></tr></tbody></table>
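
To make the three sign conventions concrete, here is a minimal Python sketch of a packed-decimal decoder that follows the rules described in the table. It is not the step's implementation: the function name, the `mode` labels, and the `scale` parameter are hypothetical, and the way the unchecked mode treats non-decimal digit nibbles (as plain numeric values) is an assumption.

```python
from decimal import Decimal


def unpack_comp3(data, scale=0, signed=True, mode="strict"):
    """Decode packed-decimal bytes; mode is 'strict', 'lenient', or 'unchecked'."""
    if not data:
        raise ValueError("empty field")
    nibbles = []
    for byte in data:
        nibbles.append(byte >> 4)    # high half-byte
        nibbles.append(byte & 0x0F)  # low half-byte
    *digits, sign = nibbles          # the last nibble carries the sign

    if mode in ("strict", "lenient"):
        if any(d > 9 for d in digits):
            raise ValueError("non-decimal digit nibble")
    if mode == "strict":
        allowed = (0xC, 0xD) if signed else (0xF,)
        if sign not in allowed:
            raise ValueError("invalid sign nibble")
    elif mode == "lenient":
        if sign < 0xA:
            raise ValueError("invalid sign nibble")
    # 'unchecked' performs no validation (assumption: digits used as-is).

    value = 0
    for d in digits:
        value = value * 10 + d
    if sign == 0xD:                  # only D marks a negative number
        value = -value
    return Decimal(value).scaleb(-scale)
```

With these rules, the bytes `12 34 5C` with two implied decimal places decode to `123.45`, and `12 34 5D` decodes to a negative value. A field filled with EBCDIC spaces (`40 40 40`) fails validation in the strict and lenient modes because its sign nibble is `0`, but decodes without an error in the unchecked mode.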

#### Output tab

The table in the **Output** tab provides details of the fields that are read from the binary data and how those fields are placed in the PDI stream as output from the step.

![Output tab, Copybook Input step](/files/1AV25X8wOiYwVWA0lt1q)

You can populate the table using either the **Get Fields** or **Get Fields with Parent Groups** commands. These commands extract the fields directly from the copybook definition file selected in the **Input** tab. Use the **Get Fields with Parent Groups** command if you also want to include the higher-level group items from the copybook definition file, which do not have PICTURE clauses.

To pass the input fields to the PDI output stream, select the **Pass through input fields** check box. Clear this check box to omit the input fields from the PDI output stream.

**Note:** This option is only available if you chose either **File defined in a field** or **Data already loaded in a binary field** in the **Source** section of the **Input** tab.

The output table contains the following columns.

* **Name**: The name of the field in the PDI output stream. You can revise or update this field name as necessary.
* **Path**: The fully qualified path to the binary data column in the copybook definition file.

  **Note:** This field cannot be edited. It is controlled by the copybook definition file.
* **Dest Type**: The PDI data type of the column mapped from the column definition.

  **Note:** This field cannot be edited. It is controlled by the copybook definition file.

  Data type mapping:

  * X becomes **String**
  * COMP, COMP-3, and DISPLAY become **BigNumber**
  * COMP-1 and COMP-2 become **Number**
  * Parent groups become **Binary**
* **Conversion**: The conversion type to apply.
  * **Dest Type** (default): The fields are cast to the PDI type in the **Dest Type** column.
  * **String**: The column is converted to the **Dest Type**, then cast into a string.
  * **Hex String**: The bytes of the underlying data are converted to a hex string. The output field is a string data type.
  * **Binary**: The bytes of the underlying data are not converted, but are instead extracted and placed on the stream as a binary type.
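
The following Python sketch illustrates how the conversion types differ for a single alphanumeric (X) field. It is an approximation for illustration only: the function name is hypothetical, the `cp037` codec is assumed as the source charset, and the letter case of the hex output produced by PDI may differ.

```python
def convert_x_field(raw, conversion="Dest Type", charset="cp037"):
    """Apply one of the conversion types to the raw bytes of an X field."""
    if conversion in ("Dest Type", "String"):
        # An X field already maps to String, so both yield decoded text.
        return raw.decode(charset)
    if conversion == "Hex String":
        # The underlying bytes rendered as a hexadecimal string.
        return raw.hex()
    if conversion == "Binary":
        # The bytes are passed through without conversion.
        return raw
    raise ValueError(f"unknown conversion: {conversion}")


raw = bytes.fromhex("C1C2C3")  # "ABC" in EBCDIC (cp037)
```

Here, `convert_x_field(raw)` returns the text `ABC`, the **Hex String** conversion returns `c1c2c3`, and the **Binary** conversion returns the original three bytes.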

#### Options tab

![Options tab, Copybook Input step](/files/QWSYemr8sGYJK1asSNlp)

Use this tab to define PDI output stream options.

**Output record information**

Use this section to specify details about the records for output.

* **Store record as a binary field**: Specify an additional output field to contain the binary bytes that make up the record currently being processed. You can use the stored binary field as the input for the data fields downstream from the Copybook Input step.
* **Create field with record number**: Specify an additional output field to contain the record number within the file. For fixed-length record definitions, multiplying this number by the fixed record size yields the offset of the record within the input file. This field will reset to zero when a new data file is read. Also, the counter is specific to a copy of the step, so changes to the [**Change Number of Copies to Start**](/pdia-data-integration/archived-merged-pages/transforming-data-with-pdi-archive/work-with-transformations-cp/use-the-transformation-menu.md) option may cause unexpected results.
* **Create field with record checksum**: Specify an additional output field to contain a hex string representation of the `sha1` checksum of the source record byte data.

  **Note:** This option is useful for debugging conversion errors, but it could be resource intensive.
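
The relationship between the record number, the byte offset, and the checksum can be sketched in Python as follows. This is a simplified illustration that assumes the whole file fits in memory and that record numbers start at zero; the function name is hypothetical.

```python
import hashlib


def iter_records(data, record_length):
    """Yield (record_number, offset, sha1_hex) for each fixed-length record."""
    if record_length <= 0:
        raise ValueError("record_length must be positive")
    for number in range(len(data) // record_length):
        # Offset of the record within the file: number x fixed record size.
        offset = number * record_length
        record = data[offset:offset + record_length]
        yield number, offset, hashlib.sha1(record).hexdigest()
```

Given a record number from an error row or a log message, the computed offset lets you locate the same bytes in the source file with a hex viewer, and the 40-character checksum lets you confirm that you are looking at the same record.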

**Conversion errors**

Use this section to specify how to handle errors during conversion.

* **Ignore conversion errors**: Select this check box to log multiple conversion error messages (such as malformed records, bad enclosure strings, wrong number of fields, and premature line ends). The errors are logged in JSON object format in a single PDI row. See [Use error handling](#use-error-handling) for details about the format.

  Clear this check box if you want conversion errors in the source binary files to stop the transformation.

### Use error handling

The JSON object is placed in the error description field within the error row. Error handling must be enabled on the step to capture the error columns and descriptions. On the canvas, right-click the Copybook Input step and select **Error Handling** to open the Step error handling settings window and configure the error output column names. See [Use the Transformation menu](/pdia-data-integration/archived-merged-pages/transforming-data-with-pdi-archive/work-with-transformations-cp/use-the-transformation-menu.md) for details.

The following table details the JSON object format for the output error stream.

| Key        | Type    | Example                                  | Description                                                   |
| ---------- | ------- | ---------------------------------------- | ------------------------------------------------------------- |
| record     | Integer | 0                                        | The record number originating from the Copybook Input step.   |
| converter  | String  | BigNumberColumnConverter                 | The converter class that originated the error.                |
| exception  | String  | RecordException                          | The exception class of the error.                             |
| message    | String  | Invalid sign in field: OPEN-YEAR         | The error message text, if it exists.                         |
| fieldName  | String  | OPEN-YEAR                                | The name of the field that has the error.                     |
| position   | Integer | 22                                       | The position of the field.                                    |
| length     | Integer | 3                                        | The length of the field.                                      |
| value      | String  | 404040                                   | The value read as a hexadecimal string.                       |
| recordHash | String  | 240c992c3aaebccf6dc0e99a4ed1a447e4811bed | The record checksum originating from the Copybook Input step. |
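
As an illustration, a downstream script could parse the error description as shown in the following Python sketch. The sample object is assembled from the example values in the table above; the exact serialization produced by the step may differ, and the function name and the `cp037` charset are assumptions.

```python
import json

# Sample error description built from the example column of the table.
sample = json.dumps({
    "record": 0,
    "converter": "BigNumberColumnConverter",
    "exception": "RecordException",
    "message": "Invalid sign in field: OPEN-YEAR",
    "fieldName": "OPEN-YEAR",
    "position": 22,
    "length": 3,
    "value": "404040",
    "recordHash": "240c992c3aaebccf6dc0e99a4ed1a447e4811bed",
})


def summarize_error(description, charset="cp037"):
    """Return a one-line summary plus the raw field bytes decoded as text."""
    err = json.loads(description)
    raw = bytes.fromhex(err["value"])  # the value is a hexadecimal string
    summary = (
        f"record {err['record']}: {err['fieldName']} "
        f"at position {err['position']} (length {err['length']}): "
        f"{err['message']}"
    )
    return summary, raw.decode(charset)
```

For the sample above, the decoded value is three spaces, because `0x40` is the EBCDIC space character. That suggests the numeric field was left blank in the source record.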

### Metadata injection support

All fields of this step support metadata injection. You can use this step with [ETL metadata injection](/pdia-data-integration/pdi-transformation-steps-reference-overview/etl-metadata-injection.md) to pass metadata to your transformation at runtime.

Use the [Read metadata from Copybook](/pdia-data-integration/pdi-transformation-steps-reference-overview/read-metadata-from-copybook.md) step to read copybook definition files and obtain the required mapping information to inject fields. In addition to the **Name**, **Path**, **Dest Type**, and **Conversion**, a decimal precision must be provided.

