# Understanding PDI data types and field metadata

This section is for users who want to maximize the efficiency of their transformation and job results.

As a best practice for producing consistent, predictable outcomes when working with your data in PDI, you must consider how the Pentaho engine processes different data types and field metadata in transformations and jobs. For example, steps like [Avro Input](/pdia-data-integration/pdi-transformation-steps-reference-overview/avro-input.md) and [Text File Input](/pdia-data-integration/pdi-transformation-steps-reference-overview/text-file-input-cp.md), require additional considerations to best meet your working requirements for specific data types, mathematical operations, number conversions, and formatting.

**Note:** As a rule, data is never modified by metadata inside of PDI. Data is only modified when PDI writes to files or similar objects, but not to databases. Refer to the sections below that apply to your use case.

## Data type mappings

PDI data types map internally to Java data types, so the Java behavior of these data types applies to the associated fields, parameters, and variables used in your transformations and jobs. The following table describes these mappings.

| PDI data type    | Java data type | Description                                                                   |
| ---------------- | -------------- | ----------------------------------------------------------------------------- |
| BigNumber        | BigDecimal     | An arbitrary unlimited precision number.                                      |
| Binary           | Byte\[]        | An array of bytes that contain any type of binary data.                       |
| Boolean          | Boolean        | A boolean value `true` or `false.`                                            |
| Date             | Date           | A date-time value with millisecond precision.                                 |
| Integer          | Long           | A signed long 64-bit integer.                                                 |
| Internet Address | InetAddress    | An Internet Protocol (IP) address.                                            |
| Number           | Double         | A double precision floating point value.                                      |
| String           | String         | A variable unlimited length text encoded in UTF-8 (Unicode).                  |
| Timestamp        | Timestamp      | Allows the specification of fractional seconds to a precision of nanoseconds. |

### Using the correct data type for math operations

Using the correct data type for math operations helps ensure expected results from your transformations and jobs. The Number, BigNumber, and Integer types offer specific solutions for different computing needs. The following table highlights the behaviors and possible uses for each data type. For information about the proper method to round or truncate numbers, see [Applying calculations and rounding](#applying-calculations-and-rounding).

| PDI data type | Description                                                                                                                                                                                                                                                                                             |
| ------------- | ------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------- |
| Number        | <p>Use for general math with real numbers:</p><ul><li>Decimal precision is not guaranteed.</li><li>Normally precise within 15 to 16 decimal points.</li><li>15.4 may be represented as 15.400000000000000000001111111</li><li>15.498723528092515678989712397 may be 15.498723528092515701252…</li></ul> |
| BigNumber     | <p>Use to get exact results from math of decimal numbers:</p><ul><li>Guarantees precision to about 2 billion decimal places.</li><li>Requires more memory than Integer or Number.</li><li>15.498723528092515678989712397 will always be 15.498723528092515678989712397</li></ul>                        |
| Integer       | <p>Use for math without a fraction or a decimal component:</p><ul><li>Handles minimum and maximum values ranging from -9,223,372,036,854,775,808 to 9,223,372,036,854,775,807</li></ul>                                                                                                                 |

## Using the fields table properties

You define properties for the fields to read or write using the fields table. The properties in the fields table determine the field-level processing options for your row data, including the metadata attributes. Some commonly used steps that include a fields table are [Split Fields](https://github.com/pentaho/documentation/blob/main/PDIA/11.0/PDI/Transformation%20steps/PDI%20transformation%20steps%20reference%20\(overview\)/Split%20Fields=GUID-36055D0C-1602-4F21-BEA7-BEEDF325CAAD=2=en=.md), [Select Values](https://github.com/pentaho/documentation/blob/main/PDIA/11.0/PDI/Transformation%20steps/PDI%20transformation%20steps%20reference%20\(overview\)/Select%20Values=GUID-14FD2047-C05A-4E28-A0E9-8C8696823EC4=2=en=.md), [Text File Output](/pdia-data-integration/pdi-transformation-steps-reference-overview/text-file-output-cp.md), and [Concat Fields](http://wiki.pentaho.com/display/EAI/Concat+Fields).

When using the fields table the following definitions and processing rules apply.

**Note:** Depending on the transformation step or job entry, some fields tables may feature only a portion of the columns listed below.

![PDI fields table](/files/x6ULxOQazj6tybbAkr8H)

* **Name**

  The name of the field.
* **Type**

  The type of the field. For example, String, Date, or Number. See [Data type mappings](#data-type-mappings) for more information.
* **Format**

  Defines the format mask to use when converting the value to, or reading the value from, a string. The **Format** drop-down menu offers suggestions, but you can enter your own mask. **Format** is only used when converting a non-string data type to a string data type. **Format** overrides **Length** and **Precision**. See [Applying formatting](#applying-formatting) for formatting details.
* **Length**

  Defines the length to use when converting the value to, or reading the value from, a string. The numbers before the decimal point, or a value that is longer than the maximum length, will not be truncated. **Length**, also called Precision in some databases, is a metadata component. PDI converts to the required metadata type when the data is resulted to a string, not during the transformation (or job) or if resulted to non-string data types. See [Output type examples](#output-type-examples) for a listing of string and non-string types. **Length** is not used when **Format** is specified.
* **Precision**

  Defines the number of digits after the decimal point to use when converting the value to, or reading the value from, a string. The numbers before the decimal point will not be truncated. **Precision**, also called Scale in some databases, is a metadata component. PDI converts to the required metadata type when the data is resulted to a string, not during the transformation (or job) or if resulted to non-string data types. See [Output type examples](#output-type-examples) for a listing of string and non-string types. **Precision** is not used when **Format** is specified.
* **Currency**

  Used in conjunction with **Format** to interpret numbers such as `$10,000.00` or `E5.000,00`. If the format mask contains the Unicode currency symbol ¤ (`\u00A4`), then it replaces the symbol by the value in the currency column. In Pentaho, you must use the copy and paste method to apply this symbol. See [Common Formats](/pdia-data-integration/pdi-transformation-steps-reference-overview/common-formats.md) for information on valid number formats.
* **Decimal**

  Represents the character that replaces the period (.) in the format mask. Only applies when converting the value to, or reading the value from, a string.
* **Group**

  Represents the character that replaces the comma (,) in the format mask. Only applies when converting the value to, or reading the value from, a string.
* **Null if**

  Converts the value to null if the input value matches.

  **Note:** This value is case-sensitive.
* **Default**

  Defaults to this value if the value is null.
* **Trim type**

  Defines the type of trimming to perform on the input or the output string. Trimming removes the white space on either side of a string. Options are both, left, right, or none.
* **Repeat**

  Determines how null rows are handled. If the value in this row is null, then the value from the last row where the column was not null is used.

### Applying formatting

Format masks define how data returned for a field is converted to, or from, a string. For example, a field might return the value "`7000`", but you want to display it as "`$7,000.00`". To do this, you apply a format mask to the field. The original data is not truncated when using a format mask.

As shown in the table below, when **Format** is used with **Decimal**, the period (.) in the format mask is replaced with the indicated character. Alternatively, when **Format** is used with **Group**, the comma (,) in the format mask is replaced with the indicated character. See [Common Formats](/pdia-data-integration/pdi-transformation-steps-reference-overview/common-formats.md) for information on valid number formats.

| Input / Output value | **Format** | **Decimal** | **Group** |
| -------------------- | ---------- | ----------- | --------- |
| 10.0                 | #.#        | .           | ,         |
| 1,546.99             | #,###.##   | .           | ,         |
| 1g546d99             | #,###.##   | d           | g         |
| €1.546,99            | €#,###.00  | ,           | .         |
| $1,546.99            | $#,###.00  | .           | ,         |

The following table shows that when \*\*Format\*\*, \*\*Decimal\*\*, \*\*Group\*\*, \*\*Length\*\*, and \*\*Precision\*\* are used together. \*\*Format\*\* always overrides \*\*Length\*\* and \*\*Precision\*\*.

| Input    | **Format** | **Decimal** | **Group** | **Length** | **Precision** | String output | Number output |
| -------- | ---------- | ----------- | --------- | ---------- | ------------- | ------------- | ------------- |
| 10.0     | #.#        | .           | ,         | 5          | 2             | 10.0          | 10.0          |
| 10.0     |            | .           | ,         | 5          | 2             | 010.00        | 10.0          |
| 10.01    |            | .           | ,         | 2          | 1             | 10.0          | 10.01         |
| 1,546.99 | #,###.##   | .           | ,         | 10         | 3             | 1,546.99      | 1546.99       |
| 1,546.99 | 0#,###.000 | .           | ,         |            |               | 01,546.990    | 1546.99       |

### Applying calculations and rounding

Number and date calculations performed in PDI do not apply the **Format**, **Length**, and **Precision** properties. For example, using the table below, A + B + B = `30.1` If you preview B, it will appear as `10.0`, so you would think `10.02 + 10.0 + 10.0` `= 30.02`. However, because B was never converted to a string for the calculation, `10.02 + 10.04 + 10.04` `= 30.1.`

| Field | Input   | **Format** | **Decimal** | **Group** | **Length** | **Precision** |
| ----- | ------- | ---------- | ----------- | --------- | ---------- | ------------- |
| A     | `10.02` | #.0        | .           | ,         | 5          | 1             |
| B     | `10.04` |            | .           | ,         | 5          | 1             |

If you want to truncate a string, use the [Strings cut](/pdia-data-integration/pdi-transformation-steps-reference-overview/strings-cut.md) step.

If you want to round or truncate a number, use the following [Calculator](/pdia-data-integration/pdi-transformation-steps-reference-overview/calculator.md) step features:

* Round function
* Floor and Ceil functions

Alternatively, you can convert the date or number to a string in the [Select Values](/pdia-data-integration/pdi-transformation-steps-reference-overview/select-values.md) step, which applies the formatting specified in the metadata.

## Output type examples

The following table provides examples of the string and non-string output types in PDI. Note that **Format**, **Length**, **Precision**, **Decimal**, and **Group** apply only when reading from, or outputting to, a string.

| String output type example                                             | Non-string output type example                    |
| ---------------------------------------------------------------------- | ------------------------------------------------- |
| Preview                                                                | Table Output when the target field is a number.   |
| Text File Output                                                       | Avro Output when the target field is a number.    |
| JSON Output                                                            | Parquet Output when the target field is a number. |
| XML Output                                                             | ORC Output when the target field is a number.     |
| Table Output when the target field is a varchar.                       | Any binary output type                            |
| Anything displayed by PDI including logs, error messages, and prompts. |                                                   |


---

# Agent Instructions: Querying This Documentation

If you need additional information that is not directly available in this page, you can query the documentation dynamically by asking a question.

Perform an HTTP GET request on the current page URL with the `ask` query parameter:

```
GET https://docs.pentaho.com/pdia-data-integration/understanding-pdi-data-types-and-field-metadata.md?ask=<question>
```

The question should be specific, self-contained, and written in natural language.
The response will contain a direct answer to the question and relevant excerpts and sources from the documentation.

Use this mechanism when the answer is not explicitly present in the current page, you need clarification or additional context, or you want to retrieve related documentation sections.
