# HBase Output

Use the **HBase Output** step to write data to an HBase table according to user-defined column metadata.

### Step name

* **Step name**: Specify the unique name of the step on the canvas. You can customize the name or leave the default.

### Options

The HBase Output step includes two tabs:

* **Configure connection**
* **Create/Edit mappings**

#### Configure connection tab

This tab contains HBase connection information.

You can configure a connection in one of two ways:

* Use Hadoop cluster properties.
* Use an `hbase-site.xml` (and optional `hbase-default.xml`) configuration file.

Below the connection details are fields to specify which target HBase table to write to and which mapping to use to encode incoming field values.

![Configure connection tab](/files/pTgfxV3p6YTBvaH9FdiS)

**Connection and write options**

* **Hadoop cluster**: Select an existing Hadoop cluster configuration.
  * Select **Edit** to edit an existing cluster configuration.
  * Select **New** to create a new cluster configuration.
  * For details, see [Connecting to a Hadoop cluster with the PDI client](/pdia-data-integration/extracting-data-into-pdi/connecting-to-a-hadoop-cluster-with-the-pdi-client-article.md).
* **URL to hbase-site.xml**: Location of the `hbase-site.xml` configuration file.
* **URL to hbase-default.xml**: Location of the optional `hbase-default.xml` configuration file.
* **HBase table name**: Target HBase table.
* **Get table names**: Populates the table name list.

  Only the names of tables that have a mapping are retrieved. If you enter `namespace:` in **HBase table name** and then select **Get table names**, only table names in that namespace are shown.

  For namespace details, see [Namespaces](/pdia-data-integration/pdi-transformation-steps-reference-overview/hbase-input-cp-main-page.md#namespaces).
* **Mapping name**: Mapping used to encode and interpret column values.

  Select **Get mappings for the specified table** to populate available mappings.
* **Store mapping info in step meta**: Stores mapping information in step metadata instead of loading it from HBase at runtime.
* **Delete rows by mapping key**: Deletes the rows whose row key matches the value of the incoming field that is mapped as the key.
* **Disable write to WAL**: Disables writing to the Write Ahead Log (WAL).

  The WAL provides a recovery mechanism if a server fails while data is being inserted. Disabling WAL can improve performance.

  This option is not available when **Delete rows by mapping key** is selected.
* **Size of write buffer (bytes)**: Size of the buffer used to transfer data to HBase.

  A larger buffer uses more memory on the client and server but results in fewer remote procedure calls.

  If you leave this field blank, the default in `hbase-default.xml` is used (2 MB / 2097152 bytes).

#### Create/Edit mappings tab

This tab creates or edits a mapping for a given HBase table.

A mapping defines metadata about values stored in the table. Because HBase stores most values as raw bytes, mappings allow PDI to encode values correctly.

Before a value can be written to HBase, you must specify:

* The column family the value belongs to
* The value type
* The key type

The names of fields entering the step must match the **Alias** values in the mapping.

* There can be fewer incoming fields than fields in the mapping.
* If there are more incoming fields than the mapping defines, the step logs an error.
* One incoming field must match the key defined in the mapping.
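The matching rules above can be sketched as a small pre-flight check. This is an illustrative sketch only, not the step's actual implementation: the class `MappingFieldCheck`, its `validate` method, and the message texts are hypothetical, and it assumes that "more incoming fields than the mapping defines" means any incoming field without a matching **Alias**.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Set;

// Hypothetical helper, not part of PDI: checks incoming field names
// against the aliases of a mapping, following the rules listed above.
final class MappingFieldCheck {

    static List<String> validate(List<String> incomingFields,
                                 Set<String> mappingAliases,
                                 String keyAlias) {
        List<String> problems = new ArrayList<>();

        // Every incoming field needs a matching alias in the mapping.
        for (String field : incomingFields) {
            if (!mappingAliases.contains(field)) {
                problems.add("No mapping alias for incoming field: " + field);
            }
        }

        // One incoming field must carry the key.
        if (!incomingFields.contains(keyAlias)) {
            problems.add("No incoming field matches the key alias: " + keyAlias);
        }

        // Fewer incoming fields than mapped fields is allowed, so
        // unused aliases are not reported.
        return problems;
    }
}
```

An empty result means the incoming stream is compatible with the mapping under these assumptions.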

This tab works similarly to [HBase Input](/pdia-data-integration/pdi-transformation-steps-reference-overview/hbase-input-cp-main-page.md), except that HBase Output can create the target table if it does not already exist.

![Create/Edit mappings tab](/files/0ZF0n9KkUuNayeZ8NO9S)

**Top-level fields**

* **HBase table name**: Select a table name.

  Connection details on the **Configure connection** tab must be complete and valid for this list to populate.
* **Get table names**: Retrieves all table names, including tables without Pentaho mappings.
* **Mapping name**: Existing mappings for the selected table.

  You can define multiple mappings on the same table using different subsets of columns.

**Mapping fields table**

Columns:

* **#**: Order of the mapping operation.
* **Alias**: Name you assign to the field. Required for the key; optional for non-key columns.
* **Key**: Whether the field is the table key.
* **Column family**: Column family for non-key columns.
* **Column name**: Column name.
* **Type**: Data type.

  Key column types:

  * String
  * Integer
  * UnsignedInteger
  * Long
  * UnsignedLong
  * Date
  * UnsignedDate
  * Binary

  Non-key column types:

  * String
  * Integer
  * Long
  * Float
  * Double
  * Boolean
  * Date
  * BigNumber
  * Serializable
  * Binary
* **Indexed values**: Comma-separated list of legal values for a String column.

Buttons:

* **Get incoming fields**: Populates the mapping table from the incoming stream fields.
* **Create a tuple template**: Creates a template to write tuples to HBase.
* **Save mapping**: Saves the mapping.
* **Delete mapping**: Deletes the mapping (does not delete the HBase table).

**Mapping notes**

A valid mapping must define metadata for the table key. The key must have an **Alias** because HBase does not provide a key name.

For keys to sort properly in HBase, note the distinction between signed and unsigned numbers.

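HBase sorts row keys by comparing their raw bytes. In two's-complement form, negative integer and long values have the sign bit set, so their bytes would sort after those of positive values. The sign bit must therefore be flipped before storing signed numbers so that positive numbers sort after negative numbers. Unsigned values can be stored directly.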
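The sign-bit flip can be illustrated with a short sketch. It shows the general technique for a signed long key and is not taken from the PDI source; the class and method names (`SignedKeyEncoding`, `encodeSignedLong`, `decodeSignedLong`, `compareBytes`) are hypothetical.

```java
import java.nio.ByteBuffer;

// Hypothetical illustration of the sign-bit flip for signed long keys.
final class SignedKeyEncoding {

    // XOR with Long.MIN_VALUE flips only the sign bit; the result is
    // written as 8 big-endian bytes.
    static byte[] encodeSignedLong(long value) {
        return ByteBuffer.allocate(Long.BYTES)
                .putLong(value ^ Long.MIN_VALUE)
                .array();
    }

    // Flipping the sign bit again restores the original value.
    static long decodeSignedLong(byte[] bytes) {
        return ByteBuffer.wrap(bytes).getLong() ^ Long.MIN_VALUE;
    }

    // Byte-by-byte unsigned comparison, the ordering used for row keys.
    static int compareBytes(byte[] a, byte[] b) {
        int n = Math.min(a.length, b.length);
        for (int i = 0; i < n; i++) {
            int diff = (a[i] & 0xFF) - (b[i] & 0xFF);
            if (diff != 0) {
                return diff;
            }
        }
        return a.length - b.length;
    }
}
```

With this encoding, the bytes for `-5` compare lower than the bytes for `3`. Without the flip, the leading `0xFF` byte of `-5` would place it after every positive value.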

Additional behavior:

* **String columns** can optionally define legal values by entering comma-separated values in **Indexed values**.
* **Date keys** can be stored as signed or unsigned long types (epoch-based timestamps). If you map a date key as **String**, PDI can change its type to **Date** for manipulation in the transformation.
* **Boolean values** can be stored as integer or long values (`0`/`1`) or as strings (`Y/N`, `yes/no`, `true/false`, `T/F`).
* **BigNumber** values can be stored as serialized `BigDecimal` objects or as strings parseable by `BigDecimal`.
* **Serializable** values are serialized Java objects.
* **Binary** values are raw byte arrays.
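As an illustration of the string forms listed for Boolean values, the following sketch interprets them in plain Java. It is not the step's own decoder: the class `BooleanText` is hypothetical, and treating the strings as case-insensitive is an assumption.

```java
import java.util.Locale;

// Hypothetical helper: interprets the Boolean string forms listed above.
final class BooleanText {

    static boolean parse(String text) {
        // Assumption: surrounding whitespace and letter case are ignored.
        switch (text.trim().toLowerCase(Locale.ROOT)) {
            case "y":
            case "yes":
            case "true":
            case "t":
                return true;
            case "n":
            case "no":
            case "false":
            case "f":
                return false;
            default:
                throw new IllegalArgumentException(
                        "Not a recognized Boolean string: " + text);
        }
    }
}
```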

To speed up mapping creation, select **Get incoming fields**.

* **Alias** and **Column name** are set to each incoming field name.
* Type information is set automatically.
* **Column family** is set to either:
  * The first column family defined (if the table exists)
  * `Family1` (if the table does not exist)

{% hint style="warning" %}
The step does not support adding new column families to an existing table.
{% endhint %}

### Performance considerations

Write buffering and WAL settings can affect performance:

* If you leave **Size of write buffer (bytes)** blank, the buffer is 2 MB (default), auto flush is enabled, and Put operations are executed immediately. This means each row is transmitted to HBase as soon as it reaches the step.
* If you enter a value for **Size of write buffer (bytes)** (even the default value), auto flush is disabled and rows are transferred only when the buffer is full.
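The effect of buffering on the number of transfers can be estimated with a simplified model. This sketch does not call HBase. The class `WriteBufferModel`, the fixed row size, and the final flush when the step finishes are assumptions made for illustration.

```java
// Hypothetical model of how many transfers reach HBase for a given
// buffer size. It only counts flushes and performs no I/O.
final class WriteBufferModel {

    static long transferCount(long rows, long bytesPerRow,
                              long bufferBytes, boolean autoFlush) {
        if (autoFlush) {
            // Each row is sent as soon as it reaches the step.
            return rows;
        }
        long transfers = 0;
        long buffered = 0;
        for (long i = 0; i < rows; i++) {
            buffered += bytesPerRow;
            if (buffered >= bufferBytes) {
                // Buffer is full: send its contents in one transfer.
                transfers++;
                buffered = 0;
            }
        }
        if (buffered > 0) {
            // Assumption: remaining rows are flushed at the end.
            transfers++;
        }
        return transfers;
    }
}
```

Under this model, 10,000 rows of 1,024 bytes each need 10,000 transfers with auto flush enabled, but only 5 with a 2,097,152-byte buffer.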

Disabling the **Write Ahead Log (WAL)** can improve performance but reduces the ability to recover after server failures.

#### Creating new tables (compression and Bloom filter options)

On the **Create/Edit mappings** tab, you can create a new table by entering a table name that does not already exist.

You can suffix a new table name with options for compression and Bloom filters:

* Compression options: `NONE`, `GZ`, `LZO`
* Bloom filter options: `NONE`, `ROW`, `ROWCOL`

If you do not specify options, the defaults are `NONE` for both compression and Bloom filters.

Example:

```
NewTable@GZ@ROWCOL
```
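To make the suffix format concrete, the following sketch splits such a name into its parts. It is not PDI code: the class `NewTableSpec` is hypothetical, and it assumes that the compression option comes first and the Bloom filter option second, as in the example, with `NONE` used for any omitted option.

```java
import java.util.Arrays;
import java.util.List;

// Hypothetical parser for names such as "NewTable@GZ@ROWCOL".
final class NewTableSpec {
    static final List<String> COMPRESSION = Arrays.asList("NONE", "GZ", "LZO");
    static final List<String> BLOOM = Arrays.asList("NONE", "ROW", "ROWCOL");

    final String tableName;
    final String compression;
    final String bloomFilter;

    private NewTableSpec(String tableName, String compression, String bloomFilter) {
        this.tableName = tableName;
        this.compression = compression;
        this.bloomFilter = bloomFilter;
    }

    static NewTableSpec parse(String spec) {
        String[] parts = spec.split("@");
        if (parts.length == 0 || parts.length > 3 || parts[0].isEmpty()) {
            throw new IllegalArgumentException("Unexpected table spec: " + spec);
        }
        // Assumed positional order: name, compression, Bloom filter.
        String compression = parts.length > 1 ? parts[1] : "NONE";
        String bloomFilter = parts.length > 2 ? parts[2] : "NONE";
        if (!COMPRESSION.contains(compression)) {
            throw new IllegalArgumentException("Unknown compression: " + compression);
        }
        if (!BLOOM.contains(bloomFilter)) {
            throw new IllegalArgumentException("Unknown Bloom filter: " + bloomFilter);
        }
        return new NewTableSpec(parts[0], compression, bloomFilter);
    }
}
```

Parsing `NewTable@GZ@ROWCOL` yields the table name `NewTable`, compression `GZ`, and Bloom filter `ROWCOL`. Parsing `NewTable` alone yields `NONE` for both options.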

{% hint style="info" %}
Due to licensing constraints, HBase does not ship with LZO compression libraries. Install them on each node if you want to use LZO compression.
{% endhint %}

