# Elasticsearch REST bulk insert

This step is available as a separate plugin from the [Pentaho EE Marketplace](https://support.pentaho.com/hc/en-us/categories/200568085-Downloads).

Use the **Elasticsearch REST bulk insert** step if you have records that you want to submit to an Elasticsearch server for indexing. Elastic is a platform of products to search, analyze, and visualize data. The Elastic platform includes Elasticsearch, which is a Lucene-based, multi-tenant-capable, distributed search and analytics engine.

This step sends one or more batches of records to an Elasticsearch server for indexing. Because you can specify the batch size, you can send one, a few, or many records to Elasticsearch.

When record data flows out of the Elasticsearch REST bulk insert step, PDI sends it to Elasticsearch along with the name of the target index as metadata. This step is commonly used to send a batch of data to an Elasticsearch server and create a new index. You can also use this step to add a batch of data to an existing index.

For more information about Elasticsearch, see:

* [Elasticsearch reference](https://www.elastic.co/guide/en/elasticsearch/reference/current/index.html)
* [Bulk API](https://www.elastic.co/guide/en/elasticsearch/reference/current/docs-bulk.html)
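
The Bulk API linked above accepts newline-delimited JSON: each document is preceded by an action line that names the target index, and the body ends with a newline. The following Python sketch shows what such a body looks like for two records. It only illustrates the request format. The function name `build_bulk_body`, the index name `sales`, and the sample records are hypothetical, and the plugin's internal implementation may differ.

```python
import json


def build_bulk_body(index_name, documents):
    """Return an NDJSON bulk body: one action line plus one source line per document."""
    lines = []
    for doc in documents:
        # Action line: tells Elasticsearch to index the next line into index_name.
        lines.append(json.dumps({"index": {"_index": index_name}}))
        # Source line: the document itself, serialized on a single line.
        lines.append(json.dumps(doc))
    # Every line, including the last one, is terminated by a newline.
    return "".join(line + "\n" for line in lines)


# Hypothetical sample records standing in for rows on the PDI stream.
body = build_bulk_body("sales", [{"id": 1, "city": "Lisbon"}, {"id": 2, "city": "Porto"}])
print(body)
```

A body like this is sent to the `_bulk` endpoint with the `Content-Type: application/x-ndjson` header.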

### Before you begin

Gather the following items:

* The Elasticsearch REST bulk insert plugin. For installation details, see [Install plugins](https://docs.pentaho.com/pdia-data-integration/redirects/install-plugins-in-pdi).
* A working server with Elasticsearch version 7.x or 8.x installed, or a SaaS offering for your Elasticsearch server. You should be able to connect to Elasticsearch from the computer running PDI.

  <div data-gb-custom-block data-tag="hint" data-style="info" class="hint hint-info"><p>As a best practice, use compatibility mode when connecting to Elasticsearch 8.x with older clients. For details, see <a href="https://www.elastic.co/guide/en/elasticsearch/client/net-api/7.17/connecting-to-elasticsearch-v8.html">Connecting to Elasticsearch v8.x using the v7.17.x client</a>.</p></div>
* Privileges to create, insert, and update on the indexes that you need to access on the Elasticsearch server.
* Files or data that you want Elasticsearch to index.

### Step name

* **Step name**: Specify the unique name of the Elasticsearch REST bulk insert step on the canvas. You can customize the name or leave it as the default.

### Options

The Elasticsearch REST bulk insert step includes three tabs: **General**, **Document**, and **Output**.

#### General tab

![Elasticsearch REST bulk insert step](https://773338310-files.gitbook.io/~/files/v0/b/gitbook-x-prod.appspot.com/o/spaces%2FYwnJ6Fexn4LZwKRHghPK%2Fuploads%2Fgit-blob-2bab1adcb5a43c2011c42ad5481a523ee8ebfc4f%2FPDI_Elasticsearch_REST_bulk_insert.png?alt=media)

Use the **General** tab to configure connections to your Elastic nodes and set options for the destination index.

**Connection**

Specify the connection options for each server in the **Servers** table.

| Column      | Description                                                                                                                                     |
| ----------- | ----------------------------------------------------------------------------------------------------------------------------------------------- |
| **#**       | Number of the entry.                                                                                                                            |
| **Address** | Hostname (optionally specified with a variable) of the node you want to connect to.                                                             |
| **Port**    | Port (optionally specified with a variable) of the Elastic REST interface.                                                                      |
| **Scheme**  | Scheme or protocol (optionally specified with a variable) to use for REST communication. Typically `http`, or `https` for secured Elastic nodes. |
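
As an illustration of how one row of the **Servers** table identifies a REST endpoint, the following Python sketch resolves `${NAME}`-style variables and combines the scheme, address, and port into a base URL. The variable names `ES_HOST` and `ES_PORT`, the host name, and the helper functions are hypothetical, and the sketch uses Python's `string.Template` rather than PDI's own variable resolution.

```python
from string import Template


def resolve(value, variables):
    """Replace ${NAME} placeholders; unknown names are left untouched."""
    return Template(str(value)).safe_substitute(variables)


def base_url(row, variables):
    """Combine one Servers table row into scheme://address:port."""
    scheme = resolve(row["scheme"], variables)
    address = resolve(row["address"], variables)
    port = resolve(row["port"], variables)
    return f"{scheme}://{address}:{port}"


# Hypothetical variable values and table row, for illustration only.
variables = {"ES_HOST": "es-node-1.example.com", "ES_PORT": "9200"}
row = {"address": "${ES_HOST}", "port": "${ES_PORT}", "scheme": "https"}
url = base_url(row, variables)
print(url)
```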

**Authentication**

Use the **Authentication** options to set user verification options.

| Field              | Description                                                                                                                                                                       |
| ------------------ | --------------------------------------------------------------------------------------------------------------------------------------------------------------------------------- |
| **Authentication** | Authentication method for the Elastic nodes. Select **None** to connect without authentication, or **Basic** to provide a **Username** and **Password** for basic authentication. |
| **Test**           | Test the connection and authentication settings.                                                                                                                                  |

**Index**

Use the **Index** options to name and test the output Elastic index.

| Field      | Description                                                                                                                                                                            |
| ---------- | -------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------- |
| **Index**  | Name of the target index for documents submitted by bulk insert requests. You can specify this value as a variable. If the index does not exist in Elasticsearch, the step creates it. |
| **Test**   | Test connectivity to the output index.                                                                                                                                                 |
| **Create** | Create the index if it does not exist.                                                                                                                                                 |

#### Document tab

Use the **Document** tab to specify the documents to index in bulk insert requests. You can either create a document to index from stream fields or use an existing JSON document from a field.

**Create a document to index with stream field data**

![Elasticsearch REST Bulk Insert step, Document tab - Create index option](https://773338310-files.gitbook.io/~/files/v0/b/gitbook-x-prod.appspot.com/o/spaces%2FYwnJ6Fexn4LZwKRHghPK%2Fuploads%2Fgit-blob-4bbae0fec862112cbf42f63e93e840a0b58f9212%2FPDI_ElasticsearchRESTbulkInsert_DocumentTab.png?alt=media)

Use **Create a document to index with stream field data** to turn each row of stream data into a unique JSON document to be indexed in the bulk request.

Define the fields to use from the input stream with a target name. Select **Get Fields** to automatically populate the list with all incoming stream fields.

| Field           | Description                                                          |
| --------------- | -------------------------------------------------------------------- |
| **Name**        | Name of the source field that the step receives on the input stream. |
| **Target name** | Name of the destination field in the generated JSON document.        |
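
The **Name** to **Target name** mapping can be pictured as a simple rename while building the JSON document. The following Python sketch assumes that only the listed fields are included in the document. The field names, the sample row, and the `row_to_document` helper are hypothetical.

```python
import json

# Hypothetical mapping: (Name on the input stream, Target name in the JSON document).
field_map = [("customer_id", "id"), ("cust_name", "name")]


def row_to_document(row, field_map):
    """Build a JSON-ready dict from the listed stream fields, renamed to their targets."""
    return {target: row[name] for name, target in field_map}


# Hypothetical stream row; internal_flag is not listed in field_map, so it is left out.
row = {"customer_id": 42, "cust_name": "Ada", "internal_flag": True}
doc = row_to_document(row, field_map)
print(json.dumps(doc))
```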

**Use an existing JSON document from a field**

![Elasticsearch REST Bulk Insert step, Document tab - Use existing option](https://773338310-files.gitbook.io/~/files/v0/b/gitbook-x-prod.appspot.com/o/spaces%2FYwnJ6Fexn4LZwKRHghPK%2Fuploads%2Fgit-blob-17836fb03bb6090037afffca9c102e733fe95926%2FPDI_ElasticsearchRESTbulkInsert_DocumentTab_JSONfield.png?alt=media)

Select **Use an existing JSON document from a field** if the document you want to index is already available as JSON in a field on the input stream.

| Field          | Description                                                                                   |
| -------------- | --------------------------------------------------------------------------------------------- |
| **JSON Field** | Name of the incoming field that contains a JSON document to be indexed for each row of input. |

#### Output tab

![Elasticsearch REST Bulk Insert step, Output tab](https://773338310-files.gitbook.io/~/files/v0/b/gitbook-x-prod.appspot.com/o/spaces%2FYwnJ6Fexn4LZwKRHghPK%2Fuploads%2Fgit-blob-a8ab5ded1552c5146940de20e024ddc56dc35866%2FPDI_ElasticsearchRESTbulkInsert_OutputTab.png?alt=media)

Use the **Output** tab to configure step output and error handling.

**Index settings**

| Field                   | Description                                                                                                                                                         |
| ----------------------- | ------------------------------------------------------------------------------------------------------------------------------------------------------------------- |
| **ID Field**            | (Optional) Input field whose value identifies the document indexed in Elasticsearch. If you do not specify a field, Elasticsearch generates an ID automatically.    |
| **Overwrite if exists** | If selected and **ID Field** is specified, updates a document if the ID exists in the target index. If the ID does not exist, a new document is added to the index. |
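
In the Elasticsearch Bulk API, an `index` action adds a document or replaces an existing one with the same `_id`, while a `create` action is rejected if that `_id` already exists. The following Python sketch shows one way the settings above could translate into action lines. The mapping of **Overwrite if exists** to `index` versus `create` is an assumption made for illustration, not documented plugin behavior, and the `action_line` helper is hypothetical.

```python
import json


def action_line(index_name, doc_id=None, overwrite=True):
    """Return the bulk action line for one document."""
    meta = {"_index": index_name}
    if doc_id is not None:
        meta["_id"] = str(doc_id)
    # ASSUMPTION: overwrite maps to "index" (add or replace) and no overwrite maps
    # to "create" (reject duplicates). Without an ID, Elasticsearch generates one,
    # so "index" is always used in that case.
    op = "index" if overwrite or doc_id is None else "create"
    return json.dumps({op: meta})


print(action_line("sales"))                      # no ID: Elasticsearch generates one
print(action_line("sales", 7))                   # ID given, replace if it exists
print(action_line("sales", 7, overwrite=False))  # ID given, reject if it exists
```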

**Step settings**

| Field               | Description                                                                                                                                                                                                                                                                                 |
| ------------------- | ------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------- |
| **Stop on error**   | Stop processing if there is an error, such as a problem adding the document or pushing the batch to the index, or if the JSON is not well-formed. If this option is cleared and an error occurs, the row is not processed, but the transformation continues so other rows can be processed. |
| **Output rows**     | Pass through the input row data, and optionally output a new document index ID if **ID Output Field** is specified.                                                                                                                                                                         |
| **ID Output Field** | (Optional) Name of the ID field to output newly indexed document IDs. If you leave this blank, the value in **ID Field** is used.                                                                                                                                                           |
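
The difference between selecting and clearing **Stop on error** can be sketched as follows. This Python example uses malformed JSON as the error condition and is only a model of the behavior described in the table. The `process_rows` function and the sample rows are hypothetical.

```python
import json


def process_rows(rows, stop_on_error):
    """Parse each row as JSON; either abort on the first bad row or skip it."""
    processed, skipped = [], []
    for row in rows:
        try:
            processed.append(json.loads(row))
        except ValueError:
            if stop_on_error:
                # Stop on error selected: the failure ends processing.
                raise
            # Stop on error cleared: the row is not processed, the rest continue.
            skipped.append(row)
    return processed, skipped


# Hypothetical input: the second row is not well-formed JSON.
rows = ['{"a": 1}', '{bad', '{"b": 2}']
processed, skipped = process_rows(rows, stop_on_error=False)
print(processed, skipped)
```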

**Batch settings**

| Field       | Description                                                                                                                                     |
| ----------- | ----------------------------------------------------------------------------------------------------------------------------------------------- |
| **Size**    | Number of items in a batch. Specify a size greater than 1 to perform a bulk insert. A size of 1 does not perform a bulk insert.                 |
| **Timeout** | Value and unit of measure for the maximum amount of time the bulk request can take to process on the Elastic server before the batch times out. |
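
To show how **Size** groups records, the following Python sketch splits a stream of rows into batches of at most `size` items, with a smaller final batch when the row count is not a multiple of the size. The `batches` generator is a hypothetical helper, not the plugin's code.

```python
from itertools import islice


def batches(rows, size):
    """Yield lists of at most `size` rows, in order, until the input is exhausted."""
    if size < 1:
        raise ValueError("size must be at least 1")
    iterator = iter(rows)
    while True:
        batch = list(islice(iterator, size))
        if not batch:
            return
        yield batch


# Five hypothetical rows with a batch size of 2.
result = list(batches(range(5), 2))
print(result)
```

With a size of 1, every batch holds a single row, which matches the note that a size of 1 does not perform a bulk insert.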
