# Bulk load into Databricks

Use the **Bulk load into Databricks** job entry to load large amounts of data from files in your cloud accounts into Databricks tables.

This entry uses the Databricks [`COPY INTO`](https://docs.databricks.com/en/sql/language-manual/delta-copy-into.html) command.

### General

* **Entry name**: Specifies the unique name of the Bulk load into Databricks job entry on the canvas. You can customize the name or leave it as the default.

### Options

The **Bulk load into Databricks** entry requires you to specify options and parameters on the **Input** and **Output** tabs.

#### Input tab

{% hint style="info" %}
The input file must exist in either a Databricks external location or a managed volume.
{% endhint %}

![PDI Bulk load Databricks Input tab](/files/X4UV7qRIRGFI4iGpfH73)

| Field                              | Description                                                                                                                                                                                                                                                                                                                                                                                                                    |
| ---------------------------------- | ------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------ |
| **Source**                         | Specify the path to the input file. This must be the path to a file in a Databricks external location or managed volume.                                                                                                                                                                                                                                                                                                       |
| **What file type is your source?** | <p>Specify the format of the source file. Supported formats are:</p><ul><li>AVRO</li><li>BINARYFILE</li><li>CSV</li><li>JSON</li><li>ORC</li><li>PARQUET</li><li>TEXT</li></ul>                                                                                                                                                                                                                                                |
| **Force**                          | Set to **false** (default) to skip files that have already been copied into the target table. Set to **true** to copy files again, even if they have already been copied into the table. |
| **Merge schema**                   | <p>Set to <strong>false</strong> (default) to fail if the schema of the target table does not match the schema of the incoming files. Set to <strong>true</strong> to add a column to the target table for each column in the source file that does not already exist in the target table.</p><p>The target column types must still match the source column types, even when <strong>Merge schema</strong> is set to <strong>true</strong>.</p> |
| **Format Options**                 | <p>Each file format has a number of options that are specific to that format. Use this table to specify the appropriate options for your file format. See Databricks <a href="https://docs.databricks.com/en/sql/language-manual/delta-copy-into.html#format-options">format options</a>.</p><p><strong>Note:</strong> This entry does not validate that the options entered are appropriate for the selected file format.</p> |
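
The fields on this tab correspond to clauses of the Databricks `COPY INTO` command. The following Python sketch shows one plausible mapping from the fields to a statement. It is an illustration only: the exact statement the job entry generates is not documented here, and the function name `build_copy_into`, the table name, and the volume path are hypothetical.

```python
from typing import Mapping, Optional

# File formats listed for the "What file type is your source?" field.
SUPPORTED_FORMATS = {"AVRO", "BINARYFILE", "CSV", "JSON", "ORC", "PARQUET", "TEXT"}


def _quote(value: str) -> str:
    """Wrap a value in single quotes, escaping backslashes and single quotes."""
    return "'" + value.replace("\\", "\\\\").replace("'", "\\'") + "'"


def build_copy_into(
    target_table: str,
    source: str,
    file_format: str,
    force: bool = False,
    merge_schema: bool = False,
    format_options: Optional[Mapping[str, str]] = None,
) -> str:
    """Assemble a COPY INTO statement from the Input tab fields (hypothetical helper).

    target_table is inserted as given, so pass an already qualified name.
    """
    fmt = file_format.upper()
    if fmt not in SUPPORTED_FORMATS:
        raise ValueError(f"Unsupported file format: {file_format}")

    lines = [
        f"COPY INTO {target_table}",
        f"FROM {_quote(source)}",
        f"FILEFORMAT = {fmt}",
    ]

    # Format options are passed through unchecked, mirroring the note above
    # that the entry does not validate them against the file format.
    if format_options:
        pairs = ", ".join(
            f"{_quote(key)} = {_quote(value)}" for key, value in format_options.items()
        )
        lines.append(f"FORMAT_OPTIONS ({pairs})")

    # Force and Merge schema map to the 'force' and 'mergeSchema' copy options.
    copy_options = {
        "force": str(force).lower(),
        "mergeSchema": str(merge_schema).lower(),
    }
    pairs = ", ".join(
        f"{_quote(key)} = {_quote(value)}" for key, value in copy_options.items()
    )
    lines.append(f"COPY_OPTIONS ({pairs})")

    return "\n".join(lines)


# Example with placeholder values: load a CSV file that has a header row.
print(
    build_copy_into(
        "main.sales.orders",
        "/Volumes/main/sales/landing/orders.csv",
        "CSV",
        format_options={"header": "true"},
    )
)
```

Running a statement like this in a Databricks SQL editor can be a quick way to check a set of format options before entering them in the job entry.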

#### Output tab

Use this tab to configure the target table in Databricks.

After you select a connection:

* The **Catalog** list populates.
* After you select a catalog, the **Schema** list populates.
* After you select a schema, the **Table name** list populates.

![PDI Bulk load Databricks Output tab](/files/xYUN956xcA6rI6wmfENb)

| Field                   | Description                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                         |
| ----------------------- | ----------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------- |
| **Database connection** | <p>Specify the database connection to your Databricks account. You can authenticate with either an access token or a username and password. The username must be the email address you use to sign in to Databricks.</p><p>Click <strong>Edit</strong> to revise an existing connection. Click <strong>New</strong> to add a new connection.</p><p>Examples:</p><p><code>jdbc:databricks\://\<server hostname>:443;HttpPath=\<HTTP path>;PWD=\<Personal Access Token></code></p><p><code>jdbc:databricks\://\<server hostname>:443;HttpPath=\<HTTP path></code></p><p>The <strong>Custom driver class name</strong> is <code>com.databricks.client.jdbc.Driver</code>.</p> |
| **Catalog**             | Specify a catalog from the list of available catalogs for your Databricks connection.                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                               |
| **Schema**              | Specify the schema of the target table.                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                             |
| **Table name**          | Specify the name of the target table.                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                               |
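
As a rough illustration of how the values on this tab fit together, the Python sketch below assembles a connection URL in the format of the examples above and a three-part `catalog.schema.table` name. The helper names, the host name, and the HTTP path are hypothetical placeholders, not values taken from the product.

```python
from typing import Optional


def build_jdbc_url(
    server_hostname: str,
    http_path: str,
    token: Optional[str] = None,
    port: int = 443,
) -> str:
    """Build a Databricks JDBC URL in the format of the examples above.

    When a token is given it is appended as PWD; otherwise the URL carries
    no credentials and they are supplied separately in the connection.
    """
    url = f"jdbc:databricks://{server_hostname}:{port};HttpPath={http_path}"
    if token:
        url += f";PWD={token}"
    return url


def qualified_table_name(catalog: str, schema: str, table: str) -> str:
    """Join catalog, schema, and table into a backtick-quoted three-part name."""
    parts = (catalog, schema, table)
    # A backtick inside an identifier is escaped by doubling it.
    return ".".join("`" + part.replace("`", "``") + "`" for part in parts)


# Example with placeholder values.
print(build_jdbc_url("example.cloud.databricks.com", "/sql/1.0/warehouses/abc123"))
print(qualified_table_name("main", "sales", "orders"))
```

The three-part name reflects the selection order on this tab: catalog first, then schema, then table.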

