Bulk load into Databricks
Use the Bulk load into Databricks job entry to load large amounts of data from files in your cloud accounts into Databricks tables.
This entry uses the Databricks COPY INTO command.
General
Entry name: Specifies the unique name of the Bulk load into Databricks job entry on the canvas. You can customize the name or leave it as the default.
Options
The Bulk load into Databricks entry requires you to specify options and parameters on the Input and Output tabs.
Input tab
The input file must exist in either a Databricks external location or a managed volume.

Source
Specify the path to the input file. This must be the path to a file in a Databricks external location or managed volume.
What file type is your source?
Specify the format of the source file. Supported formats are:
AVRO
BINARYFILE
CSV
JSON
ORC
PARQUET
TEXT
Force
Set to false to skip files that have already been copied into the target table (default). Set to true to copy files again, even if they have already been copied into the table.
Merge schema
Set to false to fail if the schema of the target table does not match the schema of the incoming files (default). Set to true to add new columns to the target table for each column in the source file that does not exist in the target table.
The target column types must still match the source column types, even when Merge schema is selected.
Format Options
Each file format has options specific to that format. Use this table to specify the appropriate options for your file format. See Databricks format options.
Note: This entry does not validate that the options entered are appropriate for the selected file format.
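The Input tab settings above (Source, file type, Force, Merge schema, and Format Options) map onto clauses of the COPY INTO statement. The sketch below assembles such a statement as a string; the helper name, parameter names, and table/path values are illustrative assumptions, not the entry's actual implementation.

```python
def build_copy_into(table, source_path, file_format,
                    format_options=None, force=False, merge_schema=False):
    """Assemble a Databricks COPY INTO statement as a SQL string.

    Force and Merge schema become the 'force' and 'mergeSchema'
    COPY_OPTIONS; Format Options become FORMAT_OPTIONS.
    """
    def render(opts):
        return ", ".join(f"'{k}' = '{v}'" for k, v in opts.items())

    copy_opts = {"force": str(force).lower(),
                 "mergeSchema": str(merge_schema).lower()}

    sql = (f"COPY INTO {table}\n"
           f"FROM '{source_path}'\n"
           f"FILEFORMAT = {file_format}\n")
    if format_options:
        sql += f"FORMAT_OPTIONS ({render(format_options)})\n"
    sql += f"COPY_OPTIONS ({render(copy_opts)})"
    return sql

# Hypothetical catalog, schema, table, and volume path for illustration.
print(build_copy_into(
    "main.sales.orders",
    "/Volumes/main/sales/raw/orders/",
    "CSV",
    format_options={"header": "true"},
    force=True,
    merge_schema=True,
))
```

With the arguments shown, the generated statement copies CSV files (treating the first row as a header), recopies files even if they were loaded before, and adds any new source columns to the target table.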
Output tab
Use this tab to configure the target table in Databricks.
After you select a connection:
The Catalog list populates.
After you select a catalog, the Schema list populates.
After you select a schema, the Table name list populates.

Database connection
Specify the database connection to your Databricks account. You can authenticate with either an access token or a username and password. The username must be the email address you use to sign in to Databricks.
Click Edit to revise an existing connection. Click New to add a new connection.
Examples:
jdbc:databricks://<server hostname>:443;HttpPath=<HTTP path>;PWD=<Personal Access Token>
jdbc:databricks://<server hostname>:443;HttpPath=<HTTP path>
The Custom driver class name is com.databricks.client.jdbc.Driver.
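The URL examples above differ only in whether a personal access token is appended as the PWD property. A small sketch of assembling such a URL from its parts, with hypothetical hostname and HTTP path values; the helper name is an assumption for illustration:

```python
def databricks_jdbc_url(server_hostname, http_path, access_token=None):
    """Build a Databricks JDBC URL.

    The PWD property is appended only when a personal access token is
    supplied; otherwise the URL matches the second example above and the
    username/password are provided separately.
    """
    url = f"jdbc:databricks://{server_hostname}:443;HttpPath={http_path}"
    if access_token:
        url += f";PWD={access_token}"
    return url

# Hypothetical workspace values for illustration.
print(databricks_jdbc_url("abc-123.cloud.databricks.com",
                          "/sql/1.0/warehouses/xyz",
                          access_token="dapiEXAMPLE"))
```

Whichever form you use, the connection loads the driver class com.databricks.client.jdbc.Driver noted above.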
Catalog
Specify a catalog from the list of available catalogs for your Databricks connection.
Schema
Specify the schema of the target table.
Table name
Specify the name of the target table.