Bulk load into Azure SQL DB

Use the Bulk load into Azure SQL DB job entry to load CSV data from Azure Data Lake Storage into an Azure SQL Database.

For more information about Azure SQL, see the Azure SQL documentation.

Before you begin

To use this job entry, you need:

An Azure account.
A connection to the Azure SQL database where you want to load data.
A VFS connection to Azure Data Lake Storage.
A table and schema in the target database. The table must include all required columns.

You need the INSERT and ADMINISTER DATABASE BULK OPERATIONS permissions in the Azure SQL database.

General

The following field is available on the General tab:

Entry Name: Specify the unique name of the job entry on the canvas. You can customize the name or leave it as the default.

Options

This job entry includes four tabs. Each tab is described in the following sections.

Input tab

Use the options in this tab to define your input source:

Option

Description

Source

Specify the VFS URL of the data to import. The URL can point to a file or a folder in Azure Data Lake Storage. The URL must begin with pvfs:// and must specify the bucket and object you want to load. See Pentaho address to a VFS connection.

What file type is your source?

Select the file type of the input source. The source must be a CSV file.

Delimited text settings

Specify the following settings for a delimited text file:

Delimiter

Specify the character used to separate a data field. The default is semicolon (;).

First line is column names

Select Yes if the first line in the file contains the column names.

Output tab

Use the options on the Output tab to specify the destination:

Option

Description

Database connection

Select your database connection from a list of existing Azure SQL DB connections.

If you do not have an existing connection, click New. If you need to modify an existing connection, click Edit.

Schema

Select the schema to use for the bulk load. The job entry reads the schemas that exist in the database to populate this list.

Table name

Select the name of the table to bulk load. The job entry reads the table names from the selected schema to populate this list.

Columns

Preview the column names and data types in the selected table.

You can use variables in the Schema and Table name fields.

Options tab

Use the options on the Options tab to define how data is loaded into the destination table:

Option

Description

Truncate table

Select to remove all data in the table before loading the current data. Clear to append data during a new data load (default).

Advanced options tab

Use this tab to configure additional parameters for the bulk load.

Any Name/Value pair that you add is passed to the Azure SQL database as a parameter. You are responsible for validating parameter names and values.

The Force parameter is provided as an example.

Option

Description

Force

Specify whether to force loading files into a database:

True

Loads data to the table even if the data was already loaded from that file before. This option can duplicate data in a table.

False

Ignores staged data files already loaded into the table (default).

PreviousBulk load into Amazon Redshift NextBulk load into Databricks

Last updated 2 months ago

Was this helpful?

hashtagBefore you begin

hashtagGeneral

hashtagOptions

hashtagInput tab

hashtagOutput tab

hashtagOptions tab

hashtagAdvanced options tab

Before you begin

General

Options

Input tab

Output tab

Options tab

Advanced options tab