# Bulk load into Snowflake

The **Bulk load into Snowflake** job entry loads large amounts of data into a Snowflake virtual warehouse in a single session.

This entry automates Snowflake’s `COPY INTO` command, which helps you avoid repetitive SQL scripting. To use it, you typically:

* Size your virtual warehouse.
* Define the source data and file type.
* Specify the target database objects.
* Provide any needed `COPY INTO` parameters.
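For reference, the steps above ultimately produce a single `COPY INTO` statement. A minimal hand-written equivalent (the table, stage, and file format values here are hypothetical) might look like:

```sql
-- Hypothetical example: load staged delimited files into MY_SCHEMA.MY_TABLE.
COPY INTO MY_SCHEMA.MY_TABLE
  FROM @MY_STAGE/data/
  FILE_FORMAT = (TYPE = 'CSV' FIELD_DELIMITER = ';' SKIP_HEADER = 1)
  ON_ERROR = 'ABORT_STATEMENT';
```

The job entry assembles a statement like this from the tab settings described below, so you do not have to script it yourself.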

For more information about working with Snowflake in PDI, see [PDI and Snowflake](https://docs.pentaho.com/pdia-data-integration/extracting-data-into-pdi/pdi-and-snowflake-cp).


For more information about Snowflake, including best practices for bulk loading data, see the [Snowflake documentation](https://docs.snowflake.com/).

### Before you begin

To use this job entry, you need:

* A [Snowflake](https://www.snowflake.com/) account.
* A connection to the database where you want to load data.
* An S3 connection or a VFS connection to a Snowflake staging area for your data source.
* A table and schema in the target database. The first time you use this entry, you might need to create the table.

### General

* **Entry name**: Specify the unique name of the job entry on the canvas. You can customize the name or leave it as the default.

### Options

This job entry includes four tabs: **Input**, **Output**, **Options**, and **Advanced Options**.

#### Input tab

![Input tab](https://773338310-files.gitbook.io/~/files/v0/b/gitbook-x-prod.appspot.com/o/spaces%2FYwnJ6Fexn4LZwKRHghPK%2Fuploads%2Fgit-blob-0253dc5bb88d7cb4dd33bec94215b36e3ecc3d77%2FPDI_JobEntry_Snowflake_BulkLoader_Input_tab.png?alt=media)

Use the options in this tab to define your input source for the Snowflake `COPY INTO` command:

| Option                             | Description                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                    |
| ---------------------------------- | -------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------- |
| **Source**                         | <p>Choose one of the following input source types:</p><ul><li><strong>S3</strong>: The input source is an S3 bucket.</li><li><strong>Snowflake Staging Area</strong>: The input source is files in a Snowflake staging area.</li></ul><p>Click <strong>Select</strong> to specify the file, folder, prefix, or variable of the S3 bucket or staging location to use as the input for <code>COPY INTO</code>. See “Syntax” in the <a href="https://docs.snowflake.com/">Snowflake documentation</a> for details.</p>                                            |
| **What file type is your source?** | <p>Select the file type of the input source:</p><ul><li><strong>Delimited text</strong>: Character-delimited UTF-8 text.</li><li><strong>Avro</strong>: Avro data serialization protocol.</li><li><strong>JSON</strong>: JavaScript Object Notation (JSON) data file containing a set of either objects or arrays.</li><li><strong>ORC</strong>: Optimized Row Columnar (ORC) file containing Hive data.</li><li><strong>Parquet</strong>: Parquet file of nested data structures in a flat columnar format.</li><li><strong>XML</strong>: XML file.</li></ul> |
| **Compression**                    | <p>Select the type of compression applied to your input source:</p><ul><li><strong>None</strong></li><li><strong>Auto</strong></li><li><strong>BZIP2</strong></li><li><strong>GZIP</strong></li><li><strong>Deflate</strong></li><li><strong>Raw deflate</strong></li><li><strong>Brotli</strong></li><li><strong>Zstd</strong></li></ul><p>For Parquet files, the <strong>Compression</strong> options are:</p><ul><li><strong>None</strong></li><li><strong>Auto</strong></li><li><strong>Snappy</strong></li></ul>                                          |

Depending on the file type selected in **What file type is your source?**, additional file settings appear at the bottom of this tab:

| File Type          | File Settings                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                  |
| ------------------ | -------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------- |
| **Delimited text** | <p>Specify the following settings for a delimited text file:</p><ul><li><strong>Leading rows to skip</strong>: The number of rows to skip from the beginning of the file. This option is useful for skipping header lines.</li><li><strong>Delimiter</strong>: The character used to separate a data field. Default is semicolon (<code>;</code>).</li><li><strong>Quote character</strong>: The character used to enclose a data field. Default is the double-quotation mark (<code>"</code>).</li><li><strong>Remove quotes</strong>: Whether to remove quotation characters from a data field during the bulk load.</li><li><strong>Empty as null</strong>: Whether to set empty data values to null during the bulk load.</li><li><strong>Trim whitespace</strong>: Whether to remove leading and trailing whitespace during the bulk load.</li></ul><p><strong>Note:</strong> For delimited text files, you must have a table in your database with all required columns defined.</p> |
| **Avro**           | No additional settings.                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                        |
| **JSON**           | <ul><li><strong>Ignore UTF8 errors</strong>: Whether to ignore UTF-8 errors during the bulk load.</li><li><strong>Allow duplicate elements</strong>: Whether to allow duplicate elements during the bulk load. Snowflake keeps the last duplicate value and discards the others.</li><li><strong>Strip null values</strong>: Whether to delete null values during the bulk load.</li><li><strong>Parse octal numbers</strong>: Whether to parse octal numbers during the bulk load.</li></ul>                                                                                                                                                                                                                                                                                                                                                                                                                                                                  |
| **ORC**            | Additional file settings for ORC files.                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                        |
| **Parquet**        | Additional file settings for Parquet files.                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                    |
| **XML**            | <ul><li><strong>Ignore UTF8 errors</strong>: Replace invalid UTF-8 sequences with the Unicode character U+FFFD. If cleared (default), invalid UTF-8 sequences produce an encoding error.</li><li><strong>Preserve space</strong>: Whether to preserve leading and trailing spaces in element content.</li><li><strong>Strip outer element</strong>: Whether to remove the outer XML element and expose the second-level elements as separate documents.</li><li><strong>Enable Snowflake data</strong>: Whether to enable recognition of Snowflake semi-structured data tags.</li><li><strong>Auto convert</strong>: Whether to convert numeric and Boolean values from text to native representation.</li></ul>                                                                                                                                                                                                                                               |
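As a rough guide, the delimited text settings above map onto Snowflake `COPY INTO` file format options. The following sketch uses hypothetical object names, and the SQL the entry actually generates may differ:

```sql
-- Hypothetical mapping of the delimited text settings to COPY INTO options.
COPY INTO MY_SCHEMA.MY_TABLE
  FROM @MY_STAGE/data/
  FILE_FORMAT = (
    TYPE = 'CSV'
    SKIP_HEADER = 1                       -- Leading rows to skip
    FIELD_DELIMITER = ';'                 -- Delimiter
    FIELD_OPTIONALLY_ENCLOSED_BY = '"'    -- Quote character / Remove quotes
    EMPTY_FIELD_AS_NULL = TRUE            -- Empty as null
    TRIM_SPACE = TRUE                     -- Trim whitespace
  );
```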

{% hint style="info" %}
If you load semi-structured data, your target table must include a **VARIANT** column for the following file types:

* JSON
* ORC
* Parquet
* XML
{% endhint %}
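For example, a minimal target table for semi-structured input such as JSON could be created as follows (the table and column names are hypothetical):

```sql
-- Hypothetical target table: semi-structured data loads into a VARIANT column.
CREATE TABLE MY_SCHEMA.MY_JSON_TABLE (
  RAW VARIANT
);
```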

#### Output tab

![Output tab](https://773338310-files.gitbook.io/~/files/v0/b/gitbook-x-prod.appspot.com/o/spaces%2FYwnJ6Fexn4LZwKRHghPK%2Fuploads%2Fgit-blob-89a14d02863f06911695461fc4577a4e2fdee62b%2FPDI_Bulkload_into_Snowflake_dialog.png?alt=media)

| Option                  | Description                                                                                                                                                                                                                                                                                                                                               |
| ----------------------- | --------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------- |
| **Database connection** | <p>Select a Snowflake connection. If you do not have an existing connection, click <strong>New</strong>. To modify a connection, click <strong>Edit</strong>.</p><p><strong>Note:</strong> If timeout errors occur, see <a href="../data-integration-issues/snowflake-timeout-errors-general-issues-in-troubleshooting">Snowflake timeout errors</a>.</p> |
| **Schema**              | Select the schema to use for the bulk load. The job entry reads schemas from the database to populate this list.                                                                                                                                                                                                                                          |
| **Table name**          | Select the table to bulk load. The job entry reads tables from the selected schema to populate this list. **Note:** You can only load one table at a time. For multiple tables, use multiple entries or use variables to parameterize the schema and table names.                                                                                         |
| **Columns**             | Preview the column names and associated data types for the selected table.                                                                                                                                                                                                                                                                                |

{% hint style="info" %}
You can use variables in the **Schema** and **Table name** fields.
{% endhint %}

#### Options tab

![Options tab](https://773338310-files.gitbook.io/~/files/v0/b/gitbook-x-prod.appspot.com/o/spaces%2FYwnJ6Fexn4LZwKRHghPK%2Fuploads%2Fgit-blob-b23c5becd1b3d846ce5e6571b5305b14e4c3dee0%2FPDI_JobEntry_Snowflake_BulkLoader_Options_tab.png?alt=media)

Use the options in this tab to define how data is loaded using the Snowflake `COPY INTO` command:

| Option                         | Description                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                             |
| ------------------------------ | --------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------- |
| **Truncate table**             | Select to remove all data in the table before bulk loading the current data. Clear to append data during a new data load (default).                                                                                                                                                                                                                                                                                                                                                                                                     |
| **On error**                   | <p>Select what to do if an error occurs during the bulk load:</p><ul><li><strong>Abort</strong>: Stop loading data.</li><li><strong>Skip file</strong>: Skip loading the file that caused the error and continue processing.</li><li><strong>Skip file number</strong>: Skip the file after a specified number of errors.</li><li><strong>Skip file percent</strong>: Skip the file after a specified percentage of errors (based on record count).</li><li><strong>Continue</strong>: Ignore errors and continue processing.</li></ul> |
| **Skip file after (x) errors** | The number of errors allowed before the file is skipped. This field appears when **Skip file number** is selected.                                                                                                                                                                                                                                                                                                                                                                                                                      |
| **Skip file after (%) errors** | The percentage of errors allowed before the file is skipped. This field appears when **Skip file percent** is selected.                                                                                                                                                                                                                                                                                                                                                                                                                 |
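The **On error** choices correspond to Snowflake's `ON_ERROR` copy option. A hand-written sketch of the skip-after-a-number-of-errors behavior (object names are hypothetical) looks like:

```sql
-- Hypothetical example: skip a file once it produces more than 10 errors.
COPY INTO MY_SCHEMA.MY_TABLE
  FROM @MY_STAGE/data/
  FILE_FORMAT = (TYPE = 'CSV')
  ON_ERROR = 'SKIP_FILE_10';    -- Use 'SKIP_FILE_10%' for a percentage threshold
```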

#### Advanced Options tab

![Advanced Options tab](https://773338310-files.gitbook.io/~/files/v0/b/gitbook-x-prod.appspot.com/o/spaces%2FYwnJ6Fexn4LZwKRHghPK%2Fuploads%2Fgit-blob-2f46e247ac62666e0413d13a9fae3ac8ae1306f0%2FPDI_JobEntry_Snowflake_BulkLoader_Advanced_Options_tab.png?alt=media)

Use this tab to configure additional `COPY INTO` parameters.

Any **Name/Value** pair that you add is passed to Snowflake as a parameter. You are responsible for validating parameter names and values.

The `Force` parameter is provided as an example.

| Parameter | Description                                                                                                                                                                                                                                                                   |
| --------- | ----------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------- |
| **Force** | <p>Specify whether to force loading files into a table:</p><ul><li><strong>True</strong>: Load data even if the file has already been loaded. This option can duplicate data.</li><li><strong>False</strong>: Ignore files that have already been loaded (default).</li></ul> |
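For example, adding the name/value pair `FORCE`/`TRUE` on this tab corresponds to appending the option to the generated statement, roughly as follows (object names are hypothetical):

```sql
-- Hypothetical example: reload files even if Snowflake has already loaded them.
COPY INTO MY_SCHEMA.MY_TABLE
  FROM @MY_STAGE/data/
  FILE_FORMAT = (TYPE = 'CSV')
  FORCE = TRUE;    -- May duplicate rows that were loaded previously
```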

