Bulk load into Snowflake
The Bulk load into Snowflake job entry loads large amounts of data into a Snowflake virtual warehouse in a single session.
This entry automates Snowflake’s COPY INTO command, which helps you avoid repetitive SQL scripting. To use it, you typically:
Size your virtual warehouse.
Define the source data and file type.
Specify the target database objects.
Provide any needed COPY INTO parameters.
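For reference, here is a minimal sketch of the kind of statement the entry builds. The stage @my_stage and table MY_SCHEMA.SALES are placeholder names, not defaults, and the statement the entry actually generates may differ:

    COPY INTO MY_SCHEMA.SALES
      FROM @my_stage/daily/
      FILE_FORMAT = (TYPE = CSV FIELD_DELIMITER = ';' SKIP_HEADER = 1)
      ON_ERROR = ABORT_STATEMENT;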
For more information about working with Snowflake in PDI, see PDI and Snowflake.
For more information about Snowflake, including best practices for bulk loading data, see the Snowflake documentation.
Before you begin
To use this job entry, you need:
A Snowflake account.
A connection to the database where you want to load data.
An S3 connection or a VFS connection to a Snowflake staging area for your data source.
A table and schema in the target database. The first time you use this entry, you might need to create the table.
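If the target table does not exist yet, you can create it with standard SQL before running the job. A minimal sketch, using the placeholder names MY_SCHEMA and SALES:

    CREATE SCHEMA IF NOT EXISTS MY_SCHEMA;

    CREATE TABLE IF NOT EXISTS MY_SCHEMA.SALES (
      ORDER_ID   NUMBER,
      ORDER_DATE DATE,
      AMOUNT     NUMBER(10,2)
    );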
General
Entry name: Specify the unique name of the job entry on the canvas. You can customize the name or leave it as the default.
Options
This job entry includes four tabs: Input, Output, Options, and Advanced Options.
Input tab

Use the options in this tab to define your input source for the Snowflake COPY INTO command:
Source
Choose one of the following input source types:
S3: The input source is an S3 bucket.
Snowflake Staging Area: The input source is files in a Snowflake staging area.
Click Select to specify the file, folder, prefix, or variable of the S3 bucket or staging location to use as the input for COPY INTO. See “Syntax” in the Snowflake documentation for details.
What file type is your source?
Select the file type of the input source:
Delimited text: Character-delimited UTF-8 text.
Avro: Avro data serialization protocol.
JSON: JavaScript Object Notation (JSON) data file containing a set of either objects or arrays.
ORC: Optimized Row Columnar (ORC) file containing Hive data.
Parquet: Parquet file of nested data structures in a flat columnar format.
XML: XML file.
Compression
Select the type of compression applied to your input source:
None
Auto
BZIP2
GZIP
Deflate
Raw deflate
Brotli
Zstd
For Parquet files, the Compression options are:
None
Auto
Snappy
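Your selection maps to the COMPRESSION option of the Snowflake file format. For example, gzipped delimited text and Snappy-compressed Parquet correspond to clauses like the following (a sketch of the inline form, not the entry's literal output):

    FILE_FORMAT = (TYPE = CSV COMPRESSION = GZIP)
    FILE_FORMAT = (TYPE = PARQUET COMPRESSION = SNAPPY)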
Depending on the file type selected in What file type is your source?, additional file settings appear at the bottom of this tab:
Delimited text
Specify the following settings for a delimited text file (a sketch mapping these settings to COPY INTO options appears after this section):
Leading rows to skip: The number of rows to skip from the beginning of the file. This option is useful for skipping header lines.
Delimiter: The character used to separate a data field. Default is semicolon (`;`).
Quote character: The character used to enclose a data field. Default is the double-quotation mark.
Remove quotes: Whether to remove quotation characters from a data field during the bulk load.
Empty as null: Whether to set empty data values to null during the bulk load.
Trim whitespace: Whether to remove leading and trailing whitespace during the bulk load.
Note: For delimited text files, you must have a table in your database with all required columns defined.
Avro
No additional settings.
JSON
Ignore UTF8 errors: Whether to ignore UTF-8 errors during the bulk load.
Allow duplicate elements: Whether to allow duplicate elements during the bulk load. Snowflake keeps the last duplicate value and discards the others.
Strip null values: Whether to delete null values during the bulk load.
Parse octal numbers: Whether to parse octal numbers during the bulk load.
ORC
Additional file settings for ORC files.
Parquet
Additional file settings for Parquet files.
XML
Ignore UTF8 errors: Replace invalid UTF-8 sequences with the Unicode character U+FFFD. If cleared (default), invalid UTF-8 sequences produce an encoding error.
Preserve space: Whether to preserve leading and trailing spaces in element content.
Strip outer element: Whether to remove the outer XML element and expose the second-level elements as separate documents.
Enable Snowflake data: Whether to enable recognition of Snowflake semi-structured data tags.
Auto convert: Whether to convert numeric and Boolean values from text to native representation.
If you load semi-structured data, your target table must include a VARIANT column for the following file types:
JSON
ORC
Parquet
XML
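As an illustration of how the delimited text settings surface in SQL, the following sketch maps them to their Snowflake file format options; all object names are placeholders, and the exact statement the entry generates may differ:

    COPY INTO MY_SCHEMA.SALES
      FROM @my_stage/sales/
      FILE_FORMAT = (TYPE = CSV
                     SKIP_HEADER = 1                     -- Leading rows to skip
                     FIELD_DELIMITER = ';'               -- Delimiter
                     FIELD_OPTIONALLY_ENCLOSED_BY = '"'  -- Quote character
                     EMPTY_FIELD_AS_NULL = TRUE          -- Empty as null
                     TRIM_SPACE = TRUE);                 -- Trim whitespace

For semi-structured data, a minimal VARIANT table and JSON load might look like this (again with placeholder names):

    CREATE TABLE IF NOT EXISTS MY_SCHEMA.RAW_EVENTS (V VARIANT);

    COPY INTO MY_SCHEMA.RAW_EVENTS
      FROM @my_stage/events/
      FILE_FORMAT = (TYPE = JSON STRIP_NULL_VALUES = TRUE);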
Output tab

Database connection
Select a Snowflake connection. If you do not have an existing connection, click New. To modify a connection, click Edit.
Note: If timeout errors occur, see Snowflake timeout errors.
Schema
Select the schema to use for the bulk load. The job entry reads schemas from the database to populate this list.
Table name
Select the table to bulk load. The job entry reads tables from the selected schema to populate this list. Note: You can only load one table at a time. For multiple tables, use multiple entries or use variables to parameterize the schema and table names.
Columns
Preview the column names and associated data types for the selected table.
You can use variables in the Schema and Table name fields.
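For example, setting Schema to ${TARGET_SCHEMA} and Table name to ${TARGET_TABLE} (placeholder variable names) lets a single job load different targets per run.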
Options tab

Use the options in this tab to define how data is loaded using the Snowflake COPY INTO command:
Truncate table
Select to remove all data in the table before bulk loading the current data. Clear to append data during a new data load (default).
On error
Select what to do if an error occurs during the bulk load:
Abort: Stop loading data.
Skip file: Skip loading the file that caused the error and continue processing.
Skip file number: Skip the file after a specified number of errors.
Skip file percent: Skip the file after a specified percentage of errors (based on record count).
Continue: Ignore errors and continue processing.
Skip file after (x) errors
The number of errors allowed before the file is skipped. This field appears when Skip file number is selected.
Skip file after (%) errors
The percentage of errors allowed before the file is skipped. This field appears when Skip file percent is selected.
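These selections correspond to the ON_ERROR copy option, and Truncate table is comparable to issuing TRUNCATE TABLE on the target before the load. A sketch of the ON_ERROR variants, with placeholder thresholds:

    ON_ERROR = ABORT_STATEMENT  -- Abort
    ON_ERROR = SKIP_FILE        -- Skip file
    ON_ERROR = SKIP_FILE_10     -- Skip file number (after 10 errors)
    ON_ERROR = 'SKIP_FILE_5%'   -- Skip file percent (after 5% of records error)
    ON_ERROR = CONTINUE         -- Continue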
Advanced Options tab

Use this tab to configure additional COPY INTO parameters.
Any Name/Value pair that you add is passed to Snowflake as a parameter. You are responsible for validating parameter names and values.
The Force parameter is provided as an example.
Force
Specify whether to force loading files into a table:
True: Load data even if the file has already been loaded. This option can duplicate data.
False: Ignore files that have already been loaded (default).
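As a sketch, adding a Name/Value pair of FORCE and TRUE corresponds to appending that copy option to the generated statement (placeholder object names):

    COPY INTO MY_SCHEMA.SALES
      FROM @my_stage/sales/
      FILE_FORMAT = (TYPE = CSV)
      FORCE = TRUE;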