
# Input tab

![Input tab](/files/yVg1vwFOXAJRK8RnKbFa)

Use the options in this tab to define your input source for the Snowflake COPY INTO command:

| Option                             | Description                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                        |
| ---------------------------------- | ------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------ |
| **Source** | <p>Choose one of the following input source types:</p><ul><li><strong>S3</strong>: The input source is an S3 bucket.</li><li><strong>Snowflake Staging Area</strong>: The input source is files in a Snowflake staging area.</li></ul><p>Click <strong>Select</strong> to specify the file, folder, prefix, or variable of the S3 bucket or staging location to use as the input for the Snowflake COPY INTO command. See "Syntax" in the <a href="https://docs.snowflake.net/manuals/index.html">Snowflake documentation</a> for more details on specifying this option.</p> |
| **What file type is your source?** | <p>Select the file type of the input source. You can select one of the following types:</p><ul><li><strong>Delimited text</strong>: The input source is character-delimited UTF-8 text.</li><li><strong>Avro</strong>: The input source is a file in the Avro data serialization format.</li><li><strong>JSON</strong>: The input source is a JavaScript Object Notation (JSON) data file containing a set of either objects or arrays.</li><li><strong>ORC</strong>: The input source is an Optimized Row Columnar (ORC) file containing Hive data.</li><li><strong>Parquet</strong>: The input source is a Parquet file of nested data structures in a flat columnar format.</li><li><strong>XML</strong>: The input source is a file in XML format.</li></ul> |
| **Compression** | <p>Select the type of compression applied to your input source:</p><ul><li><strong>None</strong></li><li><strong>Auto</strong></li><li><strong>BZIP2</strong></li><li><strong>GZIP</strong></li><li><strong>Deflate</strong></li><li><strong>Raw deflate</strong></li><li><strong>Brotli</strong></li><li><strong>Zstd</strong></li></ul><p>For Parquet files, the <strong>Compression</strong> options are:</p><ul><li><strong>None</strong></li><li><strong>Auto</strong></li><li><strong>Snappy</strong></li></ul> |
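
As a rough illustration of how the options in the table above relate to a Snowflake COPY INTO statement, the following Python sketch assembles the statement text from a source, a file type, and a compression keyword. It is only a sketch under stated assumptions: `build_copy_into`, the table name `sales`, and the stage path `@my_stage/2024/` are hypothetical, and the statement that PDI actually generates may differ.

```python
# Hypothetical helper: builds the text of a COPY INTO statement.
# It does not connect to Snowflake or run anything.
def build_copy_into(table, source, file_type, compression="AUTO"):
    """Return COPY INTO statement text for the given input source."""
    # Snowflake FILE_FORMAT TYPE keywords for the file types listed above.
    type_keywords = {
        "Delimited text": "CSV",
        "Avro": "AVRO",
        "JSON": "JSON",
        "ORC": "ORC",
        "Parquet": "PARQUET",
        "XML": "XML",
    }
    if file_type not in type_keywords:
        raise ValueError(f"Unsupported file type: {file_type}")
    # Assumption: `compression` is already a valid Snowflake keyword
    # (for example GZIP, or SNAPPY for Parquet files).
    return (
        f"COPY INTO {table}\n"
        f"FROM {source}\n"
        f"FILE_FORMAT = (TYPE = {type_keywords[file_type]} "
        f"COMPRESSION = {compression})"
    )


# Example: GZIP-compressed delimited text files in a staging area.
statement = build_copy_into("sales", "@my_stage/2024/", "Delimited text", "GZIP")
print(statement)
```

The file type and compression selections both end up inside the `FILE_FORMAT` clause, while the **Source** selection supplies the `FROM` location.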

Depending on the file type you selected for the **What file type is your source?** option, the following file settings appear at the bottom of this tab:

<table data-header-hidden><thead><tr><th></th><th></th></tr></thead><tbody>
<tr><td>File Type</td><td>File Settings</td></tr>
<tr><td><strong>Delimited text</strong></td><td><p>Specify the following settings for a delimited text file:</p><ul>
<li><p><strong>Leading rows to skip</strong></p><p>Specify the number of rows to skip at the beginning of the file. This option is useful for skipping header lines.</p></li>
<li><p><strong>Delimiter</strong></p><p>Specify the character used to separate data fields. The default value is a semicolon (;).</p></li>
<li><p><strong>Quote character</strong></p><p>Specify the character used to enclose a data field. The default value is a double quotation mark (").</p></li>
<li><p><strong>Remove quotes</strong></p><p>Select one of the following values to indicate whether quotation characters should be removed from a data field during the bulk load:</p><ul><li><strong>Yes</strong>: Remove the quotation characters.</li><li><strong>No</strong>: Retain the quotation characters.</li></ul></li>
<li><p><strong>Empty as null</strong></p><p>Select one of the following values to indicate whether empty data values should be set to null during the bulk load:</p><ul><li><strong>Yes</strong>: Set empty data values to null.</li><li><strong>No</strong>: Leave empty data values empty.</li></ul></li>
<li><p><strong>Trim whitespace</strong></p><p>Select one of the following values to indicate whether to remove leading and trailing whitespace from the data during the bulk load:</p><ul><li><strong>Yes</strong>: Remove the whitespace.</li><li><strong>No</strong>: Retain the whitespace.</li></ul></li>
</ul><p><strong>Note:</strong> For delimited text files, your database must contain a table in which all the columns you need are defined.</p></td></tr>
<tr><td><strong>Avro</strong></td><td>No additional settings.</td></tr>
<tr><td><strong>JSON</strong></td><td><ul>
<li><p><strong>Ignore UTF8 errors</strong></p><p>Select one of the following values to indicate whether to ignore UTF-8 errors in the data during the bulk load:</p><ul><li><strong>Yes</strong>: Ignore UTF-8 errors.</li><li><strong>No</strong>: Do not ignore UTF-8 errors.</li></ul></li>
<li><p><strong>Allow duplicate elements</strong></p><p>Select one of the following values to indicate whether to allow duplicate elements in the data during the bulk load:</p><ul><li><strong>Yes</strong>: Allow duplicate elements.</li><li><strong>No</strong>: Do not allow duplicate elements.</li></ul><p><strong>Note:</strong> Snowflake only uses the last duplicate value and discards the others.</p></li>
<li><p><strong>Strip null values</strong></p><p>NULL values are stored as null in JSON files. Select one of the following values to indicate whether to delete NULL values from the data during the bulk load:</p><ul><li><strong>Yes</strong>: Strip the NULL values.</li><li><strong>No</strong>: Store the NULL values in a variant column.</li></ul></li>
<li><p><strong>Parse octal numbers</strong></p><p>Select one of the following values to indicate whether to parse octal numbers during the bulk load:</p><ul><li><strong>Yes</strong>: Parse octal numbers.</li><li><strong>No</strong>: Do not parse octal numbers.</li></ul></li>
</ul></td></tr>
<tr><td><strong>ORC</strong></td><td>Additional file settings for ORC files.</td></tr>
<tr><td><strong>Parquet</strong></td><td>Additional file settings for Parquet files.</td></tr>
<tr><td><strong>XML</strong></td><td><ul>
<li><p><strong>Ignore UTF8 errors</strong></p><p>Select one of the following values to indicate whether to replace UTF-8 encoding errors during the bulk load:</p><ul><li><strong>Yes</strong>: Replace invalid UTF-8 sequences with Unicode character U+FFFD.</li><li><strong>No</strong>: Invalid UTF-8 sequences produce an encoding error (default).</li></ul></li>
<li><p><strong>Preserve space</strong></p><p>Select one of the following values to indicate whether to preserve leading and trailing spaces in element content during the bulk load:</p><ul><li><strong>Yes</strong>: Preserve spaces.</li><li><strong>No</strong>: Do not preserve spaces (default).</li></ul></li>
<li><p><strong>Strip outer element</strong></p><p>Select one of the following values to indicate whether to remove the outer XML element and expose the second-level elements as separate documents during the bulk load:</p><ul><li><strong>Yes</strong>: Remove the outer XML element.</li><li><strong>No</strong>: Do not remove the outer XML element (default).</li></ul></li>
<li><p><strong>Enable Snowflake data</strong></p><p>Select one of the following values to indicate whether to enable recognition of Snowflake semi-structured data tags in the data during the bulk load:</p><ul><li><strong>Yes</strong>: Enable recognition of Snowflake tags (default).</li><li><strong>No</strong>: Disable recognition of Snowflake tags.</li></ul></li>
<li><p><strong>Auto convert</strong></p><p>Select one of the following values to indicate whether to convert numeric and Boolean values from text to native representation during the bulk load:</p><ul><li><strong>Yes</strong>: Convert numeric and Boolean values (default).</li><li><strong>No</strong>: Do not convert numeric and Boolean values.</li></ul></li>
</ul></td></tr>
</tbody></table>
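
To make the delimited text settings more concrete, the following Python sketch mimics their effect on a small in-memory sample. It is an illustration only, not how Snowflake or PDI implements the load: `parse_delimited` and the sample data are hypothetical, and the standard-library `csv` reader always strips the enclosing quote characters, so the sketch corresponds to **Remove quotes** set to **Yes**.

```python
import csv
import io
from itertools import islice


def parse_delimited(text, skip_rows=0, delimiter=";", quote_char='"',
                    empty_as_null=False, trim_whitespace=False):
    """Parse delimited text, approximating the settings described above."""
    reader = csv.reader(io.StringIO(text), delimiter=delimiter,
                        quotechar=quote_char)
    rows = []
    # "Leading rows to skip": drop the first `skip_rows` rows.
    for fields in islice(reader, skip_rows, None):
        if trim_whitespace:
            # "Trim whitespace": remove leading and trailing whitespace.
            fields = [field.strip() for field in fields]
        if empty_as_null:
            # "Empty as null": represent empty values as None.
            fields = [field if field != "" else None for field in fields]
        rows.append(fields)
    return rows


# One header row, semicolon delimiter, double-quote enclosure.
sample = 'id;name;city\n1;"Ada";London\n2;  Bob ;\n'
rows = parse_delimited(sample, skip_rows=1, empty_as_null=True,
                       trim_whitespace=True)
print(rows)
```

In this sample the header row is skipped, the padded name is trimmed, and the empty city value in the last row becomes `None`.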

**Note:** If you have unstructured data, you must have a variant column in your database table to store the data for the following file types:

* JSON
* ORC
* Parquet
* XML

