# Input tab

![Input tab](https://3411831820-files.gitbook.io/~/files/v0/b/gitbook-x-prod.appspot.com/o/spaces%2FAYwCj9fPr1B2pjC11IOQ%2Fuploads%2Fgit-blob-0253dc5bb88d7cb4dd33bec94215b36e3ecc3d77%2FPDI_JobEntry_Snowflake_BulkLoader_Input_tab.png?alt=media)

Use the options in this tab to define your input source for the Snowflake COPY INTO command:

| Option                             | Description                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                     |
| ---------------------------------- | ------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------- |
| **Source** | <p>Choose one of the following input source types:</p><ul><li><strong>S3</strong></li></ul><p>The input source is an S3 bucket.</p><ul><li><strong>Snowflake Staging Area</strong></li></ul><p>The input source is a set of files in a Snowflake staging area.</p><p>Click <strong>Select</strong> to specify the file, folder, prefix, or variable of the S3 bucket or staging location to use as the input for the Snowflake COPY INTO command. See "Syntax" in the <a href="https://docs.snowflake.net/manuals/index.html">Snowflake documentation</a> for more details on specifying this option.</p> |
| **What file type is your source?** | <p>Select the file type of the input source. You can select one of the following types:</p><ul><li><strong>Delimited text</strong></li></ul><p>The input source is character-delimited UTF-8 text.</p><ul><li><strong>Avro</strong></li></ul><p>The input source is a file in the Avro data serialization format.</p><ul><li><strong>JSON</strong></li></ul><p>The input source is a JavaScript Object Notation (JSON) data file containing a set of either objects or arrays.</p><ul><li><strong>ORC</strong></li></ul><p>The input source is an Optimized Row Columnar (ORC) file containing Hive data. See the <strong>Administer Pentaho Data Integration and Analytics</strong> document for further configuration information when using Hive with Spark on AEL.</p><ul><li><strong>Parquet</strong></li></ul><p>The input source is a Parquet file of nested data structures in a flat columnar format.</p><ul><li><strong>XML</strong></li></ul><p>The input source is a file in XML format.</p> |
| **Compression** | <p>Select the type of compression applied to your input source:</p><ul><li><strong>None</strong></li><li><strong>Auto</strong></li><li><strong>BZIP2</strong></li><li><strong>GZIP</strong></li><li><strong>Deflate</strong></li><li><strong>Raw deflate</strong></li><li><strong>Brotli</strong></li><li><strong>Zstd</strong></li></ul><p>For Parquet files, the <strong>Compression</strong> options are:</p><ul><li><strong>None</strong></li><li><strong>Auto</strong></li><li><strong>Snappy</strong></li></ul> |

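As a rough illustration of how these three options relate to the resulting statement, the following Python sketch assembles a `COPY INTO` statement from a source, a file type, and a compression choice. The function name `build_copy_into`, the lookup tables, and the example table, stage, and bucket names are hypothetical. The mapping from dialog labels to Snowflake keywords is an assumption based on Snowflake's `TYPE` and `COMPRESSION` file format options, not a description of what the job entry generates internally, and Snowflake may not accept every combination for every file type. Credentials and storage integration settings for S3 are omitted.

```python
# Assumed mapping from the dialog's file type labels to Snowflake TYPE keywords.
FILE_TYPES = {
    "Delimited text": "CSV",
    "Avro": "AVRO",
    "JSON": "JSON",
    "ORC": "ORC",
    "Parquet": "PARQUET",
    "XML": "XML",
}

# Assumed mapping from the dialog's compression labels to Snowflake keywords.
GENERAL_COMPRESSION = {
    "None": "NONE",
    "Auto": "AUTO",
    "BZIP2": "BZ2",
    "GZIP": "GZIP",
    "Deflate": "DEFLATE",
    "Raw deflate": "RAW_DEFLATE",
    "Brotli": "BROTLI",
    "Zstd": "ZSTD",
}
PARQUET_COMPRESSION = {"None": "NONE", "Auto": "AUTO", "Snappy": "SNAPPY"}


def build_copy_into(table, source, file_type, compression="Auto"):
    """Return a COPY INTO statement for the given input settings (sketch only)."""
    allowed = PARQUET_COMPRESSION if file_type == "Parquet" else GENERAL_COMPRESSION
    if compression not in allowed:
        raise ValueError(f"{compression!r} is not offered for {file_type} files")
    # External S3 locations are written as quoted strings; stage names start with "@".
    location = f"'{source}'" if source.startswith("s3://") else source
    return (
        f"COPY INTO {table} FROM {location} "
        f"FILE_FORMAT = (TYPE = {FILE_TYPES[file_type]} "
        f"COMPRESSION = {allowed[compression]})"
    )


print(build_copy_into("sales", "@my_stage/daily/", "Delimited text", "GZIP"))
```
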
Depending on the file type you selected for the **What file type is your source?** option, the following file settings appear at the bottom of this tab:

<table data-header-hidden><thead><tr><th></th><th></th></tr></thead><tbody><tr><td>File Type</td><td>File Settings</td></tr>
<tr><td><strong>Delimited text</strong></td><td><p>Specify the following settings for a delimited text file:</p><ul>
<li><p><strong>Leading rows to skip</strong></p><p>Specify the number of rows to skip at the beginning of the file. This option is useful for skipping header lines.</p></li>
<li><p><strong>Delimiter</strong></p><p>Specify the character used to separate data fields. The default value is a semicolon (;).</p></li>
<li><p><strong>Quote character</strong></p><p>Specify the character used to enclose a data field. The default value is a double quotation mark (").</p></li>
<li><p><strong>Remove quotes</strong></p><p>Select one of the following values to indicate whether quotation characters should be removed from a data field during the bulk load:</p><ul><li><strong>Yes</strong>: Remove the quotation characters.</li><li><strong>No</strong>: Retain the quotation characters.</li></ul></li>
<li><p><strong>Empty as null</strong></p><p>Select one of the following values to indicate whether empty data values should be set to null during the bulk load:</p><ul><li><strong>Yes</strong>: Set empty data values to null.</li><li><strong>No</strong>: Leave empty data values as empty.</li></ul></li>
<li><p><strong>Trim whitespace</strong></p><p>Select one of the following values to indicate whether to remove leading and trailing whitespace from the data during the bulk load:</p><ul><li><strong>Yes</strong>: Remove the whitespace.</li><li><strong>No</strong>: Retain the whitespace.</li></ul></li>
</ul><p><strong>Note:</strong> For delimited text files, your database must contain a table in which all the columns you need are defined.</p></td></tr>
<tr><td><strong>Avro</strong></td><td>No additional settings.</td></tr>
<tr><td><strong>JSON</strong></td><td><ul>
<li><p><strong>Ignore UTF8 errors</strong></p><p>Select one of the following values to indicate whether to ignore UTF-8 errors in the data during the bulk load:</p><ul><li><strong>Yes</strong>: Ignore UTF-8 errors.</li><li><strong>No</strong>: Do not ignore UTF-8 errors.</li></ul></li>
<li><p><strong>Allow duplicate elements</strong></p><p>Select one of the following values to indicate whether to allow duplicate elements in the data during the bulk load:</p><ul><li><strong>Yes</strong>: Allow duplicate elements.</li><li><strong>No</strong>: Do not allow duplicate elements.</li></ul><p><strong>Note:</strong> Snowflake uses only the last duplicate value and discards the others.</p></li>
<li><p><strong>Strip null values</strong></p><p>NULL values are stored as null in JSON files. Select one of the following values to indicate whether to delete NULL values from the data during the bulk load:</p><ul><li><strong>Yes</strong>: Strip the NULL values.</li><li><strong>No</strong>: Store the NULL values in a variant column.</li></ul></li>
<li><p><strong>Parse octal numbers</strong></p><p>Select one of the following values to indicate whether to parse octal numbers during the bulk load:</p><ul><li><strong>Yes</strong>: Parse octal numbers.</li><li><strong>No</strong>: Do not parse octal numbers.</li></ul></li>
</ul></td></tr>
<tr><td><strong>ORC</strong></td><td>Additional file settings for ORC files.</td></tr>
<tr><td><strong>Parquet</strong></td><td>Additional file settings for Parquet files.</td></tr>
<tr><td><strong>XML</strong></td><td><ul>
<li><p><strong>Ignore UTF8 errors</strong></p><p>Select one of the following values to indicate whether to replace UTF-8 encoding errors during the bulk load:</p><ul><li><strong>Yes</strong>: Replace invalid UTF-8 sequences with Unicode character U+FFFD.</li><li><strong>No</strong>: Invalid UTF-8 sequences produce an encoding error (default).</li></ul></li>
<li><p><strong>Preserve space</strong></p><p>Select one of the following values to indicate whether to preserve leading and trailing spaces in element content during the bulk load:</p><ul><li><strong>Yes</strong>: Preserve spaces.</li><li><strong>No</strong>: Do not preserve spaces (default).</li></ul></li>
<li><p><strong>Strip outer element</strong></p><p>Select one of the following values to indicate whether to remove the outer XML element and expose the second-level elements as separate documents during the bulk load:</p><ul><li><strong>Yes</strong>: Remove the outer XML element.</li><li><strong>No</strong>: Do not remove the outer XML element (default).</li></ul></li>
<li><p><strong>Enable Snowflake data</strong></p><p>Select one of the following values to indicate whether to enable recognition of Snowflake semi-structured data tags in the data during the bulk load:</p><ul><li><strong>Yes</strong>: Enable recognition of Snowflake tags (default).</li><li><strong>No</strong>: Disable recognition of Snowflake tags.</li></ul></li>
<li><p><strong>Auto convert</strong></p><p>Select one of the following values to indicate whether to convert numeric and Boolean values from text to native representation during the bulk load:</p><ul><li><strong>Yes</strong>: Convert numeric and Boolean values (default).</li><li><strong>No</strong>: Do not convert numeric and Boolean values.</li></ul></li>
</ul></td></tr></tbody></table>

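The file settings above resemble Snowflake's file format options for `COPY INTO`. The Python sketch below shows one plausible correspondence for a subset of the settings. The names `OPTION_MAP`, `sql_literal`, and `file_format_clause` are hypothetical, and the pairing of each dialog label with a Snowflake option is an assumption for illustration rather than documented behavior of the job entry. **Remove quotes** is left out because it has no obvious one-to-one counterpart. The two XML settings marked as inverted are assumed to map to Snowflake's `DISABLE_...` options, so their Yes/No value is flipped.

```python
# Assumed correspondence: dialog label -> (Snowflake option, invert Yes/No?).
OPTION_MAP = {
    "Delimited text": {
        "Leading rows to skip": ("SKIP_HEADER", False),
        "Delimiter": ("FIELD_DELIMITER", False),
        "Quote character": ("FIELD_OPTIONALLY_ENCLOSED_BY", False),
        "Empty as null": ("EMPTY_FIELD_AS_NULL", False),
        "Trim whitespace": ("TRIM_SPACE", False),
    },
    "JSON": {
        "Ignore UTF8 errors": ("IGNORE_UTF8_ERRORS", False),
        "Allow duplicate elements": ("ALLOW_DUPLICATE", False),
        "Strip null values": ("STRIP_NULL_VALUES", False),
        "Parse octal numbers": ("ENABLE_OCTAL", False),
    },
    "XML": {
        "Ignore UTF8 errors": ("IGNORE_UTF8_ERRORS", False),
        "Preserve space": ("PRESERVE_SPACE", False),
        "Strip outer element": ("STRIP_OUTER_ELEMENT", False),
        # Snowflake expresses these two as "disable" flags, so Yes becomes FALSE.
        "Enable Snowflake data": ("DISABLE_SNOWFLAKE_DATA", True),
        "Auto convert": ("DISABLE_AUTO_CONVERT", True),
    },
}


def sql_literal(value):
    """Render a Python value as a SQL literal (bool is checked before int)."""
    if isinstance(value, bool):
        return "TRUE" if value else "FALSE"
    if isinstance(value, int):
        return str(value)
    return "'" + str(value).replace("'", "''") + "'"


def file_format_clause(file_type, settings):
    """Turn {dialog label: value} into a space-separated option list."""
    mapping = OPTION_MAP[file_type]
    parts = []
    for label, value in settings.items():
        option, invert = mapping[label]
        if invert:
            value = not value
        parts.append(f"{option} = {sql_literal(value)}")
    return " ".join(parts)


print(file_format_clause(
    "Delimited text",
    {"Leading rows to skip": 1, "Delimiter": ";", "Empty as null": True},
))
```
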
**Note:** If you have unstructured data, you must have a variant column in your database table to store the data for the following file types:

* JSON
* ORC
* Parquet
* XML

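As a minimal sketch of the variant column requirement, the Python snippet below builds a `CREATE TABLE` statement with a single `VARIANT` column. The function name `variant_table_ddl` and the table and column names `raw_events` and `payload` are hypothetical; your own table would typically be created in Snowflake before running the job entry.

```python
def variant_table_ddl(table, column="payload"):
    """Return DDL for a table with one VARIANT column (illustrative names)."""
    return f"CREATE TABLE IF NOT EXISTS {table} ({column} VARIANT)"


print(variant_table_ddl("raw_events"))
```
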