# Connecting to Virtual File Systems

You can connect to most Virtual File Systems (VFS) through VFS connections in PDI. A VFS connection stores VFS properties for a specific file system. You can reuse the connection whenever you [access files or folders](#access-files-with-a-vfs-connection). For example, you can use an HCP connection in HCP steps without re-entering credentials.

A VFS connection lets you define your VFS properties once and reuse them as many times as you need. VFS connections support the following file systems:

* **Amazon S3/Minio/HCP**
  * Simple Storage Service (S3) accesses the resources on Amazon Web Services. See [Working with AWS Credentials](https://docs.aws.amazon.com/sdk-for-java/v1/developer-guide/credentials.html) for Amazon S3 setup instructions.

    **Note:** If a connectivity issue occurs with AWS / S3, perform either of the following actions:

    * Set the `AWS_REGION` or `AWS_DEFAULT_REGION` environment variable to the applicable Default Region.
    * Set the correct Default Region in the shared configuration file (`~/.aws/config`) or the credentials file (`~/.aws/credentials`). For example:

      ![AWS sample config file](https://773338310-files.gitbook.io/~/files/v0/b/gitbook-x-prod.appspot.com/o/spaces%2FYwnJ6Fexn4LZwKRHghPK%2Fuploads%2Fgit-blob-10cb315a3e908644f3a4f14965a8e78ddc67c2b0%2FAWS_S3_sample_code.png?alt=media)

    See <https://docs.aws.amazon.com/sdk-for-java/latest/developer-guide/region-selection.html> and <https://docs.aws.amazon.com/AWSEC2/latest/UserGuide/ec2-instance-metadata.html> for more information.
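    For illustration, a minimal `~/.aws/config` entry that sets the Default Region might look like the following sketch (the Region value is a placeholder):

    ```ini
    # ~/.aws/config -- Default Region for the default profile
    [default]
    region = us-east-1
    ```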
  * Minio accesses data objects on an Amazon S3-compatible storage server. See the [Minio Quickstart Guide](https://docs.min.io/docs/) for Minio setup instructions.
  * HCP uses the S3 protocol to access HCP. See [Access to HCP REST](#access-to-hcp-rest) for setup details.
* **Azure Data Lake Gen 1**

  Accesses data objects on Microsoft Azure Gen 1 storage services. You must create an Azure account and configure Azure Data Lake Storage Gen 1. See [Access to Microsoft Azure](#access-to-microsoft-azure).

  **Note:** Support for Azure Data Lake Gen 1 is discontinued and limited to users with existing Gen 1 accounts. As a best practice, use Azure Data Lake Storage Gen 2. See [Azure](https://azure.microsoft.com/en-us/updates/action-required-switch-to-azure-data-lake-storage-gen2-by-29-february-2024/) for details.
* **Azure Data Lake Gen 2/Blob**

  Accesses data objects on Microsoft Azure Gen 2 or Blob storage services. You must create an Azure account and configure Azure Data Lake Storage Gen 2 and Blob Storage. See [Access to Microsoft Azure](#access-to-microsoft-azure).
* **Google Cloud Storage**

  Accesses data in the Google Cloud Storage file system. See [Google Cloud Storage](https://cloud.google.com/storage/docs) for more information on this protocol.
* **HCP REST**

  Accesses data in the Hitachi Content Platform. You must configure HCP and PDI before accessing the platform. See [Access to HCP REST](#access-to-hcp-rest) for more information.
* **Local**

  Accesses data in your local physical file system.
* **SMB/UNC Provider**

  Accesses data on a Windows platform using the Server Message Block (SMB) protocol, with a Universal Naming Convention (UNC) string specifying the resource location path.
* **Snowflake Staging**

  Accesses a staging area used by Snowflake to load files. See [Snowflake staging area](https://docs.snowflake.net/) for more information on this protocol.

After you create a VFS connection, you can use it with PDI steps and entries that support the use of VFS connections. If you are connected to a repository, the VFS connection is saved in the repository. If you are not connected to a repository, the connection is saved locally on the machine where it was created.

If a VFS connection is not available for your file system, you may be able to access it with the [VFS browser](#vfs-browser).

### Before you begin

You may need to set up access for specific providers before you start.

#### Access to Google Cloud

To access Google Cloud from PDI, you must have a Google account and a service account key file in JSON format. You must also set permissions for your Google Cloud accounts. To create service account credentials, see <https://cloud.google.com/storage/docs/authentication>.

Perform the following steps to set up Google Cloud Storage access:

1. Download the service account credentials file from the Google API Console.
2. Create a system environment variable named **GOOGLE\_APPLICATION\_CREDENTIALS**.
3. Set the variable value to the full path of the JSON key file.

You can now access Google Cloud Storage from PDI.
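On Linux or macOS, for example, you can set the variable in the shell that launches the PDI client (the key file path below is a placeholder):

```shell
# Point Google client libraries (and PDI) at the service account key file.
# Replace the path with the location of your downloaded JSON key.
export GOOGLE_APPLICATION_CREDENTIALS="$HOME/keys/my-service-account.json"

# Confirm the variable is visible before starting the PDI client (spoon.sh).
echo "$GOOGLE_APPLICATION_CREDENTIALS"
```

On Windows, set the variable through **System Properties** > **Environment Variables** instead.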

#### Access to HCP REST

Hitachi Content Platform (HCP) is a distributed storage system that you can access through a VFS connection in the PDI client.

Within HCP, access control lists (ACLs) grant privileges for file operations. [Namespaces](https://knowledge.hitachivantara.com/Documents/Storage/Content_Platform/8.1.2/System_administration/Introduction_to_Hitachi_Content_Platform/01_About_Hitachi_Content_Platform/) are used for logical groupings, permissions, and object metadata. For more information, see the [Introduction to Hitachi Content Platform](https://knowledge.hitachivantara.com/Documents/Storage/Content_Platform/8.1.2/Tenants_and_Namespaces/Introduction_to_Hitachi_Content_Platform).

Perform the following steps to set up access to HCP:

{% hint style="info" %}
This process assumes you have tenant permissions and existing namespaces. See [Tenant Management Console](https://knowledge.hitachivantara.com/Documents/Storage/Content_Platform/8.1.2/Tenants_and_Namespaces/General_administrative_information/03_Tenant_Management_Console).
{% endhint %}

{% hint style="info" %}
To create a successful VFS connection to HCP, configure object versioning in your HCP [Namespaces](https://knowledge.hitachivantara.com/Documents/Storage/Content_Platform/8.1.2/Tenants_and_Namespaces/Managing_namespaces).
{% endhint %}

1. Sign in to the HCP Tenant Management Console.
2. Click **Namespaces**, then select the namespace **Name** you want to configure.

   ![HCP Tenant Management Console](https://773338310-files.gitbook.io/~/files/v0/b/gitbook-x-prod.appspot.com/o/spaces%2FYwnJ6Fexn4LZwKRHghPK%2Fuploads%2Fgit-blob-04ccfa45ac66944d2da343e424e608e7cbf97db3%2FPDI_HCP-DM_Dialog.png?alt=media)
3. On the **Protocols** tab, click **HTTP(S)**.
4. Verify these settings:
   * **Enable HTTPS**
   * **Enable REST API** with **Authenticated access only**
5. On the **Settings** tab, select **ACLs**.
6. Select **Enable ACLs**.
7. When prompted, click **Enable ACLs** to confirm.

HCP is now set up for access from the PDI client.

#### Access to Microsoft Azure

To access Azure services from PDI, create and configure the following:

* Azure Data Lake Storage Gen 1, or
* Azure Data Lake Storage Gen 2 and Blob Storage services

For Gen 2, enable the hierarchical namespace to maximize file system performance.

* Access requires an Azure account with an active subscription. See [Create an account for free](https://azure.microsoft.com/en-us/free/?ref=microsoft.com\&utm_source=microsoft.com\&utm_medium=docs\&utm_campaign=visualstudio).
* Access to Azure Storage requires an Azure Storage account. See [Create a storage account](https://docs.microsoft.com/en-us/azure/storage/common/storage-account-create?tabs=azure-portal).

### Create a VFS connection

Perform the following steps to create a VFS connection in PDI:

1. Start the PDI client (Spoon).
2. In the **View** tab of the Explorer pane, right-click **VFS Connections**, then click **New**.

   The New VFS connection dialog box opens.

   ![New VFS Connection dialog box](https://773338310-files.gitbook.io/~/files/v0/b/gitbook-x-prod.appspot.com/o/spaces%2FYwnJ6Fexn4LZwKRHghPK%2Fuploads%2Fgit-blob-823b3330ed8ea8889f29e9706a8a855c3b2f4127%2FPDI%20VFS%20connection%20dialog.png?alt=media)
3. In **Connection Name**, enter a unique name. Optionally, add a **Description**.

   The name can include spaces, but do not use special characters such as `#`, `$`, `/`, `\`, or `%`.
4. In **Connection Type**, select a type:
   * **Amazon S3/Minio/HCP**
   * **Azure Data Lake Gen 1**
   * **Azure Data Lake Gen 2 / Blob**
   * **Google Cloud Storage**
   * **HCP REST**
   * **Local**
   * **SMB/UNC Provider**
   * **Snowflake Staging**
5. In the connection details panel, set the options for your connection type.

   <div data-gb-custom-block data-tag="hint" data-style="info" class="hint hint-info"><p>You can add a predefined variable to fields that have the “insert variable” icon. Place your cursor in the field, then press <code>Ctrl+Space</code>. Variables must be predefined in <code>kettle.properties</code>. Runtime variables are not supported.</p><p>See <a href="../archived-merged-pages/transforming-data-with-pdi-archive/pdi-run-modifiers/variables/kettle-variables">Kettle Variables</a>.</p></div>
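   For example, a variable you insert with `Ctrl+Space` must already be defined in `kettle.properties`; a hypothetical entry might look like this (the variable name and value are placeholders):

   ```
   # kettle.properties -- a predefined variable for a root folder path
   VFS_ROOT_FOLDER=my-bucket/input
   ```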

| Connection type                  | Options                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                    |
| -------------------------------- | ---------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------- |
| **Amazon**                       | <p>Click <strong>S3 Connection Type</strong> and select <strong>Amazon</strong> from the list to use an Amazon S3 connection.</p><p>Simple Storage Service (S3) accesses the resources on Amazon Web Services. See <a href="https://docs.aws.amazon.com/sdk-for-java/v1/developer-guide/credentials.html">Working with AWS Credentials</a> for Amazon S3 setup instructions.</p><ul><li>Select the <strong>Authentication Type</strong>:<br>- <strong>Access Key/Secret Key</strong><br>- <strong>Credentials File</strong></li><li>Select the <strong>Region</strong>.</li><li><p>When <strong>Authentication Type</strong> is:</p><ul><li><strong>Access Key/Secret Key</strong>, then enter the <strong>Access Key</strong> and <strong>Secret Key</strong>, and optionally enter the <strong>Session Token</strong>.</li><li><strong>Credentials File</strong>, then enter the <strong>Profile Name</strong> and the <strong>File Location</strong>.</li></ul></li><li>Select the <strong>Default S3 Connection</strong> checkbox to make <strong>Amazon</strong> the default S3 connection.</li></ul>                                                                                                                                                                                                                                                 |
| **Minio/HCP**                    | <p>Click <strong>S3 Connection Type</strong> and select <strong>Minio/HCP</strong> from the list to use a Minio/HCP S3 connection.</p><p>Minio accesses data objects on an Amazon compatible storage server. See the <a href="https://docs.min.io/docs/">Minio Quickstart Guide</a> for Minio setup instructions.</p><ul><li>Enter the <strong>Access Key</strong>.</li><li>Enter the <strong>Secret Key</strong>.</li><li>Enter the <strong>Endpoint</strong>.</li><li>Enter the <strong>Signature Version</strong>.</li><li>Select the <strong>PathStyle Access</strong> checkbox to use path-style requests. Otherwise, Amazon S3 bucket-style access is used.</li><li>Select the <strong>Default S3 Connection</strong> checkbox to make <strong>Minio/HCP</strong> the default S3 connection.</li></ul>                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                               |
| **Azure Data Lake Gen 1**        | <p>Accesses data objects on Microsoft Azure Gen 1 storage services. You must create an Azure account and configure Azure Data Lake Storage Gen 1. See <a href="#access-to-microsoft-azure">Access to Microsoft Azure</a>.</p><ul><li>The <strong>Authentication Type</strong> is <strong>Service-to-service authentication</strong> only.</li><li>Enter the <strong>Account Fully Qualified Domain Name</strong>.</li><li>Enter the <strong>Application (client) ID</strong>.</li><li>Enter the <strong>Client Secret</strong>.</li><li>Enter the <strong>OAuth 2.0 token endpoint</strong>.</li></ul>                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                     |
| **Azure Data Lake Gen 2 / Blob** | <p>Accesses data objects on Microsoft Azure Gen 2 and Blob storage services. You must create an Azure account and configure Azure Data Lake Storage Gen 2 and Blob Storage. See <a href="#access-to-microsoft-azure">Access to Microsoft Azure</a>.</p><ul><li>Select the <strong>Authentication Type</strong>:<br>- <strong>Account Shared Key</strong><br>- <strong>Azure Active Directory</strong><br>- <strong>Shared Access Signature</strong></li><li>Enter the <strong>Service Account Name</strong>.</li><li>Enter the <strong>Block Size (Min 1 MB to Max 100 MB)</strong>. The default is 50.</li><li>Enter the <strong>Buffer Count (Min 2)</strong>. The default is 5.</li><li>Enter the <strong>Max Block Upload Size (Min 1 MB to 900 MB)</strong>. The default is 100.</li><li>Select the <strong>Access Tier</strong>. The default value is Hot.</li><li><p>When <strong>Authentication Type</strong> is:</p><ul><li><strong>Account Shared Key</strong>, then enter the <strong>Service Account Shared Key</strong>.</li><li><strong>Azure Active Directory</strong>, then enter the <strong>Application (client) ID</strong>, <strong>Client Secret</strong>, and <strong>Directory (tenant) ID</strong>.</li><li><strong>Shared Access Signature</strong>, then enter the <strong>Shared Access Signature</strong>.</li></ul></li></ul> |
| **Google Cloud Storage**         | <p>Accesses data objects on the Google Cloud Storage file system. See <a href="https://cloud.google.com/storage/docs">Google Cloud Storage</a> for more information on this protocol.</p><ul><li>Enter the <strong>Service Account Key Location</strong>.</li></ul>                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                        |
| **HCP REST**                     | <p>Accesses data objects on the Hitachi Content Platform. You must configure HCP and PDI before accessing the platform. You must also configure object versioning in HCP namespaces. See <a href="#access-to-hcp-rest">Access to HCP REST</a>.</p><ul><li>Enter the <strong>Host</strong> and <strong>Port</strong>.</li><li>Enter the <strong>Tenant</strong>, <strong>Namespace</strong>, <strong>Username</strong>, and <strong>Password</strong>.</li><li>Click <strong>More options</strong>, then enter the <strong>Proxy Host</strong> and <strong>Proxy Port</strong>.</li><li>Select whether to use <strong>Accept self-signed certificate</strong>. Default: No.</li><li>Select whether the <strong>Proxy is secure</strong>. Default: No.</li></ul>                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                             |
| **Local**                        | <p>Accesses a file system on your local machine.</p><ul><li>Enter the <strong>Root Folder Path</strong> or click <strong>Browse</strong> to set a folder connection. Optionally, use an empty path to allow access to the root directory and its folders.</li></ul>                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                        |
| **SMB/UNC Provider**             | <p>Accesses Server Message Block data using a Universal Naming Convention string to specify the file location.</p><ul><li>Enter the <strong>Domain</strong>. The domain name of the target machine hosting the resource. If the machine has no domain name, use the machine name.</li><li>Enter the <strong>Port Number</strong>. Default: 445.</li><li>Enter the <strong>Server</strong>, <strong>User Name</strong>, and <strong>Password</strong>.</li></ul>                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                            |
| **Snowflake Staging**            | <p>Accesses a staging area used by Snowflake to load files. See <a href="https://docs.snowflake.net/">Snowflake staging area</a> for more information.</p><ul><li>Enter the <strong>Host Name</strong>.</li><li>Enter the <strong>Port Number</strong>. Default: 443.</li><li>Enter the <strong>Database</strong>.</li><li>Enter the <strong>Namespace</strong>, <strong>User Name</strong>, and <strong>Password</strong>.</li></ul>                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                      |

6. For all connection types except **Local**, enter the **Root Folder Path** for your VFS connection. Enter the full path to connect to a specific folder. Optionally, use an empty path to allow access to all folders in the root.

   The default is the root and its folders in your local physical file system.
7. Optional: Click **Test** to verify the connection.
8. Click **OK**.

You can now use the connection in steps and entries that support VFS connections, such as Snowflake entries or HCP steps. For related information, see:

* [PDI and Snowflake](https://docs.pentaho.com/pdia-data-integration/extracting-data-into-pdi/pdi-and-snowflake-cp)
* [PDI and Hitachi Content Platform (HCP)](https://docs.pentaho.com/pdia-data-integration/extracting-data-into-pdi/pdi-and-hitachi-content-platform-hcp)

For general access details, see [Access files with the VFS browser](#access-files-with-the-vfs-browser).

### Edit a VFS connection

Perform the following steps to edit an existing VFS connection:

1. Right-click the VFS connection you want to edit, then select **Edit**.
2. In the Edit VFS Connection dialog box, select the pencil icon next to the section you want to edit.

### Delete a VFS connection

Perform the following steps to delete a VFS connection:

1. Right-click the VFS connection you want to delete.
2. Select **Delete**, then **Yes, Delete**.

The deleted connection no longer appears under **VFS Connections** in the **View** tab.

### Access files with a VFS connection

After you create a VFS connection, you can use the VFS Open and Save dialog boxes to access files in the PDI client.

1. In the PDI client, select **File** > **Open URL** to open a file, or **File** > **Save as** to save a file.

   The VFS Open or Save As dialog box opens.

   ![Open dialog box in the PDI client](https://773338310-files.gitbook.io/~/files/v0/b/gitbook-x-prod.appspot.com/o/spaces%2FYwnJ6Fexn4LZwKRHghPK%2Fuploads%2Fgit-blob-e22e88115d14d9ee084ce64f6fba260ff6b1ab86%2FssPDI_OpenDialogBox_9.4.png?alt=media)
2. In the left pane, select **VFS connection**, then navigate to your folders and files.
3. Optional: Click the navigation path to show and copy the Pentaho file path. See [Pentaho address for a VFS connection](#pentaho-address-for-a-vfs-connection).
4. Select the file and click **Open** or **Save**.

{% hint style="info" %}
If you are not connected to a repository, you can rename a folder or file. Click the item again to edit its name.
{% endhint %}

### Pentaho address for a VFS connection

The Pentaho address is the Pentaho virtual file system (`pvfs`) location within your VFS connection. When you browse in the file access dialog box, the address bar shows the path for your VFS location.

![VFS navigation path in the Open dialog box in the PDI client](https://773338310-files.gitbook.io/~/files/v0/b/gitbook-x-prod.appspot.com/o/spaces%2FYwnJ6Fexn4LZwKRHghPK%2Fuploads%2Fgit-blob-737c2cdaff12fc62694969d198861a1a1d3f6cdd%2FssPDI_OpenDialogBox_VFSAddressBar.png?alt=media)

When you click in the address bar, the Pentaho address appears.

![PVFS file path in the Open dialog box in the PDI client](https://773338310-files.gitbook.io/~/files/v0/b/gitbook-x-prod.appspot.com/o/spaces%2FYwnJ6Fexn4LZwKRHghPK%2Fuploads%2Fgit-blob-f45eca9a3b4549e688d42d86915311abff88aa3d%2FssPDI_OpenDialogBox_VFSAddressBar_PVFSPath.png?alt=media)

You can copy and paste a Pentaho address into file path fields in steps and entries that support VFS connections.
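For illustration, a Pentaho address for a hypothetical connection named `mys3connection` pointing at a file in a bucket might look like this:

```
pvfs://mys3connection/my-bucket/input/sales_data.csv
```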

{% hint style="info" %}
Use the Pentaho virtual file system for Amazon S3. Existing transformations and jobs that use Amazon S3 are supported when **Amazon S3** is set as the **Default S3 Connection**.
{% endhint %}

### Create a VFS metastore

A PDI metastore is a location for storing resources shared by multiple transformations. It enables hyperscaler deployments to access the metastore in the cloud. It also lets the PDI client and Pentaho Server reference the same VFS metastore.

The VFS connection information is stored in an XML file. The metastore can be located in one of these places:

* On the machine where you run PDI, in your user directory or in a repository
* On Pentaho Server, as a remote metastore in the server repository
* In a cloud location that is accessible through a VFS connection

Multiple users can access the metastore when it is stored in a remote location. The remote metastore has priority over a local metastore. For example, if you configure a local `metastore-config` file and then connect to a Pentaho Server repository, transformations use the remote metastore.

#### Enable a VFS metastore

Before you can use a remote metastore, create a VFS connection in the PDI client, then create and edit a metastore configuration file.

Perform the following steps to enable a VFS metastore:

1. Open the PDI client and create a VFS connection to the storage location you want to use as your metastore. See [Create a VFS connection](#create-a-vfs-connection).
2. Close the PDI client.
3. Go to `Users\<yourusername>\.pentaho\metastore\pentaho\Amazon S3 Connection\` and copy the VFS connection file you created into `Users\<yourusername>\.kettle`.
4. Rename the file to `metastore-config`.
5. Open `metastore-config` in a text editor. Add the `scheme` and `rootPath` elements and their values. See [Metastore configuration](#metastore-configuration).
6. Save the file.
7. Restart the PDI client.

The remote VFS metastore is now enabled. Previous local connections still exist in your local metastore directory, but they no longer appear in the PDI client. New VFS connections are stored in the location specified in `metastore-config`.
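For example, the `scheme` and `rootPath` elements added in step 5 might look like the following sketch (the bucket name and folder path are placeholders):

```xml
<scheme>s3</scheme>
<rootPath>miniobucket/dir1</rootPath>
```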

#### Metastore configuration

The elements listed in this section are required for all remote environments. When you create a VFS connection in the PDI client, you do not need to manually edit anything in the `<configuration>` section.

**Common elements**

These elements are required for all VFS connections:

<table data-header-hidden><thead><tr><th></th><th></th><th></th></tr></thead><tbody><tr><td>Element</td><td>Value</td><td>Description</td></tr><tr><td><code>scheme</code></td><td>&#x3C;string></td><td><p>The type of connection. The values are:</p><p><strong>s3</strong> - Amazon, MinIO, and HCP</p><p><strong>gs</strong> - Google cloud storage</p><p><strong>abfss</strong> - Azure Data Lake Storage Gen2</p></td></tr><tr><td><code>rootPath</code></td><td>&#x3C;bucket-name>[/&#x3C;path>]</td><td><p>The bucket name and optional folder path where you want to create the VFS metastore. The <code>rootPath</code> element must point to the location where you will store the metastore file on the cloud location.</p><p>This is analogous to the <code>.pentaho</code> folder in a local metastore.</p><p>Examples:</p><ul><li><code>miniobucket/dir1</code></li><li><code>gcpbucket/dir1</code></li></ul></td></tr><tr><td><code>children</code></td><td></td><td><p>A container for type-specific configurations. For example:</p><pre><code>&#x3C;children>
    &#x3C;child>
        &#x3C;id>description&#x3C;/id>
        &#x3C;value>&#x3C;/value>
        &#x3C;type>String&#x3C;/type>
    &#x3C;/child>
…
&#x3C;/children>
</code></pre></td></tr></tbody></table>

**S3 elements**

The elements listed below apply to S3 environments. Some elements are conditional.

| Element               | Value                            | Description                                                                                                                                                                           |
| --------------------- | -------------------------------- | ------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------- |
| `accessKey`           | `<s3-access-key>`                | The S3 user’s access key.                                                                                                                                                             |
| `secretKey`           | `<s3-secret-key>`                | The S3 user’s secret key.                                                                                                                                                             |
| `endPoint`            | `<s3-endpoint>`                  | <p>The URL to access the S3 location. Examples:</p><p><code>http://&#x3C;host-ip>:&#x3C;port></code></p><p><code>https://my-hcp-namespace.my-hcp-tenant.hcpdemo.hitachivantara.com</code></p> |
| `region`              | `<s3-region>`                    | The user-designated region. For example, `us-east-1`.                                                                                                                                 |
| `connectionType`      | 0 or 1                           | <p>The connection type value. The values are:</p><p><strong>0</strong> - connect to AWS</p><p><strong>1</strong> - connect to MinIO or HCP</p>                                        |
| `credentialFile`      |                                  | An encrypted string that is not user editable.                                                                        |
| `profileName`         | `<string>`                       | The AWS user profile connection when the `connectionType` is 0 (AWS) and the `authType` is 1 (credentials file).      |
| `defaultS3Config`     | true or false                    | Controls whether the default S3 configuration is used. Set to `true` to use the default S3 configuration.             |
| `credentialsFilePath` | `<path to AWS credentials file>` | The path to the AWS credentials file when the `connectionType` is 0 (AWS) and the `authType` is 1 (credentials file). |
| `pathStyleAccess`     | true or false                    | Controls which access style is used. Specify `true` for path-style access or `false` for bucket-style access.         |
| `signatureVersion`    | `AWSS3V4SignerType`              | The signature version used when communicating with the AWS S3 metastore location.                                                                                                     |
| `name`                | `vfsMetastore`                   | The connection name.                                                                                                                                                                  |
| `description`         | `<string>`                       | A description of the connection.                                                                                                                                                      |
| `sessionToken`        | `<session token string>`         | Optional. A temporary credential used if the AWS bucket requires a session token for access.                                                                                           |
| `authType`            | 0 or 1                           | <p>The authentication type when <code>connectionType</code> is 0 (AWS):</p><p>0 – Access key/Secret key</p><p>1 – Credentials file</p>                                                |
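
As an illustration, the `children` block for a MinIO connection might carry entries such as the following. The endpoint address and key value are placeholders, and in a real metastore file the `secretKey` and `credentialFile` values are stored encrypted:

```xml
<children>
    <child>
        <id>connectionType</id>
        <value>1</value>
        <type>String</type>
    </child>
    <child>
        <id>endPoint</id>
        <value>http://192.168.1.10:9000</value>
        <type>String</type>
    </child>
    <child>
        <id>accessKey</id>
        <value>MY_ACCESS_KEY</value>
        <type>String</type>
    </child>
    <child>
        <id>pathStyleAccess</id>
        <value>true</value>
        <type>String</type>
    </child>
</children>
```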

**GCP elements**

The elements listed below apply to GCP environments:

| Element             | Value      | Description                                                                |
| ------------------- | ---------- | -------------------------------------------------------------------------- |
| `serviceAccountKey` | `<string>` | A key that is generated based on the contents of the service account JSON. |
| `keyPath`           | `<path>`   | The path to the file containing the GCP service account JSON.              |
| `name`              | `<string>` | The name of the connection.                                                |
| `description`       | `<string>` | A description of the connection.                                           |
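
For example, a GCP connection that reads the service account JSON from disk might include entries like these (the connection name and file path are placeholders):

```xml
<children>
    <child>
        <id>name</id>
        <value>gcsConnection</value>
        <type>String</type>
    </child>
    <child>
        <id>keyPath</id>
        <value>/home/user/gcp-service-account.json</value>
        <type>String</type>
    </child>
</children>
```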

**Azure Data Lake Storage Gen2 elements**

The elements listed below apply to Azure Data Lake Storage Gen2 environments. See [Azure Blob Storage](https://learn.microsoft.com/en-us/azure/storage/blobs/) for more information.

| Element               | Value                | Description                                                                                                                            |
| --------------------- | -------------------- | -------------------------------------------------------------------------------------------------------------------------------------- |
| `sharedKey`           | `<encrypted string>` | The shared key for accessing the service.                                                                                              |
| `accountName`         | `<encrypted string>` | The name of the account.                                                                                                               |
| `accessTier`          | `<string>`           | The access tier value. Default: `Hot`.                                                                                                 |
| `blockSize`           | `<Integer>`          | The block size used when transferring data. Default: 50.                                                                               |
| `maxSingleUploadSize` | `<Integer>`          | The maximum size of a single upload. Default: 100.                                                                                     |
| `bufferCount`         | `<Integer>`          | The number of buffers used when transferring data. Default: 5.                                                                         |
| `name`                | `<string>`           | The connection name.                                                                                                                   |
| `authType`            | `0`, `1`, or `2`     | <p>The authorization type. Values:</p><p>0 - Account Shared Key</p><p>1 - Azure Active Directory</p><p>2 - Shared Access Signature</p> |
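
For example, an Azure Data Lake Storage Gen2 connection that authenticates with an account shared key (`authType` 0) might include entries like the following. The account name is a placeholder, and the `sharedKey` value is stored encrypted, so it is omitted here:

```xml
<children>
    <child>
        <id>accountName</id>
        <value>mystorageaccount</value>
        <type>String</type>
    </child>
    <child>
        <id>authType</id>
        <value>0</value>
        <type>String</type>
    </child>
    <child>
        <id>accessTier</id>
        <value>Hot</value>
        <type>String</type>
    </child>
</children>
```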

### Steps and entries supporting VFS connections

You may have a transformation or job containing a step or entry that accesses a file on a Virtual File System.

The following steps and entries support VFS connections:

* [Avro Input](https://docs.pentaho.com/pdia-data-integration/pdi-transformation-steps-reference-overview/avro-input)
* [Avro Output](https://docs.pentaho.com/pdia-data-integration/pdi-transformation-steps-reference-overview/avro-output)
* [Bulk load from MySQL into file](http://wiki.pentaho.com/display/EAI/BulkLoad+from+Mysql+to+file)
* [Bulk load into MSSQL](http://wiki.pentaho.com/display/EAI/BulkLoad+into+MSSQL)
* [Bulk load into MySQL](http://wiki.pentaho.com/display/EAI/Bulkload+into+MySQL)
* [Copybook Input](https://docs.pentaho.com/pdia-data-integration/pdi-transformation-steps-reference-overview/copybook-input-pdi-step)
* [CSV File Input](https://docs.pentaho.com/pdia-data-integration/pdi-transformation-steps-reference-overview/csv-file-input)
* [De-serialize from file](http://wiki.pentaho.com/display/EAI/De-serialize+from+file)
* [Fixed file input](http://wiki.pentaho.com/display/EAI/Fixed+File+Input)
* [Get data from XML](http://wiki.pentaho.com/display/EAI/Get+Data+From+XML)
* [Get File Names](http://wiki.pentaho.com/display/EAI/Get+File+Names)
* [Get Files Rows Count](http://wiki.pentaho.com/display/EAI/Get+Files+Rows+Count)
* [Get SubFolder names](http://wiki.pentaho.com/display/EAI/Get+SubFolder+names)
* [Google Analytics](http://wiki.pentaho.com/display/EAI/Google+Analytics)
* [GZIP CSV Input](http://wiki.pentaho.com/display/EAI/GZIP+CSV+Input)
* [Job (job entry)](https://docs.pentaho.com/pdia-data-integration/pdi-job-entries-reference-overview/job-job-entry)
* [JSON Input](https://docs.pentaho.com/pdia-data-integration/pdi-transformation-steps-reference-overview/json-input)
* [JSON output](http://wiki.pentaho.com/display/EAI/JSON+output)
* [ORC Input](https://docs.pentaho.com/pdia-data-integration/pdi-transformation-steps-reference-overview/orc-input)
* [ORC Output](https://docs.pentaho.com/pdia-data-integration/pdi-transformation-steps-reference-overview/orc-output)
* [Parquet Input](https://docs.pentaho.com/pdia-data-integration/pdi-transformation-steps-reference-overview/parquet-input)
* [Parquet Output](https://docs.pentaho.com/pdia-data-integration/pdi-transformation-steps-reference-overview/parquet-output)
* [Query HCP](https://docs.pentaho.com/pdia-data-integration/pdi-transformation-steps-reference-overview/query-hcp)
* [Read metadata from Copybook](https://docs.pentaho.com/pdia-data-integration/pdi-transformation-steps-reference-overview/read-metadata-from-copybook)
* [Read metadata from HCP](https://docs.pentaho.com/pdia-data-integration/pdi-transformation-steps-reference-overview/read-metadata-from-hcp)
* [Text File Output](https://docs.pentaho.com/pdia-data-integration/pdi-transformation-steps-reference-overview/text-file-output-cp)
* [Transformation (job entry)](https://docs.pentaho.com/pdia-data-integration/pdi-job-entries-reference-overview/transformation-job-entry-cp)
* [Write metadata to HCP](https://docs.pentaho.com/pdia-data-integration/pdi-transformation-steps-reference-overview/write-metadata-to-hcp)

### VFS browser

Some transformation steps and job entries use a Virtual File System (VFS) browser instead of VFS connections and the Open dialog box. When you use the VFS browser, you specify a VFS URL instead of a VFS connection. The scheme in the URL identifies the protocol used to access the file.

Files can be local or remote. Files can also be compressed formats, such as TAR and ZIP. For more information, see the [Apache Commons VFS documentation](http://commons.apache.org/proper/commons-vfs/).

#### Before you begin

If you need to access Google Drive, see [Access to a Google Drive](#access-to-a-google-drive).

#### Access to a Google Drive

Perform the following setup steps the first time you access Google Drive.

1. Follow the “Step 1” procedure in [Build your first Drive app (Java)](https://developers.google.com/drive/api/v3/quickstart/java) in the [Google Drive APIs documentation](https://developers.google.com/drive/).

   This procedure turns on the Google Drive API and creates a `credentials.json` file.
2. Rename `credentials.json` to `client_secret.json`. Copy it to `data-integration/plugins/pentaho-googledrive-vfs/credentials`.
3. Restart PDI.

   The **Google Drive** option does not appear for the VFS browser until you copy `client_secret.json` into the `credentials` directory and restart PDI.
4. Sign in to your Google account.
5. Enter your Google account credentials.
6. In the permission window, click **Allow**.

After initialization, Pentaho stores a token named **StoredCredential** in `data-integration/plugins/pentaho-googledrive-vfs/credentials`. This token lets you access Google Drive resources without signing in again. If you delete the token, you are prompted to sign in after restarting PDI. If you change account permissions, delete the token and repeat the setup.

{% hint style="info" %}
To access Google Drive from a transformation that runs on Pentaho Server, copy **StoredCredential** and `client_secret.json` into `pentaho-server/pentaho-solutions/system/kettle/plugins/pentaho-googledrive-vfs/credentials` on the server.
{% endhint %}

#### Access files with the VFS browser

Perform the following steps to access files with the VFS browser.

1. Select **File** > **Open** in the PDI client.

   The Open dialog box appears.

   ![Open dialog box](https://773338310-files.gitbook.io/~/files/v0/b/gitbook-x-prod.appspot.com/o/spaces%2FYwnJ6Fexn4LZwKRHghPK%2Fuploads%2Fgit-blob-e0db2371f2c71a4ffa02b6e8278820440e1b56b1%2FPDI%20Open%20file%20dialog%20box.png?alt=media)
2. In the left pane, select the file system type. Supported file systems include:
   * **Local**: Files on your local machine.
   * **Hadoop Cluster**: Files on any Hadoop cluster except S3.
   * **HDFS**: Files on Hadoop distributed file systems.
   * **Google Drive**: Files on Google Drive. See [Access to a Google Drive](#access-to-a-google-drive).
   * **VFS Connections**: Files using a stored VFS connection.
3. Optional: In the **Address** bar, enter a VFS URI.

   Examples:

   * **FTP**: `ftp://userID:password@ftp.myhost.com/path_to/file.txt`
   * **HDFS**: `hdfs://myusername:mypassword@mynamenode:port/path`
   * **SMB/UNC Provider**: `smb://<domain>;<username>:<password>@<server>:<port>/<path>`

   <div data-gb-custom-block data-tag="hint" data-style="info" class="hint hint-info"><p>For SMB, “domain” is the Windows host name. “Domain” and “server” can be the same when using an IP address.</p></div>
4. Optional: Use **File type** to filter on file types other than transformations and jobs.
5. Optional: Select a file or folder and click the **X** icon to delete it.
6. Optional: Click the **+** icon to create a new folder.

{% hint style="info" %}
VFS dialog boxes are configured through transformation parameters. See [Configure VFS options](#configure-vfs-options).
{% endhint %}

#### Supported steps and entries

The following steps and entries support the VFS browser:

* [Amazon EMR Job Executor](https://docs.pentaho.com/pdia-data-integration/pdi-job-entries-reference-overview/amazon-emr-job-executor) (introduced in v9.0)
* [Amazon Hive Job Executor](https://docs.pentaho.com/pdia-data-integration/pdi-job-entries-reference-overview/amazon-hive-job-executor) (introduced in v9.0)
* [AMQP Consumer](https://docs.pentaho.com/pdia-data-integration/pdi-transformation-steps-reference-overview/amqp-consumer) (introduced in v9.0)
* [Avro Input](https://docs.pentaho.com/pdia-data-integration/pdi-transformation-steps-reference-overview/avro-input) (introduced in v8.3)
* [Avro Output](https://docs.pentaho.com/pdia-data-integration/pdi-transformation-steps-reference-overview/avro-output) (introduced in v8.3)
* [ETL metadata injection](https://docs.pentaho.com/pdia-data-integration/pdi-transformation-steps-reference-overview/etl-metadata-injection)
* [File Exists (Job Entry)](https://docs.pentaho.com/pdia-data-integration/pdi-job-entries-reference-overview/file-exists-job-entry)
* [Hadoop Copy Files](https://docs.pentaho.com/pdia-data-integration/pdi-job-entries-reference-overview/hadoop-copy-files)
* [Hadoop File Input](https://docs.pentaho.com/pdia-data-integration/pdi-transformation-steps-reference-overview/hadoop-file-input-cp-main-page)
* [Hadoop File Output](https://docs.pentaho.com/pdia-data-integration/pdi-transformation-steps-reference-overview/hadoop-file-output-cp-main-page)
* [JMS Consumer](https://docs.pentaho.com/pdia-data-integration/pdi-transformation-steps-reference-overview/jms-consumer) (introduced in v9.0)
* [Job Executor](https://docs.pentaho.com/pdia-data-integration/pdi-transformation-steps-reference-overview/job-executor) (introduced in v9.0)
* [Kafka consumer](https://docs.pentaho.com/pdia-data-integration/pdi-transformation-steps-reference-overview/kafka-consumer) (introduced in v9.0)
* [Kinesis consumer](https://docs.pentaho.com/pdia-data-integration/pdi-transformation-steps-reference-overview/kinesis-consumer) (introduced in v9.0)
* [Mapping](https://docs.pentaho.com/pdia-data-integration/pdi-transformation-steps-reference-overview/mapping) (sub-transformation)
* [MQTT Consumer](https://docs.pentaho.com/pdia-data-integration/pdi-transformation-steps-reference-overview/mqtt-consumer) (introduced in v9.0)
* [ORC Input](https://docs.pentaho.com/pdia-data-integration/pdi-transformation-steps-reference-overview/orc-input) (introduced in v8.3)
* [ORC Output](https://docs.pentaho.com/pdia-data-integration/pdi-transformation-steps-reference-overview/orc-output) (introduced in v8.3)
* [Parquet Input](https://docs.pentaho.com/pdia-data-integration/pdi-transformation-steps-reference-overview/parquet-input) (introduced in v8.3)
* [Parquet Output](https://docs.pentaho.com/pdia-data-integration/pdi-transformation-steps-reference-overview/parquet-output) (introduced in v8.3)
* [Oozie Job Executor](http://wiki.pentaho.com/display/EAI/Oozie+Job+Executor) (introduced in v9.0)
* [Simple Mapping](https://docs.pentaho.com/pdia-data-integration/pdi-transformation-steps-reference-overview/simple-mapping-sub-transformation) (introduced in v9.0)
* [Single Threader](https://docs.pentaho.com/pdia-data-integration/pdi-transformation-steps-reference-overview/single-threader) (introduced in v9.0)
* [Sqoop Export](http://wiki.pentaho.com/display/EAI/Sqoop+Export) (introduced in v9.0)
* [Sqoop Import](http://wiki.pentaho.com/display/EAI/Sqoop+Import) (introduced in v9.0)
* [Transformation Executor](https://docs.pentaho.com/pdia-data-integration/pdi-transformation-steps-reference-overview/transformation-executor) (introduced in v9.0)
* [Weka Scoring](https://wiki.pentaho.com/display/EAI/Weka+Scoring) (introduced in v9.0)

{% hint style="info" %}
If you have a Pentaho address for an existing VFS connection, you can paste the `pvfs` location into file or folder fields. You do not need to use **Browse**.
{% endhint %}
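
For example, a Pentaho VFS address for a connection named `my-s3-connection` (a hypothetical connection name and path) takes this general form:

```
pvfs://my-s3-connection/mybucket/path/to/file.csv
```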

For more information on configuring options for SFTP, see [Configure SFTP VFS](https://github.com/pentaho/documentation/blob/main/PDIA/11.0/PDI/Data%20Integration%20Perspective/Data%20Integration%20perspective%20in%20the%20PDI%20client/Connecting%20to%20Virtual%20File%20Systems%20cp/VFS%20browser%20\(Connecting%20to%20Virtual%20File%20Systems\)/broken-reference/README.md).

#### Configure VFS options

You can configure the VFS browser to set variables as parameters at runtime. For a working example, see the sample transformation `VFS Configuration Sample.ktr` in `data-integration/samples/transformations`.

For more information on setting variables, see [VFS properties](https://docs.pentaho.com/pdia-data-integration/archived-merged-pages/transforming-data-with-pdi-archive/pdi-run-modifiers/parameters/vfs-properties).
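
As a sketch, such parameters follow the pattern `vfs.<scheme>.<option>`. For SFTP, for example, you might set parameters like the following on the transformation; treat the option names and values here as illustrative and check the VFS properties reference for the options your scheme supports:

```
vfs.sftp.StrictHostKeyChecking=no
vfs.sftp.identity=/home/user/.ssh/id_rsa
```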

For an example of configuring an SFTP VFS connection, see [Configure SFTP VFS](https://github.com/pentaho/documentation/blob/main/PDIA/11.0/PDI/Data%20Integration%20Perspective/Data%20Integration%20perspective%20in%20the%20PDI%20client/Connecting%20to%20Virtual%20File%20Systems%20cp/VFS%20browser%20\(Connecting%20to%20Virtual%20File%20Systems\)/broken-reference/README.md).
