# Hadoop Copy Files

The **Hadoop Copy Files** job entry copies files in a Hadoop cluster from one location to another.

### General

* **Entry name**: Specify the unique name of the Hadoop Copy Files entry on the canvas. You can customize the name or leave it as the default.

### Options

The Hadoop Copy Files job entry includes two tabs: **Files/Folders** and **Settings**.

#### Files/Folders tab

![Files/Folders tab, Hadoop Copy Files](https://773338310-files.gitbook.io/~/files/v0/b/gitbook-x-prod.appspot.com/o/spaces%2FYwnJ6Fexn4LZwKRHghPK%2Fuploads%2Fgit-blob-ab92660f05220630494f6fca50fde2649cacb7c0%2FPDI_JobEntry_HadoopCopyFiles_Files.png?alt=media)

| Option                      | Description                                                                                                                                                                                                                                                                                                           |
| --------------------------- | --------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------- |
| **Source Environment**      | Specify the type of file system containing the files you want to copy.                                                                                                                                                                                                                                                |
| **Source File/Folder**      | Specify the file or directory you want to copy. Click **Browse** to navigate to the source file or folder through the VFS browser. See [VFS browser](https://docs.pentaho.com/pdia-data-integration/archived-merged-pages/connecting-to-virtual-file-systems-archive/vfs-browser-connecting-to-virtual-file-systems). |
| **Wildcard (RegExp)**       | Specify the files to copy by using a regular expression instead of static file names. For example, `.*\.txt` selects all files with a `.txt` extension.                                                                                                                                                               |
| **Destination Environment** | Specify the file system where you want to put your copied files.                                                                                                                                                                                                                                                      |
| **Destination File/Folder** | Specify the file or directory where you want to place your copied file. Click **Browse** and select **Hadoop** to enter your Hadoop cluster connection details.                                                                                                                                                       |

{% hint style="info" %}
The source environment and destination environment must be the same.
{% endhint %}
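
The **Wildcard (RegExp)** option takes a regular expression, not a shell-style glob. The following Python sketch illustrates how the `.*\.txt` pattern from the table selects names. It assumes that the pattern must match the whole file name and that matching is case-sensitive. The file names and the `select_files` helper are hypothetical and are not part of PDI.

```python
import re

# Pattern from the Wildcard (RegExp) example: any name ending in ".txt".
WILDCARD = re.compile(r".*\.txt")


def select_files(names, pattern=WILDCARD):
    """Return the names that the pattern matches in full (hypothetical helper)."""
    return [name for name in names if pattern.fullmatch(name)]


# Hypothetical file names, for illustration only.
candidates = ["report.txt", "report.txt.bak", "notes.TXT", "data.csv", "txt"]

print(select_files(candidates))  # prints ['report.txt']
```

Under these assumptions, `report.txt.bak` is rejected because the name does not end in `.txt`, and `notes.TXT` is rejected because of the case difference. A glob such as `*.txt` is not a valid regular expression, so use `.*\.txt` instead.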

#### Settings tab

![Settings tab, Hadoop Copy Files](https://773338310-files.gitbook.io/~/files/v0/b/gitbook-x-prod.appspot.com/o/spaces%2FYwnJ6Fexn4LZwKRHghPK%2Fuploads%2Fgit-blob-0d0d210170f6ee63e0701883b20c387ac0b857c6%2FPDI_JobEntry_HadoopCopyFiles_Settings.png?alt=media)

| Option                                 | Description                                                                                            |
| -------------------------------------- | ------------------------------------------------------------------------------------------------------ |
| **Include subfolders**                 | Select to copy all subdirectories in the chosen directory.                                             |
| **Destination is a file**              | Select if the destination is a file rather than a folder.                                              |
| **Copy empty folders**                 | Select to copy empty directories. **Include subfolders** must be selected for this option to be valid. |
| **Create destination folder**          | Select to create the destination directory if it does not exist.                                       |
| **Replace existing files**             | Select to overwrite files in the destination directory.                                                |
| **Remove source files**                | Select to remove the source files after copying them. This is equivalent to a move operation.          |
| **Copy previous results to arguments** | Select to use the results of the previous job entry as your sources and destinations.                  |
| **Add files to result files name**     | Select to create a list of the files copied by this entry.                                             |

If you are not using Kerberos security, this job entry sends the username of the signed-in user when copying files, regardless of the username entered in the connection field.

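To use a different username, set `HADOOP_USER_NAME` to the username you want, either as an environment variable or as a Java system property.

Example, passing it as a system property through the `OPT` variable in the PDI startup script: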

```
OPT="$OPT .... -DHADOOP_USER_NAME=HadoopNameToSpoof"
```
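
As an alternative to editing the startup script, the environment variable can be set only for the process that runs the job. The following Python sketch builds such an environment. The user name `etl_user`, the job path, and the `build_launch_env` helper are hypothetical, and the sketch assumes that the job is started with `kitchen.sh` from the PDI installation directory and that the Hadoop client reads `HADOOP_USER_NAME` from the environment.

```python
import os


def build_launch_env(hadoop_user, base_env=None):
    """Return a copy of the environment with HADOOP_USER_NAME set."""
    env = dict(os.environ if base_env is None else base_env)
    env["HADOOP_USER_NAME"] = hadoop_user
    return env


# Hypothetical user name and job file, for illustration only.
env = build_launch_env("etl_user")
command = ["./kitchen.sh", "-file=/path/to/copy_job.kjb"]

# To launch the job with this environment, uncomment the next two lines:
# import subprocess
# subprocess.run(command, env=env, check=True)

print(env["HADOOP_USER_NAME"])  # prints etl_user
```

Copying the environment instead of modifying `os.environ` keeps the override local to the launched process, so other jobs started from the same session are unaffected.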
