Hadoop Copy Files
The Hadoop Copy Files job entry copies files in a Hadoop cluster from one location to another.
General
Entry name: Specify the unique name of the Hadoop Copy Files entry on the canvas. You can customize the name or leave it as the default.
Options
The Hadoop Copy Files job entry includes two tabs: Files/Folders and Settings.
Files/Folders tab

Source Environment: Specify the type of file system containing the files you want to copy.
Source File/Folder: Specify the file or directory you want to copy. Click Browse to navigate to the source file or folder through the VFS browser. See VFS browser.
Wildcard (RegExp): Specify the files to copy by using a regular expression instead of static file names. For example, .*\.txt selects all files with a .txt extension.
Destination Environment: Specify the file system where you want to put the copied files.
Destination File/Folder: Specify the file or directory where you want to place the copied files. Click Browse and select Hadoop to enter your Hadoop cluster connection details. An example path appears after the note below.
The source environment and destination environment must be the same.
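For example, depending on how your cluster connection is configured, a fully qualified destination might look like hdfs://namenode.example.com:8020/user/pdi/archive, where the host, port, and path are placeholders for your own cluster's values.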
Settings tab

Include subfolders: Select to copy all subdirectories within the chosen directory.
Destination is a file: Select if the destination is a file rather than a directory.
Copy empty folders: Select to copy empty directories. Include subfolders must be selected for this option to take effect.
Create destination folder: Select to create the destination directory if it does not exist.
Replace existing files: Select to overwrite files that already exist in the destination directory.
Remove source files: Select to remove the source files after they are copied. This is equivalent to a move operation.
Copy previous results to arguments: Select to use the results of the previous job entry as your sources and destinations.
Add files to result files name: Select to add the files copied by this entry to the list of result files.
If you are not using Kerberos security, this job entry sends the username of the signed-in user when copying files, regardless of the username entered in the connection field.
To use a different username, set the HADOOP_USER_NAME environment variable to the username you want.
Example:
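On Linux or macOS, you might run export HADOOP_USER_NAME=pdi_user in the shell before starting PDI; on Windows, the equivalent is set HADOOP_USER_NAME=pdi_user. The value pdi_user is only a placeholder for the Hadoop username you want the copy to run as.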