Hadoop Copy Files
The Hadoop Copy Files job entry copies files in a Hadoop cluster from one location to another.
General
Entry name: Specify the unique name of the Hadoop Copy Files entry on the canvas. You can customize the name or leave it as the default.
Options
The Hadoop Copy Files job entry includes two tabs: Files/Folders and Settings.
Files/Folders tab

Source Environment: Specify the type of file system containing the files you want to copy.
Source File/Folder: Specify the file or directory you want to copy. Click Browse to navigate to the source file or folder through the VFS browser. See VFS browser.
Wildcard (RegExp): Specify the files to copy by using a regular expression instead of static file names. For example, .*\.txt selects all files with a .txt extension.
Destination Environment: Specify the file system where you want to put the copied files.
Destination File/Folder: Specify the file or directory where you want to place the copied files. Click Browse and select Hadoop to enter your Hadoop cluster connection details. An example path appears after the note below.
The source environment and destination environment must be the same.
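For example, depending on how your cluster connection is configured, a fully qualified destination might look like hdfs://namenode.example.com:8020/user/pdi/archive, where the host, port, and path are placeholders for your own cluster's values.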
Settings tab

Include subfolders: Select to copy all subdirectories within the chosen directory.
Destination is a file: Select if the destination is a file rather than a directory.
Copy empty folders: Select to copy empty directories. Include subfolders must be selected for this option to take effect.
Create destination folder: Select to create the destination directory if it does not exist.
Replace existing files: Select to overwrite files that already exist in the destination directory.
Remove source files: Select to remove the source files after they are copied. This is equivalent to a move operation.
Copy previous results to arguments: Select to use the results of the previous job entry as your sources and destinations.
Add files to result files name: Select to add the files copied by this entry to the list of result files.
If you are not using Kerberos security, this job entry sends the username of the signed-in user when copying files, regardless of the username entered in the connection field.
To use a different username, set the HADOOP_USER_NAME environment variable to the username you want.
Example:
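On Linux or macOS, you might run export HADOOP_USER_NAME=pdi_user in the shell before starting PDI; on Windows, the equivalent is set HADOOP_USER_NAME=pdi_user. The value pdi_user is only a placeholder for the Hadoop username you want the copy to run as.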