Use Command Line Tools to Run Transformations and Jobs
You can use command line tools to execute Pentaho Data Integration (PDI) content outside the PDI client. Use them in scripts and schedulers, like cron.
Use Pan to run transformations. Use Kitchen to run jobs.
Startup script options
Pan and Kitchen recognize the startup-script options used by the PDI client. These options are in Spoon.bat (Windows) and Spoon.sh (Linux).
To use these options with Pan or Kitchen, add them to your startup script.
Note: The default directory for the startup script is design-tools/data-integration.
FILTER_GTK_WARNINGS
Suppresses GTK warnings from spoon.sh and kitchen.sh. Set to true to suppress warnings. Leave empty to show warnings.
SKIP_WEBKITGTK_CHECK
Suppresses warnings about missing libwebkitgtk when launching the PDI client. Set to true to suppress warnings. Leave empty to show warnings.
KETTLE_HOME
Identifies the user's home directory for PDI configuration files. Use it to change the location of files normally in <user home>/.kettle.
KETTLE_LOG_SIZE_LIMIT
Limits the log size for transformations and jobs that do not set a log size limit property.
KETTLE_JNDI_ROOT
Changes the Simple JNDI path, which contains jdbc.properties.
KETTLE_DIR
Directory where the PDI client is installed.
KETTLE_REPOSITORY
Repository that Kettle connects to at startup.
LIBPATH
Value passed as the -Djava.library.path Java parameter.
PENTAHO_DI_JAVA_OPTIONS
Additional Java arguments when running Kettle. Use it for settings like memory limits.
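These options are ordinary environment variables, so one way to apply them is to export them before launching the tools. The values below are illustrative, not defaults:

```shell
# Illustrative values; adjust paths for your installation.
export KETTLE_HOME=/opt/pentaho/config                 # PDI reads .kettle from here instead of the user home
export PENTAHO_DI_JAVA_OPTIONS="-Xms1024m -Xmx2048m"   # extra JVM arguments, such as memory limits
export FILTER_GTK_WARNINGS=true                        # hide GTK warnings from the launch scripts

# spoon.sh, pan.sh, and kitchen.sh in design-tools/data-integration
# pick these variables up when launched from the same environment.
```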
Pan (run transformations)
Pan runs transformations from a PDI repository (database or enterprise) or a local file. The options are the same for the shell script and batch file.
Note: On Windows, options use the forward slash (/) prefix and colon (:) separator instead of the dash (-) and equals (=) used on Linux. If an option value contains spaces, quote the full argument. Example: "-param:MASTER_HOST=192.168.1.3" "-param:MASTER_PORT=8181".
Example:
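The repository, credential, and transformation names below are placeholders; the commands assume they are run from the design-tools/data-integration directory of a PDI installation:

```shell
# Run a repository transformation (Linux syntax; on Windows use pan.bat with /option:value).
./pan.sh -rep=my_repo -user=admin -pass=password \
         -dir=/public -trans=update_dimensions \
         -level=Basic -logfile=/tmp/pan_update_dimensions.log

# Or run a local .ktr file directly, passing a named parameter:
./pan.sh -file=/home/user/transforms/update_dimensions.ktr "-param:MASTER_HOST=192.168.1.3"
```

The options used are described below.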
rep
Enterprise repository name.
user
Repository username.
pass
Repository password.
trans
Name of the transformation to run.
dir
Repository directory that contains the transformation, including the leading slash.
file
Local .ktr file path.
level
Logging level: Basic, Detailed, Debug, Rowlevel, Error, Nothing.
logfile
Log file path.
listdir
Lists directories in the specified repository.
listtrans
Lists transformations in the specified repository directory.
listrep
Lists available repositories.
exprep
Exports all repository objects to one XML file.
norep
Prevents Pan from logging into a repository. Useful when environment variables like KETTLE_REPOSITORY are set, but you want to run a local .ktr.
safemode
Runs in safe mode with extra checking.
version
Shows version, revision, and build date.
param
Sets a named parameter in name=value format. Example: -param:Foo=bar.
listparam
Lists information about named parameters in the specified transformation.
metrics
Gathers metrics during execution.
maxloglines
Maximum number of log lines kept internally. 0 keeps all lines (default).
maxlogtimeout
Maximum age (minutes) of a log line kept internally. 0 keeps lines indefinitely (default).
Pan status codes
Pan returns one of these status codes:
0
Transformation ran without a problem.
1
Errors occurred during processing.
2
Unexpected error during loading or running the transformation.
3
Unable to prepare and initialize the transformation.
7
Transformation could not be loaded from XML or the repository.
8
Error loading steps or plugins.
9
Command line usage was printed.
Kitchen (run jobs)
Kitchen runs jobs from a PDI repository (database or enterprise) or a local file. The options are the same for the shell script and batch file.
Note: On Windows, options use the forward slash (/) prefix and colon (:) separator instead of the dash (-) and equals (=) used on Linux. If an option value contains spaces, quote the full argument. Example: "-param:MASTER_HOST=192.168.1.3" "-param:MASTER_PORT=8181".
rep
Enterprise or database repository name.
user
Repository username.
pass
Repository password.
job
Name of the job (as it appears in the repository) to run.
dir
Repository directory that contains the job, including the leading slash.
file
Local .kjb file path.
level
Logging level: Basic, Detailed, Debug, Rowlevel, Error, Nothing.
logfile
Log file path.
listdir
Lists subdirectories within the specified repository directory.
listjob
Lists jobs in the specified repository directory.
listrep
Lists available repositories.
export
Exports all linked resources of the specified job. Argument is a ZIP filename.
norep
Prevents Kitchen from logging into a repository. Useful when environment variables like KETTLE_REPOSITORY are set, but you want to run a local .kjb.
version
Shows version, revision, and build date.
param
Sets a named parameter in name=value format. Example: -param:FOO=bar.
listparam
Lists information about named parameters in the specified job.
maxloglines
Maximum number of log lines kept internally. 0 keeps all lines (default).
maxlogtimeout
Maximum age (minutes) of a log line kept internally. 0 keeps lines indefinitely (default).
Example:
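As with Pan, the names below are placeholders; run the command from design-tools/data-integration:

```shell
# Run a repository job (Linux syntax; on Windows use kitchen.bat with /option:value).
./kitchen.sh -rep=my_repo -user=admin -pass=password \
             -dir=/public -job=nightly_load \
             -level=Basic -logfile=/tmp/kitchen_nightly_load.log
```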
Kitchen status codes
Kitchen returns one of these status codes:
0
Job ran without a problem.
1
Errors occurred during processing.
2
Unexpected error during loading or running the job.
7
Job could not be loaded from XML or the repository.
8
Error loading steps or plugins.
9
Command line usage was printed.
Import .kjb or .ktr files from a ZIP archive
Pan and Kitchen can read PDI content from ZIP files. Use the ! switch to select a .kjb or .ktr file inside the archive.
Windows example:
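A sketch with placeholder archive and job names; the zip: URL prefix opens the archive and the ! separator names the file inside it:

```
Kitchen.bat /file:"zip:file:///C:/pdi_examples/jobs.zip!nightly_load.kjb" /level:Basic
```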
Linux and Solaris example (escape !):
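The same invocation with placeholder names; single quotes keep the shell from interpreting the !, and in an interactive shell you can instead escape it as \!:

```shell
./kitchen.sh -file='zip:file:///home/user/pdi_examples/jobs.zip!nightly_load.kjb' -level=Basic
```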
Export repository content from the command line
To export repository objects into XML format using command-line tools, pass named parameters when calling Kitchen or Pan.
Example (Kitchen):
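A sketch, assuming an export job (export_repo.kjb is a hypothetical name) that reads the named parameters listed below:

```shell
# Pass the repository details and target file to the export job as named parameters.
./kitchen.sh -file=export_repo.kjb \
  "-param:rep_name=my_repo" \
  "-param:rep_user=admin" \
  "-param:rep_password=password" \
  "-param:rep_folder=/public" \
  "-param:target_filename=/tmp/repo_export.xml"
```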
rep_folder
Repository folder
rep_name
Repository name
rep_password
Repository password
rep_user
Repository username
target_filename
Target filename
Note: You can use obfuscated passwords with Encr, the command line tool for encrypting strings for storage and use by PDI.
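Encr lives in the same directory as Pan and Kitchen. A sketch (the output shown is the general form, not a real value):

```shell
./encr.sh -kettle my_password
# Prints a line of the form: Encrypted 2be98afc86aa7f2e4bb18bd63c99dbdde
# Use the full string, including the "Encrypted " prefix, in place of the plain password.
```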
Example batch file that checks for errors:
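A sketch of such a batch file, branching on Kitchen's exit status (repository and job names are placeholders):

```batch
@echo off
call kitchen.bat /rep:my_repo /user:admin /pass:password /dir:/public /job:nightly_load
set STATUS=%ERRORLEVEL%
if %STATUS% equ 0 (
    echo Job finished without errors.
) else (
    echo Kitchen returned status %STATUS%. Check the log.
    exit /b %STATUS%
)
```

The same check works in a shell script by testing $? against the status codes listed above.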
Use Pan and Kitchen with a Hadoop cluster
To use Pan or Kitchen on a Hadoop cluster, configure Pentaho to run transformations and jobs with the PDI client or the Pentaho Server. You do not need these configurations if the PDI client connects to the Pentaho Repository.
To use Pan and Kitchen from a repository directly on the Pentaho Server, create the named cluster definition in the server repository. See Connecting to a Hadoop cluster with the PDI client.
Note: If the PDI client and Pentaho Server run on the same platform, cluster configuration files in /home/<user>/.pentaho/metastore can be overwritten. Use the same cluster connection names on both hosts.
Configure the PDI client
1. Create a connection to the Hadoop cluster where you want to run the job or transformation.
2. Create and test the job or transformation in the PDI client.
3. Go to design-tools/data-integration/plugins/pentaho-big-data-plugin.
4. Open plugin.properties in a text editor.
5. Set hadoop.configurations.path to the directory that contains metastore. Example: hadoop.configurations.path=/home/<user>/.pentaho. The default metastore location is /home/<user>/.pentaho/metastore.
6. Save and close plugin.properties.
Configure the Pentaho Server
1. If the server is on a different host, copy the metastore directory and its contents from the PDI client to a location the server can access. The default metastore location for the PDI client is /home/<user>/.pentaho/metastore.
2. Go to pentaho-server/pentaho-solutions/system/kettle/plugins/pentaho-big-data-plugin.
3. Open plugin.properties in a text editor.
4. Set hadoop.configurations.path to the directory that contains metastore.
5. Save and close plugin.properties.