Use Command Line Tools to Run Transformations and Jobs
You can use command line tools to execute Pentaho Data Integration (PDI) content outside the PDI client. Use them in scripts and schedulers, like cron.
Use Pan to run transformations. Use Kitchen to run jobs.
Startup script options
Pan and Kitchen recognize the startup-script options used by the PDI client. These options are in Spoon.bat (Windows) and Spoon.sh (Linux).
To use these options with Pan or Kitchen, add them to your startup script.
Note: The default directory for the startup script is design-tools/data-integration.
FILTER_GTK_WARNINGS
Suppresses GTK warnings from spoon.sh and kitchen.sh. Set to true to suppress warnings. Leave empty to show warnings.
SKIP_WEBKITGTK_CHECK
Suppresses warnings about missing libwebkitgtk when launching the PDI client. Set to true to suppress warnings. Leave empty to show warnings.
KETTLE_HOME
Identifies the user's home directory for PDI configuration files. Use it to change the location of files normally in <user home>/.kettle.
KETTLE_LOG_SIZE_LIMIT
Limits the log size for transformations and jobs that do not set a log size limit property.
KETTLE_JNDI_ROOT
Changes the Simple JNDI path, which contains jdbc.properties.
KETTLE_DIR
Directory where the PDI client is installed.
KETTLE_REPOSITORY
Repository that Kettle connects to at startup.
LIBPATH
Value passed as the -Djava.library.path Java parameter.
PENTAHO_DI_JAVA_OPTIONS
Additional Java arguments when running Kettle. Use it for settings like memory limits.
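These options are ordinary environment variables, so one way to apply them is to export them before launching the tools. The values below are illustrative, not defaults:

```shell
# Illustrative values; adjust paths for your installation.
export KETTLE_HOME=/opt/pentaho/config                 # PDI reads .kettle from here instead of the user home
export PENTAHO_DI_JAVA_OPTIONS="-Xms1024m -Xmx2048m"   # extra JVM arguments, such as memory limits
export FILTER_GTK_WARNINGS=true                        # hide GTK warnings from the launch scripts

# spoon.sh, pan.sh, and kitchen.sh in design-tools/data-integration
# pick these variables up when launched from the same environment.
```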
Pan (run transformations)
Pan runs transformations from a PDI repository (database or enterprise) or a local file. The options are the same for the shell script and batch file.
Note: On Windows, options use the forward slash (/) prefix and colon (:) separator instead of the dash (-) and equals (=) used on Linux. If an option value contains spaces, quote the full argument. Example: "-param:MASTER_HOST=192.168.1.3" "-param:MASTER_PORT=8181".
Example:
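The repository, credential, and transformation names below are placeholders; the commands assume they are run from the design-tools/data-integration directory of a PDI installation:

```shell
# Run a repository transformation (Linux syntax; on Windows use pan.bat with /option:value).
./pan.sh -rep=my_repo -user=admin -pass=password \
         -dir=/public -trans=update_dimensions \
         -level=Basic -logfile=/tmp/pan_update_dimensions.log

# Or run a local .ktr file directly, passing a named parameter:
./pan.sh -file=/home/user/transforms/update_dimensions.ktr "-param:MASTER_HOST=192.168.1.3"
```

The options used are described below.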
rep
Enterprise repository name.
user
Repository username.
pass
Repository password.
trans
Name of the transformation to run.
dir
Repository directory that contains the transformation, including the leading slash.
file
Local .ktr file path.
level
Logging level: Basic, Detailed, Debug, Rowlevel, Error, Nothing.
logfile
Log file path.
listdir
Lists directories in the specified repository.
listtrans
Lists transformations in the specified repository directory.
listrep
Lists available repositories.
exprep
Exports all repository objects to one XML file.
norep
Prevents Pan from logging into a repository. Useful when environment variables like KETTLE_REPOSITORY are set, but you want to run a local .ktr.
safemode
Runs in safe mode with extra checking.
version
Shows version, revision, and build date.
param
Sets a named parameter in name=value format. Example: -param:Foo=bar.
listparam
Lists information about named parameters in the specified transformation.
metrics
Gathers metrics during execution.
maxloglines
Maximum number of log lines kept internally. 0 keeps all lines (default).
maxlogtimeout
Maximum age (minutes) of a log line kept internally. 0 keeps lines indefinitely (default).
Pan status codes
Pan returns one of these status codes:
0
Transformation ran without a problem.
1
Errors occurred during processing.
2
Unexpected error during loading or running the transformation.
3
Unable to prepare and initialize the transformation.
7
Transformation could not be loaded from XML or the repository.
8
Error loading steps or plugins.
9
Command line usage was printed.
Kitchen (run jobs)
Kitchen runs jobs from a PDI repository (database or enterprise) or a local file. The options are the same for the shell script and batch file.
Note: On Windows, options use the forward slash (/) prefix and colon (:) separator instead of the dash (-) and equals (=) used on Linux. If an option value contains spaces, quote the full argument. Example: "-param:MASTER_HOST=192.168.1.3" "-param:MASTER_PORT=8181".
rep
Enterprise or database repository name.
user
Repository username.
pass
Repository password.
job
Name of the job (as it appears in the repository) to run.
dir
Repository directory that contains the job, including the leading slash.
file
Local .kjb file path.
level
Logging level: Basic, Detailed, Debug, Rowlevel, Error, Nothing.
logfile
Log file path.
listdir
Lists subdirectories within the specified repository directory.
listjob
Lists jobs in the specified repository directory.
listrep
Lists available repositories.
export
Exports all linked resources of the specified job. Argument is a ZIP filename.
norep
Prevents Kitchen from logging into a repository. Useful when environment variables like KETTLE_REPOSITORY are set, but you want to run a local .kjb.
version
Shows version, revision, and build date.
param
Sets a named parameter in name=value format. Example: -param:FOO=bar.
listparam
Lists information about named parameters in the specified job.
maxloglines
Maximum number of log lines kept internally. 0 keeps all lines (default).
maxlogtimeout
Maximum age (minutes) of a log line kept internally. 0 keeps lines indefinitely (default).
Example:
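As with Pan, the names below are placeholders; run the command from design-tools/data-integration:

```shell
# Run a repository job (Linux syntax; on Windows use kitchen.bat with /option:value).
./kitchen.sh -rep=my_repo -user=admin -pass=password \
             -dir=/public -job=nightly_load \
             -level=Basic -logfile=/tmp/kitchen_nightly_load.log
```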
Kitchen status codes
Kitchen returns one of these status codes:
0
Job ran without a problem.
1
Errors occurred during processing.
2
Unexpected error during loading or running the job.
7
Job could not be loaded from XML or the repository.
8
Error loading steps or plugins.
9
Command line usage was printed.
Import .kjb or .ktr files from a ZIP archive
Pan and Kitchen can read PDI content from ZIP files. Use the ! switch to select a .kjb or .ktr file inside the archive.
Windows example:
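A sketch with placeholder archive and job names; the zip: URL prefix opens the archive and the ! separator names the file inside it:

```
Kitchen.bat /file:"zip:file:///C:/pdi_examples/jobs.zip!nightly_load.kjb" /level:Basic
```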
Linux and Solaris example (escape !):
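The same invocation with placeholder names; single quotes keep the shell from interpreting the !, and in an interactive shell you can instead escape it as \!:

```shell
./kitchen.sh -file='zip:file:///home/user/pdi_examples/jobs.zip!nightly_load.kjb' -level=Basic
```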
Export repository content from the command line
To export repository objects into XML format using command-line tools, pass named parameters when calling Kitchen or Pan.
Example (Kitchen):
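A sketch, assuming an export job (export_repo.kjb is a hypothetical name) that reads the named parameters listed below:

```shell
# Pass the repository details and target file to the export job as named parameters.
./kitchen.sh -file=export_repo.kjb \
  "-param:rep_name=my_repo" \
  "-param:rep_user=admin" \
  "-param:rep_password=password" \
  "-param:rep_folder=/public" \
  "-param:target_filename=/tmp/repo_export.xml"
```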
rep_folder
Repository folder
rep_name
Repository name
rep_password
Repository password
rep_user
Repository username
target_filename
Target filename
Note: You can use obfuscated passwords with Encr, the command line tool for encrypting strings for storage and use by PDI.
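Encr lives in the same directory as Pan and Kitchen. A sketch (the output shown is the general form, not a real value):

```shell
./encr.sh -kettle my_password
# Prints a line of the form: Encrypted 2be98afc86aa7f2e4bb18bd63c99dbdde
# Use the full string, including the "Encrypted " prefix, in place of the plain password.
```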
Example batch file that checks for errors:
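A sketch of such a batch file, branching on Kitchen's exit status (repository and job names are placeholders):

```batch
@echo off
call kitchen.bat /rep:my_repo /user:admin /pass:password /dir:/public /job:nightly_load
set STATUS=%ERRORLEVEL%
if %STATUS% equ 0 (
    echo Job finished without errors.
) else (
    echo Kitchen returned status %STATUS%. Check the log.
    exit /b %STATUS%
)
```

The same check works in a shell script by testing $? against the status codes listed above.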
Use Pan and Kitchen with a Hadoop cluster
To use Pan or Kitchen on a Hadoop cluster, configure Pentaho to run transformations and jobs with the PDI client or the Pentaho Server. You do not need these configurations if the PDI client connects to the Pentaho Repository.
To use Pan and Kitchen from a repository directly on the Pentaho Server, create the named cluster definition in the server repository. See Connecting to a Hadoop cluster with the PDI client.
Note: If the PDI client and Pentaho Server run on the same platform, cluster configuration files in /home/<user>/.pentaho/metastore can be overwritten. Use the same cluster connection names on both hosts.
Configure the PDI client
1. Create a connection to the Hadoop cluster where you want to run the job or transformation.
2. Create and test the job or transformation in the PDI client.
3. Go to design-tools/data-integration/plugins/pentaho-big-data-plugin.
4. Open plugin.properties in a text editor.
5. Set hadoop.configurations.path to the directory that contains metastore. Example: hadoop.configurations.path=/home/<user>/.pentaho. The default metastore location is /home/<user>/.pentaho/metastore.
6. Save and close plugin.properties.
Configure the Pentaho Server
1. If the server is on a different host, copy the metastore directory and its contents from the PDI client to a location the server can access. The default metastore location for the PDI client is /home/<user>/.pentaho/metastore.
2. Go to pentaho-server/pentaho-solutions/system/kettle/plugins/pentaho-big-data-plugin.
3. Open plugin.properties in a text editor.
4. Set hadoop.configurations.path to the directory that contains metastore.
5. Save and close plugin.properties.