Working with jobs
Create, configure, and run jobs to orchestrate ETL activities. After running a job, analyze its results to identify problems and opportunities for improvement.
Create a job
Create a job to coordinate resources, execution, and dependencies of an ETL activity.
To create a job in Pipeline Designer, follow these steps:
Log in to the Pentaho User Console.
Open Pipeline Designer:
If you are using the Modern Design, in the menu on the left side of the page, select Pipeline Designer.
If you are using the Classic Design, select Switch to the Modern Design, then select Pipeline Designer.
Pipeline Designer opens with the Quick Access section expanded.
In the Job card, select Create Job.
A new, blank job opens with the Design pane selected.
Add steps to the job:
In the Design pane, search for or browse to each step you want to use in the job.
Drag the steps you want to use onto the canvas.
Work with steps on the canvas.
Hover over a step to open the step menu, then select an option:
Delete
Deletes the step from the canvas.
Edit
Opens the Step Name window where you can configure the properties of the step. Step properties may appear in multiple sections, tabs, or both.
Note: To learn more about the step you're configuring, in the lower-left corner of the Step Name window, click Help.
Duplicate
Adds a copy of the step to the canvas.
Add hops between steps.
Hover over a step handle until a plus sign (+) appears, then drag the connection to another step handle.
Optional: Add a note on the canvas.
In the canvas toolbar, select the Add Note icon. In the Notes dialog box, enter your note, then select Save.
Note: To format the note, select Style and set font, color, and shadow options.
Save the job:
Select Save.
The Select File or Directory dialog box opens.
Search for or browse to the folder where you want to save the job.
Optional: Create a folder.
Select the New Folder icon. In the New folder dialog box, enter a folder name, then select Save.
Optional: Delete a folder.
Select the folder, then select the Delete icon.
In the Select File or Directory dialog box, select Save.
The Save Change dialog box opens.
Select Yes to confirm.
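A job saved from Pipeline Designer can also be worked with outside the console, for example through the classic Kettle Java API (org.pentaho.di) if the job is available as a .kjb file. The following is a minimal sketch under that assumption, with a hypothetical file path; it loads a saved job definition and lists the steps and hops that were placed on the canvas.

import org.pentaho.di.core.KettleEnvironment;
import org.pentaho.di.job.JobMeta;

public class InspectJob {
    public static void main(String[] args) throws Exception {
        // Initialize the Kettle engine before loading any job definitions.
        KettleEnvironment.init();

        // Load the saved job definition (hypothetical path to a .kjb file).
        JobMeta jobMeta = new JobMeta("/path/to/my_job.kjb", null);

        // List the steps (job entries) and hops defined on the canvas.
        System.out.println("Job: " + jobMeta.getName());
        for (int i = 0; i < jobMeta.nrJobEntries(); i++) {
            System.out.println("  step: " + jobMeta.getJobEntry(i).getName());
        }
        System.out.println("  hops: " + jobMeta.nrJobHops());
    }
}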
Edit job properties
Job properties control how a job behaves and how it logs what it is doing.
To configure job properties, follow these steps:
Log in to the Pentaho User Console.
Open Pipeline Designer:
If you are using the Modern Design, in the menu on the left side of the page, select Pipeline Designer.
If you are using the Classic Design, select Switch to the Modern Design, then select Pipeline Designer.
Pipeline Designer opens with the Quick Access section expanded.
In the table at the bottom of the screen, select the Recently opened tab or the Favorites tab.
Open the job using one of the following methods:
Search for or browse to the job, then select Open.
Select Open files, then in the Select File or Directory dialog box, select the job and select Open.
In the Canvas Action toolbar, select the Settings icon.
The Job Properties window opens.
Configure the properties in each tab.
For details, see the tab sections in this topic.
Optional: Generate SQL for the logging table.
Select SQL.
The Simple SQL editor opens with DDL generated from the job properties.
Optional: Edit the SQL statements.
See Use the SQL Editor.
Optional: Clear cached results.
Select Clear cache.
Select Execute.
Select Save.
Job tab
General properties for jobs are found on the Job tab.
Job Name
The name of the job.
Note: This information is required if you want to save to a repository.
Job filename
The file name of the job if it is not stored in the repository.
Description
A user-defined short description of the job which is shown in the repository explorer.
Extended description
A user-defined longer description of the job.
Status
The status of the job. The values are draft and production.
Version
A description of the version.
Directory
The directory in the repository where the job is kept.
Created by
The original creator of the job.
Created at
The date and time when the job was created.
Last modified by
The name of the last user who modified the job.
Last modified at
The date and time when the job was last modified.
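If you need to read these properties outside the designer, the same metadata is exposed by the classic Kettle Java API. A minimal sketch, assuming that API and a job saved as a .kjb file at a hypothetical path:

import org.pentaho.di.core.KettleEnvironment;
import org.pentaho.di.job.JobMeta;

public class ShowJobMetadata {
    public static void main(String[] args) throws Exception {
        KettleEnvironment.init();
        JobMeta jobMeta = new JobMeta("/path/to/my_job.kjb", null);

        // These accessors correspond to fields on the Job tab.
        System.out.println("Name:          " + jobMeta.getName());
        System.out.println("Description:   " + jobMeta.getDescription());
        System.out.println("Created by:    " + jobMeta.getCreatedUser());
        System.out.println("Created at:    " + jobMeta.getCreatedDate());
        System.out.println("Last modified: " + jobMeta.getModifiedUser()
                + " at " + jobMeta.getModifiedDate());
    }
}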
Parameters tab
Use the Parameters tab to define parameters for your jobs.
Parameter
A user-defined parameter.
Default value
The default value of the user-defined parameter.
Description
A description of the parameter.
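Parameters defined on this tab can also be declared programmatically. The sketch below assumes the classic Kettle Java API, a hypothetical job file, and a hypothetical parameter named INPUT_DIR; inside the job the parameter is referenced as ${INPUT_DIR}.

import org.pentaho.di.core.KettleEnvironment;
import org.pentaho.di.job.JobMeta;

public class DefineParameter {
    public static void main(String[] args) throws Exception {
        KettleEnvironment.init();
        JobMeta jobMeta = new JobMeta("/path/to/my_job.kjb", null);

        // Parameter name, default value, and description match the three
        // columns on the Parameters tab (INPUT_DIR is hypothetical).
        jobMeta.addParameterDefinition("INPUT_DIR", "/tmp/input",
                "Folder the job reads files from");

        System.out.println(String.join(", ", jobMeta.listParameters()));
    }
}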
Settings tab
Pass batch ID?
Select to pass the identification number of the batch to the transformation.
Shared objects file
PDI uses a single shared objects file for each user. The default file, shared.xml, is located in the .kettle directory in the user's home directory. You can specify a different shared objects file name and location.
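If a job should read its shared objects (such as database connections) from a different file, the location can also be set through the classic Kettle Java API. A minimal sketch under that assumption, with hypothetical paths:

import org.pentaho.di.core.KettleEnvironment;
import org.pentaho.di.job.JobMeta;

public class SharedObjectsExample {
    public static void main(String[] args) throws Exception {
        KettleEnvironment.init();
        JobMeta jobMeta = new JobMeta("/path/to/my_job.kjb", null);

        // By default shared objects are read from $HOME/.kettle/shared.xml;
        // point the job at a different file instead (hypothetical path).
        jobMeta.setSharedObjectsFile("/opt/etl/config/shared.xml");
        System.out.println("Shared objects file: " + jobMeta.getSharedObjectsFile());
    }
}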
Log tab
Use the Log tab to specify logging settings.
Log connection
Specify the database connection you are using for logging. You can configure a new connection by selecting New.
Log Schema
Specify the schema name, if supported by your database.
Log table
Specify the name of the log table. If you also use transformation logging, use a different table name for job logging.
Logging interval (seconds)
Specify the interval in which logs are written to the table. This property only applies to Transformation and Performance logging types.
Log line timeout (days)
Specify the number of days to keep log entries in the table before they are deleted. This property only applies to Transformation and Performance logging types.
Log size limit in lines
Enter the limit for the number of lines that are stored in the LOG_FIELD. PDI stores logging for the job in a long text field (CLOB). This property only applies to Transformation and Performance logging types.
SQL button
Generates the SQL needed to create the logging table and lets you run the SQL statement.
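The same logging settings can be applied through the classic Kettle Java API. The sketch below assumes that API, a hypothetical job file, and a database connection named etl_logging_db that is already defined for the job.

import org.pentaho.di.core.KettleEnvironment;
import org.pentaho.di.core.logging.JobLogTable;
import org.pentaho.di.job.JobMeta;

public class ConfigureJobLogging {
    public static void main(String[] args) throws Exception {
        KettleEnvironment.init();
        JobMeta jobMeta = new JobMeta("/path/to/my_job.kjb", null);

        // Mirrors the Log connection, Log Schema, and Log table fields on the
        // Log tab (connection, schema, and table names are hypothetical).
        JobLogTable logTable = jobMeta.getJobLogTable();
        logTable.setConnectionName("etl_logging_db");
        logTable.setSchemaName("audit");
        logTable.setTableName("JOB_LOG");
    }
}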
Run a job
After you create a job and configure its properties, you can run the job. You can also control how the job runs without modifying the job itself by configuring run options.
To run a job, follow these steps:
Log in to the Pentaho User Console.
Open Pipeline Designer:
If you are using the Modern Design, in the menu on the left side of the page, select Pipeline Designer.
If you are using the Classic Design, select Switch to the Modern Design, then select Pipeline Designer.
Pipeline Designer opens with the Quick Access section expanded.
In the table at the bottom of the screen, select the Recently opened tab or the Favorites tab.
Open the job using one of the following methods:
Search for or browse to the job, then select Open.
Select Open files, then in the Select File or Directory dialog box, select the job and select Open.
In the Canvas Action toolbar, select the Run icon, then select an option:
To run the job, select Run.
To run the job with options, select Run Options. Configure options, then select Run.
For details, see Job run options.
The job runs and the Preview panel opens with the Logging tab selected.
Note: To stop a job while it is running, see Stop transformations and jobs.
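Jobs do not have to be run from the console. If a job is available as a .kjb file, it can also be executed with the classic Kettle Java API; the following is a minimal sketch under that assumption, with a hypothetical file path.

import org.pentaho.di.core.KettleEnvironment;
import org.pentaho.di.job.Job;
import org.pentaho.di.job.JobMeta;

public class RunJob {
    public static void main(String[] args) throws Exception {
        KettleEnvironment.init();

        // Load the job definition and execute it in the local engine.
        JobMeta jobMeta = new JobMeta("/path/to/my_job.kjb", null);
        Job job = new Job(null, jobMeta);
        job.start();              // The job runs in its own thread.
        job.waitUntilFinished();  // Block until every step has completed.

        System.out.println("Errors: " + job.getResult().getNrErrors());
    }
}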
Job run options
Select configuration
All jobs are run using the Pentaho server configuration.
Clear log before running
Indicates whether to clear all your logs before you run your job. If your log is large, you might need to clear it before the next execution to conserve space.
Enable safe mode
Checks every row passed through your job and ensures all layouts are identical. If a row does not have the same layout as the first row, an error is generated and reported.
Gather performance metrics
Monitors the performance of your job execution. You can view performance metrics in the Pentaho Data Integration client. For details see Use performance graphs.
Log level
Specifies how much logging is performed and the amount of information captured:
Nothing: No logging occurs.
Error: Only errors are logged.
Minimal: Only minimal logging is performed.
Basic: This is the default level.
Detailed: Detailed logging output.
Debug: Very detailed output for debugging purposes.
Row Level (very detailed): Logging at a row level, which generates a large amount of log data.
The Debug and Row Level logging levels can include information you may consider too sensitive to be shown. Consider the sensitivity of your data when selecting these levels. See the Administer Pentaho Data Integration and Analytics guide for instructions on how best to use these logging levels. For an example of setting the log level and parameter values when running a job programmatically, see the sketch after this list.
Expand Remote Job
Bundles all required files for a job, including its sub-components, so they can be sent to a remote server for execution. The remote server runs the complete job without needing to retrieve additional files from the original environment.
Start job at
Specifies the step where the job begins execution. By default, execution begins at the Start step.
Parameters
Applies parameter values during runtime. A parameter is a local variable. For details, see Parameters.
Variables
Applies temporary values for user-defined and environment variables during runtime. For details, see Variables.
Arguments
Applies a named, user-supplied, single-value input given as a command line argument when running the job manually or with a script. Arguments are handled according to a job's design. If the job is not designed to handle arguments, nothing happens. Typically, argument values are numbers, strings, or system or script variables. Each job can have a maximum of 10 arguments. For details, see Arguments.
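Several of these run options have equivalents in the classic Kettle Java API. The sketch below assumes that API, a hypothetical job file, and a hypothetical parameter named INPUT_DIR defined on the job; it applies a parameter value, a temporary variable, and a log level before running the job.

import org.pentaho.di.core.KettleEnvironment;
import org.pentaho.di.core.logging.LogLevel;
import org.pentaho.di.job.Job;
import org.pentaho.di.job.JobMeta;

public class RunJobWithOptions {
    public static void main(String[] args) throws Exception {
        KettleEnvironment.init();
        JobMeta jobMeta = new JobMeta("/path/to/my_job.kjb", null);

        // Parameter value applied for this run only (INPUT_DIR is hypothetical
        // and must already be defined on the job's Parameters tab).
        jobMeta.setParameterValue("INPUT_DIR", "/data/incoming");

        Job job = new Job(null, jobMeta);
        job.copyParametersFrom(jobMeta);
        job.activateParameters();

        // Temporary variable and log level for this execution only.
        job.setVariable("RUN_ENV", "test");
        job.setLogLevel(LogLevel.DETAILED);

        job.start();
        job.waitUntilFinished();
    }
}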
Analyze job results
You can see how your job performed and whether errors occurred by viewing logs and job metrics. After you run a job, the Logs panel opens with tabs that help you pinpoint errors.
Logging
The Logging tab displays details for the most recent execution of the job. Error lines are highlighted in red.
Job metrics
The Job Metrics tab shows statistics for each step in your job. Statistics include records read and written, processing speed (rows per second), and errors. Steps that caused the job to fail are highlighted in red.
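When a job is run through the classic Kettle Java API instead of the console, a similar summary is available from the job's Result object. A minimal sketch assuming that API and a hypothetical job file:

import org.pentaho.di.core.KettleEnvironment;
import org.pentaho.di.core.Result;
import org.pentaho.di.job.Job;
import org.pentaho.di.job.JobMeta;

public class CheckJobResult {
    public static void main(String[] args) throws Exception {
        KettleEnvironment.init();
        JobMeta jobMeta = new JobMeta("/path/to/my_job.kjb", null);
        Job job = new Job(null, jobMeta);
        job.start();
        job.waitUntilFinished();

        // The Result object summarizes the outcome, similar to the counters
        // shown on the Job Metrics tab.
        Result result = job.getResult();
        System.out.println("Succeeded:     " + result.getResult());
        System.out.println("Errors:        " + result.getNrErrors());
        System.out.println("Lines read:    " + result.getNrLinesRead());
        System.out.println("Lines written: " + result.getNrLinesWritten());
    }
}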
Job steps in Pipeline Designer
Steps extend and expand the functionality of jobs. You can use the following steps in Pipeline Designer.
File management
Delete folders
Deletes the specified folders. If a folder contains files, PDI deletes them as well.
General
Start
Defines the starting point for job execution. Every job must have one (and only one) Start step.