Working with jobs

Create, configure, and run jobs to orchestrate ETL activities. After running a job, analyze its results to identify problems and potential improvements.

Create a job

Create a job to coordinate resources, execution, and dependencies of an ETL activity.

To create a job in Pipeline Designer, follow these steps:

  1. Log in to the Pentaho User Console.

  2. Open Pipeline Designer:

    • If you are using the Modern Design, in the menu on the left side of the page, select Pipeline Designer.

    • If you are using the Classic Design, select Switch to the Modern Design, then select Pipeline Designer.

    Pipeline Designer opens with the Quick Access section expanded.

  3. In the Job card, select Create Job.

    A new, blank job opens with the Design pane selected.

  4. Add steps to the job:

    1. In the Design pane, search for or browse to each step you want to use in the job.

    2. Drag the steps you want to use onto the canvas.

  5. Work with steps on the canvas.

    Hover over a step to open the step menu, then select an option:

    Menu option
    Description

    Delete

    Deletes the step from the canvas.

    Edit

    Opens the Step Name window, where you can configure the properties of the step. Step properties may appear in multiple sections, tabs, or both.

    Note: To learn more about the step you're configuring, select Help in the lower-left corner of the Step Name window.

    Duplicate

    Adds a copy of the step to the canvas.

  6. Add hops between steps.

    Hover over a step handle until a plus sign (+) appears, then drag the connection to another step handle.

  7. Optional: Add a note on the canvas.

    In the canvas toolbar, select the Add Note icon. In the Notes dialog box, enter your note, then select Save.

    Note: To format the note, select Style and set font, color, and shadow options.

  8. Save the job:

    1. Select Save.

      The Select File or Directory dialog box opens.

    2. Search for or browse to the folder where you want to save the job.

    3. Optional: Create a folder.

      Select the New Folder icon. In the New folder dialog box, enter a folder name, then select Save.

    4. Optional: Delete a folder.

      Select the folder, then select the Delete icon.

    5. In the Select File or Directory dialog box, select Save.

      The Save Changes dialog box opens.

    6. Select Yes to confirm.
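
The canvas workflow above also has a programmatic counterpart in the Kettle Java API that underpins PDI. The following sketch builds a Start entry, a Write to log entry, and a hop between them; the class, job, and entry names are illustrative, and this is an approximation rather than how Pipeline Designer works internally.

    import org.pentaho.di.core.KettleEnvironment;
    import org.pentaho.di.job.JobHopMeta;
    import org.pentaho.di.job.JobMeta;
    import org.pentaho.di.job.entries.special.JobEntrySpecial;
    import org.pentaho.di.job.entries.writetolog.JobEntryWriteToLog;
    import org.pentaho.di.job.entry.JobEntryCopy;

    public class BuildJob {
        public static void main(String[] args) throws Exception {
            KettleEnvironment.init();                  // boot the PDI engine

            JobMeta jobMeta = new JobMeta();           // a new, blank job
            jobMeta.setName("hello_job");              // hypothetical job name

            // Every job needs exactly one Start entry.
            JobEntryCopy start = new JobEntryCopy(new JobEntrySpecial("Start", true, false));
            start.setDrawn(true);
            jobMeta.addJobEntry(start);

            // Equivalent to dragging a Write to log step onto the canvas.
            JobEntryWriteToLog writeToLog = new JobEntryWriteToLog("Say hello");
            writeToLog.setLogMessage("Hello from a generated job");
            JobEntryCopy logEntry = new JobEntryCopy(writeToLog);
            logEntry.setDrawn(true);
            jobMeta.addJobEntry(logEntry);

            // Equivalent to dragging a hop between the two step handles.
            jobMeta.addJobHop(new JobHopMeta(start, logEntry));

            // Saving the job produces its .kjb XML document.
            System.out.println(jobMeta.getXML());
        }
    }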

Edit job properties

Job properties control how a job behaves and how it logs what it is doing.

To configure job properties, follow these steps:

  1. Log in to the Pentaho User Console.

  2. Open Pipeline Designer:

    • If you are using the Modern Design, in the menu on the left side of the page, select Pipeline Designer.

    • If you are using the Classic Design, select Switch to the Modern Design, then select Pipeline Designer.

    Pipeline Designer opens with the Quick Access section expanded.

  3. In the table at the bottom of the screen, select the Recently opened tab or the Favorites tab.

  4. Open the job in one of the following ways:

    • In the table, search for or browse to the job, then select Open.

    • Select Open files, then in the Select File or Directory dialog box, select the job and select Open.

  5. In the Canvas Action toolbar, select the Settings icon.

    The Job Properties window opens.

  6. Configure the properties in each tab.

    For details, see the tab sections in this topic.

  7. Optional: Generate SQL for the logging table.

    1. Select SQL.

      The Simple SQL editor opens with DDL generated from the job properties.

    2. Optional: Edit the SQL statements.

      See Use the SQL Editor.

    3. Optional: Clear cached results.

      Select Clear cache.

    4. Select Execute.

  8. Select Save.

Job tab

General properties for jobs are found on the Job tab.

Option
Description

Job Name

The name of the job.

Note: This information is required if you want to save to a repository.

Job filename

The file name of the job if it is not stored in the repository.

Description

A short, user-defined description of the job that is shown in the repository explorer.

Extended description

A user-defined longer description of the job.

Status

The status of the job. The values are draft and production.

Version

A description of the version.

Directory

The directory in the repository where the job is kept.

Created by

The original creator of the job.

Created at

The date and time when the job was created.

Last modified by

The name of the last user who modified the job.

Last modified at

The date and time when the job was last modified.
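
If you work with jobs through the Kettle Java API (the engine behind PDI), several of these properties correspond to setters on JobMeta. A minimal sketch, with illustrative values:

    import org.pentaho.di.job.JobMeta;

    public class JobTabProperties {
        static void describe(JobMeta jobMeta) {
            jobMeta.setName("nightly_load");                      // Job Name
            jobMeta.setDescription("Loads the nightly extracts"); // Description
            jobMeta.setExtendedDescription(
                "Runs after the source systems close their batch windows."); // Extended description
            jobMeta.setJobversion("1.2");                         // Version
        }
    }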

Parameters tab

Use the Parameters tab to define parameters for your jobs.

Option
Description

Parameter

A user-defined parameter.

Default value

The default value of the user-defined parameter.

Description

A description of the parameter.
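
In the Kettle Java API, the tab's three columns map to the named-parameter calls on JobMeta. A minimal sketch; the parameter INPUT_DIR and its values are hypothetical, and steps inside the job would reference it as ${INPUT_DIR}:

    import org.pentaho.di.job.JobMeta;

    public class JobParameters {
        static void define(JobMeta jobMeta) throws Exception {
            // Parameter, Default value, Description: the tab's three columns.
            jobMeta.addParameterDefinition("INPUT_DIR", "/data/in", "Folder to read from");

            // Override the default for one run.
            jobMeta.setParameterValue("INPUT_DIR", "/data/incoming");
        }
    }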

Settings tab

Option
Description

Pass batch ID?

Select to pass the identification number of the batch to the transformation.

Shared objects file

PDI uses a single shared objects file for each user. By default, the file is named shared.xml and is located in the .kettle directory in the user’s home directory. You can define a different shared objects file name and location.
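
A hedged sketch of overriding that default location through the Kettle Java API; the path is hypothetical:

    import org.pentaho.di.job.JobMeta;

    public class SharedObjectsLocation {
        static void useCustomFile(JobMeta jobMeta) {
            // Equivalent to changing the Shared objects file setting.
            jobMeta.setSharedObjectsFile("/opt/etl/config/shared.xml");
        }
    }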

Log tab

Use the Log tab to specify logging settings.

Option
Description

Log connection

Specify the database connection you are using for logging. You can configure a new connection by selecting New.

Log Schema

Specify the schema name, if supported by your database.

Log table

Specify the name of the log table. If you also use transformation logging, use a different table name for job logging.

Logging interval (seconds)

Specify how often, in seconds, log records are written to the table. This property only applies to Transformation and Performance logging types.

Log line timeout (days)

Specify the number of days to keep log entries in the table before they are deleted. This property only applies to Transformation and Performance logging types.

Log size limit in lines

Enter the limit for the number of lines that are stored in the LOG_FIELD. PDI stores logging for the transformation in a long text field (CLOB). This property only applies to Transformation and Performance logging types.

SQL button

Generates the SQL needed to create the logging table and lets you run the SQL statement.
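
A minimal sketch of the same settings through the Kettle Java API, assuming a database connection named logging_db is already defined for the job; the schema and table names are hypothetical:

    import org.pentaho.di.core.logging.JobLogTable;
    import org.pentaho.di.job.JobMeta;

    public class JobLogging {
        static void configure(JobMeta jobMeta) {
            JobLogTable logTable = jobMeta.getJobLogTable();
            logTable.setConnectionName("logging_db"); // Log connection
            logTable.setSchemaName("etl");            // Log Schema, if supported
            logTable.setTableName("job_log");         // keep distinct from transformation logs
        }
    }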

Run a job

After you create a job and configure its properties, you can run the job. You can also control how the job runs without modifying the job itself by configuring run options.

To run a job, follow these steps:

  1. Log in to the Pentaho User Console.

  2. Open Pipeline Designer:

    • If you are using the Modern Design, in the menu on the left side of the page, select Pipeline Designer.

    • If you are using the Classic Design, select Switch to the Modern Design, then select Pipeline Designer.

    Pipeline Designer opens with the Quick Access section expanded.

  3. In the table at the bottom of the screen, select the Recently opened tab or the Favorites tab.

  4. Open the job in one of the following ways:

    • In the table, search for or browse to the job, then select Open.

    • Select Open files, then in the Select File or Directory dialog box, select the job and select Open.

  5. In the Canvas Action toolbar, select the Run icon, then select an option:

    • To run the job immediately, select Run.

    • To run the job with options, select Run Options, configure the options, then select Run. For details, see Job run options.

The job runs and the Preview panel opens with the Logging tab selected.

Note: To stop a job while it is running, see Stop transformations and jobs.
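
Outside the console, a saved job can also be run through the Kettle Java API. A minimal sketch, assuming a job file named nightly_load.kjb; the log level mirrors the Log level run option described below:

    import org.pentaho.di.core.KettleEnvironment;
    import org.pentaho.di.core.logging.LogLevel;
    import org.pentaho.di.job.Job;
    import org.pentaho.di.job.JobMeta;

    public class RunJob {
        public static void main(String[] args) throws Exception {
            KettleEnvironment.init();
            JobMeta jobMeta = new JobMeta("nightly_load.kjb", null); // hypothetical path
            Job job = new Job(null, jobMeta);
            job.setLogLevel(LogLevel.BASIC); // the default Log level run option
            job.start();                     // Job is a Thread; start() launches the run
            job.waitUntilFinished();
        }
    }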

Job run options

Option
Description

Select configuration

All jobs are run using the Pentaho server configuration.

Clear log before running

Indicates whether to clear all your logs before you run your job. If your log is large, you might need to clear it before the next execution to conserve space.

Enable safe mode

Checks every row passed through your job and ensures all layouts are identical. If a row does not have the same layout as the first row, an error is generated and reported.

Gather performance metrics

Monitors the performance of your job execution. You can view performance metrics in the Pentaho Data Integration client. For details, see Use performance graphs.

Log level

Specifies how much logging is performed and the amount of information captured:

  • Nothing: No logging occurs.

  • Error: Only errors are logged.

  • Minimal: Only minimal logging is used.

  • Basic: The default logging level.

  • Detailed: Detailed logging output.

  • Debug: Very detailed output for debugging purposes.

  • Row Level (very detailed): Logging at a row level, which generates a lot of log data.

Debug and Row Level logging levels contain information you may consider too sensitive to be shown. Consider the sensitivity of your data when selecting these logging levels. See the Administer Pentaho Data Integration and Analytics guide for instructions on how best to use these logging methods.

Expand Remote Job

Bundles all required files for a job, including its sub-components, so they can be sent to a remote server for execution. The remote server runs the complete job without needing to retrieve additional files from the original environment.

Start job at

Specifies the step where the job begins execution. By default, execution begins at the Start step.

Parameters

Applies parameter values during runtime. A parameter is a local variable. For details, see Parameters.

Variables

Applies temporary values for user-defined and environment variables during runtime. For details, see Variables.

Arguments

Applies a named, user-supplied, single-value input given as a command line argument when running the job manually or with a script. Arguments are handled according to a job's design. If the job is not designed to handle arguments, nothing happens. Typically, argument values are numbers, strings, or system or script variables. Each job can have a maximum of 10 arguments. For details, see Arguments.
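
To make the distinction between these inputs concrete, here is a hedged Kettle Java API sketch; every name and value is illustrative:

    import org.pentaho.di.job.Job;
    import org.pentaho.di.job.JobMeta;

    public class RuntimeInputs {
        static void apply(Job job, JobMeta jobMeta) throws Exception {
            jobMeta.setParameterValue("REGION", "emea");         // parameter: declared, local
            job.setVariable("TMP_DIR", "/tmp/etl");              // variable: temporary value
            jobMeta.setArguments(new String[] { "2024-01-01" }); // arguments: positional, max 10
        }
    }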

Analyze job results

You can see how your job performed and whether errors occurred by viewing logs and job metrics. After you run a job, the Logs panel opens with tabs that help you pinpoint errors.

Logging

The Logging tab displays details for the most recent execution of the job. Error lines are highlighted in red.

Job metrics

The Job Metrics tab shows statistics for each step in your job. Statistics include records read and written, processing speed (rows per second), and errors. Steps that caused the job to fail are highlighted in red.
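
If the job was run through the Kettle Java API instead, the same outcome can be checked in code. A small sketch, assuming job has finished running:

    import org.pentaho.di.core.Result;
    import org.pentaho.di.job.Job;

    public class CheckOutcome {
        static boolean succeeded(Job job) {
            Result result = job.getResult();    // aggregate outcome of the run
            long errors = result.getNrErrors(); // matches the metrics error count
            return errors == 0 && result.getResult();
        }
    }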

Job steps in Pipeline Designer

Steps extend the functionality of jobs. You can use the following steps in Pipeline Designer.

Name                    Category          Description
Abort job               Utility           Abort the job.
Checks if files exist   Conditions        Checks if files exist.
Create a folder         File management   Create a folder.
Create file             File management   Create an empty file.
Delete file             File management   Delete a file.
Delete files            File management   Delete files.
Delete folders          File management   Delete specified folders. If a folder contains files, PDI will delete them all.
File compare            File management   Compare two files.
HTTP                    File management   Get or upload a file using HTTP (Hypertext Transfer Protocol).
Job                     General           Execute a job.
Set variables           General           Set one or several variables.
Shell                   Scripting         Execute a shell script.
Start                   General           Defines the starting point for job execution. Every job must have one (and only one) Start.
Success                 General           Clears any error state encountered in a job and forces it to a success state.
Transformation          General           Run a transformation.
Wait for                Conditions        Wait for a delay.
Wait for file           File management   Wait for a file.
Write to log            Utility           Write message to log.
