Performance monitoring

Use logging and data marts to monitor Pentaho performance.

  • Use PDI logging for transformation and job status.

  • Use Pentaho Server logging for exceptions and debugging.

  • Use Pentaho Operations Mart for prebuilt performance and audit reports.

  • Use SNMP to integrate third-party monitoring tools.

Best practices for logging

Kettle logging lets you control location, detail level, and output format.

  • Store logs in a centralized database.

  • Ensure the logging user can insert into log tables.

  • Install JDBC drivers on every server and client.

  • Use implied schemas when possible.

  • Use templates for jobs and transformations with logging enabled.

  • Use global Kettle logging variables when possible.

  • Use separate log tables for jobs and transformations.

If you use kettle.properties:

  • Back up kettle.properties before making bulk edits.

  • Keep a master copy and sync it across nodes.

  • Test changes locally, then on the server.

Tracking access to sensitive data

You can log who accessed sensitive data and when.

This helps with audits and incident response.

You can correlate users and IP addresses with SQL, MDX, and parameter values.

Supported tools:

  • Analyzer

  • Mondrian

  • Interactive Reports (PRPTI)

  • Dashboards

  • Report Designer (PRPT)

  • CTools (CDA)

  • Content run by the Pentaho Scheduler

MDC tracking properties

These properties use Log4j2 Mapped Diagnostic Context (MDC).

Add the properties you want to capture to log4j2.xml.

  • sessionId: Pentaho session ID, usually the JSESSIONID cookie value.

  • sessionName: User name.

  • instanceId: UUID for a report execution instance.

  • remoteAddr: Client IP address.

  • remoteHost: Client host.

  • remotePort: Client port.

  • serverName: Server name accessed by the client.

  • serverPort: Server port accessed by the client.

  • localAddr: Local server IP.

  • localName: Local server name.

  • localPort: Local server port.

Pentaho writes a rolling session_data_access.log file.

Enable MDC logging

  1. Stop the Pentaho Server.

  2. Edit the Log4j2 configuration. Open server/pentaho-server/tomcat/webapps/pentaho/WEB-INF/classes/log4j2.xml.

  3. Add an appender for sensitive data access. Add or uncomment an appender inside <Appenders>, and update its Pattern to include the MDC properties you need (see the sketch after these steps).

  4. Add loggers that write to the appender. Add or uncomment the logger entries inside <Loggers> that reference the appender.

  5. Restart the Pentaho Server. Open server/pentaho-server/logs/session_data_access.log to review the results.
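For reference, here is a minimal sketch of what steps 3 and 4 might look like. The appender name, file locations, layout pattern, and logger name are illustrative assumptions; adapt them to the commented-out entries in your shipped log4j2.xml.

<!-- Step 3: appender that writes MDC values to a rolling file (names and pattern are assumptions) -->
<RollingFile name="SESSION_DATA_ACCESS" fileName="../logs/session_data_access.log"
             filePattern="../logs/session_data_access.log.%d{yyyy-MM-dd}">
  <!-- %X{key} prints the Log4j2 MDC (ThreadContext) value for that key -->
  <PatternLayout pattern="%d %-5p sessionName=%X{sessionName} remoteAddr=%X{remoteAddr} instanceId=%X{instanceId} %m%n"/>
  <Policies>
    <TimeBasedTriggeringPolicy/>
  </Policies>
</RollingFile>

<!-- Step 4: hypothetical logger name; use the logger entries provided in your log4j2.xml -->
<Logger name="your.component.logger" level="INFO" additivity="false">
  <AppenderRef ref="SESSION_DATA_ACCESS"/>
</Logger>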

Work with Operations Mart

MDC properties work well with the BA Operations Mart.

You can join MDC attributes to the PRO_AUDIT table:

  • sessionId maps to PRO_AUDIT.INST_ID

  • sessionName maps to PRO_AUDIT.ACTOR

  • instanceId maps to PRO_AUDIT.MESSAGE_NAME

Correlate logs with third-party tools

For large environments, index logs with tools like Splunk or an ELK stack.

Maintain logging

You can maintain system logs using rotation and by monitoring execution status.

Log rotation

This procedure assumes you are not using OS-level log rotation.

Pentaho Server uses Log4j2.

The default log4j2.xml config rotates logs daily.

To rotate by file size, use SizeBasedTriggeringPolicy.

  1. Stop all relevant servers.

  2. Edit server/pentaho-server/tomcat/webapps/pentaho/WEB-INF/classes/log4j2.xml.

  3. Update the RollingFile appender (see the example after these steps).

  4. Start servers and validate rotation.
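For example, a RollingFile appender using SizeBasedTriggeringPolicy might look like the following sketch. The appender name, file paths, size limit, and rollover count are illustrative assumptions.

<!-- Rotate pentaho.log when it reaches 10 MB, keeping up to 5 archived files -->
<RollingFile name="PENTAHOFILE" fileName="../logs/pentaho.log"
             filePattern="../logs/pentaho.log.%i">
  <PatternLayout pattern="%d %-5p [%c] %m%n"/>
  <Policies>
    <SizeBasedTriggeringPolicy size="10 MB"/>
  </Policies>
  <DefaultRolloverStrategy max="5"/>
</RollingFile>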

Execution status

Use the PDI Status page to view scheduled and remotely executed jobs and transformations.

Open the PDI Status page in a web browser.
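On a default local installation, the status page is typically available at a URL such as the following (the host, port, and context name are assumptions; adjust them for your environment):

http://localhost:8080/pentaho/kettle/status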

You must sign in first.

The list clears when the server restarts.

It also clears based on object_timeout_minutes.

On Carte (Pentaho Server embedded Carte)

Carte settings are in:

pentaho-server/pentaho-solutions/system/kettle/slave-server-config.xml


Key settings:

  • max_log_lines: 0 means no limit.

  • max_log_timeout_minutes: 0 means no timeout.

  • object_timeout_minutes: 0 means no timeout.

Example:
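The following sketch shows where these settings live in slave-server-config.xml. The values are illustrative assumptions; 0 disables the corresponding limit or timeout, as described above.

<slave_config>
  <!-- Keep at most 10,000 log lines per object in memory -->
  <max_log_lines>10000</max_log_lines>
  <!-- Drop log lines older than 24 hours -->
  <max_log_timeout_minutes>1440</max_log_timeout_minutes>
  <!-- Remove finished jobs and transformations from the status list after 4 hours -->
  <object_timeout_minutes>240</object_timeout_minutes>
</slave_config>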

PDI logging

PDI logging gives you job and transformation logs without digging through full server logs.

For Pentaho Server, this logging is separate from the platform pentaho.log.

Configure the PDI log file

Transformation and job logging is enabled by default.

Logging levels and rotation are configured separately for the PDI client and server.

  1. Stop the Pentaho Server or close the PDI client.

  2. Edit log4j2.xml:

    • Pentaho Server: server/pentaho-server/tomcat/webapps/pentaho/WEB-INF/classes/log4j2.xml

    • PDI client: design-tools/data-integration/classes/log4j2.xml

  3. Set logger levels for:

    • org.pentaho.di.trans.Trans

    • org.pentaho.di.job.Job

PDI log level mapping (PDI log level: Log4j2 level):

  • BASIC: INFO

  • DETAILED: INFO

  • MINIMAL: WARN

  • DEBUG: DEBUG

  • ERROR: ERROR

  • ROWLEVEL: TRACE

  4. Set a rollover pattern for pdi-execution-appender using FileNamePattern (see the combined example after these steps):

  • yyyy-MM-dd: daily rotation (default)

  • yyyy-MM: monthly rotation

  • yyyy-MM-dd-HH-mm: rotation every minute

  5. Set the timestamp format for pdi-execution-appender using Pattern.

Examples:

  • {yyyy-MM-dd HH:mm:ss.SSS} produces 2012-11-02 14:34:02.123

  • {HH:mm:ss,SSS} produces 14:34:02,123

  • {dd MMM yyyy HH:mm:ss,SSS} produces 02 Nov 2012 14:34:02,123

  • {MMM dd,yyyy HH:mm:ss} produces Nov 02,2012 14:34:02


If the format is invalid, PDI falls back to yyyy/MM/dd HH:mm:ss.

Optional: add a time zone:

  • {yyyy-MM-dd HH:mm:ss.SSS}{GMT-5}

  • {HH:mm:ss}{GMT+0}

  6. Save the file, then restart the Pentaho Server or the PDI client.
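Putting these steps together, the fragments below sketch what the logger levels (step 3), rollover pattern (step 4), and timestamp format (step 5) might look like in log4j2.xml. The file paths and exact layout pattern are assumptions; the appender name pdi-execution-appender and the logger names come from the steps above.

<!-- Rolling PDI log, rotated daily via the date pattern in the rollover file name -->
<RollingFile name="pdi-execution-appender" fileName="../logs/pdi.log"
             filePattern="../logs/pdi.log.%d{yyyy-MM-dd}">
  <PatternLayout pattern="%d{yyyy-MM-dd HH:mm:ss.SSS} %-5p [%t] %m%n"/>
  <Policies>
    <TimeBasedTriggeringPolicy/>
  </Policies>
</RollingFile>

<!-- Transformation and job loggers routed to the PDI appender (INFO = BASIC/DETAILED) -->
<Logger name="org.pentaho.di.trans.Trans" level="INFO" additivity="false">
  <AppenderRef ref="pdi-execution-appender"/>
</Logger>
<Logger name="org.pentaho.di.job.Job" level="INFO" additivity="false">
  <AppenderRef ref="pdi-execution-appender"/>
</Logger>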

Read the PDI log file

Log file locations:

  • Pentaho Server: server/pentaho-server/logs/pdi.log

  • PDI client: design-tools/data-integration/logs/pdi.log

Common fields:

  • DateTimeStamp: date and time.

  • LogThreshold: INFO, ERROR, DEBUG, WARN, TRACE.

  • ThreadID: unique execution key.

  • Filepath: absolute path to the transformation or job.

  • Message: log message.

Example:

2018-03-07 11:40:36.290 INFO <Launch transformation UUID: 1246b616-a845-4cbc-9f4c-8a4a2cbfb4f1> [...] Starting entry

Third-party monitoring with SNMP

You can use the SNMP plugin to integrate third-party tools for monitoring PDI events.

These events help you identify long-running jobs and transformations.

Events can also capture milestones like database connections and Carte startup.

Before you begin

Download pentaho-monitoring-plugin-mib-assembly-11.0.0.0-<build number>.zip from the Support Portal.

The ZIP contains the MIB file you upload to your monitoring tool.

Send traps through a centralized monitoring server

You can use a Pentaho Server as a JMS monitoring server.

That server forwards events to your SNMP manager.

Step 1: Configure Pentaho Server to forward traps

You need the IP address of the JMS monitoring server.

  1. Stop the Pentaho Server.

  2. Open the SNMP config file in:

    pentaho-solutions/system/karaf/system/pentaho/pentaho-osgi-config/11.0.0.0-<build number>/

  3. Update:

    • fromHost: IP of this Pentaho Server

    • toHost: IP of the JMS monitoring server

    • port (default is 162)

  4. In pentaho-solutions/system/karaf/etc/org.apache.karaf.features.cfg, add the SNMP monitoring feature (see the sketch after these steps).

  5. If pentaho-server/pentaho-solutions/system/karaf/etc/pentaho.snmp.cfg exists, delete it.

  6. Start the Pentaho Server.
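The fragments below sketch what the edits in steps 3 and 4 might look like. The IP addresses, port, and the featuresBoot entry are illustrative assumptions; the property names come from step 3, and pentaho-monitoring-to-snmp is the feature named later in this section.

# SNMP forwarding config (step 3) - illustrative values
fromHost=192.168.1.10
toHost=192.168.1.20
port=162

# org.apache.karaf.features.cfg (step 4): keep the existing comma-separated
# featuresBoot entries and append the monitoring feature at the end
featuresBoot = <existing features>,pentaho-monitoring-to-snmp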

Step 2: Configure your monitoring tool

Steps vary by tool.

For Nagios-style workflows:

  1. Upload PENTAHO-MIB-V2C.mib to your monitoring tool.

  2. Convert it to a .conf file using snmpttconvertmib.

  3. Add the generated .conf file to snmptt.ini.

  4. Restart the SNMP components and the monitoring tool.

Send traps directly from PDI tools

You can also send traps from PDI tools like Pan, Kitchen, Spoon, or Carte.

Enable the pentaho-monitoring-to-snmp feature in the relevant Karaf features file.

Configure monitored event types

By default, Pentaho monitors all supported event types.

To reduce noise, edit the comma-separated list in monitoring.properties.

Keep only the event types you want.
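As a purely hypothetical illustration (the actual property key is defined in your monitoring.properties file), limiting monitoring to transformation and job lifecycle events might look like:

# Hypothetical key name; check monitoring.properties for the real property
event.types=TransformationStart,TransformationFinish,JobStart,JobFinish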

Restart the Pentaho Server after changes.

Available extension points

These extension points can emit monitoring events:

  • TransformationPrepareExecution: transformation prepares execution.

  • TransformationStart: transformation starts.

  • TransformationHeartbeat: periodic signal during transformation execution.

  • TransformationFinish: transformation finishes.

  • JobStart: job starts.

  • JobHeartbeat: periodic signal during job execution.

  • JobFinish: job finishes.

  • JobBeforeJobEntryExecution: before a job entry runs.

  • JobAfterJobEntryExecution: after a job entry runs.

  • DatabaseConnected: after a successful database connection.

  • DatabaseDisconnected: after a successful database disconnection.

  • CarteStartup: after Carte starts.

  • CarteShutdown: before Carte stops.

Pentaho Operations Mart

Pentaho Operations Mart helps you monitor Pentaho Server performance.

It includes:

  • Business Analytics (BA) Operations Mart. Aggregates Pentaho Server audit data.

  • Data Integration (DI) Operations Mart. Stores and summarizes PDI logging data.

Business Analytics Operations Mart

Pentaho can automatically set up Operations Mart content for Analyzer, Interactive Reports, Report Designer, and dashboards.

BA Operations Mart aggregates data from Pentaho Server log files into prebuilt audit reports.

If the prebuilt reports do not meet your needs, use the DI Operations Mart to change them.

Download and install Operations Mart files

Operations Mart files are stored in a prepackaged ZIP file.

To install, stop the Pentaho Server, download and unpack the ZIP, then restart the server to import the files.

  1. Stop the Pentaho Server.


    See the Install Pentaho Data Integration and Analytics document for start and stop steps.

  2. Download pentaho-operations-mart-11.0.0.0-<build number>.zip from the Support Portal.

    1. Sign in using the support credentials from your welcome packet.

    2. In the Pentaho card, select Download.

    3. In the 11.x list, select Pentaho 11.0 GA Release.

    4. Scroll to the bottom of the release page.

    5. In the file component section, select the Operations Mart folder.

    6. Download the ZIP file.

  3. Unpack the ZIP file to a temporary directory.

  4. Move the unpacked files to pentaho/server/pentaho-server/pentaho-solutions/system/default-content.

  5. In default-content, delete files you do not need for your repository database type.

    Files to keep, by repository database type:

    • PostgreSQL: pentaho-operations-mart-clean-11.0.0.0-<build number>.zip*, pentaho-operations-mart-etl-11.0.0.0-<build number>.zip, pentaho-operations-mart-operations-bi-11.0.0.0-<build number>.zip

    • MySQL: pentaho-operations-mart-clean-mysql5-11.0.0.0-<build number>.zip*, pentaho-operations-mart-etl-mysql5-11.0.0.0-<build number>.zip, pentaho-operations-mart-operations-bi-11.0.0.0-<build number>.zip

    • Oracle: pentaho-operations-mart-clean-oracle10g-11.0.0.0-<build number>.zip*, pentaho-operations-mart-etl-oracle10g-11.0.0.0-<build number>.zip, pentaho-operations-mart-operations-bi-11.0.0.0-<build number>.zip

    • Microsoft SQL Server: pentaho-operations-mart-clean-mssql-11.0.0.0-<build number>.zip*, pentaho-operations-mart-etl-mssql-11.0.0.0-<build number>.zip, pentaho-operations-mart-operations-bi-11.0.0.0-<build number>.zip

    * Keep the pentaho-operations-mart-clean-<database>-11.0.0.0-<build number>.zip file only if you want Pentaho to delete old entries on a schedule. See Clean up the BA Operations Mart.

  6. Restart the Pentaho Server.

Increase the maximum character length in audit table fields

You can increase the maximum audit table field length from 200 to 1024 characters.

The scripts in this procedure adjust the column width and reindex the Operations Mart tables.

  1. Stop the Pentaho Server.


    See the Install Pentaho Data Integration and Analytics document for start and stop steps.

  2. Download pentaho-server-ee-11.0.0.0-<build number>.zip from the Support Portal.

  3. Unpack the ZIP to a temporary directory.

  4. Go to pentaho/server/pentaho-server/data.

  5. Open the directory for your repository database type: mysql, oracle, postgresql, or sqlserver.

  6. Locate:

    • alter_script_<repository database type>_BISERVER-13674.sql

    • pentaho_mart_upgrade_audit_<repository database type>.sql

  7. Run the scripts in order.

  8. Delete the temporary directory.

  9. Start the Pentaho Server.

Choose a prebuilt Operations Mart report

Choose the report that fits your needs.

Then follow View and edit Operations Mart reports.

Information shown and the corresponding report name:

  • Time to run a report: Content Duration

  • Content failures by time range: Content Failures

  • Compare login metrics by day in month: Content Request Day of Month

  • Compare login metrics by day in week: Day of Week Request and Login Metrics

  • Content sorted by type: Content Type Usage

  • Content usage by time range: Content Usage

  • Compare login metrics by hour: Hours in Day Request and Login Metrics

  • Login duration and logins per user: Session Duration Counts

View and edit Operations Mart reports

  1. Double-click Browse Files.

  2. Select public > Pentaho Operations Mart > BA Audit Reports.

  3. Select the file, then click Open.

  4. Optional: edit the report in the tool you used to open it.

Create Operations Mart reports

If the prebuilt reports do not meet your needs, create your own.

  1. In the User Console, select Create New.

  2. Select the report type.

  3. Select the data source you need.

    Choose the data source based on the information you need:

    • For .xaction execution details, use pentaho_operations_mart: BA Operations Mart - Component.

    • For content execution details, use pentaho_operations_mart: BA Operations Mart - Content.

    • For session metrics, use pentaho_operations_mart: BA Operations Mart - User Session.

    • For job entry executions, use pentaho_operations_mart: PDI Operations Mart - Job Entry.

    • For server performance detail, use pentaho_operations_mart: PDI Operations Mart - Performance.

    • For step executions, use pentaho_operations_mart: PDI Operations Mart - Step.

    • For transformation and job runs, use pentaho_operations_mart: PDI_Operations_Mart.

Update the Operations Mart

  1. In the User Console, select Browse Files > public > Pentaho Operations Mart > Update Audit Mart.

  2. Double-click Update Operations Mart Date & Time or Update BA Operations Mart Data.

  3. View a report to confirm updates.

Clean up the BA Operations Mart

Operations Mart tables remove entries older than 365 days by default.

To change the age, add ba.cleanup.max.age to {user.home}/.kettle/kettle.properties.

  1. Stop the Pentaho Server.

  2. Open {user.home}/.kettle/kettle.properties.

  3. Add ba.cleanup.max.age (see the example after these steps).

  4. Save the file.

  5. Restart the Pentaho Server.
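For example, assuming the value is expressed in days (consistent with the 365-day default described above), the kettle.properties entry might look like:

# Remove BA Operations Mart entries older than 180 days (value assumed to be in days)
ba.cleanup.max.age=180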

Data Integration Operations Mart

The DI Operations Mart stores and summarizes PDI log data.

Use it to monitor jobs and transformations, and build reports and dashboards.

Set up the Data Integration Operations Mart

Setup depends on your installation method.

Follow these instructions if you installed Pentaho using the archive method.

Required databases

Your Pentaho Server and repository must use one of:

  • PostgreSQL

  • MySQL or MariaDB

  • Oracle

  • Microsoft SQL Server

Required scripts

You need:

  • pentaho_logging_<databasename>.sql

  • pentaho_mart_<databasename>.sql

<databasename> matches your repository database type:

  • postgresql

  • mysql15 (MariaDB only)

  • oracle10g or oracle12c

  • sqlserver

Overview

  • Step 1: Get the Data Integration Operations Mart files

  • Step 2: Run the setup scripts

  • Step 3: Set the global Kettle logging variables

  • Step 4: Add the JNDI connections for logging

  • Step 5: Add a JDBC connection for the Pentaho Server

  • Step 6: Add the DI Operations Mart ETL solutions to the default content folder

  • Step 7: Initialize the DI Operations Mart

  • Step 8: Verify the DI Operations Mart is working

Step 1: Get the Data Integration Operations Mart files

If you use a PostgreSQL repository, skip to Step 2.

If you do not have pentaho-operations-mart-11.0.0.0-<build number>.zip, download it from the Support Portal.

  1. Download pentaho-server-ee-11.0.0.0-<build number>.zip from the Support Portal.

  2. Unpack the ZIP file to a temporary directory.

  3. Verify these files exist:

    • pentaho-operations-mart-operations-di-11.0.0.0-<build number>.zip

    • pentaho-operations-mart-operations-bi-11.0.0.0-<build number>.zip

  4. Verify you also have the two files required for your repository database type:

    • Oracle: pentaho-operations-mart-etl-oracle10g-11.0.0.0-<build number>.zip and pentaho-operations-mart-clean-oracle10g-11.0.0.0-<build number>.zip

    • MySQL5: pentaho-operations-mart-etl-mysql5-11.0.0.0-<build number>.zip and pentaho-operations-mart-clean-mysql5-11.0.0.0-<build number>.zip

    • Microsoft SQL Server: pentaho-operations-mart-etl-mssql-11.0.0.0-<build number>.zip and pentaho-operations-mart-clean-mssql-11.0.0.0-<build number>.zip

Step 2: Run the setup scripts

Scripts are in:

<install-directory>/pentaho-server/data/<databasename>

Run in order:

  1. pentaho_logging_<databasename>.sql

  2. pentaho_mart_<databasename>.sql

Step 3: Set the global Kettle logging variables

Run this on the machine with the PDI client and Pentaho Server.

kettle.properties is created at:

$USER_HOME/.kettle/kettle.properties

  1. In Spoon, select Edit > Edit the kettle.properties file.

  2. Add or edit variables for the log tables.


    For Oracle and Microsoft SQL Server, leave the value blank for variables that contain _SCHEMA_.

    • KETTLE_CHANNEL_LOG_DB: live_logging_info
    • KETTLE_CHANNEL_LOG_TABLE: channel_logs
    • KETTLE_CHANNEL_LOG_SCHEMA: pentaho_dilogs
    • KETTLE_JOBENTRY_LOG_DB: live_logging_info
    • KETTLE_JOBENTRY_LOG_TABLE: jobentry_logs
    • KETTLE_JOBENTRY_LOG_SCHEMA: pentaho_dilogs
    • KETTLE_JOB_LOG_DB: live_logging_info
    • KETTLE_JOB_LOG_TABLE: job_logs
    • KETTLE_JOB_LOG_SCHEMA: pentaho_dilogs
    • KETTLE_METRICS_LOG_DB: live_logging_info
    • KETTLE_METRICS_LOG_TABLE: metrics_logs
    • KETTLE_METRICS_LOG_SCHEMA: pentaho_dilogs
    • KETTLE_STEP_LOG_DB: live_logging_info
    • KETTLE_STEP_LOG_TABLE: step_logs
    • KETTLE_STEP_LOG_SCHEMA: pentaho_dilogs
    • KETTLE_TRANS_LOG_DB: live_logging_info
    • KETTLE_TRANS_LOG_TABLE: trans_logs
    • KETTLE_TRANS_LOG_SCHEMA: pentaho_dilogs
    • KETTLE_TRANS_PERFORMANCE_LOG_DB: live_logging_info
    • KETTLE_TRANS_PERFORMANCE_LOG_TABLE: transperf_logs
    • KETTLE_TRANS_PERFORMANCE_LOG_SCHEMA: pentaho_dilogs
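For reference, these variables are plain key=value entries in kettle.properties. A partial sketch follows (the remaining groups use the same pattern; leave the _SCHEMA_ values blank for Oracle and Microsoft SQL Server, as noted above):

# Channel logging
KETTLE_CHANNEL_LOG_DB=live_logging_info
KETTLE_CHANNEL_LOG_TABLE=channel_logs
KETTLE_CHANNEL_LOG_SCHEMA=pentaho_dilogs

# Transformation logging
KETTLE_TRANS_LOG_DB=live_logging_info
KETTLE_TRANS_LOG_TABLE=trans_logs
KETTLE_TRANS_LOG_SCHEMA=pentaho_dilogs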

Step 4: Add the JNDI connections for logging

  1. Go to <pdi-install>/data-integration/simple-jndi.

  2. Open jdbc.properties.

  3. Update properties for your database type.

PostgreSQL example:
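A live_logging_info entry in jdbc.properties might look like the following sketch; the database name, host, port, schema parameter, and credentials are assumptions for your environment.

# Simple-JNDI entry for the PDI logging connection (illustrative values)
live_logging_info/type=javax.sql.DataSource
live_logging_info/driver=org.postgresql.Driver
live_logging_info/url=jdbc:postgresql://localhost:5432/hibernate?currentSchema=pentaho_dilogs
live_logging_info/user=hibuser
live_logging_info/password=password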

Step 5: Add a JDBC connection for the Pentaho Server

  1. Go to <pentaho-server>/tomcat/webapps/pentaho/META-INF/.

  2. Open context.xml.

  3. Update the resources for your database type.

Optional: for encrypted passwords, set:

factory="org.pentaho.di.core.database.util.DecryptingDataSourceFactory"

Step 6: Add the DI Operations Mart ETL solutions to the default content folder

If you use PostgreSQL for the repository database, skip to Step 7.

  1. Stop the Pentaho Server.

  2. Locate:

    • pentaho-operations-mart-operations-di-11.0.0.0-<build number>.zip

    • pentaho-operations-mart-operations-bi-11.0.0.0-<build number>.zip

  3. Locate the two ZIP files that match your repository database type.

  4. Copy all four ZIP files into:

    $PENTAHO_HOME/pentaho-server/pentaho-solutions/system/default-content

  5. Start the Pentaho Server.


    After processing, the server renames the ZIP files with a timestamp. Keep the renamed files.

Step 7: Initialize the DI Operations Mart

  1. Launch Spoon.

  2. Connect to the Pentaho Repository through the Pentaho Server.

  3. Select File > Open.

  4. Select Browse Files > Public > Pentaho Operations Mart > DI Ops Mart ETL.

  5. Open each transformation and job.

  6. In each, open Properties > Logging.

  7. Run Fill_in_DIM_DATE_and_DIM_TIME.

  8. Run a few sample KTRs to generate log activity.

  9. Run Update_Dimensions_then_Logging_Data.

Step 8: Verify the DI Operations Mart is working

  1. In the User Console, select Browse Files > Public > Pentaho Operations Mart > DI Audit Reports > Last_Run.

  2. Verify jobs and transformations ran.

Charts, reports, and dashboards

After you populate the DI Operations Mart, use the User Console to create reports, charts, and dashboards.

Pentaho provides prebuilt content you can modify.

To understand fields, see Logging dimensions and metrics.

Logging tables status

Transformation log status values

  • start: Started. Stays until end when no logging interval is set.

  • end: Ended successfully.

  • stop: Stopped by a user.

  • error: Failed with an error.

  • running: Shown briefly after starting; not shown without a logging interval.

  • paused: Paused by a user; not shown without a logging interval.

Job log status values

  • start: Started. Stays until end when no logging interval is set.

  • end: Ended successfully.

  • stop: Stopped by a user.

  • error: Failed with an error.

  • running: Shown briefly after starting; not shown without a logging interval.

  • paused: Paused by a user; not shown without a logging interval.

Logging dimensions and metrics

These tables identify the dimensions and metrics used to create ETL log charts and reports.

Fact table (fact_execution)

  • execution_date_tk: Technical key linking to the execution date.
  • execution_time_tk: Technical key linking to the execution time.
  • batch_tk: Technical key linking to batch information.
  • execution_tk: Technical key linking to execution information.
  • executor_tk: Technical key linking to executor information.
  • parent_executor_tk: Technical key linking to parent executor information.
  • root_executor_tk: Technical key linking to root executor information.
  • execution_timestamp: Date and time of execution.
  • duration: Duration in seconds (LOGDATE to max DEPDATE).
  • rows_input: Rows read from disk or network by the step.
  • rows_output: Rows output during execution.
  • rows_read: Rows read from the step input stream.
  • rows_written: Rows written during execution.
  • rows_rejected: Rows rejected during execution.
  • errors: Errors during execution.

Batch dimension (dim_batch)

  • batch_tk: Technical key for batch info.
  • batch_id: Batch ID.
  • logchannel_id: Logging channel ID.
  • parent_logchannel_id: Parent logging channel ID.

Date dimension (dim_date)

  • date_tk: Technical key linking to date.
  • date_field: Date value.
  • ymd: YYYY-MM-DD.
  • ym: YYYY-MM.
  • year: Year.
  • quarter: Quarter number (1-4).
  • quarter_code: Q1 to Q4.
  • month: Month number (1-12).
  • month_desc: Month name.
  • month_code: JAN, FEB, and so on.
  • day: Day number (1-31).
  • day_of_year: Day of year (1-366).
  • day_of_week: Day of week (1-7).
  • day_of_week_desc: Day name.
  • day_of_week_code: SUN, MON, and so on.
  • week: Week of year (1-53).

Execution dimension (dim_execution)

  • execution_tk: Technical key for execution info.
  • execution_id: Unique execution ID.
  • server_name: Server name.
  • server_host: Server host.
  • executing_user: User who initiated execution.
  • execution_status: start, stop, end, error.

Executor dimension (dim_executor)

  • executor_tk: Technical key for executor.
  • version: Executor version.
  • date_from: Start of validity range.
  • date_to: End of validity range.
  • executor_id: Executor ID.
  • executor_source: Source location.
  • executor_environment: Reserved for future use.
  • executor_type: job or transformation.
  • executor_name: Executor name.
  • executor_desc: Description.
  • executor_revision: Revision string.
  • executor_version_label: Change comments.
  • exec_enabled_table_logging: Y or N.
  • exec_enabled_detailed_logging: Y or N.
  • exec_enabled_perf_logging: Y or N.
  • exec_enabled_history_logging: Y or N.
  • last_updated_date: Last update date.
  • last_updated_user: Last update user.

Log table (dim_log_table)

  • log_table_tk: Technical key.
  • object_type: Object type.
  • table_connection_name: Log table connection name.
  • table_name: Log table name.
  • schema_name: Log table schema.
  • step_entry_table_conn_name: Step/job entry table connection name.
  • step_entry_table_name: Step/job entry table name.
  • step_entry_schema_name: Step/job entry table schema.
  • perf_table_conn_name: Performance table connection name.
  • perf_table_name: Performance table name.
  • perf_schema_name: Performance table schema.

Time-of-day dimension (dim_time)

  • time_tk: Technical key linking to time-of-day.
  • hms: HH:MM:SS.
  • hm: HH:MM.
  • ampm: am or pm.
  • hour: 0-23.
  • hour12: 1-12.
  • minute: 0-59.
  • second: 0-59.

Step fact table (fact_step_execution)

  • execution_date_tk: Technical key for execution date.
  • execution_time_tk: Technical key for execution time.
  • batch_tk: Technical key for batch.
  • executor_tk: Technical key for executor.
  • parent_executor_tk: Technical key for parent executor.
  • root_executor_tk: Technical key for root executor.
  • execution_timestamp: Execution date and time.
  • step_tk: Technical key for step.
  • step_copy: Step copy number.
  • rows_input: Rows read by the step.
  • rows_output: Rows written by the step.
  • rows_read: Rows read from previous steps.
  • rows_written: Rows written to following steps.
  • rows_rejected: Rows rejected by error handling.
  • errors: Errors during execution.

Step dimension (dim_step)

  • step_tk: Technical key for step/job entry.
  • step_id: Step/job entry name.
  • original_step_name: Reserved for future use.

Job entry fact table (fact_jobentry_execution)

  • execution_date_tk: Technical key for date.
  • execution_time_tk: Technical key for time.
  • batch_tk: Technical key for batch.
  • executor_tk: Technical key for executor.
  • parent_executor_tk: Technical key for parent executor.
  • root_executor_tk: Technical key for root executor.
  • step_tk: Technical key for job entry.
  • execution_timestamp: Execution timestamp.
  • rows_input: Rows read.
  • rows_output: Rows written.
  • rows_read: Rows read from input stream.
  • rows_written: Rows written to output stream.
  • rows_rejected: Rows rejected.
  • errors: Errors.
  • result: Y or N.
  • nr_result_rows: Result rows after execution.
  • nr_result_files: Result files after execution.

Execution performance fact table (fact_perf_execution)

  • execution_date_tk: Technical key for date.
  • execution_time_tk: Technical key for time.
  • batch_tk: Technical key for batch.
  • executor_tk: Technical key for executor.
  • parent_executor_tk: Technical key for parent executor.
  • root_executor_tk: Technical key for root executor.
  • step_tk: Technical key for step.
  • seq_nr: Snapshot sequence number.
  • step_copy: Step copy number.
  • execution_timestamp: Execution timestamp.
  • rows_input: Rows read during interval.
  • rows_output: Rows written during interval.
  • rows_read: Rows read from previous steps.
  • rows_written: Rows written to following steps.
  • rows_rejected: Rows rejected during interval.
  • errors: Errors.
  • input_buffer_rows: Input buffer size at snapshot time.
  • output_buffer_rows: Output buffer size at snapshot time.

Clean up the DI Operations Mart tables

Run a cleanup job or transformation to delete data older than a maximum age.

  1. In Spoon, open:

    • Clean_up_PDI_Operations_Mart.kjb (job) or

    • Clean_up_PDI_Operations_Mart_fact_table.ktr (transformation)

  2. Set parameters:

    • max.age.days (required): maximum data age in days

    • schema.prefix (optional): for PostgreSQL, set <schema>. (with trailing period)

  3. Run it.

See the Pentaho Data Integration document for scheduling cleanup.
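If you schedule the cleanup outside of Spoon, a hedged example of running the job from the command line with Kitchen follows; the repository name, credentials, directory path, and parameter values are assumptions for your environment.

# Run the cleanup job against the Pentaho Repository with a 365-day retention;
# schema.prefix includes the trailing period, as noted above
./kitchen.sh -rep=PentahoRepository -user=admin -pass=password \
  -dir="/public/Pentaho Operations Mart" \
  -job=Clean_up_PDI_Operations_Mart \
  -param:max.age.days=365 \
  -param:schema.prefix=pentaho_operations_mart.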
