Optimize Pentaho Analyzer

Once you've properly tuned your data warehouse, you can move on to tuning your ROLAP schema, the Mondrian engine, and the Analyzer client tool.

Partitioning high-cardinality dimensions

If you cannot avoid creating high-cardinality dimensions, then you must devise a strategy to make them more performant without reducing their size. Typically a database will partition large tables, which makes querying one partition a quick operation. However, the Analyzer engine does not have any way of detecting which tables are partitioned and which are not. Therefore, MDX queries will be translated into SQL statements that are too broad, resulting in a query that traverses all of a table's partitions.

To instruct the Analyzer engine to properly address a (partitioned) high-cardinality dimension, you must modify the ROLAP schema and explicitly set the highCardinality property of the ElementCubeDimension element to true on each applicable dimension. This will streamline SQL generation for partitioned tables; ultimately, only the relevant partitions will be queried, which could greatly increase query performance.

Mondrian log analysis

To determine Analyzer performance problems, you can view a log of the Analyzer engine and your data warehouse database. This will reveal information about the infrastructure and the SQL and MDX queries involved in your Analyzer calculations. Your DBA should perform the initial database performance-tuning work by looking at the database logs, ensuring that statistics are up to date (access plans are computed and rational) and that your usage is profiled. Make sure the aggregation levels are based on the top 50-80 common uses.

Base all of your performance tuning on this data; it will tell you everything that you need to know about bottlenecks in your data structure.

You can also determine the causes behind hanging queries in an Analyzer report by viewing Mondrian log information directly through the Analyzer interface:

  1. Log into the BA Server as an administrator.

  2. Create or load an Analyzer report.

  3. Click the More actions and options icon in the report toolbar, and select Administration > Clear Cache. Click Ok.

  4. Click the icon again and select Administration > XML. Click Ok.

  5. Click the icon again and select Administration > Log.

A new browser tab will open with log information about the open report. You can refresh this page to see the query progress in real time. The following log entries are the most important to watch out for:

  • If each SQL query is reported twice. The first time is for Mondrian to get the first record and the second time is to retrieve all records

  • SQL queries with high execution times

  • SQL queries that return large volumes of data (more than 1000 rows)

  • SQL queries that don't join tables

  • SQL queries that don't include filters

  • This log entry: WARN mondrian.rolap.RolapUtil Unable to use native SQL evaluation for 'NonEmptyCrossJoin'; reason: arguments not supported. If you see this, try switching the contains filter into an includes filter, or make the contains filter more selective

Configuring Pentaho Analyzer for large data warehouses

Analyzer has some low-level configuration options that will improve performance when working with large data warehouses and high-cardinality dimensions:

  • filter.members.max.count=500

  • filter.dialog.apply.report.context=false

  • filter.dialog.useTopCount=true

  • report.request.service.result.expire.time.seconds=30

  • report.request.service.result.cleanup.time.seconds=300

These analyzer.properties settings are explained in the following table:

Property

Description

filter.members.max.count

Controls the maximum number of values to show in the filter dialogue, such as include/exclude filters and date range dropdowns.

filter.dialog.apply.report.context

If set to true, when showing available members in the filter dialog, Analyzer will limit those members to the existing filters or measures on the report. This means that when retrieving the list of members, Analyzer will perform the join in the fact table and then apply dimension filters. For a high-cardinality dimension, this may significantly reduce the list of members loaded into memory.

filter.dialog.useTopCount

If both this and mondrian.native.topcount.enable in mondrian.properties are set to true, when showing the first set of members in the filter dialogue, Analyzer will only show that set of members sorted within hierarchy. For high-cardinality dimensions, this is required to avoid loading all members into memory. However, if a user uses the Find box in the filter dialogue or if you have filter.dialog.apply.report.context set to true, then the TopCount will not be used.

report.request.service.result.expire.time.seconds

Report results are released after this amount of time has passed.Analyzer report requests are processed asynchronously and immediately cleaned up after the first download. While this is efficient because clients usually don't need to download a report more than once, it causes issues with popup blockers that will block the first download and re-submit the download after prompting the user. If you expire the request after 30 seconds, you will work around the popup blocker issues while also enabling people to refresh the browser to re-download a report. This only applies to PDF, Excel or CSV downloads.

report.request.service.result.cleanup.time.seconds

Report result cleanup occurs after this amount of time.

Configuring the Mondrian engine for large data warehouses

There are several mondrian.properties options that control how the Analyzer engine interacts with large data warehouse volumes in conjunction with Pentaho Analyzer:

  • mondrian.result.limit=5000000

  • mondrian.rolap.iterationLimit=5000000

  • mondrian.rolap.queryTimeout=300

  • mondrian.native.crossjoin.enable=true

  • mondrian.native.topcount.enable=true

  • mondrian.native.filter.enable=true

  • mondrian.native.nonempty.enable=true

  • mondrian.rolap.maxConstraints=1000

  • mondrian.native.ExpandNonNative=true

  • mondrian.expCache.enable=true

These mondrian.properties settings are explained in the following table:

Property
Description

mondrian.result.limit

Controls the largest cross join size that Mondrian will handle in-memory. Ideally, no queries should involve large cross joins in-memory; instead, they should be handled by the database.

mondrian.rolap.iterationLimit

This is similar to mondrian.result.limit, except this applies to calculating aggregates in-memory such as SUM, MAX, AGGREGATE, etc. This should be set to the same value as mondrian.result.limit.

mondrian.rolap.queryTimeout

If any query runs past this number of seconds, then the query is immediately cancelled. The total sum of all SQL statements to process a single MDX statement must be less than this timeout. Setting this to zero disables query timeout, which is not recommended because runaway queries can deprive system resources from other necessary processes.

mondrian.native.crossjoin.enable

If this is set to true, when Mondrian needs to cross join multiple dimensions in a report, if the cross join is non-empty, meaning a fact table relationship has been defined, then the join operation is done using SQL. The resultant SQL query returns only combined dimension members that actually have fact data. This reduces the amount of tuples that need to be processed and is critical for performance on high-cardinality dimensions.

mondrian.native.topcount.enable

If set to true, when fetching the first set of records for the filter dialog, Mondrian will only read that set of records into memory. If set to false, all records from the dimension level will be read into memory.

mondrian.native.nonempty.enable

If set to true, when fetching the first set of records for the filter dialog, Mondrian will only read that set of records into memory. If set to false, all records from the dimension level will be read into memory.

mondrian.rolap.maxConstraints

This should be set to the largest number of values that the data warehouse database supports in an IN list.

mondrian.native.ExpandNonNative

Works in conjunction with native evaluation of cross joins. If set to true, Mondrian will expand cross join inputs to simple member lists that are candidates for pushdown.

Redesigning Analyzer reports for maximum performance

Once you have an idea of what you want to show with your Analyzer report, you will almost certainly have to redesign it to be more performant. Because an Analyzer report is basically a hierarchical list of actions, the order in which fields and filters are added to the report can make a big difference in query response time. Even though this does not change the report's graphical output, what happens behind the scenes can make that output display more quickly.

When you re-create your reports, follow this process for best performance:

  1. Add and filter by low-cardinality dimensions first

  2. Add measures to the report

  3. Add high-cardinality dimensions last

    Note: When filtering, always choose include/exclude over contains/doesn't contain.

Mondrian cache control

You can control the cache infrastructure that the Pentaho Analyzer engine uses for OLAP data. See Install Pentaho Data Integration and Analytics for details about Mondrian cache control.

Last updated

Was this helpful?