Multidimensional Data Modeling in Pentaho
Pentaho Business Analytics is built on the Mondrian online analytical processing (OLAP) engine. OLAP relies on a multidimensional data model that, when queried, returns a dataset that resembles a grid. The rows and columns that describe and bring meaning to the data in that grid are dimensions, and the hard numerical values in each cell are the measures or facts. In Pentaho Analyzer, dimensions are shown in yellow and measures are in blue.
OLAP requires a properly prepared data source in the form of a star or snowflake schema that defines a logical multi-dimensional database and maps it to a physical database model. When you have the initial data structure in place, you must design a descriptive layer for it in the form of a Mondrian schema, which consists of one or more cubes, hierarchies, and members. Only when you have a tested and optimized Mondrian schema is the data prepared on a basic level for end-user tools like Pentaho Analyzer.
Pentaho also offers expanded functionality in Pentaho Analysis Enterprise Edition, including:
The Pentaho Analyzer visualization tool.
A pluggable Enterprise Cache with support for highly scalable, distributable cache implementations including Infinispan and Memcached.
A special Pentaho Server package must also be installed. This process is covered in Installation of the Pentaho design tools.
All relevant configuration options for these features are covered in this section.
Prepare your data
To prepare data for use with the Pentaho Analyzer and Report Designer, you should perform the following basic tasks:
Design a star or snowflake schema
The entire process starts with a data warehouse. This section will not attempt to explain how to build this structure; there are entire books on the subject, and an entire consulting industry dedicated to it already. The end result should be a data model in the star or snowflake schema pattern. You do not have to worry too much about getting the model exactly right on your first try. Just cover all of your anticipated business needs; part of the process is coming back to the data warehouse design step and making changes to your initial data model after you have discovered what your operational needs are.
Populate the star or snowflake schema
Once your data model is designed, the next step is to populate it with actual data, thereby creating your data warehouse. The best tool for this job is Pentaho Data Integration, an enterprise-grade extract, transform, and load (ETL) application.
Build a Mondrian schema
Now that your initial data warehouse project is complete, you must build a Mondrian schema to organize and describe it in terms that Pentaho Analyzer can understand. You can use the Pentaho Schema Workbench to create an analysis schema.
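As an illustration, a minimal Mondrian schema might look like the following. All table, column, cube, and level names here are hypothetical; substitute the names from your own star schema.

```xml
<Schema name="SalesAnalysis">
  <Cube name="Sales">
    <!-- Fact table at the center of the star schema (hypothetical name) -->
    <Table name="fact_sales"/>
    <Dimension name="Time" foreignKey="time_id">
      <Hierarchy hasAll="true" primaryKey="time_id">
        <Table name="dim_time"/>
        <Level name="Year" column="year" type="Numeric" uniqueMembers="true"/>
        <Level name="Month" column="month" uniqueMembers="false"/>
      </Hierarchy>
    </Dimension>
    <!-- A measure aggregates a fact table column -->
    <Measure name="Unit Sales" column="units" aggregator="sum" formatString="#,###"/>
  </Cube>
</Schema>
```

Schema Workbench generates and validates this XML for you; editing it by hand is optional.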
Initial testing
At this point, you should have a multi-dimensional data structure with an appropriate metadata layer. You can now start using the data inspection tools to drill down into your data and see if your first attempt at data modeling was successful. See the Pentaho Data Integration document for details on our data inspection tools. In all likelihood, it will need some adjustment, so take note of all of the schema limitations that you are unhappy with during this initial testing phase. Do not be concerned with performance issues at this time, just concentrate on the completeness and comprehensiveness of the data model.
Adjust and repeat until satisfied
Use the notes you took during the testing phase to redesign your data warehouse and Mondrian schema appropriately. Adjust hierarchies and relational measure aggregation methods. Create virtual cubes for analyzing multiple fact tables by conforming dimensions. Re-test the new implementation and continue to refine the data model until it matches your business needs perfectly.
Test for performance
Once you are satisfied with the design and implementation of your data model, you should try to find performance problems and address them by tuning your data warehouse database, and by creating aggregation tables. The testing can only be reasonably done by hand, using Pentaho Analyzer. Take note of all of the measures that take an unreasonably long time to calculate. Also, enable SQL logging and locate slow-performing queries, and build indexes for optimizing query performance.
Create aggregation tables
Using your notes as a guide, create aggregation tables in Pentaho Aggregation Designer to store frequently computed Analyzer reports. Re-test and create new aggregation tables as necessary. If you are working with a relatively small data warehouse or a limited number of dimensions, you may not have a real need for aggregation tables. However, be aware of the possibility that performance issues may come up in the future. Check in with your users occasionally to see if they have any particular concerns about the speed of their content.
Deploy to production
Your data warehouse and Mondrian schema have been created, tested, and refined. You're now ready to put it all into production. You may need to provide or purchase Pentaho training for those in your organization who use Pentaho client tools.
Dimensional modeling
With the initial data structure in place, you can use dimensional modeling to design a descriptive layer. Dimensional modeling is the process of transforming data from multiple sources in non-human-friendly formats into a single data source that is organized to support business analytics. Below is a typical workflow for developing a dimensional model:
Collect user requirements for business logic and processes.
Considering the entirety of your data, break it down into subjects.
Isolate groups of facts into one or more fact tables.
Design dimensional tables which draw relationships between levels (fact groups).
Determine which members of each level are useful for each dimensional table.
Build and publish a Mondrian (Pentaho Analyzer) schema and collect feedback from users.
Refine your model based on user feedback, and continue iterating through this list until users are productive.
Or, expressed as a series of questions:
What topics or subjects are important to the users who are analyzing the data? What do your users need to learn from the data?
What are the important details your users will need to examine in the data?
How should each data column relate to other data columns?
How should datasets be grouped and organized?
What are some useful short descriptions for each dimensional level in a hierarchy? For each element, decide what is useful within that element. For instance, in a dimensional table representing time, your levels might be year, month, and day, and your members for the year level might be 2003, 2004, and 2005.
How effective is this dimensional model for the intended user base? How can it be improved?
Pentaho Data Integration offers data inspection tools to make dimensional modeling much easier than more traditional methods. Through PDI, you can quickly adjust your business logic, the granularity of your fact tables, and the attributes of your dimension tables, then generate a new model and push it out to a test environment for evaluation. See the Pentaho Data Integration document for details.
Understanding data cubes
Another name for a dimensional model is a cube. Each cube represents one fact table and several dimensional tables. This model should be useful for reporting and analysis on the subject of the data in the fact table. However, if you want to cross-reference this data with another cube (if you need to analyze data across two or more cubes, or need to combine information from two fact tables on the same subject but with different granularity) then you must create a virtual cube. The XML elements that compose a virtual cube are explained in detail below.
Note: Virtual cubes cannot presently be created through Pentaho Data Integration's model perspective; you must use Schema Workbench instead.
The <CubeUsages> element specifies the cubes that are imported into the virtual cube. It holds <CubeUsage> elements.
The <CubeUsage> element specifies the base cube that is imported into the virtual cube. Alternatively you can define a <VirtualCubeMeasure> and use similar imports from the base cube without defining a <CubeUsage>. The cubeName attribute specifies the name of the base cube. The ignoreUnrelatedDimensions attribute determines whether or not the measures from this base cube will have non-joining dimension members pushed to the top level member. This attribute is false by default because it is still experimental.
The <VirtualCubeDimension> element imports a dimension from one of the constituent cubes. If you do not specify the cubeName attribute, this means you are importing a shared dimension.
Note: If a shared dimension is used more than once in a cube, there is no way to determine which usage of the shared dimension you intend to import.
The <VirtualCubeMeasure> element imports a measure from one of the constituent cubes. It is imported with the same name. If you want to create a formula or rename a measure as you import it, use the <CalculatedMember> element instead.
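Putting these elements together, a virtual cube combining two hypothetical base cubes, Sales and Warehouse, might be sketched as follows (all names are illustrative only):

```xml
<VirtualCube name="Sales and Warehouse">
  <CubeUsages>
    <CubeUsage cubeName="Sales" ignoreUnrelatedDimensions="true"/>
    <CubeUsage cubeName="Warehouse"/>
  </CubeUsages>
  <!-- Shared dimensions: no cubeName attribute -->
  <VirtualCubeDimension name="Time"/>
  <VirtualCubeDimension name="Products"/>
  <!-- A dimension belonging to only one constituent cube -->
  <VirtualCubeDimension cubeName="Sales" name="Gender"/>
  <!-- Measures imported from each base cube, keeping their names -->
  <VirtualCubeMeasure cubeName="Sales" name="[Measures].[Unit Sales]"/>
  <VirtualCubeMeasure cubeName="Warehouse" name="[Measures].[Units Ordered]"/>
</VirtualCube>
```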
Virtual cubes are useful for situations where there are fact tables of different granularities (for instance, one Time fact table might be configured on a Day level, another at the Month level), or fact tables of different dimensionalities (for instance one on Products, Time and Customer, another on Products, Time and Warehouse), and you need to present the results to users who don't know how the data is structured.
Any common dimensions (shared dimensions used by both constituent cubes) are automatically synchronized. Suppose the constituent cubes are Sales and Warehouse, and [Time] and [Products] are their common dimensions. If the context is ([Time].[2005].[Q2], [Products].[Productname].[P-51-D Mustang]), measures from either cube will relate to this context.
Dimensions which belong to only one cube are called non-conforming dimensions. The [Gender] dimension is an example of this; it exists in the Sales cube, but not in the Warehouse cube. If the context is ([Gender].[F], [Time].[2005].[Q1]), it makes sense to ask the value of the [Unit Sales] measure (which comes from the [Sales] cube) but not the [Units Ordered] measure (from [Warehouse]). In the context of [Gender].[F], [Units Ordered] has value NULL.
Map a model with Schema Workbench
With a physical multidimensional data model in place, you must create a logical model that maps to it. A Mondrian schema is essentially an XML file that performs this mapping, thereby defining a multidimensional database structure. You can create Mondrian schemas using the Pentaho Schema Workbench. See the Pentaho Schema Workbench document for instructions.
Configure Mondrian engine
The Mondrian engine is configured through mondrian.properties.
Use these settings to improve performance and behavior in Analyzer.
Manage Analyzer data sources
To use a prepared data source with Pentaho Analyzer, you must define it first.
Use one of these approaches:
Create a Native (JDBC) data source in the User Console.
Create a JNDI data source in your web application server.
Note the data source name, because you must supply it later in Pentaho Schema Workbench.
Schema Workbench supports one data source per Mondrian schema.
You can define multiple data sources, but you can use only one at a time.
Edit the Mondrian properties file
mondrian.properties is in pentaho/server/pentaho-server/pentaho-solutions/system/mondrian.
Edit this file to tune Mondrian for your environment.
This guidance targets Pentaho Analyzer performance.
For all settings, see http://mondrian.pentaho.com/documentation/configuration.php.
Performance settings
Push as much work to the database as possible.
This matters most for high-cardinality dimensions.
It avoids large, sparse Cartesian products in memory.
mondrian.expCache.enable=true: Improves summary calculations, like ranks and running sums.
mondrian.native.crossjoin.enable=true: Enables database pushdown for cross joins, which prevents large in-memory cross joins.
mondrian.native.filter.enable=true: Pushes NOT IN and exclude filters to the database.
mondrian.native.nonempty.enable=false: Optimizes SQL generation for Analyzer-generated MDX and keeps NON EMPTY cross joins in the database.
mondrian.olap.maxConstraints=1000: Used with mondrian.native.ExpandNonNative. Increase this as your database allows.
mondrian.native.ExpandNonNative=true: Pushes down more cross joins by materializing inputs into IN lists when needed.
mondrian.olap.elements.NeedDimensionPrefix=true: Analyzer generates dimension-qualified members, so Mondrian can skip cross-dimension member searches.
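Collected in mondrian.properties, the performance settings above would read:

```properties
# Performance-oriented settings for Pentaho Analyzer workloads
mondrian.expCache.enable=true
mondrian.native.crossjoin.enable=true
mondrian.native.filter.enable=true
mondrian.native.nonempty.enable=false
mondrian.olap.maxConstraints=1000
mondrian.native.ExpandNonNative=true
mondrian.olap.elements.NeedDimensionPrefix=true
```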
Usability settings
These settings reduce user-facing failures and confusing results.
mondrian.result.limit=5000000: Caps the largest cross join Mondrian will process in memory. This limit avoids long server hangs when pushdown fails, and Analyzer can warn users when the limit is reached.
mondrian.olap.case.sensitive=true: Preserves casing for equals filters. For example, John Doe matches only John Doe.
mondrian.olap.ignoreInvalidMembers=true: Keeps saved reports running when members disappear after ETL.
mondrian.olap.ignoreInvalidMembersDuringQuery=true: Use this with mondrian.olap.ignoreInvalidMembers.
mondrian.olap.iterationLimit=5000000: Limits aggregate evaluation work. This is similar to mondrian.result.limit.
mondrian.olap.compareSiblingsByOrderKey=true: Enables A-Z and Z-A member sorting and fixes a Mondrian sorting issue.
mondrian.olap.NullDenominatorProducesNull=true: Controls divide-by-null behavior, for example Booked Deals / Quota. When true, divide-by-null returns null, and NON EMPTY can then filter those rows out. When false, divide-by-null can return infinity.
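Likewise, the usability settings above can be entered in mondrian.properties as:

```properties
# Usability-oriented settings for Pentaho Analyzer
mondrian.result.limit=5000000
mondrian.olap.case.sensitive=true
mondrian.olap.ignoreInvalidMembers=true
mondrian.olap.ignoreInvalidMembersDuringQuery=true
mondrian.olap.iterationLimit=5000000
mondrian.olap.compareSiblingsByOrderKey=true
mondrian.olap.NullDenominatorProducesNull=true
```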
Mondrian cache control
You can configure and control the cache infrastructure that the Pentaho Analyzer engine uses for OLAP data, which is useful for properly updating your OLAP cubes when your data warehouse is refreshed, and for performance-tuning.
Most of the advanced cache features explained here are for Enterprise Edition deployments only. Within that, most of the Enterprise Edition features of the Analysis engine are only beneficial to large, multi-node OLAP deployments that are performing poorly.
The Analysis engine does not ship with a segment cache, but it does have the ability to use third-party cache systems. If you've installed Pentaho Analysis Enterprise Edition, then you have a default configuration for the JBoss Infinispan distributed cache, though the actual Infinispan software is not included and must be downloaded separately. Infinispan supports a wide variety of sub-configurations and can be adapted to cache in memory, to the disk, to a relational database, or (the default setting) to a distributed cache cluster.
The Infinispan distributed cache is a highly scalable solution that distributes cached data across a self-managed cluster of Mondrian instances. Mondrian instances running the Analysis Enterprise Edition plugin on a local network automatically discover each other using UDP multicast. An arbitrary number of segment data copies are stored across all available nodes. The total size of the cache will be the sum of all of the nodes' capacities, divided by the number of copies to maintain. This is all fully configurable; options are explained later in this section.
Switch to Memcached
Use this procedure when you want Mondrian segment caching to use Memcached. This replaces the default Infinispan cache framework configuration.
Pentaho and Mondrian developers recommend against using Memcached. Infinispan usually gives better performance.
Before you begin
Set up your Memcached cluster.
Install the Analysis Enterprise Edition package on your Pentaho Server or standalone Mondrian engine.
Switch from Infinispan to Memcached
Shut down the Pentaho Server or standalone Mondrian engine.
Verify that the required JARs exist in WEB-INF/lib/ in your deployed pentaho.war or Mondrian engine. If you installed the Analysis Enterprise Edition package, you usually already have these files.
Required JARs:
pentaho-analysis-ee
commons-lang
commons-io
commons-codec
pentaho-ee-dsc-core
memcached
Edit pentaho-analysis-config.xml in WEB-INF/classes/ and set SEGMENT_CACHE_IMPL.
Edit memcached-config.xml in WEB-INF/classes/ and set SALT, SERVERS, and WEIGHTS.
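As a sketch of those two edits: both files are Java XML properties files, and the implementation class name below is an assumption based on the plugin's package naming, so verify it against the comments in your own pentaho-analysis-config.xml. The server addresses are hypothetical.

```xml
<!-- pentaho-analysis-config.xml: select the Memcached implementation.
     Class name is an assumption; check your file's comments for the exact value. -->
<entry key="SEGMENT_CACHE_IMPL">com.pentaho.analysis.segmentcache.impl.memcached.MemcachedSegmentCache</entry>

<!-- memcached-config.xml: two hypothetical nodes, the second with double capacity -->
<entry key="SERVERS">192.168.0.1:1642,192.168.0.2:1642</entry>
<entry key="WEIGHTS">1,2</entry>
<entry key="SALT">yourSecretKey</entry>
```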
Restart the Pentaho Server or Mondrian engine.
Memcached configuration options
Set these options in memcached-config.xml in WEB-INF/classes/ in your deployed pentaho.war or Mondrian engine.
This list is not comprehensive. It covers the options most critical for Pentaho Analyzer.
SERVERS: Comma-separated list of Memcached nodes and port numbers.
WEIGHTS: Comma-separated list of relative capacities for the servers in SERVERS. Use one value per server. For example, 1,2 means the second server has double capacity.
SALT: Secret key prefix for segment data. Use the same value on all Mondrian nodes. If it differs, nodes cannot share cached data.
Segment cache architecture
Each Mondrian segment cache node, regardless of configuration, loads the segments required to answer a query into system memory. This cache space is called the query cache. It is composed of hard Java references to segment objects. Each node must have enough memory available to answer any query. Mondrian uses optimized data structures that usually take only a few megabytes, even for queries returning thousands of rows.
After the query finishes, Mondrian usually tries to keep the data locally by using a weak reference to the segment data object. A weak reference does not force the JVM to keep an object in memory. As a node keeps answering queries, the JVM may free that space for higher-priority work, such as answering a larger query. This cache is the local cache.
You can enable or disable the local cache in the Pentaho Analysis EE configuration file. Set DISABLE_LOCAL_SEGMENT_CACHE to true or false. This setting does not affect the query cache.
The segment cache features in this topic target large OLAP deployments.
How the Analysis engine uses memory
This is the order in which Mondrian obtains data for a required segment after receiving a query:
Parse the query and determine which segments are required.
Check the local cache, if enabled.
If the data is not in the local cache, check the external segment cache (Pentaho Analysis plugin). If found, copy it into the query cache.
If the data is not in the external cache, load it from SQL and store it in the query cache.
If the data was loaded from SQL, send the segment to the external cache so other Mondrian nodes can reuse it.
Answer the query.
Release the data from the query cache.
If the local cache is enabled, keep a weak reference to the data in the local cache.

Cache control and propagation
The CacheControl API lets you modify the cache contents of a node. It controls both the data cache and the OLAP schema member cache. The API is documented in the Mondrian project documentation at http://mondrian.pentaho.com/.
When you flush a segment region on a node, that node propagates the change to the external cache through the SegmentCache SPI. If nodes are not using local cache space, the next node that receives a query requiring that segment data will likely fetch it again through SQL. After loading from SQL, the node stores the segment in the external segment cache again.
Do not use the local cache space when you use the external cache. For this reason, the local cache is disabled by default in Pentaho Analysis Enterprise Edition.
Local cache space can improve performance because it increases data locality. It also requires cluster-wide propagation of cache control operations. Mondrian nodes do not propagate cache control operations across cluster members. If you deploy a cluster of Mondrian nodes and do not propagate changes manually across all nodes, some nodes will answer queries with stale data.
Cache configuration files
You can configure the Pentaho analysis cache frameworks by editing specific settings files.
The following files contain configuration settings for Pentaho analysis cache frameworks. All of them are in the same directory inside of the deployed pentaho.war: /WEB-INF/classes/.
pentaho-analysis-config.xml: Defines the global behavior of the Pentaho Analysis Enterprise Edition plugin. Settings in this file enable you to define which segment cache configuration to use, and to turn off the segment cache altogether.
infinispan-config.xml: The Infinispan segment cache settings file. It configures the Infinispan system.
jgroups-udp.xml: Configures the cluster backing the Infinispan cache. It defines how the nodes find each other and how they communicate. By default, Pentaho uses UDP and multicast discovery, which enables you to run many instances on a single machine or many instances on many machines. Examples of other communication setups are included in the JAR archive. This file is referenced by Infinispan as specified in the infinispan-config.xml configuration file.
memcached-config.xml: Configures the Memcached-based segment cache. It is not used by default. To enable it, modify SEGMENT_CACHE_IMPL in pentaho-analysis-config.xml.
Modify the JGroups configuration
The default Infinispan configuration uses JGroups to distribute the cache across all Mondrian instances it finds on the local network. If you want to modify how those communications are done, you must edit the JGroups configuration file.
The segment cache features in this topic target large OLAP deployments.
Each node might require a different configuration. The default configuration is highly portable.
If you are deploying this plugin on Amazon EC2, use the EC2-specific JGroups configuration file that was copied to your /WEB-INF/classes/ directory when you installed the Analysis Enterprise Edition package.
Default JGroups configuration files are inside of the JAR archive.
Fine-grained JGroups configuration is covered in the JGroups documentation. Read it before making changes.
To switch implementations, edit infinispan-config.xml and set the appropriate configurationFile value:
UDP communication:
TCP communication:
Amazon EC2:
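For example, assuming the JGroups files inside the plugin JAR keep their conventional names (jgroups-udp.xml, jgroups-tcp.xml, jgroups-ec2.xml — only the first is confirmed above, so treat the others as assumptions), the relevant Infinispan setting might look like:

```xml
<!-- infinispan-config.xml: point the transport at the desired JGroups stack -->
<transport transportClass="org.infinispan.remoting.transport.jgroups.JGroupsTransport">
  <properties>
    <!-- UDP multicast (default); substitute jgroups-tcp.xml or jgroups-ec2.xml as needed -->
    <property name="configurationFile" value="jgroups-udp.xml"/>
  </properties>
</transport>
```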
Switch to another cache framework
Pentaho ships with configuration files that assume a JBoss Infinispan deployment.
The segment cache features in this topic are for very large OLAP deployments.
Pentaho recommends Infinispan for best OLAP performance.
Choose your cache framework
Switch to Pentaho Platform Delegating Cache
You must install the Analysis Enterprise Edition package on the Pentaho Server.
Use this option to share the Pentaho Server solution cache with the Analyzer segment cache.
This cache system is experimental.
Do not use it in production.
Use a custom SegmentCache SPI
Use this approach when you want your own cache implementation.
Create a Java class that implements mondrian.spi.SegmentCache.
Compile your class and add it to Mondrian's classpath.
Edit mondrian.properties and set mondrian.olap.SegmentCache to your fully qualified class name.
Restart the Pentaho Server or Mondrian engine.
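The mondrian.properties entry from the steps above would look like this, with a hypothetical class name:

```properties
# Fully qualified name of your custom mondrian.spi.SegmentCache implementation
# (com.example.cache.MySegmentCache is a placeholder)
mondrian.olap.SegmentCache=com.example.cache.MySegmentCache
```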
Contact Pentaho Support for developer assistance.
Analysis schema security
You can restrict what users see in your Mondrian OLAP schema. You do this by defining roles in the schema. Then map those schema roles to Pentaho Server roles.
Pentaho Server then enforces those restrictions in Analyzer and Dashboard Designer.
Use Schema Workbench to change an OLAP schema. Republish the schema to Pentaho Server after changes.
Edit XML configuration files with a text editor. For example, Pentaho Server and Mondrian engine settings.
Restrict access to specific members
Restrict access to parts of a schema with the <HierarchyGrant> element. Use it to define access at the hierarchy level.
The following example grants access to most of California. It also hides Los Angeles.
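A sketch of such a grant, assuming a [Store] hierarchy whose members follow a Country/State/City pattern (adjust role, cube, and member names to match your schema):

```xml
<Role name="California manager">
  <SchemaGrant access="none">
    <CubeGrant cube="Sales" access="all">
      <HierarchyGrant hierarchy="[Store]" access="custom"
          topLevel="[Store].[Store Country]">
        <!-- Show California and all of its children... -->
        <MemberGrant member="[Store].[USA].[CA]" access="all"/>
        <!-- ...except Los Angeles -->
        <MemberGrant member="[Store].[USA].[CA].[Los Angeles]" access="none"/>
      </HierarchyGrant>
    </CubeGrant>
  </SchemaGrant>
</Role>
```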
access attribute values
The access attribute can be:
all: All members are visible.
none: The hierarchy is hidden.
custom: Customized access.
Custom access options
With access="custom", you can:
Use topLevel to define the highest visible level.
Use bottomLevel to define the lowest visible level.
Control member visibility with nested <MemberGrant> elements.
You can only define <MemberGrant> when the enclosing <HierarchyGrant> has access="custom". Member grants apply to a member and all its children.
Rules to keep in mind:
Members inherit access from their parents.
Grants are order-dependent.
A member is visible if any child is visible.
Member grants do not override topLevel and bottomLevel.
Mondrian role mapping in the Pentaho Server
The role mapper connects user role restrictions in a Mondrian schema to roles in the Pentaho Server. This lets you manage schema access control in one place.
If you do not configure a role mapper, roles defined in your schema do not restrict access in Pentaho Server.
Configure the role mapper
Configure the role mapper in pentaho-solutions/system/pentahoObjects.spring.xml.
The file includes disabled example configurations for each role mapper implementation. Enable the one you want to use.
Mapper implementations
Mondrian One-To-One UserRoleMapper
Mondrian-One-To-One-UserRoleMapper maps each Pentaho Server role name to a role in the OLAP schema.
Use this mapper when your schema role names match your Pentaho Server role names.
For example, if your schema has a role named CTO and the Pentaho Server also has a role named CTO, this mapper works well.
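In pentahoObjects.spring.xml, enabling this mapper amounts to uncommenting a bean definition along these lines (verify the class name against the commented examples in your own file; this is a sketch, not a definitive configuration):

```xml
<!-- One-to-one mapping: Pentaho Server role names are used as schema role names -->
<bean id="Mondrian-UserRoleMapper"
      name="Mondrian-One-To-One-UserRoleMapper"
      class="org.pentaho.platform.plugin.action.mondrian.mapper.MondrianOneToOneUserRoleListMapper"
      scope="singleton"/>
```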
Mondrian-SampleLookupMap-UserRoleMapper
Mondrian-SampleLookupMap-UserRoleMapper uses a translation table to map Pentaho Server roles to OLAP schema roles.
Lookups are key/value pairs:
Key: Pentaho Server role name
Value: OLAP schema role name
In this example, the ceo role in Pentaho Server maps to the California manager role in the schema.
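A sketch of the translation-table configuration for this mapper, using the ceo-to-California manager mapping described above (verify the class name against the commented examples in your own pentahoObjects.spring.xml):

```xml
<bean id="Mondrian-UserRoleMapper"
      name="Mondrian-SampleLookupMap-UserRoleMapper"
      class="org.pentaho.platform.plugin.action.mondrian.mapper.MondrianLookupMapUserRoleListMapper"
      scope="singleton">
  <property name="lookupMap">
    <map>
      <!-- Key: Pentaho Server role; value: OLAP schema role -->
      <entry key="ceo" value="California manager"/>
    </map>
  </property>
</bean>
```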
Mondrian-SampleUserSession-UserRoleMapper
Mondrian-SampleUserSession-UserRoleMapper retrieves OLAP schema roles from a named HTTP session variable.
In this example, the session variable name is MondrianUserRoles.
OLAP log output
The Pentaho Server and the standalone Mondrian engine use log4j2 to log generated OLAP SQL queries. This output is useful for troubleshooting configuration and performance issues.
Analysis SQL output logging
You can log SQL generation, MDX generation, and debug MDX execution information in the Pentaho Server, Mondrian engine, or Schema Workbench.
The default log4j2.xml configuration in the Pentaho Server includes the following rolling file appenders:
SQLLOG: Logs SQL generated by Mondrian, including execution times and row counts. Re-running the same MDX query often produces no new SQL logs, because Mondrian caches results from the first run.
MDXLOG: Logs MDX executed by Mondrian, including execution times.
MONDRIAN: Logs detailed MDX execution events. This output is verbose; use it only for targeted troubleshooting.
To enable SQL logging:
Stop the Pentaho Server, standalone Mondrian engine, or Schema Workbench.
Open the log4j2.xml file in a text editor:
Pentaho Server: server/pentaho-server/tomcat/webapps/pentaho/WEB-INF/classes/log4j2.xml
Standalone Mondrian engine: the location depends on your runtime; log4j2.xml can be loaded from the classpath or passed as a JVM parameter.
Schema Workbench: check workbench.bat or workbench.sh to find the log4j2.xml location.
Add the appender and its logger. In the Pentaho Server, each appender has a corresponding logger. For a standalone Mondrian engine or Schema Workbench, copy the relevant appenders and loggers from a Pentaho Server log4j2.xml.
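For example, a SQLLOG appender and its logger follow this general shape in log4j2.xml. The file path, layout pattern, and rollover policy here are illustrative; copy the exact definitions from a Pentaho Server log4j2.xml when possible. The mondrian.sql logger name is the category Mondrian uses for generated SQL.

```xml
<RollingFile name="SQLLOG" fileName="../logs/mondrian_sql.log"
             filePattern="../logs/mondrian_sql.log.%d{yyyy-MM-dd}">
  <PatternLayout pattern="%d %-5p [%t] %c - %m%n"/>
  <Policies>
    <TimeBasedTriggeringPolicy/>
  </Policies>
</RollingFile>

<Logger name="mondrian.sql" level="DEBUG" additivity="false">
  <AppenderRef ref="SQLLOG"/>
</Logger>
```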
Save the file.
Start the Pentaho Server, Mondrian engine, or Schema Workbench.
You can now review Analysis query logs in your application server logs. For segment-cache-specific logging, also enable the segment cache categories described in Enabling segment cache logging.
Enabling segment cache logging
The Pentaho Server lets you view engine segment cache logs by opening the log window in Analyzer.
If you use Analyzer with a standalone Mondrian engine, cache log information is available through log4j2. Update your log4j2.xml as described in Analysis SQL output logging, and set com.pentaho.analysis.segmentcache to DEBUG.
The following log4j2 categories can help diagnose configuration and performance issues:
com.pentaho.analysis.segmentcache.impl.infinispan
Outputs information related to the Infinispan segment cache implementation.
com.pentaho.analysis.segmentcache.impl.memcached
Outputs information related to the Memcached segment cache implementation.
com.pentaho.analysis.segmentcache.impl.pentaho
Outputs information related to the Pentaho BI Platform Delegating Cache implementation.
View log output in Analyzer
To view log output in Analyzer:
Create a new Pentaho Analyzer report.
Drag a measure or dimension into the report.
When the report displays results, select More actions and options > Administration > Log.

If you do not see Administration, you are not signed in as an administrator. Sign out of the Pentaho User Console, then sign back in with an administrator account.
If your plugin is configured correctly, you will see segment cache entries. If configuration errors exist, Analyzer shows the related exception messages.
Multidimensional Expression Language
Multidimensional Expression Language (MDX) is an OLAP query and calculation language. It is similar to SQL. Use MDX to retrieve datasets from an OLAP database.
You can use Pentaho tools without knowing MDX. Learn the basics to test Mondrian schemas.
MDX syntax
If you know SQL, much of MDX syntax looks familiar. You can edit MDX in Analyzer or a text editor.
MDX has six data types:
Dimension or hierarchy
Level
Member
Tuple
Scalar
Set
Example MDX query:
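For instance, a representative query against a hypothetical Sales cube (all cube, measure, and member names are illustrative):

```mdx
SELECT
  {[Measures].[Unit Sales], [Measures].[Store Sales]} ON COLUMNS,
  {[Time].[2004], [Time].[2005]} ON ROWS
FROM [Sales]
WHERE ([Store].[USA].[CA])
```

As in SQL, SELECT names what to retrieve and FROM names the source; unlike SQL, the axes hold sets of members, and the WHERE clause slices the cube rather than filtering rows.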
Microsoft originally developed MDX for SQL Server analysis products. It later became an independent standard.
See these references:
MDX implementations vary by product. Pentaho Analysis does not support every MDX function or extension.
Mondrian schema element reference
Mondrian schema elements are listed below in the hierarchy in which they are used. Element details appear later in this section.
A complete Mondrian schema; a collection of cubes, virtual cubes, shared dimensions, and roles.
A collection of dimensions and measures, all centered on a fact table.
A cube defined by combining the dimensions and measures of one or more cubes. A measure originating from another cube can be a <CalculatedMember>.
Base cubes that are imported into a virtual cube.
Usage of a base cube by a virtual cube.
Usage of a dimension by a virtual cube.
Usage of a measure by a virtual cube.
Defines a dimension: a collection of hierarchies.
Grants or denies this role access to a dimension.
Usage of a shared dimension by a cube.
Specifies a predefined drill-down.
A level of a hierarchy.
SQL expression used as key of the level, in lieu of a column.
SQL expression used to compute the name of a member, in lieu of Level.nameColumn.
SQL expression used to compute the caption of a member, in lieu of Level.captionColumn.
SQL expression used to sort members of a level, in lieu of Level.ordinalColumn.
SQL expression used to compute the parent of a member, in lieu of Level.parentColumn.
A member property. The definition is contained in a hierarchy or level, but the property will be available to all members.
SQL expression used to compute the value of a property, in lieu of Property.column.
Specifies an aggregated numeric value.
A member whose value is derived using a formula, defined as part of a cube.
A set whose value is derived using a formula, defined as part of a cube.
A fact or dimension table.
Defines a table by using an SQL query, which can have different variants for different underlying databases.
Defines a table by joining a set of queries.
Defines a table using an inline dataset.
Maps a parent-child hierarchy onto a closure table.
Exclude a candidate aggregate table by name or pattern matching.
Declares an aggregate table to be matched by name.
Declares a set of aggregate tables by regular expression pattern.
Specifies name of the column in the candidate aggregate table which contains the number of fact table rows.
Tells Mondrian to ignore a column in an aggregate table.
Maps a foreign key in the fact table to a foreign key column in the candidate aggregate table.
Maps a measure to a column in the candidate aggregate table.
Maps a level to a column in the candidate aggregate table.
A definition of an aggregate table for a base fact table.
An access-control profile.
A set of rights to a schema.
A set of rights to a cube.
A set of rights to both a hierarchy and levels within that hierarchy.
A set of rights to a member and its children.
Definition of a set of rights as the union of a set of roles.
A reference to a role.
Imports a user-defined function.
Part of the definition of a hierarchy; passed to a MemberReader, if present.
Property of a calculated member.
Holds the formula text within a <NamedSet> or <CalculatedMember>.
Holder for <ColumnDef> elements.
Definition of a column in an <InlineTable> dataset.
Holder for <Row> elements.
Row in an <InlineTable> dataset.
Value of a column in an <InlineTable> dataset.
SQL expression used to compute a measure, in lieu of Measure.column.
The SQL expression for a particular database dialect.
Element details
CalculatedMember
A member whose value is derived using a formula, defined as part of a cube.
Attributes
name
String
Name of this calculated member
formatString
String
Format string with which to format cells of this member. See the Format class under mondrian.util in https://mondrian.pentaho.com/api/index.html for more details.
caption
String
A string displayed instead of the name. Can be localized from Properties file using #{propertyname}.
description
String
Description of this calculated member. Can be localized from Properties file using #{propertyname}.
formula
String
MDX expression which gives the value of this member. Equivalent to the Formula sub-element.
dimension
String
Name of the dimension which this member belongs to
hierarchy
String
Name of the hierarchy that this member belongs to
parent
String
Fully-qualified name of the parent member. If not specified, the member will be at the lowest level (besides the 'all' level) in the hierarchy.
visible
Boolean
Whether this member is visible in the user-interface. Default true.
Constituent Elements
MDX expression which gives the value of this member.
Property of a calculated member.
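As an illustration, a calculated member that derives profit from two existing measures might be declared as follows (the measure names are taken from Mondrian's FoodMart sample schema; substitute your own):

```xml
<CalculatedMember name="Profit" dimension="Measures">
  <!-- MDX formula; equivalent to using the formula attribute -->
  <Formula>[Measures].[Store Sales] - [Measures].[Store Cost]</Formula>
  <!-- Optional cell property for this member -->
  <CalculatedMemberProperty name="FORMAT_STRING" value="$#,##0.00"/>
</CalculatedMember>
```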
NamedSet
A set whose value is derived using a formula, defined as part of a cube.
Attributes
name
String
Name of this named set
caption
String
Caption of this named set. Can be localized from Properties file using #{propertyname}.
description
String
Description of this named set. Can be localized from Properties file using #{propertyname}.
formula
String
MDX expression which gives the value of this set. Equivalent to the Formula sub-element.
Constituent Elements
MDX expression which gives the value of this set
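A sketch of a named set, using FoodMart-style hierarchy and measure names for illustration:

```xml
<NamedSet name="Top Sellers">
  <!-- The five warehouses with the highest sales -->
  <Formula>TopCount([Warehouse].[Warehouse Name].Members, 5,
                    [Measures].[Warehouse Sales])</Formula>
</NamedSet>
```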
Table
A fact or dimension table.
Attributes
name
String
The name of the table
schema
String
Optional qualifier for table
alias
String
Alias to be used with this table when it is used to form queries. If not specified, defaults to the table name, but in any case, must be unique within the schema. You can use the same table in different hierarchies, but it must have different aliases.
Constituent Elements
The SQL WHERE clause expression to be appended to any select statement
Exclude a candidate aggregate table by name or pattern matching
A definition of an aggregate table for a base fact table.
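A minimal sketch of a table declaration with an optional filter, assuming a FoodMart-style fact table:

```xml
<Table name="sales_fact_1997" schema="foodmart" alias="fact">
  <!-- Appended to the WHERE clause of every generated query -->
  <SQL dialect="generic">customer_id IS NOT NULL</SQL>
</Table>
```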
View
Defines a table using a SQL query, which can have different variants for different underlying databases.
Attributes
alias
String
Alias to be used with this view when it is used to form queries
Constituent Elements
The SQL expression for a particular database dialect
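For example, a view can supply dialect-specific variants of the same query; Mondrian picks the best match for the underlying database and falls back to generic:

```xml
<View alias="FACT">
  <SQL dialect="generic">select * from "sales_fact_1997"</SQL>
  <SQL dialect="mysql">select * from sales_fact_1997</SQL>
</View>
```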
Join
Defines a table by joining a set of queries.
Attributes
leftAlias
String
Defaults to left's alias if left is a table, otherwise required.
leftKey
String
rightAlias
String
Defaults to right's alias if right is a table, otherwise required.
rightKey
String
Constituent Elements
A <Join> must define exactly two relations:
Left relation: <Table>, <View>, <Join>, or <InlineTable>
Right relation: <Table>, <View>, <Join>, or <InlineTable>
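For example, a snowflaked product dimension can join its two tables like this (table and key names follow the FoodMart sample):

```xml
<Join leftKey="product_class_id" rightKey="product_class_id">
  <Table name="product"/>
  <Table name="product_class"/>
</Join>
```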
InlineTable
Defines a table using an inline dataset.
Attributes
alias
String
Alias to be used with this table when it is used to form queries. If not specified, defaults to the table name, but in any case, must be unique within the schema. You can use the same table in different hierarchies, but it must have different aliases.
Constituent Elements
Holder for <ColumnDef> elements
Holder for <Row> elements
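A small illustrative inline table with two columns and two rows (names are hypothetical):

```xml
<InlineTable alias="severity">
  <ColumnDefs>
    <ColumnDef name="id" type="Numeric"/>
    <ColumnDef name="description" type="String"/>
  </ColumnDefs>
  <Rows>
    <Row>
      <Value column="id">1</Value>
      <Value column="description">High</Value>
    </Row>
    <Row>
      <Value column="id">2</Value>
      <Value column="description">Low</Value>
    </Row>
  </Rows>
</InlineTable>
```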
Closure
Specifies the transitive closure of a parent-child hierarchy.
This element is optional. It is recommended for better performance.
Attributes
parentColumn
String
The parent column
childColumn
String
The child column
Constituent Elements
A fact or dimension table
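A sketch of a parent-child level backed by a closure table, following the employee hierarchy in the FoodMart sample schema:

```xml
<Level name="Employee Id" column="employee_id" parentColumn="supervisor_id"
       nullParentValue="0" type="Numeric" uniqueMembers="true">
  <!-- The closure table holds every ancestor-descendant pair -->
  <Closure parentColumn="supervisor_id" childColumn="employee_id">
    <Table name="employee_closure"/>
  </Closure>
</Level>
```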
AggExclude
This element excludes a candidate aggregate table via name or pattern matching.
Attributes
pattern
String
A Table pattern not to be matched.
name
String
The Table name not to be matched.
ignorecase
Boolean
Whether or not the match should ignore case.
AggName
Declares an aggregate table to be matched by name.
Attributes
name
String
The table name of a specific aggregate table
approxRowCount
String
The estimated number of rows in this aggregation table. Setting this property improves the performance of the aggregation optimizer and prevents it from issuing a select count(*) query over the aggregation table.
AggPattern
Declares a set of aggregate tables by regular expression pattern.
Attributes
pattern
String
A Table pattern used to define a set of aggregate tables.
Constituent Elements
Exclude a candidate aggregate table by name or pattern matching.
AggFactCount
Specifies the name of the column in the candidate aggregate table which contains the number of fact table rows.
Attributes
(None)
AggIgnoreColumn
Tells Mondrian to ignore a column in an aggregate table.
Attributes
(None)
AggForeignKey
Maps a foreign key in the fact table to a foreign key column in the candidate aggregate table.
Attributes
factColumn
String
The name of the base fact table foreign key.
aggColumn
String
The name of the aggregate table foreign key.
AggMeasure
Maps a measure to a column in the candidate aggregate table.
Attributes
column
String
The name of the column mapping to the measure name.
name
String
The name of the Cube measure.
AggLevel
Maps a level to a column in the candidate aggregate table.
Attributes
column
String
The name of the column mapping to the level name.
name
String
The name of the Dimension Hierarchy level.
collapse
Boolean
Whether this is a collapsed level. The parents of that level are also present in the aggregation table.
AggTable
A definition of an aggregate table for a base fact table. This aggregate table must be in the same schema as the base fact table.
Attributes
ignorecase
Boolean
Whether or not the match should ignore case.
Constituent Elements
Describes what the fact_count column looks like.
Tells Mondrian to ignore a column in an aggregate table.
Maps a foreign key in the fact table to a foreign key column in the candidate aggregate table.
Maps a measure to a column in the candidate aggregate table.
Maps a level to a column in the candidate aggregate table.
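Putting the aggregate-table elements together, a fact table declaration might exclude one candidate table and explicitly map another (table and column names here follow the Mondrian aggregate-table documentation and are illustrative):

```xml
<Table name="sales_fact_1997">
  <!-- Do not consider this candidate table -->
  <AggExclude name="agg_c_14_sales_fact_1997"/>
  <AggName name="agg_c_special_sales_fact_1997">
    <AggFactCount column="FACT_COUNT"/>
    <AggForeignKey factColumn="product_id" aggColumn="PRODUCT_ID"/>
    <AggMeasure name="[Measures].[Store Sales]" column="STORE_SALES_SUM"/>
    <AggLevel name="[Time].[Year]" column="TIME_YEAR"/>
    <AggLevel name="[Time].[Quarter]" column="TIME_QUARTER"/>
  </AggName>
</Table>
```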
Role
A role defines an access-control profile. It has a series of grants (or denials) for schema elements.
Attributes
name
String
The name of this role
Constituent Elements
A set of rights to a schema
Definition of a set of rights as the union of a set of roles
SchemaGrant
Grants (or denies) this role access to this schema.
Access may be all, all_dimensions, or none. If access is all_dimensions, the role has access to all dimensions but still needs explicit access to cubes.
Attributes
(None)
Constituent Elements
A set of rights to a cube
CubeGrant
Grants (or denies) this role access to a cube. access may be all or none.
Attributes
cube
String
The unique name of the cube
Constituent Elements
Grants (or denies) this role access to a dimension
A set of rights to both a hierarchy and levels within that hierarchy
HierarchyGrant
Grants (or denies) this role access to a hierarchy.
Access may be all, custom, or none. If access is custom, you may also specify topLevel, bottomLevel, and the member grants.
Attributes
hierarchy
String
The unique name of the hierarchy
topLevel
String
Unique name of the highest level of the hierarchy from which this role is allowed to see members. May only be specified if the HierarchyGrant.access is custom. If not specified, role can see members up to the top level.
bottomLevel
String
Unique name of the lowest level of the hierarchy from which this role is allowed to see members. May only be specified if the HierarchyGrant.access is custom. If not specified, role can see members down to the leaf level.
rollupPolicy
String
Policy which determines how cell values are calculated if not all of the children of the current cell are visible to the current role. Allowable values are full (the default), partial, and hidden.
Constituent Elements
A set of rights to a member and its children
MemberGrant
A set of rights to a member and its children.
Attributes
member
String
The unique name of the member
access
String
The access granted to this member: all or none.
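The grant elements nest as shown in this access-control sketch, adapted from the Mondrian documentation's FoodMart example: the role sees only the Sales cube, and within the Store hierarchy only California, minus Los Angeles.

```xml
<Role name="California manager">
  <SchemaGrant access="none">
    <CubeGrant cube="Sales" access="all">
      <HierarchyGrant hierarchy="[Store]" access="custom"
                      topLevel="[Store].[Store Country]">
        <MemberGrant member="[Store].[USA].[CA]" access="all"/>
        <MemberGrant member="[Store].[USA].[CA].[Los Angeles]" access="none"/>
      </HierarchyGrant>
    </CubeGrant>
  </SchemaGrant>
</Role>
```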
Union
Body of a Role definition which defines a Role to be the union of several Roles.
The <RoleUsage> elements must refer to roles that were declared earlier in the schema file.
Attributes
(None)
Constituent Elements
A reference to a role
RoleUsage
Usage of a role in a union role.
Attributes
roleName
String
The name of the role
Constituent Elements
(Empty)
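A union role combining two previously declared roles might look like this (role names are illustrative):

```xml
<Role name="Manager and analyst">
  <Union>
    <!-- Both roles must be declared earlier in the schema file -->
    <RoleUsage roleName="California manager"/>
    <RoleUsage roleName="Analyst"/>
  </Union>
</Role>
```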
UserDefinedFunction
A UserDefinedFunction extends the MDX language.
It must be implemented by a Java class that implements mondrian.spi.UserDefinedFunction.
Attributes
name
String
Name with which the user-defined function will be referenced in MDX expressions.
className
String
Name of the class which implements this user-defined function. Must implement the mondrian.spi.UserDefinedFunction interface.
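A sketch of a user-defined function declaration; the class name is hypothetical and must refer to a class on the server classpath that implements mondrian.spi.UserDefinedFunction:

```xml
<UserDefinedFunction name="PlusOne" className="com.example.udf.PlusOneUdf"/>
```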
Parameter
Part of the definition of a hierarchy. It is passed to a MemberReader, if present.
Attributes
name
String
Name of this parameter
description
String
Description of this parameter
type
String
Indicates the type of this parameter: String, Numeric, Integer, Boolean, Date, Time, Timestamp, or Member
modifiable
Boolean
If set to false, statements cannot change the value of this parameter; the parameter becomes effectively constant (provided that its default value expression always returns the same value). Default is true.
defaultValue
String
Expression for the default value of this parameter
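An illustrative parameter declaration; note that because defaultValue is an expression, a constant string default must itself be quoted:

```xml
<Parameter name="Current quarter" type="String" modifiable="true"
           defaultValue="'Q1'"/>
```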
CalculatedMemberProperty
Property of a calculated member defined against a cube. It must have either an expression or a value.
Attributes
name
String
Name of this member property
caption
String
A string displayed instead of the property's name. Can be localized from Properties file using #{propertyname}.
description
String
Description of this calculated member property. Can be localized from Properties file using #{propertyname}.
expression
String
MDX expression which defines the value of this property. If the expression is a constant string, you could enclose it in quotes, or just specify the value attribute instead.
value
String
Value of this property. If the value is not constant, specify the expression attribute instead.
Formula
Holds the formula text within a <NamedSet> or <CalculatedMember>.
Attributes
(None)
ColumnDefs
Holder for an array of <ColumnDef> elements.
Attributes
(None)
Constituent Elements
The columns to include in the array
ColumnDef
Column definition for an inline table.
Attributes
name
String
Name of the column.
type
String
Type of the column: String, Numeric, Integer, Boolean, Date, Time or Timestamp.
Rows
Holder for an array of <Row> elements.
Attributes
(None)
Constituent Elements
Row in an <InlineTable> dataset.
Row
Row definition for an inline table.
Must have one column value for each <ColumnDef> in the <InlineTable>.
Attributes
(None)
Constituent Elements
Value of a column in the row.
Value
Column value for an inline table. The element body holds the value.
Attributes
column
String
Name of the column
MeasureExpression
SQL expression used to compute a measure, in lieu of Measure.column.
Attributes
(None)
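For example, a measure computed from a SQL expression rather than a plain column, adapted from the FoodMart sample schema:

```xml
<Measure name="Promotion Sales" aggregator="sum" formatString="#,##0.00">
  <MeasureExpression>
    <!-- Count only sales tied to a promotion -->
    <SQL dialect="generic">
      (case when promotion_id = 0 then 0 else store_sales end)
    </SQL>
  </MeasureExpression>
</Measure>
```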
SQL
The SQL expression for a particular database dialect.
The element body holds the SQL.
Attributes
dialect
String
Dialect of SQL the view is intended for. Example values include generic, access, db2, firebird, hsqldb, mssql, mysql, oracle, postgres, sybase, teradata, ingres, and infobright.