Components Reference

Pentaho aims to accommodate diverse computing environments. This list provides details about the environment components and versions we support. Where applicable, versions are listed as certified or supported:

  • Certified

    The version has been tested and validated for compatibility with Pentaho.

  • Supported

    Support is available for listed non-certified versions.

If you have questions about your particular computing environment, contact Pentaho Support.

Hitachi Vantara products

The following Hitachi Vantara product is certified for Pentaho 10.2:

  • Hitachi Content Platform 9.7

Server

The Pentaho Server is hardware-independent and runs on server-class computers.

Your server-class computer must comply with the specifications for minimum hardware and required operating systems:

Hardware (64-bit):

  • Processor: Intel EM64T or AMD64 Dual-Core or later

  • RAM: 8 GB, with 4 GB dedicated to Pentaho servers

  • Disk space: 20 GB free after installation

Certified Operating Systems (64-bit):

  • Microsoft Windows Server 2022

  • Red Hat Enterprise Linux 9*

  • Ubuntu Server 22.04 LTS

* Pentaho Data Integration and Analytics is supported on any Linux distribution binary-compatible with RHEL 9 and Ubuntu Server 22, including in virtualized and cloud environments. If you have any questions, contact Pentaho Support.

**Note:** macOS is not supported as a server operating system.

Container deployment

The following technology is supported for deploying Pentaho in containers:

| Technology | Certified | Supported |
| --- | --- | --- |
| Docker | 24.0.6 | 24.0.6 |

Note: Kubernetes environments that use this Docker version are also supported.

You can also deploy pre-configured Docker images of specific Pentaho products in your AWS environment. See Hyperscalers in the Install Pentaho Data Integration and Analytics document for details.

Workstation

These Pentaho design tools are hardware-independent and run on client-class computers that comply with the following specifications for minimum hardware and required operating systems:

  • Pentaho Aggregation Designer

  • Pentaho Data Integration

  • Pentaho Metadata Editor

  • Pentaho Report Designer

  • Pentaho Schema Workbench

Hardware (64-bit):

  • Processors: Apple Macintosh Dual-Core; Apple Mac M1, M2, and M3 chipsets; Intel EM64T or AMD64 Dual-Core or later

  • RAM: 2 GB for most of the design tools; PDI requires 2 GB dedicated

  • Disk space: 2 GB free after installation

  • Minimum screen size: 1280 x 960 pixels

Operating System (64-bit):

| Certified | Supported |
| --- | --- |
| Ubuntu Desktop 22.04 | Ubuntu Desktop 20.04, 22.04 |
| Microsoft Windows 11 | Microsoft Windows 10 & 11 |
| macOS 13 (Ventura) | macOS 13 (Ventura) |

**Note:** Ubuntu Linux requires `libwebkitgtk-1.0`. See **Install Pentaho Data Integration and Analytics** for more information.

Embedded software

When embedding Pentaho software into other applications, the computing environment should comply with the following specifications for minimum hardware and required operating systems.

  • Embedded Pentaho Reporting

  • Embedded Pentaho Analysis

  • Embedded Pentaho Data Integration

Note: Pentaho Data Integration and Analytics is officially certified to run on the Red Hat Enterprise and Ubuntu Linux distributions. It is compatible with any binary-compatible Linux distribution that meets the necessary software and hardware requirements, including in virtualized and cloud environments. If you have any questions, contact Pentaho Support.

The following are the minimum hardware and operating system requirements for embedding Pentaho reporting, analysis, and data integration:

Hardware (64-bit):

  • Processor: Intel EM64T or AMD64 Dual-Core

  • RAM: 8 GB, with 4 GB dedicated to Pentaho servers

  • Disk space: 20 GB free after installation

Certified Operating Systems (64-bit):

  • Microsoft Windows Server 2022

  • Red Hat Enterprise Linux 9

  • Ubuntu Server 22.04 LTS

Application servers

The server to which you deploy Pentaho software must run the following application server:

  • Tomcat 9.0.86 (Certified)

Solution database repositories

Pentaho software stores processing artifacts in these database repositories:

| Certified | Supported |
| --- | --- |
| PostgreSQL 15* | PostgreSQL 14 & 15 |
| MySQL 8.0.26 | MySQL 8.0.26 |
| Oracle 23ai | Oracle 19c & 23ai (including patched versions) |
| Microsoft SQL Server 2019 | Microsoft SQL Server 2017 & 2019 (including patched versions) |
| MariaDB 11.1.2 | MariaDB 11.1.2 |

* The default installed solution database.

Apache Hadoop vendors

Pentaho software has certified or supported data sources from these Hadoop vendors:

| Vendor | Driver Version |
| --- | --- |
| Amazon EMR | 7.0.0 |
| Apache Vanilla Hadoop | 3.3.0 |
| Cloudera Data Platform (CDP) Private Cloud | 7.1.x |
| Cloudera Data Platform (CDP) Public Cloud | 7.2 |
| Google Dataproc | 2.1 |
| Microsoft Azure HDInsight | 4.0 |

Data Sources: General

Pentaho software supports the following data sources. Check this list if you are evaluating Pentaho or checking for general compatibility with a specific vendor.

| Data Source | Certified | Supported |
| --- | --- | --- |
| Salesforce | 60 | 60 |
| Amazon Redshift | 1.2.34.1058 | 1.2.34.1058, 2.1 |
| Snowflake | 3.14.1 | 3.13.33, 3.14.1 |

Data Sources: Pentaho Tools

This list summarizes the data sources that are compatible with the main Pentaho tools.

Pentaho Reporting

  • JDBC 3/4*

  • ODBC

  • OLAP4J

  • XML

  • Pentaho Analysis

  • Pentaho Data Integration

  • Pentaho Metadata

  • Scriptable

  • Snowflake

Pentaho Server, Action Sequences

  • Relational (JDBC)

  • Hibernate

  • Javascript

  • Metadata (MQL)

  • Mondrian (MDX)

  • XML (XQuery)

  • Security User/Role List Provider

  • Snowflake

  • Data Integration Steps (PDI)

  • Other Action Sequences

  • Web Services

  • XMLA

Pentaho Data Integration

  • JDBC 3/4*

  • OLAP4J

  • Salesforce

  • Snowflake

  • XML

  • CSV

  • Microsoft Excel

* Use a JDBC 3.x or 4.x compliant driver that is compatible with SQL-92 standards when communicating with relational data sources. For a list of drivers to use with relational JDBC databases, see the JDBC drivers reference.
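
As an illustration, here is a minimal Java sketch of what such a JDBC connection amounts to. The URL, credentials, and query are placeholders (a PostgreSQL URL stands in for any SQL-92-compliant source whose JDBC driver is on the classpath):

```java
import java.sql.Connection;
import java.sql.DriverManager;
import java.sql.ResultSet;
import java.sql.Statement;

public class JdbcSmokeTest {
    public static void main(String[] args) throws Exception {
        // Placeholder URL: substitute the vendor-specific JDBC URL for your source.
        String url = "jdbc:postgresql://localhost:5432/pentaho";

        // JDBC 4.x drivers self-register via the service loader,
        // so no explicit Class.forName() call is needed.
        try (Connection conn = DriverManager.getConnection(url, "user", "password");
             Statement stmt = conn.createStatement();
             // A simple SQL-92 query to confirm the driver and dialect work.
             ResultSet rs = stmt.executeQuery("SELECT 1")) {
            if (rs.next()) {
                System.out.println("Connection OK: " + rs.getInt(1));
            }
        }
    }
}
```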

Big Data Sources: General

Pentaho software supports the following Big Data sources. Check this list if you are evaluating Pentaho or checking for general compatibility with a specific vendor.

| Data Source | Supported Version |
| --- | --- |
| Amazon EMR (via Hive) | 7.0.0 (Certified) |
| Apache Vanilla Hadoop | 3.3.0 (Certified) |
| Cassandra (Datastax) | 6.8 (Certified) |
| Cloudera Data Platform (CDP) Private Cloud | 7.1.9 (Certified) |
| Cloudera Data Platform (CDP) Public Cloud | 7.2.17 |
| Google BigQuery | 1.2.25 |
| Google Dataproc | 2.1 |
| Greenplum | 4.3 |
| Microsoft Azure HDInsight | 4.0 |
| MongoDB | 7 (Certified) |
| Vertica | 11 |

Big Data Sources: Details

This table shows the Big Data sources that are compatible with specific Pentaho tools.

| Data Source | Versions | Analyzer | PIR/PDD | Pentaho Reporting | DSW | PDI Server/Client | PRD | PSW | PME |
| --- | --- | --- | --- | --- | --- | --- | --- | --- | --- |
| Amazon EMR | 7.0.0ᵉ (Certified) | No | No | No | No | Yes | Yes | No | No |
| Apache Vanilla Hadoop | 3.3.0 (Certified) | No | No | No | Yes | Yes | No | No | No |
| Cassandra (Datastax) | 6.8 (Certified) | No | No | No | No | Yes | No | No | No |
| Cloudera Data Platform (CDP) Private Cloud | 7.1.9 (for job execution) | No | No | No | No | Yes | Yes | No | Yes |
| | via Impala (as data source) | Yes | Yes | Yes | Yes | Yes | Yes | No | Yes |
| | via Hive3ᵃ (as data source) | No | Yes | Yes | Yes | Yes | Yes | No | Yes |
| Google BigQuery | 1.5.4.1008ᵇ | Yes | Yes | Yes | Yes | Yes | Yes | Yes | Yes |
| Google Dataprocᶜ | 2.1ᵈ (for job execution) | No | No | No | No | Yes | Yes | No | No |
| | via Hive2 and Google BigQuery (as data source) | Yes | Yes | Yes | Yes | Yes | Yes | No | Yes |
| Greenplum | 4.3 | Yes | Yes | Yes | Yes | Yes | Yes | Yes | Yes |
| Microsoft Azure HDInsight | 4.0 | Yes | Yes | No | No | Yes | No | No | Yes |
| MongoDB | 7 | No | No | Yes | No | Yes | Yes | No | No |
| Vertica | 11 | Yes | Yes | Yes | Yes | Yes | Yes | Yes | Yes |

Notes: A generic Apache Hadoop driver is included in the Pentaho distribution for version 10.2. Other supported drivers can be downloaded from the Support Portal.

ᵃ Hive3 as a data source for CDP also supports Hive LLAP and Hive3 on Tez.

ᵇ The Simba driver required for Google BigQuery is the JDBC 4.2-compatible version, which you can download from https://storage.googleapis.com/simba-bq-release/jdbc/SimbaJDBCDriverforGoogleBigQuery42_1.2.2.1004.zip.

ᶜ HBase is not supported with Google Dataproc.

ᵈ Use the Google Dataproc 2.1 driver for your Google Dataproc 2.2 cluster; the 2.1 driver is certified to work with Google Dataproc 2.2.

ᵉ EMR clusters (version 7.x and later) built with JDK 17 exclude the commons-lang-2.6.jar library from their standard Hadoop library directories ($HADOOP_HOME/lib). To use the EMR driver with EMR 7.x, obtain the commons-lang-2.6.jar file from a trusted source, such as the official Maven repository (commons-lang » commons-lang » 2.6), and copy the downloaded JAR file to the $HADOOP_HOME/lib or $HADOOP_MAPRED_HOME/lib directory on each node of the EMR cluster so that all worker nodes have access to the library.

SQL dialect-specific

Pentaho software generates dialect-specific SQL when communicating with these data sources. Certified indicates the SQL dialect has been tested for compatibility with Pentaho.


Pentaho Analyzer

Certified

  • Amazon Redshift

  • Azure SQL

  • Impala

  • MySQL

  • Microsoft SQL Server

  • Oracle

  • PostgreSQL

  • Snowflake

Supported

  • Access

  • Firebird

  • Greenplum

  • Hsqldb

  • IBM DB2

  • IBM MQ 9.2

  • Informix

  • Ingres

  • Interbase

  • Neoview

  • SqlStream

  • Sybase

  • Vectorwise

  • Vertica

  • Other SQL-89 compliant*

Pentaho Metadata

Certified

  • Azure SQL

  • Hive 2

  • Impala

  • MySQL

  • PostgreSQL

Supported

  • Amazon Redshift

  • ASSQL

  • Firebird

  • H2

  • Hypersonic

  • IBM DB2

  • IBM MQ 9.2

  • Ingres

  • Interbase

  • MS Access

  • MS SQL Server (JTDS Driver)

  • MS SQL Server (Microsoft Driver)

  • Snowflake

  • Sybase

  • Vertica

  • Other SQL-92 compliant*

Pentaho Data Integration

Certified

  • Amazon Redshift

  • Azure SQL

  • Hive

  • Hive 2

  • Impala

  • MS SQL Server (JTDS Driver)

  • MS SQL Server (Microsoft Driver)

  • MySQL

  • Oracle

  • PostgreSQL

  • Snowflake

  • Vertica

Supported

  • AS/400

  • InfiniDB

  • Exasol 4

  • Firebird SQL

  • Greenplum

  • H2

  • Hypersonic

  • IBM DB2

  • IBM MQ 9.2

  • Informix

  • Ingres

  • Ingres VectorWise

  • MaxDB (SAP DB)

  • Neoview

  • Oracle RDB

  • SQLite

  • UniVerse database

  • Other SQL-92 compliant*

* If your data source is not in this list and is compatible with SQL-92, Pentaho software uses a generic SQL dialect.

Security

Pentaho software integrates with these third-party security authentication systems:

  • Active Directory

  • CAS 6.6 (Certified)

  • Integrated Microsoft Windows Authentication

  • LDAP

  • RDBMS

Java virtual machine

Pentaho software has the following Java Runtime Environment (JRE) requirements:

| Pentaho Software | Certified | Supported |
| --- | --- | --- |
| All Pentaho software | Oracle Java 17; Oracle OpenJDK 17 | Oracle Java 11.x & 17.x; Oracle OpenJDK 11.x & 17.x; Eclipse Temurin by Adoptium; Zulu from Azul Systems |

**Note:** The PDI client requires at least Java 11.x to run on Windows 11.
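
If you need to confirm which JVM a machine will actually use, a trivial check like the following prints the properties that the requirements above refer to (a sketch; run it with the same `java` binary that launches Pentaho):

```java
public class JvmCheck {
    public static void main(String[] args) {
        // Pentaho certifies Java 17 and supports Java 11.x and 17.x (see table above).
        System.out.println("java.version = " + System.getProperty("java.version"));
        System.out.println("java.vendor  = " + System.getProperty("java.vendor"));
        System.out.println("os.arch      = " + System.getProperty("os.arch"));
    }
}
```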

Web browsers

Pentaho supports major versions of web browsers that are publicly available six weeks before the finalization of a Pentaho release.

| Certified Browsers | Supported Browsers |
| --- | --- |
| Apple Safari 16.4 (on macOS only) | Apple Safari 16.4 and later (on macOS only) |
| Google Chrome 126 | Google Chrome 126 and later |
| Microsoft Edge 126 | Microsoft Edge 126 and later |
| Mozilla Firefox 127 | Mozilla Firefox 127 and later |

Support statement for Analyzer on Impala

These are the minimum requirements for Analyzer to work with Impala:

  • Pentaho 7.1 or later

  • Impala 1.3.x or later

  • Use the Parquet compressed file format for tables in Impala (recommended).

  • Make sure that the JDBC driver is dropped into the Pentaho Server and Schema Workbench directories. See the Install Pentaho Data Integration and Analytics document for details, and the connection sketch at the end of this section.

  • Turn off connection pooling in the Pentaho Server.

  • In Mondrian schemas, divide dimension tables with high cardinality into several levels.

Note: As with any data source, the performance of Pentaho Analyzer on Impala depends on the data shape, Impala's configuration, and the types of queries. See the best practice "Pentaho Analyzer with Impala as a Data Source" at https://support.pentaho.com/hc/en-us/articles/208652846, or download the PDF.

Compiled Mondrian automated test suite results are available for Analyzer on Impala with the OEM Simba driver as well as the community Apache Hive driver.
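
For a quick standalone check of the Impala connection itself, the following Java sketch opens a JDBC session against Impala. It assumes the Cloudera Impala JDBC driver is on the classpath; the driver class, host, and `AuthMech` setting are assumptions that vary with your driver version and cluster security configuration:

```java
import java.sql.Connection;
import java.sql.DriverManager;
import java.sql.ResultSet;
import java.sql.Statement;

public class ImpalaConnectionCheck {
    public static void main(String[] args) throws Exception {
        // Placeholder host; 21050 is Impala's default HiveServer2/JDBC port.
        // AuthMech=0 means no authentication; adjust for a secured cluster.
        String url = "jdbc:impala://impala-host.example.com:21050/default;AuthMech=0";

        try (Connection conn = DriverManager.getConnection(url);
             Statement stmt = conn.createStatement();
             // List the tables in the default database to confirm the session works.
             ResultSet rs = stmt.executeQuery("SHOW TABLES")) {
            while (rs.next()) {
                System.out.println(rs.getString(1));
            }
        }
    }
}
```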

Google BigQuery

You can use Google BigQuery as a data source with the Pentaho User Console or with the PDI client.

Before you begin, you must have a Google account and must create service account credentials in the form of a key file in JSON format to connect to Google BigQuery. To create service account credentials, see the Google Cloud Storage Authentication documentation.

Additionally, you must set permissions for your BigQuery and Google Cloud accounts. To configure your service account authentication, see the Google Service Account documentation.

Perform the following steps to create a JDBC connection to a Google BigQuery data source from the User Console or PDI client.

  1. Stop the Pentaho Server.

  2. Download the ZIP file containing the Simba version 1.5.4.1008 JDBC 4.2 driver for Google BigQuery from https://storage.googleapis.com/simba-bq-release/jdbc/SimbaJDBCDriverforGoogleBigQuery42_1.2.2.1004.zip.

  3. Navigate to the server/pentaho-server/tomcat/webapps/pentaho/WEB-INF/lib directory for the User Console or the design-tools/data-integration/lib directory for the PDI client and delete any files associated with previous versions of Google BigQuery.

    Visually verify each file to ensure the older version is deleted.

  4. Extract the following files to the server/pentaho-server/tomcat/webapps/pentaho/WEB-INF/lib folder for the User Console or the design-tools/data-integration/lib directory for the PDI client.

    • animal-sniffer-annotations-1.14.jar

    • api-common-1.7.0.jar

    • avro-1.9.0.jar

    • checker-compat-qual-2.5.2.jar

    • error_prone_annotations-2.1.3.jar

    • gax-1.42.0.jar

    • gax-grpc-1.42.0.jar

    • google-api-client-1.28.0.jar

    • google-api-services-bigquery-v2-rev426-1.25.0.jar

    • google-auth-library-credentials-0.15.0.jar

    • google-auth-library-oauth2-http-0.13.0.jar

    • GoogleBigQueryJDBC42.jar

    • google-cloud-bigquerystorage-0.85.0-alpha.jar

    • google-cloud-core-1.67.0.jar

    • google-cloud-core-grpc-1.67.0.jar

    • google-http-client-1.29.0.jar

    • google-http-client-apache-2.0.0.jar

    • google-http-client-jackson2-1.28.0.jar

    • google-oauth-client-1.28.0.jar

    • grpc-alts-1.18.0.jar

    • grpc-auth-1.18.0.jar

    • grpc-context-1.18.0.jar

    • grpc-core-1.18.0.jar

    • grpc-google-cloud-bigquerystorage-v1beta1-0.50.0.jar

    • grpc-grpclb-1.18.0.jar

    • grpc-netty-shaded-1.18.0.jar

    • grpc-protobuf-1.18.0.jar

    • grpc-protobuf-lite-1.18.0.jar

    • grpc-stub-1.18.0.jar

    • gson-2.7.jar

    • j2objc-annotations-1.1.jar

    • javax.annotation-api-1.3.2.jar

    • jsr305-3.0.2.jar

    • opencensus-api-0.18.0.jar

    • opencensus-contrib-grpc-metrics-0.18.0.jar

    • opencensus-contrib-http-util-0.18.0.jar

    • protobuf-java-3.7.0.jar

    • protobuf-java-util-3.7.0.jar

    • proto-google-cloud-bigquerystorage-v1beta1-0.50.0.jar

    • proto-google-common-protos-1.15.0.jar

    • proto-google-iam-v1-0.12.0.jar

    • threetenbp-1.3.3.jar

    Note: The Google BigQuery connection name does not display in the User Console Database Connection dialog box until you copy these files.

  5. Restart the Pentaho Server.

  6. Log on to the User Console or the PDI client, then open the Database Connection dialog box.

    See the Install Pentaho Data Integration and Analytics document for more information on the Database Connection dialog box.

  7. In the Database Connection dialog box, select General, then select Google BigQuery as the Database Type.

  8. In the Settings area, enter the information for your Google BigQuery account.

    • The Host Name is the URL to Google's BigQuery web services API. For example, https://www.googleapis.com/bigquery/v2

    • The Project ID in the PDI client and the Database name in the User Console are identical.

    • The Port Number is 443.

  9. Click Options, then add the following parameters and values.

    | Parameter | Value |
    | --- | --- |
    | OAuthType | 0 (zero) |
    | OAuthServiceAcctEmail | Your service account email address |
    | OAuthPvtKeyPath | The path to your private key credential file |
    | Timeout | The time, in seconds, before the server closes the connection; 120 seconds is recommended |

  10. Click Test to verify that you can connect to your data.
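
For reference, the equivalent standalone JDBC connection can be sketched in Java as follows. It uses the host, port, and parameters documented in the steps above; the `jdbc:bigquery` URL format is the Simba driver's convention, and the project ID, service account email, and key path are placeholders for your own values:

```java
import java.sql.Connection;
import java.sql.DriverManager;
import java.sql.ResultSet;
import java.sql.Statement;

public class BigQueryConnectionTest {
    public static void main(String[] args) throws Exception {
        // Host and port as documented in step 8; OAuth parameters as in step 9.
        String url = "jdbc:bigquery://https://www.googleapis.com/bigquery/v2:443;"
                + "ProjectId=my-project-id;"                // placeholder project ID
                + "OAuthType=0;"                            // service account auth
                + "OAuthServiceAcctEmail=sa@my-project-id.iam.gserviceaccount.com;"
                + "OAuthPvtKeyPath=/path/to/credentials.json;"
                + "Timeout=120";

        try (Connection conn = DriverManager.getConnection(url);
             Statement stmt = conn.createStatement();
             // A trivial query to confirm authentication and connectivity.
             ResultSet rs = stmt.executeQuery("SELECT 1")) {
            if (rs.next()) {
                System.out.println("BigQuery connection OK");
            }
        }
    }
}
```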
