Components Reference
Pentaho aims to accommodate diverse computing environments. This list provides details about the environment components and versions we support. Where applicable, versions are listed as certified or supported:
- **Certified**: The version has been tested and validated for compatibility with Pentaho.
- **Supported**: Support is available for the listed non-certified versions.
If you have questions about your particular computing environment, contact Pentaho Support.
Hitachi Vantara products
The following Hitachi Vantara product is certified for Pentaho 10.2:
Hitachi Content Platform 9.7
Server
The Pentaho Server is hardware-independent and runs on server-class computers.
Your server-class computer must comply with the specifications for minimum hardware and required operating systems:
| Hardware (64-bit) | Certified Operating System (64-bit) |
| --- | --- |
| **Processor:** Intel EM64T or AMD64 Dual-Core or later | Microsoft Windows Server 2022 |
| **RAM:** 8 GB, with 4 GB dedicated to Pentaho servers | Red Hat Enterprise Linux 9* |
| **Disk Space:** 20 GB free after installation | Ubuntu Server 22.04 LTS |
* Pentaho Data Integration and Analytics is supported on any Linux distribution binary-compatible with RHEL 9 and Ubuntu Server 22, including in virtualized and cloud environments. If you have any questions, contact Pentaho Support.
**Note:** macOS is not supported as a server operating system.
Container deployment
The following technology is supported for deploying Pentaho in containers:

| Technology | Certified | Supported |
| --- | --- | --- |
| Docker | 24.0.6 | 24.0.6 |

**Note:** Kubernetes environments that use this Docker version are also supported.

You can also deploy pre-configured Docker images of specific Pentaho products in your AWS environment. See Hyperscalers in the Install Pentaho Data Integration and Analytics document for details.
Workstation
These Pentaho design tools are hardware-independent and run on client-class computers that comply with these specifications for minimum hardware and required operating systems:

- Pentaho Aggregation Designer
- Pentaho Data Integration
- Pentaho Metadata Editor
- Pentaho Report Designer
- Pentaho Schema Workbench
| Hardware (64-bit) | Certified Operating System (64-bit) | Supported Operating System (64-bit) |
| --- | --- | --- |
| **Processors:** Apple Macintosh Dual-Core; Apple Mac M1, M2, and M3 chipsets; Intel EM64T or AMD64 Dual-Core or later | Ubuntu Desktop 22.04 | Ubuntu Desktop 20.04, 22.04 |
| **RAM:** 2 GB for most design tools; PDI requires 2 GB dedicated | Microsoft Windows 11 | Microsoft Windows 10 & 11 |
| **Disk Space:** 2 GB free after installation | macOS 13 (Ventura) | macOS 13 (Ventura) |
| **Minimum Screen Size:** 1280 x 960 pixels | | |
**Note:** Ubuntu Linux requires `libwebkitgtk-1.0`. See **Install Pentaho Data Integration and Analytics** for more information.
Embedded software
When embedding Pentaho software into other applications, the computing environment should comply with these specifications for minimum hardware and required operating systems:

- Embedded Pentaho Reporting
- Embedded Pentaho Analysis
- Embedded Pentaho Data Integration

**Note:** Pentaho Data Integration and Analytics is officially certified to run on the Red Hat Enterprise Linux and Ubuntu Linux distributions. It is compatible with any binary-compatible Linux distribution that meets the necessary software and hardware requirements, including in virtualized and cloud environments. If you have any questions, contact Pentaho Support.
The following minimum hardware specifications and required operating systems apply when embedding Pentaho reporting, analysis, and data integration:
| Hardware (64-bit) | Certified Operating System (64-bit) |
| --- | --- |
| **Processors:** Intel EM64T or AMD64 Dual-Core | Microsoft Windows Server 2022 |
| **RAM:** 8 GB, with 4 GB dedicated to Pentaho servers | Red Hat Enterprise Linux 9 |
| **Disk Space:** 20 GB free after installation | Ubuntu Server 22.04 LTS |
Application servers
The server to which you deploy Pentaho software must run the following application server:
Tomcat 9.0.86 (Certified)
Solution database repositories
Pentaho software stores processing artifacts in these database repositories:
| Certified | Supported |
| --- | --- |
| PostgreSQL 15* | PostgreSQL 14 & 15 |
| MySQL 8.0.26 | MySQL 8.0.26 |
| Oracle 23ai | Oracle 19c & 23ai (including patched versions) |
| Microsoft SQL Server 2019 | Microsoft SQL Server 2017 & 2019 (including patched versions) |
| MariaDB 11.1.2 | MariaDB 11.1.2 |

* The default installed solution database.
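For a quick sanity check against the default PostgreSQL solution database, a plain JDBC probe like the sketch below can confirm connectivity before starting the server. This is a minimal sketch, not Pentaho code: the host, port, database name, and credentials are placeholders for the values configured during your installation, and it assumes the PostgreSQL JDBC driver is on the classpath.

```java
import java.sql.Connection;
import java.sql.DriverManager;

/**
 * Minimal connectivity probe for a PostgreSQL solution repository.
 * All connection values are placeholders; substitute the host, port,
 * database name, and credentials from your own Pentaho installation.
 * Requires the PostgreSQL JDBC driver on the classpath.
 */
public class RepositoryProbe {
    public static void main(String[] args) throws Exception {
        String url = "jdbc:postgresql://localhost:5432/pentaho_repo"; // placeholder
        try (Connection conn = DriverManager.getConnection(url, "repo_user", "repo_password")) {
            System.out.println("Connected to "
                    + conn.getMetaData().getDatabaseProductName() + " "
                    + conn.getMetaData().getDatabaseProductVersion());
        }
    }
}
```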
Apache Hadoop vendors
Pentaho software has certified or supported data sources from these Hadoop vendors:

| Vendor | Version |
| --- | --- |
| Amazon EMR | 7.0.0 |
| Apache Vanilla Hadoop | 3.3.0 |
| Cloudera Data Platform (CDP) Private Cloud | 7.1.x |
| Cloudera Data Platform (CDP) Public Cloud | 7.2 |
| Google Dataproc | 2.1 |
| Microsoft Azure HDInsight | 4.0 |
Data Sources: General
Pentaho software supports the following data sources. Check this list if you are evaluating Pentaho or checking for general compatibility with a specific vendor.
| Data Source | Certified | Supported |
| --- | --- | --- |
| Salesforce | 60 | 60 |
| Amazon Redshift | 1.2.34.1058 | 1.2.34.1058, 2.1 |
| Snowflake | 3.14.1 | 3.13.33, 3.14.1 |
Data Sources: Pentaho Tools
This table summarizes which data sources are compatible with the main Pentaho tools.
| Pentaho Software | Data Sources |
| --- | --- |
| Pentaho Reporting | JDBC 3/4*, ODBC, OLAP4J, XML, Pentaho Analysis, Pentaho Data Integration, Pentaho Metadata, Scriptable, Snowflake |
| Pentaho Server, Action Sequences | Relational (JDBC), Hibernate, Javascript, Metadata (MQL), Mondrian (MDX), XML (XQuery), Security User/Role List Provider, Snowflake, Data Integration Steps (PDI), Other Action Sequences, Web Services, XMLA |
| Pentaho Data Integration | JDBC 3/4*, OLAP4J, Salesforce, Snowflake, XML, CSV, Microsoft Excel |
* Use a JDBC 3.x or 4.x compliant driver that is compatible with SQL-92 standards when communicating with relational data sources. For a list of drivers to use with relational JDBC databases, see the JDBC drivers reference.
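To illustrate the JDBC path concretely, the sketch below opens a JDBC 4 connection and issues a simple SQL-92 query, which is the common denominator Pentaho relies on for relational sources. The URL, credentials, and table name are placeholders; any SQL-92 compliant JDBC 3.x/4.x driver on the classpath works the same way, and JDBC 4 drivers register themselves automatically.

```java
import java.sql.Connection;
import java.sql.DriverManager;
import java.sql.ResultSet;
import java.sql.Statement;

/**
 * Generic JDBC 4 smoke test with a SQL-92 query. The URL, credentials,
 * and table name are placeholders; swap in any SQL-92 compliant driver
 * and data source from the tables above.
 */
public class JdbcSmokeTest {
    public static void main(String[] args) throws Exception {
        String url = "jdbc:postgresql://localhost:5432/sampledata"; // placeholder
        try (Connection conn = DriverManager.getConnection(url, "user", "secret");
             Statement stmt = conn.createStatement();
             ResultSet rs = stmt.executeQuery("SELECT COUNT(*) FROM customers")) { // placeholder table
            if (rs.next()) {
                System.out.println("Rows: " + rs.getLong(1));
            }
        }
    }
}
```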
Big Data Sources: General
Pentaho software supports the following Big Data sources. Check this list if you are evaluating Pentaho or checking for general compatibility with a specific vendor.
| Data Source | Version |
| --- | --- |
| Amazon EMR (via Hive) | 7.0.0 (Certified) |
| Apache Vanilla Hadoop | 3.3.0 (Certified) |
| Cassandra (Datastax) | 6.8 (Certified) |
| Cloudera Data Platform (CDP) Private Cloud (on-premises) | 7.1.9 (Certified) |
| Cloudera Data Platform (CDP) Public Cloud | 7.2.17 |
| Google BigQuery | 1.2.25 |
| Google Dataproc | 2.1 |
| Greenplum | 4.3 |
| Microsoft Azure HDInsight | 4.0 |
| MongoDB | 7 (Certified) |
| Vertica | 11 |
Big Data Sources: Details
This table shows the Big Data sources that are compatible with specific Pentaho tools.
| Data Source | Versions | Analyzer | PIR/PDD | Pentaho Reporting | DSW | PDI Server/Client | PRD | PSW | PME |
| --- | --- | --- | --- | --- | --- | --- | --- | --- | --- |
| Amazon EMR | 7.0.0ᵉ (Certified) | No | No | No | No | Yes | Yes | No | No |
| Apache Vanilla Hadoop | 3.3.0 (Certified) | No | No | No | Yes | Yes | No | No | No |
| Cassandra (Datastax) | 6.8 (Certified) | No | No | No | No | Yes | No | No | No |
| Cloudera Data Platform (CDP) Private Cloud | 7.1.9 (for job execution) | No | No | No | No | Yes | Yes | No | Yes |
| | via Impala (as data source) | Yes | Yes | Yes | Yes | Yes | Yes | No | Yes |
| | via Hive3ᵃ (as data source) | No | Yes | Yes | Yes | Yes | Yes | No | Yes |
| Google BigQuery | 1.5.4.1008ᵇ | Yes | Yes | Yes | Yes | Yes | Yes | Yes | Yes |
| Google Dataprocᶜ | 2.1ᵈ (for job execution) | No | No | No | No | Yes | Yes | No | No |
| | via Hive2 and Google BigQuery (as data source) | Yes | Yes | Yes | Yes | Yes | Yes | No | Yes |
| Greenplum | 4.3 | Yes | Yes | Yes | Yes | Yes | Yes | Yes | Yes |
| Microsoft Azure HDInsight | 4.0 | Yes | Yes | No | No | Yes | No | No | Yes |
| MongoDB | 7 | No | No | Yes | No | Yes | Yes | No | No |
| Vertica | 11 | Yes | Yes | Yes | Yes | Yes | Yes | Yes | Yes |
**Notes:** A generic Apache Hadoop driver is included in the Pentaho distribution for version 10.2. Other supported drivers can be downloaded from the Support Portal.

ᵃ Hive3 as a data source for CDP also supports Hive LLAP and Hive3 on Tez.

ᵇ The Simba driver required for Google BigQuery is the JDBC 4.2-compatible version, which you can download from https://storage.googleapis.com/simba-bq-release/jdbc/SimbaJDBCDriverforGoogleBigQuery42_1.2.2.1004.zip.

ᶜ HBase is not supported with Google Dataproc.

ᵈ Use the Google Dataproc 2.1 driver for your Google Dataproc 2.2 cluster. The Google Dataproc 2.1 driver is certified to work with Google Dataproc 2.2.

ᵉ EMR clusters (version 7.x and later) built with JDK 17 exclude the `commons-lang-2.6.jar` library from their standard Hadoop library directories (`$HADOOP_HOME/lib`). To use the EMR driver for EMR 7.x, obtain the `commons-lang-2.6.jar` file from a trusted source, such as the official Maven repository (commons-lang » commons-lang » 2.6). Then manually copy the downloaded JAR file to the `$HADOOP_HOME/lib` or `$HADOOP_MAPRED_HOME/lib` directory on each node within the EMR cluster to ensure that all worker nodes have access to the library (see the verification sketch below).
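A quick way to confirm that the copied JAR is actually visible to the JVM on a node is a classpath check like the following. It is a minimal sketch, assuming `commons-lang-2.6.jar` has been placed on the Hadoop classpath; `org.apache.commons.lang.StringUtils` is a class shipped in that JAR.

```java
/**
 * Minimal check that commons-lang 2.6 is resolvable on this node's
 * classpath (e.g., run with: java -cp "$HADOOP_HOME/lib/*" CommonsLangCheck).
 */
public class CommonsLangCheck {
    public static void main(String[] args) {
        try {
            // StringUtils ships in commons-lang-2.6.jar
            Class<?> c = Class.forName("org.apache.commons.lang.StringUtils");
            System.out.println("Found " + c.getName() + " via "
                    + c.getProtectionDomain().getCodeSource().getLocation());
        } catch (ClassNotFoundException e) {
            System.err.println("commons-lang 2.6 is NOT on the classpath: " + e);
        }
    }
}
```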
SQL dialect-specific
Pentaho software generates dialect-specific SQL when communicating with these data sources. Certified indicates the SQL dialect has been tested for compatibility with Pentaho.
**Pentaho Analyzer**

- Certified: Amazon Redshift, Azure SQL, Impala, MySQL, Microsoft SQL Server, Oracle, PostgreSQL, Snowflake
- Supported: Access, Firebird, Greenplum, Hsqldb, IBM DB2, IBM MQ 9.2, Informix, Ingres, Interbase, Neoview, SqlStream, Sybase, Vectorwise, Vertica, other SQL-89 compliant*

**Pentaho Metadata**

- Certified: Azure SQL, Hive 2, Impala, MySQL, PostgreSQL
- Supported: Amazon Redshift, ASSQL, Firebird, H2, Hypersonic, IBM DB2, IBM MQ 9.2, Ingres, Interbase, MS Access, MS SQL Server (JTDS Driver), MS SQL Server (Microsoft Driver), Snowflake, Sybase, Vertica, other SQL-92 compliant*

**Pentaho Data Integration**

- Certified: Amazon Redshift, Azure SQL, Hive, Hive 2, Impala, MS SQL Server (JTDS Driver), MS SQL Server (Microsoft Driver), MySQL, Oracle, PostgreSQL, Snowflake, Vertica
- Supported: AS/400, InfiniDB, Exasol 4, Firebird SQL, Greenplum, H2, Hypersonic, IBM DB2, IBM MQ 9.2, Informix, Ingres, Ingres VectorWise, MaxDB (SAP DB), Neoview, Oracle RDB, SQLite, UniVerse database, other SQL-92 compliant*
* If your data source is not in this list and is compatible with SQL-92, Pentaho software uses a generic SQL dialect.
Security
Pentaho software integrates with these third-party security authentication systems:
- Active Directory
- CAS 6.6 (Certified)
- Integrated Microsoft Windows Authentication
- LDAP
- RDBMS
Java virtual machine
Pentaho software has the following Java Runtime Environment (JRE) requirements:

| Pentaho Software | Certified | Supported |
| --- | --- | --- |
| All Pentaho software | Oracle Java 17, Oracle OpenJDK 17 | Oracle Java 11.x & 17.x, Oracle OpenJDK 11.x & 17.x, Eclipse Temurin by Adoptium, Zulu from Azul Systems |
**Note:** The PDI client requires at least Java 11.x to run on Windows 11.
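If you are unsure which JRE a machine will use, a quick check of the standard system properties can confirm that the runtime matches the versions listed above. This is a minimal sketch; the 11/17 range it warns about reflects the supported versions in the table.

```java
/**
 * Prints the JRE version and vendor so you can confirm the runtime
 * matches the certified or supported versions listed above.
 */
public class JreCheck {
    public static void main(String[] args) {
        System.out.println("Version: " + System.getProperty("java.version"));
        System.out.println("Vendor:  " + System.getProperty("java.vendor"));
        // Runtime.version().feature() returns the major version (11, 17, ...)
        int feature = Runtime.version().feature();
        if (feature != 11 && feature != 17) {
            System.err.println("Warning: Java " + feature
                    + " is outside the supported 11.x/17.x range.");
        }
    }
}
```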
Web browsers
Pentaho supports major versions of web browsers that are publicly available six weeks before the finalization of a Pentaho release.
| Certified Browsers | Supported Browsers |
| --- | --- |
| Apple Safari 16.4 (on macOS only) | Apple Safari 16.4 and later (on macOS only) |
| Google Chrome 126 | Google Chrome 126 and later |
| Microsoft Edge 126 | Microsoft Edge 126 and later |
| Mozilla Firefox 127 | Mozilla Firefox 127 and later |
Support statement for Analyzer on Impala
These are the minimum requirements for Analyzer to work with Impala:

- Pentaho 7.1 or later
- Impala 1.3.x or later
- Using the Parquet compressed file format is recommended for tables in Impala.
- Make sure that the JDBC driver is dropped into the Pentaho Server and Schema Workbench directories (a connection sketch follows this section). See the Install Pentaho Data Integration and Analytics document for details.
- Turn off connection pooling in the Pentaho Server.
- In Mondrian schemas, divide dimension tables with high cardinality into several levels.

**Note:** As with any data source, the performance of Pentaho Analyzer on Impala depends on the data shape, Impala's configuration, and the types of queries. See the best practice "Pentaho Analyzer with Impala as a Data Source" at https://support.pentaho.com/hc/en-us/articles/208652846 or download the PDF.

Compiled Mondrian automated test suite results are available for Analyzer on Impala with the OEM Simba driver, as well as with the community Apache Hive driver.
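As an illustration only, the sketch below shows a direct JDBC probe of an Impala daemon before wiring it into Analyzer. The URL scheme follows Cloudera's `jdbc:impala://` convention and 21050 is Impala's usual JDBC port, but the host and database are placeholders, and the exact driver class and authentication settings depend on which Impala JDBC driver and cluster security you use.

```java
import java.sql.Connection;
import java.sql.DriverManager;
import java.sql.ResultSet;
import java.sql.Statement;

/**
 * Direct JDBC probe of an Impala daemon, useful before configuring
 * Analyzer. Host and database are placeholders; requires the Impala
 * JDBC driver (the same JAR dropped into the Pentaho Server and
 * Schema Workbench directories) on the classpath. Add credentials or
 * Kerberos properties as your cluster requires.
 */
public class ImpalaProbe {
    public static void main(String[] args) throws Exception {
        String url = "jdbc:impala://impala-host:21050/default"; // placeholder host/database
        try (Connection conn = DriverManager.getConnection(url);
             Statement stmt = conn.createStatement();
             ResultSet rs = stmt.executeQuery("SHOW TABLES")) {
            while (rs.next()) {
                System.out.println(rs.getString(1));
            }
        }
    }
}
```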
Google BigQuery
You can use Google BigQuery as a data source with the Pentaho User Console or with the PDI client.
Before you begin, you must have a Google account and create service account credentials, in the form of a JSON key file, to connect to Google BigQuery. To create service account credentials, see the Google Cloud Storage Authentication documentation.
Additionally, you must set permissions for your BigQuery and Google Cloud accounts. To configure your service account authentication, see the Google Service Account documentation.
Perform the following steps to create a JDBC connection to a Google BigQuery data source from the User Console or the PDI client:

1. Stop the Pentaho Server.
2. Download the ZIP file containing the Simba version 1.5.4.1008 JDBC 4.2 driver for Google BigQuery from https://storage.googleapis.com/simba-bq-release/jdbc/SimbaJDBCDriverforGoogleBigQuery42_1.2.2.1004.zip.
3. Navigate to the `server/pentaho-server/tomcat/webapps/pentaho/WEB-INF/lib` directory for the User Console or the `design-tools/data-integration/lib` directory for the PDI client and delete any files associated with previous versions of Google BigQuery. Visually verify each file to ensure the older versions are deleted.
4. Extract the following files to the `server/pentaho-server/tomcat/webapps/pentaho/WEB-INF/lib` folder for the User Console or the `design-tools/data-integration/lib` directory for the PDI client:
   - `animal-sniffer-annotations-1.14.jar`
   - `api-common-1.7.0.jar`
   - `avro-1.9.0.jar`
   - `checker-compat-qual-2.5.2.jar`
   - `error_prone_annotations-2.1.3.jar`
   - `gax-1.42.0.jar`
   - `gax-grpc-1.42.0.jar`
   - `google-api-client-1.28.0.jar`
   - `google-api-services-bigquery-v2-rev426-1.25.0.jar`
   - `google-auth-library-credentials-0.15.0.jar`
   - `google-auth-library-oauth2-http-0.13.0.jar`
   - `GoogleBigQueryJDBC42.jar`
   - `google-cloud-bigquerystorage-0.85.0-alpha.jar`
   - `google-cloud-core-1.67.0.jar`
   - `google-cloud-core-grpc-1.67.0.jar`
   - `google-http-client-1.29.0.jar`
   - `google-http-client-apache-2.0.0.jar`
   - `google-http-client-jackson2-1.28.0.jar`
   - `google-oauth-client-1.28.0.jar`
   - `grpc-alts-1.18.0.jar`
   - `grpc-auth-1.18.0.jar`
   - `grpc-context-1.18.0.jar`
   - `grpc-core-1.18.0.jar`
   - `grpc-google-cloud-bigquerystorage-v1beta1-0.50.0.jar`
   - `grpc-grpclb-1.18.0.jar`
   - `grpc-netty-shaded-1.18.0.jar`
   - `grpc-protobuf-1.18.0.jar`
   - `grpc-protobuf-lite-1.18.0.jar`
   - `grpc-stub-1.18.0.jar`
   - `gson-2.7.jar`
   - `j2objc-annotations-1.1.jar`
   - `javax.annotation-api-1.3.2.jar`
   - `jsr305-3.0.2.jar`
   - `opencensus-api-0.18.0.jar`
   - `opencensus-contrib-grpc-metrics-0.18.0.jar`
   - `opencensus-contrib-http-util-0.18.0.jar`
   - `protobuf-java-3.7.0.jar`
   - `protobuf-java-util-3.7.0.jar`
   - `proto-google-cloud-bigquerystorage-v1beta1-0.50.0.jar`
   - `proto-google-common-protos-1.15.0.jar`
   - `proto-google-iam-v1-0.12.0.jar`
   - `threetenbp-1.3.3.jar`

   **Note:** The Google BigQuery connection name does not display in the User Console Database Connection dialog box until you copy these files.
5. Restart the Pentaho Server.
6. Log on to the User Console or the PDI client, then open the Database Connection dialog box. See the Install Pentaho Data Integration and Analytics document for more information on the Database Connection dialog box.
7. In the Database Connection dialog box, select General, then select Google BigQuery as the Database Type.
8. In the Settings area, enter the information for your Google BigQuery account:
   - The Host Name is the URL to Google's BigQuery web services API, for example, https://www.googleapis.com/bigquery/v2.
   - The Project ID in the PDI client and the Database name in the User Console are identical.
   - The Port Number is `443`.
9. Click Options, then add the following parameters and values (a connection sketch follows this procedure):

   | Parameter | Value |
   | --- | --- |
   | OAuthType | `0` (zero) |
   | OAuthServiceAcctEmail | Specify your service account email address. |
   | OAuthPvtKeyPath | Specify the path to your private key credential file. |
   | Timeout | Specify the amount of time, in seconds, before the server closes the connection. The recommended value is 120 seconds. |
10. Click Test to verify that you can connect to your data.
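Outside the Pentaho dialog boxes, the same Simba driver settings can be exercised directly over JDBC, which can help isolate driver or credential problems. The following is a minimal sketch, not the Pentaho-managed connection: the project ID, service account email, and key path are placeholders, and the URL format follows Simba's documented `jdbc:bigquery://` convention with the host, port, and parameters described above.

```java
import java.sql.Connection;
import java.sql.DriverManager;

/**
 * Standalone connectivity sketch for the Simba BigQuery JDBC driver,
 * mirroring the dialog settings above: host, port 443, OAuthType=0
 * (service account), key file path, and a 120-second timeout.
 * Project ID, email, and key path below are placeholders. Requires
 * the extracted driver JARs from step 4 on the classpath.
 */
public class BigQuerySmokeTest {
    public static void main(String[] args) throws Exception {
        String url = "jdbc:bigquery://https://www.googleapis.com/bigquery/v2:443;"
                + "ProjectId=my-project-id;"                                       // placeholder
                + "OAuthType=0;"                                                   // service account auth
                + "OAuthServiceAcctEmail=svc@my-project.iam.gserviceaccount.com;"  // placeholder
                + "OAuthPvtKeyPath=/path/to/credentials.json;"                     // placeholder
                + "Timeout=120";
        try (Connection conn = DriverManager.getConnection(url)) {
            System.out.println("Connected to BigQuery, driver version: "
                    + conn.getMetaData().getDriverVersion());
        }
    }
}
```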