Secure the Pentaho system

Lock down access to the Pentaho Server first. Then secure Hadoop and CDP integrations.

User security

You can choose from two different user security options: Pentaho Security or advanced security providers (such as LDAP, Single Sign-On, or Microsoft Active Directory).

For the Pentaho User Console (PUC), your predefined users and roles can be used if you are already using a security provider such as Lightweight Directory Access Protocol (LDAP), Microsoft Active Directory (MSAD), or Single Sign-On (SSO). Pentaho Data Integration (PDI) can also be configured to use your implementation of these providers or Kerberos to authenticate users and authorize data access.

These articles guide you through the process of configuring third-party security frameworks for the Pentaho Server.

Note: If you are evaluating Pentaho or have a production environment with fewer than a hundred users, you may decide to use Pentaho default security. See the Install Pentaho Data Integration and Analytics document for details.

Before you can implement advanced security, you must have installed and configured the Pentaho Server. You should have administrative-level knowledge of the security provider you want to use, details about your user community, and a plan for the user roles to be used in PDI. You should also know how to use the command line to issue commands for Microsoft Windows or Linux.

PUC can be used to perform most security tasks pertaining to the console. For some cases with PDI, you will need a text editor to modify text files. Some of these security tasks also require that you work on the actual machine where the Pentaho Server is installed.

All of the tasks that use the Administration page in PUC require that you log on to the User Console with the Pentaho administrator user name and password.

For information on the two different user security options you can choose from, see the following sections:

Pentaho Server security

Pentaho Security is a quick way to configure security. It works well when you do not use an external security provider and for communities of fewer than 100 users.

The Pentaho User Console (PUC) lets you define security by users and roles. The Pentaho Server controls which users and roles can access web resources and repository content.

Hiding user folders in PUC and PDI

One way you can centralize and secure content created by users is to hide individual users' Home folders in the Pentaho User Console (PUC) or in the PDI client. For example, if your organization implements multi-tenancy, you may want to prevent individual users from viewing their Home folders for security reasons.

You can configure your server to hide the Home folders by default for both PUC and PDI. When you create new users in your system, their Home folders will be hidden. If a user needs to create, edit, or save content, you can provide the Write permission in a folder that is visible to that user. Those users can then view the folder and access the content. You can add the Write permission in PUC and in the PDI client.

These tasks assume you are a Pentaho Administrator.

Perform the following steps to edit the system.properties file so that when you create new users, their Home folders will be hidden by default.

  1. Stop the Pentaho Server.

    See the Install Pentaho Data Integration and Analytics document for instructions on starting and stopping the Pentaho Server.

  2. Navigate to /Pentaho/server/pentaho-server/pentaho-solutions/system and open system.properties in a text editor.

  3. Locate the hideUserHomeFolderOnCreate property. By default, this property is set to false.

  4. Change the setting to true:

    hideUserHomeFolderOnCreate=true

  5. Save and close system.properties.

  6. Start the Pentaho Server.

    See the Install Pentaho Data Integration and Analytics document for instructions on starting and stopping the Pentaho Server.

Now when you add a new user in either PUC or the PDI client, that user's Home folder is hidden by default.
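If you manage several servers, the edit in steps 3 through 5 can be scripted. The following is a minimal Python sketch, not a Pentaho utility: set_property and hide_home_folders are hypothetical helper names, and the sketch assumes system.properties contains plain key=value lines. Run it only while the Pentaho Server is stopped.

```python
from pathlib import Path


def set_property(text: str, key: str, value: str) -> str:
    """Return properties text with `key` set to `value`.

    Replaces the first uncommented `key=...` line, or appends the
    property when the key is absent. Other lines are left untouched.
    """
    lines = text.splitlines()
    for i, line in enumerate(lines):
        stripped = line.strip()
        if stripped.startswith(("#", "!")):
            continue  # skip comment lines
        name, sep, _ = stripped.partition("=")
        if sep and name.strip() == key:
            lines[i] = f"{key}={value}"
            break
    else:
        lines.append(f"{key}={value}")
    return "\n".join(lines) + "\n"


def hide_home_folders(properties_file: Path, hidden: bool = True) -> None:
    """Rewrite the given system.properties file in place."""
    # ISO-8859-1 round-trips every byte, which suits Java properties files.
    original = properties_file.read_text(encoding="iso-8859-1")
    updated = set_property(
        original, "hideUserHomeFolderOnCreate", "true" if hidden else "false"
    )
    properties_file.write_text(updated, encoding="iso-8859-1")
```

Calling hide_home_folders with hidden=False applies the reverse setting, which matches the later procedure for showing Home folders again.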

Next steps:

Override the hidden Home folder for a user

Follow these steps to override the hidden Home folder for a specific user.

  1. Log in to PUC with your Pentaho Administrator credentials.

  2. Go to Browse Files.

  3. Select the user’s Home folder.

  4. Click Properties in the Actions pane, and then clear Hidden in the Properties dialog box.

The user's Home folder is now visible.

Stop hiding the Home folder for new users

Follow these steps to stop creating users with their Home folders hidden by default.

  1. Stop the Pentaho Server.

  2. Navigate to /Pentaho/server/pentaho-server/pentaho-solutions/system and open system.properties in a text editor.

  3. Locate the hideUserHomeFolderOnCreate property.

  4. Change the setting to false:

    hideUserHomeFolderOnCreate=false

  5. Save and close system.properties.

  6. Start the Pentaho Server.

When you add a new user in either PUC or the PDI client, that user's Home folder is now visible.

See the Install Pentaho Data Integration and Analytics document for instructions on starting and stopping the Pentaho Server.

Assign the Write permission to a user folder in PUC

In PUC you can assign Write permission in a public folder to a user whose Home folder is hidden. When this permission is granted, the user can save and edit content they create using PUC.

  1. Log in to PUC with your Pentaho Administrator credentials.

  2. Select the Public folder, and then select or create the folder you want the user to access.

  3. Assign Write permission:

    1. Click Properties > Share and clear Inherits folder permissions.

    2. Click Add and select the user.

    3. Select Write for the user.

    4. Click OK to save your changes.

The user can now save content in the assigned folder.

Assign the Write permission to a user folder in the PDI client

In the PDI client you can assign the Write permission in a public folder to a user whose Home folder is hidden. When this permission is granted, the user can save and edit content they create using the PDI client.

  1. Start the PDI client.

    See the Pentaho Data Integration document for instructions on starting the PDI client.

  2. Connect to a Pentaho Repository with your Pentaho Administrator credentials.

  3. Open the Repository Explorer: Tools > Repository > Explore.

  4. On the Browse tab, select the Public folder, and then select or create the folder you want the user to access.

  5. Select the folder and grant Write permission:

    1. On the Access Control panel, clear Inherit access control from parent.

    2. Click the Plus sign to add a user.

    3. Select the user and move them to Selected. Click OK.

    4. Select the user in User/Role and select Write.

    5. Click Apply, then click OK.

  6. Close the Repository Explorer.

The user can now save and edit content in the assigned folder.

Restrict or share files and folders

You can refine access to files and folders in the Pentaho User Console. Each file or folder can use the default permissions, or you can tailor its permissions for specific users and roles.

Prior to performing this task, determine whether you will use the default Pentaho roles or create custom users and roles. You must also have successfully set up your security back end.

  1. Log in to the User Console using the administrator role.

  2. From Browse Files, choose the folder you want to set permissions on from the Folders pane.

    If you want to set permissions on a specific file in that folder, highlight the file in the Files pane.

  3. Click Properties in the Actions pane.

    The Properties window appears.

  4. On the Share tab, select the Role that you want to set permissions for. Then clear Inherits folder permissions.

    Permissions for [Role] becomes available.

  5. Select permissions for that role, then click OK.

The permissions are set for that file or folder and are associated with the selected role.

For additional security in multi-tenancy organizations, you can hide individual users' Home folders. See Hiding user folders in PUC and PDI.

Pass authentication credentials in URL parameters

Note: This section is currently a placeholder. Add the approved guidance for passing credentials in URL parameters.

Remove Pentaho Server security

You can remove Pentaho Server security by enabling anonymous access or by modifying data source management.

Enable anonymous access

You can bypass built-in security on the Pentaho Server by giving all permissions to anonymous users. An "anonymousUser" is any user, either existing or newly created, that you specify as an all-permissions, no-login user, and to whom you grant the Anonymous role.

Warning: All of the files you will be using are located in /pentaho/server/pentaho-server/pentaho-solutions/system. Before you begin, stop the Pentaho Server.

Modify application security

Perform the following steps to modify application security:

  1. Open applicationContext-spring-security.xml in a text editor.

  2. Make sure a default anonymous role is defined. Match your bean definition and property value to the following example:

    Note: These next steps permit PDI client tools to publish to the Pentaho Server without a user name and password.

  3. Find these two beans in the same file:

    • filterInvocationInterceptor

    • filterInvocationInterceptorForWS

  4. Locate the securityMetadataSource property in the beans and match the contents to the following example:

  5. Save and close applicationContext-spring-security.xml.

Modify Pentaho configuration

Perform the following steps to modify the Pentaho configuration:

  1. Open pentaho.xml in a text editor.

  2. Find the anonymous-authentication section under pentaho-system, and define the anonymous user and role as shown in the following example:

  3. Save and close pentaho.xml.

Modify repository properties

Perform the following steps to modify the repository properties:

  1. Open repository.spring.properties in a text editor.

  2. Find singleTenantAdminAuthorityName and replace the value with Anonymous.

  3. Find singleTenantAdminUserName and replace the value with your anonymous user name.

  4. Save and close the file.

Map the appropriate role

Perform the following steps to map roles:

  1. In pentahoObjects.spring.xml, find all references to the bean id="Mondrian-UserRoleMapper". Make sure the only active mapper is the one shown in the following example:

  2. If you changed pentahoObjects.spring.xml, save and close the file.

You have now worked around Pentaho Server security. If you use the relational metadata database model, refer to Remove Security from Metadata Domain Repository for the next steps.

Remove security from data source management

This procedure changes your data source management so that an anonymous user can access it. It is one part of completely removing security from the Pentaho Server and does not remove all security by itself. To remove the rest, also enable anonymous access as described above.

Perform the following steps to remove security from data source management:

  1. Stop the Pentaho Server (if needed).

  2. Open /pentaho/server/pentaho-server/pentaho-solutions/system/data-access/settings.xml in a text editor.

    1. Find <data-access-roles>Administrator</data-access-roles> and change Administrator to Anonymous.

    2. Find <data-access-view-roles>Authenticated,Administrator</data-access-view-roles> and change Authenticated,Administrator to Anonymous.

    3. Find <data-access-view-users>suzy</data-access-view-users> and change suzy to anonymousUser.

    4. Find <data-access-datasource-solution-storage>admin</data-access-datasource-solution-storage> and change admin to anonymousUser.

  3. Save and close the file.

  4. Restart the Pentaho Server.
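The four substitutions in step 2 can also be applied programmatically. The sketch below uses Python's standard xml.etree.ElementTree module and takes the element names from the substeps above; the function name open_data_access is hypothetical, and the layout of the rest of settings.xml is not assumed. ElementTree discards XML comments when it parses a file, so edit by hand if you need to keep them.

```python
import xml.etree.ElementTree as ET

# Element name -> new value, taken from the four substeps above.
ANONYMOUS_SETTINGS = {
    "data-access-roles": "Anonymous",
    "data-access-view-roles": "Anonymous",
    "data-access-view-users": "anonymousUser",
    "data-access-datasource-solution-storage": "anonymousUser",
}


def open_data_access(xml_text: str) -> str:
    """Return settings XML with the four data-access values replaced."""
    root = ET.fromstring(xml_text)
    for element in root.iter():
        if element.tag in ANONYMOUS_SETTINGS:
            element.text = ANONYMOUS_SETTINGS[element.tag]
    return ET.tostring(root, encoding="unicode")
```

Elements that are not listed in ANONYMOUS_SETTINGS pass through unchanged, so the function is safe to run on a copy of the file before you replace the original.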

Advanced security providers

Use these options when the default Pentaho security provider is not enough.

This section consolidates configuration guidance for common advanced providers and related hardening tasks.

Spring (authentication providers) security

Spring Security is a cascading security implementation that moves down through a list of authentication providers. If the first provider fails to authenticate a user, the application tries the next provider in the list. If you use multiple AuthenticationProviders at the same time, you must add each security provider to the applicationContext-spring-security.xml file. You must also add provider name values to the activeUserDetailsService beans in the pentahoObjects.spring.xml file. Back up both files before you alter them.
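The cascade can be pictured as a loop over an ordered list. The Python sketch below is a conceptual model only, not Spring Security code: the Provider type, the authenticate function, and the two in-memory user stores are hypothetical stand-ins, and real providers handle errors and account states that this sketch ignores.

```python
from typing import Callable, Optional, Sequence, Tuple

# A provider takes (username, password) and returns True when it accepts them.
Provider = Callable[[str, str], bool]


def authenticate(providers: Sequence[Tuple[str, Provider]],
                 username: str, password: str) -> Optional[str]:
    """Try each provider in list order; return the first name that succeeds."""
    for name, check in providers:
        if check(username, password):
            return name
    return None  # every provider rejected the credentials


# Hypothetical stand-ins for real providers, used only for illustration.
jackrabbit_users = {"admin": "password"}
ldap_users = {"suzy": "secret"}

providers = [
    ("jackrabbit", lambda u, p: jackrabbit_users.get(u) == p),
    ("ldap", lambda u, p: ldap_users.get(u) == p),
]
```

With this list, authenticate(providers, "suzy", "secret") falls through the first provider and is accepted by the second, which is why the order of the provider list matters.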

ApplicationContext

Perform the following steps to add security providers to the ApplicationContext:

  1. Stop the Pentaho Server and the solution repository.

  2. Navigate to the /pentaho-solutions/system directory and open the applicationContext-spring-security.xml file with any text editor.

  3. Locate the following authenticationManager bean tags:

  4. Add your AuthenticationProvider information below the list tag. The example below adds the jackrabbitprovider:

  5. Then, add providerName information right beneath the jackrabbit information. LDAP is used in this example. You can add as many providers as needed:

  6. After you are finished adding AuthenticationProvider information, save and close the file.

The following code block is a more complete example of the authenticationManager portion of the applicationContext-spring-security.xml file:

Add the Jackrabbit provider

The Jackrabbit provider is required in the activeUserDetailsService bean, even if you configure another provider. Perform the following steps to add the Jackrabbit provider to the activeUserDetailsService bean:

  1. Navigate to the /pentaho-solutions/system directory and open the pentahoObjects.spring.xml file with any text editor.

  2. Locate the activeUserDetailsService bean tag:

  3. Replace ${security.provider} with the jackrabbit provider value. For example:

Add another provider

Perform the following steps to add more provider names:

  1. Duplicate the activeUserDetailsService bean shown in step 2 of Add the Jackrabbit provider.

  2. Rename the bean ID, for example: bean id="activeUserDetailsService2"

  3. Replace the jackrabbit value with the new provider value. For example:

  4. Locate the following UserDetailsService bean tags:

  5. Add your bean ID to the list element. For example:

  6. Restart the Pentaho Server and solution repository.

Configure authentication

To configure web resource authentication to correspond with your user roles in the Pentaho Server, perform the following steps.

  1. Ensure that the Pentaho Server is not currently running; if it is, run the stop-pentaho script.

  2. Open a Terminal or Command Prompt window and navigate to the .../pentaho-solutions/system/ directory.

  3. Edit the applicationContext-spring-security.xml file with a text editor.

  4. Find and examine the following property: <property name="objectDefinitionSource">

  5. Modify the regex patterns to include your roles.

    The objectDefinitionSource property associates URL patterns with roles. RoleVoter specifies that if any role on the right hand side of the equals sign is granted to the user, the user may view any page that matches that URL pattern. The default roles in this file are not required. You can replace, delete, or change them in any way that suits you.

You should now have coarse-grained permissions established for user roles.
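To make the pattern-to-role idea concrete, here is a small Python model of the check described in step 5. It is a sketch under stated assumptions: the RULES entries and the may_view function are hypothetical, the patterns are not the ones shipped in the file, and the sketch assumes that the first matching pattern decides the outcome.

```python
import re
from typing import Iterable, List, Set, Tuple

# Hypothetical rules in "pattern = roles" form; replace with your own roles.
RULES: List[Tuple[str, Set[str]]] = [
    (r"/admin/.*", {"Administrator"}),
    (r"/reports/.*", {"Administrator", "Report Author"}),
    (r"/.*", {"Authenticated"}),
]


def may_view(url: str, user_roles: Iterable[str]) -> bool:
    """Apply the first rule whose pattern matches the whole URL.

    Access is granted when the user holds any role listed for that rule.
    """
    held = set(user_roles)
    for pattern, allowed in RULES:
        if re.fullmatch(pattern, url):
            return bool(held & allowed)
    return False  # this sketch denies URLs that no rule covers
```

Because the first match wins in this model, specific patterns belong above general ones: a user with only the Authenticated role is refused /admin/ pages even though the last rule would otherwise match.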

Authentication provider examples

| Provider Name | Short Description | Application Context for AuthenticationProvider |
| --- | --- | --- |
| Jackrabbit | Default Pentaho security. | applicationContext-spring-security-jackrabbit.xml |
| LDAP | LDAP security. | applicationContext-spring-security-ldap.xml |
| JDBC | JDBC security allows you to use your own security tables. | applicationContext-spring-security-jdbc.xml |
| Memory | In-memory authentication. | applicationContext-spring-security-memory.xml |

JDBC security

You must have existing security tables in a relational database in order to proceed with this task.

Switch to JDBC security

Follow the instructions below to switch from Pentaho default security to JDBC security, which will allow you to use your own security tables.

If you want to use encrypted passwords for JDBC security as explained in the Install Pentaho Data Integration and Analytics document, use the encrypted password for all password values.

If you are using the Pentaho Server and choose to switch to a JDBC security shared object, you will no longer be able to use the role and user administration settings in the Administration portion of the User Console.

  1. Stop the Pentaho Server.

  2. Open /pentaho-solutions/system/security.properties with a text editor.

  3. Change the value of the provider property to jdbc.

  4. Set up the connection to the database that holds the users and authorities:

    1. Open the /pentaho-solutions/system/applicationContext-spring-security-jdbc.properties file with a text editor. Find the following two lines and change the jdbcDriver and URL to the appropriate values.

    2. Change the user name and password by editing the following two items:

    3. Set the validation.query by editing its row. Examples of different validation queries are shown in the file.

    4. Set the wait timeout, max pool, and max idle by editing the following three items to change the defaults.

    5. Save the file and close the editor.

  5. If needed, modify the user queries that pull information about users and authorities:

    1. Open /pentaho-solutions/system/applicationContext-spring-security-jdbc.xml with a text editor.

    2. Find the following line and change the SQL query returning the user and roles for which the user is a member to the appropriate statement:

    3. Find the following line and change the SQL query that determines the user, password, and whether they can log in to the appropriate statement:

  6. If needed, modify the following role queries that pull information about users and authorities.

    1. Open the /pentaho-solutions/system/applicationContext-pentaho-security-jdbc.xml file with a text editor.

    2. Find the following line and change the SQL query showing the roles for security on objects to the appropriate statement:

    3. Find the following line and change the SQL query that returns all users in a specific role to the appropriate statement:

    4. Find the following line and change the SQL query that returns all users by order to the appropriate statement:

    5. Save the file and close the editor.

  7. Update the default Pentaho admin user on the system to map to your JDBC admin user:

    1. Open the /pentaho-solutions/system/repository.spring.properties file with a text editor.

    2. Find the following lines and change the default value from <admin> to map to your <admin username> in your JDBC system:

    3. Save the file and close the editor.

  8. To fully map the JDBC's admin role to other configuration files, specify the name of the administrator role for your JDBC authentication database in the applicationContext-pentaho-security-jdbc.xml file.

    1. Open the /pentaho-solutions/system/applicationContext-pentaho-security-jdbc.xml file with a text editor.

    2. Find the following lines and change the entry key to the key assigned to the administrator role in your JDBC authentication database:

    3. Save and close the file.

  9. Start the Pentaho Server.

    The server is configured to authenticate users against the specified database.

Manual LDAP/JDBC hybrid configuration

You might need to create a hybrid between an LDAP security solution and a JDBC security table for role definitions. This is common in situations where LDAP roles can't be redefined for Pentaho Server use. These instructions help you switch the Pentaho Server's authentication back-end from the Pentaho data access object to an LDAP/JDBC hybrid.

Before you begin configuring LDAP and JDBC for the Pentaho Server, you will need to verify a couple of things.

| Task | Description |
| --- | --- |
| Verify Successful Default Pentaho Security Deployment | Make sure your Pentaho Server has been successfully deployed using default Pentaho Security (Jackrabbit authentication). |
| Configure Pentaho for LDAP Authentication | Verify that your Pentaho system is configured for LDAP authentication. |
| Verify Database with User Roles | Verify that you have a database populated with your user roles. |

After you finish the prerequisite tasks above, complete the following steps to set up a hybrid LDAP/JDBC configuration. The table structure described here is for example purposes.

These sections will guide you through the remaining steps of this process:

  • Step 1: Create user/authorities database tables

  • Step 2: Update user and role values for tables

  • Step 3: Update JDBC security queries

  • Step 4: Enable JDBC authorization beans

  • Step 5: Verify LDAP/JDBC configuration

Step 1: Create user/authorities database tables

You will need to create a few database tables in order to get LDAP and JDBC to work together.

  1. Create a table called USERS with the following columns:

    | Column Name | Column Type | Column Description |
    | --- | --- | --- |
    | username | VARCHAR(50) | The user name. |
    | password | VARCHAR(50) | This column value is not considered in a hybrid LDAP/JDBC solution. |
    | enabled | VARCHAR(100) | Set to true if the user is enabled; false if not enabled. |

  2. Create a table called AUTHORITIES with the following column:

    | Column Name | Column Type | Column Description |
    | --- | --- | --- |
    | authority | VARCHAR(50) | The Pentaho role, such as Administrator or Report Author. |

  3. Create a table called GRANTED_AUTHORITIES with the following columns:

    | Column Name | Column Type | Column Description |
    | --- | --- | --- |
    | username | VARCHAR(50) | The user name. |
    | authority | VARCHAR(50) | Associated Pentaho role. |
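The three tables can be tried out in a throwaway database before you create them in your real one. This Python sketch uses the standard sqlite3 module purely for illustration; your production database and its DDL dialect will differ. It assumes the third USERS column is named enabled (the field name that Step 3 says Spring Security expects) and sizes authority as VARCHAR(50) in both tables so that role names such as Administrator fit. The function name create_security_tables is hypothetical.

```python
import sqlite3

SCHEMA = """
CREATE TABLE USERS (
    username VARCHAR(50),
    password VARCHAR(50),   -- not considered in a hybrid LDAP/JDBC solution
    enabled  VARCHAR(100)   -- 'true' or 'false'
);
CREATE TABLE AUTHORITIES (
    authority VARCHAR(50)   -- Pentaho role, such as Administrator
);
CREATE TABLE GRANTED_AUTHORITIES (
    username  VARCHAR(50),
    authority VARCHAR(50)
);
"""


def create_security_tables(connection: sqlite3.Connection) -> None:
    """Create the three example tables on an open connection."""
    connection.executescript(SCHEMA)


connection = sqlite3.connect(":memory:")  # throwaway database for illustration
create_security_tables(connection)
```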

Step 2: Update user and role values for tables

Next, you will need to perform a series of updates for the tables you just created. Users will be authenticated using their Active Directory password.

Note: Some syntax examples are provided here for you to customize with your own values.

  1. Update usernames and passwords in the USERS table as shown:

  2. Update roles in the AUTHORITIES table as shown:

  3. Update users with their associated roles in the GRANTED_AUTHORITIES table as shown:
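As an illustration of the three updates, the following Python sketch fills an in-memory SQLite copy of the example tables and then looks up the roles for one user. Everything in it is an assumption made for the example: the user names, the placeholder password value, the enabled column name, and the roles_for helper. Adapt the statements to your own database and values.

```python
import sqlite3

connection = sqlite3.connect(":memory:")  # throwaway database for illustration
connection.executescript("""
CREATE TABLE USERS (username VARCHAR(50), password VARCHAR(50), enabled VARCHAR(100));
CREATE TABLE AUTHORITIES (authority VARCHAR(50));
CREATE TABLE GRANTED_AUTHORITIES (username VARCHAR(50), authority VARCHAR(50));
""")

# Hypothetical users; the password value is a placeholder because the
# directory server performs the password check in a hybrid configuration.
connection.executemany(
    "INSERT INTO USERS (username, password, enabled) VALUES (?, ?, ?)",
    [("admin", "ignored", "true"), ("suzy", "ignored", "true")],
)
connection.executemany(
    "INSERT INTO AUTHORITIES (authority) VALUES (?)",
    [("Administrator",), ("Report Author",)],
)
connection.executemany(
    "INSERT INTO GRANTED_AUTHORITIES (username, authority) VALUES (?, ?)",
    [("admin", "Administrator"), ("suzy", "Report Author")],
)
connection.commit()


def roles_for(username: str) -> list:
    """Return the roles granted to one user, sorted by name."""
    rows = connection.execute(
        "SELECT authority FROM GRANTED_AUTHORITIES"
        " WHERE username = ? ORDER BY authority",
        (username,),
    )
    return [row[0] for row in rows]
```

The lookup in roles_for has the same shape as the authorities query you adjust in Step 3: given a user name, return that user's roles.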

Step 3: Update JDBC security queries

You might have different names for your created tables than are provided in these examples. If so, after you have updated your user and role values in your tables, you need to update a couple of queries and other items to match your system names.

  1. Locate the /pentaho-server/pentaho-solutions/system directory and update these two files with the noted information.

    1. applicationContext-pentaho-security-jdbc.xml

    2. applicationContext-spring-security-jdbc.xml

  2. Update the queries, as well as field names such as username, password, and enabled that Spring Security expects. Be sure to use an alias if you are using different field names.

  3. Stop the Pentaho Server.

  4. Copy your respective database JDBC driver to the tomcat/lib directory.

    See the JDBC drivers reference in the Try Pentaho Data Integration and Analytics document for information on supported drivers.

Step 4: Enable JDBC Authorization beans

Last, you will need to enable some JDBC Authorization beans.

Update security properties file

Perform the following steps to update the security.properties file:

  1. Stop the Pentaho Server.

  2. Locate the pentaho-server/pentaho-solutions/system directory.

  3. Open the security.properties file with any text editor.

    1. Locate the LDAP property bean and add the role provider as shown here:

  4. Save and close the file.

Update spring security-jdbc properties file

Perform the following steps to update the applicationContext-spring-security-jdbc.properties file:

  1. Open the applicationContext-spring-security-jdbc.properties file.

  2. Add or update this database information with your system values.

    | Database Setting | Description |
    | --- | --- |
    | datasource.driver.classname | Fully qualified Java class name of the JDBC driver you are using. |
    | datasource.url | Connection URL to be passed to your JDBC driver to establish a connection. |
    | datasource.username | Connection user name to be passed to your JDBC driver to establish a connection. |
    | datasource.password | Connection password to be passed to your JDBC driver to establish a connection. |
    | datasource.validation.query | SQL query that is used to validate connections from this pool before returning them to the caller. This query must be a SELECT statement that returns at least one row. |
    | datasource.pool.max.wait | Maximum number of milliseconds that the pool waits, when no connections are available, for a connection to be returned before throwing an exception, or <= 0 to wait indefinitely. Default is -1. |
    | datasource.pool.max.active | Maximum number of active connections that can be allocated from this pool at the same time, or negative for no limit. Default value is 8. |
    | datasource.max.idle | Maximum number of connections that can remain idle in the pool, without extra ones being destroyed, or negative for no limit. Default value is 8. |
    | datasource.min.idle | Minimum number of connections that can remain idle in the pool, without extra ones being created when the evictor runs, or 0 to create none. Default value is 0. |

  3. Save and close the file.
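Before you restart the server, a quick check of the values you entered can catch simple mistakes. The following Python sketch is a hypothetical pre-flight check, not part of Pentaho: parse_properties and check_datasource are illustrative names, the parser handles only plain key=value lines, and the rules it applies are limited to those stated in the settings list above (required connection settings, a SELECT validation query, and integer pool values).

```python
def parse_properties(text: str) -> dict:
    """Parse simple key=value lines, ignoring blanks and # or ! comments."""
    result = {}
    for line in text.splitlines():
        stripped = line.strip()
        if not stripped or stripped.startswith(("#", "!")):
            continue
        key, sep, value = stripped.partition("=")  # split on the first "="
        if sep:
            result[key.strip()] = value.strip()
    return result


REQUIRED = ("datasource.driver.classname", "datasource.url",
            "datasource.username", "datasource.password",
            "datasource.validation.query")
INTEGER_KEYS = ("datasource.pool.max.wait", "datasource.pool.max.active",
                "datasource.max.idle", "datasource.min.idle")


def check_datasource(props: dict) -> list:
    """Return a list of problems found; an empty list means none."""
    problems = [f"missing {key}" for key in REQUIRED if key not in props]
    query = props.get("datasource.validation.query", "")
    if query and not query.lstrip().upper().startswith("SELECT"):
        problems.append("datasource.validation.query must be a SELECT statement")
    for key in INTEGER_KEYS:
        if key in props:
            try:
                int(props[key])
            except ValueError:
                problems.append(f"{key} must be an integer")
    return problems
```

The check does not open a database connection, so a clean result only means the file is well formed, not that the credentials or URL are correct.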

Update the pentaho-security-jdbc file

Perform the following steps to update the applicationContext-pentaho-security-jdbc.xml file:

  1. Open the applicationContext-pentaho-security-jdbc.xml file.

  2. Change the entry key to show your admin role for your database.

From:

To:

  3. Save and close the file.

Update the pentahoObjects spring file

Perform the following steps to update the pentahoObjects.spring.xml file:

  1. Open the pentahoObjects.spring.xml file.

  2. Change these beans as shown, then save and close the file.

From:

To:

Update spring security-ldap file

Perform the following steps to update the applicationContext-spring-security-ldap.xml file:

  1. Open the applicationContext-spring-security-ldap.xml file.

  2. Remove the bean for org.springframework.security.ldap.populator.DefaultLdapAuthoritiesPopulator and replace it with this one:

  3. Save and close the file.

Update the repository spring properties file

Perform the following steps to update the repository.spring.properties file:

  1. Open the repository.spring.properties file.

  2. Locate the value for the singleTenantAdminUserName and make sure that it points to the correct admin user for your system.

  3. Restart the Pentaho Server.

Step 5: Verify LDAP/JDBC Configuration

Pentaho should now be configured with hybrid authentication. Users are authenticated through LDAP, and their roles are authorized through JDBC. You can verify this by logging in to PUC as an admin and checking the Users & Roles tab in the Administration perspective.

SAML security

Security Assertion Markup Language (SAML) is a web-based authentication mechanism that relies on the browser as an agent to broker the authentication flow. There are numerous third-party Identity Providers (IdP) available, such as OpenSSO, OKTA, and SSOCircle.com.

The following diagram is a high-level sketch of a SAML identification structure containing a third-party Identity Provider (IdP), an End-User Browser (the Pentaho User Console), and a Service Provider (the Pentaho Server):

See the guidelines for SAML on the Support Portal for help in setting up an example instance of SAML security. If you want to extend your SAML setup further, work with your Customer Success Manager or contact Support.

LDAP security

To use Lightweight Directory Access Protocol (LDAP) for user security, you must switch from the default Pentaho security to LDAP, then you must configure LDAP.

Switch to LDAP

To connect to your LDAP server, you must import the certificate into the JRE's truststore/keystore used by the Pentaho Server (java/lib/security/cacerts).

  1. From the User Console Home menu, click Administration, then select Authentication from the left.

    The Authentication interface appears. Local - Use basic Pentaho Authentication is selected by default.

  2. Select the External - Use LDAP / Active Directory server option.

    User console authentication set to external

    The LDAP Server Connection fields populate with a default URL, user name, and password.

  3. Change the Server URL, User Name, and Password as needed.

  4. Click Test Server Connection to verify the connection to your LDAP server and to complete the set up.

  5. Click the node to select the Pentaho System Administrator user and role to match your LDAP configuration, then click OK.

    Note: The Admin user is required for all system-related operations, including the creation of user folders. The Administrator Role is required for mapping a third-party admin role to the Pentaho admin role (Administrator).

  6. Select your LDAP Provider from the drop-down menu.

  7. Configure the LDAP connection as explained in LDAP properties.

  8. Stop the Pentaho Server.

    See the Install Pentaho Data Integration and Analytics document for instructions on starting and stopping the Pentaho Server.

  9. Delete the server/pentaho-server/pentaho-solutions/system/karaf/caches folder.

  10. Restart the Pentaho Server and test the LDAP functionality.

    See the Install Pentaho Data Integration and Analytics document for instructions on starting and stopping the Pentaho Server.

The Pentaho Server is now configured to authenticate users against your LDAP directory server.

Big data security

Secure Hadoop and CDP integrations in Pentaho.

Use Kerberos authentication or secure impersonation. Use Knox when you need a gateway.

Supported CDP security integrations

Note: If you use Knox, see Use Knox to access CDP.

These systems are supported when using Pentaho with a secured CDP cluster.

| System | Kerberos | Knox |
| --- | --- | --- |
| HDFS | Supported | Supported |
| Avro | Supported | Supported |
| ORC | Supported | Not supported |
| Parquet | Supported | Not supported |
| YARN | Supported | Not supported |
| YARN-based Pentaho MapReduce (PMR)* | Supported | Not supported |
| Sqoop | Supported | Not supported |
| Hive | Supported | Supported |
| YARN-based HBase | Supported | Supported |
| YARN-based Hadoop job executor | Supported | Not supported |
| Oozie | Not supported | Supported |
| Impala | Supported | Supported |
| Spark submit | Supported | Not supported |

* You must have permission to write to /opt in the HDFS root directory to run PMR in CDP Public Cloud with Kerberos.

Note: Per CDP support requirements, all cluster nodes must run CentOS. You cannot customize it.

Choose an approach

Kerberos authentication validates each user directly against Kerberos.

Secure impersonation uses a service identity, then impersonates the Pentaho user.

Use secure impersonation when jobs run on the Pentaho Server.

Use Kerberos authentication when you need user-level access without impersonation.

Secure impersonation

The mapping value simple in the driver configuration file enables secure impersonation.

This value is set when you define impersonation settings in a named connection.

How secure impersonation works

At startup, the Pentaho Server checks the mapping type value in the configuration file:

  • If the value is disabled or blank, the server does not use authentication.

  • If the value is simple, requests are evaluated by origin.

    • Requests from a client tool use Kerberos authentication.

    • Requests from the Pentaho Server use secure impersonation when supported.

    • If the component does not support secure impersonation, Kerberos is used.
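The decision order above can be sketched as a small shell function. This only illustrates the rules as listed and is not Pentaho's implementation. The function name choose_auth and its argument values (client, server, yes, no) are hypothetical.

```shell
# Illustrative only: mirrors the decision order described above.
# choose_auth MAPPING_TYPE ORIGIN SUPPORTS_IMPERSONATION
#   MAPPING_TYPE: "simple", "disabled", or empty
#   ORIGIN: "client" (client tool) or "server" (Pentaho Server)
#   SUPPORTS_IMPERSONATION: "yes" or "no" for the target component
choose_auth() {
  mapping_type=$1
  origin=$2
  supports=$3

  case "$mapping_type" in
    ""|disabled)
      # Disabled or blank: no authentication is used.
      echo "none"
      return 0
      ;;
    simple)
      ;;
    *)
      echo "unknown mapping type: $mapping_type" >&2
      return 1
      ;;
  esac

  # Only server-side requests to supporting components are impersonated.
  if [ "$origin" = "server" ] && [ "$supports" = "yes" ]; then
    echo "secure-impersonation"
  else
    echo "kerberos"
  fi
}

choose_auth simple server yes   # prints: secure-impersonation
choose_auth simple client yes   # prints: kerberos
```

A server-side request to a component without impersonation support falls through to Kerberos, which matches the last rule in the list.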

When impersonation succeeds, the Pentaho Server log shows:

[Figure: Secure impersonation overview]

Restart the server after changing the mapping type value.

Secure impersonation prerequisites

To use secure impersonation:

  • The cluster must be secured with Kerberos.

  • The Kerberos server must be reachable from the Pentaho Server.

  • Kerberos must be installed and configured on the Pentaho machine.

  • A cluster-side Kerberos principal must represent Pentaho.

  • That principal must be allowed to impersonate users.

  • Requests must originate from the Pentaho Server.

  • Target components must support secure impersonation.

The cluster administrator is responsible for cluster users and Kerberos server setup.

For details, see your cluster vendor's documentation and the Hadoop security documentation.

Secure impersonation supported components

Secure impersonation support is determined by the underlying Hadoop components.

Supported:

  • Cloudera Impala

  • HBase

  • HDFS

  • Hadoop MapReduce

  • Hive

  • Oozie

  • Pentaho MapReduce (PMR)

    • You can securely connect to Hive and HBase within the mapper, reducer, or combiner.

Not supported:

  • Carte on YARN

  • Impala

  • Sqoop

  • Spark SQL

Secure impersonation directly from these tools is not supported:

  • PDI client (Spoon)

  • Scheduled jobs and transformations

  • Pentaho Report Designer

  • Pentaho Metadata Editor

  • Kitchen

  • Pan

  • Carte

Configure MapReduce jobs (Windows only)

On Windows, update mapred-site.xml so MapReduce jobs run with secure impersonation.

  1. Open:

    <username>/.pentaho/metastore/pentaho/NamedCluster/Configs/<connection name>/mapred-site.xml

  2. Add these properties:

  3. Save the file.

Connect to a Cloudera Impala database (Cloudera only)

If you connect to a secure Impala database, update the PDI database connection options.

  1. Download the Cloudera Impala JDBC driver: https://www.cloudera.com/downloads/connectors/impala/jdbc/2-6-15.html

    Secure impersonation with Impala is supported only with the Cloudera Impala JDBC driver.

  2. Extract ImpalaJDBC41.jar into:

    <username>/.pentaho/metastore/pentaho/NamedCluster/Configs/cdp71/lib

  3. Create a database connection in the PDI client.

  4. Set these general values:

    • Connection Type: Cloudera Impala

    • Database Name: default

    • Port Number: 443

  5. On Options, set:

    • KrbHostFQDN: Fully qualified domain name of the Impala host

    • KrbServiceName: Service principal name of the Impala server

    • KrbRealm: Kerberos realm used by the cluster

  6. Select Test.
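To check the same settings outside the PDI client, it can help to see how the three options map onto a driver connection string. The sketch below assumes the Cloudera Impala JDBC driver's URL format, including AuthMech=1 for Kerberos, which this page does not mention. The function name and host names are hypothetical.

```shell
# impala_jdbc_url HOST PORT DATABASE REALM HOST_FQDN SERVICE_NAME
# Builds a connection string in the Cloudera Impala JDBC driver's format.
# AuthMech=1 (Kerberos) is an assumption taken from the driver's settings,
# not from this guide.
impala_jdbc_url() {
  printf 'jdbc:impala://%s:%s/%s;AuthMech=1;KrbRealm=%s;KrbHostFQDN=%s;KrbServiceName=%s\n' \
    "$1" "$2" "$3" "$4" "$5" "$6"
}

# Placeholder values; substitute your own host, realm, and service name.
impala_jdbc_url impala.example.com 443 default EXAMPLE.COM impala.example.com impala
```

PDI builds its own connection string from the dialog values, so this is only useful for comparing against another JDBC client.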

Next steps

When the cluster is connected to the Pentaho Server, you can run jobs and transformations using secure impersonation.

Secure impersonation from the PDI client is not supported.

Kerberos authentication

Use Kerberos to authenticate access to secure Hadoop and CDP components.

Set up Kerberos for Pentaho

How you set up Kerberos on a machine that the Pentaho Server can access depends on your operating system.

Configure Kerberos

To configure Kerberos, complete the tasks for your operating system.

Configure JCE

By default, the KDC configuration uses “unlimited” AES-256 encryption for the Java Cryptography Extension (JCE) files. Cryptographic policy requirements vary by country.

Do these steps only if you must reduce the encryption strength:

  1. Open pentaho/java/conf/security/java.security.

  2. Find crypto.policy and set it to:

    crypto.policy=limited

  3. Save the file.
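After saving, you may want to confirm which value is active. A minimal sketch follows, assuming the file uses the standard key=value layout with # for comments. The function name get_crypto_policy is hypothetical.

```shell
# get_crypto_policy FILE
# Prints the value of the uncommented crypto.policy entry in FILE.
# If the key appears more than once, the last uncommented entry is printed.
get_crypto_policy() {
  sed -n 's/^[[:space:]]*crypto\.policy[[:space:]]*=[[:space:]]*//p' "$1" | tail -n 1
}

# Usage, with the path from step 1:
#   get_crypto_policy pentaho/java/conf/security/java.security
```

Commented lines such as #crypto.policy=unlimited are ignored, so the output reflects only the active setting.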

Modify the Kerberos configuration file

  1. Open krb5.conf. The default location is /etc/krb5.conf.

  2. Add your realm, KDC, and admin server values. Example:

  3. Save the file.

  4. Restart the machine.
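For reference, a typical krb5.conf layout is sketched below. It writes a sample to a scratch file rather than touching /etc/krb5.conf. EXAMPLE.COM and kdc.example.com are placeholders, not values from your cluster, and your administrator may require further settings.

```shell
# Write a sample krb5.conf to a scratch file for review.
# EXAMPLE.COM and kdc.example.com are placeholders; substitute your own
# realm and hosts before copying anything into /etc/krb5.conf.
sample="${TMPDIR:-/tmp}/krb5.conf.sample"

cat > "$sample" <<'EOF'
[libdefaults]
    default_realm = EXAMPLE.COM

[realms]
    EXAMPLE.COM = {
        kdc = kdc.example.com
        admin_server = kdc.example.com
    }

[domain_realm]
    .example.com = EXAMPLE.COM
    example.com = EXAMPLE.COM
EOF

echo "Wrote $sample"
```

The realm name is conventionally the DNS domain in upper case, and the domain_realm section maps host names in that domain to the realm.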

Synchronize clocks

Synchronize the client clock with the cluster clock. Kerberos rejects requests when the clocks differ by more than the allowed skew, which is 5 minutes by default in MIT Kerberos.
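A quick way to reason about drift is to compare two epoch timestamps. The sketch below assumes a tolerance of 300 seconds, the MIT Kerberos default, which your KDC may override. The function name skew_ok is hypothetical, and how you read the cluster clock is left to you.

```shell
# skew_ok LOCAL_EPOCH REMOTE_EPOCH [MAX_SECONDS]
# Succeeds when the two clocks differ by no more than MAX_SECONDS.
# 300 seconds is the MIT Kerberos default tolerance; your KDC may differ.
skew_ok() {
  max=${3:-300}
  diff=$(( $1 - $2 ))
  if [ "$diff" -lt 0 ]; then
    diff=$(( 0 - diff ))
  fi
  [ "$diff" -le "$max" ]
}

# Fixed timestamps for illustration: the clocks are 120 seconds apart.
if skew_ok 1700000000 1700000120; then
  echo "clock skew within tolerance"
else
  echo "clock skew too large"
fi
```

In practice the first argument would come from date +%s on the Pentaho machine and the second from the same command run on a cluster node.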

Obtain a Kerberos ticket

  1. Run kinit.

  2. Enter the password when prompted.

  3. Confirm the ticket exists by running klist.
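These steps can be wrapped so a ticket is requested only when needed. This is a sketch assuming MIT Kerberos client tools, where klist -s runs silently and returns a non-zero status if the credential cache is missing or expired. The function name and principal are hypothetical.

```shell
# ensure_ticket PRINCIPAL
# Requests a ticket only when the cache holds no valid one.
# Assumes MIT Kerberos client tools (klist -s is silent and returns
# non-zero when the credential cache is missing or expired).
ensure_ticket() {
  if klist -s; then
    echo "valid ticket found"
  else
    kinit "$1"    # prompts for the password
  fi
}

# Usage (hypothetical principal):
#   ensure_ticket alice@EXAMPLE.COM && klist
```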

Set up user accounts and network access (all OS)

Ensure user accounts and network access exist before connecting.

  • Open the required network ports between the cluster and Pentaho components.

  • Confirm forward and reverse DNS resolution.

  • Create a Kerberos principal for each Pentaho user who needs access.

  • Ensure UID and GID match across all cluster nodes for the run user.
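The DNS item can be checked with a round-trip lookup. The sketch below assumes a Linux host where getent is available. The function name check_dns and the host name are hypothetical.

```shell
# check_dns HOSTNAME
# Resolves HOSTNAME to an address, resolves that address back to a name,
# and succeeds only when the round trip returns the original name.
# Assumes a Linux host with getent; pass the fully qualified name.
check_dns() {
  ip=$(getent hosts "$1" | awk 'NR == 1 { print $1 }')
  if [ -z "$ip" ]; then
    echo "forward lookup failed for $1" >&2
    return 1
  fi
  name=$(getent hosts "$ip" | awk 'NR == 1 { print $2 }')
  if [ "$name" != "$1" ]; then
    echo "reverse lookup for $ip returned '$name'" >&2
    return 1
  fi
  echo "$1 <-> $ip"
}

# Usage (hypothetical host):
#   check_dns node1.example.com
```

For the UID and GID item, running id -u and id -g for the same user on each node and comparing the numbers is usually enough.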

Next step

Continue cluster connection setup in the Install Pentaho Data Integration and Analytics guide.

Use Kerberos with MongoDB

If you use Kerberos to authenticate access to MongoDB, you can also use Kerberos to authenticate PDI users who access MongoDB through a transformation step.

When a user runs a transformation containing a MongoDB step, the step credentials are validated against the Kerberos administrative database. If the credentials match, the KDC grants a ticket.

Complete MongoDB and client prerequisites

Add users to the Kerberos database

Add a Kerberos principal for each PDI client user who needs MongoDB access.

  1. Sign in to the host that runs the Kerberos database as root (or equivalent).

  2. Add a principal. Example:

The principal should match the user created in MongoDB.
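With MIT Kerberos, a principal is typically added with kadmin.local on the KDC host. The sketch below assumes that tooling. The function name, user, and realm are hypothetical, and the KADMIN variable exists only so a different admin command can be substituted.

```shell
# add_principal USER REALM
# Assumes MIT Kerberos and a root shell on the KDC host, where
# kadmin.local talks to the database directly.
# addprinc prompts for the new principal's password.
add_principal() {
  "${KADMIN:-kadmin.local}" -q "addprinc $1@$2"
}

# Usage (hypothetical user and realm):
#   add_principal mongouser EXAMPLE.COM
```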

Use Knox to access CDP

Apache Knox provides perimeter security for CDP services. It gives you a single gateway endpoint instead of per-service endpoints.

Knox typically authenticates a user via LDAP, then authenticates to Kerberos, then authorizes via Ranger.

[Figure: Knox environment]

Setup requirements for Knox with Pentaho

As a cluster administrator, provide this information to Pentaho users:

Hive configuration with Knox

  1. Open your Hive database connection.

  2. In the Database Connection dialog, select Options.

  3. Set these parameters:

    • httpPath: datahub_cluster_name/cdp-proxy-api/hive

    • knox (optional): true

    • transportMode: http

    • ssl: true

  4. In General, set Port number to 443.

You can now use the connection in Hive steps.
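For testing outside PDI, the same settings correspond roughly to a Hive JDBC connection string. This sketch assumes the standard HiveServer2 URL format and a database named default. The gateway host is a placeholder, and the knox option is left out because it is a Pentaho connection option rather than a Hive driver setting.

```shell
# hive_knox_url GATEWAY_HOST CLUSTER_NAME [DATABASE]
# Builds a HiveServer2 JDBC URL from the options listed above.
# DATABASE defaults to "default" (an assumption; the steps above do not
# name a database).
hive_knox_url() {
  printf 'jdbc:hive2://%s:443/%s;ssl=true;transportMode=http;httpPath=%s/cdp-proxy-api/hive\n' \
    "$1" "${3:-default}" "$2"
}

# Placeholder gateway host; the cluster name matches the httpPath above.
hive_knox_url knox.example.com datahub_cluster_name
```

If Beeline is available, passing this URL with -u along with -n and -p for the Knox user name and password is one way to confirm the gateway path before configuring PDI.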

Legacy and moved pages

Some related pages are kept for backward compatibility and older navigation paths.
