Kerberos authentication versus secure impersonation


This topic is now consolidated into Big data security.

Use Kerberos authentication or secure impersonation to access secured Hadoop and CDP components.

Choose an approach

Kerberos authentication validates each user directly against Kerberos.

Secure impersonation authenticates to the cluster with a single service identity and then impersonates the individual Pentaho user.

Use secure impersonation when jobs run on the Pentaho Server.

Use Kerberos authentication when you need user-level access without impersonation.

Secure impersonation

Setting the mapping type value to simple in the driver configuration file enables secure impersonation.

This value is set when you define impersonation settings in a named connection.
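The value is stored as an ordinary Java properties entry. As an illustrative sketch only (the file location and exact property name vary by Pentaho release), the entry can look like this:

  # Illustrative sketch; the exact property name varies by release
  pentaho.authentication.default.mapping.impersonation.type=simple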

How secure impersonation works

At startup, the Pentaho Server checks the mapping type value in the configuration file:

  • If the value is disabled or blank, the server does not use Kerberos authentication or secure impersonation.

  • If the value is simple, requests are evaluated by origin.

    • Requests from a client tool use Kerberos authentication.

    • Requests from the Pentaho Server use secure impersonation when supported.

    • If the component does not support secure impersonation, Kerberos is used.

When impersonation succeeds, the Pentaho Server log shows that the request ran as the impersonated user.

(Figure: Secure impersonation overview)

Restart the server after changing the mapping type value.

Secure impersonation prerequisites

To use secure impersonation:

  • The cluster must be secured with Kerberos.

  • The Kerberos server must be reachable from the Pentaho Server.

  • Kerberos must be installed and configured on the Pentaho machine.

  • A cluster-side Kerberos principal must represent Pentaho.

  • That principal must be allowed to impersonate users (see the core-site.xml sketch after this list).

  • Requests must originate from the Pentaho Server.

  • Target components must support secure impersonation.

The cluster administrator is responsible for cluster users and Kerberos server setup.
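On Hadoop, granting impersonation rights typically means adding proxy-user entries to core-site.xml on the cluster. A minimal sketch, assuming the Pentaho principal's short name is pentaho and the Pentaho Server host is pentaho-server.example.com (both placeholders):

  <!-- core-site.xml: allow the pentaho user to impersonate other users -->
  <property>
    <name>hadoop.proxyuser.pentaho.hosts</name>
    <value>pentaho-server.example.com</value>
  </property>
  <property>
    <name>hadoop.proxyuser.pentaho.groups</name>
    <value>*</value>
  </property>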

See the vendor and Hadoop security documentation for details.

Secure impersonation supported components

Secure impersonation support is determined by the underlying Hadoop components.

Supported:

  • Cloudera Impala (with the Cloudera Impala JDBC driver)

  • HBase

  • HDFS

  • Hadoop MapReduce

  • Hive

  • Oozie

  • Pentaho MapReduce (PMR)

    • You can securely connect to Hive and HBase within the mapper, reducer, or combiner.

Not supported:

  • Carte on Yarn

  • Impala (with drivers other than the Cloudera Impala JDBC driver)

  • Sqoop

  • Spark SQL

Secure impersonation directly from these tools is not supported:

  • PDI client (Spoon)

  • Scheduled jobs and transformations

  • Pentaho Report Designer

  • Pentaho Metadata Editor

  • Kitchen

  • Pan

  • Carte

Configure MapReduce jobs (Windows only)

On Windows, update mapred-site.xml so MapReduce jobs run with secure impersonation.

  1. Open:

    <username>/.pentaho/metastore/pentaho/NamedCluster/Configs/<connection name>/mapred-site.xml

  2. Add the properties required for secure impersonation. A sketch follows these steps.

  3. Save the file.
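The exact properties depend on your cluster and Pentaho release; confirm them for your version. A minimal sketch, assuming the common Windows requirement of enabling cross-platform job submission:

  <!-- Illustrative sketch: allow MapReduce submission from a Windows client -->
  <property>
    <name>mapreduce.app-submission.cross-platform</name>
    <value>true</value>
  </property>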

Connect to a Cloudera Impala database (Cloudera only)

If you connect to a secure Impala database, update the PDI database connection options.

  1. Download the Cloudera Impala JDBC driver: https://www.cloudera.com/downloads/connectors/impala/jdbc/2-6-15.html


    Secure impersonation with Impala is supported only with the Cloudera Impala JDBC driver.

  2. Extract ImpalaJDBC41.jar into:

    <username>/.pentaho/metastore/pentaho/NamedCluster/Configs/cdp71/lib

  3. Create a database connection in the PDI client.

  4. Set these general values:

    • Connection Type: Cloudera Impala

    • Database Name: default

    • Port Number: 443

  5. On Options, set:

    • KrbHostFQDN: Fully qualified domain name of the Impala host

    • KrbServiceName: Service principal name of the Impala server

    • KrbRealm: Kerberos realm used by the cluster

  6. Select Test.
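For reference, the dialog settings above correspond roughly to a Cloudera Impala JDBC URL of this form. The host name and realm are placeholders; AuthMech=1 selects Kerberos in the Cloudera driver:

  jdbc:impala://impala-host.example.com:443;AuthMech=1;KrbHostFQDN=impala-host.example.com;KrbServiceName=impala;KrbRealm=EXAMPLE.COM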

Next steps

When the cluster is connected to the Pentaho Server, you can run jobs and transformations using secure impersonation.


Secure impersonation from the PDI client is not supported.

Kerberos authentication

Use Kerberos to authenticate access to secure Hadoop and CDP components.

In this section

Set up Kerberos for Pentaho

How you set up Kerberos on a machine that the Pentaho Server can access depends on your operating system.

Configure Kerberos

To configure Kerberos, complete the tasks for your operating system.

Configure JCE

By default, the KDC configuration uses the "unlimited" AES-256 encryption setting for the Java Cryptographic Extension (JCE) policy files. Cryptographic policy requirements vary by country.

Do these steps only if you must reduce the encryption strength:

  1. Open pentaho/java/conf/security/java.security.

  2. Find crypto.policy and set it to:

    crypto.policy=limited

  3. Save the file.

Modify the Kerberos configuration file

  1. Open krb5.conf. The default location is /etc/krb5.conf.

  2. Add your realm, KDC, and admin server values. An example follows these steps.

  3. Save the file.

  4. Restart the machine.
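A minimal krb5.conf sketch, using EXAMPLE.COM and kdc.example.com as placeholders for your realm and KDC host:

  [libdefaults]
      default_realm = EXAMPLE.COM

  [realms]
      EXAMPLE.COM = {
          kdc = kdc.example.com
          admin_server = kdc.example.com
      }

  [domain_realm]
      .example.com = EXAMPLE.COM
      example.com = EXAMPLE.COM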

Synchronize clocks

Synchronize the client clock with the cluster clock. Kerberos fails if timestamps drift too far.

Obtain a Kerberos ticket

  1. Run kinit.

  2. Enter the password when prompted.

  3. Confirm the ticket exists by running klist.
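For example, with a placeholder principal:

  kinit pentaho@EXAMPLE.COM
  klist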

Set up user accounts and network access (all OS)

Ensure user accounts and network access exist before connecting.

  • Open the required network ports between the cluster and Pentaho components.

  • Confirm forward and reverse DNS resolution.

  • Create a Kerberos principal for each Pentaho user who needs access.

  • Ensure UID and GID match across all cluster nodes for the run user.
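A few quick checks, using placeholder host and user names:

  # Forward and reverse DNS resolution for a cluster node
  nslookup namenode.example.com
  nslookup 10.0.0.10

  # UID and GID of the run user (repeat on each cluster node)
  id pentaho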

Next step

Continue cluster connection setup in the Install Pentaho Data Integration and Analytics guide.

Use Kerberos with MongoDB

If you use Kerberos to authenticate access to MongoDB, you can also use Kerberos to authenticate PDI users who access MongoDB through a transformation step.

When a user runs a transformation containing a MongoDB step, the step credentials are validated against the Kerberos administrative database. If the credentials match, the KDC grants a ticket.

In this section

Complete MongoDB and client prerequisites

Add users to the Kerberos database

Add a Kerberos principal for each PDI client user who needs MongoDB access.

  1. Sign in to the host that runs the Kerberos database as root (or equivalent).

  2. Add a principal for the user. An example follows these steps.

The principal should match the user created in MongoDB.
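A minimal sketch using kadmin.local, with pdiuser@EXAMPLE.COM as a placeholder principal that matches the MongoDB user:

  kadmin.local -q "addprinc pdiuser@EXAMPLE.COM"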

Start Kerberos services automatically (optional)

Start the Kerberos Admin Server and KDC at boot.

  • Kerberos Admin Server service name is typically kadmin.

  • KDC service name is typically krb5kdc.

How you do this depends on your operating system.
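On a systemd-based Linux distribution, for example, you can enable both services at boot (service names vary by OS and package):

  sudo systemctl enable --now krb5kdc kadmin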

Configure client-side nodes

After you add users and configure Kerberos services, configure each client node that runs the PDI client.

Install JCE (optional)

Install JCE policy files only if you require AES-256 and your Java distribution needs it.

  1. Download JCE for your supported Java version.

  2. Follow Oracle installation instructions.

  3. Copy JCE JARs to java/lib/security in your PDI install.

Install a Kerberos client

Install a Kerberos client using your OS package manager.
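For example, on common Linux distributions:

  # Debian/Ubuntu
  sudo apt-get install krb5-user

  # RHEL/CentOS
  sudo yum install krb5-workstation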

Update krb5.conf

  1. Open /etc/krb5.conf (or your OS-specific location).

  2. Add your realm, KDC, and admin server values.

  3. Restart the machine.

macOS: specify krb5.conf location (older Java only)

Do this if the PDI/PRD JRE is earlier than Java 1.7.0_40.

Update the relevant launcher.properties file so the JVM can find your krb5.conf file, as sketched below.
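The underlying mechanism is the standard JVM system property java.security.krb5.conf. The exact launcher.properties key that carries JVM arguments varies by release, so treat this entry as a hypothetical to verify against your install:

  # Hypothetical JVM argument; the launcher.properties key that carries
  # JVM options depends on your PDI/PRD release
  -Djava.security.krb5.conf=/etc/krb5.conf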

Synchronize clocks

Synchronize the client clock with the MongoDB host clock.

Obtain a Kerberos ticket

  1. Run kinit.

  2. Run klist to confirm the ticket exists.

Test authentication with the PDI client

Follow these steps:

  1. Start the PDI client.

  2. Create a new transformation.

  3. Add MongoDB Input and open it.

  4. Select Configure Fields.

  5. Enter the MongoDB host name and port.

  6. Enter the Kerberos principal as the username:

    <primary>/<instance>@KERBEROS_REALM

  7. Leave password blank.

  8. Select Authenticate using Kerberos.

  9. On Input options, set a database you can read.

  10. Select Get Collections.

  11. Select Preview and confirm you see data.

Use Kerberos with Spark Submit

Submit Spark jobs to secure CDP clusters by passing the Kerberos keytab and principal as Spark utility parameters.

Prerequisites

  • Install a Spark client.

  • Ensure the cluster is secured with Kerberos.

  • Ensure the Kerberos server is reachable from the Pentaho Server.

  • Configure Kerberos on the Pentaho machine.

circle-info

Have a valid Kerberos ticket in the client ticket cache before you submit the job.

Spark Submit entry properties

(Figure: Spark Submit dialog box)

Configure these properties in the Spark Submit job entry:

  • Entry name: Any descriptive name.

  • Spark Submit Utility: The script name that launches Spark. Example: spark2-submit.

  • Master URL: yarn-cluster or yarn-client.

  • Type: Java, Scala, or Python.

  • Utility Parameters:

    • spark.yarn.keytab: Path to the keytab file.

    • spark.yarn.principal: Kerberos principal for cluster authentication.

  • Enable Blocking: Enable if the entry should wait for job completion.

Authentication by password is not supported.
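For reference, the entry settings above map roughly onto this spark-submit command line (with Spark 2, a Master URL of yarn-cluster is expressed as --master yarn with --deploy-mode cluster). The keytab path, principal, class, and JAR names are placeholders:

  spark2-submit \
    --master yarn \
    --deploy-mode cluster \
    --conf spark.yarn.keytab=/etc/security/keytabs/pentaho.keytab \
    --conf spark.yarn.principal=pentaho@EXAMPLE.COM \
    --class com.example.MyJob \
    my-job.jar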

Use Knox to access CDP

Apache Knox provides perimeter security for CDP services. It gives you a single gateway endpoint instead of per-service endpoints.

Knox typically authenticates the user against LDAP, uses Kerberos to authenticate to the backing cluster services, and authorizes access through Ranger.

(Figure: Knox environment)

Setup requirements for Knox with Pentaho

As a cluster administrator, provide the required gateway connection information to Pentaho users.

Hive configuration with Knox

  1. Open your Hive database connection.

  2. In the Database Connection dialog, select Options.

  3. Set these parameters:

    • httpPath: datahub_cluster_name/cdp-proxy-api/hive

    • knox (optional): true

    • transportMode: http

    • ssl: true

  4. In General, set Port number to 443.

You can now use the connection in Hive steps.
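For reference, these options correspond roughly to a Hive JDBC URL of this form, where the Knox gateway host is a placeholder and httpPath matches the value set above:

  jdbc:hive2://knox-gateway.example.com:443/default;ssl=true;transportMode=http;httpPath=datahub_cluster_name/cdp-proxy-api/hive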
