Kerberos authentication versus secure impersonation
This topic is now consolidated into Big data security.
Use Kerberos authentication or secure impersonation to access secured Hadoop and CDP components.
Choose an approach
Kerberos authentication validates each user directly against Kerberos.
Secure impersonation uses a service identity, then impersonates the Pentaho user.
Use secure impersonation when jobs run on the Pentaho Server.
Use Kerberos authentication when you need user-level access without impersonation.
Secure impersonation
Setting the mapping type value to simple in the driver configuration file enables secure impersonation.
This value is set when you define impersonation settings in a named connection.
How secure impersonation works
At startup, the Pentaho Server checks the mapping type value in the configuration file:
If the value is disabled or blank, the server does not use authentication.
If the value is simple, requests are evaluated by origin.
Requests from a client tool use Kerberos authentication.
Requests from the Pentaho Server use secure impersonation when supported.
If the component does not support secure impersonation, Kerberos is used.
When impersonation succeeds, a confirmation message appears in the Pentaho Server log.
Restart the server after changing the mapping type value.
Secure impersonation prerequisites
To use secure impersonation:
The cluster must be secured with Kerberos.
The Kerberos server must be reachable from the Pentaho Server.
Kerberos must be installed and configured on the Pentaho machine.
A cluster-side Kerberos principal must represent Pentaho.
That principal must be allowed to impersonate users.
Requests must originate from the Pentaho Server.
Target components must support secure impersonation.
The cluster administrator is responsible for cluster users and Kerberos server setup.
See the vendor and Hadoop security documentation for details.
Secure impersonation supported components
Secure impersonation support is determined by the underlying Hadoop components.
Supported:
Cloudera-Impala
HBase
HDFS
Hadoop MapReduce
Hive
Oozie
Pentaho MapReduce (PMR)
You can securely connect to Hive and HBase within the mapper, reducer, or combiner.
Not supported:
Carte on Yarn
Impala
Sqoop
Spark SQL
Secure impersonation directly from these tools is not supported:
PDI client (Spoon)
Scheduled jobs and transformations
Pentaho Report Designer
Pentaho Metadata Editor
Kitchen
Pan
Carte
Configure MapReduce jobs (Windows only)
On Windows, update mapred-site.xml so MapReduce jobs run with secure impersonation.
Open:
<username>/.pentaho/metastore/pentaho/NamedCluster/Configs/<connection name>/mapred-site.xml
Add these properties:
Save the file.
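The exact properties depend on your cluster; a setting commonly required when submitting MapReduce jobs from a Windows client to a Linux cluster is the cross-platform flag. The fragment below is illustrative only; confirm the properties your cluster needs with your administrator.

```xml
<!-- Illustrative fragment: allows job submission from a Windows client
     to a Linux-based cluster. Verify against your cluster's requirements. -->
<property>
  <name>mapreduce.app-submission.cross-platform</name>
  <value>true</value>
</property>
```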
Connect to a Cloudera Impala database (Cloudera only)
If you connect to a secure Impala database, update the PDI database connection options.
Download the Cloudera Impala JDBC driver: https://www.cloudera.com/downloads/connectors/impala/jdbc/2-6-15.html
Secure impersonation with Impala is supported only with the Cloudera Impala JDBC driver.
Extract ImpalaJDBC41.jar into:
<username>/.pentaho/metastore/pentaho/NamedCluster/Configs/cdp71/lib
Create a database connection in the PDI client.
Set these general values:
Connection Type: Cloudera Impala
Database Name: default
Port Number: 443
On Options, set:
KrbHostFQDN: Fully qualified domain name of the Impala host
KrbServiceName: Service principal name of the Impala server
KrbRealm: Kerberos realm used by the cluster
Select Test.
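For reference, PDI builds the connection from the dialog values; with the Cloudera Impala JDBC driver, the resulting URL has roughly this shape. The host, realm, and the AuthMech/SSL settings shown are illustrative assumptions, not values from this guide:

```
jdbc:impala://impala-host.example.com:443;AuthMech=1;KrbHostFQDN=impala-host.example.com;KrbServiceName=impala;KrbRealm=EXAMPLE.COM;SSL=1
```

In the Cloudera driver, AuthMech=1 selects Kerberos authentication.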
Next steps
When the cluster is connected to the Pentaho Server, you can run jobs and transformations using secure impersonation.
Secure impersonation from the PDI client is not supported.
Kerberos authentication
Use Kerberos to authenticate access to secure Hadoop and CDP components.
In this section
Set up Kerberos for Pentaho
How you set up Kerberos on a machine that the Pentaho Server can access depends on your operating system.
Configure Kerberos
To configure Kerberos, complete the tasks for your operating system.
Configure JCE
By default, the Java Cryptographic Extension (JCE) policy is set to unlimited, which permits the AES-256 encryption that the KDC configuration uses. Cryptographic policy requirements vary by country.
Do these steps only if you must reduce the encryption strength:
Open pentaho/java/conf/security/java.security.
Find crypto.policy and set it to:
crypto.policy=limited
Save the file.
Modify the Kerberos configuration file
Open krb5.conf. The default location is /etc/krb5.conf.
Add your realm, KDC, and admin server values.
Save the file.
Restart the machine.
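The realm configuration added in step 2 might look like the following. EXAMPLE.COM and kdc.example.com are placeholders; use the realm and host names your cluster administrator provides.

```
[libdefaults]
    default_realm = EXAMPLE.COM

[realms]
    EXAMPLE.COM = {
        kdc = kdc.example.com
        admin_server = kdc.example.com
    }
```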
Synchronize clocks
Synchronize the client clock with the cluster clock. Kerberos fails if timestamps drift too far.
Obtain a Kerberos ticket
Run kinit.
Enter the password when prompted.
Confirm the ticket exists by running klist.
Configure JCE
By default, the Java Cryptographic Extension (JCE) policy is set to unlimited, which permits the AES-256 encryption that the KDC configuration uses. Cryptographic policy requirements vary by country.
Do these steps only if you must reduce the encryption strength:
Open pentaho\\java\\conf\\security\\java.security.
Find crypto.policy and set it to:
crypto.policy=limited
Save the file.
Download and install Kerberos
Install a Kerberos client. Heimdal is a common option: https://www.secure-endpoints.com/heimdal/.
Modify the Kerberos configuration file
Open krb5.conf. The default location is C:\\ProgramData\\Kerberos\\krb5.conf.
Add your realm, KDC, and admin server values.
Save the file.
Copy the file to C:\\Windows\\krb5.ini.
Restart the machine.
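The realm configuration added in step 2 might look like the following. EXAMPLE.COM and kdc.example.com are placeholders; use the realm and host names your cluster administrator provides.

```
[libdefaults]
    default_realm = EXAMPLE.COM

[realms]
    EXAMPLE.COM = {
        kdc = kdc.example.com
        admin_server = kdc.example.com
    }
```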
Synchronize clocks
Synchronize the client clock with the cluster clock. Kerberos fails if timestamps drift too far.
Obtain a Kerberos ticket
Run kinit.
Enter the password when prompted.
Confirm the ticket exists by running klist.
If you use Heimdal, klist output should not show Current LoginId is ....
Set up user accounts and network access (all OS)
Ensure user accounts and network access exist before connecting.
Open the required network ports between the cluster and Pentaho components.
Confirm forward and reverse DNS resolution.
Create a Kerberos principal for each Pentaho user who needs access.
Ensure UID and GID match across all cluster nodes for the run user.
Next step
Continue cluster connection setup in the Install Pentaho Data Integration and Analytics guide.
Use Kerberos with MongoDB
If you use Kerberos to authenticate access to MongoDB, you can also use Kerberos to authenticate PDI users who access MongoDB through a transformation step.
When a user runs a transformation containing a MongoDB step, the step credentials are validated against the Kerberos administrative database. If the credentials match, the KDC grants a ticket.
In this section
Complete MongoDB and client prerequisites
Install and configure MongoDB Enterprise.
Configure MongoDB for Kerberos authentication.
Install the current PDI client on each client machine.
Verify forward and reverse DNS resolution for MongoDB hosts.
Add users to the Kerberos database
Add a Kerberos principal for each PDI client user who needs MongoDB access.
Sign in to the host that runs the Kerberos database as root (or equivalent).
Add a principal.
The principal should match the user created in MongoDB.
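With MIT Kerberos, adding a principal might look like the following. The principal name and realm are placeholders; use the name of the matching MongoDB user.

```
kadmin.local -q "addprinc pdiuser@EXAMPLE.COM"
```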
Start Kerberos services automatically (optional)
Start the Kerberos Admin Server and KDC at boot.
The Kerberos Admin Server service name is typically kadmin.
The KDC service name is typically krb5kdc.
How you do this depends on your operating system.
Configure client-side nodes
After you add users and configure Kerberos services, configure each client node that runs the PDI client.
Install JCE (optional)
Install JCE policy files only if you require AES-256 and your Java distribution needs it.
Download JCE for your supported Java version.
Follow Oracle installation instructions.
Copy JCE JARs to java/lib/security in your PDI install.
Install a Kerberos client
Install a Kerberos client using your OS package manager.
Update krb5.conf
Open /etc/krb5.conf (or your OS-specific location).
Add your realm, KDC, and admin server values.
Restart the machine.
macOS: specify krb5.conf location (older Java only)
Do this if the PDI/PRD JRE is earlier than Java 1.7.0_40.
Update the relevant launcher.properties file and set:
Synchronize clocks
Synchronize the client clock with the MongoDB host clock.
Obtain a Kerberos ticket
Run kinit.
Run klist to confirm the ticket exists.
Install JCE (optional)
Install JCE policy files only if you require AES-256 and your Java distribution needs it.
Download JCE for your supported Java version.
Follow Oracle installation instructions.
Copy JCE JARs to java\\lib\\security in your PDI install.
Install a Kerberos client
Install a Kerberos client. Heimdal is a common option: https://www.secure-endpoints.com/heimdal/.
Update krb5.conf
Open krb5.conf. The default location is C:\\ProgramData\\Kerberos\\krb5.conf.
Add your realm, KDC, and admin server values.
Copy the file to C:\\Windows\\krb5.ini.
Restart the machine.
Synchronize clocks
Synchronize the client clock with the MongoDB host clock.
Obtain a Kerberos ticket
Run kinit.
Run klist to confirm the ticket exists.
Test authentication with the PDI client
Use one of these options:
Start the PDI client.
Create a new transformation.
Add MongoDB Input and open it.
Select Configure Fields.
Enter the MongoDB host name and port.
Enter the Kerberos principal as the username: <primary>/<instance>@KERBEROS_REALM
Leave the password blank.
Select Authenticate using Kerberos.
On Input options, set a database you can read.
Select Get Collections.
Select Preview and confirm you see data.
Start the PDI client.
Create a new transformation.
Add MongoDB Input and open it.
Select Connection String.
Enter the MongoDB host name and port.
Use a connection string like:
mongodb://<service-principal>@<hostname>:<port>/?authSource=$external&authMechanism=GSSAPI
Select Test.
On Input options, set a database you can read.
Select Get Collections.
Select Preview and confirm you see data.
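If you build the connection string in a script, note that the '@' inside the principal must be percent-encoded so it does not break URI parsing. A minimal Python sketch, assuming a hypothetical principal and host:

```python
from urllib.parse import quote_plus

def mongo_gssapi_uri(principal: str, host: str, port: int) -> str:
    # Percent-encode the principal so the '@' in user@REALM does not
    # collide with the URI's userinfo separator; GSSAPI requires
    # authSource=$external.
    return (
        f"mongodb://{quote_plus(principal)}@{host}:{port}/"
        "?authSource=$external&authMechanism=GSSAPI"
    )

# Placeholders only: substitute your own principal and MongoDB host.
print(mongo_gssapi_uri("pdiuser@EXAMPLE.COM", "mongo.example.com", 27017))
```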
Use Kerberos with Spark Submit
Submit Spark jobs to secure CDP clusters by passing the Kerberos keytab and principal as Spark utility parameters.
Prerequisites
Install a Spark client.
Ensure the cluster is secured with Kerberos.
Ensure the Kerberos server is reachable from the Pentaho Server.
Configure Kerberos on the Pentaho machine.
Have a valid Kerberos ticket in the client ticket cache before you submit the job.
Spark Submit entry properties

Configure these properties in the Spark Submit job entry:
Entry name: Any descriptive name.
Spark Submit Utility: The script name that launches Spark, for example spark2-submit.
Master URL: yarn-cluster or yarn-client.
Type: Java, Scala, or Python.
Utility Parameters:
spark.yarn.keytab: Path to the keytab file.
spark.yarn.principal: Kerberos principal for cluster authentication.
Enable Blocking: Enable if the entry should wait for job completion.
Authentication by password is not supported.
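The two Utility Parameters entries might look like the following; the keytab path and principal shown are placeholders for your environment's values.

```
spark.yarn.keytab      /etc/security/keytabs/pentaho.service.keytab
spark.yarn.principal   pentaho@EXAMPLE.COM
```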
Use Knox to access CDP
Apache Knox provides perimeter security for CDP services. It gives you a single gateway endpoint instead of per-service endpoints.
Knox typically authenticates a user via LDAP, then authenticates to Kerberos, then authorizes via Ranger.

Setup requirements for Knox with Pentaho
As a cluster administrator, provide this information to Pentaho users:
Credentials: Cluster name, gateway URL, username, and password.
SSL certificate: Knox URLs are HTTPS. Install the certificate.
See SSL Security.
LDAP directory server: Knox commonly authenticates users against LDAP.
See LDAP security.
Hive configuration with Knox
Open your Hive database connection.
In the Database Connection dialog, select Options.
Set these parameters:
httpPath: datahub_cluster_name/cdp-proxy-api/hive
knox (optional): true
transportMode: http
ssl: true
In General, set Port number to 443.
You can now use the connection in Hive steps.
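Taken together, these settings correspond to a HiveServer2 JDBC URL of roughly the following shape. The gateway host is a placeholder, and the exact URL PDI builds may differ:

```
jdbc:hive2://<gateway-host>:443/default;ssl=true;transportMode=http;httpPath=datahub_cluster_name/cdp-proxy-api/hive
```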