How to enable Kerberos authentication
Use Kerberos to authenticate access to secure Hadoop and CDP components.
This topic covers setup and common integration scenarios.
In this topic
Set up Kerberos for Pentaho
How you set up Kerberos on a machine that the Pentaho Server can access depends on your operating system.
Configure Kerberos
To configure Kerberos, complete the tasks for your operating system.
Configure JCE
By default, the Java Cryptographic Extension (JCE) policy files allow "unlimited" AES-256 encryption. Cryptographic policy requirements vary by country.
Do these steps only if you must reduce the encryption strength:
1. Open pentaho/java/conf/security/java.security.
2. Find crypto.policy and set it to crypto.policy=limited.
3. Save the file.
Modify the Kerberos configuration file
1. Open krb5.conf. The default location is /etc/krb5.conf.
2. Add your realm, KDC, and admin server values. Example:

   [libdefaults]
     default_realm = <YOUR_REALM.COM>
     ...

   [realms]
     <YOUR_REALM.COM> = {
       kdc = <KDC IP address, or resolvable hostname>
       admin_server = <Admin server IP address, or resolvable hostname>
       ...
     }

   [domain_realm]
     <.your_realm.com> = <YOUR_REALM.COM>
     <your_realm.com> = <YOUR_REALM.COM>

3. Save the file.
4. Restart the machine.
Synchronize clocks
Synchronize the client clock with the cluster clock. Kerberos fails if timestamps drift too far.
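On most Linux systems you can check and correct drift with chrony or ntp; a sketch (the NTP server name is illustrative):

```shell
# Check the current offset from the configured time sources
chronyc tracking

# Or force a one-time sync against an NTP server (requires root)
ntpdate -u pool.ntp.org
```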
Obtain a Kerberos ticket
1. Run kinit.
2. Enter the password when prompted.
3. Confirm the ticket exists by running klist.
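A typical session looks like this (the principal and realm are illustrative):

```shell
# Request a ticket-granting ticket for the given principal
kinit pentaho-user@YOUR_REALM.COM

# List cached tickets to confirm the TGT was issued
klist
```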
Configure JCE
By default, the Java Cryptographic Extension (JCE) policy files allow "unlimited" AES-256 encryption. Cryptographic policy requirements vary by country.
Do these steps only if you must reduce the encryption strength:
1. Open pentaho\java\conf\security\java.security.
2. Find crypto.policy and set it to crypto.policy=limited.
3. Save the file.
Download and install Kerberos
Install a Kerberos client. Heimdal is a common option: https://www.secure-endpoints.com/heimdal/.
Modify the Kerberos configuration file
1. Open krb5.conf. The default location is C:\ProgramData\Kerberos\krb5.conf.
2. Add your realm, KDC, and admin server values.
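The realm values follow the same pattern as in the Linux example earlier in this topic; a sketch with placeholder names:

```
[libdefaults]
  default_realm = YOUR_REALM.COM

[realms]
  YOUR_REALM.COM = {
    kdc = kdc.your_realm.com
    admin_server = admin.your_realm.com
  }

[domain_realm]
  .your_realm.com = YOUR_REALM.COM
  your_realm.com = YOUR_REALM.COM
```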
3. Save the file.
4. Copy the file to C:\Windows\krb5.ini.
5. Restart the machine.
Synchronize clocks
Synchronize the client clock with the cluster clock. Kerberos fails if timestamps drift too far.
Obtain a Kerberos ticket
1. Run kinit.
2. Enter the password when prompted.
3. Confirm the ticket exists by running klist.
If you use Heimdal, the klist output does not include a "Current LoginId is ..." line.
Set up user accounts and network access (all OS)
Ensure user accounts and network access exist before connecting.
Open the required network ports between the cluster and Pentaho components.
Confirm forward and reverse DNS resolution.
Create a Kerberos principal for each Pentaho user who needs access.
Ensure UID and GID match across all cluster nodes for the run user.
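You can spot-check DNS and identity consistency from any node; a sketch (the hostname, address, and user are illustrative):

```shell
# Forward lookup: name -> address
nslookup node1.your_realm.com

# Reverse lookup: address -> name (should return the same host)
nslookup 10.0.0.11

# Confirm the run user's UID and GID on this node
id pentaho
```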
Next step
Continue cluster connection setup in the Install Pentaho Data Integration and Analytics guide.
Use Kerberos with MongoDB
If you use Kerberos to authenticate access to MongoDB, you can also use Kerberos to authenticate PDI users who access MongoDB through a transformation step.
When a user runs a transformation containing a MongoDB step, the step credentials are validated against the Kerberos administrative database. If the credentials match, the KDC grants a ticket.
In this section
Complete MongoDB and client prerequisites
Install and configure MongoDB Enterprise.
Configure MongoDB for Kerberos authentication.
Install the current PDI client on each client machine.
Verify forward and reverse DNS resolution for MongoDB hosts.
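Enabling Kerberos in MongoDB Enterprise amounts to turning on the GSSAPI authentication mechanism; a minimal mongod.conf sketch:

```
# mongod.conf - enable Kerberos (GSSAPI) authentication
security:
  authorization: enabled
setParameter:
  authenticationMechanisms: GSSAPI
```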
Add users to the Kerberos database
Add a Kerberos principal for each PDI client user who needs MongoDB access.
1. Sign in to the host that runs the Kerberos database as root (or equivalent).
2. Add a principal. The principal should match the user created in MongoDB.
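On an MIT Kerberos KDC, kadmin.local adds the principal; a sketch (the principal name is illustrative):

```shell
# Create a principal matching the MongoDB user; you are prompted for a password
kadmin.local -q "addprinc mongo-user@YOUR_REALM.COM"
```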
Start Kerberos services automatically (optional)
Start the Kerberos Admin Server and KDC at boot.
The Kerberos Admin Server service name is typically kadmin; the KDC service name is typically krb5kdc.
How you do this depends on your operating system.
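On systemd-based Linux distributions, for example, you could enable both services at boot (service names may differ by distribution):

```shell
# Enable and immediately start the KDC and admin server
systemctl enable --now krb5kdc kadmin
```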
Configure client-side nodes
After you add users and configure Kerberos services, configure each client node that runs the PDI client.
Install JCE (optional)
Install JCE policy files only if you require AES-256 and your Java distribution needs it.
Download JCE for your supported Java version.
Follow Oracle installation instructions.
Copy the JCE JARs to java/lib/security in your PDI install.
Install a Kerberos client
Install a Kerberos client using your OS package manager.
Update krb5.conf
1. Open /etc/krb5.conf (or your OS-specific location).
2. Add your realm, KDC, and admin server values.
3. Restart the machine.
macOS: specify krb5.conf location (older Java only)
Do this only if the PDI/PRD JRE is earlier than Java 1.7.0_40. Update the relevant launcher.properties file so the JVM can locate your krb5.conf.
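The JVM locates the Kerberos configuration through the standard java.security.krb5.conf system property. A sketch of the launcher.properties change (the vmargs key name and file path are illustrative; the key used to pass JVM arguments may differ in your version):

```
# Point the bundled JVM at the Kerberos configuration file
vmargs=-Djava.security.krb5.conf=/etc/krb5.conf
```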
Synchronize clocks
Synchronize the client clock with the MongoDB host clock.
Obtain a Kerberos ticket
1. Run kinit.
2. Run klist to confirm the ticket exists.
Install JCE (optional)
Install JCE policy files only if you require AES-256 and your Java distribution needs it.
Download JCE for your supported Java version.
Follow Oracle installation instructions.
Copy the JCE JARs to java\lib\security in your PDI install.
Install a Kerberos client
Install a Kerberos client. Heimdal is a common option: https://www.secure-endpoints.com/heimdal/.
Update krb5.conf
1. Open krb5.conf. The default location is C:\ProgramData\Kerberos\krb5.conf.
2. Add your realm, KDC, and admin server values.
3. Copy the file to C:\Windows\krb5.ini.
4. Restart the machine.
Synchronize clocks
Synchronize the client clock with the MongoDB host clock.
Obtain a Kerberos ticket
1. Run kinit.
2. Run klist to confirm the ticket exists.
Test authentication with the PDI client
Use one of these options.

Option 1: Authenticate with the Kerberos option

1. Start the PDI client.
2. Create a new transformation.
3. Add MongoDB Input and open it.
4. Select Configure Fields.
5. Enter the MongoDB host name and port.
6. Enter the Kerberos principal as the username, in the form <primary>/<instance>@KERBEROS_REALM. Leave the password blank.
7. Select Authenticate using Kerberos.
8. On Input options, set a database you can read.
9. Select Get Collections.
10. Select Preview and confirm you see data.

Option 2: Authenticate with a connection string

1. Start the PDI client.
2. Create a new transformation.
3. Add MongoDB Input and open it.
4. Select Connection String.
5. Enter the MongoDB host name and port.
6. Use a connection string like: mongodb://<service-principal>@<hostname>:<port>/?authSource=$external&authMechanism=GSSAPI
7. Select Test.
8. On Input options, set a database you can read.
9. Select Get Collections.
10. Select Preview and confirm you see data.
Use Kerberos with Spark Submit
Submit Spark jobs to secure CDP clusters by passing the Kerberos keytab and principal as Spark utility parameters.
Prerequisites
Install a Spark client.
Ensure the cluster is secured with Kerberos.
Ensure the Kerberos server is reachable from the Pentaho Server.
Configure Kerberos on the Pentaho machine. See Set up Kerberos for Pentaho.
Have a valid Kerberos ticket in the client ticket cache before you submit the job.
Spark Submit entry properties

Configure these properties in the Spark Submit job entry:
Entry Name: Any descriptive name.
Spark Submit Utility: The script that launches Spark, for example spark2-submit.
Master URL: yarn-cluster or yarn-client.
Type: Java, Scala, or Python.
Utility Parameters: spark.yarn.keytab (the path to the keytab file) and spark.yarn.principal (the Kerberos principal for cluster authentication).
Enable Blocking: Enable if the entry should wait for job completion.
Authentication by password is not supported.
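On the command line, the same parameters map to the spark-submit keytab and principal options; a sketch (the paths, class, and principal are illustrative):

```shell
# Submit a Spark job to a Kerberos-secured YARN cluster.
# The keytab and principal let YARN renew tickets for long-running jobs.
spark2-submit \
  --master yarn \
  --deploy-mode cluster \
  --keytab /etc/security/keytabs/etl.keytab \
  --principal etl@EXAMPLE.COM \
  --class com.example.MyJob \
  my-job.jar
```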
Use Knox to access CDP
Apache Knox provides perimeter security for CDP services. It gives you a single gateway endpoint instead of per-service endpoints.
Knox typically authenticates a user via LDAP, then authenticates to Kerberos, then authorizes via Ranger.

Setup requirements for Knox with Pentaho
As a cluster administrator, provide this information to Pentaho users:
Credentials: Cluster name, gateway URL, username, and password.
SSL certificate: Knox URLs use HTTPS, so install the Knox gateway certificate where Pentaho can trust it.
See SSL Security.
LDAP directory server: Knox commonly authenticates users against LDAP.
See LDAP security.
Hive configuration with Knox
Open your Hive database connection.
In the Database Connection dialog, select Options.
Set these parameters:
httpPath: datahub_cluster_name/cdp-proxy-api/hive
knox (optional): true
transportMode: http
ssl: true
In General, set Port number to 443.
You can now use the connection in Hive steps.
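Outside the dialog, these options correspond to a Hive JDBC URL of roughly this shape (the gateway host and cluster name are illustrative):

```
jdbc:hive2://knox-gateway.example.com:443/default;ssl=true;transportMode=http;httpPath=datahub_cluster_name/cdp-proxy-api/hive
```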