Advanced configuration

After installing Data Catalog, you might need to set up additional components, depending on your environment. Use the following topics as needed to finish setting up your environment.

Configure system environment variables

Although not common, there might be instances where you need to change the default settings for Data Catalog system environment variables. These modifications let you override default system behavior to align with your specific needs.

CAUTION: Modifying these settings can have system-wide implications, and incorrect changes might negatively impact the functionality of other platforms. It is a best practice to collaborate with your Pentaho Data Catalog partner to ensure that any modifications align with your intended objectives.

  1. In a terminal window, navigate to the pdc-docker-deployment folder. This folder is located in /opt/pentaho by default.

  2. Verify the system environment variables set in the /opt/pentaho/pdc-docker-deployment/vendor/.env.default file:

    • For example, the number of worker instances that Data Catalog uses to run processes is set to 5:

      PDC_WS_DEFAULT_OPS_JOBPOOLMINSIZE=5
      PDC_WS_DEFAULT_OPS_JOBPOOLMAXSIZE=5

      Note: Make sure that PDC_WS_DEFAULT_OPS_JOBPOOLMINSIZE and PDC_WS_DEFAULT_OPS_JOBPOOLMAXSIZE have the same value for consistent worker instance management.

  3. To override an environment variable set in the vendor/.env.default file, create a new .env file in the /opt/pentaho/pdc-docker-deployment/conf/ folder:

    vi /opt/pentaho/pdc-docker-deployment/conf/.env

  4. (Optional) The data in the Business Intelligence Database refreshes daily by default, as set in the .env file. To modify the refresh frequency, set the variable in the .env file to one of the values listed in the following table:

Value                    Description
@yearly (or @annually)   Run once a year, at midnight on January 1
@monthly                 Run once a month, at midnight on the first of the month
@weekly                  Run once a week, at midnight between Saturday and Sunday
@daily (or @midnight)    Run once a day, at midnight
@hourly                  Run once an hour, at the beginning of the hour

    C_CRON_BI_VIEWS_INIT_SCHEDULE=@daily
  5. After adding all required system variables, save the changes and restart the Data Catalog system services.

    ./pdc.sh stop
    ./pdc.sh up
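The override mechanism can be sketched with two throwaway files. Whether PDC sources the files in exactly this way is an internal detail; the point is that a value in conf/.env takes precedence over the same key in vendor/.env.default. All paths under /tmp are illustrative:

```shell
# Simulate the vendor default and conf override files in a scratch directory.
mkdir -p /tmp/pdc-demo/vendor /tmp/pdc-demo/conf

cat > /tmp/pdc-demo/vendor/.env.default <<'EOF'
PDC_WS_DEFAULT_OPS_JOBPOOLMINSIZE=5
PDC_WS_DEFAULT_OPS_JOBPOOLMAXSIZE=5
EOF

cat > /tmp/pdc-demo/conf/.env <<'EOF'
PDC_WS_DEFAULT_OPS_JOBPOOLMINSIZE=10
PDC_WS_DEFAULT_OPS_JOBPOOLMAXSIZE=10
EOF

# Loading the defaults first and the conf file second means conf/.env wins,
# which is the override behavior described in step 3.
set -a
. /tmp/pdc-demo/vendor/.env.default
. /tmp/pdc-demo/conf/.env
set +a

echo "$PDC_WS_DEFAULT_OPS_JOBPOOLMAXSIZE"
```

Note that both MINSIZE and MAXSIZE are overridden together, matching the guidance that the two values stay equal.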

Install user-provided SSL certificates

To provide a greater level of security to your data, you can use signed Secure Sockets Layer (SSL) certificates from your Certificate Authority (CA) with Data Catalog.

Data Catalog automatically installs self-signed certs in the <install-directory>/conf/https directory as server.key (PEM-encoded private key) and server.crt (PEM-encoded self-signed certificate). You can replace these files with certificates signed by your CA.

Use this procedure to install signed SSL certificates for Data Catalog:

  1. On your Data Catalog server, navigate to the <install-directory>/conf/https directory, where <install-directory> is the directory where Data Catalog is installed.

    • server.key is a PEM-formatted file that contains the private key of a specific certificate.

    • server.crt is a PEM-formatted file containing the certificate.

  2. Replace the <install-directory>/conf/https/server.key file with the PEM-encoded private key used to sign the SSL certificate or generate a new private key in PEM-encoded format.

  3. Replace the <install-directory>/conf/https/server.crt file with the PEM-encoded signed certificate associated with the private key in Step 2.

    If you generated a new private key, you must download a new PEM-encoded signed SSL certificate from your CA.

  4. Append the following three certificates, in this order, to the <install-directory>/conf/extra-certs/bundle.pem file:

    1. The top-level PEM-encoded signed SSL certificate (the content of the <install-directory>/conf/https/server.crt file).

    2. Intermediate PEM-encoded certificate, if any, from your CA.

    3. Root PEM-encoded certificate, if any, from your CA.

  5. Navigate to the Data Catalog <install-directory>.

  6. Use the following command to restart Data Catalog:

    ./pdc.sh restart

The SSL certificates are installed.
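The required order in bundle.pem can be sketched with stand-in PEM files (real certificates come from your CA): the leaf certificate first, then the intermediate, then the root.

```shell
# Stand-in PEM files; in a real install, server.crt is your CA-signed leaf
# certificate and the intermediate/root files come from the CA's chain.
mkdir -p /tmp/pdc-certs
printf -- '-----BEGIN CERTIFICATE-----\nserver\n-----END CERTIFICATE-----\n' > /tmp/pdc-certs/server.crt
printf -- '-----BEGIN CERTIFICATE-----\nintermediate\n-----END CERTIFICATE-----\n' > /tmp/pdc-certs/intermediate.pem
printf -- '-----BEGIN CERTIFICATE-----\nroot\n-----END CERTIFICATE-----\n' > /tmp/pdc-certs/root.pem

# bundle.pem must list the leaf certificate first, then any intermediates,
# then the root, matching the order in step 4.
cat /tmp/pdc-certs/server.crt \
    /tmp/pdc-certs/intermediate.pem \
    /tmp/pdc-certs/root.pem > /tmp/pdc-certs/bundle.pem
```

In the real installation you append to the existing <install-directory>/conf/extra-certs/bundle.pem rather than writing a fresh file.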

Add email domains to the safe list in Data Catalog after deployment

During the initial deployment of Data Catalog, it is typically configured to allow only a predefined set of email domains for user authentication. However, you might later need to grant access to users with email addresses from new domains. Instead of redeploying Data Catalog, which can cause downtime and operational delays, you can dynamically update the list of allowed email domains using the Identity & Access Management (IAM) APIs.

Note: Adding email domains and SMTP details during the initial Data Catalog deployment is always a best practice. For more information, see the Installing Data Catalog topic in the Get started with Pentaho Data Catalog document.

Perform the following steps to add email domains to the safe list using IAM APIs after deployment:

Ensure you have sufficient access to use the IAM APIs.

  1. Open a command prompt and run the following cURL command to generate an authentication token for the IAM APIs:

    curl -k --location 'http://<your-server-url>/keycloak/realms/master/protocol/openid-connect/token' \
    --header 'Content-Type: application/x-www-form-urlencoded' \
    --data-urlencode 'username=<admin-username>' \
    --data-urlencode 'password=<admin-password>' \
    --data-urlencode 'client_id=admin-cli' \
    --data-urlencode 'grant_type=password'
    
    • Replace <your-server-url> with your Data Catalog server URL.

    • Replace <admin-username> and <admin-password> credentials with the actual admin credentials. The response includes the token value.

    {"access_token":"<TOKEN_VALUE>"}
  2. Run the following IAM API cURL request to update email domains:

    curl -X 'PUT' \
    'https://<pdc-server>/css-admin-api/api/internal/css-auth-proxy/v1/provider' \
    --header 'Authorization: Bearer <ACCESS_TOKEN>' \
    --header 'Content-Type: application/json' \
    --data '{
      "allowed_email_domains": ["example.com", "newdomain.com", "company.com"]
    }'
    
    • Replace <pdc-server> with your Data Catalog server URL.

    • Replace <ACCESS_TOKEN> with the token obtained in the previous step.

    • Modify the list of "allowed_email_domains" as needed.

You have successfully added new email domains to the Data Catalog safe list. Users with email addresses from the new domains can now log in to Data Catalog.
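The two steps can be chained in a small script. The token extraction below runs against a stand-in response; the server URL and domain list are placeholders, and the final request is printed rather than executed since it needs a live Data Catalog instance:

```shell
# Stand-in for the JSON returned by the token endpoint in step 1; in practice,
# capture the curl output into RESPONSE instead.
RESPONSE='{"access_token":"eyJhbGciOi.example.token"}'

# Extract the token with sed (jq -r .access_token is cleaner if jq is installed).
ACCESS_TOKEN=$(printf '%s' "$RESPONSE" | sed -n 's/.*"access_token":"\([^"]*\)".*/\1/p')
echo "token: $ACCESS_TOKEN"

# The PUT request from step 2, printed rather than executed here
# (pdc.example.com and the domain list are placeholders):
echo curl -X PUT 'https://pdc.example.com/css-admin-api/api/internal/css-auth-proxy/v1/provider' \
  --header "Authorization: Bearer $ACCESS_TOKEN" \
  --header 'Content-Type: application/json' \
  --data '{"allowed_email_domains": ["example.com", "newdomain.com"]}'
```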

Set up email server to send Data Catalog notifications

To set up Data Catalog to send email notifications to users, you can configure any Simple Mail Transfer Protocol (SMTP) server that meets your needs.

Examples of notifications include when a user is tagged with '@' in a comment, or when a data pipe template is set up to notify a user when a job completes.

Note: The steps to set up an SMTP server in the Installing Data Catalog topic in Get started with Pentaho Data Catalog only set up the forgot-password functionality.

To integrate an SMTP server with Data Catalog, use the following steps:

  1. Gather the following information for the SMTP server you want to use:

    • Host name of SMTP server (IP address or domain name)

    • Port number for SMTP server

    • Username on SMTP server in <mail userID>@<domain>.com format

    • Password for username

    • Sender mail ID in <mail userID>@<domain>.com format

    • Whether to use Transport Layer Security (TLS) or Secure Sockets Layer (SSL) security

    • TLS or SSL port number

    For example, you can use Gmail’s SMTP server to send emails from your application. Here are the SMTP server configuration settings for Gmail:

    • SMTP Server Address

      smtp.gmail.com

    • Secure Connection

      TLS/SSL based on your mail client/website SMTP plugin

    • SMTP Username

      your Gmail account ([email protected])

    • SMTP Password

      your Gmail password

    • Gmail SMTP port

      465 (SSL) or 587 (TLS)

  2. Log into Data Catalog using root user credentials to configure Data Catalog to use the SMTP server, as in the following example:

    https://<full domain name for PDC server>/

  3. Navigate to the configuresystem/smtp page on the Data Catalog server, as in the following example:

    https://<full domain name for PDC server>/configuresystem/smtp

    The Configure Your System page opens.

  4. Specify the SMTP server information as detailed in the following table:

Field        Description
Host         IP address or domain name of the SMTP server
Port         Port number for the SMTP server
Username     User name in <mail userID>@<domain>.com format
Password     Password for the user name specified above
Sender Mail  Sender mail ID in <mail userID>@<domain>.com format
Encryption   TLS: default value (leave the Use SSL checkbox blank)
             SSL: select the Use SSL checkbox

  5. Click Test Connection to test the integration. A success confirmation message is displayed next to the Test Connection button.

  6. Click Save Changes.

The SMTP server is configured.

Update SMTP details in Data Catalog after deployment

Adding Simple Mail Transfer Protocol (SMTP) details in Data Catalog enables email notifications and alerts within the application, such as:

  • Alerts about Data Catalog changes, approvals, and errors like data ingestion, metadata extraction, or synchronization failures.

  • Password reset links when users forget their credentials.

  • Notification alerts when a user is tagged in the Comments tab.

SMTP details are typically configured during the initial deployment of Data Catalog. However, if you want to update SMTP details after deployment, you can use the Identity & Access Management (IAM) APIs instead of redeploying Data Catalog, which might cause downtime and operational delays.

Note: Adding email domains and SMTP details during the initial Data Catalog deployment is always the best practice. For more information, see the Installing Data Catalog topic in the Get started with Pentaho Data Catalog document.

Perform the following steps to update SMTP details in Data Catalog using IAM APIs after deployment:

Ensure you have sufficient access to use the IAM APIs.

  1. To generate an authentication token for the IAM APIs, open a command prompt and run the following cURL command:

    curl -k --location 'http://<your-server-url>/keycloak/realms/master/protocol/openid-connect/token' \
    --header 'Content-Type: application/x-www-form-urlencoded' \
    --data-urlencode 'username=<admin-username>' \
    --data-urlencode 'password=<admin-password>' \
    --data-urlencode 'client_id=admin-cli' \
    --data-urlencode 'grant_type=password'
    
    • Replace <your-server-url> with your Data Catalog server URL.

    • Replace <admin-username> and <admin-password> credentials with the actual admin credentials. The response includes the token value.

    {"access_token":"<TOKEN_VALUE>"}
  2. To update SMTP details, run the following IAM API cURL request:

    curl -X PUT \
      'https://<PDC_HOST>/css-admin-api/api/v1/tenants/<TENANT_NAME>' \
      -H 'Accept: */*' \
      -H 'Authorization: Bearer <TOKEN_VALUE>' \
      -H 'Content-Type: application/json' \
      -d '{
        "realm": "<TENANT_NAME>",
        "smtpServer": {
          "password": "<SMTP_PASSWORD>",
          "replyToDisplayName": "<REPLY_TO_DISPLAY_NAME>",
          "starttls": "<true|false>",
          "auth": "<true|false>",
          "port": "<SMTP_PORT>",
          "host": "<SMTP_HOST>",
          "replyTo": "<REPLY_TO_EMAIL>",
          "from": "<FROM_EMAIL>",
          "fromDisplayName": "<FROM_DISPLAY_NAME>",
          "envelopeFrom": "<ENVELOPE_FROM>",
          "ssl": "<true|false>",
          "user": "<SMTP_USERNAME>"
        }
      }'
    
    Parameter                 Description
    <PDC_HOST>                The host name or IP address of your Data Catalog instance.
    <TENANT_NAME>             The tenant name, typically "pdc".
    <TOKEN_VALUE>             A valid authentication token obtained through IAM authentication.
    <SMTP_PASSWORD>           The password for SMTP server authentication.
    <REPLY_TO_DISPLAY_NAME>   The display name for the reply-to email address.
    <SMTP_PORT>               The port number used by the SMTP server.
    <SMTP_HOST>               The SMTP server host address.
    <REPLY_TO_EMAIL>          The reply-to email address.
    <FROM_EMAIL>              The email address used to send notifications.
    <FROM_DISPLAY_NAME>       The display name associated with the sender’s email.
    <ENVELOPE_FROM>           The envelope sender address (optional).
    <SMTP_USERNAME>           The username for SMTP authentication.

You have successfully updated SMTP details in Data Catalog.

Configure proxy server settings for the Licensing-API service

In Pentaho Data Catalog, the Licensing-API service is responsible for managing and validating software licenses, ensuring that only authorized users and services can access Data Catalog features. When Data Catalog is deployed in an enterprise environment that restricts direct internet access, services like the Licensing-API require a proxy server to reach external licensing servers and authenticate endpoints.

After deploying Data Catalog, perform the following steps to configure the proxy server for the Licensing-API service:

Note: When configuring the proxy server for the Licensing-API service, use the domain name instead of the IP address. SSL certificates are typically issued for domain names, ensuring secure communication.

Ensure that you have:

  • Access to the conf/.env and vendor/docker-compose.licensing.yml files.

  • Administrative privileges to modify configuration files and restart services.

  • The required proxy server details (domain, port, username, and password).

  • The SSL certificate file (proxy-cert.pem) if required for secure proxy connections.

  1. To configure the proxy environment variables, go to the Data Catalog root folder and open the conf/.env file.

  2. In the conf/.env file, update the following proxy variables with respective values:

    Variable                          Description                                     Example Value
    LICENSING_SERVER_PROXY_ENABLED    Enables or disables proxy configuration.        true or false
    LICENSING_SERVER_PROXY_DOMAIN     The domain or IP address of the proxy server.   10.177.176.126
    LICENSING_SERVER_PROXY_PORT       The port number used for proxy communication.   443
    LICENSING_SERVER_PROXY_USER       The username for proxy authentication.          admin
    LICENSING_SERVER_PROXY_PASSWORD   The password for proxy authentication.          password

    For example:

    LICENSING_SERVER_PROXY_ENABLED=true
    LICENSING_SERVER_PROXY_DOMAIN=10.177.176.126
    LICENSING_SERVER_PROXY_PORT=443
    LICENSING_SERVER_PROXY_USER=user
    LICENSING_SERVER_PROXY_PASSWORD=password
    

    Note: It is a best practice to avoid hard coding sensitive credentials like PROXY_USER and PROXY_PASSWORD. Use secret management tools or environment variables to secure them.

  3. To update proxy server configuration in Docker Compose, open the vendor/docker-compose.licensing.yml file and update the licensing-api service configuration as follows:

    services:
      licensing-api:
        image: ${GLOBAL_IMAGE_PREFIX}/${LICENSING_API_IMAGE}
        restart: always
        environment:
          LICENSING_SERVER_URL: ${LICENSING_SERVER_URL}
          PROXY_ENABLED: ${LICENSING_SERVER_PROXY_ENABLED}
          # Use domain because SSL requires a domain, not an IP, for configuration
          PROXY_HOST: ${LICENSING_SERVER_PROXY_DOMAIN}
          PROXY_PORT: ${LICENSING_SERVER_PROXY_PORT}
          PROXY_USER: ${LICENSING_SERVER_PROXY_USER}
          PROXY_PASSWORD: ${LICENSING_SERVER_PROXY_PASSWORD}
          # Used for configuring SSL certificate for proxy
          JAVA_EXTRA_CERTS: "cert.pem"
        platform: linux/amd64
        profiles:
          - core
        volumes:
          - ${PDC_CLIENT_PATH}/proxy-cert.pem:/app/cert.pem
    

    Note:

    • The PROXY_ENABLED, PROXY_HOST, PROXY_PORT, PROXY_USER, and PROXY_PASSWORD environment variables are mapped inside the Docker container.

    • The JAVA_EXTRA_CERTS variable is set to "cert.pem" to configure SSL certificates for proxy authentication.

    • A volume mount is added to ensure that the SSL certificate file proxy-cert.pem is accessible within the container.

  4. (Optional) If the proxy server requires SSL authentication, place the SSL certificate file (proxy-cert.pem) in the specified directory:

    cp /path/to/proxy-cert.pem ${PDC_CLIENT_PATH}/proxy-cert.pem

    Note: Ensure that the file permissions allow access by the Licensing-API service.

  5. After updating the configuration, restart the Data Catalog services to apply the changes:

    ./pdc.sh restart

You have successfully configured the proxy server settings for Licensing APIs in Data Catalog.
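Before restarting, a quick shell check can catch a half-filled proxy configuration. This sketch writes a sample conf/.env under /tmp (all values are placeholders) and verifies that, when the proxy is enabled, every related variable has a value:

```shell
# Sample conf/.env in a scratch location (placeholder values).
mkdir -p /tmp/pdc-proxy
cat > /tmp/pdc-proxy/.env <<'EOF'
LICENSING_SERVER_PROXY_ENABLED=true
LICENSING_SERVER_PROXY_DOMAIN=proxy.example.com
LICENSING_SERVER_PROXY_PORT=443
LICENSING_SERVER_PROXY_USER=admin
LICENSING_SERVER_PROXY_PASSWORD=secret
EOF

# If the proxy is enabled, every other proxy variable must be non-empty,
# otherwise the licensing-api container starts with incomplete settings.
MISSING=0
if grep -q '^LICENSING_SERVER_PROXY_ENABLED=true' /tmp/pdc-proxy/.env; then
  for VAR in LICENSING_SERVER_PROXY_DOMAIN LICENSING_SERVER_PROXY_PORT \
             LICENSING_SERVER_PROXY_USER LICENSING_SERVER_PROXY_PASSWORD; do
    grep -q "^${VAR}=." /tmp/pdc-proxy/.env || { echo "missing: $VAR"; MISSING=1; }
  done
fi
[ "$MISSING" -eq 0 ] && echo "proxy configuration complete"
```

Point the check at your real conf/.env file before running ./pdc.sh restart.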

Connect to Business Intelligence Database (BIDB)

Data Catalog includes the Business Intelligence Database (BIDB) server, which contains a range of collections with specific metadata. Use the Java Database Connectivity (JDBC) or Open Database Connectivity (ODBC) connector and connect to the BIDB server to access reporting data and build dashboards. See the Reporting and data visualization section in the Use Pentaho Data Catalog document for details about BIDB and the collections available in BIDB.

Configure Java Database Connectivity (JDBC) connector

Perform the following steps to configure the JDBC connector for connecting to BIDB:

  1. Download the MySQL JDBC Connector JAR file from the MySQL website, selecting the appropriate version for your operating system.

  2. Download the DBeaver application from the DBeaver website and install it on the system. See DBeaver installation for more details.

  3. To add the MySQL JDBC Driver and MySQL authentication plugin to DBeaver, open DBeaver and go to Database > Driver Manager.

  4. Click New to add a new driver.

  5. Select MySQL from the list and enter a name for the driver.

  6. Click Browse to locate and select the downloaded JDBC driver (JAR file) and the MySQL authentication plugin, then click OK or Finish to add the driver.

  7. After adding the MySQL driver, to create a New Connection, go to the DBeaver home page, click New Database Connection, and select MySQL as the database type.

  8. Enter the MySQL server connection details, such as host, port, username, password, and so on.

  9. Specify the connection URL with the MongoDB authentication plugin, as in the following example:

    jdbc:mysql://20.8.222.21:3307?useSSL=false&authenticationPlugins=org.mongodb.mongosql.auth.plugin.MongoSqlAuthenticationPlugin
  10. Click Test Connection to verify the connection is working.

  11. Click Finish to save the connection configuration.

You are now connected to BIDB using the JDBC connector.

Use any third-party BI tool to connect to BIDB to analyze data and create dashboards.

Configure Open Database Connectivity (ODBC) connector

The MongoDB ODBC connector allows you to connect tools that support ODBC to MongoDB and query the data using SQL. Perform the following steps to configure the ODBC connector for connecting to BIDB.

  1. Download and install the MongoDB ODBC connector.

    See MongoDB BI Connector ODBC Driver for more information.

  2. Download and install an ODBC driver manager on your system.

    For example, on the Windows operating system, you can use the default Windows ODBC Data Source Administrator.

  3. Open the ODBC Data Source Administrator on your machine and go to the System DSN tab.

  4. Click Add to add a new data source and select the MongoDB Driver.

  5. To configure the DSN (Data Source Name) settings:

    1. Set the server field to the address of your MongoDB server.

    2. Enter the port number if it differs from the default (27017).

    3. Enter the required authentication details, such as the username and password.

    4. As a part of the connection details, enter the plugin directory details.

    5. Set the SSL Mode to Disabled in the SSL configuration.

  6. Click Test to verify that the connection is working.

  7. Click OK to save the connection configuration.

You are now connected to BIDB using the MongoDB ODBC connector.

Use any third-party BI tool to connect to BIDB to analyze data and create dashboards.

Configure a machine learning (ML) server connection in Data Catalog

You can connect a machine learning (ML) server to Data Catalog and import ML model server components, including ML models, experiments, versions, and runs into the ML Models hierarchy within Data Catalog. For more information about ML Models, see the ML Models section in Use Pentaho Data Catalog.

Perform the following steps to configure a connection between the ML server and Data Catalog:

  • Make sure you have access to the ML server you want to connect to.

  • If the ML server requires authentication, make sure you have the necessary credentials, either a valid username and password or an access token.

  1. Verify whether the external-data-source-config.yml file exists in the ${PDC_CLIENT_PATH}/external-datasource/ directory. If it does not exist, create it.

  2. Open the external-data-source-config.yml file and add ML server configuration:

    servers:
      - id: {SERVER_ID}
        name: {SERVER_NAME}
        type: {SERVER_TYPE}
        url: {SERVER_URL}
        config:
          username: {username}
          password: {password}
          access_token: {access_token}
    
Parameter   Description                                                     Example
id          Unique identifier (UUID) for the ML server.                     916d3b20-7fd6-49d2-b911-cc051f56e837
name        Display name for the server. This name appears in the UI.      MLflowServer
type        Type of server (enum value). For an ML server, use 'MlFlow'.   MlFlow
url         The base URL of the ML server.                                  http://mlflow.mycompany.com
config      Configuration keys specific to the ML server you are
            configuring. If authentication is enabled, include either a
            username and password, or an access token.

  3. After configuring the ML server in the YAML file, restart the PDC services to apply the changes.

You have successfully configured the ML server in Data Catalog as an external data source. It appears under the Synchronize card in the Management section of Data Catalog.

You can now import ML model server components into the ML Models hierarchy of Data Catalog. For more information, see Import ML model server components into ML Models hierarchy.
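A filled-in configuration, using the example values from the table above, looks like the sketch below; the grep loop is a quick sanity check that the required keys are present before restarting PDC (paths under /tmp are illustrative):

```shell
# Example external-data-source-config.yml with placeholder values.
mkdir -p /tmp/pdc-ml
cat > /tmp/pdc-ml/external-data-source-config.yml <<'EOF'
servers:
  - id: 916d3b20-7fd6-49d2-b911-cc051f56e837
    name: MLflowServer
    type: MlFlow
    url: http://mlflow.mycompany.com
    config:
      access_token: <access_token>
EOF

# Sanity check: every server entry needs id, name, type, and url.
for KEY in 'id:' 'name:' 'type:' 'url:'; do
  grep -q "$KEY" /tmp/pdc-ml/external-data-source-config.yml || echo "missing $KEY"
done
```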

Configure a Tableau server connection in Data Catalog

You can configure a connection between a Tableau server and Data Catalog to import Tableau metadata such as dashboards, workbooks, projects, and data sources into the Business Intelligence (BI) section of Data Catalog. To learn more, see the Business Intelligence section in the Use Pentaho Data Catalog document.

Perform the following steps to configure a connection between the Tableau server and Data Catalog:

  • Make sure you have access to the Tableau Cloud or Tableau Server instance you want to connect to. The URL format looks like:

    https://<region>.online.tableau.com/#/site/<site-id>/home
  • Identify the Site ID for the Tableau site. For Tableau Cloud, you can find this in the URL after /site/.

  • Generate a valid Personal Access Token (PAT) in Tableau, including PAT name and PAT secret.

  1. Verify whether the external-data-source-config.yml file exists in the ${PDC_CLIENT_PATH}/external-datasource/ directory. If it does not exist, create it.

  2. Open the external-data-source-config.yml file and add Tableau server configuration:

    servers:
      - id: dev-8f012f9ca7
        name: Test_Server
        type: Tableau
        url: https://prod-apnortheast-a.online.tableau.com/api/3.22/auth/signin
        config:
          pat_name: 'test'
          pat_secret: 'kITbTaYmTPSdZ7ADeP11VA==:hwt9jkehQqGuq72Lh9V4wiFlfZcIpny8'
    
    Parameter   Description
    id          The site ID (unique identifier) of the Tableau site to connect to, as seen in the
                Tableau Cloud URL. Example: dev-8f012f9ca7
    name        Display name for the server. This name appears in the UI. Example: TableauServer
    type        Type of server (enum value). For a Tableau server, use 'Tableau'. Example: Tableau
    url         The Tableau REST API authentication endpoint. Use the signin endpoint for the
                Tableau site. Example: https://prod-apnortheast-a.online.tableau.com/api/3.22/auth/signin
    config      Configuration keys specific to the Tableau server you are configuring:
                - pat_name: The name of the Tableau Personal Access Token (PAT) used for
                  authentication.
                - pat_secret: The secret key associated with the PAT. Ensure this is stored
                  securely and never exposed.

  3. After configuring the Tableau server in the YAML file, restart the following PDC services to apply the changes:

    • Frontend service (fe)

    • Worker service (ws-default)

    # Restart the frontend and worker services
    ./pdc.sh restart fe
    ./pdc.sh restart ws-default
    

You have successfully configured the Tableau server in Data Catalog as an external data source. It appears under the Synchronize card in the Management section of Data Catalog.

You can now import Tableau server components into the Business Intelligence hierarchy of Data Catalog.
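Before restarting, you can sanity-check the PAT against the same signin endpoint. The request body below follows Tableau's REST API personal-access-token sign-in shape; the PAT values are placeholders, and the request itself is only printed, since it needs a reachable Tableau site:

```shell
# Placeholders for the PAT and site from the configuration above.
PAT_NAME='test'
PAT_SECRET='<pat-secret>'
SITE_CONTENT_URL='dev-8f012f9ca7'

# Request body for POST https://<tableau-host>/api/3.22/auth/signin.
cat > /tmp/tableau-signin.json <<EOF
{"credentials": {"personalAccessTokenName": "$PAT_NAME",
                 "personalAccessTokenSecret": "$PAT_SECRET",
                 "site": {"contentUrl": "$SITE_CONTENT_URL"}}}
EOF

# Printed rather than executed, since it needs a live Tableau site:
echo curl -X POST 'https://prod-apnortheast-a.online.tableau.com/api/3.22/auth/signin' \
  -H 'Content-Type: application/json' -H 'Accept: application/json' \
  -d @/tmp/tableau-signin.json
```

A successful response includes a credentials token, which confirms the PAT and site ID before Data Catalog uses them.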

Configure the Physical Assets service in Data Catalog

In Pentaho Data Catalog, you can import operational technology (OT) components, including device services, locations, devices, and values, and view them in the Physical Assets section of Data Catalog in a hierarchical structure. With the Physical Assets feature, you can understand how data flows from physical sources into analytical systems, enabling better traceability and context. Additionally, users can enrich asset nodes with business terms, policies, lineage, and metadata to strengthen data governance and compliance. For more information, see Physical Assets in the Use Pentaho Data Catalog document.

To use the Physical Assets feature in Data Catalog, you must first configure the Physical Assets service. This involves completing the following procedures:

Note: The configuration steps assume Data Catalog is already installed. For installation instructions, see Install Pentaho Data Catalog in Get started with Pentaho Data Catalog.

Enable the Physical Assets service in Data Catalog

Perform the following steps to enable the Physical Assets service in the existing Data Catalog deployment.

  1. Go to the vendor folder:

    /opt/pentaho/pdc-docker-deployment/vendor
  2. Open the .env.default file.

  3. Update the following lines:

    COMPOSE_PROFILES=core,mongodb,collab,pdso,mdm,physical-assets
    ASSET_HIERARCHY_URL=/physical-assets-service
    
  4. Add the Pentaho Edge connection details:

    PENTAHO_EDGE_URL=http://<PE-IP>:4000
    PENTAHO_EDGE_USERNAME_PASSWORD=admin:admin
    PENTAHO_EDGE_BACKEND_URL=https://<PE-IP>:8443
    

    Replace <PE-IP> with the IP where Pentaho Edge is installed.

  5. Restart PDC to apply the changes:

    cd /opt/pentaho/pdc-docker-deployment
    ./pdc.sh up
    

You have successfully enabled the Physical Assets service in the Data Catalog deployment. The service is now active and ready to connect with Pentaho Edge to receive physical assets metadata.
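Step 3 can be scripted idempotently. The sketch below works on a scratch copy of .env.default and appends the physical-assets profile only if it is not already present (the /tmp path is illustrative; point the commands at the real vendor/.env.default):

```shell
# Scratch copy of .env.default (only the relevant line shown).
mkdir -p /tmp/pdc-vendor
echo 'COMPOSE_PROFILES=core,mongodb,collab,pdso,mdm' > /tmp/pdc-vendor/.env.default

# Append the physical-assets profile if it is not already enabled.
grep -q 'physical-assets' /tmp/pdc-vendor/.env.default || \
  sed -i 's/^COMPOSE_PROFILES=.*/&,physical-assets/' /tmp/pdc-vendor/.env.default

# Add the service URL from step 3.
echo 'ASSET_HIERARCHY_URL=/physical-assets-service' >> /tmp/pdc-vendor/.env.default

cat /tmp/pdc-vendor/.env.default
```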

Configure Pentaho Edge for the Physical Assets service

Perform the following steps to configure Pentaho Edge to connect it to Data Catalog.

  1. Clone the Pentaho Edge installer repository and navigate to the installer folder:

    git clone <repo-url>
    cd installer
    
  2. Edit the docker-compose-pentaho-edge.yml file and set the following property:

    ENABLE_ASSET_HIERARCHY_FEATURE=true
  3. Save and close the docker-compose-pentaho-edge.yml file.

  4. Open the .env file:

    vi .env
  5. Update the following properties:

    PDC_ASSET_HIE_SERVICE_BASE_URL=https://<PDC-IP>/physical-assets-service/api/v1/assets
    AUTH_URL=https://<PDC-IP>/keycloak
    PDC_INSECURE_SKIP_VERIFY=true
    ENABLE_ASSET_HIERARCHY_FEATURE=true
    

    Note:

    • Replace <PDC-IP> in the URLs with the IP address where pdc-docker-deployment is installed.

    • Use the FQDN instead of the IP address if needed.

      PDC_ASSET_HIE_SERVICE_BASE_URL=https://<FQDN>/physical-assets-service/api/v1/assets
      AUTH_URL=https://<FQDN>
      PDC_INSECURE_SKIP_VERIFY=true
      ENABLE_ASSET_HIERARCHY_FEATURE=true
      
  6. Add authentication properties:

    AUTH_USERNAME=
    AUTH_PASSWORD=
    AUTH_CLIENT_ID=
    AUTH_REALM=
    
  7. Run the Edge installer script:

    ./install.sh
  8. When prompted, provide a user ID and password.

You have successfully configured Pentaho Edge to support the Physical Assets hierarchy and connected it to Data Catalog. You can now view OT assets in the Physical Assets section of Data Catalog.

Configure PDI to send lineage to Data Catalog

Use this task to set up Pentaho Data Integration (PDI) to write lineage information from key lineage events into the Data Catalog metadata store. See Data lineage in Use Pentaho Data Catalog for information on the specific lineage events that are supported.

Data Catalog continuously runs an API to read the lineage information from PDI. PDI and Data Catalog support the OpenLineage open framework for data lineage collection and analysis.

Note: You must perform these steps on PDI.

Before you begin this task, turn off PDI and the Pentaho Server.

Perform the following steps to set up PDI to send lineage metadata to Data Catalog.

  1. On the Support Portal home page, sign in using the Pentaho support username and password provided in your Pentaho Welcome Packet, or obtain the credentials from your PDI administrator.

  2. On the Pentaho card, click Download.

  3. Navigate to the Marketplace location with plugin downloads.

  4. Download the PDI OpenLineage plugin.

  5. Unzip the downloaded package.

  6. Run the installer for PDI:

    1. Run install.sh if on Linux, or install.bat if on Windows.

    2. Install in the <data-integration> folder.

  7. Run the installer for Pentaho Server:

    1. Run install.sh if on Linux, or install.bat if on Windows.

    2. Install in the <pentaho-server> folder.

  8. Create a config.yml file, adding the correct users and passwords for your environment, and the URL for Data Catalog.

    There is an example in the readme.txt file:

    Example of a configuration file
    =================================
    ```yaml
    version: 0.0.1
    consumers:
      console:
      file:
        - path: /path/to/file
      http:
        - name: Marquez
          url: http://localhost:5001
        - name: PDC
          url: https://pdc.example.com
          endpoint: /lineage/api/events
          authenticationParameters:
            endpoint: /keycloak/realms/pdc/protocol/openid-connect/token
            username: user
            password: pass
            client_id: pdc-client-in-keycloak
            scope: openid
    ```
  9. Edit the ~/.kettle/kettle.properties file and add the following properties:

    KETTLE_OPEN_LINEAGE_CONFIG_FILE=</full/path/to/your/openlineage/config.yml>  
    KETTLE_OPEN_LINEAGE_ACTIVE=true
    
  10. Start PDI and the Pentaho Server.

The PDI OpenLineage plugin is set up to send PDI lineage data to Data Catalog.
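Steps 8 and 9 can be scripted. The sketch below appends the two kettle.properties entries in a scratch directory; the config path is an example value, not a required location, and the real file is ~/.kettle/kettle.properties:

```shell
# Scratch stand-in for ~/.kettle (illustrative path).
KETTLE_DIR=/tmp/demo-kettle/.kettle
mkdir -p "$KETTLE_DIR"

# Append the OpenLineage properties; the config file path is an example.
cat >> "$KETTLE_DIR/kettle.properties" <<'EOF'
KETTLE_OPEN_LINEAGE_CONFIG_FILE=/opt/pentaho/openlineage/config.yml
KETTLE_OPEN_LINEAGE_ACTIVE=true
EOF

grep KETTLE_OPEN_LINEAGE "$KETTLE_DIR/kettle.properties"
```

Appending (rather than overwriting) preserves any properties already in the file.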

Integrate Active Directory with Pentaho Data Catalog

You can integrate Microsoft Active Directory (AD) with Pentaho Data Catalog (PDC) to enable users of AD to have single sign-on access to PDC. Part of this integration includes configuring the Keycloak identity and access management tool to use AD as an identity provider.

The configuration includes the following topics:

Important: After importing AD users to PDC, you need to perform the following operations from Active Directory, because they can no longer be done from the Data Catalog User Management page:

  • Edit a user

  • Add a new user

  • Delete a user

Verify the LDAP server configuration

To integrate Active Directory with Pentaho Data Catalog, you need to integrate Lightweight Directory Access Protocol (LDAP) with Keycloak. You first need to check that your LDAP server is configured correctly.

For detailed information on how to configure LDAP in your environment, consult your LDAP server documentation.

You should have the following components in an example configuration:

  • Base DN: Base Distinguished Name, such as: dc=example,dc=com, where dc is the domain component. The Base DN is the root entry where you want to start your LDAP searches.

  • User DN: User Distinguished Name, such as: ou=users,dc=example,dc=com, where ou is the organizational unit and dc is the domain component.

  • Groups DN: Groups Distinguished Name, such as: ou=groups,dc=example,dc=com, where ou is the organizational unit and dc is the domain component.
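The example DNs above compose mechanically from domain components and organizational units. A tiny illustrative Python helper (the function names are ours, not part of PDC or any LDAP tooling):

```python
def base_dn(domain: str) -> str:
    """Turn a DNS domain into a Base DN of domain components."""
    return ",".join(f"dc={part}" for part in domain.split("."))

def ou_dn(ou: str, domain: str) -> str:
    """Prefix an organizational unit onto the Base DN."""
    return f"ou={ou},{base_dn(domain)}"

print(base_dn("example.com"))        # dc=example,dc=com
print(ou_dn("users", "example.com")) # ou=users,dc=example,dc=com
```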

Configure the LDAP provider

To integrate Active Directory (AD) with Pentaho Data Catalog (PDC), you need to configure the LDAP provider for PDC in the Keycloak interface.

Use the following steps to configure the LDAP provider:

  1. Navigate to your Keycloak admin console (such as https://<FQDN>/keycloak/) and log in with admin credentials.

  2. Select the PDC realm.

    If you haven't already configured an LDAP provider, click Add provider and select ldap. If you have an existing LDAP provider, click on it to edit.

  3. Enter the following information for the LDAP provider:

    Field
    Value

    Vendor

    Active Directory

    Connection URL

    ldap://<LDAP_SERVER>:<PORT>, such as: ldap://localhost:389

  4. Click Test connection.

    You should get a success message.

  5. Enter the following information on the remainder of the page:

    Field
    Value

    Bind type

    Select simple

    Bind DN

    DN for your LDAP admin user, such as: cn=administrator,dc=example,dc=com

    Bind credentials

    Password for the LDAP admin user

  6. Click Test authentication.

    You should get a success message.

The LDAP provider is configured for use with AD.

Connect to AD using the LDAP server's SSL certificate (Optional)

When you use an LDAP server with Pentaho Data Catalog (PDC), you can use the LDAP server's SSL certificate to securely connect to Active Directory (AD). This is an optional step in integrating AD with PDC.

For more information on integrating AD with PDC, see Integrate Active Directory with Pentaho Data Catalog.

Note: Refer to Keycloak documentation if necessary.

Perform the following steps to use the LDAP server's SSL certificate to connect to AD.

  1. To retrieve the certificate from your LDAP server, enter the following command:

    openssl s_client -connect ldap.example.com:636 -showcerts

  2. Copy the entire certificate chain (from -----BEGIN CERTIFICATE----- to -----END CERTIFICATE-----) and save it to a file, such as ldap-cert.pem.

  3. Update the <PDC_INSTALL_LOCATION>/conf/extra-certs/bundle.pem file with the LDAP server's SSL certificate, where <PDC_INSTALL_LOCATION> is the directory where PDC is installed.

  4. Restart PDC services by entering the following command:

    sh pdc.sh restart

  5. Log in to the Keycloak admin console (https://<FQDN>/keycloak/).

  6. Navigate to the PDC realm.

  7. Click User Federation.

  8. Click the LDAP provider to edit it.

  9. Enter the following LDAP settings:

    Field
    Value

    UI display name

    Name to display, such as LDAPS

    Vendor

    Select Active Directory

    Connection URL

    ldaps://<LDAP_SERVER>:<PORT>, such as: ldaps://ldap.example.com:636

  10. Click Test connection. You should see a success message.

  11. Enter the remaining LDAP connection and authentication settings:

    Field
    Value

    Bind type

    Select simple

    Bind DN

    DN to bind to the LDAP server, such as: cn=admin,dc=example,dc=com

    Bind credentials

    Password for the Bind DN

  12. Click Test authentication. You should see a success message.

  13. Enter values for the required LDAP searching and updating settings:

    Field
    Value

    Edit mode

    It is a best practice to set this to Readonly

    Users DN

    Specify the DN where the user entries are located, such as: ou=users,dc=example,dc=com

    Username LDAP attribute

    cn

    RDN LDAP attribute

    cn

    UUID LDAP attribute

    objectGUID

    User object classes

    person, organizationalPerson, user

  14. Click Save to save the configuration.

AD is set up to use the SSL certificate of the LDAP server for a secure connection.

Optionally, you can configure the following settings:

  • Set how often Keycloak should sync with LDAP.

  • Set periodic full sync and periodic changed users sync.
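Step 2 of this procedure, copying everything between the BEGIN and END markers, can be scripted. The following sketch assumes the openssl output has already been captured into a string (the function name is ours):

```python
import re

def extract_cert_chain(openssl_output: str) -> str:
    """Pull every PEM certificate block out of `openssl s_client -showcerts` output."""
    blocks = re.findall(
        r"-----BEGIN CERTIFICATE-----.*?-----END CERTIFICATE-----",
        openssl_output,
        flags=re.DOTALL,
    )
    return "\n".join(blocks) + "\n"
```

The returned text can be saved as ldap-cert.pem or appended to the conf/extra-certs/bundle.pem file.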

Configure LDAP mappers in Keycloak

To integrate Active Directory (AD) with Pentaho Data Catalog (PDC), you need to configure LDAP mappers so that PDC has the necessary information (such as usernames, email addresses, or group memberships) to connect to an LDAP directory.

The Keycloak LDAP mapper translates attributes stored in an LDAP directory into the corresponding attributes needed by PDC.

Use the following steps in Keycloak to configure the LDAP mappers for Data Catalog. See Keycloak documentation for more information.

  1. In your Keycloak admin console (such as https://<FQDN>/keycloak/), log in with admin credentials.

  2. Select the PDC realm and go to the User Federation settings.

  3. Click the LDAP provider and go to the Mappers tab.

  4. Map the LDAP attribute for the username.

  5. Map other user attributes as needed (such as email, first name, last name).

  6. To add additional mappers to assign default roles for the users being imported from AD, enter the following settings under User federation > Settings > Mapper details.

    For the Business User role:

    Field
    Value

    Name

    Business_User_Role_Mapper_To_LDAP_USERS

    Mapper type

    hardcoded-ldap-role-mapper

    Role

    Business_User (select from menu and click Assign)

  7. Click Save.

  8. Repeat step 6 for the Data User role, using the following values:

    Field
    Value

    Name

    Data_User_Role_Mapper_To_LDAP_USERS

    Mapper type

    hardcoded-ldap-role-mapper

    Role

    Data_User (select from menu and click Assign)

  9. Click Save.

  10. Save the configuration.

  11. From the Action menu, click Sync all users to import users from LDAP.

    A success message displays.

    When users are synced from AD, the default PDC realm assigns the Business User and Data User roles to all the users.

    Note: PDC applies limits for licensing when users receive one or more of the following roles:

    • Business Steward

    • Data Steward

    • Data Developer

    • Admin

  12. Go to Users and verify that the users were imported correctly into Keycloak.

The LDAP mappers are configured.

Configure PDC permissions for an AD user

The last step in integrating Active Directory (AD) with Pentaho Data Catalog (PDC) is to set up permissions in PDC for the AD users.

Use the following steps to create and verify an AD user.

  1. Log in to Data Catalog as the admin user.

  2. Click Management, and on the Users & Communities card, click Users.

  3. Check that the imported users display correctly and make any needed adjustments.

  4. Select an AD user and assign a community or role to the user.

  5. Click Save, and log out.

  6. Log in as the AD user to verify the login is working properly.

Active Directory is now integrated with PDC.

Integrate Okta with Pentaho Data Catalog

You can integrate Okta authentication with Pentaho Data Catalog for the added security provided by multi-factor authentication. To integrate Okta with Pentaho Data Catalog, you need to configure Okta in parallel with the Keycloak identity and access management tool.

The following topics describe the steps in the integration process.

Add an OIDC provider in Keycloak

To integrate Okta with Pentaho Data Catalog, you need to set up an identity provider in Keycloak. Keycloak uses the OpenID Connect (OIDC) protocol to connect to identity providers.

If necessary, see the Keycloak documentation to complete this task.

Perform the following steps in the Keycloak interface:

  1. Log in to Keycloak and select the PDC realm.

    If a PDC realm does not already exist, consult your PDC administrator or see Creating a realm in the Keycloak documentation to create one.

  2. Click Identity Providers and select OpenID Connect v1.0.

    If necessary, see OpenID Connect v1.0 identity providers in the Keycloak documentation.

  3. Use the following steps to add an OIDC ID provider:

    1. Enter an alias in the Alias field.

      This populates the Redirect URI field, in a format like the following:

      http://localhost:8180/realms/master/broker/<alias>/endpoint
    2. Copy the Redirect URI to be used in the next task, Add an OpenID Connect application in Okta.

You have added an OpenID Connect provider in Keycloak.
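The Redirect URI shown in step 3 follows a fixed pattern, so it can be composed from the Keycloak base URL, the realm, and the alias. A minimal sketch (the helper name is ours, not part of Keycloak):

```python
def broker_redirect_uri(base_url: str, realm: str, alias: str) -> str:
    """Compose the Keycloak identity-broker redirect URI from its parts."""
    return f"{base_url.rstrip('/')}/realms/{realm}/broker/{alias}/endpoint"

print(broker_redirect_uri("http://localhost:8180", "master", "okta"))
# http://localhost:8180/realms/master/broker/okta/endpoint
```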

Add an OIDC application in Okta

The next step in integrating Okta with Pentaho Data Catalog is to add an OpenID Connect (OIDC) application in Okta.

In this task, you need the Keycloak Identity Provider Redirect URI you copied in the Add an OpenID Connect provider in Keycloak task. If necessary, see Launch the wizard in the Okta documentation.

Perform the following steps in the Okta Admin console:

  1. From the left menu, click Applications and then Applications.

  2. Click Create App Integration.

  3. Select or enter the following values:

    Field
    Value

    Sign-in method

    OIDC – OpenID Connect

    Application type

    Web Application

    App integration name

    CatalogPlus_10.2.1

    Grant type

    Authorization Code

    Sign-in redirect URIs

    Keycloak Identity Provider Redirect URI copied in the Add an OIDC provider in Keycloak task

  4. For Sign-out redirect URIs, configure your logout URI in this format:

    https://<application_url>/keycloak/realms/<realm_name>/broker/<alias_name>/endpoint/logout_response

    For example:

    https://<ip address>/keycloak/realms/pdc/broker/okta/endpoint/logout_response
  5. Continue entering values in Okta screens:

    Field
    Value

    Controlled access

    Select the default value, Allow everyone in your organization to access

    Enable immediate access

    Clear the checkbox

  6. In the General tab, make a note of the Client Id and Client Secret to use in the Configure an identity provider in Keycloak task.

  7. Click Save.

  8. On the left menu, click Applications.

  9. From the down arrow next to your application, select Assign to Groups.

  10. Assign a group to your application.

You have set up an OpenID Connect application in Okta.

Set up security in Okta

When integrating Okta with Pentaho Data Catalog, you need to set up security in Okta for the connection to PDC.

Perform the following steps in the Okta admin console:

  1. On the left menu, click Security, then API, then Default.

  2. On the Access Policies tab, click Add New Access Policy.

  3. Add details for the policy and click Create Policy.

  4. Click Add Rule.

  5. Add details for the rule and click Create Rule.

You have set up security for the Okta connection to PDC.

Configure an identity provider in Keycloak

To integrate Okta with Pentaho Data Catalog, you need to configure an identity provider in Keycloak. If necessary, see the Keycloak documentation.

In this task, you need the Client Id and Client Secret you noted during the Add an OpenID Connect application in Okta task.

Perform the following steps in the Keycloak admin console:

  1. From the left menu, click Identity providers.

  2. Click OpenID Connect v1.0.

  3. Make sure the Use discovery endpoint switch is on.

  4. In the Discovery endpoint field, enter the discovery endpoint URL in the following format:

    https://hostname/auth/realms/master/.well-known/openid-configuration

    The Authorization URL, Token URL, Logout URL, and User Info URL and other fields populate automatically.

  5. Enter the Client Id and Client Secret noted during the Add an OpenID Connect application in Okta task.

  6. On the Settings tab, select the following settings:

    • First login flow override: First login flow override

    • Sync mode: Force

  7. Expand the Advanced settings and set the Scopes setting to openid email profile (separated by a single space).

  8. Click Save.

You have configured the identity provider in Keycloak.

Sign in to Pentaho Data Catalog using Okta

After Pentaho Data Catalog is integrated with Okta, you have the option to log in to PDC with Okta.

To log in to PDC using Okta, perform the following steps:

  1. On the PDC login screen, click the button corresponding to the Okta alias.

    Note: The alias matches whatever is set for Okta OpenID Connect in Keycloak.

    In the following example, the button is labeled CATALOG+OKTA:

    Updated PDC login screen after Okta integration
  2. On the Okta login screen that appears, enter the credentials for the Okta user assigned to PDC.

    Okta prompts you to enter a code.

  3. To finish logging in, enter the code that Okta provides.

You have completed the integration of Okta with PDC.

Integrating Jira with Pentaho Data Catalog

You can integrate Jira as an external ticketing system with Data Catalog to manage data access requests. This guide describes how to configure Jira integration using a config.yaml file or environment variables, and how to create a custom field in Jira to use for the data access request statuses.

Note: You can set any administrative user as the default administrator to manage data access requests. However, there can be only one default administrator set, because the ACCESS_REQUEST_SERVICE_DEFAULT_ASSIGNEE environment variable only supports a single user. If necessary, the default administrator can edit a data access request to assign it to another administrative user.

To integrate Jira with Data Catalog, perform the tasks described in the following topics.

Integrate Jira with Data Catalog using a config.yaml file

To integrate the Jira ticketing system with Pentaho Data Catalog, you can use a config.yaml file with settings for connection details, credentials, project information, and status mappings. If your system does not have a config.yaml file, you can also integrate Jira with Data Catalog using environment variables. For more information, see Integrate Jira with Data Catalog using environment variables.

Perform the following steps to integrate Jira with Data Catalog using a config.yaml file:

  1. Go to the /pentaho/pdc-docker-deployment/conf folder and open the config.yaml file. If the file is not available, create it.

  2. Add the following configuration to the config.yaml file.

    Note: Replace the placeholder values in angle brackets (< >) with your actual Jira credentials and project details.

    tools:
      jira:
        url: <your_jira_url>
        username: <your_jira_username>
        password: <your_jira_password>
        project_name: <your_jira_project_name>
        access_status_key: <your_jira_access_status_key>
        status_mapping:
          approved_status: <your_jira_approved_status_value>
          rejected_status: <your_jira_rejected_status_value>
    
    database:
      postgres:
        host: 'um-postgresql'
        port: 5432
        user: 'postgres'
        password: 'admin123#'
        dbname: 'pdc_access_request_db'
        sslmode: 'disable'
    
    # Example tools section with sample values:
    tools:
      jira:
        url: 'https://teamwork7com.atlassian.net'
        username: '[email protected]'
        password: 'ATATT3xFfGF0u8CftyjZn0PO-p-M_J8VJUtYcn3ZiZzfM0pF7iqpnUT3TPd-7q7QO8PevM7IDIyzSwwwUoksY7tfKnJwXV1EokuGhp1YmcIkP-78H0H1-io2bkkSVL-bpRmwL4Tha0yWWHZRnEvhSRE1SX984WX4vQMp1hi9w6ua5a_jAq9gWwg=0C4170DE'
        project_name: 'KAN'
        access_status_key: 'Access Status'
        status_mapping:
          approved_status: 'Approved'
          rejected_status: 'Rejected'
    
    auth:
      url: 'https://10.177.177.7'
      auth_host: 'um-css-admin-api:5000'
      username: 'admin'
      password: 'admin'
      client_id: 'admin-cli'
    
  3. Save the changes and close the config.yaml file.

  4. Open the vendor/docker-compose-um.yml file. Under the access request service container configuration, add a volumes section at the same level as the environment section, then save your changes and close the file.

    volumes:
      - <path-to-the-config.yaml>:/app/config.yaml:ro
    
  5. Restart the access request service to apply changes:

    ./pdc.sh restart access-request-service

You have successfully configured Jira with Data Catalog using the config.yaml file.

You now need to add a custom field to Jira, to include the data access request statuses. For more information, see Add a custom field to Jira.

Integrate Jira with Data Catalog using environment variables

To integrate the Jira ticketing system with Data Catalog, you can use environment variables to set connection details, credentials, project information, and status mappings. You can also integrate Jira with Data Catalog using a config.yaml file.

Perform the following steps to integrate Jira with Data Catalog using environment variables:

  1. Edit the /opt/pentaho/pdc-docker-deployment/vendor/.env.default file and add the following lines:

    Note: Instead of the default location, your environment variables may be set in an opt/pentaho/pdc-docker-deployment/conf/.env file.

    ACCESS_REQUEST_SERVICE_PROVIDER_TOOL=Jira
    ACCESS_REQUEST_SERVICE_JIRA_URL=your_jira_url
    ACCESS_REQUEST_SERVICE_JIRA_USER_NAME=your_jira_username
    ACCESS_REQUEST_SERVICE_JIRA_PASSWORD=your_jira_password
    ACCESS_REQUEST_SERVICE_JIRA_PROJECT_NAME=your_jira_project_name
    ACCESS_REQUEST_SERVICE_JIRA_ACCESS_STATUS_KEY=your_jira_access_status_key
    ACCESS_REQUEST_SERVICE_JIRA_ACCESS_STATUS_APPROVED_VALUE=your_jira_approved_status_value
    ACCESS_REQUEST_SERVICE_JIRA_ACCESS_STATUS_REJECTED_VALUE=your_jira_rejected_status_value
    ACCESS_REQUEST_SERVICE_DEFAULT_ASSIGNEE=PDC_admin_user_email
    
  2. Save the changes and close the file.

  3. Restart the access request service to apply changes:

    ./pdc.sh restart access-request-service

You have successfully configured Jira with Data Catalog using environment variables.

You now need to add a custom field to Jira to include the data access request statuses. For more information, see Add a custom field to Jira.
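Before restarting the access request service, it can help to confirm that every variable from step 1 is set and non-empty. A minimal sketch (the helper is ours, not part of PDC):

```python
import os

REQUIRED = [
    "ACCESS_REQUEST_SERVICE_PROVIDER_TOOL",
    "ACCESS_REQUEST_SERVICE_JIRA_URL",
    "ACCESS_REQUEST_SERVICE_JIRA_USER_NAME",
    "ACCESS_REQUEST_SERVICE_JIRA_PASSWORD",
    "ACCESS_REQUEST_SERVICE_JIRA_PROJECT_NAME",
    "ACCESS_REQUEST_SERVICE_JIRA_ACCESS_STATUS_KEY",
    "ACCESS_REQUEST_SERVICE_JIRA_ACCESS_STATUS_APPROVED_VALUE",
    "ACCESS_REQUEST_SERVICE_JIRA_ACCESS_STATUS_REJECTED_VALUE",
    "ACCESS_REQUEST_SERVICE_DEFAULT_ASSIGNEE",
]

def missing_vars(env=os.environ):
    """Return the required Jira variables that are unset or blank."""
    return [name for name in REQUIRED if not env.get(name, "").strip()]

if missing := missing_vars():
    print("Set these before restarting:", ", ".join(missing))
```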

Add a custom field to Jira

If you have configured Data Catalog to connect to Jira for managing data access requests, you need to add a custom field to Jira to map the Data Catalog data access request statuses to complete the Jira integration.

Perform the following steps to add a custom field to Jira:

  1. Log in to Jira with administrative rights.

  2. Go to the Jira Admin settings.

    If you cannot find the Jira Admin settings, use these steps:

    1. Open any issue.

    2. In the Details section, click the settings icon, then click Manage Fields. In the bottom right corner, you see the Go to Custom Fields option.

    3. Click Go to Custom Fields, and you are taken to the Jira Admin settings.

  3. Click Custom Fields and then click Create custom field.

  4. Select the Select List type and enter Access Status as the name.

  5. Add the options: Approved, Rejected, and Pending, and click Create.

  6. Open any issue. In the Details section, click the settings icon, then click Manage Fields.

  7. Locate the newly created Access Status field in the list of fields on the right side.

  8. Drag and drop the Access Status field into the Context Fields section.

Jira is now updated to use data access request statuses with Data Catalog.

Integrating ServiceNow with Data Catalog

You can integrate ServiceNow as an external ticketing system with Data Catalog to manage data access requests. This guide describes how to configure ServiceNow integration using a config.yaml file or environment variables, and how to create a custom field in ServiceNow to track data access request statuses.

Note: You can set any administrative user as the default administrator to manage data access requests. However, there can be only one default administrator set, because the ACCESS_REQUEST_SERVICE_DEFAULT_ASSIGNEE environment variable only supports a single user. If necessary, the default administrator can edit a data access request to assign it to another administrative user.

To integrate ServiceNow with Data Catalog, perform the tasks described in the following topics.

Integrate ServiceNow with Data Catalog using a config.yaml file

To integrate the ServiceNow ticketing system with Data Catalog, you can use a config.yaml file with settings for connection details, credentials, project information, and status mappings. If your system does not have a config.yaml file, you can also integrate ServiceNow with Data Catalog using environment variables.

To integrate ServiceNow with Data Catalog using a config.yaml file, use the following steps:

  1. Go to the /pentaho/pdc-docker-deployment/conf folder and open the config.yaml file. If the file is not available, create it.

  2. Add the following configuration to the config.yaml file.

    Note: Replace the placeholder values in angle brackets (< >) with your actual ServiceNow credentials and project details.

    tools:
      servicenow:
        url: <your_servicenow_url>
        username: <your_servicenow_username>
        password: <your_servicenow_password>
        client_id: <your_servicenow_client_id>
        client_secret: <your_servicenow_client_secret>
        access_status_key: <your_servicenow_access_status_key>
        status_mapping:
          approved_status: <your_servicenow_approved_status_value>
          rejected_status: <your_servicenow_rejected_status_value>
    
    database:
      postgres:
        host: 'um-postgresql'
        port: 5432
        user: 'postgres'
        password: 'admin123#'
        dbname: 'pdc_access_request_db'
        sslmode: 'disable'
    
    auth:
      url: 'https://10.177.177.7'
      auth_host: 'um-css-admin-api:5000'
      username: 'admin'
      password: 'admin'
      client_id: 'admin-cli'
    
    # Example tools section with sample values:
    tools:
      servicenow:
        url: 'https://mycompany.service-now.com'
        username: 'mySNusername'
        password: 'mjfo39847tnd'
        client_id: 'ljdfsae9087534rmlvspe495rfnv'
        client_secret: '(YGVFJMKLOUJIHY'
        access_status_key: 'u_access_request_status'
        status_mapping:
          approved_status: 'Granted'
          rejected_status: 'Denied'
    
  3. Save the changes and close the config.yaml file.

  4. Open the vendor/docker-compose-um.yml file. Under the access request service container configuration, add a volumes section at the same level as the environment section, then save your changes and close the file.

    volumes:
      - <path-to-the-config.yaml>:/app/config.yaml:ro
    
  5. Restart the access request service to apply changes:

    ./pdc.sh restart access-request-service

You have successfully configured ServiceNow with Data Catalog using the config.yaml file.

You now need to add a custom field to ServiceNow, to include the data access request statuses. For more information, see Add a custom field to ServiceNow.

Integrate ServiceNow with Data Catalog using environment variables

To integrate the ServiceNow ticketing system with Data Catalog, you can use environment variables to set connection details, credentials, project information, and status mappings. You can also integrate ServiceNow with Data Catalog using a config.yaml file.

Perform the following steps to integrate ServiceNow with Data Catalog using environment variables:

  1. Edit the /opt/pentaho/pdc-docker-deployment/vendor/.env.default file and add the following lines:

    Note: Instead of the default location, your environment variables may be set in an opt/pentaho/pdc-docker-deployment/conf/.env file.

    ACCESS_REQUEST_SERVICE_PROVIDER_TOOL=ServiceNow
    ACCESS_REQUEST_SERVICE_SERVICENOW_URL=your_servicenow_url
    ACCESS_REQUEST_SERVICE_SERVICENOW_USER_NAME=your_servicenow_username
    ACCESS_REQUEST_SERVICE_SERVICENOW_PASSWORD=your_servicenow_password
    ACCESS_REQUEST_SERVICE_SERVICENOW_CLIENT_ID=your_servicenow_client_id
    ACCESS_REQUEST_SERVICE_SERVICENOW_CLIENT_SECRET=your_servicenow_client_secret
    ACCESS_REQUEST_SERVICE_SERVICENOW_ACCESS_STATUS_KEY=your_servicenow_access_status_key
    ACCESS_REQUEST_SERVICE_SERVICENOW_ACCESS_STATUS_APPROVED_VALUE=your_servicenow_approved_status_value
    ACCESS_REQUEST_SERVICE_SERVICENOW_ACCESS_STATUS_REJECTED_VALUE=your_servicenow_rejected_status_value
    ACCESS_REQUEST_SERVICE_DEFAULT_ASSIGNEE=PDC_admin_user_email
    
  2. Save the changes and close the file.

  3. Restart the access request service to apply changes:

    ./pdc.sh restart access-request-service

You have successfully configured ServiceNow with Data Catalog using environment variables.

You now need to add a custom field to ServiceNow to include the data access request statuses. For more information, see Add a custom field to ServiceNow.

Add a custom field to ServiceNow

If you have configured Data Catalog to connect to ServiceNow for managing data access requests, you need to add a custom field to ServiceNow to map the Data Catalog data access request statuses to complete the ServiceNow integration.

Perform the following steps to add a custom field to ServiceNow:

  1. Log in to the ServiceNow instance with administrative rights.

  2. Go to System Definition > Tables and locate the Incident table.

  3. Open the Incident table, and at the bottom, in the Columns section, click New to add a new column (custom field).

  4. Configure the following properties:

Property
Description

Column Label

Enter a descriptive name like Access Request Status.

Column Name

Automatically generated as u_access_request_status (prefixed with u_ to indicate it’s a custom field).

Type

Select the appropriate field type, which should be Choice for values like Pending, Granted, and Denied.

Choices

Once the field type is set to Choice, there is an option to add Choice List Values. Add the following values:

  • Pending

  • Granted

  • Denied

You can also set a default value if desired.

  5. Verify the changes you have made and click Submit.

ServiceNow is now updated to use data access request statuses with Data Catalog.
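The status_mapping in the earlier config.yaml example ties PDC decisions to these choice values, with Pending as the natural fallback for anything unmapped. A minimal sketch of that lookup, using the sample Granted/Denied values (the function name is ours):

```python
# Sample mapping, mirroring the status_mapping keys from the config.yaml example.
STATUS_MAPPING = {"approved_status": "Granted", "rejected_status": "Denied"}

def servicenow_status(pdc_decision: str) -> str:
    """Map a PDC access-request decision onto the custom field's choice values."""
    key = f"{pdc_decision.lower()}_status"
    return STATUS_MAPPING.get(key, "Pending")

print(servicenow_status("Approved"))  # Granted
print(servicenow_status("Open"))      # Pending
```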
