Create an S3 bucket

Create an S3 bucket only if you want to take one or more of the following actions. Otherwise proceed to create the EKS cluster.

  • Add third-party JAR files, such as JDBC drivers, or custom JAR files for Pentaho to use.

  • Customize the default Pentaho configuration.

  • Replace the server files.

  • Upload or update the metastore.

  • Add files to the Platform and PDI Server's /home/pentaho/.kettle directory. This is mapped to the "KETTLE_HOME_DIR" environment variable, which is used by the content-config.properties file.

  1. Create an S3 bucket.

    To create an S3 bucket, see Creating a bucket.

    To upload a file to S3, see Uploading objects.

  2. Record the newly created S3 bucket name in the Worksheet for AWS hyperscaler.

  3. Upload files into the S3 bucket.

    After the S3 bucket is created, use the AWS Management Console to manually create any needed directories, as shown in the following list, and upload the relevant files to the appropriate directory location. A sample AWS CLI sequence that performs the same steps follows the list.

    The following list describes the relevant Pentaho directories and the actions related to each directory.

Directory: /root

All files placed in the root of the S3 bucket are copied to the Platform and PDI Server's /home/pentaho/.kettle directory.

If you must copy a file to the /home/pentaho/.kettle directory, drop the file in the root directory of the S3 bucket.

Directory: custom-lib

If Pentaho needs custom JAR libraries, add the custom-lib directory to the S3 bucket and place the libraries there.

Any files within this directory will be copied to Pentaho’s lib directory.

Directory: jdbc-drivers

If the Pentaho installation needs JDBC drivers, do the following:

  1. Add the jdbc-drivers directory to the S3 bucket.

  2. Place the drivers in this directory. Any files within this directory will be copied to Pentaho’s lib directory.

Directory: plugins

If the Pentaho installation needs additional plugins installed, do the following:

  1. Add the plugins directory to the S3 bucket.

  2. Copy the plugins to the plugins directory. Any files within this directory are copied to Pentaho’s plugins directory. For this reason, the plugins should be organized in their own directories as expected by Pentaho.

Directory: drivers

If the Pentaho installation needs big data drivers installed, do the following:

  1. Add the drivers directory to the S3 bucket.

  2. Place the big data drivers in this directory. Any files placed within this directory will be copied to Pentaho’s drivers directory.

Directory: metastore

Pentaho can execute jobs and transformations. Some of these require additional information that is usually stored in the Pentaho metastore.

If you must provide the Pentaho metastore to Pentaho, copy the local metastore directory to the root of the S3 bucket. From there, the metastore directory is copied to the proper location within the Docker image.

Directory: server-structured-override

The server-structured-override directory is the last resort if you want to make changes to any other files in the image at runtime.

For example, you can use it for configuring authentication and authorization.

Any files and directories within this directory are copied to the pentaho-server directory with the same relative paths they have in the server-structured-override directory.

If the same files exist in the pentaho-server directory, they will be overwritten.
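If you prefer the AWS CLI to the console, the following sketch shows one way to create the bucket and upload files into the directories described above. The bucket name, region, and file names are placeholders for this example, not values from this guide.

```bash
# Create the S3 bucket (bucket name and region are placeholders)
aws s3 mb s3://my-pentaho-bucket --region us-east-1

# Files placed in the bucket root are copied to /home/pentaho/.kettle on the server
aws s3 cp content-config.properties s3://my-pentaho-bucket/
aws s3 cp context.xml s3://my-pentaho-bucket/

# Custom JAR libraries and JDBC drivers (both are copied to Pentaho's lib directory)
aws s3 cp my-extra.jar s3://my-pentaho-bucket/custom-lib/
aws s3 cp postgresql-driver.jar s3://my-pentaho-bucket/jdbc-drivers/

# Plugins and big data drivers, each organized in its own directory as Pentaho expects
aws s3 cp ./my-plugin s3://my-pentaho-bucket/plugins/my-plugin --recursive

# Local metastore directory copied to the root of the bucket
aws s3 cp ./metastore s3://my-pentaho-bucket/metastore --recursive
```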

The following list describes the relevant Pentaho files and the actions related to each file.

File: context.xml

The Pentaho configuration YAML is included with the image in the templates project directory and is used to install this product. You must set the RDS host and RDS port parameters when you install Pentaho. Upon installation, the parameters in the configuration YAML are used to generate a custom context.xml file for the Pentaho installation so it can connect to the database-specific repository.

If these are the only changes required in context.xml, you do not need to provide a context.xml file in the S3 bucket. However, if you must configure additional parameters in context.xml, you must provide your customized context.xml file in the S3 bucket.

In the context.xml template, replace the <RDS_HOST_NAME> and <RDS_PORT> entries with the values you recorded on the Worksheet for AWS hyperscaler.
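For illustration only, a Tomcat Resource entry in a customized context.xml might look like the following after the placeholders are replaced. The resource name, database name, host, and credentials shown here are assumptions made for the example; the template in the templates project directory is the authoritative starting point.

```xml
<?xml version="1.0" encoding="UTF-8"?>
<Context path="/pentaho" docBase="webapps/pentaho/">
  <!-- <RDS_HOST_NAME> and <RDS_PORT> replaced with the values recorded on the Worksheet for AWS hyperscaler -->
  <Resource name="jdbc/Hibernate" auth="Container" type="javax.sql.DataSource"
            factory="org.apache.tomcat.jdbc.pool.DataSourceFactory"
            driverClassName="org.postgresql.Driver"
            url="jdbc:postgresql://my-rds-instance.abc123.us-east-1.rds.amazonaws.com:5432/hibernate"
            username="hibuser" password="changeme"
            maxActive="20" maxIdle="5" maxWait="10000" validationQuery="select 1"/>
</Context>
```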

File: content-config.properties

The content-config.properties file is used by the Pentaho Docker image to provide instructions on which S3 files to copy over and where to place them.

The instructions are populated as multiple lines in the following format:

${KETTLE_HOME_DIR}/<some-dir-or-file>=${SERVER_DIR}/<some-dir>

A template for this file can be found in the templates project directory.

The template has an entry where the file context.xml is copied to the required location within the Docker image:

${KETTLE_HOME_DIR}/context.xml=${SERVER_DIR}/tomcat/webapps/pentaho/META-INF/context.xml
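As an illustration, a content-config.properties containing the template's context.xml entry plus one additional, purely hypothetical entry might look like this (my-custom.properties and its destination are invented for the example):

```properties
# Entry from the template: copy context.xml into the Tomcat webapp inside the image
${KETTLE_HOME_DIR}/context.xml=${SERVER_DIR}/tomcat/webapps/pentaho/META-INF/context.xml

# Hypothetical entry: copy a file dropped in the root of the S3 bucket to another server location
${KETTLE_HOME_DIR}/my-custom.properties=${SERVER_DIR}/pentaho-solutions/system
```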

File: content-config.sh

A bash script that can be used to configure files, change file and directory ownership, move files around, install missing apps, and so on.

You can add the script to the S3 bucket.

The script is executed in the Docker image after the other files are processed.
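A minimal sketch of what such a script might contain is shown below. The file names and server paths are assumptions made for the example, not paths documented for the image.

```bash
#!/bin/bash
# content-config.sh - illustrative sketch only; file names and paths below are assumptions.
set -e

# Example: fix ownership of a custom JAR that was copied from the custom-lib directory
if [ -f /opt/pentaho/pentaho-server/tomcat/lib/my-extra.jar ]; then
  chown pentaho:pentaho /opt/pentaho/pentaho-server/tomcat/lib/my-extra.jar
fi

# Example: move a file that was dropped in the root of the S3 bucket (and therefore copied
# to /home/pentaho/.kettle) to another location in the server
if [ -f /home/pentaho/.kettle/my-custom.properties ]; then
  mv /home/pentaho/.kettle/my-custom.properties /opt/pentaho/pentaho-server/pentaho-solutions/system/
fi
```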

File: metastore.zip

Pentaho can execute jobs and transformations. Some of these require additional information that is usually stored in the Pentaho metastore.

If you must provide the Pentaho metastore to Pentaho, zip the content of the local .pentaho directory into a file named metastore.zip and add it to the root of the S3 bucket. The metastore.zip file is extracted to the proper location within the Docker image.

Note: VFS connections cannot be copied from PDI to the hyperscaler server in the same way as named connections. You must connect to Pentaho on the hyperscaler and create the new VFS connection there.
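To create and upload metastore.zip, assuming the local .pentaho directory is in your home directory and using a placeholder bucket name, the commands could look like this:

```bash
# Zip the contents of the local .pentaho directory (not the directory itself) into metastore.zip
cd ~/.pentaho
zip -r /tmp/metastore.zip .

# Upload the archive to the root of the S3 bucket (bucket name is a placeholder)
aws s3 cp /tmp/metastore.zip s3://my-pentaho-bucket/metastore.zip
```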

For instructions on how to dynamically update server configuration content from the S3 bucket, see [Dynamically update server configuration content from S3](Dynamically%20update%20server%20configuration%20content%20from%20S3.md).
