Create an S3 bucket
Create an S3 bucket only if you want to take one or more of the following actions. Otherwise, proceed to create the EKS cluster.

- Add third-party JAR files, such as JDBC drivers or custom JAR files, for Pentaho to use.
- Customize the default Pentaho configuration.
- Replace the server files.
- Upload or update the metastore.
- Add files to the Platform and PDI Server's `/home/pentaho/.kettle` directory. This directory is mapped to the `KETTLE_HOME_DIR` environment variable, which is used by the `content-config.properties` file.
Create an S3 bucket.
To create an S3 bucket, see Creating a bucket.
To upload a file to S3, see Uploading objects.
Record the newly created S3 bucket name in the Worksheet for AWS hyperscaler.
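If you prefer the command line to the AWS Management Console, you can also create the bucket with the AWS CLI. The following is a minimal sketch; the bucket name and region are placeholders, so substitute the values you record in the Worksheet for AWS hyperscaler.

```sh
# Hypothetical bucket name and region; replace with your own values.
aws s3 mb s3://my-pentaho-config-bucket --region us-east-1

# Confirm the bucket exists before recording its name in the worksheet.
aws s3 ls | grep my-pentaho-config-bucket
```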
Upload files into the S3 bucket.
After the S3 bucket is created, manually create any needed directories as shown in the following table and upload the relevant files to the appropriate directory locations by using the AWS Management Console.
The following table lists the relevant Pentaho directories and the actions related to each directory.
| Directory | Actions |
| --- | --- |
| /root | All the files in the S3 bucket are copied to the Platform and PDI Server's `/home/pentaho/.kettle` directory. If you must copy a file to the `/home/pentaho/.kettle` directory, drop the file in the root directory of the S3 bucket. |
| `custom-lib` | If Pentaho needs custom JAR libraries, add the `custom-lib` directory to the S3 bucket and place the libraries there. Any files within this directory are copied to Pentaho’s `lib` directory. |
| `jdbc-drivers` | If the Pentaho installation needs JDBC drivers, add the `jdbc-drivers` directory to the S3 bucket and place the drivers in this directory. Any files within this directory are copied to Pentaho’s `lib` directory. |
| `plugins` | If the Pentaho installation needs additional plugins installed, add the `plugins` directory to the S3 bucket and copy the plugins to it. Any files within this directory are copied to Pentaho’s `plugins` directory. For this reason, the plugins should be organized in their own directories as expected by Pentaho. |
| `drivers` | If the Pentaho installation needs big data drivers installed, add the `drivers` directory to the S3 bucket and place the big data drivers in it. Any files placed within this directory are copied to Pentaho’s `drivers` directory. |
| `metastore` | Pentaho can execute jobs and transformations, some of which require additional information that is usually stored in the Pentaho metastore. If you must provide the Pentaho metastore to Pentaho, copy the local `metastore` directory to the root of the S3 bucket. From there, the `metastore` directory is copied to the proper location within the Docker image. |
| `server-structured-override` | The `server-structured-override` directory is the last resort if you want to make changes to any other files in the image at runtime. For example, you can use it to configure authentication and authorization. Any files and directories within this directory are copied to the `pentaho-server` directory the same way they appear in the `server-structured-override` directory. If the same files already exist in the `pentaho-server` directory, they are overwritten. |
The following table lists the relevant Pentaho files and the actions related to each file.
| File | Actions |
| --- | --- |
| `context.xml` | The Pentaho configuration YAML is included with the image in the `templates` project directory and is used to install this product. You must set the RDS host and RDS port parameters when you install Pentaho. Upon installation, the parameters in the configuration YAML are used to generate a custom `context.xml` file for the Pentaho installation so it can connect to the database-specific repository. If these are the only changes required in the `context.xml`, you don’t need to provide a `context.xml` in the S3 bucket. If you must configure additional parameters in the `context.xml`, you must provide the custom `context.xml` file in the S3 bucket. In the `context.xml` template, replace the `<RDS_HOST_NAME>` and `<RDS_PORT>` entries with the values you recorded on the Worksheet for AWS hyperscaler. |
| `content-config.properties` | The `content-config.properties` file is used by the Pentaho Docker image to provide instructions on which S3 files to copy over and where to place them. The instructions are populated as multiple lines in the following format: `${KETTLE_HOME_DIR}/<some-dir-or-file>=${SERVER_DIR}/<some-dir>`. A template for this file can be found in the `templates` project directory. The template has an entry that copies the `context.xml` file to the required location within the Docker image: `${KETTLE_HOME_DIR}/context.xml=${SERVER_DIR}/tomcat/webapps/pentaho/META-INF/context.xml` |
| `content-config.sh` | A bash script that can be used to configure files, change file and directory ownership, move files around, install missing applications, and so on. You can add the script to the S3 bucket. The script is executed in the Docker image after the other files are processed. |
| `metastore.zip` | Pentaho can execute jobs and transformations, some of which require additional information that is usually stored in the Pentaho metastore. If you must provide the Pentaho metastore to Pentaho, zip the content of the local `.pentaho` directory with the name `metastore.zip` and add it to the root of the S3 bucket. The `metastore.zip` file is extracted to the proper location within the Docker image. |
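The table above describes `content-config.sh` only in general terms. The following is a minimal sketch of what such a script might contain, assuming the `${KETTLE_HOME_DIR}` and `${SERVER_DIR}` variables used by `content-config.properties` are also available to the script and that the server runs as a `pentaho` user; the file and directory names are hypothetical, so adjust them to your own installation.

```sh
#!/bin/bash
# Hypothetical content-config.sh; runs inside the Docker image after the
# other files from the S3 bucket have been processed.

# Move an extra configuration file dropped in the bucket root into place.
mv "${KETTLE_HOME_DIR}/my-extra-settings.xml" "${SERVER_DIR}/pentaho-solutions/system/"

# Make sure the copied files are owned by the user the server runs as.
chown -R pentaho:pentaho "${SERVER_DIR}/pentaho-solutions/system/"
```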
Note: VFS connections cannot be copied from PDI to the hyperscaler server the same way as named connections. You must connect to Pentaho on the hyperscaler and create the VFS connections there.
For instructions on how to dynamically update server configuration content from the S3 bucket, see [Dynamically update server configuration content from S3](Dynamically%20update%20server%20configuration%20content%20from%20S3.md).