Google Cloud Storage
This configuration task is intended for Pentaho administrators and Hadoop cluster administrators who want to set up access to Google Cloud Storage (GCS) for PDI transformations running on Spark.
This task assumes that you have obtained the settings for your site's Google Cloud Storage (GCS) configuration from your Hadoop cluster administrator.
Perform the following steps to set up Hadoop cluster access to GCS:
Log on to the cluster and stop the AEL daemon by running the shutdown script,
daemon.sh stop
, from the command line interface.Download the GCS Hadoop Connector JAR file and save it in a location where you can access it. You can use the following UNIX command to download the GCS Hadoop Connector Jar:
wget https://storage.googleapis.com/hadoop-lib/gcs/gcs-connector-hadoop2-latest.jar
Use the following command to add the GCS Hadoop Connector JAR file to the
SPARK_DIST_CLASSPATH
where /full/path/to is the location where you stored the JAR file:export SPARK_DIST_CLASSPATH=$(hadoop classpath):*/full/path/to*/gcs-connector-hadoop2-latest.jar
Configure your clusters with the GCS connector with Hadoop/Spark using the instructions located in the Google Cloud Platform interoperability GitHub repository: https://github.com/GoogleCloudPlatform/bigdata-interop/blob/master/gcs/INSTALL.md
Configure AEL to use the GCS Hadoop Connector. Possible ways of configuring AEL include on of the following.
Adding the GCS properties to the
/etc/hadoop/conf/core-site.xml
file.Adding JSON keyfile parameters for GCS to the AEL daemon
application.properties
file. Follow the instructions in Step 6.Note: The JSON keyfile for GCS must be present on all the nodes in the cluster.
(Optional) If you choose to add a JSON keyfile to the
application.properties
file, follow these steps.Navigate to the
data-integration/adaptive-execution/config/
directory and open theapplication.properties
file with any text editor.Add the following lines of code:
spark.hadoop.google.cloud.auth.service.account.enable=true
spark.hadoop.google.cloud.auth.service.account.json.keyfile=/path/to/keyfile.json
Save the file and close it.
Restart the AEL daemon by running the startup script,
daemon.sh
, from the command line interface.
Last updated
Was this helpful?