Spark version 2.x.x
Complete the following steps to install and configure the Spark client:
On the client, download a Spark distribution whose version is the same as or higher than the version used on the cluster.
Set the HADOOP_CONF_DIR environment variable to a folder containing cluster configuration files as shown in the following sample for an already-configured driver:
*<username>*/.pentaho/metastore/pentaho/NamedCluster/Configs/*<user-defined connection name>*
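Before continuing, it can help to confirm that HADOOP_CONF_DIR really points at a folder holding the cluster's site files. The following is a minimal sketch, not part of the product: the helper name missing_config_files is hypothetical, and the list of expected file names is an assumption about what a YARN client typically needs.

```python
import os
from pathlib import Path

# Assumption: these are the site files a YARN client usually needs.
# Adjust the tuple to match the files your cluster actually provides.
EXPECTED_FILES = ("core-site.xml", "yarn-site.xml")


def missing_config_files(conf_dir, expected=EXPECTED_FILES):
    """Return the expected file names that are absent from conf_dir."""
    folder = Path(conf_dir)
    return [name for name in expected if not (folder / name).is_file()]


if __name__ == "__main__":
    conf_dir = os.environ.get("HADOOP_CONF_DIR")
    if not conf_dir:
        print("HADOOP_CONF_DIR is not set")
    else:
        missing = missing_config_files(conf_dir)
        print("missing files:", ", ".join(missing) if missing else "none")
```

Running the script in the same shell that will launch PDI shows whether the variable is visible to that environment.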
Navigate to <SPARK_HOME>/conf and create the spark-defaults.conf file using the instructions outlined in https://spark.apache.org/docs/latest/configuration.html.
Create a ZIP archive containing all the JAR files in the <SPARK_HOME>/jars directory.
Copy the ZIP file from the local file system to a world-readable location on the cluster.
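The archive step can be scripted. The sketch below is one possible approach, assuming the JAR files should sit at the root of the archive rather than under a jars/ prefix; the function name zip_spark_jars and the output file name are hypothetical.

```python
import zipfile
from pathlib import Path


def zip_spark_jars(spark_home, archive_path):
    """Write every *.jar under <spark_home>/jars into a ZIP file.

    Returns the number of JAR files added.
    """
    jars_dir = Path(spark_home) / "jars"
    jar_files = sorted(jars_dir.glob("*.jar"))
    with zipfile.ZipFile(archive_path, "w", zipfile.ZIP_DEFLATED) as archive:
        for jar in jar_files:
            # Assumption: store each JAR at the archive root (no jars/ prefix).
            archive.write(jar, arcname=jar.name)
    return len(jar_files)


# Example call (paths are placeholders, not values from this guide):
# count = zip_spark_jars("/opt/spark", "spark-jars.zip")
```

The sketch only builds the archive. Copying it to the cluster is a separate step, for which a command such as hdfs dfs -put is commonly used.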
Edit the spark-defaults.conf file to set the spark.yarn.archive property to the world-readable location of your ZIP file on the cluster, as shown in the following example:
spark.yarn.archive hdfs://*NameNode hostname*:8020/user/spark/lib/*your ZIP file*
Add the following line to the spark-defaults.conf file:
spark.hadoop.yarn.timeline-service.enabled false
If you are connecting to an HDP cluster, add the following lines to the spark-defaults.conf file:
spark.driver.extraJavaOptions -Dhdp.version=2.3.0.0-2557
spark.yarn.am.extraJavaOptions -Dhdp.version=2.3.0.0-2557
Note: The -Dhdp.version value should be the same as the HDP version used on the cluster.
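The spark-defaults.conf entries described so far can be generated rather than typed by hand. This is a minimal sketch under stated assumptions: the function name spark_defaults_lines is hypothetical, and the archive URI in the usage comment is a placeholder, not a real location.

```python
def spark_defaults_lines(archive_uri, hdp_version=None):
    """Build the spark-defaults.conf lines described in the steps above.

    archive_uri: world-readable location of the JAR archive on the cluster.
    hdp_version: pass a value only when connecting to an HDP cluster.
    """
    lines = [
        f"spark.yarn.archive {archive_uri}",
        "spark.hadoop.yarn.timeline-service.enabled false",
    ]
    if hdp_version:
        option = f"-Dhdp.version={hdp_version}"
        lines.append(f"spark.driver.extraJavaOptions {option}")
        lines.append(f"spark.yarn.am.extraJavaOptions {option}")
    return lines


# Example (placeholder host and file name):
# for line in spark_defaults_lines(
#         "hdfs://namenode.example.com:8020/user/spark/lib/spark-jars.zip",
#         hdp_version="2.3.0.0-2557"):
#     print(line)
```

Keeping the HDP version in a single argument avoids the two extraJavaOptions lines drifting apart.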
If you are connecting to an HDP cluster, also create a text file named java-opts in the <SPARK_HOME>/conf folder and add your HDP version to it, as shown in the following example:
-Dhdp.version=2.3.0.0-2557
Note: Run the hdp-select status hadoop-client command to determine your version of HDP.
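The java-opts content can be derived from the hdp-select output. The sketch below assumes the command prints a line of the form "hadoop-client - 2.3.0.0-2557"; that output format is an assumption and should be checked against your cluster. Both function names are hypothetical.

```python
import re


def hdp_version_from_status(status_output):
    """Extract the HDP version from hdp-select status output.

    Assumption: the output looks like "hadoop-client - 2.3.0.0-2557".
    Returns None when no version-like token is found.
    """
    match = re.search(r"-\s*(\d+(?:\.\d+)+-\d+)\s*$", status_output.strip())
    return match.group(1) if match else None


def java_opts_line(hdp_version):
    """Return the single line to place in <SPARK_HOME>/conf/java-opts."""
    return f"-Dhdp.version={hdp_version}"


# Example with a hard-coded sample string instead of a live command:
# version = hdp_version_from_status("hadoop-client - 2.3.0.0-2557")
# print(java_opts_line(version))
```

Parsing a captured string keeps the sketch runnable on a machine where hdp-select is not installed.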
If you are connecting to a supported version of an HDP or CDH cluster, open the core-site.xml file, then comment out the net.topology.script.file.name property as shown in the following code block:
<!--
<property>
  <name>net.topology.script.file.name</name>
  <value>/etc/hadoop/conf/topology_script.py</value>
</property>
-->
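If the edit to core-site.xml has to be repeated across several client machines, it can be done with a small text transformation. This is a sketch only: the function name comment_out_property is hypothetical, it works on the file's text rather than a parsed XML tree, and it assumes the property block contains no "--" sequence, which would not be valid inside an XML comment.

```python
import re

PROPERTY_NAME = "net.topology.script.file.name"


def comment_out_property(xml_text, name=PROPERTY_NAME):
    """Wrap the <property> block with the given <name> in an XML comment.

    Blocks that are already inside a comment are left unchanged.
    """
    # The (?!</property>) guards keep a match inside one property block.
    pattern = re.compile(
        r"<property>(?:(?!</property>).)*?<name>\s*"
        + re.escape(name)
        + r"\s*</name>(?:(?!</property>).)*?</property>",
        re.DOTALL,
    )

    def wrap(match):
        before = xml_text[: match.start()]
        # An unclosed "<!--" before the block means it is already commented.
        if before.rfind("<!--") > before.rfind("-->"):
            return match.group(0)
        return "<!-- " + match.group(0) + " -->"

    return pattern.sub(wrap, xml_text)


# Example usage (the path is a placeholder):
# from pathlib import Path
# path = Path("core-site.xml")
# path.write_text(comment_out_property(path.read_text()))
```

Working on plain text preserves existing comments and formatting in the file, which a parse-and-rewrite approach with a standard XML library may discard. Back up the file before rewriting it.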
The Spark client is now ready for use with Spark Submit in PDI.