Application tuning parameters for Spark
Application tuning parameters use the spark. prefix and are passed directly to the Spark cluster for configuration. Pentaho fully supports Spark properties; see the Spark properties documentation for the complete list.
The application tuning parameters available for Spark may depend on your deployment or cluster manager. All Spark parameters in PDI support the use of variables. The following table lists the Spark parameters available in PDI; see the Spark properties documentation for full descriptions, default values, and recommendations.
spark.executor.instances (Integer): The number of executors for the Spark application.
spark.executor.memoryOverhead (Integer): The amount of off-heap memory to allocate per executor.
spark.executor.memory (Integer): The amount of memory to use per executor process.
spark.driver.memoryOverhead (Integer): The amount of off-heap memory to allocate per driver in cluster mode.
spark.driver.memory (Integer): The amount of memory to use for the driver process.
spark.executor.cores (Integer): The number of cores to use on each executor.
spark.driver.cores (Integer): The number of cores to use for the driver process in cluster mode.
spark.default.parallelism (Integer): The default number of partitions in RDDs returned by transformations, such as join, reduceByKey, and parallelize, when not set by the user.
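To make the memory-overhead parameters above concrete, the sketch below illustrates how Spark derives the default for spark.executor.memoryOverhead when it is not set explicitly: per the Spark configuration documentation, it is 10% of executor memory with a 384 MiB floor. The helper function name and values here are hypothetical, for illustration only:

```python
# Hypothetical helper illustrating Spark's documented default for
# spark.executor.memoryOverhead: max(384 MiB, 10% of executor memory).
def default_memory_overhead_mib(executor_memory_mib: int,
                                overhead_factor: float = 0.10,
                                minimum_mib: int = 384) -> int:
    """Return the off-heap overhead (MiB) Spark would reserve per executor."""
    return max(minimum_mib, int(executor_memory_mib * overhead_factor))

# A 4 GiB executor gets the 10% share; a 1 GiB executor hits the 384 MiB floor.
print(default_memory_overhead_mib(4096))  # 409
print(default_memory_overhead_mib(1024))  # 384
```

Setting spark.executor.memoryOverhead explicitly replaces this computed default.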
If an identical property is set in a user's transformation, the transformation's value overrides the setting on the cluster or Pentaho Server.
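For example, a transformation-level configuration might supply the property keys listed above as key-value pairs, with PDI variables substituted for the values. The variable names and values below are hypothetical, for illustration only:

```
spark.executor.instances    ${SPARK_EXECUTOR_INSTANCES}
spark.executor.memory       4g
spark.executor.cores        2
spark.driver.memory         2g
spark.default.parallelism   200
```

Because transformation-level settings take precedence, these values would override any identically named properties configured on the cluster or Pentaho Server.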
Note: Step-level tuning parameters do not use the spark. prefix; they apply to the application running on the Spark cluster without affecting the cluster configuration. See Setting PDI step Spark tuning options for details.