Step 2: Adjust the Spark parameters in the transformation
If possible, ensure that no other jobs are running on the cluster.
Spark parameters specified as transformation parameters apply to a specific user's transformation and temporarily override the baseline settings for KTR-specific needs. For example, if you want to change spark.driver.memory, you can embed the appropriate Spark parameter setting in the KTR so that it takes effect only when that transformation is run.
Note: If an identical property is also set on the cluster or Pentaho Server, the user's KTR takes precedence.
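As an illustration of this precedence, the following PySpark sketch sets a job-level value that overrides the baseline from spark-defaults.conf, analogous to a KTR parameter overriding the cluster or Pentaho Server setting. This is not how PDI itself applies the parameters (the KTR dialog handles that); it only models the override behavior. The sketch uses spark.executor.memory rather than spark.driver.memory, because driver memory must be fixed before the driver JVM starts, and the 4g value is a placeholder.

```python
# Illustrative sketch of config precedence: a value set explicitly on the
# job's SparkConf overrides the cluster-wide baseline in spark-defaults.conf,
# just as a KTR-level parameter overrides the cluster or Pentaho Server
# setting. The "4g" value is a placeholder assumption.
from pyspark import SparkConf
from pyspark.sql import SparkSession

conf = SparkConf()
conf.set("spark.executor.memory", "4g")  # job-level override, like a KTR parameter

spark = SparkSession.builder.config(conf=conf).getOrCreate()

# The effective value reflects the job-level override, not the baseline.
print(spark.conf.get("spark.executor.memory"))
```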
Use the following steps to tune Spark locally for your transformation.
1. Set the Spark parameters as described in Set the Spark parameters locally in PDI.
2. Run the transformation on the cluster and evaluate the results. The local tuning for the Spark application is recorded in the Logging tab of the Execution Results panel in PDI.

(Figure: PDI logging of the Spark transformation parameters)

3. Modify the values of the Spark parameters, then rerun the transformation.
4. Repeat step 3 as needed to collect performance data for the different parameter values (see the sketch after this list for one way to script these reruns).
5. Examine the results of your iterations in the log.
6. Set the Spark parameters in the transformation to the values that produced the fastest runtime.
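If you prefer to script the rerun loop in steps 3 and 4 rather than editing values by hand, a sketch along the following lines could drive Pan (PDI's command-line transformation runner) with different candidate values. The Pan path, KTR path, parameter name, and candidate values are assumptions for illustration, and the wall-clock timing here is only a rough proxy for the runtimes reported in the PDI log.

```python
# Hypothetical sketch: rerun a KTR under different Spark parameter values
# by invoking Pan from the command line and timing each run. Paths, the
# parameter name, and the candidate values are assumptions; adjust them
# to your installation and to the parameter you are tuning.
import subprocess
import time

PAN = "/opt/pentaho/data-integration/pan.sh"      # assumed install path
KTR = "/home/user/transformations/sales_agg.ktr"  # assumed transformation

candidates = ["2g", "4g", "8g"]  # candidate spark.executor.memory values

for value in candidates:
    start = time.monotonic()
    result = subprocess.run(
        [PAN, f"-file={KTR}", f"-param:spark.executor.memory={value}", "-level=Basic"],
        capture_output=True,
        text=True,
    )
    elapsed = time.monotonic() - start
    status = "OK" if result.returncode == 0 else f"FAILED ({result.returncode})"
    print(f"spark.executor.memory={value}: {elapsed:.1f}s {status}")
```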
You have locally tuned Spark for your transformation. If needed, proceed to Step 3: Set the Spark tuning options on a PDI step in the transformation to apply step-level tuning. For example, additional tuning may be required if a step runs slowly or consumes available memory inefficiently.