Set the Spark parameters locally in PDI
In PDI, you can customize Spark properties in your transformation to further tune how the Spark cluster processes your transformation. By adjusting the applicable tuning parameters in your transformation for a given run, you override the global settings for the cluster. You can set these properties as run modification parameters or as environment variables.
Note: When defining a parameter, you can assign it a default value to use if no other value is supplied for it at run time. If you prefer to set the Spark properties using environment variables, see the Pentaho Data Integration documentation for further information on environment variables.
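If you take the environment-variable route, one common approach in PDI is to define the variable in the kettle.properties file, which makes it available to all transformations. A minimal sketch follows; the variable name and value are hypothetical examples, not required names:

```
# kettle.properties (located in the .kettle directory of the PDI user's home)
# Hypothetical variable holding a Spark property value; reference it
# elsewhere in PDI as ${SPARK_EXECUTOR_MEMORY}
SPARK_EXECUTOR_MEMORY=4g
```

Restart PDI (or reload kettle.properties) for newly defined variables to take effect.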
Perform the following steps to set the Spark parameters in PDI:
In PDI, double-click the transformation canvas, or press Ctrl+T.
The transformation properties dialog box opens.
Click the Parameters tab.
The Parameters table opens.
Enter the Spark parameter in the Parameters column and the value for that property in the Default Value column of the table. Optionally, enter a description.
Note: If the parameter and the variable share the same name, the parameter takes precedence.

[Image: Parameters tab, Transformation properties dialog box]

Click OK.
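As an illustration, a Parameters table tuned for Spark might contain entries like the following. The property names are standard Spark configuration keys; the values shown are hypothetical and should be sized to your cluster:

```
Parameter                      Default Value   Description
spark.executor.memory          4g              Memory allocated to each executor
spark.executor.cores           2               CPU cores available to each executor
spark.sql.shuffle.partitions   200             Number of partitions used for shuffle operations
```

Because these parameters are scoped to the transformation, they override the matching global Spark settings only for this run instance.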
The performance results of your executed transformations are available on the Logging tab of the Execution Results panel in PDI, and in the YARN ResourceManager and Spark History Server logs. Consult your cluster administrator to view these logs. You can refine the tuning of the cluster at the step level as described in Optimizing Spark tuning.