Using Spark Submit
You can use the Spark Submit job entry, along with an externally developed Spark execution script, to run Spark jobs on your YARN clusters. If you want to run transformations without developing a Spark execution script, use the Spark engine in AEL instead. See the run configuration information in the Pentaho Data Integration document for more details.
The following instructions explain how to use the Spark Submit entry in PDI to run a Spark job. To learn how to execute Spark Submit jobs on secure Cloudera Hadoop clusters, see Use Kerberos with Spark Submit in the Administer Pentaho Data Integration and Analytics document.
Note: Before you start, you must install and configure the Spark client according to the instructions for the Spark Submit job entry in the Pentaho Data Integration document.
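For context, the sketch below shows what a minimal, externally developed Spark application run by the Spark Submit entry might look like. It is illustrative only and not part of the Pentaho documentation; the object name (WordCountJob) and the input and output arguments are hypothetical placeholders.

```scala
// Illustrative sketch: a minimal Spark application that a Spark Submit
// job entry could run on a YARN cluster. Object name and paths are hypothetical.
import org.apache.spark.sql.SparkSession

object WordCountJob {
  def main(args: Array[String]): Unit = {
    val spark = SparkSession.builder()
      .appName("WordCountJob")   // application name shown in the YARN ResourceManager UI
      .getOrCreate()             // the master (e.g. yarn) is supplied by spark-submit

    val input  = args(0)         // e.g. an HDFS path to the input text files
    val output = args(1)         // e.g. an HDFS path for the results

    // Count word occurrences in the input text files.
    val counts = spark.read.textFile(input)
      .rdd
      .flatMap(_.split("\\s+"))
      .map(word => (word, 1))
      .reduceByKey(_ + _)

    counts.saveAsTextFile(output)
    spark.stop()
  }
}
```

Once such an application is packaged into a JAR, running it through the Spark Submit entry is conceptually equivalent to a manual invocation such as spark-submit --master yarn --class WordCountJob <jar> <input> <output>, with the entry's fields supplying the class, JAR, and arguments.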