Using the Avro Input step on the Spark engine
You can set up the Avro Input step to run on the Spark engine. Spark handles null values differently than the Pentaho engine, so you may need to adjust your transformation to process null values according to Spark's rules.
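For example, in Spark SQL a comparison involving a null value evaluates to null, so a filter condition silently drops rows containing nulls unless you test for them explicitly. The following PySpark sketch illustrates this behavior; the file path and the amount column are hypothetical, and it assumes the spark-avro package is available on the cluster:

from pyspark.sql import SparkSession
from pyspark.sql.functions import col

spark = SparkSession.builder.appName("avro-null-example").getOrCreate()

# Hypothetical Avro file; substitute your own data set.
df = spark.read.format("avro").load("/tmp/sales.avro")

# 'amount > 0' evaluates to null when amount is null,
# so rows with null amounts are dropped from the result.
positive_only = df.filter(col("amount") > 0)

# To keep rows with null amounts, test for null explicitly.
positive_or_null = df.filter((col("amount") > 0) | col("amount").isNull())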
Additionally, when using the Avro Input step on an Amazon EMR cluster, you must copy the spark-avro_2.11-2.4.2.jar file from your SPARK_HOME folder into the extra folder in your AEL data-integration setup location. The following is an example command to copy the file:
cp /usr/lib/spark/external/lib/spark-avro_2.11-2.4.2.jar <User>/data-integration/adaptive-execution/extra/