Hadoop libraries are missing

If you use the Spark libraries packaged with an EMR, Cloudera, or Hortonworks distribution, you must add the Hadoop libraries to the classpath by setting the SPARK_DIST_CLASSPATH environment variable, because these distributions do not bundle the Hadoop libraries with Spark. On EMR, the Hadoop libraries are required to access S3 resources.
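One common way to set the variable is sketched below. It assumes the `hadoop` command is installed and on the `PATH` of the machine that launches Spark, and that the line is placed in `conf/spark-env.sh` or exported in the shell before Spark starts. Both are assumptions about your environment, not requirements stated by the distribution.

```shell
# Assumption: the `hadoop` CLI is on PATH on this node.
# `hadoop classpath` prints the classpath that contains the Hadoop jars;
# Spark appends the value of SPARK_DIST_CLASSPATH to its own classpath.
export SPARK_DIST_CLASSPATH="$(hadoop classpath)"
```

If `hadoop` is not on the `PATH`, the variable ends up empty. In that case, call the binary by its full path or set the variable to the directories that contain the Hadoop jars.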
