Using the pre-installed Apache Hadoop driver
You can use the pre-installed Apache Hadoop driver for HDFS file copy operations and for running input and output transformations and jobs. The driver works with both secured and unsecured clusters. Because the driver is pre-installed, you do not have to install a KAR file.
The supported big data steps in Pentaho include:
Both operating system file browsers and the Pentaho virtual file system browsers are supported, as well as basic HDFS and VFS operations. For more information, see Connecting to Virtual File Systems.
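As a rough illustration of the kind of location a file browser resolves for HDFS, the following Python sketch assembles and parses an `hdfs://` URL. The helper name `build_hdfs_url`, the host `namenode.example.com`, the port `8020`, and the file path are all hypothetical values chosen for this example; they are not part of Pentaho or the driver, so substitute the values for your own cluster.

```python
from urllib.parse import quote, urlsplit


def build_hdfs_url(host, port, path, user=None):
    """Assemble an hdfs:// URL from its parts (hypothetical helper)."""
    # Optional user name, percent-encoded so special characters stay valid.
    userinfo = f"{quote(user, safe='')}@" if user else ""
    # Percent-encode the path but keep the "/" separators intact.
    return f"hdfs://{userinfo}{host}:{port}/{quote(path.lstrip('/'))}"


# Assumed values for illustration only.
url = build_hdfs_url(
    "namenode.example.com", 8020, "/user/pentaho/input/sales data.csv"
)
parts = urlsplit(url)

print(url)
print(parts.hostname, parts.port, parts.path)
```

Encoding the path up front avoids surprises when file names contain spaces or other reserved characters.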
Note: The Apache Hadoop driver works only with Hadoop clusters that conform to standard Hadoop connection rules. For example, EMR clusters may work, but MapR does not, because the connection rules for MapR are not standard. The Apache Hadoop driver is not intended to support higher-level Hadoop operations such as Hive, HBase, Sqoop, and Oozie. If you require these operations, install the KAR file for the applicable vendor.