Before you begin
Before you begin setting up Pentaho to connect to an Amazon EMR cluster, perform the following tasks.
Check the Components Reference to verify that your Pentaho version supports your version of the Amazon EMR cluster.
Prepare your Amazon EMR cluster by performing the following tasks:
Configure an Amazon EC2 cluster.
See Amazon's documentation if you need help.
Install any required services and service client tools.
Test the cluster.
Install PDI on an Amazon EC2 instance that is within the same Amazon Virtual Private Cloud (VPC) as the Amazon EMR cluster.
Note: As a best practice, install PDI on your Amazon EC2 instance. Otherwise, you might not be able to read files from or write files to the cluster. To resolve this issue, see Unable to read or write files to HDFS on the Amazon EMR cluster.
Get the connection information for the cluster and services that you intend to use from your Hadoop administrator. Some of this information may be available from a cluster management tool. You will also need to share some of this information with users after you finish the setup.
Add the YARN user on the cluster to the group defined by the dfs.permissions.superusergroup property. You can find the dfs.permissions.superusergroup property in the hdfs-site.xml file on your cluster or in the cluster management application.
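If you prefer to look up the superuser group programmatically rather than opening hdfs-site.xml by hand, the following minimal Python sketch shows one way to do it. It assumes the standard Hadoop configuration layout (a configuration element containing property elements with name and value children). The helper name read_hadoop_property and the sample value hdfs are hypothetical and used only for illustration. The fallback value supergroup is the Hadoop default when the property is not set. To run it against a real cluster, read the text of your own hdfs-site.xml instead of the sample string; the file location depends on your cluster.

```python
import xml.etree.ElementTree as ET

# Hadoop's default when dfs.permissions.superusergroup is not set explicitly.
DEFAULT_SUPERUSER_GROUP = "supergroup"


def read_hadoop_property(xml_text, name, default=None):
    """Return the value of a named property from Hadoop-style configuration XML.

    Hypothetical helper for illustration; it is not part of Pentaho or Hadoop.
    """
    root = ET.fromstring(xml_text)
    for prop in root.iter("property"):
        if prop.findtext("name", default="").strip() == name:
            value = prop.findtext("value")
            return value.strip() if value is not None else default
    return default


# Hypothetical sample standing in for the contents of hdfs-site.xml.
SAMPLE_HDFS_SITE = """<?xml version="1.0"?>
<configuration>
  <property>
    <name>dfs.permissions.superusergroup</name>
    <value>hdfs</value>
  </property>
</configuration>
"""

group = read_hadoop_property(
    SAMPLE_HDFS_SITE,
    "dfs.permissions.superusergroup",
    DEFAULT_SUPERUSER_GROUP,
)
print(group)  # prints "hdfs" for the sample above
```

The value returned is the name of the group to which the YARN user should be added. If your cluster is managed through a cluster management application, the value shown there takes precedence over anything you infer from a local copy of the file.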