Add files to the YARN Workspace folder
These instructions explain how to configure the Start a PDI Cluster on YARN entry so that the following files are copied at runtime to the YARN Workspace folder and then to the YARN cluster: kettle.properties, shared.xml, and repositories.xml. These instructions also explain how to manually copy additional files to the folder.
If the job is run from your local installation, the configuration files from your KETTLE_HOME directory are copied to the YARN Workspace folder. If the job is scheduled or is run on a Pentaho Server, the configuration files from the server's configured KETTLE_HOME are copied to the YARN Workspace folder.
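Before running the job, it can help to see which of the three configuration files are actually available to copy. The following is a minimal sketch, not part of the product: the function name find_config_files is hypothetical, and it assumes that the files sit in a .kettle subdirectory of KETTLE_HOME and that KETTLE_HOME falls back to the user's home directory when it is not set. Adjust both assumptions if your installation differs.

```python
import os
from pathlib import Path

# The three configuration files named in this section.
CONFIG_FILES = ("kettle.properties", "shared.xml", "repositories.xml")


def find_config_files(kettle_home=None):
    """Return {filename: bool} showing which configuration files exist.

    Assumption: the files live in a ".kettle" subdirectory of KETTLE_HOME,
    and KETTLE_HOME falls back to the user's home directory when unset.
    """
    base = Path(kettle_home or os.environ.get("KETTLE_HOME") or Path.home())
    config_dir = base / ".kettle"
    return {name: (config_dir / name).is_file() for name in CONFIG_FILES}


if __name__ == "__main__":
    for name, present in find_config_files().items():
        print(f"{name}: {'found' if present else 'missing'}")
```

A file reported as missing here cannot be copied, so its checkbox in the entry has nothing to act on.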
Complete these steps:
Ensure that the active Hadoop driver is configured.
Update the following properties in the yarn-site.xml file:
yarn.application.classpath
Classpaths needed to execute YARN applications. Separate paths with a comma.
yarn.resourcemanager.hostname
Update the hostname to match your environment.
yarn.resourcemanager.address
Update hostname and port to match your environment.
yarn.resourcemanager.admin.address
Update hostname and port to match your environment.
In Spoon, create or open a job that contains the Start a PDI Cluster on YARN entry.
Open the Start a PDI Cluster on YARN entry.
Select any combination of the kettle.properties, shared.xml, and repositories.xml checkboxes in the Copy Local Resource Files to YARN section of the window.
Save and close the Start a PDI Cluster on YARN entry.
If you want to copy other files to the cluster, manually copy them to the YARN Workspace folder here: pentaho-big-data-plugin/workspace.
Save and run the job.
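The manual copy of additional files can also be scripted. The sketch below is one possible approach rather than a Pentaho utility: copy_to_workspace is a hypothetical helper, and because the location of the pentaho-big-data-plugin folder depends on your installation, the caller passes it in rather than the script guessing it.

```python
import shutil
from pathlib import Path


def copy_to_workspace(files, plugin_dir):
    """Copy extra files into <plugin_dir>/workspace and return the new paths.

    plugin_dir is the path to your pentaho-big-data-plugin folder. Its
    location is installation-specific, so it is passed in explicitly.
    """
    workspace = Path(plugin_dir) / "workspace"
    if not workspace.is_dir():
        raise FileNotFoundError(f"YARN Workspace folder not found: {workspace}")
    copied = []
    for source in map(Path, files):
        # copy2 keeps timestamps; a file with the same name is overwritten.
        copied.append(Path(shutil.copy2(source, workspace / source.name)))
    return copied
```

For example, copy_to_workspace(["lookup.csv"], "/path/to/pentaho-big-data-plugin") would place lookup.csv in the workspace folder; both the file name and the path are placeholders.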
At runtime, whichever of the kettle.properties, shared.xml, and repositories.xml files you selected are copied to the YARN Workspace folder and then to the YARN cluster.
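If the job fails to reach the cluster, one thing to check is whether the four yarn-site.xml properties listed in the steps are all set. The following sketch parses a yarn-site.xml document and reports any that are missing or empty. It is illustrative only: the embedded sample uses a made-up hostname (rm.example.com) and placeholder classpath entries, and ports 8032 and 8033 are the stock Hadoop defaults, which your cluster may override.

```python
import xml.etree.ElementTree as ET

# The four properties named in the steps of this section.
REQUIRED = (
    "yarn.application.classpath",
    "yarn.resourcemanager.hostname",
    "yarn.resourcemanager.address",
    "yarn.resourcemanager.admin.address",
)

# Placeholder values only: rm.example.com and the classpath entries are
# illustrative and must be replaced with values from your own cluster.
SAMPLE_YARN_SITE = """\
<configuration>
  <property>
    <name>yarn.application.classpath</name>
    <value>$HADOOP_CONF_DIR,$HADOOP_COMMON_HOME/share/hadoop/common/*</value>
  </property>
  <property>
    <name>yarn.resourcemanager.hostname</name>
    <value>rm.example.com</value>
  </property>
  <property>
    <name>yarn.resourcemanager.address</name>
    <value>rm.example.com:8032</value>
  </property>
  <property>
    <name>yarn.resourcemanager.admin.address</name>
    <value>rm.example.com:8033</value>
  </property>
</configuration>
"""


def read_properties(xml_text):
    """Return a {name: value} dict for every <property> in the document."""
    root = ET.fromstring(xml_text)
    props = {}
    for prop in root.findall("property"):
        name = prop.findtext("name", default="").strip()
        if name:
            props[name] = prop.findtext("value", default="").strip()
    return props


def missing_properties(xml_text):
    """List the required properties that are absent or have an empty value."""
    props = read_properties(xml_text)
    return [name for name in REQUIRED if not props.get(name)]


if __name__ == "__main__":
    missing = missing_properties(SAMPLE_YARN_SITE)
    print("missing:", ", ".join(missing) if missing else "none")
```

To check a real file, read its contents and pass the text to missing_properties instead of SAMPLE_YARN_SITE.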

