# Amazon Hive Job Executor

The **Amazon Hive Job Executor** job entry runs Hive jobs in Amazon Elastic MapReduce (EMR). You can use this entry to access job flows in your Amazon Web Services (AWS) account.

### Before you begin

* You must have an AWS account configured for EMR.
* You must have a Hive script that defines the work to run on the EMR cluster.

### Entry name

**Entry name** specifies the unique name of the job entry on the canvas. You can change it.

### Configure the entry (tabs)

#### Hive settings tab

![Hive settings tab, Amazon Hive Job Executor](/files/MT8bgPagz8IzNbsJycNP)

Use this tab to connect to your AWS account and select or create the EMR cluster.

**AWS connection**

| Option         | Description                                                                                                                                                                                                                                                                             |
| -------------- | --------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------- |
| **Access key** | Unique identifier for your AWS account. The access key and secret key are used to sign requests, identify the sender, and help prevent request tampering.                                                                                                                               |
| **Secret key** | Secret key associated with the access key. The access key and secret key are used to sign requests, identify the sender, and help prevent request tampering.                                                                                                                            |
| **Region**     | Amazon EC2 region where the job flow runs. Available regions depend on your AWS account. See the AWS documentation for [regions and availability zones](https://docs.aws.amazon.com/AWSEC2/latest/UserGuide/using-regions-availability-zones.html#concepts-regions-availability-zones). |

Select **Connect** to establish the connection.

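The secret key itself is never sent with a request. AWS clients use the key pair to sign each request with Signature Version 4 (SigV4), and the signing key is scoped to a date, a **Region**, and a service. The entry performs signing for you. The Python sketch below only illustrates the SigV4 key-derivation chain that AWS documents, and it is not PDI code. Building the canonical request and the string to sign is omitted. The secret value is a placeholder, and `elasticmapreduce` is assumed to be the EMR signing service name.

```python
import hashlib
import hmac


def hmac_sha256(key: bytes, msg: str) -> bytes:
    """Return the raw HMAC-SHA256 digest of msg (UTF-8 encoded) under key."""
    return hmac.new(key, msg.encode("utf-8"), hashlib.sha256).digest()


def derive_signing_key(secret_key: str, date_stamp: str, region: str, service: str) -> bytes:
    """Derive a SigV4 signing key scoped to one date (YYYYMMDD), region, and service."""
    k_date = hmac_sha256(("AWS4" + secret_key).encode("utf-8"), date_stamp)
    k_region = hmac_sha256(k_date, region)
    k_service = hmac_sha256(k_region, service)
    return hmac_sha256(k_service, "aws4_request")


def sign(signing_key: bytes, string_to_sign: str) -> str:
    """Hex-encoded signature; this (not the secret key) travels with the request."""
    return hmac.new(signing_key, string_to_sign.encode("utf-8"), hashlib.sha256).hexdigest()


if __name__ == "__main__":
    # Placeholder secret; never hard-code real credentials.
    key = derive_signing_key("EXAMPLE-SECRET-KEY", "20240101", "us-east-1", "elasticmapreduce")
    print(sign(key, "example string to sign"))
```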
**Cluster**

Select **New** to create a new job flow (cluster), or **Existing** if you already have a job flow ID.

If you select **New**, configure these options:

| Option                   | Description                                                                                                                                                                                                                                                             |
| ------------------------ | ----------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------- |
| **EC2 role**             | Amazon EC2 instance profile role for the cluster. Processes running on cluster instances use this role when calling other AWS services. Available roles depend on your AWS account.                                                                                     |
| **EMR role**             | Role that permits Amazon EMR to call other AWS services (for example, Amazon EC2) on your behalf. See the AWS documentation for [EMR IAM roles](https://docs.aws.amazon.com/emr/latest/ManagementGuide/emr-iam-roles.html). Available roles depend on your AWS account. |
| **Master instance type** | Amazon EC2 instance type used as the Hadoop master (handles task distribution).                                                                                                                                                                                         |
| **Slave instance type**  | Amazon EC2 instance type used as one or more Hadoop workers. Valid only when **Number of instances** is greater than `1`.                                                                                                                                               |
| **EMR release**          | EMR release label. It determines which applications (for example, Hadoop and Hive) and which component versions are installed on the cluster.                                                                                                                          |
| **Number of instances**  | Number of EC2 instances for the job flow.                                                                                                                                                                                                                               |
| **Bootstrap actions**    | References to scripts that run before the node begins processing data. See the AWS documentation for [bootstrap actions](https://docs.aws.amazon.com/emr/latest/ManagementGuide/emr-plan-bootstrap.html).                                                               |

If you select **Existing**, enter the job flow (cluster) ID in **Existing JobFlow ID**.

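For a new cluster, these options correspond closely to parameters of the Amazon EMR `RunJobFlow` API, which boto3 exposes as `run_job_flow`. The Python sketch below builds such a request as a plain dictionary, without calling AWS, to show how the fields relate. It is an illustration under assumptions and is not how PDI builds its request. The function name is hypothetical. Requesting Hive through `Applications` is an assumption. Every role, instance type, release label, and S3 path in the example is a placeholder.

```python
from typing import Any, Dict, List, Optional


def build_run_job_flow_request(
    name: str,
    ec2_role: str,
    emr_role: str,
    master_instance_type: str,
    slave_instance_type: Optional[str],
    release_label: str,
    instance_count: int,
    bootstrap_scripts: Optional[List[str]] = None,
    log_uri: Optional[str] = None,
) -> Dict[str, Any]:
    """Map the New-cluster options onto EMR RunJobFlow parameters (no AWS call is made)."""
    if instance_count < 1:
        raise ValueError("Number of instances must be at least 1")

    instances: Dict[str, Any] = {
        "MasterInstanceType": master_instance_type,
        "InstanceCount": instance_count,
    }
    # Worker ("slave") nodes exist only when the cluster has more than one instance.
    if instance_count > 1:
        if not slave_instance_type:
            raise ValueError("Slave instance type is required when Number of instances > 1")
        instances["SlaveInstanceType"] = slave_instance_type

    request: Dict[str, Any] = {
        "Name": name,                        # Hive job flow name (Job settings tab)
        "ReleaseLabel": release_label,       # EMR release
        "JobFlowRole": ec2_role,             # EC2 role (instance profile)
        "ServiceRole": emr_role,             # EMR role
        "Instances": instances,
        "Applications": [{"Name": "Hive"}],  # assumption: Hive requested explicitly
    }
    if log_uri:
        request["LogUri"] = log_uri          # S3 staging directory (Job settings tab)
    if bootstrap_scripts:
        # Each bootstrap action needs a name and a script path, typically in S3.
        request["BootstrapActions"] = [
            {"Name": f"bootstrap-{i}", "ScriptBootstrapAction": {"Path": path}}
            for i, path in enumerate(bootstrap_scripts, start=1)
        ]
    return request


if __name__ == "__main__":
    # All values below are placeholders.
    request = build_run_job_flow_request(
        name="pdi-hive-example",
        ec2_role="EMR_EC2_DefaultRole",
        emr_role="EMR_DefaultRole",
        master_instance_type="m5.xlarge",
        slave_instance_type="m5.xlarge",
        release_label="emr-6.10.0",
        instance_count=3,
        bootstrap_scripts=["s3://example-bucket/bootstrap/setup.sh"],
        log_uri="s3://example-bucket/staging/",
    )
    print(request)
```

In a standalone client, you could pass the resulting dictionary to `boto3.client("emr").run_job_flow(**request)`. The slave instance type is dropped when there is only one instance, which matches the rule in the table.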
#### Job settings tab

![Job settings tab, Amazon Hive Job Executor](/files/sNWDk4tO4vDQ8TWnCpUL)

| Option                     | Description                                                                                                                                                                                                                             |
| -------------------------- | --------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------- |
| **Hive job flow name**     | Name of the Hive job flow to execute.                                                                                                                                                                                                   |
| **S3 staging directory**   | Amazon S3 location (bucket/path) where job flow logs are stored. Artifacts the job needs (for example, the Hive script) are also staged here before the job runs.                                                                        |
| **Hive script**            | Location of the Hive script to execute (Amazon S3 or local file system).                                                                                                                                                                |
| **Command line arguments** | Command-line arguments passed to the Hive script. Separate multiple arguments with spaces.                                                                                                                                              |
| **Keep job flow alive**    | Keeps the job flow active after the entry finishes. If not selected, the job flow terminates when the entry finishes.                                                                                                                   |
| **Enable blocking**        | Waits for the EMR Hive job to complete before continuing to the next entry. Blocking is required for PDI to track job status and to support error handling and routing. If cleared, the job is submitted and PDI continues immediately. |
| **Logging interval**       | Number of seconds between status log messages. Applies only when **Enable blocking** is selected.                                                                                                                                       |

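In EMR terms, the Hive script and its command-line arguments become a step. **Enable blocking** with a **Logging interval** amounts to polling that step's state until it finishes. The Python sketch below illustrates both ideas under stated assumptions, and it is not the entry's implementation. It follows the `command-runner.jar` / `hive-script --run-hive-script` convention that AWS documents for running Hive scripts as steps. It splits arguments on whitespace, as the table describes, so quoted arguments that contain spaces are not supported. It also takes a `get_state` callable in place of a real status call, such as one wrapping boto3's `describe_step`. The helper names are hypothetical.

```python
import time
from typing import Any, Callable, Dict

# Terminal EMR step states; PENDING, CANCEL_PENDING and RUNNING are non-terminal.
TERMINAL_STATES = {"COMPLETED", "CANCELLED", "FAILED", "INTERRUPTED"}


def build_hive_step(
    name: str,
    script_uri: str,
    command_line_args: str,
    action_on_failure: str = "CONTINUE",
) -> Dict[str, Any]:
    """Build a Hive step config in the command-runner.jar style (hypothetical helper)."""
    args = ["hive-script", "--run-hive-script", "--args", "-f", script_uri]
    # "Separate multiple arguments with spaces": split on runs of whitespace.
    args.extend(command_line_args.split())
    return {
        "Name": name,
        # Other valid values include TERMINATE_CLUSTER and CANCEL_AND_WAIT;
        # which one PDI uses is not documented here.
        "ActionOnFailure": action_on_failure,
        "HadoopJarStep": {"Jar": "command-runner.jar", "Args": args},
    }


def wait_for_step(
    get_state: Callable[[], str],
    logging_interval: float,
    log: Callable[[str], None] = print,
    sleep: Callable[[float], None] = time.sleep,
) -> str:
    """Block until the step reaches a terminal state, logging once per interval."""
    while True:
        state = get_state()
        log(f"Hive step state: {state}")
        if state in TERMINAL_STATES:
            return state
        sleep(logging_interval)


if __name__ == "__main__":
    step = build_hive_step(
        "pdi-hive-step", "s3://example-bucket/scripts/report.q", "-d RUN_DATE=2024-01-01"
    )
    print(step["HadoopJarStep"]["Args"])
    # Simulated states stand in for a real status call.
    fake_states = iter(["PENDING", "RUNNING", "COMPLETED"])
    final = wait_for_step(lambda: next(fake_states), logging_interval=0)
    print("Final state:", final)
```

With blocking cleared, a client would submit the step and return immediately instead of calling `wait_for_step`. No final state is available in that mode, so the next entry cannot be chosen based on the job's success or failure.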

