# MongoDB Input

The **MongoDB Input** step retrieves documents (records) from a collection in MongoDB.

### Step name

**Step name** specifies the unique name of the step on the canvas. You can change it.

### Configure the step (tabs)

#### Configure connection tab

You can configure a MongoDB connection using either a connection string or individual connection fields.

**Connection string**

Select **Connection String**, then enter a connection string URI. Verify the connection by selecting **Test Connection**.

For connection string formats and options, see the [MongoDB documentation](https://www.mongodb.com/docs/manual/reference/connection-string/).

Common examples:

| Connection type                                     | Connection string format                                                                                                                |
| --------------------------------------------------- | --------------------------------------------------------------------------------------------------------------------------------------- |
| SSL                                                 | `mongodb://<hostname>:<port>/?tls=true`                                                                                                 |
| SSL and LDAP                                        | `mongodb://<username>:<password>@<hostname>:<port>/?tls=true&authSource=$external&authMechanism=PLAIN`                                  |
| LDAP                                                | `mongodb://<username>:<password>@<hostname>:<port>/?authSource=$external&authMechanism=PLAIN`                                           |
| SSL and LDAP cluster servers with `replicaSet`      | `mongodb://<username>:<password>@<hostname>:<port>/?tls=true&authsource=$external&authMechanism=PLAIN&replicaSet=rs0`                   |
| SSL and LDAP with `replicaSet` and `readPreference` | `mongodb://<username>:<password>@<hostname>/?tls=true&authSource=$external&authMechanism=PLAIN&replicaSet=rs0&readPreference=secondary` |
| Kerberos                                            | `mongodb://<service-principal>@<hostname>:<port>/?authSource=$external&authMechanism=GSSAPI`                                            |
| Kerberos and SSL                                    | `mongodb://<service-principal>@<hostname>:<port>/?authSource=$external&authMechanism=GSSAPI&tls=true`                                   |
| Atlas Cloud/SAAS                                    | `mongodb+srv://<username>:<password>@mycluster.qj8y0.mongodb.net/myFirstDatabase?retryWrites=true&w=majority`                           |

**Configure fields**

If you select **Configure Fields**, specify these values:

| Field                                  | Description                                                                                                                                                                                                       |
| -------------------------------------- | ----------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------- |
| **Host name(s) or IP address(es)**     | Host names or IP addresses for MongoDB instances. You can specify ports by using `host:port`. Separate multiple hosts with commas.                                                                                |
| **Port**                               | Default port (used if ports are not specified with hosts). Default: `27017`.                                                                                                                                      |
| **Enable SSL connection**              | Connects to a MongoDB server configured for SSL.                                                                                                                                                                  |
| **Use all replica set members/mongos** | Uses all replica sets when multiple hosts are specified. If a replica set has more than one host, the Java driver discovers hosts automatically and tries the next replica set if the current set is unavailable. |
| **Authentication database**            | Authentication database.                                                                                                                                                                                          |
| **Authenticate mechanism**             | Authentication method. Values include `SCRAM-SHA-1`, `MONGODB-CR`, and `PLAIN`.                                                                                                                                   |
| **Username**                           | Username required to access the database. For Kerberos, enter the Kerberos principal.                                                                                                                             |
| **Password**                           | Password for the username (not required for Kerberos).                                                                                                                                                            |
| **Authenticate using Kerberos**        | Enables Kerberos authentication. When selected, use the Kerberos principal in **Username**.                                                                                                                       |
| **Connection timeout**                 | Connection timeout in milliseconds. Leave blank for no timeout.                                                                                                                                                   |
| **Socket timeout**                     | Write operation timeout in milliseconds. Leave blank for no timeout.                                                                                                                                              |
| **Preview**                            | Displays preview rows generated by this step.                                                                                                                                                                     |

#### Input options tab

Use the **Input options** tab to select the database and collection and specify read preference.

| Option              | Definition                                                                                                             |
| ------------------- | ---------------------------------------------------------------------------------------------------------------------- |
| **Database**        | Database to retrieve data from. Select **Get DBs** to list databases on the server.                                    |
| **Collection**      | Collection to retrieve data from. Select **Get collections** to list collections in the selected database.             |
| **Read preference** | Node selection preference: **primary**, **primary preferred**, **secondary**, **secondary preferred**, or **nearest**. |

**Tag set specification**

Tag sets allow you to customize read preferences for a replica set. See MongoDB’s [Tag Sets](https://docs.mongodb.com/manual/tutorial/configure-replica-set-tag-sets/index.html).

| Field            | Description                                                                                                                                              |
| ---------------- | -------------------------------------------------------------------------------------------------------------------------------------------------------- |
| **#**            | Tag set number.                                                                                                                                          |
| **Tag set**      | Tag set criteria. You can join, delete, copy, and paste tag sets, then select **Test tag set** to see matching replica set members.                      |
| **Get tags**     | Retrieves tag sets in the source database. Tag sets are listed in order of execution.                                                                    |
| **Join tags**    | Appends selected tag sets so that matching nodes are queried simultaneously. This change applies only in the MongoDB Input window (not on the database). |
| **Test tag set** | Displays replica set members matching the tag set criteria (ID, host name, priority, and tags).                                                          |

#### Query tab

Use the **Query** tab to refine read requests.

This tab supports two query modes:

* **Query expression** mode (default)
* **Aggregation pipeline specification** mode

Select **Query is aggregation pipeline** to switch modes.

| Field/option                                  | Definition                                                                                                                            |
| --------------------------------------------- | ------------------------------------------------------------------------------------------------------------------------------------- |
| **Query expression (JSON)**                   | Query expression to limit output (query expression mode).                                                                             |
| **Aggregation pipeline specification (JSON)** | Aggregation pipeline expression (pipeline mode). When enabled, the field label includes the method name with the selected collection. |
| **Query is aggregation pipeline**             | Enables aggregation pipeline mode.                                                                                                    |
| **Allow disk use**                            | Enables `allowDiskUse` for aggregation pipelines so you can process pipelines that exceed the standard 100 MB RAM allocation.         |
| **Execute for each row**                      | Runs the query for each incoming row.                                                                                                 |
| **Fields expression (JSON)**                  | Projection argument that controls which fields are returned (query expression mode only). If empty, all fields are returned.          |

#### Fields tab

Use the **Fields** tab to control the output fields.

The Fields tab supports two output modes:

* Output all fields as a single JSON string field.
* Output a selected set of fields into separate columns.

{% hint style="info" %}
When **Output single JSON field** is selected, only **Name of JSON output field** is active. When it is not selected, **Name of JSON output field** is inactive.
{% endhint %}

| Option                           | Definition                                                                                                                                                                                                                   |
| -------------------------------- | ---------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------- |
| **Output single JSON field**     | Outputs query results as a single JSON field (String).                                                                                                                                                                       |
| **Name of JSON output field**    | Output field name for the JSON string.                                                                                                                                                                                       |
| **Get fields**                   | Generates a sample set of documents and populates the table with sample field names/paths/types. You can edit the results.                                                                                                   |
| **#**                            | Entry order.                                                                                                                                                                                                                 |
| **Name**                         | Output field name in the transformation. You can edit it.                                                                                                                                                                    |
| **Path**                         | JSON path of the MongoDB field. Arrays can be addressed using index notation (for example, `$.emails[0]`) or wildcards (for example, `$.emails[*]`). For arrays of records, you can address fields like `$.emails[].sender`. |
| **Type**                         | Data type.                                                                                                                                                                                                                   |
| **Indexed values**               | Comma-separated list of legal values for String fields. When set, PDI applies the Kettle indexed data type.                                                                                                                  |
| **Sample: array min: max index** | Sampled minimum and maximum array indexes.                                                                                                                                                                                   |
| **Sample: #occur/#docs**         | Sampled frequency and number of documents processed.                                                                                                                                                                         |
| **Sample: disparate types**      | Indicates if the same field appears with multiple data types across sampled documents. When `Y`, **Type** is set to String.                                                                                                  |

If you output a single JSON field, you can parse it using the [JSON Input](/pdia-data-integration/pdi-transformation-steps-reference-overview/json-input.md) step or a [User Defined Java Class](/pdia-data-integration/pdi-transformation-steps-reference-overview/user-defined-java-class.md) step.

#### Examples

<details>

<summary>Query expression examples</summary>

MongoDB supports JSON-like query expressions. MongoDB’s [Extended JSON](http://docs.mongodb.org/manual/reference/mongodb-extended-json/) documentation explains the broader query language.

| Query expression                                                     | Description                                                                            |
| -------------------------------------------------------------------- | -------------------------------------------------------------------------------------- |
| `{ name : "MongoDB" }`                                               | Queries documents where `name` equals `MongoDB`.                                       |
| `{ name : { '$regex' : "m.*", '$options' : "i" } }`                  | Uses a regular expression to find `name` values starting with `m` (case-insensitive).  |
| `{ name : { '$gt' : "M" } }`                                         | Finds strings greater than `M`.                                                        |
| `{ name : { '$lte' : "T" } }`                                        | Finds strings less than or equal to `T`.                                               |
| `{ name : { '$in' : [ "MongoDB", "MySQL" ] } }`                      | Finds all names that are either `MongoDB` or `MySQL`.                                  |
| `{ name : { '$nin' : [ "MongoDB", "MySQL" ] } }`                     | Finds names that are not `MongoDB` or `MySQL`, or where the field is not set.          |
| `{ created_at : { $gte : { $date : "2014-12-31T00:00:00.000Z" } } }` | Finds documents where `created_at` is greater than or equal to the specified UTC date. |
| `{ $where : "this.count == 1" }`                                     | Uses JavaScript to evaluate a condition.                                               |
| `{ $query: {}, $orderby: { age : -1 } }`                             | Returns all documents sorted by `age` descending.                                      |

</details>

<details>

<summary>Aggregation pipeline examples</summary>

MongoDB supports an aggregation pipeline framework. See MongoDB’s [Aggregation](http://docs.mongodb.org/manual/tutorial/aggregation-examples/) documentation for more examples.

| Pipeline expression                                                                                                                                  | Description                                                                                                    |
| ---------------------------------------------------------------------------------------------------------------------------------------------------- | -------------------------------------------------------------------------------------------------------------- |
| `{ $match : {state : "FL", city : "ORLANDO" } }, { $sort : {pop : -1 } }`                                                                            | Returns documents where `state` is `FL` and `city` is `ORLANDO`, sorted by `pop` descending.                   |
| `{ $group : { _id: "$state"} }, { $sort : { _id : 1 } }`                                                                                             | Returns distinct values for `state` (as `_id`) in ascending order.                                             |
| `{ $match : {state : "FL" } }, { $group: {_id: "$city" , pop: { $sum: "$pop" } } }, { $sort: { pop: -1 } }, { $project: {_id : 0, city : "$_id" } }` | Filters to `state = FL`, aggregates population per city, sorts descending, and returns one field named `city`. |
| `{ $unwind : "$result" }`                                                                                                                            | Unwinds array elements and returns one document per array element.                                             |

</details>

### Metadata injection support

All fields of this step support metadata injection. You can use it with [ETL metadata injection](/pdia-data-integration/pdi-transformation-steps-reference-overview/etl-metadata-injection.md) to pass metadata to your transformation at runtime.


---

# Agent Instructions: Querying This Documentation

If you need additional information that is not directly available in this page, you can query the documentation dynamically by asking a question.

Perform an HTTP GET request on the current page URL with the `ask` query parameter:

```
GET https://docs.pentaho.com/pdia-data-integration/pdi-transformation-steps-reference-overview/mongodb-input.md?ask=<question>
```

The question should be specific, self-contained, and written in natural language.
The response will contain a direct answer to the question and relevant excerpts and sources from the documentation.

Use this mechanism when the answer is not explicitly present in the current page, you need clarification or additional context, or you want to retrieve related documentation sections.
