MongoDB Input

The MongoDB Input step retrieves documents (records) from a collection in MongoDB.

Step name

Step name specifies the unique name of the step on the canvas. You can change it.

Configure the step (tabs)

Configure connection tab

You can configure a MongoDB connection using either a connection string or individual connection fields.

Connection string

Select Connection String, then enter a connection string URI. Verify the connection by selecting Test Connection.

For connection string formats and options, see the MongoDB documentationarrow-up-right.

Common examples:

Connection type
Connection string format

SSL

mongodb://<hostname>:<port>/?tls=true

SSL and LDAP

mongodb://<username>:<password>@<hostname>:<port>/?tls=true&authSource=$external&authMechanism=PLAIN

LDAP

mongodb://<username>:<password>@<hostname>:<port>/?authSource=$external&authMechanism=PLAIN

SSL and LDAP cluster servers with replicaSet

mongodb://<username>:<password>@<hostname>:<port>/?tls=true&authsource=$external&authMechanism=PLAIN&replicaSet=rs0

SSL and LDAP with replicaSet and readPreference

mongodb://<username>:<password>@<hostname>/?tls=true&authSource=$external&authMechanism=PLAIN&replicaSet=rs0&readPreference=secondary

Kerberos

mongodb://<service-principal>@<hostname>:<port>/?authSource=$external&authMechanism=GSSAPI

Kerberos and SSL

mongodb://<service-principal>@<hostname>:<port>/?authSource=$external&authMechanism=GSSAPI&tls=true

Atlas Cloud/SAAS

mongodb+srv://<username>:<password>@mycluster.qj8y0.mongodb.net/myFirstDatabase?retryWrites=true&w=majority

Configure fields

If you select Configure Fields, specify these values:

Field
Description

Host name(s) or IP address(es)

Host names or IP addresses for MongoDB instances. You can specify ports by using host:port. Separate multiple hosts with commas.

Port

Default port (used if ports are not specified with hosts). Default: 27017.

Enable SSL connection

Connects to a MongoDB server configured for SSL.

Use all replica set members/mongos

Uses all replica sets when multiple hosts are specified. If a replica set has more than one host, the Java driver discovers hosts automatically and tries the next replica set if the current set is unavailable.

Authentication database

Authentication database.

Authenticate mechanism

Authentication method. Values include SCRAM-SHA-1, MONGODB-CR, and PLAIN.

Username

Username required to access the database. For Kerberos, enter the Kerberos principal.

Password

Password for the username (not required for Kerberos).

Authenticate using Kerberos

Enables Kerberos authentication. When selected, use the Kerberos principal in Username.

Connection timeout

Connection timeout in milliseconds. Leave blank for no timeout.

Socket timeout

Write operation timeout in milliseconds. Leave blank for no timeout.

Preview

Displays preview rows generated by this step.

Input options tab

Use the Input options tab to select the database and collection and specify read preference.

Option
Definition

Database

Database to retrieve data from. Select Get DBs to list databases on the server.

Collection

Collection to retrieve data from. Select Get collections to list collections in the selected database.

Read preference

Node selection preference: primary, primary preferred, secondary, secondary preferred, or nearest.

Tag set specification

Tag sets allow you to customize read preferences for a replica set. See MongoDB’s Tag Setsarrow-up-right.

Field
Description

#

Tag set number.

Tag set

Tag set criteria. You can join, delete, copy, and paste tag sets, then select Test tag set to see matching replica set members.

Get tags

Retrieves tag sets in the source database. Tag sets are listed in order of execution.

Join tags

Appends selected tag sets so that matching nodes are queried simultaneously. This change applies only in the MongoDB Input window (not on the database).

Test tag set

Displays replica set members matching the tag set criteria (ID, host name, priority, and tags).

Query tab

Use the Query tab to refine read requests.

This tab supports two query modes:

  • Query expression mode (default)

  • Aggregation pipeline specification mode

Select Query is aggregation pipeline to switch modes.

Field/option
Definition

Query expression (JSON)

Query expression to limit output (query expression mode).

Aggregation pipeline specification (JSON)

Aggregation pipeline expression (pipeline mode). When enabled, the field label includes the method name with the selected collection.

Query is aggregation pipeline

Enables aggregation pipeline mode.

Allow disk use

Enables allowDiskUse for aggregation pipelines so you can process pipelines that exceed the standard 100 MB RAM allocation.

Execute for each row

Runs the query for each incoming row.

Fields expression (JSON)

Projection argument that controls which fields are returned (query expression mode only). If empty, all fields are returned.

Fields tab

Use the Fields tab to control the output fields.

The Fields tab supports two output modes:

  • Output all fields as a single JSON string field.

  • Output a selected set of fields into separate columns.

circle-info

When Output single JSON field is selected, only Name of JSON output field is active. When it is not selected, Name of JSON output field is inactive.

Option
Definition

Output single JSON field

Outputs query results as a single JSON field (String).

Name of JSON output field

Output field name for the JSON string.

Get fields

Generates a sample set of documents and populates the table with sample field names/paths/types. You can edit the results.

#

Entry order.

Name

Output field name in the transformation. You can edit it.

Path

JSON path of the MongoDB field. Arrays can be addressed using index notation (for example, $.emails[0]) or wildcards (for example, $.emails[*]). For arrays of records, you can address fields like $.emails[].sender.

Type

Data type.

Indexed values

Comma-separated list of legal values for String fields. When set, PDI applies the Kettle indexed data type.

Sample: array min: max index

Sampled minimum and maximum array indexes.

Sample: #occur/#docs

Sampled frequency and number of documents processed.

Sample: disparate types

Indicates if the same field appears with multiple data types across sampled documents. When Y, Type is set to String.

If you output a single JSON field, you can parse it using the JSON Input step or a User Defined Java Class step.

Examples

chevron-rightQuery expression exampleshashtag

MongoDB supports JSON-like query expressions. MongoDB’s Extended JSONarrow-up-right documentation explains the broader query language.

Query expression
Description

{ name : "MongoDB" }

Queries documents where name equals MongoDB.

{ name : { '$regex' : "m.*", '$options' : "i" } }

Uses a regular expression to find name values starting with m (case-insensitive).

{ name : { '$gt' : "M" } }

Finds strings greater than M.

{ name : { '$lte' : "T" } }

Finds strings less than or equal to T.

{ name : { '$in' : [ "MongoDB", "MySQL" ] } }

Finds all names that are either MongoDB or MySQL.

{ name : { '$nin' : [ "MongoDB", "MySQL" ] } }

Finds names that are not MongoDB or MySQL, or where the field is not set.

{ created_at : { $gte : { $date : "2014-12-31T00:00:00.000Z" } } }

Finds documents where created_at is greater than or equal to the specified UTC date.

{ $where : "this.count == 1" }

Uses JavaScript to evaluate a condition.

{ $query: {}, $orderby: { age : -1 } }

Returns all documents sorted by age descending.

chevron-rightAggregation pipeline exampleshashtag

MongoDB supports an aggregation pipeline framework. See MongoDB’s Aggregationarrow-up-right documentation for more examples.

Pipeline expression
Description

{ $match : {state : "FL", city : "ORLANDO" } }, { $sort : {pop : -1 } }

Returns documents where state is FL and city is ORLANDO, sorted by pop descending.

{ $group : { _id: "$state"} }, { $sort : { _id : 1 } }

Returns distinct values for state (as _id) in ascending order.

{ $match : {state : "FL" } }, { $group: {_id: "$city" , pop: { $sum: "$pop" } } }, { $sort: { pop: -1 } }, { $project: {_id : 0, city : "$_id" } }

Filters to state = FL, aggregates population per city, sorts descending, and returns one field named city.

{ $unwind : "$result" }

Unwinds array elements and returns one document per array element.

Metadata injection support

All fields of this step support metadata injection. You can use it with ETL metadata injection to pass metadata to your transformation at runtime.

Last updated

Was this helpful?