MongoDB Input
The MongoDB Input step retrieves documents (records) from a collection in MongoDB.
Step name
Step name specifies the unique name of the step on the canvas. You can change it.
Configure the step (tabs)
Configure connection tab
You can configure a MongoDB connection using either a connection string or individual connection fields.
Connection string
Select Connection String, then enter a connection string URI. Verify the connection by selecting Test Connection.
For connection string formats and options, see the MongoDB documentation.
Common examples:
SSL
mongodb://<hostname>:<port>/?tls=true
SSL and LDAP
mongodb://<username>:<password>@<hostname>:<port>/?tls=true&authSource=$external&authMechanism=PLAIN
LDAP
mongodb://<username>:<password>@<hostname>:<port>/?authSource=$external&authMechanism=PLAIN
SSL and LDAP cluster servers with replicaSet
mongodb://<username>:<password>@<hostname>:<port>/?tls=true&authsource=$external&authMechanism=PLAIN&replicaSet=rs0
SSL and LDAP with replicaSet and readPreference
mongodb://<username>:<password>@<hostname>/?tls=true&authSource=$external&authMechanism=PLAIN&replicaSet=rs0&readPreference=secondary
Kerberos
mongodb://<service-principal>@<hostname>:<port>/?authSource=$external&authMechanism=GSSAPI
Kerberos and SSL
mongodb://<service-principal>@<hostname>:<port>/?authSource=$external&authMechanism=GSSAPI&tls=true
Atlas Cloud/SAAS
mongodb+srv://<username>:<password>@mycluster.qj8y0.mongodb.net/myFirstDatabase?retryWrites=true&w=majority
Configure fields
If you select Configure Fields, specify these values:
Host name(s) or IP address(es)
Host names or IP addresses for MongoDB instances. You can specify ports by using host:port. Separate multiple hosts with commas.
Port
Default port (used if ports are not specified with hosts). Default: 27017.
Enable SSL connection
Connects to a MongoDB server configured for SSL.
Use all replica set members/mongos
Uses all replica sets when multiple hosts are specified. If a replica set has more than one host, the Java driver discovers hosts automatically and tries the next replica set if the current set is unavailable.
Authentication database
Authentication database.
Authenticate mechanism
Authentication method. Values include SCRAM-SHA-1, MONGODB-CR, and PLAIN.
Username
Username required to access the database. For Kerberos, enter the Kerberos principal.
Password
Password for the username (not required for Kerberos).
Authenticate using Kerberos
Enables Kerberos authentication. When selected, use the Kerberos principal in Username.
Connection timeout
Connection timeout in milliseconds. Leave blank for no timeout.
Socket timeout
Write operation timeout in milliseconds. Leave blank for no timeout.
Preview
Displays preview rows generated by this step.
Input options tab
Use the Input options tab to select the database and collection and specify read preference.
Database
Database to retrieve data from. Select Get DBs to list databases on the server.
Collection
Collection to retrieve data from. Select Get collections to list collections in the selected database.
Read preference
Node selection preference: primary, primary preferred, secondary, secondary preferred, or nearest.
Tag set specification
Tag sets allow you to customize read preferences for a replica set. See MongoDB’s Tag Sets.
#
Tag set number.
Tag set
Tag set criteria. You can join, delete, copy, and paste tag sets, then select Test tag set to see matching replica set members.
Get tags
Retrieves tag sets in the source database. Tag sets are listed in order of execution.
Join tags
Appends selected tag sets so that matching nodes are queried simultaneously. This change applies only in the MongoDB Input window (not on the database).
Test tag set
Displays replica set members matching the tag set criteria (ID, host name, priority, and tags).
Query tab
Use the Query tab to refine read requests.
This tab supports two query modes:
Query expression mode (default)
Aggregation pipeline specification mode
Select Query is aggregation pipeline to switch modes.
Query expression (JSON)
Query expression to limit output (query expression mode).
Aggregation pipeline specification (JSON)
Aggregation pipeline expression (pipeline mode). When enabled, the field label includes the method name with the selected collection.
Query is aggregation pipeline
Enables aggregation pipeline mode.
Allow disk use
Enables allowDiskUse for aggregation pipelines so you can process pipelines that exceed the standard 100 MB RAM allocation.
Execute for each row
Runs the query for each incoming row.
Fields expression (JSON)
Projection argument that controls which fields are returned (query expression mode only). If empty, all fields are returned.
Fields tab
Use the Fields tab to control the output fields.
The Fields tab supports two output modes:
Output all fields as a single JSON string field.
Output a selected set of fields into separate columns.
When Output single JSON field is selected, only Name of JSON output field is active. When it is not selected, Name of JSON output field is inactive.
Output single JSON field
Outputs query results as a single JSON field (String).
Name of JSON output field
Output field name for the JSON string.
Get fields
Generates a sample set of documents and populates the table with sample field names/paths/types. You can edit the results.
#
Entry order.
Name
Output field name in the transformation. You can edit it.
Path
JSON path of the MongoDB field. Arrays can be addressed using index notation (for example, $.emails[0]) or wildcards (for example, $.emails[*]). For arrays of records, you can address fields like $.emails[].sender.
Type
Data type.
Indexed values
Comma-separated list of legal values for String fields. When set, PDI applies the Kettle indexed data type.
Sample: array min: max index
Sampled minimum and maximum array indexes.
Sample: #occur/#docs
Sampled frequency and number of documents processed.
Sample: disparate types
Indicates if the same field appears with multiple data types across sampled documents. When Y, Type is set to String.
If you output a single JSON field, you can parse it using the JSON Input step or a User Defined Java Class step.
Examples
Query expression examples
MongoDB supports JSON-like query expressions. MongoDB’s Extended JSON documentation explains the broader query language.
{ name : "MongoDB" }
Queries documents where name equals MongoDB.
{ name : { '$regex' : "m.*", '$options' : "i" } }
Uses a regular expression to find name values starting with m (case-insensitive).
{ name : { '$gt' : "M" } }
Finds strings greater than M.
{ name : { '$lte' : "T" } }
Finds strings less than or equal to T.
{ name : { '$in' : [ "MongoDB", "MySQL" ] } }
Finds all names that are either MongoDB or MySQL.
{ name : { '$nin' : [ "MongoDB", "MySQL" ] } }
Finds names that are not MongoDB or MySQL, or where the field is not set.
{ created_at : { $gte : { $date : "2014-12-31T00:00:00.000Z" } } }
Finds documents where created_at is greater than or equal to the specified UTC date.
{ $where : "this.count == 1" }
Uses JavaScript to evaluate a condition.
{ $query: {}, $orderby: { age : -1 } }
Returns all documents sorted by age descending.
Aggregation pipeline examples
MongoDB supports an aggregation pipeline framework. See MongoDB’s Aggregation documentation for more examples.
{ $match : {state : "FL", city : "ORLANDO" } }, { $sort : {pop : -1 } }
Returns documents where state is FL and city is ORLANDO, sorted by pop descending.
{ $group : { _id: "$state"} }, { $sort : { _id : 1 } }
Returns distinct values for state (as _id) in ascending order.
{ $match : {state : "FL" } }, { $group: {_id: "$city" , pop: { $sum: "$pop" } } }, { $sort: { pop: -1 } }, { $project: {_id : 0, city : "$_id" } }
Filters to state = FL, aggregates population per city, sorts descending, and returns one field named city.
{ $unwind : "$result" }
Unwinds array elements and returns one document per array element.
Metadata injection support
All fields of this step support metadata injection. You can use it with ETL metadata injection to pass metadata to your transformation at runtime.
Last updated
Was this helpful?

