System Overview
This page explains some of the mongo-connector internals from the perspective of running mongo-connector for the first time.
- The main Connector thread determines the type of MongoDB node at the address it was given by issuing an `isdbgrid` command to the node. The response reveals whether the node is a `mongod` or a `mongos`, and thus whether the user intends to replicate from a replica set or a sharded cluster, respectively. Based on this information, the Connector thread creates an OplogThread for the primary in the replica set, or for the primary node of each shard in the sharded cluster administered by the `mongos`. It also initializes one or more DocManagers for each replication endpoint and provides these to the OplogThreads.
- An OplogThread creates a tailable cursor into the `oplog.rs` collection of the `mongod`. This collection is a running record of all operations that happen on that node.
- The OplogThread initiates a "collection dump," in which it upserts every document in the namespaces of interest through the specified DocManagers. These DocManagers pass the documents on to their respective target systems. The "collection dump" happens only the first time mongo-connector is started, and it does not happen again as long as mongo-connector can find the timestamp of the last oplog record it processed (more on this in the next step).
- The OplogThread goes into a loop, efficiently polling the oplog for new documents. Each document corresponds to one operation and contains information such as the time of the operation, the namespace of the operation, what operation was performed, and which documents were affected. Based on the operation recorded in the oplog document, the OplogThread calls the appropriate method of each DocManager. If the operation is an 'upsert', the OplogThread retrieves the inserted or updated document from MongoDB, annotates it with the timestamp and namespace from the corresponding oplog document (in the `_ts` and `ns` fields, respectively), and passes the document along to the DocManager. Lastly, the OplogThread notes the timestamp of the oplog document it read and saves this information as its "checkpoint". The checkpoint acts like a bookmark: it is periodically written out to the oplog progress file ('config.txt' by default) and can be used to fast-forward to the proper place in the oplog when mongo-connector is restarted after being shut down.
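The node-type check from the first step can be sketched as follows. This is a minimal illustration, not mongo-connector's own code: `detect_topology`, `CommandFailed`, and the two fake nodes are hypothetical names, and the sketch assumes that a `mongos` answers `isdbgrid` with a reply containing `isdbgrid: 1` while a `mongod` rejects the command.

```python
class CommandFailed(Exception):
    """Hypothetical stand-in for the driver error raised when a node rejects a command."""


def detect_topology(run_command):
    """Return 'sharded cluster' for a mongos and 'replica set' for a mongod.

    run_command is any callable that sends an admin command by name and
    returns the reply as a dict, raising CommandFailed if the node rejects it.
    """
    try:
        reply = run_command("isdbgrid")
    except CommandFailed:
        # Assumption: a mongod does not accept isdbgrid, so the call fails.
        return "replica set"
    if reply.get("isdbgrid") == 1:
        return "sharded cluster"
    return "replica set"


def fake_mongos(command_name):
    # Simulated reply from a mongos.
    return {"isdbgrid": 1, "ok": 1}


def fake_mongod(command_name):
    # Simulated rejection from a mongod.
    raise CommandFailed("no such command: " + command_name)


print(detect_topology(fake_mongos))
print(detect_topology(fake_mongod))
```

With PyMongo, one would presumably pass something like `client.admin.command` as `run_command` and catch `pymongo.errors.OperationFailure` in place of `CommandFailed`.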
The last step runs until mongo-connector is killed. Some events, such as temporarily losing the connection to MongoDB, a replica set rollback, or falling very far behind in the oplog, can cause the OplogThread to take other actions. These actions will be covered on another page.
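The normal path of the last step (dispatching each oplog document and recording a checkpoint) might look roughly like the sketch below. Several things here are assumptions made for illustration: `RecordingDocManager`, its `upsert` and `remove` methods, `process_entries`, and the JSON layout of the progress file are all hypothetical, timestamps are plain integers rather than BSON timestamps, and a finite list stands in for the tailable cursor. The oplog fields used (`ts`, `op`, `ns`, `o`, `o2`) follow the usual oplog document layout.

```python
import json
import os
import tempfile


class RecordingDocManager:
    """Hypothetical DocManager that stores documents in a dict
    instead of sending them to a target system."""

    def __init__(self):
        self.docs = {}

    def upsert(self, doc):
        self.docs[doc["_id"]] = doc

    def remove(self, doc_id):
        self.docs.pop(doc_id, None)


def process_entries(entries, fetch_document, doc_managers, checkpoint_path):
    """Apply oplog entries to every DocManager and return the last timestamp."""
    checkpoint = None
    for entry in entries:
        op, ns = entry["op"], entry["ns"]
        if op in ("i", "u"):
            # Inserts carry the _id in "o"; updates carry it in "o2".
            doc_id = (entry["o2"] if op == "u" else entry["o"])["_id"]
            # Retrieve the current document from the source, then annotate it.
            doc = fetch_document(ns, doc_id)
            if doc is not None:
                doc = dict(doc, _ts=entry["ts"], ns=ns)
                for manager in doc_managers:
                    manager.upsert(doc)
        elif op == "d":
            for manager in doc_managers:
                manager.remove(entry["o"]["_id"])
        checkpoint = entry["ts"]
    if checkpoint is not None:
        # Write the bookmark out (here once per batch; the file format is made up).
        with open(checkpoint_path, "w") as handle:
            json.dump({"last_ts": checkpoint}, handle)
    return checkpoint


# Simulated source collection and oplog entries.
source = {("test.users", 1): {"_id": 1, "name": "Ada Lovelace"}}
entries = [
    {"ts": 100, "op": "i", "ns": "test.users", "o": {"_id": 1, "name": "Ada"}},
    {"ts": 101, "op": "u", "ns": "test.users",
     "o": {"$set": {"name": "Ada Lovelace"}}, "o2": {"_id": 1}},
    {"ts": 102, "op": "d", "ns": "test.users", "o": {"_id": 2}},
]

manager = RecordingDocManager()
path = os.path.join(tempfile.mkdtemp(), "oplog_progress.json")
last = process_entries(
    entries, lambda ns, doc_id: source.get((ns, doc_id)), [manager], path
)
```

On restart, a process following this sketch would read `last_ts` back from the file and resume reading the oplog from entries newer than that timestamp, which is the "fast-forward" role the checkpoint plays.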