Writing Your Own DocManager
This page details how to write your own `DocManager` class, which allows you to replicate CRUD operations from MongoDB to another indexing system. The first step to creating your own DocManager is to create a new Python module `my_doc_manager.py` and define a class called `DocManager`:

    # filename: my_doc_manager.py
    class DocManager(object):
        '''DocManager that connects to MyIndexingSystem'''
        # methods will go here
There are a number of methods that need to be defined within this class. They are:
- `__init__(self, url, auto_commit_interval=None, unique_key='_id', **kwargs)`
  This is the constructor and should be used to do any setup your client needs in order to communicate with the target system. The parameters are:
  - `url` is the endpoint URL (this might have been provided to the `-t` option).
  - `auto_commit_interval` is the time period, in seconds, between attempts by the DocManager to commit any outstanding changes to the indexing system. A value of `None` indicates that mongo-connector should not attempt to sync any changes automatically.
  - `unique_key` gives the unique key the DocManager should use in the target system for documents. The default is `_id`, the same unique key used by MongoDB.
- `stop(self)`
  This method is called to stop the DocManager. If you started any threads, for example to take care of auto commit, this is the place to `join()` them.
- `upsert(self, doc)`
  This should upsert (i.e., insert or overwrite) the document provided in the `doc` parameter. `doc` is the full document to be upserted.
- `bulk_upsert(self, docs)`
  This is used to insert documents in bulk into the target system during a collection dump. This method is optional, and mongo-connector will fall back to calling `upsert` on each document serially if it is not provided. However, inserting documents in bulk is much more efficient.
- `remove(self, doc)`
  This should remove the document `doc` from the external system. `doc` is the full document to be removed.
- `search(self, start_ts, end_ts)`
  This should provide an iterator over all documents whose timestamp, stored in the `_ts` field, falls between `start_ts` and `end_ts`. This method is called when a MongoDB rollback occurs.
- `commit(self)`
  This method should commit any outstanding changes to the target system.
- `get_last_doc(self)`
  This should return the document most recently modified in the target system.
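Put together, the methods above might look like the following minimal sketch. It is not code from mongo-connector itself. A plain dict (`self.docs`, a hypothetical stand-in) plays the role of the target system, `search` is assumed to be inclusive at both ends, and "most recently modified" is assumed to mean the largest `_ts` value.

```python
# filename: my_doc_manager.py
class DocManager(object):
    '''Sketch of a DocManager; an in-memory dict stands in for the target system.'''

    def __init__(self, url, auto_commit_interval=None, unique_key='_id', **kwargs):
        self.url = url
        self.auto_commit_interval = auto_commit_interval  # not used in this sketch
        self.unique_key = unique_key
        self.docs = {}  # hypothetical stand-in for a real client connection

    def stop(self):
        '''Nothing to shut down: this sketch starts no threads.'''
        pass

    def upsert(self, doc):
        '''Insert doc, or overwrite the stored copy that has the same _id.'''
        stored = dict(doc)
        key = stored.pop('_id')
        stored[self.unique_key] = key  # expose the id under the configured key
        self.docs[key] = stored

    def bulk_upsert(self, docs):
        '''Optional; a real target would batch these into one request.'''
        for doc in docs:
            self.upsert(doc)

    def remove(self, doc):
        '''Remove the stored copy of doc, if there is one.'''
        self.docs.pop(doc['_id'], None)

    def search(self, start_ts, end_ts):
        '''Yield documents whose _ts lies in [start_ts, end_ts] (assumed inclusive).'''
        for stored in list(self.docs.values()):
            if start_ts <= stored['_ts'] <= end_ts:
                yield stored

    def commit(self):
        '''No-op: writes to the dict are visible immediately.'''
        pass

    def get_last_doc(self):
        '''Return the document with the largest _ts, or None if nothing is stored.'''
        if not self.docs:
            return None
        return max(self.docs.values(), key=lambda stored: stored['_ts'])
```

In a real DocManager, each dict operation would be replaced by the corresponding call to the target system's client library.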
It might be helpful to see an example implementation of a `DocManager`. For this, we recommend taking a look at `doc_manager_simulator.py`, which is used in the test suite to mock replicating CRUD operations.
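The `auto_commit_interval` parameter and the `stop` method described above suggest a background thread that commits periodically and is joined on shutdown. The sketch below shows one way this could be arranged with the standard `threading` module. The class name `AutoCommitDocManager`, the `_commit_loop` helper, and the `commit_count` counter are hypothetical names used only for illustration, and the commit itself is a placeholder.

```python
import threading


class AutoCommitDocManager(object):
    '''Sketch: commit every auto_commit_interval seconds on a background thread.'''

    def __init__(self, url, auto_commit_interval=None, unique_key='_id', **kwargs):
        self.url = url
        self.unique_key = unique_key
        self.commit_count = 0  # illustration only: counts calls to commit()
        self._stop_event = threading.Event()
        self._thread = None
        # None means no automatic commits, so no thread is started.
        if auto_commit_interval is not None:
            self._thread = threading.Thread(
                target=self._commit_loop, args=(auto_commit_interval,))
            self._thread.daemon = True
            self._thread.start()

    def _commit_loop(self, interval):
        # Event.wait returns False on timeout and True once stop() sets the event.
        while not self._stop_event.wait(interval):
            self.commit()

    def commit(self):
        # Placeholder: a real DocManager would flush pending changes here.
        self.commit_count += 1

    def stop(self):
        # Wake the loop immediately, then wait for the thread to finish.
        self._stop_event.set()
        if self._thread is not None:
            self._thread.join()
```

Waiting on an `Event` rather than sleeping lets `stop` return promptly instead of waiting out the remainder of the interval.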