
Writing Your Own DocManager

Luke Lovett edited this page Apr 2, 2014 · 16 revisions

This page details how to write your own DocManager class, which allows you to replicate CRUD operations from MongoDB to another indexing system. The first step to creating your own DocManager is to create a new Python module my_doc_manager.py and define a class called DocManager:

# filename: my_doc_manager.py

class DocManager(object):
    '''DocManager that connects to MyIndexingSystem'''

    # methods will go here

There are a number of methods that need to be defined within this class. They are:

  1. __init__(self, url, auto_commit_interval=None, unique_key='_id', **kwargs)

    This is the constructor. Use it for any setup your client needs to communicate with the target system. The parameters are:

    • url is the endpoint URL of the target system (this may be the value provided to the -t option).
    • auto_commit_interval is the time, in seconds, between attempts by the DocManager to commit any outstanding changes to the indexing system. A value of None means that changes should not be committed automatically.
    • unique_key gives the unique key the DocManager should use for documents in the target system. The default is _id, the same unique key used by MongoDB.
  2. stop(self)

    This method is called to stop the DocManager. If you started any threads, for example to take care of auto commit, this is the place to join() them.

  3. upsert(self, doc)

    This should upsert (i.e., insert, or overwrite if it already exists) the document provided in the doc parameter. doc is the full document to be upserted.

  4. bulk_upsert(self, docs)

    This is used to insert documents in bulk into the target system during a collection dump. This method is optional: if it is not provided, mongo-connector falls back to calling upsert on each document serially. However, inserting documents in bulk is much more efficient.

  5. remove(self, doc)

    This should remove the document doc from the external system. doc is the full document to be removed.

  6. search(self, start_ts, end_ts)

    This should return an iterator over all documents whose timestamp, stored in the _ts field, falls between start_ts and end_ts. This method is called when a MongoDB rollback occurs.

  7. commit(self)

    This method should commit any outstanding changes to the target system.

  8. get_last_doc(self)

    This should return the document most recently modified in the target system.
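To make the method list concrete, here is a minimal in-memory DocManager sketch. It is not the doc_manager_simulator.py mentioned on this page, and several details are assumptions made only for this illustration: every document carries its unique key (by default _id) and a numeric _ts field; changes are buffered until commit() and only committed documents are visible to search() and get_last_doc(); the range check in search() is inclusive; get_last_doc() returns None when nothing has been committed; and "memory://example" is a placeholder URL.

```python
import threading


class DocManager(object):
    '''Minimal in-memory DocManager sketch (hypothetical, for illustration).'''

    def __init__(self, url, auto_commit_interval=None, unique_key='_id',
                 **kwargs):
        # A real DocManager would open a client connection to `url` here.
        self.url = url
        self.unique_key = unique_key
        self.auto_commit_interval = auto_commit_interval
        self._docs = {}      # committed documents, keyed by unique_key
        self._pending = {}   # uncommitted changes; None marks a removal
        self._lock = threading.Lock()
        self._stop_event = threading.Event()
        self._committer = None
        if auto_commit_interval is not None:
            # Background thread that commits periodically.
            self._committer = threading.Thread(target=self._auto_commit)
            self._committer.daemon = True
            self._committer.start()

    def _auto_commit(self):
        # wait() returns True once stop() sets the event, ending the loop;
        # on each timeout it returns False and we commit.
        while not self._stop_event.wait(self.auto_commit_interval):
            self.commit()

    def stop(self):
        # Signal the auto-commit thread (if any) and join it.
        self._stop_event.set()
        if self._committer is not None:
            self._committer.join()

    def upsert(self, doc):
        with self._lock:
            self._pending[doc[self.unique_key]] = dict(doc)

    def bulk_upsert(self, docs):
        # Take the lock once for the whole batch.
        with self._lock:
            for doc in docs:
                self._pending[doc[self.unique_key]] = dict(doc)

    def remove(self, doc):
        with self._lock:
            self._pending[doc[self.unique_key]] = None

    def commit(self):
        # Apply buffered upserts and removals to the committed store.
        with self._lock:
            for key, doc in self._pending.items():
                if doc is None:
                    self._docs.pop(key, None)
                else:
                    self._docs[key] = doc
            self._pending.clear()

    def search(self, start_ts, end_ts):
        # Inclusive range on _ts; only committed documents are searched.
        with self._lock:
            hits = [d for d in self._docs.values()
                    if start_ts <= d['_ts'] <= end_ts]
        return iter(hits)

    def get_last_doc(self):
        # Committed document with the largest _ts, or None if empty.
        with self._lock:
            if not self._docs:
                return None
            return max(self._docs.values(), key=lambda d: d['_ts'])


# Usage: replicate a few operations, then query.
dm = DocManager('memory://example')
dm.upsert({'_id': 1, '_ts': 100, 'name': 'a'})
dm.bulk_upsert([{'_id': 2, '_ts': 200, 'name': 'b'},
                {'_id': 3, '_ts': 300, 'name': 'c'}])
dm.remove({'_id': 3, '_ts': 300, 'name': 'c'})
dm.commit()
print([d['_id'] for d in dm.search(150, 350)])  # [2]
print(dm.get_last_doc()['_id'])                 # 2
dm.stop()
```

In this sketch the thread started in __init__ calls commit() every auto_commit_interval seconds, and stop() wakes it through a threading.Event before calling join(), so shutdown does not have to wait out a full interval.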
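Because bulk_upsert is optional, a caller has to check whether it exists before using it. The snippet below is a hypothetical illustration of that fallback, not mongo-connector's own code; upsert_all, UpsertOnly and WithBulk are invented names.

```python
def upsert_all(doc_manager, docs):
    '''Hypothetical helper: send docs to doc_manager, preferring bulk_upsert.'''
    bulk = getattr(doc_manager, 'bulk_upsert', None)
    if callable(bulk):
        bulk(docs)              # one call for the whole batch
    else:
        for doc in docs:        # fall back to one upsert per document
            doc_manager.upsert(doc)


class UpsertOnly(object):
    '''Toy DocManager that only implements upsert; records its calls.'''

    def __init__(self):
        self.calls = []

    def upsert(self, doc):
        self.calls.append(('upsert', doc['_id']))


class WithBulk(UpsertOnly):
    '''Toy DocManager that also implements bulk_upsert.'''

    def bulk_upsert(self, docs):
        self.calls.append(('bulk_upsert', [d['_id'] for d in docs]))


docs = [{'_id': 1}, {'_id': 2}]
a, b = UpsertOnly(), WithBulk()
upsert_all(a, docs)
upsert_all(b, docs)
print(a.calls)  # [('upsert', 1), ('upsert', 2)]
print(b.calls)  # [('bulk_upsert', [1, 2])]
```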

It might be helpful to see an example implementation of a DocManager. For this, we recommend taking a look at doc_manager_simulator.py, which is used in the test suite to mock replicating CRUD operations.
