Usage with Solr

Setup

Mongo Connector stores metadata in each document to help handle rollbacks. To support this metadata, add the following fields to your schema.xml:

```
<field name="_ts" type="long" indexed="true" stored="true" />
<field name="ns" type="string" indexed="true" stored="true" />
```
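
One way to confirm that a schema declares both metadata fields before starting the connector is to parse it and look for them. The following is a minimal sketch and not part of Mongo Connector: the function name `missing_metadata_fields` and the trimmed example schema are made up for illustration, and only explicit `<field>` elements are inspected.

```python
import xml.etree.ElementTree as ET

# Metadata fields that the setup step above asks you to declare.
REQUIRED_FIELDS = {"_ts", "ns"}


def missing_metadata_fields(schema_xml):
    """Return the required metadata field names that the schema does not declare."""
    root = ET.fromstring(schema_xml.strip())
    declared = {field.get("name") for field in root.iter("field")}
    return REQUIRED_FIELDS - declared


# Hypothetical, heavily trimmed schema used only to exercise the function.
EXAMPLE_SCHEMA = """
<schema name="example" version="1.5">
  <fields>
    <field name="_id" type="string" indexed="true" stored="true" />
    <field name="_ts" type="long" indexed="true" stored="true" />
  </fields>
</schema>
"""

print(sorted(missing_metadata_fields(EXAMPLE_SCHEMA)))  # prints ['ns']
```

An empty result means both fields are declared; the Solr core still has to be reloaded after any change to schema.xml.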

The Basics

Mongo Connector can replicate to the Solr search engine using the Solr DocManager. The most basic usage is the following:

```
mongo-connector -m localhost:27017 -t http://localhost:8983/solr -d solr_doc_manager
```

Older usage (before the 2.0 release):

mongo-connector -m localhost:27017 -t http://localhost:8983/solr -d <your-doc-manager-folder>/solr_doc_manager.py

This assumes that a MongoDB replica set is running on port 27017 and that Solr is running on port 8983, both on the local machine.

Mongo Connector and schema.xml

Mongo Connector comes with an example schema.xml file that can help you get started integrating MongoDB with Solr search. Solr reads schema.xml to find field types, the fields that documents may have, the primary key, and more. Mongo Connector tries to obtain the schema from Solr through the LukeRequestHandler by appending the URI admin/luke/?show=schema&wt=json to the base Solr URL. In the example above, it sends a GET request to http://localhost:8983/solr/admin/luke/?show=schema&wt=json.
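
To see what that request returns for your own core, the same URL can be built and fetched by hand. This is a minimal sketch using only the Python standard library; the helper names `luke_schema_url` and `fetch_schema` are hypothetical and are not Mongo Connector functions.

```python
import json
import urllib.request

# Path and query string that are appended to the base Solr URL.
LUKE_SUFFIX = "admin/luke/?show=schema&wt=json"


def luke_schema_url(base_url):
    """Build the schema URL for a given base Solr URL."""
    return base_url.rstrip("/") + "/" + LUKE_SUFFIX


def fetch_schema(base_url, timeout=10):
    """Send the GET request and decode the JSON response (needs a running Solr)."""
    with urllib.request.urlopen(luke_schema_url(base_url), timeout=timeout) as response:
        return json.loads(response.read().decode("utf-8"))


print(luke_schema_url("http://localhost:8983/solr"))
# prints http://localhost:8983/solr/admin/luke/?show=schema&wt=json
```

Calling `fetch_schema("http://localhost:8983/solr")` only works when Solr is reachable at that address, so it is left out of the example run.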

Mongo Connector drops fields from MongoDB documents that aren't declared in your Solr core's schema, so that Solr does not throw exceptions and fail to insert those documents. Until you define the fields you want in schema.xml and reload the Solr core, Mongo Connector will merrily continue stripping the undeclared fields from your MongoDB documents. You can check what Solr considers the schema of your core to be by visiting the admin/luke/?show=schema&wt=json endpoint in your browser.
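
The effect of this stripping can be illustrated with a few lines of Python. This is a simplified sketch, not the connector's own code: `strip_undeclared_fields` is a hypothetical name, the field names are invented, and details a real schema may involve (such as dynamic field patterns or nested documents) are ignored.

```python
def strip_undeclared_fields(doc, declared_fields):
    """Return a copy of doc that keeps only the fields declared in the schema."""
    return {name: value for name, value in doc.items() if name in declared_fields}


# Fields assumed to be declared in schema.xml for this illustration.
declared = {"_id", "_ts", "ns", "title"}

mongo_doc = {"_id": "1", "title": "Solr", "rating": 5}

# "rating" is not declared, so it does not reach Solr.
print(strip_undeclared_fields(mongo_doc, declared))
# prints {'_id': '1', 'title': 'Solr'}
```

If a field such as `rating` is missing from your search results, a missing declaration in schema.xml is a likely cause.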

Unique Keys between Solr and MongoDB

MongoDB uses a field called _id to store unique keys in documents. Solr by default uses id for the same purpose. Both systems require their key field to be present in every document, so submitting a document unchanged from MongoDB to Solr while Solr's unique key is still id results in an exception from Solr, and the document is not inserted. For Mongo Connector to replicate to Solr successfully, Solr needs to see the expected unique key in each document. There are two ways to achieve this:

1. Mongo Connector can translate _id to id when operations are replicated to Solr if you specify the option --unique-key=id to mongo-connector. The new id field will hold a string-ified version of what was stored in the _id field.
2. You can switch Solr's unique key to _id instead of id. If you're working from the schema.xml provided as part of Mongo Connector, this is already done for you! Otherwise, you can accomplish this by editing the schema.xml file and replacing the line:
```
 <uniqueKey>id</uniqueKey>
```
with the line:
```
 <uniqueKey>_id</uniqueKey>
```
You'll also need to add a field definition for this key. Inside the <fields></fields> tags, you should insert:
```
 <field name="_id" type="string" indexed="true" stored="true" />
```
Finally, you'll need to reload your Solr core.
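
For the first approach (passing --unique-key=id), the translation amounts to renaming the key and converting its value to a string. The sketch below only illustrates that idea under stated assumptions: `to_solr_doc` is a hypothetical helper, not the connector's implementation, and the sample document is invented.

```python
def to_solr_doc(mongo_doc, unique_key="id"):
    """Copy a MongoDB document, moving _id to the Solr unique key as a string."""
    solr_doc = dict(mongo_doc)  # leave the caller's document untouched
    solr_doc[unique_key] = str(solr_doc.pop("_id"))
    return solr_doc


print(to_solr_doc({"_id": 42, "name": "example"}))
# prints {'name': 'example', 'id': '42'}
```

Because the value is stored as a string, the matching field in schema.xml would normally use a string type.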

Managing Commit Behavior

Mongo Connector does not force a commit on every write operation; rather, a Solr administrator should configure commit behavior in solrconfig.xml. This generally increases overall performance, since not every operation has to be flushed to disk immediately.

Mongo Connector also provides the --auto-commit-interval option, which overrides any commit setting in solrconfig.xml, though configuring commits in solrconfig.xml should be preferred where possible. The option takes a number: the maximum number of seconds allowed before a write must be committed. A value of 0 means that every write operation is committed immediately:

```
# commit every write immediately
mongo-connector --auto-commit-interval=0 -d solr_doc_manager -t http://localhost:8983/solr
```
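
The three cases (option not given, 0, or a positive number of seconds) can be pictured as a mapping onto Solr's standard `commit` and `commitWithin` update parameters. This is a sketch under assumptions: `commit_params` is a hypothetical function, and the exact way Mongo Connector translates the option internally is assumed here, not taken from its source.

```python
def commit_params(auto_commit_interval):
    """Map an auto-commit interval in seconds to Solr update request parameters."""
    if auto_commit_interval is None:
        # Option not given: leave commit behavior to solrconfig.xml.
        return {}
    if auto_commit_interval == 0:
        # Commit every write immediately.
        return {"commit": "true"}
    # Solr's commitWithin is expressed in milliseconds.
    return {"commitWithin": str(int(auto_commit_interval * 1000))}


print(commit_params(None))  # prints {}
print(commit_params(0))     # prints {'commit': 'true'}
print(commit_params(5))     # prints {'commitWithin': '5000'}
```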
