Usage with Elasticsearch
NOTE: in mongo-connector versions < 2.3, the elastic doc manager was packaged as part of mongo-connector and only supports Elastic 1.x. In mongo-connector versions >= 2.3, the doc managers for Elastic 1.x and 2.x are available as plugins. For more information on how to install the elastic doc managers, please see the Elastic doc manager documentation for the version of Elastic you prefer. These doc managers will only work with mongo-connector 2.3.0+.
Elastic 1.x doc manager: https://github.com/mongodb-labs/elastic-doc-manager
Elastic 2.x doc manager: https://github.com/mongodb-labs/elastic2-doc-manager
Once the Elastic doc manager of your choice is installed, the following applies to running it:
New in Mongo Connector 2.5.0
The install command is different depending on the version of Elasticsearch you are targeting.
New in elastic2-doc-manager 0.3.0: support for Elasticsearch 5.x. Install with `pip install 'mongo-connector[elastic5]'`, but continue to use `elastic2_doc_manager` as the doc manager module name.
Elasticsearch Version | Install Command |
---|---|
Elasticsearch 1.x | pip install 'mongo-connector[elastic]' |
Amazon Elasticsearch 1.x Service | pip install 'mongo-connector[elastic-aws]' |
Elasticsearch 2.x | pip install 'mongo-connector[elastic2]' |
Amazon Elasticsearch 2.x Service | pip install 'mongo-connector[elastic2-aws]' |
Elasticsearch 5.x | pip install 'mongo-connector[elastic5]' |
Mongo Connector can replicate to Elasticsearch using the Elastic DocManager. The most basic usage is the following:
mongo-connector -m localhost:27017 -t localhost:9200 -d elastic_doc_manager
Or, if you are using the Elastic2 DocManager:
mongo-connector -m localhost:27017 -t localhost:9200 -d elastic2_doc_manager
Old usage (before the 2.0 release):
mongo-connector -m localhost:27017 -t localhost:9200 -d <your-doc-manager-folder>/elastic_doc_manager.py
This assumes there is a MongoDB replica set running on port 27017 and that Elasticsearch is running on port 9200, both on the local machine.
Mongo Connector gives each MongoDB database its own index in Elasticsearch. Each MongoDB collection becomes its own mapping type. For example, documents from the collection `kittens` in the database `animals` will be put into the `animals` index in Elasticsearch with a mapping type of `kittens`. Mongo Connector also stores metadata in another index, called `mongodb_meta` by default (this can be configured by setting `meta_index_name` in the `args` document in the doc manager config).
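The naming rule above can be sketched in a few lines of Python 3. This is an illustration only, not the doc manager's actual code; the helper name `index_and_type` is hypothetical.

```python
def index_and_type(namespace):
    """Split a MongoDB namespace ("database.collection") into the
    Elasticsearch index and mapping type described above.

    Hypothetical helper, for illustration only.
    """
    # Split on the first dot only: collection names may contain dots
    # (for example "fs.files"), but database names may not.
    database, collection = namespace.split(".", 1)
    return database, collection


index, doc_type = index_and_type("animals.kittens")
print(index, doc_type)
```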
You can set up all the indexes you want in advance, or you can have Mongo Connector create them automatically for you. If you want Mongo Connector to be able to create indexes automatically, make sure that `action.auto_create_index` is set to `true` in your `elasticsearch.yml`.
Elasticsearch, like MongoDB, supports geographical field types and queries. In order to make geo queries to Elasticsearch, you must set up a mapping on Elasticsearch manually, before running mongo-connector. The dynamic mappings that Mongo Connector creates on first insert are not enough for Elasticsearch to detect geo field types. Please refer to the Elasticsearch documentation on setting up geospatial mapping types for points and shapes.
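As a rough sketch of what such a manual mapping might look like, the Python 3 snippet below builds a mapping body that declares one field as `geo_point`. The field name `location` and the reuse of the `animals`/`kittens` names are assumptions for illustration, and the body shape follows the typed mappings of Elasticsearch 1.x and 2.x; check the Elasticsearch documentation for the version you run.

```python
import json


def geo_point_mapping(doc_type, field):
    """Build a mapping body that marks `field` as a geo_point.

    Hypothetical helper; `doc_type` is the mapping type, which
    Mongo Connector derives from the MongoDB collection name.
    """
    return {doc_type: {"properties": {field: {"type": "geo_point"}}}}


# "location" is an assumed field name holding the coordinates.
body = json.dumps(geo_point_mapping("kittens", "location"))
print(body)
```

On Elasticsearch 1.x and 2.x, a body like this would typically be sent with a `PUT` to `http://localhost:9200/animals/_mapping/kittens` after the `animals` index has been created and before mongo-connector is started.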
New in Mongo Connector 2.0
Starting in version 2.0, Mongo Connector can replicate files stored in GridFS to Elasticsearch using the `attachment` mapping type. In order for this to work, you need to do the following:
- Install the attachment plugin.
- Create the index where you will store your GridFS documents:
  curl -XPUT http://localhost:9200/myindex
- Create a mapping corresponding to the MongoDB collection where your GridFS files are stored, and add a field called `content` with a type of `attachment`. For example, if your GridFS files are in the `fs` collection in MongoDB, you will want to create a `fs` mapping in Elasticsearch:
  curl -XPUT http://localhost:9200/myindex/collection.fs/_mapping -d'{ "fs": { "properties": { "content": {"type": "attachment"} }}}'
Mongo Connector does not force a refresh for every write operation. The Elasticsearch administrator may configure refresh behavior to increase overall performance. You can configure how often Elasticsearch indexes are refreshed by changing `refresh_interval`, either in the index module settings or in the index settings.
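For the per-index route, the following Python 3 sketch prepares (but does not send) an update-settings request using only the standard library. The helper name, the index name `animals`, and the `30s` interval are assumptions chosen for illustration.

```python
import json
import urllib.request


def refresh_interval_request(base_url, index, interval):
    """Prepare a PUT to <base_url>/<index>/_settings that changes
    refresh_interval. Hypothetical helper; nothing is sent here.
    """
    body = json.dumps({"index": {"refresh_interval": interval}})
    return urllib.request.Request(
        url="%s/%s/_settings" % (base_url.rstrip("/"), index),
        data=body.encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="PUT",
    )


req = refresh_interval_request("http://localhost:9200", "animals", "30s")
print(req.get_method(), req.full_url)
# To actually apply the setting: urllib.request.urlopen(req)
```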
Mongo Connector also provides the `--auto-commit-interval` option to override any configuration in Elasticsearch, though configuring refresh behavior in Elasticsearch should be preferred to this option. The option takes a number: the maximum number of seconds allowed before a write must be committed. An argument of `0` means that every write operation is committed immediately:
# commit every write immediately (this was the old behavior)
mongo-connector --auto-commit-interval=0 -d elastic_doc_manager -t localhost:9200
The Elastic DocManager wraps the Python Elastic client and allows you to pass arbitrary options to its constructor in the mongo-connector config file. The constructor options are passed in a JSON object under the key `args.clientOptions`. For example, if you wish to set the `timeout` option to 200:
...
"docManagers": [
{
"docManager": "elastic_doc_manager",
"targetURL": "localhost:9200",
"args": {
"clientOptions": {"timeout": 200}
}
}
]
...
This results in the Elastic client in `elastic_doc_manager.py` being created as:
Elasticsearch(hosts=["localhost:9200"], timeout=200)
New in Elastic Doc Managers 0.3.0, support for connecting to multiple Elasticsearch hosts:
...
"docManagers": [
{
"docManager": "elastic2_doc_manager",
"targetURL": ["host1:9200", "host2:9200"],
"args": {
"clientOptions": {"timeout": 200}
}
}
]
...
This results in the Elastic client in `elastic2_doc_manager.py` being created as:
Elasticsearch(hosts=["host1:9200", "host2:9200"], timeout=200)
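The way `targetURL` and `clientOptions` combine into constructor arguments can be approximated as follows. This Python 3 sketch is not the doc manager's source; the helper name `client_kwargs` is hypothetical, and it only mirrors the two constructor calls shown above.

```python
def client_kwargs(target_url, client_options=None):
    """Combine targetURL and args.clientOptions into keyword
    arguments for the Elasticsearch client constructor.

    Hypothetical helper, for illustration only.
    """
    # targetURL may be one "host:port" string or a list of them.
    if isinstance(target_url, str):
        hosts = [target_url]
    else:
        hosts = list(target_url)
    kwargs = {"hosts": hosts}
    # Every key under clientOptions becomes a constructor keyword.
    kwargs.update(client_options or {})
    return kwargs


print(client_kwargs("localhost:9200", {"timeout": 200}))
print(client_kwargs(["host1:9200", "host2:9200"], {"timeout": 200}))
```

With the `elasticsearch` package installed, `Elasticsearch(**client_kwargs(...))` would then correspond to the constructor calls shown in this section.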
To replicate to Amazon Elasticsearch Service, you must install the Elastic DocManager with extra dependencies:
pip install mongo-connector[elastic-aws]
Or, if you are using the Elastic2 DocManager:
pip install mongo-connector[elastic2-aws]
Both Elasticsearch versions that Amazon Elasticsearch Service provides at the time of this writing, 1.5.2 and 2.3, are supported: by elastic-doc-manager (for 1.x) and elastic2-doc-manager (for 2.x), respectively. The `args` option for each doc manager accepts an `aws` object with a required `region_name` key and, optionally, an `aws_access_key_id` and `aws_secret_access_key` and/or a `profile_name` corresponding to your AWS credentials. More specifically, a `boto3.session.Session` is created with the keyword arguments from the `aws` object.
Note: If `aws_access_key_id` and `aws_secret_access_key` are unspecified, the credentials found in the `~/.aws/credentials` file (if any) for the user running mongo-connector will be used instead, which may be the behavior that you want, particularly if you've already used the AWS CLI.
The following example uses Amazon Elasticsearch Service running Elasticsearch 1.5.2 with elastic-doc-manager. The same `args` apply to both `elastic-doc-manager` and `elastic2-doc-manager`.
...
"docManagers": [
{
"docManager": "elastic_doc_manager",
"targetURL": "https://search-my-domain-29tg824978g24924t42.us-east-1.es.amazonaws.com/",
"args": {
"aws": {
"region_name": "us-east-1",
"aws_access_key_id": "ACCESS_ID",
"aws_secret_access_key": "SECRET_KEY"
}
}
}
]
...
As stated above, the `aws_access_key_id` and `aws_secret_access_key` arguments are optional if you are using credentials stored in `~/.aws/credentials` for the user running the `mongo-connector` instance.
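Before starting mongo-connector, it can help to sanity-check the `aws` object from the config file. The Python 3 sketch below is a hypothetical pre-flight check, not part of either doc manager: it enforces the required `region_name` key and, as an extra assumption of this sketch, rejects an access key ID given without its secret key (or the reverse).

```python
def check_aws_args(aws):
    """Validate the "aws" object and return a copy of it, i.e. the
    keyword arguments intended for boto3.session.Session.

    Hypothetical helper; boto3 itself is not imported or called.
    """
    if not aws.get("region_name"):
        raise ValueError("aws.region_name is required")
    # Assumption of this sketch: the two key settings only make
    # sense as a pair, so flag a config that sets just one.
    has_id = "aws_access_key_id" in aws
    has_secret = "aws_secret_access_key" in aws
    if has_id != has_secret:
        raise ValueError(
            "aws_access_key_id and aws_secret_access_key "
            "must be set together"
        )
    return dict(aws)


# Relies on ~/.aws/credentials, as described above.
print(check_aws_args({"region_name": "us-east-1"}))
```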
The error `TransportError(404, u'index_not_found_exception')` occurs if mongo-connector makes a request to an Elastic index that doesn't exist. If `action.auto_create_index` is `true` in your `elasticsearch.yml`, then mongo-connector can create indexes automatically for you. However, you can also create all the indexes ahead of time yourself. You'll need to create:
- A `mongodb_meta` index (used internally by mongo-connector)
- One index for each database being copied by mongo-connector
Do this before running mongo-connector to prevent this error.
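The list of indexes to create ahead of time can be derived from the MongoDB namespaces you plan to replicate. The Python 3 sketch below is illustrative only; the helper name and the sample namespaces are hypothetical, and `mongodb_meta` is the default metadata index name mentioned earlier on this page.

```python
def indexes_to_create(namespaces, meta_index_name="mongodb_meta"):
    """Return the sorted index names to create in advance: one per
    MongoDB database, plus the metadata index.

    Hypothetical helper, for illustration only.
    """
    # The database is the part of the namespace before the first dot.
    databases = {ns.split(".", 1)[0] for ns in namespaces}
    return sorted(databases | {meta_index_name})


names = indexes_to_create(
    ["animals.kittens", "animals.puppies", "shop.orders"]
)
print(names)
```

Each name could then be created with a request such as `curl -XPUT http://localhost:9200/<name>`.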