Chewy is an ODM and wrapper for the official Elasticsearch client (https://github.com/elasticsearch/elasticsearch-ruby).
- Multi-model indexes.
  Index classes are independent from ORM/ODM models. Implementing, e.g., cross-model autocomplete is now much easier. You can just define an index and work with it in an object-oriented style. You can define several types per index - one per indexed model.
- Every index is observable by all the related models.
  Most indexed models are related to others, and sometimes it is necessary to denormalize this related data and put it into the same object. For example, you may need to index an array of tags together with an article. Chewy allows you to specify an updatable index for every model separately, so the corresponding articles will be reindexed on any tag update.
- Bulk import everywhere.
  Chewy utilizes the bulk ES API for full reindexing and index updates. It also uses the concept of atomic updates: all the changed objects are collected inside an atomic block, and the index is updated once at the end of it with all the collected objects. See Chewy.atomic for more details.
- Powerful querying DSL.
  Chewy has an AR-style query DSL. It is chainable, mergeable and lazy, so you can produce queries in the most efficient way. It also has object-oriented query and filter builders.
Add this line to your application's Gemfile:
gem 'chewy'
And then execute:
$ bundle
Or install it yourself as:
$ gem install chewy
There are two ways to configure the Chewy client: the Chewy.configuration hash and chewy.yml
# config/initializers/chewy.rb
Chewy.configuration = {host: 'localhost:9250'} # do not use environments
# config/chewy.yml
# separate environment configs
test:
host: 'localhost:9250'
prefix: 'test'
development:
host: 'localhost:9250'
The resulting config merges both hashes. Client options are passed as is to Elasticsearch::Transport::Client, except :prefix,
which is used internally by Chewy to create prefixed index names:
Chewy.configuration = {prefix: 'test'}
UsersIndex.index_name # => 'test_users'
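For instance, with the chewy.yml above providing the host and an initializer providing only the prefix, the effective development options would be roughly the following (an illustration only; see config.rb for the exact merge semantics):

# merged result; :prefix is not passed to Elasticsearch::Transport::Client,
# it is only used for index naming
{host: 'localhost:9250', prefix: 'test'}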
The logger may also be set explicitly:
Chewy.logger = Logger.new(STDOUT)
See config.rb for more details.
- Create app/chewy/users_index.rb
class UsersIndex < Chewy::Index
end
- Add one or more type mappings
class UsersIndex < Chewy::Index
define_type User.active # or just the model instead of a scope: define_type User
end
The newly defined index type class is accessible via UsersIndex.user or UsersIndex::User
- Add some type mappings
class UsersIndex < Chewy::Index
define_type User.active.includes(:country, :badges, :projects) do
field :first_name, :last_name # multiple fields without additional options
field :email, analyzer: 'email' # elasticsearch-related options
field :country, value: ->(user) { user.country.name } # custom value proc
field :badges, value: ->(user) { user.badges.map(&:name) } # passing array values to index
field :projects do # the same block syntax for multi_field, if `:type` is specified
field :title
field :description # default data type is `string`
end
field :rating, type: 'integer' # custom data type
field :created, type: 'date', include_in_all: false,
value: ->{ created_at } # value proc for source object context
end
end
Mapping definitions - http://www.elasticsearch.org/guide/en/elasticsearch/reference/current/mapping.html
- Add some index- and type-related settings. Analyzer repositories might be used as well. See the
Chewy::Index.settings
docs for details:
class UsersIndex < Chewy::Index
settings analysis: {
analyzer: {
email: {
tokenizer: 'keyword',
filter: ['lowercase']
}
}
}
define_type User.active.includes(:country, :badges, :projects) do
root date_detection: false do
template 'about_translations.*', type: 'string', analyzer: 'standard'
field :first_name, :last_name
field :email, analyzer: 'email'
field :country, value: ->(user) { user.country.name }
field :badges, value: ->(user) { user.badges.map(&:name) }
field :projects do
field :title
field :description
end
field :about_translations, type: 'object' # pass the object type explicitly if necessary
field :rating, type: 'integer'
field :created, type: 'date', include_in_all: false,
value: ->{ created_at }
end
end
end
Index settings - http://www.elasticsearch.org/guide/en/elasticsearch/reference/current/indices-update-settings.html Root object settings - http://www.elasticsearch.org/guide/en/elasticsearch/reference/current/mapping-root-object-type.html
See mapping.rb for more details.
- Add model observing code
class User < ActiveRecord::Base
update_index('users#user') { self } # specifying index, type and backreference
# for updating after user save or destroy
end
class Country < ActiveRecord::Base
has_many :users
update_index('users#user') { users } # return single object or collection
end
class Project < ActiveRecord::Base
update_index('users#user') { user if user.active? } # you can return even `nil` from the backreference
end
class Badge < ActiveRecord::Base
has_and_belongs_to_many :users
update_index('users') { users } # if the index has only one type,
# there is no need to specify the updated type
end
Also, you can pass a method name as the second argument:
update_index('users#user', :self)
update_index('users#user', :users)
You are able to access index-defined types with the following API:
UsersIndex::User # => UsersIndex::User
UsersIndex.types_hash['user'] # => UsersIndex::User
UsersIndex.user # => UsersIndex::User
UsersIndex.types # => [UsersIndex::User]
UsersIndex.type_names # => ['user']
UsersIndex.delete # destroys the index if it exists
UsersIndex.delete!
UsersIndex.create
UsersIndex.create! # use bang or non-bang methods
UsersIndex.purge
UsersIndex.purge! # deletes then creates index
UsersIndex::User.import # import with 0 arguments processes all the data specified in the type definition
# literally, User.active.includes(:country, :badges, :projects).find_in_batches
UsersIndex::User.import User.where('rating > 100') # or import specified users scope
UsersIndex::User.import User.where('rating > 100').to_a # or import specified users array
UsersIndex::User.import [1, 2, 42] # pass even ids for import, it will be handled in the most effective way
UsersIndex.import # import every defined type
UsersIndex.import user: User.where('rating > 100') # imports only the given scope to the `user` type.
# Other index types, if any, will be imported with the default scope from the type definition.
UsersIndex.reset! # purges index and imports default data for all types
Also, if the passed user is #destroyed?, responds to #delete_from_index? with true,
or the specified id does not exist in the database, import will perform a delete-from-index action for this object.
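For example, a model can opt out of the index based on its own state. A minimal sketch (the delete_from_index? name comes from the note above; the predicate body and the inactive? check are just illustrative assumptions):

class User < ActiveRecord::Base
  update_index('users#user') { self }

  # If this returns true, import removes the object from the index
  # instead of indexing it (the condition here is hypothetical).
  def delete_from_index?
    inactive?
  end
end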
See actions.rb for more details.
There are three strategies for index updating: do not update the index at all, update right after save, and cumulative update. The first one is the default.
WARN: It is preferred to use the Chewy.atomic block in most cases, due to the performance cost of urgent updates!
By default, Chewy indexes are not updated when the observed model is saved or destroyed.
This is controlled by Chewy.urgent_update (false by default) or by the per-model update config.
If you set Chewy.urgent_update = true, all the models will start to update the elasticsearch index right after save. Also,
class User < ActiveRecord::Base
update_index 'users#user', 'self', urgent: true
end
will have the same effect for the User model only.
Note that urgent update options affect only the behaviour outside of atomic blocks. Inside
a Chewy.atomic { } block, indexes are updated as described below.
To perform atomic cumulative updates, use Chewy.atomic:
Chewy.atomic do
users.each { |user| user.update_attributes(name: user.name.strip) }
end
Index update will be performed once per Chewy.atomic block for every affected type. This strategy is especially useful for Rails actions:
class ApplicationController < ActionController::Base
around_action { |&block| Chewy.atomic(&block) }
end
Atomic blocks might also be nested and don't affect each other.
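A minimal sketch of nesting (User and Country are the models observed earlier; the comments only restate the note above):

Chewy.atomic do
  users.each { |user| user.update_attributes(name: user.name.strip) }

  # a nested block is allowed and does not interfere with the outer one
  Chewy.atomic do
    countries.each { |country| country.update_attributes(name: country.name.strip) }
  end
end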
scope = UsersIndex.query(term: {name: 'foo'})
.filter(range: {rating: {gte: 100}})
.order(created: :desc)
.limit(20).offset(100)
scope.to_a # => will produce an array of UsersIndex::User (or other type) instances
scope.map { |user| user.email }
scope.total_count # => will return total objects count
scope.per(10).page(3) # supports kaminari pagination
scope.explain.map { |user| user._explanation }
scope.only(:id, :email) # returns ids and emails only
scope.merge(other_scope) # queries could be merged
Also, queries can be performed on a type individually:
UsersIndex::User.filter(term: {name: 'foo'}) # will return a UsersIndex::User collection only
If you are performing more than one filter or query in the chain, all the filters and queries will be concatenated in the way specified by filter_mode and query_mode respectively.
Default filter_mode is :and and default query_mode is bool.
Available filter modes are: :and, :or, :must, :should and any minimum_should_match-acceptable value.
Available query modes are: :must, :should, :dis_max, any minimum_should_match-acceptable value, or a float value for a dis_max query with tie_breaker specified.
UsersIndex::User.filter{ name == 'Fred' }.filter{ age < 42 } # will be wrapped with `and` filter
UsersIndex::User.filter{ name == 'Fred' }.filter{ age < 42 }.filter_mode(:should) # will be wrapped with bool `should` filter
UsersIndex::User.filter{ name == 'Fred' }.filter{ age < 42 }.filter_mode('75%') # will be wrapped with bool `should` filter with `minimum_should_match: '75%'`
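query_mode can be set the same way for chained queries. A sketch mirroring the filter examples above (the match queries are just placeholders; the exact wrapping is described in query.rb):

UsersIndex::User.query(match: {name: 'Fred'}).query(match: {bio: 'developer'}).query_mode(:should) # combined into a bool `should` query
UsersIndex::User.query(match: {name: 'Fred'}).query(match: {bio: 'developer'}).query_mode(:dis_max) # combined into a `dis_max` query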
See query.rb for more details.
There is an experimental version of the filter-creating DSL:
UsersIndex.filter{ name == 'Fred' } # will produce `term` filter.
UsersIndex.filter{ age <= 42 } # will produce `range` filter.
The basis of the DSL is the expression. There are two types of expressions:
- Simple functions
  UsersIndex.filter{ s('doc["num"] > 1') } # script expression
  UsersIndex.filter{ q(query_string: {query: 'lazy fox'}) } # query expression
- Field-dependent composite expressions
  Consist of a field name (with or without dot notation), a value, and an action operator between them. The field name might take additional options to pass to the resulting expression.
  UsersIndex.filter{ name == 'Name' } # simple field term filter
  UsersIndex.filter{ name(:bool) == ['Name1', 'Name2'] } # terms query with the `execution: :bool` option passed
  UsersIndex.filter{ answers.title =~ /regexp/ } # regexp filter for the `answers.title` field
You can combine expressions as you wish with the help of combination operators:
UsersIndex.filter{ (name == 'Name') & (email == 'Email') } # combination produces `and` filter
UsersIndex.filter{
must(
should(name =~ 'Fr').should_not(name == 'Fred') & (age == 42), email =~ /gmail\.com/
) | ((roles.admin == true) & name?)
} # many of the combination possibilities
Also, there is a special syntax for cache enabling:
UsersIndex.filter{ ~name == 'Name' } # you can apply a tilde to the field name
UsersIndex.filter{ ~(name == 'Name') } # or to the whole expression
# if you are applying cache to one part of a range filter,
# the whole filter will be cached:
UsersIndex.filter{ ~(age > 42) & (age <= 50) }
# You can pass cache options as a field option also.
UsersIndex.filter{ name(cache: true) == 'Name' }
UsersIndex.filter{ name(cache: false) == 'Name' }
# With a regexp filter you can pass a _cache_key
UsersIndex.filter{ name(cache: 'name_regexp') =~ /Name/ }
# Or not
UsersIndex.filter{ name(cache: true) =~ /Name/ }
Compliance cheatsheet for filters and DSL expressions:
- Term filter
  {"term": {"name": "Fred"}}
  {"not": {"term": {"name": "Johny"}}}

  UsersIndex.filter{ name == 'Fred' }
  UsersIndex.filter{ name != 'Johny' }
- Terms filter
  {"terms": {"name": ["Fred", "Johny"]}}
  {"not": {"terms": {"name": ["Fred", "Johny"]}}}
  {"terms": {"name": ["Fred", "Johny"], "execution": "or"}}
  {"terms": {"name": ["Fred", "Johny"], "execution": "and"}}
  {"terms": {"name": ["Fred", "Johny"], "execution": "bool"}}
  {"terms": {"name": ["Fred", "Johny"], "execution": "fielddata"}}

  UsersIndex.filter{ name == ['Fred', 'Johny'] }
  UsersIndex.filter{ name != ['Fred', 'Johny'] }
  UsersIndex.filter{ name(:|) == ['Fred', 'Johny'] }
  UsersIndex.filter{ name(:or) == ['Fred', 'Johny'] }
  UsersIndex.filter{ name(execution: :or) == ['Fred', 'Johny'] }
  UsersIndex.filter{ name(:&) == ['Fred', 'Johny'] }
  UsersIndex.filter{ name(:and) == ['Fred', 'Johny'] }
  UsersIndex.filter{ name(execution: :and) == ['Fred', 'Johny'] }
  UsersIndex.filter{ name(:b) == ['Fred', 'Johny'] }
  UsersIndex.filter{ name(:bool) == ['Fred', 'Johny'] }
  UsersIndex.filter{ name(execution: :bool) == ['Fred', 'Johny'] }
  UsersIndex.filter{ name(:f) == ['Fred', 'Johny'] }
  UsersIndex.filter{ name(:fielddata) == ['Fred', 'Johny'] }
  UsersIndex.filter{ name(execution: :fielddata) == ['Fred', 'Johny'] }
- Regexp filter (== and =~ are equivalent)
  {"regexp": {"name.first": "s.*y"}}
  {"not": {"regexp": {"name.first": "s.*y"}}}
  {"regexp": {"name.first": {"value": "s.*y", "flags": "ANYSTRING|INTERSECTION"}}}

  UsersIndex.filter{ name.first == /s.*y/ }
  UsersIndex.filter{ name.first =~ /s.*y/ }
  UsersIndex.filter{ name.first != /s.*y/ }
  UsersIndex.filter{ name.first !~ /s.*y/ }
  UsersIndex.filter{ name.first(:anystring, :intersection) == /s.*y/ }
  UsersIndex.filter{ name.first(flags: [:anystring, :intersection]) == /s.*y/ }
- Prefix filter
  {"prefix": {"name": "Fre"}}
  {"not": {"prefix": {"name": "Joh"}}}

  UsersIndex.filter{ name =~ 'Fre' }
  UsersIndex.filter{ name !~ 'Joh' }
- Exists filter
  {"exists": {"field": "name"}}

  UsersIndex.filter{ name? }
  UsersIndex.filter{ !!name }
  UsersIndex.filter{ !!name? }
  UsersIndex.filter{ name != nil }
  UsersIndex.filter{ !(name == nil) }
- Missing filter
  {"missing": {"field": "name", "existence": true, "null_value": false}}
  {"missing": {"field": "name", "existence": true, "null_value": true}}
  {"missing": {"field": "name", "existence": false, "null_value": true}}

  UsersIndex.filter{ !name }
  UsersIndex.filter{ !name? }
  UsersIndex.filter{ name == nil }
- Range
  {"range": {"age": {"gt": 42}}}
  {"range": {"age": {"gte": 42}}}
  {"range": {"age": {"lt": 42}}}
  {"range": {"age": {"lte": 42}}}
  {"range": {"age": {"gt": 40, "lt": 50}}}
  {"range": {"age": {"gte": 40, "lte": 50}}}
  {"range": {"age": {"gt": 40, "lte": 50}}}
  {"range": {"age": {"gte": 40, "lt": 50}}}

  UsersIndex.filter{ age > 42 }
  UsersIndex.filter{ age >= 42 }
  UsersIndex.filter{ age < 42 }
  UsersIndex.filter{ age <= 42 }
  UsersIndex.filter{ age == (40..50) }
  UsersIndex.filter{ (age > 40) & (age < 50) }
  UsersIndex.filter{ age == [40..50] }
  UsersIndex.filter{ (age >= 40) & (age <= 50) }
  UsersIndex.filter{ (age > 40) & (age <= 50) }
  UsersIndex.filter{ (age >= 40) & (age < 50) }
- Bool filter
  {"bool": {
    "must": [{"term": {"name": "Name"}}],
    "should": [{"term": {"age": 42}}, {"term": {"age": 45}}]
  }}

  UsersIndex.filter{ must(name == 'Name').should(age == 42, age == 45) }
- And filter
  {"and": [{"term": {"name": "Name"}}, {"range": {"age": {"lt": 42}}}]}

  UsersIndex.filter{ (name == 'Name') & (age < 42) }
- Or filter
  {"or": [{"term": {"name": "Name"}}, {"range": {"age": {"lt": 42}}}]}

  UsersIndex.filter{ (name == 'Name') | (age < 42) }

- Not filter
  {"not": {"term": {"name": "Name"}}}
  {"not": {"range": {"age": {"lt": 42}}}}

  UsersIndex.filter{ !(name == 'Name') } # or UsersIndex.filter{ name != 'Name' }
  UsersIndex.filter{ !(age < 42) }
- Match all filter
  {"match_all": {}}

  UsersIndex.filter{ match_all }
- Has child filter
  {"has_child": {"type": "blog_tag", "query": {"term": {"tag": "something"}}}}
  {"has_child": {"type": "comment", "filter": {"term": {"user": "john"}}}}

  UsersIndex.filter{ has_child(:blog_tag).query(term: {tag: 'something'}) }
  UsersIndex.filter{ has_child(:comment).filter{ user == 'john' } }
- Has parent filter
  {"has_parent": {"type": "blog", "query": {"term": {"tag": "something"}}}}
  {"has_parent": {"type": "blog", "filter": {"term": {"text": "bonsai three"}}}}

  UsersIndex.filter{ has_parent(:blog).query(term: {tag: 'something'}) }
  UsersIndex.filter{ has_parent(:blog).filter{ text == 'bonsai three' } }
See filters.rb for more details.
Facets are an optional sidechannel you can request from elasticsearch describing certain fields of the resulting collection. The most common use for facets is to allow the user to continue filtering specifically within the subset, as opposed to the global index.
For instance, let's request the country
field as a facet along with our users collection. We can do this with the #facets method like so:
UsersIndex.filter{ [...] }.facets({countries: {terms: {field: 'country'}}})
Let's look at what we asked from elasticsearch. The facets setter method accepts a hash. You can choose custom/semantic key names for this hash for your own convenience (in this case I used the plural version of the actual field), in our case: countries.
The following nested hash tells ES to grab and aggregate values (terms) from the country field on our indexed records.
When the response comes back, it will have the :facets
sidechannel included:
< { ... ,"facets":{"countries":{"_type":"terms","missing":?,"total":?,"other":?,"terms":[{"term":"USA","count":?},{"term":"Brazil","count":?}, ...}}
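To read the counts back in Ruby, the facets can be taken from the scope. A sketch assuming that calling #facets without arguments returns the facets section of the response rather than chaining (check query.rb for the actual accessor):

scope = UsersIndex.filter{ name == 'Fred' }.facets(countries: {terms: {field: 'country'}})

# assumption: a no-argument #facets call reads the response instead of chaining
scope.facets['countries']['terms'].each do |term|
  puts "#{term['term']}: #{term['count']}"
end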
It is possible to load source objects from database for every search result:
scope = UsersIndex.filter(range: {rating: {gte: 100}})
scope.load # => scope is marked to return User instances array
scope.load.query(...) # => since objects are loaded lazily you can complete scope
scope.load(user: { scope: ->{ includes(:country) }}) # you can also pass loading scopes for each
# possibly returned type
scope.load(user: { scope: User.includes(:country) }) # a second way of passing the scope
scope.load(scope: ->{ includes(:country) }) # a common scope applied to every loaded object type
scope.only(:id).load # it is optimal to request ids only if you are not planning to use type objects
The preload method takes the same options as load; ORM/ODM objects will be loaded, but the scope will still return an array of Chewy wrappers. To access the real objects, use the _object wrapper method:
UsersIndex.filter(range: {rating: {gte: 100}}).preload(...).query(...).map(&:_object)
See loading.rb for more details.
Chewy sends ActiveSupport notifications for the following events:
- search_query.chewy payload
  - payload[:index]: requested index class
  - payload[:request]: request hash
- import_objects.chewy payload
  - payload[:type]: currently imported type
  - payload[:import]: import stats, total imported and deleted objects count: {index: 30, delete: 5}
  - payload[:errors]: may be absent. Contains grouped errors with object id lists: {index: { 'error 1 text' => ['1', '2', '3'], 'error 2 text' => ['4'] }, delete: { 'delete error text' => ['10', '12'] }}
To integrate with NewRelic you may use the following example source (config/initializers/chewy.rb):
ActiveSupport::Notifications.subscribe('import_objects.chewy') do |name, start, finish, id, payload|
metrics = "Database/ElasticSearch/import"
duration = (finish - start).to_f
logged = "#{payload[:type]} #{payload[:import].to_a.map{ |i| i.join(':') }.join(', ')}"
self.class.trace_execution_scoped([metrics]) do
NewRelic::Agent.instance.transaction_sampler.notice_sql(logged, nil, duration)
NewRelic::Agent.instance.sql_sampler.notice_sql(logged, metrics, nil, duration)
NewRelic::Agent.instance.stats_engine.record_metrics(metrics, duration)
end
end
ActiveSupport::Notifications.subscribe('search_query.chewy') do |name, start, finish, id, payload|
metrics = "Database/ElasticSearch/search"
duration = (finish - start).to_f
logged = "#{payload[:index]} #{payload[:request]}"
self.class.trace_execution_scoped([metrics]) do
NewRelic::Agent.instance.transaction_sampler.notice_sql(logged, nil, duration)
NewRelic::Agent.instance.sql_sampler.notice_sql(logged, metrics, nil, duration)
NewRelic::Agent.instance.stats_engine.record_metrics(metrics, duration)
end
end
Inside a Rails application, some index-maintaining rake tasks are defined.
rake chewy:reset:all # resets all the existing indexes, declared in app/chewy
rake chewy:reset # alias for chewy:reset:all
rake chewy:reset[users] # resets UsersIndex
rake chewy:update:all # updates all the existing indexes, declared in app/chewy
rake chewy:update # alias for chewy:update:all
rake chewy:update[users] # updates UsersIndex
Just add require 'chewy/rspec'
to your spec_helper.rb and you will get additional features:
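For instance, the update_index matcher lets you assert that a block of code touches a given index type (a sketch; the full set of matcher options is documented in update_index.rb):

# spec/models/user_spec.rb
describe User do
  it 'updates the users index on import' do
    expect { UsersIndex::User.import }.to update_index('users#user')
  end

  it 'does not touch the index when nothing is persisted' do
    expect { User.new(name: 'Fred') }.not_to update_index('users#user')
  end
end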
See update_index.rb for more details.
- Typecasting support
- Advanced (simplified) query DSL: UsersIndex.query { email == '[email protected]' } will produce a term query
- update_all support
- Support for ORMs/ODMs other than ActiveRecord (Mongoid)
- Maybe closer ORM/ODM integration, creating index classes implicitly
- Fork it ( http://github.com/toptal/chewy/fork )
- Create your feature branch (git checkout -b my-new-feature)
- Implement your changes, cover them with specs and make sure old specs are passing
- Commit your changes (git commit -am 'Add some feature')
- Push to the branch (git push origin my-new-feature)
- Create a new Pull Request
Use the following Rake tasks to control the Elasticsearch cluster while developing.
rake elasticsearch:start # start Elasticsearch cluster on 9250 port for tests
rake elasticsearch:stop # stop Elasticsearch