- how to install a cluster: https://www.elastic.co/blog/running-elasticsearch-on-aws
- benchmarking with Rally: https://www.elastic.co/blog/announcing-rally-benchmarking-for-elasticsearch
GET /my-index-000001/_count?q=user:kimchy
GET /my-index-000001/_count
{
  "query" : {
    "term" : { "user.id" : "kimchy" }
  }
}
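The same count can be issued from Ruby with the elasticsearch-ruby client (a minimal sketch; assumes a client like Article.__elasticsearch__.client, which these notes use further down):

client = Article.__elasticsearch__.client
client.count(
  index: 'my-index-000001',
  body:  { query: { term: { 'user.id' => 'kimchy' } } }
)['count'] # => Integer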
or use _search with "size": 0:
GET /my-index/_search
{ "query": { "match_all": {} }, "size": 0, "track_total_hits": true }

GET /my-index/_search
{
  "query": {
    "bool": {
      "filter": [
        { "term": { "name": "niki" } }
      ]
    }
  },
  "size": 0,
  "track_total_hits": true
}
note: set "size" to 1 if you want a sample doc in the result
https://www.elastic.co/guide/en/elasticsearch/reference/current/cluster-nodes-stats.html
get CPU & memory usage:
GET /_nodes/stats/process
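The same stats can be pulled from Ruby (a sketch; the response shape follows the nodes stats docs linked above):

client = Article.__elasticsearch__.client
stats  = client.nodes.stats(metric: 'process')
stats['nodes'].each do |_id, node|
  cpu = node['process']['cpu']['percent']
  fds = node['process']['open_file_descriptors']
  puts "#{node['name']}: CPU #{cpu}%, open file descriptors #{fds}"
end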
Filters are cached.
term queries must target a not_analyzed (keyword, in current versions) field.
why a term query doesn't return any results, and how filters are cached
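A hedged illustration of the not_analyzed point (assumes the default dynamic mapping, where a text field name also gets a name.keyword sub-field):

client = Article.__elasticsearch__.client

# no hits: "name" is analyzed text, so the value was lowercased and tokenized at index time
client.search(index: 'articles', body: { query: { term: { name: 'Niki Novak' } } })

# hits: the keyword sub-field stores the exact, unanalyzed value
client.search(index: 'articles', body: { query: { term: { 'name.keyword' => 'Niki Novak' } } })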
- https://www.elastic.co/blog/customizing-your-document-routing
- https://www.elastic.co/guide/en/elasticsearch/reference/current/mapping-routing-field.html#mapping-routing-field
- https://www.youtube.com/watch?v=YSd1aV3iVhM
- https://www.youtube.com/watch?v=PgMtklprDfc
- https://www.elastic.co/guide/en/elasticsearch/reference/current/keyword.html
Given articles is the name of the index.
to check the mapping:
GET /articles/_mappings
create a new index with the desired mapping:
PUT articles_v2
{
  "mappings": {
    "properties": {
      "tags": {
        "type": "keyword"
      }
    }
  }
}
copy the index over:
POST _reindex
{
  "source": {
    "index": "articles"
  },
  "dest": {
    "index": "articles_v2"
  }
}
The POST request may time out, but the reindex keeps running in the background. It took about 30 minutes to copy 13 million simple documents.
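To sidestep the client timeout entirely, the reindex can be started asynchronously and polled through the tasks API (a minimal sketch with the elasticsearch-ruby client; index names as above):

client = Article.__elasticsearch__.client
task = client.reindex(
  body: { source: { index: 'articles' }, dest: { index: 'articles_v2' } },
  wait_for_completion: false # returns a task handle immediately instead of blocking
)
client.tasks.get(task_id: task['task'])['completed'] # poll until true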
to check progress, check the total number of documents in the new index:
GET /articles_v2/_search
{"query":{
"match_all": {}
},"size":0,"track_total_hits":true}
a.k.a. batch insert
def bulk_insert(articles)
  Article.__elasticsearch__.client.bulk(
    index:   Article.index_name,
    # pass _id so re-running the import updates documents instead of duplicating them
    body:    articles.map { |a| { index: { _id: a.id, data: a.as_indexed_json } } },
    refresh: true # makes the docs searchable immediately; expensive for large imports
  )
end
bulk_insert(Article.last(1000))
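To push a whole table through without loading it all into memory, feed the helper in slices (assumes Article is an ActiveRecord model):

Article.find_in_batches(batch_size: 1000) do |batch|
  bulk_insert(batch)
end

For big imports it is noticeably cheaper to drop refresh: true from the helper and refresh once at the end.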
- https://www.elastic.co/guide/en/elasticsearch/reference/current/docs-bulk.html
- https://www.elastic.co/guide/en/elasticsearch/reference/current/indices-create-index.html
- How to Reindex One Billion Documents in One Hour at SoundCloud
- Tune for indexing speed (elastic.co)
you need to bulk insert data into ES
To handle requests well you want an optimal number of primary shards (but that's a static setting: once the index is created, it cannot be changed).
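For example, the primary shard count can only be chosen at index creation time (a sketch; the count of 6 is purely illustrative):

Article.__elasticsearch__.client.indices.create(
  index: 'articles_v2',
  body:  { settings: { index: { number_of_shards: 6 } } } # static: cannot be changed later
)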
Stuff you can do without creating a new index (a.k.a. dynamic settings) to lower CPU & memory during bulk insert and therefore speed up data throughput to ES:
PUT /my-index-000001/_settings
{
  "index" : {
    "refresh_interval" : -1
  }
}
When -1 is set, the index is not refreshed automatically. The default value is "1s" (a 1-second refresh interval); you can provide any interval, e.g. "30s".
PUT /my-index-000001/_settings
{
  "index" : {
    "number_of_replicas" : 0
  }
}
When 0 is set, no data is replicated to replica shards; set it back to a value > 0 to re-enable replication.
PUT /my-index-000001/_settings
{
  "index" : {
    "translog" : {
      "durability" : "async"
    }
  }
}
Once the bulk insert is done, set index.translog.durability back to "request" to ensure the translog is synced to disk after each request.
PUT /my-index-000001/_settings
{
  "index" : {
    "translog" : {
      "flush_threshold_size" : "2gb"
    }
  }
}
The translog stores all operations that are not yet safely persisted in Lucene (i.e., are not part of a Lucene commit point). Although these operations are available for reads, they will need to be replayed if the shard was stopped and had to be recovered. This setting controls the maximum total size of these operations, to prevent recoveries from taking too long. Once the maximum size has been reached, a flush happens, generating a new Lucene commit point. Defaults to 512mb.
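Putting the dynamic settings together: a bulk load typically loosens them going in and restores them afterwards (a Ruby sketch under the assumptions above; the restore values are the documented defaults, and number_of_replicas should go back to whatever the index had before):

client = Article.__elasticsearch__.client
index  = 'my-index-000001'

# loosen settings for the bulk load
client.indices.put_settings(index: index, body: {
  index: { refresh_interval: -1, number_of_replicas: 0, translog: { durability: 'async' } }
})

# ... run the bulk inserts here ...

# restore the defaults once the load is done
client.indices.put_settings(index: index, body: {
  index: { refresh_interval: '1s', number_of_replicas: 1, translog: { durability: 'request' } }
})
client.indices.refresh(index: index) # make everything searchable right away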
=========================================================================================================================================================
Total number of items in ES:
Elasticsearch::Model.client.count(index: Work.__elasticsearch__.index_name)['count']
Article.search("cats", search_type: 'count').results.total
Elasticsearch::Model.client.count(index: 'your_index_name_here')['count']
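client.count also accepts a query body, so matches can be counted without fetching hits (a sketch; the title/'cats' pair is illustrative):

Elasticsearch::Model.client.count(
  index: Article.index_name,
  body:  { query: { match: { title: 'cats' } } }
)['count']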
# match all - return all
{
  query: {
    match_all: {}
  }
}
search priority:
# look at the ^num: a bigger number means a bigger priority (boost)
query[:query][:filtered][:query] = {
  multi_match: {
    query:  terms.join(" "),
    fields: ['tags^4', 'school_title^5', 'title^3', 'description^1'],
    type:   "most_fields"
  }
}
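Note that the filtered query was removed in ES 5; the equivalent today wraps the same multi_match in a bool query (a sketch keeping the boosts above; the published filter clause is hypothetical):

query[:query] = {
  bool: {
    must: {
      multi_match: {
        query:  terms.join(" "),
        fields: ['tags^4', 'school_title^5', 'title^3', 'description^1'],
        type:   "most_fields"
      }
    },
    filter: [
      { term: { published: true } } # hypothetical non-scoring filter clause
    ]
  }
}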
def self.search(query, options={})
  __set_filters = lambda do |key, f|
    @search_definition[:filter][:and] ||= []
    @search_definition[:filter][:and] |= [f]
  end

  @search_definition = {
    query: {},
    filter: {},
  }

  unless query.blank?
    @search_definition[:query] = {
      bool: {
        should: [
          {
            multi_match: {
              query: query,
              fields: ['title^10', 'body'],
              operator: 'and',
              analyzer: 'russian_morphology_custom'
            }
          }
        ]
      }
    }
    @search_definition[:sort] = { updated_at: 'desc' }
    # without this parameter the default is 10
    @search_definition[:size] = 100
  else
    @search_definition[:query] = { match_all: {} }
    @search_definition[:sort] = { updated_at: 'desc' }
  end

  __elasticsearch__.search(@search_definition)
end
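Typical usage of the method above (assuming it lives on Article as elsewhere in these notes; the response helpers come from elasticsearch-model):

response = Article.search("cats")
response.results.total # total hit count
response.records.to_a  # the matching ActiveRecord objects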
# index one record
MyModel.last.__elasticsearch__.index_document
MyModel.import # reindex all
# only reindex some records
MyModel.import query: -> { where(id: MyModel.some_scope.pluck(:id)) }
MyModel.__elasticsearch__.create_index! # create the index
MyModel.__elasticsearch__.delete_index! # delete the index
# ...or
MyModel.__elasticsearch__.client.indices.delete index: MyModel.index_name
check version
curl localhost:9200
get everything
curl -XGET elasticsearch:9200/
delete an index
curl -XDELETE elasticsearch:9200/mymodel