This repository has been archived by the owner on Sep 21, 2021. It is now read-only.

Commit

Renamed all dirs/files to order them as in the book
clintongormley committed Jan 20, 2014
1 parent bab523c commit 10ae14f
Showing 156 changed files with 356 additions and 567 deletions.
24 changes: 24 additions & 0 deletions 010_Intro.asciidoc
@@ -0,0 +1,24 @@
[[intro]]
== You know, for Search...

include::010_Intro/00_Intro.asciidoc[]

include::010_Intro/05_What_is_it.asciidoc[]

include::010_Intro/10_Installing_ES.asciidoc[]

include::010_Intro/15_API.asciidoc[]

include::010_Intro/20_Document.asciidoc[]

include::010_Intro/25_CRUD.asciidoc[]

include::010_Intro/30_Search.asciidoc[]

include::010_Intro/35_Mapping.asciidoc[]

include::010_Intro/40_Multi_tenancy.asciidoc[]

include::010_Intro/45_Distributed.asciidoc[]

include::010_Intro/50_Conclusion.asciidoc[]
File renamed without changes.
File renamed without changes.
File renamed without changes.
File renamed without changes.
File renamed without changes.
File renamed without changes.
File renamed without changes.
File renamed without changes.
File renamed without changes.
@@ -34,12 +34,12 @@ keep in mind that shards are how Elasticsearch manages a distributed environment

As you read through this book, you'll encounter supplemental chapters about the
distributed nature of Elasticsearch. These chapters will teach you about
-how the cluster scales and deals with failover (<<_life_inside_a_cluster>>),
-handles document storage (<<distributed-docs>>) and executes searches
-(<<_distributed_search_execution>>).
+how the cluster scales and deals with failover (<<distributed-cluster>>),
+handles document storage (<<distributed-docs>>) and executes searches
+(<<distributed-search>>).

These chapters aren't necessary for working with Elasticsearch, but will provide
helpful information that will make your knowledge of Elasticsearch more complete.
-Feel free to skim them and revisit at a later point when you need a more
+Feel free to skim them and revisit at a later point when you need a more
complete understanding.

File renamed without changes.
File renamed without changes.
@@ -1,3 +1,4 @@
+[[distributed-cluster]]
== Life inside a Cluster

Elasticsearch is built to be always available, and to scale with your needs.
39 changes: 20 additions & 19 deletions Distributed/Search.asciidoc → 020_Distributed/Search.asciidoc
@@ -1,7 +1,8 @@
+[[distributed-search]]
== Distributed Search Execution

Before moving on, we are going to take a detour and talk about how search is
-executed in a distributed environment. Unlike basic CRUD operations (as we
+executed in a distributed environment. Unlike basic CRUD operations (as we
learned in <<distributed-docs>>), search is a little more complicated.

.Content warning
@@ -16,7 +17,7 @@ but don't be overwhelmed by the detail.
****

Search requires a more complicated execution model because we don't know what
-documents will match your query. For simple CRUD operations, we know exactly
+documents will match your query. For simple CRUD operations, we know exactly
what document we are operating on, and more importantly, where to find it.
Using the document's index, type and ID we can immediately determine what shard
the document lives in.
@@ -53,16 +54,16 @@ a large priority queue

When a search request is sent to a node, that node becomes the coordinating node.
It is the job of this node to merge, sort and return search results to the client.
-The first thing the coordinating node does is rebroadcast the query to each
+The first thing the coordinating node does is rebroadcast the query to each
shard in the index.

Each shard now evaluates the query against the documents that reside within that
shard. If a document matches, it is placed in a *priority queue*, which is just
a data structure that maintains the "top N documents" according to the scoring
metric.

-The size of the queue is equivalent to your pagination parameters (`from +
-size`). So if you query with `from: 90` and `size: 10`, each shard will build
+The size of the queue is equivalent to your pagination parameters (`from +
+size`). So if you query with `from: 90` and `size: 10`, each shard will build
a priority queue that is 100 documents long.
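
For illustration, a minimal sketch of such a request (the index is left implicit and the `title` field is assumed, not taken from the surrounding text):

[source,json]
----
GET /_search
{
  "from":  90,
  "size":  10,
  "query": { "match": { "title": "elasticsearch" } }
}
----

With three shards, each shard would maintain a 100-entry priority queue while evaluating this query.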

Once each shard has finished executing the query and has built a queue of
@@ -71,7 +72,7 @@ very lightweight: it is simply the document ID and the `_score` of each doc.

On the coordinating node, we take these candidate lists and merge them together
to form one large priority queue. Since we have three shards being
-queried, and each shard constructed a queue that was 100 documents long, the
+queried, and each shard constructed a queue that was 100 documents long, the
final merged queue will be 300 documents long.

That ends the query phase and we are left with a list of documents that match
@@ -88,7 +89,7 @@ This means that Elasticsearch will choose one shard from each replication group
(either the primary or one of its replicas) and query that. The rest of the
query and fetch process is identical.
-This is why replicas can help with search performance: they spread the query
+This is why replicas can help with search performance: they spread the query
load amongst your cluster.
****

@@ -103,7 +104,7 @@ haven't extracted their source yet. This is the job of the fetch phase.
.Fetch Phase of distributed search
image::images/distributed_search_fetch.png["Fetch Phase of distributed search"]

-1. The coordinating node identifies which documents need to be retrieved and
+1. The coordinating node identifies which documents need to be retrieved and
issues a GET to the shard that holds the required documents

2. Participating shards load the document and send back the source
@@ -125,7 +126,7 @@ will be sent back to the client.

=== Why two round-trips?
You may have noticed that the Query Then Fetch method requires two inter-cluster
-round-trips. Would it be more efficient to just send back the document after
+round-trips. Would it be more efficient to just send back the document after
the query phase? That would remove the need for an extra network round-trip.

A single round-trip method exists, called "Query And Fetch", but it
@@ -136,14 +137,14 @@ the coordinating node (resulting in a final queue size of 300).

If we remove the Fetch phase, we would be forced to load all 300 documents off
disk and send those over the wire to the coordinating node. But since our
-user only wanted 10 results, we would throw away 290 documents that we
+user only wanted 10 results, we would throw away 290 documents that we
painstakingly loaded from disk!

-The overhead of an extra round-trip is often much less than loading a large
+The overhead of an extra round-trip is often much less than loading a large
number of documents from disk...only to throw them away moments later.

Query And Fetch is sometimes used as an internal optimization. If Elasticsearch
-recognizes that only a single shard is being queried, it will
+recognizes that only a single shard is being queried, it will
execute everything in one pass since two phases are not needed.

=== Avoid deep pagination
@@ -161,7 +162,7 @@ both a CPU and memory perspective.
In practice, you don't really want or need deep pagination anyway. Most users
become frustrated after the second page of results...rarely does anyone scroll
to page ten-thousand. Even Google limits search results after a certain number
-of pages.
+of pages.
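
For illustration, a hedged sketch of a deeply paginated request (the index name `my_index` is assumed): each shard would have to build a queue of `from + size`, here 10,000 entries, before the coordinating node could merge and sort them.

[source,json]
----
GET /my_index/_search
{
  "from": 9990,
  "size": 10
}
----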

It is highly recommended to disable "infinite" paging, removing it completely
from your interface.
@@ -170,24 +171,24 @@ from your interface.
****
In addition to removing it from your interface, make sure it is limited in your
application too. Bots have no problem adjusting URLs and paging through your
-entire data set.
+entire data set.
-Alas, Googlebot has been known to take down a cluster or two because of
+Alas, Googlebot has been known to take down a cluster or two because of
enthusiastic pagination.
****

=== Handling failure

-As discussed in <<_life_inside_a_cluster>>, failure can strike your cluster. In
+As discussed in <<distributed-cluster>>, failure can strike your cluster. In
our example we have three nodes and three primary shards...but no replicas. If
a machine were to catch on fire *right now*, you would lose some data.

Does this mean your cluster stops executing search requests until the data is
restored? Absolutely not! It does, however, mean that your search results will
-be incomplete.
+be incomplete.

-The Query Then Fetch process will continue like normal, but the coordinating
-node will make a note that one primary shard is not available for search.
+The Query Then Fetch process will continue like normal, but the coordinating
+node will make a note that one primary shard is not available for search.
Documents will be fetched, sorted and returned to the client. In the search
metadata you will notice that one of the shards is marked as "failed":
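
The actual response listing is collapsed in this diff view; as a rough, hedged illustration, the relevant part of the search response metadata looks something like this:

[source,json]
----
{
  "_shards": {
     "total":      3,
     "successful": 2,
     "failed":     1
  },
  ...
}
----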

File renamed without changes.
2 changes: 1 addition & 1 deletion Data/Document.asciidoc → 030_Data/05_Document.asciidoc
@@ -65,7 +65,7 @@ application is concerned, our documents live in an _index_ -- Elasticsearch
takes care of the details.

****
-We will discuss how to create and manage indices ourselves in <<index-admin>>,
+We will discuss how to create and manage indices ourselves in <<index-management>>,
but for now we will let Elasticsearch create the index for us. All we have
to do is to choose a name, which must be lower case, cannot begin with
an underscore and cannot contain commas, e.g. `website`.
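
For illustration, a minimal sketch (the document body is invented; only the `website` index name comes from the text): indexing a first document is enough to auto-create the index.

[source,json]
----
PUT /website/blog/1
{
  "title": "My first blog entry",
  "text":  "Just trying this out..."
}
----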
File renamed without changes.
File renamed without changes.
File renamed without changes.
File renamed without changes.
File renamed without changes.
File renamed without changes.
File renamed without changes.
File renamed without changes.
File renamed without changes.
File renamed without changes.
27 changes: 27 additions & 0 deletions 030_Data_In_Data_Out.asciidoc
@@ -0,0 +1,27 @@
[[data-in-data-out]]
== Data in, data out

include::030_Data/00_Intro.asciidoc[]

include::030_Data/05_Document.asciidoc[]

include::030_Data/10_Index.asciidoc[]

include::030_Data/15_Get.asciidoc[]

include::030_Data/20_Exists.asciidoc[]

include::030_Data/25_Update.asciidoc[]

include::030_Data/30_Create.asciidoc[]

include::030_Data/35_Delete.asciidoc[]

include::030_Data/40_Version_control.asciidoc[]

include::030_Data/45_Partial_update.asciidoc[]

include::030_Data/50_Mget.asciidoc[]

include::030_Data/55_Bulk.asciidoc[]

45 changes: 45 additions & 0 deletions 050_Search.asciidoc
@@ -0,0 +1,45 @@
[[search]]
== Searching – the basic tools

include::050_Search/00_Intro.asciidoc[]

include::050_Search/05_Empty_search.asciidoc[]

include::050_Search/10_Multi_index_multi_type.asciidoc[]

include::050_Search/15_Pagination.asciidoc[]

include::050_Search/20_Query_string.asciidoc[]

include::050_Search/25_Data_type_differences.asciidoc[]

include::050_Search/30_Exact_vs_full_text.asciidoc[]

include::050_Search/35_Inverted_index.asciidoc[]

include::050_Search/40_Analysis.asciidoc[]

include::050_Search/45_Mapping.asciidoc[]

include::050_Search/50_Complex_datatypes.asciidoc[]

include::050_Search/55_Request_body_search.asciidoc[]

include::050_Search/60_Query_DSL.asciidoc[]

include::050_Search/65_Queries_vs_filters.asciidoc[]

include::050_Search/70_Important_clauses.asciidoc[]

include::050_Search/75_Queries_with_filters.asciidoc[]

include::050_Search/80_Validating_queries.asciidoc[]

include::050_Search/85_Sorting.asciidoc[]

include::050_Search/90_What_is_relevance.asciidoc[]

include::050_Search/95_Fielddata.asciidoc[]

include::050_Search/99_Conclusion.asciidoc[]

File renamed without changes.
File renamed without changes.
File renamed without changes.
File renamed without changes.
File renamed without changes.
File renamed without changes.
File renamed without changes.
@@ -65,8 +65,8 @@ However, request bodies can be quite long and may exceed the maximum
query string length which can be as low as 2,000 bytes.
====

-We will talk about <<_highlighting_matches>>, <<aggregations>>, and
-<<_did_you_mean>> suggestions in later chapters. For now, we're going to focus
+We will talk about <<TODO,highlighting_matches>>, <<aggregations>>, and
+<<TODO,did_you_mean>> suggestions in later chapters. For now, we're going to focus
just on the query.
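
For orientation, a hedged sketch of the kind of request body search introduced below (the `tweet` field is assumed, not taken from the text):

[source,json]
----
GET /_search
{
  "query": {
    "match": { "tweet": "elasticsearch" }
  }
}
----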

Instead of the cryptic query-string approach, request body search allows us
File renamed without changes.
File renamed without changes.
File renamed without changes.
File renamed without changes.
44 changes: 44 additions & 0 deletions 070_Index_Mgmt.asciidoc
@@ -0,0 +1,44 @@
[[index-management]]
== Index Management

We have seen how Elasticsearch makes it easy to start developing a new
application without requiring any advance planning. However, it doesn't take
long before we need to fine-tune the indexing and search process
to better suit particular use cases.

Almost all of these customizations relate to the _index_, and the _types_
which it contains. In this chapter we will discuss the APIs
for managing indices and type _mappings_, and the most important settings.
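
For illustration, a hedged sketch of the kind of customization this chapter covers (the index, type and field names are invented): creating an index with explicit settings and a type mapping in a single request.

[source,json]
----
PUT /my_index
{
  "settings": {
    "number_of_shards": 1
  },
  "mappings": {
    "my_type": {
      "properties": {
        "title": { "type": "string" }
      }
    }
  }
}
----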

include::070_Index_Mgmt/05_Create_Delete.asciidoc[]

include::070_Index_Mgmt/10_Settings.asciidoc[]

include::070_Index_Mgmt/15_Configure_Analyzer.asciidoc[]

include::070_Index_Mgmt/20_Custom_Analyzers.asciidoc[]

include::070_Index_Mgmt/25_Mappings.asciidoc[]

include::070_Index_Mgmt/30_Root_Object.asciidoc[]

include::070_Index_Mgmt/35_Dynamic_Mapping.asciidoc[]

include::070_Index_Mgmt/40_Custom_Dynamic_Mapping.asciidoc[]

include::070_Index_Mgmt/45_Default_Mapping.asciidoc[]

include::070_Index_Mgmt/50_Reindexing.asciidoc[]

include::070_Index_Mgmt/55_Aliases.asciidoc[]

include::070_Index_Mgmt/60_Reindex_Optimizations.asciidoc[]

include::070_Index_Mgmt/65_Conclusion.asciidoc[]







@@ -35,7 +35,7 @@ action.auto_create_index: false


****
-Later, we will discuss how you can use <<_index_templates_2>>
+Later, we will discuss how you can use <<index-templates>>
to pre-configure automatically created indices. This is particularly
useful when indexing log data, allowing you to roll over to a new
automatically created index every day.
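
For illustration, a hedged sketch assuming the Elasticsearch 1.x template API and an invented `logs-*` naming pattern: a template like this pre-configures any automatically created index whose name matches the pattern.

[source,json]
----
PUT /_template/my_logs
{
  "template": "logs-*",
  "settings": {
    "number_of_shards": 1
  }
}
----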
File renamed without changes.
File renamed without changes.
@@ -38,10 +38,10 @@ We will discuss other field types such as `multi_field`, `ip`, `geo_point`,
`geo_shape`, and `binary` in the appropriate sections later
in the book.

-include::Metadata_source.asciidoc[]
+include::31_Metadata_source.asciidoc[]

-include::Metadata_all.asciidoc[]
+include::32_Metadata_all.asciidoc[]

-include::Metadata_ID.asciidoc[]
+include::33_Metadata_ID.asciidoc[]

-include::Metadata_Other.asciidoc[]
+include::34_Metadata_Other.asciidoc[]
File renamed without changes.
File renamed without changes.
@@ -40,8 +40,8 @@ In <<pagination>> we said that deep paging in a distributed system is very
expensive and should be avoided. But in order to reindex all of our data,
we need to retrieve every document in the old index!

-The costly part of deep pagination is the global sorting of results (see
-<<_distributed_search_execution>> for more technical details). But if
+The costly part of deep pagination is the global sorting of results (see
+<<distributed-search>> for more technical details). But if
we disable sorting then we can return all documents quite cheaply. To do
this, we use a special search mode called `scan`.
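
For illustration, a hedged sketch of a scan request assuming the 1.x `scan` search type and an invented `old_index` name; sorting is disabled and results are pulled back in scroll batches:

[source,json]
----
GET /old_index/_search?search_type=scan&scroll=1m
{
  "query": { "match_all": {} },
  "size":  1000
}
----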

File renamed without changes.
File renamed without changes.
File renamed without changes.
File renamed without changes.
File renamed without changes.
