clean up cluster-level shard allocation and routing settings
shainaraskas committed Dec 30, 2024
1 parent b32842e commit a885a0f
Showing 12 changed files with 74 additions and 74 deletions.
2 changes: 1 addition & 1 deletion docs/reference/cat/nodeattrs.asciidoc
@@ -17,7 +17,7 @@ console. They are _not_ intended for use by applications. For application
consumption, use the <<cluster-nodes-info,nodes info API>>.
====

Returns information about <<shard-allocation-filtering,custom node attributes>>.
Returns information about <<custom-node-attributes,custom node attributes>>.

[[cat-nodeattrs-api-request]]
==== {api-request-title}
2 changes: 1 addition & 1 deletion docs/reference/cluster.asciidoc
@@ -35,7 +35,7 @@ one of the following:
master-eligible nodes, all data nodes, all ingest nodes, all voting-only
nodes, all machine learning nodes, and all coordinating-only nodes.
* a pair of patterns, using `*` wildcards, of the form `attrname:attrvalue`,
which adds to the subset all nodes with a custom node attribute whose name
which adds to the subset all nodes with a <<custom-node-attributes,custom node attribute>> whose name
and value match the respective patterns. Custom node attributes are
configured by setting properties in the configuration file of the form
`node.attr.attrname: attrvalue`.
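For illustration, a minimal sketch of using such a pattern in a node filter, assuming a custom `rack` attribute has been set on some nodes (the attribute name and value are hypothetical):

[source,console]
--------------------------------------------------
GET /_nodes/rack:rack_*
--------------------------------------------------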
@@ -2,7 +2,7 @@
[[migrate-index-allocation-filters]]
== Migrate index allocation filters to node roles

If you currently use custom node attributes and
If you currently use <<custom-node-attributes,custom node attributes>> and
<<shard-allocation-filtering, attribute-based allocation filters>> to
move indices through <<data-tiers, data tiers>> in a
https://www.elastic.co/blog/implementing-hot-warm-cold-in-elasticsearch-with-index-lifecycle-management[hot-warm-cold architecture],
2 changes: 2 additions & 0 deletions docs/reference/data-store-architecture.asciidoc
@@ -12,11 +12,13 @@ The topics in this section provide information about the architecture of {es} a
* <<node-roles-overview,Node roles>>: Learn about the different roles that nodes can have in an {es} cluster.
* <<docs-replication,Reading and writing documents>>: Learn how {es} replicates read and write operations across shards and shard copies.
* <<shard-allocation-relocation-recovery,Shard allocation, relocation, and recovery>>: Learn how {es} allocates and balances shards across nodes.
** <<shard-allocation-awareness,Shard allocation awareness>>: Learn how to use custom node attributes to distribute shards across different racks or availability zones.
* <<shard-request-cache,Shard request cache>>: Learn how {es} caches search requests to improve performance.
--
include::nodes-shards.asciidoc[]
include::node-roles.asciidoc[]
include::docs/data-replication.asciidoc[leveloffset=-1]
include::modules/shard-ops.asciidoc[]
include::modules/cluster/allocation_awareness.asciidoc[leveloffset=+1]
include::shard-request-cache.asciidoc[leveloffset=-1]
2 changes: 1 addition & 1 deletion docs/reference/ilm/apis/migrate-to-data-tiers.asciidoc
@@ -5,7 +5,7 @@
<titleabbrev>Migrate indices, ILM policies, and legacy, composable and component templates to data tiers routing</titleabbrev>
++++

Switches the indices, ILM policies, and legacy, composable and component templates from using custom node attributes and
Switches the indices, ILM policies, and legacy, composable and component templates from using <<custom-node-attributes,custom node attributes>> and
<<shard-allocation-filtering, attribute-based allocation filters>> to using <<data-tiers, data tiers>>, and
optionally deletes one legacy index template.
Using node roles enables {ilm-init} to <<data-tier-migration, automatically move the indices>> between
2 changes: 1 addition & 1 deletion docs/reference/index-modules/allocation/filtering.asciidoc
@@ -6,7 +6,7 @@ a particular index. These per-index filters are applied in conjunction with
<<cluster-shard-allocation-filtering, cluster-wide allocation filtering>> and
<<shard-allocation-awareness, allocation awareness>>.

Shard allocation filters can be based on custom node attributes or the built-in
Shard allocation filters can be based on <<custom-node-attributes,custom node attributes>> or the built-in
`_name`, `_host_ip`, `_publish_ip`, `_ip`, `_host`, `_id`, `_tier` and `_tier_preference`
attributes. <<index-lifecycle-management, Index lifecycle management>> uses filters based
on custom node attributes to determine how to reallocate shards when moving
18 changes: 17 additions & 1 deletion docs/reference/modules/cluster.asciidoc
@@ -27,7 +27,23 @@ include::cluster/shards_allocation.asciidoc[]

include::cluster/disk_allocator.asciidoc[]

include::cluster/allocation_awareness.asciidoc[]
[[shard-allocation-awareness-settings]]
==== Shard allocation awareness settings

You can use <<custom-node-attributes,custom node attributes>> as _awareness attributes_ to enable {es}
to take your physical hardware configuration into account when allocating shards.
If {es} knows which nodes are on the same physical server, in the same rack, or
in the same zone, it can distribute the primary shard and its replica shards to
minimize the risk of losing all shard copies in the event of a failure. <<shard-allocation-awareness,Learn more about shard allocation awareness>>.

`cluster.routing.allocation.awareness.attributes`::
(<<dynamic-cluster-setting,Dynamic>>)
The node attributes that {es} should use as awareness attributes. For example, if you have a `rack_id` attribute that specifies the rack in which each node resides, you can set this setting to `rack_id` to ensure that primary and replica shards are not allocated on the same rack. You can specify multiple attributes as a comma-separated list.
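As a minimal sketch, assuming nodes already carry a `rack_id` attribute, the awareness attribute can be applied dynamically through the cluster settings API:

[source,console]
--------------------------------------------------
PUT _cluster/settings
{
  "persistent": {
    "cluster.routing.allocation.awareness.attributes": "rack_id"
  }
}
--------------------------------------------------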

`cluster.routing.allocation.awareness.force.*`::
(<<dynamic-cluster-setting,Dynamic>>)
The shard allocation awareness values that must exist for shards to be reallocated in case of location failure. Learn more about <<forced-awareness,forced awareness>>.
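For example, a sketch of forcing awareness across two zones, assuming each node sets a `zone` attribute (the values `zone1` and `zone2` are illustrative):

[source,console]
--------------------------------------------------
PUT _cluster/settings
{
  "persistent": {
    "cluster.routing.allocation.awareness.attributes": "zone",
    "cluster.routing.allocation.awareness.force.zone.values": "zone1,zone2"
  }
}
--------------------------------------------------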


include::cluster/allocation_filtering.asciidoc[]

15 changes: 5 additions & 10 deletions docs/reference/modules/cluster/allocation_awareness.asciidoc
@@ -1,18 +1,13 @@
[[shard-allocation-awareness]]
==== Shard allocation awareness
== Shard allocation awareness

You can use custom node attributes as _awareness attributes_ to enable {es}
to take your physical hardware configuration into account when allocating shards.
If {es} knows which nodes are on the same physical server, in the same rack, or
in the same zone, it can distribute the primary shard and its replica shards to
minimize the risk of losing all shard copies in the event of a failure.

When shard allocation awareness is enabled with the
<<dynamic-cluster-setting,dynamic>>
`cluster.routing.allocation.awareness.attributes` setting, shards are only
allocated to nodes that have values set for the specified awareness attributes.
If you use multiple awareness attributes, {es} considers each attribute
separately when allocating shards.
When shard allocation awareness is enabled with the `cluster.routing.allocation.awareness.attributes` setting, shards are only allocated to nodes that have values set for the specified awareness attributes. If you use multiple awareness attributes, {es} considers each attribute separately when allocating shards.

NOTE: The number of attribute values determines how many shard copies are
allocated in each location. If the number of nodes in each location is
@@ -22,11 +17,11 @@ unassigned.
TIP: Learn more about <<high-availability-cluster-design-large-clusters,designing resilient clusters>>.

[[enabling-awareness]]
===== Enabling shard allocation awareness
=== Enabling shard allocation awareness

To enable shard allocation awareness:

. Specify the location of each node with a custom node attribute. For example,
. Specify the location of each node with a <<custom-node-attributes,custom node attribute>>. For example,
if you want Elasticsearch to distribute shards across different racks, you might
use an awareness attribute called `rack_id`.
+
@@ -94,7 +89,7 @@ copies of a particular shard from being allocated in the same location, you can
enable forced awareness.

[[forced-awareness]]
===== Forced awareness
=== Forced awareness

By default, if one location fails, {es} spreads its shards across the remaining
locations. This might be undesirable if the cluster does not have sufficient
@@ -6,7 +6,7 @@ allocates shards from any index. These cluster wide filters are applied in
conjunction with <<shard-allocation-filtering, per-index allocation filtering>>
and <<shard-allocation-awareness, allocation awareness>>.

Shard allocation filters can be based on custom node attributes or the built-in
Shard allocation filters can be based on <<custom-node-attributes,custom node attributes>> or the built-in
`_name`, `_host_ip`, `_publish_ip`, `_ip`, `_host`, `_id` and `_tier` attributes.
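For example, a minimal sketch of excluding a node from cluster-wide allocation by IP address (the address shown is purely illustrative):

[source,console]
--------------------------------------------------
PUT _cluster/settings
{
  "persistent": {
    "cluster.routing.allocation.exclude._ip": "10.0.0.1"
  }
}
--------------------------------------------------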

The `cluster.routing.allocation` settings are <<dynamic-cluster-setting,dynamic>>, enabling live indices to
76 changes: 22 additions & 54 deletions docs/reference/modules/cluster/disk_allocator.asciidoc
@@ -41,6 +41,23 @@ on the affected node drops below the high watermark, {es} automatically removes
the write block. Refer to <<fix-watermark-errors,Fix watermark errors>> to
resolve persistent watermark errors.

[NOTE]
.Max headroom settings
===================================================
Max headroom settings apply only when watermark settings are percentages or ratios.
A max headroom value is intended to cap the required free disk space before hitting
the respective watermark. This is useful for servers with larger disks, where a percentage or ratio watermark could translate to an overly large free disk space requirement. In this case, the max headroom can be used to cap the required free disk space amount.
For example, where `cluster.routing.allocation.disk.watermark.flood_stage` is 95% and `cluster.routing.allocation.disk.watermark.flood_stage.max_headroom` is 100GB, this means that:
* For a smaller disk, e.g., of 100GB, the flood watermark will hit at 95%, meaning at 5GB of free space, since 5GB is smaller than the 100GB max headroom value.
* For a larger disk, e.g., of 100TB, the flood watermark will hit at 100GB of free space. That is because the 95% flood watermark alone would require 5TB of free disk space, but is capped by the max headroom setting to 100GB.
Max headroom settings have their default values only if their respective watermark settings are not explicitly set. If watermarks are explicitly set, then the max headroom settings do not have their default values, and need to be explicitly set if they are needed.
===================================================
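As an illustrative sketch only (the values mirror the defaults described above, not a recommendation), a percentage watermark and its max headroom can be set together:

[source,console]
--------------------------------------------------
PUT _cluster/settings
{
  "persistent": {
    "cluster.routing.allocation.disk.watermark.flood_stage": "95%",
    "cluster.routing.allocation.disk.watermark.flood_stage.max_headroom": "100GB"
  }
}
--------------------------------------------------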

[[disk-based-shard-allocation-does-not-balance]]
[TIP]
====
@@ -100,18 +117,7 @@ is now `true`. The setting will be removed in a future release.
+
--
(<<dynamic-cluster-setting,Dynamic>>)
Controls the flood stage watermark, which defaults to 95%. {es} enforces a read-only index block (`index.blocks.read_only_allow_delete`) on every index that has one or more shards allocated on the node, and that has at least one disk exceeding the flood stage. This setting is a last resort to prevent nodes from running out of disk space. The index block is automatically released when the disk utilization falls below the high watermark. Similarly to the low and high watermark values, it can alternatively be set to a ratio value, e.g., `0.95`, or an absolute byte value.

An example of resetting the read-only index block on the `my-index-000001` index:

[source,console]
--------------------------------------------------
PUT /my-index-000001/_settings
{
"index.blocks.read_only_allow_delete": null
}
--------------------------------------------------
// TEST[setup:my_index]
Controls the flood stage watermark, which defaults to 95%. {es} enforces a read-only index block (<<index-block-settings,`index.blocks.read_only_allow_delete`>>) on every index that has one or more shards allocated on the node, and that has at least one disk exceeding the flood stage. This setting is a last resort to prevent nodes from running out of disk space. The index block is automatically released when the disk utilization falls below the high watermark. Similarly to the low and high watermark values, it can alternatively be set to a ratio value, e.g., `0.95`, or an absolute byte value.
--
// end::cluster-routing-flood-stage-tag[]

@@ -121,10 +127,10 @@ Defaults to 100GB when
`cluster.routing.allocation.disk.watermark.flood_stage` is not explicitly set.
This caps the amount of free space required.

NOTE: You cannot mix the usage of percentage/ratio values and byte values across
NOTE: You can't mix the usage of percentage/ratio values and byte values across
the `cluster.routing.allocation.disk.watermark.low`, `cluster.routing.allocation.disk.watermark.high`,
and `cluster.routing.allocation.disk.watermark.flood_stage` settings. Either all values
are set to percentage/ratio values, or all are set to byte values. This enforcement is
must be set to percentage/ratio values, or all must be set to byte values. This is required
so that {es} can validate that the settings are internally consistent, ensuring that the
low disk threshold is less than the high disk threshold, and the high disk threshold is
less than the flood stage threshold. A similar comparison check is done for the max
@@ -150,44 +156,6 @@ set. This caps the amount of free space required on dedicated frozen nodes.
cluster. Defaults to `30s`.

NOTE: Percentage values refer to used disk space, while byte values refer to
free disk space. This can be confusing, since it flips the meaning of high and
free disk space. This can be confusing, because it flips the meaning of high and
low. For example, it makes sense to set the low watermark to 10gb and the high
watermark to 5gb, but not the other way around.

An example of updating the low watermark to at least 100 gigabytes free, a high
watermark of at least 50 gigabytes free, and a flood stage watermark of 10
gigabytes free, and updating the information about the cluster every minute:

[source,console]
--------------------------------------------------
PUT _cluster/settings
{
"persistent": {
"cluster.routing.allocation.disk.watermark.low": "100gb",
"cluster.routing.allocation.disk.watermark.high": "50gb",
"cluster.routing.allocation.disk.watermark.flood_stage": "10gb",
"cluster.info.update.interval": "1m"
}
}
--------------------------------------------------

Concerning the max headroom settings for the watermarks, please note
that these apply only in the case that the watermark settings are percentages/ratios.
The aim of a max headroom value is to cap the required free disk space before hitting
the respective watermark. This is especially useful for servers with larger
disks, where a percentage/ratio watermark could translate to a big free disk space requirement,
and the max headroom can be used to cap the required free disk space amount.
As an example, let us take the default settings for the flood watermark.
It has a 95% default value, and the flood max headroom setting has a default value of 100GB.
This means that:

* For a smaller disk, e.g., of 100GB, the flood watermark will hit at 95%, meaning at 5GB
of free space, since 5GB is smaller than the 100GB max headroom value.
* For a larger disk, e.g., of 100TB, the flood watermark will hit at 100GB of free space.
That is because the 95% flood watermark alone would require 5TB of free disk space, but
that is capped by the max headroom setting to 100GB.

Finally, the max headroom settings have their default values only if their respective watermark
settings are not explicitly set (thus, they have their default percentage values).
If watermarks are explicitly set, then the max headroom settings do not have their default values,
and would need to be explicitly set if they are desired.
watermark to 5gb, but not the other way around.
21 changes: 20 additions & 1 deletion docs/reference/modules/node.asciidoc
@@ -120,6 +120,25 @@ modify the contents of the data directory. The data directory contains no
executables so a virus scan will only find false positives.
// end::modules-node-data-path-warning-tag[]

[[custom-node-attributes]]
==== Custom node attributes

If needed, you can add custom attributes to a node. These attributes can be used to <<cluster-routing-settings,filter which nodes a shard can be allocated to>>, or to group nodes together for <<shard-allocation-awareness,shard allocation awareness>>.

[TIP]
===============================================
You can also set a node attribute using the `-E` command line argument when you start a node:
[source,sh]
--------------------------------------------------------
./bin/elasticsearch -Enode.attr.rack_id=rack_one
--------------------------------------------------------
===============================================

`node.attr.<attribute-name>`::
(<<dynamic-cluster-setting,Dynamic>>)
A custom attribute that you can assign to a node. For example, you might assign a `rack_id` attribute to each node to ensure that primary and replica shards are not allocated on the same rack. You can specify multiple attributes as a comma-separated list.
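To check which custom attributes are currently set across the cluster, one option is the cat nodeattrs API; a minimal sketch:

[source,console]
--------------------------------------------------
GET /_cat/nodeattrs?v=true
--------------------------------------------------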

[discrete]
[[other-node-settings]]
=== Other node settings
@@ -129,4 +148,4 @@ including:

* <<cluster-name,`cluster.name`>>
* <<node-name,`node.name`>>
* <<modules-network,network settings>>
* <<modules-network,network settings>>
4 changes: 2 additions & 2 deletions docs/reference/modules/shard-ops.asciidoc
@@ -25,7 +25,7 @@ By default, the primary and replica shard copies for an index can be allocated t

You can control how shard copies are allocated using the following settings:

- <<modules-cluster,Cluster-level shard allocation settings>>: Use these settings to control how shard copies are allocated and balanced across the entire cluster. For example, you might want to allocate nodes availability zones, or prevent certain nodes from being used so you can perform maintenance.
- <<modules-cluster,Cluster-level shard allocation settings>>: Use these settings to control how shard copies are allocated and balanced across the entire cluster. For example, you might want to <<shard-allocation-awareness,distribute shards across availability zones>>, or prevent certain nodes from being used so you can perform maintenance.

- <<index-modules-allocation,Index-level shard allocation settings>>: Use these settings to control how the shard copies for a specific index are allocated. For example, you might want to allocate an index to a node in a specific data tier, or to a node with specific attributes.

@@ -80,4 +80,4 @@ When a shard copy is relocated, it is created as a new shard copy on the target

You can control how and when shard copies are relocated. For example, you can adjust the rebalancing settings that control when shard copies are relocated to balance the cluster, or the high watermark for disk-based shard allocation that can trigger relocation. These settings are part of the <<modules-cluster,cluster-level shard allocation settings>>.
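For instance, an illustrative sketch of pausing rebalancing during maintenance (`cluster.routing.rebalance.enable` accepts `all`, `primaries`, `replicas`, or `none`):

[source,console]
--------------------------------------------------
PUT _cluster/settings
{
  "persistent": {
    "cluster.routing.rebalance.enable": "none"
  }
}
--------------------------------------------------

Setting the value back to `all`, or to `null`, restores the default behavior.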

Shard relocation operations also respect shard allocation and recovery settings.
Shard relocation operations also respect shard allocation and recovery settings.
