clean up cluster-level shard allocation and routing settings
shainaraskas committed Dec 30, 2024
1 parent b32842e commit a885a0f
Showing 12 changed files with 74 additions and 74 deletions.
2 changes: 1 addition & 1 deletion docs/reference/cat/nodeattrs.asciidoc
@@ -17,7 +17,7 @@ console. They are _not_ intended for use by applications. For application
consumption, use the <<cluster-nodes-info,nodes info API>>.
====

Returns information about <<shard-allocation-filtering,custom node attributes>>.
Returns information about <<custom-node-attributes,custom node attributes>>.

[[cat-nodeattrs-api-request]]
==== {api-request-title}
2 changes: 1 addition & 1 deletion docs/reference/cluster.asciidoc
@@ -35,7 +35,7 @@ one of the following:
master-eligible nodes, all data nodes, all ingest nodes, all voting-only
nodes, all machine learning nodes, and all coordinating-only nodes.
* a pair of patterns, using `*` wildcards, of the form `attrname:attrvalue`,
which adds to the subset all nodes with a custom node attribute whose name
which adds to the subset all nodes with a <<custom-node-attributes,custom node attribute>> whose name
and value match the respective patterns. Custom node attributes are
configured by setting properties in the configuration file of the form
`node.attr.attrname: attrvalue`.
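For illustration, a minimal sketch of using such a pattern in a node filter, assuming a custom `rack` attribute has been set on some nodes (the attribute name and value are hypothetical):

[source,console]
--------------------------------------------------
GET /_nodes/rack:rack_*
--------------------------------------------------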
@@ -2,7 +2,7 @@
[[migrate-index-allocation-filters]]
== Migrate index allocation filters to node roles

If you currently use custom node attributes and
If you currently use <<custom-node-attributes,custom node attributes>> and
<<shard-allocation-filtering, attribute-based allocation filters>> to
move indices through <<data-tiers, data tiers>> in a
https://www.elastic.co/blog/implementing-hot-warm-cold-in-elasticsearch-with-index-lifecycle-management[hot-warm-cold architecture],
2 changes: 2 additions & 0 deletions docs/reference/data-store-architecture.asciidoc
@@ -12,11 +12,13 @@ The topics in this section provide information about the architecture of {es} a
* <<node-roles-overview,Node roles>>: Learn about the different roles that nodes can have in an {es} cluster.
* <<docs-replication,Reading and writing documents>>: Learn how {es} replicates read and write operations across shards and shard copies.
* <<shard-allocation-relocation-recovery,Shard allocation, relocation, and recovery>>: Learn how {es} allocates and balances shards across nodes.
** <<shard-allocation-awareness,Shard allocation awareness>>: Learn how to use custom node attributes to distribute shards across different racks or availability zones.
* <<shard-request-cache,Shard request cache>>: Learn how {es} caches search requests to improve performance.
--
include::nodes-shards.asciidoc[]
include::node-roles.asciidoc[]
include::docs/data-replication.asciidoc[leveloffset=-1]
include::modules/shard-ops.asciidoc[]
include::modules/cluster/allocation_awareness.asciidoc[leveloffset=+1]
include::shard-request-cache.asciidoc[leveloffset=-1]
2 changes: 1 addition & 1 deletion docs/reference/ilm/apis/migrate-to-data-tiers.asciidoc
@@ -5,7 +5,7 @@
<titleabbrev>Migrate indices, ILM policies, and legacy, composable and component templates to data tiers routing</titleabbrev>
++++

Switches the indices, ILM policies, and legacy, composable and component templates from using custom node attributes and
Switches the indices, ILM policies, and legacy, composable and component templates from using <<custom-node-attributes,custom node attributes>> and
<<shard-allocation-filtering, attribute-based allocation filters>> to using <<data-tiers, data tiers>>, and
optionally deletes one legacy index template.
Using node roles enables {ilm-init} to <<data-tier-migration, automatically move the indices>> between
2 changes: 1 addition & 1 deletion docs/reference/index-modules/allocation/filtering.asciidoc
@@ -6,7 +6,7 @@ a particular index. These per-index filters are applied in conjunction with
<<cluster-shard-allocation-filtering, cluster-wide allocation filtering>> and
<<shard-allocation-awareness, allocation awareness>>.

Shard allocation filters can be based on custom node attributes or the built-in
Shard allocation filters can be based on <<custom-node-attributes,custom node attributes>> or the built-in
`_name`, `_host_ip`, `_publish_ip`, `_ip`, `_host`, `_id`, `_tier` and `_tier_preference`
attributes. <<index-lifecycle-management, Index lifecycle management>> uses filters based
on custom node attributes to determine how to reallocate shards when moving
18 changes: 17 additions & 1 deletion docs/reference/modules/cluster.asciidoc
@@ -27,7 +27,23 @@ include::cluster/shards_allocation.asciidoc[]

include::cluster/disk_allocator.asciidoc[]

include::cluster/allocation_awareness.asciidoc[]
[[shard-allocation-awareness-settings]]
==== Shard allocation awareness settings

You can use <<custom-node-attributes,custom node attributes>> as _awareness attributes_ to enable {es}
to take your physical hardware configuration into account when allocating shards.
If {es} knows which nodes are on the same physical server, in the same rack, or
in the same zone, it can distribute the primary shard and its replica shards to
minimize the risk of losing all shard copies in the event of a failure. <<shard-allocation-awareness,Learn more about shard allocation awareness>>.

`cluster.routing.allocation.awareness.attributes`::
(<<dynamic-cluster-setting,Dynamic>>)
The node attributes that {es} should use as awareness attributes. For example, if you have a `rack_id` attribute that specifies the rack in which each node resides, you can set this setting to `rack_id` to ensure that primary and replica shards are not allocated on the same rack. You can specify multiple attributes as a comma-separated list.
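As a minimal sketch, assuming nodes already carry a `rack_id` attribute, the awareness attribute can be applied dynamically through the cluster settings API:

[source,console]
--------------------------------------------------
PUT _cluster/settings
{
  "persistent": {
    "cluster.routing.allocation.awareness.attributes": "rack_id"
  }
}
--------------------------------------------------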

`cluster.routing.allocation.awareness.force.*`::
(<<dynamic-cluster-setting,Dynamic>>)
The shard allocation awareness values that must exist for shards to be reallocated in case of location failure. Learn more about <<forced-awareness,forced awareness>>.
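For example, a sketch of forcing awareness across two zones, assuming each node sets a `zone` attribute (the values `zone1` and `zone2` are illustrative):

[source,console]
--------------------------------------------------
PUT _cluster/settings
{
  "persistent": {
    "cluster.routing.allocation.awareness.attributes": "zone",
    "cluster.routing.allocation.awareness.force.zone.values": "zone1,zone2"
  }
}
--------------------------------------------------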


include::cluster/allocation_filtering.asciidoc[]

15 changes: 5 additions & 10 deletions docs/reference/modules/cluster/allocation_awareness.asciidoc
@@ -1,18 +1,13 @@
[[shard-allocation-awareness]]
==== Shard allocation awareness
== Shard allocation awareness

You can use custom node attributes as _awareness attributes_ to enable {es}
to take your physical hardware configuration into account when allocating shards.
If {es} knows which nodes are on the same physical server, in the same rack, or
in the same zone, it can distribute the primary shard and its replica shards to
minimize the risk of losing all shard copies in the event of a failure.

When shard allocation awareness is enabled with the
<<dynamic-cluster-setting,dynamic>>
`cluster.routing.allocation.awareness.attributes` setting, shards are only
allocated to nodes that have values set for the specified awareness attributes.
If you use multiple awareness attributes, {es} considers each attribute
separately when allocating shards.
When shard allocation awareness is enabled with the `cluster.routing.allocation.awareness.attributes` setting, shards are only allocated to nodes that have values set for the specified awareness attributes. If you use multiple awareness attributes, {es} considers each attribute separately when allocating shards.

NOTE: The number of attribute values determines how many shard copies are
allocated in each location. If the number of nodes in each location is
@@ -22,11 +17,11 @@ unassigned.
TIP: Learn more about <<high-availability-cluster-design-large-clusters,designing resilient clusters>>.

[[enabling-awareness]]
===== Enabling shard allocation awareness
=== Enabling shard allocation awareness

To enable shard allocation awareness:

. Specify the location of each node with a custom node attribute. For example,
. Specify the location of each node with a <<custom-node-attributes,custom node attribute>>. For example,
if you want Elasticsearch to distribute shards across different racks, you might
use an awareness attribute called `rack_id`.
+
@@ -94,7 +89,7 @@ copies of a particular shard from being allocated in the same location, you can
enable forced awareness.

[[forced-awareness]]
===== Forced awareness
=== Forced awareness

By default, if one location fails, {es} spreads its shards across the remaining
locations. This might be undesirable if the cluster does not have sufficient
@@ -6,7 +6,7 @@ allocates shards from any index. These cluster wide filters are applied in
conjunction with <<shard-allocation-filtering, per-index allocation filtering>>
and <<shard-allocation-awareness, allocation awareness>>.

Shard allocation filters can be based on custom node attributes or the built-in
Shard allocation filters can be based on <<custom-node-attributes,custom node attributes>> or the built-in
`_name`, `_host_ip`, `_publish_ip`, `_ip`, `_host`, `_id` and `_tier` attributes.
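For example, a minimal sketch of excluding a node from cluster-wide allocation by IP address (the address shown is purely illustrative):

[source,console]
--------------------------------------------------
PUT _cluster/settings
{
  "persistent": {
    "cluster.routing.allocation.exclude._ip": "10.0.0.1"
  }
}
--------------------------------------------------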

The `cluster.routing.allocation` settings are <<dynamic-cluster-setting,dynamic>>, enabling live indices to
76 changes: 22 additions & 54 deletions docs/reference/modules/cluster/disk_allocator.asciidoc
@@ -41,6 +41,23 @@ on the affected node drops below the high watermark, {es} automatically removes
the write block. Refer to <<fix-watermark-errors,Fix watermark errors>> to
resolve persistent watermark errors.

[NOTE]
.Max headroom settings
===================================================
Max headroom settings apply only when watermark settings are percentages or ratios.
A max headroom value is intended to cap the required free disk space before hitting
the respective watermark. This is useful for servers with larger disks, where a percentage or ratio watermark could translate to an overly large free disk space requirement. In this case, the max headroom can be used to cap the required free disk space amount.
For example, where `cluster.routing.allocation.disk.watermark.flood_stage` is 95% and `cluster.routing.allocation.disk.watermark.flood_stage.max_headroom` is 100GB, this means that:
* For a smaller disk, e.g., of 100GB, the flood watermark will hit at 95%, meaning at 5GB of free space, since 5GB is smaller than the 100GB max headroom value.
* For a larger disk, e.g., of 100TB, the flood watermark will hit at 100GB of free space. That is because the 95% flood watermark alone would require 5TB of free disk space, but is capped by the max headroom setting to 100GB.
Max headroom settings have their default values only if their respective watermark settings are not explicitly set. If watermarks are explicitly set, then the max headroom settings do not have their default values, and need to be explicitly set if they are needed.
===================================================
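As an illustrative sketch only (the values mirror the defaults described above, not a recommendation), a percentage watermark and its max headroom can be set together:

[source,console]
--------------------------------------------------
PUT _cluster/settings
{
  "persistent": {
    "cluster.routing.allocation.disk.watermark.flood_stage": "95%",
    "cluster.routing.allocation.disk.watermark.flood_stage.max_headroom": "100GB"
  }
}
--------------------------------------------------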

[[disk-based-shard-allocation-does-not-balance]]
[TIP]
====
@@ -100,18 +117,7 @@ is now `true`. The setting will be removed in a future release.
+
--
(<<dynamic-cluster-setting,Dynamic>>)
Controls the flood stage watermark, which defaults to 95%. {es} enforces a read-only index block (`index.blocks.read_only_allow_delete`) on every index that has one or more shards allocated on the node, and that has at least one disk exceeding the flood stage. This setting is a last resort to prevent nodes from running out of disk space. The index block is automatically released when the disk utilization falls below the high watermark. Similarly to the low and high watermark values, it can alternatively be set to a ratio value, e.g., `0.95`, or an absolute byte value.

An example of resetting the read-only index block on the `my-index-000001` index:

[source,console]
--------------------------------------------------
PUT /my-index-000001/_settings
{
"index.blocks.read_only_allow_delete": null
}
--------------------------------------------------
// TEST[setup:my_index]
Controls the flood stage watermark, which defaults to 95%. {es} enforces a read-only index block (<<index-block-settings,`index.blocks.read_only_allow_delete`>>) on every index that has one or more shards allocated on the node, and that has at least one disk exceeding the flood stage. This setting is a last resort to prevent nodes from running out of disk space. The index block is automatically released when the disk utilization falls below the high watermark. Similarly to the low and high watermark values, it can alternatively be set to a ratio value, e.g., `0.95`, or an absolute byte value.
--
// end::cluster-routing-flood-stage-tag[]

@@ -121,10 +127,10 @@ Defaults to 100GB when
`cluster.routing.allocation.disk.watermark.flood_stage` is not explicitly set.
This caps the amount of free space required.

NOTE: You cannot mix the usage of percentage/ratio values and byte values across
NOTE: You can't mix the usage of percentage/ratio values and byte values across
the `cluster.routing.allocation.disk.watermark.low`, `cluster.routing.allocation.disk.watermark.high`,
and `cluster.routing.allocation.disk.watermark.flood_stage` settings. Either all values
are set to percentage/ratio values, or all are set to byte values. This enforcement is
must be set to percentage/ratio values, or all must be set to byte values. This is required
so that {es} can validate that the settings are internally consistent, ensuring that the
low disk threshold is less than the high disk threshold, and the high disk threshold is
less than the flood stage threshold. A similar comparison check is done for the max
@@ -150,44 +156,6 @@ set. This caps the amount of free space required on dedicated frozen nodes.
cluster. Defaults to `30s`.

NOTE: Percentage values refer to used disk space, while byte values refer to
free disk space. This can be confusing, since it flips the meaning of high and
free disk space. This can be confusing, because it flips the meaning of high and
low. For example, it makes sense to set the low watermark to 10gb and the high
watermark to 5gb, but not the other way around.

An example of updating the low watermark to at least 100 gigabytes free, a high
watermark of at least 50 gigabytes free, and a flood stage watermark of 10
gigabytes free, and updating the information about the cluster every minute:

[source,console]
--------------------------------------------------
PUT _cluster/settings
{
"persistent": {
"cluster.routing.allocation.disk.watermark.low": "100gb",
"cluster.routing.allocation.disk.watermark.high": "50gb",
"cluster.routing.allocation.disk.watermark.flood_stage": "10gb",
"cluster.info.update.interval": "1m"
}
}
--------------------------------------------------

Concerning the max headroom settings for the watermarks, please note
that these apply only in the case that the watermark settings are percentages/ratios.
The aim of a max headroom value is to cap the required free disk space before hitting
the respective watermark. This is especially useful for servers with larger
disks, where a percentage/ratio watermark could translate to a big free disk space requirement,
and the max headroom can be used to cap the required free disk space amount.
As an example, let us take the default settings for the flood watermark.
It has a 95% default value, and the flood max headroom setting has a default value of 100GB.
This means that:

* For a smaller disk, e.g., of 100GB, the flood watermark will hit at 95%, meaning at 5GB
of free space, since 5GB is smaller than the 100GB max headroom value.
* For a larger disk, e.g., of 100TB, the flood watermark will hit at 100GB of free space.
That is because the 95% flood watermark alone would require 5TB of free disk space, but
that is capped by the max headroom setting to 100GB.

Finally, the max headroom settings have their default values only if their respective watermark
settings are not explicitly set (thus, they have their default percentage values).
If watermarks are explicitly set, then the max headroom settings do not have their default values,
and would need to be explicitly set if they are desired.
watermark to 5gb, but not the other way around.
21 changes: 20 additions & 1 deletion docs/reference/modules/node.asciidoc
@@ -120,6 +120,25 @@ modify the contents of the data directory. The data directory contains no
executables so a virus scan will only find false positives.
// end::modules-node-data-path-warning-tag[]

[[custom-node-attributes]]
==== Custom node attributes

If needed, you can add custom attributes to a node. These attributes can be used to <<cluster-routing-settings,filter which nodes a shard can be allocated to>>, or to group nodes together for <<shard-allocation-awareness,shard allocation awareness>>.

[TIP]
===============================================
You can also set a node attribute using the `-E` command line argument when you start a node:
[source,sh]
--------------------------------------------------------
./bin/elasticsearch -Enode.attr.rack_id=rack_one
--------------------------------------------------------
===============================================

`node.attr.<attribute-name>`::
(<<dynamic-cluster-setting,Dynamic>>)
A custom attribute that you can assign to a node. For example, you might assign a `rack_id` attribute to each node to ensure that primary and replica shards are not allocated on the same rack. You can specify multiple attributes as a comma-separated list.
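To check which custom attributes are currently set across the cluster, one option is the cat nodeattrs API; a minimal sketch:

[source,console]
--------------------------------------------------
GET /_cat/nodeattrs?v=true
--------------------------------------------------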

[discrete]
[[other-node-settings]]
=== Other node settings
@@ -129,4 +148,4 @@ including:

* <<cluster-name,`cluster.name`>>
* <<node-name,`node.name`>>
* <<modules-network,network settings>>
* <<modules-network,network settings>>
4 changes: 2 additions & 2 deletions docs/reference/modules/shard-ops.asciidoc
@@ -25,7 +25,7 @@ By default, the primary and replica shard copies for an index can be allocated t

You can control how shard copies are allocated using the following settings:

- <<modules-cluster,Cluster-level shard allocation settings>>: Use these settings to control how shard copies are allocated and balanced across the entire cluster. For example, you might want to allocate nodes availability zones, or prevent certain nodes from being used so you can perform maintenance.
- <<modules-cluster,Cluster-level shard allocation settings>>: Use these settings to control how shard copies are allocated and balanced across the entire cluster. For example, you might want to <<shard-allocation-awareness,distribute shards across availability zones>>, or prevent certain nodes from being used so you can perform maintenance.

- <<index-modules-allocation,Index-level shard allocation settings>>: Use these settings to control how the shard copies for a specific index are allocated. For example, you might want to allocate an index to a node in a specific data tier, or to a node with specific attributes.

@@ -80,4 +80,4 @@ When a shard copy is relocated, it is created as a new shard copy on the target

You can control how and when shard copies are relocated. For example, you can adjust the rebalancing settings that control when shard copies are relocated to balance the cluster, or the high watermark for disk-based shard allocation that can trigger relocation. These settings are part of the <<modules-cluster,cluster-level shard allocation settings>>.
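For instance, an illustrative sketch of pausing rebalancing during maintenance (`cluster.routing.rebalance.enable` accepts `all`, `primaries`, `replicas`, or `none`):

[source,console]
--------------------------------------------------
PUT _cluster/settings
{
  "persistent": {
    "cluster.routing.rebalance.enable": "none"
  }
}
--------------------------------------------------

Setting the value back to `all`, or to `null`, restores the default behavior.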

Shard relocation operations also respect shard allocation and recovery settings.
Shard relocation operations also respect shard allocation and recovery settings.
