Optimize primary shard balance #17373
Labels
enhancement
Enhancement or improvement to existing feature or request
ShardManagement:Placement
untriaged
Is your feature request related to a problem? Please describe
Today, we utilize the allocation constraint mechanism to achieve primary shard balance. One constraint is known as
AllocationConstraints
, which is applied during shard allocation. Another constraint isRebalanceConstraints
, which comes into play during shard rebalance. It is worth noting that both of these constraints currently utilize the same predicates in their implementation, which may cause uneven shards distribution.OpenSearch/server/src/main/java/org/opensearch/cluster/routing/allocation/ConstraintTypes.java
Lines 69 to 88 in 56825f6
For example, suppose we have a cluster withe 3 data nodes, and create a index with 10 shards, 1 replica. A possible shards distribution shown below.
As you can see, Node1 and Node2 have double shards than Node2. This is because Node2 is assigned a high weight during rebalance. When evaluating the
isPerIndexPrimaryShardsPerNodeBreached
predicate, theperIndexPrimaryShardCount
of Node2 is 4, and theperIndexAllowedPrimaryShardCount
is 4 (Math.ceil(10/3)
), so the constraint is breached and a high weight is assigned. And theisPrimaryShardsPerNodeBreached
predicate is not breached thanks to the exist of the parameterbuffer
. However, the parameterbuffer
works as a proportion of theavgPrimaryShardsPerNode
, which may also cause uneven distribution when theavgPrimaryShardsPerNode
is large.Describe the solution you'd like
avgPrimaryShardsPerNode
equalsperIndexPrimaryShardCount
, we should treat theisPerIndexPrimaryShardsPerNodeBreached
as not breached during rebalance. This is because it is bound to happen when the number of shards is not divisible by the number of nodes, and this distribution is in fact balanced. Which is addressed in Using gt to make constraint test during rebalance #17324. When the change is applied, the distribution in the above example becomes as follows.buffer
should be an integer that stays constant as the number of shards in the cluster increases.Related component
ShardManagement:Placement
Describe alternatives you've considered
No response
Additional context
No response
The text was updated successfully, but these errors were encountered: