3.x: Add RandomTwoChoice load balancing policy #198
TL;DR
The RandomTwoChoice policy is a wrapper load-balancing policy that adds the "power of two choices" algorithm on top of its child policy. It compares the first two hosts returned by the child policy's query plan and returns first the host whose target shard has fewer in-flight requests. The rest of the child query plan is left intact.
It is intended to be used with TokenAwarePolicy and RoundRobinPolicy, so that queries are sent to the replica nodes while always avoiding the worst option (the replica whose target shard has the most in-flight requests is not chosen first). A rough sketch of the reordering follows.
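The snippet below is a standalone sketch of the reordering described above, not the driver's actual code; the `inflight` function is a hypothetical stand-in for the per-shard in-flight request counter that the real policy would read from the driver's shard-aware connection pools.

```java
import java.util.Arrays;
import java.util.Collections;
import java.util.Iterator;
import java.util.function.ToLongFunction;

// Standalone sketch of the "better of the first two" reordering.
final class TwoChoiceReorder {
    static <H> Iterator<H> reorder(Iterator<H> childPlan, ToLongFunction<H> inflight) {
        if (!childPlan.hasNext()) return childPlan;
        H first = childPlan.next();
        if (!childPlan.hasNext()) return Collections.singletonList(first).iterator();
        H second = childPlan.next();

        // Only the first two candidates are compared; everything after them
        // is passed through from the child plan unchanged.
        H better = inflight.applyAsLong(first) <= inflight.applyAsLong(second) ? first : second;
        H worse = (better == first) ? second : first;

        Iterator<H> head = Arrays.asList(better, worse).iterator();
        return new Iterator<H>() {
            @Override public boolean hasNext() { return head.hasNext() || childPlan.hasNext(); }
            @Override public H next() { return head.hasNext() ? head.next() : childPlan.next(); }
        };
    }
}
```

When used as intended, the child plan would come from TokenAwarePolicy wrapping RoundRobinPolicy, so the two candidates being compared are both replicas for the statement's token, taken in round-robin order.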
The original problem
The current default load-balancing policy is the TokenAwarePolicy, which is able to fetch the replica set for a prepared statement and iterate over the replicas in round-robin style. This approach has a limitation when one of the replicas becomes overloaded with background processing such as compactions: the driver keeps sending queries to the overloaded replica, which increases latencies and can even cause queries to time out.
The solution
Now, to avoid sending queries to replicas under high load, we need to collect metrics about the request queues of each replica, per shard. The shard-awareness feature allows us to identify the target shard for a prepared statement and then choose the replica whose target shard has the fewest in-flight requests (the shortest request queue). This is particularly effective in environments where the latency to each replica may vary unpredictably. However, it can all fall apart when there are multiple clients: at any given moment, queries from several clients may all be sent to the same replica whose target shard has the fewest in-flight requests, causing an unbalanced and unfair distribution. So, instead of making the absolute best choice, we follow the "power of two choices" algorithm: we pick two replicas at random and choose the better of the two, thus always avoiding the worst option. This works well with multiple clients, as it simply avoids the worst choice and distributes the traffic with some degree of randomness. A toy simulation of this effect follows.
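As a rough illustration of the multi-client argument above, here is a small, self-contained toy simulation (illustrative assumptions only, not driver code): several clients choose a replica each round based on a slightly stale snapshot of the queue lengths, and the replicas then drain a fixed amount of work. It compares "always pick the apparently least-loaded replica" against the power-of-two-choices rule and prints the average backlog left after each round.

```java
import java.util.Random;

// Toy model: R replicas, C clients per round, each replica can serve SERVICE
// requests per round (aggregate capacity exceeds the arrival rate). Every client
// picks a replica using a queue-length snapshot taken at the start of the round,
// i.e. slightly stale information shared by all clients.
public class TwoChoiceSimulation {
    static final int REPLICAS = 3, CLIENTS = 60, SERVICE = 22, ROUNDS = 5_000;

    public static void main(String[] args) {
        System.out.printf("always-best-choice: avg backlog per round = %.1f%n", run(true, new Random(42)));
        System.out.printf("power-of-two      : avg backlog per round = %.1f%n", run(false, new Random(42)));
    }

    static double run(boolean alwaysBest, Random rng) {
        long[] queue = new long[REPLICAS];
        long backlogSum = 0;
        for (int round = 0; round < ROUNDS; round++) {
            long[] snapshot = queue.clone();             // stale view shared by all clients
            for (int c = 0; c < CLIENTS; c++) {
                int choice;
                if (alwaysBest) {
                    // Everyone sees the same snapshot, so everyone herds onto the
                    // replica that *looked* least loaded at the start of the round.
                    choice = 0;
                    for (int i = 1; i < REPLICAS; i++)
                        if (snapshot[i] < snapshot[choice]) choice = i;
                } else {
                    // Power of two choices: two random candidates, keep the better one.
                    int a = rng.nextInt(REPLICAS), b = rng.nextInt(REPLICAS);
                    choice = snapshot[a] <= snapshot[b] ? a : b;
                }
                queue[choice]++;
            }
            for (int i = 0; i < REPLICAS; i++) {          // each replica drains up to SERVICE
                queue[i] = Math.max(0, queue[i] - SERVICE);
                backlogSum += queue[i];
            }
        }
        return (double) backlogSum / ROUNDS;              // requests still queued after each round
    }
}
```

In this toy model the herding strategy leaves a persistent backlog on whichever replica looked least loaded at the start of each round, while the two-choice rule keeps the queues close to empty, even though the aggregate service capacity is the same in both cases.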
Benchmarks
- i3.4xlarge nodes with Scylla 5.1.3 (AMI-07d0ff1841b1a9d8a)
- c5n.9xlarge loaders with Ubuntu 22.04 LTS (AMI-0ab0629dba5ae551d)
- Loader command: ./cassandra-stress read duration=150m -schema 'replication(strategy=SimpleStrategy, factor=3)' -mode native cql3 -rate "threads=500" -pop 'dist=UNIFORM(1..100000000)' -node NODE_IP
- All nodes and loaders were in us-east-2a.
To test how well the new load-balancing policy performs in comparison with the TokenAwarePolicy, shard 4 (a randomly chosen shard) of one of the nodes was overloaded with a simple C application that kept the CPU core busy and thus increased the number of in-flight requests for that specific shard. Four copies of the overloading application were run, using approximately 75% of the CPU core.
Additionally, data was collected about the in-flight requests for all nodes and shards, and about how many times each node was chosen to send a request to. Charts were generated from this data, highlighting the better performance of the RandomTwoChoicePolicy over the TokenAwarePolicy.
Results of the TokenAwarePolicy
Overview
Load and Throughput by instance, shard
Latencies by instance, shard
Inflight Requests and Host Chosen (for all nodes and shard 4)
Results of the RandomTwoChoicePolicy
Overview
Load and Throughput by instance, shard
Latencies by instance, shard
Inflight Requests and Host Chosen/Not Chosen (for all nodes and shard 4)
Conclusion
The benchmark with a single shard overloaded on one of the nodes shows that the RandomTwoChoice policy performs better in terms of throughput, latencies, and the distribution of requests across the nodes. The replica with the overloaded shard was chosen less frequently by the RandomTwoChoice policy because its number of in-flight/hanging requests was higher. This resulted in lower latencies by instance and shard compared to the TokenAwarePolicy, where the P95 and P99 latencies were almost 300 ms. The overall throughput was also significantly higher, and the loaders were able to make full use of their CPU cores.

However, the RandomTwoChoice policy may turn out to be less effective for write workloads, as writes cause a lot of background processing (compactions, hints, etc.), so latencies can rise rapidly and even lead to frequent timeouts. In fact, the average and P95 latencies were not very different, but the P99 latency was significantly higher for the RandomTwoChoice policy. The throughput, however, was roughly the same as with the TokenAwarePolicy, so benchmarking write workloads without throttling the throughput will probably not show significant differences, as the cluster quickly becomes overloaded with background processing, resulting in skyrocketing latencies, hints, and background and foreground writes:
Cluster Throughput (RandomTwoChoice vs TokenAwarePolicy)
Cluster Latencies (RandomTwoChoice vs TokenAwarePolicy)