Support token aware feature. #11
Sounds like a worthy idea. The token ownership in the cluster (which tokens belong to which node) can change over time, but it's still worthwhile to use even slightly outdated information - if we send a request to the "wrong" node, it will still work just as it does today. We'll need to add a new request, like the "/localnodes" we have today, to return the token ownership information. The client library would need to periodically refresh its knowledge.

Shard awareness is also possible, similar to how we do it in CQL - the client can control which shard will receive its request by controlling the port number used for the outgoing request.

There is one important implementation difficulty, though. The client library would need to parse and understand the request before routing it - to determine whether this is the type of request that accesses only a single item, and which item that is. We'll probably need to modify/inject/monkey-patch the AWS library in different places than we do right now, and that might be hard to do in certain languages. But it might be worth a try.
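A minimal sketch of what the client side of this could look like, assuming a hypothetical endpoint that returns (token, node) ownership pairs - the endpoint name, response shape, and the toy hash function below are all illustrative (Scylla actually uses Murmur3 over the serialized partition key, and today only "/localnodes" exists):

```python
import bisect
import hashlib

class TokenRing:
    """Client-side view of token ownership, refreshed periodically."""

    def __init__(self):
        self.tokens = []   # sorted token boundaries
        self.owners = []   # owners[i] owns the range ending at tokens[i]

    def refresh(self, ownership):
        """ownership: list of (token, node) pairs fetched from the
        hypothetical token-ownership endpoint."""
        pairs = sorted(ownership)
        self.tokens = [t for t, _ in pairs]
        self.owners = [n for _, n in pairs]

    def node_for(self, token):
        """Route to the node owning the first token >= ours, wrapping
        around past the last token (standard consistent-hash lookup)."""
        if not self.tokens:
            return None
        i = bisect.bisect_left(self.tokens, token)
        return self.owners[i % len(self.owners)]

def toy_token(partition_key: str) -> int:
    # Stand-in hash for illustration only; a real implementation must
    # match Scylla's Murmur3 token calculation exactly.
    return int.from_bytes(hashlib.md5(partition_key.encode()).digest()[:8], "big")

ring = TokenRing()
ring.refresh([(2**60, "node1"), (2**62, "node2"), (2**63, "node3")])
node = ring.node_for(toy_token("user#123"))
```

Even if the ring data is stale, a wrong pick is harmless: the receiving node simply acts as a non-optimal coordinator, exactly as all requests are handled today.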
Token-ownership awareness is useful for every GetItem (and Query) and PutItem request to avoid an extra hop, but it is extra useful for PutItem that uses LWT, to avoid scylladb/scylladb#16261: for LWT, the coordinator which receives the write is responsible for serializing different attempts to write to the same partition concurrently. If multiple coordinators receive writes to the same partition, they "contend" with each other and cause errors (scylladb/scylladb#16261) and slowness (scylladb/scylladb#13078). Instead of solving this somehow in Alternator (see also scylladb/scylladb#5703), we can fix it in the driver:
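The driver-side fix can be sketched as always picking the *same* replica for a given partition key, so all LWT writes to one partition go through one coordinator and never contend. This is an illustrative sketch, not an existing API; `replicas_for` is a stub standing in for a real replica lookup built from token-ownership data:

```python
import hashlib

def replicas_for(partition_key: str) -> list[str]:
    # Stub: in a real driver this would come from the token ring (RF=3 here).
    return ["node1", "node2", "node3"]

def consistent_coordinator(partition_key: str) -> str:
    """Deterministically map a partition key to one of its replicas, so
    every client process picks the same coordinator for the same key."""
    replicas = replicas_for(partition_key)
    # Use a stable hash (not Python's randomized hash()), so different
    # client processes agree on the choice; different keys still spread
    # across all replicas.
    h = int.from_bytes(hashlib.sha1(partition_key.encode()).digest()[:4], "big")
    return replicas[h % len(replicas)]
```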
By the way, my previous comment mentioned the need to "parse and understand the request before routing it". Even better would be to get the request before it's serialized and not need to parse it. For example, a PutItem call probably specifies the table name and partition key as separate parameters, and we shouldn't (if we can..) wait until we have the HTTP request as a string and then parse it again. CC @dkropachev
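As a sketch of this idea: instead of re-parsing the serialized HTTP body, a routing hook could read the structured parameters directly. With boto3, for instance, botocore's event system hands handlers the original params dict before serialization (events like "provide-client-params.dynamodb.*"). The helper below is pure illustration; in particular, treating the first attribute of Key/Item as the partition key is an assumption - a real driver would look the key schema up (e.g. via DescribeTable):

```python
def extract_routing_key(operation: str, params: dict):
    """Pull (table, partition-key value) out of the structured call
    parameters, before any serialization happens."""
    if operation in ("GetItem", "PutItem", "DeleteItem", "UpdateItem"):
        table = params["TableName"]
        item = params.get("Key") or params.get("Item") or {}
        if item:
            # Assumption for this sketch: first attribute is the partition
            # key; a real driver must consult the table's key schema.
            pk_attr = next(iter(item))
            return table, item[pk_attr]
    return None  # multi-item operations: fall back to default balancing

key = extract_routing_key("PutItem", {
    "TableName": "users",
    "Item": {"id": {"S": "user#123"}, "name": {"S": "Alice"}},
})
```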
In my previous comment I described a further optimization beyond token-aware load balancing, which is to pick not just one of the RF replicas - but to consistently pick the same one for a specific partition key. Let's call it the "consistent-coordinator" optimization. But it's important to note that there is a tradeoff between this consistent-coordinator optimization and rack-aware load balancing (issue #31). Reads and non-LWT writes are better done with rack-aware load balancing, to reduce the cross-rack networking costs. But for LWT-using writes, there are two cases:
So if we implement this extra consistent-coordinator optimization, it should be optional: either user-configurable (with documentation explaining when to enable it), or in theory it could also be enabled automatically when we detect high rates of LWT contention.
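The opt-in switch described above could be sketched as a small routing policy; the class name and interface here are illustrative, not an existing API:

```python
import hashlib

def _stable_hash(s: str) -> int:
    # Process-independent hash so separate clients agree on the pick.
    return int.from_bytes(hashlib.sha1(s.encode()).digest()[:4], "big")

class RoutingPolicy:
    """Hypothetical policy: rack-aware by default, with an opt-in
    consistent-coordinator mode for LWT-heavy workloads."""

    def __init__(self, local_rack: str, prefer_consistent_coordinator: bool = False):
        self.local_rack = local_rack
        self.prefer_consistent_coordinator = prefer_consistent_coordinator

    def pick(self, replicas: list, partition_key: str) -> dict:
        """replicas: [{"node": ..., "rack": ...}, ...] owning the partition."""
        if self.prefer_consistent_coordinator:
            # Same key always maps to the same replica (avoids LWT contention).
            return replicas[_stable_hash(partition_key) % len(replicas)]
        # Default: rack-aware - prefer a replica in our own rack.
        local = [r for r in replicas if r["rack"] == self.local_rack]
        return (local or replicas)[0]
```

The automatic variant mentioned above would flip `prefer_consistent_coordinator` at runtime when the client observes high LWT contention-error rates.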
Since Scylla supports token-aware and shard-aware features in the CQL driver, could it be worth supporting such features in the Alternator load-balancing library as well, to achieve better performance?