Support token aware feature. #11
Sounds like a worthy idea. The token ownership in the cluster (which tokens belong to which node) can change over time, but it's still worthwhile to use even slightly outdated information - if we send a request to the "wrong" node, it will still work just as it does today. We'll need to add a new request, like the "/localnodes" we have today, to return the token ownership information. The client library would need to periodically refresh its knowledge.

Shard awareness is also possible, similar to how we do it in CQL - the client can control which shard will receive its request by controlling the port number used for the outgoing request.

There is one important implementation difficulty, though. The client library would need to parse and understand the request before routing it - to determine whether this is the type of request that accesses only a single item, and which item that is. We'll probably need to modify/inject/monkey-patch the AWS library in different places than we do right now, and that might be hard to do in certain languages. But it might be worth a try.
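A minimal sketch of what the client side of this could look like, assuming a hypothetical endpoint that returns (token, node) ownership pairs - the endpoint name, response shape, and the toy hash function below are all illustrative (Scylla actually uses Murmur3 over the serialized partition key, and today only "/localnodes" exists):

```python
import bisect
import hashlib

class TokenRing:
    """Client-side view of token ownership, refreshed periodically."""

    def __init__(self):
        self.tokens = []   # sorted token boundaries
        self.owners = []   # owners[i] owns the range ending at tokens[i]

    def refresh(self, ownership):
        """ownership: list of (token, node) pairs fetched from the
        hypothetical token-ownership endpoint."""
        pairs = sorted(ownership)
        self.tokens = [t for t, _ in pairs]
        self.owners = [n for _, n in pairs]

    def node_for(self, token):
        """Route to the node owning the first token >= ours, wrapping
        around past the last token (standard consistent-hash lookup)."""
        if not self.tokens:
            return None
        i = bisect.bisect_left(self.tokens, token)
        return self.owners[i % len(self.owners)]

def toy_token(partition_key: str) -> int:
    # Stand-in hash for illustration only; a real implementation must
    # match Scylla's Murmur3 token calculation exactly.
    return int.from_bytes(hashlib.md5(partition_key.encode()).digest()[:8], "big")

ring = TokenRing()
ring.refresh([(2**60, "node1"), (2**62, "node2"), (2**63, "node3")])
node = ring.node_for(toy_token("user#123"))
```

Even if the ring data is stale, a wrong pick is harmless: the receiving node simply acts as a non-optimal coordinator, exactly as all requests are handled today.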
Token-ownership awareness is useful for every GetItem (and Query) and PutItem request to avoid an extra hop, but it is extra useful for PutItem that uses LWT, to avoid scylladb/scylladb#16261: for LWT, the coordinator which receives the write is responsible for serializing different attempts to write to the same partition concurrently. If multiple coordinators receive writes to the same partition, they "contend" with each other and cause errors (scylladb/scylladb#16261) and slowness (scylladb/scylladb#13078). Instead of solving this somehow in Alternator (see also scylladb/scylladb#5703), we can fix it in the driver:
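The driver-side fix can be sketched as always picking the *same* replica for a given partition key, so all LWT writes to one partition go through one coordinator and never contend. This is an illustrative sketch, not an existing API; `replicas_for` is a stub standing in for a real replica lookup built from token-ownership data:

```python
import hashlib

def replicas_for(partition_key: str) -> list[str]:
    # Stub: in a real driver this would come from the token ring (RF=3 here).
    return ["node1", "node2", "node3"]

def consistent_coordinator(partition_key: str) -> str:
    """Deterministically map a partition key to one of its replicas, so
    every client process picks the same coordinator for the same key."""
    replicas = replicas_for(partition_key)
    # Use a stable hash (not Python's randomized hash()), so different
    # client processes agree on the choice; different keys still spread
    # across all replicas.
    h = int.from_bytes(hashlib.sha1(partition_key.encode()).digest()[:4], "big")
    return replicas[h % len(replicas)]
```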
By the way, my previous comment mentioned the need to "parse and understand the request before routing it". Even better would be to get the request before it's serialized and not need to parse it. For example, a PutItem call probably specifies the table name and partition key as separate parameters, and we shouldn't (if we can..) wait until we have the HTTP request as a string and then parse it again. CC @dkropachev
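As a sketch of this idea: instead of re-parsing the serialized HTTP body, a routing hook could read the structured parameters directly. With boto3, for instance, botocore's event system hands handlers the original params dict before serialization (events like "provide-client-params.dynamodb.*"). The helper below is pure illustration; in particular, treating the first attribute of Key/Item as the partition key is an assumption - a real driver would look the key schema up (e.g. via DescribeTable):

```python
def extract_routing_key(operation: str, params: dict):
    """Pull (table, partition-key value) out of the structured call
    parameters, before any serialization happens."""
    if operation in ("GetItem", "PutItem", "DeleteItem", "UpdateItem"):
        table = params["TableName"]
        item = params.get("Key") or params.get("Item") or {}
        if item:
            # Assumption for this sketch: first attribute is the partition
            # key; a real driver must consult the table's key schema.
            pk_attr = next(iter(item))
            return table, item[pk_attr]
    return None  # multi-item operations: fall back to default balancing

key = extract_routing_key("PutItem", {
    "TableName": "users",
    "Item": {"id": {"S": "user#123"}, "name": {"S": "Alice"}},
})
```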
In my previous comment I described a further optimization beyond token-aware load balancing, which is to pick not just one of the RF replicas - but to consistently pick the same one for a specific partition key. Let's call it the "consistent-coordinator" optimization. But it's important to note that there is a tradeoff between this consistent-coordinator optimization and rack-aware load balancing (issue #31). Reads and non-LWT writes are better done with rack-aware load balancing, to reduce the cross-rack networking costs. But for LWT-using writes, there are two cases:
So if we implement this extra consistent-coordinator optimization, it should be optional: either user-configurable (with documentation explaining when to enable it), or in theory it could also be enabled automatically when we detect high rates of LWT contention.
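The opt-in switch described above could be sketched as a small routing policy; the class name and interface here are illustrative, not an existing API:

```python
import hashlib

def _stable_hash(s: str) -> int:
    # Process-independent hash so separate clients agree on the pick.
    return int.from_bytes(hashlib.sha1(s.encode()).digest()[:4], "big")

class RoutingPolicy:
    """Hypothetical policy: rack-aware by default, with an opt-in
    consistent-coordinator mode for LWT-heavy workloads."""

    def __init__(self, local_rack: str, prefer_consistent_coordinator: bool = False):
        self.local_rack = local_rack
        self.prefer_consistent_coordinator = prefer_consistent_coordinator

    def pick(self, replicas: list, partition_key: str) -> dict:
        """replicas: [{"node": ..., "rack": ...}, ...] owning the partition."""
        if self.prefer_consistent_coordinator:
            # Same key always maps to the same replica (avoids LWT contention).
            return replicas[_stable_hash(partition_key) % len(replicas)]
        # Default: rack-aware - prefer a replica in our own rack.
        local = [r for r in replicas if r["rack"] == self.local_rack]
        return (local or replicas)[0]
```

The automatic variant mentioned above would flip `prefer_consistent_coordinator` at runtime when the client observes high LWT contention-error rates.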
Since Scylla supports token-aware and shard-aware features in the CQL driver, could it be worth supporting such features in the Alternator load-balancing library as well, to achieve better performance?