RFC: improve efficiency of tablet filtering in go/vt/discovery and topo #16761
Comments
To expand on this point a bit, most of the supported KV stores offer transactional KV writes. Using that functionality, this could be implemented as such:
So in the worst case (Step 3), 2 x list dir calls are added, with the expectation that this will still be more beneficial than fetching every tablet alias.
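For illustration only, here is a minimal Go sketch of how a transactional write could keep an index-style entry in sync with the tablet record. The `txnKV` interface, `kvOp` type, and exact key paths are hypothetical assumptions, not existing Vitess topo APIs:

```go
// Hypothetical sketch: keep a keyspace/shard "pointer" entry in sync with the
// tablet record by writing both keys in one KV transaction. None of these
// types or methods are real Vitess APIs; they only illustrate the idea.
package topoindex

import (
	"context"
	"fmt"
)

// kvOp is one write in a transactional batch.
type kvOp struct {
	Key   string
	Value []byte
}

// txnKV is the assumed capability: most of the supported KV stores can apply
// a batch of writes atomically (etcd txns, Consul txns, ZooKeeper multi-ops).
type txnKV interface {
	Commit(ctx context.Context, ops []kvOp) error
}

// registerTablet writes the full tablet record under /tablets/<alias> and a
// small pointer entry under /keyspaces/<keyspace>/<shard>/<alias> in a single
// transaction, so the index can never drift from the primary record.
func registerTablet(ctx context.Context, kv txnKV, alias, keyspace, shard string, record []byte) error {
	ops := []kvOp{
		{Key: fmt.Sprintf("/tablets/%s/Tablet", alias), Value: record},
		{Key: fmt.Sprintf("/keyspaces/%s/%s/%s", keyspace, shard, alias), Value: []byte(alias)},
	}
	return kv.Commit(ctx, ops)
}
```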
I would note that this kind of topo work is often much more challenging than it would seem at first glance. I would recommend you have a PoC (proof of concept) PR to go along with any suggested implementations, as this gives you a chance to flesh things out and see if it's possible, if it's actually any better, what challenges there are, any trade-offs, etc.
I would also note that this ONLY applies to ZooKeeper, as it does not support scans. Getting the tablets in one prefix scan — for the topo server implementations that support scans — was added in #9560. So it's probably also worth stepping back and clarifying the problem here. Why is doing 1 prefix scan per cell on Consul so problematic? IMO this is something we live with when using a KV store, and trying to map custom index-like structures, which we then have to manage ourselves, on top of it is going to be problematic.
@mattlord I think the issue is not with the prefix scan per cell, but the fact that after that operation we do a single GET for each tablet from the topo, even though we are getting tablets that are not part of the shard the current tablet is in. If you consider 10k tablets, they would each be doing 10k GETs per minute, for a total of 100M RPCs per minute on the topo (plus the scans).
@rvrangel / @mattlord yes, this is the concern. Another example: let's say we have …
So to put it more simply, filtering of tablets happens after the topo gets, meaning we fetch all tablets from the topo when we don't need to.
@rvrangel I don't see where that happens here: `vitess/go/vt/discovery/topology_watcher.go`, lines 148 to 203 (at da4b81d).
Is there a test which demonstrates the problem we're trying to solve here? We could certainly do the cell-based filtering earlier on. I don't see what other optimizations you have in mind here. And keep in mind that the proposed solution seems to involve MORE topo calls, unless I'm misunderstanding something.
My point is that the premise here seems somewhat flawed AFAICT. The premise seems to be that we get every tablet record individually, but I don't see that we do (with the exception of ZooKeeper, which it's worth noting is not fully supported). We do one index prefix scan per cell and then apply filters to the tablet records in the cell.
This is a limitation with K/V stores IMO: we can't pass predicates down to filter the records based on additional conditions. I think it's best to start with a very clear problem and a test case that demonstrates it, then go from there. From there, we can explore ideas which can be demonstrated to improve that specific scenario w/o negatively impacting others (since this is not a practical problem for most Vitess users AFAIK). Of course I could be missing or misunderstanding some things here, as always. As someone that's done a fair amount of Vitess Topology Server work, this statement …
As @mattlord already said, we don't necessarily do a GET for every tablet since v19 (#14693). It is still true that if you are using any of the available filters, we would fetch all the tablets and then filter them in vtgate memory. Beyond that, I would dispute that usage of … The problems you are seeing with the volume of GETs could possibly be mitigated by porting that PR into your fork. It does depend on prior work, so you would need that too. Caveat: it is possible for you to run into grpc limits, at which point we'd fall back to one tablet at a time, completely negating the benefits of using …
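To make that caveat concrete, here is a rough sketch of the batched-list-with-fallback behaviour described above. `cellClient`, `fetchAll`, `fetchOne`, and `listAliases` are hypothetical stand-ins, not the real `go/vt/topo` API:

```go
// Sketch only: one list call per cell, falling back to one GET per tablet if
// the batched response is too large, as described in the comment above.
package topofetch

import (
	"context"
	"errors"
)

// errResponseTooLarge stands in for hitting a gRPC message-size limit.
var errResponseTooLarge = errors.New("response exceeds grpc message limit")

type tabletRecord struct {
	Alias    string
	Keyspace string
	Shard    string
}

type cellClient interface {
	fetchAll(ctx context.Context, cell string) ([]tabletRecord, error)      // one batched list / prefix scan
	fetchOne(ctx context.Context, cell, alias string) (tabletRecord, error) // single GET
	listAliases(ctx context.Context, cell string) ([]string, error)         // aliases only (cheap)
}

// tabletsForCell prefers the single batched call; if the payload is too big
// for one response it falls back to fetching tablets one at a time, which
// negates most of the benefit of the batched path.
func tabletsForCell(ctx context.Context, c cellClient, cell string) ([]tabletRecord, error) {
	tablets, err := c.fetchAll(ctx, cell)
	if err == nil {
		return tablets, nil
	}
	if !errors.Is(err, errResponseTooLarge) {
		return nil, err
	}
	aliases, err := c.listAliases(ctx, cell)
	if err != nil {
		return nil, err
	}
	out := make([]tabletRecord, 0, len(aliases))
	for _, alias := range aliases {
		t, err := c.fetchOne(ctx, cell, alias)
		if err != nil {
			return nil, err
		}
		out = append(out, t)
	}
	return out, nil
}
```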
RFC Description
Today `discovery.Healthcheck` (in `go/vt/discovery`, mainly used by `vtgate`) supports filtering the tablets it watches by:

- `--keyspaces_to_watch` (very common)
- `--tablet-filter` on hostname
- `--tablet-filter-tags` on tablet `Tags map[string]string` tags

Behind the scenes, this filtering happens in the tablet watcher's `loadTablets()`, but not very efficiently: first all tablets in the cell are grabbed from topo unconditionally, then the optional filtering of tablets occurs. At times this filtering excludes a significant amount of the topo KVs we fetched. More on "why" later.

On clusters with 1000s of tablets this becomes a scalability problem for the topology store, which has to handle topo Get calls for all of the tablets fetched. In an extreme case, the `txthrottler` (which uses `discovery.Healthcheck` to stream tablet stats) opens 1 x topology watcher per cell just to find tablets in its local shard. Let's say we have a 3 x cell deployment with 1000 tablets per cell and `txthrottler` is running in a 3-tablet shard: this means `txthrottler` will be reading 3000 topo KVs frequently just to find its 2 other members, and this problem grows with the number of tablets. In our production the problem is significantly larger.

Now why does the tablet watcher fetch EVERY tablet from the cell? Today it kind of has to 🤷. Using a Consul topo as an example, tablets are stored by alias in paths named `/tablets/<tablet alias>/`, and there is no efficient way to grab just 1 x keyspace or shard - you have to read everything.

There is a way to mitigate this inefficiency (but not resolve it): `--tablet_refresh_known_tablets=false`, which causes `vtgate` to store the tablet records it reads forever. That has its own drawbacks and doesn't resolve the initial inefficient read of tablets for the entire cell.

This issue is an RFC/feature request for a more efficient way to fetch topo records for a single shard and/or keyspace. Unfortunately, improving the situation likely means a change to the layout of the topo.
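To make the inefficiency concrete, here is a simplified sketch of the fetch-then-filter pattern described above. This is not the actual `loadTablets()` code; all names are illustrative:

```go
// Sketch of the pattern this RFC is about: every tablet record in the cell is
// fetched from the topo first, and only afterwards do the optional filters
// (--keyspaces_to_watch, --tablet-filter, --tablet-filter-tags) drop most of
// them. Names are illustrative, not the real go/vt/discovery code.
package discoverysketch

import "context"

type tablet struct {
	Alias    string
	Keyspace string
	Shard    string
	Hostname string
	Tags     map[string]string
}

type cellTopo interface {
	// getAllTabletsInCell stands in for the unconditional per-cell fetch.
	getAllTabletsInCell(ctx context.Context, cell string) ([]tablet, error)
}

// tabletFilter reports whether a tablet should be kept.
type tabletFilter func(tablet) bool

// loadTabletsSketch fetches every tablet in the cell, then filters in memory.
// With 1000 tablets per cell and a filter that keeps a single 3-tablet shard,
// roughly 99.7% of the fetched topo KVs are discarded immediately.
func loadTabletsSketch(ctx context.Context, topo cellTopo, cell string, filters []tabletFilter) ([]tablet, error) {
	all, err := topo.getAllTabletsInCell(ctx, cell) // reads the whole cell from topo
	if err != nil {
		return nil, err
	}
	kept := make([]tablet, 0, len(all))
outer:
	for _, t := range all {
		for _, f := range filters {
			if !f(t) {
				continue outer
			}
		}
		kept = append(kept, t)
	}
	return kept, nil
}
```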
Some early ideas:

- `/keyspaces/<keyspace>/<shard>/<tablet>` paths that simply "point" to the actual `/tablet/<alias>` record, kind of like an index (see the sketch below)
- <your idea here> 🙇
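A rough sketch of what the read path for such an index could look like, assuming hypothetical `listDir`/`get` helpers and the key layout above:

```go
// Sketch of the read path for the proposed index layout: list the small
// /keyspaces/<keyspace>/<shard>/ directory first, then fetch only the
// matching tablet records. All names and paths here are hypothetical.
package topoindexsketch

import (
	"context"
	"path"
)

type kvStore interface {
	// listDir returns the child names under a directory-style prefix.
	listDir(ctx context.Context, dirPath string) ([]string, error)
	// get returns the value stored at a single key.
	get(ctx context.Context, key string) ([]byte, error)
}

// tabletsForShard resolves the index entries under the keyspace/shard path
// into full tablet records, so only that shard's tablets are read from topo
// instead of every tablet in the cell.
func tabletsForShard(ctx context.Context, kv kvStore, keyspace, shard string) (map[string][]byte, error) {
	aliases, err := kv.listDir(ctx, path.Join("/keyspaces", keyspace, shard))
	if err != nil {
		return nil, err
	}
	records := make(map[string][]byte, len(aliases))
	for _, alias := range aliases {
		rec, err := kv.get(ctx, path.Join("/tablets", alias, "Tablet"))
		if err != nil {
			return nil, err
		}
		records[alias] = rec
	}
	return records, nil
}
```

For the 3-tablet shard example above, this would mean 1 list call plus 3 Gets per cell instead of reading 1000 tablet records per cell.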
Use Case(s)
Large Vitess deployments that use filtering (likely `--keyspaces_to_watch`) where the rate of topo Gets is a risk/concern.