proxy: coalesce identical Series() requests #4290
Conversation
Deduplicate identical Series() requests occurring at the same moment. This is pretty crucial because query-frontend caches depending on the query itself, and dashboards such as node_exporter sometimes ask for the same series but with different functions. In the popular node_exporter dashboard I can count at least 9 different queries that use the same `node_cpu_seconds_total` metric, albeit with different label selectors & functions. My tests with this improvement show that there is indeed a lot of overlap after opening the node_exporter dashboard a few times:

```
thanos_proxy_store_deduplicated_stream_requests_total 633
```

Signed-off-by: Giedrius Statkevičius <[email protected]>
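For illustration, here is a rough sketch of the coalescing idea in Go, with hypothetical names rather than the PR's actual code: identical in-flight calls, keyed by the request, share one result, and a counter tracks how many calls were deduplicated (the PR exposes this as `thanos_proxy_store_deduplicated_stream_requests_total`).

```go
package coalesce

import "sync"

// inflightCall tracks one outstanding upstream call and its eventual result.
type inflightCall struct {
	done   chan struct{}
	result []string // stands in for the (copied) series data
	err    error
}

// coalescer deduplicates identical in-flight calls keyed by a request string.
type coalescer struct {
	mu       sync.Mutex
	inflight map[string]*inflightCall
	deduped  uint64 // what the PR exports as a Prometheus counter
}

// do runs fn once per unique key; concurrent callers with the same key
// wait for the first call and share its result.
func (c *coalescer) do(key string, fn func() ([]string, error)) ([]string, error) {
	c.mu.Lock()
	if call, ok := c.inflight[key]; ok {
		c.deduped++
		c.mu.Unlock()
		<-call.done // wait for the identical in-flight request to finish
		return call.result, call.err
	}
	call := &inflightCall{done: make(chan struct{})}
	if c.inflight == nil {
		c.inflight = map[string]*inflightCall{}
	}
	c.inflight[key] = call
	c.mu.Unlock()

	call.result, call.err = fn()
	close(call.done)

	c.mu.Lock()
	delete(c.inflight, key)
	c.mu.Unlock()
	return call.result, call.err
}
```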
Set aggregations to some default value if only 0s resolution data has been requested, because in that case it doesn't matter what aggregations we're asking for: at this resolution we only have RAW data. This can potentially lead to even more savings.

Signed-off-by: Giedrius Statkevičius <[email protected]>
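A minimal sketch of that normalization, using placeholder types instead of the real store API request (the field names here are assumptions, not the actual Thanos types):

```go
package coalesce

// seriesRequest stands in for the store API Series() request.
type seriesRequest struct {
	MaxResolutionWindow int64    // 0 means only raw data can be returned.
	Aggregates          []string // e.g. "count", "sum"; irrelevant for raw data.
}

// normalizeForCoalescing forces the aggregations to one canonical value when
// only raw data is requested, so that otherwise-identical requests produce
// the same coalescing key.
func normalizeForCoalescing(r seriesRequest) seriesRequest {
	if r.MaxResolutionWindow == 0 {
		r.Aggregates = []string{"count"}
	}
	return r
}
```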
The benchmarks aren't too happy:
Let me dig into this more. Any ideas or comments are welcome.

EDIT: it doesn't seem that much can be done. I have compared CPU profiles of the benchmarks and the whole extra time seems to be spent on generating strings & locking/unlocking. Considering it's only a few milliseconds vs. potentially hundreds of milliseconds to perform a Series() call, I think this is still worthwhile.
Thanks for this attempt. This is generally known as singleflight and there might even be some gRPC libraries for this.
In the end I would worry about the real-world case:
- How often are requests exactly the same? Often they differ by a millisecond, which already makes them "different".
- If we do this on the querier side, we risk that the requests are already load-balanced among many queriers.

I wonder, isn't this a good piece of logic to add in query-frontend, which would allow us to plan ahead better? 🤔
Thank you for your comment, and thanks for pointing me to the exact phrase that I was looking for. The https://pkg.go.dev/golang.org/x/sync/singleflight library looks nice; I could reimplement this using it if the whole approach here is OK with you. It seems some people even recommend groupcache as a way to implement this: https://www.reddit.com/r/golang/comments/98bzcr/team_exploring_how_to_cache_with_grpc_in_golang/e4g77i5/?utm_source=reddit&utm_medium=web2x&context=3. I'll try researching a bit more. We have an issue about it; maybe this part of the code would be a good first user? :)

However, let's discuss your other points. Let's take a real-life case such as the node_exporter dashboard, which probably more or less everyone uses. It has a
Then there is the
Then
And so on. There are quite a few panels using these same metrics.

When a dashboard gets loaded initially it's obviously the slowest, but afterward asking for just the newest data with query-frontend is typically quite fast because it usually gets pulled from the Prometheus instances.

Could you please elaborate on this? Even if the gRPC connections are load-balanced, it's still fine because we'd be sending one Series() call instead of 2 or more. I don't think we should do this in query-frontend because then essentially the whole expression evaluation, i.e. the PromQL evaluation engine part, would have to be pushed up to query-frontend, right? Unless I misunderstood you. In general, I don't think this is related to planning at all. Isn't planning about things such as QoS, i.e. assigning some kind of priorities to queries coming from different tenants and so on? IMHO this is quite a simple optimization that has an orthogonal goal and can be in Thanos independently of query planning.
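For reference, a minimal, generic example of the `golang.org/x/sync/singleflight` package mentioned above (not Thanos code): concurrent callers using the same key share a single execution of the function.

```go
package main

import (
	"fmt"
	"sync"

	"golang.org/x/sync/singleflight"
)

func main() {
	var g singleflight.Group
	var wg sync.WaitGroup

	for i := 0; i < 5; i++ {
		wg.Add(1)
		go func() {
			defer wg.Done()
			// Concurrent callers with the same key share one execution of fn.
			v, _, shared := g.Do("same-key", func() (interface{}, error) {
				return "expensive result", nil
			})
			fmt.Println(v, "shared:", shared)
		}()
	}
	wg.Wait()
}
```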
@GiedriusS I am looking at the same use case. Have you tried this implementation already, and does it help improve the query performance? Another question: when a Grafana dashboard loads, does it cache the results of the same query or does it simply query multiple times?
Looks like this PR works for us and I can see the same
Yes, I have tried it. I wouldn't raise a pull request if the code didn't work 😄 The performance question depends on the queries: if a lot of them use the same metrics, then there is a huge performance win. This still needs a bit of work on the benchmarks. I will probably need to try out https://github.com/lestrrat-go/bufferpool 🤔
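For illustration, the buffer-reuse idea can be sketched with the standard library's `sync.Pool` (rather than the linked bufferpool package); the request fields below are placeholders, not the real key format:

```go
package coalesce

import (
	"bytes"
	"fmt"
	"sync"
)

var keyBufPool = sync.Pool{
	New: func() interface{} { return new(bytes.Buffer) },
}

// requestKey builds a coalescing key without allocating a fresh buffer for
// every request; converting to string still copies, so reusing the buffer is safe.
func requestKey(minTime, maxTime int64, matchers []string) string {
	buf := keyBufPool.Get().(*bytes.Buffer)
	defer func() {
		buf.Reset()
		keyBufPool.Put(buf)
	}()
	fmt.Fprintf(buf, "%d/%d", minTime, maxTime)
	for _, m := range matchers {
		buf.WriteByte('/')
		buf.WriteString(m)
	}
	return buf.String()
}
```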
I have tried implementing this as a gRPC interceptor, but it is not viable there because the first RecvMsg() returns so quickly that the single-flight mechanism doesn't work properly at all. Here's my code: https://gist.github.com/GiedriusS/313b3be1543f543487df5f624f84d836. So, I would like to come back to this pull request very soon.
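To illustrate why the interceptor route falls short, here is a hypothetical sketch (not the gist's code): coalescing at this level can only deduplicate stream creation, because the `grpc.ClientStream` is returned before any message is received, so concurrent callers cannot safely share the received messages.

```go
package coalesce

import (
	"context"

	"golang.org/x/sync/singleflight"
	"google.golang.org/grpc"
)

// coalescingStreamInterceptor deduplicates concurrent openings of the same
// stream. NOTE: the request body is only known later, when SendMsg() is
// called on the returned stream, and sharing one ClientStream between callers
// means they would race on RecvMsg(), which is essentially why this approach
// did not work out.
func coalescingStreamInterceptor(group *singleflight.Group) grpc.StreamClientInterceptor {
	return func(ctx context.Context, desc *grpc.StreamDesc, cc *grpc.ClientConn,
		method string, streamer grpc.Streamer, opts ...grpc.CallOption) (grpc.ClientStream, error) {
		v, err, _ := group.Do(method, func() (interface{}, error) {
			return streamer(ctx, desc, cc, method, opts...)
		})
		if err != nil {
			return nil, err
		}
		return v.(grpc.ClientStream), nil
	}
}
```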
Is this still relevant? If so, what is blocking it? Is there anything you can do to help move it forward? This issue has been automatically marked as stale because it has not had recent activity. It will be closed if no further activity occurs.
I spent quite some time recently staring at the fan-out code and thought that implementing this would be a good idea: let's join multiple requests for the same data if they are being sent to the same StoreAPI nodes (:
Very often when a new dashboard gets opened, a lot of different queries come in where the same metric data is being used. Clicking around the classic node_exporter dashboard shows via the newly implemented metric that this assumption is quite true. It is very unlikely that the data retrieved from SeriesAPI nodes will differ if the requests come a few milliseconds apart. So, I think that this is a reasonable optimization to make.

Unfortunately, because upper layers assume sole ownership of the returned `storage.SeriesSet`, we have to make copies of the data before passing it on. There are potentially bigger savings here; added a `TODO` for that.

Finally, I think that caching here is unnecessary because caching of the end-user responses is already performed by query-frontend.
Changes
Added a `broadcastingSeriesServer` that waits for a response on a query and then fans it out to all of the "listeners".

Verification
Tests still pass
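A conceptual sketch of the broadcasting idea described under Changes (hypothetical, simplified types; the actual `broadcastingSeriesServer` works on the store API stream): the single upstream request collects responses, and every listener gets its own copy because upper layers assume sole ownership of the data.

```go
package coalesce

import "sync"

// seriesResponse stands in for one streamed Series() response frame.
type seriesResponse struct {
	labels  map[string]string
	samples []float64
}

// broadcaster gathers the responses of a single upstream Series() call and
// hands a deep copy to every waiting listener.
type broadcaster struct {
	mu        sync.Mutex
	responses []seriesResponse
	done      chan struct{}
}

func newBroadcaster() *broadcaster {
	return &broadcaster{done: make(chan struct{})}
}

// Send is called by the single upstream request for each received response.
func (b *broadcaster) Send(r seriesResponse) {
	b.mu.Lock()
	b.responses = append(b.responses, r)
	b.mu.Unlock()
}

// Close signals that the upstream request has finished.
func (b *broadcaster) Close() { close(b.done) }

// Listen blocks until the upstream request finishes and returns a deep copy
// of the collected responses for one listener.
func (b *broadcaster) Listen() []seriesResponse {
	<-b.done
	b.mu.Lock()
	defer b.mu.Unlock()
	out := make([]seriesResponse, len(b.responses))
	for i, r := range b.responses {
		cp := seriesResponse{
			labels:  make(map[string]string, len(r.labels)),
			samples: append([]float64(nil), r.samples...),
		}
		for k, v := range r.labels {
			cp.labels[k] = v
		}
		out[i] = cp
	}
	return out
}
```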