proxy: coalesce identical Series() requests #4290
Conversation
Deduplicate identical Series() requests occurring at the same moment. This is pretty crucial because query-frontend caches depending on the query itself, and dashboards such as node_exporter sometimes ask for the same series but with different functions. In the popular node_exporter dashboard I can count at least 9 different queries that use the same `node_cpu_seconds_total` metric, albeit with different label selectors & functions. My tests with this improvement show that there is indeed a lot of overlap after opening the node_exporter dashboard a few times:

```
thanos_proxy_store_deduplicated_stream_requests_total 633
```

Signed-off-by: Giedrius Statkevičius <[email protected]>
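For illustration, here is a rough sketch of the coalescing idea in Go, with hypothetical names rather than the PR's actual code: identical in-flight calls, keyed by the request, share one result, and a counter tracks how many calls were deduplicated (the PR exposes this as `thanos_proxy_store_deduplicated_stream_requests_total`).

```go
package coalesce

import "sync"

// inflightCall tracks one outstanding upstream call and its eventual result.
type inflightCall struct {
	done   chan struct{}
	result []string // stands in for the (copied) series data
	err    error
}

// coalescer deduplicates identical in-flight calls keyed by a request string.
type coalescer struct {
	mu       sync.Mutex
	inflight map[string]*inflightCall
	deduped  uint64 // what the PR exports as a Prometheus counter
}

// do runs fn once per unique key; concurrent callers with the same key
// wait for the first call and share its result.
func (c *coalescer) do(key string, fn func() ([]string, error)) ([]string, error) {
	c.mu.Lock()
	if call, ok := c.inflight[key]; ok {
		c.deduped++
		c.mu.Unlock()
		<-call.done // wait for the identical in-flight request to finish
		return call.result, call.err
	}
	call := &inflightCall{done: make(chan struct{})}
	if c.inflight == nil {
		c.inflight = map[string]*inflightCall{}
	}
	c.inflight[key] = call
	c.mu.Unlock()

	call.result, call.err = fn()
	close(call.done)

	c.mu.Lock()
	delete(c.inflight, key)
	c.mu.Unlock()
	return call.result, call.err
}
```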
Set aggregations to some default value if only 0s resolution data has been requested, because in that case it doesn't matter what aggregations we're asking for: at this resolution we only have RAW data. This can potentially lead to even more savings.

Signed-off-by: Giedrius Statkevičius <[email protected]>
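A minimal sketch of that normalization, using placeholder types instead of the real store API request (the field names here are assumptions, not the actual Thanos types):

```go
package coalesce

// seriesRequest stands in for the store API Series() request.
type seriesRequest struct {
	MaxResolutionWindow int64    // 0 means only raw data can be returned.
	Aggregates          []string // e.g. "count", "sum"; irrelevant for raw data.
}

// normalizeForCoalescing forces the aggregations to one canonical value when
// only raw data is requested, so that otherwise-identical requests produce
// the same coalescing key.
func normalizeForCoalescing(r seriesRequest) seriesRequest {
	if r.MaxResolutionWindow == 0 {
		r.Aggregates = []string{"count"}
	}
	return r
}
```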
The benchmarks aren't too happy:
Let me dig into this more. Any ideas or comments are welcome.

EDIT: it doesn't seem that much can be done. I have compared CPU profiles of the benchmarks and the whole extra time seems to be spent on generating strings & locking/unlocking. Considering it's only a few milliseconds vs. potentially hundreds of milliseconds to perform a Series() call, I think this is still worthwhile.
Thanks for this attempt. This is generally known as singleflight and there might even be some gRPC libraries for this.
In the end I would worry about the real-world case:
- How often are requests exactly the same? Often they differ by a millisecond, which already makes them "different".
- If we do this on the querier side, we risk that the requests are already load-balanced among many queriers.

I wonder, isn't this a good piece of logic to add in query-frontend, which would allow us to plan ahead better? 🤔
Thank you for your comment, and thanks for pointing me to the exact phrase that I was looking for. The https://pkg.go.dev/golang.org/x/sync/singleflight library looks nice; I could reimplement this using it if the whole approach here is OK with you. It seems some people even recommend groupcache as a way to implement this: https://www.reddit.com/r/golang/comments/98bzcr/team_exploring_how_to_cache_with_grpc_in_golang/e4g77i5/?utm_source=reddit&utm_medium=web2x&context=3. I'll try researching a bit more. We have an issue about it; maybe this part of the code would be a good first user? :)

However, let's discuss your other points. Let's take a real-life case such as the node_exporter dashboard, which probably more or less everyone uses. It has a
Then there is the
Then
And so on. There are quite a few panels using these same metrics.

When a dashboard gets loaded initially it's obviously the slowest, but afterward asking for just the newest data with query-frontend is typically quite fast because it usually gets pulled from the Prometheus instances.

Could you please elaborate on this? Even if the gRPC connections are load-balanced, it's still fine because we'd be sending one Series() call instead of 2 or more. I don't think we should do this in query-frontend because then essentially the whole expression evaluation, i.e. the PromQL evaluation engine part, would have to be pushed up to query-frontend, right? Unless I misunderstood you. In general, I don't think this is related to planning at all. Isn't planning about things such as QoS, i.e. assigning some kind of priorities to queries coming from different tenants and so on? IMHO this is quite a simple optimization that has an orthogonal goal and can be in Thanos independently of query planning.
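For reference, a minimal, generic example of the `golang.org/x/sync/singleflight` package mentioned above (not Thanos code): concurrent callers using the same key share a single execution of the function.

```go
package main

import (
	"fmt"
	"sync"

	"golang.org/x/sync/singleflight"
)

func main() {
	var g singleflight.Group
	var wg sync.WaitGroup

	for i := 0; i < 5; i++ {
		wg.Add(1)
		go func() {
			defer wg.Done()
			// Concurrent callers with the same key share one execution of fn.
			v, _, shared := g.Do("same-key", func() (interface{}, error) {
				return "expensive result", nil
			})
			fmt.Println(v, "shared:", shared)
		}()
	}
	wg.Wait()
}
```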
@GiedriusS I am looking at the same use case. Have you tried this implementation already, and does it help improve the query performance? Another question: when a Grafana dashboard loads, does it cache the results of the same query or does it simply query multiple times?
Looks like this PR works for us and I can see the same
Yes, I have tried it. I wouldn't raise a pull request if the code didn't work 😄 The performance question depends on the queries: if a lot of them use the same metrics, then there is a huge performance win. This still needs a bit of work on the benchmarks. I will probably need to try out https://github.com/lestrrat-go/bufferpool 🤔
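For illustration, the buffer-reuse idea can be sketched with the standard library's `sync.Pool` (rather than the linked bufferpool package); the request fields below are placeholders, not the real key format:

```go
package coalesce

import (
	"bytes"
	"fmt"
	"sync"
)

var keyBufPool = sync.Pool{
	New: func() interface{} { return new(bytes.Buffer) },
}

// requestKey builds a coalescing key without allocating a fresh buffer for
// every request; converting to string still copies, so reusing the buffer is safe.
func requestKey(minTime, maxTime int64, matchers []string) string {
	buf := keyBufPool.Get().(*bytes.Buffer)
	defer func() {
		buf.Reset()
		keyBufPool.Put(buf)
	}()
	fmt.Fprintf(buf, "%d/%d", minTime, maxTime)
	for _, m := range matchers {
		buf.WriteByte('/')
		buf.WriteString(m)
	}
	return buf.String()
}
```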
I have tried implementing this as a gRPC interceptor, but it is not viable there because the first RecvMsg() returns so quickly that the single-flight mechanism doesn't work properly at all. Here's my code: https://gist.github.com/GiedriusS/313b3be1543f543487df5f624f84d836. So, I would like to come back to this pull request very soon.
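To illustrate why the interceptor route falls short, here is a hypothetical sketch (not the gist's code): coalescing at this level can only deduplicate stream creation, because the `grpc.ClientStream` is returned before any message is received, so concurrent callers cannot safely share the received messages.

```go
package coalesce

import (
	"context"

	"golang.org/x/sync/singleflight"
	"google.golang.org/grpc"
)

// coalescingStreamInterceptor deduplicates concurrent openings of the same
// stream. NOTE: the request body is only known later, when SendMsg() is
// called on the returned stream, and sharing one ClientStream between callers
// means they would race on RecvMsg(), which is essentially why this approach
// did not work out.
func coalescingStreamInterceptor(group *singleflight.Group) grpc.StreamClientInterceptor {
	return func(ctx context.Context, desc *grpc.StreamDesc, cc *grpc.ClientConn,
		method string, streamer grpc.Streamer, opts ...grpc.CallOption) (grpc.ClientStream, error) {
		v, err, _ := group.Do(method, func() (interface{}, error) {
			return streamer(ctx, desc, cc, method, opts...)
		})
		if err != nil {
			return nil, err
		}
		return v.(grpc.ClientStream), nil
	}
}
```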
Is this still relevant? If so, what is blocking it? Is there anything you can do to help move it forward? This issue has been automatically marked as stale because it has not had recent activity. It will be closed if no further activity occurs.
I spent quite some time recently staring at the fan-out code and thought that implementing this would be a good idea: let's join multiple requests for the same data if they are being sent to the same StoreAPI nodes (:
Very often when a new dashboard gets opened, a lot of different queries come in where the same metric data is being used. Clicking around the classic node_exporter dashboard shows via the newly implemented metric that this assumption is quite true. It is very unlikely that the data retrieved from SeriesAPI nodes will differ if the requests come a few milliseconds apart. So, I think that this is a reasonable optimization to make.

Unfortunately, because upper layers assume sole ownership of the returned `storage.SeriesSet`, we have to make copies of the data before passing it on. There are potentially bigger savings here; added a `TODO` for that.

Finally, I think that caching here is unnecessary because caching of the end-user responses is already performed by query-frontend.
Changes
Added a `broadcastingSeriesServer` that waits for a response on a query and then fans it out to all of the "listeners".

Verification
Tests still pass
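A conceptual sketch of the broadcasting idea described under Changes (hypothetical, simplified types; the actual `broadcastingSeriesServer` works on the store API stream): the single upstream request collects responses, and every listener gets its own copy because upper layers assume sole ownership of the data.

```go
package coalesce

import "sync"

// seriesResponse stands in for one streamed Series() response frame.
type seriesResponse struct {
	labels  map[string]string
	samples []float64
}

// broadcaster gathers the responses of a single upstream Series() call and
// hands a deep copy to every waiting listener.
type broadcaster struct {
	mu        sync.Mutex
	responses []seriesResponse
	done      chan struct{}
}

func newBroadcaster() *broadcaster {
	return &broadcaster{done: make(chan struct{})}
}

// Send is called by the single upstream request for each received response.
func (b *broadcaster) Send(r seriesResponse) {
	b.mu.Lock()
	b.responses = append(b.responses, r)
	b.mu.Unlock()
}

// Close signals that the upstream request has finished.
func (b *broadcaster) Close() { close(b.done) }

// Listen blocks until the upstream request finishes and returns a deep copy
// of the collected responses for one listener.
func (b *broadcaster) Listen() []seriesResponse {
	<-b.done
	b.mu.Lock()
	defer b.mu.Unlock()
	out := make([]seriesResponse, len(b.responses))
	for i, r := range b.responses {
		cp := seriesResponse{
			labels:  make(map[string]string, len(r.labels)),
			samples: append([]float64(nil), r.samples...),
		}
		for k, v := range r.labels {
			cp.labels[k] = v
		}
		out[i] = cp
	}
	return out
}
```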