Summaries are getting slower and using more and more memory in long term #88

pelt · 2023-04-21T08:16:04Z

The underlying quantile package seems to get slower and use more memory when configured with more than one invariant, which is the default in both aioprometheus and quantile. This becomes an issue for long-running services that gather millions of measurements for one summary metric. The response time for Prometheus can increase to over one second.

A current workaround is to use exactly one invariant/quantile (if it's feasible for your use cases) so that this issue is not triggered within the quantile package.

JacobHenner · 2023-09-05T18:28:26Z

> The underlying quantile package seems to get slower and use more memory when configured with more than one invariant, which is the default in both aioprometheus and quantile.

How was this assessed? Do you have a reproducer?

Is this an issue with how aioprometheus uses the quantile library? Or is there a bug in the quantile library itself?

Why is the use of one invariant vs > 1 invariant relevant?

JacobHenner · 2023-09-07T00:39:34Z

I've been experimenting with this. I think there's either a bug or a deliberate difference in the quantile library compared to the quantile libraries used by the Prometheus client libraries for other languages.

To test this theory, I calculated the (0.5, 0.005), (0.90, 0.001), (0.99, 0.0001) quantiles for one million random integers in the range [0, 10) using both https://github.com/matttproud/python_quantile_estimation (Python, used here) and github.com/beorn7/perks/ (Go, used in the official Go Prometheus client library).

The Go implementation took ~1.5 seconds to run and maintained ~1250 samples. Using the same class of input, the Python implementation maintained 104659 samples and took ~192 seconds. Both libraries claim to use the same algorithm from the paper Effective Computation of Biased Quantiles over Data Streams.
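The script behind these numbers was not posted, so the following is only a rough sketch of how such a measurement could be set up. `ExactEstimator` and `benchmark` are hypothetical names introduced here. The stand-in estimator keeps every observation and exists only so the harness runs without third-party packages.

```python
import random
import time


class ExactEstimator:
    """Hypothetical stand-in: stores every observation, no compression."""

    def __init__(self):
        self.samples = []

    def observe(self, value):
        self.samples.append(value)

    def query(self, rank):
        # Exact quantile by sorting; only meaningful once samples exist.
        ordered = sorted(self.samples)
        index = min(int(rank * len(ordered)), len(ordered) - 1)
        return ordered[index]


def benchmark(make_estimator, count=1_000_000, seed=0):
    """Time how long a fresh estimator takes to observe `count` integers in [0, 10)."""
    rng = random.Random(seed)
    values = [rng.randrange(10) for _ in range(count)]  # generated outside the timed region
    estimator = make_estimator()
    start = time.perf_counter()
    for value in values:
        estimator.observe(value)
    elapsed = time.perf_counter() - start
    return estimator, elapsed


if __name__ == "__main__":
    estimator, elapsed = benchmark(ExactEstimator, count=100_000)
    print(f"observed in {elapsed:.3f}s, retained {len(estimator.samples)} samples")
```

To measure the Python library discussed here, `make_estimator` could presumably be swapped for something like `lambda: quantile.Estimator((0.5, 0.005), (0.90, 0.001), (0.99, 0.0001))`, assuming the package exposes `Estimator` with an `observe` method in that form. How to read the retained sample count from that object is not covered by this sketch.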

I don't believe this is an issue specific to aioprometheus. However, one thing to note is that other prometheus client libraries (including Java, Go) implement sliding windows for Summaries. If I understand correctly, having sliding windows in aioprometheus's Summary implementation would provide an upper limit on how many samples would be retained (the maximum number of observations logged within the window). Perhaps supporting sliding windows should be considered. It looks like someone has tried: https://github.com/RefaceAI/aioprometheus-summary/blob/main/aioprometheus_summary/__init__.py
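To illustrate why a window bounds memory, here is a minimal sketch that simply drops observations older than a maximum age. It is not how the Go or Java clients implement it (as far as I know they rotate several streaming estimators rather than storing raw samples), and `SlidingWindowSummary`, `max_age`, and the 600-second default are names and values assumed for this example only.

```python
import time
from collections import deque


class SlidingWindowSummary:
    """Illustrative only: keeps raw observations newer than `max_age` seconds."""

    def __init__(self, max_age=600.0, clock=time.monotonic):
        self.max_age = max_age
        self.clock = clock  # injectable so the window can be tested without sleeping
        self._samples = deque()  # (timestamp, value) pairs, oldest first

    def _expire(self, now):
        # Drop everything that has fallen out of the window.
        while self._samples and now - self._samples[0][0] > self.max_age:
            self._samples.popleft()

    def observe(self, value):
        now = self.clock()
        self._expire(now)
        self._samples.append((now, float(value)))

    def query(self, quantile):
        """Return the exact quantile over the current window, or None if it is empty."""
        self._expire(self.clock())
        if not self._samples:
            return None
        ordered = sorted(value for _, value in self._samples)
        index = min(int(quantile * len(ordered)), len(ordered) - 1)
        return ordered[index]

    def __len__(self):
        return len(self._samples)
```

With this shape, the number of retained samples can never exceed the number of observations made within one window, regardless of how long the service runs. A real implementation would still want a compressing estimator inside the window to keep high-traffic metrics cheap.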

JacobHenner · 2023-09-11T02:22:31Z

> The Go implementation took ~1.5 seconds to run and maintained ~1250 samples. Using the same class of input, the Python implementation maintained 104659 samples and took ~192 seconds. Both libraries claim to use the same algorithm from the paper Effective Computation of Biased Quantiles over Data Streams.

I've written my own implementation, inspired by the Go implementation. Its performance and memory utilization are much better, and it passes the Go implementation's tests. I hope to release it publicly sometime soon.

alfiedotwtf · 2024-06-18T11:22:28Z

Hey Jacob. Did you manage to release the faster implementation? Having a look at the source, it seems that it's still using quantile-python, which looks like it's from 2015 on PyPI.

JacobHenner · 2024-06-18T12:43:36Z

> Hey Jacob. Did you manage to release the faster implementation? Having a look at the source, it seems that it's still using quantile-python, which looks like it's from 2015 on PyPI.

Not yet, but I did get approval to do so - I'll try to share as soon as I have a chance.

alfiedotwtf · 2024-06-18T12:49:04Z

No problem, thanks for the update.

pelt changed the title ~~Summaries are getting slower and uses more and more memory in long term~~ Summaries are getting slower and using more and more memory in long term Apr 21, 2023
