Summaries are getting slower and using more and more memory in long term #88

Open
pelt opened this issue Apr 21, 2023 · 6 comments

@pelt

pelt commented Apr 21, 2023

The underlying quantile package seems to get slower and use more memory when it is configured with more than one invariant, which is the default in both aioprometheus and quantile. This becomes a problem for long-running services that gather millions of measurements for one summary metric. The response time for Prometheus can increase to over one second.

A current workaround is to use exactly one invariant/quantile (if it's feasible for your use cases) so that this issue is not triggered within the quantile package.

@pelt pelt changed the title Summaries are getting slower and uses more and more memory in long term Summaries are getting slower and using more and more memory in long term Apr 21, 2023
@JacobHenner
Contributor

> The underlying quantile package seems to get slower and use more memory when it is configured with more than one invariant, which is the default in both aioprometheus and quantile.

How was this assessed? Do you have a reproducer?

Is this an issue with how aioprometheus uses the quantile library? Or is there a bug in the quantile library itself?

Why is the use of one invariant vs > 1 invariant relevant?

@JacobHenner
Contributor

I've been experimenting with this. I think there's either a bug or a deliberate difference in the quantile library compared to the quantile libraries used by the Prometheus client libraries for other languages.

To test this theory, I calculated the (0.5, 0.005), (0.90, 0.001), (0.99, 0.0001) quantiles for one million random integers in the range [0, 10) using both https://github.com/matttproud/python_quantile_estimation (Python, used here) and github.com/beorn7/perks/ (Go, used in the official Go Prometheus client library).

The Go implementation took ~1.5 seconds to run and maintained ~1250 samples. Using the same class of input, the Python implementation maintained 104659 samples and took ~192 seconds. Both libraries claim to use the same algorithm from the paper "Effective Computation of Biased Quantiles over Data Streams".
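
For anyone who wants to repeat a comparison like this, a timing harness could look roughly like the sketch below. The `observe`/`sample_count` interface and the `KeepEverything` stand-in are assumptions made for illustration, not the API of either library. A real run would wrap each library's estimator in a small adapter that exposes those two methods.

```python
import random
import time
from typing import Callable, List, Tuple


class KeepEverything:
    """Stand-in estimator that stores every observation.

    Hypothetical placeholder only: it does no compression at all, so it
    marks the worst case for memory rather than any real algorithm.
    """

    def __init__(self) -> None:
        self._samples: List[float] = []

    def observe(self, value: float) -> None:
        self._samples.append(value)

    def sample_count(self) -> int:
        return len(self._samples)


def benchmark(make_estimator: Callable[[], object],
              n: int = 1_000_000,
              seed: int = 0) -> Tuple[float, int]:
    """Feed n random integers in [0, 10) and report (seconds, retained samples)."""
    rng = random.Random(seed)  # fixed seed so runs are comparable
    estimator = make_estimator()
    start = time.perf_counter()
    for _ in range(n):
        estimator.observe(rng.randrange(10))
    elapsed = time.perf_counter() - start
    return elapsed, estimator.sample_count()


if __name__ == "__main__":
    seconds, retained = benchmark(KeepEverything, n=100_000)
    print(f"{seconds:.3f}s, {retained} samples retained")
```

The retained sample count is the number to watch: an estimator that compresses properly should keep it roughly flat as `n` grows, while one that does not will track `n`.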

I don't believe this is an issue specific to aioprometheus. However, one thing to note is that other prometheus client libraries (including Java, Go) implement sliding windows for Summaries. If I understand correctly, having sliding windows in aioprometheus's Summary implementation would provide an upper limit on how many samples would be retained (the maximum number of observations logged within the window). Perhaps supporting sliding windows should be considered. It looks like someone has tried: https://github.com/RefaceAI/aioprometheus-summary/blob/main/aioprometheus_summary/__init__.py
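
To make the sliding-window idea concrete, here is a minimal sketch of how rotating age buckets cap the number of retained observations. It is a simplification and not how aioprometheus or the Go and Java clients are implemented: the class name, `max_age_seconds`, `age_buckets`, and the injectable `clock` are all hypothetical, and an exact sorted-list quantile stands in for the streaming estimator.

```python
import math
import time
from collections import deque
from typing import Callable, Deque, List, Tuple


class WindowedSamples:
    """Keeps observations in time buckets and drops buckets older than the window."""

    def __init__(self,
                 max_age_seconds: float = 600.0,
                 age_buckets: int = 5,
                 clock: Callable[[], float] = time.monotonic) -> None:
        self._bucket_span = max_age_seconds / age_buckets
        self._age_buckets = age_buckets
        self._clock = clock  # injectable so the window can be tested without sleeping
        # Each entry is (bucket index, values observed during that bucket).
        self._buckets: Deque[Tuple[int, List[float]]] = deque()

    def _current_bucket(self) -> List[float]:
        current = int(self._clock() // self._bucket_span)
        oldest_allowed = current - self._age_buckets + 1
        # Expire whole buckets that have fallen out of the window.
        while self._buckets and self._buckets[0][0] < oldest_allowed:
            self._buckets.popleft()
        if not self._buckets or self._buckets[-1][0] != current:
            self._buckets.append((current, []))
        return self._buckets[-1][1]

    def observe(self, value: float) -> None:
        self._current_bucket().append(value)

    def retained(self) -> int:
        self._current_bucket()  # trigger expiry before counting
        return sum(len(values) for _, values in self._buckets)

    def quantile(self, q: float) -> float:
        self._current_bucket()
        values = sorted(v for _, bucket in self._buckets for v in bucket)
        if not values:
            return math.nan
        return values[min(len(values) - 1, int(q * len(values)))]
```

With this shape, memory is bounded by the number of observations logged within the window rather than by the lifetime of the process, which is the upper limit described above.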

@JacobHenner
Contributor

> The Go implementation took ~1.5 seconds to run and maintained ~1250 samples. Using the same class of input, the Python implementation maintained 104659 samples and took ~192 seconds. Both libraries claim to use the same algorithm from the paper "Effective Computation of Biased Quantiles over Data Streams".

I've written my own implementation, inspired by the Go implementation. Its performance and memory utilization are much better, and it passes the Go implementation's tests. I hope to release it publicly sometime soon.

@alfiedotwtf

Hey Jacob. Did you manage to release the faster implementation? Having a look at the source, it seems that it's still using quantile-python, which looks like it dates from 2015 on PyPI.

@JacobHenner
Contributor

> Hey Jacob. Did you manage to release the faster implementation? Having a look at the source, it seems that it's still using quantile-python, which looks like it dates from 2015 on PyPI.

Not yet, but I did get approval to do so - I'll try to share as soon as I have a chance.

@alfiedotwtf

No problem, thanks for the update.
