
Reduce DirectFileStore memory overhead, particularly while aggregating data when getting scraped #185

Open · berniechiu opened this issue Apr 14, 2020 · 4 comments

@berniechiu commented Apr 14, 2020

As we mention in our README, using DirectFileStore has a measurable impact on a production app's memory usage.

This doesn't seem to be a memory leak: usage doesn't grow unbounded over time, but it is a problem for some of our users.

We think this memory usage is particularly high when getting scraped, at which point the library has to read all the files from all the processes, and load all the data in RAM to aggregate it. There may be more efficient ways to do this. As an example, we've found this improvement in the past.
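For illustration, here is a minimal sketch of that scrape-time pattern (the file layout and the `parse_entries` helper are hypothetical; this is not the store's actual code, but the memory profile it produces is the same):

```ruby
def aggregate_for_scrape(dir)
  all_values = Hash.new(0.0)

  Dir.glob(File.join(dir, "*.bin")).each do |path|
    File.open(path, "rb") do |file|                    # each file is opened...
      parse_entries(file.read).each do |labels, value| # hypothetical parser
        all_values[labels] += value                    # values accumulate in RAM
      end
    end                                                # ...and closed right away
  end

  all_values # held until the /metrics response has been generated
end
```

Peak memory is therefore roughly proportional to the total size of all the per-process files, which is what shows up during a scrape.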

We'd like to reduce the memory overhead of using DirectFileStore as much as possible, so consider this an open call for PRs.

Original issue text follows, so that the conversation thread below makes sense:


Hi Prometheus team,

We've bumped into issues like this

[Screenshot (2020-04-14): stack trace raising `Errno::ENOMEM: Cannot allocate memory`]

Is there any possibility to dump the file store properly?

@dmagliola (Collaborator)

Hello, thank you for this report!

I may be misunderstanding the problem, but it seems like you are running out of memory?
Could I ask you for a bit more clarification, please?

- Are you reporting this as a bug in the Prometheus Client?
- Is the issue that your memory usage is now higher after adding the Prometheus Client?
- What do you mean by "dump the file store properly"?
- Do you have a sample bit of code that can reproduce this reliably?

The more details you can add, the better.
Thank you!

@berniechiu (Author) commented Apr 14, 2020

> I may be misunderstanding the problem, but it seems like you are running out of memory?

Yes, we've encountered this after installing the library 🥺

> Are you reporting this as a bug in the Prometheus Client?
> Is the issue that your memory usage is now higher after adding the Prometheus Client?

I'm not quite sure whether this is a bug or whether we need to configure extra options for the Prometheus Client.
We've just configured it with the defaults, like this:

```ruby
Prometheus::Client.config.data_store = Prometheus::Client::DataStores::DirectFileStore.new(dir: '/tmp/prometheus_direct_file_store')
```

> What do you mean by "dump the file store properly"?

Sorry, I meant whether the file store or its memory is released properly.

Based on this section of the README:
https://github.com/prometheus/client_ruby#directfilestore-caveats-and-things-to-keep-in-mind

> Memory Usage: When scraped by Prometheus, this store will read all these files, get all the values and aggregate them. We have noticed this can have a noticeable effect on memory usage for your app. We recommend you test this in a realistic usage scenario to make sure you won't hit any memory limits your app may have.

I wonder, while aggregating the files, when is the memory (or the file store that was configured) released? I suppose this is the issue here?

> Do you have a sample bit of code that can reproduce this reliably?

I currently don't have simple code to replicate the issue, but I'll try.
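
In the meantime, this is roughly the kind of script that should surface the effect (a sketch only: the metric name and label cardinality are made up, and the `/proc`-based RSS read is Linux-only):

```ruby
require "fileutils"
require "prometheus/client"
require "prometheus/client/data_stores/direct_file_store"
require "prometheus/client/formats/text"

dir = "/tmp/prometheus_direct_file_store"
FileUtils.mkdir_p(dir)
Prometheus::Client.config.data_store =
  Prometheus::Client::DataStores::DirectFileStore.new(dir: dir)

registry  = Prometheus::Client.registry
histogram = registry.histogram(:request_duration_seconds,
                               docstring: "Request durations",
                               labels: [:path])

# Many label combinations inflate the amount of data the store has to
# read back and aggregate when scraped.
10_000.times { |i| histogram.observe(rand, labels: { path: "/page/#{i}" }) }

# Linux-only: resident set size in kB, read from /proc.
rss_kb = -> { File.read("/proc/self/status")[/VmRSS:\s+(\d+)/, 1].to_i }

before = rss_kb.call
Prometheus::Client::Formats::Text.marshal(registry) # what a scrape triggers
puts "RSS grew by #{rss_kb.call - before} kB during the scrape"
```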

Thanks a lot!

@dmagliola (Collaborator)

In my opinion, this isn't a bug in the library, and I don't think there are configuration options that will help with this.

We have investigated memory usage in our production environment specifically looking for a memory leak, and we don't think there is one, so I don't think this is an issue with releasing memory properly.
The Prometheus library does increase memory usage, as we note in the README section you quoted, but it's a bounded amount; it doesn't keep increasing indefinitely the way a memory leak would. At least, not that we have found.

> I wonder, while aggregating the files, when is the memory (or the file store that was configured) released? I suppose this is the issue here?

From your stack trace, I'm under the impression the error happened when observing a histogram, not when aggregating the files, although of course some other request could have been aggregating results at the same time and using more RAM.

As for your specific question: we read all the files by opening each one and closing it after we've read it (details here), but we do keep all the data we read from them in memory until the request is finished, as we need it to aggregate all the values and generate the response.

There have been some efforts to limit this memory usage, like this one.
If you find more efficient ways for the Client to do its job, we'd love to get a PR.
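
To make that concrete, one hypothetical direction (a sketch with made-up helper names, not a working patch) would be to aggregate and serialize one metric at a time, so peak memory is bounded by the largest single metric rather than by all metrics held at once:

```ruby
def marshal_streaming(metrics, dir)
  metrics.each_with_object(+"") do |metric, out|
    values = read_and_aggregate(metric, dir) # hypothetical per-metric read + sum
    out << serialize(metric, values)         # hypothetical text serializer
    # `values` becomes garbage here, before the next metric's files are read
  end
end
```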

If it's OK with you, I'm going to change this issue's title and description to reflect that we want to reduce memory usage, rather than it reading as an exception report; that will make it more approachable for anyone who would like to help improve this.

@dmagliola dmagliola changed the title Errno::ENOMEM: Cannot allocate memory Reduce memory usage, particularly while aggregating data when getting scraped Apr 14, 2020
@dmagliola dmagliola changed the title Reduce memory usage, particularly while aggregating data when getting scraped Reduce DirectFileStore memory usage, particularly while aggregating data when getting scraped Apr 14, 2020
@dmagliola dmagliola changed the title Reduce DirectFileStore memory usage, particularly while aggregating data when getting scraped Reduce DirectFileStore memory overhead, particularly while aggregating data when getting scraped Apr 14, 2020
@berniechiu (Author)

Thanks a lot!
