Reduce DirectFileStore memory overhead, particularly while aggregating data when getting scraped #185
Hello, thank you for this report! I may be misunderstanding the problem, but it seems like you are running out of memory? Are you reporting this as a bug in the Prometheus Client? The more details you can add, the better.
Yes, we've encountered this after installing the library 🥺
I'm not quite sure whether this is a bug or whether we need to configure extra options for the Prometheus Client. We configured the data store like this:
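```ruby
Prometheus::Client.config.data_store =
  Prometheus::Client::DataStores::DirectFileStore.new(dir: '/tmp/prometheus_direct_file_store')
```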
Sorry, I meant whether the file store, or its memory, is released properly: https://github.com/prometheus/client_ruby#directfilestore-caveats-and-things-to-keep-in-mind
I'm wondering when the memory (or the configured file store) gets released while the files are being aggregated. I suppose that's the issue here?
I currently don't have simple code to replicate the issue, but I'll try. Thanks a lot!
In my opinion, this isn't a bug in the library, and I don't think there are configuration options that will help with this. We have looked into memory usage in our production environment, specifically searching for a memory leak, and we don't think there is one. I don't think this is an issue with releasing memory properly.
From your stack trace, I'm under the impression the error happened when observing a histogram, not when aggregating the files, although of course some other request could have been aggregating results at the same time and using more RAM.

As for your specific question: we read all the files by opening each one and closing it after we've read it (details here), but we do keep all the data we read from all of them in memory until the request is finished, since we need it to aggregate all the values and generate the response. There have been some efforts to limit this memory usage, like this one.

If it's OK with you, I'm going to change this issue's title and description to reflect that we want to reduce memory usage, rather than it looking like an exception, which will make it more approachable for anyone who would like to help improve this.
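To make the mechanism above concrete, here is a heavily simplified sketch of the "read every per-process file, keep it all in memory, then aggregate" pattern described in that comment. The JSON file format, directory layout, and method names here are made up for illustration; the library's real file format and internals differ.

```ruby
require "json"

# Read every per-process file in the store directory.
def read_all_stores(dir)
  Dir.glob(File.join(dir, "*.json")).map do |path|
    # Each file is opened, read, and closed before moving on to the next one...
    File.open(path, "r") { |f| JSON.parse(f.read) }
  end
  # ...but every file's parsed contents stay in the returned array until the
  # scrape response has been generated.
end

# Sum the values for each label set across all processes.
def aggregate(stores)
  stores.each_with_object(Hash.new(0.0)) do |store, totals|
    store.each { |labels, value| totals[labels] += value }
  end
end

stores = read_all_stores("/tmp/prometheus_direct_file_store")
puts aggregate(stores)
```

Peak memory during a scrape is then roughly the sum of all the per-process files, which is the overhead this issue is about.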
Thanks a lot!
As we mention in our README, using DirectFileStore has a measurable impact on the production app's memory usage.
This doesn't seem to be a memory leak (it doesn't grow unbounded over time), but it is a problem for some of our users.
We think this memory usage is particularly high when getting scraped, at which point the library has to read all the files from all the processes, and load all the data in RAM to aggregate it. There may be more efficient ways to do this. As an example, we've found this improvement in the past.
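One direction that might help (a hypothetical sketch under the same made-up file format as above, not an existing API in the library) is to fold each file into a running aggregate and let its data be garbage collected before the next file is read, so peak memory is roughly one file's worth of data plus the running totals, rather than all files at once:

```ruby
require "json"

# Merge one file at a time into a running total instead of holding all
# files' data in memory simultaneously.
def aggregate_incrementally(dir)
  totals = Hash.new(0.0)
  Dir.glob(File.join(dir, "*.json")).each do |path|
    data = File.open(path, "r") { |f| JSON.parse(f.read) }
    data.each { |labels, value| totals[labels] += value }
    # `data` falls out of scope here and can be garbage collected before
    # the next file is read.
  end
  totals
end

puts aggregate_incrementally("/tmp/prometheus_direct_file_store")
```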
We'd like to reduce the memory overhead of using DirectFileStore as much as possible, so this is a sort of call for PRs.
Original issue text below, for the conversation thread to make sense:
Hi Prometheus team,
We've bumped into issues like this
Is there any possibility to dump the file store properly?