arxiv_plot_extract on next is using to much RAM memory #627

Open
DonHaul opened this issue Dec 5, 2024 · 0 comments
Labels
project: next type: bug Something isn't working

DonHaul commented Dec 5, 2024

When running the plotextract stage on next, we sometimes see very high RAM usage.
[Image: screenshot of top output showing the memory usage]

This is happening in the following call tree:
process_tarball
convert_image call
convert_images()
convert_image()

Usually each image takes around 70 MB of memory to process, but others take 800 MB, as seen below. This profile is for the processing of https://cernbox.cern.ch/s/QnUQJ7E6a2lSbWN

Line #    Mem usage    Increment   Line Contents
================================================
    74     38.9 MiB     38.9 MiB   @profile()
    75                             def convert_image(from_file, to_file, image_format):
    76                                 """Convert an image to given format."""
    77     39.0 MiB      0.1 MiB       memory_limit = limits['memory']
    78     39.0 MiB      0.0 MiB       disk_limit = limits['disk']
    79                                 # fix for weird situation which SOMETIMES
    80                                 # (usualy on first file in  a record)
    81                                 # limits resets to default value when used inside `with` block in here.
    82    868.9 MiB    829.9 MiB       with Image(filename=from_file) as original:
    83    868.9 MiB      0.0 MiB           limits['memory'] = memory_limit
    84    868.9 MiB      0.0 MiB           limits['disk'] = disk_limit
    85    868.9 MiB      0.0 MiB           with original.convert(image_format) as converted:
    86    868.9 MiB      0.0 MiB               limits['memory'] = memory_limit
    87    868.9 MiB      0.0 MiB               limits['disk'] = disk_limit
    88     46.6 MiB      0.0 MiB               converted.save(filename=to_file)
    89     46.6 MiB      0.0 MiB       return to_file
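
To spot the expensive line in longer profiles, a report like the one above can be scanned for its largest increment. This is only a minimal sketch: the `largest_increment` helper and the `ROW` pattern are hypothetical names, and the pattern assumes the column layout shown in this report.

```python
import re

# One data row of the report: line number, memory usage, increment, source text.
# Rows without memory figures (e.g. the `def` line) simply do not match.
ROW = re.compile(r"^\s*(\d+)\s+([\d.]+) MiB\s+(-?[\d.]+) MiB\s+(.*)$")


def largest_increment(report):
    """Return (line_no, increment_mib, source) for the row with the biggest increment."""
    rows = []
    for line in report.splitlines():
        match = ROW.match(line)
        if match:
            line_no, _usage, increment, source = match.groups()
            rows.append((int(line_no), float(increment), source.strip()))
    return max(rows, key=lambda row: row[1]) if rows else None


# A few rows copied from the profile above.
report = """\
    77     39.0 MiB      0.1 MiB       memory_limit = limits['memory']
    82    868.9 MiB    829.9 MiB       with Image(filename=from_file) as original:
    88     46.6 MiB      0.0 MiB               converted.save(filename=to_file)
"""

print(largest_increment(report))
```

For the rows above this points at line 82, where the source image is opened.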

Besides this, as shown in the top screenshot above, this process may use up to 2.1 GB of memory. That memory comes not only from the Python process but also from the magick and ghostscript subprocesses it spawns in the background. This is a known issue that has been reported by other wand users:
https://stackoverflow.com/questions/44209861/how-to-reduce-wand-memory-usage

Next Actions:

  • Reduce the amount of memory wand is allowed to use, both in the next-worker and the next-idler:
    • For the idler it is currently set to 2.4 GB; reducing it to 500 MB (APP_WAND_MEMORY_LIMIT=536870912) seems to reduce the memory usage somewhat.
    • For the workers it is currently set to 500 MB; a good value for APP_WAND_MEMORY_LIMIT still needs to be determined.
    • Consider mounting a volume on the worker pods so that swap can be used, as they usually have between 1 GB and 3 GB of available RAM, which is not enough in some cases.
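
As a rough illustration of the first action, the sketch below reads APP_WAND_MEMORY_LIMIT as a byte count and stores it under the `memory` key of a dict-like limits object, the same key `convert_image` uses in the profile above. How the application actually wires this variable into Wand is not shown in this issue, so the helper names (`memory_limit_from_env`, `apply_memory_limit`) and the fallback behaviour are assumptions.

```python
import os

# 536870912 bytes is exactly 512 MiB, the value quoted for the idler.
DEFAULT_MEMORY_LIMIT = 512 * 1024 ** 2


def memory_limit_from_env(environ=None, default=DEFAULT_MEMORY_LIMIT):
    """Read APP_WAND_MEMORY_LIMIT (in bytes); fall back to `default` if unset or invalid."""
    environ = os.environ if environ is None else environ
    raw = environ.get("APP_WAND_MEMORY_LIMIT", "")
    try:
        value = int(raw)
    except ValueError:
        return default
    return value if value > 0 else default


def apply_memory_limit(limits, environ=None):
    """Store the limit in a dict-like `limits` object and return the value used."""
    limit = memory_limit_from_env(environ)
    limits["memory"] = limit
    return limit


# With Wand installed, `limits` would be `wand.resource.limits`; a plain dict
# stands in for it here so the sketch runs without ImageMagick.
fake_limits = {}
used = apply_memory_limit(fake_limits, {"APP_WAND_MEMORY_LIMIT": "536870912"})
print(used // 1024 ** 2, "MiB")
```

Note that this limit only covers what ImageMagick accounts for itself; as described above, the subprocesses add memory on top of it.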

Edit: some of the arxiv_plot_extract runs are using up to 7.6 GB of RAM. To make sure they run, at least in the idler, check that the pod has enough free RAM. If the pod is old, old Python processes might be using RAM unnecessarily; kill them.

MiB Mem :  14625.7 total,   9888.0 free,   3610.5 used,   1454.8 buff/cache
MiB Swap:      0.0 total,      0.0 free,      0.0 used.  11015.2 avail Mem

In this example about 9.7 GiB of RAM is free (9888.0 MiB), which should be fine for most cases.
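
The free-memory figure can be read off the `top` summary line directly. The following is a small sketch for doing that conversion (1 GiB = 1024 MiB); `parse_top_mem` is a hypothetical helper, and it assumes `top` is reporting in MiB as in the output above.

```python
import re


def parse_top_mem(line):
    """Parse a `top` summary line such as 'MiB Mem : 14625.7 total, ...' into {name: MiB}."""
    _, _, fields = line.partition(":")
    pairs = re.findall(r"(\d+(?:\.\d+)?)\s+([A-Za-z/]+)", fields)
    return {name: float(value) for value, name in pairs}


mem_line = "MiB Mem :  14625.7 total,   9888.0 free,   3610.5 used,   1454.8 buff/cache"
mem = parse_top_mem(mem_line)

# Convert the free figure from MiB to GiB.
free_gib = mem["free"] / 1024
print(f"{free_gib:.2f} GiB free")
```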

@DonHaul DonHaul added project: next type: bug Something isn't working labels Dec 5, 2024
@DonHaul DonHaul changed the title plotextract on next is not clearing memory properly arxiv_plot_extract on next is not clearing memory properly Dec 6, 2024
@DonHaul DonHaul changed the title arxiv_plot_extract on next is not clearing memory properly arxiv_plot_extract on next is using to much RAM memory Dec 6, 2024