Flux : slower and slower computations of a neural network - a VRAM problem ? #654
Comments
Without profiling, I suspect this is because the VRAM is not freed in time (the GC does not know about GPU memory). This creates memory pressure, and when it gets too high, we manually trigger the GC. This is a known problem and sadly there is no bulletproof solution for it at the moment. One possible thing to try is to create, in your project directory, a file containing:
[AMDGPU]
soft_memory_limit = "80 %"
hard_memory_limit = "80 %"
Thanks for the answer! I added this, but it changed nothing; I then went down to … Now I see that the problem has been discussed many times, e.g. https://github.com/JuliaGPU/CUDA.jl/issues/137
These memory limit parameters only control how soon the GC is triggered manually under the hood, so they won't help you avoid GC calls. Probably the best solution is to introduce reference counting as a garbage collection mechanism in Julia (alongside the current GC mechanism), but that is not trivial to do (although some work has been done in that direction).
Closing this, as we now have a caching allocator that does not rely on the GC, so allocations/deallocations are very fast.
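For readers landing here, a minimal sketch of what using a caching allocator can look like. This is not the maintainer's snippet: it assumes a recent AMDGPU.jl together with the GPUArrays.jl caching allocator (`GPUArrays.AllocCache`, `GPUArrays.@cached`, `GPUArrays.unsafe_free!`), and the array size and loop body are hypothetical placeholders.

```julia
using AMDGPU, GPUArrays

# Assumption: the GPUArrays.jl caching-allocator API is available.
# The cache keeps GPU buffers alive between iterations and hands them
# back out, instead of waiting for the GC to release them.
cache = GPUArrays.AllocCache()

n = 1024^2  # hypothetical array size
for i in 1:100
    GPUArrays.@cached cache begin
        # Every GPU allocation made inside this block goes through the cache.
        x = AMDGPU.rand(Float32, n)
        y = sin.(x)
        AMDGPU.synchronize()
    end
end

# Release all cached buffers at once when the loop is done.
GPUArrays.unsafe_free!(cache)
```

Arrays allocated inside the cached block are reused on the next iteration, so they should not be kept around outside of it.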
This small code:
is slower and slower to compute on my computer. Namely, the full code:
prints the following computation times:
so it goes from 3 s to more than 20 s for the same piece of code! I checked the VRAM usage of my GPU (with
cat /sys/class/drm/card1/device/mem_info_vram_used
) and it is strictly increasing while the code above runs. Maybe this is the source of the problem? But I'm unable to free it. I tried many trivial things, such as
finalize(input)
, but I was not able to solve the problem. The CPU version works well. Please help! The GPU is an
AMD Radeon RX 6700 XT
, and I am on Manjaro, kernel 6.9.9-1.
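To make the slowdown and the VRAM growth easier to report, the two measurements can be logged side by side. The following is only a sketch using the Julia standard library: the sysfs path is the one quoted above (the card index depends on the machine), `vram_used_mib` and `profile_steps` are hypothetical helper names, and the workload is a CPU stand-in for the real GPU computation.

```julia
# Path taken from the report above; the card index (card1) is machine-specific.
const VRAM_USED_PATH = "/sys/class/drm/card1/device/mem_info_vram_used"

# Return the VRAM in use, in MiB, or `nothing` if the counter cannot be read.
function vram_used_mib(path::AbstractString = VRAM_USED_PATH)
    isfile(path) || return nothing
    bytes = tryparse(Int, strip(read(path, String)))
    return bytes === nothing ? nothing : bytes / 2^20
end

# Run `step()` `n` times and record (seconds, VRAM in MiB) for each iteration.
function profile_steps(step, n::Integer; path = VRAM_USED_PATH)
    records = Vector{Tuple{Float64, Union{Nothing, Float64}}}()
    for _ in 1:n
        t = @elapsed step()
        push!(records, (t, vram_used_mib(path)))
    end
    return records
end

# Hypothetical stand-in workload; replace it with the GPU computation being timed.
records = profile_steps(() -> sum(rand(Float32, 1000)), 3)
for (i, (t, vram)) in enumerate(records)
    println("iteration $i: $(round(t; digits = 4)) s, VRAM used: ",
            vram === nothing ? "n/a" : "$(round(vram; digits = 1)) MiB")
end
```

GPU work is launched asynchronously, so when timing real GPU code the step should presumably end with a synchronization call (for example `AMDGPU.synchronize()`), otherwise the timings may not reflect the actual computation.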