
Release Buffer Command? #8

Open
bamonroe opened this issue Dec 19, 2021 · 6 comments

Comments

@bamonroe

My machine:
OS: Arch Linux
GPU: NVIDIA GeForce GTX 750ti (2GB)
Native Nvidia OpenCL drivers, version 495.46

R info:
R version: 4.1.2
Package version: CRAN OpenCL_0.2-2

I'm doing some pretty heavy simulations which tend to max out the VRAM on my GPU. This is fine because I can simply call "as.numeric" on the result of the oclRun command to retrieve the stored simulations off my GPU. I do something like this:

args <- list()
args$kernel <- kernel   # the OpenCL kernel object
args$size   <- size     # length of the result vector
args$dim    <- dim      # global work size

args[[length(args) + 1]] <- as.clBuffer(input1, ctx)   # copy inputs into GPU buffers
args[[length(args) + 1]] <- as.clBuffer(input2, ctx)
...

res <- do.call(oclRun, args)   # res is a clBuffer that lives on the GPU
dat <- as.numeric(res)         # copy the results back into an R vector
save(dat, file = "dat.Rda")

Watching nvtop in the terminal, I can see that my GPU's memory is still full while I'm saving the data. The output buffer is what's eating the RAM, because about a million results are being stored. If I do rm(res), the pointer to the clBuffer on the GPU is lost to the R session, but there isn't a corresponding call to clReleaseMemObject, or something like that, to free up the VRAM. Could a method for rm be added to make this call, or could some other way of releasing the buffer memory on the GPU be added? I'd like to put that "res-dat-save" block into a loop, but currently I have to let R quit for the GPU memory to be released.

Thanks for all the work with this package, I'm finding it very useful.

@s-u
Owner

s-u commented Dec 19, 2021

@bamonroe Under normal circumstances the regular memory management rules apply: as soon as the R reference is freed, so is the corresponding OpenCL object.

Maybe the only important part here is that R doesn't know the size of the objects represented on the GPU, so it doesn't take them into account when deciding when to trigger a garbage collection. You should therefore call gc() directly if you want the objects released immediately rather than waiting for the next collection. So in your example above you want to add rm(args) after do.call(), then either res <- as.numeric(res) or an explicit rm(res), and finally gc() at the end - in theory that should release everything on the GPU. If that is not the case, please include a full example that I can run and look at.
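For concreteness, a sketch of that sequence applied to the snippet at the top of the issue (args, res, and dat are the objects from that example):

res <- do.call(oclRun, args)
rm(args)                 # drop the R references to the input clBuffers
dat <- as.numeric(res)   # copy the result from the GPU into an ordinary R vector
rm(res)                  # drop the reference to the output clBuffer
gc()                     # force a collection so the OpenCL buffers are released now
save(dat, file = "dat.Rda")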

@aaronpuchert
Collaborator

aaronpuchert commented Dec 20, 2021

I've had the same issue in the past and gc() did it for me. Didn't even bother to rm anything, because I had enough memory to fit a couple of iterations. So it just dropped whatever was unreachable anyway.

Would be nice to have R handle this automatically though. Any idea if we can tell the garbage collector about externally allocated memory and perhaps even the (usually lower) memory limit? Not sure if OpenCL tells us that—clinfo has a “Global memory size”, but it seems too optimistic on my machine.

@s-u
Owner

s-u commented Dec 20, 2021

@aaronpuchert unfortunately, no. R only tracks memory/objects it allocates itself and has no concept of "foreign" memory. Supporting that would require custom code deep in R's allocation internals that also consults some external measure, beyond its own allocator, when deciding to trigger garbage collection. And for it to be actually useful, it would have to track each source separately, since collecting "real" memory won't help the GPU and vice versa.

However, one thing we could do internally in OpenCL is track the allocations ourselves and trigger R's GC when we see high GPU RAM pressure. Because all allocations go through our code, we'd just have to add tracking to either cl_create_buffer or mkBuffer, plus clFreeBuffer. Then in mkBuffer() we could trigger R_gc() if we think the usage is high.
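Purely as an R-level illustration of that idea (the real tracking described above would live in the package's C code; trackedBuffer, gpu_bytes_in_use, and the gpu_gc_trigger threshold below are hypothetical names and values, with as.clBuffer standing in as the allocation entry point):

gpu_bytes_in_use <- 0
gpu_gc_trigger   <- 1.5e9   # assumed threshold, e.g. ~1.5 GB on a 2 GB card

trackedBuffer <- function(x, ctx) {
    bytes <- length(x) * 8                         # double-precision payload size
    if (gpu_bytes_in_use + bytes > gpu_gc_trigger)
        gc()                                       # collect unreachable clBuffers so their finalizers free VRAM
    buf <- as.clBuffer(x, ctx)
    gpu_bytes_in_use <<- gpu_bytes_in_use + bytes  # the real implementation would also decrement on free
    buf
}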

@bamonroe
Author

Thanks so much for the quick responses. It looks like explicitly calling garbage collection did the trick, and this seems like the sensible way to approach the problem. The more I thought about it, the more it seemed that an explicit "clDeleteBuffer" function would run the risk of dangling pointers.

It would be great to mention this in the documentation for the clBuffer function. Something like: "When all references to the pointer created by clBuffer have been deleted from R, the buffer is deleted on the corresponding device, freeing the memory allocated to the buffer." I guess "deleted from R" is not a good way to say it; maybe "go out of scope"?

@s-u
Owner

s-u commented Dec 21, 2021

@bamonroe @aaronpuchert I have added support for memory tracking and automatic garbage collection. See ?oclMemLimits for details. The auto-GC is disabled by default for now, because it really needs to be set to some reasonable values, and I haven't checked how to obtain them (note that the tracking is global, it is NOT per-device). It may make sense to enable it with some halfway-reasonable values based on the detected hardware ... have a look and let me know if it helps.
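A minimal sketch, assuming oclMemLimits() can be called with no arguments to report the current tracking state (the actual interface, including how the trigger and high-water limits are set, is documented in ?oclMemLimits):

library(OpenCL)
oclMemLimits()   # assumed no-argument query of current buffer-memory usage and limits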

@aaronpuchert
Collaborator

> I have added support for memory tracking and automatic garbage collection. See ?oclMemLimits for details.

Nice, that seems like a good start.

> The auto-GC is disabled by default for now, because it really needs to be set to some reasonable values, and I haven't checked how to obtain them (note that the tracking is global, it is NOT per-device).

We could query clGetDeviceInfo with CL_DEVICE_GLOBAL_MEM_SIZE (and if CL_DEVICE_HOST_UNIFIED_MEMORY is true we might deactivate the mechanism). Then in cl_create_buffer we could get the device via Rf_getAttrib(context_exp, oclDeviceSymbol).

We could allow setting a limit per context, though it probably makes more sense to track per device, with the default taken as the global memory size.
