-
-
Notifications
You must be signed in to change notification settings - Fork 174
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
nvjpeg decoder #3504
Comments
commented out for now until I can figure out why the decoding doesn't populate the pixel buffer
The inspiration for the GPU code above was not from nvidia's own documentation or sample code - which fails to tell you that the buffer must be a GPU buffer... and the API doesn't fail if you use a CPU buffer either.
Still TODO: move GPU device selection and keep the context alive as creating it is very expensive. |
Testing with: XPRA_OPENGL_DRAW_REFRESH=0 XPRA_OPENGL_NVJPEG=1 xpra attach tcp://127.0.0.1:10000/ \
--no-speaker --no-mmap --opengl=force -d nvjpeg,opengl --quality=10 A
That's 40ms to decompress and update the screen at 4K, more than half of which (21ms) is spent decoding. The next biggest item is the RGB buffer copying (10ms). Would using We can't really make this more parallel because we need an X11 context to access OpenGL and copy the GPU buffer. I don't think that this can be split into two threads (one for decoding and one for copying) because pycuda stores contexts per-thread. When handling multiple windows, the cuda context object could be cached and re-used, see #3209
|
enable nvjpeg decoder by default now that it works well
This is now enabled by default and works quite well! Possible future enhancements:
|
It still leaks GPU memory.. Things I have tried:
Things I have not tried:
Other notes:
|
the ImageWrapper just returns the CUDA buffer and the caller is responsible for downloading it or copying on the GPU
That's because we free the cuda context for this window by calling |
Perhaps try with nvjpeg-python, only temporarily as I really don't like the dependencies and the interface. |
Perhaps the leak comes from: xpra/xpra/client/gl/gl_window_backing_base.py Line 1051 in 4fbaf2c
Or: xpra/xpra/client/gl/gl_window_backing_base.py Lines 1064 to 1065 in 4fbaf2c
xpra/xpra/client/gl/gl_window_backing_base.py Line 1114 in 4fbaf2c
But perhaps we need to re-arrange the allocation / context / import? |
Could also be a vbo leak? |
Or perhaps we need to free the resources / stream whilst the cuda context is active: inducer/pycuda#370 (comment) |
Found memory allocation related issues in the release notes:
Perhaps we can just try again. |
As per the commit above, this no longer leaks memory with the latest drivers (at least on MS Windows) and is now enabled by default if the driver version is 522.6 or later. If we find that the fix is also included in older versions of the drivers, we can change the number used in the version check. |
Hi @totaam. May I ask why do you you rely on the driver version instead of CUDA version for checking? I'm trying to employ nvJPEG in other software, and I need a robust environment check to prevent this severe memory leak in deployments too. I've managed to find only this one reported issue on this bug in nvJPEG, and the PyTorch maintainers seem to rely on CUDA version for the check. I'm just trying to figure whether it's driver or CUDA is the primary factor of the memory leak presence. |
@dev0x13 |
@totaam That makes sense, thank you! |
FWIW: The leak has also been fixed with the Linux drivers - |
Same as #2984 but for client side decoding this time.
This will allow us to achieve the best possible single picture latency.
Once implemented, it would make sense to tell the server to prefer
jpeg
overwebp
in all(?) cases.The text was updated successfully, but these errors were encountered: