EGL Backend: Nvidia Used Memory issue #272

Open
892768447 opened this issue Jan 23, 2025 · 1 comment
892768447 commented Jan 23, 2025

I encountered a problem where an application that does not require a graphics card, such as gedit, occupies 2 MB of video memory when launched in EGL mode, but the VGL mode does not have this issue.

Environment:

  • Quadro RTX 4000
  • NVIDIA-SMI 535.183.01
  • Driver Version: 535.183.01
  • CUDA Version: 12.2
  • Red Hat Enterprise Linux Server 7.7 (Maipo)
  • VirtualGL v3.1.2 64-bit
  • tigervnc-server-1.8.0-22.el7.x86_64

Problem:

vglrun -d egl gedit

[screenshot]

Guess:

After researching and debugging the code, I found that the call to eglMakeCurrent occupies 2 MB of video memory, and testing with an example program showed that eglReleaseThread must be called immediately afterward to release it.
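As a minimal sketch of what I tested (a hypothetical standalone program, not VirtualGL code; it assumes the driver supports EGL_KHR_no_config_context and EGL_KHR_surfaceless_context, and that the default EGL display maps to the NVIDIA GPU, whereas VirtualGL itself opens the display through an EGL device):

#include <EGL/egl.h>
#include <stdio.h>

int main(void)
{
	EGLDisplay dpy = eglGetDisplay(EGL_DEFAULT_DISPLAY);

	if(dpy == EGL_NO_DISPLAY || !eglInitialize(dpy, NULL, NULL))
		return 1;
	if(!eglBindAPI(EGL_OPENGL_API))
		return 1;

	/* Same pattern as buildCfgAttribTable(): a no-config context that is made
	   current without any surface */
	EGLContext ctx = eglCreateContext(dpy, (EGLConfig)0, EGL_NO_CONTEXT, NULL);

	if(ctx == EGL_NO_CONTEXT)
		return 1;
	/* nvidia-smi starts reporting ~2 MB of used memory for this process here */
	eglMakeCurrent(dpy, EGL_NO_SURFACE, EGL_NO_SURFACE, ctx);

	eglMakeCurrent(dpy, EGL_NO_SURFACE, EGL_NO_SURFACE, EGL_NO_CONTEXT);
	eglDestroyContext(dpy, ctx);
	/* Without the next call, the allocation persists for the life of the
	   process; with it, the memory is released immediately. */
	eglReleaseThread();

	printf("Check nvidia-smi now, then press Enter to exit\n");
	getchar();
	return 0;
}

(Built with something like cc test.c -lEGL.)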

Here are some example debug traces:

VGL_DEBUG=1 vglrun -d egl glxgears

glxvisual::buildCfgAttribTable (/virtualgl-3.1.2/server/glxvisual.cpp:472)
glxvisual::chooseFBConfig (/virtualgl-3.1.2/server/glxvisual.cpp:881)
glxvisual::configsFromVisAttribs (/virtualgl-3.1.2/server/glxvisual.cpp:762)
glXChooseVisual (/virtualgl-3.1.2/server/faker-glx.cpp:225)
make_window.constprop ()
main ()
VGL_DEBUG=1 vglrun -d egl nm-applet

glxvisual::buildCfgAttribTable (/virtualgl-3.1.2/server/glxvisual.cpp:472)
glxvisual::assignDefaultFBConfigAttribs (/virtualgl-3.1.2/server/glxvisual.cpp:113)
glxvisual::buildVisAttribTable (/virtualgl-3.1.2/server/glxvisual.cpp:294)
glxvisual::getDefaultFBConfig (/virtualgl-3.1.2/server/glxvisual.cpp:1029)
matchConfig (/virtualgl-3.1.2/server/faker-glx.cpp:50)
glXGetConfig (/virtualgl-3.1.2/server/faker-glx.cpp:1004)
_gdk_x11_screen_update_visuals_for_gl ()
_gdk_x11_screen_init_visuals ()
_gdk_x11_screen_new ()
_gdk_x11_display_open ()
gdk_display_manager_open_display ()
gtk_init_check ()
gtk_init ()
main ()

Proposed fix:

I traced the commit history and found the relevant changes in server/glxvisual.cpp:

2d37700#diff-6a7d16173bc2ce0f6720ad1a1d5fa7aa2157fc65544f738df0ddae1076a3bfdeR471

a76a050#diff-6a7d16173bc2ce0f6720ad1a1d5fa7aa2157fc65544f738df0ddae1076a3bfdeR462

I don't quite understand the purpose of the following code; isn't it just eglCreateContext -> eglMakeCurrent(EDPY, EGL_NO_SURFACE, EGL_NO_SURFACE, ctx) -> eglDestroyContext?

			if(!_eglBindAPI(EGL_OPENGL_API))
				THROW("Could not enable OpenGL API");
			if(!(ctx = _eglCreateContext(EDPY, (EGLConfig)0, NULL, NULL)))
				THROW("Could not create temporary EGL context");
			{
				backend::TempContextEGL tc(ctx);
				...
			}
			_eglDestroyContext(EDPY, ctx);  ctx = 0;

This causes video memory to be occupied and never released.
If I add _eglReleaseThread(); after the call to _eglDestroyContext(EDPY, ctx); ctx = 0;, the video memory is released immediately. The same happens if I comment out the following call: // backend::TempContextEGL tc(ctx);.

diff --git a/server/glxvisual.cpp b/server/glxvisual.cpp
index 0ca73d8..cbc8006 100644
--- a/server/glxvisual.cpp
+++ b/server/glxvisual.cpp
@@ -475,12 +475,12 @@ static void buildCfgAttribTable(Display *dpy, int screen)
 			int bpcs[] = { defaultDepth == 30 ? 10 : 8, defaultDepth == 30 ? 8 : 0 };
 			int maxSamples = 0, maxPBWidth = 32768, maxPBHeight = 32768, nsamps = 1;
 
 			if(!_eglBindAPI(EGL_OPENGL_API))
 				THROW("Could not enable OpenGL API");
 			if(!(ctx = _eglCreateContext(EDPY, (EGLConfig)0, NULL, NULL)))
 				THROW("Could not create temporary EGL context");
 			{
-				backend::TempContextEGL tc(ctx);
+				// backend::TempContextEGL tc(ctx);
 
 				_glGetIntegerv(GL_MAX_SAMPLES, &maxSamples);
 				if(maxSamples > 0)
@@ -495,7 +495,7 @@ static void buildCfgAttribTable(Display *dpy, int screen)

or:

			if(_eglGetCurrentContext())
				backend::TempContextEGL tc(ctx);
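For completeness, the first variant I mentioned (adding _eglReleaseThread() right after the temporary context is destroyed) would amount to something like this, shown against the excerpt above (untested sketch):

 			}
 			_eglDestroyContext(EDPY, ctx);  ctx = 0;
+			_eglReleaseThread();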
[screenshots]

The above analysis is based only on my speculation and experiments, and I am not sure what other consequences such modifications might have.

@892768447 changed the title from "EGL Backend: Nvidia memory release issue" to "EGL Backend: Nvidia Used Memory issue" on Jan 24, 2025
@dcommander (Member) commented:

(I assume you are referring to the GLX back end when you say "the VGL mode.")

To give some background, VirtualGL is necessitated by the client/server architecture of X11, in which GPU acceleration can only be achieved with a physical X server [*]. VirtualGL can either add GPU-accelerated OpenGL to virtual X servers or, if you are using remote X, it can move OpenGL rendering from the client machine to the server machine and thus avoid the severe compatibility and performance limitations of indirect OpenGL. VirtualGL has two "front ends": one that supports applications that use the GLX API and another that supports applications that use the EGL/X11 API. (Both GLX and EGL/X11 are bridge APIs between X11 and OpenGL, allowing OpenGL rendering contexts to be bound to X11 drawables.) VirtualGL also has two "back ends": the "GLX back end" that accesses a GPU through a physical X server ("3D X server") connected to it, and the "EGL back end" that accesses a GPU through an associated EGL device. In addition to redirecting OpenGL rendering away from the virtual X server's software OpenGL implementation (or the remote X server's indirect OpenGL implementation), VirtualGL also redirects OpenGL rendering from X windows into off-screen pixel buffers (Pbuffers) in GPU memory.

EGL/X11 function calls are really straightforward to emulate. In most cases, those function calls are passed down to the 3D X server or EGL device on a 1-to-1 basis with only minimal modifications to redirect rendering from windows to Pbuffers. GLX function calls are also mostly straightforward to emulate with a 3D X server, since the function calls can similarly be passed down to the 3D X server on a 1-to-1 basis with only minimal modifications to redirect rendering from windows to Pbuffers.

However, emulating GLX with an EGL device (i.e. using VirtualGL's GLX front end with its EGL back end) is way more complicated than it should be. There are two primary reasons for that complexity:

  1. EGL has no ability to create multi-buffered Pbuffer surfaces, so there is no way to emulate a double-buffered or quad-buffered GLX drawable using an EGL surface. (See the sketch after this list.)
  2. Multiple threads are allowed to render to the same GLX drawable simultaneously, but EGL forbids that with EGL surfaces.
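To illustrate point 1: double buffering in GLX is a property of the GLXFBConfig, so a Pbuffer created from a double-buffered FBConfig has a back buffer, whereas eglCreatePbufferSurface() accepts only size and texture-binding attributes. This is a hypothetical fragment, not VirtualGL code; dpy, screen, edpy, and eglConfig are assumed to be set up already, with <GL/glx.h> and <EGL/egl.h> included:

// GLX: request a double-buffered, Pbuffer-capable FBConfig and create a
// Pbuffer from it -- the Pbuffer gets a back buffer.
int glxAttribs[] = { GLX_DRAWABLE_TYPE, GLX_PBUFFER_BIT,
	GLX_DOUBLEBUFFER, True, None };
int n = 0;
GLXFBConfig *configs = glXChooseFBConfig(dpy, screen, glxAttribs, &n);
int pbAttribs[] = { GLX_PBUFFER_WIDTH, 256, GLX_PBUFFER_HEIGHT, 256, None };
GLXPbuffer pb = glXCreatePbuffer(dpy, configs[0], pbAttribs);

// EGL: eglCreatePbufferSurface() has no double-buffering attribute at all,
// so an EGL Pbuffer surface is effectively single-buffered.
EGLint eglAttribs[] = { EGL_WIDTH, 256, EGL_HEIGHT, 256, EGL_NONE };
EGLSurface pbSurface = eglCreatePbufferSurface(edpy, eglConfig, eglAttribs);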

If you're interested in the gory details of everything I tried in order to solve that problem (which included unsuccessfully petitioning nVidia to implement a multi-buffered Pbuffer surface extension), I refer you to the long comment thread under #10. tl;dr: Ultimately I had to use OpenGL renderbuffer objects (RBOs) to emulate multi-buffered Pbuffers. However, since renderbuffer objects are tied to a specific OpenGL context, that broke a basic design assumption of GLX (which is that GLX drawables are independent of GLX contexts.) To solve that, I had to create a singleton "RBO context" that holds all of the fake Pbuffers (Pbuffers that are emulated using RBOs) and is shared with every OpenGL context that the 3D application creates. That requires emulating GLX framebuffer configurations (GLXFBConfigs) using underlying OpenGL internal formats, since the visual properties of an RBO are also defined at the OpenGL level rather than at the level of the bridge API.
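To make that concrete, here is a rough, hypothetical sketch of the general technique (not VirtualGL's actual code): each logical buffer of an emulated double-buffered Pbuffer is an RBO owned by the shared RBO context, and a buffer swap is just a change of framebuffer attachment.

// Assumes an OpenGL 3.x context (the RBO context) is current; names and
// sizes are illustrative.
GLuint rbos[2], fbo;
glGenRenderbuffers(2, rbos);
for(int i = 0; i < 2; i++)
{
	glBindRenderbuffer(GL_RENDERBUFFER, rbos[i]);
	// The internal format and sample count play the role of the GLXFBConfig's
	// visual attributes, which is why the FBConfigs have to be emulated at the
	// OpenGL level.
	glRenderbufferStorageMultisample(GL_RENDERBUFFER, 0, GL_RGBA8, 256, 256);
}
glGenFramebuffers(1, &fbo);
glBindFramebuffer(GL_FRAMEBUFFER, fbo);

int back = 0;
// Attach the "back buffer" RBO for rendering ...
glFramebufferRenderbuffer(GL_FRAMEBUFFER, GL_COLOR_ATTACHMENT0,
	GL_RENDERBUFFER, rbos[back]);
// ... application renders ...

// An emulated glXSwapBuffers() amounts to swapping which RBO is treated as
// the front buffer and re-attaching the new back buffer.
back = 1 - back;
glFramebufferRenderbuffer(GL_FRAMEBUFFER, GL_COLOR_ATTACHMENT0,
	GL_RENDERBUFFER, rbos[back]);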

The code in question is part of the aforementioned GLXFBConfig emulation. It creates a temporary EGL Pbuffer surface and context so VirtualGL can probe the value of GL_MAX_SAMPLES in the underlying OpenGL implementation. That prevents VirtualGL from exposing emulated GLXFBConfigs that claim to support a multisampling level that the underlying OpenGL implementation doesn't support. Some OpenGL applications may still work if you comment out or change the code as you did above, but applications that use multisampling may break.

Adding a call to eglReleaseThread() in buildCfgAttribTable() should be innocuous and would presumably address the issue whereby non-OpenGL applications retain GPU memory throughout the life of the application. As far as OpenGL applications, however, I will need to examine the EGL back end more carefully to figure out the other situations in which eglReleaseThread() can safely be called.

[*] Not completely true anymore because of DRI3, which moves GPU buffer handling from the X server to X clients. That allows for GPU acceleration in virtual X servers, with some minor GLX conformance violations. (Since GPU buffer handling is now the province of X clients, multiple applications can no longer render to the same GLX drawable.) However, nVidia's drivers do not and probably never will support DRI3.
