AOT-compiled DLLs: how to think about resource utilization #102048
-
I'm thinking of an architecture in which multiple AOT-compiled .NET DLLs are loaded into a single process. How should I think about the per-DLL overhead of this? I assume each of these has its own garbage-collected heap. Does each then also have a dedicated finalizer thread? What about other possible sources of inefficiency? Would loading five of these be "a bit much"? What about twenty-five? Bonus points if the overhead can be compared with the overhead of loading Go-based DLLs exposing C-compatible APIs, as Go also has automatic memory management. (I'm aware, of course, that any shared source dependencies would get individual copies in the AOT-compiled DLLs. In some respects, this is a feature and not a bug.)
-
Yes, each would have its own GC heap, its own threads for background garbage collection, and its own finalizer thread. I think 5 would still be okay; 25 would probably start to be too much. There are certain behaviors that don't scale well. For example, by default the GC tries to reserve 256 GB of address space on 64-bit systems so that it has one contiguous block of memory for everything. One cannot reserve this amount many times. The amount can be reduced by setting the heap hard limit, but setting it too low might start to interfere with perf or even functionality (OutOfMemoryException).
I don't think Go actually supports loading multiple DLLs, each with its own separate Go runtime. Here is someone trying to load a Go DLL into a Go app and failing, and here is someone trying to load a Go DLL from a C++ DLL from a Go app, with a Go maintainer saying one can't have two Go runtimes in the same process. Go doesn't play well with anything within the same process, including itself: I was involved in a bug report in the past where I found out that exception handling stops working properly in the process once the Go runtime initializes in it.
-
In addition to the answer provided by Michal, you may want to make sure the binaries use workstation (WKS) GC and have the limits reduced as mentioned (at the risk of OOM, but worth a try). This way the lowest thread count per DLL will be just 3, and the base memory footprint can also be lower (though, as tested, WKS GC under light but constant allocation ends up taking more memory than server (SRV) GC + DATAS). You can likely achieve better scalability through trial and error, but at the point of having 20 DLLs you might as well investigate using the JIT and loading assemblies normally, which would be much more efficient (more efficient and scalable than Go too, at the cost of each DLL no longer being an isolated graph of dependencies).
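For reference, a rough sketch of what those settings might look like in the project file of each AOT-compiled DLL. `ServerGarbageCollector` and `ConcurrentGarbageCollection` are the documented MSBuild properties for GC flavor, and `System.GC.HeapHardLimit` is the documented runtime configuration name for the heap hard limit. Turning off concurrent GC is an extra assumption here (it goes beyond "use WKS GC"), the 1 GiB value is purely illustrative, and whether a Native AOT library picks up the hard limit through `RuntimeHostConfigurationOption` should be verified on the .NET version in use.

```xml
<Project Sdk="Microsoft.NET.Sdk">

  <PropertyGroup>
    <!-- Illustrative target framework; use whatever the project already targets. -->
    <TargetFramework>net8.0</TargetFramework>
    <PublishAot>true</PublishAot>

    <!-- Workstation GC instead of server GC. -->
    <ServerGarbageCollector>false</ServerGarbageCollector>

    <!-- Assumption: also disable background (concurrent) GC to avoid its dedicated thread. -->
    <ConcurrentGarbageCollection>false</ConcurrentGarbageCollection>
  </PropertyGroup>

  <ItemGroup>
    <!-- Heap hard limit in bytes (1 GiB here, an arbitrary example value).
         Too low a value risks OutOfMemoryException. -->
    <RuntimeHostConfigurationOption Include="System.GC.HeapHardLimit" Value="1073741824" />
  </ItemGroup>

</Project>
```

The documented `DOTNET_GCHeapHardLimit` environment variable (which takes a hexadecimal value) is an alternative, but it is process-wide, so every runtime loaded into the process would see the same limit rather than a per-DLL one.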