-
Notifications
You must be signed in to change notification settings - Fork 30
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Per-engine compositors #156
Conversation
Tested in Star Stable Online with the example OGL3 DLL, built with MSVC. All tested via Wine. Average FPS without, and with UI displayed:
So does tank a bit, worse FPS than the legacy render engine, but does now work on Wine unlike the last refactor. |
// TODO find a better alternative than allocating each frame | ||
let message_queue = self.rx.try_iter().collect::<Vec<_>>(); | ||
|
||
message_queue.into_iter().for_each(|PipelineMessage(hwnd, umsg, wparam, lparam)| { | ||
imgui_wnd_proc_impl(hwnd, umsg, wparam, lparam, self); | ||
}); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Pass directly to for_each?
// TODO find a better alternative than allocating each frame | |
let message_queue = self.rx.try_iter().collect::<Vec<_>>(); | |
message_queue.into_iter().for_each(|PipelineMessage(hwnd, umsg, wparam, lparam)| { | |
imgui_wnd_proc_impl(hwnd, umsg, wparam, lparam, self); | |
}); | |
self.rx.try_iter().for_each(|PipelineMessage(hwnd, umsg, wparam, lparam)| { | |
imgui_wnd_proc_impl(hwnd, umsg, wparam, lparam, self); | |
}); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Unfortunately not: self.rx.try_iter()
already mutably borrows self
, and imgui_wnd_proc_impl
would also have to mutably borrow self
for the same lifetime which is not possible.
Similarly, allocating a vector in Pipeline
and draining/extending it there with the contents of try_iter
is also not going to work for the same reason, barring using RefCell
which has overhead anyway -- maybe less so than allocating a vector outright, but at the cost of making the code look a lot uglier.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I have concocted a solution for this:
- Store an allocated
Vec
in aOnceCell
. - Pull it out of the
OnceCell
before this section and use it as a backing storage forrx.try_iter()
. - Drain it immediately (preserving the capacity) and pass the contents to the wnd proc implementation.
- Store the drained
Vec
back in theOnceCell
.
This is a bit ugly, but works around the lifetime issues with little to no performance overhead -- OnceCell::take
and OnceCell::set
are simple moves, memory is only ever reallocated when needed, a generous buffer could be pre-allocated, and even a large vector won't be that much of a memory waste.
This PR feels more like the development of a redemption arc rather than good, conscientious software engineering, but here goes. The performance penalty incurred by @vars1ty is not acceptable, so I don't see this copy-buffers solution as viable. Reviewing my own code, I realized that the full screen quad renderers which I wrote from scratch are literally 30-40 lines away each from being complete, fully-fledged In the process, I have acquired enough knowledge about the various rendering engines and GPUs that I now feel confident maintaining all of these things by myself. In fact, I went back and checked the renderers' code from 0.5.0 and I shuddered. What the heck was that garbage, @veeenu ! We will thus go back to the previous model, skipping the current PR: one rendering engine per hook type. The way forward: considerations
ConclusionsMore updates to come soon. I have a plan well laid out in my mind to make good out of this absolute mess that I have willingly put myself in the midst of. February is coming to an end, which means that if everything goes according to plan, I will have a solid solution by the time it will be both feasible and advisable for me to finally go out and touch grass. 🌱 |
Superseded by #158. |
I come to you, dear community, with a PR that would infuse a merciless sense of dread even in the hearts of the greatest practitioners of Zen meditation alive.
This PR is close to a full rewrite of
hudhook
's core. The plan is to carefully review every single file and running extensive tests before smashing the green button, but the quality has in my opinion improved greatly.Changes
Unified renderer
We stay the course with the single
imgui
renderer as of the previous PR #143.The single renderer allows us to work on
imgui
specific circumstances from a single unified spot. This way, UI rendering issues and new features don't have to be spread over four different renderers, and we will be able in the future to leverage rendering concurrency if we so wish.This is currently not so; the
imgui
frame is constructed and rendered right in the middle of thePresent
hooks, and blocking those calls. Luckilyhudhook
andimgui
are pretty slick and can do all that in all four renderers without ever dipping below a solid 100fps on my machine (RTX 3080).Per-engine compositors
I took a deep plunge in the dark seas of rendering engines and built four full-screen quad renderers from scratch, one per supported engine, which take the off-screen rendered resource and composite it on top of the appropriate backbuffer.
If you are using one of these renderers, PLEASE test this branch to confirm that there is no performance penalty.
If you are willing to dig for a solution that doesn't leave the GPU, I'd be forever grateful.
Asynchronous input processing
There is now a
Pipeline
object (its design is still a bit dirty, I want to work on it) that is responsible for managing the lifecycle of the rendering engine, theimgui
context, and among those operations will also manage the window procedure. The window procedure proper is now dummy and only ships its parameters to ampsc::channel
.This has the advantage that the window events are now processed asynchronously just before a frame gets rendered, which should be fine on our end, but the drawback that an event cannot be blocked immediately should the render loop wish to do that, as it won't be able to decide right away whether to consume the events without forwarding them. This should not be a big deal as these decisions sort of depend on the state before the rendering anyway, rather than on any single specific event, and the previous model is not more fine grained than that.
If you had issues with blocking input, please give this branch a go, it might just inadvertently fix that. Or make things worse. It did not make things worse for my use cases.
Wine support
I have tested on-and-off the library on Wine with Proton, and it seems like it's working pretty well. There's still work to be done to figure things out, but we're getting there.
Todo list
Pipeline
external API; it is very similar in all renderers and the current solution looks very badAffected issues
DirectComposition
is not implemented on Wine #151Conclusion
All in all, I am very happy about the results of this PR and have been working tirelessly towards it. There are some outstanding issues, but all manageable. I will keep making inconsequential changes while I wait for your feedback, but my plan is to wrap this up sooner rather than later and finally release
0.6.0
.cc @Jakobzs @vars1ty @vsylva @camas