Replies: 5 comments 1 reply
-
The way roop does multi-threading benefits the swapper (ONNX backend) but not the enhancer (torch backend), so increasing the thread count doesn't speed up the enhancer. It may even get worse, because the swapper takes too much VRAM.
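A small timing sketch of the asymmetry described above, with hypothetical `swap_face`/`enhance_face` stand-ins (not roop's actual code): `time.sleep` substitutes for a GIL-releasing onnxruntime call, and a `threading.Lock` models the way torch-backed modules are typically serialized.

```python
# Hypothetical sketch: why extra threads help a concurrent ONNX-style stage
# but not a lock-guarded torch-style stage. The sleeps stand in for inference.
import threading
import time
from concurrent.futures import ThreadPoolExecutor

enhancer_lock = threading.Lock()

def swap_face(frame):
    time.sleep(0.01)  # stand-in for a thread-safe, GIL-releasing ONNX run
    return frame

def enhance_face(frame):
    with enhancer_lock:   # one thread at a time, as with a locked torch model
        time.sleep(0.01)
        return frame

def timed(fn, frames, threads):
    start = time.perf_counter()
    with ThreadPoolExecutor(max_workers=threads) as pool:
        list(pool.map(fn, frames))
    return time.perf_counter() - start

frames = list(range(16))
swap_t = timed(swap_face, frames, 8)       # overlaps across 8 workers
enhance_t = timed(enhance_face, frames, 8) # serialized by the lock
print(f"swap: {swap_t:.3f}s  enhance: {enhance_t:.3f}s")
```

With 8 workers the "swap" stage finishes in roughly two batches, while the locked "enhance" stage runs all 16 frames back to back regardless of the worker count.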
-
No real surprise. In regular Python code, outside of I/O, the GIL means only one thread runs at a time. For GPU work, most or all of the inference calls obtain a lock so that, again, only one thread runs at a time. All in all, regardless of the thread count, there is little room for actual concurrency.
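The GIL point above can be demonstrated in a few lines: CPU-bound pure-Python work gains essentially nothing from extra threads on standard CPython, because only one thread holds the GIL at a time.

```python
# GIL demo: 4 CPU-bound calls run serially vs. across 4 threads.
# On standard CPython the threaded version is no meaningful speedup.
import time
from concurrent.futures import ThreadPoolExecutor

def busy(n):
    total = 0
    for i in range(n):
        total += i
    return total

N = 2_000_000

start = time.perf_counter()
for _ in range(4):
    busy(N)
serial = time.perf_counter() - start

start = time.perf_counter()
with ThreadPoolExecutor(max_workers=4) as pool:
    list(pool.map(busy, [N] * 4))
threaded = time.perf_counter() - start

print(f"serial: {serial:.2f}s  4 threads: {threaded:.2f}s")
```

Native-code calls that release the GIL (I/O, onnxruntime inference) are the exception; that is where the swapper's threads actually overlap.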
-
Oh right, the face-swap calls can run concurrently, and face swapping alone is fast. But toss in any of the other models that lock and that crushes you; on my system, masking is something like a 10-30x hit. Ideally, it would offer an alternative workflow, given that generally only a small proportion of the frames need to be masked. That's easier to implement in roop proper, I think, since it doesn't process a full frame at a time into the final video. That would allow a workflow of: extract all the frames, swap the faces, let the user easily mark the frame ranges that actually need masking, do the masking, then write the frames to video. In most of my cases, that's insanely faster.
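The proposed workflow can be outlined as follows. This is a hypothetical sketch, not roop's code; `swap`, `mask`, and `mask_ranges` are stand-ins for the real swap step, the expensive masking step, and the user-marked frame ranges.

```python
# Hypothetical outline of the alternative workflow: swap every frame,
# but run the expensive masking step only on user-marked frame ranges.
def alternative_workflow(frames, mask_ranges, swap, mask):
    swapped = [swap(f) for f in frames]  # fast step, applied everywhere
    to_mask = {i for lo, hi in mask_ranges for i in range(lo, hi + 1)}
    return [mask(f) if i in to_mask else f  # slow step only where marked
            for i, f in enumerate(swapped)]

# Tiny usage example with string "frames"; only frames 2-3 pay the mask cost.
frames = [f"frame{i}" for i in range(6)]
result = alternative_workflow(
    frames,
    mask_ranges=[(2, 3)],
    swap=lambda f: f + "+swap",
    mask=lambda f: f + "+mask",
)
print(result)
```

If masking is a 10-30x hit per frame, the total cost scales with the number of marked frames rather than the full frame count.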
-
Or rewrite the enhancers/blocking models to run in parallel, because other than masking you likely want to have face restoration in every frame 😢 |
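One way to get some of that parallelism without rewriting the models themselves is a producer/consumer pipeline: swapper threads feed a queue and a dedicated enhancer worker drains it, so enhancing frame i overlaps with swapping frame i+1 instead of both stages contending for one lock. A minimal sketch, with `swap` and `enhance` as hypothetical stand-ins:

```python
# Hedged sketch of a two-stage pipeline: the main thread swaps frames and
# pushes them onto a queue; a worker thread enhances them as they arrive.
import queue
import threading

def pipeline(frames, swap, enhance):
    q = queue.Queue(maxsize=8)  # bounded, to cap VRAM/memory in flight
    out = []

    def enhancer_worker():
        while True:
            item = q.get()
            if item is None:      # sentinel: no more frames
                break
            idx, frame = item
            out.append((idx, enhance(frame)))

    worker = threading.Thread(target=enhancer_worker)
    worker.start()
    for i, f in enumerate(frames):
        q.put((i, swap(f)))       # swapping overlaps with enhancement
    q.put(None)
    worker.join()
    return [f for _, f in sorted(out)]  # restore frame order

result = pipeline(["a", "b"],
                  swap=lambda f: f + "+swap",
                  enhance=lambda f: f + "+enh")
print(result)  # ['a+swap+enh', 'b+swap+enh']
```

This only overlaps the two stages; it does not make the enhancer itself concurrent, so a single slow enhancer remains the throughput ceiling.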
-
I have an Asus ROG RTX 4090 24GB....
-
I conducted several tests using standard settings and the GFPGAN face enhancer (post-processing section) with my 3080 Ti OC. I was quite surprised to realize that the performance with 2 threads isn't much different from using 8+ threads.
For a relatively powerful graphics card, 2-4 threads seem to be the optimal range, which is rather unexpected.
Please mention your GPU model.
Please test it with the GFPGAN enhancer switched on.
Of course, when testing different numbers of threads, the input and output face sources should remain consistent.
40 votes