Replies: 5 comments 1 reply
-
The way roop does multi-threading benefits the swapper (ONNX backend) but not the enhancer (torch backend), so increasing the thread count doesn't speed up the enhancer. It may even get worse, because the swapper takes too much VRAM.
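A small timing sketch of the asymmetry described above, with hypothetical `swap_face`/`enhance_face` stand-ins (not roop's actual code): `time.sleep` substitutes for a GIL-releasing onnxruntime call, and a `threading.Lock` models the way torch-backed modules are typically serialized.

```python
# Hypothetical sketch: why extra threads help a concurrent ONNX-style stage
# but not a lock-guarded torch-style stage. The sleeps stand in for inference.
import threading
import time
from concurrent.futures import ThreadPoolExecutor

enhancer_lock = threading.Lock()

def swap_face(frame):
    time.sleep(0.01)  # stand-in for a thread-safe, GIL-releasing ONNX run
    return frame

def enhance_face(frame):
    with enhancer_lock:   # one thread at a time, as with a locked torch model
        time.sleep(0.01)
        return frame

def timed(fn, frames, threads):
    start = time.perf_counter()
    with ThreadPoolExecutor(max_workers=threads) as pool:
        list(pool.map(fn, frames))
    return time.perf_counter() - start

frames = list(range(16))
swap_t = timed(swap_face, frames, 8)       # overlaps across 8 workers
enhance_t = timed(enhance_face, frames, 8) # serialized by the lock
print(f"swap: {swap_t:.3f}s  enhance: {enhance_t:.3f}s")
```

With 8 workers the "swap" stage finishes in roughly two batches, while the locked "enhance" stage runs all 16 frames back to back regardless of the worker count.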
-
No real surprise. In regular Python code, outside of I/O, the GIL means only one thread runs at a time. For GPU work, most or all of the inference calls obtain a lock so that, again, only one thread runs at a time. All in all, regardless of the thread count, there is little room for actual concurrency.
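The GIL point above can be demonstrated in a few lines: CPU-bound pure-Python work gains essentially nothing from extra threads on standard CPython, because only one thread holds the GIL at a time.

```python
# GIL demo: 4 CPU-bound calls run serially vs. across 4 threads.
# On standard CPython the threaded version is no meaningful speedup.
import time
from concurrent.futures import ThreadPoolExecutor

def busy(n):
    total = 0
    for i in range(n):
        total += i
    return total

N = 2_000_000

start = time.perf_counter()
for _ in range(4):
    busy(N)
serial = time.perf_counter() - start

start = time.perf_counter()
with ThreadPoolExecutor(max_workers=4) as pool:
    list(pool.map(busy, [N] * 4))
threaded = time.perf_counter() - start

print(f"serial: {serial:.2f}s  4 threads: {threaded:.2f}s")
```

Native-code calls that release the GIL (I/O, onnxruntime inference) are the exception; that is where the swapper's threads actually overlap.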
-
Oh right, the face-swap calls can run concurrently, and face swapping alone is fast. But toss in any of the other models that lock and that crushes you; on my system, masking is something like a 10-30x hit. Ideally, it would offer an alternative workflow, given that generally only a small proportion of the frames need to be masked. That's easier to implement in roop proper, I think, since it doesn't process a full frame at a time into the final video. That would allow a workflow of: extract all the frames, swap the faces, let the user easily mark the frame ranges that actually need masking, do the masking, then write the frames to video. In most of my cases, that's insanely faster.
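The proposed workflow can be outlined as follows. This is a hypothetical sketch, not roop's code; `swap`, `mask`, and `mask_ranges` are stand-ins for the real swap step, the expensive masking step, and the user-marked frame ranges.

```python
# Hypothetical outline of the alternative workflow: swap every frame,
# but run the expensive masking step only on user-marked frame ranges.
def alternative_workflow(frames, mask_ranges, swap, mask):
    swapped = [swap(f) for f in frames]  # fast step, applied everywhere
    to_mask = {i for lo, hi in mask_ranges for i in range(lo, hi + 1)}
    return [mask(f) if i in to_mask else f  # slow step only where marked
            for i, f in enumerate(swapped)]

# Tiny usage example with string "frames"; only frames 2-3 pay the mask cost.
frames = [f"frame{i}" for i in range(6)]
result = alternative_workflow(
    frames,
    mask_ranges=[(2, 3)],
    swap=lambda f: f + "+swap",
    mask=lambda f: f + "+mask",
)
print(result)
```

If masking is a 10-30x hit per frame, the total cost scales with the number of marked frames rather than the full frame count.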
-
Or rewrite the enhancers/blocking models to run in parallel, because other than masking you likely want to have face restoration in every frame 😢 |
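One way to get some of that parallelism without rewriting the models themselves is a producer/consumer pipeline: swapper threads feed a queue and a dedicated enhancer worker drains it, so enhancing frame i overlaps with swapping frame i+1 instead of both stages contending for one lock. A minimal sketch, with `swap` and `enhance` as hypothetical stand-ins:

```python
# Hedged sketch of a two-stage pipeline: the main thread swaps frames and
# pushes them onto a queue; a worker thread enhances them as they arrive.
import queue
import threading

def pipeline(frames, swap, enhance):
    q = queue.Queue(maxsize=8)  # bounded, to cap VRAM/memory in flight
    out = []

    def enhancer_worker():
        while True:
            item = q.get()
            if item is None:      # sentinel: no more frames
                break
            idx, frame = item
            out.append((idx, enhance(frame)))

    worker = threading.Thread(target=enhancer_worker)
    worker.start()
    for i, f in enumerate(frames):
        q.put((i, swap(f)))       # swapping overlaps with enhancement
    q.put(None)
    worker.join()
    return [f for _, f in sorted(out)]  # restore frame order

result = pipeline(["a", "b"],
                  swap=lambda f: f + "+swap",
                  enhance=lambda f: f + "+enh")
print(result)  # ['a+swap+enh', 'b+swap+enh']
```

This only overlaps the two stages; it does not make the enhancer itself concurrent, so a single slow enhancer remains the throughput ceiling.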
-
I have an Asus ROG RTX 4090 24GB....
-
I conducted several tests using standard settings and the GFPGAN face enhancer (post-processing section) with my 3080 Ti OC. I was quite surprised to realize that the performance with 2 threads isn't much different from using 8+ threads.
For a relatively powerful graphics card, 2-4 threads seem to be the optimal range, which is rather unexpected.
Please mention your GPU model.
Please test it with the GFPGAN enhancer switched on.
Of course, when testing different numbers of threads, the input and output face sources should remain consistent.
40 votes