Benchmark Thread #461
Replies: 10 comments 3 replies
-
It seems you scared people away with your system specs 😄
My specs: i5-13600K, 32GB RAM @4400MHz, RTX 2060 SUPER 8GB
Very inconsistent results. I can't explain the small jump between 2 and 3 threads, or the big jump once I used 4 threads.
-
😆 Yeah, the bottleneck is definitely the GPU in my case 😄 I gave my old 4770K rig a well-deserved upgrade last year; it will probably last 7-8 years again 😁 But I expected this somehow... everyone asks for a benchmark to see if they're in the ballpark. But doing it yourself? Hmmm 🤣
-
https://github.com/facefusion/facefusion-assets?tab=readme-ov-file#examples
Seems like we all borrow from each other (the example videos) 😄
-
I did some optimization experiments and removed the genderage model from the workflow (I almost never use the female/male option). With 1 thread the video processing took only half the time, ~32 secs. Funnily enough, doing it again with 4 threads made a meager 1 sec difference compared to before (I didn't restart though). Perhaps writing to disk blocks the process...
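One rough way to sanity-check the "something serial is blocking" hunch is Amdahl's law, which models the N-thread time as `t1 * (s + (1 - s) / n)`, where `s` is the share of the work that cannot be parallelized. The sketch below is not from roop-unleashed; the function names and the example timings are made up for illustration, so plug in your own measurements.

```python
def serial_fraction(t1: float, tn: float, n: int) -> float:
    """Estimate the serial share s from Amdahl's law: tn = t1 * (s + (1 - s) / n)."""
    if n < 2:
        raise ValueError("need at least 2 threads to estimate a serial fraction")
    ratio = tn / t1
    s = (ratio - 1.0 / n) / (1.0 - 1.0 / n)
    # Clamp, because measurement noise can push the estimate slightly outside [0, 1].
    return max(0.0, min(1.0, s))


def predicted_time(t1: float, s: float, n: int) -> float:
    """Predict the n-thread runtime for a given serial share s."""
    return t1 * (s + (1.0 - s) / n)


if __name__ == "__main__":
    # Hypothetical timings, NOT measurements from this thread.
    t1, t4 = 32.0, 31.0
    s = serial_fraction(t1, t4, 4)
    print(f"estimated serial share: {s:.2f}")
    print(f"predicted time with 8 threads: {predicted_time(t1, s, 8):.1f} s")
```

If the estimated serial share comes out close to 1, extra threads cannot help much, and a single-threaded step (disk writes, video encoding, or something else) is the more likely limit. It does not tell you which step that is.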
-
New Benchmark with 3.5.5
System specs: i7-13700K, 64GB RAM @5600MHz, RTX3060 12GB
-
I tried 20, 16, 12, 8, 6, 4 and 3 threads. 3 works best for me. Anything above seems to do bursts of speed and then slowdowns; 3 is steady.
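To put a number on "steady" versus "bursts and slowdowns", one option is to note the frames-per-second readout at regular intervals and compare the spread between thread counts. This is only a sketch: the `steadiness` helper and the sample values are hypothetical, and nothing here reads data from roop-unleashed itself.

```python
import statistics


def steadiness(fps_samples: list[float]) -> dict:
    """Summarize throughput samples: mean fps and coefficient of variation (stdev / mean)."""
    if not fps_samples:
        raise ValueError("need at least one sample")
    mean = statistics.fmean(fps_samples)
    spread = statistics.pstdev(fps_samples)
    cv = spread / mean if mean else float("inf")
    return {"mean_fps": mean, "cv": cv}


if __name__ == "__main__":
    # Hypothetical samples, NOT measurements from this thread.
    steady = [9.8, 10.1, 10.0, 9.9]
    bursty = [18.0, 4.0, 17.5, 3.5]
    for name, samples in (("steady", steady), ("bursty", bursty)):
        result = steadiness(samples)
        print(f"{name}: mean {result['mean_fps']:.1f} fps, cv {result['cv']:.2f}")
```

A lower coefficient of variation means a steadier run. Comparing the mean alongside it shows whether the bursty setting is actually faster overall or only looks fast in the peaks.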
-
My bench... surprised me with a few things.
Maybe it's because the VRAM isn't offloading completely, but using Restoreformer, I ran it twice for each thread count. The first run was ALWAYS slow, about three times as slow. The second run is what you see in my chart. Not sure why.
Also, using Restoreformer, the second run with 2 threads was faster than the run with 3 threads? I dunno, but that's what I got. Same with GPEN and 2 threads. Maybe due to the below...
Also, my setup definitely doesn't like using 4 threads. I didn't completely close down Roop until 4 threads and Codeformer, where it was completely maxing VRAM and GPU and said it was going to take around 25 min! Closing and restarting wiped VRAM and gave me better results, but I keep it on GFPGAN and 3 threads 99% of the time anyway. Same thing on GPEN: it was holding between 5GB and 6GB VRAM before starting, and GPEN was gonna take around 4.30 min until I restarted Roop. Meaning, all of my tests could just be skewed due to VRAM not releasing whatever it is holding between changing post-processors.
Ryzen 7 7800X3D, 64GB RAM @6000MHz, RTX 4070 Super 12GB
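A slow first run followed by faster repeats is a common pattern when models and GPU kernels are loaded or tuned on first use, although that is not confirmed as the cause here. A usual way to keep it out of the numbers is to do an untimed warm-up run and report the median of the following runs. The sketch below is a generic timing harness, not part of roop-unleashed; `bench` and the `job` callable are hypothetical names standing in for whatever triggers one processing run.

```python
import statistics
import time
from typing import Callable


def bench(job: Callable[[], object], runs: int = 3, warmup: int = 1) -> dict:
    """Time `job` several times, discarding `warmup` initial runs."""
    for _ in range(warmup):
        job()  # untimed: absorbs one-off costs such as model loading
    timings = []
    for _ in range(runs):
        start = time.perf_counter()
        job()
        timings.append(time.perf_counter() - start)
    return {
        "runs": timings,
        "median": statistics.median(timings),
        "min": min(timings),
    }


if __name__ == "__main__":
    # Stand-in workload; replace with a call that processes the benchmark video.
    result = bench(lambda: sum(i * i for i in range(200_000)))
    print(f"median {result['median']:.4f} s over {len(result['runs'])} timed runs")
```

Reporting the warm-up time separately is still useful, since that is what a single one-off job would actually cost.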
-
This is very strange. Restoreformer always takes twice as long as GPEN for me.
-
In case anyone is interested, new benches on my PC with the update. These are defaults with changes only to post-processing and threads. I'd say that's an awesome performance boost.
Ryzen 7 7800X3D / 64GB RAM @6000MHz / RTX 4070 Super 12GB
Also, I tested swapping the cuDNN files for 9.6.0, just to see if they would make a difference (newer cuDNN files make a decent improvement in Stable Diffusion WebUI Forge), but they did not. On my setup the swap actually degraded performance a tiny bit, so I went back to the 9.1.0 files. I might test with the versions in between to see if swapping makes any difference at all.
-
lysxelapsed - apologies for not doing the enhancers. I will try to do so when I get some time, maybe for the ideal thread counts at minimum.
System specs: i9-14900K / 64GB RAM @5600MHz / RTX 4090 24GB
So based on the results, for 128, I could use around 22 threads and still see good results, but at 512, it was closer to 8.
One thing I was thinking about during the benchmarking is that the video is fairly small. I think longer videos are more likely to lead to a situation where all the VRAM gets used, and performance seems to bottom out from that point forward.
-
Since performance is an almost constant topic, I'm starting a proper benchmark thread. I found the benchmark video from rope (not roop, my bad 😆)
I restarted roop-unleashed between models because the VRAM usage builds up. I didn't go higher than 4 threads, because the 12GB of VRAM is (almost) maxed out at 4 threads, except with no enhancer and with DMDNet.
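For anyone who wants to check how much VRAM is still held before starting the next run, NVIDIA's `nvidia-smi` tool can report it in a script-friendly format. The sketch below is not part of roop-unleashed: `parse_vram` and `query_vram` are made-up helper names, and it assumes `nvidia-smi` is on the PATH.

```python
import subprocess

# Ask nvidia-smi for used and total memory per GPU, in MiB, without header or units.
QUERY = [
    "nvidia-smi",
    "--query-gpu=memory.used,memory.total",
    "--format=csv,noheader,nounits",
]


def parse_vram(output: str) -> list[tuple[int, int]]:
    """Turn lines like '5632, 12288' into (used_mib, total_mib) tuples, one per GPU."""
    rows = []
    for line in output.strip().splitlines():
        used, total = (int(part.strip()) for part in line.split(","))
        rows.append((used, total))
    return rows


def query_vram() -> list[tuple[int, int]]:
    """Run nvidia-smi and return (used_mib, total_mib) for each GPU."""
    result = subprocess.run(QUERY, capture_output=True, text=True, check=True)
    return parse_vram(result.stdout)


if __name__ == "__main__":
    for index, (used, total) in enumerate(query_vram()):
        print(f"GPU {index}: {used} / {total} MiB in use")
```

Noting this value before each run makes it easier to tell whether a slow result comes from leftover VRAM usage or from the settings being tested.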
Here's the video and picture (I hope Jennifer Aniston doesn't mind 😆) I used:
https://github.com/C0untFloyd/roop-unleashed/assets/63816546/d04be678-c69e-4c1e-86a0-e5a5866649a2 <- Video
My specs: i7-13700K, 64GB RAM @5600MHz, RTX3060 12GB
Relevant settings -> Face selection method: Selected face / Processing method: In-Memory-processing / Image output format: png / Video codec: hevc_nvenc
Here's the code for the 4x7 grid: