If I fine-tune directly from the provided stage-3 pretrained weights, roughly how many iterations and GPUs are needed to get good results? Is there any guidance or insight on parameter-efficient fine-tuning techniques?
In addition, how can I reduce the VRAM requirement (48 GB)? So far I have tried sp_size=8, enabling vae_tiling, and reducing the image resolution and the number of video frames.
Looking forward to your reply!
Our experiments on the Waymo dataset show that usable results can be obtained within 2000 iterations when starting from the stage-3 model; more iterations further improve quality and controllability.
VAE encoding may take too much memory at high resolution. If tiling does not help, you can generate the latents offline and skip the VAE during training.
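As a rough illustration of the offline-latent idea, here is a minimal sketch in plain Python. It is not this repository's code: `cache_latents`, `load_latent`, and `toy_encode` are hypothetical names, `toy_encode` only stands in for a frozen VAE encoder, and the `.pkl` file layout is an assumption. In a real setup you would probably call the actual VAE under no-grad and save tensors instead.

```python
import pickle
import tempfile
from pathlib import Path


def cache_latents(samples, encode_fn, cache_dir):
    """Encode each (key, pixels) pair once and write the result to disk.

    `encode_fn` is a stand-in for a frozen VAE encoder (assumption).
    Returns the number of newly written files.
    """
    cache_dir = Path(cache_dir)
    cache_dir.mkdir(parents=True, exist_ok=True)
    written = 0
    for key, pixels in samples:
        out_path = cache_dir / f"{key}.pkl"
        if out_path.exists():
            continue  # already cached, so the encoder is not run again
        latent = encode_fn(pixels)
        with open(out_path, "wb") as f:
            pickle.dump(latent, f)
        written += 1
    return written


def load_latent(cache_dir, key):
    """Read a cached latent; the training loop calls this instead of the VAE."""
    with open(Path(cache_dir) / f"{key}.pkl", "rb") as f:
        return pickle.load(f)


def toy_encode(pixels):
    """Hypothetical encoder: keeps every 4th value to mimic downsampling."""
    return pixels[::4]


if __name__ == "__main__":
    clips = [("clip_a", list(range(16))), ("clip_b", list(range(16, 32)))]
    with tempfile.TemporaryDirectory() as tmp:
        print(cache_latents(clips, toy_encode, tmp))  # first pass encodes
        print(cache_latents(clips, toy_encode, tmp))  # second pass skips
        print(load_latent(tmp, "clip_a"))
```

With a cache like this, the encoder only has to be in memory during the one-off preprocessing pass, so the training process never needs to load it.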