Timestep shifting #653
Conversation
Can you also add support for HunyuanVideo in BaseHunyuanVideoSetup.py? Maybe with some reasonable defaults for timestep shift?
Shifting should already work for HV. No change to the model setup is necessary for shifting; that's only required for dynamic shifting. But there is no reason to believe that HV trained their model using resolution-dependent shifting, and I wouldn't know the right parameters for that. The scheduler configuration of HV uses a shift value of 7.0 for inference. That seems quite extreme compared to Flux and SD3, but that's all we know. Can you test it?
Yes, I can test it later today - will set timestep shift to 7.0 and keep dynamic timestep shifting off. Which distribution should I use? 'LOGIT_NORMAL'?
Yes, Nerogar has quoted their paper: they used LOGIT_NORMAL for training. It's not clear if they used shifting for training.
Note: because there are open points above, I haven't tested the most recent commit. There could be a typo; don't merge without a final test.
I performed some tests with HV using UNIFORM distribution and a timestep shift of 3, and the results look good so far. A shift of 7 was too high.
Thanks!
I think it's fine now. Let's see how future models implement shifting and then decide how to properly implement dynamic shifting.
This is an implementation of timestep shifting for training, as is done by `FlowMatchEulerDiscreteScheduler` during inference. An explanation of why I think this is correct can be found in #615.
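For context, the shift is the same monotone warp that diffusers' `FlowMatchEulerDiscreteScheduler` applies to its sigmas. A minimal sketch (the function name is illustrative, not code from this PR):

```python
def shift_timestep(t: float, shift: float) -> float:
    """Warp a timestep t in [0, 1] toward the high-noise end.

    Same form as the sigma shifting in diffusers'
    FlowMatchEulerDiscreteScheduler: shift * t / (1 + (shift - 1) * t).
    A shift of 1.0 is the identity.
    """
    return shift * t / (1 + (shift - 1) * t)

# shift > 1 pushes sampled timesteps toward t = 1 (more noise),
# which is where image composition is decided.
print(shift_timestep(0.5, 1.0))  # 0.5, unchanged
print(shift_timestep(0.5, 3.0))  # 0.75
```

Applied to every sampled training timestep, this concentrates training on the noisier part of the schedule without changing the underlying distribution family.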
I am getting more and more convinced of this, because many of my own training experiments have failed, or would have failed if I hadn't manually messed with the timestep distribution first. The default timestep distribution often cannot change the image composition enough.
New training parameters:
This PR doesn't change any defaults, but I believe these should be the defaults for the presets:
SD3: timestep_shift = 3.0
Flux: dynamic_timestep_shifting = True
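For reference, dynamic (resolution-dependent) shifting as published for Flux interpolates a value mu linearly over the latent token count and uses exp(mu) as the effective shift. A sketch assuming Flux's published constants (base_shift = 0.5 at 256 tokens, max_shift = 1.15 at 4096 tokens; one token per 16x16 pixel patch):

```python
import math

def dynamic_shift(width: int, height: int,
                  base_shift: float = 0.5, max_shift: float = 1.15,
                  base_seq_len: int = 256, max_seq_len: int = 4096) -> float:
    # One latent token per 16x16 pixel patch (8x VAE downsample, 2x2 patchify).
    seq_len = (width // 16) * (height // 16)
    m = (max_shift - base_shift) / (max_seq_len - base_seq_len)
    mu = m * (seq_len - base_seq_len) + base_shift
    return math.exp(mu)

print(round(dynamic_shift(512, 512), 2))    # 1.88
print(round(dynamic_shift(1024, 1024), 2))  # 3.16
```

These values match the shifts quoted for the examples below (about 1.8 at 512x512, 3.15 at 1024x1024).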
Here are some examples:
LOGIT_NORMAL distribution with timestep_shift == 1.8, which is what dynamic shifting calculates for 512x512 resolution:
LOGIT_NORMAL with 3.15, which is what dynamic shifting calculates for 1024x1024:
This is a UNIFORM distribution with the same timestep shift of 1.8:
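The distributions above can be reproduced in a few lines. A sketch combining the two distribution choices with the shift formula (a logit-normal sample is a standard normal squashed through a sigmoid; the function name is illustrative):

```python
import math
import random

def sample_timestep(distribution: str, shift: float) -> float:
    """Draw one training timestep in (0, 1), then apply timestep shifting."""
    if distribution == "LOGIT_NORMAL":
        t = 1.0 / (1.0 + math.exp(-random.gauss(0.0, 1.0)))  # sigmoid(N(0, 1))
    elif distribution == "UNIFORM":
        t = random.random()
    else:
        raise ValueError(distribution)
    # Shift toward the high-noise end; shift == 1.0 is the identity.
    return shift * t / (1 + (shift - 1) * t)

random.seed(0)
samples = [sample_timestep("UNIFORM", 1.8) for _ in range(100_000)]
# With shift 1.8 the sample mean moves well above the unshifted 0.5.
print(sum(samples) / len(samples))
```

With a UNIFORM base distribution and shift 1.8, the expected timestep rises from 0.5 to roughly 0.6, which is the visible rightward skew in the histograms above.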