Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Timestep shifting #653

Merged
merged 6 commits into from
Feb 8, 2025
Merged

Timestep shifting #653

merged 6 commits into from
Feb 8, 2025

Conversation

dxqbYD
Copy link
Collaborator

@dxqbYD dxqbYD commented Jan 24, 2025

This is an implementation of timestep shifting for training, as is done by FlowMatchEulerDiscreteScheduler during inference.
An explanation why I think this is correct you can find here: #615

I am getting more and more convinved of this, because many of my own training experiments have failed or would have failed if I didn't manually mess with the timestep distribution first. The default timestep distribution often cannot change the image composition enough.

New training parameters:
image

This PR doesn't change any defaults, but I believe this should be the defaults for the presets:

SD3: timestep_shift = 3.0
Flux: dynamic_timestep_shifting = True

Here are some examples:
LOGIT_NORMAL distribution with timestep_shift == 1.8, which is what dynamic shifting calculates for 512x512 resolution:
image

LOGIT_NORMAL with 3.15, which is what dynamic shifting calculates for 1024x1024:
image

This is a UNIFORM distribution with the same timestep shift of 1.8:
image

@arcteryox
Copy link

Can you also add support for HunyuanVideo in BaseHunyuanVideoSetup.py? Maybe with some reasonable defaults for timestep shift?

@dxqbYD
Copy link
Collaborator Author

dxqbYD commented Jan 30, 2025

Can you also add support for HunyuanVideo in BaseHunyuanVideoSetup.py? Maybe with some reasonable defaults for timestep shift?

Shifting should already work for HV. There is no change in the model setup necessary for shifting, that's only required for dynamic shifting. But there is no reason to believe that HV has trained their model using resolution-dependant shifting, and I wouldn't know the right parameters for that.

The scheduler configuration of HV uses a shift value of 7.0 for inference here:
https://huggingface.co/hunyuanvideo-community/HunyuanVideo/blob/main/scheduler/scheduler_config.json

That seems quite extreme compared to Flux and SD3, but that's all we know. Can you test it?

edit:
there are actually parameters for dynamic shifting in the HV scheduler configuration. But they haven't enabled it. strange.

@arcteryox
Copy link

arcteryox commented Jan 30, 2025

That seems quite extreme compared to Flux and SD3, but that's all we know. Can you test it?

Yes, I can test it later today - will set timestep shift to 7.0 and keep dynamic timestep shifting off. Which distribution should I use? 'LOGIT_NORMAL'?

@dxqbYD
Copy link
Collaborator Author

dxqbYD commented Jan 30, 2025

That seems quite extreme compared to Flux and SD3, but that's all we know. Can you test it?

Yes, I can test it later today - will set timestep shift to 7.0 and keep dynamic timestep shifting off. Which distribution should I use? 'LOGIT_NORMAL'?

yes, Nerogar has quote their paper that they have used LOGIT_NORMAL for training. it's not clear if they have used shifting for training.

@dxqbYD
Copy link
Collaborator Author

dxqbYD commented Jan 31, 2025

Note: because there are open points above, I haven't tested the most-recent commit. There could be a typo, don't merge without a final test.

@dxqbYD dxqbYD requested a review from Nerogar February 2, 2025 21:48
@arcteryox
Copy link

arcteryox commented Feb 5, 2025

I performed some tests with HV using UNIFORM distribution and a timestep shift of 3 and results look good so far. A shift of 7 was too high.

@dxqbYD
Copy link
Collaborator Author

dxqbYD commented Feb 8, 2025

Thanks!
The only open point in this PR I see is here:
#653 (comment)

@Nerogar
Copy link
Owner

Nerogar commented Feb 8, 2025

I think it's fine now. Let's see how future models implement shifting and then decide how to properly implement dynamic shifting

@Nerogar Nerogar merged commit 11c9987 into Nerogar:master Feb 8, 2025
1 check passed
@dxqbYD dxqbYD deleted the shift branch February 10, 2025 10:21
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants