Timestep shifting #653
Conversation
Can you also add support for HunyuanVideo in BaseHunyuanVideoSetup.py? Maybe with some reasonable defaults for timestep shift?
Shifting should already work for HV. No change to the model setup is necessary for shifting; that's only required for dynamic shifting. But there is no reason to believe that HV trained their model using resolution-dependent shifting, and I wouldn't know the right parameters for that. The scheduler configuration of HV uses a shift value of 7.0 for inference. That seems quite extreme compared to Flux and SD3, but that's all we know. Can you test it?
Yes, I can test it later today - will set timestep shift to 7.0 and keep dynamic timestep shifting off. Which distribution should I use? 'LOGIT_NORMAL'?
Yes, Nerogar has quoted their paper: they used LOGIT_NORMAL for training. It's not clear if they used shifting for training.
Note: because there are open points above, I haven't tested the most recent commit. There could be a typo; don't merge without a final test.
I performed some tests with HV using UNIFORM distribution and a timestep shift of 3, and the results look good so far. A shift of 7 was too high.
Thanks!
I think it's fine now. Let's see how future models implement shifting and then decide how to properly implement dynamic shifting.
This is an implementation of timestep shifting for training, as is done by `FlowMatchEulerDiscreteScheduler` during inference. An explanation of why I think this is correct can be found in #615.
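For context, the shift is the same monotone warp that diffusers' `FlowMatchEulerDiscreteScheduler` applies to its sigmas. A minimal sketch (the function name is illustrative, not code from this PR):

```python
def shift_timestep(t: float, shift: float) -> float:
    """Warp a timestep t in [0, 1] toward the high-noise end.

    Same form as the sigma shifting in diffusers'
    FlowMatchEulerDiscreteScheduler: shift * t / (1 + (shift - 1) * t).
    A shift of 1.0 is the identity.
    """
    return shift * t / (1 + (shift - 1) * t)

# shift > 1 pushes sampled timesteps toward t = 1 (more noise),
# which is where image composition is decided.
print(shift_timestep(0.5, 1.0))  # 0.5, unchanged
print(shift_timestep(0.5, 3.0))  # 0.75
```

Applied to every sampled training timestep, this concentrates training on the noisier part of the schedule without changing the underlying distribution family.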
I am getting more and more convinced of this, because many of my own training experiments have failed, or would have failed if I hadn't manually messed with the timestep distribution first. The default timestep distribution often cannot change the image composition enough.
New training parameters:
This PR doesn't change any defaults, but I believe these should be the defaults for the presets:
SD3: timestep_shift = 3.0
Flux: dynamic_timestep_shifting = True
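For reference, dynamic (resolution-dependent) shifting as published for Flux interpolates a value mu linearly over the latent token count and uses exp(mu) as the effective shift. A sketch assuming Flux's published constants (base_shift = 0.5 at 256 tokens, max_shift = 1.15 at 4096 tokens; one token per 16x16 pixel patch):

```python
import math

def dynamic_shift(width: int, height: int,
                  base_shift: float = 0.5, max_shift: float = 1.15,
                  base_seq_len: int = 256, max_seq_len: int = 4096) -> float:
    # One latent token per 16x16 pixel patch (8x VAE downsample, 2x2 patchify).
    seq_len = (width // 16) * (height // 16)
    m = (max_shift - base_shift) / (max_seq_len - base_seq_len)
    mu = m * (seq_len - base_seq_len) + base_shift
    return math.exp(mu)

print(round(dynamic_shift(512, 512), 2))    # 1.88
print(round(dynamic_shift(1024, 1024), 2))  # 3.16
```

These values match the shifts quoted for the examples below (about 1.8 at 512x512, 3.15 at 1024x1024).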
Here are some examples:
LOGIT_NORMAL distribution with timestep_shift == 1.8, which is what dynamic shifting calculates for 512x512 resolution:
LOGIT_NORMAL with 3.15, which is what dynamic shifting calculates for 1024x1024:
This is a UNIFORM distribution with the same timestep shift of 1.8:
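The distributions above can be reproduced in a few lines. A sketch combining the two distribution choices with the shift formula (a logit-normal sample is a standard normal squashed through a sigmoid; the function name is illustrative):

```python
import math
import random

def sample_timestep(distribution: str, shift: float) -> float:
    """Draw one training timestep in (0, 1), then apply timestep shifting."""
    if distribution == "LOGIT_NORMAL":
        t = 1.0 / (1.0 + math.exp(-random.gauss(0.0, 1.0)))  # sigmoid(N(0, 1))
    elif distribution == "UNIFORM":
        t = random.random()
    else:
        raise ValueError(distribution)
    # Shift toward the high-noise end; shift == 1.0 is the identity.
    return shift * t / (1 + (shift - 1) * t)

random.seed(0)
samples = [sample_timestep("UNIFORM", 1.8) for _ in range(100_000)]
# With shift 1.8 the sample mean moves well above the unshifted 0.5.
print(sum(samples) / len(samples))
```

With a UNIFORM base distribution and shift 1.8, the expected timestep rises from 0.5 to roughly 0.6, which is the visible rightward skew in the histograms above.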