Added --blocks_to_swap_while_sampling, may allow faster sample image generation #2056


Open
wants to merge 1 commit into sd3

Conversation

@araleza commented Apr 20, 2025

I've recently switched over to doing Flux full fine-tuning instead of LoRA training, but I've found that sample image generation while training is very slow. I'm using --blocks_to_swap 35, which lets me use a batch size of 5, and this block swapping persists during sample inference, increasing sample image generation time.

Block swapping is useful while training because it saves a lot of VRAM, leaving room for a larger batch size and for the optimizer's state (e.g. momentum). Neither of these is needed during sample image inference: if I open up nvtop, I can see that my VRAM is mostly unused during this time.

This new option lets the number of blocks to swap be set to a lower value while generating sample images, which may allow faster image generation. For example, on my current setup with 50 sampling steps per image, setting --blocks_to_swap_while_sampling 2 reduces the time per image from around 3 minutes to around 1 minute 48 seconds. That might not sound like a big difference at first, but over the roughly 100 sample images I generate in a run, it saves around 2 hours in total.
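For illustration, a minimal sketch of the kind of wiring involved; this is not the actual diff, and everything apart from the two flag names (--blocks_to_swap and --blocks_to_swap_while_sampling) is an assumption:

```python
# Sketch only: the two argument names come from this PR; the parser object,
# the default handling, and the fallback logic below are assumptions.
parser.add_argument(
    "--blocks_to_swap_while_sampling",
    type=int,
    default=None,
    help="number of blocks to swap during sample image generation "
    "(falls back to --blocks_to_swap when not set)",
)

# At sample-generation time, use the lower value if one was given.
blocks_to_swap_for_sampling = (
    args.blocks_to_swap_while_sampling
    if args.blocks_to_swap_while_sampling is not None
    else args.blocks_to_swap
)
```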

@rockerBOO (Contributor)

Maybe it would be better not to have blocks_to_swap_while_sampling in the forward, and instead add a model method that is called to change the block swap value specifically. That way the forward doesn't gain new parameters that could cause side effects later, with callers having to pass extra arguments down to the model that aren't really model inputs.
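A hedged sketch of that suggestion (the method name set_blocks_to_swap, the class name, and the offloader attribute below are hypothetical, not the actual sd-scripts code):

```python
# Hypothetical model-side API in the spirit of the suggestion above: a dedicated
# method changes the swap count, so forward() keeps its original signature and
# callers never thread a non-input argument through it.
class FluxFineTuneModel:
    def __init__(self, offloader, blocks_to_swap: int):
        self.offloader = offloader            # assumed offloader object
        self.blocks_to_swap = blocks_to_swap

    def set_blocks_to_swap(self, num_blocks: int) -> None:
        """Change how many blocks are swapped; forward()'s inputs are untouched."""
        self.blocks_to_swap = num_blocks
        self.offloader.blocks_to_swap = num_blocks  # assumed attribute

    def forward(self, latents, timesteps, **cond):
        # unchanged forward pass; block swapping is handled by the offloader
        ...
```

The training loop could then lower the value before generating samples and restore it afterwards, e.g. model.set_blocks_to_swap(args.blocks_to_swap_while_sampling) before sampling and model.set_blocks_to_swap(args.blocks_to_swap) once sampling finishes.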

@kohya-ss (Owner)

I'm sorry it took me so long to check.

rockerBOO has a point.

I think there may be a way to extend ModelOffloader and switch the number of blocks depending on whether the model is in train or eval mode. submit_move_blocks, wait_for_block, and prepare_block_devices_before_forward may be able to adjust the number of blocks by receiving the model as an additional argument.
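A minimal sketch of that idea, under assumed signatures; only the class and method names come from the comment above, and the real implementations in sd-scripts look different:

```python
# Sketch: ModelOffloader picks the swap count from the model's train/eval state.
# Constructor arguments, the helper, and the method bodies are assumptions.
class ModelOffloader:
    def __init__(self, blocks_to_swap_train: int, blocks_to_swap_eval: int):
        self.blocks_to_swap_train = blocks_to_swap_train
        self.blocks_to_swap_eval = blocks_to_swap_eval

    def _current_blocks_to_swap(self, model) -> int:
        # model.training is the standard torch.nn.Module flag toggled by
        # model.train() / model.eval().
        return self.blocks_to_swap_train if model.training else self.blocks_to_swap_eval

    def prepare_block_devices_before_forward(self, model, blocks):
        n = self._current_blocks_to_swap(model)
        ...  # place the last n blocks on CPU, the rest on GPU

    def submit_move_blocks(self, model, blocks, block_index):
        n = self._current_blocks_to_swap(model)
        ...  # asynchronously swap blocks according to n

    def wait_for_block(self, model, block_index):
        ...  # wait until the block needed next is resident on the GPU
```

With something along these lines, calling model.eval() before sample generation and model.train() afterwards would be enough to switch the swap count, without adding a new forward parameter.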

@araleza (Author) commented Apr 27, 2025

Thanks for the reviews, @rockerBOO and @kohya-ss. I'll take a look at the code soon and try to include your suggestions for improvement. :)
