
Why is Gradient Checkpointing Not Implemented in Training? #399

Open
4 tasks done
kostum123 opened this issue Nov 5, 2024 · 1 comment · May be fixed by #400
Labels
question Further information is requested

Comments

@kostum123

Checks

  • This template is only for questions, not feature requests or bug reports.
  • I have thoroughly reviewed the project documentation and read the related paper(s).
  • I have searched existing issues, including closed ones, and found no similar questions.
  • I confirm that I am submitting this report in English to facilitate communication.

Question details

It appears that gradient checkpointing is not implemented in the current training pipeline. Gradient checkpointing can significantly reduce memory usage by trading extra computation for activation storage, which makes it valuable for large models and resource-limited environments. This raises two questions:

Is there a specific reason gradient checkpointing has not been implemented?
Could it be integrated in a future update, or are there known limitations that prevent it? If there is no compatibility issue, I would be open to adding it via a PR.
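For context, the memory-for-compute tradeoff described above can be demonstrated with a minimal sketch using `torch.utils.checkpoint` (a toy layer, not F5-TTS code): the checkpointed forward produces the same values as the plain forward, but intermediate activations are recomputed during backward instead of being stored.

```python
import torch
from torch.utils.checkpoint import checkpoint

# Toy layer: with checkpointing, its intermediate activations are not
# kept alive for backward; they are recomputed when gradients are needed.
w = torch.randn(32, 32, requires_grad=True)

def layer(x):
    return torch.tanh(x @ w)

x = torch.randn(4, 32, requires_grad=True)

y_plain = layer(x)                               # plain forward
y_ckpt = checkpoint(layer, x, use_reentrant=False)  # checkpointed forward

assert torch.allclose(y_plain, y_ckpt)  # identical values
y_ckpt.sum().backward()                 # gradients still flow
```

The `use_reentrant=False` flag selects the non-reentrant implementation, which recent PyTorch versions recommend.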

@kostum123 kostum123 added the question Further information is requested label Nov 5, 2024
@ZhikangNiu
Collaborator

Yeah, I think you can explore the gradient checkpointing in F5 and add it via a PR.

kostum123 added a commit to kostum123/F5-TTS that referenced this issue Nov 5, 2024
Fixes SWivid#399

Implement gradient checkpointing in the training pipeline.

* **Model Backbones**:
  - Import `checkpoint` from `torch.utils.checkpoint` in `src/f5_tts/model/backbones/dit.py`, `src/f5_tts/model/backbones/unett.py`, and `src/f5_tts/model/backbones/mmdit.py`.
  - Add a parameter `use_checkpointing` to the constructors of `DiT`, `UNetT`, and `MMDiT` classes, defaulting to `False`.
  - Modify the `forward` methods to use `checkpoint` for each block if `use_checkpointing` is `True`.

* **Trainer**:
  - Add a parameter `use_checkpointing` to the `Trainer` class constructor in `src/f5_tts/model/trainer.py`, defaulting to `False`.
  - Modify the `train` method to enable gradient checkpointing if `use_checkpointing` is `True`.

* **Training Script**:
  - Add a parameter `use_checkpointing` to the `Trainer` instantiation in `src/f5_tts/train/train.py`, defaulting to `False`.
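The backbone-side change described above can be sketched roughly as follows. This is a hypothetical, simplified block and backbone (the real `DiT`/`UNetT`/`MMDiT` classes take many more arguments); it only illustrates the `use_checkpointing` flag and the per-block `checkpoint` call from the PR description.

```python
import torch
import torch.nn as nn
from torch.utils.checkpoint import checkpoint

class Block(nn.Module):
    """Stand-in transformer block (hypothetical; not F5-TTS code)."""
    def __init__(self, dim):
        super().__init__()
        self.net = nn.Sequential(nn.Linear(dim, dim), nn.GELU(), nn.Linear(dim, dim))

    def forward(self, x):
        return x + self.net(x)

class Backbone(nn.Module):
    def __init__(self, dim=64, depth=4, use_checkpointing=False):
        super().__init__()
        self.blocks = nn.ModuleList(Block(dim) for _ in range(depth))
        self.use_checkpointing = use_checkpointing

    def forward(self, x):
        for block in self.blocks:
            if self.use_checkpointing and self.training:
                # Recompute this block's activations in backward
                # instead of storing them during forward.
                x = checkpoint(block, x, use_reentrant=False)
            else:
                x = block(x)
        return x

model = Backbone(use_checkpointing=True).train()
x = torch.randn(2, 16, 64, requires_grad=True)
model(x).sum().backward()  # gradients flow despite checkpointing
```

Gating on `self.training` keeps inference unaffected, since checkpointing only pays off when a backward pass follows.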
@kostum123 kostum123 linked a pull request Nov 5, 2024 that will close this issue