Add a gradient checkpoint feature #20720
Conversation
Codecov Report
Attention: Patch coverage is
Additional details and impacted files

```
@@            Coverage Diff             @@
##           master   #20720      +/-   ##
==========================================
- Coverage   81.93%   81.91%   -0.02%
==========================================
  Files         548      548
  Lines       51190    51203      +13
  Branches     7912     7916       +4
==========================================
+ Hits        41942    41945       +3
- Misses       7310     7319       +9
- Partials     1938     1939       +1
```

Flags with carried forward coverage won't be shown. ☔ View full report in Codecov by Sentry.
Thanks for the PR! @divyashreepathihalli has a currently outstanding proposal for implementing gradient checkpointing in Keras across backends. I will let you two figure out what to do.
Thank you. So my understanding is that we have a conflict? The absence of gradient checkpointing in Keras is a serious obstacle for training LLMs. As long as Keras can get this feature as soon as possible, I am happy to follow your plan for implementing it.
@pass-lin thank you for the PR, we appreciate your effort to bring rematerialization support to Keras. However, we are already working on adding this feature, and we are trying to provide more fine-grained control for enabling rematerialization with a mode parameter. More details here: https://docs.google.com/document/d/199s5kaT7fdqDJ5ryJ15aJJH8QIiLvPBYPpb3ZJgPEsE/edit?tab=t.0#heading=h.lleqmh1k4q6g
Gradient checkpointing is a widely used technique for reducing memory consumption. This PR adapts it for Keras. To require minimal modifications to existing models, we add a parameter, enable_gradient_checkpoint, to the layer, set to False by default. Simply changing this parameter enables gradient checkpointing. However, the backend-specific implementations need to account for the following points (see the sketch after this list):

- In the Torch backend, you should ensure that the layer you are checkpointing contains no dropout layers or normalization layers (such as BN, LN, GN, etc.) whose forward and backward behaviors are inconsistent.
- In the TensorFlow backend, you can only enable this setting in eager mode.
- In the JAX backend, you should ensure that the inputs of your function contain no strings or other types that are not valid, differentiable JAX inputs (such as str).
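To make the backend constraints above concrete, here is a minimal sketch (not the actual code in this PR) of how an enable_gradient_checkpoint-style flag could route a layer's forward computation through each backend's native rematerialization primitive. The checkpointed_call helper, its signature, and the _forward method in the usage note are hypothetical; torch.utils.checkpoint.checkpoint, tf.recompute_grad, and jax.checkpoint are the real underlying APIs.

```python
import keras  # Keras 3: keras.backend.backend() reports the active backend


def checkpointed_call(fn, *args):
    """Hypothetical helper: run fn(*args) so that its intermediate
    activations are recomputed during the backward pass instead of stored."""
    backend = keras.backend.backend()
    if backend == "torch":
        # Torch: fn must not contain dropout / BN / other layers whose
        # forward and backward behaviors differ between the two passes.
        from torch.utils.checkpoint import checkpoint
        return checkpoint(fn, *args, use_reentrant=False)
    if backend == "tensorflow":
        # TensorFlow: rematerialization via tf.recompute_grad; per the
        # notes above, only enable this in eager mode.
        import tensorflow as tf
        return tf.recompute_grad(fn)(*args)
    if backend == "jax":
        # JAX: every element of args must be a valid JAX type (arrays or
        # pytrees of arrays), not strings or other Python objects.
        import jax
        return jax.checkpoint(fn)(*args)
    # Fallback: no rematerialization for other backends.
    return fn(*args)
```

A layer could then, for example, call checkpointed_call(self._forward, inputs) inside call() when the flag is enabled, and fall back to self._forward(inputs) otherwise.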