Add more options to the `unshard_checkpoint` function to help scale (#145)

I was getting worried about unsharding really big checkpoints, like for the 32B, which we'll need to do soon. The main issue at the moment is that in order to unshard we need to load the entire model (or optimizer state) into memory, which clearly isn't scalable. So I've added an option to unshard the checkpoint into chunks of a given size, which helps scale because only a single chunk (which could be as small as a single tensor) needs to be loaded into memory at a time. Each chunk gets written to a unique file. I think HuggingFace does something similar. Note: this is not supported for optimizer state yet.

Speaking of optimizer state, this PR also adds a function called `load_keys()` for loading (and unsharding) specific keys from a checkpoint. So if you want to inspect part of the optimizer state, you can use that function without having to unshard the whole optimizer state.
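A minimal sketch of how the chunked unsharding might be invoked, based purely on the description above. The import path and the `chunk_size_bytes` keyword are assumptions, not the confirmed API:

```python
# Hypothetical usage of chunked unsharding. The module path and the
# `chunk_size_bytes` parameter name are assumptions based on the commit
# message, not the confirmed signature.
from olmo_core.distributed.checkpoint import unshard_checkpoint  # assumed import path

# Unshard the model weights into ~2 GiB chunks. Only one chunk is held in
# memory at a time, and each chunk is written to its own file, so peak
# memory stays bounded regardless of total model size.
unshard_checkpoint(
    dir="/checkpoints/step5000",                   # sharded checkpoint (illustrative path)
    target_dir="/checkpoints/step5000-unsharded",
    chunk_size_bytes=2 * 1024**3,                  # assumed name for the new chunk-size option
)
```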
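And a sketch of `load_keys()` for peeking at part of the optimizer state. The key names shown and the iteration pattern are illustrative guesses; the commit only says the function loads and unshards specific keys:

```python
# Hypothetical usage of load_keys(): pull a few optimizer-state entries out of
# a sharded checkpoint without unsharding the whole optimizer state. The
# import path, key format, and return form are assumptions.
from olmo_core.distributed.checkpoint import load_keys  # assumed import path

keys = [
    "optim.state.embeddings.weight.exp_avg",     # illustrative key names; real ones
    "optim.state.embeddings.weight.exp_avg_sq",  # depend on the checkpoint layout
]
for key, value in zip(keys, load_keys("/checkpoints/step5000", keys)):
    # Print the tensor shape if the value is a tensor, else the value itself.
    print(key, getattr(value, "shape", value))
```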
Showing 5 changed files with 305 additions and 61 deletions.