Fix (scaling/standalone): better switch from runtime stats to param #1099

Open · wants to merge 4 commits into base: dev
Conversation

@Giuseppe5 Giuseppe5 (Collaborator) commented Nov 21, 2024

Reason for this PR

Currently, if we switch from training to eval before stats_collection_steps is complete, we never update the value parameter with the buffer's contents. This has a few side effects:

  • When applying learned round, we might keep the model in eval mode but still accumulate gradients. If the value parameter is not being used, no gradients are accumulated.
  • When exporting the state_dict, value is not exported.
  • When doing PTQ calibration, the current setup never converts the buffer to its corresponding value parameter, causing some of the issues mentioned above.
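The pre-fix behavior can be sketched in a few lines. This is a simplified illustration with hypothetical names (RuntimeStatsScaling, collect_steps), not the actual Brevitas implementation:

```python
import torch
import torch.nn as nn

class RuntimeStatsScaling(nn.Module):
    """Simplified sketch: collect running stats in a buffer for
    `collect_steps` training iterations, then promote the buffer to a
    learnable `value` parameter."""
    def __init__(self, collect_steps=30):
        super().__init__()
        self.collect_steps = collect_steps
        self.counter = 0
        self.register_buffer("buffer", torch.tensor(1.0))
        self.value = None  # becomes an nn.Parameter once collection ends

    def forward(self, x):
        if self.training and self.counter < self.collect_steps:
            # running average of the absolute max, as a stand-in statistic
            stat = x.abs().max().detach()
            self.buffer = (self.buffer * self.counter + stat) / (self.counter + 1)
            self.counter += 1
            if self.counter == self.collect_steps:
                self.value = nn.Parameter(self.buffer.clone())
            return x / self.buffer
        if self.value is None:
            # The issue described above: we switched to eval before
            # collection finished, so `value` was never created -- no
            # gradients can flow into it, and state_dict lacks `value`.
            return x / self.buffer
        return x / self.value

m = RuntimeStatsScaling(collect_steps=30)
m(torch.randn(8))  # a single training step, collection not finished
m.eval()
m(torch.randn(8))  # eval forward still uses the buffer
print("value" in dict(m.named_parameters()))  # False: never promoted
```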

Changes Made in this PR

At eval time, during the first iteration, the buffer is always converted to the value parameter.
The side effect appears if the user switches between training/evaluation mode multiple times very early in the training process. Although it is common to switch between training/eval to check the loss on a validation set, this is usually done after enough iterations that the buffer has already been converted to a parameter anyway.
I'd admit that this could be marked as a breaking change for that edge case.

This has been removed in a more recent commit. I believe there are no more breaking changes at this point.
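The fixed behavior can be sketched as follows, again with hypothetical names rather than the actual Brevitas code: on the first eval-time forward, whatever has accumulated in the buffer so far is promoted to the value parameter.

```python
import torch
import torch.nn as nn

class FixedRuntimeStatsScaling(nn.Module):
    """Sketch of the fix (hypothetical names): promote the stats buffer
    to the `value` parameter on the first eval-time forward, even if
    stats collection has not finished."""
    def __init__(self, collect_steps=30):
        super().__init__()
        self.collect_steps = collect_steps
        self.counter = 0
        self.register_buffer("buffer", torch.tensor(1.0))
        self.value = None

    def _promote(self):
        # Convert the buffer into a learnable parameter exactly once.
        if self.value is None:
            self.value = nn.Parameter(self.buffer.clone())

    def forward(self, x):
        if self.training and self.counter < self.collect_steps:
            stat = x.abs().max().detach()
            self.buffer = (self.buffer * self.counter + stat) / (self.counter + 1)
            self.counter += 1
            if self.counter == self.collect_steps:
                self._promote()
            return x / self.buffer
        # Fix: convert on the first eval iteration, so `value` exists for
        # learned-round gradients, state_dict export, and PTQ calibration.
        self._promote()
        return x / self.value

m = FixedRuntimeStatsScaling(collect_steps=30)
m(torch.randn(8))  # a single training step, collection not finished
m.eval()
m(torch.randn(8))  # first eval forward promotes buffer -> parameter
print("value" in dict(m.named_parameters()))  # True
```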

Testing Summary

Risk Highlight

  • This PR includes code from another work (please detail).
  • This PR contains API-breaking changes.
  • This PR depends on work in another PR (please provide links/details).
  • This PR introduces new dependencies (please detail).
  • There are coverage gaps not covered by tests.
  • Documentation updates required in subsequent PR.

Checklist

  • Code comments added to any hard-to-understand areas, if applicable.
  • Changes generate no new warnings.
  • Updated any relevant tests, if applicable.
  • No conflicts with destination dev branch.
  • I reviewed my own code changes.
  • Initial CI/CD passing.
  • 1+ reviews given, and any review issues addressed and approved.
  • Post-review full CI/CD passing.

@Giuseppe5 Giuseppe5 requested review from nickfraser and removed request for nickfraser November 25, 2024 14:20