Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Fixes for gradient accumulation test #125

Merged
merged 3 commits into from
Dec 4, 2023
Merged

Conversation

achalddave
Copy link
Collaborator

Earlier, we used deepcopy, which seems to copy the tensor python objects, but keeps a pointer to the same tensor, meaning we were not properly testing gradient accumulation. This commit mainly fixes that, using load_state_dict() instead to make a model copy.

Unfortunately, the test fails once we do this, and a few changes are necessary to make it pass:

  1. Switch AdamW -> SGD
  2. Switch to float32

It's unclear why this was necessary, and solving this is left for a future commit.

Earlier, we used deepcopy, which seems to copy the tensor python
objects, but keeps a pointer to the same tensor, meaning we were not
properly testing gradient accumulation. This commit mainly fixes that,
using load_state_dict() instead to make a model copy.

Unfortunately, the test fails once we do this, and a few changes are
necessary to make it pass:

1. Switch AdamW -> SGD
2. Switch to float32

It's unclear why this was necessary, and solving this is left for a
future commit.
@achalddave
Copy link
Collaborator Author

Created #126 to keep track of the AdamW + grad accum issue.

Copy link
Collaborator

@GeorgiosSmyrnis GeorgiosSmyrnis left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM - let's keep track of the issue #126 and see what's going on.

@GeorgiosSmyrnis GeorgiosSmyrnis merged commit c057de9 into main Dec 4, 2023
2 checks passed
@achalddave achalddave deleted the achal/fsdp-test branch December 11, 2023 19:29
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

2 participants