Fixes for gradient accumulation test #125

achalddave · 2023-11-30T19:00:53Z

Previously, the test used copy.deepcopy, which seems to copy the tensor Python objects while keeping a pointer to the same underlying tensor, so we were not properly testing gradient accumulation. This commit mainly fixes that by using load_state_dict() to make the model copy instead.

Unfortunately, the test fails once we do this, and a few changes are necessary to make it pass:

1. Switch AdamW -> SGD
2. Switch to float32

It's unclear why these changes were necessary; investigating this is left for a future commit.
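For reference, here is a minimal sketch of the kind of check described above. It is not the test code from this PR: the toy `nn.Linear` model, the random data, and the `copy_model` helper are all hypothetical, chosen only to show a copy made with load_state_dict() and a comparison between one full-batch SGD step and the same step accumulated over micro-batches in float32.

```python
import torch
from torch import nn


def copy_model(model: nn.Linear) -> nn.Linear:
    """Hypothetical helper: build a fresh module, then copy the weights in.

    load_state_dict() copies values into the new module's own parameters,
    so the two models do not share parameter storage.
    """
    clone = nn.Linear(model.in_features, model.out_features)
    clone.load_state_dict(model.state_dict())
    return clone


torch.manual_seed(0)
model_full = nn.Linear(4, 1)          # float32 by default
model_accum = copy_model(model_full)  # independent copy with equal weights

x = torch.randn(8, 4)
y = torch.randn(8, 1)
loss_fn = nn.MSELoss()

opt_full = torch.optim.SGD(model_full.parameters(), lr=0.1)
opt_accum = torch.optim.SGD(model_accum.parameters(), lr=0.1)

# Reference: a single step on the full batch.
opt_full.zero_grad()
loss_fn(model_full(x), y).backward()
opt_full.step()

# Gradient accumulation: two equal micro-batches, each loss scaled by
# 1 / accum_steps so the summed gradients match the full-batch mean.
accum_steps = 2
opt_accum.zero_grad()
for xb, yb in zip(x.chunk(accum_steps), y.chunk(accum_steps)):
    (loss_fn(model_accum(xb), yb) / accum_steps).backward()
opt_accum.step()

weights_match = all(
    torch.allclose(p, q, atol=1e-6)
    for p, q in zip(model_full.parameters(), model_accum.parameters())
)
shares_storage = model_full.weight.data_ptr() == model_accum.weight.data_ptr()
```

Checking `data_ptr()` on the two models' parameters is one way to confirm that the copy is really independent before trusting the comparison.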

achalddave · 2023-11-30T19:03:00Z

Created #126 to keep track of the AdamW + grad accum issue.

GeorgiosSmyrnis

LGTM - let's keep track of the issue #126 and see what's going on.

achalddave added 2 commits November 29, 2023 22:53

Use load_state_dict instead of copy.deepcopy

f2d9dac

achalddave requested a review from jfisher52 November 30, 2023 19:00

achalddave mentioned this pull request Nov 30, 2023

Figure out why AdamW + gradient accumulation leads to different results for test case #126

Open

achalddave mentioned this pull request Dec 1, 2023

Support for initializing models on meta device #127

Merged

Merge branch 'main' into achal/fsdp-test

dabbcb4

GeorgiosSmyrnis approved these changes Dec 4, 2023

GeorgiosSmyrnis merged commit c057de9 into main Dec 4, 2023
2 checks passed

achalddave deleted the achal/fsdp-test branch December 11, 2023 19:29
