hw2: NN Modules
Notes on implementing the BatchNorm and LayerNorm modules and the SGD and Adam optimizers.
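For reference, here is a minimal NumPy sketch of the textbook SGD-with-momentum and Adam update rules with weight decay. It only illustrates the math, not the homework's API; the function signatures and parameter names (lr, momentum, beta1, beta2, eps, weight_decay) are assumptions, and conventions for the momentum term vary between implementations.

```python
# Minimal NumPy sketch of the textbook update rules (not the homework API).
import numpy as np

def sgd_step(param, grad, u, lr=0.01, momentum=0.9, weight_decay=0.0):
    """One SGD-with-momentum step; returns (new_param, new_velocity)."""
    g = grad + weight_decay * param          # L2 / weight-decay term
    u = momentum * u + (1 - momentum) * g    # some libraries use u = momentum*u + g
    return param - lr * u, u

def adam_step(param, grad, m, v, t, lr=0.01, beta1=0.9, beta2=0.999,
              eps=1e-8, weight_decay=0.0):
    """One Adam step with bias correction; returns (new_param, new_m, new_v)."""
    g = grad + weight_decay * param
    m = beta1 * m + (1 - beta1) * g          # first-moment estimate
    v = beta2 * v + (1 - beta2) * g ** 2     # second-moment estimate
    m_hat = m / (1 - beta1 ** t)             # bias correction, t starts at 1
    v_hat = v / (1 - beta2 ** t)
    return param - lr * m_hat / (np.sqrt(v_hat) + eps), m, v
```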
Calling detach() inside backward will fail the gradient_of_gradient test. I don't think this matters much in practice, since we never compute gradients of gradients during optimization.
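For intuition, here is a minimal PyTorch analogue (not the homework framework) of what detaching inside backward does: the first-order gradient comes out numerically correct, but no graph is recorded for it, so differentiating the gradient a second time fails. The CubeDetachedGrad function below is made up purely for illustration.

```python
import torch

class CubeDetachedGrad(torch.autograd.Function):
    """y = x**3, but backward uses detached saved values (for illustration)."""

    @staticmethod
    def forward(ctx, x):
        ctx.save_for_backward(x)
        return x ** 3

    @staticmethod
    def backward(ctx, grad_out):
        (x,) = ctx.saved_tensors
        # The detach() below still yields the correct first-order gradient,
        # but the gradient is no longer connected to x in the autograd graph.
        return grad_out * 3 * x.detach() ** 2

x = torch.tensor(2.0, requires_grad=True)
y = CubeDetachedGrad.apply(x)
(g,) = torch.autograd.grad(y, x, create_graph=True)   # g == 12.0, correct
# torch.autograd.grad(g, x)  # raises: g has no grad_fn, gradient-of-gradient is lost
# Replacing x.detach() with x in backward makes the second call return 6*x == 12.0.
```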
I don't know why the tensor count checked by the test case is exactly 1132.
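One way to investigate such counts (a sketch only; the `needle` import path and `Tensor` class name are assumptions about the framework) is to count the live Tensor objects Python's garbage collector can see and watch how the number changes across optimizer steps:

```python
import gc

def count_live_tensors(tensor_cls):
    """Count objects of tensor_cls currently tracked by Python's GC."""
    return sum(isinstance(obj, tensor_cls) for obj in gc.get_objects())

# Hypothetical usage, assuming the framework is importable as `needle`:
# import needle as ndl
# print(count_live_tensors(ndl.Tensor))   # compare before/after an optimizer step
```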