
hw2: NN Modules

Alex Nguyen edited this page Nov 2, 2022 · 1 revision

BatchNorm and LayerNorm, SGD and Adam (screenshot: Screen Shot 2022-11-01 at 20 00 43)

test_mlp_eval_epoch_1 (screenshot: Screen Shot 2022-11-01 at 20 12 57)

test_mlp_train_mnist_1 (screenshot: Screen Shot 2022-11-01 at 20 14 35)


Needs clarification

https://forum.dlsyscourse.org/t/calling-detach-to-reduce-tensor-count-causes-hw1-test-compute-gradient-fail/2469

Calling detach() inside backward makes the gradient_of_gradient test fail. I don't think this matters, since we never take the gradient of a gradient during optimization.
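To see why detaching in backward could break a gradient-of-gradient check, here is a minimal sketch using a toy scalar autograd. It is not the course framework's API: Value, add, mul, grad, and the detach_in_backward flag are all hypothetical names made up for this illustration. The assumption is that backward builds its result out of graph nodes, so detaching the inputs there cuts the path a second differentiation would need.

```python
# Toy scalar autograd (hypothetical, not the course framework's API).
class Value:
    def __init__(self, data, inputs=(), grad_fn=None):
        self.data = float(data)
        self.inputs = inputs      # parent nodes in the graph
        self.grad_fn = grad_fn    # maps out_grad (Value) -> grads for inputs

    def detach(self):
        # Same number, but with no link back to the graph.
        return Value(self.data)


def add(a, b):
    return Value(a.data + b.data, (a, b), lambda g: (g, g))


def mul(a, b, detach_in_backward=False):
    def grad_fn(g):
        # Optionally cut the inputs out of the graph before using them.
        lhs, rhs = (a.detach(), b.detach()) if detach_in_backward else (a, b)
        return (mul(g, rhs, detach_in_backward),
                mul(g, lhs, detach_in_backward))
    return Value(a.data * b.data, (a, b), grad_fn)


def topo_sort(node):
    order, seen = [], set()

    def visit(n):
        if id(n) in seen:
            return
        seen.add(id(n))
        for parent in n.inputs:
            visit(parent)
        order.append(n)

    visit(node)
    return order


def grad(out, wrt):
    # Reverse-mode pass; the returned gradient is itself a graph node.
    grads = {id(out): Value(1.0)}
    for node in reversed(topo_sort(out)):
        g = grads.get(id(node))
        if g is None or node.grad_fn is None:
            continue
        for parent, pg in zip(node.inputs, node.grad_fn(g)):
            if id(parent) in grads:
                grads[id(parent)] = add(grads[id(parent)], pg)
            else:
                grads[id(parent)] = pg
    return grads.get(id(wrt), Value(0.0))


def cube(x, detach_in_backward):
    return mul(mul(x, x, detach_in_backward), x, detach_in_backward)


x = Value(2.0)

# Without detach: f = x^3, f' = 3x^2 = 12, f'' = 6x = 12.
first = grad(cube(x, False), x)
second = grad(first, x)

# With detach in backward: f' is still 12, but f'' collapses to 0
# because x no longer appears in the graph of the first gradient.
first_detached = grad(cube(x, True), x)
second_detached = grad(first_detached, x)

print(first.data, second.data, first_detached.data, second_detached.data)
```

In this sketch the first-order gradient is the same either way, which matches the observation that plain optimization is unaffected; only the second differentiation sees the difference.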

Screen Shot 2022-10-31 at 22 15 40

I don't know why the tensor count in the test case is exactly 1132. (screenshot: Screen Shot 2022-10-31 at 22 24 37)

Screen Shot 2022-11-02 at 04 54 14