I am intrigued by your work that demonstrates the effectiveness of ZO optimization in training large-scale models. Your main experiments on CIFAR-10 using ResNet20 show that it takes approximately 60 minutes per epoch (a result I have successfully replicated).
However, your framework, DeepZero, uses coordinate-wise gradient estimation (CGE), so the cost of each gradient estimate, measured in forward passes, grows linearly with the model size. In Appendix D, Table A3, you report training ResNet18, whose model size is approximately ten times larger than that of ResNet20. How long did it take to train ResNet18 with the DeepZero framework?
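
To make the scaling concern concrete, here is a minimal sketch of a forward-difference coordinate-wise gradient estimate as I understand it. It is not taken from the DeepZero code: the function name `cge_gradient`, the smoothing parameter `mu`, and the toy `quadratic` loss are all placeholders of my own. The point is only that a full estimate over `d` parameters needs `d + 1` loss evaluations.

```python
def cge_gradient(loss, theta, mu=1e-3):
    """Forward-difference coordinate-wise gradient estimate (illustrative only).

    `loss` maps a list of floats to a float. Returns (grad, num_queries).
    """
    base = loss(theta)            # one query at the unperturbed point
    queries = 1
    grad = []
    for i in range(len(theta)):
        perturbed = list(theta)
        perturbed[i] += mu        # perturb a single coordinate
        grad.append((loss(perturbed) - base) / mu)
        queries += 1              # one extra forward pass per parameter
    return grad, queries


def quadratic(theta):
    # Toy loss standing in for a network's training loss (assumption).
    return sum(t * t for t in theta)


grad, queries = cge_gradient(quadratic, [1.0, -2.0, 0.5])
```

With three parameters this uses four loss evaluations, so if the same pattern holds for a real network, a model with roughly ten times as many parameters would need roughly ten times as many forward passes per estimate, before any sparsity or parallelism is applied.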