Model doesn't train even when ModuleValidator.validate yields no errors #672
Comments
Hi. Thanks for raising this! We are currently working on fixing these incompatibility issues with Expanded Weights. In the meantime, I would suggest trying the hooks mode: identify which part of the model uses buffers and try to replace it with a similar module that does not use buffers.
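To locate the parts of the model that use buffers, a small helper along these lines may help. This is only a sketch built on plain PyTorch (`named_modules`, `named_buffers`, `parameters`); it is not Opacus's own check, and `find_buffered_modules` and `ToyBlock` are hypothetical names used for illustration.

```python
import torch
import torch.nn as nn


def find_buffered_modules(model: nn.Module):
    """List submodules that directly own buffers.

    Returns tuples of (module_name, buffer_names, has_trainable_params).
    Hypothetical helper; an approximation, not the Opacus validation logic.
    """
    report = []
    for name, module in model.named_modules():
        # recurse=False: only buffers registered on this module itself
        buffer_names = [b_name for b_name, _ in module.named_buffers(recurse=False)]
        if not buffer_names:
            continue
        # Does this module also own parameters that will receive gradients?
        trainable = any(p.requires_grad for p in module.parameters(recurse=False))
        report.append((name or "<root>", buffer_names, trainable))
    return report


class ToyBlock(nn.Module):
    """Stand-in for a transformer block; not the actual ViT B/16 code."""

    def __init__(self):
        super().__init__()
        self.proj = nn.Linear(8, 8)
        self.norm = nn.BatchNorm1d(8)  # registers running statistics as buffers
        self.register_buffer("position_ids", torch.arange(8))


toy = nn.Sequential(ToyBlock(), nn.Linear(8, 2))
for module_name, buffer_names, trainable in find_buffered_modules(toy):
    print(f"{module_name}: buffers={buffer_names}, trainable params={trainable}")
```

Running the same helper on the model returned by `ModuleValidator.fix` should show which submodules are candidates for replacement.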
Hi, thank you so much for answering!
Thanks again!
For now, we rely on both `ModuleValidator` and `GradSampleModule.validate()` to check compatibility. For the latter, in strict mode, GSM throws an error when a module includes a buffer (https://github.com/pytorch/opacus/blob/main/opacus/grad_sample/grad_sample_module.py#L108). The error can be muted by setting `strict=False`.
🐛 Bug
We're trying to privately fine-tune a ViT B/16 model (link) on CIFAR-10 data. The non-private version uses `MultiHeadAttention`, which is not compatible with DP. This compatibility issue is fixed when we use `ModuleValidator.fix`, which changes it to `DPMultiHeadAttention`. Also, the `ModuleValidator.validate` function yields no errors. However, the model fails to train and throws the following error:
[NotImplementedError("Model contains a trainable layer with buffers that Opacus doesn't currently support
To fix this, I referred to a previous issue #454 and changed the hook style to "ew" for Expanded Weights. The model, optimizer, and train_loader are created with no errors, but in the training loop, another error shows up:
RuntimeError: Expanded Weights encountered but cannot handle function view
I don't know how to proceed from here. Any help is appreciated. Thank you!
To Reproduce
Colab link: Colab
Steps to reproduce the behavior:
Expected behavior
I expect the ViT/B16 model to be ready to train, especially after `ModuleValidator.validate` doesn't show any errors with the architecture and its modules.
Environment
Installation method (`conda`, `pip`, source): conda