adversarial examples #2
Indeed many attacks are based on gradients, but for classification that usually just means you need the logit vector before the softmax; automatic differentiation will work the rest out for you (if you use TensorFlow or PyTorch). I can definitely help if you want, since I already have some code to do it. There are some ideas for attacking Bayesian methods that actually require more than the logit vector; these might be slightly more involved to implement.
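For concreteness, a minimal sketch of that idea, assuming a PyTorch classifier `model` that returns pre-softmax logits (an illustration only, not the code mentioned above):

```python
import torch
import torch.nn.functional as F

def fgsm(model, x, y, eps):
    """Fast Gradient Sign Method: one signed gradient step on the cross-entropy loss.

    model: maps inputs (N, D) to pre-softmax logits (N, K)
    x: inputs of shape (N, D); y: integer labels of shape (N,)
    """
    x_adv = x.clone().detach().requires_grad_(True)
    logits = model(x_adv)              # (N, K), pre-softmax
    loss = F.cross_entropy(logits, y)  # softmax + NLL handled internally
    loss.backward()                    # autodiff gives d(loss)/d(x_adv)
    return (x_adv + eps * x_adv.grad.sign()).detach()
```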
Thanks @YingzhenLi!

```python
def grad_logp(self, x):
    """
    The gradient of the log predictive probabilities wrt x, a single input.
    If x is shape (D,), then the output is shape (K, D), where K is the number of classes.
    """
```

or

```python
def grad_logp(self, X):
    """
    The gradient of the log predictive probabilities wrt X, elementwise over the number of samples.
    If X is shape (N, D), then the output is shape (N, K, D), where K is the number of classes.
    """
```

Do you know of any references for continuous output models?
Also, not all models use the softmax (e.g. robust max and probit), so any evaluation should be agnostic to the link function.
Some attacks do need the logits (like Carlini-Wagner L2), although I suspect plugging in the values before the robust max or probit might work. Other attacks like FGSM/PGD/MIM only need the output probability vector. My code looks something like the following:
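A minimal sketch of such a wrapper, assuming a hypothetical `model` that returns pre-softmax logits (a sketch, not the exact code referred to above):

```python
import torch
import torch.nn.functional as F

def predict(model, x, softmax=True):
    """Return class probabilities if softmax=True, otherwise the raw logits.

    Sketch only: `model` is assumed to map inputs to pre-softmax logits of shape (N, K).
    """
    logits = model(x)
    return F.softmax(logits, dim=-1) if softmax else logits
```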
Then set `softmax=True` or `False` depending on the attack in use.
A key work in this area is https://github.com/YingzhenLi/Dropout_BBalpha
A problem with implementing this method here is that it needs model gradients. Either we could build a task that supports multiple backends (not ideal) and get the gradients directly, or the model could provide its own gradients which could be manipulated in numpy. @YingzhenLi any thoughts?
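For the second option, a rough sketch of how an attack could stay backend-agnostic by consuming only model-provided gradients (using the `grad_logp` signature proposed above; names here are illustrative):

```python
import numpy as np

def fgsm_numpy(model, x, y, eps):
    """One FGSM step using only quantities supplied by the model, no autodiff backend.

    model.grad_logp(x): (K, D) gradient of the log predictive probabilities for x of shape (D,)
    y: true class index. FGSM ascends the loss -log p(y | x).
    """
    grad = model.grad_logp(x)           # (K, D)
    # d(-log p_y)/dx = -grad[y]; step along its sign
    return x + eps * np.sign(-grad[y])
```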