adversarial examples #2

Open
hughsalimbeni opened this issue Jun 14, 2018 · 4 comments
Labels: enhancement (New feature or request), help wanted (Extra attention is needed)

Comments

@hughsalimbeni (Owner)

A key work in this area is https://github.com/YingzhenLi/Dropout_BBalpha

A problem with implementing this method here is that it needs model gradients. Either we could build a task that supports multiple backends (not ideal) and take the gradients directly, or each model could provide its own gradients, which could then be manipulated in numpy. @YingzhenLi any thoughts?

@hughsalimbeni added the enhancement and help wanted labels on Jun 14, 2018
@YingzhenLi

Indeed, many attacks are based on gradients, but for classification that usually just means you need the logit vector before the softmax; automatic differentiation will then work out the gradients for you (if you use TensorFlow or PyTorch). I can definitely help if you want, since I already have some code for this.

There are some ideas for attacking Bayesian methods that actually require more than the logit vector. Those might be slightly more involved to implement...
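For concreteness, a minimal FGSM-style sketch of the "logits + autodiff" pattern, written against TF2's GradientTape; the names model, x, y and eps are illustrative and not part of the benchmark API:

import tensorflow as tf

def fgsm(model, x, y, eps=0.1):
    # model(x) is assumed to return the logit vector (values before the softmax)
    x = tf.convert_to_tensor(x)
    with tf.GradientTape() as tape:
        tape.watch(x)
        loss = tf.nn.sparse_softmax_cross_entropy_with_logits(
            labels=y, logits=model(x))
    grad = tape.gradient(loss, x)   # autodiff gives d(loss)/d(x)
    return x + eps * tf.sign(grad)  # fast gradient sign step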

@hughsalimbeni (Owner, Author)

Thanks @YingzhenLi!
Certainly we want to use autodiff, but what I'm unsure of is whether the generic testing code should take the gradients itself, or whether we should just require all models to implement something like

def grad_logp(self, x):
    """
    The gradient of the log predictive probabilities w.r.t. x, a single input.
    If x has shape (D,), the output has shape (K, D), where K is the number of classes.
    """

or

def grad_logp(self, X):
    """
    The gradient of the log predictive probabilities w.r.t. X, elementwise over the samples.
    If X has shape (N, D), the output has shape (N, K, D), where K is the number of classes.
    """

Do you know of any references for continuous output models?
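As a hedged sketch of the second option, assuming a TensorFlow-backed model that exposes a log_predict callable mapping (N, D) inputs to (N, K) log predictive probabilities (the wrapper and names are illustrative, not part of the benchmark):

import tensorflow as tf

class TFModelWrapper:
    def __init__(self, log_predict):
        self.log_predict = log_predict  # (N, D) -> (N, K) log predictive probabilities

    def grad_logp(self, X):
        X = tf.convert_to_tensor(X)
        with tf.GradientTape() as tape:
            tape.watch(X)
            log_p = self.log_predict(X)  # shape (N, K)
        # batch_jacobian gives d(log_p[n, k]) / d(X[n, d]), i.e. shape (N, K, D)
        return tape.batch_jacobian(log_p, X).numpy()

Returning plain numpy arrays here would keep the generic testing code backend-agnostic, which seems to be the appeal of the second option.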

@hughsalimbeni (Owner, Author)

Also, not all models use the softmax (e.g. robust-max and probit), so any evaluation should be agnostic to the link function.

@YingzhenLi

Some attacks do need the logits (like the Carlini-Wagner L2 attack), although I suspect using the values before the robust max or probit link might work.

Other attacks, like FGSM/PGD/MIM, only need the output probability vector.

My code looks something like the following:

def predict(self, X, softmax=True):
    y = self.model(X)    # logits, i.e. values before the softmax
    if softmax:
        y = tf.nn.softmax(y)
    return y

Then set softmax=True or False depending on the attack in use.
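For example (illustrative usage of the flag above):

logits = model.predict(X, softmax=False)  # e.g. for Carlini-Wagner L2
probs = model.predict(X, softmax=True)    # e.g. for FGSM / PGD / MIM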
