adversarial examples #2
Indeed many attacks are based on gradients, but for classification that usually just means you need the logit vector before the softmax; automatic differentiation will work the rest out for you (if you use TensorFlow or PyTorch). I can definitely help if you want, since I already have some code to do it. There are some ideas for attacking Bayesian methods that actually require more than the logit vector; these might be slightly more involved to implement.
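For concreteness, a minimal sketch of that idea, assuming a PyTorch classifier `model` that returns pre-softmax logits (an illustration only, not the code mentioned above):

```python
import torch
import torch.nn.functional as F

def fgsm(model, x, y, eps):
    """Fast Gradient Sign Method: one signed gradient step on the cross-entropy loss.

    model: maps inputs (N, D) to pre-softmax logits (N, K)
    x: inputs of shape (N, D); y: integer labels of shape (N,)
    """
    x_adv = x.clone().detach().requires_grad_(True)
    logits = model(x_adv)              # (N, K), pre-softmax
    loss = F.cross_entropy(logits, y)  # softmax + NLL handled internally
    loss.backward()                    # autodiff gives d(loss)/d(x_adv)
    return (x_adv + eps * x_adv.grad.sign()).detach()
```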
Thanks @YingzhenLi!

```python
def grad_logp(self, x):
    """
    The gradient of the log predictive probabilities wrt x, a single input.
    If x is shape (D,), then the output is shape (K, D), where K is the number of classes.
    """
```

or

```python
def grad_logp(self, X):
    """
    The gradient of the log predictive probabilities wrt X, elementwise over the number of samples.
    If X is shape (N, D), then the output is shape (N, K, D), where K is the number of classes.
    """
```

Do you know of any references for continuous output models?
Also, not all models use the softmax (e.g. robust max and probit), so any evaluation should be agnostic to the link function.
Some attacks do need the logits (like Carlini-Wagner L2), although I suspect plugging in the values before the robust max or probit might work. Other attacks like FGSM/PGD/MIM only need the output probability vector. My code looks something like the following:
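A minimal sketch of such a wrapper, assuming a hypothetical `model` that returns pre-softmax logits (a sketch, not the exact code referred to above):

```python
import torch
import torch.nn.functional as F

def predict(model, x, softmax=True):
    """Return class probabilities if softmax=True, otherwise the raw logits.

    Sketch only: `model` is assumed to map inputs to pre-softmax logits of shape (N, K).
    """
    logits = model(x)
    return F.softmax(logits, dim=-1) if softmax else logits
```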
Then set `softmax=True` or `False` depending on the attack in use.
A key work in this area is https://github.com/YingzhenLi/Dropout_BBalpha
A problem with implementing this method here is that it needs model gradients. Either we could build a task that supports multiple backends (not ideal) and get the gradients directly, or the model could provide its own gradients which could be manipulated in numpy. @YingzhenLi any thoughts?
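For the second option, a rough sketch of how an attack could stay backend-agnostic by consuming only model-provided gradients (using the `grad_logp` signature proposed above; names here are illustrative):

```python
import numpy as np

def fgsm_numpy(model, x, y, eps):
    """One FGSM step using only quantities supplied by the model, no autodiff backend.

    model.grad_logp(x): (K, D) gradient of the log predictive probabilities for x of shape (D,)
    y: true class index. FGSM ascends the loss -log p(y | x).
    """
    grad = model.grad_logp(x)           # (K, D)
    # d(-log p_y)/dx = -grad[y]; step along its sign
    return x + eps * np.sign(-grad[y])
```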