Evaluation
This wiki page lists methods and ideas that can be used to score models with respect to their robustness against adversarial attacks.
Feed linear combinations of two inputs into the model and check whether the classification along the path between the samples is correct. Determine the distance from an image (when linearly approaching an image of another class) at which the first misclassified input occurs. Analyze how noisy the classifications along the line are.
This figure plots the classification over linear combinations of a "1" and a "0" sample from the training data. Our first experiments can be found here.
Goodfellow presents a similar analysis here and shows that classification works just fine in most directions except for a few. The linear-combination method might therefore not be efficient at spotting vulnerabilities of a model.
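The interpolation idea above can be sketched as follows. This is a minimal illustration, not our actual experiment code: the linear classifier (`W`, `b`) and the two sample points are hypothetical stand-ins for a trained model and two training images of different classes.

```python
import numpy as np

# Hypothetical linear classifier; in practice this would be the
# trained model under evaluation.
W = np.array([1.0, -1.0])
b = 0.0

def predict(x):
    return int(W @ x + b > 0)

def interpolation_sweep(x_a, x_b, n_steps=101):
    """Classify points on the segment between x_a and x_b.

    Returns the interpolation weight t of the first point no longer
    labelled like x_a (the "first miss"), plus the full label sequence
    so the noisiness along the line can be inspected.
    """
    label_a = predict(x_a)
    labels = []
    first_flip = None
    for t in np.linspace(0.0, 1.0, n_steps):
        y = predict((1 - t) * x_a + t * x_b)
        labels.append(y)
        if first_flip is None and y != label_a:
            first_flip = t
    return first_flip, labels

x_a = np.array([2.0, 0.0])    # stand-in for a "1" sample
x_b = np.array([-2.0, 0.0])   # stand-in for a "0" sample
t_flip, labels = interpolation_sweep(x_a, x_b)

# Noisiness along the line: label changes beyond the first flip.
n_switches = sum(a != b_ for a, b_ in zip(labels, labels[1:]))
print(t_flip, n_switches)
```

For a well-behaved model the label sequence flips once, roughly halfway; many switches along the line would indicate a noisy decision boundary between the two samples.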
Plotting a histogram of activations to get a sense of how they differ between adversarial examples and normal samples.
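A rough sketch of the activation-histogram comparison, under loud assumptions: the single ReLU layer with random weights stands in for a layer of the model under test, and sign-noise perturbations stand in for real adversarial examples.

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical hidden layer; weights would come from the trained model.
W = rng.normal(size=(16, 8))

def activations(x):
    return np.maximum(W @ x, 0.0)  # ReLU activations

# Normal samples vs. perturbed stand-ins for adversarial examples.
normal = rng.normal(size=(100, 8))
adversarial = normal + 0.5 * np.sign(rng.normal(size=normal.shape))

acts_normal = np.concatenate([activations(x) for x in normal])
acts_adv = np.concatenate([activations(x) for x in adversarial])

bins = np.linspace(0.0, 5.0, 21)
hist_normal, _ = np.histogram(acts_normal, bins=bins, density=True)
hist_adv, _ = np.histogram(acts_adv, bins=bins, density=True)

# Simple L1 gap between the two normalized histograms; a large value
# suggests the activation distribution shifts under perturbation.
l1_gap = float(np.abs(hist_normal - hist_adv).sum())
print(l1_gap)
```

In an actual evaluation one would plot the two histograms per layer and look for bins where adversarial inputs are over- or under-represented.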
Plotting the classification in two dimensions, where the first axis is given by the FGSM attack direction and the second is orthogonal to it. A plot of that kind can be seen in the image below. Validation is done based on what these individual plots look like; ideally, after some regularization, the two distinct halves would no longer appear.
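The grid behind such a plot can be sketched as below. The logistic-regression model is a hypothetical stand-in chosen so the loss gradient (and hence the FGSM direction, the sign of the gradient of the loss with respect to the input) is analytic; the orthogonal axis is obtained by Gram-Schmidt against a random vector.

```python
import numpy as np

# Toy differentiable model: logistic regression with fixed weights,
# standing in for the network under evaluation.
w = np.array([1.0, 2.0, -1.0])

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def loss_grad(x, y):
    # Gradient of binary cross-entropy w.r.t. the input x,
    # for p = sigmoid(w @ x) and label y in {0, 1}.
    return (sigmoid(w @ x) - y) * w

def fgsm_plane(x, y, eps_range, n=11):
    """Classify an n x n grid around x spanned by the FGSM direction
    and one direction orthogonal to it."""
    g = np.sign(loss_grad(x, y))          # FGSM direction
    g = g / np.linalg.norm(g)
    r = np.random.default_rng(0).normal(size=x.shape)
    o = r - (r @ g) * g                   # orthogonal direction
    o = o / np.linalg.norm(o)
    eps = np.linspace(-eps_range, eps_range, n)
    return np.array([[int(sigmoid(w @ (x + a * g + b * o)) > 0.5)
                      for a in eps] for b in eps])

x = np.array([0.5, -0.2, 0.1])
grid = fgsm_plane(x, y=1, eps_range=1.0)
print(grid.shape)
```

Rendering `grid` as an image (e.g. with `matplotlib.pyplot.imshow`) gives the kind of two-dimensional classification plot described above; the "two distinct halves" correspond to the label changing along the FGSM axis much earlier than along the orthogonal one.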