Evaluation
This wiki page lists methods and ideas that can be used to score models with respect to their robustness against adversarial attacks.
Feed linear combinations of two inputs into the model and check whether the classification along the path between the samples is correct. Determine the distance from an image (when linearly approaching an image of another class) at which the first misclassified input occurs. Analyze how noisy the classifications along the line are.
This figure plots the classification over linear combinations of a "1" and a "0" sample from the training data. Our first experiments can be found here.
Goodfellow presents a similar analysis here and shows that classification works just fine in most directions except for a few. The linear-combination method might therefore not be efficient at spotting vulnerabilities of a model.
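The interpolation idea above can be sketched as follows. This is a minimal illustration, not our actual experiment code: the linear classifier (`W`, `b`) and the two sample points are hypothetical stand-ins for a trained model and two training images of different classes.

```python
import numpy as np

# Hypothetical linear classifier; in practice this would be the
# trained model under evaluation.
W = np.array([1.0, -1.0])
b = 0.0

def predict(x):
    return int(W @ x + b > 0)

def interpolation_sweep(x_a, x_b, n_steps=101):
    """Classify points on the segment between x_a and x_b.

    Returns the interpolation weight t of the first point no longer
    labelled like x_a (the "first miss"), plus the full label sequence
    so the noisiness along the line can be inspected.
    """
    label_a = predict(x_a)
    labels = []
    first_flip = None
    for t in np.linspace(0.0, 1.0, n_steps):
        y = predict((1 - t) * x_a + t * x_b)
        labels.append(y)
        if first_flip is None and y != label_a:
            first_flip = t
    return first_flip, labels

x_a = np.array([2.0, 0.0])    # stand-in for a "1" sample
x_b = np.array([-2.0, 0.0])   # stand-in for a "0" sample
t_flip, labels = interpolation_sweep(x_a, x_b)

# Noisiness along the line: label changes beyond the first flip.
n_switches = sum(a != b_ for a, b_ in zip(labels, labels[1:]))
print(t_flip, n_switches)
```

For a well-behaved model the label sequence flips once, roughly halfway; many switches along the line would indicate a noisy decision boundary between the two samples.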
Plotting a histogram of activations to get a sense of how they differ between adversarial examples and normal samples.
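A rough sketch of the activation-histogram comparison, under loud assumptions: the single ReLU layer with random weights stands in for a layer of the model under test, and sign-noise perturbations stand in for real adversarial examples.

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical hidden layer; weights would come from the trained model.
W = rng.normal(size=(16, 8))

def activations(x):
    return np.maximum(W @ x, 0.0)  # ReLU activations

# Normal samples vs. perturbed stand-ins for adversarial examples.
normal = rng.normal(size=(100, 8))
adversarial = normal + 0.5 * np.sign(rng.normal(size=normal.shape))

acts_normal = np.concatenate([activations(x) for x in normal])
acts_adv = np.concatenate([activations(x) for x in adversarial])

bins = np.linspace(0.0, 5.0, 21)
hist_normal, _ = np.histogram(acts_normal, bins=bins, density=True)
hist_adv, _ = np.histogram(acts_adv, bins=bins, density=True)

# Simple L1 gap between the two normalized histograms; a large value
# suggests the activation distribution shifts under perturbation.
l1_gap = float(np.abs(hist_normal - hist_adv).sum())
print(l1_gap)
```

In an actual evaluation one would plot the two histograms per layer and look for bins where adversarial inputs are over- or under-represented.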
Plotting the classification in two dimensions, where the first axis is given by the FGSM attack direction and the second is orthogonal to it. A plot of that kind can be seen in the image below. Validation is done based on what these individual plots look like; ideally, after some regularization, the two distinct halves would no longer appear.
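The grid behind such a plot can be sketched as below. The logistic-regression model is a hypothetical stand-in chosen so the loss gradient (and hence the FGSM direction, the sign of the gradient of the loss with respect to the input) is analytic; the orthogonal axis is obtained by Gram-Schmidt against a random vector.

```python
import numpy as np

# Toy differentiable model: logistic regression with fixed weights,
# standing in for the network under evaluation.
w = np.array([1.0, 2.0, -1.0])

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def loss_grad(x, y):
    # Gradient of binary cross-entropy w.r.t. the input x,
    # for p = sigmoid(w @ x) and label y in {0, 1}.
    return (sigmoid(w @ x) - y) * w

def fgsm_plane(x, y, eps_range, n=11):
    """Classify an n x n grid around x spanned by the FGSM direction
    and one direction orthogonal to it."""
    g = np.sign(loss_grad(x, y))          # FGSM direction
    g = g / np.linalg.norm(g)
    r = np.random.default_rng(0).normal(size=x.shape)
    o = r - (r @ g) * g                   # orthogonal direction
    o = o / np.linalg.norm(o)
    eps = np.linspace(-eps_range, eps_range, n)
    return np.array([[int(sigmoid(w @ (x + a * g + b * o)) > 0.5)
                      for a in eps] for b in eps])

x = np.array([0.5, -0.2, 0.1])
grid = fgsm_plane(x, y=1, eps_range=1.0)
print(grid.shape)
```

Rendering `grid` as an image (e.g. with `matplotlib.pyplot.imshow`) gives the kind of two-dimensional classification plot described above; the "two distinct halves" correspond to the label changing along the FGSM axis much earlier than along the orthogonal one.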