Evaluation
This wiki page lists methods and ideas that can be used to score models with respect to their robustness against adversarial attacks.
Feed linear combinations of two inputs into the model and check whether the classification along the line between the samples is correct. Determine the distance from an image (when linearly approaching an image of another class) to the first misclassified input, and analyze how noisy the classifications along the line are (a code sketch of this probe follows below).
This figure plots the classification along the linear combination between a "1" and a "0" sample from the training data. Our first experiments can be found here.
Goodfellow presents a similar analysis here and shows that the classification works just fine in most directions, except for a few. Therefore, the linear-combination method might not be efficient at spotting vulnerabilities of a model.
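A minimal sketch of the linear-combination probe, assuming a PyTorch classifier `model` and two input tensors `x_a` and `x_b` of the same shape (all names are placeholders, not taken from our experiments):

```python
import torch

def interpolate_predictions(model, x_a, x_b, steps=101):
    """Classify points along the line from x_a to x_b.

    Returns the interpolation coefficients and the predicted class at
    each point, so one can locate the first misclassification and judge
    how noisy the decision boundary is along the line.
    """
    model.eval()
    alphas = torch.linspace(0.0, 1.0, steps)
    # Build the batch of linear combinations (1 - a) * x_a + a * x_b.
    batch = torch.stack([(1 - a) * x_a + a * x_b for a in alphas])
    with torch.no_grad():
        preds = model(batch).argmax(dim=1)
    return alphas, preds

def first_misclassification(alphas, preds, true_label):
    """Distance (in interpolation coefficient) to the first wrong prediction."""
    wrong = (preds != true_label).nonzero(as_tuple=True)[0]
    return alphas[wrong[0]].item() if len(wrong) > 0 else None
```

The returned coefficient of the first misclassification can serve as a crude robustness score per sample pair, and the number of label switches along the line indicates how noisy the boundary region is.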
Plot a histogram of activations to get a sense of how they behave differently when feeding adversarial examples vs. normal samples (see the sketch below).
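A minimal sketch of the activation-histogram comparison, assuming a PyTorch model, a chosen `layer` module, and clean/adversarial batches `x_clean` and `x_adv` (placeholder names, not from the original experiments):

```python
import torch
import matplotlib.pyplot as plt

def collect_activations(model, layer, x):
    """Run x through the model and capture the flattened activations of one layer."""
    captured = []
    handle = layer.register_forward_hook(
        lambda module, inputs, output: captured.append(output.detach().flatten())
    )
    with torch.no_grad():
        model(x)
    handle.remove()
    return torch.cat(captured)

def plot_activation_histograms(model, layer, x_clean, x_adv, bins=100):
    """Overlay histograms of one layer's activations for clean vs. adversarial inputs."""
    clean_acts = collect_activations(model, layer, x_clean).cpu().numpy()
    adv_acts = collect_activations(model, layer, x_adv).cpu().numpy()
    plt.hist(clean_acts, bins=bins, alpha=0.5, label="clean")
    plt.hist(adv_acts, bins=bins, alpha=0.5, label="adversarial")
    plt.xlabel("activation value")
    plt.legend()
    plt.show()
```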