title | date | comments | categories | tags
---|---|---|---|---
Adversarial Robustness for Deep Learning with Trends and Challenges | 2020-01-13 05:30:00 -0800 | true | |
Pin-Yu Chen, 2020 Jan 13th
This is a note from a talk given at Academia Sinica.
Deepfake: what is wrong with this model? Adversarial examples are prediction-evasive attacks on an AI model.
- Crisis in trust / black-box learning: inconsistent perception and decision making between humans and machines
- Implications for security-critical tasks
- Limitations of current machine learning
- Loss curve
Adversarial robustness:
(1) Attack: 1st- and 0th-order (gradient-free) optimization
(2) Defense: robust optimization
(3) Evaluation and certification
(4) Interpretability of deep learning
Accuracy is not adversarial robustness: as models become more accurate, they can run into more trouble with adversarial examples.
Why do we need this? We need to be able to trust AI: wherever there is a neural network, there is a way to attack it adversarially.
Adversarial examples:
1. Image captioning
2. Speech recognition
3. Data regression
4. Physical world: real-time traffic sign detectors, a 3D-printed adversarial turtle, adversarial patches, an adversarial T-shirt
Holistic view of adversarial robustness: data -> model -> inference
Evasion attacks happen at the model and inference stages. Examples:
- White-box attack: standard white-box, adaptive white-box (defense-aware); the attacker knows the model
- Black-box attack: the attacker does not know the model
- Transfer attack (black-box)
- Gray-box attack
(1) Attack. Generating black-box attacks (no backpropagation through the model). Attack formulation: AutoZOOM (query reduction). Minimize the distortion so that the adversarial example looks as similar to the original as possible: $\min_{\delta}\ \mathrm{Dist}(x_0,\, x_0+\delta) + \lambda \cdot \mathrm{Loss}(x_0+\delta)$. Key technique: gradient estimation from system outputs.
Methods: ZOO, ZOO+AE, AutoZOOM-BiLIN, AutoZOOM-AE
Zeroth-order (ZO) optimization
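A minimal sketch of the key technique, estimating gradients from queries alone, assuming a scalar black-box loss `f` and a float input array; the function name and sampling scheme are illustrative, not the exact ZOO/AutoZOOM estimator:

```python
import numpy as np

def zo_gradient_estimate(f, x, num_samples=20, mu=1e-3):
    """Estimate grad f(x) for a black-box scalar loss f using
    random-direction finite differences (the core idea behind
    ZOO/AutoZOOM-style attacks)."""
    grad = np.zeros_like(x)
    fx = f(x)  # one query for the base point, reused across samples
    for _ in range(num_samples):
        u = np.random.randn(*x.shape)
        u /= np.linalg.norm(u)                   # unit-norm probing direction
        grad += ((f(x + mu * u) - fx) / mu) * u  # directional derivative times direction
    return grad / num_samples
```

Each estimate costs `num_samples + 1` queries, which is why query reduction matters; AutoZOOM additionally searches in a lower-dimensional space (bilinear resizing or an autoencoder) to cut the query count.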
Generating contrastive explanations: PP (pertinent positive) and PN (pertinent negative).
(2) Defense (some attacks work by changing pixels). Robustness evaluation: how close a reference input is to the nearest decision boundary, i.e., guaranteeing that every point inside a certain ball around an image keeps that image's label; these are adversarial examples against the machine.
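A standard way to state this minimum-distortion notion (my notation, not necessarily the slide's):

$$\Delta_{\min}(x_0) = \min_{\delta} \|\delta\|_p \quad \text{s.t.} \quad f(x_0 + \delta) \neq f(x_0)$$

A certified radius $r \le \Delta_{\min}(x_0)$ then guarantees $f(x) = f(x_0)$ for every $x$ with $\|x - x_0\|_p \le r$.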
Where we are and where we are going:
- Data augmentation with adversarial examples
- From standard training to robust training: the min-max formulation is hard to optimize for DNNs (see the sketch after this list)
- Input transformation, correction, and anomaly detection: many are bypassed by advanced attacks, or lose accuracy in exchange for robustness
- New learning models and training losses: slow progress
- Models with diversity: model ensembles and models with randomness (complicated)
- Domain- and task-specific defenses: case-by-case, not automated
- Combination of all the effective methods: system design
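As referenced in the robust-training bullet above, here is a minimal PGD-style sketch of the min-max idea, assuming an L-infinity threat model and a PyTorch classifier; it illustrates robust training in general, not the speaker's specific method:

```python
import torch
import torch.nn.functional as F

def pgd_perturbation(model, x, y, eps=0.03, alpha=0.01, steps=10):
    """Inner maximization of the min-max objective: find a perturbation
    inside the L-infinity ball of radius eps that maximizes the loss."""
    delta = torch.zeros_like(x, requires_grad=True)
    for _ in range(steps):
        loss = F.cross_entropy(model(x + delta), y)
        loss.backward()
        with torch.no_grad():
            delta += alpha * delta.grad.sign()   # ascend the loss
            delta.clamp_(-eps, eps)              # project back into the ball
        delta.grad.zero_()
    return delta.detach()

def robust_training_step(model, optimizer, x, y):
    """Outer minimization: update the model on worst-case perturbed inputs."""
    delta = pgd_perturbation(model, x, y)
    optimizer.zero_grad()
    loss = F.cross_entropy(model(x + delta), y)
    loss.backward()
    optimizer.step()
    return loss.item()
```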
Our defense method: segment the first part of the audio and use the continuity of speech to detect attacks. A simple approach: add some noise at the input, which can improve the model's stability (see the sketch below).
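A minimal sketch of the "add noise at the input" idea, in the spirit of randomized smoothing; `predict_fn` and its signature are assumptions for illustration, not the speaker's exact defense:

```python
import numpy as np

def smoothed_predict(predict_fn, x, num_classes, sigma=0.25, n=100):
    """Majority vote over Gaussian-noised copies of the input.
    predict_fn(x) is assumed to return a class index."""
    votes = np.zeros(num_classes, dtype=int)
    for _ in range(n):
        noisy = x + sigma * np.random.randn(*x.shape)  # perturb the input
        votes[predict_fn(noisy)] += 1
    return int(np.argmax(votes))  # the most stable prediction wins
```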
SPROUT: Self-Progressing Robust Training with Dirichlet label smoothing
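A toy sketch of Dirichlet label smoothing, sampling soft labels concentrated on the true class; the concentration parameters here are illustrative assumptions, since SPROUT parametrizes and updates its Dirichlet distribution during training:

```python
import numpy as np

def dirichlet_smoothed_label(true_class, num_classes, base=0.1, boost=10.0):
    """Sample a soft label from a Dirichlet distribution whose
    concentration favors the true class (illustrative only)."""
    alpha = np.full(num_classes, base)
    alpha[true_class] += boost
    return np.random.dirichlet(alpha)  # a probability vector summing to 1
```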
How do we evaluate adversarial robustness?
- Game-based approach
- Verification-based approach: CNN-Cert (high accuracy)
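To illustrate what a verification-based bound computes, here is plain interval bound propagation through one linear layer; it is a simpler relative of the layer-wise linear bounds used by CNN-Cert, not CNN-Cert itself:

```python
import numpy as np

def interval_bounds_linear(W, b, lower, upper):
    """Propagate an input interval [lower, upper] through the layer
    W x + b, returning guaranteed output bounds for every x in the box."""
    center = (lower + upper) / 2.0
    radius = (upper - lower) / 2.0
    new_center = W @ center + b
    new_radius = np.abs(W) @ radius   # worst case over the box
    return new_center - new_radius, new_center + new_radius
```

Composing such bounds layer by layer yields a certified output range for all inputs in $[x_0-\epsilon,\, x_0+\epsilon]$; if the true class's lower bound exceeds every other class's upper bound, the point is certified robust at radius $\epsilon$.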