---
title: Towards the Unification and Robustness of Perturbation and Gradient Based Explanations
abstract: "As machine learning black boxes are increasingly being deployed in critical domains such as healthcare and criminal justice, there has been a growing emphasis on developing techniques for explaining these black boxes in a post hoc manner. In this work, we analyze two popular post hoc interpretation techniques: SmoothGrad which is a gradient based method, and a variant of LIME which is a perturbation based method. More specifically, we derive explicit closed form expressions for the explanations output by these two methods and show that they both converge to the same explanation in expectation, i.e., when the number of perturbed samples used by these methods is large. We then leverage this connection to establish other desirable properties, such as robustness, for these techniques. We also derive finite sample complexity bounds for the number of perturbations required for these methods to converge to their expected explanation. Finally, we empirically validate our theory using extensive experimentation on both synthetic and real-world datasets."
layout: inproceedings
series: Proceedings of Machine Learning Research
publisher: PMLR
issn: 2640-3498
id: agarwal21c
month: 0
tex_title: Towards the Unification and Robustness of Perturbation and Gradient Based Explanations
firstpage: 110
lastpage: 119
page: 110-119
order: 110
cycles: false
bibtex_author: Agarwal, Sushant and Jabbari, Shahin and Agarwal, Chirag and Upadhyay, Sohini and Wu, Steven and Lakkaraju, Himabindu
author:
- given: Sushant
  family: Agarwal
- given: Shahin
  family: Jabbari
- given: Chirag
  family: Agarwal
- given: Sohini
  family: Upadhyay
- given: Steven
  family: Wu
- given: Himabindu
  family: Lakkaraju
date: 2021-07-01
address:
container-title: Proceedings of the 38th International Conference on Machine Learning
volume: 139
genre: inproceedings
issued:
  date-parts:
  - 2021
  - 7
  - 1
pdf: http://proceedings.mlr.press/v139/agarwal21c/agarwal21c.pdf
extras:
---
