Skip to content

Latest commit

 

History

History
58 lines (58 loc) · 2.58 KB

2022-06-28-canonne22a.md

File metadata and controls

58 lines (58 loc) · 2.58 KB
title abstract layout series publisher issn id month tex_title firstpage lastpage page order cycles bibtex_author author date address container-title volume genre issued pdf extras
The Price of Tolerance in Distribution Testing
We revisit the problem of tolerant distribution testing. That is, given samples from an unknown distribution $p$ over $\{1, …, n\}$, is it $\varepsilon_1$-close to or $\varepsilon_2$-far from a reference distribution $q$ (in total variation distance)? Despite significant interest over the past decade, this problem is well understood only in the extreme cases. In the noiseless setting (i.e., $\varepsilon_1 = 0$) the sample complexity is $\Theta(\sqrt{n})$, strongly sublinear in the domain size. At the other end of the spectrum, when $\varepsilon_1 = \varepsilon_2/2$, the sample complexity jumps to the barely sublinear $\Theta(n/\log n)$. However, very little is known about the intermediate regime. We fully characterize the price of tolerance in distribution testing as a function of $n$, $\varepsilon_1$, $\varepsilon_2$, up to a single $\log n$ factor. Specifically, we show the sample complexity to be \[\tilde \Theta\mleft(\frac{\sqrt{n}}{\ve_2^{2}} + \frac{n}{\log n} \cdot \max \mleft\{\frac{\ve_1}{\ve_2^2},\mleft(\frac{\ve_1}{\ve_2^2}\mright)^{\!\!2}\mright\}\mright),\]{providing} a smooth tradeoff between the two previously known cases. We also provide a similar characterization for the problem of tolerant equivalence testing, where both $p$ and $q$ are unknown. Surprisingly, in both cases, the main quantity dictating the sample complexity is the ratio $\varepsilon_1/\varepsilon_2^2$, and not the more intuitive $\varepsilon_1/\varepsilon_2$. Of particular technical interest is our lower bound framework, which involves novel approximation-theoretic tools required to handle the asymmetry between $\varepsilon_1$ and $\varepsilon_2$, a challenge absent from previous works.
inproceedings
Proceedings of Machine Learning Research
PMLR
2640-3498
canonne22a
0
The Price of Tolerance in Distribution Testing
573
624
573-624
573
false
Canonne, Clement L and Jain, Ayush and Kamath, Gautam and Li, Jerry
given family
Clement L
Canonne
given family
Ayush
Jain
given family
Gautam
Kamath
given family
Jerry
Li
2022-06-28
Proceedings of Thirty Fifth Conference on Learning Theory
178
inproceedings
date-parts
2022
6
28