---
title: Approximate Cluster Recovery from Noisy Labels
abstract: 'Designing algorithms for machine learning problems targeting beyond
  worst-case analysis and, in particular, analyzing the effect of side-information
  on the complexity of such problems is a very important line of research with
  many practical applications. In this paper we study the classic $k$-means
  clustering problem in the presence of noisy labels. In this problem, in
  addition to a set of points and parameter $k$, we receive cluster labels of
  each point generated by either an adversarial or a random perturbation of the
  optimal solution. Our main goal is to formally study the effect of this extra
  information on the complexity of the $k$-means problem. In particular, in the
  context of random perturbations, we give an efficient algorithm that finds a
  clustering of cost within a factor $1+o(1)$ of the optimum even when the label
  of each point is perturbed with a large probability (think 99%). In contrast,
  we show that the side-information with adversarial perturbations is as hard
  as the original problem even if only a small $\epsilon$ fraction of the labels
  are perturbed. We complement this negative result by giving a simple algorithm
  in the case when the adversary is only allowed to perturb an $\epsilon$
  fraction of the labels per \emph{each cluster}.'
layout: inproceedings
series: Proceedings of Machine Learning Research
publisher: PMLR
issn: 2640-3498
id: gamlath22a
month: 0
tex_title: Approximate Cluster Recovery from Noisy Labels
firstpage: 1463
lastpage: 1509
page: 1463-1509
order: 1463
cycles: false
bibtex_author: Gamlath, Buddhima and Lattanzi, Silvio and Norouzi-Fard, Ashkan
  and Svensson, Ola
author:
- given: Buddhima
  family: Gamlath
- given: Silvio
  family: Lattanzi
- given: Ashkan
  family: Norouzi-Fard
- given: Ola
  family: Svensson
date: 2022-06-28
address:
container-title: Proceedings of Thirty Fifth Conference on Learning Theory
volume: '178'
genre: inproceedings
issued:
  date-parts:
  - 2022
  - 6
  - 28
pdf:
extras: []
---