From a75ff4278ac220472038a03511886fbc74ce6c99 Mon Sep 17 00:00:00 2001 From: Neil Lawrence Date: Sun, 30 Jun 2024 21:13:34 +0100 Subject: [PATCH] Remove duplicates --- _posts/2024-06-29-addanki24a.md | 60 --------------------- _posts/2024-06-29-aden-ali24a.md | 52 ------------------- _posts/2024-06-29-agrawal24a.md | 35 ------------- _posts/2024-06-29-aliakbarpour24a.md | 75 --------------------------- _posts/2024-06-29-alon24a.md | 52 ------------------- _posts/2024-06-29-amortila24a.md | 55 -------------------- _posts/2024-06-29-anari24a.md | 50 ------------------ _posts/2024-06-29-areces24a.md | 52 ------------------- _posts/2024-06-29-arnal24a.md | 46 ---------------- _posts/2024-06-29-asi24a.md | 57 -------------------- _posts/2024-06-29-asilis24a.md | 60 --------------------- _posts/2024-06-29-assadi24a.md | 45 ---------------- _posts/2024-06-29-attias24a.md | 58 --------------------- _posts/2024-06-29-awasthi24a.md | 53 ------------------- _posts/2024-06-29-banerjee24a.md | 55 -------------------- _posts/2024-06-29-bangachev24a.md | 53 ------------------- _posts/2024-06-29-bateni24a.md | 61 ---------------------- _posts/2024-06-29-blanchard24a.md | 49 ----------------- _posts/2024-06-29-block24a.md | 55 -------------------- _posts/2024-06-29-brandenberger24a.md | 59 --------------------- _posts/2024-06-29-bresler24a.md | 49 ----------------- _posts/2024-06-29-bressan24a.md | 67 ------------------------ _posts/2024-06-29-bressan24b.md | 54 ------------------- _posts/2024-06-29-brown24a.md | 54 ------------------- _posts/2024-06-29-brown24b.md | 59 --------------------- _posts/2024-06-29-buhai24a.md | 56 -------------------- _posts/2024-06-29-carmon24a.md | 45 ---------------- 27 files changed, 1466 deletions(-) delete mode 100644 _posts/2024-06-29-addanki24a.md delete mode 100644 _posts/2024-06-29-aden-ali24a.md delete mode 100644 _posts/2024-06-29-agrawal24a.md delete mode 100644 _posts/2024-06-29-aliakbarpour24a.md delete mode 100644 _posts/2024-06-29-alon24a.md delete mode 100644 _posts/2024-06-29-amortila24a.md delete mode 100644 _posts/2024-06-29-anari24a.md delete mode 100644 _posts/2024-06-29-areces24a.md delete mode 100644 _posts/2024-06-29-arnal24a.md delete mode 100644 _posts/2024-06-29-asi24a.md delete mode 100644 _posts/2024-06-29-asilis24a.md delete mode 100644 _posts/2024-06-29-assadi24a.md delete mode 100644 _posts/2024-06-29-attias24a.md delete mode 100644 _posts/2024-06-29-awasthi24a.md delete mode 100644 _posts/2024-06-29-banerjee24a.md delete mode 100644 _posts/2024-06-29-bangachev24a.md delete mode 100644 _posts/2024-06-29-bateni24a.md delete mode 100644 _posts/2024-06-29-blanchard24a.md delete mode 100644 _posts/2024-06-29-block24a.md delete mode 100644 _posts/2024-06-29-brandenberger24a.md delete mode 100644 _posts/2024-06-29-bresler24a.md delete mode 100644 _posts/2024-06-29-bressan24a.md delete mode 100644 _posts/2024-06-29-bressan24b.md delete mode 100644 _posts/2024-06-29-brown24a.md delete mode 100644 _posts/2024-06-29-brown24b.md delete mode 100644 _posts/2024-06-29-buhai24a.md delete mode 100644 _posts/2024-06-29-carmon24a.md diff --git a/_posts/2024-06-29-addanki24a.md b/_posts/2024-06-29-addanki24a.md deleted file mode 100644 index 5fd612e..0000000 --- a/_posts/2024-06-29-addanki24a.md +++ /dev/null @@ -1,60 +0,0 @@ ---- -title: Limits of Approximating the Median Treatment Effect -section: Original Papers -abstract: 'Average Treatment Effect (ATE) estimation is a well-studied problem in - causal inference. 
However, it does not necessarily capture the heterogeneity in - the data, and several approaches have been proposed to tackle the issue, including - estimating the Quantile Treatment Effects. In the finite population setting containing - $n$ individuals, with treatment and control values denoted by the potential outcome - vectors $\mathbf{a}, \mathbf{b}$, much of the prior work focused on estimating median$(\mathbf{a}) - -$ median$(\mathbf{b})$, as it is easier to estimate than the desired estimand of - median$(\mathbf{a-b})$, called the Median Treatment Effect (MTE). In this work, - we argue that MTE is not estimable and detail a novel notion of approximation that - relies on the sorted order of the values in $\mathbf{a-b}$: we approximate the median - by a value whose quantiles in $\mathbf{a-b}$ are close to $0.5$ (median). Next, - we identify a quantity called \emph{variability} that exactly captures the complexity - of MTE estimation. Using this, we establish that when potential outcomes take values - in the set $\{0,1,\ldots,k-1\}$ the worst-case (over inputs $\mathbf{a,b}$) optimal - (over algorithms) approximation factor of the MTE is $\frac{1}{2}\cdot \frac{2k-3}{2k-1}$. - Further, by drawing connections to the notions of instance-optimality studied in - theoretical computer science, we show that \emph{every} algorithm for estimating - the MTE obtains an approximation error that is no better than the error of an algorithm - that computes variability, on roughly a per input basis: hence, variability leads - to an almost instance optimal approximation algorithm for estimating the MTE. Finally, - we provide a simple linear time algorithm for computing the variability exactly. - Unlike much prior works, a particular highlight of our work is that we make no assumptions - about how the potential outcome vectors are generated or how they are correlated, - except that the potential outcome values are $k$-ary, i.e., take one of $k$ discrete - values $\{0,1,\ldots,k-1\}$.' -layout: inproceedings -series: Proceedings of Machine Learning Research -publisher: PMLR -issn: 2640-3498 -id: addanki24a -month: 0 -tex_title: Limits of Approximating the Median Treatment Effect -firstpage: 1 -lastpage: 21 -page: 1-21 -order: 1 -cycles: false -bibtex_author: Addanki, Raghavendra and Bhandari, Siddharth -author: -- given: Raghavendra - family: Addanki -- given: Siddharth - family: Bhandari -date: 2024-06-29 -address: -container-title: Proceedings of Thirty Seventh Conference on Learning Theory -volume: '247' -genre: inproceedings -issued: - date-parts: - - 2024 - - 6 - - 29 -pdf: https://proceedings.mlr.press/v247/addanki24a/addanki24a.pdf -extras: [] -# Format based on Martin Fenner's citeproc: https://blog.front-matter.io/posts/citeproc-yaml-for-bibliographies/ ---- diff --git a/_posts/2024-06-29-aden-ali24a.md b/_posts/2024-06-29-aden-ali24a.md deleted file mode 100644 index f0f7183..0000000 --- a/_posts/2024-06-29-aden-ali24a.md +++ /dev/null @@ -1,52 +0,0 @@ ---- -title: 'Majority-of-Three: The Simplest Optimal Learner?' -section: Original Papers -abstract: 'Developing an optimal PAC learning algorithm in the realizable setting, - where empirical risk minimization (ERM) is suboptimal, was a major open problem - in learning theory for decades. The problem was finally resolved by Hanneke a few - years ago. Unfortunately, Hanneke’s algorithm is quite complex as it returns the - majority vote of many ERM classifiers that are trained on carefully selected subsets - of the data. 
It is thus a natural goal to determine the simplest algorithm that - is optimal. In this work we study the arguably simplest algorithm that could be - optimal: returning the majority vote of three ERM classifiers. We show that this - algorithm achieves the optimal in-expectation bound on its error which is provably - unattainable by a single ERM classifier. Furthermore, we prove a near-optimal high-probability - bound on this algorithm’s error. We conjecture that a better analysis will prove - that this algorithm is in fact optimal in the high-probability regime.' -layout: inproceedings -series: Proceedings of Machine Learning Research -publisher: PMLR -issn: 2640-3498 -id: aden-ali24a -month: 0 -tex_title: 'Majority-of-Three: The Simplest Optimal Learner?' -firstpage: 22 -lastpage: 45 -page: 22-45 -order: 22 -cycles: false -bibtex_author: Aden-Ali, Ishaq and H\ogsgaard, Mikael M\oller and Larsen, Kasper Green - and Zhivotovskiy, Nikita -author: -- given: Ishaq - family: Aden-Ali -- given: Mikael M\oller - family: H\ogsgaard -- given: Kasper Green - family: Larsen -- given: Nikita - family: Zhivotovskiy -date: 2024-06-29 -address: -container-title: Proceedings of Thirty Seventh Conference on Learning Theory -volume: '247' -genre: inproceedings -issued: - date-parts: - - 2024 - - 6 - - 29 -pdf: https://proceedings.mlr.press/v247/aden-ali24a/aden-ali24a.pdf -extras: [] -# Format based on Martin Fenner's citeproc: https://blog.front-matter.io/posts/citeproc-yaml-for-bibliographies/ ---- diff --git a/_posts/2024-06-29-agrawal24a.md b/_posts/2024-06-29-agrawal24a.md deleted file mode 100644 index e488f48..0000000 --- a/_posts/2024-06-29-agrawal24a.md +++ /dev/null @@ -1,35 +0,0 @@ ---- -title: 'Conference on Learning Theory 2024: Preface' -section: Preface -layout: inproceedings -series: Proceedings of Machine Learning Research -publisher: PMLR -issn: 2640-3498 -id: agrawal24a -month: 0 -tex_title: 'Conference on Learning Theory 2024: Preface' -firstpage: i -lastpage: i -page: i-i -order: 0 -cycles: false -bibtex_author: Agrawal, Shipra and Roth, Aaron -author: -- given: Shipra - family: Agrawal -- given: Aaron - family: Roth -date: 2024-06-29 -address: -container-title: Proceedings of Thirty Seventh Conference on Learning Theory -volume: '247' -genre: inproceedings -issued: - date-parts: - - 2024 - - 6 - - 29 -pdf: https://proceedings.mlr.press/v247/agrawal24a/agrawal24a.pdf -extras: [] -# Format based on Martin Fenner's citeproc: https://blog.front-matter.io/posts/citeproc-yaml-for-bibliographies/ ---- diff --git a/_posts/2024-06-29-aliakbarpour24a.md b/_posts/2024-06-29-aliakbarpour24a.md deleted file mode 100644 index 17ae95b..0000000 --- a/_posts/2024-06-29-aliakbarpour24a.md +++ /dev/null @@ -1,75 +0,0 @@ ---- -title: Metalearning with Very Few Samples Per Task -section: Original Papers -abstract: " Metalearning and multitask learning are two frameworks for solving - a group of related learning tasks more efficiently than we could hope to solve each - of the individual tasks on their own. In multitask learning, we are given a fixed - set of related learning tasks and need to output one accurate model per task, whereas - in metalearning we are given tasks that are drawn i.i.d. from a metadistribution - and need to output some common information that can be easily specialized to new, - previously unseen tasks from the metadistribution. 
In this work, we consider a binary - classification setting where tasks are related by a shared representation, that - is, every task $P$ of interest can be solved by a classifier of the form $f_{P} - \\circ h$ where $h \\in \\mathcal{H}$ is a map from features to some representation - space that is shared across tasks, and $f_{P} \\in \\mathcal{F}$ is a task-specific - classifier from the representation space to labels. The main question we ask in - this work is how much data do we need to metalearn a good representation? Here, - the amount of data is measured in terms of both the number of tasks $t$ that we - need to see and the number of samples $n$ per task. We focus on the regime where - the number of samples per task is extremely small. Our main result shows that, in - a distribution-free setting where the feature vectors are in $\\mathbb{R}^d$, the - representation is a linear map from $\\mathbb{R}^d \\to \\mathbb{R}^k$, and the - task-specific classifiers are halfspaces in $\\mathbb{R}^k$, we can metalearn a - representation with error $\\varepsilon$ using just $n = k+2$ samples per task, - and $d \\cdot (1/\\varepsilon)^{O(k)}$ tasks. Learning with so few samples per - task is remarkable because metalearning would be impossible with $k+1$ samples per - task, and because we cannot even hope to learn an accurate task-specific classifier - with just $k+2$ samples per task. To obtain this result, we develop a sample-and-task-complexity - theory for distribution-free metalearning and multitask learning, which identifies - what properties of $\\mathcal{F}$ and $\\mathcal{H}$ make metalearning possible - with few samples per task. Our theory also yields a simple characterization of - distribution-free multitask learning. Finally, we give sample-efficient reductions - between metalearning and multitask learning, which, when combined with our characterization - of multitask learning, give a characterization of metalearning in certain parameter - regimes." -layout: inproceedings -series: Proceedings of Machine Learning Research -publisher: PMLR -issn: 2640-3498 -id: aliakbarpour24a -month: 0 -tex_title: Metalearning with Very Few Samples Per Task -firstpage: 46 -lastpage: 93 -page: 46-93 -order: 46 -cycles: false -bibtex_author: Aliakbarpour, Maryam and Bairaktari, Konstantina and Brown, Gavin and - Smith, Adam and Srebro, Nathan and Ullman, Jonathan -author: -- given: Maryam - family: Aliakbarpour -- given: Konstantina - family: Bairaktari -- given: Gavin - family: Brown -- given: Adam - family: Smith -- given: Nathan - family: Srebro -- given: Jonathan - family: Ullman -date: 2024-06-29 -address: -container-title: Proceedings of Thirty Seventh Conference on Learning Theory -volume: '247' -genre: inproceedings -issued: - date-parts: - - 2024 - - 6 - - 29 -pdf: https://proceedings.mlr.press/v247/aliakbarpour24a/aliakbarpour24a.pdf -extras: [] -# Format based on Martin Fenner's citeproc: https://blog.front-matter.io/posts/citeproc-yaml-for-bibliographies/ ---- diff --git a/_posts/2024-06-29-alon24a.md b/_posts/2024-06-29-alon24a.md deleted file mode 100644 index 9bcd0c1..0000000 --- a/_posts/2024-06-29-alon24a.md +++ /dev/null @@ -1,52 +0,0 @@ ---- -title: A Unified Characterization of Private Learnability via Graph Theory -section: Original Papers -abstract: 'We provide a unified framework for characterizing pure and approximate - differentially private (DP) learnability. 
The framework uses the language of graph - theory: for a concept class $\mathcal{H}$, we define the contradiction graph $G$ - of $\mathcal{H}$. Its vertices are realizable datasets and two datasets $S,S’$ are - connected by an edge if they contradict each other (i.e., there is a point $x$ that - is labeled differently in $S$ and $S’$). Our main finding is that the combinatorial - structure of $G$ is deeply related to learning $\mathcal{H}$ under DP. Learning - $\mathcal{H}$ under pure DP is captured by the fractional clique number of $G$. - Learning $\mathcal{H}$ under approximate DP is captured by the clique number of - $G$. Consequently, we identify graph-theoretic dimensions that characterize DP learnability: - the \emph{clique dimension} and \emph{fractional clique dimension}. Along the way, - we reveal properties of the contradiction graph which may be of independent interest. - We also suggest several open questions and directions for future research.' -layout: inproceedings -series: Proceedings of Machine Learning Research -publisher: PMLR -issn: 2640-3498 -id: alon24a -month: 0 -tex_title: A Unified Characterization of Private Learnability via Graph Theory -firstpage: 94 -lastpage: 129 -page: 94-129 -order: 94 -cycles: false -bibtex_author: Alon, Noga and Moran, Shay and Schefler, Hilla and Yehudayoff, Amir -author: -- given: Noga - family: Alon -- given: Shay - family: Moran -- given: Hilla - family: Schefler -- given: Amir - family: Yehudayoff -date: 2024-06-29 -address: -container-title: Proceedings of Thirty Seventh Conference on Learning Theory -volume: '247' -genre: inproceedings -issued: - date-parts: - - 2024 - - 6 - - 29 -pdf: https://proceedings.mlr.press/v247/alon24a/alon24a.pdf -extras: [] -# Format based on Martin Fenner's citeproc: https://blog.front-matter.io/posts/citeproc-yaml-for-bibliographies/ ---- diff --git a/_posts/2024-06-29-amortila24a.md b/_posts/2024-06-29-amortila24a.md deleted file mode 100644 index fba565f..0000000 --- a/_posts/2024-06-29-amortila24a.md +++ /dev/null @@ -1,55 +0,0 @@ ---- -title: Mitigating Covariate Shift in Misspecified Regression with Applications to - Reinforcement Learning -section: Original Papers -abstract: A pervasive phenomenon in machine learning applications is \emph{distribution - shift}, where training and deployment conditions for a machine learning model differ. - As distribution shift typically results in a degradation in performance, much attention - has been devoted to algorithmic interventions that mitigate these detrimental effects. - This paper studies the effect of distribution shift in the presence of model misspecification, - specifically focusing on $L_{\infty}$-misspecified regression and \emph{adversarial - covariate shift}, where the regression target remains fixed while the covariate - distribution changes arbitrarily. We show that empirical risk minimization, or standard - least squares regression, can result in undesirable \emph{misspecification amplification} - where the error due to misspecification is amplified by the density ratio between - the training and testing distributions. As our main result, we develop a new algorithm—inspired - by robust optimization techniques—that avoids this undesirable behavior, resulting - in no misspecification amplification while still obtaining optimal statistical rates. 
- As applications, we use this regression procedure to obtain new guarantees in offline - and online reinforcement learning with misspecification and establish new separations - between previously studied structural conditions and notions of coverage. -layout: inproceedings -series: Proceedings of Machine Learning Research -publisher: PMLR -issn: 2640-3498 -id: amortila24a -month: 0 -tex_title: Mitigating Covariate Shift in Misspecified Regression with Applications - to Reinforcement Learning -firstpage: 130 -lastpage: 160 -page: 130-160 -order: 130 -cycles: false -bibtex_author: Amortila, Philip and Cao, Tongyi and Krishnamurthy, Akshay -author: -- given: Philip - family: Amortila -- given: Tongyi - family: Cao -- given: Akshay - family: Krishnamurthy -date: 2024-06-29 -address: -container-title: Proceedings of Thirty Seventh Conference on Learning Theory -volume: '247' -genre: inproceedings -issued: - date-parts: - - 2024 - - 6 - - 29 -pdf: https://proceedings.mlr.press/v247/amortila24a/amortila24a.pdf -extras: [] -# Format based on Martin Fenner's citeproc: https://blog.front-matter.io/posts/citeproc-yaml-for-bibliographies/ ---- diff --git a/_posts/2024-06-29-anari24a.md b/_posts/2024-06-29-anari24a.md deleted file mode 100644 index 50b10fd..0000000 --- a/_posts/2024-06-29-anari24a.md +++ /dev/null @@ -1,50 +0,0 @@ ---- -title: Fast parallel sampling under isoperimetry -section: Original Papers -abstract: We show how to sample in parallel from a distribution $\pi$ over $\mathbb{R}^d$ - that satisfies a log-Sobolev inequality and has a smooth log-density, by parallelizing - the Langevin (resp. underdamped Langevin) algorithms. We show that our algorithm - outputs samples from a distribution $\hat{\pi}$ that is close to $\pi$ in Kullback–Leibler - (KL) divergence (resp. total variation (TV) distance), while using only $\log(d)^{O(1)}$ - parallel rounds and $\widetilde{O}(d)$ (resp. $\widetilde O(\sqrt d)$) gradient - evaluations in total. This constitutes the first parallel sampling algorithms with - TV distance guarantees. For our main application, we show how to combine the TV - distance guarantees of our algorithms with prior works and obtain RNC sampling-to-counting - reductions for families of discrete distribution on the hypercube $\{\pm 1\}^n$ that - are closed under exponential tilts and have bounded covariance. Consequently, we - obtain an RNC sampler for directed Eulerian tours and asymmetric determinantal point - processes, resolving open questions raised in prior works. 
-layout: inproceedings -series: Proceedings of Machine Learning Research -publisher: PMLR -issn: 2640-3498 -id: anari24a -month: 0 -tex_title: Fast parallel sampling under isoperimetry -firstpage: 161 -lastpage: 185 -page: 161-185 -order: 161 -cycles: false -bibtex_author: Anari, Nima and Chewi, Sinho and Vuong, Thuy-Duong -author: -- given: Nima - family: Anari -- given: Sinho - family: Chewi -- given: Thuy-Duong - family: Vuong -date: 2024-06-29 -address: -container-title: Proceedings of Thirty Seventh Conference on Learning Theory -volume: '247' -genre: inproceedings -issued: - date-parts: - - 2024 - - 6 - - 29 -pdf: https://proceedings.mlr.press/v247/anari24a/anari24a.pdf -extras: [] -# Format based on Martin Fenner's citeproc: https://blog.front-matter.io/posts/citeproc-yaml-for-bibliographies/ ---- diff --git a/_posts/2024-06-29-areces24a.md b/_posts/2024-06-29-areces24a.md deleted file mode 100644 index 7450761..0000000 --- a/_posts/2024-06-29-areces24a.md +++ /dev/null @@ -1,52 +0,0 @@ ---- -title: Two fundamental limits for uncertainty quantification in predictive inference -section: Original Papers -abstract: 'We study the statistical hardness of estimating two basic representations - of uncertainty in predictive inference: prediction sets and calibration error. First, - we show that conformal prediction sets cannot approach a desired weighted conformal - coverage level—with respect to a family of binary witness functions with VC dimension - $d$—at a minimax rate faster than $O(d^{1/2}n^{-1/2})$. We also show that the algorithm - in Gibbs et al. (2023) achieves this rate and that extending our class of conformal - sets beyond thresholds of non-conformity scores to include arbitrary convex sets - of non-conformity scores only improves the minimax rate by a constant factor. Then, - under a similar VC dimension constraint on the witness function class, we show it - is not possible to estimate the weighted weak calibration error at a minimax rate - faster than $O(d^{1/4}n^{-1/2})$. We show that the algorithm in Kumar et al. (2019) - achieves this rate in the particular case of estimating the squared weak calibration - error of a predictor that outputs $d$ distinct values.' -layout: inproceedings -series: Proceedings of Machine Learning Research -publisher: PMLR -issn: 2640-3498 -id: areces24a -month: 0 -tex_title: Two fundamental limits for uncertainty quantification in predictive inference -firstpage: 186 -lastpage: 218 -page: 186-218 -order: 186 -cycles: false -bibtex_author: Areces, Felipe and Cheng, Chen and Duchi, John and Rohith, Kuditipudi -author: -- given: Felipe - family: Areces -- given: Chen - family: Cheng -- given: John - family: Duchi -- given: Kuditipudi - family: Rohith -date: 2024-06-29 -address: -container-title: Proceedings of Thirty Seventh Conference on Learning Theory -volume: '247' -genre: inproceedings -issued: - date-parts: - - 2024 - - 6 - - 29 -pdf: https://proceedings.mlr.press/v247/areces24a/areces24a.pdf -extras: [] -# Format based on Martin Fenner's citeproc: https://blog.front-matter.io/posts/citeproc-yaml-for-bibliographies/ ---- diff --git a/_posts/2024-06-29-arnal24a.md b/_posts/2024-06-29-arnal24a.md deleted file mode 100644 index ec3e992..0000000 --- a/_posts/2024-06-29-arnal24a.md +++ /dev/null @@ -1,46 +0,0 @@ ---- -title: Mode Estimation with Partial Feedback -section: Original Papers -abstract: " The combination of lightly supervised pre-training and online fine-tuning - has played a key role in recent AI developments. 
These new learning pipelines call - for new theoretical frameworks. In this paper, we formalize key aspects of weakly - supervised and active learning with a simple problem: the estimation of the mode - of a distribution with partial feedback. We showcase how entropy coding allows for - optimal information acquisition from partial feedback, develop coarse sufficient - statistics for mode identification, and adapt bandit algorithms to our new setting. - Finally, we combine those contributions into a statistically and computationally - efficient solution to our original problem. " -layout: inproceedings -series: Proceedings of Machine Learning Research -publisher: PMLR -issn: 2640-3498 -id: arnal24a -month: 0 -tex_title: Mode Estimation with Partial Feedback -firstpage: 219 -lastpage: 220 -page: 219-220 -order: 219 -cycles: false -bibtex_author: Arnal, Charles and Cabannes, Vivien and Perchet, Vianney -author: -- given: Charles - family: Arnal -- given: Vivien - family: Cabannes -- given: Vianney - family: Perchet -date: 2024-06-29 -address: -container-title: Proceedings of Thirty Seventh Conference on Learning Theory -volume: '247' -genre: inproceedings -issued: - date-parts: - - 2024 - - 6 - - 29 -pdf: https://proceedings.mlr.press/v247/arnal24a/arnal24a.pdf -extras: [] -# Format based on Martin Fenner's citeproc: https://blog.front-matter.io/posts/citeproc-yaml-for-bibliographies/ ---- diff --git a/_posts/2024-06-29-asi24a.md b/_posts/2024-06-29-asi24a.md deleted file mode 100644 index d367514..0000000 --- a/_posts/2024-06-29-asi24a.md +++ /dev/null @@ -1,57 +0,0 @@ ---- -title: Universally Instance-Optimal Mechanisms for Private Statistical Estimation -section: Original Papers -abstract: " We consider the problem of instance-optimal statistical estimation - under the constraint of differential privacy where mechanisms must adapt to the - difficulty of the input dataset. We prove a new instance specific lower bound using - a new divergence and show it characterizes the local minimax optimal rates for private - statistical estimation. We propose two new mechanisms that are universally instance-optimal - for general estimation problems up to logarithmic factors. Our first mechanism, - the total variation mechanism, builds on the exponential mechanism with stable approximations - of the total variation distance, and is universally instance-optimal in the high - privacy regime $\\epsilon \\leq 1/\\sqrt{n}$. Our second mechanism, the T-mechanism, - is based on the T-estimator framework (Birg{é}, 2006) using the clipped log likelihood - ratio as a stable test: it attains instance-optimal rates for any $\\epsilon \\leq - 1$ up to logarithmic factors. Finally, we study the implications of our results - to robust statistical estimation, and show that our algorithms are universally optimal - for this problem, characterizing the optimal minimax rates for robust statistical - estimation. " -layout: inproceedings -series: Proceedings of Machine Learning Research -publisher: PMLR -issn: 2640-3498 -id: asi24a -month: 0 -tex_title: Universally Instance-Optimal Mechanisms for Private Statistical Estimation -firstpage: 221 -lastpage: 259 -page: 221-259 -order: 221 -cycles: false -bibtex_author: Asi, Hilal and Duchi, John C. and Haque, Saminul and Li, Zewei and - Ruan, Feng -author: -- given: Hilal - family: Asi -- given: John C. 
- family: Duchi -- given: Saminul - family: Haque -- given: Zewei - family: Li -- given: Feng - family: Ruan -date: 2024-06-29 -address: -container-title: Proceedings of Thirty Seventh Conference on Learning Theory -volume: '247' -genre: inproceedings -issued: - date-parts: - - 2024 - - 6 - - 29 -pdf: https://proceedings.mlr.press/v247/asi24a/asi24a.pdf -extras: [] -# Format based on Martin Fenner's citeproc: https://blog.front-matter.io/posts/citeproc-yaml-for-bibliographies/ ---- diff --git a/_posts/2024-06-29-asilis24a.md b/_posts/2024-06-29-asilis24a.md deleted file mode 100644 index 247a14a..0000000 --- a/_posts/2024-06-29-asilis24a.md +++ /dev/null @@ -1,60 +0,0 @@ ---- -title: Regularization and Optimal Multiclass Learning -section: Original Papers -abstract: 'The quintessential learning algorithm of empirical risk minimization (ERM) - is known to fail in various settings for which uniform convergence does not characterize - learning. Relatedly, the practice of machine learning is rife with considerably - richer algorithmic techniques, perhaps the most notable of which is regularization. - Nevertheless, no such technique or principle has broken away from the pack to characterize - optimal learning in these more general settings. The purpose of this work is to - precisely characterize the role of regularization in perhaps the simplest setting - for which ERM fails: multiclass learning with arbitrary label sets. Using one-inclusion - graphs (OIGs), we exhibit optimal learning algorithms that dovetail with tried-and-true - algorithmic principles: Occam’s Razor as embodied by structural risk minimization - (SRM), the principle of maximum entropy, and Bayesian inference. We also extract - from OIGs a combinatorial sequence we term the Hall complexity, which is the first - to characterize a problem’s transductive error rate exactly. Lastly, we introduce - a generalization of OIGs and the transductive learning setting to the agnostic case, - where we show that optimal orientations of Hamming graphs – judged using nodes’ - outdegrees minus a system of node-dependent credits – characterize optimal learners - exactly. We demonstrate that an agnostic version of the Hall complexity again characterizes - error rates exactly, and exhibit an optimal learner using maximum entropy programs.' 
-layout: inproceedings -series: Proceedings of Machine Learning Research -publisher: PMLR -issn: 2640-3498 -id: asilis24a -month: 0 -tex_title: Regularization and Optimal Multiclass Learning -firstpage: 260 -lastpage: 310 -page: 260-310 -order: 260 -cycles: false -bibtex_author: Asilis, Julian and Devic, Siddartha and Dughmi, Shaddin and Sharan, - Vatsal and Teng, Shang-Hua -author: -- given: Julian - family: Asilis -- given: Siddartha - family: Devic -- given: Shaddin - family: Dughmi -- given: Vatsal - family: Sharan -- given: Shang-Hua - family: Teng -date: 2024-06-29 -address: -container-title: Proceedings of Thirty Seventh Conference on Learning Theory -volume: '247' -genre: inproceedings -issued: - date-parts: - - 2024 - - 6 - - 29 -pdf: https://proceedings.mlr.press/v247/asilis24a/asilis24a.pdf -extras: [] -# Format based on Martin Fenner's citeproc: https://blog.front-matter.io/posts/citeproc-yaml-for-bibliographies/ ---- diff --git a/_posts/2024-06-29-assadi24a.md b/_posts/2024-06-29-assadi24a.md deleted file mode 100644 index a0eca4d..0000000 --- a/_posts/2024-06-29-assadi24a.md +++ /dev/null @@ -1,45 +0,0 @@ ---- -title: 'The Best Arm Evades: Near-optimal Multi-pass Streaming Lower Bounds for Pure - Exploration in Multi-armed Bandits' -section: Original Papers -abstract: 'We give a near-optimal sample-pass trade-off for pure exploration in multi-armed - bandits (MABs) via multi-pass streaming algorithms: any streaming algorithm with - sublinear memory that uses the optimal sample complexity of $O(n/\Delta^2)$ requires - $\Omega(\log{(1/\Delta)}/\log\log{(1/\Delta)})$ passes. Here, $n$ is the number - of arms and $\Delta$ is the reward gap between the best and the second-best arms. - Our result matches the $O(\log(1/\Delta))$ pass algorithm of Jin et al. [ICML’21] - (up to lower order terms) that only uses $O(1)$ memory and answers an open question - posed by Assadi and Wang [STOC’20].' -layout: inproceedings -series: Proceedings of Machine Learning Research -publisher: PMLR -issn: 2640-3498 -id: assadi24a -month: 0 -tex_title: 'The Best Arm Evades: Near-optimal Multi-pass Streaming Lower Bounds for - Pure Exploration in Multi-armed Bandits' -firstpage: 311 -lastpage: 358 -page: 311-358 -order: 311 -cycles: false -bibtex_author: Assadi, Sepehr and Wang, Chen -author: -- given: Sepehr - family: Assadi -- given: Chen - family: Wang -date: 2024-06-29 -address: -container-title: Proceedings of Thirty Seventh Conference on Learning Theory -volume: '247' -genre: inproceedings -issued: - date-parts: - - 2024 - - 6 - - 29 -pdf: https://proceedings.mlr.press/v247/assadi24a/assadi24a.pdf -extras: [] -# Format based on Martin Fenner's citeproc: https://blog.front-matter.io/posts/citeproc-yaml-for-bibliographies/ ---- diff --git a/_posts/2024-06-29-attias24a.md b/_posts/2024-06-29-attias24a.md deleted file mode 100644 index e8a60ff..0000000 --- a/_posts/2024-06-29-attias24a.md +++ /dev/null @@ -1,58 +0,0 @@ ---- -title: 'Universal Rates for Regression: Separations between Cut-Off and Absolute Loss' -section: Original Papers -abstract: 'In this work we initiate the study of regression in the universal rates - framework of Bousquet et al. Unlike the traditional uniform learning setting, we - are interested in obtaining learning guarantees that hold for all fixed data-generating - distributions, but do not hold uniformly across them. 
We focus on the realizable - setting and we consider two different well-studied loss functions: the cut-off loss - at scale $\gamma > 0$, which asks for predictions that are $\gamma$-close to the - correct one, and the absolute loss, which measures how far away the prediction is - from the correct one. Our results show that the landscape of the achievable rates - in the two cases is completely different. First we give a trichotomic characterization - of the optimal learning rates under the cut-off loss: each class is learnable either - at an exponential rate, a (nearly) linear rate or requires arbitrarily slow rates. - Moving to the absolute loss, we show that the achievable learning rates are significantly - more involved by illustrating that an infinite number of different optimal learning - rates is achievable. This is the first time that such a rich landscape of rates - is obtained in the universal rates literature.' -layout: inproceedings -series: Proceedings of Machine Learning Research -publisher: PMLR -issn: 2640-3498 -id: attias24a -month: 0 -tex_title: 'Universal Rates for Regression: Separations between Cut-Off and Absolute - Loss' -firstpage: 359 -lastpage: 405 -page: 359-405 -order: 359 -cycles: false -bibtex_author: Attias, Idan and Hanneke, Steve and Kalavasis, Alkis and Karbasi, Amin - and Velegkas, Grigoris -author: -- given: Idan - family: Attias -- given: Steve - family: Hanneke -- given: Alkis - family: Kalavasis -- given: Amin - family: Karbasi -- given: Grigoris - family: Velegkas -date: 2024-06-29 -address: -container-title: Proceedings of Thirty Seventh Conference on Learning Theory -volume: '247' -genre: inproceedings -issued: - date-parts: - - 2024 - - 6 - - 29 -pdf: https://proceedings.mlr.press/v247/attias24a/attias24a.pdf -extras: [] -# Format based on Martin Fenner's citeproc: https://blog.front-matter.io/posts/citeproc-yaml-for-bibliographies/ ---- diff --git a/_posts/2024-06-29-awasthi24a.md b/_posts/2024-06-29-awasthi24a.md deleted file mode 100644 index a4bcad4..0000000 --- a/_posts/2024-06-29-awasthi24a.md +++ /dev/null @@ -1,53 +0,0 @@ ---- -title: Learning Neural Networks with Sparse Activations -section: Original Papers -abstract: A core component present in many successful neural network architectures, - is an MLP block of two fully connected layers with a non-linear activation in between. - An intriguing phenomenon observed empirically, including in transformer architectures, - is that, after training, the activations in the hidden layer of this MLP block tend - to be extremely sparse on any given input. Unlike traditional forms of sparsity, - where there are neurons/weights which can be deleted from the network, this form - of {\em dynamic} activation sparsity appears to be harder to exploit to get more - efficient networks. Motivated by this we initiate a formal study of PAC learnability - of MLP layers that exhibit activation sparsity. We present a variety of results - showing that such classes of functions do lead to provable computational and statistical - advantages over their non-sparse counterparts. Our hope is that a better theoretical - understanding of {\em sparsely activated} networks would lead to methods that can - exploit activation sparsity in practice. 
-layout: inproceedings -series: Proceedings of Machine Learning Research -publisher: PMLR -issn: 2640-3498 -id: awasthi24a -month: 0 -tex_title: Learning Neural Networks with Sparse Activations -firstpage: 406 -lastpage: 425 -page: 406-425 -order: 406 -cycles: false -bibtex_author: Awasthi, Pranjal and Dikkala, Nishanth and Kamath, Pritish and Meka, - Raghu -author: -- given: Pranjal - family: Awasthi -- given: Nishanth - family: Dikkala -- given: Pritish - family: Kamath -- given: Raghu - family: Meka -date: 2024-06-29 -address: -container-title: Proceedings of Thirty Seventh Conference on Learning Theory -volume: '247' -genre: inproceedings -issued: - date-parts: - - 2024 - - 6 - - 29 -pdf: https://proceedings.mlr.press/v247/awasthi24a/awasthi24a.pdf -extras: [] -# Format based on Martin Fenner's citeproc: https://blog.front-matter.io/posts/citeproc-yaml-for-bibliographies/ ---- diff --git a/_posts/2024-06-29-banerjee24a.md b/_posts/2024-06-29-banerjee24a.md deleted file mode 100644 index 102afe7..0000000 --- a/_posts/2024-06-29-banerjee24a.md +++ /dev/null @@ -1,55 +0,0 @@ ---- -title: The SMART approach to instance-optimal online learning -section: Original Papers -abstract: 'We devise an online learning algorithm – titled Switching via Monotone - Adapted Regret Traces (SMART) – that adapts to the data and achieves regret that - is instance optimal, i.e., simultaneously competitive on every input sequence compared - to the performance of the follow-the-leader (FTL) policy and the worst case guarantee - of any other input policy. We show that the regret of the SMART policy on any - input sequence is within a multiplicative factor e/(e-1), approximately 1.58, of - the smaller of: 1) the regret obtained by FTL on the sequence, and 2) the upper - bound on regret guaranteed by the given worst-case policy. This implies a strictly - stronger guarantee than typical ‘best-of-both-worlds’ bounds as the guarantee holds - for every input sequence regardless of how it is generated. SMART is simple to implement - as it begins by playing FTL and switches at most once during the time horizon to - the worst-case algorithm. Our approach and results follow from a reduction of instance - optimal online learning to competitive analysis for the ski-rental problem. We - complement our competitive ratio upper bounds with a fundamental lower bound showing - that over all input sequences, no algorithm can get better than a 1.43-fraction - of the minimum regret achieved by FTL and the minimax-optimal policy. We present - a modification of SMART that combines FTL with a “small-loss" algorithm to achieve - instance optimality between the regret of FTL and the small loss regret bound. 
' -layout: inproceedings -series: Proceedings of Machine Learning Research -publisher: PMLR -issn: 2640-3498 -id: banerjee24a -month: 0 -tex_title: The SMART approach to instance-optimal online learning -firstpage: 426 -lastpage: 426 -page: 426-426 -order: 426 -cycles: false -bibtex_author: Banerjee, Siddhartha and Bhatt, Alankrita and Yu, Christina Lee -author: -- given: Siddhartha - family: Banerjee -- given: Alankrita - family: Bhatt -- given: Christina Lee - family: Yu -date: 2024-06-29 -address: -container-title: Proceedings of Thirty Seventh Conference on Learning Theory -volume: '247' -genre: inproceedings -issued: - date-parts: - - 2024 - - 6 - - 29 -pdf: https://proceedings.mlr.press/v247/banerjee24a/banerjee24a.pdf -extras: [] -# Format based on Martin Fenner's citeproc: https://blog.front-matter.io/posts/citeproc-yaml-for-bibliographies/ ---- diff --git a/_posts/2024-06-29-bangachev24a.md b/_posts/2024-06-29-bangachev24a.md deleted file mode 100644 index 80bd04c..0000000 --- a/_posts/2024-06-29-bangachev24a.md +++ /dev/null @@ -1,53 +0,0 @@ ---- -title: 'Detection of $L_∞$ Geometry in Random Geometric Graphs: Suboptimality of Triangles - and Cluster Expansion' -section: Original Papers -abstract: In this paper we study the random geometric graph $\mathsf{RGG}(n,\mathbb{T}^d,\mathsf{Unif},\sigma^q_p,p)$ - with $L_q$ distance where each vertex is sampled uniformly from the $d$-dimensional - torus and where the connection radius is chosen so that the marginal edge probability - is $p$. In addition to results addressing other questions, we make progress on determining - when it is possible to distinguish $\mathsf{RGG}(n,\mathbb{T}^d,\mathsf{Unif},\sigma^q_p,p)$ - from the Erdős-Rényi graph $\ergraph$. Our strongest result is in the setting $q - = \infty$, in which case $\mathsf{RGG}(n,\mathbb{T}^d,\mathsf{Unif},\sigma^q_p,p)$ - is the \textsf{AND} of $d$ 1-dimensional random geometric graphs. We derive a formula - similar to the \emph{cluster-expansion} from statistical physics, capturing the - compatibility of subgraphs from each of the $d$ 1-dimensional copies, and use it - to bound the signed expectations of small subgraphs. We show that counting signed - 4-cycles is optimal among all low-degree tests, succeeding with high probability - if and only if $d = \tilde{o}(np).$ In contrast, the signed triangle test is suboptimal - and only succeeds when $d = \tilde{o}((np)^{3/4}).$ Our result stands in sharp - contrast to the existing literature on random geometric graphs (mostly focused on - $L_2$ geometry) where the signed triangle statistic is optimal. 
-layout: inproceedings -series: Proceedings of Machine Learning Research -publisher: PMLR -issn: 2640-3498 -id: bangachev24a -month: 0 -tex_title: "{Detection of $L_∞$ Geometry in Random Geometric Graphs: Suboptimality - of Triangles and Cluster Expansion}" -firstpage: 427 -lastpage: 497 -page: 427-497 -order: 427 -cycles: false -bibtex_author: Bangachev, Kiril and Bresler, Guy -author: -- given: Kiril - family: Bangachev -- given: Guy - family: Bresler -date: 2024-06-29 -address: -container-title: Proceedings of Thirty Seventh Conference on Learning Theory -volume: '247' -genre: inproceedings -issued: - date-parts: - - 2024 - - 6 - - 29 -pdf: https://proceedings.mlr.press/v247/bangachev24a/bangachev24a.pdf -extras: [] -# Format based on Martin Fenner's citeproc: https://blog.front-matter.io/posts/citeproc-yaml-for-bibliographies/ ---- diff --git a/_posts/2024-06-29-bateni24a.md b/_posts/2024-06-29-bateni24a.md deleted file mode 100644 index 46544f7..0000000 --- a/_posts/2024-06-29-bateni24a.md +++ /dev/null @@ -1,61 +0,0 @@ ---- -title: Metric Clustering and MST with Strong and Weak Distance Oracles -section: Original Papers -abstract: 'We study optimization problems in a metric space $(\mathcal{X},d)$ where - we can compute distances in two ways: via a “strong” oracle that returns exact distances - $d(x,y)$, and a “weak” oracle that returns distances $\tilde{d}(x,y)$ which may - be arbitrarily corrupted with some probability. This model captures the increasingly - common trade-off between employing both an expensive similarity model (e.g. a large-scale - embedding model), and a less accurate but cheaper model. Hence, the goal is to make - as few queries to the strong oracle as possible. We consider both “point queries”, - where the strong oracle is queried on a set of points $S \subset \cX $ and returns - $d(x,y)$ for all $x,y \in S$, and “edge queries” where it is queried for individual - distances $d(x,y)$. Our main contributions are optimal algorithms and lower bounds - for clustering and Minimum Spanning Tree (MST) in this model. For $k$-centers, $k$-median, - and $k$-means, we give constant factor approximation algorithms with only $\tilde{O}(k)$ - strong oracle point queries, and prove that $\Omega(k)$ queries are required for - any bounded approximation. For edge queries, our upper and lower bounds are both - $\tilde{\Theta}(k^2)$. Surprisingly, for the MST problem we give a $O(\sqrt{\log - n})$ approximation algorithm using no strong oracle queries at all, and we prove - a matching $\Omega(\sqrt{\log n})$ lower bound which holds even if $\Tilde{\Omega}(n)$ - strong oracle point queries are allowed. Furthermore, we empirically evaluate our - algorithms, and show that their quality is comparable to that of the baseline algorithms - that are given all true distances, but while querying the strong oracle on only - a small fraction ($<1%$) of points.' 
-layout: inproceedings -series: Proceedings of Machine Learning Research -publisher: PMLR -issn: 2640-3498 -id: bateni24a -month: 0 -tex_title: Metric Clustering and MST with Strong and Weak Distance Oracles -firstpage: 498 -lastpage: 550 -page: 498-550 -order: 498 -cycles: false -bibtex_author: Bateni, MohammadHossein and Dharangutte, Prathamesh and Jayaram, Rajesh - and Wang, Chen -author: -- given: MohammadHossein - family: Bateni -- given: Prathamesh - family: Dharangutte -- given: Rajesh - family: Jayaram -- given: Chen - family: Wang -date: 2024-06-29 -address: -container-title: Proceedings of Thirty Seventh Conference on Learning Theory -volume: '247' -genre: inproceedings -issued: - date-parts: - - 2024 - - 6 - - 29 -pdf: https://proceedings.mlr.press/v247/bateni24a/bateni24a.pdf -extras: [] -# Format based on Martin Fenner's citeproc: https://blog.front-matter.io/posts/citeproc-yaml-for-bibliographies/ ---- diff --git a/_posts/2024-06-29-blanchard24a.md b/_posts/2024-06-29-blanchard24a.md deleted file mode 100644 index e32a5de..0000000 --- a/_posts/2024-06-29-blanchard24a.md +++ /dev/null @@ -1,49 +0,0 @@ ---- -title: Correlated Binomial Process -section: Original Papers -abstract: 'Cohen and Kontorovich (COLT 2023) initiated the study of what we call here - the Binomial Empirical Process: the maximal empirical mean deviation for sequences - of binary random variables (up to rescaling, the empirical mean of each entry of - the random sequence is a binomial hence the naming). They almost fully analyzed - the case where the binomials are independent, which corresponds to all random variable - entries from the sequence being independent. The remaining gap was closed by Blanchard - and Voráček (ALT 2024). In this work, we study the much more general and challenging - case with correlations. In contradistinction to Gaussian processes, whose behavior - is characterized by the covariance structure, we discover that, at least somewhat - surprisingly, for binomial processes covariance does not even characterize convergence. - Although a full characterization remains out of reach, we take the first steps with - nontrivial upper and lower bounds in terms of covering numbers.' 
-layout: inproceedings -series: Proceedings of Machine Learning Research -publisher: PMLR -issn: 2640-3498 -id: blanchard24a -month: 0 -tex_title: Correlated Binomial Process -firstpage: 551 -lastpage: 595 -page: 551-595 -order: 551 -cycles: false -bibtex_author: Blanchard, Mo\"{i}se and Cohen, Doron and Kontorovich, Aryeh -author: -- given: Moïse - family: Blanchard -- given: Doron - family: Cohen -- given: Aryeh - family: Kontorovich -date: 2024-06-29 -address: -container-title: Proceedings of Thirty Seventh Conference on Learning Theory -volume: '247' -genre: inproceedings -issued: - date-parts: - - 2024 - - 6 - - 29 -pdf: https://proceedings.mlr.press/v247/blanchard24a/blanchard24a.pdf -extras: [] -# Format based on Martin Fenner's citeproc: https://blog.front-matter.io/posts/citeproc-yaml-for-bibliographies/ ---- diff --git a/_posts/2024-06-29-block24a.md b/_posts/2024-06-29-block24a.md deleted file mode 100644 index c2b5cf4..0000000 --- a/_posts/2024-06-29-block24a.md +++ /dev/null @@ -1,55 +0,0 @@ ---- -title: On the Performance of Empirical Risk Minimization with Smoothed Data -section: Original Papers -abstract: In order to circumvent statistical and computational hardness results in - sequential decision-making, recent work has considered smoothed online learning, - where the distribution of data at each time is assumed to have bounded likeliehood - ratio with respect to a base measure when conditioned on the history. While previous - works have demonstrated the benefits of smoothness, they have either assumed that - the base measure is known to the learner or have presented computationally inefficient - algorithms applying only in special cases. This work investigates the more general - setting where the base measure is \emph{unknown} to the learner, focusing in particular - on the performance of Empirical Risk Minimization (ERM) with square loss when the - data are well-specified and smooth. We show that in this setting, ERM is able - to achieve sublinear error whenever a class is learnable with iid data; in particular, - ERM achieves error scaling as $\tilde O( \sqrt{\mathrm{comp}(\mathcal F) \cdot T} - )$, where $\mathrm{comp}(\mathcal{F})$ is the statistical complexity of learning - $\mathcal F$ with iid data. In so doing, we prove a novel norm comparison bound - for smoothed data that comprises the first sharp norm comparison for dependent data - applying to arbitrary, nonlinear function classes. We complement these results with - a lower bound indicating that our analysis of ERM is essentially tight, establishing - a separation in the performance of ERM between smoothed and iid data. 
-layout: inproceedings -series: Proceedings of Machine Learning Research -publisher: PMLR -issn: 2640-3498 -id: block24a -month: 0 -tex_title: On the Performance of Empirical Risk Minimization with Smoothed Data -firstpage: 596 -lastpage: 629 -page: 596-629 -order: 596 -cycles: false -bibtex_author: Block, Adam and Rakhlin, Alexander and Shetty, Abhishek -author: -- given: Adam - family: Block -- given: Alexander - family: Rakhlin -- given: Abhishek - family: Shetty -date: 2024-06-29 -address: -container-title: Proceedings of Thirty Seventh Conference on Learning Theory -volume: '247' -genre: inproceedings -issued: - date-parts: - - 2024 - - 6 - - 29 -pdf: https://proceedings.mlr.press/v247/block24a/block24a.pdf -extras: [] -# Format based on Martin Fenner's citeproc: https://blog.front-matter.io/posts/citeproc-yaml-for-bibliographies/ ---- diff --git a/_posts/2024-06-29-brandenberger24a.md b/_posts/2024-06-29-brandenberger24a.md deleted file mode 100644 index 9bacab1..0000000 --- a/_posts/2024-06-29-brandenberger24a.md +++ /dev/null @@ -1,59 +0,0 @@ ---- -title: Errors are Robustly Tamed in Cumulative Knowledge Processes -section: Original Papers -abstract: 'We study processes of societal knowledge accumulation, where the validity - of a new unit of knowledge depends both on the correctness of its derivation and - on the validity of the units it depends on. A fundamental question in this setting - is: If a constant fraction of the new derivations is wrong, can investing a constant - fraction, bounded away from one, of effort ensure that a constant fraction of knowledge - in society is valid? Ben-Eliezer, Mikulincer, Mossel, and Sudan (ITCS 2023) introduced - a concrete probabilistic model to analyze such questions and showed an affirmative - answer to this question. Their study, however, focuses on the simple case where - each new unit depends on just one existing unit, and units attach according to a - {\em preferential attachment rule}. In this work, we consider much more general - families of cumulative knowledge processes, where new units may attach according - to varied attachment mechanisms and depend on multiple existing units. We also allow - a (random) fraction of insertions of adversarial nodes. We give a robust affirmative - answer to the above question by showing that for \textit{all} of these models, as - long as many of the units follow simple heuristics for checking a bounded number - of units they depend on, all errors will be eventually eliminated. Our results indicate - that preserving the quality of large interdependent collections of units of knowledge - is feasible, as long as careful but not too costly checks are performed when new - units are derived/deposited.' 
-layout: inproceedings -series: Proceedings of Machine Learning Research -publisher: PMLR -issn: 2640-3498 -id: brandenberger24a -month: 0 -tex_title: Errors are Robustly Tamed in Cumulative Knowledge Processes -firstpage: 630 -lastpage: 631 -page: 630-631 -order: 630 -cycles: false -bibtex_author: Brandenberger, Anna and Marcussen, Cassandra and Mossel, Elchanan and - Sudan, Madhu -author: -- given: Anna - family: Brandenberger -- given: Cassandra - family: Marcussen -- given: Elchanan - family: Mossel -- given: Madhu - family: Sudan -date: 2024-06-29 -address: -container-title: Proceedings of Thirty Seventh Conference on Learning Theory -volume: '247' -genre: inproceedings -issued: - date-parts: - - 2024 - - 6 - - 29 -pdf: https://proceedings.mlr.press/v247/brandenberger24a/brandenberger24a.pdf -extras: [] -# Format based on Martin Fenner's citeproc: https://blog.front-matter.io/posts/citeproc-yaml-for-bibliographies/ ---- diff --git a/_posts/2024-06-29-bresler24a.md b/_posts/2024-06-29-bresler24a.md deleted file mode 100644 index ea94728..0000000 --- a/_posts/2024-06-29-bresler24a.md +++ /dev/null @@ -1,49 +0,0 @@ ---- -title: Thresholds for Reconstruction of Random Hypergraphs From Graph Projections -section: Original Papers -abstract: 'The graph projection of a hypergraph is a simple graph with the same vertex - set and with an edge between each pair of vertices that appear in a hyperedge. We - consider the problem of reconstructing a random $d$-uniform hypergraph from its - projection. Feasibility of this task depends on $d$ and the density of hyperedges - in the random hypergraph. For $d=3$ we precisely determine the threshold, while - for $d\geq 4$ we give bounds. All of our feasibility results are obtained by exhibiting - an efficient algorithm for reconstructing the original hypergraph, while infeasibility - is information-theoretic. Our results also apply to mildly inhomogeneous random - hypergrahps, including hypergraph stochastic block models. A consequence of our - results is that claims from the 2023 COLT paper gaudio’23 are disproved. ' -layout: inproceedings -series: Proceedings of Machine Learning Research -publisher: PMLR -issn: 2640-3498 -id: bresler24a -month: 0 -tex_title: Thresholds for Reconstruction of Random Hypergraphs From Graph Projections -firstpage: 632 -lastpage: 647 -page: 632-647 -order: 632 -cycles: false -bibtex_author: Bresler, Guy and Guo, Chenghao and Polyanskiy, Yury -author: -- given: Guy - family: Bresler -- given: Chenghao - family: Guo -- given: Yury - family: Polyanskiy -date: 2024-06-29 -address: -container-title: Proceedings of Thirty Seventh Conference on Learning Theory -volume: '247' -genre: inproceedings -issued: - date-parts: - - 2024 - - 6 - - 29 -pdf: https://proceedings.mlr.press/v247/bresler24a/bresler24a.pdf -extras: -- label: Supplementary ZIP - link: https://proceedings.mlr.press/v247/bresler24a/bresler24a-supp.zip -# Format based on Martin Fenner's citeproc: https://blog.front-matter.io/posts/citeproc-yaml-for-bibliographies/ ---- diff --git a/_posts/2024-06-29-bressan24a.md b/_posts/2024-06-29-bressan24a.md deleted file mode 100644 index f0e07b1..0000000 --- a/_posts/2024-06-29-bressan24a.md +++ /dev/null @@ -1,67 +0,0 @@ ---- -title: A Theory of Interpretable Approximations -section: Original Papers -abstract: 'Can a deep neural network be approximated by a small decision tree based - on simple features? 
This question and its variants are behind the growing demand - for machine learning models that are \emph{interpretable} by humans. In this work - we study such questions by introducing \emph{interpretable approximations}, a notion - that captures the idea of approximating a target concept $c$ by a small aggregation - of concepts from some base class $\mathcal{H}$. In particular, we consider the approximation - of a binary concept $c$ by decision trees based on a simple class $\mathcal{H}$ - (e.g., of bounded VC dimension), and use the tree depth as a measure of complexity. - Our primary contribution is the following remarkable trichotomy. For any given pair - of $\mathcal{H}$ and $c$, exactly one of these cases holds: (i) $c$ cannot be approximated - by $\mathcal{H}$ with arbitrary accuracy; (ii) $c$ can be approximated by $\mathcal{H}$ - with arbitrary accuracy, but there exists no universal rate that bounds the complexity - of the approximations as a function of the accuracy; or (iii) there exists a constant - $\kappa$ that depends only on $\mathcal{H}$ and $c$ such that, for \emph{any} data - distribution and \emph{any} desired accuracy level, $c$ can be approximated by $\mathcal{H}$ - with a complexity not exceeding $\kappa$. This taxonomy stands in stark contrast - to the landscape of supervised classification, which offers a complex array of distribution-free - and universally learnable scenarios. We show that, in the case of interpretable - approximations, even a slightly nontrivial a-priori guarantee on the complexity - of approximations implies approximations with constant (distribution-free and accuracy-free) - complexity. We extend our trichotomy to classes $\mathcal{H}$ of unbounded VC dimension - and give characterizations of interpretability based on the algebra generated by - $\mathcal{H}$.' -layout: inproceedings -series: Proceedings of Machine Learning Research -publisher: PMLR -issn: 2640-3498 -id: bressan24a -month: 0 -tex_title: A Theory of Interpretable Approximations -firstpage: 648 -lastpage: 668 -page: 648-668 -order: 648 -cycles: false -bibtex_author: Bressan, Marco and Cesa-Bianchi, Nicol{\`o} and Esposito, Emmanuel - and Mansour, Yishay and Moran, Shay and Thiessen, Maximilian -author: -- given: Marco - family: Bressan -- given: Nicolò - family: Cesa-Bianchi -- given: Emmanuel - family: Esposito -- given: Yishay - family: Mansour -- given: Shay - family: Moran -- given: Maximilian - family: Thiessen -date: 2024-06-29 -address: -container-title: Proceedings of Thirty Seventh Conference on Learning Theory -volume: '247' -genre: inproceedings -issued: - date-parts: - - 2024 - - 6 - - 29 -pdf: https://proceedings.mlr.press/v247/bressan24a/bressan24a.pdf -extras: [] -# Format based on Martin Fenner's citeproc: https://blog.front-matter.io/posts/citeproc-yaml-for-bibliographies/ ---- diff --git a/_posts/2024-06-29-bressan24b.md b/_posts/2024-06-29-bressan24b.md deleted file mode 100644 index d889d1b..0000000 --- a/_posts/2024-06-29-bressan24b.md +++ /dev/null @@ -1,54 +0,0 @@ ---- -title: Efficient Algorithms for Learning Monophonic Halfspaces in Graphs -section: Original Papers -abstract: We study the problem of learning a binary classifier on the vertices of - a graph. In particular, we consider classifiers given by \emph{monophonic halfspaces}, - partitions of the vertices that are convex in a certain abstract sense. 
Monophonic
-  halfspaces, and related notions such as geodesic halfspaces, have recently attracted
-  interest, and several connections have been drawn between their properties (e.g.,
-  their VC dimension) and the structure of the underlying graph $G$. We prove several
-  novel results for learning monophonic halfspaces in the supervised, online, and
-  active settings. Our main result is that a monophonic halfspace can be learned with
-  near-optimal passive sample complexity in time polynomial in $n=|V(G)|$. This requires
-  us to devise a polynomial-time algorithm for consistent hypothesis checking, based
-  on several structural insights on monophonic halfspaces and on a reduction to 2-satisfiability.
-  We prove similar results for the online and active settings. We also show that the
-  concept class can be enumerated with delay $\mathrm{poly}(n)$, and that empirical
-  risk minimization can be performed in time $2^{\omega(G)}\mathrm{poly}(n)$ where
-  $\omega(G)$ is the clique number of $G$. These results answer open questions from
-  the literature (Gonz\'{a}lez et al., 2020), and show a contrast with geodesic
-  halfspaces, for which some of the said problems are NP-hard (Seiffarth et al., 2023).
-layout: inproceedings
-series: Proceedings of Machine Learning Research
-publisher: PMLR
-issn: 2640-3498
-id: bressan24b
-month: 0
-tex_title: Efficient Algorithms for Learning Monophonic Halfspaces in Graphs
-firstpage: 669
-lastpage: 696
-page: 669-696
-order: 669
-cycles: false
-bibtex_author: Bressan, Marco and Esposito, Emmanuel and Thiessen, Maximilian
-author:
-- given: Marco
-  family: Bressan
-- given: Emmanuel
-  family: Esposito
-- given: Maximilian
-  family: Thiessen
-date: 2024-06-29
-address:
-container-title: Proceedings of Thirty Seventh Conference on Learning Theory
-volume: '247'
-genre: inproceedings
-issued:
-  date-parts:
-  - 2024
-  - 6
-  - 29
-pdf: https://proceedings.mlr.press/v247/bressan24b/bressan24b.pdf
-extras: []
-# Format based on Martin Fenner's citeproc: https://blog.front-matter.io/posts/citeproc-yaml-for-bibliographies/
----
diff --git a/_posts/2024-06-29-brown24a.md b/_posts/2024-06-29-brown24a.md
deleted file mode 100644
index 381628f..0000000
--- a/_posts/2024-06-29-brown24a.md
+++ /dev/null
@@ -1,54 +0,0 @@
----
-title: Online Stackelberg Optimization via Nonlinear Control
-section: Original Papers
-abstract: In repeated interaction problems with adaptive agents, our objective often
-  requires anticipating and optimizing over the space of possible agent responses.
-  We show that many problems of this form can be cast as instances of online (nonlinear)
-  control which satisfy \textit{local controllability}, with convex losses over a
-  bounded state space which encodes agent behavior, and we introduce a unified algorithmic
-  framework for tractable regret minimization in such cases. When the instance dynamics
-  are known but otherwise arbitrary, we obtain oracle-efficient $O(\sqrt{T})$ regret
-  by reduction to online convex optimization, which can be made computationally efficient
-  if dynamics are locally \textit{action-linear}. In the presence of adversarial disturbances
-  to the state, we give tight bounds in terms of either the cumulative or per-round
-  disturbance magnitude (for \textit{strongly} or \textit{weakly} locally controllable
-  dynamics, respectively). Additionally, we give sublinear regret results for the
-  cases of unknown locally action-linear dynamics as well as for the bandit feedback
-  setting. 
Finally, we demonstrate applications of our framework to well-studied problems - including performative prediction, recommendations for adaptive agents, adaptive - pricing of real-valued goods, and repeated gameplay against no-regret learners, - directly yielding extensions beyond prior results in each case. -layout: inproceedings -series: Proceedings of Machine Learning Research -publisher: PMLR -issn: 2640-3498 -id: brown24a -month: 0 -tex_title: Online Stackelberg Optimization via Nonlinear Control -firstpage: 697 -lastpage: 749 -page: 697-749 -order: 697 -cycles: false -bibtex_author: Brown, William and Papadimitriou, Christos and Roughgarden, Tim -author: -- given: William - family: Brown -- given: Christos - family: Papadimitriou -- given: Tim - family: Roughgarden -date: 2024-06-29 -address: -container-title: Proceedings of Thirty Seventh Conference on Learning Theory -volume: '247' -genre: inproceedings -issued: - date-parts: - - 2024 - - 6 - - 29 -pdf: https://proceedings.mlr.press/v247/brown24a/brown24a.pdf -extras: [] -# Format based on Martin Fenner's citeproc: https://blog.front-matter.io/posts/citeproc-yaml-for-bibliographies/ ---- diff --git a/_posts/2024-06-29-brown24b.md b/_posts/2024-06-29-brown24b.md deleted file mode 100644 index a2811d1..0000000 --- a/_posts/2024-06-29-brown24b.md +++ /dev/null @@ -1,59 +0,0 @@ ---- -title: 'Insufficient Statistics Perturbation: Stable Estimators for Private Least - Squares Extended Abstract' -section: Original Papers -abstract: We present a sample- and time-efficient differentially private algorithm - for ordinary least squares, with error that depends linearly on the dimension and - is independent of the condition number of $X^\top X$, where $X$ is the design matrix. - All prior private algorithms for this task require either $d^{3/2}$ examples, error - growing polynomially with the condition number, or exponential time. Our near-optimal - accuracy guarantee holds for any dataset with bounded statistical leverage and bounded - residuals. Technically, we build on the approach of Brown et al. (2023) for private - mean estimation, adding scaled noise to a carefully designed stable nonprivate estimator - of the empirical regression vector. 
-layout: inproceedings -series: Proceedings of Machine Learning Research -publisher: PMLR -issn: 2640-3498 -id: brown24b -month: 0 -tex_title: 'Insufficient Statistics Perturbation: Stable Estimators for Private Least - Squares Extended Abstract' -firstpage: 750 -lastpage: 751 -page: 750-751 -order: 750 -cycles: false -bibtex_author: Brown, Gavin and Hayase, Jonathan and Hopkins, Samuel and Kong, Weihao - and Liu, Xiyang and Oh, Sewoong and Perdomo, Juan C and Smith, Adam -author: -- given: Gavin - family: Brown -- given: Jonathan - family: Hayase -- given: Samuel - family: Hopkins -- given: Weihao - family: Kong -- given: Xiyang - family: Liu -- given: Sewoong - family: Oh -- given: Juan C - family: Perdomo -- given: Adam - family: Smith -date: 2024-06-29 -address: -container-title: Proceedings of Thirty Seventh Conference on Learning Theory -volume: '247' -genre: inproceedings -issued: - date-parts: - - 2024 - - 6 - - 29 -pdf: https://proceedings.mlr.press/v247/brown24b/brown24b.pdf -extras: [] -# Format based on Martin Fenner's citeproc: https://blog.front-matter.io/posts/citeproc-yaml-for-bibliographies/ ---- diff --git a/_posts/2024-06-29-buhai24a.md b/_posts/2024-06-29-buhai24a.md deleted file mode 100644 index bb4a406..0000000 --- a/_posts/2024-06-29-buhai24a.md +++ /dev/null @@ -1,56 +0,0 @@ ---- -title: Computational-Statistical Gaps for Improper Learning in Sparse Linear Regression -section: Original Papers -abstract: We study computational-statistical gaps for improper learning in sparse - linear regression. More specifically, given $n$ samples from a $k$-sparse linear - model in dimension $d$, we ask what is the minimum sample complexity to efficiently - (in time polynomial in $d$, $k$, and $n$) find a potentially dense estimate for - the regression vector that achieves non-trivial prediction error on the $n$ samples. - Information-theoretically this can be achieved using $\Theta(k \log (d/k))$ samples. - Yet, despite its prominence in the literature, there is no polynomial-time algorithm - known to achieve the same guarantees using less than $\Theta(d)$ samples without - additional restrictions on the model. Similarly, existing hardness results are either - restricted to the proper setting, in which the estimate must be sparse as well, - or only apply to specific algorithms. We give evidence that efficient algorithms - for this task require at least (roughly) $\Omega(k^2)$ samples. In particular, we - show that an improper learning algorithm for sparse linear regression can be used - to solve sparse PCA problems (with a negative spike) in their Wishart form, in regimes - in which efficient algorithms are widely believed to require at least $\Omega(k^2)$ - samples. We complement our reduction with low-degree and statistical query lower - bounds for the sparse PCA problems from which we reduce. Our hardness results apply - to the (correlated) random design setting in which the covariates are drawn i.i.d. - from a mean-zero Gaussian distribution with unknown covariance. 
-layout: inproceedings -series: Proceedings of Machine Learning Research -publisher: PMLR -issn: 2640-3498 -id: buhai24a -month: 0 -tex_title: Computational-Statistical Gaps for Improper Learning in Sparse Linear Regression -firstpage: 752 -lastpage: 771 -page: 752-771 -order: 752 -cycles: false -bibtex_author: Buhai, Rares-Darius and Ding, Jingqiu and Tiegel, Stefan -author: -- given: Rares-Darius - family: Buhai -- given: Jingqiu - family: Ding -- given: Stefan - family: Tiegel -date: 2024-06-29 -address: -container-title: Proceedings of Thirty Seventh Conference on Learning Theory -volume: '247' -genre: inproceedings -issued: - date-parts: - - 2024 - - 6 - - 29 -pdf: https://proceedings.mlr.press/v247/buhai24a/buhai24a.pdf -extras: [] -# Format based on Martin Fenner's citeproc: https://blog.front-matter.io/posts/citeproc-yaml-for-bibliographies/ ---- diff --git a/_posts/2024-06-29-carmon24a.md b/_posts/2024-06-29-carmon24a.md deleted file mode 100644 index bce5e5e..0000000 --- a/_posts/2024-06-29-carmon24a.md +++ /dev/null @@ -1,45 +0,0 @@ ---- -title: The Price of Adaptivity in Stochastic Convex Optimization -section: Original Papers -abstract: We prove impossibility results for adaptivity in non-smooth stochastic convex - optimization. Given a set of problem parameters we wish to adapt to, we define a - “price of adaptivity” (PoA) that, roughly speaking, measures the multiplicative - increase in suboptimality due to uncertainty in these parameters. When the initial - distance to the optimum is unknown but a gradient norm bound is known, we show that - the PoA is at least logarithmic for expected suboptimality, and double-logarithmic - for median suboptimality. When there is uncertainty in both distance and gradient - norm, we show that the PoA must be polynomial in the level of uncertainty. Our lower - bounds nearly match existing upper bounds, and establish that there is no parameter-free - lunch. -layout: inproceedings -series: Proceedings of Machine Learning Research -publisher: PMLR -issn: 2640-3498 -id: carmon24a -month: 0 -tex_title: The Price of Adaptivity in Stochastic Convex Optimization -firstpage: 772 -lastpage: 774 -page: 772-774 -order: 772 -cycles: false -bibtex_author: Carmon, Yair and Hinder, Oliver -author: -- given: Yair - family: Carmon -- given: Oliver - family: Hinder -date: 2024-06-29 -address: -container-title: Proceedings of Thirty Seventh Conference on Learning Theory -volume: '247' -genre: inproceedings -issued: - date-parts: - - 2024 - - 6 - - 29 -pdf: https://proceedings.mlr.press/v247/carmon24a/carmon24a.pdf -extras: [] -# Format based on Martin Fenner's citeproc: https://blog.front-matter.io/posts/citeproc-yaml-for-bibliographies/ ----