-
Notifications
You must be signed in to change notification settings - Fork 1
/
README.Rmd
66 lines (47 loc) · 2.04 KB
/
README.Rmd
1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26
27
28
29
30
31
32
33
34
35
36
37
38
39
40
41
42
43
44
45
46
47
48
49
50
51
52
53
54
55
56
57
58
59
60
61
62
63
64
65
66
---
output:
github_document:
pandoc_args: --webtex
---
```{r setup, include = F}
knitr::opts_chunk$set(echo = T, fig.path = 'man/figures/README-')
```
# robustsubsets
## Overview
An R implementation of [robust subset selection](https://arxiv.org/abs/2005.08217).
Robust subset selection is a robust adaption of the classic best subset selection estimator, and is defined by the constrained least squares problem:
$$
\min_{\beta, I}\,\frac{1}{2}\sum_{i\in I}(y_i-x_i^T\beta)^2\quad\,\operatorname{s.t.}\|\beta\|_0\leq k,\,I\subseteq\{1,\ldots,n\},\,|I|\geq h
$$
Robust subsets seeks out the best subset of predictors and observations and performs a least squares fit on this subset. The number of predictors used in the fit is controlled by the parameter `k` and the observations by the parameter `h`.
## Installation
You should install Gurobi and the associated R package `gurobi` before installing `robustsubsets`. Gurobi is available for free under academic license at https://www.gurobi.com/.
To install `robustsubsets` from GitHub, run the following code:
``` {r, eval = F}
devtools::install_github('ryan-thompson/robustsubsets')
```
## Usage
The `rss()` function fits a robust subset regression model for a grid of `k` and `h`. The `cv.rss()` function provides a convenient way to automatically cross-validate these parameters.
```{r, example}
library(robustsubsets)
# Generate training data with contaminated predictor matrix
set.seed(0)
n <- 100 # Number of observations
p <- 10 # Number of predictors
p0 <- 5 # Number of relevant predictors
ncontam <- 10 # Number of contaminated observations
beta <- c(rep(1, p0), rep(0, p - p0))
x <- matrix(rnorm(n * p), n, p)
e <- rnorm(n, c(rep(10, ncontam), rep(0, n - ncontam)))
y <- x %*% beta + e
# Fit using robust subset selection
fit <- rss(x, y)
coef(fit, k = p0, h = n - ncontam)
# Cross-validate using robust subset selection
cl <- parallel::makeCluster(2)
fit <- cv.rss(x, y, cluster = cl)
parallel::stopCluster(cl)
coef(fit)
```
## Documentation
See the package [reference manual](robustsubsets_1.1.4.pdf).