MTN update readme #40

Open · wants to merge 51 commits into base: main

Changes from all commits (51 commits)
8f70bb1
ENH plot_quadratic and update config file
MatDag Feb 19, 2024
61806e0
ENH add quadratics to readme
MatDag Feb 19, 2024
0ecc679
FIX latex readme
MatDag Feb 19, 2024
2307ccc
FIX latex readme
MatDag Feb 19, 2024
391f6cb
Merge branch 'main' into update_readme
MatDag Oct 11, 2024
f394a6d
FIX typo
MatDag Oct 11, 2024
8da9ac5
FIX typo
MatDag Oct 11, 2024
047c83e
FIX double backslash
MatDag Oct 11, 2024
0d8b556
FIX typo
MatDag Oct 11, 2024
17e2267
ENH doc eigenvalues of matrices
MatDag Oct 11, 2024
89f0966
FIX typo
MatDag Oct 11, 2024
479e1d5
ENH value function evaluation not that expensive
MatDag Oct 11, 2024
76ac1fe
FIX double backquotes
MatDag Oct 11, 2024
8dc98c2
WIP doc
MatDag Oct 14, 2024
df74427
WIP complete doc how to create a solver
MatDag Oct 14, 2024
dd3380c
ENH add comments amigo
MatDag Oct 14, 2024
beb83f1
ENH readme
MatDag Oct 14, 2024
c77ba62
ENH docstring StochasticJaxSolver
MatDag Oct 14, 2024
05dfa21
ENH comment amigo
MatDag Oct 14, 2024
38f2197
FIX flake8
MatDag Oct 14, 2024
d4b1b35
FIX review suggestions README.rst
MatDag Oct 16, 2024
d093b8b
CLN create template_stochastic_solver and move explanation from AmIGO
MatDag Oct 16, 2024
e0d785c
ENH add template_solver.py
MatDag Oct 17, 2024
f333a7c
ENH add template_dataset.py
MatDag Oct 17, 2024
c31f5cc
ENH ref to benchopt template
MatDag Oct 17, 2024
931e095
Update README.rst
tomMoral Oct 18, 2024
143ca61
ENH apply suggestion readme
MatDag Oct 18, 2024
481880f
ENH replace rst by md
MatDag Oct 18, 2024
ae2ce5d
FIX brackets
MatDag Oct 18, 2024
b0cb39a
FIX brackets
MatDag Oct 18, 2024
928b70a
FIX brackets
MatDag Oct 18, 2024
a29f393
FIX brackets
MatDag Oct 18, 2024
1038359
FIX brackets
MatDag Oct 18, 2024
50ca4aa
Update README.md
MatDag Oct 18, 2024
cd580c1
Update README.md
MatDag Oct 18, 2024
5f68c11
CLN remove tilde
MatDag Oct 18, 2024
0d1ab03
FIX ref
MatDag Oct 18, 2024
7d9508e
CLN remove useless params
MatDag Oct 18, 2024
59001b3
WIP
MatDag Oct 18, 2024
2fac0da
ENH simplify template_dataset
MatDag Oct 22, 2024
3d7bded
FIX typo
MatDag Oct 22, 2024
10b3bc0
FIX batched_quadratics disappeared in simulated.py...
MatDag Oct 22, 2024
5b80320
CLN remove plot_quadratics.py
MatDag Oct 22, 2024
302d8af
ENH rm generate_matrices
MatDag Oct 23, 2024
a7769ec
CLN docstring
MatDag Oct 23, 2024
7a881a8
FIX flake8
MatDag Oct 23, 2024
9b4c556
ENH callback info template dataset
MatDag Oct 24, 2024
510c812
ENH lr_scheduler template_stochastic_solver
MatDag Oct 24, 2024
e918f5a
ENH add comments oracles
MatDag Nov 21, 2024
f65755e
ENH docstring init
MatDag Nov 21, 2024
24c08fa
ENH docstring get_step
MatDag Nov 21, 2024
150 changes: 150 additions & 0 deletions README.md
@@ -0,0 +1,150 @@
Bilevel Optimization Benchmark
===============================
[![test](https://github.com/benchopt/benchmark_bilevel/workflows/Tests/badge.svg)](https://github.com/benchopt/benchmark_bilevel/actions)
[![python](https://img.shields.io/badge/python-3.6%2B-blue)](https://www.python.org/downloads/release/python-360/)

*Results can be consulted at https://benchopt.github.io/results/benchmark_bilevel.html*

BenchOpt is a package to simplify and make more transparent and
reproducible the comparison of optimization algorithms.
This benchmark is dedicated to solvers for bilevel optimization:

$$\min_{x} f(x, z^* (x)) \quad \text{with} \quad z^*(x) = \arg\min_z g(x, z),$$

where $g$ and $f$ are two functions of two variables.

Different problems
------------------

This benchmark implements three bilevel optimization problems: a simulated quadratic problem, regularization selection, and data cleaning.

### 1 - Simulated quadratic bilevel problem


In this problem, the inner and outer functions are quadratic functions defined on $\mathbb{R}^d\times\mathbb{R}^p$

$$g(x, z) = \frac{1}{n}\sum_{i=1}^n \left[\frac{1}{2} z^\top A_i z + \frac{1}{2} x^\top B_i x + x^\top C_i z + a_i^\top z + b_i^\top x\right]$$

and

$$f(x, z) = \frac{1}{m} \sum_{j=1}^m \left[\frac{1}{2} z^\top F_j z + \frac{1}{2} x^\top H_j x + x^\top K_j z + f_j^\top z + h_j^\top x\right]$$

where $A_i, F_j$ are symmetric positive definite matrices of size $p\times p$, $B_i, H_j$ are symmetric positive definite matrices of size $d\times d$, $C_i, K_j$ are matrices of size $d\times p$, $a_i, f_j$ are vectors of size $p$, and $b_i, h_j$ are vectors of size $d$.

The matrices $A_i, B_i, F_j, H_j$ are randomly generated such that the eigenvalues of $\frac1n\sum_i A_i$ lie between ``mu_inner`` and ``L_inner_inner``, those of $\frac1n\sum_i B_i$ between ``mu_inner`` and ``L_inner_outer``, those of $\frac1m\sum_j F_j$ between ``mu_inner`` and ``L_outer_inner``, and those of $\frac1m\sum_j H_j$ between ``mu_inner`` and ``L_outer_outer``.

The matrices $C_i, K_j$ are randomly generated such that the spectral norm of $\frac1n\sum_i C_i$ is at most ``L_cross_inner``, and the spectral norm of $\frac1m\sum_j K_j$ is at most ``L_cross_outer``.

Note that in this setting, solving the inner problem amounts to solving a linear system.
Since the full-batch inner and outer functions can be computed efficiently from the average matrices, the value function is evaluated in closed form.
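
For illustration, here is a minimal NumPy sketch of this closed-form evaluation; the dimensions and the averaged matrices are made up for the example and do not come from the benchmark's code.

```python
# Minimal sketch of the closed-form value function evaluation described
# above; dimensions and matrices are illustrative, not the benchmark's.
import numpy as np

rng = np.random.default_rng(0)
d, p = 10, 100  # dim_outer, dim_inner

def random_spd(k):
    """Random symmetric positive definite matrix of size k x k."""
    M = rng.standard_normal((k, k))
    return M @ M.T / k + 0.1 * np.eye(k)

A, F = random_spd(p), random_spd(p)   # averages of the A_i and F_j
H = random_spd(d)                     # average of the H_j
C = rng.standard_normal((d, p))       # average of the C_i
K = rng.standard_normal((d, p))       # average of the K_j
a, f_vec = rng.standard_normal(p), rng.standard_normal(p)
h_vec = rng.standard_normal(d)

x = rng.standard_normal(d)

# The inner problem is solved by the linear system
# grad_z g(x, z) = A z + C^T x + a = 0.
z_star = np.linalg.solve(A, -(C.T @ x + a))

# Value function h(x) = f(x, z*(x)), computed with the average matrices.
value = (0.5 * z_star @ F @ z_star + 0.5 * x @ H @ x
         + x @ K @ z_star + f_vec @ z_star + h_vec @ x)
```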


### 2 - Regularization selection

In this problem, the inner function $g$ is defined by


$$g(x, z) = \frac{1}{n} \sum_{i=1}^{n} \ell(d_i; z) + \mathcal{R}(x, z)$$

where $d_1, \dots, d_n$ are training data samples, $z$ are the parameters of the machine learning model, and the loss function $\ell$ measures how well the model parameters $z$ predict the data $d_i$.
A regularization term $\mathcal{R}$, parametrized by the regularization strengths $x$, promotes a certain structure in the parameters $z$.

The outer function $f$ is defined as the unregularized loss on unseen data

$$f(x, z) = \frac{1}{m} \sum_{j=1}^{m} \ell(d'_j; z)$$

where $d'_1, \dots, d'_m$ are new samples from the same dataset as above.

There are currently two datasets for this regularization selection problem.

#### Covtype - [*Homepage*](https://archive.ics.uci.edu/dataset/31/covertype)

This is a logistic regression problem, where the data have the form $d_i = (a_i, y_i)$ with $a_i\in\mathbb{R}^p$ the features and $y_i=\pm1$ the binary target.
For this problem, the loss is $\ell(d_i; z) = \log(1+\exp(-y_i a_i^\top z))$, and the regularization is simply given by
$$\mathcal{R}(x, z) = \frac12\sum_{j=1}^p\exp(x_j)z_j^2,$$
i.e., each coefficient in $z$ is independently regularized with strength $\exp(x_j)$.
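
For concreteness, a small NumPy sketch of this inner objective follows; the function and argument names are illustrative, not the benchmark's API.

```python
# Illustrative sketch of the covtype inner objective g(x, z); names and
# signature are hypothetical, not the benchmark's API.
import numpy as np

def inner_objective(x, z, features, labels):
    """Logistic loss plus per-coefficient penalty exp(x_j) z_j^2 / 2."""
    margins = labels * (features @ z)           # y_i a_i^T z
    loss = np.mean(np.logaddexp(0., -margins))  # stable log(1 + exp(-m))
    reg = 0.5 * np.sum(np.exp(x) * z ** 2)      # R(x, z)
    return loss + reg
```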

#### Ijcnn1 - [*Homepage*](https://www.openml.org/search?type=data&sort=runs&id=1575&status=active)

This is a multiclass logistic regression problem, where the data have the form $d_i = (a_i, y_i)$ with $a_i\in\mathbb{R}^p$ the features and $y_i\in \{1,\dots, k\}$ the integer target, where $k$ is the number of classes.
For this problem, the loss is $\ell(d_i; z) = \text{CrossEntropy}(za_i, y_i)$, where $z$ is now a $k\times p$ matrix. The regularization is given by
$$\mathcal{R}(x, z) = \frac12\sum_{j=1}^k\exp(x_j)\|z_j\|^2,$$
i.e., each row of $z$ is independently regularized with strength $\exp(x_j)$.
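
A similar sketch for the multiclass case (hypothetical names; labels are shifted to $\{0,\dots,k-1\}$ for array indexing):

```python
# Illustrative sketch of the multiclass inner objective; names are
# hypothetical. z has shape (k, p), labels take values in {0, ..., k-1}.
import numpy as np
from scipy.special import logsumexp

def inner_objective_multiclass(x, z, features, labels):
    """Cross-entropy loss with one regularization strength per row of z."""
    logits = features @ z.T                       # shape (n, k): rows z a_i
    log_probs = logits - logsumexp(logits, axis=1, keepdims=True)
    loss = -np.mean(log_probs[np.arange(len(labels)), labels])
    reg = 0.5 * np.sum(np.exp(x) * np.sum(z ** 2, axis=1))  # per-row penalty
    return loss + reg
```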


### 3 - Data cleaning

This problem was first introduced by [Franceschi et al. (2017)](https://arxiv.org/abs/1703.01785).
In this problem, the data is the MNIST dataset.
The training set has been corrupted: with probability $p$, the label of an image $y\in\{1,\dots,10\}$ is replaced by another random label between 1 and 10.
We do not know beforehand which samples have been corrupted.
We have a clean test set, which has not been corrupted.
The goal is to fit a model on the corrupted training data that performs well on the test set.
To do so, a set of weights -- one per training sample -- is learned jointly with the model parameters.
Ideally, we would want a weight of 0 for corrupted samples and a weight of 1 for uncorrupted ones.
The problem is cast as a bilevel problem with $g$ given by

$$g(x, z) =\frac1n \sum_{i=1}^n \sigma(x_i)\ell(d_i, z) + \frac C 2 \|z\|^2$$

where the $d_i$ are the corrupted training data, $\ell$ is the loss of a CNN parameterized by $z$, $\sigma$ is the sigmoid function, and $C$ is a small regularization constant.
Here the outer variable $x$ is a vector of dimension $n$, and the weight of data $i$ is given by $\sigma(x_i)$.
The outer function is

$$f(x, z) =\frac1m \sum_{j=1}^m \ell(d'_j, z)$$

where the $d'_j$ are the uncorrupted test data.
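
As a rough illustration, the inner objective can be sketched as follows; the per-sample losses stand in for the CNN and all names are hypothetical.

```python
# Illustrative sketch of the data-cleaning inner objective; the per-sample
# losses stand in for a CNN and all names here are hypothetical.
import numpy as np

def sigmoid(t):
    return 1.0 / (1.0 + np.exp(-t))

def inner_objective(x, z, per_sample_losses, C=1e-3):
    """g(x, z) = (1/n) sum_i sigma(x_i) * ell(d_i, z) + (C/2) ||z||^2.

    per_sample_losses: array of shape (n,) holding ell(d_i, z) for the
    current model parameters z.
    """
    weighted_loss = np.mean(sigmoid(x) * per_sample_losses)
    return weighted_loss + 0.5 * C * np.sum(z ** 2)
```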

Install
--------

This benchmark can be run using the following commands:

```bash
$ pip install -U benchopt
$ git clone https://github.com/benchopt/benchmark_bilevel
$ benchopt run benchmark_bilevel
```

Options can be passed to ``benchopt run`` to restrict the benchmark to some solvers or datasets, e.g.:

```bash
$ benchopt run benchmark_bilevel -s solver1 -d dataset2 --max-runs 10 --n-repetitions 10
```

You can also use config files to set up the benchmark run:

```bash
$ benchopt run benchmark_bilevel --config config/X.yml
```

where ``X.yml`` is a config file; see https://benchopt.github.io/index.html#run-a-benchmark for an example. Such a config file typically launches a large grid search over solver parameters. When available, you can instead use the file ``X_best_params.yml`` to run each solver with a single, tuned set of parameters.
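
As a rough illustration, a config file has the following shape; the entries below are shortened from ``config/quadratics_021424_best_params.yml`` and are not a complete configuration.

```yaml
# Shortened sketch of a run config; see the config/ folder for full files.
dataset:
  - quadratic[dim_inner=100,dim_outer=10]
solver:
  - AmIGO[batch_size=64,n_inner_steps=10,step_size=0.01]
  - SABA[batch_size=64,step_size=0.1]
```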

Use ``benchopt run -h`` for more details about these options, or visit https://benchopt.github.io/api.html.

### How to contribute to the benchmark?

If you want to add a solver or a new problem, you are welcome to open an issue or submit a pull request!

#### 1 - How to add a new solver?

Each solver derives from the [`benchopt.BaseSolver` class](https://benchopt.github.io/user_guide/generated/benchopt.BaseSolver.html) and lives in the [solvers](solvers) folder. The solvers are split between the stochastic JAX solvers and the others:
* Stochastic JAX solvers: these inherit from the [`StochasticJaxSolver` class](benchmark_utils/stochastic_jax_solver.py); see the detailed explanations in the [template stochastic solver](solvers/template_stochastic_solver.py).
* Other solvers: see the detailed explanation in the [Benchopt documentation](https://benchopt.github.io/tutorials/add_solver.html). An example is provided in the [template solver](solvers/template_solver.py), and a rough skeleton is sketched after this list.
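
The skeleton below is a minimal sketch modeled on the structure of the templates; the ``set_objective`` arguments and the result keys are assumptions, so check [template solver](solvers/template_solver.py) for the exact interface.

```python
# Rough skeleton of a non-stochastic solver; argument names and result
# keys are assumptions -- see solvers/template_solver.py for the real API.
from benchopt import BaseSolver


class Solver(BaseSolver):
    name = "My-Solver"

    # Each combination of parameters is benchmarked separately.
    parameters = {"step_size": [0.01, 0.1]}

    def set_objective(self, f_inner, f_outer, n_inner_samples,
                      n_outer_samples, inner_var0, outer_var0):
        # Store the pieces of the bilevel problem for use in run().
        self.f_inner, self.f_outer = f_inner, f_outer
        self.inner_var0, self.outer_var0 = inner_var0, outer_var0

    def run(self, n_iter):
        # With an iteration-based stopping criterion, benchopt calls run()
        # with an increasing number of iterations.
        inner_var = self.inner_var0.copy()
        outer_var = self.outer_var0.copy()
        for _ in range(n_iter):
            pass  # update inner_var and outer_var with your algorithm
        self.inner_var, self.outer_var = inner_var, outer_var

    def get_result(self):
        # Return the variables the objective needs to compute metrics.
        return dict(inner_var=self.inner_var, outer_var=self.outer_var)
```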

#### 2 - How to add a new problem?

In this benchmark, each problem is defined by a [Dataset class](https://benchopt.github.io/user_guide/generated/benchopt.BaseDataset.html) in the [datasets](datasets) folder. A [template](datasets/template_dataset.py) is provided, and a minimal skeleton is sketched below.
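
The following is a rough sketch of a new dataset; the keys returned by ``get_data`` are assumptions, so check the [template](datasets/template_dataset.py) for the exact interface expected by the benchmark's objective.

```python
# Rough skeleton of a new dataset; the get_data keys are assumptions --
# see datasets/template_dataset.py for the exact interface.
from benchopt import BaseDataset


class Dataset(BaseDataset):
    name = "my-dataset"

    # Each combination of parameters defines one dataset instance.
    parameters = {"n_samples": [1000], "random_state": [42]}

    def get_data(self):
        # Build the inner (train) and outer (validation) objectives and
        # return everything the benchmark's objective needs.
        return dict(
            f_inner=...,   # callable: inner objective g
            f_outer=...,   # callable: outer objective f
            n_inner_samples=self.n_samples,
            n_outer_samples=self.n_samples,
        )
```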

Cite
----

If you use this benchmark in your research project, please cite the following paper:

```bibtex
@inproceedings{dagreou2022,
  title = {A Framework for Bilevel Optimization That Enables Stochastic and Global Variance Reduction Algorithms},
  booktitle = {Advances in {{Neural Information Processing Systems}} ({{NeurIPS}})},
  author = {Dagr{\'e}ou, Mathieu and Ablin, Pierre and Vaiter, Samuel and Moreau, Thomas},
  year = {2022}
}
```
132 changes: 0 additions & 132 deletions README.rst

This file was deleted.

26 changes: 22 additions & 4 deletions benchmark_utils/stochastic_jax_solver.py
@@ -98,10 +98,28 @@ def set_objective(self, f_inner, f_outer, n_inner_samples,

inner_var0, outer_var0: array-like, shape (dim_inner,) (dim_outer,)

f_inner_fb, f_outer_fb: callable
Full batch version of f_inner and f_outer. Should take as input:
* inner_var: array-like, shape (dim_inner,)
* outer_var: array-like, shape (dim_outer,)
Attributes
----------
f_inner, f_outer: callable
Inner and outer objective functions for the bilevel optimization
problem.

n_inner_samples, n_outer_samples: int
Number of samples to draw for the inner and outer objective
functions.

inner_var0, outer_var0: array-like, shape (dim_inner,) (dim_outer,)

batch_size_inner, batch_size_outer: int
Size of the minibatch to use for the inner and outer objective
functions.

state_inner_sampler, state_outer_sampler: dict
State of the minibatch samplers for the inner and outer objectives.

one_epoch: callable
Jitted function that runs the solver for one epoch. One epoch is
defined as `eval_freq` iterations of the solver.
"""

self.f_inner = f_inner
4 changes: 2 additions & 2 deletions config/quadratics_021424_best_params.yml
@@ -3,10 +3,10 @@ objective:
dataset:
- quadratic[L_cross_inner=0.1,L_cross_outer=0.1,mu_inner=[.1],n_samples_inner=[32768],n_samples_outer=[1024],dim_inner=100,dim_outer=10]
solver:
- AmIGO[batch_size=64,eval_freq=16,framework=none,n_inner_steps=10,outer_ratio=1.0,step_size=0.01,random_state=[1,2,3,4,5,6,7,8,9,10]]
- AmIGO[batch_size=64,eval_freq=16,framework=none,n_inner_steps=10,outer_ratio=0.1,step_size=0.01,random_state=[1,2,3,4,5,6,7,8,9,10]]
- MRBO[batch_size=64,eta=0.5,eval_freq=16,framework=none,n_shia_steps=10,outer_ratio=0.1,step_size=0.1,random_state=[1,2,3,4,5,6,7,8,9,10]]
- SABA[batch_size=64,eval_freq=64,framework=none,mode_init_memory=zero,outer_ratio=1.0,step_size=0.1,random_state=[1,2,3,4,5,6,7,8,9,10]]
- SRBA[batch_size=64,eval_freq=64,framework=none,outer_ratio=0.1,period_frac=0.5,step_size=0.1,random_state=[1,2,3,4,5,6,7,8,9,10]]
- SRBA[batch_size=64,eval_freq=64,framework=none,outer_ratio=1.0,period_frac=0.5,step_size=0.1,random_state=[1,2,3,4,5,6,7,8,9,10]]
- StocBiO[batch_size=64,eval_freq=16,framework=none,n_inner_steps=10,n_shia_steps=10,outer_ratio=1.0,step_size=0.1,random_state=[1,2,3,4,5,6,7,8,9,10]]
- VRBO[batch_size=64,eval_freq=2,framework=none,n_inner_steps=10,n_shia_steps=10,outer_ratio=1.0,period_frac=0.01,step_size=0.1,random_state=[1,2,3,4,5,6,7,8,9,10]]
- F2SA[batch_size=64,delta_lmbda=0.01,eval_freq=16,framework=none,lmbda0=1,n_inner_steps=10,outer_ratio=1.0,step_size=0.1,random_state=[1,2,3,4,5,6,7,8,9,10]]