Many problems of recent interest in statistics and machine learning can be solved by ReHLine, and this repository provides reproducible benchmark code and results for them.
Note: this benchmark is based on ReHLine 0.0.3.
| Problem | Results |
|---|---|
| SVM | Result |
| Smoothed SVM | Result |
| FairSVM | Result |
| ElasticQR | Result |
| RidgeHuber | Result |
For all benchmarks, it is necessary to install the following packages.
pip install benchopt scikit-learn
pip install rehline==0.0.3
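As a quick sanity check that the pinned versions are in place, you can query the installed package metadata with the standard library (a minimal, optional sketch):

```python
from importlib.metadata import version

# Confirm that the core benchmark dependencies are installed;
# the ReHLine version should match the one pinned above (0.0.3).
for pkg in ("benchopt", "scikit-learn", "rehline"):
    print(pkg, version(pkg))
```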
Then each problem has its own dependencies, described in each subsection.
For the SVM task, you also need to install the CVXPY package:
pip install cvxpy[MOSEK]
The command above also installs the commercial MOSEK solver, but you still need a license to use it.
Follow the instructions on the MOSEK website to obtain one; an academic license is available for university students and professors. Once you have a license, follow the instructions to install it on your machine.
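Once the license is installed, you can verify that CVXPY sees MOSEK by solving a tiny hinge-loss SVM on random data. This is only an illustrative formulation, not the benchmark's own objective definition:

```python
import cvxpy as cp
import numpy as np

assert "MOSEK" in cp.installed_solvers()  # the MOSEK interface is installed

# Toy data; the benchmark itself uses the classification_data sets.
rng = np.random.default_rng(0)
X = rng.standard_normal((50, 5))
y = np.sign(rng.standard_normal(50))

# C-SVM primal: ridge penalty plus hinge loss.
C = 1.0
w, b = cp.Variable(5), cp.Variable()
hinge = cp.sum(cp.pos(1 - cp.multiply(y, X @ w + b)))
prob = cp.Problem(cp.Minimize(0.5 * cp.sum_squares(w) + C * hinge))
prob.solve(solver=cp.MOSEK)  # fails here if the license is not set up
print(prob.status, prob.value)
```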
To run all available benchmarks, enter the SVM directory and use the following command:
cd benchmark_SVM
benchopt run . -d classification_data
To run more repetitions and allow a longer time budget, add the following options:
benchopt run . -d classification_data --n-repetitions 10 --timeout 1000
To run the benchmark for a specific solver and data set, add the corresponding parameters. For example:
benchopt run . -d classification_data[dataset_name="steel-plates-fault"] -s rehline
The command above only tests the ReHLine solver on the steel-plates-fault data set.
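Each solver name accepted by `-s` corresponds to a benchopt solver class in the benchmark's solvers/ directory. The sketch below is schematic, not one of the repository's actual files: it wraps scikit-learn's LinearSVC as a hypothetical baseline to show the general shape of such a class, assuming the objective exposes X, y, and C; the exact `get_result` convention depends on your benchopt version.

```python
from benchopt import BaseSolver, safe_import_context

with safe_import_context() as import_ctx:
    from sklearn.svm import LinearSVC


class Solver(BaseSolver):
    # Illustrative name; the real solvers (e.g. rehline) are defined in the repo.
    name = "sklearn-LinearSVC"

    def set_objective(self, X, y, C):
        # benchopt passes the data and hyperparameters defined by the objective.
        self.X, self.y, self.C = X, y, C

    def run(self, n_iter):
        clf = LinearSVC(C=self.C, loss="hinge", dual=True,
                        max_iter=n_iter + 1, tol=1e-12)
        clf.fit(self.X, self.y)
        self.beta = clf.coef_.ravel()

    def get_result(self):
        # Recent benchopt versions expect a dict; older ones accept the array.
        return dict(beta=self.beta)
```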
Running the sSVM (smoothed SVM) benchmarks requires the following additional package:
pip install sklearn-contrib-lightning
The commands to run them are similar to those in the SVM subsection.
cd benchmark_sSVM
benchopt run . -d classification_data --n-repetitions 10 --timeout 1000
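Below is a hedged sketch of how a smoothed-hinge baseline from lightning can be fit on toy data; we assume SDCAClassifier with its smooth_hinge loss here, while the benchmark's solver files remain the reference for which lightning estimators and settings are actually compared.

```python
import numpy as np
from lightning.classification import SDCAClassifier

# Toy data; the benchmark itself runs on the classification_data sets.
rng = np.random.default_rng(0)
X = rng.standard_normal((200, 10))
y = np.sign(X @ rng.standard_normal(10) + 0.1 * rng.standard_normal(200))

# SDCA with a smoothed hinge loss, i.e. an l2-regularized smoothed SVM.
clf = SDCAClassifier(loss="smooth_hinge", alpha=1.0, max_iter=100)
clf.fit(X, y)
print("training accuracy:", clf.score(X, y))
```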
Running the Huber benchmarks requires the CVXPY package and R. The setup is easiest within a Conda environment.
pip install cvxpy[MOSEK]
conda install r-base rpy2 -c conda-forge
R -e "chooseCRANmirror(ind = 1); install.packages(c('hqreg'))"
Also see the SVM subsection for the configuration of the MOSEK commercial solver.
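As an optional check that the R side is wired up correctly from Python via rpy2 (the Huber benchmark includes an hqreg solver backed by R):

```python
from rpy2.robjects.packages import importr, isinstalled

# Confirm that the R package installed above is visible from Python.
assert isinstalled("hqreg"), "R package hqreg not found"
importr("hqreg")
print("hqreg loaded via rpy2")
```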
To run all available benchmarks, enter the Huber directory and use the following command:
cd benchmark_Huber
benchopt run . -d reg_data --n-repetitions 10 --timeout 1000
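For reference, the RidgeHuber objective that these solvers are compared on is Huber regression with a ridge penalty. A hedged CVXPY sketch on toy data is shown below; the exact scaling and parameter values in the benchmark's objective file may differ.

```python
import cvxpy as cp
import numpy as np

# Toy regression data; the benchmark uses the reg_data sets.
rng = np.random.default_rng(0)
X = rng.standard_normal((100, 10))
y = X @ rng.standard_normal(10) + rng.standard_normal(100)

M, lam = 1.35, 0.1  # illustrative Huber threshold and ridge strength
beta = cp.Variable(10)
obj = cp.sum(cp.huber(y - X @ beta, M)) / len(y) + lam * cp.sum_squares(beta)
cp.Problem(cp.Minimize(obj)).solve(solver=cp.MOSEK)
print(beta.value)
```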
First follow the Huber subsection to install its dependencies. Additionally, to include the CPLEX and Gurobi commercial solvers, install the following two PyPI packages:
pip install gurobipy cplex
Since these two commercial solvers run under the Community Edition or a restricted license by default, you may encounter the following errors:
CPLEX: CPLEX Error 1016 - Community Edition. Problem size limits have been exceeded. Please purchase a license at http://ibm.biz/error1016.
GUROBI: Restricted license - for non-production use only - expires 2024-10-28.
To lift the CPLEX limit, visit http://ibm.biz/error1016 to update your CPLEX license. For Gurobi, check the env argument, which allows you to pass a Gurobi environment specifying parameters and license information; see https://support.gurobi.com/hc/en-us/articles/360013417211-Where-do-I-place-the-Gurobi-license-file-gurobi-lic for where to place the Gurobi license file gurobi.lic.
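A hedged sketch of the Gurobi workaround: create a gurobipy environment (it reads gurobi.lic from the standard locations, or can be given web-license credentials via params=...) and hand it to CVXPY through the env solver argument.

```python
import cvxpy as cp
import gurobipy as gp
import numpy as np

# gp.Env() picks up gurobi.lic automatically; for a Web License Service account,
# use gp.Env(params={"WLSACCESSID": "...", "WLSSECRET": "...", "LICENSEID": 0}).
env = gp.Env()

# Tiny problem just to confirm that CVXPY forwards the environment to Gurobi.
beta = cp.Variable(3)
prob = cp.Problem(cp.Minimize(cp.sum_squares(beta - np.ones(3))))
prob.solve(solver=cp.GUROBI, env=env)
print(prob.status, beta.value)
```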
PS: we use the reg_sim data to demonstrate that the free versions of these two solvers can only handle problems of very limited size.
Finally, we also test the R implementation of ReHLine. Use the following commands to download the source package and install it.
conda install r-rcpp r-rcppeigen -c conda-forge
curl -L -O https://github.com/softmin/ReHLine-r/archive/refs/heads/main.zip
unzip main.zip
R CMD INSTALL ReHLine-r-main
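Optionally, you can confirm from Python that the R package is visible; we assume here that the package installed from ReHLine-r-main is named ReHLine.

```python
from rpy2.robjects.packages import importr, isinstalled

# Assumption: the R package built from ReHLine-r-main is named "ReHLine".
assert isinstalled("ReHLine"), "R package ReHLine not found"
importr("ReHLine")
print("ReHLine R package loaded via rpy2")
```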
To run all available benchmarks, enter the QR directory and use the following command:
cd benchmark_QR
benchopt run . -d reg_data --n-repetitions 10 --timeout 1000
There are also some simulated data sets available:
benchopt run . -d reg_sim
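For reference, the ElasticQR objective these solvers target is quantile regression with an elastic-net penalty. A hedged CVXPY sketch on toy data is given below; the benchmark's own objective file may scale the terms differently.

```python
import cvxpy as cp
import numpy as np

# Toy regression data; the benchmark uses reg_data / reg_sim instead.
rng = np.random.default_rng(0)
X = rng.standard_normal((100, 10))
y = X @ rng.standard_normal(10) + rng.standard_normal(100)

tau, lam1, lam2 = 0.5, 0.1, 0.1   # illustrative quantile level and penalties
beta = cp.Variable(10)
r = y - X @ beta
check_loss = cp.sum(cp.maximum(tau * r, (tau - 1) * r)) / len(y)  # pinball loss
prob = cp.Problem(cp.Minimize(check_loss
                              + lam1 * cp.norm1(beta)
                              + lam2 * cp.sum_squares(beta)))
prob.solve(solver=cp.MOSEK)
print(beta.value)
```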
As in the Huber and QR subsections, first install CVXPY along with the commercial solvers MOSEK, CPLEX, and Gurobi:
pip install cvxpy[MOSEK] gurobipy cplex
We also include the original implementation of FairSVM based on the DCCP package:
pip install dccp==1.0.3
Note that we explicitly pin the DCCP version, since otherwise the installation may fail.
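A minimal smoke test, in the spirit of the examples in the DCCP documentation, to confirm that the pinned DCCP version works with your installed CVXPY:

```python
import cvxpy as cp
import dccp  # noqa: F401  (importing registers the 'dccp' solve method)

# Classic DC example: maximize the distance between two points in the unit box.
x, y = cp.Variable(2), cp.Variable(2)
prob = cp.Problem(cp.Maximize(cp.norm(x - y, 2)),
                  [x >= 0, x <= 1, y >= 0, y <= 1])
prob.solve(method="dccp")
print(x.value, y.value)
```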
The file benchmark_FairSVM/fair_classification/linear_clf_pref_fairness.py is derived from the original FairSVM repository, with slight modifications for software compatibility.
To run all available benchmarks, enter the FairSVM directory and use the following command:
cd benchmark_FairSVM
benchopt run . -d classification_data --n-repetitions 10 --timeout 1000
There are also some simulated data sets available:
benchopt run . -d classification_sim
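For reference, the linear FairSVM problem minimizes a regularized hinge loss while bounding the covariance between the sensitive attribute and the decision value. A hedged CVXPY sketch in the spirit of the original formulation is given below; the benchmark's objective file is the reference for the exact scaling.

```python
import cvxpy as cp
import numpy as np

# Toy data with a binary sensitive attribute z; the benchmark uses its own
# classification_data / classification_sim sets.
rng = np.random.default_rng(0)
n, d = 200, 5
X = rng.standard_normal((n, d))
y = np.sign(rng.standard_normal(n))
z = rng.integers(0, 2, n).astype(float)

C, cov_bound = 1.0, 0.1  # illustrative values
w, b = cp.Variable(d), cp.Variable()
scores = X @ w + b
hinge = cp.sum(cp.pos(1 - cp.multiply(y, scores)))
fairness = cp.abs(cp.sum(cp.multiply(z - z.mean(), scores)) / n) <= cov_bound
prob = cp.Problem(cp.Minimize(0.5 * cp.sum_squares(w) + C * hinge), [fairness])
prob.solve(solver=cp.MOSEK)
print(prob.status)
```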
Assuming all dependencies are properly installed, the following commands generate the benchmark results reported in the article:
cd benchmark_SVM
benchopt run . -d classification_data --n-repetitions 10 --timeout 1000
cd ../benchmark_sSVM
benchopt run . -d classification_data --n-repetitions 10 --timeout 1000
cd ../benchmark_Huber
benchopt run . -d reg_data --n-repetitions 10 --timeout 1000
cd ../benchmark_QR
benchopt run . -d reg_data --n-repetitions 10 --timeout 1000
cd ../benchmark_FairSVM
benchopt run . -d classification_data --n-repetitions 10 --timeout 1000