ReHLine-SVM is a tiny and header-only C++ library aiming to be the fastest linear SVM solver. The whole library is a single header file rehline.h, and its only dependency is the Eigen library, which is also header-only.
ReHLine-SVM solves the following optimization problem:
where
The example1.cpp
file shows the basic use of the ReHLine-SVM
library. The key step is to first define a result variable res
,
and then call the rehline::rehline_svm()
function:
// Setting parameters
double C = 100.0;
int max_iter = 1000;
double tol = 1e-5;
int shrink = 1;
int verbose = 0;
int trace_freq = 100;
// Run the solver
rehline::ReHLineResult<Matrix> res;
rehline::rehline_svm(res, X, y, C, max_iter, tol, shrink, verbose, trace_freq);
The variable X
is a matrix of dimension y
is a vector of length X
is suggested to be a row-majored matrix
in Eigen, whose type can be defined as follows:
using Matrix = Eigen::Matrix<double, Eigen::Dynamic, Eigen::Dynamic, Eigen::RowMajor>;
The complete code can be found in the example1.cpp file.
Assuming the Eigen library has been extracted to the current directory, we can use the following command to compile the program:
g++ -std=c++11 -O2 -I. example1.cpp -o example1
Running example1
gives the following possible output:
niter = 100
beta =
-0.284341
0.680663
-0.273076
-0.134658
-0.165271
0.0601155
0.0697878
0.150636
0.343303
0.226581
It is widely recognized that the celebrated Liblinear library is one of the fastest linear SVM solvers, and we seriously take the challenge.
We include the latest version of Liblinear (2.47 for now) in the
liblinear
directory, and slightly modify its main program file
train.c
to include the computing time
(the train2.c file).
Then we compare Liblinear and ReHLine-SVM on a large data set with
5,000,000 observations and 18 features. To reproduce the experiment,
download the SUSY data,
and extract it into the data
directory.
The following command is used to compile the Liblinear solver:
gcc -O3 -fPIC -c liblinear/blas/daxpy.c
gcc -O3 -fPIC -c liblinear/blas/ddot.c
gcc -O3 -fPIC -c liblinear/blas/dnrm2.c
gcc -O3 -fPIC -c liblinear/blas/dscal.c
ar rcs blas.a daxpy.o ddot.o dnrm2.o dscal.o
g++ -std=c++11 -O2 -fPIC -DNDEBUG -c liblinear/newton.cpp
g++ -std=c++11 -O2 -fPIC -DNDEBUG -c liblinear/linear.cpp
gcc -O3 -fPIC -c liblinear/train2.c
g++ -std=c++11 -O2 train2.o newton.o linear.o blas.a -o run_liblinear
And then run the program:
./run_liblinear -s 3 -c 2e-5 -e 1e-5 data/SUSY
It will generate a model file named SUSY.model
, and the output
shows that its model solving takes 9.57 seconds.
......**
optimization finished, #iter = 64
Objective value = -58.018530
nSV = 3122272
Computation time: 9.574224 seconds.
Then we use ReHLine-SVM to compute it (the complete code is in example2.cpp), and compare its result with that of Liblinear:
g++ -std=c++11 -O2 -I. -DNDEBUG example2.cpp -o run_rehline
./run_rehline data/SUSY SUSY.model
The output shows that ReHLine-SVM only takes about 3.35 seconds while achieving the same objective function value 58.0185.
Iter 0, dual_objfn = -45.7268, primal_objfn = 66.7088, xi_diff = 0, beta_diff = 21.9285
*** Iter 21, free variables converge; next test on all variables
Computation time: 3.35343 seconds
beta =
0.907429
0.00030992
-0.000610212
-0.0973955
0.000383808
0.000224872
2.04278
0.00178147
0.229938
-0.0821459
-0.560706
0.669993
-1.31528
-0.297077
-0.569394
-0.118377
-0.896771
0.252943
objfn = 58.0185
beta(liblinear) =
0.907424
0.000313113
-0.000610905
-0.0973855
0.000380076
0.000225803
2.04279
0.00178176
0.229943
-0.0821463
-0.560707
0.669993
-1.31528
-0.297069
-0.569391
-0.118391
-0.896777
0.252936
objfn(liblinear) = 58.0185
The core of this library is the function
rehline::rehline_svm(result, X, y, C, max_iter, tol, shrink, verbose, trace_freq, cout)
The meaning of each argument is as follows:
-
result
: (output) an object containing the optimization results. -
X
: (input) an$n\times d$ data matrix, preferred to be row-majored. -
y
: (input) a response vector of length$n$ taking values of 1 or -1. -
C
: (input) a scalar standing for the cost parameter. -
max_iter
: (input) an integer representing the maximum number of iterations. -
tol
: (input) a scalar giving the convergence tolerance. -
shrink
: (input) if it is a positive integer, then a shrinking scheme is used to accelerate the algorithm, and the value of this argument is used as a random seed; otherwise, the vanilla algorithm is used. -
verbose
: (input) level of verbosity, taking values of 0, 1, or 2. -
trace_freq
(input) trace objective function values everytrace_freq
iterations; only works ifverbose > 0
. -
cout
: the output stream object, default to bestd::cout
.
The ReHLine-SVM library is based on the ReHLine algorithm and solver, which takes SVM as a special case. ReHLine can do much more, including the smoothed SVM, Huber regression, quantile regression, etc. Please see the ReHLine project page for details.
ReHLine-SVM is open source under the MIT license.
Please consider to cite our article if you find ReHLine-SVM or the ReHLine algorithm/solver useful.
@inproceedings{daiqiu2023rehline,
title={ReHLine: Regularized Composite ReLU-ReHU Loss Minimization with Linear Computation and Linear Convergence},
author={Dai, Ben and Yixuan Qiu},
booktitle={Advances in Neural Information Processing Systems},
year={2023}
}