Commit

FIX Libsvm and liblinear rand() fix for convergence on windows targets (and improvement on all targets) (scikit-learn#13511)

* Fixed random number generation on windows. See cjlin1/liblinear#28

* Added correct PR number

* Fixed random number generator for libsvm on windows targets

* Updated comments

* Suppressed C4293 warnings using explicit cast.

* Suppressed C4293 warnings using explicit cast.

* Following code review: `myrand` is now inline, and all integer size checks for random generation fix are now static.

* Dummy comment edit to re-trigger build

* Fixed include error

* Attempt to provide a better and faster result by using a Mersenne Twister random number generator, as suggested by https://codeforces.com/blog/entry/61587

* Attempt to integrate the extra compiler arg requesting for explicit C++11 support

* Attempt to integrate the extra compiler arg requesting for explicit C++11 support. liblinear extension.

* the `extra_compile_args` for C++11 was not working when used in `config.add_extension` so now pre-creating liblinear library first with the flag.

* Fixed liblinear extension build: there was a missing library dependency.

* Fixed liblinear library dependency for extension compilation.

* Fixed liblinear library compilation: added tron

* Now correctly setting the seed in libsvm

* Now correctly setting the seed in liblinear. This is done using a new interface function `set_seed`.

* Updated what's new accordingly

* Increased tolerance when comparing `_average_precision` with `_average_precision_slow`: `_average_precision` is based on prediction ordering and has errors in case of ties (e.g. several 0.5 prediction values)

* Minor improvement of this test method for readability (no change in output)

* Minor doc update as per review

* dropped commented line

* Reverted test readability improvement change

* Using double backticks

* Clarified all comments

* removed useless space

* Clarified all comments, and included a `set_seed` method as per code review.

* Updated what's new about changed models as suggested by review.

* Replaced random number generator: now using a `bounded_rand_int` using the tweaked Lemire post-processor from http://www.pcg-random.org/posts/bounded-rands.html

* Updated readme rst and moved it to 0.22

* What's new moved from 0.22 to 0.23

* Updated docs

* Update doc/whats_new/v0.23.rst

Co-Authored-By: Chiara Marmo <[email protected]>

* Dummy change to re-trigger the CI build

* Update doc/whats_new/v0.21.rst


* Update doc/whats_new/v0.22.rst

* As per code review: added LogisticRegression in the list of changes, and repeated the list of classes affected in changelog.

* New random number generator used in libsvm / liblinear is now in an independent header file `newrand/newrand.h`.

* Update sklearn/svm/src/liblinear/linear.cpp

* Fixed dates in headers and added a header to newrand.h

* Added a sentence in the changelog to explain the impact in user-friendly terms.

* Added link to PR in readme file and removed commented lines as per code review

Co-authored-by: Sylvain MARIE <[email protected]>
Co-authored-by: Chiara Marmo <[email protected]>
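The fix the commits above converge on can be sketched in a few lines. This is an illustrative reconstruction, not the merged `newrand.h`; the names `gen` and `bounded_rand_int` merely follow the PR's wording:

```cpp
#include <cassert>
#include <cstdint>
#include <random>

// Sketch of the approach described in this PR (not the exact newrand.h code).
// One C++11 mt19937 gives identical, full-32-bit output on every platform;
// the Lemire-style multiply-shift step maps a draw into [0, range) without
// the cost and bias of `rand() % range`.
static std::mt19937 gen(42);  // the real code seeds this via set_seed()

static inline uint32_t bounded_rand_int(uint32_t range) {
    // The high 32 bits of (draw * range) are near-uniform in [0, range):
    // a single multiplication instead of a division/modulo.
    uint64_t product = (uint64_t)gen() * (uint64_t)range;
    return (uint32_t)(product >> 32);
}
```

This matters on Windows because MSVC's `rand()` has `RAND_MAX` equal to 32767, so `rand() % active_size` could never reach most coordinates once `active_size` exceeded 32768, which is the convergence bug this PR addresses.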
3 people authored and gio8tisu committed May 15, 2020
1 parent 27511f1 commit e4157e8
Showing 9 changed files with 188 additions and 48 deletions.
28 changes: 27 additions & 1 deletion doc/whats_new/v0.23.rst
@@ -25,6 +25,12 @@ random sampling procedures.
- :class:`ensemble.BaggingClassifier`, :class:`ensemble.BaggingRegressor`,
and :class:`ensemble.IsolationForest`. |Fix|

- Any model using the :func:`svm.libsvm` or the :func:`svm.liblinear` solver,
including :class:`svm.LinearSVC`, :class:`svm.LinearSVR`,
:class:`svm.NuSVC`, :class:`svm.NuSVR`, :class:`svm.OneClassSVM`,
:class:`svm.SVC`, :class:`svm.SVR`, :class:`linear_model.LogisticRegression`.
|Efficiency| |Fix|

Details are listed in the changelog below.

(While we are trying to better inform users by providing this information, we
@@ -296,7 +302,7 @@ Changelog
- |API| Changed the formatting of values in
:meth:`metrics.ConfusionMatrixDisplay.plot` and
:func:`metrics.plot_confusion_matrix` to pick the shorter format (either '2g'
or 'd'). :pr:`16159` by :user:`Rick Mackenbach <Rick-Mackenbach>` and
or 'd'). :pr:`16159` by :user:`Rick Mackenbach <Rick-Mackenbach>` and
`Thomas Fan`_.

- |Enhancement| :func:`metrics.pairwise.pairwise_distances_chunked` now allows
@@ -378,6 +384,26 @@ Changelog
:mod:`sklearn.svm`
..................

- |Fix| |Efficiency| Improved ``libsvm`` and ``liblinear`` random number
generators used to randomly select coordinates in the coordinate descent
algorithms. Platform-dependent C ``rand()`` was used, which is only able to
generate numbers up to ``32767`` on Windows (see this `blog
post <https://codeforces.com/blog/entry/61587>`_) and also has poor
randomization power as suggested by `this presentation
<https://channel9.msdn.com/Events/GoingNative/2013/rand-Considered-Harmful>`_.
It was replaced with C++11 ``mt19937``, a Mersenne Twister that correctly
generates 31-bit/63-bit random numbers on all platforms. In addition, the
crude "modulo" postprocessor used to get a random number in a bounded
interval was replaced by the tweaked Lemire method as suggested by `this blog
post <http://www.pcg-random.org/posts/bounded-rands.html>`_.
Any model using the :func:`svm.libsvm` or the :func:`svm.liblinear` solver,
including :class:`svm.LinearSVC`, :class:`svm.LinearSVR`,
:class:`svm.NuSVC`, :class:`svm.NuSVR`, :class:`svm.OneClassSVM`,
:class:`svm.SVC`, :class:`svm.SVR`, and
:class:`linear_model.LogisticRegression`, is affected. In particular, users
can expect better convergence when the number of samples (LibSVM) or the
number of features (LibLinear) is large.
:pr:`13511` by :user:`Sylvain Marié <smarie>`.

- |API| :class:`svm.SVR` and :class:`svm.OneClassSVM` attributes, `probA_` and
`probB_`, are now deprecated as they were not useful. :pr:`15558` by
`Thomas Fan`_.
3 changes: 2 additions & 1 deletion sklearn/metrics/tests/test_ranking.py
@@ -737,8 +737,9 @@ def _test_precision_recall_curve(y_true, probas_pred):
assert_array_almost_equal(precision_recall_auc, 0.859, 3)
assert_array_almost_equal(precision_recall_auc,
average_precision_score(y_true, probas_pred))
# `_average_precision` is not very precise in case of 0.5 ties: be tolerant
assert_almost_equal(_average_precision(y_true, probas_pred),
precision_recall_auc, decimal=3)
precision_recall_auc, decimal=2)
assert p.size == r.size
assert p.size == thresholds.size + 1
# Smoke test in the case of proba having only one value
35 changes: 28 additions & 7 deletions sklearn/svm/setup.py
@@ -16,22 +16,27 @@ def configuration(parent_package='', top_path=None):
config.add_library('libsvm-skl',
sources=[join('src', 'libsvm', 'libsvm_template.cpp')],
depends=[join('src', 'libsvm', 'svm.cpp'),
join('src', 'libsvm', 'svm.h')],
join('src', 'libsvm', 'svm.h'),
join('src', 'newrand', 'newrand.h')],
# Force C++ linking in case gcc is picked up instead
# of g++ under windows with some versions of MinGW
extra_link_args=['-lstdc++'],
# Use C++11 to use the random number generator fix
extra_compiler_args=['-std=c++11'],
)

libsvm_sources = ['_libsvm.pyx']
libsvm_depends = [join('src', 'libsvm', 'libsvm_helper.c'),
join('src', 'libsvm', 'libsvm_template.cpp'),
join('src', 'libsvm', 'svm.cpp'),
join('src', 'libsvm', 'svm.h')]
join('src', 'libsvm', 'svm.h'),
join('src', 'newrand', 'newrand.h')]

config.add_extension('_libsvm',
sources=libsvm_sources,
include_dirs=[numpy.get_include(),
join('src', 'libsvm')],
join('src', 'libsvm'),
join('src', 'newrand')],
libraries=['libsvm-skl'],
depends=libsvm_depends,
)
@@ -41,16 +46,30 @@ def configuration(parent_package='', top_path=None):
if os.name == 'posix':
libraries.append('m')

liblinear_sources = ['_liblinear.pyx',
join('src', 'liblinear', '*.cpp')]
# precompile liblinear to use C++11 flag
config.add_library('liblinear-skl',
sources=[join('src', 'liblinear', 'linear.cpp'),
join('src', 'liblinear', 'tron.cpp')],
depends=[join('src', 'liblinear', 'linear.h'),
join('src', 'liblinear', 'tron.h'),
join('src', 'newrand', 'newrand.h')],
# Force C++ linking in case gcc is picked up instead
# of g++ under windows with some versions of MinGW
extra_link_args=['-lstdc++'],
# Use C++11 to use the random number generator fix
extra_compiler_args=['-std=c++11'],
)

liblinear_sources = ['_liblinear.pyx']
liblinear_depends = [join('src', 'liblinear', '*.h'),
join('src', 'newrand', 'newrand.h'),
join('src', 'liblinear', 'liblinear_helper.c')]

config.add_extension('_liblinear',
sources=liblinear_sources,
libraries=libraries,
libraries=['liblinear-skl'] + libraries,
include_dirs=[join('.', 'src', 'liblinear'),
join('.', 'src', 'newrand'),
join('..', 'utils'),
numpy.get_include()],
depends=liblinear_depends,
@@ -64,8 +83,10 @@ def configuration(parent_package='', top_path=None):
config.add_extension('_libsvm_sparse', libraries=['libsvm-skl'],
sources=libsvm_sparse_sources,
include_dirs=[numpy.get_include(),
join("src", "libsvm")],
join("src", "libsvm"),
join("src", "newrand")],
depends=[join("src", "libsvm", "svm.h"),
join('src', 'newrand', 'newrand.h'),
join("src", "libsvm",
"libsvm_sparse_helper.c")])

2 changes: 1 addition & 1 deletion sklearn/svm/src/liblinear/liblinear_helper.c
@@ -182,7 +182,7 @@ struct parameter *set_parameter(int solver_type, double eps, double C,
if (param == NULL)
return NULL;

srand(seed);
set_seed(seed);
param->solver_type = solver_type;
param->eps = eps;
param->C = C;
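The `srand(seed)` → `set_seed(seed)` swap above routes seeding through the interface function this PR introduces. A hedged sketch of its minimal plausible shape (the real definition lives in `linear.cpp`/`newrand.h` and is not fully shown in this diff):

```cpp
#include <cassert>
#include <random>

// Assumed shape of the seeding interface (a sketch, not the merged code):
// a shared mt19937 whose state replaces the C library's hidden
// rand()/srand() state, reseeded through one entry point.
static std::mt19937 mt_rand(std::mt19937::default_seed);

void set_seed(unsigned seed) {
    // In the PR, set_parameter() in liblinear_helper.c calls this instead
    // of srand(), making results reproducible across platforms.
    mt_rand.seed(seed);
}
```

Calling `set_seed` with the same value twice restarts the generator from identical state, which is what lets scikit-learn tie liblinear's shuffling to `random_state`.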
76 changes: 43 additions & 33 deletions sklearn/svm/src/liblinear/linear.cpp
@@ -1,8 +1,8 @@
/*
/*
Modified 2011:
- Make labels sorted in group_classes, Dan Yamins.
Modified 2012:
- Changes roles of +1 and -1 to match scikit API, Andreas Mueller
@@ -22,6 +22,13 @@
Modified 2015:
- Patched liblinear for sample_weights - Manoj Kumar
See https://github.com/scikit-learn/scikit-learn/pull/5274
Modified 2020:
- Improved random number generator by using a Mersenne Twister + tweaked
Lemire postprocessor. This fixed a convergence issue on Windows targets.
Sylvain Marie
See <https://github.com/scikit-learn/scikit-learn/pull/13511#issuecomment-481729756>
*/

#include <math.h>
@@ -32,6 +39,10 @@
#include <locale.h>
#include "linear.h"
#include "tron.h"
#include <climits>
#include <random>
#include "../newrand/newrand.h"

typedef signed char schar;
template <class T> static inline void swap(T& x, T& y) { T t=x; x=y; y=t; }
#ifndef min
@@ -456,19 +467,19 @@ void l2r_l2_svr_fun::grad(double *w, double *g)
g[i] = w[i] + 2*g[i];
}

// A coordinate descent algorithm for
// A coordinate descent algorithm for
// multi-class support vector machines by Crammer and Singer
//
// min_{\alpha} 0.5 \sum_m ||w_m(\alpha)||^2 + \sum_i \sum_m e^m_i alpha^m_i
// s.t. \alpha^m_i <= C^m_i \forall m,i , \sum_m \alpha^m_i=0 \forall i
//
//
// where e^m_i = 0 if y_i = m,
// e^m_i = 1 if y_i != m,
// C^m_i = C if m = y_i,
// C^m_i = 0 if m != y_i,
// and w_m(\alpha) = \sum_i \alpha^m_i x_i
// C^m_i = C if m = y_i,
// C^m_i = 0 if m != y_i,
// and w_m(\alpha) = \sum_i \alpha^m_i x_i
//
// Given:
// Given:
// x, y, C
// eps is the stopping tolerance
//
@@ -579,7 +590,7 @@ int Solver_MCSVM_CS::Solve(double *w)
double eps_shrink = max(10.0*eps, 1.0); // stopping tolerance for shrinking
bool start_from_all = true;

// Initial alpha can be set here. Note that
// Initial alpha can be set here. Note that
// sum_m alpha[i*nr_class+m] = 0, for all i=1,...,l-1
// alpha[i*nr_class+m] <= C[GETI(i)] if prob->y[i] == m
// alpha[i*nr_class+m] <= 0 if prob->y[i] != m
@@ -615,7 +626,7 @@ int Solver_MCSVM_CS::Solve(double *w)
double stopping = -INF;
for(i=0;i<active_size;i++)
{
int j = i+rand()%(active_size-i);
int j = i+bounded_rand_int(active_size-i);
swap(index[i], index[j]);
}
for(s=0;s<active_size;s++)
@@ -775,14 +786,14 @@ int Solver_MCSVM_CS::Solve(double *w)
return iter;
}

// A coordinate descent algorithm for
// A coordinate descent algorithm for
// L1-loss and L2-loss SVM dual problems
//
// min_\alpha 0.5(\alpha^T (Q + D)\alpha) - e^T \alpha,
// s.t. 0 <= \alpha_i <= upper_bound_i,
//
//
// where Qij = yi yj xi^T xj and
// D is a diagonal matrix
// D is a diagonal matrix
//
// In L1-SVM case:
// upper_bound_i = Cp if y_i = 1
Expand All @@ -793,12 +804,12 @@ int Solver_MCSVM_CS::Solve(double *w)
// D_ii = 1/(2*Cp) if y_i = 1
// D_ii = 1/(2*Cn) if y_i = -1
//
// Given:
// Given:
// x, y, Cp, Cn
// eps is the stopping tolerance
//
// solution will be put in w
//
//
// See Algorithm 3 of Hsieh et al., ICML 2008

#undef GETI
@@ -888,7 +899,7 @@ static int solve_l2r_l1l2_svc(

for (i=0; i<active_size; i++)
{
int j = i+rand()%(active_size-i);
int j = i+bounded_rand_int(active_size-i);
swap(index[i], index[j]);
}

@@ -1009,14 +1020,14 @@ static int solve_l2r_l1l2_svc(
}


// A coordinate descent algorithm for
// A coordinate descent algorithm for
// L1-loss and L2-loss epsilon-SVR dual problem
//
// min_\beta 0.5\beta^T (Q + diag(lambda)) \beta - p \sum_{i=1}^l|\beta_i| + \sum_{i=1}^l yi\beta_i,
// s.t. -upper_bound_i <= \beta_i <= upper_bound_i,
//
//
// where Qij = xi^T xj and
// D is a diagonal matrix
// D is a diagonal matrix
//
// In L1-SVM case:
// upper_bound_i = C
Expand All @@ -1025,13 +1036,13 @@ static int solve_l2r_l1l2_svc(
// upper_bound_i = INF
// lambda_i = 1/(2*C)
//
// Given:
// Given:
// x, y, p, C
// eps is the stopping tolerance
//
// solution will be put in w
//
// See Algorithm 4 of Ho and Lin, 2012
// See Algorithm 4 of Ho and Lin, 2012

#undef GETI
#define GETI(i) (i)
@@ -1107,7 +1118,7 @@ static int solve_l2r_l1l2_svr(

for(i=0; i<active_size; i++)
{
int j = i+rand()%(active_size-i);
int j = i+bounded_rand_int(active_size-i);
swap(index[i], index[j]);
}

@@ -1253,17 +1264,17 @@ static int solve_l2r_l1l2_svr(
}


// A coordinate descent algorithm for
// A coordinate descent algorithm for
// the dual of L2-regularized logistic regression problems
//
// min_\alpha 0.5(\alpha^T Q \alpha) + \sum \alpha_i log (\alpha_i) + (upper_bound_i - \alpha_i) log (upper_bound_i - \alpha_i),
// s.t. 0 <= \alpha_i <= upper_bound_i,
//
// where Qij = yi yj xi^T xj and
//
// where Qij = yi yj xi^T xj and
// upper_bound_i = Cp if y_i = 1
// upper_bound_i = Cn if y_i = -1
//
// Given:
// Given:
// x, y, Cp, Cn
// eps is the stopping tolerance
//
@@ -1333,7 +1344,7 @@ int solve_l2r_lr_dual(const problem *prob, double *w, double eps, double Cp, dou
{
for (i=0; i<l; i++)
{
int j = i+rand()%(l-i);
int j = i+bounded_rand_int(l-i);
swap(index[i], index[j]);
}
int newton_iter = 0;
@@ -1521,7 +1532,7 @@ static int solve_l1r_l2_svc(

for(j=0; j<active_size; j++)
{
int i = j+rand()%(active_size-j);
int i = j+bounded_rand_int(active_size-j);
swap(index[i], index[j]);
}

@@ -1903,7 +1914,7 @@ static int solve_l1r_lr(

for(j=0; j<QP_active_size; j++)
{
int i = j+rand()%(QP_active_size-j);
int i = j+bounded_rand_int(QP_active_size-j);
swap(index[i], index[j]);
}

@@ -2234,14 +2245,14 @@ static void group_classes(const problem *prob, int *nr_class_ret, int **label_re
label[i+1] = this_label;
count[i+1] = this_count;
}

for (i=0; i <l; i++)
{
j = 0;
int this_label = (int)prob->y[i];
while(this_label != label[j])
{
j++;
j++;
}
data_label[i] = j;

@@ -2594,7 +2605,7 @@ void cross_validation(const problem *prob, const parameter *param, int nr_fold,
for(i=0;i<l;i++) perm[i]=i;
for(i=0;i<l;i++)
{
int j = i+rand()%(l-i);
int j = i+bounded_rand_int(l-i);
swap(perm[i],perm[j]);
}
for(i=0;i<=nr_fold;i++)
@@ -3057,4 +3068,3 @@ void set_print_string_function(void (*print_func)(const char*))
else
liblinear_print_string = print_func;
}
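Every `rand()%` → `bounded_rand_int` replacement in `linear.cpp` sits inside the same index-shuffling idiom: `j = i + bounded_rand_int(n - i)` followed by a swap, i.e. one step of a Fisher-Yates shuffle. A self-contained sketch of that idiom, with a hypothetical stand-in for the `newrand.h` helper:

```cpp
#include <cassert>
#include <cstdint>
#include <random>
#include <utility>
#include <vector>

// Stand-in for the newrand.h helper (illustrative, not the merged code):
static std::mt19937 rng(0);
static inline uint32_t bounded_rand_int(uint32_t range) {
    return (uint32_t)(((uint64_t)rng() * (uint64_t)range) >> 32);
}

// One Fisher-Yates pass: picking j uniformly from [i, n) at each step
// yields an unbiased random permutation of the coordinate indices,
// which is what the coordinate descent solvers iterate over.
void shuffle_indices(std::vector<int>& index) {
    int n = (int)index.size();
    for (int i = 0; i < n - 1; ++i) {
        int j = i + (int)bounded_rand_int((uint32_t)(n - i));
        std::swap(index[i], index[j]);
    }
}
```

With the old `rand() % (n - i)` on Windows, indices above 32767 were simply unreachable in this loop; the new helper draws from the full range on every platform.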

2 changes: 2 additions & 0 deletions sklearn/svm/src/liblinear/linear.h
@@ -49,6 +49,8 @@ struct model
int *n_iter; /* no. of iterations of each class */
};

void set_seed(unsigned seed);

struct model* train(const struct problem *prob, const struct parameter *param, BlasFunctions *blas_functions);
void cross_validation(const struct problem *prob, const struct parameter *param, int nr_fold, double *target);
