-
Notifications
You must be signed in to change notification settings - Fork 342
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Fixed bug : random number generator under Windows 32/64 #28
base: master
Are you sure you want to change the base?
Conversation
Another solution may be implementing a simple random number generator according to glibc's rand(). please refer to the source code in glibc. Actually, in multi-core LIBLINEAR, we implement rand() by our own to ensure thread safety related issues of glibc's rand(). |
@infwinston as documented in the code patch: max random number in Windows is 15 bits, which is 32767. max random number in linux+GCC is 31 bits (resp. 63 bits in 64 bits systems I guess) so that's 2147483647 (resp 9223372036854775807). The fix provided in the FAQ proposes to use
Notes
Probably the best would be to use your implementation from multi-core liblinear to ensure consistency across the two libs ? I did not check what you actually implemented there. |
@smarie Thanks for your detailed explanation. I understand your point now. |
Super. That way, other windows users will be able to reach convergence instead of having to increase max_iterations to a ridiculously high level and recompile :). Meanwhile in the FAQ you could maybe change the proposed fix to ((rand() << 16) + (rand() << 1) + (rand() >> 14)) or equivalently to ((rand()*65536)+(rand()*2)+(rand()/16384)) since apparently even windows 64 has 32-bit int (http://stackoverflow.com/questions/384502/what-is-the-bit-size-of-long-on-64-bit-windows, thanks @tavianator ). Some additional thoughts: on 32-bit integer systems (including windows 64 in this case then), or on very difficult (ill-posed ? :) ) problems I guess, it might still be useful to easily increase the max number of iterations - we never know... Providing an option in the API to increase it (including in the wrappers: mex, etc...) would provide an additional lever to the users facing these cases. |
Good to see this work on Windows support.
Combining this becomes something like: // in header
#ifdef _WIN32
#define Rand windows_random
#define Check_random windows_check_random
#else
#define Rand rand
#define Check_random ((void)0) // nothing to do
#endif
// in windows.c
int _windows_random(int (*rand)()) {
// if ...
return /* ... */ rand() /* ... */;
}
int windows_random() {
return _windows_random(rand);
}
const char *windows_check_random() {
if (std::numeric_limits<int>::max() != _windows_random(0x7fff)) {
// ...
}
// ...
} And ... please feel free to disagree, I think the code is already a step forward for Windows users. Don't let my comment hold off merging. |
+1: thanks @wvengen for the feedback |
…ns to converge, than the hardcoded max number (for example this max number is 1000 in solve_l2r_l1l2_svc). Therefore we had to recompile the code with a larger max number to make it reach the same results than under Linux. This issue is explained in http://www.csie.ntu.edu.tw/~cjlin/liblinear/FAQ.html under "Windows Binary Files" but the fix provided is actually not correct. This commit fixes the random number generator for windows, to make it work the same way than in Linux. It also includes an auto-check performed at startup.
Hi, I came across this issue again by seeing it on scikit-learn. I took this opportunity to
Can you please have a look and merge ? That would help windows users have correct results, instead of non-converged ones. |
…r size checks are now static.
…heck is performed in all cases since the fix depends not on the platform but on the value of `RAND_MAX`. Updated the binaries.
…s (and improvement on all targets) (#13511) * Fixed random number generation on windows. See cjlin1/liblinear#28 * Added correct PR number * Fixed random number generator for libsvm on windows targets * Updated comments * Suppressed C4293 warnings using explicit cast. * Suppressed C4293 warnings using explicit cast. * Following code review: `myrand` is now inline, and all integer size checks for random generation fix are now static. * Dummy comment edit to re-trigger build * Fixed include error * Attempt to provide a better and faster result by using a mersene twister random number generator as suggested by https://codeforces.com/blog/entry/61587 * Attempt to integrate the extra compiler arg requesting for explicit C++11 support * Attempt to integrate the extra compiler arg requesting for explicit C++11 support. liblinear extension. * the `extra_compile_args` for C++11 was not working when used in `config.add_extension` so now pre-creating liblinear library first with the flag. * Fixed liblinear extension build: there was a missing library dependency. * Fixed liblinear library dependency for extension compilation. * Fixed liblinear library compilation: added tron * Now correctly setting the seed in libsvm * Now correctly setting the seed in liblinear. This is done using a new interface function `set_seed`. * Updated what's new accordingly * Increased tolerance when comparing `_average_precision` with `_average_precision_slow`. Indeed `_average_precision` is based on predictions ordering and has errors in case of ties (e.g. several 0.5 prediction values) * Minor improvement of this test method for readability (no change in output) * Minor doc update as per review * dropped commented line * Reverted test readability improvement change * Using double barticks * Clarified all comments * removed useless space * Clarified all comments, and included a `set_seed` method as per code review. * Updated what's new about changed models as suggested by review. * Replaced random number generator: now using a `bounded_rand_int` using tweaked Lemire post-processor from http://www.pcg-random.org/posts/bounded-rands.html) * Updated readme rst and moved it to 0.22 * Whats new moved from 0.222 to 0.23 * Updated docs * Update doc/whats_new/v0.23.rst Co-Authored-By: Chiara Marmo <[email protected]> * Dummy change to re-trigger the CI build * Update doc/whats_new/v0.21.rst * Update doc/whats_new/v0.21.rst * Update doc/whats_new/v0.21.rst * Update doc/whats_new/v0.21.rst * Update doc/whats_new/v0.21.rst * Update doc/whats_new/v0.21.rst * Update doc/whats_new/v0.21.rst * Update doc/whats_new/v0.21.rst * Update doc/whats_new/v0.21.rst * Update doc/whats_new/v0.21.rst * Update doc/whats_new/v0.22.rst * As per code review: added LogisticRegression in the list of changes, and repeated the list of classes affected in changelog. * New random number generator used in libsvm / liblinear is now in an independent header file `newrand/newrand.h`. * Update sklearn/svm/src/liblinear/linear.cpp * Fixed dates in headers and added a header to newrand.h * Added a sentence in the changelog to explain the impact in user-friendly terms. * Added link to PR in readme file and removed commented lines as per code review Co-authored-by: Sylvain MARIE <[email protected]> Co-authored-by: Chiara Marmo <[email protected]>
…s (and improvement on all targets) (scikit-learn#13511) * Fixed random number generation on windows. See cjlin1/liblinear#28 * Added correct PR number * Fixed random number generator for libsvm on windows targets * Updated comments * Suppressed C4293 warnings using explicit cast. * Suppressed C4293 warnings using explicit cast. * Following code review: `myrand` is now inline, and all integer size checks for random generation fix are now static. * Dummy comment edit to re-trigger build * Fixed include error * Attempt to provide a better and faster result by using a mersene twister random number generator as suggested by https://codeforces.com/blog/entry/61587 * Attempt to integrate the extra compiler arg requesting for explicit C++11 support * Attempt to integrate the extra compiler arg requesting for explicit C++11 support. liblinear extension. * the `extra_compile_args` for C++11 was not working when used in `config.add_extension` so now pre-creating liblinear library first with the flag. * Fixed liblinear extension build: there was a missing library dependency. * Fixed liblinear library dependency for extension compilation. * Fixed liblinear library compilation: added tron * Now correctly setting the seed in libsvm * Now correctly setting the seed in liblinear. This is done using a new interface function `set_seed`. * Updated what's new accordingly * Increased tolerance when comparing `_average_precision` with `_average_precision_slow`. Indeed `_average_precision` is based on predictions ordering and has errors in case of ties (e.g. several 0.5 prediction values) * Minor improvement of this test method for readability (no change in output) * Minor doc update as per review * dropped commented line * Reverted test readability improvement change * Using double barticks * Clarified all comments * removed useless space * Clarified all comments, and included a `set_seed` method as per code review. * Updated what's new about changed models as suggested by review. * Replaced random number generator: now using a `bounded_rand_int` using tweaked Lemire post-processor from http://www.pcg-random.org/posts/bounded-rands.html) * Updated readme rst and moved it to 0.22 * Whats new moved from 0.222 to 0.23 * Updated docs * Update doc/whats_new/v0.23.rst Co-Authored-By: Chiara Marmo <[email protected]> * Dummy change to re-trigger the CI build * Update doc/whats_new/v0.21.rst * Update doc/whats_new/v0.21.rst * Update doc/whats_new/v0.21.rst * Update doc/whats_new/v0.21.rst * Update doc/whats_new/v0.21.rst * Update doc/whats_new/v0.21.rst * Update doc/whats_new/v0.21.rst * Update doc/whats_new/v0.21.rst * Update doc/whats_new/v0.21.rst * Update doc/whats_new/v0.21.rst * Update doc/whats_new/v0.22.rst * As per code review: added LogisticRegression in the list of changes, and repeated the list of classes affected in changelog. * New random number generator used in libsvm / liblinear is now in an independent header file `newrand/newrand.h`. * Update sklearn/svm/src/liblinear/linear.cpp * Fixed dates in headers and added a header to newrand.h * Added a sentence in the changelog to explain the impact in user-friendly terms. * Added link to PR in readme file and removed commented lines as per code review Co-authored-by: Sylvain MARIE <[email protected]> Co-authored-by: Chiara Marmo <[email protected]>
Fixed bug : under windows, some problems required a lot more iterations to converge, than the hardcoded max number (for example this max number is 1000 in solve_l2r_l1l2_svc). Therefore we had to recompile the code with a larger max number to make it reach the same results than under Linux.
This issue is explained in the liblinear FAQ under "Windows Binary Files" section but the fix provided in the FAQis actually incorrect and not valid for both 32-bit and 64-bit systems.
This commit fixes the random number generator for windows, to make it work the same way than in Linux. It also includes an auto-check performed at startup.