Fix model locale issue and improve model R/W performance. #3405
Hello @guolinke @StrikerRUS @henry0312 @imatiach-msft, I think the fix is finally in #3267! :)

Running the integration tests

We are using a Java provider for LightGBM focused on low latency, available at https://github.com/feedzai/feedzai-openml-java/tree/master/openml-lightgbm
The Java provider PR where the new tests and the LightGBM patch live can be found here: feedzai/feedzai-openml-java#53. To run the tests, just clone it and do [...] This will fetch the requested LightGBM codebase, build it in a Docker container and then start building the Java provider, which uses LightGBM's [...]

Integration tests and results

To prove the patch works, there are many tests. The two main ones for this issue are:
When the locale is forced to "C", both v3.0.0 and this custom version work, but when another locale is used and "C" is not enforced (pt_UTF-8 in my case), the standard [...] with differences in the output round-trip model read/write with the C locale: [...]
With the patched version, however, all outputs match the behaviour of the [...]
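The failure mode described above can be reproduced with C++ streams alone. The following is a minimal sketch, not code from this PR or from LightGBM: it assumes a non-"C" locale can be simulated with a hypothetical `CommaDecimal` `std::numpunct<char>` facet (real locale names such as "pt_PT.UTF-8" are not installed everywhere), and the helper names `WriteWith` and `ReadWith` are illustrative.

```cpp
#include <cassert>
#include <locale>
#include <sstream>
#include <string>

// Hypothetical facet standing in for a locale such as pt_PT: ',' is the
// decimal separator and '.' the thousands separator (grouping is left empty,
// so no thousands separators are actually inserted or accepted).
struct CommaDecimal : std::numpunct<char> {
  char do_decimal_point() const override { return ','; }
  char do_thousands_sep() const override { return '.'; }
};

// Writes a double through a stream imbued with `loc`, as a model writer
// that relies on the ambient locale would.
std::string WriteWith(const std::locale& loc, double value) {
  std::ostringstream out;
  out.imbue(loc);
  out << value;
  return out.str();
}

// Parses a double through a stream imbued with `loc`; extraction stops at
// the first character that is not valid for that locale.
double ReadWith(const std::locale& loc, const std::string& text) {
  std::istringstream in(text);
  in.imbue(loc);
  double value = 0.0;
  in >> value;
  return value;
}
```

For example, with `std::locale comma(std::locale::classic(), new CommaDecimal);`, `WriteWith(comma, 0.5)` returns "0,5", while `ReadWith(comma, "0.5")` stops at the '.' and returns 0, so a model written under one locale and read back under another changes silently. C functions such as `strtod` and `printf` follow the locale set with `setlocale` rather than C++ stream facets, so they are exposed to the same problem through a different mechanism.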
Any tips on what's causing so many failed builds? It works on my system.
@AlbertoEAF did you test the model loading speed of this PR?
Hello, Alberto.
Hello @guolinke, no, I didn't. Any hints on what counts as a large model? I also have another question: I guess we should have tests for this, right? I'm not sure how we should proceed, though. Could we have a regression test with a reference model file in the repo, so we could read it, re-write it to disk and compare, just like we do in the Java provider? (A ~10MB model.txt, preferably in git-lfs to avoid weighing down the repo.)
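The regression test proposed above could take roughly this shape. This is a minimal C++ sketch, not code from LightGBM or from the Java provider: `ReadFile`, `RoundTripMatchesReference` and the `round_trip` callback are hypothetical names, and in a real test the callback would load the reference model with LightGBM and save it again.

```cpp
#include <cassert>
#include <fstream>
#include <functional>
#include <iterator>
#include <string>

// Reads a whole file into a string in binary mode, so bytes are compared exactly.
std::string ReadFile(const std::string& path) {
  std::ifstream in(path, std::ios::binary);
  return std::string(std::istreambuf_iterator<char>(in),
                     std::istreambuf_iterator<char>());
}

// Hypothetical regression check: `round_trip` stands in for "load the model
// text and serialize it again". The check passes only if the re-serialized
// text is byte-for-byte identical to the reference file.
bool RoundTripMatchesReference(
    const std::string& reference_path,
    const std::function<std::string(const std::string&)>& round_trip) {
  const std::string reference = ReadFile(reference_path);
  return round_trip(reference) == reference;
}
```

Comparing raw bytes rather than parsed values also catches formatting drift, such as ',' versus '.' decimal separators or a change in printed precision.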
The failing R-package tests are not cloning the new git submodules in [...]
As for Python, I didn't find the cause, but could it be the same thing? For instance, the [...]
I'm not used to Travis or this build system, so I'd appreciate help fixing the failing builds.
Please don't worry about them. There is a lot of maintenance burden in getting new files to work with the R CRAN installation process. We will help with it later.
Yes, I guess the two new folders should be copied unconditionally, just like in LightGBM/python-package/setup.py (lines 44 to 45 in 2870490).
@guolinke Regarding model read/write speed, I first ran all the tests I had before with both versions (regular tests, 3 runs per LightGBM version). Both display similar performance, although the patched version seems slightly faster (not statistically significant with so few runs):
Then I took an 8.8MB model file with 200 trees, wrote a script to sample more trees from it and append them to the file, and generated a 2.2GB model with 50200 trees. I ran all the tests as before, and also read and re-wrote that model to disk with both versions of LightGBM:
The new version is significantly faster: around 35% on round-trip model reads + writes.
OK, I added that to the Python setup, @StrikerRUS, but some of the builds still fail. For instance, this one fails to build fast_double_parser: https://travis-ci.org/github/microsoft/LightGBM/jobs/730024056 because it lacks the non-standard [...]
I'm not sure I understand all these environments; how do they differ? I guess it's related to the compiler. This does compile with the fairly old Ubuntu 14.04 and gcc 4.8.4, though.
@AlbertoEAF great! Happy to see the performance gain!
Let me comment here as well as in the related issue you just opened. I do not think that the support is missing. The issue, I believe, is that the [...] See my comment. I think we can fix this easily with a few lines of code.
Please let me know when this is ready for help on the R side, and I'll make a PR into this one. CRAN (the official package manager for R) has very strict requirements for portability of compiled code, so if those checks turn up anything that would make the R package not work on CRAN's set of environments (https://cran.r-project.org/web/checks/check_results_lightgbm.html), we'll have to talk about how to make this change in a way that doesn't violate their requirements.
Hello, I'm still seeing some Python builds where I suspect setup.py is not copying the [...] (see LightGBM/python-package/setup.py, line 46 in 32d5533).
I still get that error in some Python builds, though. Example: https://dev.azure.com/lightgbm-ci/lightgbm-ci/_build/results?buildId=7444&view=logs&j=02a2c3ba-81f8-54e3-0767-5d5adbb0daa9&t=720ee3fa-96d4-5b47-dbf4-01607b74ade2&l=1269. Any tips? Should I declare that copy outside the "if", since this issue seems to happen for "dist", "sdist" and "bdist" builds? However, that doesn't make sense either, as looking at the full log, the [...] I ask because in the full log I see:
but I don't see copying [...] (see LightGBM/python-package/setup.py, lines 43 to 66 in 32d5533).
Any ideas, @StrikerRUS?
I took a look at all the failing Travis builds. I listed what I believe are 4 types of build errors which might require changes:
I'd appreciate input on any of these issues, @StrikerRUS @guolinke @jameslamb :) Thank you!
This is not an R build. All of the R builds are on GitHub Actions. This is a C++ test, checking code compiled with [...]
Sorry, I don't really understand why it's failing. Does [...] If so, I believe that opens us up to a lot of weird issues, where values like [...]
@jameslamb after rebasing against the latest master, there are now 2 failing jobs in Travis: the Python 2 build and the MPI build. I thought Python 2 support was being dropped. The other error was "connection reset by peer". Any ideas?
The two test failures look unrelated to your code. I'll just restart them.
By the way, we squash commits on merge in this project, so you don't have to rebase and force push every time. You can just [...] I like doing that because then [...] Not a big deal, though; it's totally up to your preference.
Officially, 3.1.x is the final minor release to support Python 2. 3.2.x will not support it. But we haven't removed the support yet, because if we decide to release a bugfix-only release next (3.1.1), we have to keep that support. You can follow #3581 for the current status of Python 2 in this project.
@jameslamb @StrikerRUS finally, all tests are green! Is anything left before this can be merged? :D Thank you so much for all the help ;)
@guolinke Could you please review this?
I will review it today.
Now that we have all these approvals (!!!), I think we need to wait to merge until we decide if the next release will be 3.1.1 or 3.2.0.
Yeah, I agree that this "sensitive" PR should go into
Okay, let's wait for the 3.1.1 release first.
I saw 3.1.1 is out! 🚀 @guolinke do you think we can merge now?
@AlbertoEAF yeah, this should be included in v3.2.0.
When LightGBM is used from Java, the default C++ locale is broken (it is no longer the "C" locale). This is true for
Java providers that use the C API, and even for Python models that require JEP.
This patch solves that issue by making model reads/writes insensitive
to such locale settings.
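One way to make number formatting immune to the ambient locale is to pin each model I/O stream to the classic "C" locale. The following is a minimal C++ sketch of that idea under stated assumptions, not the implementation in this patch: `FormatDoubleForModel` and `ParseDoubleFromModel` are hypothetical helper names, and it uses standard iostreams only.

```cpp
#include <cassert>
#include <limits>
#include <locale>
#include <sstream>
#include <string>

// Hypothetical helper: formats a double for a model file independently of
// the process-wide locale by pinning the stream to the classic "C" locale.
std::string FormatDoubleForModel(double value) {
  std::ostringstream out;
  out.imbue(std::locale::classic());  // always '.' as the decimal separator
  // max_digits10 (17 for IEEE-754 double) significant digits are enough for
  // the text to parse back to exactly the same double.
  out.precision(std::numeric_limits<double>::max_digits10);
  out << value;
  return out.str();
}

// Hypothetical helper: parses a double written by FormatDoubleForModel,
// again ignoring whatever global locale the embedding process has set.
double ParseDoubleFromModel(const std::string& text) {
  std::istringstream in(text);
  in.imbue(std::locale::classic());
  double value = 0.0;
  in >> value;
  return value;
}
```

Imbuing individual streams keeps the change local to model I/O, whereas calling `setlocale` would alter the locale for the whole process, including the host JVM. iostreams are comparatively slow, so this is a correctness sketch rather than a performance recipe; the PR itself pulls in fast_double_parser, as the build discussion in this thread shows.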
To achieve this, within the model read/write codebase:
This approach means:
[CRITICAL BUG][Python] cannot wrire() UTF-8 strings by UnicodeEncodeError #2979 with the previous approach Fix Booster read/write locale dependency #2891
Changes:
Bugfixes: