[R-package] Very large l2 when training model #4305
Comments
Thanks very much for using LightGBM!
Ok, I took a look. I was able to reproduce this behavior on my system with the CRAN release of the package. I then tried building the R package from the latest development code, and I could not reproduce the problem.
So I'm not sure what the root cause is, but I suspect that one of the stability fixes we've made recently for the R package fixed this. Maybe one or all of these:
I'm very sorry for the inconvenience, but could you try building and installing the R package from the latest development code, then re-running your example?

git clone --recursive git@github.com:microsoft/LightGBM.git
cd LightGBM
sh build-cran-package.sh
R CMD INSTALL lightgbm_3.2.1.99.tar.gz

I'll start a separate conversation with other maintainers about doing a new release to CRAN soon.
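As a quick sanity check (a suggestion added here, not part of the original comment), the following R snippet confirms that the source-built development package, rather than the CRAN 3.2.1 release, is the one being loaded:

```r
# Load the package and print the installed version; a development build
# produced by build-cran-package.sh should report something like 3.2.1.99.
library(lightgbm)
packageVersion("lightgbm")
```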
Hi @jameslamb, I followed your code to install the latest version of LightGBM and I am getting exactly the same l2 training error as you posted. Thanks so much for the help, and looking forward to LightGBM v3.3.0 on CRAN soon!
Ok great! Very sorry for the inconvenience. Thanks again for the excellent bug report with a detailed reproducible example; it made it easy for me to test fixes. You can subscribe to #4310 to be notified when the next release is out.
This issue has been automatically locked since there has not been any recent activity since it was closed. To start a new related discussion, open a new issue at https://github.com/microsoft/LightGBM/issues including a reference to this.
Description
Hi, I am using LightGBM to determine feature importances for an in-house dataset that is very sparse in nature. When training the model on this sparse dataset, I noticed that the training l2 error is extremely large, on the order of 10^73, and the feature importance results do not agree with my domain knowledge.
I also ran the same dataset through xgboost, and there the training RMSE is much smaller, in the range of 0.4-0.6. Furthermore, the feature importance results make a lot more sense to me. Finally, I compared the Gain computed by LightGBM and xgboost (see the scatter plot below), and they do not agree very well with each other. I wonder whether LightGBM does any manipulation/preprocessing of the dataset that results in the spuriously large training l2 error?
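(For scale, and not part of the original report: LightGBM's l2 metric is the mean squared error, while xgboost's rmse is its square root,

$$\mathrm{l2} = \frac{1}{n}\sum_{i=1}^{n}\left(y_i - \hat{y}_i\right)^2, \qquad \mathrm{rmse} = \sqrt{\mathrm{l2}},$$

so a training l2 on the order of 10^73 corresponds to an RMSE of roughly 10^36.5, nowhere near the 0.4-0.6 reported by xgboost; the gap is not explained by the difference in metric.)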
As an additional note, I ran the same feature importance code previously on an older version of LightGBM (v2.3.4) and got results similar to xgboost. I only started seeing this strange behavior after upgrading to version 3+ of LightGBM.
Reproducible example
The in-house dataset testData.rds can be downloaded from here. And here is the R code:
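As a rough illustration only (this is not the author's original script), a reproduction along these lines could look as follows, assuming testData.rds holds a sparse feature matrix x and a numeric target y; those names and all parameter values are placeholders:

```r
library(lightgbm)

# Hypothetical structure: adjust to match the actual contents of testData.rds.
test_data <- readRDS("testData.rds")

# Build the LightGBM Dataset from the sparse feature matrix and the response.
dtrain <- lgb.Dataset(data = test_data$x, label = test_data$y)

params <- list(
  objective = "regression",
  metric = "l2"
)

# Train and print the training l2 each iteration; this is the value that
# was reported to blow up to roughly 1e73.
model <- lgb.train(
  params = params,
  data = dtrain,
  nrounds = 100,
  valids = list(train = dtrain)
)

# Gain-based feature importance, the quantity compared against xgboost.
lgb.importance(model)
```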
Output from lightGBM:
Output from xgboost:
Comparison of Gain feature importance from xgboost vs lightGBM:
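As an illustration only (not the original plotting code), a comparison like the one above could be assembled as follows, where lgb_model and xgb_model are placeholder names for models already trained on the same features:

```r
library(lightgbm)
library(xgboost)

# Gain/Cover/Frequency tables from each library; both include a "Feature" column.
lgb_imp <- lgb.importance(lgb_model)
xgb_imp <- xgb.importance(model = xgb_model)

# Join on feature name and plot Gain against Gain.
both <- merge(lgb_imp, xgb_imp, by = "Feature", suffixes = c("_lgb", "_xgb"))
plot(both$Gain_xgb, both$Gain_lgb,
     xlab = "xgboost Gain", ylab = "LightGBM Gain",
     main = "Gain feature importance: xgboost vs LightGBM")
```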
Environment info