[R-package] Accept data frames as inputs #4323
Comments
I've added this feature request to #2302, per this project's standard approach for managing its backlog. If you are interested in contributing to this feature or have additional information to add, please leave a comment and the issue can be re-opened.
Want to add that when this feature is picked up, the R package should support categorical features for data frames using the same interface as the Python package:
LightGBM/python-package/lightgbm/basic.py Line 1127 in 8a90ea3
LightGBM/python-package/lightgbm/basic.py Lines 534 to 540 in 8a90ea3
LightGBM/python-package/lightgbm/basic.py Lines 555 to 556 in 8a90ea3
The tricky thing here is that we would need to store factor levels in the Booster object in order to allow for safe out-of-sample predictions. This could be a named list like
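To make the concern about factor levels concrete, here is a minimal base R sketch of recording levels at training time and re-applying them at prediction time. Both helpers (`capture_factor_levels()`, `encode_with_levels()`) are hypothetical and not part of `{lightgbm}`. The comment's own example structure is not shown, so the named list keyed by column name is an assumption.

```r
# Hypothetical helpers (not part of {lightgbm}): record factor levels at
# training time and re-apply them at prediction time.

# Collect the levels of every factor column into a named list,
# keyed by column name (assumed structure).
capture_factor_levels <- function(df) {
  is_fac <- vapply(df, is.factor, logical(1))
  lapply(df[is_fac], levels)
}

# Encode a data frame as a numeric matrix, using the stored levels so that
# the integer codes match the ones seen during training.
encode_with_levels <- function(df, factor_levels) {
  for (col in names(factor_levels)) {
    # Values that are not among the stored levels become NA.
    df[[col]] <- as.integer(
      factor(as.character(df[[col]]), levels = factor_levels[[col]])
    )
  }
  as.matrix(df)
}

train <- data.frame(
  x = c(1.5, 2.0, 3.5),
  color = factor(c("red", "green", "blue"))
)
lvls <- capture_factor_levels(train)

# New data whose factor has a different level set than the training data.
new_data <- data.frame(
  x = c(0.5, 4.0),
  color = factor(c("green", "purple"))
)
encoded <- encode_with_levels(new_data, lvls)
```

Without the stored levels, `"green"` in `new_data` would get code 1 (its position among `new_data`'s own levels) instead of the code it had during training. How unseen levels such as `"purple"` should be treated is a design choice. This sketch maps them to `NA`.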
Sorry, this was locked accidentally. Just unlocked it. We'd still love help with this feature!
Summary
It should be possible to use R data frames directly as inputs in `{lightgbm}`, without converting them to matrices.

Motivation
The `data.frame` is a very common data structure in R, and many statistics and machine learning projects accept data frames as inputs, including but not limited to `{caret}`, `{randomForest}`, and the packages listed in #4295 (comment).

Allowing the use of `data.frame` objects would reduce friction for R users working with LightGBM.

Description
I think this work can be broken down into the following components:
- `data.frame` inputs for `data` argument to `lgb.Dataset()`
- `data.frame` inputs for `data` argument to `lightgbm()`
- `data.frame` inputs for `data` argument to `lgb.cv()`
- `data.frame` inputs for `predict.lgb.Booster()` / `Booster$predict()` / `Predictor$predict()`
- `init_score` in `lgb.Dataset()` if `data` is a `data.frame`
- `label` in `lgb.Dataset()` if `data` is a `data.frame`
- `weight` in `lgb.Dataset()` if `data` is a `data.frame`

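The thread does not spell out the interface for the last three items. One possible reading, and it is only an assumption, is that `label` and `weight` could be given as column names of the data frame. A rough base R sketch of that idea, using a hypothetical helper `split_training_frame()` that is not part of `{lightgbm}`:

```r
# Hypothetical helper, not part of {lightgbm}: pull the label / weight columns
# out of a data frame by name and return the remaining columns as a matrix.
split_training_frame <- function(df, label, weight = NULL) {
  special <- c(label, weight)
  missing_cols <- setdiff(special, names(df))
  if (length(missing_cols) > 0) {
    stop("columns not found in data: ", paste(missing_cols, collapse = ", "))
  }
  feature_names <- setdiff(names(df), special)
  list(
    # data.matrix() converts factor columns to their integer codes.
    data = data.matrix(df[, feature_names, drop = FALSE]),
    label = df[[label]],
    weight = if (is.null(weight)) NULL else df[[weight]]
  )
}

df <- data.frame(
  age = c(23, 35, 47, 51),
  income = c(40, 55, 72, 68),
  bought = c(0, 0, 1, 1),
  w = c(1, 1, 2, 2)
)
parts <- split_training_frame(df, label = "bought", weight = "w")
```

The resulting pieces could then be passed to the existing matrix-based interface, for example `lgb.Dataset(data = parts$data, label = parts$label, weight = parts$weight)`. This mirrors the manual conversion users perform today and is not a proposal for the final design.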
References
See #4207 for a prior proposal and #4207 (review) for reference on this issue.
This feature precedes #4295.