[RFC][R-package] Request: provide more idiomatic interface #4295

Open
david-cortes opened this issue May 16, 2021 · 2 comments

@david-cortes
Contributor

LightGBM for R has an interface which requires creating a dataset object from an R native type, and passing this custom type to the model training function. The prediction functions then take native R objects instead.
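
For reference, a minimal sketch of the workflow described above, using `lgb.Dataset()`, `lgb.train()` and `predict()` from {lightgbm}. The choice of `mtcars`, the selected columns, and the parameter values are assumptions made only for illustration.

```r
library(lightgbm)

# Features are converted to a numeric matrix before building the dataset.
X <- as.matrix(mtcars[, c("cyl", "disp", "hp", "wt")])
y <- mtcars$mpg

# Step 1: wrap the native R objects in LightGBM's own dataset type.
dtrain <- lgb.Dataset(data = X, label = y)

# Step 2: train on the custom dataset object.
# min_data_in_leaf is lowered only because mtcars has just 32 rows.
model <- lgb.train(
  params = list(objective = "regression", min_data_in_leaf = 5L, verbose = -1L),
  data = dtrain,
  nrounds = 20L
)

# Step 3: predict on a native R matrix rather than on an lgb.Dataset.
preds <- predict(model, X)
```

Training goes through `lgb.Dataset` while prediction takes the plain matrix, which is the asymmetry this request is about.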

This interface is inconvenient to use, and is very different from the interfaces of base R or of typical R packages for decision trees (or classification/regression in general).

It would be better if lightgbm could offer a more idiomatic interface, like that of ranger, which among other things:

  • Allows a formula interface as well as an X/y interface, both of which accept R's own types (including data frames).
  • Accepts non-standard evaluation for column names in data frames.
  • Automatically determines categorical features when using data frames.
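
As a rough sketch of what the first and third points could involve, the base R helper below turns a formula and a data frame into a numeric matrix, a response vector, and the names of the columns detected as categorical. `prepare_xy` is a hypothetical name, not part of {lightgbm}, and the zero-based integer encoding is an assumption about what a wrapper might pass on.

```r
# Hypothetical helper (not a lightgbm function): formula + data frame -> matrix inputs.
prepare_xy <- function(formula, data) {
  mf <- model.frame(formula, data = data)   # resolves the formula against the data frame
  y <- model.response(mf)                   # the left-hand side of the formula
  x <- mf[, -1, drop = FALSE]               # the remaining columns are the features

  # Treat factor and character columns as categorical features.
  is_cat <- vapply(x, function(col) is.factor(col) || is.character(col), logical(1))

  # Encode categorical columns as zero-based integer codes.
  x[is_cat] <- lapply(x[is_cat], function(col) as.integer(as.factor(col)) - 1L)

  list(x = as.matrix(x), y = y, categorical = names(x)[is_cat])
}

prep <- prepare_xy(Sepal.Length ~ ., iris)
str(prep)

# A wrapper could then hand these over, for example:
# lgb.Dataset(prep$x, label = prep$y, categorical_feature = prep$categorical)
```

A real implementation would also need to store the factor levels so that new data is encoded the same way at prediction time.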
@jameslamb
Collaborator

jameslamb commented May 16, 2021

Thank you for your interest in LightGBM, and for writing this up! In the future, please link to previous conversations when you create new issues from them, as this helps us to keep track of the many conversations in this project. Specifically, I believe the content of this issue is very closely related to #4207 and #4207 (comment).

> • Allows a formula interface as well as an X/y interface, both of which accept R's own types (including data frames).
>
> • Accepts non-standard evaluation for column names in data frames.

Have you seen {treesnip}'s LightGBM integration?

https://github.com/curso-r/treesnip/blob/bf27cd871b7fb663a76818f66ea0a5f9bf09f444/tests/testthat/test-lightgbm.R#L16

Does that package's interface satisfy the first two items (accept a formula, use non-standard evaluation for column names)? If not, it would be very helpful if you could provide code examples describing how you would like {lightgbm} to work. More specifics on the interface you would like {lightgbm} to provide are necessary for maintainers or other contributors to do the work to satisfy this request.

> could offer a more idiomatic interface

Is there a standard in the R ecosystem that you recommend {lightgbm} try to follow? Similar to https://scikit-learn.org/stable/developers/develop.html?

@david-cortes
Contributor Author

Yes, treesnip provides an interface with handling of data frames, categorical features, and non-standard evaluation; but it is a parsnip connector.

A good standard to follow would be tidymodels' guidelines for modeling packages:
https://tidymodels.github.io/model-implementation-principles/

ranger is a very good practical example to copy from. Lots of other R packages for decision trees follow similar conventions (e.g. randomForest, gbm, party, C5.0, among many others) - not sure if I'd call it an ecosystem though.
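
To make the comparison concrete, here is a small sketch of the conventions meant, using ranger's `formula`/`data` and `x`/`y` arguments. The dataset and `num.trees` value are arbitrary choices for illustration.

```r
library(ranger)

# Formula interface on a data frame; the factor column Species is used as-is.
fit_formula <- ranger(Species ~ ., data = iris, num.trees = 50)

# X/y interface with the same native R objects.
fit_xy <- ranger(x = iris[, 1:4], y = iris$Species, num.trees = 50)

# Prediction also takes a data frame directly.
pred <- predict(fit_formula, data = iris)$predictions
```

No intermediate dataset object is created at any point: the same data frame is used for both fitting and prediction.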

@StrikerRUS changed the title from "[R-package] Request: provide more idiomatic interface" to "[RFC][R-package] Request: provide more idiomatic interface" on Jun 9, 2021