
Supporting quantile regression with XGBoost #1143

Open

pbhogale opened this issue Jul 26, 2024 · 2 comments

@pbhogale
Feature

When one wants to do quantile regression in R, the available options are fairly limited; the quantreg and quantregForest packages are two of them.

On the other hand, since version 2.0.0, XGBoost also provides a quantile regression option, available in the R package as well.

  1. It would be useful in and of itself to provide an interface to this capability from boost_tree() so that quantile regression with xgboost is available out of the box.
  2. For probably::int_conformal_quantile(), it might also be useful to offer an option to use xgboost quantile regression or quantreg's rq() instead of regression forests.
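For reference, a minimal sketch of the raw xgboost call that such an interface would wrap, assuming a build of the R package (>= 2.0.0, not yet on CRAN) that exposes the "reg:quantileerror" objective; the x_train/y_train/x_test objects are hypothetical:

library(xgboost)  # needs >= 2.0.0 for "reg:quantileerror"

# hypothetical numeric predictor matrices and outcome vector
dtrain <- xgb.DMatrix(data = as.matrix(x_train), label = y_train)

fit <- xgb.train(
  params = list(
    objective      = "reg:quantileerror",  # pinball (quantile) loss
    quantile_alpha = 0.8                   # quantile level to fit
  ),
  data = dtrain,
  nrounds = 15
)

predict(fit, as.matrix(x_test))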
@joranE

joranE commented Jul 26, 2024

I am also extremely interested in quantile regression being added to parsnip, but that feature is (AFAICT) not available yet in the xgboost R package and will probably take some time yet. They are rewriting the entire R interface to the underlying xgboost library and won't enable quantile regression until that's done. Skimming the linked issue, I think there is still a lot of work to do.

@frankiethull

@pbhogale / @joranE -

While I have used "reg:quantileerror" with xgboost via the dev version, it is not on CRAN yet, meaning parsnip will not support it until then.

But one thing to note is that lightgbm does support "quantile" regression, as shown below.

library(tidymodels)
library(bonsai)
library(lightgbm)

tidymodels_prefer()
data(Chicago)

n <- nrow(Chicago)
Chicago <- Chicago %>% select(ridership, Clark_Lake, Quincy_Wells)

Chicago_train <- Chicago[1:(n - 7), ]
Chicago_test <- Chicago[(n - 6):n, ]


# spec ---
bt_reg_spec <- 
  boost_tree(trees = 15) %>% 
  set_mode("regression") %>% 
  # passing quantile regression args via the ellipsis to the xgboost engine:
  # needs at least dev version 2.0.0 for "reg:quantileerror"
  # set_engine("xgboost", objective = "reg:quantileerror", quantile_alpha = .8)
  
  # available in lightgbm CRAN version:
  set_engine("lightgbm", objective = "quantile", alpha = .8)
bt_reg_spec
#> Boosted Tree Model Specification (regression)
#> 
#> Main Arguments:
#>   trees = 15
#> 
#> Engine-Specific Arguments:
#>   objective = quantile
#>   alpha = 0.8
#> 
#> Computational engine: lightgbm

# fit
set.seed(1)
bt_reg_fit <- bt_reg_spec %>% fit(ridership ~ ., data = Chicago_train)
bt_reg_fit
#> parsnip model object
#> 
#> LightGBM Model (15 trees)
#> Objective: quantile
#> Fitted to dataset with 2 columns


predict(bt_reg_fit, Chicago_test)
#> # A tibble: 7 × 1
#>   .pred
#>   <dbl>
#> 1 21.0 
#> 2 21.5 
#> 3 21.5 
#> 4 21.4 
#> 5 19.9 
#> 6 10.8 
#> 7  9.62

Created on 2025-03-25 with reprex v2.1.1

From what I can tell, a single model can only solve for a single quantile at a time with both the xgb and lgbm engine backends. I think this is where quantregForest (for random forests) and quantreg (for linear models) differ from these other packages.
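To make that concrete, here is a minimal sketch of the kind of per-quantile looping a wrapper would need, continuing the reprex session above (the quantile levels are hypothetical):

# one lightgbm fit per requested quantile level
quantile_levels <- c(0.1, 0.5, 0.9)

fits <- lapply(quantile_levels, function(alpha) {
  boost_tree(trees = 15) %>%
    set_mode("regression") %>%
    set_engine("lightgbm", objective = "quantile", alpha = alpha) %>%
    fit(ridership ~ ., data = Chicago_train)
})

# one prediction column per quantile level
preds <- vapply(
  fits,
  function(f) predict(f, Chicago_test)$.pred,
  numeric(nrow(Chicago_test))
)
colnames(preds) <- paste0(".pred_q", quantile_levels)
preds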

@simonpcouch / @topepo -

At first this seemed like an easy item to register in bonsai. But I guess it would take some pre/post-processing to run n regressions for the n quantile_levels supplied to set_mode().

parsnip::set_model_mode(model = "boost_tree", mode = "quantile regression")

parsnip::set_model_engine(
  model = "boost_tree",
  mode = "quantile regression",
  eng = "lightgbm"
)

parsnip::set_dependency(
  model = "boost_tree",
  eng = "lightgbm",
  pkg = "lightgbm",
  mode = "quantile regression"
)

parsnip::set_dependency(
  model = "boost_tree",
  eng = "lightgbm",
  pkg = "bonsai",
  mode = "quantile regression"
)


parsnip::set_fit(
  model = "boost_tree",
  eng = "lightgbm",
  mode = "quantile regression",
  value = list(
    interface = "data.frame",
    protect = c("x", "y", "weights"),
    func = c(pkg = "bonsai", fun = "train_lightgbm"),
    defaults = list(
      verbose = -1,
      num_threads = 0,
      seed = quote(sample.int(10^5, 1)),
      deterministic = TRUE,
      objective = "quantile", 
      # open question: should alpha instead be driven by a
      # quantile_levels model argument?
      alpha = .8
    )
  )
)

parsnip::set_encoding(
  model = "boost_tree",
  mode = "quantile regression",
  eng = "lightgbm",
  options = list(
    predictor_indicators = "none",
    compute_intercept = FALSE,
    remove_intercept = FALSE,
    allow_sparse_x = TRUE
  )
)

parsnip::set_pred(
  model = "boost_tree",
  eng = "lightgbm",
  mode = "quantile regression",
  # open question: prediction type "quantile" was built specially for
  # linear quantreg; lgbm could need special pre/post handling here to
  # produce standard outputs:
  type = "numeric",
  value = list(
    pre = NULL,
    post = NULL,
    func = c(pkg = "bonsai", fun = "predict_lightgbm_regression_numeric"),
    args = list(
      object = quote(object),
      new_data = quote(new_data)
    )
  )
)

I would also second a swappable engine for probably::int_conformal_quantile(), but probably is not currently designed in a way that allows swapping engines like that.
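For context, a minimal sketch of the current int_conformal_quantile() interface as I understand it (the quantile model is hard-wired to quantregForest internally; the calibration split below is only illustrative, continuing the session above):

library(probably)
library(workflows)

# int_conformal_quantile() expects a fitted workflow
wf_fit <- workflow(ridership ~ ., bt_reg_spec) %>%
  fit(data = Chicago_train)

int_obj <- int_conformal_quantile(
  wf_fit,
  train_data = Chicago_train,
  cal_data   = Chicago_test,  # use a dedicated calibration set in practice
  level      = 0.9
)
predict(int_obj, Chicago_test)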
