
Supporting quantile regression with XGBoost #1143

Open

pbhogale opened this issue Jul 26, 2024 · 2 comments

@pbhogale
Feature

When one wants to do quantile regression in R, the available options are fairly limited; the quantreg and quantregForest packages are two of them.

On the other hand, since version 2.0.0, XGBoost also provides a quantile regression option, available in the R package as well.

  1. It would be useful in and of itself to provide an interface to this capability from boost_tree() so that quantile regression with xgboost is available out of the box.
  2. For probably::int_conformal_quantile(), it might also be useful to offer an option to use xgboost quantile regression or quantreg's rq() instead of regression forests.
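For reference, a minimal sketch of the raw xgboost call that such an interface would wrap, assuming a build of the R package (>= 2.0.0, not yet on CRAN) that exposes the "reg:quantileerror" objective; the x_train/y_train/x_test objects are hypothetical:

library(xgboost)  # needs >= 2.0.0 for "reg:quantileerror"

# hypothetical numeric predictor matrices and outcome vector
dtrain <- xgb.DMatrix(data = as.matrix(x_train), label = y_train)

fit <- xgb.train(
  params = list(
    objective      = "reg:quantileerror",  # pinball (quantile) loss
    quantile_alpha = 0.8                   # quantile level to fit
  ),
  data = dtrain,
  nrounds = 15
)

predict(fit, as.matrix(x_test))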
@joranE

joranE commented Jul 26, 2024

I am also extremely interested in quantile regression being added to parsnip, but that feature is (AFAICT) not available yet in the xgboost R package and will probably take some time yet. They are rewriting the entire R interface to the underlying xgboost library and won't enable quantile regression until that's done. Skimming the linked issue, I think there is still a lot of work to do.

@frankiethull

@pbhogale / @joranE -

While I have used "reg:quantileerror" with xgboost via the dev version, it is not on CRAN yet, meaning parsnip will not support it until then.

But one thing to note is that lightgbm does support "quantile" regression, as shown below.

library(tidymodels)
library(bonsai)
library(lightgbm)

tidymodels_prefer()
data(Chicago)

n <- nrow(Chicago)
Chicago <- Chicago %>% select(ridership, Clark_Lake, Quincy_Wells)

Chicago_train <- Chicago[1:(n - 7), ]
Chicago_test <- Chicago[(n - 6):n, ]


# spec ---
bt_reg_spec <- 
  boost_tree(trees = 15) %>% 
  set_mode("regression") %>% 
  # passing quantile regression args via the ellipsis to the xgboost engine:
  # needs at least dev version 2.0.0 for "reg:quantileerror"
  # set_engine("xgboost", objective = "reg:quantileerror", quantile_alpha = .8)
  
  # available in lightgbm CRAN version:
  set_engine("lightgbm", objective = "quantile", alpha = .8)
bt_reg_spec
#> Boosted Tree Model Specification (regression)
#> 
#> Main Arguments:
#>   trees = 15
#> 
#> Engine-Specific Arguments:
#>   objective = quantile
#>   alpha = 0.8
#> 
#> Computational engine: lightgbm

# fit
set.seed(1)
bt_reg_fit <- bt_reg_spec %>% fit(ridership ~ ., data = Chicago_train)
bt_reg_fit
#> parsnip model object
#> 
#> LightGBM Model (15 trees)
#> Objective: quantile
#> Fitted to dataset with 2 columns


predict(bt_reg_fit, Chicago_test)
#> # A tibble: 7 × 1
#>   .pred
#>   <dbl>
#> 1 21.0 
#> 2 21.5 
#> 3 21.5 
#> 4 21.4 
#> 5 19.9 
#> 6 10.8 
#> 7  9.62

Created on 2025-03-25 with reprex v2.1.1

From what I can tell, a single model can only solve for a single quantile at a time with both the xgb and lgbm engine backends. I think this is where quantregForest (for random forests) and quantreg (for linear models) differ from these other packages.
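To make that concrete, here is a minimal sketch of the kind of per-quantile looping a wrapper would need, continuing the reprex session above (the quantile levels are hypothetical):

# one lightgbm fit per requested quantile level
quantile_levels <- c(0.1, 0.5, 0.9)

fits <- lapply(quantile_levels, function(alpha) {
  boost_tree(trees = 15) %>%
    set_mode("regression") %>%
    set_engine("lightgbm", objective = "quantile", alpha = alpha) %>%
    fit(ridership ~ ., data = Chicago_train)
})

# one prediction column per quantile level
preds <- vapply(
  fits,
  function(f) predict(f, Chicago_test)$.pred,
  numeric(nrow(Chicago_test))
)
colnames(preds) <- paste0(".pred_q", quantile_levels)
preds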

@simonpcouch / @topepo -

At first this seemed like an easy item to register in bonsai. But I guess it would take some pre/post-processing to run n regressions for the n quantile_levels supplied to set_mode().

parsnip::set_model_mode(model = "boost_tree", mode = "quantile regression")

parsnip::set_model_engine(
  model = "boost_tree",
  mode = "quantile regression",
  eng = "lightgbm"
)

parsnip::set_dependency(
  model = "boost_tree",
  eng = "lightgbm",
  pkg = "lightgbm",
  mode = "quantile regression"
)

parsnip::set_dependency(
  model = "boost_tree",
  eng = "lightgbm",
  pkg = "bonsai",
  mode = "quantile regression"
)


parsnip::set_fit(
  model = "boost_tree",
  eng = "lightgbm",
  mode = "quantile regression",
  value = list(
    interface = "data.frame",
    protect = c("x", "y", "weights"),
    func = c(pkg = "bonsai", fun = "train_lightgbm"),
    defaults = list(
      verbose = -1,
      num_threads = 0,
      seed = quote(sample.int(10^5, 1)),
      deterministic = TRUE,
      objective = "quantile", 
      # open question: should alpha instead be driven by a
      # quantile_levels model argument?
      alpha = .8
    )
  )
)

parsnip::set_encoding(
  model = "boost_tree",
  mode = "quantile regression",
  eng = "lightgbm",
  options = list(
    predictor_indicators = "none",
    compute_intercept = FALSE,
    remove_intercept = FALSE,
    allow_sparse_x = TRUE
  )
)

parsnip::set_pred(
  model = "boost_tree",
  eng = "lightgbm",
  mode = "quantile regression",
  # open question: prediction type "quantile" was built specially for
  # linear quantreg; lgbm could need special pre/post handling here to
  # produce standard outputs:
  type = "numeric",
  value = list(
    pre = NULL,
    post = NULL,
    func = c(pkg = "bonsai", fun = "predict_lightgbm_regression_numeric"),
    args = list(
      object = quote(object),
      new_data = quote(new_data)
    )
  )
)

I would also second a swappable engine for probably::int_conformal_quantile(), but probably is not currently designed in a way that allows swapping engines like that.
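For context, a minimal sketch of the current int_conformal_quantile() interface as I understand it (the quantile model is hard-wired to quantregForest internally; the calibration split below is only illustrative, continuing the session above):

library(probably)
library(workflows)

# int_conformal_quantile() expects a fitted workflow
wf_fit <- workflow(ridership ~ ., bt_reg_spec) %>%
  fit(data = Chicago_train)

int_obj <- int_conformal_quantile(
  wf_fit,
  train_data = Chicago_train,
  cal_data   = Chicago_test,  # use a dedicated calibration set in practice
  level      = 0.9
)
predict(int_obj, Chicago_test)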
