After fitting a lightGBM model with tidymodels and treesnip, I can take the fitted workflow and make predictions on new data without any problems. However, after saving the fitted model as an ".rds" file, closing the session, and loading the ".rds" file in a new session, the R session crashes when I try to generate a prediction.
This happens only with the lightGBM model; no other model type shows this problem. Here is a reproducible example:
library(dplyr)
library(parsnip)
library(rsample)
library(yardstick)
library(recipes)
library(workflows)
library(dials)
library(tune)
library(treesnip)
data = bind_rows(iris, iris, iris, iris, iris, iris, iris)
set.seed(2)
initial_split <- initial_split(data, prop = 0.75)
train <- training(initial_split)
test <- testing(initial_split)
initial_split
#> <Analysis/Assess/Total>
#> <788/262/1050>
recipe <- recipe(Sepal.Length ~ ., data = data) %>%
  step_dummy(all_nominal(), -all_outcomes())
model <- boost_tree(
  mtry = 3,
  trees = 1000,
  min_n = tune(),
  tree_depth = tune(),
  loss_reduction = tune(),
  learn_rate = tune(),
  sample_size = 0.75
) %>%
  set_mode("regression") %>%
  set_engine("lightgbm")
wf <- workflow() %>%
  add_model(model) %>%
  add_recipe(recipe)
wf
#> ══ Workflow ════════════════════════════════════════════════════════════════════
#> Preprocessor: Recipe
#> Model: boost_tree()
#>
#> ── Preprocessor ────────────────────────────────────────────────────────────────
#> 1 Recipe Step
#>
#> ● step_dummy()
#>
#> ── Model ───────────────────────────────────────────────────────────────────────
#> Boosted Tree Model Specification (regression)
#>
#> Main Arguments:
#> mtry = 3
#> trees = 1000
#> min_n = tune()
#> tree_depth = tune()
#> learn_rate = tune()
#> loss_reduction = tune()
#> sample_size = 0.75
#>
#> Computational engine: lightgbm
# resamples
resamples <- vfold_cv(train, v = 3)
# grid
grid <- parameters(model) %>%
  finalize(train) %>%
  grid_random(size = 10)
head(grid)
#> # A tibble: 6 x 4
#> min_n tree_depth learn_rate loss_reduction
#> <int> <int> <dbl> <dbl>
#> 1 2 4 0.000282 0.0000402
#> 2 13 10 0.00333 13.0
#> 3 32 11 0.000000585 0.000106
#> 4 32 7 0.000258 0.163
#> 5 31 13 0.0000000881 0.000479
#> 6 19 14 0.000000167 0.00174
# grid search
tune_grid <- wf %>%
  tune_grid(
    resamples = resamples,
    grid = grid,
    control = control_grid(verbose = FALSE),
    metrics = metric_set(rmse)
  )
# select the best hyperparameters found
best_params <- select_best(tune_grid, "rmse")
wf <- wf %>% finalize_workflow(best_params)
wf
#> ══ Workflow ════════════════════════════════════════════════════════════════════
#> Preprocessor: Recipe
#> Model: boost_tree()
#>
#> ── Preprocessor ────────────────────────────────────────────────────────────────
#> 1 Recipe Step
#>
#> ● step_dummy()
#>
#> ── Model ───────────────────────────────────────────────────────────────────────
#> Boosted Tree Model Specification (regression)
#>
#> Main Arguments:
#> mtry = 3
#> trees = 1000
#> min_n = 13
#> tree_depth = 10
#> learn_rate = 0.00333377440294304
#> loss_reduction = 13.0320661814971
#> sample_size = 0.75
#>
#> Computational engine: lightgbm
# last fit
last_fit <- last_fit(wf, initial_split)
# metrics
collect_metrics(last_fit)
#> # A tibble: 2 x 3
#> .metric .estimator .estimate
#> <chr> <chr> <dbl>
#> 1 rmse standard 0.380
#> 2 rsq standard 0.837
# fit to predict new data
model_fit <- fit(wf, data)
#> [LightGBM] [Warning] Auto-choosing row-wise multi-threading, the overhead of testing was 0.000020 seconds.
#> You can set `force_row_wise=true` to remove the overhead.
#> And if memory is not enough, you can set `force_col_wise=true`.
#> [LightGBM] [Info] Total Bins 95
#> [LightGBM] [Info] Number of data points in the train set: 1050, number of used features: 5
#> [LightGBM] [Info] Start training from score 5.843333
#> [LightGBM] [Warning] No further splits with positive gain, best gain: -inf
#> (warning repeated for subsequent iterations; output truncated)
predicciones = predict(model_fit, iris)
head(predicciones)
#> # A tibble: 6 x 1
#> .pred
#> <dbl>
#> 1 5.13
#> 2 5.12
#> 3 5.12
#> 4 5.12
#> 5 5.13
#> 6 5.25
# save model
saveRDS(model_fit, "model_fit.rds")
After saving the model, I close the session and load the model in a new session.
model <- readRDS("model_fit.rds")
predicciones = predict(model, iris)
When I try to generate the prediction, the R session crashes. An alternative that mostly works is to pull the model fit out of the workflow and save it with lightGBM's own method; however, I then lose everything else stored in the workflow (the recipe and preprocessing). I will be attentive to any help or suggestions.
library(lightgbm)
# extract the parsnip fit from the workflow, then save the raw booster
# with lightGBM's native serialization
pull_lightgbm <- pull_workflow_fit(model_fit)
lgb.save(pull_lightgbm$fit, "lightgbm.model")
model <- lgb.load("lightgbm.model")
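Since the raw booster no longer carries the workflow's preprocessing, new data has to be prepared by hand before predicting with it. A minimal sketch of what that looks like with the recipe from the reprex above (object names are the ones defined earlier; this is an illustration, not a tested fix):

```r
library(recipes)
library(lightgbm)

# Re-apply the preprocessing the workflow used to handle automatically:
# prep() estimates the recipe, bake() produces the dummy-encoded predictors.
prepped <- prep(recipe, training = data)
new_x <- bake(prepped, new_data = iris, all_predictors())

# The reloaded booster predicts from a plain numeric matrix.
preds <- predict(model, as.matrix(new_x))
head(preds)
```

This is exactly the bookkeeping the workflow object exists to avoid, which is why losing it is painful.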
Thank you @rafzamb, I was able to reproduce the crash. It seems the lightgbm booster holds a handle to a C++ object that saveRDS() cannot serialize, so it is lost when the session closes.
One workaround would be to save both the workflow and the lgb model (as you did) and then reassemble them like this:
model <- readRDS("model_fit.rds")
model_lgb <- lightgbm::lgb.load("lightgbm.model")
model$fit$fit$fit <- model_lgb
It is obviously not ideal. But at the same time, it would be odd to expect saveRDS() to do anything beyond what it is supposed to do (such as storing a side file like an lgb.Booster). We have to think about a good way to solve this issue!
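Until this is handled inside treesnip, the two-file workaround can be wrapped in a pair of helpers so callers don't have to remember the `$fit$fit$fit` surgery. The function names (`save_lgb_workflow()`, `load_lgb_workflow()`) are hypothetical, not part of any package; this is only a sketch of the idea:

```r
library(workflows)
library(lightgbm)

# Hypothetical helper: persist a fitted lightgbm workflow as two files --
# the workflow shell via saveRDS() and the booster via lgb.save().
save_lgb_workflow <- function(wflow, rds_path, booster_path) {
  lgb.save(pull_workflow_fit(wflow)$fit, booster_path)
  saveRDS(wflow, rds_path)
}

# Hypothetical helper: reload both files and splice the fresh booster
# back into the workflow, replacing the invalid handle.
load_lgb_workflow <- function(rds_path, booster_path) {
  wflow <- readRDS(rds_path)
  wflow$fit$fit$fit <- lgb.load(booster_path)
  wflow
}

# Usage:
# save_lgb_workflow(model_fit, "model_fit.rds", "lightgbm.model")
# model <- load_lgb_workflow("model_fit.rds", "lightgbm.model")
# predict(model, iris)
```

A proper fix would likely hook into how the engine stores the fit, but this keeps the full workflow (recipe included) usable across sessions in the meantime.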
Created on 2020-11-16 by the reprex package (v0.3.0)