
[R-package] how to extract cross validation evaluation metrics from lgb.cv()? #5571

Closed
yilinwu123 opened this issue Nov 4, 2022 · 5 comments


@yilinwu123

Dear Developers:

I am using the R package lightgbm. I first split my data into training and test sets. Then I want to perform 2-fold cross validation on the training data to tune parameters, and I would like to extract cross validation evaluation metrics such as 'auc'. For 2-fold cross validation there are two iterations, so there are two 'auc' values computed on the held-out data. May I ask how to extract the cross validation evaluation metric which is the mean of these two AUCs? I am not sure whether I can use $best_score to extract it.

Thanks a lot for your help!

@jameslamb jameslamb changed the title R package lgb.cv "how to extract cross validation evaluation metrics?" [R-package] how to extract cross validation evaluation metrics from lgb.cv()? Nov 5, 2022
@jameslamb
Collaborator

Thanks for using LightGBM! I can help with this.

First, it's important to understand that 2-fold cross validation does not mean "there are two iterations". It means that 2 separate LightGBM models will be trained, each on a different randomly-selected subset of the training data.

lgb.cv() returns a LightGBM CVBooster object. Metrics evaluated on the out-of-fold data (averaged across all models) are available in the $record_evals$valid attribute of that object.

Here's an example showing how to perform 2-fold cross validation for a binary classification problem, using 5 boosting rounds (i.e. training 5 trees in each model).

This code will work on the latest release of LightGBM on CRAN (v3.3.3) and with the latest development version from this git repository.

library(lightgbm)

# create a dataset for binary classification task "is this iris a setosa?"
data("iris")

feature_names <- c("Sepal.Length", "Sepal.Width", "Petal.Length", "Petal.Width")
target_col <- "is_setosa"

irisDF <- data.frame(iris)
irisDF[[target_col]] <- as.integer(iris[["Species"]] == "setosa")

dtrain <- lightgbm::lgb.Dataset(
    data = data.matrix(irisDF[, feature_names])
    , label = irisDF[[target_col]]
    , params = list(
        min_data_in_bin = 1L
    )
)

# perform 2-fold cross-validation
num_boosting_rounds <- 5L
cv_bst <- lightgbm::lgb.cv(
    data = dtrain
    , nrounds = num_boosting_rounds
    , nfold = 2L
    , params = list(
        objective = "binary"
        , metric = c("auc", "binary_error")
        , num_leaves = 2L
        , min_data_in_leaf = 1L
    )
    , showsd = FALSE
)

# view out-of-fold binary error and AUC (averaged over the two models)
cv_metrics <- cv_bst[["record_evals"]][["valid"]]
metricDF <- data.frame(
    iteration = seq_len(num_boosting_rounds)
    , auc = round(unlist(cv_metrics[["auc"]][["eval"]]), 3L)
    , binary_error = round(unlist(cv_metrics[["binary_error"]][["eval"]]), 3L)
)
metricDF

metricDF in this sample code contains the values of two metrics (AUC and binary error) averaged across both models.

  iteration   auc binary_error
1         1 0.995        0.333
2         2 0.995        0.333
3         3 0.995        0.193
4         4 0.995        0.007
5         5 0.995        0.007
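
Regarding the $best_score question: the CVBooster returned by lgb.cv() also carries $best_iter and $best_score fields. As a minimal sketch (reusing the cv_bst object from above, and assuming $best_score refers to the first metric listed in params):

# best iteration and the averaged out-of-fold score at that iteration
cv_bst$best_iter
cv_bst$best_score

# the same value can be pulled out of record_evals manually
unlist(cv_bst$record_evals$valid$auc$eval)[cv_bst$best_iter]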

Hope that helps! Sorry, we will try to improve the documentation on the record_evals attribute and its interpretation in the future.

@github-actions

github-actions bot commented Dec 5, 2022

This issue has been automatically closed because it has been awaiting a response for too long. When you have time to work with the maintainers to resolve this issue, please post a new comment and it will be re-opened. If the issue has been locked for editing by the time you return to it, please open a new issue and reference this one. Thank you for taking the time to improve LightGBM!

@github-actions github-actions bot closed this as completed Dec 5, 2022
@mocista

mocista commented Jul 26, 2023

The evaluation_log from an xgboost cross validation may look like this:

   iter train_rmse_mean train_rmse_std test_rmse_mean test_rmse_std
1:    1            3098           9.72           3052         41.00
2:    2            3002           9.22           3011         40.98

That means the training metrics are also there.
In lightgbm this does not seem to be the case; I only see the validation metrics (e.g. lgb_cv$record_evals$valid$rmse$eval).
Is there a way to also get the training metrics?

Thanks for your help!

@jmoralez
Collaborator

Hey @mocista. You'll be able to use the eval_train_metric argument of lgb.cv() (added in #4918) in lightgbm>=4.0.0; however, we're still in the process of publishing that version to CRAN. You can subscribe to #5987 to track when we do.
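
As a minimal sketch of what that would look like, assuming lightgbm>=4.0.0, a dtrain built as in the example above, and that the averaged training metrics land under record_evals$train:

# sketch: eval_train_metric requires lightgbm >= 4.0.0
cv_bst <- lightgbm::lgb.cv(
    data = dtrain
    , nrounds = 5L
    , nfold = 2L
    , params = list(
        objective = "binary"
        , metric = "auc"
    )
    , eval_train_metric = TRUE
)

# training metrics, averaged across folds (assumed location)
unlist(cv_bst$record_evals$train$auc$eval)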

Sorry for the inconvenience.

@github-actions

This issue has been automatically locked since there has not been any recent activity since it was closed. To start a new related discussion, open a new issue at https://github.com/microsoft/LightGBM/issues including a reference to this.

@github-actions github-actions bot locked as resolved and limited conversation to collaborators Oct 25, 2023