
Divide metric into train_metric and valid_metric #1912

Closed
wants to merge 21 commits

Conversation

@henry0312 commented Dec 17, 2018

Close #1911

@henry0312 commented Dec 17, 2018

sample1.py
(eval_set contains only the valid set)

import numpy
import lightgbm
from lightgbm.sklearn import LGBMRegressor, LGBMModel, LGBMRanker
from sklearn.datasets import make_regression
from sklearn.model_selection import train_test_split
from sklearn.preprocessing import MinMaxScaler


X, y = make_regression(n_samples=100, n_features=10, random_state=42)
scaler = MinMaxScaler((0, 20))
y = scaler.fit_transform(y.reshape(-1, 1))
y = y.reshape(-1)

X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.3, random_state=42)
y_test = y_test.astype(int)  # label should be int type
group_train = [10 for _ in range(7)]
group_test = [10 for _ in range(3)]

# create dataset for lightgbm
lgb_train = lightgbm.Dataset(X_train, y_train, group=group_train)
lgb_eval = lightgbm.Dataset(X_test, y_test, group=group_test)

# specify your configurations as a dict
params = {
    'boosting_type': 'gbdt',
    'objective': 'regression',
    'valid_metric': 'ndcg',
    'eval_at': [1, 3, 5],
    'label_gain': list(range(0, numpy.max(y_test) + 1)),
    'num_leaves': 15,
    'learning_rate': 0.01,
    'feature_fraction': 0.9,
    'bagging_fraction': 0.8,
    'bagging_freq': 5,
    'seed': 42,
}

gbm = lightgbm.train(params,
                     lgb_train,
                     num_boost_round=1000,
                     valid_sets=[lgb_eval],
                     early_stopping_rounds=10)

sample2.py
(eval_set contains both the train and valid sets)

import numpy
import lightgbm
from lightgbm.sklearn import LGBMRegressor, LGBMModel, LGBMRanker
from sklearn.datasets import make_regression
from sklearn.model_selection import train_test_split
from sklearn.preprocessing import MinMaxScaler


X, y = make_regression(n_samples=100, n_features=10, random_state=42)
scaler = MinMaxScaler((0, 20))
y = scaler.fit_transform(y.reshape(-1, 1))
y = y.reshape(-1)

X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.3, random_state=42)
y_test = y_test.astype(int)  # label should be int type
group_train = [10 for _ in range(7)]
group_test = [10 for _ in range(3)]

# create dataset for lightgbm
lgb_train = lightgbm.Dataset(X_train, y_train, group=group_train)
lgb_eval = lightgbm.Dataset(X_test, y_test, group=group_test)

# specify your configurations as a dict
params = {
    'boosting_type': 'gbdt',
    'objective': 'regression',
    'train_metric': ['l2', 'l1'],
    'valid_metric': ['ndcg'],
    'eval_at': [3, 5],
    'label_gain': list(range(0, numpy.max(y_test) + 1)),
    'num_leaves': 15,
    'learning_rate': 0.01,
    'feature_fraction': 0.9,
    'bagging_fraction': 0.8,
    'bagging_freq': 5,
    'seed': 42,
}

gbm = lightgbm.train(params,
                     lgb_train,
                     num_boost_round=1000,
                     valid_sets=[lgb_train, lgb_eval],
                     early_stopping_rounds=10)

Both sample scripts work.

@guolinke

@henry0312 Can this be backward compatible?

@henry0312

@guolinke No, this change will break backward compatibility, so I think we will need to release v3.x.

@StrikerRUS

@henry0312 Can you please provide a short description of the changes on the cpp side, to help indicate other places where something should be updated?

@henry0312

@StrikerRUS Sure, please wait for a day.

@guolinke

@henry0312 I think it is better to keep this backward compatible, as this change is for some rare cases.

  1. metric and its aliases will be set to both train_metric and valid_metric
  2. train_metric applies only to the training dataset
  3. valid_metric applies only to the validation dataset

@henry0312

  1. metric and its aliases will be set to both train_metric and valid_metric

Is it possible?
I don't know how to implement it...

@guolinke

@henry0312
I think this could be done by some conditions, like:

if `metric` in params:
  set metric to training data
  set metric to validation data
else:
  if `train_metric` in params:
    set metric to training data
  if `valid_metric` in params:
    set metric to validation data
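
A minimal Python sketch of that priority rule (only an illustration of the idea above, not the actual implementation; resolve_metrics and the returned dict are hypothetical):

def resolve_metrics(params):
    # `metric` (or any of its aliases, assumed already normalized to `metric` here)
    # wins and applies to both datasets, which keeps backward compatibility
    if 'metric' in params:
        return {'train': params['metric'], 'valid': params['metric']}
    resolved = {}
    if 'train_metric' in params:
        resolved['train'] = params['train_metric']
    if 'valid_metric' in params:
        resolved['valid'] = params['valid_metric']
    return resolved

For example, resolve_metrics({'metric': 'l2', 'valid_metric': 'ndcg'}) reports 'l2' for both datasets, while resolve_metrics({'valid_metric': 'ndcg'}) reports a metric only for the validation data.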

@henry0312 commented Dec 18, 2018 via email

`python helpers/parameter_generator.py`
@henry0312

http://www.csiss.org/SPACE/workshops/2004/SAC/files/fisher.pdf (Fisher (1958)) is not found.
Perhaps www.csiss.org is down now...
I don't know which work Fisher, 1958 refers to.

henry0312 changed the title from "Divide metric into train_metric and valid_metric" to "[WIP] Divide metric into train_metric and valid_metric" on Dec 18, 2018
@StrikerRUS

@henry0312 Fisher is OK now. I've re-run the test.

@henry0312

@StrikerRUS

Can you please provide a short description of the changes on the cpp side, to help indicate other places where something should be updated?

I divided the metric parameter into train_metric and valid_metric.
train_metric is used for evaluation on the train dataset and valid_metric is used for evaluation on the valid dataset.
(Currently, the metric parameter is used for both the train and valid datasets.)

We can pass different metrics to train_metric and valid_metric individually
(e.g. train_metric=l2 and valid_metric=ndcg)

cf. #1912 (comment)
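
For reference, a minimal params dict using the split proposed in this PR (train_metric and valid_metric are the new parameter names introduced here, not parameters of released LightGBM):

params = {
    'objective': 'regression',
    'train_metric': 'l2',    # evaluated on the training Dataset
    'valid_metric': 'ndcg',  # evaluated on the validation Dataset(s)
    'eval_at': [1, 3, 5],    # positions at which ndcg is evaluated
}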

@henry0312

I'm working on backward compatibility. It will take a while.

@StrikerRUS

@henry0312 Thanks a lot for the explanation! Is the default behavior (picking the metric corresponding to the objective in case no metric is passed in params) preserved for both metrics?

henry0312 changed the title from "[WIP] Divide metric into train_metric and valid_metric" to "Divide metric into train_metric and valid_metric" on Dec 20, 2018
@henry0312

@guolinke @StrikerRUS ready for review :)

@henry0312

@StrikerRUS

Is the default behavior (picking the metric corresponding to the objective in case no metric is passed in params) preserved for both metrics?

Yes, but the default value is set on the C++ side.

https://github.com/Microsoft/LightGBM/pull/1912/files#diff-2060c568af2dc566b6e950554a179e00R470

@StrikerRUS

@henry0312 Great!

As @guolinke has already mentioned, this use case is rather uncommon and therefore we should try to preserve the current behavior for ordinary users (backward compatibility). Due to this, I suppose that metric (and its aliases) should have priority over train_metric and valid_metric. Also, I believe that we don't need new arguments in the Python code; they will not be used often and will simply confuse users, because one can achieve the same result by using params['train_metric']/params['valid_metric'].
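
A rough sketch of what "the same result via params" could look like (assuming this PR's train_metric/valid_metric keys, which are part of the proposal and not of released LightGBM):

from lightgbm import LGBMRegressor

params = {'num_leaves': 15, 'learning_rate': 0.01}
params['train_metric'] = 'l2'
params['valid_metric'] = 'ndcg'
# extra keyword arguments are forwarded to the Booster's params,
# so no new fit() arguments would be needed
model = LGBMRegressor(**params)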

@henry0312 commented Dec 20, 2018

@StrikerRUS

and therefore we should try to preserve the current behavior for ordinary users (backward compatibility).

I think this change has backward compatibility because the default value for metric is set in C++ in the same way as the current code.
In other words, regression will be set if using LGBMRegressor, binary_error will be set if using LGBMClassifier, and ndcg will be set if using LGBMRanker.

See:

@@ -460,24 +467,12 @@ def fit(self, X, y,
            feval = _eval_function_wrapper(eval_metric)
        else:
            feval = None
        # register default metric for consistency with callable eval_metric case
Collaborator

I think this change has backward compatibility because the default value for metric is set in C++ in the same way as the current code.

Yeah, but I'm talking about the Python side (we had many issues about inconsistent behavior on the Python side and this code solved them). So, this code should be left as is to prevent future changes in the current behavior. Please see our old conversation (setting the default metric only on the cpp side is not enough): #1589 (comment)

@StrikerRUS commented Jan 15, 2019

@henry0312 Please fix this (leave old code) to get this PR merged.
I think that, in case we don't introduce new arguments (please see #1912 (review)), it's enough just to add something like this after the old code:

if params.get('train_metric') or params.get('valid_metric'):
    params['metric'] = 'None'

to support this new behavior

if metric is set, train_metric/valid_metric will be overwritten

I believe that this is the most painless and least odd-behavior-introducing way to support the new train/valid metric behavior 😃

Collaborator

@henry0312 @guolinke I've decided to dive into the "metrics hell" again and started to write a big test which reflects the current behavior of metrics handling: 8265469 (metrics_test branch). The next step is sklearn - the hardest part. After this, I'll create a PR.

Collaborator

PR has been created: #1954.

@henry0312 I hope it will help you abandon the idea of creating more arguments for metrics 😄
I like the idea of separate metrics a lot, but seriously, the corresponding keys in params are quite enough.

@@ -330,7 +330,8 @@ def fit(self, X, y,
             sample_weight=None, init_score=None, group=None,
             eval_set=None, eval_names=None, eval_sample_weight=None,
             eval_class_weight=None, eval_init_score=None, eval_group=None,
-            eval_metric=None, early_stopping_rounds=None, verbose=True,
+            eval_metric=None, eval_train_metric=None, eval_valid_metric=None,
@StrikerRUS commented Dec 20, 2018

I don't think that these new args are really needed. One can achieve the needed result in these rare cases with params.

@henry0312

I don't think that these new args are really needed.
One can achieve the needed result in these rare cases with params.

I don't think these are rare cases, because CatBoost, which is one of the most famous GBDT libraries and was accepted at NIPS 2018, supports setting individual metrics for training and validation.
This PR (change) allows us to set an evaluation metric different from the objective function, which is useful and needed, and that is why CatBoost implemented it.

cf. eval_metric in
https://tech.yandex.com/catboost/doc/dg/concepts/python-reference_parameters-list-docpage/#python-reference_parameters-list.

However, as you pointed out, eval_metric, eval_train_metric and eval_valid_metric are confusing for many users...

Collaborator

@guolinke What do you think? My position is that the corresponding keys in params are enough, because new arguments introduce a huge number of possible combinations which is very hard to support.

I remember we were getting rid of some arguments in favor of their aliases in params some time ago (the max_bin argument is the first thing that comes to my mind). So, new args are against this policy.

Collaborator

Indeed, I think having 3 similar parameters is confusing. BTW, eval_metric can be a function, while eval_train_metric and eval_valid_metric can only be strings; this inconsistency will cause further confusion...
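
To illustrate the inconsistency: eval_metric in the sklearn wrapper already accepts a callable, roughly as sketched below (signature as documented for the sklearn API; the commented fit() call assumes the objects from the earlier sample scripts), whereas the proposed eval_train_metric/eval_valid_metric would accept only strings:

import numpy as np

def rmse_metric(y_true, y_pred):
    # custom metric: returns (eval_name, eval_result, is_higher_better)
    return 'rmse', float(np.sqrt(np.mean((y_true - y_pred) ** 2))), False

# gbm.fit(X_train, y_train, eval_set=[(X_test, y_test)], eval_metric=rmse_metric)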

@henry0312

@guolinke It has been a while since I sent this PR, but I still need this feature.
Could you check the changes, especially in C++?

@guolinke

@henry0312 sorry, I have been busy recently. I will check it now.

@@ -1553,7 +1552,6 @@ def __init__(self, params=None, train_set=None, model_file=None, silent=False):
        # buffer for inner predict
        self.__inner_predict_buffer = [None]
        self.__is_predicted_cur_iter = [False]
-       self.__get_eval_info()
Collaborator

Is the reload deleted?
I think this will somehow affect the performance.

@@ -426,19 +426,20 @@ LGBM_SE LGBM_BoosterGetEvalNames_R(LGBM_SE handle,
    LGBM_SE buf_len,
    LGBM_SE actual_len,
    LGBM_SE eval_names,
-   LGBM_SE call_state) {
+   LGBM_SE call_state,
+   LGBM_SE data_idx) {
Collaborator

this will break the R package; you should update the R files accordingly.
BTW, call_state should be the last argument.

@@ -82,7 +82,8 @@ def objective_ls(y_true, y_pred):
     X, y = load_boston(True)
     X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.1, random_state=42)
     gbm = lgb.LGBMRegressor(n_estimators=50, silent=True, objective=objective_ls)
-    gbm.fit(X_train, y_train, eval_set=[(X_test, y_test)], early_stopping_rounds=5, verbose=False)
+    gbm.fit(X_train, y_train, eval_set=[(X_test, y_test)], eval_metric='regression',
+            early_stopping_rounds=5, verbose=False)
Collaborator

can we test these with only metric, so that we can check backward compatibility?

  }
  return ret;
}

-int GetEvalNames(char** out_strs) const {
+int GetEvalNames(int data_idx, char** out_strs) const {
Collaborator

do we have a per-data metric?

|| (!objective_type_multiclass && metric_type_multiclass)) {
Log::Fatal("Multiclass objective and metrics don't match");
}
}
Collaborator

we can pack the checking into a function and avoid the repeated code.

@henry0312

@guolinke Thank you for your quick review even though you are busy.
I'll also check and reply as soon as possible.

BTW, I put some ideas on #1934.

@StrikerRUS left a comment

This is protection from an accidental merge before #1912 (review) is resolved, because it breaks the consistency between the cases with and without a custom callable metric.

@StrikerRUS

@henry0312 Hi!
Can you please tell what is the fate of this PR?

@jameslamb

@henry0312 Hi!
Can you please tell what is the fate of this PR?

@henry0312 what is the status of this PR?

@henry0312

I think this feature will be needed, but I have to rewrite it, since it has taken too long (I'm sorry) and there are conflicts.
For now, I will close this PR.

henry0312 closed this Nov 17, 2019
@StrikerRUS

Ah, you closed it! I wanted to congratulate you on the one-year anniversary next month!.. 😃

lock bot locked as resolved and limited conversation to collaborators Mar 10, 2020
Successfully merging this pull request may close these issues.

objective = regression and metric = ndcg