[python-package] use 2d collections for predictions, grads and hess in multiclass custom objective #4925
Conversation
@StrikerRUS could you take a quick look at this? It's still missing the pandas collections (grads and hess as dataframes, etc.) and lists of lists. Do you think those should be allowed as well?
@jmoralez
I don't have any objections to restricting the data types here to numpy arrays only, for the sake of great codebase simplification. I don't see any problems with calling
```python
if grad.ndim == 2:  # multi-class
    num_data = grad.shape[0]
    if weight.size != num_data:
        raise ValueError("grad and hess should be of shape [n_samples, n_classes]")
    weight = weight.reshape(num_data, 1)
    grad *= weight
    hess *= weight
```
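A standalone sketch of the broadcasting used above (hypothetical shapes, not the actual sklearn-interface variables): the per-row weights are reshaped into a column vector so that they broadcast across every class column.

```python
import numpy as np

# Hypothetical shapes for illustration: 4 samples, 3 classes.
n_samples, n_classes = 4, 3
grad = np.ones((n_samples, n_classes))
hess = np.full((n_samples, n_classes), 2.0)
weight = np.array([0.5, 1.0, 2.0, 0.0])  # one weight per sample

weight = weight.reshape(n_samples, 1)  # column vector, shape (4, 1)
grad *= weight                         # broadcasts to shape (4, 3)
hess *= weight                         # each row scaled by its sample weight
```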
grad and hess are weighted in the scikit-learn interface but not in basic.py; should we weight them there as well?
@guolinke Hey! Do you remember the reason for doing this?
I think with the interfaces in basic.py, the weighting is ultimately done on the C++ side. I'll double-check why weighting is done directly here in the sklearn interfaces.
@shiyu1994 You've merged this PR without resolving this conversation. Could you please share your findings about weighting derivatives here?
Sorry, I did not notice that what we discussed above is a customized objective; I thought we were discussing native objectives of LightGBM. I just noticed that weights with a customized objective function are not handled correctly in the Python API. See the code below.
```python
import numpy as np
import lightgbm as lgb

def fobj(preds, train_data):
    labels = train_data.get_label()
    return preds - labels, np.ones_like(labels)

def test():
    np.random.seed(123)
    num_data = 10000
    num_feature = 100
    train_X = np.random.randn(num_data, num_feature)
    train_y = np.mean(train_X, axis=-1)
    valid_X = np.random.randn(num_data, num_feature)
    valid_y = np.mean(valid_X, axis=-1)
    weights = np.random.rand(num_data)
    train_data = lgb.Dataset(train_X, train_y, weight=weights)
    valid_data = lgb.Dataset(valid_X, valid_y)
    params = {
        "verbose": 2,
        "metric": "rmse",
        "learning_rate": 0.2,
        "num_trees": 20,
    }
    booster = lgb.train(train_set=train_data, valid_sets=[valid_data],
                        valid_names=["valid"], params=params, fobj=fobj)

if __name__ == "__main__":
    test()
```
If we comment out the weights in the training dataset construction, the code produces exactly the same output as below.
[LightGBM] [Warning] Using self-defined objective function
[LightGBM] [Debug] Dataset::GetMultiBinFromAllFeatures: sparse rate 0.000000
[LightGBM] [Debug] init for col-wise cost 0.000012 seconds, init for row-wise cost 0.001697 seconds
[LightGBM] [Warning] Auto-choosing col-wise multi-threading, the overhead of testing was 0.004134 seconds.
You can set `force_col_wise=true` to remove the overhead.
[LightGBM] [Info] Total Bins 25500
[LightGBM] [Info] Number of data points in the train set: 10000, number of used features: 100
[LightGBM] [Warning] Using self-defined objective function
[LightGBM] [Debug] Trained a tree with leaves = 31 and depth = 7
[1] valid's rmse: 0.100043
[LightGBM] [Debug] Trained a tree with leaves = 31 and depth = 6
[2] valid's rmse: 0.099099
[LightGBM] [Debug] Trained a tree with leaves = 31 and depth = 8
[3] valid's rmse: 0.0982311
[LightGBM] [Debug] Trained a tree with leaves = 31 and depth = 7
[4] valid's rmse: 0.0974867
[LightGBM] [Debug] Trained a tree with leaves = 31 and depth = 7
[5] valid's rmse: 0.0965613
[LightGBM] [Debug] Trained a tree with leaves = 31 and depth = 7
[6] valid's rmse: 0.0957191
[LightGBM] [Debug] Trained a tree with leaves = 31 and depth = 8
[7] valid's rmse: 0.0949163
[LightGBM] [Debug] Trained a tree with leaves = 31 and depth = 6
[8] valid's rmse: 0.0940159
[LightGBM] [Debug] Trained a tree with leaves = 31 and depth = 7
[9] valid's rmse: 0.0932777
[LightGBM] [Debug] Trained a tree with leaves = 31 and depth = 8
[10] valid's rmse: 0.0924858
[LightGBM] [Debug] Trained a tree with leaves = 31 and depth = 7
[11] valid's rmse: 0.0917661
[LightGBM] [Debug] Trained a tree with leaves = 31 and depth = 8
[12] valid's rmse: 0.0909356
[LightGBM] [Debug] Trained a tree with leaves = 31 and depth = 7
[13] valid's rmse: 0.0901323
[LightGBM] [Debug] Trained a tree with leaves = 31 and depth = 8
[14] valid's rmse: 0.0894671
[LightGBM] [Debug] Trained a tree with leaves = 31 and depth = 8
[15] valid's rmse: 0.0888048
[LightGBM] [Debug] Trained a tree with leaves = 31 and depth = 8
[16] valid's rmse: 0.0881257
[LightGBM] [Debug] Trained a tree with leaves = 31 and depth = 8
[17] valid's rmse: 0.0874723
[LightGBM] [Debug] Trained a tree with leaves = 31 and depth = 7
[18] valid's rmse: 0.0868133
[LightGBM] [Debug] Trained a tree with leaves = 31 and depth = 8
[19] valid's rmse: 0.0862182
[LightGBM] [Debug] Trained a tree with leaves = 31 and depth = 7
[20] valid's rmse: 0.0856057
We need a separate PR to fix this.
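Until that fix lands, one hypothetical workaround sketch (not LightGBM's actual fix) is for the custom objective itself to fetch the weights from the Dataset and scale its own outputs; `Dataset.get_weight` returns `None` when no weights were set.

```python
import numpy as np

def weighted_fobj(preds, train_data):
    """Squared-error objective that applies sample weights itself (sketch)."""
    labels = train_data.get_label()
    grad = preds - labels
    hess = np.ones_like(labels)
    weight = train_data.get_weight()   # None if the Dataset has no weights
    if weight is not None:
        grad = grad * weight
        hess = hess * weight
    return grad, hess
```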
BTW, I found an additional issue: the latest master branch does not produce any evaluation results in the log as above; I got that log with version 3.3.2 instead. This is another issue we need to investigate.
I suddenly noticed that weights with customized objective function is not handled correctly for Python API.
Yes, that's what I noticed when I saw that in the scikit-learn interface grad and hess are weighted before boosting. I don't know if the reason is that in basic.py you get a Dataset, so you have access to the weights and can apply them in the objective function, while in sklearn you can't; if that's the case, it's worth mentioning in the docs.
The latest master branch did not produce any evaluation results in the log as above.
I believe this is because callbacks are now preferred (#4878); to log the evaluation you have to specify callbacks=[lgb.log_evaluation(1)].
```diff
@@ -3159,8 +3165,8 @@ def eval(self, data, name, feval=None):
     is_higher_better : bool
         Is eval result higher better, e.g. AUC is ``is_higher_better``.

-    For multi-class task, the preds is group by class_id first, then group by row_id.
-    If you want to get i-th row preds in j-th class, the access way is preds[j * num_data + i].
+    For multi-class task, preds are a [n_samples, n_classes] numpy 2-D array,
```
Could you please also check that a customized evaluation function works correctly with multi-class? I've read the code, and it seems that the customized evaluation function will ultimately take the output of __inner_predict as input, which is of shape n_sample * n_class. This is inconsistent with the hint here.

LightGBM/python-package/lightgbm/basic.py, line 3840 in 820ae7e:

```python
feval_ret = eval_function(self.__inner_predict(data_idx), cur_data)
```
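The layout mismatch can be sketched with illustrative sizes: per the old docstring, the flat predictions are grouped by class first, so the score for row i, class j lives at index `j * num_data + i`, and recovering the 2-D view is a reshape plus transpose.

```python
import numpy as np

# Illustrative sizes (assumed, not LightGBM internals): 4 samples, 3 classes.
n_samples, n_classes = 4, 3
flat = np.arange(n_samples * n_classes, dtype=float)  # stand-in for raw scores

# Class-major flat layout -> (n_samples, n_classes) 2-D view.
preds_2d = flat.reshape(n_classes, n_samples).T

# Row 1, class 2 in the 2-D view matches flat[2 * n_samples + 1].
assert preds_2d[1, 2] == flat[2 * n_samples + 1]
```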
Hmm, you're right. I've only modified the portions required for fobj; I'll work on feval.
I moved the reshaping to __inner_predict in 5a56a30 so that it works in both places, and added a test to check that we get the same result using the built-in log loss and computing it manually.
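The "compute it manually" half of such a check can be sketched as follows (assumed names; `raw_scores` has the new (n_samples, n_classes) shape that a multiclass feval now receives):

```python
import numpy as np

def manual_multi_logloss(raw_scores, labels):
    """Multiclass log loss from 2-D raw scores: softmax, then mean NLL (sketch)."""
    # numerically stable softmax over the class axis
    shifted = np.exp(raw_scores - raw_scores.max(axis=1, keepdims=True))
    prob = shifted / shifted.sum(axis=1, keepdims=True)
    # negative log-likelihood of each row's true class, averaged over rows
    return -np.log(prob[np.arange(len(labels)), labels]).mean()

raw_scores = np.array([[2.0, 0.0, 0.0],
                       [0.0, 2.0, 0.0]])
labels = np.array([0, 1])
loss = manual_multi_logloss(raw_scores, labels)
```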
@jmoralez Thank you for working on this! I just left a comment about the customized evaluation function. Other parts LGTM.
```diff
@@ -2999,6 +2998,9 @@ def update(self, train_set=None, fobj=None):
     if not self.__set_objective_to_none:
         self.reset_parameter({"objective": "none"}).__set_objective_to_none = True
     grad, hess = fobj(self.__inner_predict(0), self.train_set)
+    if self.num_model_per_iteration() > 1:
```
Is it safe to use _Booster__num_class here instead, to avoid the lib call? I don't fully understand where __num_class gets converted to _Booster__num_class.
Yes, it is safe, since Booster.__num_class comes from the lib call. See

LightGBM/python-package/lightgbm/basic.py, lines 2598 to 2601 in a1fbe84:

```python
_safe_call(_LIB.LGBM_BoosterGetNumClasses(
    self.handle,
    ctypes.byref(out_num_class)))
self.__num_class = out_num_class.value
```

and lines 2616 to 2620 in a1fbe84:

```python
out_num_class = ctypes.c_int(0)
_safe_call(_LIB.LGBM_BoosterGetNumClasses(
    self.handle,
    ctypes.byref(out_num_class)))
self.__num_class = out_num_class.value
```
I mean that the attribute changes name. I see it's used as self.__num_class in some places, but if I add a breakpoint at that line the object doesn't have that attribute; it has self._Booster__num_class instead, which is the part that confuses me. Do you think the performance impact of calling the lib on each iteration is noticeable, and that this should be changed to use the attribute instead?
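The rename is Python's standard name mangling: inside a class body, any attribute spelled `__name` is rewritten by the compiler to `_ClassName__name`. A minimal sketch (a toy class, not LightGBM's Booster):

```python
class Booster:
    def __init__(self):
        self.__num_class = 3          # actually stored as `_Booster__num_class`

    def num_class(self):
        return self.__num_class       # mangled identically, so this resolves

b = Booster()
assert b.num_class() == 3
assert b._Booster__num_class == 3     # external code must use the mangled name
assert not hasattr(b, "__num_class")  # the unmangled name does not exist outside
```

This is why a debugger stopped outside the class body only sees `_Booster__num_class`.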
LGTM. Waiting for the CI tests to finish.
Approved, provided the CI tests pass.
Closes #4046.

This makes the predictions input for a custom objective a (num_data, num_class) matrix and allows the user to return matrices of the same shape as gradients and hessians.
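Under the new convention, a multiclass custom objective can be sketched as below (assumed names, and the usual softmax cross-entropy with a diagonal hessian approximation; in the real API the second argument would be the training Dataset, but labels are passed directly here to keep the sketch self-contained):

```python
import numpy as np

def multiclass_fobj(preds, labels, num_class):
    """Softmax cross-entropy objective returning 2-D grad/hess (sketch)."""
    # preds arrives as a (num_data, num_class) matrix under the new convention
    shifted = np.exp(preds - preds.max(axis=1, keepdims=True))
    prob = shifted / shifted.sum(axis=1, keepdims=True)
    onehot = np.eye(num_class)[labels]
    grad = prob - onehot               # shape (num_data, num_class)
    hess = prob * (1.0 - prob)         # diagonal approximation, same shape
    return grad, hess
```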