[python] Weights is ignored when customized objective function is used #5027

Closed
Tracked by #5153
shiyu1994 opened this issue Feb 23, 2022 · 12 comments · Fixed by #5211
@shiyu1994
Collaborator

Description

Sample weights are not applied to the gradients and hessians returned by a customized objective function in the Python API.

Reproducible example

import numpy as np
import lightgbm as lgb

def fobj(preds, train_data):
    labels = train_data.get_label()
    return preds - labels, np.ones_like(labels)

def test():
    np.random.seed(123)
    num_data = 10000
    num_feature = 100
    train_X = np.random.randn(num_data, num_feature)
    train_y = np.mean(train_X, axis=-1)
    valid_X = np.random.randn(num_data, num_feature)
    valid_y = np.mean(valid_X, axis=-1)
    weights = np.random.rand(num_data)
    train_data = lgb.Dataset(train_X, train_y, weight=weights)  # removing weight=weights produces the same output
    valid_data = lgb.Dataset(valid_X, valid_y)
    params = {
        "verbose": 2,
        "metric": "rmse",
        "learning_rate": 0.2,
        "num_trees": 20,
    }
    booster = lgb.train(train_set=train_data, valid_sets=[valid_data], valid_names=["valid"], params=params, fobj=fobj)

if __name__ == "__main__":
    test()

LightGBM version or commit hash:
Version 3.3.2

Command(s) you used to install LightGBM
Install from source

@shiyu1994 shiyu1994 added the bug label Feb 23, 2022
@jmoralez
Collaborator

Moving the conversation from #4925 (comment) here. This may be intentional, because the weights are available in the custom objective function through the training API and not through scikit-learn's, but it'd be nice to clarify this.

@StrikerRUS
Collaborator

StrikerRUS commented Feb 23, 2022

This may be intentional, because the weights are available in the custom objective function through the training API and not through scikit-learn's, but it'd be nice to clarify this.

I think this is an oversight, because one form of the custom evaluation function accepts weights in the scikit-learn API:

Expects a callable with following signatures:
``func(y_true, y_pred)``,
``func(y_true, y_pred, weight)``
or ``func(y_true, y_pred, weight, group)``

I guess the same can be done for custom objective function.

@jmoralez
Collaborator

Hmm now I'm more confused because for the objective function weights aren't allowed

labels = dataset.get_label()
argc = len(signature(self.func).parameters)
if argc == 2:
    grad, hess = self.func(labels, preds)
elif argc == 3:
    grad, hess = self.func(labels, preds, dataset.get_group())
else:
    raise TypeError(f"Self-defined objective function should have 2 or 3 arguments, got {argc}")

but for eval they are

labels = dataset.get_label()
argc = len(signature(self.func).parameters)
if argc == 2:
    return self.func(labels, preds)
elif argc == 3:
    return self.func(labels, preds, dataset.get_weight())
elif argc == 4:
    return self.func(labels, preds, dataset.get_weight(), dataset.get_group())
else:
    raise TypeError(f"Self-defined eval function should have 2, 3 or 4 arguments, got {argc}")

@StrikerRUS
Collaborator

Hmm now I'm more confused because for the objective function weights aren't allowed

Yeah, exactly! I'm proposing to allow passing weights for the objective function.
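
For illustration, the proposal could look roughly like the dispatch below, which mirrors the eval wrapper quoted above. This is a hypothetical sketch, not the code that was merged: `call_objective`, `l2`, and `weighted_l2` are made-up names, and treating the third argument as `weight` (rather than `group`, as the current objective wrapper does) is an assumption.

```python
from inspect import signature


def call_objective(func, labels, preds, weight=None, group=None):
    """Hypothetical dispatcher: pass weight/group only if func declares them."""
    argc = len(signature(func).parameters)
    if argc == 2:
        return func(labels, preds)
    if argc == 3:
        return func(labels, preds, weight)
    if argc == 4:
        return func(labels, preds, weight, group)
    raise TypeError(
        f"Self-defined objective function should have 2, 3 or 4 arguments, got {argc}"
    )


def l2(y_true, y_pred):
    # Plain squared error: weights are never seen, so they are never applied.
    grad = [p - t for t, p in zip(y_true, y_pred)]
    hess = [1.0 for _ in y_true]
    return grad, hess


def weighted_l2(y_true, y_pred, weight):
    # Same objective, but the user opts in to weighting.
    grad, hess = l2(y_true, y_pred)
    return (
        [g * w for g, w in zip(grad, weight)],
        [h * w for h, w in zip(hess, weight)],
    )


unweighted = call_objective(l2, [1.0, 2.0], [2.0, 2.0], weight=[0.5, 2.0])
weighted = call_objective(weighted_l2, [1.0, 2.0], [2.0, 2.0], weight=[0.5, 2.0])
```

With this shape, a two-argument objective stays unweighted even when weights are set, which is the "user chooses" behaviour discussed in the following comments.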

@jmoralez
Collaborator

I think it may be more user friendly to weigh things automatically. I think specifying sample weights either through the Dataset or a method in the sklearn API kind of implies that I want to use them to weigh my samples everywhere (grad, hess, metrics). It would be awkward that the grad and hess are weighted automatically but the metric isn't, and that if I switch to the training API I have to weigh everything myself. WDYT?

@StrikerRUS
Collaborator

kind of implies that I want to use them to weigh my samples everywhere

Highly likely. But what if not? Then there would be no way to unweight them. I think it's better not to weight anything automatically and instead let the user choose whether to weight.

@jmoralez
Collaborator

I think it's better not to weight anything automatically and instead let the user choose whether to weight.

I agree with you, it gives the user full control and could enable use cases like #4995, which could be achieved by weighing only the metric but not the grad and hess. So we should remove this then, right?
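
As a rough illustration of weighting only the metric, a custom eval function in the `func(y_true, y_pred, weight)` form quoted earlier could apply the weights itself. This is a minimal sketch: the name `weighted_rmse` is made up for this example, and falling back to uniform weights when `weight` is `None` is an assumption.

```python
import numpy as np


def weighted_rmse(y_true, y_pred, weight):
    """Custom eval function that applies sample weights explicitly."""
    y_true = np.asarray(y_true, dtype=np.float64)
    y_pred = np.asarray(y_pred, dtype=np.float64)
    if weight is None:
        # Assumption: treat missing weights as uniform.
        weight = np.ones_like(y_true)
    mse = np.average((y_pred - y_true) ** 2, weights=weight)
    # Returned as (eval_name, eval_result, is_higher_better).
    return "weighted_rmse", float(np.sqrt(mse)), False


name, value, higher_better = weighted_rmse([0.0, 0.0], [1.0, 3.0], [3.0, 1.0])
```

An objective written without a `weight` argument would then leave the gradients and hessians unweighted while the metric still reflects the sample weights.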

if weight is not None:
    if grad.ndim == 2:  # multi-class
        num_data = grad.shape[0]
        if weight.size != num_data:
            raise ValueError("grad and hess should be of shape [n_samples, n_classes]")
        weight = weight.reshape(num_data, 1)
    grad *= weight
    hess *= weight
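
For readers wondering why the snippet above reshapes `weight` in the multi-class case, here is a small NumPy-only sketch with made-up shapes (3 samples, 2 classes). A 1-D weight vector does not broadcast against an `[n_samples, n_classes]` array, while a column vector does.

```python
import numpy as np

grad = np.ones((3, 2))               # [n_samples, n_classes]
weight = np.array([1.0, 2.0, 3.0])   # one weight per sample

try:
    grad * weight                    # shapes (3, 2) and (3,) do not broadcast
    broadcast_ok = True
except ValueError:
    broadcast_ok = False

# Reshaping to (n_samples, 1) scales every class column of a row by that row's weight.
weighted = grad * weight.reshape(grad.shape[0], 1)
```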

@shiyu1994
Collaborator Author

Highly likely. But what if not? Then there would be no way to unweight them. I think it's better not to weight anything automatically and instead let the user choose whether to weight.

Does that mean that, in the example above, we should let the user weight the gradients in the fobj function?

@StrikerRUS
Collaborator

So we should remove this then, right?

Does that mean that, in the example above, we should let the user weight the gradients in the fobj function?

I guess so.

@shiyu1994
Collaborator Author

Shall we open a PR to remove the weighting in the sklearn API?

@StrikerRUS
Collaborator

I'm for it.

@jameslamb jameslamb mentioned this issue Apr 14, 2022
60 tasks
StrikerRUS added a commit that referenced this issue Jun 27, 2022
…loses #5027) (#5211)

* allow custom weighing in sklearn api

* add suggestions from review

Co-authored-by: Nikita Titov <[email protected]>
@github-actions

This issue has been automatically locked since there has not been any recent activity since it was closed. To start a new related discussion, open a new issue at https://github.com/microsoft/LightGBM/issues including a reference to this.

@github-actions github-actions bot locked as resolved and limited conversation to collaborators Aug 19, 2023