[python-package] Fix misdetected objective after multiple calls to `LGBMClassifier.fit` #6002

david-cortes · 2023-07-22T16:32:40Z

This PR fixes a bug in which `LGBMClassifier` sets its internal objective on the first call to `fit` and does not re-detect it from the supplied `y` on subsequent calls to `fit`.

…ixes microsoft#5675
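
For readers unfamiliar with this failure mode, here is a minimal sketch of the pattern described above. It is not LightGBM's code: `ToyClassifier`, its `redetect` flag, and the private `_objective` attribute are hypothetical names made up for illustration, and the detection rule (more than two distinct labels means `"multiclass"`, otherwise `"binary"`) is a simplifying assumption.

```python
class ToyClassifier:
    """Hypothetical stand-in for an sklearn-style classifier (not LightGBM code)."""

    def __init__(self, objective=None, redetect=True):
        self.objective = objective  # user-supplied setting, left untouched by fit
        self.redetect = redetect    # False reproduces the stale-state pattern
        self._objective = None      # internal value resolved inside fit

    def fit(self, y):
        n_classes = len(set(y))
        # Stale-state pattern: resolve the objective only when nothing is cached.
        # Re-detecting pattern: resolve it from the current labels on every call.
        if self.redetect or self._objective is None:
            if self.objective is not None:
                self._objective = self.objective
            else:
                # Assumed rule for this sketch only.
                self._objective = "multiclass" if n_classes > 2 else "binary"
        self.n_classes_ = n_classes
        self.objective_ = self._objective
        return self


y_multi = [0, 1, 2, 3, 0, 1]
y_bin = [0, 1, 1, 0, 0, 1]

# With caching, the second fit keeps the objective chosen for the first labels.
stale = ToyClassifier(redetect=False)
stale.fit(y_multi)
stale.fit(y_bin)

# With re-detection, each fit derives the objective from the labels it receives.
fresh = ToyClassifier(redetect=True)
fresh.fit(y_multi)
fresh.fit(y_bin)
```

In this sketch, `stale` ends up with a multiclass objective alongside a class count of 2, while `fresh` reports a binary objective. The user-supplied `objective` is kept separate from the resolved `_objective` so that repeated calls to `fit` never overwrite what the user asked for.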

jameslamb

Makes sense to me, thanks!

@jmoralez could you review as well whenever you have time?

jmoralez

Thanks!

shiyu1994 · 2023-09-12T05:01:58Z

It seems that we have been encountering segmentation faults and test failures since merging this PR. Should we investigate?
For example:
https://github.com/microsoft/LightGBM/actions/runs/6154321952/job/16699516939
https://github.com/microsoft/LightGBM/actions/runs/6154321948/job/16700740982

jameslamb · 2023-09-12T13:50:11Z

Thanks @shiyu1994. I'll look into that today. I'm confused about how the tests could have passed on this PR and now be failing on master 🤔

The segfaults might be unrelated, but I don't understand how this could be failing on master:

________________ test_classifier_fit_detects_classes_every_time ________________

    def test_classifier_fit_detects_classes_every_time():
        rng = np.random.default_rng(seed=123)
        nrows = 1000
        ncols = 20
    
        X = rng.standard_normal(size=(nrows, ncols))
        y_bin = (rng.random(size=nrows) <= .3).astype(np.float64)
        y_multi = rng.integers(4, size=nrows)
    
        model = lgb.LGBMClassifier(verbose=-1)
        for _ in range(2):
            model.fit(X, y_multi)
            assert model.objective_ == "multiclass"
>           model.fit(X, y_bin)

tests/python_package_test/test_sklearn.py:1579: 
_ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ 
/root/.local/lib/python3.9/site-packages/lightgbm/sklearn.py:1142: in fit
    super().fit(
/root/.local/lib/python3.9/site-packages/lightgbm/sklearn.py:842: in fit
    self._Booster = train(
/root/.local/lib/python3.9/site-packages/lightgbm/engine.py:255: in train
    booster = Booster(params=params, train_set=train_set)
/root/.local/lib/python3.9/site-packages/lightgbm/basic.py:3200: in __init__
    train_set.construct()
/root/.local/lib/python3.9/site-packages/lightgbm/basic.py:2276: in construct
    self._lazy_init(data=self.data, label=self.label, reference=None,
/root/.local/lib/python3.9/site-packages/lightgbm/basic.py:1918: in _lazy_init
    self.__init_from_np2d(data, params_str, ref_dataset)
/root/.local/lib/python3.9/site-packages/lightgbm/basic.py:2054: in __init_from_np2d
    _safe_call(_LIB.LGBM_DatasetCreateFromMat(
_ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ 

ret = -1

    def _safe_call(ret: int) -> None:
        """Check the return value from C API call.
    
        Parameters
        ----------
        ret : int
            The return value from C API calls.
        """
        if ret != 0:
>           raise LightGBMError(_LIB.LGBM_GetLastError().decode('utf-8'))
E           lightgbm.basic.LightGBMError: Number of classes should be specified and greater than 1 for multiclass training

/root/.local/lib/python3.9/site-packages/lightgbm/basic.py:242: LightGBMError
----------------------------- Captured stderr call -----------------------------
[LightGBM] [Fatal] Number of classes should be specified and greater than 1 for multiclass training
=============================== warnings summary ===============================

=========================== short test summary info ============================
FAILED tests/python_package_test/test_sklearn.py::test_classifier_fit_detects_classes_every_time
===== 1 failed, 520 passed, 411 skipped, 2 xfailed, 135 warnings in 45.24s =====

jameslamb · 2023-09-12T14:37:51Z

@shiyu1994 I just merged #6090 to master. The QEMU job takes over an hour and is still running, but otherwise I observed all the Python CI jobs pass a few minutes ago:

Appveyor: https://ci.appveyor.com/project/guolinke/lightgbm/builds/48017021/job/9w3pr19b51b3alwv
Azure DevOps: https://dev.azure.com/lightgbm-ci/lightgbm-ci/_build/results?buildId=15180&view=results
GitHub Actions: https://github.com/microsoft/LightGBM/actions/runs/6160576747

So I suspect these test failures were a transient issue caused by the order commits were merged or something. I'm not planning to investigate this further unless we see it again.

github-actions · 2023-12-13T00:21:11Z

This pull request has been automatically locked since there has not been any recent activity since it was closed. To start a new related discussion, open a new issue at https://github.com/microsoft/LightGBM/issues including a reference to this.

fix misdetection of classifier objective after multiple calls to fit f…

cdf534c

…ixes microsoft#5675

david-cortes requested review from StrikerRUS, shiyu1994, jameslamb and jmoralez as code owners July 22, 2023 16:32

david-cortes changed the title ~~Fix misdetected objective after multiple calls to LGBMClassifier.fit~~ [python-package] Fix misdetected objective after multiple calls to LGBMClassifier.fit Jul 22, 2023

david-cortes added 2 commits July 22, 2023 19:01

missing prefix

de4e519

linter

20c47fc

jameslamb added the fix label Jul 23, 2023

Merge branch 'master' into fix_classifier_refit

b777288

jameslamb requested a review from guolinke as a code owner September 8, 2023 02:41

jameslamb approved these changes Sep 8, 2023


jmoralez approved these changes Sep 11, 2023


jameslamb merged commit 5e592fe into microsoft:master Sep 12, 2023
39 checks passed

github-actions bot locked as resolved and limited conversation to collaborators Dec 13, 2023
