Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

[python-package] Fix misdetected objective after multiple calls to LGBMClassifier.fit #6002

Merged
merged 4 commits into from
Sep 12, 2023

Conversation

david-cortes
Copy link
Contributor

fixes #5675

This PR fixes a bug in which LGBMClassifier will set its internal objective after the first call to fit and won't try to detect it anymore based on the supplied y on subsequent calls to fit.

@david-cortes david-cortes changed the title Fix misdetected objective after multiple calls to LGBMClassifier.fit [python-package] Fix misdetected objective after multiple calls to LGBMClassifier.fit Jul 22, 2023
@jameslamb jameslamb added the fix label Jul 23, 2023
@jameslamb jameslamb requested a review from guolinke as a code owner September 8, 2023 02:41
Copy link
Collaborator

@jameslamb jameslamb left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Makes sense to me, thanks!

@jmoralez could you review as well whenever you have time?

Copy link
Collaborator

@jmoralez jmoralez left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks!

@jameslamb jameslamb merged commit 5e592fe into microsoft:master Sep 12, 2023
39 checks passed
@shiyu1994
Copy link
Collaborator

Seems that we encounter segment faults and test case failures after merging this PR. Maybe we shall investigate this?
For example:
https://github.com/microsoft/LightGBM/actions/runs/6154321952/job/16699516939
https://github.com/microsoft/LightGBM/actions/runs/6154321948/job/16700740982

@jameslamb
Copy link
Collaborator

jameslamb commented Sep 12, 2023

Thanks @shiyu1994 . I'll look into that today. I'm confused how the tests could have passed on this PR and now be failing on master 🤔

The segfaults might be unrelated, but I don't understand how this could be failing on master:

________________ test_classifier_fit_detects_classes_every_time ________________

    def test_classifier_fit_detects_classes_every_time():
        rng = np.random.default_rng(seed=123)
        nrows = 1000
        ncols = 20
    
        X = rng.standard_normal(size=(nrows, ncols))
        y_bin = (rng.random(size=nrows) <= .3).astype(np.float64)
        y_multi = rng.integers(4, size=nrows)
    
        model = lgb.LGBMClassifier(verbose=-1)
        for _ in range(2):
            model.fit(X, y_multi)
            assert model.objective_ == "multiclass"
>           model.fit(X, y_bin)

tests/python_package_test/test_sklearn.py:1579: 
_ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ 
/root/.local/lib/python3.9/site-packages/lightgbm/sklearn.py:1142: in fit
    super().fit(
/root/.local/lib/python3.9/site-packages/lightgbm/sklearn.py:842: in fit
    self._Booster = train(
/root/.local/lib/python3.9/site-packages/lightgbm/engine.py:255: in train
    booster = Booster(params=params, train_set=train_set)
/root/.local/lib/python3.9/site-packages/lightgbm/basic.py:3200: in __init__
    train_set.construct()
/root/.local/lib/python3.9/site-packages/lightgbm/basic.py:2276: in construct
    self._lazy_init(data=self.data, label=self.label, reference=None,
/root/.local/lib/python3.9/site-packages/lightgbm/basic.py:1918: in _lazy_init
    self.__init_from_np2d(data, params_str, ref_dataset)
/root/.local/lib/python3.9/site-packages/lightgbm/basic.py:2054: in __init_from_np2d
    _safe_call(_LIB.LGBM_DatasetCreateFromMat(
_ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ 

ret = -1

    def _safe_call(ret: int) -> None:
        """Check the return value from C API call.
    
        Parameters
        ----------
        ret : int
            The return value from C API calls.
        """
        if ret != 0:
>           raise LightGBMError(_LIB.LGBM_GetLastError().decode('utf-8'))
E           lightgbm.basic.LightGBMError: Number of classes should be specified and greater than 1 for multiclass training

/root/.local/lib/python3.9/site-packages/lightgbm/basic.py:242: LightGBMError
----------------------------- Captured stderr call -----------------------------
[LightGBM] [Fatal] Number of classes should be specified and greater than 1 for multiclass training
=============================== warnings summary ===============================

=========================== short test summary info ============================
FAILED tests/python_package_test/test_sklearn.py::test_classifier_fit_detects_classes_every_time
===== 1 failed, 520 passed, 411 skipped, 2 xfailed, 135 warnings in 45.24s =====

@jameslamb
Copy link
Collaborator

@shiyu1994 I just merged #6090 to master. The QEMU job takes over an hour and still running, but otherwise I observed all the Python CI jobs pass a few minutes ago:

image image image

So I suspect these test failures were a transient issue caused by the order commits were merged or something. I'm not planning to investigate this further unless we see it again.

Copy link

This pull request has been automatically locked since there has not been any recent activity since it was closed. To start a new related discussion, open a new issue at https://github.com/microsoft/LightGBM/issues including a reference to this.

@github-actions github-actions bot locked as resolved and limited conversation to collaborators Dec 13, 2023
Sign up for free to subscribe to this conversation on GitHub. Already have an account? Sign in.
Labels
Projects
None yet
Development

Successfully merging this pull request may close these issues.

[python-package] Classifier fitted to non-binary data cannot be refit to binary data
4 participants