Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

[dask] fix Dask docstrings and mimic sklearn wrapper importing way #3855

Merged
merged 9 commits into from
Jan 26, 2021

Conversation

StrikerRUS
Copy link
Collaborator

Will be reworked after discussion in #3808.
But for now this PR makes docstrings consistent with actual method signatures and reliable for early-adopters of lightgbm.dask.
Live demo: https://lightgbm.readthedocs.io/en/dask_docs/Python-API.html#dask-api.

With fixed importing way via compat.py module users are able to actually import classes, read their docstrings but in case of missing Dask dependencies informative error will be raised:

image

Copy link
Collaborator

@jameslamb jameslamb left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks very much! Overall I'm +1, just a few suggestions

Comment on lines +108 to +109
elif isinstance(seq[0], (DataFrame, Series)):
return concat(seq, axis=0)
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'm fine with importing from compat, but could the names be changed on import to something like pd_DataFrame?

from .compat import DataFrame as pd_DataFrame
from .compat import Series as pd_Series

Since both pandas and dask have a DataFrame class, I think just calling this DataFrame makes the code difficult to read. I know that I personally will read this in the future and think "wait does that mean pandas or Dask DataFrame".

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'm totally agree! But I think we can make import aliases in compat.py and then import like from .compat import pd_DataFrame. Otherwise in case of identical names it will be confusing to have

from dask import DataFrame
from pandas import DataFrame

in compat.py.

I'm going to rename only Dask imports in this PR to not overcomplicate review. pandas will be done in a follow-up PR. Do you agree?

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

yes sounds good, thank you

@@ -390,12 +409,9 @@ def _predict(model, data, raw_score=False, pred_proba=False, pred_leaf=False, pr


class _LGBMModel:
def __init__(self):
def _fit(self, model_factory, X, y, sample_weight=None, group=None, client=None, **kwargs):
if not all((DASK_INSTALLED, PANDAS_INSTALLED, SKLEARN_INSTALLED)):
raise LightGBMError('dask, pandas and scikit-learn are required for lightgbm.dask')
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

if this check is being moved out of the constructor, then can you please put it in the _predict() function as well?

If someone tries to load a saved DaskLGBMClassifier from a pickle file (for example) and then use its .predict() method, I think we also want them to get an informative error about dask not being available. They won't get an ImportError on pickle.load() because of the magic of .compat.

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Great catch! Will do.

if this check is being moved out of the constructor

I wish I could leave it in __init__(), but I realized that parent's constructor in _LGBMModel class is never called 🙁 .

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

ah! didn't think about that when reviewing the MRO change in #3822

Copy link
Collaborator

@jameslamb jameslamb left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Looks great, thanks!

@github-actions
Copy link

This pull request has been automatically locked since there has not been any recent activity since it was closed. To start a new related discussion, open a new issue at https://github.com/microsoft/LightGBM/issues including a reference to this.

@github-actions github-actions bot locked as resolved and limited conversation to collaborators Aug 24, 2023
Sign up for free to subscribe to this conversation on GitHub. Already have an account? Sign in.
Labels
Projects
None yet
Development

Successfully merging this pull request may close these issues.

2 participants