Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

[dask] warn if attempting to use tree_learner other than data parallel #3848

Merged
merged 2 commits into from
Jan 25, 2021

Conversation

jameslamb
Copy link
Collaborator

Based on #3834 (comment).

Today, LightGBM's distributed learning supports three methods (feature parallel, data parallel, and voting parallel) --> https://lightgbm.readthedocs.io/en/latest/Features.html#optimization-in-parallel-learning

The Dask interface in lightgbm.dask is currently only tested with data parallel, where each chunk of the dataset is a row-wise chunk (all features, subset of rows).

The other methods MIGHT work with lightgbm.dask, but significant testing and documentation is needed. Supporting those methods might include adding extra validation to the Dask package as well.

Changes in this PR

The other parallel learning methods are optimizations, and I don't think the work to support them (#3834) needs to make it into the next LightGBM release (3.2.0). So that testing and documentation won't happen in the next few weeks.

This PR proposes adding a warning if you use one of these methods, to tell users that support for methods other than "data" is experimental and might break.

@StrikerRUS I know you recommended an error in #3834 (comment), but in my opinion it would be better to make this only a warning so that expert users (who know how to do the Dask partitioning for the different methods) are still able to experiment with the other methods.

@@ -244,6 +244,7 @@ def _train(client, data, label, params, model_factory, sample_weight=None, group
for tree_learner_param in _ConfigAliases.get('tree_learner'):
tree_learner = params.get(tree_learner_param)
if tree_learner is not None:
params['tree_learner'] = tree_learner
Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I know this is a little awkward in the case where aliases are used, because multiple aliases for tree_learner might exist in params.

That change will only live on master for a short time, because I'm cleaning it up in #3813 (comment) right now

Copy link
Collaborator

@StrikerRUS StrikerRUS left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM!

I know you recommended an error in #3834 (comment), but in my opinion it would be better to make this only a warning so that expert users (who know how to do the Dask partitioning for the different methods) are still able to experiment with the other methods.

OK, but please be prepared to that it is very common to use

import warnings

warnings.filterwarnings("ignore")

in Python scripts with LightGBM 😃

@StrikerRUS StrikerRUS merged commit 5cdaf1b into microsoft:master Jan 25, 2021
@jameslamb jameslamb deleted the dask/warn-parallel branch January 25, 2021 15:38
@github-actions
Copy link

This pull request has been automatically locked since there has not been any recent activity since it was closed. To start a new related discussion, open a new issue at https://github.com/microsoft/LightGBM/issues including a reference to this.

@github-actions github-actions bot locked as resolved and limited conversation to collaborators Aug 24, 2023
Sign up for free to subscribe to this conversation on GitHub. Already have an account? Sign in.
Labels
Projects
None yet
Development

Successfully merging this pull request may close these issues.

2 participants