[ci] run Dask examples on CI #4064

StrikerRUS · 2021-03-12T01:04:34Z

Dask examples are placed in the python-guide/dask subdirectory.

This PR also adds some punctuation to the Dask guide.
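
As a rough illustration of what running the examples on CI can look like, the sketch below runs every `.py` file under a directory in a fresh interpreter and reports each script that exits with a non-zero code. The `run_examples` helper is hypothetical and is not the contents of the project's `test.sh`.

```python
import subprocess
import sys
from pathlib import Path


def run_examples(examples_dir):
    """Run each example script in its own interpreter; return the failures.

    Hypothetical helper for illustration only.
    """
    failures = []
    # sorted() keeps the run order stable between CI jobs
    for script in sorted(Path(examples_dir).rglob("*.py")):
        # Run from the script's own directory, as a user would.
        result = subprocess.run([sys.executable, script.name], cwd=script.parent)
        if result.returncode != 0:
            failures.append(script)
    return failures
```

A CI step could call `run_examples("examples/python-guide/dask")` and fail the job when the returned list is non-empty.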

StrikerRUS · 2021-03-12T01:37:57Z

Good news! I accidentally forgot to exclude running Dask examples on macOS and they succeeded! Sorry, no logs. Linking #3782.

jameslamb

Thanks for the fix, but I disagree with adding CI-only details to the examples, and I'm confused by the punctuation changes in the docs.

jameslamb · 2021-03-12T02:11:26Z

examples/python-guide/dask/binary-classification.py

 import dask.array as da
 from distributed import Client, LocalCluster
 from sklearn.datasets import make_blobs

 import lightgbm as lgb

 if __name__ == "__main__":
+    if not sys.platform.startswith('linux'):


If the examples were working on Mac, can we remove this check until something breaks? These are supposed to be examples that teach users how to use lightgbm.dask, and I'm really uncomfortable with including details only necessary for CI in them.

> including details only necessary for CI in them.

I don't think these details are only CI-related. Imagine a user on macOS who is trying to get familiar with LightGBM's Dask-package. There is no info anywhere in the docs that we only support Linux for now. So the user runs an example, gets a crash, and feels disappointed (and maybe posts a new issue here).

> If the examples were working on Mac

I guess that according to #3782, this is just a happy coincidence.

However, this is not a strong opinion of mine, and if you still think it is better to let users run examples that we know may crash on their systems, I'll be happy to revert these if guards.

I think they are CI-specific because of the use of sys.exit(0). If a user copies that code and runs it in a Python REPL, their Python session will end. If they run it in a Jupyter notebook, they'll get an error like this

SystemExit: 0
/Users/jlamb/miniconda3/lib/python3.8/site-packages/IPython/core/interactiveshell.py:3425: UserWarning: To exit: use 'exit', 'quit', or Ctrl-D.
warn("To exit: use 'exit', 'quit', or Ctrl-D.", stacklevel=1)
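
The behaviour described above follows from how `sys.exit()` works: it raises `SystemExit`, which ends a script or REPL session unless something catches it. A minimal sketch, where `linux_only_guard` is a hypothetical stand-in for the guard in the diff, with the platform string passed in so the effect can be shown on any OS:

```python
import sys


def linux_only_guard(platform=sys.platform):
    """Hypothetical stand-in for the platform guard in the diff above."""
    if not platform.startswith("linux"):
        print("skipping: not running on Linux")
        # sys.exit() raises SystemExit rather than returning.
        sys.exit(0)


def exit_code_of(func, *args):
    """Call func and return the SystemExit code it raises, or None."""
    try:
        func(*args)
    except SystemExit as exc:
        return exc.code
    return None
```

Calling `exit_code_of(linux_only_guard, "darwin")` returns `0`, while with `"linux"` the guard returns normally and the result is `None`. Run as a script from the command line, the uncaught `SystemExit` simply ends the process with status 0.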

On this point:

> There is no info anywhere in the docs that we only support Linux for now.
> I guess that according to #3782, this is just a happy coincidence

I think that's an oversight, and that we should have a note at https://lightgbm.readthedocs.io/en/latest/Parallel-Learning-Guide.html#dask saying that Dask features are only tested on Linux.

While I was developing these examples in #3814, I tested them on my Mac. So I do actually expect them to continue working on Mac. They're much simpler than the tests in test_dask.py, and I think that extra complexity is why we're unable to run test_dask.py in Mac CI jobs.

I'm in favor of reverting these if guards and instead adding a note to the documentation saying that lightgbm.dask is only tested on Linux.

> If a user copies that code and runs it in a Python REPL, their Python session will end. If they run it in a Jupyter notebook, they'll get an error like this

The suggested way to run these examples is from the command line only:

> After installing the package and its dependencies, any of the examples here can be run with a command like this:
> python binary-classification.py

> I'm in favor of reverting these if guards and instead adding a note to the documentation saying that lightgbm.dask is only tested on Linux.

Sure, done!

jameslamb · 2021-03-12T02:12:52Z

docs/Parallel-Learning-Guide.rst

-* ensure that each worker in the cluster has some of the training data
-* try to give each worker roughly the same amount of data, especially if your dataset is small
-* if you plan to train multiple models (for example, to tune hyperparameters) on the same data, use ``client.persist()`` before training to materialize the data one time
+* ensure that each worker in the cluster has some of the training data;


Why is it necessary to add all these semicolons?

Sorry, as you know, I'm not a native English speaker. I thought these semicolons had just been accidentally left out.

> Semicolons were traditionally used in lists to separate each bullet point, but although this is still correct, they are not used as often today.
https://www.onlinegrammar.com.au/punctuation-in-lists/

Now I see that I was wrong. But it looks like we need a full stop at the end at least, right?

> A common style is to use lower case for the first letter in each bullet point, and a full stop after the last bullet point.

No problem, totally understood!

I don't think you need to worry about this. For native speakers (in American English, at least...that website you linked to is from Australia 😁 ) bulleted lists don't really have formal rules. Any mix of the following is equally valid:

* capitalizing vs. not capitalizing the first letter of each bullet point
* using a period at the end of each bullet point
* using semicolons at the end of the first n-1 bullet points and a period at the end of the final one

So it's not that the current state on master is "right" and your proposal is "wrong". They're all equally valid ways to write a bulleted list, and I don't think it's worth spending time worrying about it.

Got it! Thank you very much for the lesson! 🙂

jameslamb · 2021-03-14T19:13:38Z

python-package/README.rst

@@ -204,6 +204,10 @@ You can use ``python setup.py bdist_wheel`` instead of ``python setup.py install
 Install Dask-package
 ''''''''''''''''''''

+.. warning::
+
+    Dask-package is only tested on Linux.


When this file is viewed on GitHub, this warning doesn't render the way it would in Sphinx.

Are you ok with that? It doesn't bother me because I hope we'll be able to remove this soon anyway, but just wanted you to be aware.

Yeah, like all other Warnings and Notes in other RSTs:

I believe there should be no problem with rendering on PyPI, because http://rst.ninjs.org/ shows no errors when I paste the raw content of python-package/README.rst.
FYI, PyPI used to show a blank page if there were any errors in the RST formatting of an uploaded README, but I don't know what the current error-handling policy is there.

ok sounds good!

jameslamb

looks good, thanks

github-actions · 2023-08-23T23:58:35Z

This pull request has been automatically locked since there has not been any recent activity since it was closed. To start a new related discussion, open a new issue at https://github.com/microsoft/LightGBM/issues including a reference to this.

StrikerRUS added 2 commits March 12, 2021 03:54

Update Parallel-Learning-Guide.rst

eccd807

Update test.sh

841990c

StrikerRUS added the maintenance label Mar 12, 2021

StrikerRUS force-pushed the dask_examples branch from 385ae88 to 4d1b750 on March 12, 2021 01:39

fix path

cf324d2

StrikerRUS force-pushed the dask_examples branch from 4d1b750 to cf324d2 on March 12, 2021 01:43

StrikerRUS marked this pull request as ready for review March 12, 2021 02:08

StrikerRUS requested review from guolinke, jameslamb and Laurae2 as code owners March 12, 2021 02:08

jameslamb requested changes Mar 12, 2021

StrikerRUS added 2 commits March 14, 2021 19:18

Merge branch 'master' into dask_examples

ac96c25

address review comments

b2c1685

StrikerRUS requested review from chivee, henry0312, shiyu1994 and wxchan as code owners March 14, 2021 16:35

jameslamb reviewed Mar 14, 2021

jameslamb approved these changes Mar 14, 2021

jameslamb merged commit b044070 into master Mar 15, 2021

jameslamb deleted the dask_examples branch March 15, 2021 02:58

StrikerRUS mentioned this pull request Mar 26, 2021

[Dask] Socket recv error, code: 54 on macOS #4116

Closed

jameslamb mentioned this pull request Nov 8, 2021

[R-package] parallelize compilation in CMake-based builds #4525

Merged

github-actions bot locked as resolved and limited conversation to collaborators Aug 23, 2023
