Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

v3.2.0 release #3872

Merged
merged 9 commits into from
Mar 22, 2021
Merged

v3.2.0 release #3872

merged 9 commits into from
Mar 22, 2021

Conversation

guolinke
Copy link
Collaborator

This is a stable release.
The next release may is 4.0.0, for many breaking changes. And it may take a long time to finish.
So it better to fix the critical bugs in 3.2.0.

Please list the PR need to be merged in this release.

@jameslamb
Copy link
Collaborator

jameslamb commented Jan 28, 2021

Please list the PR need to be merged in this release.

Dask

The Dask interface doesn't need to be perfect for 3.2.0 and it's ok for it to have only limited functionality, but there are some Dask items I'd still like to get done before releasing 3.2.0. Since there has never been a release with Dask in it, I want to put in enough thought and work that we have some confidence that breaking changes won't be necessary in the future.

This is the minimal list of issues I'd like to be done before 3.2.0, to feel confident in that release:

I'm able to spend some time at work working on this, but I still expect that it would take 1-3 weeks for these to get done. It's only me and @StrikerRUS working on the Dask package and reviewing each others' PRs (with occasional contributions from outside contributors). Is that ok, or do you want to get 3.2.0 out sooner?

R package

There are no features or bugs for the R package that I think absolutely need to be done before a 3.2.0 release.

@guolinke
Copy link
Collaborator Author

@jameslamb no hurry, we can take the time.
@shiyu1994 I think we can setup a dev branch for 4.0.0, and maybe some PRs could merge to dev branch first. then rebase to master after 3.2.0 release.

@StrikerRUS
Copy link
Collaborator

StrikerRUS commented Jan 28, 2021

Please list the PR need to be merged in this release.

I'm making items that I think definitely should be fixed in 3.2.0 bold.

Core

Java

R-package

GPU

CUDA

Important PRs submitted later than creating this list but nothing prevents them to be included in the release

@StrikerRUS
Copy link
Collaborator

StrikerRUS commented Jan 28, 2021

Release checklist:

  • Update VERSION.txt number.
  • Update version in Appveyor config file.
  • Update version in configure file of R-package: /gha run r-configure.
  • Run R valgrind checks after all PRs are merged: /gha run r-valgrind.
  • Run R Solaris checks after all PRs are merged: /gha run r-solaris.
  • Push commit with v* tag to trigger GitHubRelease action at Azure Pipelines.
  • Convert automatic GitHub release from Draft to normal one.
  • Update stable tag at GitHub.
  • Upload release to PyPI.
  • Upload release to CRAN.
  • Update cran-comments.md when new release is accepted on CRAN.
  • Upload release to NuGet.
  • Update version and hash in Homebrew formula.

@shiyu1994
Copy link
Collaborator

@jameslamb no hurry, we can take the time.
@shiyu1994 I think we can setup a dev branch for 4.0.0, and maybe some PRs could merge to dev branch first. then rebase to master after 3.2.0 release.

Ok. Maybe 4.0.0 can include category encoding, symmetric trees, and the on-going efficiency optimizations.

@guolinke
Copy link
Collaborator Author

guolinke commented Feb 6, 2021

@shiyu1994 it seems this release still need some time. You can create a dev branch, and merge your PRs to that branch first.

@jameslamb
Copy link
Collaborator

jameslamb commented Feb 7, 2021

I'm sorry, but I just found another serious issue in the Dask package: #3918. It's my next priority, and I think it needs to be fixed before the release.

EDITt: False alarm, I think some of what I saw was just a result of using very very small data in tests. #3918 (comment)

@StrikerRUS
Copy link
Collaborator

@jameslamb May I ask you a question? Why do you think that documentation PRs should block new release? All our docs are available online and can be updated independently from source code that is being uploaded to differente package managers with each release. We don't have any "snapshots" for docs...

@jameslamb
Copy link
Collaborator

@jameslamb May I ask you a question? Why do you think that documentation PRs should block new release?

This period right now, before the release, is the only chance we'll ever have to freely break the API of lightgbm.dask with minimal user impact. Once that module is part of a formal release, we have to be much more hesitant about breaking changes.

The particular issues I've linked (#3814, #3838) are about high-level documentation of the interface. We might find some inconsistencies in the Dask API during creation and review on those PRs, and if those things come up then we'll have the opportunity to change them.

And anyway, there are 4 other non-Dask issues you've said "definitely" should be fixed (#3872 (comment)) before the release, right?

@StrikerRUS
Copy link
Collaborator

@jameslamb

We might find some inconsistencies in the Dask API during creation and review on those PRs

I see, makes sense!

And anyway, there are 4 other non-Dask issues you've said "definitely" should be fixed (#3872 (comment)) before the release, right?

I said "I think" they should, because I cannot say just "they should" given that I'm not able to help fixing them 🙂 . So it's up to who is able to fix them decide whether they will or will not be in 3.2.0.

Anyway, it was just my curiosity why docs should block new release. Now I get it that writing docs may help spot API problems.

@shiyu1994
Copy link
Collaborator

Hi @StrikerRUS and @jameslamb , thank you for the hard work. For the 3 bold core BUG's, #3761 is a numerical problem due to different computing strategy of col-wise and row-wise of LightGBM. And #3679 should have been fixed for the CPU version. We're waiting for response from #3778 for a reproducible example.

@StrikerRUS
Copy link
Collaborator

@shiyu1994

#3761 is a numerical problem due to different computing strategy of col-wise and row-wise of LightGBM

Can we have any workaround for this? If no, then I believe we should document that deterministic=true should be used together with either force_col_wise=true or force_row_wise=true. Otherwise, there is no guarantee of determinism.

@shiyu1994
Copy link
Collaborator

shiyu1994 commented Feb 27, 2021

@StrikerRUS Yes, I agree that we should document this in the description of deterministic parameter. I'll update the document.

@StrikerRUS
Copy link
Collaborator

@shiyu1994

And #3679 should have been fixed for the CPU version.

Then feel free to close that issue. For GPU version we have a separate open issue: #2793.

@jameslamb
Copy link
Collaborator

I just submitted 3.2.0 to CRAN, using the package generated by #3872 (comment).

image

@guolinke you should receive an email.

Please note that the submission is not completed until it is confirmed by the maintainer (via a link in the e-mail), who may need to check the spam folder.

I'll open a PR with the cran-comments.md file updates when we hear back from CRAN, and will upload the CRAN package to the release when we resolve #3872 (comment).

@StrikerRUS
Copy link
Collaborator

Changelog fixed now:

image

With new push to master release drafter will create new untagged draft release (3.2.1, it doesn't respect our VERSION.txt).

@jameslamb
Copy link
Collaborator

great, thanks! I just uploaded lightgbm-3.2.0-r-cran.tar.gz

@StrikerRUS
Copy link
Collaborator

I'll modify attached files with downloaded sources.

Hmm, I can't do that!

These files (Source code (zip) and Source code (tar.gz)) cannot be edited!

image

image

What should we do?..

@StrikerRUS
Copy link
Collaborator

Hmm, "nice"!..

My repo has a submodule, when i create a release, the tar/zip file generated doesn’t include the submodule.

This is not currently possible through GitHub.
https://github.community/t/create-release-that-contains-submodule/1329/2

@StrikerRUS
Copy link
Collaborator

StrikerRUS commented Mar 22, 2021

What should we do?..

OK. I've uploaded correct files to the release (LightGBM-3.2.0-complete_source_code_zip.zip and LightGBM-3.2.0-complete_source_code_tar_gz.tar.gz). We should automate it, btw.

image

P.S. Now I know how to create .tar.gz in Windows: https://stackoverflow.com/questions/10773880/how-to-create-tar-gz-archive-file-in-windows#comment59759781_10773880 🤣

@jameslamb
Copy link
Collaborator

thanks for doing that!

@StrikerRUS
Copy link
Collaborator

StrikerRUS commented Mar 22, 2021

PyPI release has been uploaded and tested.
Cool that embedded GPU-support is working!

image

@StrikerRUS
Copy link
Collaborator

@guolinke

Oh, I forget that. Would you like to take responsibility for the Nuget package? You can give me your username if yes.

OK, I can do that. Just registered account at NuGet: StrikerRUS

@guolinke
Copy link
Collaborator Author

@StrikerRUS @jameslamb Thank you so much!

@StrikerRUS had added you in Nuget.
image

@jameslamb
Copy link
Collaborator

it looks like {lightgbm} 3.2.0 has been accepted to CRAN!

https://cran.r-project.org/web/packages/lightgbm/index.html

but none of the CRAN checks have run yet (notice they are all 3.1.1)

image

https://cran.r-project.org/web/checks/check_results_lightgbm.html

I'll check back over the next few days and update cran-comments.md.

@StrikerRUS
Copy link
Collaborator

StrikerRUS commented Mar 23, 2021

I hope I've done NuGet package publishing right. All I've done was clicking "Upload" section and choosing new .nupkg file from attachments of the release.

image

@guolinke Should we fix this warning for next release?

 We found the following issue(s):
The <licenseUrl> element is deprecated. Consider using the <license> element instead. Learn more.
We recommend that you fix these issues and upload a new package. Read more

image

UPD: Looks fine:

image

@lostmygithubaccount
Copy link

does homebrew need to be updated somehow? I'm still seeing 3.1.1:

image

@StrikerRUS
Copy link
Collaborator

@lostmygithubaccount

does homebrew need to be updated somehow?

Yes, refer to #3872 (comment) and Homebrew/homebrew-core#73665.

@StrikerRUS
Copy link
Collaborator

@jameslamb I don't remember r-devel-windows-x86_64-gcc10-UCRT CRAN flavor. Looks like something new, right?

image

image

https://www.r-project.org/nosvn/winutf8/ucrt3/CRAN/checks/gcc10-UCRT/README.txt

Luckily, we are passing checks there! 🙂

@jameslamb
Copy link
Collaborator

Yep! It's experimental.

@StrikerRUS
Copy link
Collaborator

Looks like it is fixing the following our issue:

Now, that test runs on all operating systems, but on Windows we expect the feature names in Booster$dump_model() to be changed.
#3647 (comment)

The main purpose of this toolchain is to allow UTF-8 as the
current native encoding on Windows, which would be a major step in fixing
most encoding problems R users encounter on Windows.

@StrikerRUS
Copy link
Collaborator

@guolinke @jameslamb @ankane
Seems that we have some problems with continuing to distribute LightGBM via Homebrew:
Homebrew/homebrew-core#73665 (comment).

StrikerRUS added a commit that referenced this pull request Mar 25, 2021
* [docs]Add alt text on images

* Update docs/GPU-Windows.rst

Co-authored-by: James Lamb <[email protected]>

* Update docs/GPU-Windows.rst

Co-authored-by: James Lamb <[email protected]>

* Apply suggestions from code review

Co-authored-by: James Lamb <[email protected]>

* Apply suggestions from code review

Co-authored-by: James Lamb <[email protected]>

* Merge main branch commit updates (#1)

* [docs] Add alt text to image in Parameters-Tuning.rst (#4035)

* [docs] Add alt text to image in Parameters-Tuning.rst

Add alt text to Leaf-wise growth image, as part of #4028

* Update docs/Parameters-Tuning.rst

Co-authored-by: James Lamb <[email protected]>

Co-authored-by: James Lamb <[email protected]>

* [ci] [R-package] upgrade to R 4.0.4 in CI (#4042)

* [docs] update description of deterministic parameter (#4027)

* update description of deterministic parameter to require using with force_row_wise or force_col_wise

* Update include/LightGBM/config.h

Co-authored-by: Nikita Titov <[email protected]>

* update docs

Co-authored-by: Nikita Titov <[email protected]>

* [dask] Include support for init_score (#3950)

* include support for init_score

* use dataframe from init_score and test difference with and without init_score in local model

* revert refactoring

* initial docs. test between distributed models with and without init_score

* remove ranker from tests

* test value for root node and change docs

* comma

* re-include parametrize

* fix incorrect merge

* use single init_score and the booster_ attribute

* use np.float64 instead of float

* [ci] ignore untitle Jupyter notebooks in .gitignore (#4047)

* [ci] prevent getting incompatible dask and distributed versions (#4054)

* [ci] prevent getting incompatible dask and distributed versions

* Update .ci/test.sh

Co-authored-by: Nikita Titov <[email protected]>

* empty commit

Co-authored-by: Nikita Titov <[email protected]>

* [ci] fix R CMD CHECK note about example timings (fixes #4049) (#4055)

* [ci] fix R CMD CHECK note about example timings (fixes #4049)

* Apply suggestions from code review

Co-authored-by: Nikita Titov <[email protected]>

* empty commit

Co-authored-by: Nikita Titov <[email protected]>

* [ci] add CMake + R 3.6 test back (fixes #3469) (#4053)

* [ci] add CMake + R 3.6 test back (fixes #3469)

* Apply suggestions from code review

Co-authored-by: Nikita Titov <[email protected]>

* Update .ci/test_r_package_windows.ps1

* -Wait and remove rtools40

* empty commit

Co-authored-by: Nikita Titov <[email protected]>

* [dask] include multiclass-classification task in tests (#4048)

* include multiclass-classification task and task_to_model_factory dicts

* define centers coordinates. flatten init_scores within each partition for multiclass-classification

* include issue comment and fix linting error

* Update index.rst (#4029)

Add alt text to logo image

Co-authored-by: James Lamb <[email protected]>

* [dask] raise more informative error for duplicates in 'machines' (fixes #4057) (#4059)

* [dask] raise more informative error for duplicates in 'machines'

* uncomment

* avoid test failure

* Revert "avoid test failure"

This reverts commit 9442bdf.

* [dask] add tutorial documentation (fixes #3814, fixes #3838) (#4030)

* [dask] add tutorial documentation (fixes #3814, fixes #3838)

* add notes on saving the model

* quick start examples

* add examples

* fix timeouts in examples

* remove notebook

* fill out prediction section

* table of contents

* add line back

* linting

* isort

* Apply suggestions from code review

Co-authored-by: Nikita Titov <[email protected]>

* Apply suggestions from code review

Co-authored-by: Nikita Titov <[email protected]>

* move examples under python-guide

* remove unused pickle import

Co-authored-by: Nikita Titov <[email protected]>

* set 'pending' commit status for R Solaris optional workflow (#4061)

* [docs] add Yu Shi to repo maintainers (#4060)

* Update FAQ.rst

* Update CODEOWNERS

* set is_linear_ to false when it is absent from the model file (fix #3778) (#4056)

* Add CMake option to enable sanitizers and build gtest (#3555)

* Add CMake option to enable sanitizer

* Set up gtest

* Address reviewer's feedback

* Address reviewer's feedback

* Update CMakeLists.txt

Co-authored-by: Nikita Titov <[email protected]>

Co-authored-by: Nikita Titov <[email protected]>

* added type hint (#4070)

* [ci] run Dask examples on CI (#4064)

* Update Parallel-Learning-Guide.rst

* Update test.sh

* fix path

* address review comments

* [python-package] add type hints on Booster.set_network() (#4068)

* [python-package] add type hints on Booster.set_network()

* change behavior

* [python-package] Some mypy fixes (#3916)

* Some mypy fixes

* address James' comments

* Re-introduce pass in empty classes

* Update compat.py

Remove extra lines

* [dask] [ci] fix flaky network-setup test (#4071)

* [tests][dask] simplify code in Dask tests (#4075)

* simplify Dask tests code

* enable CI

* disable CI

* Revert "[ci] prevent getting incompatible dask and distributed versions (#4054)" (#4076)

This reverts commit 4e9c976.

* Fix parsing of non-finite values (#3942)

* Fix index out-of-range exception generated by BaggingHelper on small datasets.

Prior to this change, the line "score_t threshold = tmp_gradients[top_k - 1];" would generate an exception, since tmp_gradients would be empty when the cnt input value to the function is zero.

* Update goss.hpp

* Update goss.hpp

* Add API method LGBM_BoosterPredictForMats which runs prediction on a data set given as of array of pointers to rows (as opposed to existing method LGBM_BoosterPredictForMat which requires data given as contiguous array)

* Fix incorrect upstream merge

* Add link to LightGBM.NET

* Fix indenting to 2 spaces

* Dummy edit to trigger CI

* Dummy edit to trigger CI

* remove duplicate functions from merge

* Fix parsing of non-finite values.  Current implementation silently returns zero when input string is "inf", "-inf", or "nan" when compiled with VS2017, so instead just explicitly check for these values and fail if there is no match.  No attempt to optimise string allocations in this implementation since it is usually rarely invoked.

* Dummy commit to trigger CI

* Also handle -nan in double parsing method

* Update include/LightGBM/utils/common.h

Remove trailing whitespace to pass linting tests

Co-authored-by: Nikita Titov <[email protected]>

Co-authored-by: matthew-peacock <[email protected]>
Co-authored-by: Guolin Ke <[email protected]>
Co-authored-by: Nikita Titov <[email protected]>

* [dask] remove unused imports from typing (#4079)

* Range check for DCG position discount lookup (#4069)

* Add check to prevent out of index lookup in the position discount table. Add debug logging to report number of queries found in the data.

* Change debug logging location so that we can print the data file name as well.

* Revert "Change debug logging location so that we can print the data file name as well."

This reverts commit 3981b34.

* Add data file name to debug logging.

* Move log line to a place where it is output even when query IDs are read from a separate file.

* Also add the out-of-range check to rank metrics.

* Perform check after number of queries is initialized.

* Update

* [ci] upgrade R CI scripts to work on Ubuntu 20.04 (#4084)

* [ci] install additional LaTeX packages in R CI jobs

* update autoconf version

* bump upper limit on package size to 100

* [SWIG] Add streaming data support + cpp tests (#3997)

* [feature] Add ChunkedArray to SWIG

* Add ChunkedArray
* Add ChunkedArray_API_extensions.i
* Add SWIG class wrappers

* Address some review comments

* Fix linting issues

* Move test to tests/test_ChunkedArray_manually.cpp

* Add test note

* Move ChunkedArray to include/LightGBM/utils/

* Declare more explicit types of ChunkedArray in the SWIG API.

* Port ChunkedArray tests to googletest

* Please C++ linter

* Address StrikerRUS' review comments

* Update SWIG doc & disable ChunkedArray<int64_t>

* Use CHECK_EQ instead of assert

* Change include order (linting)

* Rename ChunkedArray -> chunked_array files

* Change header guards

* Address last comments from StrikerRUS

* store all CMake files in one place (#4087)

* v3.2.0 release (#3872)

* Update VERSION.txt

* update appveyor.yml and configure

* fix Appveyor builds

Co-authored-by: James Lamb <[email protected]>
Co-authored-by: Nikita Titov <[email protected]>
Co-authored-by: StrikerRUS <[email protected]>

* [ci] Bump version for development (#4094)

* Update .appveyor.yml

* Update cran-comments.md

* Update VERSION.txt

* update configure

Co-authored-by: James Lamb <[email protected]>

* [ci] fix flaky Azure Pipelines jobs (#4095)

* Update test.sh

* Update setup.sh

* Update .vsts-ci.yml

* Update test.sh

* Update setup.sh

* Update .vsts-ci.yml

* Update setup.sh

* Update setup.sh

Co-authored-by: Subham Agrawal <[email protected]>
Co-authored-by: James Lamb <[email protected]>
Co-authored-by: shiyu1994 <[email protected]>
Co-authored-by: Nikita Titov <[email protected]>
Co-authored-by: jmoralez <[email protected]>
Co-authored-by: marcelonieva7 <[email protected]>
Co-authored-by: Philip Hyunsu Cho <[email protected]>
Co-authored-by: Deddy Jobson <[email protected]>
Co-authored-by: Alberto Ferreira <[email protected]>
Co-authored-by: mjmckp <[email protected]>
Co-authored-by: matthew-peacock <[email protected]>
Co-authored-by: Guolin Ke <[email protected]>
Co-authored-by: ashok-ponnuswami-msft <[email protected]>
Co-authored-by: StrikerRUS <[email protected]>

* Apply suggestions from code review

Co-authored-by: Nikita Titov <[email protected]>

Co-authored-by: James Lamb <[email protected]>
Co-authored-by: Subham Agrawal <[email protected]>
Co-authored-by: shiyu1994 <[email protected]>
Co-authored-by: Nikita Titov <[email protected]>
Co-authored-by: jmoralez <[email protected]>
Co-authored-by: marcelonieva7 <[email protected]>
Co-authored-by: Philip Hyunsu Cho <[email protected]>
Co-authored-by: Deddy Jobson <[email protected]>
Co-authored-by: Alberto Ferreira <[email protected]>
Co-authored-by: mjmckp <[email protected]>
Co-authored-by: matthew-peacock <[email protected]>
Co-authored-by: Guolin Ke <[email protected]>
Co-authored-by: ashok-ponnuswami-msft <[email protected]>
Co-authored-by: StrikerRUS <[email protected]>
@ankane
Copy link
Contributor

ankane commented Mar 25, 2021

Surprised it's an issue for header-only libraries. One approach I've seen other projects take is have a CMake flag to use system dependencies (for instance, LibTorch has -DUSE_SYSTEM_LIBS - https://github.com/pytorch/pytorch/blob/90bbe0b38bf5248a567bf4f8b7525612e06b641c/CMakeLists.txt#L343).

@StrikerRUS
Copy link
Collaborator

They just updated the formula: Homebrew/homebrew-core#73665 (review).

@github-actions
Copy link

This pull request has been automatically locked since there has not been any recent activity since it was closed. To start a new related discussion, open a new issue at https://github.com/microsoft/LightGBM/issues including a reference to this.

@github-actions github-actions bot locked as resolved and limited conversation to collaborators Aug 23, 2023
Sign up for free to subscribe to this conversation on GitHub. Already have an account? Sign in.
Projects
None yet
Development

Successfully merging this pull request may close these issues.

7 participants