[RFC] version changes + more frequent releases #3210

Closed
jameslamb opened this issue Jul 7, 2020 · 17 comments

@jameslamb
Collaborator

I'd like to open this request for comment to discuss a proposal.

After releasing v3.0.0 (#3071), I'd like to propose that we use 4-part version numbers for language wrappers, broken down like this:

[image: breakdown of the proposed 4-part version number]

So, for example, if you see version 3.1.0.8 of the R package, that means "the 8th released version of the R package which wraps LightGBM version 3.1.0".
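
As a rough illustration of the scheme above, the sketch below splits a 4-part version string into the wrapped LightGBM version and the wrapper's own release counter. The names `WrapperVersion` and `parse_wrapper_version` are hypothetical and are not part of any LightGBM package.

```python
from typing import NamedTuple, Tuple


class WrapperVersion(NamedTuple):
    """A 4-part wrapper version, split as proposed in this RFC (hypothetical type)."""
    core: Tuple[int, int, int]  # version of LightGBM being wrapped
    wrapper_release: int        # release counter of the wrapper itself


def parse_wrapper_version(version: str) -> WrapperVersion:
    """Split 'MAJOR.MINOR.PATCH.N' into the core version and the wrapper release."""
    parts = version.split(".")
    if len(parts) != 4 or not all(p.isdigit() for p in parts):
        raise ValueError(f"expected four numeric components, got {version!r}")
    major, minor, patch, release = (int(p) for p in parts)
    return WrapperVersion(core=(major, minor, patch), wrapper_release=release)


parsed = parse_wrapper_version("3.1.0.8")
print(parsed.core)             # (3, 1, 0)
print(parsed.wrapper_release)  # 8
```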

Example

The examples below don't propose that every new merge to master becomes a release, but the changes below are examples used to show what might cause different components of a 4-part version number to change.

Event 1: 3.0.0 is released

  • LightGBM version set to 3.0.0
  • lightgbm (Python) 3.0.0.0 released to PyPI
  • {lightgbm} (R) 3.0.0.0 released to CRAN
  • LightGBM (lib for .NET extensions) 3.0.0.0 released to NuGet

Event 2: bug fix to LightGBM, like fixing #3209

  • LightGBM version set to 3.0.1
  • lightgbm (Python) 3.0.1.0 released to PyPI
  • {lightgbm} (R) 3.0.1.0 released to CRAN
  • LightGBM (lib for .NET extensions) 3.0.1.0 released to NuGet

Event 3: bug fix in {lightgbm} (R), like #3117

  • {lightgbm} (R) 3.0.1.1 released to CRAN

Event 4: LightGBM adds a new type of boosting, like #2644

  • LightGBM version set to 3.1.0
  • lightgbm (Python) 3.1.0.0 released to PyPI
  • {lightgbm} (R) 3.1.0.0 released to CRAN
  • LightGBM (lib for .NET extensions) 3.1.0.0 released to NuGet
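
The four events above can be replayed with a small sketch. The helpers `bump_core` and `bump_wrapper` are made-up names for illustration. The rule that a change to LightGBM itself resets the fourth component to 0 is an assumption inferred from Event 4, where the R package goes from 3.0.1.1 to 3.1.0.0.

```python
def bump_core(version, part):
    """Bump the LightGBM part of a 4-part version; the wrapper counter resets to 0."""
    major, minor, patch, _ = version
    if part == "major":
        return (major + 1, 0, 0, 0)
    if part == "minor":
        return (major, minor + 1, 0, 0)
    if part == "patch":
        return (major, minor, patch + 1, 0)
    raise ValueError(f"unknown part: {part!r}")


def bump_wrapper(version):
    """Bump only the wrapper-specific fourth component."""
    major, minor, patch, release = version
    return (major, minor, patch, release + 1)


def fmt(version):
    return ".".join(str(part) for part in version)


# Event 1: 3.0.0 is released, all wrappers start at 3.0.0.0
r_pkg = python_pkg = (3, 0, 0, 0)
r_history = [fmt(r_pkg)]

# Event 2: bug fix in LightGBM itself, every wrapper follows
r_pkg, python_pkg = bump_core(r_pkg, "patch"), bump_core(python_pkg, "patch")
r_history.append(fmt(r_pkg))

# Event 3: bug fix in the R package only
r_pkg = bump_wrapper(r_pkg)
r_history.append(fmt(r_pkg))

# Event 4: new feature in LightGBM itself
r_pkg, python_pkg = bump_core(r_pkg, "minor"), bump_core(python_pkg, "minor")
r_history.append(fmt(r_pkg))

print(r_history)        # ['3.0.0.0', '3.0.1.0', '3.0.1.1', '3.1.0.0']
print(fmt(python_pkg))  # 3.1.0.0
```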

How this makes LightGBM better

This approach would allow us to release fixes to individual components of LightGBM more frequently.

This would allow us to avoid the current situation, where the PyPI package (for example) has not had an update in 7 months: https://pypi.org/project/lightgbm/#history. More frequent updates allow our users to rely on package managers more, instead of building from GitHub, which I think is a better user experience.

Releasing more frequently would also reduce the gap between the current state of this repo and the documentation at https://lightgbm.readthedocs.io/en/latest/, so that that documentation is more likely to answer a user's questions accurately.

Allowing the version numbers to differ between R and Python (for example) is important, since these two libraries are at very different stages in their development. The R package is still somewhat immature and there is a lot of work ahead for it, while the Python package is fairly mature and stable by comparison. A 4-part version number would allow the R package to be updated more frequently than the Python package, while preserving the use of the first three version components for LightGBM itself.

@guolinke
Collaborator

guolinke commented Jul 7, 2020

will this conflict with semantic versioning? https://semver.org/
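
For reference, semver.org defines a version core of exactly three numeric components (MAJOR.MINOR.PATCH), so a 4-part string such as 3.1.0.8 is not a valid semantic version, while Python's PEP 440 accepts any number of release components. A minimal sketch of that difference, using deliberately simplified regular expressions rather than the full grammar of either specification:

```python
import re

# Simplified semver: three numeric components, optional pre-release and build metadata.
# This is an approximation of the grammar on semver.org, not the official regex.
SEMVER = re.compile(
    r"^(0|[1-9]\d*)\.(0|[1-9]\d*)\.(0|[1-9]\d*)"
    r"(?:-[0-9A-Za-z.-]+)?(?:\+[0-9A-Za-z.-]+)?$"
)

# Simplified PEP 440: only the release segment N(.N)*, ignoring epochs,
# pre-releases, post-releases and dev releases.
PEP440_RELEASE = re.compile(r"^\d+(\.\d+)*$")


def is_semver(version: str) -> bool:
    return SEMVER.match(version) is not None


def is_pep440_release(version: str) -> bool:
    return PEP440_RELEASE.match(version) is not None


for candidate in ("3.1.0", "3.1.0.8"):
    print(candidate, is_semver(candidate), is_pep440_release(candidate))
# 3.1.0 True True
# 3.1.0.8 False True
```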

@StrikerRUS
Collaborator

StrikerRUS commented Jul 7, 2020

Good suggestion!
If I'm not mistaken, @imatiach-msft uses something similar for the JAVA binding in MMLSpark: #3041 (comment).

However, I vote for a consistent version number across all official LightGBM components. 4-part version numbers will greatly increase the maintenance burden. It will also be very hard to maintain separate changelogs for all components, because you will need to list all commits multiple times and keep track of them per component.

> This approach would allow us to release fixes to individual components of LightGBM more frequently.

I'm not sure we are able to do that, given the lack of time and other resources. Instead, I suggest we go back to bi-monthly releases and try to stick to them. I believe that will be enough for most of our users.

@imatiach-msft
Contributor

@StrikerRUS yes, I do almost exactly this, except that instead of adding a fourth component I extend the third one, e.g. 2.3.150 corresponds to 2.3.1. I like the proposed versioning scheme and can migrate the JAVA wrapper to it; I'm open to any new ideas. I can't really stick to 2.3.1 because the JAVA releases are separate, and I sometimes have blocking issues that span both the JAVA JNI + native jar and the mmlspark Scala wrapper code. Waiting for the next official LightGBM release to create the jar would be an extra burden, especially since MMLSpark is not as stable as LightGBM and users often hit new blocking issues. I kept it to 3 components (*.*.*) when I originally released because that seems to be the standard way for semantic versioning.

@mirekphd

mirekphd commented Aug 6, 2020

> will this conflict with semantic versioning? https://semver.org/

Even if it is permissible under every convention, it is rare enough that most CI/CD systems have not been tested for it. One needs a sizeable collection of packages installed in one's environment to encounter the first case of this kind. We have one, so I can confirm that the incidence of 4-part version numbers among Python packages used for data science and machine learning is around 1.5%. Three of these packages are even very well known (at least in the ML community).

Here's the list of such packages (among 850 we have installed in our largest container heavily influenced by Kaggle Kernels):

dill
ephem                     
gettext                   
h2o                       
lime                      
mkl-random
msgpack-numpy
opencv-python 
pkginfo                   
ppft                      
pystan                    
singledispatch            
typing            
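
A survey like the one above could be reproduced roughly as follows. This is a sketch, assuming "4-part" means exactly four dot-separated numeric components. `four_part_packages` and `installed_versions` are hypothetical helper names, and the sample versions are invented for illustration; they are not the real versions of any package.

```python
import re
from importlib.metadata import distributions

# Exactly four dot-separated numeric components, e.g. "1.2.3.4"
FOUR_PART = re.compile(r"^\d+\.\d+\.\d+\.\d+$")


def four_part_packages(versions):
    """Return the sorted names of packages whose version has four numeric parts."""
    return sorted(name for name, version in versions.items() if FOUR_PART.match(version))


def installed_versions():
    """Map each installed distribution name to its version string."""
    result = {}
    for dist in distributions():
        name = dist.metadata.get("Name")
        version = dist.metadata.get("Version")
        if name and version:  # skip distributions with broken metadata
            result[name] = version
    return result


# Invented sample data, for illustration only
sample = {"pkg-a": "1.2.3", "pkg-b": "0.4.1.2", "pkg-c": "2.0", "pkg-d": "4.5.0.48"}
print(four_part_packages(sample))  # ['pkg-b', 'pkg-d']

# The 13 packages listed above out of 850 installed
print(round(100 * 13 / 850, 1))  # 1.5

if __name__ == "__main__":
    versions = installed_versions()
    hits = four_part_packages(versions)
    print(f"{len(hits)} of {len(versions)} installed packages use a 4-part version")
```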

@mirekphd

mirekphd commented Aug 6, 2020

Can we please try to separate the red herring of 4-part versions from the urgent bug of no releases having been made for 8 months, which was raised e.g. in #3274?

@guolinke
Collaborator

guolinke commented Aug 6, 2020

@mirekphd
I think the delay of the current release is due to the many new changes in the 3.0 version.
3.0 provides about a 2x speed-up on CPU and many new (breaking) features. Some work is still ongoing, so we will publish a pre-release now and continue to work on the remaining items.
This is not the usual case; normally we release monthly or bi-monthly.

BTW, the release process is currently manual. It would be better to fully automate it so that we can release more frequently.

@mirekphd

mirekphd commented Aug 7, 2020

> 3.0 provides about a 2x speed-up on CPU and many new (breaking) features. Some work is still ongoing, so we will publish a pre-release now and continue to work on the remaining items.

Excellent news! I did not know that such large improvements were still possible! It means that in v3.0.0 CPU training will most likely overtake GPU training...:) The difference in favor of GPU is so small, even for huge datasets and under the new CUDA implementation, as we saw in #3160.

By the way, I happen to know that there is still room for substantial improvement in your CPU implementation for a very frequent use case, but I will wait for your 3.0.0 release to see whether my ideas still work in that version before making them public.

@guolinke
Collaborator

guolinke commented Aug 7, 2020

@mirekphd
The remaining work for 3.0 is mostly new features; the CPU efficiency part is almost done.
You can give it a try: we just released 3.0.0rc1.

@AlbertoEAF
Contributor

Hello, just to be sure, are we migrating to the 4-part versioning or not? We're already at 3.0.0.99 after all.

But yes, having more releases would be nice, maybe it would be a good time to launch a new one :)

Should we close this issue?

@StrikerRUS
Collaborator

@AlbertoEAF

> We're already at 3.0.0.99 after all.

I believe the current 4-part versioning has slightly different semantics. #3344 (comment)

> maybe it would be a good time to launch a new one :)

It's already in progress: #3484! 🙂

@StrikerRUS
Collaborator

@jameslamb I think we can close this. It seems the maintenance burden isn't worth it. I believe one synced release for all components is better. WDYT?

@jameslamb
Collaborator Author

Seems I was outvoted on this, yes.

@jameslamb jameslamb changed the title from "RFC: version changes + more frequent releases" to "[RFC] version changes + more frequent releases" on Jan 3, 2022

@mirekphd

mirekphd commented Jul 10, 2022

The question of infrequent releases has returned. No new tag or release has been added for half a year.

If manual tagging is an excessive burden, then maybe adding automated daily tags instead of the patch version, e.g. in the format:

<major>.<minor>.YYYYMMDD

would work for you? This can probably be automated easily, at the cost of violating some semantic versioning rules, e.g. running the risk of introducing breaking changes without proper warning to users (via an increase in the major version).
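
A minimal sketch of how such a date-based tag could be generated. The function names `daily_version` and `as_tuple` are hypothetical, and the major and minor numbers used in the example are placeholders rather than a statement about any actual release.

```python
from datetime import date


def daily_version(major: int, minor: int, day: date) -> str:
    """Build a '<major>.<minor>.YYYYMMDD' version string for the given day."""
    return f"{major}.{minor}.{day:%Y%m%d}"


def as_tuple(version: str):
    """Turn a dotted numeric version into a tuple of ints for comparison."""
    return tuple(int(part) for part in version.split("."))


tag = daily_version(3, 3, date(2022, 7, 10))
print(tag)  # 3.3.20220710

# A date-based third component still sorts after an ordinary patch number
print(as_tuple(tag) > as_tuple("3.3.2"))  # True
```

Because the third component only ever grows with the calendar, tags generated this way stay ordered, but they carry no information about whether a given change is breaking.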

For such auto-tagged releases no release notes are expected either, so it's enough to ensure that code committed to the master branch gets covered by build tests before it gets auto-tagged and auto-released.

Of course you can make the auto-release frequency lower than daily (e.g. monthly) and use version increments, but then users would start expecting some release notes.

@jameslamb it seems this is still an unresolved issue - why not reopen it here or create a new one to address the low update frequency part?

@jameslamb
Collaborator Author

Thanks @mirekphd. I promise, I understand the frustration with how long this project has gone without a release. I've described some of that pain in #5153.

Operational concerns like "manual tagging ... burden" are not the main reasons LightGBM has gone so long without a new release.

Some projects started 18+ months ago (e.g. #3234, items under "CUDA" at #5153) promised to introduce significant breaking changes on master, so many other breaking changes have accumulated on master in anticipation of a 4.0.0 release that would include them. Until those projects are merged and in a releasable state (or until something significant changes about the direction of the project), there won't be a new release.

cc @shiyu1994 @StrikerRUS @jmoralez @guolinke if you want to add anything else

@guolinke
Collaborator

@jameslamb we can focus on "breaking" changes first, and make the next release faster.

@jameslamb
Collaborator Author

That would be great. I really hope we can do a release soon.

@github-actions

This issue has been automatically locked since there has not been any recent activity since it was closed.
To start a new related discussion, open a new issue at https://github.com/microsoft/LightGBM/issues
including a reference to this.

@github-actions github-actions bot locked as resolved and limited conversation to collaborators Aug 15, 2023