Pr4 advanced method monotone constraints #3264

CharlesAuguste · 2020-07-29T16:40:12Z

@guolinke @jameslamb @StrikerRUS
Following PRs #2305, #2717, #2770, and #2939 here is the last PR of the series. #2305 was judged to be not merge-able because it was too big. Therefore, in my ultimate comment I said I would split it into smaller PRs easier to merge. #2717 was some refactoring, #2770 was the intermediate method, #2939 was the monotone penalization, and this PR is about the advanced method.

As a reminder, the basic method over-constrains leaves, and creates "unused space" between them. The intermediate method is smarter in updating the constraints and creates less "space" between the leaves, but it doesn't take into account that depending on thresholds, a leaf on the right and on the left of a monotone splits may not have to constrain each other. This advanced method takes it into account, and recomputes leaves constraints from the beginning when needed. This leads in a more expressive model, with a meaningfully better training loss. More details in the original report https://github.com/microsoft/LightGBM/files/3457826/PR-monotone-constraints-report.pdf. Feel free to ask for more details.

About the code, most changes happen in the monotone_constraints.hpp file, where the new AdvancedLeafConstraints has been added. Tests have been updated to include this new constraining method.

2 things in the original PR that are missing in here:

The code was more optimized for time in the original PR, and the method is a bit slower here. Though the optimizations made the code significantly more complicated, and I think the time saved was not much. I can implement the optimization in another PR later, as I think this PR is complicated enough as is.
In the original PR, once a tree is build, there was a final leaves refitting, that I have not implemented here yet. I can either implement it here or in another PR later. This can slightly improve performance.

I am looking forward to your review. Thanks.

jameslamb · 2020-08-10T01:31:59Z

Thanks @CharlesAuguste , really really appreciate all of your efforts!

I apologize for the inconvenience, but can you please merge master into this branch? We just merged some fixes that will address some of the failing CI jobs.

CharlesAuguste · 2020-08-10T11:10:15Z

Thanks for the advice @jameslamb! I rebased on master and it did fix some issues. I still have 3 failing tasks, which I am not sure about. Can you let me know what you think about these? Thanks!

StrikerRUS · 2020-08-10T13:03:34Z

@CharlesAuguste

I still have 3 failing tasks, which I am not sure about.

All tests are failing due to new NOTE from R-package tests on macOS.

* checking installed package size ... NOTE
  installed size is  5.1Mb
  sub-directories of 1Mb or more:
    libs   4.4Mb

@jameslamb I guess we can't do anything with this except changing the condition here, right?

LightGBM/.ci/test_r_package.sh

Lines 202 to 206 in 7be57d7

    
           if [[ $OS_NAME == "linux" ]] && [[ $R_BUILD_TYPE == "cran" ]]; then 
        
               ALLOWED_CHECK_NOTES=2 
        
           else 
        
               ALLOWED_CHECK_NOTES=1 
        
           fi

jameslamb · 2020-08-10T18:16:13Z

@CharlesAuguste

I still have 3 failing tasks, which I am not sure about.

All tests are failing due to new NOTE from R-package tests on macOS.
* checking installed package size ... NOTE
  installed size is  5.1Mb
  sub-directories of 1Mb or more:
    libs   4.4Mb
@jameslamb I guess we can't do anything with this except changing the condition here, right?

LightGBM/.ci/test_r_package.sh

Lines 202 to 206 in 7be57d7

if [[ $OS_NAME == "linux" ]] && [[ $R_BUILD_TYPE == "cran" ]]; then

ALLOWED_CHECK_NOTES=2

else

ALLOWED_CHECK_NOTES=1

fi

uuuuugh that "install-size" test from R CMD CHECK is so frustrating. It's so hard to predict.

Yes, @CharlesAuguste could you please increment both limits in the code @StrikerRUS pointed to?

if [[ $OS_NAME == "linux" ]] && [[ $R_BUILD_TYPE == "cran" ]]; then
    ALLOWED_CHECK_NOTES=3
else
    ALLOWED_CHECK_NOTES=2
fi

I don't want to hold your PR up. Apologies for the inconvenience.

StrikerRUS · 2020-08-10T18:27:09Z

@jameslamb If I'm not mistaken that error is in ALLOWED_CHECK_NOTES=2 case already. Please note that failing tests are run on macOS.

So it can be simply

ALLOWED_CHECK_NOTES=2

whithout any ifs: one for submission date and another one for libs size.

Refer to #3188 (comment).

jameslamb · 2020-08-10T18:31:01Z

great, even better. I'm ok with that.

CharlesAuguste · 2020-08-11T14:51:45Z

@jameslamb @StrikerRUS that worked thanks! The check-doc task in Travis is still failing, and doesn't seem related to the PR though. Otherwise, I think the PR should be ready for a review. Let me know what you think! Thanks.

jameslamb

the R CI change looks fine to me, sorry about that!

I'm not qualified to ✅ / ❌ this PR so I'll leave it to other reviewers to comment on the implementation. Thanks for adding this!

jameslamb · 2020-08-11T14:56:58Z

.ci/test_r_package.sh

-else
-    ALLOWED_CHECK_NOTES=1
-fi
+ALLOWED_CHECK_NOTES=2


perfect, thank you

guolinke · 2020-08-11T23:55:46Z

@CharlesAuguste is this PR ready to review?

aldanor · 2020-08-12T00:24:41Z

@CharlesAuguste is this PR ready to review?

@guolinke We've reviewed it internally, I think it should be ready, yes.

guolinke · 2020-08-12T00:39:52Z

src/treelearner/feature_histogram.hpp

@@ -763,7 +764,7 @@ class FeatureHistogram {
  template <bool USE_MC, bool USE_L1, bool USE_MAX_OUTPUT, bool USE_SMOOTHING>
  static double CalculateSplittedLeafOutput(
      double sum_gradients, double sum_hessians, double l1, double l2,
-      double max_delta_step, const ConstraintEntry& constraints,
+      double max_delta_step, const BasicConstraint constraints,


const T& ?

guolinke · 2020-08-12T00:41:27Z

src/treelearner/feature_histogram.hpp

@@ -288,6 +288,7 @@ class FeatureHistogram {
    double best_sum_left_gradient = 0;
    double best_sum_left_hessian = 0;
    double gain_shift;
+    constraints->InitCumulativeConstraints(true);


I think this is only enable when USE_MC is true?

guolinke · 2020-08-12T00:43:35Z

src/treelearner/feature_histogram.hpp

+
+        if (constraint_update_necessary) {
+          constraints->Update(t + offset);
+        }


please use USE_MC for efficiency.

guolinke · 2020-08-12T00:53:04Z

src/treelearner/feature_histogram.hpp

  template <bool USE_RAND, bool USE_MC, bool USE_L1, bool USE_MAX_OUTPUT, bool USE_SMOOTHING,
            bool REVERSE, bool SKIP_DEFAULT_BIN, bool NA_AS_MISSING>
  void FindBestThresholdSequentially(double sum_gradient, double sum_hessian,
                                     data_size_t num_data,
-                                     const ConstraintEntry& constraints,
+                                     FeatureConstraint* constraints,


don't we really need to change this to an editable object?
I quick through the code, it seems the CumulativeFeatureConstraint is reset in each split.
A simple solution is to make CumulativeFeatureConstraint mutable.

Very good advice!

guolinke · 2020-08-12T00:55:31Z

@CharlesAuguste
can you leverage USE_MC in split finding?
This template argument is to improve the speed when user don't enable MC.

CharlesAuguste · 2020-08-23T13:50:48Z

@CharlesAuguste
can you leverage USE_MC in split finding?
This template argument is to improve the speed when user don't enable MC.

Should be better now, let me know what you think. Thanks

CharlesAuguste · 2020-08-23T22:19:02Z

@guolinke I also fixed the constraints not being const in feature_histogram.hpp. Let me know what else I can improve in this PR.

StrikerRUS · 2020-08-24T17:13:08Z

@CharlesAuguste Could you please add advanced method with small description here?

LightGBM/include/LightGBM/config.h

Lines 444 to 451 in 9503d3f

    
           // type = enum 
        
           // alias = monotone_constraining_method, mc_method 
        
           // options = basic, intermediate 
        
           // desc = used only if ``monotone_constraints`` is set 
        
           // desc = monotone constraints method 
        
           // descl2 = ``basic``, the most basic monotone constraints method. It does not slow the library at all, but over-constrains the predictions 
        
           // descl2 = ``intermediate``, a `more advanced method <https://github.com/microsoft/LightGBM/files/3457826/PR-monotone-constraints-report.pdf>`__, which may slow the library very slightly. However, this method is much less constraining than the basic method and should significantly improve the results 
        
           std::string monotone_constraints_method = "basic";

CharlesAuguste · 2020-08-24T20:20:04Z

@CharlesAuguste Could you please add advanced method with small description here?

LightGBM/include/LightGBM/config.h

Lines 444 to 451 in 9503d3f

// type = enum

// alias = monotone_constraining_method, mc_method

// options = basic, intermediate

// desc = used only if ``monotone_constraints`` is set

// desc = monotone constraints method

// descl2 = ``basic``, the most basic monotone constraints method. It does not slow the library at all, but over-constrains the predictions

// descl2 = ``intermediate``, a `more advanced method <https://github.com/microsoft/LightGBM/files/3457826/PR-monotone-constraints-report.pdf>`__, which may slow the library very slightly. However, this method is much less constraining than the basic method and should significantly improve the results

std::string monotone_constraints_method = "basic";

I forgot to update the doc, sorry. @StrikerRUS It is updated now and, and I ran the parameter generator in the helpers. Let me know if you want something different. Additionally we uploaded the report to https://hal.archives-ouvertes.fr/hal-02862802, and are trying to see if we can also upload it to arXiv. Thanks

Co-authored-by: Nikita Titov <[email protected]>

CharlesAuguste · 2020-09-20T17:19:06Z

@CharlesAuguste there is a conflict with a bug fix: #3384, can you check it before merge?

@guolinke should be good now. I am not sure what caused the failing test.

StrikerRUS · 2020-09-21T14:40:31Z

src/treelearner/serial_tree_learner.cpp

+        constraints_.get(), feature_index, ~(leaf_splits->leaf_index()),
+        train_data_->FeatureNumBin(feature_index));
+  }
+


@CharlesAuguste Could you please fix lint issue which causes CI test to fail?

src/treelearner/serial_tree_learner.cpp:722: Line ends in whitespace. Consider deleting these extra spaces. [whitespace/end_of_line] [4]

@StrikerRUS Sorry about that I missed that line and didn't understand why the job failed. It is al good now!

No problem, thank you!

aldanor · 2020-09-21T23:12:15Z

Epic! Now all we need is a new version release :)

aldanor · 2020-09-21T23:14:26Z

Btw, very minor nit: I've just noticed, we haven't updated the link to pdf in docs/Parameters.rst, should that be updated to hal link which can considered to be more or less permanent? (https://hal.archives-ouvertes.fr/hal-0286280)

StrikerRUS · 2020-09-22T02:06:28Z

@aldanor Thanks!
TBH, I think it will be better to provide a direct link to PDF, given that that site is in French: https://hal.archives-ouvertes.fr/hal-02862802/document. It was not obvious to me that I should click very small piece of file name to view full text. I expected to view doc after clicking big colored buttons 😃 .

Unfortunately, it seems that it is impossible to provide a permanent link for English version of the site. When change fr->en in the upper right corner, I can see "Files" (better than "FICHIERS" in terms of finding where to click to view PDF) but the URL remains the same.

CharlesAuguste · 2020-09-22T19:59:45Z

@aldanor @StrikerRUS I am still trying to upload that to arXiv, through HAL, though I have not yet been successful. If you want I can change the link in the documentation to the HAL pdf for now (in another PR), and we will see later if we want to update it again? Thanks

aldanor · 2020-09-22T20:22:28Z

@CharlesAuguste Maybe let's wait and see if we can get it uploaded to arxiv? If that succeeds, we can open a PR to update a link to that. (For now, as @StrikerRUS says, it probably makes more sense to keep it as it is then)

StrikerRUS · 2020-09-23T14:05:39Z

@CharlesAuguste @aldanor I believe, it will make sense to update the link right now if there are any actual differences in the content of two files: hosted at HAL and one uploaded in GitHub comments.

CharlesAuguste · 2020-09-23T17:51:05Z

@CharlesAuguste @aldanor I believe, it will make sense to update the link right now if there are any actual differences in the content of two files: hosted at HAL and one uploaded in GitHub comments.

@StrikerRUS you are right, there are minor changes. I will make a PR to update the link now, and we can update it again if we manage to upload to arXiv. Thanks

StrikerRUS · 2020-09-23T19:05:12Z

OK, great! Thank you!

github-actions · 2023-08-24T04:52:56Z

This pull request has been automatically locked since there has not been any recent activity since it was closed. To start a new related discussion, open a new issue at https://github.com/microsoft/LightGBM/issues including a reference to this.

CharlesAuguste requested review from btrotta, chivee, guolinke, henry0312, StrikerRUS and wxchan as code owners July 29, 2020 16:40

StrikerRUS added the feature label Aug 2, 2020

CharlesAuguste force-pushed the PR4-advanced-method-monotone-constraints branch from 58f1536 to 2801b5d Compare August 7, 2020 17:58

CharlesAuguste force-pushed the PR4-advanced-method-monotone-constraints branch from 6208bd2 to 6ae08d6 Compare August 10, 2020 10:08

CharlesAuguste requested review from jameslamb and Laurae2 as code owners August 11, 2020 14:04

jameslamb reviewed Aug 11, 2020

View reviewed changes

.ci/test_r_package.sh Outdated

else

ALLOWED_CHECK_NOTES=1

fi

ALLOWED_CHECK_NOTES=2

Copy link

Collaborator

jameslamb Aug 11, 2020

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

perfect, thank you

guolinke reviewed Aug 12, 2020

View reviewed changes

This was referenced Aug 15, 2020

fix zero bin in categorical split #3305

Merged

[ci] fix flaky R CMD check test about library size #3308

Merged

CharlesAuguste force-pushed the PR4-advanced-method-monotone-constraints branch from 80c7d5d to 4e1adbc Compare August 23, 2020 13:48

CharlesAuguste and others added 13 commits September 20, 2020 17:57

Fix loop iterator.

250dfe7

Co-authored-by: Nikita Titov <[email protected]>

Fix loop iterator.

73d9752

Co-authored-by: Nikita Titov <[email protected]>

Fix loop iterator.

afa744f

Co-authored-by: Nikita Titov <[email protected]>

Fix loop iterator.

7e9987b

Co-authored-by: Nikita Titov <[email protected]>

Fix loop iterator.

9da9d09

Co-authored-by: Nikita Titov <[email protected]>

Remove std namespace qualifier.

184c4ef

Fix unsigned_int size_t comparison.

e9f6953

Set num_features as int for consistency with the rest of the codebase.

1b38dc4

Make sure constraints exist before recomputing them.

21f32d2

Initialize previous constraints in UpdateConstraints.

609f78a

Update monotone constraints restrictions.

f554a24

Refactor UpdateConstraints loop.

6b3d73d

Update src/io/config.cpp

5774cf4

Co-authored-by: Nikita Titov <[email protected]>

CharlesAuguste force-pushed the PR4-advanced-method-monotone-constraints branch from 5e75625 to 5774cf4 Compare September 20, 2020 16:58

StrikerRUS reviewed Sep 21, 2020

View reviewed changes

Delete white spaces.

6ec24f4

StrikerRUS merged commit 4278f22 into microsoft:master Sep 21, 2020

CharlesAuguste mentioned this pull request Sep 27, 2020

[docs] Change doc link to monotone constraints report to HAL document #3410

Merged

JoshuaC3 mentioned this pull request Jul 2, 2021

A better method to enforce monotonic constraints in regression and classification trees dmlc/xgboost#7073

Open

github-actions bot locked as resolved and limited conversation to collaborators Aug 24, 2023

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Pr4 advanced method monotone constraints #3264

Pr4 advanced method monotone constraints #3264

CharlesAuguste commented Jul 29, 2020 •

edited

Loading

jameslamb commented Aug 10, 2020

CharlesAuguste commented Aug 10, 2020

StrikerRUS commented Aug 10, 2020

jameslamb commented Aug 10, 2020

StrikerRUS commented Aug 10, 2020

jameslamb commented Aug 10, 2020

CharlesAuguste commented Aug 11, 2020

jameslamb left a comment

jameslamb Aug 11, 2020

guolinke commented Aug 11, 2020

aldanor commented Aug 12, 2020

guolinke Aug 12, 2020

guolinke Aug 12, 2020

guolinke Aug 12, 2020

guolinke Aug 12, 2020

CharlesAuguste Aug 23, 2020

guolinke commented Aug 12, 2020

CharlesAuguste commented Aug 23, 2020

CharlesAuguste commented Aug 23, 2020

StrikerRUS commented Aug 24, 2020

CharlesAuguste commented Aug 24, 2020

CharlesAuguste commented Sep 20, 2020

StrikerRUS Sep 21, 2020

CharlesAuguste Sep 21, 2020

StrikerRUS Sep 21, 2020

aldanor commented Sep 21, 2020

aldanor commented Sep 21, 2020

StrikerRUS commented Sep 22, 2020

CharlesAuguste commented Sep 22, 2020

aldanor commented Sep 22, 2020

StrikerRUS commented Sep 23, 2020

CharlesAuguste commented Sep 23, 2020

StrikerRUS commented Sep 23, 2020

github-actions bot commented Aug 24, 2023

Pr4 advanced method monotone constraints #3264

Pr4 advanced method monotone constraints #3264

Conversation

CharlesAuguste commented Jul 29, 2020 • edited Loading

jameslamb commented Aug 10, 2020

CharlesAuguste commented Aug 10, 2020

StrikerRUS commented Aug 10, 2020

jameslamb commented Aug 10, 2020

StrikerRUS commented Aug 10, 2020

jameslamb commented Aug 10, 2020

CharlesAuguste commented Aug 11, 2020

jameslamb left a comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

guolinke commented Aug 11, 2020

aldanor commented Aug 12, 2020

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

guolinke commented Aug 12, 2020

CharlesAuguste commented Aug 23, 2020

CharlesAuguste commented Aug 23, 2020

StrikerRUS commented Aug 24, 2020

CharlesAuguste commented Aug 24, 2020

CharlesAuguste commented Sep 20, 2020

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

aldanor commented Sep 21, 2020

aldanor commented Sep 21, 2020

StrikerRUS commented Sep 22, 2020

CharlesAuguste commented Sep 22, 2020

aldanor commented Sep 22, 2020

StrikerRUS commented Sep 23, 2020

CharlesAuguste commented Sep 23, 2020

StrikerRUS commented Sep 23, 2020

github-actions bot commented Aug 24, 2023

CharlesAuguste commented Jul 29, 2020 •

edited

Loading