Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Treat position bias via GAM in LambdaMART #5929

Merged
merged 103 commits into from
Sep 4, 2023

Conversation

metpavel
Copy link
Collaborator

Add support to counter position bias present in relevance labels for learning-to-rank task. Position bias is modeled as an additive factor in logistic space within Generalized Additive Model (GAM) framework.

@metpavel
Copy link
Collaborator Author

@microsoft-github-policy-service agree company="Microsoft"

Copy link
Collaborator

@shiyu1994 shiyu1994 left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Left a few comments.

include/LightGBM/dataset.h Outdated Show resolved Hide resolved
include/LightGBM/dataset.h Show resolved Hide resolved
src/io/metadata.cpp Outdated Show resolved Hide resolved
include/LightGBM/dataset.h Show resolved Hide resolved
src/objective/rank_objective.hpp Outdated Show resolved Hide resolved
src/objective/rank_objective.hpp Show resolved Hide resolved
src/objective/rank_objective.hpp Show resolved Hide resolved
src/objective/rank_objective.hpp Outdated Show resolved Hide resolved
@shiyu1994
Copy link
Collaborator

The static check CI job is failing. And there are 5 linter errors to be fixed from the log. Thanks.
https://github.com/microsoft/LightGBM/actions/runs/5275652092/jobs/9549918023?pr=5929

Linting C++ code
running cpplint
include/LightGBM/dataset.h:224:  An else should appear on the same line as the preceding }  [whitespace/newline] [4]
include/LightGBM/dataset.h:224:  If an else has a brace on one side, it should have it on both  [readability/braces] [5]
include/LightGBM/dataset.h:237:  An else should appear on the same line as the preceding }  [whitespace/newline] [4]
include/LightGBM/dataset.h:237:  If an else has a brace on one side, it should have it on both  [readability/braces] [5]
src/io/metadata.cpp:576:  Line ends in whitespace.  Consider deleting these extra spaces.  [whitespace/end_of_line] [4]

src/io/metadata.cpp Outdated Show resolved Hide resolved
Copy link
Collaborator

@shiyu1994 shiyu1994 left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Two minor revisions.

@jameslamb
Copy link
Collaborator

jameslamb commented Jun 19, 2023

/gha run r-valgrind

Workflow R valgrind tests has been triggered! 🚀
https://github.com/microsoft/LightGBM/actions/runs/5308500765

Status: success ✔️.

tests/python_package_test/test_engine.py Outdated Show resolved Hide resolved
tests/python_package_test/test_engine.py Outdated Show resolved Hide resolved
Copy link
Collaborator

@jameslamb jameslamb left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Alright I've quickly skimmed the papers linked the documentation, and think I finally understand now how this is working.

I'm ok with removing the parameter to provide a customized filename... thanks @shiyu1994 for the explanation in #5929 (comment).

I'm comfortable with the changes to LightGBM's Python and C API (other than a minor comment about parameter naming) and I'd be happy to implement the R side of this after this PR is merged. Can one of you please create a new issue at https://github.com/microsoft/LightGBM/issues to track that work?

Please see my most recent round of comments. I still feel that:

  • the tests as currently written don't provide very strong evidence of this implementation's correctness
  • the documentation is unclear about the expected content of positions

I'll watch this PR closely this week and try to respond quickly as you comment and ask for new reviews. I understand there's some urgency inside Microsoft on getting this merged. But please note that I'll be traveling Tuesday-Thursday with limited available time here on GitHub.

docs/Parameters.rst Outdated Show resolved Hide resolved
src/io/metadata.cpp Show resolved Hide resolved
tests/python_package_test/test_engine.py Show resolved Hide resolved
docs/Advanced-Topics.rst Outdated Show resolved Hide resolved
@jameslamb
Copy link
Collaborator

jameslamb commented Aug 14, 2023

/gha run r-valgrind

Workflow R valgrind tests has been triggered! 🚀
https://github.com/microsoft/LightGBM/actions/runs/5857085872

Status: success ✔️.

@metpavel
Copy link
Collaborator Author

@jameslamb, thank you for your effort to learn more about the problem and the proposed solution! And thank you for offering your help in adding the R language support for it. Here is the issue I created for that: #6063.

@jameslamb jameslamb self-requested a review September 3, 2023 05:58
Copy link
Collaborator

@jameslamb jameslamb left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks so much for accepting my proposed rewording for the documentation, for the effort you put into the excellent new tests, and for documenting the R addition in #6063.

Please fix the compilation errors (looks like there are a few more places where lambdarank_position_bias_regularizer needs to be changed to lambdarank_position_bias_regularization). After that, I'll run the valgrind checks one more time.

Besides that, I have no other major comments left that are worth blocking merging. There are some minor code things I'd like to change in the tests, but I'll propose that in a follow-up PR after this.

Now that I understand it (thanks for your patience with me 😅 ) I'm very excited to see this feature make it into LightGBM! It's been 2.5 years since @robhowley first introduced #4531. Thanks to you and @shiyu1994 for all your efforts in making this happen.

I also want to tag @thvasilo, who was very involved in the discussion in #4531, just so they're aware.

docs/Advanced-Topics.rst Outdated Show resolved Hide resolved
src/io/metadata.cpp Show resolved Hide resolved
gbm_unbiased_with_file = lgb.train(params, lgb_train, valid_sets = lgb_valid, num_boost_round=50)

# the performance of the unbiased LambdaMART should outperform the plain LambdaMART on the dataset with position bias
assert gbm_baseline.best_score['valid_0']['ndcg@3'] + 0.03 <= gbm_unbiased_with_file.best_score['valid_0']['ndcg@3']
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Excellent, thank you for this! I think we'll be glad to have this stricter test.

tests/python_package_test/test_engine.py Show resolved Hide resolved
@jameslamb
Copy link
Collaborator

jameslamb commented Sep 4, 2023

/gha run r-valgrind

Workflow R valgrind tests has been triggered! 🚀
https://github.com/microsoft/LightGBM/actions/runs/6068729923

Status: success ✔️.

@jameslamb jameslamb self-requested a review September 4, 2023 03:24
Copy link
Collaborator

@jameslamb jameslamb left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

All of my remaining comments have been addressed by @shiyu1994 's recent comments, so marking this "approved".

Please do wait to see if the valgrind test passes one more time before merging though: #5929 (comment)

Thank you both for all the hard work that went into this addition to LightGBM!

@shiyu1994 shiyu1994 merged commit 7e34d23 into microsoft:master Sep 4, 2023
@jameslamb jameslamb mentioned this pull request Sep 5, 2023
19 tasks
@metpavel metpavel deleted the metpavel-posbias_GAM branch September 6, 2023 04:33
@metpavel
Copy link
Collaborator Author

metpavel commented Sep 6, 2023

All of my remaining comments have been addressed by @shiyu1994 's recent comments, so marking this "approved".
Thank you both for all the hard work that went into this addition to LightGBM!

@jameslamb, thank you again for your diligent reviews and prioritizing the quality of LightGBM! Many thanks to @shiyu1994 for your help and support! I'm looking forward to future collaborations with you guys!

Copy link

github-actions bot commented Dec 6, 2023

This pull request has been automatically locked since there has not been any recent activity since it was closed. To start a new related discussion, open a new issue at https://github.com/microsoft/LightGBM/issues including a reference to this.

@github-actions github-actions bot locked as resolved and limited conversation to collaborators Dec 6, 2023
Sign up for free to subscribe to this conversation on GitHub. Already have an account? Sign in.
Labels
Projects
None yet
Development

Successfully merging this pull request may close these issues.

4 participants