ranking algorithm: position unbiased option #4531

Closed
wants to merge 39 commits into from

39 commits
e5b6f31
chore: ignore swig jni file
robhowley Feb 2, 2021
ca85e05
Merge pull request #1 from Zocdoc/rh_update_ignore
robert-howley-zocdoc Feb 2, 2021
7718343
chore: run tests on pr to position_debias
robhowley Feb 2, 2021
7063000
Update cuda.yml
robert-howley-zocdoc Feb 3, 2021
4f4387a
Merge pull request #2 from Zocdoc/rh_main_fork_branch
robert-howley-zocdoc Feb 3, 2021
482c143
feat: crude copy paste
robhowley Feb 3, 2021
4bae3ad
feat: add unbiased lambdamart config params
robhowley Feb 3, 2021
1449576
Merge pull request #3 from Zocdoc/rh_config
robert-howley-zocdoc Feb 3, 2021
c49005e
Merge branch 'position_unbiased' into rh_impl_phase_1
robhowley Feb 3, 2021
3cbcb62
chore: variable init
robhowley Feb 9, 2021
a0680e6
feat: add bias corrected lambda accumulators
robhowley Feb 9, 2021
8827540
chore: remove intermediate vectors
robhowley Feb 9, 2021
6798c2a
chore: address linter issues
robhowley Feb 9, 2021
e1c8158
chore: remove unused position_lambdas variable
robhowley Feb 9, 2021
26b316b
chore: remove unused position_scores variables
robhowley Feb 9, 2021
4b8c2e9
chore: remove position counts variables
robhowley Feb 9, 2021
c50e92c
chore: consolidate initialization and updates
robhowley Feb 10, 2021
592ade6
chore: linter whitespace
robhowley Feb 10, 2021
c29a6f8
chore: add comments on formulas and derivations
robhowley Feb 11, 2021
9796a72
chore: eta has slightly clearer/diff meaning in this impl
robhowley Feb 11, 2021
a00ac5c
fix: debias rank values
robhowley Feb 19, 2021
672ec5b
chore: linter
robhowley Feb 19, 2021
d3347d2
Merge pull request #4 from Zocdoc/rh_impl_phase_1
robert-howley-zocdoc Feb 23, 2021
88e3542
chore: give better name to bias regularizer
robhowley Feb 25, 2021
52f6243
Merge pull request #5 from Zocdoc/rh_better_eta_name
robert-howley-zocdoc Feb 25, 2021
99f4f04
chore: tests w configs relevant to unbiased
robhowley Mar 3, 2021
87219c2
chore: remove unused param, replaced by truncation_level
robhowley Mar 3, 2021
baa4a0d
Merge pull request #6 from Zocdoc/rh_remove_position_bins
robert-howley-zocdoc Mar 3, 2021
20fe972
fix: update workflow trigger to correct branch name
robhowley Mar 3, 2021
9ac06d5
merge conflicts
robhowley Aug 18, 2021
da17901
more merge conflicts
robhowley Aug 18, 2021
a821813
remove git workflow customizations
robhowley Aug 18, 2021
32aa904
remove extra comments
robhowley Aug 18, 2021
b450bfa
remove print statement
robhowley Aug 18, 2021
fc5b92d
remove test refactor, line spacing, and comment typo fix
robhowley Sep 7, 2021
ebb8f40
remove gitignore changes, more whitespace removal
robhowley Sep 7, 2021
6ae3cfe
end w new line
robhowley Sep 7, 2021
9572d7d
fix redundant blank line found in cpp lint
robhowley Sep 7, 2021
0cbe02d
Merge branch 'microsoft:master' into position_unbiased
robert-howley-zocdoc Dec 20, 2021
11 changes: 11 additions & 0 deletions docs/Parameters.rst
@@ -1075,6 +1075,17 @@ Objective Parameters

- separate by ``,``

- ``lambdarank_unbiased`` :raw-html:`<a id="lambdarank_unbiased" title="Permalink to this parameter" href="#lambdarank_unbiased">&#x1F517;&#xFE0E;</a>`, default = ``false``, type = bool

- used only in ``lambdarank`` application

- set this to ``true`` to use the position bias correction of `Unbiased LambdaMART <https://arxiv.org/pdf/1809.05818.pdf>`__

- ``lambdarank_bias_p_norm`` :raw-html:`<a id="lambdarank_bias_p_norm" title="Permalink to this parameter" href="#lambdarank_bias_p_norm">&#x1F517;&#xFE0E;</a>`, default = ``0.5``, type = double, constraints: ``lambdarank_bias_p_norm >= 0.0``

- used only in ``lambdarank`` application where ``lambdarank_unbiased = true``

- controls the strength of the position bias regularization; the bias update raises per-position cost ratios to the power ``1 / (1 + lambdarank_bias_p_norm)``

Metric Parameters
-----------------

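For context on these two parameters, here is a minimal usage sketch through the scikit-learn interface that the PR's own tests use. The file paths and data loading are placeholders, not part of this PR:

import numpy as np
import lightgbm as lgb
from sklearn.datasets import load_svmlight_file

# placeholder paths; any svmlight-format ranking dataset with query sizes works
X_train, y_train = load_svmlight_file('rank.train')
q_train = np.loadtxt('rank.train.query')  # number of rows per query

ranker = lgb.LGBMRanker(
    n_estimators=50,
    lambdarank_unbiased=True,    # enable the position bias correction
    lambdarank_bias_p_norm=0.5,  # regularizer norm p; update exponent is 1 / (1 + p)
)
ranker.fit(X_train, y_train, group=q_train)
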
8 changes: 8 additions & 0 deletions include/LightGBM/config.h
@@ -900,6 +900,14 @@ struct Config {
// desc = separate by ``,``
std::vector<double> label_gain;

// desc = used only in ``lambdarank`` application
// desc = set this to ``true`` to use the position bias correction of `Unbiased LambdaMART <https://arxiv.org/pdf/1809.05818.pdf>`__
bool lambdarank_unbiased = false;

// check = >=0.0
// desc = used only in ``lambdarank`` application where ``lambdarank_unbiased = true``
double lambdarank_bias_p_norm = 0.5;

#pragma endregion

#pragma region Metric Parameters
9 changes: 9 additions & 0 deletions src/io/config_auto.cpp
@@ -299,6 +299,8 @@ const std::unordered_set<std::string>& Config::parameter_set() {
"lambdarank_truncation_level",
"lambdarank_norm",
"label_gain",
"lambdarank_unbiased",
"lambdarank_bias_p_norm",
"metric",
"metric_freq",
"is_provide_training_metric",
@@ -606,6 +608,11 @@ void Config::GetMembersFromString(const std::unordered_map<std::string, std::str
label_gain = Common::StringToArray<double>(tmp_str, ',');
}

GetBool(params, "lambdarank_unbiased", &lambdarank_unbiased);

GetDouble(params, "lambdarank_bias_p_norm", &lambdarank_bias_p_norm);
CHECK_GE(lambdarank_bias_p_norm, 0.0);

GetInt(params, "metric_freq", &metric_freq);
CHECK_GT(metric_freq, 0);

@@ -741,6 +748,8 @@ std::string Config::SaveMembersToString() const {
str_buf << "[lambdarank_truncation_level: " << lambdarank_truncation_level << "]\n";
str_buf << "[lambdarank_norm: " << lambdarank_norm << "]\n";
str_buf << "[label_gain: " << Common::Join(label_gain, ",") << "]\n";
str_buf << "[lambdarank_unbiased: " << lambdarank_unbiased << "]\n";
str_buf << "[lambdarank_bias_p_norm: " << lambdarank_bias_p_norm << "]\n";
str_buf << "[eval_at: " << Common::Join(eval_at, ",") << "]\n";
str_buf << "[multi_error_top_k: " << multi_error_top_k << "]\n";
str_buf << "[auc_mu_weights: " << Common::Join(auc_mu_weights, ",") << "]\n";
148 changes: 144 additions & 4 deletions src/objective/rank_objective.hpp
@@ -8,6 +8,7 @@

#include <LightGBM/metric.h>
#include <LightGBM/objective_function.h>
#include <LightGBM/utils/log.h>

#include <algorithm>
#include <cmath>
@@ -101,7 +102,9 @@ class LambdarankNDCG : public RankingObjective {
: RankingObjective(config),
sigmoid_(config.sigmoid),
norm_(config.lambdarank_norm),
truncation_level_(config.lambdarank_truncation_level) {
truncation_level_(config.lambdarank_truncation_level),
unbiased_(config.lambdarank_unbiased),
bias_p_norm_(config.lambdarank_bias_p_norm) {
label_gain_ = config.label_gain;
// initialize DCG calculator
DCGCalculator::DefaultLabelGain(&label_gain_);
@@ -111,6 +114,14 @@
if (sigmoid_ <= 0.0) {
Log::Fatal("Sigmoid param %f should be greater than zero", sigmoid_);
}

#pragma omp parallel
#pragma omp master
{
num_threads_ = omp_get_num_threads();
}

// with p = bias_p_norm_ this exponent is 1 / (1 + p); the default p = 0.5
// gives 2/3. It is applied in UpdatePositionBiasesAndGradients below.
position_bias_regularizer = 1.0f / (1.0f + bias_p_norm_);
}

explicit LambdarankNDCG(const std::vector<std::string>& strs)
@@ -135,12 +146,24 @@
}
// construct Sigmoid table to speed up Sigmoid transform
ConstructSigmoidTable();

// initialize position bias vectors
InitPositionBiasesAndGradients();
Collaborator:
should this initialization be conditional on the value of unbiased_? Otherwise, the default behavior will be to initialize them and then not use them, right?

(apologies if I've misunderstood how this works)

Reply:

They get initialized to 1 and are only updated if unbiased_ is true, so any time you see them when unbiased_ is false, you're just dividing by 1.

}

void GetGradients(const double* score, score_t* gradients,
score_t* hessians) const override {
RankingObjective::GetGradients(score, gradients, hessians);

if (unbiased_) { UpdatePositionBiasesAndGradients(); }
}

inline void GetGradientsForOneQuery(data_size_t query_id, data_size_t cnt,
const label_t* label, const double* score,
score_t* lambdas,
score_t* hessians) const override {
const int tid = omp_get_thread_num(); // get thread id

// get max DCG on current query
const double inverse_max_dcg = inverse_max_dcgs_[query_id];
// initialize with zero
@@ -199,15 +222,26 @@
// get delta NDCG
double delta_pair_NDCG = dcg_gap * paired_discount * inverse_max_dcg;
// regularize the delta_pair_NDCG by score distance
if (norm_ && best_score != worst_score) {
if ((norm_ || unbiased_) && best_score != worst_score) {
Collaborator:
It is unclear to me why with unbiased Lambdarank we must normalize the delta score here.

delta_pair_NDCG /= (0.01f + fabs(delta_score));
}
// calculate lambda for this pair
double p_lambda = GetSigmoid(delta_score);
double p_hessian = p_lambda * (1.0f - p_lambda);

int debias_high_rank = static_cast<int>(std::min(high, truncation_level_ - 1));
int debias_low_rank = static_cast<int>(std::min(low, truncation_level_ - 1));
Comment on lines +232 to +233
@thvasilo, Jan 10, 2022:
Hi, I'm trying to investigate an issue I'm observing with the biases at the truncation cutoff, and I'm wondering if the thresholding here is the culprit.

Within the context of this implementation, the case where debias_high_rank gets the value truncation_level - 1 is when high > truncation_level - 1, i.e. sorted_idx[high_rank] > truncation_level - 1, i.e. sorted_idx[j] > truncation_level - 1.

Correct me if I'm wrong, but the only case where the above happens is when j is the high_rank; j starts from i+1, which can take a value greater than truncation_level, right?

I was wondering if it's better to check the truncation level for both high and low ranks, and only update the position biases when both are lower than truncation_level, instead of thresholding their values.

As I'm working on an update to introduce the position column as a metadata value, what I'm thinking is along the lines of:

const data_size_t high = sorted_idx[high_rank];
const data_size_t low = sorted_idx[low_rank];
data_size_t high_position, low_position;
if (unbiased_) {
  // record_positions_ is a vector of length data_size, with the position of each 
  // record in the data. start is the offset for the current query
  high_position = record_positions_[start + sorted_idx[high_rank]];
  low_position = record_positions_[start + sorted_idx[low_rank]];
} else {
  high_position = high;
  low_position = low;
}

if (unbiased_) {
  double p_cost = log(1.0f / (1.0f - p_lambda)) * delta_pair_NDCG;

  // more relevant (clicked) gets debiased by less relevant (unclicked), only if within truncation levels
  if (high_position <= truncation_level_ && low_position <= truncation_level_) {
    i_costs_buffer_[tid][high_position] += p_cost / j_biases_pow_[low_position];
    j_costs_buffer_[tid][low_position] += p_cost / i_biases_pow_[high_position];  // and vice versa
  }
}
// By default we set values of 1.0 as no-ops
double i_bias_pow = 1.0;
double j_bias_pow = 1.0;
// We only use actual bias values if they are both within the truncation limits
if (unbiased_ && high_position <= truncation_level_ && low_position <= truncation_level_) {
  i_bias_pow = i_biases_pow_[high_position];
  j_bias_pow = j_biases_pow_[low_position];
}
// update, either with the no-op 1.0 values (if at least one of the pair's data points ended up outside the truncation threshold) or with the actual biases
p_lambda *= -sigmoid_ * delta_pair_NDCG / i_bias_pow / j_bias_pow;
p_hessian *= sigmoid_ * sigmoid_ * delta_pair_NDCG / i_bias_pow / j_bias_pow;

Does this make sense? The main suggestion is to not "clamp" the bias positions between [0, truncation_level - 1] but rather not perform bias updates when one of the document-pair items falls outside the truncation level (making all bias positions valid).

It's still a bit unclear to me what the difference is between high/low and high_rank/low_rank. IIRC we sort the records in a query by their score, and to establish high/low rank we compare their label values.

What I want to ensure is that, since my record_positions_ holds the original data order, the extracted position corresponds to the expected record, which I think is what record_positions_[start + sorted_idx[high_rank]] accomplishes.

Collaborator:
@thvasilo Thanks for reviewing this.
What confuses me here is that the cost buffer is indexed by the truncated high and low variables. That doesn't make sense to me, because high and low are just document indices. According to equations (28)~(31) in the Unbiased LambdaRank paper, the truncation should be applied to the current rank assigned by the model, not to the document index, since the truncation level truncates the loss in equation (28) to consider only the top-ranked i's.

Collaborator:
I believe here static_cast<int>(std::min(high, truncation_level_ - 1)); and static_cast<int>(std::min(low, truncation_level_ - 1)); should be static_cast<int>(std::min(high_rank, truncation_level_ - 1)); and static_cast<int>(std::min(low_rank, truncation_level_ - 1)); instead.


if (unbiased_) {
double p_cost = log(1.0f / (1.0f - p_lambda)) * delta_pair_NDCG;

// more relevant (clicked) gets debiased by less relevant (unclicked)
i_costs_buffer_[tid][debias_high_rank] += p_cost / j_biases_pow_[debias_low_rank];
j_costs_buffer_[tid][debias_low_rank] += p_cost / i_biases_pow_[debias_high_rank]; // and vice versa
}
// update
p_lambda *= -sigmoid_ * delta_pair_NDCG;
p_hessian *= sigmoid_ * sigmoid_ * delta_pair_NDCG;
p_lambda *= -sigmoid_ * delta_pair_NDCG / i_biases_pow_[debias_high_rank] / j_biases_pow_[debias_low_rank];
Collaborator:
Suppose we are now before starting iteration k + 1; then i_biases_pow_ and j_biases_pow_ would be calculated based on the prediction results up to iteration k - 1. Only the newly updated i_biases_pow_ and j_biases_pow_ in UpdatePositionBiasesAndGradients, after GetGradients, consider the prediction values up to iteration k. This is inconsistent with Algorithm 1 in the paper, though I don't think this should affect performance much.

p_hessian *= sigmoid_ * sigmoid_ * delta_pair_NDCG / i_biases_pow_[debias_high_rank] / j_biases_pow_[debias_low_rank];
lambdas[low] -= static_cast<score_t>(p_lambda);
hessians[low] += static_cast<score_t>(p_hessian);
lambdas[high] += static_cast<score_t>(p_lambda);
@@ -253,9 +287,86 @@
}
}

void InitPositionBiasesAndGradients() {
i_biases_pow_.resize(truncation_level_);
j_biases_pow_.resize(truncation_level_);
i_costs_.resize(truncation_level_);
j_costs_.resize(truncation_level_);

for (int i = 0; i < truncation_level_; ++i) {
// init position biases
i_biases_pow_[i] = 1.0f;
j_biases_pow_[i] = 1.0f;

// init position gradients
i_costs_[i] = 0.0f;
j_costs_[i] = 0.0f;
}

// init gradient buffers for gathering results across threads
for (int i = 0; i < num_threads_; i++) {
i_costs_buffer_.emplace_back(truncation_level_, 0.0f);
j_costs_buffer_.emplace_back(truncation_level_, 0.0f);
}
}

void UpdatePositionBiasesAndGradients() const {
// accumulate the parallel results
for (int i = 0; i < num_threads_; i++) {
for (int j = 0; j < truncation_level_; j++) {
i_costs_[j] += i_costs_buffer_[i][j];
j_costs_[j] += j_costs_buffer_[i][j];
}
}

for (int i = 0; i < num_threads_; i++) {
for (int j = 0; j < truncation_level_; j++) {
// clear buffer for next run
i_costs_buffer_[i][j] = 0.0f;
j_costs_buffer_[i][j] = 0.0f;
}
}

for (int i = 0; i < truncation_level_; i++) {
// Update bias
i_biases_pow_[i] = pow(i_costs_[i] / i_costs_[0], position_bias_regularizer);
j_biases_pow_[i] = pow(j_costs_[i] / j_costs_[0], position_bias_regularizer);
}

LogDebugPositionBiases();

Author:
While I found it useful in debug mode to track the evolution and end point of the position bias adjustment values, I'm happy to remove this since it doesn't add any feature benefit during normal usage.

Collaborator:
Yes. Please remove this or wrap the debug code with #ifdef DEBUG macro.


for (int i = 0; i < truncation_level_; i++) {
// Clear position info
i_costs_[i] = 0.0f;
j_costs_[i] = 0.0f;
}
}
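
As an editorial note on the update above, stated against the code rather than the paper: with $t^+_r$ = i_biases_pow_[r], $t^-_r$ = j_biases_pow_[r], and $p$ = bias_p_norm_, the accumulation in GetGradientsForOneQuery and the loop above amount to

$$C^+_r = \sum_{\substack{\text{pairs whose truncated high rank is } r \\ \text{with truncated low rank } r'}} \frac{\Delta\mathrm{NDCG} \cdot \log\!\frac{1}{1 - p_\lambda}}{t^-_{r'}}, \qquad t^+_r \leftarrow \left(\frac{C^+_r}{C^+_0}\right)^{1/(1+p)},$$

and symmetrically for $C^-_{r'}$ and $t^-_{r'}$. Normalizing by the position-0 cost pins $t^+_0 = t^-_0 = 1$, so all biases are relative to the top position, and the $1/(1+p)$ exponent is the position_bias_regularizer computed in the constructor. This is intended to mirror the regularized bias update of Unbiased LambdaMART; how exactly it corresponds to the paper's equations is debated in the review thread above.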

const char* GetName() const override { return "lambdarank"; }

private:
void LogDebugPositionBiases() const {
std::stringstream message_stream;
message_stream << std::setw(10) << "position"
<< std::setw(15) << "bias_i"
<< std::setw(15) << "bias_j"
<< std::setw(15) << "i_cost"
<< std::setw(15) << "j_cost"
<< std::endl;
Log::Debug(message_stream.str().c_str());
message_stream.str("");

for (int i = 0; i < truncation_level_; ++i) {
message_stream << std::setw(10) << i
<< std::setw(15) << i_biases_pow_[i]
<< std::setw(15) << j_biases_pow_[i]
<< std::setw(15) << i_costs_[i]
<< std::setw(15) << j_costs_[i];
Log::Debug(message_stream.str().c_str());
message_stream.str("");
}
}

/*! \brief Sigmoid param */
double sigmoid_;
/*! \brief Normalize the lambdas or not */
@@ -276,6 +387,35 @@
double max_sigmoid_input_ = 50;
/*! \brief Factor that converts score to bin in Sigmoid table */
double sigmoid_table_idx_factor_;

// bias correction variables
/*! \brief power of (click) position biases */
mutable std::vector<label_t> i_biases_pow_;

/*! \brief power of (unclick) position biases */
mutable std::vector<label_t> j_biases_pow_;

/*! \brief per-position cost accumulators and per-thread buffers */
mutable std::vector<label_t> i_costs_;
mutable std::vector<std::vector<label_t>> i_costs_buffer_;

mutable std::vector<label_t> j_costs_;
mutable std::vector<std::vector<label_t>> j_costs_buffer_;

/*!
* \brief Should use lambdarank with position bias correction
* [arxiv.org/pdf/1809.05818.pdf]
*/
bool unbiased_;

/*! \brief Position bias regularizer norm */
double bias_p_norm_;

/*! \brief Position bias regularizer exponent, 1 / (1 + bias_p_norm_) */
double position_bias_regularizer;
Collaborator:
We'd better add an underscore after the member's name to be consistent with the whole code base, i.e. position_bias_regularizer_.


/*! \brief Number of threads */
int num_threads_;
};

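To make the bias update concrete outside the C++ diff, here is a standalone sketch (not part of this PR) that reproduces the arithmetic of UpdatePositionBiasesAndGradients on toy accumulated costs; the cost values are invented purely for illustration:

import numpy as np

def update_biases(i_costs, j_costs, bias_p_norm=0.5):
    # Mirrors the update loop in the diff: each bias is the ratio of a
    # position's accumulated cost to the position-0 cost, raised to the
    # regularizer exponent 1 / (1 + bias_p_norm).
    exponent = 1.0 / (1.0 + bias_p_norm)
    i_biases_pow = (i_costs / i_costs[0]) ** exponent  # clicked-side biases
    j_biases_pow = (j_costs / j_costs[0]) ** exponent  # unclicked-side biases
    return i_biases_pow, j_biases_pow

# toy per-position costs for truncation_level = 5
i_costs = np.array([4.0, 2.5, 1.8, 1.2, 0.9])
j_costs = np.array([3.0, 2.4, 2.0, 1.5, 1.1])

i_b, j_b = update_biases(i_costs, j_costs)
print(i_b)  # approx [1.0, 0.731, 0.587, 0.448, 0.370]; position 0 is pinned to 1
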
15 changes: 15 additions & 0 deletions tests/python_package_test/test_sklearn.py
@@ -145,6 +145,21 @@ def test_lambdarank():
assert gbm.best_score_['valid_0']['ndcg@3'] > 0.578


def test_lambdarank_unbiased():
rank_example_dir = Path(__file__).absolute().parents[2] / 'examples' / 'lambdarank'
X_train, y_train = load_svmlight_file(str(rank_example_dir / 'rank.train'))
X_test, y_test = load_svmlight_file(str(rank_example_dir / 'rank.test'))
q_train = np.loadtxt(str(rank_example_dir / 'rank.train.query'))
q_test = np.loadtxt(str(rank_example_dir / 'rank.test.query'))
gbm = lgb.LGBMRanker(n_estimators=50, lambdarank_unbiased=True, sigmoid=2)
gbm.fit(X_train, y_train, group=q_train, eval_set=[(X_test, y_test)],
eval_group=[q_test], eval_at=[1, 3], early_stopping_rounds=10, verbose=False,
callbacks=[lgb.reset_parameter(learning_rate=lambda x: max(0.01, 0.1 - 0.01 * x))])
assert gbm.best_iteration_ <= 24
assert gbm.best_score_['valid_0']['ndcg@1'] > 0.569
assert gbm.best_score_['valid_0']['ndcg@3'] > 0.62


def test_xendcg():
xendcg_example_dir = Path(__file__).absolute().parents[2] / 'examples' / 'xendcg'
X_train, y_train = load_svmlight_file(str(xendcg_example_dir / 'rank.train'))