Rankings depend on OS #6709

Closed · gprovencher-cdpq opened this issue Feb 15, 2021 · 5 comments · Fixed by #8822

Labels: LTR (Learning to rank)

@gprovencher-cdpq commented Feb 15, 2021

Hi,

I observe that the rankings obtained with XGBRanker are not the same on Windows and Linux. (I use xgboost 1.3.3.)

from xgboost import XGBRanker
import numpy as np
import pandas as pd

# Training set.
seed = 1
np.random.seed(seed)
input_df = pd.DataFrame({
    "feature_1": np.random.rand(1000),
    "feature_2": np.random.rand(1000)
})

# Target column has a simple relation with respect to the feature columns.
input_df["target"] = input_df["feature_1"] * input_df["feature_2"]
 
feature_columns = ["feature_1", "feature_2"]
target_columns = "target"
    
estimator = XGBRanker(
    base_score=0.5, booster="gbtree", colsample_bylevel=1.0,
    colsample_bynode=1.0, colsample_bytree=1.0, gamma=0.5, gpu_id=-1,
    importance_type="gain", interaction_constraints="",
    learning_rate=0.3, max_delta_step=1, max_depth=3,
    min_child_weight=5, missing=np.nan, monotone_constraints="()",
    n_estimators=50, n_jobs=1, num_parallel_tree=1,
    random_state=0, reg_alpha=0, reg_lambda=1.0,
    scale_pos_weight=None, subsample=1.0, tree_method="exact",
    validate_parameters=False, verbosity=0, nthread=1
)
 
X = input_df[feature_columns].copy()
y = input_df[target_columns].copy()
group = np.array([len(X)])
 
estimator.fit(X=X, y=y, group=group, sample_weight=None, base_margin=None, 
              eval_set=None, sample_weight_eval_set=None, 
              eval_group=None, eval_metric=None, 
              early_stopping_rounds=None, verbose=False, xgb_model=None, 
              feature_weights=None, callbacks=None)

print("Feature importance:", estimator.feature_importances_)

# Dataset for predictions.
np.random.seed(seed + 1)
to_predict_df = pd.DataFrame({
    "feature_1": np.random.rand(1000),
    "feature_2": np.random.rand(1000)
})
to_predict_df["result"] = estimator.predict(data=to_predict_df)

to_predict_df["answer"] = to_predict_df["feature_1"] * to_predict_df["feature_2"]
to_predict_df["answer_rank"] = to_predict_df["answer"].rank().astype(int)
to_predict_df["result_rank"] = to_predict_df["result"].rank().astype(int)
to_predict_df["rank_error"] = abs(to_predict_df["result_rank"] - to_predict_df["answer_rank"])
 
print("Mean rank error:", to_predict_df["rank_error"].sum() / len(to_predict_df))

Windows 10 output:

Feature importance: [0.47474855 0.5252515 ]
Mean rank error: 12.312

to_predict_df
     feature_1  feature_2    result    answer  answer_rank  result_rank  rank_error
0     0.435995   0.621843  1.746030  0.271120          669          679          10
1     0.025926   0.565579 -4.864852  0.014663           71           59          12
2     0.549662   0.152671 -1.752274  0.083917          317          316           1
3     0.435322   0.813437  2.556569  0.354107          748          753           5
4     0.420368   0.983462  3.193708  0.413416          799          807           8
..         ...        ...       ...       ...          ...          ...         ...
995   0.598505   0.339895  0.660354  0.203429          572          564           8
996   0.358920   0.884508  2.221937  0.317468          710          717           7
997   0.680391   0.940445  5.266938  0.639871          930          923           7
998   0.853200   0.759583  5.297351  0.648076          932          928           4
999   0.792843   0.236106  0.339883  0.187195          545          536           9

Linux output (Amazon Linux AMI 2018.03, kernel 4.14.214-118.339.amzn1.x86_64):

Feature importance: [0.5133879  0.48661205]
Mean rank error: 12.831

to_predict_df
     feature_1  feature_2    result    answer  answer_rank  result_rank  rank_error
0     0.435995   0.621843  1.481333  0.271120          669          662           7
1     0.025926   0.565579 -4.684215  0.014663           71           61          10
2     0.549662   0.152671 -1.649529  0.083917          317          322           5
3     0.435322   0.813437  2.654828  0.354107          748          759          11
4     0.420368   0.983462  3.061021  0.413416          799          792           7
..         ...        ...       ...       ...          ...          ...         ...
995   0.598505   0.339895  0.751275  0.203429          572          580           8
996   0.358920   0.884508  2.387831  0.317468          710          729          19
997   0.680391   0.940445  5.248150  0.639871          930          925           5
998   0.853200   0.759583  5.553242  0.648076          932          939           7
999   0.792843   0.236106  0.337867  0.187195          545          522          23

My understanding is that nothing in this computation should involve randomness; am I wrong?

Thanks,
Guillaume

gprovencher-cdpq changed the title from "Mismatch OS" to "Rankings depends on OS" on Feb 15, 2021
gprovencher-cdpq changed the title from "Rankings depends on OS" to "Rankings depend on OS" on Feb 15, 2021
@trivialfis (Member) commented Feb 15, 2021

Could you please confirm the generated data is the same, and share the difference between outputs?

@gprovencher-cdpq (Author)
I confirm the generated data is exactly the same.
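
For reference, one possible check (a hypothetical verification, not necessarily what was done here) is to hash the raw bytes of the generated frame on both machines:

import hashlib

import numpy as np
import pandas as pd

# Regenerate the training frame exactly as in the snippet above.
seed = 1
np.random.seed(seed)
input_df = pd.DataFrame({
    "feature_1": np.random.rand(1000),
    "feature_2": np.random.rand(1000)
})

# Identical digests on Windows and Linux mean the inputs are
# bit-for-bit the same, so any divergence happens inside training.
print(hashlib.sha256(input_df.to_numpy().tobytes()).hexdigest())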

@trivialfis (Member)
There are many contributing factors to floating-point differences, like different hardware, different compiler optimisations, etc. It's quite difficult to get portable results, if possible at all. I will look into it later to see whether there's any human error on xgboost's side, but I can't promise identical results on the two platforms.
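
For example, floating-point addition is not associative, so a different reduction order alone (from vectorisation, threading, or compiler optimisation) can flip low-order bits, which a tree's split search can then amplify:

# The same three numbers summed in two orders give different doubles.
print((0.1 + 0.2) + 0.3)  # 0.6000000000000001
print(0.1 + (0.2 + 0.3))  # 0.6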

@gprovencher-cdpq (Author) commented Feb 16, 2021

I agree that it's not possible to get exactly the same results, up to machine precision. Here, however, something doesn't seem quite right.

When I take the same code snippet as above but use XGBRegressor instead of XGBRanker (and remove the superfluous parameters), I get the same results to the 8th decimal place, that is, the same feature importances and prediction results, on Windows and Linux.
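
For concreteness, the regressor variant looks roughly like this (a sketch reusing X, y, feature_columns and to_predict_df from the snippet above; the exact set of parameters kept is an assumption):

from xgboost import XGBRegressor

# Same data as the ranker snippet, minus the ranker-specific
# arguments (group, eval_group, ...).
regressor = XGBRegressor(
    n_estimators=50, max_depth=3, learning_rate=0.3,
    tree_method="exact", random_state=0, n_jobs=1
)
regressor.fit(X, y)

print("Feature importance:", regressor.feature_importances_)
print(regressor.predict(to_predict_df[feature_columns])[:5])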

My concern is: which implementation gives the "correct" results?

Again, thanks for your help!

trivialfis added the LTR (Learning to rank) label on Mar 30, 2021
@trivialfis (Member)
It's caused by the different implementations of linear_congruential_engine between gcc and msvc. I will document the behaviour in #8822.
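
For context, a minimal Python model of such an engine (the parameters below are those of std::minstd_rand, one linear_congruential_engine typedef; treating the draw-to-index mapping as the place where gcc and msvc diverge is an illustrative assumption):

# Linear congruential engine with std::minstd_rand parameters:
# x_{n+1} = 48271 * x_n mod (2**31 - 1).
def minstd(seed):
    x = seed
    while True:
        x = (48271 * x) % (2**31 - 1)
        yield x

gen = minstd(1)
raw = [next(gen) for _ in range(3)]
print(raw)  # the raw stream is reproducible for a fixed seed

# Standard libraries can still differ in how a raw draw is turned
# into a sample, e.g. an index in [0, n); the two plausible mappings
# below disagree, and one differing sampled row can change every
# subsequent tree split.
n = 1000
draw = raw[0]
print(draw % n)                      # modulo mapping
print(int(draw / (2**31 - 1) * n))   # scaling mapping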
