Rankings depend on OS #6709

Closed · gprovencher-cdpq opened this issue Feb 15, 2021 · 5 comments · Fixed by #8822

Labels: LTR (Learning to rank)

@gprovencher-cdpq commented Feb 15, 2021

Hi,

I observe that the rankings obtained with XGBRanker are not the same on Windows and Linux. (I use xgboost 1.3.3.)

from xgboost import XGBRanker
import numpy as np
import pandas as pd

# Training set.
seed = 1
np.random.seed(seed)
input_df = pd.DataFrame({
    "feature_1": np.random.rand(1000),
    "feature_2": np.random.rand(1000)
})

# Target column has a simple relation with respect to the feature columns.
input_df["target"] = input_df["feature_1"] * input_df["feature_2"]
 
feature_columns = ["feature_1", "feature_2"]
target_columns = "target"
    
estimator = XGBRanker(
    base_score=0.5, booster="gbtree", colsample_bylevel=1.0,
    colsample_bynode=1.0, colsample_bytree=1.0, gamma=0.5, gpu_id=-1,
    importance_type="gain", interaction_constraints="",
    learning_rate=0.3, max_delta_step=1, max_depth=3,
    min_child_weight=5, missing=np.nan, monotone_constraints="()",
    n_estimators=50, n_jobs=1, num_parallel_tree=1,
    random_state=0, reg_alpha=0, reg_lambda=1.0,
    scale_pos_weight=None, subsample=1.0, tree_method="exact",
    validate_parameters=False, verbosity=0, nthread=1
)
 
X = input_df[feature_columns].copy()
y = input_df[target_columns].copy()
group = np.array([len(X)])
 
estimator.fit(X=X, y=y, group=group, sample_weight=None, base_margin=None, 
              eval_set=None, sample_weight_eval_set=None, 
              eval_group=None, eval_metric=None, 
              early_stopping_rounds=None, verbose=False, xgb_model=None, 
              feature_weights=None, callbacks=None)

print("Feature importance:", estimator.feature_importances_)

# Dataset for predictions.
np.random.seed(seed + 1)
to_predict_df = pd.DataFrame({
    "feature_1": np.random.rand(1000),
    "feature_2": np.random.rand(1000)
})
to_predict_df["result"] = estimator.predict(data=to_predict_df)

to_predict_df["answer"] = to_predict_df["feature_1"] * to_predict_df["feature_2"]
to_predict_df["answer_rank"] = to_predict_df["answer"].rank().astype(int)
to_predict_df["result_rank"] = to_predict_df["result"].rank().astype(int)
to_predict_df["rank_error"] = abs(to_predict_df["result_rank"] - to_predict_df["answer_rank"])
 
print("Mean rank error:", to_predict_df["rank_error"].sum() / len(to_predict_df))

Windows 10 output:

Feature importance: [0.47474855 0.5252515 ]
Mean rank error: 12.312

to_predict_df
     feature_1  feature_2    result    answer  answer_rank  result_rank  rank_error
0     0.435995   0.621843  1.746030  0.271120          669          679          10
1     0.025926   0.565579 -4.864852  0.014663           71           59          12
2     0.549662   0.152671 -1.752274  0.083917          317          316           1
3     0.435322   0.813437  2.556569  0.354107          748          753           5
4     0.420368   0.983462  3.193708  0.413416          799          807           8
..         ...        ...       ...       ...          ...          ...         ...
995   0.598505   0.339895  0.660354  0.203429          572          564           8
996   0.358920   0.884508  2.221937  0.317468          710          717           7
997   0.680391   0.940445  5.266938  0.639871          930          923           7
998   0.853200   0.759583  5.297351  0.648076          932          928           4
999   0.792843   0.236106  0.339883  0.187195          545          536           9

Linux output (Amazon Linux AMI 2018.03, kernel 4.14.214-118.339.amzn1.x86_64):

Feature importance: [0.5133879  0.48661205]
Mean rank error: 12.831

to_predict_df
     feature_1  feature_2    result    answer  answer_rank  result_rank  rank_error
0     0.435995   0.621843  1.481333  0.271120          669          662           7
1     0.025926   0.565579 -4.684215  0.014663           71           61          10
2     0.549662   0.152671 -1.649529  0.083917          317          322           5
3     0.435322   0.813437  2.654828  0.354107          748          759          11
4     0.420368   0.983462  3.061021  0.413416          799          792           7
..         ...        ...       ...       ...          ...          ...         ...
995   0.598505   0.339895  0.751275  0.203429          572          580           8
996   0.358920   0.884508  2.387831  0.317468          710          729          19
997   0.680391   0.940445  5.248150  0.639871          930          925           5
998   0.853200   0.759583  5.553242  0.648076          932          939           7
999   0.792843   0.236106  0.337867  0.187195          545          522          23

My understanding is that nothing in this computation should involve randomness; am I wrong?

Thanks,
Guillaume

gprovencher-cdpq changed the title from "Mismatch OS" to "Rankings depends on OS" on Feb 15, 2021
gprovencher-cdpq changed the title from "Rankings depends on OS" to "Rankings depend on OS" on Feb 15, 2021
@trivialfis (Member) commented Feb 15, 2021

Could you please confirm the generated data is the same, and share the difference between outputs?

@gprovencher-cdpq (Author)
I confirm the generated data is exactly the same.
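
For reference, one possible check (a hypothetical verification, not necessarily what was done here) is to hash the raw bytes of the generated frame on both machines:

import hashlib

import numpy as np
import pandas as pd

# Regenerate the training frame exactly as in the snippet above.
seed = 1
np.random.seed(seed)
input_df = pd.DataFrame({
    "feature_1": np.random.rand(1000),
    "feature_2": np.random.rand(1000)
})

# Identical digests on Windows and Linux mean the inputs are
# bit-for-bit the same, so any divergence happens inside training.
print(hashlib.sha256(input_df.to_numpy().tobytes()).hexdigest())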

@trivialfis (Member)
There are many contributing factors to floating-point differences, like different hardware, different compiler optimisations, etc. It's quite difficult to get portable results, if possible at all. I will look into it later to see whether there's any human error on xgboost's side, but I can't promise identical results on the two platforms.
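
For example, floating-point addition is not associative, so a different reduction order alone (from vectorisation, threading, or compiler optimisation) can flip low-order bits, which a tree's split search can then amplify:

# The same three numbers summed in two orders give different doubles.
print((0.1 + 0.2) + 0.3)  # 0.6000000000000001
print(0.1 + (0.2 + 0.3))  # 0.6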

@gprovencher-cdpq (Author) commented Feb 16, 2021

I agree that it's not possible to get exactly the same results, up to machine precision. Here, however, something doesn't seem quite right.

When I take the same code snippet as above but use XGBRegressor instead of XGBRanker (and remove the superfluous parameters), I get the same results to the 8th decimal place, that is, the same feature importances and prediction results, on Windows and Linux.
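
For concreteness, the regressor variant looks roughly like this (a sketch reusing X, y, feature_columns and to_predict_df from the snippet above; the exact set of parameters kept is an assumption):

from xgboost import XGBRegressor

# Same data as the ranker snippet, minus the ranker-specific
# arguments (group, eval_group, ...).
regressor = XGBRegressor(
    n_estimators=50, max_depth=3, learning_rate=0.3,
    tree_method="exact", random_state=0, n_jobs=1
)
regressor.fit(X, y)

print("Feature importance:", regressor.feature_importances_)
print(regressor.predict(to_predict_df[feature_columns])[:5])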

My concern is: which implementation gives the "correct" results?

Again, thanks for your help!

trivialfis added the LTR (Learning to rank) label on Mar 30, 2021
@trivialfis (Member)
It's caused by the different implementations of linear_congruential_engine between gcc and msvc. I will document the behaviour in #8822.
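
For context, a minimal Python model of such an engine (the parameters below are those of std::minstd_rand, one linear_congruential_engine typedef; treating the draw-to-index mapping as the place where gcc and msvc diverge is an illustrative assumption):

# Linear congruential engine with std::minstd_rand parameters:
# x_{n+1} = 48271 * x_n mod (2**31 - 1).
def minstd(seed):
    x = seed
    while True:
        x = (48271 * x) % (2**31 - 1)
        yield x

gen = minstd(1)
raw = [next(gen) for _ in range(3)]
print(raw)  # the raw stream is reproducible for a fixed seed

# Standard libraries can still differ in how a raw draw is turned
# into a sample, e.g. an index in [0, n); the two plausible mappings
# below disagree, and one differing sampled row can change every
# subsequent tree split.
n = 1000
draw = raw[0]
print(draw % n)                      # modulo mapping
print(int(draw / (2**31 - 1) * n))   # scaling mapping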
