Xgboost giving different results on mac and ubuntu #8834

Closed
alishametkari opened this issue Feb 23, 2023 · 6 comments

Comments

@alishametkari

I am using xgboost version 1.5.0.2 on both a Mac and an Ubuntu machine. The two machines give different predictions from xgb.train() for a time series forecasting problem. On the Mac the predictions are acceptable and in the correct range, but on Ubuntu they are very strange: almost a straight line. The difference between the two sets of predictions is huge. Why is there such a difference? I want to get the same predictions on Linux as on the Mac. How can I do that? Can anyone help me?

R version - 3.6.3
Xgboost version - 1.5.0.2

@alishametkari alishametkari changed the title from "Xgboost giving different results on mac and linux" to "Xgboost giving different results on mac and ubuntu" Feb 23, 2023
@trivialfis
Member

Could you please share a reproducible example?

@alishametkari
Author

Sure @trivialfis
Here it is.

testing_data_length <- nrow(test_x)
train_pred <- rep(NaN, nrow(train_x))
test_pred <- rep(NaN, testing_data_length)
params <- list(valid_sample_len = 5, cols_to_drop = c(), seed = 2017, nthread = 1,
               nrounds = 2000, early_stopping_rounds = 500, eval_metric = "rmse", objective = "reg:linear",
               booster = "gbtree", eta = 0.1, subsample = 0.5, colsample_bytree = 0.5)
hyper_params <- list(max_depth = 4, enable_shap_values = FALSE, enable_variable_importance_values = FALSE)

# define hyperparameters

validation_sample_len = params$valid_sample_len
train_data_len = nrow(train_x)
train_period_end = train_data_len - validation_sample_len
set.seed(params$seed)
features <- sort(colnames(train_x))
if (length(params$cols_to_drop) > 0) {
  features <- features[!features %in% params$cols_to_drop]
}
dsample <- xgb.DMatrix(data.matrix(train_x[, features]), missing = NA)

# prepare validation dataset

dval <- data.matrix(train_x[train_period_end:train_data_len, features])
dval <- xgb.DMatrix(data = dval, label = data.matrix(train_y[train_period_end:train_data_len]), missing = NA)
watchlist_dval <- list(dval = dval)

# prepare training dataset

# save the column names

# `start` (the first training row index) is not defined in this snippet
dtrain = data.matrix(train_x[start:train_data_len, features])
dtrain = xgb.DMatrix(data = dtrain, label = data.matrix(train_y[start:train_data_len]), missing = NA)

# prepare test dataset

dtest = data.matrix(test_x[, features])
dtest = xgb.DMatrix(dtest, missing = NA)

# train model

xgb_model <- xgb.train(params = hyper_params,
                       data = dtrain,
                       nrounds = params$nrounds,
                       verbose = 0,
                       print_every_n = 5,
                       early_stopping_rounds = params$early_stopping_rounds,
                       eval_metric = params$eval_metric,
                       nthread = params$nthread,
                       watchlist = watchlist_dval,
                       maximize = FALSE)

# forecast on test data

test_pred <- predict(xgb_model, dtest)
test_pred <- ifelse(test_pred < 0, 0, test_pred)

# forecast on entire dataset

train_pred <- predict(xgb_model, dsample)
train_pred <- ifelse(train_pred < 0, 0, train_pred)

out = data.frame(forecast=c(train_pred, test_pred))

@alishametkari
Author

Is this happening because of differences in the versions of other packages that xgboost depends on?
Can you tell me which packages xgboost depends on in R?
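
One way to check this on each machine is to read the dependency fields from the installed package's DESCRIPTION file and compare them alongside `sessionInfo()`. The sketch below is only an illustration: `describe_pkg` is a hypothetical helper name, not part of xgboost, and it relies solely on `utils::packageDescription()` from base R.

```r
# Hypothetical helper: return the version and declared dependency fields
# of an installed package, or NULL if the package is not installed.
describe_pkg <- function(pkg) {
  fields <- c("Version", "Depends", "Imports", "LinkingTo")
  desc <- suppressWarnings(utils::packageDescription(pkg, fields = fields))
  if (!is.list(desc)) {
    return(NULL)  # packageDescription() returns NA when the package is missing
  }
  # Drop the "packageDescription" class and flatten to a named character vector;
  # fields absent from DESCRIPTION come back as NA.
  unlist(unclass(desc)[fields])
}

# Run on both the Mac and the Ubuntu machine, then compare the output.
print(describe_pkg("xgboost"))
print(sessionInfo())
```

If the two machines report the same versions here, a difference in R-level package dependencies becomes a less likely explanation, although this check says nothing about compilers or system libraries.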

@trivialfis
Member

trivialfis commented Feb 25, 2023

Let me take a closer look. I have been working on #8822 for a while and am trying to switch back to normal maintenance work.

@trivialfis
Member

Apologies, I can't debug the code you shared without a dataset. It would be great if you could share something I can run, maybe with a pseudo dataset. I'm asking because we run tests on multiple machines (see our CI runs on PRs) and the results are consistent. The issue you are describing is new to me, and I can't guess the reason without actually reproducing it.
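
For illustration, a runnable example with a pseudo dataset might look roughly like the sketch below. Everything in it is an assumption rather than the reporter's actual setup: `make_pseudo_data` and `run_repro` are hypothetical helpers, the data is random noise plus a trend, and the `xgb.train()` call assumes the 1.5.x R API used in this thread (`watchlist`, `early_stopping_rounds`), with `reg:squarederror` standing in for the older `reg:linear` name.

```r
# Hypothetical helper: build a small synthetic dataset with a linear trend.
make_pseudo_data <- function(n = 200, p = 5, seed = 2017) {
  set.seed(seed)
  x <- matrix(rnorm(n * p), nrow = n, ncol = p,
              dimnames = list(NULL, paste0("f", seq_len(p))))
  trend <- seq_len(n) / n
  y <- 10 * trend + 2 * x[, 1] - x[, 2] + rnorm(n, sd = 0.1)
  list(x = x, y = y)
}

# Hypothetical helper: train on the pseudo data and print a summary of the
# predictions so the output of two machines can be compared by eye.
run_repro <- function(d, valid_sample_len = 5) {
  n <- nrow(d$x)
  val_idx <- (n - valid_sample_len):n  # same slicing as in the snippet above
  dtrain <- xgboost::xgb.DMatrix(d$x, label = d$y, missing = NA)
  dval <- xgboost::xgb.DMatrix(d$x[val_idx, , drop = FALSE],
                               label = d$y[val_idx], missing = NA)
  # All training parameters go into the list handed to xgb.train().
  params <- list(objective = "reg:squarederror", booster = "gbtree",
                 eta = 0.1, max_depth = 4, subsample = 0.5,
                 colsample_bytree = 0.5, eval_metric = "rmse", nthread = 1)
  set.seed(2017)
  model <- xgboost::xgb.train(params = params, data = dtrain, nrounds = 50,
                              watchlist = list(dval = dval),
                              early_stopping_rounds = 10, verbose = 0)
  pred <- predict(model, dtrain)
  print(summary(pred))
  print(sprintf("%.8f", head(pred)))
  invisible(pred)
}
```

Calling `run_repro(make_pseudo_data())` on both machines and pasting the printed output would give something concrete to compare. One design difference from the snippet above is worth noting: there, `eta`, `subsample`, `colsample_bytree`, and `objective` sit in a separate `params` list, while only `hyper_params` is passed to `xgb.train()`; this sketch puts them in the list that is actually passed.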

@trivialfis
Member

Closing, as the issue has stalled.
