Prediction of 2.1.1 compared to 1.7.6 is significantly slower #10882
Thank you for opening the issue. Yeah, we have added some more inspection for …
Hello. I was not sure if I should open a new issue, but I have the same problem. I am working on binary classification on 220M with an imbalanced dataset. The training was incremental, with 5 batches of data. Version 2.1.2 was really slow, and after that I tried 1.6.2, which was much faster.
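For context, incremental training in this sense usually means continuing one booster across data batches. A minimal sketch, assuming the standard `xgb_model` continuation argument of `xgb.train` (the parameters, batch sizes, and class balance below are placeholders, not the reporter's actual setup):

```python
import numpy as np
import xgboost as xgb

# Placeholder parameters; the real ones are requested in the reply below.
params = {"objective": "binary:logistic", "tree_method": "hist"}

rng = np.random.default_rng(0)
booster = None
for batch in range(5):
    # Placeholder batch; the real workload is far larger and imbalanced.
    X = rng.standard_normal((10_000, 20)).astype(np.float32)
    y = (rng.random(10_000) < 0.05).astype(np.int32)  # roughly 5% positives
    dtrain = xgb.DMatrix(X, label=y)
    # Continue training from the previous booster instead of starting over.
    booster = xgb.train(params, dtrain, num_boost_round=50, xgb_model=booster)
```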
@jankogasic Could you please share the training parameters?
As for the original issue, here are the two functions that take up the bulk of the time:
I'm sure some people understand the internals of pandas far better than I do. If you have suggestions for how to optimize, please share.
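The two functions aren't quoted above, but one way to see where the time goes with DataFrame input is to profile the prediction call directly. A minimal sketch, assuming a small synthetic model (the sizes, iteration counts, and use of `inplace_predict` are illustrative):

```python
import cProfile
import pstats

import numpy as np
import pandas as pd
import xgboost as xgb

# Small synthetic model, only to have something to predict with.
rng = np.random.default_rng(0)
X = rng.standard_normal((1_000, 50)).astype(np.float32)
y = (X[:, 0] > 0).astype(np.int32)
booster = xgb.train({"objective": "binary:logistic"}, xgb.DMatrix(X, label=y), num_boost_round=100)

row = pd.DataFrame(X[:1])  # single-row DataFrame, as in an online-serving call

profiler = cProfile.Profile()
profiler.enable()
for _ in range(1_000):
    booster.inplace_predict(row)
profiler.disable()

# The entries with the largest cumulative time show where prediction spends its time.
pstats.Stats(profiler).sort_stats("cumulative").print_stats(15)
```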
We are currently using xgboost 1.6.2 and are trying to upgrade to 2.1.1. On the way through the versions, we observed the following prediction time averages:
- 1.6.2: 15ms
- 1.7.6: 17ms
- 2.0.3: 43ms
- 2.1.1: 110ms
As you can see, there is a big jump from 1.7 to 2.0, and then an even bigger jump from 2.0 to 2.1. Unfortunately, it's not easy for me to share the model, but I found this related bug report and updated its scripts to my use case: #8865
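The updated scripts aren't attached here; the comparison is roughly of the following shape. A minimal sketch, assuming a small synthetic model and single-row prediction (the data, number of trees, and iteration counts are illustrative, not the production setup):

```python
import time

import numpy as np
import pandas as pd
import xgboost as xgb

# Small synthetic stand-in for the real model.
rng = np.random.default_rng(0)
X = rng.standard_normal((10_000, 50)).astype(np.float32)
y = (X[:, 0] > 0).astype(np.int32)
booster = xgb.train({"objective": "binary:logistic"}, xgb.DMatrix(X, label=y), num_boost_round=200)

row_np = X[:1]                 # single row as a numpy array
row_df = pd.DataFrame(row_np)  # the same row as a pandas DataFrame

for name, data in [("np.array", row_np), ("pd.DataFrame", row_df)]:
    start = time.perf_counter()
    for _ in range(1_000):
        booster.inplace_predict(data)
    per_call_ms = (time.perf_counter() - start) / 1_000 * 1e3
    print(f"{name}: {per_call_ms:.2f} ms per call")
```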
I get the following times when they stabilize:
While not as severe for this artificial model, it still looks like a significant performance degradation. I see now that using `pd.DataFrame` is a lot worse than `np.array`, so I think I can work around my issue. But it is still surprising to me that the performance regressed that significantly.

Additional context
Our production model has the following attributes (extracted from the model.json, in case that is helpful):
The model was trained on xgboost 1.5.2 but then re-saved on 2.1.1.
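Re-saving here just means loading the old artifact with the newer xgboost and writing it back out, so the stored file follows the current JSON schema. A minimal sketch (the file names are placeholders):

```python
import xgboost as xgb

# With xgboost 2.1.1 installed, load the artifact produced by 1.5.2 ...
booster = xgb.Booster()
booster.load_model("model_trained_on_1.5.2.json")  # placeholder path

# ... and write it back out so the file uses the current JSON schema.
booster.save_model("model_resaved_on_2.1.1.json")  # placeholder path
```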
The `requirements.lock` file
I used these version locks when measuring the above numbers; the only change to the file was `xgboost==1.7.6` when testing that version.

All tested on Ubuntu 24.04.1, 11th Gen Intel(R) Core(TM) i7-11800H.
requirements_dev.zip