-
Notifications
You must be signed in to change notification settings - Fork 215
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Enable federated XGBoost using bootstrap aggregation in Task Runner #1151
Enable federated XGBoost using bootstrap aggregation in Task Runner #1151
Conversation
Signed-off-by: kta-intel <[email protected]>
Signed-off-by: kta-intel <[email protected]>
Signed-off-by: kta-intel <[email protected]>
Signed-off-by: kta-intel <[email protected]>
Signed-off-by: kta-intel <[email protected]>
Signed-off-by: kta-intel <[email protected]>
Signed-off-by: kta-intel <[email protected]>
Signed-off-by: kta-intel <[email protected]>
Signed-off-by: kta-intel <[email protected]>
Signed-off-by: kta-intel <[email protected]>
Signed-off-by: kta-intel <[email protected]>
Signed-off-by: kta-intel <[email protected]>
Signed-off-by: kta-intel <[email protected]>
Signed-off-by: kta-intel <[email protected]>
This reverts commit d3937ef. Signed-off-by: kta-intel <[email protected]>
6c79178
to
dd2027c
Compare
Signed-off-by: kta-intel <[email protected]>
Signed-off-by: kta-intel <[email protected]>
Signed-off-by: kta-intel <[email protected]>
Signed-off-by: kta-intel <[email protected]>
Signed-off-by: kta-intel <[email protected]>
Signed-off-by: kta-intel <[email protected]>
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
General question: Have you used specific formatters for yaml
files?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I copied over the yaml
from other workspaces as a template then ran bash shell/format.sh
in the whole repo. Is there something additional that you recommend?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Awesome work, thanks @kta-intel! I have a couple of questions and comments, but overall the PR looks in excellent shape for such a sizeable new feature.
PS: is there an easy way to add a CI job that covers at least the "happy path" of an XGBoost-based federation?
Signed-off-by: kta-intel <[email protected]>
Signed-off-by: kta-intel <[email protected]>
Signed-off-by: kta-intel <[email protected]>
Signed-off-by: kta-intel <[email protected]>
Signed-off-by: kta-intel <[email protected]>
Signed-off-by: kta-intel <[email protected]>
Signed-off-by: kta-intel <[email protected]>
Signed-off-by: kta-intel <[email protected]>
Signed-off-by: kta-intel <[email protected]>
Signed-off-by: kta-intel <[email protected]>
Signed-off-by: kta-intel <[email protected]>
Signed-off-by: kta-intel <[email protected]>
Thanks for the review Teo! I'm not sure what you mean by this, would this just be a toy sanity check CI? |
Signed-off-by: kta-intel <[email protected]>
Signed-off-by: kta-intel <[email protected]>
Signed-off-by: kta-intel <[email protected]>
Signed-off-by: kta-intel <[email protected]>
Signed-off-by: kta-intel <[email protected]>
Signed-off-by: kta-intel <[email protected]>
Signed-off-by: kta-intel <[email protected]>
Yes, I mean a very basic CI job that does an E2E run of the XGBoost workspace. Like here, but using Other than that, the PR is ready to be merged IMO. Thanks, @kta-intel ! |
I see! This is a good idea, but lets save it as a separate PR. Thanks for the review and suggestion(s) @teoparvanov! |
This PR enables a TaskRunner-based federated XGBoost using the bootstrap aggregation
Specifically this PR:
xgb_higgs
task runner workspace to train on the higgs dataset [ref] with all required code (i.e.src/taskrunner.py
,src/dataloader.py
,plan/*.yaml
, etc.tasks_xgb.yaml
to enable newFedBaggingXGBoost
aggregation when running xgb training workloadsdelta_updates
parameter to Aggregator in order to bypass delta updating (for deep learning models getting weight deltas makes sense since the model size should stay relatively consistent, for tree-based algorithms, this makes less sense because more trees are added over time)delta_updates
is set totrue
by default to preserve normal behavior. xgboost taskrunner explicitly sets it tofalse
to bypass itloader_xgb.py
as the backend / superclass tosrc/dataloader.py
runner_xgb.py
as the backend / superclass tosrc/taskrunner.py
aggregation_function.fed_bagging
which bags the latest trees to a global model, consistent with currently accept federated xgboost algorithms in the industry