This directory contains a demo of Horizontal Federated Learning using NVFlare. In the horizontal setting, each participating site holds its own rows of the same dataset (identical features, different instances).
To run the demo, first build XGBoost with the federated learning plugin enabled (see the README).
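If you have not built the plugin yet, the build boils down to enabling one CMake flag; the snippet below is a rough sketch, and the plugin README remains the authoritative reference:

```shell
# Illustrative only; see the plugin README for the full, authoritative steps.
cd xgboost
mkdir -p build && cd build
cmake .. -DPLUGIN_FEDERATED=ON  # the federated plugin also requires gRPC
make -j4
```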
Install NVFlare:

```shell
pip install nvflare
```
Prepare the data:

```shell
./prepare_data.sh
```
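For orientation, a horizontal split simply partitions the training rows between the two sites. The sketch below illustrates the idea only; the file names are assumptions, and the real `prepare_data.sh` may work differently:

```python
# Hypothetical sketch of a horizontal (row-wise) split, NOT the real
# prepare_data.sh: each site receives different rows of the same columns.
with open("agaricus.txt.train") as f:  # assumed input file name
    rows = f.readlines()

half = len(rows) // 2
with open("site-1.train", "w") as f:   # assumed output file names
    f.writelines(rows[:half])
with open("site-2.train", "w") as f:
    f.writelines(rows[half:])
```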
Start the NVFlare federated server:

```shell
/tmp/nvflare/poc/server/startup/start.sh
```
In another terminal, start the first worker:

```shell
/tmp/nvflare/poc/site-1/startup/start.sh
```
And the second worker:

```shell
/tmp/nvflare/poc/site-2/startup/start.sh
```
Then start the admin CLI:

```shell
/tmp/nvflare/poc/admin/startup/fl_admin.sh
```
In the admin CLI, run the following command:

```shell
submit_job horizontal-xgboost
```
Make a note of the job ID:

```
Submitted job: 28309e77-a7c5-45e6-b2bc-c2e3655122d8
```
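While the job is running, you can check on it from the same admin CLI; `check_status server` and `list_jobs` are standard NVFlare admin commands:

```shell
check_status server
list_jobs
```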
On both workers, you should see the train and eval losses printed:

```
[10:45:41] [0] eval-logloss:0.22646 train-logloss:0.23316
[10:45:41] [1] eval-logloss:0.13776 train-logloss:0.13654
[10:45:41] [2] eval-logloss:0.08036 train-logloss:0.08243
[10:45:41] [3] eval-logloss:0.05830 train-logloss:0.05645
[10:45:41] [4] eval-logloss:0.03825 train-logloss:0.04148
[10:45:41] [5] eval-logloss:0.02660 train-logloss:0.02958
[10:45:41] [6] eval-logloss:0.01386 train-logloss:0.01918
[10:45:41] [7] eval-logloss:0.01018 train-logloss:0.01331
[10:45:41] [8] eval-logloss:0.00847 train-logloss:0.01112
[10:45:41] [9] eval-logloss:0.00691 train-logloss:0.00662
[10:45:41] [10] eval-logloss:0.00543 train-logloss:0.00503
[10:45:41] [11] eval-logloss:0.00445 train-logloss:0.00420
[10:45:41] [12] eval-logloss:0.00336 train-logloss:0.00355
[10:45:41] [13] eval-logloss:0.00277 train-logloss:0.00280
[10:45:41] [14] eval-logloss:0.00252 train-logloss:0.00244
[10:45:41] [15] eval-logloss:0.00177 train-logloss:0.00193
[10:45:41] [16] eval-logloss:0.00156 train-logloss:0.00161
[10:45:41] [17] eval-logloss:0.00135 train-logloss:0.00142
[10:45:41] [18] eval-logloss:0.00123 train-logloss:0.00125
[10:45:41] [19] eval-logloss:0.00106 train-logloss:0.00107
```
Once the training finishes, the model files should be written to
`/tmp/nvflare/poc/site-1/${job_id}/test.model.json` and
`/tmp/nvflare/poc/site-2/${job_id}/test.model.json` respectively, where
`job_id` is the UUID printed out when we ran `submit_job`.
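As a quick sanity check, the saved model can be loaded back with the regular XGBoost Python API (substitute your own job ID in the path):

```python
import xgboost as xgb

# Load the JSON model written by the federated job on site-1.
bst = xgb.Booster()
bst.load_model(
    "/tmp/nvflare/poc/site-1/28309e77-a7c5-45e6-b2bc-c2e3655122d8/test.model.json"
)
print(bst.num_boosted_rounds())  # the log above shows 20 boosting rounds
```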
Finally, shut everything down from the admin CLI, using `admin` as the password:

```shell
shutdown client
shutdown server
```
To run the demo with GPUs, make sure your machine has at least 2 GPUs, and build XGBoost with both the federated learning plugin and CUDA enabled (see the README). Modify `../config/config_fed_client.json`, set `use_gpus` to `true`, and then repeat the steps above.
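For orientation, the relevant part of `config_fed_client.json` looks roughly like the fragment below; only the `use_gpus` key comes from this demo, and the surrounding structure is an assumption based on typical NVFlare client configs:

```json
{
  "executors": [
    {
      "tasks": ["train"],
      "executor": {
        "args": {
          "use_gpus": true
        }
      }
    }
  ]
}
```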