# Experimental Support of Horizontal Federated XGBoost using NVFlare

This directory contains a demo of Horizontal Federated Learning using NVFlare. In the horizontal setting, each participant holds its own subset of the training rows (instances), while all participants share the same feature columns.
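
Under the hood, each NVFlare client runs ordinary XGBoost training inside a federated communicator context provided by the federated plugin. A rough sketch of the per-site logic, based on the plugin API in the XGBoost 1.7/2.0 series (the server address, port, rank, and file names here are illustrative, not taken from this demo's configuration):

```python
import xgboost as xgb

# Illustrative values: site-1 would use rank 0, site-2 rank 1; the federated
# gRPC server address depends on the NVFlare job configuration.
communicator_env = {
    "xgboost_communicator": "federated",
    "federated_server_address": "localhost:9091",
    "federated_world_size": 2,
    "federated_rank": 0,
}

# Inside the context, gradient and histogram aggregation goes through the
# federated server, so every site ends up with the same global model.
with xgb.collective.CommunicatorContext(**communicator_env):
    dtrain = xgb.DMatrix("site-1.train.csv?format=csv&label_column=0")
    dtest = xgb.DMatrix("site-1.test.csv?format=csv&label_column=0")
    bst = xgb.train(
        {"objective": "binary:logistic"},
        dtrain,
        num_boost_round=20,
        evals=[(dtest, "eval"), (dtrain, "train")],
    )
    bst.save_model("test.model.json")
```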

## Training with CPU only

To run the demo, first build XGBoost with the federated learning plugin enabled (see the plugin README; the relevant CMake option is `-DPLUGIN_FEDERATED=ON`).

Install NVFlare:

```shell
pip install nvflare
```

The steps below assume an NVFlare POC ("proof of concept") workspace with one server and two clients has already been provisioned under `/tmp/nvflare/poc`; the provisioning command varies with the NVFlare version (`poc -n 2` in older releases, `nvflare poc prepare -n 2` in newer ones).

Prepare the data:

```shell
./prepare_data.sh
```
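
For reference, a horizontal split simply partitions the rows of the training data across sites. A minimal sketch of the idea (the file names are hypothetical; the actual splitting is done by `prepare_data.sh`):

```python
import numpy as np

# Hypothetical input: each row is one instance (label + features).
data = np.loadtxt("train.csv", delimiter=",")

# Horizontal federation: each site receives a disjoint subset of ROWS,
# while all sites keep the same COLUMNS (features).
for rank, shard in enumerate(np.array_split(data, 2)):
    np.savetxt(f"site-{rank + 1}.train.csv", shard, delimiter=",")
```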

Start the NVFlare federated server:

```shell
/tmp/nvflare/poc/server/startup/start.sh
```

In another terminal, start the first worker:

```shell
/tmp/nvflare/poc/site-1/startup/start.sh
```

And the second worker:

```shell
/tmp/nvflare/poc/site-2/startup/start.sh
```

Then start the admin CLI:

```shell
/tmp/nvflare/poc/admin/startup/fl_admin.sh
```

In the admin CLI, run the following command:

```
submit_job horizontal-xgboost
```
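
While the job runs, you can monitor it from the same console; NVFlare's standard admin commands `check_status server` and `check_status client` show the current state of each participant.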

Make a note of the job id:

```
Submitted job: 28309e77-a7c5-45e6-b2bc-c2e3655122d8
```

On both workers, you should see train and eval losses printed:

```
[10:45:41] [0]	eval-logloss:0.22646	train-logloss:0.23316
[10:45:41] [1]	eval-logloss:0.13776	train-logloss:0.13654
[10:45:41] [2]	eval-logloss:0.08036	train-logloss:0.08243
[10:45:41] [3]	eval-logloss:0.05830	train-logloss:0.05645
[10:45:41] [4]	eval-logloss:0.03825	train-logloss:0.04148
[10:45:41] [5]	eval-logloss:0.02660	train-logloss:0.02958
[10:45:41] [6]	eval-logloss:0.01386	train-logloss:0.01918
[10:45:41] [7]	eval-logloss:0.01018	train-logloss:0.01331
[10:45:41] [8]	eval-logloss:0.00847	train-logloss:0.01112
[10:45:41] [9]	eval-logloss:0.00691	train-logloss:0.00662
[10:45:41] [10]	eval-logloss:0.00543	train-logloss:0.00503
[10:45:41] [11]	eval-logloss:0.00445	train-logloss:0.00420
[10:45:41] [12]	eval-logloss:0.00336	train-logloss:0.00355
[10:45:41] [13]	eval-logloss:0.00277	train-logloss:0.00280
[10:45:41] [14]	eval-logloss:0.00252	train-logloss:0.00244
[10:45:41] [15]	eval-logloss:0.00177	train-logloss:0.00193
[10:45:41] [16]	eval-logloss:0.00156	train-logloss:0.00161
[10:45:41] [17]	eval-logloss:0.00135	train-logloss:0.00142
[10:45:41] [18]	eval-logloss:0.00123	train-logloss:0.00125
[10:45:41] [19]	eval-logloss:0.00106	train-logloss:0.00107
```

Once training finishes, the model files should be written to `/tmp/nvflare/poc/site-1/${job_id}/test.model.json` and `/tmp/nvflare/poc/site-2/${job_id}/test.model.json` respectively, where `job_id` is the UUID printed when we ran `submit_job`.
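
Because horizontal training builds a single global model, the two files should contain the same model. They can be inspected with the regular XGBoost API; a quick sanity check (substitute the actual job id):

```python
import xgboost as xgb

# Load the model saved by site-1; replace <job_id> with the UUID from submit_job.
bst = xgb.Booster(model_file="/tmp/nvflare/poc/site-1/<job_id>/test.model.json")
print(bst.num_boosted_rounds())  # 20 in this demo, matching the log above
```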

Finally, shut down everything from the admin CLI, using `admin` as the password:

```
shutdown client
shutdown server
```

## Training with GPUs

To run the demo with GPUs, make sure your machine has at least 2 GPUs. Build XGBoost with the federated learning plugin enabled along with CUDA support (see the README).

Modify `../config/config_fed_client.json` and set `use_gpus` to `true`, then repeat the steps above.
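
For orientation, the flag lives in the executor arguments of the client config. A trimmed illustration of where it goes (the component path and task name below are placeholders; only the `use_gpus` flag comes from this demo's instructions):

```json
{
  "executors": [
    {
      "tasks": ["train"],
      "executor": {
        "path": "trainer.XGBoostTrainer",
        "args": {
          "use_gpus": true
        }
      }
    }
  ]
}
```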