This benchmark is dedicated to tuning cross-silo FL strategies on FLamby's datasets. The goal is to maximize the average metric across clients, evaluating each provided model on the val/test clients:
where n_features stands for the number of features.
Try to beat FLamby's baselines by adding your own solver!
You can even use your favorite Python FL frameworks, such as substra or FedBioMed, to build your solver!
First go to FLamby and install it using the following commands (see the API Doc if needed):
$ git clone https://github.com/owkin/FLamby.git
$ cd FLamby
$ conda create -n benchmark_flamby
$ conda activate benchmark_flamby
$ pip install -e ".[all_extra]"  # Note that the all_extra option installs all dependencies for all 7 datasets
This benchmark can then be run on Fed-TCGA-BRCA's validation sets using the following commands, which will launch a grid-search on all parameters found in utils/common.py for the FederatedAveraging strategy doing 120 rounds (--max-runs 12 * 10) with 100 local updates per round:
$ pip install -U benchopt
$ cd ..
$ git clone https://github.com/owkin/benchmark_flamby
$ cd benchmark_flamby
$ benchopt run --timeout 24h --max-runs 12 -s FederatedAveraging -d Fed-TCGA-BRCA
To test specific hyper-parameter values, fill a YAML config file with the appropriate hyper-parameters for each solver, following the example_config.yml file, and pass it to benchopt:
$ benchopt run --config ./example_config.yml
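As a rough illustration of what such a config could contain, the sketch below writes a small YAML file from the shell. It assumes that the config keys mirror the long CLI options (solver, dataset, max-runs), as described in benchopt's documentation; the file name my_config.yml is hypothetical, and the repository's actual example_config.yml may be organized differently.

```shell
# Write a hypothetical config file (my_config.yml is an illustrative name,
# not a file shipped with the benchmark).
cat > my_config.yml <<'EOF'
solver:
  - FederatedAveraging[batch_size=32,learning_rate=0.031622776601683794]
dataset:
  - Fed-TCGA-BRCA
max-runs: 12
EOF

# It would then be passed to benchopt like the example config:
# benchopt run --config ./my_config.yml
```

Quoting the heredoc delimiter ('EOF') keeps the shell from expanding anything inside the YAML body, so the bracketed solver string is written verbatim.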
Or use the CLI directly:
$ benchopt run -s FederatedAveraging[batch_size=32,learning_rate=0.031622776601683794]
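The learning rate above equals 10^-1.5, which suggests a log-spaced grid (an assumption; the actual grid is defined in utils/common.py). The following dry-run sketch prints one command per learning rate for a few illustrative exponents, without executing anything. The function name print_lr_commands and the chosen exponents are hypothetical.

```shell
# Dry run: print one benchopt command per learning rate instead of running it.
# The exponents are illustrative and are not the benchmark's actual grid.
print_lr_commands() {
  for exp in -2 -1.5 -1; do
    # Compute 10^exp with awk and keep 6 significant digits.
    lr=$(awk -v e="$exp" 'BEGIN { printf "%.6g", 10 ^ e }')
    echo "benchopt run -s \"FederatedAveraging[batch_size=32,learning_rate=${lr}]\" -d Fed-TCGA-BRCA"
  done
}

print_lr_commands
```

The solver string is wrapped in double quotes in the printed commands because some shells, zsh in particular, treat unquoted square brackets as a glob pattern.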
For the whole benchmark on Fed-TCGA-BRCA, we successively run all hyper-parameters of the grid for all strategies. To reproduce the results, launch the following command (note that it takes several hours to complete, but results can be cached):
$ bash launch_validation_benchmarks.sh
This script should reproduce the HTML plot shown in the results for Fed-TCGA-BRCA and produce a config with the best validation hyper-parameters for each strategy.
To produce the final plot on the test sets, run:
$ benchopt run --timeout 24h --config ./best_config_test_Fed-TCGA-BRCA.yml
To benchmark on other FLamby datasets, follow FLamby's instructions to download each dataset (Fed-Heart-Disease, for example, has its own download instructions). Once the dataset is downloaded, run the same commands with a different dataset argument, i.e.:
For the validation:
$ bash launch_validation_benchmarks.sh Fed-Heart-Disease
For the results on the test sets:
$ benchopt run --timeout 24h --config ./best_config_found_for_heart_disease.yml
Use benchopt run -h for more details about these options, or visit https://benchopt.github.io/api.html.
Unfortunately, some of FLamby's dependencies still rely on old sklearn versions; see the sklearn documentation for ways to fix this. One way is to set the SKLEARN_ALLOW_DEPRECATED_SKLEARN_PACKAGE_INSTALL environment variable to True. On Linux, do:
$ export SKLEARN_ALLOW_DEPRECATED_SKLEARN_PACKAGE_INSTALL=True
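To check that the variable is actually visible to child processes such as pip, a minimal sketch is to export it and read it back from a subshell. Nothing here is specific to FLamby; it only relies on standard shell behavior.

```shell
# Export the variable for the current shell session and its child processes.
export SKLEARN_ALLOW_DEPRECATED_SKLEARN_PACKAGE_INSTALL=True

# Read it back from a child shell; prints "unset" if the export did not take effect.
sh -c 'echo "${SKLEARN_ALLOW_DEPRECATED_SKLEARN_PACKAGE_INSTALL:-unset}"'
```

The export only lasts for the current terminal session. To limit it to a single command instead, the assignment can be placed directly in front of that command on the same line.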
$ ModuleNotFoundError: No module named 'flamby.whatever'