Commit

Merge branch 'main' into xgboost-quickstart
danieljanes authored Nov 17, 2023
2 parents 410210b + bc346ac commit 4662ffa
Showing 24 changed files with 331 additions and 92 deletions.
32 changes: 11 additions & 21 deletions .github/ISSUE_TEMPLATE/baseline_request.yml
@@ -41,32 +41,22 @@ body:
- [ ] Read the [`first contribution` doc](https://flower.dev/docs/first-time-contributors.html)
- [ ] Complete the Flower tutorial
- [ ] Read the Flower Baselines docs to get an overview:
- [ ] [https://flower.dev/docs/using-baselines.html](https://flower.dev/docs/using-baselines.html)
- [ ] [https://flower.dev/docs/contributing-baselines.html](https://flower.dev/docs/contributing-baselines.html)
- [ ] [How to use Flower Baselines](https://flower.dev/docs/baselines/how-to-use-baselines.html)
- [ ] [How to contribute a Flower Baseline](https://flower.dev/docs/baselines/how-to-contribute-baselines.html)
- type: checkboxes
attributes:
label: Prepare - understand the scope
options:
- label: Read the paper linked above
- label: Create the directory structure in Flower Baselines (just the `__init__.py` files and a `README.md`)
- label: Before starting to write code, write down all of the specs of this experiment in a README (dataset, partitioning, model, number of clients, all hyperparameters, …)
- label: Open a draft PR
- label: Decide which experiments you'd like to reproduce. The more the better!
- label: Follow the steps outlined in [Add a new Flower Baseline](https://flower.dev/docs/baselines/how-to-contribute-baselines.html#add-a-new-flower-baseline).
- label: You can use as reference [other baselines](https://github.com/adap/flower/tree/main/baselines) that the community merged following those steps.
- type: checkboxes
attributes:
label: Implement - make it work
label: Verify your implementation
options:
- label: Implement some form of dataset loading and partitioning in a separate `dataset.py` (doesn’t have to match the paper exactly)
- label: Implement the model in PyTorch
- label: Write a test that shows that the model has the number of parameters mentioned in the paper
- label: Implement the federated learning setup outlined in the paper, maybe starting with fewer clients
- label: Plot accuracy and loss
- label: Run it and check if the model starts to converge
- type: checkboxes
attributes:
label: Align - make it converge
options:
- label: Implement the exact data partitioning outlined in the paper
- label: Use the exact hyperparameters outlined in the paper
- label: Make it converge to roughly the same accuracy that the paper states
- label: Commit the final hyperparameters and plots
- label: Mark the PR as ready
- label: Follow the steps indicated in the `EXTENDED_README.md` that was created in your baseline directory
- label: Ensure your code reproduces the results for the experiments you chose
- label: Ensure your `README.md` is ready to be run by someone that is not familiar with your code. Are all step-by-step instructions clear?
- label: Ensure the formatting and typing tests for your baseline run without errors.
- label: Clone your repo into a new directory, follow the guide in your own `README.md` and verify everything runs.
4 changes: 3 additions & 1 deletion .github/workflows/e2e.yml
@@ -60,6 +60,8 @@ jobs:
include:
- directory: bare

- directory: bare-https

- directory: jax

- directory: pytorch
@@ -135,7 +137,7 @@ jobs:
- name: Run virtual client test
run: python simulation.py
- name: Run driver test
run: ./../test_driver.sh
run: ./../test_driver.sh "${{ matrix.directory }}"

strategies:
runs-on: ubuntu-22.04
32 changes: 20 additions & 12 deletions README.md
@@ -91,18 +91,26 @@ Stay tuned, more tutorials are coming soon. Topics include **Privacy and Securit

## Flower Baselines

Flower Baselines is a collection of community-contributed experiments that reproduce the experiments performed in popular federated learning publications. Researchers can build on Flower Baselines to quickly evaluate new ideas:

- [FedAvg](https://arxiv.org/abs/1602.05629):
- [MNIST](https://github.com/adap/flower/tree/main/baselines/flwr_baselines/flwr_baselines/publications/fedavg_mnist)
- [FedProx](https://arxiv.org/abs/1812.06127):
- [MNIST](https://github.com/adap/flower/tree/main/baselines/fedprox/)
- [Adaptive Federated Optimization](https://arxiv.org/abs/2003.00295):
- [CIFAR-10/100](https://github.com/adap/flower/tree/main/baselines/flwr_baselines/flwr_baselines/publications/adaptive_federated_optimization)

Check the Flower documentation to learn more: [Using Baselines](https://flower.dev/docs/baselines/how-to-use-baselines.html)

The Flower community loves contributions! Make your work more visible and enable others to build on it by contributing it as a baseline: [Contributing Baselines](https://flower.dev/docs/baselines/how-to-contribute-baselines.html)
Flower Baselines is a collection of community-contributed projects that reproduce the experiments performed in popular federated learning publications. Researchers can build on Flower Baselines to quickly evaluate new ideas. The Flower community loves contributions! Make your work more visible and enable others to build on it by contributing it as a baseline!

- [DASHA](https://github.com/adap/flower/tree/main/baselines/dasha)
- [DepthFL](https://github.com/adap/flower/tree/main/baselines/depthfl)
- [FedBN](https://github.com/adap/flower/tree/main/baselines/fedbn)
- [FedMeta](https://github.com/adap/flower/tree/main/baselines/fedmeta)
- [FedMLB](https://github.com/adap/flower/tree/main/baselines/fedmlb)
- [FedPer](https://github.com/adap/flower/tree/main/baselines/fedper)
- [FedProx](https://github.com/adap/flower/tree/main/baselines/fedprox)
- [FedWav2vec2](https://github.com/adap/flower/tree/main/baselines/fedwav2vec2)
- [FjORD](https://github.com/adap/flower/tree/main/baselines/fjord)
- [MOON](https://github.com/adap/flower/tree/main/baselines/moon)
- [niid-Bench](https://github.com/adap/flower/tree/main/baselines/niid_bench)
- [TAMUNA](https://github.com/adap/flower/tree/main/baselines/tamuna)
- [FedAvg](https://github.com/adap/flower/tree/main/baselines/flwr_baselines/flwr_baselines/publications/fedavg_mnist)
- [FedOpt](https://github.com/adap/flower/tree/main/baselines/flwr_baselines/flwr_baselines/publications/adaptive_federated_optimization)

Please refer to the [Flower Baselines Documentation](https://flower.dev/docs/baselines/) for a detailed categorization of baselines and for additional info including:
* [How to use Flower Baselines](https://flower.dev/docs/baselines/how-to-use-baselines.html)
* [How to contribute a new Flower Baseline](https://flower.dev/docs/baselines/how-to-contribute-baselines.html)

## Flower Usage Examples

28 changes: 14 additions & 14 deletions baselines/baseline_template/README.md
@@ -1,11 +1,11 @@
---
title: title of the paper
url: URL to the paper page (not the pdf)
labels: [label1, label2] # please add between 4 and 10 single-word (maybe two-words) labels (e.g. "system heterogeneity", "image classification", "asynchronous", "weight sharing", "cross-silo")
dataset: [dataset1, dataset2] # list of datasets you include in your baseline
labels: [label1, label2] # please add between 4 and 10 single-word (maybe two-words) labels (e.g. system heterogeneity, image classification, asynchronous, weight sharing, cross-silo). Do not use ""
dataset: [dataset1, dataset2] # list of datasets you include in your baseline. Do not use ""
---

# :warning:*_Title of your baseline_*
# :warning: *_Title of your baseline_*

> Note: If you use this baseline in your work, please remember to cite the original authors of the paper as well as the Flower paper.
@@ -15,33 +15,33 @@ dataset: [dataset1, dataset2] # list of datasets you include in your baseline
> :warning: Please complete the metadata section at the very top of this README. This generates a table at the top of the file that will facilitate indexing baselines.
****Paper:**** :warning: *_add the URL of the paper page (not to the .pdf). For instance if you link a paper on ArXiv, add here the URL to the abstract page (e.g. https://arxiv.org/abs/1512.03385). If your paper is in from a journal or conference proceedings, please follow the same logic._*
**Paper:** :warning: *_add the URL of the paper page (not to the .pdf). For instance if you link a paper on ArXiv, add here the URL to the abstract page (e.g. https://arxiv.org/abs/1512.03385). If your paper is in from a journal or conference proceedings, please follow the same logic._*

****Authors:**** :warning: *_list authors of the paper_*
**Authors:** :warning: *_list authors of the paper_*

****Abstract:**** :warning: *_add here the abstract of the paper you are implementing_*
**Abstract:** :warning: *_add here the abstract of the paper you are implementing_*


## About this baseline

****What’s implemented:**** :warning: *_Concisely describe what experiment(s) in the publication can be replicated by running the code. Please only use a few sentences. Start with: “The code in this directory …”_*
**What’s implemented:** :warning: *_Concisely describe what experiment(s) in the publication can be replicated by running the code. Please only use a few sentences. Start with: “The code in this directory …”_*

****Datasets:**** :warning: *_List the datasets you used (if you used a medium to large dataset, >10GB please also include the sizes of the dataset)._*
**Datasets:** :warning: *_List the datasets you used (if you used a medium to large dataset, >10GB please also include the sizes of the dataset)._*

****Hardware Setup:**** :warning: *_Give some details about the hardware (e.g. a server with 8x V100 32GB and 256GB of RAM) you used to run the experiments for this baseline. Someone out there might not have access to the same resources you have so, could list the absolute minimum hardware needed to run the experiment in a reasonable amount of time ? (e.g. minimum is 1x 16GB GPU otherwise a client model can’t be trained with a sufficiently large batch size). Could you test this works too?_*
**Hardware Setup:** :warning: *_Give some details about the hardware (e.g. a server with 8x V100 32GB and 256GB of RAM) you used to run the experiments for this baseline. Someone out there might not have access to the same resources you have so, could list the absolute minimum hardware needed to run the experiment in a reasonable amount of time ? (e.g. minimum is 1x 16GB GPU otherwise a client model can’t be trained with a sufficiently large batch size). Could you test this works too?_*

****Contributors:**** :warning: *_let the world know who contributed to this baseline. This could be either your name, your name and affiliation at the time, or your GitHub profile name if you prefer. If multiple contributors signed up for this baseline, please list yourself and your colleagues_*
**Contributors:** :warning: *_let the world know who contributed to this baseline. This could be either your name, your name and affiliation at the time, or your GitHub profile name if you prefer. If multiple contributors signed up for this baseline, please list yourself and your colleagues_*


## Experimental Setup

****Task:**** :warning: *_what’s the primary task that is being federated? (e.g. image classification, next-word prediction). If you have experiments for several, please list them_*
**Task:** :warning: *_what’s the primary task that is being federated? (e.g. image classification, next-word prediction). If you have experiments for several, please list them_*

****Model:**** :warning: *_provide details about the model you used in your experiments (if more than use a list). If your model is small, describing it as a table would be :100:. Some FL methods do not use an off-the-shelve model (e.g. ResNet18) instead they create your own. If this is your case, please provide a summary here and give pointers to where in the paper (e.g. Appendix B.4) is detailed._*
**Model:** :warning: *_provide details about the model you used in your experiments (if more than use a list). If your model is small, describing it as a table would be :100:. Some FL methods do not use an off-the-shelve model (e.g. ResNet18) instead they create your own. If this is your case, please provide a summary here and give pointers to where in the paper (e.g. Appendix B.4) is detailed._*

****Dataset:**** :warning: *_Earlier you listed already the datasets that your baseline uses. Now you should include a breakdown of the details about each of them. Please include information about: how the dataset is partitioned (e.g. LDA with alpha 0.1 as default and all clients have the same number of training examples; or each client gets assigned a different number of samples following a power-law distribution with each client only instances of 2 classes)? if your dataset is naturally partitioned just state “naturally partitioned”; how many partitions there are (i.e. how many clients)? Please include this an all information relevant about the dataset and its partitioning into a table._*
**Dataset:** :warning: *_Earlier you listed already the datasets that your baseline uses. Now you should include a breakdown of the details about each of them. Please include information about: how the dataset is partitioned (e.g. LDA with alpha 0.1 as default and all clients have the same number of training examples; or each client gets assigned a different number of samples following a power-law distribution with each client only instances of 2 classes)? if your dataset is naturally partitioned just state “naturally partitioned”; how many partitions there are (i.e. how many clients)? Please include this an all information relevant about the dataset and its partitioning into a table._*

****Training Hyperparameters:**** :warning: *_Include a table with all the main hyperparameters in your baseline. Please show them with their default value._*
**Training Hyperparameters:** :warning: *_Include a table with all the main hyperparameters in your baseline. Please show them with their default value._*


## Environment Setup
4 changes: 4 additions & 0 deletions baselines/baseline_template/baseline_template/main.py
@@ -51,3 +51,7 @@ def main(cfg: DictConfig) -> None:
# Hydra will generate for you a directory each time you run the code. You
# can retrieve the path to that directory with this:
# save_path = HydraConfig.get().runtime.output_dir


if __name__ == "__main__":
main()
2 changes: 1 addition & 1 deletion baselines/fedbn/README.md
@@ -59,7 +59,7 @@ A more detailed explanation of the datasets is given in the following table.
| strategy_fraction_fit | 1.0 |
| strategy.fraction_evaluate | 0.0 |
| training samples per client| 743 |
| lr | 10E-2 |
| client.l_r | 10E-2 |
| local epochs | 1 |
| loss | cross entropy loss |
| optimizer | SGD |
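
A detail worth checking in the learning-rate row above: in standard floating-point notation `10E-2` reads as 10 × 10⁻², i.e. 0.1, whereas the client configs in this commit set `l_r: 0.01`. The short check below only illustrates how the notation parses; the variable names are illustrative and it makes no claim about which value the authors intended.

```python
# How Python parses the notation used in the README table.
table_value = float("10E-2")  # 10 * 10**-2
config_value = 0.01           # l_r as set in conf/client/*.yaml in this commit

print(table_value)                    # 0.1
print(float("1E-2") == config_value)  # True
```

If the table is meant to show the configured default, `1E-2` or `0.01` would be the unambiguous spelling.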
36 changes: 23 additions & 13 deletions baselines/fedbn/fedbn/client.py
@@ -16,24 +16,26 @@


class FlowerClient(fl.client.NumPyClient):
"""A standar FlowerClient. This base class.
"""A standard FlowerClient.
is what plain FedAvg clients do.
This base class is what plain FedAvg clients do.
"""

def __init__(
def __init__( # pylint: disable=too-many-arguments
self,
model: CNNModel,
trainloader: DataLoader,
testloader: DataLoader,
dataset_name: str,
l_r: float,
**kwargs, # pylint: disable=unused-argument
) -> None:
self.trainloader = trainloader
self.testloader = testloader
self.dataset_name = dataset_name
self.device = torch.device("cuda:0" if torch.cuda.is_available() else "cpu")
self.model = model.to(self.device)
self.l_r = l_r

def get_parameters(self, config) -> NDArrays:
"""Return model parameters as a list of NumPy ndarrays w or w/o.
@@ -58,22 +60,23 @@ def fit(
"""Set model parameters, train model, return updated model parameters."""
self.set_parameters(parameters)

# evaluate the state of the global model on the train set; the loss returned
# Evaluate the state of the global model on the train set; the loss returned
# is what's reported in Fig3 in the FedBN paper (what this baseline focuses
# in reproducing)
pre_train_loss, pre_train_acc = test(
self.model, self.trainloader, device=self.device
)

# train model on local dataset
# Train model on local dataset
loss, acc = train(
self.model,
self.trainloader,
epochs=1,
l_r=self.l_r,
device=self.device,
)

# construct metrics to return to server
# Construct metrics to return to server
fl_round = config["round"]
metrics = {
"dataset_name": self.dataset_name,
@@ -107,9 +110,16 @@ def evaluate(
class FedBNFlowerClient(FlowerClient):
"""Similar to FlowerClient but this is used by FedBN clients."""

def __init__(self, bn_state_dir: Path, client_id: int, *args, **kwargs) -> None:
def __init__(self, save_path: Path, client_id: int, *args, **kwargs) -> None:
super().__init__(*args, **kwargs)
self.bn_state_dir = bn_state_dir
# For FedBN clients we need to persist the state of the BN
# layers across rounds. In simulation, clients are stateless,
# so everything not communicated to the server (as is the
# case with the params in BN layers of FedBN clients) is lost
# once a client completes its training. An upcoming version of
# Flower supports stateful clients
bn_state_dir = save_path / "bn_states"
bn_state_dir.mkdir(exist_ok=True)
self.bn_state_pkl = bn_state_dir / f"client_{client_id}.pkl"

def _save_bn_statedict(self) -> None:
@@ -135,7 +145,7 @@ def get_parameters(self, config) -> NDArrays:
layers.
"""
# first update bn_state_dir
# First update bn_state_dir
self._save_bn_statedict()
# Excluding parameters of BN layers when using FedBN
return [
@@ -154,8 +164,8 @@ def set_parameters(self, parameters: NDArrays) -> None:
state_dict = OrderedDict({k: torch.tensor(v) for k, v in params_dict})
self.model.load_state_dict(state_dict, strict=False)

# now also load from bn_state_dir
if self.bn_state_pkl.exists(): # it won't exist in the first round
# Now also load from bn_state_dir
if self.bn_state_pkl.exists(): # It won't exist in the first round
bn_state_dict = self._load_bn_statedict()
self.model.load_state_dict(bn_state_dict, strict=False)

@@ -164,7 +174,7 @@ def gen_client_fn(
client_data: List[Tuple[DataLoader, DataLoader, str]],
client_cfg: DictConfig,
model_cfg: DictConfig,
bn_state_dir: Path,
save_path: Path,
) -> Callable[[str], FlowerClient]:
"""Return a function that will be called to instantiate the cid-th client."""

@@ -182,7 +192,7 @@ def client_fn(cid: str) -> FlowerClient:
trainloader=trainloader,
testloader=valloader,
dataset_name=dataset_name,
bn_state_dir=bn_state_dir,
save_path=save_path,
client_id=int(cid),
)
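
The `client.py` hunks above move the BN-state directory handling into the client: each FedBN client derives `<save_path>/bn_states/client_<id>.pkl`, writes its BN layers there before returning parameters, and reads them back when parameters are set. Below is a minimal, framework-free sketch of that round trip. It uses plain dictionaries instead of a PyTorch `state_dict`, and it assumes BN entries can be recognised by `"bn"` in the key name; the helpers `split_bn`, `save_bn` and `load_bn` are hypothetical names, not functions from the baseline.

```python
import pickle
import tempfile
from pathlib import Path
from typing import Dict, List, Tuple

State = Dict[str, List[float]]


def split_bn(state: State) -> Tuple[State, State]:
    """Split a state-dict-like mapping into shared and BN-local parts.

    Assumption: BN layers are recognised by "bn" appearing in the key.
    """
    shared = {k: v for k, v in state.items() if "bn" not in k}
    local = {k: v for k, v in state.items() if "bn" in k}
    return shared, local


def bn_state_file(save_path: Path, client_id: int) -> Path:
    """Mirror the path logic in the diff: <save_path>/bn_states/client_<id>.pkl."""
    bn_state_dir = save_path / "bn_states"
    bn_state_dir.mkdir(exist_ok=True)
    return bn_state_dir / f"client_{client_id}.pkl"


def save_bn(path: Path, local: State) -> None:
    with open(path, "wb") as handle:
        pickle.dump(local, handle)


def load_bn(path: Path) -> State:
    with open(path, "rb") as handle:
        return pickle.load(handle)


with tempfile.TemporaryDirectory() as tmp:
    pkl = bn_state_file(Path(tmp), client_id=3)

    # Round 1: the client trains, then stores its BN values locally and
    # would send only the shared part to the server.
    state = {"conv1.weight": [0.1, 0.2], "bn1.running_mean": [0.5]}
    shared, local = split_bn(state)
    save_bn(pkl, local)

    # Round 2: a fresh client object for the same cid starts from the
    # shared weights and restores its own BN values from disk.
    restored = dict(shared)
    if pkl.exists():  # the file will not exist in the very first round
        restored.update(load_bn(pkl))
```

Keeping one file per client id is what lets a stateless simulation client pick up where the previous object for that id left off.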

3 changes: 2 additions & 1 deletion baselines/fedbn/fedbn/conf/client/fedavg.yaml
@@ -1,4 +1,5 @@
---
# standard FedAvg Flower Client
_target_: fedbn.client.FlowerClient
client_label: FedAvg
client_label: FedAvg
l_r: 0.01
3 changes: 2 additions & 1 deletion baselines/fedbn/fedbn/conf/client/fedbn.yaml
@@ -1,4 +1,5 @@
---
# standard FedBN Flower Client
_target_: fedbn.client.FedBNFlowerClient
client_label: FedBN
client_label: FedBN
l_r: 0.01
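
The two config files now carry `l_r` next to `client_label`, while the constructor in `client.py` gained an explicit `l_r` parameter and a catch-all `**kwargs`. One plausible reading is that the whole config node is passed to the constructor, so keys the class does not name (such as `client_label`) must be absorbed somewhere. The sketch below illustrates that idea in plain Python. `SketchClient` and `build_client` are hypothetical stand-ins, not Flower or Hydra APIs, and `"MNIST"` is only an example dataset name.

```python
from typing import Any, Dict


class SketchClient:
    """Stand-in with the constructor shape shown in the diff."""

    def __init__(self, dataset_name: str, l_r: float, **kwargs: Any) -> None:
        self.dataset_name = dataset_name
        self.l_r = l_r
        # Extra config keys (e.g. client_label) land here and are ignored.
        self.ignored = sorted(kwargs)


def build_client(cfg: Dict[str, Any], **runtime_args: Any) -> SketchClient:
    """Hypothetical helper: drop `_target_`, merge config and runtime args."""
    params = {k: v for k, v in cfg.items() if k != "_target_"}
    return SketchClient(**params, **runtime_args)


# Values copied from conf/client/fedbn.yaml in this commit.
client_cfg = {
    "_target_": "fedbn.client.FedBNFlowerClient",
    "client_label": "FedBN",
    "l_r": 0.01,
}
client = build_client(client_cfg, dataset_name="MNIST")
print(client.l_r, client.ignored)
```

With this shape, changing the learning rate becomes a config edit rather than a code change, which matches the `l_r=self.l_r` argument now passed to `train` in `fit`.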