
Commit

Merge branch 'main' into licensecheck
danieljanes authored Oct 16, 2023
2 parents e3d50bb + 7f8a7e2 commit 610b632
Showing 13 changed files with 465 additions and 52 deletions.
2 changes: 1 addition & 1 deletion baselines/README.md
Original file line number Diff line number Diff line change
@@ -49,7 +49,7 @@ Do you have a new federated learning paper and want to add a new baseline to Flo
The steps to follow are:
1. Fork the Flower repo and clone it into your machine.
2. Navigate to the `baselines/` directory and from there run:
2. Navigate to the `baselines/` directory, choose a single-word (and **lowercase**) name for your baseline, and from there run:
```bash
# This will create a new directory with the same structure as `baseline_template`.
```
@@ -7,23 +7,31 @@ The goal of Flower Baselines is to reproduce experiments from popular papers to

Before you start to work on a new baseline or experiment, please check the `Flower Issues <https://github.com/adap/flower/issues>`_ or `Flower Pull Requests <https://github.com/adap/flower/pulls>`_ to see if someone else is already working on it. Please open a new issue if you are planning to work on a new baseline or experiment with a short description of the corresponding paper and the experiment you want to contribute.

TL;DR: Add a new Flower Baseline
--------------------------------
.. warning::
    We are in the process of changing how Flower Baselines are structured and updating the instructions for new contributors. Bear with us until we have finalised this transition. For now, follow the steps described below and reach out to us if something is not clear. We look forward to welcoming your baseline into Flower!
Requirements
------------

Contributing a new baseline is straightforward. You only have to make sure that your federated learning experiments run with Flower and replicate the results of a paper. Flower Baselines need to make use of:

* `Poetry <https://python-poetry.org/docs/>`_ to manage the Python environment.
* `Hydra <https://hydra.cc/>`_ to manage the configuration files for your experiments.

You can find more information about how to set up Poetry on your machine in the ``EXTENDED_README.md`` that is generated when you prepare your baseline.
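To make the Hydra part concrete, here is a sketch of what a baseline's config file could look like (a hypothetical ``conf/base.yaml``; the keys shown are illustrative, not a file from the repo):

```yaml
# conf/base.yaml -- hypothetical Hydra config for a baseline
num_clients: 100        # total clients in the federation
num_rounds: 1000        # federated training rounds
client_fraction: 0.05   # fraction of clients sampled per round
dataset:
  name: cifar100
  alpha: 0.3            # Dirichlet concentration for label partitioning
model:
  name: resnet18
  lr: 0.1
```

With Hydra, any of these values can then be overridden from the command line, e.g. ``python -m <your-baseline>.main num_rounds=50 model.lr=0.01``.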

Add a new Flower Baseline
-------------------------
.. note::
For a detailed set of steps to follow, check the `Baselines README on GitHub <https://github.com/adap/flower/tree/main/baselines>`_.
The instructions below are a more verbose version of what's present in the `Baselines README on GitHub <https://github.com/adap/flower/tree/main/baselines>`_.

Let's say you want to contribute the code of your most recent Federated Learning publication, *FedAwesome*. There are only three steps necessary to create a new *FedAwesome* Flower Baseline:

#. **Get the Flower source code on your machine**
#. Fork the Flower codebase: go to the `Flower GitHub repo <https://github.com/adap/flower>`_ and fork the code (click the *Fork* button in the top-right corner and follow the instructions)
#. Clone the (forked) Flower source code: :code:`git clone [email protected]:[your_github_username]/flower.git`
#. Open the code in your favorite editor.
#. **Create a directory for your baseline and add the FedAwesome code**
#. **Use the provided script to create your baseline directory**
#. Navigate to the baselines directory and run :code:`./dev/create-baseline.sh fedawesome`
#. A new directory in :code:`baselines/fedawesome` is created.
#. Follow the instructions in :code:`EXTENDED_README.md` and :code:`README.md` in :code:`baselines/fedawesome/`.
#. Follow the instructions in :code:`EXTENDED_README.md` and :code:`README.md` in your baseline directory.
#. **Open a pull request**
#. Stage your changes: :code:`git add .`
#. Commit & push: :code:`git commit -m "Create new FedAwesome baseline" ; git push`
@@ -36,18 +44,20 @@ Further reading:
* `GitHub docs: Creating a pull request <https://docs.github.com/en/pull-requests/collaborating-with-pull-requests/proposing-changes-to-your-work-with-pull-requests/creating-a-pull-request>`_
* `GitHub docs: Creating a pull request from a fork <https://docs.github.com/en/pull-requests/collaborating-with-pull-requests/proposing-changes-to-your-work-with-pull-requests/creating-a-pull-request-from-a-fork>`_

Requirements
------------

Contributing a new baseline is really easy. You only have to make sure that your federated learning experiments are running with Flower and replicate the results of a paper.

The only requirement you need in your system in order to create a baseline is to have `Poetry <https://python-poetry.org/docs/>`_ installed. This is our package manager tool of choice.

We are adopting `Hydra <https://hydra.cc/>`_ as the default mechanism to manage everything related to config files and the parameterisation of the Flower baseline.

Usability
---------

Flower is known and loved for its usability. Therefore, make sure that your baseline or experiment can be executed with a single command such as :code:`conda run -m <your-baseline>.main` or :code:`python main.py` (when sourced into your environment). We provide you with a `template-baseline <https://github.com/adap/flower/tree/main/baselines/baseline_template>`_ to use as guidance when contributing your baseline. Having all baselines follow a homogenous structure helps users to tryout many baselines without the overheads of having to understand each individual codebase. Similarly, by using Hydra throughout, users will immediately know how to parameterise your experiments directly from the command line.
Flower is known and loved for its usability. Therefore, make sure that your baseline or experiment can be executed with a single command such as:

.. code-block:: bash

    poetry run python -m <your-baseline>.main
    # or, once sourced into your environment
    python -m <your-baseline>.main

We provide you with a `template-baseline <https://github.com/adap/flower/tree/main/baselines/baseline_template>`_ to use as guidance when contributing your baseline. Having all baselines follow a homogeneous structure helps users to try out many baselines without the overhead of having to understand each individual codebase. Similarly, by using Hydra throughout, users will immediately know how to parameterise your experiments directly from the command line.

We look forward to your contribution!
@@ -45,10 +45,3 @@ To install Poetry on a different OS, to customise your installation, or to furth
poetry install
3. Run the baseline as indicated in the :code:`[Running the Experiments]` section in the :code:`README.md`


Available Baselines
-------------------

.. note::
To be updated soon once the existing baselines are adjusted to the new format.
21 changes: 12 additions & 9 deletions baselines/doc/source/index.rst
@@ -19,29 +19,32 @@ The Flower Community is growing quickly - we're a friendly group of researchers,
Flower Baselines
----------------

Flower Baselines are a collection of organised scripts used to reproduce results from well-known publications or benchmarks. You can check which baselines already exist and/or contribute your own baseline.
Flower Baselines are a collection of organised directories used to reproduce results from well-known publications or benchmarks. You can check which baselines already exist and/or contribute your own baseline.

.. BASELINES_TABLE_ANCHOR
Tutorials
~~~~~~~~~

A learning-oriented series of tutorials, the best place to start.

.. toctree::
:maxdepth: 1
:caption: Tutorials

tutorial-use-baselines
tutorial-contribute-baselines
.. note::
Coming soon


How-to guides
~~~~~~~~~~~~~

Problem-oriented how-to guides show step-by-step how to achieve a specific goal.

.. note::
Coming soon
.. toctree::
:maxdepth: 1
:caption: How-to Guides

how-to-use-baselines
how-to-contribute-baselines


Explanations
~~~~~~~~~~~~
28 changes: 14 additions & 14 deletions baselines/fedmlb/README.md
@@ -2,18 +2,18 @@
title: Multi-Level Branched Regularization for Federated Learning
url: https://proceedings.mlr.press/v162/kim22a.html
labels: [data heterogeneity, knowledge distillation, image classification]
dataset: [cifar100, tiny-imagenet]
dataset: [CIFAR-100, Tiny-ImageNet]
---

# *_FedMLB_*
# FedMLB: Multi-Level Branched Regularization for Federated Learning

> Note: If you use this baseline in your work, please remember to cite the original authors of the paper as well as the Flower paper.
****Paper:**** [proceedings.mlr.press/v162/kim22a.html](https://proceedings.mlr.press/v162/kim22a.html)
**Paper:** [proceedings.mlr.press/v162/kim22a.html](https://proceedings.mlr.press/v162/kim22a.html)

****Authors:**** Jinkyu Kim, Geeho Kim, Bohyung Han
**Authors:** Jinkyu Kim, Geeho Kim, Bohyung Han

****Abstract:**** *_A critical challenge of federated learning is data
**Abstract:** *_A critical challenge of federated learning is data
heterogeneity and imbalance across clients, which
leads to inconsistency between local networks and
unstable convergence of global models. To alleviate
@@ -37,40 +37,40 @@ The source code is available in our project page._*

## About this baseline

****What’s implemented:**** The code in this directory reproduces the results for FedMLB, FedAvg, and FedAvg+KD.
**What’s implemented:** The code in this directory reproduces the results for FedMLB, FedAvg, and FedAvg+KD.
The reproduced results use the CIFAR-100 dataset or the Tiny-ImageNet dataset. Four settings are available for both
datasets:
1. Moderate-scale with Dir(0.3), 100 clients, 5% participation, balanced dataset.
2. Large-scale experiments with Dir(0.3), 500 clients, 2% participation rate, balanced dataset.
3. Moderate-scale with Dir(0.6), 100 clients, 5% participation rate, balanced dataset.
4. Large-scale experiments with Dir(0.6), 500 clients, 2% participation rate, balanced dataset.
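The Dir(α) in the settings above refers to Dirichlet-based label partitioning: for each class, the share each client receives is drawn from a Dirichlet distribution with concentration α, so smaller α means more heterogeneity across clients. A stdlib-only sketch of the idea (illustrative, not the exact partitioning code used by this baseline):

```python
import random
from collections import defaultdict

def dirichlet_partition(labels, num_clients, alpha, seed=0):
    """Assign example indices to clients with per-class shares ~ Dir(alpha)."""
    rng = random.Random(seed)
    by_class = defaultdict(list)
    for idx, label in enumerate(labels):
        by_class[label].append(idx)
    clients = [[] for _ in range(num_clients)]
    for idxs in by_class.values():
        # A Dirichlet(alpha) sample is a normalised vector of Gamma(alpha, 1) draws.
        weights = [rng.gammavariate(alpha, 1.0) for _ in range(num_clients)]
        total = sum(weights)
        rng.shuffle(idxs)
        start = 0
        for client, weight in zip(clients, weights):
            stop = start + round(weight / total * len(idxs))
            client.extend(idxs[start:stop])
            start = stop
        clients[-1].extend(idxs[start:])  # leftover indices from rounding
    return clients

labels = [i % 10 for i in range(1000)]  # 10 balanced classes
parts = dirichlet_partition(labels, num_clients=4, alpha=0.3)
print(sum(len(p) for p in parts))  # 1000
```

With small α (e.g. 0.3) most clients end up dominated by a few classes; with large α the per-client label distributions approach uniform.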

****Datasets:**** CIFAR-100, Tiny-ImageNet.
**Datasets:** CIFAR-100, Tiny-ImageNet.

****Hardware Setup:**** The code in this repository has been tested on a Linux machine with 64GB RAM.
**Hardware Setup:** The code in this repository has been tested on a Linux machine with 64GB RAM.
Be aware that in the default config the memory usage can exceed 10GB.

****Contributors:**** Alessio Mora (University of Bologna, PhD, [email protected]).
**Contributors:** Alessio Mora (University of Bologna, PhD, [email protected]).

## Experimental Setup

****Task:**** Image classification
**Task:** Image classification

****Model:**** ResNet-18.
**Model:** ResNet-18.

****Dataset:**** Four settings are available for CIFAR-100,
**Dataset:** Four settings are available for CIFAR-100:
1. Moderate-scale with Dir(0.3), 100 clients, 5% participation, balanced dataset (500 examples per client).
2. Large-scale experiments with Dir(0.3), 500 clients, 2% participation rate, balanced dataset (100 examples per client).
3. Moderate-scale with Dir(0.6), 100 clients, 5% participation rate, balanced dataset (500 examples per client).
4. Large-scale experiments with Dir(0.6), 500 clients, 2% participation rate, balanced dataset (100 examples per client).

****Dataset:**** Four settings are available for Tiny-Imagenet,
**Dataset:** Four settings are available for Tiny-ImageNet:
1. Moderate-scale with Dir(0.3), 100 clients, 5% participation, balanced dataset (1000 examples per client).
2. Large-scale experiments with Dir(0.3), 500 clients, 2% participation rate, balanced dataset (200 examples per client).
3. Moderate-scale with Dir(0.6), 100 clients, 5% participation rate, balanced dataset (1000 examples per client).
4. Large-scale experiments with Dir(0.6), 500 clients, 2% participation rate, balanced dataset (200 examples per client).

****Training Hyperparameters:****
**Training Hyperparameters:**

| Hyperparameter | Description | Default Value |
| ------------- | ------------- | ------------- |
20 changes: 20 additions & 0 deletions datasets/flwr_datasets/common/__init__.py
@@ -0,0 +1,20 @@
# Copyright 2023 Flower Labs GmbH. All Rights Reserved.
#
# Licensed under the Apache License, Version 2.0 (the "License");
# you may not use this file except in compliance with the License.
# You may obtain a copy of the License at
#
# http://www.apache.org/licenses/LICENSE-2.0
#
# Unless required by applicable law or agreed to in writing, software
# distributed under the License is distributed on an "AS IS" BASIS,
# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
# See the License for the specific language governing permissions and
# limitations under the License.
# ==============================================================================
"""Common components in Flower Datasets."""


from .typing import Resplitter

__all__ = ["Resplitter"]
22 changes: 22 additions & 0 deletions datasets/flwr_datasets/common/typing.py
@@ -0,0 +1,22 @@
# Copyright 2023 Flower Labs GmbH. All Rights Reserved.
#
# Licensed under the Apache License, Version 2.0 (the "License");
# you may not use this file except in compliance with the License.
# You may obtain a copy of the License at
#
# http://www.apache.org/licenses/LICENSE-2.0
#
# Unless required by applicable law or agreed to in writing, software
# distributed under the License is distributed on an "AS IS" BASIS,
# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
# See the License for the specific language governing permissions and
# limitations under the License.
# ==============================================================================
"""Flower Datasets type definitions."""


from typing import Callable

from datasets import DatasetDict

Resplitter = Callable[[DatasetDict], DatasetDict]
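A `Resplitter` is therefore just any callable mapping a `DatasetDict` to a new `DatasetDict`. As a sketch of the idea (using a plain `dict` of lists as a stand-in for `datasets.DatasetDict`, so the example has no third-party dependencies), a resplitter that merges the `test` and `validation` splits into one `eval` split could look like:

```python
from typing import Dict, List

# Stand-in for datasets.DatasetDict: split name -> list of examples.
SplitDict = Dict[str, List[dict]]

def merge_eval_splits(dataset: SplitDict) -> SplitDict:
    """Merge the 'test' and 'validation' splits into a single 'eval' split."""
    resplit = {
        name: rows for name, rows in dataset.items()
        if name not in ("test", "validation")
    }
    resplit["eval"] = dataset.get("test", []) + dataset.get("validation", [])
    return resplit

ds = {"train": [{"x": 1}], "test": [{"x": 2}], "validation": [{"x": 3}]}
print(sorted(merge_eval_splits(ds)))  # ['eval', 'train']
```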
43 changes: 39 additions & 4 deletions datasets/flwr_datasets/federated_dataset.py
@@ -15,12 +15,17 @@
"""FederatedDataset."""


from typing import Dict, Optional, Union
from typing import Dict, Optional, Tuple, Union

import datasets
from datasets import Dataset, DatasetDict
from flwr_datasets.common import Resplitter
from flwr_datasets.partitioner import Partitioner
from flwr_datasets.utils import _check_if_dataset_tested, _instantiate_partitioners
from flwr_datasets.utils import (
_check_if_dataset_tested,
_instantiate_partitioners,
_instantiate_resplitter_if_needed,
)


class FederatedDataset:
@@ -35,10 +40,16 @@ class FederatedDataset:
----------
dataset: str
The name of the dataset in the Hugging Face Hub.
subset: str
Secondary information regarding the dataset, most often a subset or version
(passed as the `name` argument to `datasets.load_dataset`).
resplitter: Optional[Union[Resplitter, Dict[str, Tuple[str, ...]]]]
`Callable` that transforms `DatasetDict` splits, or configuration dict for
`MergeResplitter`.
partitioners: Dict[str, Union[Partitioner, int]]
A dictionary mapping the Dataset split (a `str`) to a `Partitioner` or an `int`
(representing the number of IID partitions that this split should be partitioned
into).
into).
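Passing an `int` for a split means that split is cut into that many IID partitions. A stdlib-only sketch of such IID partitioning (a hypothetical helper, not the library's implementation):

```python
from typing import List, Sequence

def iid_partitions(dataset: Sequence, num_partitions: int) -> List[list]:
    """Split `dataset` into `num_partitions` round-robin (IID) shards."""
    shards: List[list] = [[] for _ in range(num_partitions)]
    for i, example in enumerate(dataset):
        shards[i % num_partitions].append(example)  # sizes differ by at most 1
    return shards

parts = iid_partitions(list(range(10)), 3)
print([len(p) for p in parts])  # [4, 3, 3]
```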
Examples
--------
@@ -59,15 +70,22 @@ def __init__(
self,
*,
dataset: str,
subset: Optional[str] = None,
resplitter: Optional[Union[Resplitter, Dict[str, Tuple[str, ...]]]] = None,
partitioners: Dict[str, Union[Partitioner, int]],
) -> None:
_check_if_dataset_tested(dataset)
self._dataset_name: str = dataset
self._subset: Optional[str] = subset
self._resplitter: Optional[Resplitter] = _instantiate_resplitter_if_needed(
resplitter
)
self._partitioners: Dict[str, Partitioner] = _instantiate_partitioners(
partitioners
)
# Init (download) lazily on the first call to `load_partition` or `load_full`
self._dataset: Optional[DatasetDict] = None
self._resplit: bool = False # Indicate if the resplit happened

def load_partition(self, idx: int, split: str) -> Dataset:
"""Load the partition specified by the idx in the selected split.
@@ -88,6 +106,7 @@ def load_partition(self, idx: int, split: str) -> Dataset:
Single partition from the dataset split.
"""
self._download_dataset_if_none()
self._resplit_dataset_if_needed()
if self._dataset is None:
raise ValueError("Dataset is not loaded yet.")
self._check_if_split_present(split)
@@ -113,6 +132,7 @@ def load_full(self, split: str) -> Dataset:
Part of the dataset identified by its split name.
"""
self._download_dataset_if_none()
self._resplit_dataset_if_needed()
if self._dataset is None:
raise ValueError("Dataset is not loaded yet.")
self._check_if_split_present(split)
@@ -121,7 +141,9 @@ def _download_dataset_if_none(self) -> None:
def _download_dataset_if_none(self) -> None:
"""Lazily load (and potentially download) the Dataset instance into memory."""
if self._dataset is None:
self._dataset = datasets.load_dataset(self._dataset_name)
self._dataset = datasets.load_dataset(
path=self._dataset_name, name=self._subset
)

def _check_if_split_present(self, split: str) -> None:
"""Check if the split (for partitioning or full return) is in the dataset."""
@@ -153,3 +175,16 @@ def _assign_dataset_to_partitioner(self, split: str) -> None:
raise ValueError("Dataset is not loaded yet.")
if not self._partitioners[split].is_dataset_assigned():
self._partitioners[split].dataset = self._dataset[split]

def _resplit_dataset_if_needed(self) -> None:
# The actual re-splitting can't be done more than once.
# The attribute `_resplit` indicates that the resplit happened.

# Resplit only once
if self._resplit:
return
if self._dataset is None:
raise ValueError("The dataset resplit should happen after the download.")
if self._resplitter:
self._dataset = self._resplitter(self._dataset)
self._resplit = True
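The pattern in `_resplit_dataset_if_needed` (load lazily, then apply the resplit at most once) can be sketched in isolation with simplified stand-ins for the real classes:

```python
from typing import Callable, Dict, List, Optional

Split = Dict[str, List[int]]  # simplified stand-in for a DatasetDict

class LazyDataset:
    """Load lazily, then apply an optional resplit exactly once."""

    def __init__(self, loader: Callable[[], Split],
                 resplitter: Optional[Callable[[Split], Split]] = None) -> None:
        self._loader = loader
        self._resplitter = resplitter
        self._dataset: Optional[Split] = None
        self._resplit = False  # guards against resplitting twice

    def load(self) -> Split:
        if self._dataset is None:  # download/load once
            self._dataset = self._loader()
        if self._resplitter is not None and not self._resplit:
            self._dataset = self._resplitter(self._dataset)
            self._resplit = True  # never resplit again
        return self._dataset

calls: List[int] = []

def add_eval(d: Split) -> Split:
    calls.append(1)  # count how often the resplit actually runs
    return {**d, "eval": d["test"]}

ds = LazyDataset(lambda: {"train": [1], "test": [2]}, resplitter=add_eval)
ds.load()
ds.load()
print(len(calls))  # 1
```

The boolean guard, rather than checking the dataset's contents, keeps repeated `load_partition`/`load_full` calls idempotent even when the resplitter happens to produce a split layout identical to the original.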
