Update FDS README (#2605)
adam-narozniak authored Nov 15, 2023
Parent: d8fef6e, commit: 0c3d8b0
Changed file: `datasets/README.md` (15 additions, 15 deletions)

[![GitHub license](https://img.shields.io/github/license/adap/flower)](https://github.com/adap/flower/blob/main/LICENSE)
[![PRs Welcome](https://img.shields.io/badge/PRs-welcome-brightgreen.svg)](https://github.com/adap/flower/blob/main/CONTRIBUTING.md)
![Build](https://github.com/adap/flower/actions/workflows/framework.yml/badge.svg)
![Downloads](https://pepy.tech/badge/flwr-datasets)
[![Slack](https://img.shields.io/badge/Chat-Slack-red)](https://flower.dev/join-slack)

Flower Datasets (`flwr-datasets`) is a library to quickly and easily create datasets for federated learning, federated evaluation, and federated analytics. It was created by the Flower Labs team, which also created Flower: A Friendly Federated Learning Framework.

Thanks to Hugging Face's `datasets` library used under the hood, Flower Datasets i…
* Arrow.

Create **custom partitioning schemes** or choose from the **implemented partitioning schemes**:
* Partitioner (the abstract base class) `Partitioner`
* IID partitioning `IidPartitioner(num_partitions)`
* Natural ID partitioner `NaturalIdPartitioner`
* Size partitioner (the abstract base class for partitioners that divide the dataset based on the number of samples assigned to each partition) `SizePartitioner`
* Linear partitioner `LinearPartitioner`
* Square partitioner `SquarePartitioner`
* Exponential partitioner `ExponentialPartitioner`
* More to come in future releases.
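
The schemes above differ mainly in how they assign samples to partitions. The following pure-Python sketch illustrates two of the ideas on plain index lists: grouping samples by a natural ID, and size-based partitioning where each partition's size is proportional to a weight function of its index. It is a hypothetical illustration, not the `flwr_datasets` implementation: `natural_id_partition`, `size_partition`, and `WEIGHT_FNS` are made-up helpers, the weight functions (`i + 1`, `(i + 1) ** 2`, `exp(i)`) are an assumption about what "linear", "square", and "exponential" refer to, and the sketch assigns contiguous index ranges without shuffling.

```python
import math
from collections import defaultdict
from typing import Callable, Dict, Hashable, List, Sequence


def natural_id_partition(ids: Sequence[Hashable]) -> Dict[Hashable, List[int]]:
    """Group sample indices by a natural ID, e.g. a hypothetical user-ID column."""
    groups: Dict[Hashable, List[int]] = defaultdict(list)
    for idx, natural_id in enumerate(ids):
        groups[natural_id].append(idx)
    return dict(groups)


def size_partition(
    num_samples: int, num_partitions: int, weight_fn: Callable[[int], float]
) -> List[List[int]]:
    """Split indices 0..num_samples-1 into contiguous (unshuffled) partitions whose
    sizes are proportional to weight_fn(partition_index)."""
    weights = [weight_fn(i) for i in range(num_partitions)]
    total = sum(weights)
    sizes = [int(num_samples * w / total) for w in weights]
    # Give the samples lost to rounding down to the last partition
    sizes[-1] += num_samples - sum(sizes)
    partitions: List[List[int]] = []
    start = 0
    for size in sizes:
        partitions.append(list(range(start, start + size)))
        start += size
    return partitions


# Assumed weight functions for the size-based schemes (partition index i starts at 0)
WEIGHT_FNS: Dict[str, Callable[[int], float]] = {
    "linear": lambda i: i + 1,
    "square": lambda i: (i + 1) ** 2,
    "exponential": lambda i: math.exp(i),
}

# Usage: 4 partitions over 100 samples with linearly growing weights
print([len(p) for p in size_partition(100, 4, WEIGHT_FNS["linear"])])  # [10, 20, 30, 40]
print(natural_id_partition(["u1", "u2", "u1"]))  # {'u1': [0, 2], 'u2': [1]}
```

Handing the rounding remainder to the last partition keeps every sample assigned while preserving the growing-size ordering for these weight functions.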

# Installation

If you plan to change the type of the dataset to run the code with your ML frame…

# Usage

Flower Datasets exposes the `FederatedDataset` abstraction to represent the dataset needed for federated learning/evaluation/analytics. It has two methods that handle the dataset preprocessing for you: `load_partition(node_id, split)`, which returns a single partition of the given split, and `load_full(split)`, which returns the whole split (e.g. for centralized evaluation).

Here's a basic quickstart example of how to partition the MNIST dataset:

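```python
from flwr_datasets import FederatedDataset

# Partition the "train" split of MNIST into 100 partitions (i.i.d. by default)
mnist_fds = FederatedDataset(dataset="mnist", partitioners={"train": 100})

mnist_partition_0 = mnist_fds.load_partition(0, "train")

centralized_data = mnist_fds.load_full("test")
```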

`FederatedDataset(dataset, partitioners)` lets you specify:

* `dataset: str` - the name of the dataset.

* `partitioners: Dict[str, int]` - a mapping from a split name (`str`) to the number of partitions (`int`) that the associated split of the dataset is divided into, e.g. `{"train": 100}`. By default, i.i.d. partitioning is assumed.
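
To make the `{split_name: num_partitions}` mapping and the `load_partition`/`load_full` semantics concrete, here is a minimal in-memory sketch. `ToyFederatedDataset` is a hypothetical class that works on Python lists instead of Hugging Face datasets, and its IID split (shuffle with a fixed seed, then deal indices round-robin) is an assumption for illustration, not how `flwr_datasets` implements partitioning.

```python
import random
from typing import Dict, List


class ToyFederatedDataset:
    """Hypothetical stand-in for the idea behind FederatedDataset: each split
    listed in `partitioners` is divided IID into the given number of partitions."""

    def __init__(
        self,
        splits: Dict[str, List[object]],
        partitioners: Dict[str, int],
        seed: int = 42,
    ) -> None:
        self._splits = splits
        self._partitioners = partitioners
        self._seed = seed
        # Index partitions per split, built lazily on the first request
        self._partitions: Dict[str, List[List[int]]] = {}

    def _iid_partitions(self, split: str) -> List[List[int]]:
        num_partitions = self._partitioners[split]
        indices = list(range(len(self._splits[split])))
        random.Random(self._seed).shuffle(indices)
        # Deal shuffled indices round-robin, so sizes differ by at most one
        return [indices[i::num_partitions] for i in range(num_partitions)]

    def load_partition(self, node_id: int, split: str) -> List[object]:
        if split not in self._partitioners:
            raise ValueError(f"No partitioner configured for split {split!r}")
        if split not in self._partitions:
            self._partitions[split] = self._iid_partitions(split)
        return [self._splits[split][i] for i in self._partitions[split][node_id]]

    def load_full(self, split: str) -> List[object]:
        # The whole, non-partitioned split, e.g. for centralized evaluation
        return list(self._splits[split])


# Usage with toy data standing in for a real dataset
data = {"train": list(range(1000)), "test": list(range(200))}
fds = ToyFederatedDataset(data, partitioners={"train": 100})
partition_0 = fds.load_partition(0, "train")  # 10 of the 1000 training samples
centralized = fds.load_full("test")  # all 200 test samples
```

Caching each split's partitions after the first request means repeated `load_partition` calls for the same split see a consistent assignment of samples to nodes.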

More customization of `partitioners` is coming in future releases.
For more details, refer to the how-to guides and tutorials, which showcase customization and more advanced features.

# Future releases

Here are a few of the things that we will work on in future releases:

* Support for more datasets (especially ones that include a user ID).
* Creation of custom `Partitioner`s.
* More out-of-the-box `Partitioner`s.
* Passing `Partitioner`s via `FederatedDataset`'s `partitioners` argument.
* Customization of the dataset splitting before the partitioning.
* Simplification of the dataset transformation to the popular frameworks/types.
* Creation of synthetic data.
* Support for Vertical FL.
