From 0c3d8b0954117e8e6996f54489210f9e9d5ceced Mon Sep 17 00:00:00 2001
From: Adam Narozniak <51029327+adam-narozniak@users.noreply.github.com>
Date: Wed, 15 Nov 2023 16:49:35 +0100
Subject: [PATCH] Update FDS README (#2605)

---
 datasets/README.md | 30 +++++++++++++++---------------
 1 file changed, 15 insertions(+), 15 deletions(-)

diff --git a/datasets/README.md b/datasets/README.md
index f4b3fab73d8e..876b6f453fa5 100644
--- a/datasets/README.md
+++ b/datasets/README.md
@@ -3,7 +3,7 @@
 [![GitHub license](https://img.shields.io/github/license/adap/flower)](https://github.com/adap/flower/blob/main/LICENSE)
 [![PRs Welcome](https://img.shields.io/badge/PRs-welcome-brightgreen.svg)](https://github.com/adap/flower/blob/main/CONTRIBUTING.md)
 ![Build](https://github.com/adap/flower/actions/workflows/framework.yml/badge.svg)
-![Downloads](https://pepy.tech/badge/flwr)
+![Downloads](https://pepy.tech/badge/flwr-datasets)
 [![Slack](https://img.shields.io/badge/Chat-Slack-red)](https://flower.dev/join-slack)
 
 Flower Datasets (`flwr-datasets`) is a library to quickly and easily create datasets for federated learning, federated evaluation, and federated analytics. It was created by the `Flower Labs` team that also created Flower: A Friendly Federated Learning Framework.
@@ -22,7 +22,13 @@ Thanks to using Hugging Face's `datasets` used under the hood, Flower Datasets i
 * Arrow.
 
 Create **custom partitioning schemes** or choose from the **implemented partitioning schemes**:
+* Partitioner (the abstract base class) `Partitioner`
 * IID partitioning `IidPartitioner(num_partitions)`
+* Natural ID partitioner `NaturalIdPartitioner`
+* Size partitioner (the abstract base class for the partitioners dictating the division based the number of samples) `SizePartitioner`
+* Linear partitioner `LinearPartitioner`
+* Square partitioner `SquarePartitioner`
+* Exponential partitioner `ExponentialPartitioner`
 * more to come in future releases.
 
 # Installation
@@ -53,9 +59,9 @@ If you plan to change the type of the dataset to run the code with your ML frame
 
 # Usage
 
-The Flower Datasets exposes `FederatedDataset(dataset, partitioners)` abstraction to represent the dataset needed for federated learning/analytics. It has two powerful methods that let you handle the dataset preprocessing. They are `load_partition(node_id, split)` and `load_full(split)`.
+Flower Datasets exposes the `FederatedDataset` abstraction to represent the dataset needed for federated learning/evaluation/analytics. It has two powerful methods that let you handle the dataset preprocessing: `load_partition(node_id, split)` and `load_full(split)`.
 
-Here's a quick example of how to partition the MNIST dataset:
+Here's a basic quickstart example of how to partition the MNIST dataset:
 
 ```
 from flwr_datasets import FederatedDataset
@@ -68,23 +74,17 @@ mnist_partition_0 = mnist_fds.load_partition(0, "train")
 centralized_data = mnist_fds.load_full("test")
 ```
 
-`FederatedDataset(dataset, partitioners)` allows you specification of:
-
-* `dataset:str` - the name of the dataset.
-
-* `partitioners: Dict[str: int]` - `{split_name: str` to `number-of-partitions: int}` - partitioner that will be used with an associated split of the dataset e.g. `{"train": 100}`. It assumes by default the i.i.d. partitioning.
-
-More customization of `partitioners` is coming in future releases.
+For more details, please refer to the specific how-to guides or tutorial. They showcase customization and more advanced features.
 
 # Future release
 
 Here are a few of the things that we will work on in future releases:
 
-* Support for more datasets (especially the ones that have user id present).
-* Creation of custom `Partitioner`s.
-* More out-of-the-box `Partitioner`s.
-* Passing `Partitioner`s via `FederatedDataset`'s `partitioner` argument.
-* Customization of the dataset splitting before the partitioning.
+* ✅ Support for more datasets (especially the ones that have user id present).
+* ✅ Creation of custom `Partitioner`s.
+* ✅ More out-of-the-box `Partitioner`s.
+* ✅ Passing `Partitioner`s via `FederatedDataset`'s `partitioners` argument.
+* ✅ Customization of the dataset splitting before the partitioning.
 * Simplification of the dataset transformation to the popular frameworks/types.
 * Creation of the synthetic data,
 * Support for Vertical FL.
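
Review note: the README now lists size-based partitioners (`LinearPartitioner`, `SquarePartitioner`, `ExponentialPartitioner`) without saying how they divide the samples. The plain-Python sketch below illustrates one plausible reading, in which partition sizes grow with the partition index linearly, quadratically, or exponentially. It does not use or reproduce the `flwr-datasets` implementation: `partition_sizes` and `weight_fn` are hypothetical names, and the exact weighting and rounding rules are assumptions.

```python
import math
from typing import Callable, List


def partition_sizes(
    num_samples: int,
    num_partitions: int,
    weight_fn: Callable[[int], float],
) -> List[int]:
    """Hypothetical helper: sizes proportional to weight_fn(partition_id + 1)."""
    # Assumption: partition i gets a share proportional to weight_fn(i + 1).
    weights = [weight_fn(i + 1) for i in range(num_partitions)]
    total = sum(weights)
    # Truncate each share to an integer number of samples.
    sizes = [int(num_samples * w / total) for w in weights]
    # Assumption: any samples lost to truncation go to the last partition.
    sizes[-1] += num_samples - sum(sizes)
    return sizes


linear_sizes = partition_sizes(60, 3, lambda i: i)          # weights 1, 2, 3
square_sizes = partition_sizes(140, 3, lambda i: i ** 2)    # weights 1, 4, 9
exponential_sizes = partition_sizes(1000, 3, math.exp)      # weights e, e^2, e^3
```

Under these assumptions, every sample is assigned to exactly one partition, and later partitions receive more samples than earlier ones. The library's actual classes may handle rounding and leftovers differently.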