fix code tree
JinsooKim-KR committed Oct 1, 2023
1 parent 978f000 commit e8e8fef
Showing 4 changed files with 1 addition and 72 deletions.
64 changes: 0 additions & 64 deletions datasets/README.md
@@ -1,65 +1 @@
# Flower Datasets

[![GitHub license](https://img.shields.io/github/license/adap/flower)](https://github.com/adap/flower/blob/main/LICENSE)
[![PRs Welcome](https://img.shields.io/badge/PRs-welcome-brightgreen.svg)](https://github.com/adap/flower/blob/main/CONTRIBUTING.md)
![Build](https://github.com/adap/flower/actions/workflows/framework.yml/badge.svg)
![Downloads](https://pepy.tech/badge/flwr)
[![Slack](https://img.shields.io/badge/Chat-Slack-red)](https://flower.dev/join-slack)

Flower Datasets (`flwr-datasets`) is a library for quickly and easily creating datasets for federated learning, federated evaluation, and federated analytics. It was created by the `Flower Labs` team, which also created Flower: A Friendly Federated Learning Framework.
The Flower Datasets library supports:
* **downloading datasets** - choose the dataset from Hugging Face's `datasets`,
* **partitioning datasets** - customize the partitioning scheme,
* **creating centralized datasets** - leave parts of the dataset unpartitioned (e.g. for centralized evaluation).

Because it uses Hugging Face's `datasets` under the hood, Flower Datasets integrates with the following popular formats/frameworks:
* Hugging Face,
* PyTorch,
* TensorFlow,
* NumPy,
* pandas,
* JAX,
* Arrow.

Create **custom partitioning schemes** or choose from the **implemented partitioning schemes**:
* IID partitioning `IidPartitioner(num_partitions)`
* more to come in future releases.

# Installation

## With pip

Flower Datasets can be installed from PyPI:

```bash
pip install flwr-datasets
```

If you plan to change the type of the dataset to run the code with your ML framework, make sure to have it installed too.

# Usage

Flower Datasets exposes the `FederatedDataset(dataset, partitioners)` abstraction to represent the dataset needed for federated learning/analytics. It has two methods for loading data: `load_partition(idx, split)`, which loads a single partition of a split, and `load_full(split)`, which loads an entire split without partitioning.

Here's a quick example of how to partition the MNIST dataset:
`FederatedDataset(dataset, partitioners)` lets you specify:
* `dataset: str` - the name of the dataset.
* `partitioners: Dict[str, int]` - a mapping from split name (`str`) to number of partitions (`int`), e.g. `{"train": 100}`, that determines how the associated split of the dataset is partitioned. I.i.d. partitioning is assumed by default.

More customization of `partitioners` is coming in future releases.
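
The default i.i.d. scheme mentioned above can be pictured with a small, library-independent sketch. It assumes one simple reading of i.i.d. partitioning (shuffle once, then deal samples evenly across partitions). The helper `iid_partition` and its `seed` parameter are hypothetical names used for illustration only; they are not part of `flwr-datasets`, whose internal implementation may differ.

```python
import random
from typing import Dict, List, Sequence


def iid_partition(
    samples: Sequence[int], num_partitions: int, seed: int = 0
) -> Dict[int, List[int]]:
    """Hypothetical helper: shuffle samples, then deal them round-robin.

    Every sample lands in exactly one partition, and partition sizes
    differ by at most one.
    """
    if num_partitions <= 0:
        raise ValueError("num_partitions must be positive")
    shuffled = list(samples)
    # A seeded generator keeps the assignment reproducible across runs.
    random.Random(seed).shuffle(shuffled)
    return {
        idx: shuffled[idx::num_partitions] for idx in range(num_partitions)
    }


# Mirrors the spirit of {"train": 100}: 1000 sample indices, 100 partitions.
partitions = iid_partition(range(1000), num_partitions=100)
partition_0 = partitions[0]
```

Here `partition_0` plays the role that a single loaded partition would play for one federated client, while the untouched `range(1000)` corresponds to the full, unpartitioned split.
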
# Future release
Here are a few of the things that we will work on in future releases:
* Support for more datasets (especially the ones that have user id present).
* Creation of custom `Partitioner`s.
* More out-of-the-box `Partitioner`s.
* Passing `Partitioner`s via `FederatedDataset`'s `partitioner` argument.
* Customization of the dataset splitting before the partitioning.
* Simplification of the dataset transformation to the popular frameworks/types.
* Creation of synthetic data.
* Support for Vertical FL.
7 changes: 1 addition & 6 deletions datasets/flwr_datasets/__init__.py
@@ -1,4 +1,4 @@
-# Copyright 2023 Flower Labs GmbH. All Rights Reserved.
+# Copyright 2023 Adap GmbH. All Rights Reserved.
#
# Licensed under the Apache License, Version 2.0 (the "License");
# you may not use this file except in compliance with the License.
@@ -13,8 +13,3 @@
# limitations under the License.
# ==============================================================================
"""Flower Datasets main package."""


-from .federated_dataset import FederatedDataset
-
-__all__ = ["FederatedDataset"]
2 changes: 0 additions & 2 deletions datasets/pyproject.toml
@@ -54,7 +54,6 @@ exclude = [
[tool.poetry.dependencies]
python = "^3.8"
numpy = "^1.21.0"
-datasets = "^2.14.3"

[tool.poetry.dev-dependencies]
isort = "==5.11.5"
@@ -63,7 +62,6 @@ docformatter = "==1.7.1"
mypy = "==1.4.0"
pylint = "==2.13.9"
flake8 = "==3.9.2"
-parameterized = "==0.9.0"
pytest = "==7.1.2"
pytest-watch = "==4.2.0"
ruff = "==0.0.277"