Merge branch 'main' into fix-simulation-exec
chongshenng committed Nov 22, 2024
2 parents ac8ee85 + 67688c6 commit 6a00df2
Showing 6 changed files with 174 additions and 4 deletions.
2 changes: 1 addition & 1 deletion .github/workflows/docker-readme.yml
@@ -24,7 +24,7 @@ jobs:
list-files: "json"
filters: |
readme:
-  - 'src/docker/**/README.md'
+  - added|modified: 'src/docker/**/README.md'
update:
if: ${{ needs.collect.outputs.readme_files != '' && toJson(fromJson(needs.collect.outputs.readme_files)) != '[]' }}
1 change: 1 addition & 0 deletions .github/workflows/update_translations.yml
@@ -66,6 +66,7 @@ jobs:
with:
github_token: ${{ secrets.GITHUB_TOKEN }}
branch: '${{ env.branch-name }}'
force_with_lease: true

- name: Create Pull Request
if: steps.calculate_diff.outputs.additions > 228 && steps.calculate_diff.outputs.deletions > 60
167 changes: 167 additions & 0 deletions datasets/doc/source/recommended-fl-datasets.rst
@@ -0,0 +1,167 @@
Recommended FL Datasets
=======================

This page lists recommended datasets for federated learning research that can be used
with Flower Datasets (``flwr-datasets``). To learn about the library, see the
`quickstart tutorial <https://flower.ai/docs/datasets/tutorial-quickstart.html>`_. For a
complete FL example that uses Flower and Flower Datasets, see `quickstart-pytorch
<https://github.com/adap/flower/tree/main/examples/quickstart-pytorch>`_.

.. note::

    All datasets from the `Hugging Face Hub <https://huggingface.co/datasets>`_ can be
    used with this library. This page presents only a selection of datasets that you
    might find useful.
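
As a minimal sketch of how a dataset from this page might be loaded, the snippet below wraps ``FederatedDataset`` from ``flwr-datasets`` in a small helper. The helper name ``load_client_data``, its arguments, and the choice of ``IidPartitioner`` are illustrative assumptions, not part of the library. It also assumes the dataset provides both ``train`` and ``test`` splits, which does not hold for every dataset listed on this page.

```python
def load_client_data(dataset_name: str, num_clients: int, client_id: int):
    """Return (client_partition, centralized_test_split) for one simulated client.

    Hypothetical helper for illustration; only FederatedDataset and
    IidPartitioner come from flwr-datasets.
    """
    if not 0 <= client_id < num_clients:
        raise ValueError(
            f"client_id must be in [0, {num_clients}), got {client_id}"
        )

    # Imported lazily so this module can be imported without flwr-datasets
    # installed; the first call downloads the dataset from the Hub.
    from flwr_datasets import FederatedDataset
    from flwr_datasets.partitioner import IidPartitioner

    fds = FederatedDataset(
        dataset=dataset_name,
        # Split the "train" split into num_clients IID partitions.
        partitioners={"train": IidPartitioner(num_partitions=num_clients)},
    )
    partition = fds.load_partition(client_id, "train")
    # Assumption: the dataset has a "test" split to use for central evaluation.
    test_split = fds.load_split("test")
    return partition, test_split
```

For example, ``load_client_data("ylecun/mnist", num_clients=10, client_id=0)`` would return the first of ten IID partitions of the training split together with the full test split.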

For more information about any dataset, visit its page by clicking the dataset name. For more information how to use the

Image Datasets
--------------

.. list-table:: Image Datasets
    :widths: 40 40 20
    :header-rows: 1

    * - Name
      - Size
      - Image Shape
    * - `ylecun/mnist <https://huggingface.co/datasets/ylecun/mnist>`_
      - train 60k;
        test 10k
      - 28x28
    * - `uoft-cs/cifar10 <https://huggingface.co/datasets/uoft-cs/cifar10>`_
      - train 50k;
        test 10k
      - 32x32x3
    * - `uoft-cs/cifar100 <https://huggingface.co/datasets/uoft-cs/cifar100>`_
      - train 50k;
        test 10k
      - 32x32x3
    * - `zalando-datasets/fashion_mnist <https://huggingface.co/datasets/zalando-datasets/fashion_mnist>`_
      - train 60k;
        test 10k
      - 28x28
    * - `flwrlabs/femnist <https://huggingface.co/datasets/flwrlabs/femnist>`_
      - train 814k
      - 28x28
    * - `zh-plus/tiny-imagenet <https://huggingface.co/datasets/zh-plus/tiny-imagenet>`_
      - train 100k;
        valid 10k
      - 64x64x3
    * - `flwrlabs/usps <https://huggingface.co/datasets/flwrlabs/usps>`_
      - train 7.3k;
        test 2k
      - 16x16
    * - `flwrlabs/pacs <https://huggingface.co/datasets/flwrlabs/pacs>`_
      - train 10k
      - 227x227
    * - `flwrlabs/cinic10 <https://huggingface.co/datasets/flwrlabs/cinic10>`_
      - train 90k;
        valid 90k;
        test 90k
      - 32x32x3
    * - `flwrlabs/caltech101 <https://huggingface.co/datasets/flwrlabs/caltech101>`_
      - train 8.7k
      - varies
    * - `flwrlabs/office-home <https://huggingface.co/datasets/flwrlabs/office-home>`_
      - train 15.6k
      - varies
    * - `flwrlabs/fed-isic2019 <https://huggingface.co/datasets/flwrlabs/fed-isic2019>`_
      - train 18.6k;
        test 4.7k
      - varies
    * - `ufldl-stanford/svhn <https://huggingface.co/datasets/ufldl-stanford/svhn>`_
      - train 73.3k;
        test 26k;
        extra 531k
      - 32x32x3
    * - `sasha/dog-food <https://huggingface.co/datasets/sasha/dog-food>`_
      - train 2.1k;
        test 0.9k
      - varies
    * - `Mike0307/MNIST-M <https://huggingface.co/datasets/Mike0307/MNIST-M>`_
      - train 59k;
        test 9k
      - 32x32

Audio Datasets
--------------

.. list-table:: Audio Datasets
    :widths: 35 30 15
    :header-rows: 1

    * - Name
      - Size
      - Subset
    * - `google/speech_commands <https://huggingface.co/datasets/google/speech_commands>`_
      - train 64.7k
      - v0.01
    * - `google/speech_commands <https://huggingface.co/datasets/google/speech_commands>`_
      - train 105.8k
      - v0.02
    * - `flwrlabs/ambient-acoustic-context <https://huggingface.co/datasets/flwrlabs/ambient-acoustic-context>`_
      - train 70.3k
      -
    * - `fixie-ai/common_voice_17_0 <https://huggingface.co/datasets/fixie-ai/common_voice_17_0>`_
      - varies
      - 14 versions
    * - `fixie-ai/librispeech_asr <https://huggingface.co/datasets/fixie-ai/librispeech_asr>`_
      - varies
      - clean/other
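
The Subset column names a dataset configuration on the Hugging Face Hub. The sketch below assumes it is passed through the ``subset`` argument of ``FederatedDataset``. The helper ``load_speech_commands_partition`` and the ``SPEECH_COMMANDS_SUBSETS`` tuple are hypothetical names that simply mirror the two ``google/speech_commands`` rows above. Depending on the installed ``datasets`` version, loading this dataset may require additional arguments.

```python
# Hypothetical constant mirroring the Subset column for google/speech_commands.
SPEECH_COMMANDS_SUBSETS = ("v0.01", "v0.02")


def load_speech_commands_partition(subset: str, num_clients: int, client_id: int):
    """Load one IID partition of the chosen speech_commands subset (sketch)."""
    if subset not in SPEECH_COMMANDS_SUBSETS:
        raise ValueError(
            f"subset must be one of {SPEECH_COMMANDS_SUBSETS}, got {subset!r}"
        )
    if not 0 <= client_id < num_clients:
        raise ValueError(
            f"client_id must be in [0, {num_clients}), got {client_id}"
        )

    # Imported lazily so the helper can be defined without flwr-datasets installed.
    from flwr_datasets import FederatedDataset

    fds = FederatedDataset(
        dataset="google/speech_commands",
        subset=subset,
        # An integer here requests that many IID partitions of the split.
        partitioners={"train": num_clients},
    )
    return fds.load_partition(client_id, "train")
```

Calling ``load_speech_commands_partition("v0.02", num_clients=100, client_id=0)`` would, under these assumptions, return the first of 100 partitions of the larger subset.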

Tabular Datasets
----------------

.. list-table:: Tabular Datasets
    :widths: 35 30
    :header-rows: 1

    * - Name
      - Size
    * - `scikit-learn/adult-census-income <https://huggingface.co/datasets/scikit-learn/adult-census-income>`_
      - train 32.6k
    * - `jlh/uci-mushrooms <https://huggingface.co/datasets/jlh/uci-mushrooms>`_
      - train 8.1k
    * - `scikit-learn/iris <https://huggingface.co/datasets/scikit-learn/iris>`_
      - train 150

Text Datasets
-------------

.. list-table:: Text Datasets
    :widths: 40 30 30
    :header-rows: 1

    * - Name
      - Size
      - Category
    * - `sentiment140 <https://huggingface.co/datasets/sentiment140>`_
      - train 1.6M;
        test 0.5k
      - Sentiment
    * - `google-research-datasets/mbpp <https://huggingface.co/datasets/google-research-datasets/mbpp>`_
      - full 974; sanitized 427
      - General
    * - `openai/openai_humaneval <https://huggingface.co/datasets/openai/openai_humaneval>`_
      - test 164
      - General
    * - `lukaemon/mmlu <https://huggingface.co/datasets/lukaemon/mmlu>`_
      - varies
      - General
    * - `takala/financial_phrasebank <https://huggingface.co/datasets/takala/financial_phrasebank>`_
      - train 4.8k
      - Financial
    * - `pauri32/fiqa-2018 <https://huggingface.co/datasets/pauri32/fiqa-2018>`_
      - train 0.9k; validation 0.1k; test 0.2k
      - Financial
    * - `zeroshot/twitter-financial-news-sentiment <https://huggingface.co/datasets/zeroshot/twitter-financial-news-sentiment>`_
      - train 9.5k; validation 2.4k
      - Financial
    * - `bigbio/pubmed_qa <https://huggingface.co/datasets/bigbio/pubmed_qa>`_
      - train 2M; validation 11k
      - Medical
    * - `openlifescienceai/medmcqa <https://huggingface.co/datasets/openlifescienceai/medmcqa>`_
      - train 183k; validation 4.3k; test 6.2k
      - Medical
    * - `bigbio/med_qa <https://huggingface.co/datasets/bigbio/med_qa>`_
      - train 10.1k; test 1.3k; validation 1.3k
      - Medical
2 changes: 1 addition & 1 deletion datasets/flwr_datasets/__init__.py
@@ -15,7 +15,7 @@
"""Flower Datasets main package."""


-from flwr_datasets import partitioner, preprocessor
+from flwr_datasets import metrics, partitioner, preprocessor
from flwr_datasets import utils as utils
from flwr_datasets import visualization
from flwr_datasets.common.version import package_version as _package_version
3 changes: 2 additions & 1 deletion doc/source/docker/tutorial-deploy-on-multiple-machines.rst
@@ -38,8 +38,9 @@ Step 1: Set Up
1. Clone the Flower repository and change to the ``distributed`` directory:

    .. code-block:: bash
+       :substitutions:

-       $ git clone --depth=1 https://github.com/adap/flower.git
+       $ git clone --depth=1 --branch v|stable_flwr_version| https://github.com/adap/flower.git
        $ cd flower/src/docker/distributed

2. Get the IP address from the remote machine and save it for later.
3 changes: 2 additions & 1 deletion doc/source/docker/tutorial-quickstart-docker-compose.rst
@@ -24,8 +24,9 @@ Step 1: Set Up
1. Clone the Docker Compose ``complete`` directory:

    .. code-block:: bash
+       :substitutions:

-       $ git clone --depth=1 https://github.com/adap/flower.git _tmp \
+       $ git clone --depth=1 --branch v|stable_flwr_version| https://github.com/adap/flower.git _tmp \
                    && mv _tmp/src/docker/complete . \
                    && rm -rf _tmp && cd complete