Add recommended fl datasets docs
adam-narozniak committed Nov 20, 2024
1 parent 094e28f commit b43d457
Showing 1 changed file with 163 additions and 0 deletions: datasets/doc/source/recommended-fl-datasets.rst
Recommended FL Datasets
=======================

This page lists datasets recommended for federated learning research. All of them can be used with Flower Datasets (``flwr-datasets``).

.. note::

   Any dataset on the Hugging Face Hub can be used with Flower Datasets. This page presents only a curated selection that you might find useful.

For more information about any dataset, visit its page by clicking the dataset name.

Image Datasets
--------------

.. list-table:: Image Datasets
   :widths: 40 40 20
   :header-rows: 1

   * - Name
     - Size
     - Image Shape
   * - `ylecun/mnist <https://huggingface.co/datasets/ylecun/mnist>`_
     - train 60k;
       test 10k
     - 28x28
   * - `uoft-cs/cifar10 <https://huggingface.co/datasets/uoft-cs/cifar10>`_
     - train 50k;
       test 10k
     - 32x32x3
   * - `uoft-cs/cifar100 <https://huggingface.co/datasets/uoft-cs/cifar100>`_
     - train 50k;
       test 10k
     - 32x32x3
   * - `zalando-datasets/fashion_mnist <https://huggingface.co/datasets/zalando-datasets/fashion_mnist>`_
     - train 60k;
       test 10k
     - 28x28
   * - `flwrlabs/femnist <https://huggingface.co/datasets/flwrlabs/femnist>`_
     - train 814k
     - 28x28
   * - `zh-plus/tiny-imagenet <https://huggingface.co/datasets/zh-plus/tiny-imagenet>`_
     - train 100k;
       valid 10k
     - 64x64x3
   * - `flwrlabs/usps <https://huggingface.co/datasets/flwrlabs/usps>`_
     - train 7.3k;
       test 2k
     - 16x16
   * - `flwrlabs/pacs <https://huggingface.co/datasets/flwrlabs/pacs>`_
     - train 10k
     - 227x227
   * - `flwrlabs/cinic10 <https://huggingface.co/datasets/flwrlabs/cinic10>`_
     - train 90k;
       valid 90k;
       test 90k
     - 32x32x3
   * - `flwrlabs/caltech101 <https://huggingface.co/datasets/flwrlabs/caltech101>`_
     - train 8.7k
     - varies
   * - `flwrlabs/office-home <https://huggingface.co/datasets/flwrlabs/office-home>`_
     - train 15.6k
     - varies
   * - `flwrlabs/fed-isic2019 <https://huggingface.co/datasets/flwrlabs/fed-isic2019>`_
     - train 18.6k;
       test 4.7k
     - varies
   * - `ufldl-stanford/svhn <https://huggingface.co/datasets/ufldl-stanford/svhn>`_
     - train 73.3k;
       test 26k;
       extra 531k
     - 32x32x3
   * - `sasha/dog-food <https://huggingface.co/datasets/sasha/dog-food>`_
     - train 2.1k;
       test 0.9k
     - varies
   * - `Mike0307/MNIST-M <https://huggingface.co/datasets/Mike0307/MNIST-M>`_
     - train 59k;
       test 9k
     - 32x32

Audio Datasets
--------------

.. list-table:: Audio Datasets
   :widths: 35 30 15
   :header-rows: 1

   * - Name
     - Size
     - Subset
   * - `google/speech_commands <https://huggingface.co/datasets/google/speech_commands>`_
     - train 64.7k
     - v0.01
   * - `google/speech_commands <https://huggingface.co/datasets/google/speech_commands>`_
     - train 105.8k
     - v0.02
   * - `flwrlabs/ambient-acoustic-context <https://huggingface.co/datasets/flwrlabs/ambient-acoustic-context>`_
     - train 70.3k
     -
   * - `fixie-ai/common_voice_17_0 <https://huggingface.co/datasets/fixie-ai/common_voice_17_0>`_
     - varies
     - 14 versions
   * - `fixie-ai/librispeech_asr <https://huggingface.co/datasets/fixie-ai/librispeech_asr>`_
     - varies
     - clean/other

Tabular Datasets
----------------

.. list-table:: Tabular Datasets
   :widths: 35 30
   :header-rows: 1

   * - Name
     - Size
   * - `scikit-learn/adult-census-income <https://huggingface.co/datasets/scikit-learn/adult-census-income>`_
     - train 32.6k
   * - `jlh/uci-mushrooms <https://huggingface.co/datasets/jlh/uci-mushrooms>`_
     - train 8.1k
   * - `scikit-learn/iris <https://huggingface.co/datasets/scikit-learn/iris>`_
     - train 150

Text Datasets
-------------

.. list-table:: Text Datasets
   :widths: 40 30 30
   :header-rows: 1

   * - Name
     - Size
     - Category
   * - `sentiment140 <https://huggingface.co/datasets/sentiment140>`_
     - train 1.6M;
       test 0.5k
     - Sentiment
   * - `google-research-datasets/mbpp <https://huggingface.co/datasets/google-research-datasets/mbpp>`_
     - full 974; sanitized 427
     - General
   * - `openai/openai_humaneval <https://huggingface.co/datasets/openai/openai_humaneval>`_
     - test 164
     - General
   * - `lukaemon/mmlu <https://huggingface.co/datasets/lukaemon/mmlu>`_
     - varies
     - General
   * - `takala/financial_phrasebank <https://huggingface.co/datasets/takala/financial_phrasebank>`_
     - train 4.8k
     - Financial
   * - `pauri32/fiqa-2018 <https://huggingface.co/datasets/pauri32/fiqa-2018>`_
     - train 0.9k; validation 0.1k; test 0.2k
     - Financial
   * - `zeroshot/twitter-financial-news-sentiment <https://huggingface.co/datasets/zeroshot/twitter-financial-news-sentiment>`_
     - train 9.5k; validation 2.4k
     - Financial
   * - `bigbio/pubmed_qa <https://huggingface.co/datasets/bigbio/pubmed_qa>`_
     - train 200k; validation 11k
     - Medical
   * - `openlifescienceai/medmcqa <https://huggingface.co/datasets/openlifescienceai/medmcqa>`_
     - train 183k; validation 4.2k; test 6.2k
     - Medical
   * - `bigbio/med_qa <https://huggingface.co/datasets/bigbio/med_qa>`_
     - train 10.1k; validation 1.3k; test 1.3k
     - Medical
