Add recommended fl datasets docs

adap · Nov 20, 2024 · b43d457 · b43d457
1 parent 094e28f
commit b43d457
Showing 1 changed file with 163 additions and 0 deletions.
diff --git a/datasets/doc/source/recommended-fl-datasets.rst b/datasets/doc/source/recommended-fl-datasets.rst
@@ -0,0 +1,163 @@
+Recommended FL Datasets
+=======================
+
+This page lists the recommended datasets for federated learning research, which can be used with Flower Datasets ``flwr-datasets``.
+
+.. note::
+
+    All datasets from HuggingFace Hub can be used with our library. This page presents just a set of datasets we collected that you might find useful.
+
+For more information about any dataset, visit its page by clicking the dataset name.
+
+Image Datasets
+--------------
+
+.. list-table:: Image Datasets
+   :widths: 40 40 20
+   :header-rows: 1
+
+   * - Name
+     - Size
+     - Image Shape
+   * - `ylecun/mnist <https://huggingface.co/datasets/ylecun/mnist>`_
+     - train 60k;  
+       test 10k
+     - 28x28
+   * - `uoft-cs/cifar10 <https://huggingface.co/datasets/uoft-cs/cifar10>`_
+     - train 50k;  
+       test 10k
+     - 32x32x3
+   * - `uoft-cs/cifar100 <https://huggingface.co/datasets/uoft-cs/cifar100>`_
+     - train 50k;  
+       test 10k
+     - 32x32x3
+   * - `zalando-datasets/fashion_mnist <https://huggingface.co/datasets/zalando-datasets/fashion_mnist>`_
+     - train 60k;  
+       test 10k
+     - 28x28
+   * - `flwrlabs/femnist <https://huggingface.co/datasets/flwrlabs/femnist>`_
+     - train 814k
+     - 28x28
+   * - `zh-plus/tiny-imagenet <https://huggingface.co/datasets/zh-plus/tiny-imagenet>`_
+     - train 100k;  
+       valid 10k
+     - 64x64x3
+   * - `flwrlabs/usps <https://huggingface.co/datasets/flwrlabs/usps>`_
+     - train 7.3k;  
+       test 2k
+     - 16x16
+   * - `flwrlabs/pacs <https://huggingface.co/datasets/flwrlabs/pacs>`_
+     - train 10k
+     - 227x227
+   * - `flwrlabs/cinic10 <https://huggingface.co/datasets/flwrlabs/cinic10>`_
+     - train 90k;  
+       valid 90k;  
+       test 90k
+     - 32x32x3
+   * - `flwrlabs/caltech101 <https://huggingface.co/datasets/flwrlabs/caltech101>`_
+     - train 8.7k
+     - varies
+   * - `flwrlabs/office-home <https://huggingface.co/datasets/flwrlabs/office-home>`_
+     - train 15.6k
+     - varies
+   * - `flwrlabs/fed-isic2019 <https://huggingface.co/datasets/flwrlabs/fed-isic2019>`_
+     - train 18.6k;  
+       test 4.7k
+     - varies
+   * - `ufldl-stanford/svhn <https://huggingface.co/datasets/ufldl-stanford/svhn>`_
+     - train 73.3k;  
+       test 26k;  
+       extra 531k
+     - 32x32x3
+   * - `sasha/dog-food <https://huggingface.co/datasets/sasha/dog-food>`_
+     - train 2.1k;  
+       test 0.9k
+     - varies
+   * - `Mike0307/MNIST-M <https://huggingface.co/datasets/Mike0307/MNIST-M>`_
+     - train 59k;  
+       test 9k
+     - 32x32
+
+Audio Datasets
+--------------
+
+.. list-table:: Audio Datasets
+   :widths: 35 30 15
+   :header-rows: 1
+
+   * - Name
+     - Size
+     - Subset
+   * - `google/speech_commands <https://huggingface.co/datasets/google/speech_commands>`_
+     - train 64.7k
+     - v0.01
+   * - `google/speech_commands <https://huggingface.co/datasets/google/speech_commands>`_
+     - train 105.8k
+     - v0.02
+   * - `flwrlabs/ambient-acoustic-context <https://huggingface.co/datasets/flwrlabs/ambient-acoustic-context>`_
+     - train 70.3k
+     - 
+   * - `fixie-ai/common_voice_17_0 <https://huggingface.co/datasets/fixie-ai/common_voice_17_0>`_
+     - varies
+     - 14 versions
+   * - `fixie-ai/librispeech_asr <https://huggingface.co/datasets/fixie-ai/librispeech_asr>`_
+     - varies
+     - clean/other
+
+Tabular Datasets
+----------------
+
+.. list-table:: Tabular Datasets
+   :widths: 35 30
+   :header-rows: 1
+
+   * - Name
+     - Size
+   * - `scikit-learn/adult-census-income <https://huggingface.co/datasets/scikit-learn/adult-census-income>`_
+     - train 32.6k
+   * - `jlh/uci-mushrooms <https://huggingface.co/datasets/jlh/uci-mushrooms>`_
+     - train 8.1k
+   * - `scikit-learn/iris <https://huggingface.co/datasets/scikit-learn/iris>`_
+     - train 150
+
+Text Datasets
+-------------
+
+.. list-table:: Text Datasets
+   :widths: 40 30 30
+   :header-rows: 1
+
+   * - Name
+     - Size
+     - Category
+   * - `sentiment140 <https://huggingface.co/datasets/sentiment140>`_
+     - train 1.6M;  
+       test 0.5k
+     - Sentiment
+   * - `google-research-datasets/mbpp <https://huggingface.co/datasets/google-research-datasets/mbpp>`_
+     - full 974; sanitized 427
+     - General
+   * - `openai/openai_humaneval <https://huggingface.co/datasets/openai/openai_humaneval>`_
+     - test 164
+     - General
+   * - `lukaemon/mmlu <https://huggingface.co/datasets/lukaemon/mmlu>`_
+     - varies
+     - General
+   * - `takala/financial_phrasebank <https://huggingface.co/datasets/takala/financial_phrasebank>`_
+     - train 4.8k
+     - Financial
+   * - `pauri32/fiqa-2018 <https://huggingface.co/datasets/pauri32/fiqa-2018>`_
+     - train 0.9k; validation 0.1k; test 0.2k
+     - Financial
+   * - `zeroshot/twitter-financial-news-sentiment <https://huggingface.co/datasets/zeroshot/twitter-financial-news-sentiment>`_
+     - train 9.5k; validation 2.4k
+     - Financial
+   * - `bigbio/pubmed_qa <https://huggingface.co/datasets/bigbio/pubmed_qa>`_
+     - train 2M; validation 11k
+     - Medical
+   * - `openlifescienceai/medmcqa <https://huggingface.co/datasets/openlifescienceai/medmcqa>`_
+     - train 183k; validation 4.3k; test 6.2k
+     - Medical
+   * - `bigbio/med_qa <https://huggingface.co/datasets/bigbio/med_qa>`_
+     - train 10.1k; test 1.3k; validation 1.3k
+     - Medical