Add information how to see the features of the dataset #2627

Merged 4 commits on Nov 23, 2023
Changes from 1 commit
datasets/doc/source/how-to-use-with-numpy.rst (18 changes: 17 additions & 1 deletion)
@@ -3,14 +3,30 @@ Use with NumPy

Let's integrate ``flwr-datasets`` with NumPy.

Prepare the desired partitioning::
Create a ``FederatedDataset``::

from flwr_datasets import FederatedDataset

fds = FederatedDataset(dataset="cifar10", partitioners={"train": 10})
partition = fds.load_partition(0, "train")
centralized_dataset = fds.load_full("test")
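
Before going further, you can sanity-check what was loaded. A small sketch (the values in the comments are illustrative; with CIFAR10 split into 10 partitions, each one holds 5,000 examples)::

    print(len(partition))  # number of examples in partition 0
    print(partition[0])    # a single example: a dict with an image and its label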

Determine the names of the features::

partition.features

In the case of CIFAR10, you should see the following output.

.. code-block:: none

{'img': Image(decode=True, id=None),
'label': ClassLabel(names=['airplane', 'automobile', 'bird', 'cat', 'deer', 'dog',
'frog', 'horse', 'ship', 'truck'], id=None)}
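
The ``features`` mapping is a regular Hugging Face ``datasets.Features`` object, so you can also query it in code. A short sketch assuming the CIFAR10 names shown above::

    feature_names = list(partition.features)          # ['img', 'label']
    class_names = partition.features["label"].names   # ['airplane', 'automobile', ...]
    # Map the integer label of the first example back to its class name
    print(partition.features["label"].int2str(partition[0]["label"]))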

Let's move to the transformations.

NumPy
-----
Transform to NumPy::

partition_np = partition.with_format("numpy")
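
As a brief usage sketch (assuming the ``img`` and ``label`` columns shown above), the columns can then be read as plain NumPy arrays::

    X, y = partition_np["img"], partition_np["label"]
    print(X.shape, X.dtype)  # e.g. (5000, 32, 32, 3) uint8 for CIFAR10
    print(y.shape)           # e.g. (5000,)
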
datasets/doc/source/how-to-use-with-tensorflow.rst (48 changes: 25 additions & 23 deletions)
@@ -1,10 +1,32 @@
Use with TensorFlow
===================

Let's integrate ``flwr-datasets`` with TensorFlow. We show you three ways how to convert the data into the formats
Let's integrate ``flwr-datasets`` with ``TensorFlow``. We show you three ways to convert the data into the formats
that ``TensorFlow``'s models expect. Please note that, especially for smaller datasets, the performance of the
following methods is very close. We recommend you choose the method you are most comfortable with.

Create a ``FederatedDataset``::

from flwr_datasets import FederatedDataset

fds = FederatedDataset(dataset="cifar10", partitioners={"train": 10})
partition = fds.load_partition(0, "train")
centralized_dataset = fds.load_full("test")

Determine the names of the features::

partition.features

In the case of CIFAR10, you should see the following output.

.. code-block:: none

{'img': Image(decode=True, id=None),
'label': ClassLabel(names=['airplane', 'automobile', 'bird', 'cat', 'deer', 'dog',
'frog', 'horse', 'ship', 'truck'], id=None)}

Let's move to the transformations.

NumPy
-----
The first way is to transform the data into NumPy arrays. It's an easier option that is commonly used. Feel free to
@@ -14,17 +36,7 @@ follow the :doc:`how-to-use-with-numpy` tutorial, especially if you are a beginn

TensorFlow Dataset
------------------
Work with ``TensorFlow Dataset`` abstraction.

Standard setup::

from flwr_datasets import FederatedDataset

fds = FederatedDataset(dataset="cifar10", partitioners={"train": 10})
partition = fds.load_partition(0, "train")
centralized_dataset = fds.load_full("test")

Transformation to the TensorFlow Dataset::
Transform the data to a ``TensorFlow Dataset``::

tf_dataset = partition.to_tf_dataset(columns="img", label_cols="label", batch_size=64,
shuffle=True)
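
As a usage sketch, the resulting ``tf_dataset`` can be passed straight to ``model.fit``. The small Keras model below is an illustrative assumption, not part of the tutorial; any model with a matching input shape works::

    import tensorflow as tf

    model = tf.keras.Sequential([
        tf.keras.layers.Rescaling(1.0 / 255, input_shape=(32, 32, 3)),  # scale uint8 images to [0, 1]
        tf.keras.layers.Conv2D(32, 3, activation="relu"),
        tf.keras.layers.MaxPooling2D(),
        tf.keras.layers.Flatten(),
        tf.keras.layers.Dense(10, activation="softmax"),
    ])
    model.compile(optimizer="adam",
                  loss="sparse_categorical_crossentropy",
                  metrics=["accuracy"])
    model.fit(tf_dataset, epochs=1)
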
@@ -33,17 +45,7 @@ Transformation to the TensorFlow Dataset::

TensorFlow Tensors
------------------
Change the data type to TensorFlow Tensors (it's not the TensorFlow dataset).

Standard setup::

from flwr_datasets import FederatedDataset

fds = FederatedDataset(dataset="cifar10", partitioners={"train": 10})
partition = fds.load_partition(0, "train")
centralized_dataset = fds.load_full("test")

Transformation to the TensorFlow Tensors ::
Transform the data to ``TensorFlow Tensors`` (note that this is not the TensorFlow Dataset)::

data_tf = partition.with_format("tf")
# Assuming you have defined your model and compiled it
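    # A hedged usage sketch (assumes a compiled Keras model, e.g. the one sketched in the
    # previous section, and the CIFAR10 column names shown above)
    model.fit(data_tf["img"], data_tf["label"], batch_size=64, epochs=1)
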
datasets/doc/source/tutorial-quickstart.rst (33 changes: 26 additions & 7 deletions)
@@ -5,7 +5,7 @@ Run Flower Datasets as fast as possible by learning only the essentials.

Install Federated Datasets
--------------------------
Run on the command line
On the command line, run

.. code-block:: bash

@@ -28,12 +27,11 @@ PyTorch
Choose the dataset
------------------
Choose the dataset by going to Hugging Face `Datasets Hub <https://huggingface.co/datasets>`_ and searching for your
dataset by name. Note that the name is case sensitive, so make sure to pass the correct name as the `dataset` parameter
to `FederatedDataset`.
dataset by the name that you will pass to the `dataset` parameter of `FederatedDataset`. Note that the name is case-sensitive.

Partition the dataset
---------------------
::
To partition your dataset in an IID manner, choose the split you want to partition and the number of partitions::

from flwr_datasets import FederatedDataset

@@ -42,12 +41,32 @@ Partition the dataset
centralized_dataset = fds.load_full("test")

Now you're ready to go. You have ten partitions created from the train split of the MNIST dataset and the test split
for the centralized evaluation. We will convert the type of the dataset from Hugging Face's Dataset type to the one
for the centralized evaluation. We will convert the type of the dataset from Hugging Face's `Dataset` type to the one
supported by your framework.
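
If you need every partition, e.g. to hand one to each of the ten simulated clients, a simple sketch using the same API is::

    partitions = [fds.load_partition(i, "train") for i in range(10)]
    print([len(p) for p in partitions])  # the size of each of the ten partitions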

Display the features
--------------------
Determine the names of the features of your dataset (you can alternatively do that directly on the Hugging Face
website). The names can vary across datasets, e.g. "img" or "image", "label" or "labels". You will also see
the names of the label categories. Type::

partition.features

In the case of CIFAR10, you should see the following output.

.. code-block:: none

{'img': Image(decode=True, id=None),
'label': ClassLabel(names=['airplane', 'automobile', 'bird', 'cat', 'deer', 'dog',
'frog', 'horse', 'ship', 'truck'], id=None)}

Note that the image is denoted by "img", which is crucial for the next steps (conversion to the ML
framework of your choice).
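
Because these names differ between datasets, one small, purely illustrative way to make the next steps dataset-agnostic is to resolve the column names first::

    image_col = "img" if "img" in partition.features else "image"
    label_col = "label" if "label" in partition.features else "labels"
    print(image_col, label_col)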

Conversion
----------
For more detailed instructions, go to :doc:`how-to-use-with-pytorch`.
For more detailed instructions, go to :doc:`how-to-use-with-pytorch`, :doc:`how-to-use-with-numpy`, or
:doc:`how-to-use-with-tensorflow`.

PyTorch DataLoader
^^^^^^^^^^^^^^^^^^
@@ -64,7 +83,7 @@ Transform the Dataset directly into the DataLoader::
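
As a hedged sketch (not necessarily the exact code in the tutorial), one common pattern is to wrap a partition in a PyTorch ``DataLoader``::

    from torch.utils.data import DataLoader

    partition_torch = partition.with_format("torch")
    dataloader = DataLoader(partition_torch, batch_size=64, shuffle=True)
    batch = next(iter(dataloader))  # a dict keyed by the dataset's column names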

NumPy
^^^^^
NumPy can be used as input to the TensorFlow model and is very straightforward::
NumPy arrays can be used as input to TensorFlow and scikit-learn models, and the conversion is very straightforward::

partition_np = partition.with_format("numpy")
X_train, y_train = partition_np["img"], partition_np["label"]
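
As a follow-up sketch, the arrays can also feed a scikit-learn estimator (the estimator choice here is an illustrative assumption)::

    from sklearn.linear_model import LogisticRegression

    X_flat = X_train.reshape(len(X_train), -1)  # flatten each image into a feature vector
    clf = LogisticRegression(max_iter=1000)     # more iterations to help convergence
    clf.fit(X_flat, y_train)
    print(clf.score(X_flat, y_train))           # training accuracy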