Commit

Merge branch 'main' into fds-add-info-about-features
jafermarq authored Nov 23, 2023
2 parents 78ee0a0 + 3e07f97 commit d7b81b5
Showing 9 changed files with 183 additions and 40 deletions.
30 changes: 30 additions & 0 deletions datasets/dev/build-flwr-datasets-docs.sh
@@ -0,0 +1,30 @@
#!/bin/bash
# Generate the docs, then rename and move the files so that they meet the convention used in Flower.
# Note that this requires two runs of sphinx-build.
# The first run generates the .rst files (and HTML files, which are discarded).
# The second run happens after the files are renamed and moved to the correct place; it generates the final HTML files.

set -e

cd "$( cd "$( dirname "${BASH_SOURCE[0]}" )" >/dev/null 2>&1 && pwd )"/../doc

# Remove the old docs from source/ref-api
REF_API_DIR="source/ref-api"
if [[ -d "$REF_API_DIR" ]]; then

echo "Removing ${REF_API_DIR}"
rm -r ${REF_API_DIR}
fi

# Remove the old html files
if [[ -d build ]]; then
  echo "Removing ./build"
  rm -r build
fi

# Docs generation: generate the new .rst files.
# Generation starts at the __init__ in the main directory and recursively documents the
# classes/modules/packages listed in __all__.
# Note that if a package cannot be reached via this recursive traversal, it won't be documented, even if it has an __all__.
echo "Generating the docs based only on the functionality listed in __all__."
sphinx-build -M html source build
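
To illustrate the traversal (a hypothetical, minimal sketch; the real flwr_datasets/__init__.py re-exports more than this), only names re-exported through __all__ are picked up:

    # flwr_datasets/__init__.py (illustrative sketch, not the actual file)
    from flwr_datasets.federated_dataset import FederatedDataset

    # The recursive autosummary traversal documents only the names listed here;
    # anything not re-exported via __all__ is skipped.
    __all__ = ["FederatedDataset"]
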
33 changes: 33 additions & 0 deletions datasets/doc/source/_templates/autosummary/class.rst
@@ -0,0 +1,33 @@
{{ name | escape | underline}}

.. currentmodule:: {{ module }}

.. autoclass:: {{ objname }}
   :members:
   :show-inheritance:
   :inherited-members:

   {% block methods %}

   {% if methods %}
   .. rubric:: {{ _('Methods') }}

   .. autosummary::
   {% for item in methods %}
   {% if item != "__init__" %}
      ~{{ name }}.{{ item }}
   {% endif %}
   {%- endfor %}
   {% endif %}
   {% endblock %}

   {% block attributes %}
   {% if attributes %}
   .. rubric:: {{ _('Attributes') }}

   .. autosummary::
   {% for item in attributes %}
      ~{{ name }}.{{ item }}
   {%- endfor %}
   {% endif %}
   {% endblock %}
66 changes: 66 additions & 0 deletions datasets/doc/source/_templates/autosummary/module.rst
@@ -0,0 +1,66 @@
{{ name | escape | underline}}

.. automodule:: {{ fullname }}

   {% block attributes %}
   {% if attributes %}
   .. rubric:: Module Attributes

   .. autosummary::
      :toctree:
   {% for item in attributes %}
      {{ item }}
   {%- endfor %}
   {% endif %}
   {% endblock %}

   {% block functions %}
   {% if functions %}
   .. rubric:: {{ _('Functions') }}

   .. autosummary::
      :toctree:
   {% for item in functions %}
      {{ item }}
   {%- endfor %}
   {% endif %}
   {% endblock %}

   {% block classes %}
   {% if classes %}
   .. rubric:: {{ _('Classes') }}

   .. autosummary::
      :toctree:
      :template: autosummary/class.rst
   {% for item in classes %}
      {{ item }}
   {%- endfor %}
   {% endif %}
   {% endblock %}

   {% block exceptions %}
   {% if exceptions %}
   .. rubric:: {{ _('Exceptions') }}

   .. autosummary::
      :toctree:
   {% for item in exceptions %}
      {{ item }}
   {%- endfor %}
   {% endif %}
   {% endblock %}

{% block modules %}
{% if modules %}
.. rubric:: Modules

.. autosummary::
   :toctree:
   :template: autosummary/module.rst
   :recursive:
{% for item in modules %}
   {{ item }}
{%- endfor %}
{% endif %}
{% endblock %}
34 changes: 34 additions & 0 deletions datasets/doc/source/conf.py
Original file line number Diff line number Diff line change
@@ -61,8 +61,42 @@
"nbsphinx",
]

# Generate .rst files
autosummary_generate = True

# Document ONLY the objects listed in __all__ (present in the __init__ files).
# This is done recursively, starting from flwr_datasets.__init__,
# and is controlled in the index.rst file.
autosummary_ignore_module_all = False

# By default, each class and function page title starts with the full module path.
# Make flwr_datasets.federated_dataset.FederatedDataset appear as FederatedDataset.
# The full name is still shown at the top of the page.
add_module_names = False

def find_test_modules(package_path):
    """Collect the dotted module paths of every *_test.py file (so they can be excluded)."""
    full_path_modules = []
    for root, dirs, files in os.walk(package_path):
        for file in files:
            if file.endswith('_test.py'):
                # Construct the module path relative to the package directory
                full_path = os.path.join(root, file)
                relative_path = os.path.relpath(full_path, package_path)
                # Convert the file path to a dotted module path
                module_path = os.path.splitext(relative_path)[0].replace(os.sep, '.')
                full_path_modules.append(module_path)
    # Also record every suffix of each module path (e.g. a.b.c, b.c, c),
    # since a test module may be referenced by any of these names.
    modules = []
    for full_path_module in full_path_modules:
        parts = full_path_module.split('.')
        for i in range(len(parts)):
            modules.append('.'.join(parts[i:]))
    return modules

# Stop autosummary from documenting the *_test.py files.
# Mocking them via autodoc_mock_imports is the only way to exclude modules with autosummary.
autodoc_mock_imports = find_test_modules(os.path.abspath("../../"))

# Add any paths that contain templates here, relative to this directory.
templates_path = ["_templates"]

10 changes: 6 additions & 4 deletions datasets/doc/source/how-to-use-with-pytorch.rst
@@ -10,7 +10,7 @@ Standard setup - download the dataset, choose the partitioning::
partition = fds.load_partition(0, "train")
centralized_dataset = fds.load_full("test")

Determine the names of our features (you can alternatively do that directly on the Hugging Face website). The name can
Determine the names of the features (you can alternatively do that directly on the Hugging Face website). The name can
vary, e.g. "img" or "image", "label" or "labels"::

partition.features
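
For a typical image dataset, the output looks roughly like this (an illustrative sketch only; the exact features depend on the dataset you load)::

    {'img': Image(decode=True, id=None),
     'label': ClassLabel(names=['airplane', 'automobile', ...], id=None)}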
@@ -38,7 +38,7 @@ That is why we iterate over all the samples from this batch and apply our transf
return batch

partition_torch = partition.with_transform(apply_transforms)
# At this point, you can check if you didn't make any mistakes by calling partition_torch[0]
# Now, you can verify that the transforms work by calling partition_torch[0]
dataloader = DataLoader(partition_torch, batch_size=64)


@@ -70,8 +70,10 @@ If you want to divide the dataset, you can use (at any point before passing the
Or you can simply calculate the indices yourself::

partition_len = len(partition)
partition_train = partition[:int(0.8 * partition_len)]
partition_test = partition[int(0.8 * partition_len):]
# Split `partition` 80:20
num_train_examples = int(0.8 * partition_len)
partition_train = partition.select(range(num_train_examples))  # use first 80%
partition_test = partition.select(range(num_train_examples, partition_len))  # use last 20%
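
The same 80:20 split can also be produced in a single call (a brief aside: `train_test_split` comes from the underlying Hugging Face `datasets.Dataset` API that partitions expose, not from Flower Datasets itself)::

    # Returns a DatasetDict with "train" and "test" keys.
    # Note: unlike the index-based split above, this shuffles by default
    # (pass shuffle=False to keep the original order).
    partition_splits = partition.train_test_split(test_size=0.2)
    partition_train = partition_splits["train"]
    partition_test = partition_splits["test"]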

And during the training loop, you need to apply one change. With a typical dataloader, you get a list returned for each iteration::
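
    # A hedged sketch of the snippet collapsed in this diff view: with a
    # plain (images, labels) dataset you would unpack a list, whereas
    # with_transform yields dict batches, so you index by column name.
    for batch in dataloader:
        images, labels = batch["img"], batch["label"]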

10 changes: 7 additions & 3 deletions datasets/doc/source/index.rst
@@ -38,11 +38,15 @@ References

Information-oriented API reference and other reference material.

.. toctree::
   :maxdepth: 2
.. autosummary::
   :toctree: ref-api
   :template: autosummary/module.rst
   :caption: API reference
   :recursive:

   flwr_datasets


   ref-api-flwr-datasets

Main features
-------------
27 changes: 0 additions & 27 deletions datasets/doc/source/ref-api-flwr-datasets.rst

This file was deleted.

10 changes: 6 additions & 4 deletions datasets/doc/source/tutorial-quickstart.rst
@@ -70,15 +70,17 @@ For more detailed instructions, go to :doc:`how-to-use-with-pytorch`, :doc:`how-

PyTorch DataLoader
^^^^^^^^^^^^^^^^^^
Transform the Dataset directly into the DataLoader::
Transform the Dataset into the DataLoader, using PyTorch transforms (`Compose` and all the others also work)::

from torch.utils.data import DataLoader
from torchvision.transforms import ToTensor

transforms = ToTensor()
partition_torch = partition.map(
    lambda img: {"img": transforms(img)}, input_columns="img"
).with_format("torch")
def apply_transforms(batch):
    batch["img"] = [transforms(img) for img in batch["img"]]
    return batch
partition_torch = partition.with_transform(apply_transforms)
dataloader = DataLoader(partition_torch, batch_size=64)
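
To sanity-check the pipeline, iterate once over the DataLoader (a minimal sketch; as noted in the PyTorch how-to, the label column may be called "label" or "labels" depending on the dataset)::

    # Each batch is a dict of tensors keyed by column name
    for batch in dataloader:
        images, labels = batch["img"], batch["label"]
        print(images.shape, labels.shape)
        break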

NumPy
3 changes: 1 addition & 2 deletions dev/build-docs.sh
@@ -13,8 +13,7 @@ cd examples/doc
make docs

cd $ROOT
cd datasets/doc
make docs
./datasets/dev/build-flwr-datasets-docs.sh

cd $ROOT
cd doc
