From 6a4ca13f95fceca6eb212b38c62a90fd3de7ee76 Mon Sep 17 00:00:00 2001 From: Adam Narozniak Date: Fri, 13 Oct 2023 14:32:59 +0200 Subject: [PATCH 1/3] Explain on the fly transforms for PyTorch --- .../doc/source/how-to-use-with-pytorch.rst | 28 ++++++++++++++++--- 1 file changed, 24 insertions(+), 4 deletions(-) diff --git a/datasets/doc/source/how-to-use-with-pytorch.rst b/datasets/doc/source/how-to-use-with-pytorch.rst index 5981f88c26b8..59dfcaa05b43 100644 --- a/datasets/doc/source/how-to-use-with-pytorch.rst +++ b/datasets/doc/source/how-to-use-with-pytorch.rst @@ -15,7 +15,7 @@ vary e.g. "img" or "image", "label" or "labels":: partition.features -In case of CIFAR10, you should see the following output +In case of CIFAR10, you should see the following output. .. code-block:: none @@ -23,9 +23,29 @@ In case of CIFAR10, you should see the following output 'label': ClassLabel(names=['airplane', 'automobile', 'bird', 'cat', 'deer', 'dog', 'frog', 'horse', 'ship', 'truck'], id=None)} -Apply Transforms, Create DataLoader. We will use the `map() `_ -function. Please note that the map will modify the existing dataset if the key in the dictionary you return is already present -and append a new feature if it did not exist before. Below, we modify the "img" feature of our dataset.:: + +Apply Transforms, Create DataLoader. We will use `Dataset.with_transform() `_. +It works on-the-fly, meaning the transforms you specified will be applied only when you access the data, which is also how the transforms work in the PyTorch ecosystem. +The last detail is to know that this function works on the batches of data (even if you select a single element, it is represented as a batch). 
+That is why we iterate over all the samples from this batch and apply our transforms:: + + from torch.utils.data import DataLoader + from torchvision.transforms import ToTensor + + transforms = ToTensor() + def apply_transforms(batch): + batch["img"] = [transforms(img) for img in batch["img"]] + return batch + + partition_torch = partition.with_transform(apply_transforms) + # At this point, you can check if you didn't make any mistakes by calling partition_torch[0] + dataloader = DataLoader(partition_torch, batch_size=64) + + +Alternatively, you can use the `map() `_ +function. Note that the operation is instant (contrary to the set_transform and with_transform). Remember that the map +will modify the existing dataset if the key in the dictionary you return is already present and append a new feature if +it did not exist before. Below, we modify the "img" feature of our dataset.:: from torch.utils.data import DataLoader from torchvision.transforms import ToTensor From 018e46eb731efaf7c7e30001744d3c832b9cd82d Mon Sep 17 00:00:00 2001 From: "Daniel J. Beutel" Date: Sat, 14 Oct 2023 12:22:48 +0200 Subject: [PATCH 2/3] Update datasets/doc/source/how-to-use-with-pytorch.rst --- datasets/doc/source/how-to-use-with-pytorch.rst | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-) diff --git a/datasets/doc/source/how-to-use-with-pytorch.rst b/datasets/doc/source/how-to-use-with-pytorch.rst index 59dfcaa05b43..cc1bffa6a75b 100644 --- a/datasets/doc/source/how-to-use-with-pytorch.rst +++ b/datasets/doc/source/how-to-use-with-pytorch.rst @@ -43,7 +43,7 @@ That is why we iterate over all the samples from this batch and apply our transf Alternatively, you can use the `map() `_ -function. Note that the operation is instant (contrary to the set_transform and with_transform). Remember that the map +function. Note that the operation is instant (contrary to the ``set_transform`` and ``with_transform``). 
Remember that the ``map`` will modify the existing dataset if the key in the dictionary you return is already present and append a new feature if it did not exist before. Below, we modify the "img" feature of our dataset.:: From 22f5bf6475a08e2ea0e85a8038bc37923dc5fe5a Mon Sep 17 00:00:00 2001 From: "Daniel J. Beutel" Date: Sat, 14 Oct 2023 12:22:52 +0200 Subject: [PATCH 3/3] Update datasets/doc/source/how-to-use-with-pytorch.rst --- datasets/doc/source/how-to-use-with-pytorch.rst | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-) diff --git a/datasets/doc/source/how-to-use-with-pytorch.rst b/datasets/doc/source/how-to-use-with-pytorch.rst index cc1bffa6a75b..497266dd1e69 100644 --- a/datasets/doc/source/how-to-use-with-pytorch.rst +++ b/datasets/doc/source/how-to-use-with-pytorch.rst @@ -45,7 +45,7 @@ That is why we iterate over all the samples from this batch and apply our transf Alternatively, you can use the `map() `_ function. Note that the operation is instant (contrary to the ``set_transform`` and ``with_transform``). Remember that the ``map`` will modify the existing dataset if the key in the dictionary you return is already present and append a new feature if -it did not exist before. Below, we modify the "img" feature of our dataset.:: +it did not exist before. Below, we modify the ``"img"`` feature of our dataset.:: from torch.utils.data import DataLoader from torchvision.transforms import ToTensor
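
The doc text added in this series makes two claims about ``with_transform``: nothing is computed until the data is accessed, and the transform always receives a batch, even when a single element is selected. The sketch below illustrates both with a toy, standard-library-only stand-in. ``LazyDataset`` and its column layout are hypothetical and are not the Hugging Face ``datasets`` implementation; the division by 255 is only a placeholder for ``ToTensor``.

```python
# Toy stand-in (hypothetical, NOT the Hugging Face `datasets` implementation)
# that mimics the on-the-fly, batch-wise behaviour described in the patch text.


class LazyDataset:
    """Column-oriented dataset that applies a transform only on access."""

    def __init__(self, columns, transform=None):
        self._columns = columns      # e.g. {"img": [...], "label": [...]}
        self._transform = transform  # callable taking and returning a batch dict

    def __len__(self):
        return len(next(iter(self._columns.values())))

    def with_transform(self, transform):
        # Returns a new view over the same columns; nothing is computed here.
        return LazyDataset(self._columns, transform)

    def __getitem__(self, index):
        # A single row is wrapped as a batch of size one before the
        # transform sees it, then unwrapped again for the caller.
        batch = {name: [values[index]] for name, values in self._columns.items()}
        if self._transform is not None:
            batch = self._transform(batch)
        return {name: values[0] for name, values in batch.items()}


calls = []  # records the batch size seen by each transform call


def apply_transforms(batch):
    calls.append(len(batch["img"]))
    # Placeholder for ToTensor(): scale each "pixel" to the [0, 1] range.
    batch["img"] = [pixel / 255 for pixel in batch["img"]]
    return batch


partition = LazyDataset({"img": [0, 51, 255], "label": [3, 1, 7]})
partition_lazy = partition.with_transform(apply_transforms)

calls_before_access = len(calls)  # still 0: with_transform did no work
first = partition_lazy[1]         # the transform runs now, on a batch of one
```

Because the transform iterates over ``batch["img"]``, the same function handles a single-element access and a larger batch alike, which is why the documented ``apply_transforms`` uses a list comprehension rather than transforming one image directly.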