Update the dataset creation docs
adam-narozniak committed Nov 23, 2023
1 parent 1f9fa75 commit de87d99
Showing 1 changed file with 4 additions and 4 deletions.
8 changes: 4 additions & 4 deletions datasets/doc/source/how-to-use-with-pytorch.rst
@@ -10,7 +10,7 @@ Standard setup - download the dataset, choose the partitioning::
partition = fds.load_partition(0, "train")
centralized_dataset = fds.load_full("test")

-Determine the names of our features (you can alternatively do that directly on the Hugging Face website). The name can
+Determine the names of the features (you can alternatively do that directly on the Hugging Face website). The name can
vary e.g. "img" or "image", "label" or "labels"::

partition.features
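
As a rough end-to-end sketch of this setup (the ``FederatedDataset`` construction sits above this hunk, so the dataset name and partitioner count below are assumptions, not part of the change)::

    from flwr_datasets import FederatedDataset

    # Assumed example: CIFAR-10 split into 10 IID partitions.
    fds = FederatedDataset(dataset="cifar10", partitioners={"train": 10})
    partition = fds.load_partition(0, "train")
    centralized_dataset = fds.load_full("test")

    # Inspect the feature names, e.g. to see whether the image column is "img" or "image".
    partition.features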
@@ -38,7 +38,7 @@ That is why we iterate over all the samples from this batch and apply our transf
return batch

partition_torch = partition.with_transform(apply_transforms)
-# At this point, you can check if you didn't make any mistakes by calling partition_torch[0]
+# Now, you can check if you didn't make any mistakes by calling partition_torch[0]
dataloader = DataLoader(partition_torch, batch_size=64)
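
For reference, the full transform wiring that this hunk belongs to might look roughly like the following, assuming the image feature is named "img" and a standard torchvision pipeline (the actual imports and transforms are defined above this hunk)::

    from torch.utils.data import DataLoader
    from torchvision.transforms import Compose, Normalize, ToTensor

    # Assumed transform pipeline; the real one is defined earlier in the document.
    pytorch_transforms = Compose([ToTensor(), Normalize((0.5, 0.5, 0.5), (0.5, 0.5, 0.5))])

    def apply_transforms(batch):
        # The batch is a dict of lists, so transform each image individually.
        batch["img"] = [pytorch_transforms(img) for img in batch["img"]]
        return batch

    partition_torch = partition.with_transform(apply_transforms)
    dataloader = DataLoader(partition_torch, batch_size=64)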


@@ -70,8 +70,8 @@ If you want to divide the dataset, you can use (at any point before passing the
Or you can simply calculate the indices yourself::

partition_len = len(partition)
-partition_train = partition[:int(0.8 * partition_len)]
-partition_test = partition[int(0.8 * partition_len):]
+partition_train = partition.select(range(int(0.8 * partition_len)))
+partition_test = partition.select(range(int(0.8 * partition_len), partition_len))

And during the training loop, you need to apply one change. With a typical dataloader, you get a list returned for each iteration::

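The code block that follows this sentence is outside the hunk, but the change being described amounts to indexing into a dictionary instead of unpacking a list: with ``with_transform``, each batch from the dataloader is a dict keyed by feature name (the "img" and "label" names below are assumptions)::

    for batch in dataloader:
        # Each batch is a dict keyed by feature name, not an (inputs, labels) list.
        images, labels = batch["img"], batch["label"]
        # ... forward pass, loss, and backward pass as usual ...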
