Update README.md
tmbdev authored Apr 29, 2021
1 parent 3099b46 commit 44c602a
Showing 1 changed file with 2 additions and 2 deletions.
4 changes: 2 additions & 2 deletions README.md
@@ -39,10 +39,10 @@ Since WebDatasets are just tar files, you can use many different tools to create
If your data is already laid out like that on the file system, you can use `tar --sorted`:

```Shell
-$ tar --sorted name -cf - dataset > dataset.tar
+$ tar --sort=name -cf - dataset > dataset.tar
```
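A minimal sketch of the corrected command in context. It assumes GNU tar (the `--sort` option appeared in version 1.28) and the layout described earlier in the README, in which all files belonging to one sample share a basename. The directory and file names (`dataset/sample0000.jpg`, `dataset/sample0000.cls`, ...) are hypothetical.

```Shell
#!/bin/sh
# Sketch: build a tiny dataset directory (hypothetical names) and pack it
# with GNU tar so that the files of each sample end up adjacent.
set -e

mkdir -p dataset
# Each sample is a group of files sharing a basename (assumed layout).
# Files are created out of order on purpose; --sort=name fixes the order.
for i in 0002 0000 0001; do
  printf 'image bytes %s\n' "$i" > "dataset/sample$i.jpg"
  printf '%s\n' "$i" > "dataset/sample$i.cls"
done

# --sort=name orders directory entries by name instead of the arbitrary
# order in which the file system returns them.
tar --sort=name -cf - dataset > dataset.tar

# List the archive members; each sample's .cls and .jpg appear together.
tar -tf dataset.tar
```

Listing the archive should show `dataset/sample0000.cls` directly followed by `dataset/sample0000.jpg`, and so on; this adjacency is what lets a sequential reader group consecutive files into samples.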

-You can also use the `tarp create` command (at [github.com/tmbdev/tarp](http://github.com/tmbdev/tarp)) with a recipe file.
+You can also use the `tarp create` command (at [github.com/tmbdev/tarp](http://github.com/tmbdev/tarp)) with a recipe file, use `tarp split` to split large datasets into multiple shards, and `tarp shuffle` to shuffle datasets.

And you can use Python or Julia scripts to write such files directly. For example, [makeshards.py](https://github.com/tmbdev/webdataset-lightning/blob/main/makeshards.py) uses some existing PyTorch code to quickly convert Imagenet data into sharded tar files.
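For splitting into shards without `tarp`, a rough sketch using only GNU coreutils and GNU tar follows. It is a swapped-in technique, not how `tarp split` or `makeshards.py` work. The shard size, the `shard-NNNNNN.tar` naming, and the rule that a sample's key is everything before the first dot in its file name are assumptions for illustration.

```Shell
#!/bin/sh
# Sketch: split a directory of samples into fixed-size tar shards with
# GNU coreutils and GNU tar (an alternative to `tarp split`, not its code).
set -e

# Hypothetical example data: 5 samples, each a .cls and a .jpg file.
mkdir -p dataset
for i in 0000 0001 0002 0003 0004; do
  printf 'image bytes %s\n' "$i" > "dataset/sample$i.jpg"
  printf '%s\n' "$i" > "dataset/sample$i.cls"
done

samples_per_shard=2   # hypothetical shard size

# One key per sample: file name up to its first dot (assumed convention),
# deduplicated and sorted in byte order.
ls dataset | sed 's/\..*$//' | LC_ALL=C sort -u > keys.txt

# Cut the key list into chunks of $samples_per_shard lines:
# keys-000000, keys-000001, ...
split -l "$samples_per_shard" -d -a 6 keys.txt keys-

for chunk in keys-*; do
  shard="shard-${chunk#keys-}.tar"
  # Expand every key to all of its files so a sample never straddles shards.
  while read -r key; do
    for f in dataset/"$key".*; do
      [ -e "$f" ] || continue   # skip an unmatched glob pattern
      printf '%s\n' "$f"
    done
  done < "$chunk" > files.txt
  # -T reads the member list from files.txt, preserving its order.
  tar -cf "$shard" -T files.txt
done

ls shard-*.tar
```

With 5 samples and 2 samples per shard this produces three shards, the last holding a single sample. Splitting on sample keys rather than on individual files is the design choice that keeps every sample intact within one shard.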
