

[Tutorials] Add a readme file for the TinyStories tutorial (#5)
This PR adds a short README.md file for the TinyStories tutorial, with
instructions on how to run it.

Signed-off-by: Mehran Maghoumi <[email protected]>
Maghoumi authored Mar 21, 2024
1 parent 83129e2 commit 8d47911
tutorials/tinystories/README.md (13 additions, 0 deletions)
@@ -0,0 +1,13 @@
# TinyStories

This tutorial demonstrates how to use NeMo Curator's Python API to curate the [TinyStories](https://arxiv.org/abs/2305.07759) dataset. TinyStories is a dataset of short stories generated by GPT-3.5 and GPT-4 that use only words understood by 3- to 4-year-olds. The small size of this dataset makes it ideal for creating and validating data curation pipelines on a local machine.

For simplicity, this tutorial uses the validation split of this dataset, which contains around 22,000 samples.
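
Before any curation step runs, the raw text has to be split into individual stories. The sketch below shows one way to do that in plain Python. It assumes the raw validation file separates stories with an `<|endoftext|>` marker; the function name `split_stories` is hypothetical and is not part of NeMo Curator's API.

```python
# Hypothetical helper, not part of NeMo Curator: split a raw TinyStories text
# dump into individual stories. Assumes stories are separated by the
# "<|endoftext|>" marker in the raw file.
SEPARATOR = "<|endoftext|>"


def split_stories(raw_text: str, separator: str = SEPARATOR) -> list[str]:
    """Return the non-empty, whitespace-trimmed stories found in raw_text."""
    stories = []
    for chunk in raw_text.split(separator):
        story = chunk.strip()
        if story:  # skip empty chunks, e.g. the one after a trailing separator
            stories.append(story)
    return stories


if __name__ == "__main__":
    sample = (
        "Once upon a time, there was a little cat.\n<|endoftext|>\n"
        "Tom liked to play with his red ball.\n<|endoftext|>\n"
    )
    for i, story in enumerate(split_stories(sample)):
        print(i, story)
```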

## Usage
After installing the NeMo Curator package, run the following command from the root of the repository:
```bash
python tutorials/tinystories/main.py
```

This downloads the validation split of the TinyStories dataset and runs the data curation pipeline on it.
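
Conceptually, a curation pipeline is a sequence of steps, each of which takes a stream of documents and either modifies them or drops some of them. The sketch below illustrates that idea in plain Python with three hypothetical steps: whitespace normalization, a word-count filter, and exact deduplication by content hash. It is only a stand-in to show the structure; NeMo Curator provides its own classes for these operations, and the step names and thresholds here are assumptions rather than what `main.py` actually uses.

```python
import hashlib
from typing import Callable, Iterable, Iterator

# Hypothetical pipeline structure (not NeMo Curator's API): a step takes an
# iterable of documents and yields the documents that survive it.
Step = Callable[[Iterable[str]], Iterator[str]]


def normalize_whitespace(docs: Iterable[str]) -> Iterator[str]:
    """Modify step: collapse runs of whitespace into single spaces."""
    for doc in docs:
        yield " ".join(doc.split())


def word_count_filter(min_words: int, max_words: int) -> Step:
    """Build a filter step keeping documents with min_words..max_words words."""
    def step(docs: Iterable[str]) -> Iterator[str]:
        for doc in docs:
            if min_words <= len(doc.split()) <= max_words:
                yield doc
    return step


def exact_dedup(docs: Iterable[str]) -> Iterator[str]:
    """Drop documents whose exact text was already seen; keep the first copy."""
    seen = set()
    for doc in docs:
        digest = hashlib.sha256(doc.encode("utf-8")).hexdigest()
        if digest not in seen:
            seen.add(digest)
            yield doc


def run_pipeline(docs: Iterable[str], steps: list[Step]) -> list[str]:
    """Apply each step in order, feeding one step's output into the next."""
    stream: Iterable[str] = docs
    for step in steps:
        stream = step(stream)
    return list(stream)


if __name__ == "__main__":
    stories = [
        "Once upon a time,   there was a little cat.",
        "Once upon a time, there was a little cat.",
        "Hi.",
        "Tom liked to play with his red ball in the park.",
    ]
    pipeline = [normalize_whitespace, word_count_filter(3, 100), exact_dedup]
    for story in run_pipeline(stories, pipeline):
        print(story)
```

Ordering matters here: normalizing whitespace before deduplicating lets the first two stories, which differ only in spacing, collapse into one, so only the cat story and the Tom story remain.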
