fix caching
ccurme committed Oct 23, 2024
1 parent 10a613b commit 2ab40c5
Showing 6 changed files with 7 additions and 11 deletions.

Large diffs are not rendered by default.

8 changes: 2 additions & 6 deletions docs/scripts/cache_data.py
@@ -1,16 +1,12 @@
-import nltk
 import tiktoken
+from unstructured.nlp.tokenize import download_nltk_packages


 def download_tiktoken_data():
     # This will trigger the download and caching of the necessary files
     _ = tiktoken.encoding_for_model("gpt-3.5-turbo")


-def download_nltk_data():
-    nltk.download("punkt")
-
-
 if __name__ == "__main__":
     download_tiktoken_data()
-    download_nltk_data()
+    download_nltk_packages()
