
speakleash/speakleash-extractor-example


General Information

We know you're a professional, but here is some information you might find useful.

  1. Use a virtual environment (venv) and install the packages listed in the requirements.txt file.
  2. Use the lm-dataformat library to create the resulting jsonl.zst file.
  3. Don't forget the metadata for each document (characters, sentences, words, verbs, nouns, punctuation marks, symbols) or the manifest file, which is the most important record of where the data comes from and what rights apply to it. If in doubt, ask on the SpeakLeash Discord.
  4. The data must be shuffled.
  5. Always add a Usage section to the README.md file, with a few sentences on how to use the tool.
  6. For processing large files, we recommend using tqdm (progress bars), threading, and periodically saving state so that an interrupted run can be resumed.
  7. Have fun!
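As a rough illustration of the per-document metadata in item 3, the sketch below computes the simpler counts with the standard library only. The function name `basic_metadata`, the key names (chosen to mirror the metadata list), the regex-based word and sentence splitting, and the use of Unicode categories P* and S* for punctuation and symbols are all assumptions, not SpeakLeash's official definitions. Counting verbs and nouns needs a part-of-speech tagger (for example a spaCy pipeline for the source language); here that is a hypothetical `pos_tagger` callable returning one Universal POS tag per token.

```python
import re
import unicodedata
from typing import Callable, Dict, List, Optional


# Hypothetical helper: key names mirror the metadata list (assumption).
def basic_metadata(
    text: str,
    pos_tagger: Optional[Callable[[str], List[str]]] = None,
) -> Dict[str, int]:
    # Naive word split: runs of Unicode word characters (letters, digits, underscore).
    words = re.findall(r"\w+", text)
    # Naive sentence split: break after ".", "!" or "?" followed by whitespace.
    sentences = [s for s in re.split(r"(?<=[.!?])\s+", text.strip()) if s]
    # Unicode general categories: P* = punctuation, S* = symbols (math, currency, ...).
    punctuations = sum(1 for ch in text if unicodedata.category(ch).startswith("P"))
    symbols = sum(1 for ch in text if unicodedata.category(ch).startswith("S"))
    meta = {
        "characters": len(text),
        "sentences": len(sentences),
        "words": len(words),
        "punctuations": punctuations,
        "symbols": symbols,
    }
    # Verbs and nouns need a POS tagger; they are only added when one is supplied.
    if pos_tagger is not None:
        tags = pos_tagger(text)
        meta["verbs"] = tags.count("VERB")
        meta["nouns"] = tags.count("NOUN")
    return meta


print(basic_metadata("Ala ma kota. Kot ma Alę!"))
```

Real corpora will need a proper tokenizer and sentence splitter for the language in question; the regexes here are only placeholders.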
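Items 2 and 4 (shuffle the data, then write it with lm-dataformat) could be combined roughly as follows. This is a minimal sketch assuming the whole corpus fits in memory as `(text, meta)` pairs; `shuffled` and `write_archive` are hypothetical names, and the fixed seed is an assumption made so that runs are reproducible. The writer uses lm-dataformat's `Archive`, `add_data(text, meta=...)` and `commit()` as shown in that library's README; the import is deferred into the function so the shuffling part works without the library installed.

```python
import random
from typing import Dict, Iterable, List, Tuple

# One document: its text plus the metadata dict computed for it.
Doc = Tuple[str, Dict]


def shuffled(docs: Iterable[Doc], seed: int = 42) -> List[Doc]:
    """Return a new list with the documents in random order (seeded, so repeatable)."""
    items = list(docs)  # copy, so the caller's sequence is left untouched
    random.Random(seed).shuffle(items)
    return items


def write_archive(docs: Iterable[Doc], out_dir: str) -> None:
    """Shuffle the documents and write them to a .jsonl.zst archive in out_dir."""
    # Third-party dependency (pip install lm-dataformat); imported lazily.
    from lm_dataformat import Archive

    ar = Archive(out_dir)
    for text, meta in shuffled(docs):
        ar.add_data(text, meta=meta)
    # The archive file is only written once commit() is called.
    ar.commit()
```

With lm-dataformat installed, calling `write_archive(docs, "output")` should leave a `.jsonl.zst` file in `output/`. For corpora that do not fit in memory, a different strategy (for example shuffling in chunks) would be needed.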
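For the resume function in item 6, one simple approach is to checkpoint how many items have been processed into a small JSON state file and skip that many on restart. This sketch rests on assumptions: the input is a list processed in a fixed order, the function names and state-file format are hypothetical, and items handled after the last checkpoint are processed again after a crash (at-least-once), so `handle` should tolerate repeats. tqdm and threading are left out to keep it dependency-free; wrapping the range in `tqdm(..., initial=done, total=len(items))` would add a progress bar that starts at the resumed position.

```python
import json
import os
from typing import Callable, List


def load_state(path: str) -> int:
    """Return the number of items already processed (0 if there is no state file yet)."""
    if not os.path.exists(path):
        return 0
    with open(path, encoding="utf-8") as f:
        return json.load(f)["processed"]


def save_state(path: str, processed: int) -> None:
    """Write the progress counter atomically, so a crash never leaves a half-written file."""
    tmp = path + ".tmp"
    with open(tmp, "w", encoding="utf-8") as f:
        json.dump({"processed": processed}, f)
    os.replace(tmp, path)  # atomic rename over the old state file


def process_resumable(
    items: List[str],
    handle: Callable[[str], None],
    state_path: str,
    every: int = 100,
) -> int:
    """Process items from the last checkpoint on; return how many were handled in this run."""
    done = load_state(state_path)
    for i in range(done, len(items)):
        handle(items[i])
        # Checkpoint every `every` items to limit how much work is redone after a crash.
        if (i + 1) % every == 0:
            save_state(state_path, i + 1)
    save_state(state_path, len(items))
    return len(items) - done
```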

Usage

To run the example:

python main.py
