
Turn your WhatsApp conversations into training data for a large language model.


sumukshashidhar-archive/whatsapp-to-train


whatsapp-to-train


Usage

Data Collection

  1. Export your WhatsApp conversation from the app and place it in the data directory with the filename _chat.txt (the default name of the exported chat file).
  2. Run python3 src/clean.py to clean the data. The cleaned text is saved to data/cleaned.txt.
  3. To build the Hugging Face dataset, run python3 src/make_dataset.py. The dataset is saved to data/chat_dataset.
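The cleaning and pairing steps above can be sketched as follows. This is a minimal illustration, not the repository's src/clean.py or src/make_dataset.py. It assumes the iOS-style export line format "[DD/MM/YYYY, HH:MM:SS] Sender: text" (Android exports and other locales use different formats), and the names parse_chat, to_pairs, SKIP_MARKERS and the prompt/response pairing scheme are hypothetical. The sketch joins multi-line messages, drops media placeholders and the encryption notice, merges consecutive messages from one sender into a single turn, and pairs each turn from the other person with your reply.

```python
import re
from dataclasses import dataclass

# Assumed iOS-style export line: "[12/01/2023, 10:15:32] Alice: Hello there".
# Android and other locales use different formats; adjust the pattern as needed.
LINE_RE = re.compile(
    r"^\[(?P<date>[^,\]]+), (?P<time>[^\]]+)\] (?P<sender>[^:]+): (?P<text>.*)$"
)

# Hypothetical placeholders to drop; exact wording depends on app language and version.
SKIP_MARKERS = (
    "image omitted",
    "video omitted",
    "sticker omitted",
    "<attached:",
    "Messages and calls are end-to-end encrypted",
)


@dataclass
class Message:
    sender: str
    text: str


def parse_chat(raw: str) -> list[Message]:
    """Parse an exported chat into messages, joining multi-line messages."""
    messages: list[Message] = []
    for line in raw.splitlines():
        line = line.replace("\u200e", "").strip()  # drop invisible left-to-right marks
        if not line:
            continue
        match = LINE_RE.match(line)
        if match:
            messages.append(Message(match["sender"].strip(), match["text"].strip()))
        elif messages:
            # A line without a timestamp continues the previous message.
            messages[-1].text += "\n" + line
    # Remove media placeholders and system notices after joining continuation lines.
    return [m for m in messages if not any(mk in m.text for mk in SKIP_MARKERS)]


def to_pairs(messages: list[Message], me: str) -> list[dict[str, str]]:
    """Merge consecutive messages per sender, then pair each other-person turn with my reply."""
    turns: list[Message] = []
    for m in messages:
        if turns and turns[-1].sender == m.sender:
            turns[-1] = Message(m.sender, turns[-1].text + "\n" + m.text)
        else:
            turns.append(Message(m.sender, m.text))
    return [
        {"prompt": prev.text, "response": cur.text}
        for prev, cur in zip(turns, turns[1:])
        if cur.sender == me and prev.sender != me
    ]
```

For example, to_pairs(parse_chat(open("data/_chat.txt", encoding="utf-8").read()), me="Your Name") returns a list of prompt/response dictionaries. One possible way to turn that list into a Hugging Face dataset is datasets.Dataset.from_list(pairs).save_to_disk("data/chat_dataset"); this is an assumption about the approach, not necessarily what src/make_dataset.py does.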

Training
