Turn your WhatsApp conversations into training data for a large language model
- Export your WhatsApp conversation from the app and put it in the `data` directory with the filename `_chat.txt` (this is the default export).
- Run `python3 src/clean.py` to clean the data and save it to `data/cleaned.txt`.
- Then, to make your Hugging Face dataset, run `python3 src/make_dataset.py`. This will save the dataset to `data/chat_dataset`.
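To give a rough idea of what the cleaning step involves, here is a minimal sketch. It is not the code in `src/clean.py`; the function name `clean_lines` and the regular expression are hypothetical. It assumes export lines of the form `[date, time] Sender: message`, which varies by platform and locale.

```python
import re

# Assumed export line format: "[31/12/2021, 22:15:30] Alice: Hello".
# Real exports differ by platform and locale, so adjust the pattern as needed.
LINE_RE = re.compile(r"^\[(?P<stamp>[^\]]+)\] (?P<sender>[^:]+): (?P<text>.*)$")


def clean_lines(lines):
    """Yield 'sender: text' strings with timestamps stripped.

    Lines that do not start a new message are treated as continuations
    of the previous message and joined to it with a space.
    """
    current = None
    for raw in lines:
        # Drop the invisible left-to-right mark that some exports contain.
        line = raw.replace("\u200e", "").rstrip("\n")
        match = LINE_RE.match(line)
        if match:
            if current is not None:
                yield current
            current = f"{match['sender']}: {match['text']}"
        elif current is not None and line.strip():
            current += " " + line.strip()
    if current is not None:
        yield current
```

For example, you could pass it an open file handle for `data/_chat.txt` (opened with `encoding="utf-8"`) and write each yielded string as a line of `data/cleaned.txt`.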