Generative_replay

Generative_replay for simulating infant language acquisition

Follow these steps to train the synthetic-data-augmented LM:

1. Prepare your data.
2. Train your LM (or download a checkpoint and dict.txt).
3. Save the per-chunk perplexity (ppl) to a datastore.
4. Build the vector database from the training set.
5. Build the FAISS index (prefix-only and prefix+generation).
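
Step 3 can be illustrated with a minimal sketch. It assumes you already have per-token natural-log probabilities from the LM, and it defines chunk perplexity as exp(-mean log-probability). The function names, the fixed chunk size, and the JSON record layout are all hypothetical; the scripts in this repository may chunk the data and store the values differently.

```python
import json
import math
from typing import Dict, List, Union


def chunk_perplexities(token_logprobs: List[float], chunk_size: int) -> List[float]:
    """Split per-token natural-log probabilities into fixed-size chunks
    and return the perplexity of each chunk: exp(-mean(logprob))."""
    if chunk_size <= 0:
        raise ValueError("chunk_size must be positive")
    ppls = []
    for start in range(0, len(token_logprobs), chunk_size):
        chunk = token_logprobs[start:start + chunk_size]  # last chunk may be shorter
        ppls.append(math.exp(-sum(chunk) / len(chunk)))
    return ppls


def to_records(ppls: List[float]) -> List[Dict[str, Union[int, float]]]:
    """Pair each chunk perplexity with its chunk index (hypothetical layout)."""
    return [{"chunk_id": i, "ppl": p} for i, p in enumerate(ppls)]


def save_datastore(ppls: List[float], path: str) -> None:
    """Write the records to a JSON file (an assumed format, not the repo's)."""
    with open(path, "w", encoding="utf-8") as f:
        json.dump(to_records(ppls), f)
```

For example, `chunk_perplexities(logprobs, chunk_size=128)` followed by `save_datastore(ppls, "chunk_ppl.json")` would produce one record per chunk; both the chunk size and the file name here are placeholders.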

The generative_replay model is divided into two parts:

Word LM: uses a standard BPE tokenizer; tested on BLiMP.
Char LM: tested on Machine-CDI; see [https://github.com/Jing-L97/Lexical-benchmark]
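
To make the difference between the two parts concrete, here is a minimal sketch of character-level tokenization, the kind of input a char LM consumes, as opposed to the BPE subword units used by the word LM. This is an illustration only: the function names and the `<unk>` symbol are hypothetical, and the repository's actual vocabulary comes from its own dict.txt.

```python
from typing import Dict, List


def build_char_vocab(corpus: List[str]) -> Dict[str, int]:
    """Map every character seen in the corpus to an integer id.
    Id 0 is reserved for unknown characters."""
    vocab = {"<unk>": 0}
    for ch in sorted({ch for line in corpus for ch in line}):
        vocab[ch] = len(vocab)
    return vocab


def encode(text: str, vocab: Dict[str, int]) -> List[int]:
    """Turn a string into a list of character ids."""
    return [vocab.get(ch, vocab["<unk>"]) for ch in text]


def decode(ids: List[int], vocab: Dict[str, int]) -> str:
    """Invert encode(); unknown ids are rendered as '<unk>'."""
    inverse = {i: ch for ch, i in vocab.items()}
    return "".join(inverse.get(i, "<unk>") for i in ids)
```

With a vocabulary built from `["the dog"]`, encoding and then decoding `"the dog"` returns the original string, while a character not in the corpus maps to id 0.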

