Project Leads:
- Amirali Danai ([email protected])
- Jason Yen ([email protected])
This repository contains the work done for the Next Word Prediction project at the Michigan Data Science Team during the Fall 2023 semester. The project's main objective was to leverage machine learning methods to reliably generate the next word, or next n words, given a sequence of text. Generating conversational text and using Transformer architectures were outside the scope of the project.
We explored several approaches to this task, using varied model architectures and datasets; they are enumerated below in the Next Word Training Models section. We also created front-end interfaces for getting predictions from the models, which are described in the Streamlit section.
Our approaches included an LSTM neural network and a simple n-gram model. Most of the project meetings focused on the LSTM.
Using a simple LSTM architecture, we trained a neural network on a Sherlock Holmes book. The model's predictions were generally poor, and after many rounds of training and hyperparameter tuning, we failed to get the training accuracy above 25%. This motivated our search for other datasets, to see whether we could achieve better performance.
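As a rough illustration of this approach, here is a minimal sketch of a next-word LSTM in Keras. The corpus, embedding dimension, hidden size, and epoch count are placeholders for illustration, not the repository's actual configuration:

```python
import numpy as np
from tensorflow.keras.preprocessing.text import Tokenizer
from tensorflow.keras.preprocessing.sequence import pad_sequences
from tensorflow.keras.models import Sequential
from tensorflow.keras.layers import Embedding, LSTM, Dense

# Placeholder corpus; the project used the text of a Sherlock Holmes book here.
corpus = ["it is a capital mistake to theorize before one has data"]

tokenizer = Tokenizer()
tokenizer.fit_on_texts(corpus)
vocab_size = len(tokenizer.word_index) + 1

# Build n-gram style training pairs: every prefix of a line predicts its next word.
sequences = []
for line in corpus:
    tokens = tokenizer.texts_to_sequences([line])[0]
    for i in range(1, len(tokens)):
        sequences.append(tokens[: i + 1])

max_len = max(len(s) for s in sequences)
padded = pad_sequences(sequences, maxlen=max_len, padding="pre")
X, y = padded[:, :-1], padded[:, -1]

model = Sequential([
    Embedding(vocab_size, 100),   # embedding dimension is illustrative
    LSTM(150),                    # hidden size is illustrative
    Dense(vocab_size, activation="softmax"),
])
model.compile(loss="sparse_categorical_crossentropy", optimizer="adam", metrics=["accuracy"])
model.fit(X, y, epochs=100, verbose=0)
```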
We then moved to a more expansive dataset: a set of Amazon technology reviews. The LSTM achieved comparable training accuracy, but had significantly better validation metrics and made more sensible predictions.
Surprisingly, switching the LSTM layer to a GRU layer led to a boost in performance. The model jumped to 30% accuracy, and made the best predictions we had seen yet. After reducing the learning rate, we saw a rise to 50% accuracy.
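The change amounts to swapping a single layer and lowering the optimizer's learning rate. A hedged sketch of that modification follows; the vocabulary size, hidden size, and learning rate shown are illustrative placeholders, not the repository's tuned values:

```python
from tensorflow.keras.models import Sequential
from tensorflow.keras.layers import Embedding, GRU, Dense
from tensorflow.keras.optimizers import Adam

vocab_size = 5000  # placeholder vocabulary size from the tokenizer

model = Sequential([
    Embedding(vocab_size, 100),
    GRU(150),                        # GRU in place of the LSTM layer
    Dense(vocab_size, activation="softmax"),
])
model.compile(
    loss="sparse_categorical_crossentropy",
    optimizer=Adam(learning_rate=1e-4),  # reduced from the Adam default of 1e-3
    metrics=["accuracy"],
)
```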
Using n-grams for prediction was never explicitly covered by the Fall '23 project, but it is included in this repository for the sake of completeness. The neural net approaches all leveraged n-grams to generate the input sequences the models were trained on. But what if we cut out the neural net training process and predicted the next word simply using next-word probabilities derived from the n-grams themselves?
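A minimal sketch of that idea, assuming a simple bigram count table (the repository's notebook may use a different n-gram order):

```python
from collections import Counter, defaultdict

def build_bigram_counts(text):
    """Count how often each word follows each other word in the text."""
    words = text.lower().split()
    counts = defaultdict(Counter)
    for prev, nxt in zip(words, words[1:]):
        counts[prev][nxt] += 1
    return counts

def predict_next(counts, word):
    """Return the most frequent follower of `word`, or None if it was never seen."""
    followers = counts.get(word.lower())
    return followers.most_common(1)[0][0] if followers else None

counts = build_bigram_counts("the game is afoot and the game is on")
print(predict_next(counts, "game"))  # -> "is"
```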
It turns out that this model performs quite well, generating sequences of text that make sense but are often lifted wholesale from the training data. The model also cannot generalize at all outside of the given dataset.
The shortcomings of RNN methods make it abundantly clear why Transformers are the contemporary architecture for Large Language Models. The project was a positive learning experience, covering topics from preprocessing with regex to (manual) hyperparameter tuning of neural networks. Future iterations of the project should leverage Transformers to gain a more comprehensive overview of next-word prediction algorithms.
We took advantage of the easy-to-use Streamlit library in Python to create front-ends for the next word prediction models. The models are contained in the `streamlit/` directory. To run the apps locally, `pip install streamlit` and then `python -m streamlit run {path_to_file}`.
Alternatively, the front-ends can be viewed here and here respectively. Both GUIs utilize the GRU_amazon_reviews_model as their predictor.
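As a rough idea of what such an app looks like, here is a hypothetical minimal Streamlit front-end. The file names, saved-model and tokenizer paths, and assumed sequence length are illustrative, not the repository's actual code:

```python
# streamlit/app_sketch.py -- a hypothetical minimal front-end, not the repository's actual app.
import pickle

import numpy as np
import streamlit as st
from tensorflow.keras.models import load_model
from tensorflow.keras.preprocessing.sequence import pad_sequences

# Hypothetical artifact names; the real files in the repo may differ.
model = load_model("GRU_amazon_reviews_model.h5")
with open("tokenizer.pkl", "rb") as f:
    tokenizer = pickle.load(f)

MAX_LEN = 20  # assumed input sequence length used at training time

st.title("Next Word Prediction")
text = st.text_input("Start typing a sentence:")

if text:
    seq = tokenizer.texts_to_sequences([text])[0]
    seq = pad_sequences([seq], maxlen=MAX_LEN - 1, padding="pre")
    next_id = int(np.argmax(model.predict(seq, verbose=0)))
    st.write("Predicted next word:", tokenizer.index_word.get(next_id, "(unknown)"))
```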