This project develops language models using N-grams. It includes two code files and a report.
Code File1 (NLU_Assignment1_LM.py): Contains the code for both Task1 and Task2. Run this file to see all the results.
Code File2 (Sentence_Generator.py): Run this file if you only want to see the generated sentences.
For Task 1, I developed language models using three different smoothing/backoff schemes:
1.) Bigram Kneser-Ney
2.) Bigram Katz BackOff
3.) Trigram Stupid Backoff
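Of the three, Stupid Backoff is the simplest to illustrate: if the trigram has been seen, use its relative frequency; otherwise back off to the bigram and then the unigram, multiplying by a fixed penalty (commonly 0.4) at each step. The sketch below is a minimal illustration of the scheme, not the project's actual implementation in NLU_Assignment1_LM.py; the function and variable names are my own.

```python
from collections import Counter

def train_counts(sentences):
    """Count unigrams, bigrams, and trigrams from tokenized sentences."""
    uni, bi, tri = Counter(), Counter(), Counter()
    for toks in sentences:
        toks = ["<s>", "<s>"] + toks + ["</s>"]
        uni.update(toks)
        bi.update(zip(toks, toks[1:]))
        tri.update(zip(toks, toks[1:], toks[2:]))
    return uni, bi, tri

def stupid_backoff(w3, w2, w1, uni, bi, tri, alpha=0.4):
    """Score S(w3 | w1 w2): use the trigram relative frequency if seen,
    otherwise back off to the bigram, then the unigram, multiplying by
    alpha at each backoff step. Scores are not true probabilities."""
    if tri[(w1, w2, w3)] > 0:
        return tri[(w1, w2, w3)] / bi[(w1, w2)]
    if bi[(w2, w3)] > 0:
        return alpha * bi[(w2, w3)] / uni[w2]
    total = sum(uni.values())
    return alpha * alpha * uni[w3] / total
```

Note that Stupid Backoff returns scores rather than a normalized distribution, which is why it is cheap and why it is usually paired with large corpora.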
The code file does not include the hyperparameter tuning experiments; the parameters used in the project were tuned in separate experiments, and the tuning results are included in the report.
For Task 2, I used a language model to generate English sentences of exactly 10 tokens. The generated sentences read well both semantically and syntactically.
A sample generated sentence:
"We are worried about their machinery beyond mechanical details."