- Libraries used
- Project Inspiration
- File Descriptions
- Data Insights
- Licensing, Authors, and Acknowledgements
Python version 3.0. You need to install the following Python modules to complete this project:
Imagine working for a production company where your job is as demanding as writing a script every day. It could stay this way forever, or there could be a way to save us. That is where this project starts: build an RNN that trains on existing scripts and generates new scripts for us.
How does it work? Details can be found in the notebook. In general, we start from existing Seinfeld scripts (the file is in Data), which contain over 45k unique words and over 100k lines, and run all the preprocessing steps needed to get the data ready for the RNN. The RNN architecture is: input data --> embedding layer --> LSTM --> LSTM --> output. More technical details, including the model-building steps and hyperparameter settings, can be found in the notebook. Finally, the trained RNN is used to generate new scripts: I used 5 starting words and generated 5 scripts of 400 words each. Some of the new scripts do not always make sense, but so far they at least look like the original scripts.
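The preprocessing step can be sketched in plain Python as below. This is a minimal sketch under stated assumptions, not the notebook's code: the punctuation map, the helper names `preprocess`, `create_lookup_tables` and `make_sequences`, and the sequence length are all hypothetical. It lower-cases the text, turns punctuation into standalone tokens, maps each word to an integer id, and slides a window over the ids so that each window is paired with the word that follows it, which is the kind of (input, target) pair an embedding --> LSTM --> output model can be trained on.

```python
from collections import Counter

# Hypothetical punctuation-to-token map (an assumption, not the notebook's exact map).
# Splitting punctuation into its own tokens lets "hello!" and "hello" share one id.
PUNCT_TOKENS = {
    ".": "||period||",
    ",": "||comma||",
    "!": "||exclamation||",
    "?": "||question||",
    "\n": "||return||",
}


def preprocess(text):
    """Lower-case the text and split it into word and punctuation tokens."""
    text = text.lower()
    for symbol, token in PUNCT_TOKENS.items():
        text = text.replace(symbol, f" {token} ")
    return text.split()


def create_lookup_tables(words):
    """Map each unique word to an integer id, most frequent words first."""
    counts = Counter(words)
    vocab = sorted(counts, key=counts.get, reverse=True)
    vocab_to_int = {word: i for i, word in enumerate(vocab)}
    int_to_vocab = {i: word for word, i in vocab_to_int.items()}
    return vocab_to_int, int_to_vocab


def make_sequences(word_ids, sequence_length):
    """Pair every window of `sequence_length` ids with the id that follows it."""
    features, targets = [], []
    for start in range(len(word_ids) - sequence_length):
        features.append(word_ids[start:start + sequence_length])
        targets.append(word_ids[start + sequence_length])
    return features, targets


# Tiny made-up example in the style of the script data (not real data).
tokens = preprocess("Jerry: Hello!\nGeorge: Hello, Jerry.")
vocab_to_int, int_to_vocab = create_lookup_tables(tokens)
word_ids = [vocab_to_int[word] for word in tokens]
features, targets = make_sequences(word_ids, sequence_length=4)
print(tokens)
print(features[0], "->", targets[0])
```

Putting the most frequent words first is just one convenient ordering; any consistent word-to-id mapping works, as long as the same tables are reused at generation time.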
dlnd_tv_script_generation.ipynb : Jupyter notebook containing all the code and results
dlnd_tv_script_generation.html : HTML version of the notebook
generated_scripts_1-5.txt : new scripts generated by the trained RNN; the starting words are jerry, monica, elaine, kramer and george. Note that they are all lowercase, because the RNN was trained on preprocessed words, which are all lowercase.
file-name.py : Python files that check our work in the notebook or help keep the GPU connection stable, etc.
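Generating a script from a starting word might look roughly like the loop below. It is a hedged sketch: `next_word_probs` is a hypothetical stand-in for the trained RNN, and the `generate` signature and the top-k sampling strategy are assumptions, not necessarily what the notebook does. The starting word is lower-cased before lookup, since a vocabulary built from lower-cased text has no entry for "Jerry".

```python
import random


def generate(next_word_probs, prime_word, vocab_to_int, int_to_vocab,
             length=400, sequence_length=10, top_k=5, seed=None):
    """Generate a script of `length` words that starts with `prime_word`.

    `next_word_probs` is a hypothetical stand-in for the trained RNN: it takes
    a list of word ids and returns one probability per vocabulary id.
    """
    rng = random.Random(seed)
    # The vocabulary was built from lower-cased text, so look up the lower-cased word.
    prime = prime_word.lower()
    ids = [vocab_to_int[prime]]
    words = [prime]
    for _ in range(length - 1):
        probs = next_word_probs(ids[-sequence_length:])
        # Keep the k most likely ids and sample one, weighted by its probability.
        top_ids = sorted(range(len(probs)), key=lambda i: probs[i], reverse=True)[:top_k]
        next_id = rng.choices(top_ids, weights=[probs[i] for i in top_ids], k=1)[0]
        ids.append(next_id)
        words.append(int_to_vocab[next_id])
    return " ".join(words)


# Toy vocabulary and a fake "model" that ignores its input (both made up for illustration).
vocab = ["jerry", "hello", "newman"]
vocab_to_int = {word: i for i, word in enumerate(vocab)}
int_to_vocab = dict(enumerate(vocab))


def fake_model(window):
    return [0.1, 0.6, 0.3]


print(generate(fake_model, "Jerry", vocab_to_int, int_to_vocab, length=4, top_k=1))
# prints "jerry hello hello hello"
```

Sampling among the top k words, rather than always taking the single most likely one, is a common way to keep generated text from looping on the same few words; with `top_k=1` the sketch becomes fully greedy.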
Details can be found in the notebook. In short, the model reaches a training loss below 3.5, and most of the new scripts make some sense; generating a script this way is much faster than writing one by hand. More could be done to improve the model and the quality of the generated scripts.
Data : the Seinfeld scripts can be found in the Data folder.
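To put the loss figure in context, here is a small worked calculation. It assumes the reported loss is the mean per-word cross-entropy measured in nats (natural log), the usual convention in deep-learning frameworks; under that assumption, perplexity is exp(loss). The vocabulary size of 45,000 is a rounded stand-in for "over 45k unique words".

```python
import math


def perplexity(mean_cross_entropy_nats):
    """Perplexity is exp(loss) when the loss is mean cross-entropy in nats."""
    return math.exp(mean_cross_entropy_nats)


train_loss = 3.5          # the training-loss threshold reported above
vocab_size = 45_000       # "over 45k unique words" (a rounded lower bound)

model_ppl = perplexity(train_loss)
uniform_loss = math.log(vocab_size)   # loss of guessing uniformly over the vocabulary

print(f"perplexity at loss {train_loss}: {model_ppl:.1f}")                   # 33.1
print(f"uniform-guess loss over {vocab_size} words: {uniform_loss:.2f}")     # 10.71
```

Under these assumptions, a loss of 3.5 means the model is, on average, about as uncertain as choosing uniformly among roughly 33 words, compared with a loss of about 10.7 for a blind guess over the whole vocabulary.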
Notebook: here