For this project at Metis, I used unsupervised learning with non-negative matrix factorization, along with Word2Vec, Markov chains, Flask, and HTML/CSS, to build a web app that finds articles similar to a given input. Users can also try a random press release generator, which uses a Markov chain to create press releases for select topics.
In this repo, I've uploaded my code and the presentation I gave at Metis on this project.
A blog post about the project is currently a work in progress.
- All data is available on Kaggle.
- doj_eda.py - some light data cleaning and data exploration
- nmf_testing.ipynb - Testing of non-negative matrix factorization
- topic_building.ipynb - Final topic modeling using non-negative matrix factorization
- taxes.ipynb - Topic modeling for tax topic only to see if it can be broken down into further subtopics
- w2v.ipynb - Building and examining word vectors for potential bias
- flaskapp.py - Flask app to look up relevant press releases and generate random press releases
- testing_for_web_app.ipynb - Notebook building and testing functions for use in Flask app
- generator.ipynb - Building rough generator for random press releases using Markov chains
- page.html - First page of Flask app (search bar)
- template2.html - Page for list of relevant press releases
- article.html - Page for randomly generated press releases
- doj_presentation.pdf - PowerPoint presentation given at Metis
- Web App Demo - Demonstration of Flask app
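
For readers unfamiliar with Markov chain text generation, the idea behind the random press release generator can be sketched roughly as follows. This is a minimal illustration and not the code in generator.ipynb: the function names `build_chain` and `generate`, the `order` parameter, and the sample sentences are all assumptions made up for this example.

```python
import random
from collections import defaultdict

# Made-up sample text for illustration only; not taken from the dataset.
SAMPLE_RELEASES = [
    "The defendant pleaded guilty to tax fraud in federal court today.",
    "The defendant pleaded guilty to wire fraud in district court today.",
]


def build_chain(texts, order=2):
    """Map each run of `order` words to the list of words seen right after it."""
    chain = defaultdict(list)
    for text in texts:
        words = text.split()
        for i in range(len(words) - order):
            state = tuple(words[i:i + order])
            chain[state].append(words[i + order])
    return dict(chain)


def generate(chain, max_words=50, seed=None):
    """Walk the chain from a random starting state until it dead-ends
    or the output reaches max_words."""
    rng = random.Random(seed)
    state = rng.choice(sorted(chain))  # sorted so a fixed seed is reproducible
    output = list(state)
    while len(output) < max_words:
        followers = chain.get(state)
        if not followers:  # no word was ever observed after this state
            break
        output.append(rng.choice(followers))
        state = tuple(output[-len(state):])
    return " ".join(output)


if __name__ == "__main__":
    chain = build_chain(SAMPLE_RELEASES, order=2)
    print(generate(chain, max_words=20, seed=0))
```

A higher `order` tends to produce more coherent but less varied text, since each next word is conditioned on a longer preceding context.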