A simple method of clustering and viewing Hacker News posts.
Data obtained from https://www.kaggle.com/hacker-news/hacker-news-posts.
This screenshot shows the first 1000 titles clustered.
For an interactive plot, see it directly on Plotly.
- Numpy
- Pandas
- Scikit-learn
- Gensim
- A pre-trained Word2Vec model (e.g. this model trained on the Google News corpus)
- Plotly (only needed for visualisation; feel free to use any other library)
pip install cython numpy pandas scikit-learn gensim plotly
Alternatively, see the requirements.txt file.
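To show how the libraries above can fit together, here is a minimal sketch of one common way to cluster titles: average the word vectors of each title, then run k-means on the result. This is an assumption about the general approach, not necessarily the exact method used in this repository. The helper names (`title_vector`, `cluster_titles`) and the tiny `toy_vectors` dictionary are hypothetical; the dictionary stands in for a real pre-trained Word2Vec model so that the example runs without downloading anything.

```python
import numpy as np
from sklearn.cluster import KMeans


def title_vector(title, vectors, dim):
    """Average the vectors of the words in `title` that the model knows.

    `vectors` is any mapping that supports `word in vectors` and
    `vectors[word]` (a plain dict here; gensim KeyedVectors also works).
    """
    words = [w for w in title.lower().split() if w in vectors]
    if not words:
        # No known words: fall back to a zero vector.
        return np.zeros(dim)
    return np.mean([vectors[w] for w in words], axis=0)


def cluster_titles(titles, vectors, dim, n_clusters):
    """Return one k-means cluster label per title."""
    X = np.vstack([title_vector(t, vectors, dim) for t in titles])
    km = KMeans(n_clusters=n_clusters, n_init=10, random_state=0)
    return km.fit_predict(X)


# Hypothetical 2-dimensional "embeddings" standing in for a real model.
toy_vectors = {
    "python": np.array([1.0, 0.0]),
    "rust": np.array([0.9, 0.1]),
    "startup": np.array([0.0, 1.0]),
    "funding": np.array([0.1, 0.9]),
}

titles = ["Python tips", "Rust tips", "Startup funding", "Funding news"]
labels = cluster_titles(titles, toy_vectors, dim=2, n_clusters=2)

for title, label in zip(titles, labels):
    print(label, title)
```

With a real model, `toy_vectors` could be replaced by vectors loaded through Gensim, for example `KeyedVectors.load_word2vec_format(path, binary=True)`, where `path` points to wherever you saved the model, and `dim` would be the model's `vector_size`.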