Juexiao Zhang*, Yubei Chen*, Brian Cheung and Bruno Olshausen
This repository hosts our work Word Embedding Visualization via Dictionary Learning.
The arXiv preprint is available at http://arxiv.org/abs/1910.03833.
An outline of the files contained in this repository:
- sparsify_Pytorch.py is our library for dictionary learning.
- WordFactor_reproduce.ipynb and FactorGroup_reproduce.ipynb are our reorganized and updated notebooks. They show how to learn a dictionary containing the word factors and how to reproduce the results. Refer to the notebooks for more details.
- The data directory stores the corpus data used for training, for example text8.
- The embeddings directory stores the pretrained word embeddings, for example GloVe.
- The results directory stores the results obtained from running the notebooks. Taking the provided results as an example: basis.pt stores the trained dictionary elements, i.e. the word factors; nmed_factor_cooc.npy is the normalized factor co-occurrence matrix; and sym_labels_knn20_c175.npy stores the factor clustering labels. Both .npy files are obtained from FactorGroup_reproduce.ipynb.
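As a quick way to inspect the files listed above, here is a minimal sketch of loading them in Python. The helper names `load_factor_results` and `cluster_sizes` are hypothetical and not part of this repository, and the array shapes are not documented here, so the sketch only reports what it finds. It assumes the .npy files are plain NumPy arrays and that basis.pt can be read with `torch.load`.

```python
from collections import Counter
from pathlib import Path

import numpy as np


def load_factor_results(result_dir):
    """Load whichever of the saved result files exist in result_dir.

    Hypothetical helper: the file names come from this README, but the
    contents (shapes, dtypes) are assumptions and are not checked here.
    """
    result_dir = Path(result_dir)
    out = {}

    cooc_path = result_dir / "nmed_factor_cooc.npy"
    if cooc_path.exists():
        out["cooc"] = np.load(cooc_path)  # normalized factor co-occurrence matrix

    labels_path = result_dir / "sym_labels_knn20_c175.npy"
    if labels_path.exists():
        out["labels"] = np.load(labels_path)  # factor clustering labels

    basis_path = result_dir / "basis.pt"
    if basis_path.exists():
        import torch  # imported lazily; only needed for the .pt file

        out["basis"] = torch.load(str(basis_path), map_location="cpu")

    return out


def cluster_sizes(labels):
    """Count how many factors carry each cluster label."""
    return dict(Counter(int(label) for label in np.asarray(labels).ravel()))


if __name__ == "__main__":
    results = load_factor_results("results/glove-text8-reproduce-1k-factors")
    for name, value in results.items():
        print(name, getattr(value, "shape", None))
```

Calling `cluster_sizes(results["labels"])` afterwards gives a rough view of how the factors are distributed across groups.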
Follow the instructions in the notebooks, particularly WordFactor_reproduce.ipynb, and place the corpus and the embeddings in data/ and embeddings/ respectively. You should then be able to reproduce the results in results/glove-text8-reproduce-1k-factors. Please refer to the notebooks for specific instructions.
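Before opening the notebooks, it can help to confirm that the inputs are in place. The following is a small sketch, not part of the repository: `missing_inputs` is a hypothetical helper that only checks that data/ and embeddings/ exist and are non-empty. It does not verify which corpus or embedding files the notebooks actually expect.

```python
from pathlib import Path


def missing_inputs(repo_root=".", required=("data", "embeddings")):
    """Return the required directories that are missing or empty.

    Hypothetical helper; the directory names come from this README.
    """
    root = Path(repo_root)
    missing = []
    for name in required:
        folder = root / name
        # A directory counts as missing if it is absent or has no entries.
        if not folder.is_dir() or not any(folder.iterdir()):
            missing.append(name)
    return missing


if __name__ == "__main__":
    problems = missing_inputs()
    if problems:
        print("Missing or empty:", ", ".join(problems))
    else:
        print("data/ and embeddings/ are populated.")
```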
This project is tested with:
- Python 3.7
- PyTorch 1.1.0
- scipy 1.2.0
- scikit-learn 0.21.2
- matplotlib 3.1.0
- plotly 4.1.1
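To compare a local environment against the versions above, a sketch along these lines may be useful. It is an illustration only: `version_report` and `installed_version` are hypothetical helpers, the keys in `TESTED` are the usual PyPI distribution names (for example `torch` for PyTorch), and `importlib.metadata` requires Python 3.8 or the `importlib_metadata` backport on Python 3.7. A differing version does not necessarily mean the notebooks will fail.

```python
try:
    from importlib.metadata import PackageNotFoundError, version  # Python 3.8+
except ImportError:  # Python 3.7: assumes the importlib_metadata backport is installed
    from importlib_metadata import PackageNotFoundError, version

# Versions this project was tested with, keyed by PyPI distribution name.
TESTED = {
    "torch": "1.1.0",
    "scipy": "1.2.0",
    "scikit-learn": "0.21.2",
    "matplotlib": "3.1.0",
    "plotly": "4.1.1",
}


def installed_version(name):
    """Return the installed version string, or None if the package is absent."""
    try:
        return version(name)
    except PackageNotFoundError:
        return None


def version_report(tested=TESTED, lookup=installed_version):
    """Map each package whose version differs to (tested, installed)."""
    report = {}
    for name, wanted in tested.items():
        found = lookup(name)
        if found != wanted:
            report[name] = (wanted, found)
    return report


if __name__ == "__main__":
    for name, (wanted, found) in version_report().items():
        print(f"{name}: tested with {wanted}, found {found}")
```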
@article{DBLP:journals/corr/abs-1910-03833,
author = {Juexiao Zhang and
Yubei Chen and
Brian Cheung and
Bruno A. Olshausen},
title = {Word Embedding Visualization Via Dictionary Learning},
journal = {CoRR},
volume = {abs/1910.03833},
year = {2019},
url = {http://arxiv.org/abs/1910.03833},
eprinttype = {arXiv},
eprint = {1910.03833},
timestamp = {Wed, 16 Oct 2019 16:25:53 +0200},
biburl = {https://dblp.org/rec/journals/corr/abs-1910-03833.bib},
bibsource = {dblp computer science bibliography, https://dblp.org}
}