A simple app for visualizing and experimenting with word embeddings. Word embeddings are vectors that language models learn in order to represent words and expressions. These vectors often have linear properties, so meanings can be added and subtracted. For example, if you take the vector for France, subtract Paris, and add Tokyo, you get a vector close to Japan (France-Paris+Tokyo≈Japan). More details can be found in these papers: Word2Vec, GloVe
This app makes it easy to play around with embeddings and see what other words you can build from word vectors.
You can also see relationships between vectors projected down to 2D:
France-Paris and Japan-Tokyo have very similar relationships in the vector space.
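The arithmetic described above can be sketched in a few lines of plain Python. This is only an illustration, not the app's own code: the three-dimensional vectors in `TOY_VECTORS` are hand-picked, hypothetical values (real GloVe or Word2Vec vectors are learned from text and have many more dimensions), and the helper names are made up for this example.

```python
import math

# Hypothetical 3-dimensional toy vectors, hand-picked for illustration only.
# Real GloVe / Word2Vec vectors are learned from text and are much larger.
TOY_VECTORS = {
    "France": [0.1, 0.9, 0.0],
    "Paris": [0.9, 1.0, 0.1],
    "Japan": [0.0, 0.1, 1.0],
    "Tokyo": [1.0, 0.0, 0.9],
    "Rome": [0.9, 0.3, 0.1],
}


def add(a, b):
    """Element-wise sum of two vectors."""
    return [x + y for x, y in zip(a, b)]


def subtract(a, b):
    """Element-wise difference of two vectors."""
    return [x - y for x, y in zip(a, b)]


def cosine_similarity(a, b):
    """Cosine of the angle between two vectors (1.0 means same direction)."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)


def nearest_word(query, vectors, exclude=()):
    """Return the word whose vector is most similar to the query vector."""
    candidates = [word for word in vectors if word not in exclude]
    return max(candidates, key=lambda word: cosine_similarity(query, vectors[word]))


v = TOY_VECTORS

# France - Paris + Tokyo, then look up the closest remaining word.
query = add(subtract(v["France"], v["Paris"]), v["Tokyo"])
result = nearest_word(query, v, exclude={"France", "Paris", "Tokyo"})

# The two "country minus capital" offsets should point in a similar direction.
offset_similarity = cosine_similarity(
    subtract(v["France"], v["Paris"]),
    subtract(v["Japan"], v["Tokyo"]),
)

print(result, round(offset_similarity, 2))
```

With these toy values the nearest word to the query is Japan, and the two offsets have a cosine similarity above 0.9. The input words are excluded from the search because the query vector usually stays close to one of them.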
- Clone the repo
git clone https://github.com/Sneccello/WordMaze.git
- Set up the environment
conda create --name word_maze python=3.11
conda activate word_maze
pip install -r requirements.txt
- Run the Streamlit app (it may take some time to install the GloVe vector embeddings on the first run)
streamlit run source/main.py
- pip install may fail on pysqlite3-binary for M1(+) Macs. This requirement is only needed for Streamlit deployment, where the default sqlite3 library version does not match the version that chromadb requires. If your local sqlite3 library version is already suitable, you can remove this requirement. More info on this issue here
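Before removing the requirement, you can check which SQLite library your Python interpreter is linked against. The sketch below uses the standard `sqlite3` module; the minimum version `3.35.0` is an assumption here, so confirm it against the documentation of the chromadb version you install. The helper names are hypothetical.

```python
import sqlite3

# Assumed minimum SQLite version; verify against the chromadb docs you use.
ASSUMED_MIN_SQLITE = (3, 35, 0)


def parse_version(text):
    """Turn a version string such as '3.39.5' into a tuple of integers."""
    return tuple(int(part) for part in text.split("."))


def sqlite_is_new_enough(version_text=sqlite3.sqlite_version,
                         minimum=ASSUMED_MIN_SQLITE):
    """Compare the linked SQLite library version with the assumed minimum."""
    return parse_version(version_text) >= minimum


print("sqlite3 library version:", sqlite3.sqlite_version)
print("meets assumed minimum:", sqlite_is_new_enough())
```

If the check passes locally, pysqlite3-binary is likely unnecessary on your machine and can be dropped from requirements.txt for local use.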