This repository contains the code that investigates the similarities between Lexical and Vector Semantics.
The code for training Word2Vec and TF-IDF models on the Brown and Reuters corpora can be found in the /code/word2vec and code/tfidf directories.
The trained models are saved in the models directory and are then used to create a testing dictionary for each model; these dictionaries are saved at data/similarities.
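As a rough illustration of what such a testing dictionary might look like, the sketch below maps each word to its most similar words by cosine similarity. It assumes the model can be viewed as a word-to-vector mapping; the names `build_similarity_dict`, `toy_vectors`, and `top_k` are hypothetical and are not taken from this repository, and the actual format stored in data/similarities may differ.

```python
import math


def cosine(u, v):
    """Cosine similarity between two equal-length vectors (0.0 if either is all zeros)."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    if norm_u == 0 or norm_v == 0:
        return 0.0
    return dot / (norm_u * norm_v)


def build_similarity_dict(vectors, top_k=10):
    """Map every word to its top_k most similar other words, best first.

    `vectors` is assumed to be a dict of word -> list of floats.
    """
    result = {}
    for word, vec in vectors.items():
        scored = [
            (other, cosine(vec, other_vec))
            for other, other_vec in vectors.items()
            if other != word
        ]
        # Highest similarity first.
        scored.sort(key=lambda pair: pair[1], reverse=True)
        result[word] = [other for other, _ in scored[:top_k]]
    return result


# Toy vectors invented for illustration only.
toy_vectors = {
    "cat": [1.0, 0.0],
    "dog": [0.9, 0.1],
    "car": [0.0, 1.0],
}
similar = build_similarity_dict(toy_vectors, top_k=2)
```

With the toy vectors, "dog" ranks first for "cat" because their vectors point in nearly the same direction.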
These dictionaries are then compared against the gold-standard word similarities (SimLex-999) using the nDCG metric.
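For readers unfamiliar with nDCG, here is a minimal sketch of one common formulation (linear gain, log2 discount). It is not necessarily the exact variant this repository uses; the function names `dcg` and `ndcg`, the cutoff `k`, and the gold scores in the example are assumptions made for illustration.

```python
import math


def dcg(relevances):
    """Discounted cumulative gain: each gain is divided by log2(rank + 1), ranks start at 1."""
    return sum(rel / math.log2(rank + 1) for rank, rel in enumerate(relevances, start=1))


def ndcg(predicted, gold_scores, k=10):
    """nDCG@k of a predicted ranking against gold relevance scores.

    `predicted` is a list of words, best first.
    `gold_scores` is a dict of word -> relevance; words missing from it count as 0.
    """
    gains = [gold_scores.get(word, 0.0) for word in predicted[:k]]
    # The ideal ranking lists the gold items in descending order of relevance.
    ideal = sorted(gold_scores.values(), reverse=True)[:k]
    ideal_dcg = dcg(ideal)
    return dcg(gains) / ideal_dcg if ideal_dcg > 0 else 0.0


# Hypothetical gold similarities for a single target word (not real SimLex-999 values).
gold = {"dog": 3.0, "kitten": 2.0, "car": 1.0}
perfect = ndcg(["dog", "kitten", "car"], gold)
reversed_order = ndcg(["car", "kitten", "dog"], gold)
```

A ranking that matches the gold order scores 1.0, and any other order scores lower, which is what makes the metric suitable for comparing the models' similarity rankings.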
To run our code, you need Python >= 3.8 installed. You can then use pip to install the required dependencies listed in requirements.txt.
Step 1: Clone this GitHub repository and set it as your working directory by running the following commands in a notebook cell:
!git clone https://github.com/Mrulay/COMP8730_Assign03.git
%cd /content/COMP8730_Assign03
Step 2: Install all the dependencies from requirements.txt:
pip install -r requirements.txt
A tutorial notebook is available here; it walks through all of these steps and also tests the code.
Upon testing all the Word2Vec models, the best nDCG score was obtained with