New ranking feature based on Word2Vec word embeddings and cosine values #34
As mentioned in a private message, a test implementation is installed on black: testing/evaluation is needed!
To be clear, an excerpt from the IRC conversation:
We currently have a ranking feature:
(skip[12]?0:(*vit)->cosine_rank);
This is based on the top 20 semantic nearest neighbours returned from the word2vec word embeddings, with a further check on the cosine values. This works, but is too slow for production work. The current request may warrant renaming this earlier feature.
I would like a new feature that, for each pair of a variant and a particular CC, retrieves the cosine value (as ticcltool W2V-dist does). Given the values for all the CCs of a variant, the smallest value should be ranked 'best', i.e. assigned rank 1. Larger values are then assigned ranks 2, 3, 4, etc. Ties get the same rank.
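To make the requested ranking concrete, here is a minimal sketch of how the ranks could be derived once the cosine values for all CCs of one variant are available. The function name `dense_ranks` is hypothetical and not part of the existing code. The sketch assumes "dense" ranking (after a tie the next rank is not skipped) and treats only exactly equal values as ties. Both are assumptions about the intended behaviour.

```cpp
#include <algorithm>
#include <cstddef>
#include <numeric>
#include <vector>

// Hypothetical helper, not part of the existing code base.
// Input:  the cosine values for all CCs of a single variant, in CC order.
// Output: one rank per CC; the smallest value gets rank 1 and
//         equal values share the same rank (dense ranking, an assumption).
std::vector<int> dense_ranks(const std::vector<double>& values) {
  // Sort indices by value so the ranks can be written back in CC order.
  std::vector<std::size_t> order(values.size());
  std::iota(order.begin(), order.end(), 0);
  std::sort(order.begin(), order.end(),
            [&values](std::size_t a, std::size_t b) {
              return values[a] < values[b];
            });

  std::vector<int> ranks(values.size(), 0);
  int rank = 0;
  for (std::size_t i = 0; i < order.size(); ++i) {
    // Only move to the next rank when the value differs from the previous one.
    if (i == 0 || values[order[i]] != values[order[i - 1]]) {
      ++rank;
    }
    ranks[order[i]] = rank;
  }
  return ranks;
}
```

For example, the values 0.30, 0.10, 0.30, 0.25 would yield the ranks 3, 1, 3, 2. If near-equal values should also count as ties, the exact comparison could be replaced by a tolerance check.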
Many thanks!