SeSiMe

(Sentence/Sequence Similarity Measures)

Prototype name and prototype code.

It is used here to calculate similarities (or distances) between mass spectra or between biosynthetic gene clusters (BGCs).

Method categories

| Method | Sensitive to word context | Sensitive to word order |
| --- | --- | --- |
| PCA on 1-hot document vector | No | No |
| Autoencoder on 1-hot document vector | No | No |
| Autoencoder on full sequence | (Yes) | (Yes) |
| Word2Vec + document centroid | Yes | No |
| GloVe | Yes | No |
| ELMo, bi-LSTM etc. | Yes | Yes |
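
To illustrate why the "Word2Vec + document centroid" row is marked as not sensitive to word order, here is a minimal sketch in plain Python. It is not the code of this repository: the embedding values, the token names (`peak_100` etc.) and the helper functions `document_centroid` and `cosine_similarity` are hypothetical and chosen only for illustration. In practice the word vectors would come from a trained Word2Vec model.

```python
import math

# Hypothetical toy embeddings (assumption): a trained Word2Vec model
# would normally provide these vectors.
TOY_VECTORS = {
    "peak_100": [1.0, 0.0, 0.5],
    "peak_150": [0.0, 1.0, 0.5],
    "peak_200": [0.5, 0.5, 0.0],
    "peak_300": [0.0, 0.2, 1.0],
}


def document_centroid(words, vectors):
    """Return the mean of the vectors of all known words in a document."""
    known = [vectors[w] for w in words if w in vectors]
    if not known:
        raise ValueError("document contains no known words")
    dim = len(known[0])
    return [sum(v[i] for v in known) / len(known) for i in range(dim)]


def cosine_similarity(a, b):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0.0 or norm_b == 0.0:
        return 0.0
    return dot / (norm_a * norm_b)


# Two documents with the same words in a different order, and a third one
# with partly different words.
doc_a = ["peak_100", "peak_150", "peak_200"]
doc_a_shuffled = ["peak_200", "peak_100", "peak_150"]
doc_b = ["peak_100", "peak_300"]

centroid_a = document_centroid(doc_a, TOY_VECTORS)
sim_shuffled = cosine_similarity(
    centroid_a, document_centroid(doc_a_shuffled, TOY_VECTORS)
)
sim_other = cosine_similarity(centroid_a, document_centroid(doc_b, TOY_VECTORS))

print(f"same words, different order: {sim_shuffled:.3f}")
print(f"different words:             {sim_other:.3f}")
```

Because the centroid is an average, reordering the words leaves it unchanged, so the shuffled document is indistinguishable from the original. Word context can still enter through the embedding vectors themselves, which is why that column is marked "Yes" for this method.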