Hi, I have a question about computing the replacement S score.
In your paper, the score is defined as $S(w) = \mathrm{freq}(w)\,\mathrm{IDF}(w)$. In the code, however, it is computed by summing the TF-IDF score of the term over every document, as shown below. But $\mathrm{freq}(w)$ over the corpus is not the same as the sum of the per-document term frequencies, since each of those is normalized by the length of its document. Moreover, the IDF of a term should be the same everywhere in the corpus, since both the number of documents containing $w$ and the total number of documents are fixed.
    # Compute TF-IDF
    tf_idf = {}
    for i in range(len(examples)):
        cur_word_dict = {}
        cur_sent = copy.deepcopy(examples[i].word_list_a)
        if examples[i].text_b:
            cur_sent += examples[i].word_list_b
        for word in cur_sent:
            if word not in tf_idf:
                tf_idf[word] = 0
            tf_idf[word] += 1. / len(cur_sent) * idf[word]
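To make the difference concrete, here is a minimal sketch comparing the two computations on a toy corpus. It is not the repository's code: the function names, the toy documents, and the IDF definition (`log(N / df)`) are my own assumptions for illustration, and I am taking $\mathrm{freq}(w)$ to mean the raw corpus count.

```python
import math
from collections import Counter


def summed_tf_idf(docs, idf):
    """Mirror the snippet above: add (1 / len(doc)) * idf[word] per occurrence."""
    scores = {}
    for doc in docs:
        for word in doc:
            scores[word] = scores.get(word, 0.0) + 1.0 / len(doc) * idf[word]
    return scores


def corpus_freq_idf(docs, idf):
    """Assumed reading of the paper's formula: raw corpus count times idf."""
    freq = Counter(word for doc in docs for word in doc)
    return {word: count * idf[word] for word, count in freq.items()}


# Hypothetical toy corpus: each document is a list of tokens.
docs = [["a", "b"], ["a", "c", "c", "c"], ["b", "d"]]

# Assumed IDF definition: log(number of documents / document frequency).
doc_freq = Counter(word for doc in docs for word in set(doc))
idf = {word: math.log(len(docs) / df) for word, df in doc_freq.items()}

summed = summed_tf_idf(docs, idf)
corpus = corpus_freq_idf(docs, idf)

for word in sorted(idf):
    print(word, round(summed[word], 4), round(corpus[word], 4))
```

In this toy corpus, `a` and `b` both occur twice and appear in two documents, so the corpus-level score is identical for them. The summed version scores `b` higher than `a`, because one occurrence of `a` sits in a longer document and is weighted by `1/4` instead of `1/2`. Since `idf[word]` is constant per word, the summed version effectively replaces $\mathrm{freq}(w)$ with a sum of length-normalized frequencies.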