Global word frequency calculation #121

ClaudiaShu · 2023-05-22T14:00:26Z

Hi, I have a question about computing the replacement S score.

In your paper, the score is defined as $S(w) = freq(w) \cdot IDF(w)$. In the code, however, it is computed by summing the per-document TF-IDF score of a term over every document, as below. But $freq(w)$ in the corpus is not the sum of the word's per-document frequencies. Moreover, the IDF of a term should be constant across the corpus, since both the number of documents that contain $w$ and the total number of documents are fixed.

# Compute TF-IDF
tf_idf = {}
for i in range(len(examples)):
  cur_word_dict = {}
  cur_sent = copy.deepcopy(examples[i].word_list_a)
  if examples[i].text_b:
    cur_sent += examples[i].word_list_b
  for word in cur_sent:
    if word not in tf_idf:
      tf_idf[word] = 0
    tf_idf[word] += 1. / len(cur_sent) * idf[word]
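
To make the difference concrete, here is a minimal sketch on a toy corpus. It is my own illustration, not code from the repository: the helper names (`compute_idf`, `summed_tf_idf`, `corpus_freq_idf`) are hypothetical, and I am assuming $IDF(w) = \log(N / df(w))$ and that $freq(w)$ means the raw count of $w$ in the whole corpus. Under those assumptions, the loop above yields $IDF(w)$ times the sum of length-normalized per-document frequencies, which is generally not the same as $freq(w) \cdot IDF(w)$.

```python
import math
from collections import Counter


def compute_idf(docs):
    """Assumed definition: IDF(w) = log(N / df(w)), df = number of docs containing w."""
    n_docs = len(docs)
    doc_freq = Counter()
    for doc in docs:
        doc_freq.update(set(doc))  # count each word at most once per document
    return {w: math.log(n_docs / df) for w, df in doc_freq.items()}


def summed_tf_idf(docs, idf):
    """Mirrors the quoted loop: add (1 / len(doc)) * IDF(w) per occurrence of w."""
    scores = Counter()
    for doc in docs:
        for word in doc:
            scores[word] += 1.0 / len(doc) * idf[word]
    return dict(scores)


def corpus_freq_idf(docs, idf):
    """Assumed reading of the paper's formula: corpus-level count of w times IDF(w)."""
    counts = Counter(w for doc in docs for w in doc)
    return {w: counts[w] * idf[w] for w in counts}


# Toy corpus: two tokenized "documents" of different lengths.
docs = [["the", "cat"], ["the", "dog", "dog", "dog"]]
idf = compute_idf(docs)
summed = summed_tf_idf(docs, idf)
corpus = corpus_freq_idf(docs, idf)

for word in sorted(idf):
    print(f"{word}: summed={summed[word]:.4f} corpus={corpus[word]:.4f}")
```

In this toy case "dog" gets $0.75 \cdot \log 2$ from the summed version but $3 \cdot \log 2$ from the corpus-level version, because each occurrence is divided by the length of its document before being accumulated.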
