Global word frequency calculation #121

Open
ClaudiaShu opened this issue May 22, 2023 · 0 comments

ClaudiaShu commented May 22, 2023

Hi, I have a question about how the word-replacement score $S$ is computed.

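In your paper, the score is defined as $S(w) = \mathrm{freq}(w)\,\mathrm{IDF}(w)$. In the code, however, it is computed by summing the TF-IDF score of a term over every document, as below. But $\mathrm{freq}(w)$ in the corpus is not the sum of the term's per-document (length-normalized) frequencies. Moreover, the IDF score of a term is a corpus-level quantity that is the same for every document, since both the number of documents containing $w$ and the total number of documents are fixed. The code's sum therefore reduces to $\mathrm{IDF}(w)$ times the sum of the per-document term frequencies, which is not $\mathrm{freq}(w)\,\mathrm{IDF}(w)$.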

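```python
import copy

# Compute TF-IDF
# `examples` and `idf` are defined elsewhere in the original code;
# idf[word] is a single corpus-level value per word.
tf_idf = {}
for i in range(len(examples)):
  cur_word_dict = {}  # not used in this loop
  cur_sent = copy.deepcopy(examples[i].word_list_a)
  if examples[i].text_b:
    cur_sent += examples[i].word_list_b
  # Every occurrence adds idf[word] / len(cur_sent), so each document
  # contributes tf(word, doc) * idf[word], summed over all documents.
  for word in cur_sent:
    if word not in tf_idf:
      tf_idf[word] = 0
    tf_idf[word] += 1. / len(cur_sent) * idf[word]
```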
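To make the difference concrete, here is a minimal sketch, not the repository's implementation, that treats each document as a plain list of tokens. The function names `compute_idf`, `score_as_in_code` and `score_as_in_paper` are hypothetical; IDF is assumed to be $\log(N/\mathrm{df}(w))$, and $\mathrm{freq}(w)$ is assumed to be the raw count of $w$ in the whole corpus. Since $\mathrm{IDF}(w)$ does not depend on the document, the loop above computes $\mathrm{IDF}(w)\sum_d n(w,d)/|d|$, while the paper's formula under this reading gives $\mathrm{IDF}(w)\sum_d n(w,d)$.

```python
import math
from collections import Counter, defaultdict


def compute_idf(docs):
    """Assumed IDF(w) = log(N / df(w)); df(w) = number of documents containing w."""
    n_docs = len(docs)
    doc_freq = Counter()
    for doc in docs:
        doc_freq.update(set(doc))  # count each word at most once per document
    return {w: math.log(n_docs / df) for w, df in doc_freq.items()}


def score_as_in_code(docs, idf):
    """Mirrors the loop in the issue: sum over documents of tf(w, d) * IDF(w)."""
    scores = defaultdict(float)
    for doc in docs:
        for word in doc:  # repeated tokens add repeatedly, giving count / len
            scores[word] += 1.0 / len(doc) * idf[word]
    return dict(scores)


def score_as_in_paper(docs, idf):
    """S(w) = freq(w) * IDF(w), assuming freq(w) is the raw corpus count of w."""
    corpus_freq = Counter(word for doc in docs for word in doc)
    return {w: corpus_freq[w] * idf[w] for w in corpus_freq}


# Hypothetical toy corpus: "x" and "z" each occur in exactly one document,
# so they share the same IDF, log(2).
docs = [
    ["x", "a"],
    ["z", "z", "z", "b", "b", "b", "b", "b", "b", "b"],
]
idf = compute_idf(docs)
code_scores = score_as_in_code(docs, idf)
paper_scores = score_as_in_paper(docs, idf)
```

With this toy corpus, `x` outranks `z` under the code's score (0.5·log 2 vs 0.3·log 2), but `z` outranks `x` under the corpus-frequency reading (3·log 2 vs log 2), because the code divides each count by the length of its own document.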