I expect that the issues and PRs I've posted here are going to my own fork and no further, but if anyone out there is reading, your feedback is welcome.
In Parser.getKeywords(), set() does not return an ordered collection, so uniqueWords is unordered. Iterating it to build keywords means that list is also unordered, and because sorted() is applied on the count key alone, keywords with equal counts come back in an inconsistent order.
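To illustrate the tie problem, here is a minimal sketch. It assumes each keyword is a dict with `word` and `count` entries; that shape, the helper names, and the sample data are mine for illustration and are not taken from the project. Two lists stand in for two different iteration orders of the same set. Python's `sorted()` is stable, so ties on `count` simply keep whatever order the input had, while a second sort key removes the dependence on input order.

```python
def sort_by_count_only(unique_words, counts):
    """Sort on the count alone; tied keywords keep their input order."""
    keywords = [{"word": w, "count": counts[w]} for w in unique_words]
    return sorted(keywords, key=lambda k: k["count"], reverse=True)


def sort_with_tiebreak(unique_words, counts):
    """Sort on count (descending), then on the word itself to break ties."""
    keywords = [{"word": w, "count": counts[w]} for w in unique_words]
    return sorted(keywords, key=lambda k: (-k["count"], k["word"]))


# Hypothetical data: three words tie on a count of 2.
counts = {"apple": 3, "pear": 2, "fig": 2, "kiwi": 2}

# Two stand-ins for the arbitrary order a set might be iterated in.
order_a = ["pear", "fig", "kiwi", "apple"]
order_b = ["kiwi", "apple", "fig", "pear"]

# Count-only sorting gives a different result for each input order.
unstable_a = [k["word"] for k in sort_by_count_only(order_a, counts)]
unstable_b = [k["word"] for k in sort_by_count_only(order_b, counts)]

# With the tiebreaker, both input orders give the same result.
stable_a = [k["word"] for k in sort_with_tiebreak(order_a, counts)]
stable_b = [k["word"] for k in sort_with_tiebreak(order_b, counts)]
```

Any slice such as `keywords[:10]` taken from the count-only result can therefore contain different words from run to run whenever a tie straddles the cutoff.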
Down in Summarizer.summarize() it means sentence scores are also inconsistent:
```python
# ...
(keywords, wordCount) = self.parser.getKeywords(text)
topKeywords = self.getTopKeywords(keywords[:10], wordCount, source, category)
# ... iterating sentences ...
sbsFeature = self.sbs(words, topKeywords, keywordList)
dbsFeature = self.dbs(words, topKeywords, keywordList)
# ... calculate sentence score based on these features ...
# ...
```
For consistency, when the summarize method calls getTopKeywords(), should it still pass a fixed list of ten keywords that's been sorted with more tiebreakers… or every keyword that ranks 10th or better, i.e. this comprehension… or maybe both?
Computationally, the advanced sort seems unnecessarily expensive, but I don't know if there's a rationale for exactly ten top keywords. What's the best way to make this work?
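For comparison, the two options could be sketched roughly as follows. This is only one reading of the question, not the code from the issue: the function names, the `word`/`count` dict shape, and the sample data are all assumptions of mine. The first function keeps exactly `n` keywords after a tiebroken sort; the second keeps every keyword whose count ties or beats the n-th highest count, so it can return more than `n`.

```python
def fixed_top_n(keywords, n=10):
    """Option 1: exactly n keywords, ties broken alphabetically by word."""
    ranked = sorted(keywords, key=lambda k: (-k["count"], k["word"]))
    return ranked[:n]


def ranked_nth_or_better(keywords, n=10):
    """Option 2: every keyword whose count is at least the n-th highest count."""
    if not keywords:
        return []
    counts = sorted((k["count"] for k in keywords), reverse=True)
    # If there are fewer than n keywords, the cutoff is the lowest count.
    cutoff = counts[min(n, len(counts)) - 1]
    return [k for k in keywords if k["count"] >= cutoff]


# Hypothetical data: "c" and "d" tie for third place.
sample = [
    {"word": "d", "count": 3},
    {"word": "a", "count": 5},
    {"word": "e", "count": 1},
    {"word": "c", "count": 3},
    {"word": "b", "count": 4},
]

top_fixed = [k["word"] for k in fixed_top_n(sample, n=3)]
top_ranked = sorted(k["word"] for k in ranked_nth_or_better(sample, n=3))
```

In this sketch the second option only sorts plain integers and its membership does not depend on input order, which may make it the cheaper of the two. Its result is returned in input order, so anything downstream that is order-sensitive would still need a deterministic sort.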
For anyone out there playing along at home, switching from a fixed ten to topKeywordSlice resulted in as many as 18 "top keywords" in my sample texts. It slightly boosted some scores, with one outlier getting a +14% bump… yet it remained the lowest-scoring sentence.
Differences after the change, given as absolute percentage (AP) with standard deviation (SD) in parentheses: AP (SD)