Duplicate results #59
Comments
Good catch. For posterity, here's the result CSV content of the repo this user is talking about: https://justpaste.it/8evui Worth noting: the CSV code is not responsible, because the result table itself contains duplicates. And if I look at the Network tab in the Chrome DevTools, it seems like GitHub's API itself is returning duplicates. The simplest fix is to keep a HashSet of all entries added to the table, and to only append a new row if the repo isn't already in the HashSet. I'll label this issue as a bug, despite the bug not being in our codebase.
The even better fix is to change
Theoretically, a set has no ordering, so sorting would make no sense against that data structure. Also, it seems like JavaScript has no built-in set (and thus I regret saying "The simplest fix to this is to keep a HashSet"). We'd have to make one, and at that point it might not be worth it: the extra code increases complexity and reduces maintainability, while not providing much of a performance gain because we are talking about relatively small tables. The simplest approach would be to scan the entire table array to check for a duplicate: that's an acceptable
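For illustration, the "scan the entire table array" idea could look roughly like the sketch below. The row shape and the `fullName` field are hypothetical stand-ins, not the project's actual table code; the only point is that a row gets appended only when no existing row carries the same repo identifier.

```javascript
// Minimal sketch, assuming each table row is a plain object with a
// hypothetical `fullName` field such as "owner/repo".
function appendIfNew(table, row) {
  // Linear scan: fine for small tables, O(n) per insertion.
  const alreadyPresent = table.some(
    (existing) => existing.fullName === row.fullName
  );
  if (!alreadyPresent) {
    table.push(row);
  }
  // Report whether the row was actually added.
  return !alreadyPresent;
}
```

Because the table stays a regular array, it can still be sorted afterwards, which sidesteps the ordering concern raised above.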
Update: I have the fix ready. I'm simply waiting on #58 to get merged in, as the fix will build on it.
This was what I thought would fix it.
Is this the set you want? https://developer.mozilla.org/en-US/docs/Web/JavaScript/Reference/Global_Objects/Set
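As a rough sketch of how the built-in `Set` linked above could be used here: the set only tracks which keys have been seen, while the rows themselves stay in an ordinary array. The helper name `dedupeByKey` and the `fullName` field in the usage lines are hypothetical, chosen only for this example.

```javascript
// Minimal sketch: keep the first occurrence of each row, identified by
// whatever key the caller extracts (hypothetical helper, not project code).
function dedupeByKey(rows, keyOf) {
  const seen = new Set(); // built-in ES2015 Set, used only for membership
  const unique = [];
  for (const row of rows) {
    const key = keyOf(row);
    if (seen.has(key)) {
      continue; // duplicate: skip it
    }
    seen.add(key);
    unique.push(row); // original order of first occurrences is kept
  }
  return unique;
}

// Usage with made-up rows:
const rows = [
  { fullName: "alice/micro", stars: 3 },
  { fullName: "bob/micro", stars: 1 },
  { fullName: "alice/micro", stars: 3 },
];
const uniqueRows = dedupeByKey(rows, (row) => row.fullName);
```

Note that this only helps if the duplicates really are rows with identical keys; if the cause lies elsewhere, as the follow-up comment suggests, deduplicating at display time would hide the symptom rather than explain it.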
Yes, although I have already done an implementation with a list, which should have solved this issue, so it is clear that the problem is not what I thought it was.
Feel free to investigate, as I do not have much free time these days.
Not sure if this is expected somehow, but I found nothing in the Issues about this: I'm getting lots of duplicate results displayed, the same lines with the same values. When exported to CSV, it is the same.
For example, the file
useful-forks-https\ __github.com_zyedidia_micro_.csv
has 104 lines (including the header), while with duplicate lines removed, it only has 59.