[Export] Check whether exported data vote counts add up #1889

Open
1 of 2 tasks
jucor opened this issue Jan 29, 2025 · 2 comments

@jucor
Contributor

jucor commented Jan 29, 2025

@metasoarous has flagged in compdemocracy/openData#6 that the previously exported data files have an issue:

For at least some of the exports, vote counts from the comments.csv file don't always add up to what's actually in the participants-votes.csv file.

I have a feeling this is because the old clojure exporter was for at least some period of time pulling vote counts directly from the comments table in the database. I think these caches were never properly maintained by the server, and so became inaccurate.

This may have been fixed in the updated export functionality, but just flagging that there are at least some files here where these values are off.

We thus need to:

@jucor
Contributor Author

jucor commented Feb 1, 2025

Hi @metasoarous

Update: I'm digging into this. The current Clojure code

(defn get-export-data
  [darwin {:keys [zid zinvite update-math] :as kw-args}]
  (let [zid (or zid (postgres/get-zid-from-zinvite (:postgres darwin) zinvite))
        ;; assert zid
        votes (get-corrected-conversation-votes darwin zid)
        participants (get-participation-data darwin zid)
        ;; Should factor out into separate function
        conv (cond-> (utils/apply-kwargs load-conv darwin kw-args)
               update-math (conv/conv-update votes))
        comments (-> (get-comments-data darwin zid)
                     (enriched-comments-data votes)
                     (add-group-data-to-comments conv))]
    ;; ... (rest of the function body not shown in this excerpt)

seems to load the votes afresh from the votes table and then aggregate them per comment. It does not use any cached value from another table, at least as far as I can see.

Since you mention (emphasis added):

the old clojure exporter was for at least some period of time pulling vote counts directly from the comments table in the database.

I went back through the git history to 2016, when you switched from Mongo to Postgres, and the code still looked clean at that stage. I haven't gone further back than that, because reading an out-of-date codebase seems high-effort, low-return 😅

I'm considering loading all the conversations and comparing the counts to find where the issue occurred. Alternatively, to speed things up, could you point me to the export files where you've seen the problem? (It would save me time; if not, no worries.)
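A minimal sketch of that comparison, in TypeScript like the new export code: given already-parsed rows from comments.csv and participants-votes.csv, recompute the agree/disagree tallies from the individual participant votes and list the comments whose reported counts differ. The row shapes, the function name findTallyMismatches, and the vote encoding in participants-votes.csv (1 = agree, -1 = disagree, 0 = pass) are assumptions for illustration, not the actual export schema; CSV parsing is left to whatever CSV library is at hand.

```typescript
// Hypothetical row shapes; adapt to the actual CSV headers.
interface CommentSummaryRow {
  commentId: number;
  agrees: number; // as reported in comments.csv
  disagrees: number; // as reported in comments.csv
}

// One participant's votes from participants-votes.csv, keyed by comment id.
// ASSUMED encoding: 1 = agree, -1 = disagree, 0 = pass (blank cells omitted).
type ParticipantVoteRow = Map<number, number>;

interface Tally {
  agrees: number;
  disagrees: number;
}

interface Mismatch {
  commentId: number;
  reported: Tally;
  recomputed: Tally;
}

function findTallyMismatches(
  comments: CommentSummaryRow[],
  participants: ParticipantVoteRow[],
): Mismatch[] {
  // Recompute per-comment tallies from the individual participant votes.
  const recomputed = new Map<number, Tally>();
  for (const votes of participants) {
    votes.forEach((vote, commentId) => {
      const tally = recomputed.get(commentId) ?? { agrees: 0, disagrees: 0 };
      if (vote === 1) tally.agrees += 1;
      else if (vote === -1) tally.disagrees += 1;
      // Passes (0) are ignored: comments.csv is assumed to report only agrees/disagrees.
      recomputed.set(commentId, tally);
    });
  }

  // Compare the recomputed tallies against the counts reported in comments.csv.
  const mismatches: Mismatch[] = [];
  for (const c of comments) {
    const r = recomputed.get(c.commentId) ?? { agrees: 0, disagrees: 0 };
    if (r.agrees !== c.agrees || r.disagrees !== c.disagrees) {
      mismatches.push({
        commentId: c.commentId,
        reported: { agrees: c.agrees, disagrees: c.disagrees },
        recomputed: r,
      });
    }
  }
  return mismatches;
}

// Usage: comment 1 reports 5 agrees, but only one participant agreed with it.
const mismatches = findTallyMismatches(
  [
    { commentId: 0, agrees: 2, disagrees: 0 },
    { commentId: 1, agrees: 5, disagrees: 1 },
  ],
  [
    new Map<number, number>([[0, 1], [1, -1]]),
    new Map<number, number>([[0, 1], [1, 1]]),
  ],
);
console.log(mismatches.map((m) => m.commentId).join(", ")); // prints "1"
```

Running this over every exported conversation would flag exactly the files where the cached comment counts drifted from the per-participant votes.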

@jucor
Contributor Author

jucor commented Feb 1, 2025

I have also checked the new, fully TypeScript, synchronous real-time export route, which is now used in the report pages (screenshot below).

I can confirm that the comment export code written by @ballPointPenguin and @colinmegill recomputes the vote sums for each comment on demand, straight from the votes table, as it should. It does not use any cached value. See:

export async function sendCommentSummary(zid: number, res: Response) {
  const comments = new Map<number, CommentRow>();
  try {
    // First query: Load comments metadata
    const commentRows = (await pgQueryP_readOnly(
      "SELECT tid, pid, created, txt, mod, velocity, active FROM comments WHERE zid = ($1)",
      [zid]
    )) as CommentRow[];
    for (const comment of commentRows) {
      comment.agrees = 0;
      comment.disagrees = 0;
      comment.pass = 0;
      comments.set(comment.tid, comment);
    }
    // Second query: Count votes in a single pass
    stream_pgQueryP_readOnly(
      "SELECT tid, vote FROM votes WHERE zid = ($1) ORDER BY tid",
      [zid],
      (row) => {
        const comment = comments.get(row.tid);
        if (comment) {
          // note that -1 means agree and 1 means disagree
          if (row.vote === -1) comment.agrees += 1;
          else if (row.vote === 1) comment.disagrees += 1;
          else if (row.vote === 0) comment.pass += 1;
        } else {
          logger.warn(`Comment row not found for [zid=${zid}, tid=${row.tid}]`);
        }
      },
      // ... (remaining arguments and rest of the function not shown in this excerpt)

So we're clear on that front too: the new exports are good :)

To go further, if I have time, I'll load the old data exports, check their tallies, and compare them with the new ones. (Setting expectations: I have a few other issues and pull requests to get through ahead of this :) )
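The old-vs-new comparison could be sketched as below: index the new export's per-comment tallies by comment id, then report every comment whose agree/disagree counts differ from the old export's, or which is missing from the new one. The CommentTally shape and the diffExportTallies name are hypothetical, and the rows are assumed to be already parsed from each export's comments.csv.

```typescript
// Hypothetical row shape for per-comment tallies read from a comments.csv export.
interface CommentTally {
  commentId: number;
  agrees: number;
  disagrees: number;
}

// Returns ids of comments whose tallies differ between an old and a new export,
// including comments present in the old export but missing from the new one.
function diffExportTallies(oldRows: CommentTally[], newRows: CommentTally[]): number[] {
  const newById = new Map<number, CommentTally>();
  for (const row of newRows) newById.set(row.commentId, row);

  const changed: number[] = [];
  for (const oldRow of oldRows) {
    const newRow = newById.get(oldRow.commentId);
    if (
      newRow === undefined ||
      newRow.agrees !== oldRow.agrees ||
      newRow.disagrees !== oldRow.disagrees
    ) {
      changed.push(oldRow.commentId);
    }
  }
  return changed;
}

const changed = diffExportTallies(
  [
    { commentId: 0, agrees: 3, disagrees: 1 },
    { commentId: 1, agrees: 2, disagrees: 2 },
    { commentId: 2, agrees: 1, disagrees: 0 },
  ],
  [
    { commentId: 0, agrees: 3, disagrees: 1 },
    { commentId: 1, agrees: 4, disagrees: 2 },
  ],
);
console.log(changed.join(", ")); // prints "1, 2"
```

One caveat: if a conversation was still open after the old export was made, the new export may legitimately contain more votes, so a difference is not necessarily an error. Closed conversations give the cleanest signal.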

PS: Here is a screenshot of the new exports, in case you haven't seen them; they also include some curl URLs that may be of interest :)
[Screenshot: the new exports on the report page]
