-
Notifications
You must be signed in to change notification settings - Fork 2
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
fix(backend,website): calculate status counts efficiently & request only aggregates for submission portal #3279
base: main
Are you sure you want to change the base?
Conversation
@chaoran-chen don't approve too early, I'm not done yet. I've figured out how to solve all review page speed issues with a little extra hack 😁 |
haha, but it already looked good ;) but I can withdraw and review again when you're ready :)
Now it's really ready @chaoran-chen |
Speeds up request!
Reduced from 2s to 80ms!
…een backend and website
7e9fae0
to
541a1fe
Compare
Hey, I see it's not done yet, but great work already! 🥳 |
@fhennig it has been done for a few hours! Perf and inconsistency fixed completely, better than I could have imagined! |
resolves #3095, resolves #1612, resolves #3269, resolves #2880
helps with #3267
preview URL: https://profile-get-seqs.loculus.org
Summary
@fhennig correctly identified that the way we calculated status counts in getSequences was the culprit behind the inconsistent total counts.
We issued 4 independent db queries one after the other. In between statuses changed. What made it worse was that each of the calls was hugely inefficient, using a join on data use terms, ordering by accession column and much more that is totally unnecessary for the counts calculation.
The fluctuation could be fixed by higher transaction isolation but this is much much faster and efficient so should be enough (I've profiled with simple logs and tested).
Also fixes fact that log message misses mention of some filters.
Future work
There's still lots to be done but this goes some way and fixes the inconsistency bug.
We should not have this single endpoint that calculates much more than necessary. It's behind the review page slowness #3269: we really just want total numbers of sequences to review but the backend needs to send loads of unnecessary information to the website simply because we have no simpler endpoint.
Instead, we should spin up a few simple endpoints that do what the website needs - let's mark them as "non-public" in the backend API docs so that we don't need stability guarantees, they should be used solely by the website.
Screenshot
Reload the page and look at both screencasts together to see speedup.
This PR:
Before (main):
PR Checklist