Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

fix(backend,website): calculate status counts efficiently & request only aggregates for submission portal #3279

Open
wants to merge 10 commits into
base: main
Choose a base branch
from

Conversation

corneliusroemer
Copy link
Contributor

@corneliusroemer corneliusroemer commented Nov 23, 2024

resolves #3095, resolves #1612, resolves #3269, resolves #2880

helps with #3267

preview URL: https://profile-get-seqs.loculus.org

Summary

@fhennig correctly identified that the way we calculated status counts in getSequences was the culprit behind the inconsistent total counts.

We issued 4 independent db queries one after the other. In between statuses changed. What made it worse was that each of the calls was hugely inefficient, using a join on data use terms, ordering by accession column and much more that is totally unnecessary for the counts calculation.

The fluctuation could be fixed by higher transaction isolation but this is much much faster and efficient so should be enough (I've profiled with simple logs and tested).

Also fixes fact that log message misses mention of some filters.

Future work

There's still lots to be done but this goes some way and fixes the inconsistency bug.

We should not have this single endpoint that calculates much more than necessary. It's behind the review page slowness #3269: we really just want total numbers of sequences to review but the backend needs to send loads of unnecessary information to the website simply because we have no simpler endpoint.

Instead, we should spin up a few simple endpoints that do what the website needs - let's mark them as "non-public" in the backend API docs so that we don't need stability guarantees, they should be used solely by the website.

Screenshot

Reload the page and look at both screencasts together to see speedup.

This PR:
2024-11-23 07 49 50

Before (main):
2024-11-23 05 39 47

PR Checklist

  • All necessary documentation has been adapted.
  • The implemented feature is covered by an appropriate test.

@corneliusroemer corneliusroemer added the preview Triggers a deployment to argocd label Nov 23, 2024
@corneliusroemer corneliusroemer changed the title Let's add some logs for profiling fix(backend): calculate status counts efficiently and consistently with single and lean groupBy Nov 23, 2024
@corneliusroemer corneliusroemer marked this pull request as ready for review November 23, 2024 04:37
chaoran-chen
chaoran-chen previously approved these changes Nov 23, 2024
@corneliusroemer
Copy link
Contributor Author

@chaoran-chen don't approve too early, I'm not done yet. I've figured out how to solve all review page speed issues with a little extra hack 😁

@corneliusroemer corneliusroemer changed the title fix(backend): calculate status counts efficiently and consistently with single and lean groupBy fix(backend,website): calculate status counts efficiently & request only aggregates for submission portal Nov 23, 2024
@chaoran-chen chaoran-chen dismissed their stale review November 23, 2024 07:23

haha, but it already looked good ;) but I can withdraw and review again when you're ready :)

@corneliusroemer
Copy link
Contributor Author

Now it's really ready @chaoran-chen

@fhennig
Copy link
Contributor

fhennig commented Nov 23, 2024

Hey, I see it's not done yet, but great work already! 🥳

@corneliusroemer
Copy link
Contributor Author

@fhennig it has been done for a few hours! Perf and inconsistency fixed completely, better than I could have imagined!

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
preview Triggers a deployment to argocd
Projects
None yet
3 participants