fix(backend,website): calculate status counts efficiently & request only aggregates for submission portal #3279

corneliusroemer · 2024-11-23T03:08:48Z

resolves #3095, resolves #1612, resolves #3269, resolves #2880

helps with #3267

preview URL: https://profile-get-seqs.loculus.org

Summary

@fhennig correctly identified that the way we calculated status counts in getSequences was the culprit behind the inconsistent total counts.

We issued 4 independent DB queries one after the other, and statuses could change in between them. To make matters worse, each query was hugely inefficient: it joined on data use terms, ordered by the accession column, and did much more that is unnecessary for calculating counts.

The fluctuation could also be fixed with a higher transaction isolation level, but a single lean groupBy query is much faster and more efficient, so it should be enough (I've profiled with simple logs and tested).
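
To illustrate why separate per-status reads can add up to a total that is too high, here is a minimal in-memory sketch. It is not the backend code: the status names, the `Entry` shape and every function name below are hypothetical, and the "table" is just an array that changes after the first read.

```typescript
// Hypothetical status names and row shape, for illustration only.
type Status = "RECEIVED" | "IN_PROCESSING" | "PROCESSED" | "APPROVED";

interface Entry {
  accession: string;
  status: Status;
}

const ALL_STATUSES: Status[] = ["RECEIVED", "IN_PROCESSING", "PROCESSED", "APPROVED"];

function zeroCounts(): Record<Status, number> {
  return { RECEIVED: 0, IN_PROCESSING: 0, PROCESSED: 0, APPROVED: 0 };
}

// One pass over a single snapshot: the in-memory analogue of
// SELECT status, COUNT(*) ... GROUP BY status.
function countByStatus(snapshot: readonly Entry[]): Record<Status, number> {
  const counts = zeroCounts();
  for (const entry of snapshot) {
    counts[entry.status] += 1;
  }
  return counts;
}

// One independent read per status: the data may change between reads.
function countPerStatusSeparately(read: () => readonly Entry[]): Record<Status, number> {
  const counts = zeroCounts();
  for (const status of ALL_STATUSES) {
    counts[status] = read().filter((e) => e.status === status).length;
  }
  return counts;
}

// A two-row table in which A1 moves from RECEIVED to IN_PROCESSING
// right after the first read.
function makeChangingTable(): () => readonly Entry[] {
  const table: Entry[] = [
    { accession: "A1", status: "RECEIVED" },
    { accession: "A2", status: "PROCESSED" },
  ];
  let reads = 0;
  return () => {
    const snapshot = table.map((e) => ({ ...e }));
    reads += 1;
    const first = table[0];
    if (reads === 1 && first !== undefined) {
      first.status = "IN_PROCESSING";
    }
    return snapshot;
  };
}

function total(counts: Record<Status, number>): number {
  return ALL_STATUSES.reduce((sum, status) => sum + counts[status], 0);
}

const separate = countPerStatusSeparately(makeChangingTable());
const grouped = countByStatus(makeChangingTable()());

console.log(total(separate)); // 3: A1 is counted as RECEIVED and again as IN_PROCESSING
console.log(total(grouped)); // 2: one snapshot, every row counted exactly once
```

The grouped variant cannot double-count because every row is assigned to exactly one bucket within the same read.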

Also fixes the log message, which omitted some of the filters.

Future work

There's still lots to be done but this goes some way and fixes the inconsistency bug.

We should not have this single endpoint that calculates much more than necessary. It's behind the review page slowness (#3269): we really just want the total number of sequences to review, but the backend has to send loads of unnecessary information to the website simply because we have no simpler endpoint.

Instead, we should spin up a few simple endpoints that do exactly what the website needs. Let's mark them as "non-public" in the backend API docs so that we don't need to give stability guarantees; they should be used solely by the website.

Screenshot

Reload the page and look at both screencasts together to see the speedup.

This PR:

Before (main):

PR Checklist

All necessary documentation has been adapted.
The implemented feature is covered by an appropriate test.

corneliusroemer · 2024-11-23T06:47:00Z

@chaoran-chen don't approve too early, I'm not done yet. I've figured out how to solve all review page speed issues with a little extra hack 😁

haha, but it already looked good ;) but I can withdraw and review again when you're ready :)

corneliusroemer · 2024-11-23T08:46:40Z

Now it's really ready @chaoran-chen

Speeds up request!

Reduced from 2s to 80ms!
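
For context, the hack named in the commit messages is to ask for page 0 of size 0, so that no entries are fetched and only the aggregates come back. A minimal sketch of what such a request could look like on the website side follows. The parameter names `page` and `size`, the helper names, and the assumption that the backend counts pages from 0 while the website counts from 1 are all illustrative and not taken from the actual code.

```typescript
// Hypothetical request shape; the real parameter names may differ.
interface PageRequest {
  page: number; // assumed 0-indexed on the backend
  size: number;
}

// Convert a 1-indexed page number (assumed website convention)
// to a 0-indexed backend page request.
function toBackendPage(websitePage: number, size: number): PageRequest {
  if (Math.floor(websitePage) !== websitePage || websitePage < 1) {
    throw new RangeError("website pages are assumed to start at 1");
  }
  return { page: websitePage - 1, size };
}

// Page 0 of size 0: no entries requested, only the counts are of interest.
const COUNTS_ONLY: PageRequest = { page: 0, size: 0 };

function toQueryString(request: PageRequest): string {
  const pairs: Array<[string, string]> = [
    ["page", String(request.page)],
    ["size", String(request.size)],
  ];
  return pairs
    .map(([key, value]) => `${encodeURIComponent(key)}=${encodeURIComponent(value)}`)
    .join("&");
}

console.log(toQueryString(COUNTS_ONLY)); // page=0&size=0
console.log(toQueryString(toBackendPage(3, 20))); // page=2&size=20
```

Keeping the index conversion in one helper is one way to avoid the 0-indexed versus 1-indexed confusion that a later commit in this PR documents.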


fhennig · 2024-11-23T12:02:22Z

Hey, I see it's not done yet, but great work already! 🥳

corneliusroemer · 2024-11-23T16:31:54Z

@fhennig it has been done for a few hours! Perf and inconsistency fixed completely, better than I could have imagined!

website/src/pages/[organism]/submission/[groupId]/index.astro

corneliusroemer added the preview label (Triggers a deployment to argocd) Nov 23, 2024

corneliusroemer mentioned this pull request Nov 23, 2024

Review page: total number of sequences fluctuates and can be too high #3095

Open

corneliusroemer changed the title ~~Let's add some logs for profiling~~ fix(backend): calculate status counts efficiently and consistently with single and lean groupBy Nov 23, 2024

corneliusroemer marked this pull request as ready for review November 23, 2024 04:37

corneliusroemer requested review from fhennig, chaoran-chen, theosanderson and fengelniederhammer November 23, 2024 04:37

chaoran-chen previously approved these changes Nov 23, 2024

corneliusroemer changed the title ~~fix(backend): calculate status counts efficiently and consistently with single and lean groupBy~~ fix(backend,website): calculate status counts efficiently & request only aggregates for submission portal Nov 23, 2024

corneliusroemer mentioned this pull request Nov 23, 2024

Is getSequences endpoint really used in full by anything? It seems no longer fit for purpose: should it be replaced? #3280

Open

corneliusroemer requested a review from chaoran-chen November 23, 2024 08:46

corneliusroemer added 10 commits November 23, 2024 09:46

Let's add some logs for profiling

38f7625

log at info

aa97419

Let's try speedup 1 (and better consistency too)

e8d79e0

Speedup a lot

dc03e34

Slight refactor

5ea9397

Simplify getProcessingResultCounts

3a5791e

Bit of a hack: don't fetch any entries by asking for page 0 of size 0.

cf5b493

Speeds up request!

Wow, the speedup is amazing!

c43f045

Reduced from 2s to 80ms!

Indicate whether pages are 0 and 1 indexed, confusing difference between backend and website

98e6e8b

Simplify get sequences a bit more

541a1fe

corneliusroemer force-pushed the profile-get-seqs branch from 7e9fae0 to 541a1fe Compare November 23, 2024 08:46

corneliusroemer requested a review from anna-parker November 24, 2024 14:34

fhennig reviewed Nov 25, 2024

website/src/pages/[organism]/submission/[groupId]/index.astro
