Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

feat: improve review page loading times #3189

Closed
wants to merge 2 commits into from
Closed

Conversation

fhennig
Copy link
Contributor

@fhennig fhennig commented Nov 7, 2024

resolves #1000

preview URL: https://improve-review-load.loculus.org

Summary

This PR changes it so that all data for the whole page is fetched in a single request. Before, every ReviewCard made a separate request for its own data.

Screenshot

n/A -- no visual changes.

PR Checklist

  • All necessary documentation has been adapted.
  • The implemented feature is covered by an appropriate test.

@fhennig fhennig added the preview Triggers a deployment to argocd label Nov 7, 2024
@fhennig fhennig changed the base branch from main to feat/3031-improve-review-page November 11, 2024 15:52
@fhennig fhennig changed the title Improve review load (WIP) feat: improve review page loading times Nov 12, 2024
Base automatically changed from feat/3031-improve-review-page to main November 12, 2024 22:52
Copy link
Member

@theosanderson theosanderson left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Sorry for the slow review, and sorry that this issue wasn't fully specced. Unfortunately I think the PR at present is problematic as it could use 20 MB of bandwidth every few seconds.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think that if we were to make the changes in this PR we need to not send sequences in the endpoint used on the review page. This is is because:

  • sequences for some viruses are 200 kb, 100 per page = 20 MB, polling constantly = a lot of bandwidth

We may want to delay making any change here for now for simplicity to avoid having to add extra types etc.

@fhennig
Copy link
Contributor Author

fhennig commented Nov 18, 2024

Thanks for bringing that up Theo! I didn't realize that the sequences were in there too. In that case you're right, I think it's best to shelve this again for now, as we have something that's currently working and we initially thought that this might be a "quick win".

I'm closing this for now.

@corneliusroemer
Copy link
Contributor

It's not really working well though. Load times of 15+ seconds if there are 15k sequences aren't really acceptable.

@fhennig
Copy link
Contributor Author

fhennig commented Nov 21, 2024

Maybe worth discussing next monday? pinging @chaoran-chen

I haven't looked into it that much, but it seems like it might be a good idea to try and redesign it so we can send the metadata withtout the raw sequence data - which is the largest chunk of the data in my understanding. Maybe us 3 (Theo, Cornelius and me) can have a refinement meeting on this ticket and see what would need to get done for this, and then we can decide if the amount of work is something we want to put in now or later.

@chaoran-chen
Copy link
Member

Sounds good, I added it to the agenda for next Monday.

@corneliusroemer
Copy link
Contributor

corneliusroemer commented Nov 21, 2024

There might be multiple potential causes here:

  • sending sequence data (totally unnecessary) - but that shouldn't itself cause that much of a slowdown alone especially as many of the sequences loaded for WNV are very short there isn't that much data to be transferred in practice.
  • slow performance of backend endpoint even for small number of sequences, maybe due to a sort or something like that in the SQL, see e.g. Review page (and /get-sequences endpoint) is slow #1612
  • we also send lots of nulls for processed data which is wasteful, we only care about the non-null fields (can assume nulls for the others)

I've made an issue #3269

@theosanderson
Copy link
Member

The issue might be uncompressing the sequence data rather than sending it (we at most send a page's worth of data IIUC)

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
preview Triggers a deployment to argocd
Projects
None yet
Development

Successfully merging this pull request may close these issues.

rework get-sequences endpoint to return metadata
4 participants