Optimizations to support large tracklists #4499

Draft
wants to merge 1 commit into
base: main


cmdcolin
Collaborator

@cmdcolin cmdcolin commented Jul 26, 2024

This is a re-opening of the frozen tracks PR.

It will be a challenging PR to get fully merged, but interested users can try the branch out.

Two optimizations were added to frozen_tracks4 today: (a) removing the clone module, which was accidentally quadratic and introduced big slowdowns after around 64000 tracks, and (b) a change to the generateHierarchy function for the track selector to make it faster.
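
To illustrate how a deep clone can end up accidentally quadratic, here is a minimal sketch. It is an illustration only, not the code of the clone module or of JBrowse; `cloneWithArraySeen`, `cloneWithMapSeen` and `makeTracks` are hypothetical names. The first version remembers visited objects in a plain array and scans it on every lookup; the second uses a `Map`, so each lookup is constant time.

```typescript
// Illustrative sketch only: NOT the code of the clone module or of JBrowse.
type Json = null | boolean | number | string | Json[] | { [key: string]: Json }

// Deep clone that remembers visited objects in a plain array.
// Every lookup scans that array, so n objects cost about n^2 / 2 comparisons.
function cloneWithArraySeen(root: Json): { copy: Json; comparisons: number } {
  const seenSrc: object[] = []
  const seenDst: Json[] = []
  let comparisons = 0
  const visit = (value: Json): Json => {
    if (value === null || typeof value !== 'object') {
      return value
    }
    for (let i = 0; i < seenSrc.length; i++) {
      comparisons++
      if (seenSrc[i] === value) {
        return seenDst[i]
      }
    }
    if (Array.isArray(value)) {
      const arr: Json[] = []
      seenSrc.push(value)
      seenDst.push(arr)
      for (const item of value) {
        arr.push(visit(item))
      }
      return arr
    }
    const obj: { [key: string]: Json } = {}
    seenSrc.push(value)
    seenDst.push(obj)
    for (const [key, child] of Object.entries(value)) {
      obj[key] = visit(child)
    }
    return obj
  }
  const copy = visit(root)
  return { copy, comparisons }
}

// Same clone, but visited objects live in a Map, so lookups are O(1).
function cloneWithMapSeen(root: Json): Json {
  const seen = new Map<object, Json>()
  const visit = (value: Json): Json => {
    if (value === null || typeof value !== 'object') {
      return value
    }
    const hit = seen.get(value)
    if (hit !== undefined) {
      return hit
    }
    if (Array.isArray(value)) {
      const arr: Json[] = []
      seen.set(value, arr)
      for (const item of value) {
        arr.push(visit(item))
      }
      return arr
    }
    const obj: { [key: string]: Json } = {}
    seen.set(value, obj)
    for (const [key, child] of Object.entries(value)) {
      obj[key] = visit(child)
    }
    return obj
  }
  return visit(root)
}

// Hypothetical flat track list with n entries.
function makeTracks(n: number): Json {
  return Array.from({ length: n }, (_, i) => ({
    trackId: `track${i}`,
    name: `Track ${i}`,
  }))
}
```

With 1000 flat track objects the array-based version performs 500500 comparisons, and doubling the list to 2000 roughly quadruples that, which is the kind of growth that stays invisible at small sizes and dominates at tens of thousands of tracks.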

cc @Maarten-vd-Sande

this branch (frozen_tracks4)

2.json  0.956s total
4.json  0.944s total
8.json  0.944s total
16.json  0.949s total
32.json  0.960s total
64.json  0.959s total
128.json  0.923s total
256.json  0.905s total
512.json  0.941s total
1024.json  0.934s total
2048.json  0.911s total
4096.json  0.995s total
8192.json  1.069s total
16384.json  1.097s total
32768.json  1.183s total
65536.json  1.463s total
131072.json  2.021s total
262144.json  3.212s total

before today on frozen_tracks4

2.json  4.234s total
4.json  0.935s total
8.json  0.917s total
16.json  0.931s total
32.json  0.987s total
64.json  0.918s total
128.json  0.943s total
256.json  0.904s total
512.json  0.948s total
1024.json  0.991s total
2048.json  0.960s total
4096.json  1.079s total
8192.json  1.369s total
16384.json  2.578s total
32768.json  7.298s total
65536.json  25.797s total
131072.json  1m34.86s total
...timeout at 262144...

main branch

2.json  0.935s total
4.json  1.044s total
8.json  1.021s total
16.json  1.006s total
32.json  1.086s total
64.json  1.216s total
128.json  1.491s total
256.json  1.981s total
512.json  2.828s total
1024.json  4.876s total
2048.json  8.627s total
4096.json  16.225s total
8192.json  42.861s total
...timeout at 16384...
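
As a rough way to read the three listings above, the sketch below computes how much the total time grows each time the track count doubles, using the largest sizes from each listing. It assumes that a cost proportional to n^k multiplies by 2^k per doubling, and it ignores the fixed startup cost of about 0.9s that dilutes the ratios at small sizes. The variable and function names are hypothetical.

```typescript
// Timings in seconds, copied from the listings above (1m34.86s = 94.86s).
// Each entry is [track count, total seconds].
const nowOnBranch: Array<[number, number]> = [
  [65536, 1.463],
  [131072, 2.021],
  [262144, 3.212],
]
const beforeToday: Array<[number, number]> = [
  [32768, 7.298],
  [65536, 25.797],
  [131072, 94.86],
]
const mainBranch: Array<[number, number]> = [
  [2048, 8.627],
  [4096, 16.225],
  [8192, 42.861],
]

// Ratio of each timing to the previous one (one ratio per doubling).
function doublingRatios(rows: Array<[number, number]>): number[] {
  const ratios: number[] = []
  let previous: number | undefined
  for (const [, seconds] of rows) {
    if (previous !== undefined) {
      ratios.push(seconds / previous)
    }
    previous = seconds
  }
  return ratios
}

for (const [label, rows] of [
  ['frozen_tracks4 now', nowOnBranch],
  ['frozen_tracks4 before today', beforeToday],
  ['main', mainBranch],
] as Array<[string, Array<[number, number]>]>) {
  const formatted = doublingRatios(rows).map(r => r.toFixed(2)).join(', ')
  console.log(`${label}: x${formatted} per doubling`)
}
```

Before today's changes the branch grew by roughly 3.5x to 3.7x per doubling at the large end, close to the 4x a quadratic term would give, while the current branch grows by about 1.4x to 1.6x over its largest sizes.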

some code for testing https://github.com/cmdcolin/jb2-large-tracklist-profiling
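
For readers who want to build similar test inputs themselves, here is a minimal sketch of generating synthetic track lists at power-of-two sizes, matching the file names in the listings above (2.json up to 262144.json). This is not the code in the linked repository. The track fields are assumed to follow the general shape of a JBrowse 2 track configuration and should be checked against the JBrowse configuration docs; `makeSyntheticTracks` and `configSizes` are hypothetical names, and the assemblies section of the config is omitted.

```typescript
// Hypothetical generator; the linked repository may build its configs differently.
interface SyntheticTrack {
  type: string
  trackId: string
  name: string
  assemblyNames: string[]
  // Assumption: an in-config adapter with an empty feature list, so no data files are needed.
  adapter: { type: string; features: unknown[] }
}

// Build `count` placeholder tracks for one assembly.
function makeSyntheticTracks(count: number, assemblyName: string): SyntheticTrack[] {
  return Array.from({ length: count }, (_, i) => ({
    type: 'FeatureTrack',
    trackId: `synthetic_track_${i}`,
    name: `Synthetic track ${i}`,
    assemblyNames: [assemblyName],
    adapter: { type: 'FromConfigAdapter', features: [] },
  }))
}

// Sizes 2, 4, 8, ... up to 2^maxExponent.
function configSizes(maxExponent: number): number[] {
  return Array.from({ length: maxExponent }, (_, i) => 2 ** (i + 1))
}

// Example: print the file name and track count for each benchmark size.
for (const size of configSizes(18)) {
  const tracks = makeSyntheticTracks(size, 'volvox')
  console.log(`${size}.json would contain ${tracks.length} tracks`)
}
```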

@cmdcolin cmdcolin changed the title from "Optimizations to support for large tracklists" to "Optimizations to support large tracklists" Jul 26, 2024
@cmdcolin cmdcolin added the "scalability" label (related to speed and/or scalability) Jul 26, 2024
@Maarten-vd-Sande
Contributor

Maarten-vd-Sande commented Jul 29, 2024

While the initial load is very fast 🥳 , it introduced a bug loading tracks:

Error: HTTP 404 fetching data/dm61_FS_ampliconset-dec-2023_800nt.ampliconref/alignments/24-FS-13/P001__WB09__24-FS-13__24MB03939-1034_64a6acfdfd.cram bytes 0-131071
../../../packages/core/util/io/RemoteFileWithRangeCache.ts:94:13 (fetchBinaryRange@)

JBrowse 2.13.0

It even happens with the test data:

Error: HTTP 404 fetching volvox.filtered.vcf.gz.tbi
../../../node_modules/generic-filehandle/src/remoteFile.ts:171:13 (readFile@)

JBrowse 2.13.0

@cmdcolin
Collaborator Author

Interesting. I'm not sure if I fixed it, but I pushed another change just now that could potentially help. You can try refetching the branch with `jbrowse upgrade --branch frozen_tracks4` again.

@Maarten-vd-Sande
Contributor

Yes that seems to work! 🙏

@Maarten-vd-Sande
Contributor

So far I'm very grateful for all the changes. I don’t mean to push for more, but I wanted to share some observations based on my experience with roughly 100,000 tracks across two assemblies (~50,000 each):

  • Initial load is really fast (2 seconds), very happy with that ⚡
  • Opening the track selector takes about 7 seconds. Not too bad, but the page looks like it's frozen.
  • Opening a track from either the track selector or the faceted track selector also takes around 7 seconds. Clicking on other tracks while one is already loading does queue the next track, but it also prolongs the time the page stays frozen.
  • Searching for text is decently fast (about 2 seconds), especially considering the amount of metadata I attach to the tracks, but the page again freezes.
  • Opening from a URL query string takes quite long. For example, 4 BAM tracks, the assembly FASTA, and 2 GTF tracks take 90 seconds to load. During this time the page is frozen again.

These are just some positive experiences and pain points I’ve encountered. Please feel free to ignore this if it’s not actionable right now; I just wanted to share in case it helps!

Remove clone. It seems to be unneeded, even though it will mutate the original object, this doesn't seem like it should matter

Another optimization to the track selector

Use structuredClone

Misc lint fixes

Misc
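
The commit messages above mention both dropping the clone step (accepting that the original object is mutated) and using structuredClone. The trade-off can be sketched as follows. `TrackConf` and `addDefaultCategory` are hypothetical names for illustration, not JBrowse code; `structuredClone` is the built-in global available in modern browsers and in Node 17+.

```typescript
interface TrackConf {
  trackId: string
  name: string
  category?: string[]
}

// Hypothetical helper that fills in a default by mutating its argument.
function addDefaultCategory(track: TrackConf): TrackConf {
  if (!track.category) {
    track.category = ['Uncategorized']
  }
  return track
}

// Without a clone, the caller's object is changed in place (no copy cost).
const shared: TrackConf = { trackId: 't1', name: 'Track 1' }
const mutated = addDefaultCategory(shared)
console.log(mutated === shared, shared.category)

// With structuredClone, the helper works on a deep copy and the input is left alone.
const pristine: TrackConf = { trackId: 't2', name: 'Track 2' }
const copied = addDefaultCategory(structuredClone(pristine))
console.log(copied === pristine, pristine.category)
```

Skipping the copy is cheapest but only safe when nothing else relies on the original object staying unchanged; a deep copy keeps the input intact at the cost of visiting every value once.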
@cmdcolin
Collaborator Author

cmdcolin commented Nov 19, 2024

@Maarten-vd-Sande thanks for commenting. Just to aid reproducibility, here are some links that can be used for testing (the same configs used for the small benchmark above).

There might be some opportunities for optimizing, but it would probably take a bit of a deep dive. We use this library for rendering the tree, and it exposes a fair amount of low-level stuff: https://github.com/Lodin/react-vtree

I made a comment on their repo (a while ago!) that resulted in some discussion about "filtering the tree", but it is really more about the fundamental operation of recomputing the tree, which happens when a track opens or closes, etc. Lodin/react-vtree#61
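
To make the "recomputing the tree" cost concrete, here is a minimal sketch of rebuilding a category tree from a flat track list in a single pass. It is not the generateHierarchy function from JBrowse and does not use react-vtree; `Track`, `TreeNode` and `buildHierarchy` are hypothetical names, and the `category` array on each track is an assumption about the input shape. A `Map` keyed by category path avoids scanning sibling lists, so one rebuild stays roughly linear in the number of tracks.

```typescript
interface Track {
  trackId: string
  name: string
  category?: string[]
}

interface TreeNode {
  id: string
  name: string
  children: TreeNode[]
  track?: Track
}

// Build the whole tree in one pass over the tracks.
// Assumes category names do not contain '/', which is used to join path keys.
function buildHierarchy(tracks: Track[]): TreeNode {
  const root: TreeNode = { id: 'root', name: 'Tracks', children: [] }
  const categoryByPath = new Map<string, TreeNode>()
  for (const track of tracks) {
    let parent = root
    let path = ''
    for (const segment of track.category ?? []) {
      path = path === '' ? segment : `${path}/${segment}`
      let node = categoryByPath.get(path)
      if (!node) {
        // First time this category path is seen: create it under its parent.
        node = { id: `category:${path}`, name: segment, children: [] }
        categoryByPath.set(path, node)
        parent.children.push(node)
      }
      parent = node
    }
    parent.children.push({ id: track.trackId, name: track.name, children: [], track })
  }
  return root
}

const demoTree = buildHierarchy([
  { trackId: 'a', name: 'A', category: ['Genes', 'Coding'] },
  { trackId: 'b', name: 'B', category: ['Genes', 'Coding'] },
  { trackId: 'c', name: 'C', category: ['Genes'] },
  { trackId: 'd', name: 'D' },
])
console.log(JSON.stringify(demoTree, null, 2))
```

Even a linear rebuild touches every track, so with hundreds of thousands of tracks it is still worth avoiding a full rebuild on every open or close where possible.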

local dev links

http://localhost:3000/?config=https://jbrowse.org/demos/large_configs/8192.json
http://localhost:3000/?config=https://jbrowse.org/demos/large_configs/16384.json
http://localhost:3000/?config=https://jbrowse.org/demos/large_configs/32768.json
http://localhost:3000/?config=https://jbrowse.org/demos/large_configs/65536.json
http://localhost:3000/?config=https://jbrowse.org/demos/large_configs/131072.json
http://localhost:3000/?config=https://jbrowse.org/demos/large_configs/262144.json

live version of frozen_tracks4

https://jbrowse.org/code/jb2/frozen_tracks4/?config=https://jbrowse.org/demos/large_configs/8192.json
https://jbrowse.org/code/jb2/frozen_tracks4/?config=https://jbrowse.org/demos/large_configs/16384.json
https://jbrowse.org/code/jb2/frozen_tracks4/?config=https://jbrowse.org/demos/large_configs/32768.json
https://jbrowse.org/code/jb2/frozen_tracks4/?config=https://jbrowse.org/demos/large_configs/65536.json
https://jbrowse.org/code/jb2/frozen_tracks4/?config=https://jbrowse.org/demos/large_configs/131072.json
https://jbrowse.org/code/jb2/frozen_tracks4/?config=https://jbrowse.org/demos/large_configs/262144.json

@Maarten-vd-Sande
Contributor

Thanks for your response. When it's a bit quieter here I will try to study this a bit more consistently. Because indeed, the 26k tracks load more than twice as fast as the 50k-ish tracks I have here.. 🤔
