docs: describe ways to design the schema #2409

chaoran-chen · 2024-08-09T20:19:47Z

I reused a Slack message I sent a while ago and added a page to the docs to describe different ways to model the schema of a Loculus instance. Very happy to receive suggestions for improvement!

docs/src/content/docs/for-administrators/schema-designs.md

chaoran-chen · 2024-08-14T14:53:18Z

@theosanderson, @anna-parker, thanks for your reviews! I adapted the text, could you please check whether this is better?

anna-parker · 2024-08-14T17:45:28Z

docs/src/content/docs/for-administrators/schema-designs.md

+
+### Multiple references for an organism
+
+An organism has one unaligned sequence (per segment) but multiple aligned ones. Users submit an unaligned nucleotide sequence and the processing pipeline aligns it against all multiple references.


Maybe still add that this would require a custom preprocessing pipeline; as written, it sounds like our prepro pipeline could do this now, and it can't.

I have low confidence that this would work at all at the moment, even with a custom pipeline. (I could imagine that it might, but I'm just not remotely confident, given that we haven't tried it and that, often, when one first tries something, one finds a problem.)

I added a general note that developing a preprocessing pipeline might be needed. I didn't make it model-specific because I consider the need to develop a preprocessing pipeline very "normal". We provide one opinionated pipeline with a few features, but in many cases I imagine that someone would want to develop a new one.

To give an example of what I mean: presumably the custom-built preprocessing pipeline would take as input, e.g., the L segment (the unaligned sequence, which would have the name L) and would output two alignments, LalignedToSequenceX and LalignedToSequenceY (not necessarily those names, but something like them). But it looks like our code for displaying aligned sequences iterates over the same nucleotideSegmentNames for both aligned and unaligned sequences:

loculus/website/src/components/SequenceDetailsPage/SequencesContainer.tsx, line 154 in cec2fa9:

{nucleotideSegmentNames.map((segmentName) => (

IMO, we can't suggest this model while that's the case.
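To make the mismatch concrete, here is a minimal TypeScript sketch under stated assumptions: `alignAgainstAllReferences` and `alignedSegmentsShown` are hypothetical helpers, not Loculus code; the identity aligner is a placeholder for a real aligner; and `alignedSegmentsShown` is a simplified stand-in for the lookup in SequencesContainer.tsx, not its actual logic. It shows that if a pipeline keys its outputs by per-reference names while the display iterates over the configured segment names, none of the alignments are found.

```typescript
// Hypothetical sketch: names and shapes are illustrative, not Loculus's actual types.
type SequenceMap = Record<string, string>;

// Hypothetical custom preprocessing step for the "multiple references" model:
// one unaligned input segment is aligned against every reference, producing
// one output per reference named `${segmentName}alignedTo${referenceName}`.
function alignAgainstAllReferences(
  segmentName: string,
  unaligned: string,
  references: Record<string, string>,
  align: (query: string, reference: string) => string,
): SequenceMap {
  const aligned: SequenceMap = {};
  for (const [referenceName, referenceSequence] of Object.entries(references)) {
    aligned[`${segmentName}alignedTo${referenceName}`] = align(unaligned, referenceSequence);
  }
  return aligned;
}

// Simplified stand-in for the display logic: aligned sequences are looked up by
// the same configured segment names that are used for unaligned sequences.
function alignedSegmentsShown(nucleotideSegmentNames: string[], aligned: SequenceMap): string[] {
  return nucleotideSegmentNames.filter((segmentName) => segmentName in aligned);
}

// Placeholder aligner that returns the query unchanged (a real pipeline would align).
const identityAligner = (query: string): string => query;

const aligned = alignAgainstAllReferences(
  "L",
  "ACGT",
  { SequenceX: "ACGT", SequenceY: "ACGA" },
  identityAligner,
);
console.log(Object.keys(aligned).join(", ")); // LalignedToSequenceX, LalignedToSequenceY
console.log(alignedSegmentsShown(["L"], aligned).length); // 0
```

With only `L` configured as a segment, the lookup finds nothing, which is the gap described above.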

What one can do is to have all three segments: L, LalignedToSequenceX and LalignedToSequenceY. Users only upload sequences to the L segment, whereas the pipeline generates aligned sequences for the other two. The UI wouldn't be very nice at the moment, but once we fix #2376, it should be mostly alright. (It's still not perfect: in the download panel, one would be able to select "Aligned L" and "Unaligned LalignedToSequenceX", but I wouldn't consider that a blocker.)
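A minimal sketch of why the download panel would offer those odd options, assuming (hypothetically) that the options are derived by crossing every configured segment with aligned/unaligned. `downloadOptions` and `presumablyEmptyOptions` are illustrative names, not the actual download panel code.

```typescript
// Hypothetical sketch: NOT the actual Loculus download panel code.
type SequenceKind = "Aligned" | "Unaligned";

// The workaround: all three names are declared as ordinary nucleotide segments.
const nucleotideSegmentNames: string[] = ["L", "LalignedToSequenceX", "LalignedToSequenceY"];

// Submitters only upload L; the pipeline fills the other two with alignments.
const uploadedSegments = new Set<string>(["L"]);

// Naive derivation: every configured segment is offered both aligned and unaligned.
function downloadOptions(segments: string[]): string[] {
  const kinds: SequenceKind[] = ["Aligned", "Unaligned"];
  return segments.flatMap((segment) => kinds.map((kind) => `${kind} ${segment}`));
}

// Options that would presumably be empty under the workaround: the uploaded segment
// has no aligned sequence, and the generated segments have no unaligned one.
function presumablyEmptyOptions(segments: string[]): string[] {
  return segments.map((segment) =>
    uploadedSegments.has(segment) ? `Aligned ${segment}` : `Unaligned ${segment}`,
  );
}

console.log(downloadOptions(nucleotideSegmentNames).length); // 6
console.log(presumablyEmptyOptions(nucleotideSegmentNames).join(", "));
// Aligned L, Unaligned LalignedToSequenceX, Unaligned LalignedToSequenceY
```

Half of the six options would point at sequences that the workaround never produces, which matches the "not perfect, but not a blocker" assessment.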

Oh, interesting idea. But if we do that (if I'm understanding correctly), then our UI will show up like this for submission:

And fairly soon, I think, we would like to add submission through a form as an option, which would then have a box for people to paste their LalignedtoX sequence. Actually, our edit form might already show this? (Not sure.)

I know it's not ideal, but in my opinion, whether this is acceptable depends a lot on the use case. For a public-facing instance with many external users, this is probably problematic. For a lab-internal instance, or if someone wants to build a dashboard or run analyses on top of LAPIS and use Loculus just for storing and preprocessing the data (as we do in GenSpectrum), this might be entirely fine.

I agree! I think my issue is that we don't really communicate to users that this model is an untested hack that may work, whereas we have tried the "default" model a lot. (We now have a caveat that we haven't tried all the models, but we don't say which ones.) I expect that in the future we will actually design features around the ability to do this, at which point the configuration would distinguish between input sequences and aligned sequences, so we wouldn't have these issues. I made #2433 to try to make clear to users that we're not in that place yet, if that makes sense.
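One hypothetical shape such a configuration could take, sketched in TypeScript. `SequenceConfig`, `inputSegments` and `alignedSequences` are invented names, not an existing or planned Loculus schema. Separating the two lists would let the submission form show only the inputs and the download panel offer each sequence only in the form it actually exists in.

```typescript
// Hypothetical configuration shape; none of these names exist in Loculus today.
interface AlignedSequenceConfig {
  name: string; // e.g. "LalignedToSequenceX"
  inputSegment: string; // the uploaded segment it is derived from
  reference: string; // the reference it is aligned against
}

interface SequenceConfig {
  inputSegments: string[]; // what submitters upload
  alignedSequences: AlignedSequenceConfig[]; // what the pipeline produces
}

const config: SequenceConfig = {
  inputSegments: ["L"],
  alignedSequences: [
    { name: "LalignedToSequenceX", inputSegment: "L", reference: "SequenceX" },
    { name: "LalignedToSequenceY", inputSegment: "L", reference: "SequenceY" },
  ],
};

// Form fields come only from the inputs, so there is no box for an aligned sequence.
function submissionFormFields(c: SequenceConfig): string[] {
  return [...c.inputSegments];
}

// Download options no longer mix kinds: unaligned for inputs, aligned for outputs.
function downloadOptions(c: SequenceConfig): string[] {
  return [
    ...c.inputSegments.map((segment) => `Unaligned ${segment}`),
    ...c.alignedSequences.map((alignedSequence) => `Aligned ${alignedSequence.name}`),
  ];
}

console.log(submissionFormFields(config).join(", ")); // L
console.log(downloadOptions(config).join(", "));
// Unaligned L, Aligned LalignedToSequenceX, Aligned LalignedToSequenceY
```

Tying each aligned sequence to its input segment and reference keeps the relationship explicit in the configuration instead of encoding it in a naming convention.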

docs/src/content/docs/for-administrators/schema-designs.md

Co-authored-by: Theo Sanderson <[email protected]>

chaoran-chen · 2024-08-14T20:17:59Z

Thanks for the feedback, @anna-parker and @theosanderson! I'll merge this now but I agree that it's not perfect and am happy to continue improving it.

chaoran-chen requested review from theosanderson, corneliusroemer and anna-parker August 9, 2024 20:19

theosanderson reviewed Aug 9, 2024

docs/src/content/docs/for-administrators/schema-designs.md (resolved)

theosanderson reviewed Aug 9, 2024

docs/src/content/docs/for-administrators/schema-designs.md (resolved)

theosanderson reviewed Aug 9, 2024

docs/src/content/docs/for-administrators/schema-designs.md (resolved)

anna-parker reviewed Aug 10, 2024

docs/src/content/docs/for-administrators/schema-designs.md (outdated, resolved)

chaoran-chen force-pushed the docs-organism-modeling branch from f7d5ae5 to 1bf1d2b August 13, 2024 11:06

chaoran-chen added 2 commits August 14, 2024 16:20

docs: describe ways to design the schema

ad7bd6a

address review comments

a369c82

chaoran-chen force-pushed the docs-organism-modeling branch from 1bf1d2b to a369c82 August 14, 2024 14:52

chaoran-chen requested review from theosanderson and anna-parker August 14, 2024 14:53

anna-parker approved these changes Aug 14, 2024

theosanderson reviewed Aug 14, 2024

docs/src/content/docs/for-administrators/schema-designs.md (outdated, resolved)

theosanderson reviewed Aug 14, 2024

docs/src/content/docs/for-administrators/schema-designs.md (outdated, resolved)

theosanderson reviewed Aug 14, 2024

docs/src/content/docs/for-administrators/schema-designs.md (outdated, resolved)

theosanderson reviewed Aug 14, 2024

docs/src/content/docs/for-administrators/schema-designs.md (resolved)

chaoran-chen and others added 2 commits August 14, 2024 22:04

Apply suggestions from code review

1e5001b

Co-authored-by: Theo Sanderson <[email protected]>

Address review comments

03963ae

chaoran-chen merged commit cec2fa9 into main Aug 14, 2024
10 checks passed

chaoran-chen deleted the docs-organism-modeling branch August 14, 2024 20:18

theosanderson mentioned this pull request Aug 14, 2024

docs: add caveats about some schema models #2433

Open
