docs: add caveats about some schema models #2433

theosanderson · 2024-08-14T21:47:25Z

We've been having some discussions in #2409 about descriptions of alternative schemas. My main concern isn't that we mention them, but that we don't give users enough context about how well tested each one is and how well it is expected to work. This tries to address that.

theosanderson · 2024-08-14T21:47:44Z

Old version with warnings that we are no longer using:

chaoran-chen · 2024-08-14T22:20:08Z

I am happy with the note regarding the confusing UI elements, but I don't think that we should warn about the need to develop a pipeline. If someone wants to use UShER to call Pango lineages or Dengue-GLUE to call dengue lineages, or just wants to do something with metadata that the Nextclade pipeline does not support, they also have to develop a new pipeline. Instead of warning people, I would actually want to encourage people to build new pipelines.

theosanderson · 2024-08-14T22:33:19Z

I think this comes in part from my experience working with someone to set up a Loculus instance for another organism. It was quite a hard, involved process with a lot of troubleshooting, despite a Nextclade organism already existing for their organism and them fitting into our default model! I think anyone who wants to use Loculus should probably start off by setting up an instance with an off-the-shelf pipeline (as soon as we can, we should provide an off-the-shelf pipeline for the no-alignment case, which could be what they would use). Once they are confident about how to do that, they can move on to writing a pipeline. Trying to write a pipeline as part of deploying one's very first instance would be a really steep learning curve: you wouldn't know whether problems were due to the general configuration or the pipeline. I think we need to communicate that one of these things is a lot harder than the other.

theosanderson · 2024-08-14T22:34:17Z

And essentially, some people may want to write their own custom pipelines, and some people may want to use a Nextclade-based pipeline, so people should know what their schema choice will mean for which of those options are available to them, IMO.

chaoran-chen · 2024-08-15T08:39:48Z

> I think anyone who wants to use Loculus should probably start off by setting up an instance with an off-the-shelf pipeline

I think for training/learning about Loculus, we should create a tutorial where one would set up a new instance for a specific pathogen and use an existing pipeline (and more tutorials on how to get more advanced and write new pipelines). For a real instance, however, it depends on the actual use case and requirements whether an existing pipeline is sufficient or not.

> And essentially, some people may want to write their own custom pipelines, and some people may want to use a Nextclade-based pipeline, and so people should know what their schema choice will mean about which of those options are available to them, IMO

There are many factors that determine whether our Nextclade pipeline works or not, so I don't think that this should be specifically added to the schema page. What about creating a new page with a list of reusable/well-maintained preprocessing pipelines that will initially only consist of our Nextclade pipeline but can later easily be extended with other pipelines from the community? We will add a short description to each pipeline about the features it supports. For an instance, you can then check the list and determine whether an existing pipeline is likely to work or not. For the Nextclade pipeline, we can explicitly say that it only supports the first example schema model at this moment.

In other words, I think that it is a "limitation" of the Nextclade pipeline (limitation in quotes as we may simply consider it to be out-of-scope and that's entirely fine) that it does not support certain schema models, and not a limitation of the schema models/core Loculus.

anna-parker

I like adding this warning for the case of multiple references for an organism. If we want to encourage use, maybe we could say we are happy to help if people run into issues. But I also agree we should let people know, because writing a preprocessing pipeline for Loculus from scratch currently requires a lot of knowledge about the structure of the input and output expected by the backend and website, and is a lot of work.

I think we might be able to remove the warning for "No alignments" though, and just say they can use the dummy preprocessing pipeline, which essentially does nothing (unless we think this is also not tested enough, in which case I'm also fine with a warning).

docs/src/content/docs/for-administrators/schema-designs.mdx

chaoran-chen · 2024-08-15T10:13:46Z

I know that writing a pipeline takes effort, and setting Loculus up in general takes effort, but these are not intrinsic to the schema models, so I think that a warning in this document is inappropriate. It makes much more sense to me to mention it in a document about pipelines.

theosanderson · 2024-08-15T10:26:34Z

> I think for training/learning about Loculus, we should create a tutorial where one would set up a new instance for a specific pathogen and use an existing pipeline (and more tutorials on how to get more advanced and write new pipelines).

Very much agreed!

> For a real instance, however, it depends on the actual use case and requirements whether an existing pipeline is sufficient or not.

Agreed, although I think the Nextclade pipelines will be widely applicable, at least as first versions.

> What about creating a new page with a list of reusable/well-maintained preprocessing pipelines that will initially only consist of our Nextclade pipeline but can later easily be extended with other pipelines from the community?

This sounds good!

> I know that writing a pipeline takes effort, and setting Loculus up in general takes effort, but these are not intrinsic to the schema models, so I think that a warning in this document is inappropriate.

There is currently a property of the schemas that some are supported by already-written pipelines and others are not. This would definitely feed into my decision about which schema to pick, and I think that's a totally reasonable consideration. (Yes, sometimes I wouldn't have a choice about which schema to pick, but other times I would.) It would also give me realistic expectations, e.g. if I'd just followed the tutorial where I used a pre-written pipeline and everything was easy. Currently the "Getting started" docs say "the first thing to do is to pick a schema", with a link to this page. When people are making that decision, I think this is relevant info.

I will hold off merging this for the moment and maybe we can chat through in the next meeting and get more feedback.

emmahodcroft · 2024-08-19T10:53:33Z

I think my general take is:

It's good to ensure people have a good understanding of where they would have more or less support at any decision points in setting up Loculus. I think pre-processing is an important part of this.

I don't think that information needs to be negative, just factual. I am not even sure we need 'Note' boxes. It could just be something (maybe just text) at the top or bottom that says (perhaps for each of the examples, if we want to be 'equal'):

> Examples of this approach exist using the Nextclade preprocessing pipeline and user interface, and this may be usable for other similar set-ups.

> This approach has not yet been specifically implemented with an existing preprocessing pipeline and/or user interface.

(I'm on the fence about including the UI bit, could take that out)

I think it would also be appropriate to have something in the description of the Nextclade preprocessing pipeline that highlights that while "existing datasets exist for some pathogens, not all have them, so you might have to develop your own (Link to Nextclade list or something)". I couldn't tell if we already have this or not.

I just think this may help people gauge how much and what kind of work it would be to set up a specific type of instance, so they can figure out a balance between the resources they have and what they want.

chaoran-chen · 2024-08-19T11:54:23Z

I just opened a PR with an alternative suggestion: #2450

theosanderson requested review from chaoran-chen and anna-parker August 14, 2024 21:48

theosanderson mentioned this pull request Aug 14, 2024

docs: describe ways to design the schema #2409

Merged

anna-parker approved these changes Aug 15, 2024


chaoran-chen mentioned this pull request Aug 19, 2024

docs: add caveats about some schema models (alternative) #2450

Open

theosanderson and others added 3 commits August 19, 2024 15:34

docs: add caveats about some schema models

e6862f9

Update schema-designs.mdx

645cb01

update

8cf67df

theosanderson force-pushed the docs--add-caveats-about-some-schema-models branch from de01fe9 to 8cf67df Compare August 19, 2024 14:34

theosanderson added preview Triggers a deployment to argocd and removed preview Triggers a deployment to argocd labels Aug 19, 2024

corneliusroemer added preview Triggers a deployment to argocd and removed preview Triggers a deployment to argocd labels Sep 16, 2024
