-
Notifications
You must be signed in to change notification settings - Fork 2
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Add CCHFV to loculus #1920
Add CCHFV to loculus #1920
Conversation
Update format_segmented_viruses to also update metadata. Add cchfv nextclade_dataset (still need to figure out how to add clade_memberships to the auspice trees) and start to modify the preprocessing pipeline to allow for multiple segments. Add correct trees to nextclade_datasets and update preprocessing pipelines to take multiple segments.
…nfigs and a dict in referencegenome configs.
Thanks Theo! @corneliusroemer just created #2100 as a review - I am looking through his suggestions and will merge them into this branch before pushing this branch to main |
I don't mind either way, you could merge as is and I could make my PR a refactor PR, or Anya goes through it and we merge into this one first. I should be able to finish my review/refactor today so it should be ok to wait. But if I stall, feel free to merge this one here. |
It works, but is a bit brittle We probably want to use `nextclade_split` in the future rather than header information
…o level lower than DEBUG we should use INFO as default, and debug only when we're actually debugging
…n change Conditionally include segmented rules to avoid ruleorder ambiguity
…nput is that this way snakemake will automatically rerun the rule when the script has been changed during development
…case nucleotideSequences -> nucleotide_sequences for consistency
As discussed offline with @corneliusroemer - I merged all of Cornelius' suggestions except for the refactor of the |
I might have been unclear but I wasn't done yet, just made it to mid-ingest 😀 but as discussed, happy to be merge and I'll just keep a review branch. |
Amazing work, thanks @anna-parker! |
@@ -127,10 +145,6 @@ To be able to run tests independently, we should use UUIDs for mock data. Curren | |||
|
|||
One complication for testing is that we don't have ARM containers for the backend yet (see <https://github.com/loculus-project/loculus/issues/1765>). |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
We do now 😀
ingest/Snakefile
Outdated
rule prepare_metadata: | ||
input: | ||
metadata="results/metadata_post_rename.tsv", | ||
metadata=get_prepare_metadata(SEGMENTED), |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
No need for a function here, we can just use an inline foo if boolean else baz
expression. Calling the variable wildcard looks like a ChatGPT hallucination 😀
@@ -45,7 +46,7 @@ def split_authors(authors: str) -> str: | |||
else: | |||
result.append(single_split[i].strip()) | |||
|
|||
return ", ".join(result) | |||
return ", ".join(sorted(result)) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This is where authors get sorted @theosanderson
Summary
This PR adds the multi-segmented pathogen CCHFV to loculus. It does the following:
Update ingest pipeline to work for multi-segmented viruses: take info from multiple segments, merge metadata, compare hashes with multiple segments, implement reingest for multiple segments.
Add cchfv nextclade_dataset.
Add
segmented
option to metadata items in values.yaml. Adding thesegmented
option means that the metadata item is different for each segment. Thus, the config files will create #(segments) metadata items, each with the segment name concatenated to the end of the name, i.e.name_segment
.Update preprocessing pipelines: take multi segmented input.
resolves #1915
preview URL: https://ccfv.loculus.org/cchf/search
Screenshot
PR Checklist
The implemented feature is covered by an appropriate test.-> moved to a later PR