Add CCHFV to loculus #1920
Merged
Changes from 60 commits (81 commits total).

Commits:
- 9277b4b: Add ccfv to yaml
- 21d2877 (anna-parker): Fix ingest for single segment case
- d65a076 (anna-parker): Fix: values.yaml - nucleotideSequences need to be a list in prepro co…
- fef7ebe (anna-parker): Add correct genome annotations from NCBI
- c323743 (anna-parker): Update configs to use githubusercontent for nextclade_datasets.
- 309dfeb (anna-parker): Use new dataset link
- c94ba9f (anna-parker): Fix preprocessing issues after default values.yaml changes.
- 4756ffe (anna-parker): Add segmented as a config param
- 49ff8e2 (anna-parker): Join segments based on isolate name.
- 1d9df16 (anna-parker): Fix some prepro issues
- e0f8801 (anna-parker): Add default config changes
- 9dea930 (anna-parker): Update silo configs
- b3c7645 (anna-parker): Remove preprocessing temp results file.
- 530cb30 (anna-parker): Fix cchfv table columns as metdata has now been renamed.
- 5717cd4 (anna-parker): Fix author_affiliations
- 61bb4f7 (anna-parker): Merge branch 'main' into ccfv
- 7069e0b (anna-parker): Merge branch 'ccfv' of github.com:loculus-project/loculus into ccfv
- 9d1eb2a (anna-parker): Fix merge issues with instanceName.
- f375fb7 (anna-parker): Merge branch 'main' into ccfv
- 1d915a1 (anna-parker): Fix prepare_metdata bug.
- 320d1f3 (anna-parker): Merge branch 'ccfv' of github.com:loculus-project/loculus into ccfv
- 9b299d4 (anna-parker): Add back missing website metadata.
- 303c630 (anna-parker): Fix author list sorting, fix displayName.
- 6e6e75c (anna-parker): Fix values.yaml
- ddbc4d9 (anna-parker): Fix reingest.
- 7afc4ed (anna-parker): Add segmented to ingest configs and make use in scripts consistent.
- 60f5992 (anna-parker): Update README.
- 7e70b73 (anna-parker): Fix little ingest bug
- 1c6d8ea (anna-parker): Refactor ingest to make steps clearer.
- 31d1af1 (anna-parker): Fix webpage bug.
- b14e4ee (anna-parker): Small prepro fixes
- 1c0c841 (anna-parker): Remove unnecessary files from gitignore
- 0fbb72c (anna-parker): Merge branch 'main' into ccfv
- 38778f3 (anna-parker): Small fixes
- 80c52de (anna-parker): Merge branch 'ccfv' of github.com:loculus-project/loculus into ccfv
- 4fdfd1b (anna-parker): Clean up preprocessing
- fc6c7e4 (anna-parker): add args
- f535691 (anna-parker): Use links to sequences instead of full sequences in values.yaml.
- 3cbbcc7 (anna-parker): Merge branch 'main' into ccfv
- 498a88d (anna-parker): Fix little bug
- 3e1c6e0 (anna-parker): Merge branch 'ccfv' of github.com:loculus-project/loculus into ccfv
- 8b85432 (anna-parker): Fix length bug
- c177584 (anna-parker): Merge branch 'main' into ccfv
- aacad98 (anna-parker): Merge branch 'ccfv' of github.com:loculus-project/loculus into ccfv
- 4b48c46 (anna-parker): Fix merge bug
- 3e5377d (anna-parker): Make check stricter
- 0591f09 (anna-parker): Update docs
- 16bb0aa (anna-parker): Merge remote-tracking branch 'origin/main' into ccfv
- eaba61f (anna-parker): Merge branch 'main' into ccfv
- b16026e (anna-parker): Fix prepro bug introduced by merge
- aab29a1 (anna-parker): Merge branch 'ccfv' of github.com:loculus-project/loculus into ccfv
- 7e512a7 (anna-parker): Remove ncbi_length from defaults - this was removed from values.yaml …
- 0186f52 (anna-parker): Update READMEs with suggestions.
- a29ca19 (anna-parker): Resolve some issues
- 8ede192 (anna-parker): Change `segmented` to `per_segment`.
- 28c9402 (anna-parker): Remove the requirement for adding `segmented:True` to the config.yaml
- 9026276 (anna-parker): Fix backend bug
- f77b447 (anna-parker): Fix bug
- 9c14af7 (anna-parker): Second try to fix bug
- 22c29dd (anna-parker): Merge branch 'main' into ccfv
- 542606d (corneliusroemer): Add dag for segmented
- 0914133 (corneliusroemer): Simplify segmentation inference
- 18db327 (corneliusroemer): Remove unnecessary/confusing functions
- 3cb089a (corneliusroemer): Simplify extraction script, DRYer
- ef375a4 (corneliusroemer): Reorder to never have rules do forward references
- 94b9706 (corneliusroemer): Remove unused function
- fe20091 (corneliusroemer): Keep top level dir clean by moving images to folder
- 3fb9060 (corneliusroemer): Review segment parsing script
- 47afdea (corneliusroemer): Switch default log level to INFO, debug is very verbose and there's n…
- 14d784d (corneliusroemer): Log a few important lines at INFO, not everything at debug only
- 1fbc79c (corneliusroemer): Avoid a very broad try/except block, if necessary, use in more locali…
- c6d0ceb (corneliusroemer): Mention all config in `params:` blocks, so snakemake can rerun rule o…
- 1012ecb (corneliusroemer): Use input.script consistently (the advantage of using the script as i…
- 2cfdb12 (corneliusroemer): All config files to be used by Python MUST use snake case, not camel …
- de3461b (corneliusroemer): Fix ruff lints and unnecessary indentations
- fac633e (anna-parker): Update documentation of group_segments
- 8543f6c (anna-parker): Fix issues raised in get_segment_details
- df9c57d (anna-parker): Fix weird error I introduced when merging changes
- 77119db (anna-parker): Go back to old regex as this catches more cases.
- bcefc7c (anna-parker): Update ingest config file
- cd19b1c (anna-parker): Merge branch 'main' into ccfv
@@ -10,11 +10,20 @@ for key, value in defaults.items():
     if not key in config:
         config[key] = value

+# Check if organism is segmented
+if "nucleotideSequences" not in config:
+    config["nucleotideSequences"] = ["main"]
+config["segmented"] = not (
+    len(config["nucleotideSequences"]) == 1
+    and config["nucleotideSequences"][0] == "main"
+)
+
 Path("results").mkdir(parents=True, exist_ok=True)
 with open("results/config.yaml", "w") as f:
     f.write(yaml.dump(config))

 TAXON_ID = config["taxon_id"]
+SEGMENTED = config["segmented"]
 ALL_FIELDS = ",".join(config["all_fields"])
 COLUMN_MAPPING = config["column_mapping"]
+LOG_LEVEL = config.get("log_level", "INFO")
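The hunk above infers whether the organism is segmented from its `nucleotideSequences` list: only an organism whose single sequence is named "main" is treated as unsegmented. A minimal standalone sketch of that check (the function name is mine; the logic mirrors the diff):

```python
def is_segmented(config: dict) -> bool:
    # Default matches the diff: a missing key means a single "main" sequence.
    sequences = config.get("nucleotideSequences") or ["main"]
    # Segmented unless the only sequence is the conventional "main".
    return not (len(sequences) == 1 and sequences[0] == "main")
```

So a CCHFV-style config listing L, M, and S segments comes out segmented, while the single-sequence default does not.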
@@ -50,17 +59,54 @@ rule fetch_ncbi_dataset_package:
         """


+def get_extract_output(wildcard):
+    if wildcard:
+        return ("results/sequences_full.fasta",)
+    else:
+        return ("results/sequences.fasta",)
+
+
 rule extract_ncbi_dataset_sequences:
     input:
         dataset_package="results/ncbi_dataset.zip",
     output:
-        ncbi_dataset_sequences="results/sequences.fasta",
+        ncbi_dataset_sequences=get_extract_output(SEGMENTED),
+    params:
+        segmented=SEGMENTED,
     shell:
         """
-        unzip -jp {input.dataset_package} \
+        if [[ {params.segmented} == "True" ]]; then  # string compare: a bare [[ str ]] is always true
+            unzip -jp {input.dataset_package} \
+                ncbi_dataset/data/genomic.fna \
+            | seqkit seq -w0 \
+            > {output.ncbi_dataset_sequences}
+        else
+            unzip -jp {input.dataset_package} \
                 ncbi_dataset/data/genomic.fna \
             | seqkit seq -i -w0 \
             > {output.ncbi_dataset_sequences}
+        fi
         """


+rule get_segment_details:
+    """Check if viruses are segmented, if so add segment to metadata"""
+    input:
+        sequences="results/sequences_full.fasta",
+        script="scripts/get_segment_details.py",
+        ncbi_dataset_tsv="results/metadata_post_rename.tsv",
+        config="results/config.yaml",
+    output:
+        sequences_processed="results/sequences.fasta",
+        ncbi_dataset_tsv="results/metadata_post_segment.tsv",
+    shell:
+        """
+        python {input.script} \
+            --config-file {input.config} \
+            --input-seq {input.sequences} \
+            --input-metadata {input.ncbi_dataset_tsv} \
+            --output-seq {output.sequences_processed} \
+            --output-metadata {output.ncbi_dataset_tsv}
+        """
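In the segmented branch, the full FASTA headers are kept (`seqkit seq` without `-i`) so that `get_segment_details` can later read segment information from the description lines. A hypothetical sketch of that kind of header parsing — the regex and function are illustrative assumptions, not the PR's actual `scripts/get_segment_details.py`:

```python
import re

# Illustrative assumption: NCBI descriptions often name the segment, e.g.
# "... strain Hoti segment L, complete sequence". This is NOT the PR's regex.
SEGMENT_RE = re.compile(r"\bsegment\s+([LMS])\b", re.IGNORECASE)

def segment_of(description):
    """Return the segment letter found in a FASTA description, or None."""
    match = SEGMENT_RE.search(description)
    return match.group(1).upper() if match else None
```

A later commit in the PR ("Go back to old regex as this catches more cases.") suggests the real pattern is deliberately broader than this sketch.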
@@ -106,9 +152,16 @@ rule rename_columns:
         rename_columns(input.ncbi_dataset_tsv, output.ncbi_dataset_tsv)


+def get_prepare_metadata(wildcard):
+    if wildcard:
+        return ("results/metadata_post_segment.tsv",)
+    else:
+        return ("results/metadata_post_rename.tsv",)
+
+
 rule prepare_metadata:
     input:
-        metadata="results/metadata_post_rename.tsv",
+        metadata=get_prepare_metadata(SEGMENTED),
         sequence_hashes="results/sequence_hashes.json",
         config="results/config.yaml",
         script="scripts/prepare_metadata.py",

Review comment: No need for a function here, we can just use an inline …
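The review comment above suggests replacing the one-off helper with an inline conditional. Outside Snakemake, the same selection is a single conditional expression; this sketch uses an illustrative function name, not one from the PR:

```python
def prepare_metadata_input(segmented):
    # One expression replaces the four-line if/else helper.
    return (
        "results/metadata_post_segment.tsv"
        if segmented
        else "results/metadata_post_rename.tsv"
    )
```

Inside the rule the same idea would read `metadata="results/metadata_post_segment.tsv" if SEGMENTED else "results/metadata_post_rename.tsv"`.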
@@ -127,6 +180,36 @@ rule prepare_metadata:
         """


+rule group_segments:
+    input:
+        metadata="results/metadata_post_prepare.json",
+        sequences="results/sequences.json",
+        config="results/config.yaml",
+        script="scripts/group_segments.py",
+    output:
+        metadata="results/metadata_post_group.json",
+        sequences="results/sequences_post_group.json",
+    params:
+        log_level=LOG_LEVEL,
+    shell:
+        """
+        python {input.script} \
+            --config-file {input.config} \
+            --input-metadata {input.metadata} \
+            --input-seq {input.sequences} \
+            --output-metadata {output.metadata} \
+            --output-seq {output.sequences} \
+            --log-level {params.log_level}
+        """
+
+
+def get_grouped_metadata(wildcard):
+    if wildcard:
+        return ("results/metadata_post_group.json",)
+    else:
+        return ("results/metadata_post_prepare.json",)
+
+
 rule get_previous_submissions:
     """Download metadata and sequence hashes of all previously submitted sequences
     Produces mapping from INSDC accession to loculus id/version/hash
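The `group_segments` step joins per-segment records into one entry per isolate (cf. the commit "Join segments based on isolate name."). A hypothetical sketch of that grouping idea — the record fields here are assumptions, not the PR's `scripts/group_segments.py`:

```python
from collections import defaultdict

def group_by_isolate(records):
    """Bucket per-segment records by isolate name, one entry per isolate.

    Illustrative only: field names ("isolate", "segment", "accession") are
    assumed, and real grouping must also handle missing or duplicate segments.
    """
    groups = defaultdict(dict)
    for record in records:
        groups[record["isolate"]][record["segment"]] = record["accession"]
    return dict(groups)
```

With this shape, an isolate sequenced as three INSDC entries (L, M, S) collapses into one multi-segment submission unit.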
@@ -142,7 +225,7 @@ rule get_previous_submissions:
     ...
     """
     input:
-        prepped_metadata="results/metadata_post_prepare.json",  # Reduce likelihood of race condition of multi-submission
+        prepped_metadata=get_grouped_metadata(SEGMENTED),  # Reduce likelihood of race condition of multi-submission
         config="results/config.yaml",
         script="scripts/call_loculus.py",
     output:
@@ -163,8 +246,9 @@

 rule compare_hashes:
     input:
+        config="results/config.yaml",
         old_hashes="results/previous_submissions.json",
-        metadata="results/metadata_post_prepare.json",
+        metadata=get_grouped_metadata(SEGMENTED),
         script="scripts/compare_hashes.py",
     output:
         to_submit="results/to_submit.json",
@@ -177,6 +261,7 @@ rule compare_hashes:
     shell:
         """
         python scripts/compare_hashes.py \
+            --config-file {input.config} \
             --old-hashes {input.old_hashes} \
             --metadata {input.metadata} \
             --to-submit {output.to_submit} \
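`compare_hashes` decides what to submit or revise by comparing content hashes of the prepared records against the hashes of previously submitted ones. A hypothetical sketch of hash-based change detection — the hash function and payload layout are my assumptions, not the PR's `scripts/compare_hashes.py`:

```python
import hashlib
import json

def record_hash(metadata, sequence):
    """Stable content hash for one record: identical input, identical hash."""
    # sort_keys makes the serialization deterministic across runs.
    payload = json.dumps(
        {"metadata": metadata, "sequence": sequence}, sort_keys=True
    )
    return hashlib.md5(payload.encode()).hexdigest()

def needs_submission(old_hashes, accession, metadata, sequence):
    """True when the record is unseen or its stored hash differs."""
    return old_hashes.get(accession) != record_hash(metadata, sequence)
```

The point of the design is idempotent reingest: unchanged records hash to the same value and are skipped on the next run.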
@@ -188,10 +273,18 @@ rule compare_hashes:
         """


+def get_grouped_sequences(wildcard):
+    if wildcard:
+        return ("results/sequences_post_group.json",)
+    else:
+        return ("results/sequences.json",)
+
+
 rule prepare_files:
     input:
-        metadata="results/metadata_post_prepare.json",
-        sequences="results/sequences.json",
+        metadata=get_grouped_metadata(SEGMENTED),
+        sequences=get_grouped_sequences(SEGMENTED),
         config="results/config.yaml",
         to_submit="results/to_submit.json",
         to_revise="results/to_revise.json",
         script="scripts/prepare_files.py",
@@ -203,6 +296,7 @@ rule prepare_files:
     shell:
         """
         python scripts/prepare_files.py \
+            --config-file {input.config} \
             --metadata-path {input.metadata} \
             --sequences-path {input.sequences} \
             --to-submit-path {input.to_submit} \