Nextclade v3 #1185

Merged 704 commits on Aug 30, 2023
Commits
e701b12
fix: compare the correct keys in finetune loops
rneher Jul 13, 2023
db61d87
fix: set labels even if they are not present in the cloned node
ivan-aksamentov Jul 13, 2023
04dc460
docs: add FIXME comment on double mutations
rneher Jul 13, 2023
add0d66
feat: improved stripe construction and visualization
corneliusroemer Jul 13, 2023
cd5a145
feat: bring back the old version of split_muts()
ivan-aksamentov Jul 13, 2023
4a2c0a2
feat: reimplement mut split, union and diff using iterators
ivan-aksamentov Jul 14, 2023
3dadf06
feat: set operations with iterators, handle state at end of loop
rneher Jul 14, 2023
75a6e66
feat: additional case for mutation splitting in set difference
rneher Jul 14, 2023
ad3e1fd
refactor: lint
ivan-aksamentov Jul 14, 2023
d64d985
refactor: make mut split, union and diff functions fallible
ivan-aksamentov Jul 14, 2023
43fd63e
feat: impl Display trait for deletions
ivan-aksamentov Jul 14, 2023
0a98964
feat: add more error context in tree builder code
ivan-aksamentov Jul 14, 2023
a37a3c7
feat: handle error cases in mu split and union
ivan-aksamentov Jul 14, 2023
36b6298
refactor: clarify var name
ivan-aksamentov Jul 14, 2023
e527a3f
fix: handle parent muts correctly during tree preprocess
ivan-aksamentov Jul 14, 2023
d14ac84
Merge remote-tracking branch 'origin/v3' into feat/greedy-tree-builder-2
ivan-aksamentov Jul 14, 2023
bb97731
Merge remote-tracking branch 'origin/feat/greedy-tree-builder-2' into…
ivan-aksamentov Jul 14, 2023
4cf9bfc
More robust stripe construction when gap boxes overlap each other
corneliusroemer Jul 15, 2023
42ec126
fix: consistent private mutations handling on the auspice tree
rneher Jul 15, 2023
e78aaa4
refactor: rename variable to clarify role
rneher Jul 15, 2023
9d41cde
fix: replicate aa fix for nucs, remove unused var
rneher Jul 15, 2023
aca8ad8
fix: avoid duplicated clade labels when cloning a node with a label
rneher Jul 16, 2023
c49833e
feat: remove "_new" suffix from names of new nodes
ivan-aksamentov Jul 17, 2023
44bb1ad
test: add unit tests for split, diff and union
ivan-aksamentov Jul 17, 2023
05a7007
fix: ensure aa private reversions are in sequenced ranges
ivan-aksamentov Jul 17, 2023
8c3d41b
fix: ensure unsequenced cdses are included into unsequenced aa ranges
ivan-aksamentov Jul 17, 2023
4ad4a21
refactor: remove dbg statement
ivan-aksamentov Jul 17, 2023
d7bc1fb
Merge pull request #1200 from nextstrain/feat/greedy-tree-builder-2-m…
ivan-aksamentov Jul 17, 2023
d532ea3
Merge pull request #1187 from nextstrain/feat/greedy-tree-builder-2
ivan-aksamentov Jul 17, 2023
661ce48
feat: use graph; remove usage of tree for calculations
ivan-aksamentov Jul 14, 2023
ccf85df
refactor: lint
ivan-aksamentov Jul 13, 2023
07ce59d
fix: correctly use depth first preorder graph traversal
ivan-aksamentov Jul 13, 2023
13b6b39
feat: use traversal library for graph traversal
ivan-aksamentov Jul 14, 2023
4220e5f
refactor: lint
ivan-aksamentov Jul 14, 2023
8ef03cf
refactor: format
ivan-aksamentov Jul 14, 2023
e1c800d
Merge remote-tracking branch 'origin/v3' into feat/better-seed-matching
ivan-aksamentov Jul 17, 2023
895a208
refactor: simplify graph to tree conversion
ivan-aksamentov Jul 17, 2023
b92e54a
refactor: separate tree to graph conversion from preprocessing
ivan-aksamentov Jul 17, 2023
77ab7a4
feat: remove "Node type" from ref nodes temporarily
ivan-aksamentov Jul 17, 2023
b06c2ef
fix: avoid dependency on sys libs for xz2 and bzip2 crates
ivan-aksamentov Jul 19, 2023
4d9be64
feat: handle ambiguous nucs when decoding
ivan-aksamentov Jul 19, 2023
b1612d0
feat: add Newick output
ivan-aksamentov Jul 19, 2023
32f84f3
Merge pull request #1204 from nextstrain/feat/decode-ambiguous
ivan-aksamentov Jul 19, 2023
53c6728
fix: correct joining of children into nwk token
rneher Jul 19, 2023
d49de81
feat: add Newick export in web
ivan-aksamentov Jul 19, 2023
f3a44f8
refactor: lint
ivan-aksamentov Jul 19, 2023
e8c65df
fix: flag to align gaps left
rneher Jul 19, 2023
3653a04
Merge pull request #1203 from nextstrain/fix/cli-static-xz-and-bzip2
ivan-aksamentov Jul 19, 2023
09811ff
feat(web): clarify text for tree file formats in export dialog
ivan-aksamentov Jul 19, 2023
ed1d9b8
Merge pull request #1205 from nextstrain/feat/output-tree-nwk
ivan-aksamentov Jul 20, 2023
90beed5
fix: don't split root node
ivan-aksamentov Jul 20, 2023
4cf7ade
fix: avoid crash when there is no parent node
ivan-aksamentov Jul 20, 2023
fd6fda8
fix: start with empty split muts map when the node is root.
ivan-aksamentov Jul 20, 2023
b3c1e29
fix: return early when there are no children and no parents to finetune
ivan-aksamentov Jul 20, 2023
572881c
refactor: remove print statement
ivan-aksamentov Jul 20, 2023
0b7d4c5
refactor: rename nearest_node, add comments
rneher Jul 20, 2023
41aea83
chore: add experimental datasets to smoke tests
ivan-aksamentov Jul 20, 2023
7a79d1a
Merge remote-tracking branch 'origin/v3' into fix/missing-graph-paren…
ivan-aksamentov Jul 20, 2023
213814f
refactor: inline variable current_best_node_key
ivan-aksamentov Jul 20, 2023
348ab8c
chore: add more console info in smoke tests
ivan-aksamentov Jul 20, 2023
5c138c6
chore: run normal and experimental smoke tests in parallel
ivan-aksamentov Jul 20, 2023
b766e7a
Merge remote-tracking branch 'origin/v3' into fix/missing-graph-paren…
ivan-aksamentov Jul 20, 2023
3ece1e0
feat: avoid matching with unknown states when possible
rneher Jul 20, 2023
dc91c2f
refactor: avoid score-look-up for unknowns
rneher Jul 20, 2023
438ec2d
Merge pull request #1209 from nextstrain/feat/avoid-matches-with-N-X
ivan-aksamentov Jul 20, 2023
1d81d59
Merge pull request #1208 from nextstrain/fix/missing-graph-parent-error
ivan-aksamentov Jul 20, 2023
0233b17
Merge remote-tracking branch 'origin/v3' into feat/better-seed-matching
ivan-aksamentov Jul 20, 2023
5fd51bf
feat: add new logo
ivan-aksamentov Jul 20, 2023
9bed93f
feat: allow /loading page in prod
ivan-aksamentov Jul 20, 2023
f3e6424
feat: add new favicon
ivan-aksamentov Jul 21, 2023
9b34c10
Merge pull request #1210 from nextstrain/feat/new-logo
ivan-aksamentov Jul 21, 2023
2338120
feat: check consistency between ref tree and ref seq
ivan-aksamentov Jul 21, 2023
18ee0c9
feat: check consistency between ref tree and ref seq for aa
ivan-aksamentov Jul 21, 2023
a686009
chore: keep order of entries in smoke test's console output
ivan-aksamentov Jul 21, 2023
519b88f
feat: elaborate comments and error messages
ivan-aksamentov Jul 21, 2023
b5b0ce8
chore: temporarily exclude sars-cov-2-21L from smoke tests
ivan-aksamentov Jul 21, 2023
5abc689
Merge remote-tracking branch 'origin/v3' into feat/check-tree-and-ref…
ivan-aksamentov Jul 21, 2023
afcee19
Merge pull request #1211 from nextstrain/feat/check-tree-and-ref-cons…
ivan-aksamentov Jul 21, 2023
ebe98af
Merge pull request #1206 from nextstrain/fix/alignment-side-left
ivan-aksamentov Jul 21, 2023
e22d400
feat: increase font size in genetic feature selector
ivan-aksamentov Jul 21, 2023
841032b
feat: make full genome badge full width
ivan-aksamentov Jul 21, 2023
399f0ca
Merge remote-tracking branch 'origin/v3' into feat/better-seed-matching
ivan-aksamentov Jul 22, 2023
5fabdda
Merge remote-tracking branch 'origin/feat/better-seed-matching' into …
ivan-aksamentov Jul 22, 2023
15c3fc2
feat/implement alternative chopping of matches
rneher Jul 20, 2023
21265e8
feat: alternative implementation of stripe construction
rneher Jul 21, 2023
f21a073
fix: underflow in integer subtraction
rneher Jul 21, 2023
f75f35f
tests: disable smoke testing of seed_matching directory
rneher Jul 22, 2023
0df70b6
refactor: do integer arithmetic in isize
rneher Jul 22, 2023
4e704e3
lint: seed_alignment
rneher Jul 22, 2023
05188a4
refactor: rename variables
rneher Jul 22, 2023
31ebabf
docs: comments in seed_alignment
rneher Jul 22, 2023
536a655
fix: fix inconsistency in start and end of band collection
rneher Jul 22, 2023
99de757
fix: consistency of ref_start and end of bands
rneher Jul 22, 2023
4a08ae3
chore: minimize diff
ivan-aksamentov Jul 22, 2023
a959390
test: adjust alignment unit tests to the new interface (add seed index)
ivan-aksamentov Jul 22, 2023
6ea8198
refactor: remove unrelated import
ivan-aksamentov Jul 22, 2023
9ccea39
refactor(web): simplify data transfer between rust and typescript
ivan-aksamentov Jul 19, 2023
defedb0
feat: reduce font size of plain text in gene selector
ivan-aksamentov Jul 24, 2023
fdb8d91
refactor: remove constructor annotation
ivan-aksamentov Jul 24, 2023
465bbc1
refactor: remove unused imports
ivan-aksamentov Jul 24, 2023
da1658b
Merge remote-tracking branch 'origin/perf/wasm-json-minified' into re…
ivan-aksamentov Jul 24, 2023
f718775
refactor: remove commented code
ivan-aksamentov Jul 24, 2023
205468b
Merge remote-tracking branch 'origin/v3' into refactor/use-only-graph
ivan-aksamentov Jul 24, 2023
32d2454
feat: output graph json
ivan-aksamentov Jul 24, 2023
5804f68
Merge remote-tracking branch 'origin/v3' into refactor/use-only-graph
ivan-aksamentov Jul 24, 2023
a28d714
refactor: remove unused field from graph node
ivan-aksamentov Jul 24, 2023
0c2374a
Merge pull request #1213 from nextstrain/feat/feature-selector-visuals
ivan-aksamentov Jul 24, 2023
eb4229e
fix: remove children field from graph node payload
ivan-aksamentov Jul 24, 2023
665eba6
refactor: lint
ivan-aksamentov Jul 24, 2023
aa09fcc
Merge remote-tracking branch 'origin/v3' into refactor/wasm-transfer-…
ivan-aksamentov Jul 24, 2023
3c9f480
data: specify alignment params for hiv
rneher Jul 24, 2023
a8d1325
fix: adjust function call to the new interface
ivan-aksamentov Jul 24, 2023
2bf61fb
chore: trigger ci
ivan-aksamentov Jul 24, 2023
7c544e5
Add profiling profile to cargo.toml
corneliusroemer Jul 24, 2023
70c231e
Add `cargo install wasm-opt ` to dev docs
corneliusroemer Jul 24, 2023
ab28d05
refactor: rename offsets to min/max instead of left/right
rneher Jul 24, 2023
50970b1
refactor: remove unused param, rename variable to clarify intent
rneher Jul 24, 2023
49ec678
feat: Add validation workflow to compare outputs between versions
corneliusroemer Jul 24, 2023
6e6114b
Pass all arguments to run.sh on to snakemake
corneliusroemer Jul 24, 2023
309dcb0
Improve logging to file of validation workflow
corneliusroemer Jul 24, 2023
4862b4e
fix: end position of stripes, avoid empty bands, symmetric look-back
rneher Jul 24, 2023
e25a50f
Allow custom base branch via "base_branch" config
corneliusroemer Jul 24, 2023
19ea70d
fix: clamp end of stripe
rneher Jul 24, 2023
ef0a2b3
refactor: remove code duplication be introducing rewind function
rneher Jul 24, 2023
c01e4dc
fix: return new end position from rewind
rneher Jul 24, 2023
f227921
refactor: reduce duplication
rneher Jul 24, 2023
3fd8198
docs: add comments
rneher Jul 24, 2023
b64e504
tests: output sequences in order, add clean and clobber rules
rneher Jul 24, 2023
c2befd5
refactor: remove unused code
rneher Jul 24, 2023
87c9dd7
Revert "Add profiling profile to cargo.toml"
corneliusroemer Jul 24, 2023
e519309
Revert "Add `cargo install wasm-opt ` to dev docs"
corneliusroemer Jul 24, 2023
b656215
chore: remove CARGO_BUILD_TARGET_DIR env var from dev container
ivan-aksamentov Jul 25, 2023
1084ffe
chore: add seqkit to dev container
ivan-aksamentov Jul 25, 2023
881a6df
Merge remote-tracking branch 'origin/v3' into feat/validation-workflow
ivan-aksamentov Jul 25, 2023
9740c72
chore: use python3
ivan-aksamentov Jul 25, 2023
746ebe5
chore: add requirements.txt
ivan-aksamentov Jul 25, 2023
79c30d6
chore: calm down cargo build output
ivan-aksamentov Jul 25, 2023
d120ab2
Merge pull request #1219 from nextstrain/feat/validation-workflow
rneher Jul 25, 2023
c4d6c64
Merge remote-tracking branch 'origin/v3' into feat/better-seed-matching
ivan-aksamentov Jul 25, 2023
8cec5c9
Merge remote-tracking branch 'origin/feat/better-seed-matching' into …
ivan-aksamentov Jul 25, 2023
cdee12d
Merge remote-tracking branch 'origin/feat/better-stripes-for-better-s…
ivan-aksamentov Jul 25, 2023
1173c5e
Merge remote-tracking branch 'origin/v3' into refactor/use-only-graph
ivan-aksamentov Jul 25, 2023
7465e79
Merge remote-tracking branch 'origin/v3' into refactor/wasm-transfer-…
ivan-aksamentov Jul 25, 2023
dadb3f1
refactor: reorder default params, label obsolete ones
rneher Jul 25, 2023
23319ca
tests: add alignment parameters to virus properties
rneher Jul 25, 2023
955ec58
tests: remove obsolete testing data
rneher Jul 25, 2023
c36ce11
tests: remove erroneous testing data for sc2
rneher Jul 25, 2023
4c6acfe
Merge pull request #1215 from nextstrain/refactor/wasm-transfer-string
ivan-aksamentov Jul 25, 2023
3c086a6
Merge remote-tracking branch 'origin/v3' into refactor/use-only-graph
ivan-aksamentov Jul 25, 2023
3e1398f
Merge pull request #1220 from nextstrain/feat/better-stripes-for-bett…
ivan-aksamentov Jul 25, 2023
0444c87
Merge pull request #1221 from nextstrain/feat/better-stripes-for-bett…
ivan-aksamentov Jul 25, 2023
0cd71d2
Merge remote-tracking branch 'origin/v3' into feat/better-seed-matching
ivan-aksamentov Jul 25, 2023
68818ab
Merge pull request #1190 from nextstrain/feat/better-seed-matching
ivan-aksamentov Jul 25, 2023
898cb97
Merge remote-tracking branch 'origin/v3' into refactor/use-only-graph
ivan-aksamentov Jul 25, 2023
bbf688f
fix: add missing arg in wasm
ivan-aksamentov Jul 25, 2023
c0956e5
Merge remote-tracking branch 'origin/v3' into refactor/use-only-graph
ivan-aksamentov Jul 25, 2023
c441ac4
feat: make tree building process deterministic
rneher Jul 25, 2023
0fa32da
refactor: avoid uninitialized variable
ivan-aksamentov Jul 26, 2023
d8e819c
refactor: remove unused imports
ivan-aksamentov Jul 26, 2023
d4e9f99
Merge pull request #1223 from nextstrain/feat/deterministic-tree-buil…
ivan-aksamentov Jul 26, 2023
a1051f9
Merge pull request #1201 from nextstrain/refactor/use-only-graph
ivan-aksamentov Jul 26, 2023
04d8d40
fix: preserve root fields of auspice tree object when (de)serializing
ivan-aksamentov Jul 26, 2023
6240546
chore: emit graph json in smoke tests
ivan-aksamentov Jul 26, 2023
12842f1
fix: don't consider ambiguous nucs fro reversions
ivan-aksamentov Jul 26, 2023
50f4699
Merge pull request #1224 from nextstrain/fix/ambig-reversions
ivan-aksamentov Jul 26, 2023
0cfea13
refactor: unify initialization step between cli and web
ivan-aksamentov Jul 26, 2023
e47e924
feat: remove nextalign executable
ivan-aksamentov Jul 27, 2023
e7a8347
feat: implement (de)serialization for Genotype struct
ivan-aksamentov Jul 27, 2023
6a68b71
feat: move QC and primer configs into pathogen.json; move masked ranges
ivan-aksamentov Jul 27, 2023
faef27c
feat: make all fields in pathogen.json optional
ivan-aksamentov Jul 27, 2023
9983b98
feat: allow omitting some of the optional config fields
ivan-aksamentov Jul 27, 2023
119829a
feat: warn if schema version of `pathogen.json` does not match supported
ivan-aksamentov Jul 27, 2023
0a32a9e
chore: add a sketch of smoke tests for datasets v3
ivan-aksamentov Jul 27, 2023
cbdc432
chore: flatten dir tree of results of smoke tests
ivan-aksamentov Jul 28, 2023
cce1649
chore: flatten dir tree of results of v3 smoke tests
ivan-aksamentov Jul 28, 2023
2af4fdb
chore: temporarily disable default reference in smoke tests
ivan-aksamentov Jul 28, 2023
419ac07
feat: omit empty tree fields; preserve version
ivan-aksamentov Jul 28, 2023
464325f
fix: treat nucleotide subs and dels on equal footing to prevent tree-…
rneher Jul 31, 2023
0d3bb50
refactor: rename PrivateMutationsMinimal.nuc_sub to nuc_muts
rneher Jul 31, 2023
aa37b12
refactor: rename PrivateMutationsMinimal to BranchMutations
rneher Jul 31, 2023
d36daf9
fix(web): fasta export with comma-separated nucs
ivan-aksamentov Aug 2, 2023
874637e
fix: only count ACGT characters in mutations
ivan-aksamentov Aug 2, 2023
4973693
fix: parse tree aa muts regardless of whether nuc muts are present
ivan-aksamentov Aug 2, 2023
3c418af
fix: counting of nuc muts in the branch length calculation
ivan-aksamentov Aug 2, 2023
bcdabac
docs: update terminilofy on branch length, divergence and private mut…
ivan-aksamentov Aug 2, 2023
4755671
docs: add missing info on frame shifts
ivan-aksamentov Aug 2, 2023
65c8158
feat: make all dataset files optional except ref
ivan-aksamentov Aug 2, 2023
338aace
Merge pull request #1226 from nextstrain/fix/subs-dels_interaction
rneher Aug 2, 2023
a64080a
fix: non-monotonous bands for stripe construction
rneher Aug 2, 2023
7432f82
Merge pull request #1228 from nextstrain/fix/stripes
rneher Aug 3, 2023
7dfae15
feat: stub out boundary detector
rneher Aug 3, 2023
f70a3a3
feat: add boundary indicator to return value of alignment
rneher Aug 3, 2023
321ef24
feat: detect if alignment boundary was hit, increase parameters, and …
rneher Aug 5, 2023
cccdd94
docs: info/warn statements
rneher Aug 5, 2023
1999eaf
Merge remote-tracking branch 'origin/v3' into feat/datasets-v3
ivan-aksamentov Aug 7, 2023
4f048a4
refactor: remove explicit types
ivan-aksamentov Aug 8, 2023
ee93d41
refactor: return early, remove "else" block
ivan-aksamentov Aug 8, 2023
22382df
fix: adjust benchmarks for the changes in the alignment code
ivan-aksamentov Aug 8, 2023
afa08da
refactor: format
ivan-aksamentov Aug 8, 2023
31c0b28
fix: tracing message
ivan-aksamentov Aug 8, 2023
16684ba
refactor: extract seed computation into a separate function
ivan-aksamentov Aug 8, 2023
3d285a5
feat: clarify info messages, downgrade warning to info
ivan-aksamentov Aug 8, 2023
eb4fe31
feat: add cli flag for maximum alignment attempts
ivan-aksamentov Aug 8, 2023
9e0e92e
feat: stub out check to prevent large allocations
rneher Aug 5, 2023
a3eed4c
fix: error message
rneher Aug 5, 2023
dd74dbc
feat: make max alignment band area check configurable
ivan-aksamentov Aug 9, 2023
97f67b8
fix: fasta files without trailing newline are read correctly
ivan-aksamentov Aug 9, 2023
62418df
fix: double mutation in codon when moving up the tree
rneher Aug 9, 2023
a8e2a49
Merge pull request #1234 from nextstrain/fix/fasta-without-trailing-n…
ivan-aksamentov Aug 9, 2023
40c238c
feat: boundary detection through flags in path matrix
rneher Aug 9, 2023
07ba1d7
Merge pull request #1235 from nextstrain/fix/double-mutation
ivan-aksamentov Aug 9, 2023
b4154db
feat: expose dataset collection structs, schema and typings
ivan-aksamentov Aug 10, 2023
f1b4ac5
fix: don't make boundary elements at ends of ref/qry seq, adjust tests
rneher Aug 10, 2023
30117dd
docs: revise CLI params description, comment on default value
rneher Aug 10, 2023
0f3b17b
feat(web): clear main page from title, downloads and about info
ivan-aksamentov Aug 11, 2023
9116fbf
feat: add list of files to capabilities
ivan-aksamentov Aug 11, 2023
551d993
Merge pull request #1233 from nextstrain/feat/maximal-alignment-area2
ivan-aksamentov Aug 11, 2023
19b91c4
Merge pull request #1231 from nextstrain/feat/adaptive-bands
ivan-aksamentov Aug 11, 2023
2ad61d9
feat(web): render remote markdown
ivan-aksamentov Aug 11, 2023
31db610
feat(web): revamp main page layout and nav bar
ivan-aksamentov Aug 12, 2023
f59874d
chore: make dataset path dynamic in smoke tests
ivan-aksamentov Aug 23, 2023
1edd904
feat: make input files optional, adjust to changes in dataset format
ivan-aksamentov Aug 24, 2023
9cf9513
fix: gap-open-close score calculation when no gene map provided
ivan-aksamentov Aug 24, 2023
764805d
fix: ensure error when provided a non-existent explicit input file
ivan-aksamentov Aug 24, 2023
4c99c03
test: add smoke tests with dataset zip and with individual input args
ivan-aksamentov Aug 24, 2023
4d69f5b
feat: remove output insertions and errors csv files
ivan-aksamentov Aug 24, 2023
24cef6c
fix: use correct .zst file ext; keep incorrect .zstd ext for compat
ivan-aksamentov Aug 24, 2023
b00227f
feat(cli): rename gene map to genome annotation
ivan-aksamentov Aug 24, 2023
a6d49ce
feat: remove output insertions and errors csv files from web
ivan-aksamentov Aug 26, 2023
2662005
feat: adjust cli for improved dataset versioning
ivan-aksamentov Aug 28, 2023
be0a310
feat(web): adjust web app for new dataset format (wip)
ivan-aksamentov Aug 28, 2023
ecec67d
fix(web): main page layout and dataset selector scrolling
ivan-aksamentov Aug 28, 2023
587950e
feat(web): adjust dataset file fetching to the new format
ivan-aksamentov Aug 28, 2023
0a4798d
feat: split datasets into collections
ivan-aksamentov Aug 28, 2023
0925e23
feat: allow unreleased versions
ivan-aksamentov Aug 28, 2023
42a1511
chore: add new dataset server urls
ivan-aksamentov Aug 28, 2023
04881ce
Merge branch 'feat/datasets-v3' into v3
ivan-aksamentov Aug 28, 2023
ae1b139
chore: lint
ivan-aksamentov Aug 28, 2023
389ae14
fix: correctly append path in url join
ivan-aksamentov Aug 29, 2023
8b0d2d3
feat: check json schema version when parsing index.json
ivan-aksamentov Aug 29, 2023
4c3630e
fix: allow dataset capabilities to be empty
ivan-aksamentov Aug 29, 2023
173792a
feat(cli): adjust "dataset get" and "dataset list" commands
ivan-aksamentov Aug 29, 2023
3122db5
refactor: lint
ivan-aksamentov Aug 29, 2023
7f35469
fix(cli): handle http status codes correctly
ivan-aksamentov Aug 30, 2023
d1d7b73
fix(cli): dataset fetching
ivan-aksamentov Aug 30, 2023
c8e1e01
fix(cli): correct args in dataset list and get commands
ivan-aksamentov Aug 30, 2023
89d6cbc
test: adjust CLI smoke tests to the new dataset format
ivan-aksamentov Aug 30, 2023
e66d566
fix: remove url field from dataset
ivan-aksamentov Aug 30, 2023
563fa91
fix: ensure smoke and distro tests are failing when there are errors
ivan-aksamentov Aug 30, 2023
eef91b1
chore(ci): fix incorrect data url in ci
ivan-aksamentov Aug 30, 2023
9 changes: 9 additions & 0 deletions .cargo/config.toml
@@ -6,6 +6,11 @@ codegen-units = 4096
incremental = true
lto = "off"

[profile.profiling]
inherits = "release"
debug = 1
strip = false

# Optimize dependencies even in debug mode
[profile.dev.package."*"]
opt-level = 2
@@ -189,6 +194,7 @@ rustflags = [
"-Aclippy::cognitive_complexity",
"-Aclippy::comparison-chain",
"-Aclippy::default_numeric_fallback",
"-Aclippy::deref_by_slicing",
"-Aclippy::doc_markdown",
"-Aclippy::else_if_without_else",
"-Aclippy::exhaustive_enums",
@@ -217,7 +223,9 @@ rustflags = [
"-Aclippy::mod_module_files",
"-Aclippy::module_inception",
"-Aclippy::module_name_repetitions",
"-Aclippy::modulo_arithmetic",
"-Aclippy::must_use_candidate",
"-Aclippy::needless_for_each",
"-Aclippy::new_without_default",
"-Aclippy::non_ascii_literal",
"-Aclippy::option_if_let_else",
@@ -243,6 +251,7 @@ rustflags = [
"-Aclippy::suboptimal_flops",
"-Aclippy::too_many_arguments",
"-Aclippy::too_many_lines",
"-Aclippy::type_repetition_in_bounds",
"-Aclippy::unnecessary_wraps",
"-Aclippy::unreachable",
"-Aclippy::unreadable_literal",
2 changes: 1 addition & 1 deletion .env.example
@@ -29,7 +29,7 @@ SYNC_DESTINATION=123.456.789.123:~/nextclade

# URL of Nextclade datasets server. See: https://github.com/neherlab/nextclade_data
# Replace this with `http://localhost:27722` to use local data server instead
-DATA_FULL_DOMAIN=https://data.master.clades.nextstrain.org
+DATA_FULL_DOMAIN=https://data.master.clades.nextstrain.org/v3
# DATA_FULL_DOMAIN=http://localhost:27722

# Directory path (relative to the root of the project) from which local data server takes the data.
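
Moving the data server base URL under a `/v3` path prefix makes URL joining slightly more delicate, because standard relative-URL resolution drops the last path segment of a base that has no trailing slash. The following Python sketch only illustrates that general pitfall; Nextclade itself is written in Rust and TypeScript, and the helper `join_url` and the use of `index.json` as the requested file are assumptions made for this example, not code from this pull request.

```python
from urllib.parse import urljoin


def join_url(base: str, *parts: str) -> str:
    """Append path segments to `base`, keeping any existing path prefix.

    Hypothetical helper for illustration: it joins with single slashes
    instead of resolving `parts` as relative URL references.
    """
    segments = [base.rstrip("/")]
    segments += [p.strip("/") for p in parts if p.strip("/")]
    return "/".join(segments)


base = "https://data.master.clades.nextstrain.org/v3"

# Relative resolution treats "v3" as a file name and replaces it.
naive = urljoin(base, "index.json")

# Plain segment joining keeps the "/v3" prefix.
joined = join_url(base, "index.json")

print(naive)
print(joined)
```

With a base that lacks a trailing slash, `urljoin` resolves `index.json` against the parent of `v3`, so the prefix is lost, while the segment-joining helper keeps it.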
12 changes: 5 additions & 7 deletions .github/workflows/cli.yml
@@ -57,17 +57,17 @@ jobs:
- name: "Setup environment (release)"
if: endsWith(github.ref, '/release-cli')
run: |
-echo "DATA_FULL_DOMAIN=https://data.clades.nextstrain.org" >> $GITHUB_ENV
+echo "DATA_FULL_DOMAIN=https://data.clades.nextstrain.org/v3" >> $GITHUB_ENV

- name: "Setup environment (staging)"
if: endsWith(github.ref, '/staging-cli')
run: |
-echo "DATA_FULL_DOMAIN=https://data.staging.clades.nextstrain.org" >> $GITHUB_ENV
+echo "DATA_FULL_DOMAIN=https://data.staging.clades.nextstrain.org/v3" >> $GITHUB_ENV

- name: "Setup environment (master)"
if: ${{ !endsWith(github.ref, '/staging-cli') && !endsWith(github.ref, '/release-cli') }}
run: |
-echo "DATA_FULL_DOMAIN=https://data.master.clades.nextstrain.org" >> $GITHUB_ENV
+echo "DATA_FULL_DOMAIN=https://data.master.clades.nextstrain.org/v3" >> $GITHUB_ENV

- name: "Checkout code"
uses: actions/checkout@v3
@@ -106,7 +106,7 @@ jobs:
run: |
cp .env.example .env
sed -i -e "s|OSXCROSS_URL=http://example.com/osxcross/osxcross.tar.xz|OSXCROSS_URL=${{ secrets.OSXCROSS_URL }}|g" .env
-sed -i -e "s|DATA_FULL_DOMAIN=https://data.master.clades.nextstrain.org|DATA_FULL_DOMAIN=${DATA_FULL_DOMAIN}|g" .env
+sed -i -e "s|DATA_FULL_DOMAIN=https://data.master.clades.nextstrain.org/v3|DATA_FULL_DOMAIN=${DATA_FULL_DOMAIN}|g" .env

- name: "Login to Docker Hub"
uses: docker/login-action@v2
@@ -170,7 +170,6 @@ jobs:
run: |
cp .env.example .env
sed -i -e "s|OSXCROSS_URL=http://example.com/osxcross/osxcross.tar.xz|OSXCROSS_URL=${{ secrets.OSXCROSS_URL }}|g" .env
-sed -i -e "s|DATA_FULL_DOMAIN=https://data.master.clades.nextstrain.org|DATA_FULL_DOMAIN=https://data.master.clades.nextstrain.org|g" .env

- name: "Run unit tests"
run: |
@@ -217,7 +216,6 @@ jobs:
run: |
cp .env.example .env
sed -i -e "s|OSXCROSS_URL=http://example.com/osxcross/osxcross.tar.xz|OSXCROSS_URL=${{ secrets.OSXCROSS_URL }}|g" .env
-sed -i -e "s|DATA_FULL_DOMAIN=https://data.master.clades.nextstrain.org|DATA_FULL_DOMAIN=https://data.master.clades.nextstrain.org|g" .env

- name: "Run lints"
run: |
@@ -251,7 +249,7 @@ jobs:
- name: "Run smoke tests (linux)"
run: |
chmod +x ./.out/*
-./tests/run-smoke-tests ./.out/nextclade-x86_64-unknown-linux-gnu
+JOBS=2 ./tests/run-smoke-tests ./.out/nextclade-x86_64-unknown-linux-gnu

# run-smoke-tests-mac:
# name: "Run smoke tests (mac)"
6 changes: 3 additions & 3 deletions .github/workflows/web.yml
@@ -37,23 +37,23 @@ jobs:
run: |
echo "ENV_NAME=release" >> $GITHUB_ENV
echo "FULL_DOMAIN=https://clades.nextstrain.org" >> $GITHUB_ENV
-echo "DATA_FULL_DOMAIN=https://data.clades.nextstrain.org" >> $GITHUB_ENV
+echo "DATA_FULL_DOMAIN=https://data.clades.nextstrain.org/v3" >> $GITHUB_ENV
echo "PLAUSIBLE_IO_DOMAIN=clades.nextstrain.org" >> $GITHUB_ENV

- name: "Setup environment (staging)"
if: endsWith(github.ref, '/staging')
run: |
echo "ENV_NAME=staging" >> $GITHUB_ENV
echo "FULL_DOMAIN=https://staging.clades.nextstrain.org" >> $GITHUB_ENV
-echo "DATA_FULL_DOMAIN=https://data.staging.clades.nextstrain.org" >> $GITHUB_ENV
+echo "DATA_FULL_DOMAIN=https://data.staging.clades.nextstrain.org/v3" >> $GITHUB_ENV
echo "PLAUSIBLE_IO_DOMAIN=staging.clades.nextstrain.org" >> $GITHUB_ENV

- name: "Setup environment (master)"
if: ${{ !endsWith(github.ref, '/staging') && !endsWith(github.ref, '/release') }}
run: |
echo "ENV_NAME=master" >> $GITHUB_ENV
echo "FULL_DOMAIN=https://master.clades.nextstrain.org" >> $GITHUB_ENV
-echo "DATA_FULL_DOMAIN=https://data.master.clades.nextstrain.org" >> $GITHUB_ENV
+echo "DATA_FULL_DOMAIN=https://data.master.clades.nextstrain.org/v3" >> $GITHUB_ENV
echo "PLAUSIBLE_IO_DOMAIN=master.clades.nextstrain.org" >> $GITHUB_ENV

- name: "Checkout code"
157 changes: 157 additions & 0 deletions 01cds.py
@@ -0,0 +1,157 @@
from Bio import Seq
import numpy as np

#      012345678901234567890123456789012345678901234567890123
ref = 'TGATGCACAATCGTTTTTAAACGGGTTTGCGGTGTAAGTGCAGCCCGTCTTACA'

ref_aln = 'TGATGCACA---ATCGTTTTTAAACGGGTTTGCGGTGTAAGTGCAGCCCGTCTTACA'
qry_aln = '-GATGCACACGCATC---TTTAAACGGGTTTGCGGTGTCAGT---GCCCGTCTTACA'



cds_1 = [{'start':4, 'end':37}]
cds_2 = [
    {'start':4, 'end':21},
    {'start':20, 'end':39}, # slippage at position 20 -- 20 is read twice
    {'start':45, 'end':51}, # another cds segment
]



def extract_cds_sequence(cds_annotation, seq):
    '''extract a cds from a raw sequence'''
    cds = ''
    for segment in cds_annotation:
        cds += seq[segment['start']:segment['end']]

    return cds

def extract_cds_alignment(cds_annotation, seq_aln, coord_map):
    cds_aln = ''
    cds_to_aln = []
    rta = coord_map['ref_to_aln']
    for si,segment in enumerate(cds_annotation):
        start = rta[segment['start']]
        end = rta[segment['end']]
        cds_to_aln.append({"global":np.arange(start, end), 'start':len(cds_aln), "len": end-start})
        cds_aln += seq_aln[start:end]

    return cds_aln, cds_to_aln

def get_aln_to_seq(aln):
    aln_array = np.array(list(aln))
    aln_to_seq = np.cumsum(aln_array!='-') - 1
    aln_to_seq = np.concatenate([aln_to_seq, [aln_to_seq[-1]+1]])
    seq_to_aln = np.arange(len(aln))[aln_array!='-']
    return aln_to_seq, seq_to_aln


def make_coord_map(ref_aln, qry_aln):

    aln_to_ref, ref_to_aln = get_aln_to_seq(ref_aln)
    aln_to_qry, qry_to_aln = get_aln_to_seq(qry_aln)

    return {'aln_to_ref':aln_to_ref, 'aln_to_qry': aln_to_qry, 'ref_to_aln': ref_to_aln}

def cds_to_global_aln_position(pos, cds_to_aln_map):
    '''
    map a position in the extracted alignment of the CDS to the global alignment.
    returns a result for each CDS segment, but a single position can only be in ONE CDS segment
    '''
    result = []
    for segment in cds_to_aln_map:
        res = {}
        pos_in_segment = pos - segment['start']
        if pos_in_segment<0:
            res['status'] = 'before'
            res['pos'] = np.nan
        elif pos_in_segment>=segment['len']:
            res['status'] = 'after'
            res['pos'] = np.nan
        else:
            res['status'] = 'inside'
            res['pos'] = segment["global"][pos_in_segment]
        result.append(res)

    return result

def cds_to_global_ref_position(pos, cds_to_aln_map, coord_map):
    '''
    map a position in the extracted alignment of the CDS to the reference sequence.
    returns a result for each CDS segment, but a single position can only be in ONE CDS segment
    '''
    cds_to_aln_res = cds_to_global_aln_position(pos, cds_to_aln_map)
    result = []
    for segment in cds_to_aln_res:
        if segment['status']=='inside':
            result.append({'status':'inside', 'pos':coord_map['aln_to_ref'][segment['pos']]})
        else:
            result.append(segment)
    return result

def cds_to_global_aln_range(start, end, cds_to_aln_map):
    '''
    map a range in the extracted alignment of the CDS to the global alignment.
    returns a result for each CDS segment, as a range can span multiple CDS-segments
    '''
    cds_to_aln_start = cds_to_global_aln_position(start, cds_to_aln_map)
    # need to map end position -1 to correspond to the last included position
    cds_to_aln_end = cds_to_global_aln_position(end-1, cds_to_aln_map)
    result = []
    for seg_start, seg_end, seg_map in zip(cds_to_aln_start, cds_to_aln_end, cds_to_aln_map):
        if seg_end['status']=="before":
            result.append({"status":"before", "start":np.nan, "end":np.nan})
            continue
        if seg_start['status']=="after":
            result.append({"status":"after", "start":np.nan, "end":np.nan})
            continue

        if seg_start['status']=="before":
            start_pos = seg_map["global"][0]
        elif seg_start['status']=='inside':
            start_pos = seg_start["pos"]

        # map end and increment by one to correspond to open interval
        if seg_end['status']=="after":
            end_pos = seg_map["global"][-1]+1
        elif seg_end['status']=='inside':
            end_pos = seg_end["pos"]+1

        result.append({"status":"covered", "start":start_pos, "end":end_pos})
    return result

def codon_to_global_aln_range(codon, cds_map):
    '''
    expands a codon in the extracted alignment to a range in the global alignment
    '''
    start_pos = codon*3
    end_pos = codon*3 + 3
    return cds_to_global_aln_range(start_pos, end_pos, cds_map)

print('Translation of reference sequence:')
print(Seq.translate(extract_cds_sequence(cds_1, ref)))
print(Seq.translate(extract_cds_sequence(cds_2, ref)))

coord_map = make_coord_map(ref_aln, qry_aln)
print('\nGlobal coordinate map:')
print(coord_map)

cds_aln, cds_map = extract_cds_alignment(cds_2, ref_aln, coord_map)

print('\nExtracted CDS from the alignment:')
print(extract_cds_alignment(cds_2, ref_aln, coord_map)[0])
print(extract_cds_alignment(cds_2, qry_aln, coord_map)[0])


print('\nPosition in the aligned CDS mapped to global alignment:')
print(cds_to_global_aln_position(10,cds_map))
print('\nPosition in the aligned CDS mapped to reference sequence:')
print(cds_to_global_ref_position(10,cds_map, coord_map))

print('\nRange in the aligned CDS mapped to global alignment:')
print(cds_to_global_aln_range(10,15,cds_map))

print('\nCodon in the aligned CDS mapped to global alignment:')
print("5:",codon_to_global_aln_range(5,cds_map))
print("6:",codon_to_global_aln_range(6,cds_map))
print("7:",codon_to_global_aln_range(7,cds_map))
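
The coordinate bookkeeping done by `get_aln_to_seq` in `01cds.py` can be followed more easily on a tiny input. The sketch below is a plain-Python restatement of the same idea (no NumPy), written for illustration only; the function name `aln_coordinate_maps` and the toy alignment rows are assumptions and do not appear in the pull request.

```python
def aln_coordinate_maps(aln, gap="-"):
    """Return (aln_to_seq, seq_to_aln) for one row of an alignment.

    aln_to_seq[i]: index in the ungapped sequence of the last non-gap
                   character at or before alignment column i (-1 if none yet).
                   One extra trailing element maps the end-of-alignment
                   position, so exclusive range ends can be looked up.
    seq_to_aln[j]: alignment column of the j-th ungapped character.
    """
    aln_to_seq = []
    seq_to_aln = []
    n_seen = 0  # number of non-gap characters consumed so far
    for col, char in enumerate(aln):
        if char != gap:
            seq_to_aln.append(col)
            n_seen += 1
        aln_to_seq.append(n_seen - 1)
    aln_to_seq.append(n_seen)  # sentinel for the exclusive end position
    return aln_to_seq, seq_to_aln


# Toy row with an internal gap of length two.
a2s, s2a = aln_coordinate_maps("A--CG")
print(a2s)  # [0, 0, 0, 1, 2, 3]
print(s2a)  # [0, 3, 4]

# A leading gap maps to -1: no sequence character has been seen yet.
lead_a2s, lead_s2a = aln_coordinate_maps("-AC")
print(lead_a2s)  # [-1, 0, 1, 2]
print(lead_s2a)  # [1, 2]
```

Gap columns inherit the index of the preceding sequence character, which is why a position inside a deletion still maps to a definite reference or query coordinate.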