Fix or remove 21I and 21J #10
Comments
Ah, I missed your explanation in #8 while I was writing in #4! Makes sense, and hopefully the mutation frequencies are not a hard requirement in virus_properties.json because I can easily construct a virus_properties.json with just the mutations for 21I and 21J, but it would be more complicated to include allele frequencies. [And accuracy of allele frequencies is subject to amplicon dropout issues in Delta and Omicron... working with SARS-CoV-2 sequences has been an education in how many ways there are for sequences to be not-quite-right in ways that confound attempts at phylogenetics...]
If you want to proceed before I implement a proper fix: Within So if you want to give your own mutations lists a try, you can put Or you modify |
Here is a python snippet that fetches my nextstrain.clade-mutations.tsv from github and overwrites ./virus_properties.json with its mutations:
Now it identifies 471.genbank.aligned.fa.gz as 21J / 21K. BTW the AY.* numbers for 21I and 21J are not cleanly divided into lower and higher lineage numbers, they're all interspersed, so I'm not really sure how to describe them in mapping.csv. Both 21I and 21J are
Just for fun I also tried making virus_properties.json from pango.clade-mutations.tsv and adding a line for each AY.* lineage to mapping.csv but it didn't work for this example, even with It would be cool to have a hierarchical spec so that, for example, when a 21J match is found, the mutations in the relevant AY.* lineages could be searched next to see if any of them can give a more specific match. But in the meantime we can just search some private mutations on covSPECTRUM too. :)
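A hierarchical lookup along those lines could be sketched roughly as follows. This is only an illustration of the idea: the mutation sets are made-up placeholders rather than real lineage definitions, and `match_fraction`, `refine_match`, the `children` mapping and the 0.9 threshold are all hypothetical and not part of the tool.

```python
def match_fraction(sample, definition):
    """Fraction of a definition's mutations that are present in the sample."""
    if not definition:
        return 0.0
    return len(sample & definition) / len(definition)


def refine_match(sample, clade, definitions, children, threshold=0.9):
    """Return the best-matching child lineage of `clade`, or `clade` itself."""
    best_name, best_score = clade, 0.0
    for child in children.get(clade, []):
        score = match_fraction(sample, definitions[child])
        # Only accept a child if nearly all of its mutations are in the sample.
        if score >= threshold and score > best_score:
            best_name, best_score = child, score
    return best_name


# Placeholder data (NOT real mutation lists or real lineage names).
definitions = {
    "21J": {"m1", "m2", "m3"},
    "AY.x": {"m1", "m2", "m3", "m4"},
    "AY.y": {"m1", "m2", "m3", "m5", "m6"},
}
children = {"21J": ["AY.x", "AY.y"]}

sample = {"m1", "m2", "m3", "m4"}
result = refine_match(sample, "21J", definitions, children)
```

If no child lineage clears the threshold, the function falls back to the clade that was matched first, so the coarse result is never lost.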
Looking at CoVariants, my understanding is that you could, for example, define 20I as
If you could map a Nextstrain clade to multiple AY lineages, you can also write queries like |
By the way, (rudimentary) support for Nextstrain clades already exists. Here are the Nextstrain clades that have And here are the nucleotide mutations of 21J: What I don't like yet is that you have to write "21J (Delta)" instead of just "21J", so there will be changes/improvements. (@corneliusroemer, not sure if you actually know that.)
@chaoran-chen that kind of search I ran that search for one day in November in Belgium and it found both 21I and 21J lineages:
@FedeGueli, yes, you're right. I was just reading https://covariants.org/variants/21I.Delta and thought that S:A222V is specific to 21I. But as I now also see from the following link, that's not the case at all: @lenaschimmel, maybe it's best then to filter directly for the Nextstrain clade? Here, you can also easily get a list of Nextstrain clades.
Thanks, everyone! That's a lot of useful input with multiple possible ways to solve this. I really want to provide a quick fix for this problem within the next few hours. Having 21I and 21J broken is kind of embarrassing when the main use of this software right now might be working with Delta + Omicron recombinants. At the moment, it seems to me that a 1-to-1-mapping of all nextstrain clades and pango lineages can never work, even though it works fine for most of the non-Delta variants (BA.3 is another problem). And I don't think that one naming convention is inherently better or worse than the other, so I'd like to support both, without any need for a total 1-to-1-mapping. This should be possible, but I'm not sure if I can do it today, hence my plan to do a quick fix before that. Does that sound reasonable? @AngieHinrichs wrote:
Yeah, I'd really like that! Given that this needs some major changes in the code, and this is just my hobby project, it should be doable in a timeframe of 1-3 weeks.
@chaoran-chen This is excellent: https://lapis.cov-spectrum.org/open/v1/sample/nuc-mutations?nextstrainClade=21J%20(Delta) @lenaschimmel you can use this to find mutation proportions in each Delta clade! The only slight drawback is that this uses only open data and not GISAID, so it's quite UK/US/Germany-centric, but that's better than nothing. I think for Delta it's much better to use 21I/J rather than B.1.617.2* since I/J are actually quite different beasts.
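For anyone who wants to script this, here is a minimal sketch of querying that endpoint from Python. The URL and the `nextstrainClade` parameter come from the link above; the response shape (a `data` list of objects with `mutation` and `proportion` fields) is an assumption that should be checked against the actual LAPIS output, and the helper names and the 0.9 cut-off are made up for illustration.

```python
import json
from urllib.parse import quote, urlencode
from urllib.request import urlopen

LAPIS_NUC_MUTATIONS = "https://lapis.cov-spectrum.org/open/v1/sample/nuc-mutations"


def build_url(nextstrain_clade):
    """Build the query URL, e.g. for the clade name "21J (Delta)"."""
    query = urlencode({"nextstrainClade": nextstrain_clade}, safe="()", quote_via=quote)
    return f"{LAPIS_NUC_MUTATIONS}?{query}"


def common_mutations(payload, min_proportion=0.9):
    """Keep mutations whose proportion reaches the cut-off.

    ASSUMPTION: payload looks like
    {"data": [{"mutation": "...", "proportion": 0.97}, ...]}.
    """
    return sorted(
        entry["mutation"]
        for entry in payload.get("data", [])
        if entry.get("proportion", 0.0) >= min_proportion
    )


def fetch_common_mutations(nextstrain_clade, min_proportion=0.9):
    """Download and filter the nucleotide mutations for one clade (needs network)."""
    with urlopen(build_url(nextstrain_clade)) as response:
        payload = json.load(response)
    return common_mutations(payload, min_proportion)
```

Calling `fetch_common_mutations("21J (Delta)")` would then give a candidate mutation list for that clade, subject to the open-data caveat mentioned above.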
Already working on it. Currently it seems as if full support for both nextstrain clades and pango lineages is not too hard, so that I don't need to provide a quick fix before doing the real work. I used the cov-spectrum page for Delta to get a list of all AY.* which make up at least 3% of all Deltas. I checked both GISAID and Open Data, results were almost identical. Current You can now mix and match both naming schemes on the command line, i.e. Biggest remaining problem now is that lineages with large overlaps in their mutations cancel each other out, so if you select both a parent like You can use
For now, I tried to select a reasonable default for when you don't use the |
The list of supported clades, which is written down in mappings.csv, was taken from nextclade-data, and contains three Delta clades: 21A, 21I and 21J. The latter two do not map to single pango lineages.

When I switched from nextclade to cov-spectrum to generate the contents of virus_properties.json, I just used mappings.csv to get the pango lineages and make requests to the cov-spectrum API. The pseudo-names "AY (higher)" and "AY (lower)" are not recognized, and the variant definitions in virus_properties.json remain empty. The consequence: This tool claims to support 21I and 21J, but it currently does not.
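One way to catch this kind of silent failure earlier would be to check the mapped names before sending them to the API. The sketch below is a rough illustration only: the regular expression is a simplified approximation of Pango lineage names, not the official naming rule, and `unrecognized_names` is a hypothetical helper that does not exist in this tool.

```python
import re

# Simplified approximation of a Pango lineage name, e.g. "B.1.617.2" or "AY.4".
PANGO_NAME = re.compile(r"^[A-Z]{1,3}(\.\d+)*$")


def unrecognized_names(names):
    """Return the names that do not look like Pango lineages."""
    return [name for name in names if not PANGO_NAME.match(name.strip())]


# Example input mixing plausible lineage names with the two pseudo-names.
suspicious = unrecognized_names(["B.1.1.7", "AY (higher)", "AY (lower)", "AY.4"])
```

Any name reported here would need special handling instead of being passed through as a lineage query.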
To be honest, I never really got around to getting an overview of the Delta/AY diversity, as I got into SARS-CoV-2 genomics when Delta was already declining and Omicron was on the rise. Thus, I have no clear plan on how to solve this. I think, once I get my Delta knowledge up to date, the technical solution might be quite easy.