Matching tbprofiler mutations to WHO mutation catalogue #266
-
Hello, I was wondering if you know the best way to match the tbprofiler mutations to the mutation names in the WHO mutation catalogue Excel sheet (https://www.who.int/publications/i/item/9789240028173). I'm interested in parsing some of the data from the catalogue to append to the tbprofiler mutation outputs. I see that https://github.com/jodyphelan/tbdb/blob/master/tbdb.csv has a "WHO Confidence" column alongside the tbprofiler "Gene" and "Mutation" columns. Do you already have a script that parses the data from the WHO catalogue, and if so, would you be willing to share it? Thank you.
-
Hi @taranewman
The process of converting the WHO format to HGVS was done by @LennertVerboven for this paper. Scripts are available at https://github.com/LennertVerboven/WHO_catalogue_paper.
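Once the names are in the same format, attaching the confidence to tbprofiler calls is essentially a lookup on the gene and mutation pair. A minimal sketch, assuming only the "Gene", "Mutation" and "WHO Confidence" column names mentioned in the question; the rows, the `load_confidence` and `annotate` helpers, and the confidence values are made-up placeholders, not real catalogue entries and not the scripts linked above:

```python
import csv
import io

# Placeholder rows standing in for tbdb.csv; only the three column names
# come from the discussion, every value here is invented for illustration.
TBDB_CSV = """Gene,Mutation,WHO Confidence
geneA,c.10A>G,grade-placeholder-1
geneB,p.Ala20Val,grade-placeholder-2
"""


def load_confidence(handle):
    """Build a {(gene, mutation): confidence} lookup from a CSV file handle."""
    lookup = {}
    for row in csv.DictReader(handle):
        lookup[(row["Gene"], row["Mutation"])] = row["WHO Confidence"]
    return lookup


def annotate(calls, lookup):
    """Attach the confidence to each (gene, mutation) call, or None if absent."""
    return [(gene, mutation, lookup.get((gene, mutation))) for gene, mutation in calls]


lookup = load_confidence(io.StringIO(TBDB_CSV))
calls = [("geneA", "c.10A>G"), ("geneC", "c.5del")]  # hypothetical tbprofiler calls
annotated = annotate(calls, lookup)
for record in annotated:
    print(record)
```

Calls without a match come back with `None`, which makes it easy to spot mutations whose names still need converting.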
-
Hi @taranewman
Yes, this is the same variant. The conversion is explained in the paper @jodyphelan linked earlier. The gid gene (locus tag Rv3919c) is encoded on the complementary strand, which is why the number decreases even though the deletion lies further along the genome.
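To illustrate the strand effect, here is a rough sketch of how a genomic position maps to a position within a gene. It is not the conversion from the paper's scripts: the function name `genomic_to_gene_pos` and the toy gene coordinates (1000 to 1099) are assumptions made up for the example, and it ignores upstream positions and multi-base variants.

```python
def genomic_to_gene_pos(genomic_pos: int, gene_start: int, gene_end: int, strand: str) -> int:
    """Return the 1-based position within a gene for a 1-based genomic position.

    gene_start <= gene_end are the gene's genomic coordinates; strand is '+' or '-'.
    """
    if not gene_start <= genomic_pos <= gene_end:
        raise ValueError("position lies outside the gene")
    if strand == "+":
        # Forward strand: gene numbering grows with the genomic coordinate.
        return genomic_pos - gene_start + 1
    if strand == "-":
        # Complementary strand: the gene is read from gene_end backwards,
        # so a larger genomic coordinate gives a smaller gene position.
        return gene_end - genomic_pos + 1
    raise ValueError("strand must be '+' or '-'")


# Toy gene on the complementary strand (hypothetical coordinates).
for g in (1000, 1001, 1099):
    print(g, genomic_to_gene_pos(g, 1000, 1099, "-"))
```

Moving one base further along the genome (1000 to 1001) lowers the gene position by one, which is the same pattern as the decreasing number for gid.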
This list looks correct at first sight.
No, this is not the case. A deletion is determined solely by its position: the nucleotide at a given position is invariable (i.e., the number indicates which nucleotide of the reference genome is meant). The HGVS notation for a deletion actually specifies not to list the nucleotide, as that information is redundant. In our notation the nucleotide is listed, mostly because this matches common practice within TB. Hope that clears it up.
Best