Ancestry and 23andMe appear to map indel SNPs to discrepant positions, causing them to be discarded by SNPs, and sometimes causing the program to fail.
Indel SNPs are positions where the allele is named D (deletion) or I (insertion). They can be deletions or insertions of several bases, or more complex multi-nucleotide changes (so technically, not all are actually SNPs). In my test case, Ancestry covers 8073 indel SNPs, and 23andMe (v5) has 4828, of which 2143 are found on both platforms (by rs#). However, only 1328 of the 2143 report the same chromosomal location, so the remaining 815 are rejected as discrepant.
Most of the location discrepancies are tiny: 475 of them are 1 base, 174 are 2, 50 are 3, 48 are 4, 33 are 5, 16 are 6, and a further 18 differ by 17 bases or fewer. In almost all cases, the difference is the length of the indel, and the Ancestry location is less than the 23andMe location. This seems to be due to Ancestry reporting the position where the deletion starts and 23andMe reporting the position where it ends, but there are some additional odd cases, and neither Ancestry nor 23andMe is always consistent with dbSNP.
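For anyone who wants to reproduce this kind of tally on their own files, here is a minimal sketch of how the offsets could be counted. It assumes each file has already been loaded into a dict mapping rs# to `(chromosome, position)`; the dict layout, the sample values, and the name `offset_histogram` are all made up for illustration and are not part of SNPs.

```python
from collections import Counter

# Hypothetical inputs: rsid -> (chromosome, position), one dict per platform.
ancestry = {"rs1": ("1", 1000), "rs2": ("1", 2000), "rs3": ("2", 500), "rs4": ("3", 42)}
twenty_three = {"rs1": ("1", 1000), "rs2": ("1", 2001), "rs3": ("2", 504), "rs5": ("4", 7)}


def offset_histogram(a, b):
    """Count shared rsids by absolute position difference (0 = same location)."""
    hist = Counter()
    for rsid in a.keys() & b.keys():
        (chrom_a, pos_a), (chrom_b, pos_b) = a[rsid], b[rsid]
        if chrom_a != chrom_b:
            # Not comparable as a base offset; tally separately.
            hist["different chromosome"] += 1
        else:
            hist[abs(pos_b - pos_a)] += 1
    return hist


hist = offset_histogram(ancestry, twenty_three)
for offset, count in sorted(hist.items(), key=lambda kv: str(kv[0])):
    print(offset, count)
```

Only rs# values present in both dicts are counted, which matches the "found on both platforms" comparison described above.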
In practical terms, I'd like to suggest two possible solutions:
1. Loosen the criterion that merged SNPs must have identical locations, and for an Ancestry/23andMe merge, retain the 23andMe location. Having the same rs# and being within 20 bases of each other would seem a reasonably strict criterion.
2. Alternatively, just ignore all discrepant indels. The vast majority of them are homozygous in most people (i.e. the variant alleles are rare and usually disease-associated), so they are not very informative for genealogy.
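To make the two suggestions concrete, here is a rough sketch of what such a rule could look like. It is not how SNPs currently merges files; the dict layout (rs# to `(chromosome, position)`), the sample values, and the names `reconcile` and `MAX_OFFSET` are assumptions for illustration only.

```python
MAX_OFFSET = 20  # bases; the tolerance proposed above

# Hypothetical inputs: rsid -> (chromosome, position), one dict per platform.
ancestry = {"rs1": ("1", 1000), "rs2": ("1", 2000), "rs4": ("3", 42), "rs6": ("5", 100)}
twenty_three = {"rs1": ("1", 1000), "rs2": ("1", 2001), "rs5": ("4", 7), "rs6": ("5", 200)}


def reconcile(a, b, max_offset=MAX_OFFSET):
    """Merge two rsid -> (chrom, pos) dicts, tolerating small position offsets.

    Shared rsids on the same chromosome and within max_offset bases keep the
    location from b (23andMe here). Anything else shared is reported as
    discrepant and left out of the merged result (the "ignore" option).
    """
    merged, discrepant = {}, []
    for rsid in a.keys() | b.keys():
        loc_a, loc_b = a.get(rsid), b.get(rsid)
        if loc_a is None:
            merged[rsid] = loc_b
        elif loc_b is None:
            merged[rsid] = loc_a
        elif loc_a[0] == loc_b[0] and abs(loc_a[1] - loc_b[1]) <= max_offset:
            merged[rsid] = loc_b  # retain the 23andMe location
        else:
            discrepant.append(rsid)
    return merged, sorted(discrepant)


merged, discrepant = reconcile(ancestry, twenty_three)
print(merged["rs2"], discrepant)
```

Calling `reconcile(a, b, max_offset=0)` falls back to requiring identical locations, so the tolerance could be exposed as a parameter without changing the default behaviour.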
I also wanted to note that raising the parameter discrepant_genotypes_threshold to 1000000 did not let my two genotype files merge; the merge only succeeded after I manually deleted all indel SNPs from the two input files. I haven't looked further into this issue, but it may be worth checking out.
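In case it helps anyone hitting the same failure, the manual workaround can be scripted along these lines. This is only a sketch: it assumes tab-separated raw data with the columns rsid, chromosome, position, followed by one or two genotype columns, with `#` comment lines and an optional `rsid` header row. Check your own files before relying on it; `drop_indels` is a made-up helper name.

```python
def drop_indels(lines):
    """Yield lines unchanged, skipping data rows whose genotype contains I or D."""
    for line in lines:
        fields = line.rstrip("\n").split("\t")
        # Comment lines, header rows and short/malformed rows pass through untouched.
        is_data = not line.startswith("#") and len(fields) >= 4 and fields[0] != "rsid"
        if is_data and any(c in "ID" for c in "".join(fields[3:])):
            continue
        yield line


# Small made-up sample covering both assumed layouts.
sample = [
    "# example comment line\n",
    "rsid\tchromosome\tposition\tallele1\tallele2\n",
    "rs1\t1\t1000\tAG\n",
    "rs2\t1\t2000\tDI\n",
    "rs3\t2\t500\tI\tI\n",
    "rs4\t2\t900\tC\tT\n",
]
kept = list(drop_indels(sample))
print("".join(kept), end="")
```

To filter a real file, the same generator can be fed an open file handle and its output written to a new file, leaving the original download untouched.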