The VCF 4.4 spec allows for an initial symbol indicating the phasing of the first allele. For example, /0/1 is a valid genotype. This allows for partially phased diploid genotypes such as |0/1. The current VCF-Zarr spec encodes phasing using a single bool per call, which implicitly assumes either no phasing or complete phasing.
This may also have been an issue in earlier versions of the VCF spec, where a partially phased polyploid could have been encoded (e.g., 0/1|1/2). However, this isn't explicitly allowed in the 4.3 spec AFAICT.
timothymillar changed the title from "VCF-Zarr spec does not support partial phasing" to "VCF-Zarr spec does not support partial phasing following the VCF 4.4 spec" on Jul 1, 2024
We could change call_genotype_phased to have shape (variants, samples, ploidy) to support partial phasing. We could also support a shape of (variants, samples) for backwards compatibility.
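To make the per-allele shape concrete, here is a rough sketch of how a single GT string could be turned into one phased flag per allele, with the existing single bool recovered by collapsing over the ploidy axis. The function name parse_gt, the use of -1 for a missing allele, and the rule for an omitted leading symbol (unphased if any other separator is "/", otherwise phased) are assumptions based on my reading of VCF 4.4, not something the VCF-Zarr spec defines.

```python
import re

# Optional leading phasing symbol, a first allele, then (separator, allele) pairs.
_GT = re.compile(r"^([/|]?)(\d+|\.)((?:[/|](?:\d+|\.))*)$")


def parse_gt(gt):
    """Return (alleles, phased) for one GT string, with one flag per allele."""
    m = _GT.match(gt)
    if m is None:
        raise ValueError(f"not a valid GT string: {gt!r}")
    prefix, first, rest = m.groups()
    pairs = re.findall(r"([/|])(\d+|\.)", rest)
    seps = [sep for sep, _ in pairs]
    if not prefix:
        # Assumed implicit rule: the first allele is unphased if any
        # other separator is "/", and phased otherwise.
        prefix = "/" if "/" in seps else "|"
    raw = [first] + [allele for _, allele in pairs]
    # Assumption: -1 marks a missing allele (".").
    alleles = [-1 if a == "." else int(a) for a in raw]
    # Each allele takes the phasing symbol that precedes it.
    phased = [sep == "|" for sep in [prefix] + seps]
    return alleles, phased


def collapse_phased(phased):
    """Backwards-compatible single bool: True only if every allele is phased."""
    return all(phased)
```

With this, "|0/1" gives flags [True, False], which the single bool can only report as unphased, so the partial phasing is lost in the (variants, samples) shape.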
I think we should consider adding a call_genotype_phase field of type integer which explicitly assigns a phase (0, ..., ploidy - 1) to each call. This would allow us to add estimated phase to datasets after the fact, rather than requiring a whole new dataset to be created when we run phasing algorithms. Ultimately this is where we want to get to with large biobanks (you could imagine having both call_genotype_phase_beagle and call_genotype_phase_shapeit stored).
There's some complexity here in how this would interact with the PS and PSL fields, which I haven't got my head around, though.
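As a rough illustration of the integer call_genotype_phase idea, one possible reading is that the stored genotype stays as it is and a separate array assigns each allele to a haplotype index in 0, ..., ploidy - 1. The sketch below handles a single call under that assumption; apply_phase is a hypothetical helper, and it covers neither partially phased calls nor PS/PSL phase sets.

```python
def apply_phase(alleles, phase):
    """Reorder one call's alleles so that position i holds haplotype i.

    Assumes phase is a permutation of 0..ploidy-1, i.e. a complete phasing.
    """
    ploidy = len(alleles)
    if sorted(phase) != list(range(ploidy)):
        raise ValueError("phase must assign each haplotype index exactly once")
    haplotypes = [None] * ploidy
    for allele, hap in zip(alleles, phase):
        haplotypes[hap] = allele
    return haplotypes


# One stored genotype, two hypothetical phasing results kept side by side.
genotype = [0, 1]
phase_beagle = [0, 1]   # stand-in for a call_genotype_phase_beagle entry
phase_shapeit = [1, 0]  # stand-in for a call_genotype_phase_shapeit entry
```

Here apply_phase(genotype, phase_beagle) leaves the alleles in place while apply_phase(genotype, phase_shapeit) swaps them, so different phasings can be layered over the same genotype data without rewriting it.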