Output
#Output of the classify command
All subcommands of classify produce a file in a standard format: a tab-delimited file with a header naming each column.
The file will look like the following:
chrom position ref_base var_base normal_counts_a normal_counts_b tumour_counts_a tumour_counts_b p_AA_AA p_AA_AB p_AA_BB p_AB_AA p_AB_AB p_AB_BB p_BB_AA p_BB_AB p_BB_BB
1 1299268 T C 26 25 3 17 0.0000 0.0000 0.0000 0.0000 0.0000 1.0000 0.0000 0.0000 0.0000
The last nine columns of the file list the posterior probability of each of the joint genotypes. They have the form p_gN_gT where gN is the normal genotype and gT is the tumour genotype. For deterministic methods only one of these columns will be non-zero and will have a value of 1.
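As a minimal sketch of how the p_gN_gT naming can be used, the Python below splits a probability column name into its normal and tumour genotypes and picks the most probable joint genotype for one row. The names `PROB_COLUMNS`, `split_joint_genotype` and `most_probable_genotype` are illustrative and not part of the tool; the probabilities are taken from the example row above.

```python
GENOTYPES = ("AA", "AB", "BB")

# The nine probability columns, in the order they appear in the header.
PROB_COLUMNS = [f"p_{g_n}_{g_t}" for g_n in GENOTYPES for g_t in GENOTYPES]


def split_joint_genotype(column):
    """Split a column name such as 'p_AB_BB' into (normal, tumour) genotypes."""
    _, normal, tumour = column.split("_")
    return normal, tumour


def most_probable_genotype(row):
    """Return the (normal, tumour) genotype pair with the highest probability.

    `row` maps column names to values, as read from the output file.
    """
    best = max(PROB_COLUMNS, key=lambda column: float(row[column]))
    return split_joint_genotype(best)


# Probabilities from the example row above, keyed by column name.
values = "0.0000 0.0000 0.0000 0.0000 0.0000 1.0000 0.0000 0.0000 0.0000".split()
example = dict(zip(PROB_COLUMNS, values))

print(most_probable_genotype(example))  # ('AB', 'BB')
```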
The rows of the file correspond to genomic positions. The columns are as follows:
- chrom - Chromosome the site is on.
- position - 1-based position on the chromosome.
- ref_base - Base found in reference genome at this position.
- var_base - Variant base found at this position. If no variant base is found this will be N.
- normal_counts_a - Number of reads matching ref_base in the normal at this position.
- normal_counts_b - Number of reads matching var_base in the normal at this position.
- tumour_counts_a - Number of reads matching ref_base in the tumour at this position.
- tumour_counts_b - Number of reads matching var_base in the tumour at this position.
- p_AA_AA - Probability of joint genotype AA_AA
- p_AA_AB - Probability of joint genotype AA_AB
- p_AA_BB - Probability of joint genotype AA_BB
- p_AB_AA - Probability of joint genotype AB_AA
- p_AB_AB - Probability of joint genotype AB_AB
- p_AB_BB - Probability of joint genotype AB_BB
- p_BB_AA - Probability of joint genotype BB_AA
- p_BB_AB - Probability of joint genotype BB_AB
- p_BB_BB - Probability of joint genotype BB_BB
To extract somatic positions from this file, I suggest summing p_AA_AB and p_AA_BB to get the somatic genotype probability. You can then threshold this probability at whatever level is appropriate.
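The sum described above can be sketched in Python as follows. The function names and the default threshold of 0.5 are assumptions made for illustration only, and the example row uses made-up probabilities rather than real output.

```python
def somatic_probability(row):
    """Sum the two somatic joint genotype probabilities for one row.

    `row` maps column names to values, as read from the output file.
    """
    return float(row["p_AA_AB"]) + float(row["p_AA_BB"])


def is_somatic(row, threshold=0.5):
    """Return True if the somatic probability reaches the threshold.

    The default threshold is an arbitrary placeholder; choose a level
    that suits your own data.
    """
    return somatic_probability(row) >= threshold


# Hypothetical values for the two relevant columns (not real output).
row = {"p_AA_AB": "0.7500", "p_AA_BB": "0.1250"}

print(somatic_probability(row))         # 0.875
print(is_somatic(row, threshold=0.9))   # False
```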
This file format can easily be manipulated using Python and the csv module, which is part of the standard library. The csv.DictReader class will be especially useful.
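A minimal sketch of reading the file with csv.DictReader and keeping sites whose somatic probability (p_AA_AB plus p_AA_BB) reaches a threshold is shown below. The function name `somatic_sites` and the threshold value are assumptions for illustration. The first data row is the example from this page; the second is a made-up row included only so that the filter has something to report.

```python
import csv
import io


def somatic_sites(handle, threshold=0.5):
    """Yield (chrom, position, probability) for sites at or above the threshold."""
    reader = csv.DictReader(handle, delimiter="\t")
    for row in reader:
        p_somatic = float(row["p_AA_AB"]) + float(row["p_AA_BB"])
        if p_somatic >= threshold:
            yield row["chrom"], int(row["position"]), p_somatic


header = (
    "chrom position ref_base var_base "
    "normal_counts_a normal_counts_b tumour_counts_a tumour_counts_b "
    "p_AA_AA p_AA_AB p_AA_BB p_AB_AA p_AB_AB p_AB_BB p_BB_AA p_BB_AB p_BB_BB"
).split()

rows = [
    # Example row from this page (joint genotype AB_BB, so not somatic).
    "1 1299268 T C 26 25 3 17 0.0000 0.0000 0.0000 0.0000 0.0000 1.0000 0.0000 0.0000 0.0000".split(),
    # Hypothetical row, invented for this example (joint genotype AA_AB).
    "1 2000000 A G 30 0 12 14 0.0000 1.0000 0.0000 0.0000 0.0000 0.0000 0.0000 0.0000 0.0000".split(),
]

# Build an in-memory tab-delimited file so the example is self-contained.
text = "\n".join("\t".join(fields) for fields in [header] + rows) + "\n"

for site in somatic_sites(io.StringIO(text), threshold=0.5):
    print(site)  # ('1', 2000000, 1.0)
```

For a file on disk, the same function can be given a handle opened with `open(path, newline="")`, where `path` is the location of your own output file.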