
Create a "Diploid Genome" and "Diploid GTF/GFF" file (method A)

Kiran N' Bishwa edited this page Jun 11, 2018 · 1 revision

Create a diploid genome (method A) using the following steps:


Step 01: Select the sample of interest from the multisample VCF file (e.g., sample "MA625").

$ bcftools view -s MA625 phasedVCF-short.vcf.gz -U > phasedVCF-short-MA625.vcf

# bgzip-compress and index the VCF (bcftools consensus expects a bgzipped, tabix-indexed VCF)
$ bgzip phasedVCF-short-MA625.vcf
$ tabix -f phasedVCF-short-MA625.vcf.gz
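Because `bcftools consensus -H 1` and `-H 2` take the first and second allele of each GT field regardless of phasing, the two haplotype genomes built in Step 02 are only meaningful at phased sites (`0|1` rather than `0/1`). A minimal sketch of a quick check, assuming an uncompressed single-sample VCF on stdin; `gt_phasing_summary` is a hypothetical helper name, not part of bcftools:

```shell
# Summarise how many genotype calls are phased ("|") vs unphased ("/").
# Reads an uncompressed, single-sample VCF on stdin; per the VCF spec,
# GT is the first FORMAT sub-field when present.
gt_phasing_summary() {
  awk -F'\t' '
    /^#/ { next }                      # skip meta-information and header lines
    {
      split($10, parts, ":")           # column 10 = the (only) sample column
      gt = parts[1]
      if (gt ~ /\|/)      phased++
      else if (gt ~ /\//) unphased++
      else                other++      # haploid or missing calls, e.g. "1" or "."
    }
    END { printf "phased=%d unphased=%d other=%d\n", phased, unphased, other }
  '
}
```

Usage: `$ bgzip -dc phasedVCF-short-MA625.vcf.gz | gt_phasing_summary`. A large `unphased` count suggests the haplotype assignment at those sites will be arbitrary.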

Step 02: Create a chain file and an alternate reference FASTA file for each parental strain (haplotype).

$ bcftools consensus -c MA625-left.chain -f reference.fasta phasedVCF-short-MA625.vcf.gz -s MA625 -H 1  > MA625-left.fa
$ bcftools consensus -c MA625-right.chain -f reference.fasta phasedVCF-short-MA625.vcf.gz -s MA625 -H 2  > MA625-right.fa
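Indels in a haplotype shift coordinates relative to the reference, which is why the chain file is needed for the liftover in Step 03. As a quick sanity check, here is a small sketch (hypothetical helper `fasta_lengths`, plain awk) that prints each sequence name and its length, so the reference and haplotype FASTA files can be compared:

```shell
# Print "<name>\t<length>" for every sequence in one or more FASTA files
# (or stdin). The name is the first word of the ">" header line.
fasta_lengths() {
  awk '
    /^>/ {
      if (name != "") print name "\t" len    # flush the previous record
      name = substr($1, 2); len = 0; next
    }
    { len += length($0) }                    # sum sequence line lengths
    END { if (name != "") print name "\t" len }
  ' "$@"
}
```

In bash, `diff <(fasta_lengths reference.fasta) <(fasta_lengths MA625-left.fa)` lists the sequences whose length changed.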

Step 03: Lift over the reference GTF/GFF file to each parental strain.

liftOver is a separate, standalone tool provided by UCSC and can be downloaded as a single binary executable from http://hgdownload.cse.ucsc.edu/admin/exe/linux.x86_64/

$ liftOver -gff reference.gff3 MA625-left.chain MA625-left.genome.gff3 MA625-left.unmapped
$ liftOver -gff reference.gff3 MA625-right.chain MA625-right.genome.gff3 MA625-right.unmapped
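With `-gff`, liftOver converts each GFF line independently, so some features (for example exons overlapping a deletion) can end up in the `.unmapped` file while the rest of their gene is lifted. A minimal sketch for counting how many features failed, assuming that, as liftOver normally does, each failure reason is written as a `#` comment line (e.g. `#Deleted in new`) before the record, and that the GFF has no embedded `##FASTA` section; `liftover_summary` is a hypothetical name:

```shell
# Compare the number of feature lines given to liftOver with the number it
# reported as unmapped. Lines starting with "#" (GFF3 pragmas, and the reason
# lines liftOver writes into the .unmapped file) and blank lines are ignored.
liftover_summary() {   # usage: liftover_summary INPUT.gff3 OUTPUT.unmapped
  awk '
    /^#/ || NF == 0     { next }
    FILENAME == ARGV[1] { total++; next }
                        { unmapped++ }
    END { printf "total=%d unmapped=%d\n", total, unmapped }
  ' "$1" "$2"
}
```

Usage: `$ liftover_summary reference.gff3 MA625-left.unmapped`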

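Step 04: The new strain-specific FASTA and GFF files need additional parsing to add appropriate flags, so that the two haplotypes, which otherwise share the reference's sequence names, remain distinguishable once they are combined in Step 05.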

For example:

from

to

This can be done with awk or, more simply, with Python.
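The exact from/to flag format is not shown here, so the sketch below assumes a hypothetical convention: a haplotype suffix appended to every sequence name (e.g. `1` becomes `1_MA625-left`). `tag_fasta` and `tag_gff` are illustrative awk helpers, not an established tool:

```shell
# Hypothetical flag format: append a haplotype suffix to every sequence name,
# e.g. "1" -> "1_MA625-left". Adjust to the naming your pipeline expects.

# FASTA: rewrite only the sequence name (first word) of each ">" header line.
tag_fasta() {    # usage: tag_fasta SUFFIX < in.fa > out.fa
  awk -v sfx="$1" '/^>/ { sub(/^>[^ \t]+/, "&" sfx) } { print }'
}

# GFF3/GTF: rewrite column 1 (seqid) of feature lines; comment lines pass through.
tag_gff() {      # usage: tag_gff SUFFIX < in.gff3 > out.gff3
  awk -v sfx="$1" 'BEGIN { FS = OFS = "\t" }
    /^#/    { print; next }
    NF >= 9 { $1 = $1 sfx }
            { print }'
}
```

For example (output file names are hypothetical):

$ tag_fasta _MA625-left < MA625-left.fa > MA625-left.tagged.fa
$ tag_gff _MA625-left < MA625-left.genome.gff3 > MA625-left.tagged.gff3

This sketch leaves `##sequence-region` pragmas and feature IDs (`ID=`, `gene_id`, `transcript_id`) unchanged; if downstream tools need those flagged as well, they would need a similar rewrite.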


Step 05: Concatenate the two personal (haplotype-specific) genome FASTA files into a single diploid genome, and likewise the two GTF/GFF files.
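FASTA files can simply be joined with `cat`. For GFF3, a plain `cat` would leave a second `##gff-version` line in the middle of the file, which strict parsers may reject. A minimal sketch with two hypothetical helpers: `merge_gff` keeps only the first header, and `dup_fasta_names` reports any sequence name that occurs twice (a sign that the Step 04 flags are missing):

```shell
# Merge two GFF3 files: every line of the first file, plus all lines of the
# second file except its "##gff-version" header.
merge_gff() {    # usage: merge_gff LEFT.gff3 RIGHT.gff3 > diploid.gff3
  awk 'FNR == NR || !/^##gff-version/' "$1" "$2"
}

# Print each FASTA sequence name that occurs more than once (printed once).
dup_fasta_names() {
  awk '/^>/ { name = substr($1, 2); if (seen[name]++ == 1) print name }' "$@"
}
```

For example (file names are hypothetical Step 04 outputs):

$ cat MA625-left.tagged.fa MA625-right.tagged.fa > MA625-diploid.fa
$ merge_gff MA625-left.tagged.gff3 MA625-right.tagged.gff3 > MA625-diploid.gff3
$ dup_fasta_names MA625-diploid.fa    # should print nothing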


(Optional): Extract transcript sequences from the reference or personal genome using a GFF or GTF file

$ bedtools getfasta [OPTIONS] -fi <input FASTA> -bed <BED/GFF/VCF> -fo <output FASTA>
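Run directly on a GFF, `bedtools getfasta` returns one sequence per feature line (e.g. per exon), not a spliced transcript. With BED12 input, its `-split` option concatenates the blocks (exons) of each record. A minimal sketch that builds such a BED12 from GFF3 exon lines, assuming exons carry a GFF3 `Parent=` attribute (GTF's `transcript_id` is not handled); `gff_exons_to_bed12` is a hypothetical helper:

```shell
# Convert GFF3 exon lines into one BED12 record per transcript (grouped by the
# first Parent= value), for use with "bedtools getfasta -split".
# GFF is 1-based and closed; BED is 0-based and half-open, hence the "- 1" on starts.
gff_exons_to_bed12() {   # usage: gff_exons_to_bed12 < annotation.gff3 > transcripts.bed12
  # keep exon feature lines only, then sort them into genomic order
  awk -F'\t' '$0 !~ /^#/ && $3 == "exon"' |
    sort -t "$(printf '\t')" -k1,1 -k4,4n |
    awk -F'\t' -v OFS='\t' '
      {
        if (match($9, /Parent=[^;]+/) == 0) next      # skip exons without a parent
        t = substr($9, RSTART + 7, RLENGTH - 7)
        sub(/,.*/, "", t)                             # keep only the first parent
        if (!(t in chrom)) {
          order[++n] = t; chrom[t] = $1; strand[t] = $7; start[t] = $4 - 1
        }
        if ($5 + 0 > stop[t]) stop[t] = $5 + 0
        count[t]++
        sizes[t]  = sizes[t] ($5 - $4 + 1) ","        # block sizes
        starts[t] = starts[t] ($4 - 1 - start[t]) "," # block starts relative to chromStart
      }
      END {
        for (i = 1; i <= n; i++) {
          t = order[i]
          print chrom[t], start[t], stop[t], t, 0, strand[t], start[t], stop[t], 0, count[t], sizes[t], starts[t]
        }
      }'
}
```

For example (file names are hypothetical):

$ gff_exons_to_bed12 < MA625-left.genome.gff3 > MA625-left.transcripts.bed12
$ bedtools getfasta -fi MA625-left.fa -bed MA625-left.transcripts.bed12 -split -s -name -fo MA625-left.transcripts.fa

Run it on each haplotype's GFF separately (or make transcript IDs unique first): grouping by `Parent=` would otherwise merge the left and right copies of the same transcript.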

Next step: proceed to competitive alignment against the diploid genome.

Competitive alignment can be done against the reference transcriptome, the reference genome, or both.

  • Alignment to the diploid transcriptome can be done using Bowtie.
  • Alignment to the diploid genome can be done using any reference-genome-based RNA-seq alignment tool.
  • Alignment to both the genome and the transcriptome can be done using rnaSTAR.
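In a competitive alignment each read is placed on whichever haplotype copy it matches best, so a first look at the result is a per-haplotype tally. A minimal sketch, assuming (hypothetically) that each reference sequence name ends in `_<haplotype>`, e.g. `1_MA625-left`, and that SAM text is piped in from `samtools view`; `haplotype_counts` is an illustrative name:

```shell
# Count primary, mapped alignments per haplotype from SAM text on stdin.
# Assumes (hypothetically) that each reference name ends in "_<haplotype>";
# the part after the last "_" is used as the haplotype label.
haplotype_counts() {   # usage: samtools view aln.bam | haplotype_counts [MIN_MAPQ]
  awk -F'\t' -v minq="${1:-0}" '
    /^@/ { next }                                  # SAM header lines
    {
      flag = $2 + 0
      # skip unmapped (0x4), secondary (0x100) and supplementary (0x800) records
      if (int(flag / 4) % 2 || int(flag / 256) % 2 || int(flag / 2048) % 2) next
      if ($5 + 0 < minq + 0) next                   # optional MAPQ filter
      hap = $3
      sub(/^.*_/, "", hap)
      n[hap]++
    }
    END { for (h in n) print h "\t" n[h] }
  ' | sort
}
```

Mates of a paired-end read are counted separately. Reads without any heterozygous site align equally well to both copies, and many aligners (STAR, for example) give such multi-mapping reads a low MAPQ, which is what the optional threshold is for.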