-
Notifications
You must be signed in to change notification settings - Fork 0
Create a "Diploid Genome" and "Diploid GTF GFF" file (method B)
Kiran N' Bishwa edited this page Jun 11, 2018
·
1 revision
- python application called G2Gtools.
Below, is a bash script using a for-loop implementation for all available samples.
#!/bin/bash
set -euf pipefail # to break the pipeline if a workflow is disturbed in any of the code lines.
# redirect stdout/stderror to a file
exec &> CreateDiploidGenome.log
#### Function/Purpose:
# ** Note: Now, we have phased VCF after phaseExtension and Stitching we can extract the Gene sequences using Ref Genome, GTF/GFF and VCF.
# This gene sequence can be used for
# - aligning reads competitively to diploid transcriptome
# - used the sample wise genes, transcript sequence for population genetics analyses.
echo "Part A: Convert the VCF to gzipped format and create index"
# Convert VCF to gzipped VCF and create tabix index
bgzip -f phased.MySpF1.vcf
tabix -f phased.MySpF1.vcf.gz
echo
echo "Part B: Assign the file and folder path"
## Assign the file and folder path
PhasedVcfGz=phased.MySpF1.vcf.gz
G2GTools=/home/priyanka/g2gtools-master/bin/g2gtools
RefGenome=lyrata_genome1To10.fa
RefGtf=lyrata_RawatV2.chr1To10.gtf
RefGff=lyrata.V2.chr1To10.gff
echo
echo "Part C: Starting G2Gtools"
echo "Creating folder for storing output from G2Gtools"
G2GOUT=g2gResults
rm -rf ${G2GOUT}; mkdir ${G2GOUT}
# Now, start exacting required data for each sample.
echo
echo "Starting for loop on given sample names"
echo
# Put, the name of all the required samples.
for item in ms01e ms02g ms03g ms04h MA605 MA611 MA622 MA625 MA629 Ncm8 Sp3 Sp21 Sp76 Sp154 Sp164 SpNor33
do
echo
echo "working on sample ${item}"
# make sample wise directory on the fly
rm -rf ${G2GOUT}/${item}_g2gOut; mkdir ${G2GOUT}/${item}_g2gOut
# 01) now, create VCI file for each sample
echo
echo "Step 01: making VCI file for the given sample ${item}"
python ${G2GTools} vcf2vci -i ${PhasedVcfGz} \
-s ${item} -o ${G2GOUT}/${item}_g2gOut/${item}.vci --diploid
echo
echo "Step 02 - A: Creating Diploid Genome and patching SNPs"
# 02 - A) Create Diploid Genome using reference genome and patch SNP using sample wise VCI file
python ${G2GTools} patch -i ${RefGenome} \
-c ${G2GOUT}/${item}_g2gOut/${item}.vci.gz \
-o ${G2GOUT}/${item}_g2gOut/${item}.SNP.pathched.fa
echo
echo "Step 02 - B: Patching InDels to the SNP patched Genome"
# 02 - B) Patch InDels to the SNP patched Diploid Genome
python ${G2GTools} transform -i ${G2GOUT}/${item}_g2gOut/${item}.SNP.pathched.fa \
-c ${G2GOUT}/${item}_g2gOut/${item}.vci.gz \
-o ${G2GOUT}/${item}_g2gOut/${item}.Snp.Indels.patched.fa
echo
echo "Step 03: Creating sample GTF,GFF from sample ${item} VCI and reference GTF,GFF file"
# 03) Convert reference GFF and GTF to sample wise GTF and GFF
echo "Step 03 - A: Creating sample ${item} GTF file"
# 03 - A) Ref Gtf to sample Gtf
python ${G2GTools} convert -i ${RefGtf} \
-c ${G2GOUT}/${item}_g2gOut/${item}.vci.gz \
-o ${G2GOUT}/${item}_g2gOut/${item}.gtf
echo "Step 03 - B: Creating sample ${item} GFF file"
# 03 - B) Ref Gff to sample Gff
python ${G2GTools} convert -i ${RefGff} \
-c ${G2GOUT}/${item}_g2gOut/${item}.vci.gz \
-o ${G2GOUT}/${item}_g2gOut/${item}.gff
echo
echo "Step 04: Creating SQLite database from personal ${item} GTF"
# *Note: Need to add a method to prepare SQlite database from personal GFF.
# 04) Convert sample Gtf to SQlite database file
# Parse custom gene annotation into g2g database file
python ${G2GTools} gtf2db \
-i ${G2GOUT}/${item}_g2gOut/${item}.gtf \
-o ${G2GOUT}/${item}_g2gOut/${item}.gtf.db
echo
echo "Step 05 : Extract genes, transcritps sequences"
# 05) Extract genes, transcripts and exons
# for now, we are only extracting genes and transcripts
echo "Step 05 -A : Extracting gene sequences for this sample ${item}"
# 05 - A) Genes
python ${G2GTools} extract \
-i ${G2GOUT}/${item}_g2gOut/${item}.Snp.Indels.patched.fa \
-db ${G2GOUT}/${item}_g2gOut/${item}.gtf.db \
--genes > ${G2GOUT}/${item}_g2gOut/${item}.genes.fa
echo "Step 05 - B: Extracting transcript sequences for this sample ${item}"
# 05 - B) Transcripts
python ${G2GTools} extract \
-i ${G2GOUT}/${item}_g2gOut/${item}.Snp.Indels.patched.fa \
-db ${G2GOUT}/${item}_g2gOut/${item}.gtf.db \
--transcripts > ${G2GOUT}/${item}_g2gOut/${item}.transcripts.fa
echo
echo "Completed Sample ${item}"
echo "##### ****** #####"
echo
done
echo "#### Completed all G2GTools process ####"
echo
Competitive alignment can be do on either Reference Transcriptome, Reference Genome or Both.
- Alignment to Diploid Transcriptome can be done using Bowtie
- Alignment to Diploid Genome can be done using using any reference genome base RNAseq alignment tool.
- Alignment to both Genome and Transcriptome can be done using rnaSTAR.