We introduced GGCPan, a graph-based pangenome reference for gastric cancer, and systematically compared its results with those of conventional genomic analyses that use either the human reference genome (GRCh38) or a linear pangenome as the reference. Below we describe how GGCPan is constructed and the analysis pipeline.
- Step 1: Align the assembled contigs (>500 bp) to GRCh38 using minimap2.
- Step 2: Detect variants using paftools.js, then remove small variants and variants with quality <= 60.
- Step 3: Embed the remaining variants into GRCh38 using the vg toolkit.
Steps 1-3 are implemented in `construct.sh`. Run it to generate the graph-modeled pangenome:
bash construct.sh
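The three construction steps can be sketched as follows. This is a minimal illustration, not the repository's `construct.sh`. It assumes minimap2, `paftools.js` (on `PATH`), bgzip, tabix and vg are installed. The `asm5` preset, the reading of "small" as shorter than 50 bp, the reading of "quality" as the VCF QUAL column, and the function and file names are all hypothetical choices. `paftools.js call` also has minimum alignment-length thresholds tuned for long contigs, so check its options before applying it to contigs as short as 500 bp.

```shell
#!/usr/bin/env bash
# Hypothetical sketch of construction steps 1-3; not the repository's construct.sh.
# Assumptions: asm5 preset, "small" = shorter than 50 bp, "quality" = VCF QUAL column.
set -euo pipefail

# Step 1a: keep contigs longer than 500 bp (handles multi-line FASTA).
filter_contigs() {
  awk -v min=500 '
    /^>/ { if (name != "" && length(seq) > min) print name "\n" seq
           name = $0; seq = ""; next }
         { seq = seq $0 }
    END  { if (name != "" && length(seq) > min) print name "\n" seq }
  ' "$1"
}

# Step 2b: drop variants shorter than 50 bp and records with QUAL <= 60.
filter_variants() {
  awk -F'\t' '
    /^#/ { print; next }
    {
      d = length($5) - length($4)
      if (d < 0) d = -d
      if (d >= 50 && $6 != "." && $6 + 0 > 60) print
    }
  ' "$1"
}

# build_graph REF.fa CONTIGS.fa OUT_PREFIX
build_graph() {
  local ref=$1 contigs=$2 out=$3
  filter_contigs "$contigs" > "$out.contigs.fa"
  # Step 1b: contig-to-reference alignment; paftools.js call needs the cs tag.
  minimap2 -cx asm5 --cs "$ref" "$out.contigs.fa" > "$out.paf"
  # Step 2a: paftools.js call expects PAF sorted by target name, then target start.
  sort -k6,6 -k8,8n "$out.paf" > "$out.sorted.paf"
  paftools.js call -f "$ref" "$out.sorted.paf" > "$out.raw.vcf"
  filter_variants "$out.raw.vcf" | bgzip > "$out.vcf.gz"
  tabix -p vcf "$out.vcf.gz"
  # Step 3: embed the retained variants into GRCh38; -a keeps alt-allele paths.
  vg construct -r "$ref" -v "$out.vcf.gz" -a > "$out.vg"
}
```

After sourcing the script, `build_graph GRCh38.fa contigs.fa ggcpan` would write `ggcpan.vg` (names illustrative).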
- We first align the raw reads to the linear references with BWA-MEM, then mark duplicates and recalibrate base quality scores with GATK BQSR.
- The code is in `Alignment/gatk.slurm`.
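The linear-reference preprocessing can be sketched as below. This is an assumption-laden illustration, not `Alignment/gatk.slurm`. It assumes paired-end FASTQ input, bwa, samtools and GATK4 on `PATH`, duplicate marking with GATK `MarkDuplicates`, and a known-sites VCF for BQSR. The function name, read-group fields and file names are hypothetical.

```shell
#!/usr/bin/env bash
# Hypothetical sketch of linear-reference preprocessing; not Alignment/gatk.slurm.
# Assumptions: bwa, samtools, GATK4 on PATH; paired-end FASTQ; a known-sites VCF for BQSR.
set -euo pipefail

# align_linear REF.fa R1.fq.gz R2.fq.gz SAMPLE KNOWN_SITES.vcf.gz [THREADS]
align_linear() {
  local ref=$1 r1=$2 r2=$3 sample=$4 known=$5 threads=${6:-8}
  # BWA-MEM with a read group; Mutect2 later identifies samples by the SM tag.
  bwa mem -t "$threads" -R "@RG\tID:${sample}\tSM:${sample}\tPL:ILLUMINA" \
    "$ref" "$r1" "$r2" \
    | samtools sort -@ "$threads" -o "$sample.sorted.bam" -
  # Flag PCR/optical duplicates and write duplication metrics.
  gatk MarkDuplicates -I "$sample.sorted.bam" -O "$sample.dedup.bam" \
    -M "$sample.dup_metrics.txt"
  # BQSR: model systematic base-quality errors, then apply the recalibration.
  gatk BaseRecalibrator -I "$sample.dedup.bam" -R "$ref" \
    --known-sites "$known" -O "$sample.recal.table"
  gatk ApplyBQSR -I "$sample.dedup.bam" -R "$ref" \
    --bqsr-recal-file "$sample.recal.table" -O "$sample.bqsr.bam"
}
```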
- The raw reads are aligned to the graph-modeled reference with `vg giraffe`.
- To detect SNPs and indels with the graph-modeled pangenome, we convert the graph alignments (GAM format) to linear alignments (BAM format).
- The code is in `Alignment/graph.alignment.sh`.
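Graph alignment and the GAM-to-BAM conversion might look like the sketch below. It is not `Alignment/graph.alignment.sh`. It assumes vg and samtools on `PATH`, builds the Giraffe indexes with `vg autoindex`, and assumes the older autoindex output names (`<prefix>.giraffe.gbz`, `.dist`, `.min`); newer vg releases name the minimizer files differently. Read-group (SM) tags may still need to be added to the surjected BAM before Mutect2. Function and file names are hypothetical.

```shell
#!/usr/bin/env bash
# Hypothetical sketch of graph alignment and GAM-to-BAM conversion; not Alignment/graph.alignment.sh.
# Assumptions: vg and samtools on PATH; index names follow older `vg autoindex` output.
set -euo pipefail

# build_giraffe_index REF.fa VARIANTS.vcf.gz PREFIX
build_giraffe_index() {
  vg autoindex --workflow giraffe -r "$1" -v "$2" -p "$3"
}

# align_graph PREFIX R1.fq.gz R2.fq.gz SAMPLE [THREADS]
align_graph() {
  local prefix=$1 r1=$2 r2=$3 sample=$4 threads=${5:-8}
  # Paired-end short-read mapping to the pangenome graph (GAM output).
  vg giraffe -Z "$prefix.giraffe.gbz" -d "$prefix.dist" -m "$prefix.min" \
    -f "$r1" -f "$r2" -t "$threads" > "$sample.gam"
  # Project the graph alignments onto the linear reference paths as BAM;
  # -i treats the GAM as interleaved pairs so mates stay paired.
  vg surject -x "$prefix.giraffe.gbz" -b -i -t "$threads" "$sample.gam" \
    | samtools sort -@ "$threads" -o "$sample.graph.bam" -
  samtools index "$sample.graph.bam"
}
```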
- We used GATK `Mutect2` to detect SNPs and indels.
- The code is in `VariantCalling/mutect2.sh`.
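A tumor/normal Mutect2 run can be sketched as follows. This is not `VariantCalling/mutect2.sh`. It assumes GATK4 on `PATH`, BAMs whose SM tag for the normal matches `NORMAL_NAME`, and a germline-resource VCF. The `FilterMutectCalls` step is the usual follow-up rather than something the text above states. Names are hypothetical.

```shell
#!/usr/bin/env bash
# Hypothetical sketch of tumor/normal SNV and indel calling; not VariantCalling/mutect2.sh.
# Assumptions: GATK4 on PATH; the normal BAM's SM tag equals NORMAL_NAME; a germline-resource VCF.
set -euo pipefail

# call_somatic_small REF.fa TUMOR.bam NORMAL.bam NORMAL_NAME GERMLINE.vcf.gz OUT_PREFIX
call_somatic_small() {
  local ref=$1 tumor=$2 normal=$3 normal_name=$4 germline=$5 out=$6
  # Tumor-normal mode: -normal names the matched-normal sample by its SM tag.
  gatk Mutect2 -R "$ref" -I "$tumor" -I "$normal" -normal "$normal_name" \
    --germline-resource "$germline" -O "$out.unfiltered.vcf.gz"
  # FilterMutectCalls reads the .stats file that Mutect2 writes next to its output.
  gatk FilterMutectCalls -R "$ref" -V "$out.unfiltered.vcf.gz" \
    -O "$out.filtered.vcf.gz"
}
```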
- We used Manta, Delly and SvABA, together with SURVIVOR, to detect SVs on the linear references, and `vg call` to detect SVs on the graph-modeled reference.
- The code is in `VariantCalling/linear.variantDetection.sh` and `VariantCalling/graph.variantDetection.sh`.
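The SV steps could be organised as in the sketch below. It is not the repository's `VariantCalling/*.variantDetection.sh`. It assumes all tools are on `PATH` and that SURVIVOR is used to merge the three linear callsets. The merge parameters are illustrative; they follow SURVIVOR's positional order: maximum breakpoint distance, minimum number of supporting callers, same-type flag, same-strand flag, size-based distance estimation, minimum SV size. The `-Q 5` quality cutoff for `vg pack`, the function names and the file names are likewise hypothetical.

```shell
#!/usr/bin/env bash
# Hypothetical sketch of SV calling; not the repository's VariantCalling/*.variantDetection.sh.
# Assumptions: Manta, Delly, SvABA, SURVIVOR, bcftools and vg on PATH; SURVIVOR merges the
# linear callsets; all thresholds below are illustrative.
set -euo pipefail

# call_linear_svs REF.fa TUMOR.bam NORMAL.bam TUMOR_NAME NORMAL_NAME OUT_DIR [THREADS]
call_linear_svs() {
  local ref=$1 tumor=$2 normal=$3 tname=$4 nname=$5 out=$6 threads=${7:-8}
  mkdir -p "$out"
  # Manta: configure a run directory, then execute the generated workflow.
  configManta.py --tumorBam "$tumor" --normalBam "$normal" \
    --referenceFasta "$ref" --runDir "$out/manta"
  "$out/manta/runWorkflow.py" -j "$threads"
  gzip -dc "$out/manta/results/variants/somaticSV.vcf.gz" > "$out/manta.vcf"
  # Delly: call both samples jointly, then keep somatic sites (sheet: name<TAB>tumor|control).
  delly call -g "$ref" -o "$out/delly.bcf" "$tumor" "$normal"
  printf '%s\ttumor\n%s\tcontrol\n' "$tname" "$nname" > "$out/delly_samples.tsv"
  delly filter -f somatic -s "$out/delly_samples.tsv" \
    -o "$out/delly.somatic.bcf" "$out/delly.bcf"
  bcftools view "$out/delly.somatic.bcf" > "$out/delly.vcf"
  # SvABA: somatic SVs are written to <id>.svaba.somatic.sv.vcf.
  svaba run -t "$tumor" -n "$normal" -G "$ref" -p "$threads" -a "$out/svaba"
  # SURVIVOR: merge calls within 1000 bp supported by >= 2 callers, same type and strand,
  # no size-based distance estimation, ignoring SVs shorter than 30 bp.
  printf '%s\n' "$out/manta.vcf" "$out/delly.vcf" "$out/svaba.svaba.somatic.sv.vcf" \
    > "$out/vcf_list.txt"
  SURVIVOR merge "$out/vcf_list.txt" 1000 2 1 1 0 30 "$out/merged.linear.vcf"
}

# call_graph_svs GRAPH GAM SAMPLE
call_graph_svs() {
  local graph=$1 gam=$2 sample=$3
  # Summarise read support on graph nodes and edges, with a minimum quality of 5.
  vg pack -x "$graph" -g "$gam" -Q 5 -o "$sample.pack"
  # Genotype the variant sites (snarls) of the graph from that support.
  vg call "$graph" -k "$sample.pack" -s "$sample" > "$sample.graph.vcf"
}
```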
`somatic.vcf` contains the somatic structural variants detected with our graph-modeled pangenome.
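For a quick look at a VCF such as `somatic.vcf`, a small helper can tally records by SV type. This hypothetical helper is not part of the repository. It uses `INFO/SVTYPE` when present and otherwise classifies a record as an insertion or deletion by comparing the REF length with the first ALT allele. Whether `somatic.vcf` carries `SVTYPE` is not stated above, so the fallback is an assumption.

```shell
#!/usr/bin/env bash
# Hypothetical helper for a first look at somatic.vcf; not part of the repository.
# Uses INFO/SVTYPE when present, otherwise classifies by REF vs first-ALT length.
set -euo pipefail

# count_sv_types VCF[.gz] -> prints "<type><TAB><count>" lines, sorted by type
count_sv_types() {
  local vcf=$1
  case $vcf in
    *.gz) gzip -dc "$vcf" ;;
    *) cat "$vcf" ;;
  esac | awk -F'\t' '
    /^#/ { next }
    {
      t = ""
      n = split($8, kv, ";")
      for (i = 1; i <= n; i++) if (kv[i] ~ /^SVTYPE=/) t = substr(kv[i], 8)
      if (t == "") {
        split($5, alts, ",")              # first ALT allele only
        d = length(alts[1]) - length($4)
        if (d > 0) { t = "INS" } else if (d < 0) { t = "DEL" } else { t = "OTHER" }
      }
      count[t]++
    }
    END { for (t in count) print t "\t" count[t] }' | sort
}
```

For example, `count_sv_types somatic.vcf` prints one line per SV type with its record count.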