
Insertions

kseniakh edited this page Mar 10, 2017 · 1 revision


Insertion: a run of bases in the query sequence that is not present anywhere in the reference genome.

Figure 1: Insertion example



If an insertion has caused alignment fragmentation, it is reported in the query_struct.gff and ref_struct.gff files; otherwise, it is reported in the query_snps.gff and ref_snps.gff files.



An example with the insertion entries in the query_snps.gff file:

##gff-version 3
##sequence-region	query_1	1	5000
query_1	NucDiff_v2.0	SO:0000667	503	507	.	.	.	ID=SNP_1;Name=insertion;ins_len=5;query_dir=1;ref_sequence=ref_1;ref_coord=502;query_bases=atata;ref_bases=-;color=#EE0000
query_1	NucDiff_v2.0	SO:0000667	1506	1525	.	.	.	ID=SNP_2;Name=insertion;ins_len=20;query_dir=1;ref_sequence=ref_1;ref_coord=1500;query_bases=aagctgtcggctgcagagtc;ref_bases=-;color=#EE0000
query_1	NucDiff_v2.0	SO:0000667	3576	3640	.	.	.	ID=SNP_3;Name=insertion;ins_len=65;query_dir=1;ref_sequence=ref_1;ref_coord=3500;query_bases=tcatggcaaaatgactattcgatccgggcggagtgttcctacggttgtctacaacattcaactta;ref_bases=-;color=#EE0000
##sequence-region	query_2	1	3000
query_2	NucDiff_v2.0	SO:0000667	4641	4725	.	.	.	ID=SNP_4;Name=insertion;ins_len=85;query_dir=-1;ref_sequence=ref_1;ref_coord=10500;query_bases=ctcattgcgtctgtttatcgtgccgtcatatcgcccgggtacaatccggctgccatcaggagataagatcgctgcgtcgcgtcgc;ref_bases=-;color=#EE0000



The query_snps.gff file contains the following information (see Figure 1 for the notation):

| GFF3 fields | Content | Notes |
|---|---|---|
| col 1 | Query_seq | |
| col 2 | NucDiff_v2.0 | name and current version of the tool |
| col 3 | SO:0000667 | Sequence Ontology accession number corresponding to the "insertion" SO term |
| col 4 | Ins_st | |
| col 5 | Ins_end | |
| col 6/col 7/col 8 | . | score/strand/phase fields are not used |
| col 9, ID | "SNP_1" | ID in query_snps.gff is equal to ID in ref_snps.gff |
| col 9, Name | "insertion" | |
| col 9, ins_len | Length(Insertion) | |
| col 9, query_dir | "1" or "-1" | -1 if the inserted fragment should be reverse complemented before its insertion into a Ref_seq |
| col 9, ref_sequence | Ref_seq | |
| col 9, ref_coord | Ref_pos | |
| col 9, query_bases | ATGC's | |
| col 9, ref_bases | "-" | |
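
The field layout above can be read with a few lines of Python. This is a minimal sketch, not part of NucDiff: the function name `parse_gff3_line` and the keys of the returned dictionary are hypothetical, and the input is the SNP_1 line from the query_snps.gff example.

```python
def parse_gff3_line(line):
    """Split one GFF3 feature line into its fixed columns and an attribute dict."""
    cols = line.rstrip("\n").split("\t")
    if len(cols) != 9:
        raise ValueError(f"expected 9 tab-separated columns, got {len(cols)}")
    seqid, source, so_term, start, end, _score, _strand, _phase, col9 = cols
    # Column 9 is a ';'-separated list of key=value pairs.
    attributes = dict(item.split("=", 1) for item in col9.split(";") if item)
    return {
        "seqid": seqid,        # col 1: Query_seq
        "source": source,      # col 2: tool name and version
        "type": so_term,       # col 3: SO accession for "insertion"
        "start": int(start),   # col 4: Ins_st
        "end": int(end),       # col 5: Ins_end
        "attributes": attributes,
    }


# SNP_1 line copied from the query_snps.gff example.
example = (
    "query_1\tNucDiff_v2.0\tSO:0000667\t503\t507\t.\t.\t.\t"
    "ID=SNP_1;Name=insertion;ins_len=5;query_dir=1;ref_sequence=ref_1;"
    "ref_coord=502;query_bases=atata;ref_bases=-;color=#EE0000"
)
record = parse_gff3_line(example)
print(record["seqid"], record["start"], record["end"],
      record["attributes"]["query_bases"])
```

Header lines that start with `##` are not feature lines and would need to be skipped before calling this helper.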



An example with the insertion entries in the ref_snps.gff file:

##gff-version 3
##sequence-region	ref_1	1	13000
ref_1	NucDiff_v2.0	SO:0000667	502	502	.	.	.	ID=SNP_1;Name=insertion;ins_len=5;query_dir=1;query_sequence=query_1;query_coord=503-507;query_bases=atata;ref_bases=-;color=#EE0000
ref_1	NucDiff_v2.0	SO:0000667	1500	1500	.	.	.	ID=SNP_2;Name=insertion;ins_len=20;query_dir=1;query_sequence=query_1;query_coord=1506-1525;query_bases=aagctgtcggctgcagagtc;ref_bases=-;color=#EE0000
ref_1	NucDiff_v2.0	SO:0000667	3500	3500	.	.	.	ID=SNP_3;Name=insertion;ins_len=65;query_dir=1;query_sequence=query_1;query_coord=3576-3640;query_bases=tcatggcaaaatgactattcgatccgggcggagtgttcctacggttgtctacaacattcaactta;ref_bases=-;color=#EE0000
ref_1	NucDiff_v2.0	SO:0000667	10500	10500	.	.	.	ID=SNP_4;Name=insertion;ins_len=85;query_dir=-1;query_sequence=query_2;query_coord=4641-4725;query_bases=ctcattgcgtctgtttatcgtgccgtcatatcgcccgggtacaatccggctgccatcaggagataagatcgctgcgtcgcgtcgc;ref_bases=-;color=#EE0000



The ref_snps.gff file contains the following information (see Figure 1 for the notation):

| GFF3 fields | Content | Notes |
|---|---|---|
| col 1 | Ref_seq | |
| col 2 | NucDiff_v2.0 | name and current version of the tool |
| col 3 | SO:0000667 | Sequence Ontology accession number corresponding to the "insertion" SO term |
| col 4 | Ref_pos | |
| col 5 | Ref_pos | |
| col 6/col 7/col 8 | . | score/strand/phase fields are not used |
| col 9, ID | "SNP_1" | ID in ref_snps.gff is equal to ID in query_snps.gff |
| col 9, Name | "insertion" | |
| col 9, ins_len | Length(Insertion) | |
| col 9, query_dir | "1" or "-1" | -1 if the inserted fragment should be reverse complemented before its insertion to a Ref_seq |
| col 9, query_sequence | Query_seq | |
| col 9, query_coord | Ins_st-Ins_end | |
| col 9, query_bases | ATGC's | the subsequence is reverse complemented if the query_dir value is equal to -1 |
| col 9, ref_bases | "-" | |
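
Because each insertion has the same ID in both files, a query-side entry and a reference-side entry can be cross-checked against each other. The sketch below is an illustration rather than NucDiff code: the helper names `parse_attributes` and `records_match` are hypothetical, and the checks only encode the relations listed in the two tables (equal ID, `query_coord` equal to `Ins_st-Ins_end`, `ref_coord` equal to `Ref_pos`).

```python
def parse_attributes(column9):
    """Turn the ';'-separated key=value pairs of GFF3 column 9 into a dict."""
    return dict(item.split("=", 1) for item in column9.split(";") if item)


def records_match(query_line, ref_line):
    """Return True if a query_snps.gff line and a ref_snps.gff line describe the same insertion."""
    q = query_line.rstrip("\n").split("\t")
    r = ref_line.rstrip("\n").split("\t")
    q_attr, r_attr = parse_attributes(q[8]), parse_attributes(r[8])
    return (
        q_attr["ID"] == r_attr["ID"]
        and r_attr["query_sequence"] == q[0]            # col 1 of the query file
        and q_attr["ref_sequence"] == r[0]              # col 1 of the reference file
        and r_attr["query_coord"] == f"{q[3]}-{q[4]}"   # Ins_st-Ins_end
        and q_attr["ref_coord"] == r[3] == r[4]         # Ref_pos in both col 4 and col 5
    )


# SNP_1 lines copied from the query_snps.gff and ref_snps.gff examples.
query_line = (
    "query_1\tNucDiff_v2.0\tSO:0000667\t503\t507\t.\t.\t.\t"
    "ID=SNP_1;Name=insertion;ins_len=5;query_dir=1;ref_sequence=ref_1;"
    "ref_coord=502;query_bases=atata;ref_bases=-;color=#EE0000"
)
ref_line = (
    "ref_1\tNucDiff_v2.0\tSO:0000667\t502\t502\t.\t.\t.\t"
    "ID=SNP_1;Name=insertion;ins_len=5;query_dir=1;query_sequence=query_1;"
    "query_coord=503-507;query_bases=atata;ref_bases=-;color=#EE0000"
)
print(records_match(query_line, ref_line))
```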



An example with the insertion entries in the query_struct.gff file:

##gff-version 3
##sequence-region	query_1	1	15063
query_1	NucDiff_v2.0	SO:0000667	2526	2571	.	.	.	ID=SV_1;Name=insertion;ins_len=46;query_dir=1;ref_sequence=ref_1;ref_coord=2500;color=#EE0000
query_1	NucDiff_v2.0	SO:0000667	5726	5813	.	.	.	ID=SV_2;Name=insertion;ins_len=88;query_dir=1;ref_sequence=ref_1;ref_coord=5500;color=#EE0000
query_1	NucDiff_v2.0	SO:0000667	6814	6913	.	.	.	ID=SV_3;Name=insertion;ins_len=100;query_dir=1;ref_sequence=ref_1;ref_coord=6500;color=#EE0000
query_1	NucDiff_v2.0	SO:0000667	7914	8063	.	.	.	ID=SV_4;Name=insertion;ins_len=150;query_dir=1;ref_sequence=ref_1;ref_coord=7500;color=#EE0000



The query_struct.gff file contains the following information (see Figure 1 for the notation):

| GFF3 fields | Content | Notes |
|---|---|---|
| col 1 | Query_seq | |
| col 2 | NucDiff_v2.0 | name and current version of the tool |
| col 3 | SO:0000667 | Sequence Ontology accession number corresponding to the "insertion" SO term |
| col 4 | Ins_st | |
| col 5 | Ins_end | |
| col 6/col 7/col 8 | . | score/strand/phase fields are not used |
| col 9, ID | "SV_1" | ID in query_struct.gff is equal to ID in ref_struct.gff |
| col 9, Name | "insertion" | |
| col 9, ins_len | Length(Insertion) | |
| col 9, query_dir | "1" or "-1" | -1 if the inserted fragment should be reverse complemented before its insertion to a Ref_seq |
| col 9, ref_sequence | Ref_seq | |
| col 9, ref_coord | Ref_pos | |
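
In the example entries, `ins_len` agrees with the 1-based, inclusive GFF3 interval in columns 4 and 5, that is, `ins_len = Ins_end - Ins_st + 1`. This relation is inferred from the examples on this page rather than stated by the tool, so the check below is only a sketch, and the function name `insertion_length_is_consistent` is hypothetical.

```python
def insertion_length_is_consistent(line):
    """Check that ins_len equals Ins_end - Ins_st + 1 for one query-side GFF3 line."""
    cols = line.rstrip("\n").split("\t")
    start, end = int(cols[3]), int(cols[4])  # col 4 and col 5: Ins_st, Ins_end
    attrs = dict(item.split("=", 1) for item in cols[8].split(";") if item)
    return int(attrs["ins_len"]) == end - start + 1


# The four entries from the query_struct.gff example.
rows = [
    "query_1\tNucDiff_v2.0\tSO:0000667\t2526\t2571\t.\t.\t.\t"
    "ID=SV_1;Name=insertion;ins_len=46;query_dir=1;ref_sequence=ref_1;ref_coord=2500;color=#EE0000",
    "query_1\tNucDiff_v2.0\tSO:0000667\t5726\t5813\t.\t.\t.\t"
    "ID=SV_2;Name=insertion;ins_len=88;query_dir=1;ref_sequence=ref_1;ref_coord=5500;color=#EE0000",
    "query_1\tNucDiff_v2.0\tSO:0000667\t6814\t6913\t.\t.\t.\t"
    "ID=SV_3;Name=insertion;ins_len=100;query_dir=1;ref_sequence=ref_1;ref_coord=6500;color=#EE0000",
    "query_1\tNucDiff_v2.0\tSO:0000667\t7914\t8063\t.\t.\t.\t"
    "ID=SV_4;Name=insertion;ins_len=150;query_dir=1;ref_sequence=ref_1;ref_coord=7500;color=#EE0000",
]
print(all(insertion_length_is_consistent(row) for row in rows))
```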



An example with the insertion entries in the ref_struct.gff file:

##gff-version 3
##sequence-region	ref_1	1	13000
ref_1	NucDiff_v2.0	SO:0000667	2500	2500	.	.	.	ID=SV_1;Name=insertion;ins_len=46;query_dir=1;query_sequence=query_1;query_coord=2526-2571;color=#EE0000
ref_1	NucDiff_v2.0	SO:0000667	5500	5500	.	.	.	ID=SV_2;Name=insertion;ins_len=88;query_dir=1;query_sequence=query_1;query_coord=5726-5813;color=#EE0000
ref_1	NucDiff_v2.0	SO:0000667	6500	6500	.	.	.	ID=SV_3;Name=insertion;ins_len=100;query_dir=1;query_sequence=query_1;query_coord=6814-6913;color=#EE0000
ref_1	NucDiff_v2.0	SO:0000667	7500	7500	.	.	.	ID=SV_4;Name=insertion;ins_len=150;query_dir=1;query_sequence=query_1;query_coord=7914-8063;color=#EE0000



The ref_struct.gff file contains the following information (see Figure 1 for the notation):

| GFF3 fields | Content | Notes |
|---|---|---|
| col 1 | Ref_seq | |
| col 2 | NucDiff_v2.0 | name and current version of the tool |
| col 3 | SO:0000667 | Sequence Ontology accession number corresponding to the "insertion" SO term |
| col 4 | Ref_pos | |
| col 5 | Ref_pos | |
| col 6/col 7/col 8 | . | score/strand/phase fields are not used |
| col 9, ID | "SV_1" | ID in ref_struct.gff is equal to ID in query_struct.gff |
| col 9, Name | "insertion" | |
| col 9, ins_len | Length(Insertion) | |
| col 9, query_dir | "1" or "-1" | -1 if the inserted fragment should be reverse complemented before its insertion to a Ref_seq |
| col 9, query_sequence | Query_seq | |
| col 9, query_coord | Ins_st-Ins_end | |
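
The struct files carry no `query_bases` attribute, so the inserted bases have to be taken from the query sequence itself using `query_coord` and `query_dir`. The following is a minimal sketch under stated assumptions: the helper names `reverse_complement` and `inserted_bases` and the toy sequence are hypothetical, coordinates are treated as 1-based and inclusive (the GFF3 convention), and the fragment is reverse complemented when `query_dir` is -1, following the description of that field.

```python
# Translation table for complementing upper- and lower-case DNA bases.
_COMPLEMENT = str.maketrans("ACGTacgt", "TGCAtgca")


def reverse_complement(seq):
    """Return the reverse complement of a DNA string."""
    return seq.translate(_COMPLEMENT)[::-1]


def inserted_bases(query_seq, query_coord, query_dir):
    """Extract the inserted fragment, oriented for insertion into the reference.

    query_coord is the "Ins_st-Ins_end" string from ref_struct.gff,
    query_dir is "1" or "-1".
    """
    start, end = (int(value) for value in query_coord.split("-"))
    fragment = query_seq[start - 1:end]  # 1-based inclusive -> Python slice
    return reverse_complement(fragment) if int(query_dir) == -1 else fragment


# Toy query sequence (hypothetical, not taken from the examples above).
toy_query = "ACGTTTGCA"
print(inserted_bases(toy_query, "3-6", "1"))   # GTTT
print(inserted_bases(toy_query, "3-6", "-1"))  # AAAC
```

In practice the query sequence would come from the query FASTA file that was compared against the reference.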