Inserted gaps

kseniakh edited this page Mar 10, 2017 · 1 revision

An inserted gap is an insertion of unknown bases (N's) in the query sequence, either in a region that is continuous (without a gap) in the reference, or in a position where it elongates an existing region of unknown bases in the reference.



Figure 1: Inserted gap example



If an inserted gap has caused alignment fragmentation, it is written to the query_struct.gff and ref_struct.gff files; otherwise it is written to the query_snps.gff and ref_snps.gff files.

An example of inserted gap entries in query_snps.gff:

##gff-version 3
##sequence-region	query_1	1	57855
query_1	NucDiff_v2.0	SO:0000667	1506	1530	.	.	.	ID=SNP_1;Name=inserted_gap;ins_len=25;query_dir=1;ref_sequence=ref_1;ref_coord=1505;query_bases=NNNNNNNNNNNNNNNNNNNNNNNNN;ref_bases=-;color=#EE0000
query_1	NucDiff_v2.0	SO:0000667	2536	2610	.	.	.	ID=SNP_2;Name=inserted_gap;ins_len=75;query_dir=1;ref_sequence=ref_1;ref_coord=2510;query_bases=NNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNN;ref_bases=-;color=#EE0000



The query_snps.gff file contains the following information (see Figure 1 for notations):

| GFF3 field | Content | Notes |
|---|---|---|
| col 1 | Query_seq | |
| col 2 | NucDiff_v2.0 | name and current version of the tool |
| col 3 | SO:0000667 | Sequence Ontology accession number corresponding to the "insertion" SO term |
| col 4 | Ins_st | |
| col 5 | Ins_end | |
| col 6 / col 7 / col 8 | . | score, strand, and phase fields are not used |
| col 9, ID | "SNP_1" | ID in query_snps.gff is equal to ID in ref_snps.gff |
| col 9, Name | "inserted_gap" | |
| col 9, ins_len | Length(Inserted_gap) | |
| col 9, query_dir | "1" or "-1" | "-1" if the inserted fragment must be reverse complemented before being inserted into Ref_seq |
| col 9, ref_sequence | Ref_seq | |
| col 9, ref_coord | Ref_pos | |
| col 9, query_bases | N's | |
| col 9, ref_bases | "-" | |
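
The field layout above can be read with a few lines of Python. This is a minimal sketch, not part of NucDiff: the helper name `parse_gff_line` is hypothetical, and the check that `ins_len` equals `Ins_end - Ins_st + 1` is an assumption inferred from the example entries (1506 to 1530 gives 25).

```python
def parse_gff_line(line):
    """Split one GFF3 record into its fixed columns and a dict of col 9 attributes."""
    cols = line.rstrip("\n").split("\t")
    # col 9 is a ';'-separated list of key=value pairs
    attrs = dict(item.split("=", 1) for item in cols[8].split(";") if "=" in item)
    return {
        "seqid": cols[0],        # col 1: Query_seq
        "source": cols[1],       # col 2: tool name and version
        "type": cols[2],         # col 3: SO accession
        "start": int(cols[3]),   # col 4: Ins_st
        "end": int(cols[4]),     # col 5: Ins_end
        "attrs": attrs,
    }


# First entry of the query_snps.gff example, rebuilt with explicit tabs.
line = (
    "query_1\tNucDiff_v2.0\tSO:0000667\t1506\t1530\t.\t.\t.\t"
    "ID=SNP_1;Name=inserted_gap;ins_len=25;query_dir=1;"
    "ref_sequence=ref_1;ref_coord=1505;"
    f"query_bases={'N' * 25};ref_bases=-;color=#EE0000"
)

rec = parse_gff_line(line)
gap_len = rec["end"] - rec["start"] + 1  # assumed inclusive 1-based coordinates

# Assumed consistency checks for an inserted gap record.
assert gap_len == int(rec["attrs"]["ins_len"])
assert set(rec["attrs"]["query_bases"]) == {"N"}
```

Keeping col 9 as a dictionary means entries from the struct files, which lack `query_bases` and `ref_bases`, can go through the same function.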



An example of inserted gap entries in ref_snps.gff:

##gff-version 3
##sequence-region	ref_1	1	57855
ref_1	NucDiff_v2.0	SO:0000667	1505	1505	.	.	.	ID=SNP_1;Name=inserted_gap;ins_len=25;query_dir=1;query_sequence=query_1;query_coord=1506-1530;query_bases=NNNNNNNNNNNNNNNNNNNNNNNNN;ref_bases=-;color=#EE0000
ref_1	NucDiff_v2.0	SO:0000667	2510	2510	.	.	.	ID=SNP_2;Name=inserted_gap;ins_len=75;query_dir=1;query_sequence=query_1;query_coord=2536-2610;query_bases=NNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNN;ref_bases=-;color=#EE0000



The ref_snps.gff file contains the following information (see Figure 1 for notations):

| GFF3 field | Content | Notes |
|---|---|---|
| col 1 | Ref_seq | |
| col 2 | NucDiff_v2.0 | name and current version of the tool |
| col 3 | SO:0000667 | Sequence Ontology accession number corresponding to the "insertion" SO term |
| col 4 | Ref_pos | |
| col 5 | Ref_pos | |
| col 6 / col 7 / col 8 | . | score, strand, and phase fields are not used |
| col 9, ID | "SNP_1" | ID in ref_snps.gff is equal to ID in query_snps.gff |
| col 9, Name | "inserted_gap" | |
| col 9, ins_len | Length(Inserted_gap) | |
| col 9, query_dir | "1" or "-1" | "-1" if the inserted fragment must be reverse complemented before being inserted into Ref_seq |
| col 9, query_sequence | Query_seq | |
| col 9, query_coord | Ins_st-Ins_end | |
| col 9, query_bases | N's | the subsequence is reverse complemented if the query_dir value is equal to "-1" |
| col 9, ref_bases | "-" | |
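
Because the same ID appears in both files, the query-side and reference-side records can be paired and cross-checked. The sketch below is illustrative only: `parse_records` and `find_mismatched_ids` are hypothetical helpers, the attribute lists are shortened versions of the examples above, and the consistency rules are read off the field tables rather than taken from NucDiff itself.

```python
def parse_records(gff_text):
    """Return {ID: (seqid, start, end, attrs)} for every complete GFF3 record."""
    records = {}
    for line in gff_text.splitlines():
        cols = line.split("\t")
        # Skip header/comment lines and anything without all nine columns.
        if line.startswith("#") or len(cols) < 9:
            continue
        attrs = dict(kv.split("=", 1) for kv in cols[8].split(";") if "=" in kv)
        records[attrs["ID"]] = (cols[0], int(cols[3]), int(cols[4]), attrs)
    return records


def find_mismatched_ids(query_text, ref_text):
    """List IDs that are unpaired or whose two records disagree on coordinates."""
    query, ref = parse_records(query_text), parse_records(ref_text)
    bad = []
    for rec_id in sorted(set(query) | set(ref)):
        if rec_id not in query or rec_id not in ref:
            bad.append(rec_id)
            continue
        q_seq, q_start, q_end, q_attrs = query[rec_id]
        r_seq, r_start, r_end, r_attrs = ref[rec_id]
        consistent = (
            q_attrs.get("ref_sequence") == r_seq
            and q_attrs.get("ref_coord") == str(r_start)
            and r_start == r_end  # col 4 and col 5 both hold Ref_pos
            and r_attrs.get("query_sequence") == q_seq
            and r_attrs.get("query_coord") == f"{q_start}-{q_end}"
        )
        if not consistent:
            bad.append(rec_id)
    return bad


PREFIX = "NucDiff_v2.0\tSO:0000667"
QUERY_SNPS = "\n".join([
    "##gff-version 3",
    f"query_1\t{PREFIX}\t1506\t1530\t.\t.\t.\tID=SNP_1;Name=inserted_gap;ins_len=25;query_dir=1;ref_sequence=ref_1;ref_coord=1505",
    f"query_1\t{PREFIX}\t2536\t2610\t.\t.\t.\tID=SNP_2;Name=inserted_gap;ins_len=75;query_dir=1;ref_sequence=ref_1;ref_coord=2510",
])
REF_SNPS = "\n".join([
    "##gff-version 3",
    f"ref_1\t{PREFIX}\t1505\t1505\t.\t.\t.\tID=SNP_1;Name=inserted_gap;ins_len=25;query_dir=1;query_sequence=query_1;query_coord=1506-1530",
    f"ref_1\t{PREFIX}\t2510\t2510\t.\t.\t.\tID=SNP_2;Name=inserted_gap;ins_len=75;query_dir=1;query_sequence=query_1;query_coord=2536-2610",
])

mismatches = find_mismatched_ids(QUERY_SNPS, REF_SNPS)
```

For the example entries the two files agree, so `mismatches` is an empty list.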



An example of inserted gap entries in query_struct.gff:

##gff-version 3
##sequence-region	query_1	1	57855
query_1	NucDiff_v2.0	SO:0000667	3621	3765	.	.	.	ID=SV_1;Name=inserted_gap;ins_len=145;query_dir=1;ref_sequence=ref_1;ref_coord=3520;color=#EE0000
query_1	NucDiff_v2.0	SO:0000667	4771	5015	.	.	.	ID=SV_2;Name=inserted_gap;ins_len=245;query_dir=1;ref_sequence=ref_1;ref_coord=4525;color=#EE0000



The query_struct.gff file contains the following information (see Figure 1 for notations):

| GFF3 field | Content | Notes |
|---|---|---|
| col 1 | Query_seq | |
| col 2 | NucDiff_v2.0 | name and current version of the tool |
| col 3 | SO:0000667 | Sequence Ontology accession number corresponding to the "insertion" SO term |
| col 4 | Ins_st | |
| col 5 | Ins_end | |
| col 6 / col 7 / col 8 | . | score, strand, and phase fields are not used |
| col 9, ID | "SV_1" | ID in query_struct.gff is equal to ID in ref_struct.gff |
| col 9, Name | "inserted_gap" | |
| col 9, ins_len | Length(Inserted_gap) | |
| col 9, query_dir | "1" or "-1" | "-1" if the inserted fragment must be reverse complemented before being inserted into Ref_seq |
| col 9, ref_sequence | Ref_seq | |
| col 9, ref_coord | Ref_pos | |



An example of inserted gap entries in ref_struct.gff:

##gff-version 3
##sequence-region	ref_1	1	57855
ref_1	NucDiff_v2.0	SO:0000667	3520	3520	.	.	.	ID=SV_1;Name=inserted_gap;ins_len=145;query_dir=1;query_sequence=query_1;query_coord=3621-3765;color=#EE0000
ref_1	NucDiff_v2.0	SO:0000667	4525	4525	.	.	.	ID=SV_2;Name=inserted_gap;ins_len=245;query_dir=1;query_sequence=query_1;query_coord=4771-5015;color=#EE0000



The ref_struct.gff file contains the following information (see Figure 1 for notations):

| GFF3 field | Content | Notes |
|---|---|---|
| col 1 | Ref_seq | |
| col 2 | NucDiff_v2.0 | name and current version of the tool |
| col 3 | SO:0000667 | Sequence Ontology accession number corresponding to the "insertion" SO term |
| col 4 | Ref_pos | |
| col 5 | Ref_pos | |
| col 6 / col 7 / col 8 | . | score, strand, and phase fields are not used |
| col 9, ID | "SV_1" | ID in ref_struct.gff is equal to ID in query_struct.gff |
| col 9, Name | "inserted_gap" | |
| col 9, ins_len | Length(Inserted_gap) | |
| col 9, query_dir | "1" or "-1" | "-1" if the inserted fragment must be reverse complemented before being inserted into Ref_seq |
| col 9, query_sequence | Query_seq | |
| col 9, query_coord | Ins_st-Ins_end | |
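
In the reference-side files col 4 and col 5 both hold Ref_pos, so the extent of the gap has to come from `ins_len` or `query_coord`. The following sketch collects inserted gaps from ref_struct.gff text under that reading; `gaps_from_ref_struct` is a hypothetical helper, and it assumes `query_coord` is always written as `start-end` with `start <= end`, which the examples show only for `query_dir=1`.

```python
def gaps_from_ref_struct(gff_text):
    """Collect inserted_gap records with their query interval and length."""
    gaps = []
    for line in gff_text.splitlines():
        cols = line.split("\t")
        # Skip header/comment lines and anything without all nine columns.
        if line.startswith("#") or len(cols) < 9:
            continue
        attrs = dict(kv.split("=", 1) for kv in cols[8].split(";") if "=" in kv)
        if attrs.get("Name") != "inserted_gap":
            continue
        # Assumed format: "Ins_st-Ins_end", both positive integers.
        q_start, q_end = (int(x) for x in attrs["query_coord"].split("-"))
        gaps.append({
            "id": attrs["ID"],
            "ref_pos": int(cols[3]),
            "query_start": q_start,
            "query_end": q_end,
            "length": int(attrs["ins_len"]),
        })
    return gaps


PREFIX = "NucDiff_v2.0\tSO:0000667"
REF_STRUCT = "\n".join([
    "##gff-version 3",
    "##sequence-region\tref_1\t1\t57855",
    f"ref_1\t{PREFIX}\t3520\t3520\t.\t.\t.\tID=SV_1;Name=inserted_gap;ins_len=145;query_dir=1;query_sequence=query_1;query_coord=3621-3765;color=#EE0000",
    f"ref_1\t{PREFIX}\t4525\t4525\t.\t.\t.\tID=SV_2;Name=inserted_gap;ins_len=245;query_dir=1;query_sequence=query_1;query_coord=4771-5015;color=#EE0000",
])

gaps = gaps_from_ref_struct(REF_STRUCT)
total_gap_bases = sum(g["length"] for g in gaps)
```

Filtering on `Name` keeps the function usable on files that also contain other difference types.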