Inserted gaps
Inserted gap: an insertion of unknown bases (N's) into the query sequence, either in a region that is continuous (gap-free) in the reference or in a way that elongates a region of unknown bases in the reference.
Figure 1: Inserted gap example
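To make the definition concrete, here is a minimal sketch that scans a gapped pairwise alignment and reports runs of columns where the reference has a gap and the query contains only N's. It is not how NucDiff itself detects differences. The function name `find_inserted_gaps`, the use of `-` as the gap character, and the convention that `ref_coord` is the last reference base before the inserted run are assumptions; that convention agrees with the example entries on this page but is not stated by them.

```python
from typing import List, Optional, Tuple

Gap = Tuple[int, int, int, int]  # (query_start, query_end, ref_coord, length)

def find_inserted_gaps(ref_aln: str, query_aln: str) -> List[Gap]:
    """Hypothetical helper: find N-only insertions in a gapped pairwise alignment."""
    if len(ref_aln) != len(query_aln):
        raise ValueError("aligned sequences must have equal length")
    gaps: List[Gap] = []
    ref_pos = query_pos = 0          # bases consumed so far (1-based after increment)
    run_start: Optional[int] = None  # query position where the open insertion run began
    run_ref = 0                      # assumed ref_coord: last reference base before the run
    run_all_n = True                 # does the open run consist only of N's?
    for r, q in zip(ref_aln, query_aln):
        if r == "-" and q != "-":
            # Insertion column: the query has a base, the reference has a gap.
            query_pos += 1
            if run_start is None:
                run_start, run_ref, run_all_n = query_pos, ref_pos, True
            run_all_n = run_all_n and q in "Nn"
            continue
        # Any other column closes an open insertion run.
        if run_start is not None:
            if run_all_n:
                gaps.append((run_start, query_pos, run_ref, query_pos - run_start + 1))
            run_start = None
        if r != "-":
            ref_pos += 1
        if q != "-":
            query_pos += 1
    # An insertion run may also end the alignment.
    if run_start is not None and run_all_n:
        gaps.append((run_start, query_pos, run_ref, query_pos - run_start + 1))
    return gaps
```

For example, `find_inserted_gaps("ACGT--ACGT", "ACGTNNACGT")` returns `[(5, 6, 4, 2)]`. An elongated N region in the reference (`"ACNN--GT"` aligned to `"ACNNNNGT"`) is reported the same way, while a run containing any non-N base is skipped because it is not an inserted gap.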
If an inserted gap has caused alignment fragmentation, it is reported in the query_struct.gff and ref_struct.gff files; otherwise, it is reported in the query_snps.gff and ref_snps.gff files.
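The file-selection rule above can be sketched as a small routing function. The records passed to `route` are hypothetical `(label, caused_fragmentation)` pairs; how NucDiff decides whether an inserted gap fragmented the alignment is not shown here.

```python
from typing import Dict, Iterable, List, Tuple

SNP_FILES = ("query_snps.gff", "ref_snps.gff")
STRUCT_FILES = ("query_struct.gff", "ref_struct.gff")

def output_files(caused_fragmentation: bool) -> Tuple[str, str]:
    """Pair of output files an inserted gap is written to, per the rule above."""
    return STRUCT_FILES if caused_fragmentation else SNP_FILES

def route(gaps: Iterable[Tuple[str, bool]]) -> Dict[Tuple[str, str], List[str]]:
    """Group hypothetical (label, caused_fragmentation) records by output file pair."""
    routed: Dict[Tuple[str, str], List[str]] = {SNP_FILES: [], STRUCT_FILES: []}
    for label, fragmented in gaps:
        routed[output_files(fragmented)].append(label)
    return routed
```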
An example of inserted gap entries in query_snps.gff:
##gff-version 3
##sequence-region query_1 1 57855
query_1 NucDiff_v2.0 SO:0000667 1506 1530 . . . ID=SNP_1;Name=inserted_gap;ins_len=25;query_dir=1;ref_sequence=ref_1;ref_coord=1505;query_bases=NNNNNNNNNNNNNNNNNNNNNNNNN;ref_bases=-;color=#EE0000
query_1 NucDiff_v2.0 SO:0000667 2536 2610 . . . ID=SNP_2;Name=inserted_gap;ins_len=75;query_dir=1;ref_sequence=ref_1;ref_coord=2510;query_bases=NNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNN;ref_bases=-;color=#EE0000
The query_snps.gff file contains the following information (see Figure 1 for notation):
GFF3 fields | Content | Notes |
---|---|---|
col 1 | Query_seq | |
col 2 | NucDiff_v2.0 | name and current version of the tool |
col 3 | SO:0000667 | Sequence Ontology accession number corresponding to the "insertion" SO term |
col 4 | Ins_st | |
col 5 | Ins_end | |
col 6/col 7/col 8 | . | score/strand/phase fields are not used |
col 9, ID | "SNP_1" | same as the ID of the corresponding entry in ref_snps.gff |
col 9, Name | "inserted_gap" | |
col 9, ins_len | Length(Inserted_gap) | |
col 9, query_dir | "1" or "-1" | -1 if the inserted fragment should be reverse complemented before it is inserted into Ref_seq |
col 9, ref_sequence | Ref_seq | |
col 9, ref_coord | Ref_pos | |
col 9, query_bases | N's | |
col 9, ref_bases | "-" | |
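A reader processing query_snps.gff might parse each entry and check the relationships in the table above: `ins_len` equals col 5 - col 4 + 1, `query_bases` holds exactly that many N's, and `ref_bases` is `-`. The sketch below assumes tab-separated columns (as GFF3 requires) and attribute values without `;` or `=`; `parse_gff_line` and `is_consistent_query_gap` are hypothetical names, not part of NucDiff.

```python
from typing import Any, Dict

def parse_gff_line(line: str) -> Dict[str, Any]:
    """Split one GFF3 feature line into its columns and a col 9 attribute dict."""
    cols = line.rstrip("\n").split("\t")
    if len(cols) != 9:
        raise ValueError(f"expected 9 tab-separated columns, got {len(cols)}")
    attrs = dict(kv.split("=", 1) for kv in cols[8].split(";") if kv)
    return {"seqid": cols[0], "source": cols[1], "type": cols[2],
            "start": int(cols[3]), "end": int(cols[4]), "attrs": attrs}

def is_consistent_query_gap(rec: Dict[str, Any]) -> bool:
    """Check an inserted_gap entry from query_snps.gff against the table above."""
    a = rec["attrs"]
    ins_len = int(a["ins_len"])
    bases = a.get("query_bases", "")
    return (a.get("Name") == "inserted_gap"
            and rec["end"] - rec["start"] + 1 == ins_len   # Ins_end - Ins_st + 1
            and len(bases) == ins_len
            and set(bases.upper()) == {"N"}
            and a.get("ref_bases") == "-")
```

Applied to the first example entry (with its columns tab-separated), `parse_gff_line` yields `start == 1506`, `end == 1530`, and `is_consistent_query_gap` returns `True`.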
An example of inserted gap entries in ref_snps.gff:
##gff-version 3
##sequence-region ref_1 1 57855
ref_1 NucDiff_v2.0 SO:0000667 1505 1505 . . . ID=SNP_1;Name=inserted_gap;ins_len=25;query_dir=1;query_sequence=query_1;query_coord=1506-1530;query_bases=NNNNNNNNNNNNNNNNNNNNNNNNN;ref_bases=-;color=#EE0000
ref_1 NucDiff_v2.0 SO:0000667 2510 2510 . . . ID=SNP_2;Name=inserted_gap;ins_len=75;query_dir=1;query_sequence=query_1;query_coord=2536-2610;query_bases=NNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNN;ref_bases=-;color=#EE0000
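Because entries with the same ID in the query-side and reference-side files describe the same difference, the two files can be cross-checked: the query entry's `ref_coord` should equal col 4 and col 5 of the reference entry, and the reference entry's `query_coord` should equal the query entry's col 4-col 5. The sketch below is a hypothetical consistency check built only from the fields shown in the examples; it assumes tab-separated columns and works for either the *_snps.gff or the *_struct.gff pair.

```python
from typing import Dict, Iterator, List, Tuple

Record = Tuple[str, int, int, Dict[str, str]]  # (seqid, start, end, col 9 attributes)

def records(lines: List[str]) -> Iterator[Record]:
    """Yield parsed feature lines, skipping '#' header lines and blank lines."""
    for line in lines:
        if not line.strip() or line.startswith("#"):
            continue
        cols = line.rstrip("\n").split("\t")
        attrs = dict(kv.split("=", 1) for kv in cols[8].split(";") if kv)
        yield cols[0], int(cols[3]), int(cols[4]), attrs

def mismatched_pairs(query_lines: List[str], ref_lines: List[str]) -> List[str]:
    """Return IDs whose query-side and ref-side entries do not mirror each other."""
    ref_by_id = {rec[3]["ID"]: rec for rec in records(ref_lines)}
    bad: List[str] = []
    for q_seq, q_st, q_end, qa in records(query_lines):
        ref = ref_by_id.get(qa["ID"])
        if ref is None:
            bad.append(qa["ID"])
            continue
        r_seq, r_st, r_end, ra = ref
        ok = (qa["ref_sequence"] == r_seq
              and ra["query_sequence"] == q_seq
              and int(qa["ref_coord"]) == r_st == r_end
              and ra["query_coord"] == f"{q_st}-{q_end}"
              and qa["ins_len"] == ra["ins_len"])
        if not ok:
            bad.append(qa["ID"])
    return bad
```

For the SNP_1 entries shown in the two examples, `mismatched_pairs` returns an empty list.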
The ref_snps.gff file contains the following information (see Figure 1 for notation):
GFF3 fields | Content | Notes |
---|---|---|
col 1 | Ref_seq | |
col 2 | NucDiff_v2.0 | name and current version of the tool |
col 3 | SO:0000667 | Sequence Ontology accession number corresponding to the "insertion" SO term |
col 4 | Ref_pos | |
col 5 | Ref_pos | |
col 6/col 7/col 8 | . | score/strand/phase fields are not used |
col 9, ID | "SNP_1" | same as the ID of the corresponding entry in query_snps.gff |
col 9, Name | "inserted_gap" | |
col 9, ins_len | Length(Inserted_gap) | |
col 9, query_dir | "1" or "-1" | -1 if the inserted fragment should be reverse complemented before it is inserted into Ref_seq |
col 9, query_sequence | Query_seq | |
col 9, query_coord | Ins_st-Ins_end | |
col 9, query_bases | N's | the subsequence is reverse complemented if query_dir is "-1" |
col 9, ref_bases | "-" | |
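The note on `query_bases` says the subsequence is reverse complemented when `query_dir` is -1. A minimal sketch of that rule follows; `ref_side_bases` is a hypothetical name. For an inserted gap the bases are all N's, so the reverse complement is identical to the input; the general rule only changes the output for other bases.

```python
COMPLEMENT = str.maketrans("ACGTNacgtn", "TGCANtgcan")

def reverse_complement(seq: str) -> str:
    """Complement each base, then reverse the sequence."""
    return seq.translate(COMPLEMENT)[::-1]

def ref_side_bases(query_subseq: str, query_dir: int) -> str:
    """Bases as reported on the reference side: reverse complemented if query_dir == -1."""
    if query_dir not in (1, -1):
        raise ValueError("query_dir must be 1 or -1")
    return reverse_complement(query_subseq) if query_dir == -1 else query_subseq
```

For example, `ref_side_bases("NNNN", -1)` returns `"NNNN"`, while `ref_side_bases("ACGN", -1)` returns `"NCGT"`.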
An example of inserted gap entries in query_struct.gff:
##gff-version 3
##sequence-region query_1 1 57855
query_1 NucDiff_v2.0 SO:0000667 3621 3765 . . . ID=SV_1;Name=inserted_gap;ins_len=145;query_dir=1;ref_sequence=ref_1;ref_coord=3520;color=#EE0000
query_1 NucDiff_v2.0 SO:0000667 4771 5015 . . . ID=SV_2;Name=inserted_gap;ins_len=245;query_dir=1;ref_sequence=ref_1;ref_coord=4525;color=#EE0000
The query_struct.gff file contains the following information (see Figure 1 for notation):
GFF3 fields | Content | Notes |
---|---|---|
col 1 | Query_seq | |
col 2 | NucDiff_v2.0 | name and current version of the tool |
col 3 | SO:0000667 | Sequence Ontology accession number corresponding to the "insertion" SO term |
col 4 | Ins_st | |
col 5 | Ins_end | |
col 6/col 7/col 8 | . | score/strand/phase fields are not used |
col 9, ID | "SV_1" | same as the ID of the corresponding entry in ref_struct.gff |
col 9, Name | "inserted_gap" | |
col 9, ins_len | Length(Inserted_gap) | |
col 9, query_dir | "1" or "-1" | -1 if the inserted fragment should be reverse complemented before it is inserted into Ref_seq |
col 9, ref_sequence | Ref_seq | |
col 9, ref_coord | Ref_pos | |
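Unlike the SNP-level files, query_struct.gff entries carry no `query_bases` attribute, so the inserted N's have to be read from the query sequence itself using col 4 and col 5. GFF3 coordinates are 1-based and inclusive, so they map to the Python slice `[start - 1:end]`. The helper names below are hypothetical, and the query sequence is assumed to be already loaded as a string.

```python
def inserted_region(query_seq: str, start: int, end: int) -> str:
    """Return the bases covered by a GFF3 feature (1-based, inclusive coordinates)."""
    if not 1 <= start <= end <= len(query_seq):
        raise ValueError("coordinates out of range")
    return query_seq[start - 1:end]

def is_n_run(query_seq: str, start: int, end: int) -> bool:
    """True if every base between col 4 and col 5 is an unknown base (N)."""
    return set(inserted_region(query_seq, start, end).upper()) == {"N"}
```

With `seq = "ACGT" + "N" * 5 + "ACGT"`, `inserted_region(seq, 5, 9)` returns `"NNNNN"` and its length, 5, equals `end - start + 1`, which is what `ins_len` reports.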
An example of inserted gap entries in ref_struct.gff:
##gff-version 3
##sequence-region ref_1 1 57855
ref_1 NucDiff_v2.0 SO:0000667 3520 3520 . . . ID=SV_1;Name=inserted_gap;ins_len=145;query_dir=1;query_sequence=query_1;query_coord=3621-3765;color=#EE0000
ref_1 NucDiff_v2.0 SO:0000667 4525 4525 . . . ID=SV_2;Name=inserted_gap;ins_len=245;query_dir=1;query_sequence=query_1;query_coord=4771-5015;color=#EE0000
The ref_struct.gff file contains the following information (see Figure 1 for notation):
GFF3 fields | Content | Notes |
---|---|---|
col 1 | Ref_seq | |
col 2 | NucDiff_v2.0 | name and current version of the tool |
col 3 | SO:0000667 | Sequence Ontology accession number corresponding to the "insertion" SO term |
col 4 | Ref_pos | |
col 5 | Ref_pos | |
col 6/col 7/col 8 | . | score/strand/phase fields are not used |
col 9, ID | "SV_1" | same as the ID of the corresponding entry in query_struct.gff |
col 9, Name | "inserted_gap" | |
col 9, ins_len | Length(Inserted_gap) | |
col 9, query_dir | "1" or "-1" | -1 if the inserted fragment should be reverse complemented before it is inserted into Ref_seq |
col 9, query_sequence | Query_seq | |
col 9, query_coord | Ins_st-Ins_end | |
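Comparing the two struct tables shows that a ref_struct.gff entry mirrors its query_struct.gff counterpart: col 1 becomes `ref_sequence`, col 4 and col 5 both become `ref_coord`, and `ref_sequence`/`ref_coord` in col 9 are replaced by `query_sequence`/`query_coord`. The sketch below rebuilds a reference-side line from a query-side line to illustrate that correspondence. The attribute order is copied from the examples on this page; whether NucDiff guarantees that order is an assumption, and `mirror_struct_entry` is a hypothetical name.

```python
def mirror_struct_entry(query_line: str) -> str:
    """Build the ref_struct.gff line corresponding to a query_struct.gff line."""
    cols = query_line.rstrip("\n").split("\t")
    attrs = dict(kv.split("=", 1) for kv in cols[8].split(";") if kv)
    ref_coord = attrs["ref_coord"]
    ref_attrs = [
        ("ID", attrs["ID"]),
        ("Name", attrs["Name"]),
        ("ins_len", attrs["ins_len"]),
        ("query_dir", attrs["query_dir"]),
        ("query_sequence", cols[0]),                 # col 1 of the query entry
        ("query_coord", f"{cols[3]}-{cols[4]}"),     # Ins_st-Ins_end
    ]
    if "color" in attrs:
        ref_attrs.append(("color", attrs["color"]))
    col9 = ";".join(f"{k}={v}" for k, v in ref_attrs)
    # Col 4 and col 5 both hold Ref_pos; score/strand/phase are copied unchanged.
    return "\t".join([attrs["ref_sequence"], cols[1], cols[2], ref_coord, ref_coord,
                      cols[5], cols[6], cols[7], col9])
```

Applied to the SV_1 and SV_2 entries of the query_struct.gff example (tab-separated), this reproduces the corresponding lines of the ref_struct.gff example.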