Insertion - a run of bases in the query sequence that is not present anywhere in the reference genome.
Figure 1: Insertion example
If an insertion has caused alignment fragmentation, it is written to the query_struct.gff and ref_struct.gff files; otherwise, it is written to the query_snps.gff and ref_snps.gff files.
An example with the insertion entries in the query_snps.gff file:
##gff-version 3
##sequence-region query_1 1 5000
query_1 NucDiff_v2.0 SO:0000667 503 507 . . . ID=SNP_1;Name=insertion;ins_len=5;query_dir=1;ref_sequence=ref_1;ref_coord=502;query_bases=atata;ref_bases=-;color=#EE0000
query_1 NucDiff_v2.0 SO:0000667 1506 1525 . . . ID=SNP_2;Name=insertion;ins_len=20;query_dir=1;ref_sequence=ref_1;ref_coord=1500;query_bases=aagctgtcggctgcagagtc;ref_bases=-;color=#EE0000
query_1 NucDiff_v2.0 SO:0000667 3576 3640 . . . ID=SNP_3;Name=insertion;ins_len=65;query_dir=1;ref_sequence=ref_1;ref_coord=3500;query_bases=tcatggcaaaatgactattcgatccgggcggagtgttcctacggttgtctacaacattcaactta;ref_bases=-;color=#EE0000
##sequence-region query_2 1 3000
query_2 NucDiff_v2.0 SO:0000667 4641 4725 . . . ID=SNP_4;Name=insertion;ins_len=85;query_dir=-1;ref_sequence=ref_1;ref_coord=10500;query_bases=ctcattgcgtctgtttatcgtgccgtcatatcgcccgggtacaatccggctgccatcaggagataagatcgctgcgtcgcgtcgc;ref_bases=-;color=#EE0000
The query_snps.gff file contains the following information (see Figure 1 for notations):
GFF3 fields | Content | Notes |
--- | --- | --- |
col 1 | Query_seq | |
col 2 | NucDiff_v2.0 | name and current version of the tool |
col 3 | SO:0000667 | Sequence Ontology accession number corresponding to the "insertion" SO term |
col 4 | Ins_st | |
col 5 | Ins_end | |
col 6/col 7/col 8 | . | score/strand/phase fields are not used |
col 9, ID | "SNP_1" | ID in query_snps.gff is equal to ID in ref_snps.gff |
col 9, Name | "insertion" | |
col 9, ins_len | Length(Insertion) | |
col 9, query_dir | "1" or "-1" | -1 if the inserted fragment should be reverse complemented before its insertion into a Ref_seq |
col 9, ref_sequence | Ref_seq | |
col 9, ref_coord | Ref_pos | |
col 9, query_bases | ATGC's | |
col 9, ref_bases | "-" | |
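
As a rough illustration of how the fields above might be read programmatically, the following Python sketch parses a single feature line from query_snps.gff. The function name `parse_insertion_line` is hypothetical and not part of NucDiff. The sketch assumes the line is tab-separated, as GFF3 specifies, and that column 9 holds `key=value` pairs separated by semicolons.

```python
def parse_insertion_line(line):
    """Parse one tab-separated GFF3 feature line into a dict (hypothetical helper)."""
    cols = line.rstrip("\n").split("\t")
    if len(cols) != 9:
        raise ValueError("expected 9 tab-separated GFF3 columns, got %d" % len(cols))
    seqid, source, so_term, start, end, _score, _strand, _phase, column9 = cols
    # Column 9 is a semicolon-separated list of key=value attributes.
    attributes = dict(item.split("=", 1) for item in column9.split(";") if item)
    return {
        "seqid": seqid,
        "source": source,
        "type": so_term,
        "start": int(start),   # Ins_st
        "end": int(end),       # Ins_end
        "attributes": attributes,
    }


# First entry of the query_snps.gff example, rebuilt with tab separators.
example = "\t".join([
    "query_1", "NucDiff_v2.0", "SO:0000667", "503", "507", ".", ".", ".",
    "ID=SNP_1;Name=insertion;ins_len=5;query_dir=1;ref_sequence=ref_1;"
    "ref_coord=502;query_bases=atata;ref_bases=-;color=#EE0000",
])
record = parse_insertion_line(example)

# In the examples, ins_len equals Ins_end - Ins_st + 1 and the length of query_bases.
span = record["end"] - record["start"] + 1
```

For the example entry, `span`, `int(record["attributes"]["ins_len"])` and `len(record["attributes"]["query_bases"])` all agree, which gives a simple sanity check when reading these files.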
An example with the insertion entries in ref_snps.gff:
##gff-version 3
##sequence-region ref_1 1 13000
ref_1 NucDiff_v2.0 SO:0000667 502 502 . . . ID=SNP_1;Name=insertion;ins_len=5;query_dir=1;query_sequence=query_1;query_coord=503-507;query_bases=atata;ref_bases=-;color=#EE0000
ref_1 NucDiff_v2.0 SO:0000667 1500 1500 . . . ID=SNP_2;Name=insertion;ins_len=20;query_dir=1;query_sequence=query_1;query_coord=1506-1525;query_bases=aagctgtcggctgcagagtc;ref_bases=-;color=#EE0000
ref_1 NucDiff_v2.0 SO:0000667 3500 3500 . . . ID=SNP_3;Name=insertion;ins_len=65;query_dir=1;query_sequence=query_1;query_coord=3576-3640;query_bases=tcatggcaaaatgactattcgatccgggcggagtgttcctacggttgtctacaacattcaactta;ref_bases=-;color=#EE0000
ref_1 NucDiff_v2.0 SO:0000667 10500 10500 . . . ID=SNP_4;Name=insertion;ins_len=85;query_dir=-1;query_sequence=query_2;query_coord=4641-4725;query_bases=ctcattgcgtctgtttatcgtgccgtcatatcgcccgggtacaatccggctgccatcaggagataagatcgctgcgtcgcgtcgc;ref_bases=-;color=#EE0000
The ref_snps.gff file contains the following information (see Figure 1 for notations):
GFF3 fields | Content | Notes |
--- | --- | --- |
col 1 | Ref_seq | |
col 2 | NucDiff_v2.0 | name and current version of the tool |
col 3 | SO:0000667 | Sequence Ontology accession number corresponding to the "insertion" SO term |
col 4 | Ref_pos | |
col 5 | Ref_pos | |
col 6/col 7/col 8 | . | score/strand/phase fields are not used |
col 9, ID | "SNP_1" | ID in ref_snps.gff is equal to ID in query_snps.gff |
col 9, Name | "insertion" | |
col 9, ins_len | Length(Insertion) | |
col 9, query_dir | "1" or "-1" | -1 if the inserted fragment should be reverse complemented before its insertion into a Ref_seq |
col 9, query_sequence | Query_seq | |
col 9, query_coord | Ins_st-Ins_end | |
col 9, query_bases | ATGC's | the subsequence is reverse complemented if the query_dir value is equal to -1 |
col 9, ref_bases | "-" | |
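
The role of query_dir can be sketched in Python as follows. This is a minimal illustration, not NucDiff code: the helper names `reverse_complement` and `oriented_for_reference` are hypothetical, and the sketch assumes the input bases are given as they read on the query strand. Whether a particular output file already stores the reverse-complemented string is best checked against real output.

```python
# Translation table for complementing unambiguous DNA bases in either case.
_COMPLEMENT = str.maketrans("acgtACGT", "tgcaTGCA")


def reverse_complement(bases):
    """Return the reverse complement of a DNA string (hypothetical helper)."""
    return bases.translate(_COMPLEMENT)[::-1]


def oriented_for_reference(query_bases, query_dir):
    """Orient an inserted fragment for placement into Ref_seq.

    Assumes query_bases is given as read on the query strand; the fragment is
    reverse complemented when query_dir is -1 and returned unchanged when it is 1.
    """
    if query_dir not in (1, -1):
        raise ValueError("query_dir must be 1 or -1")
    if query_dir == -1:
        return reverse_complement(query_bases)
    return query_bases


forward = oriented_for_reference("aagc", 1)    # unchanged
reverse = oriented_for_reference("aagc", -1)   # reverse complemented
```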
An example with the insertion entries in query_struct.gff:
##gff-version 3
##sequence-region query_1 1 15063
query_1 NucDiff_v2.0 SO:0000667 2526 2571 . . . ID=SV_1;Name=insertion;ins_len=46;query_dir=1;ref_sequence=ref_1;ref_coord=2500;color=#EE0000
query_1 NucDiff_v2.0 SO:0000667 5726 5813 . . . ID=SV_2;Name=insertion;ins_len=88;query_dir=1;ref_sequence=ref_1;ref_coord=5500;color=#EE0000
query_1 NucDiff_v2.0 SO:0000667 6814 6913 . . . ID=SV_3;Name=insertion;ins_len=100;query_dir=1;ref_sequence=ref_1;ref_coord=6500;color=#EE0000
query_1 NucDiff_v2.0 SO:0000667 7914 8063 . . . ID=SV_4;Name=insertion;ins_len=150;query_dir=1;ref_sequence=ref_1;ref_coord=7500;color=#EE0000
The query_struct.gff file contains the following information (see Figure 1 for notations):
GFF3 fields | Content | Notes |
--- | --- | --- |
col 1 | Query_seq | |
col 2 | NucDiff_v2.0 | name and current version of the tool |
col 3 | SO:0000667 | Sequence Ontology accession number corresponding to the "insertion" SO term |
col 4 | Ins_st | |
col 5 | Ins_end | |
col 6/col 7/col 8 | . | score/strand/phase fields are not used |
col 9, ID | "SV_1" | ID in query_struct.gff is equal to ID in ref_struct.gff |
col 9, Name | "insertion" | |
col 9, ins_len | Length(Insertion) | |
col 9, query_dir | "1" or "-1" | -1 if the inserted fragment should be reverse complemented before its insertion into a Ref_seq |
col 9, ref_sequence | Ref_seq | |
col 9, ref_coord | Ref_pos | |
An example with the insertion entries in ref_struct.gff:
##gff-version 3
##sequence-region ref_1 1 13000
ref_1 NucDiff_v2.0 SO:0000667 2500 2500 . . . ID=SV_1;Name=insertion;ins_len=46;query_dir=1;query_sequence=query_1;query_coord=2526-2571;color=#EE0000
ref_1 NucDiff_v2.0 SO:0000667 5500 5500 . . . ID=SV_2;Name=insertion;ins_len=88;query_dir=1;query_sequence=query_1;query_coord=5726-5813;color=#EE0000
ref_1 NucDiff_v2.0 SO:0000667 6500 6500 . . . ID=SV_3;Name=insertion;ins_len=100;query_dir=1;query_sequence=query_1;query_coord=6814-6913;color=#EE0000
ref_1 NucDiff_v2.0 SO:0000667 7500 7500 . . . ID=SV_4;Name=insertion;ins_len=150;query_dir=1;query_sequence=query_1;query_coord=7914-8063;color=#EE0000
The ref_struct.gff file contains the following information (see Figure 1 for notations):
GFF3 fields | Content | Notes |
--- | --- | --- |
col 1 | Ref_seq | |
col 2 | NucDiff_v2.0 | name and current version of the tool |
col 3 | SO:0000667 | Sequence Ontology accession number corresponding to the "insertion" SO term |
col 4 | Ref_pos | |
col 5 | Ref_pos | |
col 6/col 7/col 8 | . | score/strand/phase fields are not used |
col 9, ID | "SV_1" | ID in ref_struct.gff is equal to ID in query_struct.gff |
col 9, Name | "insertion" | |
col 9, ins_len | Length(Insertion) | |
col 9, query_dir | "1" or "-1" | -1 if the inserted fragment should be reverse complemented before its insertion into a Ref_seq |
col 9, query_sequence | Query_seq | |
col 9, query_coord | Ins_st-Ins_end | |
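
Because the same ID is used in the query-based and reference-based files, entries can be paired and cross-checked. The Python sketch below shows one way this might be done for query_struct.gff and ref_struct.gff. The helpers `index_by_id` and `is_consistent` are hypothetical names, not part of NucDiff, and the sketch assumes tab-separated lines with the column 9 attributes described in the tables above.

```python
def index_by_id(lines):
    """Map each feature's ID to (seqid, start, end, attributes); hypothetical helper."""
    records = {}
    for line in lines:
        if not line.strip() or line.startswith("#"):
            continue  # skip blank lines and ## directives
        cols = line.rstrip("\n").split("\t")
        attrs = dict(item.split("=", 1) for item in cols[8].split(";") if item)
        records[attrs["ID"]] = (cols[0], int(cols[3]), int(cols[4]), attrs)
    return records


def is_consistent(query_rec, ref_rec):
    """Check that a query-side and a reference-side entry describe the same insertion."""
    q_seq, q_start, q_end, q_attrs = query_rec
    r_seq, r_start, r_end, r_attrs = ref_rec
    coord_start, coord_end = (int(x) for x in r_attrs["query_coord"].split("-"))
    return (
        q_attrs["ref_sequence"] == r_seq
        and int(q_attrs["ref_coord"]) == r_start == r_end
        and r_attrs["query_sequence"] == q_seq
        and (coord_start, coord_end) == (q_start, q_end)
        and int(q_attrs["ins_len"]) == int(r_attrs["ins_len"]) == q_end - q_start + 1
    )


# SV_1 from the two examples above, rebuilt with tab separators.
query_lines = [
    "##gff-version 3",
    "\t".join(["query_1", "NucDiff_v2.0", "SO:0000667", "2526", "2571", ".", ".", ".",
               "ID=SV_1;Name=insertion;ins_len=46;query_dir=1;"
               "ref_sequence=ref_1;ref_coord=2500;color=#EE0000"]),
]
ref_lines = [
    "##gff-version 3",
    "\t".join(["ref_1", "NucDiff_v2.0", "SO:0000667", "2500", "2500", ".", ".", ".",
               "ID=SV_1;Name=insertion;ins_len=46;query_dir=1;"
               "query_sequence=query_1;query_coord=2526-2571;color=#EE0000"]),
]

query_index = index_by_id(query_lines)
ref_index = index_by_id(ref_lines)
shared_ids = sorted(set(query_index) & set(ref_index))
all_ok = all(is_consistent(query_index[i], ref_index[i]) for i in shared_ids)
```

The same pairing should apply to query_snps.gff and ref_snps.gff, since their IDs are also shared between the two files.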