Collapsed tandem repeats (case 1)
A collapsed tandem repeat is a deletion, in a query sequence, of one or more tandem repeat units that are present in the reference sequence.
Figure 1: Collapsed tandem repeat example (case 1). a) corresponds to a case where a query sequence has the same direction as a reference sequence. b) corresponds to a case with a reverse complemented query sequence. Collapsed_tandem_repeat, Repeat_q and Repeat_r are similar or near-similar repeat units.
A collapsed tandem repeat difference is output in the query_struct.gff and ref_struct.gff files. Information about the locations of the repeated regions involved in a difference is contained in the query_additional.gff and ref_additional.gff files.
An example with the collapsed tandem repeat entries in query_struct.gff:
##gff-version 3
##sequence-region query_1 1 15063
query_1 NucDiff_v2.0 SO:0000159 3640 3640 . . . ID=SV_1;Name=collapsed_tandem_repeat;del_len=65;query_dir=1;ref_sequence=ref_1;ref_coord=3716-3780;query_repeated_region=3576-3640;color=#0000EE
query_1 NucDiff_v2.0 SO:0000159 5813 5813 . . . ID=SV_2;Name=collapsed_tandem_repeat;del_len=88;query_dir=1;ref_sequence=ref_1;ref_coord=6039-6126;query_repeated_region=5726-5813;color=#0000EE
The query_struct.gff file contains the following information (see Figure 1 for notations):
GFF3 fields | Content | Notes |
---|---|---|
col 1 | Query_seq | |
col 2 | NucDiff_v2.0 | name and current version of the tool |
col 3 | SO:0000159 | Sequence Ontology accession number corresponding to the "deletion" SO term |
col 4 | Q_pos | |
col 5 | Q_pos | |
col 6/col 7/col 8 | . | score/strand/phase fields are not used |
col 9, ID | "SV_1" | ID in query_struct.gff is equal to ID in ref_struct.gff |
col 9, Name | "collapsed_tandem_repeat" | |
col 9, del_len | Length(Collapsed_tandem_repeat) | |
col 9, query_dir | "1" or "-1" | -1 if the duplicated reference tandem unit should be reverse complemented before its insertion into Query_seq |
col 9, ref_sequence | Ref_seq | |
col 9, ref_coord | Del_st - Del_end | |
col 9, query_repeated_region | St_q - Q_pos | |
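The field layout above can be checked against the first example entry with a short script. This is a minimal sketch, not part of NucDiff: the helper names `parse_gff_line` and `parse_range` are hypothetical, and columns are split on any whitespace because the example lines are shown space-separated here (GFF3 files are normally tab-delimited).

```python
def parse_gff_line(line):
    """Split one GFF3 feature line into its fixed columns plus an attribute dict."""
    cols = line.split(None, 8)  # assumption: tolerate tabs or spaces between columns
    if len(cols) != 9:
        raise ValueError("expected 9 GFF3 columns")
    # Column 9 holds 'key=value' pairs separated by ';'.
    attrs = dict(item.split("=", 1) for item in cols[8].strip().split(";") if item)
    return {
        "seqid": cols[0],
        "source": cols[1],
        "type": cols[2],
        "start": int(cols[3]),
        "end": int(cols[4]),
        "attrs": attrs,
    }


def parse_range(text):
    """Turn a 'start-end' string such as '3716-3780' into a pair of ints."""
    start, end = text.split("-")
    return int(start), int(end)


# First entry of the query_struct.gff example above.
line = (
    "query_1 NucDiff_v2.0 SO:0000159 3640 3640 . . . "
    "ID=SV_1;Name=collapsed_tandem_repeat;del_len=65;query_dir=1;"
    "ref_sequence=ref_1;ref_coord=3716-3780;"
    "query_repeated_region=3576-3640;color=#0000EE"
)

rec = parse_gff_line(line)
del_st, del_end = parse_range(rec["attrs"]["ref_coord"])
st_q, q_pos = parse_range(rec["attrs"]["query_repeated_region"])

# col 4 and col 5 both hold Q_pos, which also ends query_repeated_region.
print(rec["start"] == rec["end"] == q_pos)
# ref_coord (Del_st - Del_end) spans exactly del_len bases.
print(del_end - del_st + 1 == int(rec["attrs"]["del_len"]))
```

Both checks hold for the example entry; whether they hold for every entry NucDiff produces is not stated in this page.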
An example with the collapsed tandem repeat entries in ref_struct.gff:
##gff-version 3
##sequence-region ref_1 1 17126
ref_1 NucDiff_v2.0 SO:0000159 3716 3780 . . . ID=SV_1;Name=collapsed_tandem_repeat;del_len=65;query_dir=1;query_sequence=query_1;query_coord=3640;ref_repeated_region=3651-3715;color=#0000EE
ref_1 NucDiff_v2.0 SO:0000159 6039 6126 . . . ID=SV_2;Name=collapsed_tandem_repeat;del_len=88;query_dir=1;query_sequence=query_1;query_coord=5813;ref_repeated_region=5951-6038;color=#0000EE
The ref_struct.gff file contains the following information (see Figure 1 for notations):
GFF3 fields | Content | Notes |
---|---|---|
col 1 | Ref_seq | |
col 2 | NucDiff_v2.0 | name and current version of the tool |
col 3 | SO:0000159 | Sequence Ontology accession number corresponding to the "deletion" SO term |
col 4 | Del_st | |
col 5 | Del_end | |
col 6/col 7/col 8 | . | score/strand/phase fields are not used |
col 9, ID | "SV_1" | ID in ref_struct.gff is equal to ID in query_struct.gff |
col 9, Name | "collapsed_tandem_repeat" | |
col 9, del_len | Length(Collapsed_tandem_repeat) | |
col 9, query_dir | "1" or "-1" | -1 if the duplicated reference tandem unit should be reverse complemented before its insertion into Query_seq |
col 9, query_sequence | Query_seq | |
col 9, query_coord | Q_pos | |
col 9, ref_repeated_region | St_r - End_r | |
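Because the ID of a difference is the same in query_struct.gff and ref_struct.gff, the two views of one difference can be paired by ID and cross-checked. The sketch below does this for the example entries shown on this page; the helper names `parse` and `index_by_id` are hypothetical, and splitting on any whitespace is an assumption made so the space-separated example lines can be used as-is.

```python
def parse(line):
    """Return (columns, attribute dict) for one GFF3 feature line."""
    cols = line.split(None, 8)  # assumption: tabs or spaces between columns
    attrs = dict(kv.split("=", 1) for kv in cols[8].strip().split(";") if kv)
    return cols, attrs


def index_by_id(lines):
    """Map the ID attribute of every feature line to its parsed content."""
    indexed = {}
    for line in lines:
        if line.startswith("#"):  # skip '##gff-version' and similar directives
            continue
        cols, attrs = parse(line)
        indexed[attrs["ID"]] = (cols, attrs)
    return indexed


query_lines = [
    "##gff-version 3",
    "query_1 NucDiff_v2.0 SO:0000159 3640 3640 . . . ID=SV_1;Name=collapsed_tandem_repeat;del_len=65;query_dir=1;ref_sequence=ref_1;ref_coord=3716-3780;query_repeated_region=3576-3640;color=#0000EE",
    "query_1 NucDiff_v2.0 SO:0000159 5813 5813 . . . ID=SV_2;Name=collapsed_tandem_repeat;del_len=88;query_dir=1;ref_sequence=ref_1;ref_coord=6039-6126;query_repeated_region=5726-5813;color=#0000EE",
]
ref_lines = [
    "##gff-version 3",
    "ref_1 NucDiff_v2.0 SO:0000159 3716 3780 . . . ID=SV_1;Name=collapsed_tandem_repeat;del_len=65;query_dir=1;query_sequence=query_1;query_coord=3640;ref_repeated_region=3651-3715;color=#0000EE",
    "ref_1 NucDiff_v2.0 SO:0000159 6039 6126 . . . ID=SV_2;Name=collapsed_tandem_repeat;del_len=88;query_dir=1;query_sequence=query_1;query_coord=5813;ref_repeated_region=5951-6038;color=#0000EE",
]

query_by_id = index_by_id(query_lines)
ref_by_id = index_by_id(ref_lines)

pairs = []
for sv_id in sorted(query_by_id.keys() & ref_by_id.keys()):
    q_cols, q_attrs = query_by_id[sv_id]
    r_cols, r_attrs = ref_by_id[sv_id]
    consistent = (
        q_attrs["ref_sequence"] == r_cols[0]          # Ref_seq
        and r_attrs["query_sequence"] == q_cols[0]    # Query_seq
        and q_attrs["ref_coord"] == f"{r_cols[3]}-{r_cols[4]}"  # Del_st - Del_end
        and r_attrs["query_coord"] == q_cols[3]       # Q_pos
        and q_attrs["del_len"] == r_attrs["del_len"]
    )
    pairs.append((sv_id, consistent))

print(pairs)
```

Keying on ID keeps the join independent of entry order in the two files.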
An example with the additional information in query_additional.gff:
##gff-version 3
##sequence-region query_1 1 15063
query_1 NucDiff_v2.0 SO:0000657 3576 3640 . . . ID=Region_1;Name=Repeated_region;query_repeat_len=65;difference_type=collapsed_tandem_repeat;difference_coord_query=3640-3640;difference_len=65
query_1 NucDiff_v2.0 SO:0000657 5726 5813 . . . ID=Region_2;Name=Repeated_region;query_repeat_len=88;difference_type=collapsed_tandem_repeat;difference_coord_query=5813-5813;difference_len=88
The query_additional.gff file contains the following information (see Figure 1 for notations):
GFF3 fields | Content | Notes |
---|---|---|
col 1 | Query_seq | |
col 2 | NucDiff_v2.0 | name and current version of the tool |
col 3 | SO:0000657 | Sequence Ontology accession number corresponding to the "repeat_region" SO term |
col 4 | St_q | |
col 5 | Q_pos | |
col 6/col 7/col 8 | . | score/strand/phase fields are not used |
col 9, ID | "Region_1" | IDs in query_additional.gff and ref_additional.gff are independent |
col 9, Name | "Repeated_region" | |
col 9, query_repeat_len | Length(Repeat_q) | |
col 9, difference_type | "collapsed_tandem_repeat" | |
col 9, difference_coord_query | Q_pos - Q_pos | |
col 9, difference_len | Length(Collapsed_tandem_repeat) | |
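The Region IDs in query_additional.gff are not tied to the SV IDs in query_struct.gff, so one way to link a repeated region to its difference is through the difference_coord_query attribute, which repeats columns 4 and 5 of the query_struct.gff entry. The following is a minimal sketch of that lookup using the example entries from this page; the `parse` helper and the choice of a (sequence, coordinates) key are assumptions for illustration, not NucDiff functionality.

```python
def parse(line):
    """Return (columns, attribute dict) for one GFF3 feature line."""
    cols = line.split(None, 8)  # assumption: tabs or spaces between columns
    attrs = dict(kv.split("=", 1) for kv in cols[8].strip().split(";") if kv)
    return cols, attrs


struct_lines = [
    "query_1 NucDiff_v2.0 SO:0000159 3640 3640 . . . ID=SV_1;Name=collapsed_tandem_repeat;del_len=65;query_dir=1;ref_sequence=ref_1;ref_coord=3716-3780;query_repeated_region=3576-3640;color=#0000EE",
    "query_1 NucDiff_v2.0 SO:0000159 5813 5813 . . . ID=SV_2;Name=collapsed_tandem_repeat;del_len=88;query_dir=1;ref_sequence=ref_1;ref_coord=6039-6126;query_repeated_region=5726-5813;color=#0000EE",
]
additional_lines = [
    "query_1 NucDiff_v2.0 SO:0000657 3576 3640 . . . ID=Region_1;Name=Repeated_region;query_repeat_len=65;difference_type=collapsed_tandem_repeat;difference_coord_query=3640-3640;difference_len=65",
    "query_1 NucDiff_v2.0 SO:0000657 5726 5813 . . . ID=Region_2;Name=Repeated_region;query_repeat_len=88;difference_type=collapsed_tandem_repeat;difference_coord_query=5813-5813;difference_len=88",
]

# Key every repeated region by (sequence name, difference coordinates).
regions = {}
for line in additional_lines:
    cols, attrs = parse(line)
    key = (cols[0], attrs["difference_coord_query"])
    regions[key] = (attrs["ID"], int(cols[3]), int(cols[4]))  # ID, St_q, Q_pos

# Look up the region for each difference using its own col 4 and col 5.
links = {}
for line in struct_lines:
    cols, attrs = parse(line)
    key = (cols[0], f"{cols[3]}-{cols[4]}")
    links[attrs["ID"]] = regions.get(key)  # None if no region matches

print(links)
```

If several differences could share the same coordinates, a single-valued dictionary would keep only the last region; a list per key would be the safer structure in that case.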
An example with the additional information in ref_additional.gff:
##gff-version 3
##sequence-region ref_1 1 17126
ref_1 NucDiff_v2.0 SO:0000657 3651 3715 . . . ID=Region_1;Name=Repeated_region;ref_repeat_len=65;difference_type=collapsed_tandem_repeat;difference_coord_ref=3716-3780;difference_len=65;color=#DB0101
ref_1 NucDiff_v2.0 SO:0000657 5951 6038 . . . ID=Region_2;Name=Repeated_region;ref_repeat_len=88;difference_type=collapsed_tandem_repeat;difference_coord_ref=6039-6126;difference_len=88;color=#DB0101
The ref_additional.gff file contains the following information (see Figure 1 for notations):
GFF3 fields | Content | Notes |
---|---|---|
col 1 | Ref_seq | |
col 2 | NucDiff_v2.0 | name and current version of the tool |
col 3 | SO:0000657 | Sequence Ontology accession number corresponding to the "repeat_region" SO term |
col 4 | St_r | |
col 5 | End_r | |
col 6/col 7/col 8 | . | score/strand/phase fields are not used |
col 9, ID | "Region_1" | IDs in query_additional.gff and ref_additional.gff are independent |
col 9, Name | "Repeated_region" | |
col 9, ref_repeat_len | Length(Repeat_r) | |
col 9, difference_type | "collapsed_tandem_repeat" | |
col 9, difference_coord_ref | Del_st - Del_end | |
col 9, difference_len | Length(Collapsed_tandem_repeat) | |
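The coordinates in the ref_additional.gff example can be sanity-checked with simple arithmetic. The sketch below assumes 1-based inclusive coordinates (as in GFF3) and checks three things per entry: that ref_repeat_len matches the span of columns 4 and 5, that difference_len matches the span of difference_coord_ref, and that the repeated region ends immediately before the deleted unit. The last check reflects what the two example entries show and is an assumption about the general case; the `parse` helper is hypothetical.

```python
def parse(line):
    """Return (columns, attribute dict) for one GFF3 feature line."""
    cols = line.split(None, 8)  # assumption: tabs or spaces between columns
    attrs = dict(kv.split("=", 1) for kv in cols[8].strip().split(";") if kv)
    return cols, attrs


additional_lines = [
    "ref_1 NucDiff_v2.0 SO:0000657 3651 3715 . . . ID=Region_1;Name=Repeated_region;ref_repeat_len=65;difference_type=collapsed_tandem_repeat;difference_coord_ref=3716-3780;difference_len=65;color=#DB0101",
    "ref_1 NucDiff_v2.0 SO:0000657 5951 6038 . . . ID=Region_2;Name=Repeated_region;ref_repeat_len=88;difference_type=collapsed_tandem_repeat;difference_coord_ref=6039-6126;difference_len=88;color=#DB0101",
]

report = {}
for line in additional_lines:
    cols, attrs = parse(line)
    st_r, end_r = int(cols[3]), int(cols[4])
    del_st, del_end = (int(x) for x in attrs["difference_coord_ref"].split("-"))
    report[attrs["ID"]] = {
        # Length(Repeat_r) equals the inclusive span St_r..End_r.
        "repeat_len_ok": end_r - st_r + 1 == int(attrs["ref_repeat_len"]),
        # Length(Collapsed_tandem_repeat) equals the span Del_st..Del_end.
        "difference_len_ok": del_end - del_st + 1 == int(attrs["difference_len"]),
        # Observed in the examples: Repeat_r ends right before the deleted unit.
        "adjacent_to_deletion": end_r + 1 == del_st,
    }

for region_id, checks in report.items():
    print(region_id, checks)
```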