
Collapsed tandem repeats (case 1)

kseniakh edited this page Mar 10, 2017 · 1 revision


A collapsed tandem repeat is a deletion, in a query sequence, of one or more tandem repeat units that are present in the reference sequence.



Figure 1: Collapsed tandem repeat example (case 1). a) corresponds to a case where a query sequence has the same direction as a reference sequence. b) corresponds to a case with a reverse complemented query sequence. Collapsed_tandem_repeat, Repeat_q and Repeat_r are similar or near-similar repeat units.



A collapsed tandem repeat difference is output in the query_struct.gff and ref_struct.gff files. Information about the locations of the repeated regions involved in a difference is contained in the query_additional.gff and ref_additional.gff files.

An example of collapsed tandem repeat entries in query_struct.gff:

##gff-version 3
##sequence-region	query_1	1	15063
query_1	NucDiff_v2.0	SO:0000159	3640	3640	.	.	.	ID=SV_1;Name=collapsed_tandem_repeat;del_len=65;query_dir=1;ref_sequence=ref_1;ref_coord=3716-3780;query_repeated_region=3576-3640;color=#0000EE
query_1	NucDiff_v2.0	SO:0000159	5813	5813	.	.	.	ID=SV_2;Name=collapsed_tandem_repeat;del_len=88;query_dir=1;ref_sequence=ref_1;ref_coord=6039-6126;query_repeated_region=5726-5813;color=#0000EE



The query_struct.gff file contains the following information (see Figure 1 for notations):

| GFF3 fields | Content | Notes |
| --- | --- | --- |
| col 1 | Query_seq | |
| col 2 | NucDiff_v2.0 | name and current version of the tool |
| col 3 | SO:0000159 | Sequence Ontology accession number corresponding to the "deletion" SO term |
| col 4 | Q_pos | |
| col 5 | Q_pos | |
| col 6/col 7/col 8 | . | score/strand/phase fields are not used |
| col 9, ID | "SV_1" | ID in query_struct.gff is equal to ID in ref_struct.gff |
| col 9, Name | "collapsed_tandem_repeat" | |
| col 9, del_len | Length(Collapsed_tandem_repeat) | |
| col 9, query_dir | "1" or "-1" | -1 if the duplicated reference tandem unit must be reverse complemented before it is inserted into Query_seq |
| col 9, ref_sequence | Ref_seq | |
| col 9, ref_coord | Del_st - Del_end | |
| col 9, query_repeated_region | St_q - Q_pos | |
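
The field layout above can be read programmatically. The following is a minimal Python sketch, not part of NucDiff itself: the helper names `parse_gff3_line` and `parse_range` are hypothetical, and the code assumes the tab-separated, nine-column layout and `key=value;` attribute syntax shown in the example entries.

```python
def parse_gff3_line(line):
    """Split one GFF3 feature line into its columns and parse column 9."""
    cols = line.rstrip("\n").split("\t")
    if len(cols) != 9:
        raise ValueError(f"expected 9 tab-separated columns, got {len(cols)}")
    seqid, source, so_type, start, end, _score, _strand, _phase, col9 = cols
    # Column 9 is a ';'-separated list of key=value pairs.
    attributes = dict(pair.split("=", 1) for pair in col9.split(";") if pair)
    return {
        "seqid": seqid,
        "source": source,
        "type": so_type,
        "start": int(start),
        "end": int(end),
        "attributes": attributes,
    }


def parse_range(text):
    """Turn a 'start-end' attribute value into a pair of integers."""
    start, end = text.split("-")
    return int(start), int(end)


# First entry of the query_struct.gff example on this page.
line = (
    "query_1\tNucDiff_v2.0\tSO:0000159\t3640\t3640\t.\t.\t.\t"
    "ID=SV_1;Name=collapsed_tandem_repeat;del_len=65;query_dir=1;"
    "ref_sequence=ref_1;ref_coord=3716-3780;"
    "query_repeated_region=3576-3640;color=#0000EE"
)
record = parse_gff3_line(line)
attrs = record["attributes"]
del_st, del_end = parse_range(attrs["ref_coord"])          # Del_st, Del_end
st_q, q_pos = parse_range(attrs["query_repeated_region"])  # St_q, Q_pos
```

For this entry, `q_pos` equals both col 4 and col 5, and the reference interval `del_st`..`del_end` spans `del_len` bases when counted inclusively.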



An example of collapsed tandem repeat entries in ref_struct.gff:

##gff-version 3
##sequence-region	ref_1	1	17126
ref_1	NucDiff_v2.0	SO:0000159	3716	3780	.	.	.	ID=SV_1;Name=collapsed_tandem_repeat;del_len=65;query_dir=1;query_sequence=query_1;query_coord=3640;ref_repeated_region=3651-3715;color=#0000EE
ref_1	NucDiff_v2.0	SO:0000159	6039	6126	.	.	.	ID=SV_2;Name=collapsed_tandem_repeat;del_len=88;query_dir=1;query_sequence=query_1;query_coord=5813;ref_repeated_region=5951-6038;color=#0000EE



The ref_struct.gff file contains the following information (see Figure 1 for notations):

| GFF3 fields | Content | Notes |
| --- | --- | --- |
| col 1 | Ref_seq | |
| col 2 | NucDiff_v2.0 | name and current version of the tool |
| col 3 | SO:0000159 | Sequence Ontology accession number corresponding to the "deletion" SO term |
| col 4 | Del_st | |
| col 5 | Del_end | |
| col 6/col 7/col 8 | . | score/strand/phase fields are not used |
| col 9, ID | "SV_1" | ID in ref_struct.gff is equal to ID in query_struct.gff |
| col 9, Name | "collapsed_tandem_repeat" | |
| col 9, del_len | Length(Collapsed_tandem_repeat) | |
| col 9, query_dir | "1" or "-1" | -1 if the duplicated reference tandem unit must be reverse complemented before it is inserted into Query_seq |
| col 9, query_sequence | Query_seq | |
| col 9, query_coord | Q_pos | |
| col 9, ref_repeated_region | St_r - End_r | |
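
The relationships between these fields can be checked with a few lines of Python. This is a sketch under stated assumptions rather than a NucDiff feature: `check_ref_struct_line` is a hypothetical helper, coordinates are treated as 1-based and inclusive (as in GFF3), and the adjacency test (End_r + 1 == Del_st) reflects what the two example entries show, not a guarantee stated on this page.

```python
def parse_attributes(column9):
    """Parse the ';'-separated key=value pairs of GFF3 column 9."""
    return dict(pair.split("=", 1) for pair in column9.split(";") if pair)


def check_ref_struct_line(line):
    """Report whether one ref_struct.gff entry is internally consistent."""
    cols = line.rstrip("\n").split("\t")
    del_st, del_end = int(cols[3]), int(cols[4])  # col 4 / col 5
    attrs = parse_attributes(cols[8])
    st_r, end_r = (int(x) for x in attrs["ref_repeated_region"].split("-"))
    return {
        "id": attrs["ID"],
        # Inclusive interval length should equal del_len.
        "length_matches": del_end - del_st + 1 == int(attrs["del_len"]),
        # Assumption drawn from the examples: the retained repeat copy
        # ends immediately before the deleted unit starts.
        "repeat_is_adjacent": end_r + 1 == del_st,
    }


# The two entries of the ref_struct.gff example on this page.
example_lines = [
    "ref_1\tNucDiff_v2.0\tSO:0000159\t3716\t3780\t.\t.\t.\t"
    "ID=SV_1;Name=collapsed_tandem_repeat;del_len=65;query_dir=1;"
    "query_sequence=query_1;query_coord=3640;"
    "ref_repeated_region=3651-3715;color=#0000EE",
    "ref_1\tNucDiff_v2.0\tSO:0000159\t6039\t6126\t.\t.\t.\t"
    "ID=SV_2;Name=collapsed_tandem_repeat;del_len=88;query_dir=1;"
    "query_sequence=query_1;query_coord=5813;"
    "ref_repeated_region=5951-6038;color=#0000EE",
]
reports = [check_ref_struct_line(line) for line in example_lines]
```

Both example entries pass the length check and the adjacency check.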



An example of the additional information in query_additional.gff:

##gff-version 3
##sequence-region	query_1	1	15063
query_1	NucDiff_v2.0	SO:0000657	3576	3640	.	.	.	ID=Region_1;Name=Repeated_region;query_repeat_len=65;difference_type=collapsed_tandem_repeat;difference_coord_query=3640-3640;difference_len=65
query_1	NucDiff_v2.0	SO:0000657	5726	5813	.	.	.	ID=Region_2;Name=Repeated_region;query_repeat_len=88;difference_type=collapsed_tandem_repeat;difference_coord_query=5813-5813;difference_len=88



The query_additional.gff file contains the following information (see Figure 1 for notations):

| GFF3 fields | Content | Notes |
| --- | --- | --- |
| col 1 | Query_seq | |
| col 2 | NucDiff_v2.0 | name and current version of the tool |
| col 3 | SO:0000657 | Sequence Ontology accession number corresponding to the "repeat_region" SO term |
| col 4 | St_q | |
| col 5 | Q_pos | |
| col 6/col 7/col 8 | . | score/strand/phase fields are not used |
| col 9, ID | "Region_1" | IDs in query_additional.gff and ref_additional.gff are independent |
| col 9, Name | "Repeated_region" | |
| col 9, query_repeat_len | Length(Repeat_q) | |
| col 9, difference_type | "collapsed_tandem_repeat" | |
| col 9, difference_coord_query | Q_pos - Q_pos | |
| col 9, difference_len | Length(Collapsed_tandem_repeat) | |



An example of the additional information in ref_additional.gff:

##gff-version 3
##sequence-region	ref_1	1	17126
ref_1	NucDiff_v2.0	SO:0000657	3651	3715	.	.	.	ID=Region_1;Name=Repeated_region;ref_repeat_len=65;difference_type=collapsed_tandem_repeat;difference_coord_ref=3716-3780;difference_len=65;color=#DB0101
ref_1	NucDiff_v2.0	SO:0000657	5951	6038	.	.	.	ID=Region_2;Name=Repeated_region;ref_repeat_len=88;difference_type=collapsed_tandem_repeat;difference_coord_ref=6039-6126;difference_len=88;color=#DB0101



The ref_additional.gff file contains the following information (see Figure 1 for notations):

| GFF3 fields | Content | Notes |
| --- | --- | --- |
| col 1 | Ref_seq | |
| col 2 | NucDiff_v2.0 | name and current version of the tool |
| col 3 | SO:0000657 | Sequence Ontology accession number corresponding to the "repeat_region" SO term |
| col 4 | St_r | |
| col 5 | End_r | |
| col 6/col 7/col 8 | . | score/strand/phase fields are not used |
| col 9, ID | "Region_1" | IDs in query_additional.gff and ref_additional.gff are independent |
| col 9, Name | "Repeated_region" | |
| col 9, ref_repeat_len | Length(Repeat_r) | |
| col 9, difference_type | "collapsed_tandem_repeat" | |
| col 9, difference_coord_ref | Del_st - Del_end | |
| col 9, difference_len | Length(Collapsed_tandem_repeat) | |
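
Because the Region_* IDs in the additional files are independent of the SV_* IDs in the struct files, the two cannot be joined by ID. One possible approach, sketched below in Python, is to match the `difference_coord_ref` attribute of each ref_additional.gff entry against col 4 and col 5 of the ref_struct.gff entries. The function names are hypothetical, the attribute lists in the sample data are trimmed to the fields the sketch needs, and matching by exact coordinates is an assumption based on the examples on this page.

```python
def read_features(gff_text):
    """Return (seqid, start, end, attributes) for every feature line."""
    features = []
    for line in gff_text.splitlines():
        if not line.strip() or line.startswith("#"):
            continue  # skip blank lines and '##' directives
        cols = line.split("\t")
        attrs = dict(pair.split("=", 1) for pair in cols[8].split(";") if pair)
        features.append((cols[0], int(cols[3]), int(cols[4]), attrs))
    return features


def link_regions_to_differences(struct_text, additional_text):
    """Map each Region_* ID to the SV_* ID covering the same ref interval."""
    sv_by_interval = {
        (seqid, start, end): attrs["ID"]
        for seqid, start, end, attrs in read_features(struct_text)
    }
    links = {}
    for seqid, _start, _end, attrs in read_features(additional_text):
        diff_st, diff_end = (
            int(x) for x in attrs["difference_coord_ref"].split("-")
        )
        # None means no struct entry has exactly these coordinates.
        links[attrs["ID"]] = sv_by_interval.get((seqid, diff_st, diff_end))
    return links


# Sample data taken from the examples above, with attributes trimmed.
ref_struct = "\n".join([
    "##gff-version 3",
    "ref_1\tNucDiff_v2.0\tSO:0000159\t3716\t3780\t.\t.\t.\t"
    "ID=SV_1;Name=collapsed_tandem_repeat;del_len=65",
    "ref_1\tNucDiff_v2.0\tSO:0000159\t6039\t6126\t.\t.\t.\t"
    "ID=SV_2;Name=collapsed_tandem_repeat;del_len=88",
])
ref_additional = "\n".join([
    "##gff-version 3",
    "ref_1\tNucDiff_v2.0\tSO:0000657\t3651\t3715\t.\t.\t.\t"
    "ID=Region_1;Name=Repeated_region;difference_coord_ref=3716-3780",
    "ref_1\tNucDiff_v2.0\tSO:0000657\t5951\t6038\t.\t.\t.\t"
    "ID=Region_2;Name=Repeated_region;difference_coord_ref=6039-6126",
])
links = link_regions_to_differences(ref_struct, ref_additional)
```

With the sample data, `links` maps `Region_1` to `SV_1` and `Region_2` to `SV_2`. The same idea should apply to the query files using `difference_coord_query`, whose value has the form Q_pos - Q_pos.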