Collapsed repeats
Collapsed repeat: a deletion, in a query sequence, of one copy of an interspersed repeat that is present in the reference sequence.
Figure 1: Collapsed repeat example. a) The query sequence has the same direction as the reference sequence. b) The query sequence is reverse complemented. Collapsed_repeat, Repeat_q and Repeat_r are similar or near-similar repeats.
A collapsed repeat difference is reported in the query_struct.gff and ref_struct.gff files. The locations of the repeated regions involved in a difference are given in the query_additional.gff and ref_additional.gff files.
An example with the collapsed repeat entries in query_struct.gff:
##gff-version 3
##sequence-region query_1 1 57855
query_1 NucDiff_v2.0 SO:0000159 2515 2515 . . . ID=SV_1;Name=deletion;del_len=80;query_dir=1;ref_sequence=ref_1;ref_coord=2561-2640;color=#0000EE
query_1 NucDiff_v2.0 SO:0000159 2515 2515 . . . ID=SV_2;Name=collapsed_repeat;del_len=5;query_dir=1;ref_sequence=ref_1;ref_coord=2641-2645;query_repeated_region=2511-2515;color=#0000EE
query_1 NucDiff_v2.0 SO:0000159 3523 3523 . . . ID=SV_3;Name=deletion;del_len=147;query_dir=1;ref_sequence=ref_1;ref_coord=3654-3800;color=#0000EE
query_1 NucDiff_v2.0 SO:0000159 3523 3523 . . . ID=SV_4;Name=collapsed_repeat;del_len=8;query_dir=1;ref_sequence=ref_1;ref_coord=3801-3808;query_repeated_region=3516-3523;color=#0000EE
query_1 NucDiff_v2.0 SO:0000159 4525 4525 . . . ID=SV_5;Name=deletion;del_len=250;query_dir=1;ref_sequence=ref_1;ref_coord=4811-5060;color=#0000EE
query_1 NucDiff_v2.0 SO:0000159 4525 4525 . . . ID=SV_6;Name=collapsed_repeat;del_len=5;query_dir=1;ref_sequence=ref_1;ref_coord=5061-5065;query_repeated_region=4521-4525;color=#0000EE
The query_struct.gff file contains the following information (see Figure 1 for notations):
GFF3 fields | Content | Notes |
---|---|---|
col 1 | Query_seq | |
col 2 | NucDiff_v2.0 | name and current version of the tool |
col 3 | SO:0000159 | Sequence Ontology accession number corresponding to the "deletion" SO term |
col 4 | Q_pos | |
col 5 | Q_pos | |
col 6/col 7/col 8 | . | score/strand/phase fields are not used |
col 9, ID | "SV_1" | ID in query_struct.gff is equal to ID in ref_struct.gff |
col 9, Name | "collapsed_repeat" | |
col 9, del_len | Length(Collapsed_repeat) | |
col 9, query_dir | "1" or "-1" | -1 if the duplicated reference fragment should be reverse complemented before its insertion into Query_seq |
col 9, ref_sequence | Ref_seq | |
col 9, ref_coord | Del_st - Del_end | |
col 9, query_repeated_region | St_q - Q_pos | |
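
The field layout above can be read with a few lines of Python. What follows is a minimal sketch, not part of NucDiff: the helper name `parse_gff_line` is hypothetical, and the code assumes the file is tab-delimited as GFF3 requires (the examples on this page are displayed with spaces). It parses the SV_2 entry from the example and checks that the inclusive length of `ref_coord` agrees with `del_len`.

```python
def parse_gff_line(line):
    """Split one GFF3 feature line into its fixed columns plus a dict of col 9 attributes."""
    cols = line.rstrip("\n").split("\t")
    if len(cols) != 9:
        raise ValueError(f"expected 9 tab-separated columns, got {len(cols)}")
    # Column 9 is a ';'-separated list of key=value pairs.
    attrs = dict(pair.split("=", 1) for pair in cols[8].split(";") if pair)
    return {
        "seqid": cols[0],
        "source": cols[1],
        "type": cols[2],
        "start": int(cols[3]),
        "end": int(cols[4]),
        "attrs": attrs,
    }

# SV_2 from the query_struct.gff example above, with tabs restored (assumption).
line = "\t".join([
    "query_1", "NucDiff_v2.0", "SO:0000159", "2515", "2515", ".", ".", ".",
    "ID=SV_2;Name=collapsed_repeat;del_len=5;query_dir=1;ref_sequence=ref_1;"
    "ref_coord=2641-2645;query_repeated_region=2511-2515;color=#0000EE",
])
rec = parse_gff_line(line)

# ref_coord is "Del_st-Del_end"; its inclusive length should equal del_len.
del_st, del_end = map(int, rec["attrs"]["ref_coord"].split("-"))
length_ok = del_end - del_st + 1 == int(rec["attrs"]["del_len"])
print(rec["attrs"]["Name"], length_ok)
```

Columns 4 and 5 both hold Q_pos here, so `rec["start"]` and `rec["end"]` are equal for every entry in this file.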
An example with the collapsed repeat entries in ref_struct.gff:
##gff-version 3
##sequence-region ref_1 1 75565
ref_1 NucDiff_v2.0 SO:0000159 2561 2640 . . . ID=SV_1;Name=deletion;del_len=80;query_dir=1;query_sequence=query_1;query_coord=2515;color=#0000EE
ref_1 NucDiff_v2.0 SO:0000159 2641 2645 . . . ID=SV_2;Name=collapsed_repeat;del_len=5;query_dir=1;query_sequence=query_1;query_coord=2515;ref_repeated_region=2556-2560;color=#0000EE
ref_1 NucDiff_v2.0 SO:0000159 3654 3800 . . . ID=SV_3;Name=deletion;del_len=147;query_dir=1;query_sequence=query_1;query_coord=3523;color=#0000EE
ref_1 NucDiff_v2.0 SO:0000159 3801 3808 . . . ID=SV_4;Name=collapsed_repeat;del_len=8;query_dir=1;query_sequence=query_1;query_coord=3523;ref_repeated_region=3646-3653;color=#0000EE
ref_1 NucDiff_v2.0 SO:0000159 4811 5060 . . . ID=SV_5;Name=deletion;del_len=250;query_dir=1;query_sequence=query_1;query_coord=4525;color=#0000EE
ref_1 NucDiff_v2.0 SO:0000159 5061 5065 . . . ID=SV_6;Name=collapsed_repeat;del_len=5;query_dir=1;query_sequence=query_1;query_coord=4525;ref_repeated_region=4806-4810;color=#0000EE
The ref_struct.gff file contains the following information (see Figure 1 for notations):
GFF3 fields | Content | Notes |
---|---|---|
col 1 | Ref_seq | |
col 2 | NucDiff_v2.0 | name and current version of the tool |
col 3 | SO:0000159 | Sequence Ontology accession number corresponding to the "deletion" SO term |
col 4 | Del_st | |
col 5 | Del_end | |
col 6/col 7/col 8 | . | score/strand/phase fields are not used |
col 9, ID | "SV_1" | ID in ref_struct.gff is equal to ID in query_struct.gff |
col 9, Name | "collapsed_repeat" | |
col 9, del_len | Length(Collapsed_repeat) | |
col 9, query_dir | "1" or "-1" | -1 if the duplicated reference fragment should be reverse complemented before its insertion into Ref_seq |
col 9, query_sequence | Query_seq | |
col 9, query_coord | Q_pos | |
col 9, ref_repeated_region | St_r - End_r | |
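
Because a difference carries the same ID in query_struct.gff and ref_struct.gff, the two files can be cross-checked entry by entry. The sketch below is illustrative only: the function name `read_struct` is hypothetical, the input is assumed to be tab-delimited, and the set of consistency checks is one reasonable reading of the two tables above rather than anything NucDiff itself performs. It uses the SV_4 entries from the examples.

```python
def read_struct(lines):
    """Index feature lines by the ID attribute in col 9; '#' lines and blanks are skipped."""
    by_id = {}
    for line in lines:
        if not line.strip() or line.startswith("#"):
            continue
        cols = line.rstrip("\n").split("\t")
        attrs = dict(kv.split("=", 1) for kv in cols[8].split(";") if kv)
        by_id[attrs["ID"]] = (cols[0], int(cols[3]), int(cols[4]), attrs)
    return by_id

query_lines = [
    "##gff-version 3",
    "query_1\tNucDiff_v2.0\tSO:0000159\t3523\t3523\t.\t.\t.\t"
    "ID=SV_4;Name=collapsed_repeat;del_len=8;query_dir=1;ref_sequence=ref_1;"
    "ref_coord=3801-3808;query_repeated_region=3516-3523;color=#0000EE",
]
ref_lines = [
    "##gff-version 3",
    "ref_1\tNucDiff_v2.0\tSO:0000159\t3801\t3808\t.\t.\t.\t"
    "ID=SV_4;Name=collapsed_repeat;del_len=8;query_dir=1;query_sequence=query_1;"
    "query_coord=3523;ref_repeated_region=3646-3653;color=#0000EE",
]

query_sv = read_struct(query_lines)
ref_sv = read_struct(ref_lines)

mismatches = []
for sv_id, (q_seq, q_start, q_end, q_attrs) in query_sv.items():
    r_seq, r_start, r_end, r_attrs = ref_sv[sv_id]
    consistent = (
        q_attrs["ref_sequence"] == r_seq                      # query entry names the reference
        and r_attrs["query_sequence"] == q_seq                # reference entry names the query
        and q_attrs["ref_coord"] == f"{r_start}-{r_end}"      # Del_st-Del_end
        and int(r_attrs["query_coord"]) == q_start == q_end   # Q_pos
        and q_attrs["del_len"] == r_attrs["del_len"]
    )
    if not consistent:
        mismatches.append(sv_id)

print(mismatches)
```

An empty list means every shared ID describes the same difference from both sides.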
An example with the additional information in query_additional.gff:
##gff-version 3
##sequence-region query_1 1 57855
query_1 NucDiff_v2.0 SO:0000657 2511 2515 . . . ID=Region_1;Name=Repeated_region;query_repeat_len=5;difference_type=collapsed_repeat;difference_coord_query=2515-2515;difference_len=5
query_1 NucDiff_v2.0 SO:0000657 3516 3523 . . . ID=Region_2;Name=Repeated_region;query_repeat_len=8;difference_type=collapsed_repeat;difference_coord_query=3523-3523;difference_len=8
query_1 NucDiff_v2.0 SO:0000657 4521 4525 . . . ID=Region_3;Name=Repeated_region;query_repeat_len=5;difference_type=collapsed_repeat;difference_coord_query=4525-4525;difference_len=5
The query_additional.gff file contains the following information (see Figure 1 for notations):
GFF3 fields | Content | Notes |
---|---|---|
col 1 | Query_seq | |
col 2 | NucDiff_v2.0 | name and current version of the tool |
col 3 | SO:0000657 | Sequence Ontology accession number corresponding to the "repeat_region" SO term |
col 4 | St_q | |
col 5 | Q_pos | |
col 6/col 7/col 8 | . | score/strand/phase fields are not used |
col 9, ID | "Region_1" | IDs in query_additional.gff and ref_additional.gff are independent |
col 9, Name | "Repeated_region" | |
col 9, query_repeat_len | Length(Repeat_q) | |
col 9, difference_type | "collapsed_repeat" | |
col 9, difference_coord_query | Q_pos - Q_pos | |
col 9, difference_len | Length(Collapsed_repeat) | |
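
The relationships in this table can be checked directly: the span in columns 4 and 5 should have length `query_repeat_len`, and `difference_coord_query` should point at Q_pos, which is also column 5. The following sketch assumes tab-delimited input; the `RepeatedRegion` class and `parse_query_additional` function are hypothetical names introduced for illustration. It uses the Region_2 entry from the example.

```python
from dataclasses import dataclass

@dataclass
class RepeatedRegion:
    seqid: str
    start: int               # col 4, St_q
    end: int                 # col 5, Q_pos
    repeat_len: int          # query_repeat_len
    difference_type: str
    difference_coord: tuple  # (start, end) parsed from difference_coord_query
    difference_len: int

def parse_query_additional(line):
    """Parse one feature line of query_additional.gff into a RepeatedRegion."""
    cols = line.rstrip("\n").split("\t")
    attrs = dict(kv.split("=", 1) for kv in cols[8].split(";") if kv)
    d_start, d_end = (int(x) for x in attrs["difference_coord_query"].split("-"))
    return RepeatedRegion(
        seqid=cols[0],
        start=int(cols[3]),
        end=int(cols[4]),
        repeat_len=int(attrs["query_repeat_len"]),
        difference_type=attrs["difference_type"],
        difference_coord=(d_start, d_end),
        difference_len=int(attrs["difference_len"]),
    )

line = (
    "query_1\tNucDiff_v2.0\tSO:0000657\t3516\t3523\t.\t.\t.\t"
    "ID=Region_2;Name=Repeated_region;query_repeat_len=8;"
    "difference_type=collapsed_repeat;difference_coord_query=3523-3523;difference_len=8"
)
region = parse_query_additional(line)

# GFF3 coordinates are 1-based and inclusive, hence the +1.
span_matches_len = region.end - region.start + 1 == region.repeat_len
anchored_at_q_pos = region.difference_coord == (region.end, region.end)
print(span_matches_len, anchored_at_q_pos)
```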
An example with the additional information in ref_additional.gff:
##gff-version 3
##sequence-region ref_1 1 75565
ref_1 NucDiff_v2.0 SO:0000657 2556 2560 . . . ID=Region_1;Name=Repeated_region;ref_repeat_len=5;difference_type=collapsed_repeat;difference_coord_ref=2641-2645;difference_len=5;color=#DB0101
ref_1 NucDiff_v2.0 SO:0000657 3646 3653 . . . ID=Region_2;Name=Repeated_region;ref_repeat_len=8;difference_type=collapsed_repeat;difference_coord_ref=3801-3808;difference_len=8;color=#DB0101
ref_1 NucDiff_v2.0 SO:0000657 4806 4810 . . . ID=Region_3;Name=Repeated_region;ref_repeat_len=5;difference_type=collapsed_repeat;difference_coord_ref=5061-5065;difference_len=5;color=#DB0101
The ref_additional.gff file contains the following information (see Figure 1 for notations):
GFF3 fields | Content | Notes |
---|---|---|
col 1 | Ref_seq | |
col 2 | NucDiff_v2.0 | name and current version of the tool |
col 3 | SO:0000657 | Sequence Ontology accession number corresponding to the "repeat_region" SO term |
col 4 | St_r | |
col 5 | End_r | |
col 6/col 7/col 8 | . | score/strand/phase fields are not used |
col 9, ID | "Region_1" | IDs in query_additional.gff and ref_additional.gff are independent |
col 9, Name | "Repeated_region" | |
col 9, ref_repeat_len | Length(Repeat_r) | |
col 9, difference_type | "collapsed_repeat" | |
col 9, difference_coord_ref | Del_st - Del_end | |
col 9, difference_len | Length(Collapsed_repeat) | |
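
The Region_N IDs in the additional files do not match the SV_N IDs in the struct files, so one way to link a repeated region back to its difference is by coordinates. The sketch below is an assumption about how such a join could be done, not a documented NucDiff feature: it matches `difference_coord_ref` against columns 4 and 5 of ref_struct.gff and confirms the match with `ref_repeated_region`. Rows are given as already-split tuples to keep the example short, and `attrs_of` is a hypothetical helper.

```python
def attrs_of(col9):
    """Turn a GFF3 col 9 string into a dict of attributes."""
    return dict(kv.split("=", 1) for kv in col9.split(";") if kv)

# (seqid, start, end, col 9) taken from the ref_struct.gff example.
struct_rows = [
    ("ref_1", 4811, 5060,
     "ID=SV_5;Name=deletion;del_len=250;query_dir=1;query_sequence=query_1;"
     "query_coord=4525;color=#0000EE"),
    ("ref_1", 5061, 5065,
     "ID=SV_6;Name=collapsed_repeat;del_len=5;query_dir=1;query_sequence=query_1;"
     "query_coord=4525;ref_repeated_region=4806-4810;color=#0000EE"),
]
# (seqid, start, end, col 9) taken from the ref_additional.gff example.
additional_rows = [
    ("ref_1", 4806, 4810,
     "ID=Region_3;Name=Repeated_region;ref_repeat_len=5;difference_type=collapsed_repeat;"
     "difference_coord_ref=5061-5065;difference_len=5;color=#DB0101"),
]

# Index collapsed_repeat entries by (sequence, "Del_st-Del_end").
struct_index = {}
for seqid, start, end, col9 in struct_rows:
    attrs = attrs_of(col9)
    if attrs["Name"] == "collapsed_repeat":
        struct_index[(seqid, f"{start}-{end}")] = attrs

# Link each repeated region to the difference it belongs to.
links = {}
for seqid, start, end, col9 in additional_rows:
    attrs = attrs_of(col9)
    sv = struct_index.get((seqid, attrs["difference_coord_ref"]))
    if sv is not None and sv["ref_repeated_region"] == f"{start}-{end}":
        links[attrs["ID"]] = sv["ID"]

print(links)
```

The same approach should carry over to the query side by matching `difference_coord_query` and `query_repeated_region` instead.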