Relocations with overlap
Relocation - a group of different types of intra-chromosomal structural rearrangements that occur when two regions located in different parts of the same reference sequence are placed next to each other in the same query sequence.
Relocation with overlap - a relocation with a partial overlap between the two query fragments.
Figure 1: Relocation with overlap example. The reference coordinates End_r_1 and St_r_2, corresponding to the breakpoint ends End_q_1 and St_q_2, respectively, coincide with the reference relocated block ends Rel_end_r_1 and Rel_st_r_2.
Figure 2: Relocation with overlap example. The reference coordinates End_r_1 and St_r_2, corresponding to the breakpoint ends End_q_1 and St_q_2, respectively, do not coincide with the reference relocated block ends Rel_end_r_1 and Rel_st_r_2. A', B' and R' are reverse complements of A, B and R, respectively.
A relocation with overlap difference is output in the query_struct.gff and ref_struct.gff files. Information about the repeated reference regions corresponding to the overlapped query region is output in ref_additional.gff. Information about the relocated blocks is also output in the ref_blocks.gff and query_blocks.gff files; descriptions and examples of these two files can be found on their respective wiki pages.
An example of relocation with overlap entries in query_struct.gff:
##gff-version 3
##sequence-region query_4 1 1065
query_4 NucDiff_v2.0 SO:0001874 501 565 . . . ID=SV_1;Name=relocation-overlap;overlap_len=65;ref_sequence=ref_1;blk_1_query=1-565;blk_1_ref=34651-35215;blk_1_query_len=565;blk_1_ref_len=565;blk_1_st_query=1;blk_1_st_ref=34651;blk_1_end_query=565;blk_1_end_ref=35215;blk_2_query=501-1065;blk_2_ref=45716-46280;blk_2_query_len=565;blk_2_ref_len=565;blk_2_st_query=501;blk_2_st_ref=45716;blk_2_end_query=1065;blk_2_end_ref=46280;color=#990099
##sequence-region query_8 1 1150
query_8 NucDiff_v2.0 SO:0001874 501 650 . . . ID=SV_2;Name=relocation-overlap;overlap_len=150;ref_sequence=ref_1;blk_1_query=1-650;blk_1_ref=81327-81976;blk_1_query_len=650;blk_1_ref_len=650;blk_1_st_query=1;blk_1_st_ref=81327;blk_1_end_query=650;blk_1_end_ref=81976;blk_2_query=501-1150;blk_2_ref=92477-93126;blk_2_query_len=650;blk_2_ref_len=650;blk_2_st_query=501;blk_2_st_ref=92477;blk_2_end_query=1150;blk_2_end_ref=93126;color=#990099
The query_struct.gff file contains the following information (see Figure 1 for notations):
GFF3 fields | Content | Notes |
---|---|---|
col 1 | Query_seq | |
col 2 | NucDiff_v2.0 | name and current version of the tool |
col 3 | SO:0001874 | Sequence Ontology accession number corresponding to the "intrachromosomal_breakpoint" SO term |
col 4 | St_q_2 | |
col 5 | End_q_1 | |
col 6/col 7/col 8 | . | score/strand/phase fields are not used |
col 9, ID | "SV_1" | ID in query_struct.gff is related to ID in ref_struct.gff |
col 9, Name | "relocation-overlap" | |
col 9, overlap_len | Length(R) | |
col 9, ref_sequence | Ref_seq | |
col 9, blk_1_query | St_q_1 - End_q_1 | |
col 9, blk_1_ref | Rel_st_r_1 - Rel_end_r_1 | |
col 9, blk_1_query_len | Length(A) | |
col 9, blk_1_ref_len | Length(A*) | |
col 9, blk_1_st_query | St_q_1 | |
col 9, blk_1_st_ref | St_r_1 | |
col 9, blk_1_end_query | End_q_1 | |
col 9, blk_1_end_ref | End_r_1 | |
col 9, blk_2_query | St_q_2 - End_q_2 | |
col 9, blk_2_ref | Rel_st_r_2 - Rel_end_r_2 | |
col 9, blk_2_query_len | Length(B) | |
col 9, blk_2_ref_len | Length(B*) | |
col 9, blk_2_st_query | St_q_2 | |
col 9, blk_2_st_ref | St_r_2 | |
col 9, blk_2_end_query | End_q_2 | |
col 9, blk_2_end_ref | End_r_2 | |
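To make the column semantics concrete, here is a minimal Python sketch (not part of NucDiff) that cross-checks a query_struct.gff relocation-overlap line against the table above: columns 4 and 5 should equal blk_2_st_query (St_q_2) and blk_1_end_query (End_q_1), and overlap_len should equal End_q_1 - St_q_2 + 1 under GFF3's 1-based inclusive coordinates. The function names are hypothetical, and the column splitter also accepts space-separated lines because the examples on this page may have lost their tabs.

```python
def parse_attributes(col9: str) -> dict:
    """Split a GFF3 column-9 string into a dict of key=value pairs."""
    attrs = {}
    for field in col9.strip().split(";"):
        if field:
            key, _, value = field.partition("=")
            attrs[key] = value
    return attrs


def split_columns(line: str) -> list:
    """GFF3 is tab-separated; fall back to whitespace for pasted examples."""
    line = line.rstrip("\n")
    return line.split("\t") if "\t" in line else line.split()


def check_query_entry(line: str) -> dict:
    """Cross-check one query_struct.gff relocation-overlap record."""
    cols = split_columns(line)
    start, end = int(cols[3]), int(cols[4])
    attrs = parse_attributes(cols[8])
    st_q_2 = int(attrs["blk_2_st_query"])    # start of query block 2
    end_q_1 = int(attrs["blk_1_end_query"])  # end of query block 1
    overlap_len = int(attrs["overlap_len"])  # Length(R)
    # Columns 4 and 5 delimit the overlapped query region R.
    if (start, end) != (st_q_2, end_q_1):
        raise ValueError(f"columns 4-5 ({start}-{end}) do not match St_q_2-End_q_1")
    # 1-based, inclusive coordinates: Length(R) = End_q_1 - St_q_2 + 1.
    if overlap_len != end_q_1 - st_q_2 + 1:
        raise ValueError(f"overlap_len={overlap_len} is inconsistent with {st_q_2}-{end_q_1}")
    return {"St_q_2": st_q_2, "End_q_1": end_q_1, "overlap_len": overlap_len}


# Trimmed version of the SV_1 line from the query_struct.gff example.
example = ("query_4\tNucDiff_v2.0\tSO:0001874\t501\t565\t.\t.\t.\t"
           "ID=SV_1;Name=relocation-overlap;overlap_len=65;"
           "blk_1_end_query=565;blk_2_st_query=501")
print(check_query_entry(example))
```

The same check passes for SV_2, where 650 - 501 + 1 = 150.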
An example of relocation with overlap entries in ref_struct.gff:
##gff-version 3
##sequence-region ref_1 1 153626
ref_1 NucDiff_v2.0 SO:0001874 35215 35215 . . . ID=SV_1.1;Name=relocation-overlap;overlap_len=65;query_sequence=query_4;query_coord=565;breakpoint_query=501-565;blk_query=1-565;blk_ref=34651-35215;blk_query_len=565;blk_ref_len=565;color=#990099
ref_1 NucDiff_v2.0 SO:0001874 45716 45716 . . . ID=SV_1.2;Name=relocation-overlap;overlap_len=65;query_sequence=query_4;query_coord=501;breakpoint_query=501-565;blk_query=501-1065;blk_ref=45716-46280;blk_query_len=565;blk_ref_len=565;color=#990099
ref_1 NucDiff_v2.0 SO:0001874 81976 81976 . . . ID=SV_2.1;Name=relocation-overlap;overlap_len=150;query_sequence=query_8;query_coord=650;breakpoint_query=501-650;blk_query=1-650;blk_ref=81327-81976;blk_query_len=650;blk_ref_len=650;color=#990099
ref_1 NucDiff_v2.0 SO:0001874 92477 92477 . . . ID=SV_2.2;Name=relocation-overlap;overlap_len=150;query_sequence=query_8;query_coord=501;breakpoint_query=501-650;blk_query=501-1150;blk_ref=92477-93126;blk_query_len=650;blk_ref_len=650;color=#990099
The ref_struct.gff file contains the following information (see Figure 1 for notations):
GFF3 fields | Content for Relocation block 1 | Content for Relocation block 2 | Notes |
---|---|---|---|
col 1 | Ref_seq | Ref_seq | |
col 2 | NucDiff_v2.0 | NucDiff_v2.0 | name and current version of the tool |
col 3 | SO:0001874 | SO:0001874 | Sequence Ontology accession number corresponding to the "intrachromosomal_breakpoint" SO term |
col 4 | End_r_1 | St_r_2 | |
col 5 | End_r_1 | St_r_2 | |
col 6/col 7/col 8 | . | . | score/strand/phase fields are not used |
col 9, ID | "SV_1.1" | "SV_1.2" | ID in ref_struct.gff is related to ID in query_struct.gff |
col 9, Name | "relocation-overlap" | "relocation-overlap" | |
col 9, overlap_len | Length(R) | Length(R) | |
col 9, query_sequence | Query_seq | Query_seq | |
col 9, query_coord | End_q_1 | St_q_2 | |
col 9, breakpoint_query | St_q_2 - End_q_1 | St_q_2 - End_q_1 | |
col 9, blk_query | St_q_1 - End_q_1 | St_q_2 - End_q_2 | |
col 9, blk_ref | Rel_st_r_1 - Rel_end_r_1 | Rel_st_r_2 - Rel_end_r_2 | |
col 9, blk_query_len | Length(A) | Length(B) | |
col 9, blk_ref_len | Length(A*) | Length(B*) | |
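The link between the two files can be sketched in Python as follows. Judging from the examples, each query_struct.gff ID such as SV_1 appears in ref_struct.gff as SV_1.1 (relocation block 1) and SV_1.2 (relocation block 2); this naming rule is an assumption read off the examples rather than a documented guarantee, and the helper names are hypothetical.

```python
from collections import defaultdict


def parse_attributes(col9: str) -> dict:
    """Split a GFF3 column-9 string into a dict of key=value pairs."""
    attrs = {}
    for field in col9.strip().split(";"):
        if field:
            key, _, value = field.partition("=")
            attrs[key] = value
    return attrs


def pair_reference_breakpoints(lines) -> dict:
    """Group ref_struct.gff relocation-overlap records under their query_struct ID."""
    pairs = defaultdict(dict)
    for line in lines:
        line = line.rstrip("\n")
        if not line or line.startswith("#"):
            continue  # skip blank lines and ## directives
        cols = line.split("\t") if "\t" in line else line.split()
        attrs = parse_attributes(cols[8])
        if attrs.get("Name") != "relocation-overlap":
            continue
        # Assumed naming rule (from the examples): "SV_1.2" -> ("SV_1", block 2).
        parent, _, block = attrs["ID"].rpartition(".")
        pairs[parent][int(block)] = {
            "ref_coord": int(cols[3]),                 # End_r_1 or St_r_2
            "query_coord": int(attrs["query_coord"]),  # End_q_1 or St_q_2
        }
    return dict(pairs)


example = [
    "##gff-version 3",
    "ref_1\tNucDiff_v2.0\tSO:0001874\t35215\t35215\t.\t.\t.\t"
    "ID=SV_1.1;Name=relocation-overlap;overlap_len=65;query_coord=565",
    "ref_1\tNucDiff_v2.0\tSO:0001874\t45716\t45716\t.\t.\t.\t"
    "ID=SV_1.2;Name=relocation-overlap;overlap_len=65;query_coord=501",
]
print(pair_reference_breakpoints(example))
```

For SV_1 this yields the reference breakpoints 35215 (End_r_1) and 45716 (St_r_2); the two query_coord values, 501 and 565, are the ends of breakpoint_query=501-565.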
An example of the additional information in ref_additional.gff:
##gff-version 3
##sequence-region ref_1 1 153626
ref_1 NucDiff_v2.0 SO:0000001 35151 35215 . . . ID=Region_1;Name=Relocation_overlap_region;overlap_len=65;color=#00A123
ref_1 NucDiff_v2.0 SO:0000001 45716 45780 . . . ID=Region_2;Name=Relocation_overlap_region;overlap_len=65;color=#00A123
ref_1 NucDiff_v2.0 SO:0000001 81827 81976 . . . ID=Region_3;Name=Relocation_overlap_region;overlap_len=150;color=#00A123
ref_1 NucDiff_v2.0 SO:0000001 92477 92626 . . . ID=Region_4;Name=Relocation_overlap_region;overlap_len=150;color=#00A123
The ref_additional.gff file contains the following information (see Figure 1 for notations):
GFF3 fields | Content for Relocation block 1 | Content for Relocation block 2 | Notes |
---|---|---|---|
col 1 | Ref_seq | Ref_seq | |
col 2 | NucDiff_v2.0 | NucDiff_v2.0 | name and current version of the tool |
col 3 | SO:0000001 | SO:0000001 | Sequence Ontology accession number corresponding to the "region" SO term |
col 4 | Rep_st | St_r_2 | |
col 5 | End_r_1 | Rep_end | |
col 6/col 7/col 8 | . | . | score/strand/phase fields are not used |
col 9, ID | "Region_1" | "Region_2" | |
col 9, Name | "Relocation_overlap_region" | "Relocation_overlap_region" | |
col 9, overlap_len | Length(R) | Length(R) | |
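In all four example records, each repeated region has length overlap_len and is anchored at the breakpoint: the block 1 region ends at End_r_1 and the block 2 region starts at St_r_2, so Rep_st = End_r_1 - Length(R) + 1 and Rep_end = St_r_2 + Length(R) - 1. The Python sketch below reproduces the ref_additional.gff records from those values. It is inferred from the examples for the Figure 1 layout only, does not handle the reverse-complement case of Figure 2, and the function names and the first_id parameter are hypothetical.

```python
def repeated_regions(end_r_1: int, st_r_2: int, overlap_len: int):
    """Return the two repeated reference intervals (1-based, inclusive).

    Inferred from the ref_additional.gff examples (Figure 1 layout only):
    the block 1 region ends at End_r_1, the block 2 region starts at St_r_2,
    and both span Length(R) = overlap_len bases.
    """
    rep_st = end_r_1 - overlap_len + 1
    rep_end = st_r_2 + overlap_len - 1
    return (rep_st, end_r_1), (st_r_2, rep_end)


def additional_records(ref_seq: str, end_r_1: int, st_r_2: int,
                       overlap_len: int, first_id: int) -> list:
    """Format the two regions as tab-separated ref_additional.gff-style lines."""
    rows = []
    regions = repeated_regions(end_r_1, st_r_2, overlap_len)
    for offset, (start, end) in enumerate(regions):
        attrs = (f"ID=Region_{first_id + offset};Name=Relocation_overlap_region;"
                 f"overlap_len={overlap_len};color=#00A123")
        rows.append("\t".join([ref_seq, "NucDiff_v2.0", "SO:0000001",
                               str(start), str(end), ".", ".", ".", attrs]))
    return rows


# SV_1 from the ref_struct.gff example: End_r_1=35215, St_r_2=45716, Length(R)=65.
for row in additional_records("ref_1", 35215, 45716, 65, first_id=1):
    print(row)
```

Run for SV_1, this gives the Region_1 and Region_2 intervals 35151-35215 and 45716-45780 listed in the example.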