Relocations with overlap

kseniakh edited this page Mar 10, 2017 · 1 revision

Relocation - a group of intra-chromosomal structural rearrangements of different types that occur when two regions located in different parts of the same reference sequence are placed next to each other in the same query sequence.

Relocation with overlap - a relocation with a partial overlap between the two query fragments.
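The definition above can be illustrated with a minimal sketch that computes the overlap of two query fragments given as 1-based inclusive intervals. The function name `query_overlap` is hypothetical and is not part of NucDiff; the coordinates are taken from the query_4 entry in the query_struct.gff example on this page.

```python
def query_overlap(blk_1, blk_2):
    """Return (start, end, length) of the overlap of two 1-based inclusive
    query intervals, or None if they do not overlap.
    Hypothetical helper for illustration, not a NucDiff function."""
    start = max(blk_1[0], blk_2[0])
    end = min(blk_1[1], blk_2[1])
    if start > end:
        return None
    return start, end, end - start + 1


# blk_1_query=1-565 and blk_2_query=501-1065 from the query_4 example entry.
print(query_overlap((1, 565), (501, 1065)))  # (501, 565, 65)
```

The result matches columns 4 and 5 (501, 565) and overlap_len=65 of that entry.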



Figure 1: Relocation with overlap example. The reference coordinates End_r_1 and St_r_2, corresponding to the breakpoint ends End_q_1 and St_q_2, respectively, coincide with the reference relocated block ends Rel_end_r_1 and Rel_st_r_2.



Figure 2: Relocation with overlap example. The reference coordinates End_r_1 and St_r_2, corresponding to the breakpoint ends End_q_1 and St_q_2, respectively, do not coincide with the reference relocated block ends Rel_end_r_1 and Rel_st_r_2. A', B' and R' are reverse complements of A, B and R, respectively.



A relocation with overlap is reported in the query_struct.gff and ref_struct.gff files. Information about the repeated reference regions that correspond to the overlapping query region is output in ref_additional.gff. Information about the relocated blocks is also output in the ref_blocks.gff and query_blocks.gff files; descriptions and examples of these two files can be found on their wiki pages.

An example of relocation with overlap entries in query_struct.gff:

##gff-version 3
##sequence-region	query_4	1	1065
query_4	NucDiff_v2.0	SO:0001874	501	565	.	.	.	ID=SV_1;Name=relocation-overlap;overlap_len=65;ref_sequence=ref_1;blk_1_query=1-565;blk_1_ref=34651-35215;blk_1_query_len=565;blk_1_ref_len=565;blk_1_st_query=1;blk_1_st_ref=34651;blk_1_end_query=565;blk_1_end_ref=35215;blk_2_query=501-1065;blk_2_ref=45716-46280;blk_2_query_len=565;blk_2_ref_len=565;blk_2_st_query=501;blk_2_st_ref=45716;blk_2_end_query=1065;blk_2_end_ref=46280;color=#990099
##sequence-region	query_8	1	1150
query_8	NucDiff_v2.0	SO:0001874	501	650	.	.	.	ID=SV_2;Name=relocation-overlap;overlap_len=150;ref_sequence=ref_1;blk_1_query=1-650;blk_1_ref=81327-81976;blk_1_query_len=650;blk_1_ref_len=650;blk_1_st_query=1;blk_1_st_ref=81327;blk_1_end_query=650;blk_1_end_ref=81976;blk_2_query=501-1150;blk_2_ref=92477-93126;blk_2_query_len=650;blk_2_ref_len=650;blk_2_st_query=501;blk_2_st_ref=92477;blk_2_end_query=1150;blk_2_end_ref=93126;color=#990099



The query_struct.gff file contains the following information (see Figure 1 for notations):

| GFF3 fields | Content | Notes |
| --- | --- | --- |
| col 1 | Query_seq | |
| col 2 | NucDiff_v2.0 | name and current version of the tool |
| col 3 | SO:0001874 | Sequence Ontology accession number corresponding to the "intrachromosomal_breakpoint" SO term |
| col 4 | St_q_2 | |
| col 5 | End_q_1 | |
| col 6/col 7/col 8 | . | score/strand/phase fields are not used |
| col 9, ID | "SV_1" | ID in query_struct.gff is related to ID in ref_struct.gff |
| col 9, Name | "relocation-overlap" | |
| col 9, overlap_len | Length(R) | |
| col 9, ref_sequence | Ref_seq | |
| col 9, blk_1_query | St_q_1 - End_q_1 | |
| col 9, blk_1_ref | Rel_st_r_1 - Rel_end_r_1 | |
| col 9, blk_1_query_len | Length(A) | |
| col 9, blk_1_ref_len | Length(A*) | |
| col 9, blk_1_st_query | St_q_1 | |
| col 9, blk_1_st_ref | St_r_1 | |
| col 9, blk_1_end_query | End_q_1 | |
| col 9, blk_1_end_ref | End_r_1 | |
| col 9, blk_2_query | St_q_2 - End_q_2 | |
| col 9, blk_2_ref | Rel_st_r_2 - Rel_end_r_2 | |
| col 9, blk_2_query_len | Length(B) | |
| col 9, blk_2_ref_len | Length(B*) | |
| col 9, blk_2_st_query | St_q_2 | |
| col 9, blk_2_st_ref | St_r_2 | |
| col 9, blk_2_end_query | End_q_2 | |
| col 9, blk_2_end_ref | End_r_2 | |
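As a rough sketch of how the fields listed above could be read, the following Python splits one query_struct.gff feature line into its columns and column 9 attributes, then checks that columns 4 and 5 equal St_q_2 and End_q_1. The function names are hypothetical and not part of NucDiff. The relation overlap_len = col 5 - col 4 + 1 is an assumption inferred from the example entries, not something the table states.

```python
def parse_gff3_line(line):
    """Split one GFF3 feature line into typed columns plus a dict of column 9 attributes."""
    cols = line.rstrip("\n").split("\t")
    if len(cols) != 9:
        raise ValueError(f"expected 9 tab-separated columns, got {len(cols)}")
    attrs = dict(item.split("=", 1) for item in cols[8].split(";") if item)
    return {"seqid": cols[0], "start": int(cols[3]), "end": int(cols[4]), "attrs": attrs}


def is_consistent(rec):
    """Check one relocation-overlap record against the relations in the table."""
    a = rec["attrs"]
    return (
        rec["start"] == int(a["blk_2_st_query"])      # col 4 is St_q_2
        and rec["end"] == int(a["blk_1_end_query"])   # col 5 is End_q_1
        # Assumption inferred from the examples: overlap length spans col 4..col 5.
        and int(a["overlap_len"]) == rec["end"] - rec["start"] + 1
    )


# Abridged copy of the SV_1 example entry (some attributes omitted for brevity).
line = "\t".join([
    "query_4", "NucDiff_v2.0", "SO:0001874", "501", "565", ".", ".", ".",
    "ID=SV_1;Name=relocation-overlap;overlap_len=65;ref_sequence=ref_1;"
    "blk_1_end_query=565;blk_2_st_query=501",
])
record = parse_gff3_line(line)
print(record["attrs"]["ID"], is_consistent(record))
```

The parser does no URL-decoding of attribute values, which is sufficient for the plain values shown in the examples.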



An example of relocation with overlap entries in ref_struct.gff:

##gff-version 3
##sequence-region	ref_1	1	153626
ref_1	NucDiff_v2.0	SO:0001874	35215	35215	.	.	.	ID=SV_1.1;Name=relocation-overlap;overlap_len=65;query_sequence=query_4;query_coord=565;breakpoint_query=501-565;blk_query=1-565;blk_ref=34651-35215;blk_query_len=565;blk_ref_len=565;color=#990099
ref_1	NucDiff_v2.0	SO:0001874	45716	45716	.	.	.	ID=SV_1.2;Name=relocation-overlap;overlap_len=65;query_sequence=query_4;query_coord=501;breakpoint_query=501-565;blk_query=501-1065;blk_ref=45716-46280;blk_query_len=565;blk_ref_len=565;color=#990099
ref_1	NucDiff_v2.0	SO:0001874	81976	81976	.	.	.	ID=SV_2.1;Name=relocation-overlap;overlap_len=150;query_sequence=query_8;query_coord=650;breakpoint_query=501-650;blk_query=1-650;blk_ref=81327-81976;blk_query_len=650;blk_ref_len=650;color=#990099
ref_1	NucDiff_v2.0	SO:0001874	92477	92477	.	.	.	ID=SV_2.2;Name=relocation-overlap;overlap_len=150;query_sequence=query_8;query_coord=501;breakpoint_query=501-650;blk_query=501-1150;blk_ref=92477-93126;blk_query_len=650;blk_ref_len=650;color=#990099



The ref_struct.gff file contains the following information (see Figure 1 for notations):

| GFF3 fields | Content for Relocation block 1 | Content for Relocation block 2 | Notes |
| --- | --- | --- | --- |
| col 1 | Ref_seq | Ref_seq | |
| col 2 | NucDiff_v2.0 | NucDiff_v2.0 | name and current version of the tool |
| col 3 | SO:0001874 | SO:0001874 | Sequence Ontology accession number corresponding to the "intrachromosomal_breakpoint" SO term |
| col 4 | End_r_1 | St_r_2 | |
| col 5 | End_r_1 | St_r_2 | |
| col 6/col 7/col 8 | . | . | score/strand/phase fields are not used |
| col 9, ID | "SV_1.1" | "SV_1.2" | ID in ref_struct.gff is related to ID in query_struct.gff |
| col 9, Name | "relocation-overlap" | "relocation-overlap" | |
| col 9, overlap_len | Length(R) | Length(R) | |
| col 9, query_sequence | Query_seq | Query_seq | |
| col 9, query_coord | End_q_1 | St_q_2 | |
| col 9, breakpoint_query | St_q_2 - End_q_1 | St_q_2 - End_q_1 | |
| col 9, blk_query | St_q_1 - End_q_1 | St_q_2 - End_q_2 | |
| col 9, blk_ref | Rel_st_r_1 - Rel_end_r_1 | Rel_st_r_2 - Rel_end_r_2 | |
| col 9, blk_query_len | Length(A) | Length(B) | |
| col 9, blk_ref_len | Length(A*) | Length(B*) | |
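The ID relation noted in the table can be sketched as follows. In the examples, each query_struct.gff ID (SV_1) appears in ref_struct.gff with a ".1" or ".2" suffix for the two relocation blocks. That this suffix pattern always holds is an assumption based on the examples only, and `group_by_query_id` is a hypothetical helper, not a NucDiff function.

```python
from collections import defaultdict


def group_by_query_id(ref_ids):
    """Group ref_struct.gff IDs such as 'SV_1.1' under the presumed
    query_struct.gff ID 'SV_1' (suffix convention assumed from the examples)."""
    groups = defaultdict(list)
    for ref_id in ref_ids:
        parent, sep, _ = ref_id.rpartition(".")
        if not sep:
            raise ValueError(f"ID without a block suffix: {ref_id}")
        groups[parent].append(ref_id)
    return dict(groups)


# IDs taken from the ref_struct.gff example entries.
print(group_by_query_id(["SV_1.1", "SV_1.2", "SV_2.1", "SV_2.2"]))
```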



An example of the additional information in ref_additional.gff:

##gff-version 3
##sequence-region	ref_1	1	153626
ref_1	NucDiff_v2.0	SO:0000001	35151	35215	.	.	.	ID=Region_1;Name=Relocation_overlap_region;overlap_len=65;color=#00A123
ref_1	NucDiff_v2.0	SO:0000001	45716	45780	.	.	.	ID=Region_2;Name=Relocation_overlap_region;overlap_len=65;color=#00A123
ref_1	NucDiff_v2.0	SO:0000001	81827	81976	.	.	.	ID=Region_3;Name=Relocation_overlap_region;overlap_len=150;color=#00A123
ref_1	NucDiff_v2.0	SO:0000001	92477	92626	.	.	.	ID=Region_4;Name=Relocation_overlap_region;overlap_len=150;color=#00A123



A ref_additional.gff file contains the following information (see Figure 1 for notations):

| GFF3 fields | Content for Relocation block 1 | Content for Relocation block 2 | Notes |
| --- | --- | --- | --- |
| col 1 | Ref_seq | Ref_seq | |
| col 2 | NucDiff_v2.0 | NucDiff_v2.0 | name and current version of the tool |
| col 3 | SO:0000001 | SO:0000001 | Sequence Ontology accession number corresponding to the "region" SO term |
| col 4 | Rep_st | St_r_2 | |
| col 5 | End_r_1 | Rep_end | |
| col 6/col 7/col 8 | . | . | score/strand/phase fields are not used |
| col 9, ID | "Region_1" | "Region_2" | |
| col 9, Name | "Relocation_overlap_region" | "Relocation_overlap_region" | |
| col 9, overlap_len | Length(R) | Length(R) | |
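The coordinates in the ref_additional.gff example can be reproduced from the ref_struct.gff breakpoints and overlap_len with the short sketch below. The formulas Rep_st = End_r_1 - Length(R) + 1 and Rep_end = St_r_2 + Length(R) - 1 are an assumption inferred from the example rows. They are not stated in the table and may not hold for every case, for instance the reverse-complement situation in Figure 2. `overlap_regions` is a hypothetical helper, not a NucDiff function.

```python
def overlap_regions(end_r_1, st_r_2, overlap_len):
    """Return the two reference regions (1-based, inclusive) presumed to
    correspond to the query overlap. Formulas are inferred from the examples."""
    region_1 = (end_r_1 - overlap_len + 1, end_r_1)   # Rep_st .. End_r_1
    region_2 = (st_r_2, st_r_2 + overlap_len - 1)     # St_r_2 .. Rep_end
    return region_1, region_2


# Breakpoints of SV_1.1 and SV_1.2 in the ref_struct.gff example, overlap_len=65.
print(overlap_regions(35215, 45716, 65))  # ((35151, 35215), (45716, 45780))
```

The printed intervals match the Region_1 and Region_2 rows of the example.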