Relocations with insertion

kseniakh edited this page Mar 10, 2017 · 1 revision

Relocation - a group of intra-chromosomal structural rearrangements of different types that occur when two regions located in different parts of the same reference sequence are placed next to each other in the same query sequence.

Relocation with insertion - a relocation in which the two query fragments are separated by an inserted stretch of bases (not N's) that does not map anywhere on the reference genome. The inserted region is treated as a simple insertion difference.



Figure 1: Relocation with insertion example. The reference coordinates End_r_1 and St_r_2, corresponding to the end of the query relocated block A and to the start of the query relocated block B (End_q_1 and St_q_2, respectively), coincide with the end of the reference relocated block A* and with the start of the reference relocated block B* (Rel_end_r_1 and Rel_st_r_2).



Figure 2: Relocation with insertion example. The reference coordinates End_r_1 and St_r_2, corresponding to the end of the query relocated block A and to the start of the query relocated block B (End_q_1 and St_q_2, respectively), do not coincide with the end of the reference relocated block A* and with the start of the reference relocated block B* (Rel_end_r_1 and Rel_st_r_2).



A relocation with insertion difference is reported in the query_struct.gff and ref_struct.gff files. Information about the relocated blocks is also written to the ref_blocks.gff and query_blocks.gff files. Descriptions and examples of those two files can be found on their wiki pages.



Example relocation with insertion entries in query_struct.gff:

##gff-version 3
##sequence-region	query_3	1	1050
query_3	NucDiff_v2.0	SO:0001874	501	550	.	.	.	ID=SV_1;Name=relocation-insertion;ins_len=50;ref_sequence=ref_1;blk_1_query=1-500;blk_1_ref=23001-23500;blk_1_query_len=500;blk_1_ref_len=500;blk_1_st_query=1;blk_1_st_ref=23001;blk_1_end_query=500;blk_1_end_ref=23500;blk_2_query=551-1050;blk_2_ref=34001-34500;blk_2_query_len=500;blk_2_ref_len=500;blk_2_st_query=551;blk_2_st_ref=34001;blk_2_end_query=1050;blk_2_end_ref=34500;color=#990099
##sequence-region	query_12	1	1350
query_12	NucDiff_v2.0	SO:0001874	501	850	.	.	.	ID=SV_2;Name=relocation-insertion;ins_len=350;ref_sequence=ref_1;blk_1_query=1-500;blk_1_ref=126501-127000;blk_1_query_len=500;blk_1_ref_len=500;blk_1_st_query=1;blk_1_st_ref=126501;blk_1_end_query=500;blk_1_end_ref=127000;blk_2_query=851-1350;blk_2_ref=137501-138000;blk_2_query_len=500;blk_2_ref_len=500;blk_2_st_query=851;blk_2_st_ref=137501;blk_2_end_query=1350;blk_2_end_ref=138000;color=#990099
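
The attribute column (col 9) of the entries above follows the GFF3 `key=value;key=value` layout, so it can be split into a dictionary with a few lines of Python. The following is a minimal sketch and not part of NucDiff: `parse_gff3_line` is a hypothetical helper, and the sample line is the first entry above with most attributes dropped for brevity. In both example entries `ins_len` equals col 5 - col 4 + 1, which the sketch checks.

```python
def parse_gff3_line(line):
    """Split one GFF3 feature line into its fixed columns plus an attribute dict."""
    cols = line.rstrip("\n").split("\t")
    if len(cols) != 9:
        raise ValueError(f"expected 9 tab-separated columns, got {len(cols)}")
    # col 9 is a ';'-separated list of key=value pairs
    attrs = dict(pair.split("=", 1) for pair in cols[8].split(";") if pair)
    return {
        "seqid": cols[0],
        "source": cols[1],
        "type": cols[2],
        "start": int(cols[3]),
        "end": int(cols[4]),
        "attributes": attrs,
    }


# Shortened copy of the SV_1 entry (only the first four attributes are kept).
line = "\t".join([
    "query_3", "NucDiff_v2.0", "SO:0001874", "501", "550", ".", ".", ".",
    "ID=SV_1;Name=relocation-insertion;ins_len=50;ref_sequence=ref_1",
])

rec = parse_gff3_line(line)
ins_len = int(rec["attributes"]["ins_len"])
span = rec["end"] - rec["start"] + 1  # inclusive, 1-based coordinates
print(rec["attributes"]["ID"], ins_len, span)
```

Lines starting with `#` (such as `##gff-version 3`) are directives rather than features and would need to be skipped before calling the helper.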



The query_struct.gff file contains the following information (see Figure 1 for notation):

| GFF3 fields | Content | Notes |
|---|---|---|
| col 1 | Query_seq | |
| col 2 | NucDiff_v2.0 | name and current version of the tool |
| col 3 | SO:0001874 | Sequence Ontology accession number corresponding to the "intrachromosomal_breakpoint" SO term |
| col 4 | Ins_st | |
| col 5 | Ins_end | |
| col 6/col 7/col 8 | . | score/strand/phase fields are not used |
| col 9, ID | "SV_1" | ID in query_struct.gff is related to ID in ref_struct.gff |
| col 9, Name | "relocation-insertion" | |
| col 9, ins_len | Length(Ins) | |
| col 9, ref_sequence | Ref_seq | |
| col 9, blk_1_query | St_q_1 - End_q_1 | |
| col 9, blk_1_ref | Rel_st_r_1 - Rel_end_r_1 | |
| col 9, blk_1_query_len | Length(A) | |
| col 9, blk_1_ref_len | Length(A*) | |
| col 9, blk_1_st_query | St_q_1 | |
| col 9, blk_1_st_ref | St_r_1 | |
| col 9, blk_1_end_query | End_q_1 | |
| col 9, blk_1_end_ref | End_r_1 | |
| col 9, blk_2_query | St_q_2 - End_q_2 | |
| col 9, blk_2_ref | Rel_st_r_2 - Rel_end_r_2 | |
| col 9, blk_2_query_len | Length(B) | |
| col 9, blk_2_ref_len | Length(B*) | |
| col 9, blk_2_st_query | St_q_2 | |
| col 9, blk_2_st_ref | St_r_2 | |
| col 9, blk_2_end_query | End_q_2 | |
| col 9, blk_2_end_ref | End_r_2 | |
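
The fields listed above are enough to tell the Figure 1 case from the Figure 2 case: in Figure 1, End_r_1 (`blk_1_end_ref`) equals Rel_end_r_1 (the end of `blk_1_ref`) and St_r_2 (`blk_2_st_ref`) equals Rel_st_r_2 (the start of `blk_2_ref`). The sketch below is an illustration only, assuming the `start-end` range layout seen in the example entries. The helper names are hypothetical and not part of NucDiff.

```python
def parse_attributes(col9):
    """Turn a GFF3 attribute string into a dict."""
    return dict(pair.split("=", 1) for pair in col9.split(";") if pair)


def parse_range(text):
    """Parse a 'start-end' range such as '23001-23500' into two ints."""
    start, end = text.split("-")
    return int(start), int(end)


def breakpoints_coincide(attrs):
    """True when End_r_1 == Rel_end_r_1 and St_r_2 == Rel_st_r_2 (Figure 1 case)."""
    _, rel_end_r_1 = parse_range(attrs["blk_1_ref"])
    rel_st_r_2, _ = parse_range(attrs["blk_2_ref"])
    return (int(attrs["blk_1_end_ref"]) == rel_end_r_1
            and int(attrs["blk_2_st_ref"]) == rel_st_r_2)


# Subset of the attributes of the SV_1 entry in the query_struct.gff example.
attrs = parse_attributes(
    "ID=SV_1;Name=relocation-insertion;ins_len=50;ref_sequence=ref_1;"
    "blk_1_ref=23001-23500;blk_1_end_ref=23500;"
    "blk_2_ref=34001-34500;blk_2_st_ref=34001"
)
print(breakpoints_coincide(attrs))  # True
```

For SV_1 both pairs of coordinates match, so the entry corresponds to the situation drawn in Figure 1.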



Example relocation with insertion entries in ref_struct.gff:

##gff-version 3
##sequence-region	ref_1	1	149500
ref_1	NucDiff_v2.0	SO:0001874	23500	23500	.	.	.	ID=SV_1.1;Name=relocation-insertion;ins_len=50;query_sequence=query_3;query_coord=500;breakpoint_query=501-550;blk_query=1-500;blk_ref=23001-23500;blk_query_len=500;blk_ref_len=500;color=#990099
ref_1	NucDiff_v2.0	SO:0001874	34001	34001	.	.	.	ID=SV_1.2;Name=relocation-insertion;ins_len=50;query_sequence=query_3;query_coord=551;breakpoint_query=501-550;blk_query=551-1050;blk_ref=34001-34500;blk_query_len=500;blk_ref_len=500;color=#990099
ref_1	NucDiff_v2.0	SO:0001874	127000	127000	.	.	.	ID=SV_2.1;Name=relocation-insertion;ins_len=350;query_sequence=query_12;query_coord=500;breakpoint_query=501-850;blk_query=1-500;blk_ref=126501-127000;blk_query_len=500;blk_ref_len=500;color=#990099
ref_1	NucDiff_v2.0	SO:0001874	137501	137501	.	.	.	ID=SV_2.2;Name=relocation-insertion;ins_len=350;query_sequence=query_12;query_coord=851;breakpoint_query=501-850;blk_query=851-1350;blk_ref=137501-138000;blk_query_len=500;blk_ref_len=500;color=#990099



The ref_struct.gff file contains the following information (see Figure 1 for notation):

| GFF3 fields | Content for Relocation block 1 | Content for Relocation block 2 | Notes |
|---|---|---|---|
| col 1 | Ref_seq | Ref_seq | |
| col 2 | NucDiff_v2.0 | NucDiff_v2.0 | name and current version of the tool |
| col 3 | SO:0001874 | SO:0001874 | Sequence Ontology accession number corresponding to the "intrachromosomal_breakpoint" SO term |
| col 4 | End_r_1 | St_r_2 | |
| col 5 | End_r_1 | St_r_2 | |
| col 6/col 7/col 8 | . | . | score/strand/phase fields are not used |
| col 9, ID | "SV_1.1" | "SV_1.2" | ID in ref_struct.gff is related to ID in query_struct.gff |
| col 9, Name | "relocation-insertion" | "relocation-insertion" | |
| col 9, ins_len | Length(Ins) | Length(Ins) | |
| col 9, query_sequence | Query_seq | Query_seq | |
| col 9, query_coord | End_q_1 | St_q_2 | a query_coord base corresponds to the reference base from col 4 |
| col 9, breakpoint_query | Ins_st - Ins_end | Ins_st - Ins_end | |
| col 9, blk_query | St_q_1 - End_q_1 | St_q_2 - End_q_2 | |
| col 9, blk_ref | Rel_st_r_1 - Rel_end_r_1 | Rel_st_r_2 - Rel_end_r_2 | |
| col 9, blk_query_len | Length(A) | Length(B) | |
| col 9, blk_ref_len | Length(A*) | Length(B*) | |
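
Because each relocation with insertion produces two entries in ref_struct.gff, it can be convenient to group them under the matching query_struct.gff ID. The sketch below assumes, based only on the examples on this page, that the ref_struct.gff ID is the query_struct.gff ID followed by `.1` or `.2`. `parent_id` and `group_ref_entries` are hypothetical helpers, and the sample lines are shortened copies of the SV_1 entries.

```python
from collections import defaultdict


def parent_id(ref_id):
    """Assumed convention from the examples: 'SV_1.2' -> 'SV_1'."""
    return ref_id.rsplit(".", 1)[0]


def group_ref_entries(lines):
    """Group ref_struct.gff feature lines by their query_struct.gff ID.

    Each group holds (ref ID, reference coordinate from col 4, query_coord).
    """
    groups = defaultdict(list)
    for line in lines:
        if not line.strip() or line.startswith("#"):
            continue  # skip blank lines and GFF3 directives
        cols = line.rstrip("\n").split("\t")
        attrs = dict(p.split("=", 1) for p in cols[8].split(";") if p)
        groups[parent_id(attrs["ID"])].append(
            (attrs["ID"], int(cols[3]), int(attrs["query_coord"]))
        )
    return dict(groups)


# Shortened copies of the two SV_1 entries from the ref_struct.gff example.
sample = [
    "##gff-version 3",
    "\t".join(["ref_1", "NucDiff_v2.0", "SO:0001874", "23500", "23500",
               ".", ".", ".", "ID=SV_1.1;query_sequence=query_3;query_coord=500"]),
    "\t".join(["ref_1", "NucDiff_v2.0", "SO:0001874", "34001", "34001",
               ".", ".", ".", "ID=SV_1.2;query_sequence=query_3;query_coord=551"]),
]

groups = group_ref_entries(sample)
print(groups["SV_1"])
```

Each tuple pairs a reference breakpoint (col 4) with the query base it corresponds to, as described in the note for `query_coord`.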