Relocations with insertion and inserted gap

kseniakh edited this page Mar 10, 2017 · 1 revision

Relocation - a group of intra-chromosomal structural rearrangements that occur when two regions located in different parts of the same reference sequence are placed next to each other in the same query sequence.

Relocation with insertion and inserted gap - a relocation in which the two query fragments are separated by an inserted stretch of bases (both ATGC's and N's) that is not mapped anywhere on the reference genome. The inserted region is treated as a simple insertion and inserted gap difference.



Figure 1: Relocation with insertion and inserted gap example. The reference coordinates End_r_1 and St_r_2, corresponding to the end of the query relocated block A and to the start of the query relocated block B (End_q_1 and St_q_2, respectively), coincide with the end of the reference relocated block A* and with the start of the reference relocated block B* (Rel_end_r_1 and Rel_st_r_2).



Figure 2: Relocation with insertion and inserted gap example. The reference coordinates End_r_1 and St_r_2, corresponding to the end of the query relocated block A and to the start of the query relocated block B (End_q_1 and St_q_2, respectively), do not coincide with the end of the reference relocated block A* and with the start of the reference relocated block B* (Rel_end_r_1 and Rel_st_r_2).



A relocation with insertion and inserted gap difference is reported in the query_struct.gff and ref_struct.gff files. Information about the relocated blocks is also reported in the ref_blocks.gff and query_blocks.gff files. Descriptions and examples of the latter two files can be found on their wiki pages.



Example relocation with insertion and inserted gap entries in query_struct.gff:

##gff-version 3
##sequence-region	query_4	1	1065
query_4	NucDiff_v2.0	SO:0001874	501	565	.	.	.	ID=SV_1;Name=relocation-insertion_ATGCN;ins_len=65;ref_sequence=ref_1;blk_1_query=1-500;blk_1_ref=34501-35000;blk_1_query_len=500;blk_1_ref_len=500;blk_1_st_query=1;blk_1_st_ref=34501;blk_1_end_query=500;blk_1_end_ref=35000;blk_2_query=566-1065;blk_2_ref=45501-46000;blk_2_query_len=500;blk_2_ref_len=500;blk_2_st_query=566;blk_2_st_ref=45501;blk_2_end_query=1065;blk_2_end_ref=46000;color=#990099
##sequence-region	query_7	1	1100
query_7	NucDiff_v2.0	SO:0001874	501	600	.	.	.	ID=SV_2;Name=relocation-insertion_ATGCN;ins_len=100;ref_sequence=ref_1;blk_1_query=1-500;blk_1_ref=69001-69500;blk_1_query_len=500;blk_1_ref_len=500;blk_1_st_query=1;blk_1_st_ref=69001;blk_1_end_query=500;blk_1_end_ref=69500;blk_2_query=601-1100;blk_2_ref=80001-80500;blk_2_query_len=500;blk_2_ref_len=500;blk_2_st_query=601;blk_2_st_ref=80001;blk_2_end_query=1100;blk_2_end_ref=80500;color=#990099



The query_struct.gff file contains the following information (see Figure 1 for notation):

| GFF3 fields | Content | Notes |
|---|---|---|
| col 1 | Query_seq | |
| col 2 | NucDiff_v2.0 | name and current version of the tool |
| col 3 | SO:0001874 | Sequence Ontology accession number corresponding to the "intrachromosomal_breakpoint" SO term |
| col 4 | Ins_st | |
| col 5 | Ins_end | |
| col 6/col 7/col 8 | . | score/strand/phase fields are not used |
| col 9, ID | "SV_1" | ID in query_struct.gff is related to ID in ref_struct.gff |
| col 9, Name | "relocation-insertion_ATGCN" | |
| col 9, ins_len | Length(Insertion_mixed) | |
| col 9, ref_sequence | Ref_seq | |
| col 9, blk_1_query | St_q_1 - End_q_1 | |
| col 9, blk_1_ref | Rel_st_r_1 - Rel_end_r_1 | |
| col 9, blk_1_query_len | Length(A) | |
| col 9, blk_1_ref_len | Length(A*) | |
| col 9, blk_1_st_query | St_q_1 | |
| col 9, blk_1_st_ref | St_r_1 | |
| col 9, blk_1_end_query | End_q_1 | |
| col 9, blk_1_end_ref | End_r_1 | |
| col 9, blk_2_query | St_q_2 - End_q_2 | |
| col 9, blk_2_ref | Rel_st_r_2 - Rel_end_r_2 | |
| col 9, blk_2_query_len | Length(B) | |
| col 9, blk_2_ref_len | Length(B*) | |
| col 9, blk_2_st_query | St_q_2 | |
| col 9, blk_2_st_ref | St_r_2 | |
| col 9, blk_2_end_query | End_q_2 | |
| col 9, blk_2_end_ref | End_r_2 | |
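The field layout above can be read with a few lines of Python. The following is a minimal sketch, not part of NucDiff: the function names `parse_gff3_record` and `check_insertion` are hypothetical, and the consistency relations it tests (cols 4-5 span exactly `ins_len` bases, and the insertion directly adjoins the two query blocks) are assumptions inferred from the two example records rather than a documented guarantee. The attribute string is abbreviated to the keys the sketch uses.

```python
def parse_gff3_record(line):
    """Split one tab-separated GFF3 record into typed fields plus an attribute dict."""
    cols = line.rstrip("\n").split("\t")
    if len(cols) != 9:
        raise ValueError(f"expected 9 tab-separated columns, got {len(cols)}")
    # Column 9 is a ';'-separated list of key=value pairs.
    attrs = dict(pair.split("=", 1) for pair in cols[8].split(";") if pair)
    return {
        "seqid": cols[0],
        "start": int(cols[3]),  # Ins_st
        "end": int(cols[4]),    # Ins_end
        "attrs": attrs,
    }


def check_insertion(rec):
    """Return True if cols 4-5 agree with ins_len and the neighbouring block ends.

    These relations are assumptions drawn from the example records.
    """
    a = rec["attrs"]
    return (
        rec["end"] - rec["start"] + 1 == int(a["ins_len"])
        and int(a["blk_1_end_query"]) + 1 == rec["start"]
        and int(a["blk_2_st_query"]) - 1 == rec["end"]
    )


# First record of the query_struct.gff example, rebuilt with explicit tabs
# and with column 9 shortened to the attributes used here.
example = "\t".join([
    "query_4", "NucDiff_v2.0", "SO:0001874", "501", "565", ".", ".", ".",
    "ID=SV_1;Name=relocation-insertion_ATGCN;ins_len=65;ref_sequence=ref_1;"
    "blk_1_query=1-500;blk_1_ref=34501-35000;blk_1_end_query=500;"
    "blk_2_query=566-1065;blk_2_ref=45501-46000;blk_2_st_query=566;color=#990099",
])

rec = parse_gff3_record(example)
print(rec["seqid"], rec["start"], rec["end"], check_insertion(rec))
```

Splitting each pair on the first `=` only keeps values such as `color=#990099` intact even if a value were to contain another `=`.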



Example relocation with insertion and inserted gap entries in ref_struct.gff:

##gff-version 3
ref_1	NucDiff_v2.0	SO:0001874	35000	35000	.	.	.	ID=SV_1.1;Name=relocation-insertion_ATGCN;ins_len=65;query_sequence=query_4;query_coord=500;breakpoint_query=501-565;blk_query=1-500;blk_ref=34501-35000;blk_query_len=500;blk_ref_len=500;color=#990099
ref_1	NucDiff_v2.0	SO:0001874	45501	45501	.	.	.	ID=SV_1.2;Name=relocation-insertion_ATGCN;ins_len=65;query_sequence=query_4;query_coord=566;breakpoint_query=501-565;blk_query=566-1065;blk_ref=45501-46000;blk_query_len=500;blk_ref_len=500;color=#990099
ref_1	NucDiff_v2.0	SO:0001874	69500	69500	.	.	.	ID=SV_2.1;Name=relocation-insertion_ATGCN;ins_len=100;query_sequence=query_7;query_coord=500;breakpoint_query=501-600;blk_query=1-500;blk_ref=69001-69500;blk_query_len=500;blk_ref_len=500;color=#990099
ref_1	NucDiff_v2.0	SO:0001874	80001	80001	.	.	.	ID=SV_2.2;Name=relocation-insertion_ATGCN;ins_len=100;query_sequence=query_7;query_coord=601;breakpoint_query=501-600;blk_query=601-1100;blk_ref=80001-80500;blk_query_len=500;blk_ref_len=500;color=#990099



The ref_struct.gff file contains the following information (see Figure 1 for notation):

| GFF3 fields | Content for Relocation block 1 | Content for Relocation block 2 | Notes |
|---|---|---|---|
| col 1 | Ref_seq | Ref_seq | |
| col 2 | NucDiff_v2.0 | NucDiff_v2.0 | name and current version of the tool |
| col 3 | SO:0001874 | SO:0001874 | Sequence Ontology accession number corresponding to the "intrachromosomal_breakpoint" SO term |
| col 4 | End_r_1 | St_r_2 | |
| col 5 | End_r_1 | St_r_2 | |
| col 6/col 7/col 8 | . | . | score/strand/phase fields are not used |
| col 9, ID | "SV_1.1" | "SV_1.2" | ID in ref_struct.gff is related to ID in query_struct.gff |
| col 9, Name | "relocation-insertion_ATGCN" | "relocation-insertion_ATGCN" | |
| col 9, ins_len | Length(Insertion_mixed) | Length(Insertion_mixed) | |
| col 9, query_sequence | Query_seq | Query_seq | |
| col 9, query_coord | End_q_1 | St_q_2 | a query_coord base corresponds to the reference base from col 4 |
| col 9, breakpoint_query | Ins_st - Ins_end | Ins_st - Ins_end | |
| col 9, blk_query | St_q_1 - End_q_1 | St_q_2 - End_q_2 | |
| col 9, blk_ref | Rel_st_r_1 - Rel_end_r_1 | Rel_st_r_2 - Rel_end_r_2 | |
| col 9, blk_query_len | Length(A) | Length(B) | |
| col 9, blk_ref_len | Length(A*) | Length(B*) | |
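Because each relocation appears as two single-base records in ref_struct.gff, it can be convenient to pair them up again. The sketch below is illustrative only: `pair_breakpoints` is a hypothetical helper, and it assumes, based on the example IDs, that the part before the last "." in a ref_struct.gff ID ("SV_1.1", "SV_1.2") equals the corresponding query_struct.gff ID ("SV_1"). Column 9 is shortened to the attributes used here.

```python
from collections import defaultdict


def pair_breakpoints(lines):
    """Group ref_struct.gff records by the ID prefix before the last '.'.

    Assumes IDs of the form '<query_struct ID>.<block number>', as in the example.
    """
    pairs = defaultdict(dict)
    for line in lines:
        if not line.strip() or line.startswith("#"):
            continue  # skip blank lines and GFF3 directives/comments
        cols = line.rstrip("\n").split("\t")
        attrs = dict(pair.split("=", 1) for pair in cols[8].split(";") if pair)
        parent, _, part = attrs["ID"].rpartition(".")
        pairs[parent][part] = {
            "ref_pos": int(cols[3]),                     # End_r_1 or St_r_2
            "query_coord": int(attrs["query_coord"]),    # End_q_1 or St_q_2
        }
    return dict(pairs)


# The two SV_1 records of the ref_struct.gff example, rebuilt with explicit tabs.
ref_lines = [
    "##gff-version 3",
    "\t".join(["ref_1", "NucDiff_v2.0", "SO:0001874", "35000", "35000", ".", ".", ".",
               "ID=SV_1.1;Name=relocation-insertion_ATGCN;ins_len=65;"
               "query_sequence=query_4;query_coord=500;breakpoint_query=501-565"]),
    "\t".join(["ref_1", "NucDiff_v2.0", "SO:0001874", "45501", "45501", ".", ".", ".",
               "ID=SV_1.2;Name=relocation-insertion_ATGCN;ins_len=65;"
               "query_sequence=query_4;query_coord=566;breakpoint_query=501-565"]),
]

pairs = pair_breakpoints(ref_lines)
for sv_id, parts in pairs.items():
    left, right = parts["1"], parts["2"]
    # Reference distance between the two breakpoints of one relocation.
    print(sv_id, left["ref_pos"], right["ref_pos"], right["ref_pos"] - left["ref_pos"])
```

Keying the result by the shared ID prefix means the same key can be used to look up the matching record parsed from query_struct.gff.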