
Translocations with insertion and inserted gap

kseniakh edited this page Mar 10, 2017 · 1 revision

Translocation - a group of inter-chromosomal structural rearrangements that occur when two regions located on different reference sequences are placed next to each other in the same query sequence.

Translocation with insertion and inserted gap - a translocation in which the two query fragments are separated by an inserted stretch of bases (containing both ATGC's and N's) that is not mapped anywhere on the reference genome. The inserted region is treated as simple insertion and inserted gap differences.



Figure 1: Translocation with insertion and inserted gap example. The reference coordinates End_r_1 and St_r_2, corresponding to the end of the query translocated block A and to the start of the query translocated block B (End_q_1 and St_q_2, respectively), coincide with the end of the reference translocated block A* and with the start of the reference translocated block B* (Trl_end_r_1 and Trl_st_r_2).



Figure 2: Translocation with insertion and inserted gap example. The reference coordinates End_r_1 and St_r_2, corresponding to the end of the query translocated block A and to the start of the query translocated block B (End_q_1 and St_q_2, respectively), do not coincide with the end of the reference translocated block A* and with the start of the reference translocated block B* (Trl_end_r_1 and Trl_st_r_2).



A translocation with insertion and inserted gap difference is reported in the query_struct.gff and ref_struct.gff files. Information about the translocated blocks is also written to the ref_blocks.gff and query_blocks.gff files. Descriptions and examples of these two block files can be found on their wiki pages.



An example of translocation with insertion and inserted gap entries in query_struct.gff:

##gff-version 3
##sequence-region	query_1	1	1005
query_1	NucDiff_v2.0	SO:0001873	501	505	.	.	.	ID=SV_1;Name=translocation-insertion_ATGCN;ins_len=5;ref_sequence_1=ref_1;blk_1_query=1-500;blk_1_ref=501-1000;blk_1_query_len=500;blk_1_ref_len=500;blk_1_st_query=1;blk_1_st_ref=501;blk_1_end_query=500;blk_1_end_ref=1000;ref_sequence_2=ref_2;blk_2_query=506-1005;blk_2_ref=501-1000;blk_2_query_len=500;blk_2_ref_len=500;blk_2_st_query=506;blk_2_st_ref=501;blk_2_end_query=1005;blk_2_end_ref=1000;color=#A0A0A0
##sequence-region	query_2	1	1020
query_2	NucDiff_v2.0	SO:0001873	501	520	.	.	.	ID=SV_2;Name=translocation-insertion_ATGCN;ins_len=20;ref_sequence_1=ref_1;blk_1_query=1-500;blk_1_ref=2001-2500;blk_1_query_len=500;blk_1_ref_len=500;blk_1_st_query=1;blk_1_st_ref=2001;blk_1_end_query=500;blk_1_end_ref=2500;ref_sequence_2=ref_2;blk_2_query=521-1020;blk_2_ref=2001-2500;blk_2_query_len=500;blk_2_ref_len=500;blk_2_st_query=521;blk_2_st_ref=2001;blk_2_end_query=1020;blk_2_end_ref=2500;color=#A0A0A0
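The entries above are ordinary tab-separated GFF3 feature lines, so they can be read with a few lines of Python. The following is a minimal sketch, not part of NucDiff: the function name `parse_gff3_feature` and the returned dictionary layout are hypothetical, and the consistency check at the end (the col 4 to col 5 span equals `ins_len`) is only an observation about the two example entries shown here, not a documented guarantee.

```python
def parse_gff3_feature(line):
    """Split one tab-separated GFF3 feature line into its nine columns."""
    cols = line.rstrip("\n").split("\t")
    if len(cols) != 9:
        raise ValueError(f"expected 9 tab-separated columns, got {len(cols)}")
    seqid, source, so_type, start, end, score, strand, phase, attrs = cols
    # Column 9 is a ';'-separated list of key=value pairs.
    attributes = dict(pair.split("=", 1) for pair in attrs.split(";") if pair)
    return {
        "seqid": seqid,
        "source": source,
        "type": so_type,
        "start": int(start),
        "end": int(end),
        "attributes": attributes,
    }


# First feature line of the query_struct.gff example, rebuilt with explicit tabs.
example_line = "\t".join([
    "query_1", "NucDiff_v2.0", "SO:0001873", "501", "505", ".", ".", ".",
    "ID=SV_1;Name=translocation-insertion_ATGCN;ins_len=5;"
    "ref_sequence_1=ref_1;blk_1_query=1-500;blk_1_ref=501-1000;"
    "blk_1_query_len=500;blk_1_ref_len=500;blk_1_st_query=1;"
    "blk_1_st_ref=501;blk_1_end_query=500;blk_1_end_ref=1000;"
    "ref_sequence_2=ref_2;blk_2_query=506-1005;blk_2_ref=501-1000;"
    "blk_2_query_len=500;blk_2_ref_len=500;blk_2_st_query=506;"
    "blk_2_st_ref=501;blk_2_end_query=1005;blk_2_end_ref=1000;"
    "color=#A0A0A0",
])

record = parse_gff3_feature(example_line)
ins_len = int(record["attributes"]["ins_len"])
span = record["end"] - record["start"] + 1  # GFF3 coordinates are 1-based, inclusive

print(record["attributes"]["ID"], ins_len, span)  # SV_1 5 5
```

Splitting each attribute on the first `=` only keeps values such as `color=#A0A0A0` intact even if a value were to contain another `=` sign.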



The query_struct.gff file contains the following information (see Figure 1 for notations):

| GFF3 fields | Content | Notes |
|---|---|---|
| col 1 | Query_seq | |
| col 2 | NucDiff_v2.0 | name and current version of the tool |
| col 3 | SO:0001873 | Sequence Ontology accession number corresponding to the "interchromosomal_breakpoint" SO term |
| col 4 | End_q_1 | |
| col 5 | St_q_2 | |
| col 6/col 7/col 8 | . | score/strand/phase fields are not used |
| col 9, ID | "SV_1" | ID in query_struct.gff is related to ID in ref_struct.gff |
| col 9, Name | "translocation-insertion_ATGCN" | |
| col 9, ins_len | Length(Insertion_mixed) | |
| col 9, ref_sequence_1 | Ref_seq_1 | |
| col 9, blk_1_query | St_q_1 - End_q_1 | |
| col 9, blk_1_ref | Trl_st_r_1 - Trl_end_r_1 | |
| col 9, blk_1_query_len | Length(A) | |
| col 9, blk_1_ref_len | Length(A*) | |
| col 9, blk_1_st_query | St_q_1 | |
| col 9, blk_1_st_ref | St_r_1 | |
| col 9, blk_1_end_query | End_q_1 | |
| col 9, blk_1_end_ref | End_r_1 | |
| col 9, ref_sequence_2 | Ref_seq_2 | |
| col 9, blk_2_query | St_q_2 - End_q_2 | |
| col 9, blk_2_ref | Trl_st_r_2 - Trl_end_r_2 | |
| col 9, blk_2_query_len | Length(B) | |
| col 9, blk_2_ref_len | Length(B*) | |
| col 9, blk_2_st_query | St_q_2 | |
| col 9, blk_2_st_ref | St_r_2 | |
| col 9, blk_2_end_query | End_q_2 | |
| col 9, blk_2_end_ref | End_r_2 | |
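The attributes listed above carry enough information to tell the Figure 1 situation from the Figure 2 situation: `blk_1_ref` and `blk_2_ref` give the translocated block boundaries (Trl_*), while `blk_1_end_ref` and `blk_2_st_ref` give End_r_1 and St_r_2. The sketch below compares them. The helper names are hypothetical, and it assumes that range values are always written as `start-end` with the smaller coordinate first, as in the examples on this page; that assumption may not hold for every NucDiff output.

```python
def parse_range(text):
    """Turn a 'start-end' string such as '501-1000' into two integers."""
    start, end = text.split("-")
    return int(start), int(end)


def junction_matches_block_ends(attrs):
    """Return True when End_r_1 == Trl_end_r_1 and St_r_2 == Trl_st_r_2 (Figure 1 case)."""
    _, trl_end_r_1 = parse_range(attrs["blk_1_ref"])
    trl_st_r_2, _ = parse_range(attrs["blk_2_ref"])
    return (
        int(attrs["blk_1_end_ref"]) == trl_end_r_1
        and int(attrs["blk_2_st_ref"]) == trl_st_r_2
    )


# Attribute values copied from the SV_1 entry of the query_struct.gff example.
sv_1 = {
    "blk_1_ref": "501-1000", "blk_1_end_ref": "1000",
    "blk_2_ref": "501-1000", "blk_2_st_ref": "501",
}

# Made-up values for illustration only (not NucDiff output): End_r_1 lies
# before the end of the reference translocated block, as in Figure 2.
hypothetical = {
    "blk_1_ref": "501-1000", "blk_1_end_ref": "990",
    "blk_2_ref": "501-1000", "blk_2_st_ref": "501",
}

print(junction_matches_block_ends(sv_1))          # True
print(junction_matches_block_ends(hypothetical))  # False
```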



An example of translocation with insertion and inserted gap entries in ref_struct.gff:

##gff-version 3
##sequence-region	ref_1	1	19500
ref_1	NucDiff_v2.0	SO:0001873	1000	1000	.	.	.	ID=SV_1.1;Name=translocation-insertion_ATGCN;ins_len=5;query_sequence=query_1;query_coord=500;breakpoint_query=501-505;blk_query=1-500;blk_ref=501-1000;blk_query_len=500;blk_ref_len=500;color=#A0A0A0
ref_1	NucDiff_v2.0	SO:0001873	2500	2500	.	.	.	ID=SV_2.1;Name=translocation-insertion_ATGCN;ins_len=20;query_sequence=query_2;query_coord=500;breakpoint_query=501-520;blk_query=1-500;blk_ref=2001-2500;blk_query_len=500;blk_ref_len=500;color=#A0A0A0
##sequence-region	ref_2	1	19500
ref_2	NucDiff_v2.0	SO:0001873	501	501	.	.	.	ID=SV_1.2;Name=translocation-insertion_ATGCN;ins_len=5;query_sequence=query_1;query_coord=506;breakpoint_query=501-505;blk_query=506-1005;blk_ref=501-1000;blk_query_len=500;blk_ref_len=500;color=#A0A0A0
ref_2	NucDiff_v2.0	SO:0001873	2001	2001	.	.	.	ID=SV_2.2;Name=translocation-insertion_ATGCN;ins_len=20;query_sequence=query_2;query_coord=521;breakpoint_query=501-520;blk_query=521-1020;blk_ref=2001-2500;blk_query_len=500;blk_ref_len=500;color=#A0A0A0



The ref_struct.gff file contains the following information (see Figure 1 for notations):

| GFF3 fields | Content for Translocation block 1 | Content for Translocation block 2 | Notes |
|---|---|---|---|
| col 1 | Ref_seq_1 | Ref_seq_2 | |
| col 2 | NucDiff_v2.0 | NucDiff_v2.0 | name and current version of the tool |
| col 3 | SO:0001873 | SO:0001873 | Sequence Ontology accession number corresponding to the "interchromosomal_breakpoint" SO term |
| col 4 | End_r_1 | St_r_2 | |
| col 5 | End_r_1 | St_r_2 | |
| col 6/col 7/col 8 | . | . | score/strand/phase fields are not used |
| col 9, ID | "SV_1.1" | "SV_1.2" | ID in ref_struct.gff is related to ID in query_struct.gff |
| col 9, Name | "translocation-insertion_ATGCN" | "translocation-insertion_ATGCN" | |
| col 9, ins_len | Length(Insertion_mixed) | Length(Insertion_mixed) | |
| col 9, query_sequence | Query_seq | Query_seq | |
| col 9, query_coord | End_q_1 | St_q_2 | a query_coord base corresponds to the reference base from col 4 |
| col 9, breakpoint_query | End_q_1 - St_q_2 | End_q_1 - St_q_2 | |
| col 9, blk_query | St_q_1 - End_q_1 | St_q_2 - End_q_2 | |
| col 9, blk_ref | Trl_st_r_1 - Trl_end_r_1 | Trl_st_r_2 - Trl_end_r_2 | |
| col 9, blk_query_len | Length(A) | Length(B) | |
| col 9, blk_ref_len | Length(A*) | Length(B*) | |
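Because each translocation produces one query_struct.gff entry and two ref_struct.gff entries, it can be convenient to regroup the reference-side entries by event. The sketch below does this from the IDs alone. It assumes, based only on the examples on this page, that a ref_struct.gff ID is the query_struct.gff ID followed by `.1` or `.2` for the block number; the function name `query_struct_id` is hypothetical.

```python
from collections import defaultdict


def query_struct_id(ref_struct_id):
    """Split an ID such as 'SV_1.2' into ('SV_1', 2).

    Assumption: the suffix after the last '.' is the translocation block number.
    """
    parent, _, block = ref_struct_id.rpartition(".")
    return parent, int(block)


# (col 1, col 4, ID) taken from the four ref_struct.gff example entries.
ref_entries = [
    ("ref_1", 1000, "SV_1.1"),
    ("ref_1", 2500, "SV_2.1"),
    ("ref_2", 501, "SV_1.2"),
    ("ref_2", 2001, "SV_2.2"),
]

# Map each query-side ID to its two reference-side breakpoints, keyed by block number.
by_event = defaultdict(dict)
for seqid, position, ref_id in ref_entries:
    parent, block = query_struct_id(ref_id)
    by_event[parent][block] = (seqid, position)

print(by_event["SV_1"])  # {1: ('ref_1', 1000), 2: ('ref_2', 501)}
```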