Translocations with insertion and inserted gap
Translocation - a group of different types of inter-chromosomal structural rearrangements that occur when two regions located on different reference sequences are placed next to each other in the same query sequence.
Translocation with insertion and inserted gap - a translocation in which the two query fragments are separated by an inserted stretch of bases (containing both ATGC bases and N's) that is not mapped anywhere in the reference genome. The inserted region is treated as a combination of simple insertion and inserted gap differences.
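The split of the inserted region into simple insertion and inserted gap parts can be illustrated with a minimal Python sketch. It assumes the inserted stretch is available as a plain string and that its first base sits at query position `start`. The function `split_inserted_region` and the labels it returns are hypothetical, not NucDiff code or output, and treating every non-N base (including other IUPAC codes) as an ordinary insertion is a simplification.

```python
from itertools import groupby


def split_inserted_region(seq: str, start: int) -> list[tuple[str, int, int]]:
    """Split an inserted query stretch into runs of N's and runs of other bases.

    `start` is the 1-based query coordinate of the first inserted base; each run
    is returned as (kind, first, last) with inclusive coordinates. The labels
    "insertion" and "inserted_gap" are illustrative, not NucDiff identifiers.
    """
    runs = []
    pos = start
    # groupby yields consecutive runs sharing the same key (N vs. not N)
    for is_gap, group in groupby(seq.upper(), key=lambda base: base == "N"):
        length = sum(1 for _ in group)
        kind = "inserted_gap" if is_gap else "insertion"
        runs.append((kind, pos, pos + length - 1))
        pos += length
    return runs


# Hypothetical 5-base inserted stretch occupying query positions 501-505
print(split_inserted_region("ACNNG", 501))
# [('insertion', 501, 502), ('inserted_gap', 503, 504), ('insertion', 505, 505)]
```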
Figure 1: Translocation with insertion and inserted gap example. The reference coordinates End_r_1 and St_r_2, corresponding to the end of the query translocated block A and to the start of the query translocated block B (End_q_1 and St_q_2, respectively), coincide with the end of the reference translocated block A* and with the start of the reference translocated block B* (Trl_end_r_1 and Trl_st_r_2).
Figure 2: Translocation with insertion and inserted gap example. The reference coordinates End_r_1 and St_r_2, corresponding to the end of the query translocated block A and to the start of the query translocated block B (End_q_1 and St_q_2, respectively), do not coincide with the end of the reference translocated block A* and with the start of the reference translocated block B* (Trl_end_r_1 and Trl_st_r_2).
Translocations with insertion and inserted gap are reported in the query_struct.gff and ref_struct.gff files. Information about the translocated blocks is also reported in the ref_blocks.gff and query_blocks.gff files; descriptions and examples of these two files can be found on their respective wiki pages.
An example of translocation with insertion and inserted gap entries in query_struct.gff:
```
##gff-version 3
##sequence-region query_1 1 1005
query_1 NucDiff_v2.0 SO:0001873 501 505 . . . ID=SV_1;Name=translocation-insertion_ATGCN;ins_len=5;ref_sequence_1=ref_1;blk_1_query=1-500;blk_1_ref=501-1000;blk_1_query_len=500;blk_1_ref_len=500;blk_1_st_query=1;blk_1_st_ref=501;blk_1_end_query=500;blk_1_end_ref=1000;ref_sequence_2=ref_2;blk_2_query=506-1005;blk_2_ref=501-1000;blk_2_query_len=500;blk_2_ref_len=500;blk_2_st_query=506;blk_2_st_ref=501;blk_2_end_query=1005;blk_2_end_ref=1000;color=#A0A0A0
##sequence-region query_2 1 1020
query_2 NucDiff_v2.0 SO:0001873 501 520 . . . ID=SV_2;Name=translocation-insertion_ATGCN;ins_len=20;ref_sequence_1=ref_1;blk_1_query=1-500;blk_1_ref=2001-2500;blk_1_query_len=500;blk_1_ref_len=500;blk_1_st_query=1;blk_1_st_ref=2001;blk_1_end_query=500;blk_1_end_ref=2500;ref_sequence_2=ref_2;blk_2_query=521-1020;blk_2_ref=2001-2500;blk_2_query_len=500;blk_2_ref_len=500;blk_2_st_query=521;blk_2_st_ref=2001;blk_2_end_query=1020;blk_2_end_ref=2500;color=#A0A0A0
```
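A minimal Python sketch for reading such entries: it parses one query_struct.gff line and checks, for the SV_1 example above, that `ins_len` equals the length of the col 4 to col 5 interval and that this interval lies directly between `blk_1_end_query` and `blk_2_st_query`. The helpers `parse_gff_line` and `check_query_entry` are hypothetical, and the adjacency check is an assumption drawn from the two example entries rather than a documented guarantee. The whitespace fallback exists only because the example on this page shows tabs as spaces; GFF3 columns are tab-separated.

```python
def parse_gff_line(line: str) -> tuple[list[str], dict[str, str]]:
    """Split a GFF3 feature line into its 9 columns plus a dict of col 9 attributes."""
    cols = line.rstrip("\n").split("\t")
    if len(cols) != 9:
        # Fallback for copies where tabs became spaces (as in the page example)
        cols = line.split(None, 8)
    attrs = dict(item.split("=", 1) for item in cols[8].strip().split(";") if item)
    return cols, attrs


def check_query_entry(line: str) -> dict:
    """Summarize one query_struct.gff translocation-with-insertion entry."""
    cols, attrs = parse_gff_line(line)
    ins_start, ins_end = int(cols[3]), int(cols[4])
    blk_1_end = int(attrs["blk_1_end_query"])
    blk_2_start = int(attrs["blk_2_st_query"])
    return {
        "query": cols[0],
        "inserted": (ins_start, ins_end),
        "ins_len_matches": int(attrs["ins_len"]) == ins_end - ins_start + 1,
        # Assumption from the examples: the insertion fills the gap between blocks
        "between_blocks": blk_1_end + 1 == ins_start and ins_end + 1 == blk_2_start,
        "ref_pair": (attrs["ref_sequence_1"], attrs["ref_sequence_2"]),
    }


# The SV_1 entry from the example above, with tab-separated columns
example = (
    "query_1\tNucDiff_v2.0\tSO:0001873\t501\t505\t.\t.\t.\t"
    "ID=SV_1;Name=translocation-insertion_ATGCN;ins_len=5;ref_sequence_1=ref_1;"
    "blk_1_query=1-500;blk_1_ref=501-1000;blk_1_query_len=500;blk_1_ref_len=500;"
    "blk_1_st_query=1;blk_1_st_ref=501;blk_1_end_query=500;blk_1_end_ref=1000;"
    "ref_sequence_2=ref_2;blk_2_query=506-1005;blk_2_ref=501-1000;"
    "blk_2_query_len=500;blk_2_ref_len=500;blk_2_st_query=506;blk_2_st_ref=501;"
    "blk_2_end_query=1005;blk_2_end_ref=1000;color=#A0A0A0"
)
print(check_query_entry(example))
```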
The query_struct.gff file contains the following information (see Figure 1 for notations):
GFF3 fields | Content | Notes |
---|---|---|
col 1 | Query_seq | |
col 2 | NucDiff_v2.0 | name and current version of the tool |
col 3 | SO:0001873 | Sequence Ontology accession number corresponding to the "interchromosomal_breakpoint" SO term |
col 4 | End_q_1 + 1 | first base of the inserted region |
col 5 | St_q_2 - 1 | last base of the inserted region |
col 6/col 7/col 8 | . | score/strand/phase fields are not used |
col 9, ID | "SV_1" | ID in query_struct.gff is related to ID in ref_struct.gff |
col 9, Name | "translocation-insertion_ATGCN" | |
col 9, ins_len | Length(Insertion_mixed) | |
col 9, ref_sequence_1 | Ref_seq_1 | |
col 9, blk_1_query | St_q_1 - End_q_1 | |
col 9, blk_1_ref | Trl_st_r_1 - Trl_end_r_1 | |
col 9, blk_1_query_len | Length(A) | |
col 9, blk_1_ref_len | Length(A*) | |
col 9, blk_1_st_query | St_q_1 | |
col 9, blk_1_st_ref | St_r_1 | |
col 9, blk_1_end_query | End_q_1 | |
col 9, blk_1_end_ref | End_r_1 | |
col 9, ref_sequence_2 | Ref_seq_2 | |
col 9, blk_2_query | St_q_2 - End_q_2 | |
col 9, blk_2_ref | Trl_st_r_2 - Trl_end_r_2 | |
col 9, blk_2_query_len | Length(B) | |
col 9, blk_2_ref_len | Length(B*) | |
col 9, blk_2_st_query | St_q_2 | |
col 9, blk_2_st_ref | St_r_2 | |
col 9, blk_2_end_query | End_q_2 | |
col 9, blk_2_end_ref | End_r_2 | |
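The difference between the layouts in Figure 1 and Figure 2 can be read directly from these attributes: in the Figure 1 layout, End_r_1 (`blk_1_end_ref`) equals the end of `blk_1_ref` (Trl_end_r_1), and St_r_2 (`blk_2_st_ref`) equals the start of `blk_2_ref` (Trl_st_r_2). The sketch below assumes the col 9 attributes have already been parsed into a dict of strings and that `blk_*_ref` values are written as `start-end`, as in the examples; the function names are hypothetical.

```python
def parse_range(text: str) -> tuple[int, int]:
    """Turn a 'start-end' attribute value such as '501-1000' into two ints."""
    start, end = text.split("-")
    return int(start), int(end)


def breakpoints_at_block_edges(attrs: dict[str, str]) -> bool:
    """True for the Figure 1 layout, False for the Figure 2 layout.

    Compares End_r_1 (blk_1_end_ref) with Trl_end_r_1 (end of blk_1_ref) and
    St_r_2 (blk_2_st_ref) with Trl_st_r_2 (start of blk_2_ref).
    """
    _, trl_end_r_1 = parse_range(attrs["blk_1_ref"])
    trl_st_r_2, _ = parse_range(attrs["blk_2_ref"])
    return (int(attrs["blk_1_end_ref"]) == trl_end_r_1
            and int(attrs["blk_2_st_ref"]) == trl_st_r_2)


# Attributes taken from the SV_1 example entry (only the ones used here)
sv_1 = {"blk_1_ref": "501-1000", "blk_1_end_ref": "1000",
        "blk_2_ref": "501-1000", "blk_2_st_ref": "501"}
print(breakpoints_at_block_edges(sv_1))                              # True
# Hypothetical Figure 2 case: block A ends at 950 on the reference, not at 1000
print(breakpoints_at_block_edges({**sv_1, "blk_1_end_ref": "950"}))  # False
```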
An example of translocation with insertion and inserted gap entries in ref_struct.gff:
```
##gff-version 3
##sequence-region ref_1 1 19500
ref_1 NucDiff_v2.0 SO:0001873 1000 1000 . . . ID=SV_1.1;Name=translocation-insertion_ATGCN;ins_len=5;query_sequence=query_1;query_coord=500;breakpoint_query=501-505;blk_query=1-500;blk_ref=501-1000;blk_query_len=500;blk_ref_len=500;color=#A0A0A0
ref_1 NucDiff_v2.0 SO:0001873 2500 2500 . . . ID=SV_2.1;Name=translocation-insertion_ATGCN;ins_len=20;query_sequence=query_2;query_coord=500;breakpoint_query=501-520;blk_query=1-500;blk_ref=2001-2500;blk_query_len=500;blk_ref_len=500;color=#A0A0A0
##sequence-region ref_2 1 19500
ref_2 NucDiff_v2.0 SO:0001873 501 501 . . . ID=SV_1.2;Name=translocation-insertion_ATGCN;ins_len=5;query_sequence=query_1;query_coord=506;breakpoint_query=501-505;blk_query=506-1005;blk_ref=501-1000;blk_query_len=500;blk_ref_len=500;color=#A0A0A0
ref_2 NucDiff_v2.0 SO:0001873 2001 2001 . . . ID=SV_2.2;Name=translocation-insertion_ATGCN;ins_len=20;query_sequence=query_2;query_coord=521;breakpoint_query=501-520;blk_query=521-1020;blk_ref=2001-2500;blk_query_len=500;blk_ref_len=500;color=#A0A0A0
```
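Each query_struct.gff entry (e.g. `SV_1`) appears twice in ref_struct.gff (`SV_1.1` on Ref_seq_1 and `SV_1.2` on Ref_seq_2), so the two reference-side entries can be regrouped by stripping the suffix after the last dot. A minimal sketch, assuming the `.1`/`.2` numbering seen in the examples always marks the block; `group_ref_entries` is a hypothetical helper, and the attribute lists below are shortened for brevity.

```python
from collections import defaultdict


def group_ref_entries(lines: list[str]) -> dict[str, list[tuple[int, str, int]]]:
    """Group ref_struct.gff entries by the query_struct.gff ID they derive from.

    Returns {parent_id: [(block_number, ref_seq, ref_coord), ...]} sorted by block.
    Assumes IDs of the form '<parent>.<block>', e.g. 'SV_1.2'.
    """
    groups = defaultdict(list)
    for line in lines:
        if not line.strip() or line.startswith("#"):
            continue  # skip directives such as ##gff-version and ##sequence-region
        cols = line.rstrip("\n").split("\t")
        if len(cols) != 9:
            cols = line.split(None, 8)  # tolerate spaces instead of tabs
        attrs = dict(item.split("=", 1) for item in cols[8].strip().split(";") if item)
        parent_id, _, block = attrs["ID"].rpartition(".")
        groups[parent_id].append((int(block), cols[0], int(cols[3])))
    return {pid: sorted(entries) for pid, entries in groups.items()}


# Shortened versions of the ref_struct.gff example lines above
ref_struct = [
    "##gff-version 3",
    "##sequence-region ref_1 1 19500",
    "ref_1\tNucDiff_v2.0\tSO:0001873\t1000\t1000\t.\t.\t.\tID=SV_1.1;query_sequence=query_1",
    "ref_1\tNucDiff_v2.0\tSO:0001873\t2500\t2500\t.\t.\t.\tID=SV_2.1;query_sequence=query_2",
    "##sequence-region ref_2 1 19500",
    "ref_2\tNucDiff_v2.0\tSO:0001873\t501\t501\t.\t.\t.\tID=SV_1.2;query_sequence=query_1",
    "ref_2\tNucDiff_v2.0\tSO:0001873\t2001\t2001\t.\t.\t.\tID=SV_2.2;query_sequence=query_2",
]
print(group_ref_entries(ref_struct))
```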
The ref_struct.gff file contains the following information (see Figure 1 for notations):
GFF3 fields | Content for Translocation block 1 | Content for Translocation block 2 | Notes |
---|---|---|---|
col 1 | Ref_seq_1 | Ref_seq_2 | |
col 2 | NucDiff_v2.0 | NucDiff_v2.0 | name and current version of the tool |
col 3 | SO:0001873 | SO:0001873 | Sequence Ontology accession number corresponding to the "interchromosomal_breakpoint" SO term |
col 4 | End_r_1 | St_r_2 | |
col 5 | End_r_1 | St_r_2 | |
col 6/col 7/col 8 | . | . | score/strand/phase fields are not used |
col 9, ID | "SV_1.1" | "SV_1.2" | ID in ref_struct.gff is related to ID in query_struct.gff |
col 9, Name | "translocation-insertion_ATGCN" | "translocation-insertion_ATGCN" | |
col 9, ins_len | Length(Insertion_mixed) | Length(Insertion_mixed) | |
col 9, query_sequence | Query_seq | Query_seq | |
col 9, query_coord | End_q_1 | St_q_2 | the query base at query_coord corresponds to the reference base in col 4 |
col 9, breakpoint_query | (End_q_1 + 1) - (St_q_2 - 1) | (End_q_1 + 1) - (St_q_2 - 1) | the inserted region in the query |
col 9, blk_query | St_q_1 - End_q_1 | St_q_2 - End_q_2 | |
col 9, blk_ref | Trl_st_r_1 - Trl_end_r_1 | Trl_st_r_2 - Trl_end_r_2 | |
col 9, blk_query_len | Length(A) | Length(B) | |
col 9, blk_ref_len | Length(A*) | Length(B*) | |
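The per-block rules in this table can be checked mechanically. The hypothetical helper below works on an already-parsed attribute dict for one ref_struct.gff entry and verifies that col 4 and col 5 hold the same reference base, that `query_coord` is the last base of `blk_query` for block 1 and its first base for block 2, and that `breakpoint_query` spans `ins_len` bases, as it does in the examples above. It assumes the `.1`/`.2` ID suffix identifies the block.

```python
def check_ref_entry(attrs: dict[str, str], col4: int, col5: int) -> bool:
    """Check one ref_struct.gff entry against the per-block rules in the table.

    Assumes the ID suffix ('.1' or '.2') identifies the translocated block and
    that range attributes are written as 'start-end'.
    """
    block = attrs["ID"].rpartition(".")[2]
    blk_st, blk_end = map(int, attrs["blk_query"].split("-"))
    bp_st, bp_end = map(int, attrs["breakpoint_query"].split("-"))
    # Block 1 reports the last query base before the insertion (End_q_1),
    # block 2 the first query base after it (St_q_2).
    expected_coord = blk_end if block == "1" else blk_st
    return (
        col4 == col5  # a single reference base
        and int(attrs["query_coord"]) == expected_coord
        and bp_end - bp_st + 1 == int(attrs["ins_len"])  # as in the examples
    )


# Attributes from the SV_1.1 and SV_1.2 example entries (subset)
sv_1_1 = {"ID": "SV_1.1", "ins_len": "5", "query_coord": "500",
          "breakpoint_query": "501-505", "blk_query": "1-500"}
sv_1_2 = {"ID": "SV_1.2", "ins_len": "5", "query_coord": "506",
          "breakpoint_query": "501-505", "blk_query": "506-1005"}
print(check_ref_entry(sv_1_1, 1000, 1000), check_ref_entry(sv_1_2, 501, 501))
```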