Relocations with insertion and inserted gap
Relocation - a group of intra-chromosomal structural rearrangements that occur when two regions located in different parts of the same reference sequence are placed next to each other in the same query sequence.
Relocation with insertion and inserted gap - a relocation in which the two query fragments are separated by an inserted stretch of bases (both ATGC's and N's) that is not mapped anywhere on the reference genome. The inserted region is treated as a simple insertion and inserted gap difference.
Figure 1: Relocation with insertion and inserted gap example. The reference coordinates End_r_1 and St_r_2, corresponding to the end of the query relocated block A and to the start of the query relocated block B (End_q_1 and St_q_2, respectively), coincide with the end of the reference relocated block A* and with the start of the reference relocated block B* (Rel_end_r_1 and Rel_st_r_2).
Figure 2: Relocation with insertion and inserted gap example. The reference coordinates End_r_1 and St_r_2, corresponding to the end of the query relocated block A and to the start of the query relocated block B (End_q_1 and St_q_2, respectively), do not coincide with the end of the reference relocated block A* and with the start of the reference relocated block B* (Rel_end_r_1 and Rel_st_r_2).
A relocation with insertion and inserted gap difference is written to the query_struct.gff and ref_struct.gff files. Information about the relocated blocks is also written to the ref_blocks.gff and query_blocks.gff files. Descriptions and examples of the latter two files can be found on their wiki pages.
An example of relocation with insertion and inserted gap entries in query_struct.gff:
##gff-version 3
##sequence-region query_4 1 1065
query_4 NucDiff_v2.0 SO:0001874 501 565 . . . ID=SV_1;Name=relocation-insertion_ATGCN;ins_len=65;ref_sequence=ref_1;blk_1_query=1-500;blk_1_ref=34501-35000;blk_1_query_len=500;blk_1_ref_len=500;blk_1_st_query=1;blk_1_st_ref=34501;blk_1_end_query=500;blk_1_end_ref=35000;blk_2_query=566-1065;blk_2_ref=45501-46000;blk_2_query_len=500;blk_2_ref_len=500;blk_2_st_query=566;blk_2_st_ref=45501;blk_2_end_query=1065;blk_2_end_ref=46000;color=#990099
##sequence-region query_7 1 1100
query_7 NucDiff_v2.0 SO:0001874 501 600 . . . ID=SV_2;Name=relocation-insertion_ATGCN;ins_len=100;ref_sequence=ref_1;blk_1_query=1-500;blk_1_ref=69001-69500;blk_1_query_len=500;blk_1_ref_len=500;blk_1_st_query=1;blk_1_st_ref=69001;blk_1_end_query=500;blk_1_end_ref=69500;blk_2_query=601-1100;blk_2_ref=80001-80500;blk_2_query_len=500;blk_2_ref_len=500;blk_2_st_query=601;blk_2_st_ref=80001;blk_2_end_query=1100;blk_2_end_ref=80500;color=#990099
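
The records above can be read with a few lines of Python. The following is a minimal sketch, not part of NucDiff: the helper names `parse_attributes` and `parse_query_struct_line` are hypothetical, and it assumes the file is tab-separated as the GFF3 format requires. It splits column 9 into key/value pairs and checks that the span of columns 4-5 matches `ins_len`.

```python
def parse_attributes(column9):
    """Split a GFF3 attribute column ("key=value;key=value") into a dict."""
    attrs = {}
    for pair in column9.strip().split(";"):
        if pair:
            key, _, value = pair.partition("=")
            attrs[key] = value
    return attrs


def parse_query_struct_line(line):
    """Parse one tab-separated record from query_struct.gff (hypothetical helper)."""
    cols = line.rstrip("\n").split("\t")
    if len(cols) != 9:
        raise ValueError(f"expected 9 GFF3 columns, got {len(cols)}")
    return {
        "query_seq": cols[0],
        "ins_st": int(cols[3]),
        "ins_end": int(cols[4]),
        "attrs": parse_attributes(cols[8]),
    }


# Shortened copy of the SV_1 record above (only some attributes are kept).
record = "\t".join([
    "query_4", "NucDiff_v2.0", "SO:0001874", "501", "565", ".", ".", ".",
    "ID=SV_1;Name=relocation-insertion_ATGCN;ins_len=65;ref_sequence=ref_1;"
    "blk_1_query=1-500;blk_2_query=566-1065",
])

entry = parse_query_struct_line(record)
# Columns 4 and 5 are inclusive coordinates, so the span is end - start + 1.
span = entry["ins_end"] - entry["ins_st"] + 1
print(entry["attrs"]["ID"], span, int(entry["attrs"]["ins_len"]))
```

For the SV_1 record, the inserted region 501-565 spans 65 bases, which agrees with `ins_len=65`.
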
The query_struct.gff file contains the following information (see Figure 1 for notations):
GFF3 fields | Content | Notes |
---|---|---|
col 1 | Query_seq | |
col 2 | NucDiff_v2.0 | name and current version of the tool |
col 3 | SO:0001874 | Sequence Ontology accession number corresponding to the "intrachromosomal_breakpoint" SO term |
col 4 | Ins_st | |
col 5 | Ins_end | |
col 6/col 7/col 8 | . | score/strand/phase fields are not used |
col 9, ID | "SV_1" | ID in query_struct.gff is related to ID in ref_struct.gff (for example, SV_1 corresponds to SV_1.1 and SV_1.2) |
col 9, Name | "relocation-insertion_ATGCN" | |
col 9, ins_len | Length(Insertion_mixed) | |
col 9, ref_sequence | Ref_seq | |
col 9, blk_1_query | St_q_1 - End_q_1 | |
col 9, blk_1_ref | Rel_st_r_1 - Rel_end_r_1 | |
col 9, blk_1_query_len | Length(A) | |
col 9, blk_1_ref_len | Length(A*) | |
col 9, blk_1_st_query | St_q_1 | |
col 9, blk_1_st_ref | St_r_1 | |
col 9, blk_1_end_query | End_q_1 | |
col 9, blk_1_end_ref | End_r_1 | |
col 9, blk_2_query | St_q_2 - End_q_2 | |
col 9, blk_2_ref | Rel_st_r_2 - Rel_end_r_2 | |
col 9, blk_2_query_len | Length(B) | |
col 9, blk_2_ref_len | Length(B*) | |
col 9, blk_2_st_query | St_q_2 | |
col 9, blk_2_st_ref | St_r_2 | |
col 9, blk_2_end_query | End_q_2 | |
col 9, blk_2_end_ref | End_r_2 | |
An example of relocation with insertion and inserted gap entries in ref_struct.gff:
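
Going by the table above, the two situations shown in Figure 1 and Figure 2 can be told apart by comparing `blk_1_end_ref` (End_r_1) with the end of `blk_1_ref` (Rel_end_r_1), and `blk_2_st_ref` (St_r_2) with the start of `blk_2_ref` (Rel_st_r_2). The sketch below is an illustration under that reading of the table, not NucDiff code; `parse_range` and `breakpoints_coincide` are hypothetical helpers, and the second record uses made-up values purely to show the non-coinciding case.

```python
def parse_range(text):
    """Turn a "start-end" string such as "34501-35000" into two ints."""
    start, end = text.split("-")
    return int(start), int(end)


def breakpoints_coincide(attrs):
    """Return True when End_r_1 == Rel_end_r_1 and St_r_2 == Rel_st_r_2.

    `attrs` is a dict of column 9 attributes from query_struct.gff.
    """
    _, rel_end_r_1 = parse_range(attrs["blk_1_ref"])
    rel_st_r_2, _ = parse_range(attrs["blk_2_ref"])
    return (int(attrs["blk_1_end_ref"]) == rel_end_r_1
            and int(attrs["blk_2_st_ref"]) == rel_st_r_2)


# Values taken from the SV_1 record in the query_struct.gff example.
sv_1 = {
    "blk_1_ref": "34501-35000", "blk_1_end_ref": "35000",
    "blk_2_ref": "45501-46000", "blk_2_st_ref": "45501",
}

# Hypothetical values (not NucDiff output) where the coordinates differ.
made_up = {
    "blk_1_ref": "34501-35000", "blk_1_end_ref": "34990",
    "blk_2_ref": "45501-46000", "blk_2_st_ref": "45501",
}

print(breakpoints_coincide(sv_1))
print(breakpoints_coincide(made_up))
```

The SV_1 record from the example falls into the coinciding case.
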
##gff-version 3
ref_1 NucDiff_v2.0 SO:0001874 35000 35000 . . . ID=SV_1.1;Name=relocation-insertion_ATGCN;ins_len=65;query_sequence=query_4;query_coord=500;breakpoint_query=501-565;blk_query=1-500;blk_ref=34501-35000;blk_query_len=500;blk_ref_len=500;color=#990099
ref_1 NucDiff_v2.0 SO:0001874 45501 45501 . . . ID=SV_1.2;Name=relocation-insertion_ATGCN;ins_len=65;query_sequence=query_4;query_coord=566;breakpoint_query=501-565;blk_query=566-1065;blk_ref=45501-46000;blk_query_len=500;blk_ref_len=500;color=#990099
ref_1 NucDiff_v2.0 SO:0001874 69500 69500 . . . ID=SV_2.1;Name=relocation-insertion_ATGCN;ins_len=100;query_sequence=query_7;query_coord=500;breakpoint_query=501-600;blk_query=1-500;blk_ref=69001-69500;blk_query_len=500;blk_ref_len=500;color=#990099
ref_1 NucDiff_v2.0 SO:0001874 80001 80001 . . . ID=SV_2.2;Name=relocation-insertion_ATGCN;ins_len=100;query_sequence=query_7;query_coord=601;breakpoint_query=501-600;blk_query=601-1100;blk_ref=80001-80500;blk_query_len=500;blk_ref_len=500;color=#990099
The ref_struct.gff file contains the following information (see Figure 1 for notations):
GFF3 fields | Content for Relocation block 1 | Content for Relocation block 2 | Notes |
---|---|---|---|
col 1 | Ref_seq | Ref_seq | |
col 2 | NucDiff_v2.0 | NucDiff_v2.0 | name and current version of the tool |
col 3 | SO:0001874 | SO:0001874 | Sequence Ontology accession number corresponding to the "intrachromosomal_breakpoint" SO term |
col 4 | End_r_1 | St_r_2 | |
col 5 | End_r_1 | St_r_2 | |
col 6/col 7/col 8 | . | . | score/strand/phase fields are not used |
col 9, ID | "SV_1.1" | "SV_1.2" | ID in ref_struct.gff is related to ID in query_struct.gff (for example, SV_1.1 and SV_1.2 correspond to SV_1) |
col 9, Name | "relocation-insertion_ATGCN" | "relocation-insertion_ATGCN" | |
col 9, ins_len | Length(Insertion_mixed) | Length(Insertion_mixed) | |
col 9, query_sequence | Query_seq | Query_seq | |
col 9, query_coord | End_q_1 | St_q_2 | a query_coord base corresponds to the reference base from col 4 |
col 9, breakpoint_query | Ins_st - Ins_end | Ins_st - Ins_end | |
col 9, blk_query | St_q_1 - End_q_1 | St_q_2 - End_q_2 | |
col 9, blk_ref | Rel_st_r_1 - Rel_end_r_1 | Rel_st_r_2 - Rel_end_r_2 | |
col 9, blk_query_len | Length(A) | Length(B) | |
col 9, blk_ref_len | Length(A*) | Length(B*) | |
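
Since each relocation produces two entries in ref_struct.gff, it can be handy to group them under the ID used in query_struct.gff. The sketch below assumes, based only on the examples on this page, that the ref_struct.gff ID is the query_struct.gff ID followed by `.1` or `.2`; `parent_id` is a hypothetical helper, and the positions are the column 4 values from the ref_struct.gff example.

```python
from collections import defaultdict


def parent_id(ref_id):
    """Map a ref_struct.gff ID such as "SV_1.2" to "SV_1".

    Assumes the ".1"/".2" suffix convention seen in the examples;
    an ID without a dot is returned unchanged.
    """
    base, _, _ = ref_id.rpartition(".")
    return base if base else ref_id


# (ID, column 4) pairs copied from the ref_struct.gff example.
ref_entries = [
    ("SV_1.1", 35000), ("SV_1.2", 45501),
    ("SV_2.1", 69500), ("SV_2.2", 80001),
]

breakpoints = defaultdict(list)
for ref_id, position in ref_entries:
    breakpoints[parent_id(ref_id)].append(position)

for sv_id, positions in breakpoints.items():
    first, second = sorted(positions)
    # Number of reference bases lying strictly between the two breakpoints.
    between = second - first - 1
    print(sv_id, first, second, between)
```
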