
Circular genome start

kseniakh edited this page Mar 10, 2017 · 1 revision

A circular genome start is a special case that may appear in circular genomes when the start of the query sequence does not coincide with the start of the reference sequence, which causes the alignment to be fragmented. It is not treated as a difference, although it is included in the output.
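
The fragmentation can be illustrated with a small sketch. It assumes a query that is an exact, forward-strand copy of a stretch of the circular reference, and it is not NucDiff's own code; the function name `split_at_reference_start` and its parameters are hypothetical. The numbers mirror the query_struct.gff example on this page.

```python
def split_at_reference_start(ref_len, query_len, query_start_on_ref):
    """Return the (query_range, ref_range) blocks for a query placed on a
    circular reference.

    All coordinates are 1-based and inclusive. Assumes the query is an
    exact, forward-strand copy of the reference (no indels).
    """
    # Number of query bases that fit before the reference end.
    first_len = ref_len - query_start_on_ref + 1
    if first_len >= query_len:
        # The query does not cross the reference start: one block only.
        ref_end = query_start_on_ref + query_len - 1
        return [((1, query_len), (query_start_on_ref, ref_end))]
    # The query wraps around: one block up to the reference end,
    # and a second block restarting at reference position 1.
    block_1 = ((1, first_len), (query_start_on_ref, ref_len))
    block_2 = ((first_len + 1, query_len), (1, query_len - first_len))
    return [block_1, block_2]


blocks = split_at_reference_start(ref_len=12900, query_len=1920,
                                  query_start_on_ref=11941)
for query_range, ref_range in blocks:
    print(query_range, ref_range)
```

With these inputs the query splits into two blocks of 960 bases each, and the boundary between them is the position reported as the circular genome start breakpoint.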



Figure 1: Circular genome start breakpoint example.



A circular genome start breakpoint is output in the query_struct.gff and ref_struct.gff files. Information about the blocks before and after the breakpoint is also output in the ref_blocks.gff and query_blocks.gff files. Descriptions and examples of the last two files can be found on their wiki pages.



An example of a circular genome start breakpoint entry in query_struct.gff:

##gff-version 3
##sequence-region	query_1	1	1920
query_1	NucDiff_v2.0	SO:0001874	961	961	.	.	.	ID=SV_1;Name=circular_genome_start;ref_sequence=ref_1;blk_1_query=1-960;blk_1_ref=11941-12900;blk_1_query_len=960;blk_1_ref_len=960;blk_1_st_query=1;blk_1_st_ref=11941;blk_1_end_query=960;blk_1_end_ref=12900;blk_2_query=961-1920;blk_2_ref=1-960;blk_2_query_len=960;blk_2_ref_len=960;blk_2_st_query=961;blk_2_st_ref=1;blk_2_end_query=1920;blk_2_end_ref=960;color=#990099



The query_struct.gff file contains the following information (see Figure 1a for notations):

| GFF3 fields | Content | Notes |
| --- | --- | --- |
| col 1 | Query_seq | |
| col 2 | NucDiff_v2.0 | name and current version of the tool |
| col 3 | SO:0001874 | Sequence Ontology accession number corresponding to the "intrachromosomal_breakpoint" SO term |
| col 4 | End_q_1 | |
| col 5 | St_q_2 | |
| col 6/col 7/col 8 | . | score/strand/phase fields are not used |
| col 9, ID | "SV_1" | ID in query_struct.gff is related to ID in ref_struct.gff |
| col 9, Name | "circular_genome_start" | |
| col 9, ref_sequence | Ref_seq | |
| col 9, blk_1_query | St_q_1 - End_q_1 | |
| col 9, blk_1_ref | St_r_1 - Ref_end | |
| col 9, blk_1_query_len | Length(B) | |
| col 9, blk_1_ref_len | Length(B*) | |
| col 9, blk_1_st_query | St_q_1 | |
| col 9, blk_1_st_ref | St_r_1 | |
| col 9, blk_1_end_query | End_q_1 | |
| col 9, blk_1_end_ref | Ref_end | |
| col 9, blk_2_query | St_q_2 - End_q_2 | |
| col 9, blk_2_ref | Ref_st - End_r_2 | |
| col 9, blk_2_query_len | Length(A) | |
| col 9, blk_2_ref_len | Length(A*) | |
| col 9, blk_2_st_query | St_q_2 | |
| col 9, blk_2_st_ref | Ref_st | |
| col 9, blk_2_end_query | End_q_2 | |
| col 9, blk_2_end_ref | End_r_2 | |
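
As a rough illustration of how these fields might be read back, the sketch below splits a query_struct.gff line into its fixed columns and its col 9 attributes using only the Python standard library. The helper names `parse_gff3_line` and `parse_range` are hypothetical and not part of NucDiff, and the sample line is the entry from the example above, shortened to a few attributes.

```python
def parse_gff3_line(line):
    """Split one GFF3 feature line into its 8 fixed columns and a dict
    built from the col 9 attributes (key=value pairs joined by ';')."""
    cols = line.rstrip("\n").split("\t")
    if len(cols) != 9:
        raise ValueError("expected 9 tab-separated columns")
    attributes = dict(pair.split("=", 1) for pair in cols[8].split(";") if pair)
    return cols[:8], attributes


def parse_range(text):
    """Turn a 'start-end' string such as '961-1920' into two ints."""
    start, end = text.split("-")
    return int(start), int(end)


# The entry from the query_struct.gff example, shortened to a few attributes.
line = ("query_1\tNucDiff_v2.0\tSO:0001874\t961\t961\t.\t.\t.\t"
        "ID=SV_1;Name=circular_genome_start;ref_sequence=ref_1;"
        "blk_1_query=1-960;blk_1_ref=11941-12900;"
        "blk_2_query=961-1920;blk_2_ref=1-960")

fixed, attrs = parse_gff3_line(line)
if attrs["Name"] == "circular_genome_start":
    blk_1_ref = parse_range(attrs["blk_1_ref"])
    blk_2_ref = parse_range(attrs["blk_2_ref"])
    print(attrs["ID"], "block 1 ends at", blk_1_ref[1],
          "and block 2 restarts at", blk_2_ref[0])
```

Filtering on the Name attribute rather than on col 3 is a design choice made here on the assumption that the SO:0001874 accession may also be used for other breakpoint types.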



An example of circular genome start breakpoint entries in ref_struct.gff:

##gff-version 3
##sequence-region	ref_1	1	12900
ref_1	NucDiff_v2.0	SO:0001874	1	1	.	.	.	ID=SV_1.2;Name=circular_genome_start;query_sequence=query_1;query_coord=961;breakpoint_query=961-961;blk_query=961-1920;blk_ref=1-960;blk_query_len=960;blk_ref_len=960;color=#990099
ref_1	NucDiff_v2.0	SO:0001874	12900	12900	.	.	.	ID=SV_1.1;Name=circular_genome_start;query_sequence=query_1;query_coord=960;breakpoint_query=961-961;blk_query=1-960;blk_ref=11941-12900;blk_query_len=960;blk_ref_len=960;color=#990099



The ref_struct.gff file contains the following information (see Figure 1a for notations):

| GFF3 fields | Content for Relocation block 1 | Content for Relocation block 2 | Notes |
| --- | --- | --- | --- |
| col 1 | Ref_seq | Ref_seq | |
| col 2 | NucDiff_v2.0 | NucDiff_v2.0 | name and current version of the tool |
| col 3 | SO:0001874 | SO:0001874 | Sequence Ontology accession number corresponding to the "intrachromosomal_breakpoint" SO term |
| col 4 | Ref_end | Ref_st | |
| col 5 | Ref_end | Ref_st | |
| col 6/col 7/col 8 | . | . | score/strand/phase fields are not used |
| col 9, ID | "SV_1.1" | "SV_1.2" | ID in ref_struct.gff is related to ID in query_struct.gff |
| col 9, Name | "circular_genome_start" | "circular_genome_start" | |
| col 9, query_sequence | Query_seq | Query_seq | |
| col 9, query_coord | End_q_1 | St_q_2 | a query_coord base corresponds to the reference base from col 4 |
| col 9, breakpoint_query | End_q_1 - St_q_2 | End_q_1 - St_q_2 | |
| col 9, blk_query | St_q_1 - End_q_1 | St_q_2 - End_q_2 | |
| col 9, blk_ref | St_r_1 - Ref_end | Ref_st - End_r_2 | |
| col 9, blk_query_len | Length(B) | Length(A) | |
| col 9, blk_ref_len | Length(B*) | Length(A*) | |
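
The two ref_struct.gff entries can be tied back to the single query_struct.gff entry through their IDs. The sketch below is one possible way to do that. It assumes that the query-side ID is the part of the reference-side ID before the last dot ("SV_1.1" and "SV_1.2" both map to "SV_1"), which is inferred from the examples on this page rather than documented behaviour. The helper names are hypothetical, and the sample lines are shortened versions of the example entries.

```python
from collections import defaultdict


def attributes_of(line):
    """Return the col 9 attributes of a GFF3 feature line as a dict."""
    col9 = line.rstrip("\n").split("\t")[8]
    return dict(pair.split("=", 1) for pair in col9.split(";") if pair)


def group_by_query_id(ref_lines):
    """Group ref_struct.gff entries under a query-side ID.

    Assumption: 'SV_1.1' and 'SV_1.2' both belong to 'SV_1', i.e. the
    query-side ID is the part before the last dot.
    """
    groups = defaultdict(list)
    for line in ref_lines:
        if line.startswith("#") or not line.strip():
            continue  # skip GFF3 directives and blank lines
        cols = line.split("\t")
        attrs = attributes_of(line)
        query_id = attrs["ID"].rsplit(".", 1)[0]
        # Store (reference position from col 4, matching query_coord).
        groups[query_id].append((int(cols[3]), int(attrs["query_coord"])))
    return dict(groups)


ref_lines = [
    "##gff-version 3",
    "ref_1\tNucDiff_v2.0\tSO:0001874\t1\t1\t.\t.\t.\t"
    "ID=SV_1.2;Name=circular_genome_start;query_sequence=query_1;query_coord=961",
    "ref_1\tNucDiff_v2.0\tSO:0001874\t12900\t12900\t.\t.\t.\t"
    "ID=SV_1.1;Name=circular_genome_start;query_sequence=query_1;query_coord=960",
]

print(group_by_query_id(ref_lines))
```

Each tuple pairs a reference position from col 4 with the query base it corresponds to, following the note on query_coord in the table above.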