
Unaligned sequences

kseniakh edited this page Mar 10, 2017 · 1 revision


Unaligned sequence: a query sequence that has no match with the reference genome of length equal to or longer than a given number of bases.
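To make the definition concrete, here is a minimal sketch of the classification rule, assuming the lengths of all matches between one query sequence and the reference are already known. The function name `is_unaligned` and the parameter `min_match_len` are hypothetical and are not part of NucDiff; the sketch only restates the definition and says nothing about how NucDiff computes matches internally.

```python
from typing import Iterable


def is_unaligned(match_lengths: Iterable[int], min_match_len: int) -> bool:
    """Hypothetical helper: return True if no match between the query
    sequence and the reference is at least min_match_len bases long."""
    # all() over an empty iterable is True, so a query with no matches
    # at all is also classified as unaligned.
    return all(length < min_match_len for length in match_lengths)


# One match reaches the threshold, so the query is not unaligned.
print(is_unaligned([40, 120], 100))
# Every match is shorter than the threshold, so the query is unaligned.
print(is_unaligned([40, 60], 100))
# No matches at all: unaligned.
print(is_unaligned([], 100))
```

Note that a match exactly as long as the threshold counts as a match ("equal to or longer than"), which is why the comparison is a strict `<`.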



Unaligned sequences are reported in the query_struct.gff file.



An example of unaligned sequence entries in query_struct.gff:

##gff-version 3
##sequence-region	query_5	1	65
query_5	NucDiff_v2.0	SO:0000001	1	65	.	.	.	ID=SV_4;Name=unaligned_sequence;color=#990000
##sequence-region	query_6	1	85
query_6	NucDiff_v2.0	SO:0000001	1	85	.	.	.	ID=SV_5;Name=unaligned_sequence;color=#990000
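Entries like these can be pulled out of query_struct.gff with a few lines of standard-library Python. The sketch below assumes the file is tab-separated as GFF3 requires, skips `##` directive lines, and keeps only features whose `Name` attribute is `unaligned_sequence`. The function name `parse_unaligned` and the returned dictionary keys are illustrative choices, not a NucDiff API. Because GFF3 coordinates are 1-based and inclusive, the length is computed as `end - start + 1`.

```python
from typing import Dict, List

# The example entries from this page, tab-separated as in GFF3.
EXAMPLE = (
    "##gff-version 3\n"
    "##sequence-region\tquery_5\t1\t65\n"
    "query_5\tNucDiff_v2.0\tSO:0000001\t1\t65\t.\t.\t.\t"
    "ID=SV_4;Name=unaligned_sequence;color=#990000\n"
    "##sequence-region\tquery_6\t1\t85\n"
    "query_6\tNucDiff_v2.0\tSO:0000001\t1\t85\t.\t.\t.\t"
    "ID=SV_5;Name=unaligned_sequence;color=#990000\n"
)


def parse_unaligned(gff_text: str) -> List[Dict[str, object]]:
    """Hypothetical helper: collect unaligned_sequence features from GFF3 text."""
    records: List[Dict[str, object]] = []
    for line in gff_text.splitlines():
        # Skip blank lines and ## directives such as ##sequence-region.
        if not line or line.startswith("#"):
            continue
        cols = line.split("\t")
        if len(cols) != 9:
            continue  # not a well-formed GFF3 feature line
        # Column 9 holds key=value pairs separated by ';'.
        attrs = dict(f.split("=", 1) for f in cols[8].split(";") if f)
        if attrs.get("Name") != "unaligned_sequence":
            continue
        start, end = int(cols[3]), int(cols[4])
        records.append({
            "query": cols[0],
            "start": start,
            "end": end,
            "length": end - start + 1,  # 1-based, inclusive coordinates
            "id": attrs.get("ID"),
        })
    return records


for rec in parse_unaligned(EXAMPLE):
    print(rec)
```

To read a real output file, pass the file's contents instead, for example `parse_unaligned(open("query_struct.gff").read())`.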



Each unaligned sequence entry in query_struct.gff contains the following fields:

| GFF3 field | Content | Notes |
|---|---|---|
| col 1 | Query_seq | |
| col 2 | NucDiff_v2.0 | name and current version of the tool |
| col 3 | SO:0000001 | Sequence Ontology accession number corresponding to the "region" SO term |
| col 4 | Query_seq start | |
| col 5 | Query_seq end | |
| col 6 / col 7 / col 8 | . | score, strand and phase fields are not used |
| col 9, ID | "SV_1" | the ID in query_struct.gff is equal to the ID in ref_struct.gff |
| col 9, Name | "unaligned_sequence" | |
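Going the other way, the column layout described in the table can be assembled into an entry. This is a minimal sketch, not NucDiff's own writer: the helper name `format_unaligned_entry` is hypothetical, it assumes the whole query sequence is unaligned (so the feature spans 1 to the sequence length, as in the example), and the `color=#990000` attribute is simply copied from the example entries.

```python
def format_unaligned_entry(query_id: str, query_len: int, sv_id: str,
                           version: str = "NucDiff_v2.0") -> str:
    """Hypothetical helper: build the ##sequence-region directive and the
    feature line for one fully unaligned query sequence."""
    header = f"##sequence-region\t{query_id}\t1\t{query_len}"
    fields = [
        query_id,          # col 1: query sequence name
        version,           # col 2: tool name and version
        "SO:0000001",      # col 3: SO accession for the "region" term
        "1",               # col 4: start in the query sequence
        str(query_len),    # col 5: end in the query sequence
        ".", ".", ".",     # cols 6-8: score, strand, phase (not used)
        # col 9: attributes; color value copied from the example entries
        f"ID={sv_id};Name=unaligned_sequence;color=#990000",
    ]
    return header + "\n" + "\t".join(fields)


print(format_unaligned_entry("query_5", 65, "SV_4"))
```

With the arguments shown, the output reproduces the query_5 entry from the example above.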