`mygfa` doesn't parse optional CIGAR strings #124
Labels: triage required (ideas that need further discussion, whether asynchronous or synchronous, before we can take action)
There are a number of `gfa` features that `mygfa` doesn't account for yet, so I'm not sure how high of a priority this fix should be, but this issue is preventing `mygfa` from parsing `odgi`'s generated `gfa` files.

Note that this is the gfa specification that I'm using as a reference: http://gfa-spec.github.io/GFA-spec/GFA1.html
Essentially, `mygfa` doesn't have the functionality to parse CIGAR strings, which (as far as I understand) specify the alignment of two segments. This particular issue shows up wherever an "alignment" string appears, for example in links:

`L 1 + 2 + 0M`

and paths:

`P path1 1+,2+,2+ 0M,0M`
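For discussion, here is a minimal sketch of what parsing this column could look like. It assumes the GFA1 grammar for CIGAR strings (one or more `<length><op>` pairs with op in `MIDNSHPX=`, or a lone `*`). The names `parse_cigar` and `parse_overlaps` are hypothetical and are not part of `mygfa` today.

```python
import re
from typing import List, Optional, Tuple

# One CIGAR operation: a non-negative length followed by an operation code.
_CIGAR_OP = re.compile(r"(\d+)([MIDNSHPX=])")

Cigar = List[Tuple[int, str]]


def _parse_ops(text: str) -> Cigar:
    """Parse a concrete CIGAR string such as '4M1I' into (length, op) pairs."""
    ops: Cigar = []
    pos = 0
    while pos < len(text):
        match = _CIGAR_OP.match(text, pos)
        if match is None:
            raise ValueError(f"malformed CIGAR string: {text!r}")
        ops.append((int(match.group(1)), match.group(2)))
        pos = match.end()
    if not ops:
        raise ValueError("empty CIGAR string")
    return ops


def parse_cigar(text: str) -> Optional[Cigar]:
    """Parse the overlap column of an L line; '*' means unspecified (None)."""
    return None if text == "*" else _parse_ops(text)


def parse_overlaps(field: str) -> Optional[List[Cigar]]:
    """Parse the overlaps column of a P line: '*' or comma-separated CIGARs."""
    if field == "*":
        return None
    # Stripping whitespace around each entry is a leniency of this sketch.
    return [_parse_ops(part.strip()) for part in field.split(",")]
```

With this, `parse_cigar("0M")` gives `[(0, "M")]`, and both `parse_cigar("*")` and `parse_overlaps("*")` give `None`, so callers can tell "unspecified" apart from a zero-length overlap.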
The last column of these lines represents a CIGAR string (or, for paths, a list of CIGAR strings). My understanding is that in either case, this string can be replaced with `*`, which indicates that the overlap is unspecified.

According to the docs, if unspecified, "the CIGAR strings are determined by fetching the CIGAR string from the corresponding link records, or by performing a pairwise overlap alignment of the two sequences." I'm not yet sure what the latter is or how difficult it would be to accomplish, but the former suggests that, in order to support this, we may want to pre-process gfa files and sort the lines by type so that we parse Path lines after Link lines.
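To make the pre-processing idea concrete, here is a rough sketch that assumes tab-separated GFA1 lines. It sorts records so that `L` lines come before `P` lines, then fills in a path's `*` overlaps from the matching link records. The function names and the `H`, `S`, `L`, `P` ordering are assumptions for illustration, not existing `mygfa` behavior.

```python
from typing import Dict, Iterable, List, Optional, Tuple

# Assumed parse order: headers, segments, links, then paths.
_ORDER = {"H": 0, "S": 1, "L": 2, "P": 3}

Step = Tuple[str, str]  # (segment name, orientation)


def sort_gfa_lines(lines: Iterable[str]) -> List[str]:
    """Stable sort by record type so that L lines precede P lines."""
    return sorted(lines, key=lambda line: _ORDER.get(line[:1], len(_ORDER)))


def index_links(lines: Iterable[str]) -> Dict[Tuple[Step, Step], str]:
    """Map (from step, to step) to the raw overlap string of each L line."""
    links: Dict[Tuple[Step, Step], str] = {}
    for line in lines:
        fields = line.rstrip("\n").split("\t")
        if fields[0] == "L":
            _, frm, frm_orient, to, to_orient, overlap = fields[:6]
            links[((frm, frm_orient), (to, to_orient))] = overlap
    return links


def resolve_path_overlaps(
    path_line: str, links: Dict[Tuple[Step, Step], str]
) -> List[Optional[str]]:
    """Return one overlap per adjacent pair of steps in a P line."""
    fields = path_line.rstrip("\n").split("\t")
    steps = [(step[:-1], step[-1]) for step in fields[2].split(",")]
    if fields[3] != "*":
        return fields[3].split(",")
    # None marks a pair with no link record; that case would need the
    # pairwise overlap alignment, which this sketch does not attempt.
    return [links.get((a, b)) for a, b in zip(steps, steps[1:])]
```

This sketch only looks up links in the direction they were written; a link traversed on the opposite strand (for example `2-` to `1-`) would need an extra lookup.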
@anshumanmohan, based on your knowledge of `overlap`, does this sound doable? How much of a priority should this be?