Confusion/Documentation with IndexedFasta.getSequence #316

IanSudbery · 2017-03-13T12:11:08Z

I had naively assumed that IndexedFasta.getSequence(contig, "-", start, end) would return the reverse complement of the sequence between contig:start..end. But this is only the case if the obscure mConverter attribute of the IndexedFasta object is set to a coordinate system that includes "both" in its name (e.g. fasta.setConverter(IndexedFasta.getConverter("zero-both-open"))).

If the converter is not set, fetching from the fasta on the -ve strand actually returns the sequence contig:contig_length-end..contig_length-start.
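To make the difference concrete, here is a minimal pure-Python sketch of the two ways a zero-based, half-open interval can be interpreted on the "-" strand. It does not use IndexedFasta at all; the helper names (`revcomp`, `fetch_forward_coords`, `fetch_reverse_coords`) are made up for illustration, and it assumes that the sequence is reverse-complemented in both cases.

```python
# Toy illustration only: these helpers are hypothetical and are NOT part of CGAT.
COMPLEMENT = str.maketrans("ACGTacgt", "TGCAtgca")


def revcomp(seq):
    """Return the reverse complement of a DNA string."""
    return seq.translate(COMPLEMENT)[::-1]


def fetch_forward_coords(contig_seq, strand, start, end):
    """start/end are always forward-strand coordinates.

    On "-", return the reverse complement of contig_seq[start:end].
    This is what I expected getSequence to do.
    """
    sub = contig_seq[start:end]
    return revcomp(sub) if strand == "-" else sub


def fetch_reverse_coords(contig_seq, strand, start, end):
    """On "-", start/end are counted along the reverse strand.

    Equivalent to reverse-complementing the forward-strand slice
    contig_seq[length - end:length - start].
    """
    if strand == "-":
        return revcomp(contig_seq)[start:end]
    return contig_seq[start:end]


contig = "AAACCCGGGT"
print(fetch_forward_coords(contig, "-", 0, 3))  # TTT
print(fetch_reverse_coords(contig, "-", 0, 3))  # ACC
```

The same call returns sequence from opposite ends of the contig depending on the convention, which is exactly how a primer ends up targeting the wrong place.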

This is so unobvious that the author of gff2fasta jumps through all sorts of hoops to get the correct sequence out of the fasta rather than use the coordinate converter.

This is a whole world of bugs just waiting to happen (I just ordered a whole load of primers that amplify absolutely nothing because they target completely the wrong sequences).

I recommend that the "zero-both-open" converter be set as the default on creation of an IndexedFasta object. But this will require making sure that all uses of IndexedFasta in the collection are compatible with this.

AndreasHeger · 2017-03-13T16:24:45Z

Hi @IanSudbery, sorry about this. The default conversion comes from our comparative genomics past: it follows exonerate output and UCSC practice for some formats such as the chain format, which uses reverse-strand coordinates for the reverse strand.

Changing the default should be fine, but indeed will need some thought. I can put it on my todo list.

IanSudbery · 2017-03-13T16:28:11Z

No worries. It hasn’t been a major bug this time, but I did wonder how many other people have made this mistake without realizing it. I was only investigating it to work out why 5 of my 10 primer pairs didn’t work. One supposes that this sort of thing could get buried deep inside a pipeline and no one might find the error until after something had been published (if ever!).

IanSudbery · 2017-04-05T13:12:28Z

@AndreasHeger

Okay, there are still some weird things going on here.

GTF.py assumes that you have not set the converter and does some gymnastics with interval co-ordinates to get around that. (It gets these wrong, BTW: the co-ordinates are off by one, so if you do GTF.toSequence on a transcript, the first base of the transcript is missing.)

This means you can't open a fasta and use it both to convert your own co-ordinates to sequence and also use the same file in GTF.toSequence.
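For reference, the kind of interval gymnastics involved can be sketched as below. This is not the code in GTF.py; `to_reverse_strand_coords` is a hypothetical helper, and it assumes zero-based, half-open intervals. With that convention the flip needs no +1 or -1 correction, which is where off-by-one errors tend to creep in when closed or one-based intervals get mixed in.

```python
# Hypothetical helper for illustration; not taken from GTF.py or IndexedFasta.
def to_reverse_strand_coords(start, end, contig_length):
    """Flip a zero-based, half-open interval onto the opposite strand.

    Forward-strand [start, end) becomes
    [contig_length - end, contig_length - start) on the reverse strand.
    """
    if not 0 <= start <= end <= contig_length:
        raise ValueError("interval must lie within the contig")
    return contig_length - end, contig_length - start


flipped = to_reverse_strand_coords(2, 5, 10)
print(flipped)  # (5, 8)

# Flipping twice gives back the original interval, and the length is preserved.
print(to_reverse_strand_coords(*flipped, 10))  # (2, 5)
```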

AndreasHeger · 2017-05-08T10:52:01Z

AndreasHeger · 2017-08-21T10:59:02Z

AndreasHeger · 2018-01-05T12:40:25Z

I will add some tests.

AndreasHeger self-assigned this Mar 13, 2017
