Next getitem implementation for circular sequences #191

manulera · 2024-01-29T12:07:54Z

A followup to #161

The problem

Currently, for a circular sequence seq, seq[0:0] or seq[1:1] returns the linearised version of that sequence. That makes sense, but it is problematic in certain cases. Imagine a function that gets the first x nucleotides after a given base. It would return an unexpected result for x == 0: seq[0:0+0] would give the entire sequence.

Knowing that this is the behaviour, it is then possible to add an exception to the function for x == 0, but it's still not great.
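To make the pitfall concrete, here is a minimal sketch that models the behaviour described above on a plain Python string. It does not use pydna; `circular_slice`, `first_x_after` and `first_x_after_guarded` are hypothetical names chosen for illustration only.

```python
def circular_slice(seq: str, start: int, stop: int) -> str:
    """Toy model of the current behaviour (plain str, not pydna's Dseq).

    start == stop returns the whole sequence linearised at `start`;
    start > stop wraps around the origin.
    """
    n = len(seq)
    start %= n
    stop %= n
    if start < stop:
        return seq[start:stop]
    return seq[start:] + seq[:stop]


def first_x_after(seq: str, pos: int, x: int) -> str:
    """Naive helper: surprising for x == 0."""
    return circular_slice(seq, pos, pos + x)


def first_x_after_guarded(seq: str, pos: int, x: int) -> str:
    """Same helper with the explicit x == 0 exception mentioned above."""
    if x == 0:
        return ""
    return circular_slice(seq, pos, pos + x)


seq = "ABCDEFGHIJ"
print(first_x_after(seq, 7, 5))          # wraps around the origin
print(first_x_after(seq, 0, 0))          # whole sequence, not empty
print(first_x_after_guarded(seq, 0, 0))  # empty string
```

The guard works, but every caller that computes a slice length has to remember it.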

Possible alternative

We could support slicing of circular molecules with indexes bigger than the length of the sequence. For instance, what is now written as seq[7:2] for a sequence of length 10 could be written as seq[7:12]. This is equivalent to the behaviour of a circular string, and would potentially allow interesting functionality, such as getting more than a full circle. In the previous example of a sequence of length 10, seq[1:15] could return more than one loop.
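A rough sketch of what this could look like, again on a plain string rather than a Dseq. `extended_slice` is a hypothetical helper, and the choice to reject stop < start is an assumption made only to keep the example small.

```python
def extended_slice(seq: str, start: int, stop: int) -> str:
    """Slice a circular string with a stop index that may exceed len(seq).

    The result always has length stop - start, so start == stop is empty
    and stop - start > len(seq) yields more than one full loop.
    """
    if stop < start:
        # Assumption for this sketch: only the extended notation is accepted.
        raise ValueError("stop must be >= start in the extended notation")
    if not seq:
        return ""
    n = len(seq)
    return "".join(seq[i % n] for i in range(start, stop))


seq = "ABCDEFGHIJ"
print(extended_slice(seq, 7, 12))  # same region as seq[7:2] today
print(extended_slice(seq, 1, 15))  # more than one loop
print(extended_slice(seq, 0, 0))   # empty, no special case needed
```

With this convention the length of the result is always stop - start, which is what makes the x == 0 case unsurprising.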

The problem

When we discussed this the other day, we said that a lot of pydna functions use modulo operations to avoid expressions like seq[7:12] and produce seq[7:2] instead, so this change may break some code, even if both syntaxes are still supported. The tests would most likely pick up the errors introduced, but it is a breaking change for other users, so it should be postponed until a major release.
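To illustrate why the modulo idiom and the extended notation do not mix cleanly, here is a small sketch of converting between the two. `to_wrapped` and `to_extended` are hypothetical helpers, not pydna functions, and `to_extended` assumes the current convention that start == stop means a full circle.

```python
def to_wrapped(start: int, stop: int, n: int) -> tuple[int, int]:
    """Reduce both indices modulo n, e.g. (7, 12) -> (7, 2) for n == 10."""
    return start % n, stop % n


def to_extended(start: int, stop: int, n: int) -> tuple[int, int]:
    """Undo the wrap, assuming start == stop means one full circle."""
    if stop <= start:
        stop += n
    return start, stop


n = 10
print(to_wrapped(7, 12, n))   # the form much existing code produces
print(to_extended(7, 2, n))   # the form the proposal would prefer
# An empty slice and a full circle collapse to the same wrapped pair:
print(to_wrapped(3, 3, n) == to_wrapped(3, 13, n))
```

Once indices have been reduced modulo the length, an empty slice and a full circle can no longer be told apart, so code that applies the modulo before slicing would see different results under the new semantics.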


JamesBagley · 2024-12-19T21:45:40Z

I think you're right in that having a special exception for 0:0 would be complicated, and even further I think maybe similar situations could come up for other indices anyway? e.g. maybe I want [7:7+x]. Maybe a way to enable this would just be to make a second accessor method and leave the slicing behavior as it is.

sequence[0:0] can still return a linearized version of the plasmid, but a second method e.g. sequence.get(wrap=False)[0:0], could return an empty Dseq object (or raise an error?)
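One possible reading of this suggestion, sketched on a toy class rather than on Dseq. `ToyCircular` and `_NoWrapView` are hypothetical, and giving the non-wrapping view plain linear string semantics is an assumption (raising an error for empty slices would be the other option mentioned).

```python
class _NoWrapView:
    """View with plain linear slicing, so [0:0] is empty."""

    def __init__(self, data: str):
        self._data = data

    def __getitem__(self, sl: slice) -> str:
        return self._data[sl]


class ToyCircular:
    """Toy circular string: default slicing keeps the current behaviour."""

    def __init__(self, data: str):
        self._data = data

    def __getitem__(self, sl: slice) -> str:
        if not isinstance(sl, slice):
            raise TypeError("this sketch only supports slices")
        start, stop, _ = sl.indices(len(self._data))
        if start < stop:
            return self._data[start:stop]
        # start >= stop: wrap around the origin (full circle if equal)
        return self._data[start:] + self._data[:stop]

    def get(self, wrap: bool = True):
        """Return self for wrapping slices, or a non-wrapping view."""
        return self if wrap else _NoWrapView(self._data)


sequence = ToyCircular("ABCDEFGHIJ")
print(sequence[0:0])                  # linearised sequence, as today
print(sequence[7:2])                  # wraps around the origin
print(sequence.get(wrap=False)[0:0])  # empty
```

Existing code that relies on `sequence[0:0]` keeps working, while new code can opt into the unsurprising behaviour explicitly.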

manulera · 2024-12-20T07:13:50Z

Hi @JamesBagley, I had pretty much abandoned this issue, but you are right that an alternative method to slice sequences is probably the way to go. Not sure when this would happen, though!

BjornFJohansson · 2024-12-29T07:09:48Z

Hi all, I am working on a rewrite of the Dseq class. I'll soon have something that passes the Dseq and Dseqrecord tests. The new Dseq class is much closer to the Bio.Seq class, and I think that the slicing could and should be reworked in order to reduce surprises for the user.

manulera assigned BjornFJohansson and manulera Jan 29, 2024

manulera added the enhancement label Jan 29, 2024

manulera mentioned this issue Feb 5, 2024

Edge case in Anneal.products #197

Closed
