Next getitem implementation for circular sequences #191

Open
manulera opened this issue Jan 29, 2024 · 3 comments
@manulera
Collaborator

A followup to #161

The problem

Currently, for a circular sequence seq, seq[0:0] or seq[1:1] returns the linearised version of that sequence. That makes sense, but it is problematic in certain cases. Imagine a function that returns the first x nucleotides after a given base: it would give an unexpected result for x == 0, because seq[0:0+0] would return the entire sequence.

Knowing that this is the behaviour, it is possible to add a special case for x == 0 to the function, but that is still not great.
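A minimal sketch of the issue, using a plain Python string in place of a circular sequence. The helper names `wrapped_slice` and `first_x_after` are hypothetical and not part of pydna; `wrapped_slice` only mimics the behaviour described above.

```python
def wrapped_slice(seq: str, start: int, stop: int) -> str:
    """Mimic the slicing behaviour described above on a plain string (assumption)."""
    if start < stop:
        return seq[start:stop]
    # start >= stop: go around the origin. When start == stop this yields
    # the whole sequence, linearised at position start.
    return seq[start:] + seq[:stop]


def first_x_after(seq: str, pos: int, x: int) -> str:
    """Return the x nucleotides that follow position pos (hypothetical helper)."""
    stop = (pos + x) % len(seq)
    return wrapped_slice(seq, pos, stop)


seq = "GATTACAGGC"  # length 10
print(first_x_after(seq, 0, 3))  # GAT
print(first_x_after(seq, 0, 0))  # GATTACAGGC, not the empty string one would expect
```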

Possible alternative

We could support slicing of circular molecules with indices larger than the length of the sequence. For instance, what is now written as seq[7:2] for a sequence of length 10 could be written as seq[7:12]. This is equivalent to the behaviour of a circular string, and would potentially allow interesting functionality, such as getting more than a full circle. In the previous example of a sequence of length 10, seq[1:15] could return more than one loop.
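A rough sketch of what this alternative could look like, again on a plain string rather than a pydna object. The function name `extended_slice` is made up for illustration, and the choice to return an empty string when start == stop is an assumption about how the new semantics would work.

```python
def extended_slice(seq: str, start: int, stop: int) -> str:
    """Slice a circular string with indices allowed to exceed len(seq).

    Hypothetical helper: every index in range(start, stop) is taken
    modulo the sequence length, so stop may run past the end.
    """
    n = len(seq)
    if n == 0:
        return ""
    return "".join(seq[i % n] for i in range(start, stop))


seq = "GATTACAGGC"  # length 10
print(extended_slice(seq, 7, 12))  # GGCGA (what is written seq[7:2] today)
print(extended_slice(seq, 1, 15))  # ATTACAGGCGATTA (more than one loop)
print(repr(extended_slice(seq, 0, 0)))  # '' (empty, as a caller would expect)
```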

The problem with this alternative

When we discussed this the other day, we said that a lot of pydna functions use modulo operations to avoid expressions like seq[7:12] and produce seq[7:2] instead, so this change may break some code, even if both syntaxes are still supported. The tests would most likely pick up the errors introduced, but it is a breaking change for other users, so it should be postponed until a major release.
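To illustrate the kind of modulo arithmetic meant here, the sketch below uses a hypothetical helper, `to_wrapped_indices`, which is not taken from pydna. It folds indices back into the range 0..length-1, and it also shows where the two conventions collide: a full circle and an empty slice both fold to equal start and stop.

```python
def to_wrapped_indices(start: int, stop: int, length: int) -> tuple[int, int]:
    """Fold slice bounds into [0, length) with modulo (hypothetical helper)."""
    return start % length, stop % length


# An extended-style slice maps onto the wrapped style used today:
print(to_wrapped_indices(7, 12, 10))  # (7, 2)

# A full circle and an empty slice become indistinguishable after folding,
# which is why code relying on the modulo pattern could change meaning:
print(to_wrapped_indices(0, 10, 10))  # (0, 0)
print(to_wrapped_indices(0, 0, 10))   # (0, 0)
```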

@JamesBagley

JamesBagley commented Dec 19, 2024

I think you're right that a special exception for 0:0 would be complicated, and similar situations could come up for other indices anyway, e.g. maybe I want [7:7+x]. One way to enable this would be to add a second accessor method and leave the slicing behavior as it is.

sequence[0:0] can still return a linearized version of the plasmid, but a second method, e.g. sequence.get(wrap=False)[0:0], could return an empty Dseq object (or raise an error?).
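A toy sketch of that idea, assuming a stand-in class `CircularSeq` that wraps a plain string. Neither the class nor the `get(wrap=...)` method exists in pydna; the names follow the suggestion above. Here the non-wrapping view returns an empty string rather than raising, which is only one of the two options mentioned.

```python
class _NoWrapView:
    """View whose slices never go around the origin (hypothetical)."""

    def __init__(self, data: str):
        self.data = data

    def __getitem__(self, sl: slice) -> str:
        # Plain string slicing: [0:0] is empty, and so is [7:2].
        return self.data[sl]


class CircularSeq:
    """Toy stand-in for a circular sequence; not pydna's Dseq."""

    def __init__(self, data: str):
        self.data = data

    def __getitem__(self, sl: slice) -> str:
        # Unchanged behaviour: start >= stop wraps around the origin.
        # Negative indices and steps are ignored in this sketch.
        start = 0 if sl.start is None else sl.start
        stop = len(self.data) if sl.stop is None else sl.stop
        if start < stop:
            return self.data[start:stop]
        return self.data[start:] + self.data[:stop]

    def get(self, wrap: bool = True):
        """Return self for wrapping slices, or a non-wrapping view."""
        return self if wrap else _NoWrapView(self.data)


sequence = CircularSeq("GATTACAGGC")
print(sequence[0:0])                        # GATTACAGGC
print(repr(sequence.get(wrap=False)[0:0]))  # ''
```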

@manulera
Collaborator Author

Hi @JamesBagley, I had pretty much abandoned this issue, but you are right that an alternative method to slice sequences is probably the way to go. Not sure when this would happen, though!

@BjornFJohansson
Collaborator

Hi all, I am working on a rewrite of the Dseq class, and I'll soon have something that passes the Dseq and Dseqrecord tests. The new Dseq class is much closer to the Bio.Seq class, and I think the slicing could and should be reworked to reduce surprise for the user.
