Check behaviour of spaces when creating Dseq objects from a text representation #322

Open
dgruano opened this issue Oct 30, 2024 · 2 comments

Comments

@dgruano
Contributor

dgruano commented Oct 30, 2024

Related to #321

While testing single-stranded restriction products, I also tried to create them from a text representation:

Dseq.from_representation("""\
              
    CCGAATTAAT
    """)

I find it funky that the overhang of the sequence depends on the number of spaces present on the watson strand line, and that the length does too when there are more (or fewer) spaces extending beyond the crick strand. This is a problem for testing. Here are some examples that may be important to consider:

No spaces in first line -> Sequence considered as watson

Dseq.from_representation("""\

    CCGAATTAAT
    """).__dict__
{'ovhg': 0,
 'watson': CCGAATTAAT,
 'crick': ,
 'circular': False,
 'length': 10,
 'pos': 0}

Only one space (indentation does not match) -> negative overhang and higher length

Dseq.from_representation("""\
 
    CCGAATTAAT
    """).__dict__
{'ovhg': -3,
 'watson': ,
 'crick': TAATTAAGCC,
 'circular': False,
 'length': 13,
 'pos': 0}

Four spaces (correct indentation) -> Seems the accurate way to type it, but ovhg = 0

Dseq.from_representation("""\
    
    CCGAATTAAT
    """).__dict__
{'ovhg': 0,
 'watson': ,
 'crick': TAATTAAGCC,
 'circular': False,
 'length': 10,
 'pos': 0}

Sequence full of spaces -> Accurate way to type it so it matches a 10-bases long single-stranded restriction product

Dseq.from_representation("""\
              
    CCGAATTAAT
    """).__dict__
{'ovhg': 10,
 'watson': ,
 'crick': TAATTAAGCC,
 'circular': False,
 'length': 10,
 'pos': 0}

More spaces than indent + crick length -> The length is higher than expected, overhang matches length

Dseq.from_representation("""\
                  
    CCGAATTAAT
    """).__dict__
{'ovhg': 14,
 'watson': ,
 'crick': TAATTAAGCC,
 'circular': False,
 'length': 14,
 'pos': 0}
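
One rule that would reproduce all five outputs above is: drop completely empty lines, treat the first two remaining lines as watson and crick, and take the overhang as the difference in leading whitespace. The sketch below is a hypothetical stand-in (`parse_representation` is not a pydna function and has not been checked against the pydna source); it only shows that this rule is consistent with the reported `ovhg` and `length` values.

```python
def parse_representation(text):
    """Hypothetical whitespace-counting parser; NOT pydna's implementation."""
    # Completely empty lines are dropped, whitespace-only lines are kept.
    lines = [ln for ln in text.splitlines() if ln]
    top, bottom = lines[0], lines[1]
    # The overhang is the difference in leading whitespace between the lines.
    lead_top = len(top) - len(top.lstrip())
    lead_bottom = len(bottom) - len(bottom.lstrip())
    ovhg = lead_top - lead_bottom
    watson = top.strip()
    crick = bottom.strip()[::-1]  # crick is written 3'->5', so reverse it
    # Assumed length rule: the longer of the two offset strands.
    length = max(len(watson) + max(0, ovhg), len(crick) + max(0, -ovhg))
    return {"ovhg": ovhg, "watson": watson, "crick": crick, "length": length}


# The tail shared by every example: indented crick line plus the indented
# line left behind by the closing triple quote.
tail = "\n    CCGAATTAAT\n    "
for n_spaces in (0, 1, 4, 14, 18):
    print(n_spaces, parse_representation(" " * n_spaces + tail))
```

Under this rule, the empty first line is discarded, so the sequence line becomes watson and the trailing four-space line becomes an empty crick. That would explain why the first example behaves differently from the other four.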

How would you go about fixing this? I can take a look, but I don't want to break anything!

@BjornFJohansson
Collaborator

BjornFJohansson commented Nov 12, 2024

Hi, I am actually working on a related thing right now. I have some ideas for expanding the representations for dsDNA.

I made the from_representation method in order to go from a figure similar to the ones made from the Dseq.__repr__() back to a Dseq object.

This method leaves it up to the user to correctly format the sequence. This format is imho not very good for storage.

We could add errors and warnings to the method to prevent malformed input.
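
As a rough illustration of what such checks might look like, here is a minimal sketch of a pre-check that could run before parsing. The name `check_representation` and the specific rules (at most two strand lines, a warning on whitespace-only lines) are assumptions made for this example, not existing pydna behaviour.

```python
import warnings


def check_representation(text):
    """Hypothetical pre-check for a two-line dsDNA figure; not part of pydna."""
    lines = text.splitlines()
    # Lines that carry at least one non-whitespace character.
    strand_lines = [ln for ln in lines if ln.strip()]
    if not strand_lines:
        raise ValueError("no sequence found in representation")
    if len(strand_lines) > 2:
        raise ValueError("expected at most two strand lines")
    # Whitespace-only lines are ambiguous: their length may silently
    # change the overhang, or they may be taken for an empty strand.
    if any(ln and not ln.strip() for ln in lines):
        warnings.warn(
            "whitespace-only line in representation; "
            "its length may affect the overhang",
            stacklevel=2,
        )
    return strand_lines
```

Whether a whitespace-only line should warn, raise, or be accepted as an explicit empty strand is a design decision this sketch leaves open.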

I am curious what your use case might be?

@dgruano
Contributor Author

dgruano commented Nov 12, 2024

Both this issue and #321 came up while writing tests for the USER and Nickase enzymes. For visualization, I find it handy to create Dseq objects of the "restriction" products. However, some of these products end up being single-stranded, so I need a way to create this "single-stranded product of a single-strand cut of a double-stranded Dseq".

I don't know if this would be a widespread use case, but it was intuitive for me. And the alternative I could think of (#321) also gave some errors.
