Check behaviour of spaces when creating `Dseq` objects from a text representation #322

dgruano · 2024-10-30T20:30:42Z

Related to #321

While testing single-stranded restriction products, I also tried to create them from a text representation:

Dseq.from_representation("""\
              
    CCGAATTAAT
    """)

I find it funky that the overhang of the sequence depends on the number of spaces present on the watson line, and that the length changes too when there are more (or fewer) spaces than the crick strand spans. This is a problem for testing. Here are some examples that may be important to consider:

No spaces in first line -> Sequence considered as `watson`

Dseq.from_representation("""\

    CCGAATTAAT
    """).__dict__

{'ovhg': 0,
 'watson': CCGAATTAAT,
 'crick': ,
 'circular': False,
 'length': 10,
 'pos': 0}

Only one space (indentation does not match) -> negative overhang and higher length

Dseq.from_representation("""\
 
    CCGAATTAAT
    """).__dict__

{'ovhg': -3,
 'watson': ,
 'crick': TAATTAAGCC,
 'circular': False,
 'length': 13,
 'pos': 0}

Four spaces (correct indentation) -> Seems the accurate way to type it, but ovhg = 0

Dseq.from_representation("""\
    
    CCGAATTAAT
    """).__dict__

{'ovhg': 0,
 'watson': ,
 'crick': TAATTAAGCC,
 'circular': False,
 'length': 10,
 'pos': 0}

Sequence full of spaces -> Accurate way to type it so it matches a 10-base-long single-stranded restriction product

Dseq.from_representation("""\
              
    CCGAATTAAT
    """).__dict__

{'ovhg': 10,
 'watson': ,
 'crick': TAATTAAGCC,
 'circular': False,
 'length': 10,
 'pos': 0}

More spaces than indent + crick length -> The length is higher than expected, overhang matches length

Dseq.from_representation("""\
                  
    CCGAATTAAT
    """).__dict__

{'ovhg': 14,
 'watson': ,
 'crick': TAATTAAGCC,
 'circular': False,
 'length': 14,
 'pos': 0}
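The five results above are all consistent with one simple rule: the overhang is the leading whitespace of the first non-empty line minus the leading whitespace of the second. The sketch below is a reconstruction from those outputs only, not pydna's actual code, and `sketch_parse` is a hypothetical name used for illustration.

```python
def sketch_parse(representation: str):
    """Reconstruct (ovhg, watson, crick, length) from a two-line figure.

    Assumption: this rule is inferred from the outputs reported in this
    issue, not taken from pydna's source.
    """
    # Completely empty lines are dropped; whitespace-only lines are kept.
    lines = [ln for ln in representation.splitlines() if ln]
    top, bottom = lines[0], lines[1]

    def leading_spaces(s: str) -> int:
        return len(s) - len(s.lstrip())

    ovhg = leading_spaces(top) - leading_spaces(bottom)
    watson = top.strip()
    # The reported outputs show crick as the typed bottom line, reversed.
    crick = bottom.strip()[::-1]
    length = max(len(watson) + max(0, ovhg), len(crick) + max(0, -ovhg))
    return ovhg, watson, crick, length


indent = " " * 4
for n_spaces in (0, 1, 4, 14, 18):
    # First line: n_spaces spaces; second line: indented sequence; third: indent only.
    figure = " " * n_spaces + "\n" + indent + "CCGAATTAAT\n" + indent
    print(n_spaces, sketch_parse(figure))
```

With zero spaces the first line is empty and gets dropped, so the sequence line moves up into the watson position. That would explain why that case alone reports a watson strand.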

How would you go about fixing this? I can take a look, but I don't want to break anything!


BjornFJohansson · 2024-11-12T09:30:43Z

Hi, I am actually working on a related thing right now. I have some ideas for expanding the representations for dsDNA.

I wrote the `from_representation` method to go from a figure like the ones produced by `Dseq.__repr__()` back to a `Dseq` object.

This method leaves it up to the user to correctly format the sequence. This format is imho not very good for storage.

We could add errors and warnings to the method to prevent malformed input.
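As a rough idea of what such checks might look like, here is a minimal sketch. `check_representation` is a hypothetical helper, not part of pydna, and the choice of what counts as an error versus a warning is only one possible design.

```python
import re
import warnings


def check_representation(representation: str) -> None:
    """Hypothetical pre-check for a two-line dsDNA figure (not part of pydna)."""
    # Drop completely empty lines; whitespace-only lines are kept.
    lines = [ln for ln in representation.splitlines() if ln]
    if len(lines) < 2:
        raise ValueError("expected two non-empty lines (watson and crick)")
    for name, line in zip(("watson", "crick"), lines[:2]):
        # Tabs, digits and punctuation make the alignment ambiguous.
        if re.search(r"[^A-Za-z ]", line):
            raise ValueError(
                f"{name} line contains characters other than letters and spaces"
            )
        # A blank strand is legal but fragile: the outcome depends on
        # how many invisible spaces were typed.
        if not line.strip():
            warnings.warn(
                f"{name} line is whitespace only; the result depends on "
                "the exact number of spaces",
                stacklevel=2,
            )
```

A warning rather than an error for whitespace-only strands keeps single-stranded figures usable while flagging that the input is easy to get wrong.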

I am curious what your use case might be?

dgruano · 2024-11-12T09:40:43Z

Both this issue and #321 happened when writing tests for the USER and Nickase enzymes. For visualization, I find it handy to create Dseq objects of the "restriction" products. However, some of these products end up being single-stranded, so I would need a way to create this "single-stranded product of a single-strand cut of a double-stranded Dseq".

I don't know if this would be a widespread use case, but it felt intuitive to me. And the alternative I could think of (#321) also gave some errors.
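For tests, one way to avoid depending on hand-typed (and easily stripped) trailing spaces is to build the figure string programmatically. The helper below is a hypothetical sketch. Whether `Dseq.from_representation` interprets the resulting string as a single crick strand is an assumption based on the "Sequence full of spaces" output reported earlier in this thread.

```python
def single_strand_figure(bottom: str, indent: int = 0) -> str:
    """Build a two-line figure whose top line is padding only.

    Hypothetical helper: the top line gets exactly `indent + len(bottom)`
    spaces, so its width always matches the bottom strand.
    """
    pad = " " * indent
    return pad + " " * len(bottom) + "\n" + pad + bottom


figure = single_strand_figure("CCGAATTAAT")
print(repr(figure))  # repr makes the padding visible
```

The returned string could then be passed to `Dseq.from_representation(figure)`, which keeps the amount of whitespace explicit in the test code.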

BjornFJohansson mentioned this issue Nov 12, 2024

Creating Dseq with crick strand but without watson nor ovhg returns math domain error #321

Open
