Cut sites display regions outside recognition sequence as if they were part of recognition sequence #270

Open
jpsorensen-asimov opened this issue Sep 19, 2024 · 0 comments
Labels
bug Something isn't working

Describe the bug

Cut sites convey three pieces of information in the SeqViz sequence view: the two cut locations and the sequence recognized by the enzyme.

In cases where the cut site extends outside the recognition sequence, the enzyme definitions in SeqViz "pad out" the recognition sequence with Ns. For example, the definition of the enzyme BbsI is:

  bbsi: {
    fcut: 8,
    name: "BbsI",
    rcut: 12,
    rseq: "GAAGACNNNNNN",
  },

Note that the part of the sequence that's actually recognized by the enzyme is "GAAGAC". However, because of the Ns padding the rseq in the enzyme definition, the entire range is shown inside the "recognition box" when this is rendered in SeqViz, suggesting that the characters after the "C" are part of the recognition site:
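To illustrate why the padded definition covers the extra bases, here is a minimal sketch that turns an `rseq` into a regular expression using the standard IUPAC nucleotide codes. The helper name `rseqToRegExp` and the test sequence are hypothetical, and this is not necessarily how SeqViz matches enzymes internally; it only shows that each trailing N matches one arbitrary base, so the match is 12 bases long rather than 6.

```typescript
// Hypothetical helper for illustration; not SeqViz's actual matching code.
// Standard IUPAC nucleotide codes mapped to regex character classes.
const IUPAC: Record<string, string> = {
  A: "A", C: "C", G: "G", T: "T",
  R: "[AG]", Y: "[CT]", S: "[CG]", W: "[AT]", K: "[GT]", M: "[AC]",
  B: "[CGT]", D: "[AGT]", H: "[ACT]", V: "[ACG]", N: "[ACGT]",
};

function rseqToRegExp(rseq: string): RegExp {
  const pattern = rseq
    .toUpperCase()
    .split("")
    .map((base) => {
      const cls = IUPAC[base];
      if (cls === undefined) throw new Error(`Unknown IUPAC code: ${base}`);
      return cls;
    })
    .join("");
  return new RegExp(pattern, "g");
}

// Made-up test sequence containing the GAAGACACAGGG stretch from this issue.
const match = rseqToRegExp("GAAGACNNNNNN").exec("TTGAAGACACAGGGTT");
const matchedText = match ? match[0] : null; // the six Ns consume ACAGGG
const matchStart = match ? match.index : -1;
```

If the recognition box is drawn over the whole match, it will cover all 12 bases even though only the first 6 are specific to the enzyme.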

Screenshot 2024-09-19 at 11 26 02 AM

We tried custom definitions of these enzymes that exclude the Ns, but the padding appears to be intentional: when the cut sites fall off the edge of a sequence, the component crashes.

Expected behavior

I'd expect the "recognition rectangle" to be drawn around only the part of the sequence that the enzyme recognizes: in this case, around GAAGAC only, not GAAGACACAGGG.

Screenshot 2024-09-19 at 12 50 57 PM

I'd expect this for any enzyme that's currently defined with leading or trailing Ns. I know there are also some enzymes with non-N wildcards, or with Ns (or other degenerate nucleotides) in the middle of the recognition sequence.

For example:

  banii: {
    fcut: 5,
    name: "BanII",
    rcut: 1,
    rseq: "GRGCYC",
  },
  bgli: {
    fcut: 7,
    name: "BglI",
    rcut: 4,
    rseq: "GCCNNNNNGGC",
  },

I'd propose that it's reasonable not to attempt to handle these cases for the moment, and to include any degenerate nucleotides (other than N), or any interior Ns, as part of the displayed recognition sequence.
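A minimal sketch of that proposal, assuming the displayed range can be derived from `rseq` alone while `fcut` and `rcut` keep using the padded definition. The names `DisplayRange` and `recognitionDisplayRange` are hypothetical and do not exist in SeqViz; the function trims only leading and trailing Ns and leaves interior Ns and other degenerate codes in place.

```typescript
// Hypothetical helper for illustration; not part of the SeqViz API.
// Half-open range [start, end), relative to the start of the padded match.
interface DisplayRange {
  start: number;
  end: number;
}

function recognitionDisplayRange(rseq: string): DisplayRange {
  const upper = rseq.toUpperCase();
  let start = 0;
  let end = upper.length;

  // Skip leading Ns, then trailing Ns. Interior Ns are never reached.
  while (start < end && upper[start] === "N") start++;
  while (end > start && upper[end - 1] === "N") end--;

  // If nothing but Ns remains, fall back to the full padded range.
  if (start === end) return { start: 0, end: upper.length };
  return { start, end };
}

const bbsiRange = recognitionDisplayRange("GAAGACNNNNNN"); // trailing Ns trimmed
const baniiRange = recognitionDisplayRange("GRGCYC"); // R and Y are kept
const bgliRange = recognitionDisplayRange("GCCNNNNNGGC"); // interior Ns are kept
```

For BbsI this gives a box over the first 6 bases (GAAGAC), while BanII and BglI keep their full 6 and 11 bases. Because the cut offsets would still come from the padded `rseq`, the definitions themselves would not need to change.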

Screenshots

Inline

Your environment:

Observing the same behavior across multiple browsers & OSes
