Insertions counted twice (in STAR) + Default MM set to <4 #20

Open
eyalmpeer opened this issue Jan 3, 2022 · 1 comment

Comments

eyalmpeer commented Jan 3, 2022

Hello. Great program and article.
Two possible issues:

  1. STAR aligner's NM tag includes insertions (in contrast to the "nM" tag, which only counts mismatches in each pair).
    When calculating the MM score, XenofilteR, at least when used on STAR-aligned files, possibly counts insertions twice:
    once from the CIGAR field, and once when adding the NM tag.
    So a read with two insertions and no mismatches will be filtered out, because its MM score will be 4.
  2. The default MM_threshold is documented as 5, but in the script it is actually 4.
    Because the script only keeps reads with MM score < MM_threshold (not <=), all reads with an MM score of 4 or above are filtered out, when you may have intended to filter out only 5 and above.
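
To make the arithmetic in both points concrete, here is a minimal Python sketch. It is not XenofilteR's actual R code: the function names (`inserted_bases`, `mm_score`, `passes`) are hypothetical, and it assumes the score is simply the edit-distance tag value plus the inserted bases read from the CIGAR string, as described above.

```python
import re

# Matches one CIGAR operation, e.g. "20M" or "1I".
CIGAR_OP = re.compile(r"(\d+)([MIDNSHP=X])")


def inserted_bases(cigar: str) -> int:
    """Total number of inserted bases (I operations) in a CIGAR string."""
    return sum(int(n) for n, op in CIGAR_OP.findall(cigar) if op == "I")


def mm_score(cigar: str, tag_value: int) -> int:
    """Hypothetical score: tag value plus inserted bases from the CIGAR (an assumption)."""
    return tag_value + inserted_bases(cigar)


def passes(score: int, threshold: int) -> bool:
    """Strict comparison, as described in point 2: a read is kept only if score < threshold."""
    return score < threshold


# Illustrative read: two 1-base insertions, no mismatches.
cigar = "20M1I30M1I48M"

score_with_NM = mm_score(cigar, tag_value=2)  # NM counts the two inserted bases again
score_with_nM = mm_score(cigar, tag_value=0)  # nM counts mismatches only

print(score_with_NM, passes(score_with_NM, threshold=4))
print(score_with_nM, passes(score_with_nM, threshold=4))
print(passes(score_with_NM, threshold=5))
```

With the NM value the read scores 4 and is rejected at a threshold of 4, whereas it would pass at the documented threshold of 5; with the nM value it scores 2 and passes either way.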

Thanks!
Eyal

Collaborator

RoelKluin commented Mar 25, 2022

Thanks. I'll add an NM_id argument, defaulting to "NM", which can be adapted for STAR. Also, the mismatch value was indeed not reflected correctly in the comments and README. I've adapted those to better reflect the mismatch threshold in use; I hope this covers all occurrences. The actual value in use by XenofilteR wasn't changed. BTW, if one expects clipped reads, it may be beneficial to disable the threshold entirely by setting it to a value higher than the read length.
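
As a rough illustration of the two ideas in this reply, the Python sketch below shows a configurable tag name and a threshold set above the read length. It is not the XenofilteR implementation: `tag_value`, `keep_read`, the `tag_id` parameter (standing in for the proposed NM_id argument), and the example tag values are all assumptions made for illustration.

```python
def tag_value(tags: dict, tag_id: str = "NM") -> int:
    """Return the value of the tag chosen as the mismatch count.

    tag_id is a hypothetical stand-in for the proposed NM_id argument.
    """
    return int(tags[tag_id])


def keep_read(score: int, threshold: int) -> bool:
    """A read is kept only if its score is strictly below the threshold."""
    return score < threshold


# Made-up tags for a read with two 1-base insertions and no mismatches.
example_tags = {"NM": 2, "nM": 0}

default_count = tag_value(example_tags)           # uses "NM"
star_count = tag_value(example_tags, tag_id="nM") # mismatches only

# Disabling the filter: with a threshold above the read length, even a
# heavily clipped read with a large score stays below the threshold.
read_length = 100
disabled_threshold = read_length + 1
clipped_read_score = 60  # made-up score for a heavily clipped read

print(default_count, star_count)
print(keep_read(clipped_read_score, threshold=4))
print(keep_read(clipped_read_score, threshold=disabled_threshold))
```

The default keeps existing behaviour unchanged, while STAR users can switch to a tag that excludes insertions.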
