-
Notifications
You must be signed in to change notification settings - Fork 4
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Add UMI support #72
Add UMI support #72
Conversation
""" | ||
nt = re.compile("\*n([atcg])") | ||
nts = "".join(re.findall(nt, cs_string)) | ||
|
||
if umi_before > 0: | ||
nts = nts[umi_before:] |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Does this really work? If there are substitutions/deletions/insertions in the alignment both the reference and query nucleotide will be in the cs string right?
For example:
cg:Z:6M2D21M cs:Z::1*at:2*ac:1-ac:21
Or am I missing something?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
It works if you add the correct number of N's into the template sequence. So we need to know that the index region is e.g., exactly 10 nt index + 9 nt UMI long. That way we can read the sequence directly from the CS string. Unrelated example:
cs:Z::33*nt*nc*nc*ng*ng*na*ng*na:9
Insertions and deletions will of course affect this perfect picture, but I don't think too terribly.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Fair enough. I think we could use the fact that we're looking specifically for cases where there is an n
to the left inside the *na*
section. But we can leave that for later
@@ -38,6 +38,7 @@ | |||
entry_points={ | |||
"console_scripts": [ | |||
"anglerfish=anglerfish.anglerfish:anglerfish", | |||
"anglerfish-explore=anglerfish.explore.cli:main", |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Nice! Although I don't know what the effect of this is. Does it make the explore command executable or add it to the path?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
With conda, and I guess with most package managers, both! So now this will work:
% anglerfish-explore --help
Usage: anglerfish-explore [OPTIONS]
Options:
-f, --fastq TEXT Fastq file to align [required]
...
I'm thinking maybe for a 1.0 release we can unify these commands. I'm open to having them as subcommands, eg. anglerfish run
and anglerfish explore
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Great work! This breaks the explore command a tiny bit for UMI:s. I've located the issue and I should be able to create a fix for it. Separate PR I guess. Your call if it should be included in this release or not. If I can fix it quickly I think it would make sense to include it.
I was thinking of doing a release on Friday, if you can't get a fix out in time it's not a huge deal. Also I think this code, the adaptor handling parts, could do with simplifying or refactoring. For instance the separate I7 and I5 handling could be DRY'd up. But that's for another time. |
anglerfish-explore
command as entrypoint