Add UMI support #72
Changes from 10 commits
9308f2c
a3a7cdf
cc7fd2c
e01aa3c
92483fa
7713574
51523bd
9f06527
17a9667
a52fd06
4b027d8
Original file line number | Diff line number | Diff line change |
---|---|---|
@@ -0,0 +1 @@ | ||
devcontainer.json |
Original file line number | Diff line number | Diff line change |
---|---|---|
@@ -1,24 +1,29 @@ | ||
# More adaptors can be added manually, following the format below. | ||
# The position of an index within an adaptor is represented by the delimiter "-NNN-". | ||
# The position of an index within an adaptor is represented by the delimiter "<N>". | ||
# The position and length of the UMI within an adaptor is represented by the delimiter "<U#>" where # is the length of the UMI. | ||
# The indexes themselves are represented in the sample sheet. | ||
|
||
# Ilumina unique dual indexes, see https://web.archive.org/web/20231129095351/https://support-docs.illumina.com/SHARE/AdapterSequences/Content/SHARE/AdapterSeq/Illumina_DNA/IlluminaUDIndexes.htm | ||
illumina_ud: | ||
i5: AATGATACGGCGACCACCGAGATCTACAC-NNN-TCGTCGGCAGCGTC | ||
i7: CAAGCAGAAGACGGCATACGAGAT-NNN-GTCTCGTGGGCTCGG | ||
i5: AATGATACGGCGACCACCGAGATCTACAC<N>TCGTCGGCAGCGTC | ||
i7: CAAGCAGAAGACGGCATACGAGAT<N>GTCTCGTGGGCTCGG | ||
|
||
truseq: | ||
i5: AATGATACGGCGACCACCGAGATCTACACTCTTTCCCTACACGACGCTCTTCCGATCT | ||
i7: GATCGGAAGAGCACACGTCTGAACTCCAGTCAC-NNN-ATCTCGTATGCCGTCTTCTGCTTG | ||
i7: GATCGGAAGAGCACACGTCTGAACTCCAGTCAC<N>ATCTCGTATGCCGTCTTCTGCTTG | ||
|
||
truseq_dual: | ||
i5: AATGATACGGCGACCACCGAGATCTACAC-NNN-ACACTCTTTCCCTACACGACGCTCTTCCGATCT | ||
i7: GATCGGAAGAGCACACGTCTGAACTCCAGTCAC-NNN-ATCTCGTATGCCGTCTTCTGCTTG | ||
i5: AATGATACGGCGACCACCGAGATCTACAC<N>ACACTCTTTCCCTACACGACGCTCTTCCGATCT | ||
i7: GATCGGAAGAGCACACGTCTGAACTCCAGTCAC<N>ATCTCGTATGCCGTCTTCTGCTTG | ||
|
||
truseq_umi: | ||
i5: AATGATACGGCGACCACCGAGATCTACAC<N>ACACTCTTTCCCTACACGACGCTCTTCCGATCT | ||
i7: GATCGGAAGAGCACACGTCTGAACTCCAGTCAC<N><U9>ATCTCGTATGCCGTCTTCTGCTTG | ||
|
||
nextera_legacy: | ||
i5: AATGATACGGCGACCACCGAGATCTACACGCCTCCCTCGCGCCATCAG | ||
i7: CAAGCAGAAGACGGCATACGAGAT-NNN-CGGTCTGCCTTGCCAGCCCGCTCAG | ||
i7: CAAGCAGAAGACGGCATACGAGAT<N>CGGTCTGCCTTGCCAGCCCGCTCAG | ||
|
||
nextera_dual: | ||
i5: AATGATACGGCGACCACCGAGATCTACAC-NNN-GTCTCGTGGGCTCGG | ||
i7: CAAGCAGAAGACGGCATACGAGAT-NNN-ATCTCGTATGCCGTCTTCTGCTTG | ||
i5: AATGATACGGCGACCACCGAGATCTACAC<N>GTCTCGTGGGCTCGG | ||
i7: CAAGCAGAAGACGGCATACGAGAT<N>ATCTCGTATGCCGTCTTCTGCTTG |
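To make the new delimiter format concrete, here is a minimal sketch of how an adaptor template containing `<N>` and `<U#>` could be split into its parts. The helper name `parse_adaptor` and the segment labels are assumptions for illustration only and are not taken from this PR.

```python
import re

# Hypothetical helper, not the PR's implementation: split an adaptor
# template into literal sequence, index placeholder and UMI placeholder.
TOKEN = re.compile(r"<N>|<U(\d+)>")


def parse_adaptor(template):
    """Return a list of (kind, value) segments in template order."""
    segments = []
    pos = 0
    for m in TOKEN.finditer(template):
        if m.start() > pos:
            # Literal adaptor sequence before the placeholder
            segments.append(("seq", template[pos:m.start()]))
        if m.group(0) == "<N>":
            # Index position; its length comes from the sample sheet
            segments.append(("index", None))
        else:
            # UMI position; its length is encoded in the token itself
            segments.append(("umi", int(m.group(1))))
        pos = m.end()
    if pos < len(template):
        segments.append(("seq", template[pos:]))
    return segments


truseq_umi_i7 = "GATCGGAAGAGCACACGTCTGAACTCCAGTCAC<N><U9>ATCTCGTATGCCGTCTTCTGCTTG"
for kind, value in parse_adaptor(truseq_umi_i7):
    print(kind, value)
```

The index segment carries no length because, as the comments in the file say, the indexes themselves live in the sample sheet.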
Original file line number | Diff line number | Diff line change |
---|---|---|
|
@@ -38,6 +38,7 @@ | |
entry_points={ | ||
"console_scripts": [ | ||
"anglerfish=anglerfish.anglerfish:anglerfish", | ||
"anglerfish-explore=anglerfish.explore.cli:main", | ||
Nice! Although I don't know what the effect of this is. Does it make the explore command executable or add it to the path?
With conda, and I guess with most package managers, both! So now this will work:
I'm thinking maybe for a 1.0 release we can unify these commands. I'm open to having them as subcommands, e.g.
||
], | ||
}, | ||
zip_safe=False, | ||
|
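As a rough illustration of what the added `console_scripts` line declares, the entry point string can be inspected with the standard library. At install time, installers such as pip generate a small launcher named `anglerfish-explore` in the environment's script directory, which is on the PATH while the environment is active. The sketch below only parses the declaration and does not import anglerfish; it assumes Python 3.9 or newer for the `module` and `attr` attributes.

```python
from importlib.metadata import EntryPoint

# Build the entry point object from the same string used in setup.py.
ep = EntryPoint(
    name="anglerfish-explore",
    value="anglerfish.explore.cli:main",
    group="console_scripts",
)

# The launcher script is named after ep.name; running it imports
# ep.module and calls the object named by ep.attr.
print(ep.name, ep.module, ep.attr)
```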
Does this really work? If there are substitutions/deletions/insertions in the alignment, both the reference and query nucleotide will be in the cs string, right?
For example:
cg:Z:6M2D21M cs:Z::1*at:2*ac:1-ac:21
Or am I missing something?
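To illustrate the point about the cs string, here is a minimal sketch of a tokenizer for the short-form cs tag, run on the example above. The function name `parse_cs` is hypothetical, and only the `:`, `*`, `+` and `-` operations are handled (no intron `~` operations and no long-form `=` matches).

```python
import re

# One alternative per short-form cs operation.
CS_OP = re.compile(r":(\d+)|\*([a-z])([a-z])|\+([a-z]+)|-([a-z]+)")


def parse_cs(cs):
    """Split a short-form cs value (without the 'cs:Z:' prefix) into operations."""
    ops = []
    pos = 0
    for m in CS_OP.finditer(cs):
        if m.start() != pos:
            raise ValueError(f"unexpected text in cs string at offset {pos}")
        if m.group(1) is not None:
            ops.append(("match", int(m.group(1))))
        elif m.group(2) is not None:
            # Substitution: reference base first, then query base
            ops.append(("sub", m.group(2), m.group(3)))
        elif m.group(4) is not None:
            ops.append(("ins", m.group(4)))
        else:
            ops.append(("del", m.group(5)))
        pos = m.end()
    if pos != len(cs):
        raise ValueError(f"unexpected text in cs string at offset {pos}")
    return ops


example = parse_cs(":1*at:2*ac:1-ac:21")
print(example)
```

Each substitution does carry both bases, reference first, which is what makes it possible to tell a masked reference position from a genuine mismatch.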
It works if you add the correct number of N's into the template sequence. So we need to know that the index region is e.g., exactly 10 nt index + 9 nt UMI long. That way we can read the sequence directly from the CS string. Unrelated example:
Insertions and deletions will of course affect this perfect picture, but I don't think too terribly.
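A minimal sketch of the "correct number of N's" idea, assuming the index length is known from the sample sheet: each placeholder in the template is expanded into a run of N's before alignment. The helper `mask_template` and its `index_len` parameter are hypothetical names, not the PR's API.

```python
import re


def mask_template(template, index_len):
    """Replace <N> with index_len N's and <U#> with # N's."""
    def repl(m):
        # group(1) is only set for the UMI token, where it holds the length
        n = index_len if m.group(1) is None else int(m.group(1))
        return "N" * n

    return re.sub(r"<N>|<U(\d+)>", repl, template)


# 10 nt index + 9 nt UMI, as in the comment above
masked = mask_template(
    "GATCGGAAGAGCACACGTCTGAACTCCAGTCAC<N><U9>ATCTCGTATGCCGTCTTCTGCTTG",
    index_len=10,
)
print(masked)
```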
Fair enough. I think we could use the fact that we're looking specifically for cases where there is an `n` to the left inside the `*na*` section. But we can leave that for later
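One possible reading of that suggestion, as a sketch only: when the reference is masked with N's, every index or UMI base shows up in the cs string as a substitution whose reference base is `n`, so the query bases can be collected from just those operations. The function `barcode_from_cs` is a hypothetical name, and insertions or deletions inside the masked region are not handled here.

```python
import re


def barcode_from_cs(cs):
    """Collect query bases from substitutions whose reference base is 'n'."""
    # '*n?' means: reference base n, query base ?. Ordinary mismatches
    # such as '*at' have a non-n reference base and are skipped.
    return "".join(re.findall(r"\*n([a-z])", cs)).upper()


print(barcode_from_cs(":5*na*nc*ng*nt:7"))
```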