Residual or excessive barcode sequences in Dorado Demux #420

Closed
starsyi opened this issue Oct 17, 2023 · 3 comments

starsyi commented Oct 17, 2023

I'm not sure whether the authors are aware of the issue with incomplete removal of sequencing adapters and barcodes.
Specifically, tools such as Dorado Demux, Porechop, and Guppy_barcoder do not completely remove adapters and barcodes when trimming. Residual sequences of 1-15 bp often remain at the 5' end, and similar residues occur at the 3' end. Is it possible to optimize the trimming to solve this?
For example:
Raw sequencing read:

ATGTTATGTGGCTGCCTTCGTTCAGTTACGTATTGCTA ^^^ AGGTTAA ^^^ CCAAACCCAACAACCTAGATAGGC ^^^ CAGCACCT ^^^ CTGGACCTGAGGCCTCTGGAGGCTACTGATGATGCCTGCTGTGAACGCAGACACTGGTGTGATGCGATGCCTGCGCCTGCAGCGGCAGTGCCCTGGGCACGGTTTTGAGCTTGTACCCAGCGCTGCTTTTGCCTTGCTCTGTGACCCCAGGCAAGCTGCCTCACCTCTCTGGGCCAGTTTCCCCATCGTACAGTGGTGCTGCACACCCTGGCCCTGGCCCCGAGGTGGCTGGGAGGTGGCTCCTCAAACAGCCGCTGTCTCATCAGTGCCCGGTGCTGGGTCAGGGATCGACTGAGGCTCTGAGCTAACTGGGAAACACAGTGGCCT ^^^ AGGTGCTG ^^^ GCCTATCTAGGTTGTTGGGTTTGGTGAGCCTTCCTGAATGGTT

In the read above, the adapter, the barcode, and the flanking sequences on both sides of the barcode are separated by ^^^.
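
To make the layout easier to check, here is a minimal sketch that splits the annotated read on `^^^` and tests whether the 3' segments are reverse complements of the 5' ones. The segment names (`adapter`, `flank5_up`, and so on) are my own labels based on the description above, not Dorado terminology, and the insert is shortened to its first 14 and last 10 bases for brevity.

```python
# Assumption: segment roles follow the description in this issue.
# The insert is shortened (first 14 + last 10 bases) to keep the example small.
COMPLEMENT = str.maketrans("ACGT", "TGCA")


def reverse_complement(seq: str) -> str:
    """Return the reverse complement of an uppercase DNA string."""
    return seq.translate(COMPLEMENT)[::-1]


def split_annotated(read: str, sep: str = "^^^") -> list:
    """Split an annotated read on the separator and strip whitespace."""
    return [part.strip() for part in read.split(sep)]


annotated = (
    "ATGTTATGTGGCTGCCTTCGTTCAGTTACGTATTGCTA ^^^ AGGTTAA ^^^ "
    "CCAAACCCAACAACCTAGATAGGC ^^^ CAGCACCT ^^^ "
    "CTGGACCTGAGGCCACAGTGGCCT ^^^ "
    "AGGTGCTG ^^^ GCCTATCTAGGTTGTTGGGTTTGGTGAGCCTTCCTGAATGGTT"
)

segments = split_annotated(annotated)
adapter, flank5_up, barcode, flank5_down, insert, flank3, tail3 = segments

# The 3' flank is the reverse complement of the flank just before the insert.
print(reverse_complement(flank5_down) == flank3)          # True
# The 3' tail starts with the reverse complement of the barcode.
print(tail3.startswith(reverse_complement(barcode)))      # True
```

Both checks hold for the sequences in this read, so the insert boundaries marked above look consistent on both ends.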

Trimmed sequence:

GCACCTCTGGACCTGAGGCCTCTGGAGGCTACTGATGATGCCTGCTGTGAACGCAGACACTGGTGTGATGCGATGCCTGCGCCTGCAGCGGCAGTGCCCTGGGCACGGTTTTGAGCTTGTACCCAGCGCTGCTTTTGCCTTGCTCTGTGACCCCAGGCAAGCTGCCTCACCTCTCTGGGCCAGTTTCCCCATCGTACAGTGGTGCTGCACACCCTGGCCCTGGCCCCGAGGTGGCTGGGAGGTGGCTCCTCAAACAGCCGCTGTCTCATCAGTGCCCGGTGCTGGGTCAGGGATCGACTGAGGCTCTGAGCTAACTGGGAAACACAGTGGCC

The trimmed sequence still begins with GCACCT, which is part of the barcode flank sequence (CAGCACCT), and one base ('T') was removed from the 3' end of the insert.

The actual insert sequence should be:

CTGGACCTGAGGCCTCTGGAGGCTACTGATGATGCCTGCTGTGAACGCAGACACTGGTGTGATGCGATGCCTGCGCCTGCAGCGGCAGTGCCCTGGGCACGGTTTTGAGCTTGTACCCAGCGCTGCTTTTGCCTTGCTCTGTGACCCCAGGCAAGCTGCCTCACCTCTCTGGGCCAGTTTCCCCATCGTACAGTGGTGCTGCACACCCTGGCCCTGGCCCCGAGGTGGCTGGGAGGTGGCTCCTCAAACAGCCGCTGTCTCATCAGTGCCCGGTGCTGGGTCAGGGATCGACTGAGGCTCTGAGCTAACTGGGAAACACAGTGGCCT
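
A comparison like the one above can be automated. The sketch below is only an illustration, not how Dorado works: it uses Python's `difflib.SequenceMatcher` to find the longest block shared by a trimmed read and the expected insert, then reports what was left over or lost at each end. The function name `trim_report` and the shortened sequences (first 14 and last 10 bases of the insert) are assumptions made for the example.

```python
from difflib import SequenceMatcher


def trim_report(trimmed: str, expected: str) -> dict:
    """Compare a trimmed read with the expected insert.

    Returns the bases left over (residual) or lost (overtrimmed) at each end,
    relative to the longest block the two sequences share.
    """
    # autojunk=False: with a 4-letter alphabet and sequences of 200+ bases,
    # the default heuristic would treat every base as "junk".
    matcher = SequenceMatcher(None, trimmed, expected, autojunk=False)
    m = matcher.find_longest_match(0, len(trimmed), 0, len(expected))
    return {
        "residual_5p": trimmed[:m.a],
        "overtrimmed_5p": expected[:m.b],
        "residual_3p": trimmed[m.a + m.size:],
        "overtrimmed_3p": expected[m.b + m.size:],
    }


# Shortened stand-ins for the sequences in this issue (assumption: only the
# first 14 and last 10 bases of the insert are kept).
expected_insert = "CTGGACCTGAGGCCACAGTGGCCT"
trimmed_read = "GCACCT" + "CTGGACCTGAGGCCACAGTGGCC"

report = trim_report(trimmed_read, expected_insert)
print(report)
```

For these inputs the report shows `GCACCT` as the 5' residual and `T` as the base overtrimmed at the 3' end, matching the manual comparison. This only works when the expected insert is known, for example with synthetic or reference-aligned reads.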

starsyi commented Oct 17, 2023

I believe that obtaining the accurate insert sequence is crucial for genome assembly, sequence alignment, and the study of sequence features (especially for cfDNA, scRNA, etc.). I hope better optimizations and solutions can be found.

@tijyojwad
Collaborator

Hi @starsyi - thank you for the feedback and for the detailed analysis. I completely agree that we should improve the accuracy of our trimming. We have an ongoing effort to add adapter trimming as well, so we'll investigate ways to enhance the accuracy of the trim positions.

@malton-ont
Collaborator

dorado barcoding has gone through several evolutions since this issue was raised, and adapter/primer trimming has also been added. I'm going to close it now - if there are specific cases that are still causing problems, please raise a new issue.
