make ASVs using --cluster_unoise #18

Open
colinbrislawn opened this issue Nov 8, 2018 · 6 comments

@colinbrislawn
Collaborator

Janet and others have expressed interest in having Hundo support denoising methods in addition to traditional OTU clustering.

This is something I'm interested in implementing, and I would appreciate any support or advice that you have. Dan (@dtnaylor124) expressed interest in helping to test this.

@colinbrislawn
Collaborator Author

My main question about implementation is how to perform chimera checking when building ASVs. The UNOISE developer recommends filtering chimeras after denoising, while Hundo currently searches for chimeras before clustering.

Example unoise order:

vsearch --cluster_unoise uniques.fa --sizein --sizeout --centroids zotus_chim.fa 
vsearch --sortbysize zotus_chim.fa --output zotus_sorted.fa
vsearch --uchime3_denovo zotus_sorted.fa --nonchimeras zotus.fa
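To make the step order explicit, here is a minimal Python sketch of how a denoising rule might assemble these three vsearch calls. The helpers `unoise_commands` and `run_unoise` are hypothetical (not part of Hundo); the flags and file names are exactly those in the example above, and actually running the commands assumes vsearch is on PATH.

```python
import shutil
import subprocess


def unoise_commands(uniques="uniques.fa", prefix="zotus"):
    """Return the vsearch argument lists for denoising, then chimera removal.

    Hypothetical helper: chimera filtering comes after --cluster_unoise,
    mirroring the example order rather than Hundo's current workflow.
    """
    chim = f"{prefix}_chim.fa"
    sorted_fa = f"{prefix}_sorted.fa"
    final = f"{prefix}.fa"
    return [
        # 1. Denoise dereplicated reads, keeping size annotations.
        ["vsearch", "--cluster_unoise", uniques, "--sizein", "--sizeout",
         "--centroids", chim],
        # 2. Sort the denoised sequences by abundance.
        ["vsearch", "--sortbysize", chim, "--output", sorted_fa],
        # 3. Remove chimeras de novo with UCHIME3.
        ["vsearch", "--uchime3_denovo", sorted_fa, "--nonchimeras", final],
    ]


def run_unoise(uniques="uniques.fa", prefix="zotus"):
    """Run the three steps in order, stopping at the first failure."""
    if shutil.which("vsearch") is None:
        raise RuntimeError("vsearch not found on PATH")
    for cmd in unoise_commands(uniques, prefix):
        subprocess.run(cmd, check=True)


if __name__ == "__main__":
    # Print the commands instead of running them.
    for cmd in unoise_commands():
        print(" ".join(cmd))
```

Keeping command construction separate from execution makes the order easy to check without vsearch installed.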

Current Hundo workflow:

dereplicate_sequences
precluster_sequences
run_denovo_chimera_filter
run_reference_chimera_filter
extract_filtered_sequences
pull_seqs_from_samples_with_filtered_sequences
cluster_sequences
run_aligner
compile_counts

Possible Hundo workflow for ASVs ❓

dereplicate_sequences
denoise_sequences         (new rule using --cluster_unoise)
run_denovo_chimera_filter (new rule using --uchime3_denovo)
run_reference_chimera_filter
extract_filtered_sequences
pull_seqs_from_samples_with_filtered_sequences
run_aligner
compile_counts            (do we have to update this rule?)

... If we move chimera checking afterward and use --search_exact, maybe we can avoid the extra filtering rules and jump directly to building the .biom table. 🤔

dereplicate_sequences
denoise_sequences         (new rule using --cluster_unoise)
run_denovo_chimera_filter (new rule using --uchime3_denovo)
run_reference_chimera_filter
run_aligner
compile_counts            (new rule, using --search_exact maybe?)
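To illustrate what a `--search_exact`-based compile_counts step would compute, here is a pure-Python sketch (not vsearch itself): a read counts toward an ASV only if it is identical to that ASV's sequence, giving a per-sample count table. The function name, the input shapes, and the case-insensitive comparison are assumptions of this sketch.

```python
from collections import Counter


def exact_match_table(asvs, samples):
    """Count reads per sample that exactly match an ASV sequence.

    Hypothetical helper.
    asvs: dict of ASV id -> sequence
    samples: dict of sample name -> list of read sequences
    Returns dict of ASV id -> {sample: count}; unmatched reads are dropped.
    """
    # Index ASVs by sequence so each read is one dict lookup
    # (case-insensitive: an assumption of this sketch).
    by_seq = {seq.upper(): asv_id for asv_id, seq in asvs.items()}
    table = {asv_id: Counter() for asv_id in asvs}
    for sample, reads in samples.items():
        for read in reads:
            asv_id = by_seq.get(read.upper())
            if asv_id is not None:
                table[asv_id][sample] += 1
    return {asv_id: dict(counts) for asv_id, counts in table.items()}


if __name__ == "__main__":
    asvs = {"ASV1": "ACGTACGT", "ASV2": "ACGTACGA"}
    samples = {
        "s1": ["ACGTACGT", "ACGTACGT", "TTTTTTTT"],
        "s2": ["acgtacga", "ACGTACGT"],
    }
    print(exact_match_table(asvs, samples))
    # {'ASV1': {'s1': 2, 's2': 1}, 'ASV2': {'s2': 1}}
```

Because exact matching needs no alignment, this step could replace run_aligner for ASVs. vsearch documents table outputs such as `--otutabout` and `--biomout` for its search commands, which may let the rule write the .biom table directly; check the vsearch manual before relying on this.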

@brwnj
Contributor

brwnj commented Nov 16, 2018

The last option is cleanest, but how do we support people who want to keep clustering their data into OTUs?

@colinbrislawn
Collaborator Author

We could maintain the existing OTU clustering workflow as we did for the BLAST and vsearch aligners: add a new @click.option and an if statement to divide up the rules for the two workflows.
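A minimal sketch of that branching, written as plain Python so it runs without click. In Hundo, the `method` value would presumably come from a new `@click.option`; the option values `"otu"` and `"asv"` and the function `select_rules` are hypothetical. The rule names come from the workflows listed earlier in this thread, with the ASV list following the last proposal.

```python
# Existing OTU clustering workflow (rule names from this thread).
OTU_RULES = [
    "dereplicate_sequences",
    "precluster_sequences",
    "run_denovo_chimera_filter",
    "run_reference_chimera_filter",
    "extract_filtered_sequences",
    "pull_seqs_from_samples_with_filtered_sequences",
    "cluster_sequences",
    "run_aligner",
    "compile_counts",
]

# Proposed ASV workflow: denoise first, chimera-check afterward.
ASV_RULES = [
    "dereplicate_sequences",
    "denoise_sequences",
    "run_denovo_chimera_filter",
    "run_reference_chimera_filter",
    "run_aligner",
    "compile_counts",
]


def select_rules(method="otu"):
    """Return the ordered rule list for a hypothetical --method value."""
    if method == "otu":
        return list(OTU_RULES)
    if method == "asv":
        return list(ASV_RULES)
    raise ValueError(f"unknown method {method!r}; expected 'otu' or 'asv'")
```

Defaulting to `"otu"` keeps current behavior for existing users, and the ASV path is used only when requested.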

I guess we could also implement these as separate steps that a user can turn on or off in any combination, but that could cause conflicts. A user probably shouldn't run precluster_sequences before denoise_sequences, and running run_denovo_chimera_filter both before and after denoising seems unnecessary.
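If steps were user-toggled, a guard like the following could reject the two conflicts named above. `check_steps` is a hypothetical helper that encodes only those two rules. It takes the user's ordered list of step names and returns human-readable problems.

```python
def check_steps(steps):
    """Return a list of problems with an ordered list of step names.

    Hypothetical helper: checks only the two conflicts raised in this thread.
    """
    problems = []
    # Preclustering before denoising would merge the variants UNOISE should resolve.
    if "precluster_sequences" in steps and "denoise_sequences" in steps:
        if steps.index("precluster_sequences") < steps.index("denoise_sequences"):
            problems.append(
                "precluster_sequences should not run before denoise_sequences")
    # De novo chimera filtering should happen once, not before and after.
    if steps.count("run_denovo_chimera_filter") > 1:
        problems.append("run_denovo_chimera_filter is listed more than once")
    return problems
```

Returning a list instead of raising on the first problem lets the CLI report every conflict at once.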

@brwnj
Contributor

brwnj commented Nov 16, 2018

Seems reasonable to me. Feel like taking a stab at it? I can review your branch when you're ready.

@colinbrislawn
Collaborator Author

Yes, please! Thank you, Joe.

I'm still rustling up the funding to build this, so this is a month away. 💵 🎅

colinbrislawn changed the title from "make ESVs using --cluster_unoise" to "make ASVs using --cluster_unoise" on Apr 27, 2019
@colinbrislawn
Collaborator Author

@brwnj I've made good progress on this! Can you review my code under #21?

I still need to benchmark the resulting OTUs against the ASVs, but the technique works.

Feedback welcome!
