TaxVamb - KeyError: "Separator 'C' not in sequence identifier #376

Closed
AnthonyC-S opened this issue Dec 10, 2024 · 3 comments

Comments

@AnthonyC-S

I wanted to try out TaxVamb for binning a single metagenomic sample, but ran into this error after training. Here is the log.txt.

I ran TaxVamb with this command (excluding slurm parameters):

vamb bin taxvamb \
    --outdir /workdir/awc93/abx-meta/taxvamb_output/seqid_14 \
    --fasta /workdir/awc93/abx-meta/spades_output/before_treatment/JAX/seqid_14/contigs.fasta \
    -m 250 \
    --abundance_tsv /workdir/awc93/abx-meta/strobealign_output/seqid_14_abundances_250_minlength.tsv \
    --taxonomy /workdir/awc93/abx-meta/taxvamb_input/seqid_14_taxvamb_tax_input_250_minlength.tsv

The contigs were generated with metaSPAdes, and the abundance_tsv was made with strobealign using this command:

strobealign -t 30 --aemb \
    /workdir/awc93/abx-meta/spades_output/before_treatment/JAX/seqid_14/contigs.fasta \
    /workdir/awc93/abx-meta/reads_postqc/before_treatment/JAX/seqid_14/seqid_14.trim_1.fastq \
    /workdir/awc93/abx-meta/reads_postqc/before_treatment/JAX/seqid_14/seqid_14.trim_2.fastq \
    > seqid_14_abundances.tsv

The taxonomy was assigned with Kraken2 and then converted with taxconverter.

The contigs in the contigs.fasta file are named in this manner:

NODE_1_length_224708_cov_8.543349,
NODE_2_length_175860_cov_10.485417
etc.

Python version: 3.10.16
Vamb version: 4.1.4.dev144+g95f1155

How should I be running TaxVamb to avoid this error? Should I be running multiple samples instead of doing a test run with a single sample? Thank you for any help!

@jakobnissen
Member

This appears to be a bug in clustering - or rather, in the bin splitting. I'll look at it later today.

@AnthonyC-S
Author

Thank you! I re-ran it, this time including -o "" in the vamb CLI parameters. It ran successfully, and here is the log: log_successful_run.txt. Would I get better results with bin splitting?
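
For context, here is a minimal sketch of why these contig names trip a separator check. It assumes, based only on the error message, that bin splitting looks for the separator 'C' in every sequence identifier. The helper `missing_separator` is hypothetical and not part of Vamb; it simply lists the identifiers that lack the separator.

```python
def missing_separator(identifiers, separator="C"):
    """Return the identifiers that do not contain the separator.

    Hypothetical helper for illustration only; not part of Vamb's API.
    """
    return [name for name in identifiers if separator not in name]


# Contig names as reported in this issue (metaSPAdes style).
names = [
    "NODE_1_length_224708_cov_8.543349",
    "NODE_2_length_175860_cov_10.485417",
]

# "NODE" and "cov" contain no uppercase 'C', so both names are returned.
print(missing_separator(names))
```

If a check like this returns any names, the identifiers carry no sample prefix to split on, which is consistent with the run succeeding once -o "" was passed.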

@jakobnissen
Member

I have a fix in #377. Thanks for the bug report.
Yes, binsplitting normally gives better results, since there will often be variation between samples. Assembling each sample individually, then binning them together (to make use of co-abundance across samples), and then splitting to yield sample-wise pure bins usually gives the best result.
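
The splitting step described above can be sketched roughly as follows. This is an illustration under stated assumptions, not Vamb's actual implementation: the function `split_bins_by_sample` is hypothetical, and the identifiers are assumed to have the form <sample><separator><contig> (for example `S1CNODE_1`), with the sample name taken as everything before the first separator.

```python
from collections import defaultdict


def split_bins_by_sample(bins, separator="C"):
    """Split each bin into per-sample bins using the identifier prefix.

    `bins` maps a bin name to a collection of contig identifiers of the
    assumed form <sample><separator><contig>. Hypothetical sketch only.
    """
    result = defaultdict(set)
    for bin_name, contigs in bins.items():
        for contig in contigs:
            # Everything before the first separator is treated as the sample.
            sample, sep, _rest = contig.partition(separator)
            if not sep:
                raise KeyError(
                    f"Separator {separator!r} not in sequence identifier {contig!r}"
                )
            result[f"{sample}{separator}{bin_name}"].add(contig)
    return dict(result)


# One bin holding contigs from two samples (made-up identifiers).
bins = {"1": {"S1CNODE_1", "S1CNODE_7", "S2CNODE_3"}}
print(split_bins_by_sample(bins))
```

With a single sample and unprefixed names such as NODE_1_length_224708_cov_8.543349, this sketch raises a KeyError because there is no sample prefix to split on.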
