Fail to assembly de-novo genome #2

Open
Dv1t opened this issue Nov 18, 2024 · 3 comments

Comments

@Dv1t

Dv1t commented Nov 18, 2024

Hello VStrains team, thank you for developing such a great tool. While using it, I ran into the following problem.
I attempted to assemble the complete HIV genome from this sample: SRR29407826. I used coronaSPAdes and it worked fine.
However, VStrains crashes when assembling it.
First, there was an error related to rev_dict in VStrains_PE_Inference.py. It did not contain lowercase nucleotides and therefore raised a KeyError. I fixed it by replacing:
rev_dict = {"A": "T", "T": "A", "C": "G", "G": "C"}
with this:
rev_dict = { "A": "T", "T": "A", "C": "G", "G": "C", "a": "t", "t": "a", "c": "g", "g": "c" }
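For reference, a case-insensitive alternative to extending the dictionary by hand is a translation table. This is only a minimal sketch, not VStrains code: the function name `reverse_complement` and the handling of `N` are assumptions made for illustration.

```python
# Translation table covering upper- and lowercase bases (assumption: only
# A/C/G/T/N occur; any other character is passed through unchanged).
_COMPLEMENT = str.maketrans("ACGTNacgtn", "TGCANtgcan")


def reverse_complement(seq: str) -> str:
    """Return the reverse complement, preserving the case of each base."""
    return seq.translate(_COMPLEMENT)[::-1]
```

Unlike a plain dictionary lookup, `str.translate` leaves unmapped characters as they are instead of raising a KeyError, which may or may not be the behaviour you want for unexpected symbols.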
But a new issue occurred after these messages in the CLI log:

----------------------Paired-End Information Alignment----------------------
Start aligning reads to gfa nodes
Number of processed reads: 0

VStrains hangs indefinitely and does not proceed any further.

Worth mentioning details
In the same log there is a suspicious message:

INFO - graph kmer size: 0

Also, VStrains can't read the assembly_graph_after_simplification.gfa file (which is the output of SPAdes) unless the version in its header is manually changed from 1.2 to 1.0.
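The manual header change can be scripted. The sketch below assumes the header is a standard GFA `H` line carrying a `VN:Z:` tag; the function name `downgrade_gfa_header` is hypothetical and is not part of VStrains or gfapy. It only relabels the version and does not convert any records that exist only in GFA 1.2.

```python
def downgrade_gfa_header(gfa_text: str, new_version: str = "1.0") -> str:
    """Rewrite the VN:Z: tag on GFA header (H) lines; other lines are untouched."""
    out = []
    for line in gfa_text.splitlines():
        if line.startswith("H\t"):
            fields = line.split("\t")
            # Replace only the version tag, keep any other header tags as-is.
            fields = [
                f"VN:Z:{new_version}" if field.startswith("VN:Z:") else field
                for field in fields
            ]
            line = "\t".join(fields)
        out.append(line)
    return "\n".join(out) + "\n"
```

To apply it to a file, read the GFA as text, pass it through the function, and write the result to a new path so the original SPAdes output is preserved.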

Steps to reproduce

  1. Assemble with SPAdes:
spades.py --corona -1 SRR29407826_1.fastq -2 SRR29407826_2.fastq -o spades_G_SRR29407826
  2. Start VStrains:
vstrains -a spades -g spades_G_SRR29407826/assembly_graph_after_simplification.gfa \
-p spades_G_SRR29407826/contigs.paths \
-o vstrains_G -fwd SRR29407826_1.fastq -rve SRR29407826_2.fastq

Files with reads:
SRR29407826.zip
VStrains log:
vstrains.log
Spades log:
spades.log

@RunpengLuo
Member

RunpengLuo commented Nov 24, 2024

Hi, thanks for trying VStrains, and sorry for the late reply.

  1. The GFA version conflicts with the external Python library used for parsing GFA files (gfapy), which is not up to date with GFA version 1.2. I'll try to fix it later on, but for now manually changing the version to 1.0 in the GFA file should work for parsing. Thanks a lot for pointing this out!

  2. VStrains was not tested with coronaSPAdes, but mainly with SPAdes. I've run your dataset with SPAdes (the common version) + VStrains. It might not be helpful to run VStrains with coronaSPAdes, since it already reports a single collapsed strain and the graph structure has no edges to process further. I've attached the result and the Bandage visualization in case they are helpful.

Feel free to let me know if there are further questions,
John

out_SRR29407826.zip

@Dv1t
Author

Dv1t commented Nov 25, 2024

I've also tried the --rnaviral option of SPAdes, and there are two types of outcome with VStrains when kmer=0:

  1. Strain assembled (but identical to the SPAdes scaffold)
  2. VStrains runs forever

Attaching logs and files for both cases:
First:
vstrains_kmer_0_success.log
spades_kmer_0_success.log
kmer_0_success_reads.zip

Second:
vstrains_kmer_0_fail.log
spades_kmer_0_fail.log
kmer_0_fail_reads.zip

@RunpengLuo
Member

RunpengLuo commented Nov 30, 2024

I think this is likely because the input graph is already disconnected: the inferred k-mer size is 0 and there are only 3 rows in the GFA file. VStrains does not handle disconnected graphs (no edges), since no further graph simplification can be done after SPAdes --rnaviral. I've pushed changes to make sure VStrains exits when such a case occurs. The key reason for running forever is that paired-link inference attempts to infer (k+1)-mer linkage from paired-end reads; when k=0, the inference step infers all 1-mer linkages, and such a search space is infeasible and won't provide any useful information.
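For anyone who wants to check their graph before launching VStrains, a pre-flight test along these lines may help. This is a hedged sketch, not the check that was pushed to the repository: the function names `count_gfa_records` and `has_edges` are hypothetical, and the only assumption about the format is that GFA 1 segment lines start with `S` and link lines start with `L`.

```python
def count_gfa_records(gfa_text: str) -> dict:
    """Count segment (S) and link (L) records in GFA text."""
    counts = {"S": 0, "L": 0}
    for line in gfa_text.splitlines():
        # The record type is the first tab-separated field of each line.
        record_type = line.split("\t", 1)[0]
        if record_type in counts:
            counts[record_type] += 1
    return counts


def has_edges(gfa_text: str) -> bool:
    """True if the graph contains at least one link, i.e. something to untangle."""
    return count_gfa_records(gfa_text)["L"] > 0
```

If `has_edges` returns False, the graph has no links, which matches the situation described above where there is nothing left for strain-level resolution to work with.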
