Running Trans-ABySS in threaded mode, but ABYSS seems to be running single-threaded #33

Open
elissasoroj opened this issue Jan 12, 2025 · 4 comments

@elissasoroj

Hello,

I am trying to process a large number of assemblies under a bit of a time crunch. I am running Trans-ABySS with the following command:

transabyss --pe krakennp_SRR20074402_out_1.fq.gz krakennp_SRR20074402_out_2.fq.gz krakennp_SRR20074403_out_1.fq.gz krakennp_SRR20074403_out_2.fq.gz krakennp_SRR20074404_out_1.fq.gz krakennp_SRR20074404_out_2.fq.gz krakennp_SRR29324688_out_1.fq.gz krakennp_SRR29324688_out_2.fq.gz krakennp_SRR29324689_out_1.fq.gz krakennp_SRR29324689_out_2.fq.gz krakennp_SRR29324700_out_1.fq.gz krakennp_SRR29324700_out_2.fq.gz krakennp_SRR29324701_out_1.fq.gz krakennp_SRR29324701_out_2.fq.gz -k 32 --name crichardii_ncbiCrHAM_transabyss_k32_out.fa --threads 18

Trans-ABySS seems to initialize fine:

Found Trans-ABySS directory at: /home/elissa/miniconda3/envs/abyss
Found Trans-ABySS `bin` directory at: /home/elissa/miniconda3/envs/abyss/bin
Found script at: /home/elissa/miniconda3/envs/abyss/bin/skip_psl_self.awk
Found script at: /home/elissa/miniconda3/envs/abyss/bin/skip_psl_self_ss.awk
Found `abyss-pe' at /home/elissa/miniconda3/envs/abyss/bin/abyss-pe
Found `MergeContigs' at /home/elissa/miniconda3/envs/abyss/bin/MergeContigs
Found `abyss-filtergraph' at /home/elissa/miniconda3/envs/abyss/bin/abyss-filtergraph
Found `abyss-junction' at /home/elissa/miniconda3/envs/abyss/bin/abyss-junction
Found `blat' at /home/elissa/miniconda3/envs/abyss/bin/blat
Found `abyss-map' at /home/elissa/miniconda3/envs/abyss/bin/abyss-map
# CPU(s) available:     80
# thread(s) requested:  18
# thread(s) to use:     18

But then it takes about 6 hours to read in one fq file at a time and discard reads (seems to be using these settings: ABYSS -k32 -q3 -e2 -E0 -c2 --coverage-hist=coverage.hist ...).

This seems like a parallelizable step to me. Or is this just standard behavior?

I am getting this error at the very beginning of the run. I thought it was not that important since it did not seem to interfere with the process for others (e.g. #26). However, I see the parameter j=18 up above the error, so perhaps it is related?

CMD: bash -euo pipefail -c 'abyss-pe graph=adj --directory=/mnt/pinky/elissa/1n2n/transabyss/crichardii k=32 name=crichardii_ncbiCrHAM_transabyss_k32_out.fa E=0 e=2 c=2 j=18 crichardii_ncbiCrHAM_transabyss_k32_out.fa-1.fa crichardii_ncbiCrHAM_transabyss_k32_out.fa-1.adj q=3 se="/mnt/pinky/elissa/1n2n/kraken/crichardii/krakennp_SRR20074402_out_1.fq.gz /mnt/pinky/elissa/1n2n/kraken/crichardii/krakennp_SRR20074402_out_2.fq.gz /mnt/pinky/elissa/1n2n/kraken/crichardii/krakennp_SRR20074403_out_1.fq.gz /mnt/pinky/elissa/1n2n/kraken/crichardii/krakennp_SRR20074403_out_2.fq.gz /mnt/pinky/elissa/1n2n/kraken/crichardii/krakennp_SRR20074404_out_1.fq.gz /mnt/pinky/elissa/1n2n/kraken/crichardii/krakennp_SRR20074404_out_2.fq.gz /mnt/pinky/elissa/1n2n/kraken/crichardii/krakennp_SRR29324688_out_1.fq.gz /mnt/pinky/elissa/1n2n/kraken/crichardii/krakennp_SRR29324688_out_2.fq.gz /mnt/pinky/elissa/1n2n/kraken/crichardii/krakennp_SRR29324689_out_1.fq.gz /mnt/pinky/elissa/1n2n/kraken/crichardii/krakennp_SRR29324689_out_2.fq.gz /mnt/pinky/elissa/1n2n/kraken/crichardii/krakennp_SRR29324700_out_1.fq.gz /mnt/pinky/elissa/1n2n/kraken/crichardii/krakennp_SRR29324700_out_2.fq.gz /mnt/pinky/elissa/1n2n/kraken/crichardii/krakennp_SRR29324701_out_1.fq.gz /mnt/pinky/elissa/1n2n/kraken/crichardii/krakennp_SRR29324701_out_2.fq.gz"'
make: Entering directory '/mnt/pinky/elissa/1n2n/transabyss/crichardii'
dirname: missing operand
Try 'dirname --help' for more information.
ABYSS -k32 -q3 -e2 -E0 -c2    --coverage-hist=coverage.hist -s crichardii_ncbiCrHAM_transabyss_k32_out.fa-bubbles.fa  -o crichardii_ncbiCrHAM_transabyss_k32_out.fa-1.fa  /mnt/pinky/elissa/1n2n/kraken/crichardii/krakennp_SRR20074402_out_1.fq.gz /mnt/pinky/elissa/1n2n/kraken/crichardii/krakennp_SRR20074402_out_2.fq.gz /mnt/pinky/elissa/1n2n/kraken/crichardii/krakennp_SRR20074403_out_1.fq.gz /mnt/pinky/elissa/1n2n/kraken/crichardii/krakennp_SRR20074403_out_2.fq.gz /mnt/pinky/elissa/1n2n/kraken/crichardii/krakennp_SRR20074404_out_1.fq.gz /mnt/pinky/elissa/1n2n/kraken/crichardii/krakennp_SRR20074404_out_2.fq.gz /mnt/pinky/elissa/1n2n/kraken/crichardii/krakennp_SRR29324688_out_1.fq.gz /mnt/pinky/elissa/1n2n/kraken/crichardii/krakennp_SRR29324688_out_2.fq.gz /mnt/pinky/elissa/1n2n/kraken/crichardii/krakennp_SRR29324689_out_1.fq.gz /mnt/pinky/elissa/1n2n/kraken/crichardii/krakennp_SRR29324689_out_2.fq.gz /mnt/pinky/elissa/1n2n/kraken/crichardii/krakennp_SRR29324700_out_1.fq.gz /mnt/pinky/elissa/1n2n/kraken/crichardii/krakennp_SRR29324700_out_2.fq.gz /mnt/pinky/elissa/1n2n/kraken/crichardii/krakennp_SRR29324701_out_1.fq.gz /mnt/pinky/elissa/1n2n/kraken/crichardii/krakennp_SRR29324701_out_2.fq.gz

Any help is greatly appreciated. Sorry if I'm missing something obvious.

~Elissa

@kmnip
Collaborator

kmnip commented Jan 12, 2025

Hi @elissasoroj,

If I remember correctly, ABySS (without the Bloom filter de Bruijn graph) can read multiple read files at the same time only when it is run with MPI. Trans-ABySS doesn't run ABySS with MPI enabled.
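
To get a feel for how much of the loading time is plain decompression and parsing, one could time a single pass over one input file outside of ABySS. This is only a rough diagnostic sketch, not a fix: `count_reads` is a hypothetical helper (not part of ABySS or Trans-ABySS), and it assumes standard four-line FASTQ records.

```shell
# Hypothetical helper: count reads in a gzipped FASTQ file.
# Assumes every record is exactly four lines (header, sequence, '+', qualities).
count_reads() {
    gzip -dc "$1" | awk 'END { print NR / 4 }'
}

# Example usage (file name taken from the command in this issue):
#   time count_reads krakennp_SRR20074402_out_1.fq.gz
```

If this single pass already takes a large share of the per-file time seen in the ABYSS log, the serial reading is probably dominated by decompression and I/O rather than by the assembler itself.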

The dirname: missing operand error is indeed the same issue as #26. The solution for this issue is in my comment here:
#26 (comment)

j=18 tells abyss-pe to use 18 threads in its workflow. I don't think that is related to this issue.
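
One way to check what the running ABYSS stage is actually doing is to look at its thread count while it loads reads. This is a minimal sketch, assuming a Linux machine with the procps `ps`; `show_threads` is a hypothetical helper name, not part of ABySS or Trans-ABySS.

```shell
# Hypothetical helper: print PID, thread count (NLWP), CPU usage and command
# name for every running process with the given name (default: ABYSS).
show_threads() {
    local name="${1:-ABYSS}"
    # -C selects processes by command name; the trailing '=' on each
    # column suppresses the header line, so no output means no match.
    ps -C "$name" -o pid= -o nlwp= -o pcpu= -o comm= \
        || echo "no running process named $name"
}

# Example usage, while the assembly is running:
#   show_threads ABYSS
```

A thread count of 1 during read loading would be consistent with the ABYSS stage itself running serially, even though j=18 is passed to abyss-pe for the rest of the workflow.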

Do you have to use Trans-ABySS in your work?
If not, you can try RNA-Bloom: https://github.com/bcgsc/RNA-Bloom
I developed it for reference-free transcriptome assembly. It should work well for your time crunch.

Ka Ming

@kmnip kmnip self-assigned this Jan 12, 2025
@kmnip kmnip added the question label Jan 12, 2025
@elissasoroj
Author

Hi Ka Ming,

Thanks for the quick reply! I am currently testing different approaches, so I will give RNA-Bloom a try!

I'd still like to try out Trans-ABySS if possible. Is there a setting that will allow me to run ABySS in parallel? For example, is there a way to run it with the Bloom filter de Bruijn graph?

Thanks again,
~Elissa

@kmnip
Collaborator

kmnip commented Jan 12, 2025

I tried the Bloom filter DBG approach in ABySS a long time ago, but it produced a worse transcriptome assembly at the time. I decided to stick with the original DBG approach. So, I wouldn't recommend switching to the Bloom filter DBG.
Sorry, I don't think there is a solution to the issue.

@elissasoroj
Author

Alright, thank you so much! I appreciate it!
