NextFlow #112

Open
MadeleineOman opened this issue Jan 23, 2025 · 8 comments

Comments

@MadeleineOman

I'm trying to run the Nextflow pipeline. It seems the pipeline creates temporary folders under the "work" folder during the analysis, but during execution it cannot find these folders for some reason (I have verified that they do exist once the pipeline has errored out). I've included a screenshot of the error, as well as the whole .log file. I'm sure this is just some small issue (i.e., a previous step not functioning exactly as expected, so the next step fails completely), but because the error itself isn't very informative, I don't know where within the pipeline to look for troubleshooting.

Any help is appreciated!

2025_01_23_nextflow.log
[Image: screenshot of the error]

@fa8sanger
Collaborator

The log says the "bsub" command wasn't found. This is the program that submits jobs to the LSF queue system. Which queue system are you using?
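
A quick way to see which scheduler commands a machine actually provides is to probe the PATH. This is only a minimal sketch: the helper name `check_scheduler` is made up for illustration, and the three commands checked (`bsub` for LSF, `sbatch` for Slurm, `qsub` for SGE/PBS) are the usual submission commands, not anything taken from the pipeline itself.

```shell
# Report whether a scheduler submission command is on the PATH.
# check_scheduler is a hypothetical helper, not part of the pipeline.
check_scheduler() {
  if command -v "$1" >/dev/null 2>&1; then
    echo "$1: found"
  else
    echo "$1: not found"
  fi
}

# bsub (LSF), sbatch (Slurm), qsub (SGE/PBS)
for cmd in bsub sbatch qsub; do
  check_scheduler "$cmd"
done
```

If all three report "not found", the machine most likely has no cluster scheduler installed and jobs need to run locally.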

@MadeleineOman
Author

Actually we are using a local server, not a cluster. Can we still run the pipeline using Nextflow?

If not, I've taken a look at the manual implementation but also see "bsub" commands there. Would the fix be simply to run the code directly? I.e., instead of
bsub -q basement -G team78-grp -o out -e log -M20000 -R"span[hosts=1] select[mem>20000] rusage[mem=20000]" "/software/CGP/external-apps/bwa-0.7.5a/bwa mem -C /lustre/scratch117/casm/team78/ro4/hs37d5/hs37d5.fa 70#1R1.fastq.gz 70#1R2.fastq > 70#1.sam"
run
bwa mem -C /lustre/scratch117/casm/team78/ro4/hs37d5/hs37d5.fa 70#1R1.fastq.gz 70#1R2.fastq > 70#1.sam
We are using a smaller genome, so I do not think the same level of computation/memory optimization will be required.

@fa8sanger
Collaborator

It should be possible to run locally, but it will take a very long time. I'm not very good with Nextflow, but this is what Google says:

To run Nextflow without LSF, simply set the "executor" option in your Nextflow configuration file to "local"; this will instruct Nextflow to run all pipeline tasks on the machine where you launched the command, effectively bypassing the need for a cluster job scheduler like LSF.
Key points:
Configuration file: Modify your nextflow.config file to include the following line:

process.executor = 'local'

When to use "local": This is suitable for testing your pipeline on a single machine, running small workflows, or when you don't require large-scale cluster computing.
Alternative executors for cluster environments:
Slurm: If your cluster uses Slurm, set process.executor = 'slurm'.
SGE: For Sun Grid Engine, use process.executor = 'sge'.
PBS: For Portable Batch System, set process.executor = 'pbs'.
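
As a rough sketch of the advice above, the executor setting can live in a small extra config file instead of being edited into the pipeline's own nextflow.config. The file name `local_executor.config` is an arbitrary choice for illustration; `-c` is Nextflow's option for adding a config file on top of the defaults.

```shell
# Write a small override config (the file name is an arbitrary choice).
cat > local_executor.config <<'EOF'
// Run every process on the machine that launched Nextflow
process.executor = 'local'
EOF

# Show what was written
cat local_executor.config

# Then add it on top of the pipeline's own configuration, for example:
#   nextflow -c local_executor.config run NanoSeq_main.nf ...
```

Keeping the override in a separate file leaves the repository's config untouched, which makes it easier to switch back to a cluster profile later.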

@MadeleineOman
Author

Ah, I understand. I've changed the config file to process.executor = 'local' and also changed the profile flag in the nextflow command from -profile lsf_singularity to -profile standard.

I'm getting a new error now:

Image
full output log: 2025_01_30_nextflow.log

I'm not sure where this bwa_mem.pl command is coming from, since I have bwa-mem2 installed in the conda env I'm using to run Nextflow. Using grep, I see bwa_mem.pl is referenced in the NanoSeq/Nextflow/modules/bwa.nf file, but I'm still not sure how to solve this problem.
Once again, thanks for any help!

@fa8sanger
Collaborator

Oh, that's used for remapping bams, which is something you usually don't do. Indeed I think that's only an internal option for the Sanger. From your log it seems you invoked nextflow with:
nextflow run NanoSeq_main.nf -qs 300 -profile standard --ref /research/projects/PBCV1/preliminary/raw_data/sequencing/reference/test/genome.fa --sample_sheet /research/projects/PBCV1/preliminary/data/samplesheet/test_samplesheet.csv

So you were not requesting remapping. Can you share your test_samplesheet.csv file? I'll then ask the person who wrote the Nextflow part.

@MadeleineOman
Author

Here is the test_samplesheet.csv: test_samplesheet.csv. I wanted to test the pipeline on the test files you provide, just to get the pipeline up and running first. I assumed that /test/duplex.bam was an example BAM for input, that /test/normal.bam was the corresponding matched normal, and that /test/hs37d5.fa.gz was the reference (which I had to prep and index myself, using this script: prepGenome.txt.txt).

Just in case I made some false assumptions, I retried running the pipeline with my own data and got a different error. Not sure if this helps, but here are the log and the associated samplesheet:
2025_01_31_nextflow_mypipeline.log
samplesheet.csv

@fa8sanger
Collaborator

Thanks. I see in your log that it says "remap:true". You don't want that. Could you try specifying in the call to nextflow "remap false"? I'd also recommend using the noise_bed and snp_bed masks.
If that doesn't work either I'll ask the person who wrote the nextflow.
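
One way the suggestion above might look on the command line is sketched below. It reuses the invocation quoted earlier in the thread and appends the flag. The parameter name `remap` is an assumption inferred from "remap:true" in the log; Nextflow pipeline parameters are normally passed with a double dash. The snippet only prints the command so it can be inspected before running.

```shell
# Paths copied from the command quoted earlier in this thread.
REF=/research/projects/PBCV1/preliminary/raw_data/sequencing/reference/test/genome.fa
SHEET=/research/projects/PBCV1/preliminary/data/samplesheet/test_samplesheet.csv

# ASSUMPTION: the pipeline parameter is called `remap`
# (inferred from "remap:true" in the log, not confirmed by the maintainers).
CMD="nextflow run NanoSeq_main.nf -qs 300 -profile standard --ref $REF --sample_sheet $SHEET --remap false"

# Print the command for inspection; run it by hand once it looks right.
echo "$CMD"
```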

@fa8sanger
Collaborator

Also, the log says: Jan.-31 11:03:01.961 [main] INFO nextflow.script.BaseScript - running with fastqs as input that will be trimmed,tagged and mapped
but you are running it with BAM files, not fastqs... Not sure what's wrong. Hopefully Raúl will be able to give you a hand (I just emailed him).
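
To check what the pipeline decided about remapping and input type, the relevant lines can be pulled out of a Nextflow log with grep. This is a small sketch: the helper name `show_input_mode` is made up for illustration, and it assumes the log in question is the attached 2025_01_31_nextflow_mypipeline.log sitting in the current directory.

```shell
# Print log lines (with line numbers) that mention remapping or the detected input type.
# show_input_mode is a hypothetical helper, not part of the pipeline.
show_input_mode() {
  grep -n -i -E 'remap|running with' "$1"
}

# Log file name as attached in this thread; skipped if it is not in the current directory.
LOG=2025_01_31_nextflow_mypipeline.log
if [ -f "$LOG" ]; then
  show_input_mode "$LOG" || echo "no matching lines in $LOG"
fi
```

Comparing these lines before and after changing the sample sheet or the remap setting shows whether the change was actually picked up.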
