tombo preprocess annotate_raw_with_fastqs #422

Open
AzlanNI opened this issue Feb 14, 2023 · 16 comments

Comments

@AzlanNI

AzlanNI commented Feb 14, 2023

Hello Everyone,

I am currently using Tombo version 1.5 on our university HPC to analyze bacterial DNA modifications. Previously we used fast5 data that included the basecalls, so I could start directly with the resquiggle step and everything worked fine.

But our updated software separates the fastqs and fast5s. The fastqs are also gzipped, so I decompressed them and tried to annotate the fast5s with the fastqs from the same barcode (run).
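
For anyone reproducing the decompression step, here is a minimal Python sketch that unpacks a directory of gzipped fastqs into one plain fastq file. The function name `merge_gzipped_fastqs` and the `*.fastq.gz` pattern are assumptions made for illustration; nothing here is part of Tombo.

```python
import gzip
import shutil
from pathlib import Path


def merge_gzipped_fastqs(fastq_dir, merged_path, pattern="*.fastq.gz"):
    """Decompress every gzipped FASTQ in fastq_dir into one plain FASTQ file.

    Files are appended in sorted name order. Each input is assumed to end
    with a newline, so records from different files do not run together.
    Returns the number of input files that were merged.
    """
    inputs = sorted(Path(fastq_dir).glob(pattern))
    with open(merged_path, "wb") as out:
        for gz_path in inputs:
            with gzip.open(gz_path, "rb") as src:
                # Stream the decompressed bytes without loading the file in memory.
                shutil.copyfileobj(src, out)
    return len(inputs)
```

The merged file could then be passed to `--fastq-filenames` in place of a shell glob over many small files.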

I always get the following warnings, and no reads are annotated:
Preparing reads and extracting read identifiers.
****** WARNING ****** Basecalls exsit in specified slot for some reads. Set --overwrite option to overwrite these basecalls.
100%|█████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████| 348/348 [00:39<00:00, 8.85it/s]
[19:31:46] Annotating FAST5s with sequence from FASTQs.
****** WARNING ****** Some FASTQ records contain read identifiers not found in any FAST5 files or sequencing summary files.
0it [01:03, ?it/s]
[19:32:50] Added sequences to a total of 0 reads.
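
One way to look into the second warning is to compare the read identifiers in a fastq file with those listed in the sequencing summary. The sketch below is a rough diagnostic, not Tombo's own logic: the helper names are hypothetical, and it assumes a four-line-per-record fastq and a tab-separated summary with a `read_id` column.

```python
import csv


def fastq_read_ids(fastq_path):
    """Collect read IDs from a FASTQ file with four lines per record.

    The ID is taken as the first whitespace-delimited token of each header
    line, without the leading '@'.
    """
    ids = set()
    with open(fastq_path) as handle:
        for line_number, line in enumerate(handle):
            # Only every fourth line is a header; quality lines may also start with '@'.
            if line_number % 4 == 0 and line.startswith("@"):
                tokens = line[1:].split()
                if tokens:
                    ids.add(tokens[0])
    return ids


def summary_read_ids(summary_path, column="read_id"):
    """Collect read IDs from a tab-separated sequencing summary file.

    The column name 'read_id' is an assumption about the summary layout.
    """
    with open(summary_path, newline="") as handle:
        return {row[column] for row in csv.DictReader(handle, delimiter="\t")}


def compare_read_ids(fastq_path, summary_path):
    """Return (shared, fastq_only) sets of read IDs."""
    fastq_ids = fastq_read_ids(fastq_path)
    summary_ids = summary_read_ids(summary_path)
    return fastq_ids & summary_ids, fastq_ids - summary_ids
```

If `shared` is non-empty, the identifiers themselves probably match, which would point away from a simple ID mismatch between the two files.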

I looked this problem up but could not find a solution that worked for me. We are now rebasecalling the data, but I would like to know if anyone knows how this problem can be solved.

The command I am using is:
tombo preprocess annotate_raw_with_fastqs --overwrite --fast5-basedir /fast5s/ --fastq-filenames /fastqs/*.fastq

Is there a problem with having multi- or single-read fast5s? Or should I look more closely at the sequencing settings to solve this?
The resquiggle command just gives me the error that basecalls are missing from my fast5 data.

I also looked into the final_summary file, and basecalling was enabled, so I don't understand why Tombo says basecalls are missing from my fast5s.

instrument=MN39041
position=
flow_cell_id=FAT59921
sample_id=Mho_4518_PG21
protocol_group_id=Mho_4518_PG21
protocol=sequencing/sequencing_MIN112_DNA_SQK-Q20EA:FLO-MIN112:SQK-NBD112-24
protocol_run_id=0aa23499-3d48-47f8-ac08-4ebd74be0aa5
acquisition_run_id=14dc527603366f0c24d86d62d46496a806336743
started=2023-02-10T16:09:52.073115+01:00
acquisition_stopped=2023-02-13T16:10:51.626632+01:00
processing_stopped=2023-02-13T16:11:32.102936+01:00
basecalling_enabled=1
sequencing_summary_file=sequencing_summary_FAT59921_0aa23499_14dc5276.txt
fast5_files_in_final_dest=732
fast5_files_in_fallback=0
fastq_files_in_final_dest=755
fastq_files_in_fallback=0
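
A small Python sketch for reading a final_summary file like the one above into a dictionary and pulling out the basecalling-related fields. The function names are hypothetical. Note that `basecalling_enabled=1` only records that basecalling ran; it does not say whether the basecalls were written into the fast5s, which fits the earlier remark that the updated software writes fastqs and fast5s separately.

```python
def parse_final_summary(path):
    """Parse a key=value text file into a dict of strings."""
    fields = {}
    with open(path) as handle:
        for line in handle:
            line = line.strip()
            if not line or "=" not in line:
                continue
            # Split on the first '=' only, so values may contain '=' themselves.
            key, _, value = line.partition("=")
            fields[key] = value
    return fields


def basecalling_report(fields):
    """Summarize the fields that relate to basecalling output."""
    return {
        "basecalling_enabled": fields.get("basecalling_enabled") == "1",
        "fast5_files": int(fields.get("fast5_files_in_final_dest", 0)),
        "fastq_files": int(fields.get("fastq_files_in_final_dest", 0)),
    }
```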

If anyone has an idea how I could solve this problem, or if I should provide any further information, please let me know.

Kind regards,

Azlan

@stegiopast

Hey Azlan,

I am facing the same problem right now. I tried to annotate both single- and multi-read fast5s with fastqs, and neither was annotated successfully.
I checked the file permissions as suggested here: #112
That did not work for me.
Others recommend converting the multi-read fast5s to single-read fast5 format first: #286
That did not work in my case either.
Some others say the problem could lie in the sequencing summary files, which do not list the single-read format filenames.
Link: #245
Additionally, it seems to be very important to set the overwrite flag, but you already have it in your command. I would also be interested in more information about what might be going wrong.

Kind regards,
Stefan

@stegiopast

Hey Azlan,

I downloaded tombo via conda by using:

conda install -c "bioconda/label/cf201901" ont-tombo

It works for me. Be aware that this might just be a temporary solution.

Kind regards,
Stefan

@AzlanNI

AzlanNI commented Feb 16, 2023

Hey Stefan,

Thanks for your reply! I will try the conda install and see if it works for me.

Kind regards,
Azlan

@FerchoHQ

Hi Azlan and Stefan,

I'm in the same situation. Even after installing Tombo with conda install -c "bioconda/label/cf201901" ont-tombo, I still get the same result. These are my commands.

To get the single reads:

multi_to_single_fast5 -i fast5s -s single_reads-fast5s --recursive -t 32

Preprocess:

tombo preprocess annotate_raw_with_fastqs --fast5-basedir ./single_reads-fast5s \
--fastq-filenames guppy_bc.fastq --processes 32 --overwrite

I tried every combination of fastq input: the files one by one, a merged file, and a single fastq file generated from the fast5 files with deepnano2_caller.py.

Thank you for your attention, kind regards.

[Attached image: 2022-02-21-Tombo-v05]

@stegiopast

Hi FerchoHQ,

Did you download the vbz compression package as well?

conda install ont_vbz_hdf_plugin

Sometimes lacking that package causes problems in reading the signal from fast5 files.
Let me know if it helps.

Best,
Stefan

@AzlanNI

AzlanNI commented Feb 22, 2023

Hello everyone,

I have now had time to try this, but sadly, even after installing Tombo with conda install -c "bioconda/label/cf201901" ont-tombo, I still have the same issue. I reran the command after installing the plugin, but I still get the same error:

[17:58:46] Loading minimap2 reference.
[17:58:46] Getting file list.
******************** ERROR ********************
Reads do not to contain basecalls. Check --basecall-group option if basecalls are stored in non-standard location or use tombo preprocess annotate_raw_with_fastqs to add basecalls from FASTQ files to raw FAST5 files.

I am not sure why Tombo is not annotating the fast5 signal with the fastqs, but I have seen this problem many times in forums. Sadly, I am still searching for a solution, so if anyone has advice, please go ahead!

Thanks and kind regards,
Azlan

@FerchoHQ

Hello,

I installed the package but still get the same error shown in the image I uploaded. What is strange to me is that it does work with one of my data sets; it only fails with my real data, which was generated in 09/2022.

Kind regards.

@Jude-Martin

I had a similar issue and couldn't get tombo preprocess annotate_raw_with_fastqs to work. I resorted to using an older version of guppy that still supports the --fast5_out flag to annotate the files.

@FerchoHQ

Good to know. Which version of guppy did you use?

@Jude-Martin

I think the option was removed in version 6.4, so any 6.3.x version should work. I have used 6.3.2 successfully for this.

@FerchoHQ

Thanks for your help. The --fast5_out option of guppy (guppy_basecaller v6.0.1) produces fast5s that can be passed directly to tombo resquiggle.

Here are the pipeline commands I used:

guppy_basecaller -i ./fast5s -s ./testing-modified-v02 -c dna_r9.4.1_450bps_fast.cfg \
--as_cpu_threads_per_scaler 32 --compress_fastq --recursive \
--min_qscore 10 --bam_out \
--align_ref ../flye_assembly/assembly.fasta --fast5_out

Then I convert to single-read fast5s:

multi_to_single_fast5 -i testing-modified-v02/workspace/ \
-s fast5s_singles --recursive -t 32

With the single-read fast5s I could continue with the rest of the Tombo pipeline.

Kind regards.

@DNKonanov

Hello,

I faced the same problem. The cause is a missing parameter in the source code.
The solution is either to fix the source file manually or to install the latest GitHub release, since the bug was fixed here but not in the conda release:

#394

@FerchoHQ

FerchoHQ commented Mar 3, 2023

Hello,

The information was pretty useful, thanks.

@gaoxiangcao

Regarding the guppy_basecaller v6.0.1 pipeline with --fast5_out that FerchoHQ shared above: could you provide me with this version of guppy?

@gaoxiangcao

Regarding the fix in #394 mentioned by DNKonanov above: how can I find the _preprocess.py file? I installed Tombo via conda.

@stegiopast

Hey,

Usually you can find it here: ~/anaconda3/envs/<name_of_environment>/lib/python3.7/site-packages/tombo/_preprocess.py, assuming your conda prefix is the default one.
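
If the conda prefix or Python version differs, the file can also be located from inside the activated environment. This is a minimal sketch using the standard library; `locate_module_file` is a hypothetical helper, and the module name `tombo._preprocess` is inferred from the path above.

```python
import importlib.util


def locate_module_file(module_name):
    """Return the file path of an importable module, or None if it is not found."""
    try:
        spec = importlib.util.find_spec(module_name)
    except ModuleNotFoundError:
        # Raised when the parent package (e.g. "tombo") is not installed.
        return None
    return spec.origin if spec is not None else None


if __name__ == "__main__":
    # Run inside the conda environment where Tombo is installed.
    print(locate_module_file("tombo._preprocess"))
```

Looking up a submodule imports its parent package first, so this should be run with the same interpreter that runs Tombo.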

Best,
Stefan
