tombo preprocess annotate_raw_with_fastqs #422

AzlanNI · 2023-02-14T18:50:14Z

Hello everyone,

I am currently using Tombo version 1.5 on our university HPC to analyze bacterial DNA modifications. Previously we used FAST5 data that already included the basecalls, so I could start directly with the resquiggle step, and everything worked fine.

But our updated software writes the FASTQs and FAST5s separately, and the FASTQs are gzipped. So I decompressed the FASTQs and tried to annotate the FAST5s with the FASTQs from the same barcode (run).
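The decompression step can also be scripted. This is a minimal Python sketch, assuming (hypothetically) that all *.fastq.gz files sit in one directory; gunzip_fastqs is an illustrative helper, not part of Tombo.

```python
import gzip
import shutil
from pathlib import Path


# Hypothetical helper for illustration; not part of Tombo.
def gunzip_fastqs(src_dir, dest_dir):
    """Decompress every *.fastq.gz in src_dir into dest_dir as *.fastq.

    Returns the decompressed file paths, sorted by source file name.
    """
    src, dest = Path(src_dir), Path(dest_dir)
    dest.mkdir(parents=True, exist_ok=True)
    written = []
    for gz_path in sorted(src.glob("*.fastq.gz")):
        # Drop only the trailing ".gz", so "x.fastq.gz" becomes "x.fastq".
        out_path = dest / gz_path.name[: -len(".gz")]
        with gzip.open(gz_path, "rb") as fin, open(out_path, "wb") as fout:
            shutil.copyfileobj(fin, fout)
        written.append(out_path)
    return written
```

The source files are left in place, so the original gzipped FASTQs are kept.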

I always get the following error:
Preparing reads and extracting read identifiers.
****** WARNING ****** Basecalls exsit in specified slot for some reads. Set --overwrite option to overwrite these basecalls.
100%|█████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████| 348/348 [00:39<00:00, 8.85it/s]
[19:31:46] Annotating FAST5s with sequence from FASTQs.
****** WARNING ****** Some FASTQ records contain read identifiers not found in any FAST5 files or sequencing summary files.
0it [01:03, ?it/s]
[19:32:50] Added sequences to a total of 0 reads.

I searched for this problem but could not find a solution that worked for me. We are now re-basecalling the data, but I would still like to know whether anyone knows how to solve this problem.
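The warning "Some FASTQ records contain read identifiers not found in any FAST5 files or sequencing summary files", together with "Added sequences to a total of 0 reads", suggests that none of the FASTQ read IDs were matched to a FAST5 read. One way to check this outside Tombo is to compare the ID sets directly. This minimal Python sketch assumes MinKNOW-style FASTQ headers in which the read ID is the first whitespace-separated token after "@", and it assumes the FAST5 read IDs were already collected some other way (for example from the sequencing summary). The helper names are hypothetical.

```python
# Hypothetical helpers for illustration; not part of Tombo.
def fastq_read_ids(lines):
    """Yield read IDs from FASTQ lines (4-line records, header starts with '@').

    Assumption: the read ID is the first whitespace-separated token of the
    header, with the leading '@' removed.
    """
    for i, line in enumerate(lines):
        if i % 4 == 0:
            header = line.strip()
            tokens = header[1:].split()
            if not header.startswith("@") or not tokens:
                raise ValueError(f"record {i // 4}: bad FASTQ header {header!r}")
            yield tokens[0]


def missing_read_ids(fastq_lines, fast5_read_ids):
    """Return FASTQ read IDs absent from fast5_read_ids, in FASTQ order."""
    known = set(fast5_read_ids)
    return [rid for rid in fastq_read_ids(fastq_lines) if rid not in known]
```

If every FASTQ ID comes back as missing, the FASTQs and FAST5s most likely do not describe the same reads, or the FAST5 IDs were not collected from the right files.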

The command I am using is:
tombo preprocess annotate_raw_with_fastqs --overwrite --fast5-basedir /fast5s/ --fastq-filenames /fastqs/*.fastq

Is there a problem with having multi-read or single-read FAST5s? Or should I look more into the sequencing settings to solve this problem?
The resquiggle command just gives me the error that basecalls are missing from my FAST5 data.

I also checked the final_summary file, and basecalling was enabled, so I don't understand why Tombo says that basecalls are missing from my FAST5s.

instrument=MN39041
position=
flow_cell_id=FAT59921
sample_id=Mho_4518_PG21
protocol_group_id=Mho_4518_PG21
protocol=sequencing/sequencing_MIN112_DNA_SQK-Q20EA:FLO-MIN112:SQK-NBD112-24
protocol_run_id=0aa23499-3d48-47f8-ac08-4ebd74be0aa5
acquisition_run_id=14dc527603366f0c24d86d62d46496a806336743
started=2023-02-10T16:09:52.073115+01:00
acquisition_stopped=2023-02-13T16:10:51.626632+01:00
processing_stopped=2023-02-13T16:11:32.102936+01:00
basecalling_enabled=1
sequencing_summary_file=sequencing_summary_FAT59921_0aa23499_14dc5276.txt
fast5_files_in_final_dest=732
fast5_files_in_fallback=0
fastq_files_in_final_dest=755
fastq_files_in_fallback=0
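The final_summary file is a plain key=value list, so it is easy to check programmatically. Note that basecalling_enabled=1 only says that basecalling ran; the fields shown here do not say whether basecalls were also written into the FAST5 files, and the separate FAST5 and FASTQ file counts are consistent with basecalls going to FASTQ only. A minimal Python sketch, with hypothetical helper names:

```python
# Hypothetical helpers for illustration; not part of MinKNOW or Tombo.
def parse_final_summary(text):
    """Parse key=value lines (as in final_summary_*.txt) into a dict of strings."""
    summary = {}
    for line in text.splitlines():
        if "=" in line:
            # Split on the first '=' only; values such as timestamps may contain ':'.
            key, value = line.split("=", 1)
            summary[key.strip()] = value.strip()
    return summary


def summary_counts(summary):
    """Return (basecalling_enabled, FAST5 file count, FASTQ file count)."""
    enabled = summary.get("basecalling_enabled") == "1"
    n_fast5 = int(summary.get("fast5_files_in_final_dest", "0"))
    n_fastq = int(summary.get("fastq_files_in_final_dest", "0"))
    return enabled, n_fast5, n_fastq
```

For the summary above this would report basecalling as enabled, with 732 FAST5 files and 755 FASTQ files.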

If anyone has an idea how I could solve this problem, or if I should provide any further information, let me know.

Kind regards,

Azlan

stegiopast · 2023-02-15T15:45:03Z

Hey Azlan,

I am facing the same problem right now. I tried to annotate both single-read and multi-read FAST5s with FASTQs and could not annotate them successfully.
I checked the file permissions as suggested in #112. That did not work for me.
Others recommend converting the multi-read FAST5s to single-read format first (#286). That did not work in my case either.
Some others say the problem could lie in the sequencing summary files, which do not list the single-read FAST5 filenames (#245).
Additionally, it seems to be very important to set the --overwrite flag, but you already use it in your command. I would also be interested in more information about what could be going wrong.
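To see which FAST5 file each read ID is assigned to, the sequencing summary can be read directly. This Python sketch assumes a tab-separated file with a read_id column and either a filename_fast5 column (which, as far as I know, newer MinKNOW summaries use) or a plain filename column (older output). Treat both column names as assumptions to check against your own file header; whether Tombo accepts the newer column layout is not established in this thread. The helper name is hypothetical.

```python
import csv


# Hypothetical helper for illustration; not part of Tombo.
def read_id_to_fast5(summary_file):
    """Map read_id -> FAST5 filename from a tab-separated sequencing summary.

    Assumption: the FAST5 column is 'filename_fast5' (newer layout) or
    'filename' (older layout); the first one found is used.
    """
    reader = csv.DictReader(summary_file, delimiter="\t")
    fields = reader.fieldnames or []
    if "filename_fast5" in fields:
        column = "filename_fast5"
    elif "filename" in fields:
        column = "filename"
    else:
        raise ValueError(f"no FAST5 filename column in {fields}")
    return {row["read_id"]: row[column] for row in reader}
```

Usage: with open("sequencing_summary_....txt") as fh: mapping = read_id_to_fast5(fh), then compare the filenames against the files actually present in the FAST5 directory (for example after a multi-to-single conversion, where they will no longer match).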

Kind regards,
Stefan

stegiopast · 2023-02-15T16:03:55Z

Hey Azlan,

I downloaded tombo via conda by using:

conda install -c "bioconda/label/cf201901" ont-tombo

It works for me. Be aware that this might just be a temporary solution.

Kind regards,
Stefan

AzlanNI · 2023-02-16T13:34:40Z

Hey Stefan,

Thanks for your reply! I will try the conda install and see if it works for me.

Kind regards,
Azlan

FerchoHQ · 2023-02-21T22:10:53Z

Hi Azlan and Stefan,

I'm in the same situation. Even after installing Tombo with conda install -c "bioconda/label/cf201901" ont-tombo, nothing changed. My commands:

To get the single-read FAST5s:

multi_to_single_fast5 -i fast5s -s single_reads-fast5s --recursive -t 32

Preprocessing:

tombo preprocess annotate_raw_with_fastqs --fast5-basedir ./single_reads-fast5s \
    --fastq-filenames guppy_bc.fastq --processes 32 --overwrite

I tried every FASTQ combination: the files one by one, a single merged file, and a FASTQ file generated from the FAST5 files with the program deepnano2_caller.py.

Thank you for your attention, kind regards.

stegiopast · 2023-02-22T13:48:37Z

Hi FerchoHQ,

Did you install the VBZ compression plugin as well?

conda install ont_vbz_hdf_plugin

A missing plugin sometimes causes problems reading the signal from FAST5 files.
Let me know if it helps.
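One quick sanity check is whether the plugin is actually visible to HDF5. HDF5 looks for filter plugins in the directories listed in the HDF5_PLUGIN_PATH environment variable. This Python sketch assumes that your installation exports that variable and that the plugin library has "vbz" in its file name (e.g. libvbz_hdf_plugin.so); both are assumptions about your setup. An empty result is not conclusive, because HDF5 also has a built-in default plugin directory. The helper name is hypothetical.

```python
import os
from pathlib import Path


# Hypothetical helper for illustration; not part of the VBZ plugin or Tombo.
def find_vbz_plugin(env=None):
    """Return files with 'vbz' in their name found in HDF5_PLUGIN_PATH directories."""
    env = os.environ if env is None else env
    hits = []
    # HDF5_PLUGIN_PATH may list several directories separated by os.pathsep.
    for directory in env.get("HDF5_PLUGIN_PATH", "").split(os.pathsep):
        if directory and Path(directory).is_dir():
            hits.extend(sorted(str(p) for p in Path(directory).iterdir() if "vbz" in p.name))
    return hits
```

Run it inside the same activated conda environment that runs Tombo.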

Best,
Stefan

AzlanNI · 2023-02-22T17:02:45Z

Hello everyone,

I finally had time to try it, but sadly, even after installing Tombo with conda install -c "bioconda/label/cf201901" ont-tombo, I still have the same issue. I reran the command after installing the plugin, but I still get the same error:

[17:58:46] Loading minimap2 reference.
[17:58:46] Getting file list.
******************** ERROR ********************
Reads do not to contain basecalls. Check --basecall-group option if basecalls are stored in non-standard location or use tombo preprocess annotate_raw_with_fastqs to add basecalls from FASTQ files to raw FAST5 files.

I am not sure why Tombo is not annotating the FAST5 signal with the FASTQs, but I have seen this problem many times in forums. Sadly, I am still searching for a solution, so if anyone has advice, please go ahead!
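The resquiggle error refers to the --basecall-group option, whose Tombo default is Basecall_1D_000 (with the default subgroup BaseCalled_template). A single-read FAST5 that carries basecalls normally has a dataset at Analyses/Basecall_1D_000/BaseCalled_template/Fastq. This Python sketch checks an object listing for that path; it assumes single-read FAST5 layout (multi-read files nest everything under read_<id>/ and will not match). The listing can be collected with h5py, e.g. names = [] and then h5py.File(path, "r").visit(names.append). The helper name is hypothetical.

```python
# Hypothetical helper for illustration; not part of Tombo.
def has_basecalls(object_names,
                  basecall_group="Basecall_1D_000",
                  basecall_subgroup="BaseCalled_template"):
    """Return True if a single-read FAST5 object listing contains a Fastq dataset.

    object_names: HDF5 object paths such as those passed to the callback
    of h5py's File.visit().
    """
    target = f"Analyses/{basecall_group}/{basecall_subgroup}/Fastq"
    return any(name.strip("/") == target for name in object_names)
```

If this returns False for the files after annotate_raw_with_fastqs, the annotation step wrote nothing, which matches "Added sequences to a total of 0 reads".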

Thanks and kind regards,
Azlan

FerchoHQ · 2023-02-22T17:24:08Z

Hello,

I installed the package but still get the same error as in the image I uploaded. What is strange to me is that I have one dataset with which it does work; it only fails with my real data, which was generated in 09/2022.

Kind regards.

Jude-Martin · 2023-02-24T12:31:06Z

I had a similar issue and couldn't get tombo preprocess annotate_raw_with_fastqs to work. I resorted to using an older version of Guppy that still supports the --fast5_out flag to annotate the files.

FerchoHQ · 2023-02-24T16:27:07Z

Good to know. Which version of Guppy did you use?

Jude-Martin · 2023-02-27T10:55:11Z

I think the option was removed in version 6.4, so any 6.3.x version should work. I have used 6.3.2 successfully for this.

FerchoHQ · 2023-02-28T17:20:51Z

Thanks for your help. The --fast5_out option of Guppy produces FAST5s that can be used directly with tombo resquiggle (I used guppy_basecaller v6.0.1).

Here are the pipeline commands I used:

guppy_basecaller -i ./fast5s -s ./testing-modified-v02 -c dna_r9.4.1_450bps_fast.cfg \
    --as_cpu_threads_per_scaler 32 --compress_fastq --recursive \
    --min_qscore 10 --bam_out \
    --align_ref ../flye_assembly/assembly.fasta --fast5_out

Then I converted the output to single-read FAST5s:

multi_to_single_fast5 -i testing-modified-v02/workspace/ \
    -s fast5s_singles --recursive -t 32

With the single fast5s I could continue the rest of the tombo pipeline.
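The three steps above (Guppy with --fast5_out, multi-to-single conversion of the workspace folder, then tombo resquiggle) can be wrapped in a script. This Python sketch only builds the argument lists; nothing is executed, and each list can be passed to subprocess.run(cmd, check=True). It keeps only a subset of the flags used above (omitting --bam_out, --align_ref, --min_qscore and the thread settings), and the function name and parameter names are hypothetical.

```python
# Hypothetical helper for illustration; builds commands but does not run them.
def pipeline_commands(raw_dir, out_dir, singles_dir, reference, threads=32,
                      config="dna_r9.4.1_450bps_fast.cfg"):
    """Return argv lists for: basecall with --fast5_out, split to single reads, resquiggle."""
    basecall = [
        "guppy_basecaller", "-i", raw_dir, "-s", out_dir, "-c", config,
        "--recursive", "--fast5_out",
    ]
    # Guppy writes the annotated FAST5s into <save_path>/workspace.
    split = [
        "multi_to_single_fast5", "-i", f"{out_dir}/workspace",
        "-s", singles_dir, "--recursive", "-t", str(threads),
    ]
    resquiggle = [
        "tombo", "resquiggle", singles_dir, reference,
        "--processes", str(threads),
    ]
    return [basecall, split, resquiggle]
```

Building argv lists instead of shell strings avoids quoting problems with paths and makes each step easy to log before running it.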

Kind regards.

DNKonanov · 2023-03-02T13:53:49Z

Hello,

I faced the same problem. The cause is a missing parameter in the source code.
The solution is either to fix the source file manually or to install the latest GitHub release, since the bug was fixed there but not in the conda release:

#394

FerchoHQ · 2023-03-03T16:32:14Z

Hello,

The information was pretty useful, thanks.

gaoxiangcao · 2023-06-26T12:22:04Z

Could you provide me with this version of Guppy?

gaoxiangcao · 2023-06-26T12:37:13Z

Hi, how can I find the _preprocess.py file? I installed Tombo via conda.

stegiopast · 2023-07-11T16:20:58Z

Hey,

Usually you can find it here: ~/anaconda3/envs/<name_of_environment>/lib/python3.7/site-packages/tombo/_preprocess.py, if your conda prefix is the default one.

Best,
Stefan
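If the conda prefix or Python version differs from the default, the file can be located through Python's import machinery instead of guessing the path. This sketch uses importlib.util.find_spec on the top-level package, which locates it without importing it; run it with the Python interpreter of the conda environment where Tombo is installed. The helper name is hypothetical.

```python
import importlib.util
from pathlib import Path


# Hypothetical helper for illustration; not part of Tombo.
def find_tombo_preprocess(package="tombo"):
    """Return the path of <package>/_preprocess.py, or None if it cannot be found."""
    spec = importlib.util.find_spec(package)
    # Plain modules (not packages) have no submodule_search_locations.
    if spec is None or not spec.submodule_search_locations:
        return None
    for location in spec.submodule_search_locations:
        candidate = Path(location) / "_preprocess.py"
        if candidate.is_file():
            return candidate
    return None
```

Usage: print(find_tombo_preprocess()) from inside the activated environment.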

vidsvur mentioned this issue Oct 2, 2023

Will deep-signal plant accept remora? PengNi/deepsignal-plant#36

Open
