Why are signal lengths per nucleotide always multiples of 12? #68

RaverJay · 2020-02-21T10:19:24Z

Hey there,

I noticed that the signal lengths for each nucleotide are always multiples of 12 after re-squiggling with prepare_mapped_reads.py.

So differences between adjacent values in Ref_to_signal are e.g.
[ 24., 12., 12., 132., 108., 24., 12., ...]

Why is that?
That seems like an unnecessarily coarse resolution.
Does this correspond to the Move table in any way?

Possibly unrelated:
I just realized that my reads contain both old Albacore basecalls (Events table etc.) and Guppy basecalls (Move table etc.). Will prepare_mapped_reads.py only use the newest basecalls in the fast5?

Cheers

tmassingham-ont · 2020-03-12T11:48:09Z

Hello. It's to do with the resolution ("stride") of the model used to map the data, and a trade-off between speed and precision when remapping. The model emits one output per stride-sized block of raw samples, so the base boundaries in the mapping can only fall on multiples of the stride. The mapping can be coarse since training only uses it to select signal-sequence pairs; the training criterion itself does not use this information and considers all possible ways of aligning the signal to the sequence.
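To make the stride explanation concrete, here is a minimal sketch of how block-level boundaries turn into sample-level boundaries that are all multiples of the stride. The block indices and the helper names are hypothetical and chosen only to reproduce the differences quoted in the question; this is not Taiyaki's actual remapping code.

```python
from functools import reduce
from math import gcd


def block_to_sample_boundaries(block_boundaries, stride):
    """Convert boundaries given in model output blocks to raw-sample indices."""
    return [b * stride for b in block_boundaries]


def signal_lengths(ref_to_signal):
    """Number of raw samples assigned to each base (differences of boundaries)."""
    return [end - start for start, end in zip(ref_to_signal, ref_to_signal[1:])]


def common_step(ref_to_signal):
    """Greatest common divisor of the per-base signal lengths."""
    return reduce(gcd, signal_lengths(ref_to_signal))


# Hypothetical base boundaries in units of model output blocks (assumed values).
blocks = [0, 2, 3, 4, 15, 24, 26, 27]

# With a stride of 12, every boundary lands on a multiple of 12 samples.
ref_to_signal = block_to_sample_boundaries(blocks, stride=12)

print(signal_lengths(ref_to_signal))  # [24, 12, 12, 132, 108, 24, 12]
print(common_step(ref_to_signal))     # 12
```

Running `common_step` on the differences from a real Ref_to_signal array is a quick way to check which stride the mapping model used, assuming the array holds integer sample indices.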

RaverJay · 2020-03-17T11:21:47Z

Thanks, that clears it up a lot!

Does that mean that Taiyaki's prepare_mapped_reads.py is not well suited as a resquiggler, e.g. for extracting signal means for individual bases for downstream tasks?

Are you aware of anything more accurate?

EDIT: current Guppy models also have stride 10, while the default for Taiyaki-trained models is 2. A pretrained stride 1 or 2 model for RNA could be useful.

tmassingham-ont · 2020-03-17T16:43:40Z

If you are looking for a mapping at sample resolution, you could try scrappie mappy --model squiggle_r94_rna ref.fa read.fast5 (from https://github.com/nanoporetech/scrappie). Another option would be the tools from the Nanopolish or Tombo packages.
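For the downstream task mentioned above (signal means per base), the computation is the same whichever tool produced the mapping. The sketch below assumes the mapping is a list of integer sample indices with one boundary per base plus a final end index; the function name and the toy data are hypothetical.

```python
def per_base_means(signal, ref_to_signal):
    """Mean signal level for each base, given boundary indices into the signal.

    ref_to_signal[i] is assumed to be the first sample of base i and
    ref_to_signal[i + 1] the first sample of the next base.
    Bases with no assigned samples get NaN.
    """
    means = []
    for start, end in zip(ref_to_signal, ref_to_signal[1:]):
        segment = signal[start:end]
        if len(segment) == 0:
            means.append(float("nan"))
        else:
            means.append(sum(segment) / len(segment))
    return means


# Toy signal and boundaries (assumed values, not real nanopore data).
signal = [1.0, 3.0, 2.0, 2.0, 2.0, 10.0]
boundaries = [0, 2, 5, 6]

print(per_base_means(signal, boundaries))  # [2.0, 2.0, 10.0]
```

With a coarse mapping, each segment may include samples from neighbouring bases, so the means are only as precise as the boundaries.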

RaverJay changed the title ~~Why are signal lengths of nucleotides multiples of 12?~~ Why are signal lengths per nucleotides always multiples of 12? Feb 24, 2020

RaverJay changed the title ~~Why are signal lengths per nucleotides always multiples of 12?~~ Why are signal lengths per nucleotide always multiples of 12? Feb 24, 2020
