This repository has been archived by the owner on Jan 13, 2022. It is now read-only.

Why are signal lengths per nucleotide always multiples of 12? #68

Open
RaverJay opened this issue Feb 21, 2020 · 3 comments

RaverJay commented Feb 21, 2020

Hey there,

I noticed that, after re-squiggling with prepare_mapped_reads.py, the signal length for each nucleotide is always a multiple of 12.

For example, the differences between adjacent values in Ref_to_signal are:
[ 24., 12., 12., 132., 108., 24., 12., ...]

Why is that?
That seems like an unnecessarily coarse resolution.
Does it correspond to the Move table in any way?

Possibly unrelated:
I just realized that my reads contain both old Albacore basecalls (Events table etc.) and Guppy basecalls (Move table etc.). Will prepare_mapped_reads.py use only the newest basecalls in the fast5?

Cheers

RaverJay changed the title from "Why are signal lengths of nucleotides multiples of 12?" to "Why are signal lengths per nucleotides always multiples of 12?" on Feb 24, 2020
RaverJay changed the title from "Why are signal lengths per nucleotides always multiples of 12?" to "Why are signal lengths per nucleotide always multiples of 12?" on Feb 24, 2020
@tmassingham-ont
Contributor

Hello. It has to do with the resolution ("stride") of the model used to map the data, which reflects a trade-off between speed and precision when remapping. The mapping can be coarse because training only uses it to select signal-sequence pairs; the training criterion itself does not use this information and considers all possible ways of aligning the signal to the sequence.
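To illustrate why a strided mapping produces dwell lengths in fixed multiples, here is a minimal sketch. It is not Taiyaki's remapping code: the function name, the toy path, and the stride of 12 (inferred only from the multiples reported above) are all assumptions made for illustration. The idea is that if the model emits one output block per `stride` raw samples and each block is assigned to one reference base, then every base boundary falls on a multiple of the stride.

```python
import numpy as np

def block_path_to_ref_to_signal(base_index_per_block, stride):
    """Convert a per-block base assignment into sample-level boundaries.

    base_index_per_block[i] is the reference base assigned to model output
    block i (assumed non-decreasing); each block covers `stride` raw samples.
    This helper is hypothetical and only mimics the shape of Ref_to_signal.
    """
    base_index_per_block = np.asarray(base_index_per_block)
    n_bases = int(base_index_per_block.max()) + 1
    # First block in which each base appears, plus the end of the last block.
    first_block = np.searchsorted(
        base_index_per_block, np.arange(n_bases + 1), side="left"
    )
    # Boundaries can only fall on block edges, i.e. multiples of the stride.
    return first_block * stride

# Hypothetical path: base 0 spans 2 blocks, bases 1 and 2 one block each,
# base 3 spans 11 blocks.
blocks = [0, 0, 1, 2] + [3] * 11
ref_to_signal = block_path_to_ref_to_signal(blocks, stride=12)
dwells = np.diff(ref_to_signal)
print(dwells)
```

With this toy path the dwell lengths are all multiples of the assumed stride, which is the same pattern as the differences quoted in the question.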

@RaverJay
Author

RaverJay commented Mar 17, 2020

Thanks, that clears it up a lot!

Does that mean that Taiyaki's prepare_mapped_reads.py is not really best suited as a resquiggler, e.g. to extract signal means for individual bases for other downstream tasks?

Are you aware of anything more accurate?

EDIT: current Guppy models also have stride 10, while the default for Taiyaki-trained models is 2. A pretrained stride 1 or 2 model for RNA could be useful.

@tmassingham-ont
Contributor

If you need a mapping at sample resolution, you could try scrappie mappy --model squiggle_r94_rna ref.fa read.fast5 (from https://github.com/nanoporetech/scrappie). Another option would be the tools from the Nanopolish or Tombo packages.
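For the downstream task raised earlier in the thread (signal means for individual bases), a rough sketch of the computation follows. It assumes you already have a raw signal array and a boundary array shaped like Ref_to_signal (one entry per base plus a final end position), however it was produced. The function name and the toy data are hypothetical, and the handling of empty or negative segments is an assumption of this sketch, not documented behaviour of any of the tools mentioned.

```python
import numpy as np

def per_base_means(signal, ref_to_signal):
    """Mean signal level for each base, given sample-index boundaries.

    ref_to_signal[n] is taken to be the first sample of base n, and the
    final entry the end of the last base. Bases with an empty or invalid
    segment get NaN (a choice made for this sketch).
    """
    signal = np.asarray(signal, dtype=float)
    bounds = np.asarray(ref_to_signal, dtype=int)
    means = np.full(len(bounds) - 1, np.nan)
    for i, (start, end) in enumerate(zip(bounds[:-1], bounds[1:])):
        if end > start >= 0:
            means[i] = signal[start:end].mean()
    return means

# Toy data: 48 samples, three bases with dwells of 24, 12 and 12 samples.
signal = np.arange(48, dtype=float)
ref_to_signal = [0, 24, 36, 48]
means = per_base_means(signal, ref_to_signal)
print(means)
```

The precision of these means is limited by the precision of the boundaries, so a coarse mapping blurs the levels of neighbouring bases; that is the reason to prefer a sample-resolution mapping for this kind of analysis.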
