dorado for m6A analysis #1170

Open
baibhav-bioinfo opened this issue Dec 10, 2024 · 4 comments

@baibhav-bioinfo

baibhav-bioinfo commented Dec 10, 2024

Hello,
Thank you for developing such valuable and versatile software.

I am trying to use Dorado for m6A analysis of direct RNA sequencing (DRS) data, from site identification through to differential methylation rate analysis between conditions with replicates (using modkit).

(1) Is the following the correct approach for aligning during basecalling?

dorado basecaller [email protected] /path/to/pod5 --modified-bases-models [email protected]_m6A_DRACH@v1/ --device cuda:all --reference ref.fasta > calls.bam

Also, other tools (e.g. m6Anet) align to the transcriptome rather than the genome. Which would be the more suitable reference to align against?

(2) The BED file produced by modkit pileup contains positions with Nmod=0. Why are these in the output when my goal is to detect m6A sites? Do I need to filter them out, keeping only Nmod>=1?
The file also has an Nother_mod column. What does it count, and why are other modifications reported?
Is there an option to output only the relevant Nmod rows (in my case, for A)?

@malton-ont
Collaborator

Hi @baibhav-bioinfo,

  1. Yes. See the docs here. Regarding which reference to use, bioinformatics questions are better posted on the Nanopore community forums. I'm not familiar with m6Anet.
  2. This question would be better posted on the modkit GitHub page, but at a guess I'd say you are using a model that only calls m6A in a DRACH motif, so perhaps modkit is showing Nmod=0 for A bases not in that context?
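
If the Nmod=0 rows are not needed downstream, one option is to drop them after the pileup step. The sketch below is only an illustration, not modkit functionality: it assumes the standard bedMethyl column order (modification code in column 4, the modified-call count in column 12) and that `a` is the code used for m6A. The function name `keep_modified` and the example rows are hypothetical. Check the column order against the modkit documentation for your version before relying on it.

```python
import io

# Assumed bedMethyl layout (0-based index after whitespace splitting):
# 3 = modification code, 11 = number of reads called as modified.
MOD_CODE, N_MOD = 3, 11


def keep_modified(lines, code="a", min_mod=1):
    """Yield rows for `code` that have at least `min_mod` modified calls."""
    for line in lines:
        fields = line.split()  # handles tab- or space-separated columns
        if len(fields) <= N_MOD:
            continue  # skip blank or truncated rows
        if fields[MOD_CODE] == code and int(fields[N_MOD]) >= min_mod:
            yield line.rstrip("\n")


# Hypothetical rows, invented purely for illustration.
example = io.StringIO(
    "tx1\t10\t11\ta\t20\t+\t10\t11\t255,0,0\t20\t0.00\t0\t20\t0\t0\t0\t0\t0\n"
    "tx1\t42\t43\ta\t18\t+\t42\t43\t255,0,0\t18\t50.00\t9\t9\t0\t0\t1\t0\t0\n"
)
kept = list(keep_modified(example))
print(len(kept))
```

Note that rows with zero modified calls still carry coverage information, which may matter for a later differential analysis, so filtering them is a judgment call rather than a requirement.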

@baibhav-bioinfo
Author

Okay @malton-ont, thanks for the response.
I will check the docs for more information and post on the Nanopore community forums for further answers.
Thanks for your insight regarding the Nmod=0 rows; it makes sense.

@baibhav-bioinfo
Author

Hello,
I was able to run Dorado basecalling and methylation calling on my DRS reads. I ran it in both the m6A_DRACH context and the m6A all-context, using the following commands:

dorado basecaller [email protected] pod5/ --modified-bases-models [email protected]_m6A_DRACH@v1/ --device cuda:all > sample.bam

dorado basecaller [email protected] pod5/ --modified-bases-models [email protected]_m6A@v1/ --device cuda:all > sample.bam

For the DRACH motif context I got 60,000 sites per sample (10 million reads, ~1000 nt per read),
but for the all-context run I got >1.6 million predicted m6A sites.

Did something go wrong in my all-context run?
Is that number of sites plausible?

Please advise.

@malton-ont
Collaborator

Hi @baibhav-bioinfo,

By default dorado outputs all-context predictions for any site with a >5% chance of being a modification. You can adjust this using the --modified-bases-threshold parameter if you prefer to be more conservative, or I believe modkit has its own filtering functionality, which you can ask about on their GitHub page.
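
To see why a permissive threshold inflates the site count, here is a minimal sketch of probability thresholding. It is not dorado's implementation. It assumes modification probabilities stored the way the SAM ML tag encodes them (an integer N from 0 to 255 covering the range N/256 to (N+1)/256) and uses the bin midpoint as the probability. The helper names and the example values are hypothetical.

```python
def ml_to_probability(ml_value):
    """Map an ML tag integer (0-255) to the midpoint of its probability bin."""
    return (ml_value + 0.5) / 256


def count_calls(ml_values, threshold):
    """Count calls whose modification probability meets the threshold."""
    return sum(1 for v in ml_values if ml_to_probability(v) >= threshold)


# Hypothetical per-base ML values, invented purely for illustration.
ml_values = [3, 14, 40, 128, 200, 250]

permissive = count_calls(ml_values, 0.05)    # a 5% cutoff
conservative = count_calls(ml_values, 0.80)  # a stricter, arbitrary cutoff
print(permissive, conservative)
```

With a 5% cutoff most low-confidence calls are retained, so a large number of candidate sites in an all-context run is expected until a stricter filter is applied.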
