Training an R10.4.1 methylation model #1064

Closed
hasindu2008 opened this issue Feb 27, 2023 · 4 comments

Comments

@hasindu2008
Contributor

This is closely related to #1059. As that issue was about the nucleotide models, I am opening another issue to discuss the CpG models.

To train a CpG methylation model, I followed what you suggested in #825 some time ago. I made a hybrid reference genome from two copies: one with every CG changed to MG, and the other with the CGs left unchanged. Reads were then mapped to the corresponding reference, and the resulting BAMs were merged.
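The methylated copy of the reference can be produced with a small script. Below is a minimal sketch, assuming the conversion is simply "uppercase the sequence, then replace every CG dinucleotide with MG" as described above; the function names (`read_fasta`, `methylate_cpg`, `convert_fasta`) are hypothetical and not part of nanopolish.

```python
# Hypothetical helpers: build the "methylated" half of a hybrid reference
# by rewriting every CpG as MpG.

def read_fasta(text):
    """Parse FASTA text into (name, sequence) pairs, joining wrapped lines."""
    records, name, chunks = [], None, []
    for line in text.splitlines():
        line = line.strip()
        if line.startswith(">"):
            if name is not None:
                records.append((name, "".join(chunks)))
            name, chunks = line[1:], []
        elif line:
            chunks.append(line)
    if name is not None:
        records.append((name, "".join(chunks)))
    return records


def methylate_cpg(seq):
    """Uppercase the sequence, then replace each CG dinucleotide with MG."""
    # Uppercasing drops soft-masking but ensures lowercase 'cg' is converted too.
    # str.replace scans left to right without overlap, so CGCG -> MGMG.
    return seq.upper().replace("CG", "MG")


def convert_fasta(text, width=60):
    """Return FASTA text with all CpGs methylated, rewrapped to `width`."""
    out = []
    for name, seq in read_fasta(text):
        out.append(">" + name)
        converted = methylate_cpg(seq)
        out.extend(converted[i:i + width] for i in range(0, len(converted), width))
    return "\n".join(out) + "\n"


if __name__ == "__main__":
    # The C ending "TTC" and the G starting "GA" form a CpG across a line
    # break, which is why sequence lines are joined before replacing.
    print(convert_fasta(">chr1\nacg\nTTC\nGA\n", width=4), end="")
```

Joining the wrapped lines first matters: a line-by-line replacement would miss CpGs split across FASTA line breaks. How the two copies are then combined into one hybrid file (for example, contig naming) is not described in this thread, so the sketch leaves it out.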

My question now is what I should use as the input CpG model. I created an input model like this:

```
#model_name r10_450bps.cpg.9mer.template.model
#kit    r10_450bps
#strand template
#k      9
#alphabet       cpg
kmer    level_mean      level_stdv      sd_mean sd_stdv weight
AAAAAAAAA       54.204906       3.0     1       1       1       1
AAAAAAAAC       58.590797       3.0     1       1       1       1
AAAAAAAAG       55.952068       3.0     1       1       1       1
AAAAAAAAM       58.590797       3.0     1       1       1       1
AAAAAAAAT       58.433520       3.0     1       1       1       1
AAAAAAACA       63.659959       3.0     1       1       1       1
AAAAAAACC       65.805750       3.0     1       1       1       1
AAAAAAACG       64.072506       3.0     1       1       1       1
AAAAAAACM       65.805750       3.0     1       1       1       1
...
```
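In the rows above, every k-mer containing M carries the same level as the k-mer with M read as C (for example, AAAAAAAAM and AAAAAAAAC are both 58.590797), and the k-mers run through the letters A, C, G, M, T in that order. A minimal sketch of generating such rows from a nucleotide model, assuming the base model is available as a Python dict from ACGT k-mer to `level_mean`; `derive_cpg_levels` and `format_row` are hypothetical names, and `format_row` only mirrors the placeholder columns (3.0 1 1 1 1) visible in the posted rows rather than any documented nanopolish layout.

```python
from itertools import product

# Letter order used by the rows above: A, C, G, M, T.
CPG_ALPHABET = "ACGMT"


def derive_cpg_levels(base_levels, k):
    """Give every k-mer over ACGMT the level of its M -> C counterpart."""
    derived = {}
    for letters in product(CPG_ALPHABET, repeat=k):
        kmer = "".join(letters)
        derived[kmer] = base_levels[kmer.replace("M", "C")]
    return derived


def format_row(kmer, level_mean):
    """Mirror the posted rows: level_mean, then the fixed placeholders 3.0 1 1 1 1."""
    return f"{kmer}\t{level_mean:.6f}\t3.0\t1\t1\t1\t1"


if __name__ == "__main__":
    # Toy 2-mer base model with made-up levels, only to show the mapping.
    base = {"".join(p): float(i) for i, p in enumerate(product("ACGT", repeat=2))}
    cpg = derive_cpg_levels(base, 2)
    for kmer in list(cpg)[:5]:
        print(format_row(kmer, cpg[kmer]))
```

For k = 9 this enumerates 5^9 = 1,953,125 k-mers, so the output is large but still easy to generate in one pass.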

For every methylated k-mer (one containing M), I used the same values as the corresponding nucleotide k-mer in ONT's base model, i.e. with M read as C. Then I ran the following command:

```sh
nanopolish train -r positive_and_negative_pass.fastq -g hg38noAlt_hybrid.fa -b merged.bam -t 40 --train-kmers=all --input-model-filename r10.4.1_400bps.cpg.9mer.model -d ../meth/
```

Unfortunately, none of the k-mers were trained. Note that nucleotide training now works after the modification you made in #1059, but CpG training still does not. Is there a problem with the approach I am following, or is there something else in the code that prevents the training?

@jts
Owner

jts commented Feb 27, 2023

There are a few different reasons this could fail. Were the nucleotide (ACGT-only) k-mers trained, or was nothing trained at all? Were any status messages printed to the terminal?
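One way to answer the first question is to diff the input model against a model written by a training round and count how many k-mers moved, split into ACGT-only and M-containing k-mers. A minimal sketch, assuming both files use the whitespace-separated layout shown earlier (k-mer in the first column, `level_mean` in the second, `#` header lines); `parse_levels` and `count_changed` are hypothetical names, and the output file names that `nanopolish train` writes are not given in this thread.

```python
# Hypothetical diagnostic: which k-mers moved between the input model and a
# model written by training? Assumes both use the whitespace-separated layout
# shown earlier (k-mer in column 1, level_mean in column 2, '#' header lines).

def parse_levels(text):
    """Map each k-mer to its level_mean, skipping '#' lines and the column header."""
    levels = {}
    for line in text.splitlines():
        fields = line.split()
        if not fields or fields[0].startswith("#") or fields[0] == "kmer":
            continue
        levels[fields[0]] = float(fields[1])
    return levels


def count_changed(before_text, after_text, tol=1e-6):
    """Count k-mers whose level_mean changed, split by whether they contain M."""
    before = parse_levels(before_text)
    after = parse_levels(after_text)
    counts = {"nucleotide": 0, "methylated": 0}
    for kmer, level in before.items():
        if kmer in after and abs(after[kmer] - level) > tol:
            counts["methylated" if "M" in kmer else "nucleotide"] += 1
    return counts


if __name__ == "__main__":
    # Made-up toy models: only the ACGT k-mer "AC" changed.
    before = "#k\t2\nkmer\tlevel_mean\nAC\t58.59\nAM\t58.59\n"
    after = "#k\t2\nkmer\tlevel_mean\nAC\t60.10\nAM\t58.59\n"
    print(count_changed(before, after))
```

A result with zero changes in both buckets points to nothing being trained, while changes only in the "nucleotide" bucket would suggest the M-containing k-mers are being skipped.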

@hasindu2008
Contributor Author

Oooohhh. I added fprintfs here and there and finally found the likely problem: the base model being used was the old built-in R10 model. I manually replaced the .inl model, and it now seems to actually be doing the training. Will keep you posted when the training is done.

@jts
Owner

jts commented Mar 1, 2023

Ah yes, good find!

@hasindu2008
Contributor Author

Well, it will take a while, it seems, and the server is crying :D
