This repository has been archived by the owner on Jan 13, 2022. It is now read-only.

The prediction results showed a low recall #50

Open

weir12 opened this issue Aug 29, 2019 · 11 comments

weir12 commented Aug 29, 2019

Hi:
As I said in a previous issue, I'm trying to use taiyaki to train a modified-base RNA model, but I found that the prediction results show low recall. Here is the process I used to get these results:
1. Follow the instructions to train a modified base model (about 140k reads covering 2000 transcriptome modification sites, roughly 1.5 modified bases per read).
2. Use the model to basecall the training set itself:
basecall.py --device 0 --modified_base_output basecalls.hdf5 ${trainning_set_reads} training2/model_final.checkpoint > basecalls.fa
3. Map basecalls.fa to the transcriptome with minimap2 to get a BAM file (minimap2 -t 8 -ax splice -uf -k14 ${transcriptpme} ${workspace}basecalls_reversed.fa > ${workspace}basecalls.sam), then apply a threshold to the basecalls.hdf5 output to obtain the modified-base coordinates on each read.
4. Convert the coordinates of modified bases from read-based to transcriptome-based (processing the CIGAR in the BAM file via pysam's r.get_reference_positions).
5. Count how many of the 2000 transcriptome modification sites are covered by predicted modified bases. I found that only about 6% of positions were recalled. I don't know where the problem lies; maybe my evaluation method is unsuitable, or the training itself went wrong.
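Steps 4 and 5 above can be sketched in pure Python. This is a minimal, hypothetical illustration: the (read_pos, ref_pos) pairs stand in for what pysam's AlignedSegment.get_aligned_pairs(matches_only=True) would return for one read, and the function names are my own, not taiyaki's.

```python
def project_mod_sites(aligned_pairs, mod_read_positions):
    """Map read-space modified-base positions to reference coordinates.

    aligned_pairs: list of (read_pos, ref_pos) tuples for matched bases,
    e.g. from pysam's AlignedSegment.get_aligned_pairs(matches_only=True).
    Read positions falling in insertions or clips have no reference
    coordinate and are dropped.
    """
    read_to_ref = dict(aligned_pairs)
    return sorted(read_to_ref[p] for p in mod_read_positions if p in read_to_ref)


def site_recall(predicted_ref_positions, known_sites):
    """Fraction of known modification sites recovered by the predictions."""
    hits = known_sites & set(predicted_ref_positions)
    return len(hits) / len(known_sites)


# Toy example: a read aligned from reference position 100 with a 2-base deletion.
pairs = [(0, 100), (1, 101), (2, 104), (3, 105)]
ref_hits = project_mod_sites(pairs, {1, 2, 7})   # read position 7 is unaligned
print(ref_hits)                                   # [101, 104]
print(site_recall(ref_hits, {101, 200}))          # 0.5
```

In a real run, the aligned pairs would come from each record in the BAM produced by minimap2, and recall would be aggregated over all reads.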

(attached images: training_loss, location_sig_ref, chunklog)
Unfortunately, this downstream analysis software does not support RNA data:
https://github.com/nanoporetech/megalodon
So does downstream analysis with taiyaki currently require users to implement it themselves?

weir12 (Author) commented Sep 5, 2019

A supplementary question: does the declining loss curve indicate that my training is effective and correct?

weir12 (Author) commented Sep 6, 2019

Hi:
I found that there is a big difference between the original sequence (generated during GridION sequencing) and the sequence generated by the modified base model. Here is an example.

'read_id','original_seq','predict_seq','original_identity','identity','original_align_trans','original_align_start','original_align_end','predict_align_trans','predict_align_start','predict_align_end','predict_identity'

5ec71e76-bd51-457d-b1f1-ad551ddae980 ACUACGAGGGUACCCGGGUAAUGCCGCAGGAAUUCUUUGGCCGCAGAACUUCCUUUCAAAGGCUAUUGCUACGGCGGAAAUUAGCUAAUGGUUACAGAAAGAAGCCCAUCCCCUCGGCCCGAAUAGUUUAUCGAAAUCGAGAAGGGCAGUGGAAAUUUCGACGGUCCUGACCGCAUCUCCGACUCCAUCCUUCCCCUGCACUCCUAACAGCAGCAUGUCCAAGAGCCCAUACCUGUCACUUUAGCCACUGGAGCUUCCACCGUCCCUGGCCUCUAAACCAUUGUAUCAUGGCUUCGCGACAAUUAAUGUGCCAAUACAUUCUUACGAGAGUUGGCCUUCAUUCUACCGAGUUUGAUAUGUACUUAUGCUGUUAGUGGUAGUUUAAAGCAUACCACCAAGGUACUGGCUUUGAUUUGUAUUAACCCUAGCAAGCCAUCUUAGCCAGGCUAUUGUGGAAAAGUCAACACCAUGAAGUUUGGAUGAUUGUUGCUGGAUCUUUUUCGGCUAAAUACAAGCUCCACGAAAUAGUCCGACGUCAGCCAAAUUCUUUGGAGCUAUCAAGGCUGCAAUGGCCUACUUCGCCUUAAUAAAACACCGACCAAUAUUUGAAAUCCCCCGUUGAAGCUGAAGUCAGGAUCAUUCCACUAUGCAAAACGUUGUUCCAUGAUCGACGCAGCACAAUUGGUAAUAGACGAGUCAAGGAUAUGUAAGUGGUAAAACUACGGCUGUGAUAUUCGCAUCGUGGACUUUGAGAAUCCCAAAGAAAAUGAGAAGAACCUUCUGCUCAUGGCGGUAAAGCCUAUGCAUGGCUAAGGCUGAUGACGAAAACUGAUGGUCAGUUGUUAUUUUGCCUCAGAAUGGUUGGUCACUAAUGCAGAAUGGAGCAACCUCGGCGGCUGUUAAAUCGUUUGCGCAUUGAUUCAUAUCACCACACUAUGGGUGACAUGUGGUGGCGAACUGUUAUGGUUUUUGUCCCCAGAGUUCUAAACAAGUCUAAUCGAAGAUAUGUUACGAAUCGUCUUUGGCAAAAAGAUUUUCGAGUGUCGAUUAAGCAGAUUCUCGUGGUCCAUUACUCCUUACUUUUUCCCAAGGUAUUGCGUCGCACUCGCGAUAACAUAAACCCUGAGAACAUUUUGAUCUCUUGGGUAUGGUGUUUUUAGAAUUCAUGGGGAUUUUUGGCGCUAUGGAUGUUUGUGCAAUCCCAGCGAUGAUAACUGUUGUGGAAUCAAAGAGUUAUGGGUAUUUUGGUGACCCCCCUCGGCUCCGUGAAAUUCGACUUCCAGGCUUUACAAUCCUUUUAUUGAUAUGACACACCUUAAUUAUCGAAUUCUACAUUGUCAUAUUUUUCGCCACGUUUCCCUUCCGCUAAAGCUUUUUCACGAGAACACCAGUUGGUACGCAAAAUUGUAAAGCUUGGAGAGAGUUUACAAUCUCAUUUGUGCACGUCCAAAAUAUUCGUAUUAAGAUCGCCAUUUUAAACCUGAUCGCAAUGACCUUUUUCCGAAUGCAGCAGGUUGUCUUGUCAUGGCAAAUCCCCAUUGCCAAACAAAGAAUUACUGCCCCCAAGAGAAGCUUUAAUCUCCCUUUGUCCAAGAAAUUGGCGCUGCUUUGUCGAUGAUUGGUUUAUGAUAUCCAGAGUGCCUUGAAACGAGCCAAAUCCUCCCGCUUCAGAGGACUAUCUACGCCACAAUUCAUGUUGAGAUUUUCGUAAUGGCUUGCUUCUUUGUUUUGAAGACGAUUAGUGUUUCAGAUUUUUUUUUAAUUAGCCAUUUUCUUGGUUGUUUGAAUUCAUCAACUCCGCUGCGCAUACUCCUCUCUUUUACCUUAUACUAUUAAUUUUACGACCGAAUUUCGAAAUGAUGUGUGUGUUUUGGCUGCCCCUUCGAAUGGUUACACUUCGACCCGAAGCUAUUUUAUUUGGAACAUGCAAGGCCCGAAAAAAUAAAAACGUCCAAGUUGGACU
GACUUGCAAAUUAUAUCUACAAAAAGAAGAUGCCGCAUUGUUUUUUGUAUUCGAAUUUAGUUGAUGUAAUUUUUUAAUUUUUUAAAAUUCCAUCUUCCAUAAUUGCUACUUAUGCUUAAAUUUGGGACAAGCUACGUAUUGUAACGGCAUAGUGAGGAAAAUGGCCUUGUUUAAUAAUUUUUUCAGAGCAUUUUUUAUCAUAUUUGUAAUAAAGUCAGUAACGGCUUCGACUAAUCUACUGUUUGCGCUUACCUUAAAUAU      ACCAAGTACCCGCGGTAATGCCAAGGATTTTTGGCCGCAGAACTCACCTTCAAGGCTGCTACTCGGCGGAAATTAATAATGGTTACAGAAAGAAGCCCACTCCCTTGCCCGAATAGTTTATTTGGATCGAGAAGGCAGTAATGGAAGCTATTCGACGGTCCTGACCGCATCCGACAACCATCCTCCCTACTGGCACTCCCTCATTGCTGCATGTCCAAGAGCCCATTTCGTCATTTAGCCACTGAGCTTCCTACCGTCCTGGCCTCTAAACTCATTAGTATCATCGTGTTCGCGACAATTATGCCAATACATTCTTACGAGAGTTGGCCTCATTCACTACCGAGTTTGATATGTTTACCTATGCTGTTAGTGGTAGTTTAAGCTCTCCTCCAAGGTACTGGCTTTGATTGTATTAACCCTGCAAGCCATTTAGCCAGCTATTGGAAAGTCATCCATGAAGTTTAGATGATAATGTTGGATTTTTCGGCTAATACAAGCTCCAAATAGTCCGACGTCAGCCAAATATTCTTGAGGCATCCAAGGCTGCAATGGCCCCTACTACTTCGCTCCTATAAAACACCGACCAATATTCTGAAATCCCCGCAAGCTGAAGTCAGGATTTATTCCACCATATGCAAAACGCGTTGTTCTCGATTGAGCGCTGAATACAAATTGTTGACGAGTCAAGGATATGACTGGTGGTGCTACGCTGTGATTCGCTCGTGGCTCTTCGTGAGAATCCCAAAGAAAATGAGAAGAACCTTCGCTTTTTGCAGTAAAGCCTATCGTGTTAAGGCTGACGACGAAAACTGATGGTCAGTGCTGCTATTATTTGCCTCAGAATGGTTGGTTCAATGCAGAATGGAGCATCCTAATGTTGTTAAATCGTATGATTTGTGCATTGATTCTCATATTCCTCTCTATAGTGACACGTGGTGCGCTGTTATGGTTCGTCCCAGAGGATCTGCTAAGTCTATCGAAGATATGACACGATCGTCTGGCAAAAGATTTTAGTGTCGATTAAGCAGATTCTTCGTGGTCCTCAACTCCATTCCCAAGGTATTGTGCGCATTCGCGATATTATGACCTGAGAACATTTTGATCTTGAGTATGGTGTTTTTAAGAATTATGGATTTTTGGCGCATGTGATGTTGTGCAATCCCGGTGACGATATTATGTCTGTGGAATCAAAGAGTATGGGTATTTGGTAGTGACCCCTCGAATACCTGAGCTTTGACACCAGGCTTTACAATGCCTTTTTGCTGATATGTGGTCTACCGCTTTATGGTTACATTGTCTATATTTCGCACATACCTTTCCGCAAAGCTTTCAGAACGACCAGCTTGTACGCTAATTGTAAAGCTTGGAGAGTCAATCTCATTTGCGATGTCCAAAATATTTCGTATATCAAAGATTTGCCATATTTTAAACCAGTCAATGACCTTTCCGATGCTATGGGTTGTTTTTCTGTTCTGGCAAATCCCCATACTGCCGAACAAAGAATTACTGCCCAAGAAGCTTTAATCTTCCCTTTGTCCAAGAAATTGAATGCTGCTTCAATTGATGATTGGTCATGCACTATGATGCTCCAGAGGTGCCTTGAAAACGAAGCAAATCCTCCCTTCGCTTCAGAAGTTATCTACGCCACAATACATGTTGAGATTTTCGTAATGGCTTGCTCTTGTTTGAAGACGATTAGTGTCACGAGATTTTTAATAGCCATTTCTTGGTTG
TTTGATTATCATCAATGCACGTTGTGTGCATACTCCTTCTTTACTCTCCTTATACTAACTATTTCATGTACGATCTTCGAAATGATGTGTGTTTTGGCTGCCCTCTGCTGGGTTCTCACTTTTCGACCTAAGCTATTTATTTGGAACCGCAAGGCCAAAAATAAAACGTCCAAAGTTGACTGACTTCGCAAATTATACAAAAAGAAGATGCCGCATTGTTTTGTATTTGCTTAGTTGATGTAATTTTTAATTCTTGAATTCCATTTCCATGCTGCATTTATGCTTAGCTACTGATGCTACGTAATATATGGGCCAACAGTGAGGAAAATGGCCTTGTTTAATAATTTCAGAGCATTTCTTTATCATATTTTTGTAATAAAAGTCAGTAACGGCTTCGACTGCTTACTGTTAAGTATACCTTGATCCTCCCTGCCCCATCATTCGCTTCATTATCTATCCATCCCTCATCATCACCAAATTCCA      84.92208982584785       59.14414414414414       NM_001023527.2       2894    5076    ('NM_001023527.2',)     2864    5062    81.74807197943444

In my opinion, because the sequence composition changes, the alignment changes too, so it is hard to successfully recover the known positive sites of the input.

callumparr commented

Loss in the mapped squiggle for RNA reads is also compounded by the polyA tail being absent from the guppy-called fastq.

supplementary question:
Does the decline of the loss curve indicate that my training is effective and correct?

I guess the loss curve is from the second round of training? It looks very similar to mine: I see greater data loss during the first round, and it produces a curve very similar to the one from the walkthrough_modbase exercise.

weir12 (Author) commented Sep 10, 2019

loss in mapped squiggle for RNA reads is also compounded by polyA absent from guppy called fastq.

So you have encountered a similar situation, my friend? :)

I guess the loss curve is from the second round of training? This looks very similar to mine. I see greater data loss during the first round and produces a curve very similar to the one produced with the walkthrough_modbase exercise.

Yep, this is from the second round.

myrtlecat (Contributor) commented

Sorry for the slow response! I hope I can help you to diagnose the problem.

Yep, this is from second round.

Ok, good. If this were the first round of training I would say something looks wrong, but for refining a model this looks fine. The plots showing the mapped reads also look ok. Let's assume the training is fine for now and check the basecalling.

I found that there was a big difference between the original sequence...

The example you gave has an accuracy of roughly 77%, which is... not great. One possibility is that the default chunk size used by basecall.py is too short for RNA.

Could you try basecall.py --chunk_size 12000 ... and see if that makes a difference? Perhaps you could post the new basecall for that same read for comparison?

weir12 (Author) commented Sep 11, 2019

Thanks a lot! I will try altering the parameter and evaluating the results. Keep in touch.

weir12 (Author) commented Sep 12, 2019

Update:
After updating the parameters for basecall.py, here is the new statistic for the same UUID:

5ec71e76-bd51-457d-b1f1-ad551ddae980    ACUACGAGGGUACCCGGGUAAUGCCGCAGGAAUUCUUUGGCCGCAGAACUUCCUUUCAAAGGCUAUUGCUACGGCGGAAAUUAGCUAAUGGUUACAGAAAGAAGCCCAUCCCCUCGGCCCGAAUAGUUUAUCGAAAUCGAGAAGGGCAGUGGAAAUUUCGACGGUCCUGACCGCAUCUCCGACUCCAUCCUUCCCCUGCACUCCUAACAGCAGCAUGUCCAAGAGCCCAUACCUGUCACUUUAGCCACUGGAGCUUCCACCGUCCCUGGCCUCUAAACCAUUGUAUCAUGGCUUCGCGACAAUUAAUGUGCCAAUACAUUCUUACGAGAGUUGGCCUUCAUUCUACCGAGUUUGAUAUGUACUUAUGCUGUUAGUGGUAGUUUAAAGCAUACCACCAAGGUACUGGCUUUGAUUUGUAUUAACCCUAGCAAGCCAUCUUAGCCAGGCUAUUGUGGAAAAGUCAACACCAUGAAGUUUGGAUGAUUGUUGCUGGAUCUUUUUCGGCUAAAUACAAGCUCCACGAAAUAGUCCGACGUCAGCCAAAUUCUUUGGAGCUAUCAAGGCUGCAAUGGCCUACUUCGCCUUAAUAAAACACCGACCAAUAUUUGAAAUCCCCCGUUGAAGCUGAAGUCAGGAUCAUUCCACUAUGCAAAACGUUGUUCCAUGAUCGACGCAGCACAAUUGGUAAUAGACGAGUCAAGGAUAUGUAAGUGGUAAAACUACGGCUGUGAUAUUCGCAUCGUGGACUUUGAGAAUCCCAAAGAAAAUGAGAAGAACCUUCUGCUCAUGGCGGUAAAGCCUAUGCAUGGCUAAGGCUGAUGACGAAAACUGAUGGUCAGUUGUUAUUUUGCCUCAGAAUGGUUGGUCACUAAUGCAGAAUGGAGCAACCUCGGCGGCUGUUAAAUCGUUUGCGCAUUGAUUCAUAUCACCACACUAUGGGUGACAUGUGGUGGCGAACUGUUAUGGUUUUUGUCCCCAGAGUUCUAAACAAGUCUAAUCGAAGAUAUGUUACGAAUCGUCUUUGGCAAAAAGAUUUUCGAGUGUCGAUUAAGCAGAUUCUCGUGGUCCAUUACUCCUUACUUUUUCCCAAGGUAUUGCGUCGCACUCGCGAUAACAUAAACCCUGAGAACAUUUUGAUCUCUUGGGUAUGGUGUUUUUAGAAUUCAUGGGGAUUUUUGGCGCUAUGGAUGUUUGUGCAAUCCCAGCGAUGAUAACUGUUGUGGAAUCAAAGAGUUAUGGGUAUUUUGGUGACCCCCCUCGGCUCCGUGAAAUUCGACUUCCAGGCUUUACAAUCCUUUUAUUGAUAUGACACACCUUAAUUAUCGAAUUCUACAUUGUCAUAUUUUUCGCCACGUUUCCCUUCCGCUAAAGCUUUUUCACGAGAACACCAGUUGGUACGCAAAAUUGUAAAGCUUGGAGAGAGUUUACAAUCUCAUUUGUGCACGUCCAAAAUAUUCGUAUUAAGAUCGCCAUUUUAAACCUGAUCGCAAUGACCUUUUUCCGAAUGCAGCAGGUUGUCUUGUCAUGGCAAAUCCCCAUUGCCAAACAAAGAAUUACUGCCCCCAAGAGAAGCUUUAAUCUCCCUUUGUCCAAGAAAUUGGCGCUGCUUUGUCGAUGAUUGGUUUAUGAUAUCCAGAGUGCCUUGAAACGAGCCAAAUCCUCCCGCUUCAGAGGACUAUCUACGCCACAAUUCAUGUUGAGAUUUUCGUAAUGGCUUGCUUCUUUGUUUUGAAGACGAUUAGUGUUUCAGAUUUUUUUUUAAUUAGCCAUUUUCUUGGUUGUUUGAAUUCAUCAACUCCGCUGCGCAUACUCCUCUCUUUUACCUUAUACUAUUAAUUUUACGACCGAAUUUCGAAAUGAUGUGUGUGUUUUGGCUGCCCCUUCGAAUGGUUACACUUCGACCCGAAGCUAUUUUAUUUGGAACAUGCAAGGCCCGAAAAAAUAAAAACGUCCAAGUUGG
ACUGACUUGCAAAUUAUAUCUACAAAAAGAAGAUGCCGCAUUGUUUUUUGUAUUCGAAUUUAGUUGAUGUAAUUUUUUAAUUUUUUAAAAUUCCAUCUUCCAUAAUUGCUACUUAUGCUUAAAUUUGGGACAAGCUACGUAUUGUAACGGCAUAGUGAGGAAAAUGGCCUUGUUUAAUAAUUUUUUCAGAGCAUUUUUUAUCAUAUUUGUAAUAAAGUCAGUAACGGCUUCGACUAAUCUACUGUUUGCGCUUACCUUAAAUAU ACCAAGTCTCCATGGTAATGCCAAGAATTTTCTTGATCGCAGAACTTCCTTCAAGGCTAATTACATGGCGGAAATTAATCAATGGTTACAGAAAGAAGCCCTCTCCCTTGCCCGAATAGTTTATTGCTGAGAAGGCGGTAATGGAAGCTATTCGACGGTCCATTGACCGCATCTCCGACTCCTCCCTGCACTCCTACTGCATGTCCAAGAGCCCATTCTGTCATTTAGCCACTACTGGAGCTTCCTACCGTCCTGGCCTTCTAAACATTAGTATCGTGTTCGCGACAATTATGCCAATCATCACACGAGAGTTGGCCATTCACTACCGAGTTTGATATGTTTACCATGCTGTTAGTGGTAGTTTAAGCTCTCCTCCAAGGTACTGGCTTTTGATTGTATTAACCCTGCAAATCCATTTAGCCAGGCTATTGGAAAGTCATCCATGAAGTTTGGATGATAATGTTGGATTTCGGCTAATACAAGCTCCAAATAGTCCGACGTCAGCCAAATATTTTGAGGCATCCAAGGCTGCAATGGCCCTACTACTTATCCTATAAAACACCGACCAATATTTCTGAATCCCCGCAGCTGAAGTCAGGATTTATTCCATATGCAAAACGCGTTGTTCCTCGATTGAGCGTAGAATCAATTGGTTGATGAGACCAAGGATATGTAGTGGTGCTACGGCTGTGATACGTTTGGACTCTTGAGAATCCCAAAGAATGAGAAGAACCTTCGCTTTGCGGTAAAGCCTATCGTCGTTAAGGCTGATGACGAAACTGATGGTCAGTATTGCTAAACTTGCCTCCAGAATGGTTGGTTCAATGCAGAATGGAGCATCCTAATGTTGTTAAATCGTTGATTTGTGCATTGATTCTCATTATCTTCCTCTCTATAGTGACACGTGGTGCGCTGTTATGGTTCGTCCCAGAGGGATCTGCTAAGTCTAATCGAAGATATGACACGATCGTCTTGGCAAAAAGATTTCGAGTGTCGATTAAGCAGATTTCGTGGTCTCATTACTCCACATTCCCAAGGTATTGTGCGCATTCGCGATATTAAACCTGAGAACATTTTGATTTAAGTATGGTGTTTTAGAATTACTGATTCTTGGCGCTGTGATGTTTTGTGCAATCCCGGTGACGATATTACTGTTGTGGAATCAAAGAGTATGGGTATTTGTAGTGACCCCTTCGGCTCCTGAATTTGACAAGGCTCTTACAATGCCTTTTTGCTGATATGTGATCTACCGCTATCGATTTTACATTGTCTATATTTCGTCATACCTTTCCGCAAAGCTTTCAGAACGACCAGTTGTACGCAATTGTAAAGCTTGGAGAGTACAATCTCATTTGCGATGTCCAAAATATTCGTATATCAAAGACTTGCCATATTTTAAACCAGTTTATAATGACCTTTCCGATGCTATGCAGCAGTTGTTTCTGTTTGGCAAATCCCATTGCCGAACAAAGAATTACTGCCCAAGAAGCTTTAATCTTCCCTTTGTCCAAGAAATTGAATGCTGTTTTGTCGTCGATTGTACACTATGATGCTCCAGAGGTGCCTTGAATGGGCAAATCCTCCTCGCTTCAGAAGTTATCTACGCCACAATCACTCGTTGAGATTTCGTAATGGCTTGCTTTGTTTGAAGACGATTAGTGTACGAGATTTTTAATAGCCATTTCTTGGTTGTTTGATTTTTATCATCAATGCCACGTTGTGTGCATACTCCTCT
CTTCTCTCCTTATACTAACTAATTTCATGTACGATTTTCGAAATGATGTGTGTTTTGTTGCCCTTTGCTGGTTACACTTTCGACCTAAGCTATTTATTTGGAACCAAGGCCAAAAAAATAAAAACGTCCAAAGTTGACTGACTTAATAAATTATATACAAAAAGAAGATGCCGCATTGTTTTGTATTTGCTTAGTTGATGTAATTTTTTAATTCTTTAAATTCCATTTCCATTGCTGCATTTATGCTTAGCTACTAGTCAAGCTACGCAATATATGAAAATGCATAGTGAGGAATGGCCTTGTTTAATAATTTCAGAGCATTTTTATCATATTGTAATAAAAGTTCAGTAACGGCTTCGACTGCTTACTGTTTAGCGA        84.92208982584785       59.30366612866036       NM_001023527.2  2894    5076  ('NM_001023527.2',)      2849    5063    78.09798270893371

Uh, it seems to be getting worse...
On the other hand, I noticed a conceptual error in my custom script: the order of cond_logprobs is 3' to 5', but the list returned by get_reference_positions is 5' to 3'. After fixing it, the recall rate is now 47.7% with the threshold set at 0.2.
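The index flip involved here can be sketched as follows. This is a minimal illustration, not taiyaki code: the function name is mine, and it assumes 0-based positions and a known read length.

```python
def flip_to_five_prime(read_len, positions_3p_to_5p):
    """Convert 0-based positions indexed from the 3' end of a read
    (the 3'->5' order of cond_logprobs for RNA) to the 5'->3' indexing
    that pysam's get_reference_positions() list uses."""
    return [read_len - 1 - p for p in positions_3p_to_5p]


# Position 0 counted from the 3' end of a 10-base read is position 9
# counted from the 5' end.
print(flip_to_five_prime(10, [0, 3]))  # [9, 6]
```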

marcus1487 (Contributor) commented

Would it be possible to produce a ROC or precision-recall curve by sweeping this threshold? While in theory these values correspond to log probabilities, in practice they often require calibration. See some discussion on this topic in the megalodon docs here.
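Such a precision-recall sweep can be computed directly from per-site (score, truth) pairs. A minimal pure-Python sketch, with hypothetical toy data and a function name of my own; it assumes the scores have already been converted from raw log probabilities to probabilities:

```python
def precision_recall_sweep(scored_sites, thresholds):
    """Return (threshold, precision, recall) tuples from per-site
    (mod_score, is_truly_modified) pairs, calling a site modified
    whenever its score meets the threshold."""
    n_pos = sum(1 for _, truth in scored_sites if truth)
    points = []
    for t in thresholds:
        called = [truth for score, truth in scored_sites if score >= t]
        tp = sum(called)  # booleans sum as 0/1
        precision = tp / len(called) if called else 1.0
        recall = tp / n_pos
        points.append((t, precision, recall))
    return points


# Toy data: 3 truly modified sites among 5 scored sites.
sites = [(0.9, True), (0.8, True), (0.7, False), (0.4, True), (0.2, False)]
for t, p, r in precision_recall_sweep(sites, [0.5, 0.3]):
    print(t, round(p, 2), round(r, 2))
```

Plotting recall against precision over a fine grid of thresholds gives the PR curve; the threshold where the curve bends sharply is a reasonable operating point before any formal calibration.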

weir12 (Author) commented Sep 16, 2019

@marcus1487 Thank you for your advice! I'll read the megalodon docs later.

weir12 (Author) commented Sep 23, 2019

Update:
After strictly controlling the threshold (min_coverage = 90%), I get a curve that looks like this:
(attached image: PRC)
Obviously, my model still has a lot of room for improvement. @marcus1487 @myrtlecat, could you give me some suggestions for improvement? Thanks!

marcus1487 (Contributor) commented

A number of improvements to the training scripts have been implemented in the latest release, in which I have found that using --mod-factor 1 from the beginning of training performs better than the previous recommendations. Additionally, the starting learning rate may need to be lowered a bit for modified base training. I haven't done enough testing on how this interacts with RNA training, so that remains more of an open research question. Hopefully this helps in your research here.
