Should I train a new model ? #19

Guo-Weihua · 2020-06-19T05:43:19Z

Dear Collin,
I want to run 2020plus on my own pan-cancer data, which contains no silent mutations (total mutation count > 130,000), to predict oncogenes and TSGs for both pan-cancer and cancer-type-specific analyses. Should I train a new model on my data with --config drop_silent="yes" and then run predict, or just run pretrained_predict using your pre-trained 20/20+ classifiers with the same config?
Thanks.

ctokheim · 2024-01-07T23:14:10Z

Ideally, one would train an entirely new model on data without silent mutations and then apply it to additional data that also excludes them. In general, scores will skew higher when your data contain no silent mutations but are scored using a model that was trained on data that did contain them. However, as you noticed from the option, a reasonable workaround is to adjust what counts as a significant score by accounting for the fact that silent mutations are not included in the Monte Carlo simulations. This should help reduce potential biases, but you should still check the p-values and see whether there is an artificially large number of significant results for your data. If that is the case, then you may need to train a new model.
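
As a rough illustration of the p-value check suggested above, here is a minimal sketch in Python. It assumes you have already extracted the per-gene p-values for one score (oncogene, TSG, or driver) from the prediction output into a plain list. The function names, the Benjamini-Hochberg correction, and the 0.1 q-value threshold are illustrative assumptions, not part of 20/20+ itself.

```python
def benjamini_hochberg(pvalues):
    """Return Benjamini-Hochberg q-values in the same order as the input."""
    n = len(pvalues)
    # Indices of the p-values sorted from smallest to largest.
    order = sorted(range(n), key=lambda i: pvalues[i])
    qvalues = [0.0] * n
    running_min = 1.0
    # Walk from the largest p-value down so q-values stay monotone.
    for rank in range(n, 0, -1):
        idx = order[rank - 1]
        running_min = min(running_min, pvalues[idx] * n / rank)
        qvalues[idx] = running_min
    return qvalues


def summarize_significance(pvalues, q_threshold=0.1):
    """Count significant genes and report the share of small raw p-values.

    q_threshold is a hypothetical cutoff chosen for this sketch.
    """
    qvalues = benjamini_hochberg(pvalues)
    n_significant = sum(q <= q_threshold for q in qvalues)
    # Under a well-calibrated null, roughly 5% of p-values fall at or below 0.05.
    fraction_small_p = sum(p <= 0.05 for p in pvalues) / len(pvalues)
    return {
        "n_significant": n_significant,
        "fraction_p_below_0.05": fraction_small_p,
    }


if __name__ == "__main__":
    # Toy p-values for illustration only; replace with values from your own run.
    toy_pvalues = [0.01, 0.04, 0.03, 0.5]
    print(summarize_significance(toy_pvalues))
```

If the share of p-values at or below 0.05 is far above 5% across most genes, or the number of significant genes is implausibly large for your cohort, that would be consistent with the inflation described above and would argue for training a new model. True driver genes also contribute small p-values, so treat this as a rough calibration check rather than a definitive test.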
