Should I train a new model ? #19

Open
Guo-Weihua opened this issue Jun 19, 2020 · 1 comment
Comments

@Guo-Weihua

Dear Collin,
I want to run 2020plus on my own pan-cancer data without silent mutations (total mutation count > 130,000) to predict oncogenes and TSGs, both pan-cancer and for specific cancer types. Should I train a new model on my data with --config drop_silent="yes" and then run predict, or just run pretrained_predict using your pre-trained 20/20+ classifiers with the same config?
Thanks.

@ctokheim
Collaborator

ctokheim commented Jan 7, 2024

Ideally, one would train an entirely new model on data without silent mutations and then apply it to additional data that also excludes them. In general, scores skew higher when data without silent mutations is scored using a model that was trained on data containing silent mutations. However, as you noticed from the option, a reasonable workaround is to adjust what counts as a significant score by accounting for the fact that silent mutations are not included in the Monte Carlo simulations. This should help reduce potential biases, but you should still check the p-values to see whether your data yields an artificially large number of significant results. If it does, you may need to train a new model.
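
One way to do the p-value check described above is sketched below. This is only an illustration, not part of 20/20+: it assumes you have already pulled one set of p-values (for example the oncogene or TSG p-values) out of the prediction output into a plain Python list, and the function names and the 0.1 q-value cutoff are hypothetical choices. It applies a Benjamini-Hochberg correction, counts the significant genes, and reports the fraction of raw p-values below 0.05.

```python
def benjamini_hochberg(pvalues):
    """Return Benjamini-Hochberg adjusted p-values (q-values) in input order."""
    n = len(pvalues)
    # Indices of the p-values sorted from smallest to largest.
    order = sorted(range(n), key=lambda i: pvalues[i])
    qvalues = [0.0] * n
    running_min = 1.0
    # Walk from the largest p-value down so q-values stay monotone.
    for rank in range(n, 0, -1):
        idx = order[rank - 1]
        running_min = min(running_min, pvalues[idx] * n / rank)
        qvalues[idx] = running_min
    return qvalues


def summarize_pvalues(pvalues, alpha=0.1):
    """Count BH-significant results and the share of raw p-values below 0.05.

    `alpha` is an illustrative q-value cutoff, not a 20/20+ default.
    """
    if not pvalues:
        raise ValueError("pvalues must not be empty")
    qvalues = benjamini_hochberg(pvalues)
    n_significant = sum(q <= alpha for q in qvalues)
    fraction_below_05 = sum(p < 0.05 for p in pvalues) / len(pvalues)
    return {
        "n_significant": n_significant,
        "fraction_below_0.05": fraction_below_05,
    }


if __name__ == "__main__":
    # Made-up p-values, purely to show the call.
    example = [0.5, 0.9, 0.001, 0.2]
    print(summarize_pvalues(example))
```

If the p-values are well calibrated, most genes should behave like null genes, so the fraction below 0.05 should sit only modestly above 5%. A much larger fraction, or far more significant genes than is biologically plausible, would point toward training a new model.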
