Need unit tests for calc...() popgen functions #474
Comments
This could take the form of a VCF file and an expected value. My SLiM test could simply load the VCF and check for a match (within reasonable numerical tolerance) to the expected value.
My recommendation is to not do expected values from theory (if that's what you meant); instead compare to the value calculated independently - either by other software or by a separate, first-principles implementation. I don't want to take this on right now though - maybe a good student project?
OK. Why not expected values from theory?
Because that is so much more complicated - you have to worry about statistical power, how close is "close enough", etc. That sort of thing is good for validation, but not so good for unit tests (for one thing, you end up having to run a lot of simulations to make sure). What we do in tskit, for instance, is usually just pull up the definition of the thing, code up a really simple implementation that doesn't worry about efficiency, and compare to that. msprime does have a whole
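As a rough illustration of the "simple implementation from the definition" approach, here is a minimal Python sketch for nucleotide diversity (mean pairwise differences). The function names and the 0/1 haplotype encoding are hypothetical choices made for this example; they are not taken from tskit, msprime, or SLiM. The naive version follows the definition pair by pair, and the allele-count version stands in for whatever optimized code is under test.

```python
from itertools import combinations
import math


def pi_naive(haplotypes):
    """Mean number of pairwise differences, straight from the definition.

    `haplotypes` is a list of equal-length lists of 0/1 alleles
    (an encoding assumed for this sketch).
    """
    pairs = list(combinations(haplotypes, 2))
    total = sum(sum(a != b for a, b in zip(h1, h2)) for h1, h2 in pairs)
    return total / len(pairs)


def pi_from_counts(haplotypes):
    """Same quantity computed from per-site allele counts.

    A site with k derived copies among n haplotypes contributes
    k * (n - k) differing pairs.
    """
    n = len(haplotypes)
    differing_pairs = 0
    for site in zip(*haplotypes):
        k = sum(site)
        differing_pairs += k * (n - k)
    return differing_pairs / (n * (n - 1) / 2)


# Tiny hand-checkable input: 4 haplotypes, 3 sites.
example = [
    [0, 1, 0],
    [1, 1, 0],
    [0, 0, 1],
    [0, 1, 0],
]

# The unit test is just: the two implementations must agree.
assert math.isclose(pi_naive(example), pi_from_counts(example))
```

Because both functions are deterministic, the comparison needs only a floating-point tolerance, with no question of statistical power.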
Aha, I see. Yes, there are certainly problems with doing statistical tests for validation. SLiM already does tons of them, though. But if a precise comparison to the "right answer" is possible, that's certainly better!
The need for unit tests for SLiM's popgen functions has been underlined by the discovery of another bug in them (https://groups.google.com/g/slim-discuss/c/Yacfk9EIYeU/m/bc72wVUzBAAJ). I'm not sure how to test them, though. I suppose a test could construct a population with known mutations, placed into the genomes at known positions/frequencies, and then check that the value calculated by the function matches the expected value calculated independently from first principles or by other software. If someone can supply me with a test scenario and an expected value, I can construct a corresponding SLiM test, but I don't have the knowledge necessary to come up with appropriate scenarios and expected values. These test scenarios wouldn't need to be large or complex; even a test with a genome, say, ten base positions long, with five mutations present and four diploid individuals (eight genomes), would be quite sufficient to test that the math and logic are correct, I would think. It would be good to have such tests for all of the calc...() functions. Perhaps @npb596 or @petrelharp or @philippmesser could help me with this?
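A scenario of the size described above could be written down and evaluated along these lines. This is only a sketch: the mutation positions and carrier sets are made up for illustration, the helper names are hypothetical, and whether a given calc...() function reports a per-site or whole-sequence value is an assumption that would need to be checked against the SLiM documentation before comparing numbers. Exact fractions are used so the expected values carry no rounding error.

```python
from fractions import Fraction
from itertools import combinations

# Hypothetical scenario: 4 diploids = 8 genomes, 10 base positions, 5 mutations.
N_GENOMES = 8
SEQ_LENGTH = 10
# position -> set of genome indices carrying the mutation (made-up values)
MUTATIONS = {
    1: {0},
    3: {0, 1, 2},
    4: {2, 3, 4, 5},
    7: {6, 7},
    9: {0, 1, 2, 3, 4, 5, 6},
}


def wattersons_theta(mutations, n):
    """Watterson's estimator S / a_n for the whole sequence (not per site)."""
    # A site is segregating if some, but not all, genomes carry the mutation.
    s = sum(1 for carriers in mutations.values() if 0 < len(carriers) < n)
    a_n = sum(Fraction(1, i) for i in range(1, n))
    return s / a_n


def pairwise_pi(mutations, n):
    """Mean pairwise differences, by brute force over all genome pairs."""
    pairs = list(combinations(range(n), 2))
    diffs = sum(
        sum((i in carriers) != (j in carriers) for carriers in mutations.values())
        for i, j in pairs
    )
    return Fraction(diffs, len(pairs))


theta_w = wattersons_theta(MUTATIONS, N_GENOMES)
pi = pairwise_pi(MUTATIONS, N_GENOMES)
print("theta_W (whole sequence):", theta_w, "| per site:", theta_w / SEQ_LENGTH)
print("pi (whole sequence):", pi, "| per site:", pi / SEQ_LENGTH)
```

The same genomes and mutations could then be constructed in a SLiM test, with the printed values (converted to floats) serving as the expected results within a small numerical tolerance.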