Early stopping #2125
Comments
Stopping happens for too many reasons.
If a developer does not believe in a test, why then submit it to Fishtest? I can imagine that a developer sometimes changes their mind while a test is running, but this should be quite rare. Besides, the stopping of tests is strongly correlated with the LLR value. So the latter is the main driver.
Rather than stopping early, the developers should be able to indicate the beta value they are happy with. For example, if a developer routinely stops a test at LLR=-1.0, this means they are happy with beta=0.37. So then they should submit tests with alpha=0.05, beta=0.37. That would make the trade-off clear...
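To make the LLR-to-beta correspondence above concrete, here is a minimal sketch using Wald's approximate SPRT bounds. The function names are made up for illustration and are not part of Fishtest. Holding alpha=0.05 fixed gives a beta of roughly 0.35 for a stop at LLR=-1.0; the cruder approximation beta ≈ exp(LLR) gives the 0.37 quoted above.

```python
import math

def wald_bounds(alpha, beta):
    """Wald's approximate SPRT stopping bounds (lower, upper) on the LLR."""
    return math.log(beta / (1 - alpha)), math.log((1 - beta) / alpha)

def implied_beta(stop_llr, alpha=0.05):
    """Beta whose Wald lower bound equals stop_llr, with alpha held fixed."""
    return (1 - alpha) * math.exp(stop_llr)

# Design bounds for alpha = beta = 0.05
lower, upper = wald_bounds(0.05, 0.05)
print(round(lower, 2), round(upper, 2))   # -2.94 2.94

# Beta implied by routinely stopping at LLR = -1.0
print(round(implied_beta(-1.0), 2))       # 0.35
print(round(math.exp(-1.0), 2))           # 0.37 (cruder approximation)
```

Both numbers ignore overshoot of the bounds, so they are rough guides rather than exact error rates.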
Well, the way I look at it, most users submit related tests at once: the idea of the patch is the same, but the parameters get tweaked in different directions. There is a relation between the tests one submits when they touch the same code.
Note that users are limited to 6 active tests each. This also leads to many early stops, so as not to get less TP. Do you think such a limitation is a bad thing?
I think we should remain focused. The facts are that many tests get stopped at We had this discussion about beta before but then Vondele noted that instead of changing beta one may equivalently change the Elo bounds. This is mathematically true, but now I realize that there is a psychological difference. With an With an asymmetric SPRT (e.g.
BTW I think the idea of picking a "promising" patch among related patches depending on how the LLR evolves is strongly flawed. The LLR is much too noisy a metric for that.
If LOS is too noisy and LLR is too noisy, maybe we can fix one, or introduce one that is not noisy?
As I see it, noisy optimization is intrinsically difficult.
I complained about this practice on Discord a couple of days ago, and was greeted with mostly silence, and one comment that running tests to -2.94 LLR is too dogmatic (and implicitly corroborated by pere's first comment here). I'm of the opinion that stopping early too frequently is actually a more effective waste of computation than letting tests run to the bounds. I agree that if devs really intend to have a higher false negative rate, they should indeed choose an alternate beta before the test starts (rather than spamming the tests page with inconclusive noise, as at present). Choosing the stop rule during the test adds extra bias compared with choosing the stop rule before the test and sticking to it.
Lots of tests are stopped early these days, even at LLR=-0.5. One should be aware that this drastically alters the error probabilities of the SPRT compared to their design values of 0.05.
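As a rough illustration of how much the error probabilities move, the sketch below applies Wald's approximation (which ignores overshoot) to an SPRT whose upper bound stays at the design value for alpha=beta=0.05 while the lower bound is raised to an early-stop level. The helper name is hypothetical, and it assumes the early stop is applied as a fixed rule. Discretionary stopping decided during the run is harder to model and is not covered here.

```python
import math

def wald_error_rates(lower, upper):
    """Approximate (alpha, beta) for an SPRT that stops when the LLR
    reaches `lower` or `upper`, using Wald's approximation (no overshoot)."""
    a, b = math.exp(lower), math.exp(upper)
    alpha = (1 - a) / (b - a)        # P(accept H1 | H0 true)
    beta = a * (b - 1) / (b - a)     # P(accept H0 | H1 true)
    return alpha, beta

# Upper bound kept at the design value for alpha = beta = 0.05
design_upper = math.log(0.95 / 0.05)

for stop in (-design_upper, -1.0, -0.5):
    alpha, beta = wald_error_rates(stop, design_upper)
    print(f"stop at LLR={stop:.2f}: alpha~{alpha:.3f}, beta~{beta:.3f}")
```

Under these assumptions, a fixed stop at LLR=-0.5 pushes the false negative rate above one half, while the false positive rate drops somewhat below its design value.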