Last week, I tried benchmarking 3.7 Sonnet Thinking (before you evaluated it). I quickly ran out of credit, and although MathArena recognized the error, it wanted to continue running the benchmark. IMO, when such an error occurs, it should exit the run while mentioning that the existing results will be saved and that the user can resume the test with the --skip-existing flag.
Alternatively, it could pause and ask the user to type "Yes" to resume, so the user can add credits and then continue.
Hi! Thanks for the feedback. The default behavior of our querier is to retry every minute whenever an error occurs (APIs tend to return quite a few errors if you get disconnected for a moment). After 50 errors for a sample, it gives up on that sample and continues to the next, storing an empty output for it.
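Roughly, the per-sample logic is something like the sketch below. This is not the actual querier code from the repo; `query_fn`, the constants, and the printed messages are only illustrative, based on the "retry every minute, give up after 50 errors" behavior described above.

```python
import time

MAX_ERRORS = 50          # per-sample error budget described above
RETRY_WAIT_SECONDS = 60  # "retry every minute"

def query_with_retries(query_fn, sample):
    """Retry a flaky API call; after MAX_ERRORS failures, record an empty output."""
    for attempt in range(MAX_ERRORS):
        try:
            return query_fn(sample)
        except Exception as exc:  # APIs raise assorted transient errors
            print(f"error on attempt {attempt + 1}/{MAX_ERRORS}: {exc}")
            time.sleep(RETRY_WAIT_SECONDS)
    # Give up on this sample and move on, storing an empty output
    # so a later run with --skip-existing will pick it up again.
    return ""
```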
If you run the same command afterwards with the --skip-existing flag, it will automatically rerun all samples that were not stored, including those for which we find an empty output (i.e., those that hit the 50-error limit).
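In other words, the flag selects only the samples that still need work, conceptually like the sketch below. Only the --skip-existing flag name comes from this thread; the file layout and the "output" field are assumptions, not our actual storage format.

```python
import json
from pathlib import Path

def samples_to_run(samples, results_dir, skip_existing=True):
    """With skip_existing, only rerun samples whose stored output is missing or empty."""
    if not skip_existing:
        return list(samples)
    todo = []
    for sample_id in samples:
        path = Path(results_dir) / f"{sample_id}.json"
        if not path.exists():
            todo.append(sample_id)          # never stored at all
            continue
        stored = json.loads(path.read_text())
        if not stored.get("output"):        # empty output => sample hit the 50-error limit
            todo.append(sample_id)
    return todo
```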
If I'm not mistaken, this is (at least partially) the behavior you are asking for, but it is not documented in the README (as it should be). Is that correct? The only thing this does not cover is your request for MathArena to recognize the out-of-credit error specifically, but that is quite error-prone (every API uses different messages to indicate it, and may change them), and I don't think it's a big issue as long as we document the --skip-existing flag properly.