Update e2e/lm-eval test infrastructure #1323
Merged
SUMMARY:
Remove some no-longer-necessary flags from the `run_tests.sh` script and update the e2e/lm-eval test infra to use pytest's parametrization.
Unused flags
A couple of the flags in the test script were added to support reporting efforts in our CI/CD, but they are no longer necessary as they are handled outside the test script.
e2e/lm-eval test parametrization
This change is primarily intended to improve reporting for these sets of tests, particularly the test names that are generated. The current, non-pytest parametrization approach introduces some challenges:
`pytest`'s built-in parametrization to help with our test tracking.
`pytest` puts all parametrization in the test name itself and doesn't do any of this normalization, preserving the original data.
For point 2, an example of the before/after naming:
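Separately from the PR's own naming example, the following is a minimal sketch of how pytest's built-in parametrization attaches each parameter to the test name. The config paths and the `test_lm_eval` function are hypothetical placeholders, not the names used in this repository.

```python
import pytest

# Hypothetical config paths for illustration only; the real ones may differ.
CONFIGS = [
    "configs/fp8_dynamic.yaml",
    "configs/w4a16.yaml",
]


@pytest.mark.parametrize("config_path", CONFIGS)
def test_lm_eval(config_path):
    # Placeholder body: a real test would load the config and run the evaluation.
    assert config_path.endswith(".yaml")
```

Under this sketch, `pytest -v` should report one test per config, with the parameter value in square brackets after the function name (for example `test_lm_eval[configs/fp8_dynamic.yaml]`), so the original string is preserved in the test ID.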
TEST PLAN:
Two internal test runs, one e2e and one lm-eval, show the new changes working without issue; the results will be communicated internally.