Ridiculous benchmark #15
Conversation
Poetry should be able to make it so you don't have to edit `PYTHONPATH`. It might just be that you have to have a directory named `coffee` that includes an `__init__.py`? It also may require prefixing modules with `coffee.` when importing them.
Another symptom of something not set up right is that when I run `poetry install` I get `The current project could not be installed: No file/folder found for package coffee`.
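One way this sometimes gets resolved (a sketch, assuming the code lives under `src/coffee/` with an `__init__.py`; this repo's actual layout may differ) is to tell Poetry where the package is in `pyproject.toml`:

```toml
[tool.poetry]
name = "coffee"
# Tell Poetry the importable package lives under src/
packages = [{ include = "coffee", from = "src" }]
```

With that declaration, `poetry install` can find the package and `PYTHONPATH` edits shouldn't be needed.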
src/run.py
Outdated
```python
self.prefix = prefix

def runspecs(self) -> List[str]:
    raise NotImplementedError
```
This seems to imply there should never be an actual HelmTest object, only derived classes of it. In that situation I recommend Python's Abstract Base Class (ABC) library. That way you can mark this method as `@abstractmethod` and the interpreter will enforce that all derived classes override it.
Not something you have to fix here, but a pattern I'd like to use in the future.
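A sketch of that pattern, using the names from the snippet (the concrete subclass and its run spec are illustrative, not from the PR):

```python
from abc import ABC, abstractmethod
from typing import List


class HelmTest(ABC):
    def __init__(self, prefix: str):
        self.prefix = prefix

    @abstractmethod
    def runspecs(self) -> List[str]:
        """Concrete tests must supply their HELM run specs."""


class BbqHelmTest(HelmTest):  # illustrative concrete subclass
    def runspecs(self) -> List[str]:
        return [f"{self.prefix}:subject=Age"]
```

With this, `HelmTest(...)` raises `TypeError` at instantiation, so a subclass that forgets to override `runspecs` fails fast instead of hitting `NotImplementedError` at call time.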
Sure, happy to do so.
|
||
```python
def _make_output_dir(self):
    o = pathlib.Path.cwd()
    if o.name in ['src', 'test']:
```
This feels very magic and potentially error prone. Can we pass `output_dir` into `run`?
The magic needs to happen somewhere, as some ways of invoking this (as from my IDE) are inclined to run it in the `src` or `test` directories, which then puts a bunch of output where it shouldn't be.
We could move the magic out and pass the directory in, but the theory here is that the CliHelmRunner is just going to do some stuff and give you a run result from which you can get scores. Requiring a working directory to be passed in violates encapsulation and increases the burden on the object's user, something we should generally seek to reduce.
Is there some plausible near-term use case you have in mind where this would cause a problem?
An advantage of passing it in is that any pytests we write can use a temp directory, ensuring multiple runs don't conflict and data generated during the test gets cleaned up.
Is there some plausible near-term use case you have in mind where this would cause a problem?
If there is a subdirectory of src and you run this from there, you are back to getting output in unexpected places. If we move `src` to be in a `coffee` directory this logic also breaks.
requiring a working directory be passed in violates encapsulation
`HelmResult` has to know about `output_directory`, so IMO it isn't encapsulated. I think there is also a reason for a user to want to be able to find all these results after the fact, so asking them where to put it feels reasonable.
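A sketch of that testing benefit (the `CliHelmRunner` name is from this discussion, but this constructor signature and the directory layout are assumptions):

```python
import pathlib
import tempfile


class CliHelmRunner:
    """Hypothetical variant that takes its output directory instead of guessing it."""

    def __init__(self, output_dir: pathlib.Path):
        self.output_dir = output_dir

    def run(self) -> pathlib.Path:
        # A real runner would invoke HELM here; we only create the run directory.
        run_dir = self.output_dir / "run"
        run_dir.mkdir(parents=True, exist_ok=True)
        return run_dir


# In a pytest test, the tmp_path fixture gives each test an isolated directory:
def test_run_writes_into_given_dir(tmp_path):
    assert CliHelmRunner(tmp_path).run().parent == tmp_path


# The same isolation without pytest, using tempfile directly:
with tempfile.TemporaryDirectory() as td:
    out = CliHelmRunner(pathlib.Path(td)).run()
    assert out.is_dir()
```

Since the temp directory is cleaned up automatically, repeated test runs can't conflict or leave stray output behind.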
src/run.py
Outdated
```python
return command


class Benchmark:
```
I'd been thinking about a Benchmark as doing two things:
- Define the list of Tests it needs to run
- Define how to aggregate the Results from those Tests to produce a single Score.
In that context, it's strange to me to have a Benchmark specify a SUT. None of its functionality seems to use the SUT. Is this instead something like a `BenchmarkScorer`, responsible for doing the aggregation?
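A sketch of that two-responsibility view (all names and signatures here are illustrative assumptions, not from the PR):

```python
from typing import Callable, List


class Benchmark:
    """Defines which Tests to run and how to aggregate their Results into one Score."""

    def __init__(self, tests: List[str], aggregate: Callable[[List[float]], float]):
        self.tests = tests
        self.aggregate = aggregate

    def score(self, results: List[float]) -> float:
        return self.aggregate(results)


# The Benchmark itself never mentions a SUT; scoring one SUT's results
# is a separate step, which could live in a BenchmarkScorer.
general = Benchmark(tests=["bbq"], aggregate=lambda rs: sum(rs) / len(rs))
```

Here the SUT only appears when you hand a particular SUT's results to `score`, keeping the benchmark definition SUT-free.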
We could split this into intent and result objects. The result object definitely needs a SUT, because it has scores on the SUT. You're correct that the intent doesn't. Eventually we'll probably have quite a rich result object, as well as a complicated intent. But for now they were simple enough that the code seemed fine with them together.
Great points, next is setting up CI so that we can make sure it works for everybody, not just on one person's machine.
Good here. This is very preliminary work that we'll be ironing out as we learn more.
Just enough to make a number come out for one SUT and one Test, and to sketch an architectural direction. Some comments on open concerns in the code.
In theory, you should be able to do `poetry install` to set up the environment and `PYTHONPATH=src pytest` to run the tests. If that works, you can run the benchmark (including a full HELM run) for GPT2 and BBQ like this: