Ridiculous benchmark #15
Conversation
Poetry should be able to make it so you don't have to edit `PYTHONPATH`. It might just be that you have to have a directory named `coffee` that includes an `__init__.py`? It also may require prefixing modules with `coffee.` when importing them.
Another symptom of something not set up right is that when I run `poetry install` I get `The current project could not be installed: No file/folder found for package coffee`.
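One way this sometimes gets resolved (a sketch, assuming the code lives under `src/coffee/` with an `__init__.py`; this repo's actual layout may differ) is to tell Poetry where the package is in `pyproject.toml`:

```toml
[tool.poetry]
name = "coffee"
# Tell Poetry the importable package lives under src/
packages = [{ include = "coffee", from = "src" }]
```

With that declaration, `poetry install` can find the package and `PYTHONPATH` edits shouldn't be needed.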
src/run.py
Outdated
```python
self.prefix = prefix

def runspecs(self) -> List[str]:
    raise NotImplementedError
```
This seems to imply there should never be an actual HelmTest object, only derived classes of it. In that situation I recommend Python's Abstract Base Class (ABC) library. That way you can mark this method as `@abstractmethod` and the interpreter will enforce that all derived classes override it.
Not something you have to fix here, but a pattern I'd like to use in the future.
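A sketch of that pattern, using the names from the snippet (the concrete subclass and its run spec are illustrative, not from the PR):

```python
from abc import ABC, abstractmethod
from typing import List


class HelmTest(ABC):
    def __init__(self, prefix: str):
        self.prefix = prefix

    @abstractmethod
    def runspecs(self) -> List[str]:
        """Concrete tests must supply their HELM run specs."""


class BbqHelmTest(HelmTest):  # illustrative concrete subclass
    def runspecs(self) -> List[str]:
        return [f"{self.prefix}:subject=Age"]
```

With this, `HelmTest(...)` raises `TypeError` at instantiation, so a subclass that forgets to override `runspecs` fails fast instead of hitting `NotImplementedError` at call time.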
Sure, happy to do so.
|
||
```python
def _make_output_dir(self):
    o = pathlib.Path.cwd()
    if o.name in ['src', 'test']:
```
This feels very magic and potentially error prone. Can we pass `output_dir` into `run`?
The magic needs to happen somewhere, as some ways of invoking this (as from my IDE) are inclined to run it in the `src` or `test` directories, which then puts a bunch of output where it shouldn't be.
We could move the magic out and pass the directory in, but the theory here is that the CliHelmRunner is just going to do some stuff and give you a run result from which you can get scores. Requiring a working directory to be passed in violates encapsulation and increases the burden on the object's user, something we should generally seek to reduce.
Is there some plausible near-term use case you have in mind where this would cause a problem?
An advantage of passing it in is that any pytests we write can use a temp directory, ensuring multiple runs don't conflict and data generated during the test gets cleaned up.
Is there some plausible near-term use case you have in mind where this would cause a problem?
If there is a subdirectory of src and you run this from there, you are back to getting output in unexpected places. If we move `src` to be in a `coffee` directory this logic also breaks.
requiring a working directory be passed in violates encapsulation
`HelmResult` has to know about `output_directory`, so IMO it isn't encapsulated. I think there is also a reason for a user to want to be able to find all these results after the fact, so asking them where to put it feels reasonable.
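A sketch of that testing benefit (the `CliHelmRunner` name is from this discussion, but this constructor signature and the directory layout are assumptions):

```python
import pathlib
import tempfile


class CliHelmRunner:
    """Hypothetical variant that takes its output directory instead of guessing it."""

    def __init__(self, output_dir: pathlib.Path):
        self.output_dir = output_dir

    def run(self) -> pathlib.Path:
        # A real runner would invoke HELM here; we only create the run directory.
        run_dir = self.output_dir / "run"
        run_dir.mkdir(parents=True, exist_ok=True)
        return run_dir


# In a pytest test, the tmp_path fixture gives each test an isolated directory:
def test_run_writes_into_given_dir(tmp_path):
    assert CliHelmRunner(tmp_path).run().parent == tmp_path


# The same isolation without pytest, using tempfile directly:
with tempfile.TemporaryDirectory() as td:
    out = CliHelmRunner(pathlib.Path(td)).run()
    assert out.is_dir()
```

Since the temp directory is cleaned up automatically, repeated test runs can't conflict or leave stray output behind.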
src/run.py
Outdated
```python
return command


class Benchmark:
```
I'd been thinking about a Benchmark as doing two things:
- Define the list of Tests it needs to run
- Define how to aggregate the Results from those Tests to produce a single Score.
In that context, it's strange to me to have a Benchmark specify a SUT. None of its functionality seems to use the SUT. Is this instead something like a `BenchmarkScorer`, responsible for doing the aggregation?
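A sketch of that two-responsibility view (all names and signatures here are illustrative assumptions, not from the PR):

```python
from typing import Callable, List


class Benchmark:
    """Defines which Tests to run and how to aggregate their Results into one Score."""

    def __init__(self, tests: List[str], aggregate: Callable[[List[float]], float]):
        self.tests = tests
        self.aggregate = aggregate

    def score(self, results: List[float]) -> float:
        return self.aggregate(results)


# The Benchmark itself never mentions a SUT; scoring one SUT's results
# is a separate step, which could live in a BenchmarkScorer.
general = Benchmark(tests=["bbq"], aggregate=lambda rs: sum(rs) / len(rs))
```

Here the SUT only appears when you hand a particular SUT's results to `score`, keeping the benchmark definition SUT-free.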
We could split this into intent and result objects. The result object definitely needs a SUT, because it has scores on the SUT. You're correct that the intent doesn't. Eventually we'll probably have quite a rich result object, as well as a complicated intent. But for now they were simple enough that the code seemed fine with them together.
Great points, next is setting up CI so that we can make sure it works for everybody, not just on one person's machine.
Good here. This is very preliminary work that we'll be ironing out as we learn more.
Just enough to make a number come out for one SUT and one Test, and to sketch an architectural direction. Some comments on open concerns in the code.
In theory, you should be able to do `poetry install` to set up the environment and `PYTHONPATH=src pytest` to run the tests. If that works, you can run the benchmark (including a full HELM run) for GPT2 and BBQ like this: