RHEL AI / InstructLab CPT support #117

dbutenhof · 2024-10-01T20:58:25Z

Type of change

Description

Support CPT needs for the RHEL AI / InstructLab performance team.

This PR adds three components:

A "Crucible" service to help project code interpret Crucible CDM data.
An "ilab" project providing APIs built on top of the Crucible service to discover InstructLab runs and provide configuration data and metrics.
An "ilab" UI tab building on top of the API to discover and display InstructLab performance runs.

The crucible_svc is intended to be "reasonably general purpose" to support use by additional projects over time.

Related Tickets & Documents

Various Jira stories under Epic PANDA-496.

Checklist before requesting a review

I have performed a self-review of my code.
If it is a core feature, I have added thorough tests.

Testing

InstructLab CPT is using a persistent Crucible controller system in RDU3, tied to a 4-way L40S test system. The data store (a private OpenSearch instance) contains a set of Crucible runs capturing both training and SDG runs.

GET localhost:8000/api/v1/ilab/runs?benchmark=ilab will query the ilab.crucible OpenSearch instance and return a list of ilab benchmark runs.

Add Crucible readme file. Cleanups and refactoring

Also added the option to override the default graph title generator using the new `Graph.title` field.

dbutenhof

Initial comments on my own re-re-re-review. 🤣

The backend comments are already fixed, but I'm posting them "for posterity" (whoever Mr Posterity may be); not quite ready to push yet. I haven't done anything about the UI comments ...

frontend/src/actions/ilabActions.js

backend/app/api/v1/endpoints/ilab/ilab.py

backend/app/services/crucible_svc.py

local-compose.sh

dbutenhof · 2024-10-02T19:58:18Z

frontend/src/utils/apiConstants.js

+export const ILABS_JOBS_API_V1 = "/api/v1/ilab/runs";
+export const ILAB_GRAPH_API_V1 = "/api/v1/ilab/runs/";


Why are these identical? Are both used? Could they be combined?
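The two constants differ only by a trailing slash. One possible way to combine them, as a sketch only: keep the single base path and derive per-run paths from it. The `ilabRunPath` helper and the `run-123` id are hypothetical and not part of this PR; the real sub-routes are not shown in this excerpt.

```javascript
// Single base path for the ilab runs API (value taken from the diff above).
const ILABS_JOBS_API_V1 = "/api/v1/ilab/runs";

// Hypothetical helper: builds a path under the base for a single run.
// Any extra segments are URL-encoded and appended in order.
function ilabRunPath(runId, ...segments) {
  const rest = [runId, ...segments].map((s) => encodeURIComponent(String(s)));
  return [ILABS_JOBS_API_V1, ...rest].join("/");
}

console.log(ilabRunPath("run-123"));
```

With a helper like this, the trailing-slash variant would no longer be needed, since callers never concatenate onto the base by hand.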

dbutenhof · 2024-10-02T20:01:11Z

frontend/src/components/templates/ILab/ILabGraph.jsx

After loading the page many times, I'm starting to think that we should open the graph accordion with just the primary metric(s) graphed. That is, we shouldn't wait until someone makes a selection from the secondary metric pulldown -- but when they do, we can add the second graph.

frontend/src/actions/ilabActions.js

frontend/src/components/templates/ILab/index.jsx

This cleans up my direct API call to get the run's periods for graphing, to use a separate action and a reducer. I also experimented with trying to improve error diagnosis by looking at some of the error responses to "toast" instead of just saying something went wrong.

Add a Crucible `close` method, and use a FastAPI yield dependency to ensure every API connection is closed cleanly.

mfleader

This is what I have right now. I'm working on ilab.py and crucible_svc.py.

backend/app/main.py

backend/scripts/start-reload.sh

mfleader · 2024-10-02T15:37:47Z

frontend/README.md

@@ -1,5 +1,5 @@

-# Openshift Performance Dashbaord
+# Openshift Performance Dashboard


this can be factored out into a tidying pr

For a one letter swap? That seems ... excessive? 😦

that's why you bundle the tidyings together, but yeah I'll let it slide

The real question is more from the other side -- at what point, when you find small incidental fixes and helpful refactorings during a development task, does it become necessary (or even "advisable") to stop "making progress" in order to pull out all those changes and separate them into another PR?

Reformatting an entire codebase (which I'd love to do to cpt-dashboard, by the way, because it's wildly inconsistent and often ugly), for example, is clearly not something to even attempt in the middle of something else.

I backed out the fallback exception handler for you, which I kinda regret as I did some refactoring and had to guess at where a problem was because there's no traceback. (I thought I might try to investigate where FastAPI or some other part of the framework is dropping the ball, but it's awkward to even figure out where to start, and I've not had time.) This stuff could go in a separate PR, sure; but one needs it when writing significant new code, and we don't have a quick pipeline in cpt-dashboard so separating it means it might as well not be there at all. (And possibly never will be.) 🤷🏻

Minor stuff like fixing a one letter typo isn't worth tracking, and putting it off inevitably means it never gets done. It's isolated, it's trivial to review, and it gets done.

And I realized later that I hadn't backed out the fallback handler ... but it somehow stopped working for me. After some investigation I realized that it was because I added the yield "dependency" mechanism to manage the Crucible connection, and my except block there was absconding with my "unhandled" exception. I think I fixed it. Again, while this is technically separate from the "InstructLab/Crucible support", it was an important development tool that I can't believe people were living without before. (I still don't understand why the built-in FastAPI unhandled exception handler isn't logging a traceback, even with `FastAPI(debug=True,...)`, and the effort I spent trying to figure that out was ultimately a complete waste of time... 😦)

frontend/src/components/atoms/PlotGraph/index.jsx

frontend/src/components/templates/ILab/index.jsx

mfleader · 2024-10-02T16:02:55Z

frontend/src/components/templates/ILab/MetricsDropdown.jsx

+  const onSelect = (_event, value) => {
+    console.log("selected", value);
+    const run = value.split("*");
+    //setSelected(run[1].trim());


is this dead code?

mfleader · 2024-10-02T16:10:37Z

frontend/src/actions/ilabActions.js

+
+export const setSelectedMetrics = (id, metrics) => (dispatch, getState) => {
+  const metrics_selected = cloneDeep(getState().ilab.metrics_selected);
+  // if (id in metrics_selected) {


is this dead code?

mfleader · 2024-10-03T17:33:36Z

frontend/src/actions/ilabActions.js

+    const { start_date, end_date, size, offset } = getState().ilab;
+    const response = await API.get(API_ROUTES.ILABS_JOBS_API_V1, {
+      params: {
+        ...(start_date && { start_date }),


what does this line mean?

Hmm, interesting. `...` is the spread operator, which unrolls an array or object. I think `start_date && { start_date }` should result in `...{ start_date }` (which will be spread to `start_date: start_date` in the surrounding object as an API query parameter) iff `start_date` is truthy (non-null and non-empty).

But when I try a simplified version of this interactively in the Chrome console, it's unhappy about the paren, so I wonder if this'll actually work:

Uncaught SyntaxError: Unexpected token '('

@MVarshini ??

These lines are still pretty confusing. I think a comment in the code may be worth it.
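For reference, a minimal sketch of the conditional-spread idiom discussed in this thread. The `buildParams` helper and the sample values are hypothetical, not code from this PR. Spread syntax is only valid inside an object literal, an array literal, or a call's argument list, so pasting `...(start_date && { start_date })` on its own into a console is a syntax error. Inside an object literal, spreading a falsy value such as `undefined`, `null`, or `""` adds no properties.

```javascript
// Hypothetical helper illustrating the idiom; not part of the PR.
function buildParams({ start_date, end_date, size, offset }) {
  return {
    // If start_date is truthy, `start_date && { start_date }` evaluates to
    // the object { start_date }, and its single key is spread into the result.
    // If start_date is falsy (undefined, null, ""), the falsy value itself is
    // spread, which contributes no properties.
    ...(start_date && { start_date }),
    ...(end_date && { end_date }),
    size,
    offset,
  };
}

const withDates = buildParams({
  start_date: "2024-10-01",
  end_date: "2024-10-02",
  size: 10,
  offset: 0,
});
const withoutDates = buildParams({
  start_date: "",
  end_date: undefined,
  size: 10,
  offset: 0,
});

console.log(withDates); // includes start_date and end_date
console.log(withoutDates); // only size and offset
```

A one-line comment next to the spread in the action, along the lines of "include the date filters only when set", would probably address the readability concern.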

backend/app/services/crucible_readme.md

frontend/src/actions/toastActions.js

backend/app/main.py

backend/app/services/crucible_svc.py

jaredoconnell

Some comments for the backend.

backend/app/api/v1/endpoints/ilab/ilab.py

backend/app/services/crucible_readme.md

backend/app/services/crucible_svc.py

+ other review feedback

jaredoconnell

I don't see any blockers. I'll leave the final approval up to the developers that maintain this repository.

jaredoconnell · 2024-10-09T21:42:29Z

frontend/src/actions/ilabActions.js

+    const { start_date, end_date, size, offset } = getState().ilab;
+    const response = await API.get(API_ROUTES.ILABS_JOBS_API_V1, {
+      params: {
+        ...(start_date && { start_date }),


These lines are still pretty confusing. I think a comment in the code may be worth it.

frontend/src/actions/ilabActions.js

jaredoconnell · 2024-10-10T18:11:10Z

frontend/src/components/organisms/Pagination/index.jsx

+  const checkAndFetch = (_evt, newPage) => {
+    if (props.type === "ilab") {
+      dispatch(fetchNextJobs(newPage));
+    }
+  };


Is this because it's only fetched per page for iLab, and the others do one fetch for all pages?

frontend/src/components/organisms/TableFilters/index.jsx

jaredoconnell · 2024-10-10T18:15:09Z

frontend/src/components/templates/ILab/ILabGraph.jsx

+          data={getGraphData(item.id)[0]?.data}
+          layout={getGraphData(item.id)[0]?.layout}


What's the reason it's a list? And why the first item?

jaredoconnell · 2024-10-10T18:27:39Z

frontend/src/components/templates/ILab/MetaRow.jsx

+import { Title } from "@patternfly/react-core";
+import { uid } from "@/utils/helper";
+
+const MetaRow = (props) => {


It may be helpful to add a brief comment stating what a meta row is for.

jaredoconnell · 2024-10-10T20:26:58Z

frontend/src/components/templates/ILab/MetricsDropdown.jsx

+
+    return hasData;
+  };
+  /* Metrics select */


Is there a reason a block comment is used for a single exclusive line?

frontend/src/components/templates/ILab/index.jsx

jaredoconnell · 2024-10-10T21:33:29Z

frontend/src/components/templates/ILab/index.jsx

+    status: "Status",
+  };
+
+  return (


Is it possible for this thing to be split up? If so, do you think it would be worth it?

I suppose it could be refactored into subcomponents; whether that'd be worthwhile is a different question. There isn't a lot of logic here above the external components it's already using.

backend/app/services/crucible_svc.py

+ add some method documentation + misc review feedback

dbutenhof · 2024-10-15T20:05:08Z

I'll split this up along functional lines. That'll be a bit neater after #118 is resolved, as it factors out the Python dependency changes.

dbutenhof and others added 5 commits September 27, 2024 16:30

Add ILAB / Crucible support to CPT backend

4094cc9

GET localhost:8000/api/v1/ilab/runs?benchmark=ilab will query the ilab.crucible OpenSearch instance and return a list of ilab benchmark runs.

UI code updates

e6c8ee0

Improve periodic graph names.

5e735a7

Add Crucible readme file. Cleanups and refactoring

Documentation and cleanup

505903b

Also added the option to override the default graph title generator using the new `Graph.title` field.

Allow overriding graph color

4ccd603

dbutenhof commented Oct 2, 2024

frontend/src/components/templates/ILab/index.jsx

dbutenhof added 3 commits October 3, 2024 08:29

Some (self) review cleanup

de0accc

Cleanup OpenSearch connections

5671d77

Add a Crucible `close` method, and use a FastAPI yield dependency to ensure every API connection is closed cleanly.

dbutenhof marked this pull request as ready for review October 4, 2024 12:53

mfleader requested changes Oct 4, 2024

dry923 requested review from chentex and rsevilla87 October 7, 2024 10:39

Try to remove a couple of incidental changes

a49fc65

mfleader requested changes Oct 7, 2024

frontend/src/actions/toastActions.js Outdated

backend/app/main.py Outdated

Undoing a few more ancillary changes

1d5783d

dbutenhof self-assigned this Oct 8, 2024

dbutenhof added the enhancement label Oct 8, 2024

mfleader reviewed Oct 9, 2024

backend/app/services/crucible_svc.py Outdated

backend/app/services/crucible_svc.py Outdated

backend/app/services/crucible_svc.py Outdated

backend/app/services/crucible_svc.py Outdated

Review feedback

f20e45d

jaredoconnell reviewed Oct 9, 2024

MVarshini and others added 2 commits October 10, 2024 12:34

Pagination and Date filter issue

a959fe6

Rewrite param consolidation

79151ea

+ other review feedback

jaredoconnell reviewed Oct 10, 2024

Debug unhandled exceptions

bb7ed60

+ add some method documentation + misc review feedback

dbutenhof mentioned this pull request Oct 14, 2024

Add support for multi-run graphing in ilab / Crucible backend #120

Closed

7 tasks

dbutenhof marked this pull request as draft October 15, 2024 20:03

dbutenhof mentioned this pull request Oct 16, 2024

Add new Crucible backend service #122

Draft

7 tasks

dbutenhof closed this Oct 25, 2024

dbutenhof deleted the ilab branch October 25, 2024 11:47
