
Allow selecting multiple metrics on compare page #133

Draft: dbutenhof wants to merge 15 commits into main from the compare branch

Conversation

@dbutenhof (Collaborator) commented Nov 13, 2024

Type of change

  • Refactor
  • New feature
  • Bug fix
  • Optimization
  • Documentation Update

Description

Support selection of multiple metrics using the pulldown in the comparison page. The update occurs when the pulldown closes.

To simplify the management of "available metrics" across multiple selected runs, which might have entirely different metrics, the reducer no longer tries to store separate metric selection lists for each run. This also means that the "default" metrics selection remains when adding another comparison run, or expanding another row.
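
A minimal sketch of that reducer simplification, using assumed state and action names (not the exact code in this PR): one shared list of selected metrics replaces the per-run selection lists, and it is replaced wholesale when the pulldown closes.

const initialState = {
  metrics_selected: [], // single selection list shared by all runs/rows
};

const ilabReducer = (state = initialState, action = {}) => {
  switch (action.type) {
    case "SET_ILAB_SELECTED_METRICS":
      // Commit the whole selection in one update when the pulldown closes.
      return { ...state, metrics_selected: action.payload };
    default:
      return state;
  }
};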

This is chained from #122 (Crucible service) -> #140 (unit test framework) -> #146 (crucible unit tests) -> #123 (ilab API) -> #155 (API unit tests) -> #158 (functional test framework) -> #124 (ilab UI) -> #153 (date picker) -> #125 (multi-run graphing API) -> #127 (multi-run graphing UI) -> #129 (statistics aggregation) -> #131 (metadata flyover) -> #132 (multiple metrics selection) -> #133 (compare multiple metrics)

Related Tickets & Documents

PANDA-645 support multiple metrics selection in compare view

Checklist before requesting a review

  • I have performed a self-review of my code.
  • If it is a core feature, I have added thorough tests.

Testing

Manual testing on local deployment.

@dbutenhof force-pushed the compare branch 2 times, most recently from 99e2605 to ac58188 on November 14, 2024 14:42

@jaredoconnell (Member) left a comment

Reviewed the compare commit. Looks fine overall.

try {
  if (getState().ilab.metrics?.find((i) => i.uid == uid)) {
    return;

Member comment:

Is this the case for when it already has the data synced? If so I would just add a simple comment like this:

     return; // already fetched

This also applies to the other instances of this.
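
For illustration, the guard with that comment applied might read as below (the thunk wrapper is abridged and its name is hypothetical):

export const fetchRunMetrics = (uid) => async (dispatch, getState) => {
  try {
    if (getState().ilab.metrics?.find((i) => i.uid == uid)) {
      return; // already fetched the metric list for this run
    }
    // ... fetch the metric list and dispatch it into the store ...
  } catch (error) {
    console.error(error);
  }
};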

Comment on lines 138 to 157
periods?.periods?.forEach((p) => {
  if (p.is_primary) {
    summaries.push({
      run: uid,
      metric: p.primary_metric,
      periods: [p.id],
    });
  }
  if (metrics) {
    metrics.forEach((metric) => {
      if (
        avail_metrics.find((m) => m.uid == uid)?.metrics?.includes(metric)
      ) {
        summaries.push({
          run: uid,
          metric,
          aggregate: true,
          periods: [p.id],
        });
      }
    });
  }
});
const response = await API.post(
  `/api/v1/ilab/runs/multisummary`,
  summaries
);
if (response.status === 200) {
  dispatch({
    type: TYPES.SET_ILAB_SUMMARY_DATA,
    payload: { uid, data: response.data },
  });
}

Member comment:

Can some basic comments be included to differentiate these two? Looking closely I can see that the bottom one is aggregate.
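
One way to address this (the comments are illustrative additions; the surrounding code is abridged from the diff above):

if (p.is_primary) {
  // The run's primary metric for its primary period is always summarized.
  summaries.push({
    run: uid,
    metric: p.primary_metric,
    periods: [p.id],
  });
}
if (metrics) {
  metrics.forEach((metric) => {
    if (avail_metrics.find((m) => m.uid == uid)?.metrics?.includes(metric)) {
      // Additional user-selected metrics: requested as aggregates over the period.
      summaries.push({
        run: uid,
        metric,
        aggregate: true,
        periods: [p.id],
      });
    }
  });
}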

@dbutenhof self-assigned this Nov 18, 2024

@jaredoconnell (Member) left a comment

All of the new changes look good.


This PR is stale because it has been open 30 days with no activity. Remove stale label or comment or this will be closed in 10 days.

@github-actions (bot) added the Stale label Dec 19, 2024

This PR was closed because it has been stalled for 6 days with no activity.

dbutenhof and others added 13 commits February 18, 2025 11:22
This encapsulates substantial logic for interpreting the Crucible Common Data Model
OpenSearch schema for the use of CPT dashboard API components. By itself, it does
nothing.
This uses `black`, `isort`, and `flake8` to check code quality, although
failure is ignored until we've cleaned it up (which has begun in
PR cloud-bulldozer#139 against the `revamp` branch).

Minimal unit testing is introduced, generating a code coverage report.
The text summary is added to the Action summary page, and the more
detailed HTML report is stored as an artifact for download.

NOTE: The GitHub Action environment is unhappy with `uvicorn` 0.15;
upgrading to the latest 0.32.x seems to work and hasn't obviously
broken anything else.
`crucible_svc.py` test coverage is now at 97%. While the remaining 3% is
worth some effort later, the law of diminishing returns means it would require
significant additional effort; and since subsequent ILAB PRs will change
some of the service code anyway, it's good enough for now.
Provide the `api/v1/ilab` API endpoint to allow a client to query
collected data on a Crucible CDM OpenSearch instance through the
`crucible_svc` service layer. It is backed by the Crucible layer added
in cloud-bulldozer#122, so only the final commit represents changes in this PR.
This covers 100% of the ilab.py API module using `FastAPI`'s `TestClient`.

This proved ... interesting ... as the FastAPI and Starlette versions we use
are incompatible with the underlying httpx version ... TestClient init fails
in a way that can't be worked around. (Starlette passes an unknown keyword
parameter.)

After some experimentation, I ended up "unlocking" all the API-related
packages in `pyproject.toml` to `"*"` and letting `poetry update` resolve them,
then "re-locked" them to those versions. The resulting combination of modules
works for unit testing, and appears to work in a real `./local-compose.sh`
deployment as well.
This adds a mechanism to "can" and restore a small prototype ILAB (Crucible
CDM) Opensearch database in a pod along with the dashboard back end, front
end, and functional tests. The functional tests run entirely within the pod,
with no exposed ports and with unique container and pod names, allowing for
the possibility of simultaneous runs (e.g., a CI) on the same system.

This also has utilities for diagnosing a CDM (v7) datastore and cloning a
limited subset, along with creating an Opensearch snapshot from that data
to bootstrap the functional test pod.

Only a few functional test cases are implemented here, as demonstration. More
will be added separately.
This relies on the ilab API in cloud-bulldozer#123, which in turn builds on the crucible
service in cloud-bulldozer#122.
The `fetchILabJobs` action wasn't updating the date picker values from the API
response unless a non-empty list of jobs is returned. This means that on the
initial load, if the default API date range (1 month) doesn't find any jobs,
the displayed list is empty and the date range isn't updated to tell the user
what we've done.

I've seen no ill effects in local testing from simply removing the length
check, and now the date picker is updated correctly.
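
A sketch of the change described here (the action types, endpoint path, and response fields are assumptions, not the exact source): the date-picker dispatch no longer depends on a non-empty job list.

export const fetchILabJobs = () => async (dispatch, getState) => {
  const { start_date, end_date } = getState().ilab;
  const response = await API.get("/api/v1/ilab/runs", {
    params: { start_date, end_date },
  });
  if (response.status === 200) {
    const { results, startDate, endDate } = response.data;
    // This dispatch used to be guarded by `results.length > 0`; removing the
    // guard lets an empty result still update the displayed date range.
    dispatch({
      type: TYPES.SET_ILAB_DATE_FILTER,
      payload: { startDate, endDate },
    });
    dispatch({ type: TYPES.SET_ILAB_JOBS_DATA, payload: results });
  }
};
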
When graphing metrics from two runs, the timestamps rarely align; so we add a
`relative` option to convert the absolute metric timestamps into relative
delta seconds from each run's start.
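
The conversion is simple in principle; a hedged sketch (names are illustrative) of turning absolute sample timestamps into delta seconds from a run's start:

const toRelative = (samples, runStart) => {
  const start = new Date(runStart).getTime();
  return samples.map((s) => ({
    ...s,
    // seconds elapsed since this run began, so two runs can share one x-axis
    begin: (new Date(s.begin).getTime() - start) / 1000,
    end: (new Date(s.end).getTime() - start) / 1000,
  }));
};
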
This adds the basic UI to support comparison of the metrics of two InstructLab
runs. This compares only the primary metrics of the two runs, in a relative
timeline graph.

This is backed by cloud-bulldozer#125, which is backed by cloud-bulldozer#124, which is backed by cloud-bulldozer#123,
which is backed by cloud-bulldozer#122. These represent a series of steps towards a complete
InstructLab UI and API, and will be reviewed and merged from cloud-bulldozer#122 forward.
This PR is primarily CPT dashboard backend API (and Crucible service) changes
to support pulling and displaying multiple Crucible metric statistics. Only
minor UI changes are included to support API changes. The remaining UI changes
to pull and display statistics will be pushed separately.
Add statistics charts for selected metric in row expansion and comparison
views.
Extract the "Metadata" into a separate component, which allows it to be reused
as an info flyover on the comparison page to help in identifying target runs
to be compared.
Modify the metrics pulldown to allow multiple selection. The statistical
summary chart and graph will show all selected metrics in addition to the
benchmark's inherent primary metric (for the primary period).
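
A sketch of the selection behavior implied here and in the compare-page commit below (handler and action names are hypothetical): clicking an item toggles it in the selection, and the accumulated list is committed when the pulldown closes.

const onMetricSelect = (selected, metric, setSelected) => {
  setSelected(
    selected.includes(metric)
      ? selected.filter((m) => m !== metric)
      : [...selected, metric]
  );
};

const onToggle = (isOpen, selected, dispatch) => {
  if (!isOpen) {
    // Pulldown just closed: commit the accumulated selection in one update.
    dispatch(setSelectedMetrics(selected)); // hypothetical action creator
  }
};
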
Support selection of multiple metrics using the pulldown in the comparison
page. The update occurs when the pulldown closes.

To simplify the management of "available metrics" across multiple selected
runs, which might have entirely different metrics, the reducer no longer
tries to store separate metric selection lists for each run. This also means
that the "default" metrics selection remains when adding another comparison
run, or expanding another row.