fix confusion matrix in model evaluation panel #5186

Merged
merged 6 commits on Nov 25, 2024

Conversation

@imanjra (Contributor) commented Nov 25, 2024

What changes are proposed in this pull request?

fix confusion matrix in model evaluation panel

How is this patch tested? If it is not, please explain why.

Tested manually using the model evaluation panel

Release Notes

Is this a user-facing change that should be mentioned in the release notes?

  • No. You can skip the rest of this section.
  • Yes. Give a description of this change to be included in the release
    notes for FiftyOne users.

(Details in 1-2 sentences. You can just refer to another PR with a description
if this PR is part of a larger change.)

What areas of FiftyOne does this PR affect?

  • App: FiftyOne application changes
  • Build: Build and test infrastructure changes
  • Core: Core fiftyone Python library changes
  • Documentation: FiftyOne documentation changes
  • Other

Summary by CodeRabbit

  • New Features

    • Enhanced handling of confusion matrices and performance metrics in the evaluation component.
    • Added dynamic hover tooltips in confusion matrix plots to reflect ground truth and predicted fields.
    • Improved clarity of data displayed in confusion matrices with updated hover information.
  • Bug Fixes

    • Fixed potential issues with undefined limits in matrix slicing.
  • Chores

    • Minor adjustments to configuration options for displaying class performance and confusion matrices.


coderabbitai bot commented Nov 25, 2024

Walkthrough

The pull request introduces changes to the Evaluation.tsx, EvaluationPlot.tsx, and plotly.py files. In Evaluation.tsx, modifications are made to the getMatrix and formatPerClassPerformance functions to improve the handling of confusion matrices and performance metrics, ensuring consistent data slicing and clearer limit handling. In EvaluationPlot.tsx, the scaleanchor: "x" property is removed from the y-axis configuration, affecting axis synchronization in the plot rendering. The plotly.py file sees the addition of gt_field and pred_field parameters to confusion matrix plotting functions, enhancing hover information. The overall structure and props of the components remain unchanged.
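As a rough illustration of the hover-tooltip change summarized above, the sketch below shows how field names can be interpolated into a Plotly heatmap hovertemplate, falling back to generic labels when they are not supplied. This is a hypothetical Python analogue, not the actual plotly.py code from this PR; the function name and fallback strings are illustrative.

# Hypothetical sketch of the hover-tooltip idea (not the plotly.py code itself)
import plotly.graph_objects as go

def confusion_matrix_figure(matrix, labels, gt_field=None, pred_field=None):
    # Fall back to generic names when the evaluation fields are unknown
    yname = gt_field or "truth"
    xname = pred_field or "predicted"

    hover = (
        "<b>count: %{z:d}</b><br>"
        + f"{yname}: %{{y}}<br>"
        + f"{xname}: %{{x}}<extra></extra>"
    )
    return go.Figure(
        go.Heatmap(z=matrix, x=labels, y=labels, hovertemplate=hover)
    )

# Example usage with made-up data and field names
confusion_matrix_figure(
    [[5, 1], [2, 7]], ["cat", "dog"], "ground_truth", "predictions"
).show()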

Changes

  • app/packages/core/src/plugins/SchemaIO/components/NativeModelEvaluationView/Evaluation.tsx: Updated the getMatrix and formatPerClassPerformance functions for improved data handling and clarity.
  • app/packages/core/src/plugins/SchemaIO/components/NativeModelEvaluationView/EvaluationPlot.tsx: Removed scaleanchor: "x" from the y-axis configuration in layoutDefaults, affecting axis synchronization (see the Python analogue after this list).
  • fiftyone/core/plots/plotly.py: Added gt_field and pred_field parameters to confusion matrix plotting functions for enhanced hover tooltips.
  • fiftyone/utils/eval/base.py: Modified the _confusion_matrix method to conditionally handle ids based on the tabulate_ids parameter.
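For context on the scaleanchor change, the following is a minimal Python Plotly analogue (illustrative only; the PR edits the TypeScript layout defaults): anchoring the y-axis to the x-axis keeps the heatmap cells square, while omitting the anchor lets the plot stretch to fill its panel. Data and labels are made up.

# Python analogue of the layout difference (the real change is in TypeScript)
import plotly.graph_objects as go

z = [[5, 1, 0], [2, 7, 1], [0, 3, 9]]
labels = ["cat", "dog", "bird"]

# With the anchor: y is tied to x, so cells stay square
anchored = go.Figure(
    go.Heatmap(z=z, x=labels, y=labels),
    layout=dict(yaxis=dict(scaleanchor="x")),
)

# Without the anchor (as after this PR): axes scale independently
unanchored = go.Figure(go.Heatmap(z=z, x=labels, y=labels))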

Possibly related PRs

  • various model evaluation fixes and enhancements #5123: This PR modifies the Evaluation.tsx component, including updates to the getMatrix function, which is directly related to the changes made in the main PR regarding confusion matrices and performance metrics.
  • model evaluation bug fixes #5166: This PR also introduces modifications to the Evaluation.tsx component, enhancing dialog functionalities for configuring class performance and confusion matrix settings, which aligns with the main PR's focus on improving data processing and user interaction in the evaluation metrics display.
  • fix model evaluation panel name #5090: This PR updates the EvaluationPanel class, which is part of the model evaluation functionality, indicating a connection to the changes made in the main PR related to evaluation metrics.

Suggested reviewers

  • manivoxel51

📜 Recent review details

Configuration used: .coderabbit.yaml
Review profile: CHILL

📥 Commits

Reviewing files that changed from the base of the PR and between 29bcdb3 and ae7b873.

📒 Files selected for processing (1)
  • app/packages/core/src/plugins/SchemaIO/components/NativeModelEvaluationView/Evaluation.tsx (3 hunks)
🚧 Files skipped from review as they are similar to previous changes (1)
  • app/packages/core/src/plugins/SchemaIO/components/NativeModelEvaluationView/Evaluation.tsx


@imanjra force-pushed the bugfix/model-eval-conf-matrix branch from 6d568a0 to 2a72908 on November 25, 2024 at 14:20
@imanjra requested review from a team and removed request for a team on November 25, 2024 at 14:21

coderabbitai bot left a comment


Actionable comments posted: 0

🧹 Outside diff range and nitpick comments (1)
fiftyone/utils/eval/base.py (1)

362-365: Consider simplifying the nested numpy delete operations

While the current implementation is correct, the chained np.delete calls could be written more compactly; np.delete removes entries along a single axis per call, so two calls are still needed to drop both rows and columns (a standalone demonstration follows this suggestion).

Consider this more compact form:

-                ids = np.delete(
-                    np.delete(ids, rm_inds, axis=0), rm_inds, axis=1
-                )
+                ids = np.delete(np.delete(ids, rm_inds, 0), rm_inds, 1)

This change:

  1. Makes the code more concise
  2. Maintains the same functionality
  3. Slightly improves readability
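A standalone numpy demonstration of the pattern being discussed (the array and indices here are made up, not taken from base.py): deleting the same indices along axis 0 and then axis 1 removes the matching rows and columns.

import numpy as np

ids = np.arange(16).reshape(4, 4)  # stand-in for the square ids array
rm_inds = [1, 3]                   # example indices to drop

# np.delete removes entries along one axis per call, so rows and columns
# are trimmed with two chained calls; keyword and positional axis forms
# are equivalent
trimmed = np.delete(np.delete(ids, rm_inds, axis=0), rm_inds, axis=1)
print(trimmed.shape)  # (2, 2)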
📜 Review details

Configuration used: .coderabbit.yaml
Review profile: CHILL

📥 Commits

Reviewing files that changed from the base of the PR and between 2a72908 and 8beca16.

📒 Files selected for processing (2)
  • fiftyone/core/plots/plotly.py (7 hunks)
  • fiftyone/utils/eval/base.py (2 hunks)
🔇 Additional comments (6)
fiftyone/utils/eval/base.py (3)

336-337: LGTM: Proper handling of ids for (other, other) case

The addition correctly maintains consistency between the confusion matrix and ids array when omitting the (other, other) case.


344-346: LGTM: Proper handling of ids for missing label cases

The addition correctly maintains consistency between the confusion matrix and ids array when handling (other, missing) and (missing, other) cases.


Line range hint 336-365: Verify the impact on model evaluation panel

The changes look good and maintain proper synchronization between the confusion matrix and ids array. Let's verify the impact on the UI.

✅ Verification successful

Changes are safe and properly integrated with the UI

The changes to the confusion matrix and ids array handling are well-integrated with the model evaluation panel. The panel uses the confusion matrix functionality through a clean interface that's not affected by the internal changes to handle tabulate_ids. The UI implementation in fiftyone/operators/builtins/panels/model_evaluation/__init__.py shows proper abstraction where it only interacts with the high-level confusion matrix API.

  • The model evaluation panel uses the confusion_matrix() method without direct dependency on the ids array implementation (a usage sketch follows this list)
  • The changes maintain backward compatibility with existing confusion matrix usage
  • The UI's matrix display and colorscale generation remain unaffected
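A minimal usage sketch of that high-level interface, assuming an existing classification-style evaluation with key "eval" on a dataset named "my-dataset" (both names are illustrative, not taken from this PR):

import fiftyone as fo

dataset = fo.load_dataset("my-dataset")            # assumed to exist
results = dataset.load_evaluation_results("eval")  # prior classification eval

cm = results.confusion_matrix()         # plain counts; no ids involved
plot = results.plot_confusion_matrix()  # interactive heatmap
plot.show()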
🏁 Scripts executed

The following scripts were executed for the analysis:

Script:

#!/bin/bash
# Description: Check for any UI components that might be affected by these changes

# Look for UI components using confusion matrix
rg -l "confusion.*matrix" --type tsx --type jsx

# Look for components importing this module
rg -l "fiftyone.*eval.*base" --type tsx --type jsx --type py

Length of output: 168


Script:

#!/bin/bash
# Search for confusion matrix usage in JavaScript/TypeScript files
rg -l "confusion.*matrix" --type js

# Search for imports of the eval base module in JavaScript/TypeScript files
rg -l "fiftyone.*eval.*base" --type js

# Search for Python files that import or use this module
rg -l "from fiftyone.utils.eval.base|import fiftyone.utils.eval.base" --type py

# Look for any UI components or tests related to confusion matrix
rg -l "ConfusionMatrix" --type js --type py

Length of output: 204


Script:

#!/bin/bash
# Search for any Python files that use or test confusion matrix functionality
rg -l "confusion_matrix" --type py

# Search for any files containing model evaluation panel references
rg -l "ModelEvaluationPanel|model_evaluation" --type py

# Look for test files related to evaluation
fd "test.*eval.*\.py$"

# Check for usage of the evaluation module in the app
rg -l "fiftyone\.utils\.eval" --type py

Length of output: 774


Script:

#!/bin/bash
# Check the model evaluation panel implementation for confusion matrix usage
rg -C 5 "confusion_matrix" fiftyone/operators/builtins/panels/model_evaluation/__init__.py

# Check evaluation tests for confusion matrix with ids
rg -C 5 "tabulate_ids.*confusion_matrix|confusion_matrix.*tabulate_ids" tests/unittests/evaluation_tests.py tests/intensive/evaluation_tests.py

Length of output: 1815

fiftyone/core/plots/plotly.py (3)

106-107: LGTM: Parameter additions for confusion matrix fields

The addition of gt_field and pred_field parameters enhances the confusion matrix visualization by allowing customization of field names in the hover information.

Also applies to: 130-131


137-138: LGTM: Enhanced hover template with field names

The hover template now properly displays customized field names from gt_field and pred_field parameters, with sensible defaults of "truth" and "predicted" when not provided.

Also applies to: 142-143


1972-1975: LGTM: Consistent InteractiveHeatmap implementation

The InteractiveHeatmap class has been properly updated to support custom field names:

  • Well-documented parameters in the docstring
  • Proper initialization and storage of the parameters
  • Consistent hover template implementation

Also applies to: 1991-1992, 2010-2011, 2206-2207, 2211-2212


@brimoor left a comment


This is looking much better! 🥇

Can you add two more small tweaks to the confusion matrix display?

  1. Use the hovertemplate as implemented here so that the names of the ground truth and predicted fields are shown in the tooltip:
     [Screenshot 2024-11-25 at 10:12:32 AM]
  2. Flip the y-axis so that the confusion matrix has the standard descending diagonal as pictured below; you should be able to achieve this via autorange="reversed" (see the sketch after this list):
     [Screenshot 2024-11-25 at 10:15:28 AM]
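A minimal Python Plotly sketch of the requested y-axis flip (the actual change belongs in the app's TypeScript layout defaults; data and labels below are made up):

import plotly.graph_objects as go

z = [[50, 2, 1], [3, 45, 4], [0, 5, 40]]
labels = ["cat", "dog", "bird"]

fig = go.Figure(go.Heatmap(z=z, x=labels, y=labels))

# Reversing the y-axis puts the first class at the top, so the strong
# diagonal runs top-left to bottom-right (the conventional orientation)
fig.update_yaxes(autorange="reversed")
fig.show()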

lanzhenw commented Nov 25, 2024

[Screenshot 2024-11-25 at 2:33:41 PM: confusion matrix with ground_truth on the y-axis and predictions on the x-axis]

to confirm, using y axis as ground_truth and x axis as predictions. Is this correct? Or should it be the opposite?

@lanzhenw requested a review from brimoor on November 25, 2024 at 20:35
brimoor commented Nov 25, 2024

> to confirm, using y axis as ground_truth and x axis as predictions. Is this correct? Or should it be the opposite?

yes correct, thanks ✅

@lanzhenw enabled auto-merge (squash) on November 25, 2024 at 21:39
lanzhenw previously approved these changes Nov 25, 2024

The inline review comment below refers to this hovertemplate snippet:

hovertemplate: [
  "<b>count: %{z:d}</b>",
  `ground_truth: %{y}`,
  …

@lanzhenw ideally we would show the actual field names here, rather than the hard-coded ground_truth and predictions strings, like is implemented here.

In order to achieve that, you'd need access to config.gt_field and config.pred_field from the evaluation's config. Do you have access to those in the frontend? If you do, then let's use the actual field names.

If you don't easily have access to the field names, then please just use ground truth rather than ground_truth and we can add field names in a follow-up PR.
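For reference, on the Python side these names live on the evaluation's config; a short sketch assuming a prior evaluation with key "eval" on a dataset named "my-dataset" (names illustrative). Whether the frontend has equivalent access is the open question above.

import fiftyone as fo

dataset = fo.load_dataset("my-dataset")     # assumed to exist
info = dataset.get_evaluation_info("eval")  # EvaluationInfo for a prior run

gt_field = info.config.gt_field      # e.g. "ground_truth"
pred_field = info.config.pred_field  # e.g. "predictions"
print(gt_field, pred_field)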

@lanzhenw requested a review from brimoor on November 25, 2024 at 22:05

@brimoor left a comment


LGTM
