log_image: log bounding boxes #766

Open · 4 tasks
dberenbaum opened this issue Jan 23, 2024 · 5 comments
Labels: A: log_image (Area: `live.log_image`), A: studio (Area: Studio integration), A: vscode (Area: DVC VSCode Extension integration), p2-medium

Comments

@dberenbaum
Collaborator

Related: iterative/dvc#10198, iterative/vscode-dvc#4917

We need a way to log bounding boxes (and maybe later other annotations like segmentation masks) for images saved with dvclive.

p1:

The API can look like this:

boxes = [
  {"label": "cat", "box": {"x_min": 100, "x_max": 110, "y_min": 5, "y_max": 20}},
  {"label": "cat", "box": {"x_min": 30, "x_max": 55, "y_min": 75, "y_max": 90}},
  {"label": "dog", "box": {"x_min": 80, "x_max": 100, "y_min": 25, "y_max": 50}}
]
live.log_image("myimg.png", myimg, boxes=boxes)

In addition to saving the image to dvclive/plots/images/myimg.png, this will also save annotations to dvclive/plots/images/myimg.json in the following format:

{"boxes":
  [
    {"label": "cat", "box": {"x_min": 100, "x_max": 110, "y_min": 5, "y_max": 20}},
    {"label": "cat", "box": {"x_min": 30, "x_max": 55, "y_min": 75, "y_max": 90}},
    {"label": "dog", "box": {"x_min": 80, "x_max": 100, "y_min": 25, "y_max": 50}}
  ]
}

p2:

  • Other box formats that use width, height, and an x/y anchor for the center or a corner, e.g. {"x_center": 100, "y_center": 50, "width": 10, "height": 20} (see the combined sketch after this list)
  • Normalized coordinates (between 0 and 1) instead of pixel coordinates (we could probably auto-detect this)
  • Scores (e.g. "scores": {"acc": 0.9, "loss": 0.05}) so that users can filter boxes based on thresholds (only show boxes where acc > 0.8)
  • Segmentation masks (TBD; requires a class per pixel)
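
For illustration, here is a hedged sketch of how a single entry might look once these extensions are combined (center/width/height box format, normalized coordinates, and attached scores); none of the field names below are settled:

# Illustrative only: one box in a center/width/height format with normalized
# coordinates and a score dict; this is not a final schema.
boxes = [
  {"label": "cat",
   "box": {"x_center": 0.55, "y_center": 0.25, "width": 0.10, "height": 0.20},
   "scores": {"acc": 0.9}}
]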
@dberenbaum added the p1-important (Include in the next sprint), A: log_image, A: studio, and A: vscode labels on Jan 23, 2024
@dberenbaum added this to DVC on Jan 23, 2024
@github-project-automation bot moved this to Backlog in DVC on Jan 23, 2024
@dberenbaum moved this from Backlog to Todo in DVC on Jan 23, 2024
@dberenbaum
Collaborator Author

May need to consider whether it's necessary to list the universe of labels somewhere or if it's fine to parse them as the set of all individual labels.
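
For reference, a minimal sketch of the second option, assuming annotations are stored in per-image JSON files under dvclive/plots/images as proposed above (the paths and field names follow the example in this issue, not a finalized layout):

import json
from pathlib import Path

# Derive the label "universe" as the set of all labels found in the logged
# annotation files (no explicit label list required).
labels = set()
for path in Path("dvclive/plots/images").glob("*.json"):
    annotations = json.loads(path.read_text())
    labels.update(box["label"] for box in annotations.get("boxes", []))
print(sorted(labels))  # e.g. ['cat', 'dog']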

@AlexandreKempf
Contributor

I'm not 100% sure this is the right place for my first discussion of a feature, but I'll jump on this one and remove my comment if it's not the right way to handle feature discussions internally.

  • I would advise using "left", "top", "right", and "bottom" instead of the "x"/"y" notation. The first leaves no ambiguity, while the second is quite ambiguous: it depends on what you consider x and y to be (an image can be read as a matrix, where x is vertical and y is horizontal, or as a plot, where x is horizontal and y is vertical), and "x_min"/"x_max" also depend on your reference point. For instance, torchvision and Shapely differ: the first takes the top left of the image as the origin, the second the bottom left (the same matrix-vs-plot debate). Honestly, after many years working on object detection, the only format that never confused us was "left", "top", "right", "bottom". While I agree that the user interface should offer several options, internally I can't recommend enough that we use an unambiguous notation (see the small conversion sketch after this list).

  • You mentioned a "score" feature, which is a great idea. In my opinion, it should probably be in p1, actually. A detection model produces so many detections that they only make sense with a score attached. What could be very interesting is a per-class threshold in the visual interface: usually some classes are more represented than others, so the threshold you want to set for each class can be very different (for the same model, it could be 0.3 for rare classes and 0.95 for common classes).

  • I realize you wanted to give an example, but you don't usually have an accuracy score for each bounding box. The best most libraries give you is the confidence of the winning class, and only during validation (not during training); during training, the model (or framework) only returns the loss for the whole image. I would suggest we put this "score" at the same level as "label" and "box" and make it a float.

  • We should take advantage of other tools that deal with classification + detection + polygons + segmentation + multiclass, like Supervisely (a labeling platform) or lightning-flash. Honestly, having a nice and intuitive data format for all these use cases is not trivial. We might benefit from looking at their data schemas and perhaps asking them what they would do differently if they could start over.
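
A small sketch of the conversion this implies, assuming pixel coordinates with the origin at the top left of the image (the torchvision convention); the helper name is hypothetical:

def to_ltrb(box: dict) -> dict:
    # Convert an {x_min, x_max, y_min, y_max} box (x horizontal, y vertical,
    # origin at the top left) into the unambiguous left/top/right/bottom form.
    return {
        "left": box["x_min"],
        "top": box["y_min"],
        "right": box["x_max"],
        "bottom": box["y_max"],
    }

to_ltrb({"x_min": 100, "x_max": 110, "y_min": 5, "y_max": 20})
# -> {"left": 100, "top": 5, "right": 110, "bottom": 20}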

Feel free to tell me if I should have raised this discussion differently or elsewhere; I'll act accordingly.

@dberenbaum
Collaborator Author

Great feedback @AlexandreKempf! Let's go with your suggestions here.

@mattseddon and @julieg18 have been working on this functionality, and you could work with them on getting this implemented. @AlexandreKempf is our newest ML product engineer who just joined the team.

@julieg18

julieg18 commented Feb 2, 2024

@AlexandreKempf, great suggestions on this feature!

Feel free to take a look at iterative/vscode-dvc#5227 if you'd like to give any feedback on the plots' current design and reach out if you have any questions about VSCode's or Studio's side of things.

@AlexandreKempf moved this from Todo to In Progress in DVC on Feb 13, 2024
@AlexandreKempf
Contributor

TODO list for this project:

  • DVCLive should save the annotations in a .json file next to the image file (a minimal sketch follows this list)
  • DVC should display the annotations when running dvc plots diff --json --split so that VS Code can read them
  • VS Code should display the annotations
  • DVC should send the annotations to Studio
  • DVC should send the annotations to Studio for live experiments
  • Studio should display the annotations
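
A minimal sketch of the first item, assuming the annotations are simply written to a sidecar JSON next to the image (the paths and the {"boxes": [...]} layout follow the p1 proposal above; this is not the actual DVCLive implementation):

import json
from pathlib import Path

def save_annotations(image_path: str, boxes: list) -> None:
    # e.g. dvclive/plots/images/myimg.png -> dvclive/plots/images/myimg.json
    sidecar = Path(image_path).with_suffix(".json")
    sidecar.write_text(json.dumps({"boxes": boxes}, indent=2))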

@dberenbaum added the p2-medium label and removed the p1-important (Include in the next sprint) label on Apr 24, 2024