detectnet/clustering.py incorrectly calculates bounding box height #557

Closed

samsparks opened this issue Feb 7, 2019 · 8 comments

@samsparks

vote_boxes() in clustering.py calculates each detection height by subtracting each bounding box's index 1 from index 3.

However, since a bounding box is a cv::Rect, index 3 is height and index 1 is y. The clustering algorithm should test bounding box height as follows:

            if rect[3] >= self.min_height:
@samsparks
Author

Actually, as I look closer, the issue is more invasive. clustering.py assumes [[x1, y1, x2, y2]] boxes as both the input and output of vote_boxes(). Therefore, the algorithm needs to convert the input prior to the call to groupRectangles(), test the height using rect[3], and convert back to [x1, y1, x2, y2] when populating detections_per_image.
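
To be concrete, the round-trip I have in mind looks roughly like this (the helper names are mine, not from clustering.py):

    def corners_to_rects(boxes):
        # [[x1, y1, x2, y2], ...] corner boxes -> cv::Rect-style [x, y, width, height]
        return [[int(x1), int(y1), int(x2 - x1), int(y2 - y1)]
                for x1, y1, x2, y2 in boxes]

    def rects_to_corners(rects):
        # cv::Rect-style [x, y, width, height] -> [[x1, y1, x2, y2], ...] corner boxes
        return [[int(x), int(y), int(x + w), int(y + h)]
                for x, y, w, h in rects]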

@samsparks
Author

Is there a better venue for this discussion? I haven't received a response on NVIDIA's forums.
TIA

@drnikolaev

Hi @samsparks, sorry for the delay. I'm trying this, but I'd appreciate a sample or unit test to check that it works correctly.

@samsparks
Author

Sure, @drnikolaev, I would be happy to provide example code. However, as this is an interface issue between clustering.py and OpenCV, I'm not sure what to provide beyond an inspection of the code.

Lines 167-172 show the extraction of the top-left and bottom-right coordinates of each bounding box into candidate boxes:

    x1 = (np.asarray([net_boxes[0][y[i]][x[i]] for i in list(range(x.size))]) + mx)
    y1 = (np.asarray([net_boxes[1][y[i]][x[i]] for i in list(range(x.size))]) + my)
    x2 = (np.asarray([net_boxes[2][y[i]][x[i]] for i in list(range(x.size))]) + mx)
    y2 = (np.asarray([net_boxes[3][y[i]][x[i]] for i in list(range(x.size))]) + my)

    boxes = np.transpose(np.vstack((x1, y1, x2, y2)))

These coordinates are returned from gridbox_to_boxes() and passed to vote_boxes() on lines 224-226:

            propose_boxes, propose_cvgs, mask = gridbox_to_boxes(cur_cvg, cur_boxes, self)
            # Vote across the proposals to get bboxes
            boxes_cur_image = vote_boxes(propose_boxes, propose_cvgs, mask, self)

Finally (unless I am missing something), vote_boxes() passes these values to groupRectangles() on line 189 without converting them to (x, y, width, height):

    nboxes, weights = cv.groupRectangles(
        np.array(propose_boxes).tolist(),
        self.gridbox_rect_thresh,
        self.gridbox_rect_eps)

This looks wrong based on the OpenCV documentation.
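
For comparison, this is the calling convention I believe the documentation describes (the box values below are made up, and the threshold/eps are not the DIGITS settings):

    import cv2 as cv

    # Two nearly identical detections given as (x, y, width, height), i.e. cv::Rect layout.
    rects = [[100, 120, 50, 80],
             [102, 118, 52, 78]]

    # groupThreshold=1 keeps clusters of at least two rectangles; eps controls similarity.
    grouped, weights = cv.groupRectangles(rects, 1, 0.5)
    print(grouped)   # a single merged (x, y, width, height) rectangle
    print(weights)   # how many input rectangles were merged into it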

Additionally, I rebuilt OpenCV to test the interface after posting this question on their forum. By adding debug statements to the implementation of groupRectangles(), I was able to confirm that the Python call is expected to pass (x, y, width, height).

Do you have an idea for what I can provide as example code? I am happy to do whatever I can to help.

@drnikolaev

@samsparks please just give me an example of how exactly you execute clustering.py, against what dataset and/or model, and what you expect as the correct outcome.

@samsparks
Author

samsparks commented Feb 11, 2019

Hi @drnikolaev - I have not forgotten about this. Unfortunately, I do not have a trained model I can provide, and DIGITS does not allow testing of pretrained models :-(. So I am going to have to train something from scratch.

In the meantime, I have an example where I modified clustering.py to print the input to groupRectangles() right before it is called, as follows:
print("proposed: {}".format(np.array(propose_boxes).tolist()))

This outputs the following set of bounding boxes in clustering.py:
[[547,432,701,639],[557,435,700,640],[560,438,695,641],[560,438,694,640],[88,443,336,663],[83,444,357,671],[83,444,373,676],[87,449,377,676],[87,454,380,677],[76,453,388,680],[72,447,394,683],[80,437,393,683],[101,430,392,678],[547,433,702,641],[555,433,701,645],[558,437,696,647],[556,440,696,644],[84,443,357,664],[73,448,369,665],[74,449,375,664],[81,451,373,664],[85,454,375,666],[81,454,385,672],[74,452,392,676],[77,445,396,679],[91,433,392,680],[547,430,705,644],[553,429,704,649],[555,434,697,649],[552,438,695,649],[85,445,365,661],[69,451,376,662],[69,452,379,663],[76,453,374,663],[80,452,377,666],[79,451,382,671],[74,449,388,673],[77,445,393,673],[90,434,389,674],[546,429,706,643],[553,428,703,647],[554,432,695,649],[553,435,693,654],[81,445,370,663],[68,454,383,664],[67,455,388,664],[72,454,384,667],[77,452,382,669],[71,448,386,671],[66,443,388,672],[73,438,389,671],[92,429,388,673],[545,429,706,642],[553,429,703,643],[553,432,695,647],[553,432,696,658],[79,450,367,664],[72,459,379,663],[71,459,387,665],[75,458,388,667],[75,455,390,666],[65,448,389,668],[63,441,387,669],[73,433,384,672],[100,425,388,675],[549,429,707,648],[550,429,701,652],[554,434,703,662],[79,462,356,665],[73,462,374,665],[74,461,383,666],[73,460,387,667],[69,457,391,664],[60,447,390,668],[63,435,385,673],[81,430,384,676],[116,433,390,677]]

And it returns:
[[553, 433, 700, 647], [ 75, 449, 382, 669], [ 95, 430, 390, 676]]

Passing the same values to groupRectangles() in C++ returns the following:
[[546,431,704,642],[70,447,389,672],[555,435,696,647],[74,455,381,666]]

I expect these two to match, but they do not. I think the problem is in how clustering.py is calling groupRectangles().

The full source of the example can be found here
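
In condensed form, the comparison boils down to something like this (only the first few boxes from above are shown, and the threshold/eps values are placeholders rather than self.gridbox_rect_thresh/self.gridbox_rect_eps):

    import cv2 as cv

    # First few of the [x1, y1, x2, y2] corner boxes printed above.
    propose_boxes = [[547, 432, 701, 639], [557, 435, 700, 640],
                     [560, 438, 695, 641], [560, 438, 694, 640],
                     [88, 443, 336, 663], [83, 444, 357, 671]]

    # What clustering.py currently does: corners passed straight through.
    as_is, _ = cv.groupRectangles([list(b) for b in propose_boxes], 1, 0.2)

    # What the documentation asks for: convert to (x, y, width, height) first,
    # then convert the grouped rectangles back to corners.
    converted = [[x1, y1, x2 - x1, y2 - y1] for x1, y1, x2, y2 in propose_boxes]
    grouped, _ = cv.groupRectangles(converted, 1, 0.2)
    fixed = [[x, y, x + w, y + h] for x, y, w, h in grouped]

    print("as-is    :", list(map(list, as_is)))
    print("converted:", fixed)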

@samsparks
Author

Hi @drnikolaev -

I used the default DIGITS DetectNet (KITTI) model and KITTI images contained in data_object_image_2.zip.

The two images 003716.png and 003719.png provide good examples for the problem.

  • DIGITS' clustering.py finds 4 bounding boxes for 003716.png
  • DIGITS' clustering.py finds 9 bounding boxes for 003719.png.

I can only reproduce this reliably in jetson-inference by malforming the construction of the cv::Rect objects.

I believe the current implementation of clustering.py works most of the time because groupRectangles() is grouping like objects. It is reasonably forgiving if you pass in [x1, y1, x2, y2] instead of [x, y, width, height], because it ends up matching a pair of points instead of a point plus a width and height. However, it does not work as well when detections are in the bottom right (too inclusive) or top left (too exclusive) of the image.
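
For what it's worth, this matches my reading of the similarity test groupRectangles() applies between two (x, y, width, height) rectangles (transcribed to Python; the eps below is chosen only to make the contrast visible):

    def similar_rects(r1, r2, eps):
        # delta scales with the rectangle sizes; two rects cluster if their
        # top-left and bottom-right corners each agree to within delta
        x1, y1, w1, h1 = r1
        x2, y2, w2, h2 = r2
        delta = eps * (min(w1, w2) + min(h1, h2)) * 0.5
        return (abs(x1 - x2) <= delta and abs(y1 - y2) <= delta and
                abs(x1 + w1 - x2 - w2) <= delta and abs(y1 + h1 - y2 - h2) <= delta)

    # Passing corners as-is puts x2/y2 in the width/height slots, so delta grows with
    # the box's position in the image: over-grouping bottom-right, under-grouping top-left.
    print(similar_rects([547, 432, 701, 639], [557, 435, 700, 640], eps=0.05))  # corners: True
    print(similar_rects([547, 432, 154, 207], [557, 435, 143, 205], eps=0.05))  # converted: False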

See my fork of jetson-inference for the "broken" C++ code that replicates clustering.py. There is a define of REPLICATE_CLUSTERING_PY in detectNet.cpp that switches between the correct and incorrect construction of the cv::Rect objects.

Please note this change will affect the required value of epsilon (self.gridbox_rect_eps). I plan on retraining my network after applying the following patch:

index 380df4a..d5c0589 100644
--- a/python/caffe/layers/detectnet/clustering.py
+++ b/python/caffe/layers/detectnet/clustering.py
@@ -188,14 +188,14 @@ def vote_boxes(propose_boxes, propose_cvgs, mask, self):
     # GROUP RECTANGLES Clustering
     ######################################################################
     nboxes, weights = cv.groupRectangles(
-        np.array(propose_boxes).tolist(),
+        [[e[0],e[1],e[2]-e[0],e[3]-e[1]] for e in np.array(propose_boxes).tolist()],
         self.gridbox_rect_thresh,
         self.gridbox_rect_eps)
     if len(nboxes):
         for rect, weight in zip(nboxes, weights):
-            if (rect[3] - rect[1]) >= self.min_height:
+            if rect[3] >= self.min_height:
                 confidence = math.log(weight[0])
-                detection = [rect[0], rect[1], rect[2], rect[3], confidence]
+                detection = [rect[0], rect[1], rect[0]+rect[2], rect[1]+rect[3], confidence]
                 detections_per_image.append(detection)
 
     return detections_per_image
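
If it helps as the sample/unit test requested above, something along these lines exercises the patched path end to end (the helper name, box values, and the threshold/eps/min_height values are mine, not DIGITS defaults):

    import math
    import cv2 as cv

    def vote_boxes_fixed(propose_boxes, rect_thresh, rect_eps, min_height):
        # vote_boxes() with the patch above applied:
        # corners -> (x, y, w, h) for groupRectangles, then back to corners.
        rects = [[x1, y1, x2 - x1, y2 - y1] for x1, y1, x2, y2 in propose_boxes]
        nboxes, weights = cv.groupRectangles(rects, rect_thresh, rect_eps)
        detections = []
        for rect, weight in zip(nboxes, weights):
            if rect[3] >= min_height:
                confidence = math.log(weight[0])
                detections.append([rect[0], rect[1],
                                   rect[0] + rect[2], rect[1] + rect[3], confidence])
        return detections

    # Two clusters of similar corner boxes; each should collapse to one detection
    # whose corners stay close to the originals.
    boxes = [[100, 100, 200, 260], [102, 98, 198, 262], [99, 101, 201, 259],
             [400, 150, 480, 320], [402, 152, 478, 318], [399, 149, 481, 321]]
    print(vote_boxes_fixed(boxes, rect_thresh=1, rect_eps=0.2, min_height=10))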

drnikolaev added a commit to drnikolaev/caffe that referenced this issue Feb 28, 2019
@drnikolaev

Fixed in v0.17.3
