
16-bit segmentation mask #8

Open
Rajrup opened this issue Apr 6, 2023 · 5 comments

Comments

@Rajrup

Rajrup commented Apr 6, 2023

Amazing work!
The results from the datasets that you shared are excellent. Thanks for sharing!

One question: I am trying to run RT-SBS on videos I collected that contain a person. How can I generate the 16-bit PSPNet segmentation masks for these videos? I'm particularly interested in the person class.

@Swayzzu

Swayzzu commented Aug 18, 2023

Same issue here

@cioppaanthony
Owner

Hi @Swayzzu and @Rajrup,

Thank you for your nice comment and interest in our work!

These 16-bit images store the per-pixel "semantic foreground" probability with 16-bit precision.
You can extract them with any segmentation network by taking its output right before the argmax that selects the most probable class for each pixel.
Once you have the network's output probabilities per class and per pixel, the probabilities of the foreground classes (the ones that are interesting for you) are aggregated by addition so that you get a single channel.

To be more precise, the semantic segmentation network PSPNet trained on the ADE20K dataset outputs a vector of 150 real numbers for each pixel, where each number is associated with a particular object class within a set of 150 mutually exclusive classes. The semantic probability estimate is computed by applying a softmax function to this vector and summing the values obtained for the subset of classes that are relevant for motion detection. We use the subset person, car, cushion, box, boot, boat, bus, truck, bottle, van, bag and bicycle, whose elements correspond to the moving objects of the CDNet 2014 dataset.
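A minimal NumPy sketch of the computation described above, assuming the network's raw scores (logits, taken before the argmax) are available as an array of shape `(C, H, W)`; for PSPNet on ADE20K, `C = 150`. The function name `semantic_foreground` and the class indices in the example are hypothetical: look up the actual indices of person, car, etc. in the label list of the checkpoint you use.

```python
from typing import Sequence

import numpy as np


def semantic_foreground(logits: np.ndarray, fg_classes: Sequence[int]) -> np.ndarray:
    """Per-pixel probability that a pixel belongs to any class in fg_classes.

    logits: raw network scores of shape (C, H, W) (assumed channel-first layout).
    fg_classes: indices of the classes of interest in the model's label list.
    """
    # Numerically stable softmax over the class axis.
    shifted = logits - logits.max(axis=0, keepdims=True)
    exp = np.exp(shifted)
    probs = exp / exp.sum(axis=0, keepdims=True)
    # The classes are mutually exclusive, so summing their probabilities gives
    # the probability of "any class of interest", a single channel in [0, 1].
    # list(...) ensures NumPy treats the indices as a selection along axis 0.
    return probs[list(fg_classes)].sum(axis=0)


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    logits = rng.normal(size=(150, 4, 6))  # stand-in for PSPNet output on a 4x6 image
    fg = semantic_foreground(logits, fg_classes=[12, 20])  # hypothetical class indices
    print(fg.shape)  # (4, 6)
```

If your framework returns a batch of shape `(N, C, H, W)`, apply the function to each image of the batch (or move the softmax axis accordingly).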

Note that it is not mandatory to work at this exact precision: 64-bit, 32-bit, or even 8-bit precision should work similarly, and you can choose a different set of classes of interest.
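If you need an integer encoding, the probability map can be scaled to the chosen bit depth. The sketch below (hypothetical helper `quantize_probability`) maps values in [0, 1] to 8-bit or 16-bit unsigned integers by rounding to the nearest level; 32-bit or 64-bit floats can simply be stored as they are.

```python
import numpy as np

# Supported integer encodings (an assumption of this sketch: 8 or 16 bits).
_DTYPES = {8: np.uint8, 16: np.uint16}


def quantize_probability(prob: np.ndarray, bits: int = 16) -> np.ndarray:
    """Map probabilities in [0, 1] to unsigned integers in [0, 2**bits - 1]."""
    if bits not in _DTYPES:
        raise ValueError("bits must be 8 or 16")
    max_val = (1 << bits) - 1  # 255 for 8 bits, 65535 for 16 bits
    # Clip to guard against tiny floating-point excursions outside [0, 1],
    # then round to the nearest integer level before casting.
    return np.rint(np.clip(prob, 0.0, 1.0) * max_val).astype(_DTYPES[bits])


def dequantize(q: np.ndarray, bits: int = 16) -> np.ndarray:
    """Inverse mapping back to (approximate) probabilities in [0, 1]."""
    return q.astype(np.float64) / ((1 << bits) - 1)
```

Truncating with `int(p * 65535)` instead of rounding differs by at most one level out of 65535, which is negligible for thresholding. Whatever encoding you pick, the thresholds must be expressed on the same scale: a probability threshold `p` corresponds to roughly `p * 65535` on the 16-bit scale.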

Let me know if this is clearer now, and don't hesitate to ask if you have any further questions.

@Swayzzu

Swayzzu commented Aug 21, 2023

Hi @cioppaanthony,

Thank you so much for your reply!

I do have some extra questions:

  1. After the classes of interest are chosen and their probabilities are aggregated, what I get is a float (for example, 0.5), which is obviously not a 16-bit number. I converted it as follows: int(0.5 * 65535) = 32767. Is this the right way to do it? It seems too simple, so I'm worried that what I did is wrong.
  2. In your code, I can see four thresholds: the FG and BG thresholds for segment_semantics, and the FG and BG thresholds for segment_no_semantics. Why did you choose these threshold values?
  3. If I use my own dataset and the number of classes changes, do I need to modify these thresholds? If so, how do I choose thresholds that fit my dataset?

@cioppaanthony
Owner

Hi @Swayzzu,

  1. Yes, that is the correct way. As you can see in the argument parser, the threshold values are integers. This is one way to represent 16-bit information (not optimal, I agree, but it is what the original SBS code used, so I kept it for consistency).
  2. The threshold values were optimized with a Bayesian optimization strategy on the CDNet 2014 dataset, using the overall F1 score as the optimization criterion.
  3. Since CDNet contains various video categories, the default thresholds should already provide good baseline performance. You could then optimize them for your own dataset with a grid search or a Bayesian optimization strategy.
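For the grid search option, a minimal sketch could look like the following. It assumes the semantic maps, the background-subtraction masks, and the ground-truth masks of all annotated frames are stacked into arrays, and it pools TP/FP/FN over all pixels to compute an overall F1 score. The decision rule in `combine` is a simplified, hypothetical stand-in (pixels with high semantic probability are forced to foreground, pixels with low probability to background, the rest keep the background-subtraction result); the repository's `segment_semantics` logic may differ, so plug in the actual rule before tuning. All function names here are illustrative.

```python
import itertools
from typing import Iterable, Optional, Tuple

import numpy as np


def combine(sem: np.ndarray, bgs: np.ndarray, fg_t: int, bg_t: int) -> np.ndarray:
    """Hypothetical decision rule (illustration only).

    sem: semantic foreground map on the 16-bit scale; bgs: boolean BGS mask.
    """
    out = bgs.copy()
    out[sem <= bg_t] = False  # confidently background according to semantics
    out[sem >= fg_t] = True   # confidently foreground according to semantics
    return out


def f1_score(pred: np.ndarray, gt: np.ndarray) -> float:
    """F1 over all pixels of boolean masks (TP/FP/FN pooled)."""
    tp = np.count_nonzero(pred & gt)
    fp = np.count_nonzero(pred & ~gt)
    fn = np.count_nonzero(~pred & gt)
    denom = 2 * tp + fp + fn
    # No foreground in either mask: treat as a perfect match.
    return 2 * tp / denom if denom else 1.0


def grid_search(
    sem: np.ndarray,
    bgs: np.ndarray,
    gt: np.ndarray,
    fg_values: Iterable[int],
    bg_values: Iterable[int],
) -> Tuple[float, Optional[int], Optional[int]]:
    """Return (best F1, fg threshold, bg threshold) over the given grid."""
    best: Tuple[float, Optional[int], Optional[int]] = (-1.0, None, None)
    for fg_t, bg_t in itertools.product(fg_values, bg_values):
        if bg_t >= fg_t:
            continue  # keep the background threshold below the foreground one
        score = f1_score(combine(sem, bgs, fg_t, bg_t), gt)
        if score > best[0]:
            best = (score, fg_t, bg_t)
    return best
```

When the grid becomes large, a Bayesian optimizer can replace the exhaustive loop while keeping the same F1 objective.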

@Swayzzu

Swayzzu commented Aug 21, 2023

Thank you! That's very helpful!
