Is the operation in SGE-Block equivalent to GroupNorm ? #33

mrT23 · 2020-01-30T14:20:00Z

Hi.
I have two questions:

Question 1:

        t = t - t.mean(dim=1, keepdim=True)
        std = t.std(dim=1, keepdim=True) + 1e-5
        t = t / std
        t = t.view(b, self.groups, h, w)
        t = t * self.weight + self.bias

Is this code equivalent to BatchNorm (or GroupNorm)?
If so, shouldn't we use running_mean and running_var to stabilize the statistics and improve convergence?
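
One way to check this numerically is sketched below in PyTorch. It assumes `t` enters the snippet as a `(b * groups, h * w)` tensor, one flattened spatial map per sample and group; the sizes are illustrative and not taken from the repository. Under that assumption the statistics are computed per sample, so the closest match would be GroupNorm with one channel per group (equivalently InstanceNorm) rather than BatchNorm. The remaining gap comes from `std` using the unbiased estimator and from `1e-5` being added outside the square root.

```python
import torch
import torch.nn.functional as F

torch.manual_seed(0)
b, groups, h, w = 2, 4, 8, 8  # illustrative sizes, not from the repo

# Assumption: t holds one flattened spatial map per (sample, group).
t = torch.randn(b * groups, h * w)

# Normalization as quoted in Question 1 (without the affine weight/bias).
s = t - t.mean(dim=1, keepdim=True)
s = s / (s.std(dim=1, keepdim=True) + 1e-5)
s = s.view(b, groups, h, w)

# GroupNorm with one channel per group, no affine parameters.
g = F.group_norm(t.view(b, groups, h, w), num_groups=groups, eps=1e-5)

# torch.std defaults to the unbiased estimator (divides by n - 1),
# while group_norm uses the biased variance (divides by n).
n = h * w
max_raw_diff = (s - g).abs().max().item()
max_rescaled_diff = (s * (n / (n - 1)) ** 0.5 - g).abs().max().item()
print(max_raw_diff, max_rescaled_diff)
```

If the shape assumption holds, no statistic is shared across the batch, so there is nothing for running_mean / running_var to track in the BatchNorm sense.
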

Question 2:

        xn = xn.sum(dim=1, keepdim=True)

What is the logic behind this line? Why are we summing along the groups?

thanks a lot
Tal


Haus226 · 2024-08-31T13:46:31Z

For Question 2, I think the sum is used to reduce the weighted channels within each group to a single map, which gives the attention map $a$.
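
A minimal sketch of that reduction follows. It assumes that, before the quoted line, `x` has been reshaped to `(b * groups, c // groups, h, w)` and that `xn` is `x` multiplied by each channel's global spatial average; both steps and all sizes are assumptions made for illustration, not shown in this thread. Under that layout `dim=1` is the channel axis inside one group, so the sum would run over channels rather than over groups, and it equals a per-position dot product between the features and the pooled vector.

```python
import torch

torch.manual_seed(0)
b, c, groups, h, w = 2, 8, 4, 5, 5  # illustrative sizes

# Assumed layout: each group of a sample becomes its own row along dim 0.
x = torch.randn(b, c, h, w).view(b * groups, c // groups, h, w)

# Assumed preceding step: weight every channel by its global spatial average.
g = x.mean(dim=(2, 3), keepdim=True)   # (b*groups, c//groups, 1, 1)
xn = x * g

# The line from Question 2: collapse the channel axis within each group.
a = xn.sum(dim=1, keepdim=True)        # (b*groups, 1, h, w)

# Same values as a dot product between features and g at each position.
dot = torch.einsum("nchw,nc->nhw", x, g.flatten(1)).unsqueeze(1)
print(a.shape, torch.allclose(a, dot, atol=1e-5))
```
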

Haus226 mentioned this issue Aug 31, 2024

Replace with Group Normalization #40

Open
