Wrong Batch Normalization #40

Open
bryant03 opened this issue Jul 11, 2018 · 3 comments

@bryant03

In the function normalize():
```python
    with tf.variable_scope(scope, reuse=reuse):
        inputs_shape = inputs.get_shape()
        params_shape = inputs_shape[-1:]
        mean, variance = tf.nn.moments(inputs, [-1], keep_dims=True)
        print('mean.get_shape()', mean.get_shape())
        beta = tf.Variable(tf.zeros(params_shape))
        gamma = tf.Variable(tf.ones(params_shape))
        normalized = (inputs - mean) / ((variance + epsilon) ** 0.5)
        outputs = gamma * normalized + beta
```
However, I think the second argument of tf.nn.moments() should not be [-1], since batch normalization needs to take the batch dimension into account.
After modification, the code looks like this:

```python
    with tf.variable_scope(scope, reuse=reuse):
        inputs_shape = inputs.get_shape()
        params_shape = inputs_shape[-1:]
        axis = list(range(len(inputs_shape) - 1))
        mean, variance = tf.nn.moments(inputs, axis, keep_dims=True)
        print('mean.get_shape()', mean.get_shape())
        beta = tf.Variable(tf.zeros(params_shape))
        gamma = tf.Variable(tf.ones(params_shape))
        normalized = (inputs - mean) / ((variance + epsilon) ** 0.5)
        outputs = gamma * normalized + beta
```
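
For reference, a minimal NumPy sketch (the [N, T, d_model] shape and the variable names are assumptions, not the repository's code) showing what statistics each axis choice actually computes:

```python
import numpy as np

# Hypothetical activations with shape [batch N, time T, features d_model].
x = np.random.randn(4, 10, 512).astype(np.float32)

# Original code: moments over the last axis only -> one mean/variance per
# (example, time step); no batch information is used.
mean_last = x.mean(axis=-1, keepdims=True)        # shape (4, 10, 1)

# Proposed change: axis = list(range(len(inputs_shape) - 1)) = [0, 1]
# -> moments pooled over batch and time, i.e. batch-norm-style statistics.
mean_pooled = x.mean(axis=(0, 1), keepdims=True)  # shape (1, 1, 512)

print(mean_last.shape, mean_pooled.shape)
```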

@RayXu14

RayXu14 commented Sep 19, 2018

The Transformer uses Layer Normalization rather than Batch Normalization, and Layer Normalization does not need to consider batch information. See the Layer Normalization paper, end of page 2.
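
For illustration, a minimal NumPy sketch of layer normalization over the last axis (the epsilon value and shapes are assumptions, not the repository's TensorFlow code): each feature vector is normalized by its own mean and variance, so no other example in the batch is involved.

```python
import numpy as np

def layer_norm(x, gamma, beta, epsilon=1e-8):
    # Statistics per feature vector x[n, t, :]; nothing is shared across the batch.
    mean = x.mean(axis=-1, keepdims=True)
    var = x.var(axis=-1, keepdims=True)
    return gamma * (x - mean) / np.sqrt(var + epsilon) + beta

x = np.random.randn(4, 10, 512).astype(np.float32)
y = layer_norm(x, gamma=np.ones(512, dtype=np.float32), beta=np.zeros(512, dtype=np.float32))
print(y.shape)  # (4, 10, 512)
```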

@RoyJoyRo

RoyJoyRo commented Mar 25, 2019

> The Transformer uses Layer Normalization rather than Batch Normalization, and Layer Normalization does not need to consider batch information. See the Layer Normalization paper, end of page 2.

However, I suspect the implementation of layer_norm is still wrong; the paper does not describe the model clearly. I suggest referring to the implementation in layers.py and changing the code to axis = list(range(1, len(inputs_shape))).
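
A sketch of that variant in NumPy (shapes assumed): axis = list(range(1, len(inputs_shape))) pools the statistics over every non-batch axis, so each example gets a single mean and variance shared across all of its time steps and features.

```python
import numpy as np

x = np.random.randn(4, 10, 512).astype(np.float32)  # [batch, time, features]

# All axes except the batch axis, as suggested above.
axes = tuple(range(1, x.ndim))               # (1, 2)
mean = x.mean(axis=axes, keepdims=True)      # shape (4, 1, 1)
var = x.var(axis=axes, keepdims=True)        # shape (4, 1, 1)
print(mean.shape, var.shape)
```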

@RayXu14

RayXu14 commented Mar 28, 2019

> The Transformer uses Layer Normalization rather than Batch Normalization, and Layer Normalization does not need to consider batch information. See the Layer Normalization paper, end of page 2.
>
> However, I suspect the implementation of layer_norm is still wrong; the paper does not describe the model clearly. I suggest referring to the implementation in layers.py and changing the code to axis = list(range(1, len(inputs_shape))).

Agreed.
In this repository, the author normalizes using only the last dimension, without involving the sequence-length direction. But in the layer normalization case, generally speaking, the statistics only need to be independent of the batch.
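
One way to check the "batch independent" point (a sketch under assumed shapes, not code from this repository): perturb a single example and see which outputs change under each axis choice.

```python
import numpy as np

def normalize(x, axes, epsilon=1e-8):
    mean = x.mean(axis=axes, keepdims=True)
    var = x.var(axis=axes, keepdims=True)
    return (x - mean) / np.sqrt(var + epsilon)

x = np.random.randn(4, 10, 512).astype(np.float32)
x2 = x.copy()
x2[0] = np.random.randn(10, 512).astype(np.float32)  # change only example 0

for axes in [(-1,), (1, 2), (0, 1)]:
    changed = np.any(normalize(x, axes) != normalize(x2, axes), axis=(1, 2))
    # (-1,) and (1, 2) are batch-independent: only example 0's output changes.
    # (0, 1) pools over the batch: every example's output changes.
    print(axes, changed)
```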
