Wrong Batch Normalization #40

Open
bryant03 opened this issue Jul 11, 2018 · 3 comments

@bryant03

In the function normalize():
```python
    with tf.variable_scope(scope, reuse=reuse):
        inputs_shape = inputs.get_shape()
        params_shape = inputs_shape[-1:]
        mean, variance = tf.nn.moments(inputs, [-1], keep_dims=True)
        print('mean.get_shape()', mean.get_shape())
        beta = tf.Variable(tf.zeros(params_shape))
        gamma = tf.Variable(tf.ones(params_shape))
        normalized = (inputs - mean) / ((variance + epsilon) ** 0.5)
        outputs = gamma * normalized + beta
```
However, I think the second argument of tf.nn.moments() should not be [-1], since batch normalization needs to take the batch dimension into account.
After modification, the code looks like this:

```python
    with tf.variable_scope(scope, reuse=reuse):
        inputs_shape = inputs.get_shape()
        params_shape = inputs_shape[-1:]
        axis = list(range(len(inputs_shape) - 1))
        mean, variance = tf.nn.moments(inputs, axis, keep_dims=True)
        print('mean.get_shape()', mean.get_shape())
        beta = tf.Variable(tf.zeros(params_shape))
        gamma = tf.Variable(tf.ones(params_shape))
        normalized = (inputs - mean) / ((variance + epsilon) ** 0.5)
        outputs = gamma * normalized + beta
```
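
For reference, a minimal NumPy sketch (the [N, T, d_model] shape and the variable names are assumptions, not the repository's code) showing what statistics each axis choice actually computes:

```python
import numpy as np

# Hypothetical activations with shape [batch N, time T, features d_model].
x = np.random.randn(4, 10, 512).astype(np.float32)

# Original code: moments over the last axis only -> one mean/variance per
# (example, time step); no batch information is used.
mean_last = x.mean(axis=-1, keepdims=True)        # shape (4, 10, 1)

# Proposed change: axis = list(range(len(inputs_shape) - 1)) = [0, 1]
# -> moments pooled over batch and time, i.e. batch-norm-style statistics.
mean_pooled = x.mean(axis=(0, 1), keepdims=True)  # shape (1, 1, 512)

print(mean_last.shape, mean_pooled.shape)
```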

@RayXu14

RayXu14 commented Sep 19, 2018

The Transformer uses Layer Normalization rather than Batch Normalization, and Layer Normalization does not need to consider batch information. See the Layer Normalization paper, end of page 2.
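
For illustration, a minimal NumPy sketch of layer normalization over the last axis (the epsilon value and shapes are assumptions, not the repository's TensorFlow code): each feature vector is normalized by its own mean and variance, so no other example in the batch is involved.

```python
import numpy as np

def layer_norm(x, gamma, beta, epsilon=1e-8):
    # Statistics per feature vector x[n, t, :]; nothing is shared across the batch.
    mean = x.mean(axis=-1, keepdims=True)
    var = x.var(axis=-1, keepdims=True)
    return gamma * (x - mean) / np.sqrt(var + epsilon) + beta

x = np.random.randn(4, 10, 512).astype(np.float32)
y = layer_norm(x, gamma=np.ones(512, dtype=np.float32), beta=np.zeros(512, dtype=np.float32))
print(y.shape)  # (4, 10, 512)
```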

@RoyJoyRo

RoyJoyRo commented Mar 25, 2019

> The Transformer uses Layer Normalization rather than Batch Normalization, and Layer Normalization does not need to consider batch information. See the Layer Normalization paper, end of page 2.

However, I suspect the implementation of layer_norm is still wrong; the paper does not describe the model clearly. I suggest referring to the implementation in layers.py and changing the code to axis = list(range(1, len(inputs_shape))).
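
A sketch of that variant in NumPy (shapes assumed): axis = list(range(1, len(inputs_shape))) pools the statistics over every non-batch axis, so each example gets a single mean and variance shared across all of its time steps and features.

```python
import numpy as np

x = np.random.randn(4, 10, 512).astype(np.float32)  # [batch, time, features]

# All axes except the batch axis, as suggested above.
axes = tuple(range(1, x.ndim))               # (1, 2)
mean = x.mean(axis=axes, keepdims=True)      # shape (4, 1, 1)
var = x.var(axis=axes, keepdims=True)        # shape (4, 1, 1)
print(mean.shape, var.shape)
```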

@RayXu14

RayXu14 commented Mar 28, 2019

> The Transformer uses Layer Normalization rather than Batch Normalization, and Layer Normalization does not need to consider batch information. See the Layer Normalization paper, end of page 2.
>
> However, I suspect the implementation of layer_norm is still wrong; the paper does not describe the model clearly. I suggest referring to the implementation in layers.py and changing the code to axis = list(range(1, len(inputs_shape))).

Agreed.
In this repository, the author normalizes using only the last dimension, without involving the sequence-length direction. But in the layer normalization case, generally speaking, the statistics only need to be independent of the batch.
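
One way to check the "batch independent" point (a sketch under assumed shapes, not code from this repository): perturb a single example and see which outputs change under each axis choice.

```python
import numpy as np

def normalize(x, axes, epsilon=1e-8):
    mean = x.mean(axis=axes, keepdims=True)
    var = x.var(axis=axes, keepdims=True)
    return (x - mean) / np.sqrt(var + epsilon)

x = np.random.randn(4, 10, 512).astype(np.float32)
x2 = x.copy()
x2[0] = np.random.randn(10, 512).astype(np.float32)  # change only example 0

for axes in [(-1,), (1, 2), (0, 1)]:
    changed = np.any(normalize(x, axes) != normalize(x2, axes), axis=(1, 2))
    # (-1,) and (1, 2) are batch-independent: only example 0's output changes.
    # (0, 1) pools over the batch: every example's output changes.
    print(axes, changed)
```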
