Losses all take a `reduction` argument, which can be either `sum` or `sum_over_batch_size`. `sum` is fairly straightforward, and there are no surprises with the implementation: a summation, or a weighted summation if `sample_weight` or `mask` is present. `sum_over_batch_size` is ambiguously named and inconsistent with the implementation.
Ambiguity: I originally thought it meant "sum over the batch dimension". Looking at previous implementations, it seems to mean "sum divided by the batch size". The Keras 3.0 implementation looks like it just computes a weighted mean.
If it's meant to be the mean, why not call it "mean"? If it's not meant to be the mean, then consider this a bug report, because computing the mean is exactly what the current implementation does.
Note that I'm not just being pedantic - I want to submit a PR that fixes masking (multiplication by zero is not masking when infs and nans are around), but I need to know exactly what the implementation is supposed to be in order to fix this.
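To make the ambiguity concrete, here is a small NumPy sketch (the values and variable names are mine, not taken from the Keras source) showing how the readings of `sum_over_batch_size` diverge as soon as `sample_weight` is non-uniform:

```python
import numpy as np

# Illustrative per-sample loss values and sample weights for a batch of 4.
losses = np.array([1.0, 2.0, 3.0, 4.0])
sample_weight = np.array([1.0, 1.0, 0.0, 0.0])  # last two samples weighted out

weighted = losses * sample_weight

# "sum": a plain weighted summation.
reduction_sum = weighted.sum()                           # 3.0

# Reading 1 of "sum_over_batch_size": weighted sum / batch size.
sum_over_batch_size = weighted.sum() / losses.shape[0]   # 3.0 / 4 = 0.75

# Reading 2: a weighted mean (divide by the sum of the weights instead).
weighted_mean = weighted.sum() / sample_weight.sum()     # 3.0 / 2 = 1.5

print(reduction_sum, sum_over_batch_size, weighted_mean)
```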
`sum_over_batch_size` and `mean` are the same thing. It should more naturally be called `mean`, but we kept the Keras 2 terminology for backwards compatibility.
For anyone coming back here in the future: I actually prefer `sum_over_batch_size` to `mean` now, because the weighted interpretations are different. While a `sum` reduction with `sample_weight` is interpreted as a weighted sum, a `sum_over_batch_size` is interpreted as the weighted sum divided by the number of unmasked entries (the "batch size"), not the weighted mean.
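A quick numerical sketch of that distinction (illustrative values only, not taken from the Keras code):

```python
import numpy as np

losses = np.array([1.0, 2.0, 3.0, 4.0])
sample_weight = np.array([2.0, 2.0, 2.0, 2.0])
mask = np.array([1.0, 1.0, 1.0, 0.0])  # last entry masked out

weighted = losses * sample_weight * mask          # [2, 4, 6, 0], sum = 12

# Weighted sum divided by the number of unmasked entries (the "batch size").
sum_over_batch_size = weighted.sum() / mask.sum()                 # 12 / 3 = 4.0

# Weighted mean: divide by the total effective weight instead.
weighted_mean = weighted.sum() / (sample_weight * mask).sum()     # 12 / 6 = 2.0

print(sum_over_batch_size, weighted_mean)
```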