Mixed-precision stateful LSTM/GRU training not working #20605

larschristensen · 2024-12-06T12:31:08Z

Enabling mixed-precision mode when training a stateful LSTM or GRU with Keras v3.7.0 fails with an error like the following:

Traceback (most recent call last):
  File "/home/lars.christensen/git/keras-io/examples/timeseries/timeseries_weather_forecasting.py", line 275, in <module>
    lstm_out = keras.layers.LSTM(32, stateful=True)(inputs)
  File "/home/lars.christensen/.local/lib/python3.10/site-packages/keras/src/utils/traceback_utils.py", line 122, in error_handler
    raise e.with_traceback(filtered_tb) from None
  File "/home/lars.christensen/.local/lib/python3.10/site-packages/optree/ops.py", line 747, in tree_map
    return treespec.unflatten(map(func, *flat_args))
ValueError: initial_value: Tensor conversion requested dtype float32 for Tensor with dtype float16: <tf.Tensor: shape=(256, 32), dtype=float16, numpy=
array([[0., 0., 0., ..., 0., 0., 0.],
       [0., 0., 0., ..., 0., 0., 0.],
       [0., 0., 0., ..., 0., 0., 0.],
       ...,
       [0., 0., 0., ..., 0., 0., 0.],
       [0., 0., 0., ..., 0., 0., 0.],
       [0., 0., 0., ..., 0., 0., 0.]], dtype=float16)>

The issue can be reproduced, for example, by modifying the weather forecasting example at "https://github.com/keras-team/keras-io/blob/master/examples/timeseries/timeseries_weather_forecasting.py" to use a stateful LSTM, as in the attached source code.
timeseries_weather_forecasting.zip

When mixed-precision mode is disabled, it works as expected. Hence, this is a problem only in mixed-precision mode.
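
For readers without the attachment, the failing setup can probably be reduced to something like the sketch below. It is not the attached script: the helper name build_stateful_lstm and the timesteps/features values are placeholders made up for illustration, while the batch size of 256 and the 32 units match the (256, 32) state tensor in the traceback. The sketch builds the same Input/LSTM/Dense stack once under float32 and once under mixed_float16 and reports whether construction succeeds, so the outcome depends on the installed Keras version and backend.

```python
import keras


def build_stateful_lstm(policy, batch_size=256, timesteps=120, features=7):
    """Build a small stateful LSTM model under the given dtype policy.

    Hypothetical helper: timesteps and features are placeholder values;
    batch_size and the 32 units follow the shapes in the traceback.
    """
    keras.mixed_precision.set_global_policy(policy)
    try:
        # Stateful RNN layers need a fixed batch size on the input.
        inputs = keras.layers.Input(
            shape=(timesteps, features), batch_size=batch_size
        )
        lstm_out = keras.layers.LSTM(32, stateful=True)(inputs)
        outputs = keras.layers.Dense(1)(lstm_out)
        return keras.Model(inputs=inputs, outputs=outputs)
    finally:
        # Reset to plain float32 (assumed to be the policy in use before)
        # so later code is not affected by the experiment.
        keras.mixed_precision.set_global_policy("float32")


if __name__ == "__main__":
    print("Keras version:", keras.__version__)
    for policy in ("float32", "mixed_float16"):
        try:
            build_stateful_lstm(policy)
            print(policy, "-> model built")
        except ValueError as err:
            # The traceback above reports a ValueError about a
            # float32/float16 mismatch when the LSTM layer is called.
            print(policy, "-> failed:", str(err).splitlines()[0])
```

Running the script on both a release build and keras_nightly gives a quick way to compare their behaviour without the full weather forecasting pipeline.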

mehtamansi29 · 2024-12-10T05:03:56Z

Hi @larschristensen -

Thanks for reporting the issue. Per your code, the error occurs because you are using the batch_size=inputs.shape[0] argument in the Input layer together with the stateful=True argument in the LSTM layer.

inputs = keras.layers.Input(shape=(inputs.shape[1], inputs.shape[2]), batch_size=inputs.shape[0])
lstm_out = keras.layers.LSTM(32, stateful=True)(inputs)
outputs = keras.layers.Dense(1)(lstm_out)

The attached gist works fine with the timeseries weather forecasting example.
For more details, you can also follow this code example.

larschristensen · 2024-12-10T07:52:30Z

@mehtamansi29 I have modified the weather forecasting prediction example, so that it works with stateful=True by using a fixed batch size in order to use it as an example of the underlying problem. The gist you made doesn't work when setting stateful=True.
Modifying the example code to work with stateful=True as I did in my example code works fine except for mixed-precision training, which is the point I'm raising here. If you just run my example code as-is, it should directly show the problem.

Surya2k1 · 2024-12-10T09:44:26Z

@larschristensen, I faced the same issue, but not with keras_nightly. Please test with keras_nightly (3.7.0.dev2024121003). It seems to be resolved there.

larschristensen · 2024-12-10T14:10:20Z

@Surya2k1 Thanks, it indeed works as expected on keras_nightly. Closing.

github-actions bot assigned sachinprasadhs Dec 6, 2024

mehtamansi29 assigned mehtamansi29 and unassigned sachinprasadhs Dec 10, 2024

mehtamansi29 added type:Bug stat:awaiting response from contributor labels Dec 10, 2024

google-ml-butler bot removed the stat:awaiting response from contributor label Dec 10, 2024

larschristensen closed this as completed Dec 10, 2024
