Mixed-precision stateful LSTM/GRU training not working #20605

Closed
larschristensen opened this issue Dec 6, 2024 · 5 comments

@larschristensen

larschristensen commented Dec 6, 2024

Enabling mixed-precision mode when training a stateful LSTM or GRU using Keras v3.7.0 fails with error messages like this:

Traceback (most recent call last):
  File "/home/lars.christensen/git/keras-io/examples/timeseries/timeseries_weather_forecasting.py", line 275, in <module>
    lstm_out = keras.layers.LSTM(32, stateful=True)(inputs)
  File "/home/lars.christensen/.local/lib/python3.10/site-packages/keras/src/utils/traceback_utils.py", line 122, in error_handler
    raise e.with_traceback(filtered_tb) from None
  File "/home/lars.christensen/.local/lib/python3.10/site-packages/optree/ops.py", line 747, in tree_map
    return treespec.unflatten(map(func, *flat_args))
ValueError: initial_value: Tensor conversion requested dtype float32 for Tensor with dtype float16: <tf.Tensor: shape=(256, 32), dtype=float16, numpy=
array([[0., 0., 0., ..., 0., 0., 0.],
       [0., 0., 0., ..., 0., 0., 0.],
       [0., 0., 0., ..., 0., 0., 0.],
       ...,
       [0., 0., 0., ..., 0., 0., 0.],
       [0., 0., 0., ..., 0., 0., 0.],
       [0., 0., 0., ..., 0., 0., 0.]], dtype=float16)>

The issue can be reproduced, for example, by modifying the weather forecasting example at "https://github.com/keras-team/keras-io/blob/master/examples/timeseries/timeseries_weather_forecasting.py" to use a stateful LSTM, as in the attached source code.
timeseries_weather_forecasting.zip

When mixed-precision mode is disabled, it works as expected. Hence, this is a problem only in mixed-precision mode.
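
A condensed reproduction, separate from the attached script, might look like the sketch below. It is not taken from the attachment: the helper names `build_stateful_lstm` and `try_policy` are hypothetical, the batch size of 256 and the 32 units are read off the traceback, and the 120 timesteps and 7 features are assumed values meant to resemble the weather example.

```python
def build_stateful_lstm(policy_name, batch_size=256, timesteps=120, features=7):
    """Build a small stateful LSTM model under the given dtype policy.

    The default shapes are assumptions chosen for illustration.
    """
    # Imported here so the helper can be defined without Keras installed.
    import keras

    previous = keras.mixed_precision.global_policy()
    keras.mixed_precision.set_global_policy(policy_name)
    try:
        # A stateful RNN needs a fixed batch size on the Input layer.
        inputs = keras.layers.Input(
            shape=(timesteps, features), batch_size=batch_size
        )
        lstm_out = keras.layers.LSTM(32, stateful=True)(inputs)
        outputs = keras.layers.Dense(1)(lstm_out)
        return keras.Model(inputs=inputs, outputs=outputs)
    finally:
        # Restore the previous policy so later code is not affected.
        keras.mixed_precision.set_global_policy(previous)


def try_policy(policy_name):
    """Return a one-line description of what happened for this policy."""
    try:
        build_stateful_lstm(policy_name)
    except ValueError as err:
        # Keep only the first line; the full message dumps the whole tensor.
        return f"{policy_name}: ValueError: {str(err).splitlines()[0]}"
    return f"{policy_name}: model built"
```

Calling `try_policy("float32")` and then `try_policy("mixed_float16")` should show the difference: on Keras v3.7.0 the second call is expected to report the `ValueError` above, while the first builds the model.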

@mehtamansi29
Collaborator

Hi @larschristensen -

Thanks for reporting the issue. Based on your code, you are getting the error because you are using the batch_size=inputs.shape[0] argument in the Input layer together with the stateful=True argument in the LSTM layer.

inputs = keras.layers.Input(shape=(inputs.shape[1], inputs.shape[2]), batch_size=inputs.shape[0])
lstm_out = keras.layers.LSTM(32, stateful=True)(inputs)
outputs = keras.layers.Dense(1)(lstm_out)

The attached gist works fine with the timeseries weather forecasting example.
For more details, you can also follow this code example.

@larschristensen
Author

larschristensen commented Dec 10, 2024

@mehtamansi29 I modified the weather forecasting example to use a fixed batch size so that it works with stateful=True, and I am using it as an example of the underlying problem. The gist you made doesn't work when stateful=True is set.
The example code modified in this way works fine with stateful=True except in mixed-precision training, which is the point I'm raising here. If you run my example code as-is, it should show the problem directly.

@Surya2k1
Contributor

@larschristensen, I faced the same issue, but not with keras_nightly. Please test with keras_nightly (3.7.0.dev2024121003); it seems to be resolved there.

@larschristensen
Author

@Surya2k1 Thanks, it indeed works as expected on keras_nightly. Closing.
