RNNCell, LSTMCell and GRUCell are implemented as mutable structs, but never do mutation #1089
Comments
You might also want to take a look at the h member of said structs. It seems to only be used to initialize the hidden and init state of Recur when creating it. If this is the case it seems a bit wasteful and confusing to keep them as members of the cells.
The hidden state dimensions change depending on the size of the batch
@bhvieira yes, but that change is handled as part of
Hmm, I see. The cells themselves are never updated, only their values just like any other layer. I'll make some tests. I remember asking something similar in the Slack channel though.
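To illustrate the batch-size point raised above, here is a minimal sketch in plain Julia (no Flux involved; `Wi`, `Wh`, `b`, `h0` and `x` are made-up names and values, and the update rule is a generic RNN step assumed for illustration). A stored state that is a plain vector broadcasts against a batched input, so the state returned by the step takes on the batch dimension without the stored vector ever being modified:

```julia
# Assumed toy parameters: 3 input features, 2 hidden units.
Wi = ones(2, 3)        # input-to-hidden weights
Wh = ones(2, 2)        # hidden-to-hidden weights
b  = zeros(2)          # bias
h0 = zeros(2)          # stored initial state: a vector, no batch dimension

x = ones(3, 4)         # a batch of 4 inputs, one per column

# Wi * x is 2×4 and Wh * h0 is a length-2 vector; broadcasting expands the
# vector across the 4 columns, so the new state is 2×4.
h1 = tanh.(Wi * x .+ Wh * h0 .+ b)

println(size(h0), " -> ", size(h1))
```

The new state is a freshly allocated array; `h0` itself is left untouched.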
Alright, it works and I have no clue why it's mutable, really. It could be immutable. See the spoilers below:
They were turned mutable only in #161
You can optimize the initial state though. But the way they are set right now doesn't allow that by default. And then you get a repetition: the initial state appears in the cell and in Recur.init.
Narrowing it down further to the commit: 9a6fcf0#diff-d486393fe3ae37696de565e0fbd70386. Looks like it had to do with connecting the CUDA and Flux APIs for RNNs. I'll try and test later if making them non-mutable breaks this interface. Nothing is jumping out at me here, but I've only glanced at it.
All you need to do to train the initial state right now is to call
Yes, having both
check that CUDNN drop solves for too many wrappers - FluxML#1259
This is fixed on master.
Fixed in #1367
It seems like `RNNCell`, `LSTMCell` and `GRUCell`, which are defined as `mutable struct`s in `src/layers/recurrent.jl`, never actually use any mutation in the forward pass. Instead, it looks like state mutation is handled in the `Recur` struct, which copies over the hidden state of its `cell` during construction and then never reads or writes the cell's hidden state onwards.

Given this, what is the reason these recurrent cells are defined as mutable structs?
Should we change them to be immutable to reap the performance benefits that come with this?
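The split described above can be sketched in plain Julia as follows. This is only an illustration under stated assumptions: `ToyRNNCell` and `ToyRecur` are hypothetical names, not Flux's actual definitions, and the update rule is a generic RNN step. The cell is an immutable struct whose call is a pure function from `(h, x)` to `(h′, y)`, while the wrapper is the only mutable object and owns the running state.

```julia
# Hypothetical toy types for illustration; not Flux's real RNNCell / Recur.

# Immutable cell: holds parameters only and is never written to.
struct ToyRNNCell{F,A,V}
    σ::F     # activation function
    Wi::A    # input-to-hidden weights
    Wh::A    # hidden-to-hidden weights
    b::V     # bias
end

# Pure forward step: returns the new state and the output.
function (m::ToyRNNCell)(h, x)
    h′ = m.σ.(m.Wi * x .+ m.Wh * h .+ m.b)
    return h′, h′
end

# Mutable wrapper: the only place where state is reassigned.
# `state` is left untyped so its shape may change (e.g. with batched input).
mutable struct ToyRecur{C}
    cell::C
    state
end

function (r::ToyRecur)(x)
    r.state, y = r.cell(r.state, x)   # rebind the wrapper's field; the cell is untouched
    return y
end

cell = ToyRNNCell(tanh, ones(2, 3), ones(2, 2), zeros(2))
rnn  = ToyRecur(cell, zeros(2))

y1 = rnn(ones(3))   # first step starts from the zero state
y2 = rnn(ones(3))   # second step sees the state left behind by the first

println(ismutable(cell), " ", ismutable(rnn))
```

In this sketch `ismutable(cell)` is `false` and `ismutable(rnn)` is `true`, yet the recurrence still works because every state update goes through the wrapper's field. The parameter arrays inside the immutable cell remain ordinary mutable arrays, so in-place parameter updates are unaffected by the struct itself being immutable.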