4 questions regarding the structure of LSTM_autoencoder #1
Comments
Hello Ranjan,
An ideal threshold is one where precision and recall are both as high as possible together, that is, the point where they intersect. If it is hard to identify that point from the plot, I would look directly at the arrays of precision and recall values.
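For concreteness, here is a minimal sketch (not the repo's code) of picking that crossover threshold with scikit-learn's `precision_recall_curve`; the `y_true` labels and `recon_error` reconstruction errors below are placeholder assumptions:

```python
import numpy as np
from sklearn.metrics import precision_recall_curve

# Placeholder data standing in for the real arrays:
# y_true: 0/1 labels, recon_error: per-sample reconstruction MSE
rng = np.random.default_rng(0)
y_true = rng.integers(0, 2, size=1000)
recon_error = rng.random(1000) + 0.5 * y_true

precision, recall, thresholds = precision_recall_curve(y_true, recon_error)

# precision/recall have one extra trailing point; align them with thresholds
idx = np.argmin(np.abs(precision[:-1] - recall[:-1]))
print("threshold:", thresholds[idx],
      "precision:", precision[idx],
      "recall:", recall[idx])
```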
Can you answer my other questions as well, please?
@hellojinwoo Yes, I am at the moment drafting a post that will answer your questions (at least some of them). Your questions are really good and require a detailed explanation. I also identified a few issues in my LSTM network, which I will correct and mention.
@hellojinwoo Please look at this post: https://towardsdatascience.com/step-by-step-understanding-lstm-autoencoder-layers-ffab055b6352
Hi @cran2367, this is such a good post. I just wanted to know if you will be updating the LSTM structure soon?
Thank you, @sudhanvasbhat1993. I will be making the next post explaining how to optimize a Dense Autoencoder. Thereafter, I will be making a post on LSTM autoencoder tuning. But the next LSTM post may take a few weeks.
Hi, LSTM Autoencoder for Extreme Rare Event Classification in Keras was a great article. I applied the same approach on a vehicle predictive maintenance data set. Can you please guide me on where I am going wrong and what can be done to resolve this issue? Thanks a lot in advance.
Hello, Mr. Ranjan. Thanks for your great article LSTM Autoencoder for Extreme Rare Event Classification in Keras and the code on GitHub. While reading your code, however, I came up with four questions.
I decided to ask the questions here rather than on Medium because I can upload pictures and quote code more accurately here. I hope you are okay with this.
Q1. Why ‘return_sequences=True’ for all the LSTM layers?
Back up explanations
< Figure 1. seq2seq model : Encoding - Decoding model >
In the encoding stage, what the model needs to do is make a fixed-length vector (a latent vector) that contains all the information and time-wise relationships of the input sequence. In the decoding stage, the model's goal is to create an output that is as close as possible to the original input.
So my guess is that in the encoding stage we do not need the outputs shown in Figure 1, as the autoencoder's only goal is to produce a good hidden latent vector. The smaller the MSE between the input data and the output reconstructed from the latent vector in the decoding stage, the better the latent vector is.
Doesn't this mean that we can set 'return_sequences=False' in the encoding stage, so that the encoder does not return the output at every time step?
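To make the question concrete, here is a rough sketch of the variant being suggested, where the last encoder LSTM uses `return_sequences=False` and `RepeatVector` copies the resulting latent vector across the time axis. The layer sizes and input shape are illustrative assumptions, not necessarily the article's exact model:

```python
from tensorflow.keras.models import Sequential
from tensorflow.keras.layers import LSTM, RepeatVector, TimeDistributed, Dense

timesteps, n_features, latent_dim = 5, 59, 16  # assumed shapes

model = Sequential([
    # encoder
    LSTM(32, activation='relu', return_sequences=True,
         input_shape=(timesteps, n_features)),
    LSTM(latent_dim, activation='relu', return_sequences=False),  # -> (batch, 16)
    # bridge: repeat the latent vector once per output time step
    RepeatVector(timesteps),                                       # -> (batch, 5, 16)
    # decoder
    LSTM(latent_dim, activation='relu', return_sequences=True),
    LSTM(32, activation='relu', return_sequences=True),
    TimeDistributed(Dense(n_features)),                            # -> (batch, 5, 59)
])
model.compile(optimizer='adam', loss='mse')
model.summary()
```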
Q2. What would be the first hidden state (h0, c0) for the decoding stage?
Back up explanations
In the code, the latent vector is repeated `timesteps` times, as in `lstm_autoencoder.add(RepeatVector(timesteps))`. This means the latent vector is fed to the decoder as its input at every step of the decoding stage. If latent vectors are used as the decoder's inputs, what is used for the initial hidden state (h0, c0)? In the seq2seq model (Figure 1) mentioned above, the latent vector is used as the initial hidden state (h0, c0) of the decoding stage, and the decoder's input is the sentence that needs to be translated, for example from English to French.
So I am curious to know what is used as the initial hidden and cell states (h0, c0) in your code!
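For reference, in Keras an LSTM layer's initial states default to zeros unless `initial_state` is passed explicitly, which a plain `Sequential` stack does not do. A seq2seq-style variant that hands the encoder's final (h, c) to the decoder would look roughly like the sketch below (functional API; shapes and sizes are illustrative assumptions):

```python
from tensorflow.keras.models import Model
from tensorflow.keras.layers import Input, LSTM, RepeatVector, TimeDistributed, Dense

timesteps, n_features, latent_dim = 5, 59, 16  # assumed shapes

inputs = Input(shape=(timesteps, n_features))
# return_state=True also returns the final hidden state h and cell state c
enc_out, state_h, state_c = LSTM(latent_dim, return_state=True)(inputs)

dec_in = RepeatVector(timesteps)(enc_out)          # latent vector as decoder input
dec_out = LSTM(latent_dim, return_sequences=True)(
    dec_in, initial_state=[state_h, state_c])      # encoder state as (h0, c0)
outputs = TimeDistributed(Dense(n_features))(dec_out)

model = Model(inputs, outputs)
model.compile(optimizer='adam', loss='mse')
```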
Q3. Why does the output unit size increase from 5 to 16 in the encoding stage?
Back up explanations
From `lstm_autoencoder.summary()` we can see that the output unit size increases from 5 (in the layer 'lstm_16') to 16 (in the layer 'lstm_17').
< Figure 2. Summary of the LSTM-Autoencoder model >
Since the output of the previous LSTM layer is the input to the next LSTM layer, I think the output size is equivalent to the hidden state size.
If the hidden layer's size is greater than the number of inputs, the model can learn just an 'identity function', which is not desirable. (Source: [What is the intuition behind the sparsity parameter in sparse autoencoders?](https://stats.stackexchange.com/questions/149478/what-is-the-intuition-behind-the-sparsity-parameter-in-sparse-autoencoders))
The layer 'lstm_16' is only 5 units wide while the next layer 'lstm_17' is 16 units wide. So I think 'lstm_17' could just copy 'lstm_16' (acting like an 'identity matrix'), which makes the layer 'lstm_17' undesirable.
I am curious to know why the output size (hidden_layer size) increases rather than decreases!
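As a small illustration of the shape question, each LSTM layer's per-timestep output width equals its unit count; the unit counts below mirror the question, not necessarily the repo's model:

```python
from tensorflow.keras.models import Sequential
from tensorflow.keras.layers import LSTM

timesteps, n_features = 5, 59  # assumed input shape

m = Sequential([
    LSTM(16, return_sequences=True, input_shape=(timesteps, n_features)),
    LSTM(5, return_sequences=True),   # 5-unit layer  -> (None, 5, 5)
    LSTM(16, return_sequences=True),  # 16-unit layer -> (None, 5, 16)
])
m.summary()  # shows the per-timestep width going 16 -> 5 -> 16
```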
Q4. By how much is the input data size reduced in the latent vector?
Back up explanations
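A back-of-the-envelope sketch with assumed numbers (the real answer depends on the model's input window and bottleneck width): in this kind of architecture the latent vector's length equals the last encoder layer's unit count, so the reduction can be estimated as follows:

```python
# Illustrative numbers only; substitute the actual model's sizes.
timesteps, n_features = 5, 59   # assumed input window
latent_dim = 16                 # assumed bottleneck (last encoder layer) size

input_size = timesteps * n_features            # values fed into the encoder
ratio = input_size / latent_dim
print(f"{input_size} values -> {latent_dim} values (~{ratio:.1f}x reduction)")
```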
Thanks for this nice post again.