
Pretraining #5

Open · erlebach opened this issue Sep 29, 2017 · 13 comments

@erlebach

Hi,
My team and I are trying to reproduce the results of your paper, but cannot. Would it be possible to get access to the code that pretrains the model? That would help us a lot. Thank you.

@michelleowen

Hi, I am also interested in your pre-training code. I did pre-training based on the description in your paper. However, with that pre-training, the gamma output always assigns the same class to all data points.

@michelleowen

Also, why do you assign the weights of the previous layer in the pretrained AE to the layers in VaDE, as below:
vade.layers[1].set_weights(ae.layers[0].get_weights())
vade.layers[2].set_weights(ae.layers[1].get_weights())
vade.layers[3].set_weights(ae.layers[2].get_weights())
vade.layers[4].set_weights(ae.layers[3].get_weights())
why not
vade.layers[1].set_weights(ae.layers[1].get_weights())
vade.layers[2].set_weights(ae.layers[2].get_weights())
vade.layers[3].set_weights(ae.layers[3].get_weights())
vade.layers[4].set_weights(ae.layers[4].get_weights())
given that the pretrained AE has the same network architecture as VaDE?
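A minimal sketch of one possible reason for the one-index offset, assuming the VaDE model is built with the Keras functional API (where layers[0] is the InputLayer) while the pretrained ae is a Sequential model whose implicit input layer does not show up in layers; the sizes below are placeholders:

from keras.layers import Input, Dense
from keras.models import Model, Sequential

original_dim, hidden_dim = 784, 500   # placeholder sizes for illustration

# Functional model: layers[0] is the InputLayer, layers[1] is the first Dense.
x = Input(shape=(original_dim,))
h = Dense(hidden_dim, activation='relu')(x)
functional_model = Model(x, h)
print([type(l).__name__ for l in functional_model.layers])   # ['InputLayer', 'Dense']

# Sequential model: the implicit input layer is not listed, so layers[0]
# is already the first Dense.
sequential_ae = Sequential([Dense(hidden_dim, input_dim=original_dim, activation='relu')])
print([type(l).__name__ for l in sequential_ae.layers])       # ['Dense']

# Hence sequential_ae.layers[i] lines up with functional_model.layers[i + 1].
functional_model.layers[1].set_weights(sequential_ae.layers[0].get_weights())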

@eelxpeng

I am also having trouble replicating the results. Using the provided pretrained weights works fine, except for the HAR dataset. But using the pretraining code from DEC-keras, which achieves good results for AE+k-means and DEC, does not make the VaDE model work. Also, the code for the HAR dataset specifies the random state for the GMM, which shouldn't be done. Removing the random-state specification and repeating many times, the performance is significantly lower than the reported result. Is the author using the pretraining code from the original DEC code? If not, could you provide it?
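For reference, a rough sketch of checking the GMM initialization over several seeds instead of one fixed random_state; encoder, X, Y, n_centroid, and cluster_acc here are stand-ins for whatever the repository actually uses, not its exact names:

import numpy as np
from sklearn.mixture import GaussianMixture

z = encoder.predict(X)   # latent features from the pretrained encoder (placeholder names)

accs = []
for seed in range(10):   # repeat with different seeds rather than fixing one random_state
    gmm = GaussianMixture(n_components=n_centroid, covariance_type='diag', random_state=seed)
    y_pred = gmm.fit_predict(z)
    accs.append(cluster_acc(y_pred, Y))   # cluster_acc: hypothetical helper returning clustering accuracy
print('mean acc %.4f, std %.4f' % (np.mean(accs), np.std(accs)))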

@ttgump

ttgump commented Feb 1, 2018

@michelleowen
I think they are using a Sequential model, based on their json file. So the architecture of the AE is something like:
from keras.models import Sequential
from keras.layers import Dense

ae = Sequential()
ae.add(Dense(intermediate_dim[0], input_dim=original_dim, activation='relu'))
ae.add(Dense(intermediate_dim[1], activation='relu'))
ae.add(Dense(intermediate_dim[2], activation='relu'))
ae.add(Dense(latent_dim))
ae.add(Dense(intermediate_dim[2], activation='relu'))
ae.add(Dense(intermediate_dim[1], activation='relu'))
ae.add(Dense(intermediate_dim[0], activation='relu'))
ae.add(Dense(original_dim))
But even when I pretrain this autoencoder first, I get the same problem: the gamma output always assigns the same class to all data points. So I guess the authors used another technique to pretrain the "ae".
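For completeness, the plain reconstruction pretraining described above would look roughly like this (optimizer, epochs, and batch size are arbitrary choices here, not the authors'):

ae.compile(optimizer='adam', loss='mse')
ae.fit(X_train, X_train, epochs=10, batch_size=100, shuffle=True)

# copy the four encoder layers into VaDE (note the one-index offset discussed above)
for i in range(4):
    vade.layers[i + 1].set_weights(ae.layers[i].get_weights())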

@wangmn93

wangmn93 commented Apr 6, 2018

They use a VAE or AAE to pretrain the model. You need to constrain the latent space with a KL divergence term in the loss (or use the discriminator in AAE).
I have tried a VAE for pretraining. The accuracy after 200 epochs is 86% on MNIST. The range of my latent space is about -5 to 5, while the range of the latent space for the provided pretrained weights is about -3 to 3. If you can further shrink the range of the latent space, I think the result will match theirs.
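A minimal sketch of what such a VAE pretraining could look like, reusing the intermediate_dim / latent_dim names from the snippet above with example MNIST-like sizes; the kl_weight knob for shrinking the latent range is a guess, not something taken from this repository:

import keras.backend as K
from keras.layers import Input, Dense, Lambda
from keras.models import Model

original_dim = 784
intermediate_dim = [500, 500, 2000]   # example sizes
latent_dim = 10
kl_weight = 0.1   # hypothetical knob: larger values pull the latent codes closer to N(0, I)

x = Input(shape=(original_dim,))
h = Dense(intermediate_dim[0], activation='relu')(x)
h = Dense(intermediate_dim[1], activation='relu')(h)
h = Dense(intermediate_dim[2], activation='relu')(h)
z_mean = Dense(latent_dim)(h)
z_log_var = Dense(latent_dim)(h)

def sampling(args):
    mu, log_var = args
    eps = K.random_normal(shape=K.shape(mu))
    return mu + K.exp(0.5 * log_var) * eps

z = Lambda(sampling)([z_mean, z_log_var])
d = Dense(intermediate_dim[2], activation='relu')(z)
d = Dense(intermediate_dim[1], activation='relu')(d)
d = Dense(intermediate_dim[0], activation='relu')(d)
x_hat = Dense(original_dim, activation='sigmoid')(d)
vae = Model(x, x_hat)

def vae_loss(x_true, x_pred):
    recon = original_dim * K.mean(K.binary_crossentropy(x_true, x_pred), axis=-1)
    kl = -0.5 * K.sum(1 + z_log_var - K.square(z_mean) - K.exp(z_log_var), axis=-1)
    return recon + kl_weight * kl

vae.compile(optimizer='adam', loss=vae_loss)
vae.fit(X_train, X_train, epochs=200, batch_size=100, shuffle=True)

# afterwards, copy the encoder/decoder Dense weights into the VaDE model as above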

@eelxpeng

eelxpeng commented May 21, 2018

@wangmn93 Could you elaborate more on the VAE pretraining? How do you control the range of the latent space? By setting a coefficient on the KL divergence term? Also, it seems that their provided pretrained weights only contain the autoencoder weights, but not the enc_sigma weights. It would be even better if you could share your code for the pretraining. Thanks.

@wangmn93

wangmn93 commented May 22, 2018 via email

@eelxpeng

eelxpeng commented May 22, 2018

@wangmn93 Thank you for your reply. I have actually tried many possible initializations, including AE, SDAE, and VAE, with all kinds of random initialization. However, I haven't gotten any of them to work. Could you share code that at least sometimes works? I am trying to find the reason for the instability and a good initialization method that makes things work robustly. Your help would be much appreciated.

@wangmn93

wangmn93 commented May 22, 2018 via email

@devyhia

devyhia commented Nov 27, 2018

@eelxpeng Did you make any progress on this problem? Did you get the DEC-Keras pre-training method to work?

I could get the AE pre-training from DEC-Keras to reach ~86% ... However, once I plug that into VaDE, accuracy drops dramatically to ~57%. Not really sure what is going wrong there.

@Zizi6947

@wangmn93 Did you train on any other datasets? I could get 85%+ on MNIST using VaDE, but when I train on a new dataset, the accuracy is only about 20%.

@wangmn93

wangmn93 commented Dec 20, 2018 via email

@djsavic

djsavic commented Jan 5, 2022

@Zizi6947 @devyhia @michelleowen
I managed to adapt the code for some tabular data with ~60 features and ~1e5 samples in total. The only way of achieving a good result was to pretrain the model for 1 epoch as a plain autoencoder (ae = vade, compiled with loss='mse' and optimizer='adam', then ae.fit(X, X)) and then proceed with vade.fit(...). That did the trick for me. Also, the parameter alpha in the loss function needs to be carefully tuned to prevent a negative loss; alpha is sensitive to latent_dim.
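Roughly, that warm-up could look like the sketch below; vade, vade_loss, X, and num_epochs follow the names used in this thread and are stand-ins rather than exact code from the repository:

# 1) one epoch of plain reconstruction pretraining on the same network
vade.compile(optimizer='adam', loss='mse')
vade.fit(X, X, epochs=1, batch_size=256, shuffle=True)

# 2) switch to the VaDE objective and continue training
vade.compile(optimizer='adam', loss=vade_loss)
vade.fit(X, X, epochs=num_epochs, batch_size=256, shuffle=True)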
