about experiments hyperparameters #84

Open
adverbial03 opened this issue Nov 27, 2023 · 1 comment

Comments

@adverbial03

Hello, thanks for sharing your excellent work!

I have some specific questions about the hyperparameter choices in the experiments and hope you can answer them:

  1. In GDN.py, the class OutLayer contains code for stacking multiple layers (i.e., layer_num > 1), but when OutLayer is instantiated, layer_num=1 is used. Why is this? Are there experimental results or analyses supporting this parameter choice? (See the sketch right after this list for how I read the stacking logic.)
  2. In class GraphLayer, there is a design for a multi-head attention mechanism (heads > 1), but the chosen configuration uses heads=1. I think multiple heads could help mine richer temporal information. Why wasn't this done, and have you conducted experiments related to this decision? (The generic multi-head sketch at the end of this comment shows what I mean.)
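
For reference on point 1, here is roughly how I read the stacking logic (a simplified sketch with illustrative names and dimensions, not the repo's actual OutLayer code):

```python
import torch.nn as nn

class StackedOutLayer(nn.Module):
    """Simplified sketch of an output MLP whose depth is controlled by layer_num.
    (Illustrative only -- not the actual OutLayer implementation.)"""
    def __init__(self, in_dim, layer_num=1, inter_dim=256):
        super().__init__()
        layers = []
        for i in range(layer_num):
            last = (i == layer_num - 1)
            layer_in = in_dim if i == 0 else inter_dim
            layer_out = 1 if last else inter_dim
            layers.append(nn.Linear(layer_in, layer_out))
            if not last:
                layers.append(nn.ReLU())
        self.mlp = nn.Sequential(*layers)

    def forward(self, x):
        # x: (batch, in_dim) -> (batch, 1), one predicted value per sample
        return self.mlp(x)

# layer_num=1 collapses to a single Linear(in_dim, 1);
# layer_num=2 becomes Linear -> ReLU -> Linear.
```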

I think this is an excellent paper, and I would like to know more about the experimental details and analysis. Is there a version of the paper with an appendix?
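
To clarify what I mean in point 2, here is a generic multi-head example using PyG's GATConv. I know GraphLayer is a custom layer, so this is only meant to illustrate the heads parameter, and the dimensions are made up:

```python
import torch
from torch_geometric.nn import GATConv

# Single-head vs. multi-head attention over the same graph (toy dimensions).
single_head = GATConv(in_channels=64, out_channels=64, heads=1)
multi_head = GATConv(in_channels=64, out_channels=64, heads=4, concat=False)  # heads are averaged

x = torch.randn(27, 64)                      # 27 nodes (e.g. sensors), 64-dim features
edge_index = torch.randint(0, 27, (2, 100))  # random edges, only for shape checking

out_1 = single_head(x, edge_index)  # shape (27, 64), one attention head
out_4 = multi_head(x, edge_index)   # shape (27, 64), average of 4 attention heads
```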

@d-ailin
Owner

d-ailin commented Nov 28, 2023

Thanks for your kind words and interest in our work.

  1. layer_num does not have to be set to 1; other values can be passed in run.sh, and the best choice can vary across datasets. For example, when we tested on SWaT, the performance with layer_num=2 was close to that with layer_num=1, so we chose layer_num=1 for simplicity in that case. In short, the hyperparameter can be chosen based on model performance, such as the reconstruction error on the validation set during training (a small sweep like the sketch after this list is enough).
  2. Yes, I agree that multi-head attention could be better than using a single head. We also tested multi-head attention, but in our cases it did not improve the results much and was close to the single-head result. Still, this could vary across datasets.
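
Concretely, the selection in point 1 can be a small sweep along these lines, where train_and_validate is a placeholder for the usual training loop and the candidate values are only examples:

```python
import itertools

def train_and_validate(layer_num: int, heads: int) -> float:
    # Placeholder: run the normal training procedure with these hyperparameters
    # and return the validation error (e.g. forecasting error on the validation split).
    raise NotImplementedError

candidates = {"layer_num": [1, 2], "heads": [1, 2, 4]}

best_cfg, best_val = None, float("inf")
for layer_num, heads in itertools.product(candidates["layer_num"], candidates["heads"]):
    val_err = train_and_validate(layer_num=layer_num, heads=heads)
    if val_err < best_val:
        best_cfg, best_val = (layer_num, heads), val_err

print(f"best (layer_num, heads): {best_cfg}, validation error: {best_val:.4f}")
```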

We don't have an additional appendix for the paper, but please feel free to ask if there are any other questions. Thanks!
