The problem of VISTA #14

Open
wensuinan opened this issue Jul 7, 2022 · 1 comment

@wensuinan

Hi, thanks for your great work!

In the paper, VISTA projects the input feature sequences X1 ∈ R^(n×d_f) and X2 ∈ R^(m×d_f) into
queries Q ∈ R^(n×d_q) and keys K ∈ R^(m×d_q) (values V ∈ R^(m×d_v)) via convolutional operators with 3 × 3 kernels, where
d_q and d_v are the feature dimensions of queries (keys) and values. To decouple the classification and regression tasks,
Q and K are further projected into Q_i, K_i, i ∈ {sem, geo} via individual MLPs (implemented as 1D convolutions).
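
To make the description above concrete, here is a minimal PyTorch sketch of how I read it, for the query branch only. The class name `PaperStyleQueryProjection`, the channel sizes, and the choice of a single kernel-1 `nn.Conv1d` for each "MLP" are my own assumptions for illustration and are not taken from the repository.

```python
import torch
import torch.nn as nn


class PaperStyleQueryProjection(nn.Module):
    """Hypothetical sketch: one shared 3x3 conv, then one 1D-conv projection per task."""

    def __init__(self, d_f: int, d_q: int):
        super().__init__()
        # Shared projection X -> Q with a 3x3 kernel (padding=1 keeps H and W).
        self.q_conv = nn.Conv2d(d_f, d_q, kernel_size=3, padding=1)
        # Task-specific projections; a single kernel-1 Conv1d each is an assumption.
        self.q_sem_mlp = nn.Conv1d(d_q, d_q, kernel_size=1)
        self.q_geo_mlp = nn.Conv1d(d_q, d_q, kernel_size=1)

    def forward(self, x: torch.Tensor):
        # Flatten the spatial grid into a sequence: (B, d_q, n) with n = H * W.
        q = self.q_conv(x).flatten(2)
        return self.q_sem_mlp(q), self.q_geo_mlp(q)


x1 = torch.randn(2, 64, 8, 8)  # (batch, d_f, H, W); sizes chosen arbitrarily
q_sem, q_geo = PaperStyleQueryProjection(d_f=64, d_q=64)(x1)
print(q_sem.shape, q_geo.shape)
```

The key and value branches would follow the same pattern under the same assumptions.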

However, this is not the case in the code!

@AndlollipopDE
Member

AndlollipopDE commented Jul 8, 2022

If you are asking about the code here:

self.q_sem_conv = nn.Conv2d(input_channels,

Yes. One thing to know is that the parameter "reduction_ratio" is set to 2 by default. In the early stage of experiments, we set reduction_ratio to 1 and used one convolution plus two additional MLPs to get the sem and geo versions of both Q and K. In the final version, we set reduction_ratio to 2 and replaced the additional MLPs with 2 individual convolutions, because we found that an individual convolution each for sem and geo is enough. By doing this we also save a few parameters and make the whole module look simpler.
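
As a rough illustration of the parameter saving, the sketch below counts weights for one branch (for example Q) under both designs. It assumes that reduction_ratio divides the output channel count (`c // reduction_ratio`), that each early-stage MLP is a single kernel-1 layer, and that every layer has a bias; none of these details are confirmed by the snippet above, and the helper names are hypothetical.

```python
def conv_params(c_in: int, c_out: int, kernel_elems: int, bias: bool = True) -> int:
    """Weights (plus optional bias) of a convolution with `kernel_elems` taps per channel pair."""
    return c_in * c_out * kernel_elems + (c_out if bias else 0)


def early_variant_params(c: int) -> int:
    # Assumed early design (reduction_ratio = 1):
    # one shared 3x3 conv, then two kernel-1 projections for sem and geo.
    shared = conv_params(c, c, 9)
    mlps = 2 * conv_params(c, c, 1)
    return shared + mlps


def final_variant_params(c: int, reduction_ratio: int = 2) -> int:
    # Assumed final design: two individual 3x3 convs with reduced output channels.
    c_out = c // reduction_ratio
    return 2 * conv_params(c, c_out, 9)


c = 128  # arbitrary example channel count
print(early_variant_params(c), final_variant_params(c))
```

Under these assumptions the final variant is smaller for any channel count, since roughly 9c² weights replace roughly 11c².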
