The problem of VISTA #14

wensuinan · 2022-07-07T10:40:55Z

Hi, thanks for your great work!

In the paper, VISTA projects the input feature sequences X_1 ∈ R^(n×d_f) and X_2 ∈ R^(m×d_f) into
queries Q ∈ R^(n×d_q) and keys K ∈ R^(m×d_q) (values V ∈ R^(m×d_v)) via convolutional operators of 3 × 3 kernels, where
d_q and d_v are the feature dimensions of queries (keys) and values. To decouple the classification and regression tasks,
Q and K are further projected into Q_i, K_i, i ∈ {sem, geo} via individual MLP (implemented as 1D convolution).

However, this is not the case in the code!

AndlollipopDE · 2022-07-08T15:44:30Z

If you are asking about the code here:

VISTA/det3d/models/necks/attention.py

Line 23 in 6b46e9b

self.q_sem_conv = nn.Conv2d(input_channels,

Yes. One thing to know is that the parameter "reduction_ratio" is set to 2 by default. In the early stage of our experiments, we set reduction_ratio to 1 and used one convolution plus two additional MLPs to get the sem and geo versions of both Q and K. In the final version, we set reduction_ratio to 2 and dropped the additional MLPs in favor of two individual convolutions, because we found that an individual convolution each for sem and geo is enough. By the way, doing this also saves a few parameters and makes the whole module look simpler.
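To make the parameter comparison concrete, here is a rough back-of-the-envelope sketch of the two designs for a single branch (Q or K). It is not taken from the repository. It assumes 3 × 3 kernels as stated in the paper, that each MLP is a single kernel-size-1 convolution, that reduction_ratio divides the output channel count, and that every layer has a bias. The helper names and the channel count of 64 are hypothetical.

```python
def conv_params(in_ch, out_ch, kernel_elems, bias=True):
    """Number of learnable parameters in one convolution layer."""
    return in_ch * out_ch * kernel_elems + (out_ch if bias else 0)


def early_design_params(channels, kernel_elems=9):
    """Early design (assumed): one shared 3x3 conv with reduction_ratio = 1,
    followed by two MLPs (sem and geo), each modeled as a kernel-size-1 conv."""
    shared = conv_params(channels, channels, kernel_elems)
    mlp = conv_params(channels, channels, 1)
    return shared + 2 * mlp


def final_design_params(channels, kernel_elems=9, reduction_ratio=2):
    """Final design (assumed): two individual 3x3 convs (sem and geo),
    each reducing the channels by reduction_ratio, with no extra MLP."""
    out_ch = channels // reduction_ratio
    return 2 * conv_params(channels, out_ch, kernel_elems)


channels = 64  # hypothetical input channel count, for illustration only
early = early_design_params(channels)
final = final_design_params(channels)
print(f"early design: {early} parameters, final design: {final} parameters")
```

Under these assumptions the early design costs 11C² + 3C parameters per branch and the final design 9C² + C, which for 64 channels is 45248 versus 36928. The exact numbers depend on the real kernel sizes and channel counts in attention.py.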
