I have two questions.

(1) I notice that in your code (vokenization/vlm/model.py, Line 238 in 5601b79) you design three loss functions: voken classification, voken regression, and voken contrastive. But you only report voken classification in the paper. Did you perhaps find after trials that voken regression and voken contrastive don't work, or even harm model performance? Is my guess correct? (Because image features are far different from language embeddings.)

(2) What is the intuition for why the voken classification loss can improve model performance? I suspect that different words with similar semantics will receive the same voken labels, so the voken classification loss will optimize their similarity. What is your opinion? Could you give me some intuition from your point of view?
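For context, here is a minimal sketch (my own illustration, not the actual code in model.py) of how the three objectives could be implemented on top of per-token hidden states; the names `cls_head`, `proj_head`, `voken_feats`, and `temperature` are hypothetical:

```python
import torch
import torch.nn.functional as F

def voken_classification_loss(hidden_states, cls_head, voken_labels):
    # Per-token cross-entropy over the discrete voken vocabulary,
    # analogous to a masked-LM head but with voken ids as targets.
    logits = cls_head(hidden_states)                       # (B, T, num_vokens)
    return F.cross_entropy(logits.flatten(0, 1), voken_labels.flatten())

def voken_regression_loss(hidden_states, proj_head, voken_feats):
    # L2 regression: push the projected token representation toward the
    # image (voken) feature vector assigned to that token.
    pred = proj_head(hidden_states)                        # (B, T, D_img)
    return F.mse_loss(pred, voken_feats)

def voken_contrastive_loss(hidden_states, proj_head, voken_feats, temperature=0.07):
    # InfoNCE-style contrastive loss: each token should match its own
    # voken feature against the other vokens in the batch.
    pred = F.normalize(proj_head(hidden_states).flatten(0, 1), dim=-1)  # (B*T, D)
    tgt = F.normalize(voken_feats.flatten(0, 1), dim=-1)                # (B*T, D)
    logits = pred @ tgt.t() / temperature                               # (B*T, B*T)
    labels = torch.arange(logits.size(0), device=logits.device)
    return F.cross_entropy(logits, labels)
```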
Yes, these voken losses perform similarly, so we chose the simplest one; to me, that's classification.
The voken label is also a strong supervision signal. To me, contrastive and L2-regression losses are mostly used for distillation, but hard labels can do the same job (e.g., in language-model distillation). Some other works to look at are wav2vec 2.0 and DINO.
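To make the distillation analogy in this reply concrete (again a hypothetical sketch of mine, not code from the repo): L2-regression and contrastive losses match a continuous teacher signal, much like soft-target distillation, while voken classification uses hard labels, as in hard-label distillation of language models:

```python
import torch
import torch.nn.functional as F

student_logits = torch.randn(4, 1000)                      # per-token voken logits
teacher_probs = F.softmax(torch.randn(4, 1000), dim=-1)    # continuous teacher signal
hard_labels = teacher_probs.argmax(dim=-1)                 # discretized targets (voken ids)

# Soft-target (distillation-like) objective, analogous to L2-reg / contrastive.
soft_loss = F.kl_div(F.log_softmax(student_logits, dim=-1),
                     teacher_probs, reduction='batchmean')

# Hard-label objective, as used for voken classification.
hard_loss = F.cross_entropy(student_logits, hard_labels)
```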