
Table 4 and Table 5 using COCO pretraining or not? #39

Open

superaha opened this issue Oct 26, 2022 · 4 comments

@superaha

Hi there,

Thank you for sharing the repo. In Table 3, the YouTube-VIS 2019 results are reported for models both with and without COCO pretraining.

What about Tables 4 and 5 for IDOL? I could not find the detailed settings or an explanation for those two results.

Thanks

@timmeinhardt commented Oct 26, 2022

I have asked the same question in a different issue. This line

WEIGHTS: "cocopretrain_R50.pth"

seems to suggest that they used a model pretrained on COCO sequences. But I would appreciate a clarification for the other results as well!
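For context, and assuming the detectron2-style setup this repo builds on, a WEIGHTS entry like the one above is typically consumed as sketched below. This is only an illustration, not the repo's actual training script; the commented config path is hypothetical.

```python
# Minimal sketch, assuming a detectron2-style setup.
# It only shows how a MODEL.WEIGHTS entry such as "cocopretrain_R50.pth" is
# typically loaded before VIS fine-tuning; the commented config path is hypothetical.
from detectron2.config import get_cfg
from detectron2.checkpoint import DetectionCheckpointer
from detectron2.modeling import build_model

cfg = get_cfg()
# cfg.merge_from_file("configs/example_ytvis19.yaml")  # hypothetical config path
cfg.MODEL.WEIGHTS = "cocopretrain_R50.pth"  # the COCO-pretrained checkpoint in question
cfg.MODEL.DEVICE = "cpu"                    # build on CPU just for this illustration

model = build_model(cfg)
# Loads matching weights from the checkpoint; keys that do not match the current
# model (e.g. newly added heads) are reported and skipped by the checkpointer.
DetectionCheckpointer(model).load(cfg.MODEL.WEIGHTS)
```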

@superaha (Author)

Thanks for pointing this out. Let us see if the authors can clarify. @wjf5203

@wjf5203 (Owner) commented Oct 27, 2022

Hi,

Thanks for your attention and for pointing this out.

Let me clarify this. We have at most three training steps for IDOL:

Step 1: pre-training the instance segmentation pipeline on COCO, following all other VIS methods.
Step 2: pre-training IDOL on pseudo key-reference pairs from COCO. (This step forces the model to learn a position-insensitive contrastive embedding that relies on the appearance of the object rather than its spatial position.)
Step 3: finetuning our VIS method IDOL on the VIS datasets (YTVIS19/YTVIS21/OVIS), following all other VIS methods.

So, the main difference is Step 2.
In Tables 3, 4, and 5, all IDOL results marked with $\dagger$ are obtained by Steps 1+2+3; results without $\dagger$ are obtained by Steps 1+3.
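To make Step 2 concrete, the sketch below illustrates the idea only (an illustration under assumptions, not the actual IDOL code; the function names and the exact loss form are assumed): two augmentations of one static COCO image act as a pseudo key-reference pair, and per-instance embeddings are trained so that the same instance matches across the two views by appearance rather than position.

```python
# Sketch of the Step 2 idea (illustration under assumptions, not the actual IDOL code).
import torch
import torch.nn.functional as F

def pseudo_pair(image, augment):
    """Two independently augmented views of a single static image serve as the
    'key' and 'reference' frames of a pseudo video clip."""
    return augment(image), augment(image)

def contrastive_embedding_loss(key_emb, ref_emb, key_ids, ref_ids, temperature=0.1):
    """key_emb: (Nk, D) and ref_emb: (Nr, D) per-instance embeddings from the two
    views; key_ids/ref_ids hold instance identities. Same-identity pairs across
    views are positives, all other pairs are negatives."""
    key_emb = F.normalize(key_emb, dim=1)
    ref_emb = F.normalize(ref_emb, dim=1)
    logits = key_emb @ ref_emb.t() / temperature          # (Nk, Nr) cosine similarities
    pos = (key_ids[:, None] == ref_ids[None, :]).float()  # 1 where identities match
    log_prob = F.log_softmax(logits, dim=1)
    # negative log-likelihood of the positive reference(s) for each key instance
    loss = -(log_prob * pos).sum(dim=1) / pos.sum(dim=1).clamp(min=1)
    return loss.mean()
```

The exact loss used in the paper and repo may pool negatives differently, so treat this only as a reading aid for the "position-insensitive contrastive embedding" claim.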

We will add more detailed experimental settings in the next arXiv version ~

@HanGuangXin

@wjf5203 Hi, so there are two pre-training steps. The first step is on single frames from static COCO images. The second step is on pseudo key-reference pairs.

And I have 3 questions about this:

  1. Why do we have to pre-train on static COCO images first? Why is using only the second step not enough?
  2. Are the provided pre-trained weights from the second step rather than the first? If so, could you provide the trained weights from the first step?
  3. How can I do the first-step pre-training myself? It seems the code only supports the second and third steps.
