Where is the Visual Wake Word test set? #135
Comments
Good question! MS-COCO does not publish the labels (aka annotations) for its test set and holds competitions built around that test set. This means that Visual Wake Words does not contain an explicit test set. The usual practice is to use the val set as the test set and to hold out a small percentage of the training set for validation if needed. MLPerf Tiny should potentially adopt this practice, including an update to the paper. @cskiraly and @jeremy-syn, who currently owns the VWW benchmark? I'm happy to help make the change if needed.
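The split described above (val set kept as the test set, a small slice of the training set held out for validation) can be sketched as follows. This is a minimal illustration, not the benchmark's own procedure: the function name `split_train_val`, the 10% fraction, the seed, and the example file names are all assumptions made for the example.

```python
import random


def split_train_val(items, val_fraction=0.1, seed=0):
    """Split a list of training items into (train, val) deterministically.

    Hypothetical helper: val_fraction and seed are illustrative defaults,
    not values taken from the MLPerf Tiny repo or paper.
    """
    ordered = sorted(items)          # sort first so the result does not depend on input order
    rng = random.Random(seed)        # local RNG, so the global random state is untouched
    rng.shuffle(ordered)
    n_val = int(len(ordered) * val_fraction)
    return ordered[n_val:], ordered[:n_val]


# Placeholder file names standing in for the training images.
train_files = [f"train_{i:04d}.jpg" for i in range(100)]
train, val = split_train_val(train_files, val_fraction=0.1, seed=0)
```

Fixing the seed keeps the held-out validation images the same across runs, so results from different training runs stay comparable.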
Hi @colbybanbury @LucaUrbinati44, any news on this issue? Thanks, Lucas
Hi @LucasFischer123,
Short answer
Long answer
Then, as a second experiment, we trained the model again from scratch, but this time on the whole dataset, i.e. without removing the 1000 images. This time the test accuracy on the 1000 images was 86%, as in the paper. Since the second experiment matched the paper's result, we decided to go with this second “solution” (see “Short answer”). However, we know that this procedure is not fully correct, since the model sees the 1000 images twice (during training and during testing). Thus, we hope the organizers can resolve this issue soon, both in the repo instructions and in the paper. Thank you all,
Hi @LucaUrbinati44 @colbybanbury @LucasFischer123 @cskiraly and @jeremy-syn, I had a similar question about how to evaluate accuracy. I created this Jupyter notebook, which you can run in your browser (or use this script if you prefer to run locally). The script downloads the dataset from Silabs and runs both TFLite reference models (int8-model and float-model) on the 1000 images listed in y_labels.csv to measure their accuracy. I get the results below:
Does this look correct? BTW, I get 86.0% for int8 accuracy (instead of 85.9%) when I run on an M1 MacBook instead of Colab.
One more note: for the int8 accuracy, a few of the test cases in y_labels.csv produce a probability of exactly 0.5 (i.e. a signed int8 value of 0, or an unsigned int8 value of 128). In my script I assume that probability-of-person = 0.5 indicates a person. Changing this to non-person reduces the int8 accuracy by 0.3%.
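To make the tie-breaking effect concrete, here is a minimal sketch that scores the same int8 outputs under both conventions. It assumes the person-class output is a signed int8 value where 0 corresponds to a probability of exactly 0.5, as described above. The function names and the sample outputs and labels are made up for illustration; they are not taken from y_labels.csv or from the script mentioned in this thread.

```python
def predict_person(q_person, tie_is_person=True):
    """Classify from the signed int8 'person' output.

    Assumption: q_person == 0 corresponds to probability 0.5 (a tie),
    positive values to > 0.5 and negative values to < 0.5.
    """
    if q_person == 0:
        return tie_is_person
    return q_person > 0


def accuracy(outputs, labels, tie_is_person=True):
    """Fraction of samples whose prediction matches the label (1 = person)."""
    correct = sum(
        predict_person(q, tie_is_person) == bool(y)
        for q, y in zip(outputs, labels)
    )
    return correct / len(labels)


# Synthetic example: four of the eight outputs are exact ties (value 0).
outputs = [-100, 0, 0, 0, 50, 127, -3, 0]
labels = [0, 1, 1, 1, 1, 1, 0, 0]

acc_tie_person = accuracy(outputs, labels, tie_is_person=True)       # 7 of 8 correct
acc_tie_non_person = accuracy(outputs, labels, tie_is_person=False)  # 5 of 8 correct
```

The gap between the two accuracies equals (tied samples labeled person minus tied samples labeled non-person) divided by the number of samples. On 1000 images, a 0.3% drop therefore corresponds to a net difference of three tied images.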
I would like to evaluate the pretrained MobileNet model on the preprocessed COCO2014 test set, but I am not able to find this preprocessed test set anywhere in the repo. Where can I find it? For the other three datasets (AD, IC, KS) it is already provided in the repo.
I suspect I have to generate it myself using this script with dataType='test2014', because this should be the same script that was used to create the training+validation dataset that is used for training and that can be downloaded here.
Moreover, the paper entitled "MLPerf Tiny Benchmark" mentions the presence of this test set for the VWW problem in paragraph 4.1.
Finally, why is there no test.py (or evaluated.py) script to run the model on the test set, while such scripts exist for all the other three datasets (AD, IC, KS)?
Thank you,
Regards,
Luca Urbinati