[Paper] [Twitter] [Poster] [DataComp Leaderboard]
[HuggingFace (Samples ID on DataComp-medium)]
This is the official code for negCLIPLoss and NormSim (NeurIPS 2024 Spotlight), two simple but efficient data selection methods for CLIP models. An earlier version of this work is the VAS paper.
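As a rough illustration of the two scores named above, here is a minimal pure-Python sketch based on our reading of the paper, not on the repository code. It assumes image and text embeddings from a pretrained CLIP teacher model (not shown). negCLIPLoss is taken as the CLIP score s_ii minus a normalization term computed against a batch of other pairs, which equals -tau times the symmetric CLIP loss of sample i. NormSim_p is taken as the p-norm of a sample's image-embedding similarities to a target set. The function names (neg_clip_loss, norm_sim, select_top), the temperature default and the single-batch setup are hypothetical. The actual load_uids_with_cs_new may differ, for example in batch size, in averaging over several random batches, or in how the two scores are combined.

```python
import math
from typing import List, Sequence

Vector = Sequence[float]


def normalize(v: Vector) -> List[float]:
    # L2-normalize so that dot products are cosine similarities.
    n = math.sqrt(sum(x * x for x in v))
    return [x / n for x in v]


def dot(a: Vector, b: Vector) -> float:
    return sum(x * y for x, y in zip(a, b))


def logsumexp(xs: Sequence[float]) -> float:
    # Numerically stable log(sum(exp(x))).
    m = max(xs)
    return m + math.log(sum(math.exp(x - m) for x in xs))


def neg_clip_loss(img: List[Vector], txt: List[Vector], tau: float = 0.01) -> List[float]:
    """Per-sample negCLIPLoss on ONE batch (hypothetical sketch).

    img[i] and txt[i] embed the i-th image-text pair.
    score_i = s_ii - (tau / 2) * (LSE_j(s_ij / tau) + LSE_j(s_ji / tau)),
    i.e. -tau times the symmetric CLIP (InfoNCE) loss of sample i.
    """
    img = [normalize(v) for v in img]
    txt = [normalize(v) for v in txt]
    n = len(img)
    sim = [[dot(img[i], txt[j]) for j in range(n)] for i in range(n)]
    scores = []
    for i in range(n):
        row = logsumexp([sim[i][j] / tau for j in range(n)])  # image i vs. all texts
        col = logsumexp([sim[j][i] / tau for j in range(n)])  # text i vs. all images
        scores.append(sim[i][i] - 0.5 * tau * (row + col))
    return scores


def norm_sim(sample_img: Vector, target_imgs: List[Vector], p: float = 2.0) -> float:
    """NormSim_p: p-norm of image-embedding similarities to a target set (sketch)."""
    x = normalize(sample_img)
    sims = [dot(x, normalize(t)) for t in target_imgs]
    if math.isinf(p):
        return max(abs(s) for s in sims)
    return sum(abs(s) ** p for s in sims) ** (1.0 / p)


def select_top(uids: List[str], scores: List[float], fraction: float) -> List[str]:
    """Hypothetical helper: keep the highest-scoring fraction of samples."""
    k = max(1, int(len(uids) * fraction))
    order = sorted(range(len(uids)), key=lambda i: scores[i], reverse=True)
    return [uids[i] for i in order[:k]]
```

Because each log-sum-exp is at least s_ii / tau, the normalization term is never smaller than s_ii, so every negCLIPLoss value in this sketch is at most 0. A pair whose text also matches other images (or whose image matches other texts) is pushed further down. Setting p=math.inf in norm_sim gives the maximum similarity to any single target example.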
If you find this repository or our paper useful, please consider citing:
@article{wang2024cliploss,
title={CLIPLoss and Norm-Based Data Selection Methods for Multimodal Contrastive Learning},
author={Wang, Yiping and Chen, Yifang and Yan, Wendan and Fang, Alex and Zhou, Wenjing and Jamieson, Kevin and Du, Simon Shaolei},
journal={arXiv preprint arXiv:2405.19547},
year={2024}
}
The main function is load_uids_with_cs_new in baselines/vas2. We will clean up the code and add more details soon.
We thank the authors of DataComp for open-sourcing their code.