220504 CoCa.md

https://arxiv.org/abs/2205.01917

CoCa: Contrastive Captioners are Image-Text Foundation Models (Jiahui Yu, Zirui Wang, Vijay Vasudevan, Legg Yeung, Mojtaba Seyedhosseini, Yonghui Wu)

A vision-language model that unifies generative (captioning) pretraining and contrastive pretraining in a single model.
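As a rough illustration of what combining the two objectives can look like, here is a minimal sketch that adds a symmetric image-text contrastive loss to a next-token captioning loss as a weighted sum. The function names, the toy inputs, and the default weights `w_con` and `w_cap` are assumptions made for this example and are not taken from the authors' code. Temperature scaling and the encoders themselves are left out.

```python
import math


def log_softmax(row):
    """Numerically stable log-softmax over a list of floats."""
    m = max(row)
    lse = m + math.log(sum(math.exp(x - m) for x in row))
    return [x - lse for x in row]


def contrastive_loss(sim):
    """Symmetric InfoNCE-style loss over an N x N image-text similarity matrix.

    sim[i][j] is the similarity of image i and text j; matched pairs
    are assumed to sit on the diagonal.
    """
    n = len(sim)
    # Image-to-text direction: each row is one image scored against all texts.
    i2t = -sum(log_softmax(sim[i])[i] for i in range(n)) / n
    # Text-to-image direction: each column is one text scored against all images.
    cols = [[sim[r][c] for r in range(n)] for c in range(n)]
    t2i = -sum(log_softmax(cols[i])[i] for i in range(n)) / n
    return i2t + t2i


def captioning_loss(token_logits, target_ids):
    """Mean next-token cross-entropy over one caption."""
    per_token = [-log_softmax(logits)[t] for logits, t in zip(token_logits, target_ids)]
    return sum(per_token) / len(per_token)


def combined_loss(sim, token_logits, target_ids, w_con=1.0, w_cap=2.0):
    """Weighted sum of both objectives; the weights are illustrative placeholders."""
    return w_con * contrastive_loss(sim) + w_cap * captioning_loss(token_logits, target_ids)


if __name__ == "__main__":
    sim = [[5.0, 0.0], [0.0, 5.0]]            # well-aligned toy batch of two pairs
    logits = [[2.0, 0.0, 0.0, 0.0], [0.0, 2.0, 0.0, 0.0]]
    targets = [0, 1]
    print(combined_loss(sim, logits, targets))
```

Both terms are computed from the same batch, so a single backward pass can train the contrastive and the captioning behaviour together instead of in separate stages.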

#vision-language