https://arxiv.org/abs/2207.00208
e-CLIP: Large-Scale Vision-Language Representation Learning in E-commerce (Wonyoung Shin, Jonghun Park, Taekang Woo, Yongwoo Cho, Kwangjin Oh, Hwanjun Song)
A CLIP model from Naver Shopping. The exact training setup isn't disclosed, but what stands out is that they got their results using gradient accumulation.
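Why gradient accumulation is worth noticing in a CLIP-style setup: the symmetric contrastive (InfoNCE) loss uses the other pairs in the batch as negatives, so naively splitting a large batch into micro-batches and averaging their losses is not equivalent to training on the full batch. The sketch below is a minimal NumPy illustration under that assumption. It is not e-CLIP's code, the note says nothing about how they handled this, and `clip_loss`, the temperature of 0.07, and the random embeddings are all illustrative choices.

```python
import numpy as np

def clip_loss(img, txt, temperature=0.07):
    """Symmetric InfoNCE loss over a batch of paired embeddings (illustrative sketch)."""
    # L2-normalize so the dot product is cosine similarity
    img = img / np.linalg.norm(img, axis=1, keepdims=True)
    txt = txt / np.linalg.norm(txt, axis=1, keepdims=True)
    logits = img @ txt.T / temperature  # (n, n); diagonal entries are the positive pairs
    labels = np.arange(len(img))

    def ce(l):
        # Row-wise cross-entropy with the diagonal as the target (stable log-softmax)
        l = l - l.max(axis=1, keepdims=True)
        log_probs = l - np.log(np.exp(l).sum(axis=1, keepdims=True))
        return -log_probs[labels, labels].mean()

    # Average the image->text and text->image directions
    return (ce(logits) + ce(logits.T)) / 2

rng = np.random.default_rng(0)
img = rng.normal(size=(8, 16))                # hypothetical image embeddings
txt = img + 0.5 * rng.normal(size=(8, 16))    # hypothetical text embeddings, correlated with img

# Full batch of 8: every pair sees 7 negatives
full = clip_loss(img, txt)
# Naive accumulation over 4 micro-batches of 2: every pair sees only 1 negative
micro = np.mean([clip_loss(img[i:i + 2], txt[i:i + 2]) for i in range(0, 8, 2)])
print(full, micro)
```

Each micro-batch softmax sums over a subset of the full batch's columns, so every per-row term can only shrink and the naively accumulated loss ends up lower than the full-batch loss. Methods that first embed the whole batch and then backpropagate in chunks (e.g. gradient caching) exist to recover the exact full-batch gradient.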
#vision-language #contrastive_learning #retrieval