-
- Detailed illustration of our object presence prior-guided text embedding adjustment module.
- The CLIP text encoder generates text embeddings for each object class, and the object presence prior is derived from both visual and text embeddings.
- Within hierarchically defined class groups, text embeddings are selected based on object presence prior, then refined in an object-specific direction to align with components likely present in the image.
-
+
+
+ Detailed illustration of our object presence prior-guided text embedding adjustment module.
+ The CLIP text encoder generates text embeddings for each object class, and the object presence prior is derived from both visual and text embeddings.
+ Within hierarchically defined class groups, text embeddings are selected based on the object presence prior and then refined in an object-specific direction to align with components likely present in the image.
+
+