
On the design of the mapping layer #30

Open

ZKCCZ opened this issue Dec 16, 2024 · 2 comments

Comments

ZKCCZ commented Dec 16, 2024

As you said, "the simpler Projection approach to cross-modal feature alignment incurs a larger performance loss compared with Cross-Attention." I think a Q-Former can even be understood as just a more complex mapping scheme, yet the LLaVA series only uses an MLP for image-text mapping and does not explore other implementations. Why do you believe Cross-Attention brings gains? Could you provide some paper references?
I'm just getting started with LLMs and VLLMs, so my questions may be rather basic. Thanks in advance for your answer!
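
To make the comparison concrete, here is a minimal PyTorch sketch of the two connector styles being discussed. This is not MiniMind-V's or LLaVA's actual code; the module names, dimensions, and number of learned queries are illustrative assumptions only.

```python
import torch
import torch.nn as nn


class MLPProjector(nn.Module):
    """LLaVA-style connector: a small MLP applied independently to every visual token."""

    def __init__(self, vision_dim: int, llm_dim: int):
        super().__init__()
        self.proj = nn.Sequential(
            nn.Linear(vision_dim, llm_dim),
            nn.GELU(),
            nn.Linear(llm_dim, llm_dim),
        )

    def forward(self, vision_feats: torch.Tensor) -> torch.Tensor:
        # (B, N_vis, vision_dim) -> (B, N_vis, llm_dim): token count is unchanged
        return self.proj(vision_feats)


class CrossAttentionResampler(nn.Module):
    """Q-Former/Perceiver-style connector: a fixed set of learned queries
    cross-attends over the visual tokens, so the output length is decoupled
    from the number of image patches."""

    def __init__(self, vision_dim: int, llm_dim: int,
                 num_queries: int = 32, num_heads: int = 8):
        super().__init__()
        self.queries = nn.Parameter(torch.randn(num_queries, llm_dim))
        self.kv_proj = nn.Linear(vision_dim, llm_dim)
        self.attn = nn.MultiheadAttention(llm_dim, num_heads, batch_first=True)

    def forward(self, vision_feats: torch.Tensor) -> torch.Tensor:
        # (B, N_vis, vision_dim) -> (B, num_queries, llm_dim)
        batch = vision_feats.size(0)
        q = self.queries.unsqueeze(0).expand(batch, -1, -1)
        kv = self.kv_proj(vision_feats)
        out, _ = self.attn(q, kv, kv)
        return out


if __name__ == "__main__":
    feats = torch.randn(2, 196, 768)  # e.g. 196 ViT patch tokens of dim 768
    print(MLPProjector(768, 512)(feats).shape)             # torch.Size([2, 196, 512])
    print(CrossAttentionResampler(768, 512)(feats).shape)  # torch.Size([2, 32, 512])
```

The trade-off raised in the question is visible here: the MLP keeps one output token per image patch and adds very few parameters, while the cross-attention resampler compresses the image into a fixed number of query tokens at the cost of extra attention parameters; whether that extra capacity buys quality is the open question addressed in the reply below.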

jingyaogong (Owner)

This question is basically the same as #8, which you can refer to.

I'd also recommend another good perspective on Zhihu: https://www.zhihu.com/search?type=content&q=Cross-Attention%20%E6%AF%94%20mlp

This is an open question with no 100% definitive answer, so I won't discuss it further here.

(I missed this issue, sorry for the late reply.)

ZKCCZ (Author) commented Jan 14, 2025

Thanks for the answer~
One more question: what is your view on the fact that most LLMs and LMMs train on their data for only 1 epoch?
For LLM pre-training, I understand this as preventing overfitting and memorization, which has been discussed elsewhere.
For multimodal alignment, even though the data volume is far smaller than in LLM pre-training, LLaVA for example still uses only 1 epoch for both alignment and fine-tuning, and llava-onevision, which builds on Qwen2-0.5B, likewise trains for only one epoch at each stage.
What is your take on this, and why is minimind-V's default epoch count 20? Thanks~
