
Some questions about initializing the language model weights... #35

Open

ThreeWaterGG opened this issue Feb 8, 2025 · 1 comment

Comments

ThreeWaterGG commented Feb 8, 2025

Regarding step "3.3 Download the pretrained weights of the minimind language model (Baidu Netdisk or HuggingFace), place them in the ./out/ directory, and name the file *_llm.pth":

1. Is this checkpoint simply the weights trained in the minimind project (https://github.com/jingyaogong/minimind?tab=readme-ov-file#%E8%AE%AD%E7%BB%83%E5%AE%8C%E6%88%90%E7%9A%84%E6%A8%A1%E5%9E%8B%E6%9D%83%E9%87%8D)?
The vision model's Transformer has one extra VisionProj layer compared with the language model; can the checkpoint still be loaded directly with state_dict = torch.load(ckp, map_location=args.device)?
2. After loading the pretrained language model parameters, fine-tuning the vision model may degrade or even collapse the language model's performance. Is that why the learning rate should be deliberately lowered?

jingyaogong (Owner) replied:
1. Yes, it is the pure language model checkpoint trained in the sibling minimind project. It can be loaded directly; just pass strict=False.
2. Yes, the learning rate needs to be lowered.
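
For illustration, here is a minimal, self-contained sketch of this loading pattern. The toy module names (llm, vision_proj), the fake in-memory checkpoint, and the learning-rate value are assumptions for the example, not code taken from this repository.

```python
import torch
import torch.nn as nn

class ToyLLM(nn.Module):
    """Stand-in for the pure minimind language model."""
    def __init__(self):
        super().__init__()
        self.embed = nn.Embedding(100, 16)
        self.head = nn.Linear(16, 100)

class ToyVLM(nn.Module):
    """Stand-in for the vision model: the same LLM plus an extra VisionProj-style layer."""
    def __init__(self):
        super().__init__()
        self.llm = ToyLLM()
        self.vision_proj = nn.Linear(32, 16)  # layer absent from the LLM checkpoint

# Pretend this dict is the *_llm.pth checkpoint loaded via
# torch.load(ckp, map_location=args.device).
llm_ckpt = {"llm." + k: v for k, v in ToyLLM().state_dict().items()}

vlm = ToyVLM()
# strict=False lets load_state_dict skip parameters that are missing from the
# checkpoint (here: vision_proj.*) instead of raising an error.
missing, unexpected = vlm.load_state_dict(llm_ckpt, strict=False)
print("missing keys:", missing)        # vision_proj.weight, vision_proj.bias
print("unexpected keys:", unexpected)  # []

# Per point 2: fine-tune with a deliberately small learning rate so the
# pretrained language ability is not destroyed (the value is an assumption).
optimizer = torch.optim.AdamW(vlm.parameters(), lr=1e-4)
```

The only change from a plain torch.load workflow is the strict=False flag on load_state_dict, which reports the VisionProj parameters as missing keys and leaves them at their fresh initialization.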
