Training 1-pretrain-vlm.py on GPU gives:

模型可学习参数: 109.34016 百万 = 0.10934016 B (Billion)
Epoch:0/19 loss:8.766 lr:0.0004000 epoch_Time:3503.0min: 0/24808
Epoch:0/19 loss:6.576 lr:0.0004000 epoch_Time:513.0min: 100/24808
Epoch:0/19 loss:6.067 lr:0.0004000 epoch_Time:522.0min: 200/24808
Epoch:0/19 loss:5.930 lr:0.0004000 epoch_Time:522.0min: 300/24808

Training on CPU gives:

Epoch:[0/19]0|24808 loss:5.749 lr:0.0004000 epoch_Time:10788.0min: 0/24808
Epoch:0/19 loss:2.958 lr:0.0004000 epoch_Time:6120.0min: 100/24808

On CPU the loss already drops to 2.95 after only 100 batches. What is going on?

Config:

dim: int = 768,
n_layers: int = 16,
n_heads: int = 16,
n_kv_heads: int = 8,
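For context, a rough parameter count from the listed config lands in the same ballpark as the reported 109.34M. This is only a sketch: the vocab size of 6400 and the SwiGLU hidden size of 2048 are assumptions, not values given in this thread, and norms/biases/vision-projector weights are ignored.

```python
def estimate_params(dim=768, n_layers=16, n_heads=16, n_kv_heads=8,
                    vocab=6400, ffn_hidden=2048):
    """Rough transformer parameter count (norms, biases, projector ignored)."""
    head_dim = dim // n_heads                        # 768 / 16 = 48
    # Grouped-query attention: full-size wq and wo, smaller shared wk/wv
    attn = 2 * dim * dim + 2 * n_kv_heads * head_dim * dim
    ffn = 3 * dim * ffn_hidden                       # SwiGLU: gate, up, down
    return vocab * dim + n_layers * (attn + ffn)

print(estimate_params())  # 108724224, i.e. ~108.7M, close to the reported 109.34M
```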
I checked the loss logs; in normal training the loss does indeed drop from the 5.x range to the 2.x range.
From the information provided, by the time the GPU run reached iteration 300, the *.pth weights had already been saved three times.
When you then trained on CPU, it continued from the *.pth weights left by the GPU run's 300 iterations.
So the loss going "5.x -> 2.x" is expected; the same thing would happen on GPU too, and it has no direct connection to whether you train on CPU or GPU.
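The resume behavior described above can be sketched like this. It is a minimal stand-in using pickle; the checkpoint filename and state layout are hypothetical, not what the actual script uses.

```python
import os
import pickle

CKPT = "model_state.pkl"  # hypothetical checkpoint path

def save_checkpoint(state):
    with open(CKPT, "wb") as f:
        pickle.dump(state, f)

def load_or_init():
    """Resume from an existing checkpoint if one is on disk, else start fresh."""
    if os.path.exists(CKPT):
        with open(CKPT, "rb") as f:
            return pickle.load(f), True   # silently resumes prior progress
    return {"step": 0, "loss": None}, False

# The GPU run saves its progress...
save_checkpoint({"step": 300, "loss": 5.930})
# ...so a later CPU run starts from step 300, not from scratch, and its
# first logged losses reflect the earlier training.
state, resumed = load_or_init()
print(resumed, state["step"])  # True 300
```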
Also, a drop in loss across iterations does not necessarily mean the model's ability jumped. It may simply be that the image-text pairs in the loss=2.x stretch of iterations are inherently easier, and thus easier to predict, than those in the loss=5.x stretch, so their loss is naturally lower than other iterations'.
Feel free to follow up with any other questions~