
Training loss differs a lot between GPU and CPU #2

Open
cqcracked opened this issue Oct 5, 2024 · 1 comment

Comments

cqcracked commented Oct 5, 2024

Training 1-pretrain-vlm.py on the GPU looks like this:
Trainable model parameters: 109.34016 million = 0.10934016 B (Billion)
Epoch:0/19 loss:8.766 lr:0.0004000 epoch_Time:3503.0min: 0/24808
Epoch:0/19 loss:6.576 lr:0.0004000 epoch_Time:513.0min: 100/24808
Epoch:0/19 loss:6.067 lr:0.0004000 epoch_Time:522.0min: 200/24808
Epoch:0/19 loss:5.930 lr:0.0004000 epoch_Time:522.0min: 300/24808
Training on the CPU looks like this:
Epoch:[0/19]0|24808 loss:5.749 lr:0.0004000 epoch_Time:10788.0min: 0/24808
Epoch:0/19 loss:2.958 lr:0.0004000 epoch_Time:6120.0min: 100/24808
On the CPU, the loss already drops to 2.95 after only 100 batches.
What is going on?
The configuration is as follows (sketched as a dataclass below):
dim: int = 768,
n_layers: int = 16,
n_heads: int = 16,
n_kv_heads: int = 8,
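
For reference, a minimal sketch of how these fields could sit in a Python config dataclass. The class name and comments are assumptions; only the four fields and their defaults come from the report above:

```python
from dataclasses import dataclass

@dataclass
class VLMConfig:
    # Values reported in this issue.
    dim: int = 768       # hidden size
    n_layers: int = 16   # number of transformer blocks
    n_heads: int = 16    # attention (query) heads
    n_kv_heads: int = 8  # key/value heads (grouped-query attention:
                         # 16 query heads share 8 KV heads)
```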

jingyaogong (Owner) commented Oct 5, 2024


[screenshot: loss log from a normal training run]

I checked the loss log, and a normal training run does indeed go from 5.x down to 2.x.

Judging from the information provided above, by the time the GPU run reached iter 300 it had already saved the *.pth weights 3 times.

When you then trained on the CPU, it continued from that *.pth, i.e. from the result of the previous GPU run's 300 iters (see the sketch below).

So a loss going "5.x -> 2.x" is normal; you would see exactly the same thing on the GPU. It has no direct relation to whether you train on the CPU or the GPU.
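
A minimal sketch of the save/resume pattern described above, assuming the script writes a *.pth state dict every 100 iterations and loads it on startup if one exists. The file name, save interval, and toy model here are assumptions, not the repo's exact code:

```python
import os
import torch
import torch.nn as nn

CKPT_PATH = "out/pretrain_vlm.pth"  # hypothetical path; the repo's actual name may differ
device = "cuda" if torch.cuda.is_available() else "cpu"
model = nn.Linear(768, 768).to(device)  # stand-in for the real VLM

# On startup: if an earlier run (e.g. the GPU run above) already saved weights,
# training resumes from them rather than from random initialization.
if os.path.exists(CKPT_PATH):
    model.load_state_dict(torch.load(CKPT_PATH, map_location=device))

optimizer = torch.optim.AdamW(model.parameters(), lr=4e-4)
criterion = nn.MSELoss()

for step in range(301):
    x = torch.randn(8, 768, device=device)
    loss = criterion(model(x), x)  # dummy objective, just for the sketch
    optimizer.zero_grad()
    loss.backward()
    optimizer.step()
    if step % 100 == 0:
        os.makedirs(os.path.dirname(CKPT_PATH), exist_ok=True)
        # By iter 300 this file has been overwritten several times; a later CPU
        # run picks these weights up, which is why its first losses look so low.
        torch.save(model.state_dict(), CKPT_PATH)
```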

Also, a drop in the loss across iterations does not necessarily mean the model's ability jumped. It may simply be that the image-text pairs in the loss=2.x stretch of iterations are inherently easier to predict than those in the loss=5.x stretch, so their loss is naturally lower than at other iterations (a sketch of one way to check this follows).
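
One way to separate those two effects, as an illustrative sketch (none of these names come from the repo): periodically evaluate on the same fixed batch, so any drop in that number reflects the model rather than batch difficulty:

```python
import torch

@torch.no_grad()
def eval_fixed_batch(model, criterion, fixed_x, fixed_y):
    """Loss on one held-out batch that never changes between calls.

    The running training loss mixes two things: model quality and how easy
    the current batch happens to be. Holding the batch constant removes the
    second factor, so a drop here indicates genuine improvement.
    """
    model.eval()
    loss = criterion(model(fixed_x), fixed_y).item()
    model.train()
    return loss
```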

Feel free to continue the discussion if you have any other questions~
