1. python train.py, single GPU: training speed 9.0 steps/s
2. fleetrun train.py, single GPU: use_amp=False → 9.0 steps/s; use_amp=True → 3.9 steps/s
3. fleetrun train.py, multi-GPU (6 GPUs): use_amp=False → 3.0 steps/s; use_amp=True → 1.8 steps/s

Issues:
1. Enabling use_amp causes a severe performance drop.
2. Distributed training with fleetrun is much slower than single-GPU training: it takes 3 GPUs to match the throughput of one GPU before, so there is no distributed speedup at all.
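The scaling claim above can be checked with simple arithmetic on the reported numbers (a sketch; the figures are taken directly from the report, and the variable names are illustrative):

```python
# Reported step rates from the issue (use_amp=False in both cases).
single_gpu_steps_per_s = 9.0   # python train.py, 1 GPU
multi_gpu_steps_per_s = 3.0    # fleetrun train.py, 6 GPUs, per-process rate

# Each GPU under fleetrun runs at 1/3 of the standalone step rate,
# so 3 distributed GPUs together only match one standalone GPU.
slowdown = single_gpu_steps_per_s / multi_gpu_steps_per_s
gpus_to_match_one = slowdown

print(slowdown)            # 3.0
print(gpus_to_match_one)   # 3.0
```

Note that in data-parallel training each step processes a 6× larger global batch, so steps/s alone understates aggregate throughput; the complaint here is specifically about the per-process step rate collapsing.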
Please provide your complete training setup, including the machine environment and the training configuration.
Machine environment: GPU type, GPU driver version, CUDA version, cuDNN version, NCCL version, Paddle version.
Training configuration: model size, batch size setting, training data (ideally the data/example bundled with Knover, for a direct comparison), and any other settings that could affect training performance.
In my own test, running fleetrun train.py (i.e. scripts/distributed/train.sh) on the bundled example data with projects/PLATO-2/pretrain/24L_train_stage-1.conf, on a single V100 with CUDA 10.2 and Paddle 2.2.2, use_amp=true (2.5 steps/s) is significantly faster than use_amp=false (0.75 steps/s).
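For reference, the maintainer's single-GPU test could be launched roughly like this (a sketch only: the paths come from the reply above, but the exact flag names accepted by Knover's train.py are an assumption here, and scripts/distributed/train.sh may wrap these details differently):

```shell
# Hedged sketch of the maintainer's test setup (V100, CUDA 10.2, Paddle 2.2.2).
# --gpus is a real fleetrun option; the train.py arguments below are illustrative.
fleetrun --gpus 0 train.py \
    --config projects/PLATO-2/pretrain/24L_train_stage-1.conf \
    --use_amp true
```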