Which script is used for the BERT training benchmark? #84

I see there are two kinds of scripts: one for pre-training, e.g. train.py, and the other for fine-tuning, e.g. run_classify.py. Which one is used for the benchmark?

Comments
We use […]
@luotao1 For ParallelExecutor, how does your QA team calculate the benchmark result? Is it "speed * CPU_NUM" or just speed?
We don't use […]
Then how do we measure whether the speed is comparable with a V100? Are they identical?
It is not identical.
Yes, speed is not linear with CPU_NUM, but I checked the code and found that this speed reflects iteration execution time, not the number of samples actually processed. So my question is: for CPU vs. GPU, we may not be able to compare the speed output from the log directly, given that CPU_NUM is a virtual concept for using CPU multi-cores to exploit data parallelism, while a GPU needs discrete cards to scale out. This speed is more like latency. We can get different speeds with different CPU_NUM values, but how do we compare them fairly with a GPU? That is what I want to ask.
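For illustration, a minimal sketch of that point with made-up numbers (the batch size and iteration times below are assumptions, not taken from any log): the logged speed barely moves as CPU_NUM grows, even though each iteration does CPU_NUM times more work.

```python
# Hypothetical timings, for illustration only.
batch_size = 32
for cpu_num, sec_per_iter in [(1, 0.50), (8, 0.55)]:
    speed = 1.0 / sec_per_iter               # the log's "speed": iterations per second
    samples_per_iter = batch_size * cpu_num  # work actually done per iteration
    print(f"CPU_NUM={cpu_num}: speed={speed:.2f} it/s, "
          f"{samples_per_iter} samples/iter")
```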
How about computing samples/s to compare between CPU and GPU?
I see this calculation logic in the benchmark run.sh: it uses samples/s, which counts both CPU_NUM and batch size (BS).
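As a concrete sketch of that calculation (a guess at the run.sh logic, assuming the logged speed is iterations per second; the function and parameter names here are hypothetical, and the actual script may differ):

```python
def samples_per_sec(iters_per_sec: float, batch_size: int, cpu_num: int) -> float:
    """Convert logged speed (iterations/s) into throughput (samples/s).

    Counting both batch size and CPU_NUM puts CPU and GPU runs on the
    same samples/s scale, which is what makes the comparison fair.
    """
    return iters_per_sec * batch_size * cpu_num

# e.g. 1.8 iterations/s with BS=32 and CPU_NUM=8 -> 460.8 samples/s,
# directly comparable against a V100's samples/s.
print(samples_per_sec(1.8, 32, 8))
```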