Replies: 1 comment
-
micro-batch-size is the number of samples processed in a single forward/backward pass on each data-parallel rank; the global-batch-size worth of samples in one iteration is split into such micro-batches.
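To make that decomposition concrete, here is a minimal Python sketch, assuming a Megatron-style setup where each data-parallel rank accumulates gradients over several micro-batches per optimizer step. The function name and the listed micro-batch sizes are illustrative, not Megatron-LM's actual API:

```python
# Illustrative sketch (not Megatron-LM's actual API): how one iteration's
# global batch decomposes into micro-batches per data-parallel rank.
def num_micro_batches(global_batch_size: int,
                      micro_batch_size: int,
                      data_parallel_size: int) -> int:
    """Number of forward/backward passes each rank runs per optimizer step."""
    samples_per_pass = micro_batch_size * data_parallel_size
    assert global_batch_size % samples_per_pass == 0, \
        "global batch must split evenly into micro-batches"
    return global_batch_size // samples_per_pass

# With the setup from the question below (global-batch-size 24, 2 GPUs,
# pure data parallelism), some valid micro-batch sizes:
for mbs in (1, 2, 3, 4, 6, 12):
    n = num_micro_batches(24, mbs, 2)
    print(f"micro-batch-size={mbs:2d} -> {n:2d} micro-batches per iteration")
```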
-
Hi,
I am testing how the micro-batch-size influences throughput per GPU at a constant global-batch-size.
The results show that as the micro-batch-size increases, the throughput per GPU (TFLOP/s/GPU) also increases.
I ran some tests with a 400M-parameter transformer-based model on 2 A40 GPUs, using only data parallelism. Here are the training arguments:
[Image: training arguments]
Between tests I change only the micro-batch-size, training for 100 iterations with seq_len = 1024 and global-batch-size = 24. Here are the results for different micro-batch-sizes:
[Image: throughput per GPU for different micro-batch-sizes]
I log every 5 iterations and compute the average throughput per GPU.
For each iteration the total computational work is the same, yet the throughput per GPU increases with the micro-batch-size. I suspect this is related to GPU cache behavior or arithmetic intensity, but it is not clear to me. Can anyone provide an in-depth explanation?
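One way to probe the arithmetic-intensity hypothesis outside Megatron is a bare matmul benchmark: the work per sample stays fixed, but larger per-pass batches give the GPU bigger kernels to fill its SMs with, so achieved TFLOP/s rises toward the hardware peak. A minimal PyTorch sketch, where the hidden size, dtype, and batch sizes are arbitrary illustrative choices:

```python
import time
import torch

# Sketch: achieved TFLOP/s of one fp16 linear layer as the per-pass batch
# grows. FLOPs per sample are constant; only the batch size (and hence GPU
# utilization) changes -- mirroring a micro-batch-size sweep.
device = "cuda"
d_model, seq_len, iters = 4096, 1024, 20
layer = torch.nn.Linear(d_model, d_model, device=device, dtype=torch.float16)

for batch in (1, 2, 4, 8, 16):
    x = torch.randn(batch, seq_len, d_model, device=device, dtype=torch.float16)
    for _ in range(3):          # warm-up passes
        layer(x)
    torch.cuda.synchronize()
    start = time.perf_counter()
    for _ in range(iters):
        layer(x)
    torch.cuda.synchronize()
    elapsed = time.perf_counter() - start
    flops = 2 * batch * seq_len * d_model * d_model * iters  # matmul FLOPs
    print(f"batch={batch:2d}: {flops / elapsed / 1e12:6.2f} TFLOP/s")
```

The same logic would apply inside a training step: a larger micro-batch means fewer, larger kernels per iteration, which better amortizes kernel-launch overhead and memory traffic even though the total FLOPs per iteration are unchanged.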