Getting negative throughput value for large batch sizes #128

Closed
mgiessing opened this issue Feb 19, 2024 · 7 comments

@mgiessing

Hi, I'm using optimum-benchmark with the onnxruntime backend (CPU), but for larger batch sizes I get negative throughput values, which doesn't seem correct. Is this a known bug?

This is the param setup:

[...]
  sweeper:
    params:
      benchmark.input_shapes.batch_size: 1,2,4,8,16,32,64
      model: /data/LLMs/onnx/llama-2-7b_onnx,/data/LLMs/onnx/llama-2-7b_onnx_quant

E.g. for the llama-2 ONNX models with onnxruntime (there are always two runs per batch size: one for fp32, the other for int8):

for i in {1..13}; do
    jq '.batch_size, .decode.throughput."mean[tokens/s]"' "experiments/llama-2-7b_ort/$i/benchmark_report.json"
done

1
10.64650340356247
2
12.258327597687689
2
22.51221913908261
4
26.243146016015586
4
46.47543946736317
8
56.99298854476616
8
102.96338019358438
16
156.63038758907555
16
284.71610900884076
32
668.620966745494
32
2320.323920074327
64
-744.356127863481
64
-779.9233473188091

As you can see, for bs=64 the numbers are negative. Is this something you can confirm/reproduce with onnxruntime?

Thanks!
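For easier pairing, here is a small Python sketch that reads the same report files and prints batch size and decode throughput on one line (the run-directory layout and JSON keys are taken from the jq command above; adjust the range and paths to your own sweep):

import json
from pathlib import Path

# Print batch size and decode throughput side by side for each run directory.
for i in range(1, 14):
    report_path = Path(f"experiments/llama-2-7b_ort/{i}/benchmark_report.json")
    report = json.loads(report_path.read_text())
    print(report["batch_size"], report["decode"]["throughput"]["mean[tokens/s]"])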

@IlyasMoutawwakil
Member

Hi! I can't reproduce it. Can you please share more of the experiment config, the benchmark report, and the CLI log?
Is this using the main branch? Can you try running with 0.0.1, to see if it's caused by one of my new PRs?

@mgiessing
Author

I was using the main branch. My problem with version 0.0.1 is that it doesn't seem to recognize some keys in the backend section (such as export: false, which I use because I'm running pre-converted models, since I don't have a lot of space in my root partition).

I'll try to update the benchmark config so it runs with 0.0.1.

If you want to replicate my setup, I did the following (let's use TinyLlama, as it is much smaller but shows the same negative-throughput issue):

Use Hugging Face Optimum to export to ONNX:

optimum-cli export onnx --model TinyLlama/TinyLlama-1.1B-Chat-v1.0 /data/LLMs/onnx/tinyllama_onnx/

Then use this benchmark config (batch sizes changed to only the high values, and the model to TinyLlama):

defaults:
  - backend: onnxruntime # default backend
  - launcher: process # default launcher
  - benchmark: inference # default benchmark
  - experiment # inheriting experiment schema
  - _self_ # for hydra 1.1 compatibility
  - override hydra/job_logging: colorlog # colorful logging
  - override hydra/hydra_logging: colorlog # colorful logging

experiment_name: tinyllama_ort
model: tinyllama-model

backend:
  device: cpu
  export: false
  task: text-generation
  library: transformers

benchmark:
  memory: true
  warmup_runs: 10
  input_shapes:
    batch_size: 1
    sequence_length: 256
  new_tokens: 512

# hydra/cli specific settings
hydra:
  run:
    # where to store run results
    dir: experiments/${experiment_name}
  sweep:
    # where to store sweep results
    dir: experiments/${experiment_name}
  job:
    # change working directory to the run directory
    chdir: true
    env_set:
      # set environment variable OVERRIDE_BENCHMARKS to 1
      # to not skip benchmarks that have been run before
      OVERRIDE_BENCHMARKS: 1
  sweeper:
    params:
      benchmark.input_shapes.batch_size: 64,128
      model: /data/LLMs/onnx/tinyllama_onnx

Run with:

optimum-benchmark --config-dir benchmark_configs/ --config-name tinyllama_ort --multirun

@IlyasMoutawwakil
Member

@mgiessing thanks, I was able to reproduce it with smaller values for sequence length and new tokens.
It seems that, for some reason, generate takes less time than forward, which doesn't make sense:

[PROC-0][2024-02-19 17:25:34,014][latency][INFO] -              + forward latency: 6.503192 ± 2 x 0.068859 (s)
[PROC-0][2024-02-19 17:25:44,944][latency][INFO] -              + generate latency: 1.214331 ± 2 x 0.003735 (s)

The decoding latency is computed from these two measurements (prefill = forward, decode = generate - forward). I'll investigate what's happening here.
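For context, a minimal sketch of how a negative decode throughput falls out of these measurements; the decoded-token count used here is an illustrative assumption, not optimum-benchmark's exact accounting:

# Sketch: decode throughput turns negative whenever the measured
# generate latency is smaller than the forward (prefill) latency.

def decode_throughput(forward_latency_s, generate_latency_s, batch_size, new_tokens):
    """Tokens/s for the decode phase, with decode = generate - forward."""
    decode_latency_s = generate_latency_s - forward_latency_s  # negative here
    decoded_tokens = batch_size * new_tokens  # illustrative token count
    return decoded_tokens / decode_latency_s

# With the latencies from the log above (6.50 s forward vs 1.21 s generate),
# the decode latency is negative, so the reported throughput is too.
print(decode_throughput(6.503192, 1.214331, batch_size=64, new_tokens=512))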

@IlyasMoutawwakil
Member

IlyasMoutawwakil commented Feb 19, 2024

Okay, found it. I see why it's only happening with onnxruntime: I was filtering some inputs to avoid passing position_ids to models that don't require them (causal LMs are a mess), but other inputs got filtered accidentally, and since generate can be run with empty inputs it didn't raise any errors (it just decodes starting from a BOS token).
Thanks a lot for spotting this! I'll fix it ASAP and add some sanity checks to keep this from ever happening again.
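To illustrate the kind of filtering involved, a hypothetical sketch (not optimum-benchmark's actual code; the model path and input values are placeholders):

import onnxruntime as ort

# Load the exported decoder and collect the input names it actually declares.
session = ort.InferenceSession("decoder_model.onnx")  # placeholder path
expected_names = {node.name for node in session.get_inputs()}

# Inputs generated for the benchmark (tensor values elided for brevity).
generated_inputs = {"input_ids": ..., "attention_mask": ..., "position_ids": ...}

filtered_inputs = {}
for name, value in generated_inputs.items():
    if name in expected_names:
        filtered_inputs[name] = value
    else:
        # The bug was the silent version of this branch: extra inputs were
        # dropped without warning, and generate happily ran on (nearly) empty
        # inputs, decoding from just the BOS token.
        print(f"Input {name} is not in expected inputs names. Removing it.")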

@mgiessing
Author

Ah, amazing! Thanks for the quick investigation and for finding it, that's really appreciated :)

@IlyasMoutawwakil
Member

IlyasMoutawwakil commented Feb 20, 2024

@mgiessing fixed in #129 🤗
I also added some logging to notify users when an input is generated but then removed for a specific backend:

[08:35:31][test-cli][INFO] - [PROC-0][2024-02-20 08:35:31,920][onnxruntime][WARNING] - Input position_ids is not in expected inputs names. Removing it.

@mgiessing
Author

Thanks a lot @IlyasMoutawwakil - much appreciated & everything works fine now :)
