Getting negative throughput value for large batch sizes #128

Hi, I'm using optimum-benchmark with the onnxruntime backend (CPU), but for larger batch sizes I get negative throughput values, which doesn't seem correct. Is this a known bug?

This is the param setup: for llama-2 ONNX models with onnxruntime there are always two runs for each batch size, one for fp32 and the other for int8. For bs=64 the numbers come out negative. Is this something you can confirm/reproduce with onnxruntime?

Thanks!

Comments
Hi! I can't reproduce it. Can you please share more of the experiment config, the benchmark report, and the CLI log?
I was using the main branch; my problem with version 0.0.1 is that it doesn't seem to recognize any of the keys in the config. I'll try to update the benchmark config so it runs with 0.0.1. If you want to replicate my setup, I did the following (let's use TinyLlama, since it is much smaller but shows the same "negative throughput" issue). First, export the model to ONNX with Hugging Face Optimum (a sketch of the command is below).
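A minimal export sketch, assuming the TinyLlama/TinyLlama-1.1B-Chat-v1.0 checkpoint (the model ID is my assumption, not stated in the thread) and the output directory referenced in the config below:

```bash
# Export TinyLlama to ONNX for text generation with Hugging Face Optimum (illustrative model ID)
optimum-cli export onnx \
  --model TinyLlama/TinyLlama-1.1B-Chat-v1.0 \
  --task text-generation \
  /data/LLMs/onnx/tinyllama_onnx
```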
Then run the benchmark with this config (I only changed the batch sizes to high values and the model to TinyLlama):

```yaml
defaults:
  - backend: onnxruntime # default backend
  - launcher: process # default launcher
  - benchmark: inference # default benchmark
  - experiment # inheriting experiment schema
  - _self_ # for hydra 1.1 compatibility
  - override hydra/job_logging: colorlog # colorful logging
  - override hydra/hydra_logging: colorlog # colorful logging

experiment_name: tinyllama_ort
model: tinyllama-model

backend:
  device: cpu
  export: false
  task: text-generation
  library: transformers

benchmark:
  memory: true
  warmup_runs: 10
  input_shapes:
    batch_size: 1
    sequence_length: 256
  new_tokens: 512

# hydra/cli specific settings
hydra:
  run:
    # where to store run results
    dir: experiments/${experiment_name}
  sweep:
    # where to store sweep results
    dir: experiments/${experiment_name}
  job:
    # change working directory to the run directory
    chdir: true
    env_set:
      # set environment variable OVERRIDE_BENCHMARKS to 1
      # to not skip benchmarks that have been run before
      OVERRIDE_BENCHMARKS: 1
  sweeper:
    params:
      benchmark.input_shapes.batch_size: 64,128
      model: /data/LLMs/onnx/tinyllama_onnx
```

Run with:
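Presumably a hydra-style invocation along these lines (the config directory, the config file name, and the --multirun flag needed to apply the sweeper params are my assumptions, not quoted from the thread):

```bash
# assumes the config above is saved as tinyllama_ort.yaml in the current directory
optimum-benchmark --config-dir . --config-name tinyllama_ort --multirun
```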
@mgiessing thanks, I was able to reproduce it with smaller values for sequence length and new tokens. Decoding latency is computed from these two measurements (prefill = forward, decode = generate - forward); I'll investigate what's happening here.
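To illustrate with made-up numbers how that subtraction can go negative (assuming decode throughput is new_tokens x batch_size divided by decode latency, which is not spelled out in the thread): with the config above at batch_size=64 and new_tokens=512, if the generate pass is measured at 95 s while the separately measured prefill forward pass takes 100 s, then decode latency = 95 - 100 = -5 s and decode throughput = 64 * 512 / -5 = -6553.6 tokens/s. Whenever the measured forward latency exceeds the measured generate latency, the reported throughput flips negative.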
Okay, found it; I see why it's only happening with this setup.
Ah, amazing! Thanks for your quick investigation and finding, that is really appreciated :)
@mgiessing fixed in #129 🤗
Thanks a lot @IlyasMoutawwakil, much appreciated, and everything works fine now :)