make benchmark_throughput static support single image input #718

Open

wants to merge 3 commits into base: habana_main
Changes from 1 commit
44 changes: 35 additions & 9 deletions benchmarks/benchmark_throughput.py
@@ -7,6 +7,7 @@
from functools import cache
from typing import Dict, List, Optional, Tuple

import requests as reqs
import torch
import uvloop
from PIL import Image
@@ -355,6 +356,16 @@
    if args.dataset is None:
        vocab_size = tokenizer.vocab_size
        requests = []
        mm_data = None
        if args.mm_data:
            mm_data_url = "https://llava-vl.github.io/static/images/view.jpg"
            try:
                raw_image = Image.open(reqs.get(
                    mm_data_url, stream=True).raw).convert("RGB")
            except Exception as e:
                print(f"Failed to download image from {mm_data_url}: {e}")
                raw_image = None
            mm_data = {"image": raw_image}

Hi, multimodal data can be run by using a json file and not setting the 'dataset' parameter, with the sample_requests method. Isn't that sufficient in the required scenarios?

@yma11 (Author) commented on Jan 31, 2025:
Can you provide a detailed command for the test method you describe? As I looked through the code, you may be referring to sharegpt4v_instruct_gpt4-vision_cap100k.json, which should work as a real-dataset test. In this PR, we would like to append an image to test the multi-modal path with FIXED input and output lengths; the full command looks like: python benchmarks/benchmark_throughput.py --model=meta-llama/Llama-3.2-11B-Vision-Instruct --max_model_len=4096 --input-len=1024 --output-len=2048 --num-prompts=1000 --max-num-seqs=64 --mm-data.

        for _ in range(args.num_prompts):

            request_tokenizer = tokenizer
@@ -363,35 +374,46 @@
                lora_request, lora_tokenizer = get_random_lora_request(args)
                if lora_tokenizer:
                    request_tokenizer = lora_tokenizer

            if args.mm_data:
kdamaszk marked this conversation as resolved.
                input_len = args.input_len - 2
            else:
                input_len = args.input_len

Check failure on line 380 in benchmarks/benchmark_throughput.py (GitHub Actions / ruff (3.12)):
Ruff (SIM108): benchmarks/benchmark_throughput.py:377:13: SIM108 Use ternary operator `input_len = args.input_len - 2 if args.mm_data else args.input_len` instead of `if`-`else`-block
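For reference, applying the ruff suggestion would collapse the added if/else into a single line (a sketch of the suggested form, not part of this diff):

            # equivalent to the if/else block above, as suggested by SIM108
            input_len = args.input_len - 2 if args.mm_data else args.input_len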
            # Synthesize a prompt with the given input length.
            candidate_ids = [
                random.randint(0, vocab_size - 1)
                for _ in range(args.input_len)
                for _ in range(input_len)
            ]
            # As tokenizer may add additional tokens like BOS, we need to try
            # different lengths to get the desired input length.
            for _ in range(5):  # Max attempts to correct
                candidate_prompt = request_tokenizer.decode(candidate_ids)
                tokenized_len = len(request_tokenizer.encode(candidate_prompt))

                if tokenized_len == args.input_len:
                if tokenized_len == input_len:
                    break

                # Adjust length based on difference
                diff = args.input_len - tokenized_len
                diff = input_len - tokenized_len
                if diff > 0:
                    candidate_ids.extend([
                        random.randint(100, vocab_size - 100)
                        for _ in range(diff)
                    ])
                else:
                    candidate_ids = candidate_ids[:diff]
            requests.append(
                SampleRequest(prompt=candidate_prompt,
                              prompt_len=args.input_len,
                              expected_output_len=args.output_len,
                              lora_request=lora_request))
            if mm_data is not None:
                requests.append(
                    SampleRequest(prompt="<|image|><|begin_of_text|>" + candidate_prompt,

Check failure on line 406 in benchmarks/benchmark_throughput.py (GitHub Actions / ruff (3.12)):
Ruff (E501): benchmarks/benchmark_throughput.py:406:81: E501 Line too long (93 > 80)
kdamaszk marked this conversation as resolved.
                                  prompt_len=args.input_len,
                                  expected_output_len=args.output_len,
                                  lora_request=lora_request,
                                  multi_modal_data=mm_data))
            else:
                requests.append(
                    SampleRequest(prompt=candidate_prompt,
                                  prompt_len=args.input_len,
                                  expected_output_len=args.output_len,
                                  lora_request=lora_request))
    else:
        requests = sample_requests(tokenizer, args)

@@ -497,6 +519,10 @@
        default=None,
        help="Path to the lora adapters to use. This can be an absolute path, "
        "a relative path, or a Hugging Face model identifier.")
    parser.add_argument("--mm-data",
                        action='store_true',
                        default=False,
                        help="Add multi-modal data to the requests.")

    parser = AsyncEngineArgs.add_cli_args(parser)
    args = parser.parse_args()
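
With the new flag in place, a benchmark run that exercises the single-image multi-modal path would look like the command quoted by the author in the review thread (model name and lengths taken from that comment; adjust to your setup):

    python benchmarks/benchmark_throughput.py \
        --model=meta-llama/Llama-3.2-11B-Vision-Instruct \
        --max_model_len=4096 \
        --input-len=1024 \
        --output-len=2048 \
        --num-prompts=1000 \
        --max-num-seqs=64 \
        --mm-data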