Commit

trt force export
IlyasMoutawwakil committed Dec 10, 2024
1 parent ef84a5a commit 07218dc
Showing 3 changed files with 6 additions and 3 deletions.
7 changes: 4 additions & 3 deletions examples/cuda_trt_llama.yaml
@@ -15,10 +15,11 @@ launcher:
 backend:
   device: cuda
   device_ids: 0
-  max_batch_size: 4
-  max_new_tokens: 32
-  max_prompt_length: 64
+  force_export: true
   model: TinyLlama/TinyLlama-1.1B-Chat-v1.0
+  max_prompt_length: 64
+  max_new_tokens: 32
+  max_batch_size: 4
 
 scenario:
   input_shapes:
1 change: 1 addition & 0 deletions optimum_benchmark/backends/tensorrt_llm/backend.py
@@ -46,6 +46,7 @@ def load_trtmodel_from_pretrained(self) -> None:
             max_batch_size=self.config.max_batch_size,
             max_new_tokens=self.config.max_new_tokens,
             max_beam_width=self.config.max_beam_width,
+            force_export=self.config.force_export,
             **self.config.model_kwargs,
         )

1 change: 1 addition & 0 deletions optimum_benchmark/backends/tensorrt_llm/config.py
@@ -18,6 +18,7 @@ class TRTLLMConfig(BackendConfig):
     pp: int = 1
     use_fp8: bool = False
     dtype: str = "float16"
+    force_export: bool = False
     optimization_level: int = 2
     use_cuda_graph: bool = False

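Taken together, the two code changes thread a new `force_export` flag from the backend config into the model-loading call. A minimal, self-contained sketch of that flow (not the actual optimum-benchmark code; `load_trtmodel_kwargs` and the field subset shown are illustrative assumptions):

```python
# Hypothetical sketch of how the new force_export flag flows from the
# TRT-LLM backend config into the keyword arguments of the model loader.
from dataclasses import dataclass, field
from typing import Any, Dict


@dataclass
class TRTLLMConfig:
    # Subset of fields, for illustration only.
    tp: int = 1
    pp: int = 1
    use_fp8: bool = False
    dtype: str = "float16"
    force_export: bool = False  # new flag; default False preserves prior behavior
    optimization_level: int = 2
    use_cuda_graph: bool = False
    model_kwargs: Dict[str, Any] = field(default_factory=dict)


def load_trtmodel_kwargs(config: TRTLLMConfig) -> Dict[str, Any]:
    # Mirrors how backend.py forwards config values (plus model_kwargs)
    # to the pretrained-model loader.
    return dict(
        force_export=config.force_export,
        **config.model_kwargs,
    )


kwargs = load_trtmodel_kwargs(TRTLLMConfig(force_export=True))
```

Because the field defaults to `False`, existing configs are unaffected; only a config that opts in (like the `force_export: true` line added to `examples/cuda_trt_llama.yaml`) changes the loader's behavior.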
