Add Triton benchmarks for blog #509
base: main
Conversation
Signed-off-by: Rishi Chandra <[email protected]>
7c26cc0 to 2614369
1. [`spark_resnet.py`](spark_resnet.py): Uses predict_batch_udf to perform in-process prediction on the GPU.
2. [`spark_resnet_triton.py`](spark_resnet_triton.py): Uses predict_batch_udf to send inference requests to Triton, which performs inference on the GPU.
Spark cannot change the task parallelism within a stage based on the resources required (i.e., multiple CPUs for preprocessing vs. a single GPU for inference). Therefore, implementation (1) is limited to one task per GPU so that only one instance of the model runs on the GPU. In contrast, implementation (2) allows as many tasks to run in parallel as there are cores on the executor, since Triton handles inference on the GPU.
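The contrast above comes down to what the `predict_batch_udf` prediction function does: in implementation (1) it runs the model itself, while in implementation (2) it only forwards batches to a Triton server. A minimal sketch of the first pattern is below; the toy model that doubles its input is a hypothetical stand-in for loading ResNet-50 onto the GPU, and the commented-out Spark wiring assumes `pyspark.ml.functions.predict_batch_udf` (Spark >= 3.4):

```python
import numpy as np

def make_predict_fn():
    """Called once per Python worker to set up the model.

    In the real benchmark this would load ResNet-50 onto the GPU,
    which is why only one such task can run per GPU. Here a toy
    "model" that doubles its input stands in for the network.
    """
    def predict(inputs: np.ndarray) -> np.ndarray:
        # Receives a numpy batch, returns one prediction row per input row.
        return inputs * 2.0
    return predict

# With Spark available, the function above would be wrapped roughly as:
#
#   from pyspark.ml.functions import predict_batch_udf
#   from pyspark.sql.types import ArrayType, FloatType
#
#   classify = predict_batch_udf(make_predict_fn,
#                                return_type=ArrayType(FloatType()),
#                                batch_size=64)
#   df = df.withColumn("preds", classify("features"))
```

In the Triton variant, `predict` would instead send the batch to the Triton server over HTTP/gRPC, so the function holds no model state and many tasks can share one GPU-resident model.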
For ResNet-50, could multiple model instances fit on the GPU? If so, it might be good to benchmark that case, where multiple Spark tasks run per GPU, each with its own model instance. Since the processes would time-slice the GPU compute, performance could take a hit, but it would still be interesting to compare.
Would it make sense to consolidate this script with spark_resnet.py and select local inference or Triton via a CLI argument?
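The consolidation suggested above could be as simple as a small argparse switch; this is a hypothetical sketch (the flag name and choices are assumptions, not part of the PR):

```python
import argparse

def parse_backend(argv):
    """Parse a hypothetical --backend flag selecting the inference path."""
    parser = argparse.ArgumentParser(description="ResNet benchmark")
    parser.add_argument(
        "--backend",
        choices=["local", "triton"],
        default="local",
        help="run inference in-process (local) or via a Triton server",
    )
    return parser.parse_args(argv).backend
```

The script's main path could then pick the appropriate `make_predict_fn` based on the returned value.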
Scripts, configs, and instructions to reproduce the blog benchmarks, demonstrating the benefit of using Triton for CPU parallelism.