
Feature request: GPU benchmarks for ML workloads. #160

Open
ywilke opened this issue Jan 14, 2025 · 2 comments

Comments

@ywilke

ywilke commented Jan 14, 2025

I love what you are doing at sparecores!

I was trying to find good benchmarks comparing the GPU instances that different cloud providers offer. Unfortunately, I was unable to find any good comparison of ML-workload performance between the different GPU instances. Even when I tried to find a comparison between the GPUs that the cloud instances contain, I could not find a good benchmark comparing them.
It would be great if you could add some GPU benchmarks to sparecores covering common ML workloads, such as LLM and ResNet training/inference. I am not sure if there is already a good benchmark suite that you could run.

Maybe this repository is not the correct place for feature requests. Let me know if you want to move it somewhere else.

@daroczig
Member

daroczig commented Jan 14, 2025

Thanks for this request! We are currently working on LLM inference speed benchmarks, which I was hoping to ship in a week or so, but we hit a problem with llama-bench from llama.cpp scaling to multiple GPUs [ggerganov/llama.cpp/discussions/11236]. We will see if we can resolve it, or we might need to write custom benchmarking scripts supporting both CPU and (multi-)GPU use cases from tiny to larger models. I will keep you posted here.
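For reference, a rough sketch of how such a comparison might be scripted around llama-bench; the model path, token counts, and GPU layer/split settings below are illustrative placeholders rather than our final setup:

```python
import json
import subprocess

def run_llama_bench(model_path: str, n_gpu_layers: int, split_mode: str = "layer"):
    """Run llama-bench once and return its parsed JSON results."""
    cmd = [
        "./llama-bench",
        "-m", model_path,            # GGUF model to benchmark (placeholder path below)
        "-p", "512",                 # prompt-processing tokens
        "-n", "128",                 # generated tokens
        "-ngl", str(n_gpu_layers),   # 0 = CPU only, large value = offload all layers
        "-sm", split_mode,           # how layers are split across multiple GPUs
        "-o", "json",                # machine-readable output
    ]
    result = subprocess.run(cmd, capture_output=True, text=True, check=True)
    return json.loads(result.stdout)

# Compare CPU-only vs. full GPU offload for a small (placeholder) model.
for ngl in (0, 99):
    for record in run_llama_bench("models/example-7b-q4_k_m.gguf", ngl):
        print(ngl, record)
```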

We also have plans to support other benchmarks: for example, we started GBM model training benchmarks on CPU and GPU following @szilard's related benchmarks, but that was put back on the backlog due to other priorities. I think we can pick it up after the above-mentioned LLM inference speed updates.
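For the GBM side, a minimal sketch of the kind of CPU vs. GPU training comparison we have in mind (the synthetic data and XGBoost parameters below are placeholders; @szilard's benchmarks use fixed public datasets and several libraries):

```python
import time

import numpy as np
import xgboost as xgb

# Synthetic stand-in data; a real benchmark would use a fixed public dataset.
rng = np.random.default_rng(42)
X = rng.random((1_000_000, 50), dtype=np.float32)
y = rng.integers(0, 2, size=1_000_000)
dtrain = xgb.DMatrix(X, label=y)

base_params = {"objective": "binary:logistic", "max_depth": 10, "eta": 0.1,
               "tree_method": "hist"}

for device in ("cpu", "cuda"):  # "cuda" needs a GPU-enabled XGBoost (>= 2.0) build
    params = dict(base_params, device=device)
    start = time.perf_counter()
    xgb.train(params, dtrain, num_boost_round=100)
    print(f"{device}: {time.perf_counter() - start:.1f}s")
```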

@ywilke
Author

ywilke commented Jan 14, 2025

Great to hear! Looking forward to seeing the data once it is ready.

P.S. I will add two more feature requests. You may already be working on those as well, but I wanted to note that there is interest in them.
