Can I use a GPU other than a100, h100? #58
fadu-defresne opened this issue:

I want to measure storage performance, but I want to use the GPU model (NVIDIA GeForce RTX 2060) I currently have. Is it possible?

First of all, I ran the command below, but only the CPU utilization reaches 100%; the installed GPU is not utilized at all. (The GPU was installed correctly with the latest NVIDIA driver.)

./benchmark.sh run --hosts 127.0.0.1 --workload unet3d --accelerator-type a100 --num-accelerators 8 --results-dir /storage/log/nvme0n1_unet3d_xfs_files500_proc8_20240329_152122_340072 --param dataset.num_files_train=500 --param dataset.data_folder=/mnt/nvme0n1/unet3d

Comments
Hi,
* I want to measure storage performance, but I want to use and measure the GPU model (NVIDIA GeForce RTX 2060) I currently have. Is it possible?
The MLPerf Storage benchmark puts the same load on the storage system that training with GPUs would, but it does not actually use any GPUs; it simulates what a GPU would do. The benchmark.sh script:
1. Calculates, from the amount of DRAM in the machine(s) running the benchmark, how much (simulated) data must be used during the (simulated) training task to ensure that no significant data caching takes place on the host running the benchmark (see the sketch after this list),
2. Generates that number of data files (filled with random bytes), and
3. Reads those data files back from the storage system in the same patterns and at the same intervals as an actual neural-network training run on the Unet3D workload.
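As an illustration of step 1, here is a back-of-the-envelope sizing sketch in Python. The 5x margin and the ~140 MiB per-file size are assumptions chosen for illustration only, not the benchmark's actual numbers; the real script derives its own values:

def min_train_files(host_dram_bytes, file_size_bytes, margin=5.0):
    # Smallest file count whose total size exceeds host DRAM by `margin`,
    # so the host page cache cannot hold a meaningful fraction of the dataset.
    target_bytes = host_dram_bytes * margin
    return int(target_bytes // file_size_bytes) + 1

# Example: 64 GiB of host DRAM, ~140 MiB per Unet3D data file (illustrative).
print(min_train_files(64 * 2**30, 140 * 2**20))  # -> 2341 files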
The only (simulated) GPU supported in the v0.5 release was the V100; the only (simulated) GPUs supported in the upcoming v1.0 release will be the A100 and the H100.
You can experiment with other “accelerator types” (other NVIDIA GPUs, or silicon from other vendors) by changing the “sleep time” in the configuration file(s) and then running the benchmark. The “sleep time” used in the v0.5 release for the Unet3D workload was the time it took a V100 GPU to compute one batch of the Unet3D workload, and similarly for the other workloads and accelerator types. To run the benchmark with a simulated RTX 2060 accelerator, you would first run a real Unet3D training task on the RTX 2060 and record how long the RTX 2060 takes to compute a single batch (averaged across all batches); you could then run the benchmark with that “sleep time”.
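For example, a minimal PyTorch sketch of that measurement could look like the following. Here `model`, `loader`, `criterion`, and `optimizer` are placeholders for your actual Unet3D training setup, not names from the benchmark itself:

import time
import torch

def average_batch_time(model, loader, criterion, optimizer, device="cuda"):
    # Train for one pass over the data, timing each batch end-to-end on the GPU.
    model.to(device).train()
    times = []
    for inputs, labels in loader:
        inputs, labels = inputs.to(device), labels.to(device)
        torch.cuda.synchronize()          # start timing from an idle GPU
        start = time.perf_counter()
        optimizer.zero_grad()
        loss = criterion(model(inputs), labels)
        loss.backward()
        optimizer.step()
        torch.cuda.synchronize()          # wait for this batch to finish
        times.append(time.perf_counter() - start)
    return sum(times) / len(times)        # the average per-batch compute time

The resulting average is the value you would substitute for the V100 sleep time in the workload's configuration file.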
Thanks,
Curtis