I am seeing constant throttling even while idling. Right now, with the system idle, I see:
nvidia-smi -q -d PERFORMANCE
==============NVSMI LOG==============
Timestamp : Sat May 8 13:19:52 2021
Driver Version : 465.27
CUDA Version : 11.3
Attached GPUs : 1
GPU 00000000:2D:00.0
Performance State : P0
Clocks Throttle Reasons
Idle : Not Active
Applications Clocks Setting : Not Active
SW Power Cap : Not Active
HW Slowdown : Not Active
HW Thermal Slowdown : Not Active
HW Power Brake Slowdown : Not Active
Sync Boost : Not Active
SW Thermal Slowdown : Active
Display Clock Setting : Not Active
Here SW Thermal Slowdown indicates that the GPU is being throttled, even though it is at only 59 degrees Celsius. Running glxgears and checking the clocks, I get:
nvidia-smi -q -d CLOCK
==============NVSMI LOG==============
Timestamp : Sat May 8 13:23:43 2021
Driver Version : 465.27
CUDA Version : 11.3
Attached GPUs : 1
GPU 00000000:2D:00.0
Clocks
Graphics : 139 MHz
SM : 139 MHz
Memory : 405 MHz
Video : 544 MHz
Applications Clocks
Graphics : N/A
Memory : N/A
Default Applications Clocks
Graphics : N/A
Memory : N/A
Max Clocks
Graphics : 1911 MHz
SM : 1911 MHz
Memory : 3504 MHz
Video : 1708 MHz
Max Customer Boost Clocks
Graphics : N/A
SM Clock Samples
Duration : 18446744073709.55 sec
Number of Samples : 100
Max : 1531 MHz
Min : 139 MHz
Avg : 0 MHz
Memory Clock Samples
Duration : 18446744073709.55 sec
Number of Samples : 100
Max : 3504 MHz
Min : 405 MHz
Avg : 0 MHz
Clock Policy
Auto Boost : N/A
Auto Boost Default : N/A
So the GPU is clearly being heavily throttled.
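To put a number on that, here is a minimal Python sketch that parses the Clocks and Max Clocks sections of the output above and reports each current clock as a fraction of its maximum. The helper names `parse_clock_sections` and `clock_fractions` and the hard-coded sample are illustrative assumptions, not part of nvidia-smi; the sample is copied from the output above.

```python
import re

# Sample copied from the `nvidia-smi -q -d CLOCK` output above (trimmed).
SAMPLE = """\
Clocks
Graphics : 139 MHz
SM : 139 MHz
Memory : 405 MHz
Video : 544 MHz
Applications Clocks
Graphics : N/A
Memory : N/A
Max Clocks
Graphics : 1911 MHz
SM : 1911 MHz
Memory : 3504 MHz
Video : 1708 MHz
"""


def parse_clock_sections(text):
    """Group 'Name : N MHz' lines under the most recent section header.

    Hypothetical helper: any line without ' : ' is treated as a header.
    """
    sections = {}
    current = None
    for line in text.splitlines():
        line = line.strip()
        if not line:
            continue
        if " : " not in line:
            current = line
            sections[current] = {}
            continue
        if current is None:
            continue
        name, value = (part.strip() for part in line.split(" : ", 1))
        match = re.fullmatch(r"(\d+) MHz", value)
        if match:  # entries such as N/A are skipped
            sections[current][name] = int(match.group(1))
    return sections


def clock_fractions(sections):
    """Return current clock / max clock for every clock present in both sections."""
    current = sections.get("Clocks", {})
    maximum = sections.get("Max Clocks", {})
    return {
        name: current[name] / maximum[name]
        for name in current
        if maximum.get(name)
    }


if __name__ == "__main__":
    for name, fraction in clock_fractions(parse_clock_sections(SAMPLE)).items():
        print(f"{name}: {fraction:.1%} of max")
```

On the sample above, the graphics and SM clocks come out at roughly 7% of their maximum, and the memory clock at roughly 12%.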
My guess is that this is related to the following settings:
nvidia-smi -q -d TEMPERATURE
==============NVSMI LOG==============
Timestamp : Sat May 8 13:25:04 2021
Driver Version : 465.27
CUDA Version : 11.3
Attached GPUs : 1
GPU 00000000:2D:00.0
Temperature
GPU Current Temp : 56 C
GPU Shutdown Temp : 102 C
GPU Slowdown Temp : 97 C
GPU Max Operating Temp : 57 C
GPU Target Temperature : N/A
Memory Current Temp : N/A
Memory Max Operating Temp : N/A
Interestingly, if I enable thermald with the --adaptive flag, I get this:
==============NVSMI LOG==============
Timestamp : Sat May 8 13:29:56 2021
Driver Version : 465.27
CUDA Version : 11.3
Attached GPUs : 1
GPU 00000000:2D:00.0
Temperature
GPU Current Temp : 56 C
GPU Shutdown Temp : 102 C
GPU Slowdown Temp : 97 C
GPU Max Operating Temp : 75 C
GPU Target Temperature : N/A
Memory Current Temp : N/A
Memory Max Operating Temp : N/A
And the throttling goes away and performance is suddenly much improved.
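The difference between the two readouts can be made explicit with a small Python sketch that computes the headroom between the current temperature and each threshold. The function names `parse_temperatures` and `headroom` are hypothetical, and the values are copied from the two outputs above. That SW Thermal Slowdown is driven by GPU Max Operating Temp is only the guess stated earlier, not confirmed driver behaviour.

```python
import re

# Values copied from the two `nvidia-smi -q -d TEMPERATURE` readouts above.
BEFORE = """\
GPU Current Temp : 56 C
GPU Shutdown Temp : 102 C
GPU Slowdown Temp : 97 C
GPU Max Operating Temp : 57 C
GPU Target Temperature : N/A
"""
# With thermald --adaptive running, only the max operating temp differs.
AFTER = BEFORE.replace("Max Operating Temp : 57 C", "Max Operating Temp : 75 C")


def parse_temperatures(text):
    """Collect 'Name : N C' lines into a dict; N/A entries are ignored."""
    temps = {}
    for line in text.splitlines():
        match = re.fullmatch(r"\s*(.+?)\s*:\s*(\d+) C\s*", line)
        if match:
            temps[match.group(1)] = int(match.group(2))
    return temps


def headroom(temps):
    """Degrees Celsius between the current temperature and each threshold."""
    current = temps["GPU Current Temp"]
    return {
        name: limit - current
        for name, limit in temps.items()
        if name != "GPU Current Temp"
    }


if __name__ == "__main__":
    for label, text in (("default", BEFORE), ("thermald --adaptive", AFTER)):
        print(label, headroom(parse_temperatures(text)))
```

With the default 57 C limit there is only 1 C of headroom at 56 C, and the 59 C reading from the first readout is already above it. With the 75 C limit the headroom is 19 C, while the slowdown and shutdown thresholds are unchanged.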
So apparently thermald can change this setting, but I cannot change it manually, since "GPUMaxOperatingTempThreshold" is a read-only attribute:
nvidia-settings -a GPUMaxOperatingTempThreshold=80
ERROR: The attribute 'GPUMaxOperatingTempThreshold' specified in assignment 'GPUMaxOperatingTempThreshold=80' cannot be assigned (it is a read-only attribute).
I am now on Fedora 34 but I saw the exact same problem on Ubuntu 20.10.
I don't really know what's going on here, but it seems strange that I should have to run thermald just to escape this throttling problem (and even then, I still think that 75C is too low a threshold to throttle at). To be honest, I don't really understand the interplay between GPU Slowdown Temp and GPU Max Operating Temp. They seem synonymous to me.
Here's the full output from nvidia-smi:
Sat May 8 15:23:05 2021
+-----------------------------------------------------------------------------+
| NVIDIA-SMI 465.27 Driver Version: 465.27 CUDA Version: 11.3 |
|-------------------------------+----------------------+----------------------+
| GPU Name Persistence-M| Bus-Id Disp.A | Volatile Uncorr. ECC |
| Fan Temp Perf Pwr:Usage/Cap| Memory-Usage | GPU-Util Compute M. |
| | | MIG M. |
|===============================+======================+======================|
| 0 NVIDIA GeForce ... Off | 00000000:2D:00.0 Off | N/A |
| N/A 67C P0 N/A / N/A | 578MiB / 2002MiB | 7% Default |
| | | N/A |
+-------------------------------+----------------------+----------------------+
+-----------------------------------------------------------------------------+
| Processes: |
| GPU GI CI PID Type Process name GPU Memory |
| ID ID Usage |
|=============================================================================|
| 0 N/A N/A 2762 G /usr/libexec/Xorg 293MiB |
| 0 N/A N/A 2953 G /usr/bin/gnome-shell 88MiB |
| 0 N/A N/A 4524 G ...AAAAAAAAA= --shared-files 134MiB |
| 0 N/A N/A 5395 G ...e/Steam/ubuntu12_32/steam 18MiB |
| 0 N/A N/A 5604 G ./steamwebhelper 1MiB |
| 0 N/A N/A 6303 G ...AAAAAAAAA= --shared-files 6MiB |
| 0 N/A N/A 7422 G anki 27MiB |
| 0 N/A N/A 21305 G /usr/bin/gjs 2MiB |
+-----------------------------------------------------------------------------+
I am experiencing severe throttling on my NVIDIA GPU. I have a ThinkPad T14 Gen 1 with a GeForce MX330. I have followed the guides to install the drivers (https://rpmfusion.org/Howto/NVIDIA) and to make the NVIDIA GPU primary (https://docs.fedoraproject.org/en-US/quick-docs/how-to-set-nvidia-as-primary-gpu-on-optimus-based-laptops/). I am on version 465.27 of the driver, running Fedora 34 Workstation.
I wasn't really sure whether to post this bug here or on the NVIDIA forums, so I've cross-posted it (https://forums.developer.nvidia.com/t/severe-throttling-on-thinkpad-t14-gen-1-with-geforce-mx330/177366).