Severe throttling on Thinkpad T14 Gen 1 with GeForce MX330 #67

Open
jolars opened this issue May 8, 2021 · 0 comments

I am experiencing severe throttling on my NVIDIA GPU. I have a ThinkPad T14 Gen 1 with a GeForce MX330. I have followed the guides to install the drivers (https://rpmfusion.org/Howto/NVIDIA) and to make the NVIDIA GPU primary (https://docs.fedoraproject.org/en-US/quick-docs/how-to-set-nvidia-as-primary-gpu-on-optimus-based-laptops/). I am on driver version 465.27, running Fedora 34 Workstation.

I am seeing constant throttling, even when the system is idle. Right now, just idling, I see:

nvidia-smi -q -d PERFORMANCE

==============NVSMI LOG==============

Timestamp                                 : Sat May  8 13:19:52 2021
Driver Version                            : 465.27
CUDA Version                              : 11.3

Attached GPUs                             : 1
GPU 00000000:2D:00.0
    Performance State                     : P0
    Clocks Throttle Reasons
        Idle                              : Not Active
        Applications Clocks Setting       : Not Active
        SW Power Cap                      : Not Active
        HW Slowdown                       : Not Active
            HW Thermal Slowdown           : Not Active
            HW Power Brake Slowdown       : Not Active
        Sync Boost                        : Not Active
        SW Thermal Slowdown               : Active
        Display Clock Setting             : Not Active

Here, SW Thermal Slowdown being Active indicates that the GPU is being throttled, even though it is only at 59 degrees Celsius. Running glxgears and checking the clocks, I get:

nvidia-smi -q -d CLOCK

==============NVSMI LOG==============

Timestamp                                 : Sat May  8 13:23:43 2021
Driver Version                            : 465.27
CUDA Version                              : 11.3

Attached GPUs                             : 1
GPU 00000000:2D:00.0
    Clocks
        Graphics                          : 139 MHz
        SM                                : 139 MHz
        Memory                            : 405 MHz
        Video                             : 544 MHz
    Applications Clocks
        Graphics                          : N/A
        Memory                            : N/A
    Default Applications Clocks
        Graphics                          : N/A
        Memory                            : N/A
    Max Clocks
        Graphics                          : 1911 MHz
        SM                                : 1911 MHz
        Memory                            : 3504 MHz
        Video                             : 1708 MHz
    Max Customer Boost Clocks
        Graphics                          : N/A
    SM Clock Samples
        Duration                          : 18446744073709.55 sec
        Number of Samples                 : 100
        Max                               : 1531 MHz
        Min                               : 139 MHz
        Avg                               : 0 MHz
    Memory Clock Samples
        Duration                          : 18446744073709.55 sec
        Number of Samples                 : 100
        Max                               : 3504 MHz
        Min                               : 405 MHz
        Avg                               : 0 MHz
    Clock Policy
        Auto Boost                        : N/A
        Auto Boost Default                : N/A

So the GPU is clearly being heavily throttled.
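
To put a rough number on the throttling, the current clocks can be compared against the maximum clocks from the output above. The following is a minimal Python sketch under the assumption that the log has the "Name : N MHz" layout shown in this issue; the helper names (`parse_clock_sections`, `clock_fractions`) are made up for illustration and are not part of any NVIDIA tool.

```python
import re

# Excerpt of the `nvidia-smi -q -d CLOCK` output quoted in this issue.
CLOCK_LOG = """\
    Clocks
        Graphics                          : 139 MHz
        SM                                : 139 MHz
        Memory                            : 405 MHz
        Video                             : 544 MHz
    Max Clocks
        Graphics                          : 1911 MHz
        SM                                : 1911 MHz
        Memory                            : 3504 MHz
        Video                             : 1708 MHz
"""


def parse_clock_sections(text):
    """Return {section: {domain: MHz}} for lines shaped like 'Name : N MHz'."""
    sections = {}
    current = None
    for line in text.splitlines():
        stripped = line.strip()
        if not stripped:
            continue
        match = re.match(r"^(.+?)\s*:\s*(\d+) MHz$", stripped)
        if match and current is not None:
            sections[current][match.group(1)] = int(match.group(2))
        elif ":" not in stripped:
            # A line without a colon is treated as a section header.
            current = stripped
            sections[current] = {}
    return sections


def clock_fractions(sections):
    """Current clock as a fraction of the max clock, per clock domain."""
    current = sections["Clocks"]
    maximum = sections["Max Clocks"]
    return {name: current[name] / maximum[name] for name in current if name in maximum}


fractions = clock_fractions(parse_clock_sections(CLOCK_LOG))
for name, fraction in fractions.items():
    print(f"{name}: {fraction:.1%} of max")
```

With the values above, the graphics and SM clocks come out at roughly 7% of their maximum (139 MHz out of 1911 MHz) while glxgears is running.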

My guess is that this is related to the following settings:

nvidia-smi -q -d TEMPERATURE

==============NVSMI LOG==============

Timestamp                                 : Sat May  8 13:25:04 2021
Driver Version                            : 465.27
CUDA Version                              : 11.3

Attached GPUs                             : 1
GPU 00000000:2D:00.0
    Temperature
        GPU Current Temp                  : 56 C
        GPU Shutdown Temp                 : 102 C
        GPU Slowdown Temp                 : 97 C
        GPU Max Operating Temp            : 57 C
        GPU Target Temperature            : N/A
        Memory Current Temp               : N/A
        Memory Max Operating Temp         : N/A

Interestingly, if I enable thermald with the --adaptive flag, I get this:

==============NVSMI LOG==============

Timestamp                                 : Sat May  8 13:29:56 2021
Driver Version                            : 465.27
CUDA Version                              : 11.3

Attached GPUs                             : 1
GPU 00000000:2D:00.0
    Temperature
        GPU Current Temp                  : 56 C
        GPU Shutdown Temp                 : 102 C
        GPU Slowdown Temp                 : 97 C
        GPU Max Operating Temp            : 75 C
        GPU Target Temperature            : N/A
        Memory Current Temp               : N/A
        Memory Max Operating Temp         : N/A

And the throttling goes away and performance is suddenly much improved.
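
The difference between the two temperature reports can be made explicit by computing the headroom between the current temperature and each limit. This is a minimal sketch that only uses the values quoted above; the helper names are hypothetical, and the idea that SW Thermal Slowdown engages near "GPU Max Operating Temp" is my guess, not something the driver documentation confirms here.

```python
import re

# Temperature lines from the two `nvidia-smi -q -d TEMPERATURE` reports above.
TEMPS_WITHOUT_THERMALD = """\
        GPU Current Temp                  : 56 C
        GPU Shutdown Temp                 : 102 C
        GPU Slowdown Temp                 : 97 C
        GPU Max Operating Temp            : 57 C
"""

TEMPS_WITH_THERMALD = """\
        GPU Current Temp                  : 56 C
        GPU Shutdown Temp                 : 102 C
        GPU Slowdown Temp                 : 97 C
        GPU Max Operating Temp            : 75 C
"""


def parse_temperatures(text):
    """Return {label: degrees C} for lines shaped like 'Label : N C'."""
    temps = {}
    for line in text.splitlines():
        match = re.match(r"^\s*(.+?)\s*:\s*(\d+) C$", line)
        if match:
            temps[match.group(1)] = int(match.group(2))
    return temps


def headroom(temps):
    """Degrees C between the current temperature and each reported limit."""
    current = temps["GPU Current Temp"]
    return {
        label: limit - current
        for label, limit in temps.items()
        if label != "GPU Current Temp"
    }


before = headroom(parse_temperatures(TEMPS_WITHOUT_THERMALD))
after = headroom(parse_temperatures(TEMPS_WITH_THERMALD))
for label in before:
    print(f"{label}: {before[label]} C headroom without thermald, {after[label]} C with")
```

Without thermald the headroom to "GPU Max Operating Temp" is a single degree (57 C limit at 56 C), and the 59 C reading mentioned earlier is already above that limit. With thermald --adaptive the headroom grows to 19 degrees, while the shutdown and slowdown limits are unchanged.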

So apparently thermald can change this setting, but I cannot change it manually, because "GPUMaxOperatingTempThreshold" is a read-only attribute:

nvidia-settings -a GPUMaxOperatingTempThreshold=80

ERROR: The attribute 'GPUMaxOperatingTempThreshold' specified in assignment 'GPUMaxOperatingTempThreshold=80' cannot be assigned (it is a read-only
       attribute).

I am now on Fedora 34 but I saw the exact same problem on Ubuntu 20.10.

I don't really know what's going on here, but it seems strange that I should have to run thermald just to escape this throttling problem (and even then, I think 75 C is too low a temperature to start throttling at). To be honest, I don't really understand the interplay between GPU Slowdown Temp and GPU Max Operating Temp. They look synonymous to me.
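
One way to investigate this interplay would be to log the temperature, the SM clock and the SW Thermal Slowdown flag together and see at which temperature the flag flips. Below is a minimal sketch, assuming the nvidia-smi query fields `temperature.gpu`, `clocks.sm` and `clocks_throttle_reasons.sw_thermal_slowdown` are available in this driver version (check with `nvidia-smi --help-query-gpu`). The function names are hypothetical, and I have not run this on the affected machine.

```python
import subprocess

# Query fields assumed to be supported by this driver's nvidia-smi;
# verify with `nvidia-smi --help-query-gpu` before relying on them.
QUERY_FIELDS = [
    "temperature.gpu",
    "clocks.sm",
    "clocks_throttle_reasons.sw_thermal_slowdown",
]


def build_command(fields=QUERY_FIELDS):
    """Build the nvidia-smi command line for a single CSV sample."""
    return [
        "nvidia-smi",
        "--query-gpu=" + ",".join(fields),
        "--format=csv,noheader,nounits",
    ]


def parse_sample(line, fields=QUERY_FIELDS):
    """Split one CSV line into a {field: value} dict (values kept as strings)."""
    values = [value.strip() for value in line.split(",")]
    if len(values) != len(fields):
        raise ValueError(f"expected {len(fields)} values, got {len(values)}: {line!r}")
    return dict(zip(fields, values))


def read_sample():
    """Run nvidia-smi once and return the parsed sample for the first GPU."""
    result = subprocess.run(
        build_command(), capture_output=True, text=True, check=True
    )
    return parse_sample(result.stdout.splitlines()[0])
```

Calling `read_sample()` in a loop (for example once per second while glxgears runs, with and without thermald) should show whether the flag follows GPU Max Operating Temp or GPU Slowdown Temp.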

Here's the full output from nvidia-smi:

Sat May  8 15:23:05 2021       
+-----------------------------------------------------------------------------+
| NVIDIA-SMI 465.27       Driver Version: 465.27       CUDA Version: 11.3     |
|-------------------------------+----------------------+----------------------+
| GPU  Name        Persistence-M| Bus-Id        Disp.A | Volatile Uncorr. ECC |
| Fan  Temp  Perf  Pwr:Usage/Cap|         Memory-Usage | GPU-Util  Compute M. |
|                               |                      |               MIG M. |
|===============================+======================+======================|
|   0  NVIDIA GeForce ...  Off  | 00000000:2D:00.0 Off |                  N/A |
| N/A   67C    P0    N/A /  N/A |    578MiB /  2002MiB |      7%      Default |
|                               |                      |                  N/A |
+-------------------------------+----------------------+----------------------+
                                                                               
+-----------------------------------------------------------------------------+
| Processes:                                                                  |
|  GPU   GI   CI        PID   Type   Process name                  GPU Memory |
|        ID   ID                                                   Usage      |
|=============================================================================|
|    0   N/A  N/A      2762      G   /usr/libexec/Xorg                 293MiB |
|    0   N/A  N/A      2953      G   /usr/bin/gnome-shell               88MiB |
|    0   N/A  N/A      4524      G   ...AAAAAAAAA= --shared-files      134MiB |
|    0   N/A  N/A      5395      G   ...e/Steam/ubuntu12_32/steam       18MiB |
|    0   N/A  N/A      5604      G   ./steamwebhelper                    1MiB |
|    0   N/A  N/A      6303      G   ...AAAAAAAAA= --shared-files        6MiB |
|    0   N/A  N/A      7422      G   anki                               27MiB |
|    0   N/A  N/A     21305      G   /usr/bin/gjs                        2MiB |
+-----------------------------------------------------------------------------+

I wasn't really sure whether to post this bug here or on the NVIDIA forums, so I've cross-posted it (https://forums.developer.nvidia.com/t/severe-throttling-on-thinkpad-t14-gen-1-with-geforce-mx330/177366).
