Unix & Linux Asked by Aleksandr Dubinsky on December 16, 2020
How is it possible to control the fan speed of multiple consumer NVIDIA GPUs such as Titan and 1080 Ti on a headless node running Linux?
The following simple method controls the fans of multiple NVIDIA GPUs. It requires no scripting, no fake monitors and no fiddling, and it can be run over SSH. It has been tested on Arch Linux.
sudo nvidia-xconfig --allow-empty-initial-configuration --enable-all-gpus --cool-bits=7
This will create /etc/X11/xorg.conf with an entry for each GPU, similar to the manual method.
Note: Some distributions (Fedora, CentOS, Manjaro) have additional config files (e.g. in /etc/X11/xorg.conf.d/ or /usr/share/X11/xorg.conf.d/) that override xorg.conf and set AllowNVIDIAGPUScreens. This option is not compatible with this guide, so the extra config files should be modified or deleted. The X11 log file shows which config files have been loaded.
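Both checks in the note can be scripted. The following is a minimal sketch, assuming a POSIX shell and a grep that supports -r (GNU and BSD grep both do). The function names are hypothetical, and the default paths are the ones named in the note.

```shell
#!/bin/sh
# Sketch: list X config files that set AllowNVIDIAGPUScreens.
# With no arguments it searches the two directories named in the note.
find_conflicting_configs() {
  [ "$#" -gt 0 ] || set -- /etc/X11/xorg.conf.d /usr/share/X11/xorg.conf.d
  # -r: recurse into directories, -l: print only the names of matching files
  grep -rl 'AllowNVIDIAGPUScreens' "$@" 2>/dev/null
}

# Sketch: show which config files and directories X actually loaded,
# using the "(==) Using config ..." lines that X writes to its log.
loaded_configs() {
  grep 'Using .*config' "${1:-/var/log/Xorg.0.log}"
}
```

Run find_conflicting_configs first; any file it prints is a candidate to edit or delete. After starting X, loaded_configs confirms that only the intended files were read.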
nvidia-xconfig --query-gpu-info
Find the PCI BusID fields. Note that these are not the same as the bus IDs reported by the kernel.
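A likely reason the two differ is notation: the kernel (and lspci) writes addresses like 0000:0a:00.0 with hexadecimal fields, while the xorg.conf BusID uses decimal PCI:bus:device:function. The sketch below extracts the IDs from the query output and converts a kernel-style address. It assumes each GPU is reported on a line such as "PCI BusID : PCI:5:0:0"; both function names are hypothetical.

```shell
#!/bin/sh
# Sketch: extract the bus IDs from `nvidia-xconfig --query-gpu-info` on stdin.
# Assumption: each GPU reports a line such as "  PCI BusID : PCI:5:0:0".
busids_from_query() {
  sed -n 's/^[[:space:]]*PCI BusID[[:space:]]*:[[:space:]]*\(PCI:[0-9:]*\).*/\1/p'
}

# Sketch: convert a kernel/lspci address such as 0000:0a:00.0, whose fields
# are hexadecimal, into the decimal PCI:bus:device:function form used by X.
lspci_to_xorg() (
  addr=${1#????:}   # drop a leading 4-digit PCI domain, if present
  bus=${addr%%:*}   # "0a" from "0a:00.0"
  rest=${addr#*:}   # "00.0"
  dev=${rest%%.*}   # "00"
  fn=${rest#*.}     # "0"
  printf 'PCI:%d:%d:%d\n' "$((0x$bus))" "$((0x$dev))" "$fn"
)
```

For example, nvidia-xconfig --query-gpu-info | busids_from_query prints one ID per GPU, and lspci_to_xorg 0000:0a:00.0 prints PCI:10:0:0.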
Alternatively, run sudo startx, open /var/log/Xorg.0.log (or whichever location startx prints after "Log file:"), and look for lines of the form NVIDIA(0): Valid display device(s) on GPU-<GPU number> at PCI:<PCI ID>.
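Picking those lines out of a long log can be automated. This is a sketch assuming the log lines have exactly the form quoted above; the function name is hypothetical.

```shell
#!/bin/sh
# Sketch: read an X log on stdin and print "GPU-number bus-ID" pairs, assuming
# lines like "NVIDIA(0): Valid display device(s) on GPU-0 at PCI:5:0:0".
busids_from_xorg_log() {
  # In a basic regex, ( and ) are literal, so "device(s)" matches as written.
  sed -n 's/.*Valid display device(s) on GPU-\([0-9][0-9]*\) at \(PCI:[0-9:]*\).*/\1 \2/p'
}
```

Usage: busids_from_xorg_log < /var/log/Xorg.0.log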
Editing /etc/X11/xorg.conf
Here is an example of xorg.conf for a three-GPU machine:
Section "ServerLayout"
    Identifier "dual"
    Screen 0 "Screen0"
    Screen 1 "Screen1" RightOf "Screen0"
    Screen 2 "Screen2" RightOf "Screen1"
EndSection
Section "Device"
    Identifier "Device0"
    Driver "nvidia"
    VendorName "NVIDIA Corporation"
    # Must match a bus ID found in the previous step
    BusID "PCI:5:0:0"
    # Coolbits 7 includes the bit (value 4) that enables manual fan control
    Option "Coolbits" "7"
    # Let X start even though no monitor is connected
    Option "AllowEmptyInitialConfiguration"
EndSection
Section "Device"
    Identifier "Device1"
    Driver "nvidia"
    VendorName "NVIDIA Corporation"
    BusID "PCI:6:0:0"
    Option "Coolbits" "7"
    Option "AllowEmptyInitialConfiguration"
EndSection
Section "Device"
    Identifier "Device2"
    Driver "nvidia"
    VendorName "NVIDIA Corporation"
    BusID "PCI:9:0:0"
    Option "Coolbits" "7"
    Option "AllowEmptyInitialConfiguration"
EndSection
Section "Screen"
    Identifier "Screen0"
    Device "Device0"
EndSection
Section "Screen"
    Identifier "Screen1"
    Device "Device1"
EndSection
Section "Screen"
    Identifier "Screen2"
    Device "Device2"
EndSection
The BusID values must match the bus IDs identified in the previous step. The AllowEmptyInitialConfiguration option allows X to start even if no monitor is connected. The Coolbits option allows the fans to be controlled; depending on its value, it can also enable overclocking.
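For more or fewer GPUs, the same file can be generated from a list of bus IDs. This is a sketch, not the output of nvidia-xconfig. The identifiers Layout0, DeviceN and ScreenN are arbitrary, and it writes each Device section next to its Screen section rather than grouping them as the example does. X does not depend on section order.

```shell
#!/bin/sh
# Sketch: print an xorg.conf in the shape of the example above for any number
# of GPUs. Arguments are bus IDs in X's format, e.g. PCI:5:0:0.
gen_xorg_conf() (
  echo 'Section "ServerLayout"'
  echo '    Identifier "Layout0"'
  i=0
  while [ "$i" -lt "$#" ]; do
    if [ "$i" -eq 0 ]; then
      echo '    Screen 0 "Screen0"'
    else
      # Each screen is placed to the right of the previous one.
      echo "    Screen $i \"Screen$i\" RightOf \"Screen$((i - 1))\""
    fi
    i=$((i + 1))
  done
  echo 'EndSection'

  # One Device section (with Coolbits) and one Screen section per GPU.
  i=0
  for busid in "$@"; do
    cat <<EOF
Section "Device"
    Identifier "Device$i"
    Driver "nvidia"
    VendorName "NVIDIA Corporation"
    BusID "$busid"
    Option "Coolbits" "7"
    Option "AllowEmptyInitialConfiguration"
EndSection
Section "Screen"
    Identifier "Screen$i"
    Device "Device$i"
EndSection
EOF
    i=$((i + 1))
  done
)
```

Usage: gen_xorg_conf PCI:5:0:0 PCI:6:0:0 PCI:9:0:0 | sudo tee /etc/X11/xorg.conf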
Create /root/.xinitrc with the following contents:
nvidia-settings -q fans
# Put each GPU's fan under manual control and set it to 75%
nvidia-settings -a '[gpu:0]/GPUFanControlState=1' -a '[fan:0]/GPUTargetFanSpeed=75'
nvidia-settings -a '[gpu:1]/GPUFanControlState=1' -a '[fan:1]/GPUTargetFanSpeed=75'
nvidia-settings -a '[gpu:2]/GPUFanControlState=1' -a '[fan:2]/GPUTargetFanSpeed=75'
I use .xinitrc to run nvidia-settings for convenience, although there are probably other ways. The nvidia-settings -q fans command prints every GPU fan in the system; the remaining commands set each fan to 75%.
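With many GPUs, the repeated lines can be generated instead. This is a hypothetical helper that prints one nvidia-settings argument per line. It assumes fan i belongs to GPU i, as in the .xinitrc above; cards with more than one fan need the indices reported by nvidia-settings -q fans instead.

```shell
#!/bin/sh
# Hypothetical helper: print nvidia-settings arguments, one per line, that put
# GPUs 0..N-1 into manual fan mode at a single target speed (percent).
# Assumes fan i belongs to GPU i.
build_fan_args() (
  num_gpus=$1
  speed=$2
  i=0
  while [ "$i" -lt "$num_gpus" ]; do
    printf '%s\n' -a "[gpu:$i]/GPUFanControlState=1" \
                  -a "[fan:$i]/GPUTargetFanSpeed=$speed"
    i=$((i + 1))
  done
)
```

In .xinitrc, build_fan_args 3 75 | xargs nvidia-settings would then replace the three nvidia-settings -a lines with a single call. Piping through xargs passes the bracketed arguments literally, without the shell treating them as glob patterns.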
sudo startx -- :0
You can run this command over SSH. The output should look similar to this:
Current version of pixman: 0.34.0
Before reporting problems, check http://wiki.x.org
to make sure that you have the latest version.
Markers: (--) probed, (**) from config file, (==) default setting,
(++) from command line, (!!) notice, (II) informational,
(WW) warning, (EE) error, (NI) not implemented, (??) unknown.
(==) Log file: "/var/log/Xorg.0.log", Time: Sat May 27 02:22:08 2017
(==) Using config file: "/etc/X11/xorg.conf"
(==) Using system config directory "/usr/share/X11/xorg.conf.d"
Attribute 'GPUFanControlState' (pushistik:0[gpu:0]) assigned value 1.
Attribute 'GPUTargetFanSpeed' (pushistik:0[fan:0]) assigned value 75.
Attribute 'GPUFanControlState' (pushistik:0[gpu:1]) assigned value 1.
Attribute 'GPUTargetFanSpeed' (pushistik:0[fan:1]) assigned value 75.
Attribute 'GPUFanControlState' (pushistik:0[gpu:2]) assigned value 1.
Attribute 'GPUTargetFanSpeed' (pushistik:0[fan:2]) assigned value 75.
nvidia-smi and nvtop can be used to observe temperatures and power draw. Lower temperatures allow a card to clock higher, which also increases its power draw. You can use sudo nvidia-smi -pl 150 to limit power draw and keep the cards cool, or sudo nvidia-smi -pl 300 to let them overclock; add -i <index> to apply the limit to a single GPU. My 1080 Ti runs at 1480 MHz when given 150 W and at over 1800 MHz when given 300 W, but this depends on the workload. You can monitor clock speeds with nvidia-smi -q or, more specifically, with:
watch 'nvidia-smi -q | grep -E "Utilization| Graphics|Power Draw"'
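To decide which card needs a lower power limit, it can help to list the GPUs that are running hot. This is a sketch that reads the CSV printed by nvidia-smi --query-gpu=index,temperature.gpu --format=csv,noheader,nounits. The function name and the threshold are illustrative choices.

```shell
#!/bin/sh
# Sketch: read "index, temperature" CSV lines on stdin and print the index of
# every GPU at or above a threshold in degrees C.
gpus_over_temp() {
  # Fields are separated by a comma plus optional spaces: "1, 83".
  awk -F', *' -v limit="$1" '$2 + 0 >= limit + 0 { print $1 }'
}
```

Usage: nvidia-smi --query-gpu=index,temperature.gpu --format=csv,noheader,nounits | gpus_over_temp 80. If it prints 1, sudo nvidia-smi -i 1 -pl 150 caps only that card.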
To return the fans to automatic control, reboot. I haven't found another way to make them automatic again.
Correct answer by Aleksandr Dubinsky on December 16, 2020
When you run fans.py, it sets up a temporary X server for each GPU with a fake display attached. Then, it loops over the GPUs every few seconds and sets the fan speed according to their temperature. When the script dies, it returns control of the fans to the drivers and cleans up the X servers.
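The core of such a script is a mapping from temperature to fan speed applied in a loop. The sketch below shows one such mapping in shell. The breakpoints (30% at or below 40 C, 100% at or above 80 C, linear in between) are hypothetical and are not fans.py's actual curve. The loop function is not called here. It assumes an X server with Coolbits stays running on :0 (unlike the one-shot .xinitrc method) and that fan i belongs to GPU i.

```shell
#!/bin/sh
# Sketch: map a GPU temperature (degrees C) to a fan speed (percent).
# Hypothetical curve: 30% at or below 40 C, 100% at or above 80 C, linear between.
fan_speed_for_temp() (
  temp=$1
  low_temp=40; high_temp=80
  min_speed=30; max_speed=100
  if [ "$temp" -le "$low_temp" ]; then
    echo "$min_speed"
  elif [ "$temp" -ge "$high_temp" ]; then
    echo "$max_speed"
  else
    echo $((min_speed + (temp - low_temp) * (max_speed - min_speed) / (high_temp - low_temp)))
  fi
)

# Hypothetical control loop (run as root; not invoked in this sketch).
# Unlike fans.py as described, it does not hand the fans back to the driver
# when it stops.
fan_control_loop() (
  export DISPLAY=:0
  while :; do
    nvidia-smi --query-gpu=index,temperature.gpu --format=csv,noheader,nounits |
      while IFS=', ' read -r idx temp; do
        nvidia-settings -a "[gpu:$idx]/GPUFanControlState=1" \
          -a "[fan:$idx]/GPUTargetFanSpeed=$(fan_speed_for_temp "$temp")" >/dev/null
      done
    sleep 5
  done
)
```

With these breakpoints, fan_speed_for_temp 60 gives 65. Integer division rounds down, so intermediate speeds are truncated.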
Answered by Andy Jones on December 16, 2020