Super User Asked by Pandian Le on November 18, 2020
Normally, when only xorg and compiz are running on my GPU, I can suspend without any problems. However, if I run an intensive PyTorch training job (via Jupyter) with about 90% of the GPU in use, and then suspend after the processes have finished, the system refuses to sleep/wake up.
I am positive that the GPU being full (or at least not empty) is causing the issue, but I don’t know why "some process", possibly related to the GPU, is not suspending. If I start Jupyter, run 1+1 (or some other simple job) and then suspend, there are no issues either.
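One way to test the hunch that something is still holding the GPU is to ask the driver which compute processes still own GPU memory right before suspending. A Jupyter kernel that ran PyTorch on CUDA typically keeps its CUDA context (and PyTorch's cached memory) alive until the kernel process itself exits, even after the training cell has finished. The sketch below assumes nvidia-smi is on the PATH and supports --query-compute-apps; the helper names parse_compute_apps and gpu_compute_apps are hypothetical.

```python
import shutil
import subprocess


def parse_compute_apps(csv_text):
    """Parse `nvidia-smi --query-compute-apps=pid,process_name,used_memory
    --format=csv,noheader,nounits` output into (pid, name, MiB) tuples."""
    apps = []
    for line in csv_text.splitlines():
        if not line.strip():
            continue
        # The PID is the first field and the memory the last one; anything in
        # between is the process name (which could itself contain commas).
        pid, rest = line.split(",", 1)
        name, mem = rest.rsplit(",", 1)
        mem = mem.strip()
        # Some setups report "[N/A]" instead of a number; keep that as None.
        apps.append((int(pid), name.strip(), int(mem) if mem.isdigit() else None))
    return apps


def gpu_compute_apps():
    """Return the compute processes currently holding GPU memory."""
    if shutil.which("nvidia-smi") is None:
        raise RuntimeError("nvidia-smi not found on PATH")
    out = subprocess.run(
        ["nvidia-smi",
         "--query-compute-apps=pid,process_name,used_memory",
         "--format=csv,noheader,nounits"],
        capture_output=True, text=True, check=True,
    ).stdout
    return parse_compute_apps(out)


if __name__ == "__main__":
    try:
        apps = gpu_compute_apps()
    except (RuntimeError, subprocess.CalledProcessError) as err:
        print(f"Could not query the GPU: {err}")
    else:
        if not apps:
            print("No compute processes found on the GPU.")
        for pid, name, mem in apps:
            print(f"PID {pid} ({name}) still holds {mem} MiB of GPU memory")
```

If the Jupyter kernel's process is still listed after training has finished, shutting that kernel down before suspending is a cheap experiment to see whether a live CUDA context is what blocks the suspend.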
The kernel log shows nothing "fishy". I have tried a bunch of online remedies and am now at a dead end.
How do I identify what is happening? Any ideas?
Other symptoms
It sort of sleeps, but when I hit a key I still hear some sound from the laptop (as if it were booting up), followed by a blank screen. Sometimes I can get to a TTY, but I can’t type anything.
My system
I spent a good five full days understanding, searching, re-installing, etc., and am now at a dead end.
I checked the kernel logs (pastebin link) but didn’t see anything "fishy". (At 02:08 the system starts going to sleep, and at 10:21 I did a hard reset.)
Here is a tiny excerpt:
Oct 2 02:08:06 eghx-nitro NetworkManager[8152]: <info> [1601597286.6443] manager: sleep requested (sleeping: no enabled: yes)
Oct 2 02:08:06 eghx-nitro NetworkManager[8152]: <info> [1601597286.6443] manager: sleeping...
Oct 2 02:08:06 eghx-nitro NetworkManager[8152]: <info> [1601597286.6447] manager: NetworkManager state is now ASLEEP
Oct 2 02:08:06 eghx-nitro NetworkManager[8152]: <info> [1601597286.6453] device (wlp2s0): state change: activated -> deactivating (reason 'sleeping') [100 110 37]
Oct 2 02:08:06 eghx-nitro NetworkManager[8152]: <info> [1601597286.8169] device (wlp2s0): state change: deactivating -> disconnected (reason 'sleeping') [110 30 37]
Oct 2 02:08:06 eghx-nitro NetworkManager[8152]: <info> [1601597286.8356] dhcp4 (wlp2s0): canceled DHCP transaction, DHCP client pid 8328
Oct 2 02:08:06 eghx-nitro NetworkManager[8152]: <info> [1601597286.8356] dhcp4 (wlp2s0): state changed bound -> done
Oct 2 02:08:06 eghx-nitro NetworkManager[8152]: <info> [1601597286.8363] dns-mgr: Writing DNS information to /sbin/resolvconf
Oct 2 02:08:06 eghx-nitro kernel: [24100.153393] wlp2s0: deauthenticating from e8:cc:18:41:3c:15 by local choice (Reason: 3=DEAUTH_LEAVING)
Oct 2 02:08:07 eghx-nitro NetworkManager[8152]: <warn> [1601597287.0509] sup-iface[0xb4a6f0,wlp2s0]: connection disconnected (reason -3)
Oct 2 02:08:07 eghx-nitro NetworkManager[8152]: <info> [1601597287.0511] device (wlp2s0): supplicant interface state: completed -> disconnected
Oct 2 02:08:07 eghx-nitro NetworkManager[8152]: <info> [1601597287.0525] device (wlp2s0): state change: disconnected -> unmanaged (reason 'sleeping') [30 10 37]
Oct 2 02:08:08 eghx-nitro kernel: [24101.983885] PM: suspend entry (deep)
Oct 2 02:08:09 eghx-nitro kernel: [24101.983888] PM: Syncing filesystems ... done.
Oct 2 10:21:32 eghx-nitro kernel: [24103.953554] Freezing user space processes ... (elapsed 0.002 seconds) done.
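Since the excerpt stops right after "Freezing user space processes", the useful question is which suspend stage the kernel logged last. It can also help to compare the bracketed kernel uptime with the syslog wall-clock stamp, which records when the message was written to the log: in the excerpt the uptime moves only about two seconds between the lines stamped 02:08:09 and 10:21:32, which would be consistent with the machine really having been asleep for most of that interval. A minimal sketch, assuming a syslog-style /var/log/kern.log like the excerpt; the marker list, the dummy year and the helper names are assumptions.

```python
import re
from datetime import datetime

# Kernel messages assumed to mark suspend/resume progress; extend as needed.
SUSPEND_MARKERS = ("PM:", "Freezing", "OOM killer", "Restarting tasks")

# Syslog-style kernel line as in the excerpt, e.g.
# "Oct 2 02:08:08 eghx-nitro kernel: [24101.983885] PM: suspend entry (deep)"
KERNEL_LINE = re.compile(
    r"^(\w{3}\s+\d+\s+[\d:]{8})\s+\S+\s+kernel:\s+\[\s*([\d.]+)\]\s+(.*)$"
)


def suspend_progress(lines):
    """Return (syslog_stamp, kernel_uptime, message) for suspend-related kernel lines."""
    events = []
    for line in lines:
        match = KERNEL_LINE.match(line.rstrip("\n"))
        if match and any(marker in match.group(3) for marker in SUSPEND_MARKERS):
            events.append((match.group(1), float(match.group(2)), match.group(3)))
    return events


def unlogged_gaps(events, year=2020):
    """For consecutive events, wall-clock seconds minus kernel-uptime seconds.

    Syslog stamps carry no year, so a dummy year is assumed (it only matters
    across New Year). Stamps have one-second resolution, so gaps of a second
    or two are noise; large gaps are time the kernel clock did not see.
    """
    def wall(stamp):
        return datetime.strptime(f"{year} {stamp}", "%Y %b %d %H:%M:%S")

    gaps = []
    for (s1, up1, _), (s2, up2, _) in zip(events, events[1:]):
        gaps.append((wall(s2) - wall(s1)).total_seconds() - (up2 - up1))
    return gaps


if __name__ == "__main__":
    try:
        with open("/var/log/kern.log", errors="replace") as log:
            events = suspend_progress(log)
    except OSError as err:
        print(f"Could not read kern.log: {err}")
    else:
        # The last few suspend-related lines show the last stage reached.
        for stamp, uptime, message in events[-10:]:
            print(f"{stamp}  [{uptime:.3f}]  {message}")
```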
Based on the NVIDIA forums, I added the following to GRUB and updated it:
GRUB_CMDLINE_LINUX_DEFAULT="quiet acpi_rev_override=1 acpi_osi=Linux scsi_mod.use_blk_mq=1 nouveau.modeset=0 nouveau.runpm=0 mem_sleep_default=deep"
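Edits to the GRUB defaults only take effect after GRUB is updated and the machine is rebooted, so it is worth confirming that the running kernel actually received these parameters and that deep is the selected sleep mode. A minimal sketch, assuming the standard /proc/cmdline file and /sys/power/mem_sleep (which lists the available modes with the active one in brackets, e.g. "s2idle [deep]"); EXPECTED just mirrors the GRUB line above and the helper names are hypothetical.

```python
import shlex

# Parameters from the GRUB_CMDLINE_LINUX_DEFAULT line above ("quiet" omitted).
EXPECTED = {
    "acpi_rev_override": "1",
    "acpi_osi": "Linux",
    "scsi_mod.use_blk_mq": "1",
    "nouveau.modeset": "0",
    "nouveau.runpm": "0",
    "mem_sleep_default": "deep",
}


def parse_cmdline(text):
    """Turn a kernel command line into {name: value}; bare flags map to None."""
    params = {}
    for token in shlex.split(text):
        name, sep, value = token.partition("=")
        params[name] = value if sep else None
    return params


def selected_mode(text):
    """Return the bracketed entry of a sysfs selector such as 's2idle [deep]'."""
    for word in text.split():
        if word.startswith("[") and word.endswith("]"):
            return word[1:-1]
    return None


def missing_params(params, expected=EXPECTED):
    """List expected parameters that are absent or have a different value."""
    return [f"{k}={v}" for k, v in expected.items() if params.get(k) != v]


if __name__ == "__main__":
    try:
        with open("/proc/cmdline") as f:
            params = parse_cmdline(f.read())
        with open("/sys/power/mem_sleep") as f:
            mode = selected_mode(f.read())
    except OSError as err:
        print(f"Could not read kernel state: {err}")
    else:
        print("missing or different:", missing_params(params) or "none")
        print("active mem_sleep mode:", mode)
```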
I added the following to initramfs-tools/modules and updated the initramfs:
nvidia
nvidia_modeset
nvidia_uvm
nvidia_drm
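Listing the modules there only helps if they actually end up loaded. A quick check is whether all four names appear in /proc/modules, where the first whitespace-separated field of each line is the module name. This is a sketch; REQUIRED simply mirrors the list above, and reporting nouveau is an extra assumption given the nouveau.modeset=0 workaround.

```python
# Modules listed in initramfs-tools/modules above.
REQUIRED = ("nvidia", "nvidia_modeset", "nvidia_uvm", "nvidia_drm")


def loaded_modules(proc_modules_text):
    """Return the set of module names from /proc/modules-formatted text."""
    return {line.split()[0] for line in proc_modules_text.splitlines() if line.strip()}


def missing_modules(loaded, required=REQUIRED):
    """Return the required modules that are not loaded, in list order."""
    return [name for name in required if name not in loaded]


if __name__ == "__main__":
    try:
        with open("/proc/modules") as f:
            loaded = loaded_modules(f.read())
    except OSError as err:
        print(f"Could not read /proc/modules: {err}")
    else:
        print("missing:", missing_modules(loaded) or "none")
        if "nouveau" in loaded:
            print("note: nouveau is loaded as well")
```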
I didn’t change the kernel, as there was no evidence pointing to it. Others have switched to 4.17; mine is currently 4.15.
Blind try: different suspend commands:
systemctl suspend
pm-suspend
I tried downgrading the driver from 430 to 384 via Additional Drivers. This didn’t help, as that driver cannot coexist with pytorch=1.6.0.
I completely removed and re-installed nvidia-430 as described here: purge, add-apt-repository ppa:graphics-drivers/ppa, update, and autoinstall.
This ended in the black screen of death. I recovered with nouveau.modeset=0, but somehow the GPU was no longer working.
At this point I did a complete re-install of xserver, unity, lightdm and nvidia-430 from a TTY before the login screen. This restored the system to its previous state, i.e., suspending with a full GPU hangs the system.
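After switching between the 384 and 430 drivers and reinstalling, it may help to confirm which NVIDIA kernel module is actually loaded, since a kernel module that does not match the installed user-space driver can cause confusing failures. A minimal sketch, assuming the loaded driver exposes /proc/driver/nvidia/version with a first line of the form "NVRM version: NVIDIA UNIX x86_64 Kernel Module  <version>  <build date>"; EXPECTED_MAJOR and the helper name are hypothetical.

```python
import re

# Matches the version number after "Kernel Module" in /proc/driver/nvidia/version.
VERSION_RE = re.compile(r"Kernel Module\s+(\d+(?:\.\d+)+)")

# The driver series the system ends up on (nvidia-430).
EXPECTED_MAJOR = 430


def loaded_driver_version(text):
    """Return the NVIDIA kernel module version as a tuple of ints, or None."""
    match = VERSION_RE.search(text)
    return tuple(int(part) for part in match.group(1).split(".")) if match else None


if __name__ == "__main__":
    try:
        with open("/proc/driver/nvidia/version") as f:
            version = loaded_driver_version(f.read())
    except OSError as err:
        print(f"NVIDIA kernel module does not seem to be loaded: {err}")
    else:
        shown = ".".join(map(str, version)) if version else "unknown"
        print("loaded NVIDIA kernel module:", shown)
        if version and version[0] != EXPECTED_MAJOR:
            print(f"warning: not the {EXPECTED_MAJOR} series")
```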