Wake-up from hibernation failing since Tumbleweed update

Hi,
Some update back in November broke wake-up after hibernation on my machine. Sorry for reporting this so late.

When waking the system up, there are some initial boot messages that have always been there and did not seem to cause a failure (messages taken from journalctl --no-pager --no-hostname -b-1 -p3 after another power-off and boot):
Dec 20 10:28:49 kernel: x86/cpu: SGX disabled by BIOS.
Dec 20 10:28:49 kernel: ACPI BIOS Error (bug): AE_AML_BUFFER_LIMIT, Field [CAP1] at bit offset/length 64/32 exceeds size of target Buffer (64 bits) (20210730/dsopcode-198)
Dec 20 10:28:49 kernel: ACPI Error: Aborting method \_SB._OSC due to previous error (AE_AML_BUFFER_LIMIT) (20210730/psparse-529)

After that, the screen is cleared and another message comes up:
Dec 20 10:28:51 kernel: nouveau 0000:01:00.0: bus: MMIO read of 00000000 FAULT at 122124 [ PRIVRING ]

I had not noticed this message before, and since it started appearing, the wake-up fails as follows:

The boot continues and restores all screen elements as I left them, so it looks like a successful wake-up. Initially the mouse pointer can be moved, but no window can be activated and no window updates (e.g. the clock is already frozen). Shortly after I click anywhere, the mouse pointer freezes, and the only way out I have found is a hard power-off.

The machine is an HP Z1 Entry Tower G5/8591, BIOS R01 Ver. 02.04.02 12/27/2019
GPU: NVIDIA GP107 (137000a1), bios: version 86.07.71.00.0f
Can anyone suggest a troubleshooting path or even a fix for this?

PS: I just checked the whole journal, and the nouveau FAULT appears as far back as March, when I enabled hibernation. Hibernation and wake-up worked even with that FAULT, so my problem might not be related to nouveau.
Hmm, and when trying to attach the full journal, I noticed that the wake-up attempt did not add ANY journal entry. The end of the hibernation is directly followed by the last boot (the currently running one), skipping over the failed wake-up:

Dec 20 20:30:48 systemd-sleep[20835]: running kernel is grub menu entry openSUSE Tumbleweed (vmlinuz-5.15.7-1-default)
Dec 20 20:30:48 systemd-sleep[20835]: preparing boot-loader: selecting entry openSUSE Tumbleweed, kernel /boot/5.15.7-1-default
Dec 20 20:30:48 systemd-sleep[20835]: running /usr/sbin/grub2-once “openSUSE Tumbleweed”
Dec 20 20:30:49 systemd-sleep[20835]: time needed for sync: 0.0 seconds, time needed for grub: 0.2 seconds.
Dec 20 20:30:49 systemd-sleep[20835]: INFO: Done.
Dec 20 20:30:49 systemd-sleep[20831]: Entering sleep state ‘hibernate’…
Dec 20 20:30:49 kernel: PM: hibernation: hibernation entry
-- Boot 8ef2aee395034bc690b3af7b14050a6d --
Dec 21 10:36:04 kernel: microcode: microcode updated early to revision 0xea, date = 2021-01-05
Dec 21 10:36:04 kernel: Linux version 5.15.7-1-default (geeko@buildhost) (gcc (SUSE Linux) 11.2.1 20211124 [revision 7510c23c1ec53aa4a62705f0384079661342ff7b], GNU ld (GNU Binutils; openSUSE Tumbleweed) 2.37.20211112-3) #1 SMP Wed Dec 8 08:54:39 UTC 2021 (b92986a)
Dec 21 10:36:04 kernel: Command line: BOOT_IMAGE=/boot/vmlinuz-5.15.7-1-default root=UUID=d4147341-4747-4564-9815-eb29f7724fc8 splash=silent resume=/dev/disk/by-uuid/6d87067f-67c4-419f-8284-8c0955ed5f20 quiet mitigations=off

Maybe the journal entries are missing because I have to hard power off the system when the wake-up fails? I see
Journal file /var/log/journal/e400992336a1414ea58a370960394257/system@0005d38faddbc80f-8b700933e0925f7b.journal~ is truncated, ignoring file.
and the timestamp of that file matches the failed wake-up. It is 8M in size and contains some binary data, but

journalctl --file /var/log/journal/e400992336a1414ea58a370960394257/system@0005d38faddbc80f-8b700933e0925f7b.journal~
Failed to open files: No data available

Thanks,
Bdot

What happens if you hibernate in text mode (booting in multi-user.target)? Does it lock up too?

I have now tested the following.
To get to console mode:

init 3

To hibernate:

systemctl hibernate

Then wake up, and to get to graphical mode:

init 5

That worked well: the shell (before the init 5) was restored with its history and previous screen output. Of course, that does not keep my apps and windows the way they were when I had to leave, but now I can try to isolate whether the problem is in the base graphical system or in one of my applications. Thanks for the hint, I’ll report progress here …

Just a bit off topic:

I assume it would be more consistent to use systemctl for all steps rather than mixing old-style init commands and systemctl in one test.
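For reference, a sketch of the same test using systemctl only: init 3 corresponds to systemctl isolate multi-user.target, and init 5 to systemctl isolate graphical.target. The run wrapper, the DRY_RUN variable and the function names are hypothetical, added only so the steps can be printed instead of executed; in practice you would simply type the three systemctl commands as root from a text console.

```shell
# Hypothetical wrapper: with DRY_RUN set, print the command instead of running it.
run() { if [ -n "${DRY_RUN:-}" ]; then printf '%s\n' "$*"; else "$@"; fi; }

# Hypothetical names for the three steps of the text-mode hibernate test.
# Run them as root from a text console: isolating multi-user.target stops
# the graphical session, including any terminal running inside it.
text_mode()      { run systemctl isolate multi-user.target; }  # instead of: init 3
hibernate_now()  { run systemctl hibernate; }                  # unchanged
graphical_mode() { run systemctl isolate graphical.target; }   # instead of: init 5
```

For example, DRY_RUN=1 text_mode only prints the command. For the real test, call text_mode, then hibernate_now, wake the machine up, and finally call graphical_mode.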

Consider reducing the size of the systemd Journal and verifying the Journal …

  • Vacuuming the journal down to about the last 6 weeks may help, assuming you are allowed to discard Journal entries; if you need to keep them, you may have to export the Journal entries before you begin to vacuum …
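The suggestion above could look roughly like this. It is a sketch under assumptions: the journal_cleanup function, the run wrapper, DRY_RUN and the default backup path /var/tmp/journal-backup.export are hypothetical, while the journalctl options themselves (--disk-usage, --verify, -o export, --vacuum-time) are standard. Note that vacuuming only removes archived journal files, not the currently active one.

```shell
# Hypothetical wrapper: with DRY_RUN set, print the command instead of running it.
run() { if [ -n "${DRY_RUN:-}" ]; then printf '%s\n' "$*"; else "$@"; fi; }

# Hypothetical helper, to be run as root. The default backup path is an assumption.
journal_cleanup() {
  backup=${1:-/var/tmp/journal-backup.export}

  run journalctl --disk-usage   # how much space the journal currently uses
  run journalctl --verify       # check the journal files for corruption

  # Keep a copy of all entries (journal export format) before deleting anything.
  if [ -n "${DRY_RUN:-}" ]; then
    printf '%s\n' "journalctl -o export > $backup"
  else
    journalctl -o export > "$backup" || return 1
  fi

  run journalctl --vacuum-time=6weeks   # remove archived files older than ~6 weeks
}
```

Running it once with DRY_RUN=1 shows the commands for review; point the backup at a file system with enough free space, since the export can be large.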

Stopping all apps did not prevent the hang on wake-up after hibernation.
Truncating the journal to just 2 files did not help me get logs.
However, leaving the system in the hung state for over 10 minutes (by accident) did result in logs:

Jan 18 10:00:33 kernel: x86/cpu: SGX disabled by BIOS.
Jan 18 10:00:33 kernel: ACPI BIOS Error (bug): AE_AML_BUFFER_LIMIT, Field [CAP1] at bit offset/length 64/32 exceeds size of target Buffer (64 bits) (20210930/dsopcode-198)
Jan 18 10:00:33 kernel: ACPI Error: Aborting method \_SB._OSC due to previous error (AE_AML_BUFFER_LIMIT) (20210930/psparse-529)
Jan 18 10:00:35 kernel: nouveau 0000:01:00.0: bus: MMIO read of 00000000 FAULT at 122124 [ PRIVRING ]
Jan 18 09:00:39 smartd[877]: Device: /dev/nvme0, number of Error Log entries increased from 754 to 756
Jan 18 09:00:39 su[1141]: PAM _pam_load_conf_file: unable to open config for system-auth
Jan 18 09:00:43 systemd[1]: vertica_agent.service: New main PID 6766 does not belong to service, and PID file is not owned by root. Refusing.
Jan 18 09:00:43 systemd[1]: vertica_agent.service: New main PID 6766 does not belong to service, and PID file is not owned by root. Refusing.
Jan 18 09:00:43 systemd[1]: Failed to start Vertica management agent.
Jan 18 09:00:45 kernel: nouveau 0000:01:00.0: gr: intr 00000040
Jan 18 09:01:16 pulseaudio[7088]: GetManagedObjects() failed: org.freedesktop.DBus.Error.NoReply: Did not receive a reply. Possible causes include: the remote application did not send a reply, the message bus security policy blocked the reply, the reply timeout expired, or the network connection was broken.

There are a lot of warning messages (-p4) in the journal, mainly from plasmashell and kwin_x11, but I cannot tell whether any of them are responsible for the hang. Should I include them here?

The next thing I’d try is to see whether I can log in to the hanging machine remotely, but I’m still open to suggestions ;)

Remote login is not possible (no response), but ping works.

I lost patience with this issue and switched from nouveau to the NVIDIA drivers. That resolved the issue for me, which suggests that the nouveau driver was at fault.

Thank you all for your support.

BTW, before switching to NVIDIA, I found these journal entries; maybe they help if someone wants to tackle this bug:


Jan 17 09:15:10 kernel: nouveau 0000:01:00.0: fifo: fault 01 [WRITE] at 0000000000502000 engine 05 [BAR2] client 08 [HUB/HOST_CPU_NB] reason 02 [PTE] on channel -1 [007febf000 unknown] 
Jan 17 09:15:10 kernel: nouveau 0000:01:00.0: fifo: fault 01 [WRITE] at 000000000050a000 engine 05 [BAR2] client 08 [HUB/HOST_CPU_NB] reason 02 [PTE] on channel -1 [007febf000 unknown] 
Jan 17 09:15:10 kernel: nouveau 0000:01:00.0: bus: MMIO read of 00000000 FAULT at 122124 [ PRIVRING ]
Jan 17 09:15:10 kernel: nouveau 0000:01:00.0: fifo: fault 01 [WRITE] at 0000000000507000 engine 05 [BAR2] client 08 [HUB/HOST_CPU_NB] reason 02 [PTE] on channel -1 [007febf000 unknown] 
Jan 17 09:15:10 kernel: nouveau 0000:01:00.0: fifo: PBDMA0: 80040000 [PBENTRY SIGNATURE] ch 7 [007f43b000 plasmashell[1831]] subc 0 mthd 0000 data 00000000 
Jan 17 09:15:10 kernel: nouveau 0000:01:00.0: fifo: PBDMA0: 80000000 [SIGNATURE] ch 7 [007f43b000 plasmashell[1831]] subc 0 mthd 0000 data 00000000 
Jan 17 09:15:10 kernel: nouveau 0000:01:00.0: fifo: PBDMA0: 80040000 [PBENTRY SIGNATURE] ch 6 [007f59c000 kwin_x11[1772]] subc 0 mthd 0000 data 00000000 
Jan 17 09:15:10 kernel: nouveau 0000:01:00.0: fifo: PBDMA0: 80000000 [SIGNATURE] ch 6 [007f59c000 kwin_x11[1772]] subc 0 mthd 0000 data 00000000 
Jan 17 09:15:10 kernel: nouveau 0000:01:00.0: fifo: PBDMA0: 80040000 [PBENTRY SIGNATURE] ch 2 [007f9b8000 Xorg.bin[1540]] subc 0 mthd 0000 data 00000000 
Jan 17 09:15:10 kernel: nouveau 0000:01:00.0: fifo: PBDMA0: 80000000 [SIGNATURE] ch 2 [007f9b8000 Xorg.bin[1540]] subc 0 mthd 0000 data 00000000 
Jan 17 09:15:10 kernel: e1000e 0000:00:1f.6 eno1: DPG_EXIT_DONE took 2510 msec. This is a firmware bug 
Jan 17 09:15:10 kernel: nouveau 0000:01:00.0: fifo: PBDMA0: 80000000 [SIGNATURE] ch 7 [007f43b000 plasmashell[1831]] subc 0 mthd 0000 data 00000000 
Jan 17 09:15:10 kernel: nouveau 0000:01:00.0: fifo: PBDMA0: 80000000 [SIGNATURE] ch 7 [007f43b000 plasmashell[1831]] subc 0 mthd 0000 data 00000000 
Jan 17 09:15:10 kernel: nouveau 0000:01:00.0: fifo: PBDMA0: 00004000 [GPPTR] ch 7 [007f43b000 plasmashell[1831]] subc 0 mthd 0000 data 00000000 
Jan 17 09:15:10 kernel: nouveau 0000:01:00.0: fifo: PBDMA0: 00004000 [GPPTR] ch 7 [007f43b000 plasmashell[1831]] subc 0 mthd 0000 data 00000000 
Jan 17 09:15:10 kernel: nouveau 0000:01:00.0: fifo: fault 00 [READ] at 0000000000000000 engine 06 [HOST0] client 06 [HUB/HOST] reason 00 [PDE] on channel 7 [007f43b000 plasmashell[1831]] 
Jan 17 09:15:10 kernel: ------------[ cut here ]------------
Jan 17 09:15:10 kernel: WARNING: CPU: 0 PID: 18805 at drivers/gpu/drm/nouveau/nvkm/engine/fifo/gk104.c:284 gk104_fifo_engine_id+0x33/0x50 [nouveau] 
Jan 17 09:15:10 kernel: Modules linked in: af_packet nft_fib_inet nft_fib_ipv4 nft_fib_ipv6 nft_fib nft_reject_inet nf_reject_ipv4 nf_reject_ipv6 nft_reject nft_ct nft_chain_nat nf_tables ebtable_nat ebtable_broute ip6table_nat ip6table_mangle ip6table_raw ip6table_security iptable_nat nf_nat nf_conntrack nf_defrag_ipv6 nf_defrag_ipv4 iptable_mangle iptable_raw iptable_security ip_set nfnetlink ebtable_filter ebtables ip6table_filter ip6_tables iptable_filter bpfilter intel_rapl_msr intel_rapl_common intel_pmc_core_pltdrv intel_pmc_core iTCO_wdt intel_pmc_bxt ee1004 iTCO_vendor_support mei_hdcp mei_wdt snd_sof_pci_intel_cnl snd_sof_intel_hda_common soundwire_intel soundwire_generic_allocation soundwire_cadence intel_tcc_cooling x86_pkg_temp_thermal snd_sof_intel_hda intel_powerclamp snd_sof_pci coretemp snd_sof_xtensa_dsp snd_sof pktcdvd soundwire_bus kvm_intel dmi_sysfs snd_hda_codec_conexant snd_soc_skl kvm snd_soc_hdac_hda snd_hda_ext_core snd_soc_sst_ipc snd_hda_codec_generic 
Jan 17 09:15:10 kernel:  hp_wmi sparse_keymap snd_soc_sst_dsp platform_profile snd_soc_acpi_intel_match irqbypass snd_soc_acpi pcspkr efi_pstore rfkill ledtrig_audio wmi_bmof snd_soc_core uvcvideo e1000e snd_hda_codec_hdmi i2c_i801 snd_compress videobuf2_vmalloc snd_pcm_dmaengine videobuf2_memops i2c_smbus videobuf2_v4l2 snd_hda_intel snd_usb_audio videobuf2_common snd_intel_dspcfg snd_intel_sdw_acpi videodev snd_hda_codec snd_usbmidi_lib snd_rawmidi snd_hda_core snd_seq_device mc joydev snd_hwdep snd_pcm mei_me mei snd_timer ucsi_acpi typec_ucsi snd typec soundcore intel_pch_thermal roles thermal acpi_pad tiny_power_button nls_iso8859_1 nls_cp437 vfat fat fuse configfs ip_tables x_tables hid_plantronics hid_generic usbhid i915 nouveau crct10dif_pclmul crc32_pclmul ghash_clmulni_intel drm_ttm_helper mxm_wmi i2c_algo_bit ttm drm_kms_helper syscopyarea sysfillrect xhci_pci sysimgblt xhci_pci_renesas fb_sys_fops aesni_intel cec xhci_hcd rc_core crypto_simd cryptd nvme drm nvme_core 
Jan 17 09:15:10 kernel:  usbcore serio_raw sr_mod cdrom wmi video button btrfs blake2b_generic libcrc32c crc32c_intel xor raid6_pq sg dm_multipath dm_mod scsi_dh_rdac scsi_dh_emc scsi_dh_alua msr efivarfs 
Jan 17 09:15:10 kernel: CPU: 0 PID: 18805 Comm: systemd-udevd Not tainted 5.15.12-1-default #1 openSUSE Tumbleweed 1c4cd75566fe1e0b59eea555aec08277d3586276 
Jan 17 09:15:10 kernel: Hardware name: HP HP Z1 Entry Tower G5/8591, BIOS R01 Ver. 02.04.02 12/27/2019 
Jan 17 09:15:10 kernel: RIP: 0010:gk104_fifo_engine_id+0x33/0x50 [nouveau] 
Jan 17 09:15:10 kernel: Code: 74 30 8b 97 98 04 00 00 48 85 f6 74 1d 85 d2 7e 19 48 81 c7 98 03 00 00 31 c0 48 39 37 74 18 83 c0 01 48 83 c7 10 39 d0 7c f0 <0f> 0b b8 ff ff ff ff c3 b8 0f 00 00 00 c3 66 66 2e 0f 1f 84 00 00 
Jan 17 09:15:10 kernel: RSP: 0000:ffffb352d2687cc0 EFLAGS: 00010046 
Jan 17 09:15:10 kernel: RAX: 0000000000000009 RBX: ffff949fd83ce030 RCX: 0000000000000009 
Jan 17 09:15:10 kernel: RDX: 0000000000000009 RSI: ffff949fd83ce010 RDI: ffff949fd83ce430 
Jan 17 09:15:10 kernel: RBP: 0000000000000006 R08: 0000000000000000 R09: 0000000000000000 
Jan 17 09:15:10 kernel: R10: 0000000000000009 R11: 0000000000000000 R12: 0000000000000046 
Jan 17 09:15:10 kernel: R13: ffff949fd83ce010 R14: ffff949fd83ce008 R15: ffff949fd83ce2b8 
Jan 17 09:15:10 kernel: FS:  00007f0146c88f40(0000) GS:ffff94aecf400000(0000) knlGS:0000000000000000 
Jan 17 09:15:10 kernel: CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033 
Jan 17 09:15:10 kernel: CR2: 00005622c68245a8 CR3: 000000014a1ac005 CR4: 00000000003706f0 
Jan 17 09:15:10 kernel: DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000 
Jan 17 09:15:10 kernel: DR3: 0000000000000000 DR6: 00000000fffe0ff0 DR7: 0000000000000400 
Jan 17 09:15:10 kernel: Call Trace: 
Jan 17 09:15:10 kernel:  <TASK> 
Jan 17 09:15:10 kernel:  gk104_fifo_fault+0x107/0x230 [nouveau 1295e3e7288d5a89b5745927852cfd5a1b1a1ab5] 
Jan 17 09:15:10 kernel:  gp100_fifo_intr_fault+0xe8/0x110 [nouveau 1295e3e7288d5a89b5745927852cfd5a1b1a1ab5] 
Jan 17 09:15:10 kernel:  gk104_fifo_intr+0x2a3/0x500 [nouveau 1295e3e7288d5a89b5745927852cfd5a1b1a1ab5] 
Jan 17 09:15:10 kernel:  ? user_path_at_empty+0x45/0x50 
Jan 17 09:15:10 kernel:  nvkm_mc_intr+0x129/0x170 [nouveau 1295e3e7288d5a89b5745927852cfd5a1b1a1ab5] 
Jan 17 09:15:10 kernel:  nvkm_pci_intr+0x4d/0x90 [nouveau 1295e3e7288d5a89b5745927852cfd5a1b1a1ab5] 
Jan 17 09:15:10 kernel:  __handle_irq_event_percpu+0x37/0x160 
Jan 17 09:15:10 kernel:  handle_irq_event+0x57/0xb0 
Jan 17 09:15:10 kernel:  handle_edge_irq+0x87/0x220 
Jan 17 09:15:10 kernel:  __common_interrupt+0x3b/0xa0 
Jan 17 09:15:10 kernel:  common_interrupt+0x3e/0xa0 
Jan 17 09:15:10 kernel:  ? asm_common_interrupt+0x8/0x40 
Jan 17 09:15:10 kernel:  asm_common_interrupt+0x1e/0x40 
Jan 17 09:15:10 kernel: RIP: 0033:0x560bc7f7d331 
Jan 17 09:15:10 kernel: Code: e5 fe 83 c5 2a 41 89 c0 89 e8 83 c8 01 45 85 c0 0f 49 e8 48 8b 44 24 08 23 68 0c 0f 84 f9 00 00 00 41 c6 47 68 00 48 8b 68 38 <45> 31 e4 48 85 ed 0f 84 9a 00 00 00 0f 1f 00 48 8b 54 24 08 48 89 
Jan 17 09:15:10 kernel: RSP: 002b:00007ffd26166a30 EFLAGS: 00000202 
Jan 17 09:15:10 kernel: RAX: 0000560bc8424400 RBX: 0000560bc8325240 RCX: 0000560bc841f700 
Jan 17 09:15:10 kernel: RDX: 0000000000000000 RSI: 00007ffd26166a70 RDI: 0000560bc8452860 
Jan 17 09:15:10 kernel: RBP: 0000560bc8424490 R08: 0000000000000000 R09: 0000000000000000 
Jan 17 09:15:10 kernel: R10: 0000560bc7f8b1b7 R11: 0000000000000000 R12: 0000000000000000 
Jan 17 09:15:10 kernel: R13: 0000560bc84234f0 R14: 0000000000000009 R15: 0000560bc8501290 
Jan 17 09:15:10 kernel:  </TASK>
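For anyone who wants to dig into this, a rough first step is to count on which processes’ GPU channels the PBDMA errors were raised. This is a hypothetical helper: the function name and the regular expression are assumptions derived only from the log lines quoted above, and other nouveau versions may format these messages differently.

```shell
# Hypothetical helper: count nouveau "fifo: PBDMAn:" error lines per user-space
# process (the "name[pid]" shown for each GPU channel).
# Reads journal text on stdin and prints "process count", sorted by name (C locale).
summarize_pbdma_faults() {
  LC_ALL=C sed -n 's/.*fifo: PBDMA[0-9]*: [0-9a-f]* \[[A-Z ]*] ch [0-9]* \[[0-9a-f]* \([A-Za-z0-9_.-]*\)\[[0-9]*]].*/\1/p' \
    | LC_ALL=C sort \
    | uniq -c \
    | awk '{ print $2, $1 }'
}
```

Usage example: journalctl -k -b -1 --no-pager | summarize_pbdma_faults. Fed the Jan 17 excerpt above, it should print Xorg.bin 2, kwin_x11 2 and plasmashell 6, one per line; the "fifo: fault" lines are not counted.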