Weird hibernation crashes

Hi,

After many years of using hibernation reliably, for the last month or two
it has been crashing after a few days of use. Most often it stops at the
“s2disk: snapshotting system” message; less often it stops at the same
message while restoring.

(I’m currently using openSUSE 11.4 with Gnome, but the desktop is not related.)

As this was working before, I assume some update broke it. My first
suspect was pm-utils, which got a few patches recently. I’m now using
1.4.1-5.9.1 instead of the newest update, 1.4.1-5.27.1. No difference.

My next suspect was the kernel. I’m using 2.6.37.6-0.11-desktop, and I have
gone back to 2.6.37.6-0.9-desktop (dated last October). No difference, so
I’m back at the newer kernel.

Having no success there, I don’t know what to suspect.

I thought it might be related to vmplayer: I started it, then hibernated -
success, so vmplayer is not the culprit. I have also tried with
libreoffice (!), gimp, scanning a document… I’m clutching at straws here.

I don’t know what to try next :-(

One strange thing I noticed today is that swap is hardly used (and yes, I
also checked the swap space for bad blocks). My system has 8 GiB of RAM, but
it normally still uses swap (up to 2…3 GiB) when I open several
applications that use lots of RAM. Today only 56K are in use, which is
strange.
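
In case anyone wants to compare numbers, here is a minimal sketch of how swap use can be read from /proc/meminfo. The function name swap_used_kib is made up for illustration; SwapTotal and SwapFree are the standard fields there, both reported in kB.

```shell
# Print swap currently in use, in KiB, from a meminfo-style file.
# Defaults to /proc/meminfo; a different file can be passed as $1.
swap_used_kib() {
    awk '/^SwapTotal:/ { total = $2 }
         /^SwapFree:/  { free = $2 }
         END           { print total - free }' "${1:-/proc/meminfo}"
}
```

Calling swap_used_kib with no argument reads the live values; "free" or "swapon -s" should show the same information.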


Cheers / Saludos,

Carlos E. R.
(from 11.4 x86_64 “Celadon” at Telcontar)

Since no one else has responded, I’ll summarize my current challenges with S2ram (sleep), probably unrelated but who knows these days.

My system is different: I am running 12.1/KDE, and my problem is on a laptop running Network Manager and DHCP, with NFS shares mounted.
I have a desktop running virtually the same setup, except that it uses ifup (traditional) networking and static addressing, and it sleeps just fine.

My problem appears to be that NFS shares don’t get properly unmounted, so plasma won’t sleep and the system comes back up after a 20 second wait.

If I manually umount all the nfs shares with

sudo umount -alt nfs,nfs4

then initiate a sleep, all works fine and it sleeps.
But if I execute the same command from a user script placed in /etc/pm/sleep.d with a 00 name prefix, the sleep usually fails.
The script does run (I get a report at the top of /var/log/pm-utils.log) and reports successful completion, but sleep usually (but not always) fails to complete. From the backtrace in /var/log/messages, it appears that plasma is having issues with NFS and won’t sleep.
The laptop worked fine until I updated to KDE 4.8.3 (it was fine on 4.8.2).
Sometimes it works. It feels like a race condition, with the “lazy” umount (-l option) not having completed before a follow-on step.
So far, I have added a 3 second sleep to my /etc/pm/sleep.d/00aa_nfsworkaround script, but no help.

I have a bug open (BUG) and have been corresponding with the pm-utils folks (I think).

So probably unrelated to your hibernate issues, but…

> So far, I have added a 3 second sleep to my /etc/pm/sleep.d/00aa_nfsworkaround script, but no help.

Well how about that, in replying to your problem, I remembered that I wanted to try a longer sleep in my umount script.
I changed to

sudo umount -alt nfs,nfs4
sleep 5s

and it worked, once anyway.

So maybe I fixed my issue!
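
For anyone who wants to try the same workaround, this is roughly how such a hook could look as a complete script. It is only a sketch: pm-utils passes the action (suspend, hibernate, resume or thaw) as the first argument and runs hooks as root, so sudo should not be needed inside it. The DRY_RUN variable, the run helper and the remount on resume are my assumptions for illustration, not part of the script described above.

```shell
#!/bin/sh
# Sketch of /etc/pm/sleep.d/00aa_nfsworkaround.
# DRY_RUN=1 only prints the commands instead of running them (illustration aid).

run() {
    if [ "${DRY_RUN:-0}" = "1" ]; then
        echo "$*"
    else
        "$@"
    fi
}

nfs_workaround() {
    case "$1" in
        suspend|hibernate)
            # Lazy-unmount all NFS shares, then wait before the next hook runs.
            run umount -a -l -t nfs,nfs4
            run sleep 5
            ;;
        resume|thaw)
            # Assumption: the shares are listed in /etc/fstab.
            run mount -a -t nfs,nfs4
            ;;
    esac
}

nfs_workaround "${1:-}"
```

The fixed 5 second wait is still a guess at how long the lazy unmount needs. With Network Manager the network may not be up yet when the resume branch runs, so the remount may have to be done by hand.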

On 2012-05-22 17:16, cmcgrath5035 wrote:
>
> Since no one else has responded, I’ll summarize my current challenges
> with S2ram (sleep), probably unrelated but who knows these days.

I just discovered that “suspend…rpm” was also updated. After the next crash
I’ll try an older version of it. Although there has been an update of
it today, I’ll wait.

> My system is different, I am running 12.1/KDE and my problem is on a
> laptop running Network Manager, DHCP with NFS shares mounted.
> I have a desktop running virtually same setup except it is ifup
> (traditional) networking, static addressing and it sleeps just fine.

Uh, NFS. There are problems with that one.

Once I reported a bug [Bug 568132] with that, because hibernating with NFS
mounts active would crash the server. What I proposed (and I still have
implemented) was aborting hibernation if a mount was found.
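
A check of that kind could be sketched as a pm-utils hook like the one below. This is not my actual implementation, only an illustration. It assumes that "a mount" means an NFS filesystem mounted on the machine that is about to hibernate, and that a hook exiting non-zero during suspend or hibernate makes pm-utils abort. The names nfs_mounts and nfs_guard are hypothetical.

```shell
#!/bin/sh
# Sketch of a sleep.d hook that refuses to sleep while NFS shares are mounted.

# List mount points of type nfs/nfs4 from a mounts table (default: /proc/mounts).
nfs_mounts() {
    awk '$3 == "nfs" || $3 == "nfs4" { print $2 }' "${1:-/proc/mounts}"
}

# $1 = pm-utils action, $2 = optional mounts table (useful for testing).
nfs_guard() {
    case "$1" in
        suspend|hibernate)
            found=$(nfs_mounts "${2:-/proc/mounts}")
            if [ -n "$found" ]; then
                echo "NFS still mounted: $found" >&2
                return 1   # non-zero status is meant to abort the sleep
            fi
            ;;
    esac
    return 0
}

nfs_guard "${1:-}"
```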

Instead they investigated why this would fail and apparently solved the
root issue.

> So probably unrelated to your hibernate issues, but…

Yep, it is unrelated, but thanks.


Cheers / Saludos,

Carlos E. R.
(from 11.4 x86_64 “Celadon” at Telcontar)

Carlos,

I’ve also had my system begin to hang at the snapshotting system stage with OpenSUSE 12.1. Did your regression to an older suspend help?

On 2012-06-30 22:36, duncreg wrote:
>
> Carlos,
>
> I’ve also had my system begin to hang at the snapshotting system stage
> with OpenSUSE 12.1. Did your regression to an older suspend help?

No.

Right now I’m experimenting with a modified s2disk and a modified kernel.
Yes, I’m modifying the kernel itself to investigate this…

Yes, if somebody produces and distributes a modified kernel and a modified
s2disk program, it would allow others to test and add information to this
bug - but that is beyond my capabilities, sorry. We need help from the devs
for that, and so far, there is no answer to the bugzilla.

All these threads are related:

This one shows a similar problem, and started me investigating this:
“Cannot hibernate (s2disk)”. Some of my progress is tracked there.

This was my starting point:
How to install a source package

And here I investigate how to do the hacking I need:
“how to edit and build rpm sources ?”. I got help to modify the s2disk
program, and learned that hibernation was failing at a kernel call.

This is the Bugzilla where I request help from the devs:
Bugzilla #

Please, add at least a ME TOO message to that bugzilla! It is crucial
that people suffering from this problem let the devs know that several
people are affected. As many as possible!

This mailing list thread is where I ask for help on that bugzilla, to attract
attention to it. David Haller told me how to hack the kernel, which I had
no idea how to do.
Mailinglist Archive: Need help: I found out where s2disk crashes.

This is where I ask for help in the kernel mail list. Nobody answered.
Mailinglist Archive: call to ioctl() does not return in s2disk

So…


Cheers / Saludos,

Carlos E. R.
(from 11.4 x86_64 “Celadon” at Telcontar)

On 2012-07-01 00:13, Carlos E. R. wrote:
> On 2012-06-30 22:36, duncreg wrote:

> So…

Status report.

In “/etc/suspend.conf” change “suspend loglevel” to 7 at least, and make
sure that “splash” is “n”. When hibernation crashes, note the last
message: write it down or, better, take a photo of it. When hibernation
succeeds, locate that same message in that session and look at the
messages that follow it: the crash probably happens at that step.
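
The two steps can be sketched as small shell helpers. The function names are made up for illustration, and the parsing assumes plain "key = value" lines in the config file. grep -A is the GNU grep option that prints lines following a match.

```shell
# Print the value of a "key = value" setting; $1 = key, $2 = file.
conf_value() {
    awk -F= -v key="$1" '
        { k = $1; v = $2
          gsub(/^[ \t]+|[ \t]+$/, "", k)
          gsub(/^[ \t]+|[ \t]+$/, "", v)
          if (k == key) val = v }
        END { print val }' "$2"
}

# Succeed only if loglevel is at least 7 and splash is "n".
check_suspend_conf() {
    file="${1:-/etc/suspend.conf}"
    level=$(conf_value "suspend loglevel" "$file")
    splash=$(conf_value "splash" "$file")
    [ "${level:-0}" -ge 7 ] && [ "$splash" = "n" ]
}

# Show a message and the N lines that follow it in a log.
# $1 = last message seen before the hang, $2 = log file, $3 = N (default 3).
what_comes_next() {
    grep -F -A "${3:-3}" -- "$1" "$2"
}
```

For example, after a successful hibernation, what_comes_next 'eth0: link up' /var/log/messages would print that message and the three lines logged after it.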

In my case it fails here:


> <0.6> 2012-07-06 19:54:07 Telcontar kernel - - - [  792.002714] r8169 0000:06:00.0: eth0: link up

and what should go next is this:


> <0.6> 2012-07-06 19:54:07 Telcontar kernel - - - [  792.291119] usb 2-5.4: reset high speed USB device using ehci_hcd and address 4
> <0.4> 2012-07-06 19:54:07 Telcontar kernel - - - [  792.953016] snd-usb-audio 2-5.4:1.2: no reset_resume for driver snd-usb-audio?
> <0.4> 2012-07-06 19:54:07 Telcontar kernel - - - [  792.953214] snd-usb-audio 2-5.4:1.3: no reset_resume for driver snd-usb-audio?

which matches what Jeff Mahoney told me on the opensuse-kernel mailing list:
to start by unloading the USB audio driver (see here).

Thus now I’m hibernating this way:


# echo ; date ; rcalsasound stop;  pm-hibernate ; rcalsasound start ; date

I don’t know what will happen, I simply hope it does not crash.
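
The same sequence could be wrapped in a small function, sketched below, so that sound is always restarted and the result of pm-hibernate is kept. The function name and the DRY_RUN switch are only for illustration. Like the one-liner, it has to run as root.

```shell
#!/bin/sh
# DRY_RUN=1 only prints the commands instead of running them (illustration aid).
run() {
    if [ "${DRY_RUN:-0}" = "1" ]; then
        echo "$*"
    else
        "$@"
    fi
}

hibernate_without_sound() {
    run date
    run rcalsasound stop
    run pm-hibernate
    status=$?
    # Restart sound whether or not hibernation succeeded.
    run rcalsasound start
    run date
    return "$status"
}
```

Running DRY_RUN=1 hibernate_without_sound first shows what would be executed without touching the system.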


Cheers / Saludos,

Carlos E. R.
(from 11.4 x86_64 “Celadon” at Telcontar)