machine keeps shutting off in the middle of the night

It has been running OK for a few months…recently it has been shutting itself off when nobody is using it.
It’s used as a file server.

OpenSuse 10.3 32-bit
Linux Kernel 2.6.22.17-0.1-default
Intel Q6600 quad processor
Intel D975XBX2 M/B
Using linux software RAID with SATA drives

I get this in the messages log:

Jun 26 01:51:02 server kernel: Pid: 186, comm: pdflush Tainted: G N 2.6.22.17-0.1-default #1
Jun 26 01:51:02 server kernel: RIP: 0010:[<ffffffff802643fc>] [<ffffffff802643fc>] page_waitqueue+0x58/0x6d
Jun 26 01:51:02 server kernel: RSP: 0018:ffff8101299abc08 EFLAGS: 00010206
Jun 26 01:51:02 server kernel: RAX: 08face00e7401160 RBX: ffff81012cc01160 RCX: 0000000000000040
Jun 26 01:51:02 server kernel: RDX: 0000000000001500 RSI: c000000000000000 RDI: 0000000000000000
Jun 26 01:51:02 server kernel: RBP: ffff8101299abe60 R08: 0000000000000000 R09: 000000000000003f
Jun 26 01:51:02 server kernel: R10: ffff81012cc018e0 R11: 0000000000000006 R12: 0000000000000000
Jun 26 01:51:02 server kernel: R13: 0000000000000000 R14: ffff8101299abc50 R15: ffff81012b7579a8
Jun 26 01:51:02 server kernel: FS: 0000000000000000(0000) GS:ffffffff80500000(0000) knlGS:0000000000000000
Jun 26 01:51:02 server kernel: CS: 0010 DS: 0018 ES: 0018 CR0: 000000008005003b
Jun 26 01:51:02 server kernel: CR2: 0000000000001b90 CR3: 000000011abb3000 CR4: 00000000000006e0
Jun 26 01:51:02 server kernel: Process pdflush (pid: 186, threadinfo ffff8101299aa000, task ffff81012a850100)
Jun 26 01:51:02 server kernel: Stack: ffffffff80264a89 ffff81012cc01160 ffffffff80269d82 ffff8101299abd20
Jun 26 01:51:02 server kernel: ffffffff802ae702 ffff810042fb6e68 000000000000000e ffffffffffffffff
Jun 26 01:51:02 server kernel: 0000000044dc5268 000000000000000e 0000000000000000 ffff81012cc01160
Jun 26 01:51:02 server kernel: Call Trace:
Jun 26 01:51:02 server kernel: [<ffffffff80264a89>] unlock_page+0x18/0x26
Jun 26 01:51:02 server kernel: [<ffffffff80269d82>] write_cache_pages+0x171/0x2cd
Jun 26 01:51:02 server kernel: [<ffffffff802ae702>] __mpage_writepage+0x0/0x4fd
Jun 26 01:51:02 server kernel: [<ffffffff802aec8d>] mpage_writepages+0x40/0x5d
Jun 26 01:51:02 server kernel: [<ffffffff883fecec>] :fat:fat_get_block+0x0/0x1e8
Jun 26 01:51:02 server kernel: [<ffffffff80269f1a>] do_writepages+0x20/0x2d
Jun 26 01:51:02 server kernel: [<ffffffff802a4c8a>] __writeback_single_inode+0x1d7/0x3bb
Jun 26 01:51:02 server kernel: [<ffffffff802a51ce>] sync_sb_inodes+0x1c4/0x2b0
Jun 26 01:51:02 server kernel: [<ffffffff802a566f>] writeback_inodes+0x82/0xdb
Jun 26 01:51:02 server kernel: [<ffffffff8026a462>] wb_kupdate+0x9e/0x111
Jun 26 01:51:02 server kernel: [<ffffffff8026a77a>] pdflush+0x0/0x203
Jun 26 01:51:02 server kernel: [<ffffffff8026a8d3>] pdflush+0x159/0x203
Jun 26 01:51:02 server kernel: [<ffffffff8026a3c4>] wb_kupdate+0x0/0x111
Jun 26 01:51:02 server kernel: [<ffffffff8024420c>] kthread+0x47/0x73
Jun 26 01:51:02 server kernel: [<ffffffff8020aa48>] child_rip+0xa/0x12
Jun 26 01:51:02 server kernel: [<ffffffff802441c5>] kthread+0x0/0x73
Jun 26 01:51:02 server kernel: [<ffffffff8020aa3e>] child_rip+0x0/0x12
Jun 26 01:51:02 server kernel:
Jun 26 01:51:02 server kernel:
Jun 26 01:51:02 server kernel: Code: 2b 8a 90 06 00 00 48 d3 e8 48 6b c0 18 48 03 82 80 06 00 00
Jun 26 01:51:02 server kernel: RIP [<ffffffff802643fc>] page_waitqueue+0x58/0x6d
Jun 26 01:51:02 server kernel: RSP <ffff8101299abc08>
Jun 26 01:51:02 server kernel: CR2: 0000000000001b90

After this, it seems to run for a while…then shuts itself off. We turn it on in the morning and it seems to work fine all day.

Any help appreciated…

If you are running KDE I would start by checking KPowersave.

It may be that it is just set to shut down after a specified period of time.

If it is a server and you want it to run I would check Autosuspend.

Maybe it is set to suspend after a period of time, and it is suspending to disk on you.

Nope…running Gnome.

As I mentioned…it has run fine for months. Now mysteriously shuts off. Doesn’t reboot. BIOS is set to turn on if power is disrupted.

I come in and turn on the power button and it seems to be ok for a while.

Anybody know what the kernel message “pdflush tainted” means?

On Thu, 26 Jun 2008 20:26:03 GMT
randallcwilkinson <randallcwilkinson@no-mx.forums.opensuse.org> wrote:

>
> Nope…running Gnome.
>
> As I mentioned…it has run fine for months. Now mysteriously shuts
> off. Doesn’t reboot. BIOS is set to turn on if power is disrupted.
>
> I come in and turn on the power button and it seems to be ok for a
> while.
>
> Anybody know what the kernel message “pdflush tainted” means?
>
>
Hi
Power supply failing, memory, all the core temps on cpu ok? Have you
checked the RAID is ok?


Cheers Malcolm °¿° (Linux Counter #276890)
SLED 10.0 SP2 x86_64 Kernel 2.6.16.60-0.23-smp
up 15 days 17:15, 0 users, load average: 0.69, 0.27, 0.17
GPU GeForce 8600 GTS Silent - Driver Version: 173.14.09

> Power supply failing, memory, all the core temps on cpu ok? Have you
> checked the RAID is ok?

yep…i agree…i’d look for weakening hardware…maybe there is a
cron job “in the middle of the night” that is prompting a hardware
problem to kick it dead…

maybe lie to it and set the system clock 12 hours off and see if it
starts dying in the middle of the day…(and you can, of course, run
top/atop to a log to keep up with what the system/network load is as it
dies)…

DenverD
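
For the "run top/atop to a log" idea, here is a rough Python sketch of a stand-in that just records the load averages. It is not top or atop; the function names, the log path and the one-minute interval are all my own assumptions. Each line is flushed and synced so the last readings should still be on disk if the box powers off.

```python
import os
import time


def format_sample(timestamp, load):
    """Render one log line: local time plus the 1/5/15-minute load averages."""
    stamp = time.strftime("%Y-%m-%d %H:%M:%S", time.localtime(timestamp))
    one, five, fifteen = load
    return f"{stamp} load {one:.2f} {five:.2f} {fifteen:.2f}"


def log_load(path, samples, interval=60.0):
    """Append `samples` readings to `path`, one every `interval` seconds."""
    with open(path, "a") as log:
        for i in range(samples):
            log.write(format_sample(time.time(), os.getloadavg()) + "\n")
            # Push the line to disk now; a sudden shutdown would otherwise
            # lose whatever was still buffered.
            log.flush()
            os.fsync(log.fileno())
            if i + 1 < samples:
                time.sleep(interval)
```

Something like log_load("/var/log/loadwatch.log", samples=720) would cover twelve hours at the default interval; the path is only an example.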

randallcwilkinson wrote:
> As I mentioned…it has run fine for months. Now mysteriously shuts
> off. Doesn’t reboot. BIOS is set to turn on if power is disrupted.

I would suspect RAM gone bad. Happens. Try running memtest86 for a
couple of hours - has been a revelation to me more than once. :)

> Anybody know what the kernel message “pdflush tainted” means?

“pdflush” is the name of the process that was running at the time
the kernel error was detected. “tainted” means something has made
the kernel’s state less trustworthy for debugging, most commonly a
non-opensource kernel module having been loaded. That would be
relevant if a kernel bug were suspected as the reason for the crash,
because then the first step would be to try reproducing the problem
with an untainted kernel, i.e. without any non-opensource modules
loaded. But your case looks very much like a hardware problem to
me, so I wouldn’t worry too much about the “tainted” message.

HTH
T.
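
To make the "Pid: ... comm: ... Tainted: ..." line a bit less cryptic, here is a small Python sketch that pulls it apart. The function names and the regular expression are mine, written against the one line posted in this thread, so treat them as assumptions. It only decodes the two flags I am sure about from the kernel documentation (P and G); anything else, such as the N in the posted line, is reported undecoded.

```python
import re

# Pattern written against the single "Pid:" line posted in this thread.
PID_LINE = re.compile(
    r"Pid: (?P<pid>\d+), comm: (?P<comm>\S+) "
    r"(?:Not tainted|Tainted: (?P<flags>[A-Z ]+?)) +(?P<release>\d\S*)"
)

# Deliberately short: only flags whose meaning is certain are listed.
KNOWN_FLAGS = {
    "P": "a proprietary module has been loaded",
    "G": "all loaded modules have a GPL-compatible license",
}


def parse_pid_line(line):
    """Return pid, process name, kernel release and taint flags, or None."""
    match = PID_LINE.search(line)
    if match is None:
        return None
    flag_text = match.group("flags") or ""
    return {
        "pid": int(match.group("pid")),
        "comm": match.group("comm"),
        "release": match.group("release"),
        "flags": [ch for ch in flag_text if ch != " "],
    }


def describe_flags(flags):
    """Attach a description to each flag; unknown flags are left undecoded."""
    fallback = "not decoded here, check the kernel documentation"
    return [f"{flag}: {KNOWN_FLAGS.get(flag, fallback)}" for flag in flags]
```

Fed the line from the first post, parse_pid_line gives pid 186, comm pdflush and the flags G and N, which matches the reading that pdflush was simply the process running when the error hit.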

Now that you mention it a backup cron job was scheduled only moments before these kernel messages occurred.

I have removed those for now. We’ll see tomorrow

The RAID array seems OK after a reboot and fsck.

I’ll have to wait for some downtime to do memtest.

The only non-Suse provided driver/module is nvidia. I’ll try ditching it and going to nv.

On Thu, 26 Jun 2008 22:26:04 GMT
randallcwilkinson <randallcwilkinson@no-mx.forums.opensuse.org> wrote:

>
> Now that you mention it a backup cron job was scheduled only moments
> before these kernel messages occurred.
>
> I have removed those for now. We’ll see tomorrow
>
> The RAID array seems OK after a reboot and fsck.
>
> I’ll have to wait for some downtime to do memtest.
>
> The only non-Suse provided driver/module is nvidia. I’ll try ditching
> it and going to nv.
>
>
Hi
Have you checked the RAID status via mdadm --detail /dev/md<number> ?


Either the nvidia driver is leaking memory or you have a bad memory stick. There isn’t enough detail in that log posting to tell. If it stays up after going nv, then you know.

I eliminated a backup cron job that copied files to an external FAT32 HD each night. That seemed to take care of it for now.
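
That fits the trace in the first post, which has a frame from the fat module (fat_get_block) in the writeback path. As a rough Python sketch, and assuming the ":module:symbol+0x.../0x..." notation seen in that trace, frames that belong to loadable modules can be picked out like this (the function name is mine):

```python
import re

# Matches frames written as ":module:symbol+0xOFF/0xLEN", as in the trace above.
MODULE_FRAME = re.compile(
    r":(?P<module>\w+):(?P<symbol>\w+)\+0x[0-9a-f]+/0x[0-9a-f]+"
)


def module_frames(log_text):
    """Return (module, symbol) pairs for call-trace frames inside modules."""
    return [
        (m.group("module"), m.group("symbol"))
        for m in MODULE_FRAME.finditer(log_text)
    ]
```

It does not prove the FAT driver is at fault, it only shows which modules were on the call path when the error was logged.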

Soon I will be switching to a 3ware hardware based RAID card and upgrading to openSuSE 11.0 in two weeks.

> Soon I will be switching to a 3ware hardware based RAID card and
> upgrading to openSuSE 11.0 in two weeks.

be sure and do some reading about the pitfalls/hmmmmm, growing pains of
11.0…and especially decide if you wanna be a test pilot without a
parachute on KDE4.x

DenverD

DenverD wrote:
>> Soon I will be switching to a 3ware hardware based RAID card and
>> upgrading to openSuSE 11.0 in two weeks.
>
> be sure and do some reading about the pitfalls/hmmmmm, growing pains of
> 11.0…and especially decide if you wanna be a test pilot without a
> parachute on KDE4.x
>
> DenverD
Or just fall back to 3.5.9 if you are a KDE fan; otherwise GNOME and Xfce
are other viable options.