BUG: Bad Page State in Process <process>

Hello,

I recently upgraded from openSUSE 11.1 to openSUSE 11.2 (64-bit).

I am running the Xen kernel and get daily kernel hangs, with the Caps Lock LED flashing.

dmesg shows the following errors, which I suspect are related:

[ 7176.752200] BUG: Bad page state in process nfsd  pfn:1e6d80
[ 7176.752205] page:ffff88000a039480 flags:8000000000000800 count:0 mapcount:0 mapping:(null) index:0
[ 7176.752209] Pid: 2369, comm: nfsd Tainted: P    B      2.6.31.5-0.1-xen #1
[ 7176.752211] Call Trace:
[ 7176.752222]  [<ffffffff800119b9>] try_stack_unwind+0x189/0x1b0
[ 7176.752227]  [<ffffffff8000f466>] dump_trace+0xa6/0x1e0
[ 7176.752230]  [<ffffffff800114c4>] show_trace_log_lvl+0x64/0x90
[ 7176.752234]  [<ffffffff80011513>] show_trace+0x23/0x40
[ 7176.752239]  [<ffffffff8046af06>] dump_stack+0x81/0x9e
[ 7176.752243]  [<ffffffff800dacb5>] bad_page+0xf5/0x160
[ 7176.752248]  [<ffffffff800dcf94>] free_hot_cold_page+0xa4/0x2b0
[ 7176.752277]  [<ffffffff800dd26e>] free_hot_page+0x1e/0x40
[ 7176.752281]  [<ffffffff800e10d7>] put_page+0x57/0x150
[ 7176.752286]  [<ffffffff802ff7cb>] gnttab_page_free+0x3b/0x60
[ 7176.752290]  [<ffffffff800dcf47>] free_hot_cold_page+0x57/0x2b0
[ 7176.752294]  [<ffffffff800dd26e>] free_hot_page+0x1e/0x40
[ 7176.752298]  [<ffffffff800e10d7>] put_page+0x57/0x150
[ 7176.752302]  [<ffffffff803b5f1c>] skb_release_data+0x8c/0x100
[ 7176.752306]  [<ffffffff803b5848>] __kfree_skb+0x28/0xd0
[ 7176.752310]  [<ffffffff80400ea8>] sk_eat_skb+0x78/0x90
[ 7176.752314]  [<ffffffff804040a6>] tcp_recvmsg+0x8e6/0xda0
[ 7176.752319]  [<ffffffff803b0183>] sock_common_recvmsg+0x43/0x70
[ 7176.752322]  [<ffffffff803ae308>] sock_recvmsg+0x128/0x170
[ 7176.752326]  [<ffffffff803ae396>] kernel_recvmsg+0x46/0x80
[ 7176.752348]  [<ffffffffa054c62d>] svc_recvfrom+0x6d/0xc0 [sunrpc]
[ 7176.752378]  [<ffffffffa054d563>] svc_tcp_recvfrom+0x1e3/0x450 [sunrpc]
[ 7176.752402]  [<ffffffffa055a89a>] svc_recv+0x70a/0x7e0 [sunrpc]
[ 7176.752438]  [<ffffffffa0635c1d>] nfsd+0xbd/0x180 [nfsd]
[ 7176.752443]  [<ffffffff8006fb96>] kthread+0xb6/0xc0
[ 7176.752447]  [<ffffffff8000d38a>] child_rip+0xa/0x20

The process named in the message is not always nfsd; sometimes it is lockd, and sometimes other modules.
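
For anyone wanting to check which processes these reports name and how often, here is a minimal Python sketch that tallies them from saved dmesg text. The function name tally_bad_page_processes is made up for illustration, and the pattern assumes the message wording matches the first line of the trace above (it also assumes process names contain no spaces).

```python
import re
from collections import Counter

# Matches the first line of a report, e.g.
#   BUG: Bad page state in process nfsd  pfn:1e6d80
# Assumption: the process name contains no whitespace.
BAD_PAGE_RE = re.compile(r"BUG: Bad page state in process (\S+)\s+pfn:([0-9a-f]+)")


def tally_bad_page_processes(dmesg_text):
    """Count 'Bad page state' reports per process name (hypothetical helper)."""
    counts = Counter()
    for line in dmesg_text.splitlines():
        match = BAD_PAGE_RE.search(line)
        if match:
            counts[match.group(1)] += 1
    return counts


# Usage with the one report line quoted in this post:
sample = "[ 7176.752200] BUG: Bad page state in process nfsd  pfn:1e6d80"
for name, count in tally_bad_page_processes(sample).most_common():
    print(f"{name}: {count}")
```

Running this over the full dmesg output (or the saved kernel log) should show whether the reports cluster on nfsd and lockd or are spread across unrelated processes.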

I have the following modules loaded:


nvidia              10326408  20     
nf_conntrack_ipv4      14232  2      
nf_defrag_ipv4          2856  1 nf_conntrack_ipv4
xt_state                2920  2                  
nf_conntrack          100768  2 nf_conntrack_ipv4,xt_state
xt_physdev              2968  4                           
iptable_filter          4520  1                           
ip_tables              24600  1 iptable_filter            
x_tables               29936  3 xt_state,xt_physdev,ip_tables
drbd                  324888  2                              
netbk                 207248  0 [permanent]                  
blkbk                  32824  0 [permanent]                  
blkback_pagemap         4032  1 blkbk                        
blktap                131684  2 [permanent]                  
xenbus_be               4904  3 netbk,blkbk,blktap           
snd_pcm_oss            59744  0                              
snd_mixer_oss          21640  1 snd_pcm_oss                  
coretemp                8680  0                              
snd_seq                78144  0                              
f71882fg               35224  0                              
snd_seq_device         10300  1 snd_seq                      
nfsd                  355392  5                              
exportfs                6376  1 nfsd                         
nfs                   402880  1                              
lockd                  95476  2 nfsd,nfs                     
fscache                53616  1 nfs                          
nfs_acl                 4072  2 nfsd,nfs                     
auth_rpcgss            56416  2 nfsd,nfs                     
sunrpc                265928  21 nfsd,nfs,lockd,nfs_acl,auth_rpcgss
ipv6                  373632  40                                   
bridge                 74504  1                                    
stp                     3340  1 bridge                             
llc                     8560  2 bridge,stp                         
microcode               5204  0                                    
fuse                   88624  1                                    
ext4                  378128  1                                    
jbd2                  106208  1 ext4                               
crc16                   2504  1 ext4                               
loop                   22228  2                                    
dm_mod                100712  30                                   
snd_hda_codec_realtek   317292  1                                  
nouveau               751508  0                                    
ttm                    76368  1 nouveau                            
drm_kms_helper         36744  1 nouveau                            
snd_hda_intel          38016  4                                    
drm                   236704  3 nouveau,ttm,drm_kms_helper         
iTCO_wdt               15312  0
snd_hda_codec         110664  2 snd_hda_codec_realtek,snd_hda_intel
pcspkr                  3720  0
i2c_algo_bit            8396  1 nouveau
i2c_i801               15624  0
r8169                  46796  0
iTCO_vendor_support     4908  1 iTCO_wdt
snd_hwdep              11056  1 snd_hda_codec
i2c_core               38176  6 nvidia,nouveau,drm_kms_helper,drm,i2c_algo_bit,i2c_i801
intel_agp              37872  0
snd_pcm               120048  4 snd_pcm_oss,snd_hda_intel,snd_hda_codec
snd_timer              33400  3 snd_seq,snd_pcm
agpgart                51212  4 nvidia,ttm,drm,intel_agp
button                  8232  0
8250_pnp               18728  0
8250                   36744  1 8250_pnp
serial_core            31528  1 8250
sr_mod                 20420  0
sg                     39808  0
lirc_mceusb2           16556  1
lirc_dev               15240  3 lirc_mceusb2
snd                    96168  16 snd_pcm_oss,snd_mixer_oss,snd_seq,snd_seq_device,snd_hda_codec_realtek,snd_hda_intel,snd_hda_codec,snd_hwdep,snd_pcm,snd_timer
soundcore              11200  1 snd
snd_page_alloc         12376  2 snd_hda_intel,snd_pcm
uhci_hcd               34056  0
ehci_hcd               64024  0
xenblk                 31596  0
cdrom                  47688  2 sr_mod,xenblk
xennet                 46304  0
edd                    13168  0
reiserfs              286888  6
fan                     6352  0
ide_pci_generic         5484  0
ide_core              147360  1 ide_pci_generic
ata_piix               30320  8
ata_generic             6508  0
pata_jmicron            4552  0
thermal                25032  0
processor              52692  0
thermal_sys            21632  3 fan,thermal,processor
hwmon                   4648  3 coretemp,f71882fg,thermal_sys

nvidia.ko taints the kernel, but the same problem happens without it.

I expected this to be a memory error, but memtest86+ ran overnight without reporting any errors.

I am wondering if it is a disk driver error causing nfsd to go awry. I don’t see any SMART errors on any of the disks.

Any ideas?

Paul