ECC memory error question

Hi all: I have a big Opteron server with 256 GB of Kingston ECC DDR3 DRAM. I was wondering if any hardware wizard out there can decode this for me?

Message from syslogd@OS121-TY3 at Jun 10 06:40:08 ...
 kernel:[1961523.572779] [Hardware Error]: CPU:36       MC4_STATUS[-|CE|MiscV|-|AddrV|CECC]: 0x9c08400008080a13

Message from syslogd@OS121-TY3 at Jun 10 06:40:08 ...
 kernel:[1961523.572791] [Hardware Error]:      MC4_ADDR: 0x0000003186466d90

Message from syslogd@OS121-TY3 at Jun 10 06:40:08 ...
 kernel:[1961523.572795] [Hardware Error]: Northbridge Error (node 6): DRAM ECC error detected on the NB.
                                                                                                                            
Message from syslogd@OS121-TY3 at Jun 10 06:40:08 ...
 kernel:[1961523.572815] [Hardware Error]: cache level: L3/GEN, mem/io: MEM, mem-tx: RD, part-proc: RES (no timeout)

I think this means it caught and fixed an error, but I’m not sure. (The only reason I think so is that it is supposedly “error-correcting” memory…) :expressionless: It throws errors like this about once every few weeks, always when the box is running near full load: no swapping, ~90% of the available CPUs in use at 100% each, and 40% memory utilization.

THANK YOU!!:slight_smile:

EDIT: Trusty internet… http://superuser.com/questions/502269/hardware-error-messages-from-syslogd - I guess my main question is, shouldn’t it actually say that the error was corrected?
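
For what it's worth, here is a rough Python sketch of how the status word in the first message could be unpacked. It assumes the MCi_STATUS bit layout used by the Linux kernel's MCE code (Val=63, Over=62, UC=61, En=60, MiscV=59, AddrV=58, PCC=57) and the AMD-specific CECC/UECC bits at 46/45. The function name `decode_mc_status` and the derived `corrected` field are made up for illustration. If those assumptions hold, the `CE` in the bracketed flag list is the kernel's way of saying "corrected error" (UC bit clear), and `CECC` says it was corrected by ECC.

```python
# Bit positions are an assumption based on the Linux kernel's MCi_STATUS
# definitions plus the AMD CECC/UECC bits; check the BKDG for your CPU family.
FLAG_BITS = {
    "Val": 63,    # the register holds a valid error
    "Over": 62,   # an earlier error was overwritten
    "UC": 61,     # uncorrected error (clear means corrected, shown as "CE")
    "En": 60,     # error reporting was enabled
    "MiscV": 59,  # MC4_MISC contents are valid
    "AddrV": 58,  # MC4_ADDR contents are valid
    "PCC": 57,    # processor context corrupt
    "CECC": 46,   # corrected by ECC
    "UECC": 45,   # ECC error that could not be corrected
}


def decode_mc_status(status: int) -> dict:
    """Return the flag bits of a machine-check status word as booleans."""
    flags = {name: bool((status >> bit) & 1) for name, bit in FLAG_BITS.items()}
    # Hypothetical convenience field: a valid error with UC clear was corrected.
    flags["corrected"] = flags["Val"] and not flags["UC"]
    return flags


if __name__ == "__main__":
    status = 0x9c08400008080a13  # value from the MC4_STATUS line above
    for name, value in decode_mc_status(status).items():
        print(f"{name:>9}: {value}")
```

Run against the value from the log, this reports UC, Over, PCC and UECC as clear and CECC as set, which matches the `-|CE|MiscV|-|AddrV|CECC` flags the kernel printed.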