I am getting a lot of single-bit ECC failures at the same address. Is there a way to set a threshold on the number of failures, so that once the count passes the threshold Linux automatically moves the page to a different page frame (or pages it out) and then marks the bad page frame as permanently reserved?
Hi
If you know the exact address you could look at the memmap kernel option to reserve/block it from use.
https://www.kernel.org/doc/Documentation/admin-guide/kernel-parameters.txt
Otherwise, replace the RAM module. If you move it to a different slot, does the error move with it?
Is memmap=1$ss legitimate?
Editing /etc/sysctl.conf takes care of the one occurrence, but I was really hoping for a way to make it automatic for future occurrences.
Hi
Alas, my last and very limited experience with ECC memory was in the early 2000s on Solaris… You can check the memory map via dmesg | grep e820.
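To see which e820 region the failing address falls in, something like the following rough sketch might help. It assumes the "[mem 0xSTART-0xEND] type" line format that recent kernels print; older kernels used a different layout and would need a different pattern. The function name e820_region is made up for illustration.

```python
import re

# Assumed line format (recent kernels):
#   BIOS-e820: [mem 0xSTART-0xEND] type
E820_RE = re.compile(r"\[mem (0x[0-9a-fA-F]+)-(0x[0-9a-fA-F]+)\]\s+(.*\S)")

def e820_region(dmesg_text, addr):
    """Return (start, end, type) of the e820 region containing addr, or None."""
    for line in dmesg_text.splitlines():
        if "e820" not in line:
            continue  # same filter as `dmesg | grep e820`
        m = E820_RE.search(line)
        if not m:
            continue  # line does not match the assumed format
        start, end = int(m.group(1), 16), int(m.group(2), 16)
        if start <= addr <= end:  # the printed end address is inclusive
            return start, end, m.group(3)
    return None
```

You would feed it the text of dmesg output and the failing address, e.g. e820_region(text, 0x2a7cb510). A region type of "usable" means the kernel treats that range as ordinary RAM.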
What is the full parameter name in /etc/sysctl.conf and do I use memmap=1$0x000000002a7cb510, memmap=16$0x000000002a7cb510 or memmap=4K$x000000002a7cb000?
Thanks.
Hi
The last one. That would exclude 4K of memory starting at address 0x000000002a7cb000 (note that it needs the 0x prefix too, not just x).
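For what it's worth, here is a small sketch of how the page-aligned value can be derived from the failing address. It assumes 4 KiB pages, and memmap_param is just an illustrative helper name, not an existing tool.

```python
PAGE_SIZE = 4096  # assumption: 4 KiB pages, as on typical x86 systems

def memmap_param(fault_addr, page_size=PAGE_SIZE):
    """Build a memmap=nn$ss string reserving the page that contains fault_addr."""
    base = fault_addr & ~(page_size - 1)  # round down to the page boundary
    return f"memmap={page_size // 1024}K$0x{base:016x}"

print(memmap_param(0x2A7CB510))  # memmap=4K$0x000000002a7cb000
```

Depending on the boot loader, the $ may need to be escaped in its configuration file so that it reaches the kernel command line intact.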
Thanks. What name do I use for /proc or sysctl -w if I want to do it before a reboot?
Hi
It should be down in /proc/sys/vm/.
Have a read here: https://www.kernel.org/doc/Documentation/sysctl/vm.txt
That doesn’t list memmap.
I don’t see any subdirectory there relevant to reserving physical memory. I see /proc/sys/vm/mmap/foo, but those are something different.