kernel for multi-socket systems?

Is there a preferred kernel for multi-socket systems, such as Opteron systems, or is the default “desktop” kernel the best?
Basically, I’m running large climate models - so heavy numerics and very long runtimes. Is the support for Opterons the
same in OS12.3 and OS13.1? (I would also consider installing a different distro of Linux if there were an advantage, but
I think all the distros have similar kernels…)
SuperMicro 2U server
Opteron 6386SE
512GB memory
OS 12.3
Thank you!!! :slight_smile:

I am using kernel-default and it has “CONFIG_NR_CPUS=512” enabled.
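For anyone who wants to check the same setting on their own box, here is a minimal Python sketch. It assumes the distro installs the kernel build config as a plain text file at `/boot/config-<release>` (some distros only expose it elsewhere), and the helper name `config_value` is made up for illustration.

```python
import os
import re

def config_value(config_text, key):
    """Return the value assigned to `key` in kernel config text, or None if unset."""
    pattern = re.compile(r"^" + re.escape(key) + r"=(.*)$", re.MULTILINE)
    match = pattern.search(config_text)
    return match.group(1).strip('"') if match else None

if __name__ == "__main__":
    # Assumed location of the running kernel's config file.
    path = "/boot/config-" + os.uname().release
    try:
        with open(path) as f:
            print("CONFIG_NR_CPUS =", config_value(f.read(), "CONFIG_NR_CPUS"))
    except OSError as err:
        print("could not read", path, "-", err)
```

Lines of the form `# CONFIG_FOO is not set` are treated as unset, so the helper returns `None` for them.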

Thanks. Wow :open_mouth: I guess they’re either optimistic, or all kernels are pretty much the same. Mine is the “desktop” kernel. I think that’s the same as the “default” kernel.

Default uses different timings than Desktop does, to deal with server latency. Otherwise they are the same, just tuned differently.

I guess the issues with Opterons aren’t just the number of CPUs. I’m having trouble with system stability (OS12.3x64) and I think it might be related to “features” of the Opteron 6386SE - maybe the default kernel (and upgrading to 13.1) would help? I know compatibilities have changed between 12.x and 13.1 (I have a new Pavilion AMD upon which 12.x won’t install, and an old one upon which 13.1 won’t install). Also, to get even a couple of hours of stability, I have to turn off all processor speed control (PowerNow, DownCPU, etc.). This seems like it might be a kernel issue…
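To see from the Linux side whether frequency scaling is actually active after changing BIOS settings, something like the following Python sketch could help. It assumes the kernel exposes the usual cpufreq sysfs files under `/sys/devices/system/cpu/cpuN/cpufreq/`; the function name `read_governors` is hypothetical, and an empty result only suggests (it does not prove) that scaling is disabled.

```python
import glob
import os

def read_governors(sysfs_root="/sys/devices/system/cpu"):
    """Map each cpuN directory name to its cpufreq scaling governor, if exposed."""
    governors = {}
    pattern = os.path.join(sysfs_root, "cpu[0-9]*", "cpufreq", "scaling_governor")
    for path in glob.glob(pattern):
        # Path layout is <root>/cpuN/cpufreq/scaling_governor, so cpuN is third from the end.
        cpu = path.split(os.sep)[-3]
        with open(path) as f:
            governors[cpu] = f.read().strip()
    return governors

if __name__ == "__main__":
    found = read_governors()
    if not found:
        print("no cpufreq entries found (frequency scaling may be disabled)")
    for cpu, governor in sorted(found.items()):
        print(cpu, governor)
```

The root directory is a parameter so the function can be pointed at a copy of the tree for testing.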

OTOH: It could be hardware. A Phenom I once had turned out to have a bad in-CPU memory controller and I had to return it to the factory. I think that was diagnosed with memtest86, but the memory was much smaller than in this machine. I don’t think memtest86 will support 512GB of memory.

5.0.1 was able to run through my 192GB.

Edit:
I use -default on the 16/32(HT) core system.

You seem to indicate you are having problems but have not stated what you are seeing. I.e., do you see spontaneous reboots? I have an AMD FX 6300 (6 core) Black Edition and I have had several reboots for no good reason. I have also gotten a message that some cache found a memory error but it was corrected. It has only happened 3 times in the 2.5 months I have had the MB+CPU. So what are you seeing?

If you have more memory than memtest can handle, split it up: remove half and test what remains, then swap in the other half and test that.

Thank you for the reply. Sorry about that - yes, a spontaneous reboot (the power doesn’t cycle, but the screen goes black and it reboots). It happens about once a day but I’ve been playing with the BIOS settings and for now have all core throttling turned off (PowerNow, CState Mode, PowerCap, HPC, DownCore, C1E) and that seems to help. I am trying SiSoft Sandra right now under Win2k3server and will try Memtest 5.1 soon. The spontaneous reboot happened both with openSUSE 12.3 and Win2k3serverR2SP2. CPU monitoring shows cool CPUs.

I have an FX 8350 which has been stable. I previously had trouble with a Phenom II black (not overclocked) and with Memtest I was able to show that it was one of the two memory controllers within the CPU. AMD replaced it.

openSUSE comes with a hardware checker and an old version of memtest. I should try running those too…

I just wanted everyone to know that memtest86 5.1 sees and processes all 64 CPUs and 512GB of RAM. Good to know! :slight_smile:

I see the same symptoms here: black screen, then reboot, but only 3 times so far. But then I’m not stressing this machine by running huge climate models LOL. The one notification I got, that corrupt memory in some cache was detected and corrected, seems to say there is a hardware problem somewhere. But it is so infrequent at the moment that it is basically undiagnosable here.
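Since these corrected-error notices are so rare, it may be worth scanning saved kernel logs for them rather than waiting to catch one on screen. Below is a rough Python sketch. It assumes the relevant lines contain the phrase "machine check" or "hardware error"; the exact wording varies between kernel versions, so treat the pattern as a starting point. The function name `find_mce_lines` is made up for illustration.

```python
import re
import sys

# Assumption: machine-check reports contain one of these phrases (case-insensitive).
MCE_PATTERN = re.compile(r"machine check|hardware error", re.IGNORECASE)

def find_mce_lines(log_text):
    """Return the log lines that look like machine-check / hardware-error reports."""
    return [line for line in log_text.splitlines() if MCE_PATTERN.search(line)]

if __name__ == "__main__":
    if len(sys.argv) > 1:
        # Pass the path to a saved log, e.g. the output of dmesg written to a file.
        with open(sys.argv[1], errors="replace") as f:
            hits = find_mce_lines(f.read())
        print(len(hits), "matching line(s)")
        for line in hits:
            print(line)
    else:
        print("usage: pass the path to a saved kernel log file")
```

Counting matches across several days of logs would at least show whether the errors cluster around the reboots.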

I just got off the telephone with Tech Support - they say SLES 11 SP2 is a minimum - is that the same thing as openSUSE 13.1?

No, not the same. 13.1 is well ahead of SLES.

openSUSE is the test bed for SUSE, but it has been some time since the SUSE version advanced. As I understand it, there is a new SLES coming out soon.

Update: Hmmm… Sandra/Memtest show no problems. I guess at this point I’m down to trying to satisfy the vendor’s OS requirements (just to get support).

I tried running the Firmware Test from the 13.1x64 DVD and while it was loading the kernel (green bars on the bottom of the screen, slowly extending across the screen) it failed and gave a character mode display saying the installation failed. I rebooted and ran the install media checker, and thought it was going to do the same thing, but the text mode display came up with the media tester, and it says the DVD is OK. So something is not right. Maybe I should see if Kubuntu has a similar issue? I don’t really know how much different distros of Linux tweak the kernel hardware support, specifically for AMD Piledriver.