Boot fails after RAM upgrade

Hi all,

I have been using Linux and OpenSuSE for years. I updated my OS to Tumbleweed 64 Bits 3-4 months ago, and it worked perfectly until today.

My Lenovo 3000 G530 laptop had 3 GB (2 + 1) of memory. I upgraded the RAM to 4 GB (2 + 2), and it has not booted since.

The system boots to GRUB successfully.
When I select one of the installed kernels (3.5.x desktop/default/vanilla, 3.1.x desktop/default/vanilla), the computer resets within seconds.
The failsafe kernels also reset the computer.
There are no unusual messages on screen before it reboots.
The system boots normally with a single module, or with any setup totalling 3 GB.

  • I updated the BIOS to the latest version.
  • I checked the memory modules with memtest and no errors were reported.
  • My system dual-boots with Win7 64 Bit (from which I am writing this), and that works perfectly.

So I am absolutely sure that the hardware is good.

  • I have a large collection of distros, and I can boot successfully with:
    – gentoo 64 Bits CD
    – gentoo 32 Bits CD
    – OpenSuSE 10.3 32 Bits DVD
    – OpenSuSE 11.4 32 Bits DVD
    – System Rescue CD 64 Bits
    – System Rescue CD 32 Bits
    – Ubuntu 11.4 64 Bits DVD

  • But booting failed with:
    – OpenSuSE 12.1 64 Bits Net Install CD
    – OpenSuSE 10.2 64 Bits DVD

  • I tried booting the OpenSuSE 64 Bits CDs/DVDs and my installed kernels with these options, without success:
    – acpi=off
    – acpi=force
    – pci=noacpi
    – acpi=oldboot
    – apm=off
    – ide=nodma
    – edd=on
    – edd=off
    – noapic
    – nolapic

This smells like a bug in the OpenSuSE 64 Bits kernels, and I will file a bug report if I get a solution in the forums.

Has anybody seen something like this before? Any ideas?

It is far more likely that you are having memory error problems, than that you have found a kernel bug.

I was thinking the same before booting Windows and the other distros. But everything works fine there, so this can’t be true.

I compiled a 64 Bits kernel from the 2.6.38.8 vanilla kernel sources using Ubuntu’s kernel config. It booted, and I am using this custom kernel right now.

Now it’s clear that there is a problem in OpenSuSE’s 64 Bits kernels :frowning:

I will try to compile 3.x kernels when I have some spare time…

On 08/07/2012 09:46 AM, xsdnd wrote:
>
> I was thinking the same before booting Windows and other distros. But
> everything works fine so this cant be true.
>
>
> I compiled 2.6.38.8 vanilla kernel from sources using Ubuntu’s kernel
> config. It booted and i am using this custom kernel right now.
> Now its clear that there is a problem in OpenSuSE’s kernels :frowning:
>
> I will try to compile 3.x kernels when i have some spare time…

You make it sound as if openSUSE uses a different kernel than does Ubuntu. That
is not true - both use the mainline kernel maintained by Linus Torvalds. There
are usually some distro patches applied, but none of them affect anything as
basic as memory management.

There is a possibility that a bug was introduced between 2.6.38 and 3.1, but I
think it is unlikely that such a bug would go undetected this long.

You also need to compare the configuration parameters used with the Ubuntu
kernel with those of the openSUSE kernel you tried. There could be a problem there.
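One way such a comparison could be done is sketched below. This is only a minimal sketch: `config_diff` is a hypothetical helper name, and the two arguments stand for whatever config files are being compared (for example an Ubuntu config and an openSUSE one). It keeps only the option lines, sorts them so ordering differences do not matter, and then runs a plain `diff`.

```shell
#!/bin/sh
# Hypothetical helper: show which options differ between two kernel configs.
# Usage: config_diff <first-config> <second-config>
config_diff() {
    cfg_a=$(mktemp)
    cfg_b=$(mktemp)
    # Keep "CONFIG_FOO=..." lines and "# CONFIG_FOO is not set" lines, sorted.
    grep -E '^(CONFIG_|# CONFIG_)' "$1" | sort > "$cfg_a"
    grep -E '^(CONFIG_|# CONFIG_)' "$2" | sort > "$cfg_b"
    # diff exits with 1 when the files differ; remember that and clean up.
    status=0
    diff "$cfg_a" "$cfg_b" || status=$?
    rm -f "$cfg_a" "$cfg_b"
    return "$status"
}
```

Lines starting with `<` come from the first file and lines starting with `>` from the second, so an option that is `=m` in one config and `=y` in the other shows up as a pair.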

Ah, sorry for the misunderstanding, I did not mean to say something like that. Of course they share the very same code base…

I think the cause of the problem is in the module configurations, but I don’t have a 2.6.38 OpenSuSE kernel config, so I could not get a diff.

If everything works when I compile kernel 3.1.10 with the currently running config, then we could diff the configs to find the problematic module configuration and possibly file a bug report with a fix.

On 08/07/2012 11:06 AM, xsdnd wrote:
>
> lwfinger;2478820 Wrote:
>> On 08/07/2012 09:46 AM, xsdnd wrote:
>> You make it sound as if openSUSE uses a different kernel than does Ubuntu.
>> That is not true - both use the mainline kernel maintained by Linus Torvalds.
>> There are usually some distro patches applied, but none of them affect
>> anything as basic as memory management.
>>
>> There is a possibility that a bug was introduced between 2.6.38 and 3.1, but
>> I think this is unlikely for an undetected bug to last this long.
>>
>> You also need to compare the configuration parameters used with the Ubuntu
>> kernel with those of the openSUSE kernel you tried. There could be a problem
>> there.
>
>
> Ah sorry for misunderstanding i did not want to say something like
> that. Of course they share the very same code base…
>
> I think cause of the problem is in module configrations but i dont a
> have a 2.6.38 OpenSuSE kernel config so i could not get a diff.
>
>
> If everything works when i compile the kernel 3.1.10 with current
> running config than we could diff the problematic module configuration
> and possibly file a bug report with a fix.

That plan sounds good.

As I suspected, the custom-built kernel booted…

Now I need to find the problematic difference in the configs.

Here is the diff for 3.1.10:

[Diff] Linux Kernel 3.1.10 config diff - Pastebin.com (http://pastebin.com/dtZhSy9k)

I am not a kernel expert and, as you can guess, compiling a kernel takes a lot of time, so I can’t compile it again and again with every possible combination.

Could anybody with more experience check the diff and make recommendations, please?

On 08/07/2012 04:36 PM, xsdnd wrote:
>
> As i suspected custom build kernel booted…
>
> Now need to find the problematic difference in configs.
>
>
> Here is the diff for 3.1.10;
>
> [Diff] Linux Kernel 3.1.10 config diff - Pastebin.com
> (http://pastebin.com/dtZhSy9k)
>
>
> I am not a kernel expert and as you guess compiling a kernel takes too
> much time, so i cant try to compile it again and again with every
> possible combination.
>
> Anybody with more experience check the diff and make recommendations
> please ?

The only thing I see in the differences that might matter is that
“CONFIG_CALGARY_IOMMU_ENABLED_BY_DEFAULT=y” is set in the xss kernel.

The other potential change is that the xss kernel uses the SLUB method, and the
openSUSE configuration uses SLAB; however, that should not make any difference.

Just an update:

Over the past 20 days I have compiled and recompiled the kernel again and again.

It seems that 2-3 kernel config options are responsible for this situation.

One of them (I am 99% sure about this) is CONFIG_ACPI_PROCESSOR. It must be “Y” in the config (it is M in the kernel desktop config); if it is a module, boot fails. But this alone is not enough for a successful boot.

When that one is set to Y, boot continues to a later stage, but just after the “boot” daemon starts the laptop reboots itself. I am pretty sure this is because of the BUS options in the kernel config. In a few days this will be resolved.

PS: it takes about 2.5 - 3 hours to compile a kernel rpm on a dual-core Centrino :frowning:
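Since the suspects here are options that are modules (M) in one config and built in (Y) in another, a narrower comparison than a full diff may save some rebuilds. The following is a rough sketch only: `module_vs_builtin` is a made-up helper name, and both arguments are paths to config files of your choosing.

```shell
#!/bin/sh
# Hypothetical helper: print options that are "=m" in the first config
# and "=y" in the second config.
# Usage: module_vs_builtin <failing-config> <working-config>
module_vs_builtin() {
    awk -F= '
        # First file: remember every option built as a module.
        NR == FNR { if ($2 == "m") is_module[$1] = 1; next }
        # Second file: print options that are built in there.
        $2 == "y" && ($1 in is_module) { print $1 }
    ' "$1" "$2"
}
```

The output is a list of candidate option names, one per line, which can then be flipped one group at a time instead of testing every possible combination.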

On 08/23/2012 09:46 AM, xsdnd wrote:
>
> lwfinger;2478898 Wrote:
>> On 08/07/2012 04:36 PM, xsdnd wrote:
>>>
>>> As i suspected custom build kernel booted…
>>>
>>> Now need to find the problematic difference in configs.
>>>
>>>
>>> Here is the diff for 3.1.10;
>>>
>>> [Diff] Linux Kernel 3.1.10 config diff - Pastebin.com
>>> (http://pastebin.com/dtZhSy9k)
>>>
>>>
>>> I am not a kernel expert and as you guess compiling a kernel takes too
>>> much time, so i cant try to compile it again and again with every
>>> possible combination.
>>>
>>> Anybody with more experience check the diff and make recommendations
>>> please ?
>>
>> The only thing I see in the differences that might make a difference is
>> “CONFIG_CALGARY_IOMMU_ENABLED_BY_DEFAULT=y” is set in the xss kernel.
>>
>> The other potential change is that the xss kernel uses the SLUB method, and
>> the openSUSE configuration uses SLAB; however, that should not make any
>> difference.
>
> just an update;
>
> in the past 20 days; i tried to compile and recompile kernel again and
> again.
>
> it seems like 2-3 kernel config options is responsible about this
> situation.
>
>
> one of them is (i am %99 sure about this) config_acpi_processor, this
> one must be “Y” in config (its M in kernel desktop config), if its a
> module then boot fails. but this does not enough for successful booting.

It is also possible to add any module to the initrd using YaST => /etc/sysconfig
Editor under the kernel/INITRD_MODULES setting.
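For anyone who prefers the command line, the same setting can presumably be changed by editing the INITRD_MODULES line in /etc/sysconfig/kernel directly. The helper below is only a sketch under that assumption: `add_initrd_module` is a hypothetical name, it expects the line in the usual INITRD_MODULES="..." form, and it relies on GNU sed for `-i`. Try it on a copy of the file first.

```shell
#!/bin/sh
# Hypothetical helper: append a module name to the INITRD_MODULES="..." line
# of a sysconfig-style file, unless the module is already listed.
# Usage: add_initrd_module <file> <module>
add_initrd_module() {
    file=$1
    module=$2
    # Already present as a whole word inside the quotes? Then do nothing.
    if grep -Eq "^INITRD_MODULES=\"([^\"]* )?$module( [^\"]*)?\"" "$file"; then
        return 0
    fi
    # Append the module just before the closing quote (GNU sed in-place edit).
    sed -i "s/^INITRD_MODULES=\"\(.*\)\"/INITRD_MODULES=\"\1 $module\"/" "$file"
}
```

After changing the real file as root, the initrd still has to be rebuilt (with mkinitrd on openSUSE of this vintage) before the change has any effect at boot.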

> when that one is set as Y boot continues until later stages but just
> after “boot” daemon starts, laptop reboots itself. i am pretty sure
> about, this is because of BUS options in kernel config. in a few days
> this will be resolved.
>
>
> PS : it takes about 2,5 - 3 hours to compile a kernel rpm in a dual
> core centrino :frowning:

For testing on your own system, you do not need to make the rpm. By doing the
following steps, you save considerable time:


Set new configuration with 'make xconfig' or 'make menuconfig'
make -j3
sudo make modules_install install

On openSUSE, the above sequence will add the new kernel to the boot menu if it
has a name that is different than others. One of the things the rpm build does
is a ‘make clean’ when it starts. That step is generally not needed,
particularly when you are using the same source and only changing the configuration.

BTW, that number following the -j in the make statement should be the number of
CPUs + 1. That parameter alone could cut the build time by a factor of 2.
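The "number of CPUs + 1" rule can be computed instead of hard-coded. A small sketch, assuming `nproc` from GNU coreutils is available (with `getconf` as a fallback); `build_jobs` is just an illustrative name:

```shell
#!/bin/sh
# Print the suggested make job count: number of online CPUs plus one.
build_jobs() {
    cpus=$(nproc 2>/dev/null || getconf _NPROCESSORS_ONLN)
    echo $(( cpus + 1 ))
}
```

It would then be used as `make -j"$(build_jobs)"`, which on a dual-core machine expands to the `make -j3` shown above.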

I hope you are not building the kernel as root!! The kernel build system is
quite complex and running it without restrictions can lead to disaster. In the
past, a bug in the build code caused /dev/null to be destroyed when the build
was run as root. Determining the cause for the multitude of errors that resulted
took quite a while. A number of people had to reinstall.
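If old habits are hard to break, a small wrapper can refuse to start the build as root. This is only a sketch; `may_build_as` and `safe_build` are hypothetical names, not part of the kernel build system:

```shell
#!/bin/sh
# Succeed only when the given numeric UID is not root's UID 0.
may_build_as() {
    [ "$1" -ne 0 ]
}

# Hypothetical wrapper: pass all arguments to make, but only for a regular user.
safe_build() {
    if may_build_as "$(id -u)"; then
        make "$@"
    else
        echo "Refusing to build the kernel as root; use a regular user." >&2
        return 1
    fi
}
```

With this in place, `safe_build -j3` behaves like `make -j3` for a normal user, and the final `sudo make modules_install install` step is still run separately.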

I did not know that, thanks for it :slight_smile:

Yep, that’s right, but I could not find a way to easily uninstall after a test, so I just built rpms at night while sleeping.
By the way, I could not make rpms for 3.5.x (it throws an error while building the rpm), so I compiled those kernels this way :wink:

Oops, I did not know that either. I have built everything as root since my Gentoo days, bad habit :frowning:

So this is an ACPI problem, and here is the solution that worked for me.

In the OpenSuSE kernel sources, I changed 2 kernel options:
Power Management and ACPI Options -> ACPI Support -> Processor to YES (CONFIG_ACPI_PROCESSOR; this automatically sets CONFIG_ACPI_CONTAINER to Y and CONFIG_THERMAL to Y)
Power Management and ACPI Options -> CPU Frequency Scaling -> X86 CPU scaling Drivers -> ACPI Processor P-States Driver to YES (CONFIG_X86_ACPI_CPUFREQ)
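To check how these options are set in a given kernel before rebuilding anything, the config file can be queried directly. A minimal sketch, assuming the config is available as a plain text file (distributions commonly install one as /boot/config-<version>); `config_state` is a hypothetical helper name:

```shell
#!/bin/sh
# Hypothetical helper: print y, m, n or a literal value for one option,
# or "absent" if the config file does not mention it at all.
# Usage: config_state <CONFIG_OPTION> <config-file>
config_state() {
    option=$1
    file=$2
    line=$(grep -E "^(# )?$option(=| is not set)" "$file") || { echo "absent"; return 0; }
    case $line in
        "# "*) echo n ;;              # "# CONFIG_FOO is not set"
        *)     echo "${line#*=}" ;;   # everything after the first "="
    esac
}
```

For example, `config_state CONFIG_ACPI_PROCESSOR "/boot/config-$(uname -r)"` should print `m` on a kernel where the option is still a module and `y` once it is built in.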

— Details —

I wanted to test whether this could be resolved by loading the modules from the initrd, so I built 2 kernels from the OpenSuSE kernel sources (not 3, because you cannot set CONFIG_X86_ACPI_CPUFREQ to YES without setting CONFIG_ACPI_PROCESSOR to YES):

  • In the first one (semi-patched) I only changed CONFIG_ACPI_PROCESSOR from M to Y (and with it CONFIG_ACPI_CONTAINER to Y and CONFIG_THERMAL to Y).

  • In the other one (patched) I changed both options.

  • Default => Sysconfig -> initrd_modules = thermal processor fan

  • opensuse standard kernel - FAILS
    sysconfig -> initrd_modules = processor container acpi-cpufreq thermal_sys

  • opensuse standard kernel - FAILS
    sysconfig -> initrd_modules = thermal processor fan container acpi-cpufreq thermal_sys

  • opensuse standard kernel - FAILS
    sysconfig -> initrd_modules = processor fan container acpi-cpufreq thermal_sys

  • opensuse desktop patched kernel - BOOTS & RUNS OK
    sysconfig -> initrd_modules = thermal processor fan

  • opensuse desktop semi-patched kernel - BOOTS then FAILS
    sysconfig -> initrd_modules = thermal processor fan

  • opensuse desktop semi-patched kernel - FAILS
    sysconfig -> initrd_modules = thermal processor fan acpi-cpufreq

So for me the only possible solution was changing both options and using that kernel.

This works for both the 3.1.10-1.16 and 3.5.2-39 kernels…
These settings were taken from the Ubuntu generic kernel, and I am sure they are safe for everyone…

The bug report;
https://bugzilla.novell.com/show_bug.cgi?id=777376

On 08/24/2012 06:16 PM, xsdnd wrote:
> lwfinger;2481556 Wrote:
>>
>> For testing on your own system, you do not need to make the rpm. By doing
>> the following steps, you save considerable time:
>>
>> Set new configuration with ‘make xconfig’ or ‘make menuconfig’
>> make -j3
>> sudo make modules_install install
>>
> yep thats right but i could not find a way to easyly uninstall after
> test, i just built rpms in the nights while sleeping.
> by the way i could not make rpms for 3.5.x, it throws an error while
> building rpm. and compiled kernels with this way :wink:

Yes, one does need to change /boot/grub/menu.lst (I generally use YaST =>
Bootloader), and delete files from /boot and /lib/modules to get rid of a test
kernel. I frequently do kernel bisections to find the source of regressions.
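Before deleting anything by hand, it may help to list what belongs to a test kernel. The sketch below assumes the usual layout where the files in /boot end in "-<version>" and the modules live in /lib/modules/<version>; `list_kernel_files` is a hypothetical helper, and it only prints paths, it removes nothing.

```shell
#!/bin/sh
# Hypothetical helper: list the files and directories that belong to one
# installed kernel version. The optional second argument is an alternative
# root directory (handy for trying this out on a scratch tree).
# Usage: list_kernel_files <version> [root]
list_kernel_files() {
    version=$1
    root=${2:-}
    for path in "$root"/boot/*-"$version" "$root"/lib/modules/"$version"; do
        # Skip unmatched glob patterns and missing directories.
        [ -e "$path" ] && echo "$path"
    done
    return 0
}
```

Once the list looks right, the entries can be removed as root, and the matching boot menu entry still has to be deleted separately as described above.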
Having all the intermediate kernels available for retesting is valuable in case
the bisection results in a nonsense result, and I need to recheck one.

>> I hope you are not building the kernel as root!! The kernel build system is
>> quite complex and running it without restrictions can lead to disaster. In
>> the past, a bug in the build code caused /dev/null to be destroyed when the
>> build was run as root. Determining the cause for the multitude of errors that
>> resulted took quite a while. A number of people had to reinstall.
>
> oops also dont know that, i built everything as root since gentoo days,
> bad habbit :frowning:

A noted developer, whose name I will not reveal, feels so strongly about this
issue that he would like to insert a “rm -rf <slash>” in the kernel build
process to discourage building kernels as root.

lol nice idea! :smiley:

On 2012-08-25 01:37, Larry Finger wrote:

> A noted developer, whose name I will not reveal, feels so strongly about this issue that he
> would like to insert a “rm -rf <slash>” in the kernel build process to discourage building
> kernels as root.

Then the instructions should explain that clearly, because they don’t.

Have a look at “/usr/src/linux/README.SUSE”, for example. The only paragraph mentioning “root”
is this:

(2) Create a build directory for use in configuring and building
the kernel. Using /usr/src/linux directly requires root priviledges
and will cause problems if you need to build kernel modules for
other installed kernels.

and that’s all. On “/usr/src/linux/README” they say:

To do the actual install you have to be root, but none of the normal
build should require that. Don’t take the name of root in vain.

but they don’t explain how to do it as plain user.


Cheers / Saludos,

Carlos E. R.
(from 12.1 x86_64 “Asparagus” at Telcontar)

On 08/25/2012 05:38 PM, Carlos E. R. wrote:
> On 2012-08-25 01:37, Larry Finger wrote:
>
>> A noted developer, whose name I will not reveal, feels so strongly about this issue that he
>> would like to insert a “rm -rf <slash>” in the kernel build process to discourage building
>> kernels as root.
>
> Then the instructions should explain that clearly, because they don’t.
>
> Have a look at “/usr/src/linux/README.SUSE”, for example. The only paragraph mentioning “root”
> is this:
>
> (2) Create a build directory for use in configuring and building
> the kernel. Using /usr/src/linux directly requires root priviledges
> and will cause problems if you need to build kernel modules for
> other installed kernels.
>
> and that’s all. On “/usr/src/linux/README” they say:
>
> To do the actual install you have to be root, but none of the normal
> build should require that. Don’t take the name of root in vain.
>
> but they don’t explain how to do it as plain user.

There are 2 ways: (1) use ‘cp -rf /usr/src/linux ~/linux’, which doubles the
storage requirements, or (2) ‘sudo chown -R <your_user>:<your_group>
/usr/src/linux’. After that, you can ‘cd /usr/src/linux’ and build as your
regular user. Option (2) is the one I use with distribution kernels. For
mainline kernels, I use git and keep a current source in /home.

On 2012-08-26 05:05, Larry Finger wrote:
> On 08/25/2012 05:38 PM, Carlos E. R. wrote:

> There are 2 ways: (1) use ‘cp -rf /usr/src/linux ~/linux’, which doubles the storage
> requirements, or (2) ‘sudo chown -R <your_user>:<your_group> /usr/src/linux’. After that, you
> can ‘cd /usr/src/linux’ and build as your regular user. Option (2) is the one I use with
> distribution kernels.

Wow.

Well, instead of that I allowed write permission to the group. But you also need
/usr/src/kernel-modules/ and some others, I think.

If compiling by user is that important, the sources should already come prepared for that.

> For mainline kernels, I use git and keep a current source in /home.

There is another method that I did once and I have forgotten how: tell the kernel compilation
to use a different output path. I have notes somewhere …

BUILD directory for the kernel:

   When compiling the kernel all output files will per default be
   stored together with the kernel source code.
   Using the option “make O=output/dir” allow you to specify an alternate
   place for the output files (including .config).
   Example:
     kernel source code: /usr/src/linux-2.6.N
     build directory:    /home/name/build/kernel

   To configure and build the kernel use:
     cd /usr/src/linux-2.6.N
     make O=/home/name/build/kernel menuconfig
     make O=/home/name/build/kernel
     sudo make O=/home/name/build/kernel modules_install install

   Please note: If the ‘O=output/dir’ option is used then it must be
   used for all invocations of make.


Cheers / Saludos,

Carlos E. R.
(from 12.1 x86_64 “Asparagus” at Telcontar)