VirtualBox guest OS boot failure (after update)

Hi everyone

I am running openSUSE 13.2_x64 with all the updates (KDE desktop).

Inside my VirtualBox, I run openSUSE 13.2_x86 (minimal install) with all the updates (ICE window manager).

I actually have the following repositories on my HDD and sync (keep them updated) regularly.

OSS
NON-OSS
UPDATE-OSS
UPDATE-NON-OSS
KDE-EXTRA
PACKMAN-ESSENTIALS
PACKMAN-MULTIMEDIA

Using YaST I have configured both the host and guest machine to use local HDD repos.

I usually run ‘zypper update’ on both machines after I sync (update) the repos.

Since yesterday (after updating) my guest machine has stopped booting. I am attaching a screen-shot of the error in the hope of making things clearer.

https://drive.google.com/file/d/0B-Fe1hFb-YbIRE1RXzN0eUU3NmM/view?usp=sharing

The error message points to the Machine log, which I could not make any head or tail of

:stuck_out_tongue: :stuck_out_tongue:

I will try to attach the log as well. [(link)](https://drive.google.com/file/d/0B-Fe1hFb-YbITmxwLUdfLU9EOWc/view?usp=sharing)

Hope you guys can at least help me make some sense of it.

Thanks in advance guys
Emon

Well, it seems to crash at “Loading initial ramdisk…”.
The most likely problem in that case is loading/applying the CPU microcode update IMO.

Try pressing ‘e’ at the boot menu and append “dis_ucode_loader” at the end of the line starting with “linux” or “linuxefi”. Does it boot then?
I suppose an earlier kernel works? You should find one in “Advanced Options” in the boot menu.

I will try to attach the log as well. [(link)](https://drive.google.com/file/d/0B-Fe1hFb-YbITmxwLUdfLU9EOWc/view?usp=sharing)

I don’t think this helps much, unfortunately.
At least I don’t really see a hint in there…

Try pressing ‘e’ at the boot menu and append “dis_ucode_loader” at the end of the line starting with “linux” or “linuxefi”. Does it boot then?

Sorry, no joy… :frowning:

I suppose an earlier kernel works? You should find one in “Advanced Options” in the boot menu.

Totally forgot about that!!! :stuck_out_tongue: Thanks man…

Yes, kernel version 3.16.7-21 works!!lol!

It’s version 3.16.7-24 that seems to be the problem :\

Any idea why???

One of the nice things about virtualization is that it’s incredibly simple to clone the Guest (make sure it’s a full clone and not a linked clone) so that you have exact copies and then you can try updating again.

I don’t know how you updated your machine before but I recommend using zypper so that you can see a list of exactly what packages will be upgraded. Run the following in either your original or clone copy to try upgrading the kernel again (and everything else in your system).

zypper update
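A hedged sketch of how to preview the update before committing to it: `zypper list-updates` and `zypper update --dry-run` only report what would change. The `preview_updates` wrapper and the `ZYPPER` override variable are names made up for this illustration; they are not part of zypper.

```shell
# Hypothetical wrapper: show pending updates without installing anything.
# ZYPPER can be overridden (e.g. ZYPPER=echo) to print the commands instead
# of running them.
preview_updates() {
    zypper_cmd="${ZYPPER:-zypper}"
    # List the packages for which a newer version is available.
    "$zypper_cmd" list-updates
    # Resolve and print the full transaction, but do not apply it.
    "$zypper_cmd" update --dry-run
}
```

Run `preview_updates` as root inside the guest (or its clone); nothing is installed until you run a plain `zypper update` afterwards.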

TSU

Hi again everyone

First some clarification

Yes, kernel version 3.16.7-21 works!!lol!
It’s version 3.16.7-24 that seems to be the problem :\

I wasn’t totally correct…:cry:

BTW I am running AMD hardware.

Processor - Athlon II X2 250
Motherboard - MSI 880GMA-E335(FX)
RAM - 8 GB

My host and guest both have EXT4 File System.

I did a fresh install of the guest and this time I did a modular/incremental update instead of the wholesale ‘zypper update’. I upgraded the rpm groups one by one, took a lot of snapshots of the guest machine and finally, after some trial and error, upgraded everything except for the following two pkgs

patterns-openSUSE-enhanced_base
patterns-openSUSE-enhanced_base_opt

If you run ‘zypper update’ now you are gonna get the following output.

The following 2 NEW packages are going to be installed:
ucode-amd ucode-intel

The following 2 packages are going to be upgraded:
patterns-openSUSE-enhanced_base patterns-openSUSE-enhanced_base_opt

The following 2 patterns are going to be upgraded:
enhanced_base enhanced_base_opt

2 packages to upgrade, 2 new.

Now if you install these new pkgs

ucode-amd
ucode-intel

That’s it! Guest refuses to boot!

I must thank [wolfi323](https://forums.opensuse.org/member.php/40214-wolfi323) for the great suggestion/hint in this case

append “dis_ucode_loader” at the end

It saved me a lot of time.

During the installation of these two pkgs, it shows that the log is stored at /var/log/YaST2/mkinitrd.log

I am no expert but I found lines like this in the log

W: Some kernel modules could not be included:

W: ext4

F: Failed to install module ext4

Is that bad??

I am attaching the log file, I am sure you guys will understand it better than me. [(link)](https://drive.google.com/open?id=0B-Fe1hFb-YbIdkhJNElTelhPMDA)

Thanks again guys, especially to [wolfi323](https://forums.opensuse.org/member.php/40214-wolfi323)
Eagerly waiting to hear from you guys again :slight_smile:
Emon

Sorry, I made a mistake.
The option is called “dis_ucode_ldr”…
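Appending the option at the boot menu only lasts for that one boot. A minimal sketch of making it permanent, assuming the guest uses GRUB2 with the usual `/etc/default/grub` and `GRUB_CMDLINE_LINUX_DEFAULT` line; the helper name `add_kernel_param` is made up for this example.

```shell
# Hypothetical helper: append a kernel parameter to GRUB_CMDLINE_LINUX_DEFAULT.
# The second argument (file to edit) defaults to /etc/default/grub.
add_kernel_param() {
    param="$1"
    file="${2:-/etc/default/grub}"
    # Do nothing if the parameter is already on the line.
    if grep -q "^GRUB_CMDLINE_LINUX_DEFAULT=.*[\" ]${param}[\" ]" "$file"; then
        return 0
    fi
    # Insert the parameter just before the closing quote (GNU sed, in place).
    sed -i "s/^\(GRUB_CMDLINE_LINUX_DEFAULT=\"[^\"]*\)\"/\1 ${param}\"/" "$file"
}

# Usage, as root (back up /etc/default/grub first):
#   add_kernel_param dis_ucode_ldr
#   grub2-mkconfig -o /boot/grub2/grub.cfg
```

The same change can be made by hand in YaST's boot loader settings; the script only spells out what ends up in the file.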

Any idea why???

Yes.
Loading the microcode update into the CPU fails/hangs for some reason.

As far as I know only certain AMD CPUs are affected by this.

I’m just surprised that this happens to you now.
Seems to be some regression in the new kernel then.

You should probably file a bug report at http://bugzilla.opensuse.org/ or at least add a comment to https://bugzilla.opensuse.org/show_bug.cgi?id=913996 (use the same username/password as here)

Now if you install these new pkgs

ucode-amd
ucode-intel

That’s it! Guest refuses to boot!

Yes. Those packages contain the microcode update (ucode-amd for AMD CPUs, ucode-intel for intel CPUs).
So uninstalling ucode-amd should fix your boot problem too.

I am no expert but I found lines like this in the log

W: Some kernel modules could not be included:

W: ext4

F: Failed to install module ext4

Is that bad??
No.
The mentioned modules cannot be added to the initrd because they are integrated into the kernel.
dracut became more verbose (or “noisy” :wink: ) with an update and warns in those cases.
It probably shouldn’t try to add them at all, but a warning message is the only way such things get noticed and, hopefully, fixed…
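To check this on your own system, here is a rough sketch that pulls the module names out of the “F: Failed to install module …” lines and looks them up in the kernel's list of built-in modules. It assumes the log lines start exactly as quoted above and that the list lives at `/lib/modules/$(uname -r)/modules.builtin`; the function names `failed_modules` and `is_builtin` are invented for this example.

```shell
# Print the module names from "F: Failed to install module <name>" lines.
failed_modules() {
    sed -n 's/^F: Failed to install module \([^ ]*\).*$/\1/p' "$1" | sort -u
}

# Succeed if the module is compiled into the kernel rather than built as a .ko.
# Note: this is a plain file-name match; it does not translate between
# '-' and '_' in module names.
is_builtin() {
    module="$1"
    builtin_list="${2:-/lib/modules/$(uname -r)/modules.builtin}"
    grep -q "/${module}\.ko\$" "$builtin_list"
}

# Usage:
#   for m in $(failed_modules /var/log/YaST2/mkinitrd.log); do
#       if is_builtin "$m"; then echo "$m: built in, warning is harmless"
#       else echo "$m: NOT built in, worth a closer look"; fi
#   done
```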

On 2015-08-20 21:16, emon wrote:

> Now if you install these new pkgs
>
> ucode-amd
> ucode-intel
>
> That’s it! Guest refuses to boot!

Remove and taboo them.
Then write a bugzilla report.

IMHO, it does not make much sense to try to update the microcode inside
a virtual machine. Leave that to the host.
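In zypper terms, “remove and taboo” corresponds roughly to `zypper remove` followed by `zypper addlock`. A small sketch, where the wrapper name `remove_and_lock` and the `ZYPPER` override variable are made up for illustration:

```shell
# Hypothetical wrapper: uninstall the given packages, then lock them so a
# later 'zypper update' does not pull them back in.
# ZYPPER can be overridden (e.g. ZYPPER=echo) to print the commands instead
# of running them.
remove_and_lock() {
    zypper_cmd="${ZYPPER:-zypper}"
    "$zypper_cmd" remove "$@"
    "$zypper_cmd" addlock "$@"
}

# Usage, as root inside the guest:
#   remove_and_lock ucode-amd ucode-intel
#   zypper locks        # list the active locks
```

`zypper removelock` undoes the lock if the packages are wanted again later.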

>> F: Failed to install module ext4
>>
>>
>
> Is that bad??

No. Just noise.


Cheers / Saludos,

Carlos E. R.
(from 13.1 x86_64 “Bottle” at Telcontar)

Hi [wolfi323](https://forums.opensuse.org/member.php/40214-wolfi323)

The option is called “dis_ucode_ldr”…

Worked like a charm :slight_smile:
Thanks

You should probably file a bug report at http://bugzilla.opensuse.org/ or at least add a comment to https://bugzilla.opensuse.org/show_bug.cgi?id=913996 (use the same username/password as here)

Just did; first time ever reporting any bug :stuck_out_tongue:
https://bugzilla.opensuse.org/show_bug.cgi?id=913996#c114

The mentioned modules cannot be added to the initrd because they are integrated into the kernel.

Had a hunch that might be the case, but thanks for explaining.

Emon

Hi [robin_listas](https://forums.opensuse.org/member.php/21725-robin_listas)

Remove and taboo them.

Thanks for the suggestion, that was the plan actually, until I found out that “dis_ucode_ldr” method works :expressionless:

>> F: Failed to install module ext4
>>
>>
>
> Is that bad??

No. Just noise.

Thanks for the explanation

All the best
Emon

On 2015-08-20 23:26, emon wrote:

>> Remove and taboo them.
>>
> Thanks for the suggestion, that was the plan actually, until I found out
> that “dis_ucode_ldr” method works :expressionless:

Being a minimal install you have there, it would make sense to remove
them :wink:


Cheers / Saludos,

Carlos E. R.
(from 13.1 x86_64 “Bottle” at Telcontar)

  1. This is something new to me; until now I was only aware of the traditional way of updating the BIOS to implement processor microcode updates. See the following for more on this Linux-specific support for applying microcode updates on boot
    https://wiki.archlinux.org/index.php/Microcode

I don’t know if running microcode off your disk drive has worse performance than running in silicon(unless it’s read into RAM?!), but I suspect the performance should be worse (I’m speculating). For something that has to run with least latency, this suggests you should check with your manufacturer for any available BIOS updates. I have no idea if an updated BIOS affects whether the Linux microcode module is updated or not (I suspect no change).

  2. The above link contains instructions for inspecting your syslog for entries indicating whether your microcode was installed successfully or not. IMO that would be an important part of your bug report… whether the microcode was updated properly and is buggy or if the update failed and had some other consequence. Since it looks like there are actually five events you should find in the syslog, it might be useful to know which ones were successful or failed.
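A minimal sketch of that log check, assuming the kernel messages of interest mention the word “microcode” and that `/proc/cpuinfo` exposes a `microcode` field (it does on reasonably recent x86 kernels, though what a VirtualBox guest reports there may differ). The function names are made up for this example.

```shell
# Filter microcode-related lines from a log file, or from stdin if no file
# is given.
microcode_events() {
    grep -i 'microcode' "$@"
}

# Print the microcode revision the kernel reports for the first CPU.
microcode_revision() {
    grep -m1 -i '^microcode' "${1:-/proc/cpuinfo}"
}

# Usage:
#   dmesg | microcode_events
#   microcode_events /var/log/messages
#   microcode_revision
```

Comparing the output of a boot with `dis_ucode_ldr` against a failing boot is not possible directly, since the failing boot never gets far enough to log anything to disk; the output from the working kernel is still useful context for the bug report.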

HTH,
TSU

That is a good point; but I want to have all pkgs updated :stuck_out_tongue:

Emon

So?
The BIOS is also just software that’s run when you turn on the computer.
If the BIOS “program” can update the microcode, why should the kernel not be able to?
And why should there be a difference between both cases?

I don’t know if running microcode off your disk drive has worse performance than running in silicon(unless it’s read into RAM?!), but I suspect the performance should be worse (I’m speculating).

The microcode is not run “off your disk drive”. It’s uploaded to the chip in question (the CPU in this particular case), and just “runs” the same way the built-in one would.

Your last statement is the case in point. You’re not actually “uploading to the chip in question” because to do so requires flashing the silicon. I’m pretty sure whether the microcode is running in RAM or off the disk it’s got to be a lot slower than in silicon. I’ve always thought of this microcode as critical to how the BIOS performs, literally as the layer that implements software interacting with hardware. Microcode burned into the BIOS chip is very low level and by necessity very efficient.

If the microcode isn’t burned into the BIOS chip, it’s run as a separate layer above the existing BIOS microcode. Maybe this microcode is doing something other than BIOS functionality?-- But then, why is this new method considered to be an alternative to flashing the BIOS instead of as completely new and different microcode?

TSU

On 2015-08-21 21:16, tsu2 wrote:
> something other than BIOS functionality?-- But then, why is this new
> method considered to be an alternative to flashing the BIOS instead of
> as completely new and different microcode?

No, this is not flashing the BIOS. It is “flashing” the CPU, so to
speak. Not flash memory, though, but a special internal register in the CPU. The
modifications disappear on power down, and possibly on reset.


Cheers / Saludos,

Carlos E. R.
(from 13.1 x86_64 “Bottle” at Telcontar)

On 2015-08-21 16:36, tsu2 wrote:

> I don’t know if running microcode off your disk drive has worse
> performance than running in silicon(unless it’s read into RAM?!), but I
> suspect the performance should be worse (I’m speculating). For something
> that has to run with least latency, this suggests you should check with
> your manufacturer for any available BIOS updates. I have no idea if an
> updated BIOS affects whether the Linux microcode module is updated or
> not (I suspect no change).

No, no.

These CPUs have complex instruction sets. And these instructions
themselves can be “programmed”. That is, when you tell the CPU to add
two cpu registers it is possible to modify how exactly that “ADD” works.

This was made so for optimization and for correcting CPU hardware bugs.
The changes are not stored in ram, they do not “run” in the cpu. They
are modifications to the instruction set, stored internally in the cpu.

Where the modifications are read from doesn’t matter.

That a guest operating system, running virtualized, tries to modify the
CPU instruction set doesn’t make sense to me.


Cheers / Saludos,

Carlos E. R.
(from 13.1 x86_64 “Bottle” at Telcontar)

If that is what is happening, then I can see to a certain extent turning features on and off as a conceivable use.

So, for example VMware Workstation 11 is supposed to be for the first time taking advantage of special features in the Haswell chipset that is supposed to measurably improve performance (I don’t know if it’s easy to verify, but hurrah if it’s true).
But, I’m not aware that these CPU and Bridge chipsets need to turn features on and off… AFAIK they’re all on by default and only need to be referenced. AFAIK no features are off by default.

And, if the microcode goes any further than simply enabling/disabling it’s not clear to me that the code can run anywhere faster than in RAM if the silicon isn’t being flashed… As I think about this, the only place I can think of that might be pretty fast is if deposited in the L1 (maybe L2 or L3) cache assuming they are on the die. Otherwise, you’d encounter massive latencies related to traversing a bus.

I expect there has to be something significant I’m overlooking, but on its face I’m surprised and curious what these microcode upgrades might be doing.

TSU

On 2015-08-21 22:56, tsu2 wrote:

> But, I’m not aware that these CPU and Bridge chipsets need to turn
> features on and off… AFAIK they’re all on by default and only need to
> be referenced. AFAIK no features are off by default.

No, it is not features. It is “code”. You change the code that runs the
assembler code, so to speak.

> And, if the microcode goes any further than simply enabling/disabling
> it’s not clear to me that the code can run anywhere faster than in RAM
> if the silicon isn’t being flashed… As I think about this, the only
> place I can think of that might be pretty fast is if deposited in the L1
> (maybe L2 or L3) cache assuming they are on the die. Otherwise, you’d
> encounter massive latencies related to traversing a bus.

It is not there. I don’t know how to explain this. I have done this in
class, long ago, for older CPUs, but I don’t know how to explain it in
simple terms. You need knowledge of CPU internal design to understand
what it is about.

Say that in order to load a register with the contents of the ram at a
certain location, you have to first write the address in the external
address bus, then raise the read signal, then wait for the ram to
respond, then latch the data bus to the internal bus, then open the
latch to the AX register, etc. All those are microoperations. You can
alter the sequence, the timing, add phases. Maybe create new
instructions (in theory). You could change the ADD instruction to
instead subtract (in theory).

It gets way more complex when you think about changing the microcode for
the multiply op.

Surely there must be some text that explains it… here, read this:

https://en.wikipedia.org/wiki/Microcode


Cheers / Saludos,

Carlos E. R.

(from 13.1 x86_64 “Bottle” (Minas Tirith))

On 2015-08-21 17:16, emon wrote:
>
> robin_listas;2725001 Wrote:
>>
>> Being a minimal install you have there, it would make sense to remove
>> them :wink:
>>
>
> That is a good point; but I want to have all pkgs updated :stuck_out_tongue:

There is no point to update a package that can not run. Not because it
is broken, but because it can not run under virtualization, unless I’m
mistaken.


Cheers / Saludos,

Carlos E. R.

(from 13.1 x86_64 “Bottle” (Minas Tirith))

Although this microcode sub-topic is a bit off topic for this thread, it’s still interesting, and it matters for understanding how hardware supports software/applications.
Although I was not aware of this specific method to modify CPU (and potentially other hardware) through microcode, I was aware of some isolated instances when microcode patching has been used… I just thought these modifications were done a different way.

Intel and AMD do have the facility for microcode to be uploaded and written to fast memory to modify small pieces of the CPU architecture.
The better info I found on Wikipedia describes the Control Store where the microcode is kept
https://en.wikipedia.org/wiki/Control_store

So, what actually does this microcode do?
I found a few references… In the past, patches for some bugs in CPUs dating back to Core 2. Currently, the Haswell TSX (Transactional Synchronization Extensions) has been found to contain an extremely serious bug, so Intel has been disabling it every way it can through the use of microcode updates. There is supposedly some kind of workaround which can be implemented but typically it’s just disabled. This probably means the special VMware Workstation 11 performance advantage isn’t safely implemented and may not be for another generation or two of Intel Broadwell versions. I haven’t heard of any other apps which have been updated to use TSX, but if they exist they would be affected, too.

So, although I haven’t found anything that describes how large this Control Store is, there are clearly serious limitations to what you can do. As I observed, I’m pretty sure no CPU is shipped with extensions disabled unless for good reason, and it is through a microcode update that they are set one way or another. The Control Store is obviously not large enough or positioned correctly to do extensive repair to the TSX. This TSX problem is current and extends to the first generation of Broadwell which is the successor to Haswell.

As for AMD, I don’t see any recent problems that are microcode-related. I only see a problem that affected quad core Barcelona Opteron chips in 2007, the TLB (translation lookaside buffer) bug. If you don’t have an AMD processor from around that year, I don’t know of any serious microcode-related problems.

For anyone who might be interested in a slightly deeper description of what a microcode patch is, and what is in one, I couldn’t find anything published officially but the following link is an interesting approach through reverse engineering. And, anyone who has an interest in cracking RSA certs might find the last few paragraphs interesting
http://inertiawar.com/microcode/