I installed openSUSE 11 on a 64-bit Dell 2950 server, and booting the default kernel works fine.
When I reboot into the Xen kernel (the Xen packages were installed during OS installation), the system hangs, and the last messages shown are:
Broadcom NetXtreme II Gigabit Ethernet Driver bnx2 v1.7.4 (February 18, 2008)
ACPI: PCI Interrupt 0000:03:00.0[A] -> GSI 16 (level, low) -> IRQ 16
The LCD screen on the server shows “E1410 CPU 1 IERR CPU 2 IERR”.
Does anyone have the same problem, or a solution?
Updating the NetXtreme driver changed nothing.
openSUSE 10.3 with Xen worked well on this same server.
To debug what is happening you will need to boot the Xen kernel with a special switch that redirects log output to another system's console.
As quoted from the first bug report:
Generally you need to connect your machine to a second one via serial cable,
then specify “console=vga,com1 com1=115200” as (perhaps additional) parameters
on your Xen command line. To collect kernel messages, too, you may need to also
specify “xencons=xvc” on the kernel command line.
(Your best bet in searching would probably have been looking at other bugs
here.)
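For illustration, here is a minimal Python sketch of where those parameters would go in a GRUB legacy boot entry. It assumes the usual Xen layout, where the `kernel` line loads the hypervisor and the first `module` line loads the Linux kernel. The paths and root device in the sample stanza are placeholders, and the helper name `add_debug_params` is made up for this example.

```python
def add_debug_params(stanza,
                     xen_params="console=vga,com1 com1=115200",
                     kernel_params="xencons=xvc"):
    """Return the boot stanza with the serial-console parameters appended.

    Assumption: the 'kernel' line is the Xen hypervisor line and the first
    'module' line is the Linux kernel line (GRUB legacy layout).
    """
    out = []
    module_seen = False
    for line in stanza.splitlines():
        words = line.split()
        if words and words[0] == "kernel":
            # Xen command line: send hypervisor output to VGA and COM1.
            if xen_params not in line:
                line = line.rstrip() + " " + xen_params
        elif words and words[0] == "module" and not module_seen:
            # Only the first module line is the Linux kernel; later ones
            # (such as the initrd) are left untouched.
            module_seen = True
            if kernel_params not in line:
                line = line.rstrip() + " " + kernel_params
        out.append(line)
    return "\n".join(out)


# Hypothetical stanza: paths and root device are placeholders.
SAMPLE = """\
title Xen
    root (hd0,0)
    kernel /boot/xen.gz
    module /boot/vmlinuz-xen root=/dev/sda2
    module /boot/initrd-xen"""

if __name__ == "__main__":
    print(add_debug_params(SAMPLE))
```

The same edit can of course be made by hand in the boot menu; the sketch only shows which line each parameter belongs on.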
There have been reports that reverting to the 2.6.22 kernel fixes things… but using Xen 3.2 with an older kernel version could cause issues too AFAIK.
As I have the very same problem on my Dell 1950-III, I just want to know if there has been any progress in the meantime, or if there is a solution for the Dell 2950.
The bug reports mentioned in the second post seem to be related, but they also do not seem to be making any progress…
Is there anything I can do to help in finding a solution? I do have a serial cable, and I could post more information, but I am not a Linux expert, so I need some guidance on how to gather the required (and related) information. Setting up the cable and boot parameters on the Dell is explained in the second post, but I do not know how to gather the information (which program, which settings, etc.) on the other side of the cable.
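As a rough sketch of the capturing side, the Python script below reads a serial port on the second machine and writes each line to a log file with a timestamp. It uses only the standard library and assumes a Linux capture machine, a null-modem cable, 115200 baud to match “com1=115200”, and a device name such as /dev/ttyS0 (the actual device depends on the machine). The function names and the log file name xen-boot.log are made up for this example.

```python
import os
import sys
import termios
import time
import tty


def open_serial(device, baud=termios.B115200):
    """Open a serial device in raw mode at the given speed."""
    fd = os.open(device, os.O_RDWR | os.O_NOCTTY)
    tty.setraw(fd)                       # raw mode: no echo, no line editing
    attrs = termios.tcgetattr(fd)
    attrs[4] = baud                      # input speed
    attrs[5] = baud                      # output speed
    termios.tcsetattr(fd, termios.TCSANOW, attrs)
    return fd


def read_chunks(fd, size=1024):
    """Yield raw byte chunks from the descriptor until end of input."""
    while True:
        chunk = os.read(fd, size)
        if not chunk:
            break
        yield chunk


def _write(out, raw):
    text = raw.decode("ascii", "replace").rstrip("\r")
    out.write(time.strftime("%H:%M:%S ") + text + "\n")
    out.flush()                          # keep the log current if we are killed


def log_lines(chunks, out):
    """Copy serial output to `out`, one timestamped line at a time."""
    buf = b""
    try:
        for chunk in chunks:
            buf += chunk
            while b"\n" in buf:
                line, buf = buf.split(b"\n", 1)
                _write(out, line)
    finally:
        if buf:                          # last partial line before the hang
            _write(out, buf)


if __name__ == "__main__" and len(sys.argv) == 2:
    # Usage (device name is an assumption): python capture.py /dev/ttyS0
    fd = open_serial(sys.argv[1])
    with open("xen-boot.log", "a") as log:
        try:
            log_lines(read_chunks(fd), log)
        except KeyboardInterrupt:
            pass                         # Ctrl-C ends the capture
        finally:
            os.close(fd)
```

Start the script on the second machine first, then boot the Xen kernel on the server and stop the capture with Ctrl-C once it hangs. An existing terminal program such as minicom or screen pointed at the same port and speed should do the same job.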
It would be very good if you could join one of the bugs and help supply input where you can.
The devs there will do the best they can to help out in getting the needed data.
There is a small but good chance your setup will boot if you add the option “mem=1G” to the boot options (use the 2nd line in the boot screen)… but this is not optimal, as you are restricting Xen to managing 1 GB of memory.
Any data you can supply to the bug will be of great value to get things rolling to a fix!
I then started experimenting. The first thing I saw was that I could disable the built-in Ethernet in the BIOS and it would boot.
I then checked this thread, and went and looked at the two bugs mentioned. The first one doesn’t seem to apply at all. The other has a solution involving a new kernel. I therefore installed that kernel, and it made no difference. (the 2.6.25.15-0.1.x86_64 kernel.)
The problem seems to be either with the Broadcom card, or with something the Dells do, as the other two posters were using dell x9xx hardware as well. Dell tells me the 2950 and the 1950 have the same motherboard makeup.
I’m not sure how to post or start a new bug, but I sure would like help. I don’t want to revert to 10.2 if I don’t have to.
I then tried the “opposite” and disabled the bnx2 module. It didn’t crash, though several services took a very long time to load due to a lack of ethernet.
So, it seems like that module coupled with Xen (possibly with Dell hardware thrown in) is the culprit.
I don’t really know where to go from here, though.
I can confirm the same problem exists on my Dell PowerEdge 2950 as well. I get the same crash as mentioned by others while loading the Broadcom driver. Lots of searching has yielded no solution.
Just adding my report. I’ll keep an eye on this thread.
The newer kernels (2.6.26 and up) will possibly fix these issues.
If you are looking for a stable Xen 3.2 platform it may be better to use SLES 10 SP2. I can definitely say that it is running stably on the same systems (with Broadcom NICs) where openSUSE is having these issues.
SLES won’t expire and you are allowed to use it indefinitely… but you will lose the ability to update the system 60 days after installing.
By that time openSUSE 11.1 will be just around the corner, and I suspect this will also be fixed if enough people are running into these issues…
Nevertheless, if you can help by giving the devs the needed input (adding your input to the mentioned bug reports), that will help speed things up.
I have started suspecting that the problem lies with updated firmware on the Broadcom cards. I have 2 older Dell 1950s with the same cards that work just fine with openSUSE 11. Those 2 1950s were pretty much first-run units of the 1950 family. We got them shortly after they were released over a year ago.
I wonder if we can somehow downgrade the firmware in these cards to get us by.
I did try the downgrading of the firmware. Unfortunately, it won’t allow me to downgrade as far as our old servers. And the one level it allowed me to downgrade didn’t fix the problem.
Also, in response to Magic31, Thanks for your help in here. However, you said to report our findings on the “mentioned bug reports.” The problem with that is that those are not the problems we’re having. This appears to be a different bug altogether. That’s why we’re still posting here. I don’t know how to start a new bug, or bring it to the right people’s attention.
No problem & wish I could do more to help.
As to posting your own bug reports, there is a good guide on the openSUSE wiki: Submitting Bug Reports.
Have a read through that to give some idea of what is expected.
To open a new bug report head to this link : https://bugzilla.novell.com/enter_bug.cgi?product=openSUSE+11.0 (note this is specifically for openSUSE 11.0, adjust the product to suit the case).
You should be able to log in with the same credentials you use to log in to the openSUSE webforums.
Try to give as much detail as possible and also put some love into the subject title (as it will help others find it when they search Bugzilla).
Just to let you know that I have the same problem with an IBM x3650.
If I add the bnx2 module to the blacklist, the system boots. When I load the kernel module, the system hangs. I tried updating to the latest Broadcom firmware and compiling the latest kernel module, with no success.
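For anyone wanting to try the same blacklist workaround, here is a small Python sketch that adds a “blacklist bnx2” entry to a modprobe blacklist file if it is not already there. The helper name `ensure_blacklisted` is made up, and the file location is an assumption (on openSUSE 11.0 it is probably /etc/modprobe.d/blacklist, but check your own system). It needs to run as root to edit the real file.

```python
import os


def ensure_blacklisted(path, module="bnx2"):
    """Append 'blacklist <module>' to `path` unless it is already listed.

    Returns True if the file was changed, False if the entry already existed.
    """
    content = ""
    if os.path.exists(path):
        with open(path) as f:
            content = f.read()

    for line in content.splitlines():
        # Ignore anything after a '#' so commented-out entries do not count.
        if line.split("#", 1)[0].split() == ["blacklist", module]:
            return False

    with open(path, "a") as f:
        if content and not content.endswith("\n"):
            f.write("\n")                # keep the new entry on its own line
        f.write("blacklist " + module + "\n")
    return True


if __name__ == "__main__":
    # Works on a scratch copy; point it at the real file only deliberately.
    print(ensure_blacklisted("blacklist.test"))
```

Keep in mind that with the module blacklisted the onboard Broadcom ports will not come up, so this only helps for getting the machine booted and collecting data.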
I have multiple HP c7000 blade chassis with both BL460c and BL480c blades installed in them. These servers run openSUSE 11.0 with Xen 3.2 (obtained from SUSE’s repo via zypper install).
I have this exact same problem on ALL of my blades when booting with the Xen kernel. Booting the non-xen kernel works fine.
Today is 11/15/2008
Using the SUSE updater (zypper) to get to today’s patch level, I am given kernel-xen-2.6.25.18-0.2
This kernel version DOES still have the problem. It has not been fixed.
My workaround is to install kernel-xen-2.6.22.18-219.1.x86_64.rpm
With this older version installed, the problem IS RESOLVED. I have been using this old version for many months as it is the only workaround I am aware of.
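To make it quick to see where a given machine stands, here is a small Python sketch that compares the running kernel release against the versions reported in this thread. The table only contains what posters here have said (2.6.22 working, 2.6.25 hanging) and is not an authoritative list. The function names are made up for this example.

```python
import platform
import re

# Results as reported by posters in this thread; not an authoritative list.
REPORTED = {
    (2, 6, 22): "reported working with Xen (kernel-xen-2.6.22.18-219.1)",
    (2, 6, 25): "reported hanging with Xen (2.6.25.15-0.1 and 2.6.25.18-0.2)",
}


def kernel_series(release):
    """Return the first three version numbers of a kernel release string."""
    m = re.match(r"(\d+)\.(\d+)(?:\.(\d+))?", release)
    if m is None:
        raise ValueError("unrecognised kernel release: %r" % release)
    # A missing third component is treated as 0.
    return tuple(int(part or 0) for part in m.groups())


def thread_status(release):
    """Look up what this thread reported for the given kernel release."""
    return REPORTED.get(kernel_series(release), "no report in this thread")


if __name__ == "__main__":
    release = platform.release()         # same string as `uname -r`
    print(release, "->", thread_status(release))
```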
I really hate running such an old kernel though. Hopefully this is fixed soon.