openSUSE 11 x86_64, Xen and Dell 2950

I installed openSUSE 11 on a 64-bit Dell 2950 server, and booting the default kernel works fine.

When I reboot into the Xen kernel (the Xen packages were installed during OS installation), the system hangs, and the last lines on screen are:
Broadcom NetXtreme II Gigabit Ethernet Driver bnx2 v1.7.4 (February 18, 2008)
ACPI: PCI Interrupt 0000:03:00.0[A] -> GSI 16 (level, low) -> IRQ 16

The LCD screen on the server shows “E1410 CPU 1 IERR CPU 2 IERR”.

Has anyone had the same problem? Is there a solution?

Updating the NetXtreme driver changed nothing.

openSUSE 10.3 with Xen worked well on this same server.

Please help me.

Hi Franques,

I’m starting to suspect there are issues between the latest Xen kernel and the Broadcom drivers in use…

The thing to do would be to report a bug (or join an existing one) and supply needed details to the developers.

These bugs seem related:
https://bugzilla.novell.com/show_bug.cgi?id=389944
https://bugzilla.novell.com/show_bug.cgi?id=396236

To debug what is happening you will need to boot the Xen kernel with a special switch that redirects log output to another system’s console.

As quoted from the first bug report:

Generally you need to connect your machine to a second one via serial cable,
then specify “console=vga,com1 com1=115200” as (perhaps additional) parameters
on your Xen command line. To collect kernel messages, too, you may need to also
specify “xencons=xvc” on the kernel command line.

(Your best bet in searching would probably have been looking at other bugs
here.)
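
In case it helps, here is a rough sketch of where those two sets of parameters would go in a GRUB (legacy) menu entry. The entry layout and the paths (/boot/xen.gz, /boot/vmlinuz-xen, root=/dev/sda2) are assumptions for illustration, so compare against your own /boot/grub/menu.lst before changing anything. The helper name add_debug_options is made up for this example.

```python
# Sketch: append the serial-console options quoted above to a Xen boot entry.
# The entry text and paths are hypothetical; only the option strings come
# from the quoted bug report.

XEN_OPTS = "console=vga,com1 com1=115200"   # for the Xen (hypervisor) line
DOM0_OPTS = "xencons=xvc"                   # for the Linux kernel line


def add_debug_options(entry: str) -> str:
    """Return a copy of a menu.lst entry with the debug options appended."""
    out = []
    dom0_done = False
    for line in entry.splitlines():
        words = line.split()
        if words and words[0] == "kernel" and XEN_OPTS not in line:
            # In a Xen entry the "kernel" line loads the hypervisor itself.
            line = f"{line.rstrip()} {XEN_OPTS}"
        elif words and words[0] == "module" and not dom0_done:
            # The first "module" line is the Linux kernel booted by Xen.
            dom0_done = True
            if DOM0_OPTS not in line:
                line = f"{line.rstrip()} {DOM0_OPTS}"
        out.append(line)
    return "\n".join(out)


EXAMPLE_ENTRY = """\
title Xen
    root (hd0,0)
    kernel /boot/xen.gz
    module /boot/vmlinuz-xen root=/dev/sda2
    module /boot/initrd-xen"""

if __name__ == "__main__":
    print(add_debug_options(EXAMPLE_ENTRY))
```

Running it only prints the modified entry; nothing is written to disk, so you can copy the two changed lines over by hand.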

There have been reports that reverting to the 2.6.22 kernel fixes things… but using Xen 3.2 with an older kernel version could cause issues too AFAIK.

Good luck & hope you get it fixed!

Wj

Hi,

As I have the very same problem on my Dell 1950-III, I just want to know whether there has been any progress in the meantime, or whether there is a solution for the Dell 2950.

The bug reports mentioned in the second post seem to be related, but they do not seem to be making any progress either…

Is there anything I can do to help find a solution? I do have a serial cable, and I could post more information, but I am not a Linux expert :wink: so I need some guidance on how to gather the required (and related) information. Setting up the cable and boot parameters on the Dell is explained in the second post, but I do not know how to capture the output (which program, which settings, etc.) on the other side of the cable.

Any help would be appreciated,

regards

Stefan Schueffler

Hi Stefan,

It would be very good if you could join one of the bugs and help supply input where you can.
The devs there will do the best they can to help out in getting the needed data.
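
As for the other side of the cable: any serial terminal program set to 115200 baud should do (minicom is a common choice). If you would rather have a timestamped log file, the following is a minimal sketch using only the Python standard library. The device name /dev/ttyS0, the 8N1 line settings and the log file name xen-console.log are assumptions, and the function names are made up for this example.

```python
import os
import sys
import termios
import time


def open_serial(device: str = "/dev/ttyS0") -> int:
    """Open a serial port raw at 115200 baud, 8N1, and return its descriptor."""
    fd = os.open(device, os.O_RDWR | os.O_NOCTTY)
    attrs = termios.tcgetattr(fd)
    # attrs is [iflag, oflag, cflag, lflag, ispeed, ospeed, cc]
    attrs[0] = 0                                   # no input processing
    attrs[1] = 0                                   # no output processing
    attrs[2] = termios.CS8 | termios.CREAD | termios.CLOCAL
    attrs[3] = 0                                   # non-canonical, no echo
    attrs[4] = attrs[5] = termios.B115200          # matches com1=115200
    attrs[6][termios.VMIN] = 1                     # block until 1 byte arrives
    attrs[6][termios.VTIME] = 0
    termios.tcsetattr(fd, termios.TCSANOW, attrs)
    return fd


def read_chunks(fd: int):
    """Yield raw byte chunks from the descriptor until end of input."""
    while True:
        data = os.read(fd, 4096)
        if not data:
            break
        yield data


def timestamp_lines(chunks, clock=time.strftime):
    """Yield complete text lines from byte chunks, each prefixed with a time."""
    buf = b""
    for chunk in chunks:
        buf += chunk
        while b"\n" in buf:
            line, buf = buf.split(b"\n", 1)
            text = line.rstrip(b"\r").decode("ascii", errors="replace")
            yield f"{clock('%H:%M:%S')} {text}"
    if buf:  # whatever was left when the input stopped
        yield f"{clock('%H:%M:%S')} {buf.decode('ascii', errors='replace')}"


if __name__ == "__main__":
    fd = open_serial(sys.argv[1] if len(sys.argv) > 1 else "/dev/ttyS0")
    with open("xen-console.log", "a") as log:
        for line in timestamp_lines(read_chunks(fd)):
            print(line)
            log.write(line + "\n")
            log.flush()  # the other machine may hang at any moment
```

Start it on the second machine before booting the Xen kernel, stop it with Ctrl-C once the server has hung, and attach the resulting log to the bug report.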

There is a small but real chance your setup will boot if you add the option ‘mem=1G’ to the boot options (use the 2nd line in the boot screen)… but this is not optimal, as you are restricting Xen’s management to 1GB.

Any data you can supply to the bug will be of great value to get things rolling to a fix!

Thanks! & Cheers,
Wj

I have a 1950 and am having the same problem. I checked the two bugs linked to above, and they don’t seem to be the problem we’re having.

Again, I can boot just fine using the normal kernel, but when I boot into the Xen kernel, it hangs at the same line:

“Broadcom NetXtreme II Gigabit Ethernet Driver bnx2 v1.7.4 (February 18, 2008)
ACPI: PCI Interrupt 0000:03:00.0[A] -> GSI 16 (level, low) -> IRQ 16”

I then started experimenting. The first thing I saw was that I could disable the built-in ethernet in the Bios and it would boot.

I then checked this thread, and went and looked at the two bugs mentioned. The first one doesn’t seem to apply at all. The other has a solution involving a new kernel. I therefore installed that kernel, and it made no difference. (the 2.6.25.15-0.1.x86_64 kernel.)

The problem seems to be either with the Broadcom card, or with something the Dells do, as the other two posters were using dell x9xx hardware as well. Dell tells me the 2950 and the 1950 have the same motherboard makeup.

I’m not sure how to post or start a new bug, but I sure would like help. I don’t want to revert to 10.2 if I don’t have to.

Hi
Have you made a note of the module the device in question uses? If so,
add the boot option insmod=<name_of_the_module> at the grub menu.


Cheers Malcolm °¿° (Linux Counter #276890)
openSUSE 11.0 x86 Kernel 2.6.25.11-0.1-default
up 3 days 3:12, 1 user, load average: 0.13, 0.20, 0.18
GPU GeForce 6600 TE/6200 TE - Driver Version: 173.14.12

Thanks for the reply.

Yeah, I just tried that. Same result.

I then tried the “opposite” and disabled the bnx2 module. It didn’t crash, though several services took a very long time to load due to a lack of ethernet.

So, it seems like that module coupled with Xen (and possibly something Dell-specific) is the culprit.

I don’t really know where to go from here, though.

BTW, I also tried the mem=1G trick. Same problem.

I am having the exact same problem on a poweredge 2900.

As soon as the broadcom drivers load in the xen kernel, the machine crashes hard with E1410.

This is on openSUSE 11. I’ve tried the default kernel, the updated kernel, and kernel-xen-2.6.25.16-2.1.x86_64.rpm. No luck with any of them.

Exact same problem on HP BladeSystem c7000 Enclosure with ProLiant BL460c G1

“Broadcom NetXtreme II Gigabit Ethernet Driver bnx2 v1.7.4 (February 18, 2008)
ACPI: PCI Interrupt 0000:03:00.0[A] -> GSI 16 (level, low) -> IRQ 16”

so not Dell dependent. Help?

I can confirm the same problem exists on my Dell PowerEdge 2950 as well. I get the same crash as mentioned by others while loading the Broadcom driver. Lots of searching has yielded no solution.

Just adding my report. I’ll keep an eye on this thread.

The newer kernels (2.6.26 and up) will possibly fix these issues.

If you are looking for a stable Xen 3.2 platform it may be better to use SLES 10 SP2. I can definitely say that is running stable on the same systems (with Broadcom nics) where openSUSE is having these issues.
SLES won’t expire and you are allowed to use it till the end of days… but you will lose the ability to update the system 60 days after installing.
By that time openSUSE 11.1 will be just around the corner, and I suspect this will also be fixed if enough people are running into these issues…

Nevertheless, if you can help by giving the devs the needed input (adding your findings to the mentioned bug reports), that will help speed things up.

Cheers & thanks,
Wj

Wow, still no solution.

I have started suspecting that the problem lies with updated firmware on the Broadcom cards. I have 2 older Dell 1950s with the same cards that work just fine with Suse11. Those 2 1950s were pretty-much first-run of the 1950 family. We got them shortly after they were released over a year ago.

I wonder if we can somehow downgrade the firmware in these cards to get us by.

Just throwing in my latest thoughts.

I did try the downgrading of the firmware. Unfortunately, it won’t allow me to downgrade as far as our old servers. And the one level it allowed me to downgrade didn’t fix the problem.

Also, in response to Magic31, Thanks for your help in here. However, you said to report our findings on the “mentioned bug reports.” The problem with that is that those are not the problems we’re having. This appears to be a different bug altogether. That’s why we’re still posting here. I don’t know how to start a new bug, or bring it to the right people’s attention.

Thanks for your help.
JJ

Hi JJ,

No problem & wish I could do more to help.
As to posting your own bug reports there is a good guide on the openSUSE WIKI : Submitting Bug Reports - openSUSE
Have a read through that to give some idea of what is expected.

To open a new bug report head to this link : https://bugzilla.novell.com/enter_bug.cgi?product=openSUSE+11.0 (note this is specifically for openSUSE 11.0, adjust the product to suit the case).
You should be able to log in with the same credentials you use to log in to the openSUSE webforums.

Try to give as much detail as possible and also put some love into the subject title (as it will help others find it when they search Bugzilla). :slight_smile:

Wishing you luck!

Cheers,
Wj

Hello,

Just to let you know that I have the same problem with an IBM x3650.

If I add the bnx2 module to the blacklist, the system boots. When I load the kernel module, the system hangs. I tried updating to the latest Broadcom firmware and compiling the latest kernel module, with no success.
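
For anyone who wants to repeat the blacklist test, here is a small sketch of the edit. It works on the text of a modprobe configuration file and adds a "blacklist bnx2" line only if one is not already there. The file location (something like /etc/modprobe.d/blacklist) is an assumption that depends on the distribution, and the function name is made up for this example.

```python
# Sketch: make sure a modprobe config contains "blacklist bnx2" exactly once.
# bnx2 is the module named in the driver banner quoted in this thread.

BLACKLIST_LINE = "blacklist bnx2"


def with_bnx2_blacklisted(config_text: str) -> str:
    """Return config text that contains an active 'blacklist bnx2' line."""
    lines = config_text.splitlines()
    for line in lines:
        # Ignore anything after '#', so commented-out entries do not count.
        if line.split("#", 1)[0].split() == ["blacklist", "bnx2"]:
            return config_text  # already blacklisted, leave the text alone
    lines.append(BLACKLIST_LINE)
    return "\n".join(lines) + "\n"


if __name__ == "__main__":
    print(with_bnx2_blacklisted("blacklist pcspkr\n"), end="")
```

The function only returns the new text; writing it back to the real file (as root) is left to you, and removing the line again restores the original behaviour.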

Br,
Teemu

Hi all,

same issue on the IBM x3550.

eth0: Broadcom NetXtreme II BCM5708 1000Base-T (B2) PCI-X 64-bit 133MHz found at mem c8000000, IRQ 18, node addr 00:1a:64:78:df:fc

The default kernel boots perfectly.

As far as I have been able to test, the issue has been fixed with the newer kernels. openSUSE 11.1 (still Beta) is handling this well.

The kernel in openSUSE 11.1 boots with bnx2 and Xen, but the driver isn’t stable. The network hangs when I try to download any file. The default kernel works perfectly.

There is a report about this specific to the bnx2 nic.

https://bugzilla.novell.com/show_bug.cgi?id=438610

It would be good if you could add your two cents to it… Your input should help the devs squash the issue!

Thanks,
Wj

Broadcom NetXtreme II Gigabit Ethernet Driver bnx2 v1.7.4 (February 18, 2008)
ACPI: PCI Interrupt 0000:03:00.0[A] -> GSI 16 (level, low) -> IRQ 16

I have multiple HP c7000s Blade Chassis with both BL 460c and BL 480c blades installed in them. These servers run OpenSuse 11.0 with Xen 3.2 (obtained from suse’s repo (zypper install))

I have this exact same problem on ALL of my blades when booting with the Xen kernel. Booting the non-xen kernel works fine.
Today is 11/15/2008
Using Suse update (zypper) to get to today’s patch level, I am given kernel-xen-2.6.25.18-0.2
This kernel version DOES still have the problem. It has not been fixed.

My workaround is to install kernel-xen-2.6.22.18-219.1.x86_64.rpm
With this older version installed, the problem IS RESOLVED. I have been using this old version for many months as it is the only workaround I am aware of.
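
If you manage several machines, a quick check like the sketch below can tell you which ones are on a kernel series that people in this thread reported as hanging. The two sets only reflect reports posted here (2.6.25.x xen kernels hang, 2.6.22.18 boots); they are not an authoritative list, and the function name is made up for this example.

```python
import platform
import re

# Kernel series taken from reports in this thread only.
REPORTED_TO_HANG = {(2, 6, 25)}   # 2.6.25.15, 2.6.25.16, 2.6.25.18 xen kernels
REPORTED_TO_WORK = {(2, 6, 22)}   # kernel-xen-2.6.22.18-219.1


def classify(release: str) -> str:
    """Classify a kernel release string such as '2.6.25.18-0.2-xen'."""
    match = re.match(r"(\d+)\.(\d+)\.(\d+)", release)
    if not match:
        return "unknown"
    if "xen" not in release:
        return "not a Xen kernel"
    series = tuple(int(part) for part in match.groups())
    if series in REPORTED_TO_HANG:
        return "reported to hang"
    if series in REPORTED_TO_WORK:
        return "reported to work"
    return "no report in this thread"


if __name__ == "__main__":
    running = platform.release()   # same string as `uname -r`
    print(running, "->", classify(running))
```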

I really hate running such an old kernel though. Hopefully this is fixed soon.