I am trying to get a modern OS working on an older Dell PowerEdge 830 server with a CERC SATA 1.5/6Channel hardware RAID controller. I am able to install openSUSE 13.2, and I was able to install openSUSE 42.1 from a USB stick. But when I try to install openSUSE 42.2 as a new OS, or do a package upgrade of openSUSE 42.1, the RAID controller has some sort of conflict at boot time. The message I get is AAC: Host adapter dead -1 (repeated several times), and then, when the system times out waiting for a response from the aacraid driver, it prints:
aacraid: aac_fib_send: first asynchronous command timed out.
Usually a result of a PCI interrupt routing problem;
update mother board BIOS or consider utilizing one of
the SAFE mode kernel options (acpi, apic, etc)
Starting Dracut Emergency Shell…
Warning: /dev/disk-by-uuid/[insert long value of unique disk uuid] does not exist
Warning: Boot has failed. To debug this issue add “rd.shell rd.debug” to the kernel command line.
Problem is, I can’t figure out where the kernel command line is, let alone how to add these parameters to it. I’ve tried changing settings in the BIOS to disable ACPI and tried recovery commands such as noacpi, acpi=off, apic=off … but none of them seems to make any difference. If I re-install 13.2 or 42.1, it works fine, so I don’t believe I have a hardware problem. As soon as I run any package updates, the same problem comes back at boot.
Where should I start troubleshooting this? I’ve exhausted Google and much of this forum for results, but I can’t pin down this problem. I can boot into recovery mode and mount the drive’s partitions, which is how I discovered the timeout issue in the aacraid driver. I’m assuming that there is a kernel driver issue or a problem with the newer kernel. I’ve never recompiled a Linux kernel and don’t know how or where to begin.
Would anyone be willing to give me some advice on how to verify my kernel assumptions?
Has anyone experienced similar issues with an older Dell CERC RAID controller that can offer me some pointers to solving this issue?
Thanks in advance,
-Matthew
To try parameters for a single boot: at the boot screen press e, find the line starting with linux or linuxefi, and go to the very end of that line (it wraps). Add a space and then the parameters you want to try, each separated by a space, and press F10 to boot. Once you find a working combination, run YaST, select the boot loader module, and modify the kernel line there. That will make it permanent.
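For anyone who prefers to see what the permanent change amounts to, here is a minimal Python sketch. It assumes the boot loader is GRUB2 and that its defaults file (normally /etc/default/grub) contains a double-quoted GRUB_CMDLINE_LINUX_DEFAULT line; check your own system before relying on that. The function name add_kernel_params and the sample text are made up for illustration, and the function only transforms text, it does not write anything to disk.

```python
import re


def add_kernel_params(grub_defaults, params):
    """Return the text of a GRUB defaults file with extra kernel parameters
    appended to GRUB_CMDLINE_LINUX_DEFAULT (hypothetical helper).

    Parameters that are already present are not added a second time.
    """
    # Assumption: the value is wrapped in double quotes on a single line.
    pattern = re.compile(
        r'^(GRUB_CMDLINE_LINUX_DEFAULT=")([^"]*)(")[ \t]*$', re.MULTILINE
    )

    def _append(match):
        existing = match.group(2).split()
        merged = existing + [p for p in params if p not in existing]
        return match.group(1) + " ".join(merged) + match.group(3)

    new_text, count = pattern.subn(_append, grub_defaults)
    if count == 0:
        raise ValueError("GRUB_CMDLINE_LINUX_DEFAULT line not found")
    return new_text


# Illustrative sample only, not copied from a real system.
sample = (
    "GRUB_TIMEOUT=8\n"
    'GRUB_CMDLINE_LINUX_DEFAULT="splash=silent quiet showopts"\n'
)
print(add_kernel_params(sample, ["rd.shell", "rd.debug"]))
```

After writing the edited text back, the GRUB configuration still has to be regenerated (on openSUSE that is normally done with grub2-mkconfig -o /boot/grub2/grub.cfg), so treat this as an illustration of what the YaST step does rather than a replacement for it.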
Thanks gogalthorp! I was able to test out a few acpi and apic parameters, but nothing seems to make a difference. I still get the aacraid: aac_fib_send: first asynchronous command timed out. I was able to add the recommended rd.shell and rd.debug parameters and the dracut emergency shell finally gave me a prompt. I checked journalctl and was able to find some relevant information in the log. I see the following:
kernel: Adaptec aacraid driver 1.2-1[40709]-ms
kernel: aacraid 0000:03:01.0: PCI IRQ 32 -> rerouted to legacy IRQ 16
...
many logs later
...
kernel: aacraid 0000:03:01.0: Changed firmware to INTX mode
...
kernel: AAC: Host adapter dead -1
kernel: AAC: Host adapter dead -1
kernel: AAC: Host adapter dead -1
kernel: AAC: Host adapter dead -1
kernel: AAC: Host adapter dead -1
kernel: AAC: Host adapter dead -1
kernel: AAC: Host adapter dead -1
kernel: AAC: Host adapter dead -1
kernel: AAC: Host adapter dead -1
kernel: AAC: aacraid: aac_fib_send: first asynchronous command timed out.
Usually the result of a PCI interrupt routing problem;
update mother board BIOS or consider utilizing one of
the SAFE mode kernel options (acpi, apic, etc)
dracut-initqueue[206]: udevadm settle - timeout of 120 seconds reached, the event queue contains:
dracut-initqueue[206]: /sys/devices/pci0000:00:1c.0/0000:02:00.0/0000:03:01.0 (664)
...
kernel: aac_fib_free, XferState != 0, fibptr = 0xffff8800e8c00000, XferState = 0810ad
kernel: aacraid 0000:03:01.0: PCI IRQ 32 -> rerouted to legacy IRQ 16
aacraid: probe of 0000:03:01.0 failed with error -110
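To tally messages like these from a saved journal dump, a small Python sketch could look like the following. The function names summarize_aacraid and describe_errno are hypothetical, and the patterns simply mirror the log lines quoted above. Kernel drivers report failures as negative errno values, so the -110 in the probe line corresponds to errno 110, which on Linux is ETIMEDOUT. That is consistent with the timeout message.

```python
import errno
import os
import re

# Pattern mirrors the probe-failure line quoted from the journal.
PROBE_RE = re.compile(r"aacraid: probe of (\S+) failed with error (-?\d+)")


def summarize_aacraid(log_text):
    """Count 'Host adapter dead' lines and collect probe failures.

    Returns (dead_count, [(pci_address, error_code), ...]).
    """
    dead_count = 0
    probe_errors = []
    for line in log_text.splitlines():
        if "AAC: Host adapter dead" in line:
            dead_count += 1
        match = PROBE_RE.search(line)
        if match:
            probe_errors.append((match.group(1), int(match.group(2))))
    return dead_count, probe_errors


def describe_errno(code):
    """Translate a (negative) kernel error code into a name and message.

    The mapping comes from the platform running the script, so run it on
    Linux to get the kernel's meaning of the number.
    """
    number = abs(code)
    name = errno.errorcode.get(number, "unknown")
    return "{}: {}".format(name, os.strerror(number))


log = """kernel: AAC: Host adapter dead -1
kernel: AAC: Host adapter dead -1
aacraid: probe of 0000:03:01.0 failed with error -110
"""
dead, failures = summarize_aacraid(log)
print(dead, failures)
for address, code in failures:
    print(address, describe_errno(code))
```

Against a real system the input would be the saved output of journalctl rather than the short sample string used here.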
If I reboot into the 42.1 install media’s recovery mode, I can mount the drive and access all the data. Something in the updated packages is causing the RAID controller to not be loaded properly so the system can no longer find the boot drive.
Any suggestions on what I should look for or try next?
-Matthew
Thanks for the advice. Your tips have helped me get this server back to a running state without re-installing 42.1! I used what I think is the GRUB menu’s Advanced options to boot “openSUSE Leap 42.1, with Linux 42.1 4.1.12-1-default” and set the YaST Boot Loader to use this as the default boot entry. (I hadn’t even noticed that the Advanced option was there, because I had been giving up too quickly and booting back into an installation disc image. The two kernel options appear after choosing Advanced at the GRUB menu.) The original kernel version must have been preserved during the update. I can still choose “openSUSE Leap 42.1, with Linux 42.1.36-44-default” from this menu and reproduce the problem.
I followed the link you provided to bugzilla and found a similar issue in 42.2: https://bugzilla.opensuse.org/show_bug.cgi?id=1019627
This bug links to a kernel.org bug that shows the same INTX message that I am seeing: https://bugzilla.kernel.org/show_bug.cgi?id=151661
After a bit of sleep, so that I have a clear head, I’m going to file a bug report for this issue and link it to the existing bug, which will hopefully narrow the investigation if the two are actually related.
-Matthew