openSuSE 11.4 x86_64
/dev/sda1 = ext3 = /boot (with Grub 0.97)
/dev/sda2 = xfs = /
/dev/sda3 = xfs = data disk
Dear all,
with 11.4/x86_64, we’re quite puzzled by a real mystery somewhere between initrd, run-init and chroot.
We have a Dell T5500 with just one SATA disk (AHCI mode), which booted fine until the last kernel
update to 2.6.37-6.0.9. Since the next reboot, the initrd bails out with
/bin/run-init: /sbin/init: No such file or directory.
Shortly thereafter, the kernel panics with “Attempting to kill init” (no surprise, though).
Booting the original openSuSE-11.4 DVD’s “Rescue” system, we see all partitions and filesystems on sda, can xfs_check
the / and the data disk and mount them all without problems:
Rescue:~ # mount -o ro,exec /dev/sda2 /mnt
Rescue:~ # mount -o bind /dev /mnt/dev
Rescue:~ # mount -o bind /proc /mnt/proc
Rescue:~ # mount -o bind /sys /mnt/sys
Rescue:~ #
Like the on-disk installation, the Rescue system is also x86_64:
Rescue:/ # uname -a
Linux Rescue 2.6.37.1-1.2-default #1 SMP 2011-02-21 10:34:10 +0100 x86_64 x86_64 x86_64 GNU/Linux
As init is normally run by first chroot’ing into the freshly mounted /root, we tried to reproduce that, but any attempt to run
chroot
or
chroot anyCommand
fails, with chroot not finding either “/bin/bash” or “anyCommand”, even though the latter’s absolute path is correctly
given relative to the chroot’ed directory:
Rescue:~ # chroot /mnt
chroot: failed to run command `/bin/bash': No such file or directory
Rescue:~ #
Rescue:~ # chroot /mnt '/bin/ls'
chroot: failed to run command `/bin/ls': No such file or directory
Rescue:~ # chroot /mnt '/bin/sh'
chroot: failed to run command `/bin/sh': No such file or directory
Rescue:~ #
You can even ls -ld and “file” these commands:
Rescue:~ # cd /mnt
Rescue:/mnt # ls -ld bin/ls bin/sh bin/bash
-rwxr-xr-x 1 root root 110216 Sep 21 15:30 bin/ls
-rwxr-xr-x 2 root root 627376 Feb 27 2011 bin/sh
-rwxr-xr-x 2 root root 627376 Feb 27 2011 bin/bash
Rescue:/mnt # file bin/ls bin/sh bin/bash
bin/ls: ELF 64-bit LSB executable, x86-64, version 1 (SYSV), dynamically linked (uses shared libs), for GNU/Linux 2.6.4, stripped
bin/sh: ELF 64-bit LSB executable, x86-64, version 1 (SYSV), dynamically linked (uses shared libs), for GNU/Linux 2.6.4, stripped
bin/bash: ELF 64-bit LSB executable, x86-64, version 1 (SYSV), dynamically linked (uses shared libs), for GNU/Linux 2.6.4, stripped
Rescue:/mnt #
They’re all there and of the right architecture!
You can even run them without chroot (note the leading slash omitted):
Rescue:/mnt # bin/ls -l
total 128
drwxr-xr-x 2 root root 4096 Jan 26 09:44 bin
drwxr-xr-x 2 root root 4096 Nov 2 14:34 boot
...
They work fine, and most of those tested are not even missing a shared library.
You just cannot chroot /mnt <whatever existing and otherwise working binary there is>…
Suspecting that the kernel upgrade had screwed things up, we moved the content of /dev/sda1 (the ext3 /boot partition)
into a subdirectory /OLD, copied over the working files from another 11.4 machine (also 2.6.37-6.0.9),
and tried this initrd+kernel, which is proven to work on the other machine.
Same error - that new initrd’s /bin/run-init also can’t run /sbin/init.
We then suspected some hidden or strange flag in the / filesystem’s XFS as a possible cause, so we did the
obvious and re-created it from scratch.
Using /dev/sda3 as backup store, we xfsdump’ed /dev/sda2 into a file, re-created the XFS on /dev/sda2 and xfsrestore’d
its content.
No change.
(By the way - many thanks to the “Rescue” developers for including those filesystem-specific utilities!)
We then rsync’ed the content of /dev/sda2 into a subdirectory on /dev/sda3, re-created /dev/sda2 as ext3, and restored
everything with rsync into this ext3 filesystem.
No change.
We strace’d different chroot commands (again, big kudos to the “Rescue” developers for including strace!!), and the failure is not at the chroot() call itself, but at the subsequent execve():
Rescue:/ # strace chroot /mnt
execve("/usr/bin/chroot", ["chroot", "/mnt"], [/* 45 vars */]) = 0
brk(0)
...
chroot("/mnt") = 0
chdir("/") = 0
execve("/bin/bash", ["/bin/bash", "-i"], [/* 45 vars */]) = -1 ENOENT (No such file or directory)
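One further check that follows from this trace: the execve(2) manual page documents ENOENT not only for a missing file, but also for the case where the ELF interpreter named inside the binary does not exist. The sketch below is an illustration, not something we ran on the machine. It reads the PT_INTERP entry of a 64-bit little-endian ELF file and tests whether that path exists below the new root. The function names `elf_interpreter` and `check_under_root` are made up for this example.

```python
import os
import struct

PT_INTERP = 3  # program header type that holds the interpreter path


def elf_interpreter(path):
    """Return the PT_INTERP path of an ELF64 little-endian file, or None."""
    with open(path, "rb") as f:
        ident = f.read(16)
        if len(ident) < 16 or ident[:4] != b"\x7fELF":
            raise ValueError("not an ELF file: %s" % path)
        if ident[4] != 2 or ident[5] != 1:
            raise ValueError("this sketch only handles ELF64 little-endian")
        # Rest of Elf64_Ehdr after e_ident (48 bytes).
        fields = struct.unpack("<HHIQQQIHHHHHH", f.read(48))
        e_phoff, e_phentsize, e_phnum = fields[4], fields[8], fields[9]
        for i in range(e_phnum):
            f.seek(e_phoff + i * e_phentsize)
            # Elf64_Phdr: p_type, p_flags, p_offset, p_vaddr, p_paddr,
            # p_filesz, p_memsz, p_align (56 bytes).
            phdr = struct.unpack("<IIQQQQQQ", f.read(56))
            if phdr[0] == PT_INTERP:
                f.seek(phdr[2])
                raw = f.read(phdr[5])
                return raw.split(b"\0", 1)[0].decode()
    return None  # no PT_INTERP, e.g. a statically linked binary


def check_under_root(new_root, binary):
    """binary is an absolute path as seen inside new_root, e.g. '/bin/bash'.

    Returns (interpreter_path, exists_below_new_root).
    Caveat: absolute symlinks are resolved against the current root here,
    not against new_root, so this is only an approximation of what the
    kernel sees after chroot().
    """
    interp = elf_interpreter(os.path.join(new_root, binary.lstrip("/")))
    if interp is None:
        return None, True
    candidate = os.path.join(new_root, interp.lstrip("/"))
    return interp, os.path.exists(candidate)
```

From the Rescue system this would be called as `check_under_root("/mnt", "/bin/bash")`. A result whose second element is False would mean the loader requested by the binary is not reachable inside /mnt.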
We compared the initrd’s chroot command with the one on disk and the one of the original 11.4 DVD - all binary identical:
Rescue:/ # cmp /usr/bin/chroot /mnt/usr/bin/chroot
Rescue:/ #
Rescue:/ # cmp /usr/bin/chroot /mounts/mp_0000/usr/bin/chroot
Rescue:/ #
We were desperate enough to even look for stray selinux attributes, but neither the “ls -lZ” (run from the initrd), nor
the ls from the /dev/sda2 root filesystem, nor the ls from the original 11.4 DVD showed any xattr settings that “could” affect a kernel thinking it had been started with “selinux=1”…
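For anyone who wants to repeat the xattr check without relying on a particular ls build, here is a minimal sketch that dumps all extended attributes visible on a file. It assumes Python 3 on Linux, where `os.listxattr` and `os.getxattr` are available; the helper name `list_xattrs` is made up for this example, and this is not the method we used above.

```python
import os


def list_xattrs(path):
    """Return {name: value} of extended attributes on path, or None.

    None means the attributes could not be read at all (file missing,
    filesystem without xattr support, insufficient permissions).
    Symlinks are not followed, so the link itself is inspected.
    """
    try:
        names = os.listxattr(path, follow_symlinks=False)
        return {
            name: os.getxattr(path, name, follow_symlinks=False)
            for name in names
        }
    except OSError:
        return None
```

Calling `list_xattrs("/mnt/bin/bash")` from the Rescue system would return an empty dict if nothing is set; a key such as `security.selinux` would indicate a leftover label.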
To take openSuSE out of the picture entirely, we booted Knoppix 6.7.1/x86_64. The difference here is that
no program from the mounted partition is executable at all (in contrast to the “Rescue” system):
root@Microknoppix:~# mount -o exec /dev/sda2 /media/sda2
root@Microknoppix:~# grep sda2 /proc/mounts
/dev/sda2 on /media/sda2 type ext3 (rw,relatime,errors=continue,barrier=0,data=writeback)
root@Microknoppix:/media/sda2/bin# ls -ld sh bash ls
-rwxr-xr-x 2 root root 627376 27. Feb 2011 bash
-rwxr-xr-x 1 root root 110216 21. Sep 15:30 ls
-rwxr-xr-x 2 root root 627376 27. Feb 2011 sh
root@Microknoppix:/media/sda2/bin# file sh bash ls
sh: ELF 64-bit LSB executable, x86-64, version 1 (SYSV), dynamically linked (uses shared libs), for GNU/Linux 2.6.4, stripped
bash: ELF 64-bit LSB executable, x86-64, version 1 (SYSV), dynamically linked (uses shared libs), for GNU/Linux 2.6.4, stripped
ls: ELF 64-bit LSB executable, x86-64, version 1 (SYSV), dynamically linked (uses shared libs), for GNU/Linux 2.6.4, stripped
root@Microknoppix:/media/sda2/bin#
root@Microknoppix:/media/sda2/bin# ./ls -l
-bash: ./ls: no such file or directory
root@Microknoppix:/media/sda2/bin#
root@Microknoppix:/media/sda2/bin# strace ./ls -l
execve("./ls", ["./ls", "-l"], [/* 19 vars */]) = -1 ENOENT (No such file or directory)
dup(2) = 3
fcntl64(3, F_GETFL) = 0x8002 (flags O_RDWR|O_LARGEFILE)
fstat64(3, {st_mode=S_IFCHR|0620, st_rdev=makedev(136, 3), ...}) = 0
mmap2(NULL, 4096, PROT_READ|PROT_WRITE, MAP_PRIVATE|MAP_ANONYMOUS, -1, 0) = 0xf7779000
_llseek(3, 0, 0xffee8768, SEEK_CUR) = -1 ESPIPE (Illegal seek)
write(3, "strace: exec: No such file or di"..., 40strace: exec: No such file or directory
) = 40
close(3) = 0
munmap(0xf7779000, 4096) = 0
exit_group(1) = ?
root@Microknoppix:/media/sda2/bin#
(and yes, we did mount the partition explicitly with the mount option “exec”… )
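The effective mount options can also be checked programmatically. The sketch below parses text in /proc/mounts format (device, mount point, type, options, two numeric fields) and reports whether “noexec” is among the options of a given mount point. As far as we know, the kernel lists “noexec” only when it is set and does not print “exec” explicitly. The names `mount_options` and `is_noexec` are made up for this example.

```python
def mount_options(mount_point, mounts_text):
    """Return the option list of the last entry for mount_point, or None.

    mounts_text is expected in /proc/mounts format. The last matching
    line wins, because later mounts shadow earlier ones on the same path.
    Mount points containing spaces appear escaped (\\040) in /proc/mounts
    and are not unescaped by this sketch.
    """
    found = None
    for line in mounts_text.splitlines():
        fields = line.split()
        if len(fields) >= 4 and fields[1] == mount_point:
            found = fields[3].split(",")
    return found


def is_noexec(mount_point, mounts_text):
    """True/False if the mount point is found, None if it is not mounted."""
    options = mount_options(mount_point, mounts_text)
    if options is None:
        return None
    return "noexec" in options
```

On the Knoppix system this could be used as `is_noexec("/media/sda2", open("/proc/mounts").read())`.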
And yes, we did reset the BIOS to its defaults (suspecting some weird “ExecuteDisable” bit to be in the way )…
So now, we’re out of ideas and (most importantly) out of gut feeling as to what it could be :’(
and thought we might ask you here…
Greatly appreciating any new ideas!