Mystery: 11.4 initrd refuses to chroot (/bin/run-init: /sbin/init: No such file or directory)

openSuSE 11.4 x86_64
/dev/sda1 = ext3 = /boot (with Grub 0.97)
/dev/sda2 = xfs = /
/dev/sda3 = xfs = data disk

Dear all,

with 11.4/x86_64, we’re quite puzzled by a real mystery somewhere between initrd, run-init and chroot.

We have a Dell T5500 with just one SATA disk (AHCI mode), which used to boot fine until the last kernel
update to 2.6.37-6.0.9. Next reboot, the initrd bails out with

/bin/run-init: /sbin/init: No such file or directory.

Shortly thereafter, the kernel panics with “Attempting to kill init” (no surprise, though).

Booting the original openSuSE-11.4 DVD’s “Rescue” system, we see all partitions and filesystems on sda, can xfs_check
the / and the data disk and mount them all without problems:

  Rescue:~ # mount -o ro,exec /dev/sda2 /mnt
  Rescue:~ # mount -o bind /dev /mnt/dev
  Rescue:~ # mount -o bind /proc /mnt/proc
  Rescue:~ # mount -o bind /sys /mnt/sys
  Rescue:~ #

Like the on-disk installation, the Rescue system’s architecture is also x86_64:

  Rescue:/ # uname -a
  Linux Rescue 2.6.37.1-1.2-default #1 SMP 2011-02-21 10:34:10 +0100 x86_64 x86_64 x86_64 GNU/Linux

As init is normally run by first chroot’ing into the freshly mounted /root, we tried to replicate that, but any attempt to run
chroot
or
chroot anyCommand
fails with chroot not finding either “/bin/bash” or “anyCommand”, even though the latter’s absolute path is correctly
given relative to the chroot’ed directory:

  Rescue:~ # chroot /mnt
  chroot: failed to run command `/bin/bash': No such file or directory
  Rescue:~ #
  Rescue:~ # chroot /mnt '/bin/ls'
  chroot: failed to run command `/bin/ls': No such file or directory
  Rescue:~ # chroot /mnt '/bin/sh'
  chroot: failed to run command `/bin/sh': No such file or directory
  Rescue:~ #

You can even ls -ld and “file” these commands:

  Rescue:~ # cd /mnt
  Rescue:/mnt # ls -ld bin/ls bin/sh bin/bash
  -rwxr-xr-x 1 root root 110216 Sep 21 15:30 bin/ls
  -rwxr-xr-x 2 root root 627376 Feb 27  2011 bin/sh
  -rwxr-xr-x 2 root root 627376 Feb 27  2011 bin/bash
  Rescue:/mnt # file bin/ls bin/sh /bin/bash
  bin/ls:   ELF 64-bit LSB executable, x86-64, version 1 (SYSV), dynamically linked (uses shared libs), for GNU/Linux 2.6.4, stripped
  bin/sh:   ELF 64-bit LSB executable, x86-64, version 1 (SYSV), dynamically linked (uses shared libs), for GNU/Linux 2.6.4, stripped
  bin/bash: ELF 64-bit LSB executable, x86-64, version 1 (SYSV), dynamically linked (uses shared libs), for GNU/Linux 2.6.4, stripped
  Rescue:/mnt #

they’re all there and of the right architecture!

You can even run them without chroot (note the leading slash omitted):

  Rescue:/mnt # bin/ls -l
  total 128
  drwxr-xr-x   2 root root  4096 Jan 26 09:44 bin
  drwxr-xr-x   2 root root  4096 Nov  2 14:34 boot
  ...

they’re working fine and most of those tested don’t even miss a shared library.

You just cannot chroot /mnt <whatever existing and otherwise working binary there is>

Suspecting that the kernel upgrade had screwed things up, we moved the content of /dev/sda1 (the ext3 /boot partition)
into a subdirectory /OLD, copied over the (working) files from another 11.4 machine (also 2.6.37-6.0.9)
and tried this initrd+kernel (proven to work on the other machine).
Same error - that new initrd’s /bin/run-init also can’t run /sbin/init.

We then suspected some hidden or strange flag in the / filesystem’s XFS as being a possible cause, so we did the
obvious: re-creating it from scratch.
Using /dev/sda3 as backup store, we xfsdump’ed /dev/sda2 into a file, re-created the XFS on /dev/sda2 and xfsrestore’d
its content.
No change.

(By the way - many thanks to the “Rescue” developers for having included those filesystem-specific utilities!)

We then rsync’ed the content of /dev/sda2 into a subdirectory on /dev/sda3 and re-created /dev/sda2 as ext3; restoring
all with rsync back into this ext3 filesystem.
No change.

We strace’d different chroot commands (again big kudos to the “Rescue” developers for having included strace!!), and the failure is not at the chroot() call itself, but at the subsequent execve():

  Rescue:/ # strace chroot /mnt
  execve("/usr/bin/chroot", ["chroot", "/mnt"], [/* 45 vars */]) = 0
  brk(0)
  ...
  chroot("/mnt")                          = 0
  chdir("/")                              = 0
  execve("/bin/bash", ["/bin/bash", "-i"], [/* 45 vars */]) = -1 ENOENT (No such file or directory)
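
The trace above shows chroot() succeeding and execve() returning ENOENT for a file that exists. The execve(2) manual page lists ENOENT not only for a missing file but also for a missing script or ELF interpreter, so one possibility worth ruling out is that the dynamic loader named in the binary's PT_INTERP program header is absent inside the new root. What follows is a minimal Python sketch of such a check, not a tool shipped with the Rescue system. The function names elf_interpreter and interpreter_present are made up for illustration, and only 64-bit little-endian ELF files are handled.

```python
import os
import struct

PT_INTERP = 3  # program header type that names the dynamic loader


def elf_interpreter(path):
    """Return the PT_INTERP path of a 64-bit little-endian ELF file, or None."""
    with open(path, "rb") as f:
        data = f.read()
    if data[:4] != b"\x7fELF":
        raise ValueError("not an ELF file: %s" % path)
    if data[4] != 2 or data[5] != 1:
        raise ValueError("only 64-bit little-endian ELF is handled in this sketch")
    # ELF64 header: e_phoff at byte 32, e_phentsize at 54, e_phnum at 56.
    (e_phoff,) = struct.unpack_from("<Q", data, 32)
    e_phentsize, e_phnum = struct.unpack_from("<HH", data, 54)
    for i in range(e_phnum):
        off = e_phoff + i * e_phentsize
        (p_type,) = struct.unpack_from("<I", data, off)
        if p_type == PT_INTERP:
            # ELF64 program header: p_offset at +8, p_filesz at +32.
            (p_offset,) = struct.unpack_from("<Q", data, off + 8)
            (p_filesz,) = struct.unpack_from("<Q", data, off + 32)
            raw = data[p_offset:p_offset + p_filesz]
            return raw.rstrip(b"\0").decode()
    return None  # no PT_INTERP entry, e.g. a statically linked binary


def interpreter_present(root, binary):
    """Return (interpreter, exists_below_root) for root/binary.

    Limitation: an absolute symlink below root is resolved against the
    running system's root here, not against the would-be chroot.
    """
    interp = elf_interpreter(os.path.join(root, binary.lstrip("/")))
    if interp is None:
        return None, True
    return interp, os.path.exists(os.path.join(root, interp.lstrip("/")))
```

Called as interpreter_present("/mnt", "/bin/bash"), this would return the loader path recorded in the binary together with a flag saying whether that path exists below /mnt.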

We compared the initrd’s chroot command with the one on disk and the one of the original 11.4 DVD - all binary identical:


  Rescue:/ # cmp /usr/bin/chroot /mnt/usr/bin/chroot
  Rescue:/ #
  Rescue:/ # cmp /usr/bin/chroot /mounts/mp_0000/usr/bin/chroot
  Rescue:/ #

We were desperate enough to even look for stray SELinux attributes, but neither “ls -lZ” (run from the initrd), nor
the ls from the /dev/sda2 root filesystem, nor the ls from the original 11.4 DVD showed any xattr settings which “could” affect a kernel thinking it had been started with “selinux=1”…

To have openSuSE completely out of the game at one point, we booted Knoppix 6.7.1/x86_64 and here, the difference is that
no program from the mounted partition is executable (in contrast to the “Rescue” system):


  root@Microknoppix:~# mount -o exec /dev/sda2 /media/sda2
  root@Microknoppix:~# grep sda2 /proc/mounts
  /dev/sda2 on /media/sda2 type ext3 (rw,relatime,errors=continue,barrier=0,data=writeback)
  root@Microknoppix:/media/sda2/bin# ls -ld sh bash ls
  -rwxr-xr-x 2 root root 627376 27. Feb 2011  bash
  -rwxr-xr-x 1 root root 110216 21. Sep 15:30 ls
  -rwxr-xr-x 2 root root 627376 27. Feb 2011  sh
  root@Microknoppix:/media/sda2/bin# file sh bash ls
  sh:   ELF 64-bit LSB executable, x86-64, version 1 (SYSV), dynamically linked (uses shared libs), for GNU/Linux 2.6.4, stripped
  bash: ELF 64-bit LSB executable, x86-64, version 1 (SYSV), dynamically linked (uses shared libs), for GNU/Linux 2.6.4, stripped
  ls:   ELF 64-bit LSB executable, x86-64, version 1 (SYSV), dynamically linked (uses shared libs), for GNU/Linux 2.6.4, stripped
  root@Microknoppix:/media/sda2/bin#
  root@Microknoppix:/media/sda2/bin# ./ls -l
  -bash: ./ls: no such file or directory
  root@Microknoppix:/media/sda2/bin#
  root@Microknoppix:/media/sda2/bin# strace ./ls -l
  execve("./ls", ["./ls", "-l"], [/* 19 vars */]) = -1 ENOENT (No such file or directory)
  dup(2)                                  = 3
  fcntl64(3, F_GETFL)                     = 0x8002 (flags O_RDWR|O_LARGEFILE)
  fstat64(3, {st_mode=S_IFCHR|0620, st_rdev=makedev(136, 3), ...}) = 0
  mmap2(NULL, 4096, PROT_READ|PROT_WRITE, MAP_PRIVATE|MAP_ANONYMOUS, -1, 0) = 0xf7779000
  _llseek(3, 0, 0xffee8768, SEEK_CUR)     = -1 ESPIPE (Illegal seek)
  write(3, "strace: exec: No such file or di"..., 40strace: exec: No such file or directory
  ) = 40
  close(3)                                = 0
  munmap(0xf7779000, 4096)                = 0
  exit_group(1)                           = ?
  root@Microknoppix:/media/sda2/bin#

(and yes, we did mount the partition explicitly with the mount option “exec”… )

And yes, we did set back the BIOS to its defaults (suspecting some weird “ExecuteDisable” bit to be in the way )…

So now, we’re out of ideas and (most importantly) out of gut feeling what it could be :’(
and thought we might ask you here…

Greatly appreciating any new ideas!

Not at all sure what the problem is you describe but it seems to me that you should ask on the developer mailing list. Here we are for the most part just users helping users. No developers. Maybe if you said why you are trying to do a chroot it may clarify things. Is this an attempt to fix a bad install???

Good luck

Maybe this post from the Dutch subforums helps; I see quite some differences in the mount commands: http://forums.opensuse.org/nederlands-dutch/community/nl-how-tos/453092-systeem-overnemen-met-een-livecd.html
I know it’s in Dutch, but the commands to chroot from a liveCD to the installed (to be repaired) system are different from what you describe above.

To add: on my laptop “run-init” is not in /bin, but in /lib/mkinitrd/bin

On 2012-01-26 17:16, bci wrote:

> /bin/run-init: /sbin/init: No such file or
> directory.

Well, what you have to do is mount that partition somewhere with the rescue
system, and see if “/sbin/init” is there. I don’t understand why you try to
chroot there - which will fail if it can not find the binaries needed
(bash, for starters).


Cheers / Saludos,

Carlos E. R.
(from 11.4 x86_64 “Celadon” at Telcontar)

First of all, thanks to all who have answered so far.

As to your question why we’re doing chroot here - simply because it is what fails in the “normal” boot process.
The initrd’s /bin/run-init program does the following:


 * run-init.c
 *
 * Usage: exec run-init [-c /dev/console] /real-root /sbin/init "$@"
 *
 * This program should be called as the last thing in a shell script
 * acting as /init in an initramfs; it does the following:
 *
 * - Delete all files in the initramfs;
 * - Remounts /real-root onto the root filesystem;
 * - Chroots;
 * - Opens /dev/console;
 * - Spawns the specified init program (with arguments.)


and starting line 242:


  /* chroot, chdir */
  if ( chroot(".") || chdir("/") )
    die("chroot");

  /* Open /dev/console */
  if ( (confd = open(console, O_RDWR)) < 0 )
    die("opening console");
  dup2(confd, 0);
  dup2(confd, 1);
  dup2(confd, 2);
  close(confd);

  /* Spawn init */
  execv(init, initargs);
  die(init);			/* Failed to spawn init */
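
The excerpt above can be mirrored in a few lines to see which of the three calls reports the error. This is a simplified Python sketch and not run-init itself: it calls chroot on a given directory instead of chroot("."), and it skips the console handling. The function name chroot_and_exec and its hook parameters are hypothetical, added only so the sequence can be dry-run without root privileges.

```python
import errno
import os


def chroot_and_exec(new_root, argv, chroot=os.chroot, chdir=os.chdir, execv=os.execv):
    """Run chroot, chdir and execv in order and name the step that fails.

    Returns (step_name, errno_name) on failure. With the real os.execv
    the function does not return when the exec succeeds. The three hook
    parameters are assumptions of this sketch, not part of run-init.
    """
    steps = (
        ("chroot", lambda: chroot(new_root)),
        ("chdir", lambda: chdir("/")),
        ("execv", lambda: execv(argv[0], argv)),
    )
    for name, step in steps:
        try:
            step()
        except OSError as exc:
            # Map the numeric errno to its symbolic name, e.g. 2 -> ENOENT.
            return name, errno.errorcode.get(exc.errno, str(exc.errno))
    return None  # only reachable when the hooks are stubs
```

Run as root with the default hooks, chroot_and_exec("/mnt", ["/bin/bash", "-i"]) would either replace the process with the shell or return the failing step, which in the strace output quoted earlier was execve with ENOENT.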

Maybe I’ve not made it clear enough in my initial posting: we’re not yet trying to repair the initrd, we’re still trying to understand what’s going wrong here, and the “chroot” followed by “execve” are simply the last (and failing) actions of run-init.

In openSuSE, it seems not to use /real-root but /root, but be that as it may - this program’s last action to replace itself with a chroot’ed /sbin/init fails.

To have the issue reproduced even better and without possibly interfering “Rescue” settings, I’ve made the initrd stop just before the failing action (by adding “shell=1” to grub’s kernel … line, the initrd script boot/91-shell.sh comes into play and runs a bash just before executing initrd’s last boot/93-boot.sh actions).

At this point, the real root is already mounted as /root and of course, /root/sbin/init is there. Unfortunately, there’s neither the strace nor the chroot command in the initrd and I can’t debug much further now…

Any other ideas? I don’t want to badger the developers before all knowledge here’s thoroughly exhausted :wink:

On 2012-01-27 13:16, bci wrote:
>
> First of all, thanks to all who have answered so far.
>
> As to your question why we’re doing chroot here - simply because it is
> what fails in the “normal” boot process.
> The initrd’s /bin/run-init program does the following:

Your problem is not that chroot fails. Your problem is that there is no
bash and no init in that environment.


Cheers / Saludos,

Carlos E. R.
(from 11.4 x86_64 “Celadon” at Telcontar)

Dear Carlos,

insisting doesn’t make it true… After halting the initrd with “shell=1”, my /dev/sda2 is already cleanly mounted as /root and therein are (/)sbin/init or (/)bin/bash:

bash-4.1 #
bash-4.1 # ls -ld /root/sbin/init /root/bin/bash
-rwxr-xr-x  1  root  root  627376  Feb 27  2011  /root/bin/bash
-rwxr-xr-x  1  root  root   40832  Jul  4  2011  /root/sbin/init

So the files in fact are there. And I even can execute them from there:


bash-4.1 # /root/bin/bash
bash: no job control in this shell
bash-4.1 # echo $0
/root/bin/bash
bash-4.1 # echo $SHLVL
3

Hmm…

On 2012-01-27 14:46, bci wrote:
>
> Dear Carlos,
>
> insisting doesn’t make it true… After halting the initrd with
> “shell=1”, my /dev/sda2 is already cleanly mounted as /root and therein
> are (/)sbin/init or (/)bin/bash:

In the first message you wrote:

> Code:
> --------------------
> Rescue:~ # mount -o ro,exec /dev/sda2 /mnt
> Rescue:~ # mount -o bind /dev /mnt/dev
> Rescue:~ # mount -o bind /proc /mnt/proc
> Rescue:~ # mount -o bind /sys /mnt/sys
> Rescue:~ #
> --------------------

> Code:
> --------------------
> Rescue:~ # chroot /mnt
> chroot: failed to run command `/bin/bash': No such file or directory
> Rescue:~ #
> Rescue:~ # chroot /mnt '/bin/ls'
> chroot: failed to run command `/bin/ls': No such file or directory
> Rescue:~ # chroot /mnt '/bin/sh'
> chroot: failed to run command `/bin/sh': No such file or directory
> Rescue:~ #
> --------------------

The error in chroot is because there is no bash in /mnt. Now you show there
is a bash in “/root/bin/”, but that is not the same directory as your first
message.


Cheers / Saludos,

Carlos E. R.
(from 11.4 x86_64 “Celadon” at Telcontar)

On 2012-01-27 21:53, Carlos E. R. wrote:
> On 2012-01-27 14:46, bci wrote:

Please, no private messages, or others will not be able to help.


Cheers / Saludos,

Carlos E. R.
(from 11.4 x86_64 “Celadon” at Telcontar)

> The error in chroot is because there is no bash in /mnt. Now you show there
> is a bash in “/root/bin/”, but that is not the same directory as your first
> message.

Of course not, as we tested in two different environments (as I wrote 27-Jan-2012, 12:09).
But again in short:

  1. /mnt
    = Rescue system from DVD (as also the prompt says :wink:)
    /dev/sda2 was mounted manually by me to /mnt

  2. /root
    = initrd (halted with shell=1)
    /dev/sda2 was mounted by initrd already to the place where it should continue with chroot, next attempting to execute /sbin/init

In both cases, the tested files (/)sbin/init and (/)bin/bash in fact are there, they are even executable without chroot, but not with chroot - that’s the mystery…

Puzzled…