Xen & nvidia crash X11

I have a strange problem: the X server freezes when I log out, switch to another virtual console, or do something similar.

It works fine when I boot the kernel without Xen.

The problem can be reduced to a very simple scenario:

  • Boot the Xen Dom0 kernel
  • Wait until the X11 display manager shows the login screen
  • Press Alt+Ctrl+Fx
    ==> X11 freezes

I tried both KDE4 and xdm as the display manager; both show the same problem. It does not happen when I boot the kernel without Xen.

I am using openSUSE 11.1 (AMD64) with all patches applied, and the nvidia driver. 32-bit is not an option: this machine has 16 GB of memory (which would be useless …)

Any ideas how I can solve this problem?

What happens if you boot the system into runlevel 3 (add “3” to the boot options line), install the NVIDIA driver “the hard way”, and reboot? My guess is that the Xen kernel does not see the nvidia kernel module.

Hmm,

I see the NVIDIA logo when X11 starts up. When I run lsmod, I also see nvidia listed as a loaded module, and its use count is not zero (I do not remember the exact number).

So it is used somehow.

OK, so that’s not the solution. As I said, it was merely a guess. Still, boot into runlevel 3, log in as the user, and run “startx”. That should either start X or give you some useful output.
Also take a look at .xsession-errors in the user’s home directory, at /var/log/messages, and at ‘dmesg | tail -20’ immediately after the X server crashes or freezes.
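To grab all of those in one go right after a freeze (for example from an ssh session), something like the following sketch could be used. The function name collect_x_logs and the output directory /tmp/x-freeze-logs are made up for this example; run it as root so that /var/log/messages and dmesg are readable.

```sh
#!/bin/sh
# Hypothetical helper: gather the logs suggested above into one directory
# right after a freeze (e.g. from an ssh session), so they can be posted.
# Files that are missing or unreadable are skipped instead of aborting.
collect_x_logs() {
    outdir=${1:-/tmp/x-freeze-logs}   # assumed output location
    user_home=${2:-$HOME}             # home directory of the user running X
    mkdir -p "$outdir" || return 1

    # Last 20 kernel messages, as with 'dmesg | tail -20'.
    dmesg 2>/dev/null | tail -n 20 > "$outdir/dmesg-tail.txt"

    # Keep the tail of each text log; Xorg.0.log.old is the log of the
    # previous X server run, which matters if X was restarted meanwhile.
    for f in "$user_home/.xsession-errors" /var/log/messages \
             /var/log/Xorg.0.log /var/log/Xorg.0.log.old; do
        [ -r "$f" ] || continue
        name=${f##*/}          # strip the directory part
        name=${name#.}         # do not create hidden files in $outdir
        tail -n 200 "$f" > "$outdir/$name.txt"
    done
    echo "$outdir"
}

collect_x_logs "$@"
```

The first argument overrides the output directory, the second the user's home directory (useful when running as root on behalf of another user).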

Okay. Since it somehow ignored runlevel 3, I disabled xdm for runlevel 5.

Right after boot I see:

# lsmod| grep nvi
nvidia              10800936  0
agpgart                41604  1 nvidia
i2c_core               35360  2 nvidia,i2c_nforce2

So, the nvidia module is loaded.
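The third column of that output is the module's use count; lsmod takes it from /proc/modules, which has the same first three columns. A count of 0 right after boot is presumably expected, since no process (such as X) has the NVIDIA device files open yet. A small sketch for checking it; the function name module_refcount is hypothetical, and the optional file argument only exists so it can be tried on saved lsmod output.

```sh
#!/bin/sh
# Print the use count of a kernel module, reading /proc/modules by default.
# Column 1 is the module name, column 3 its reference ("used by") count;
# saved lsmod output has the same column layout below its header line.
module_refcount() {
    mod=$1
    src=${2:-/proc/modules}
    # Exit status 0 only if the module was found.
    awk -v m="$mod" '$1 == m { print $3; found = 1 } END { exit !found }' "$src"
}

if count=$(module_refcount nvidia); then
    echo "nvidia loaded, use count $count"
else
    echo "nvidia module not loaded"
fi
```

Running it once before and once after starting X should show whether X actually holds a reference to the module.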

Now let's run startx. I did this from a remote connection.

# startx
xauth:  creating new authority file /root/.serverauth.5856


X.Org X Server 1.5.2
Release Date: 10 October 2008
X Protocol Version 11, Revision 0
Build Operating System: openSUSE SUSE LINUX
Current Operating System: Linux somename 2.6.27.45-0.1-xen #1 SMP 2010-02-22 16:49:47 +0100 x86_64
Build Date: 15 April 2010  04:22:31PM

        Before reporting problems, check http://wiki.x.org
        to make sure that you have the latest version.
Module Loader present
Markers: (--) probed, (**) from config file, (==) default setting,
        (++) from command line, (!!) notice, (II) informational,
        (WW) warning, (EE) error, (NI) not implemented, (??) unknown.
(==) Log file: "/var/log/Xorg.0.log", Time: Mon Jun  7 12:30:21 2010
(==) Using config file: "/etc/X11/xorg.conf"
NVIDIA: failed to set MTRR @ 0xc0000000, 256M (WC)
The XKEYBOARD keymap compiler (xkbcomp) reports:
> Warning:          Type "ONE_LEVEL" has 1 levels, but <RALT> has 2 symbols
>                   Ignoring extra symbols
Errors from xkbcomp are not fatal to the X server
Could not init font path element /usr/share/fonts/TTF/, removing from list!
Could not init font path element /usr/share/fonts/OTF, removing from list!
/etc/X11/xim: Checking whether an input method should be started.
/etc/X11/xim: user environment variable LANG=POSIX
/etc/X11/xim: user environment variable LC_CTYPE=de_DE.UTF-8
sourcing /etc/sysconfig/language to get the value of INPUT_METHOD
INPUT_METHOD is not set or empty (no user selected input method).
Trying to start a default input method for the locale de_DE.UTF-8 ...
There is no default input method for the current locale.
Dummy input method "none" (do not use any fancy input method by default)
startkde: Starting up...
... I cut the lines here ...
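The startup banner above names the full log (/var/log/Xorg.0.log) and explains its markers. A minimal sketch for pulling out only the (EE) and (WW) lines; the function name x_log_problems is hypothetical, and the optional "[ timestamp ]" prefix is handled only in case the log comes from a newer X server that adds one.

```sh
#!/bin/sh
# Show only the error (EE) and warning (WW) lines of an X.Org log.
# The pattern requires the marker at the very start of the line (optionally
# after a "[ timestamp ] " prefix), so the indented marker legend printed
# at startup is not matched.
x_log_problems() {
    log=${1:-/var/log/Xorg.0.log}
    grep -E '^(\[[^]]*\] )?\((EE|WW)\)' "$log"
}
```

Usage: x_log_problems /var/log/Xorg.0.log (or Xorg.0.log.old after X has been restarted).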

Then I pressed Ctrl-Alt-F1 once: the screen went blank, but at least Num Lock still worked. I could not see anything like a login or shell prompt, though. I could switch back to X11, but then the keyboard suddenly locked up and the screen stayed blank.

But when I press Ctrl-Alt-F10, I see the following in dmesg (I checked beforehand, and there was no such crash before):

last sysfs file: /sys/module/snd_hda_intel/parameters/power_save
CPU 0
Modules linked in: netbk blkbk blktap xenbus_be nfsd lockd nfs_acl auth_rpcgss sunrpc exportfs binfmt_misc snd_pcm_oss snd_mixer_oss snd_seq snd_seq_device ipv6 bridge stp fuse loop dm_mod ide_pci_generic amd74xx ide_core ata_generic snd_hda_intel nvidia(PX) jedec_probe(N) snd_pcm cfi_probe(N) snd_timer gen_probe(N) snd_page_alloc agpgart ppdev 8250_pnp mtd snd_hwdep rtc_cmos ohci1394 rtc_core shpchp i2c_nforce2 chipreg(N) snd pata_amd pcspkr parport_pc ieee1394 rtc_lib sr_mod pci_hotplug 8250 i2c_core map_funcs(N) parport soundcore button forcedeth serial_core floppy sg usbhid hid ff_memless ohci_hcd sd_mod crc_t10dif ehci_hcd usbcore xenblk cdrom xennet edd ext3 mbcache jbd fan sata_nv libata scsi_mod dock thermal processor thermal_sys hwmon
Supported: No, Proprietary and Unsupported modules are loaded
Pid: 5874, comm: X Tainted: P          2.6.27.45-0.1-xen #1
RIP: e030:[<ffffffffa07837b0>]  [<ffffffffa07837b0>] nv_kern_close+0x101/0x383 [nvidia]
RSP: e02b:ffff8803e8493ec8  EFLAGS: 00010202
RAX: 207c2d7265746e49 RBX: ffff8803e800a108 RCX: 0000000000000000
RDX: ffffffffff5fc000 RSI: 0000000000000000 RDI: ffff8803e8d79298
RBP: 207c2d7265746e49 R08: ffff8803e8d79298 R09: ffff8803e992ac80
R10: 00009898a068e90c R11: ffffffff803e7431 R12: ffff8803e8d79000
R13: ffff8803e8d79298 R14: ffff8803e364c4b8 R15: ffff8803e992ac80
FS:  00007fedbfb246f0(0000) GS:ffffffff80763080(0000) knlGS:0000000000000000
CS:  e033 DS: 0000 ES: 0000
DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
DR3: 0000000000000000 DR6: 00000000ffff0ff0 DR7: 0000000000000400
Process X (pid: 5874, threadinfo ffff8803e8492000, task ffff8803e7858800)
Stack:  0000000000000000 0000000800000000 ffff8803e800a2e8 0000000000000008
 ffff8803e992ac80 ffff8803e800a108 ffff8803e3075f20 ffff8803e8da4ec0
 00007fffa5e47460 ffffffff8029fa4d ffff8803e992ac80 0000000000000000
Call Trace:
 [<ffffffff8029fa4d>] __fput+0xa1/0x165
 [<ffffffff8029d09c>] filp_close+0x5b/0x62
 [<ffffffff8029d151>] sys_close+0xae/0x107
 [<ffffffff8020b408>] system_call_fastpath+0x16/0x1b
 [<00007fedbdb50e80>] 0x7fedbdb50e80


Code: b5 26 00 00 4c 89 fa 4c 89 e6 4c 89 f7 e8 ac 36 f1 ff 4c 89 ef e8 1c 68 ac df 49 8b ac 24 b0 01 00 00 48 85 ed 0f 84 97 00 00 00 <83> 7d 08 00 48 8b 45 00 0f 85 81 00 00 00 4c 39 7d 30 75 7b f0
RIP  [<ffffffffa07837b0>] nv_kern_close+0x101/0x383 [nvidia]
 RSP <ffff8803e8493ec8>
---[ end trace ddb96801afa731de ]---
note: X[5874] exited with preempt_count 1
BUG: scheduling while atomic: X/5874/0x10000001
Modules linked in: netbk blkbk blktap xenbus_be nfsd lockd nfs_acl auth_rpcgss sunrpc exportfs binfmt_misc snd_pcm_oss snd_mixer_oss snd_seq snd_seq_device ipv6 bridge stp fuse loop dm_mod ide_pci_generic amd74xx ide_core ata_generic snd_hda_intel nvidia(PX) jedec_probe(N) snd_pcm cfi_probe(N) snd_timer gen_probe(N) snd_page_alloc agpgart ppdev 8250_pnp mtd snd_hwdep rtc_cmos ohci1394 rtc_core shpchp i2c_nforce2 chipreg(N) snd pata_amd pcspkr parport_pc ieee1394 rtc_lib sr_mod pci_hotplug 8250 i2c_core map_funcs(N) parport soundcore button forcedeth serial_core floppy sg usbhid hid ff_memless ohci_hcd sd_mod crc_t10dif ehci_hcd usbcore xenblk cdrom xennet edd ext3 mbcache jbd fan sata_nv libata scsi_mod dock thermal processor thermal_sys hwmon
Supported: No, Proprietary and Unsupported modules are loaded
Pid: 5874, comm: X Tainted: P      D   2.6.27.45-0.1-xen #1

Call Trace:
 [<ffffffff8020c5e7>] show_trace_log_lvl+0x41/0x58
 [<ffffffff80464dfc>] dump_stack+0x69/0x6f
 [<ffffffff804654db>] schedule+0x121/0x854
 [<ffffffff80230064>] __cond_resched+0x1c/0x43
 [<ffffffff80465e1d>] _cond_resched+0x2d/0x38
 [<ffffffff80285b08>] unmap_vmas+0x150/0x1c3
 [<ffffffff8028afce>] exit_mmap+0x7e/0x109
 [<ffffffff8023106c>] mmput+0x20/0xbc
 [<ffffffff80234d26>] exit_mm+0x101/0x10c
 [<ffffffff80236b41>] do_exit+0x208/0x304
 [<ffffffff80467c28>] oops_begin+0x0/0xe2
 [<ffffffff8020cd17>] do_stack_segment+0x96/0xcc
 [<ffffffff804677a7>] error_exit+0x0/0x69
 [<ffffffffa07837b0>] nv_kern_close+0x101/0x383 [nvidia]
 [<ffffffff8029fa4d>] __fput+0xa1/0x165
 [<ffffffff8029d09c>] filp_close+0x5b/0x62
 [<ffffffff8029d151>] sys_close+0xae/0x107
 [<ffffffff8020b408>] system_call_fastpath+0x16/0x1b
 [<00007fedbdb50e80>] 0x7fedbdb50e80

BUG: scheduling while atomic: X/5874/0x00000001
Modules linked in: netbk blkbk blktap xenbus_be nfsd lockd nfs_acl auth_rpcgss sunrpc exportfs binfmt_misc snd_pcm_oss snd_mixer_oss snd_seq snd_seq_device ipv6 bridge stp fuse loop dm_mod ide_pci_generic amd74xx ide_core ata_generic snd_hda_intel nvidia(PX) jedec_probe(N) snd_pcm cfi_probe(N) snd_timer gen_probe(N) snd_page_alloc agpgart ppdev 8250_pnp mtd snd_hwdep rtc_cmos ohci1394 rtc_core shpchp i2c_nforce2 chipreg(N) snd pata_amd pcspkr parport_pc ieee1394 rtc_lib sr_mod pci_hotplug 8250 i2c_core map_funcs(N) parport soundcore button forcedeth serial_core floppy sg usbhid hid ff_memless ohci_hcd sd_mod crc_t10dif ehci_hcd usbcore xenblk cdrom xennet edd ext3 mbcache jbd fan sata_nv libata scsi_mod dock thermal processor thermal_sys hwmon
Supported: No, Proprietary and Unsupported modules are loaded
Pid: 5874, comm: X Tainted: P      D   2.6.27.45-0.1-xen #1

Call Trace:
 [<ffffffff8020c5e7>] show_trace_log_lvl+0x41/0x58
 [<ffffffff80464dfc>] dump_stack+0x69/0x6f
 [<ffffffff804654db>] schedule+0x121/0x854
 [<ffffffff80465e6d>] schedule_timeout+0x1e/0xad
 [<ffffffff80466946>] __down+0x97/0xc4
 [<ffffffff80249fe2>] down+0x27/0x39
 [<ffffffffa0781c8e>] nv_free_pages+0x62/0x1a0 [nvidia]
 [<ffffffffa068e9e2>] _nv006611rm+0x2e/0x33 [nvidia]
DWARF2 unwinder stuck at _nv006611rm+0x2e/0x33 [nvidia]

Leftover inexact backtrace:

 [<ffffffffa05e9eb6>] _nv009785rm+0xb4/0x120 [nvidia]
 [<ffffffffa05e9fb5>] _nv009786rm+0x93/0xa0 [nvidia]
 [<ffffffffa0302b89>] _nv023260rm+0x3e0/0x458 [nvidia]
 [<ffffffffa060c7c9>] _nv004551rm+0xe8/0x273 [nvidia]
 [<ffffffffa060a7b6>] _nv004577rm+0x39/0x46 [nvidia]
 [<ffffffffa060adbf>] _nv004568rm+0x368/0x4c7 [nvidia]
 [<ffffffffa060a7b6>] _nv004577rm+0x39/0x46 [nvidia]
 [<ffffffffa060a8dd>] _nv004572rm+0x11a/0x294 [nvidia]
 [<ffffffffa060a7b6>] _nv004577rm+0x39/0x46 [nvidia]
 [<ffffffffa0695233>] _nv004536rm+0xcb/0x10c [nvidia]
 [<ffffffffa0696eac>] rm_free_unused_clients+0x69/0xb7 [nvidia]
 [<ffffffffa0782a9b>] nv_kern_ctl_close+0x92/0xc7 [nvidia]
 [<ffffffff8029fa4d>] __fput+0xa1/0x165
 [<ffffffff8029d09c>] filp_close+0x5b/0x62
 [<ffffffff80234e1d>] put_files_struct+0x64/0xc2
 [<ffffffff80236b5a>] do_exit+0x221/0x304
 [<ffffffff80467c28>] oops_begin+0x0/0xe2
 [<ffffffff8020cd17>] do_stack_segment+0x96/0xcc
 [<ffffffff804677a7>] error_exit+0x0/0x69
 [<ffffffff803e7431>] pci_conf1_read+0x0/0xd1
 [<ffffffffa07837b0>] nv_kern_close+0x101/0x383 [nvidia]
 [<ffffffffa078379f>] nv_kern_close+0xf0/0x383 [nvidia]
 [<ffffffff8029fa4d>] __fput+0xa1/0x165
 [<ffffffff8029d09c>] filp_close+0x5b/0x62
 [<ffffffff8029d151>] sys_close+0xae/0x107
 [<ffffffff8020b408>] system_call_fastpath+0x16/0x1b
 [<ffffffff8020b3a0>] system_call+0x0/0x52
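One detail in this oops may be worth a second look, offered as a reading rather than a diagnosis: RAX and RBP both hold 207c2d7265746e49, which is not a kernel address. The faulting bytes <83> 7d 08 00 in the Code: line decode to cmpl $0x0,0x8(%rbp), so nv_kern_close dereferences RBP, a value loaded just before from offset 0x1b0 of the structure at R12. A non-canonical address used through RBP (which defaults to the stack segment) raises a stack-segment fault, which would match do_stack_segment in the trace. Read as eight little-endian bytes, the value is plain ASCII text, which suggests that the pointer was overwritten. The sketch below only performs that byte decoding; the function name reg_to_ascii is hypothetical, and it expects an even number of hex digits.

```sh
#!/bin/sh
# Decode a register value from an oops as little-endian ASCII.
# x86-64 is little-endian, so the last hex byte is the first byte in memory.
reg_to_ascii() {
    rest=$1
    out=''
    while [ -n "$rest" ]; do
        byte=${rest%"${rest#??}"}   # first two hex digits
        rest=${rest#??}             # drop them
        # Convert hex -> octal escape -> character, and prepend it:
        # prepending reverses the byte order into memory order.
        out="$(printf "\\$(printf '%03o' "$((0x$byte))")")$out"
    done
    printf '%s\n' "$out"
}

reg_to_ascii 207c2d7265746e49   # prints "Inter-| " (note the trailing space)
```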

Do you need anything more?

Sorry to say, this is beyond me. You’ll have to wait and see whether others who know more about Xen drop in.

I tried flashing a new BIOS onto the machine, and now the behaviour is slightly different. There is no more crash in dmesg, but X loops endlessly, consuming 100% CPU time. Again, pressing Ctrl-Alt-F10 in X triggers this unexpected behaviour.

In top it looks like this:

  PID USER      PR  NI  VIRT  RES  SHR S %CPU %MEM    TIME+  COMMAND
 5818 root      19  -1  105m  31m  15m R  100  0.2   2:11.68 X

And since neither kill -15, kill -11, nor kill -9 works, it seems (I guess!) to be looping inside the kernel :(
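That guess is plausible: a signal, even SIGKILL, only takes effect when the task returns to user space or is in a sleep that signals can interrupt, so a process that ignores kill -9 is usually stuck in kernel code, either spinning (state R, as top shows) or in uninterruptible sleep (state D). A minimal sketch for checking this straight from /proc without procps; the function name proc_state is hypothetical.

```sh
#!/bin/sh
# Print the scheduler state and kernel wait channel of a process,
# read directly from /proc.
# State letters: R = running, S = sleeping, D = uninterruptible sleep,
# Z = zombie, T = stopped.
proc_state() {
    pid=$1
    [ -r "/proc/$pid/stat" ] || { echo "no such process: $pid" >&2; return 1; }
    # Field 2 (the command name in parentheses) may contain spaces, so strip
    # everything up to the last ") "; the next field is the state.
    state=$(sed 's/^.*) //' "/proc/$pid/stat" | cut -d ' ' -f 1)
    wchan=$(cat "/proc/$pid/wchan" 2>/dev/null)
    echo "pid=$pid state=$state wchan=${wchan:-?}"
}

proc_state "${1:-$$}"
```

For the X process above this would be run as proc_state 5818. If the kernel was built with magic SysRq support, echo t > /proc/sysrq-trigger (as root) additionally writes the stack traces of all tasks to the kernel log, which may show where X is spinning.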

I think I will set up the display manager to support only remote displays. Luckily I have a second fast machine, and both are connected via Gigabit Ethernet. With this configuration, CUDA is still available to crunch data for Einstein@Home, SETI and all the others …
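On openSUSE of that era, the display manager was configured through /etc/sysconfig/displaymanager (also editable with YaST's sysconfig editor). A sketch of a remote-only (XDMCP) setup follows; the two variable names are recalled from openSUSE 11.x and are an assumption here, so check them against the comments in the file on the machine itself before relying on them.

```sh
# /etc/sysconfig/displaymanager (excerpt; variable names assumed)
# Do not start a local X server on this machine's console ...
DISPLAYMANAGER_STARTS_XSERVER="no"
# ... but accept XDMCP login requests from other machines.
DISPLAYMANAGER_REMOTE_ACCESS="yes"
```

From the second machine, a session could then be requested with X :1 -query <hostname>, where <hostname> is a placeholder for the Xen machine. Note that XDMCP traffic is not encrypted, which is acceptable on a trusted local network.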

My understanding is that the NVIDIA driver does not work well with the Xen kernel, and the two should not be used together.

On Mon, 07 Jun 2010 08:46:01 +0000, Wurgl wrote:

> I have a strange problem that the X-Server freezes when I log out,
> switch to another virtual console or the like.
>
> It works fine when I boot the kernel without xen.
>
> The problem reduced to a very simple scenario:
> * Boot Xen-Dom0-Kernel
> * wait until the X11 display manager wants a login
> * Press Alt+Ctrl+Fx
> ==> X11 freezes
>
> I tried with kde4 and with xdm as display manager, both show the same
> problem. It does not happen when I boot with the XEN-less kernel.
>
> I am using openSUSE 11.1 with all patches on AMD64 and the nvidia
> driver. 32bit is not a solution, this machine has 16GB Memory (which
> would be useless …)
>
> Any ideas how I can solve this problem?

I have a similar problem with Xen on openSUSE 11.1. It locks up the taskbar
panel: nothing on the taskbar works, so everything has to be run from
terminals.

I haven’t tracked down the problem yet. Xen was working properly, but
as someone wrote, the nVidia drivers don’t work with Xen, so I use the VESA
driver with a Cirrus VGA card.
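For anyone wanting to try the same fallback: in xorg.conf the X driver is selected by the Driver line of the Device section. A minimal sketch, assuming a single Device section that currently says Driver "nvidia" (with "Driver" capitalized as written); the function name use_vesa_driver and the backup file name are made up for this example.

```sh
#!/bin/sh
# Hypothetical helper: switch an xorg.conf from the proprietary "nvidia"
# X driver to the generic "vesa" driver, keeping a backup of the original.
# Only a line of the form  Driver "nvidia"  (any indentation) is changed.
use_vesa_driver() {
    conf=${1:-/etc/X11/xorg.conf}
    cp "$conf" "$conf.nvidia-backup" || return 1
    sed 's/^\([[:space:]]*Driver[[:space:]]*\)"nvidia"/\1"vesa"/' \
        "$conf.nvidia-backup" > "$conf"
}
```

Restoring the NVIDIA setup is then just a matter of copying xorg.conf.nvidia-backup back over xorg.conf.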


Chillingout@opensuse.forum