2015-08-24 23:41:42

by Ken Moffat

Subject: Lockups with 4.2-rc kernels and qemu

(Cc'ing qemu-devel, please keep me in the Cc).

TL;DR - qemu locks up my machine when I use 4.2-rc kernels.

I only use qemu from time to time, mostly to test changes to my own
scripts, or new package versions, in beyond.linuxfromscratch.org. A
few days ago I came back to that, building an LFS system from the
beginning of this month and then using that to try out my changes.
In the beginning I built a 4.2-rc5 kernel and booted that system in
qemu-2.4.0.

I had several lockups along the way from a minimal system to the end
of my attempts to build a full desktop, and in fact I dropped back
to a 4.1 kernel and qemu-2.3.0 before I completed the build. Mostly,
the qemu system appeared to have been idle when the box locked up, or
else idle apart from xscreensaver, but on one occasion it ran through
several steps to build Xorg protocol header packages - I write a
'stamp' file so that I can resume after a failure, and several of
these were created, but empty (instead of a small amount of version,
time, space data) and it later turned out that nothing had been
installed for those packages.
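
After a lockup like this, the affected packages can be found by listing
the stamp files that exist but are empty. A minimal sketch, assuming the
stamps are ordinary files kept under one directory; the function name and
the directory in the usage comment are hypothetical and not taken from the
build scripts described above:

```shell
#!/bin/sh
# list_empty_stamps DIR
# Print every regular file under DIR that is zero bytes long.
# DIR is an assumed location; the real scripts may keep stamps elsewhere.
list_empty_stamps() {
    # -size 0 matches only files with no content at all.
    find "$1" -type f -size 0
}

# Example (hypothetical path):
# list_empty_stamps /var/tmp/build-stamps
```

Any package whose stamp shows up here would need to be rebuilt rather
than skipped on resume.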

In all cases I have nothing relevant in either the guest or the host
logs. On one or two occasions I got the flashing keyboard LEDs.

After the initial problems I ran memtest86+ for 10 hours without any
errors.

In the end, I dropped back to a 4.1 kernel and also qemu-2.3.0
because my concern was to finish the build. I was thinking -
erroneously, of course - that this was a qemu problem. Gradually, I
moved to 4.2-rc kernels and things appeared to be ok. But today I
installed 4.2-rc8 and thought about trying to create a test-case to
see whether the kernel+qemu combination is ok. I decided to
start qemu, log in, run startx [ xscreensaver will be invoked,
otherwise userspace is idle ], and leave it for 3 hours - lockups
varied in how long they took to appear, but all were within 2 hours.
So, I left -rc8 and qemu-2.3.0 running, and when I returned
after about 3 and a quarter hours the box had locked up.
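
Since nothing reaches the logs, one way to learn roughly when the host
froze during such an unattended run is a heartbeat file on the host. This
is only a sketch under assumptions: the function name, file name and
interval are illustrative, and it was not part of the test described above.

```shell
#!/bin/sh
# heartbeat FILE INTERVAL COUNT
# Append a timestamp to FILE every INTERVAL seconds, COUNT times, and
# flush to disk each time so the last entry survives a hard lockup.
heartbeat() {
    file=$1
    interval=$2
    count=$3
    i=0
    while [ "$i" -lt "$count" ]; do
        date '+%Y-%m-%d %H:%M:%S' >> "$file"
        sync
        i=$((i + 1))
        sleep "$interval"
    done
}

# Example (hypothetical file name): one entry a minute for 3 hours,
# started on the host just before launching qemu.
# heartbeat /var/tmp/soak-heartbeat.log 60 180 &
```

After a forced reboot, the last line of the file gives the time of the
lockup to within one interval.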

The machine is an AMD Phenom (not always the most reliable: I had to
drop caches today to get it to reliably build -rc8 with make -j 4,
but otherwise it seems not to have needed that in the past month).
I'll attach the kernel config.

I compiled qemu-2.3.0 with:

sed -i '/resource/ i#include <sys/xattr.h>' \
fsdev/virtfs-proxy-helper.c

./configure --prefix=/usr \
--sysconfdir=/etc \
--docdir=/usr/share/doc/qemu-2.3.0 \
--target-list=x86_64-softmmu \
--audio-drv-list=alsa

and I use a wrapper script for qemu which tells me that what I am
actually running is:

qemu-system-x86_64 -enable-kvm -hda svn20150803-32-desk2.img \
-hdb svn-logging-tools.img -m 2G -usb -usbdevice tablet -show-cursor \
-display sdl -cpu kvm32 -monitor stdio -smp 4 -vga std

That was for building a new system, later runs only use a -hda
image. And all of the uses have been for i686 guests.

The host (and the initial guest used to build the new system) is
gcc-5.1.0, the newly built guests 5.2.0; binutils 2.25 / 2.25.1;
glibc 2.21 throughout.

Because I need to leave the system running for up to 3 hours to get
an idea if a particular kernel is ok, I don't think this problem
lends itself to bisection.
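
To put a rough number on that: a bisection over N commits needs about
log2(N) test kernels, and each one would need a soak of around 3 hours
here before it could even tentatively be called good. A minimal sketch of
the arithmetic; the function names are illustrative, and the commit count
has to come from a real kernel tree:

```shell
#!/bin/sh
# bisect_steps N
# Print the approximate number of kernels a bisection over N commits
# has to test, i.e. ceil(log2(N)).
bisect_steps() {
    n=$1
    steps=0
    span=1
    while [ "$span" -lt "$n" ]; do
        span=$((span * 2))
        steps=$((steps + 1))
    done
    echo "$steps"
}

# bisect_hours N HOURS_PER_TEST
# Rough wall-clock estimate in hours, ignoring build time and retests.
bisect_hours() {
    echo $(( $(bisect_steps "$1") * $2 ))
}

# Example, run inside a kernel git tree:
# n=$(git rev-list --count v4.1..v4.2-rc8)
# bisect_hours "$n" 3
```

The estimate is a lower bound: a kernel that survives 3 hours may still
be bad, so any "good" verdict can send the bisection the wrong way.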

Any suggestions, please ?

ĸen
--
This one goes up to eleven: but only on a clear day, with the wind in
the right direction.


Attachments:
(No filename) (3.29 kB)
config-4.2-rc8.phenom (80.67 kB)

2015-08-27 12:00:41

by Ken Moffat

Subject: Re: Lockups with 4.2-rc kernels and qemu

On Tue, Aug 25, 2015 at 12:40:15AM +0100, Ken Moffat wrote:
> (Cc'ing qemu-devel, please keep me in the Cc).
>
> TL;DR - qemu locks up my machine when I use 4.2-rc kernels.
>
Previous mail, or at least the copy to qemu, archived at
https://lists.gnu.org/archive/html/qemu-devel/2015-08/msg02784.html

I started bisecting, but it is going to be a slow business - kernels
which seem to be ok after e.g. 3 hours can still lock the box. I
left it running overnight, and this morning it was still up - but
firefox could no longer connect externally. I eventually tracked
that to my logs taking the space. But one reason the logs had
become so big was the following (found from this morning's restart):

Aug 27 09:50:28 ac4tv kernel: [ 124.279813] BUG: scheduling while
atomic: qemu-system-x86/1789/0x00000002
Aug 27 09:50:28 ac4tv kernel: [ 124.279819] Modules linked in:
psmouse i2c_piix4 asus_atk0110 microcode k10temp
Aug 27 09:50:28 ac4tv kernel: [ 124.279828] Preemption disabled
at:[<ffffffff8100787c>] kvm_vcpu_ioctl+0x7c/0x580
Aug 27 09:50:28 ac4tv kernel: [ 124.279839]
Aug 27 09:50:28 ac4tv kernel: [ 124.279842] CPU: 2 PID: 1789 Comm:
qemu-system-x86 Not tainted 4.1.0-10762-g8c7febe #4
Aug 27 09:50:28 ac4tv kernel: [ 124.279843] Hardware name: System
manufacturer System Product Name/M5A78L-M LX V2, BIOS 0601
11/30/2011
Aug 27 09:50:28 ac4tv kernel: [ 124.279845] ffff88021fd14880
ffff88020718fc38 ffffffff8178316e 0000000080000002
Aug 27 09:50:28 ac4tv kernel: [ 124.279847] 0000000000014880
ffff88020718fc48 ffffffff810b93d7 ffff88020718fca8
Aug 27 09:50:28 ac4tv kernel: [ 124.279849] ffffffff8178607c
ffff88021fc14880 ffff880215ebc100 ffff88020718fc78
Aug 27 09:50:28 ac4tv kernel: [ 124.279851] Call Trace:
Aug 27 09:50:28 ac4tv kernel: [ 124.279855] [<ffffffff8178316e>]
dump_stack+0x4f/0x7b
Aug 27 09:50:28 ac4tv kernel: [ 124.279859] [<ffffffff810b93d7>]
__schedule_bug+0x57/0xa0
Aug 27 09:50:28 ac4tv kernel: [ 124.279861] [<ffffffff8178607c>]
__schedule+0x67c/0x9d0
Aug 27 09:50:28 ac4tv kernel: [ 124.279864] [<ffffffff8178b008>] ?
_raw_write_unlock_irqrestore+0x18/0x40
Aug 27 09:50:28 ac4tv kernel: [ 124.279865] [<ffffffff817864b1>]
schedule+0x41/0x90
Aug 27 09:50:28 ac4tv kernel: [ 124.279867] [<ffffffff81786888>]
schedule_preempt_disabled+0x18/0x30
Aug 27 09:50:28 ac4tv kernel: [ 124.279869] [<ffffffff81788f9d>]
__mutex_lock_slowpath+0x10d/0x340
Aug 27 09:50:28 ac4tv kernel: [ 124.279871] [<ffffffff817891e6>]
mutex_lock+0x16/0x30
Aug 27 09:50:28 ac4tv kernel: [ 124.279874] [<ffffffff8115c4c2>]
static_key_slow_inc+0x42/0xc0
Aug 27 09:50:28 ac4tv kernel: [ 124.279875] [<ffffffff810b92dd>]
preempt_notifier_register+0x1d/0x60
Aug 27 09:50:28 ac4tv kernel: [ 124.279881] [<ffffffff8100774d>]
vcpu_load+0x4d/0x90
Aug 27 09:50:28 ac4tv kernel: [ 124.279887] [<ffffffff8100787c>]
kvm_vcpu_ioctl+0x7c/0x580
Aug 27 09:50:28 ac4tv kernel: [ 124.279892] [<ffffffff8178af77>] ?
_raw_write_unlock_irq+0x17/0x40
Aug 27 09:50:28 ac4tv kernel: [ 124.279898] [<ffffffff811bb445>]
do_vfs_ioctl+0x295/0x480
Aug 27 09:50:28 ac4tv kernel: [ 124.279905] [<ffffffff811c5189>] ?
__fget+0x79/0xb0
Aug 27 09:50:28 ac4tv kernel: [ 124.279910] [<ffffffff811bb67c>]
SyS_ioctl+0x4c/0x90
Aug 27 09:50:28 ac4tv kernel: [ 124.279914] [<ffffffff8115cf23>] ?
context_tracking_user_enter+0x13/0x20
Aug 27 09:50:28 ac4tv kernel: [ 124.279917] [<ffffffff8105c055>] ?
syscall_trace_leave+0xa5/0x140
Aug 27 09:50:28 ac4tv kernel: [ 124.279919] [<ffffffff8178b657>]
entry_SYSCALL_64_fastpath+0x12/0x6a
Aug 27 09:50:28 ac4tv kernel: [ 124.477786] kvm: zapping shadow
pages for mmio generation wraparound
Aug 27 09:50:31 ac4tv kernel: [ 127.177303] BUG: scheduling while
atomic: qemu-system-x86/1788/0x00000002
Aug 27 09:50:31 ac4tv kernel: [ 127.177309] Modules linked in:
Aug 27 09:50:31 ac4tv kernel: [ 127.177317] BUG: scheduling while
atomic: qemu-system-x86/1790/0x00000002
Aug 27 09:50:31 ac4tv kernel: [ 127.177318] psmouse i2c_piix4
asus_atk0110 microcode k10temp
Aug 27 09:50:31 ac4tv kernel: [ 127.177332] Preemption disabled at:
Aug 27 09:50:31 ac4tv kernel: [ 127.177334] Modules linked in:
psmouse i2c_piix4 asus_atk0110 microcode k10temp
Aug 27 09:50:31 ac4tv kernel: [ 127.177341] Preemption disabled
at:[<ffffffff8100787c>] kvm_vcpu_ioctl+0x7c/0x580
etc, etc.

I have not _noticed_ these messages on -rc8, but for the moment I am
inclined to treat them as another sign of the same problem. Which
means a fresh bisection. <sigh/>
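
To check whether a given kernel produced these messages, and how many,
the saved log can be scanned for the BUG line. A minimal sketch, assuming
each kernel message sits on a single line in the log file (the excerpt
above is wrapped by the mailer); the function names are illustrative and
the log path depends on the syslog setup:

```shell
#!/bin/sh
# count_sched_bugs LOGFILE
# Print how many "scheduling while atomic" reports LOGFILE contains.
count_sched_bugs() {
    grep -c 'BUG: scheduling while atomic' "$1"
}

# sched_bug_tasks LOGFILE
# Print a count per reporting task, as "COUNT comm/pid".
sched_bug_tasks() {
    sed -n 's|.*BUG: scheduling while atomic: \([^/]*/[0-9]*\)/.*|\1|p' "$1" |
        sort | uniq -c
}

# Example (hypothetical path):
# count_sched_bugs /var/log/kern.log
# sched_bug_tasks /var/log/kern.log
```

A count of 0 for a kernel only shows that the messages did not appear
during that run; it says nothing about the lockups.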

ĸen
--
This one goes up to eleven: but only on a clear day, with the wind in
the right direction.

2015-08-27 16:54:24

by Ken Moffat

Subject: Re: Lockups with 4.2-rc kernels and qemu

On Thu, Aug 27, 2015 at 12:59:14PM +0100, Ken Moffat wrote:
> On Tue, Aug 25, 2015 at 12:40:15AM +0100, Ken Moffat wrote:
> > (Cc'ing qemu-devel, please keep me in the Cc).
> >
> > TL;DR - qemu locks up my machine when I use 4.2-rc kernels.
> >
> Previous mail, or at least the copy to qemu, archived at
> https://lists.gnu.org/archive/html/qemu-devel/2015-08/msg02784.html
>
> I started bisecting, but it is going to be a slow business - kernels
> which seem to be ok after e.g. 3 hours can still lock the box. I
> left it running overnight, and this morning it was still up - but
> firefox could no longer connect externally. I eventually tracked
> that to my logs taking the space. But one reason the logs had
> become so big was the following (found from this morning's restart):
>
> Aug 27 09:50:28 ac4tv kernel: [ 124.279813] BUG: scheduling while
> atomic: qemu-system-x86/1789/0x00000002
> Aug 27 09:50:28 ac4tv kernel: [ 124.279819] Modules linked in:
> psmouse i2c_piix4 asus_atk0110 microcode k10temp
> Aug 27 09:50:28 ac4tv kernel: [ 124.279828] Preemption disabled
> at:[<ffffffff8100787c>] kvm_vcpu_ioctl+0x7c/0x580

This is definitely a *different* bug - I just tried the next kernel,
and started to get these messages (almost 100,000 lines of them in
the next 4 minutes) : if that had been happening a couple of weeks
ago, I would have noticed ;-)

But, the end result is that I cannot attempt to bisect the lockups,
this scheduling while atomic business is preventing me looking for a
good kernel.

Has anybody had success with qemu on an x86_64 host running a 4.2-rc
kernel ? Looks as if I'll have to give up, and hope that the main
problem does not spread to 4.1.

ĸen
--
This one goes up to eleven: but only on a clear day, with the wind in
the right direction.

2015-08-28 03:05:17

by Ken Moffat

Subject: Re: Lockups with 4.2-rc kernels and qemu

On Thu, Aug 27, 2015 at 05:52:58PM +0100, Ken Moffat wrote:
> On Thu, Aug 27, 2015 at 12:59:14PM +0100, Ken Moffat wrote:
> > On Tue, Aug 25, 2015 at 12:40:15AM +0100, Ken Moffat wrote:
> > > (Cc'ing qemu-devel, please keep me in the Cc).
> > >
> > > TL;DR - qemu locks up my machine when I use 4.2-rc kernels.
> > >
> > Previous mail, or at least the copy to qemu, archived at
> > https://lists.gnu.org/archive/html/qemu-devel/2015-08/msg02784.html
> >
> > I started bisecting, but it is going to be a slow business - kernels
> > which seem to be ok after e.g. 3 hours can still lock the box. I
> > left it running overnight, and this morning it was still up - but
> > firefox could no longer connect externally. I eventually tracked
> > that to my logs taking the space. But one reason the logs had
> > become so big was the following (found from this morning's restart):
> >
> > Aug 27 09:50:28 ac4tv kernel: [ 124.279813] BUG: scheduling while
> > atomic: qemu-system-x86/1789/0x00000002
> > Aug 27 09:50:28 ac4tv kernel: [ 124.279819] Modules linked in:
> > psmouse i2c_piix4 asus_atk0110 microcode k10temp
> > Aug 27 09:50:28 ac4tv kernel: [ 124.279828] Preemption disabled
> > at:[<ffffffff8100787c>] kvm_vcpu_ioctl+0x7c/0x580
>
> This is definitely a *different* bug - I just tried the next kernel,
> and started to get these messages (almost 100,000 lines of them in
> the next 4 minutes) : if that had been happening a couple of weeks
> ago, I would have noticed ;-)
>
> But, the end result is that I cannot attempt to bisect the lockups,
> this scheduling while atomic business is preventing me looking for a
> good kernel.
>
> Has anybody had success with qemu on an x86_64 host running a 4.2-rc
> kernel ? Looks as if I'll have to give up, and hope that the main
> problem does not spread to 4.1.
>
What will (hopefully) be my last comment on this - I went back to
4.1.3 on the host, and after a couple of hours it locked up. Maybe
in the past I have just been lucky. I think in future I'll probably
stick to native builds; it seems to be less painful.

ĸen
--
Il Porcupino Nil Sodomy Est! (if you will excuse my latatian)
aka "The hedgehog song"