2007-10-19 11:23:17

by Carsten Otte

Subject: severe bug in 2.6.23+ kvm.git

Hi list,

we've experienced a severe bug in current kvm.git that may have been
introduced to the git tree quite recently, around last weekend. 2.6.23
is broken, 2.6.23-rc8 works for us. The symptom is that our Opteron
kvm test machine shreds its hard disk content to a state that is
not correctable by the file system checker. We use raid1 md mirrored
ext3 file systems on 4 SATA hard disks on it, and we've verified
correct operation of the hardware via badblocks and memtest86.
The problem occurs even without kvm modules loaded, so the cause seems
to be something that Avi pulled from elsewhere. Did anyone else experience
similar problems with the 2.6.23 based kvm tree? Does anyone have an
idea about a possible cause, which would help us debug it?

thanks,
Carsten


2007-10-19 11:31:35

by Aurelien Jarno

Subject: Re: [kvm-devel] severe bug in 2.6.23+ kvm.git

Carsten Otte wrote:
> Hi list,
>
> we've experienced a severe bug in current kvm.git that may have been
> introduced to the git tree quite recently, around last weekend. 2.6.23
> is broken, 2.6.23-rc8 works for us. The symptom is that our Opteron
> kvm test machine shreds its hard disk content to a state that is
> not correctable by the file system checker. We use raid1 md mirrored
> ext3 file systems on 4 SATA hard disks on it, and we've verified
> correct operation of the hardware via badblocks and memtest86.
> The problem occurs even without kvm modules loaded, so the cause seems
> to be something that Avi pulled from elsewhere. Did anyone else experience
> similar problems with the 2.6.23 based kvm tree? Does anyone have an
> idea about a possible cause, which would help us debug it?
>

Could you please specify what is corrupted? The guest disk image?

If that's the case, I have experienced the same problem since
kvm-userspace.git was updated to the latest qemu CVS, and I can
reproduce it with plain QEMU. I am able to reproduce it easily by
booting FreeBSD.

The problem is actually in QEMU; it was broken by this commit:
http://cvs.savannah.nongnu.org/viewvc/qemu/hw/ide.c?root=qemu&r1=1.64&r2=1.65

You can try to apply the attached patch, it reverts this commit and can
be applied to the latest QEMU CVS and to the latest KVM versions.

--
.''`. Aurelien Jarno | GPG: 1024D/F1BCDB73
: :' : Debian developer | Electrical Engineer
`. `' [email protected] | [email protected]
`- people.debian.org/~aurel32 | http://www.aurel32.net


Attachments:
qemu-ide-corruption.diff (2.53 kB)

2007-10-19 11:37:57

by Carsten Otte

Subject: Re: [kvm-devel] severe bug in 2.6.23+ kvm.git

Aurelien Jarno wrote:
> Could you please specify what is corrupted? The guest disk image?
As stated, we actually did not run any guests and did not load the kvm
kernel modules.
The host root file system gets corrupted to an extent not correctable
by the file system checker (we gave it 24h to repair, then interrupted
it), and it's very easy to reproduce: a simple kernel make on the
host forces us to reinstall the entire host operating system.


2007-10-19 11:43:22

by Laurent Vivier

Subject: Re: [kvm-devel] severe bug in 2.6.23+ kvm.git

Carsten Otte wrote:
> Aurelien Jarno wrote:
>> Could you please specify what is corrupted? The guest disk image?
> As stated, we actually did not run any guests and did not load the kvm
> kernel modules.
> The host root file system gets corrupted to an extent not correctable
> by the file system checker (we gave it 24h to repair, then interrupted
> it), and it's very easy to reproduce: a simple kernel make on the
> host forces us to reinstall the entire host operating system.

How do you know the problem has been introduced by kvm?

Laurent
--
---------------- [email protected] -----------------
"Given enough eyeballs, all bugs are shallow" E. S. Raymond

2007-10-19 11:49:46

by Carsten Otte

Subject: Re: [kvm-devel] severe bug in 2.6.23+ kvm.git

Laurent Vivier wrote:
> How do you know the problem has been introduced by kvm?
I don't. In fact I think it has not been introduced by kvm. All I
stated is that we experienced the problem when running the kvm.git
kernel after the 2.6.23 update, and that it was not present in the
kvm.git -rc8 as of last Thursday.

2007-10-19 11:54:40

by Laurent Vivier

Subject: Re: [kvm-devel] severe bug in 2.6.23+ kvm.git

Carsten Otte wrote:
> Laurent Vivier wrote:
>> How do you know the problem has been introduced by kvm?
> I don't. In fact I think it has not been introduced by kvm. All I
> stated is that we experienced the problem when running the kvm.git
> kernel after the 2.6.23 update, and that it was not present in the
> kvm.git -rc8 as of last Thursday.

Perhaps 2.6.23.1 fixes this?

http://lkml.org/lkml/2007/10/12/302

Laurent
--
---------------- [email protected] -----------------
"Given enough eyeballs, all bugs are shallow" E. S. Raymond

2007-10-19 11:56:17

by Carsten Otte

Subject: Re: severe bug in 2.6.23+ kvm.git

Carsten Otte wrote:
> 2.6.23 is broken, 2.6.23-rc8 works for us.
Actually, the working version was 2.6.23-rc6, git-head of kvm.git as
of October 11.

2007-10-19 11:58:38

by Christian Borntraeger

Subject: Re: [kvm-devel] severe bug in 2.6.23+ kvm.git

On Friday, 19 October 2007, Laurent Vivier wrote:
> Carsten Otte wrote:
> > Laurent Vivier wrote:
> >> How do you know the problem has been introduced by kvm?
> > I don't. In fact I think it has not been introduced by kvm. All I
> > stated is that we experienced the problem when running the kvm.git
> > kernel after the 2.6.23 update, and that it was not present in the
> > kvm.git -rc8 as of last Thursday.
>
> Perhaps 2.6.23.1 fixes this?
>
> http://lkml.org/lkml/2007/10/12/302

No, we don't have a Marvell chipset.

kvm:~# lspci
00:00.0 Memory controller: nVidia Corporation CK804 Memory Controller (rev a4)
00:01.0 ISA bridge: nVidia Corporation CK804 ISA Bridge (rev b1)
00:01.1 SMBus: nVidia Corporation CK804 SMBus (rev a2)
00:02.0 USB Controller: nVidia Corporation CK804 USB Controller (rev a2)
00:02.1 USB Controller: nVidia Corporation CK804 USB Controller (rev a4)
00:06.0 IDE interface: nVidia Corporation CK804 IDE (rev f3)
00:07.0 IDE interface: nVidia Corporation CK804 Serial ATA Controller (rev f3)
00:08.0 IDE interface: nVidia Corporation CK804 Serial ATA Controller (rev f3)
00:09.0 PCI bridge: nVidia Corporation CK804 PCI Bridge (rev a2)
00:0b.0 PCI bridge: nVidia Corporation CK804 PCIE Bridge (rev a3)
00:0c.0 PCI bridge: nVidia Corporation CK804 PCIE Bridge (rev a3)
00:0d.0 PCI bridge: nVidia Corporation CK804 PCIE Bridge (rev a3)
00:0e.0 PCI bridge: nVidia Corporation CK804 PCIE Bridge (rev a3)
00:18.0 Host bridge: Advanced Micro Devices [AMD] K8 [Athlon64/Opteron] HyperTransport Technology Configuration
00:18.1 Host bridge: Advanced Micro Devices [AMD] K8 [Athlon64/Opteron] Address Map
00:18.2 Host bridge: Advanced Micro Devices [AMD] K8 [Athlon64/Opteron] DRAM Controller
00:18.3 Host bridge: Advanced Micro Devices [AMD] K8 [Athlon64/Opteron] Miscellaneous Control
00:19.0 Host bridge: Advanced Micro Devices [AMD] K8 [Athlon64/Opteron] HyperTransport Technology Configuration
00:19.1 Host bridge: Advanced Micro Devices [AMD] K8 [Athlon64/Opteron] Address Map
00:19.2 Host bridge: Advanced Micro Devices [AMD] K8 [Athlon64/Opteron] DRAM Controller
00:19.3 Host bridge: Advanced Micro Devices [AMD] K8 [Athlon64/Opteron] Miscellaneous Control
01:00.0 VGA compatible controller: ATI Technologies Inc RV370 5B60 [Radeon X300 (PCIE)]
01:00.1 Display controller: ATI Technologies Inc RV370 [Radeon X300SE]
02:00.0 PCI bridge: Intel Corporation 6702PXH PCI Express-to-PCI Bridge A (rev 09)
04:00.0 Ethernet controller: Broadcom Corporation NetXtreme BCM5721 Gigabit Ethernet PCI Express (rev 21)
05:00.0 Ethernet controller: Broadcom Corporation NetXtreme BCM5721 Gigabit Ethernet PCI Express (rev 21)

Christian
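
The same check can be scripted against `lspci` output. A minimal sketch, assuming the listing is fed on stdin; the function name `has_marvell` is a hypothetical helper for illustration, not something used in this thread:

```shell
#!/bin/sh
# Hypothetical helper (not from the thread): succeeds if any lspci-style
# line read from stdin mentions a Marvell device, case-insensitively.
has_marvell() {
    grep -qi 'marvell'
}

# Example use on a live system:
#   lspci | has_marvell && echo "Marvell controller listed" \
#                       || echo "no Marvell controller listed"
```

A vendor-name match like this only says whether a Marvell device shows up in the listing; it does not confirm which driver is bound to the SATA controller.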

2007-10-19 12:21:56

by Carsten Otte

Subject: Re: [kvm-devel] severe bug in 2.6.23+ kvm.git

Mike Lampard wrote:
> There was a commit ab9c232286c2b77be78441c2d8396500b045777e regarding libata
> on Linus's master tree that happened on Friday, that was pulled into kvm git
> over the weekend. I don't know if that may be affecting you. There is/was
> also chatter on LKML regarding some problems with s/g; you may want to check
> there.
Oh, that's a couple of patches in question. Git-bisect seems to be a
long way to go when you lose your installation every time you try.
First, we'll figure out whether or not 2.6.23.1 as released
breaks our system too. This way, we can either focus on differences
between Linus and Avi, or turn on the big red warning sign saying
"regression".
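
To give a feel for why bisecting is costly when every bad kernel destroys the installation: git bisect halves the candidate range at each step, so the number of kernels to build and boot grows with the base-2 logarithm of the number of commits in question. A rough sketch of that estimate follows; the function name `bisect_steps` and any commit counts passed to it are illustrative assumptions, not figures from this thread:

```shell
#!/bin/sh
# Hypothetical helper: rough upper bound on the number of bisect steps
# needed to isolate one bad commit among N candidates (ceil(log2(N))).
bisect_steps() {
    n=$1
    steps=0
    while [ "$n" -gt 1 ]; do
        n=$(( (n + 1) / 2 ))   # each test halves the remaining range
        steps=$((steps + 1))
    done
    echo "$steps"
}

# In a real tree the candidate count could come from something like
#   git rev-list --count <good>..<bad>
# where <good> and <bad> are placeholders for the known-good and
# known-bad revisions.
```

Each step here means one kernel build, one boot, and possibly one reinstall, which is why checking a released kernel first is the cheaper experiment.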

2007-10-19 12:27:17

by Mike Lampard

Subject: Re: [kvm-devel] severe bug in 2.6.23+ kvm.git

On Fri, 19 Oct 2007 09:07:42 pm Carsten Otte wrote:
> Aurelien Jarno wrote:
> > Could you please specify what is corrupted? The guest disk image?
>
> As stated, we actually did not run any guests and did not load the kvm
> kernel modules.
> The host root file system gets corrupted to an extent not correctable
> by the file system checker (we gave it 24h to repair, then interrupted
> it), and it's very easy to reproduce: a simple kernel make on the
> host forces us to reinstall the entire host operating system.
>
There was a commit ab9c232286c2b77be78441c2d8396500b045777e regarding libata
on Linus's master tree that happened on Friday, that was pulled into kvm git
over the weekend. I don't know if that may be affecting you. There is/was
also chatter on LKML regarding some problems with s/g; you may want to check
there.

Cheers
Mike

2007-10-19 13:45:28

by Carsten Otte

Subject: Re: [kvm-devel] severe bug in 2.6.23+ kvm.git

Carsten Otte wrote:
> First, we'll figure out whether or not 2.6.23.1 as released breaks
> our system too. This way, we can either focus on differences between
> Linus and Avi, or turn on the big red warning sign saying "regression".
Looks like 2.6.23.1 works fine on that box. We'll leave it running over
the weekend with "while true; do make; make clean; done".
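
The quoted loop keeps going even if a build fails. A variant that stops at the first failure and reports how many iterations passed might look like the sketch below; `stress_loop` and its iteration cap are hypothetical additions, not what was run on the test machine:

```shell
#!/bin/sh
# Hypothetical helper: run a command up to MAX times, stopping at the
# first failure and reporting how many iterations succeeded before it.
stress_loop() {
    max=$1
    shift
    i=0
    while [ "$i" -lt "$max" ]; do
        "$@" || { echo "failed after $i successful iterations"; return 1; }
        i=$((i + 1))
    done
    echo "completed $i iterations"
}

# Example use in a kernel source tree:
#   stress_loop 1000 sh -c 'make && make clean'
```

Stopping on the first error preserves the failing state for inspection, which matters when the failure mode is file system corruption.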


2007-10-19 14:19:14

by Jan Engelhardt

Subject: Re: [kvm-devel] severe bug in 2.6.23+ kvm.git


On Oct 19 2007 15:44, Carsten Otte wrote:
> Carsten Otte wrote:
>> First, we'll figure out whether or not 2.6.23.1 as released breaks our
>> system too. This way, we can either focus on differences between Linus and
>> Avi, or turn on the big red warning sign saying "regression".
>
> Looks like 2.6.23.1 works fine on that box. We'll leave it running over
> the weekend with "while true; do make; make clean; done".

Well, do you happen to use sata_mv?

2007-10-19 14:49:21

by Christian Borntraeger

Subject: Re: [kvm-devel] severe bug in 2.6.23+ kvm.git

On Friday, 19 October 2007, Jan Engelhardt wrote:
>
> On Oct 19 2007 15:44, Carsten Otte wrote:
> > Carsten Otte wrote:
> >> First, we'll figure out whether or not 2.6.23.1 as released breaks our
> >> system too. This way, we can either focus on differences between Linus and
> >> Avi, or turn on the big red warning sign saying "regression".
> >
> > Looks like 2.6.23.1 works fine on that box. We'll leave it running over
> > the weekend with "while true; do make; make clean; done".
>
> Well, do you happen to use sata_mv?

No, we have nVidia, so it's sata_nv.


2007-10-19 14:58:22

by Laurent Vivier

Subject: Re: [kvm-devel] severe bug in 2.6.23+ kvm.git

Christian Borntraeger wrote:
> On Friday, 19 October 2007, Jan Engelhardt wrote:
>> On Oct 19 2007 15:44, Carsten Otte wrote:
>>> Carsten Otte wrote:
>>>> First, we'll figure out whether or not 2.6.23.1 as released breaks our
>>>> system too. This way, we can either focus on differences between Linus and
>>>> Avi, or turn on the big red warning sign saying "regression".
>>> Looks like 2.6.23.1 works fine on that box. We'll leave it running over
>>> the weekend with "while true; do make; make clean; done".
>> Well, do you happen to use sata_mv?
>
> No, we have nVidia, so it's sata_nv.

Did you patch kvm.git with patch-2.6.23.1.bz2 or did you download
linux-2.6.23.1.tar.bz2?

2.6.23.1 fixes nothing except sata_mv...

Laurent
--
---------------- [email protected] -----------------
"Given enough eyeballs, all bugs are shallow" E. S. Raymond

2007-10-19 15:13:46

by Avi Kivity

Subject: Re: [kvm-devel] severe bug in 2.6.23+ kvm.git

Carsten Otte wrote:
> Hi list,
>
> we've experienced a severe bug in current kvm.git that may have been
> introduced to the git tree quite recently, around last weekend. 2.6.23
> is broken, 2.6.23-rc8 works for us. The symptom is that our Opteron
> kvm test machine shreds its hard disk content to a state that is
> not correctable by the file system checker. We use raid1 md mirrored
> ext3 file systems on 4 SATA hard disks on it, and we've verified
> correct operation of the hardware via badblocks and memtest86.
> The problem occurs even without kvm modules loaded, so the cause seems
> to be something that Avi pulled from elsewhere. Did anyone else experience
> similar problems with the 2.6.23 based kvm tree? Does anyone have an
> idea about a possible cause, which would help us debug it?
>
>

kvm.git is actually 2.6.24-rc, pulled from -linus at a random point in
time, so it's not at all surprising if something is broken.

One option is for you to pull -linus to get the latest and hopefully
greatest and see if the bug is fixed.

Another is to use the external module capability to build kvm.git
against 2.6.23.1.

--
Do not meddle in the internals of kernels, for they are subtle and quick to panic.

2007-10-19 15:23:59

by Christian Borntraeger

Subject: Re: [kvm-devel] severe bug in 2.6.23+ kvm.git

On Friday, 19 October 2007, Laurent Vivier wrote:
> Did you patch kvm.git with patch-2.6.23.1.bz2 or did you download
> linux-2.6.23.1.tar.bz2?
>
> 2.6.23.1 fixes nothing except sata_mv...

Yes, I know. The question we tried to answer was: is the bug in 2.6.23 or was
it introduced after 2.6.23, as kvm.git already contains lots of post-2.6.23
stuff?
Current state is:

kvm.git with tag 2.6.23-rc6 works for days without a problem.
2.6.23.1 vanilla has survived and is currently still under test.
kvm.git tag master killed our filesystem at least three times since Monday.

I will continue to bang on 2.6.23.1 to see if it's really fine. After that,
maybe I will try to bisect on kvm.git, but this will take quite a long time,
given that we had to reinstall the system due to this error.

Christian

2007-10-19 15:43:22

by Luca Tettamanti

Subject: Re: [kvm-devel] severe bug in 2.6.23+ kvm.git

On 10/19/07, Christian Borntraeger <[email protected]> wrote:
> On Friday, 19 October 2007, Laurent Vivier wrote:
> > Did you patch kvm.git with patch-2.6.23.1.bz2 or did you download
> > linux-2.6.23.1.tar.bz2?
> >
> > 2.6.23.1 fixes nothing except sata_mv...
>
> Yes, I know. The question we tried to answer was: is the bug in 2.6.23 or was
> it introduced after 2.6.23, as kvm.git already contains lots of post-2.6.23
> stuff?
> Current state is:
>
> kvm.git with tag 2.6.23-rc6 works for days without a problem.
> 2.6.23.1 vanilla has survived and is currently still under test.
> kvm.git tag master killed our filesystem at least three times since Monday.

linus-git has at least one bug with SG chaining, but usually it just
hangs the machine. Patch is here:

http://lkml.org/lkml/2007/10/17/269

Luca

2007-10-19 18:50:10

by Christian Borntraeger

Subject: Re: [kvm-devel] severe bug in 2.6.23+ kvm.git

On Friday, 19 October 2007, Luca Tettamanti wrote:
> linus-git has at least one bug with SG chaining, but usually it just
> hangs the machine. Patch is here:
>
> http://lkml.org/lkml/2007/10/17/269

Looks promising.
I pulled this fix by pulling the latest Linus-git into the kvm.git. I also
enabled some debug options in the kernel hacking section. The resulting
kernel seems to be stable so far. We will see over the next few days if the
problem is really gone.

Thanks to all for your ideas.

Christian

2007-10-22 10:57:48

by Carsten Otte

Subject: Re: [kvm-devel] severe bug in 2.6.23+ kvm.git

Avi Kivity wrote:
> kvm.git is actually 2.6.24-rc, pulled from -linus at a random point in
> time, so it's not at all surprising if something is broken.
Right. We have a backup now, so next time we'll be ok ;-). Would you
please pull from Linus again to get the fix into kvm.git so that we
can use your tree on that machine again?

thanks,
Carsten

2007-10-22 11:50:01

by Avi Kivity

Subject: Re: [kvm-devel] severe bug in 2.6.23+ kvm.git

Carsten Otte wrote:
> Avi Kivity wrote:
>> kvm.git is actually 2.6.24-rc, pulled from -linus at a random point in
>> time, so it's not at all surprising if something is broken.
> Right. We have a backup now, so next time we'll be ok ;-). Would you
> please pull from Linus again to get the fix into kvm.git so that we
> can use your tree on that machine again?
>

I pulled yesterday so it should be all right (and you don't need me for
that; you can pull from Linus on top of kvm.git).


--
error compiling committee.c: too many arguments to function

2007-10-22 13:15:06

by Carsten Otte

Subject: Re: [kvm-devel] severe bug in 2.6.23+ kvm.git

Avi Kivity wrote:
> I pulled yesterday so it should be all right (and you don't need me for
> that; you can pull from Linus on top of kvm.git).
Thanks :-).

2007-10-22 14:01:17

by Carsten Otte

Subject: Re: [kvm-devel] severe bug in 2.6.23+ kvm.git

Avi Kivity wrote:
> I pulled yesterday so it should be all right (and you don't need me for
> that; you can pull from Linus on top of kvm.git).
The machine has been running kvm.git for a while now, and it seems to work OK.