2006-05-28 05:10:43

by Mike Galbraith

Subject: 2.6.17-rc4-mm3 cfq oops->panic w. fs damage

Greetings,

I tried to boot 2.6.17-rc4-mm3 twice yesterday, and received the below
both times. Both times, the oops->panic occurred while X/KDE was
starting. KDE would not run thereafter, and had to be reinstalled.

Box is P4/HT/ICH5.

-Mike

BUG: unable to handle kernel NULL pointer dereference at virtual address 00000054
printing eip:
b11c28f0
*pde = 37e93067
Oops: 0000 [#1]
PREEMPT SMP
last sysfs file: /devices/pci0000:00/0000:00:1f.3/class
Modules linked in: snd_pcm_oss snd_mixer_oss snd_seq snd_seq_device edd tda9887 ip6t_REJECT xt_tcpudp ipt_REJECT xt_state iptable_mangle iptable_nat ip_nat iptable_filter saa7134 ip6table_mangle snd_intel8x0 snd_ac97_codec snd_ac97_bus ir_kbd_i2c snd_pcm snd_timer snd ip_conntrack bt878 prism54 soundcore nfnetlink ohci1394 ieee1394 i2c_i801 ip_tables snd_page_alloc ip6table_filter ip6_tables x_tables tuner bttv video_buf firmware_class ir_common btcx_risc tveeprom nls_iso8859_1 nls_cp437 nls_utf8 sd_mod
CPU: 0
EIP: 0060:[<b11c28f0>] Not tainted VLI
EFLAGS: 00213046 (2.6.17-rc4-mm3-smp #157)
EIP is at cfq_dispatch_requests+0xef/0x540
eax: 00000000 ebx: ef547c30 ecx: fffc593e edx: 00000000
esi: ecbb4f98 edi: 00000001 ebp: b153def0 esp: b153deb4
ds: 007b es: 007b ss: 0068
Process X (pid: 6992, threadinfo=b153d000 task=eb757000)
Stack: 00203046 ef551f80 e44c1ee4 ee678f00 ef547c00 00000004 00000000 ef547c44
ef547c44 ef547c34 ef51f81c ef51f7f4 ef51f7e4 b1598400 ef5e5d80 b153df14
b11b744e 00000001 ef51f7e4 b153df14 b11b92a4 b1598494 b1598400 ef5e5d80
Call Trace:
<b1003cf3> show_stack_log_lvl+0x9e/0xc3 <b1003f00> show_registers+0x1ac/0x237
<b10040bd> die+0x132/0x2fb <b1019df3> do_page_fault+0x4f3/0x577
<b1003827> error_code+0x4f/0x54 <b11b744e> elv_next_request+0x1b/0x12f
<b12764a3> ide_do_request+0x1b7/0x841 <b1276e45> ide_intr+0x1dc/0x1e1
<b104a4a1> handle_IRQ_event+0x35/0x65 <b104a55f> __do_IRQ+0x8e/0xff
<b100562a> do_IRQ+0x3e/0x57
=======================
<b10036ce> common_interrupt+0x1a/0x20
Code: 00 00 75 32 8d 43 34 3b 43 34 74 2a 8b 43 34 8b 70 3c 8b 0d 00 34 4e b1 83 7b 10 01 19 c0 83 e0 fc 8b 84 10 00 01 00 00 8b 56 14 <03> 42 54 39 c8 0f 88 98 01 00 00 8b 73 20 89 f2 8b 4d d4 8b 01
EIP: [<b11c28f0>] cfq_dispatch_requests+0xef/0x540 SS:ESP 0068:b153deb4
Kernel panic - not syncing: Fatal exception in interrupt
BUG: warning at arch/i386/kernel/smp.c:537/smp_call_function()
<b1003d52> show_trace+0xd/0xf <b1004440> dump_stack+0x17/0x19
<b10129d2> smp_call_function+0x124/0x129 <b10129f5> smp_send_stop+0x1e/0x27
<b1022a2b> panic+0x60/0x1c5 <b1004277> die+0x2ec/0x2fb
<b1019df3> do_page_fault+0x4f3/0x577 <b1003827> error_code+0x4f/0x54
<b11b744e> elv_next_request+0x1b/0x12f <b12764a3> ide_do_request+0x1b7/0x841
<b1276e45> ide_intr+0x1dc/0x1e1 <b104a4a1> handle_IRQ_event+0x35/0x65
<b104a55f> __do_IRQ+0x8e/0xff <b100562a> do_IRQ+0x3e/0x57
=======================
<b10036ce> common_interrupt+0x1a/0x20
BUG: warning at kernel/panic.c:138/panic()
<b1003d52> show_trace+0xd/0xf <b1004440> dump_stack+0x17/0x19
<b1022b5d> panic+0x192/0x1c5 <b1004277> die+0x2ec/0x2fb
<b1019df3> do_page_fault+0x4f3/0x577 <b1003827> error_code+0x4f/0x54
<b11b744e> elv_next_request+0x1b/0x12f <b12764a3> ide_do_request+0x1b7/0x841
<b1276e45> ide_intr+0x1dc/0x1e1 <b104a4a1> handle_IRQ_event+0x35/0x65
<b104a55f> __do_IRQ+0x8e/0xff <b100562a> do_IRQ+0x3e/0x57
=======================
<b10036ce> common_interrupt+0x1a/0x20

(gdb) list *cfq_dispatch_requests+0xef
0xb11c28f0 is in cfq_dispatch_requests (cfq-iosched.c:969).
964         if (!list_empty(&cfqq->fifo)) {
965                 int fifo = cfq_cfqq_class_sync(cfqq);
966
967                 crq = RQ_DATA(list_entry_fifo(cfqq->fifo.next));
968                 rq = crq->request;
969                 if (time_after(jiffies, rq->start_time + cfqd->cfq_fifo_expire[fifo])) {
970                         cfq_mark_cfqq_fifo_expire(cfqq);
971                         return crq;
972                 }
973         }
(gdb)

0xb11c28f0 <cfq_dispatch_requests+239>: add 0x54(%edx),%eax
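
A note on reading the above: the faulting instruction adds the word at 0x54(%edx) to %eax, the register dump shows edx: 00000000, and the reported fault address is 00000054. That pattern is consistent with rq (loaded from crq->request on line 968) being NULL when line 969 reads rq->start_time, though the thread itself does not confirm the root cause. The sketch below only illustrates the arithmetic. The struct layout is a hypothetical stand-in (the 0x54 offset is taken from the oops, not from the real struct request definition), and time_after32() is an illustrative 32-bit rendering of the wraparound-safe comparison idea behind time_after(), not the kernel macro itself.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/*
 * Hypothetical stand-in for the request structure. Only the layout matters:
 * the padding is an assumption chosen so that start_time lands at 0x54,
 * the offset seen in the faulting instruction.
 */
struct fake_request {
	unsigned char pad[0x54];
	uint32_t start_time;	/* jiffies are 32 bits wide on i386 */
};

/* Address the CPU would touch when reading start_time through 'base'. */
static uintptr_t start_time_address(uintptr_t base)
{
	return base + offsetof(struct fake_request, start_time);
}

/*
 * Wraparound-safe "a is later than b" for 32-bit tick counters:
 * the unsigned difference is reinterpreted as signed, so the result
 * stays correct across a counter wrap.
 */
static int time_after32(uint32_t a, uint32_t b)
{
	return (int32_t)(b - a) < 0;
}
```

With a NULL base, start_time_address(0) yields 0x54, matching the "virtual address 00000054" line in the oops. time_after32(2, 0xfffffff0u) is true because a counter that has wrapped to 2 is still later than 0xfffffff0.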

2006-05-28 05:25:18

by Al Viro

Subject: Re: 2.6.17-rc4-mm3 cfq oops->panic w. fs damage

On Sun, May 28, 2006 at 07:12:03AM +0200, Mike Galbraith wrote:
> Greetings,
>
> I tried to boot 2.6.17-rc4-mm3 twice yesterday, and received the below
> both times. Both times, the oops->panic occurred while X/KDE was
> starting. KDE would not run thereafter, and had to be reinstalled.

Can you reproduce that with mainline?

2006-05-28 05:58:49

by Mike Galbraith

Subject: Re: 2.6.17-rc4-mm3 cfq oops->panic w. fs damage

On Sun, 2006-05-28 at 06:25 +0100, Al Viro wrote:
> On Sun, May 28, 2006 at 07:12:03AM +0200, Mike Galbraith wrote:
> > Greetings,
> >
> > I tried to boot 2.6.17-rc4-mm3 twice yesterday, and received the below
> > both times. Both times, the oops->panic occurred while X/KDE was
> > starting. KDE would not run thereafter, and had to be reinstalled.
>
> Can you reproduce that with mainline?

Virgin rc4 has been working fine, but I've been using UP kernels. I'll
try the same config as SMP.

I'm still picking up the pieces ATM, because (expletive) YAST and I had
a minor disagreement wrt what all wanted restoration (yup, I lost ;).

-Mike

2006-05-28 07:46:20

by Mike Galbraith

Subject: Re: 2.6.17-rc4-mm3 cfq oops->panic w. fs damage

On Sun, 2006-05-28 at 08:00 +0200, Mike Galbraith wrote:
> On Sun, 2006-05-28 at 06:25 +0100, Al Viro wrote:
> > On Sun, May 28, 2006 at 07:12:03AM +0200, Mike Galbraith wrote:
> > > Greetings,
> > >
> > > I tried to boot 2.6.17-rc4-mm3 twice yesterday, and received the below
> > > both times. Both times, the oops->panic occurred while X/KDE was
> > > starting. KDE would not run thereafter, and had to be reinstalled.
> >
> > Can you reproduce that with mainline?
>
> Virgin rc4 has been working fine, but I've been using UP kernels. I'll
> try the same config as SMP.

She's running fine. Guess I'll go prod mm3 again.

-Mike

2006-05-28 08:01:13

by Mike Galbraith

Subject: Re: 2.6.17-rc4-mm3 cfq oops->panic w. fs damage

On Sun, 2006-05-28 at 09:48 +0200, Mike Galbraith wrote:
> On Sun, 2006-05-28 at 08:00 +0200, Mike Galbraith wrote:
> > On Sun, 2006-05-28 at 06:25 +0100, Al Viro wrote:
> > > On Sun, May 28, 2006 at 07:12:03AM +0200, Mike Galbraith wrote:
> > > > Greetings,
> > > >
> > > > I tried to boot 2.6.17-rc4-mm3 twice yesterday, and received the below
> > > > both times. Both times, the oops->panic occurred while X/KDE was
> > > > starting. KDE would not run thereafter, and had to be reinstalled.
> > >
> > > Can you reproduce that with mainline?
> >
> > Virgin rc4 has been working fine, but I've been using UP kernels. I'll
> > try the same config as SMP.
>
> She's running fine. Guess I'll go prod mm3 again.

Yup, mm3 makes reliable kaboom.

I suppose the first thing to do is see if it's cfq, and then maybe toss
a dart at the patch list.

-Mike

2006-05-28 08:22:39

by Mike Galbraith

Subject: Re: 2.6.17-rc4-mm3 cfq oops->panic w. fs damage

On Sun, 2006-05-28 at 10:03 +0200, Mike Galbraith wrote:

> Yup, mm3 makes reliable kaboom.
>
> I suppose the first thing to do is see if it's cfq, and then maybe toss
> a dart at the patch list.

That was too easy. It's git-cfq.patch.

-Mike

2006-05-29 10:58:35

by Mike Galbraith

Subject: Re: 2.6.17-rc4-mm3 cfq oops->panic w. fs damage

On Sun, 2006-05-28 at 10:24 +0200, Mike Galbraith wrote:
> On Sun, 2006-05-28 at 10:03 +0200, Mike Galbraith wrote:
>
> > Yup, mm3 makes reliable kaboom.
> >
> > I suppose the first thing to do is see if it's cfq, and then maybe toss
> > a dart at the patch list.
>
> That was too easy. It's git-cfq.patch.

Too easy indeed.

After staring at these changes, and not having anything poke me in the
eye that looked like it might cause list corruption, I decided to try
them in a different kernel. I put them into 2.6.16-rt25, and there they
work peachy. A diff of 2.6.16-rt25+git-cfq.patch->2.6.17-rc4-mm3 shows
what I was expecting (locking changes), but it's embedded in ~1000 lines
of diff, and doesn't look particularly trivial.

Hi Jens <punt> :)

-Mike

2006-05-30 12:34:49

by Jens Axboe

Subject: Re: 2.6.17-rc4-mm3 cfq oops->panic w. fs damage

On Mon, May 29 2006, Mike Galbraith wrote:
> On Sun, 2006-05-28 at 10:24 +0200, Mike Galbraith wrote:
> > On Sun, 2006-05-28 at 10:03 +0200, Mike Galbraith wrote:
> >
> > > Yup, mm3 makes reliable kaboom.
> > >
> > > I suppose the first thing to do is see if it's cfq, and then maybe toss
> > > a dart at the patch list.
> >
> > That was too easy. It's git-cfq.patch.
>
> Too easy indeed.
>
> After staring at these changes, and not having anything poke me in the
> eye that looked like it might cause list corruption, I decided to try
> them in a different kernel. I put them into 2.6.16-rt25, and there they
> work peachy. A diff of 2.6.16-rt25+git-cfq.patch->2.6.17-rc4-mm3 shows
> what I was expecting (locking changes), but it's embedded in ~1000 lines
> of diff, and doesn't look particularly trivial.
>
> Hi Jens <punt> :)

I'm suspecting a recent -mm change, since git-cfq hasn't changed in
quite a while and it used to work just fine. Can you pass me the diff
you generated?

--
Jens Axboe

2006-05-30 13:24:56

by Mike Galbraith

Subject: Re: 2.6.17-rc4-mm3 cfq oops->panic w. fs damage

On Tue, 2006-05-30 at 14:36 +0200, Jens Axboe wrote:

> I'm suspecting a recent -mm change, since git-cfq hasn't changed in
> quite a while and it used to work just fine.

It's apparently not mm. I just plugged it into 2.6.17-rc4, and get the
same explosion. It doesn't seem to play well with the changes in
2.6.17-rc1.

-Mike

2006-05-30 13:28:00

by Jens Axboe

Subject: Re: 2.6.17-rc4-mm3 cfq oops->panic w. fs damage

On Tue, May 30 2006, Mike Galbraith wrote:
> On Tue, 2006-05-30 at 14:36 +0200, Jens Axboe wrote:
>
> > I'm suspecting a recent -mm change, since git-cfq hasn't changed in
> > quite a while and it used to work just fine.
>
> It's apparently not mm. I just plugged it into 2.6.17-rc4, and get the
> same explosion. It doesn't seem to play well with the changes in
> 2.6.17-rc1.

Ah, ok that makes sense. I'll take a closer look at it, thanks for
reporting!

--
Jens Axboe