2004-09-08 00:18:13

by Richard A Nelson

Subject: Re: 2.6.9-rc1-mm4 kjournald oops (repeatable)

I've received a few of these already - always during *very* heavy disk
activity. After the Oops, the disk becomes strangely idle :), and a reboot
is required.

Unable to handle kernel paging request at virtual address 6b6b6b93
printing eip:
c01ae727
*pde = 00000000
Oops: 0000 [#1]
PREEMPT
Modules linked in: ppp_generic slhc radeon msr ds lp binfmt_misc autofs4
thermal fan button ac battery af_packet sch_ingress cls_u32 sch_sfq
sch_htb ipt_MASQUERADE ip6t_multiport ipt_multiport ipt_TOS ipt_state
ipt_TARPIT ip6t_limit ipt_limit ipt_REJECT ip6t_LOG ipt_LOG ipt_pkttype
ipt_recent ip6table_mangle iptable_mangle ip6table_filter ip6_tables
iptable_filter eepro100 snd_intel8x0m hw_random usbhid uhci_hcd usbcore
parport_pc parport irtty_sir sir_dev irda crc_ccitt pcspkr yenta_socket
pcmcia_core snd_intel8x0 snd_ac97_codec snd_pcm_oss snd_mixer_oss
snd_pcm snd_timer snd_page_alloc gameport snd_mpu401_uart snd_rawmidi
snd_seq_device snd soundcore nls_iso8859_1 nls_cp437 vfat fat dm_mod
joydev evdev psmouse nvram capability commoncap intel_agp agpgart tun
e100 mii ip_nat_tftp ip_nat_irc ip_conntrack_irc ip_nat_ftp iptable_nat
ip_conntrack_ftp ip_conntrack ip_tables md5 ipv6 proc_intf acpi
freq_table processor microcode cpuid rtc unix
CPU: 0
EIP: 0060:[__journal_clean_checkpoint_list+199/240] Not tainted VLI
EFLAGS: 00010202 (2.6.9-rc1-mm4)
EIP is at __journal_clean_checkpoint_list+0xc7/0xf0
eax: ce70e650 ebx: 6b6b6b6b ecx: 00000000 edx: cf5aa000
esi: cf5aa000 edi: c322f7a8 ebp: cf5aadb8 esp: cf5aad90
ds: 007b es: 007b ss: 0068
Process kjournald (pid: 1351, threadinfo=cf5aa000 task=cf588000)
Stack: cf5aa000 c322f7a8 0000017f ce70e578 c3f2550c ce70e650 cfcbe9a8 cf5aa000
00000000 00000000 cf5aaf58 c01abc6e 00000000 5a5a5a5a 5a5a5a5a 5a5a5a5a
5a5a5a5a cfcbea04 cf5aa000 5a5a5a5a 5a5a5a5a 00000000 00000000 00000000
Call Trace:
[show_stack+122/144] show_stack+0x7a/0x90
[show_registers+329/432] show_registers+0x149/0x1b0
[die+221/368] die+0xdd/0x170
[do_page_fault+565/1463] do_page_fault+0x235/0x5b7
[error_code+45/56] error_code+0x2d/0x38
[journal_commit_transaction+670/6480] journal_commit_transaction+0x29e/0x1950
[kjournald+342/992] kjournald+0x156/0x3e0
[kernel_thread_helper+5/16] kernel_thread_helper+0x5/0x10
Code: 45 e0 83 c4 1c 5b 5e 5f 5d c3 e8 f5 1a 14 00 eb ee 8b 45 d8 ff 48
14 8b 55 d8 8b 42 08 a8 08 75 2b 8b 45 ec 8b 58 28 85 db 74 09 <8b> 43
28 8b 55 ec 89 42 28 8b 45 f0 8b 40 30 85 c0 89 45 ec 74
--
Rick Nelson
<gholam> well I'm impressed
<gholam> win98 managed to crash X from within vmware.
* gholam applauds.
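
A note on reading the oops above: the faulting address (6b6b6b93) and the
value in ebx (6b6b6b6b) are built from the byte 0x6b, which the kernel's slab
debugging writes over objects when they are freed, so the trace is consistent
with a pointer being read out of already-freed memory. The sketch below is
only an illustration of that check and is not part of the original report;
the function name classify_fault and the 4 KiB offset limit are assumptions
made here.

```python
# Poison bytes used by the kernel's slab debugging.
POISON_FREE = 0x6B   # written over an object when it is freed
POISON_INUSE = 0x5A  # written over an object when it is allocated


def classify_fault(address, max_offset=0x1000):
    """Guess whether a 32-bit fault address is a poisoned pointer plus a
    small structure-field offset.

    Returns (poison_name, offset), or None if no pattern matches.
    max_offset is an arbitrary limit chosen for this sketch.
    """
    for name, byte in (("POISON_FREE", POISON_FREE),
                       ("POISON_INUSE", POISON_INUSE)):
        # Build the 32-bit value made of four identical poison bytes.
        pattern = int.from_bytes(bytes([byte]) * 4, "big")
        offset = address - pattern
        if 0 <= offset < max_offset:
            return name, offset
    return None


print(classify_fault(0x6B6B6B93))  # ('POISON_FREE', 40)
```

The offset of 40 (0x28) agrees with the faulting instruction bytes in the
Code: line, <8b> 43 28, which on i386 decode to a load from ebx+0x28. The
5a5a5a5a words on the stack are the matching "in use" pattern.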


2004-09-08 09:07:14

by Andrew Morton

Subject: Re: 2.6.9-rc1-mm4 kjournald oops (repeatable)

Richard A Nelson <[email protected]> wrote:
>
> I've received a few of these already - always during *very* heavy disk
> activity. After the Oops, the disk becomes strangely idle :), and a reboot
> is required.
>
> Unable to handle kernel paging request at virtual address 6b6b6b93
> ...
> EIP: 0060:[__journal_clean_checkpoint_list+199/240] Not tainted VLI

This might have been caused by a fishy latency-reduction patch. I today
dropped that patch so could you please test next -mm and let me know?

Alternatively, revert ftp://ftp.kernel.org/pub/linux/kernel/people/akpm/patches/2.6/2.6.9-rc1/2.6.9-rc1-mm4/broken-out/journal_clean_checkpoint_list-latency-fix.patch

Thanks.

2004-09-08 09:23:58

by Stephen C. Tweedie

Subject: Re: 2.6.9-rc1-mm4 kjournald oops (repeatable)

Hi,

On Wed, 2004-09-08 at 10:04, Andrew Morton wrote:

> > Unable to handle kernel paging request at virtual address 6b6b6b93
> > ...
> > EIP: 0060:[__journal_clean_checkpoint_list+199/240] Not tainted VLI
>
> This might have been caused by a fishy latency-reduction patch. I today
> dropped that patch so could you please test next -mm and let me know?

That, or preempt. If the next -mm still breaks, time to hunt for the
preempt problem, I guess.

--Stephen


2004-09-08 17:09:25

by Richard A Nelson

Subject: Re: 2.6.9-rc1-mm4 kjournald oops (repeatable)

On Wed, 8 Sep 2004, Andrew Morton wrote:

> Richard A Nelson <[email protected]> wrote:
> >
> > I've received a few of these already - always during *very* heavy disk
> > activity. After the Oops, the disk becomes strangely idle :), and a reboot
> > is required.
> >
> > Unable to handle kernel paging request at virtual address 6b6b6b93
> > ...
> > EIP: 0060:[__journal_clean_checkpoint_list+199/240] Not tainted VLI
>
> This might have been caused by a fishy latency-reduction patch. I today
> dropped that patch so could you please test next -mm and let me know?
>
> Alternatively, revert ftp://ftp.kernel.org/pub/linux/kernel/people/akpm/patches/2.6/2.6.9-rc1/2.6.9-rc1-mm4/broken-out/journal_clean_checkpoint_list-latency-fix.patch

Reverted and building now; will reboot and test ASAP.
--
Rick Nelson
<Addi> Alter.net seems to have replaced one of its router with a zucchini.

2004-09-08 17:11:50

by Richard A Nelson

Subject: Re: 2.6.9-rc1-mm4 kjournald oops (repeatable)

On Wed, 8 Sep 2004, Stephen C. Tweedie wrote:

> On Wed, 2004-09-08 at 10:04, Andrew Morton wrote:
>
> > > Unable to handle kernel paging request at virtual address 6b6b6b93
> > > ...
> > > EIP: 0060:[__journal_clean_checkpoint_list+199/240] Not tainted VLI
> >
> > This might have been caused by a fishy latency-reduction patch. I today
> > dropped that patch so could you please test next -mm and let me know?
>
> That, or preempt. If the next -mm still breaks, time to hunt for the
> preempt problem, I guess.

Ok, if it still fails (I'll have to wait until this afternoon for the
true test - dpkg breaks it every time), I'll check out preempt.

Thanks,
--
Rick Nelson
<Knghtbrd> learn to love Window Maker.
<Knghtbrd> a little NeXTStep is good for the soul.

2004-09-08 23:09:12

by Richard A Nelson

Subject: Re: 2.6.9-rc1-mm4 kjournald oops (repeatable)

On Wed, 8 Sep 2004, Richard A Nelson wrote:

> On Wed, 8 Sep 2004, Stephen C. Tweedie wrote:
>
> > On Wed, 2004-09-08 at 10:04, Andrew Morton wrote:
> >
> > > > Unable to handle kernel paging request at virtual address 6b6b6b93
> > > > ...
> > > > EIP: 0060:[__journal_clean_checkpoint_list+199/240] Not tainted VLI
> > >
> > > This might have been caused by a fishy latency-reduction patch. I today
> > > dropped that patch so could you please test next -mm and let me know?
> >
> > That, or preempt. If the next -mm still breaks, time to hunt for the
> > preempt problem, I guess.
>
> Ok, if it still fails (I'll have to wait until this afternoon for the
> true test - dpkg breaks it every time), I'll check out preempt.

Well, it looks like backing out the patch was sufficient: I've made it
through the torture that is a dpkg install (70+ MB).

So we needn't (at this time) look to preempt.

--
Rick Nelson
<LackOfKan> What are 'bots'?
<``Erik> rsg is a bot, not a human, not a human usable client, just a bot.
<``Erik> about the same as a quake bot, except irc bots are (usually)
built to help, not shoot your ass full of holes

2004-09-08 23:16:36

by Lee Revell

Subject: Re: 2.6.9-rc1-mm4 kjournald oops (repeatable)

On Wed, 2004-09-08 at 19:07, Richard A Nelson wrote:
> On Wed, 8 Sep 2004, Richard A Nelson wrote:
>
> > On Wed, 8 Sep 2004, Stephen C. Tweedie wrote:
> >
> > > On Wed, 2004-09-08 at 10:04, Andrew Morton wrote:
> > >
> > > > > Unable to handle kernel paging request at virtual address 6b6b6b93
> > > > > ...
> > > > > EIP: 0060:[__journal_clean_checkpoint_list+199/240] Not tainted VLI
> > > >
> > > > This might have been caused by a fishy latency-reduction patch. I today
> > > > dropped that patch so could you please test next -mm and let me know?
> > >
> > > That, or preempt. If the next -mm still breaks, time to hunt for the
> > > preempt problem, I guess.
> >
> > Ok, if it still fails (I'll have to wait until this afternoon for the
> > true test - dpkg breaks it every time), I'll check out preempt.
>
> Well, it looks like backing out the patch was sufficient, I've made it
> through the torture that is a dpkg install (70+meg).
>
> So we needn't (at this time) look to preempt.

Hmm, I have been running this patch for weeks as part of the voluntary
preemption patches, and put it through every torture test I can think
of, with nary an Oops. None of the other VP testers have reported
problems either. Maybe this is some interaction between that patch and
something else in -mm.

Lee

2004-09-08 23:56:58

by Richard A Nelson

Subject: Re: 2.6.9-rc1-mm4 kjournald oops (repeatable)

On Wed, 8 Sep 2004, Lee Revell wrote:

> On Wed, 2004-09-08 at 19:07, Richard A Nelson wrote:
> > On Wed, 8 Sep 2004, Richard A Nelson wrote:
> >
> > > On Wed, 8 Sep 2004, Stephen C. Tweedie wrote:
> > >
> > > > On Wed, 2004-09-08 at 10:04, Andrew Morton wrote:
> > > >
> > > > > > Unable to handle kernel paging request at virtual address 6b6b6b93
> > > > > > ...
> > > > > > EIP: 0060:[__journal_clean_checkpoint_list+199/240] Not tainted VLI
> > > > >
> > > > > This might have been caused by a fishy latency-reduction patch. I today
> > > > > dropped that patch so could you please test next -mm and let me know?
> > > >
> > > > That, or preempt. If the next -mm still breaks, time to hunt for the
> > > > preempt problem, I guess.
> > >
> > > Ok, if it still fails (I'll have to wait until this afternoon for the
> > > true test - dpkg breaks it every time), I'll check out preempt.
> >
> > Well, it looks like backing out the patch was sufficient, I've made it
> > through the torture that is a dpkg install (70+meg).
> >
> > So we needn't (at this time) look to preempt.
>
> Hmm, I have been running this patch for weeks as part of the voluntary
> preemption patches, and put it through every torture test I can think
> of, with nary an Oops. None of the other VP testers have reported
> problems either. Maybe this is some interaction between that patch and
> something else in -mm.

Interestingly, I notice Zwane had a very similar oops, posted on the
7th: "Oops in __journal_clean_checkpoint_list".
He also had preempt enabled...

I've found upgrading my Debian system using dselect to be a *very* good
stress test of the filesystem...

If you have candidates, I'll try to test them - I've typically had no
problem reproducing the issue :)

--
Rick Nelson
* Equivalent code is available from RSA Data Security, Inc.
* This code has been tested against that, and is equivalent,
* except that you don't need to include two pages of legalese
* with every copy.
-- public domain MD5 source

2004-09-09 20:01:51

by Bongani Hlope

Subject: Re: 2.6.9-rc1-mm4 kjournald oops (repeatable)

On Wed, 8 Sep 2004 16:51:35 -0700 (PDT)
Richard A Nelson <[email protected]> wrote:

> On Wed, 8 Sep 2004, Lee Revell wrote:

8<

> >
> > Hmm, I have been running this patch for weeks as part of the voluntary
> > preemption patches, and put it through every torture test I can think
> > of, with nary an Oops. None of the other VP testers have reported
> > problems either. Maybe this is some interaction between that patch and
> > something else in -mm.
>
> Interestingly, I notice Zwane had a very similar oops, posted on the
> 7th: Oops in __journal_clean_checkpoint_list
> He also had preempt enabled...
>
> I've found upgrading my Debian system using dselect to be a *very* good
> stress test of the filesystem...
>
> If you have candidates, I'll try to test them - I've typically had no
> problem reproducing the issue :)
>

OK, it seems I'm not the only one. I've been trying to find this for a
while. It seems to happen on 2.6.9rc1-mm[24] kernels (I haven't tried
mm[13]). I was only able to capture the Oops this morning (pen and
paper). I also have preempt enabled. This only happens on my PII, though
(Mandrake cooker updates and kernel compiles); my dual Opteron has been
running this since last night without any problems (gentoo sync and
kernel compile), also with preempt.



2004-09-09 20:25:27

by Stephen C. Tweedie

Subject: Re: 2.6.9-rc1-mm4 kjournald oops (repeatable)

Hi,

On Thu, 2004-09-09 at 22:56, Bongani Hlope wrote:

> OK, it seems I'm not the only one. I've been trying to find this for a
> while. It seems to happen on 2.6.9rc1-mm[24] kernels (I haven't tried
> mm[13] ). I was only able to capture the Oops this morning (pen and
> paper) I also have preempt enabled. This only happens on my PII though
> (Mandrake cooker updates and kernel compiles), my dual opteron has
> been running this since last night without any problems (gentoo sync
> and kernel compile), also with preempt

The journal_clean_checkpoint_list-latency-fix.patch was added in
2.6.9rc1-mm2 and is still there in mm4, so your problem is also
consistent with a bug in that patch; could you try backing that one diff
out and seeing if it fixes it for you too?

Thanks,
Stephen

2004-09-09 20:33:17

by Lee Revell

Subject: Re: 2.6.9-rc1-mm4 kjournald oops (repeatable)

On Thu, 2004-09-09 at 16:20, Stephen C. Tweedie wrote:
> Hi,
>
> On Thu, 2004-09-09 at 22:56, Bongani Hlope wrote:
>
> > OK, it seems I'm not the only one. I've been trying to find this for a
> > while. It seems to happen on 2.6.9rc1-mm[24] kernels (I haven't tried
> > mm[13] ). I was only able to capture the Oops this morning (pen and
> > paper) I also have preempt enabled. This only happens on my PII though
> > (Mandrake cooker updates and kernel compiles), my dual opteron has
> > been running this since last night without any problems (gentoo sync
> > and kernel compile), also with preempt
>
> The journal_clean_checkpoint_list-latency-fix.patch was added in
> 2.6.9rc1-mm2 and is still there in mm4, so your problem is also
> consistent with a bug in that patch; could you try backing that one diff
> out and seeing if it fixes it for you too?
>

This is not in fact the same journal_clean_checkpoint latency fix that
is in the VP patches; it looks like that one is just a simple lock break.
So disregard my previous comment: all the evidence does in fact point
to journal_clean_checkpoint_list-latency-fix.patch.

Lee
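
For readers unfamiliar with the term used above, a "lock break" means dropping
a lock part-way through a long scan so that other threads can run, then
taking it again. The sketch below shows the general pattern in user-space
Python; it is not the code of either patch, and the names CheckedList,
scan_with_lock_break, batch, and the generation counter are assumptions made
for illustration. What it tries to show is that any position remembered
across the break may be stale, so the scan checks whether the list changed
and restarts from the head if it did. Whether the -mm patch failed in this
way is not established in the thread.

```python
import threading


class CheckedList:
    """A list guarded by a lock, with a counter bumped on every change."""

    def __init__(self, items):
        self.lock = threading.Lock()
        self.items = list(items)
        self.generation = 0

    def remove_at(self, index):
        # Caller must hold self.lock.
        self.generation += 1
        return self.items.pop(index)


def scan_with_lock_break(lst, is_clean, batch=2):
    """Remove "clean" entries, releasing the lock every `batch` steps."""
    removed = []
    with lst.lock:
        i = 0
        steps = 0
        while i < len(lst.items):
            if is_clean(lst.items[i]):
                removed.append(lst.remove_at(i))
            else:
                i += 1
            steps += 1
            if steps % batch == 0:
                seen = lst.generation
                lst.lock.release()   # lock break: others may take the lock
                lst.lock.acquire()
                if lst.generation != seen:
                    # The list changed while unlocked, so the saved index
                    # can no longer be trusted: restart from the head.
                    i = 0
    return removed


lst = CheckedList([1, 2, 3, 4, 5, 6])
print(scan_with_lock_break(lst, lambda x: x % 2 == 0), lst.items)
```

A scan that kept following a saved pointer after re-taking the lock, without
such a check, could walk into an entry that another thread had freed in the
meantime.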

2004-09-09 20:37:33

by Bongani Hlope

Subject: Re: 2.6.9-rc1-mm4 kjournald oops (repeatable)

On 09 Sep 2004 21:20:40 +0100
"Stephen C. Tweedie" <[email protected]> wrote:

> Hi,
>
> On Thu, 2004-09-09 at 22:56, Bongani Hlope wrote:
>
> > OK, it seems I'm not the only one. I've been trying to find this for a
> > while. It seems to happen on 2.6.9rc1-mm[24] kernels (I haven't tried
> > mm[13] ). I was only able to capture the Oops this morning (pen and
> > paper) I also have preempt enabled. This only happens on my PII though
> > (Mandrake cooker updates and kernel compiles), my dual opteron has
> > been running this since last night without any problems (gentoo sync
> > and kernel compile), also with preempt
>
> The journal_clean_checkpoint_list-latency-fix.patch was added in
> 2.6.9rc1-mm2 and is still there in mm4, so your problem is also
> consistent with a bug in that patch; could you try backing that one diff
> out and seeing if it fixes it for you too?
>
> Thanks,
> Stephen
>

Busy compiling; I'll let you know how it goes.
Thanks

