2007-01-31 10:57:00

by Mike Galbraith

Subject: 2.6.19.2 oops after resume from ram (corruption?)

Greetings,

I received the below upon first poke of firefox icon after a resume.

See attachment (evolution refuses to inline it).

BUG: unable to handle kernel NULL pointer dereference at virtual address 00000002
printing eip:
c109a7cf
*pde = 00000000
Oops: 0000 [#1]
PREEMPT SMP
Modules linked in: xt_pkttype ipt_LOG xt_limit snd_pcm_oss snd_mixer_oss eeprom snd_seq_midi snd_seq_midi_event snd_seq edd button battery ac ip6t_REJECT xt_tcpudp ipt_REJECT xt_state iptable_mangle iptable_nat ip_nat iptable_filter ip6table_mangle ip_conntrack nfnetlink ip_tables ip6table_filter ip6_tables x_tables nls_iso8859_1 nls_cp437 nls_utf8 snd_mpu401 snd_mpu401_uart snd_rawmidi snd_seq_device ohci1394 ieee1394 prism54 snd_intel8x0 snd_ac97_codec snd_ac97_bus snd_pcm snd_timer snd soundcore snd_page_alloc intel_agp agpgart i2c_i801 sd_mod fan thermal processor
CPU: 0
EIP: 0060:[<c109a7cf>] Not tainted VLI
EFLAGS: 00010246 (2.6.19.2-smp #90)
EIP is at inotify_inode_queue_event+0x51/0xd1
eax: c1599288 ebx: 00000fc6 ecx: 00000000 edx: 00000002
esi: c1599280 edi: fffffffa ebp: ef38bf58 esp: ef38bf28
ds: 007b es: 007b ss: 0068
Process klauncher (pid: 6283, ti=ef38b000 task=dff91030 task.ti=ef38b000)
Stack: dfc998c0 c1e4f1c0 ef38bf58 00000000 00000020 f346ac68 00000000 0000000c
f346ac60 dba1cd50 f346cf70 f346ab28 ef38bf80 c109aea9 dba1cdb4 ec421998
00000000 00000020 dba1cd58 00000020 ea829000 0000000c ef38bfa8 c1070f3b
Call Trace:
[<c109aea9>] inotify_dentry_parent_queue_event+0x69/0xa0
[<c1070f3b>] do_sys_open+0x83/0xc5
[<c1070fb5>] sys_open+0x1c/0x1e
[<c10030d9>] sysenter_past_esp+0x56/0x79
[<b7f9f410>] 0xb7f9f410
=======================
Code: 5e 5f 5d c3 8d 83 40 01 00 00 89 45 e4 e8 5a ee 2f 00 8b b3 38 01 00 00 83 ee 08 8b 56 08 8d 46 08 39 45 f0 74 69 8d 7a f8 eb 10 <8b> 57 08 8d 47 08 3b 45 f0 74 59 89 fe 8d 7a f8 8b 5e 20 85 5d
EIP: [<c109a7cf>] inotify_inode_queue_event+0x51/0xd1 SS:ESP 0068:ef38bf28



Attachments:
2.6.19.2-oops (7.33 kB)
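
A rough cross-check of the oops above, decoded by hand from the Code: line (so treat the disassembly as an assumption, not objdump output): the bytes `8d 7a f8` appear to be `lea -0x8(%edx),%edi`, and the faulting bytes `<8b> 57 08` appear to be `mov 0x8(%edi),%edx`. Under that reading, the register dump reproduces the reported fault address. The variable names below are illustrative only.

```python
# Register values copied from the oops register dump.
edx = 0x00000002
edi = 0xfffffffa
MASK32 = 0xffffffff  # i386 registers wrap at 32 bits

# Assumed decode of "8d 7a f8": lea -0x8(%edx),%edi
computed_edi = (edx - 0x8) & MASK32

# Assumed decode of the faulting "<8b> 57 08": mov 0x8(%edi),%edx
fault_address = (edi + 0x8) & MASK32

print(hex(computed_edi), hex(fault_address))
```

If the decode is right, edi was derived from a pointer value of 0x2 and the load at 0x8(%edi) wrapped around to address 00000002, matching the BUG line. That would point at a bogus list pointer rather than a plain NULL, which fits the corruption guess in the subject.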

2007-01-31 21:42:08

by Nigel Cunningham

Subject: Re: 2.6.19.2 oops after resume from ram (corruption?)

Hi.

On Wed, 2007-01-31 at 11:56 +0100, Mike Galbraith wrote:
> Greetings,
>
> I received the below upon first poke of firefox icon after a resume.

Are you able to reproduce it reliably? Failing that, could you try
enabling some of the kernel configuration options that help with debugging
memory corruption (slab corruption checking in particular will probably
be the most useful thing here)?

Regards,

Nigel


2007-02-01 05:30:46

by Mike Galbraith

Subject: Re: 2.6.19.2 oops after resume from ram (corruption?)

On Thu, 2007-02-01 at 08:42 +1100, Nigel Cunningham wrote:
> Hi.
>
> On Wed, 2007-01-31 at 11:56 +0100, Mike Galbraith wrote:
> > Greetings,
> >
> > I received the below upon first poke of firefox icon after a resume.
>
> Are you able to reproduce it reliably? Failing that, could you try
> enabling some of the kernel configuration options that help with debugging
> memory corruption (slab corruption checking in particular will probably
> be the most useful thing here)?

No, it's a never-before-seen event. That said, I have had a couple of
dead-box-after-resume events with other kernels in the last few months,
so I may have had corruption of a more deadly variety. Unfortunately,
when I'm resuming, my serial console box is almost guaranteed to be off.

Rebuilding this particular kernel with slab debugging would probably be
a waste of time since stable kernels get very little runtime here, but
I'll re-add it to my config for test kernels just in case a survivable
event should happen.

-Mike

2007-02-01 05:39:47

by Nigel Cunningham

Subject: Re: 2.6.19.2 oops after resume from ram (corruption?)

Hi.

On Thu, 2007-02-01 at 06:30 +0100, Mike Galbraith wrote:
> Rebuilding this particular kernel with slab debugging would probably be
> a waste of time since stable kernels get very little runtime here, but
> I'll re-add it to my config for test kernels just in case a survivable
> event should happen.

Thanks, Mike.

Regards,

Nigel

2007-02-04 15:43:44

by Pavel Machek

Subject: Re: 2.6.19.2 oops after resume from ram (corruption?)

Hi!

> I received the below upon first poke of firefox icon after a resume.
>
> See attachment (evolution refuses to inline it).

Is it repeatable? You may want to try with a smaller set of
modules... prism54 is quite unusual...

Pavel

> BUG: unable to handle kernel NULL pointer dereference at virtual address 00000002
> printing eip:
> c109a7cf
> *pde = 00000000
> Oops: 0000 [#1]
> PREEMPT SMP
> Modules linked in: xt_pkttype ipt_LOG xt_limit snd_pcm_oss snd_mixer_oss eeprom snd_seq_midi snd_seq_midi_event snd_seq edd button battery ac ip6t_REJECT xt_tcpudp ipt_REJECT xt_state iptable_mangle iptable_nat ip_nat iptable_filter ip6table_mangle ip_conntrack nfnetlink ip_tables ip6table_filter ip6_tables x_tables nls_iso8859_1 nls_cp437 nls_utf8 snd_mpu401 snd_mpu401_uart snd_rawmidi snd_seq_device ohci1394 ieee1394 prism54 snd_intel8x0 snd_ac97_codec snd_ac97_bus snd_pcm snd_timer snd soundcore snd_page_alloc intel_agp agpgart i2c_i801 sd_mod fan thermal processor
> CPU: 0


--
(english) http://www.livejournal.com/~pavelmachek
(cesky, pictures) http://atrey.karlin.mff.cuni.cz/~pavel/picture/horses/blog.html

2007-02-05 06:55:28

by Mike Galbraith

Subject: Re: 2.6.19.2 oops after resume from ram (corruption?)

On Sun, 2007-02-04 at 16:43 +0100, Pavel Machek wrote:
> Hi!
>
> > I received the below upon first poke of firefox icon after a resume.
> >
> > See attachment (evolution refuses to inline it).
>
> Is it repeatable? You may want to try with a smaller set of
> modules... prism54 is quite unusual...

Nope, rogue event... so far. Prism54 is my wlan card. (useless dang
thing, can't convince it to do encryption)

-Mike

2007-02-05 07:19:35

by Luming Yu

Subject: Re: 2.6.19.2 oops after resume from ram (corruption?)

On 2/1/07, Mike Galbraith wrote:
> On Thu, 2007-02-01 at 08:42 +1100, Nigel Cunningham wrote:
> > Hi.
> >
> > On Wed, 2007-01-31 at 11:56 +0100, Mike Galbraith wrote:
> > > Greetings,
> > >
> > > I received the below upon first poke of firefox icon after a resume.
> >
> > Are you able to reproduce it reliably? Failing that, could you try
> > enabling some of the kernel configuration options that help with debugging
> > memory corruption (slab corruption checking in particular will probably
> > be the most useful thing here)?
>
> No, it's a never-before-seen event. That said, I have had a couple of
> dead-box-after-resume events with other kernels in the last few months,
> so I may have had corruption of a more deadly variety. Unfortunately,
> when I'm resuming, my serial console box is almost guaranteed to be off.

If you have a dead serial console, or no serial console at all on your
laptop, you could probably try FireWire as an alternative:
http://www.suse.de/~bk/firewire/
Ah, Linux S3 resume is still a problem.