2006-11-13 08:15:10

by Martin Lorenz

Subject: paging request BUG in 2.6.19-rc5 on resume - X60s

Hello again,

here is another one:

I reported a black screen on resume with my latest kernel build earlier. But
this was not reproducible. It only occurred once.

BUT I suspended with the ipw3945 module loaded once again now and got a BUG
report in the log instead of a black screen.

I only see this when ipw3945 is loaded.

[226156.057000] BUG: unable to handle kernel paging request at virtual address 756e6567
[226156.057000] printing eip:
[226156.057000] c016ffb7
[226156.057000] *pde = 00000000
[226156.057000] Oops: 0000 [#1]
[226156.057000] SMP
[226156.057000] Modules linked in: tun ipw3945 ieee80211 ieee80211_crypt
nls_iso8859_1 nls_cp437 vfat fat usb_storage snd_hda_intel snd_hda_codec
snd_pcm_oss snd_mixer_oss snd_pcm snd_timer snd soundcore snd_page_alloc
vmnet(P) vmmon(P) i915 binfmt_misc nfs nfsd exportfs lockd nfs_acl sunrpc
cpufreq_ondemand container video thermal i2c_ec fan dock button battery ac
mmc_block speedstep_centrino freq_table processor ibm_acpi sbp2 nvram
eth1394 irtty_sir sir_dev pcmcia ehci_hcd uhci_hcd firmware_class nsc_ircc
generic usbcore psmouse irda ohci1394 ieee1394 sdhci ide_core yenta_socket
rsrc_nonstatic pcmcia_core serio_raw crc_ccitt pcspkr mmc_core evdev
[226156.058000] CPU: 1
[226156.058000] EIP: 0060:[<c016ffb7>] Tainted: P VLI
[226156.058000] EFLAGS: 00010282 (2.6.19-rc5+ieee80211+e1000-45.3+1909-g6a4abeae-dirty #1)
[226156.058000] EIP is at iput+0xd/0x66
[226156.058000] eax: 756e6547 ebx: c0416e10 ecx: c016ee14 edx: c55c7114
[226156.058000] esi: c046f1c0 edi: c046f21c ebp: f7feb800 esp: dcfbfde4
[226156.058000] ds: 007b es: 007b ss: 0068
[226156.058000] Process mount (pid: 22076, ti=dcfbe000 task=f7df1550 task.ti=dcfbe000)
[226156.058000] Stack: c046f21c c016ef85 c046f244 c046f1c0 c016f2e0 fffffff3
00000000 f7feb800
[226156.058000] c8b73000 c01619bb 00000000 f7feb83c 00000000 f7feb800
00000000 c0172f49
[226156.058000] 00000000 c8b73000 00000000 e613a000 dcfb0000 00000444
00000020 0cf68720
[226156.058000] Call Trace:
[226156.058000] [<c016ef85>] prune_one_dentry+0x53/0x74
[226156.058000] [<c016f2e0>] shrink_dcache_sb+0x8f/0xb3
[226156.058000] [<c01619bb>] do_remount_sb+0x40/0x120
[226156.058000] [<c0172f49>] do_mount+0x1b0/0x66c
[226156.058000] [<c017347c>] sys_mount+0x77/0xb3
[226156.058000] [<c0102dc7>] syscall_call+0x7/0xb
[226156.058000] DWARF2 unwinder stuck at syscall_call+0x7/0xb
[226156.058000]
[226156.058000] Leftover inexact backtrace:
[226156.058000]
[226156.058000] =======================
[226156.058000] Code: ba 03 00 00 00 e9 ee fc fb ff 83 a0 2c 01 00 00 b7 e9
e0 ff ff ff e8 d1 3e 17 00 31 c0 c3 53 89 c3 85 c0 74 5d 8b 80 98 00 00 00
<8b> 40 20 83 bb 2c 01 00 00 20 75 08 0f 0b 5d 04 dc 61 30 c0 85
[226156.058000] EIP: [<c016ffb7>] iput+0xd/0x66 SS:ESP 0068:dcfbfde4
[226156.058000] <7>bridge-eth2: disabling the bridge
[226206.083000] bridge-eth2: down
[226206.190000] ACPI: PCI interrupt for device 0000:03:00.0 disabled
[226206.258000] ieee80211_crypt: unregistered algorithm 'NULL'

The dmesg output and log are at http://www.lorenz.eu.org/~mlo/kernel/

http://www.lorenz.eu.org/~mlo/kernel/dmesg-2.6.19-rc5+ieee80211+e1000-45.3+1909-g6a4abeae-dirty-resume.out

http://www.lorenz.eu.org/~mlo/kernel/messages-2.6.19-rc5+ieee80211+e1000-45.3+1909-g6a4abeae-dirty-resume
(this one includes SysRq-t output)


Regards,
mlo
--
Dipl.-Ing. Martin Lorenz

They that can give up essential liberty
to obtain a little temporary safety
deserve neither liberty nor safety.
Benjamin Franklin

please encrypt your mail to me
GnuPG key-ID: F1AAD37D
get it here:
http://blackhole.pca.dfn.de:11371/pks/lookup?op=get&search=0xF1AAD37D

ICQ UIN: 33588107


2006-11-13 13:54:19

by Mike Galbraith

Subject: Re: paging request BUG in 2.6.19-rc5 on resume - X60s

On Mon, 2006-11-13 at 09:11 +0100, Martin Lorenz wrote:
> Hello again,
>
> here is another one:
>
> I reported a black screen on resume with my latest kernel build earlier. But
> this was not reproducible. It only occurred once.
>
> BUT I suspended with the ipw3945 module loaded once again now and got a BUG
> report in the log instead of a black screen.
>
> I only see this when ipw3945 is loaded.

Interesting oops... another one trying to dereference "Genu".

Repeatable? Repeatable without vmware modules ever having been loaded?
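
[Editor's note] The "Genu" remark can be checked directly against the oops. The sketch below is not from the thread. It assumes only the eax value and fault address quoted in the report, and the variable names are illustrative. It decodes eax as a little-endian 32-bit word and compares it with the faulting address.

```python
import struct

# Values copied from the oops in the original report.
eax = 0x756E6547         # "eax:" in the register dump
fault_addr = 0x756E6567  # "unable to handle kernel paging request at virtual address"

# x86 is little-endian, so pack the register the way it would sit in memory.
as_text = struct.pack("<I", eax).decode("ascii")
offset = fault_addr - eax

print(as_text)      # Genu
print(hex(offset))  # 0x20
```

The 0x20 difference matches the displacement in the marked faulting instruction, `<8b> 40 20`, which decodes to `mov 0x20(%eax),%eax`. The same four bytes also begin the "GenuineIntel" CPU vendor string, although nothing in the report by itself shows where the value came from.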

> [226156.057000] BUG: unable to handle kernel paging request at virtual
> address 756e6567
> [226156.057000] printing eip:
> [226156.057000] c016ffb7
> [226156.057000] *pde = 00000000
> [226156.057000] Oops: 0000 [#1]
> [226156.057000] SMP
> [226156.057000] Modules linked in: tun ipw3945 ieee80211 ieee80211_crypt
> nls_iso8859_1 nls_cp437 vfat fat usb_storage snd_hda_intel snd_hda_codec
> snd_pcm_oss snd_mixer_oss snd_pcm snd_timer snd soundcore snd_page_alloc
> vmnet(P) vmmon(P) i915 binfmt_misc nfs nfsd exportfs lockd nfs_acl sunrpc
> cpufreq_ondemand container video thermal i2c_ec fan dock button battery ac
> mmc_block speedstep_centrino freq_table processor ibm_acpi sbp2 nvram
> eth1394 irtty_sir sir_dev pcmcia ehci_hcd uhci_hcd firmware_class nsc_ircc
> generic usbcore psmouse irda ohci1394 ieee1394 sdhci ide_core yenta_socket
> rsrc_nonstatic pcmcia_core serio_raw crc_ccitt pcspkr mmc_core evdev
> [226156.058000] CPU: 1
> [226156.058000] EIP: 0060:[<c016ffb7>] Tainted: P VLI
> [226156.058000] EFLAGS: 00010282
> (2.6.19-rc5+ieee80211+e1000-45.3+1909-g6a4abeae-dirty #1)
> [226156.058000] EIP is at iput+0xd/0x66
> [226156.058000] eax: 756e6547 ebx: c0416e10 ecx: c016ee14 edx:
> c55c7114
> [226156.058000] esi: c046f1c0 edi: c046f21c ebp: f7feb800 esp:
> dcfbfde4
> [226156.058000] ds: 007b es: 007b ss: 0068
> [226156.058000] Process mount (pid: 22076, ti=dcfbe000 task=f7df1550
> task.ti=dcfbe000)
> [226156.058000] Stack: c046f21c c016ef85 c046f244 c046f1c0 c016f2e0 fffffff3
> 00000000 f7feb800
> [226156.058000] c8b73000 c01619bb 00000000 f7feb83c 00000000 f7feb800
> 00000000 c0172f49
> [226156.058000] 00000000 c8b73000 00000000 e613a000 dcfb0000 00000444
> 00000020 0cf68720
> [226156.058000] Call Trace:
> [226156.058000] [<c016ef85>] prune_one_dentry+0x53/0x74
> [226156.058000] [<c016f2e0>] shrink_dcache_sb+0x8f/0xb3
> [226156.058000] [<c01619bb>] do_remount_sb+0x40/0x120
> [226156.058000] [<c0172f49>] do_mount+0x1b0/0x66c
> [226156.058000] [<c017347c>] sys_mount+0x77/0xb3
> [226156.058000] [<c0102dc7>] syscall_call+0x7/0xb
> [226156.058000] DWARF2 unwinder stuck at syscall_call+0x7/0xb
> [226156.058000]
> [226156.058000] Leftover inexact backtrace:
> [226156.058000]
> [226156.058000] =======================
> [226156.058000] Code: ba 03 00 00 00 e9 ee fc fb ff 83 a0 2c 01 00 00 b7 e9
> e0 ff ff ff e8 d1 3e 17 00 31 c0 c3 53 89 c3 85 c0 74 5d 8b 80 98 00 00 00
> <8b> 40 20 83 bb 2c 01 00 00 20 75 08 0f 0b 5d 04 dc 61 30 c0 85

Per ksymoops, that code is:
0: ba 03 00 00 00 mov $0x3,%edx
5: e9 ee fc fb ff jmp fffbfcf8 <_EIP+0xfffbfcf8>
a: 83 a0 2c 01 00 00 b7 andl $0xffffffb7,0x12c(%eax)
11: e9 00 00 00 00 jmp 16 <_EIP+0x16>

There is no such andl with an offset of 0x12c and that mask (I_LOCK|I_NEW?)
anywhere in my kernel or modules. How about yours?
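
[Editor's note] The guess about the mask can be verified with a little arithmetic. This is a sketch only: the flag values I_LOCK = 8 and I_NEW = 64 are assumed from include/linux/fs.h of kernels of that era and are not quoted anywhere in the thread.

```python
# Assumed inode state flag values (not taken from the thread).
I_LOCK = 8
I_NEW = 64

# Clearing both flags means AND-ing with the complement, truncated to 32 bits.
mask = ~(I_LOCK | I_NEW) & 0xFFFFFFFF

# The instruction bytes "83 a0 2c 01 00 00 b7" carry an 8-bit immediate (0xb7)
# that the CPU sign-extends to 32 bits before the AND.
imm8 = 0xB7
sign_extended = (imm8 - 0x100 if imm8 & 0x80 else imm8) & 0xFFFFFFFF

print(hex(mask))           # 0xffffffb7
print(hex(sign_extended))  # 0xffffffb7
```

Under those assumed values, the mask in the disassembly is exactly ~(I_LOCK|I_NEW), which is consistent with code that clears both flags on an inode.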

-Mike

2006-11-13 13:56:37

by Sven-Haegar Koch

Subject: Re: paging request BUG in 2.6.19-rc5 on resume - X60s

On Mon, 13 Nov 2006, Martin Lorenz wrote:

> here is another one:
>
> I reported a black screen on resume with my latest kernel build earlier. But
> this was not reproducible. It only occurred once.
>
> BUT I suspended with the ipw3945 module loaded once again now and got a BUG
> report in the log instead of a black screen.

I get nearly the same oopses on my ThinkPad T60, too.
They always appear only after resuming (never after a clean reboot), and after
the (otherwise successful) resume it can take hours until the oops shows up.

I have not reported this problem anywhere yet, because I am using a heavily
modified 2.6.17 based on the Ubuntu Edgy tree plus lots of add-on patches
(suspend2, linux-vserver, loop-aes), and most of the time with the evil
fglrx loaded, too (to get any X11 output at all).

> I only see this when ipw3945 is loaded.

I will try to shut down wireless and unload the module before the next
suspend, and see if it helps.

c'ya
sven

--

The Internet treats censorship as a routing problem, and routes around it.
(John Gilmore on http://www.cygnus.com/~gnu/)

2006-11-13 14:39:44

by Rafael J. Wysocki

Subject: Re: paging request BUG in 2.6.19-rc5 on resume - X60s

On Monday, 13 November 2006 09:11, Martin Lorenz wrote:
> Hello again,
>
> here is another one:
>
> I reported a black screen on resume with my latest kernel build earlier. But
> this was not reproducible. It only occurred once.

Is this a resume from disk? If so, which kernel are you using?

Rafael


--
You never change things by fighting the existing reality.
R. Buckminster Fuller

2006-11-13 19:30:44

by Martin Lorenz

Subject: Re: [ltp] Re: paging request BUG in 2.6.19-rc5 on resume - X60s

On Mon, Nov 13, 2006 at 03:37:01PM +0100, Rafael J. Wysocki wrote:
> On Monday, 13 November 2006 09:11, Martin Lorenz wrote:
> > Hello again,
> >
> > here is another one:
> >
> > I reported a black screen on resume with my latest kernel build earlier. But
> > this was not reproducible. It only occurred once.
>
> Is this a resume from disk? If so, which kernel are you using?
>

No, from suspend to RAM.

Linux gimli 2.6.19-rc5+ieee80211+e1000-45.3+1909-g6a4abeae-dirty #1 SMP Wed
Nov 8 20:14:31 CET 2006 i686 GNU/Linux


Regards,
mlo

2006-11-13 19:35:24

by Martin Lorenz

Subject: Re: [ltp] Re: paging request BUG in 2.6.19-rc5 on resume - X60s

On Mon, Nov 13, 2006 at 02:55:18PM +0100, Mike Galbraith wrote:
>
> Per ksymoops, that code is:
> 0: ba 03 00 00 00 mov $0x3,%edx
> 5: e9 ee fc fb ff jmp fffbfcf8 <_EIP+0xfffbfcf8>
> a: 83 a0 2c 01 00 00 b7 andl $0xffffffb7,0x12c(%eax)
> 11: e9 00 00 00 00 jmp 16 <_EIP+0x16>
>
> There is no such andl with an offset of 0x12c and that mask (I_LOCK|I_NEW?)
> anywhere in my kernel or modules. How about yours?

$ objdump -D vmlinux | grep -5 'andl $0xffffffb7,0x12c'
c016ff87: 05 2c 01 00 00 add $0x12c,%eax
c016ff8c: ba 03 00 00 00 mov $0x3,%edx
c016ff91: e9 ee fc fb ff jmp c012fc84 <wake_up_bit>

c016ff96 <unlock_new_inode>:
c016ff96: 83 a0 2c 01 00 00 b7 andl $0xffffffb7,0x12c(%eax)
c016ff9d: e9 e0 ff ff ff jmp c016ff82 <wake_up_inode>

c016ffa2 <inode_wait>:
c016ffa2: e8 d1 3e 17 00 call c02e3e78 <schedule>
c016ffa7: 31 c0 xor %eax,%eax
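
[Editor's note] The objdump output shows that the andl belongs to unlock_new_inode, not to iput. The "Code:" line of an oops dumps bytes on both sides of EIP, so it can cover the tail of neighbouring functions. The sketch below (not from the thread; the helper name address_of is made up for illustration) maps the dumped bytes back to addresses using only the EIP and the Code: line from the report.

```python
# "Code:" line from the oops, with the mail line wraps removed.
code = (
    "ba 03 00 00 00 e9 ee fc fb ff 83 a0 2c 01 00 00 b7 e9 "
    "e0 ff ff ff e8 d1 3e 17 00 31 c0 c3 53 89 c3 85 c0 74 5d 8b 80 98 00 00 00 "
    "<8b> 40 20 83 bb 2c 01 00 00 20 75 08 0f 0b 5d 04 dc 61 30 c0 85"
)
eip = 0xC016FFB7  # "EIP is at iput+0xd/0x66"

tokens = code.split()
# The byte wrapped in <> is the first byte of the faulting instruction (at EIP).
fault_index = next(i for i, t in enumerate(tokens) if t.startswith("<"))
dump_start = eip - fault_index
raw = bytes(int(t.strip("<>"), 16) for t in tokens)

def address_of(hex_pattern):
    """Return the address of the first occurrence of a byte pattern in the dump."""
    return dump_start + raw.find(bytes.fromhex(hex_pattern))

andl_addr = address_of("83 a0 2c 01 00 00 b7")
iput_start = eip - 0xD

print(hex(dump_start))  # 0xc016ff8c
print(hex(andl_addr))   # 0xc016ff96
print(hex(iput_start))  # 0xc016ffaa
```

The computed addresses agree with the objdump listing: the dump starts at c016ff8c inside wake_up_inode, the andl sits at c016ff96 (unlock_new_inode), and iput itself only begins at c016ffaa, after inode_wait.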

Regards,
mlo

2006-11-14 07:05:10

by Mike Galbraith

Subject: Re: [ltp] Re: paging request BUG in 2.6.19-rc5 on resume - X60s

On Mon, 2006-11-13 at 20:34 +0100, Martin Lorenz wrote:

> c016ff96 <unlock_new_inode>:
> c016ff96: 83 a0 2c 01 00 00 b7 andl $0xffffffb7,0x12c(%eax)
> c016ff9d: e9 e0 ff ff ff jmp c016ff82 <wake_up_inode>

Ok, that's what I figured it had to be with that mask (though I can't
convince either of my compilers to produce that offset), so now we just
have to figure out how the heck it can get there and find a corrupted
pointer.

Can you enable frame-pointers, and capture another explosion? A more
complete trace might help. It would definitely help to reproduce
without the proprietary modules having ever been loaded.

-Mike

2006-11-14 07:10:21

by Mike Galbraith

Subject: Re: [ltp] Re: paging request BUG in 2.6.19-rc5 on resume - X60s

On Mon, 2006-11-13 at 20:27 +0100, Martin Lorenz wrote:
> On Mon, Nov 13, 2006 at 03:37:01PM +0100, Rafael J. Wysocki wrote:
> > On Monday, 13 November 2006 09:11, Martin Lorenz wrote:
> > > Hello again,
> > >
> > > here is another one:
> > >
> > > I reported a black screen on resume with my latest kernel build earlier. But
> > > this was not reproducible. It only occurred once.
> >
> > Is this a resume from disk? If so, which kernel are you using?
> >
>
> No, from suspend to RAM.

Interesting. See http://lkml.org/lkml/2006/10/3/19

-Mike

2006-11-15 08:22:31

by Jeremy Fitzhardinge

Subject: Re: paging request BUG in 2.6.19-rc5 on resume - X60s

Martin Lorenz wrote:
> I only see this when ipw3945 is loaded.
>
> [226156.057000] BUG: unable to handle kernel paging request at virtual
> address 756e6567
>

OK, very bizarre. Another instance of this pattern:

1. Recent Core Duo Thinkpad (X60, T60, X60s)
2. tainting wireless driver loaded (ipw3945, madwifi)
3. fault at "Genu" somewhere in filesystem code
4. not long after a resume from RAM (?)

Not exactly the same backtrace as before
(https://bugzilla.redhat.com/bugzilla/show_bug.cgi?id=208488
https://bugzilla.redhat.com/bugzilla/show_bug.cgi?id=207658), but pretty
close.

The only things I can think of are:

1. ipw3945 and madwifi are sharing some 802.11 code, which splats
this pattern into memory for some reason
2. some firmware/SMM bug which ends up corrupting a register (?)
3. erm? anyone?


J