2006-09-08 01:21:47

by Miles Lane

Subject: 2.6.18-rc5-git1 + "ieee1394: nodemgr" patches -- BUG: unable to handle kernel NULL pointer dereference at virtual address 00000000

I tried testing the patches from
http://groups.google.com/group/linux.kernel/browse_thread/thread/e25d2d810b7cf9cb
applied to 2.6.18-rc5-git1. Things went pretty well, until I ran
"pccardctl eject" and then popped out the Firewire card.

ieee1394: Node changed: 1-02:1023 -> 1-00:1023
ieee1394: Node suspended: ID:BUS[1-00:1023] GUID[0080880002103eae]
ieee1394: Node suspended: ID:BUS[1-01:1023] GUID[0090a950000b2255]
pccard: card ejected from slot 0
ieee1394: Node removed: ID:BUS[1-00:1023] GUID[0080880002103eae]
PM: Removing info for ieee1394:0080880002103eae-0
PM: Removing info for ieee1394:0080880002103eae
ieee1394: Node removed: ID:BUS[1-01:1023] GUID[0090a950000b2255]
PM: Removing info for ieee1394:0090a950000b2255-0
PM: Removing info for ieee1394:0090a950000b2255
ieee1394: Node removed: ID:BUS[1-00:1023] GUID[0090a94000007475]
PM: Removing info for ieee1394:0090a94000007475-0
PM: Removing info for ieee1394:0090a94000007475
BUG: unable to handle kernel NULL pointer dereference at virtual address 00000000
printing eip:
f955b309
*pde = 00000000
Oops: 0000 [#1]
Modules linked in: dv1394 raw1394 binfmt_misc apm i915 drm ipv6
speedstep_centrino freq_table cpufreq_powersave cpufreq_performance
cpufreq_ondemand cpufreq_conservative video thermal processor fan
button battery ac nls_ascii nls_cp437 vfat fat nls_utf8 ntfs nls_base
sr_mod sbp2 scsi_mod parport_pc lp parport 8139cp pcmcia 8139too
ipw2200 sdhci mmc_core ohci1394 ieee1394 yenta_socket rsrc_nonstatic
pcmcia_core mii snd_intel8x0 snd_ac97_codec snd_ac97_bus snd_pcm_oss
snd_mixer_oss ide_cd snd_pcm snd_timer cdrom psmouse shpchp
pci_hotplug snd soundcore snd_page_alloc ehci_hcd uhci_hcd intel_agp
agpgart usbcore rtc evdev
CPU: 0
EIP: 0060:[<f955b309>] Not tainted VLI
EFLAGS: 00010282 (2.6.18-rc5-git1 #4)
EIP is at dv1394_remove_host+0x17/0xad [dv1394]
eax: f91ac0f4 ebx: 00000001 ecx: 00000000 edx: f955b2f2
esi: 00000000 edi: f955c4d9 ebp: f955d980 esp: eab03e74
ds: 007b es: 007b ss: 0068
Process pccardctl (pid: 7111, ti=eab02000 task=f0a02ab0 task.ti=eab02000)
Stack: f955d980 ed5c4000 ed5c4000 f91788c2 00000000 f955d980 ed5c4000 f91310cc
f7c0b448 f9178945 ed5c4000 ed5c5d48 f9177e65 ed5c5f64 f912c9f2 f52ae800
f52ae848 f91310cc c10c5d24 f52ae8b0 c111dcbd f52ae848 f52ae848 c11f4aa0
Call Trace:
[<f91788c2>] __unregister_host+0x17/0x79 [ieee1394]
[<f9178945>] highlevel_remove_host+0x21/0x42 [ieee1394]
[<f9177e65>] hpsb_remove_host+0x37/0x56 [ieee1394]
[<f912c9f2>] ohci1394_pci_remove+0x41/0x1cd [ohci1394]
[<c10c5d24>] pci_device_remove+0x16/0x28
[<c111dcbd>] __device_release_driver+0x5a/0x72
[<c111de8f>] device_release_driver+0x1b/0x29
[<c111d705>] bus_remove_device+0x78/0x8a
[<c111c8a7>] device_del+0xe9/0x11a
[<c111c8e0>] device_unregister+0x8/0x10
[<c10c3ee5>] pci_remove_bus_device+0x39/0xcf
[<c10c3f95>] pci_remove_behind_bridge+0x1a/0x2d
[<f910d5ae>] socket_shutdown+0x89/0xdd [pcmcia_core]
[<f910d675>] pcmcia_eject_card+0x56/0x65 [pcmcia_core]
[<f9110070>] pccard_store_eject+0x19/0x20 [pcmcia_core]
[<c111e2e7>] class_device_attr_store+0x1b/0x1f
[<c1075495>] sysfs_write_file+0x97/0xbe
[<c1044a48>] vfs_write+0xa6/0x14b
[<c10452d4>] sys_write+0x3c/0x63
[<c10029a5>] sysenter_past_esp+0x56/0x79
DWARF2 unwinder stuck at sysenter_past_esp+0x56/0x79
Leftover inexact backtrace:
Code: c2 ff c7 87 90 01 00 00 00 00 00 00 83 c4 10 5b 5e 5f 5d c3 57
56 53 8b 98 44 1d 00 00 8b 80 3c 1d 00 00 8b 70 04 bf d9 c4 55 f9 <ac>
ae 75 08 84 c0 75 f8 31 c0 eb 04 19 c0 0c 01 85 c0 75 7e 9c
EIP: [<f955b309>] dv1394_remove_host+0x17/0xad [dv1394] SS:ESP 0068:eab03e74


2006-09-08 07:21:52

by Stefan Richter

Subject: Re: 2.6.18-rc5-git1 + "ieee1394: nodemgr" patches -- BUG: unable to handle kernel NULL pointer dereference at virtual address 00000000

Miles Lane wrote:
...
> Things went pretty well, until I ran
> "pccardctl eject" and then popped out the Firewire card.
...
> BUG: unable to handle kernel NULL pointer dereference at virtual
> address 00000000
...
> EIP is at dv1394_remove_host+0x17/0xad [dv1394]
...
> Call Trace:
> [<f91788c2>] __unregister_host+0x17/0x79 [ieee1394]
> [<f9178945>] highlevel_remove_host+0x21/0x42 [ieee1394]
> [<f9177e65>] hpsb_remove_host+0x37/0x56 [ieee1394]
> [<f912c9f2>] ohci1394_pci_remove+0x41/0x1cd [ohci1394]
> [<c10c5d24>] pci_device_remove+0x16/0x28
> [<c111dcbd>] __device_release_driver+0x5a/0x72
> [<c111de8f>] device_release_driver+0x1b/0x29
> [<c111d705>] bus_remove_device+0x78/0x8a
> [<c111c8a7>] device_del+0xe9/0x11a
> [<c111c8e0>] device_unregister+0x8/0x10
> [<c10c3ee5>] pci_remove_bus_device+0x39/0xcf
> [<c10c3f95>] pci_remove_behind_bridge+0x1a/0x2d
> [<f910d5ae>] socket_shutdown+0x89/0xdd [pcmcia_core]
> [<f910d675>] pcmcia_eject_card+0x56/0x65 [pcmcia_core]
...

Looks like the last word on
http://bugzilla.kernel.org/show_bug.cgi?id=2228 has not been spoken yet.
Maybe the bug can be fixed in dv1394 itself, or maybe we need to rework
the ieee1394 core's *_remove_host sequence.

Checking the 1394 driver stack's behavior during card hot-ejection is on
my long-term to-do list. Hopefully someone else can look at it sooner.
But I suggest you open a new bugzilla bug so we don't lose track.

I suppose the temporary workaround is to unload dv1394 before card ejection.
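
A minimal sketch of that workaround as a shell helper, assuming the module is unloaded with "modprobe -r" and the card is ejected with the "pccardctl eject" command from the original report. The function name eject_firewire_card and its optional modules-file argument are hypothetical, introduced only for illustration; this is not a tested fix for the oops itself.

```shell
#!/bin/sh
# Hypothetical helper: unload dv1394 (if loaded) before ejecting the card,
# so that dv1394's remove_host callback is not invoked during ejection.
# $1 is an optional path to a modules list (defaults to /proc/modules);
# it exists only so the check can be exercised without real hardware.
eject_firewire_card() {
    modules_file=${1:-/proc/modules}

    # Each /proc/modules line starts with the module name followed by a space.
    if grep -q '^dv1394 ' "$modules_file" 2>/dev/null; then
        # Stop here if the module cannot be unloaded (e.g. still in use),
        # rather than ejecting with dv1394 still registered.
        modprobe -r dv1394 || return 1
    fi

    pccardctl eject
}
```

Run as root, calling eject_firewire_card with no argument checks the live module list. If dv1394 cannot be removed, the function returns non-zero and leaves the card in place.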

Thanks for this and the previous reports,
--
Stefan Richter
-=====-=-==- =--= -=---
http://arcgraph.de/sr/

2006-09-08 07:56:55

by Miles Lane

Subject: Re: 2.6.18-rc5-git1 + "ieee1394: nodemgr" patches -- BUG: unable to handle kernel NULL pointer dereference at virtual address 00000000

On 9/8/06, Stefan Richter wrote:
> Miles Lane wrote:
> ...
> > Things went pretty well, until I ran
> > "pccardctl eject" and then popped out the Firewire card.
> ...
> > BUG: unable to handle kernel NULL pointer dereference at virtual
> > address 00000000
> ...
> > EIP is at dv1394_remove_host+0x17/0xad [dv1394]
> ...
> > Call Trace:
> > [<f91788c2>] __unregister_host+0x17/0x79 [ieee1394]
> > [<f9178945>] highlevel_remove_host+0x21/0x42 [ieee1394]
> > [<f9177e65>] hpsb_remove_host+0x37/0x56 [ieee1394]
> > [<f912c9f2>] ohci1394_pci_remove+0x41/0x1cd [ohci1394]
> > [<c10c5d24>] pci_device_remove+0x16/0x28
> > [<c111dcbd>] __device_release_driver+0x5a/0x72
> > [<c111de8f>] device_release_driver+0x1b/0x29
> > [<c111d705>] bus_remove_device+0x78/0x8a
> > [<c111c8a7>] device_del+0xe9/0x11a
> > [<c111c8e0>] device_unregister+0x8/0x10
> > [<c10c3ee5>] pci_remove_bus_device+0x39/0xcf
> > [<c10c3f95>] pci_remove_behind_bridge+0x1a/0x2d
> > [<f910d5ae>] socket_shutdown+0x89/0xdd [pcmcia_core]
> > [<f910d675>] pcmcia_eject_card+0x56/0x65 [pcmcia_core]
> ...
>
> Looks like the last word on
> http://bugzilla.kernel.org/show_bug.cgi?id=2228 has not been spoken yet.
> Maybe the bug can be fixed in dv1394 itself, or maybe we need to rework
> the ieee1394 core's *_remove_host sequence.
>
> Checking the 1394 driver stack's behavior during card hot-ejection is on
> my long-term to-do list. Hopefully someone else can look at it sooner.
> But I suggest you open a new bugzilla bug so we don't lose track.
>
> I suppose the temporary workaround is to unload dv1394 before card ejection.

Thanks,

I created the bug report: http://bugzilla.kernel.org/show_bug.cgi?id=7121

Best wishes,
Miles