2012-02-08 00:30:23

by Naveen Goswamy

[permalink] [raw]
Subject: Kernel crashing on eject SD card

The details are here:


https://bugs.gentoo.org/show_bug.cgi?id=402433

Any ideas? Is this a known issue?

Thanks,

Naveen


2012-02-12 21:08:48

by Stefan Richter

[permalink] [raw]
Subject: Re: Kernel crashing on eject SD card

On Feb 07 Naveen Goswamy wrote:
> The details are here:
>
>
> https://bugs.gentoo.org/show_bug.cgi?id=402433
>
> Any ideas? Is this a known issue?

It has been reported repeatedly, AFAICT without any progress so far.
http://marc.info/?l=linux-scsi&m=132388619710052

It's the old story; an udev helper opens a block device while it is being
torn down apparently. Block subsystem soils itself.

Here is the kernel log from Naveen's report at the Gentoo bug tracker, for
the convenience of the list subscribers:

Feb 6 10:51:09 speedy kernel: ieee80211 phy0: wl0: brcms_c_d11hdrs_mac80211: txop exceeded phylen 130/256 dur 1546/1504
Feb 6 10:54:39 speedy kernel: usb 1-1.6: new high-speed USB device number 4 using ehci_hcd
Feb 6 10:54:39 speedy kernel: usb 1-1.6: New USB device found, idVendor=0bda, idProduct=0159
Feb 6 10:54:39 speedy kernel: usb 1-1.6: New USB device strings: Mfr=1, Product=2, SerialNumber=3
Feb 6 10:54:39 speedy kernel: usb 1-1.6: Product: USB2.0-CRW
Feb 6 10:54:39 speedy kernel: usb 1-1.6: Manufacturer: Generic
Feb 6 10:54:39 speedy kernel: usb 1-1.6: SerialNumber: 20071114173400000
Feb 6 10:54:39 speedy kernel: scsi7 : usb-storage 1-1.6:1.0
Feb 6 10:54:40 speedy kernel: scsi 7:0:0:0: Direct-Access Generic- Multi-Card 1.00 PQ: 0 ANSI: 0 CCS
Feb 6 10:54:40 speedy kernel: sd 7:0:0:0: Attached scsi generic sg2 type 0
Feb 6 10:54:41 speedy kernel: sd 7:0:0:0: [sdb] 3862528 512-byte logical blocks: (1.97 GB/1.84 GiB)
Feb 6 10:54:41 speedy kernel: sd 7:0:0:0: [sdb] Write Protect is off
Feb 6 10:54:41 speedy kernel: sd 7:0:0:0: [sdb] Mode Sense: 03 00 00 00
Feb 6 10:54:41 speedy kernel: sd 7:0:0:0: [sdb] No Caching mode page present
Feb 6 10:54:41 speedy kernel: sd 7:0:0:0: [sdb] Assuming drive cache: write through
Feb 6 10:54:41 speedy kernel: sd 7:0:0:0: [sdb] No Caching mode page present
Feb 6 10:54:41 speedy kernel: sd 7:0:0:0: [sdb] Assuming drive cache: write through
Feb 6 10:54:41 speedy kernel: sdb: sdb1
Feb 6 10:54:41 speedy kernel: sd 7:0:0:0: [sdb] No Caching mode page present
Feb 6 10:54:41 speedy kernel: sd 7:0:0:0: [sdb] Assuming drive cache: write through
Feb 6 10:54:41 speedy kernel: sd 7:0:0:0: [sdb] Attached SCSI removable disk
Feb 6 10:58:08 speedy kernel: usb 1-1.6: USB disconnect, device number 4
Feb 6 10:58:08 speedy kernel: scsi 7:0:0:0: killing request
Feb 6 10:58:08 speedy kernel: BUG: unable to handle kernel NULL pointer dereference at 0000000000000008
Feb 6 10:58:08 speedy kernel: IP: [<ffffffff8135b1c7>] sd_revalidate_disk+0x1a/0x16ee
Feb 6 10:58:08 speedy kernel: PGD 2209b0067 PUD 2209b1067 PMD 0
Feb 6 10:58:08 speedy kernel: Oops: 0000 [#1] SMP
Feb 6 10:58:08 speedy kernel: CPU 3
Feb 6 10:58:08 speedy kernel: Modules linked in: aes_x86_64 aes_generic ipt_REJECT iptable_mangle iptable_nat nf_nat iptable_filter ip_tables ipv6 dm_mod vboxnetadp(O) vboxnetflt(O) vboxdrv(O) uvcvideo videodev v4l2_compat_ioctl32 usb_storage arc4 brcmsmac snd_hda_codec_hdmi snd_hda_codec_idt mac80211 snd_hda_intel brcmutil snd_hda_codec dell_wmi ehci_hcd r8169 usbcore snd_pcm cfg80211 snd_timer dcdbas firmware_class snd usb_common sparse_keymap soundcore rtc rfkill sg wmi snd_page_alloc crc8 cordic
Feb 6 10:58:08 speedy kernel:
Feb 6 10:58:08 speedy kernel: Pid: 2434, comm: udisks-daemon Tainted: G O 3.2.1-gentoo-r2_MINE_V01 #1 Dell Inc. Vostro 3400/07MJFM
Feb 6 10:58:08 speedy kernel: RIP: 0010:[<ffffffff8135b1c7>] [<ffffffff8135b1c7>] sd_revalidate_disk+0x1a/0x16ee
Feb 6 10:58:08 speedy kernel: RSP: 0018:ffff8802209cfb08 EFLAGS: 00010292
Feb 6 10:58:08 speedy kernel: RAX: ffffffff8135b1ad RBX: 0000000000000000 RCX: 0000000000000002
Feb 6 10:58:08 speedy kernel: RDX: 0000000000000002 RSI: 0000000800000000 RDI: ffff880231668800
Feb 6 10:58:08 speedy kernel: RBP: ffff880231668800 R08: ffff8802315447a0 R09: ffffffff81852e48
Feb 6 10:58:08 speedy kernel: R10: 0000000000000000 R11: 0000000000011e00 R12: ffff880231668800
Feb 6 10:58:08 speedy kernel: R13: ffff88021746acd8 R14: 0000000000000000 R15: ffff88021746acc0
Feb 6 10:58:08 speedy kernel: FS: 00007fc6dd3ab700(0000) GS:ffff88023bd80000(0000) knlGS:0000000000000000
Feb 6 10:58:08 speedy kernel: CS: 0010 DS: 0000 ES: 0000 CR0: 000000008005003b
Feb 6 10:58:08 speedy kernel: CR2: 0000000000000008 CR3: 00000002209af000 CR4: 00000000000006e0
Feb 6 10:58:08 speedy kernel: DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
Feb 6 10:58:08 speedy kernel: DR3: 0000000000000000 DR6: 00000000ffff0ff0 DR7: 0000000000000400
Feb 6 10:58:08 speedy kernel: Process udisks-daemon (pid: 2434, threadinfo ffff8802209ce000, task ffff88023aac4110)
Feb 6 10:58:08 speedy kernel: Stack:
Feb 6 10:58:08 speedy kernel: ffffffff8103461a ffff880231668848 0000000000000000 ffff880231668800
Feb 6 10:58:08 speedy kernel: ffff88021746acd8 000000000000001d ffff88021746acc0 ffffffff810a0ec9
Feb 6 10:58:08 speedy kernel: ffff88021746acc0 ffff880231668800 0000000000000000 ffff88021746ad98
Feb 6 10:58:08 speedy kernel: Call Trace:
Feb 6 10:58:08 speedy kernel: [<ffffffff8103461a>] ? try_to_wake_up+0x200/0x200
Feb 6 10:58:08 speedy kernel: [<ffffffff810a0ec9>] ? get_super+0x1a/0x95
Feb 6 10:58:08 speedy kernel: [<ffffffff810b23d8>] ? iput+0x2b/0x17e
Feb 6 10:58:08 speedy kernel: [<ffffffff810eb3ce>] ? rescan_partitions+0xac/0x446
Feb 6 10:58:08 speedy kernel: [<ffffffff810c5410>] ? __blkdev_get+0x162/0x33f
Feb 6 10:58:08 speedy kernel: [<ffffffff810c588b>] ? blkdev_get+0x29e/0x29e
Feb 6 10:58:08 speedy kernel: [<ffffffff810c57ad>] ? blkdev_get+0x1c0/0x29e
Feb 6 10:58:08 speedy kernel: [<ffffffff810c588b>] ? blkdev_get+0x29e/0x29e
Feb 6 10:58:08 speedy kernel: [<ffffffff8109e03b>] ? __dentry_open.clone.14+0x16b/0x294
Feb 6 10:58:08 speedy kernel: [<ffffffff810aaacb>] ? do_last.clone.34+0x64e/0x662
Feb 6 10:58:08 speedy kernel: [<ffffffff810aabe1>] ? path_openat+0xcb/0x354
Feb 6 10:58:08 speedy kernel: [<ffffffff8133e3fc>] ? scsi_set_medium_removal+0x46/0x6b
Feb 6 10:58:08 speedy kernel: [<ffffffff8102c3b7>] ? ttwu_do_wakeup+0x11/0x86
Feb 6 10:58:08 speedy kernel: [<ffffffff810aaf45>] ? do_filp_open+0x2c/0x72
Feb 6 10:58:08 speedy kernel: [<ffffffff810b3fde>] ? alloc_fd+0x69/0x10f
Feb 6 10:58:08 speedy kernel: [<ffffffff8109ed2e>] ? do_sys_open+0x101/0x18f
Feb 6 10:58:08 speedy kernel: [<ffffffff81482b52>] ? system_call_fastpath+0x16/0x1b
Feb 6 10:58:08 speedy kernel: Code: ff ff 48 83 c4 68 5b 5d 41 5c 41 5d 41 5e 41 5f c3 41 57 41 56 41 55 41 54 55 53 48 83 ec 78 48 8b 9f 50 02 00 00 48 89 7c 24 48 <48> 8b 43 08 48 89 44 24 28 8b 05 1a e2 7e 00 c1 e8 15 83 e0 07
Feb 6 10:58:08 speedy kernel: RIP [<ffffffff8135b1c7>] sd_revalidate_disk+0x1a/0x16ee
Feb 6 10:58:08 speedy kernel: RSP <ffff8802209cfb08>
Feb 6 10:58:08 speedy kernel: CR2: 0000000000000008
Feb 6 10:58:09 speedy kernel: ---[ end trace 2cb4da56c38cb030 ]---

--
Stefan Richter
-=====-===-- --=- -==--
http://arcgraph.de/sr/

2012-02-12 21:20:55

by Stefan Richter

[permalink] [raw]
Subject: Re: Kernel crashing on eject SD card

On Feb 12 Stefan Richter wrote:
> Modules linked in: [...] vboxnetadp(O) vboxnetflt(O) vboxdrv(O) [...]

Oh, could you try without virtualbox?

The debian bug report hints that kernel 3.2 /without virtualbox drivers/
seems to behave itself.
http://bugs.debian.org/cgi-bin/bugreport.cgi?bug=649735
--
Stefan Richter
-=====-===-- --=- -==--
http://arcgraph.de/sr/

2012-02-13 01:47:07

by Naveen Goswamy

[permalink] [raw]
Subject: Re: Kernel crashing on eject SD card

I confirm that it behaves properly when virtualbox drivers are not loaded.

Cheers,

Naveen


Quoting Stefan Richter <[email protected]>:

> On Feb 12 Stefan Richter wrote:
> > Modules linked in: [...] vboxnetadp(O) vboxnetflt(O) vboxdrv(O) [...]
>
> Oh, could you try without virtualbox?
>
> The debian bug report hints that kernel 3.2 /without virtualbox drivers/
> seems to behave itself.
> http://bugs.debian.org/cgi-bin/bugreport.cgi?bug=649735
> --
> Stefan Richter
> -=====-===-- --=- -==--
> http://arcgraph.de/sr/
>


2012-02-13 02:18:33

by Dave Jones

[permalink] [raw]
Subject: Re: Kernel crashing on eject SD card

On Sun, Feb 12, 2012 at 10:20:27PM +0100, Stefan Richter wrote:
> On Feb 12 Stefan Richter wrote:
> > Modules linked in: [...] vboxnetadp(O) vboxnetflt(O) vboxdrv(O) [...]
>
> Oh, could you try without virtualbox?
>
> The debian bug report hints that kernel 3.2 /without virtualbox drivers/
> seems to behave itself.
> http://bugs.debian.org/cgi-bin/bugreport.cgi?bug=649735

We've seen this a bunch of times in Fedora too.

Here's a report we've been duping similar bugs against
https://bugzilla.redhat.com/show_bug.cgi?id=754518

Some of them are using vbox/vmware, but there's a few in there that
haven't used either, so I think that might be a red herring.

Dave

2012-02-13 17:40:43

by Naveen Goswamy

[permalink] [raw]
Subject: Re: Kernel crashing on eject SD card

Quoting Dave Jones <[email protected]>:


> Some of them are using vbox/vmware, but there's a few in there that
> haven't used either, so I think that might be a red herring.


You are correct Dave. There was a red-herring. I managed to experience the
same crash and burn again today, without vbox drivers. Here are the logs.


Feb 13 08:50:53 speedy kernel: scsi 6:0:0:0: killing request
Feb 13 08:50:53 speedy kernel: BUG: unable to handle kernel NULL pointer
dereference at 0000000000000008
Feb 13 08:50:53 speedy kernel: IP: [<ffffffff8135b798>]
sd_revalidate_disk+0x1a/0x16ee
Feb 13 08:50:53 speedy kernel: PGD 223493067 PUD 2234de067 PMD 0
Feb 13 08:50:53 speedy kernel: Oops: 0000 [#1] SMP
Feb 13 08:50:53 speedy kernel: CPU 2
Feb 13 08:50:53 speedy kernel: Modules linked in: aes_x86_64 aes_generic
ipt_REJECT iptable_mangle iptable_nat nf_nat iptable_filter ip_tables ipv6
dm_mod uvcvideo videodev v4l2_compat_ioctl32 usb_storage arc4 brcmsmac
snd_hda_codec_hdmi snd_hda_codec_idt mac80211 brcmutil snd_hda_intel
snd_hda_codec cfg80211 r8169 rfkill snd_pcm snd_timer dell_wmi snd
sparse_keymap ehci_hcd wmi firmware_class dcdbas crc8 soundcore rtc usbcore
snd_page_alloc sg cordic usb_common
Feb 13 08:50:53 speedy kernel:
Feb 13 08:50:53 speedy kernel: Pid: 2721, comm: udisks-daemon Not tainted
3.2.5-gentoo_MINE_V00 #1 Dell Inc. Vostro 3400/07MJFM
Feb 13 08:50:53 speedy kernel: RIP: 0010:[<ffffffff8135b798>]
[<ffffffff8135b798>] sd_revalidate_disk+0x1a/0x16ee
Feb 13 08:50:53 speedy kernel: RSP: 0018:ffff8802234ddb08 EFLAGS: 00010292
Feb 13 08:50:53 speedy kernel: RAX: ffffffff8135b77e RBX: 0000000000000000 RCX:
0000000000000002
Feb 13 08:50:53 speedy kernel: RDX: 0000000000000002 RSI: 0000000800000000 RDI:
ffff880231599000
Feb 13 08:50:53 speedy kernel: RBP: ffff880231599000 R08: ffff88023ab4f9a0 R09:
ffffffff81852ec8
Feb 13 08:50:53 speedy kernel: R10: 0000000000000002 R11: 0000000000011e00 R12:
ffff880231599000
Feb 13 08:50:53 speedy kernel: R13: ffff880232322698 R14: 0000000000000000 R15:
ffff880232322680
Feb 13 08:50:53 speedy kernel: FS: 00007f7666c6b700(0000)
GS:ffff88023bd00000(0000) knlGS:0000000000000000
Feb 13 08:50:53 speedy kernel: CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033
Feb 13 08:50:53 speedy kernel: CR2: 0000000000000008 CR3: 0000000223492000 CR4:
00000000000006e0
Feb 13 08:50:53 speedy kernel: DR0: 0000000000000000 DR1: 0000000000000000 DR2:
0000000000000000
Feb 13 08:50:53 speedy kernel: DR3: 0000000000000000 DR6: 00000000ffff0ff0 DR7:
0000000000000400
Feb 13 08:50:53 speedy kernel: Process udisks-daemon (pid: 2721, threadinfo
ffff8802234dc000, task ffff880230f76920)
Feb 13 08:50:53 speedy kernel: Stack:
Feb 13 08:50:53 speedy kernel: ffffffff8103468a ffff880231599048
0000000000000000 ffff880231599000
Feb 13 08:50:53 speedy kernel: ffff880232322698 000000000000001d
ffff880232322680 ffffffff810a0f35
Feb 13 08:50:53 speedy kernel: ffff880232322680 ffff880231599000
0000000000000000 ffff880232322758
Feb 13 08:50:53 speedy kernel: Call Trace:
Feb 13 08:50:53 speedy kernel: [<ffffffff8103468a>] ? try_to_wake_up+0x200/0x200
Feb 13 08:50:53 speedy kernel: [<ffffffff810a0f35>] ? get_super+0x1a/0x95
Feb 13 08:50:53 speedy kernel: [<ffffffff810b2460>] ? iput+0x2b/0x17e
Feb 13 08:50:53 speedy kernel: [<ffffffff810eb4b6>] ?
rescan_partitions+0xac/0x446
Feb 13 08:50:53 speedy kernel: [<ffffffff810c5498>] ? __blkdev_get+0x162/0x33f
Feb 13 08:50:53 speedy kernel: [<ffffffff810c5913>] ? blkdev_get+0x29e/0x29e
Feb 13 08:50:53 speedy kernel: [<ffffffff810c5835>] ? blkdev_get+0x1c0/0x29e
Feb 13 08:50:53 speedy kernel: [<ffffffff810c5913>] ? blkdev_get+0x29e/0x29e
Feb 13 08:50:53 speedy kernel: [<ffffffff8109e0a7>] ?
__dentry_open.clone.14+0x16b/0x294
Feb 13 08:50:53 speedy kernel: [<ffffffff810aab37>] ?
do_last.clone.34+0x64e/0x662
Feb 13 08:50:53 speedy kernel: [<ffffffff810aac4d>] ? path_openat+0xcb/0x354
Feb 13 08:50:53 speedy kernel: [<ffffffff8133e9b0>] ?
scsi_set_medium_removal+0x46/0x6b
Feb 13 08:50:53 speedy kernel: [<ffffffff810aafb1>] ? do_filp_open+0x2c/0x72
Feb 13 08:50:53 speedy kernel: [<ffffffff810b4066>] ? alloc_fd+0x69/0x10f
Feb 13 08:50:53 speedy kernel: [<ffffffff8109ed9a>] ? do_sys_open+0x101/0x18f
Feb 13 08:50:53 speedy kernel: [<ffffffff81483292>] ?
system_call_fastpath+0x16/0x1b
Feb 13 08:50:53 speedy kernel: Code: ff ff 48 83 c4 68 5b 5d 41 5c 41 5d 41 5e
41 5f c3 41 57 41 56 41 55 41 54 55 53 48 83 ec 78 48 8b 9f 50 02 00 00 48 89
7c 24 48 <48> 8b 43 08 48 89 44 24 28 8b 05 49 dc 7e 00 c1 e8 15 83 e0 07
Feb 13 08:50:53 speedy kernel: RIP [<ffffffff8135b798>]
sd_revalidate_disk+0x1a/0x16ee
Feb 13 08:50:53 speedy kernel: RSP <ffff8802234ddb08>
Feb 13 08:50:53 speedy kernel: CR2: 0000000000000008
Feb 13 08:50:53 speedy kernel: ---[ end trace 0370d79d444e26e5 ]---

2012-02-14 11:18:22

by Junichi Nomura

[permalink] [raw]
Subject: Re: Kernel crashing on eject SD card

On 02/14/12 02:40, Naveen Goswamy wrote:
> Feb 13 08:50:53 speedy kernel: scsi 6:0:0:0: killing request
> Feb 13 08:50:53 speedy kernel: BUG: unable to handle kernel NULL pointer
> dereference at 0000000000000008
> Feb 13 08:50:53 speedy kernel: IP: [<ffffffff8135b798>]
> sd_revalidate_disk+0x1a/0x16ee
> Feb 13 08:50:53 speedy kernel: PGD 223493067 PUD 2234de067 PMD 0
> Feb 13 08:50:53 speedy kernel: Oops: 0000 [#1] SMP
> Feb 13 08:50:53 speedy kernel: CPU 2
> Feb 13 08:50:53 speedy kernel: Modules linked in: aes_x86_64 aes_generic
> ipt_REJECT iptable_mangle iptable_nat nf_nat iptable_filter ip_tables ipv6
> dm_mod uvcvideo videodev v4l2_compat_ioctl32 usb_storage arc4 brcmsmac
> snd_hda_codec_hdmi snd_hda_codec_idt mac80211 brcmutil snd_hda_intel
> snd_hda_codec cfg80211 r8169 rfkill snd_pcm snd_timer dell_wmi snd
> sparse_keymap ehci_hcd wmi firmware_class dcdbas crc8 soundcore rtc usbcore
> snd_page_alloc sg cordic usb_common
> Feb 13 08:50:53 speedy kernel:
> Feb 13 08:50:53 speedy kernel: Pid: 2721, comm: udisks-daemon Not tainted
> 3.2.5-gentoo_MINE_V00 #1 Dell Inc. Vostro 3400/07MJFM
> Feb 13 08:50:53 speedy kernel: RIP: 0010:[<ffffffff8135b798>]
> [<ffffffff8135b798>] sd_revalidate_disk+0x1a/0x16ee
> Feb 13 08:50:53 speedy kernel: RSP: 0018:ffff8802234ddb08 EFLAGS: 00010292
> Feb 13 08:50:53 speedy kernel: RAX: ffffffff8135b77e RBX: 0000000000000000 RCX:
> 0000000000000002
> Feb 13 08:50:53 speedy kernel: RDX: 0000000000000002 RSI: 0000000800000000 RDI:
> ffff880231599000
> Feb 13 08:50:53 speedy kernel: RBP: ffff880231599000 R08: ffff88023ab4f9a0 R09:
> ffffffff81852ec8
> Feb 13 08:50:53 speedy kernel: R10: 0000000000000002 R11: 0000000000011e00 R12:
> ffff880231599000
> Feb 13 08:50:53 speedy kernel: R13: ffff880232322698 R14: 0000000000000000 R15:
> ffff880232322680
> Feb 13 08:50:53 speedy kernel: FS: 00007f7666c6b700(0000)
> GS:ffff88023bd00000(0000) knlGS:0000000000000000
> Feb 13 08:50:53 speedy kernel: CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033
> Feb 13 08:50:53 speedy kernel: CR2: 0000000000000008 CR3: 0000000223492000 CR4:
> 00000000000006e0
> Feb 13 08:50:53 speedy kernel: DR0: 0000000000000000 DR1: 0000000000000000 DR2:
> 0000000000000000
> Feb 13 08:50:53 speedy kernel: DR3: 0000000000000000 DR6: 00000000ffff0ff0 DR7:
> 0000000000000400
> Feb 13 08:50:53 speedy kernel: Process udisks-daemon (pid: 2721, threadinfo
> ffff8802234dc000, task ffff880230f76920)
> Feb 13 08:50:53 speedy kernel: Stack:
> Feb 13 08:50:53 speedy kernel: ffffffff8103468a ffff880231599048
> 0000000000000000 ffff880231599000
> Feb 13 08:50:53 speedy kernel: ffff880232322698 000000000000001d
> ffff880232322680 ffffffff810a0f35
> Feb 13 08:50:53 speedy kernel: ffff880232322680 ffff880231599000
> 0000000000000000 ffff880232322758
> Feb 13 08:50:53 speedy kernel: Call Trace:
> Feb 13 08:50:53 speedy kernel: [<ffffffff8103468a>] ? try_to_wake_up+0x200/0x200
> Feb 13 08:50:53 speedy kernel: [<ffffffff810a0f35>] ? get_super+0x1a/0x95
> Feb 13 08:50:53 speedy kernel: [<ffffffff810b2460>] ? iput+0x2b/0x17e
> Feb 13 08:50:53 speedy kernel: [<ffffffff810eb4b6>] ?
> rescan_partitions+0xac/0x446
> Feb 13 08:50:53 speedy kernel: [<ffffffff810c5498>] ? __blkdev_get+0x162/0x33f
> Feb 13 08:50:53 speedy kernel: [<ffffffff810c5913>] ? blkdev_get+0x29e/0x29e
> Feb 13 08:50:53 speedy kernel: [<ffffffff810c5835>] ? blkdev_get+0x1c0/0x29e
> Feb 13 08:50:53 speedy kernel: [<ffffffff810c5913>] ? blkdev_get+0x29e/0x29e
> Feb 13 08:50:53 speedy kernel: [<ffffffff8109e0a7>] ?
> __dentry_open.clone.14+0x16b/0x294
> Feb 13 08:50:53 speedy kernel: [<ffffffff810aab37>] ?
> do_last.clone.34+0x64e/0x662
> Feb 13 08:50:53 speedy kernel: [<ffffffff810aac4d>] ? path_openat+0xcb/0x354
> Feb 13 08:50:53 speedy kernel: [<ffffffff8133e9b0>] ?
> scsi_set_medium_removal+0x46/0x6b
> Feb 13 08:50:53 speedy kernel: [<ffffffff810aafb1>] ? do_filp_open+0x2c/0x72
> Feb 13 08:50:53 speedy kernel: [<ffffffff810b4066>] ? alloc_fd+0x69/0x10f
> Feb 13 08:50:53 speedy kernel: [<ffffffff8109ed9a>] ? do_sys_open+0x101/0x18f
> Feb 13 08:50:53 speedy kernel: [<ffffffff81483292>] ?
> system_call_fastpath+0x16/0x1b
> Feb 13 08:50:53 speedy kernel: Code: ff ff 48 83 c4 68 5b 5d 41 5c 41 5d 41 5e
> 41 5f c3 41 57 41 56 41 55 41 54 55 53 48 83 ec 78 48 8b 9f 50 02 00 00 48 89
> 7c 24 48 <48> 8b 43 08 48 89 44 24 28 8b 05 49 dc 7e 00 c1 e8 15 83 e0 07
> Feb 13 08:50:53 speedy kernel: RIP [<ffffffff8135b798>]
> sd_revalidate_disk+0x1a/0x16ee
> Feb 13 08:50:53 speedy kernel: RSP <ffff8802234ddb08>
> Feb 13 08:50:53 speedy kernel: CR2: 0000000000000008
> Feb 13 08:50:53 speedy kernel: ---[ end trace 0370d79d444e26e5 ]---

According to the comments by Huajun Li:
http://www.spinics.net/lists/linux-scsi/msg55698.html

The following commit has changed __blkdev_get() to end up calling
sd_revalidate_disk() without getting a refcount of scsi_device:

commit 1196f8b814f32cd04df334abf47648c2a9fd8324
Author: Tejun Heo <[email protected]>
Date: Thu Apr 21 20:54:45 2011 +0200

block: rescan partitions on invalidated devices on -ENOMEDIA too

that could lead to oops like this:

process A process B
----------------------------------------------
sys_open
__blkdev_get
sd_open
returns -ENOMEDIUM
scsi_remove_device
<scsi_device torn down>
rescan_partitions
sd_revalidate_disk
<oops>

Should "revalidate_disk" of block_device_operations work
without successful open()?

If so, sd_revalidate_disk() (and possibly other drivers) needs to be
fixed. (e.g. use scsi_disk_get/put by itself)

If not, __blkdev_get() or rescan_partision() should avoid calling
"revalidate_disk" for -ENOMEDIUM case.

Thanks,
--
Jun'ichi Nomura, NEC Corporation

2012-02-14 13:32:04

by Stefan Richter

[permalink] [raw]
Subject: Re: Kernel crashing on eject SD card

On Feb 14 Jun'ichi Nomura wrote:
> According to the comments by Huajun Li:
> http://www.spinics.net/lists/linux-scsi/msg55698.html
>
> The following commit has changed __blkdev_get() to end up calling
> sd_revalidate_disk() without getting a refcount of scsi_device:
>
> commit 1196f8b814f32cd04df334abf47648c2a9fd8324
> Author: Tejun Heo <[email protected]>
> Date: Thu Apr 21 20:54:45 2011 +0200
>
> block: rescan partitions on invalidated devices on -ENOMEDIA too
>
> that could lead to oops like this:
>
> process A process B
> ----------------------------------------------
> sys_open
> __blkdev_get
> sd_open
> returns -ENOMEDIUM
> scsi_remove_device
> <scsi_device torn down>
> rescan_partitions
> sd_revalidate_disk
> <oops>
>
> Should "revalidate_disk" of block_device_operations work
> without successful open()?
>
> If so, sd_revalidate_disk() (and possibly other drivers) needs to be
> fixed. (e.g. use scsi_disk_get/put by itself)
>
> If not, __blkdev_get() or rescan_partision() should avoid calling
> "revalidate_disk" for -ENOMEDIUM case.

It may very well be that not only sd_revalidate_disk is affected.

I have yet to check whether the "open -> unplug -> ioctl -> oops" bug from
http://www.spinics.net/lists/linux-scsi/msg56254.html
(a) happens under 3.3-rc still (was reported against 3.2-rc7),
(b) affects sd devices too (was reported against sr devices).
--
Stefan Richter
-=====-===-- --=- -===-
http://arcgraph.de/sr/

2012-02-14 16:29:08

by Tejun Heo

[permalink] [raw]
Subject: Re: Kernel crashing on eject SD card

Hello,

On Tue, Feb 14, 2012 at 08:14:40PM +0900, Jun'ichi Nomura wrote:
> The following commit has changed __blkdev_get() to end up calling
> sd_revalidate_disk() without getting a refcount of scsi_device:
>
> commit 1196f8b814f32cd04df334abf47648c2a9fd8324
> Author: Tejun Heo <[email protected]>
> Date: Thu Apr 21 20:54:45 2011 +0200
>
> block: rescan partitions on invalidated devices on -ENOMEDIA too
>
> that could lead to oops like this:
>
> process A process B
> ----------------------------------------------
> sys_open
> __blkdev_get
> sd_open
> returns -ENOMEDIUM
> scsi_remove_device
> <scsi_device torn down>
> rescan_partitions
> sd_revalidate_disk
> <oops>
>
> Should "revalidate_disk" of block_device_operations work
> without successful open()?
>
> If so, sd_revalidate_disk() (and possibly other drivers) needs to be
> fixed. (e.g. use scsi_disk_get/put by itself)
>
> If not, __blkdev_get() or rescan_partision() should avoid calling
> "revalidate_disk" for -ENOMEDIUM case.

Hmmm... right, that's a problem. Missed rescan_partitions() calling
into driver. What we should probably do is separating out
invalidation & partition shoot down into a separate function, say
trucate_disk(), and call that on -ENOMEDIUM instead of
rescan_partitions(). All that's necessary is killing the partition
devices (and maybe zapping device size to zero). Any one interested
in trying it?

Thanks.

--
tejun

2012-02-15 02:57:24

by Junichi Nomura

[permalink] [raw]
Subject: Re: Kernel crashing on eject SD card

Hi,

Thank you for the comments.

On 02/15/12 01:28, Tejun Heo wrote:
>> that could lead to oops like this:
>>
>> process A process B
>> ----------------------------------------------
>> sys_open
>> __blkdev_get
>> sd_open
>> returns -ENOMEDIUM
>> scsi_remove_device
>> <scsi_device torn down>
>> rescan_partitions
>> sd_revalidate_disk
>> <oops>
>>
>> Should "revalidate_disk" of block_device_operations work
>> without successful open()?
>>
>> If so, sd_revalidate_disk() (and possibly other drivers) needs to be
>> fixed. (e.g. use scsi_disk_get/put by itself)
>>
>> If not, __blkdev_get() or rescan_partision() should avoid calling
>> "revalidate_disk" for -ENOMEDIUM case.
>
> Hmmm... right, that's a problem. Missed rescan_partitions() calling
> into driver. What we should probably do is separating out
> invalidation & partition shoot down into a separate function, say
> trucate_disk(), and call that on -ENOMEDIUM instead of
> rescan_partitions(). All that's necessary is killing the partition
> devices (and maybe zapping device size to zero). Any one interested
> in trying it?

How about this?

If the patch looks ok, I appreciate if somebody with removable media
could test the followings:
- the oops in sd_revalidate_disk() should not occur:
http://marc.info/?l=linux-scsi&m=132388619710052
- the problem reported here should not be re-introduced:
https://bugzilla.kernel.org/show_bug.cgi?id=13029

Index: linux-3.3/block/partition-generic.c
===================================================================
--- linux-3.3.orig/block/partition-generic.c 2012-02-15 09:00:25.147293790 +0900
+++ linux-3.3/block/partition-generic.c 2012-02-15 11:31:33.835554974 +0900
@@ -389,17 +389,11 @@ static bool disk_unlock_native_capacity(
}
}

-int rescan_partitions(struct gendisk *disk, struct block_device *bdev)
+static int drop_partitions(struct gendisk *disk, struct block_device *bdev)
{
- struct parsed_partitions *state = NULL;
struct disk_part_iter piter;
struct hd_struct *part;
- int p, highest, res;
-rescan:
- if (state && !IS_ERR(state)) {
- kfree(state);
- state = NULL;
- }
+ int res;

if (bdev->bd_part_count)
return -EBUSY;
@@ -412,6 +406,24 @@ rescan:
delete_partition(disk, part->partno);
disk_part_iter_exit(&piter);

+ return 0;
+}
+
+int rescan_partitions(struct gendisk *disk, struct block_device *bdev)
+{
+ struct parsed_partitions *state = NULL;
+ struct hd_struct *part;
+ int p, highest, res;
+rescan:
+ if (state && !IS_ERR(state)) {
+ kfree(state);
+ state = NULL;
+ }
+
+ res = drop_partitions(disk, bdev);
+ if (res)
+ return res;
+
if (disk->fops->revalidate_disk)
disk->fops->revalidate_disk(disk);
check_disk_size_change(disk, bdev);
@@ -515,6 +527,22 @@ rescan:
return 0;
}

+int invalidate_partitions(struct gendisk *disk, struct block_device *bdev)
+{
+ int res;
+
+ res = drop_partitions(disk, bdev);
+ if (res)
+ return res;
+
+ check_disk_size_change(disk, bdev);
+ bdev->bd_invalidated = 0;
+ /* tell userspace that the media / partition table may have changed */
+ kobject_uevent(&disk_to_dev(disk)->kobj, KOBJ_CHANGE);
+
+ return 0;
+}
+
unsigned char *read_dev_sector(struct block_device *bdev, sector_t n, Sector *p)
{
struct address_space *mapping = bdev->bd_inode->i_mapping;
Index: linux-3.3/include/linux/genhd.h
===================================================================
--- linux-3.3.orig/include/linux/genhd.h 2012-02-09 12:21:53.000000000 +0900
+++ linux-3.3/include/linux/genhd.h 2012-02-15 11:18:59.661594629 +0900
@@ -596,6 +596,7 @@ extern char *disk_name (struct gendisk *

extern int disk_expand_part_tbl(struct gendisk *disk, int target);
extern int rescan_partitions(struct gendisk *disk, struct block_device *bdev);
+extern int invalidate_partitions(struct gendisk *disk, struct block_device *bdev);
extern struct hd_struct * __must_check add_partition(struct gendisk *disk,
int partno, sector_t start,
sector_t len, int flags,
Index: linux-3.3/fs/block_dev.c
===================================================================
--- linux-3.3.orig/fs/block_dev.c 2012-02-09 12:21:53.000000000 +0900
+++ linux-3.3/fs/block_dev.c 2012-02-15 11:34:48.800549266 +0900
@@ -1183,8 +1183,12 @@ static int __blkdev_get(struct block_dev
* The latter is necessary to prevent ghost
* partitions on a removed medium.
*/
- if (bdev->bd_invalidated && (!ret || ret == -ENOMEDIUM))
- rescan_partitions(disk, bdev);
+ if (bdev->bd_invalidated) {
+ if (!ret)
+ rescan_partitions(disk, bdev);
+ else if (ret == -ENOMEDIUM)
+ invalidate_partitions(disk, bdev);
+ }
if (ret)
goto out_clear;
} else {
@@ -1214,8 +1218,12 @@ static int __blkdev_get(struct block_dev
if (bdev->bd_disk->fops->open)
ret = bdev->bd_disk->fops->open(bdev, mode);
/* the same as first opener case, read comment there */
- if (bdev->bd_invalidated && (!ret || ret == -ENOMEDIUM))
- rescan_partitions(bdev->bd_disk, bdev);
+ if (bdev->bd_invalidated) {
+ if (!ret)
+ rescan_partitions(bdev->bd_disk, bdev);
+ else if (ret == -ENOMEDIUM)
+ invalidate_partitions(bdev->bd_disk, bdev);
+ }
if (ret)
goto out_unlock_bdev;
}

2012-02-15 17:26:24

by Tejun Heo

[permalink] [raw]
Subject: Re: Kernel crashing on eject SD card

Hello,

This seems like the right approach to me, but..

On Wed, Feb 15, 2012 at 11:56:19AM +0900, Jun'ichi Nomura wrote:
> +int invalidate_partitions(struct gendisk *disk, struct block_device *bdev)
> +{
> + int res;
> +
> + res = drop_partitions(disk, bdev);
> + if (res)
> + return res;
> +

Hmmm... shouldn't we have set_capacity(disk, 0) here?

> + check_disk_size_change(disk, bdev);
> + bdev->bd_invalidated = 0;
> + /* tell userspace that the media / partition table may have changed */
> + kobject_uevent(&disk_to_dev(disk)->kobj, KOBJ_CHANGE);

Also, we really shouldn't be generating KOBJ_CHANGE after every
-ENOMEDIUM open. This can easily lead to infinite loop. We should
generate this iff we actually dropped partitions && modified the size.

Thanks.

--
tejun

2012-02-16 01:28:21

by Junichi Nomura

[permalink] [raw]
Subject: Re: Kernel crashing on eject SD card

Hi,

Thank you for review and comments.

On 02/16/12 02:26, Tejun Heo wrote:
> On Wed, Feb 15, 2012 at 11:56:19AM +0900, Jun'ichi Nomura wrote:
>> +int invalidate_partitions(struct gendisk *disk, struct block_device *bdev)
>> +{
>> + int res;
>> +
>> + res = drop_partitions(disk, bdev);
>> + if (res)
>> + return res;
>> +
>
> Hmmm... shouldn't we have set_capacity(disk, 0) here?

Added.
I wasn't sure whether I should leave it to drivers.
But it seems capacity 0 for ENOMEDIUM device is reasonable.

>> + check_disk_size_change(disk, bdev);
>> + bdev->bd_invalidated = 0;
>> + /* tell userspace that the media / partition table may have changed */
>> + kobject_uevent(&disk_to_dev(disk)->kobj, KOBJ_CHANGE);
>
> Also, we really shouldn't be generating KOBJ_CHANGE after every
> -ENOMEDIUM open. This can easily lead to infinite loop. We should
> generate this iff we actually dropped partitions && modified the size.

invalidate_partitions() is called only when bd_invalidated is set.
So KOBJ_CHANGE is not raised for every ENOMEDIUM open.

I put it explicit in the function to make it safer for
possible misuse.

How about this?

---------------------------------------------------------
Do not call drivers when invalidating partitions for -ENOMEDIUM

When a scsi driver returns -ENOMEDIUM for open(),
__blkdev_get() calls rescan_partitions(), which ends up calling
sd_revalidate_disk() without getting a refcount of scsi_device.

That could lead to oops like this:

process A process B
----------------------------------------------
sys_open
__blkdev_get
sd_open
returns -ENOMEDIUM
scsi_remove_device
<scsi_device torn down>
rescan_partitions
sd_revalidate_disk
<oops>

Oopses are reported here:
http://marc.info/?l=linux-scsi&m=132388619710052

This patch separates the partition invalidation from rescan_partitions()
and use it for -ENOMEDIUM case.

Index: linux-3.3/block/partition-generic.c
===================================================================
--- linux-3.3.orig/block/partition-generic.c 2012-02-15 09:00:25.147293790 +0900
+++ linux-3.3/block/partition-generic.c 2012-02-16 10:48:22.257680685 +0900
@@ -389,17 +389,11 @@ static bool disk_unlock_native_capacity(
}
}

-int rescan_partitions(struct gendisk *disk, struct block_device *bdev)
+static int drop_partitions(struct gendisk *disk, struct block_device *bdev)
{
- struct parsed_partitions *state = NULL;
struct disk_part_iter piter;
struct hd_struct *part;
- int p, highest, res;
-rescan:
- if (state && !IS_ERR(state)) {
- kfree(state);
- state = NULL;
- }
+ int res;

if (bdev->bd_part_count)
return -EBUSY;
@@ -412,6 +406,24 @@ rescan:
delete_partition(disk, part->partno);
disk_part_iter_exit(&piter);

+ return 0;
+}
+
+int rescan_partitions(struct gendisk *disk, struct block_device *bdev)
+{
+ struct parsed_partitions *state = NULL;
+ struct hd_struct *part;
+ int p, highest, res;
+rescan:
+ if (state && !IS_ERR(state)) {
+ kfree(state);
+ state = NULL;
+ }
+
+ res = drop_partitions(disk, bdev);
+ if (res)
+ return res;
+
if (disk->fops->revalidate_disk)
disk->fops->revalidate_disk(disk);
check_disk_size_change(disk, bdev);
@@ -515,6 +527,26 @@ rescan:
return 0;
}

+int invalidate_partitions(struct gendisk *disk, struct block_device *bdev)
+{
+ int res;
+
+ if (!bdev->bd_invalidated)
+ return 0;
+
+ res = drop_partitions(disk, bdev);
+ if (res)
+ return res;
+
+ set_capacity(disk, 0);
+ check_disk_size_change(disk, bdev);
+ bdev->bd_invalidated = 0;
+ /* tell userspace that the media / partition table may have changed */
+ kobject_uevent(&disk_to_dev(disk)->kobj, KOBJ_CHANGE);
+
+ return 0;
+}
+
unsigned char *read_dev_sector(struct block_device *bdev, sector_t n, Sector *p)
{
struct address_space *mapping = bdev->bd_inode->i_mapping;
Index: linux-3.3/include/linux/genhd.h
===================================================================
--- linux-3.3.orig/include/linux/genhd.h 2012-02-09 12:21:53.000000000 +0900
+++ linux-3.3/include/linux/genhd.h 2012-02-16 10:47:43.783681813 +0900
@@ -596,6 +596,7 @@ extern char *disk_name (struct gendisk *

extern int disk_expand_part_tbl(struct gendisk *disk, int target);
extern int rescan_partitions(struct gendisk *disk, struct block_device *bdev);
+extern int invalidate_partitions(struct gendisk *disk, struct block_device *bdev);
extern struct hd_struct * __must_check add_partition(struct gendisk *disk,
int partno, sector_t start,
sector_t len, int flags,
Index: linux-3.3/fs/block_dev.c
===================================================================
--- linux-3.3.orig/fs/block_dev.c 2012-02-09 12:21:53.000000000 +0900
+++ linux-3.3/fs/block_dev.c 2012-02-16 10:47:52.602681441 +0900
@@ -1183,8 +1183,12 @@ static int __blkdev_get(struct block_dev
* The latter is necessary to prevent ghost
* partitions on a removed medium.
*/
- if (bdev->bd_invalidated && (!ret || ret == -ENOMEDIUM))
- rescan_partitions(disk, bdev);
+ if (bdev->bd_invalidated) {
+ if (!ret)
+ rescan_partitions(disk, bdev);
+ else if (ret == -ENOMEDIUM)
+ invalidate_partitions(disk, bdev);
+ }
if (ret)
goto out_clear;
} else {
@@ -1214,8 +1218,12 @@ static int __blkdev_get(struct block_dev
if (bdev->bd_disk->fops->open)
ret = bdev->bd_disk->fops->open(bdev, mode);
/* the same as first opener case, read comment there */
- if (bdev->bd_invalidated && (!ret || ret == -ENOMEDIUM))
- rescan_partitions(bdev->bd_disk, bdev);
+ if (bdev->bd_invalidated) {
+ if (!ret)
+ rescan_partitions(bdev->bd_disk, bdev);
+ else if (ret == -ENOMEDIUM)
+ invalidate_partitions(bdev->bd_disk, bdev);
+ }
if (ret)
goto out_unlock_bdev;
}

2012-02-16 16:36:37

by Tejun Heo

[permalink] [raw]
Subject: Re: Kernel crashing on eject SD card

Hello,

On Thu, Feb 16, 2012 at 10:26:38AM +0900, Jun'ichi Nomura wrote:
> >> +int invalidate_partitions(struct gendisk *disk, struct block_device *bdev)
> >> +{
> >> + int res;
> >> +
> >> + res = drop_partitions(disk, bdev);
> >> + if (res)
> >> + return res;
> >> +
> >
> > Hmmm... shouldn't we have set_capacity(disk, 0) here?
>
> Added.
> I wasn't sure whether I should leave it to drivers.

The problem is that we shouldn't call into drivers without first
opening the device, so....

> But it seems capacity 0 for ENOMEDIUM device is reasonable.

Yeah, I *think* it should be okay.

> >> + check_disk_size_change(disk, bdev);
> >> + bdev->bd_invalidated = 0;
> >> + /* tell userspace that the media / partition table may have changed */
> >> + kobject_uevent(&disk_to_dev(disk)->kobj, KOBJ_CHANGE);
> >
> > Also, we really shouldn't be generating KOBJ_CHANGE after every
> > -ENOMEDIUM open. This can easily lead to infinite loop. We should
> > generate this iff we actually dropped partitions && modified the size.
>
> invalidate_partitions() is called only when bd_invalidated is set.
> So KOBJ_CHANGE is not raised for every ENOMEDIUM open.

Ah, okay.

> I put it explicit in the function to make it safer for
> possible misuse.
>
> How about this?
>
> ---------------------------------------------------------
> Do not call drivers when invalidating partitions for -ENOMEDIUM
>
> When a scsi driver returns -ENOMEDIUM for open(),
> __blkdev_get() calls rescan_partitions(), which ends up calling
> sd_revalidate_disk() without getting a refcount of scsi_device.
>
> That could lead to oops like this:
>
> process A process B
> ----------------------------------------------
> sys_open
> __blkdev_get
> sd_open
> returns -ENOMEDIUM
> scsi_remove_device
> <scsi_device torn down>
> rescan_partitions
> sd_revalidate_disk
> <oops>
>
> Oopses are reported here:
> http://marc.info/?l=linux-scsi&m=132388619710052
>
> This patch separates the partition invalidation from rescan_partitions()
> and use it for -ENOMEDIUM case.

Yeah, this looks good to me.

Thank you.

--
tejun