2005-10-29 12:11:13

by Masanari Iida

Subject: oops with USB Storage on 2.6.14

Hello,
I updated my system's kernel from 2.6.13.2 to 2.6.14,
and now it oopses when I connect my digital camera via USB
as a USB storage device.
I went back to 2.6.14-rc1, and the same panic still happens.
With 2.6.13.2 and earlier, the kernel worked as expected.

CPU Intel P4(2.4Ghz)
USB Device Pentax Optio S40.

Unable to handle kernel paging request at virtual address dc9d1f4c
printing eip:
c02b44cc
*pde = 00073067
*pte = 1c9d1000
Oops: 0000 [#1]
SMP DEBUG_PAGEALLOC
Modules linked in: autofs e100 ipt_LOG ipt_state ip_conntrack
ipt_recent iptable_filter ip_tables video rtc
CPU: 1
EIP: 0060:[<c02b44cc>] Not tainted VLI
EFLAGS: 00010286 (2.6.14)
EIP is at scsi_run_queue+0xc/0xd0
eax: 00000001 ebx: dc9d1e3c ecx: d6b67910 edx: dc9d1e3c
esi: d5048eb0 edi: dc9d1e3c ebp: c1507e98 esp: c1507e84
ds: 007b es: 007b ss: 0068
Process ksoftirqd/1 (pid: 6, threadinfo=c1506000 task=dfe2dad0)
Stack: 00000292 de3a7bf8 dc9d1e3c d5048eb0 dc9d1e3c c1507ea8 c02b4612 dc9d1e3c
da51bf60 c1507ecc c02b473f d5048eb0 00000000 00000024 00000286 00000001
d5048eb0 00000000 c1507f10 c02b4b2e d5048eb0 00000000 00000024 00000001

Call Trace:
[<c0103abf>] show_stack+0x7f/0xa0
[<c0103c72>] show_registers+0x162/0x1d0
[<c0103e90>] die+0x100/0x1a0
[<c039d7ae>] do_page_fault+0x31e/0x640
[<c0103763>] error_code+0x4f/0x54
[<c02b4612>] scsi_next_command+0x22/0x30
[<c02b473f>] scsi_end_request+0xcf/0xf0
[<c02b4b2e>] scsi_io_completion+0x26e/0x470
[<c02b4fc7>] scsi_generic_done+0x37/0x50
[<c02af9e5>] scsi_finish_command+0x85/0xa0
[<c02af89c>] scsi_softirq+0xcc/0x140
[<c0122085>] __do_softirq+0xd5/0xf0
[<c01220d8>] do_softirq+0x38/0x40
[<c0122685>] ksoftirqd+0x95/0xe0
[<c0131cfa>] kthread+0xba/0xc0
[<c0100ecd>] kernel_thread_helper+0x5/0x18
Code: f0 8b 42 44 e8 16 7f 0e 00 89 45 ec 89 1c 24 e8 6b b7 ff ff eb aa 89 f6 8d
bc 27 00 00 00 00 55 89 e5 57 56 53 83 ec 08 8b 55 08 <8b> 82 10 01 00 00 8b 38
f6 80 85 01 00 00 80 0f 85 9e 00 00 00
<0>Kernel panic - not syncing: Fatal exception in interrupt

Masanari


Attachments:
(No filename) (2.01 kB)
config-2.6.14 (33.19 kB)

2005-10-29 15:04:24

by Gene Heskett

Subject: Re: oops with USB Storage on 2.6.14

On Saturday 29 October 2005 08:11, Masanari Iida wrote:
>Hello,
>I updated my system's kernel from 2.6.13.2 to 2.6.14,
>then it oops when I connect my Digital Camera via USB connection
>as USB storage device.
>I went back to 2.6.14-rc1, still the same panic happen.
>2.6.13.2 and before, the kernel has been worked as expected.
>
>CPU Intel P4(2.4Ghz)
>USB Device Pentax Optio S40.

I have an Olympus C-3020 which uses the usb-storage module and looks
like a vfat filesystem with FAT bugs. I just checked it and it worked
as expected, with no errors or oopses. It has worked all along, any time I
wanted to grab the pictures from it and clear the memory card for further use.
Athlon XP-2800, currently running 2.6.14.

From dmesg:
usb 3-2.3: new full speed USB device using ohci_hcd and address 6
scsi0 : SCSI emulation for USB Mass Storage devices
usb-storage: device found at 6
usb-storage: waiting for device to settle before scanning
Vendor: OLYMPUS Model: C-3020ZOOM(U) Rev: 1.00
Type: Direct-Access ANSI SCSI revision: 02
usb-storage: device scan complete
SCSI device sda: 128000 512-byte hdwr sectors (66 MB)
sda: Write Protect is off
sda: Mode Sense: 18 00 00 08
sda: assuming drive cache: write through
SCSI device sda: 128000 512-byte hdwr sectors (66 MB)
sda: Write Protect is off
sda: Mode Sense: 18 00 00 08
sda: assuming drive cache: write through
sda: sda1
Attached scsi removable disk sda at scsi0, channel 0, id 0, lun 0
usb 3-2.3: USB disconnect, address 6

You may want to re-check your .config & rebuild.

>Unable to handle kernel paging request at virtual address dc9d1f4c
> printing eip:
>c02b44cc
>*pde = 00073067
>*pte = 1c9d1000
>Oops: 0000 [#1]
>SMP DEBUG_PAGEALLOC
>Modules linked in: autofs e100 ipt_LOG ipt_state ip_conntrack
>ipt_recent iptable
>_filter ip_tables video rtc
>CPU: 1
>EIP: 0060:[<c02b44cc>] Not tainted VLI
>EFLAGS: 00010286 (2.6.14)
>EIP is at scsi_run_queue+0xc/0xd0
>eax: 00000001 ebx: dc9d1e3c ecx: d6b67910 edx: dc9d1e3c
>esi: d5048eb0 edi: dc9d1e3c ebp: c1507e98 esp: c1507e84
>ds: 007b es: 007b ss: 0068
>Process ksoftirqd/1 (pid: 6, threadinfo=c1506000 task=dfe2dad0)
>Stack: 00000292 de3a7bf8 dc9d1e3c d5048eb0 dc9d1e3c c1507ea8 c02b4612
> dc9d1e3c da51bf60 c1507ecc c02b473f d5048eb0 00000000 00000024 00000286
> 00000001 d5048eb0 00000000 c1507f10 c02b4b2e d5048eb0 00000000 00000024
> 00000001
>
>Call Trace:
> [<c0103abf>] show_stack+0x7f/0xa0
> [<c0103c72>] show_registers+0x162/0x1d0
> [<c0103e90>] die+0x100/0x1a0
> [<c039d7ae>] do_page_fault+0x31e/0x640
> [<c0103763>] error_code+0x4f/0x54
> [<c02b4612>] scsi_next_command+0x22/0x30
> [<c02b473f>] scsi_end_request+0xcf/0xf0
> [<c02b4b2e>] scsi_io_completion+0x26e/0x470
> [<c02b4fc7>] scsi_generic_done+0x37/0x50
> [<c02af9e5>] scsi_finish_command+0x85/0xa0
> [<c02af89c>] scsi_softirq+0xcc/0x140
> [<c0122085>] __do_softirq+0xd5/0xf0
> [<c01220d8>] do_softirq+0x38/0x40
> [<c0122685>] ksoftirqd+0x95/0xe0
> [<c0131cfa>] kthread+0xba/0xc0
> [<c0100ecd>] kernel_thread_helper+0x5/0x18
>Code: f0 8b 42 44 e8 16 7f 0e 00 89 45 ec 89 1c 24 e8 6b b7 ff ff eb aa
> 89 f6 8d bc 27 00 00 00 00 55 89 e5 57 56 53 83 ec 08 8b 55 08 <8b> 82
> 10 01 00 00 8b 38 f6 80 85 01 00 00 80 0f 85 9e 00 00 00
> <0>Kernel panic - not syncing: Fatal exception in interrupt
>
>Masanari

--
Cheers, Gene
"There are four boxes to be used in defense of liberty:
soap, ballot, jury, and ammo. Please use in that order."
-Ed Howdershelt (Author)
99.35% setiathome rank, not too shabby for a WV hillbilly
Free OpenDocument reader/writer/converter download:
http://www.openoffice.org
Yahoo.com and AOL/TW attorneys please note, additions to the above
message by Gene Heskett are:
Copyright 2005 by Maurice Eugene Heskett, all rights reserved.

2005-10-30 21:23:53

by Andrew Morton

Subject: Re: oops with USB Storage on 2.6.14

Masanari Iida <[email protected]> wrote:
>
> Hello,
> I updated my system's kernel from 2.6.13.2 to 2.6.14,
> then it oops when I connect my Digital Camera via USB connection
> as USB storage device.
> I went back to 2.6.14-rc1, still the same panic happen.
> 2.6.13.2 and before, the kernel has been worked as expected.
>
> CPU Intel P4(2.4Ghz)
> USB Device Pentax Optio S40.
>
> Unable to handle kernel paging request at virtual address dc9d1f4c
> printing eip:
> c02b44cc
> *pde = 00073067
> *pte = 1c9d1000
> Oops: 0000 [#1]
> SMP DEBUG_PAGEALLOC
> Modules linked in: autofs e100 ipt_LOG ipt_state ip_conntrack
> ipt_recent iptable
> _filter ip_tables video rtc
> CPU: 1
> EIP: 0060:[<c02b44cc>] Not tainted VLI
> EFLAGS: 00010286 (2.6.14)
> EIP is at scsi_run_queue+0xc/0xd0
> eax: 00000001 ebx: dc9d1e3c ecx: d6b67910 edx: dc9d1e3c
> esi: d5048eb0 edi: dc9d1e3c ebp: c1507e98 esp: c1507e84
> ds: 007b es: 007b ss: 0068
> Process ksoftirqd/1 (pid: 6, threadinfo=c1506000 task=dfe2dad0)
> Stack: 00000292 de3a7bf8 dc9d1e3c d5048eb0 dc9d1e3c c1507ea8 c02b4612 dc9d1e3c
> da51bf60 c1507ecc c02b473f d5048eb0 00000000 00000024 00000286 00000001
> d5048eb0 00000000 c1507f10 c02b4b2e d5048eb0 00000000 00000024 00000001
>
> Call Trace:
> [<c0103abf>] show_stack+0x7f/0xa0
> [<c0103c72>] show_registers+0x162/0x1d0
> [<c0103e90>] die+0x100/0x1a0
> [<c039d7ae>] do_page_fault+0x31e/0x640
> [<c0103763>] error_code+0x4f/0x54
> [<c02b4612>] scsi_next_command+0x22/0x30
> [<c02b473f>] scsi_end_request+0xcf/0xf0
> [<c02b4b2e>] scsi_io_completion+0x26e/0x470
> [<c02b4fc7>] scsi_generic_done+0x37/0x50
> [<c02af9e5>] scsi_finish_command+0x85/0xa0
> [<c02af89c>] scsi_softirq+0xcc/0x140
> [<c0122085>] __do_softirq+0xd5/0xf0
> [<c01220d8>] do_softirq+0x38/0x40
> [<c0122685>] ksoftirqd+0x95/0xe0
> [<c0131cfa>] kthread+0xba/0xc0
> [<c0100ecd>] kernel_thread_helper+0x5/0x18
> Code: f0 8b 42 44 e8 16 7f 0e 00 89 45 ec 89 1c 24 e8 6b b7 ff ff eb aa 89 f6 8d
> bc 27 00 00 00 00 55 89 e5 57 56 53 83 ec 08 8b 55 08 <8b> 82 10 01 00 00 8b 38
> f6 80 85 01 00 00 80 0f 85 9e 00 00 00
> <0>Kernel panic - not syncing: Fatal exception in interrupt
>

Either a scsi bug or a USB bug. Either way, regressions like this are a
top priority.

Could you please try disabling CONFIG_DEBUG_PAGEALLOC and retest? If that
works OK, it's probably a use-after-free.

Thanks.
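
The register dump in the report is consistent with a stale pointer being dereferenced. A minimal sketch of the arithmetic, assuming the marked bytes `<8b> 82 10 01 00 00` in the Code line decode as `mov 0x110(%edx),%eax`: the faulting address should then be edx plus 0x110. The helper names below are hypothetical; the constants are copied from the oops.

```c
#include <stdint.h>

/* Effective address of a base+displacement memory operand on 32-bit x86. */
uint32_t effective_address(uint32_t base, int32_t disp)
{
	return base + (uint32_t)disp;
}

/* Values copied from the first oops report (2.6.14, DEBUG_PAGEALLOC). */
#define OOPS1_EDX        0xdc9d1e3cu  /* edx at the time of the fault */
#define OOPS1_DISP       0x110        /* displacement in "8b 82 10 01 00 00" */
#define OOPS1_FAULT_ADDR 0xdc9d1f4cu  /* "virtual address" line of the oops */

/* Returns 1 when the reported fault address equals edx + displacement. */
int oops1_matches(void)
{
	return effective_address(OOPS1_EDX, OOPS1_DISP) == OOPS1_FAULT_ADDR;
}
```

The match suggests the fault came from reading a field at offset 0x110 of the object that edx points to, which fits a read from memory that DEBUG_PAGEALLOC had already unmapped after it was freed.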

2005-10-31 00:41:09

by Masanari Iida

Subject: Re: oops with USB Storage on 2.6.14

>
> Could you please try disabling CONFIG_DEBUG_PAGEALLOC and retest? If that
> works OK, it's probably a use-after-free.
>
Hello Andrew,

I disabled CONFIG_DEBUG_PAGEALLOC and re-tested on 2.6.14-rc1.
Now the oops doesn't happen when I connect the digital camera via USB.
I could mount the camera as USB storage.
But an oops still happens when I turn the camera's power off.
(This oops didn't halt my system, BTW)

# Unable to handle kernel paging request at virtual address 6b6b6bb3
printing eip:
c02b88ca
*pde = 00000000
Oops: 0002 [#1]
SMP
Modules linked in: autofs e100 ipt_LOG ipt_state ip_conntrack
ipt_recent iptable_filter ip_tables video rtc
CPU: 0
EIP: 0060:[<c02b88ca>] Not tainted VLI
EFLAGS: 00010296 (2.6.14-rc1)
EIP is at scsi_remove_device+0x3a/0x50
eax: 00000001 ebx: dee9f478 ecx: 00000000 edx: 6b6b6b6b
esi: de6655c4 edi: de6655bc ebp: dfd4fdb4 esp: dfd4fda4
ds: 007b es: 007b ss: 0068
Process khubd (pid: 15, threadinfo=dfd4e000 task=dff1f030)
Stack: dee9f478 00000066 dee9f478 de6655c4 dfd4fdcc c02b89a1 dee9f478 de6655c8
de213c98 de6655c4 dfd4fde8 c02b8a16 de213c84 dfd4fde8 00000282 de6655c8
de6655c8 dfd4fe04 c02b7655 de213c98 de6655cc de6655c4 de6656e0 de213a8c
Call Trace:
[<c0103aaf>] show_stack+0x7f/0xa0
[<c0103c62>] show_registers+0x162/0x1d0
[<c0103e74>] die+0xf4/0x1a0
[<c039cfce>] do_page_fault+0x31e/0x640
[<c0103753>] error_code+0x4f/0x54
[<c02b89a1>] __scsi_remove_target+0xc1/0xe0
[<c02b8a16>] scsi_remove_target+0x26/0x60
[<c02b7655>] scsi_forget_host+0x45/0x70
[<c02afd97>] scsi_remove_host+0x57/0xa0
[<c02e5145>] quiesce_and_remove_host+0x75/0xb0
[<c02e55fd>] storage_disconnect+0x1d/0x2c
[<c02c8286>] usb_unbind_interface+0x86/0x90
[<c025617b>] __device_release_driver+0x8b/0x90
[<c02561b6>] device_release_driver+0x36/0x50
[<c02557c9>] bus_remove_device+0x79/0x90
[<c0254525>] device_del+0x35/0x70
[<c02d00fb>] usb_disable_device+0xfb/0x130
[<c02caa46>] usb_disconnect+0xc6/0x180
[<c02cbe2f>] hub_port_connect_change+0x3cf/0x400
[<c02cc12e>] hub_events+0x2ce/0x410
[<c02cc285>] hub_thread+0x15/0xf0
[<c0131aaa>] kthread+0xba/0xc0
[<c0100ecd>] kernel_thread_helper+0x5/0x18
Code: 5d 08 89 75 fc 8b 33 89 44 24 04 c7 04 24 e8 a8 3b c0 e8 9a 14
e6 ff f0 ff 4e 48 0f 88 a8 04 00 00 89 1c
24 e8 38 ff ff ff 8b 13 <f0> ff 42 48 0f 8e a1 04 00 00 8b 5d f8 8b
75 fc 89 ec 5d c3 89

If you need more tests, let me know.
In that case, please specify which kernel version you want me to test.

Regards,

Masanari Iida

2005-10-31 01:03:22

by Andrew Morton

Subject: Re: oops with USB Storage on 2.6.14

Masanari Iida <[email protected]> wrote:
>

You removed linux-scsi and linux-usb from cc. Please retain them.

> >
> > Could you please try disabling CONFIG_DEBUG_PAGEALLOC and retest? If that
> > works OK, it's probably a use-after-free.
> >
> Hello Andrew,
>
> I did disabled CONFIG_DEBUG_PAGEALLOC and re-tested on 2.6.14-rc1.
> Now the oops didn't happen when I connect digital camera to the USB.

So the first oops was probably use-after-free.

> I could mount the camera as USB storage.
> But oops still happen when I turned the camera power off.
> (This oops didn't halt my system, BTW)
>
> # Unable to handle kernel paging request at virtual address 6b6b6bb3
> printing eip:
> c02b88ca
> *pde = 00000000
> Oops: 0002 [#1]
> SMP
> Modules linked in: autofs e100 ipt_LOG ipt_state ip_conntrack
> ipt_recent iptable_filter ip_tables video rtc
> CPU: 0
> EIP: 0060:[<c02b88ca>] Not tainted VLI
> EFLAGS: 00010296 (2.6.14-rc1)
> EIP is at scsi_remove_device+0x3a/0x50
> eax: 00000001 ebx: dee9f478 ecx: 00000000 edx: 6b6b6b6b
> esi: de6655c4 edi: de6655bc ebp: dfd4fdb4 esp: dfd4fda4
> ds: 007b es: 007b ss: 0068
> Process khubd (pid: 15, threadinfo=dfd4e000 task=dff1f030)
> Stack: dee9f478 00000066 dee9f478 de6655c4 dfd4fdcc c02b89a1 dee9f478 de6655c8
> de213c98 de6655c4 dfd4fde8 c02b8a16 de213c84 dfd4fde8 00000282 de6655c8
> de6655c8 dfd4fe04 c02b7655 de213c98 de6655cc de6655c4 de6656e0 de213a8c
> Call Trace:
> [<c0103aaf>] show_stack+0x7f/0xa0
> [<c0103c62>] show_registers+0x162/0x1d0
> [<c0103e74>] die+0xf4/0x1a0
> [<c039cfce>] do_page_fault+0x31e/0x640
> [<c0103753>] error_code+0x4f/0x54
> [<c02b89a1>] __scsi_remove_target+0xc1/0xe0
> [<c02b8a16>] scsi_remove_target+0x26/0x60
> [<c02b7655>] scsi_forget_host+0x45/0x70
> [<c02afd97>] scsi_remove_host+0x57/0xa0
> [<c02e5145>] quiesce_and_remove_host+0x75/0xb0
> [<c02e55fd>] storage_disconnect+0x1d/0x2c
> [<c02c8286>] usb_unbind_interface+0x86/0x90
> [<c025617b>] __device_release_driver+0x8b/0x90
> [<c02561b6>] device_release_driver+0x36/0x50
> [<c02557c9>] bus_remove_device+0x79/0x90
> [<c0254525>] device_del+0x35/0x70
> [<c02d00fb>] usb_disable_device+0xfb/0x130
> [<c02caa46>] usb_disconnect+0xc6/0x180
> [<c02cbe2f>] hub_port_connect_change+0x3cf/0x400
> [<c02cc12e>] hub_events+0x2ce/0x410
> [<c02cc285>] hub_thread+0x15/0xf0
> [<c0131aaa>] kthread+0xba/0xc0
> [<c0100ecd>] kernel_thread_helper+0x5/0x18
> Code: 5d 08 89 75 fc 8b 33 89 44 24 04 c7 04 24 e8 a8 3b c0 e8 9a 14
> e6 ff f0 ff 4e 48 0f 88 a8 04 00 00 89 1c
> 24 e8 38 ff ff ff 8b 13 <f0> ff 42 48 0f 8e a1 04 00 00 8b 5d f8 8b
> 75 fc 89 ec 5d c3 89
>
> If you need some more test, let me know.
> In that case, please specify which version of kernel you want me to test.
>

OK, thanks. This is a different bug. Presumably in USB.
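
The fault address in this second oops has a recognizable shape. A minimal sketch, assuming the kernel was built with slab debugging (which fills freed objects with the byte 0x6b) and that the marked bytes `<f0> ff 42 48` decode as `lock incl 0x48(%edx)`: edx holds 6b6b6b6b, so the write lands at 6b6b6b6b + 0x48. The function names are hypothetical.

```c
#include <stdint.h>

/* Byte written over freed slab objects when slab debugging is enabled. */
#define SLAB_POISON_FREE 0x6bu

/* Returns 1 when every byte of a 32-bit value is the free-poison byte. */
int is_slab_poison(uint32_t value)
{
	int i;

	for (i = 0; i < 4; i++) {
		if (((value >> (8 * i)) & 0xffu) != SLAB_POISON_FREE)
			return 0;
	}
	return 1;
}

/* Address touched when a field at offset disp is accessed through a
 * pointer that was itself read out of poisoned (freed) memory. */
uint32_t poisoned_fault_address(uint32_t disp)
{
	return 0x6b6b6b6bu + disp;
}
```

With disp = 0x48 this gives 6b6b6bb3, the address in the report, which points to a pointer being loaded from an already-freed object and then used for an atomic increment.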

2005-10-31 16:14:18

by Alan Stern

Subject: Re: [linux-usb-devel] Re: oops with USB Storage on 2.6.14

On Sun, 30 Oct 2005, Andrew Morton wrote:

> Masanari Iida <[email protected]> wrote:

> > Hello Andrew,
> >
> > I did disabled CONFIG_DEBUG_PAGEALLOC and re-tested on 2.6.14-rc1.
> > Now the oops didn't happen when I connect digital camera to the USB.
>
> So the first oops was probably use-after-free.
>
> > I could mount the camera as USB storage.
> > But oops still happen when I turned the camera power off.
> > (This oops didn't halt my system, BTW)
> >
> > # Unable to handle kernel paging request at virtual address 6b6b6bb3
> > printing eip:
> > c02b88ca
> > *pde = 00000000
> > Oops: 0002 [#1]
> > SMP
> > Modules linked in: autofs e100 ipt_LOG ipt_state ip_conntrack
> > ipt_recent iptable_filter ip_tables video rtc
> > CPU: 0
> > EIP: 0060:[<c02b88ca>] Not tainted VLI
> > EFLAGS: 00010296 (2.6.14-rc1)
> > EIP is at scsi_remove_device+0x3a/0x50

> > If you need some more test, let me know.
> > In that case, please specify which version of kernel you want me to test.
> >
>
> OK, thanks. This is a different bug. Presumably in USB.

This was fixed in later releases of 2.6.14-rc.

I wasn't able to reproduce the original problem, even after setting
CONFIG_DEBUG_PAGEALLOC.

Alan Stern

2005-11-02 02:56:26

by Masanari Iida

Subject: Re: [linux-usb-devel] Re: oops with USB Storage on 2.6.14

On 11/1/05, Alan Stern <[email protected]> wrote:
> On Sun, 30 Oct 2005, Andrew Morton wrote:
>
> > Masanari Iida <[email protected]> wrote:
>
> > > Hello Andrew,
> > >
> > > I did disabled CONFIG_DEBUG_PAGEALLOC and re-tested on 2.6.14-rc1.
> > > Now the oops didn't happen when I connect digital camera to the USB.
> >
> > So the first oops was probably use-after-free.
> >
> > > I could mount the camera as USB storage.
> > > But oops still happen when I turned the camera power off.
> > > (This oops didn't halt my system, BTW)
> > >
> > > # Unable to handle kernel paging request at virtual address 6b6b6bb3
> > > printing eip:
> > > c02b88ca
> > > *pde = 00000000
> > > Oops: 0002 [#1]
> > > SMP
> > > Modules linked in: autofs e100 ipt_LOG ipt_state ip_conntrack
> > > ipt_recent iptable_filter ip_tables video rtc
> > > CPU: 0
> > > EIP: 0060:[<c02b88ca>] Not tainted VLI
> > > EFLAGS: 00010296 (2.6.14-rc1)
> > > EIP is at scsi_remove_device+0x3a/0x50
>
> > > If you need some more test, let me know.
> > > In that case, please specify which version of kernel you want me to test.
> > >
> >
> > OK, thanks. This is a different bug. Presumably in USB.
>
> This was fixed in later releases of 2.6.14-rc.
>
> I wasn't able to reproduce the original problem, even after setting
> CONFIG_DEBUG_PAGEALLOC.
>
> Alan Stern
>
Alan,

I confirm that the "scsi_remove_device" oops didn't happen on 2.4.14.
As for the original panic, since I have a workaround
(CONFIG_DEBUG_PAGEALLOC disabled), I agree to close my report now.

Thank you.

Masanari Iida

2005-11-02 07:23:11

by Andrew Morton

Subject: Re: [linux-usb-devel] Re: oops with USB Storage on 2.6.14

Masanari Iida <[email protected]> wrote:
>
> On 11/1/05, Alan Stern <[email protected]> wrote:
> > On Sun, 30 Oct 2005, Andrew Morton wrote:
> >
> > > Masanari Iida <[email protected]> wrote:
> >
> > > > Hello Andrew,
> > > >
> > > > I did disabled CONFIG_DEBUG_PAGEALLOC and re-tested on 2.6.14-rc1.
> > > > Now the oops didn't happen when I connect digital camera to the USB.
> > >
> > > So the first oops was probably use-after-free.
> > >
> > > > I could mount the camera as USB storage.
> > > > But oops still happen when I turned the camera power off.
> > > > (This oops didn't halt my system, BTW)
> > > >
> > > > # Unable to handle kernel paging request at virtual address 6b6b6bb3
> > > > printing eip:
> > > > c02b88ca
> > > > *pde = 00000000
> > > > Oops: 0002 [#1]
> > > > SMP
> > > > Modules linked in: autofs e100 ipt_LOG ipt_state ip_conntrack
> > > > ipt_recent iptable_filter ip_tables video rtc
> > > > CPU: 0
> > > > EIP: 0060:[<c02b88ca>] Not tainted VLI
> > > > EFLAGS: 00010296 (2.6.14-rc1)
> > > > EIP is at scsi_remove_device+0x3a/0x50
> >
> > > > If you need some more test, let me know.
> > > > In that case, please specify which version of kernel you want me to test.
> > > >
> > >
> > > OK, thanks. This is a different bug. Presumably in USB.
> >
> > This was fixed in later releases of 2.6.14-rc.
> >
> > I wasn't able to reproduce the original problem, even after setting
> > CONFIG_DEBUG_PAGEALLOC.
> >
> > Alan Stern
> >
> Alan,
>
> Confirm the " scsi_remove_device " oops didn't happen on 2.4.14.

2.6.14, I assume.

> Talking about the original PANIC, as I have a workaround
> (CONFIG_DEBUG_PAGEALLOC disabled), I agree to close my report, now.

That's not a valid workaround. We're touching freed, unallocated or simply
wild memory and that is a bad bug.

2005-11-08 04:41:14

by Andrew Morton

Subject: Re: oops with USB Storage on 2.6.14

Masanari Iida <[email protected]> wrote:
>
> Hello,
> I updated my system's kernel from 2.6.13.2 to 2.6.14,
> then it oops when I connect my Digital Camera via USB connection
> as USB storage device.
> I went back to 2.6.14-rc1, still the same panic happen.
> 2.6.13.2 and before, the kernel has been worked as expected.
>
> CPU Intel P4(2.4Ghz)
> USB Device Pentax Optio S40.
>
> Unable to handle kernel paging request at virtual address dc9d1f4c
> printing eip:
> c02b44cc
> *pde = 00073067
> *pte = 1c9d1000
> Oops: 0000 [#1]
> SMP DEBUG_PAGEALLOC
> Modules linked in: autofs e100 ipt_LOG ipt_state ip_conntrack
> ipt_recent iptable
> _filter ip_tables video rtc
> CPU: 1
> EIP: 0060:[<c02b44cc>] Not tainted VLI
> EFLAGS: 00010286 (2.6.14)
> EIP is at scsi_run_queue+0xc/0xd0
> eax: 00000001 ebx: dc9d1e3c ecx: d6b67910 edx: dc9d1e3c
> esi: d5048eb0 edi: dc9d1e3c ebp: c1507e98 esp: c1507e84
> ds: 007b es: 007b ss: 0068
> Process ksoftirqd/1 (pid: 6, threadinfo=c1506000 task=dfe2dad0)
> Stack: 00000292 de3a7bf8 dc9d1e3c d5048eb0 dc9d1e3c c1507ea8 c02b4612 dc9d1e3c
> da51bf60 c1507ecc c02b473f d5048eb0 00000000 00000024 00000286 00000001
> d5048eb0 00000000 c1507f10 c02b4b2e d5048eb0 00000000 00000024 00000001
>
> Call Trace:
> [<c0103abf>] show_stack+0x7f/0xa0
> [<c0103c72>] show_registers+0x162/0x1d0
> [<c0103e90>] die+0x100/0x1a0
> [<c039d7ae>] do_page_fault+0x31e/0x640
> [<c0103763>] error_code+0x4f/0x54
> [<c02b4612>] scsi_next_command+0x22/0x30
> [<c02b473f>] scsi_end_request+0xcf/0xf0
> [<c02b4b2e>] scsi_io_completion+0x26e/0x470
> [<c02b4fc7>] scsi_generic_done+0x37/0x50
> [<c02af9e5>] scsi_finish_command+0x85/0xa0
> [<c02af89c>] scsi_softirq+0xcc/0x140
> [<c0122085>] __do_softirq+0xd5/0xf0
> [<c01220d8>] do_softirq+0x38/0x40
> [<c0122685>] ksoftirqd+0x95/0xe0
> [<c0131cfa>] kthread+0xba/0xc0
> [<c0100ecd>] kernel_thread_helper+0x5/0x18
> Code: f0 8b 42 44 e8 16 7f 0e 00 89 45 ec 89 1c 24 e8 6b b7 ff ff eb aa 89 f6 8d
> bc 27 00 00 00 00 55 89 e5 57 56 53 83 ec 08 8b 55 08 <8b> 82 10 01 00 00 8b 38
> f6 80 85 01 00 00 80 0f 85 9e 00 00 00
> <0>Kernel panic - not syncing: Fatal exception in interrupt
>

Has there been any progress on this?

If not, can you please test the latest snapshot from
ftp://ftp.kernel.org/pub/linux/kernel/v2.6/snapshots and if it still fails, raise a bug at bugzilla.kernel.org?

Thanks.

2005-11-08 15:01:27

by Masanari Iida

Subject: Re: oops with USB Storage on 2.6.14

On 11/8/05, Andrew Morton <[email protected]> wrote:
> Masanari Iida <[email protected]> wrote:
> >
> > Hello,
> > I updated my system's kernel from 2.6.13.2 to 2.6.14,
> > then it oops when I connect my Digital Camera via USB connection
> > as USB storage device.
> > I went back to 2.6.14-rc1, still the same panic happen.
> > 2.6.13.2 and before, the kernel has been worked as expected.
> >
> > CPU Intel P4(2.4Ghz)
> > USB Device Pentax Optio S40.
> >
> > Unable to handle kernel paging request at virtual address dc9d1f4c
> > printing eip:
> > c02b44cc
> > *pde = 00073067
> > *pte = 1c9d1000
> > Oops: 0000 [#1]
> > SMP DEBUG_PAGEALLOC
> > Modules linked in: autofs e100 ipt_LOG ipt_state ip_conntrack
> > ipt_recent iptable
> > _filter ip_tables video rtc
> > CPU: 1
> > EIP: 0060:[<c02b44cc>] Not tainted VLI
> > EFLAGS: 00010286 (2.6.14)
> > EIP is at scsi_run_queue+0xc/0xd0
> > eax: 00000001 ebx: dc9d1e3c ecx: d6b67910 edx: dc9d1e3c
> > esi: d5048eb0 edi: dc9d1e3c ebp: c1507e98 esp: c1507e84
> > ds: 007b es: 007b ss: 0068
> > Process ksoftirqd/1 (pid: 6, threadinfo=c1506000 task=dfe2dad0)
> > Stack: 00000292 de3a7bf8 dc9d1e3c d5048eb0 dc9d1e3c c1507ea8 c02b4612 dc9d1e3c
> > da51bf60 c1507ecc c02b473f d5048eb0 00000000 00000024 00000286 00000001
> > d5048eb0 00000000 c1507f10 c02b4b2e d5048eb0 00000000 00000024 00000001
> >
> > Call Trace:
> > [<c0103abf>] show_stack+0x7f/0xa0
> > [<c0103c72>] show_registers+0x162/0x1d0
> > [<c0103e90>] die+0x100/0x1a0
> > [<c039d7ae>] do_page_fault+0x31e/0x640
> > [<c0103763>] error_code+0x4f/0x54
> > [<c02b4612>] scsi_next_command+0x22/0x30
> > [<c02b473f>] scsi_end_request+0xcf/0xf0
> > [<c02b4b2e>] scsi_io_completion+0x26e/0x470
> > [<c02b4fc7>] scsi_generic_done+0x37/0x50
> > [<c02af9e5>] scsi_finish_command+0x85/0xa0
> > [<c02af89c>] scsi_softirq+0xcc/0x140
> > [<c0122085>] __do_softirq+0xd5/0xf0
> > [<c01220d8>] do_softirq+0x38/0x40
> > [<c0122685>] ksoftirqd+0x95/0xe0
> > [<c0131cfa>] kthread+0xba/0xc0
> > [<c0100ecd>] kernel_thread_helper+0x5/0x18
> > Code: f0 8b 42 44 e8 16 7f 0e 00 89 45 ec 89 1c 24 e8 6b b7 ff ff eb aa 89 f6 8d
> > bc 27 00 00 00 00 55 89 e5 57 56 53 83 ec 08 8b 55 08 <8b> 82 10 01 00 00 8b 38
> > f6 80 85 01 00 00 80 0f 85 9e 00 00 00
> > <0>Kernel panic - not syncing: Fatal exception in interrupt
> >
>
> Has there been any progress on this?
>
> If not, can you please test the latest snapshot from
> ftp://ftp.kernel.org/pub/linux/kernel/v2.6/snapshots and if it still fails, raise a bug at bugzilla.kernel.org?
>
> Thanks.
>

Hello again, Andrew,

I have tested on 2.6.14-git10 with CONFIG_DEBUG_PAGEALLOC=y.
The original oops with USB Storage (Camera) is fixed now.
Thank you.

Masanari

2005-11-08 16:25:38

by Edward Goggin

Subject: RE: oops with USB Storage on 2.6.14

I've run into a bug like this several times using 2.6.14-rc4 while
testing dm-multipath's reaction to uevents generated by forcing
Fibre Channel transport failures -- which leads to the scsi device
being detached and the queuedata pointer in the device's queue being
reset in scsi_device_dev_release. The fix I've used is below and
it seems to work well for me. I was going to post this patch to
dm-devel today or tomorrow anyway.

drivers/scsi/scsi_lib.c:scsi_next_command()
Call scsi_device_get and scsi_device_put around the calls to
scsi_put_command and scsi_run_queue so that the scsi host structure
will not be de-allocated between scsi_put_command and scsi_run_queue.

*** ../base/linux-2.6.14-rc4/drivers/scsi/scsi_lib.c Mon Oct 10 20:19:19
2005
--- drivers/scsi/scsi_lib.c Thu Nov 3 13:30:03 2005
***************
*** 592,601 ****

void scsi_next_command(struct scsi_cmnd *cmd)
{
! struct request_queue *q = cmd->device->request_queue;

scsi_put_command(cmd);
scsi_run_queue(q);
}

void scsi_run_host_queues(struct Scsi_Host *shost)
--- 592,611 ----

void scsi_next_command(struct scsi_cmnd *cmd)
{
! struct scsi_device *sdev = cmd->device;
! struct request_queue *q = sdev->request_queue;
!
! // need to hold a reference on the device before we let go of the
cmd
! if (scsi_device_get(sdev)) {
! scsi_put_command(cmd);
! return; // maybe sdev_state == SDEV_CANCEL, SDEV_DEL
! }

scsi_put_command(cmd);
scsi_run_queue(q);
+
+ // ok to remove device now
+ scsi_device_put(sdev);
}

void scsi_run_host_queues(struct Scsi_Host *shost)


> -----Original Message-----
> From: [email protected]
> [mailto:[email protected]] On Behalf Of Andrew Morton
> Sent: Monday, November 07, 2005 11:41 PM
> To: Masanari Iida
> Cc: [email protected];
> [email protected]; [email protected]
> Subject: Re: oops with USB Storage on 2.6.14
>
> Masanari Iida <[email protected]> wrote:
> >
> > Hello,
> > I updated my system's kernel from 2.6.13.2 to 2.6.14,
> > then it oops when I connect my Digital Camera via USB connection
> > as USB storage device.
> > I went back to 2.6.14-rc1, still the same panic happen.
> > 2.6.13.2 and before, the kernel has been worked as expected.
> >
> > CPU Intel P4(2.4Ghz)
> > USB Device Pentax Optio S40.
> >
> > Unable to handle kernel paging request at virtual address dc9d1f4c
> > printing eip:
> > c02b44cc
> > *pde = 00073067
> > *pte = 1c9d1000
> > Oops: 0000 [#1]
> > SMP DEBUG_PAGEALLOC
> > Modules linked in: autofs e100 ipt_LOG ipt_state ip_conntrack
> > ipt_recent iptable
> > _filter ip_tables video rtc
> > CPU: 1
> > EIP: 0060:[<c02b44cc>] Not tainted VLI
> > EFLAGS: 00010286 (2.6.14)
> > EIP is at scsi_run_queue+0xc/0xd0
> > eax: 00000001 ebx: dc9d1e3c ecx: d6b67910 edx: dc9d1e3c
> > esi: d5048eb0 edi: dc9d1e3c ebp: c1507e98 esp: c1507e84
> > ds: 007b es: 007b ss: 0068
> > Process ksoftirqd/1 (pid: 6, threadinfo=c1506000 task=dfe2dad0)
> > Stack: 00000292 de3a7bf8 dc9d1e3c d5048eb0 dc9d1e3c
> c1507ea8 c02b4612 dc9d1e3c
> > da51bf60 c1507ecc c02b473f d5048eb0 00000000
> 00000024 00000286 00000001
> > d5048eb0 00000000 c1507f10 c02b4b2e d5048eb0
> 00000000 00000024 00000001
> >
> > Call Trace:
> > [<c0103abf>] show_stack+0x7f/0xa0
> > [<c0103c72>] show_registers+0x162/0x1d0
> > [<c0103e90>] die+0x100/0x1a0
> > [<c039d7ae>] do_page_fault+0x31e/0x640
> > [<c0103763>] error_code+0x4f/0x54
> > [<c02b4612>] scsi_next_command+0x22/0x30
> > [<c02b473f>] scsi_end_request+0xcf/0xf0
> > [<c02b4b2e>] scsi_io_completion+0x26e/0x470
> > [<c02b4fc7>] scsi_generic_done+0x37/0x50
> > [<c02af9e5>] scsi_finish_command+0x85/0xa0
> > [<c02af89c>] scsi_softirq+0xcc/0x140
> > [<c0122085>] __do_softirq+0xd5/0xf0
> > [<c01220d8>] do_softirq+0x38/0x40
> > [<c0122685>] ksoftirqd+0x95/0xe0
> > [<c0131cfa>] kthread+0xba/0xc0
> > [<c0100ecd>] kernel_thread_helper+0x5/0x18
> > Code: f0 8b 42 44 e8 16 7f 0e 00 89 45 ec 89 1c 24 e8 6b b7
> ff ff eb aa 89 f6 8d
> > bc 27 00 00 00 00 55 89 e5 57 56 53 83 ec 08 8b 55 08 <8b>
> 82 10 01 00 00 8b 38
> > f6 80 85 01 00 00 80 0f 85 9e 00 00 00
> > <0>Kernel panic - not syncing: Fatal exception in interrupt
> >
>
> Has there been any progress on this?
>
> If not, can you please test the latest snapshot from
> ftp://ftp.kernel.org/pub/linux/kernel/v2.6/snapshots and if
> it still fails, raise a bug at bugzilla.kernel.org?
>
> Thanks.
>

2005-11-08 16:38:17

by Rolf Eike Beer

Subject: Re: oops with USB Storage on 2.6.14

On Tuesday, 8 November 2005 17:24, goggin, edward wrote:
>I've run into a bug like this several times using 2.6.14-rc4 while
>testing dm-multipath's reaction to uevents generated by forcing
>fiber channel transport failures -- which leads to the scsi device
>being detached and the queuedata pointer in the device's queue being
>reset in scsi_device_dev_release. The fix I've used is below and
>it seems to work well for me. I was going to place this patch on
>dm-devel today or tomorrow anyway.
>
>drivers/scsi/scsi_lib.c:scsi_next_command()
>Call scsi_device_get and scsi_device_put around the calls to
>scsi_put_command
>and scsi_run_queue so that the scsi host structure will not be de-allocated
>between scsi_put_command and scsi_run_queue.
>
>*** ../base/linux-2.6.14-rc4/drivers/scsi/scsi_lib.c Mon Oct 10 20:19:19
>2005
>--- drivers/scsi/scsi_lib.c Thu Nov 3 13:30:03 2005
>***************
>*** 592,601 ****

Your patch is line-wrapped. Also, please use unified diff format; a good
choice of diff options is "-Naurp".

Eike



2005-11-08 17:03:27

by James Bottomley

Subject: RE: oops with USB Storage on 2.6.14

On Tue, 2005-11-08 at 11:24 -0500, goggin, edward wrote:
> ! struct scsi_device *sdev = cmd->device;
> ! struct request_queue *q = sdev->request_queue;
> !
> ! // need to hold a reference on the device before we let go of the
> cmd
> ! if (scsi_device_get(sdev)) {
> ! scsi_put_command(cmd);
> ! return; // maybe sdev_state == SDEV_CANCEL, SDEV_DEL
> ! }
>
> scsi_put_command(cmd);
> scsi_run_queue(q);
> +
> + // ok to remove device now
> + scsi_device_put(sdev);

This is the right idea, I think, but not necessarily the right fix.
scsi_device_get() will fail if the device is going offline, but we would
still need to run the queues.

try this sequence instead:

get_device(&sdev->sdev_gendev);
scsi_put_command(cmd);
scsi_run_queue(q);
put_device(&sdev->sdev_gendev);

James
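
The ordering above can be illustrated outside the kernel. This is a minimal userspace sketch, not kernel code: `toy_device`, `toy_get`, `toy_put` and the other names are hypothetical stand-ins, the "release" only sets a flag instead of freeing memory (so the result can be inspected safely), and it assumes the command holds the last reference on the device.

```c
#include <stdbool.h>

/* Hypothetical stand-in for a reference-counted device. */
struct toy_device {
	int refcount;
	bool released;           /* set when the last reference is dropped */
	bool ran_after_release;  /* set if the queue ran on a released device */
};

void toy_get(struct toy_device *d)
{
	d->refcount++;
}

void toy_put(struct toy_device *d)
{
	if (--d->refcount == 0)
		d->released = true;  /* real code would free the object here */
}

/* Models scsi_run_queue(): records whether it touched a released device. */
void toy_run_queue(struct toy_device *d)
{
	if (d->released)
		d->ran_after_release = true;
}

/* Original ordering: dropping the command's reference may release the device. */
void toy_next_command_unsafe(struct toy_device *d)
{
	toy_put(d);        /* models scsi_put_command() */
	toy_run_queue(d);
}

/* Suggested ordering: pin the device across both calls. */
void toy_next_command_safe(struct toy_device *d)
{
	toy_get(d);
	toy_put(d);        /* models scsi_put_command() */
	toy_run_queue(d);
	toy_put(d);        /* device may be released only now */
}
```

Starting each device with a refcount of 1, the unsafe variant runs the queue after the release, while the safe variant releases the device only after the queue has run.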


2005-11-08 18:05:15

by Edward Goggin

Subject: RE: oops with USB Storage on 2.6.14

Good point.

I tested your suggested fix and it works well in my test case.

BTW, the version of your patch that I just tested includes test code that
reads the kref of sdev_gendev between the calls to scsi_run_queue and
put_device, to increase the likelihood that the use case was actually
exercised.

> -----Original Message-----
> From: James Bottomley [mailto:[email protected]]
> Sent: Tuesday, November 08, 2005 12:02 PM
> To: goggin, edward
> Cc: 'Andrew Morton'; Masanari Iida;
> [email protected];
> [email protected]; [email protected]
> Subject: RE: oops with USB Storage on 2.6.14
>
> On Tue, 2005-11-08 at 11:24 -0500, goggin, edward wrote:
> > ! struct scsi_device *sdev = cmd->device;
> > ! struct request_queue *q = sdev->request_queue;
> > !
> > ! // need to hold a reference on the device before we let
> go of the
> > cmd
> > ! if (scsi_device_get(sdev)) {
> > ! scsi_put_command(cmd);
> > ! return; // maybe sdev_state ==
> SDEV_CANCEL, SDEV_DEL
> > ! }
> >
> > scsi_put_command(cmd);
> > scsi_run_queue(q);
> > +
> > + // ok to remove device now
> > + scsi_device_put(sdev);
>
> This is the right idea, I think, but not necessarily the right fix.
> scsi_device_get() will fail if the device is going offline,
> but we would
> still need to run the queues.
>
> try this sequence instead:
>
> get_device(&sdev->sdev_gendev);
> scsi_put_command(cmd);
> scsi_run_queue(q);
> put_device(&sdev->sdev_gendev);
>
> James
>
>

2005-11-08 20:03:12

by Edward Goggin

Subject: RE: oops with USB Storage on 2.6.14

Thanks! Here's a better one.

--- ../base/linux-2.6.14-rc4/drivers/scsi/scsi_lib.c 2005-10-10
20:19:19.000000000 -0500
+++ drivers/scsi/scsi_lib.c 2005-11-07 04:46:23.000000000 -0600
@@ -592,10 +592,17 @@ static void scsi_requeue_command(struct

void scsi_next_command(struct scsi_cmnd *cmd)
{
- struct request_queue *q = cmd->device->request_queue;
+ struct scsi_device *sdev = cmd->device;
+ struct request_queue *q = sdev->request_queue;
+
+ /* need to hold a reference on the device before we let go of the
cmd */
+ get_device(&sdev->sdev_gendev);

scsi_put_command(cmd);
scsi_run_queue(q);
+
+ /* ok to remove device now */
+ put_device(&sdev->sdev_gendev);
}

void scsi_run_host_queues(struct Scsi_Host *shost)





> -----Original Message-----
> From: [email protected]
> [mailto:[email protected]] On Behalf Of Rolf Eike Beer
> Sent: Tuesday, November 08, 2005 11:38 AM
> To: goggin, edward
> Cc: 'Andrew Morton'; Masanari Iida;
> [email protected];
> [email protected]; [email protected]
> Subject: Re: oops with USB Storage on 2.6.14
>
> Am Dienstag, 8. November 2005 17:24 schrieb goggin, edward:
> >I've run into a bug like this several times using 2.6.14-rc4 while
> >testing dm-multipath's reaction to uevents generated by forcing
> >fiber channel transport failures -- which leads to the scsi device
> >being detached and the queuedata pointer in the device's queue being
> >reset in scsi_device_dev_release. The fix I've used is below and
> >it seems to work well for me. I was going to place this patch on
> >dm-devel today or tomorrow anyway.
> >
> >drivers/scsi/scsi_lib.c:scsi_next_command()
> >Call scsi_device_get and scsi_device_put around the calls to
> >scsi_put_command
> >and scsi_run_queue so that the scsi host structure will not
> be de-allocated
> >between scsi_put_command and scsi_run_queue.
> >
> >*** ../base/linux-2.6.14-rc4/drivers/scsi/scsi_lib.c Mon Oct
> 10 20:19:19
> >2005
> >--- drivers/scsi/scsi_lib.c Thu Nov 3 13:30:03 2005
> >***************
> >*** 592,601 ****
>
> Your patch is linewrapped. Also please use unified diff
> format, good choice
> for diff options is "-Naurp".
>
> Eike
>

2005-11-08 21:10:21

by James Bottomley

Subject: RE: oops with USB Storage on 2.6.14

On Tue, 2005-11-08 at 15:02 -0500, goggin, edward wrote:
> Thanks! Here's a better one.

It's line wrapped, but I fixed that up.

James


2005-11-08 21:33:19

by Patrick Mansfield

Subject: Re: oops with USB Storage on 2.6.14

On Tue, Nov 08, 2005 at 04:08:43PM -0500, James Bottomley wrote:
> On Tue, 2005-11-08 at 15:02 -0500, goggin, edward wrote:
> > Thanks! Here's a better one.
>
> It's line wrapped, but I fixed that up.

What code path triggered this?

I mean we get a ref to the sdev in the upper level driver opens, scan, and
sd flush. So where are we not getting a ref?

Shouldn't the get be done at a higher level?

-- Patrick Mansfield

2005-11-08 21:47:15

by James Bottomley

Subject: Re: oops with USB Storage on 2.6.14

On Tue, 2005-11-08 at 13:33 -0800, Patrick Mansfield wrote:
> I mean we get a ref to the sdev in the upper level driver opens, scan, and
> sd flush. So where are we not getting a ref?
>
> Shouldn't the get be done at a higher level?

Actually, no, because of the way we run the queues for the next command.

If this is a sd_sync_cache() or something for the last possible command
on the device, the process may have a reference to the device, but as
soon as we call end_that_request_last(), they may be racing to release
it. The bug is triggered when we get into scsi_next_command() with us
holding the only remaining reference to the device.

James


2005-12-07 03:30:00

by Edward Goggin

Subject: RE: oops with USB Storage on 2.6.14

> -----Original Message-----
> From: Patrick Mansfield [mailto:[email protected]]
> Sent: Tuesday, November 08, 2005 4:33 PM
> To: James Bottomley
> Cc: goggin, edward; 'Rolf Eike Beer'; 'Andrew Morton';
> Masanari Iida; [email protected];
> [email protected]; [email protected]
> Subject: Re: oops with USB Storage on 2.6.14
>
> On Tue, Nov 08, 2005 at 04:08:43PM -0500, James Bottomley wrote:
> > On Tue, 2005-11-08 at 15:02 -0500, goggin, edward wrote:
> > > Thanks! Here's a better one.
> >
> > It's line wrapped, but I fixed that up.
>
> What code path triggered this?

I was testing multipath responsiveness to FC transport failures
by inducing scsi target device removal by writing into the
delete attribute of the scsi device kobject for scsi target
devices managed by multipathd. The test is simple and involves no
user block read/write IO to the multipath mapped or target devices.
After initial multipath discovery is complete, the only IO going
to target devices is the periodic path test; for my devices this
amounts to issuing an EVPD page 0xc0 inquiry to each target
device every 10 seconds.

My kernel call stack at the time of panic looks very similar to
the one originally reported by Masanari Iida. I've shown
Masanari's kernel stack trace below.

> Call Trace:
> [<c0103abf>] show_stack+0x7f/0xa0
> [<c0103c72>] show_registers+0x162/0x1d0
> [<c0103e90>] die+0x100/0x1a0
> [<c039d7ae>] do_page_fault+0x31e/0x640
> [<c0103763>] error_code+0x4f/0x54
> [<c02b4612>] scsi_next_command+0x22/0x30
> [<c02b473f>] scsi_end_request+0xcf/0xf0
> [<c02b4b2e>] scsi_io_completion+0x26e/0x470
> [<c02b4fc7>] scsi_generic_done+0x37/0x50
> [<c02af9e5>] scsi_finish_command+0x85/0xa0
> [<c02af89c>] scsi_softirq+0xcc/0x140
> [<c0122085>] __do_softirq+0xd5/0xf0
> [<c01220d8>] do_softirq+0x38/0x40
> [<c0122685>] ksoftirqd+0x95/0xe0
> [<c0131cfa>] kthread+0xba/0xc0
> [<c0100ecd>] kernel_thread_helper+0x5/0x18

The scsi command being terminated by scsi_end_request is
an inquiry issued by the multipathd target device testing
thread.

>
> I mean we get a ref to the sdev in the upper level driver
> opens, scan, and
> sd flush. So where are we not getting a ref?

Good question.

The ref to the sdev obtained by the device scan has been
dropped by device_del() called from scsi_remove_device()
since the scsi_device has been removed via sysfs control.

The ref held by the dm open of the target device was
dropped when multipathd updated the multipath map to no
longer include the target device being removed.

The ref held by the multipathd initiated open of the target
device for purposes of issuing a test IO gets removed as
soon as the multipathd test thread is notified of the
completion of its test SG_IO ioctl via the scsi_end_request()
call on the inquiry request I mentioned earlier.
Soon after this point, the target device is closed by
multipathd since there is no more need for test IOs to
be issued to that target device. Note that this is the last
ref held on the target scsi device by opens or scans and
that it is highly possible on an SMP host for this ref to be
released BEFORE the scsi_end_request() actually returns to
its soft interrupt stack.

At this point, the only refs held on the target scsi device
are from ones for active scsi commands for that device
or for an invocation of scsi_request_fn() servicing the
device's queue. If the queue is not actively being
serviced and this is the last active command for the
device, the call to scsi_put_command() from
scsi_next_command() drops the last reference, and the
memory for both the scsi device and its request queue
is freed in scsi_device_dev_release() when the device's
kobject's kref count goes to zero.

>
> Shouldn't the get be done at a higher level?

As you can see, there are plenty of gets being done at
higher levels.

BTW, I have since reproduced this problem without
multipath at all, using just two simple concurrently executing
processes -- one issues an ioctl to a scsi device
(although any IO type would likely do) and closes its file
descriptor while the second one removes the device via sysfs.
It seems like the prerequisite sequence of events is

open device
issue io to device
device gets reaped
io completes up to scsi_end_request()
device is closed
scsi_put_command() reduces device kref count to zero, device is freed
scsi_next_command() can reference freed scsi device memory
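
The sequence above can be replayed as a reference-count ledger. This is only a userspace sketch under stated assumptions: struct mock_sdev, ref_get(), ref_put() and replay_sequence() are hypothetical names, the scan, the open and the in-flight command are each assumed to hold exactly one reference, and "freed" is a flag rather than a real free so that the model itself never touches released memory.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdio.h>

/* Hypothetical model of the scsi device's lifetime; not kernel code. */
struct mock_sdev {
	int refcount;  /* models the device's kref */
	bool freed;    /* set when the count reaches zero */
};

static void ref_get(struct mock_sdev *s, const char *why)
{
	assert(!s->freed);
	s->refcount++;
	printf("%-28s refcount=%d\n", why, s->refcount);
}

static void ref_put(struct mock_sdev *s, const char *why)
{
	assert(s->refcount > 0);
	if (--s->refcount == 0)
		s->freed = true;
	printf("%-28s refcount=%d%s\n", why, s->refcount,
	       s->freed ? " (freed)" : "");
}

/*
 * Replays the sequence of events. With with_guard set, the
 * get_device()/put_device() pair from the patch is modelled too.
 * Returns true if the queue run would have touched a freed device.
 */
static bool replay_sequence(bool with_guard)
{
	struct mock_sdev s = { 0, false };
	bool use_after_free;

	ref_get(&s, "scan adds device");
	ref_get(&s, "open device");
	ref_get(&s, "issue io (command ref)");
	ref_put(&s, "device reaped via sysfs");
	/* io completes up to scsi_end_request(): no refcount change */
	ref_put(&s, "device is closed");

	if (with_guard)
		ref_get(&s, "guard: get_device()");
	ref_put(&s, "scsi_put_command()");

	use_after_free = s.freed;  /* scsi_run_queue() would run here */

	if (with_guard)
		ref_put(&s, "guard: put_device()");
	return use_after_free;
}
```

Without the guard the count goes 1, 2, 3, 2, 1, 0 and the queue run would see a freed device; with the guard it stays at 1 across the queue run and only reaches zero at the final put.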

>
> -- Patrick Mansfield
>