2012-08-15 08:27:50

by Jiri Kosina

[permalink] [raw]
Subject: Re: [PATCH 0/7] HID: picoLCD updates

On Mon, 30 Jul 2012, Bruno Pr?mont wrote:

> Hi,
>
> This series updates picoLCD driver:
> - split the driver functions into separate files which get included
> depending on Kconfig selection
> (implementation for CIR using RC_CORE will follow later)
> - drop private framebuffer refcounting in favor of refcounting added
> to fb_info some time ago
> - fix various bugs issues
> - disabled firmware version checking in probe() as it does not work
> anymore since commit 4ea5454203d991ec85264f64f89ca8855fce69b0
> [HID: Fix race condition between driver core and ll-driver]

I have now applied the series to my 'picolcd' branch, except for 6/7,
please see the comment I sent to it separately.

> Note: I still get weird behavior on quick unbind/bind sequences
> issued via sysfs (CONFIG_SMP=n system) that are triggered by framebuffer
> support and apparently more specifically fb_defio part of it.
>
> Unfortunately I'm out of ideas as to how to track down the problem which
> shows either as SLAB corruption (detected with SLUB debugging, e.g.

Would be nice to have this sorted out before the next merge window indeed,
so that it can go in together with the rest of the changes.

>
> [ 6383.521833] =============================================================================
> [ 6383.530020] BUG kmalloc-64 (Not tainted): Object already free
> [ 6383.530020] -----------------------------------------------------------------------------
> [ 6383.530020]
> [ 6383.530020] INFO: Slab 0xdde0ea20 objects=51 used=40 fp=0xcef516e0 flags=0x40000080
> [ 6383.530020] INFO: Object 0xcef51190 @offset=400 fp=0xcef51f50
> [ 6383.530020]
> [ 6383.530020] Bytes b4 cef51180: cc cc cc cc d0 12 f5 ce 5a 5a 5a 5a 5a 5a 5a 5a ........ZZZZZZZZ
> [ 6383.530020] Object cef51190: 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b kkkkkkkkkkkkkkkk
> [ 6383.530020] Object cef511a0: 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b kkkkkkkkkkkkkkkk
> [ 6383.530020] Object cef511b0: 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b kkkkkkkkkkkkkkkk
> [ 6383.530020] Object cef511c0: 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b a5 kkkkkkkkkkkkkkk.
> [ 6383.530020] Redzone cef511d0: bb bb bb bb ....
> [ 6383.530020] Padding cef511d8: 5a 5a 5a 5a 5a 5a 5a 5a ZZZZZZZZ
> [ 6383.530020] Pid: 1922, comm: bash Not tainted 3.5.0-jupiter-00003-g8d858b1-dirty #2
> [ 6383.530020] Call Trace:
> [ 6383.530020] [<c10bd3cc>] print_trailer+0x11c/0x130
> [ 6383.530020] [<c10bd415>] object_err+0x35/0x40
> [ 6383.530020] [<c10be809>] free_debug_processing+0x99/0x200
> [ 6383.530020] [<c10bf77e>] __slab_free+0x2e/0x280
> [ 6383.530020] [<c1322284>] ? hid_submit_out+0xa4/0x120
> [ 6383.530020] [<c1322870>] ? __usbhid_submit_report+0xc0/0x3c0
> [ 6383.530020] [<c10bfbda>] ? kfree+0xfa/0x110
> [ 6383.530020] [<de932aa4>] ? picolcd_debug_out_report+0x8c4/0x8e0 [hid_picolcd]
> [ 6383.530020] [<c10bfbda>] kfree+0xfa/0x110
> [ 6383.530020] [<c1322284>] ? hid_submit_out+0xa4/0x120
> [ 6383.530020] [<c1322284>] ? hid_submit_out+0xa4/0x120
> [ 6383.530020] [<c1322284>] ? hid_submit_out+0xa4/0x120
> [ 6383.530020] [<c1322284>] hid_submit_out+0xa4/0x120
> [ 6383.530020] [<c1322908>] __usbhid_submit_report+0x158/0x3c0
> [ 6383.530020] [<c1322c2b>] usbhid_submit_report+0x1b/0x30
> [ 6383.530020] [<de930789>] picolcd_fb_reset+0xb9/0x180 [hid_picolcd]
> [ 6383.530020] [<de930f1d>] picolcd_init_framebuffer+0x20d/0x2e0 [hid_picolcd]
> [ 6383.530020] [<de92fb9c>] picolcd_probe+0x3cc/0x580 [hid_picolcd]
> [ 6383.530020] [<c1319147>] hid_device_probe+0x67/0xf0
> [ 6383.530020] [<c1282f97>] ? driver_sysfs_add+0x57/0x80
> [ 6383.530020] [<c128329d>] driver_probe_device+0xbd/0x1c0
> [ 6383.530020] [<c1318a1b>] ? hid_match_device+0x7b/0x90
> [ 6383.530020] [<c12821e5>] driver_bind+0x75/0xd0
> [ 6383.530020] [<c1282170>] ? driver_unbind+0x90/0x90
> [ 6383.530020] [<c12818b7>] drv_attr_store+0x27/0x30
> [ 6383.530020] [<c1114aec>] sysfs_write_file+0xac/0xf0
> [ 6383.530020] [<c10c794c>] vfs_write+0x9c/0x130
> [ 6383.530020] [<c10d4a1f>] ? sys_dup3+0x11f/0x160
> [ 6383.530020] [<c1114a40>] ? sysfs_poll+0x90/0x90
> [ 6383.530020] [<c10c7bbd>] sys_write+0x3d/0x70
> [ 6383.530020] [<c13f2557>] sysenter_do_call+0x12/0x26

So I am wondering whether the path this happens on is

if (!test_bit(HID_OUT_RUNNING, &usbhid->iofl)) {
usbhid_restart_out_queue(usbhid);

in __usbhid_submit_report(). It would then indicate perhaps some race with
iofl handling.

Could you please stick some printk() just before and after this
usbhid_restart_out_queue() call, so that we know that it's this triggering
it?

--
Jiri Kosina
SUSE Labs


2012-08-15 09:43:22

by Bruno Prémont

[permalink] [raw]
Subject: Re: [PATCH 0/7] HID: picoLCD updates

Hi Jiri,

On Wed, 15 August 2012 Jiri Kosina <[email protected]> wrote:
> On Mon, 30 Jul 2012, Bruno Prémont wrote:
> > Hi,
> >
> > This series updates picoLCD driver:
> > - split the driver functions into separate files which get included
> > depending on Kconfig selection
> > (implementation for CIR using RC_CORE will follow later)
> > - drop private framebuffer refcounting in favor of refcounting added
> > to fb_info some time ago
> > - fix various bugs issues
> > - disabled firmware version checking in probe() as it does not work
> > anymore since commit 4ea5454203d991ec85264f64f89ca8855fce69b0
> > [HID: Fix race condition between driver core and ll-driver]
>
> I have now applied the series to my 'picolcd' branch, except for 6/7,
> please see the comment I sent to it separately.

Will respin that one soon

> > Note: I still get weird behavior on quick unbind/bind sequences
> > issued via sysfs (CONFIG_SMP=n system) that are triggered by framebuffer
> > support and apparently more specifically fb_defio part of it.
> >
> > Unfortunately I'm out of ideas as to how to track down the problem which
> > shows either as SLAB corruption (detected with SLUB debugging, e.g.
>
> Would be nice to have this sorted out before the next merge window indeed,
> so that it can go in together with the rest of the changes.
>
> >
> > [ 6383.521833] =============================================================================
> > [ 6383.530020] BUG kmalloc-64 (Not tainted): Object already free
> > [ 6383.530020] -----------------------------------------------------------------------------
> > [ 6383.530020]
> > [ 6383.530020] INFO: Slab 0xdde0ea20 objects=51 used=40 fp=0xcef516e0 flags=0x40000080
> > [ 6383.530020] INFO: Object 0xcef51190 @offset=400 fp=0xcef51f50
> > [ 6383.530020]
> > [ 6383.530020] Bytes b4 cef51180: cc cc cc cc d0 12 f5 ce 5a 5a 5a 5a 5a 5a 5a 5a ........ZZZZZZZZ
> > [ 6383.530020] Object cef51190: 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b kkkkkkkkkkkkkkkk
> > [ 6383.530020] Object cef511a0: 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b kkkkkkkkkkkkkkkk
> > [ 6383.530020] Object cef511b0: 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b kkkkkkkkkkkkkkkk
> > [ 6383.530020] Object cef511c0: 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b a5 kkkkkkkkkkkkkkk.
> > [ 6383.530020] Redzone cef511d0: bb bb bb bb ....
> > [ 6383.530020] Padding cef511d8: 5a 5a 5a 5a 5a 5a 5a 5a ZZZZZZZZ
> > [ 6383.530020] Pid: 1922, comm: bash Not tainted 3.5.0-jupiter-00003-g8d858b1-dirty #2
> > [ 6383.530020] Call Trace:
> > [ 6383.530020] [<c10bd3cc>] print_trailer+0x11c/0x130
> > [ 6383.530020] [<c10bd415>] object_err+0x35/0x40
> > [ 6383.530020] [<c10be809>] free_debug_processing+0x99/0x200
> > [ 6383.530020] [<c10bf77e>] __slab_free+0x2e/0x280
> > [ 6383.530020] [<c1322284>] ? hid_submit_out+0xa4/0x120
> > [ 6383.530020] [<c1322870>] ? __usbhid_submit_report+0xc0/0x3c0
> > [ 6383.530020] [<c10bfbda>] ? kfree+0xfa/0x110
> > [ 6383.530020] [<de932aa4>] ? picolcd_debug_out_report+0x8c4/0x8e0 [hid_picolcd]
> > [ 6383.530020] [<c10bfbda>] kfree+0xfa/0x110
> > [ 6383.530020] [<c1322284>] ? hid_submit_out+0xa4/0x120
> > [ 6383.530020] [<c1322284>] ? hid_submit_out+0xa4/0x120
> > [ 6383.530020] [<c1322284>] ? hid_submit_out+0xa4/0x120
> > [ 6383.530020] [<c1322284>] hid_submit_out+0xa4/0x120
> > [ 6383.530020] [<c1322908>] __usbhid_submit_report+0x158/0x3c0
> > [ 6383.530020] [<c1322c2b>] usbhid_submit_report+0x1b/0x30
> > [ 6383.530020] [<de930789>] picolcd_fb_reset+0xb9/0x180 [hid_picolcd]
> > [ 6383.530020] [<de930f1d>] picolcd_init_framebuffer+0x20d/0x2e0 [hid_picolcd]
> > [ 6383.530020] [<de92fb9c>] picolcd_probe+0x3cc/0x580 [hid_picolcd]
> > [ 6383.530020] [<c1319147>] hid_device_probe+0x67/0xf0
> > [ 6383.530020] [<c1282f97>] ? driver_sysfs_add+0x57/0x80
> > [ 6383.530020] [<c128329d>] driver_probe_device+0xbd/0x1c0
> > [ 6383.530020] [<c1318a1b>] ? hid_match_device+0x7b/0x90
> > [ 6383.530020] [<c12821e5>] driver_bind+0x75/0xd0
> > [ 6383.530020] [<c1282170>] ? driver_unbind+0x90/0x90
> > [ 6383.530020] [<c12818b7>] drv_attr_store+0x27/0x30
> > [ 6383.530020] [<c1114aec>] sysfs_write_file+0xac/0xf0
> > [ 6383.530020] [<c10c794c>] vfs_write+0x9c/0x130
> > [ 6383.530020] [<c10d4a1f>] ? sys_dup3+0x11f/0x160
> > [ 6383.530020] [<c1114a40>] ? sysfs_poll+0x90/0x90
> > [ 6383.530020] [<c10c7bbd>] sys_write+0x3d/0x70
> > [ 6383.530020] [<c13f2557>] sysenter_do_call+0x12/0x26
>
> So I am wondering whether the path this happens on is
>
> if (!test_bit(HID_OUT_RUNNING, &usbhid->iofl)) {
> usbhid_restart_out_queue(usbhid);
>
> in __usbhid_submit_report(). It would then indicate perhaps some race with
> iofl handling.

Huh, that specific test_bit hunk I can't find in __usbhid_submit_report,
is that 3.6 material?
I'm running my tests against 3.5...

The nearest I have is:
if (!test_bit(HID_OUT_RUNNING, &usbhid->iofl))
if (!irq_out_pump_restart(hid))
set_bit(HID_OUT_RUNNING, &usbhid->iofl);


> Could you please stick some printk() just before and after this
> usbhid_restart_out_queue() call, so that we know that it's this triggering
> it?

Thanks,
Bruno

2012-08-15 12:11:41

by Jiri Kosina

[permalink] [raw]
Subject: Re: [PATCH 0/7] HID: picoLCD updates

On Wed, 15 Aug 2012, Bruno Prémont wrote:

> > > [ 6383.521833] =============================================================================
> > > [ 6383.530020] BUG kmalloc-64 (Not tainted): Object already free
> > > [ 6383.530020] -----------------------------------------------------------------------------
> > > [ 6383.530020]
> > > [ 6383.530020] INFO: Slab 0xdde0ea20 objects=51 used=40 fp=0xcef516e0 flags=0x40000080
> > > [ 6383.530020] INFO: Object 0xcef51190 @offset=400 fp=0xcef51f50
> > > [ 6383.530020]
> > > [ 6383.530020] Bytes b4 cef51180: cc cc cc cc d0 12 f5 ce 5a 5a 5a 5a 5a 5a 5a 5a ........ZZZZZZZZ
> > > [ 6383.530020] Object cef51190: 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b kkkkkkkkkkkkkkkk
> > > [ 6383.530020] Object cef511a0: 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b kkkkkkkkkkkkkkkk
> > > [ 6383.530020] Object cef511b0: 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b kkkkkkkkkkkkkkkk
> > > [ 6383.530020] Object cef511c0: 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b a5 kkkkkkkkkkkkkkk.
> > > [ 6383.530020] Redzone cef511d0: bb bb bb bb ....
> > > [ 6383.530020] Padding cef511d8: 5a 5a 5a 5a 5a 5a 5a 5a ZZZZZZZZ
> > > [ 6383.530020] Pid: 1922, comm: bash Not tainted 3.5.0-jupiter-00003-g8d858b1-dirty #2
> > > [ 6383.530020] Call Trace:
> > > [ 6383.530020] [<c10bd3cc>] print_trailer+0x11c/0x130
> > > [ 6383.530020] [<c10bd415>] object_err+0x35/0x40
> > > [ 6383.530020] [<c10be809>] free_debug_processing+0x99/0x200
> > > [ 6383.530020] [<c10bf77e>] __slab_free+0x2e/0x280
> > > [ 6383.530020] [<c1322284>] ? hid_submit_out+0xa4/0x120
> > > [ 6383.530020] [<c1322870>] ? __usbhid_submit_report+0xc0/0x3c0
> > > [ 6383.530020] [<c10bfbda>] ? kfree+0xfa/0x110
> > > [ 6383.530020] [<de932aa4>] ? picolcd_debug_out_report+0x8c4/0x8e0 [hid_picolcd]
> > > [ 6383.530020] [<c10bfbda>] kfree+0xfa/0x110
> > > [ 6383.530020] [<c1322284>] ? hid_submit_out+0xa4/0x120
> > > [ 6383.530020] [<c1322284>] ? hid_submit_out+0xa4/0x120
> > > [ 6383.530020] [<c1322284>] ? hid_submit_out+0xa4/0x120
> > > [ 6383.530020] [<c1322284>] hid_submit_out+0xa4/0x120
> > > [ 6383.530020] [<c1322908>] __usbhid_submit_report+0x158/0x3c0
> > > [ 6383.530020] [<c1322c2b>] usbhid_submit_report+0x1b/0x30
> > > [ 6383.530020] [<de930789>] picolcd_fb_reset+0xb9/0x180 [hid_picolcd]
> > > [ 6383.530020] [<de930f1d>] picolcd_init_framebuffer+0x20d/0x2e0 [hid_picolcd]
> > > [ 6383.530020] [<de92fb9c>] picolcd_probe+0x3cc/0x580 [hid_picolcd]
> > > [ 6383.530020] [<c1319147>] hid_device_probe+0x67/0xf0
> > > [ 6383.530020] [<c1282f97>] ? driver_sysfs_add+0x57/0x80
> > > [ 6383.530020] [<c128329d>] driver_probe_device+0xbd/0x1c0
> > > [ 6383.530020] [<c1318a1b>] ? hid_match_device+0x7b/0x90
> > > [ 6383.530020] [<c12821e5>] driver_bind+0x75/0xd0
> > > [ 6383.530020] [<c1282170>] ? driver_unbind+0x90/0x90
> > > [ 6383.530020] [<c12818b7>] drv_attr_store+0x27/0x30
> > > [ 6383.530020] [<c1114aec>] sysfs_write_file+0xac/0xf0
> > > [ 6383.530020] [<c10c794c>] vfs_write+0x9c/0x130
> > > [ 6383.530020] [<c10d4a1f>] ? sys_dup3+0x11f/0x160
> > > [ 6383.530020] [<c1114a40>] ? sysfs_poll+0x90/0x90
> > > [ 6383.530020] [<c10c7bbd>] sys_write+0x3d/0x70
> > > [ 6383.530020] [<c13f2557>] sysenter_do_call+0x12/0x26
> >
> > So I am wondering whether the path this happens on is
> >
> > if (!test_bit(HID_OUT_RUNNING, &usbhid->iofl)) {
> > usbhid_restart_out_queue(usbhid);
> >
> > in __usbhid_submit_report(). It would then indicate perhaps some race with
> > iofl handling.
>
> Huh, that specific test_bit hunk I can't find in __usbhid_submit_report,
> is that 3.6 material?
> I'm running my tests against 3.5...

I see. Alan Stern has fixed a huge pile of things in this area in 3.6-rc1.
I have expected all of those to actually be on theoretical problems not
ever having happened in the wild, but it might be that you are actually
chasing on of those.

Could you please retest with latest Linus' tree (or at least eb055fd0560b)
to see whether this hasn't actually been fixed already by Alan's series?

Thanks,

--
Jiri Kosina
SUSE Labs

2012-08-15 15:17:20

by Bruno Prémont

[permalink] [raw]
Subject: Re: [PATCH 0/7] HID: picoLCD updates

Hi Jiri,

On Wed, 15 August 2012 Jiri Kosina <[email protected]> wrote:
> I see. Alan Stern has fixed a huge pile of things in this area in 3.6-rc1.
> I have expected all of those to actually be on theoretical problems not
> ever having happened in the wild, but it might be that you are actually
> chasing on of those.
>
> Could you please retest with latest Linus' tree (or at least eb055fd0560b)
> to see whether this hasn't actually been fixed already by Alan's series?

I've started trying that out, it seems Alan's work improved things.

For the first few attempts I have not seen SLAB corruptions, though after
a few rounds I hit accumulation of the following messages:
[ 297.174828] hid-picolcd 0003:04D8:C002.0003: output queue full
[ 297.181098] hid-picolcd 0003:04D8:C002.0003: output queue full
[ 297.187820] hid-picolcd 0003:04D8:C002.0003: output queue full
[ 297.194087] hid-picolcd 0003:04D8:C002.0003: output queue full

with sporadically in between:
[ 292.668019] hid-picolcd 0003:04D8:C002.0003: usb_submit_urb(out) failed: -1

At first glance I think the queue filling up and never draining is caused
by hid_hw_stop() stalling the queue and the time between both being just too
short.
Maybe me (un)binding on hid-picolcd driver level is the major cause of this
and (un)binding at lower usb level would kill the queue thus preventing
it just growing until full...


Is there a proper way to do rate-limiting or throttling when submitting
reports (which are the urbs deeper in the stack) and catch -ENODEV early
while remove() is delayed for locking reasons?
At least until 3.5 submitting reports provides no return value for driver
which could convey errors like -ENODEV, -EBUSY.



After a few bind-unbind iterations I have also hit a race in hid-picolcd
with regard to fbdefio.
That looks like chicken-egg release sequencing between hid-picolcd and
fb_info subdev -- though I'm wondering how that happens as the only framebuffer
user I know of - fbcon - should have stopped using it after
unregister_framebuffer() returned and released the last reference!

But that part can be improved rather easily with a new spinlock synchronizing
interoperation between hid side and framebuffer side.

Thanks,
Bruno

2012-08-15 21:32:15

by Jiri Kosina

[permalink] [raw]
Subject: Re: [PATCH 0/7] HID: picoLCD updates

On Wed, 15 Aug 2012, Bruno Pr?mont wrote:

> > I see. Alan Stern has fixed a huge pile of things in this area in 3.6-rc1.
> > I have expected all of those to actually be on theoretical problems not
> > ever having happened in the wild, but it might be that you are actually
> > chasing on of those.
> >
> > Could you please retest with latest Linus' tree (or at least eb055fd0560b)
> > to see whether this hasn't actually been fixed already by Alan's series?
>
> I've started trying that out, it seems Alan's work improved things.
>
> For the first few attempts I have not seen SLAB corruptions, though after
> a few rounds I hit accumulation of the following messages:
> [ 297.174828] hid-picolcd 0003:04D8:C002.0003: output queue full
> [ 297.181098] hid-picolcd 0003:04D8:C002.0003: output queue full
> [ 297.187820] hid-picolcd 0003:04D8:C002.0003: output queue full
> [ 297.194087] hid-picolcd 0003:04D8:C002.0003: output queue full
>
> with sporadically in between:
> [ 292.668019] hid-picolcd 0003:04D8:C002.0003: usb_submit_urb(out) failed: -1
>
> At first glance I think the queue filling up and never draining is caused
> by hid_hw_stop() stalling the queue and the time between both being just too
> short.

I don't really understand this explanation. Once usb_kill_urb() returns,
the URB should be available for future use (and therefore all queues
completely drained).

--
Jiri Kosina
SUSE Labs

2012-08-16 16:31:10

by Bruno Prémont

[permalink] [raw]
Subject: Re: [PATCH 0/7] HID: picoLCD updates

On Wed, 15 August 2012 Jiri Kosina <[email protected]> wrote:
> On Wed, 15 Aug 2012, Bruno Prémont wrote:
> > > I see. Alan Stern has fixed a huge pile of things in this area in 3.6-rc1.
> > > I have expected all of those to actually be on theoretical problems not
> > > ever having happened in the wild, but it might be that you are actually
> > > chasing on of those.
> > >
> > > Could you please retest with latest Linus' tree (or at least eb055fd0560b)
> > > to see whether this hasn't actually been fixed already by Alan's series?
> >
> > I've started trying that out, it seems Alan's work improved things.
> >
> > For the first few attempts I have not seen SLAB corruptions, though after
> > a few rounds I hit accumulation of the following messages:
> > [ 297.174828] hid-picolcd 0003:04D8:C002.0003: output queue full
> > [ 297.181098] hid-picolcd 0003:04D8:C002.0003: output queue full
> > [ 297.187820] hid-picolcd 0003:04D8:C002.0003: output queue full
> > [ 297.194087] hid-picolcd 0003:04D8:C002.0003: output queue full
> >
> > with sporadically in between:
> > [ 292.668019] hid-picolcd 0003:04D8:C002.0003: usb_submit_urb(out) failed: -1
> >
> > At first glance I think the queue filling up and never draining is caused
> > by hid_hw_stop() stalling the queue and the time between both being just too
> > short.
>
> I don't really understand this explanation. Once usb_kill_urb() returns,
> the URB should be available for future use (and therefore all queues
> completely drained).

I won't have time today to check, though my guess is that on each
echo $usb-id > bind; echo $usb-id > unbind
under /sys/bus/hid/drivers/hid-picolcd/ the USB urb queue fills a bit does
not get cleared.

Is usb_kill_urb() called when unbinding just the specific hid driver? If so
my short timing between bind/unbind must be triggering something else...

Otherwise I'm missing something as at first time I got no "output queue full"
messages, but as I repeated the bind/unbind sequences the prints per bind/unbind
iteration increased in number.

Anyhow, on Friday evening/week-end I will continue digging and report back with my
findings.

Thanks,
Bruno

2012-08-16 16:47:15

by Jiri Kosina

[permalink] [raw]
Subject: Re: [PATCH 0/7] HID: picoLCD updates

On Thu, 16 Aug 2012, Bruno Prémont wrote:

> > I don't really understand this explanation. Once usb_kill_urb() returns,
> > the URB should be available for future use (and therefore all queues
> > completely drained).
>
> I won't have time today to check, though my guess is that on each
> echo $usb-id > bind; echo $usb-id > unbind
> under /sys/bus/hid/drivers/hid-picolcd/ the USB urb queue fills a bit does
> not get cleared.
>
> Is usb_kill_urb() called when unbinding just the specific hid driver?

Yup, through hid_hw_stop() -> usbhid_stop().

> If so my short timing between bind/unbind must be triggering something
> else...
>
> Otherwise I'm missing something as at first time I got no "output queue full"
> messages, but as I repeated the bind/unbind sequences the prints per bind/unbind
> iteration increased in number.
>
> Anyhow, on Friday evening/week-end I will continue digging and report back with my
> findings.

Thank you, Bruno.

--
Jiri Kosina
SUSE Labs

2012-08-18 12:41:33

by Bruno Prémont

[permalink] [raw]
Subject: Re: [PATCH 0/7] HID: picoLCD updates

Hi Jiri,

[CCing Alan Stern]

On Thu, 16 August 2012 Jiri Kosina <[email protected]> wrote:
> On Thu, 16 Aug 2012, Bruno Prémont wrote:
>
> > > I don't really understand this explanation. Once usb_kill_urb() returns,
> > > the URB should be available for future use (and therefore all queues
> > > completely drained).
> >
> > I won't have time today to check, though my guess is that on each
> > echo $usb-id > bind; echo $usb-id > unbind
> > under /sys/bus/hid/drivers/hid-picolcd/ the USB urb queue fills a bit does
> > not get cleared.
> >
> > Is usb_kill_urb() called when unbinding just the specific hid driver?
>
> Yup, through hid_hw_stop() -> usbhid_stop().
>
> > If so my short timing between bind/unbind must be triggering something
> > else...
> >
> > Otherwise I'm missing something as at first time I got no "output queue full"
> > messages, but as I repeated the bind/unbind sequences the prints per bind/unbind
> > iteration increased in number.
> >
> > Anyhow, on Friday evening/week-end I will continue digging and report back with my
> > findings.

Huh, after changing some of the hid-picolcd data in order to have less racy
coupling between hid and framebuffer I'm now dying way too often in _mmx_memcpy
and most of the time I don't get a (complete) trace...

The only full trace I got was:
[ 3857.426136] BUG: unable to handle kernel paging request at dbdf9000
[ 3857.432555] IP: [<c11e2be7>] _mmx_memcpy+0x27/0x16c
[ 3857.435906] *pde = 1bebb063 *pte = 1bdf9161
[ 3857.435906] Oops: 0003 [#1]
[ 3857.435906] Modules linked in: hid_picolcd fb_sys_fops sysimgblt sysfillrect syscopyarea drm_kms_helper nfs3 nfs_acl nfs lockd sunrpc
[ 3857.435906] Pid: 1935, comm: bash Not tainted 3.6.0-rc1-jupiter-00363-g0e8ccbc #1 NVIDIA Corporation. nFORCE-MCP/MS-6373
[ 3857.435906] EIP: 0060:[<c11e2be7>] EFLAGS: 00010013 CPU: 0
[ 3857.435906] EIP is at _mmx_memcpy+0x27/0x16c
[ 3857.435906] EAX: dd40a000 EBX: dd6ea010 ECX: 035af77b EDX: dbdf4230
[ 3857.435906] ESI: dbe0d1b0 EDI: dbdf9000 EBP: dd40bea8 ESP: dd40be8c
[ 3857.435906] DS: 007b ES: 007b FS: 0000 GS: 00e0 SS: 0068
[ 3857.435906] CR0: 8005003b CR2: dbdf9000 CR3: 1d7ee000 CR4: 000007d0
[ 3857.435906] DR0: 00000000 DR1: 00000000 DR2: 00000000 DR3: 00000000
[ 3857.435906] DR6: ffff0ff0 DR7: 00000400
[ 3857.435906] Process bash (pid: 1935, ti=dd40a000 task=dcdb6120 task.ti=dbce2000)
[ 3857.435906] Stack:
[ 3857.435906] 0d6d6d6f dd6eb49c dd40beb8 dbde0080 dd6ea010 dbde0080 dbd31010 dd40bed0
[ 3857.435906] c133b364 00000000 00000000 ddfaa0e0 dbd31010 dbdf4230 dd6ea010 00000046
[ 3857.435906] dbd31010 dd40bef0 c133bea0 00000000 dbd0c3f0 00000000 dbd0c3f0 00000000
[ 3857.435906] Call Trace:
[ 3857.435906] [<c133b364>] hid_submit_out+0xa4/0x130
[ 3857.435906] [<c133bea0>] hid_irq_out+0xa0/0x100
[ 3857.435906] [<c12e425e>] usb_hcd_giveback_urb+0x4e/0x90
[ 3857.435906] [<c12f9bd2>] finish_urb+0xb2/0xf0
[ 3857.435906] [<c12fa67d>] takeback_td+0x3d/0x100
[ 3857.435906] [<c12faccf>] dl_done_list+0x14f/0x180
[ 3857.435906] [<c12fc351>] ohci_irq+0x191/0x300
[ 3857.435906] [<c1077790>] ? unmask_irq+0x20/0x20
[ 3857.435906] [<c12e3ace>] usb_hcd_irq+0x1e/0x40
[ 3857.435906] [<c107549d>] handle_irq_event_percpu+0x6d/0x1c0
[ 3857.435906] [<c1077790>] ? unmask_irq+0x20/0x20
[ 3857.435906] [<c1075618>] handle_irq_event+0x28/0x40
[ 3857.435906] [<c107781a>] handle_fasteoi_irq+0x8a/0xe0
[ 3857.435906] <IRQ>
[ 3857.435906] [<c1003e0a>] ? do_IRQ+0x3a/0xb0
[ 3857.435906] [<c140c870>] ? common_interrupt+0x30/0x38
[ 3857.435906] [<c1035143>] ? __do_softirq+0x53/0x1c0
[ 3857.435906] [<c10350f0>] ? local_bh_enable_ip+0x80/0x80
[ 3857.435906] <IRQ>
[ 3857.435906] [<c103544e>] ? irq_exit+0x3e/0xa0
[ 3857.435906] [<c101dfde>] ? smp_apic_timer_interrupt+0x6e/0x80
[ 3857.435906] [<c140bee1>] ? apic_timer_interrupt+0x31/0x38
[ 3857.435906] [<c102fdba>] ? vprintk_emit+0x34a/0x390
[ 3857.435906] [<c1409f58>] ? printk+0x38/0x3a
[ 3857.435906] [<de94b021>] ? picolcd_init_framebuffer+0x261/0x350 [hid_picolcd]
[ 3857.435906] [<de949bd1>] ? picolcd_probe+0x3d1/0x5a0 [hid_picolcd]
[ 3857.435906] [<c13321f7>] ? hid_device_probe+0x67/0xf0
[ 3857.435906] [<c128ac17>] ? driver_sysfs_add+0x57/0x80
[ 3857.435906] [<c128af1d>] ? driver_probe_device+0xbd/0x1d0
[ 3857.435906] [<c1331aab>] ? hid_match_device+0x7b/0x90
[ 3857.435906] [<c1289e65>] ? driver_bind+0x75/0xd0
[ 3857.435906] [<c1289df0>] ? driver_unbind+0x90/0x90
[ 3857.435906] [<c1289537>] ? drv_attr_store+0x27/0x30
[ 3857.435906] [<c111a59c>] ? sysfs_write_file+0xac/0xf0
[ 3857.435906] [<c10cca0c>] ? vfs_write+0x9c/0x130
[ 3857.435906] [<c10da2ef>] ? sys_dup3+0x11f/0x160
[ 3857.435906] [<c111a4f0>] ? sysfs_poll+0x90/0x90
[ 3857.435906] [<c10ccc7d>] ? sys_write+0x3d/0x70
[ 3857.435906] [<c140c357>] ? sysenter_do_call+0x12/0x26
[ 3857.435906] Code: 90 90 90 90 55 89 e5 57 56 89 d6 53 83 ec 10 89 45 f0 89 4d e4 89 e0 25 00 e0 ff ff f7 40 14 00 ff ff 07 74 17 c1 e9 02 8b 7d f0 <f3> a5 8b 4d e4 83 e1 03 74 02 f3 a4 e9 27 01 00 00 8b 45 1
[ 3857.435906] EIP: [<c11e2be7>] _mmx_memcpy+0x27/0x16c SS:ESP 0068:dd40be8c
[ 3857.435906] CR2: 00000000dbdf9000
[ 3857.435906] ---[ end trace 10ae520e65e3b763 ]---
[ 3857.435906] Kernel panic - not syncing: Fatal exception in interrupt

It happens around the time of calling
device_create_file(dev, &dev_attr_fb_update_rate);
that is, after picolcd_fb_reset(data, 1) has returned but before
calling fb_deferred_io_init(info); according to printk() I put
around the various calls in picolcd_init_framebuffer().

The other case I got part of a trace it was again in _mmx_memcpy()
but the trace output stopped in the middle of register dumping.


The full trace happened after the following sequence:
- system boot
- hotplug of picolcd (triggering modprobe hid-picolcd)
- cd /sys/bus/hid/drivers/hid-picolcd/
- echo 0003:04D8:C002.0003 > unbind
- wait some time
- echo 0003:04D8:C002.0003 > bind

Any idea what it could be? It might be during processing of first
reports send in order to clear the picolcd display.




As (un)binding in hid-picolcd crashes I tried doing so a level above,
at usbhid driver on usb bus which resulted in the following:
- system boot
- hotplug of picolcd (triggering modprobe hid-picolcd)
- cd /sys/bus/usb/drivers/usbhid/
- echo 2-1:1.0 > unbind
- wait some time
- echo 2-1:1.0 > bind
- wait some time
- echo 2-1:1.0 > unbind
produced the following trace:
[ 1418.387466] BUG: unable to handle kernel NULL pointer dereference at 00000030
[ 1418.390013] IP: [<c1339c40>] hid_submit_ctrl+0x80/0x2a0
[ 1418.390013] *pde = 00000000
[ 1418.390013] Oops: 0002 [#1]
[ 1418.390013] Modules linked in: hid_picolcd fb_sys_fops sysimgblt sysfillrect syscopyarea drm_kms_helper nfs3 nfs_acl nfs lockd sunrpc
[ 1418.390013] Pid: 1994, comm: kworker/0:0 Not tainted 3.6.0-rc1-jupiter-00363-g0e8ccbc #1 NVIDIA Corporation. nFORCE-MCP/MS-6373
[ 1418.390013] EIP: 0060:[<c1339c40>] EFLAGS: 00010086 CPU: 0
[ 1418.390013] EIP is at hid_submit_ctrl+0x80/0x2a0
[ 1418.390013] EAX: 80000200 EBX: dc2d0000 ECX: dcfe2020 EDX: 00000000
[ 1418.390013] ESI: 00000040 EDI: dc1f8f00 EBP: dd743e2c ESP: dd743df4
[ 1418.390013] DS: 007b ES: 007b FS: 0000 GS: 00e0 SS: 0068
[ 1418.390013] CR0: 8005003b CR2: 00000030 CR3: 1cd42000 CR4: 000007d0
[ 1418.390013] DR0: 00000000 DR1: 00000000 DR2: 00000000 DR3: 00000000
[ 1418.390013] DR6: ffff0ff0 DR7: 00000400
[ 1418.390013] Process kworker/0:0 (pid: 1994, ti=dd742000 task=dcdd0d40 task.ti=dd742000)
[ 1418.390013] Stack:
[ 1418.390013] 0000003e dc2f9020 dcf64c7c 00000005 00000002 dd743e1c 00000002 dc1f8f00
[ 1418.390013] 00000040 dcfe2020 dc2cb4d0 dc2d0000 00000000 dcfe2020 dd743e44 c1339ef6
[ 1418.390013] 00000008 dc2d0000 dc2cb4d0 dcfe2020 dd743e74 c133bbeb 00000023 000000ff
[ 1418.390013] Call Trace:
[ 1418.390013] [<c1339ef6>] usbhid_restart_ctrl_queue+0x96/0xe0
[ 1418.390013] [<c133bbeb>] __usbhid_submit_report+0x29b/0x320
[ 1418.390013] [<c1330200>] ? hid_parser_local+0x260/0x2d0
[ 1418.390013] [<c133bd2b>] usbhid_submit_report+0x1b/0x30
[ 1418.390013] [<de996cb9>] picolcd_fb_deferred_io+0x3b9/0x4c0 [hid_picolcd]
[ 1418.390013] [<c120e118>] fb_deferred_io_work+0x78/0xc0
[ 1418.390013] [<c1045e3f>] process_one_work+0x1ff/0x380
[ 1418.390013] [<c120e0a0>] ? fb_deferred_io_fault+0xa0/0xa0
[ 1418.390013] [<c10467f7>] worker_thread+0x237/0x360
[ 1418.390013] [<c10465c0>] ? manage_workers.clone.25+0x250/0x250
[ 1418.390013] [<c10465c0>] ? manage_workers.clone.25+0x250/0x250
[ 1418.390013] [<c104a30d>] kthread+0x6d/0x80
[ 1418.390013] [<c104a2a0>] ? kthread_freezable_should_stop+0x70/0x70
[ 1418.390013] [<c140c87e>] kernel_thread_helper+0x6/0xd
[ 1418.390013] Code: 01 89 45 e8 0f 85 91 00 00 00 8b 4d ec 8b 53 1c 8b 7d e4 8b 75 e8 8b 81 68 0c 00 00 8b 00 8b 40 9c c1 e0 08 0d 00 00 00 80 85 ff <89> 42 30 8b 43 1c 89 70 54 0f 84 19 01 00 00 81 fe ff 01 0
[ 1418.390013] EIP: [<c1339c40>] hid_submit_ctrl+0x80/0x2a0 SS:ESP 0068:dd743df4
[ 1418.390013] CR2: 0000000000000030
[ 1418.390013] ---[ end trace 089e20ed16261a0a ]---
pretty early during unbind process, e.g. way before picolcd even started to
dispose framebuffer.

Either I'm doing something really wrong or I'm the first one to produce
lots of outbound reports thus exploring corner-cases...

PS: attached is a diff between the hid-picolcd from this patch series to what
I'm testing right now.

Thanks,
Bruno


Attachments:
(No filename) (9.17 kB)
pico.diff (16.07 kB)
Download all attachments

2012-08-18 13:19:15

by Alan Stern

[permalink] [raw]
Subject: Re: [PATCH 0/7] HID: picoLCD updates

On Sat, 18 Aug 2012, Bruno Prémont wrote:

> Hi Jiri,
>
> [CCing Alan Stern]
>
> On Thu, 16 August 2012 Jiri Kosina <[email protected]> wrote:
> > On Thu, 16 Aug 2012, Bruno Prémont wrote:
> >
> > > > I don't really understand this explanation. Once usb_kill_urb() returns,
> > > > the URB should be available for future use (and therefore all queues
> > > > completely drained).
> > >
> > > I won't have time today to check, though my guess is that on each
> > > echo $usb-id > bind; echo $usb-id > unbind
> > > under /sys/bus/hid/drivers/hid-picolcd/ the USB urb queue fills a bit does
> > > not get cleared.
> > >
> > > Is usb_kill_urb() called when unbinding just the specific hid driver?
> >
> > Yup, through hid_hw_stop() -> usbhid_stop().
> >
> > > If so my short timing between bind/unbind must be triggering something
> > > else...
> > >
> > > Otherwise I'm missing something as at first time I got no "output queue full"
> > > messages, but as I repeated the bind/unbind sequences the prints per bind/unbind
> > > iteration increased in number.
> > >
> > > Anyhow, on Friday evening/week-end I will continue digging and report back with my
> > > findings.
>
> Huh, after changing some of the hid-picolcd data in order to have less racy
> coupling between hid and framebuffer I'm now dying way too often in _mmx_memcpy
> and most of the time I don't get a (complete) trace...

There was a similar problem reported recently. It turned out to be
caused by a __devinitconst annotation attached to a usb_device_id
table.

If there are any __devinit* annotations in the hid-picolcd driver, you
should see if removing them helps.

Alan Stern

2012-08-18 13:49:18

by Bruno Prémont

[permalink] [raw]
Subject: Re: [PATCH 0/7] HID: picoLCD updates

On Sat, 18 August 2012 Alan Stern <[email protected]> wrote:
> On Sat, 18 Aug 2012, Bruno Prémont wrote:
> > On Thu, 16 August 2012 Jiri Kosina <[email protected]> wrote:
> > > On Thu, 16 Aug 2012, Bruno Prémont wrote:
> > >
> > > > > I don't really understand this explanation. Once usb_kill_urb() returns,
> > > > > the URB should be available for future use (and therefore all queues
> > > > > completely drained).
> > > >
> > > > I won't have time today to check, though my guess is that on each
> > > > echo $usb-id > bind; echo $usb-id > unbind
> > > > under /sys/bus/hid/drivers/hid-picolcd/ the USB urb queue fills a bit does
> > > > not get cleared.
> > > >
> > > > Is usb_kill_urb() called when unbinding just the specific hid driver?
> > >
> > > Yup, through hid_hw_stop() -> usbhid_stop().
> > >
> > > > If so my short timing between bind/unbind must be triggering something
> > > > else...
> > > >
> > > > Otherwise I'm missing something as at first time I got no "output queue full"
> > > > messages, but as I repeated the bind/unbind sequences the prints per bind/unbind
> > > > iteration increased in number.
> > > >
> > > > Anyhow, on Friday evening/week-end I will continue digging and report back with my
> > > > findings.
> >
> > Huh, after changing some of the hid-picolcd data in order to have less racy
> > coupling between hid and framebuffer I'm now dying way too often in _mmx_memcpy
> > and most of the time I don't get a (complete) trace...
>
> There was a similar problem reported recently. It turned out to be
> caused by a __devinitconst annotation attached to a usb_device_id
> table.
>
> If there are any __devinit* annotations in the hid-picolcd driver, you
> should see if removing them helps.

There is no such annotation around in hid-picolcd.


One thing I just though about, how does usbhid handle the calls to
usbhid_submit_report() when hid_hw_stop()/hid_hw_close() have already
been called?
I will attempt to see if it makes a difference to shortcut my
usbhid_submit_report() calls from the point on I have called hid_hw_close()...

Bruno

2012-08-18 18:50:14

by Bruno Prémont

[permalink] [raw]
Subject: Re: [PATCH 0/7] HID: picoLCD updates

On Sat, 18 August 2012 Bruno Prémont <[email protected]> wrote:
> One thing I just though about, how does usbhid handle the calls to
> usbhid_submit_report() when hid_hw_stop()/hid_hw_close() have already
> been called?
> I will attempt to see if it makes a difference to shortcut my
> usbhid_submit_report() calls from the point on I have called hid_hw_close()...

I don't know if I'm progressing or just drawing circles...

I included a check right before calling usbhid_submit_report() to prevent
calling it if hid_hw_close()/hid_hw_stop() has been called and that part
works as expected, just that at some moment it explodes on a NULL-pointer
dereference.

Initially I had usbhid_submit_report macro:
#define usbhid_submit_report(a, b, c) \
do { \
picolcd_debug_out_report(d, a, b); \
usbhid_submit_report(a, b, c); \
} while (0)

Now to make sure no new reports get submitted once hid_hw_stop()
was called I changed it to:
#define usbhid_submit_report(a, b, c) \
do { \
struct picolcd_data *d = hid_get_drvdata(a); \
unsigned long flags; \
picolcd_debug_out_report(d, a, b); \
spin_lock_irqsave(&d->lock, flags); \
if (!(d->status & PICOLCD_FAILED)) \
usbhid_submit_report(a, b, c); \
else \
printk(KERN_INFO "Submitting report after hw_stop!\n"); \
spin_unlock_irqrestore(&d->lock, flags); \
} while (0)
with the printk() to spam dmesg :)


The NULL-pointer dereference I got inside picolcd_fb_deferred_io() was due
to me calling hid_set_drvdata(hdev, NULL) too early in remove(), fixed.


But with that fixed I still encounter the bad page, probably around
the time of hid_hw_stop(), though it always takes the HARD LOCK watchdog
to trigger for the trace to show (probably because or IRQ context):

It seems to happen during the bind/unbind following the unbind that
produces a
hid-picolcd 0003:04D8:C002.0003: usb_submit_urb(out) failed: -1
either a BUG or even as reboot and happend between hid_hw_close() and
hid_hw_stop() though my extra printks seem to reduce probability of
bad things to happen.


[ 1217.643017] BUG: unable to handle kernel paging request at c7cb0000
[ 1217.643017] IP: [<c11e2be7>] _mmx_memcpy+0x27/0x16c
[ 1217.643017] *pde = 19078063 *pte = 07cb0161
[ 1217.643017] Oops: 0003 [#1]
[ 1217.643017] Modules linked in: hid_picolcd fb_sys_fops sysimgblt sysfillrect syscopyarea drm_kms_helper nfs3 nfs_acl nfs lockd sunrpc
[ 1217.643017] Pid: 0, comm: swapper Not tainted 3.6.0-rc2-jupiter-00008-gd3c2b0c-dirty #2 NVIDIA Corporation. nFORCE-MCP/MS-6373
[ 1217.643017] EIP: 0060:[<c11e2be7>] EFLAGS: 00010013 CPU: 0
[ 1217.643017] EIP is at _mmx_memcpy+0x27/0x16c
[ 1217.643017] EAX: dd40a000 EBX: dd6ea010 ECX: 035af77b EDX: d9108c80
[ 1217.643017] ESI: d9121c00 EDI: c7cb0000 EBP: dd40bebc ESP: dd40bea0
[ 1217.643017] DS: 007b ES: 007b FS: 0000 GS: 00e0 SS: 0068
[ 1217.643017] CR0: 8005003b CR2: c7cb0000 CR3: 1cf1c000 CR4: 000007d0
[ 1217.643017] DR0: 00000000 DR1: 00000000 DR2: 00000000 DR3: 00000000
[ 1217.643017] DR6: ffff0ff0 DR7: 00000400
[ 1217.643017] Process swapper (pid: 0, ti=dd40a000 task=c15d24c0 task.ti=c15c6000)
[ 1217.643017] Stack:
[ 1217.643017] 0d6d6d6f dd40beac c109bba4 c7c97080 dd6ea010 c7c97080 dceb3030 dd40bee4
[ 1217.643017] c133b364 c7d40000 c7d40000 dd40bed4 dceb3030 d9108c80 dd6ea010 00000046
[ 1217.643017] dceb3030 dd40bf04 c133bea0 0000d6d6 dbe901b0 fffffffe dbe901b0 fffffffe
[ 1217.643017] Call Trace:
[ 1217.643017] [<c109bba4>] ? free_compound_page+0x14/0x20
[ 1217.643017] [<c133b364>] hid_submit_out+0xa4/0x130
[ 1217.643017] [<c133bea0>] hid_irq_out+0xa0/0x100
[ 1217.643017] [<c12e425e>] usb_hcd_giveback_urb+0x4e/0x90
[ 1217.643017] [<c12f9bd2>] finish_urb+0xb2/0xf0
[ 1217.643017] [<c12fa8a8>] finish_unlinks+0x168/0x320
[ 1217.643017] [<c12fc43f>] ohci_irq+0x27f/0x300
[ 1217.643017] [<c1077790>] ? unmask_irq+0x20/0x20
[ 1217.643017] [<c12e3ace>] usb_hcd_irq+0x1e/0x40
[ 1217.643017] [<c107549d>] handle_irq_event_percpu+0x6d/0x1c0
[ 1217.643017] [<c1077790>] ? unmask_irq+0x20/0x20
[ 1217.643017] [<c1075618>] handle_irq_event+0x28/0x40
[ 1217.643017] [<c107781a>] handle_fasteoi_irq+0x8a/0xe0
[ 1217.643017] <IRQ>
[ 1217.643017] [<c1003e0a>] ? do_IRQ+0x3a/0xb0
[ 1217.643017] [<c140c870>] ? common_interrupt+0x30/0x38
[ 1217.643017] [<c1009028>] ? default_idle+0x68/0xd0
[ 1217.643017] [<c1009a0f>] ? cpu_idle+0x8f/0xc0
[ 1217.643017] [<c13ffe97>] ? rest_init+0x67/0x70
[ 1217.643017] [<c161b8b8>] ? start_kernel+0x2a3/0x2aa
[ 1217.643017] [<c161b3da>] ? repair_env_string+0x51/0x51
[ 1217.643017] [<c161b28e>] ? i386_start_kernel+0x44/0x46
[ 1217.643017] Code: 90 90 90 90 55 89 e5 57 56 89 d6 53 83 ec 10 89 45 f0 89 4d e4 89 e0 25 00 e0 ff ff f7 40 14 00 ff ff 07 74 17 c1 e9 02 8b 7d f0 <f3> a5 8b 4d e4 83 e1 03 74 02 f3 a4 e9 27 01 00 00 8b 45 1
[ 1217.643017] EIP: [<c11e2be7>] _mmx_memcpy+0x27/0x16c SS:ESP 0068:dd40bea0
[ 1217.643017] CR2: 00000000c7cb0000
[ 1217.643017] ---[ end trace fbaa4c5261e1a846 ]---

Bruno

2012-08-19 00:11:16

by Alan Stern

[permalink] [raw]
Subject: Re: [PATCH 0/7] HID: picoLCD updates

On Sat, 18 Aug 2012, Bruno Prémont wrote:

> One thing I just though about, how does usbhid handle the calls to
> usbhid_submit_report() when hid_hw_stop()/hid_hw_close() have already
> been called?

Looks like they aren't synchronized at all. That's a bug.
usbhid_submit_report() should check the HID_DISCONNECTED flag.

> I will attempt to see if it makes a difference to shortcut my
> usbhid_submit_report() calls from the point on I have called hid_hw_close()...

I don't know bout hid_hw_close(). Certainly no more reports should be
submitted following usbhid_stop().

Alan Stern

2012-08-19 16:23:50

by Bruno Prémont

[permalink] [raw]
Subject: Re: [PATCH 0/7] HID: picoLCD updates

On Sat, 18 August 2012 Alan Stern wrote:
> On Sat, 18 Aug 2012, Bruno Prémont wrote:
>
> > One thing I just though about, how does usbhid handle the calls to
> > usbhid_submit_report() when hid_hw_stop()/hid_hw_close() have already
> > been called?
>
> Looks like they aren't synchronized at all. That's a bug.
> usbhid_submit_report() should check the HID_DISCONNECTED flag.
>
> > I will attempt to see if it makes a difference to shortcut my
> > usbhid_submit_report() calls from the point on I have called hid_hw_close()...
>
> I don't know bout hid_hw_close(). Certainly no more reports should be
> submitted following usbhid_stop().

Ok, I did just that, prevent new calls to usbhid_submit_report(), after
calling hid_hw_close(), fixed one bug in my code that triggers the NULL
pointer dereference (calling hid_set_drvdata(hdev, NULL) too early).

Now I'm still seeing the bad paging request in _mmx_memcpy(), though rather
sporadically.
The last ones I saw were during remove() around the time of calling hid_hw_close()
and hid_hw_stop(). Adding a printk() between the two (at least while picoLCD
is hosting fbcon) makes it very improbably for the bad page to happen.

It looks like low-level driver did free memory in hid_hw_close() for some
in-flight URB and thus things break in following USB interrupt.

>From mapping trace information to source it seems:
usbhid/hid-core.c:
static int hid_submit_out(struct hid_device *hid)
{
struct hid_report *report;
char *raw_report;
struct usbhid_device *usbhid = hid->driver_data;
int r;

report = usbhid->out[usbhid->outtail].report;
raw_report = usbhid->out[usbhid->outtail].raw_report;

usbhid->urbout->transfer_buffer_length = ((report->size - 1) >> 3) +
1 + (report->id > 0);
usbhid->urbout->dev = hid_to_usb_dev(hid);
if (raw_report) {
memcpy(usbhid->outbuf, raw_report,
usbhid->urbout->transfer_buffer_length);
^^^^^^^^^^^^^^^_ this is exploding
kfree(raw_report);
usbhid->out[usbhid->outtail].raw_report = NULL;
}

dbg_hid("submitting out urb\n");

r = usb_submit_urb(usbhid->urbout, GFP_ATOMIC);
if (r < 0) {
hid_err(hid, "usb_submit_urb(out) failed: %d\n", r);
return r;
}
usbhid->last_out = jiffies;
return 0;
}


Bruno

2012-08-19 19:56:27

by Alan Stern

[permalink] [raw]
Subject: Re: [PATCH 0/7] HID: picoLCD updates

On Sun, 19 Aug 2012, Bruno Prémont wrote:

> > I don't know bout hid_hw_close(). Certainly no more reports should be
> > submitted following usbhid_stop().
>
> Ok, I did just that, prevent new calls to usbhid_submit_report(), after
> calling hid_hw_close(), fixed one bug in my code that triggers the NULL
> pointer dereference (calling hid_set_drvdata(hdev, NULL) too early).
>
> Now I'm still seeing the bad paging request in _mmx_memcpy(), though rather
> sporadically.
> The last ones I saw were during remove() around the time of calling hid_hw_close()
> and hid_hw_stop(). Adding a printk() between the two (at least while picoLCD
> is hosting fbcon) makes it very improbably for the bad page to happen.
>
> It looks like low-level driver did free memory in hid_hw_close() for some
> in-flight URB and thus things break in following USB interrupt.

No, memory is not freed in usbhid_close(). It is freed in
usbhid_stop().

> From mapping trace information to source it seems:
> usbhid/hid-core.c:
> static int hid_submit_out(struct hid_device *hid)
> {
> struct hid_report *report;
> char *raw_report;
> struct usbhid_device *usbhid = hid->driver_data;
> int r;
>
> report = usbhid->out[usbhid->outtail].report;
> raw_report = usbhid->out[usbhid->outtail].raw_report;
>
> usbhid->urbout->transfer_buffer_length = ((report->size - 1) >> 3) +
> 1 + (report->id > 0);
> usbhid->urbout->dev = hid_to_usb_dev(hid);
> if (raw_report) {
> memcpy(usbhid->outbuf, raw_report,
> usbhid->urbout->transfer_buffer_length);
> ^^^^^^^^^^^^^^^_ this is exploding

Right. Like I said, usbhid->outbuf is freed in hid_free_buffers(),
which is called from usbhid_stop().

Forget about usbhid_close(). The race is between hid_submit_out() and
usbhid_stop().

Alan Stern