the patch is for fix the below kernel panic:
BUG: unable to handle kernel NULL pointer dereference at 000000000000002a
IP: [<ffffffff8170e19d>] composite_setup+0x3d/0x1830
PGD 27525b067 PUD 27525a067 PMD 0
Oops: 0002 [#1] PREEMPT SMP
Call Trace:
[<ffffffff8168b902>] ? dwc3_trace+0x52/0x60
[<ffffffff810c504d>] ? get_parent_ip+0xd/0x50
[<ffffffff8171159c>] android_setup+0xbc/0x140
[<ffffffff810ec190>] ? irq_finalize_oneshot+0xe0/0xe0
[<ffffffff816917e7>] dwc3_ep0_delegate_req+0x37/0x50
[<ffffffff81692e69>] dwc3_ep0_interrupt+0xaf9/0xc10
[<ffffffff810c504d>] ? get_parent_ip+0xd/0x50
[<ffffffff810ec190>] ? irq_finalize_oneshot+0xe0/0xe0
[<ffffffff81690281>] dwc3_thread_interrupt+0x931/0xbf0
[<ffffffff810ec190>] ? irq_finalize_oneshot+0xe0/0xe0
[<ffffffff810ec1ae>] irq_thread_fn+0x1e/0x40
[<ffffffff810ec674>] irq_thread+0x134/0x1b0
[<ffffffff810ec260>] ? wake_threads_waitq+0x30/0x30
[<ffffffff810b802d>] kthread+0xed/0x110
[<ffffffff81a2fd6f>] ret_from_fork+0x3f/0x70
RIP [<ffffffff8170e19d>] composite_setup+0x3d/0x1830
the root cause is dwc interrupt comes after usb_gadget_remove_driver.
the fix is stop udc to have the dwc3 disable the interrupt, then release
the resource in udc->driver->unbind.
usb_gadget_udc_stop-->
udc->gadget->ops->udc_stop(udc->gadget)-->
dwc3_gadget_stop
Signed-off-by: he, bo <[email protected]>
---
drivers/usb/gadget/udc/core.c | 2 +-
1 file changed, 1 insertion(+), 1 deletion(-)
diff --git a/drivers/usb/gadget/udc/core.c b/drivers/usb/gadget/udc/core.c
index e6f04ee..67e9aa5 100644
--- a/drivers/usb/gadget/udc/core.c
+++ b/drivers/usb/gadget/udc/core.c
@@ -1258,8 +1258,8 @@ static void usb_gadget_remove_driver(struct usb_udc *udc)
usb_gadget_disconnect(udc->gadget);
udc->driver->disconnect(udc->gadget);
- udc->driver->unbind(udc->gadget);
usb_gadget_udc_stop(udc);
+ udc->driver->unbind(udc->gadget);
udc->driver = NULL;
udc->dev.driver = NULL;
--
2.7.4
Hi,
"He, Bo" <[email protected]> writes:
> the patch is for fix the below kernel panic:
> BUG: unable to handle kernel NULL pointer dereference at 000000000000002a
> IP: [<ffffffff8170e19d>] composite_setup+0x3d/0x1830
> PGD 27525b067 PUD 27525a067 PMD 0
> Oops: 0002 [#1] PREEMPT SMP
> Call Trace:
> [<ffffffff8168b902>] ? dwc3_trace+0x52/0x60
> [<ffffffff810c504d>] ? get_parent_ip+0xd/0x50
> [<ffffffff8171159c>] android_setup+0xbc/0x140
> [<ffffffff810ec190>] ? irq_finalize_oneshot+0xe0/0xe0
> [<ffffffff816917e7>] dwc3_ep0_delegate_req+0x37/0x50
> [<ffffffff81692e69>] dwc3_ep0_interrupt+0xaf9/0xc10
> [<ffffffff810c504d>] ? get_parent_ip+0xd/0x50
> [<ffffffff810ec190>] ? irq_finalize_oneshot+0xe0/0xe0
> [<ffffffff81690281>] dwc3_thread_interrupt+0x931/0xbf0
> [<ffffffff810ec190>] ? irq_finalize_oneshot+0xe0/0xe0
> [<ffffffff810ec1ae>] irq_thread_fn+0x1e/0x40
> [<ffffffff810ec674>] irq_thread+0x134/0x1b0
> [<ffffffff810ec260>] ? wake_threads_waitq+0x30/0x30
> [<ffffffff810b802d>] kthread+0xed/0x110
> [<ffffffff81a2fd6f>] ret_from_fork+0x3f/0x70
> RIP [<ffffffff8170e19d>] composite_setup+0x3d/0x1830
>
> the root cause is dwc interrupt comes after usb_gadget_remove_driver.
> the fix is stop udc to have the dwc3 disable the interrupt, then release
> the resource in udc->driver->unbind.
> usb_gadget_udc_stop-->
> udc->gadget->ops->udc_stop(udc->gadget)-->
> dwc3_gadget_stop
>
> Signed-off-by: he, bo <[email protected]>
> ---
> drivers/usb/gadget/udc/core.c | 2 +-
> 1 file changed, 1 insertion(+), 1 deletion(-)
>
> diff --git a/drivers/usb/gadget/udc/core.c b/drivers/usb/gadget/udc/core.c
> index e6f04ee..67e9aa5 100644
> --- a/drivers/usb/gadget/udc/core.c
> +++ b/drivers/usb/gadget/udc/core.c
> @@ -1258,8 +1258,8 @@ static void usb_gadget_remove_driver(struct usb_udc *udc)
>
> usb_gadget_disconnect(udc->gadget);
> udc->driver->disconnect(udc->gadget);
> - udc->driver->unbind(udc->gadget);
> usb_gadget_udc_stop(udc);
> + udc->driver->unbind(udc->gadget);
unbind must be called before udc_stop. This seems to be a bug *only* in
dwc3. I can't see how this would happen, actually. On dwc3_gadget_stop()
we mask dwc3's interrupts, so the handler should be executed anymore.
Can you tell me how to reproduce this? I could try this out tomorrow.
Which kernel are you using? I wonder if this is something caused by the
Android patches.
--
balbi
Hi, Balbi:
1. the issue reproduced very rarely, we run reboot test reproduce the issue, it reproduced two times on two board after more than 1500 cycles reboot.
2. the kernel version is 4.4, the test case is cold reboot, I think it's not android patches cause it, it's the interrupt thread run after the udc->driver->unbind.
3. I check more drivers, like amd5536_udc_stop, at91_stop, atmel_usba_stop, bcm63xx_udc_stop, s3c_hsudc_stop, all the interrupt disable will be in the udc_stop(), so we need guarantee to stop the interrupt then release the resource.
-----Original Message-----
From: Felipe Balbi [mailto:[email protected]]
Sent: Tuesday, July 18, 2017 6:44 PM
To: He, Bo <[email protected]>; [email protected]; [email protected]
Cc: [email protected]; [email protected]; [email protected]; [email protected]; [email protected]; [email protected]; [email protected]; [email protected]; He, Bo <[email protected]>; Zhang, Yanmin <[email protected]>
Subject: Re: [PATCH] usb: gadget: udc: fix the kernel NULL pointer in composite_setup
Hi,
"He, Bo" <[email protected]> writes:
> the patch is for fix the below kernel panic:
> BUG: unable to handle kernel NULL pointer dereference at
> 000000000000002a
> IP: [<ffffffff8170e19d>] composite_setup+0x3d/0x1830 PGD 27525b067 PUD
> 27525a067 PMD 0
> Oops: 0002 [#1] PREEMPT SMP
> Call Trace:
> [<ffffffff8168b902>] ? dwc3_trace+0x52/0x60 [<ffffffff810c504d>] ?
> get_parent_ip+0xd/0x50 [<ffffffff8171159c>] android_setup+0xbc/0x140
> [<ffffffff810ec190>] ? irq_finalize_oneshot+0xe0/0xe0
> [<ffffffff816917e7>] dwc3_ep0_delegate_req+0x37/0x50
> [<ffffffff81692e69>] dwc3_ep0_interrupt+0xaf9/0xc10
> [<ffffffff810c504d>] ? get_parent_ip+0xd/0x50 [<ffffffff810ec190>] ?
> irq_finalize_oneshot+0xe0/0xe0 [<ffffffff81690281>]
> dwc3_thread_interrupt+0x931/0xbf0 [<ffffffff810ec190>] ?
> irq_finalize_oneshot+0xe0/0xe0 [<ffffffff810ec1ae>]
> irq_thread_fn+0x1e/0x40 [<ffffffff810ec674>] irq_thread+0x134/0x1b0
> [<ffffffff810ec260>] ? wake_threads_waitq+0x30/0x30
> [<ffffffff810b802d>] kthread+0xed/0x110 [<ffffffff81a2fd6f>]
> ret_from_fork+0x3f/0x70 RIP [<ffffffff8170e19d>]
> composite_setup+0x3d/0x1830
>
> the root cause is dwc interrupt comes after usb_gadget_remove_driver.
> the fix is stop udc to have the dwc3 disable the interrupt, then
> release the resource in udc->driver->unbind.
> usb_gadget_udc_stop-->
> udc->gadget->ops->udc_stop(udc->gadget)-->
> dwc3_gadget_stop
>
> Signed-off-by: he, bo <[email protected]>
> ---
> drivers/usb/gadget/udc/core.c | 2 +-
> 1 file changed, 1 insertion(+), 1 deletion(-)
>
> diff --git a/drivers/usb/gadget/udc/core.c
> b/drivers/usb/gadget/udc/core.c index e6f04ee..67e9aa5 100644
> --- a/drivers/usb/gadget/udc/core.c
> +++ b/drivers/usb/gadget/udc/core.c
> @@ -1258,8 +1258,8 @@ static void usb_gadget_remove_driver(struct
> usb_udc *udc)
>
> usb_gadget_disconnect(udc->gadget);
> udc->driver->disconnect(udc->gadget);
> - udc->driver->unbind(udc->gadget);
> usb_gadget_udc_stop(udc);
> + udc->driver->unbind(udc->gadget);
unbind must be called before udc_stop. This seems to be a bug *only* in dwc3. I can't see how this would happen, actually. On dwc3_gadget_stop() we mask dwc3's interrupts, so the handler should be executed anymore.
Can you tell me how to reproduce this? I could try this out tomorrow.
Which kernel are you using? I wonder if this is something caused by the Android patches.
--
balbi
Hi,
(please don't top-post and also break your lines at 80-columns ;-)
"He, Bo" <[email protected]> writes:
> 1. the issue reproduced very rarely, we run reboot test
> reproduce the issue, it reproduced two times on two board after
> more than 1500 cycles reboot.
That's fine, we, somehow, got a use-after-free on the tracepoints. I'm
interested in fixing that without touching udc-core since that's a
dwc3-only bug.
> 2. the kernel version is 4.4, the test case is cold reboot, I think it's not android patches cause it, it's the interrupt thread run after the udc->driver->unbind.
Yeah, I need you to try v4.13-rc1. v4.4 is *really* old. I can't accept
your patch unless I'm certain the bug still exists.
> 3. I check more drivers, like amd5536_udc_stop, at91_stop,
> atmel_usba_stop, bcm63xx_udc_stop, s3c_hsudc_stop, all the
> interrupt disable will be in the udc_stop(), so we need
> guarantee to stop the interrupt then release the resource.
Right, we also disable the interrupt on ->udc_stop(). See below:
static void __dwc3_gadget_stop(struct dwc3 *dwc)
{
dwc3_gadget_disable_irq(dwc);
__dwc3_gadget_ep_disable(dwc->eps[0]);
__dwc3_gadget_ep_disable(dwc->eps[1]);
}
static int dwc3_gadget_stop(struct usb_gadget *g)
{
struct dwc3 *dwc = gadget_to_dwc(g);
unsigned long flags;
int epnum;
spin_lock_irqsave(&dwc->lock, flags);
if (pm_runtime_suspended(dwc->dev))
goto out;
__dwc3_gadget_stop(dwc);
for (epnum = 2; epnum < DWC3_ENDPOINTS_NUM; epnum++) {
struct dwc3_ep *dep = dwc->eps[epnum];
if (!dep)
continue;
if (!(dep->flags & DWC3_EP_END_TRANSFER_PENDING))
continue;
wait_event_lock_irq(dep->wait_end_transfer,
!(dep->flags & DWC3_EP_END_TRANSFER_PENDING),
dwc->lock);
}
out:
dwc->gadget_driver = NULL;
spin_unlock_irqrestore(&dwc->lock, flags);
free_irq(dwc->irq_gadget, dwc->ev_buf);
return 0;
}
--
balbi
The patch I submitted is based on the latest kernel,
I checked the latest kernel has the same logic,
so I send the patch to LKML.
Thanks for your comments.
-----Original Message-----
From: Felipe Balbi [mailto:[email protected]]
Sent: Wednesday, July 19, 2017 3:51 PM
To: He, Bo <[email protected]>; [email protected]; [email protected]
Cc: [email protected]; [email protected]; [email protected]; [email protected]; [email protected]; [email protected]; [email protected]; [email protected]; Zhang, Yanmin <[email protected]>
Subject: RE: [PATCH] usb: gadget: udc: fix the kernel NULL pointer in composite_setup
Hi,
(please don't top-post and also break your lines at 80-columns ;-)
"He, Bo" <[email protected]> writes:
> 1. the issue reproduced very rarely, we run reboot test
> reproduce the issue, it reproduced two times on two board after
> more than 1500 cycles reboot.
That's fine, we, somehow, got a use-after-free on the tracepoints. I'm interested in fixing that without touching udc-core since that's a dwc3-only bug.
> 2. the kernel version is 4.4, the test case is cold reboot, I think it's not android patches cause it, it's the interrupt thread run after the udc->driver->unbind.
Yeah, I need you to try v4.13-rc1. v4.4 is *really* old. I can't accept your patch unless I'm certain the bug still exists.
> 3. I check more drivers, like amd5536_udc_stop, at91_stop,
> atmel_usba_stop, bcm63xx_udc_stop, s3c_hsudc_stop, all the
> interrupt disable will be in the udc_stop(), so we need
> guarantee to stop the interrupt then release the resource.
Right, we also disable the interrupt on ->udc_stop(). See below:
static void __dwc3_gadget_stop(struct dwc3 *dwc) {
dwc3_gadget_disable_irq(dwc);
__dwc3_gadget_ep_disable(dwc->eps[0]);
__dwc3_gadget_ep_disable(dwc->eps[1]);
}
static int dwc3_gadget_stop(struct usb_gadget *g) {
struct dwc3 *dwc = gadget_to_dwc(g);
unsigned long flags;
int epnum;
spin_lock_irqsave(&dwc->lock, flags);
if (pm_runtime_suspended(dwc->dev))
goto out;
__dwc3_gadget_stop(dwc);
for (epnum = 2; epnum < DWC3_ENDPOINTS_NUM; epnum++) {
struct dwc3_ep *dep = dwc->eps[epnum];
if (!dep)
continue;
if (!(dep->flags & DWC3_EP_END_TRANSFER_PENDING))
continue;
wait_event_lock_irq(dep->wait_end_transfer,
!(dep->flags & DWC3_EP_END_TRANSFER_PENDING),
dwc->lock);
}
out:
dwc->gadget_driver = NULL;
spin_unlock_irqrestore(&dwc->lock, flags);
free_irq(dwc->irq_gadget, dwc->ev_buf);
return 0;
}
--
balbi
(No top-posting!)
"He, Bo" <[email protected]> writes:
> The patch I submitted is based on the latest kernel,
> I checked the latest kernel has the same logic,
> so I send the patch to LKML.
but you haven't reproduced the bug on latest kernel, have you?
Can you send me tracepoint output of one test run (without failures is
fine) I just wanna know what exactly it is that you're doing.
cheers
--
balbi