2023-01-02 05:44:50

by 정재훈

[permalink] [raw]
Subject: [PATCH] usb: dwc3: Clear DWC3_EVENT_PENDING when count is 0

Sometimes very rarely, The count is 0 and the DWC3 flag is set has status.
It must not have these status. Because, It can make happen interrupt storming
status.
So, It have to clean up DWC3_EVENT_PENDING flags set when count is 0.
It means "There are no interrupts to handle.".

(struct dwc3_event_buffer *) ev_buf = 0xFFFFFF883DBF1180 (
(void *) buf = 0xFFFFFFC00DBDD000 = end+0x337D000,
(void *) cache = 0xFFFFFF8839F54080,
(unsigned int) length = 0x1000,
(unsigned int) lpos = 0x0,
(unsigned int) count = 0x0,
(unsigned int) flags = 0x00000001,
(dma_addr_t) dma = 0x00000008BD7D7000,
(struct dwc3 *) dwc = 0xFFFFFF8839CBC880,
(u64) android_kabi_reserved1 = 0x0),

(time = 47557628930999, irq = 165, fn = dwc3_interrupt, latency = 0, en = 1),
(time = 47557628931268, irq = 165, fn = dwc3_interrupt, latency = 0, en = 3),
(time = 47557628932383, irq = 165, fn = dwc3_interrupt, latency = 0, en = 1),
(time = 47557628932652, irq = 165, fn = dwc3_interrupt, latency = 0, en = 3),
(time = 47557628933768, irq = 165, fn = dwc3_interrupt, latency = 0, en = 1),
(time = 47557628934037, irq = 165, fn = dwc3_interrupt, latency = 0, en = 3),
(time = 47557628935152, irq = 165, fn = dwc3_interrupt, latency = 0, en = 1),
(time = 47557628935460, irq = 165, fn = dwc3_interrupt, latency = 0, en = 3),
(time = 47557628936575, irq = 165, fn = dwc3_interrupt, latency = 0, en = 1),
(time = 47557628936845, irq = 165, fn = dwc3_interrupt, latency = 0, en = 3),
(time = 47557628937960, irq = 165, fn = dwc3_interrupt, latency = 0, en = 1),
(time = 47557628938229, irq = 165, fn = dwc3_interrupt, latency = 0, en = 3),
(time = 47557628939345, irq = 165, fn = dwc3_interrupt, latency = 0, en = 1),
(time = 47557628939652, irq = 165, fn = dwc3_interrupt, latency = 0, en = 3),
(time = 47557628940768, irq = 165, fn = dwc3_interrupt, latency = 0, en = 1),
(time = 47557628941037, irq = 165, fn = dwc3_interrupt, latency = 0, en = 3),
(time = 47557628942152, irq = 165, fn = dwc3_interrupt, latency = 0, en = 1),
(time = 47557628942422, irq = 165, fn = dwc3_interrupt, latency = 0, en = 3),
(time = 47557628943537, irq = 165, fn = dwc3_interrupt, latency = 0, en = 1),
(time = 47557628943806, irq = 165, fn = dwc3_interrupt, latency = 0, en = 3),
(time = 47557628944922, irq = 165, fn = dwc3_interrupt, latency = 0, en = 1),
(time = 47557628945229, irq = 165, fn = dwc3_interrupt, latency = 0, en = 3),
(time = 47557628946345, irq = 165, fn = dwc3_interrupt, latency = 0, en = 1),
(time = 47557628946614, irq = 165, fn = dwc3_interrupt, latency = 0, en = 3),
(time = 47557628947729, irq = 165, fn = dwc3_interrupt, latency = 0, en = 1),
(time = 47557628947999, irq = 165, fn = dwc3_interrupt, latency = 0, en = 3),

Signed-off-by: JaeHun Jung <[email protected]>
---
drivers/usb/dwc3/gadget.c | 5 ++++-
1 file changed, 4 insertions(+), 1 deletion(-)

diff --git a/drivers/usb/dwc3/gadget.c b/drivers/usb/dwc3/gadget.c
index 789976567f9f..5d2d5a9b9915 100644
--- a/drivers/usb/dwc3/gadget.c
+++ b/drivers/usb/dwc3/gadget.c
@@ -4355,8 +4355,11 @@ static irqreturn_t dwc3_check_event_buf(struct dwc3_event_buffer *evt)
* irq event handler completes before caching new event to prevent
* losing events.
*/
- if (evt->flags & DWC3_EVENT_PENDING)
+ if (evt->flags & DWC3_EVENT_PENDING) {
+ if (!evt->count)
+ evt->flags &= ~DWC3_EVENT_PENDING;
return IRQ_HANDLED;
+ }

count = dwc3_readl(dwc->regs, DWC3_GEVNTCOUNT(0));
count &= DWC3_GEVNTCOUNT_MASK;
--
2.31.1


2023-01-03 16:08:21

by Felipe Balbi

[permalink] [raw]
Subject: Re: [PATCH] usb: dwc3: Clear DWC3_EVENT_PENDING when count is 0


Hi,

JaeHun Jung <[email protected]> writes:
> Sometimes very rarely, The count is 0 and the DWC3 flag is set has status.
> It must not have these status. Because, It can make happen interrupt storming
> status.
> So, It have to clean up DWC3_EVENT_PENDING flags set when count is 0.
> It means "There are no interrupts to handle.".
>
> (struct dwc3_event_buffer *) ev_buf = 0xFFFFFF883DBF1180 (
> (void *) buf = 0xFFFFFFC00DBDD000 = end+0x337D000,
> (void *) cache = 0xFFFFFF8839F54080,
> (unsigned int) length = 0x1000,
> (unsigned int) lpos = 0x0,
> (unsigned int) count = 0x0,
> (unsigned int) flags = 0x00000001,
> (dma_addr_t) dma = 0x00000008BD7D7000,
> (struct dwc3 *) dwc = 0xFFFFFF8839CBC880,
> (u64) android_kabi_reserved1 = 0x0),
>
> (time = 47557628930999, irq = 165, fn = dwc3_interrupt, latency = 0, en = 1),
> (time = 47557628931268, irq = 165, fn = dwc3_interrupt, latency = 0, en = 3),
> (time = 47557628932383, irq = 165, fn = dwc3_interrupt, latency = 0, en = 1),
> (time = 47557628932652, irq = 165, fn = dwc3_interrupt, latency = 0, en = 3),
> (time = 47557628933768, irq = 165, fn = dwc3_interrupt, latency = 0, en = 1),
> (time = 47557628934037, irq = 165, fn = dwc3_interrupt, latency = 0, en = 3),
> (time = 47557628935152, irq = 165, fn = dwc3_interrupt, latency = 0, en = 1),
> (time = 47557628935460, irq = 165, fn = dwc3_interrupt, latency = 0, en = 3),
> (time = 47557628936575, irq = 165, fn = dwc3_interrupt, latency = 0, en = 1),
> (time = 47557628936845, irq = 165, fn = dwc3_interrupt, latency = 0, en = 3),
> (time = 47557628937960, irq = 165, fn = dwc3_interrupt, latency = 0, en = 1),
> (time = 47557628938229, irq = 165, fn = dwc3_interrupt, latency = 0, en = 3),
> (time = 47557628939345, irq = 165, fn = dwc3_interrupt, latency = 0, en = 1),
> (time = 47557628939652, irq = 165, fn = dwc3_interrupt, latency = 0, en = 3),
> (time = 47557628940768, irq = 165, fn = dwc3_interrupt, latency = 0, en = 1),
> (time = 47557628941037, irq = 165, fn = dwc3_interrupt, latency = 0, en = 3),
> (time = 47557628942152, irq = 165, fn = dwc3_interrupt, latency = 0, en = 1),
> (time = 47557628942422, irq = 165, fn = dwc3_interrupt, latency = 0, en = 3),
> (time = 47557628943537, irq = 165, fn = dwc3_interrupt, latency = 0, en = 1),
> (time = 47557628943806, irq = 165, fn = dwc3_interrupt, latency = 0, en = 3),
> (time = 47557628944922, irq = 165, fn = dwc3_interrupt, latency = 0, en = 1),
> (time = 47557628945229, irq = 165, fn = dwc3_interrupt, latency = 0, en = 3),
> (time = 47557628946345, irq = 165, fn = dwc3_interrupt, latency = 0, en = 1),
> (time = 47557628946614, irq = 165, fn = dwc3_interrupt, latency = 0, en = 3),
> (time = 47557628947729, irq = 165, fn = dwc3_interrupt, latency = 0, en = 1),
> (time = 47557628947999, irq = 165, fn = dwc3_interrupt, latency = 0, en = 3),
>
> Signed-off-by: JaeHun Jung <[email protected]>

Please get in the habit of running patches through
scripts/get_maintainer.pl. If you did, it would tell you Thinh is the
new maintainer for dwc3.

--
balbi

2023-01-05 03:47:13

by Linyu Yuan

[permalink] [raw]
Subject: Re: [PATCH] usb: dwc3: Clear DWC3_EVENT_PENDING when count is 0


On 1/5/2023 11:29 AM, Linyu Yuan wrote:
> On 1/2/2023 1:08 PM, JaeHun Jung wrote:
>> Sometimes very rarely, The count is 0 and the DWC3 flag is set has
>> status.
>> It must not have these status. Because, It can make happen interrupt
>> storming
>> status.
>
> could you help explain without clear the flag, how interrupt storming
> happen ?
>
> as your change didn't touch any hardware register, i don't know how it
> fix storming.
>
> storming from software layer ?
>
>> So, It have to clean up DWC3_EVENT_PENDING flags set when count is 0.
>> It means "There are no interrupts to handle.".
>>
>> (struct dwc3_event_buffer *) ev_buf = 0xFFFFFF883DBF1180 (
>>     (void *) buf = 0xFFFFFFC00DBDD000 = end+0x337D000,
>>     (void *) cache = 0xFFFFFF8839F54080,
>>     (unsigned int) length = 0x1000,
>>     (unsigned int) lpos = 0x0,
>>     (unsigned int) count = 0x0,
>>     (unsigned int) flags = 0x00000001,
>>     (dma_addr_t) dma = 0x00000008BD7D7000,
>>     (struct dwc3 *) dwc = 0xFFFFFF8839CBC880,
>>     (u64) android_kabi_reserved1 = 0x0),
>>
>> (time = 47557628930999, irq = 165, fn = dwc3_interrupt, latency = 0,
>> en = 1),
>> (time = 47557628931268, irq = 165, fn = dwc3_interrupt, latency = 0,
>> en = 3),
>> (time = 47557628932383, irq = 165, fn = dwc3_interrupt, latency = 0,
>> en = 1),
>> (time = 47557628932652, irq = 165, fn = dwc3_interrupt, latency = 0,
>> en = 3),
>> (time = 47557628933768, irq = 165, fn = dwc3_interrupt, latency = 0,
>> en = 1),
>> (time = 47557628934037, irq = 165, fn = dwc3_interrupt, latency = 0,
>> en = 3),
>> (time = 47557628935152, irq = 165, fn = dwc3_interrupt, latency = 0,
>> en = 1),
>> (time = 47557628935460, irq = 165, fn = dwc3_interrupt, latency = 0,
>> en = 3),
>> (time = 47557628936575, irq = 165, fn = dwc3_interrupt, latency = 0,
>> en = 1),
>> (time = 47557628936845, irq = 165, fn = dwc3_interrupt, latency = 0,
>> en = 3),
>> (time = 47557628937960, irq = 165, fn = dwc3_interrupt, latency = 0,
>> en = 1),
>> (time = 47557628938229, irq = 165, fn = dwc3_interrupt, latency = 0,
>> en = 3),
>> (time = 47557628939345, irq = 165, fn = dwc3_interrupt, latency = 0,
>> en = 1),
>> (time = 47557628939652, irq = 165, fn = dwc3_interrupt, latency = 0,
>> en = 3),
>> (time = 47557628940768, irq = 165, fn = dwc3_interrupt, latency = 0,
>> en = 1),
>> (time = 47557628941037, irq = 165, fn = dwc3_interrupt, latency = 0,
>> en = 3),
>> (time = 47557628942152, irq = 165, fn = dwc3_interrupt, latency = 0,
>> en = 1),
>> (time = 47557628942422, irq = 165, fn = dwc3_interrupt, latency = 0,
>> en = 3),
>> (time = 47557628943537, irq = 165, fn = dwc3_interrupt, latency = 0,
>> en = 1),
>> (time = 47557628943806, irq = 165, fn = dwc3_interrupt, latency = 0,
>> en = 3),
>> (time = 47557628944922, irq = 165, fn = dwc3_interrupt, latency = 0,
>> en = 1),
>> (time = 47557628945229, irq = 165, fn = dwc3_interrupt, latency = 0,
>> en = 3),
>> (time = 47557628946345, irq = 165, fn = dwc3_interrupt, latency = 0,
>> en = 1),
>> (time = 47557628946614, irq = 165, fn = dwc3_interrupt, latency = 0,
>> en = 3),
>> (time = 47557628947729, irq = 165, fn = dwc3_interrupt, latency = 0,
>> en = 1),
>> (time = 47557628947999, irq = 165, fn = dwc3_interrupt, latency = 0,
>> en = 3),
>>
>> Signed-off-by: JaeHun Jung <[email protected]>
>> ---
>>   drivers/usb/dwc3/gadget.c | 5 ++++-
>>   1 file changed, 4 insertions(+), 1 deletion(-)
>> diff --git a/drivers/usb/dwc3/gadget.c b/drivers/usb/dwc3/gadget.c
>> index 789976567f9f..5d2d5a9b9915 100644
>> --- a/drivers/usb/dwc3/gadget.c
>> +++ b/drivers/usb/dwc3/gadget.c
>> @@ -4355,8 +4355,11 @@ static irqreturn_t dwc3_check_event_buf(struct
>> dwc3_event_buffer *evt)
>>        * irq event handler completes before caching new event to prevent
>>        * losing events.
>>        */
>> -    if (evt->flags & DWC3_EVENT_PENDING)
>> +    if (evt->flags & DWC3_EVENT_PENDING) {
>> +        if (!evt->count)
>> +            evt->flags &= ~DWC3_EVENT_PENDING;
>>           return IRQ_HANDLED;
>> +    }
>>         count = dwc3_readl(dwc->regs, DWC3_GEVNTCOUNT(0));
>>       count &= DWC3_GEVNTCOUNT_MASK;
>
> how about below change ?
>
>         count = dwc3_readl(dwc->regs, DWC3_GEVNTCOUNT(0));
>         count &= DWC3_GEVNTCOUNT_MASK;
> -       if (!count)
> +       if (!count) {
> +               evt->flags &= ~DWC3_EVENT_PENDING;
>                 return IRQ_NONE;
> +       }
ignore my suggestion, add Thinh to comment.
>
>         evt->count = count;
>         evt->flags |= DWC3_EVENT_PENDING;
>

2023-01-05 04:11:31

by Linyu Yuan

[permalink] [raw]
Subject: Re: [PATCH] usb: dwc3: Clear DWC3_EVENT_PENDING when count is 0

On 1/2/2023 1:08 PM, JaeHun Jung wrote:
> Sometimes very rarely, The count is 0 and the DWC3 flag is set has status.
> It must not have these status. Because, It can make happen interrupt storming
> status.

could you help explain without clear the flag, how interrupt storming
happen ?

as your change didn't touch any hardware register, i don't know how it
fix storming.

storming from software layer ?

> So, It have to clean up DWC3_EVENT_PENDING flags set when count is 0.
> It means "There are no interrupts to handle.".
>
> (struct dwc3_event_buffer *) ev_buf = 0xFFFFFF883DBF1180 (
> (void *) buf = 0xFFFFFFC00DBDD000 = end+0x337D000,
> (void *) cache = 0xFFFFFF8839F54080,
> (unsigned int) length = 0x1000,
> (unsigned int) lpos = 0x0,
> (unsigned int) count = 0x0,
> (unsigned int) flags = 0x00000001,
> (dma_addr_t) dma = 0x00000008BD7D7000,
> (struct dwc3 *) dwc = 0xFFFFFF8839CBC880,
> (u64) android_kabi_reserved1 = 0x0),
>
> (time = 47557628930999, irq = 165, fn = dwc3_interrupt, latency = 0, en = 1),
> (time = 47557628931268, irq = 165, fn = dwc3_interrupt, latency = 0, en = 3),
> (time = 47557628932383, irq = 165, fn = dwc3_interrupt, latency = 0, en = 1),
> (time = 47557628932652, irq = 165, fn = dwc3_interrupt, latency = 0, en = 3),
> (time = 47557628933768, irq = 165, fn = dwc3_interrupt, latency = 0, en = 1),
> (time = 47557628934037, irq = 165, fn = dwc3_interrupt, latency = 0, en = 3),
> (time = 47557628935152, irq = 165, fn = dwc3_interrupt, latency = 0, en = 1),
> (time = 47557628935460, irq = 165, fn = dwc3_interrupt, latency = 0, en = 3),
> (time = 47557628936575, irq = 165, fn = dwc3_interrupt, latency = 0, en = 1),
> (time = 47557628936845, irq = 165, fn = dwc3_interrupt, latency = 0, en = 3),
> (time = 47557628937960, irq = 165, fn = dwc3_interrupt, latency = 0, en = 1),
> (time = 47557628938229, irq = 165, fn = dwc3_interrupt, latency = 0, en = 3),
> (time = 47557628939345, irq = 165, fn = dwc3_interrupt, latency = 0, en = 1),
> (time = 47557628939652, irq = 165, fn = dwc3_interrupt, latency = 0, en = 3),
> (time = 47557628940768, irq = 165, fn = dwc3_interrupt, latency = 0, en = 1),
> (time = 47557628941037, irq = 165, fn = dwc3_interrupt, latency = 0, en = 3),
> (time = 47557628942152, irq = 165, fn = dwc3_interrupt, latency = 0, en = 1),
> (time = 47557628942422, irq = 165, fn = dwc3_interrupt, latency = 0, en = 3),
> (time = 47557628943537, irq = 165, fn = dwc3_interrupt, latency = 0, en = 1),
> (time = 47557628943806, irq = 165, fn = dwc3_interrupt, latency = 0, en = 3),
> (time = 47557628944922, irq = 165, fn = dwc3_interrupt, latency = 0, en = 1),
> (time = 47557628945229, irq = 165, fn = dwc3_interrupt, latency = 0, en = 3),
> (time = 47557628946345, irq = 165, fn = dwc3_interrupt, latency = 0, en = 1),
> (time = 47557628946614, irq = 165, fn = dwc3_interrupt, latency = 0, en = 3),
> (time = 47557628947729, irq = 165, fn = dwc3_interrupt, latency = 0, en = 1),
> (time = 47557628947999, irq = 165, fn = dwc3_interrupt, latency = 0, en = 3),
>
> Signed-off-by: JaeHun Jung <[email protected]>
> ---
> drivers/usb/dwc3/gadget.c | 5 ++++-
> 1 file changed, 4 insertions(+), 1 deletion(-)
>
> diff --git a/drivers/usb/dwc3/gadget.c b/drivers/usb/dwc3/gadget.c
> index 789976567f9f..5d2d5a9b9915 100644
> --- a/drivers/usb/dwc3/gadget.c
> +++ b/drivers/usb/dwc3/gadget.c
> @@ -4355,8 +4355,11 @@ static irqreturn_t dwc3_check_event_buf(struct dwc3_event_buffer *evt)
> * irq event handler completes before caching new event to prevent
> * losing events.
> */
> - if (evt->flags & DWC3_EVENT_PENDING)
> + if (evt->flags & DWC3_EVENT_PENDING) {
> + if (!evt->count)
> + evt->flags &= ~DWC3_EVENT_PENDING;
> return IRQ_HANDLED;
> + }
>
> count = dwc3_readl(dwc->regs, DWC3_GEVNTCOUNT(0));
> count &= DWC3_GEVNTCOUNT_MASK;

how about below change ?

        count = dwc3_readl(dwc->regs, DWC3_GEVNTCOUNT(0));
        count &= DWC3_GEVNTCOUNT_MASK;
-       if (!count)
+       if (!count) {
+               evt->flags &= ~DWC3_EVENT_PENDING;
                return IRQ_NONE;
+       }

        evt->count = count;
        evt->flags |= DWC3_EVENT_PENDING;

2023-01-05 10:19:52

by 정재훈

[permalink] [raw]
Subject: RE: [PATCH] usb: dwc3: Clear DWC3_EVENT_PENDING when count is 0



> -----Original Message-----
> From: Linyu Yuan [mailto:[email protected]]
> Sent: Thursday, January 5, 2023 12:35 PM
> To: JaeHun Jung; Felipe Balbi; Greg Kroah-Hartman; Thinh Nguyen
> Cc: open list:USB XHCI DRIVER; open list; Seungchull Suh; Daehwan Jung
> Subject: Re: [PATCH] usb: dwc3: Clear DWC3_EVENT_PENDING when count is 0
>
>
> On 1/5/2023 11:29 AM, Linyu Yuan wrote:
> > On 1/2/2023 1:08 PM, JaeHun Jung wrote:
> >> Sometimes very rarely, The count is 0 and the DWC3 flag is set has
> >> status.
> >> It must not have these status. Because, It can make happen interrupt
> >> storming status.
> >
> > could you help explain without clear the flag, how interrupt storming
> > happen ?
> >
> > as your change didn't touch any hardware register, i don't know how it
> > fix storming.
> >
H/W interrupts are still occur on IP.
But. ISR doesn't handle it. Because of this
"if (evt->flags & DWC3_EVENT_PENDING)"

This is event values.
(struct dwc3_event_buffer *) ev_buf = 0xFFFFFF883DBF1180 = kernel_size_le_lo32+0xFFFFFF883B391180 -> (
(void *) buf = 0xFFFFFFC00DBDD000 = end+0x337D000 -> ,
(void *) cache = 0xFFFFFF8839F54080 = kernel_size_le_lo32+0xFFFFFF88376F4080 -> ,
(unsigned int) length = 0x1000,
(unsigned int) lpos = 0x0,
(unsigned int) count = 0x0,
(unsigned int) flags = 0x00100001,
(dma_addr_t) dma = 0x00000008BD7D7000,

*((struct dwc3_event_type *) 0xFFFFFF8839F54080) = (
is_devspec = 1,
type = 0,
reserved8_31 = 774)

*((struct dwc3_event_devt *) 0xFFFFFF8839F54080) = (
one_bit = 1,
device_event = 0,
type = 6, << DWC3_DEVICE_EVENT_SUSPEND
reserved15_12 = 0,
event_info = 3,
reserved31_25 = 0)

Occurred interrupts are "DWC3_DEVICE_EVENT_SUSPEND".
If when "count has 0" and DWC3_EVENT_PENDING are set at the same time than
ISR and bottom thread couldn't handle next interrupt.
We guessing connected with "dwc3_gadget_process_pending_events()" function.
But, We could not find reproduce way.


> > storming from software layer ?
> >
> >> So, It have to clean up DWC3_EVENT_PENDING flags set when count is 0.
> >> It means "There are no interrupts to handle.".
> >>
> >> (struct dwc3_event_buffer *) ev_buf = 0xFFFFFF883DBF1180 (
> >> (void *) buf = 0xFFFFFFC00DBDD000 = end+0x337D000,
> >> (void *) cache = 0xFFFFFF8839F54080,
> >> (unsigned int) length = 0x1000,
> >> (unsigned int) lpos = 0x0,
> >> (unsigned int) count = 0x0,
> >> (unsigned int) flags = 0x00000001,
> >> (dma_addr_t) dma = 0x00000008BD7D7000,
> >> (struct dwc3 *) dwc = 0xFFFFFF8839CBC880,
> >> (u64) android_kabi_reserved1 = 0x0),
> >>
> >> (time = 47557628930999, irq = 165, fn = dwc3_interrupt, latency = 0,
> >> en = 1), (time = 47557628931268, irq = 165, fn = dwc3_interrupt,
> >> latency = 0, en = 3), (time = 47557628932383, irq = 165, fn =
> >> dwc3_interrupt, latency = 0, en = 1), (time = 47557628932652, irq =
> >> 165, fn = dwc3_interrupt, latency = 0, en = 3), (time =
> >> 47557628933768, irq = 165, fn = dwc3_interrupt, latency = 0, en = 1),
> >> (time = 47557628934037, irq = 165, fn = dwc3_interrupt, latency = 0,
> >> en = 3), (time = 47557628935152, irq = 165, fn = dwc3_interrupt,
> >> latency = 0, en = 1), (time = 47557628935460, irq = 165, fn =
> >> dwc3_interrupt, latency = 0, en = 3), (time = 47557628936575, irq =
> >> 165, fn = dwc3_interrupt, latency = 0, en = 1), (time =
> >> 47557628936845, irq = 165, fn = dwc3_interrupt, latency = 0, en = 3),
> >> (time = 47557628937960, irq = 165, fn = dwc3_interrupt, latency = 0,
> >> en = 1), (time = 47557628938229, irq = 165, fn = dwc3_interrupt,
> >> latency = 0, en = 3), (time = 47557628939345, irq = 165, fn =
> >> dwc3_interrupt, latency = 0, en = 1), (time = 47557628939652, irq =
> >> 165, fn = dwc3_interrupt, latency = 0, en = 3), (time =
> >> 47557628940768, irq = 165, fn = dwc3_interrupt, latency = 0, en = 1),
> >> (time = 47557628941037, irq = 165, fn = dwc3_interrupt, latency = 0,
> >> en = 3), (time = 47557628942152, irq = 165, fn = dwc3_interrupt,
> >> latency = 0, en = 1), (time = 47557628942422, irq = 165, fn =
> >> dwc3_interrupt, latency = 0, en = 3), (time = 47557628943537, irq =
> >> 165, fn = dwc3_interrupt, latency = 0, en = 1), (time =
> >> 47557628943806, irq = 165, fn = dwc3_interrupt, latency = 0, en = 3),
> >> (time = 47557628944922, irq = 165, fn = dwc3_interrupt, latency = 0,
> >> en = 1), (time = 47557628945229, irq = 165, fn = dwc3_interrupt,
> >> latency = 0, en = 3), (time = 47557628946345, irq = 165, fn =
> >> dwc3_interrupt, latency = 0, en = 1), (time = 47557628946614, irq =
> >> 165, fn = dwc3_interrupt, latency = 0, en = 3), (time =
> >> 47557628947729, irq = 165, fn = dwc3_interrupt, latency = 0, en = 1),
> >> (time = 47557628947999, irq = 165, fn = dwc3_interrupt, latency = 0,
> >> en = 3),
> >>
> >> Signed-off-by: JaeHun Jung <[email protected]>
> >> ---
> >> drivers/usb/dwc3/gadget.c | 5 ++++-
> >> 1 file changed, 4 insertions(+), 1 deletion(-) diff --git
> >> a/drivers/usb/dwc3/gadget.c b/drivers/usb/dwc3/gadget.c index
> >> 789976567f9f..5d2d5a9b9915 100644
> >> --- a/drivers/usb/dwc3/gadget.c
> >> +++ b/drivers/usb/dwc3/gadget.c
> >> @@ -4355,8 +4355,11 @@ static irqreturn_t dwc3_check_event_buf(struct
> >> dwc3_event_buffer *evt)
> >> * irq event handler completes before caching new event to
> >> prevent
> >> * losing events.
> >> */
> >> - if (evt->flags & DWC3_EVENT_PENDING)
> >> + if (evt->flags & DWC3_EVENT_PENDING) {
> >> + if (!evt->count)
> >> + evt->flags &= ~DWC3_EVENT_PENDING;
> >> return IRQ_HANDLED;
> >> + }
> >> count = dwc3_readl(dwc->regs, DWC3_GEVNTCOUNT(0));
> >> count &= DWC3_GEVNTCOUNT_MASK;
> >
> > how about below change ?
> >
> > count = dwc3_readl(dwc->regs, DWC3_GEVNTCOUNT(0));
> > count &= DWC3_GEVNTCOUNT_MASK;
> > - if (!count)
> > + if (!count) {
> > + evt->flags &= ~DWC3_EVENT_PENDING;
> > return IRQ_NONE;
> > + }
> ignore my suggestion, add Thinh to comment.
> >
> > evt->count = count;
> > evt->flags |= DWC3_EVENT_PENDING;
> >

2023-01-06 03:43:04

by Linyu Yuan

[permalink] [raw]
Subject: Re: [PATCH] usb: dwc3: Clear DWC3_EVENT_PENDING when count is 0

On 1/5/2023 5:54 PM, 정재훈 wrote:
>
>> -----Original Message-----
>> From: Linyu Yuan [mailto:[email protected]]
>> Sent: Thursday, January 5, 2023 12:35 PM
>> To: JaeHun Jung; Felipe Balbi; Greg Kroah-Hartman; Thinh Nguyen
>> Cc: open list:USB XHCI DRIVER; open list; Seungchull Suh; Daehwan Jung
>> Subject: Re: [PATCH] usb: dwc3: Clear DWC3_EVENT_PENDING when count is 0
>>
>>
>> On 1/5/2023 11:29 AM, Linyu Yuan wrote:
>>> On 1/2/2023 1:08 PM, JaeHun Jung wrote:
>>>> Sometimes very rarely, The count is 0 and the DWC3 flag is set has
>>>> status.
>>>> It must not have these status. Because, It can make happen interrupt
>>>> storming status.
>>> could you help explain without clear the flag, how interrupt storming
>>> happen ?
>>>
>>> as your change didn't touch any hardware register, i don't know how it
>>> fix storming.
>>>
> H/W interrupts are still occur on IP.

I guess we should fix it from IP layer.


but when checking DWC3_EVENT_PENDING flag, it will auto clear in dwc3
thread irq handler.

there is one possible root cause is it cleared only after irq enabled in
dwc3_process_event_buf(),

we should move unmask irq operation at end of this function.


> But. ISR doesn't handle it. Because of this
> "if (evt->flags & DWC3_EVENT_PENDING)"
>
> This is event values.
> (struct dwc3_event_buffer *) ev_buf = 0xFFFFFF883DBF1180 = kernel_size_le_lo32+0xFFFFFF883B391180 -> (
> (void *) buf = 0xFFFFFFC00DBDD000 = end+0x337D000 -> ,
> (void *) cache = 0xFFFFFF8839F54080 = kernel_size_le_lo32+0xFFFFFF88376F4080 -> ,
> (unsigned int) length = 0x1000,
> (unsigned int) lpos = 0x0,
> (unsigned int) count = 0x0,
> (unsigned int) flags = 0x00100001,
> (dma_addr_t) dma = 0x00000008BD7D7000,
>
> *((struct dwc3_event_type *) 0xFFFFFF8839F54080) = (
> is_devspec = 1,
> type = 0,
> reserved8_31 = 774)
>
> *((struct dwc3_event_devt *) 0xFFFFFF8839F54080) = (
> one_bit = 1,
> device_event = 0,
> type = 6, << DWC3_DEVICE_EVENT_SUSPEND
> reserved15_12 = 0,
> event_info = 3,
> reserved31_25 = 0)
>
> Occurred interrupts are "DWC3_DEVICE_EVENT_SUSPEND".
> If when "count has 0" and DWC3_EVENT_PENDING are set at the same time than
> ISR and bottom thread couldn't handle next interrupt.
> We guessing connected with "dwc3_gadget_process_pending_events()" function.
> But, We could not find reproduce way.
>
>
>>> storming from software layer ?
>>>
>>>> So, It have to clean up DWC3_EVENT_PENDING flags set when count is 0.
>>>> It means "There are no interrupts to handle.".
>>>>
>>>> (struct dwc3_event_buffer *) ev_buf = 0xFFFFFF883DBF1180 (
>>>> (void *) buf = 0xFFFFFFC00DBDD000 = end+0x337D000,
>>>> (void *) cache = 0xFFFFFF8839F54080,
>>>> (unsigned int) length = 0x1000,
>>>> (unsigned int) lpos = 0x0,
>>>> (unsigned int) count = 0x0,
>>>> (unsigned int) flags = 0x00000001,
>>>> (dma_addr_t) dma = 0x00000008BD7D7000,
>>>> (struct dwc3 *) dwc = 0xFFFFFF8839CBC880,
>>>> (u64) android_kabi_reserved1 = 0x0),
>>>>
>>>> (time = 47557628930999, irq = 165, fn = dwc3_interrupt, latency = 0,
>>>> en = 1), (time = 47557628931268, irq = 165, fn = dwc3_interrupt,
>>>> latency = 0, en = 3), (time = 47557628932383, irq = 165, fn =
>>>> dwc3_interrupt, latency = 0, en = 1), (time = 47557628932652, irq =
>>>> 165, fn = dwc3_interrupt, latency = 0, en = 3), (time =
>>>> 47557628933768, irq = 165, fn = dwc3_interrupt, latency = 0, en = 1),
>>>> (time = 47557628934037, irq = 165, fn = dwc3_interrupt, latency = 0,
>>>> en = 3), (time = 47557628935152, irq = 165, fn = dwc3_interrupt,
>>>> latency = 0, en = 1), (time = 47557628935460, irq = 165, fn =
>>>> dwc3_interrupt, latency = 0, en = 3), (time = 47557628936575, irq =
>>>> 165, fn = dwc3_interrupt, latency = 0, en = 1), (time =
>>>> 47557628936845, irq = 165, fn = dwc3_interrupt, latency = 0, en = 3),
>>>> (time = 47557628937960, irq = 165, fn = dwc3_interrupt, latency = 0,
>>>> en = 1), (time = 47557628938229, irq = 165, fn = dwc3_interrupt,
>>>> latency = 0, en = 3), (time = 47557628939345, irq = 165, fn =
>>>> dwc3_interrupt, latency = 0, en = 1), (time = 47557628939652, irq =
>>>> 165, fn = dwc3_interrupt, latency = 0, en = 3), (time =
>>>> 47557628940768, irq = 165, fn = dwc3_interrupt, latency = 0, en = 1),
>>>> (time = 47557628941037, irq = 165, fn = dwc3_interrupt, latency = 0,
>>>> en = 3), (time = 47557628942152, irq = 165, fn = dwc3_interrupt,
>>>> latency = 0, en = 1), (time = 47557628942422, irq = 165, fn =
>>>> dwc3_interrupt, latency = 0, en = 3), (time = 47557628943537, irq =
>>>> 165, fn = dwc3_interrupt, latency = 0, en = 1), (time =
>>>> 47557628943806, irq = 165, fn = dwc3_interrupt, latency = 0, en = 3),
>>>> (time = 47557628944922, irq = 165, fn = dwc3_interrupt, latency = 0,
>>>> en = 1), (time = 47557628945229, irq = 165, fn = dwc3_interrupt,
>>>> latency = 0, en = 3), (time = 47557628946345, irq = 165, fn =
>>>> dwc3_interrupt, latency = 0, en = 1), (time = 47557628946614, irq =
>>>> 165, fn = dwc3_interrupt, latency = 0, en = 3), (time =
>>>> 47557628947729, irq = 165, fn = dwc3_interrupt, latency = 0, en = 1),
>>>> (time = 47557628947999, irq = 165, fn = dwc3_interrupt, latency = 0,
>>>> en = 3),
>>>>
>>>> Signed-off-by: JaeHun Jung <[email protected]>
>>>> ---
>>>> drivers/usb/dwc3/gadget.c | 5 ++++-
>>>> 1 file changed, 4 insertions(+), 1 deletion(-) diff --git
>>>> a/drivers/usb/dwc3/gadget.c b/drivers/usb/dwc3/gadget.c index
>>>> 789976567f9f..5d2d5a9b9915 100644
>>>> --- a/drivers/usb/dwc3/gadget.c
>>>> +++ b/drivers/usb/dwc3/gadget.c
>>>> @@ -4355,8 +4355,11 @@ static irqreturn_t dwc3_check_event_buf(struct
>>>> dwc3_event_buffer *evt)
>>>> * irq event handler completes before caching new event to
>>>> prevent
>>>> * losing events.
>>>> */
>>>> - if (evt->flags & DWC3_EVENT_PENDING)
>>>> + if (evt->flags & DWC3_EVENT_PENDING) {
>>>> + if (!evt->count)
>>>> + evt->flags &= ~DWC3_EVENT_PENDING;
>>>> return IRQ_HANDLED;
>>>> + }
>>>> count = dwc3_readl(dwc->regs, DWC3_GEVNTCOUNT(0));
>>>> count &= DWC3_GEVNTCOUNT_MASK;
>>> how about below change ?
>>>
>>> count = dwc3_readl(dwc->regs, DWC3_GEVNTCOUNT(0));
>>> count &= DWC3_GEVNTCOUNT_MASK;
>>> - if (!count)
>>> + if (!count) {
>>> + evt->flags &= ~DWC3_EVENT_PENDING;
>>> return IRQ_NONE;
>>> + }
>> ignore my suggestion, add Thinh to comment.
>>> evt->count = count;
>>> evt->flags |= DWC3_EVENT_PENDING;
>>>

2023-01-09 18:38:07

by Thinh Nguyen

[permalink] [raw]
Subject: Re: [PATCH] usb: dwc3: Clear DWC3_EVENT_PENDING when count is 0

On Fri, Jan 06, 2023, Linyu Yuan wrote:
> On 1/5/2023 5:54 PM, 정재훈 wrote:
> >
> > > -----Original Message-----
> > > From: Linyu Yuan [mailto:[email protected]]
> > > Sent: Thursday, January 5, 2023 12:35 PM
> > > To: JaeHun Jung; Felipe Balbi; Greg Kroah-Hartman; Thinh Nguyen
> > > Cc: open list:USB XHCI DRIVER; open list; Seungchull Suh; Daehwan Jung
> > > Subject: Re: [PATCH] usb: dwc3: Clear DWC3_EVENT_PENDING when count is 0
> > >
> > >
> > > On 1/5/2023 11:29 AM, Linyu Yuan wrote:
> > > > On 1/2/2023 1:08 PM, JaeHun Jung wrote:
> > > > > Sometimes very rarely, The count is 0 and the DWC3 flag is set has
> > > > > status.
> > > > > It must not have these status. Because, It can make happen interrupt
> > > > > storming status.
> > > > could you help explain without clear the flag, how interrupt storming
> > > > happen ?
> > > >
> > > > as your change didn't touch any hardware register, i don't know how it
> > > > fix storming.
> > > >
> > H/W interrupts are still occur on IP.
>
> I guess we should fix it from IP layer.
>

How are you certain the problem is from IP layer?

>
> but when checking DWC3_EVENT_PENDING flag, it will auto clear in dwc3 thread
> irq handler.
>
> there is one possible root cause is it cleared only after irq enabled in
> dwc3_process_event_buf(),
>
> we should move unmask irq operation at end of this function.
>

This interrupt storm can happen because we clear the evt->flags _after_
we unmask the interrupt. This was done to prevent false interrupt from
delay interrupt deassertion, which can be a problem for legacy pci
interrupt.

see 7441b273388b ("usb: dwc3: gadget: Fix event pending check")

The change JaeHun Jung did should be fine.

BR,
Thinh

2023-01-09 19:22:40

by Thinh Nguyen

[permalink] [raw]
Subject: Re: [PATCH] usb: dwc3: Clear DWC3_EVENT_PENDING when count is 0

Hi,

On Mon, Jan 02, 2023, JaeHun Jung wrote:
> Sometimes very rarely, The count is 0 and the DWC3 flag is set has status.
> It must not have these status. Because, It can make happen interrupt storming
> status.
> So, It have to clean up DWC3_EVENT_PENDING flags set when count is 0.
> It means "There are no interrupts to handle.".

Can you reword to include the explanation from my response [*] and add a
fixes tag to 7441b273388b ("usb: dwc3: gadget: Fix event pending check")

Thanks,
Thinh

[*] https://lore.kernel.org/linux-usb/[email protected]/T/#md32deea7e6b3c6d309aadba3d1cd2800d173685e


>
> (struct dwc3_event_buffer *) ev_buf = 0xFFFFFF883DBF1180 (
> (void *) buf = 0xFFFFFFC00DBDD000 = end+0x337D000,
> (void *) cache = 0xFFFFFF8839F54080,
> (unsigned int) length = 0x1000,
> (unsigned int) lpos = 0x0,
> (unsigned int) count = 0x0,
> (unsigned int) flags = 0x00000001,
> (dma_addr_t) dma = 0x00000008BD7D7000,
> (struct dwc3 *) dwc = 0xFFFFFF8839CBC880,
> (u64) android_kabi_reserved1 = 0x0),
>
> (time = 47557628930999, irq = 165, fn = dwc3_interrupt, latency = 0, en = 1),
> (time = 47557628931268, irq = 165, fn = dwc3_interrupt, latency = 0, en = 3),
> (time = 47557628932383, irq = 165, fn = dwc3_interrupt, latency = 0, en = 1),
> (time = 47557628932652, irq = 165, fn = dwc3_interrupt, latency = 0, en = 3),
> (time = 47557628933768, irq = 165, fn = dwc3_interrupt, latency = 0, en = 1),
> (time = 47557628934037, irq = 165, fn = dwc3_interrupt, latency = 0, en = 3),
> (time = 47557628935152, irq = 165, fn = dwc3_interrupt, latency = 0, en = 1),
> (time = 47557628935460, irq = 165, fn = dwc3_interrupt, latency = 0, en = 3),
> (time = 47557628936575, irq = 165, fn = dwc3_interrupt, latency = 0, en = 1),
> (time = 47557628936845, irq = 165, fn = dwc3_interrupt, latency = 0, en = 3),
> (time = 47557628937960, irq = 165, fn = dwc3_interrupt, latency = 0, en = 1),
> (time = 47557628938229, irq = 165, fn = dwc3_interrupt, latency = 0, en = 3),
> (time = 47557628939345, irq = 165, fn = dwc3_interrupt, latency = 0, en = 1),
> (time = 47557628939652, irq = 165, fn = dwc3_interrupt, latency = 0, en = 3),
> (time = 47557628940768, irq = 165, fn = dwc3_interrupt, latency = 0, en = 1),
> (time = 47557628941037, irq = 165, fn = dwc3_interrupt, latency = 0, en = 3),
> (time = 47557628942152, irq = 165, fn = dwc3_interrupt, latency = 0, en = 1),
> (time = 47557628942422, irq = 165, fn = dwc3_interrupt, latency = 0, en = 3),
> (time = 47557628943537, irq = 165, fn = dwc3_interrupt, latency = 0, en = 1),
> (time = 47557628943806, irq = 165, fn = dwc3_interrupt, latency = 0, en = 3),
> (time = 47557628944922, irq = 165, fn = dwc3_interrupt, latency = 0, en = 1),
> (time = 47557628945229, irq = 165, fn = dwc3_interrupt, latency = 0, en = 3),
> (time = 47557628946345, irq = 165, fn = dwc3_interrupt, latency = 0, en = 1),
> (time = 47557628946614, irq = 165, fn = dwc3_interrupt, latency = 0, en = 3),
> (time = 47557628947729, irq = 165, fn = dwc3_interrupt, latency = 0, en = 1),
> (time = 47557628947999, irq = 165, fn = dwc3_interrupt, latency = 0, en = 3),
>
> Signed-off-by: JaeHun Jung <[email protected]>
> ---
> drivers/usb/dwc3/gadget.c | 5 ++++-
> 1 file changed, 4 insertions(+), 1 deletion(-)
>
> diff --git a/drivers/usb/dwc3/gadget.c b/drivers/usb/dwc3/gadget.c
> index 789976567f9f..5d2d5a9b9915 100644
> --- a/drivers/usb/dwc3/gadget.c
> +++ b/drivers/usb/dwc3/gadget.c
> @@ -4355,8 +4355,11 @@ static irqreturn_t dwc3_check_event_buf(struct dwc3_event_buffer *evt)
> * irq event handler completes before caching new event to prevent
> * losing events.
> */
> - if (evt->flags & DWC3_EVENT_PENDING)
> + if (evt->flags & DWC3_EVENT_PENDING) {
> + if (!evt->count)
> + evt->flags &= ~DWC3_EVENT_PENDING;
> return IRQ_HANDLED;
> + }
>
> count = dwc3_readl(dwc->regs, DWC3_GEVNTCOUNT(0));
> count &= DWC3_GEVNTCOUNT_MASK;
> --
> 2.31.1
>

2023-01-09 19:27:49

by Thinh Nguyen

[permalink] [raw]
Subject: Re: [PATCH] usb: dwc3: Clear DWC3_EVENT_PENDING when count is 0

On Mon, Jan 09, 2023, Thinh Nguyen wrote:
> Hi,
>
> On Mon, Jan 02, 2023, JaeHun Jung wrote:
> > Sometimes very rarely, The count is 0 and the DWC3 flag is set has status.
> > It must not have these status. Because, It can make happen interrupt storming
> > status.
> > So, It have to clean up DWC3_EVENT_PENDING flags set when count is 0.
> > It means "There are no interrupts to handle.".
>
> Can you reword to include the explanation from my response [*] and add a
> fixes tag to 7441b273388b ("usb: dwc3: gadget: Fix event pending check")
>

Actually, the old problem that the commit 7441b273388b fixes may
resurface because of this change.

Can you try this instead?

diff --git a/drivers/usb/dwc3/gadget.c b/drivers/usb/dwc3/gadget.c
index 65500246323b..3c36dfdb88f0 100644
--- a/drivers/usb/dwc3/gadget.c
+++ b/drivers/usb/dwc3/gadget.c
@@ -5515,8 +5515,15 @@ static irqreturn_t dwc3_check_event_buf(struct dwc3_event_buffer *evt)
* irq event handler completes before caching new event to prevent
* losing events.
*/
- if (evt->flags & DWC3_EVENT_PENDING)
+ if (evt->flags & DWC3_EVENT_PENDING) {
+ if (!evt->count) {
+ u32 reg = dwc3_readl(dwc->regs, DWC3_GEVNTSIZ(0));
+
+ if (!(reg & DWC3_GEVNTSIZ_INTMASK))
+ evt->flags &= ~DWC3_EVENT_PENDING;
+ }
return IRQ_HANDLED;
+ }


This is a quick solution at the moment. Let me know if you have have a
better option. Note that reading a register may take some time, albeit
short. That's why we may want to keep the evt->count check to help with
performance.

Thanks,
Thinh


>
>
> >
> > (struct dwc3_event_buffer *) ev_buf = 0xFFFFFF883DBF1180 (
> > (void *) buf = 0xFFFFFFC00DBDD000 = end+0x337D000,
> > (void *) cache = 0xFFFFFF8839F54080,
> > (unsigned int) length = 0x1000,
> > (unsigned int) lpos = 0x0,
> > (unsigned int) count = 0x0,
> > (unsigned int) flags = 0x00000001,
> > (dma_addr_t) dma = 0x00000008BD7D7000,
> > (struct dwc3 *) dwc = 0xFFFFFF8839CBC880,
> > (u64) android_kabi_reserved1 = 0x0),
> >
> > (time = 47557628930999, irq = 165, fn = dwc3_interrupt, latency = 0, en = 1),
> > (time = 47557628931268, irq = 165, fn = dwc3_interrupt, latency = 0, en = 3),
> > (time = 47557628932383, irq = 165, fn = dwc3_interrupt, latency = 0, en = 1),
> > (time = 47557628932652, irq = 165, fn = dwc3_interrupt, latency = 0, en = 3),
> > (time = 47557628933768, irq = 165, fn = dwc3_interrupt, latency = 0, en = 1),
> > (time = 47557628934037, irq = 165, fn = dwc3_interrupt, latency = 0, en = 3),
> > (time = 47557628935152, irq = 165, fn = dwc3_interrupt, latency = 0, en = 1),
> > (time = 47557628935460, irq = 165, fn = dwc3_interrupt, latency = 0, en = 3),
> > (time = 47557628936575, irq = 165, fn = dwc3_interrupt, latency = 0, en = 1),
> > (time = 47557628936845, irq = 165, fn = dwc3_interrupt, latency = 0, en = 3),
> > (time = 47557628937960, irq = 165, fn = dwc3_interrupt, latency = 0, en = 1),
> > (time = 47557628938229, irq = 165, fn = dwc3_interrupt, latency = 0, en = 3),
> > (time = 47557628939345, irq = 165, fn = dwc3_interrupt, latency = 0, en = 1),
> > (time = 47557628939652, irq = 165, fn = dwc3_interrupt, latency = 0, en = 3),
> > (time = 47557628940768, irq = 165, fn = dwc3_interrupt, latency = 0, en = 1),
> > (time = 47557628941037, irq = 165, fn = dwc3_interrupt, latency = 0, en = 3),
> > (time = 47557628942152, irq = 165, fn = dwc3_interrupt, latency = 0, en = 1),
> > (time = 47557628942422, irq = 165, fn = dwc3_interrupt, latency = 0, en = 3),
> > (time = 47557628943537, irq = 165, fn = dwc3_interrupt, latency = 0, en = 1),
> > (time = 47557628943806, irq = 165, fn = dwc3_interrupt, latency = 0, en = 3),
> > (time = 47557628944922, irq = 165, fn = dwc3_interrupt, latency = 0, en = 1),
> > (time = 47557628945229, irq = 165, fn = dwc3_interrupt, latency = 0, en = 3),
> > (time = 47557628946345, irq = 165, fn = dwc3_interrupt, latency = 0, en = 1),
> > (time = 47557628946614, irq = 165, fn = dwc3_interrupt, latency = 0, en = 3),
> > (time = 47557628947729, irq = 165, fn = dwc3_interrupt, latency = 0, en = 1),
> > (time = 47557628947999, irq = 165, fn = dwc3_interrupt, latency = 0, en = 3),
> >
> > Signed-off-by: JaeHun Jung <[email protected]>
> > ---
> > drivers/usb/dwc3/gadget.c | 5 ++++-
> > 1 file changed, 4 insertions(+), 1 deletion(-)
> >
> > diff --git a/drivers/usb/dwc3/gadget.c b/drivers/usb/dwc3/gadget.c
> > index 789976567f9f..5d2d5a9b9915 100644
> > --- a/drivers/usb/dwc3/gadget.c
> > +++ b/drivers/usb/dwc3/gadget.c
> > @@ -4355,8 +4355,11 @@ static irqreturn_t dwc3_check_event_buf(struct dwc3_event_buffer *evt)
> > * irq event handler completes before caching new event to prevent
> > * losing events.
> > */
> > - if (evt->flags & DWC3_EVENT_PENDING)
> > + if (evt->flags & DWC3_EVENT_PENDING) {
> > + if (!evt->count)
> > + evt->flags &= ~DWC3_EVENT_PENDING;
> > return IRQ_HANDLED;
> > + }
> >
> > count = dwc3_readl(dwc->regs, DWC3_GEVNTCOUNT(0));
> > count &= DWC3_GEVNTCOUNT_MASK;
> > --
> > 2.31.1
> >

2023-01-10 02:09:24

by Linyu Yuan

[permalink] [raw]
Subject: Re: [PATCH] usb: dwc3: Clear DWC3_EVENT_PENDING when count is 0


On 1/10/2023 2:28 AM, Thinh Nguyen wrote:
> On Fri, Jan 06, 2023, Linyu Yuan wrote:
>> On 1/5/2023 5:54 PM, 정재훈 wrote:
>>>> -----Original Message-----
>>>> From: Linyu Yuan [mailto:[email protected]]
>>>> Sent: Thursday, January 5, 2023 12:35 PM
>>>> To: JaeHun Jung; Felipe Balbi; Greg Kroah-Hartman; Thinh Nguyen
>>>> Cc: open list:USB XHCI DRIVER; open list; Seungchull Suh; Daehwan Jung
>>>> Subject: Re: [PATCH] usb: dwc3: Clear DWC3_EVENT_PENDING when count is 0
>>>>
>>>>
>>>> On 1/5/2023 11:29 AM, Linyu Yuan wrote:
>>>>> On 1/2/2023 1:08 PM, JaeHun Jung wrote:
>>>>>> Sometimes very rarely, The count is 0 and the DWC3 flag is set has
>>>>>> status.
>>>>>> It must not have these status. Because, It can make happen interrupt
>>>>>> storming status.
>>>>> could you help explain without clear the flag, how interrupt storming
>>>>> happen ?
>>>>>
>>>>> as your change didn't touch any hardware register, i don't know how it
>>>>> fix storming.
>>>>>
>>> H/W interrupts are still occur on IP.
>> I guess we should fix it from IP layer.
>>
> How are you certain the problem is from IP layer?

I think all IRQ is from DWC3 controller IP. if it is not IP layer, could
you share how to prevent from SW layer ?

seem IRQ can happen when event count is zero ,  why this can happen ?
does it mean event count register is not trust ?

>
>> but when checking DWC3_EVENT_PENDING flag, it will auto clear in dwc3 thread
>> irq handler.
>>
>> there is one possible root cause is it cleared only after irq enabled in
>> dwc3_process_event_buf(),
>>
>> we should move unmask irq operation at end of this function.
>>
> This interrupt storm can happen because we clear the evt->flags _after_
> we unmask the interrupt. This was done to prevent false interrupt from
> delay interrupt deassertion, which can be a problem for legacy pci
> interrupt.
thanks for explain, i didn't know that.
>
> see 7441b273388b ("usb: dwc3: gadget: Fix event pending check")
>
> The change JaeHun Jung did should be fine.
agree.
> BR,
> Thinh

2023-01-10 03:11:16

by Thinh Nguyen

[permalink] [raw]
Subject: Re: [PATCH] usb: dwc3: Clear DWC3_EVENT_PENDING when count is 0

On Tue, Jan 10, 2023, Linyu Yuan wrote:
>
> On 1/10/2023 2:28 AM, Thinh Nguyen wrote:
> > On Fri, Jan 06, 2023, Linyu Yuan wrote:
> > > On 1/5/2023 5:54 PM, 정재훈 wrote:
> > > > > -----Original Message-----
> > > > > From: Linyu Yuan [mailto:[email protected]]
> > > > > Sent: Thursday, January 5, 2023 12:35 PM
> > > > > To: JaeHun Jung; Felipe Balbi; Greg Kroah-Hartman; Thinh Nguyen
> > > > > Cc: open list:USB XHCI DRIVER; open list; Seungchull Suh; Daehwan Jung
> > > > > Subject: Re: [PATCH] usb: dwc3: Clear DWC3_EVENT_PENDING when count is 0
> > > > >
> > > > >
> > > > > On 1/5/2023 11:29 AM, Linyu Yuan wrote:
> > > > > > On 1/2/2023 1:08 PM, JaeHun Jung wrote:
> > > > > > > Sometimes very rarely, The count is 0 and the DWC3 flag is set has
> > > > > > > status.
> > > > > > > It must not have these status. Because, It can make happen interrupt
> > > > > > > storming status.
> > > > > > could you help explain without clear the flag, how interrupt storming
> > > > > > happen ?
> > > > > >
> > > > > > as your change didn't touch any hardware register, i don't know how it
> > > > > > fix storming.
> > > > > >
> > > > H/W interrupts are still occur on IP.
> > > I guess we should fix it from IP layer.
> > >
> > How are you certain the problem is from IP layer?
>
> I think all IRQ is from DWC3 controller IP. if it is not IP layer, could you
> share how to prevent from SW layer ?
>
> seem IRQ can happen when event count is zero ,  why this can happen ? does
> it mean event count register is not trust ?

When the interrupt is unmasked, the controller will generate interrupts
as events are received. Normally, the flag checking for pending event
should be cleared before unmasking the interrupt, but we clear it after
to account for possible false interrupt due to the nature of legacy pci
interrupt. This exposes a gap where the interrupts can come but the flag
isn't cleared. While it should be rare and shouldn't be too much of a
problem, we can avoid this scenario with some additional checks.

>
> >
> > > but when checking DWC3_EVENT_PENDING flag, it will auto clear in dwc3 thread
> > > irq handler.
> > >
> > > there is one possible root cause is it cleared only after irq enabled in
> > > dwc3_process_event_buf(),
> > >
> > > we should move unmask irq operation at end of this function.
> > >
> > This interrupt storm can happen because we clear the evt->flags _after_
> > we unmask the interrupt. This was done to prevent false interrupt from
> > delay interrupt deassertion, which can be a problem for legacy pci
> > interrupt.
> thanks for explain, i didn't know that.
> >
> > see 7441b273388b ("usb: dwc3: gadget: Fix event pending check")
> >
> > The change JaeHun Jung did should be fine.
> agree.

The change may still need some additional check as suggested in my
response:
https://lore.kernel.org/linux-usb/[email protected]/T/#m7b907aa6da4023cb20fa00a57813d31fd84e081f

BR,
Thinh

2023-01-10 03:48:55

by Linyu Yuan

[permalink] [raw]
Subject: Re: [PATCH] usb: dwc3: Clear DWC3_EVENT_PENDING when count is 0


On 1/10/2023 10:53 AM, Thinh Nguyen wrote:
> On Tue, Jan 10, 2023, Linyu Yuan wrote:
>> On 1/10/2023 2:28 AM, Thinh Nguyen wrote:
>>> On Fri, Jan 06, 2023, Linyu Yuan wrote:
>>>> On 1/5/2023 5:54 PM, 정재훈 wrote:
>>>>>> -----Original Message-----
>>>>>> From: Linyu Yuan [mailto:[email protected]]
>>>>>> Sent: Thursday, January 5, 2023 12:35 PM
>>>>>> To: JaeHun Jung; Felipe Balbi; Greg Kroah-Hartman; Thinh Nguyen
>>>>>> Cc: open list:USB XHCI DRIVER; open list; Seungchull Suh; Daehwan Jung
>>>>>> Subject: Re: [PATCH] usb: dwc3: Clear DWC3_EVENT_PENDING when count is 0
>>>>>>
>>>>>>
>>>>>> On 1/5/2023 11:29 AM, Linyu Yuan wrote:
>>>>>>> On 1/2/2023 1:08 PM, JaeHun Jung wrote:
>>>>>>>> Sometimes very rarely, The count is 0 and the DWC3 flag is set has
>>>>>>>> status.
>>>>>>>> It must not have these status. Because, It can make happen interrupt
>>>>>>>> storming status.
>>>>>>> could you help explain without clear the flag, how interrupt storming
>>>>>>> happen ?
>>>>>>>
>>>>>>> as your change didn't touch any hardware register, i don't know how it
>>>>>>> fix storming.
>>>>>>>
>>>>> H/W interrupts are still occur on IP.
>>>> I guess we should fix it from IP layer.
>>>>
>>> How are you certain the problem is from IP layer?
>> I think all IRQ is from DWC3 controller IP. if it is not IP layer, could you
>> share how to prevent from SW layer ?
>>
>> seem IRQ can happen when event count is zero ,  why this can happen ? does
>> it mean event count register is not trust ?
> When the interrupt is unmasked, the controller will generate interrupts
> as events are received. Normally, the flag checking for pending event
> should be cleared before unmasking the interrupt, but we clear it after
> to account for possible false interrupt due to the nature of legacy pci
> interrupt. This exposes a gap where the interrupts can come but the flag
> isn't cleared. While it should be rare and shouldn't be too much of a
> problem, we can avoid this scenario with some additional checks.
>
>>>> but when checking DWC3_EVENT_PENDING flag, it will auto clear in dwc3 thread
>>>> irq handler.
>>>>
>>>> there is one possible root cause is it cleared only after irq enabled in
>>>> dwc3_process_event_buf(),
>>>>
>>>> we should move unmask irq operation at end of this function.
>>>>
>>> This interrupt storm can happen because we clear the evt->flags _after_
>>> we unmask the interrupt. This was done to prevent false interrupt from
>>> delay interrupt deassertion, which can be a problem for legacy pci
>>> interrupt.
>> thanks for explain, i didn't know that.
>>> see 7441b273388b ("usb: dwc3: gadget: Fix event pending check")
>>>
>>> The change JaeHun Jung did should be fine.
>> agree.
> The change may still need some additional check as suggested in my
> response:
> https://lore.kernel.org/linux-usb/[email protected]/T/#m7b907aa6da4023cb20fa00a57813d31fd84e081f
 do you think we need to read event count before checking
DWC3_EVENT_PENDING  ?
>
> BR,
> Thinh

2023-01-10 03:52:03

by Linyu Yuan

[permalink] [raw]
Subject: Re: [PATCH] usb: dwc3: Clear DWC3_EVENT_PENDING when count is 0


On 1/10/2023 11:05 AM, Linyu Yuan wrote:
>
> On 1/10/2023 10:53 AM, Thinh Nguyen wrote:
>> On Tue, Jan 10, 2023, Linyu Yuan wrote:
>>> On 1/10/2023 2:28 AM, Thinh Nguyen wrote:
>>>> On Fri, Jan 06, 2023, Linyu Yuan wrote:
>>>>> On 1/5/2023 5:54 PM, 정재훈 wrote:
>>>>>>> -----Original Message-----
>>>>>>> From: Linyu Yuan [mailto:[email protected]]
>>>>>>> Sent: Thursday, January 5, 2023 12:35 PM
>>>>>>> To: JaeHun Jung; Felipe Balbi; Greg Kroah-Hartman; Thinh Nguyen
>>>>>>> Cc: open list:USB XHCI DRIVER; open list; Seungchull Suh;
>>>>>>> Daehwan Jung
>>>>>>> Subject: Re: [PATCH] usb: dwc3: Clear DWC3_EVENT_PENDING when
>>>>>>> count is 0
>>>>>>>
>>>>>>>
>>>>>>> On 1/5/2023 11:29 AM, Linyu Yuan wrote:
>>>>>>>> On 1/2/2023 1:08 PM, JaeHun Jung wrote:
>>>>>>>>> Sometimes very rarely, The count is 0 and the DWC3 flag is set
>>>>>>>>> has
>>>>>>>>> status.
>>>>>>>>> It must not have these status. Because, It can make happen
>>>>>>>>> interrupt
>>>>>>>>> storming status.
>>>>>>>> could you help explain without clear the flag, how interrupt
>>>>>>>> storming
>>>>>>>> happen ?
>>>>>>>>
>>>>>>>> as your change didn't touch any hardware register, i don't know
>>>>>>>> how it
>>>>>>>> fix storming.
>>>>>>>>
>>>>>> H/W interrupts are still occur on IP.
>>>>> I guess we should fix it from IP layer.
>>>>>
>>>> How are you certain the problem is from IP layer?
>>> I think all IRQ is from DWC3 controller IP. if it is not IP layer,
>>> could you
>>> share how to prevent from SW layer ?
>>>
>>> seem IRQ can happen when event count is zero ,  why this can happen
>>> ? does
>>> it mean event count register is not trust ?
>> When the interrupt is unmasked, the controller will generate interrupts
>> as events are received. Normally, the flag checking for pending event
>> should be cleared before unmasking the interrupt, but we clear it after
>> to account for possible false interrupt due to the nature of legacy pci
>> interrupt. This exposes a gap where the interrupts can come but the flag
>> isn't cleared. While it should be rare and shouldn't be too much of a
>> problem, we can avoid this scenario with some additional checks.
>>
>>>>> but when checking DWC3_EVENT_PENDING flag, it will auto clear in
>>>>> dwc3 thread
>>>>> irq handler.
>>>>>
>>>>> there is one possible root cause is it cleared only after irq
>>>>> enabled in
>>>>> dwc3_process_event_buf(),
>>>>>
>>>>> we should move unmask irq operation at end of this function.
>>>>>
>>>> This interrupt storm can happen because we clear the evt->flags
>>>> _after_
>>>> we unmask the interrupt. This was done to prevent false interrupt from
>>>> delay interrupt deassertion, which can be a problem for legacy pci
>>>> interrupt.
>>> thanks for explain, i didn't know that.
>>>> see 7441b273388b ("usb: dwc3: gadget: Fix event pending check")
>>>>
>>>> The change JaeHun Jung did should be fine.
>>> agree.
>> The change may still need some additional check as suggested in my
>> response:
>> https://lore.kernel.org/linux-usb/[email protected]/T/#m7b907aa6da4023cb20fa00a57813d31fd84e081f
>>
>  do you think we need to read event count before checking
> DWC3_EVENT_PENDING  ?
sorry for this noise, may be i have a little understanding of the legcy
pci issue now.
>>
>> BR,
>> Thinh

2023-01-10 07:54:42

by Linyu Yuan

[permalink] [raw]
Subject: Re: [PATCH] usb: dwc3: Clear DWC3_EVENT_PENDING when count is 0


On 1/10/2023 11:13 AM, Linyu Yuan wrote:
>
> On 1/10/2023 11:05 AM, Linyu Yuan wrote:
>>
>> On 1/10/2023 10:53 AM, Thinh Nguyen wrote:
>>> On Tue, Jan 10, 2023, Linyu Yuan wrote:
>>>> On 1/10/2023 2:28 AM, Thinh Nguyen wrote:
>>>>> On Fri, Jan 06, 2023, Linyu Yuan wrote:
>>>>>> On 1/5/2023 5:54 PM, 정재훈 wrote:
>>>>>>>> -----Original Message-----
>>>>>>>> From: Linyu Yuan [mailto:[email protected]]
>>>>>>>> Sent: Thursday, January 5, 2023 12:35 PM
>>>>>>>> To: JaeHun Jung; Felipe Balbi; Greg Kroah-Hartman; Thinh Nguyen
>>>>>>>> Cc: open list:USB XHCI DRIVER; open list; Seungchull Suh;
>>>>>>>> Daehwan Jung
>>>>>>>> Subject: Re: [PATCH] usb: dwc3: Clear DWC3_EVENT_PENDING when
>>>>>>>> count is 0
>>>>>>>>
>>>>>>>>
>>>>>>>> On 1/5/2023 11:29 AM, Linyu Yuan wrote:
>>>>>>>>> On 1/2/2023 1:08 PM, JaeHun Jung wrote:
>>>>>>>>>> Sometimes very rarely, The count is 0 and the DWC3 flag is
>>>>>>>>>> set has
>>>>>>>>>> status.
>>>>>>>>>> It must not have these status. Because, It can make happen
>>>>>>>>>> interrupt
>>>>>>>>>> storming status.
>>>>>>>>> could you help explain without clear the flag, how interrupt
>>>>>>>>> storming
>>>>>>>>> happen ?
>>>>>>>>>
>>>>>>>>> as your change didn't touch any hardware register, i don't
>>>>>>>>> know how it
>>>>>>>>> fix storming.
>>>>>>>>>
>>>>>>> H/W interrupts are still occur on IP.
>>>>>> I guess we should fix it from IP layer.
>>>>>>
>>>>> How are you certain the problem is from IP layer?
>>>> I think all IRQ is from DWC3 controller IP. if it is not IP layer,
>>>> could you
>>>> share how to prevent from SW layer ?
>>>>
>>>> seem IRQ can happen when event count is zero ,  why this can happen
>>>> ? does
>>>> it mean event count register is not trust ?
>>> When the interrupt is unmasked, the controller will generate interrupts
>>> as events are received. Normally, the flag checking for pending event
>>> should be cleared before unmasking the interrupt, but we clear it after
>>> to account for possible false interrupt due to the nature of legacy pci
>>> interrupt. This exposes a gap where the interrupts can come but the
>>> flag
>>> isn't cleared. While it should be rare and shouldn't be too much of a
>>> problem, we can avoid this scenario with some additional checks.
>>>
>>>>>> but when checking DWC3_EVENT_PENDING flag, it will auto clear in
>>>>>> dwc3 thread
>>>>>> irq handler.
>>>>>>
>>>>>> there is one possible root cause is it cleared only after irq
>>>>>> enabled in
>>>>>> dwc3_process_event_buf(),
>>>>>>
>>>>>> we should move unmask irq operation at end of this function.
>>>>>>
>>>>> This interrupt storm can happen because we clear the evt->flags
>>>>> _after_
>>>>> we unmask the interrupt. This was done to prevent false interrupt
>>>>> from
>>>>> delay interrupt deassertion, which can be a problem for legacy pci
>>>>> interrupt.
>>>> thanks for explain, i didn't know that.
>>>>> see 7441b273388b ("usb: dwc3: gadget: Fix event pending check")
>>>>>
>>>>> The change JaeHun Jung did should be fine.
>>>> agree.
>>> The change may still need some additional check as suggested in my
>>> response:
>>> https://lore.kernel.org/linux-usb/[email protected]/T/#m7b907aa6da4023cb20fa00a57813d31fd84e081f
>>>
>>  do you think we need to read event count before checking
>> DWC3_EVENT_PENDING  ?
> sorry for this noise, may be i have a little understanding of the
> legcy pci issue now.
one more question, is it legacy PCIe device still exist in real world ?
and any VID/PID info ?
>>>
>>> BR,
>>> Thinh

2023-01-11 00:48:13

by Thinh Nguyen

[permalink] [raw]
Subject: Re: [PATCH] usb: dwc3: Clear DWC3_EVENT_PENDING when count is 0

On Tue, Jan 10, 2023, Linyu Yuan wrote:
>
> On 1/10/2023 11:13 AM, Linyu Yuan wrote:
> >
> > On 1/10/2023 11:05 AM, Linyu Yuan wrote:
> > >
> > > On 1/10/2023 10:53 AM, Thinh Nguyen wrote:
> > > > On Tue, Jan 10, 2023, Linyu Yuan wrote:
> > > > > On 1/10/2023 2:28 AM, Thinh Nguyen wrote:
> > > > > > On Fri, Jan 06, 2023, Linyu Yuan wrote:
> > > > > > > On 1/5/2023 5:54 PM, 정재훈 wrote:
> > > > > > > > > -----Original Message-----
> > > > > > > > > From: Linyu Yuan [mailto:[email protected]]
> > > > > > > > > Sent: Thursday, January 5, 2023 12:35 PM
> > > > > > > > > To: JaeHun Jung; Felipe Balbi; Greg Kroah-Hartman; Thinh Nguyen
> > > > > > > > > Cc: open list:USB XHCI DRIVER; open list;
> > > > > > > > > Seungchull Suh; Daehwan Jung
> > > > > > > > > Subject: Re: [PATCH] usb: dwc3: Clear
> > > > > > > > > DWC3_EVENT_PENDING when count is 0
> > > > > > > > >
> > > > > > > > >
> > > > > > > > > On 1/5/2023 11:29 AM, Linyu Yuan wrote:
> > > > > > > > > > On 1/2/2023 1:08 PM, JaeHun Jung wrote:
> > > > > > > > > > > Sometimes very rarely, The count is
> > > > > > > > > > > 0 and the DWC3 flag is set has
> > > > > > > > > > > status.
> > > > > > > > > > > It must not have these status.
> > > > > > > > > > > Because, It can make happen
> > > > > > > > > > > interrupt
> > > > > > > > > > > storming status.
> > > > > > > > > > could you help explain without clear the
> > > > > > > > > > flag, how interrupt storming
> > > > > > > > > > happen ?
> > > > > > > > > >
> > > > > > > > > > as your change didn't touch any hardware
> > > > > > > > > > register, i don't know how it
> > > > > > > > > > fix storming.
> > > > > > > > > >
> > > > > > > > H/W interrupts are still occur on IP.
> > > > > > > I guess we should fix it from IP layer.
> > > > > > >
> > > > > > How are you certain the problem is from IP layer?
> > > > > I think all IRQ is from DWC3 controller IP. if it is not IP
> > > > > layer, could you
> > > > > share how to prevent from SW layer ?
> > > > >
> > > > > seem IRQ can happen when event count is zero ,  why this can
> > > > > happen ? does
> > > > > it mean event count register is not trust ?
> > > > When the interrupt is unmasked, the controller will generate interrupts
> > > > as events are received. Normally, the flag checking for pending event
> > > > should be cleared before unmasking the interrupt, but we clear it after
> > > > to account for possible false interrupt due to the nature of legacy pci
> > > > interrupt. This exposes a gap where the interrupts can come but
> > > > the flag
> > > > isn't cleared. While it should be rare and shouldn't be too much of a
> > > > problem, we can avoid this scenario with some additional checks.
> > > >
> > > > > > > but when checking DWC3_EVENT_PENDING flag, it will
> > > > > > > auto clear in dwc3 thread
> > > > > > > irq handler.
> > > > > > >
> > > > > > > there is one possible root cause is it cleared only
> > > > > > > after irq enabled in
> > > > > > > dwc3_process_event_buf(),
> > > > > > >
> > > > > > > we should move unmask irq operation at end of this function.
> > > > > > >
> > > > > > This interrupt storm can happen because we clear the
> > > > > > evt->flags _after_
> > > > > > we unmask the interrupt. This was done to prevent false
> > > > > > interrupt from
> > > > > > delay interrupt deassertion, which can be a problem for legacy pci
> > > > > > interrupt.
> > > > > thanks for explain, i didn't know that.
> > > > > > see 7441b273388b ("usb: dwc3: gadget: Fix event pending check")
> > > > > >
> > > > > > The change JaeHun Jung did should be fine.
> > > > > agree.
> > > > The change may still need some additional check as suggested in my
> > > > response:
> > > > https://urldefense.com/v3/__https://lore.kernel.org/linux-usb/[email protected]/T/*m7b907aa6da4023cb20fa00a57813d31fd84e081f__;Iw!!A4F2R9G_pg!dchc0UAeZnm7PcA-aav_58rsw7agMygwPSGC_kN8Ga_BS3oTbLMNh3DkfQ9B2njbva-tJzTj7Cb2dJnE5RbEW6KJnDmtFA$
> > > >
> > >  do you think we need to read event count before checking
> > > DWC3_EVENT_PENDING  ?
> > sorry for this noise, may be i have a little understanding of the legcy
> > pci issue now.
> one more question, is it legacy PCIe device still exist in real world ? and
> any VID/PID info ?

Currently, all dwc3 PCIe devices are affected. Some setups are more
noticeable than others. The dwc3 driver is implemented to probe platform
devices. So, dwc3 PCIe devices are wrapped as platform devices for the
dwc3 driver. Since we're going through the platform device code path,
the pci layer falls back to using legacy interrupt instead of MSI (last
I check awhile ago).

A little more detail on this problem:
PCIe legacy interrupt will emulate interrupt line by sending an
interrupt assert and deassert messages. After the interrupt assert
message is sent, interrupts are continuously generated until the
deassert message is sent. If there's a register write to unmask/mask
interrupt or clearing events falls in between these messages, then there
may be a race.

Let's say we don't have event pending check, this can happen:

Normal scenario
---------------
event_count += n # controller generates new events
interrupt asserts
write(mask irq)
event_count -= n # dwc3 clears events
interrupt deasserts
write(unmask irq)


Race scenario
-------------
event_count += n # new events
interrupt asserts
write(mask irq)
event_count -= n # clear events
event_count += n # more events come and hard irq handler gets called
# again as interrupt is generated, but cached
# events haven't been handled. This breaks
# serialization and causes lost events.
write(mask irq)

event_count -= n # clear events
interrupt deasserts
write(unmask irq) # events handled


For MSI, this won't be a problem because it's edge-triggered and the way
it sends interrupt is different.

BR,
Thinh

2023-01-11 02:26:17

by Linyu Yuan

[permalink] [raw]
Subject: Re: [PATCH] usb: dwc3: Clear DWC3_EVENT_PENDING when count is 0


On 1/11/2023 8:00 AM, Thinh Nguyen wrote:
>
>> one more question, is it legacy PCIe device still exist in real world ? and
>> any VID/PID info ?
> Currently, all dwc3 PCIe devices are affected. Some setups are more


if non PCIe device have no such issue, can we do some improvement for it ?

like new flag or static key/jump label to improve interrupt handler ?


> noticeable than others. The dwc3 driver is implemented to probe platform
> devices. So, dwc3 PCIe devices are wrapped as platform devices for the
> dwc3 driver. Since we're going through the platform device code path,
> the pci layer falls back to using legacy interrupt instead of MSI (last
> I check awhile ago).
>
> A little more detail on this problem:
> PCIe legacy interrupt will emulate interrupt line by sending an
> interrupt assert and deassert messages. After the interrupt assert
> message is sent, interrupts are continuously generated until the
> deassert message is sent. If there's a register write to unmask/mask
> interrupt or clearing events falls in between these messages, then there
> may be a race.
>
> Let's say we don't have event pending check, this can happen:
>
> Normal scenario
> ---------------
> event_count += n # controller generates new events
> interrupt asserts
> write(mask irq)
> event_count -= n # dwc3 clears events
> interrupt deasserts
> write(unmask irq)
>
>
> Race scenario
> -------------
> event_count += n # new events
> interrupt asserts
> write(mask irq)
> event_count -= n # clear events
> event_count += n # more events come and hard irq handler gets called
> # again as interrupt is generated, but cached
> # events haven't been handled. This breaks
> # serialization and causes lost events.
> write(mask irq)
>
> event_count -= n # clear events
> interrupt deasserts
> write(unmask irq) # events handled


if mask irq is not working, the race will happen like this, thanks for
explanation.


>
> For MSI, this won't be a problem because it's edge-triggered and the way
> it sends interrupt is different.
>
> BR,
> Thinh

2023-01-11 02:49:19

by Thinh Nguyen

[permalink] [raw]
Subject: Re: [PATCH] usb: dwc3: Clear DWC3_EVENT_PENDING when count is 0

On Wed, Jan 11, 2023, Linyu Yuan wrote:
>
> On 1/11/2023 8:00 AM, Thinh Nguyen wrote:
> >
> > > one more question, is it legacy PCIe device still exist in real world ? and
> > > any VID/PID info ?
> > Currently, all dwc3 PCIe devices are affected. Some setups are more
>
>
> if non PCIe device have no such issue, can we do some improvement for it ?
>
> like new flag or static key/jump label to improve interrupt handler ?
>

There are different ways to handle this. At the time, it was (and is?)
the simplest solution. There isn't any observable performance
degradation from testing so there's no incentive to change it yet.

BR,
Thinh

>
> > noticeable than others. The dwc3 driver is implemented to probe platform
> > devices. So, dwc3 PCIe devices are wrapped as platform devices for the
> > dwc3 driver. Since we're going through the platform device code path,
> > the pci layer falls back to using legacy interrupt instead of MSI (last
> > I check awhile ago).
> >
> > A little more detail on this problem:
> > PCIe legacy interrupt will emulate interrupt line by sending an
> > interrupt assert and deassert messages. After the interrupt assert
> > message is sent, interrupts are continuously generated until the
> > deassert message is sent. If there's a register write to unmask/mask
> > interrupt or clearing events falls in between these messages, then there
> > may be a race.
> >
> > Let's say we don't have event pending check, this can happen:
> >
> > Normal scenario
> > ---------------
> > event_count += n # controller generates new events
> > interrupt asserts
> > write(mask irq)
> > event_count -= n # dwc3 clears events
> > interrupt deasserts
> > write(unmask irq)
> >
> >
> > Race scenario
> > -------------
> > event_count += n # new events
> > interrupt asserts
> > write(mask irq)
> > event_count -= n # clear events
> > event_count += n # more events come and hard irq handler gets called
> > # again as interrupt is generated, but cached
> > # events haven't been handled. This breaks
> > # serialization and causes lost events.
> > write(mask irq)
> >
> > event_count -= n # clear events
> > interrupt deasserts
> > write(unmask irq) # events handled
>
>
> if mask irq is not working, the race will happen like this, thanks for
> explanation.
>
>
> >
> > For MSI, this won't be a problem because it's edge-triggered and the way
> > it sends interrupt is different.
> >

2023-01-31 06:39:11

by Linyu Yuan

[permalink] [raw]
Subject: Re: [PATCH] usb: dwc3: Clear DWC3_EVENT_PENDING when count is 0

hi Thinh,


regarding your suggestion, assume it is not PCIe type,  still have one
question,


-       if (evt->flags & DWC3_EVENT_PENDING)
+       if (evt->flags & DWC3_EVENT_PENDING) {
+               if (!evt->count) {
+                       u32 reg = dwc3_readl(dwc->regs, DWC3_GEVNTSIZ(0));
+
+                       if (!(reg & DWC3_GEVNTSIZ_INTMASK))
+                               evt->flags &= ~DWC3_EVENT_PENDING;

do we need to return IRQ_WAKE_THREAD  ?

+               }
                return IRQ_HANDLED;

as here return IRQ HANDLED, how can we make sure a new IRQ will be
handled after previous IRQ thread clean PENDING flag ?

+       }


also for non-PCIe controller, consider IRQ mask register working correctly,

consider a case IRQ happen before IRQ thread exit,  here just return
IRQ_HANDLED.

once IRQ thread exit, it will clean PENDING flag, so next IRQ event will
run normally.

if 정재훈 saw PENDING flag is not cleared, does it mean IRQ thread have no
chance to exit ?

2023-02-01 18:58:40

by Thinh Nguyen

[permalink] [raw]
Subject: Re: [PATCH] usb: dwc3: Clear DWC3_EVENT_PENDING when count is 0

On Tue, Jan 31, 2023, Linyu Yuan wrote:
> hi Thinh,
>
>
> regarding your suggestion, assume it is not PCIe type,  still have one
> question,
>
>
> -       if (evt->flags & DWC3_EVENT_PENDING)
> +       if (evt->flags & DWC3_EVENT_PENDING) {
> +               if (!evt->count) {
> +                       u32 reg = dwc3_readl(dwc->regs, DWC3_GEVNTSIZ(0));
> +
> +                       if (!(reg & DWC3_GEVNTSIZ_INTMASK))
> +                               evt->flags &= ~DWC3_EVENT_PENDING;
>
> do we need to return IRQ_WAKE_THREAD  ?

No, if evt->count is 0, but GEVNTCOUNT is > 0, the controller will
generate interrupt. The evt->count will be updated and the events will
be handled on the next interrupt.

>
> +               }
>                 return IRQ_HANDLED;
>
> as here return IRQ HANDLED, how can we make sure a new IRQ will be handled
> after previous IRQ thread clean PENDING flag ?

If evt->count > 0, that means the bottom half is still running. So,
leave it be. If evt->count == 0, then the cached events are processed,
we're safe to clear the PENDING flag. New interrupt will be generated if
GEVNTCOUNT is > 0.

>
> +       }
>
>
> also for non-PCIe controller, consider IRQ mask register working correctly,
>
> consider a case IRQ happen before IRQ thread exit,  here just return
> IRQ_HANDLED.
>
> once IRQ thread exit, it will clean PENDING flag, so next IRQ event will run
> normally.
>
> if 정재훈 saw PENDING flag is not cleared, does it mean IRQ thread have no
> chance to exit ?

The PENDING flag should be cleared eventually when the bottom half
completes. I don't expect the interrupt storm to block the IRQ thread
forever, but I can't guarantee the device behavor. 정재훈 can confirm.
This change should resolve the interrupt storm.

BR,
Thinh

2023-02-02 05:00:51

by Linyu Yuan

[permalink] [raw]
Subject: Re: [PATCH] usb: dwc3: Clear DWC3_EVENT_PENDING when count is 0


On 2/2/2023 2:57 AM, Thinh Nguyen wrote:
> On Tue, Jan 31, 2023, Linyu Yuan wrote:
>> hi Thinh,
>>
>>
>> regarding your suggestion, assume it is not PCIe type,  still have one
>> question,
>>
>>
>> -       if (evt->flags & DWC3_EVENT_PENDING)
>> +       if (evt->flags & DWC3_EVENT_PENDING) {
>> +               if (!evt->count) {
>> +                       u32 reg = dwc3_readl(dwc->regs, DWC3_GEVNTSIZ(0));
>> +
>> +                       if (!(reg & DWC3_GEVNTSIZ_INTMASK))
>> +                               evt->flags &= ~DWC3_EVENT_PENDING;
>>
>> do we need to return IRQ_WAKE_THREAD  ?
> No, if evt->count is 0, but GEVNTCOUNT is > 0, the controller will
> generate interrupt. The evt->count will be updated and the events will
> be handled on the next interrupt.


when will next interrupt happen ?

as when enter here, i guess GEVENTCOUNT is already > 0, but we didn't
read it.


>
>> +               }
>>                 return IRQ_HANDLED;
>>
>> as here return IRQ HANDLED, how can we make sure a new IRQ will be handled
>> after previous IRQ thread clean PENDING flag ?
> If evt->count > 0, that means the bottom half is still running. So,
> leave it be. If evt->count == 0, then the cached events are processed,
> we're safe to clear the PENDING flag. New interrupt will be generated if
> GEVNTCOUNT is > 0.
>
>> +       }
>>
>>
>> also for non-PCIe controller, consider IRQ mask register working correctly,
>>
>> consider a case IRQ happen before IRQ thread exit,  here just return
>> IRQ_HANDLED.
>>
>> once IRQ thread exit, it will clean PENDING flag, so next IRQ event will run
>> normally.
>>
>> if 정재훈 saw PENDING flag is not cleared, does it mean IRQ thread have no
>> chance to exit ?
> The PENDING flag should be cleared eventually when the bottom half
> completes. I don't expect the interrupt storm to block the IRQ thread
> forever, but I can't guarantee the device behavor. 정재훈 can confirm.
> This change should resolve the interrupt storm.
>
> BR,
> Thinh

2023-02-02 20:07:29

by Thinh Nguyen

[permalink] [raw]
Subject: Re: [PATCH] usb: dwc3: Clear DWC3_EVENT_PENDING when count is 0

On Thu, Feb 02, 2023, Linyu Yuan wrote:
>
> On 2/2/2023 2:57 AM, Thinh Nguyen wrote:
> > On Tue, Jan 31, 2023, Linyu Yuan wrote:
> > > hi Thinh,
> > >
> > >
> > > regarding your suggestion, assume it is not PCIe type,  still have one
> > > question,
> > >
> > >
> > > -       if (evt->flags & DWC3_EVENT_PENDING)
> > > +       if (evt->flags & DWC3_EVENT_PENDING) {
> > > +               if (!evt->count) {
> > > +                       u32 reg = dwc3_readl(dwc->regs, DWC3_GEVNTSIZ(0));
> > > +
> > > +                       if (!(reg & DWC3_GEVNTSIZ_INTMASK))
> > > +                               evt->flags &= ~DWC3_EVENT_PENDING;
> > >
> > > do we need to return IRQ_WAKE_THREAD  ?
> > No, if evt->count is 0, but GEVNTCOUNT is > 0, the controller will
> > generate interrupt. The evt->count will be updated and the events will
> > be handled on the next interrupt.
>
>
> when will next interrupt happen ?

Immediately after. You can test this by just return IRQ_HANDLED and not
clear the GEVNTCOUNT to see its behavior.

>
> as when enter here, i guess GEVENTCOUNT is already > 0, but we didn't read
> it.

GEVNTCOUNT is always updating as new events are generated. We only clear
however many events we process, but that doesn't stop it from
incrementing.

BR,
Thinh

>
>
> >
> > > +               }
> > >                 return IRQ_HANDLED;
> > >
> > > as here return IRQ HANDLED, how can we make sure a new IRQ will be handled
> > > after previous IRQ thread clean PENDING flag ?
> > If evt->count > 0, that means the bottom half is still running. So,
> > leave it be. If evt->count == 0, then the cached events are processed,
> > we're safe to clear the PENDING flag. New interrupt will be generated if
> > GEVNTCOUNT is > 0.
> >
> > > +       }
> > >
> > >
> > > also for non-PCIe controller, consider IRQ mask register working correctly,
> > >
> > > consider a case IRQ happen before IRQ thread exit,  here just return
> > > IRQ_HANDLED.
> > >
> > > once IRQ thread exit, it will clean PENDING flag, so next IRQ event will run
> > > normally.
> > >
> > > if 정재훈 saw PENDING flag is not cleared, does it mean IRQ thread have no
> > > chance to exit ?
> > The PENDING flag should be cleared eventually when the bottom half
> > completes. I don't expect the interrupt storm to block the IRQ thread
> > forever, but I can't guarantee the device behavor. 정재훈 can confirm.
> > This change should resolve the interrupt storm.
> >
> > BR,
> > Thinh

2023-02-03 02:19:14

by Linyu Yuan

[permalink] [raw]
Subject: Re: [PATCH] usb: dwc3: Clear DWC3_EVENT_PENDING when count is 0


On 2/3/2023 4:06 AM, Thinh Nguyen wrote:
> On Thu, Feb 02, 2023, Linyu Yuan wrote:
>> On 2/2/2023 2:57 AM, Thinh Nguyen wrote:
>>> On Tue, Jan 31, 2023, Linyu Yuan wrote:
>>>> hi Thinh,
>>>>
>>>>
>>>> regarding your suggestion, assume it is not PCIe type,  still have one
>>>> question,
>>>>
>>>>
>>>> -       if (evt->flags & DWC3_EVENT_PENDING)
>>>> +       if (evt->flags & DWC3_EVENT_PENDING) {
>>>> +               if (!evt->count) {
>>>> +                       u32 reg = dwc3_readl(dwc->regs, DWC3_GEVNTSIZ(0));
>>>> +
>>>> +                       if (!(reg & DWC3_GEVNTSIZ_INTMASK))
>>>> +                               evt->flags &= ~DWC3_EVENT_PENDING;
>>>>
>>>> do we need to return IRQ_WAKE_THREAD  ?
>>> No, if evt->count is 0, but GEVNTCOUNT is > 0, the controller will
>>> generate interrupt. The evt->count will be updated and the events will
>>> be handled on the next interrupt.
>>
>> when will next interrupt happen ?
> Immediately after. You can test this by just return IRQ_HANDLED and not
> clear the GEVNTCOUNT to see its behavior.


if it immediately, it will be good.


정재훈 could you update a new patch which Thinh suggest.

maybe we didn't find the root cause of irq strom, but the change have no side effect.


>
>> as when enter here, i guess GEVENTCOUNT is already > 0, but we didn't read
>> it.
> GEVNTCOUNT is always updating as new events are generated. We only clear
> however many events we process, but that doesn't stop it from
> incrementing.


just consider if there is a case that next GEVNETCOUNT increase which
happen long time later,

maybe think too much.


>
> BR,
> Thinh
>
>>
>>>> +               }
>>>>                 return IRQ_HANDLED;
>>>>
>>>> as here return IRQ HANDLED, how can we make sure a new IRQ will be handled
>>>> after previous IRQ thread clean PENDING flag ?
>>> If evt->count > 0, that means the bottom half is still running. So,
>>> leave it be. If evt->count == 0, then the cached events are processed,
>>> we're safe to clear the PENDING flag. New interrupt will be generated if
>>> GEVNTCOUNT is > 0.
>>>
>>>> +       }
>>>>
>>>>
>>>> also for non-PCIe controller, consider IRQ mask register working correctly,
>>>>
>>>> consider a case IRQ happen before IRQ thread exit,  here just return
>>>> IRQ_HANDLED.
>>>>
>>>> once IRQ thread exit, it will clean PENDING flag, so next IRQ event will run
>>>> normally.
>>>>
>>>> if 정재훈 saw PENDING flag is not cleared, does it mean IRQ thread have no
>>>> chance to exit ?
>>> The PENDING flag should be cleared eventually when the bottom half
>>> completes. I don't expect the interrupt storm to block the IRQ thread
>>> forever, but I can't guarantee the device behavor. 정재훈 can confirm.
>>> This change should resolve the interrupt storm.
>>>
>>> BR,
>>> Thinh