2023-06-18 12:42:41

by Krishna Kurapati PSSNV

[permalink] [raw]
Subject: [PATCH v3] usb: dwc3: gadget: Propagate core init errors to UDC during pullup

In scenarios where pullup relies on resume (get sync) to initialize
the controller and set the run stop bit, then core_init is followed by
gadget_resume which will eventually set run stop bit.

But in cases where the core_init fails, the return value is not sent
back to udc appropriately. So according to UDC the controller has
started but in reality we never set the run stop bit.

On systems like Android, there are uevents sent to HAL depending on
whether the configfs_bind / configfs_disconnect were invoked. In the
above mentioned scnenario, if the core init fails, the run stop won't
be set and the cable plug-out won't result in generation of any
disconnect event and userspace would never get any uevent regarding
cable plug out and we never call pullup(0) again. Furthermore none of
the next Plug-In/Plug-Out's would be known to configfs.

Return back the appropriate result to UDC to let the userspace/
configfs know that the pullup failed so they can take appropriate
action.

Fixes: 77adb8bdf422 ("usb: dwc3: gadget: Allow runtime suspend if UDC unbinded")
Signed-off-by: Krishna Kurapati <[email protected]>
Acked-by: Thinh Nguyen <[email protected]>
---
Changes in v3: Added changelog mising in v2
Changes in v2: Added Fixes tag

drivers/usb/dwc3/gadget.c | 4 +++-
1 file changed, 3 insertions(+), 1 deletion(-)

diff --git a/drivers/usb/dwc3/gadget.c b/drivers/usb/dwc3/gadget.c
index 578804dc29ca..27cb671e18e3 100644
--- a/drivers/usb/dwc3/gadget.c
+++ b/drivers/usb/dwc3/gadget.c
@@ -2747,7 +2747,9 @@ static int dwc3_gadget_pullup(struct usb_gadget *g, int is_on)
ret = pm_runtime_get_sync(dwc->dev);
if (!ret || ret < 0) {
pm_runtime_put(dwc->dev);
- return 0;
+ if (ret < 0)
+ pm_runtime_set_suspended(dwc->dev);
+ return ret;
}

if (dwc->pullups_connected == is_on) {
--
2.40.0



2023-06-19 07:59:27

by Johan Hovold

[permalink] [raw]
Subject: Re: [PATCH v3] usb: dwc3: gadget: Propagate core init errors to UDC during pullup

On Sun, Jun 18, 2023 at 05:39:49PM +0530, Krishna Kurapati wrote:
> In scenarios where pullup relies on resume (get sync) to initialize
> the controller and set the run stop bit, then core_init is followed by
> gadget_resume which will eventually set run stop bit.
>
> But in cases where the core_init fails, the return value is not sent
> back to udc appropriately. So according to UDC the controller has
> started but in reality we never set the run stop bit.
>
> On systems like Android, there are uevents sent to HAL depending on
> whether the configfs_bind / configfs_disconnect were invoked. In the
> above mentioned scnenario, if the core init fails, the run stop won't
> be set and the cable plug-out won't result in generation of any
> disconnect event and userspace would never get any uevent regarding
> cable plug out and we never call pullup(0) again. Furthermore none of
> the next Plug-In/Plug-Out's would be known to configfs.
>
> Return back the appropriate result to UDC to let the userspace/
> configfs know that the pullup failed so they can take appropriate
> action.
>
> Fixes: 77adb8bdf422 ("usb: dwc3: gadget: Allow runtime suspend if UDC unbinded")
> Signed-off-by: Krishna Kurapati <[email protected]>
> Acked-by: Thinh Nguyen <[email protected]>
> ---
> Changes in v3: Added changelog mising in v2
> Changes in v2: Added Fixes tag
>
> drivers/usb/dwc3/gadget.c | 4 +++-
> 1 file changed, 3 insertions(+), 1 deletion(-)
>
> diff --git a/drivers/usb/dwc3/gadget.c b/drivers/usb/dwc3/gadget.c
> index 578804dc29ca..27cb671e18e3 100644
> --- a/drivers/usb/dwc3/gadget.c
> +++ b/drivers/usb/dwc3/gadget.c
> @@ -2747,7 +2747,9 @@ static int dwc3_gadget_pullup(struct usb_gadget *g, int is_on)
> ret = pm_runtime_get_sync(dwc->dev);
> if (!ret || ret < 0) {
> pm_runtime_put(dwc->dev);
> - return 0;
> + if (ret < 0)
> + pm_runtime_set_suspended(dwc->dev);

This bit is broken and is also not mentioned or explained in the commit
message. What are you trying to achieve here?

You cannot set the state like this after runtime PM is enabled and the
above call will always fail.

> + return ret;
> }
>
> if (dwc->pullups_connected == is_on) {

Johan

2023-06-19 13:06:00

by Krishna Kurapati PSSNV

[permalink] [raw]
Subject: Re: [PATCH v3] usb: dwc3: gadget: Propagate core init errors to UDC during pullup



On 6/19/2023 12:36 PM, Johan Hovold wrote:
> On Sun, Jun 18, 2023 at 05:39:49PM +0530, Krishna Kurapati wrote:
>> In scenarios where pullup relies on resume (get sync) to initialize
>> the controller and set the run stop bit, then core_init is followed by
>> gadget_resume which will eventually set run stop bit.
>>
>> But in cases where the core_init fails, the return value is not sent
>> back to udc appropriately. So according to UDC the controller has
>> started but in reality we never set the run stop bit.
>>
>> On systems like Android, there are uevents sent to HAL depending on
>> whether the configfs_bind / configfs_disconnect were invoked. In the
>> above mentioned scnenario, if the core init fails, the run stop won't
>> be set and the cable plug-out won't result in generation of any
>> disconnect event and userspace would never get any uevent regarding
>> cable plug out and we never call pullup(0) again. Furthermore none of
>> the next Plug-In/Plug-Out's would be known to configfs.
>>
>> Return back the appropriate result to UDC to let the userspace/
>> configfs know that the pullup failed so they can take appropriate
>> action.
>>
>> Fixes: 77adb8bdf422 ("usb: dwc3: gadget: Allow runtime suspend if UDC unbinded")
>> Signed-off-by: Krishna Kurapati <[email protected]>
>> Acked-by: Thinh Nguyen <[email protected]>
>> ---
>> Changes in v3: Added changelog mising in v2
>> Changes in v2: Added Fixes tag
>>
>> drivers/usb/dwc3/gadget.c | 4 +++-
>> 1 file changed, 3 insertions(+), 1 deletion(-)
>>
>> diff --git a/drivers/usb/dwc3/gadget.c b/drivers/usb/dwc3/gadget.c
>> index 578804dc29ca..27cb671e18e3 100644
>> --- a/drivers/usb/dwc3/gadget.c
>> +++ b/drivers/usb/dwc3/gadget.c
>> @@ -2747,7 +2747,9 @@ static int dwc3_gadget_pullup(struct usb_gadget *g, int is_on)
>> ret = pm_runtime_get_sync(dwc->dev);
>> if (!ret || ret < 0) {
>> pm_runtime_put(dwc->dev);
>> - return 0;
>> + if (ret < 0)
>> + pm_runtime_set_suspended(dwc->dev);
>
> This bit is broken and is also not mentioned or explained in the commit
> message. What are you trying to achieve here?
>
> You cannot set the state like this after runtime PM is enabled and the
> above call will always fail.
>
Hi Johan,

Apologies, I didn't understand the comment fully.

The reason why I an returning ret is because, when the first get_sync
fails because of core_init failure and we return 0 instead of ret, the
UDC thinks that controller has started successfully but we never set the
run stop bit. So when we plug out the cable, the disconnect event won't
be generated and we never send on systems like android the user space
will never clear the UDC upon disconnect. Its a sort of mismatch between
controller and udc.

Also once the first get_sync fails, the dwc->dev->power.runtime_error
flag is set and successive calls to get_sync always return -EINVAL. In
this situation even if UDC/configfs retry pullup, resume_common will
never be called and we never actually start the controller or resume
dwc->dev.

By calling set_suspended, I am trying to clear the runtime_error flag so
that the next retry to pullup will call resume_common and retry
core_init and set run_stop.

Regards,
Krishna,

>> + return ret;
>> }
>>
>> if (dwc->pullups_connected == is_on) {
>
> Johan

2023-06-19 14:15:36

by Krishna Kurapati PSSNV

[permalink] [raw]
Subject: Re: [PATCH v3] usb: dwc3: gadget: Propagate core init errors to UDC during pullup



On 6/19/2023 6:20 PM, Krishna Kurapati PSSNV wrote:
>
>
> On 6/19/2023 12:36 PM, Johan Hovold wrote:
>> On Sun, Jun 18, 2023 at 05:39:49PM +0530, Krishna Kurapati wrote:
>>> In scenarios where pullup relies on resume (get sync) to initialize
>>> the controller and set the run stop bit, then core_init is followed by
>>> gadget_resume which will eventually set run stop bit.
>>>
>>> But in cases where the core_init fails, the return value is not sent
>>> back to udc appropriately. So according to UDC the controller has
>>> started but in reality we never set the run stop bit.
>>>
>>> On systems like Android, there are uevents sent to HAL depending on
>>> whether the configfs_bind / configfs_disconnect were invoked. In the
>>> above mentioned scnenario, if the core init fails, the run stop won't
>>> be set and the cable plug-out won't result in generation of any
>>> disconnect event and userspace would never get any uevent regarding
>>> cable plug out and we never call pullup(0) again. Furthermore none of
>>> the next Plug-In/Plug-Out's would be known to configfs.
>>>
>>> Return back the appropriate result to UDC to let the userspace/
>>> configfs know that the pullup failed so they can take appropriate
>>> action.
>>>
>>> Fixes: 77adb8bdf422 ("usb: dwc3: gadget: Allow runtime suspend if UDC
>>> unbinded")
>>> Signed-off-by: Krishna Kurapati <[email protected]>
>>> Acked-by: Thinh Nguyen <[email protected]>
>>> ---
>>> Changes in v3: Added changelog mising in v2
>>> Changes in v2: Added Fixes tag
>>>
>>>   drivers/usb/dwc3/gadget.c | 4 +++-
>>>   1 file changed, 3 insertions(+), 1 deletion(-)
>>>
>>> diff --git a/drivers/usb/dwc3/gadget.c b/drivers/usb/dwc3/gadget.c
>>> index 578804dc29ca..27cb671e18e3 100644
>>> --- a/drivers/usb/dwc3/gadget.c
>>> +++ b/drivers/usb/dwc3/gadget.c
>>> @@ -2747,7 +2747,9 @@ static int dwc3_gadget_pullup(struct usb_gadget
>>> *g, int is_on)
>>>       ret = pm_runtime_get_sync(dwc->dev);
>>>       if (!ret || ret < 0) {
>>>           pm_runtime_put(dwc->dev);
>>> -        return 0;
>>> +        if (ret < 0)
>>> +            pm_runtime_set_suspended(dwc->dev);
>>
>> This bit is broken and is also not mentioned or explained in the commit
>> message. What are you trying to achieve here?
>>
>> You cannot set the state like this after runtime PM is enabled and the
>> above call will always fail.
>>
> Hi Johan,
>
>   Apologies, I didn't understand the comment fully.
>
> The reason why I an returning ret is because, when the first get_sync
> fails because of core_init failure and we return 0 instead of ret, the
> UDC thinks that controller has started successfully but we never set the
> run stop bit. So when we plug out the cable,  the disconnect event won't
> be generated and we never send on systems like android the user space > will never clear the UDC upon disconnect. Its a sort of mismatch between
> controller and udc. >
Typo here:

So when we plug out the cable, the disconnect event won't be generated
and on systems like android the user space will never clear the UDC upon
disconnect.

> Also once the first get_sync fails, the dwc->dev->power.runtime_error
> flag is set and successive calls to get_sync always return -EINVAL. In
> this situation even if UDC/configfs retry pullup, resume_common will
> never be called and we never actually start the controller or resume
> dwc->dev.
>
> By calling set_suspended, I am trying to clear the runtime_error flag so
> that the next retry to pullup will call resume_common and retry
> core_init and set run_stop.
>
> Regards,
> Krishna,
>
>>> +        return ret;
>>>       }
>>>       if (dwc->pullups_connected == is_on) {
>>
>> Johan

2023-06-19 15:10:32

by Johan Hovold

[permalink] [raw]
Subject: Re: [PATCH v3] usb: dwc3: gadget: Propagate core init errors to UDC during pullup

On Mon, Jun 19, 2023 at 06:20:43PM +0530, Krishna Kurapati PSSNV wrote:
> On 6/19/2023 12:36 PM, Johan Hovold wrote:
> > On Sun, Jun 18, 2023 at 05:39:49PM +0530, Krishna Kurapati wrote:

> >> @@ -2747,7 +2747,9 @@ static int dwc3_gadget_pullup(struct usb_gadget *g, int is_on)
> >> ret = pm_runtime_get_sync(dwc->dev);
> >> if (!ret || ret < 0) {
> >> pm_runtime_put(dwc->dev);
> >> - return 0;
> >> + if (ret < 0)
> >> + pm_runtime_set_suspended(dwc->dev);
> >
> > This bit is broken and is also not mentioned or explained in the commit
> > message. What are you trying to achieve here?
> >
> > You cannot set the state like this after runtime PM is enabled and the
> > above call will always fail.

> The reason why I an returning ret is because, when the first get_sync
> fails because of core_init failure and we return 0 instead of ret, the
> UDC thinks that controller has started successfully but we never set the
> run stop bit.

That bit is clear.

> So when we plug out the cable, the disconnect event won't
> be generated and we never send on systems like android the user space
> will never clear the UDC upon disconnect. Its a sort of mismatch between
> controller and udc.

Ok, but the controller is an error state after the resume failure. And
here you rely on user space to retry gadget activation in order to
eventually detect the disconnect event?

> Also once the first get_sync fails, the dwc->dev->power.runtime_error
> flag is set and successive calls to get_sync always return -EINVAL. In
> this situation even if UDC/configfs retry pullup, resume_common will
> never be called and we never actually start the controller or resume
> dwc->dev.
>
> By calling set_suspended, I am trying to clear the runtime_error flag so
> that the next retry to pullup will call resume_common and retry
> core_init and set run_stop.

Ok, thanks, that's the bit I was missing in the commit message.

First, I perhaps mistakingly thought pm_runtime_set_suspended() may only
be called with PM runtime disabled, but it appears it may indeed be
valid to call also after an error but with the caveat that the device
must then actually be in the suspended state.

The documentation and implementation is inconsistent here as the kernel
doc for pm_runtime_set_suspended() clearly states:

It is not valid to call this function for devices with runtime
PM enabled.

and it also looks like we'd end up with an active-child counter
imbalance if anyone actually tries to do so.

But either way, it also seems like the controller is not guaranteed to
be suspended here as pm_runtime_get_sync() may also fail after a
previous errors that have left the controller in the active state?

Also, what kind of errors would cause core_init and resume to fail here?

If this is something that you see during normal operation then this
seems to suggest that something is wrong with the runtime pm
implementation.

Note that virtually all drivers treat resume failures as fatal errors
and do not implement any recovery from that.

In fact, the only other example of this kind of usage that I could find
is also for a Qualcomm driver...

Johan

2023-06-19 15:24:53

by Johan Hovold

[permalink] [raw]
Subject: Re: [PATCH v3] usb: dwc3: gadget: Propagate core init errors to UDC during pullup

On Mon, Jun 19, 2023 at 05:06:45PM +0200, Johan Hovold wrote:

> and it also looks like we'd end up with an active-child counter
> imbalance if anyone actually tries to do so.

Scratch this bit, I misread the code.

Johan

2023-06-20 05:03:38

by Krishna Kurapati PSSNV

[permalink] [raw]
Subject: Re: [PATCH v3] usb: dwc3: gadget: Propagate core init errors to UDC during pullup



On 6/19/2023 8:36 PM, Johan Hovold wrote:
> On Mon, Jun 19, 2023 at 06:20:43PM +0530, Krishna Kurapati PSSNV wrote:
>> On 6/19/2023 12:36 PM, Johan Hovold wrote:
>>> On Sun, Jun 18, 2023 at 05:39:49PM +0530, Krishna Kurapati wrote:
>
>>>> @@ -2747,7 +2747,9 @@ static int dwc3_gadget_pullup(struct usb_gadget *g, int is_on)
>>>> ret = pm_runtime_get_sync(dwc->dev);
>>>> if (!ret || ret < 0) {
>>>> pm_runtime_put(dwc->dev);
>>>> - return 0;
>>>> + if (ret < 0)
>>>> + pm_runtime_set_suspended(dwc->dev);
>>>
>>> This bit is broken and is also not mentioned or explained in the commit
>>> message. What are you trying to achieve here?
>>>
>>> You cannot set the state like this after runtime PM is enabled and the
>>> above call will always fail.
>
>> The reason why I an returning ret is because, when the first get_sync
>> fails because of core_init failure and we return 0 instead of ret, the
>> UDC thinks that controller has started successfully but we never set the
>> run stop bit.
>
> That bit is clear.
>
>> So when we plug out the cable, the disconnect event won't
>> be generated and we never send on systems like android the user space
>> will never clear the UDC upon disconnect. Its a sort of mismatch between
>> controller and udc.
>
> Ok, but the controller is an error state after the resume failure. And
> here you rely on user space to retry gadget activation in order to
> eventually detect the disconnect event?
>
>> Also once the first get_sync fails, the dwc->dev->power.runtime_error
>> flag is set and successive calls to get_sync always return -EINVAL. In
>> this situation even if UDC/configfs retry pullup, resume_common will
>> never be called and we never actually start the controller or resume
>> dwc->dev.
>>
>> By calling set_suspended, I am trying to clear the runtime_error flag so
>> that the next retry to pullup will call resume_common and retry
>> core_init and set run_stop.
>
> Ok, thanks, that's the bit I was missing in the commit message.
>
> First, I perhaps mistakingly thought pm_runtime_set_suspended() may only
> be called with PM runtime disabled, but it appears it may indeed be
> valid to call also after an error but with the caveat that the device
> must then actually be in the suspended state.
>
> The documentation and implementation is inconsistent here as the kernel
> doc for pm_runtime_set_suspended() clearly states:
>
> It is not valid to call this function for devices with runtime
> PM enabled.
>
> and it also looks like we'd end up with an active-child counter
> imbalance if anyone actually tries to do so.
>
> But either way, it also seems like the controller is not guaranteed to
> be suspended here as pm_runtime_get_sync() may also fail after a
> previous errors that have left the controller in the active state?
> > Also, what kind of errors would cause core_init and resume to fail here?
>
Hi Johan,

As per the comment just above the get_sync during pullup, the
resume_common path is used to resume the controller and start peripheral
mode incase the dwc3 was in suspended state. So if we are entering
gadget_resume we are in suspended state the first time it is called.

Regarding the errors that might leave controller in active state, I have
faced issue in core init, and in that function there is a cleanup
happening in case something fails. So controller was actually not in
active state after cleanup was done. In resume common, if
core_init_for_resume is failing, we cleanup everything initialized up
until that point.

The scenario you mentioned would be applicable in case gadget_resume
fails. We are not having a return value check and I am not sure what
would be the side effect of not having that check there. Either ways,
since it was failing for core init, I went ahead and made this patch.

As per the reason for failure in core init, the following is what was
happening at customer's end:

1. Cable plug-in
2. get_sync calls resume common which inturn calls core_init
3. core soft reset fails in core init, we cleanup and return -110
4. After applying this patch, the -110 was propagated to UDC properly
5. We got a second call to pullup via connect_control and this time
reset was successful.

The behavior was similar to [1]. There as well, on all Gen-2 targets,
after the retry happens I see soft reset is passing but failing for
first attempt.

[1]:
https://lore.kernel.org/all/[email protected]/

Regards,
Krishna,

> If this is something that you see during normal operation then this
> seems to suggest that something is wrong with the runtime pm
> implementation.
>
> Note that virtually all drivers treat resume failures as fatal errors
> and do not implement any recovery from that.
>
> In fact, the only other example of this kind of usage that I could find
> is also for a Qualcomm driver...
>
> Johan