2023-10-11 10:02:49

by Krishna Kurapati

[permalink] [raw]
Subject: [RFC] usb: dwc3: core: Fix RAM interface getting stuck during enumeration

This implementation is to fix RAM interface getting stuck during
enumeration and controller not responding to any command.

During plug-out test cases, it is sometimes seen that no events
are generated by the controller and all CSR register reads give "0"
and CSR_Timeout bit gets set indicating that CSR reads/writes are
timing out or timed out.

The issue comes up on different instnaces of enumeration on different
platforms. On one platform, the debug log is as follows:

Prepared a TRB on ep0out and did start transfer to get set
address request from host:

<...>-7191 [000] D..1. 66.421006: dwc3_gadget_ep_cmd: ep0out:
cmd 'Start Transfer' [406] params 00000000 efffa000 00000000 -->
status: Successful

<...>-7191 [000] D..1. 66.421196: dwc3_event: event (0000c040):
ep0out: Transfer Complete (sIL) [Setup Phase]

<...>-7191 [000] D..1. 66.421197: dwc3_ctrl_req: Set
Address(Addr = 01)

Then XFER NRDY received on ep0in for zero length status phase and
a Start Transfer was done on ep0in with 0-length packet in 2 Stage
status phase:

<...>-7191 [000] D..1. 66.421249: dwc3_event: event (000020c2):
ep0in: Transfer Not Ready [00000000] (Not Active) [Status Phase]

<...>-7191 [000] D..1. 66.421266: dwc3_prepare_trb: ep0in: trb
ffffffc00fcfd000 (E0:D0) buf 00000000efffa000 size 0 ctrl 00000c33
sofn 00000000 (HLcs:SC:status2)

<...>-7191 [000] D..1. 66.421387: dwc3_gadget_ep_cmd: ep0in: cmd
'Start Transfer' [406] params 00000000 efffa000 00000000 -->status:
Successful

Then a bus reset was received directly after 500 msec. Software never
got the cmd complete for the start transfer done in status phase. Here
the RAM interface is stuck. So host issues a bus reset as link is
idle for 500 msec:

<...>-7191 [000] D..1. 66.935603: dwc3_event: event (00000101):
Reset [U0]

Then software sees that it is in status phase and we issue an ENDXFER
on ep0in and it gets timedout waiting for the CMDACT to go '0':

<...>-7191 [000] D..1. 66.958249: dwc3_gadget_ep_cmd: ep0in: cmd
'End Transfer' [10508] params 00000000 00000000 00000000 --> status:
Timed Out

Upon debug with Synopsys, it turns out that the root cause is as
follows:

During any transfer, if the data is not successfully transmitted,
then a Done (with failure) handshake is returned, so that the BMU
can re-attempt the same data again by rewinding its data pointers.

But, if the USB IN is a 0-length payload (which is what is happening
in this case - 2 stage status phase of set_address), then there is no
need to rewind the pointers and the Done (with failure) handshake is
not returned for failure case. This keeps the Request-Done interface
busy till the next Done handshake. The MAC sends the 0-length payload
again when the host requests. If the transmission is successful this
time, the Done (with success) handshake is provided back. Otherwise,
it repeats the same steps again.

If the cable is disconnected or if the Host aborts the transfer on 3
consecutive failed attempts, the Request-Done handshake is not
complete. This keeps the interface busy.

The subsequent RAM access cannot proceed until the above pending
transfer is complete. This results in failure of any access to RAM
address locations. Many of the EndPoint commands need to access the
RAM and they would fail to complete successfully.

Furthermore when cable removal happens, this would not generate a
disconnect event and the "connected" flag remains true always blockin
suspend.

Synopsys confirmed that the issue is present on all USB3 devices and
as a workaround, suggested to re-initialize device mode.

Signed-off-by: Krishna Kurapati <[email protected]>
---
drivers/usb/dwc3/core.c | 20 ++++++++++++++++++++
drivers/usb/dwc3/core.h | 4 ++++
drivers/usb/dwc3/drd.c | 5 +++++
drivers/usb/dwc3/gadget.c | 6 ++++--
4 files changed, 33 insertions(+), 2 deletions(-)

diff --git a/drivers/usb/dwc3/core.c b/drivers/usb/dwc3/core.c
index 44ee8526dc28..d18b81cccdc5 100644
--- a/drivers/usb/dwc3/core.c
+++ b/drivers/usb/dwc3/core.c
@@ -122,6 +122,7 @@ static void __dwc3_set_mode(struct work_struct *work)
unsigned long flags;
int ret;
u32 reg;
+ u8 timeout = 100;
u32 desired_dr_role;

mutex_lock(&dwc->mutex);
@@ -137,6 +138,25 @@ static void __dwc3_set_mode(struct work_struct *work)
if (!desired_dr_role)
goto out;

+ /*
+ * STAR 5001544 - If cable disconnect doesn't generate
+ * disconnect event in device mode, then re-initialize the
+ * controller.
+ */
+ if ((dwc->cable_disconnected == true) &&
+ (dwc->current_dr_role == DWC3_GCTL_PRTCAP_DEVICE)) {
+ while (dwc->connected == true && timeout != 0) {
+ mdelay(10);
+ timeout--;
+ }
+
+ if (timeout == 0) {
+ dwc3_gadget_soft_disconnect(dwc);
+ udelay(100);
+ dwc3_gadget_soft_connect(dwc);
+ }
+ }
+
if (desired_dr_role == dwc->current_dr_role)
goto out;

diff --git a/drivers/usb/dwc3/core.h b/drivers/usb/dwc3/core.h
index c6c87acbd376..7642330cf608 100644
--- a/drivers/usb/dwc3/core.h
+++ b/drivers/usb/dwc3/core.h
@@ -1355,6 +1355,7 @@ struct dwc3 {
int last_fifo_depth;
int num_ep_resized;
struct dentry *debug_root;
+ bool cable_disconnected;
};

#define INCRX_BURST_MODE 0
@@ -1568,6 +1569,9 @@ void dwc3_event_buffers_cleanup(struct dwc3 *dwc);

int dwc3_core_soft_reset(struct dwc3 *dwc);

+int dwc3_gadget_soft_disconnect(struct dwc3 *dwc);
+int dwc3_gadget_soft_connect(struct dwc3 *dwc);
+
#if IS_ENABLED(CONFIG_USB_DWC3_HOST) || IS_ENABLED(CONFIG_USB_DWC3_DUAL_ROLE)
int dwc3_host_init(struct dwc3 *dwc);
void dwc3_host_exit(struct dwc3 *dwc);
diff --git a/drivers/usb/dwc3/drd.c b/drivers/usb/dwc3/drd.c
index 039bf241769a..593c023fc39a 100644
--- a/drivers/usb/dwc3/drd.c
+++ b/drivers/usb/dwc3/drd.c
@@ -446,6 +446,8 @@ static int dwc3_usb_role_switch_set(struct usb_role_switch *sw,
struct dwc3 *dwc = usb_role_switch_get_drvdata(sw);
u32 mode;

+ dwc->cable_disconnected = false;
+
switch (role) {
case USB_ROLE_HOST:
mode = DWC3_GCTL_PRTCAP_HOST;
@@ -454,6 +456,9 @@ static int dwc3_usb_role_switch_set(struct usb_role_switch *sw,
mode = DWC3_GCTL_PRTCAP_DEVICE;
break;
default:
+ if (role == USB_ROLE_NONE)
+ dwc->cable_disconnected = true;
+
if (dwc->role_switch_default_mode == USB_DR_MODE_HOST)
mode = DWC3_GCTL_PRTCAP_HOST;
else
diff --git a/drivers/usb/dwc3/gadget.c b/drivers/usb/dwc3/gadget.c
index 858fe4c299b7..a92df2e04cce 100644
--- a/drivers/usb/dwc3/gadget.c
+++ b/drivers/usb/dwc3/gadget.c
@@ -2634,7 +2634,7 @@ static void dwc3_gadget_disable_irq(struct dwc3 *dwc);
static void __dwc3_gadget_stop(struct dwc3 *dwc);
static int __dwc3_gadget_start(struct dwc3 *dwc);

-static int dwc3_gadget_soft_disconnect(struct dwc3 *dwc)
+int dwc3_gadget_soft_disconnect(struct dwc3 *dwc)
{
unsigned long flags;
int ret;
@@ -2701,7 +2701,7 @@ static int dwc3_gadget_soft_disconnect(struct dwc3 *dwc)
return ret;
}

-static int dwc3_gadget_soft_connect(struct dwc3 *dwc)
+int dwc3_gadget_soft_connect(struct dwc3 *dwc)
{
int ret;

@@ -3963,6 +3963,7 @@ static void dwc3_gadget_disconnect_interrupt(struct dwc3 *dwc)
dwc3_gadget_dctl_write_safe(dwc, reg);

dwc->connected = false;
+ dwc->cable_disconnected = true;

dwc3_disconnect_gadget(dwc);

@@ -4038,6 +4039,7 @@ static void dwc3_gadget_reset_interrupt(struct dwc3 *dwc)
*/
dwc3_stop_active_transfers(dwc);
dwc->connected = true;
+ dwc->cable_disconnected = false;

reg = dwc3_readl(dwc->regs, DWC3_DCTL);
reg &= ~DWC3_DCTL_TSTCTRL_MASK;
--
2.42.0


2023-10-12 18:00:09

by Thinh Nguyen

[permalink] [raw]
Subject: Re: [RFC] usb: dwc3: core: Fix RAM interface getting stuck during enumeration

On Wed, Oct 11, 2023, Krishna Kurapati wrote:
> This implementation is to fix RAM interface getting stuck during
> enumeration and controller not responding to any command.
>
> During plug-out test cases, it is sometimes seen that no events
> are generated by the controller and all CSR register reads give "0"
> and CSR_Timeout bit gets set indicating that CSR reads/writes are
> timing out or timed out.
>
> The issue comes up on different instnaces of enumeration on different
> platforms. On one platform, the debug log is as follows:
>
> Prepared a TRB on ep0out and did start transfer to get set
> address request from host:
>
> <...>-7191 [000] D..1. 66.421006: dwc3_gadget_ep_cmd: ep0out:
> cmd 'Start Transfer' [406] params 00000000 efffa000 00000000 -->
> status: Successful
>
> <...>-7191 [000] D..1. 66.421196: dwc3_event: event (0000c040):
> ep0out: Transfer Complete (sIL) [Setup Phase]
>
> <...>-7191 [000] D..1. 66.421197: dwc3_ctrl_req: Set
> Address(Addr = 01)
>
> Then XFER NRDY received on ep0in for zero length status phase and
> a Start Transfer was done on ep0in with 0-length packet in 2 Stage
> status phase:
>
> <...>-7191 [000] D..1. 66.421249: dwc3_event: event (000020c2):
> ep0in: Transfer Not Ready [00000000] (Not Active) [Status Phase]
>
> <...>-7191 [000] D..1. 66.421266: dwc3_prepare_trb: ep0in: trb
> ffffffc00fcfd000 (E0:D0) buf 00000000efffa000 size 0 ctrl 00000c33
> sofn 00000000 (HLcs:SC:status2)
>
> <...>-7191 [000] D..1. 66.421387: dwc3_gadget_ep_cmd: ep0in: cmd
> 'Start Transfer' [406] params 00000000 efffa000 00000000 -->status:
> Successful
>
> Then a bus reset was received directly after 500 msec. Software never
> got the cmd complete for the start transfer done in status phase. Here
> the RAM interface is stuck. So host issues a bus reset as link is
> idle for 500 msec:
>
> <...>-7191 [000] D..1. 66.935603: dwc3_event: event (00000101):
> Reset [U0]
>
> Then software sees that it is in status phase and we issue an ENDXFER
> on ep0in and it gets timedout waiting for the CMDACT to go '0':
>
> <...>-7191 [000] D..1. 66.958249: dwc3_gadget_ep_cmd: ep0in: cmd
> 'End Transfer' [10508] params 00000000 00000000 00000000 --> status:
> Timed Out
>
> Upon debug with Synopsys, it turns out that the root cause is as
> follows:
>
> During any transfer, if the data is not successfully transmitted,
> then a Done (with failure) handshake is returned, so that the BMU
> can re-attempt the same data again by rewinding its data pointers.
>
> But, if the USB IN is a 0-length payload (which is what is happening
> in this case - 2 stage status phase of set_address), then there is no
> need to rewind the pointers and the Done (with failure) handshake is
> not returned for failure case. This keeps the Request-Done interface
> busy till the next Done handshake. The MAC sends the 0-length payload
> again when the host requests. If the transmission is successful this
> time, the Done (with success) handshake is provided back. Otherwise,
> it repeats the same steps again.
>
> If the cable is disconnected or if the Host aborts the transfer on 3
> consecutive failed attempts, the Request-Done handshake is not
> complete. This keeps the interface busy.
>
> The subsequent RAM access cannot proceed until the above pending
> transfer is complete. This results in failure of any access to RAM
> address locations. Many of the EndPoint commands need to access the
> RAM and they would fail to complete successfully.
>
> Furthermore when cable removal happens, this would not generate a
> disconnect event and the "connected" flag remains true always blockin
> suspend.
>
> Synopsys confirmed that the issue is present on all USB3 devices and
> as a workaround, suggested to re-initialize device mode.
>
> Signed-off-by: Krishna Kurapati <[email protected]>
> ---
> drivers/usb/dwc3/core.c | 20 ++++++++++++++++++++
> drivers/usb/dwc3/core.h | 4 ++++
> drivers/usb/dwc3/drd.c | 5 +++++
> drivers/usb/dwc3/gadget.c | 6 ++++--
> 4 files changed, 33 insertions(+), 2 deletions(-)
>
> diff --git a/drivers/usb/dwc3/core.c b/drivers/usb/dwc3/core.c
> index 44ee8526dc28..d18b81cccdc5 100644
> --- a/drivers/usb/dwc3/core.c
> +++ b/drivers/usb/dwc3/core.c
> @@ -122,6 +122,7 @@ static void __dwc3_set_mode(struct work_struct *work)
> unsigned long flags;
> int ret;
> u32 reg;
> + u8 timeout = 100;
> u32 desired_dr_role;
>
> mutex_lock(&dwc->mutex);
> @@ -137,6 +138,25 @@ static void __dwc3_set_mode(struct work_struct *work)
> if (!desired_dr_role)
> goto out;
>
> + /*
> + * STAR 5001544 - If cable disconnect doesn't generate
> + * disconnect event in device mode, then re-initialize the
> + * controller.
> + */
> + if ((dwc->cable_disconnected == true) &&
> + (dwc->current_dr_role == DWC3_GCTL_PRTCAP_DEVICE)) {
> + while (dwc->connected == true && timeout != 0) {
> + mdelay(10);
> + timeout--;
> + }
> +
> + if (timeout == 0) {
> + dwc3_gadget_soft_disconnect(dwc);
> + udelay(100);
> + dwc3_gadget_soft_connect(dwc);
> + }
> + }
> +
> if (desired_dr_role == dwc->current_dr_role)
> goto out;
>
> diff --git a/drivers/usb/dwc3/core.h b/drivers/usb/dwc3/core.h
> index c6c87acbd376..7642330cf608 100644
> --- a/drivers/usb/dwc3/core.h
> +++ b/drivers/usb/dwc3/core.h
> @@ -1355,6 +1355,7 @@ struct dwc3 {
> int last_fifo_depth;
> int num_ep_resized;
> struct dentry *debug_root;
> + bool cable_disconnected;
> };
>
> #define INCRX_BURST_MODE 0
> @@ -1568,6 +1569,9 @@ void dwc3_event_buffers_cleanup(struct dwc3 *dwc);
>
> int dwc3_core_soft_reset(struct dwc3 *dwc);
>
> +int dwc3_gadget_soft_disconnect(struct dwc3 *dwc);
> +int dwc3_gadget_soft_connect(struct dwc3 *dwc);
> +
> #if IS_ENABLED(CONFIG_USB_DWC3_HOST) || IS_ENABLED(CONFIG_USB_DWC3_DUAL_ROLE)
> int dwc3_host_init(struct dwc3 *dwc);
> void dwc3_host_exit(struct dwc3 *dwc);
> diff --git a/drivers/usb/dwc3/drd.c b/drivers/usb/dwc3/drd.c
> index 039bf241769a..593c023fc39a 100644
> --- a/drivers/usb/dwc3/drd.c
> +++ b/drivers/usb/dwc3/drd.c
> @@ -446,6 +446,8 @@ static int dwc3_usb_role_switch_set(struct usb_role_switch *sw,
> struct dwc3 *dwc = usb_role_switch_get_drvdata(sw);
> u32 mode;
>
> + dwc->cable_disconnected = false;
> +
> switch (role) {
> case USB_ROLE_HOST:
> mode = DWC3_GCTL_PRTCAP_HOST;
> @@ -454,6 +456,9 @@ static int dwc3_usb_role_switch_set(struct usb_role_switch *sw,
> mode = DWC3_GCTL_PRTCAP_DEVICE;
> break;
> default:
> + if (role == USB_ROLE_NONE)
> + dwc->cable_disconnected = true;
> +
> if (dwc->role_switch_default_mode == USB_DR_MODE_HOST)
> mode = DWC3_GCTL_PRTCAP_HOST;
> else
> diff --git a/drivers/usb/dwc3/gadget.c b/drivers/usb/dwc3/gadget.c
> index 858fe4c299b7..a92df2e04cce 100644
> --- a/drivers/usb/dwc3/gadget.c
> +++ b/drivers/usb/dwc3/gadget.c
> @@ -2634,7 +2634,7 @@ static void dwc3_gadget_disable_irq(struct dwc3 *dwc);
> static void __dwc3_gadget_stop(struct dwc3 *dwc);
> static int __dwc3_gadget_start(struct dwc3 *dwc);
>
> -static int dwc3_gadget_soft_disconnect(struct dwc3 *dwc)
> +int dwc3_gadget_soft_disconnect(struct dwc3 *dwc)
> {
> unsigned long flags;
> int ret;
> @@ -2701,7 +2701,7 @@ static int dwc3_gadget_soft_disconnect(struct dwc3 *dwc)
> return ret;
> }
>
> -static int dwc3_gadget_soft_connect(struct dwc3 *dwc)
> +int dwc3_gadget_soft_connect(struct dwc3 *dwc)
> {
> int ret;
>
> @@ -3963,6 +3963,7 @@ static void dwc3_gadget_disconnect_interrupt(struct dwc3 *dwc)
> dwc3_gadget_dctl_write_safe(dwc, reg);
>
> dwc->connected = false;
> + dwc->cable_disconnected = true;
>
> dwc3_disconnect_gadget(dwc);
>
> @@ -4038,6 +4039,7 @@ static void dwc3_gadget_reset_interrupt(struct dwc3 *dwc)
> */
> dwc3_stop_active_transfers(dwc);
> dwc->connected = true;
> + dwc->cable_disconnected = false;
>
> reg = dwc3_readl(dwc->regs, DWC3_DCTL);
> reg &= ~DWC3_DCTL_TSTCTRL_MASK;
> --
> 2.42.0
>

We can just reset the controller when there's End Transfer command
timeout as a failure recovery. No need to do what you're doing here.

BR,
Thinh

2023-10-12 18:39:36

by Krishna Kurapati

[permalink] [raw]
Subject: Re: [RFC] usb: dwc3: core: Fix RAM interface getting stuck during enumeration



On 10/12/2023 11:29 PM, Thinh Nguyen wrote:

>> -static int dwc3_gadget_soft_disconnect(struct dwc3 *dwc)
>> +int dwc3_gadget_soft_disconnect(struct dwc3 *dwc)
>> {
>> unsigned long flags;
>> int ret;
>> @@ -2701,7 +2701,7 @@ static int dwc3_gadget_soft_disconnect(struct dwc3 *dwc)
>> return ret;
>> }
>>
>> -static int dwc3_gadget_soft_connect(struct dwc3 *dwc)
>> +int dwc3_gadget_soft_connect(struct dwc3 *dwc)
>> {
>> int ret;
>>
>> @@ -3963,6 +3963,7 @@ static void dwc3_gadget_disconnect_interrupt(struct dwc3 *dwc)
>> dwc3_gadget_dctl_write_safe(dwc, reg);
>>
>> dwc->connected = false;
>> + dwc->cable_disconnected = true;
>>
>> dwc3_disconnect_gadget(dwc);
>>
>> @@ -4038,6 +4039,7 @@ static void dwc3_gadget_reset_interrupt(struct dwc3 *dwc)
>> */
>> dwc3_stop_active_transfers(dwc);
>> dwc->connected = true;
>> + dwc->cable_disconnected = false;
>>
>> reg = dwc3_readl(dwc->regs, DWC3_DCTL);
>> reg &= ~DWC3_DCTL_TSTCTRL_MASK;
>> --
>> 2.42.0
>>
>
> We can just reset the controller when there's End Transfer command
> timeout as a failure recovery. No need to do what you're doing here.
>
Hi Thinh,

That was what I initially wanted to do, but there were couple of
reasons I wanted to take this approach:

1. We can't just reset the controller in midst of gadget_interrupt. We
need to process it completely and then take action.

2. The above log was seen on QRD variant of SM8550/SM8650 easily. But on
other platforms of same targets, the issue comes up at some other
instances of code, at a point where no IRQ is running. In such cases its
not possible to accurately find out code portions and reset the
controller. The way I confirmed that both platforms are having the same
issue is:

a. During cable disconnect, I am not receiving disconnect interrupt
b. The reg dump is exactly same in both cases (BMU as well)

So I felt it was better to fix it during cable disconnect because even
if we remove cable, we are still in device mode only and in this case we
can unblock suspend and also bring back controller to a known state.

Let me know your thoughts on the above.

Regards,
Krishna,

2023-10-12 19:26:30

by Wesley Cheng

[permalink] [raw]
Subject: Re: [RFC] usb: dwc3: core: Fix RAM interface getting stuck during enumeration

Hi Krishna,

On 10/11/2023 3:02 AM, Krishna Kurapati wrote:
> This implementation is to fix RAM interface getting stuck during
> enumeration and controller not responding to any command.
>
> During plug-out test cases, it is sometimes seen that no events
> are generated by the controller and all CSR register reads give "0"
> and CSR_Timeout bit gets set indicating that CSR reads/writes are
> timing out or timed out.
>
> The issue comes up on different instnaces of enumeration on different
> platforms. On one platform, the debug log is as follows:
>
> Prepared a TRB on ep0out and did start transfer to get set
> address request from host:
>
> <...>-7191 [000] D..1. 66.421006: dwc3_gadget_ep_cmd: ep0out:
> cmd 'Start Transfer' [406] params 00000000 efffa000 00000000 -->
> status: Successful
>
> <...>-7191 [000] D..1. 66.421196: dwc3_event: event (0000c040):
> ep0out: Transfer Complete (sIL) [Setup Phase]
>
> <...>-7191 [000] D..1. 66.421197: dwc3_ctrl_req: Set
> Address(Addr = 01)
>
> Then XFER NRDY received on ep0in for zero length status phase and
> a Start Transfer was done on ep0in with 0-length packet in 2 Stage
> status phase:
>
> <...>-7191 [000] D..1. 66.421249: dwc3_event: event (000020c2):
> ep0in: Transfer Not Ready [00000000] (Not Active) [Status Phase]
>
> <...>-7191 [000] D..1. 66.421266: dwc3_prepare_trb: ep0in: trb
> ffffffc00fcfd000 (E0:D0) buf 00000000efffa000 size 0 ctrl 00000c33
> sofn 00000000 (HLcs:SC:status2)
>
> <...>-7191 [000] D..1. 66.421387: dwc3_gadget_ep_cmd: ep0in: cmd
> 'Start Transfer' [406] params 00000000 efffa000 00000000 -->status:
> Successful
>
> Then a bus reset was received directly after 500 msec. Software never
> got the cmd complete for the start transfer done in status phase. Here
> the RAM interface is stuck. So host issues a bus reset as link is
> idle for 500 msec:
>
> <...>-7191 [000] D..1. 66.935603: dwc3_event: event (00000101):
> Reset [U0]
>
> Then software sees that it is in status phase and we issue an ENDXFER
> on ep0in and it gets timedout waiting for the CMDACT to go '0':
>
> <...>-7191 [000] D..1. 66.958249: dwc3_gadget_ep_cmd: ep0in: cmd
> 'End Transfer' [10508] params 00000000 00000000 00000000 --> status:
> Timed Out
>
> Upon debug with Synopsys, it turns out that the root cause is as
> follows:
>
> During any transfer, if the data is not successfully transmitted,
> then a Done (with failure) handshake is returned, so that the BMU
> can re-attempt the same data again by rewinding its data pointers.
>
> But, if the USB IN is a 0-length payload (which is what is happening
> in this case - 2 stage status phase of set_address), then there is no
> need to rewind the pointers and the Done (with failure) handshake is
> not returned for failure case. This keeps the Request-Done interface
> busy till the next Done handshake. The MAC sends the 0-length payload
> again when the host requests. If the transmission is successful this
> time, the Done (with success) handshake is provided back. Otherwise,
> it repeats the same steps again.
>
> If the cable is disconnected or if the Host aborts the transfer on 3
> consecutive failed attempts, the Request-Done handshake is not
> complete. This keeps the interface busy.
>
> The subsequent RAM access cannot proceed until the above pending
> transfer is complete. This results in failure of any access to RAM
> address locations. Many of the EndPoint commands need to access the
> RAM and they would fail to complete successfully.
>
> Furthermore when cable removal happens, this would not generate a
> disconnect event and the "connected" flag remains true always blockin
> suspend.
>
> Synopsys confirmed that the issue is present on all USB3 devices and
> as a workaround, suggested to re-initialize device mode.
>
> Signed-off-by: Krishna Kurapati <[email protected]>
> ---
> drivers/usb/dwc3/core.c | 20 ++++++++++++++++++++
> drivers/usb/dwc3/core.h | 4 ++++
> drivers/usb/dwc3/drd.c | 5 +++++
> drivers/usb/dwc3/gadget.c | 6 ++++--
> 4 files changed, 33 insertions(+), 2 deletions(-)
>
> diff --git a/drivers/usb/dwc3/core.c b/drivers/usb/dwc3/core.c
> index 44ee8526dc28..d18b81cccdc5 100644
> --- a/drivers/usb/dwc3/core.c
> +++ b/drivers/usb/dwc3/core.c
> @@ -122,6 +122,7 @@ static void __dwc3_set_mode(struct work_struct *work)
> unsigned long flags;
> int ret;
> u32 reg;
> + u8 timeout = 100;
> u32 desired_dr_role;
>
> mutex_lock(&dwc->mutex);
> @@ -137,6 +138,25 @@ static void __dwc3_set_mode(struct work_struct *work)
> if (!desired_dr_role)
> goto out;
>
> + /*
> + * STAR 5001544 - If cable disconnect doesn't generate
> + * disconnect event in device mode, then re-initialize the
> + * controller.
> + */
> + if ((dwc->cable_disconnected == true) &&
> + (dwc->current_dr_role == DWC3_GCTL_PRTCAP_DEVICE)) {
> + while (dwc->connected == true && timeout != 0) {
> + mdelay(10);
> + timeout--;
> + }
> +
> + if (timeout == 0) {
> + dwc3_gadget_soft_disconnect(dwc);
> + udelay(100);
> + dwc3_gadget_soft_connect(dwc);
> + }
> + }
> +
> if (desired_dr_role == dwc->current_dr_role)
> goto out;
>
> diff --git a/drivers/usb/dwc3/core.h b/drivers/usb/dwc3/core.h
> index c6c87acbd376..7642330cf608 100644
> --- a/drivers/usb/dwc3/core.h
> +++ b/drivers/usb/dwc3/core.h
> @@ -1355,6 +1355,7 @@ struct dwc3 {
> int last_fifo_depth;
> int num_ep_resized;
> struct dentry *debug_root;
> + bool cable_disconnected;
> };
>
> #define INCRX_BURST_MODE 0
> @@ -1568,6 +1569,9 @@ void dwc3_event_buffers_cleanup(struct dwc3 *dwc);
>
> int dwc3_core_soft_reset(struct dwc3 *dwc);
>
> +int dwc3_gadget_soft_disconnect(struct dwc3 *dwc);
> +int dwc3_gadget_soft_connect(struct dwc3 *dwc);
> +
> #if IS_ENABLED(CONFIG_USB_DWC3_HOST) || IS_ENABLED(CONFIG_USB_DWC3_DUAL_ROLE)
> int dwc3_host_init(struct dwc3 *dwc);
> void dwc3_host_exit(struct dwc3 *dwc);
> diff --git a/drivers/usb/dwc3/drd.c b/drivers/usb/dwc3/drd.c
> index 039bf241769a..593c023fc39a 100644
> --- a/drivers/usb/dwc3/drd.c
> +++ b/drivers/usb/dwc3/drd.c
> @@ -446,6 +446,8 @@ static int dwc3_usb_role_switch_set(struct usb_role_switch *sw,
> struct dwc3 *dwc = usb_role_switch_get_drvdata(sw);
> u32 mode;
>
> + dwc->cable_disconnected = false;
> +
> switch (role) {
> case USB_ROLE_HOST:
> mode = DWC3_GCTL_PRTCAP_HOST;
> @@ -454,6 +456,9 @@ static int dwc3_usb_role_switch_set(struct usb_role_switch *sw,
> mode = DWC3_GCTL_PRTCAP_DEVICE;
> break;
> default:
> + if (role == USB_ROLE_NONE)
> + dwc->cable_disconnected = true;
> +

How do we handle cases where role switch isn't used? (ie extcon or maybe
no cable connection notification at all)

Thanks
Wesley Cheng

2023-10-12 22:18:37

by Thinh Nguyen

[permalink] [raw]
Subject: Re: [RFC] usb: dwc3: core: Fix RAM interface getting stuck during enumeration

On Fri, Oct 13, 2023, Krishna Kurapati PSSNV wrote:
>
>
> On 10/12/2023 11:29 PM, Thinh Nguyen wrote:
>
> > > -static int dwc3_gadget_soft_disconnect(struct dwc3 *dwc)
> > > +int dwc3_gadget_soft_disconnect(struct dwc3 *dwc)
> > > {
> > > unsigned long flags;
> > > int ret;
> > > @@ -2701,7 +2701,7 @@ static int dwc3_gadget_soft_disconnect(struct dwc3 *dwc)
> > > return ret;
> > > }
> > > -static int dwc3_gadget_soft_connect(struct dwc3 *dwc)
> > > +int dwc3_gadget_soft_connect(struct dwc3 *dwc)
> > > {
> > > int ret;
> > > @@ -3963,6 +3963,7 @@ static void dwc3_gadget_disconnect_interrupt(struct dwc3 *dwc)
> > > dwc3_gadget_dctl_write_safe(dwc, reg);
> > > dwc->connected = false;
> > > + dwc->cable_disconnected = true;
> > > dwc3_disconnect_gadget(dwc);
> > > @@ -4038,6 +4039,7 @@ static void dwc3_gadget_reset_interrupt(struct dwc3 *dwc)
> > > */
> > > dwc3_stop_active_transfers(dwc);
> > > dwc->connected = true;
> > > + dwc->cable_disconnected = false;
> > > reg = dwc3_readl(dwc->regs, DWC3_DCTL);
> > > reg &= ~DWC3_DCTL_TSTCTRL_MASK;
> > > --
> > > 2.42.0
> > >
> >
> > We can just reset the controller when there's End Transfer command
> > timeout as a failure recovery. No need to do what you're doing here.
> >
> Hi Thinh,
>
> That was what I initially wanted to do, but there were couple of reasons I
> wanted to take this approach:
>
> 1. We can't just reset the controller in midst of gadget_interrupt. We need
> to process it completely and then take action.

You can flag the driver so you can do the teardown/soft-reset at the
appropriate time.

>
> 2. The above log was seen on QRD variant of SM8550/SM8650 easily. But on
> other platforms of same targets, the issue comes up at some other instances
> of code, at a point where no IRQ is running. In such cases its not possible
> to accurately find out code portions and reset the controller. The way I
> confirmed that both platforms are having the same issue is:
>
> a. During cable disconnect, I am not receiving disconnect interrupt
> b. The reg dump is exactly same in both cases (BMU as well)
>
> So I felt it was better to fix it during cable disconnect because even if we
> remove cable, we are still in device mode only and in this case we can
> unblock suspend and also bring back controller to a known state.
>
> Let me know your thoughts on the above.
>

This issue happens outside of disconnect right? Did you account for port
reset?

The symptom should be the same. At some point, a command will be issued.
If a command timed out, then something is really wrong (especially End
Transfer command). We can attempt to recover base on this symptom then.

And you don't need to poll for timeout for this specific type of error.
Just read some known register like GSNPSID to see if it's invalid.

BR,
Thinh

2023-10-13 03:56:54

by Krishna Kurapati

[permalink] [raw]
Subject: Re: [RFC] usb: dwc3: core: Fix RAM interface getting stuck during enumeration



On 10/13/2023 12:55 AM, Wesley Cheng wrote:
> Hi Krishna,
>
> On 10/11/2023 3:02 AM, Krishna Kurapati wrote:
>> This implementation is to fix RAM interface getting stuck during

>> Synopsys confirmed that the issue is present on all USB3 devices and
>> as a workaround, suggested to re-initialize device mode.
>>
>> Signed-off-by: Krishna Kurapati <[email protected]>
>> ---
>>   drivers/usb/dwc3/core.c   | 20 ++++++++++++++++++++
>>   drivers/usb/dwc3/core.h   |  4 ++++
>>   drivers/usb/dwc3/drd.c    |  5 +++++
>>   drivers/usb/dwc3/gadget.c |  6 ++++--
>>   4 files changed, 33 insertions(+), 2 deletions(-)
>>
>> diff --git a/drivers/usb/dwc3/core.c b/drivers/usb/dwc3/core.c
>> index 44ee8526dc28..d18b81cccdc5 100644
>> --- a/drivers/usb/dwc3/core.c
>> +++ b/drivers/usb/dwc3/core.c
>> @@ -122,6 +122,7 @@ static void __dwc3_set_mode(struct work_struct *work)
>>       unsigned long flags;
>>       int ret;
>>       u32 reg;
>> +    u8 timeout = 100;
>>       u32 desired_dr_role;
>>       mutex_lock(&dwc->mutex);
>> @@ -137,6 +138,25 @@ static void __dwc3_set_mode(struct work_struct
>> *work)
>>       if (!desired_dr_role)
>>           goto out;
>> +    /*
>> +     * STAR 5001544 - If cable disconnect doesn't generate
>> +     * disconnect event in device mode, then re-initialize the
>> +     * controller.
>> +     */
>> +    if ((dwc->cable_disconnected == true) &&
>> +        (dwc->current_dr_role == DWC3_GCTL_PRTCAP_DEVICE)) {
>> +        while (dwc->connected == true && timeout != 0) {
>> +            mdelay(10);
>> +            timeout--;
>> +        }
>> +
>> +        if (timeout == 0) {
>> +            dwc3_gadget_soft_disconnect(dwc);
>> +            udelay(100);
>> +            dwc3_gadget_soft_connect(dwc);
>> +        }
>> +    }
>> +
>>       if (desired_dr_role == dwc->current_dr_role)
>>           goto out;
>> diff --git a/drivers/usb/dwc3/core.h b/drivers/usb/dwc3/core.h
>> index c6c87acbd376..7642330cf608 100644
>> --- a/drivers/usb/dwc3/core.h
>> +++ b/drivers/usb/dwc3/core.h
>> @@ -1355,6 +1355,7 @@ struct dwc3 {
>>       int            last_fifo_depth;
>>       int            num_ep_resized;
>>       struct dentry        *debug_root;
>> +    bool            cable_disconnected;
>>   };
>>   #define INCRX_BURST_MODE 0
>> @@ -1568,6 +1569,9 @@ void dwc3_event_buffers_cleanup(struct dwc3 *dwc);
>>   int dwc3_core_soft_reset(struct dwc3 *dwc);
>> +int dwc3_gadget_soft_disconnect(struct dwc3 *dwc);
>> +int dwc3_gadget_soft_connect(struct dwc3 *dwc);
>> +
>>   #if IS_ENABLED(CONFIG_USB_DWC3_HOST) ||
>> IS_ENABLED(CONFIG_USB_DWC3_DUAL_ROLE)
>>   int dwc3_host_init(struct dwc3 *dwc);
>>   void dwc3_host_exit(struct dwc3 *dwc);
>> diff --git a/drivers/usb/dwc3/drd.c b/drivers/usb/dwc3/drd.c
>> index 039bf241769a..593c023fc39a 100644
>> --- a/drivers/usb/dwc3/drd.c
>> +++ b/drivers/usb/dwc3/drd.c
>> @@ -446,6 +446,8 @@ static int dwc3_usb_role_switch_set(struct
>> usb_role_switch *sw,
>>       struct dwc3 *dwc = usb_role_switch_get_drvdata(sw);
>>       u32 mode;
>> +    dwc->cable_disconnected = false;
>> +
>>       switch (role) {
>>       case USB_ROLE_HOST:
>>           mode = DWC3_GCTL_PRTCAP_HOST;
>> @@ -454,6 +456,9 @@ static int dwc3_usb_role_switch_set(struct
>> usb_role_switch *sw,
>>           mode = DWC3_GCTL_PRTCAP_DEVICE;
>>           break;
>>       default:
>> +        if (role == USB_ROLE_NONE)
>> +            dwc->cable_disconnected = true;
>> +
>
> How do we handle cases where role switch isn't used? (ie extcon or maybe
> no cable connection notification at all)
>

Hi Wesley,

Since I was considering fixing it during disconnect, I made it in role
switch. So no cable connection notification case has been ruled out in
this patch. But yes, extcon is a valid case. Will need to account for it.

Regards,
Krishna,

2023-10-13 03:58:39

by Krishna Kurapati

[permalink] [raw]
Subject: Re: [RFC] usb: dwc3: core: Fix RAM interface getting stuck during enumeration



On 10/13/2023 3:47 AM, Thinh Nguyen wrote:
> On Fri, Oct 13, 2023, Krishna Kurapati PSSNV wrote:
>>
>>
>> On 10/12/2023 11:29 PM, Thinh Nguyen wrote:
>>
>>>> -static int dwc3_gadget_soft_disconnect(struct dwc3 *dwc)
>>>> +int dwc3_gadget_soft_disconnect(struct dwc3 *dwc)
>>>> {
>>>> unsigned long flags;
>>>> int ret;
>>>> @@ -2701,7 +2701,7 @@ static int dwc3_gadget_soft_disconnect(struct dwc3 *dwc)
>>>> return ret;
>>>> }
>>>> -static int dwc3_gadget_soft_connect(struct dwc3 *dwc)
>>>> +int dwc3_gadget_soft_connect(struct dwc3 *dwc)
>>>> {
>>>> int ret;
>>>> @@ -3963,6 +3963,7 @@ static void dwc3_gadget_disconnect_interrupt(struct dwc3 *dwc)
>>>> dwc3_gadget_dctl_write_safe(dwc, reg);
>>>> dwc->connected = false;
>>>> + dwc->cable_disconnected = true;
>>>> dwc3_disconnect_gadget(dwc);
>>>> @@ -4038,6 +4039,7 @@ static void dwc3_gadget_reset_interrupt(struct dwc3 *dwc)
>>>> */
>>>> dwc3_stop_active_transfers(dwc);
>>>> dwc->connected = true;
>>>> + dwc->cable_disconnected = false;
>>>> reg = dwc3_readl(dwc->regs, DWC3_DCTL);
>>>> reg &= ~DWC3_DCTL_TSTCTRL_MASK;
>>>> --
>>>> 2.42.0
>>>>
>>>
>>> We can just reset the controller when there's End Transfer command
>>> timeout as a failure recovery. No need to do what you're doing here.
>>>
>> Hi Thinh,
>>
>> That was what I initially wanted to do, but there were couple of reasons I
>> wanted to take this approach:
>>
>> 1. We can't just reset the controller in midst of gadget_interrupt. We need
>> to process it completely and then take action.
>
> You can flag the driver so you can do the teardown/soft-reset at the
> appropriate time.
>
>>
>> 2. The above log was seen on QRD variant of SM8550/SM8650 easily. But on
>> other platforms of same targets, the issue comes up at some other instances
>> of code, at a point where no IRQ is running. In such cases its not possible
>> to accurately find out code portions and reset the controller. The way I
>> confirmed that both platforms are having the same issue is:
>>
>> a. During cable disconnect, I am not receiving disconnect interrupt
>> b. The reg dump is exactly same in both cases (BMU as well)
>>
>> So I felt it was better to fix it during cable disconnect because even if we
>> remove cable, we are still in device mode only and in this case we can
>> unblock suspend and also bring back controller to a known state.
>>
>> Let me know your thoughts on the above.
>>
>
> This issue happens outside of disconnect right? Did you account for port
> reset?
>
> The symptom should be the same. At some point, a command will be issued.
> If a command timed out, then something is really wrong (especially End
> Transfer command). We can attempt to recover base on this symptom then.
>
> And you don't need to poll for timeout for this specific type of error.
> Just read some known register like GSNPSID to see if it's invalid.

Hi Thinh,

Yes. It actually happens before disconnect, but I was trying to handle
it during disconnect. How about I add a check in gadget_ep_cmd saying
that if cmd timeout happens, then read the snpsid and if it reads an
invalid value, then I will call soft_disconnect followed by
soft_connect. That must cover all cases ?

Regards,
Krishna,

2023-10-15 16:50:58

by Krishna Kurapati

[permalink] [raw]
Subject: Re: [RFC] usb: dwc3: core: Fix RAM interface getting stuck during enumeration



On 10/13/2023 9:28 AM, Krishna Kurapati PSSNV wrote:
>
>
> On 10/13/2023 3:47 AM, Thinh Nguyen wrote:
>> On Fri, Oct 13, 2023, Krishna Kurapati PSSNV wrote:
>>>
>>>
>>> On 10/12/2023 11:29 PM, Thinh Nguyen wrote:
>>>
>>>>> -static int dwc3_gadget_soft_disconnect(struct dwc3 *dwc)
>>>>> +int dwc3_gadget_soft_disconnect(struct dwc3 *dwc)
>>>>>    {
>>>>>        unsigned long flags;
>>>>>        int ret;
>>>>> @@ -2701,7 +2701,7 @@ static int dwc3_gadget_soft_disconnect(struct
>>>>> dwc3 *dwc)
>>>>>        return ret;
>>>>>    }
>>>>> -static int dwc3_gadget_soft_connect(struct dwc3 *dwc)
>>>>> +int dwc3_gadget_soft_connect(struct dwc3 *dwc)
>>>>>    {
>>>>>        int ret;
>>>>> @@ -3963,6 +3963,7 @@ static void
>>>>> dwc3_gadget_disconnect_interrupt(struct dwc3 *dwc)
>>>>>        dwc3_gadget_dctl_write_safe(dwc, reg);
>>>>>        dwc->connected = false;
>>>>> +    dwc->cable_disconnected = true;
>>>>>        dwc3_disconnect_gadget(dwc);
>>>>> @@ -4038,6 +4039,7 @@ static void
>>>>> dwc3_gadget_reset_interrupt(struct dwc3 *dwc)
>>>>>         */
>>>>>        dwc3_stop_active_transfers(dwc);
>>>>>        dwc->connected = true;
>>>>> +    dwc->cable_disconnected = false;
>>>>>        reg = dwc3_readl(dwc->regs, DWC3_DCTL);
>>>>>        reg &= ~DWC3_DCTL_TSTCTRL_MASK;
>>>>> --
>>>>> 2.42.0
>>>>>
>>>>
>>>> We can just reset the controller when there's End Transfer command
>>>> timeout as a failure recovery. No need to do what you're doing here.
>>>>
>>> Hi Thinh,
>>>
>>>   That was what I initially wanted to do, but there were couple of
>>> reasons I
>>> wanted to take this approach:
>>>
>>> 1. We can't just reset the controller in midst of gadget_interrupt.
>>> We need
>>> to process it completely and then take action.
>>
>> You can flag the driver so you can do the teardown/soft-reset at the
>> appropriate time.
>>
>>>
>>> 2. The above log was seen on QRD variant of SM8550/SM8650 easily. But on
>>> other platforms of same targets, the issue comes up at some other
>>> instances
>>> of code, at a point where no IRQ is running. In such cases its not
>>> possible
>>> to accurately find out code portions and reset the controller. The way I
>>> confirmed that both platforms are having the same issue is:
>>>
>>> a. During cable disconnect, I am not receiving disconnect interrupt
>>> b. The reg dump is exactly same in both cases (BMU as well)
>>>
>>> So I felt it was better to fix it during cable disconnect because
>>> even if we
>>> remove cable, we are still in device mode only and in this case we can
>>> unblock suspend and also bring back controller to a known state.
>>>
>>> Let me know your thoughts on the above.
>>>
>>
>> This issue happens outside of disconnect right? Did you account for port
>> reset?
>>
>> The symptom should be the same. At some point, a command will be issued.
>> If a command timed out, then something is really wrong (especially End
>> Transfer command). We can attempt to recover base on this symptom then.
>>
>> And you don't need to poll for timeout for this specific type of error.
>> Just read some known register like GSNPSID to see if it's invalid.
>
> Hi Thinh,
>
>  Yes. It actually happens before disconnect, but I was trying to handle
> it during disconnect. How about I add a check in gadget_ep_cmd saying
> that if cmd timeout happens, then read the snpsid and if it reads an
> invalid value, then I will call soft_disconnect followed by
> soft_connect. That must cover all cases ?
>
Hi Thinh,

One more datapoint for putting forward the above opinion. There are two
types of situations observed during bus reset when we try to do endxfer
on ep0in followed by set_stall on ep0out:

1. If we issue endxfer for ep0in, and it times out, then the next
command set_stall is successful (only because all reads give zero and
since we check for CMDACT bit for ep0out after setting it, and since the
read of DEPCMD register gives "0" while polling for it, sw indicated
that set stall is successful. But when we check GEVTADDR at this point,
both ADDRLO and ADDRHI read zero.

2. If ram access timeout occurs while polling for CMDACT bit after
issuing endxfer on ep0in, then at that instant, polling returns true, sw
indicates that endxfer is successful but inreality when checked,
GEVTADDR LO/HI read zero.

So I think it would be better to check for validity of GEVADDR registers
at end of gadget_ep_cmd to identify id RAM access timeout happened and
call soft_disconnect followed by soft_connect.

Let me know your thoughts on this

Regards,
Krishna,

2023-10-20 22:54:31

by Thinh Nguyen

[permalink] [raw]
Subject: Re: [RFC] usb: dwc3: core: Fix RAM interface getting stuck during enumeration

On Sun, Oct 15, 2023, Krishna Kurapati PSSNV wrote:
>
>
> On 10/13/2023 9:28 AM, Krishna Kurapati PSSNV wrote:
> >
> >
> > On 10/13/2023 3:47 AM, Thinh Nguyen wrote:
> > > On Fri, Oct 13, 2023, Krishna Kurapati PSSNV wrote:
> > > >
> > > >
> > > > On 10/12/2023 11:29 PM, Thinh Nguyen wrote:
> > > >
> > > > > > -static int dwc3_gadget_soft_disconnect(struct dwc3 *dwc)
> > > > > > +int dwc3_gadget_soft_disconnect(struct dwc3 *dwc)
> > > > > >    {
> > > > > >        unsigned long flags;
> > > > > >        int ret;
> > > > > > @@ -2701,7 +2701,7 @@ static int
> > > > > > dwc3_gadget_soft_disconnect(struct dwc3 *dwc)
> > > > > >        return ret;
> > > > > >    }
> > > > > > -static int dwc3_gadget_soft_connect(struct dwc3 *dwc)
> > > > > > +int dwc3_gadget_soft_connect(struct dwc3 *dwc)
> > > > > >    {
> > > > > >        int ret;
> > > > > > @@ -3963,6 +3963,7 @@ static void
> > > > > > dwc3_gadget_disconnect_interrupt(struct dwc3 *dwc)
> > > > > >        dwc3_gadget_dctl_write_safe(dwc, reg);
> > > > > >        dwc->connected = false;
> > > > > > +    dwc->cable_disconnected = true;
> > > > > >        dwc3_disconnect_gadget(dwc);
> > > > > > @@ -4038,6 +4039,7 @@ static void
> > > > > > dwc3_gadget_reset_interrupt(struct dwc3 *dwc)
> > > > > >         */
> > > > > >        dwc3_stop_active_transfers(dwc);
> > > > > >        dwc->connected = true;
> > > > > > +    dwc->cable_disconnected = false;
> > > > > >        reg = dwc3_readl(dwc->regs, DWC3_DCTL);
> > > > > >        reg &= ~DWC3_DCTL_TSTCTRL_MASK;
> > > > > > --
> > > > > > 2.42.0
> > > > > >
> > > > >
> > > > > We can just reset the controller when there's End Transfer command
> > > > > timeout as a failure recovery. No need to do what you're doing here.
> > > > >
> > > > Hi Thinh,
> > > >
> > > >   That was what I initially wanted to do, but there were couple
> > > > of reasons I
> > > > wanted to take this approach:
> > > >
> > > > 1. We can't just reset the controller in midst of
> > > > gadget_interrupt. We need
> > > > to process it completely and then take action.
> > >
> > > You can flag the driver so you can do the teardown/soft-reset at the
> > > appropriate time.
> > >
> > > >
> > > > 2. The above log was seen on QRD variant of SM8550/SM8650 easily. But on
> > > > other platforms of same targets, the issue comes up at some
> > > > other instances
> > > > of code, at a point where no IRQ is running. In such cases its
> > > > not possible
> > > > to accurately find out code portions and reset the controller. The way I
> > > > confirmed that both platforms are having the same issue is:
> > > >
> > > > a. During cable disconnect, I am not receiving disconnect interrupt
> > > > b. The reg dump is exactly same in both cases (BMU as well)
> > > >
> > > > So I felt it was better to fix it during cable disconnect
> > > > because even if we
> > > > remove cable, we are still in device mode only and in this case we can
> > > > unblock suspend and also bring back controller to a known state.
> > > >
> > > > Let me know your thoughts on the above.
> > > >
> > >
> > > This issue happens outside of disconnect right? Did you account for port
> > > reset?
> > >
> > > The symptom should be the same. At some point, a command will be issued.
> > > If a command timed out, then something is really wrong (especially End
> > > Transfer command). We can attempt to recover base on this symptom then.
> > >
> > > And you don't need to poll for timeout for this specific type of error.
> > > Just read some known register like GSNPSID to see if it's invalid.
> >
> > Hi Thinh,
> >
> >  Yes. It actually happens before disconnect, but I was trying to handle
> > it during disconnect. How about I add a check in gadget_ep_cmd saying
> > that if cmd timeout happens, then read the snpsid and if it reads an
> > invalid value, then I will call soft_disconnect followed by
> > soft_connect. That must cover all cases ?
> >
> Hi Thinh,
>
> One more datapoint for putting forward the above opinion. There are two
> types of situations observed during bus reset when we try to do endxfer on
> ep0in followed by set_stall on ep0out:
>
> 1. If we issue endxfer for ep0in, and it times out, then the next command
> set_stall is successful (only because all reads give zero and since we check
> for CMDACT bit for ep0out after setting it, and since the read of DEPCMD
> register gives "0" while polling for it, sw indicated that set stall is
> successful. But when we check GEVTADDR at this point, both ADDRLO and ADDRHI
> read zero.
>
> 2. If ram access timeout occurs while polling for CMDACT bit after issuing
> endxfer on ep0in, then at that instant, polling returns true, sw indicates
> that endxfer is successful but inreality when checked, GEVTADDR LO/HI read
> zero.
>
> So I think it would be better to check for validity of GEVADDR registers at
> end of gadget_ep_cmd to identify id RAM access timeout happened and call
> soft_disconnect followed by soft_connect.
>
> Let me know your thoughts on this
>

In this case, the check can be simpler. You can just read back and check
for the command type of DEPCMD or DGCMD when polling for CMDACT to see
if it matches the command you initiated. If not, something is wrong and
you can perform the soft disconnect connect to recover.

Thanks,
Thinh