Fix soft lockup when resetting remote device attached
to usb host. Configuration:
pppd -> musb hub -> usb-serial -> gsm modem
When the gsm modem resets, musb is flooded with incoming rx interrupts,
which leaves no time for other applications, and as a result the system
locks up completely. The solution is to keep the original logic for
RXCSR_H_ERROR and merge the RXCSR_DATAERROR and RXCSR_H_ERROR branches so
that they call the same code for setting rx stall with MUSB_RXCSR_H_WZC_BITS.
Signed-off-by: Max Uvarov <[email protected]>
---
v2: use bitwise or for error flags before logical and. (Sergei Shtylyov).
drivers/usb/musb/musb_host.c | 12 +++++-------
1 file changed, 5 insertions(+), 7 deletions(-)
diff --git a/drivers/usb/musb/musb_host.c b/drivers/usb/musb/musb_host.c
index c3d5fc9..2d9aa78 100644
--- a/drivers/usb/musb/musb_host.c
+++ b/drivers/usb/musb/musb_host.c
@@ -1592,14 +1592,12 @@ void musb_host_rx(struct musb *musb, u8 epnum)
/* stall; record URB status */
status = -EPIPE;
+ } else if (rx_csr & (MUSB_RXCSR_DATAERROR | MUSB_RXCSR_H_ERROR)) {
- } else if (rx_csr & MUSB_RXCSR_H_ERROR) {
- dev_dbg(musb->controller, "end %d RX proto error\n", epnum);
-
- status = -EPROTO;
- musb_writeb(epio, MUSB_RXINTERVAL, 0);
-
- } else if (rx_csr & MUSB_RXCSR_DATAERROR) {
+ if (rx_csr & MUSB_RXCSR_H_ERROR) {
+ status = -EPROTO;
+ musb_writeb(epio, MUSB_RXINTERVAL, 0);
+ }
if (USB_ENDPOINT_XFER_ISOC != qh->type) {
dev_dbg(musb->controller, "RX end %d NAK timeout\n", epnum);
--
1.9.1
Hi,
On Wed, Apr 27, 2016 at 09:51:58AM +0300, Max Uvarov wrote:
> Fix soft lockup when resetting remote device attached
> to usb host. Configuration:
> pppd -> musb hub -> usb-serial -> gsm modem
I have heard a few reports of similar symptoms, but have never been able
to reproduce it on my side.
> When gsm modem resets, musb rolls in incoming rx interrupts
> which does not give any time to other application as result
> it totally lock ups. Solution is to keep original logic for RXCSR_H_ERROR
Have you checked exactly where in the interrupt routine the execution
gets stuck?
> and merge RXCSR_DATAERROR and RXCSR_H_ERROR branches to call same code
> for setting rx stall with MUSB_RXCSR_H_WZC_BITS.
MUSB_RXCSR_H_WZC_BITS itself does not set rx stall; it just ensures that
MUSB_RXCSR_H_RXSTALL is not cleared. Please check the comment on it in
musb_regs.h.
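To make the write-zero-to-clear point concrete, here is a minimal
user-space sketch, not driver code. It assumes the semantics given in the
musb_regs.h comment as I understand it (writing 0 clears a WZC bit,
writing 1 leaves it unchanged). The bit values and the helper
model_rxcsr_write() are assumptions made for illustration only.

```c
#include <assert.h>
#include <stdint.h>

/* Bit values assumed for illustration; check musb_regs.h for the real ones. */
#define RXCSR_RXPKTRDY   0x0001u
#define RXCSR_H_ERROR    0x0004u
#define RXCSR_DATAERROR  0x0008u
#define RXCSR_H_REQPKT   0x0020u
#define RXCSR_H_RXSTALL  0x0040u
#define RXCSR_H_WZC_BITS (RXCSR_H_RXSTALL | RXCSR_H_ERROR | \
			  RXCSR_DATAERROR | RXCSR_RXPKTRDY)

/*
 * Hypothetical model of a write to RXCSR: takes the current hardware
 * value and the value written by the CPU, returns the new hardware value.
 */
static uint16_t model_rxcsr_write(uint16_t hw, uint16_t val)
{
	/* WZC bits: a 0 clears the bit, a 1 keeps whatever was latched. */
	uint16_t wzc = hw & val & RXCSR_H_WZC_BITS;
	/* All other bits simply take the written value. */
	uint16_t plain = val & (uint16_t)~RXCSR_H_WZC_BITS;

	return (uint16_t)(wzc | plain);
}
```

In this model, writing MUSB_RXCSR_H_WZC_BITS together with other bits
never sets RXSTALL or H_ERROR; it only avoids clearing them. Clearing
H_ERROR needs a write with a 0 in that bit position.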
>
> Signed-off-by: Max Uvarov <[email protected]>
> ---
> v2: use bitwise or for error flags before logical and. (Sergei Shtylyov).
>
> drivers/usb/musb/musb_host.c | 12 +++++-------
> 1 file changed, 5 insertions(+), 7 deletions(-)
>
> diff --git a/drivers/usb/musb/musb_host.c b/drivers/usb/musb/musb_host.c
> index c3d5fc9..2d9aa78 100644
> --- a/drivers/usb/musb/musb_host.c
> +++ b/drivers/usb/musb/musb_host.c
> @@ -1592,14 +1592,12 @@ void musb_host_rx(struct musb *musb, u8 epnum)
Which kernel do you use? This line number is way off from the upstream kernel.
>
> /* stall; record URB status */
> status = -EPIPE;
> + } else if (rx_csr & (MUSB_RXCSR_DATAERROR | MUSB_RXCSR_H_ERROR)) {
>
> - } else if (rx_csr & MUSB_RXCSR_H_ERROR) {
> - dev_dbg(musb->controller, "end %d RX proto error\n", epnum);
> -
> - status = -EPROTO;
> - musb_writeb(epio, MUSB_RXINTERVAL, 0);
> -
> - } else if (rx_csr & MUSB_RXCSR_DATAERROR) {
> + if (rx_csr & MUSB_RXCSR_H_ERROR) {
> + status = -EPROTO;
> + musb_writeb(epio, MUSB_RXINTERVAL, 0);
> + }
Please help me understand how this change fixes the issue. As far as I
can see, the main effect of the change is that it goes directly to 'goto
finish', so the 'done' flag is not set and musb_advance_schedule() is not
called. Is this the case, or have I missed other important pieces?
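The control flow in question can be sketched in plain C as below. This is
a simplified model of what is described above, not the actual
musb_host_rx() code; model_host_rx(), model_advance_schedule() and the
advance_calls counter are hypothetical names used only for illustration.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical stand-in for musb_advance_schedule(); it only counts calls. */
static int advance_calls;

static void model_advance_schedule(void)
{
	advance_calls++;
}

/*
 * Simplified shape of the handler tail: a fault normally sets 'done',
 * and 'done' is what causes the next URB to be scheduled.
 */
static void model_host_rx(bool fault, bool early_goto_finish)
{
	bool done = false;

	if (fault && early_goto_finish)
		goto finish;	/* skips the code that would set 'done' */

	if (fault)
		done = true;	/* a fault aborts the transfer */

finish:
	if (done)
		model_advance_schedule();
}
```

In this model a fault that jumps straight to 'finish' never reaches the
scheduling call, while a fault that falls through does.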
Thanks,
-Bin.
>
> if (USB_ENDPOINT_XFER_ISOC != qh->type) {
> dev_dbg(musb->controller, "RX end %d NAK timeout\n", epnum);
> --
> 1.9.1
>
> --
> To unsubscribe from this list: send the line "unsubscribe linux-usb" in
> the body of a message to [email protected]
> More majordomo info at http://vger.kernel.org/majordomo-info.html
2016-04-27 18:46 GMT+03:00 Bin Liu <[email protected]>:
> Hi,
>
> On Wed, Apr 27, 2016 at 09:51:58AM +0300, Max Uvarov wrote:
>> Fix soft lockup when resetting remote device attached
>> to usb host. Configuration:
>> pppd -> musb hub -> usb-serial -> gsm modem
>
> I have heard a few reports similar to this symptom, but never been able
> to reproduce it on my side.
>
OK, I can reproduce it quite easily.
>> When gsm modem resets, musb rolls in incoming rx interrupts
>> which does not give any time to other application as result
>> it totally lock ups. Solution is to keep original logic for RXCSR_H_ERROR
>
> Have you looked where exact place in the interrupt routine the execution
> has stuck in?
>
It does not get stuck. It reaches the line that prints the proto error
over and over again, and nothing stops that. After some time the kernel
reports a lockup. But it is not actually stuck; all CPU time is consumed
by executing that handler.
>> and merge RXCSR_DATAERROR and RXCSR_H_ERROR branches to call same code
>> for setting rx stall with MUSB_RXCSR_H_WZC_BITS.
>
> MUSB_RXCSR_H_WZC_BITS itself does not set rx stall, it just ensures
> MUSB_RXCSR_H_RXSTALL not to be cleared. Please check its comment in
> musb_regs.h.
>
>>
>> Signed-off-by: Max Uvarov <[email protected]>
>> ---
>> v2: use bitwise or for error flags before logical and. (Sergei Shtylyov).
>>
>> drivers/usb/musb/musb_host.c | 12 +++++-------
>> 1 file changed, 5 insertions(+), 7 deletions(-)
>>
>> diff --git a/drivers/usb/musb/musb_host.c b/drivers/usb/musb/musb_host.c
>> index c3d5fc9..2d9aa78 100644
>> --- a/drivers/usb/musb/musb_host.c
>> +++ b/drivers/usb/musb/musb_host.c
>> @@ -1592,14 +1592,12 @@ void musb_host_rx(struct musb *musb, u8 epnum)
>
> What kernel do you use? This line # is away off from upstream kernel.
>
I made this patch for 4.1, but 4.6 has the same problem and the patch
applies cleanly to the latest torvalds/linux.git v4.6-rc5. This interrupt
handler has the same code. It looks like everything worked on 3.14. I
don't have time to diff the two versions. It might be a regression.
>>
>> /* stall; record URB status */
>> status = -EPIPE;
>> + } else if (rx_csr & (MUSB_RXCSR_DATAERROR | MUSB_RXCSR_H_ERROR)) {
>>
>> - } else if (rx_csr & MUSB_RXCSR_H_ERROR) {
>> - dev_dbg(musb->controller, "end %d RX proto error\n", epnum);
>> -
>> - status = -EPROTO;
>> - musb_writeb(epio, MUSB_RXINTERVAL, 0);
>> -
>> - } else if (rx_csr & MUSB_RXCSR_DATAERROR) {
>> + if (rx_csr & MUSB_RXCSR_H_ERROR) {
>> + status = -EPROTO;
>> + musb_writeb(epio, MUSB_RXINTERVAL, 0);
>> + }
>
> Please help me to understand how this change fixes the issue. I see the
> most effect of the change here is directly 'goto finish' so that 'done'
> flag is not set, then musb_advance_schedule() is not called. Is this the
> case or I missed other important pieces?
>
Right, that is the goal. On this rxcsr_h_error the kernel reschedules the
current interrupt, and that continues forever. For example, adding
msleep() can give some time to other processes. I'm not an expert on this
chip, but I think the right solution in that case is not to try to
reschedule but to exit quickly and allow the hub to reset and initialize
all devices again (in my case ppp/pppd also shuts down and then I bring
everything up with a script). The behavior is the same in dma and pio mode.
Regards,
Max.
> Thanks,
> -Bin.
>
>>
>> if (USB_ENDPOINT_XFER_ISOC != qh->type) {
>> dev_dbg(musb->controller, "RX end %d NAK timeout\n", epnum);
>> --
>> 1.9.1
>>
--
Best regards,
Maxim Uvarov
Hi,
On Wed, Apr 27, 2016 at 09:26:10PM +0300, Maxim Uvarov wrote:
> 2016-04-27 18:46 GMT+03:00 Bin Liu <[email protected]>:
> > Hi,
> >
> > On Wed, Apr 27, 2016 at 09:51:58AM +0300, Max Uvarov wrote:
> >> Fix soft lockup when resetting remote device attached
> >> to usb host. Configuration:
> >> pppd -> musb hub -> usb-serial -> gsm modem
> >
> > I have heard a few reports similar to this symptom, but never been able
> > to reproduce it on my side.
> >
>
> Ok, I can reproduce it almost very easy.
>
> >> When gsm modem resets, musb rolls in incoming rx interrupts
> >> which does not give any time to other application as result
> >> it totally lock ups. Solution is to keep original logic for RXCSR_H_ERROR
> >
> > Have you looked where exact place in the interrupt routine the execution
> > has stuck in?
> >
>
> It does not stuck. It goes to that line which print proto error over
> and over again and
> nothing stops that. After some time kernel reports lockup. But
> actually it's not stuck,
> all cpu time was eaten by executing that handlers.
>
>
> >> and merge RXCSR_DATAERROR and RXCSR_H_ERROR branches to call same code
> >> for setting rx stall with MUSB_RXCSR_H_WZC_BITS.
> >
> > MUSB_RXCSR_H_WZC_BITS itself does not set rx stall, it just ensures
> > MUSB_RXCSR_H_RXSTALL not to be cleared. Please check its comment in
> > musb_regs.h.
> >
> >>
> >> Signed-off-by: Max Uvarov <[email protected]>
> >> ---
> >> v2: use bitwise or for error flags before logical and. (Sergei Shtylyov).
> >>
> >> drivers/usb/musb/musb_host.c | 12 +++++-------
> >> 1 file changed, 5 insertions(+), 7 deletions(-)
> >>
> >> diff --git a/drivers/usb/musb/musb_host.c b/drivers/usb/musb/musb_host.c
> >> index c3d5fc9..2d9aa78 100644
> >> --- a/drivers/usb/musb/musb_host.c
> >> +++ b/drivers/usb/musb/musb_host.c
> >> @@ -1592,14 +1592,12 @@ void musb_host_rx(struct musb *musb, u8 epnum)
> >
> > What kernel do you use? This line # is away off from upstream kernel.
> >
>
> I did this patch for 4.1 but 4.6 has the same problem and patch
> cleanly applies to the latest torvalds/linux.git v4.6-rc5. This
> interrupt handler has the same code. And looks like on 3.14
Yeah, this code hasn't been changed for years. But in general, it is
preferred to create patches on the latest kernel to avoid other headaches.
> everything worked. I don't have a time to diff 2 versions. Might be
> regression.
>
>
> >>
> >> /* stall; record URB status */
> >> status = -EPIPE;
> >> + } else if (rx_csr & (MUSB_RXCSR_DATAERROR | MUSB_RXCSR_H_ERROR)) {
> >>
> >> - } else if (rx_csr & MUSB_RXCSR_H_ERROR) {
> >> - dev_dbg(musb->controller, "end %d RX proto error\n", epnum);
> >> -
> >> - status = -EPROTO;
> >> - musb_writeb(epio, MUSB_RXINTERVAL, 0);
> >> -
> >> - } else if (rx_csr & MUSB_RXCSR_DATAERROR) {
> >> + if (rx_csr & MUSB_RXCSR_H_ERROR) {
> >> + status = -EPROTO;
> >> + musb_writeb(epio, MUSB_RXINTERVAL, 0);
> >> + }
> >
> > Please help me to understand how this change fixes the issue. I see the
> > most effect of the change here is directly 'goto finish' so that 'done'
> > flag is not set, then musb_advance_schedule() is not called. Is this the
> > case or I missed other important pieces?
> >
>
> Right that is the goal. On this rxcsr_h_error kernel reschedules
> current interrupt. And that continues forever. For example adding
The MUSB Programming Guide says the CPU should clear this
MUSB_RXCSR_H_ERROR bit, but the current driver doesn't. I am wondering if
this causes the controller to keep generating the same interrupt. Can you
please try the following change instead to see if the lockup goes away?
@@ -1870,6 +1870,9 @@ void musb_host_rx(struct musb *musb, u8 epnum)
status = -EPROTO;
musb_writeb(epio, MUSB_RXINTERVAL, 0);
+ rx_csr &= ~MUSB_RXCSR_H_ERROR;
+ musb_writew(epio, MUSB_RXCSR, rx_csr);
+
} else if (rx_csr & MUSB_RXCSR_DATAERROR) {
if (USB_ENDPOINT_XFER_ISOC != qh->type) {
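The hypothesis behind this change can be illustrated with a small
user-space model. It assumes, as a guess rather than a confirmed fact
about the controller, that the interrupt stays pending for as long as the
error bit remains set. The bit value and the functions model_handler()
and model_irq_runs() are hypothetical and exist only for this sketch.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

#define RXCSR_H_ERROR 0x0004u	/* value assumed for illustration */

/* Hypothetical handler: optionally clears the error bit it has seen. */
static uint16_t model_handler(uint16_t rx_csr, bool clear_error)
{
	if ((rx_csr & RXCSR_H_ERROR) && clear_error)
		rx_csr &= (uint16_t)~RXCSR_H_ERROR;
	return rx_csr;
}

/*
 * Count how often the handler runs while the (assumed) interrupt stays
 * pending. 'limit' bounds the loop so that the model always terminates.
 */
static int model_irq_runs(bool clear_error, int limit)
{
	uint16_t rx_csr = RXCSR_H_ERROR;
	int runs = 0;

	while ((rx_csr & RXCSR_H_ERROR) && runs < limit) {
		rx_csr = model_handler(rx_csr, clear_error);
		runs++;
	}
	return runs;
}
```

Under that assumption, a handler that clears the bit runs once, while a
handler that leaves it set keeps running until the artificial limit is
reached, which would match the flood of proto error messages reported.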
Regards,
-Bin.
> msleep() can give some time for other processes. I'm not an expert in
> this chip but I think that right solution in that case is not try to
> reschedule and quick and allow hub to make reset and once again init
> all devices (in my case ppp/pppd also shutdowns and then I bring
> everything up with script.). The same behavior with dma and pio mode.
>
> Regards,
> Max.
>
> > Thanks,
> > -Bin.
> >
> >>
> >> if (USB_ENDPOINT_XFER_ISOC != qh->type) {
> >> dev_dbg(musb->controller, "RX end %d NAK timeout\n", epnum);
> >> --
> >> 1.9.1
> >>
> >> --
> >> To unsubscribe from this list: send the line "unsubscribe linux-usb" in
> >> the body of a message to [email protected]
> >> More majordomo info at http://vger.kernel.org/majordomo-info.html
>
>
>
> --
> Best regards,
> Maxim Uvarov
Hi,
On Wed, Apr 27, 2016 at 02:13:56PM -0500, Bin Liu wrote:
> Hi,
>
> On Wed, Apr 27, 2016 at 09:26:10PM +0300, Maxim Uvarov wrote:
> > 2016-04-27 18:46 GMT+03:00 Bin Liu <[email protected]>:
> > > Hi,
> > >
> > > On Wed, Apr 27, 2016 at 09:51:58AM +0300, Max Uvarov wrote:
> > >> Fix soft lockup when resetting remote device attached
> > >> to usb host. Configuration:
> > >> pppd -> musb hub -> usb-serial -> gsm modem
> > >
> > > I have heard a few reports similar to this symptom, but never been able
> > > to reproduce it on my side.
> > >
> >
> > Ok, I can reproduce it almost very easy.
> >
> > >> When gsm modem resets, musb rolls in incoming rx interrupts
> > >> which does not give any time to other application as result
> > >> it totally lock ups. Solution is to keep original logic for RXCSR_H_ERROR
> > >
> > > Have you looked where exact place in the interrupt routine the execution
> > > has stuck in?
> > >
> >
> > It does not stuck. It goes to that line which print proto error over
> > and over again and
> > nothing stops that. After some time kernel reports lockup. But
> > actually it's not stuck,
> > all cpu time was eaten by executing that handlers.
> >
> >
> > >> and merge RXCSR_DATAERROR and RXCSR_H_ERROR branches to call same code
> > >> for setting rx stall with MUSB_RXCSR_H_WZC_BITS.
> > >
> > > MUSB_RXCSR_H_WZC_BITS itself does not set rx stall, it just ensures
> > > MUSB_RXCSR_H_RXSTALL not to be cleared. Please check its comment in
> > > musb_regs.h.
> > >
> > >>
> > >> Signed-off-by: Max Uvarov <[email protected]>
> > >> ---
> > >> v2: use bitwise or for error flags before logical and. (Sergei Shtylyov).
> > >>
> > >> drivers/usb/musb/musb_host.c | 12 +++++-------
> > >> 1 file changed, 5 insertions(+), 7 deletions(-)
> > >>
> > >> diff --git a/drivers/usb/musb/musb_host.c b/drivers/usb/musb/musb_host.c
> > >> index c3d5fc9..2d9aa78 100644
> > >> --- a/drivers/usb/musb/musb_host.c
> > >> +++ b/drivers/usb/musb/musb_host.c
> > >> @@ -1592,14 +1592,12 @@ void musb_host_rx(struct musb *musb, u8 epnum)
> > >
> > > What kernel do you use? This line # is away off from upstream kernel.
> > >
> >
> > I did this patch for 4.1 but 4.6 has the same problem and patch
> > cleanly applies to the latest torvalds/linux.git v4.6-rc5. This
> > interrupt handler has the same code. And looks like on 3.14
>
> Yeah, this code hasn't been changed for years. But in general, it is
> preferred to create patches on the latest kernel to avoid other headaches.
>
> > everything worked. I don't have a time to diff 2 versions. Might be
> > regression.
> >
> >
> > >>
> > >> /* stall; record URB status */
> > >> status = -EPIPE;
> > >> + } else if (rx_csr & (MUSB_RXCSR_DATAERROR | MUSB_RXCSR_H_ERROR)) {
> > >>
> > >> - } else if (rx_csr & MUSB_RXCSR_H_ERROR) {
> > >> - dev_dbg(musb->controller, "end %d RX proto error\n", epnum);
> > >> -
> > >> - status = -EPROTO;
> > >> - musb_writeb(epio, MUSB_RXINTERVAL, 0);
> > >> -
> > >> - } else if (rx_csr & MUSB_RXCSR_DATAERROR) {
> > >> + if (rx_csr & MUSB_RXCSR_H_ERROR) {
> > >> + status = -EPROTO;
> > >> + musb_writeb(epio, MUSB_RXINTERVAL, 0);
> > >> + }
> > >
> > > Please help me to understand how this change fixes the issue. I see the
> > > most effect of the change here is directly 'goto finish' so that 'done'
> > > flag is not set, then musb_advance_schedule() is not called. Is this the
> > > case or I missed other important pieces?
> > >
> >
> > Right that is the goal. On this rxcsr_h_error kernel reschedules
> > current interrupt. And that continues forever. For example adding
>
> The MUSB Programming Guide says CPU should clear this MUSB_RXCSR_H_ERROR
> bit, but the current driver doesn't. I am wondering if this causes the
> controller keeps generating the same interrupt. Can you please try the
> following change instead to see if the lockup goes away?
>
> @@ -1870,6 +1870,9 @@ void musb_host_rx(struct musb *musb, u8 epnum)
> status = -EPROTO;
> musb_writeb(epio, MUSB_RXINTERVAL, 0);
>
> + rx_csr &= ~MUSB_RXCSR_H_ERROR;
> + musb_writew(epio, MUSB_RXCSR, rx_csr);
+ goto finish;
Please also add the line above. I will spend more time to understand
what is happening...
First of all, I don't like the idea of merging the two branches, it
makes the code ugly.
Regards,
-Bin.
> +
> } else if (rx_csr & MUSB_RXCSR_DATAERROR) {
>
> if (USB_ENDPOINT_XFER_ISOC != qh->type) {
>
> Regards,
> -Bin.
>
> > msleep() can give some time for other processes. I'm not an expert in
> > this chip but I think that right solution in that case is not try to
> > reschedule and quick and allow hub to make reset and once again init
> > all devices (in my case ppp/pppd also shutdowns and then I bring
> > everything up with script.). The same behavior with dma and pio mode.
> >
> > Regards,
> > Max.
> >
> > > Thanks,
> > > -Bin.
> > >
> > >>
> > >> if (USB_ENDPOINT_XFER_ISOC != qh->type) {
> > >> dev_dbg(musb->controller, "RX end %d NAK timeout\n", epnum);
> > >> --
> > >> 1.9.1
> > >>
> > >> --
> > >> To unsubscribe from this list: send the line "unsubscribe linux-usb" in
> > >> the body of a message to [email protected]
> > >> More majordomo info at http://vger.kernel.org/majordomo-info.html
> >
> >
> >
> > --
> > Best regards,
> > Maxim Uvarov
2016-04-28 0:28 GMT+03:00 Bin Liu <[email protected]>:
> Hi,
>
> On Wed, Apr 27, 2016 at 02:13:56PM -0500, Bin Liu wrote:
>> Hi,
>>
>> On Wed, Apr 27, 2016 at 09:26:10PM +0300, Maxim Uvarov wrote:
>> > 2016-04-27 18:46 GMT+03:00 Bin Liu <[email protected]>:
>> > > Hi,
>> > >
>> > > On Wed, Apr 27, 2016 at 09:51:58AM +0300, Max Uvarov wrote:
>> > >> Fix soft lockup when resetting remote device attached
>> > >> to usb host. Configuration:
>> > >> pppd -> musb hub -> usb-serial -> gsm modem
>> > >
>> > > I have heard a few reports similar to this symptom, but never been able
>> > > to reproduce it on my side.
>> > >
>> >
>> > Ok, I can reproduce it almost very easy.
>> >
>> > >> When gsm modem resets, musb rolls in incoming rx interrupts
>> > >> which does not give any time to other application as result
>> > >> it totally lock ups. Solution is to keep original logic for RXCSR_H_ERROR
>> > >
>> > > Have you looked where exact place in the interrupt routine the execution
>> > > has stuck in?
>> > >
>> >
>> > It does not stuck. It goes to that line which print proto error over
>> > and over again and
>> > nothing stops that. After some time kernel reports lockup. But
>> > actually it's not stuck,
>> > all cpu time was eaten by executing that handlers.
>> >
>> >
>> > >> and merge RXCSR_DATAERROR and RXCSR_H_ERROR branches to call same code
>> > >> for setting rx stall with MUSB_RXCSR_H_WZC_BITS.
>> > >
>> > > MUSB_RXCSR_H_WZC_BITS itself does not set rx stall, it just ensures
>> > > MUSB_RXCSR_H_RXSTALL not to be cleared. Please check its comment in
>> > > musb_regs.h.
>> > >
>> > >>
>> > >> Signed-off-by: Max Uvarov <[email protected]>
>> > >> ---
>> > >> v2: use bitwise or for error flags before logical and. (Sergei Shtylyov).
>> > >>
>> > >> drivers/usb/musb/musb_host.c | 12 +++++-------
>> > >> 1 file changed, 5 insertions(+), 7 deletions(-)
>> > >>
>> > >> diff --git a/drivers/usb/musb/musb_host.c b/drivers/usb/musb/musb_host.c
>> > >> index c3d5fc9..2d9aa78 100644
>> > >> --- a/drivers/usb/musb/musb_host.c
>> > >> +++ b/drivers/usb/musb/musb_host.c
>> > >> @@ -1592,14 +1592,12 @@ void musb_host_rx(struct musb *musb, u8 epnum)
>> > >
>> > > What kernel do you use? This line # is away off from upstream kernel.
>> > >
>> >
>> > I did this patch for 4.1 but 4.6 has the same problem and patch
>> > cleanly applies to the latest torvalds/linux.git v4.6-rc5. This
>> > interrupt handler has the same code. And looks like on 3.14
>>
>> Yeah, this code hasn't been changed for years. But in general, it is
>> preferred to create patches on the latest kernel to avoid other headaches.
>>
>> > everything worked. I don't have a time to diff 2 versions. Might be
>> > regression.
>> >
>> >
>> > >>
>> > >> /* stall; record URB status */
>> > >> status = -EPIPE;
>> > >> + } else if (rx_csr & (MUSB_RXCSR_DATAERROR | MUSB_RXCSR_H_ERROR)) {
>> > >>
>> > >> - } else if (rx_csr & MUSB_RXCSR_H_ERROR) {
>> > >> - dev_dbg(musb->controller, "end %d RX proto error\n", epnum);
>> > >> -
>> > >> - status = -EPROTO;
>> > >> - musb_writeb(epio, MUSB_RXINTERVAL, 0);
>> > >> -
>> > >> - } else if (rx_csr & MUSB_RXCSR_DATAERROR) {
>> > >> + if (rx_csr & MUSB_RXCSR_H_ERROR) {
>> > >> + status = -EPROTO;
>> > >> + musb_writeb(epio, MUSB_RXINTERVAL, 0);
>> > >> + }
>> > >
>> > > Please help me to understand how this change fixes the issue. I see the
>> > > most effect of the change here is directly 'goto finish' so that 'done'
>> > > flag is not set, then musb_advance_schedule() is not called. Is this the
>> > > case or I missed other important pieces?
>> > >
>> >
>> > Right that is the goal. On this rxcsr_h_error kernel reschedules
>> > current interrupt. And that continues forever. For example adding
>>
>> The MUSB Programming Guide says CPU should clear this MUSB_RXCSR_H_ERROR
>> bit, but the current driver doesn't. I am wondering if this causes the
>> controller keeps generating the same interrupt. Can you please try the
>> following change instead to see if the lockup goes away?
>>
>> @@ -1870,6 +1870,9 @@ void musb_host_rx(struct musb *musb, u8 epnum)
>> status = -EPROTO;
>> musb_writeb(epio, MUSB_RXINTERVAL, 0);
>>
>> + rx_csr &= ~MUSB_RXCSR_H_ERROR;
>> + musb_writew(epio, MUSB_RXCSR, rx_csr);
>
> + goto finish;
>
> Please also add the line above. I will spend more time to understand
> what is happening...
>
Hello Bin,
yes, it also works with that reset and the goto finish:
diff --git a/drivers/usb/musb/musb_host.c b/drivers/usb/musb/musb_host.c
index c3d5fc9..8cd98e7 100644
--- a/drivers/usb/musb/musb_host.c
+++ b/drivers/usb/musb/musb_host.c
@@ -1599,6 +1599,10 @@ void musb_host_rx(struct musb *musb, u8 epnum)
status = -EPROTO;
musb_writeb(epio, MUSB_RXINTERVAL, 0);
+ rx_csr &= ~MUSB_RXCSR_H_ERROR;
+ musb_writew(epio, MUSB_RXCSR, rx_csr);
+
+ goto finish;
} else if (rx_csr & MUSB_RXCSR_DATAERROR) {
if (USB_ENDPOINT_XFER_ISOC != qh->type) {
That, I think, is the key thing, and it is done for the other errors. If
that change is good for you then I'm also happy with it.
I'm also not sure whether musb_writeb(epio, MUSB_RXINTERVAL, 0); is needed.
In my case the result is the same with and without it.
In other scenarios it might be reasonable...
> First of all, I don't like the idea of merging the two branches, it
> makes the code ugly.
Yes, I don't like that function at all; it's too long and difficult to
read when you look at it for the first time. It would be good to split it
into 3 small functions, one for each big if.
Maxim.
>
> Regards,
> -Bin.
>
>> +
>> } else if (rx_csr & MUSB_RXCSR_DATAERROR) {
>>
>> if (USB_ENDPOINT_XFER_ISOC != qh->type) {
>>
>> Regards,
>> -Bin.
>>
>> > msleep() can give some time for other processes. I'm not an expert in
>> > this chip but I think that right solution in that case is not try to
>> > reschedule and quick and allow hub to make reset and once again init
>> > all devices (in my case ppp/pppd also shutdowns and then I bring
>> > everything up with script.). The same behavior with dma and pio mode.
>> >
>> > Regards,
>> > Max.
>> >
>> > > Thanks,
>> > > -Bin.
>> > >
>> > >>
>> > >> if (USB_ENDPOINT_XFER_ISOC != qh->type) {
>> > >> dev_dbg(musb->controller, "RX end %d NAK timeout\n", epnum);
>> > >> --
>> > >> 1.9.1
>> > >>
>> >
>> >
>> >
>> > --
>> > Best regards,
>> > Maxim Uvarov
--
Best regards,
Maxim Uvarov
Hi,
On Thu, Apr 28, 2016 at 09:51:37AM +0300, Maxim Uvarov wrote:
[snip]
> Hello Bin,
>
> yes, it also works with that reset and go to finish:
>
> diff --git a/drivers/usb/musb/musb_host.c b/drivers/usb/musb/musb_host.c
> index c3d5fc9..8cd98e7 100644
> --- a/drivers/usb/musb/musb_host.c
> +++ b/drivers/usb/musb/musb_host.c
> @@ -1599,6 +1599,10 @@ void musb_host_rx(struct musb *musb, u8 epnum)
> status = -EPROTO;
> musb_writeb(epio, MUSB_RXINTERVAL, 0);
>
> + rx_csr &= ~MUSB_RXCSR_H_ERROR;
> + musb_writew(epio, MUSB_RXCSR, rx_csr);
> +
> + goto finish;
> } else if (rx_csr & MUSB_RXCSR_DATAERROR) {
>
> if (USB_ENDPOINT_XFER_ISOC != qh->type) {
>
Thanks for testing it.
>
> That I think a key thing, which is done in other error. If that change
> is good for you than I'm also happy with it.
We need to understand why the controller keeps generating the same
interrupt in order to come up with a proper fix.
I will take a look. But I can only use my spare time on this, so please
be patient.
>
> I also not sure if musb_writeb(epio, MUSB_RXINTERVAL, 0); is needed.
> In my case it's the same result with it and without it.
> In other scenarios might be reasonable...
It disables NAK timeout.
>
>
> > First of all, I don't like the idea of merging the two branches, it
> > makes the code ugly.
>
> Yes, I don't like that function at all, it's too long and difficult to
> read if you first look on it first time. It will be good to split it
> on 3 small functions for each big if.
This particular function is not that hard to understand, but the driver
in general is messy. But I am not sure if anyone in the community can
refactor this driver. The community made some efforts in the past to
clean up this driver, but they always broke use cases on different
platforms.
Regards,
-Bin.
Hi Yegor and Max,
On Tue, May 03, 2016 at 04:25:58PM +0200, Yegor Yefremov wrote:
> On Tue, May 3, 2016 at 3:48 PM, Bin Liu <[email protected]> wrote:
> > Hi,
> >
> > On Tue, May 03, 2016 at 12:03:52PM +0200, Yegor Yefremov wrote:
> >> On Thu, Apr 28, 2016 at 4:37 PM, Bin Liu <[email protected]> wrote:
> >> > Hi,
> >> >
> >> > On Thu, Apr 28, 2016 at 09:51:37AM +0300, Maxim Uvarov wrote:
> >> >
> >> > [snip]
> >> >
> >> >> Hello Bin,
> >> >>
> >> >> yes, it also works with that reset and go to finish:
> >> >>
> >> >> diff --git a/drivers/usb/musb/musb_host.c b/drivers/usb/musb/musb_host.c
> >> >> index c3d5fc9..8cd98e7 100644
> >> >> --- a/drivers/usb/musb/musb_host.c
> >> >> +++ b/drivers/usb/musb/musb_host.c
> >> >> @@ -1599,6 +1599,10 @@ void musb_host_rx(struct musb *musb, u8 epnum)
> >> >> status = -EPROTO;
> >> >> musb_writeb(epio, MUSB_RXINTERVAL, 0);
> >> >>
> >> >> + rx_csr &= ~MUSB_RXCSR_H_ERROR;
> >> >> + musb_writew(epio, MUSB_RXCSR, rx_csr);
> >> >> +
> >> >> + goto finish;
> >> >> } else if (rx_csr & MUSB_RXCSR_DATAERROR) {
> >> >>
> >> >> if (USB_ENDPOINT_XFER_ISOC != qh->type) {
> >> >>
> >> >
> >> > Thanks for testing it.
> >>
> >> Have tested your patch and now both FT4232 and Huawei don't freeze on removal.
> >>
> >> Bin, Max thanks for fixing this issue.
> >>
> >> Tested-by: Yegor Yefremov <[email protected]>
> >
> > Thanks for testing.
> >
> > Can you please test the patch [1] instead? I'd like to use it as the
> > fix.
> >
> > [1] http://marc.info/?l=linux-usb&m=146222355213935&w=2
>
> The patch behaves the same as the previous one.
Sorry for bringing up this old thread, but it seems to be too aggressive
to stop scheduling further urbs on errors [1]. So is it possible for you
to re-test your usecase by reverting commit
dbac5d07d13e ("usb: musb: host: don't start next rx urb if current one failed")
to see if only commit
b5801212229f ("usb: musb: host: clear rxcsr error bit if set")
itself solves your issue?
I know you have tested the patch in [2], which is similar to commit
b5801212229f, but the latter doesn't have 'goto finish' which does dma
cleanup on errors; it makes more sense to me. But I'd like to have you
test with dbac5d07d13e reverted to be sure.
[1] https://marc.info/?l=linux-usb&m=151689238420622&w=2
[2] https://marc.info/?l=linux-kernel&m=146185425805967&w=2
thanks,
-Bin.
[1] says that the issue is with a driver backported to 3.12.10. Can the
latest kernel be tested on the same hw?
Maxim.
2018-01-25 18:45 GMT+03:00 Bin Liu <[email protected]>:
> Hi Yegor and Max,
>
> On Tue, May 03, 2016 at 04:25:58PM +0200, Yegor Yefremov wrote:
>> On Tue, May 3, 2016 at 3:48 PM, Bin Liu <[email protected]> wrote:
>> > Hi,
>> >
>> > On Tue, May 03, 2016 at 12:03:52PM +0200, Yegor Yefremov wrote:
>> >> On Thu, Apr 28, 2016 at 4:37 PM, Bin Liu <[email protected]> wrote:
>> >> > Hi,
>> >> >
>> >> > On Thu, Apr 28, 2016 at 09:51:37AM +0300, Maxim Uvarov wrote:
>> >> >
>> >> > [snip]
>> >> >
>> >> >> Hello Bin,
>> >> >>
>> >> >> yes, it also works with that reset and go to finish:
>> >> >>
>> >> >> diff --git a/drivers/usb/musb/musb_host.c b/drivers/usb/musb/musb_host.c
>> >> >> index c3d5fc9..8cd98e7 100644
>> >> >> --- a/drivers/usb/musb/musb_host.c
>> >> >> +++ b/drivers/usb/musb/musb_host.c
>> >> >> @@ -1599,6 +1599,10 @@ void musb_host_rx(struct musb *musb, u8 epnum)
>> >> >> status = -EPROTO;
>> >> >> musb_writeb(epio, MUSB_RXINTERVAL, 0);
>> >> >>
>> >> >> + rx_csr &= ~MUSB_RXCSR_H_ERROR;
>> >> >> + musb_writew(epio, MUSB_RXCSR, rx_csr);
>> >> >> +
>> >> >> + goto finish;
>> >> >> } else if (rx_csr & MUSB_RXCSR_DATAERROR) {
>> >> >>
>> >> >> if (USB_ENDPOINT_XFER_ISOC != qh->type) {
>> >> >>
>> >> >
>> >> > Thanks for testing it.
>> >>
>> >> Have tested your patch and now both FT4232 and Huawei don't freeze on removal.
>> >>
>> >> Bin, Max thanks for fixing this issue.
>> >>
>> >> Tested-by: Yegor Yefremov <[email protected]>
>> >
>> > Thanks for testing.
>> >
>> > Can you please test the patch [1] instead? I'd like to use it as the
>> > fix.
>> >
>> > [1] http://marc.info/?l=linux-usb&m=146222355213935&w=2
>>
>> The patch behaves the same as the previous one.
>
> Sorry for bringing up this old thread, but it seems to be too aggressive
> to stop scheduling further urbs on errors [1]. So is it possible for you
> to re-test your usecase by reverting commit
>
> dbac5d07d13e ("usb: musb: host: don't start next rx urb if current one failed")
>
> to see if only commit
>
> b5801212229f ("usb: musb: host: clear rxcsr error bit if set")
>
> itself solves your issue?
>
> I know you have tested the patch in [2], which is similar to commit
> b5801212229f, but the latter doesn't have 'goto finish' which does dma
> cleanup on errors; it makes more sense to me. But I'd like to have you
> test with dbac5d07d13e reverted to be sure.
>
> [1] https://marc.info/?l=linux-usb&m=151689238420622&w=2
> [2] https://marc.info/?l=linux-kernel&m=146185425805967&w=2
>
> thanks,
> -Bin.
>
--
Best regards,
Maxim Uvarov
Maxim,
On Thu, Jan 25, 2018 at 07:24:02PM +0300, Maxim Uvarov wrote:
> [1] says that issue is with back ported driver to 3.12.10. Can the
> latest kernel be tested on the same hw?
Agreed that it should be tested with the latest kernel. But my concern
now is whether stopping scheduling urbs on errors is the right thing to
do; that is why I asked if you can re-test your use case with the commit
reverted. I am unable to reproduce the original issue you had.
Thanks,
-Bin.
Bin,
I looked at my local git and the code does not have this last line, "goto
finish". It was tested without it and everything worked. Right now I
cannot get access to that hardware to check with and without it. I can
only confirm that without "goto finish" the function works with a bunch
of drivers (usb ethernet, hids, hdd).
Best regards,
Maxim.
2018-01-25 19:31 GMT+03:00 Bin Liu <[email protected]>:
> Maxim,
>
> On Thu, Jan 25, 2018 at 07:24:02PM +0300, Maxim Uvarov wrote:
>> [1] says that issue is with back ported driver to 3.12.10. Can the
>> latest kernel be tested on the same hw?
>
> Agreed that it should be tested with the latest kernel. But my concern
> now is if stopping scheduling urbs on errors is a right thing to do,
> that is why I asked if you can re-test your usecase with reverting the
> commit. I am unable to reproduce the original issue you had.
>
> Thanks,
> -Bin.
--
Best regards,
Maxim Uvarov
Hi Maxim,
unfortunately we cannot test the latest kernel right now, because we
have custom drivers and additional changes that need to be ported, but
the MUSB driver in our kernel should contain all fixes from
https://git.kernel.org/pub/scm/linux/kernel/git/stable/linux-stable.git/log/drivers/usb/musb
Best regards
Tomas
On 25.1.2018 at 17:24, Maxim Uvarov wrote:
> [1] says that issue is with back ported driver to 3.12.10. Can the
> latest kernel be tested on the same hw?
>
> Maxim.
>
> 2018-01-25 18:45 GMT+03:00 Bin Liu <[email protected]>:
>> Hi Yegor and Max,
>>
>> On Tue, May 03, 2016 at 04:25:58PM +0200, Yegor Yefremov wrote:
>>> On Tue, May 3, 2016 at 3:48 PM, Bin Liu <[email protected]> wrote:
>>>> Hi,
>>>>
>>>> On Tue, May 03, 2016 at 12:03:52PM +0200, Yegor Yefremov wrote:
>>>>> On Thu, Apr 28, 2016 at 4:37 PM, Bin Liu <[email protected]> wrote:
>>>>>> Hi,
>>>>>>
>>>>>> On Thu, Apr 28, 2016 at 09:51:37AM +0300, Maxim Uvarov wrote:
>>>>>>
>>>>>> [snip]
>>>>>>
>>>>>>> Hello Bin,
>>>>>>>
>>>>>>> yes, it also works with that reset and go to finish:
>>>>>>>
>>>>>>> diff --git a/drivers/usb/musb/musb_host.c b/drivers/usb/musb/musb_host.c
>>>>>>> index c3d5fc9..8cd98e7 100644
>>>>>>> --- a/drivers/usb/musb/musb_host.c
>>>>>>> +++ b/drivers/usb/musb/musb_host.c
>>>>>>> @@ -1599,6 +1599,10 @@ void musb_host_rx(struct musb *musb, u8 epnum)
>>>>>>> status = -EPROTO;
>>>>>>> musb_writeb(epio, MUSB_RXINTERVAL, 0);
>>>>>>>
>>>>>>> + rx_csr &= ~MUSB_RXCSR_H_ERROR;
>>>>>>> + musb_writew(epio, MUSB_RXCSR, rx_csr);
>>>>>>> +
>>>>>>> + goto finish;
>>>>>>> } else if (rx_csr & MUSB_RXCSR_DATAERROR) {
>>>>>>>
>>>>>>> if (USB_ENDPOINT_XFER_ISOC != qh->type) {
>>>>>>>
>>>>>> Thanks for testing it.
>>>>> Have tested your patch and now both FT4232 and Huawei don't freeze on removal.
>>>>>
>>>>> Bin, Max thanks for fixing this issue.
>>>>>
>>>>> Tested-by: Yegor Yefremov <[email protected]>
>>>> Thanks for testing.
>>>>
>>>> Can you please test the patch [1] instead? I'd like to use it as the
>>>> fix.
>>>>
>>>> [1] http://marc.info/?l=linux-usb&m=146222355213935&w=2
>>> The patch behaves the same as the previous one.
>> Sorry for bringing up this old thread, but it seems to be too aggressive
>> to stop scheduling further urbs on errors [1]. So is it possible for you
>> to re-test your usecase by reverting commit
>>
>> dbac5d07d13e ("usb: musb: host: don't start next rx urb if current one failed")
>>
>> to see if only commit
>>
>> b5801212229f ("usb: musb: host: clear rxcsr error bit if set")
>>
>> itself solves your issue?
>>
>> I know you have tested the patch in [2], which is similar to commit
>> b5801212229f, but the latter doesn't have 'goto finish' which does dma
>> cleanup on errors, it makes more sense to me. But I'd like to have you
>> test with dbac5d07d13e reverted to be sure.
>>
>> [1] https://marc.info/?l=linux-usb&m=151689238420622&w=2
>> [2] https://marc.info/?l=linux-kernel&m=146185425805967&w=2
>>
>> thanks,
>> -Bin.
>>
>
>
Maxim,
On Fri, Jan 26, 2018 at 12:24:54PM +0300, Maxim Uvarov wrote:
> Bin,
>
> I looked to my local git and code does not have this latest line "goto
> finish". It was tested without it and everything worked. Right now I
> can not get access to that hardware to check with and without. But
> only can confirm that without "goto finish" function works with bunch
> of drivers (usb ethernet, hids, hdd).
Thanks for the confirmation. The revert patch has been sent out.
Regards,
-Bin.
>
> Best regards,
> Maxim.
Hi Maxim and Bin,
I would like to reopen this issue, because we have moved to 4.12.24 and
restarting two cellular modules at the same time with the AT+CFUN=1,1
command leads to a system freeze, because the RX ISR is invoked
repeatedly. Maybe *musb_start_urb* should always be called from
*musb_advance_schedule* after a random communication error, but should
not be called repeatedly if the device has been disconnected.
Best regards
Tomas
On 26.1.2018 at 11:42, Tomas Paukrt wrote:
> Hi Maxim,
>
> unfortunately we cannot test the latest kernel right now, because we
> have custom drivers and additional changes that need to be ported, but
> the MUSB driver in our kernel should contain all fixes from
> https://git.kernel.org/pub/scm/linux/kernel/git/stable/linux-stable.git/log/drivers/usb/musb
>
> Best regards
>
> Tomas
>
>