Return-Path: Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand id S1752507AbcD1Gvk (ORCPT ); Thu, 28 Apr 2016 02:51:40 -0400 Received: from mail-yw0-f194.google.com ([209.85.161.194]:36847 "EHLO mail-yw0-f194.google.com" rhost-flags-OK-OK-OK-OK) by vger.kernel.org with ESMTP id S1751689AbcD1Gvi (ORCPT ); Thu, 28 Apr 2016 02:51:38 -0400 MIME-Version: 1.0 In-Reply-To: <20160427212823.GA14645@uda0271908> References: <1461739918-9583-1-git-send-email-muvarov@gmail.com> <20160427154620.GB5604@uda0271908> <20160427191356.GA16821@uda0271908> <20160427212823.GA14645@uda0271908> Date: Thu, 28 Apr 2016 09:51:37 +0300 Message-ID: Subject: Re: [PATCHv2] musb_host: fix lockup on rxcsr_h_error From: Maxim Uvarov To: Bin Liu , Maxim Uvarov , linux-kernel@vger.kernel.org, linux-usb@vger.kernel.org, Greg KH , sergei.shtylyov@cogentembedded.com Content-Type: text/plain; charset=UTF-8 Sender: linux-kernel-owner@vger.kernel.org List-ID: X-Mailing-List: linux-kernel@vger.kernel.org Content-Length: 6896 Lines: 199 2016-04-28 0:28 GMT+03:00 Bin Liu : > Hi, > > On Wed, Apr 27, 2016 at 02:13:56PM -0500, Bin Liu wrote: >> Hi, >> >> On Wed, Apr 27, 2016 at 09:26:10PM +0300, Maxim Uvarov wrote: >> > 2016-04-27 18:46 GMT+03:00 Bin Liu : >> > > Hi, >> > > >> > > On Wed, Apr 27, 2016 at 09:51:58AM +0300, Max Uvarov wrote: >> > >> Fix soft lockup when resetting remote device attached >> > >> to usb host. Configuration: >> > >> pppd -> musb hub -> usb-serial -> gsm modem >> > > >> > > I have heard a few reports similar to this symptom, but never been able >> > > to reproduce it on my side. >> > > >> > >> > Ok, I can reproduce it almost very easy. >> > >> > >> When gsm modem resets, musb rolls in incoming rx interrupts >> > >> which does not give any time to other application as result >> > >> it totally lock ups. Solution is to keep original logic for RXCSR_H_ERROR >> > > >> > > Have you looked where exact place in the interrupt routine the execution >> > > has stuck in? >> > > >> > >> > It does not stuck. It goes to that line which print proto error over >> > and over again and >> > nothing stops that. After some time kernel reports lockup. But >> > actually it's not stuck, >> > all cpu time was eaten by executing that handlers. >> > >> > >> > >> and merge RXCSR_DATAERROR and RXCSR_H_ERROR branches to call same code >> > >> for setting rx stall with MUSB_RXCSR_H_WZC_BITS. >> > > >> > > MUSB_RXCSR_H_WZC_BITS itself does not set rx stall, it just ensures >> > > MUSB_RXCSR_H_RXSTALL not to be cleared. Please check its comment in >> > > musb_regs.h. >> > > >> > >> >> > >> Signed-off-by: Max Uvarov >> > >> --- >> > >> v2: use bitwise or for error flags before logical and. (Sergei Shtylyov). >> > >> >> > >> drivers/usb/musb/musb_host.c | 12 +++++------- >> > >> 1 file changed, 5 insertions(+), 7 deletions(-) >> > >> >> > >> diff --git a/drivers/usb/musb/musb_host.c b/drivers/usb/musb/musb_host.c >> > >> index c3d5fc9..2d9aa78 100644 >> > >> --- a/drivers/usb/musb/musb_host.c >> > >> +++ b/drivers/usb/musb/musb_host.c >> > >> @@ -1592,14 +1592,12 @@ void musb_host_rx(struct musb *musb, u8 epnum) >> > > >> > > What kernel do you use? This line # is away off from upstream kernel. >> > > >> > >> > I did this patch for 4.1 but 4.6 has the same problem and patch >> > cleanly applies to the latest torvalds/linux.git v4.6-rc5. This >> > interrupt handler has the same code. And looks like on 3.14 >> >> Yeah, this code hasn't been chaned for year. But in general, it is >> prepfered to create patches on latest kernel to avoid other headache. >> >> > everything worked. I don't have a time to diff 2 versions. Might be >> > regression. >> > >> > >> > >> >> > >> /* stall; record URB status */ >> > >> status = -EPIPE; >> > >> + } else if (rx_csr & (MUSB_RXCSR_DATAERROR | MUSB_RXCSR_H_ERROR)) { >> > >> >> > >> - } else if (rx_csr & MUSB_RXCSR_H_ERROR) { >> > >> - dev_dbg(musb->controller, "end %d RX proto error\n", epnum); >> > >> - >> > >> - status = -EPROTO; >> > >> - musb_writeb(epio, MUSB_RXINTERVAL, 0); >> > >> - >> > >> - } else if (rx_csr & MUSB_RXCSR_DATAERROR) { >> > >> + if (rx_csr & MUSB_RXCSR_H_ERROR) { >> > >> + status = -EPROTO; >> > >> + musb_writeb(epio, MUSB_RXINTERVAL, 0); >> > >> + } >> > > >> > > Please help me to understand how this change fixes the issue. I see the >> > > most effect of the change here is directly 'goto finish' so that 'done' >> > > flag is not set, then musb_advance_schedule() is not called. Is this the >> > > case or I missed other important pieces? >> > > >> > >> > Right that is the goal. On this rxcsr_h_error kernel reschedules >> > current interrupt. And that continues forever. For example adding >> >> The MUSB Programming Guide says CPU should clear this MUSB_RXCSR_H_ERROR >> bit, but the current driver doesn't. I am wondering if this causes the >> controller keeps generating the same interrupt. Can you please try the >> following change instead to see if the lockup goes away? >> >> @@ -1870,6 +1870,9 @@ void musb_host_rx(struct musb *musb, u8 epnum) >> status = -EPROTO; >> musb_writeb(epio, MUSB_RXINTERVAL, 0); >> >> + rx_csr &= ~MUSB_RXCSR_H_ERROR; >> + musb_writew(epio, MUSB_RXCSR, rx_csr); > > + goto finish; > > Please also add the line above. I will spend more time to understand > what is happening... > Hello Bin, yes, it also works with that reset and go to finish: diff --git a/drivers/usb/musb/musb_host.c b/drivers/usb/musb/musb_host.c index c3d5fc9..8cd98e7 100644 --- a/drivers/usb/musb/musb_host.c +++ b/drivers/usb/musb/musb_host.c @@ -1599,6 +1599,10 @@ void musb_host_rx(struct musb *musb, u8 epnum) status = -EPROTO; musb_writeb(epio, MUSB_RXINTERVAL, 0); + rx_csr &= ~MUSB_RXCSR_H_ERROR; + musb_writew(epio, MUSB_RXCSR, rx_csr); + + goto finish; } else if (rx_csr & MUSB_RXCSR_DATAERROR) { if (USB_ENDPOINT_XFER_ISOC != qh->type) { That I think a key thing, which is done in other error. If that change is good for you than I'm also happy with it. I also not sure if musb_writeb(epio, MUSB_RXINTERVAL, 0); is needed. In my case it's the same result with it and without it. In other scenarios might be reasonable... > First of all, I don't like the idea of merging the two branches, it > makes the code ugly. Yes, I don't like that function at all, it's too long and difficult to read if you first look on it first time. It will be good to split it on 3 small functions for each big if. Maxim. > > Regards, > -Bin. > >> + >> } else if (rx_csr & MUSB_RXCSR_DATAERROR) { >> >> if (USB_ENDPOINT_XFER_ISOC != qh->type) { >> >> Regards, >> -Bin. >> >> > msleep() can give some time for other processes. I'm not an expert in >> > this chip but I think that right solution in that case is not try to >> > reschedule and quick and allow hub to make reset and once again init >> > all devices (in my case ppp/pppd also shutdowns and then I bring >> > everything up with script.). The same behavior with dma and pio mode. >> > >> > Regards, >> > Max. >> > >> > > Thanks, >> > > -Bin. >> > > >> > >> >> > >> if (USB_ENDPOINT_XFER_ISOC != qh->type) { >> > >> dev_dbg(musb->controller, "RX end %d NAK timeout\n", epnum); >> > >> -- >> > >> 1.9.1 >> > >> >> > >> -- >> > >> To unsubscribe from this list: send the line "unsubscribe linux-usb" in >> > >> the body of a message to majordomo@vger.kernel.org >> > >> More majordomo info at http://vger.kernel.org/majordomo-info.html >> > >> > >> > >> > -- >> > Best regards, >> > Maxim Uvarov -- Best regards, Maxim Uvarov