Return-Path: Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand id S1755587Ab2KMSwn (ORCPT ); Tue, 13 Nov 2012 13:52:43 -0500 Received: from mail-pa0-f46.google.com ([209.85.220.46]:49906 "EHLO mail-pa0-f46.google.com" rhost-flags-OK-OK-OK-OK) by vger.kernel.org with ESMTP id S1755167Ab2KMSwl (ORCPT ); Tue, 13 Nov 2012 13:52:41 -0500 Date: Tue, 13 Nov 2012 10:52:37 -0800 From: Greg Kroah-Hartman To: "Liu, Chuansheng" Cc: Thomas Gleixner , Martin Steigerwald , "linux-kernel@vger.kernel.org" , Ingo Molnar Subject: Re: [REGRESSION] 3.7-rc3+git hard lockup on CPU after inserting/removing USB stick Message-ID: <20121113185237.GA26995@kroah.com> References: <201211071501.38288.Martin@lichtvoll.de> <201211101734.07084.Martin@lichtvoll.de> <27240C0AC20F114CBF8149A2696CBE4A1C2915@SHSMSX101.ccr.corp.intel.com> <201211121527.02007.Martin@lichtvoll.de> <27240C0AC20F114CBF8149A2696CBE4A1C3887@SHSMSX101.ccr.corp.intel.com> MIME-Version: 1.0 Content-Type: text/plain; charset=us-ascii Content-Disposition: inline In-Reply-To: <27240C0AC20F114CBF8149A2696CBE4A1C3887@SHSMSX101.ccr.corp.intel.com> User-Agent: Mutt/1.5.21 (2010-09-15) Sender: linux-kernel-owner@vger.kernel.org List-ID: X-Mailing-List: linux-kernel@vger.kernel.org Content-Length: 5601 Lines: 131 On Tue, Nov 13, 2012 at 12:47:31AM +0000, Liu, Chuansheng wrote: > > > > -----Original Message----- > > From: Thomas Gleixner [mailto:tglx@linutronix.de] > > Sent: Tuesday, November 13, 2012 3:32 AM > > To: Martin Steigerwald > > Cc: Liu, Chuansheng; linux-kernel@vger.kernel.org; Ingo Molnar; Greg > > Kroah-Hartman > > Subject: Re: [REGRESSION] 3.7-rc3+git hard lockup on CPU after > > inserting/removing USB stick > > > > On Mon, 12 Nov 2012, Martin Steigerwald wrote: > > > Am Sonntag, 11. November 2012 schrieb Liu, Chuansheng: > > > > > The first bad commit is: > > > > > > > > > > commit 73d4066055e0e2830533041f4b91df8e6e5976ff > > > > > Author: Chuansheng Liu > > > > > Date: Tue Sep 11 16:00:30 2012 +0800 > > > > > > > > > > USB/host: Cleanup unneccessary irq disable code > > > > > > > > > > Because the IRQF_DISABLED as the flag is now a NOOP and has > > been > > > > > deprecated and in hardirq context the interrupt is disabled. > > > > > > > > > > so in usb/host code: > > > > > Removing the usage of flag IRQF_DISABLED; > > > > > Removing the calling local_irq save/restore actions in irq > > > > > handler usb_hcd_irq(); > > > > > > > > > > Signed-off-by: liu chuansheng > > > > > Acked-by: Alan Stern > > > > > Signed-off-by: Greg Kroah-Hartman > > > > > > > > > > > > > > > But: > > > > > > > > > > This ony happens with threadirqs option! > > > > > > > > > > When I remove threadirqs from kernel command line and reboot with this > > > > > last bisect kernel USB sticks work. > > > > > > > > > > That may explain why nobody else has seen this. > > > > > > > > > > So I will try a 3.7-rc4 now, but without threadirqs enabled. > > > > > > > > > Thanks your pointing out, the USB HCD irq handler is designed to > > > > execute in irq handler with irq disabled. When threadirqs is in > > > > commandline, it will be executed in thread context with local irq > > > > enabling, which causes this hardlockup. > > > > No. The problem is caused by the commit above. USB with threaded > > interrupt handlers worked perfectly fine in the past. > The reason is that removing local_irq_save/restore() in function usb_hcd_irq(). > The hard lockup analysis is: > CPU3 > Usb_hcd_irq() --> > ehci_irq --> > spin_lock (&ehci->lock); > ... > if (status == ~(u32) 0) { > > At this time, one hrtimer is coming: > ehci_hrtimer_func() --> > spin_lock_irqsave(&ehci->lock, flags); > > Due to the spin_lock has been hold before, it causes the deadlock. > > The dmesg is as below: > [ 155.010424] [] ? _raw_spin_lock_irqsave+0x3f/0x48 > [ 155.010649] [] ? _raw_spin_lock_irqsave+0x3f/0x48 > [ 155.010884] [] ? _raw_spin_lock_irqsave+0x3f/0x48 > [ 155.011104] <> [] ehci_hrtimer_func+0x28/0xb5 [ehci_hcd] > [ 155.011446] [] ? __remove_hrtimer+0x31/0x8b > [ 155.011661] [] ? end_free_itds+0x108/0x108 [ehci_hcd] > [ 155.011911] [] __run_hrtimer+0xb9/0x176 > [ 155.012105] [] hrtimer_interrupt+0xcb/0x1b4 > [ 155.012311] [] ? irq_thread_fn+0x35/0x35 > [ 155.012509] [] smp_apic_timer_interrupt+0x71/0x84 > [ 155.012742] [] apic_timer_interrupt+0x6a/0x70 > [ 155.012950] [] ? ehci_irq+0x39/0x267 [ehci_hcd] > [ 155.013230] [] ? __schedule+0x57f/0x5b2 > [ 155.013424] [] ? irq_thread_fn+0x35/0x35 > [ 155.013645] [] usb_hcd_irq+0x20/0x2f [usbcore] > [ 155.013874] [] irq_forced_thread_fn+0x20/0x3e > > > > > > > --- a/drivers/usb/core/hcd.c > > > > +++ b/drivers/usb/core/hcd.c > > > > @@ -2349,7 +2349,7 @@ static int usb_hcd_request_irqs(struct usb_hcd > > *hcd, > > > > if (hcd->driver->irq) { > > > > snprintf(hcd->irq_descr, sizeof(hcd->irq_descr), > > "%s:usb%d", > > > > hcd->driver->description, > > hcd->self.busnum); > > > > - retval = request_irq(irqnum, &usb_hcd_irq, irqflags, > > > > + retval = request_irq(irqnum, &usb_hcd_irq, > > irqflags|IRQF_NO_THREAD, > > > > hcd->irq_descr, hcd); > > > > NAK. This is exactly the wrong thing to do. > > > > We want to be able to run that code in an handler thread. So you > As Martin's experience: > "I think that thread IRQs for USB based interrupts might not be a good > from another experience." > Maybe it shows something. > > > removed the local_irq_save/restore() in the driver code and with > > forced threaded irqs this breaks. Now setting IRQF_NO_THREAD is just > > working around the problem that the above commit broke it. > > > > There is no hard requirement to run USB interrupts in hard interrupt > > context. I'd rather see the above commit reverted and then a proper > > analysis done why removing local_irq_save/restore() breaks forced > > threaded interrupt handlers. > As you said, we can revert the patch directly, or submit a new patch to just add > local_irq_save/restore() back. Let me revert this for now, as it's obviously causing problems. thanks, greg k-h -- To unsubscribe from this list: send the line "unsubscribe linux-kernel" in the body of a message to majordomo@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html Please read the FAQ at http://www.tux.org/lkml/