Date: Fri, 3 Jul 2015 11:23:12 +0200 (CEST)
From: Thomas Gleixner
To: Geert Uytterhoeven
cc: Simon Horman, Kevin Hilman, Tyler Baker, Borislav Petkov,
    Geert Uytterhoeven, Magnus Damm, Linux-sh list,
    "linux-arm-kernel@lists.infradead.org",
    "linux-kernel@vger.kernel.org"
Subject: Re: Possible regression due to "tick: broadcast: Prevent livelock from event handler"
References: <20150703024044.GB24695@verge.net.au>

On Fri, 3 Jul 2015, Geert Uytterhoeven wrote:
> Hi Simon,
>
> On Fri, Jul 3, 2015 at 4:40 AM, Simon Horman wrote:
> > I have observed what appears to be a regression while testing next-20150702
> > which seems to be caused by 2951d5c031a3 ("tick: broadcast: Prevent
> > livelock from event handler").
> >
> > The problem manifests on the emev2/kzm9d board as per the boot log below.
> >
> > The problem manifests when booting using the shmobile_defconfig,
> > which uses multiplatform and enables all devices using DT.
> >
> > The problem does not appear to always manifest but anecdotally it
> > seems to manifest more often of late (yes, I know that is vague).
>
> > > hctosys: unable to open rtc device (rtc0)
> >
> > The boot hangs here.
> > The next line should be:
> >
> > smsc911x 20000000.ethernet eth0: SMSC911x/921x identified at 0xc8880000, IRQ: 33
>
> As you can reproduce it, can you please try enabling lockdep debugging?

Just looking at the em_sti driver. It calls clk_prepare/unprepare from
interrupt disabled regions ...

But that's not the problem at hand I think.

The above commit moves the call to the event handler on the local cpu
out of the broadcast lock region to prevent a livelock. The only real
change is the timing.

Before:

  bc_handler()
     lock(bc_lock);
     call_local_handler();
     send_ipis();
     reprogram_bc_device();
     unlock(bc_lock);

After:

  bc_handler()
     lock(bc_lock);
     send_ipis();
     reprogram_bc_device();
     unlock(bc_lock);
     call_local_handler();

As this runs in hard interrupt context with interrupts disabled, I
really cannot figure out how that makes a difference.

Can you add some debugging to figure out whether the broadcast timer
interrupt still fires?

Thanks,

	tglx
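
For anyone who wants to poke at the ordering change outside the kernel,
here is a rough user-space model of the two handler variants described
above. It is only a sketch: the lock is a plain flag, and every name in
it (bc_handler_before, bc_handler_after, bc_trace, bc_local_ran_locked
and the helpers) is made up for illustration and is not a kernel symbol.
All it shows is that the local handler moves from inside the locked
region to after the unlock.

```c
#include <assert.h>
#include <string.h>

/* User-space model only; none of these names are real kernel symbols. */
static int bc_lock_held;             /* stands in for the broadcast lock */
static int local_handler_ran_locked; /* lock state seen by the local handler */
static char trace[128];              /* comma-separated order of the steps */

/* Append one step name to the trace. */
static void step(const char *name)
{
	assert(strlen(trace) + strlen(name) + 2 <= sizeof(trace));
	if (trace[0])
		strcat(trace, ",");
	strcat(trace, name);
}

static void lock_bc(void)             { bc_lock_held = 1; step("lock"); }
static void unlock_bc(void)           { bc_lock_held = 0; step("unlock"); }
static void send_ipis(void)           { step("ipis"); }
static void reprogram_bc_device(void) { step("reprogram"); }

/* Record whether the lock was held when the local handler ran. */
static void call_local_handler(void)
{
	local_handler_ran_locked = bc_lock_held;
	step("local");
}

static void reset_model(void)
{
	bc_lock_held = 0;
	local_handler_ran_locked = 0;
	trace[0] = '\0';
}

/* Old ordering: local handler is called inside the locked region. */
void bc_handler_before(void)
{
	reset_model();
	lock_bc();
	call_local_handler();
	send_ipis();
	reprogram_bc_device();
	unlock_bc();
}

/* New ordering: local handler is called after the lock is dropped. */
void bc_handler_after(void)
{
	reset_model();
	lock_bc();
	send_ipis();
	reprogram_bc_device();
	unlock_bc();
	call_local_handler();
}

/* Accessors for inspecting the last run. */
const char *bc_trace(void)      { return trace; }
int bc_local_ran_locked(void)   { return local_handler_ran_locked; }
```

Calling either variant and then reading bc_trace() gives the step order
for that run, and bc_local_ran_locked() tells you whether the local
handler saw the lock held. The model says nothing about the hang itself;
it only makes the timing difference visible.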