Return-Path: Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand id S1759876Ab3DCNFH (ORCPT ); Wed, 3 Apr 2013 09:05:07 -0400 Received: from mail.abilis.ch ([195.70.19.74]:10904 "EHLO mail.abilis.ch" rhost-flags-OK-OK-OK-OK) by vger.kernel.org with ESMTP id S1754773Ab3DCNFF convert rfc822-to-8bit (ORCPT ); Wed, 3 Apr 2013 09:05:05 -0400 Date: Wed, 3 Apr 2013 15:03:41 +0200 From: Christian Ruppert To: Thomas Gleixner Cc: Vineet Gupta , Pierrick Hascoet , LKML , Peter Zijlstra , Ingo Molnar Subject: Re: [PATCH] timer: Fix possible issues with non serialized timer_pending( ) Message-ID: <20130403130340.GA13826@ab42.lan> References: <1364553218-31255-1-git-send-email-vgupta@synopsys.com> MIME-Version: 1.0 Content-Type: text/plain; charset=iso-8859-1 Content-Disposition: inline Content-Transfer-Encoding: 8BIT In-Reply-To: User-Agent: Mutt/1.5.20 (2009-12-10) Sender: linux-kernel-owner@vger.kernel.org List-ID: X-Mailing-List: linux-kernel@vger.kernel.org Content-Length: 4940 Lines: 133 Hello Vineet, In the following a patch for a seemingly unrelated problem (RB-trees in hrtimer implementation) which seems to fix the original problem at the same time. For the moment, this patch is insufficiently tested but I am posting it here (preliminarily) since it seems to be in line with Thomas' comment: The implementation of irqsave/restore in other architectures (MIPS, ARM) imply memory barriers and adding this implicit barrier to ARC code as well seems to stabilise some of our remaining crashes. Greetings, Christian On Wed, Apr 03, 2013 at 02:36:59PM +0200, Thomas Gleixner wrote: > Vineet, > > On Fri, 29 Mar 2013, Vineet Gupta wrote: > > > When stress testing ARC Linux from 3.9-rc3, we've hit a serialization > > issue when mod_timer() races with itself. This is on a FPGA board and > > kernel .config among others has !SMP and !PREEMPT_COUNT. > > > > The issue happens in mod_timer( ) because timer_pending( ) based early > > exit check is NOT done inside the timer base spinlock - as a networking > > optimization. > > > > The value used in there, timer->entry.next is also used further in call > > chain (all inlines though) for actual list manipulation. However if the > > register containing this pointer remains live across the spinlock (in a > > UP setup with !PREEMPT_COUNT there's nothing forcing gcc to reload) then > > a stale value of next pointer causes incorrect list manipulation, > > observed with following sequence in our tests. > > > > (0). tv1[x] <----> t1 <---> t2 > > (1). mod_timer(t1) interrupted after it calls timer_pending() > > (2). mod_timer(t2) completes > > (3). mod_timer(t1) resumes but messes up the list. > > (4). __runt_timers( ) uses bogus timer_list entry / crashes in > > timer->function > > > > The simplest fix is to NOT rely on spinlock based compiler barrier but > > add an explicit one in timer_pending() > > That's simple, but dangerous. There is other code which relies on the > implicit barriers of spinlocks, so I think we need to add the barrier > to the !PREEMPT_COUNT implementation of preempt_*() macros. > > Thanks, > > tglx > > > FWIW, the relevant ARCompact disassembly of mod_timer which clearly > > shows the issue due to register reuse is: > > > > mod_timer: > > push_s blink > > mov_s r13,r0 # timer, timer > > > > ... > > ###### timer_pending( ) > > ld_s r3,[r13] # <------ .entry.next LOADED > > brne r3, 0, @.L163 > > > > .L163: > > .... > > ###### spin_lock_irq( ) > > lr r5, [status32] # flags > > bic r4, r5, 6 # temp, flags, > > and.f 0, r5, 6 # flags, > > flag.nz r4 > > > > ###### detach_if_pending( ) begins > > > > tst_s r3,r3 <-------------- > > # timer_pending( ) checks timer->entry.next > > # r3 is NOT reloaded by gcc, using stale value > > beq.d @.L169 > > mov.eq r0,0 > > > > # detach_timer( ): __list_del( ) > > > > ld r4,[r13,4] # .entry.prev, D.31439 > > st r4,[r3,4] # .prev, D.31439 > > st r3,[r4] # .next, D.30246 > > > > Signed-off-by: Vineet Gupta > > Reported-by: Christian Ruppert > > Cc: Thomas Gleixner > > Cc: Christian Ruppert > > Cc: Pierrick Hascoet > > Cc: linux-kernel@vger.kernel.org > > --- > > include/linux/timer.h | 11 ++++++++++- > > 1 file changed, 10 insertions(+), 1 deletion(-) > > > > diff --git a/include/linux/timer.h b/include/linux/timer.h > > index 8c5a197..1537104 100644 > > --- a/include/linux/timer.h > > +++ b/include/linux/timer.h > > @@ -168,7 +168,16 @@ static inline void init_timer_on_stack_key(struct timer_list *timer, > > */ > > static inline int timer_pending(const struct timer_list * timer) > > { > > - return timer->entry.next != NULL; > > + int pending = timer->entry.next != NULL; > > + > > + /* > > + * The check above enables timer fast path - early exit. > > + * However most of the call sites are not protected by timer->base > > + * spinlock. If the caller (say mod_timer) races with itself, it > > + * can use the stale "next" pointer. See commit log for details. > > + */ > > + barrier(); > > + return pending; > > } > > > > extern void add_timer_on(struct timer_list *timer, int cpu); > > -- > > 1.7.10.4 > > > > -- Christian Ruppert , /| Tel: +41/(0)22 816 19-42 //| 3, Chemin du Pr?-Fleuri _// | bilis Systems CH-1228 Plan-les-Ouates -- To unsubscribe from this list: send the line "unsubscribe linux-kernel" in the body of a message to majordomo@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html Please read the FAQ at http://www.tux.org/lkml/