Return-Path: Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand id S1751529AbdGZOQp (ORCPT ); Wed, 26 Jul 2017 10:16:45 -0400 Received: from Galois.linutronix.de ([146.0.238.70]:49135 "EHLO Galois.linutronix.de" rhost-flags-OK-OK-OK-OK) by vger.kernel.org with ESMTP id S1750893AbdGZOQo (ORCPT ); Wed, 26 Jul 2017 10:16:44 -0400 Date: Wed, 26 Jul 2017 16:16:37 +0200 (CEST) From: Thomas Gleixner To: qiaozhou cc: John Stultz , sboyd@codeaurora.org, LKML , Wang Wilbur , Marc Zyngier , Will Deacon , Peter Zijlstra Subject: Re: [Question]: try to fix contention between expire_timers and try_to_del_timer_sync In-Reply-To: <3d2459c7-defd-a47e-6cea-007c10cecaac@asrmicro.com> Message-ID: References: <3d2459c7-defd-a47e-6cea-007c10cecaac@asrmicro.com> User-Agent: Alpine 2.20 (DEB 67 2015-01-07) MIME-Version: 1.0 Content-Type: text/plain; charset=US-ASCII Sender: linux-kernel-owner@vger.kernel.org List-ID: X-Mailing-List: linux-kernel@vger.kernel.org Content-Length: 4892 Lines: 124 On Wed, 26 Jul 2017, qiaozhou wrote: Cc'ed ARM folks. > I want to ask you for suggestions about how to fix one contention between > expire_timers and try_to_del_timer_sync. Thanks in advance. > The issue is a hard-lockup issue detected on our platform(arm64, one cluster > with 4 a53, and the other with 4 a73). The sequence is as below: > 1. core0 checks expired timers, and wakes up a process on a73, in step 1.3. > 2. core4 starts to run the process and try to delete the timer in sync method, > in step 2.1. > 3. Before core0 can run to 1.4 and get the lock, core4 has already run from > 2.1 to 2.6 in while loop. And in step 2.4, it fails since the timer is still > the running timer. > > core0(a53): core4(a73): > run_timer_softirq > __run_timers > spin_lock_irq(&base->lock) > expire_timers() > 1.1: base->running_timer = timer; > 1.2: spin_unlock_irq(&base->lock); > 1.3: call_timer_fn(timer, fn, data); > schedule() > 1.4: spin_lock_irq(&base->lock); del_timer_sync(&timer) > 1.5: back to 1.1 in while loop > 2.1: try_to_del_timer_sync() > 2.2: get_timer_base() > 2.3: spin_lock_irqsave(&base->lock, flags); > 2.4: check "base->running_timer != timer" > 2.5: spin_unlock_irqrestore(&base->lock, flags); > 2.6:cpu_relax(); (back to 2.1 in while loop) This is horribly formatted. > Core0 runs @832MHz, and core4 runs @1.8GHz. A73 is also much more powerful in > design than a53. So the actual running is that a73 keeps looping in spin_lock > -> check running_timer -> spin_unlock -> spin_lock ->...., while a53 can't > even succeed to complete one store exclusive instruction, before it can enter > WFE to wait for lock release. It just keeps looping in +30 and+3c in below > asm, due to stxr fails. > > So it deadloop, a53 can't jump out ldaxr/stxr loop, while a73 can't pass the > running_timer check. Finally the hard-lockup is triggered.(BTW, the issue > needs a long time to occur.) > > <_raw_spin_lock_irq+0x2c>: prfm pstl1strm, [x19] > /<_raw_spin_lock_irq+0x30>: ldaxr w0, [x19]// > //<_raw_spin_lock_irq+0x34>: add w1, w0, #0x10, lsl #12// > //<_raw_spin_lock_irq+0x38>: stxr w2, w1, [x19]// > //<_raw_spin_lock_irq+0x3c>: cbnz w2, 0xffffff80089f52e0 > <_raw_spin_lock_irq+0x30>/ > <_raw_spin_lock_irq+0x40>: eor w1, w0, w0, ror #16 > <_raw_spin_lock_irq+0x44>: cbz w1, 0xffffff80089f530c > <_raw_spin_lock_irq+0x5c> > <_raw_spin_lock_irq+0x48>: sevl > <_raw_spin_lock_irq+0x4c>: wfe > <_raw_spin_lock_irq+0x50>: ldaxrh w2, [x19] > <_raw_spin_lock_irq+0x54>: eor w1, w2, w0, lsr #16 > <_raw_spin_lock_irq+0x58>: cbnz w1, 0xffffff80089f52fc > <_raw_spin_lock_irq+0x4c> > > The loop on a53 only has 4 instructions, and loop on a73 has ~100 > instructions. Still a53 has no chance to store exclusive successfully. It may > be related with ldaxr/stxr cost, core frequency, core number etc. > > I have no idea of fixing it in spinlock/unlock implement, so I try to fix it > in timer driver. I prepared a raw patch, not sure it's the correct direction > to solve this issue. Could you help to give some suggestions? Thanks. > > From fb8fbfeb8f9f92fdadd9920ce234fd433bc883e1 Mon Sep 17 00:00:00 2001 > From: Qiao Zhou > Date: Wed, 26 Jul 2017 20:30:33 +0800 > Subject: [PATCH] RFC: timers: try to fix contention between expire_timers and > del_timer_sync > > try to fix contention between expire_timers and del_timer_sync by > adding TIMER_WOKEN status, which means that the timer has already > woken up corresponding process. Timers do a lot more things than waking up a process. Just because this happens in a particular case for you where a wakeup is involved this does not mean, that this is generally true. And that whole flag business is completely broken. There is a comment in call_timer_fn() which says: /* * It is permissible to free the timer from inside the * function that is called from it .... So you _CANNOT_ touch timer after the function returned. It might be gone already. For that particular timer case we can clear base->running_timer w/o the lock held (see patch below), but this kind of lock -> test -> unlock -> retry loops are all over the place in the kernel, so this is going to hurt you sooner than later in some other place. Thanks, tglx 8<------------ --- a/kernel/time/timer.c +++ b/kernel/time/timer.c @@ -1301,10 +1301,12 @@ static void expire_timers(struct timer_b if (timer->flags & TIMER_IRQSAFE) { raw_spin_unlock(&base->lock); call_timer_fn(timer, fn, data); + base->running_timer = NULL; raw_spin_lock(&base->lock); } else { raw_spin_unlock_irq(&base->lock); call_timer_fn(timer, fn, data); + base->running_timer = NULL; raw_spin_lock_irq(&base->lock); } }