Return-Path: Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand id S1751501AbdG0BaA (ORCPT ); Wed, 26 Jul 2017 21:30:00 -0400 Received: from [140.206.112.106] ([140.206.112.106]:23375 "EHLO mail2012.asrmicro.com" rhost-flags-FAIL-FAIL-OK-FAIL) by vger.kernel.org with ESMTP id S1751210AbdG0B37 (ORCPT ); Wed, 26 Jul 2017 21:29:59 -0400 Subject: Re: [Question]: try to fix contention between expire_timers and try_to_del_timer_sync To: Thomas Gleixner CC: John Stultz , , LKML , Wang Wilbur , Marc Zyngier , Will Deacon , Peter Zijlstra References: <3d2459c7-defd-a47e-6cea-007c10cecaac@asrmicro.com> From: qiaozhou Message-ID: Date: Thu, 27 Jul 2017 09:29:20 +0800 User-Agent: Mozilla/5.0 (X11; Linux x86_64; rv:52.0) Gecko/20100101 Thunderbird/52.2.1 MIME-Version: 1.0 In-Reply-To: Content-Type: text/plain; charset="utf-8"; format=flowed Content-Transfer-Encoding: 8bit Content-Language: en-US X-Originating-IP: [10.1.50.16] X-ClientProxiedBy: mail2012.asrmicro.com (10.1.24.123) To mail2012.asrmicro.com (10.1.24.123) Sender: linux-kernel-owner@vger.kernel.org List-ID: X-Mailing-List: linux-kernel@vger.kernel.org Content-Length: 6511 Lines: 162 On 2017年07月26日 22:16, Thomas Gleixner wrote: > On Wed, 26 Jul 2017, qiaozhou wrote: > > Cc'ed ARM folks. > >> I want to ask you for suggestions about how to fix one contention between >> expire_timers and try_to_del_timer_sync. Thanks in advance. >> The issue is a hard-lockup issue detected on our platform(arm64, one cluster >> with 4 a53, and the other with 4 a73). The sequence is as below: >> 1. core0 checks expired timers, and wakes up a process on a73, in step 1.3. >> 2. core4 starts to run the process and try to delete the timer in sync method, >> in step 2.1. >> 3. Before core0 can run to 1.4 and get the lock, core4 has already run from >> 2.1 to 2.6 in while loop. And in step 2.4, it fails since the timer is still >> the running timer. >> >> core0(a53): core4(a73): >> run_timer_softirq >> __run_timers >> spin_lock_irq(&base->lock) >> expire_timers() >> 1.1: base->running_timer = timer; >> 1.2: spin_unlock_irq(&base->lock); >> 1.3: call_timer_fn(timer, fn, data); >> schedule() >> 1.4: spin_lock_irq(&base->lock); del_timer_sync(&timer) >> 1.5: back to 1.1 in while loop >> 2.1: try_to_del_timer_sync() >> 2.2: get_timer_base() >> 2.3: spin_lock_irqsave(&base->lock, flags); >> 2.4: check "base->running_timer != timer" >> 2.5: spin_unlock_irqrestore(&base->lock, flags); >> 2.6:cpu_relax(); (back to 2.1 in while loop) > This is horribly formatted. Sorry for the format. Updated the calling sequence on two cores. core0(a53): core4(a73): run_timer_softirq __run_timers spin_lock_irq(&base->lock) expire_timers() 1.1: base->running_timer = timer; 1.2: spin_unlock_irq(&base->lock); 1.3: call_timer_fn(timer, fn, data); 1.4: spin_lock_irq(&base->lock); 1.5: back to 1.1 in while loop core4(a73): schedule() del_timer_sync(&timer) 2.1: try_to_del_timer_sync() 2.2: get_timer_base() 2.3: spin_lock_irqsave(&base->lock, flags); 2.4: check "base->running_timer != timer" 2.5: spin_unlock_irqrestore(&base->lock, flags); 2.6:cpu_relax(); (back to 2.1 in while loop) > >> Core0 runs @832MHz, and core4 runs @1.8GHz. A73 is also much more powerful in >> design than a53. So the actual running is that a73 keeps looping in spin_lock >> -> check running_timer -> spin_unlock -> spin_lock ->...., while a53 can't >> even succeed to complete one store exclusive instruction, before it can enter >> WFE to wait for lock release. It just keeps looping in +30 and+3c in below >> asm, due to stxr fails. >> >> So it deadloop, a53 can't jump out ldaxr/stxr loop, while a73 can't pass the >> running_timer check. Finally the hard-lockup is triggered.(BTW, the issue >> needs a long time to occur.) >> >> <_raw_spin_lock_irq+0x2c>: prfm pstl1strm, [x19] >> /<_raw_spin_lock_irq+0x30>: ldaxr w0, [x19]// >> //<_raw_spin_lock_irq+0x34>: add w1, w0, #0x10, lsl #12// >> //<_raw_spin_lock_irq+0x38>: stxr w2, w1, [x19]// >> //<_raw_spin_lock_irq+0x3c>: cbnz w2, 0xffffff80089f52e0 >> <_raw_spin_lock_irq+0x30>/ >> <_raw_spin_lock_irq+0x40>: eor w1, w0, w0, ror #16 >> <_raw_spin_lock_irq+0x44>: cbz w1, 0xffffff80089f530c >> <_raw_spin_lock_irq+0x5c> >> <_raw_spin_lock_irq+0x48>: sevl >> <_raw_spin_lock_irq+0x4c>: wfe >> <_raw_spin_lock_irq+0x50>: ldaxrh w2, [x19] >> <_raw_spin_lock_irq+0x54>: eor w1, w2, w0, lsr #16 >> <_raw_spin_lock_irq+0x58>: cbnz w1, 0xffffff80089f52fc >> <_raw_spin_lock_irq+0x4c> >> >> The loop on a53 only has 4 instructions, and loop on a73 has ~100 >> instructions. Still a53 has no chance to store exclusive successfully. It may >> be related with ldaxr/stxr cost, core frequency, core number etc. >> >> I have no idea of fixing it in spinlock/unlock implement, so I try to fix it >> in timer driver. I prepared a raw patch, not sure it's the correct direction >> to solve this issue. Could you help to give some suggestions? Thanks. >> >> From fb8fbfeb8f9f92fdadd9920ce234fd433bc883e1 Mon Sep 17 00:00:00 2001 >> From: Qiao Zhou >> Date: Wed, 26 Jul 2017 20:30:33 +0800 >> Subject: [PATCH] RFC: timers: try to fix contention between expire_timers and >> del_timer_sync >> >> try to fix contention between expire_timers and del_timer_sync by >> adding TIMER_WOKEN status, which means that the timer has already >> woken up corresponding process. > Timers do a lot more things than waking up a process. > > Just because this happens in a particular case for you where a wakeup is > involved this does not mean, that this is generally true. Yes, you're right. My intention is that as the timer function is executed, maybe we can do something. > > And that whole flag business is completely broken. There is a comment in > call_timer_fn() which says: > > /* > * It is permissible to free the timer from inside the > * function that is called from it .... > > So you _CANNOT_ touch timer after the function returned. It might be gone > already. You're right. Touching timer here has risk. > > For that particular timer case we can clear base->running_timer w/o the > lock held (see patch below), but this kind of > > lock -> test -> unlock -> retry > > loops are all over the place in the kernel, so this is going to hurt you > sooner than later in some other place. It's true. This is the way spinlock is used normally and widely in kernel. I'll also ask ARM experts whether we can do something to avoid or reduce the chance of such issue. ARMv8.1 has one single instruction(ldadda) to replace the ldaxr/stxr loop. Hope it can improve and reduce the chance. > > Thanks, > > tglx > > 8<------------ > > --- a/kernel/time/timer.c > +++ b/kernel/time/timer.c > @@ -1301,10 +1301,12 @@ static void expire_timers(struct timer_b > if (timer->flags & TIMER_IRQSAFE) { > raw_spin_unlock(&base->lock); > call_timer_fn(timer, fn, data); > + base->running_timer = NULL; > raw_spin_lock(&base->lock); > } else { > raw_spin_unlock_irq(&base->lock); > call_timer_fn(timer, fn, data); > + base->running_timer = NULL; > raw_spin_lock_irq(&base->lock); > } > } It should work for this particular issue and I'll test it. Previously I thought it was unsafe to touch base->running_timer without holding lock. Thanks a lot for all the suggestions. Best Regards Qiao