Date: Fri, 28 Jul 2017 10:28:32 +0100
From: Will Deacon
To: Vikram Mulukutla
Cc: qiaozhou, Thomas Gleixner, John Stultz, sboyd@codeaurora.org, LKML,
	Wang Wilbur, Marc Zyngier, Peter Zijlstra,
	linux-kernel-owner@vger.kernel.org, sudeep.holla@arm.com
Subject: Re: [Question]: try to fix contention between expire_timers and try_to_del_timer_sync
Message-ID: <20170728092831.GA24839@arm.com>
References: <3d2459c7-defd-a47e-6cea-007c10cecaac@asrmicro.com>

On Thu, Jul 27, 2017 at 06:10:34PM -0700, Vikram Mulukutla wrote:
> On 2017-07-26 18:29, qiaozhou wrote:
> >On 26 Jul 2017, 22:16, Thomas Gleixner wrote:
> >>On Wed, 26 Jul 2017, qiaozhou wrote:
> >>For that particular timer case we can clear base->running_timer w/o the
> >>lock held (see patch below), but this kind of
> >>
> >>   lock -> test -> unlock -> retry
> >>
> >>loops are all over the place in the kernel, so this is going to hurt you
> >>sooner than later in some other place.
> >It's true. This is the way spinlocks are normally and widely used in the
> >kernel. I'll also ask the ARM experts whether we can do something to
> >avoid or reduce the chance of such an issue. ARMv8.1 has a single
> >instruction (ldadda) to replace the ldaxr/stxr loop; hopefully it can
> >improve things and reduce the chance.
>
> I think we should have this discussion now - I brought this up earlier [1]
> and I promised a test case that I completely forgot about - but here it
> is (attached). Essentially a big CPU in an acquire-check-release loop
> will have an unfair advantage over a little CPU concurrently attempting
> to acquire the same lock, in spite of the ticket implementation. If the
> big CPU needs the little CPU to make forward progress: livelock.
>
> We've run into the same loop construct in other spots in the kernel, and
> the reason that a real symptom is so rare is that the retry loop on the
> big CPU needs to be interrupted just once by, say, an IRQ/FIQ and the
> livelock is broken. If the entire retry loop is within an
> interrupt-disabled critical section, the odds of livelocking are much
> higher.
>
> An example of the problem on a previous kernel is here [2]. Changes to
> the workqueue code since may have fixed this particular instance.
>
> One solution was to use udelay(1) in such loops instead of cpu_relax(),
> but that's not very 'relaxing'. I'm not sure if there's something we
> could do within the ticket spinlock implementation to deal with this.

Does bodging cpu_relax() to back off to wfe after a while help? The event
stream will wake it up if nothing else does. Nasty patch below, but I'd be
interested to know whether or not it helps.
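For context, the retry construct under discussion has roughly the shape of
the del_timer_sync() loop below. This is only a minimal, illustrative sketch
of that pattern, not a quote of the exact call site:

	for (;;) {
		/* takes and drops base->lock on every attempt */
		int ret = try_to_del_timer_sync(timer);

		if (ret >= 0)
			return ret;
		cpu_relax();	/* a faster CPU can win the lock every time round */
	}

The CPU that ran the timer callback needs base->lock again to clear
base->running_timer, so if the CPU spinning here re-acquires the lock on
every iteration, neither side makes progress.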
Will

--->8

diff --git a/arch/arm64/include/asm/processor.h b/arch/arm64/include/asm/processor.h
index 64c9e78f9882..1f5a29c8612e 100644
--- a/arch/arm64/include/asm/processor.h
+++ b/arch/arm64/include/asm/processor.h
@@ -149,9 +149,11 @@ extern void release_thread(struct task_struct *);
 
 unsigned long get_wchan(struct task_struct *p);
 
+void __cpu_relax(unsigned long pc);
+
 static inline void cpu_relax(void)
 {
-	asm volatile("yield" ::: "memory");
+	__cpu_relax(_THIS_IP_);
 }
 
 /* Thread switching */
diff --git a/arch/arm64/kernel/arm64ksyms.c b/arch/arm64/kernel/arm64ksyms.c
index 67368c7329c0..be8a698ea680 100644
--- a/arch/arm64/kernel/arm64ksyms.c
+++ b/arch/arm64/kernel/arm64ksyms.c
@@ -72,6 +72,8 @@ EXPORT_SYMBOL(_mcount);
 NOKPROBE_SYMBOL(_mcount);
 #endif
 
+EXPORT_SYMBOL(__cpu_relax);
+
 	/* arm-smccc */
 EXPORT_SYMBOL(__arm_smccc_smc);
 EXPORT_SYMBOL(__arm_smccc_hvc);
diff --git a/arch/arm64/kernel/process.c b/arch/arm64/kernel/process.c
index 659ae8094ed5..c394c3704b7f 100644
--- a/arch/arm64/kernel/process.c
+++ b/arch/arm64/kernel/process.c
@@ -403,6 +403,31 @@ unsigned long get_wchan(struct task_struct *p)
 	return ret;
 }
 
+static DEFINE_PER_CPU(u64, __cpu_relax_data);
+
+#define CPU_RELAX_WFE_THRESHOLD	10000
+void __cpu_relax(unsigned long pc)
+{
+	u64 new, old = raw_cpu_read(__cpu_relax_data);
+	u32 old_pc, new_pc;
+	bool wfe = false;
+
+	old_pc = (u32)old;
+	new = new_pc = (u32)pc;
+
+	if (old_pc == new_pc) {
+		if ((old >> 32) > CPU_RELAX_WFE_THRESHOLD) {
+			asm volatile("sevl; wfe; wfe\n" ::: "memory");
+			wfe = true;
+		} else {
+			new = old + (1UL << 32);
+		}
+	}
+
+	if (this_cpu_cmpxchg(__cpu_relax_data, old, new) == old && !wfe)
+		asm volatile("yield" ::: "memory");
+}
+
 unsigned long arch_align_stack(unsigned long sp)
 {
 	if (!(current->personality & ADDR_NO_RANDOMIZE) && randomize_va_space)
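A note on how the patch above behaves, as I read it (my wording, not Will's):
the per-CPU u64 packs the calling PC into its low 32 bits and a count of
consecutive calls from that same PC into the high 32 bits, so the back-off
only triggers when one call site keeps spinning past CPU_RELAX_WFE_THRESHOLD.
At that point the plain yield is replaced by sevl; wfe; wfe - the sevl and
first wfe drain any already-pending event, so the second wfe genuinely waits,
bounded by the periodic event stream. A userspace-runnable sketch of just
that bookkeeping (single-threaded model only; the caller PC is hypothetical
and there is no per-CPU data, cmpxchg, wfe or yield here):

#include <stdbool.h>
#include <stdint.h>
#include <stdio.h>

#define CPU_RELAX_WFE_THRESHOLD	10000

/* low 32 bits: caller PC, high 32 bits: count of consecutive same-PC calls */
static uint64_t relax_data;

static bool cpu_relax_would_wfe(uint32_t pc)
{
	uint64_t old = relax_data;

	if ((uint32_t)old == pc) {
		if ((old >> 32) > CPU_RELAX_WFE_THRESHOLD)
			return true;			/* back off to wfe */
		relax_data = old + (1ULL << 32);	/* same PC again: bump count */
	} else {
		relax_data = pc;			/* new call site: count restarts at 0 */
	}
	return false;					/* keep yielding */
}

int main(void)
{
	uint32_t pc = 0x08101234;	/* hypothetical caller PC */
	unsigned int spins = 0;

	while (!cpu_relax_would_wfe(pc))
		spins++;

	printf("fell back to wfe after %u spins\n", spins);
	return 0;
}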