Date: Thu, 27 Sep 2018 16:41:27 +0200
From: Kurt Kanzenbach
To: Will Deacon
Cc: Sebastian Andrzej Siewior, linux-kernel@vger.kernel.org, Daniel Wagner,
    Peter Zijlstra, x86@kernel.org, Linus Torvalds, "H. Peter Anvin",
    Boqun Feng, "Paul E. McKenney", Mark Rutland
Subject: Re: [Problem] Cache line starvation
Message-ID: <20180927144127.qdkem4juhztxdkdb@linutronix.de>
References: <20180921120226.6xjgr4oiho22ex75@linutronix.de>
    <20180926125301.GE2979@brain-police>
    <20180927142547.ucgh5elb7pxs46dq@linutronix.de>
In-Reply-To: <20180927142547.ucgh5elb7pxs46dq@linutronix.de>

Hi Will,

On Thu, Sep 27, 2018 at 04:25:47PM +0200, Kurt Kanzenbach wrote:
> Hi Will,
>
> On Wed, Sep 26, 2018 at 01:53:02PM +0100, Will Deacon wrote:
> > Hi all,
> >
> > On Fri, Sep 21, 2018 at 02:02:26PM +0200, Sebastian Andrzej Siewior wrote:
> > > We reproducibly observe cache line starvation on a Core2Duo E6850 (2
> > > cores), a i5-6400 SKL (4 cores) and on a NXP LS2044A ARM Cortex-A72 (4
> > > cores).
> > >
> > > Instrumentation show always the picture:
> > >
> > > CPU0                                    CPU1
> > > => do_syscall_64                        => do_syscall_64
> > > => SyS_ptrace                           => syscall_slow_exit_work
> > > => ptrace_check_attach                  => ptrace_do_notify / rt_read_unlock
> > > => wait_task_inactive                      rt_spin_lock_slowunlock()
> > >    -> while task_running()                 __rt_mutex_unlock_common()
> > >   /   check_task_state()                   mark_wakeup_next_waiter()
> > >  |     raw_spin_lock_irq(&p->pi_lock);     raw_spin_lock(&current->pi_lock);
> > >  |     .                                       .
> > >  |     raw_spin_unlock_irq(&p->pi_lock);       .
> > >   \  cpu_relax()                               .
> > >    -                                           .
> > >                                     *IRQ*
> > >
> > > In the error case we observe that the while() loop is repeated more than
> > > 5000 times which indicates that the pi_lock can be acquired. CPU1 on the
> > > other side does not make progress waiting for the same lock with interrupts
> > > disabled.
> > >
> > > This continues until an IRQ hits CPU0. Once CPU0 starts processing the IRQ
> > > the other CPU is able to acquire pi_lock and the situation relaxes.
> > >
> > > Peter suggested to do a clwb(&p->pi_lock); before the cpu_relax() in
> > > wait_task_inactive() which on both the Core2Duo and the SKL gets runtime
> > > patched to clflush(). That hides it as well.
> >
> > Given the broadcast nature of cache-flushing, I'd be pretty nervous about
> > adding it on anything other than a case-by-case basis. That doesn't sound
> > like something we'd want to maintain... It would also be interesting to know
> > whether the problem is actually before the cache (i.e. if the lock actually
> > sits in the store buffer on CPU0). Does MFENCE/DSB after the unlock() help at
> > all?
> >
> > We've previously seen something similar to this on arm64 in big/little
> > systems where the big cores can loop around and re-take a spinlock before
> > the little guys can get in the queue or take a ticket. I bodged that in
> > cpu_relax(), but there's a magic heuristic which I couldn't figure out how
> > to specify:
> >
> > https://lkml.org/lkml/2017/7/28/172
> >
> > For A72 (which is the core I think you're using) it would be interesting to
> > try both:
> >
> > (1) Removing the prfm instruction from spin_lock(), and
> > (2) Setting bit 42 of CPUACTLR_EL1 on each CPU (probably needs a
> >     firmware change)
>
> correct, we use the Cortex A72.
>
> I followed your suggestions. I've removed the prefetch instructions from
> the spin lock implementation in the v4.9 kernel. In addition I've
> modified armv8/start.S in U-Boot to setup bit 42 in CPUACTLR_EL1
> (S3_1_c15_c2_0). We've also made sure, that this bit is actually written
> for each CPU by reading their register value in the kernel.
>
> However, the issue still triggers fine. With stress-ng we're able to
> generate latency in millisecond range. The only workaround we've found
> so far is to add a "delay" in cpu_relax().

It might be interesting for you to see how we added the delay.
We've used:

static inline void cpu_relax(void)
{
	/* volatile keeps the compiler from dropping the delay loop */
	volatile int i = 0;

	asm volatile("yield" ::: "memory");
	/* crude busy-wait delay */
	while (i++ <= 1000)
		;
}

Of course it's not efficient, but it works.

Thanks,
Kurt

>
> Any ideas, what we can test further?
>
> Thanks,
> Kurt
>
> >
> > That should prevent the lock() operation from speculatively pulling in the
> > cacheline in a unique state.
> >
> > More recent Arm CPUs have atomic instructions which, apart from CAS,
> > *should* avoid this starvation issue entirely.
> >
> > Will
> >
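
To make the pattern easier to poke at outside the kernel, here is a rough
user-space sketch of the same idea. It is only an analogy under stated
assumptions: a test-and-set lock built on C11 atomic_flag stands in for
pi_lock, two pthreads stand in for the two CPUs, and all names (demo_lock,
relax_with_delay(), RELAX_SPINS, run_poll_demo()) are made up for
illustration. It is not expected to reproduce the starvation by itself; it
only shows where the delay sits relative to the lock/unlock pair.

```c
#include <pthread.h>
#include <stdatomic.h>

/* Hypothetical knob, mirroring the 1000 iterations used in cpu_relax(). */
#define RELAX_SPINS 1000

/* Assumption: a plain test-and-set lock, not the kernel's raw_spinlock_t. */
static atomic_flag demo_lock = ATOMIC_FLAG_INIT;
static int demo_done;	/* protected by demo_lock */

static void demo_acquire(void)
{
	/* Spin until the flag was previously clear; no fairness guarantee. */
	while (atomic_flag_test_and_set_explicit(&demo_lock, memory_order_acquire))
		;
}

static void demo_release(void)
{
	atomic_flag_clear_explicit(&demo_lock, memory_order_release);
}

/* User-space stand-in for the patched cpu_relax(): a bounded busy-wait. */
static void relax_with_delay(void)
{
	volatile int i = 0;

	while (i++ <= RELAX_SPINS)
		;
}

/* Plays the role of the polling loop: lock, check, unlock, relax. */
static void *demo_poller(void *unused)
{
	int done;

	(void)unused;
	do {
		demo_acquire();
		done = demo_done;
		demo_release();
		relax_with_delay();	/* lock stays free during the delay */
	} while (!done);
	return NULL;
}

/* Plays the role of the CPU that needs the lock exactly once. */
static void *demo_waker(void *unused)
{
	(void)unused;
	demo_acquire();
	demo_done = 1;
	demo_release();
	return NULL;
}

/* Returns 1 once both threads have finished, -1 if a thread failed to start. */
int run_poll_demo(void)
{
	pthread_t poller, waker;

	demo_done = 0;
	if (pthread_create(&poller, NULL, demo_poller, NULL))
		return -1;
	if (pthread_create(&waker, NULL, demo_waker, NULL)) {
		/* Stop the poller ourselves before bailing out. */
		demo_acquire();
		demo_done = 1;
		demo_release();
		pthread_join(poller, NULL);
		return -1;
	}
	pthread_join(waker, NULL);
	pthread_join(poller, NULL);
	return demo_done;
}
```

run_poll_demo() returns 1 once the waker has taken the lock and the poller
has seen the flag. Changing RELAX_SPINS changes how long the lock stays free
between two polls, which is the effect the delay in cpu_relax() relies on.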