Date: Thu, 27 Sep 2018 16:25:47 +0200
From: Kurt Kanzenbach
To: Will Deacon
Cc: Sebastian Andrzej Siewior, linux-kernel@vger.kernel.org,
    Daniel Wagner, Peter Zijlstra, x86@kernel.org, Linus Torvalds,
    "H. Peter Anvin", Boqun Feng, "Paul E. McKenney", Mark Rutland
Subject: Re: [Problem] Cache line starvation
Message-ID: <20180927142547.ucgh5elb7pxs46dq@linutronix.de>
References: <20180921120226.6xjgr4oiho22ex75@linutronix.de> <20180926125301.GE2979@brain-police>
In-Reply-To: <20180926125301.GE2979@brain-police>
X-Mailing-List: linux-kernel@vger.kernel.org

Hi Will,

On Wed, Sep 26, 2018 at 01:53:02PM +0100, Will Deacon wrote:
> Hi all,
>
> On Fri, Sep 21, 2018 at 02:02:26PM +0200, Sebastian Andrzej Siewior wrote:
> > We reproducibly observe cache line starvation on a Core2Duo E6850 (2
> > cores), an i5-6400 SKL (4 cores) and on an NXP LS2044A ARM Cortex-A72 (4
> > cores).
> >
> > Instrumentation always shows the same picture:
> >
> > CPU0                                      CPU1
> > => do_syscall_64                          => do_syscall_64
> > => SyS_ptrace                             => syscall_slow_exit_work
> > => ptrace_check_attach                    => ptrace_do_notify / rt_read_unlock
> > => wait_task_inactive                        rt_spin_lock_slowunlock()
> >    -> while task_running()                   __rt_mutex_unlock_common()
> >   /   check_task_state()                     mark_wakeup_next_waiter()
> >  |     raw_spin_lock_irq(&p->pi_lock);       raw_spin_lock(&current->pi_lock);
> >  |     .                                     .
> >  |     raw_spin_unlock_irq(&p->pi_lock);     .
> >   \   cpu_relax()                            .
> >    -                                         .
> >     *IRQ*
> >
> > In the error case we observe that the while() loop is repeated more than
> > 5000 times, which indicates that the pi_lock can be acquired. CPU1 on the
> > other side does not make progress waiting for the same lock with interrupts
> > disabled.
> >
> > This continues until an IRQ hits CPU0. Once CPU0 starts processing the IRQ,
> > the other CPU is able to acquire pi_lock and the situation relaxes.
> >
> > Peter suggested doing a clwb(&p->pi_lock); before the cpu_relax() in
> > wait_task_inactive(), which on both the Core2Duo and the SKL gets runtime
> > patched to clflush(). That hides it as well.
>
> Given the broadcast nature of cache-flushing, I'd be pretty nervous about
> adding it on anything other than a case-by-case basis. That doesn't sound
> like something we'd want to maintain... It would also be interesting to know
> whether the problem is actually before the cache (i.e. if the lock actually
> sits in the store buffer on CPU0). Does MFENCE/DSB after the unlock() help at
> all?
>
> We've previously seen something similar to this on arm64 in big/little
> systems where the big cores can loop around and re-take a spinlock before
> the little guys can get in the queue or take a ticket. I bodged that in
> cpu_relax(), but there's a magic heuristic which I couldn't figure out how
> to specify:
>
> https://lkml.org/lkml/2017/7/28/172
>
> For A72 (which is the core I think you're using) it would be interesting to
> try both:
>
> (1) Removing the prfm instruction from spin_lock(), and
> (2) Setting bit 42 of CPUACTLR_EL1 on each CPU (probably needs a
>     firmware change)

Correct, we use the Cortex-A72. I followed your suggestions: I removed the
prefetch instructions from the spinlock implementation in the v4.9 kernel. In
addition, I modified armv8/start.S in U-Boot to set bit 42 in CPUACTLR_EL1
(S3_1_c15_c2_0). We also made sure that this bit is actually set on each CPU
by reading back the register value in the kernel.

However, the issue still triggers reliably. With stress-ng we're able to
generate latencies in the millisecond range. The only workaround we've found
so far is to add a "delay" to cpu_relax().

Any ideas what else we could test?

Thanks,
Kurt

>
> That should prevent the lock() operation from speculatively pulling in the
> cacheline in a unique state.
>
> More recent Arm CPUs have atomic instructions which, apart from CAS,
> *should* avoid this starvation issue entirely.
>
> Will