Date: Wed, 26 Sep 2018 13:53:02 +0100
From: Will Deacon
To: Sebastian Andrzej Siewior
Cc: linux-kernel@vger.kernel.org, Daniel Wagner, Peter Zijlstra,
    x86@kernel.org, Linus Torvalds, "H. Peter Anvin", Boqun Feng,
    "Paul E. McKenney"
McKenney" Subject: Re: [Problem] Cache line starvation Message-ID: <20180926125301.GE2979@brain-police> References: <20180921120226.6xjgr4oiho22ex75@linutronix.de> MIME-Version: 1.0 Content-Type: text/plain; charset=us-ascii Content-Disposition: inline In-Reply-To: <20180921120226.6xjgr4oiho22ex75@linutronix.de> User-Agent: Mutt/1.9.4 (2018-02-28) Sender: linux-kernel-owner@vger.kernel.org Precedence: bulk List-ID: X-Mailing-List: linux-kernel@vger.kernel.org Hi all, On Fri, Sep 21, 2018 at 02:02:26PM +0200, Sebastian Andrzej Siewior wrote: > We reproducibly observe cache line starvation on a Core2Duo E6850 (2 > cores), a i5-6400 SKL (4 cores) and on a NXP LS2044A ARM Cortex-A72 (4 > cores). > > Instrumentation show always the picture: > > CPU0 CPU1 > => do_syscall_64 => do_syscall_64 > => SyS_ptrace => syscall_slow_exit_work > => ptrace_check_attach => ptrace_do_notify / rt_read_unlock > => wait_task_inactive rt_spin_lock_slowunlock() > -> while task_running() __rt_mutex_unlock_common() > / check_task_state() mark_wakeup_next_waiter() > | raw_spin_lock_irq(&p->pi_lock); raw_spin_lock(¤t->pi_lock); > | . . > | raw_spin_unlock_irq(&p->pi_lock); . > \ cpu_relax() . > - . > *IRQ* > > In the error case we observe that the while() loop is repeated more than > 5000 times which indicates that the pi_lock can be acquired. CPU1 on the > other side does not make progress waiting for the same lock with interrupts > disabled. > > This continues until an IRQ hits CPU0. Once CPU0 starts processing the IRQ > the other CPU is able to acquire pi_lock and the situation relaxes. > > Peter suggested to do a clwb(&p->pi_lock); before the cpu_relax() in > wait_task_inactive() which on both the Core2Duo and the SKL gets runtime > patched to clflush(). That hides it as well. Given the broadcast nature of cache-flushing, I'd be pretty nervous about adding it on anything other than a case-by-case basis. That doesn't sound like something we'd want to maintain... It would also be interesting to know whether the problem is actually before the cache (i.e. if the lock actually sits in the store buffer on CPU0). Does MFENCE/DSB after the unlock() help at all? We've previously seen something similar to this on arm64 in big/little systems where the big cores can loop around and re-take a spinlock before the little guys can get in the queue or take a ticket. I bodged that in cpu_relax(), but there's a magic heuristic which I couldn't figure out how to specify: https://lkml.org/lkml/2017/7/28/172 For A72 (which is the core I think you're using) it would be interesting to try both: (1) Removing the prfm instruction from spin_lock(), and (2) Setting bit 42 of CPUACTLR_EL1 on each CPU (probably needs a firmware change) That should prevent the lock() operation from speculatively pulling in the cacheline in a unique state. More recent Arm CPUs have atomic instructions which, apart from CAS, *should* avoid this starvation issue entirely. Will