Date: Wed, 17 Apr 2019 15:18:41 +0200
From: Peter Zijlstra
To: Waiman Long
Cc: Ingo Molnar, Will Deacon, Thomas Gleixner, linux-kernel@vger.kernel.org,
 x86@kernel.org, Davidlohr Bueso, Linus Torvalds, Tim Chen, huang ying
Subject: Re: [PATCH v4 09/16] locking/rwsem: Ensure an RT task will not spin on reader
Message-ID: <20190417131841.GF4038@hirez.programming.kicks-ass.net>
References: <20190413172259.2740-1-longman@redhat.com>
 <20190413172259.2740-10-longman@redhat.com>
In-Reply-To: <20190413172259.2740-10-longman@redhat.com>

On Sat, Apr 13, 2019 at 01:22:52PM -0400, Waiman Long wrote:
> An RT task can do optimistic spinning only if the lock holder is
> actually running. If the state of the lock holder isn't known, there
> is a possibility that high priority of the RT task may block forward
> progress of the lock holder if it happens to reside on the same CPU.
> This will lead to deadlock. So we have to make sure that an RT task
> will not spin on a reader-owned rwsem.
>
> When the owner is temporarily set to NULL, it is more tricky to decide
> if an RT task should stop spinning as it may be a temporary state
> where another writer may have just stolen the lock which then failed
> the task's trylock attempt. So one more retry is allowed to make sure
> that the lock is not spinnable by an RT task.
>
> When testing on a 8-socket IvyBridge-EX system, the one additional retry
> seems to improve locking performance of RT write locking threads under
> heavy contentions. The table below shows the locking rates (in kops/s)
> with various write locking threads before and after the patch.
>
>   Locking threads    Pre-patch    Post-patch
>   ---------------    ---------    ----------
>          4             2,753        2,608
>          8             2,529        2,520
>         16             1,727        1,918
>         32             1,263        1,956
>         64               889        1,343
>
> Signed-off-by: Waiman Long
> ---
>  kernel/locking/rwsem.c | 36 +++++++++++++++++++++++++++++-------
>  1 file changed, 29 insertions(+), 7 deletions(-)
>
> diff --git a/kernel/locking/rwsem.c b/kernel/locking/rwsem.c
> index 2d6850c3e77b..8e19b5141595 100644
> --- a/kernel/locking/rwsem.c
> +++ b/kernel/locking/rwsem.c
> @@ -539,6 +539,8 @@ static noinline enum owner_state rwsem_spin_on_owner(struct rw_semaphore *sem)
>  static bool rwsem_optimistic_spin(struct rw_semaphore *sem)
>  {
>         bool taken = false;
> +       bool is_rt_task = rt_task(current);

Arguably this is wrong; a remote CPU could change the scheduling
attributes of this task while it is spinning. In practice I don't think
we do that without forcing a reschedule, but in theory we could if we
find the task is current anyway.
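As an illustrative aside, here is a minimal userspace sketch (plain C11
with pthreads, not kernel code; all names are stand-ins) of the hazard
described above: a predicate like rt_task(current) snapshotted once at
loop entry never reflects a change made concurrently by another
CPU/thread, whereas re-reading it each iteration would.

#include <pthread.h>
#include <stdatomic.h>
#include <stdbool.h>
#include <stdio.h>
#include <unistd.h>

static atomic_bool is_rt = false;         /* stand-in for the task's sched policy */
static atomic_bool lock_released = false;

static void *spinner(void *arg)
{
        (void)arg;
        /* Analogue of "bool is_rt_task = rt_task(current);" -- snapshotted once. */
        bool cached_is_rt = atomic_load(&is_rt);
        long spins = 0;

        while (!atomic_load(&lock_released)) {
                /*
                 * With the cached copy, the policy change made by main()
                 * below is never observed; re-reading is_rt here instead
                 * (atomic_load(&is_rt)) would notice it on the next pass.
                 */
                if (cached_is_rt)
                        break;
                spins++;
        }
        printf("spins=%ld cached_is_rt=%d live_is_rt=%d\n",
               spins, (int)cached_is_rt, (int)atomic_load(&is_rt));
        return NULL;
}

int main(void)
{
        pthread_t t;

        pthread_create(&t, NULL, spinner, NULL);
        usleep(10000);
        atomic_store(&is_rt, true);          /* "remote CPU" flips the policy mid-spin */
        usleep(10000);
        atomic_store(&lock_released, true);  /* let the spinner terminate */
        pthread_join(t, NULL);
        return 0;
}
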
> +       int prev_owner_state = OWNER_NULL;
>
>         preempt_disable();
>
> @@ -556,7 +558,12 @@ static bool rwsem_optimistic_spin(struct rw_semaphore *sem)
>          * 2) readers own the lock as we can't determine if they are
>          *    actively running or not.
>          */
> -       while (rwsem_spin_on_owner(sem) & OWNER_SPINNABLE) {
> +       for (;;) {
> +               enum owner_state owner_state = rwsem_spin_on_owner(sem);
> +
> +               if (!(owner_state & OWNER_SPINNABLE))
> +                       break;
> +
>                 /*
>                  * Try to acquire the lock
>                  */
> @@ -566,13 +573,28 @@ static bool rwsem_optimistic_spin(struct rw_semaphore *sem)
>                 }
>
>                 /*
> -                * When there's no owner, we might have preempted between the
> -                * owner acquiring the lock and setting the owner field. If
> -                * we're an RT task that will live-lock because we won't let
> -                * the owner complete.
> +                * An RT task cannot do optimistic spinning if it cannot
> +                * be sure the lock holder is running or live-lock may
> +                * happen if the current task and the lock holder happen
> +                * to run in the same CPU.
> +                *
> +                * When there's no owner or is reader-owned, an RT task
> +                * will stop spinning if the owner state is not a writer
> +                * at the previous iteration of the loop. This allows the
> +                * RT task to recheck if the task that steals the lock is
> +                * a spinnable writer. If so, it can keeps on spinning.
> +                *
> +                * If the owner is a writer, the need_resched() check is
> +                * done inside rwsem_spin_on_owner(). If the owner is not
> +                * a writer, need_resched() check needs to be done here.
>                  */
> -               if (!sem->owner && (need_resched() || rt_task(current)))
> -                       break;
> +               if (owner_state != OWNER_WRITER) {
> +                       if (need_resched())
> +                               break;
> +                       if (is_rt_task && (prev_owner_state != OWNER_WRITER))
> +                               break;
> +               }
> +               prev_owner_state = owner_state;
>
>                 /*
>                  * The cpu_relax() call is a compiler barrier which forces

This patch confuses me mightily. I mean, I see what it does, but I
can't figure out why. The Changelog is just one big source of
confusion.

If you want one extra trylock attempt, why make it conditional on RT,
why not something simple like this?

--- a/kernel/locking/rwsem.c
+++ b/kernel/locking/rwsem.c
@@ -556,7 +556,7 @@ static bool rwsem_optimistic_spin(struct
 		 */
 		if (rwsem_try_write_lock_unqueued(sem)) {
 			taken = true;
-			break;
+			goto unlock;
 		}
 
 		/*
@@ -576,6 +576,11 @@ static bool rwsem_optimistic_spin(struct
 		 */
 		cpu_relax();
 	}
+
+	if (rwsem_try_write_lock_unqueued(sem))
+		taken = true;
+
+unlock:
 	osq_unlock(&sem->osq);
 done:
 	preempt_enable();
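
As a further illustrative aside, a self-contained userspace model
(plain C11; the helpers are stand-ins, not the kernel API) of the
control flow suggested above: spin while it looks worthwhile, then make
one unconditional final trylock attempt after the loop, rather than an
RT-specific retry inside it.

#include <stdatomic.h>
#include <stdbool.h>
#include <stdio.h>

static atomic_int owner = 1;                /* non-zero: lock currently held */

static bool try_write_lock(void)            /* stand-in for rwsem_try_write_lock_unqueued() */
{
        int expected = 0;
        return atomic_compare_exchange_strong(&owner, &expected, 2);
}

static bool owner_spinnable(void)           /* stand-in for rwsem_spin_on_owner() */
{
        return atomic_load(&owner) != 0;
}

static bool optimistic_spin(int max_spins)
{
        bool taken = false;

        for (int i = 0; owner_spinnable() && i < max_spins; i++) {
                if (try_write_lock()) {
                        taken = true;
                        goto unlock;        /* mirrors the "goto unlock" in the diff above */
                }
        }

        /*
         * One unconditional final attempt after the spin loop; this is the
         * window the RT-specific extra retry in the patch tries to cover.
         */
        if (try_write_lock())
                taken = true;
unlock:
        /* osq_unlock()/preempt_enable() would go here in the real code */
        return taken;
}

int main(void)
{
        atomic_store(&owner, 0);            /* pretend the holder just dropped the lock */
        printf("taken=%d\n", (int)optimistic_spin(100));
        return 0;
}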