Received: by 10.213.65.68 with SMTP id h4csp2394551imn; Thu, 5 Apr 2018 14:18:01 -0700 (PDT) X-Google-Smtp-Source: AIpwx4/+KeW6YEs7npOAg+d9G/jR9eQq3ROJ0BV+4mdYpX32kjhaZWRBeziuy8CkW63U+tqGKy6T X-Received: by 10.98.211.4 with SMTP id q4mr18635996pfg.0.1522963081906; Thu, 05 Apr 2018 14:18:01 -0700 (PDT) ARC-Seal: i=1; a=rsa-sha256; t=1522963081; cv=none; d=google.com; s=arc-20160816; b=pAeGse+Tp2xSyrk2kshxL/6NJ1SgUjbizapmXSlzFKHQ6PitDO3SDERF+cj63cx1NG dmFW3SZEpD5Xiqdd4gf5xW5DqrVfBw9Hc8I3Fyk/SeyqBy0G5iqRGWR1CO0A0Qme6YMY zQxfnunM3ab55ynW4oYY3fSDUCVNutW6nhQ5Fr0jgOaNllypRkcI/VKD15JqDFheM7Cf ueJ5CrrGxkeLp+tBmHiMguod8gGSKss8t+YjtW4GDN0+P+hLe4x7GW83l2OGDuqKXImS hvWhisfRsm8b3GYU0vG301wdfZ5kB+w6/XINPzNdPDRzpIb/QYSYfL3MW1Gla9roqkeD ePng== ARC-Message-Signature: i=1; a=rsa-sha256; c=relaxed/relaxed; d=google.com; s=arc-20160816; h=list-id:precedence:sender:content-language :content-transfer-encoding:in-reply-to:mime-version:user-agent:date :message-id:organization:from:references:cc:to:subject :arc-authentication-results; bh=ecHMqZ/YlRck/+ISm65MQstZf4aKA8fLk7BKmhuyFlU=; b=Ofkp2LfZ45GGS5oBJoJ5Qp2ffGj2+qWIgPhXq8L+4fjRvLD1M52ZAFMRPnMUka8KPS P7CjZtmN3795PeyTwNm139dWDXm+T9Hn9EpoPsY/qA+PMD9GPmJVYim8c8jaIRFU8UOY om+/J8ylZtzmU1TPAgAQQwVxlyQZKWE+8oz8ml6J+is6jHYOIxaJa9ejMQJ4fkGwJMtA oB6QJJW2IqPMvUcuV5AR5jyp98+Iv3A9PCKWqxEENv4LTT8sM7VXXYyzGAvf8o6PgRHK 91ahZGzMaZyFQnhU86mMdKckMYF0n2AjCPp7XZsAJ8/HXoud/J9P+1Tt0I7uDYRGlsph N1+A== ARC-Authentication-Results: i=1; mx.google.com; spf=pass (google.com: best guess record for domain of linux-kernel-owner@vger.kernel.org designates 209.132.180.67 as permitted sender) smtp.mailfrom=linux-kernel-owner@vger.kernel.org; dmarc=fail (p=NONE sp=NONE dis=NONE) header.from=redhat.com Return-Path: Received: from vger.kernel.org (vger.kernel.org. [209.132.180.67]) by mx.google.com with ESMTP id b9-v6si1135157plb.160.2018.04.05.14.17.47; Thu, 05 Apr 2018 14:18:01 -0700 (PDT) Received-SPF: pass (google.com: best guess record for domain of linux-kernel-owner@vger.kernel.org designates 209.132.180.67 as permitted sender) client-ip=209.132.180.67; Authentication-Results: mx.google.com; spf=pass (google.com: best guess record for domain of linux-kernel-owner@vger.kernel.org designates 209.132.180.67 as permitted sender) smtp.mailfrom=linux-kernel-owner@vger.kernel.org; dmarc=fail (p=NONE sp=NONE dis=NONE) header.from=redhat.com Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand id S1751483AbeDEVQS (ORCPT + 99 others); Thu, 5 Apr 2018 17:16:18 -0400 Received: from mx3-rdu2.redhat.com ([66.187.233.73]:40484 "EHLO mx1.redhat.com" rhost-flags-OK-OK-OK-FAIL) by vger.kernel.org with ESMTP id S1751278AbeDEVQR (ORCPT ); Thu, 5 Apr 2018 17:16:17 -0400 Received: from smtp.corp.redhat.com (int-mx04.intmail.prod.int.rdu2.redhat.com [10.11.54.4]) (using TLSv1.2 with cipher AECDH-AES256-SHA (256/256 bits)) (No client certificate requested) by mx1.redhat.com (Postfix) with ESMTPS id 205268185338; Thu, 5 Apr 2018 21:16:17 +0000 (UTC) Received: from llong.remote.csb (dhcp-17-164.bos.redhat.com [10.18.17.164]) by smtp.corp.redhat.com (Postfix) with ESMTP id B508C2026980; Thu, 5 Apr 2018 21:16:16 +0000 (UTC) Subject: Re: [PATCH 02/10] locking/qspinlock: Remove unbounded cmpxchg loop from locking slowpath To: Will Deacon , linux-kernel@vger.kernel.org Cc: linux-arm-kernel@lists.infradead.org, peterz@infradead.org, mingo@kernel.org, boqun.feng@gmail.com, paulmck@linux.vnet.ibm.com, catalin.marinas@arm.com References: <1522947547-24081-1-git-send-email-will.deacon@arm.com> <1522947547-24081-3-git-send-email-will.deacon@arm.com> From: Waiman Long Organization: Red Hat Message-ID: Date: Thu, 5 Apr 2018 17:16:16 -0400 User-Agent: Mozilla/5.0 (X11; Linux x86_64; rv:52.0) Gecko/20100101 Thunderbird/52.2.0 MIME-Version: 1.0 In-Reply-To: <1522947547-24081-3-git-send-email-will.deacon@arm.com> Content-Type: text/plain; charset=utf-8 Content-Transfer-Encoding: 7bit Content-Language: en-US X-Scanned-By: MIMEDefang 2.78 on 10.11.54.4 X-Greylist: Sender IP whitelisted, not delayed by milter-greylist-4.5.16 (mx1.redhat.com [10.11.55.8]); Thu, 05 Apr 2018 21:16:17 +0000 (UTC) X-Greylist: inspected by milter-greylist-4.5.16 (mx1.redhat.com [10.11.55.8]); Thu, 05 Apr 2018 21:16:17 +0000 (UTC) for IP:'10.11.54.4' DOMAIN:'int-mx04.intmail.prod.int.rdu2.redhat.com' HELO:'smtp.corp.redhat.com' FROM:'longman@redhat.com' RCPT:'' Sender: linux-kernel-owner@vger.kernel.org Precedence: bulk List-ID: X-Mailing-List: linux-kernel@vger.kernel.org On 04/05/2018 12:58 PM, Will Deacon wrote: > The qspinlock locking slowpath utilises a "pending" bit as a simple form > of an embedded test-and-set lock that can avoid the overhead of explicit > queuing in cases where the lock is held but uncontended. This bit is > managed using a cmpxchg loop which tries to transition the uncontended > lock word from (0,0,0) -> (0,0,1) or (0,0,1) -> (0,1,1). > > Unfortunately, the cmpxchg loop is unbounded and lockers can be starved > indefinitely if the lock word is seen to oscillate between unlocked > (0,0,0) and locked (0,0,1). This could happen if concurrent lockers are > able to take the lock in the cmpxchg loop without queuing and pass it > around amongst themselves. > > This patch fixes the problem by unconditionally setting _Q_PENDING_VAL > using atomic_fetch_or, and then inspecting the old value to see whether > we need to spin on the current lock owner, or whether we now effectively > hold the lock. The tricky scenario is when concurrent lockers end up > queuing on the lock and the lock becomes available, causing us to see > a lockword of (n,0,0). With pending now set, simply queuing could lead > to deadlock as the head of the queue may not have observed the pending > flag being cleared. Conversely, if the head of the queue did observe > pending being cleared, then it could transition the lock from (n,0,0) -> > (0,0,1) meaning that any attempt to "undo" our setting of the pending > bit could race with a concurrent locker trying to set it. > > We handle this race by preserving the pending bit when taking the lock > after reaching the head of the queue and leaving the tail entry intact > if we saw pending set, because we know that the tail is going to be > updated shortly. > > Cc: Peter Zijlstra > Cc: Ingo Molnar > Signed-off-by: Will Deacon > --- > kernel/locking/qspinlock.c | 80 ++++++++++++++++++++-------------------------- > 1 file changed, 35 insertions(+), 45 deletions(-) > > diff --git a/kernel/locking/qspinlock.c b/kernel/locking/qspinlock.c > index a192af2fe378..b75361d23ea5 100644 > --- a/kernel/locking/qspinlock.c > +++ b/kernel/locking/qspinlock.c > @@ -294,7 +294,7 @@ static __always_inline u32 __pv_wait_head_or_lock(struct qspinlock *lock, > void queued_spin_lock_slowpath(struct qspinlock *lock, u32 val) > { > struct mcs_spinlock *prev, *next, *node; > - u32 new, old, tail; > + u32 old, tail; > int idx; > > BUILD_BUG_ON(CONFIG_NR_CPUS >= (1U << _Q_TAIL_CPU_BITS)); > @@ -306,58 +306,48 @@ void queued_spin_lock_slowpath(struct qspinlock *lock, u32 val) > return; > > /* > + * If we observe any contention; queue. > + */ > + if (val & ~_Q_LOCKED_MASK) > + goto queue; > + > + /* > * trylock || pending > * > * 0,0,0 -> 0,0,1 ; trylock > * 0,0,1 -> 0,1,1 ; pending > */ > - for (;;) { > + val = atomic_fetch_or_acquire(_Q_PENDING_VAL, &lock->val); > + if (!(val & ~_Q_LOCKED_MASK)) { > /* > - * If we observe any contention; queue. > + * we're pending, wait for the owner to go away. > + * > + * *,1,1 -> *,1,0 > + * > + * this wait loop must be a load-acquire such that we match the > + * store-release that clears the locked bit and create lock > + * sequentiality; this is because not all > + * clear_pending_set_locked() implementations imply full > + * barriers. > */ > - if (val & ~_Q_LOCKED_MASK) > - goto queue; > - > - new = _Q_LOCKED_VAL; > - if (val == new) > - new |= _Q_PENDING_VAL; > - > + if (val & _Q_LOCKED_MASK) > + smp_cond_load_acquire(&lock->val.counter, > + !(VAL & _Q_LOCKED_MASK)); > /* > - * Acquire semantic is required here as the function may > - * return immediately if the lock was free. > + * take ownership and clear the pending bit. > + * > + * *,1,0 -> *,0,1 > */ > - old = atomic_cmpxchg_acquire(&lock->val, val, new); > - if (old == val) > - break; > - > - val = old; > - } > - > - /* > - * we won the trylock > - */ > - if (new == _Q_LOCKED_VAL) > + clear_pending_set_locked(lock); > return; > + } > > /* > - * we're pending, wait for the owner to go away. > - * > - * *,1,1 -> *,1,0 > - * > - * this wait loop must be a load-acquire such that we match the > - * store-release that clears the locked bit and create lock > - * sequentiality; this is because not all clear_pending_set_locked() > - * implementations imply full barriers. > - */ > - smp_cond_load_acquire(&lock->val.counter, !(VAL & _Q_LOCKED_MASK)); > - > - /* > - * take ownership and clear the pending bit. > - * > - * *,1,0 -> *,0,1 > + * If pending was clear but there are waiters in the queue, then > + * we need to undo our setting of pending before we queue ourselves. > */ > - clear_pending_set_locked(lock); > - return; > + if (!(val & _Q_PENDING_MASK)) > + atomic_andnot(_Q_PENDING_VAL, &lock->val); Can we add a clear_pending() helper that will just clear the byte if _Q_PENDING_BITS == 8? That will eliminate one atomic instruction from the failure path. -Longman