Date: Wed, 26 Sep 2018 22:52:08 +0200
From: Andrea Parri
To: Peter Zijlstra
Cc: will.deacon@arm.com, mingo@kernel.org, linux-kernel@vger.kernel.org,
	longman@redhat.com, tglx@linutronix.de
Subject: Re: [RFC][PATCH 3/3] locking/qspinlock: Optimize for x86
Message-ID: <20180926205208.GA4864@andrea>
References: <20180926110117.405325143@infradead.org>
	<20180926111307.513429499@infradead.org>
In-Reply-To: <20180926111307.513429499@infradead.org>

On Wed, Sep 26, 2018 at 01:01:20PM +0200, Peter Zijlstra wrote:
> On x86 we cannot do fetch_or with a single instruction and end up
> using a cmpxchg loop, which reduces determinism. Replace the fetch_or
> with a very tricky composite xchg8 + load.
>
> The basic idea is that we use xchg8 to test-and-set the pending bit
> (when it is a byte) and then a load to fetch the whole word. Using
> two instructions of course opens a window we previously did not have.
> In particular the ordering between pending and tail is of interest,
> because that is where the split happens.
>
> The claim is that if we order them, it all works out just fine. There
> are two specific cases where the pending,tail state changes:
>
>  - when the 3rd lock(er) comes in and finds pending set, it'll queue
>    and set tail; since we set tail while pending is set, the ordering
>    of the split is not important (and not fundamentally different from
>    fetch_or). [*]
>
>  - when the last queued lock holder acquires the lock (uncontended),
>    we clear the tail and set the lock byte. By first setting the
>    pending bit this cmpxchg will fail and the later load must then
>    see the remaining tail.
>
> Another interesting scenario is where there are only 2 threads:
>
> 	lock := (0,0,0)
>
> 	CPU 0			CPU 1
>
> 	lock()			lock()
> 	  trylock(-> 0,0,1)	  trylock() /* fail */
> 	  return;		  xchg_relaxed(pending, 1) (-> 0,1,1)
> 				  mb()
> 				  val = smp_load_acquire(*lock);
>
> Where, without the mb() the load would've been allowed to return 0 for
> the locked byte.
If this were true, we would have a violation of "coherence":

C peterz

{}

P0(atomic_t *val)
{
	int r0;

	/* in queued_spin_trylock() */
	r0 = atomic_cmpxchg_acquire(val, 0, 1);
}

P1(atomic_t *val)
{
	int r0;
	int r1;

	/* in queued_spin_trylock() */
	r0 = atomic_cmpxchg_acquire(val, 0, 1); /* or atomic_read(val) */

	/* in queued_spin_lock_slowpath() -- set_pending_fetch_acquire() */
	r1 = atomic_read_acquire(val);
}

exists (0:r0=0 /\ ~1:r0=0 /\ 1:r1=0)

  Andrea


>
> [*] there is a fun scenario where the 3rd locker observes the pending
> bit, but before it sets the tail, the 1st lock (owner) unlocks and the
> 2nd locker acquires the lock, leaving us with a temporary 0,0,1 state.
> But this is not different between fetch_or and this new method.
>
> Suggested-by: Will Deacon
> Signed-off-by: Peter Zijlstra (Intel)
> ---
>  arch/x86/include/asm/qspinlock.h |    2 +
>  kernel/locking/qspinlock.c       |   56 ++++++++++++++++++++++++++++++++++++++-
>  2 files changed, 57 insertions(+), 1 deletion(-)
>
> --- a/arch/x86/include/asm/qspinlock.h
> +++ b/arch/x86/include/asm/qspinlock.h
> @@ -9,6 +9,8 @@
>  
>  #define _Q_PENDING_LOOPS	(1 << 9)
>  
> +#define _Q_NO_FETCH_OR
> +
>  #ifdef CONFIG_PARAVIRT_SPINLOCKS
>  extern void native_queued_spin_lock_slowpath(struct qspinlock *lock, u32 val);
>  extern void __pv_init_lock_hash(void);
> --- a/kernel/locking/qspinlock.c
> +++ b/kernel/locking/qspinlock.c
> @@ -232,6 +232,60 @@ static __always_inline u32 xchg_tail(str
>  #endif /* _Q_PENDING_BITS == 8 */
>  
>  /**
> + * set_pending_fetch_acquire - fetch the whole lock value and set pending and locked
> + * @lock : Pointer to queued spinlock structure
> + * Return: The previous lock value
> + *
> + * *,*,* -> *,1,*
> + */
> +static __always_inline u32 set_pending_fetch_acquire(struct qspinlock *lock)
> +{
> +#if defined(_Q_NO_FETCH_OR) && _Q_PENDING_BITS == 8
> +	u32 val;
> +
> +	/*
> +	 * For the !LL/SC archs that do not have a native atomic_fetch_or
> +	 * but do have a native xchg (x86) we can do trickery and avoid the
> +	 * cmpxchg-loop based fetch-or to improve determinism.
> +	 *
> +	 * We first xchg the pending byte to set PENDING, and then issue a load
> +	 * for the rest of the word and mask out the pending byte to compute a
> +	 * 'full' value.
> +	 */
> +	val = xchg_relaxed(&lock->pending, 1) << _Q_PENDING_OFFSET;
> +	/*
> +	 * Ensures the tail load happens after the xchg().
> +	 *
> +	 *	  lock  unlock    (a)
> +	 *   xchg ---------------.
> +	 *    (b)  lock  unlock  +----- fetch_or
> +	 *   load ---------------'
> +	 *	  lock  unlock    (c)
> +	 *
> +	 * For both lock and unlock, (a) and (c) are the same as fetch_or(),
> +	 * since either is fully before or after. The only new case is (b),
> +	 * where a concurrent lock or unlock can interleave with the composite
> +	 * operation.
> +	 *
> +	 * In either case, it is the queueing case that is of interest; otherwise
> +	 * the whole operation is covered by the xchg() and the tail will be 0.
> +	 *
> +	 * For lock-(b); we only care if we set the PENDING bit or not. If we lost
> +	 * the PENDING race, we queue. Otherwise we'd observe the tail anyway.
> +	 *
> +	 * For unlock-(b); since we'll have set PENDING, the uncontended claim
> +	 * will fail and we'll observe a !0 tail.
> +	 */
> +	smp_mb__after_atomic();
> +	val |= atomic_read_acquire(&lock->val) & ~_Q_PENDING_MASK;
> +
> +	return val;
> +#else
> +	return atomic_fetch_or_acquire(_Q_PENDING_VAL, &lock->val);
> +#endif
> +}
> +
> +/**
>   * set_locked - Set the lock bit and own the lock
>   * @lock: Pointer to queued spinlock structure
>   *
> @@ -328,7 +382,7 @@ void queued_spin_lock_slowpath(struct qs
>  	 *
>  	 * 0,0,* -> 0,1,* -> 0,0,1 pending, trylock
>  	 */
> -	val = atomic_fetch_or_acquire(_Q_PENDING_VAL, &lock->val);
> +	val = set_pending_fetch_acquire(lock);
>  
>  	/*
>  	 * If we observe contention, there was a concurrent lock.
>
>
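
A side note on the litmus test above: assuming herd7 and the Linux-kernel
memory model from tools/memory-model/ are installed (the file name
"peterz.litmus" below is only a placeholder for wherever the test is saved),
it can be checked against the model roughly as described in the
tools/memory-model/README:

	$ cd tools/memory-model
	$ herd7 -conf linux-kernel.cfg peterz.litmus

herd7 then reports whether the "exists" clause is ever satisfied, i.e.
whether the coherence-violating outcome singled out above is observable
under the model.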