Date: Mon, 11 Apr 2016 21:58:27 -0700
From: Davidlohr Bueso
To: Peter Zijlstra
Cc: linux-kernel@vger.kernel.org, will.deacon@arm.com, waiman.long@hpe.com,
    mingo@redhat.com, paulmck@linux.vnet.ibm.com, boqun.feng@gmail.com,
    torvalds@linux-foundation.org
Subject: Re: [RFC][PATCH 2/3] locking/qrwlock: Use smp_cond_load_acquire()
Message-ID: <20160412045827.GA18437@linux-uzut.site>
References: <20160404122250.340636238@infradead.org>
    <20160404123633.423468902@infradead.org>
In-Reply-To: <20160404123633.423468902@infradead.org>

On Mon, 04 Apr 2016, Peter Zijlstra wrote:

>Use smp_cond_load_acquire() to make better use of the hardware
>assisted 'spin' wait on arm64.
>
>Arguably the second hunk is the more horrid abuse possible, but
>avoids having to use cmpwait (see next patch) directly. Also, this
>makes 'clever' (ab)use of the cond+rmb acquire to omit the acquire
>from cmpxchg().
>
>Signed-off-by: Peter Zijlstra (Intel)
>---
> kernel/locking/qrwlock.c |   18 ++++--------------
> 1 file changed, 4 insertions(+), 14 deletions(-)
>
>--- a/kernel/locking/qrwlock.c
>+++ b/kernel/locking/qrwlock.c
>@@ -53,10 +53,7 @@ struct __qrwlock {
> static __always_inline void
> rspin_until_writer_unlock(struct qrwlock *lock, u32 cnts)
> {
>-	while ((cnts & _QW_WMASK) == _QW_LOCKED) {
>-		cpu_relax_lowlatency();
>-		cnts = atomic_read_acquire(&lock->cnts);
>-	}
>+	smp_cond_load_acquire(&lock->cnts.counter, (VAL & _QW_WMASK) != _QW_LOCKED);
> }
>
> /**
>@@ -109,8 +106,6 @@ EXPORT_SYMBOL(queued_read_lock_slowpath)
>  */
> void queued_write_lock_slowpath(struct qrwlock *lock)
> {
>-	u32 cnts;
>-
> 	/* Put the writer into the wait queue */
> 	arch_spin_lock(&lock->wait_lock);
>
>@@ -134,15 +129,10 @@ void queued_write_lock_slowpath(struct q
> 	}
>
> 	/* When no more readers, set the locked flag */
>-	for (;;) {
>-		cnts = atomic_read(&lock->cnts);
>-		if ((cnts == _QW_WAITING) &&
>-		    (atomic_cmpxchg_acquire(&lock->cnts, _QW_WAITING,
>-					    _QW_LOCKED) == _QW_WAITING))
>-			break;
>+	smp_cond_load_acquire(&lock->cnts.counter,
>+			      (VAL == _QW_WAITING) &&
>+			      atomic_cmpxchg_relaxed(&lock->cnts, _QW_WAITING, _QW_LOCKED) == _QW_WAITING);
>
>-		cpu_relax_lowlatency();

You would need some variant of cpu_relax_lowlatency(), otherwise you'll
be hurting s390, no? fwiw, back when I was looking at this I recall
thinking about possibly introducing an smp_cond_acquire_lowlatency(),
but never got around to it.

Thanks,
Davidlohr
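
ps: in case it helps, a rough sketch of the kind of thing I had in mind
(completely untested; the name, the exact barrier and the overall shape
are just illustrative -- basically the generic polling fallback, but
spinning with cpu_relax_lowlatency() instead of cpu_relax()):

#define smp_cond_load_acquire_lowlatency(ptr, cond_expr) ({	\
	typeof(ptr) __PTR = (ptr);				\
	typeof(*ptr) VAL;					\
	for (;;) {						\
		VAL = READ_ONCE(*__PTR);			\
		if (cond_expr)					\
			break;					\
		/* cheap relax, no hypervisor yield on s390 */	\
		cpu_relax_lowlatency();				\
	}							\
	smp_rmb(); /* ctrl + rmb := acquire */			\
	VAL;							\
})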