Date: Mon, 11 Apr 2016 21:58:27 -0700
From: Davidlohr Bueso
To: Peter Zijlstra
Cc: linux-kernel@vger.kernel.org, will.deacon@arm.com, waiman.long@hpe.com,
    mingo@redhat.com, paulmck@linux.vnet.ibm.com, boqun.feng@gmail.com,
    torvalds@linux-foundation.org
Subject: Re: [RFC][PATCH 2/3] locking/qrwlock: Use smp_cond_load_acquire()
Message-ID: <20160412045827.GA18437@linux-uzut.site>
References: <20160404122250.340636238@infradead.org>
    <20160404123633.423468902@infradead.org>
In-Reply-To: <20160404123633.423468902@infradead.org>

On Mon, 04 Apr 2016, Peter Zijlstra wrote:

>Use smp_cond_load_acquire() to make better use of the hardware
>assisted 'spin' wait on arm64.
>
>Arguably the second hunk is the more horrid abuse possible, but
>avoids having to use cmpwait (see next patch) directly. Also, this
>makes 'clever' (ab)use of the cond+rmb acquire to omit the acquire
>from cmpxchg().
>
>Signed-off-by: Peter Zijlstra (Intel)
>---
> kernel/locking/qrwlock.c |   18 ++++--------------
> 1 file changed, 4 insertions(+), 14 deletions(-)
>
>--- a/kernel/locking/qrwlock.c
>+++ b/kernel/locking/qrwlock.c
>@@ -53,10 +53,7 @@ struct __qrwlock {
> static __always_inline void
> rspin_until_writer_unlock(struct qrwlock *lock, u32 cnts)
> {
>-	while ((cnts & _QW_WMASK) == _QW_LOCKED) {
>-		cpu_relax_lowlatency();
>-		cnts = atomic_read_acquire(&lock->cnts);
>-	}
>+	smp_cond_load_acquire(&lock->cnts.counter, (VAL & _QW_WMASK) != _QW_LOCKED);
> }
>
> /**
>@@ -109,8 +106,6 @@ EXPORT_SYMBOL(queued_read_lock_slowpath)
>  */
> void queued_write_lock_slowpath(struct qrwlock *lock)
> {
>-	u32 cnts;
>-
> 	/* Put the writer into the wait queue */
> 	arch_spin_lock(&lock->wait_lock);
>
>@@ -134,15 +129,10 @@ void queued_write_lock_slowpath(struct q
> 	}
>
> 	/* When no more readers, set the locked flag */
>-	for (;;) {
>-		cnts = atomic_read(&lock->cnts);
>-		if ((cnts == _QW_WAITING) &&
>-		    (atomic_cmpxchg_acquire(&lock->cnts, _QW_WAITING,
>-					    _QW_LOCKED) == _QW_WAITING))
>-			break;
>+	smp_cond_load_acquire(&lock->cnts.counter,
>+			      (VAL == _QW_WAITING) &&
>+			      atomic_cmpxchg_relaxed(&lock->cnts, _QW_WAITING, _QW_LOCKED) == _QW_WAITING);
>
>-		cpu_relax_lowlatency();

You would need some variant of cpu_relax_lowlatency(), otherwise you'll
be hurting s390, no? fwiw, back when I was looking at this I recall
thinking about possibly introducing an smp_cond_acquire_lowlatency(),
but never got around to it.

Thanks,
Davidlohr
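
ps: in case it helps, a rough sketch of the kind of thing I had in mind
(completely untested; the name, the exact barrier and the overall shape
are just illustrative -- basically the generic polling fallback, but
spinning with cpu_relax_lowlatency() instead of cpu_relax()):

#define smp_cond_load_acquire_lowlatency(ptr, cond_expr) ({	\
	typeof(ptr) __PTR = (ptr);				\
	typeof(*ptr) VAL;					\
	for (;;) {						\
		VAL = READ_ONCE(*__PTR);			\
		if (cond_expr)					\
			break;					\
		/* cheap relax, no hypervisor yield on s390 */	\
		cpu_relax_lowlatency();				\
	}							\
	smp_rmb(); /* ctrl + rmb := acquire */			\
	VAL;							\
})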