Date: Sat, 5 May 2018 12:00:55 +0200
From: Ingo Molnar
To: Peter Zijlstra, Benjamin Herrenschmidt, Paul Mackerras, Michael Ellerman
Cc: Mark Rutland, linux-arm-kernel@lists.infradead.org, linux-kernel@vger.kernel.org, aryabinin@virtuozzo.com, boqun.feng@gmail.com, catalin.marinas@arm.com, dvyukov@google.com, will.deacon@arm.com
Subject: [RFC PATCH] locking/atomics/powerpc: Introduce optimized cmpxchg_release() family of APIs for PowerPC
Message-ID: <20180505100055.yc4upauxo5etq5ud@gmail.com>
References: <20180504173937.25300-1-mark.rutland@arm.com> <20180504173937.25300-2-mark.rutland@arm.com> <20180504180105.GS12217@hirez.programming.kicks-ass.net> <20180504180909.dnhfflibjwywnm4l@lakrids.cambridge.arm.com> <20180505081100.nsyrqrpzq2vd27bk@gmail.com> <20180505084721.GA32344@noisy.programming.kicks-ass.net> <20180505090403.p2ywuen42rnlwizq@gmail.com> <20180505093829.xfylnedwd5nonhae@gmail.com>
In-Reply-To: <20180505093829.xfylnedwd5nonhae@gmail.com>
* Ingo Molnar wrote:

> > So there's no loss in arch flexibility.
>
> BTW., PowerPC for example is already in such a situation, it does not define
> atomic_cmpxchg_release(), only the other APIs:
>
>   #define atomic_cmpxchg(v, o, n) (cmpxchg(&((v)->counter), (o), (n)))
>   #define atomic_cmpxchg_relaxed(v, o, n) \
>   	cmpxchg_relaxed(&((v)->counter), (o), (n))
>   #define atomic_cmpxchg_acquire(v, o, n) \
>   	cmpxchg_acquire(&((v)->counter), (o), (n))
>
> Was it really the intention on the PowerPC side that the generic code falls back
> to cmpxchg(), i.e.:
>
>   # define atomic_cmpxchg_release(...) __atomic_op_release(atomic_cmpxchg, __VA_ARGS__)
>
> Which after macro expansion becomes:
>
>   smp_mb__before_atomic();
>   atomic_cmpxchg_relaxed(v, o, n);
>
> smp_mb__before_atomic() on PowerPC falls back to the generic __smp_mb(), which
> falls back to mb(), which on PowerPC is the 'sync' instruction.
>
> Isn't this an inefficiency bug?
>
> While I'm pretty clueless about PowerPC low level cmpxchg atomics, they appear to
> have the following basic structure:
>
>   full cmpxchg():
>
>   	PPC_ATOMIC_ENTRY_BARRIER	# sync
>   	ldarx + stdcx
>   	PPC_ATOMIC_EXIT_BARRIER		# sync
>
>   cmpxchg_relaxed():
>
>   	ldarx + stdcx
>
>   cmpxchg_acquire():
>
>   	ldarx + stdcx
>   	PPC_ACQUIRE_BARRIER		# lwsync
>
> The logical extension for cmpxchg_release() would be:
>
>   cmpxchg_release():
>
>   	PPC_RELEASE_BARRIER		# lwsync
>   	ldarx + stdcx
>
> But instead we silently get the generic fallback, which does:
>
>   smp_mb__before_atomic();
>   atomic_cmpxchg_relaxed(v, o, n);
>
> Which maps to:
>
>   sync
>   ldarx + stdcx
>
> Note that it uses a full barrier instead of lwsync (does that stand for
> 'lightweight sync'?).
>
> Even if it turns out we need the full barrier, with the overly finegrained
> structure of the atomics this detail is totally undocumented and non-obvious.

The patch below fills in those bits and implements the optimized
cmpxchg_release() family of APIs.
The end effect should be that cmpxchg_release() will now use 'lwsync' instead
of 'sync' on PowerPC, for the following APIs:

  cmpxchg_release()
  cmpxchg64_release()
  atomic_cmpxchg_release()
  atomic64_cmpxchg_release()

I based this choice of the release barrier on an existing bitops low level
PowerPC method:

   DEFINE_BITOP(clear_bits_unlock, andc, PPC_RELEASE_BARRIER)

This clearly suggests that PPC_RELEASE_BARRIER is in active use and 'lwsync'
is the 'release barrier' instruction, if I interpreted that right.

But I know very little about PowerPC so this might be spectacularly wrong.
It's totally untested as well. I'm also pretty sick today so my mental
capabilities are significantly reduced ...

So not signed off and such.

Thanks,

	Ingo

---
 arch/powerpc/include/asm/atomic.h  |  4 ++
 arch/powerpc/include/asm/cmpxchg.h | 81 ++++++++++++++++++++++++++++++++++++++
 2 files changed, 85 insertions(+)

diff --git a/arch/powerpc/include/asm/atomic.h b/arch/powerpc/include/asm/atomic.h
index 682b3e6a1e21..f7a6f29acb12 100644
--- a/arch/powerpc/include/asm/atomic.h
+++ b/arch/powerpc/include/asm/atomic.h
@@ -213,6 +213,8 @@ static __inline__ int atomic_dec_return_relaxed(atomic_t *v)
 	cmpxchg_relaxed(&((v)->counter), (o), (n))
 #define atomic_cmpxchg_acquire(v, o, n) \
 	cmpxchg_acquire(&((v)->counter), (o), (n))
+#define atomic_cmpxchg_release(v, o, n) \
+	cmpxchg_release(&((v)->counter), (o), (n))
 
 #define atomic_xchg(v, new) (xchg(&((v)->counter), new))
 #define atomic_xchg_relaxed(v, new) xchg_relaxed(&((v)->counter), (new))
@@ -519,6 +521,8 @@ static __inline__ long atomic64_dec_if_positive(atomic64_t *v)
 	cmpxchg_relaxed(&((v)->counter), (o), (n))
 #define atomic64_cmpxchg_acquire(v, o, n) \
 	cmpxchg_acquire(&((v)->counter), (o), (n))
+#define atomic64_cmpxchg_release(v, o, n) \
+	cmpxchg_release(&((v)->counter), (o), (n))
 
 #define atomic64_xchg(v, new) (xchg(&((v)->counter), new))
 #define atomic64_xchg_relaxed(v, new) xchg_relaxed(&((v)->counter), (new))
diff --git a/arch/powerpc/include/asm/cmpxchg.h b/arch/powerpc/include/asm/cmpxchg.h
index 9b001f1f6b32..6e46310b1833 100644
--- a/arch/powerpc/include/asm/cmpxchg.h
+++ b/arch/powerpc/include/asm/cmpxchg.h
@@ -213,10 +213,12 @@ __xchg_relaxed(void *ptr, unsigned long x, unsigned int size)
 CMPXCHG_GEN(u8, , PPC_ATOMIC_ENTRY_BARRIER, PPC_ATOMIC_EXIT_BARRIER, "memory");
 CMPXCHG_GEN(u8, _local, , , "memory");
 CMPXCHG_GEN(u8, _acquire, , PPC_ACQUIRE_BARRIER, "memory");
+CMPXCHG_GEN(u8, _release, PPC_RELEASE_BARRIER, , "memory");
 CMPXCHG_GEN(u8, _relaxed, , , "cc");
 CMPXCHG_GEN(u16, , PPC_ATOMIC_ENTRY_BARRIER, PPC_ATOMIC_EXIT_BARRIER, "memory");
 CMPXCHG_GEN(u16, _local, , , "memory");
 CMPXCHG_GEN(u16, _acquire, , PPC_ACQUIRE_BARRIER, "memory");
+CMPXCHG_GEN(u16, _release, PPC_RELEASE_BARRIER, , "memory");
 CMPXCHG_GEN(u16, _relaxed, , , "cc");
 
 static __always_inline unsigned long
@@ -314,6 +316,29 @@ __cmpxchg_u32_acquire(u32 *p, unsigned long old, unsigned long new)
 	return prev;
 }
 
+static __always_inline unsigned long
+__cmpxchg_u32_release(u32 *p, unsigned long old, unsigned long new)
+{
+	unsigned long prev;
+
+	__asm__ __volatile__ (
+	PPC_RELEASE_BARRIER
+"1:	lwarx	%0,0,%2		# __cmpxchg_u32_release\n"
+"	cmpw	0,%0,%3\n"
+"	bne-	2f\n"
+	PPC405_ERR77(0, %2)
+"	stwcx.	%4,0,%2\n"
+"	bne-	1b\n"
+	"\n"
+"2:"
+	: "=&r" (prev), "+m" (*p)
+	: "r" (p), "r" (old), "r" (new)
+	: "cc", "memory");
+
+	return prev;
+}
+
+
 #ifdef CONFIG_PPC64
 static __always_inline unsigned long
 __cmpxchg_u64(volatile unsigned long *p, unsigned long old, unsigned long new)
@@ -397,6 +422,27 @@ __cmpxchg_u64_acquire(u64 *p, unsigned long old, unsigned long new)
 	return prev;
 }
+
+static __always_inline unsigned long
+__cmpxchg_u64_release(u64 *p, unsigned long old, unsigned long new)
+{
+	unsigned long prev;
+
+	__asm__ __volatile__ (
+	PPC_RELEASE_BARRIER
+"1:	ldarx	%0,0,%2		# __cmpxchg_u64_release\n"
+"	cmpd	0,%0,%3\n"
+"	bne-	2f\n"
+"	stdcx.	%4,0,%2\n"
+"	bne-	1b\n"
+	"\n"
+"2:"
+	: "=&r" (prev), "+m" (*p)
+	: "r" (p), "r" (old), "r" (new)
+	: "cc", "memory");
+
+	return prev;
+}
 #endif
 
 static __always_inline unsigned long
@@ -478,6 +524,27 @@ __cmpxchg_acquire(void *ptr, unsigned long old, unsigned long new,
 	BUILD_BUG_ON_MSG(1, "Unsupported size for __cmpxchg_acquire");
 	return old;
 }
+
+static __always_inline unsigned long
+__cmpxchg_release(void *ptr, unsigned long old, unsigned long new,
+		  unsigned int size)
+{
+	switch (size) {
+	case 1:
+		return __cmpxchg_u8_release(ptr, old, new);
+	case 2:
+		return __cmpxchg_u16_release(ptr, old, new);
+	case 4:
+		return __cmpxchg_u32_release(ptr, old, new);
+#ifdef CONFIG_PPC64
+	case 8:
+		return __cmpxchg_u64_release(ptr, old, new);
+#endif
+	}
+	BUILD_BUG_ON_MSG(1, "Unsupported size for __cmpxchg_release");
+	return old;
+}
+
 #define cmpxchg(ptr, o, n)						 \
   ({									 \
      __typeof__(*(ptr)) _o_ = (o);					 \
@@ -512,6 +579,15 @@ __cmpxchg_acquire(void *ptr, unsigned long old, unsigned long new,
 			(unsigned long)_o_, (unsigned long)_n_,		\
 			sizeof(*(ptr)));				\
 })
+
+#define cmpxchg_release(ptr, o, n)					\
+({									\
+	__typeof__(*(ptr)) _o_ = (o);					\
+	__typeof__(*(ptr)) _n_ = (n);					\
+	(__typeof__(*(ptr))) __cmpxchg_release((ptr),			\
+			(unsigned long)_o_, (unsigned long)_n_,		\
+			sizeof(*(ptr)));				\
+})
 #ifdef CONFIG_PPC64
 #define cmpxchg64(ptr, o, n)						\
   ({									\
@@ -533,6 +609,11 @@ __cmpxchg_acquire(void *ptr, unsigned long old, unsigned long new,
 	BUILD_BUG_ON(sizeof(*(ptr)) != 8);				\
 	cmpxchg_acquire((ptr), (o), (n));				\
 })
+#define cmpxchg64_release(ptr, o, n)					\
+({									\
+	BUILD_BUG_ON(sizeof(*(ptr)) != 8);				\
+	cmpxchg_release((ptr), (o), (n));				\
+})
 #else
 #include
 #define cmpxchg64_local(ptr, o, n) __cmpxchg64_local_generic((ptr), (o), (n))