Date: Sat, 5 May 2018 19:28:17 +0800
From: Boqun Feng
To: Ingo Molnar
Cc: Peter Zijlstra, Mark Rutland, linux-arm-kernel@lists.infradead.org,
	linux-kernel@vger.kernel.org, aryabinin@virtuozzo.com,
	catalin.marinas@arm.com, dvyukov@google.com, will.deacon@arm.com
Subject: Re: [RFC PATCH] locking/atomics/powerpc: Clarify why the
	cmpxchg_relaxed() family of APIs falls back to full cmpxchg()
Message-ID: <20180505112817.ihrb726i37bwm4cj@tardis>
References: <20180504173937.25300-1-mark.rutland@arm.com>
	<20180504173937.25300-2-mark.rutland@arm.com>
	<20180504180105.GS12217@hirez.programming.kicks-ass.net>
	<20180504180909.dnhfflibjwywnm4l@lakrids.cambridge.arm.com>
	<20180505081100.nsyrqrpzq2vd27bk@gmail.com>
	<20180505084721.GA32344@noisy.programming.kicks-ass.net>
	<20180505090403.p2ywuen42rnlwizq@gmail.com>
	<20180505093829.xfylnedwd5nonhae@gmail.com>
	<20180505101609.5wb56j4mspjkokmw@tardis>
	<20180505103550.s7xsnto7tgppkmle@gmail.com>
In-Reply-To: <20180505103550.s7xsnto7tgppkmle@gmail.com>
User-Agent: NeoMutt/20171215
X-Mailing-List: linux-kernel@vger.kernel.org

On Sat, May 05, 2018 at 12:35:50PM +0200, Ingo Molnar wrote:
>
> * Boqun Feng wrote:
>
> > On Sat, May 05, 2018 at 11:38:29AM +0200, Ingo Molnar wrote:
> > >
> > > * Ingo Molnar wrote:
> > >
> > > > * Peter Zijlstra wrote:
> > > >
> > > > > > So we could do the following simplification on top of that:
> > > > > >
> > > > > >  #ifndef atomic_fetch_dec_relaxed
> > > > > >  # ifndef atomic_fetch_dec
> > > > > >  #  define atomic_fetch_dec(v)		atomic_fetch_sub(1, (v))
> > > > > >  #  define atomic_fetch_dec_relaxed(v)	atomic_fetch_sub_relaxed(1, (v))
> > > > > >  #  define atomic_fetch_dec_acquire(v)	atomic_fetch_sub_acquire(1, (v))
> > > > > >  #  define atomic_fetch_dec_release(v)	atomic_fetch_sub_release(1, (v))
> > > > > >  # else
> > > > > >  #  define atomic_fetch_dec_relaxed	atomic_fetch_dec
> > > > > >  #  define atomic_fetch_dec_acquire	atomic_fetch_dec
> > > > > >  #  define atomic_fetch_dec_release	atomic_fetch_dec
> > > > > >  # endif
> > > > > >  #else
> > > > > >  # ifndef atomic_fetch_dec
> > > > > >  #  define atomic_fetch_dec(...)		__atomic_op_fence(atomic_fetch_dec, __VA_ARGS__)
> > > > > >  #  define atomic_fetch_dec_acquire(...)	__atomic_op_acquire(atomic_fetch_dec, __VA_ARGS__)
> > > > > >  #  define atomic_fetch_dec_release(...)	__atomic_op_release(atomic_fetch_dec, __VA_ARGS__)
> > > > > >  # endif
> > > > > >  #endif
> > > > >
> > > > > This would disallow an architecture to override just fetch_dec_release for
> > > > > instance.
> > > >
> > > > Couldn't such a crazy arch just define _all_ the 3 APIs in this group?
> > > > That's really a small price and makes the place pay the complexity
> > > > price that does the weirdness...
> > > >
> > > > > I don't think there currently is any architecture that does that, but the
> > > > > intent was to allow it to override anything and only provide defaults where it
> > > > > does not.
> > > >
> > > > I'd argue that if a new arch only defines one of these APIs that's probably a bug.
> > > > If they absolutely want to do it, they still can - by defining all 3 APIs.
> > > >
> > > > So there's no loss in arch flexibility.
> > >
> > > BTW., PowerPC for example is already in such a situation, it does not define
> > > atomic_cmpxchg_release(), only the other APIs:
> > >
> > > #define atomic_cmpxchg(v, o, n) (cmpxchg(&((v)->counter), (o), (n)))
> > > #define atomic_cmpxchg_relaxed(v, o, n) \
> > > 	cmpxchg_relaxed(&((v)->counter), (o), (n))
> > > #define atomic_cmpxchg_acquire(v, o, n) \
> > > 	cmpxchg_acquire(&((v)->counter), (o), (n))
> > >
> > > Was it really the intention on the PowerPC side that the generic code falls back
> > > to cmpxchg(), i.e.:
> > >
> > > #  define atomic_cmpxchg_release(...)	__atomic_op_release(atomic_cmpxchg, __VA_ARGS__)
> > >
> >
> > So ppc has its own definition __atomic_op_release() in
> > arch/powerpc/include/asm/atomic.h:
> >
> > 	#define __atomic_op_release(op, args...)				\
> > 	({									\
> > 		__asm__ __volatile__(PPC_RELEASE_BARRIER "" : : : "memory");	\
> > 		op##_relaxed(args);						\
> > 	})
> >
> > , and PPC_RELEASE_BARRIER is lwsync, so we map to
> >
> > 	lwsync();
> > 	atomic_cmpxchg_relaxed(v, o, n);
> >
> > And the reason, why we don't define atomic_cmpxchg_release() but define
> > atomic_cmpxchg_acquire() is that, atomic_cmpxchg_*() could provide no
> > ordering guarantee if the cmp fails, we did this for
> > atomic_cmpxchg_acquire() but not for atomic_cmpxchg_release(), because
> > doing so may introduce a memory barrier inside a ll/sc critical section,
> > please see the comment before __cmpxchg_u32_acquire() in
> > arch/powerpc/include/asm/cmpxchg.h:
> >
> > 	/*
> > 	 * cmpxchg family don't have order guarantee if cmp part fails, therefore we
> > 	 * can avoid superfluous barriers if we use assembly code to implement
> > 	 * cmpxchg() and cmpxchg_acquire(), however we don't do the similar for
> > 	 * cmpxchg_release() because that will result in putting a barrier in the
> > 	 * middle of a ll/sc loop, which is probably a bad idea. For example, this
> > 	 * might cause the conditional store more likely to fail.
> > 	 */
>
> Makes sense, thanks a lot for the explanation, missed that comment in the middle
> of the assembly functions!
>

;-) I could move it somewhere else in the future.

> So the patch I sent is buggy, please disregard it.
>
> May I suggest the patch below? No change in functionality, but it documents the
> lack of the cmpxchg_release() APIs and maps them explicitly to the full cmpxchg()
> version. (Which the generic code does now in a rather roundabout way.)
>

Hmm.. cmpxchg_release() is actually lwsync() + cmpxchg_relaxed(), but
you just make it sync() + cmpxchg_relaxed() + sync() with the fallback,
and sync() is much heavier, so I don't think the fallback is correct.

I think maybe you can move powerpc's __atomic_op_{acquire,release}()
from atomic.h to cmpxchg.h (in arch/powerpc/include/asm), and

	#define cmpxchg_release(...) __atomic_op_release(cmpxchg, __VA_ARGS__)
	#define cmpxchg64_release(...) __atomic_op_release(cmpxchg64, __VA_ARGS__)

I put a diff below to show what I mean (untested).

> Also, the change to arch/powerpc/include/asm/atomic.h has no functional effect
> right now either, but should anyone add a _relaxed() variant in the future, with
> this change atomic_cmpxchg_release() and atomic64_cmpxchg_release() will pick that
> up automatically.
>

You mean with your other modification in include/linux/atomic.h, right?
Because with the unmodified include/linux/atomic.h, we already pick that
up automatically. If so, I think that's fine.

Here is the diff for the modification for cmpxchg_release(). The idea is
that we generate them in asm/cmpxchg.h rather than linux/atomic.h for ppc,
so we keep the new linux/atomic.h working. Because if I understand
correctly, the new linux/atomic.h only accepts that

1)	architecture only defines fully ordered primitives

or

2)	architecture only defines _relaxed primitives

or

3)	architecture defines all four (fully, _relaxed, _acquire,
	_release) primitives

So powerpc needs to define all four primitives in its own
asm/cmpxchg.h.

Regards,
Boqun

diff --git a/arch/powerpc/include/asm/atomic.h b/arch/powerpc/include/asm/atomic.h
index 682b3e6a1e21..0136be11c84f 100644
--- a/arch/powerpc/include/asm/atomic.h
+++ b/arch/powerpc/include/asm/atomic.h
@@ -13,24 +13,6 @@
 
 #define ATOMIC_INIT(i)		{ (i) }
 
-/*
- * Since *_return_relaxed and {cmp}xchg_relaxed are implemented with
- * a "bne-" instruction at the end, so an isync is enough as a acquire barrier
- * on the platform without lwsync.
- */
-#define __atomic_op_acquire(op, args...)				\
-({									\
-	typeof(op##_relaxed(args)) __ret  = op##_relaxed(args);		\
-	__asm__ __volatile__(PPC_ACQUIRE_BARRIER "" : : : "memory");	\
-	__ret;								\
-})
-
-#define __atomic_op_release(op, args...)				\
-({									\
-	__asm__ __volatile__(PPC_RELEASE_BARRIER "" : : : "memory");	\
-	op##_relaxed(args);						\
-})
-
 static __inline__ int atomic_read(const atomic_t *v)
 {
 	int t;
diff --git a/arch/powerpc/include/asm/cmpxchg.h b/arch/powerpc/include/asm/cmpxchg.h
index 9b001f1f6b32..9e20a942aff9 100644
--- a/arch/powerpc/include/asm/cmpxchg.h
+++ b/arch/powerpc/include/asm/cmpxchg.h
@@ -8,6 +8,24 @@
 #include
 #include
 
+/*
+ * Since *_return_relaxed and {cmp}xchg_relaxed are implemented with
+ * a "bne-" instruction at the end, so an isync is enough as a acquire barrier
+ * on the platform without lwsync.
+ */
+#define __atomic_op_acquire(op, args...)				\
+({									\
+	typeof(op##_relaxed(args)) __ret  = op##_relaxed(args);		\
+	__asm__ __volatile__(PPC_ACQUIRE_BARRIER "" : : : "memory");	\
+	__ret;								\
+})
+
+#define __atomic_op_release(op, args...)				\
+({									\
+	__asm__ __volatile__(PPC_RELEASE_BARRIER "" : : : "memory");	\
+	op##_relaxed(args);						\
+})
+
 #ifdef __BIG_ENDIAN
 #define BITOFF_CAL(size, off)	((sizeof(u32) - size - off) * BITS_PER_BYTE)
 #else
@@ -512,6 +530,8 @@ __cmpxchg_acquire(void *ptr, unsigned long old, unsigned long new,
 			(unsigned long)_o_, (unsigned long)_n_,		\
 			sizeof(*(ptr)));				\
 })
+
+#define cmpxchg_release(...) __atomic_op_release(cmpxchg, __VA_ARGS__)
 #ifdef CONFIG_PPC64
 #define cmpxchg64(ptr, o, n)						\
 ({									\
@@ -533,6 +553,7 @@ __cmpxchg_acquire(void *ptr, unsigned long old, unsigned long new,
 	BUILD_BUG_ON(sizeof(*(ptr)) != 8);				\
 	cmpxchg_acquire((ptr), (o), (n));				\
 })
+#define cmpxchg64_release(...) __atomic_op_release(cmpxchg64, __VA_ARGS__)
 #else
 #include
 #define cmpxchg64_local(ptr, o, n) __cmpxchg64_local_generic((ptr), (o), (n))
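
The "release barrier, then _relaxed op" shape of __atomic_op_release() discussed above can be tried out in userspace. What follows is only a rough sketch under stated assumptions, not kernel code: it uses C11 <stdatomic.h>, with atomic_thread_fence(memory_order_release) standing in for PPC_RELEASE_BARRIER (lwsync), and every sketch_* name is hypothetical. It relies on GNU statement expressions, as the kernel macros do, so it needs gcc or clang.

```c
#include <stdatomic.h>

/*
 * Hypothetical userspace stand-in for cmpxchg_relaxed(): returns the value
 * found at *p, and stores new_val only if that value equalled old_val.
 * No ordering is requested on either the success or the failure path.
 */
static inline int sketch_cmpxchg_relaxed(atomic_int *p, int old_val, int new_val)
{
	atomic_compare_exchange_strong_explicit(p, &old_val, new_val,
						memory_order_relaxed,
						memory_order_relaxed);
	/* On failure, old_val has been overwritten with the current value. */
	return old_val;
}

/*
 * Same shape as powerpc's __atomic_op_release(): issue the release barrier
 * first, then call the _relaxed variant. The fence here is an assumption
 * standing in for lwsync.
 */
#define sketch_op_release(op, ...)					\
({									\
	atomic_thread_fence(memory_order_release);			\
	op##_relaxed(__VA_ARGS__);					\
})

/*
 * The wrapper must itself be variadic: __VA_ARGS__ is only valid in the
 * expansion of a macro declared with "..." in its parameter list.
 */
#define sketch_cmpxchg_release(...) \
	sketch_op_release(sketch_cmpxchg, __VA_ARGS__)
```

With an atomic_int holding 5, sketch_cmpxchg_release(&v, 5, 7) returns 5 and leaves 7 behind; a second call that still expects 5 returns 7 and changes nothing. The barrier sits outside the compare-and-swap, which is the point of the release mapping above: nothing is placed inside the ll/sc loop.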