This adds an smp_acquire__after_ctrl_dep() barrier on successful
decrease of refcounter value from 1 to 0 for refcount_dec(sub)_and_test
variants and therefore gives stronger memory ordering guarantees than
prior versions of these functions.
Co-Developed-by: Peter Zijlstra (Intel) <[email protected]>
Signed-off-by: Elena Reshetova <[email protected]>
---
Documentation/core-api/refcount-vs-atomic.rst | 28 +++++++++++++++++++++++----
arch/x86/include/asm/refcount.h | 21 ++++++++++++++++----
lib/refcount.c | 16 ++++++++++-----
3 files changed, 52 insertions(+), 13 deletions(-)
diff --git a/Documentation/core-api/refcount-vs-atomic.rst b/Documentation/core-api/refcount-vs-atomic.rst
index 322851b..95d4b4e 100644
--- a/Documentation/core-api/refcount-vs-atomic.rst
+++ b/Documentation/core-api/refcount-vs-atomic.rst
@@ -54,6 +54,14 @@ must propagate to all other CPUs before the release operation
(A-cumulative property). This is implemented using
:c:func:`smp_store_release`.
+An ACQUIRE memory ordering guarantees that all post loads and
+stores (all po-later instructions) on the same CPU are
+completed after the acquire operation. It also guarantees that all
+po-later stores on the same CPU and all propagated stores from other CPUs
+must propagate to all other CPUs after the acquire operation
+(A-cumulative property). This is implemented using
+:c:func:`smp_acquire__after_ctrl_dep`.
+
A control dependency (on success) for refcounters guarantees that
if a reference for an object was successfully obtained (reference
counter increment or addition happened, function returned true),
@@ -119,24 +127,36 @@ Memory ordering guarantees changes:
result of obtaining pointer to the object!
-case 5) - decrement-based RMW ops that return a value
------------------------------------------------------
+case 5) - generic dec/sub decrement-based RMW ops that return a value
+---------------------------------------------------------------------
Function changes:
* :c:func:`atomic_dec_and_test` --> :c:func:`refcount_dec_and_test`
* :c:func:`atomic_sub_and_test` --> :c:func:`refcount_sub_and_test`
+
+Memory ordering guarantees changes:
+
+ * fully ordered --> RELEASE ordering + ACQUIRE ordering and control dependency
+ on success.
+
+
+case 6) other decrement-based RMW ops that return a value
+---------------------------------------------------------
+
+Function changes:
+
* no atomic counterpart --> :c:func:`refcount_dec_if_one`
* ``atomic_add_unless(&var, -1, 1)`` --> ``refcount_dec_not_one(&var)``
Memory ordering guarantees changes:
- * fully ordered --> RELEASE ordering + control dependency
+ * fully ordered --> RELEASE ordering + control dependency
.. note:: :c:func:`atomic_add_unless` only provides full order on success.
-case 6) - lock-based RMW
+case 7) - lock-based RMW
------------------------
Function changes:
diff --git a/arch/x86/include/asm/refcount.h b/arch/x86/include/asm/refcount.h
index dbaed55..ab8f584 100644
--- a/arch/x86/include/asm/refcount.h
+++ b/arch/x86/include/asm/refcount.h
@@ -67,16 +67,29 @@ static __always_inline void refcount_dec(refcount_t *r)
static __always_inline __must_check
bool refcount_sub_and_test(unsigned int i, refcount_t *r)
{
- return GEN_BINARY_SUFFIXED_RMWcc(LOCK_PREFIX "subl",
+ bool ret = GEN_BINARY_SUFFIXED_RMWcc(LOCK_PREFIX "subl",
REFCOUNT_CHECK_LT_ZERO,
r->refs.counter, e, "er", i, "cx");
+
+ if (ret) {
+ smp_acquire__after_ctrl_dep();
+ return true;
+ }
+
+ return false;
}
static __always_inline __must_check bool refcount_dec_and_test(refcount_t *r)
{
- return GEN_UNARY_SUFFIXED_RMWcc(LOCK_PREFIX "decl",
- REFCOUNT_CHECK_LT_ZERO,
- r->refs.counter, e, "cx");
+ bool ret = GEN_UNARY_SUFFIXED_RMWcc(LOCK_PREFIX "decl",
+ REFCOUNT_CHECK_LT_ZERO,
+ r->refs.counter, e, "cx");
+ if (ret) {
+ smp_acquire__after_ctrl_dep();
+ return true;
+ }
+
+ return false;
}
static __always_inline __must_check
diff --git a/lib/refcount.c b/lib/refcount.c
index ebcf8cd..732feac 100644
--- a/lib/refcount.c
+++ b/lib/refcount.c
@@ -33,6 +33,9 @@
* Note that the allocator is responsible for ordering things between free()
* and alloc().
*
+ * The decrements dec_and_test() and sub_and_test() also provide acquire
+ * ordering on success.
+ *
*/
#include <linux/mutex.h>
@@ -164,8 +167,7 @@ EXPORT_SYMBOL(refcount_inc_checked);
* at UINT_MAX.
*
* Provides release memory ordering, such that prior loads and stores are done
- * before, and provides a control dependency such that free() must come after.
- * See the comment on top.
+ * before, and provides an acquire ordering on success such that free() must come after.
*
* Use of this function is not recommended for the normal reference counting
* use case in which references are taken and released one at a time. In these
@@ -190,7 +192,12 @@ bool refcount_sub_and_test_checked(unsigned int i, refcount_t *r)
} while (!atomic_try_cmpxchg_release(&r->refs, &val, new));
- return !new;
+ if (!new) {
+ smp_acquire__after_ctrl_dep();
+ return true;
+ }
+ return false;
+
}
EXPORT_SYMBOL(refcount_sub_and_test_checked);
@@ -202,8 +209,7 @@ EXPORT_SYMBOL(refcount_sub_and_test_checked);
* decrement when saturated at UINT_MAX.
*
* Provides release memory ordering, such that prior loads and stores are done
- * before, and provides a control dependency such that free() must come after.
- * See the comment on top.
+ * before, and provides an acquire ordering on success such that free() must come after.
*
* Return: true if the resulting refcount is 0, false otherwise
*/
--
2.7.4
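For context, a minimal sketch of the put/free pattern this stronger ordering is
aimed at (struct foo, foo_put() and its field are made-up, illustrative names):

#include <linux/bug.h>
#include <linux/refcount.h>
#include <linux/slab.h>

struct foo {
        refcount_t ref;
        int stats;      /* updated by holders before they drop their reference */
};

static void foo_put(struct foo *f)
{
        /*
         * Every earlier put orders its updates to *f before its decrement
         * (the RELEASE half).  The acquire ordering added on the final,
         * successful decrement keeps the accesses below from being reordered
         * before it, so they are guaranteed to observe those updates.
         */
        if (refcount_dec_and_test(&f->ref)) {
                WARN_ON(f->stats < 0);  /* touches *f only after the last reference is gone */
                kfree(f);
        }
}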
On Mon, Jan 28, 2019 at 02:09:37PM +0200, Elena Reshetova wrote:
> This adds an smp_acquire__after_ctrl_dep() barrier on successful
> decrease of refcounter value from 1 to 0 for refcount_dec(sub)_and_test
> variants and therefore gives stronger memory ordering guarantees than
> prior versions of these functions.
>
> Co-Developed-by: Peter Zijlstra (Intel) <[email protected]>
> Signed-off-by: Elena Reshetova <[email protected]>
+ Alan, Dmitry; they might also deserve a Suggested-by: ;-)
[...]
> +An ACQUIRE memory ordering guarantees that all post loads and
> +stores (all po-later instructions) on the same CPU are
> +completed after the acquire operation. It also guarantees that all
> +po-later stores on the same CPU and all propagated stores from other CPUs
> +must propagate to all other CPUs after the acquire operation
> +(A-cumulative property).
Mmh, this property (A-cumulativity) isn't really associated to ACQUIREs
in the LKMM; I'd suggest to simply remove the last sentence.
[...]
> diff --git a/arch/x86/include/asm/refcount.h b/arch/x86/include/asm/refcount.h
> index dbaed55..ab8f584 100644
> --- a/arch/x86/include/asm/refcount.h
> +++ b/arch/x86/include/asm/refcount.h
> @@ -67,16 +67,29 @@ static __always_inline void refcount_dec(refcount_t *r)
> static __always_inline __must_check
> bool refcount_sub_and_test(unsigned int i, refcount_t *r)
> {
> - return GEN_BINARY_SUFFIXED_RMWcc(LOCK_PREFIX "subl",
> + bool ret = GEN_BINARY_SUFFIXED_RMWcc(LOCK_PREFIX "subl",
> REFCOUNT_CHECK_LT_ZERO,
> r->refs.counter, e, "er", i, "cx");
> +
> + if (ret) {
> + smp_acquire__after_ctrl_dep();
> + return true;
> + }
> +
> + return false;
There appears to be some white-space damage (here and in other places);
checkpatch.pl should point these and other style problems out.
Andrea
> }
>
> static __always_inline __must_check bool refcount_dec_and_test(refcount_t *r)
> {
> - return GEN_UNARY_SUFFIXED_RMWcc(LOCK_PREFIX "decl",
> - REFCOUNT_CHECK_LT_ZERO,
> - r->refs.counter, e, "cx");
> + bool ret = GEN_UNARY_SUFFIXED_RMWcc(LOCK_PREFIX "decl",
> + REFCOUNT_CHECK_LT_ZERO,
> + r->refs.counter, e, "cx");
> + if (ret) {
> + smp_acquire__after_ctrl_dep();
> + return true;
> + }
> +
> + return false;
> }
>
> static __always_inline __must_check
> diff --git a/lib/refcount.c b/lib/refcount.c
> index ebcf8cd..732feac 100644
> --- a/lib/refcount.c
> +++ b/lib/refcount.c
> @@ -33,6 +33,9 @@
> * Note that the allocator is responsible for ordering things between free()
> * and alloc().
> *
> + * The decrements dec_and_test() and sub_and_test() also provide acquire
> + * ordering on success.
> + *
> */
>
> #include <linux/mutex.h>
> @@ -164,8 +167,7 @@ EXPORT_SYMBOL(refcount_inc_checked);
> * at UINT_MAX.
> *
> * Provides release memory ordering, such that prior loads and stores are done
> - * before, and provides a control dependency such that free() must come after.
> - * See the comment on top.
> + * before, and provides an acquire ordering on success such that free() must come after.
> *
> * Use of this function is not recommended for the normal reference counting
> * use case in which references are taken and released one at a time. In these
> @@ -190,7 +192,12 @@ bool refcount_sub_and_test_checked(unsigned int i, refcount_t *r)
>
> } while (!atomic_try_cmpxchg_release(&r->refs, &val, new));
>
> - return !new;
> + if (!new) {
> + smp_acquire__after_ctrl_dep();
> + return true;
> + }
> + return false;
> +
> }
> EXPORT_SYMBOL(refcount_sub_and_test_checked);
>
> @@ -202,8 +209,7 @@ EXPORT_SYMBOL(refcount_sub_and_test_checked);
> * decrement when saturated at UINT_MAX.
> *
> * Provides release memory ordering, such that prior loads and stores are done
> - * before, and provides a control dependency such that free() must come after.
> - * See the comment on top.
> + * before, and provides an acquire ordering on success such that free() must come after.
> *
> * Return: true if the resulting refcount is 0, false otherwise
> */
> --
> 2.7.4
>
On Mon, Jan 28, 2019 at 03:29:10PM +0100, Andrea Parri wrote:
> > diff --git a/arch/x86/include/asm/refcount.h b/arch/x86/include/asm/refcount.h
> > index dbaed55..ab8f584 100644
> > --- a/arch/x86/include/asm/refcount.h
> > +++ b/arch/x86/include/asm/refcount.h
> > @@ -67,16 +67,29 @@ static __always_inline void refcount_dec(refcount_t *r)
> > static __always_inline __must_check
> > bool refcount_sub_and_test(unsigned int i, refcount_t *r)
> > {
> > - return GEN_BINARY_SUFFIXED_RMWcc(LOCK_PREFIX "subl",
> > + bool ret = GEN_BINARY_SUFFIXED_RMWcc(LOCK_PREFIX "subl",
> > REFCOUNT_CHECK_LT_ZERO,
> > r->refs.counter, e, "er", i, "cx");
> > +
> > + if (ret) {
> > + smp_acquire__after_ctrl_dep();
> > + return true;
> > + }
> > +
> > + return false;
>
> There appears to be some white-space damage (here and in other places);
> checkpatch.pl should point these and other style problems out.
It's worse...
patch: **** malformed patch at line 81: diff --git a/arch/x86/include/asm/refcount.h b/arch/x86/include/asm/refcount.h
And yes, there's a lot of whitespace damage all around. Lots of trailing
spaces too.
> On Mon, Jan 28, 2019 at 03:29:10PM +0100, Andrea Parri wrote:
>
> > > diff --git a/arch/x86/include/asm/refcount.h b/arch/x86/include/asm/refcount.h
> > > index dbaed55..ab8f584 100644
> > > --- a/arch/x86/include/asm/refcount.h
> > > +++ b/arch/x86/include/asm/refcount.h
> > > @@ -67,16 +67,29 @@ static __always_inline void refcount_dec(refcount_t *r)
> > > static __always_inline __must_check
> > > bool refcount_sub_and_test(unsigned int i, refcount_t *r)
> > > {
> > > - return GEN_BINARY_SUFFIXED_RMWcc(LOCK_PREFIX "subl",
> > > + bool ret = GEN_BINARY_SUFFIXED_RMWcc(LOCK_PREFIX "subl",
> > > REFCOUNT_CHECK_LT_ZERO,
> > > r->refs.counter, e, "er", i, "cx");
> > > +
> > > + if (ret) {
> > > + smp_acquire__after_ctrl_dep();
> > > + return true;
> > > + }
> > > +
> > > + return false;
> >
> > There appears to be some white-space damage (here and in other places);
> > checkpatch.pl should point these and other style problems out.
>
> It's worse...
>
> patch: **** malformed patch at line 81: diff --git a/arch/x86/include/asm/refcount.h b/arch/x86/include/asm/refcount.h
>
> And yes, there's a lot of whitespace damage all around. Lots of trailing
> spaces too.
I am very sorry about this; something is really wrong with my system. In addition to all of the
above, I haven't even received Andrea's reply in my inbox, nor this patch itself.
I will fix all the whitespace/trailing-space issues and address this comment from Andrea:
"Mmh, this property (A-cumulativity) isn't really associated to ACQUIREs
in the LKMM; I'd suggest to simply remove the last sentence."
Anything else that needs fixing, content-wise?
Best Regards,
Elena.
On Mon, Jan 28, 2019 at 1:10 PM Elena Reshetova
<[email protected]> wrote:
>
> This adds an smp_acquire__after_ctrl_dep() barrier on successful
> decrease of refcounter value from 1 to 0 for refcount_dec(sub)_and_test
> variants and therefore gives stronger memory ordering guarantees than
> prior versions of these functions.
>
> Co-Developed-by: Peter Zijlstra (Intel) <[email protected]>
> Signed-off-by: Elena Reshetova <[email protected]>
> ---
> Documentation/core-api/refcount-vs-atomic.rst | 28 +++++++++++++++++++++++----
> arch/x86/include/asm/refcount.h | 21 ++++++++++++++++----
> lib/refcount.c | 16 ++++++++++-----
> 3 files changed, 52 insertions(+), 13 deletions(-)
>
> diff --git a/Documentation/core-api/refcount-vs-atomic.rst b/Documentation/core-api/refcount-vs-atomic.rst
> index 322851b..95d4b4e 100644
> --- a/Documentation/core-api/refcount-vs-atomic.rst
> +++ b/Documentation/core-api/refcount-vs-atomic.rst
> @@ -54,6 +54,14 @@ must propagate to all other CPUs before the release operation
> (A-cumulative property). This is implemented using
> :c:func:`smp_store_release`.
>
> +An ACQUIRE memory ordering guarantees that all post loads and
> +stores (all po-later instructions) on the same CPU are
> +completed after the acquire operation. It also guarantees that all
> +po-later stores on the same CPU and all propagated stores from other CPUs
> +must propagate to all other CPUs after the acquire operation
> +(A-cumulative property). This is implemented using
> +:c:func:`smp_acquire__after_ctrl_dep`.
The second part starting from "It also guarantees that". I am not sure
I understand what it means. Is it just a copy-paste from RELEASE? I am
not sure ACQUIRE provides anything like this.
> +
> A control dependency (on success) for refcounters guarantees that
> if a reference for an object was successfully obtained (reference
> counter increment or addition happened, function returned true),
> @@ -119,24 +127,36 @@ Memory ordering guarantees changes:
> result of obtaining pointer to the object!
>
>
> -case 5) - decrement-based RMW ops that return a value
> ------------------------------------------------------
> +case 5) - generic dec/sub decrement-based RMW ops that return a value
> +---------------------------------------------------------------------
>
> Function changes:
>
> * :c:func:`atomic_dec_and_test` --> :c:func:`refcount_dec_and_test`
> * :c:func:`atomic_sub_and_test` --> :c:func:`refcount_sub_and_test`
> +
> +Memory ordering guarantees changes:
> +
> + * fully ordered --> RELEASE ordering + ACQUIRE ordering and control dependency
> + on success.
Is ACQUIRE strictly stronger than control dependency?
It generally looks so unless there is something very subtle that I am
missing. If so, should we replace it with just "RELEASE ordering +
ACQUIRE ordering on success"? Looks simpler with less magic trickery.
> +
> +
> +case 6) other decrement-based RMW ops that return a value
> +---------------------------------------------------------
> +
> +Function changes:
> +
> * no atomic counterpart --> :c:func:`refcount_dec_if_one`
> * ``atomic_add_unless(&var, -1, 1)`` --> ``refcount_dec_not_one(&var)``
>
> Memory ordering guarantees changes:
>
> - * fully ordered --> RELEASE ordering + control dependency
> + * fully ordered --> RELEASE ordering + control dependency
>
> .. note:: :c:func:`atomic_add_unless` only provides full order on success.
>
>
> -case 6) - lock-based RMW
> +case 7) - lock-based RMW
> ------------------------
>
> Function changes:
> diff --git a/arch/x86/include/asm/refcount.h b/arch/x86/include/asm/refcount.h
> index dbaed55..ab8f584 100644
> --- a/arch/x86/include/asm/refcount.h
> +++ b/arch/x86/include/asm/refcount.h
> @@ -67,16 +67,29 @@ static __always_inline void refcount_dec(refcount_t *r)
> static __always_inline __must_check
> bool refcount_sub_and_test(unsigned int i, refcount_t *r)
> {
> - return GEN_BINARY_SUFFIXED_RMWcc(LOCK_PREFIX "subl",
> + bool ret = GEN_BINARY_SUFFIXED_RMWcc(LOCK_PREFIX "subl",
> REFCOUNT_CHECK_LT_ZERO,
> r->refs.counter, e, "er", i, "cx");
> +
> + if (ret) {
> + smp_acquire__after_ctrl_dep();
> + return true;
> + }
> +
> + return false;
> }
>
> static __always_inline __must_check bool refcount_dec_and_test(refcount_t *r)
> {
> - return GEN_UNARY_SUFFIXED_RMWcc(LOCK_PREFIX "decl",
> - REFCOUNT_CHECK_LT_ZERO,
> - r->refs.counter, e, "cx");
> + bool ret = GEN_UNARY_SUFFIXED_RMWcc(LOCK_PREFIX "decl",
> + REFCOUNT_CHECK_LT_ZERO,
> + r->refs.counter, e, "cx");
> + if (ret) {
> + smp_acquire__after_ctrl_dep();
> + return true;
> + }
> +
> + return false;
> }
>
> static __always_inline __must_check
> diff --git a/lib/refcount.c b/lib/refcount.c
> index ebcf8cd..732feac 100644
> --- a/lib/refcount.c
> +++ b/lib/refcount.c
> @@ -33,6 +33,9 @@
> * Note that the allocator is responsible for ordering things between free()
> * and alloc().
> *
> + * The decrements dec_and_test() and sub_and_test() also provide acquire
> + * ordering on success.
> + *
> */
>
> #include <linux/mutex.h>
> @@ -164,8 +167,7 @@ EXPORT_SYMBOL(refcount_inc_checked);
> * at UINT_MAX.
> *
> * Provides release memory ordering, such that prior loads and stores are done
> - * before, and provides a control dependency such that free() must come after.
> - * See the comment on top.
> + * before, and provides an acquire ordering on success such that free() must come after.
> *
> * Use of this function is not recommended for the normal reference counting
> * use case in which references are taken and released one at a time. In these
> @@ -190,7 +192,12 @@ bool refcount_sub_and_test_checked(unsigned int i, refcount_t *r)
>
> } while (!atomic_try_cmpxchg_release(&r->refs, &val, new));
>
> - return !new;
> + if (!new) {
> + smp_acquire__after_ctrl_dep();
> + return true;
> + }
> + return false;
> +
> }
> EXPORT_SYMBOL(refcount_sub_and_test_checked);
>
> @@ -202,8 +209,7 @@ EXPORT_SYMBOL(refcount_sub_and_test_checked);
> * decrement when saturated at UINT_MAX.
> *
> * Provides release memory ordering, such that prior loads and stores are done
> - * before, and provides a control dependency such that free() must come after.
> - * See the comment on top.
> + * before, and provides an acquire ordering on success such that free() must come after.
> *
> * Return: true if the resulting refcount is 0, false otherwise
> */
> --
> 2.7.4
>
> On Mon, Jan 28, 2019 at 1:10 PM Elena Reshetova
> <[email protected]> wrote:
> >
> > This adds an smp_acquire__after_ctrl_dep() barrier on successful
> > decrease of refcounter value from 1 to 0 for refcount_dec(sub)_and_test
> > variants and therefore gives stronger memory ordering guarantees than
> > prior versions of these functions.
> >
> > Co-Developed-by: Peter Zijlstra (Intel) <[email protected]>
> > Signed-off-by: Elena Reshetova <[email protected]>
> > ---
> > Documentation/core-api/refcount-vs-atomic.rst | 28 +++++++++++++++++++++++----
> > arch/x86/include/asm/refcount.h | 21 ++++++++++++++++----
> > lib/refcount.c | 16 ++++++++++-----
> > 3 files changed, 52 insertions(+), 13 deletions(-)
> >
> > diff --git a/Documentation/core-api/refcount-vs-atomic.rst b/Documentation/core-api/refcount-vs-atomic.rst
> > index 322851b..95d4b4e 100644
> > --- a/Documentation/core-api/refcount-vs-atomic.rst
> > +++ b/Documentation/core-api/refcount-vs-atomic.rst
> > @@ -54,6 +54,14 @@ must propagate to all other CPUs before the release operation
> > (A-cumulative property). This is implemented using
> > :c:func:`smp_store_release`.
> >
> > +An ACQUIRE memory ordering guarantees that all post loads and
> > +stores (all po-later instructions) on the same CPU are
> > +completed after the acquire operation. It also guarantees that all
> > +po-later stores on the same CPU and all propagated stores from other CPUs
> > +must propagate to all other CPUs after the acquire operation
> > +(A-cumulative property). This is implemented using
> > +:c:func:`smp_acquire__after_ctrl_dep`.
>
> The second part starting from "It also guarantees that". I am not sure
> I understand what it means. Is it just a copy-paste from RELEASE? I am
> not sure ACQUIRE provides anything like this.
>
So, you are saying that ACQUIRE does not guarantee that "po-later stores
on the same CPU and all propagated stores from other CPUs
must propagate to all other CPUs after the acquire operation "?
I was reading about acquire before posting this and trying to understand,
and this was my conclusion that it should provide this, but I can easily be wrong
on this.
Andrea, Peter, could you please comment?
>
> > +
> > A control dependency (on success) for refcounters guarantees that
> > if a reference for an object was successfully obtained (reference
> > counter increment or addition happened, function returned true),
> > @@ -119,24 +127,36 @@ Memory ordering guarantees changes:
> > result of obtaining pointer to the object!
> >
> >
> > -case 5) - decrement-based RMW ops that return a value
> > ------------------------------------------------------
> > +case 5) - generic dec/sub decrement-based RMW ops that return a value
> > +---------------------------------------------------------------------
> >
> > Function changes:
> >
> > * :c:func:`atomic_dec_and_test` --> :c:func:`refcount_dec_and_test`
> > * :c:func:`atomic_sub_and_test` --> :c:func:`refcount_sub_and_test`
> > +
> > +Memory ordering guarantees changes:
> > +
> > + * fully ordered --> RELEASE ordering + ACQUIRE ordering and control dependency
> > + on success.
>
> Is ACQUIRE strictly stronger than control dependency?
In my understanding yes.
> It generally looks so unless there is something very subtle that I am
> missing. If so, should we replace it with just "RELEASE ordering +
> ACQUIRE ordering on success"? Looks simpler with less magic trickery.
I was just trying to mention all the applicable orderings/guarantees.
I can remove "control dependency" part if it is easier for people to understand
(the main goal of documentation).
Best Regards,
Elena.
> So, you are saying that ACQUIRE does not guarantee that "po-later stores
> on the same CPU and all propagated stores from other CPUs
> must propagate to all other CPUs after the acquire operation "?
> I was reading about acquire before posting this and trying to understand,
> and this was my conclusion that it should provide this, but I can easily be wrong
> on this.
>
> Andrea, Peter, could you please comment?
Short version: I am not convinced by the above sentence, and I suggest
to remove it (as done in
http://lkml.kernel.org/r/20190128142910.GA7232@andrea ).
---
To elaborate: I think that we should first discuss the meaning of that
"[...] after the acquire operation (does)", because there is no notion
of "ACQUIRE (or more generally, load) propagation" in the LKMM:
Stores propagate (after being executed) to other CPUs. Loads _execute_
(possibly multiple times /speculatively, but this is irrelevant for the
discussion below).
A detailed, but still informal, description of these concepts is in:
tools/memory-model/Documentation/explanation.txt
(c.f., in particular, section "AN OPERATIONAL MODEL"); I can illustrate
them with an example:
{ initially: x=0, y=0; }
CPU0                        CPU1
--------------------------------------
LOAD-ACQUIRE x=0            LOAD y=1
STORE y=1
In this scenario,
a) CPU0's "LOAD-ACQUIRE x=0" executes before CPU0's "STORE y=1"
executes (this is guaranteed by the ACQUIRE),
b) CPU0's "STORE y=1" executes before "STORE y=1" propagates to
CPU1 (a store cannot be propagated before being executed),
c) CPU0's "STORE y=1" propagates to CPU1 before CPU1's "LOAD y=1"
executes (since CPU1 "sees the store").
The example also illustrates the following property:
ACQUIRE guarantees that po-later stores on the same CPU must
propagate to all other CPUs after the acquire _executes_.
(combine (a) and (b) ).
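Roughly the same scenario in kernel primitives (an illustrative sketch; r0/r1
only name the observed values):

#include <linux/compiler.h>
#include <asm/barrier.h>

int x, y;

void cpu0(void)                         /* LOAD-ACQUIRE x=0; STORE y=1 */
{
        int r0 = smp_load_acquire(&x);  /* observes x == 0 */

        WRITE_ONCE(y, 1);       /* (a): cannot execute before the load-acquire,
                                 * hence (b): cannot propagate before it either */
        (void)r0;
}

void cpu1(void)                         /* LOAD y=1 */
{
        int r1 = READ_ONCE(y);  /* observing y == 1 means (c): the store had
                                 * already propagated to this CPU */
        (void)r1;
}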
OTOH, please notice that:
ACQUIRE does _NOT_ guarantee that all propagated stores from
other CPUs (to the CPU executing the ACQUIRE) must propagate
to all other CPUs after the acquire operation _executes_.
In fact, we've already seen how full barriers can be used to break such
"guarantee"; for example, in
{ initially: x=0, y=0; }
CPU0                        CPU1                     ...
---------------------------------------------------
STORE x=1                   LOAD x=1
                            FULL-BARRIER
                            LOAD-ACQUIRE y=0
the full barrier forces CPU0's "STORE x=1" (seen by/propagated to CPU1)
to be propagated to all CPUs _before_ "LOAD-ACQUIRE y=0" is executed.
Does this make sense?
> > Is ACQUIRE strictly stronger than control dependency?
>
> In my understanding yes.
+1 (or we have a problem)
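As a minimal sketch of the difference (x, y, a, b are made-up globals): the
control dependency alone orders the load of x only against the dependent
stores, while smp_acquire__after_ctrl_dep() additionally orders it against
po-later loads, i.e. full ACQUIRE:

#include <linux/compiler.h>
#include <asm/barrier.h>

int x, y, a, b;

void ctrl_dep_only(void)
{
        if (READ_ONCE(x)) {
                WRITE_ONCE(y, 1);       /* ordered after the load of x (control dependency) */
                b = READ_ONCE(a);       /* NOT ordered against the load of x */
        }
}

void ctrl_dep_plus_acquire(void)
{
        if (READ_ONCE(x)) {
                smp_acquire__after_ctrl_dep();
                WRITE_ONCE(y, 1);       /* still ordered after the load of x */
                b = READ_ONCE(a);       /* now also ordered after the load of x */
        }
}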
>
> > It generally looks so unless there is something very subtle that I am
> > missing. If so, should we replace it with just "RELEASE ordering +
> > ACQUIRE ordering on success"? Looks simpler with less magic trickery.
>
> I was just trying to mention all the applicable orderings/guarantees.
> I can remove "control dependency" part if it is easier for people to understand
> (the main goal of documentation).
This sounds like a good idea; thank you, Dmitry, for pointing this out.
Andrea
>
> Best Regards,
> Elena.
> So, you are saying that ACQUIRE does not guarantee that "po-later stores
> > on the same CPU and all propagated stores from other CPUs
> > must propagate to all other CPUs after the acquire operation "?
> > I was reading about acquire before posting this and trying to understand,
> > and this was my conclusion that it should provide this, but I can easily be wrong
> > on this.
> >
> > Andrea, Peter, could you please comment?
>
> Short version: I am not convinced by the above sentence, and I suggest
> to remove it (as done in
>
> http://lkml.kernel.org/r/20190128142910.GA7232@andrea ).
Sorry, I misunderstood your previous email on this. I somehow misread it
as saying that "A-cumulative property" is a notion not used in the LKMM for ACQUIRE,
so I should not mention the notion but the guarantees would stay; however, it is the
guarantees themselves that are wrong, which is much worse.
>
> ---
> To elaborate: I think that we should first discuss the meaning of that
> "[...] after the acquire operation (does)", because there is no notion
> of "ACQUIRE (or more generally, load) propagation" in the LKMM:
>
> Stores propagate (after being executed) to other CPUs. Loads _execute_
> (possibly multiple times /speculatively, but this is irrelevant for the
> discussion below).
>
> A detailed, but still informal, description of these concepts is in:
>
> tools/memory-model/Documentation/explanation.txt
>
> (c.f., in particular, section "AN OPERATIONAL MODEL"); I can illustrate
> them with an example:
>
> { initially: x=0, y=0; }
>
> CPU0                        CPU1
> --------------------------------------
> LOAD-ACQUIRE x=0            LOAD y=1
> STORE y=1
>
> In this scenario,
>
> a) CPU0's "LOAD-ACQUIRE x=0" executes before CPU0's "STORE y=1"
> executes (this is guaranteed by the ACQUIRE),
>
> b) CPU0's "STORE y=1" executes before "STORE y=1" propagates to
> CPU1 (a store cannot be propagated before being executed),
>
> c) CPU0's "STORE y=1" propagates to CPU1 before CPU1's "LOAD y=1"
> executes (since CPU1 "sees the store").
>
> The example also illustrates the following property:
>
> ACQUIRE guarantees that po-later stores on the same CPU must
> propagate to all other CPUs after the acquire _executes_.
>
> (combine (a) and (b) ).
>
> OTOH, please notice that:
>
> ACQUIRE does _NOT_ guarantee that all propagated stores from
> other CPUs (to the CPU executing the ACQUIRE) must propagate
> to all other CPUs after the acquire operation _executes_.
Thank you very much Andrea, this example and explanation clarify it nicely!
So ACQUIRE only really affects the current CPU's "view of the world" and the
propagation of operations from it, and not anything else, which is actually very logical.
My initial confusion was because I was thinking of ACQUIRE as a pair
for RELEASE, i.e. that it should provide complementary guarantees to the
RELEASE ones, just on po-later operations.
>
> In fact, we've already seen how full barriers can be used to break such
> "guarantee"; for example, in
>
> { initially: x=0, y=0; }
>
> CPU0                        CPU1                     ...
> ---------------------------------------------------
> STORE x=1                   LOAD x=1
>                             FULL-BARRIER
>                             LOAD-ACQUIRE y=0
>
> the full barrier forces CPU0's "STORE x=1" (seen by/propagated to CPU1)
> to be propagated to all CPUs _before_ "LOAD-ACQUIRE y=0" is executed.
>
> Does this make sense?
Yes, thank you again! I think it would take me still a long while to be familiar
with all these notions and not to be confused even in simple things.
>
>
> > > Is ACQUIRE strictly stronger than control dependency?
> >
> > In my understanding yes.
>
> +1 (or we have a problem)
>
>
> >
> > > It generally looks so unless there is something very subtle that I am
> > > missing. If so, should we replace it with just "RELEASE ordering +
> > > ACQUIRE ordering on success"? Looks simpler with less magic trickery.
> >
> > I was just trying to mention all the applicable orderings/guarantees.
> > I can remove "control dependency" part if it is easier for people to understand
> > (the main goal of documentation).
>
> This sounds like a good idea; thank you, Dmitry, for pointing this out.
I will remove it. So the rule is that we always mention the strongest type of barrier
when we mention some ordering guarantees, right?
Best Regards,
Elena.
On Wed, Jan 30, 2019 at 11:19 AM Reshetova, Elena
<[email protected]> wrote:
>
> > So, you are saying that ACQUIRE does not guarantee that "po-later stores
> > > on the same CPU and all propagated stores from other CPUs
> > > must propagate to all other CPUs after the acquire operation "?
> > > I was reading about acquire before posting this and trying to understand,
> > > and this was my conclusion that it should provide this, but I can easily be wrong
> > > on this.
> > >
> > > Andrea, Peter, could you please comment?
> >
> > Short version: I am not convinced by the above sentence, and I suggest
> > to remove it (as done in
> >
> > http://lkml.kernel.org/r/20190128142910.GA7232@andrea ).
>
> Sorry, I misunderstood your previous email on this. I somehow misread it
> as saying that "A-cumulative property" is a notion not used in the LKMM for ACQUIRE,
> so I should not mention the notion but the guarantees would stay; however, it is the
> guarantees themselves that are wrong, which is much worse.
>
> >
> > ---
> > To elaborate: I think that we should first discuss the meaning of that
> > "[...] after the acquire operation (does)", because there is no notion
> > of "ACQUIRE (or more generally, load) propagation" in the LKMM:
> >
> > Stores propagate (after being executed) to other CPUs. Loads _execute_
> > (possibly multiple times /speculatively, but this is irrelevant for the
> > discussion below).
> >
> > A detailed, but still informal, description of these concepts is in:
> >
> > tools/memory-model/Documentation/explanation.txt
> >
> > (c.f., in particular, section "AN OPERATIONAL MODEL"); I can illustrate
> > them with an example:
> >
> > { initially: x=0, y=0; }
> >
> > CPU0                        CPU1
> > --------------------------------------
> > LOAD-ACQUIRE x=0            LOAD y=1
> > STORE y=1
> >
> > In this scenario,
> >
> > a) CPU0's "LOAD-ACQUIRE x=0" executes before CPU0's "STORE y=1"
> > executes (this is guaranteed by the ACQUIRE),
> >
> > b) CPU0's "STORE y=1" executes before "STORE y=1" propagates to
> > CPU1 (a store cannot be propagated before being executed),
> >
> > c) CPU0's "STORE y=1" propagates to CPU1 before CPU1's "LOAD y=1"
> > executes (since CPU1 "sees the store").
> >
> > The example also illustrates the following property:
> >
> > ACQUIRE guarantees that po-later stores on the same CPU must
> > propagate to all other CPUs after the acquire _executes_.
> >
> > (combine (a) and (b) ).
> >
> > OTOH, please notice that:
> >
> > ACQUIRE does _NOT_ guarantee that all propagated stores from
> > other CPUs (to the CPU executing the ACQUIRE) must propagate
> > to all other CPUs after the acquire operation _executes_.
>
> Thank you very much Andrea, this example and explanation clarify it nicely!
> So ACQUIRE only really affects the current CPU's "view of the world" and the
> propagation of operations from it, and not anything else, which is actually very logical.
>
> My initial confusion was because I was thinking of ACQUIRE as a pair
> for RELEASE, i.e. that it should provide complementary guarantees to the
> RELEASE ones, just on po-later operations.
>
> >
> > In fact, we've already seen how full barriers can be used to break such
> > "guarantee"; for example, in
> >
> > { initially: x=0, y=0; }
> >
> > CPU0                        CPU1                     ...
> > ---------------------------------------------------
> > STORE x=1                   LOAD x=1
> >                             FULL-BARRIER
> >                             LOAD-ACQUIRE y=0
> >
> > the full barrier forces CPU0's "STORE x=1" (seen by/propagated to CPU1)
> > to be propagated to all CPUs _before_ "LOAD-ACQUIRE y=0" is executed.
> >
> > Does this make sense?
>
> Yes, thank you again! I think it would take me still a long while to be familiar
> with all these notions and not to be confused even in simple things.
>
> >
> >
> > > > Is ACQUIRE strictly stronger than control dependency?
> > >
> > > In my understanding yes.
> >
> > +1 (or we have a problem)
> >
> >
> > >
> > > > It generally looks so unless there is something very subtle that I am
> > > > missing. If so, should we replace it with just "RELEASE ordering +
> > > > ACQUIRE ordering on success"? Looks simpler with less magic trickery.
> > >
> > > I was just trying to mention all the applicable orderings/guarantees.
> > > I can remove "control dependency" part if it is easier for people to understand
> > > (the main goal of documentation).
> >
> > This sounds like a good idea; thank you, Dmitry, for pointing this out.
>
> I will remove it. So the rule is that we always mention the strongest type of barrier
> when we mention some ordering guarantees, right?
My reasoning here was that a control dependency is just a very subtle
thing, so I think it's better if people just don't see it at all and don't
start thinking in terms of control dependencies until absolutely
necessary.
I am not sure how to generalize this. There are not too many other
cases where one barrier type is a full superset of another. E.g.
rmb/wmb are orthogonal to acquire/release.
But if we take the full barrier, then, yes, it definitely makes sense to
just say that an operation provides a full barrier rather than "full
barrier, acquire barrier, release barrier, read barrier, write
barrier, control dependency, ..." :)
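For illustration, a message-passing sketch with made-up globals: smp_wmb()/smp_rmb()
order only stores against stores and loads against loads, while release/acquire
order one access against everything on its side:

#include <linux/compiler.h>
#include <asm/barrier.h>

int data, flag;

void writer_wmb(void)
{
        WRITE_ONCE(data, 42);
        smp_wmb();                      /* store->store ordering only */
        WRITE_ONCE(flag, 1);
}

void writer_release(void)
{
        WRITE_ONCE(data, 42);
        smp_store_release(&flag, 1);    /* orders *all* prior accesses before this store */
}

int reader_rmb(void)
{
        if (READ_ONCE(flag)) {
                smp_rmb();              /* load->load ordering only */
                return READ_ONCE(data);
        }
        return -1;
}

int reader_acquire(void)
{
        if (smp_load_acquire(&flag))    /* orders *all* later accesses after this load */
                return READ_ONCE(data);
        return -1;
}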