2018-02-20 18:47:43

by Andrea Parri

Subject: [PATCH] xchg/alpha: Add unconditional memory barrier to cmpxchg

Continuing along with the fight against smp_read_barrier_depends() [1]
(or rather, against its improper use), add an unconditional barrier to
cmpxchg. This guarantees that dependency ordering is preserved when a
dependency is headed by an unsuccessful cmpxchg. As it turns out, the
change could enable further simplification of LKMM as proposed in [2].

[1] https://marc.info/?l=linux-kernel&m=150884953419377&w=2
https://marc.info/?l=linux-kernel&m=150884946319353&w=2
https://marc.info/?l=linux-kernel&m=151215810824468&w=2
https://marc.info/?l=linux-kernel&m=151215816324484&w=2

[2] https://marc.info/?l=linux-kernel&m=151881978314872&w=2

Signed-off-by: Andrea Parri <[email protected]>
Acked-by: Peter Zijlstra <[email protected]>
Cc: Will Deacon <[email protected]>
Cc: "Paul E. McKenney" <[email protected]>
Cc: Alan Stern <[email protected]>
Cc: Richard Henderson <[email protected]>
Cc: Ivan Kokshaysky <[email protected]>
Cc: Matt Turner <[email protected]>
Cc: [email protected]
Cc: [email protected]
---
arch/alpha/include/asm/xchg.h | 15 +++++++--------
1 file changed, 7 insertions(+), 8 deletions(-)

diff --git a/arch/alpha/include/asm/xchg.h b/arch/alpha/include/asm/xchg.h
index 68dfb3cb71454..e2660866ce972 100644
--- a/arch/alpha/include/asm/xchg.h
+++ b/arch/alpha/include/asm/xchg.h
@@ -128,10 +128,9 @@ ____xchg(, volatile void *ptr, unsigned long x, int size)
* store NEW in MEM. Return the initial value in MEM. Success is
* indicated by comparing RETURN with OLD.
*
- * The memory barrier should be placed in SMP only when we actually
- * make the change. If we don't change anything (so if the returned
- * prev is equal to old) then we aren't acquiring anything new and
- * we don't need any memory barrier as far I can tell.
+ * The memory barrier is placed in SMP unconditionally, in order to
+ * guarantee that dependency ordering is preserved when a dependency
+ * is headed by an unsuccessful operation.
*/

static inline unsigned long
@@ -150,8 +149,8 @@ ____cmpxchg(_u8, volatile char *m, unsigned char old, unsigned char new)
" or %1,%2,%2\n"
" stq_c %2,0(%4)\n"
" beq %2,3f\n"
- __ASM__MB
"2:\n"
+ __ASM__MB
".subsection 2\n"
"3: br 1b\n"
".previous"
@@ -177,8 +176,8 @@ ____cmpxchg(_u16, volatile short *m, unsigned short old, unsigned short new)
" or %1,%2,%2\n"
" stq_c %2,0(%4)\n"
" beq %2,3f\n"
- __ASM__MB
"2:\n"
+ __ASM__MB
".subsection 2\n"
"3: br 1b\n"
".previous"
@@ -200,8 +199,8 @@ ____cmpxchg(_u32, volatile int *m, int old, int new)
" mov %4,%1\n"
" stl_c %1,%2\n"
" beq %1,3f\n"
- __ASM__MB
"2:\n"
+ __ASM__MB
".subsection 2\n"
"3: br 1b\n"
".previous"
@@ -223,8 +222,8 @@ ____cmpxchg(_u64, volatile long *m, unsigned long old, unsigned long new)
" mov %4,%1\n"
" stq_c %1,%2\n"
" beq %1,3f\n"
- __ASM__MB
"2:\n"
+ __ASM__MB
".subsection 2\n"
"3: br 1b\n"
".previous"
--
2.7.4



2018-02-20 19:40:27

by Paul E. McKenney

Subject: Re: [PATCH] xchg/alpha: Add unconditional memory barrier to cmpxchg

On Tue, Feb 20, 2018 at 07:45:56PM +0100, Andrea Parri wrote:
> Continuing along with the fight against smp_read_barrier_depends() [1]
> (or rather, against its improper use), add an unconditional barrier to
> cmpxchg. This guarantees that dependency ordering is preserved when a
> dependency is headed by an unsuccessful cmpxchg. As it turns out, the
> change could enable further simplification of LKMM as proposed in [2].
>
> [1] https://marc.info/?l=linux-kernel&m=150884953419377&w=2
> https://marc.info/?l=linux-kernel&m=150884946319353&w=2
> https://marc.info/?l=linux-kernel&m=151215810824468&w=2
> https://marc.info/?l=linux-kernel&m=151215816324484&w=2
>
> [2] https://marc.info/?l=linux-kernel&m=151881978314872&w=2
>
> Signed-off-by: Andrea Parri <[email protected]>
> Acked-by: Peter Zijlstra <[email protected]>
> Cc: Will Deacon <[email protected]>
> Cc: "Paul E. McKenney" <[email protected]>

Acked-by: "Paul E. McKenney" <[email protected]>

> Cc: Alan Stern <[email protected]>
> Cc: Richard Henderson <[email protected]>
> Cc: Ivan Kokshaysky <[email protected]>
> Cc: Matt Turner <[email protected]>
> Cc: [email protected]
> Cc: [email protected]
> ---
> arch/alpha/include/asm/xchg.h | 15 +++++++--------
> 1 file changed, 7 insertions(+), 8 deletions(-)
>
> diff --git a/arch/alpha/include/asm/xchg.h b/arch/alpha/include/asm/xchg.h
> index 68dfb3cb71454..e2660866ce972 100644
> --- a/arch/alpha/include/asm/xchg.h
> +++ b/arch/alpha/include/asm/xchg.h
> @@ -128,10 +128,9 @@ ____xchg(, volatile void *ptr, unsigned long x, int size)
> * store NEW in MEM. Return the initial value in MEM. Success is
> * indicated by comparing RETURN with OLD.
> *
> - * The memory barrier should be placed in SMP only when we actually
> - * make the change. If we don't change anything (so if the returned
> - * prev is equal to old) then we aren't acquiring anything new and
> - * we don't need any memory barrier as far I can tell.
> + * The memory barrier is placed in SMP unconditionally, in order to
> + * guarantee that dependency ordering is preserved when a dependency
> + * is headed by an unsuccessful operation.
> */
>
> static inline unsigned long
> @@ -150,8 +149,8 @@ ____cmpxchg(_u8, volatile char *m, unsigned char old, unsigned char new)
> " or %1,%2,%2\n"
> " stq_c %2,0(%4)\n"
> " beq %2,3f\n"
> - __ASM__MB
> "2:\n"
> + __ASM__MB
> ".subsection 2\n"
> "3: br 1b\n"
> ".previous"
> @@ -177,8 +176,8 @@ ____cmpxchg(_u16, volatile short *m, unsigned short old, unsigned short new)
> " or %1,%2,%2\n"
> " stq_c %2,0(%4)\n"
> " beq %2,3f\n"
> - __ASM__MB
> "2:\n"
> + __ASM__MB
> ".subsection 2\n"
> "3: br 1b\n"
> ".previous"
> @@ -200,8 +199,8 @@ ____cmpxchg(_u32, volatile int *m, int old, int new)
> " mov %4,%1\n"
> " stl_c %1,%2\n"
> " beq %1,3f\n"
> - __ASM__MB
> "2:\n"
> + __ASM__MB
> ".subsection 2\n"
> "3: br 1b\n"
> ".previous"
> @@ -223,8 +222,8 @@ ____cmpxchg(_u64, volatile long *m, unsigned long old, unsigned long new)
> " mov %4,%1\n"
> " stq_c %1,%2\n"
> " beq %1,3f\n"
> - __ASM__MB
> "2:\n"
> + __ASM__MB
> ".subsection 2\n"
> "3: br 1b\n"
> ".previous"
> --
> 2.7.4
>


Subject: [tip:locking/urgent] locking/xchg/alpha: Add unconditional memory barrier to cmpxchg()

Commit-ID: cb13b424e986aed68d74cbaec3449ea23c50e167
Gitweb: https://git.kernel.org/tip/cb13b424e986aed68d74cbaec3449ea23c50e167
Author: Andrea Parri <[email protected]>
AuthorDate: Tue, 20 Feb 2018 19:45:56 +0100
Committer: Ingo Molnar <[email protected]>
CommitDate: Wed, 21 Feb 2018 10:12:29 +0100

locking/xchg/alpha: Add unconditional memory barrier to cmpxchg()

Continuing along with the fight against smp_read_barrier_depends() [1]
(or rather, against its improper use), add an unconditional barrier to
cmpxchg. This guarantees that dependency ordering is preserved when a
dependency is headed by an unsuccessful cmpxchg. As it turns out, the
change could enable further simplification of LKMM as proposed in [2].

[1] https://marc.info/?l=linux-kernel&m=150884953419377&w=2
https://marc.info/?l=linux-kernel&m=150884946319353&w=2
https://marc.info/?l=linux-kernel&m=151215810824468&w=2
https://marc.info/?l=linux-kernel&m=151215816324484&w=2

[2] https://marc.info/?l=linux-kernel&m=151881978314872&w=2

Signed-off-by: Andrea Parri <[email protected]>
Acked-by: Peter Zijlstra <[email protected]>
Acked-by: Paul E. McKenney <[email protected]>
Cc: Alan Stern <[email protected]>
Cc: Ivan Kokshaysky <[email protected]>
Cc: Linus Torvalds <[email protected]>
Cc: Matt Turner <[email protected]>
Cc: Richard Henderson <[email protected]>
Cc: Thomas Gleixner <[email protected]>
Cc: Will Deacon <[email protected]>
Cc: [email protected]
Link: http://lkml.kernel.org/r/[email protected]
Signed-off-by: Ingo Molnar <[email protected]>
---
arch/alpha/include/asm/xchg.h | 15 +++++++--------
1 file changed, 7 insertions(+), 8 deletions(-)

diff --git a/arch/alpha/include/asm/xchg.h b/arch/alpha/include/asm/xchg.h
index 68dfb3c..e266086 100644
--- a/arch/alpha/include/asm/xchg.h
+++ b/arch/alpha/include/asm/xchg.h
@@ -128,10 +128,9 @@ ____xchg(, volatile void *ptr, unsigned long x, int size)
* store NEW in MEM. Return the initial value in MEM. Success is
* indicated by comparing RETURN with OLD.
*
- * The memory barrier should be placed in SMP only when we actually
- * make the change. If we don't change anything (so if the returned
- * prev is equal to old) then we aren't acquiring anything new and
- * we don't need any memory barrier as far I can tell.
+ * The memory barrier is placed in SMP unconditionally, in order to
+ * guarantee that dependency ordering is preserved when a dependency
+ * is headed by an unsuccessful operation.
*/

static inline unsigned long
@@ -150,8 +149,8 @@ ____cmpxchg(_u8, volatile char *m, unsigned char old, unsigned char new)
" or %1,%2,%2\n"
" stq_c %2,0(%4)\n"
" beq %2,3f\n"
- __ASM__MB
"2:\n"
+ __ASM__MB
".subsection 2\n"
"3: br 1b\n"
".previous"
@@ -177,8 +176,8 @@ ____cmpxchg(_u16, volatile short *m, unsigned short old, unsigned short new)
" or %1,%2,%2\n"
" stq_c %2,0(%4)\n"
" beq %2,3f\n"
- __ASM__MB
"2:\n"
+ __ASM__MB
".subsection 2\n"
"3: br 1b\n"
".previous"
@@ -200,8 +199,8 @@ ____cmpxchg(_u32, volatile int *m, int old, int new)
" mov %4,%1\n"
" stl_c %1,%2\n"
" beq %1,3f\n"
- __ASM__MB
"2:\n"
+ __ASM__MB
".subsection 2\n"
"3: br 1b\n"
".previous"
@@ -223,8 +222,8 @@ ____cmpxchg(_u64, volatile long *m, unsigned long old, unsigned long new)
" mov %4,%1\n"
" stq_c %1,%2\n"
" beq %1,3f\n"
- __ASM__MB
"2:\n"
+ __ASM__MB
".subsection 2\n"
"3: br 1b\n"
".previous"

2018-02-21 16:53:15

by Will Deacon

Subject: Re: [PATCH] xchg/alpha: Add unconditional memory barrier to cmpxchg

Hi Andrea,

On Tue, Feb 20, 2018 at 07:45:56PM +0100, Andrea Parri wrote:
> Continuing along with the fight against smp_read_barrier_depends() [1]
> (or rather, against its improper use), add an unconditional barrier to
> cmpxchg. This guarantees that dependency ordering is preserved when a
> dependency is headed by an unsuccessful cmpxchg. As it turns out, the
> change could enable further simplification of LKMM as proposed in [2].
>
> [1] https://marc.info/?l=linux-kernel&m=150884953419377&w=2
> https://marc.info/?l=linux-kernel&m=150884946319353&w=2
> https://marc.info/?l=linux-kernel&m=151215810824468&w=2
> https://marc.info/?l=linux-kernel&m=151215816324484&w=2
>
> [2] https://marc.info/?l=linux-kernel&m=151881978314872&w=2
>
> Signed-off-by: Andrea Parri <[email protected]>
> Acked-by: Peter Zijlstra <[email protected]>
> Cc: Will Deacon <[email protected]>
> Cc: "Paul E. McKenney" <[email protected]>
> Cc: Alan Stern <[email protected]>
> Cc: Richard Henderson <[email protected]>
> Cc: Ivan Kokshaysky <[email protected]>
> Cc: Matt Turner <[email protected]>
> Cc: [email protected]
> Cc: [email protected]
> ---
> arch/alpha/include/asm/xchg.h | 15 +++++++--------
> 1 file changed, 7 insertions(+), 8 deletions(-)
>
> diff --git a/arch/alpha/include/asm/xchg.h b/arch/alpha/include/asm/xchg.h
> index 68dfb3cb71454..e2660866ce972 100644
> --- a/arch/alpha/include/asm/xchg.h
> +++ b/arch/alpha/include/asm/xchg.h
> @@ -128,10 +128,9 @@ ____xchg(, volatile void *ptr, unsigned long x, int size)
> * store NEW in MEM. Return the initial value in MEM. Success is
> * indicated by comparing RETURN with OLD.
> *
> - * The memory barrier should be placed in SMP only when we actually
> - * make the change. If we don't change anything (so if the returned
> - * prev is equal to old) then we aren't acquiring anything new and
> - * we don't need any memory barrier as far I can tell.
> + * The memory barrier is placed in SMP unconditionally, in order to
> + * guarantee that dependency ordering is preserved when a dependency
> + * is headed by an unsuccessful operation.
> */
>
> static inline unsigned long
> @@ -150,8 +149,8 @@ ____cmpxchg(_u8, volatile char *m, unsigned char old, unsigned char new)
> " or %1,%2,%2\n"
> " stq_c %2,0(%4)\n"
> " beq %2,3f\n"
> - __ASM__MB
> "2:\n"
> + __ASM__MB
> ".subsection 2\n"
> "3: br 1b\n"
> ".previous"

It might be better just to add smp_read_barrier_depends() into the cmpxchg
macro, then remove all of the __ASM__MB stuff.

That said, I don't actually understand how the Alpha cmpxchg or xchg
implementations satisfy the memory model, since they only appear to have
a barrier after the operation.

So MP using xchg:

WRITE_ONCE(x, 1)
xchg(y, 1)

smp_load_acquire(y) == 1
READ_ONCE(x) == 0

would be allowed. What am I missing?
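
For illustration only, the concern can be reproduced in a toy model (a
sketch, not kernel code and not the LKMM): every access is treated as a
single, instantly visible step, the reader's two loads stay in program order
because of smp_load_acquire(), and the writer's two accesses may swap when
no barrier precedes the xchg. All names below (mp_op, outcome_seen,
mp_forbidden_outcome_reachable) are hypothetical.

```c
#include <stdbool.h>

/* The four accesses of the MP test above. */
enum mp_op { STORE_X, XCHG_Y, LOAD_Y, LOAD_X };

/*
 * Enumerate all six interleavings of a two-step writer and a two-step
 * reader and report whether r0 == 1 && r1 == 0 can be observed.
 */
static bool outcome_seen(const enum mp_op writer[2], const enum mp_op reader[2])
{
	for (int i = 0; i < 4; i++) {
		for (int j = i + 1; j < 4; j++) {
			/* Slots i and j run the writer; the other two run the reader. */
			int x = 0, y = 0, r0 = 0, r1 = 0, w = 0, r = 0;

			for (int slot = 0; slot < 4; slot++) {
				bool is_writer = (slot == i || slot == j);
				enum mp_op op = is_writer ? writer[w++] : reader[r++];

				switch (op) {
				case STORE_X: x = 1;  break;	/* WRITE_ONCE(x, 1) */
				case XCHG_Y:  y = 1;  break;	/* xchg(y, 1) */
				case LOAD_Y:  r0 = y; break;	/* smp_load_acquire(y) */
				case LOAD_X:  r1 = x; break;	/* READ_ONCE(x) */
				}
			}
			if (r0 == 1 && r1 == 0)
				return true;
		}
	}
	return false;
}

/*
 * Assumption of this model: without a barrier before the xchg, the store
 * to x and the xchg on y may become visible in either order.
 */
bool mp_forbidden_outcome_reachable(bool barrier_before_xchg)
{
	const enum mp_op reader[2]    = { LOAD_Y, LOAD_X };
	const enum mp_op in_order[2]  = { STORE_X, XCHG_Y };
	const enum mp_op reordered[2] = { XCHG_Y, STORE_X };

	if (outcome_seen(in_order, reader))
		return true;
	return !barrier_before_xchg && outcome_seen(reordered, reader);
}
```

In this model the outcome is unreachable when the writer's accesses stay in
order and reachable once they may swap, which is the case a barrier placed
only after the operation does not rule out.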

Since I'm in the mood for dumb questions, do we need to care about
this_cpu_cmpxchg? I'm sure I've seen code that allows concurrent access to
per-cpu variables, but the asm-generic implementation of this_cpu_cmpxchg
doesn't use READ_ONCE.

Will

2018-02-21 18:42:54

by Andrea Parri

Subject: Re: [PATCH] xchg/alpha: Add unconditional memory barrier to cmpxchg

On Wed, Feb 21, 2018 at 11:21:38AM +0000, Will Deacon wrote:
> Hi Andrea,
>
> On Tue, Feb 20, 2018 at 07:45:56PM +0100, Andrea Parri wrote:
> > Continuing along with the fight against smp_read_barrier_depends() [1]
> > (or rather, against its improper use), add an unconditional barrier to
> > cmpxchg. This guarantees that dependency ordering is preserved when a
> > dependency is headed by an unsuccessful cmpxchg. As it turns out, the
> > change could enable further simplification of LKMM as proposed in [2].
> >
> > [1] https://marc.info/?l=linux-kernel&m=150884953419377&w=2
> > https://marc.info/?l=linux-kernel&m=150884946319353&w=2
> > https://marc.info/?l=linux-kernel&m=151215810824468&w=2
> > https://marc.info/?l=linux-kernel&m=151215816324484&w=2
> >
> > [2] https://marc.info/?l=linux-kernel&m=151881978314872&w=2
> >
> > Signed-off-by: Andrea Parri <[email protected]>
> > Acked-by: Peter Zijlstra <[email protected]>
> > Cc: Will Deacon <[email protected]>
> > Cc: "Paul E. McKenney" <[email protected]>
> > Cc: Alan Stern <[email protected]>
> > Cc: Richard Henderson <[email protected]>
> > Cc: Ivan Kokshaysky <[email protected]>
> > Cc: Matt Turner <[email protected]>
> > Cc: [email protected]
> > Cc: [email protected]
> > ---
> > arch/alpha/include/asm/xchg.h | 15 +++++++--------
> > 1 file changed, 7 insertions(+), 8 deletions(-)
> >
> > diff --git a/arch/alpha/include/asm/xchg.h b/arch/alpha/include/asm/xchg.h
> > index 68dfb3cb71454..e2660866ce972 100644
> > --- a/arch/alpha/include/asm/xchg.h
> > +++ b/arch/alpha/include/asm/xchg.h
> > @@ -128,10 +128,9 @@ ____xchg(, volatile void *ptr, unsigned long x, int size)
> > * store NEW in MEM. Return the initial value in MEM. Success is
> > * indicated by comparing RETURN with OLD.
> > *
> > - * The memory barrier should be placed in SMP only when we actually
> > - * make the change. If we don't change anything (so if the returned
> > - * prev is equal to old) then we aren't acquiring anything new and
> > - * we don't need any memory barrier as far I can tell.
> > + * The memory barrier is placed in SMP unconditionally, in order to
> > + * guarantee that dependency ordering is preserved when a dependency
> > + * is headed by an unsuccessful operation.
> > */
> >
> > static inline unsigned long
> > @@ -150,8 +149,8 @@ ____cmpxchg(_u8, volatile char *m, unsigned char old, unsigned char new)
> > " or %1,%2,%2\n"
> > " stq_c %2,0(%4)\n"
> > " beq %2,3f\n"
> > - __ASM__MB
> > "2:\n"
> > + __ASM__MB
> > ".subsection 2\n"
> > "3: br 1b\n"
> > ".previous"
>
> It might be better just to add smp_read_barrier_depends() into the cmpxchg
> macro, then remove all of the __ASM__MB stuff.

Mmh, it might be better to add smp_mb() into the cmpxchg macro (after the
operation), then remove all the __ASM__MB stuff.
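
A rough sketch of that shape, for illustration only: the barrier sits in a
wrapper and runs whether or not the compare succeeded. The names
sketch_smp_mb, sketch_cmpxchg_local and sketch_cmpxchg are hypothetical, the
compiler's __atomic builtins stand in for the Alpha ll/sc loop and for
smp_mb(), and an inline function is used where the kernel has a macro.

```c
/* Stand-in for smp_mb(); not the Alpha definition. */
#define sketch_smp_mb() __atomic_thread_fence(__ATOMIC_SEQ_CST)

/* Stand-in for the barrier-free ll/sc loop: a relaxed compare-and-swap. */
static inline unsigned long
sketch_cmpxchg_local(volatile unsigned long *m, unsigned long old,
		     unsigned long new_val)
{
	unsigned long prev = old;

	/* On failure, prev is overwritten with the value found in *m. */
	__atomic_compare_exchange_n(m, &prev, new_val, 0,
				    __ATOMIC_RELAXED, __ATOMIC_RELAXED);
	return prev;
}

/* The wrapper issues the barrier on both the success and failure paths. */
static inline unsigned long
sketch_cmpxchg(volatile unsigned long *m, unsigned long old,
	       unsigned long new_val)
{
	unsigned long ret = sketch_cmpxchg_local(m, old, new_val);

	sketch_smp_mb();
	return ret;
}
```

With the barrier in the wrapper, the per-size asm bodies would no longer
need their own __ASM__MB.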


>
> That said, I don't actually understand how the Alpha cmpxchg or xchg
> implementations satisfy the memory model, since they only appear to have
> a barrier after the operation.
>
> So MP using xchg:
>
> WRITE_ONCE(x, 1)
> xchg(y, 1)
>
> smp_load_acquire(y) == 1
> READ_ONCE(x) == 0
>
> would be allowed. What am I missing?

Good question ;-) The absence of an smp_mb() (or of an __ASM__MB) before
the operation did upset me.

If this question remains pending, I'll send a patch to add these barriers.


>
> Since I'm in the mood for dumb questions, do we need to care about
> this_cpu_cmpxchg? I'm sure I've seen code that allows concurrent access to
> per-cpu variables, but the asm-generic implementation of this_cpu_cmpxchg
> doesn't use READ_ONCE.

Frankly, I'm not sure whether this is an issue in the generic implementation
of this_cpu_* or, rather, in that code. Let me dig a bit more into this ...

Andrea


>
> Will