[ Upstream commit 69d927bba39517d0980462efc051875b7f4db185 ]
Recent probing at the Linux Kernel Memory Model uncovered a
'surprise'. Strongly ordered architectures where the atomic RmW
primitive implies full memory ordering and
smp_mb__{before,after}_atomic() are a simple barrier() (such as x86)
fail for:
*x = 1;
atomic_inc(u);
smp_mb__after_atomic();
r0 = *y;
Because, while the atomic_inc() implies memory order, it
(surprisingly) does not provide a compiler barrier. This then allows
the compiler to re-order like so:
atomic_inc(u);
*x = 1;
smp_mb__after_atomic();
r0 = *y;
Which the CPU is then allowed to re-order (under TSO rules) like:
atomic_inc(u);
r0 = *y;
*x = 1;
And this very much was not intended. Therefore strengthen the atomic
RmW ops to include a compiler barrier.
NOTE: atomic_{or,and,xor} and the bitops already had the compiler
barrier.
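To illustrate (a minimal, hypothetical userspace sketch, not the kernel
code itself; the helper names here are made up), the "memory" clobber is
what turns the LOCK-prefixed asm into a compiler barrier:

/* Same LOCK-prefixed increment, with and without the "memory" clobber.
 * Only the second form is a compiler barrier; around the first, the
 * compiler is still free to move plain loads and stores.
 */
static inline void inc_no_barrier(int *u)
{
	asm volatile("lock incl %0" : "+m" (*u));
}

static inline void inc_barrier(int *u)
{
	asm volatile("lock incl %0" : "+m" (*u) :: "memory");
}

void example(int *x, int *u)
{
	*x = 1;
	inc_no_barrier(u);	/* compiler may sink "*x = 1" below this */
	inc_barrier(u);		/* compiler may not move accesses across this */
}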
Signed-off-by: Peter Zijlstra (Intel) <[email protected]>
Cc: Linus Torvalds <[email protected]>
Cc: Peter Zijlstra <[email protected]>
Cc: Thomas Gleixner <[email protected]>
Signed-off-by: Ingo Molnar <[email protected]>
Signed-off-by: Sasha Levin <[email protected]>
---
Documentation/atomic_t.txt | 3 +++
arch/x86/include/asm/atomic.h | 8 ++++----
arch/x86/include/asm/atomic64_64.h | 8 ++++----
arch/x86/include/asm/barrier.h | 4 ++--
4 files changed, 13 insertions(+), 10 deletions(-)
diff --git a/Documentation/atomic_t.txt b/Documentation/atomic_t.txt
index 913396ac5824..ed0d814df7e0 100644
--- a/Documentation/atomic_t.txt
+++ b/Documentation/atomic_t.txt
@@ -177,6 +177,9 @@ These helper barriers exist because architectures have varying implicit
ordering on their SMP atomic primitives. For example our TSO architectures
provide full ordered atomics and these barriers are no-ops.
+NOTE: when the atomic RmW ops are fully ordered, they should also imply a
+compiler barrier.
+
Thus:
atomic_fetch_add();
diff --git a/arch/x86/include/asm/atomic.h b/arch/x86/include/asm/atomic.h
index ce84388e540c..d266a4066289 100644
--- a/arch/x86/include/asm/atomic.h
+++ b/arch/x86/include/asm/atomic.h
@@ -54,7 +54,7 @@ static __always_inline void arch_atomic_add(int i, atomic_t *v)
{
asm volatile(LOCK_PREFIX "addl %1,%0"
: "+m" (v->counter)
- : "ir" (i));
+ : "ir" (i) : "memory");
}
/**
@@ -68,7 +68,7 @@ static __always_inline void arch_atomic_sub(int i, atomic_t *v)
{
asm volatile(LOCK_PREFIX "subl %1,%0"
: "+m" (v->counter)
- : "ir" (i));
+ : "ir" (i) : "memory");
}
/**
@@ -95,7 +95,7 @@ static __always_inline bool arch_atomic_sub_and_test(int i, atomic_t *v)
static __always_inline void arch_atomic_inc(atomic_t *v)
{
asm volatile(LOCK_PREFIX "incl %0"
- : "+m" (v->counter));
+ : "+m" (v->counter) :: "memory");
}
#define arch_atomic_inc arch_atomic_inc
@@ -108,7 +108,7 @@ static __always_inline void arch_atomic_inc(atomic_t *v)
static __always_inline void arch_atomic_dec(atomic_t *v)
{
asm volatile(LOCK_PREFIX "decl %0"
- : "+m" (v->counter));
+ : "+m" (v->counter) :: "memory");
}
#define arch_atomic_dec arch_atomic_dec
diff --git a/arch/x86/include/asm/atomic64_64.h b/arch/x86/include/asm/atomic64_64.h
index 5f851d92eecd..55ca027f8c1c 100644
--- a/arch/x86/include/asm/atomic64_64.h
+++ b/arch/x86/include/asm/atomic64_64.h
@@ -45,7 +45,7 @@ static __always_inline void arch_atomic64_add(long i, atomic64_t *v)
{
asm volatile(LOCK_PREFIX "addq %1,%0"
: "=m" (v->counter)
- : "er" (i), "m" (v->counter));
+ : "er" (i), "m" (v->counter) : "memory");
}
/**
@@ -59,7 +59,7 @@ static inline void arch_atomic64_sub(long i, atomic64_t *v)
{
asm volatile(LOCK_PREFIX "subq %1,%0"
: "=m" (v->counter)
- : "er" (i), "m" (v->counter));
+ : "er" (i), "m" (v->counter) : "memory");
}
/**
@@ -87,7 +87,7 @@ static __always_inline void arch_atomic64_inc(atomic64_t *v)
{
asm volatile(LOCK_PREFIX "incq %0"
: "=m" (v->counter)
- : "m" (v->counter));
+ : "m" (v->counter) : "memory");
}
#define arch_atomic64_inc arch_atomic64_inc
@@ -101,7 +101,7 @@ static __always_inline void arch_atomic64_dec(atomic64_t *v)
{
asm volatile(LOCK_PREFIX "decq %0"
: "=m" (v->counter)
- : "m" (v->counter));
+ : "m" (v->counter) : "memory");
}
#define arch_atomic64_dec arch_atomic64_dec
diff --git a/arch/x86/include/asm/barrier.h b/arch/x86/include/asm/barrier.h
index 14de0432d288..84f848c2541a 100644
--- a/arch/x86/include/asm/barrier.h
+++ b/arch/x86/include/asm/barrier.h
@@ -80,8 +80,8 @@ do { \
})
/* Atomic operations are already serializing on x86 */
-#define __smp_mb__before_atomic() barrier()
-#define __smp_mb__after_atomic() barrier()
+#define __smp_mb__before_atomic() do { } while (0)
+#define __smp_mb__after_atomic() do { } while (0)
#include <asm-generic/barrier.h>
--
2.20.1
Greg Kroah-Hartman wrote:
> [ Upstream commit 69d927bba39517d0980462efc051875b7f4db185 ]
>
> Recent probing at the Linux Kernel Memory Model uncovered a
> 'surprise'. Strongly ordered architectures where the atomic RmW
> primitive implies full memory ordering and
> smp_mb__{before,after}_atomic() are a simple barrier() (such as x86)
> fail for:
>
> *x = 1;
> atomic_inc(u);
> smp_mb__after_atomic();
> r0 = *y;
[snip]
> --- a/arch/x86/include/asm/atomic.h
> +++ b/arch/x86/include/asm/atomic.h
> @@ -54,7 +54,7 @@ static __always_inline void arch_atomic_add(int i, atomic_t *v)
> {
> asm volatile(LOCK_PREFIX "addl %1,%0"
> : "+m" (v->counter)
> - : "ir" (i));
> + : "ir" (i) : "memory");
> }
>
> /**
Shouldn't those clobber constraints actually be: "memory","cc"?
That is because addl, subl (and other) machine instructions
actually modify the flags register too.
gcc docs say: The "cc" clobber indicates that the assembler
code modifies the flags register.
--
Jari Ruusu 4096R/8132F189 12D6 4C3A DCDA 0AA4 27BD ACDF F073 3C80 8132 F189
On Fri, Jul 26, 2019 at 01:18:06PM +0300, Jari Ruusu wrote:
> Greg Kroah-Hartman wrote:
> > [ Upstream commit 69d927bba39517d0980462efc051875b7f4db185 ]
> >
> > Recent probing at the Linux Kernel Memory Model uncovered a
> > 'surprise'. Strongly ordered architectures where the atomic RmW
> > primitive implies full memory ordering and
> > smp_mb__{before,after}_atomic() are a simple barrier() (such as x86)
> > fail for:
> >
> > *x = 1;
> > atomic_inc(u);
> > smp_mb__after_atomic();
> > r0 = *y;
>
> [snip]
>
> > --- a/arch/x86/include/asm/atomic.h
> > +++ b/arch/x86/include/asm/atomic.h
> > @@ -54,7 +54,7 @@ static __always_inline void arch_atomic_add(int i, atomic_t *v)
> > {
> > asm volatile(LOCK_PREFIX "addl %1,%0"
> > : "+m" (v->counter)
> > - : "ir" (i));
> > + : "ir" (i) : "memory");
> > }
> >
> > /**
>
> Shouldn't those clobber constraints actually be: "memory","cc"?
> That is because addl, subl (and other) machine instructions
> actually modify the flags register too.
>
> gcc docs say: The "cc" clobber indicates that the assembler
> code modifies the flags register.
GCC x86 assumes any asm() will clobber "cc".
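For illustration (a tiny sketch, assuming GCC targeting x86; the function
names are made up), both forms below generate the same code, because the
flags are treated as clobbered either way:

/* On x86, GCC assumes EFLAGS is clobbered by every asm statement, so an
 * explicit "cc" clobber is accepted but redundant.
 */
static inline void add_implicit_cc(int i, int *p)
{
	asm volatile("addl %1,%0" : "+m" (*p) : "ir" (i) : "memory");
}

static inline void add_explicit_cc(int i, int *p)
{
	asm volatile("addl %1,%0" : "+m" (*p) : "ir" (i) : "memory", "cc");
}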
On 7/26/19, Peter Zijlstra <[email protected]> wrote:
> On Fri, Jul 26, 2019 at 01:18:06PM +0300, Jari Ruusu wrote:
>> Shouldn't those clobber constraints actually be: "memory","cc"?
>> That is because addl, subl (and other) machine instructions
>> actually modify the flags register too.
>>
>> gcc docs say: The "cc" clobber indicates that the assembler
>> code modifies the flags register.
>
> GCC x86 assumes any asm() will clobber "cc".
No worries then. Thanks for your clarification.
--
Jari Ruusu 4096R/8132F189 12D6 4C3A DCDA 0AA4 27BD ACDF F073 3C80 8132 F189