2020-10-15 02:21:03

by Arvind Sankar

[permalink] [raw]
Subject: [PATCH] compiler.h: Fix barrier_data() on clang

Commit
815f0ddb346c ("include/linux/compiler*.h: make compiler-*.h mutually exclusive")

neglected to copy barrier_data() from compiler-gcc.h into
compiler-clang.h. The definition in compiler-gcc.h was really to work
around clang's more aggressive optimization, so this broke
barrier_data() on clang, and consequently memzero_explicit() as well.

For example, this results in at least the memzero_explicit() call in
lib/crypto/sha256.c:sha256_transform() being optimized away by clang.

Fix this by moving the definition of barrier_data() into compiler.h.

Also move the gcc/clang definition of barrier() into compiler.h,
__memory_barrier() is icc-specific (and barrier() is already defined
using it in compiler-intel.h) and doesn't belong in compiler.h.

Signed-off-by: Arvind Sankar <[email protected]>
Fixes: 815f0ddb346c ("include/linux/compiler*.h: make compiler-*.h mutually exclusive")
---
include/linux/compiler-clang.h | 6 ------
include/linux/compiler-gcc.h | 19 -------------------
include/linux/compiler.h | 18 ++++++++++++++++--
3 files changed, 16 insertions(+), 27 deletions(-)

diff --git a/include/linux/compiler-clang.h b/include/linux/compiler-clang.h
index cee0c728d39a..04c0a5a717f7 100644
--- a/include/linux/compiler-clang.h
+++ b/include/linux/compiler-clang.h
@@ -52,12 +52,6 @@
#define COMPILER_HAS_GENERIC_BUILTIN_OVERFLOW 1
#endif

-/* The following are for compatibility with GCC, from compiler-gcc.h,
- * and may be redefined here because they should not be shared with other
- * compilers, like ICC.
- */
-#define barrier() __asm__ __volatile__("" : : : "memory")
-
#if __has_feature(shadow_call_stack)
# define __noscs __attribute__((__no_sanitize__("shadow-call-stack")))
#endif
diff --git a/include/linux/compiler-gcc.h b/include/linux/compiler-gcc.h
index 7a3769040d7d..fda30ffb037b 100644
--- a/include/linux/compiler-gcc.h
+++ b/include/linux/compiler-gcc.h
@@ -15,25 +15,6 @@
# error Sorry, your compiler is too old - please upgrade it.
#endif

-/* Optimization barrier */
-
-/* The "volatile" is due to gcc bugs */
-#define barrier() __asm__ __volatile__("": : :"memory")
-/*
- * This version is i.e. to prevent dead stores elimination on @ptr
- * where gcc and llvm may behave differently when otherwise using
- * normal barrier(): while gcc behavior gets along with a normal
- * barrier(), llvm needs an explicit input variable to be assumed
- * clobbered. The issue is as follows: while the inline asm might
- * access any memory it wants, the compiler could have fit all of
- * @ptr into memory registers instead, and since @ptr never escaped
- * from that, it proved that the inline asm wasn't touching any of
- * it. This version works well with both compilers, i.e. we're telling
- * the compiler that the inline asm absolutely may see the contents
- * of @ptr. See also: https://llvm.org/bugs/show_bug.cgi?id=15495
- */
-#define barrier_data(ptr) __asm__ __volatile__("": :"r"(ptr) :"memory")
-
/*
* This macro obfuscates arithmetic on a variable address so that gcc
* shouldn't recognize the original var, and make assumptions about it.
diff --git a/include/linux/compiler.h b/include/linux/compiler.h
index 92ef163a7479..dfba70b2644f 100644
--- a/include/linux/compiler.h
+++ b/include/linux/compiler.h
@@ -80,11 +80,25 @@ void ftrace_likely_update(struct ftrace_likely_data *f, int val,

/* Optimization barrier */
#ifndef barrier
-# define barrier() __memory_barrier()
+/* The "volatile" is due to gcc bugs */
+# define barrier() __asm__ __volatile__("": : :"memory")
#endif

#ifndef barrier_data
-# define barrier_data(ptr) barrier()
+/*
+ * This version is i.e. to prevent dead stores elimination on @ptr
+ * where gcc and llvm may behave differently when otherwise using
+ * normal barrier(): while gcc behavior gets along with a normal
+ * barrier(), llvm needs an explicit input variable to be assumed
+ * clobbered. The issue is as follows: while the inline asm might
+ * access any memory it wants, the compiler could have fit all of
+ * @ptr into memory registers instead, and since @ptr never escaped
+ * from that, it proved that the inline asm wasn't touching any of
+ * it. This version works well with both compilers, i.e. we're telling
+ * the compiler that the inline asm absolutely may see the contents
+ * of @ptr. See also: https://llvm.org/bugs/show_bug.cgi?id=15495
+ */
+# define barrier_data(ptr) __asm__ __volatile__("": :"r"(ptr) :"memory")
#endif

/* workaround for GCC PR82365 if needed */
--
2.26.2


2020-10-15 03:37:43

by Nick Desaulniers

[permalink] [raw]
Subject: Re: [PATCH] compiler.h: Fix barrier_data() on clang

On Wed, Oct 14, 2020 at 2:26 PM Arvind Sankar <[email protected]> wrote:
>
> Commit
> 815f0ddb346c ("include/linux/compiler*.h: make compiler-*.h mutually exclusive")
>
> neglected to copy barrier_data() from compiler-gcc.h into
> compiler-clang.h. The definition in compiler-gcc.h was really to work
> around clang's more aggressive optimization, so this broke
> barrier_data() on clang, and consequently memzero_explicit() as well.
>
> For example, this results in at least the memzero_explicit() call in
> lib/crypto/sha256.c:sha256_transform() being optimized away by clang.
>
> Fix this by moving the definition of barrier_data() into compiler.h.
>
> Also move the gcc/clang definition of barrier() into compiler.h,
> __memory_barrier() is icc-specific (and barrier() is already defined
> using it in compiler-intel.h) and doesn't belong in compiler.h.
>
> Signed-off-by: Arvind Sankar <[email protected]>
> Fixes: 815f0ddb346c ("include/linux/compiler*.h: make compiler-*.h mutually exclusive")

Thanks for the patch! Curious how you spotted this? My mistake for
missing it. Definite difference in the disassembly before/after.

Cc: [email protected]
Reviewed-by: Nick Desaulniers <[email protected]>
Tested-by: Nick Desaulniers <[email protected]>

akpm@ would you mind picking this up when you have a chance?

See also:
commit 7829fb09a2b4 ("lib: make memzero_explicit more robust against
dead store elimination")

I'm pretty sure `man 3 explicit_bzero` was created in libc for this
exact problem, though the manual page is an interesting read.

> ---
> include/linux/compiler-clang.h | 6 ------
> include/linux/compiler-gcc.h | 19 -------------------
> include/linux/compiler.h | 18 ++++++++++++++++--
> 3 files changed, 16 insertions(+), 27 deletions(-)
>
> diff --git a/include/linux/compiler-clang.h b/include/linux/compiler-clang.h
> index cee0c728d39a..04c0a5a717f7 100644
> --- a/include/linux/compiler-clang.h
> +++ b/include/linux/compiler-clang.h
> @@ -52,12 +52,6 @@
> #define COMPILER_HAS_GENERIC_BUILTIN_OVERFLOW 1
> #endif
>
> -/* The following are for compatibility with GCC, from compiler-gcc.h,
> - * and may be redefined here because they should not be shared with other
> - * compilers, like ICC.
> - */
> -#define barrier() __asm__ __volatile__("" : : : "memory")
> -
> #if __has_feature(shadow_call_stack)
> # define __noscs __attribute__((__no_sanitize__("shadow-call-stack")))
> #endif
> diff --git a/include/linux/compiler-gcc.h b/include/linux/compiler-gcc.h
> index 7a3769040d7d..fda30ffb037b 100644
> --- a/include/linux/compiler-gcc.h
> +++ b/include/linux/compiler-gcc.h
> @@ -15,25 +15,6 @@
> # error Sorry, your compiler is too old - please upgrade it.
> #endif
>
> -/* Optimization barrier */
> -
> -/* The "volatile" is due to gcc bugs */
> -#define barrier() __asm__ __volatile__("": : :"memory")
> -/*
> - * This version is i.e. to prevent dead stores elimination on @ptr
> - * where gcc and llvm may behave differently when otherwise using
> - * normal barrier(): while gcc behavior gets along with a normal
> - * barrier(), llvm needs an explicit input variable to be assumed
> - * clobbered. The issue is as follows: while the inline asm might
> - * access any memory it wants, the compiler could have fit all of
> - * @ptr into memory registers instead, and since @ptr never escaped
> - * from that, it proved that the inline asm wasn't touching any of
> - * it. This version works well with both compilers, i.e. we're telling
> - * the compiler that the inline asm absolutely may see the contents
> - * of @ptr. See also: https://llvm.org/bugs/show_bug.cgi?id=15495
> - */
> -#define barrier_data(ptr) __asm__ __volatile__("": :"r"(ptr) :"memory")
> -
> /*
> * This macro obfuscates arithmetic on a variable address so that gcc
> * shouldn't recognize the original var, and make assumptions about it.
> diff --git a/include/linux/compiler.h b/include/linux/compiler.h
> index 92ef163a7479..dfba70b2644f 100644
> --- a/include/linux/compiler.h
> +++ b/include/linux/compiler.h
> @@ -80,11 +80,25 @@ void ftrace_likely_update(struct ftrace_likely_data *f, int val,
>
> /* Optimization barrier */
> #ifndef barrier
> -# define barrier() __memory_barrier()
> +/* The "volatile" is due to gcc bugs */
> +# define barrier() __asm__ __volatile__("": : :"memory")
> #endif
>
> #ifndef barrier_data
> -# define barrier_data(ptr) barrier()
> +/*
> + * This version is i.e. to prevent dead stores elimination on @ptr
> + * where gcc and llvm may behave differently when otherwise using
> + * normal barrier(): while gcc behavior gets along with a normal
> + * barrier(), llvm needs an explicit input variable to be assumed
> + * clobbered. The issue is as follows: while the inline asm might
> + * access any memory it wants, the compiler could have fit all of
> + * @ptr into memory registers instead, and since @ptr never escaped
> + * from that, it proved that the inline asm wasn't touching any of
> + * it. This version works well with both compilers, i.e. we're telling
> + * the compiler that the inline asm absolutely may see the contents
> + * of @ptr. See also: https://llvm.org/bugs/show_bug.cgi?id=15495
> + */
> +# define barrier_data(ptr) __asm__ __volatile__("": :"r"(ptr) :"memory")
> #endif
>
> /* workaround for GCC PR82365 if needed */
> --
> 2.26.2
>


--
Thanks,
~Nick Desaulniers

2020-10-15 14:48:25

by Arvind Sankar

[permalink] [raw]
Subject: Re: [PATCH] compiler.h: Fix barrier_data() on clang

On Thu, Oct 15, 2020 at 08:50:05AM +0000, David Laight wrote:
> From: Arvind Sankar
> > Sent: 14 October 2020 22:27
> ...
> > +/*
> > + * This version is i.e. to prevent dead stores elimination on @ptr
> > + * where gcc and llvm may behave differently when otherwise using
> > + * normal barrier(): while gcc behavior gets along with a normal
> > + * barrier(), llvm needs an explicit input variable to be assumed
> > + * clobbered. The issue is as follows: while the inline asm might
> > + * access any memory it wants, the compiler could have fit all of
> > + * @ptr into memory registers instead, and since @ptr never escaped
> > + * from that, it proved that the inline asm wasn't touching any of
> > + * it. This version works well with both compilers, i.e. we're telling
> > + * the compiler that the inline asm absolutely may see the contents
> > + * of @ptr. See also: https://llvm.org/bugs/show_bug.cgi?id=15495
> > + */
> > +# define barrier_data(ptr) __asm__ __volatile__("": :"r"(ptr) :"memory")
>
> That comment doesn't actually match the asm statement.
> Although the asm statement probably has the desired effect.
>
> The "r"(ptr) constraint only passes the address of the buffer
> into the asm - it doesn't say anything at all about the associated
> memory.
>
> What the "r"(ptr) actually does is to force the address of the
> associated data to be taken.
> This means that on-stack space must actually be allocated.
> The "memory" clobber will then force the registers caching
> the variable be written out to stack.
>

I think the comment is unclear now that you bring it up, but the problem
it actually addresses is not that the data is held in registers: in the
sha256_transform() case mentioned in the commit message, for example,
the data is in fact in memory even before this change (it's a 256-byte
array), and that together with the memory clobber is enough for gcc to
assume that the asm might use it. But with clang, if the address of that
data has never escaped -- in this case the data is a local variable
whose address was never passed out of the function -- the compiler
assumes that the asm cannot possibly depend on that memory, because it
has no way of getting its address.

Passing ptr to the inline asm tells clang that the asm knows the
address, and since it also has a memory clobber, that it may use the
data. This is somewhat suboptimal, since if the data was some small
structure that the compiler was just holding in registers originally,
forcing it out to memory is a bad thing to do.

> If you only want to force stores on a single data structure
> you actually want:
> #define barrier_data(ptr) asm volatile("" :: "m"(*ptr))
> although it would be best then to add an explicit size
> and associated cast.
>

i.e. something like:
static inline void barrier_data(void *ptr, size_t size)
{
asm volatile("" : "+m"(*(char (*)[size])ptr));
}
plus some magic to disable the VLA warning, otherwise it causes a build
error.

But I think that might lead to even more subtle issues by dropping the
memory clobber. For example (and this is actually done in
sha256_transform() as well, though the zero'ing simply doesn't work with
any compiler, as it's missing the barrier_data()'s):

unsigned long x, y;
... do something secret with x/y ...
x = y = 0;
barrier_data(&x, sizeof(x));
barrier_data(&y, sizeof(y));
return;

Since x is not used after its barrier_data(), I think the compiler would
be within its rights to turn that into:

xorl %eax, %eax
leaq -16(%rbp), %rdx // &x == -16(%rbp)
movq %eax, (%rdx) // x = 0;
// inline asm for barrier_data(&x, sizeof(x));
movq %eax, (%rdx) // y = 0; (!)
// inline asm for barrier_data(&y, sizeof(y));

which saves one instruction by putting y at the same location as x, once
x is dead.

With a memory clobber, the compiler has to keep x and y at different
addresses, since the first barrier_data() might have saved the address
of x.

2020-10-15 17:58:03

by David Laight

[permalink] [raw]
Subject: RE: [PATCH] compiler.h: Fix barrier_data() on clang

From: Arvind Sankar
> Sent: 14 October 2020 22:27
...
> +/*
> + * This version is i.e. to prevent dead stores elimination on @ptr
> + * where gcc and llvm may behave differently when otherwise using
> + * normal barrier(): while gcc behavior gets along with a normal
> + * barrier(), llvm needs an explicit input variable to be assumed
> + * clobbered. The issue is as follows: while the inline asm might
> + * access any memory it wants, the compiler could have fit all of
> + * @ptr into memory registers instead, and since @ptr never escaped
> + * from that, it proved that the inline asm wasn't touching any of
> + * it. This version works well with both compilers, i.e. we're telling
> + * the compiler that the inline asm absolutely may see the contents
> + * of @ptr. See also: https://llvm.org/bugs/show_bug.cgi?id=15495
> + */
> +# define barrier_data(ptr) __asm__ __volatile__("": :"r"(ptr) :"memory")

That comment doesn't actually match the asm statement.
Although the asm statement probably has the desired effect.

The "r"(ptr) constraint only passes the address of the buffer
into the asm - it doesn't say anything at all about the associated
memory.

What the "r"(ptr) actually does is to force the address of the
associated data to be taken.
This means that on-stack space must actually be allocated.
The "memory" clobber will then force the registers caching
the variable be written out to stack.

If you only want to force stores on a single data structure
you actually want:
#define barrier_data(ptr) asm volatile("" :: "m"(*ptr))
although it would be best then to add an explicit size
and associated cast.

David

-
Registered Address Lakeside, Bramley Road, Mount Farm, Milton Keynes, MK1 1PT, UK
Registration No: 1397386 (Wales)

2020-10-15 18:29:23

by Arvind Sankar

[permalink] [raw]
Subject: Re: [PATCH] compiler.h: Fix barrier_data() on clang

On Thu, Oct 15, 2020 at 03:24:09PM +0000, David Laight wrote:
> > I think the comment is unclear now that you bring it up, but the problem
> > it actually addresses is not that the data is held in registers: in the
> > sha256_transform() case mentioned in the commit message, for example,
> > the data is in fact in memory even before this change (it's a 256-byte
> > array), and that together with the memory clobber is enough for gcc to
> > assume that the asm might use it. But with clang, if the address of that
> > data has never escaped -- in this case the data is a local variable
> > whose address was never passed out of the function -- the compiler
> > assumes that the asm cannot possibly depend on that memory, because it
> > has no way of getting its address.
>
> Ok, slightly different from what i thought.
> But the current comment is just wrong.

Should I fix up the comment in the same commit, or do a second one after
moving the macro?

> > i.e. something like:
> > static inline void barrier_data(void *ptr, size_t size)
> > {
> > asm volatile("" : "+m"(*(char (*)[size])ptr));
>
> I think it has to be a struct with an array member?

I don't think so, this is actually an example in gcc's documentation:

An x86 example where the string memory argument is of unknown length.

asm("repne scasb"
: "=c" (count), "+D" (p)
: "m" (*(const char (*)[]) p), "0" (-1), "a" (0));

If you know the above will only be reading a ten byte array then you
could instead use a memory input like: "m" (*(const char (*)[10]) p).

>
> > }
> > plus some magic to disable the VLA warning, otherwise it causes a build
> > error.
>
> It shouldn't if the size is a compile time constant.
> And given this is an instruction to the compiler it better be.

Ah right. I saw the warning when playing with something else where size
could be constant or variable depending on the call.

> >
> > With a memory clobber, the compiler has to keep x and y at different
> > addresses, since the first barrier_data() might have saved the address
> > of x.
>
> Maybe "+m"(*ptr) : "r"(ptr) would work.

Nothing that can only modify what ptr points to could avoid this, since
that storage is dead after the barrier.

> OTOH a "memory" clobber at the bottom of a function isn't going to
> cause bloat.
>
> The explicit ranged memory access (without "memory") probably has its
> uses - but only if the full "memory" clobber causes grief.
>
> David
>
> -
> Registered Address Lakeside, Bramley Road, Mount Farm, Milton Keynes, MK1 1PT, UK
> Registration No: 1397386 (Wales)

2020-10-15 18:29:39

by David Laight

[permalink] [raw]
Subject: RE: [PATCH] compiler.h: Fix barrier_data() on clang

From: Arvind Sankar
> Sent: 15 October 2020 15:45
>
> On Thu, Oct 15, 2020 at 08:50:05AM +0000, David Laight wrote:
> > From: Arvind Sankar
> > > Sent: 14 October 2020 22:27
> > ...
> > > +/*
> > > + * This version is i.e. to prevent dead stores elimination on @ptr
> > > + * where gcc and llvm may behave differently when otherwise using
> > > + * normal barrier(): while gcc behavior gets along with a normal
> > > + * barrier(), llvm needs an explicit input variable to be assumed
> > > + * clobbered. The issue is as follows: while the inline asm might
> > > + * access any memory it wants, the compiler could have fit all of
> > > + * @ptr into memory registers instead, and since @ptr never escaped
> > > + * from that, it proved that the inline asm wasn't touching any of
> > > + * it. This version works well with both compilers, i.e. we're telling
> > > + * the compiler that the inline asm absolutely may see the contents
> > > + * of @ptr. See also: https://llvm.org/bugs/show_bug.cgi?id=15495
> > > + */
> > > +# define barrier_data(ptr) __asm__ __volatile__("": :"r"(ptr) :"memory")
> >
> > That comment doesn't actually match the asm statement.
> > Although the asm statement probably has the desired effect.
> >
> > The "r"(ptr) constraint only passes the address of the buffer
> > into the asm - it doesn't say anything at all about the associated
> > memory.
> >
> > What the "r"(ptr) actually does is to force the address of the
> > associated data to be taken.
> > This means that on-stack space must actually be allocated.
> > The "memory" clobber will then force the registers caching
> > the variable be written out to stack.
> >
>
> I think the comment is unclear now that you bring it up, but the problem
> it actually addresses is not that the data is held in registers: in the
> sha256_transform() case mentioned in the commit message, for example,
> the data is in fact in memory even before this change (it's a 256-byte
> array), and that together with the memory clobber is enough for gcc to
> assume that the asm might use it. But with clang, if the address of that
> data has never escaped -- in this case the data is a local variable
> whose address was never passed out of the function -- the compiler
> assumes that the asm cannot possibly depend on that memory, because it
> has no way of getting its address.

Ok, slightly different from what i thought.
But the current comment is just wrong.

> Passing ptr to the inline asm tells clang that the asm knows the
> address, and since it also has a memory clobber, that it may use the
> data. This is somewhat suboptimal, since if the data was some small
> structure that the compiler was just holding in registers originally,
> forcing it out to memory is a bad thing to do.
>
> > If you only want to force stores on a single data structure
> > you actually want:
> > #define barrier_data(ptr) asm volatile("" :: "m"(*ptr))
> > although it would be best then to add an explicit size
> > and associated cast.
> >
>
> i.e. something like:
> static inline void barrier_data(void *ptr, size_t size)
> {
> asm volatile("" : "+m"(*(char (*)[size])ptr));

I think it has to be a struct with an array member?

> }
> plus some magic to disable the VLA warning, otherwise it causes a build
> error.

It shouldn't if the size is a compile time constant.
And given this is an instruction to the compiler it better be.

> But I think that might lead to even more subtle issues by dropping the
> memory clobber. For example (and this is actually done in
> sha256_transform() as well, though the zero'ing simply doesn't work with
> any compiler, as it's missing the barrier_data()'s):
>
> unsigned long x, y;
> ... do something secret with x/y ...
> x = y = 0;
> barrier_data(&x, sizeof(x));
> barrier_data(&y, sizeof(y));
> return;
>
> Since x is not used after its barrier_data(), I think the compiler would
> be within its rights to turn that into:
>
> xorl %eax, %eax
> leaq -16(%rbp), %rdx // &x == -16(%rbp)
> movq %eax, (%rdx) // x = 0;
> // inline asm for barrier_data(&x, sizeof(x));
> movq %eax, (%rdx) // y = 0; (!)
> // inline asm for barrier_data(&y, sizeof(y));
>
> which saves one instruction by putting y at the same location as x, once
> x is dead.
>
> With a memory clobber, the compiler has to keep x and y at different
> addresses, since the first barrier_data() might have saved the address
> of x.

Maybe "+m"(*ptr) : "r"(ptr) would work.
OTOH a "memory" clobber at the bottom of a function isn't going to
cause bloat.

The explicit ranged memory access (without "memory") probably has its
uses - but only if the full "memory" clobber causes grief.

David

-
Registered Address Lakeside, Bramley Road, Mount Farm, Milton Keynes, MK1 1PT, UK
Registration No: 1397386 (Wales)

2020-10-15 21:27:21

by Nick Desaulniers

[permalink] [raw]
Subject: Re: [PATCH] compiler.h: Fix barrier_data() on clang

On Thu, Oct 15, 2020 at 8:39 AM Arvind Sankar <[email protected]> wrote:
>
> On Thu, Oct 15, 2020 at 03:24:09PM +0000, David Laight wrote:
> > > I think the comment is unclear now that you bring it up, but the problem
> > > it actually addresses is not that the data is held in registers: in the
> > > sha256_transform() case mentioned in the commit message, for example,
> > > the data is in fact in memory even before this change (it's a 256-byte
> > > array), and that together with the memory clobber is enough for gcc to
> > > assume that the asm might use it. But with clang, if the address of that
> > > data has never escaped -- in this case the data is a local variable
> > > whose address was never passed out of the function -- the compiler
> > > assumes that the asm cannot possibly depend on that memory, because it
> > > has no way of getting its address.
> >
> > Ok, slightly different from what i thought.
> > But the current comment is just wrong.
>
> Should I fix up the comment in the same commit, or do a second one after
> moving the macro?

I would prefer undoing the mistake from 815f0ddb346c and getting that
backported to stable, then rewriting comments or the trick to retain
the memset separately.

>
> > > i.e. something like:
> > > static inline void barrier_data(void *ptr, size_t size)
> > > {
> > > asm volatile("" : "+m"(*(char (*)[size])ptr));
> >
> > I think it has to be a struct with an array member?
>
> I don't think so, this is actually an example in gcc's documentation:
>
> An x86 example where the string memory argument is of unknown length.
>
> asm("repne scasb"
> : "=c" (count), "+D" (p)
> : "m" (*(const char (*)[]) p), "0" (-1), "a" (0));
>
> If you know the above will only be reading a ten byte array then you
> could instead use a memory input like: "m" (*(const char (*)[10]) p).
>
> >
> > > }
> > > plus some magic to disable the VLA warning, otherwise it causes a build
> > > error.
> >
> > It shouldn't if the size is a compile time constant.
> > And given this is an instruction to the compiler it better be.
>
> Ah right. I saw the warning when playing with something else where size
> could be constant or variable depending on the call.
>
> > >
> > > With a memory clobber, the compiler has to keep x and y at different
> > > addresses, since the first barrier_data() might have saved the address
> > > of x.
> >
> > Maybe "+m"(*ptr) : "r"(ptr) would work.
>
> Nothing that can only modify what ptr points to could avoid this, since
> that storage is dead after the barrier.
>
> > OTOH a "memory" clobber at the bottom of a function isn't going to
> > cause bloat.
> >
> > The explicit ranged memory access (without "memory") probably has its
> > uses - but only if the full "memory" clobber causes grief.
> >
> > David
> >
> > -
> > Registered Address Lakeside, Bramley Road, Mount Farm, Milton Keynes, MK1 1PT, UK
> > Registration No: 1397386 (Wales)



--
Thanks,
~Nick Desaulniers

2020-10-15 23:09:52

by Arvind Sankar

[permalink] [raw]
Subject: [PATCH] compiler.h: Clarify comment about the need for barrier_data()

Be clear about @ptr vs the variable that @ptr points to, and add some
more details as to why the special barrier_data() macro is required.

Signed-off-by: Arvind Sankar <[email protected]>
---
include/linux/compiler.h | 33 ++++++++++++++++++++++-----------
1 file changed, 22 insertions(+), 11 deletions(-)

diff --git a/include/linux/compiler.h b/include/linux/compiler.h
index 93035d7fee0d..d8cee7c8968d 100644
--- a/include/linux/compiler.h
+++ b/include/linux/compiler.h
@@ -86,17 +86,28 @@ void ftrace_likely_update(struct ftrace_likely_data *f, int val,

#ifndef barrier_data
/*
- * This version is i.e. to prevent dead stores elimination on @ptr
- * where gcc and llvm may behave differently when otherwise using
- * normal barrier(): while gcc behavior gets along with a normal
- * barrier(), llvm needs an explicit input variable to be assumed
- * clobbered. The issue is as follows: while the inline asm might
- * access any memory it wants, the compiler could have fit all of
- * @ptr into memory registers instead, and since @ptr never escaped
- * from that, it proved that the inline asm wasn't touching any of
- * it. This version works well with both compilers, i.e. we're telling
- * the compiler that the inline asm absolutely may see the contents
- * of @ptr. See also: https://llvm.org/bugs/show_bug.cgi?id=15495
+ * This version is to prevent dead stores elimination on @ptr where gcc and
+ * llvm may behave differently when otherwise using normal barrier(): while gcc
+ * behavior gets along with a normal barrier(), llvm needs an explicit input
+ * variable to be assumed clobbered.
+ *
+ * Its primary use is in implementing memzero_explicit(), which is used for
+ * clearing temporary data that may contain secrets.
+ *
+ * The issue is as follows: while the inline asm might access any memory it
+ * wants, the compiler could have fit all of the variable that @ptr points to
+ * into registers instead, and if @ptr never escaped from the function, it
+ * proved that the inline asm wasn't touching any of it. gcc only eliminates
+ * dead stores if the variable was actually allocated in registers, but llvm
+ * reasons that the variable _could_ have been in registers, so the inline asm
+ * can't reliably access it anyway, and eliminates dead stores even if the
+ * variable is actually in memory.
+ *
+ * This version works well with both compilers, i.e. we're telling the compiler
+ * that the inline asm absolutely may see the contents of the variable pointed
+ * to by @ptr.
+ *
+ * See also: https://llvm.org/bugs/show_bug.cgi?id=15495#c5
*/
# define barrier_data(ptr) __asm__ __volatile__("": :"r"(ptr) :"memory")
#endif
--
2.26.2

2020-10-15 23:12:10

by Nick Desaulniers

[permalink] [raw]
Subject: Re: [PATCH] compiler.h: Clarify comment about the need for barrier_data()

On Thu, Oct 15, 2020 at 11:13 AM Arvind Sankar <[email protected]> wrote:
>
> Be clear about @ptr vs the variable that @ptr points to, and add some
> more details as to why the special barrier_data() macro is required.
>
> Signed-off-by: Arvind Sankar <[email protected]>

Thanks for this distinct cleanup.
Acked-by: Nick Desaulniers <[email protected]>

> ---
> include/linux/compiler.h | 33 ++++++++++++++++++++++-----------
> 1 file changed, 22 insertions(+), 11 deletions(-)
>
> diff --git a/include/linux/compiler.h b/include/linux/compiler.h
> index 93035d7fee0d..d8cee7c8968d 100644
> --- a/include/linux/compiler.h
> +++ b/include/linux/compiler.h
> @@ -86,17 +86,28 @@ void ftrace_likely_update(struct ftrace_likely_data *f, int val,
>
> #ifndef barrier_data
> /*
> - * This version is i.e. to prevent dead stores elimination on @ptr
> - * where gcc and llvm may behave differently when otherwise using
> - * normal barrier(): while gcc behavior gets along with a normal
> - * barrier(), llvm needs an explicit input variable to be assumed
> - * clobbered. The issue is as follows: while the inline asm might
> - * access any memory it wants, the compiler could have fit all of
> - * @ptr into memory registers instead, and since @ptr never escaped
> - * from that, it proved that the inline asm wasn't touching any of
> - * it. This version works well with both compilers, i.e. we're telling
> - * the compiler that the inline asm absolutely may see the contents
> - * of @ptr. See also: https://llvm.org/bugs/show_bug.cgi?id=15495
> + * This version is to prevent dead stores elimination on @ptr where gcc and
> + * llvm may behave differently when otherwise using normal barrier(): while gcc
> + * behavior gets along with a normal barrier(), llvm needs an explicit input
> + * variable to be assumed clobbered.
> + *
> + * Its primary use is in implementing memzero_explicit(), which is used for
> + * clearing temporary data that may contain secrets.
> + *
> + * The issue is as follows: while the inline asm might access any memory it
> + * wants, the compiler could have fit all of the variable that @ptr points to
> + * into registers instead, and if @ptr never escaped from the function, it
> + * proved that the inline asm wasn't touching any of it. gcc only eliminates
> + * dead stores if the variable was actually allocated in registers, but llvm
> + * reasons that the variable _could_ have been in registers, so the inline asm
> + * can't reliably access it anyway, and eliminates dead stores even if the
> + * variable is actually in memory.
> + *
> + * This version works well with both compilers, i.e. we're telling the compiler
> + * that the inline asm absolutely may see the contents of the variable pointed
> + * to by @ptr.
> + *
> + * See also: https://llvm.org/bugs/show_bug.cgi?id=15495#c5
> */
> # define barrier_data(ptr) __asm__ __volatile__("": :"r"(ptr) :"memory")
> #endif
> --
> 2.26.2
>


--
Thanks,
~Nick Desaulniers

2020-10-22 06:47:16

by Kees Cook

[permalink] [raw]
Subject: Re: [PATCH] compiler.h: Fix barrier_data() on clang

On Wed, Oct 14, 2020 at 05:26:31PM -0400, Arvind Sankar wrote:
> Commit
> 815f0ddb346c ("include/linux/compiler*.h: make compiler-*.h mutually exclusive")
>
> neglected to copy barrier_data() from compiler-gcc.h into
> compiler-clang.h. The definition in compiler-gcc.h was really to work
> around clang's more aggressive optimization, so this broke
> barrier_data() on clang, and consequently memzero_explicit() as well.
>
> For example, this results in at least the memzero_explicit() call in
> lib/crypto/sha256.c:sha256_transform() being optimized away by clang.
>
> Fix this by moving the definition of barrier_data() into compiler.h.
>
> Also move the gcc/clang definition of barrier() into compiler.h,
> __memory_barrier() is icc-specific (and barrier() is already defined
> using it in compiler-intel.h) and doesn't belong in compiler.h.
>
> Signed-off-by: Arvind Sankar <[email protected]>
> Fixes: 815f0ddb346c ("include/linux/compiler*.h: make compiler-*.h mutually exclusive")

Yeowch.

Cc: [email protected]
Reviewed-by: Kees Cook <[email protected]>

Nick just mentioned this to me; I hadn't had a chance to read it yet. This
needs to go to Linus ASAP; memzero_explicit() under Clang in v4.19 and
later isn't so explicit. :(

Andrew, Linus, can one of you pick this up please?

As Nick mentioned, sorting out the specifics of the comments[1] can
come later.

[1] https://lore.kernel.org/lkml/CAKwvOdkLvxeYeBh7Kx0gw7JPktPH8A4DomJTidUqA0jRQTR0FA@mail.gmail.com/

> ---
> include/linux/compiler-clang.h | 6 ------
> include/linux/compiler-gcc.h | 19 -------------------
> include/linux/compiler.h | 18 ++++++++++++++++--
> 3 files changed, 16 insertions(+), 27 deletions(-)
>
> diff --git a/include/linux/compiler-clang.h b/include/linux/compiler-clang.h
> index cee0c728d39a..04c0a5a717f7 100644
> --- a/include/linux/compiler-clang.h
> +++ b/include/linux/compiler-clang.h
> @@ -52,12 +52,6 @@
> #define COMPILER_HAS_GENERIC_BUILTIN_OVERFLOW 1
> #endif
>
> -/* The following are for compatibility with GCC, from compiler-gcc.h,
> - * and may be redefined here because they should not be shared with other
> - * compilers, like ICC.
> - */
> -#define barrier() __asm__ __volatile__("" : : : "memory")
> -
> #if __has_feature(shadow_call_stack)
> # define __noscs __attribute__((__no_sanitize__("shadow-call-stack")))
> #endif
> diff --git a/include/linux/compiler-gcc.h b/include/linux/compiler-gcc.h
> index 7a3769040d7d..fda30ffb037b 100644
> --- a/include/linux/compiler-gcc.h
> +++ b/include/linux/compiler-gcc.h
> @@ -15,25 +15,6 @@
> # error Sorry, your compiler is too old - please upgrade it.
> #endif
>
> -/* Optimization barrier */
> -
> -/* The "volatile" is due to gcc bugs */
> -#define barrier() __asm__ __volatile__("": : :"memory")
> -/*
> - * This version is i.e. to prevent dead stores elimination on @ptr
> - * where gcc and llvm may behave differently when otherwise using
> - * normal barrier(): while gcc behavior gets along with a normal
> - * barrier(), llvm needs an explicit input variable to be assumed
> - * clobbered. The issue is as follows: while the inline asm might
> - * access any memory it wants, the compiler could have fit all of
> - * @ptr into memory registers instead, and since @ptr never escaped
> - * from that, it proved that the inline asm wasn't touching any of
> - * it. This version works well with both compilers, i.e. we're telling
> - * the compiler that the inline asm absolutely may see the contents
> - * of @ptr. See also: https://llvm.org/bugs/show_bug.cgi?id=15495
> - */
> -#define barrier_data(ptr) __asm__ __volatile__("": :"r"(ptr) :"memory")
> -
> /*
> * This macro obfuscates arithmetic on a variable address so that gcc
> * shouldn't recognize the original var, and make assumptions about it.
> diff --git a/include/linux/compiler.h b/include/linux/compiler.h
> index 92ef163a7479..dfba70b2644f 100644
> --- a/include/linux/compiler.h
> +++ b/include/linux/compiler.h
> @@ -80,11 +80,25 @@ void ftrace_likely_update(struct ftrace_likely_data *f, int val,
>
> /* Optimization barrier */
> #ifndef barrier
> -# define barrier() __memory_barrier()
> +/* The "volatile" is due to gcc bugs */
> +# define barrier() __asm__ __volatile__("": : :"memory")
> #endif
>
> #ifndef barrier_data
> -# define barrier_data(ptr) barrier()
> +/*
> + * This version is i.e. to prevent dead stores elimination on @ptr
> + * where gcc and llvm may behave differently when otherwise using
> + * normal barrier(): while gcc behavior gets along with a normal
> + * barrier(), llvm needs an explicit input variable to be assumed
> + * clobbered. The issue is as follows: while the inline asm might
> + * access any memory it wants, the compiler could have fit all of
> + * @ptr into memory registers instead, and since @ptr never escaped
> + * from that, it proved that the inline asm wasn't touching any of
> + * it. This version works well with both compilers, i.e. we're telling
> + * the compiler that the inline asm absolutely may see the contents
> + * of @ptr. See also: https://llvm.org/bugs/show_bug.cgi?id=15495
> + */
> +# define barrier_data(ptr) __asm__ __volatile__("": :"r"(ptr) :"memory")
> #endif
>
> /* workaround for GCC PR82365 if needed */
> --
> 2.26.2
>

--
Kees Cook

2020-11-16 17:49:59

by Andreas Schwab

[permalink] [raw]
Subject: Re: [PATCH] compiler.h: Fix barrier_data() on clang

On Okt 14 2020, Arvind Sankar wrote:

> Commit
> 815f0ddb346c ("include/linux/compiler*.h: make compiler-*.h mutually exclusive")
>
> neglected to copy barrier_data() from compiler-gcc.h into
> compiler-clang.h. The definition in compiler-gcc.h was really to work
> around clang's more aggressive optimization, so this broke
> barrier_data() on clang, and consequently memzero_explicit() as well.
>
> For example, this results in at least the memzero_explicit() call in
> lib/crypto/sha256.c:sha256_transform() being optimized away by clang.
>
> Fix this by moving the definition of barrier_data() into compiler.h.
>
> Also move the gcc/clang definition of barrier() into compiler.h,
> __memory_barrier() is icc-specific (and barrier() is already defined
> using it in compiler-intel.h) and doesn't belong in compiler.h.
>
> Signed-off-by: Arvind Sankar <[email protected]>
> Fixes: 815f0ddb346c ("include/linux/compiler*.h: make compiler-*.h mutually exclusive")

This breaks build on riscv:

CC [M] drivers/net/ethernet/emulex/benet/be_main.o
In file included from ./include/vdso/processor.h:10,
from ./arch/riscv/include/asm/processor.h:11,
from ./include/linux/prefetch.h:15,
from drivers/net/ethernet/emulex/benet/be_main.c:14:
./arch/riscv/include/asm/vdso/processor.h: In function 'cpu_relax':
./arch/riscv/include/asm/vdso/processor.h:14:2: error: implicit declaration of function 'barrier' [-Werror=implicit-function-declaration]
14 | barrier();
| ^~~~~~~
cc1: some warnings being treated as errors
make[5]: *** [scripts/Makefile.build:283: drivers/net/ethernet/emulex/benet/be_main.o] Error 1
make[4]: *** [scripts/Makefile.build:500: drivers/net/ethernet/emulex/benet] Error 2
make[3]: *** [scripts/Makefile.build:500: drivers/net/ethernet/emulex] Error 2
make[2]: *** [scripts/Makefile.build:500: drivers/net/ethernet] Error 2
make[1]: *** [scripts/Makefile.build:500: drivers/net] Error 2
make: *** [Makefile:1799: drivers] Error 2

Andreas.

--
Andreas Schwab, [email protected]
GPG Key fingerprint = 7578 EB47 D4E5 4D69 2510 2552 DF73 E780 A9DA AEC1
"And now for something completely different."

2020-11-16 17:59:01

by Randy Dunlap

[permalink] [raw]
Subject: Re: [PATCH] compiler.h: Fix barrier_data() on clang

On 11/16/20 9:47 AM, Andreas Schwab wrote:
> On Okt 14 2020, Arvind Sankar wrote:
>
>> Commit
>> 815f0ddb346c ("include/linux/compiler*.h: make compiler-*.h mutually exclusive")
>>
>> neglected to copy barrier_data() from compiler-gcc.h into
>> compiler-clang.h. The definition in compiler-gcc.h was really to work
>> around clang's more aggressive optimization, so this broke
>> barrier_data() on clang, and consequently memzero_explicit() as well.
>>
>> For example, this results in at least the memzero_explicit() call in
>> lib/crypto/sha256.c:sha256_transform() being optimized away by clang.
>>
>> Fix this by moving the definition of barrier_data() into compiler.h.
>>
>> Also move the gcc/clang definition of barrier() into compiler.h,
>> __memory_barrier() is icc-specific (and barrier() is already defined
>> using it in compiler-intel.h) and doesn't belong in compiler.h.
>>
>> Signed-off-by: Arvind Sankar <[email protected]>
>> Fixes: 815f0ddb346c ("include/linux/compiler*.h: make compiler-*.h mutually exclusive")
>
> This breaks build on riscv:
>
> CC [M] drivers/net/ethernet/emulex/benet/be_main.o
> In file included from ./include/vdso/processor.h:10,
> from ./arch/riscv/include/asm/processor.h:11,
> from ./include/linux/prefetch.h:15,
> from drivers/net/ethernet/emulex/benet/be_main.c:14:
> ./arch/riscv/include/asm/vdso/processor.h: In function 'cpu_relax':
> ./arch/riscv/include/asm/vdso/processor.h:14:2: error: implicit declaration of function 'barrier' [-Werror=implicit-function-declaration]
> 14 | barrier();
> | ^~~~~~~
> cc1: some warnings being treated as errors
> make[5]: *** [scripts/Makefile.build:283: drivers/net/ethernet/emulex/benet/be_main.o] Error 1
> make[4]: *** [scripts/Makefile.build:500: drivers/net/ethernet/emulex/benet] Error 2
> make[3]: *** [scripts/Makefile.build:500: drivers/net/ethernet/emulex] Error 2
> make[2]: *** [scripts/Makefile.build:500: drivers/net/ethernet] Error 2
> make[1]: *** [scripts/Makefile.build:500: drivers/net] Error 2
> make: *** [Makefile:1799: drivers] Error 2

Hi,
What kernel version are you building?
This should be fixed in 5.10-rc4 by
commit 3347acc6fcd4ee71ad18a9ff9d9dac176b517329;
specifically this portion of it:

diff --git a/include/asm-generic/barrier.h b/include/asm-generic/barrier.h
index 798027bb89be..640f09479bdf 100644
--- a/include/asm-generic/barrier.h
+++ b/include/asm-generic/barrier.h
@@ -13,6 +13,7 @@

#ifndef __ASSEMBLY__

+#include <linux/compiler.h>
#include <asm/rwonce.h>

#ifndef nop



--
~Randy

2020-11-16 18:34:07

by Andreas Schwab

[permalink] [raw]
Subject: Re: [PATCH] compiler.h: Fix barrier_data() on clang

On Nov 16 2020, Randy Dunlap wrote:

> What kernel version are you building?

5.10-rc4

Andreas.

--
Andreas Schwab, [email protected]
GPG Key fingerprint = 7578 EB47 D4E5 4D69 2510 2552 DF73 E780 A9DA AEC1
"And now for something completely different."

2020-11-16 21:32:57

by Nick Desaulniers

[permalink] [raw]
Subject: Re: [PATCH] compiler.h: Fix barrier_data() on clang

On Mon, Nov 16, 2020 at 9:47 AM Andreas Schwab <[email protected]> wrote:
>
> On Okt 14 2020, Arvind Sankar wrote:
>
> > Commit
> > 815f0ddb346c ("include/linux/compiler*.h: make compiler-*.h mutually exclusive")
> >
> > neglected to copy barrier_data() from compiler-gcc.h into
> > compiler-clang.h. The definition in compiler-gcc.h was really to work
> > around clang's more aggressive optimization, so this broke
> > barrier_data() on clang, and consequently memzero_explicit() as well.
> >
> > For example, this results in at least the memzero_explicit() call in
> > lib/crypto/sha256.c:sha256_transform() being optimized away by clang.
> >
> > Fix this by moving the definition of barrier_data() into compiler.h.
> >
> > Also move the gcc/clang definition of barrier() into compiler.h,
> > __memory_barrier() is icc-specific (and barrier() is already defined
> > using it in compiler-intel.h) and doesn't belong in compiler.h.
> >
> > Signed-off-by: Arvind Sankar <[email protected]>
> > Fixes: 815f0ddb346c ("include/linux/compiler*.h: make compiler-*.h mutually exclusive")
>
> This breaks build on riscv:
>
> CC [M] drivers/net/ethernet/emulex/benet/be_main.o
> In file included from ./include/vdso/processor.h:10,
> from ./arch/riscv/include/asm/processor.h:11,
> from ./include/linux/prefetch.h:15,
> from drivers/net/ethernet/emulex/benet/be_main.c:14:
> ./arch/riscv/include/asm/vdso/processor.h: In function 'cpu_relax':
> ./arch/riscv/include/asm/vdso/processor.h:14:2: error: implicit declaration of function 'barrier' [-Werror=implicit-function-declaration]
> 14 | barrier();
> | ^~~~~~~
> cc1: some warnings being treated as errors
> make[5]: *** [scripts/Makefile.build:283: drivers/net/ethernet/emulex/benet/be_main.o] Error 1
> make[4]: *** [scripts/Makefile.build:500: drivers/net/ethernet/emulex/benet] Error 2
> make[3]: *** [scripts/Makefile.build:500: drivers/net/ethernet/emulex] Error 2
> make[2]: *** [scripts/Makefile.build:500: drivers/net/ethernet] Error 2
> make[1]: *** [scripts/Makefile.build:500: drivers/net] Error 2
> make: *** [Makefile:1799: drivers] Error 2
>
> Andreas.

A lot of VDSO's reset KBUILD_CFLAGS or use a new variable for their
compiler flags. As such, they're missing `-include` command line flag
that injects include/linux/compiler_types.h, which `#includes`
numerous other headers if `__KERNEL__` is defined (`-D__KERNEL__`).
So the RISCV VDSO Makefile might need to `-include
$(srctree)/include/linux/compiler_types.h -D__KERNEL__`, or `#include
<linux/compiler.h>"` directly in
arch/riscv/include/asm/vdso/processor.h.
--
Thanks,
~Nick Desaulniers

2020-11-16 22:43:14

by Andreas Schwab

[permalink] [raw]
Subject: Re: [PATCH] compiler.h: Fix barrier_data() on clang

On Nov 16 2020, Nick Desaulniers wrote:

> A lot of VDSO's reset KBUILD_CFLAGS or use a new variable for their
> compiler flags. As such, they're missing `-include` command line flag
> that injects include/linux/compiler_types.h,

It's not missing here.

Andreas.

--
Andreas Schwab, [email protected]
GPG Key fingerprint = 7578 EB47 D4E5 4D69 2510 2552 DF73 E780 A9DA AEC1
"And now for something completely different."

2020-11-16 23:00:36

by Randy Dunlap

[permalink] [raw]
Subject: Re: [PATCH] compiler.h: Fix barrier_data() on clang

On 11/16/20 11:28 AM, Randy Dunlap wrote:
> On 11/16/20 10:30 AM, Andreas Schwab wrote:
>> On Nov 16 2020, Randy Dunlap wrote:
>>
>>> What kernel version are you building?
>>
>> 5.10-rc4
>>
>> Andreas.
>
> OK, thanks.
>
> My build machine is slow, but I have a patch that I am testing:
>
> ---
> From: Randy Dunlap <[email protected]>
>
> riscv's <vdso/processor.h> uses barrier() so it should
> #include <asm/barrier.h> to prevent build errors.
>
> Reported-by: Andreas Schwab <[email protected]>
> ---
>  arch/riscv/include/asm/vdso/processor.h |    2 ++
>  1 file changed, 2 insertions(+)
>
> --- lnx-510-rc4.orig/arch/riscv/include/asm/vdso/processor.h
> +++ lnx-510-rc4/arch/riscv/include/asm/vdso/processor.h
> @@ -4,6 +4,8 @@
>
>  #ifndef __ASSEMBLY__
>
> +#include <asm/barrier.h>
> +
>  static inline void cpu_relax(void)
>  {
>  #ifdef __riscv_muldiv


This fixes the emulex/benet/ driver build.
I'm still building allmodconfig to see if there are any
other issues.

--
~Randy

2020-11-17 02:02:05

by Randy Dunlap

[permalink] [raw]
Subject: Re: [PATCH] compiler.h: Fix barrier_data() on clang

On 11/16/20 10:30 AM, Andreas Schwab wrote:
> On Nov 16 2020, Randy Dunlap wrote:
>
>> What kernel version are you building?
>
> 5.10-rc4
>
> Andreas.

OK, thanks.

My build machine is slow, but I have a patch that I am testing:

---
From: Randy Dunlap <[email protected]>

riscv's <vdso/processor.h> uses barrier() so it should
#include <asm/barrier.h> to prevent build errors.

Reported-by: Andreas Schwab <[email protected]>
---
arch/riscv/include/asm/vdso/processor.h | 2 ++
1 file changed, 2 insertions(+)

--- lnx-510-rc4.orig/arch/riscv/include/asm/vdso/processor.h
+++ lnx-510-rc4/arch/riscv/include/asm/vdso/processor.h
@@ -4,6 +4,8 @@

#ifndef __ASSEMBLY__

+#include <asm/barrier.h>
+
static inline void cpu_relax(void)
{
#ifdef __riscv_muldiv