2020-10-15 22:23:30

by David Laight

Subject: RE: [PATCH] compiler.h: Clarify comment about the need for barrier_data()

From: Arvind Sankar
> Sent: 15 October 2020 19:14
>
> Be clear about @ptr vs the variable that @ptr points to, and add some
> more details as to why the special barrier_data() macro is required.
>
> Signed-off-by: Arvind Sankar <[email protected]>
> ---
> include/linux/compiler.h | 33 ++++++++++++++++++++++-----------
> 1 file changed, 22 insertions(+), 11 deletions(-)
>
> diff --git a/include/linux/compiler.h b/include/linux/compiler.h
> index 93035d7fee0d..d8cee7c8968d 100644
> --- a/include/linux/compiler.h
> +++ b/include/linux/compiler.h
> @@ -86,17 +86,28 @@ void ftrace_likely_update(struct ftrace_likely_data *f, int val,
>
> #ifndef barrier_data
> /*
> - * This version is i.e. to prevent dead stores elimination on @ptr
> - * where gcc and llvm may behave differently when otherwise using
> - * normal barrier(): while gcc behavior gets along with a normal
> - * barrier(), llvm needs an explicit input variable to be assumed
> - * clobbered. The issue is as follows: while the inline asm might
> - * access any memory it wants, the compiler could have fit all of
> - * @ptr into memory registers instead, and since @ptr never escaped
> - * from that, it proved that the inline asm wasn't touching any of
> - * it. This version works well with both compilers, i.e. we're telling
> - * the compiler that the inline asm absolutely may see the contents
> - * of @ptr. See also: https://llvm.org/bugs/show_bug.cgi?id=15495
> + * This version is to prevent dead stores elimination on @ptr where gcc and
> + * llvm may behave differently when otherwise using normal barrier(): while gcc
> + * behavior gets along with a normal barrier(), llvm needs an explicit input
> + * variable to be assumed clobbered.
> + *
> + * Its primary use is in implementing memzero_explicit(), which is used for
> + * clearing temporary data that may contain secrets.
> + *
> + * The issue is as follows: while the inline asm might access any memory it
> + * wants, the compiler could have fit all of the variable that @ptr points to
> + * into registers instead, and if @ptr never escaped from the function, it
> + * proved that the inline asm wasn't touching any of it. gcc only eliminates
> + * dead stores if the variable was actually allocated in registers, but llvm
> + * reasons that the variable _could_ have been in registers, so the inline asm
> + * can't reliably access it anyway, and eliminates dead stores even if the
> + * variable is actually in memory.

I think I'd just say something like:

Although the compiler must assume a "memory" clobber may affect all
memory, local variables (on stack) cannot actually be visible to the
asm unless their address has been passed to an external function.
So the compiler may assume such variables cannot be affected by
a normal asm volatile("" ::: "memory") barrier().
Passing the address of the local variables to the asm barrier
is enough to tell the compiler that the asm can 'see' the variables
(and spill anything held in registers to the stack) so that
the "memory" clobber has the expected effect.

This is necessary to get llvm to do a memset() of on-stack data
at the end of a function to clear memory that contains secrets.
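
A minimal userspace sketch of the difference, assuming GCC/Clang-style inline asm. The two macros mirror the kernel definitions; memzero_explicit_sketch() and handle_secret() are hypothetical names used only for illustration:

```c
#include <stddef.h>
#include <string.h>

/* Same shape as the kernel macros, redefined so the sketch builds in userspace. */
#define barrier()         __asm__ __volatile__("" : : : "memory")
#define barrier_data(ptr) __asm__ __volatile__("" : : "r"(ptr) : "memory")

/* Hypothetical stand-in for memzero_explicit(). */
static inline void memzero_explicit_sketch(void *s, size_t count)
{
	memset(s, 0, count);
	barrier_data(s);	/* the asm is handed s, so the stores to *s must be emitted */
}

/* Hypothetical user: key[] is local and its address never escapes. */
int handle_secret(unsigned char seed)
{
	unsigned char key[32];
	int sum = 0;
	size_t i;

	for (i = 0; i < sizeof(key); i++) {
		key[i] = (unsigned char)(seed + i);
		sum += key[i];
	}

	/*
	 * memset(key, 0, sizeof(key)); barrier();
	 * is not enough for llvm: the asm is never given key's address,
	 * so the memset() is still a dead store and may be dropped.
	 */
	memzero_explicit_sketch(key, sizeof(key));
	return sum;
}
```

Only the barrier_data() form gives the asm a way to reach key[], which is what makes the "memory" clobber apply to it.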

David

> + *
> + * This version works well with both compilers, i.e. we're telling the compiler
> + * that the inline asm absolutely may see the contents of the variable pointed
> + * to by @ptr.
> + *
> + * See also: https://llvm.org/bugs/show_bug.cgi?id=15495#c5
> */
> # define barrier_data(ptr) __asm__ __volatile__("": :"r"(ptr) :"memory")
> #endif
> --
> 2.26.2

-
Registered Address Lakeside, Bramley Road, Mount Farm, Milton Keynes, MK1 1PT, UK
Registration No: 1397386 (Wales)


2020-10-15 22:28:23

by Arvind Sankar

Subject: Re: [PATCH] compiler.h: Clarify comment about the need for barrier_data()

On Thu, Oct 15, 2020 at 09:09:11PM +0000, David Laight wrote:
> From: Arvind Sankar
> > Sent: 15 October 2020 19:14
> >
> > Be clear about @ptr vs the variable that @ptr points to, and add some
> > more details as to why the special barrier_data() macro is required.
> >
> > Signed-off-by: Arvind Sankar <[email protected]>
> > ---
> > include/linux/compiler.h | 33 ++++++++++++++++++++++-----------
> > 1 file changed, 22 insertions(+), 11 deletions(-)
> >
> > diff --git a/include/linux/compiler.h b/include/linux/compiler.h
> > index 93035d7fee0d..d8cee7c8968d 100644
> > --- a/include/linux/compiler.h
> > +++ b/include/linux/compiler.h
> > @@ -86,17 +86,28 @@ void ftrace_likely_update(struct ftrace_likely_data *f, int val,
> >
> > #ifndef barrier_data
> > /*
> > - * This version is i.e. to prevent dead stores elimination on @ptr
> > - * where gcc and llvm may behave differently when otherwise using
> > - * normal barrier(): while gcc behavior gets along with a normal
> > - * barrier(), llvm needs an explicit input variable to be assumed
> > - * clobbered. The issue is as follows: while the inline asm might
> > - * access any memory it wants, the compiler could have fit all of
> > - * @ptr into memory registers instead, and since @ptr never escaped
> > - * from that, it proved that the inline asm wasn't touching any of
> > - * it. This version works well with both compilers, i.e. we're telling
> > - * the compiler that the inline asm absolutely may see the contents
> > - * of @ptr. See also: https://llvm.org/bugs/show_bug.cgi?id=15495
> > + * This version is to prevent dead stores elimination on @ptr where gcc and
> > + * llvm may behave differently when otherwise using normal barrier(): while gcc
> > + * behavior gets along with a normal barrier(), llvm needs an explicit input
> > + * variable to be assumed clobbered.
> > + *
> > + * Its primary use is in implementing memzero_explicit(), which is used for
> > + * clearing temporary data that may contain secrets.
> > + *
> > + * The issue is as follows: while the inline asm might access any memory it
> > + * wants, the compiler could have fit all of the variable that @ptr points to
> > + * into registers instead, and if @ptr never escaped from the function, it
> > + * proved that the inline asm wasn't touching any of it. gcc only eliminates
> > + * dead stores if the variable was actually allocated in registers, but llvm
> > + * reasons that the variable _could_ have been in registers, so the inline asm
> > + * can't reliably access it anyway, and eliminates dead stores even if the
> > + * variable is actually in memory.
>
> I think I'd just say something like:
>
> Although the compiler must assume a "memory" clobber may affect all
> memory, local variables (on stack) cannot actually be visible to the
> asm unless their address has been passed to an external function.
> So the compiler may assume such variables cannot be affected by
> a normal asm volatile("" ::: "memory") barrier().
> Passing the address of the local variables to the asm barrier
> is enough to tell the compiler that the asm can 'see' the variables
> (and spill anything held in registers to the stack) so that
> the "memory" clobber has the expected effect.
>
> This is necessary to get llvm to do a memset() of on-stack data
> at the end of a function to clear memory that contains secrets.
>
> David

I think it's helpful to have the more detailed explanation about
register variables -- at first glance, it's a bit mystifying as to why
the compiler would think that the asm can't access the stack. Spilling
registers to the stack is actually an undesirable side-effect of the
workaround.

2020-10-16 08:16:16

by David Laight

Subject: RE: [PATCH] compiler.h: Clarify comment about the need for barrier_data()

From: Arvind Sankar
> Sent: 15 October 2020 23:01
...
> I think it's helpful to have the more detailed explanation about
> register variables -- at first glance, it's a bit mystifying as to why
> the compiler would think that the asm can't access the stack. Spilling
> registers to the stack is actually an undesirable side-effect of the
> workaround.

That is the very bit that just confuses things.
The data that memzero_explicit() is trying to clear is (probably)
on-stack already - it won't be in registers.

If it were in registers you wouldn't need the memset().

Actually I suspect that the memset() is inlined so that it
just assigns zeros to all the variables.
This will be done using 'virtual registers' that cache the
on-stack value.
You then need to do something to force the compiler to generate
the instructions that flush the 'virtual registers' back to the stack.

The fundamental thing is that the address of a local (auto!)
variable must be visible to the asm statement for the compiler
to make the contents of those variables visible.

I even suspect you may need to pass the address of the structure
(to be zeroed) to an asm block at the top of the function as well.
Otherwise the compiler could change the stack offsets where the
structure is stored.
But I don't think compilers do that.

David


2020-10-16 13:23:43

by Arvind Sankar

Subject: Re: [PATCH] compiler.h: Clarify comment about the need for barrier_data()

On Fri, Oct 16, 2020 at 08:13:44AM +0000, David Laight wrote:
> From: Arvind Sankar
> > Sent: 15 October 2020 23:01
> ...
> > I think it's helpful to have the more detailed explanation about
> > register variables -- at first glance, it's a bit mystifying as to why
> > the compiler would think that the asm can't access the stack. Spilling
> > registers to the stack is actually an undesirable side-effect of the
> > workaround.
>
> That is the very bit that just confuses things.
> The data that memzero_explicit() is trying to clear is (probably)
> on-stack already - it won't be in registers.
>

Are you saying the explanation is confusing things?

What I think is confusing is the fact that the compiler believes that an
asm with a memory clobber cannot access a variable that happens to be in
memory, and the comment is explaining how the compiler came to that
conclusion. The comment is already saying that this applies to LLVM
(unlike GCC) even if the variable isn't actually in registers.

> If it were in registers you wouldn't need the memset().

There's obviously no guarantee of where the compiler decided to keep the
variables. This isn't so clear-cut: for SHA, there is a 256-byte array
that you can be pretty sure will be in memory, but there are also 10 u32
variables which may or may not be in registers depending on how many
registers the arch has and how clever the compiler was in allocating
them.
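
To make the SHA case concrete, here is a toy sketch with one 256-byte array and two scalar working variables. It is not real SHA: toy_transform() and its arithmetic are invented for illustration, and barrier_data() is redefined locally on the assumption of GCC/Clang inline asm:

```c
#include <stdint.h>
#include <string.h>

/* Local copy of the kernel macro so the sketch builds in userspace. */
#define barrier_data(ptr) __asm__ __volatile__("" : : "r"(ptr) : "memory")

/* Toy stand-in for a hash round function; the arithmetic is meaningless. */
uint32_t toy_transform(const uint32_t in[16])
{
	uint32_t W[64];		/* 256 bytes: almost certainly on the stack */
	uint32_t a = 0, b = 0;	/* scalars: registers or stack, compiler's choice */
	uint32_t result;
	int i;

	for (i = 0; i < 64; i++) {
		W[i] = in[i % 16];
		a += W[i];
		b ^= W[i];
	}
	result = a ^ b;

	/* The array has an address to hand to the asm, so its clearing is kept. */
	memset(W, 0, sizeof(W));
	barrier_data(W);

	/*
	 * The scalars are only reliably cleared if their addresses are passed
	 * as well, which forces their values out to memory at this point.
	 */
	a = 0;
	b = 0;
	barrier_data(&a);
	barrier_data(&b);

	return result;
}
```

Passing &a and &b is the spill side-effect mentioned earlier: it costs stores that the code would not otherwise need.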

>
> Actually I suspect that the memset() is inlined so that it
> just assigns zeros to all the variables.
> This will be done using 'virtual registers' that cache the
> on-stack value.
> You then need to do something to force the compiler to generate
> the instructions that flush the 'virtual registers' back to the stack.

This is definitely getting too much into the weeds. What the compiler
knows is that memset does nothing other than writing to the variable,
and if the variable is never used after that, then the memset can be
eliminated. Whether the memset ends up getting inlined or not is not
relevant here: clang doesn't inline it, for example.
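
A small sketch of that reasoning, using a made-up function (checksum_then_clear() is not kernel code): nothing reads buf after the memset(), so an optimizing compiler is allowed to drop the call whether or not it would have inlined it.

```c
#include <string.h>

/* Hypothetical example: buf is local, never escapes, and is dead after the loop. */
int checksum_then_clear(const char *msg)
{
	char buf[64];
	int sum = 0;
	size_t i, n = strlen(msg);

	if (n > sizeof(buf))
		n = sizeof(buf);
	memcpy(buf, msg, n);

	for (i = 0; i < n; i++)
		sum += (unsigned char)buf[i];

	/*
	 * The compiler knows memset() only writes to buf, and buf is not
	 * used again, so this store is dead and may be removed entirely.
	 */
	memset(buf, 0, sizeof(buf));
	return sum;
}
```

The function returns the same value with or without the memset(), which is exactly why the compiler considers it safe to remove.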