2015-04-28 15:22:30

by Daniel Borkmann

[permalink] [raw]
Subject: [PATCH crypto-2.6] lib: make memzero_explicit more robust against dead store elimination

In commit 0b053c951829 ("lib: memzero_explicit: use barrier instead
of OPTIMIZER_HIDE_VAR"), we made memzero_explicit() more robust in
case LTO would decide to inline memzero_explicit() and eventually
find out it could be elimiated as dead store.

While using barrier() works well for the case of gcc, recent efforts
from LLVMLinux people suggest to use llvm as an alternative to gcc,
and there, Stephan found in a simple stand-alone user space example
that llvm could nevertheless optimize and thus elimitate the memset().
A similar issue has been observed in the referenced llvm bug report,
which is regarded as not-a-bug.

The fix in this patch now works for both compilers (also tested with
more aggressive optimization levels). Arguably, in the current kernel
tree it's more of a theoretical issue, but imho, it's better to be
pedantic about it.

It's clearly visible though, with the below code: if we would have
used barrier()-only here, llvm would have omitted clearing, not so
with barrier_data() variant:

static inline void memzero_explicit(void *s, size_t count)
{
memset(s, 0, count);
barrier_data(s);
}

int main(void)
{
char buff[20];
memzero_explicit(buff, sizeof(buff));
return 0;
}

$ gcc -O2 test.c
$ gdb a.out
(gdb) disassemble main
Dump of assembler code for function main:
0x0000000000400400 <+0>: lea -0x28(%rsp),%rax
0x0000000000400405 <+5>: movq $0x0,-0x28(%rsp)
0x000000000040040e <+14>: movq $0x0,-0x20(%rsp)
0x0000000000400417 <+23>: movl $0x0,-0x18(%rsp)
0x000000000040041f <+31>: xor %eax,%eax
0x0000000000400421 <+33>: retq
End of assembler dump.

$ clang -O2 test.c
$ gdb a.out
(gdb) disassemble main
Dump of assembler code for function main:
0x00000000004004f0 <+0>: xorps %xmm0,%xmm0
0x00000000004004f3 <+3>: movaps %xmm0,-0x18(%rsp)
0x00000000004004f8 <+8>: movl $0x0,-0x8(%rsp)
0x0000000000400500 <+16>: lea -0x18(%rsp),%rax
0x0000000000400505 <+21>: xor %eax,%eax
0x0000000000400507 <+23>: retq
End of assembler dump.

As clang (but also icc) defines __GNUC__, it's sufficient to define this
in compiler-gcc.h only.

Reference: https://llvm.org/bugs/show_bug.cgi?id=15495
Reported-by: Stephan Mueller <[email protected]>
Signed-off-by: Daniel Borkmann <[email protected]>
Cc: Theodore Ts'o <[email protected]>
Cc: Stephan Mueller <[email protected]>
Cc: Hannes Frederic Sowa <[email protected]>
Cc: mancha security <[email protected]>
Cc: Mark Charlebois <[email protected]>
Cc: Behan Webster <[email protected]>
---
include/linux/compiler-gcc.h | 16 +++++++++++++++-
include/linux/compiler.h | 4 ++++
lib/string.c | 2 +-
3 files changed, 20 insertions(+), 2 deletions(-)

diff --git a/include/linux/compiler-gcc.h b/include/linux/compiler-gcc.h
index cdf13ca..371e560 100644
--- a/include/linux/compiler-gcc.h
+++ b/include/linux/compiler-gcc.h
@@ -9,10 +9,24 @@
+ __GNUC_MINOR__ * 100 \
+ __GNUC_PATCHLEVEL__)

-
/* Optimization barrier */
+
/* The "volatile" is due to gcc bugs */
#define barrier() __asm__ __volatile__("": : :"memory")
+/*
+ * This version is i.e. to prevent dead stores elimination on @ptr
+ * where gcc and llvm may behave differently when otherwise using
+ * normal barrier(): while gcc behavior gets along with a normal
+ * barrier(), llvm needs an explicit input variable to be assumed
+ * clobbered. The issue is as follows: while the inline asm might
+ * access any memory it wants, the compiler could have fit all of
+ * @ptr into memory registers instead, and since @ptr never escaped
+ * from that, it proofed that the inline asm wasn't touching any of
+ * it. This version works well with both compilers, i.e. we're telling
+ * the compiler that the inline asm absolutely may see the contents
+ * of @ptr. See also: https://llvm.org/bugs/show_bug.cgi?id=15495
+ */
+#define barrier_data(ptr) __asm__ __volatile__("": :"r"(ptr) :"memory")

/*
* This macro obfuscates arithmetic on a variable address so that gcc
diff --git a/include/linux/compiler.h b/include/linux/compiler.h
index 0e41ca0..8677225 100644
--- a/include/linux/compiler.h
+++ b/include/linux/compiler.h
@@ -169,6 +169,10 @@ void ftrace_likely_update(struct ftrace_branch_data *f, int val, int expect);
# define barrier() __memory_barrier()
#endif

+#ifndef barrier_data
+# define barrier_data(ptr) barrier()
+#endif
+
/* Unreachable code */
#ifndef unreachable
# define unreachable() do { } while (1)
diff --git a/lib/string.c b/lib/string.c
index a579201..bb3d4b6 100644
--- a/lib/string.c
+++ b/lib/string.c
@@ -607,7 +607,7 @@ EXPORT_SYMBOL(memset);
void memzero_explicit(void *s, size_t count)
{
memset(s, 0, count);
- barrier();
+ barrier_data(s);
}
EXPORT_SYMBOL(memzero_explicit);

--
1.9.3


2015-04-28 23:42:10

by Stephan Müller

[permalink] [raw]
Subject: Re: [PATCH crypto-2.6] lib: make memzero_explicit more robust against dead store elimination

Am Dienstag, 28. April 2015, 17:22:20 schrieb Daniel Borkmann:

Hi Daniel,

>In commit 0b053c951829 ("lib: memzero_explicit: use barrier instead
>of OPTIMIZER_HIDE_VAR"), we made memzero_explicit() more robust in
>case LTO would decide to inline memzero_explicit() and eventually
>find out it could be elimiated as dead store.
>
>While using barrier() works well for the case of gcc, recent efforts
>from LLVMLinux people suggest to use llvm as an alternative to gcc,
>and there, Stephan found in a simple stand-alone user space example
>that llvm could nevertheless optimize and thus elimitate the memset().
>A similar issue has been observed in the referenced llvm bug report,
>which is regarded as not-a-bug.
>
>The fix in this patch now works for both compilers (also tested with
>more aggressive optimization levels). Arguably, in the current kernel
>tree it's more of a theoretical issue, but imho, it's better to be
>pedantic about it.
>
>It's clearly visible though, with the below code: if we would have
>used barrier()-only here, llvm would have omitted clearing, not so
>with barrier_data() variant:
>
> static inline void memzero_explicit(void *s, size_t count)
> {
> memset(s, 0, count);
> barrier_data(s);
> }
>
> int main(void)
> {
> char buff[20];
> memzero_explicit(buff, sizeof(buff));
> return 0;
> }
>
> $ gcc -O2 test.c
> $ gdb a.out
> (gdb) disassemble main
> Dump of assembler code for function main:
> 0x0000000000400400 <+0>: lea -0x28(%rsp),%rax
> 0x0000000000400405 <+5>: movq $0x0,-0x28(%rsp)
> 0x000000000040040e <+14>: movq $0x0,-0x20(%rsp)
> 0x0000000000400417 <+23>: movl $0x0,-0x18(%rsp)
> 0x000000000040041f <+31>: xor %eax,%eax
> 0x0000000000400421 <+33>: retq
> End of assembler dump.
>
> $ clang -O2 test.c
> $ gdb a.out
> (gdb) disassemble main
> Dump of assembler code for function main:
> 0x00000000004004f0 <+0>: xorps %xmm0,%xmm0
> 0x00000000004004f3 <+3>: movaps %xmm0,-0x18(%rsp)
> 0x00000000004004f8 <+8>: movl $0x0,-0x8(%rsp)
> 0x0000000000400500 <+16>: lea -0x18(%rsp),%rax
> 0x0000000000400505 <+21>: xor %eax,%eax
> 0x0000000000400507 <+23>: retq
> End of assembler dump.
>
>As clang (but also icc) defines __GNUC__, it's sufficient to define this
>in compiler-gcc.h only.
>
>Reference: https://llvm.org/bugs/show_bug.cgi?id=15495
>Reported-by: Stephan Mueller <[email protected]>
>Signed-off-by: Daniel Borkmann <[email protected]>
>Cc: Theodore Ts'o <[email protected]>
>Cc: Stephan Mueller <[email protected]>
>Cc: Hannes Frederic Sowa <[email protected]>
>Cc: mancha security <[email protected]>
>Cc: Mark Charlebois <[email protected]>
>Cc: Behan Webster <[email protected]>

Using a user space test app: tested clang -O3, clang -O2, gcc -O3, gcc -O2.

Tested-by: Stephan Mueller <[email protected]>

Ciao
Stephan

2015-04-29 13:08:52

by mancha security

[permalink] [raw]
Subject: Re: [PATCH crypto-2.6] lib: make memzero_explicit more robust against dead store elimination

Hi Daniel et al.

Looks good from here.

By the way, has anyone been able to verify that __memory_barrier
provides DSE protection under various optimizations? Unfortunately, I
don't have ready access to ICC at the moment or I'd test it myself.

--mancha

PS It would be nice if memset_s were universally adopted/implemented so
we could stop worrying about these things.


On Tue, Apr 28, 2015 at 05:22:20PM +0200, Daniel Borkmann wrote:
> In commit 0b053c951829 ("lib: memzero_explicit: use barrier instead
> of OPTIMIZER_HIDE_VAR"), we made memzero_explicit() more robust in
> case LTO would decide to inline memzero_explicit() and eventually
> find out it could be elimiated as dead store.
>
> While using barrier() works well for the case of gcc, recent efforts
> from LLVMLinux people suggest to use llvm as an alternative to gcc,
> and there, Stephan found in a simple stand-alone user space example
> that llvm could nevertheless optimize and thus elimitate the memset().
> A similar issue has been observed in the referenced llvm bug report,
> which is regarded as not-a-bug.
>
> The fix in this patch now works for both compilers (also tested with
> more aggressive optimization levels). Arguably, in the current kernel
> tree it's more of a theoretical issue, but imho, it's better to be
> pedantic about it.
>
> It's clearly visible though, with the below code: if we would have
> used barrier()-only here, llvm would have omitted clearing, not so
> with barrier_data() variant:
>
> static inline void memzero_explicit(void *s, size_t count)
> {
> memset(s, 0, count);
> barrier_data(s);
> }
>
> int main(void)
> {
> char buff[20];
> memzero_explicit(buff, sizeof(buff));
> return 0;
> }
>
> $ gcc -O2 test.c
> $ gdb a.out
> (gdb) disassemble main
> Dump of assembler code for function main:
> 0x0000000000400400 <+0>: lea -0x28(%rsp),%rax
> 0x0000000000400405 <+5>: movq $0x0,-0x28(%rsp)
> 0x000000000040040e <+14>: movq $0x0,-0x20(%rsp)
> 0x0000000000400417 <+23>: movl $0x0,-0x18(%rsp)
> 0x000000000040041f <+31>: xor %eax,%eax
> 0x0000000000400421 <+33>: retq
> End of assembler dump.
>
> $ clang -O2 test.c
> $ gdb a.out
> (gdb) disassemble main
> Dump of assembler code for function main:
> 0x00000000004004f0 <+0>: xorps %xmm0,%xmm0
> 0x00000000004004f3 <+3>: movaps %xmm0,-0x18(%rsp)
> 0x00000000004004f8 <+8>: movl $0x0,-0x8(%rsp)
> 0x0000000000400500 <+16>: lea -0x18(%rsp),%rax
> 0x0000000000400505 <+21>: xor %eax,%eax
> 0x0000000000400507 <+23>: retq
> End of assembler dump.
>
> As clang (but also icc) defines __GNUC__, it's sufficient to define this
> in compiler-gcc.h only.
>
> Reference: https://llvm.org/bugs/show_bug.cgi?id=15495
> Reported-by: Stephan Mueller <[email protected]>
> Signed-off-by: Daniel Borkmann <[email protected]>
> Cc: Theodore Ts'o <[email protected]>
> Cc: Stephan Mueller <[email protected]>
> Cc: Hannes Frederic Sowa <[email protected]>
> Cc: mancha security <[email protected]>
> Cc: Mark Charlebois <[email protected]>
> Cc: Behan Webster <[email protected]>
> ---
> include/linux/compiler-gcc.h | 16 +++++++++++++++-
> include/linux/compiler.h | 4 ++++
> lib/string.c | 2 +-
> 3 files changed, 20 insertions(+), 2 deletions(-)
>
> diff --git a/include/linux/compiler-gcc.h b/include/linux/compiler-gcc.h
> index cdf13ca..371e560 100644
> --- a/include/linux/compiler-gcc.h
> +++ b/include/linux/compiler-gcc.h
> @@ -9,10 +9,24 @@
> + __GNUC_MINOR__ * 100 \
> + __GNUC_PATCHLEVEL__)
>
> -
> /* Optimization barrier */
> +
> /* The "volatile" is due to gcc bugs */
> #define barrier() __asm__ __volatile__("": : :"memory")
> +/*
> + * This version is i.e. to prevent dead stores elimination on @ptr
> + * where gcc and llvm may behave differently when otherwise using
> + * normal barrier(): while gcc behavior gets along with a normal
> + * barrier(), llvm needs an explicit input variable to be assumed
> + * clobbered. The issue is as follows: while the inline asm might
> + * access any memory it wants, the compiler could have fit all of
> + * @ptr into memory registers instead, and since @ptr never escaped
> + * from that, it proofed that the inline asm wasn't touching any of
> + * it. This version works well with both compilers, i.e. we're telling
> + * the compiler that the inline asm absolutely may see the contents
> + * of @ptr. See also: https://llvm.org/bugs/show_bug.cgi?id=15495
> + */
> +#define barrier_data(ptr) __asm__ __volatile__("": :"r"(ptr) :"memory")
>
> /*
> * This macro obfuscates arithmetic on a variable address so that gcc
> diff --git a/include/linux/compiler.h b/include/linux/compiler.h
> index 0e41ca0..8677225 100644
> --- a/include/linux/compiler.h
> +++ b/include/linux/compiler.h
> @@ -169,6 +169,10 @@ void ftrace_likely_update(struct ftrace_branch_data *f, int val, int expect);
> # define barrier() __memory_barrier()
> #endif
>
> +#ifndef barrier_data
> +# define barrier_data(ptr) barrier()
> +#endif
> +
> /* Unreachable code */
> #ifndef unreachable
> # define unreachable() do { } while (1)
> diff --git a/lib/string.c b/lib/string.c
> index a579201..bb3d4b6 100644
> --- a/lib/string.c
> +++ b/lib/string.c
> @@ -607,7 +607,7 @@ EXPORT_SYMBOL(memset);
> void memzero_explicit(void *s, size_t count)
> {
> memset(s, 0, count);
> - barrier();
> + barrier_data(s);
> }
> EXPORT_SYMBOL(memzero_explicit);
>
> --
> 1.9.3
>


Attachments:
(No filename) (5.40 kB)
(No filename) (819.00 B)
Download all attachments

2015-04-29 14:01:26

by Daniel Borkmann

[permalink] [raw]
Subject: Re: [PATCH crypto-2.6] lib: make memzero_explicit more robust against dead store elimination

On 04/29/2015 03:08 PM, mancha security wrote:
...
> By the way, has anyone been able to verify that __memory_barrier
> provides DSE protection under various optimizations? Unfortunately, I
> don't have ready access to ICC at the moment or I'd test it myself.

Never used icc, but it looks like it's free for open source projects;
I can give it a try, but in case you're faster than I am, feel free
to post results here.

From what I see based on the code, i.e. after that buggy cleanup
commit ...

commit 73679e50820123ebdedc67ebcda4562d1d6e4aba
Author: Pranith Kumar <[email protected]>
Date: Tue Apr 15 12:05:22 2014 -0400

compiler-intel.h: Remove duplicate definition

barrier is already defined as __memory_barrier in compiler.h
Remove this unnecessary redefinition.

Signed-off-by: Pranith Kumar <[email protected]>
Link: http://lkml.kernel.org/r/CAJhHMCAnYPy0%2BqD-1KBnJPLt3XgAjdR12j%[email protected]
Signed-off-by: H. Peter Anvin <[email protected]>

... it looks like it's currently using the _same_ gcc inline asm
for the barrier on icc instead of what that commit intended to do.

So funny enough, we don't actually use __memory_barrier() at the
moment. ;)

Nonetheless, having a look might be good.

2015-04-29 14:54:40

by mancha security

[permalink] [raw]
Subject: Re: [PATCH crypto-2.6] lib: make memzero_explicit more robust against dead store elimination

On Wed, Apr 29, 2015 at 04:01:19PM +0200, Daniel Borkmann wrote:
> On 04/29/2015 03:08 PM, mancha security wrote:
> ...
> >By the way, has anyone been able to verify that __memory_barrier
> >provides DSE protection under various optimizations? Unfortunately, I
> >don't have ready access to ICC at the moment or I'd test it myself.
>
> Never used icc, but it looks like it's free for open source projects;
> I can give it a try, but in case you're faster than I am, feel free
> to post results here.

Time permitting, I'll try setting this up and post my results.

>
> From what I see based on the code, i.e. after that buggy cleanup
> commit ...
>
> commit 73679e50820123ebdedc67ebcda4562d1d6e4aba
> Author: Pranith Kumar <[email protected]>
> Date: Tue Apr 15 12:05:22 2014 -0400
>
> compiler-intel.h: Remove duplicate definition
>
> barrier is already defined as __memory_barrier in compiler.h
> Remove this unnecessary redefinition.
>
> Signed-off-by: Pranith Kumar <[email protected]>
> Link: http://lkml.kernel.org/r/CAJhHMCAnYPy0%2BqD-1KBnJPLt3XgAjdR12j%[email protected]
> Signed-off-by: H. Peter Anvin <[email protected]>
>
> ... it looks like it's currently using the _same_ gcc inline asm
> for the barrier on icc instead of what that commit intended to do.
>
> So funny enough, we don't actually use __memory_barrier() at the
> moment. ;)
>
> Nonetheless, having a look might be good.

Nice catch, 73679e50820 is indeed buggy because ICC defines __GNUC__
(unless -no-gcc is used). That should be reverted.

Bug aside, according to [1], ICC does support GNU-style inline asm so
for the purposes of barrier_data(), it would be interesting to see if
it affords better/worse DSE protection compared to __memory_barrier().

--mancha

[1]
https://software.intel.com/sites/products/documentation/doclib/iss/2013/compiler/cpp-lin/GUID-5100C4FC-BC2F-4E36-943A-120CFFFB4285.htm


Attachments:
(No filename) (1.90 kB)
(No filename) (819.00 B)
Download all attachments

2015-04-29 14:56:42

by Daniel Borkmann

[permalink] [raw]
Subject: Re: [PATCH crypto-2.6] lib: make memzero_explicit more robust against dead store elimination

On 04/29/2015 04:01 PM, Daniel Borkmann wrote:
...
> So funny enough, we don't actually use __memory_barrier() at the
> moment. ;)

Anyway, I will send a v2 of the patch since we need to undef the gcc
definition in case someone really uses an ecc compiler that doesn't
support inline asm.

Thanks,
Daniel

2015-04-29 23:43:23

by Daniel Borkmann

[permalink] [raw]
Subject: Re: [PATCH crypto-2.6] lib: make memzero_explicit more robust against dead store elimination

On 04/29/2015 04:54 PM, mancha security wrote:
> On Wed, Apr 29, 2015 at 04:01:19PM +0200, Daniel Borkmann wrote:
>> On 04/29/2015 03:08 PM, mancha security wrote:
>> ...
>>> By the way, has anyone been able to verify that __memory_barrier
>>> provides DSE protection under various optimizations? Unfortunately, I
>>> don't have ready access to ICC at the moment or I'd test it myself.
>>
>> Never used icc, but it looks like it's free for open source projects;
>> I can give it a try, but in case you're faster than I am, feel free
>> to post results here.
>
> Time permitting, I'll try setting this up and post my results.

So I finally got the download link and an eval license for icc, and
after needing to download gigbytes of bloat for the suite, I could
finally start to experiment a bit.

So __GNUC__ and __INTEL_COMPILER is definitely defined by icc, __ECC
not in my case, so that part is as expected for the kernel header
includes.

With barrier_data(), I could observe insns for an inlined memset()
being emitted in the disassembly, same with barrier(), same with
__memory_barrier(). In fact, even if I only use ...

static inline void memzero_explicit(void *s, size_t count)
{
memset(s, 0, count);
}

int main(void)
{
char buff[20];
memzero_explicit(buff, sizeof(buff));
return 0;
}

... icc will emit memset instrinsic insns (did you notice that as
well?) when using various optimization levels. Using f.e. -Ofast
-ffreestanding resp. -fno-builtin-memset will emit a function call,
presumably, icc is then not allowed to make any assumptions, so given
the previous result, then would then be expected.

So, crafting a stupid example:

static inline void
dumb_memset(char *s, unsigned char c, size_t n)
{
int i;

for (i = 0; i < n; i++)
s[i] = c;
}

static inline void memzero_explicit(void *s, size_t count)
{
dumb_memset(s, 0, count);
<barrier-variant>
}

int main(void)
{
char buff[20];
memzero_explicit(buff, sizeof(buff));
return 0;
}

With no barrier at all, icc optimizes all that away (using -Ofast),
with barrier_data() it inlines and emits additional mov* insns!
Just using barrier() or __memory_barrier(), we end up with the same
case as with clang, that is, it gets optimized away. So, barrier_data()
seems to be better here as well.

Cheers,
Daniel

2015-04-30 06:17:33

by mancha security

[permalink] [raw]
Subject: Re: [PATCH crypto-2.6] lib: make memzero_explicit more robust against dead store elimination

On Thu, Apr 30, 2015 at 01:43:07AM +0200, Daniel Borkmann wrote:
> On 04/29/2015 04:54 PM, mancha security wrote:
> >On Wed, Apr 29, 2015 at 04:01:19PM +0200, Daniel Borkmann wrote:
> >>On 04/29/2015 03:08 PM, mancha security wrote:
> >>...
> >>>By the way, has anyone been able to verify that __memory_barrier
> >>>provides DSE protection under various optimizations? Unfortunately,
> >>>I don't have ready access to ICC at the moment or I'd test it
> >>>myself.
> >>
> >>Never used icc, but it looks like it's free for open source
> >>projects; I can give it a try, but in case you're faster than I am,
> >>feel free to post results here.
> >
> >Time permitting, I'll try setting this up and post my results.
>
> So I finally got the download link and an eval license for icc, and
> after needing to download gigbytes of bloat for the suite, I could
> finally start to experiment a bit.

Ugh.

> So __GNUC__ and __INTEL_COMPILER is definitely defined by icc, __ECC
> not in my case, so that part is as expected for the kernel header
> includes.
>
> With barrier_data(), I could observe insns for an inlined memset()
> being emitted in the disassembly, same with barrier(), same with
> __memory_barrier(). In fact, even if I only use ...
>
> static inline void memzero_explicit(void *s, size_t count)
> {
> memset(s, 0, count);
> }
>
> int main(void)
> {
> char buff[20];
> memzero_explicit(buff, sizeof(buff));
> return 0;
> }
>
> ... icc will emit memset instrinsic insns (did you notice that as
> well?) when using various optimization levels. Using f.e. -Ofast
> -ffreestanding resp. -fno-builtin-memset will emit a function call,
> presumably, icc is then not allowed to make any assumptions, so given
> the previous result, then would then be expected.

I didn't get around to installing ICC so thanks for sharing the very
interesting results.

> So, crafting a stupid example:
>
> static inline void
> dumb_memset(char *s, unsigned char c, size_t n)
> {
> int i;
>
> for (i = 0; i < n; i++)
> s[i] = c;
> }
>
> static inline void memzero_explicit(void *s, size_t count)
> {
> dumb_memset(s, 0, count);
> <barrier-variant>
> }
>
> int main(void)
> {
> char buff[20];
> memzero_explicit(buff, sizeof(buff));
> return 0;
> }
>
> With no barrier at all, icc optimizes all that away (using -Ofast),
> with barrier_data() it inlines and emits additional mov* insns! Just
> using barrier() or __memory_barrier(), we end up with the same case as
> with clang, that is, it gets optimized away. So, barrier_data() seems
> to be better here as well.

For now, seems we're good with barrier_data should things like the LTO
initiative pick up steam, etc.

Cheers.


Attachments:
(No filename) (2.66 kB)
(No filename) (819.00 B)
Download all attachments