2023-01-30 21:30:52

by Maciej W. Rozycki

Subject: [PATCH v3] x86: Use `get_random_u8' for kernel stack offset randomization

On x86, kernel stack offset randomization uses the RDTSC instruction,
which according to H. Peter Anvin is not a secure source of entropy:

"RDTSC isn't a super fast instruction either, but what is *way* more
significant is that this use of RDTSC is NOT safe: in certain power states
it may very well be that some number of lower bits of TSC contain no
entropy at all."

It also causes an invalid opcode exception on hardware that does not
implement this instruction:

process '/sbin/init' started with executable stack
invalid opcode: 0000 [#1]
CPU: 0 PID: 1 Comm: init Not tainted 6.1.0-rc4+ #1
EIP: exit_to_user_mode_prepare+0x90/0xe1
Code: 30 02 00 75 ad 0f ba e3 16 73 05 e8 a7 a5 fc ff 0f ba e3 0e 73 05 e8 3e af fc ff a1 c4 c6 51 c0 85 c0 7e 13 8b 0d ac 01 53 c0 <0f> 31 0f b6 c0 31 c1 89 0d ac 01 53 c0 83 3d 30 ed 62 c0 00 75 33
EAX: 00000001 EBX: 00004000 ECX: 00000000 EDX: 000004ff
ESI: c10253c0 EDI: 00000000 EBP: c1027f98 ESP: c1027f8c
DS: 007b ES: 007b FS: 0000 GS: 0000 SS: 0068 EFLAGS: 00010002
CR0: 80050033 CR2: bfe8659b CR3: 012e0000 CR4: 00000000
Call Trace:
? rest_init+0x72/0x72
syscall_exit_to_user_mode+0x15/0x27
ret_from_fork+0x10/0x30
EIP: 0xb7f74800
Code: Unable to access opcode bytes at 0xb7f747d6.
EAX: 00000000 EBX: 00000000 ECX: 00000000 EDX: 00000000
ESI: 00000000 EDI: 00000000 EBP: 00000000 ESP: bfe864b0
DS: 007b ES: 007b FS: 0000 GS: 0000 SS: 007b EFLAGS: 00000200
---[ end trace 0000000000000000 ]---
EIP: exit_to_user_mode_prepare+0x90/0xe1
Code: 30 02 00 75 ad 0f ba e3 16 73 05 e8 a7 a5 fc ff 0f ba e3 0e 73 05 e8 3e af fc ff a1 c4 c6 51 c0 85 c0 7e 13 8b 0d ac 01 53 c0 <0f> 31 0f b6 c0 31 c1 89 0d ac 01 53 c0 83 3d 30 ed 62 c0 00 75 33
EAX: 00000001 EBX: 00004000 ECX: 00000000 EDX: 000004ff
ESI: c10253c0 EDI: 00000000 EBP: c1027f98 ESP: c1027f8c
DS: 007b ES: 007b FS: 0000 GS: 0000 SS: 0068 EFLAGS: 00010002
CR0: 80050033 CR2: b7f747d6 CR3: 012e0000 CR4: 00000000
Kernel panic - not syncing: Fatal exception

Therefore switch to our generic entropy source and use `get_random_u8'
instead, which according to Jason A. Donenfeld is supposed to be fast
enough:

"Generally it's very very fast, as most cases wind up being only a
memcpy -- in this case, a single byte copy. So by and large it should
be suitable. It's fast enough now that most networking things are able
to use it. And lots of other places where you'd want really high
performance. So I'd expect it's okay to use here too. And if it is too
slow, we should figure out how to make it faster. But I don't suspect
it'll be too slow."

Signed-off-by: Maciej W. Rozycki <[email protected]>
Suggested-by: Jason A. Donenfeld <[email protected]>
Fixes: fe950f602033 ("x86/entry: Enable random_kstack_offset support")
Cc: [email protected] # v5.13+
---
Changes from v2:

- Use `get_random_u8' rather than `rdtsc', universally; update the heading
(was: "x86: Disable kernel stack offset randomization for !TSC") and the
description accordingly.

- As a security concern, mark for backporting.

Changes from v1:

- Disable randomization at run time rather than in configuration.
---
arch/x86/include/asm/entry-common.h | 3 ++-
1 file changed, 2 insertions(+), 1 deletion(-)

linux-x86-randomize-kstack-offset-random-u8.diff
Index: linux-macro/arch/x86/include/asm/entry-common.h
===================================================================
--- linux-macro.orig/arch/x86/include/asm/entry-common.h
+++ linux-macro/arch/x86/include/asm/entry-common.h
@@ -2,6 +2,7 @@
 #ifndef _ASM_X86_ENTRY_COMMON_H
 #define _ASM_X86_ENTRY_COMMON_H
 
+#include <linux/random.h>
 #include <linux/randomize_kstack.h>
 #include <linux/user-return-notifier.h>
 
@@ -85,7 +86,7 @@ static inline void arch_exit_to_user_mod
 	 * Therefore, final stack offset entropy will be 5 (x86_64) or
	 * 6 (ia32) bits.
	 */
-	choose_random_kstack_offset(rdtsc() & 0xFF);
+	choose_random_kstack_offset(get_random_u8());
 }
 #define arch_exit_to_user_mode_prepare arch_exit_to_user_mode_prepare



2023-01-31 19:35:05

by Jason A. Donenfeld

Subject: Re: [PATCH v3] x86: Use `get_random_u8' for kernel stack offset randomization

On Mon, Jan 30, 2023 at 09:30:31PM +0000, Maciej W. Rozycki wrote:
> On x86, kernel stack offset randomization uses the RDTSC instruction,
> which according to H. Peter Anvin is not a secure source of entropy:
>
> "RDTSC isn't a super fast instruction either, but what is *way* more
> significant is that this use of RDTSC is NOT safe: in certain power states
> it may very well be that some number of lower bits of TSC contain no
> entropy at all."
>
> It also causes an invalid opcode exception with hardware that does not
> implement this instruction:
>
> process '/sbin/init' started with executable stack
> invalid opcode: 0000 [#1]
> CPU: 0 PID: 1 Comm: init Not tainted 6.1.0-rc4+ #1
> EIP: exit_to_user_mode_prepare+0x90/0xe1
> Code: 30 02 00 75 ad 0f ba e3 16 73 05 e8 a7 a5 fc ff 0f ba e3 0e 73 05 e8 3e af fc ff a1 c4 c6 51 c0 85 c0 7e 13 8b 0d ac 01 53 c0 <0f> 31 0f b6 c0 31 c1 89 0d ac 01 53 c0 83 3d 30 ed 62 c0 00 75 33
> EAX: 00000001 EBX: 00004000 ECX: 00000000 EDX: 000004ff
> ESI: c10253c0 EDI: 00000000 EBP: c1027f98 ESP: c1027f8c
> DS: 007b ES: 007b FS: 0000 GS: 0000 SS: 0068 EFLAGS: 00010002
> CR0: 80050033 CR2: bfe8659b CR3: 012e0000 CR4: 00000000
> Call Trace:
> ? rest_init+0x72/0x72
> syscall_exit_to_user_mode+0x15/0x27
> ret_from_fork+0x10/0x30
> EIP: 0xb7f74800
> Code: Unable to access opcode bytes at 0xb7f747d6.
> EAX: 00000000 EBX: 00000000 ECX: 00000000 EDX: 00000000
> ESI: 00000000 EDI: 00000000 EBP: 00000000 ESP: bfe864b0
> DS: 007b ES: 007b FS: 0000 GS: 0000 SS: 007b EFLAGS: 00000200
> ---[ end trace 0000000000000000 ]---
> EIP: exit_to_user_mode_prepare+0x90/0xe1
> Code: 30 02 00 75 ad 0f ba e3 16 73 05 e8 a7 a5 fc ff 0f ba e3 0e 73 05 e8 3e af fc ff a1 c4 c6 51 c0 85 c0 7e 13 8b 0d ac 01 53 c0 <0f> 31 0f b6 c0 31 c1 89 0d ac 01 53 c0 83 3d 30 ed 62 c0 00 75 33
> EAX: 00000001 EBX: 00004000 ECX: 00000000 EDX: 000004ff
> ESI: c10253c0 EDI: 00000000 EBP: c1027f98 ESP: c1027f8c
> DS: 007b ES: 007b FS: 0000 GS: 0000 SS: 0068 EFLAGS: 00010002
> CR0: 80050033 CR2: b7f747d6 CR3: 012e0000 CR4: 00000000
> Kernel panic - not syncing: Fatal exception
>
> Therefore switch to our generic entropy source and use `get_random_u8'
> instead, which according to Jason A. Donenfeld is supposed to be fast
> enough:
>
> "Generally it's very very fast, as most cases wind up being only a
> memcpy -- in this case, a single byte copy. So by and large it should
> be suitable. It's fast enough now that most networking things are able
> to use it. And lots of other places where you'd want really high
> performance. So I'd expect it's okay to use here too. And if it is too
> slow, we should figure out how to make it faster. But I don't suspect
> it'll be too slow."
>
> Signed-off-by: Maciej W. Rozycki <[email protected]>
> Suggested-by: Jason A. Donenfeld <[email protected]>
> Fixes: fe950f602033 ("x86/entry: Enable random_kstack_offset support")
> Cc: [email protected] # v5.13+
> ---
> Changes from v2:
>
> - Use `get_random_u8' rather than `rdtsc', universally; update the heading
> (was: "x86: Disable kernel stack offset randomization for !TSC") and the
> description accordingly.
>
> - As a security concern, mark for backporting.
>
> Changes from v1:
>
> - Disable randomization at run time rather than in configuration.
> ---
> arch/x86/include/asm/entry-common.h | 3 ++-
> 1 file changed, 2 insertions(+), 1 deletion(-)
>
> linux-x86-randomize-kstack-offset-random-u8.diff
> Index: linux-macro/arch/x86/include/asm/entry-common.h
> ===================================================================
> --- linux-macro.orig/arch/x86/include/asm/entry-common.h
> +++ linux-macro/arch/x86/include/asm/entry-common.h
> @@ -2,6 +2,7 @@
> #ifndef _ASM_X86_ENTRY_COMMON_H
> #define _ASM_X86_ENTRY_COMMON_H
>
> +#include <linux/random.h>
> #include <linux/randomize_kstack.h>
> #include <linux/user-return-notifier.h>
>
> @@ -85,7 +86,7 @@ static inline void arch_exit_to_user_mod
> * Therefore, final stack offset entropy will be 5 (x86_64) or
> * 6 (ia32) bits.
> */
> - choose_random_kstack_offset(rdtsc() & 0xFF);
> + choose_random_kstack_offset(get_random_u8());
> }
> #define arch_exit_to_user_mode_prepare arch_exit_to_user_mode_prepare

Acked-by: Jason A. Donenfeld <[email protected]>

2023-01-31 20:52:49

by H. Peter Anvin

Subject: Re: [PATCH v3] x86: Use `get_random_u8' for kernel stack offset randomization

On January 30, 2023 1:30:31 PM PST, "Maciej W. Rozycki" <[email protected]> wrote:
>On x86, kernel stack offset randomization uses the RDTSC instruction,
>which according to H. Peter Anvin is not a secure source of entropy:
>
>"RDTSC isn't a super fast instruction either, but what is *way* more
>significant is that this use of RDTSC is NOT safe: in certain power states
>it may very well be that some number of lower bits of TSC contain no
>entropy at all."
>
>It also causes an invalid opcode exception with hardware that does not
>implement this instruction:
>
>process '/sbin/init' started with executable stack
>invalid opcode: 0000 [#1]
>CPU: 0 PID: 1 Comm: init Not tainted 6.1.0-rc4+ #1
>EIP: exit_to_user_mode_prepare+0x90/0xe1
>Code: 30 02 00 75 ad 0f ba e3 16 73 05 e8 a7 a5 fc ff 0f ba e3 0e 73 05 e8 3e af fc ff a1 c4 c6 51 c0 85 c0 7e 13 8b 0d ac 01 53 c0 <0f> 31 0f b6 c0 31 c1 89 0d ac 01 53 c0 83 3d 30 ed 62 c0 00 75 33
>EAX: 00000001 EBX: 00004000 ECX: 00000000 EDX: 000004ff
>ESI: c10253c0 EDI: 00000000 EBP: c1027f98 ESP: c1027f8c
>DS: 007b ES: 007b FS: 0000 GS: 0000 SS: 0068 EFLAGS: 00010002
>CR0: 80050033 CR2: bfe8659b CR3: 012e0000 CR4: 00000000
>Call Trace:
> ? rest_init+0x72/0x72
> syscall_exit_to_user_mode+0x15/0x27
> ret_from_fork+0x10/0x30
>EIP: 0xb7f74800
>Code: Unable to access opcode bytes at 0xb7f747d6.
>EAX: 00000000 EBX: 00000000 ECX: 00000000 EDX: 00000000
>ESI: 00000000 EDI: 00000000 EBP: 00000000 ESP: bfe864b0
>DS: 007b ES: 007b FS: 0000 GS: 0000 SS: 007b EFLAGS: 00000200
>---[ end trace 0000000000000000 ]---
>EIP: exit_to_user_mode_prepare+0x90/0xe1
>Code: 30 02 00 75 ad 0f ba e3 16 73 05 e8 a7 a5 fc ff 0f ba e3 0e 73 05 e8 3e af fc ff a1 c4 c6 51 c0 85 c0 7e 13 8b 0d ac 01 53 c0 <0f> 31 0f b6 c0 31 c1 89 0d ac 01 53 c0 83 3d 30 ed 62 c0 00 75 33
>EAX: 00000001 EBX: 00004000 ECX: 00000000 EDX: 000004ff
>ESI: c10253c0 EDI: 00000000 EBP: c1027f98 ESP: c1027f8c
>DS: 007b ES: 007b FS: 0000 GS: 0000 SS: 0068 EFLAGS: 00010002
>CR0: 80050033 CR2: b7f747d6 CR3: 012e0000 CR4: 00000000
>Kernel panic - not syncing: Fatal exception
>
>Therefore switch to our generic entropy source and use `get_random_u8'
>instead, which according to Jason A. Donenfeld is supposed to be fast
>enough:
>
>"Generally it's very very fast, as most cases wind up being only a
>memcpy -- in this case, a single byte copy. So by and large it should
>be suitable. It's fast enough now that most networking things are able
>to use it. And lots of other places where you'd want really high
>performance. So I'd expect it's okay to use here too. And if it is too
>slow, we should figure out how to make it faster. But I don't suspect
>it'll be too slow."
>
>Signed-off-by: Maciej W. Rozycki <[email protected]>
>Suggested-by: Jason A. Donenfeld <[email protected]>
>Fixes: fe950f602033 ("x86/entry: Enable random_kstack_offset support")
>Cc: [email protected] # v5.13+
>---
>Changes from v2:
>
>- Use `get_random_u8' rather than `rdtsc', universally; update the heading
> (was: "x86: Disable kernel stack offset randomization for !TSC") and the
> description accordingly.
>
>- As a security concern, mark for backporting.
>
>Changes from v1:
>
>- Disable randomization at run time rather than in configuration.
>---
> arch/x86/include/asm/entry-common.h | 3 ++-
> 1 file changed, 2 insertions(+), 1 deletion(-)
>
>linux-x86-randomize-kstack-offset-random-u8.diff
>Index: linux-macro/arch/x86/include/asm/entry-common.h
>===================================================================
>--- linux-macro.orig/arch/x86/include/asm/entry-common.h
>+++ linux-macro/arch/x86/include/asm/entry-common.h
>@@ -2,6 +2,7 @@
> #ifndef _ASM_X86_ENTRY_COMMON_H
> #define _ASM_X86_ENTRY_COMMON_H
>
>+#include <linux/random.h>
> #include <linux/randomize_kstack.h>
> #include <linux/user-return-notifier.h>
>
>@@ -85,7 +86,7 @@ static inline void arch_exit_to_user_mod
> * Therefore, final stack offset entropy will be 5 (x86_64) or
> * 6 (ia32) bits.
> */
>- choose_random_kstack_offset(rdtsc() & 0xFF);
>+ choose_random_kstack_offset(get_random_u8());
> }
> #define arch_exit_to_user_mode_prepare arch_exit_to_user_mode_prepare
>

Well, what I said was that masking out the low bits of the TSC is not a valid way to extract a random(-ish) number, because the lower bits may be affected by quantization. Something like a circular multiply using a large prime with a good 0:1 balance can be used to mitigate that.

However, the second part is that subsequent RDTSCs will be highly correlated, and so a CSPRNG is needed if you are actually trying to get reasonable security this way – and, well, we already have one of those.

2023-01-31 21:02:03

by Miko Larsson

Subject: Re: [PATCH v3] x86: Use `get_random_u8' for kernel stack offset randomization

On Mon, 2023-01-30 at 21:30 +0000, Maciej W. Rozycki wrote:
> On x86, kernel stack offset randomization uses the RDTSC instruction,
> which according to H. Peter Anvin is not a secure source of entropy:
>
> "RDTSC isn't a super fast instruction either, but what is *way* more
> significant is that this use of RDTSC is NOT safe: in certain power
> states
> it may very well be that some number of lower bits of TSC contain no
> entropy at all."
>
> It also causes an invalid opcode exception with hardware that does
> not
> implement this instruction:
>
> process '/sbin/init' started with executable stack
> invalid opcode: 0000 [#1]
> CPU: 0 PID: 1 Comm: init Not tainted 6.1.0-rc4+ #1
> EIP: exit_to_user_mode_prepare+0x90/0xe1
> Code: 30 02 00 75 ad 0f ba e3 16 73 05 e8 a7 a5 fc ff 0f ba e3 0e 73
> 05 e8 3e af fc ff a1 c4 c6 51 c0 85 c0 7e 13 8b 0d ac 01 53 c0 <0f>
> 31 0f b6 c0 31 c1 89 0d ac 01 53 c0 83 3d 30 ed 62 c0 00 75 33
> EAX: 00000001 EBX: 00004000 ECX: 00000000 EDX: 000004ff
> ESI: c10253c0 EDI: 00000000 EBP: c1027f98 ESP: c1027f8c
> DS: 007b ES: 007b FS: 0000 GS: 0000 SS: 0068 EFLAGS: 00010002
> CR0: 80050033 CR2: bfe8659b CR3: 012e0000 CR4: 00000000
> Call Trace:
>  ? rest_init+0x72/0x72
>  syscall_exit_to_user_mode+0x15/0x27
>  ret_from_fork+0x10/0x30
> EIP: 0xb7f74800
> Code: Unable to access opcode bytes at 0xb7f747d6.
> EAX: 00000000 EBX: 00000000 ECX: 00000000 EDX: 00000000
> ESI: 00000000 EDI: 00000000 EBP: 00000000 ESP: bfe864b0
> DS: 007b ES: 007b FS: 0000 GS: 0000 SS: 007b EFLAGS: 00000200
> ---[ end trace 0000000000000000 ]---
> EIP: exit_to_user_mode_prepare+0x90/0xe1
> Code: 30 02 00 75 ad 0f ba e3 16 73 05 e8 a7 a5 fc ff 0f ba e3 0e 73
> 05 e8 3e af fc ff a1 c4 c6 51 c0 85 c0 7e 13 8b 0d ac 01 53 c0 <0f>
> 31 0f b6 c0 31 c1 89 0d ac 01 53 c0 83 3d 30 ed 62 c0 00 75 33
> EAX: 00000001 EBX: 00004000 ECX: 00000000 EDX: 000004ff
> ESI: c10253c0 EDI: 00000000 EBP: c1027f98 ESP: c1027f8c
> DS: 007b ES: 007b FS: 0000 GS: 0000 SS: 0068 EFLAGS: 00010002
> CR0: 80050033 CR2: b7f747d6 CR3: 012e0000 CR4: 00000000
> Kernel panic - not syncing: Fatal exception
>
> Therefore switch to our generic entropy source and use
> `get_random_u8'
> instead, which according to Jason A. Donenfeld is supposed to be fast
> enough:
>
> "Generally it's very very fast, as most cases wind up being only a
> memcpy -- in this case, a single byte copy. So by and large it should
> be suitable. It's fast enough now that most networking things are
> able
> to use it. And lots of other places where you'd want really high
> performance. So I'd expect it's okay to use here too. And if it is
> too
> slow, we should figure out how to make it faster. But I don't suspect
> it'll be too slow."
>
> Signed-off-by: Maciej W. Rozycki <[email protected]>
> Suggested-by: Jason A. Donenfeld <[email protected]>
> Fixes: fe950f602033 ("x86/entry: Enable random_kstack_offset
> support")
> Cc: [email protected] # v5.13+
> ---
> Changes from v2:
>
> - Use `get_random_u8' rather than `rdtsc', universally; update the
> heading
>   (was: "x86: Disable kernel stack offset randomization for !TSC")
> and the
>   description accordingly.
>
> - As a security concern, mark for backporting.
>
> Changes from v1:
>
> - Disable randomization at run time rather than in configuration.
> ---
>  arch/x86/include/asm/entry-common.h |    3 ++-
>  1 file changed, 2 insertions(+), 1 deletion(-)
>
> linux-x86-randomize-kstack-offset-random-u8.diff
> Index: linux-macro/arch/x86/include/asm/entry-common.h
> ===================================================================
> --- linux-macro.orig/arch/x86/include/asm/entry-common.h
> +++ linux-macro/arch/x86/include/asm/entry-common.h
> @@ -2,6 +2,7 @@
>  #ifndef _ASM_X86_ENTRY_COMMON_H
>  #define _ASM_X86_ENTRY_COMMON_H
>  
> +#include <linux/random.h>
>  #include <linux/randomize_kstack.h>
>  #include <linux/user-return-notifier.h>
>  
> @@ -85,7 +86,7 @@ static inline void arch_exit_to_user_mod
>          * Therefore, final stack offset entropy will be 5 (x86_64)
> or
>          * 6 (ia32) bits.
>          */
> -       choose_random_kstack_offset(rdtsc() & 0xFF);
> +       choose_random_kstack_offset(get_random_u8());
>  }
>  #define arch_exit_to_user_mode_prepare
> arch_exit_to_user_mode_prepare
>  
Tested-by: Miko Larsson <[email protected]>
--
~miko

2023-02-12 23:18:35

by Maciej W. Rozycki

Subject: [PING][PATCH v3] x86: Use `get_random_u8' for kernel stack offset randomization

On Mon, 30 Jan 2023, Maciej W. Rozycki wrote:

> On x86, kernel stack offset randomization uses the RDTSC instruction,
> which according to H. Peter Anvin is not a secure source of entropy:
>
> "RDTSC isn't a super fast instruction either, but what is *way* more
> significant is that this use of RDTSC is NOT safe: in certain power states
> it may very well be that some number of lower bits of TSC contain no
> entropy at all."

Ping for:
<https://lore.kernel.org/all/[email protected]/>.

Maciej


2023-02-13 19:02:28

by Thomas Gleixner

Subject: Re: [PING][PATCH v3] x86: Use `get_random_u8' for kernel stack offset randomization

On Sun, Feb 12 2023 at 23:17, Maciej W. Rozycki wrote:
> On Mon, 30 Jan 2023, Maciej W. Rozycki wrote:
>
>> On x86, kernel stack offset randomization uses the RDTSC instruction,
>> which according to H. Peter Anvin is not a secure source of entropy:
>>
>> "RDTSC isn't a super fast instruction either, but what is *way* more
>> significant is that this use of RDTSC is NOT safe: in certain power states
>> it may very well be that some number of lower bits of TSC contain no
>> entropy at all."
>
> Ping for:
> <https://lore.kernel.org/all/[email protected]/>.

I'm waiting for you to address Peter Anvin's feedback. You also cite him
w/o providing a link to the conversation, so any context is missing.

Thanks,

tglx

2023-02-13 19:04:30

by Thomas Gleixner

Subject: Re: [PATCH v3] x86: Use `get_random_u8' for kernel stack offset randomization

On Mon, Jan 30 2023 at 21:30, Maciej W. Rozycki wrote:
>
> Therefore switch to our generic entropy source and use `get_random_u8'
> instead, which according to Jason A. Donenfeld is supposed to be fast
> enough:
>
> "Generally it's very very fast, as most cases wind up being only a
> memcpy -- in this case, a single byte copy. So by and large it should
> be suitable. It's fast enough now that most networking things are able
> to use it. And lots of other places where you'd want really high
> performance. So I'd expect it's okay to use here too. And if it is too
> slow, we should figure out how to make it faster. But I don't suspect
> it'll be too slow."

Please provide numbers on contemporary hardware.

Up to that point, it's easy enough to just disable that randomization on
32bit.

Thanks,

tglx

2023-02-14 04:54:59

by Maciej W. Rozycki

Subject: Re: [PING][PATCH v3] x86: Use `get_random_u8' for kernel stack offset randomization

On Mon, 13 Feb 2023, Thomas Gleixner wrote:

> >> On x86, kernel stack offset randomization uses the RDTSC instruction,
> >> which according to H. Peter Anvin is not a secure source of entropy:
> >>
> >> "RDTSC isn't a super fast instruction either, but what is *way* more
> >> significant is that this use of RDTSC is NOT safe: in certain power states
> >> it may very well be that some number of lower bits of TSC contain no
> >> entropy at all."
> >
> > Ping for:
> > <https://lore.kernel.org/all/[email protected]/>.
>
> I'm waiting for you to address Peter Anvin's feedback.

Do you mean this part:

On Tue, 31 Jan 2023, H. Peter Anvin wrote:

> Well, what I said was that masking out the low bits of the TSC is not a valid way
> to extract a random(-ish) number, because the lower bits may be affected
> by quantization. Something like a circular multiply using a large prime with a
> good 0:1 balance can be used to mitigate that.
>
> However, the second part is that subsequent RDTSCs will be highly correlated,
> and so a CSPRNG is needed if you are actually trying to get reasonable security
> this way – and, well, we already have one of those.

? Well, I inferred, perhaps incorrectly, from the second paragraph that
Peter agrees with my approach (with the CSPRNG being what `get_random_u8'
and friends get at).

> You also cite him
> w/o providing a link to the conversation, so any context is missing.

Sorry about that. I put the change heading for the previous iterations
in the change log, but I agree actual web links would've been better:
<https://lore.kernel.org/all/[email protected]/>,
<https://lore.kernel.org/all/[email protected]/>.

Please let me know if you need anything else. Thank you for your review.

Maciej

2023-02-14 05:13:33

by Maciej W. Rozycki

Subject: Re: [PATCH v3] x86: Use `get_random_u8' for kernel stack offset randomization

On Mon, 13 Feb 2023, Thomas Gleixner wrote:

> On Mon, Jan 30 2023 at 21:30, Maciej W. Rozycki wrote:
> >
> > Therefore switch to our generic entropy source and use `get_random_u8'
> > instead, which according to Jason A. Donenfeld is supposed to be fast
> > enough:
> >
> > "Generally it's very very fast, as most cases wind up being only a
> > memcpy -- in this case, a single byte copy. So by and large it should
> > be suitable. It's fast enough now that most networking things are able
> > to use it. And lots of other places where you'd want really high
> > performance. So I'd expect it's okay to use here too. And if it is too
> > slow, we should figure out how to make it faster. But I don't suspect
> > it'll be too slow."
>
> Please provide numbers on contemporary hardware.

Jason, is this something you could help me with to back up your claim?

My access to modern x86 gear is limited and I just don't have anything I
can randomly fiddle with (I guess an Intel Core 2 Duo T5600 processor back
from 2008 doesn't count as "contemporary", does it?).

> Up to that point, it's easy enough to just disable that randomization on
> 32bit.

I think for 32-bit we could just go with `get_random_u8' unconditionally,
but if you'd rather I disabled the feature altogether such as in v1 or v2,
then I'm happy to resubmit whichever version seems the best, or make yet a
different one. Please mind the security implications of RDTSC raised in
the discussion though. Thanks for your feedback.

Maciej

2023-02-14 13:40:29

by Jason A. Donenfeld

Subject: Re: [PATCH v3] x86: Use `get_random_u8' for kernel stack offset randomization

On Tue, Feb 14, 2023 at 6:12 AM Maciej W. Rozycki <[email protected]> wrote:
>
> On Mon, 13 Feb 2023, Thomas Gleixner wrote:
>
> > On Mon, Jan 30 2023 at 21:30, Maciej W. Rozycki wrote:
> > >
> > > Therefore switch to our generic entropy source and use `get_random_u8'
> > > instead, which according to Jason A. Donenfeld is supposed to be fast
> > > enough:
> > >
> > > "Generally it's very very fast, as most cases wind up being only a
> > > memcpy -- in this case, a single byte copy. So by and large it should
> > > be suitable. It's fast enough now that most networking things are able
> > > to use it. And lots of other places where you'd want really high
> > > performance. So I'd expect it's okay to use here too. And if it is too
> > > slow, we should figure out how to make it faster. But I don't suspect
> > > it'll be too slow."
> >
> > Please provide numbers on contemporary hardware.
>
> Jason, is this something you could help me with to back up your claim?
>
> My access to modern x86 gear is limited and I just don't have anything I
> can randomly fiddle with (I guess an Intel Core 2 Duo T5600 processor back
> from 2008 doesn't count as "contemporary", does it?).

I imagine tglx wants real life performance numbers rather than a
microbench of the rng. So the thing to do would be to exercise
arch_exit_to_user_mode() a bunch. Does this trigger on every syscall,
even invalid ones? If so, you could make a test like:

#include <sys/syscall.h>
#include <unistd.h>

int main(int argc, char *argv[])
{
	for (int i = 0; i < (1 << 26); ++i)
		syscall(0xffffffff);
	return 0;
}

And then see if the timing changes across your patch.

Jason

2023-02-14 20:43:54

by H. Peter Anvin

Subject: Re: [PING][PATCH v3] x86: Use `get_random_u8' for kernel stack offset randomization

On February 13, 2023 8:54:53 PM PST, "Maciej W. Rozycki" <[email protected]> wrote:
>On Mon, 13 Feb 2023, Thomas Gleixner wrote:
>
>> >> On x86, kernel stack offset randomization uses the RDTSC instruction,
>> >> which according to H. Peter Anvin is not a secure source of entropy:
>> >>
>> >> "RDTSC isn't a super fast instruction either, but what is *way* more
>> >> significant is that this use of RDTSC is NOT safe: in certain power states
>> >> it may very well be that stone number of lower bits of TSC contain no
>> >> entropy at all."
>> >
>> > Ping for:
>> > <https://lore.kernel.org/all/[email protected]/>.
>>
>> I'm waiting for you to address Peter Anvin's feedback.
>
> Do you mean this part:
>
>On Tue, 31 Jan 2023, H. Peter Anvin wrote:
>
>> Well, what I said was that masking out the low bits of the TSC is not a valid way
>> to extract a random(-ish) number, because the lower bits may be affected
>> by quantization. Something like a circular multiply using a large prime with a
>> good 0:1 balance can be used to mitigate that.
>>
>> However, the second part is that subsequent RDTSCs will be highly correlated,
>> and so a CSPRNG is needed if you are actually trying to get reasonable security
>> this way – and, well, we already have one of those.
>
>? Well, I inferred, perhaps incorrectly, from the second paragraph that
>Peter agrees with my approach (with the CSPRNG being what `get_random_u8'
>and friends get at).
>
>> You also cite him
>> w/o providing a link to the conversation, so any context is missing.
>
> Sorry about that. I put the change heading for the previous iterations
>in the change log, but I agree actual web links would've been better:
><https://lore.kernel.org/all/[email protected]/>,
><https://lore.kernel.org/all/[email protected]/>.
>
> Please let me know if you need anything else. Thank you for your review.
>
> Maciej

No, I do indeed agree. We're talking about something that is part of an operation that is already fairly expensive. Now, if RDRAND is available on the hardware, then that could be used if someone really wants it to go faster... but get_random_*() seems saner than doing ad hoc hacks.

2023-02-22 12:06:45

by Maciej W. Rozycki

Subject: Re: [PATCH v3] x86: Use `get_random_u8' for kernel stack offset randomization

On Tue, 14 Feb 2023, Jason A. Donenfeld wrote:

> > > Please provide numbers on contemporary hardware.
> >
> > Jason, is this something you could help me with to back up your claim?
> >
> > My access to modern x86 gear is limited and I just don't have anything I
> > can randomly fiddle with (I guess an Intel Core 2 Duo T5600 processor back
> > from 2008 doesn't count as "contemporary", does it?).
>
> I imagine tglx wants real life performance numbers rather than a
> microbench of the rng. So the thing to do would be to exercise
> arch_exit_to_user_mode() a bunch. Does this trigger on every syscall,
> even invalid ones? If so, you could make a test like:
>
> #include <sys/syscall.h>
> #include <unistd.h>
>
> int main(int argc, char *argv[])
> {
> 	for (int i = 0; i < (1 << 26); ++i)
> 		syscall(0xffffffff);
> 	return 0;
> }
>
> And then see if the timing changes across your patch.

Thanks. Though that does not solve my lack of suitable hardware, sigh.
It's not like I have x86 systems scattered all over the place. I guess I
could try to benchmark with said T5600 piece, but it won't be until April
at the earliest, as I'm away most of the time.

Maciej

2023-02-22 16:44:30

by Jason A. Donenfeld

Subject: Re: [PING][PATCH v3] x86: Use `get_random_u8' for kernel stack offset randomization

On Tue, Feb 14, 2023 at 9:43 PM H. Peter Anvin <[email protected]> wrote:
> No, I do indeed agree. We're talking something that is a part of an operation that is already fairly expensive. Now, if RDRAND is available on the hardware then that could be used if someone really wants it to go faster... but get_random_*() seems saner than doing ad hoc hacks.

Do you want to go ahead and apply Maciej's patch in [1] then?

[1] https://lore.kernel.org/lkml/[email protected]/

2023-06-05 16:11:14

by Maciej W. Rozycki

Subject: Re: [PATCH v3] x86: Use `get_random_u8' for kernel stack offset randomization

On Wed, 22 Feb 2023, Maciej W. Rozycki wrote:

> > > > Please provide numbers on contemporary hardware.
> > >
> > > Jason, is this something you could help me with to back up your claim?
> > >
> > > My access to modern x86 gear is limited and I just don't have anything I
> > > can randomly fiddle with (I guess an Intel Core 2 Duo T5600 processor back
> > > from 2008 doesn't count as "contemporary", does it?).
> >
> > I imagine tglx wants real life performance numbers rather than a
> > microbench of the rng. So the thing to do would be to exercise
> > arch_exit_to_user_mode() a bunch. Does this trigger on every syscall,
> > even invalid ones? If so, you could make a test like:
> >
> > #include <sys/syscall.h>
> > #include <unistd.h>
> >
> > int main(int argc, char *argv[])
> > {
> > 	for (int i = 0; i < (1 << 26); ++i)
> > 		syscall(0xffffffff);
> > 	return 0;
> > }
> >
> > And then see if the timing changes across your patch.
>
> Thanks. Though that does not solve my lack of suitable hardware, sigh.
> It's not like I have x86 systems scattered all over the place. I guess I
> could try to benchmark with said T5600 piece, but it won't be until April
> at the earliest, as I'm away most of the time.

Thank you for waiting. I have now been able to arrange for benchmarking
with an "Intel(R) Core(TM)2 CPU T5600 @ 1.83GHz" piece. I did some minor
research and chose `perf bench syscall all' to evaluate the change, as this
tool is readily available and even bundled with Linux. Results are as
follows:

1. Randomisation configured in, but disabled:

# Running syscall/basic benchmark...
# Executed 10000000 getppid() calls
Total time: 4.601 [sec]

0.460165 usecs/op
2173132 ops/sec

# Running syscall/getpgid benchmark...
# Executed 10000000 getpgid() calls
Total time: 3.241 [sec]

0.324109 usecs/op
3085383 ops/sec

# Running syscall/execve benchmark...
# Executed 10000 execve() calls
Total time: 7.041 [sec]

704.193800 usecs/op
1420 ops/sec

2. Randomisation enabled, using RDTSC:

# Running syscall/basic benchmark...
# Executed 10000000 getppid() calls
Total time: 4.995 [sec]

0.499529 usecs/op
2001886 ops/sec

# Running syscall/getpgid benchmark...
# Executed 10000000 getpgid() calls
Total time: 3.625 [sec]

0.362521 usecs/op
2758460 ops/sec

# Running syscall/execve benchmark...
# Executed 10000 execve() calls
Total time: 7.009 [sec]

700.990800 usecs/op
1426 ops/sec

3. Randomisation enabled, using `get_random_u8':

# Running syscall/basic benchmark...
# Executed 10000000 getppid() calls
Total time: 6.053 [sec]

0.605394 usecs/op
1651817 ops/sec

# Running syscall/getpgid benchmark...
# Executed 10000000 getpgid() calls
Total time: 4.641 [sec]

0.464124 usecs/op
2154598 ops/sec

# Running syscall/execve benchmark...
# Executed 10000 execve() calls
Total time: 7.023 [sec]

702.355400 usecs/op
1423 ops/sec

There is some variance between runs, but the trend is stable. NB this has
been obtained with 6.3.0 (both Linux and `perf') and GCC 11.

So in terms of getppid() throughput, enabling randomisation makes fast
syscalls 8% slower with RDTSC and 24% slower with `get_random_u8'. It was
expected that a call to `get_random_u8' would be slower than RDTSC. But
can we accept that slowdown, given the security concerns about RDTSC?

What are the next steps then?

Maciej