2018-01-19 00:11:41

by Dan Williams

Subject: [PATCH v4 00/10] prevent bounds-check bypass via speculative execution

Changes since v3 [1]
* Drop 'ifence_array_ptr' and associated compile-time + run-time
switching and just use the masking approach all the time.

* Convert 'get_user' to use pointer sanitization via masking rather than
lfence. '__get_user' and associated paths still rely on
lfence. (Linus)

"Basically, the rule is trivial: find all 'stac' users, and use
address masking if those users already integrate the limit
check, and lfence if they don't."

* At syscall entry sanitize the syscall number under speculation
to remove a user controlled pointer de-reference in kernel
space. (Linus)

* Fix a raw lfence in the kvm code (added for v4.15-rc8) to use
'array_ptr'.

* Propose 'array_idx' as a way to sanitize user input that is
later used as an array index, but where the validation is
happening in a different code block than the array reference.
(Christian).

* Fix grammar in speculation.txt (Kees)

---

Quoting Mark's original RFC:

"Recently, Google Project Zero discovered several classes of attack
against speculative execution. One of these, known as variant-1, allows
explicit bounds checks to be bypassed under speculation, providing an
arbitrary read gadget. Further details can be found on the GPZ blog [2]
and the Documentation patch in this series."

A precondition of using this attack on the kernel is to get a user
controlled pointer de-referenced (under speculation) in privileged code.
The primary source of user controlled pointers in the kernel is the
arguments passed to 'get_user' and '__get_user'. Other examples of user
controlled pointers are user-controlled array / pointer offsets.

Better tooling is needed to find more arrays / pointers with user
controlled indices / offsets that can be converted to use 'array_ptr' or
'array_idx'. A few are included in this set, and these are not expected
to be complete. That said, the 'get_user' protections raise the bar on
finding a vulnerable gadget in the kernel.

These patches are also available via the 'nospec-v4' git branch here:

git://git.kernel.org/pub/scm/linux/kernel/git/djbw/linux nospec-v4

Note that the BPF fix for Spectre variant-1 is merged for 4.15-rc8.

[2]: https://googleprojectzero.blogspot.co.uk/2018/01/reading-privileged-memory-with-side.html
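
For reference, a typical conversion follows this pattern (an
illustrative sketch only; 'array_ptr' itself is introduced in patch 2
of this series):

    /* before: the bounds check alone does not constrain speculation */
    if (idx < ARRAY_SIZE(table))
        val = table[idx];

    /* after: the returned pointer is NULL under mis-speculation */
    int *elem = array_ptr(table, idx, ARRAY_SIZE(table));

    if (elem)
        val = *elem;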

---

Dan Williams (9):
asm/nospec, array_ptr: sanitize speculative array de-references
x86: implement array_ptr_mask()
x86: introduce __uaccess_begin_nospec and ifence
x86, __get_user: use __uaccess_begin_nospec
x86, get_user: use pointer masking to limit speculation
x86: narrow out of bounds syscalls to sys_read under speculation
vfs, fdtable: prevent bounds-check bypass via speculative execution
kvm, x86: fix spectre-v1 mitigation
nl80211: sanitize array index in parse_txq_params

Mark Rutland (1):
Documentation: document array_ptr


Documentation/speculation.txt | 143 +++++++++++++++++++++++++++++++++++++
arch/x86/entry/entry_64.S | 2 +
arch/x86/include/asm/barrier.h | 28 +++++++
arch/x86/include/asm/msr.h | 3 -
arch/x86/include/asm/smap.h | 24 ++++++
arch/x86/include/asm/uaccess.h | 15 +++-
arch/x86/include/asm/uaccess_32.h | 6 +-
arch/x86/include/asm/uaccess_64.h | 12 ++-
arch/x86/kvm/vmx.c | 19 ++---
arch/x86/lib/getuser.S | 5 +
arch/x86/lib/usercopy_32.c | 8 +-
include/linux/fdtable.h | 7 +-
include/linux/nospec.h | 65 +++++++++++++++++
net/wireless/nl80211.c | 10 ++-
14 files changed, 312 insertions(+), 35 deletions(-)
create mode 100644 Documentation/speculation.txt
create mode 100644 include/linux/nospec.h


2018-01-19 00:12:00

by Dan Williams

Subject: [PATCH v4 01/10] Documentation: document array_ptr

From: Mark Rutland <[email protected]>

Document the rationale and usage of the new array_ptr() helper.

Signed-off-by: Mark Rutland <[email protected]>
Signed-off-by: Will Deacon <[email protected]>
Cc: Dan Williams <[email protected]>
Cc: Jonathan Corbet <[email protected]>
Cc: Peter Zijlstra <[email protected]>
Reviewed-by: Kees Cook <[email protected]>
Signed-off-by: Dan Williams <[email protected]>
---
Documentation/speculation.txt | 143 +++++++++++++++++++++++++++++++++++++++++
1 file changed, 143 insertions(+)
create mode 100644 Documentation/speculation.txt

diff --git a/Documentation/speculation.txt b/Documentation/speculation.txt
new file mode 100644
index 000000000000..a47fbffe0dab
--- /dev/null
+++ b/Documentation/speculation.txt
@@ -0,0 +1,143 @@
+This document explains potential effects of speculation, and how undesirable
+effects can be mitigated portably using common APIs.
+
+===========
+Speculation
+===========
+
+To improve performance and minimize average latencies, many contemporary CPUs
+employ speculative execution techniques such as branch prediction, performing
+work which may be discarded at a later stage.
+
+Typically speculative execution cannot be observed from architectural state,
+such as the contents of registers. However, in some cases it is possible to
+observe its impact on microarchitectural state, such as the presence or
+absence of data in caches. Such state may form side-channels which can be
+observed to extract secret information.
+
+For example, in the presence of branch prediction, it is possible for bounds
+checks to be ignored by code which is speculatively executed. Consider the
+following code:
+
+ int load_array(int *array, unsigned int idx)
+ {
+ if (idx >= MAX_ARRAY_ELEMS)
+ return 0;
+ else
+ return array[idx];
+ }
+
+Which, on arm64, may be compiled to an assembly sequence such as:
+
+ CMP <idx>, #MAX_ARRAY_ELEMS
+ B.LT less
+ MOV <returnval>, #0
+ RET
+ less:
+ LDR <returnval>, [<array>, <idx>]
+ RET
+
+It is possible that a CPU mis-predicts the conditional branch, and
+speculatively loads array[idx], even if idx >= MAX_ARRAY_ELEMS. This value
+will subsequently be discarded, but the speculated load may affect
+microarchitectural state which can be subsequently measured.
+
+More complex sequences involving multiple dependent memory accesses may result
+in sensitive information being leaked. Consider the following code, building
+on the prior example:
+
+ int load_dependent_arrays(int *arr1, int *arr2, int idx)
+ {
+ int val1, val2;
+
+ val1 = load_array(arr1, idx);
+ val2 = load_array(arr2, val1);
+
+ return val2;
+ }
+
+Under speculation, the first call to load_array() may return the value of an
+out-of-bounds address, while the second call will influence microarchitectural
+state dependent on this value. This may provide an arbitrary read primitive.
+
+====================================
+Mitigating speculation side-channels
+====================================
+
+The kernel provides a generic API to ensure that bounds checks are respected
+even under speculation. Architectures which are affected by speculation-based
+side-channels are expected to implement these primitives.
+
+The array_ptr() helper in <linux/nospec.h> can be used to prevent
+information from being leaked via side-channels.
+
+A call to array_ptr(arr, idx, sz) returns a sanitized pointer to
+arr[idx] only if idx falls in the [0, sz) interval. When idx < 0 or
+idx >= sz, NULL is returned. Additionally, an out-of-bounds pointer is
+not propagated to code which is speculatively executed.
+
+This can be used to protect the earlier load_array() example:
+
+ int load_array(int *array, unsigned int idx)
+ {
+ int *elem;
+
+ elem = array_ptr(array, idx, MAX_ARRAY_ELEMS);
+ if (elem)
+ return *elem;
+ else
+ return 0;
+ }
+
+This can also be used in situations where multiple fields on a structure are
+accessed:
+
+ struct foo array[SIZE];
+ int a, b;
+
+ void do_thing(int idx)
+ {
+ struct foo *elem;
+
+ elem = array_ptr(array, idx, SIZE);
+ if (elem) {
+ a = elem->field_a;
+ b = elem->field_b;
+ }
+ }
+
+It is imperative that the returned pointer is used. Pointers which are
+generated separately are subject to a number of potential CPU and compiler
+optimizations, and may still be used speculatively. For example, this means
+that the following sequence is unsafe:
+
+ struct foo array[SIZE];
+ int a, b;
+
+ void do_thing(int idx)
+ {
+ if (array_ptr(array, idx, SIZE) != NULL) {
+ // unsafe as wrong pointer is used
+ a = array[idx].field_a;
+ b = array[idx].field_b;
+ }
+ }
+
+Similarly, it is unsafe to compare the returned pointer with other pointers,
+as this may permit the compiler to substitute one pointer with another,
+permitting speculation. For example, the following sequence is unsafe:
+
+ struct foo array[SIZE];
+ int a, b;
+
+ void do_thing(int idx)
+ {
+ struct foo *elem = array_ptr(array, idx, SIZE);
+
+ // unsafe due to pointer substitution
+ if (elem == &array[idx]) {
+ a = elem->field_a;
+ b = elem->field_b;
+ }
+ }
+


2018-01-19 00:12:15

by Dan Williams

Subject: [PATCH v4 02/10] asm/nospec, array_ptr: sanitize speculative array de-references

'array_ptr' is proposed as a generic mechanism to mitigate against
Spectre-variant-1 attacks, i.e. an attack that bypasses boundary checks
via speculative execution. The 'array_ptr' implementation is expected
to be safe for current generation cpus across multiple architectures
(ARM, x86).

Based on an original implementation by Linus Torvalds, tweaked to remove
speculative flows by Alexei Starovoitov, and tweaked again by Linus to
introduce an x86 assembly implementation for the mask generation.
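
As an illustration (not part of the patch), the generic C mask can be
sanity checked with a standalone userspace program; this sketch assumes
gcc's arithmetic right shift for signed longs, which the kernel build
already relies on:

    #include <stdio.h>

    #define BITS_PER_LONG (sizeof(long) * 8)

    static unsigned long array_ptr_mask(unsigned long idx, unsigned long sz)
    {
        return ~(long)(idx | (sz - 1 - idx)) >> (BITS_PER_LONG - 1);
    }

    int main(void)
    {
        printf("%lx\n", array_ptr_mask(1, 4));    /* in-bounds: all ones */
        printf("%lx\n", array_ptr_mask(5, 4));    /* out-of-bounds: 0 */
        printf("%lx\n", array_ptr_mask(-1UL, 4)); /* "negative" idx: 0 */
        return 0;
    }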

Co-developed-by: Linus Torvalds <[email protected]>
Co-developed-by: Alexei Starovoitov <[email protected]>
Co-developed-by: Peter Zijlstra <[email protected]>
Cc: Russell King <[email protected]>
Cc: Catalin Marinas <[email protected]>
Cc: Will Deacon <[email protected]>
Cc: Thomas Gleixner <[email protected]>
Cc: Ingo Molnar <[email protected]>
Cc: "H. Peter Anvin" <[email protected]>
Cc: [email protected]
Signed-off-by: Dan Williams <[email protected]>
---
include/linux/nospec.h | 44 ++++++++++++++++++++++++++++++++++++++++++++
1 file changed, 44 insertions(+)
create mode 100644 include/linux/nospec.h

diff --git a/include/linux/nospec.h b/include/linux/nospec.h
new file mode 100644
index 000000000000..f841c11f3f1f
--- /dev/null
+++ b/include/linux/nospec.h
@@ -0,0 +1,44 @@
+// SPDX-License-Identifier: GPL-2.0
+// Copyright(c) 2018 Intel Corporation. All rights reserved.
+
+#ifndef __NOSPEC_H__
+#define __NOSPEC_H__
+
+#include <linux/jump_label.h>
+#include <asm/barrier.h>
+
+/*
+ * If idx is negative or if idx > size then bit 63 is set in the mask,
+ * and the value of ~(-1L) is zero. When the mask is zero, bounds check
+ * failed, array_ptr will return NULL.
+ */
+#ifndef array_ptr_mask
+static inline unsigned long array_ptr_mask(unsigned long idx, unsigned long sz)
+{
+ return ~(long)(idx | (sz - 1 - idx)) >> (BITS_PER_LONG - 1);
+}
+#endif
+
+/**
+ * array_ptr - Generate a pointer to an array element, ensuring
+ * the pointer is bounded under speculation to NULL.
+ *
+ * @base: the base of the array
+ * @idx: the index of the element, must be less than LONG_MAX
+ * @sz: the number of elements in the array, must be less than LONG_MAX
+ *
+ * If @idx falls in the interval [0, @sz), returns the pointer to
+ * @base[@idx], otherwise returns NULL.
+ */
+#define array_ptr(base, idx, sz) \
+({ \
+ union { typeof(*(base)) *_ptr; unsigned long _bit; } __u; \
+ typeof(*(base)) *_arr = (base); \
+ unsigned long _i = (idx); \
+ unsigned long _mask = array_ptr_mask(_i, (sz)); \
+ \
+ __u._ptr = _arr + (_i & _mask); \
+ __u._bit &= _mask; \
+ __u._ptr; \
+})
+#endif /* __NOSPEC_H__ */


2018-01-19 00:12:54

by Dan Williams

Subject: [PATCH v4 04/10] x86: introduce __uaccess_begin_nospec and ifence

For '__get_user' paths, do not allow the kernel to speculate on the
value of a user controlled pointer. In addition to the 'stac'
instruction for Supervisor Mode Access Protection, an 'ifence' causes
the 'access_ok' result to resolve in the pipeline before the cpu might
take any speculative action on the pointer value.

Since __get_user is a major kernel interface that deals with user
controlled pointers, the '__uaccess_begin_nospec' mechanism will prevent
speculative execution past an 'access_ok' permission check. While
speculative execution past 'access_ok' is not enough to lead to a kernel
memory leak, it is a necessary precondition.

To be clear, '__uaccess_begin_nospec' is addressing a class of potential
problems near '__get_user' usages.

Note that while 'ifence' is used to protect '__get_user', pointer masking
will be used for 'get_user' since it incorporates a bounds check near
the usage.

There are no functional changes in this patch.
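
For illustration, the '__get_user' conversion in the next patch follows
this pattern (a sketch mirroring that diff):

    /* the limit check is far away from the de-reference, so fence */
    __uaccess_begin_nospec();    /* stac + ifence */
    __get_user_size(__gu_val, (ptr), (size), __gu_err, -EFAULT);
    __uaccess_end();             /* clac */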

Suggested-by: Linus Torvalds <[email protected]>
Suggested-by: Andi Kleen <[email protected]>
Cc: Tom Lendacky <[email protected]>
Cc: Al Viro <[email protected]>
Cc: Kees Cook <[email protected]>
Cc: Thomas Gleixner <[email protected]>
Cc: "H. Peter Anvin" <[email protected]>
Cc: Ingo Molnar <[email protected]>
Cc: [email protected]
Signed-off-by: Dan Williams <[email protected]>
---
arch/x86/include/asm/barrier.h | 4 ++++
arch/x86/include/asm/msr.h | 3 +--
arch/x86/include/asm/uaccess.h | 9 +++++++++
3 files changed, 14 insertions(+), 2 deletions(-)

diff --git a/arch/x86/include/asm/barrier.h b/arch/x86/include/asm/barrier.h
index 67f6d4707a2c..0f48c832d1fb 100644
--- a/arch/x86/include/asm/barrier.h
+++ b/arch/x86/include/asm/barrier.h
@@ -48,6 +48,10 @@ static inline unsigned long array_ptr_mask(unsigned long idx, unsigned long sz)
return mask;
}

+/* prevent speculative execution past this barrier */
+#define ifence() alternative_2("", "mfence", X86_FEATURE_MFENCE_RDTSC, \
+ "lfence", X86_FEATURE_LFENCE_RDTSC)
+
#ifdef CONFIG_X86_PPRO_FENCE
#define dma_rmb() rmb()
#else
diff --git a/arch/x86/include/asm/msr.h b/arch/x86/include/asm/msr.h
index 07962f5f6fba..e426d2a33ff3 100644
--- a/arch/x86/include/asm/msr.h
+++ b/arch/x86/include/asm/msr.h
@@ -214,8 +214,7 @@ static __always_inline unsigned long long rdtsc_ordered(void)
* that some other imaginary CPU is updating continuously with a
* time stamp.
*/
- alternative_2("", "mfence", X86_FEATURE_MFENCE_RDTSC,
- "lfence", X86_FEATURE_LFENCE_RDTSC);
+ ifence();
return rdtsc();
}

diff --git a/arch/x86/include/asm/uaccess.h b/arch/x86/include/asm/uaccess.h
index 574dff4d2913..626caf58183a 100644
--- a/arch/x86/include/asm/uaccess.h
+++ b/arch/x86/include/asm/uaccess.h
@@ -124,6 +124,11 @@ extern int __get_user_bad(void);

#define __uaccess_begin() stac()
#define __uaccess_end() clac()
+#define __uaccess_begin_nospec() \
+({ \
+ stac(); \
+ ifence(); \
+})

/*
* This is a type: either unsigned long, if the argument fits into
@@ -487,6 +492,10 @@ struct __large_struct { unsigned long buf[100]; };
__uaccess_begin(); \
barrier();

+#define uaccess_try_nospec do { \
+ current->thread.uaccess_err = 0; \
+ __uaccess_begin_nospec(); \
+
#define uaccess_catch(err) \
__uaccess_end(); \
(err) |= (current->thread.uaccess_err ? -EFAULT : 0); \


2018-01-19 00:13:39

by Dan Williams

Subject: [PATCH v4 03/10] x86: implement array_ptr_mask()

'array_ptr' uses a mask to sanitize user controllable pointers. The x86
'array_ptr_mask' is an assembler-optimized way to generate a 0 or ~0
mask depending on whether an array index is out-of-bounds or in-bounds.
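
For reference, the cmp/sbb pair computes the equivalent of the
following C expression (an illustrative sketch, not part of the patch),
with the borrow from the unsigned compare becoming the mask:

    /* CF = (idx < sz); 'sbb mask, mask' yields mask = 0 - CF */
    unsigned long mask = 0UL - (idx < sz);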

Suggested-by: Linus Torvalds <[email protected]>
Cc: Thomas Gleixner <[email protected]>
Cc: Ingo Molnar <[email protected]>
Cc: "H. Peter Anvin" <[email protected]>
Cc: [email protected]
Signed-off-by: Dan Williams <[email protected]>
---
arch/x86/include/asm/barrier.h | 24 ++++++++++++++++++++++++
1 file changed, 24 insertions(+)

diff --git a/arch/x86/include/asm/barrier.h b/arch/x86/include/asm/barrier.h
index 7fb336210e1b..67f6d4707a2c 100644
--- a/arch/x86/include/asm/barrier.h
+++ b/arch/x86/include/asm/barrier.h
@@ -24,6 +24,30 @@
#define wmb() asm volatile("sfence" ::: "memory")
#endif

+/**
+ * array_ptr_mask - generate a mask for array_ptr() that is ~0UL when
+ * the bounds check succeeds and 0 otherwise
+ */
+#define array_ptr_mask array_ptr_mask
+static inline unsigned long array_ptr_mask(unsigned long idx, unsigned long sz)
+{
+ unsigned long mask;
+
+ /*
+ * mask = index - size, if that result is >= 0 then the index is
+ * invalid and the mask is 0 else ~0
+ */
+#ifdef CONFIG_X86_32
+ asm ("cmpl %1,%2; sbbl %0,%0;"
+#else
+ asm ("cmpq %1,%2; sbbq %0,%0;"
+#endif
+ :"=r" (mask)
+ :"r"(sz),"r" (idx)
+ :"cc");
+ return mask;
+}
+
#ifdef CONFIG_X86_PPRO_FENCE
#define dma_rmb() rmb()
#else


2018-01-19 00:14:04

by Dan Williams

Subject: [PATCH v4 09/10] kvm, x86: fix spectre-v1 mitigation

Commit 75f139aaf896 "KVM: x86: Add memory barrier on vmcs field lookup"
added a raw 'asm("lfence");' to prevent a bounds check bypass of
'vmcs_field_to_offset_table'. This does not work for some AMD cpus (see
the 'ifence' helper), and it otherwise does not use the common
'array_ptr' helper designed for these types of fixes. Convert this to
use 'array_ptr'.

Cc: Andrew Honig <[email protected]>
Cc: Jim Mattson <[email protected]>
Cc: Paolo Bonzini <[email protected]>
Cc: <[email protected]>
Signed-off-by: Dan Williams <[email protected]>
---
arch/x86/kvm/vmx.c | 19 +++++++------------
1 file changed, 7 insertions(+), 12 deletions(-)

diff --git a/arch/x86/kvm/vmx.c b/arch/x86/kvm/vmx.c
index c829d89e2e63..20b9b0b5e336 100644
--- a/arch/x86/kvm/vmx.c
+++ b/arch/x86/kvm/vmx.c
@@ -34,6 +34,7 @@
#include <linux/tboot.h>
#include <linux/hrtimer.h>
#include <linux/frame.h>
+#include <linux/nospec.h>
#include "kvm_cache_regs.h"
#include "x86.h"

@@ -898,21 +899,15 @@ static const unsigned short vmcs_field_to_offset_table[] = {

static inline short vmcs_field_to_offset(unsigned long field)
{
- BUILD_BUG_ON(ARRAY_SIZE(vmcs_field_to_offset_table) > SHRT_MAX);
-
- if (field >= ARRAY_SIZE(vmcs_field_to_offset_table))
- return -ENOENT;
+ const unsigned short *offset;

- /*
- * FIXME: Mitigation for CVE-2017-5753. To be replaced with a
- * generic mechanism.
- */
- asm("lfence");
+ BUILD_BUG_ON(ARRAY_SIZE(vmcs_field_to_offset_table) > SHRT_MAX);

- if (vmcs_field_to_offset_table[field] == 0)
+ offset = array_ptr(vmcs_field_to_offset_table, field,
+ ARRAY_SIZE(vmcs_field_to_offset_table));
+ if (!offset || *offset == 0)
return -ENOENT;
-
- return vmcs_field_to_offset_table[field];
+ return *offset;
}

static inline struct vmcs12 *get_vmcs12(struct kvm_vcpu *vcpu)


2018-01-19 00:14:14

by Dan Williams

Subject: [PATCH v4 06/10] x86, get_user: use pointer masking to limit speculation

Quoting Linus:

I do think that it would be a good idea to very expressly document
the fact that it's not that the user access itself is unsafe. I do
agree that things like "get_user()" want to be protected, but not
because of any direct bugs or problems with get_user() and friends,
but simply because get_user() is an excellent source of a pointer
that is obviously controlled from a potentially attacking user
space. So it's a prime candidate for then finding _subsequent_
accesses that can then be used to perturb the cache.

Unlike the '__get_user' case, 'get_user' includes the address limit
check near the pointer de-reference. With that locality the speculation
can be mitigated with pointer narrowing rather than a barrier, where
the narrowing is performed by:

cmp %limit, %ptr
sbb %mask, %mask
and %mask, %ptr

With respect to speculation, the value of %ptr is either less than %limit
or NULL.
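
In C terms the sequence computes the following (an illustrative sketch,
treating %ptr and %limit as unsigned longs):

    mask = 0UL - (ptr < limit);    /* ~0UL when below the limit, else 0 */
    ptr &= mask;                   /* an out-of-range ptr becomes NULL */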

Co-developed-by: Linus Torvalds <[email protected]>
Cc: Al Viro <[email protected]>
Cc: Kees Cook <[email protected]>
Cc: Thomas Gleixner <[email protected]>
Cc: "H. Peter Anvin" <[email protected]>
Cc: Ingo Molnar <[email protected]>
Cc: [email protected]
Signed-off-by: Dan Williams <[email protected]>
---
arch/x86/include/asm/smap.h | 17 +++++++++++++++++
arch/x86/lib/getuser.S | 5 +++++
2 files changed, 22 insertions(+)

diff --git a/arch/x86/include/asm/smap.h b/arch/x86/include/asm/smap.h
index db333300bd4b..2b4ad4c6a226 100644
--- a/arch/x86/include/asm/smap.h
+++ b/arch/x86/include/asm/smap.h
@@ -25,6 +25,23 @@

#include <asm/alternative-asm.h>

+/*
+ * MASK_NOSPEC - sanitize a user controlled value with respect to
+ * speculation
+ *
+ * In the get_user path once we have determined that the pointer is
+ * below the current address limit sanitize its value with respect to
+ * speculation. In the case when the pointer is above the address limit
+ * this directs the cpu to speculate with a NULL ptr rather than
+ * something targeting kernel memory.
+ *
+ * assumes CF is set from a previous 'cmp TASK_addr_limit, %ptr'
+ */
+.macro MASK_NOSPEC mask val
+ sbb \mask, \mask
+ and \mask, \val
+.endm
+
#ifdef CONFIG_X86_SMAP

#define ASM_CLAC \
diff --git a/arch/x86/lib/getuser.S b/arch/x86/lib/getuser.S
index c97d935a29e8..07d0e8a28b17 100644
--- a/arch/x86/lib/getuser.S
+++ b/arch/x86/lib/getuser.S
@@ -40,6 +40,7 @@ ENTRY(__get_user_1)
mov PER_CPU_VAR(current_task), %_ASM_DX
cmp TASK_addr_limit(%_ASM_DX),%_ASM_AX
jae bad_get_user
+ MASK_NOSPEC %_ASM_DX, %_ASM_AX
ASM_STAC
1: movzbl (%_ASM_AX),%edx
xor %eax,%eax
@@ -54,6 +55,7 @@ ENTRY(__get_user_2)
mov PER_CPU_VAR(current_task), %_ASM_DX
cmp TASK_addr_limit(%_ASM_DX),%_ASM_AX
jae bad_get_user
+ MASK_NOSPEC %_ASM_DX, %_ASM_AX
ASM_STAC
2: movzwl -1(%_ASM_AX),%edx
xor %eax,%eax
@@ -68,6 +70,7 @@ ENTRY(__get_user_4)
mov PER_CPU_VAR(current_task), %_ASM_DX
cmp TASK_addr_limit(%_ASM_DX),%_ASM_AX
jae bad_get_user
+ MASK_NOSPEC %_ASM_DX, %_ASM_AX
ASM_STAC
3: movl -3(%_ASM_AX),%edx
xor %eax,%eax
@@ -83,6 +86,7 @@ ENTRY(__get_user_8)
mov PER_CPU_VAR(current_task), %_ASM_DX
cmp TASK_addr_limit(%_ASM_DX),%_ASM_AX
jae bad_get_user
+ MASK_NOSPEC %_ASM_DX, %_ASM_AX
ASM_STAC
4: movq -7(%_ASM_AX),%rdx
xor %eax,%eax
@@ -94,6 +98,7 @@ ENTRY(__get_user_8)
mov PER_CPU_VAR(current_task), %_ASM_DX
cmp TASK_addr_limit(%_ASM_DX),%_ASM_AX
jae bad_get_user_8
+ MASK_NOSPEC %_ASM_DX, %_ASM_AX
ASM_STAC
4: movl -7(%_ASM_AX),%edx
5: movl -3(%_ASM_AX),%ecx


2018-01-19 00:14:27

by Dan Williams

Subject: [PATCH v4 07/10] x86: narrow out of bounds syscalls to sys_read under speculation

The syscall table base combined with a user controlled index yields a
user controlled function pointer in kernel space. Like 'get_user', use
'MASK_NOSPEC' to prevent any out of bounds speculation. While retpoline
prevents speculating into the user controlled target, it does not stop
the pointer de-reference; the concern is leaking memory relative to the
syscall table base.
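
The intended effect, sketched in C (illustrative only; the entry code
does this in assembly via 'MASK_NOSPEC'):

    if (nr > __NR_syscall_max)
        return -ENOSYS;                /* architectural path */
    /*
     * Under speculation an out-of-bounds nr is clamped to 0; entry 0
     * of the 64-bit syscall table is sys_read.
     */
    nr &= 0UL - (nr <= __NR_syscall_max);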

Reported-by: Linus Torvalds <[email protected]>
Cc: Thomas Gleixner <[email protected]>
Cc: Ingo Molnar <[email protected]>
Cc: "H. Peter Anvin" <[email protected]>
Cc: [email protected]
Cc: Andy Lutomirski <[email protected]>
Signed-off-by: Dan Williams <[email protected]>
---
arch/x86/entry/entry_64.S | 2 ++
arch/x86/include/asm/smap.h | 9 ++++++++-
2 files changed, 10 insertions(+), 1 deletion(-)

diff --git a/arch/x86/entry/entry_64.S b/arch/x86/entry/entry_64.S
index 4f8e1d35a97c..2320017077d4 100644
--- a/arch/x86/entry/entry_64.S
+++ b/arch/x86/entry/entry_64.S
@@ -35,6 +35,7 @@
#include <asm/asm.h>
#include <asm/smap.h>
#include <asm/pgtable_types.h>
+#include <asm/smap.h>
#include <asm/export.h>
#include <asm/frame.h>
#include <asm/nospec-branch.h>
@@ -264,6 +265,7 @@ entry_SYSCALL_64_fastpath:
cmpl $__NR_syscall_max, %eax
#endif
ja 1f /* return -ENOSYS (already in pt_regs->ax) */
+ MASK_NOSPEC %r11 %rax /* sanitize syscall_nr wrt speculation */
movq %r10, %rcx

/*
diff --git a/arch/x86/include/asm/smap.h b/arch/x86/include/asm/smap.h
index 2b4ad4c6a226..3b5b2cf58dc6 100644
--- a/arch/x86/include/asm/smap.h
+++ b/arch/x86/include/asm/smap.h
@@ -35,7 +35,14 @@
* this directs the cpu to speculate with a NULL ptr rather than
* something targeting kernel memory.
*
- * assumes CF is set from a previous 'cmp TASK_addr_limit, %ptr'
+ * In the syscall entry path it is possible to speculate past the
+ * validation of the system call number. Use MASK_NOSPEC to sanitize the
+ * syscall array index to zero (sys_read) rather than an arbitrary
+ * target.
+ *
+ * assumes CF is set from a previous 'cmp' i.e.:
+ * cmp TASK_addr_limit, %ptr
+ * cmp __NR_syscall_max, %idx
*/
.macro MASK_NOSPEC mask val
sbb \mask, \mask


2018-01-19 00:14:36

by Dan Williams

Subject: [PATCH v4 10/10] nl80211: sanitize array index in parse_txq_params

Wireless drivers rely on parse_txq_params to validate that
txq_params->ac is less than NL80211_NUM_ACS by the time the low-level
driver's ->conf_tx() handler is called. Use a new helper, 'array_idx',
to sanitize txq_params->ac with respect to speculation. I.e. ensure that
any speculation into ->conf_tx() handlers is done with a value of
txq_params->ac that is within the bounds of [0, NL80211_NUM_ACS).
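
For clarity, 'array_idx' returns a pointer to the sanitized index
itself, so the usage pattern is (a sketch mirroring the diff below):

    u8 ac = nla_get_u8(tb[NL80211_TXQ_ATTR_AC]);    /* user controlled */
    u8 *idx = array_idx(ac, NL80211_NUM_ACS);

    if (!idx)
        return -EINVAL;
    txq_params->ac = *idx;    /* bounded under speculation */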

Reported-by: Christian Lamparter <[email protected]>
Reported-by: Elena Reshetova <[email protected]>
Cc: Johannes Berg <[email protected]>
Cc: "David S. Miller" <[email protected]>
Cc: [email protected]
Signed-off-by: Dan Williams <[email protected]>
---
include/linux/nospec.h | 21 +++++++++++++++++++++
net/wireless/nl80211.c | 10 +++++++---
2 files changed, 28 insertions(+), 3 deletions(-)

diff --git a/include/linux/nospec.h b/include/linux/nospec.h
index f841c11f3f1f..8af35be1869e 100644
--- a/include/linux/nospec.h
+++ b/include/linux/nospec.h
@@ -41,4 +41,25 @@ static inline unsigned long array_ptr_mask(unsigned long idx, unsigned long sz)
__u._bit &= _mask; \
__u._ptr; \
})
+
+/**
+ * array_idx - Generate a pointer to an array index, ensuring the
+ * pointer is bounded under speculation to NULL.
+ *
+ * @idx: the index of the element, must be less than LONG_MAX
+ * @sz: the number of elements in the array, must be less than LONG_MAX
+ *
+ * If @idx falls in the interval [0, @sz), returns &@idx otherwise
+ * returns NULL.
+ */
+#define array_idx(idx, sz) \
+({ \
+ union { typeof((idx)) *_ptr; unsigned long _bit; } __u; \
+ typeof(idx) *_i = &(idx); \
+ unsigned long _mask = array_ptr_mask(*_i, (sz)); \
+ \
+ __u._ptr = _i; \
+ __u._bit &= _mask; \
+ __u._ptr; \
+})
#endif /* __NOSPEC_H__ */
diff --git a/net/wireless/nl80211.c b/net/wireless/nl80211.c
index 2b3dbcd40e46..202cb1dc03ee 100644
--- a/net/wireless/nl80211.c
+++ b/net/wireless/nl80211.c
@@ -16,6 +16,7 @@
#include <linux/nl80211.h>
#include <linux/rtnetlink.h>
#include <linux/netlink.h>
+#include <linux/nospec.h>
#include <linux/etherdevice.h>
#include <net/net_namespace.h>
#include <net/genetlink.h>
@@ -2056,20 +2057,23 @@ static const struct nla_policy txq_params_policy[NL80211_TXQ_ATTR_MAX + 1] = {
static int parse_txq_params(struct nlattr *tb[],
struct ieee80211_txq_params *txq_params)
{
+ u8 ac, *idx;
+
if (!tb[NL80211_TXQ_ATTR_AC] || !tb[NL80211_TXQ_ATTR_TXOP] ||
!tb[NL80211_TXQ_ATTR_CWMIN] || !tb[NL80211_TXQ_ATTR_CWMAX] ||
!tb[NL80211_TXQ_ATTR_AIFS])
return -EINVAL;

- txq_params->ac = nla_get_u8(tb[NL80211_TXQ_ATTR_AC]);
+ ac = nla_get_u8(tb[NL80211_TXQ_ATTR_AC]);
txq_params->txop = nla_get_u16(tb[NL80211_TXQ_ATTR_TXOP]);
txq_params->cwmin = nla_get_u16(tb[NL80211_TXQ_ATTR_CWMIN]);
txq_params->cwmax = nla_get_u16(tb[NL80211_TXQ_ATTR_CWMAX]);
txq_params->aifs = nla_get_u8(tb[NL80211_TXQ_ATTR_AIFS]);

- if (txq_params->ac >= NL80211_NUM_ACS)
+ idx = array_idx(ac, NL80211_NUM_ACS);
+ if (!idx)
return -EINVAL;
-
+ txq_params->ac = *idx;
return 0;
}



2018-01-19 00:15:16

by Dan Williams

Subject: [PATCH v4 08/10] vfs, fdtable: prevent bounds-check bypass via speculative execution

'fd' is a user controlled value that is used as a data dependency to
read from the 'fdt->fd' array. In order to avoid potential leaks of
kernel memory values, block speculative execution of the instruction
stream that could issue reads based on an invalid 'file *' returned from
__fcheck_files.

Cc: Al Viro <[email protected]>
Co-developed-by: Elena Reshetova <[email protected]>
Signed-off-by: Dan Williams <[email protected]>
---
include/linux/fdtable.h | 7 +++++--
1 file changed, 5 insertions(+), 2 deletions(-)

diff --git a/include/linux/fdtable.h b/include/linux/fdtable.h
index 1c65817673db..9731f1a255db 100644
--- a/include/linux/fdtable.h
+++ b/include/linux/fdtable.h
@@ -10,6 +10,7 @@
#include <linux/compiler.h>
#include <linux/spinlock.h>
#include <linux/rcupdate.h>
+#include <linux/nospec.h>
#include <linux/types.h>
#include <linux/init.h>
#include <linux/fs.h>
@@ -81,9 +82,11 @@ struct dentry;
static inline struct file *__fcheck_files(struct files_struct *files, unsigned int fd)
{
struct fdtable *fdt = rcu_dereference_raw(files->fdt);
+ struct file __rcu **fdp;

- if (fd < fdt->max_fds)
- return rcu_dereference_raw(fdt->fd[fd]);
+ fdp = array_ptr(fdt->fd, fd, fdt->max_fds);
+ if (fdp)
+ return rcu_dereference_raw(*fdp);
return NULL;
}



2018-01-19 00:15:33

by Dan Williams

Subject: [PATCH v4 05/10] x86, __get_user: use __uaccess_begin_nospec

Quoting Linus:

I do think that it would be a good idea to very expressly document
the fact that it's not that the user access itself is unsafe. I do
agree that things like "get_user()" want to be protected, but not
because of any direct bugs or problems with get_user() and friends,
but simply because get_user() is an excellent source of a pointer
that is obviously controlled from a potentially attacking user
space. So it's a prime candidate for then finding _subsequent_
accesses that can then be used to perturb the cache.

'__uaccess_begin_nospec' covers '__get_user' and 'copy_from_iter' where
the limit check is far away from the user pointer de-reference. In those
cases an 'lfence' prevents speculation with a potential pointer to
privileged memory.

Suggested-by: Linus Torvalds <[email protected]>
Suggested-by: Andi Kleen <[email protected]>
Cc: Al Viro <[email protected]>
Cc: Kees Cook <[email protected]>
Cc: Thomas Gleixner <[email protected]>
Cc: "H. Peter Anvin" <[email protected]>
Cc: Ingo Molnar <[email protected]>
Cc: [email protected]
Signed-off-by: Dan Williams <[email protected]>
---
arch/x86/include/asm/uaccess.h | 6 +++---
arch/x86/include/asm/uaccess_32.h | 6 +++---
arch/x86/include/asm/uaccess_64.h | 12 ++++++------
arch/x86/lib/usercopy_32.c | 8 ++++----
4 files changed, 16 insertions(+), 16 deletions(-)

diff --git a/arch/x86/include/asm/uaccess.h b/arch/x86/include/asm/uaccess.h
index 626caf58183a..a930585fa3b5 100644
--- a/arch/x86/include/asm/uaccess.h
+++ b/arch/x86/include/asm/uaccess.h
@@ -450,7 +450,7 @@ do { \
({ \
int __gu_err; \
__inttype(*(ptr)) __gu_val; \
- __uaccess_begin(); \
+ __uaccess_begin_nospec(); \
__get_user_size(__gu_val, (ptr), (size), __gu_err, -EFAULT); \
__uaccess_end(); \
(x) = (__force __typeof__(*(ptr)))__gu_val; \
@@ -557,7 +557,7 @@ struct __large_struct { unsigned long buf[100]; };
* get_user_ex(...);
* } get_user_catch(err)
*/
-#define get_user_try uaccess_try
+#define get_user_try uaccess_try_nospec
#define get_user_catch(err) uaccess_catch(err)

#define get_user_ex(x, ptr) do { \
@@ -591,7 +591,7 @@ extern void __cmpxchg_wrong_size(void)
__typeof__(ptr) __uval = (uval); \
__typeof__(*(ptr)) __old = (old); \
__typeof__(*(ptr)) __new = (new); \
- __uaccess_begin(); \
+ __uaccess_begin_nospec(); \
switch (size) { \
case 1: \
{ \
diff --git a/arch/x86/include/asm/uaccess_32.h b/arch/x86/include/asm/uaccess_32.h
index 72950401b223..ba2dc1930630 100644
--- a/arch/x86/include/asm/uaccess_32.h
+++ b/arch/x86/include/asm/uaccess_32.h
@@ -29,21 +29,21 @@ raw_copy_from_user(void *to, const void __user *from, unsigned long n)
switch (n) {
case 1:
ret = 0;
- __uaccess_begin();
+ __uaccess_begin_nospec();
__get_user_asm_nozero(*(u8 *)to, from, ret,
"b", "b", "=q", 1);
__uaccess_end();
return ret;
case 2:
ret = 0;
- __uaccess_begin();
+ __uaccess_begin_nospec();
__get_user_asm_nozero(*(u16 *)to, from, ret,
"w", "w", "=r", 2);
__uaccess_end();
return ret;
case 4:
ret = 0;
- __uaccess_begin();
+ __uaccess_begin_nospec();
__get_user_asm_nozero(*(u32 *)to, from, ret,
"l", "k", "=r", 4);
__uaccess_end();
diff --git a/arch/x86/include/asm/uaccess_64.h b/arch/x86/include/asm/uaccess_64.h
index f07ef3c575db..62546b3a398e 100644
--- a/arch/x86/include/asm/uaccess_64.h
+++ b/arch/x86/include/asm/uaccess_64.h
@@ -55,31 +55,31 @@ raw_copy_from_user(void *dst, const void __user *src, unsigned long size)
return copy_user_generic(dst, (__force void *)src, size);
switch (size) {
case 1:
- __uaccess_begin();
+ __uaccess_begin_nospec();
__get_user_asm_nozero(*(u8 *)dst, (u8 __user *)src,
ret, "b", "b", "=q", 1);
__uaccess_end();
return ret;
case 2:
- __uaccess_begin();
+ __uaccess_begin_nospec();
__get_user_asm_nozero(*(u16 *)dst, (u16 __user *)src,
ret, "w", "w", "=r", 2);
__uaccess_end();
return ret;
case 4:
- __uaccess_begin();
+ __uaccess_begin_nospec();
__get_user_asm_nozero(*(u32 *)dst, (u32 __user *)src,
ret, "l", "k", "=r", 4);
__uaccess_end();
return ret;
case 8:
- __uaccess_begin();
+ __uaccess_begin_nospec();
__get_user_asm_nozero(*(u64 *)dst, (u64 __user *)src,
ret, "q", "", "=r", 8);
__uaccess_end();
return ret;
case 10:
- __uaccess_begin();
+ __uaccess_begin_nospec();
__get_user_asm_nozero(*(u64 *)dst, (u64 __user *)src,
ret, "q", "", "=r", 10);
if (likely(!ret))
@@ -89,7 +89,7 @@ raw_copy_from_user(void *dst, const void __user *src, unsigned long size)
__uaccess_end();
return ret;
case 16:
- __uaccess_begin();
+ __uaccess_begin_nospec();
__get_user_asm_nozero(*(u64 *)dst, (u64 __user *)src,
ret, "q", "", "=r", 16);
if (likely(!ret))
diff --git a/arch/x86/lib/usercopy_32.c b/arch/x86/lib/usercopy_32.c
index 1b377f734e64..7add8ba06887 100644
--- a/arch/x86/lib/usercopy_32.c
+++ b/arch/x86/lib/usercopy_32.c
@@ -331,12 +331,12 @@ do { \

unsigned long __copy_user_ll(void *to, const void *from, unsigned long n)
{
- stac();
+ __uaccess_begin_nospec();
if (movsl_is_ok(to, from, n))
__copy_user(to, from, n);
else
n = __copy_user_intel(to, from, n);
- clac();
+ __uaccess_end();
return n;
}
EXPORT_SYMBOL(__copy_user_ll);
@@ -344,7 +344,7 @@ EXPORT_SYMBOL(__copy_user_ll);
unsigned long __copy_from_user_ll_nocache_nozero(void *to, const void __user *from,
unsigned long n)
{
- stac();
+ __uaccess_begin_nospec();
#ifdef CONFIG_X86_INTEL_USERCOPY
if (n > 64 && static_cpu_has(X86_FEATURE_XMM2))
n = __copy_user_intel_nocache(to, from, n);
@@ -353,7 +353,7 @@ unsigned long __copy_from_user_ll_nocache_nozero(void *to, const void __user *fr
#else
__copy_user(to, from, n);
#endif
- clac();
+ __uaccess_end();
return n;
}
EXPORT_SYMBOL(__copy_from_user_ll_nocache_nozero);


2018-01-19 08:44:20

by Paolo Bonzini

Subject: Re: [PATCH v4 09/10] kvm, x86: fix spectre-v1 mitigation

On 19/01/2018 01:02, Dan Williams wrote:
> Commit 75f139aaf896 "KVM: x86: Add memory barrier on vmcs field lookup"
> added a raw 'asm("lfence");' to prevent a bounds check bypass of
> 'vmcs_field_to_offset_table'. This does not work for some AMD cpus (see
> the 'ifence' helper),

The code never runs on AMD cpus (it's for Intel virtualization
extensions), so it'd be nice if you could fix up the commit message.

Apart from this, obviously

Acked-by: Paolo Bonzini <[email protected]>

Thanks!

Paolo

> and it otherwise does not use the common
> 'array_ptr' helper designed for these types of fixes. Convert this to
> use 'array_ptr'.
>
> Cc: Andrew Honig <[email protected]>
> Cc: Jim Mattson <[email protected]>
> Cc: Paolo Bonzini <[email protected]>
> Cc: <[email protected]>
> Signed-off-by: Dan Williams <[email protected]>
> ---
> arch/x86/kvm/vmx.c | 19 +++++++------------
> 1 file changed, 7 insertions(+), 12 deletions(-)
>
> diff --git a/arch/x86/kvm/vmx.c b/arch/x86/kvm/vmx.c
> index c829d89e2e63..20b9b0b5e336 100644
> --- a/arch/x86/kvm/vmx.c
> +++ b/arch/x86/kvm/vmx.c
> @@ -34,6 +34,7 @@
> #include <linux/tboot.h>
> #include <linux/hrtimer.h>
> #include <linux/frame.h>
> +#include <linux/nospec.h>
> #include "kvm_cache_regs.h"
> #include "x86.h"
>
> @@ -898,21 +899,15 @@ static const unsigned short vmcs_field_to_offset_table[] = {
>
> static inline short vmcs_field_to_offset(unsigned long field)
> {
> - BUILD_BUG_ON(ARRAY_SIZE(vmcs_field_to_offset_table) > SHRT_MAX);
> -
> - if (field >= ARRAY_SIZE(vmcs_field_to_offset_table))
> - return -ENOENT;
> + const unsigned short *offset;
>
> - /*
> - * FIXME: Mitigation for CVE-2017-5753. To be replaced with a
> - * generic mechanism.
> - */
> - asm("lfence");
> + BUILD_BUG_ON(ARRAY_SIZE(vmcs_field_to_offset_table) > SHRT_MAX);
>
> - if (vmcs_field_to_offset_table[field] == 0)
> + offset = array_ptr(vmcs_field_to_offset_table, field,
> + ARRAY_SIZE(vmcs_field_to_offset_table));
> + if (!offset || *offset == 0)
> return -ENOENT;
> -
> - return vmcs_field_to_offset_table[field];
> + return *offset;
> }
>
> static inline struct vmcs12 *get_vmcs12(struct kvm_vcpu *vcpu)
>


2018-01-19 10:23:17

by Jann Horn

Subject: Re: [kernel-hardening] [PATCH v4 02/10] asm/nospec, array_ptr: sanitize speculative array de-references

On Fri, Jan 19, 2018 at 1:01 AM, Dan Williams <[email protected]> wrote:
> 'array_ptr' is proposed as a generic mechanism to mitigate against
> Spectre-variant-1 attacks, i.e. an attack that bypasses boundary checks
> via speculative execution. The 'array_ptr' implementation is expected
> to be safe for current generation cpus across multiple architectures
> (ARM, x86).
>
> Based on an original implementation by Linus Torvalds, tweaked to remove
> speculative flows by Alexei Starovoitov, and tweaked again by Linus to
> introduce an x86 assembly implementation for the mask generation.
[...]
> +/*
> + * If idx is negative or if idx > size then bit 63 is set in the mask,
> + * and the value of ~(-1L) is zero. When the mask is zero, bounds check
> + * failed, array_ptr will return NULL.
> + */
> +#ifndef array_ptr_mask
> +static inline unsigned long array_ptr_mask(unsigned long idx, unsigned long sz)
> +{
> + return ~(long)(idx | (sz - 1 - idx)) >> (BITS_PER_LONG - 1);
> +}
> +#endif

Nit: Maybe add a comment saying that this is equivalent to
"return ((long)idx >= 0 && idx < sz) ? ULONG_MAX : 0"?

> +/**
> + * array_ptr - Generate a pointer to an array element, ensuring
> + * the pointer is bounded under speculation to NULL.
> + *
> + * @base: the base of the array
> + * @idx: the index of the element, must be less than LONG_MAX
> + * @sz: the number of elements in the array, must be less than LONG_MAX
> + *
> + * If @idx falls in the interval [0, @sz), returns the pointer to
> + * @base[@idx], otherwise returns NULL.
> + */
> +#define array_ptr(base, idx, sz) \
> +({ \
> + union { typeof(*(base)) *_ptr; unsigned long _bit; } __u; \
> + typeof(*(base)) *_arr = (base); \
> + unsigned long _i = (idx); \
> + unsigned long _mask = array_ptr_mask(_i, (sz)); \
> + \
> + __u._ptr = _arr + (_i & _mask); \
> + __u._bit &= _mask; \

AFAICS, if `idx` is out of bounds, you first zero out the index
(`_i & _mask`) and then immediately afterwards zero out
the whole pointer (`_u._bit &= _mask`).
Is there a reason for the `_i & _mask`, and if so, can you
add a comment explaining that?

2018-01-19 18:13:39

by Dan Williams

Subject: Re: [kernel-hardening] [PATCH v4 02/10] asm/nospec, array_ptr: sanitize speculative array de-references

[ adding Alexei back to the cc ]

On Fri, Jan 19, 2018 at 9:48 AM, Adam Sampson <[email protected]> wrote:
> Jann Horn <[email protected]> writes:
>
>>> +/*
>>> + * If idx is negative or if idx > size then bit 63 is set in the mask,
>>> + * and the value of ~(-1L) is zero. When the mask is zero, bounds check
>>> + * failed, array_ptr will return NULL.
>>> + */
>>> +#ifndef array_ptr_mask
>>> +static inline unsigned long array_ptr_mask(unsigned long idx,
>>> unsigned long sz)
>>> +{
>>> + return ~(long)(idx | (sz - 1 - idx)) >> (BITS_PER_LONG - 1);
>>> +}
>>> +#endif
>>
>> Nit: Maybe add a comment saying that this is equivalent to
>> "return ((long)idx >= 0 && idx < sz) ? ULONG_MAX : 0"?
>
> That's only true when sz < LONG_MAX, which is documented below but not
> here; it's also different from the asm version, which doesn't do the idx
> <= LONG_MAX check. So making the constraint explicit would be a good idea.
>
> From a bit of experimentation, when the top bit of sz is set, this
> expression, the C version and the assembler version all have different
> behaviour. For example, with 32-bit unsigned long:
>
> index=00000000 size=80000001: expr=ffffffff c=00000000 asm=ffffffff
> index=80000000 size=80000001: expr=00000000 c=00000000 asm=ffffffff
> index=00000000 size=a0000000: expr=ffffffff c=00000000 asm=ffffffff
> index=00000001 size=a0000000: expr=ffffffff c=00000000 asm=ffffffff
> index=fffffffe size=ffffffff: expr=00000000 c=00000000 asm=ffffffff
>
> It may be worth noting that:
>
> return 0 - ((long) (idx < sz));
>
> causes GCC, on ia32 and amd64, to generate exactly the same cmp/sbb
> sequence as in Linus's asm. Are there architectures where this form
> would allow speculation?

We're operating on the assumption that compilers will not try to
introduce branches where they don't exist in the code, so if this is
producing identical assembly I think we should go with it and drop the
x86 array_ptr_mask.

2018-01-19 18:19:16

by Adam Sampson

Subject: Re: [kernel-hardening] [PATCH v4 02/10] asm/nospec, array_ptr: sanitize speculative array de-references

Jann Horn <[email protected]> writes:

>> +/*
>> + * If idx is negative or if idx > size then bit 63 is set in the mask,
>> + * and the value of ~(-1L) is zero. When the mask is zero, bounds check
>> + * failed, array_ptr will return NULL.
>> + */
>> +#ifndef array_ptr_mask
>> +static inline unsigned long array_ptr_mask(unsigned long idx,
>> unsigned long sz)
>> +{
>> + return ~(long)(idx | (sz - 1 - idx)) >> (BITS_PER_LONG - 1);
>> +}
>> +#endif
>
> Nit: Maybe add a comment saying that this is equivalent to
> "return ((long)idx >= 0 && idx < sz) ? ULONG_MAX : 0"?

That's only true when sz < LONG_MAX, which is documented below but not
here; it's also different from the asm version, which doesn't do the idx
<= LONG_MAX check. So making the constraint explicit would be a good idea.

From a bit of experimentation, when the top bit of sz is set, this
expression, the C version and the assembler version all have different
behaviour. For example, with 32-bit unsigned long:

index=00000000 size=80000001: expr=ffffffff c=00000000 asm=ffffffff
index=80000000 size=80000001: expr=00000000 c=00000000 asm=ffffffff
index=00000000 size=a0000000: expr=ffffffff c=00000000 asm=ffffffff
index=00000001 size=a0000000: expr=ffffffff c=00000000 asm=ffffffff
index=fffffffe size=ffffffff: expr=00000000 c=00000000 asm=ffffffff

It may be worth noting that:

return 0 - ((long) (idx < sz));

causes GCC, on ia32 and amd64, to generate exactly the same cmp/sbb
sequence as in Linus's asm. Are there architectures where this form
would allow speculation?

Thanks,

--
Adam Sampson <[email protected]> <http://offog.org/>

2018-01-19 18:19:30

by Will Deacon

Subject: Re: [kernel-hardening] [PATCH v4 02/10] asm/nospec, array_ptr: sanitize speculative array de-references

On Fri, Jan 19, 2018 at 10:12:47AM -0800, Dan Williams wrote:
> [ adding Alexei back to the cc ]
>
> On Fri, Jan 19, 2018 at 9:48 AM, Adam Sampson <[email protected]> wrote:
> > Jann Horn <[email protected]> writes:
> >
> >>> +/*
> >>> + * If idx is negative or if idx > size then bit 63 is set in the mask,
> >>> + * and the value of ~(-1L) is zero. When the mask is zero, bounds check
> >>> + * failed, array_ptr will return NULL.
> >>> + */
> >>> +#ifndef array_ptr_mask
> >>> +static inline unsigned long array_ptr_mask(unsigned long idx,
> >>> unsigned long sz)
> >>> +{
> >>> + return ~(long)(idx | (sz - 1 - idx)) >> (BITS_PER_LONG - 1);
> >>> +}
> >>> +#endif
> >>
> >> Nit: Maybe add a comment saying that this is equivalent to
> >> "return ((long)idx >= 0 && idx < sz) ? ULONG_MAX : 0"?
> >
> > That's only true when sz < LONG_MAX, which is documented below but not
> > here; it's also different from the asm version, which doesn't do the idx
> > <= LONG_MAX check. So making the constraint explicit would be a good idea.
> >
> > From a bit of experimentation, when the top bit of sz is set, this
> > expression, the C version and the assembler version all have different
> > behaviour. For example, with 32-bit unsigned long:
> >
> > index=00000000 size=80000001: expr=ffffffff c=00000000 asm=ffffffff
> > index=80000000 size=80000001: expr=00000000 c=00000000 asm=ffffffff
> > index=00000000 size=a0000000: expr=ffffffff c=00000000 asm=ffffffff
> > index=00000001 size=a0000000: expr=ffffffff c=00000000 asm=ffffffff
> > index=fffffffe size=ffffffff: expr=00000000 c=00000000 asm=ffffffff
> >
> > It may be worth noting that:
> >
> > return 0 - ((long) (idx < sz));
> >
> > causes GCC, on ia32 and amd64, to generate exactly the same cmp/sbb
> > sequence as in Linus's asm. Are there architectures where this form
> > would allow speculation?
>
> We're operating on the assumption that compilers will not try to
> introduce branches where they don't exist in the code, so if this is
> producing identical assembly I think we should go with it and drop the
> x86 array_ptr_mask.

Branches, perhaps, but this could easily be compiled to a conditional
select (CSEL) instruction on arm64 and that wouldn't be safe without a
CSDB. Of course, we can do our own thing in assembly to prevent that, but
it would mean that the generic C implementation would not be robust for us.

Will

2018-01-19 18:21:33

by Linus Torvalds

Subject: Re: [kernel-hardening] [PATCH v4 02/10] asm/nospec, array_ptr: sanitize speculative array de-references

On Fri, Jan 19, 2018 at 2:20 AM, Jann Horn <[email protected]> wrote:
>> + \
>> + __u._ptr = _arr + (_i & _mask); \
>> + __u._bit &= _mask; \
>
> AFAICS, if `idx` is out of bounds, you first zero out the index
> (`_i & _mask`) and then immediately afterwards zero out
> the whole pointer (`_u._bit &= _mask`).
> Is there a reason for the `_i & _mask`, and if so, can you
> add a comment explaining that?

I think that's just leftovers from my original (untested) thing that
also did the access itself. So that __u._bit masking wasn't masking
the pointer, it was masking the value that was *read* from the
pointer, so that you could know that an invalid access returned
0/NULL, not just the first value in the array.

Linus

2018-01-19 18:28:15

by Dan Williams

Subject: Re: [kernel-hardening] [PATCH v4 02/10] asm/nospec, array_ptr: sanitize speculative array de-references

On Fri, Jan 19, 2018 at 10:18 AM, Will Deacon <[email protected]> wrote:
>
> On Fri, Jan 19, 2018 at 10:12:47AM -0800, Dan Williams wrote:
> > [ adding Alexei back to the cc ]
> >
> > On Fri, Jan 19, 2018 at 9:48 AM, Adam Sampson <[email protected]> wrote:
> > > Jann Horn <[email protected]> writes:
> > >
> > >>> +/*
> > >>> + * If idx is negative or if idx > size then bit 63 is set in the mask,
> > >>> + * and the value of ~(-1L) is zero. When the mask is zero, bounds check
> > >>> + * failed, array_ptr will return NULL.
> > >>> + */
> > >>> +#ifndef array_ptr_mask
> > >>> +static inline unsigned long array_ptr_mask(unsigned long idx,
> > >>> unsigned long sz)
> > >>> +{
> > >>> + return ~(long)(idx | (sz - 1 - idx)) >> (BITS_PER_LONG - 1);
> > >>> +}
> > >>> +#endif
> > >>
> > >> Nit: Maybe add a comment saying that this is equivalent to
> > >> "return ((long)idx >= 0 && idx < sz) ? ULONG_MAX : 0"?
> > >
> > > That's only true when sz < LONG_MAX, which is documented below but not
> > > here; it's also different from the asm version, which doesn't do the idx
> > > <= LONG_MAX check. So making the constraint explicit would be a good idea.
> > >
> > > From a bit of experimentation, when the top bit of sz is set, this
> > > expression, the C version and the assembler version all have different
> > > behaviour. For example, with 32-bit unsigned long:
> > >
> > > index=00000000 size=80000001: expr=ffffffff c=00000000 asm=ffffffff
> > > index=80000000 size=80000001: expr=00000000 c=00000000 asm=ffffffff
> > > index=00000000 size=a0000000: expr=ffffffff c=00000000 asm=ffffffff
> > > index=00000001 size=a0000000: expr=ffffffff c=00000000 asm=ffffffff
> > > index=fffffffe size=ffffffff: expr=00000000 c=00000000 asm=ffffffff
> > >
> > > It may be worth noting that:
> > >
> > > return 0 - ((long) (idx < sz));
> > >
> > > causes GCC, on ia32 and amd64, to generate exactly the same cmp/sbb
> > > sequence as in Linus's asm. Are there architectures where this form
> > > would allow speculation?
> >
> > We're operating on the assumption that compilers will not try to
> > introduce branches where they don't exist in the code, so if this is
> > producing identical assembly I think we should go with it and drop the
> > x86 array_ptr_mask.
>
> Branches, perhaps, but this could easily be compiled to a conditional
> select (CSEL) instruction on arm64 and that wouldn't be safe without a
> CSDB. Of course, we can do our own thing in assembly to prevent that, but
> it would mean that the generic C implementation would not be robust for us.
>

Ah, ok, good to know. If the current version doesn't produce a
conditional instruction on ARM then it likely won't on other
architectures either, so it is the safer option.

2018-01-19 20:56:06

by Dan Williams

Subject: Re: [kernel-hardening] [PATCH v4 02/10] asm/nospec, array_ptr: sanitize speculative array de-references

On Fri, Jan 19, 2018 at 10:18 AM, Linus Torvalds
<[email protected]> wrote:
> On Fri, Jan 19, 2018 at 2:20 AM, Jann Horn <[email protected]> wrote:
>>> + \
>>> + __u._ptr = _arr + (_i & _mask); \
>>> + __u._bit &= _mask; \
>>
>> AFAICS, if `idx` is out of bounds, you first zero out the index
>> (`_i & _mask`) and then immediately afterwards zero out
>> the whole pointer (`_u._bit &= _mask`).
>> Is there a reason for the `_i & _mask`, and if so, can you
>> add a comment explaining that?
>
> I think that's just leftovers from my original (untested) thing that
> also did the access itself. So that __u._bit masking wasn't masking
> the pointer, it was masking the value that was *read* from the
> pointer, so that you could know that an invalid access returned
> 0/NULL, not just the first value in the array.

Yes, the index masking can be dropped since we're returning a
sanitized array element pointer now.

2018-01-20 07:00:16

by Dan Williams

Subject: Re: [PATCH v4 00/10] prevent bounds-check bypass via speculative execution

On Thu, Jan 18, 2018 at 4:01 PM, Dan Williams <[email protected]> wrote:
> Changes since v3 [1]
> * Drop 'ifence_array_ptr' and associated compile-time + run-time
> switching and just use the masking approach all the time.
>
> * Convert 'get_user' to use pointer sanitization via masking rather than
> lfence. '__get_user' and associated paths still rely on
> lfence. (Linus)
>
> "Basically, the rule is trivial: find all 'stac' users, and use
> address masking if those users already integrate the limit
> check, and lfence if they don't."
>
> * At syscall entry sanitize the syscall number under speculation
> to remove a user controlled pointer de-reference in kernel
> space. (Linus)
>
> * Fix a raw lfence in the kvm code (added for v4.15-rc8) to use
> 'array_ptr'.
>
> * Propose 'array_idx' as a way to sanitize user input that is
> later used as an array index, but where the validation is
> happening in a different code block than the array reference.
> (Christian).
>
> * Fix grammar in speculation.txt (Kees)
>
> ---
>
> Quoting Mark's original RFC:
>
> "Recently, Google Project Zero discovered several classes of attack
> against speculative execution. One of these, known as variant-1, allows
> explicit bounds checks to be bypassed under speculation, providing an
> arbitrary read gadget. Further details can be found on the GPZ blog [2]
> and the Documentation patch in this series."
>
> A precondition of using this attack on the kernel is to get a user
> controlled pointer de-referenced (under speculation) in privileged code.
> The primary source of user controlled pointers in the kernel is the
> arguments passed to 'get_user' and '__get_user'. Other examples of user
> controlled pointers are user-controlled array / pointer offsets.
>
> Better tooling is needed to find more arrays / pointers with user
> controlled indices / offsets that can be converted to use 'array_ptr' or
> 'array_idx'. A few are included in this set, and these are not expected
> to be complete. That said, the 'get_user' protections raise the bar on
> finding a vulnerable gadget in the kernel.
>
> These patches are also available via the 'nospec-v4' git branch here:
>
> git://git.kernel.org/pub/scm/linux/kernel/git/djbw/linux nospec-v4

I've pushed out a nospec-v4.1 with the below minor cleanup, a fixup of
the changelog for "kvm, x86: fix spectre-v1 mitigation", and added
Paolo's ack.

git://git.kernel.org/pub/scm/linux/kernel/git/djbw/linux nospec-v4.1

diff --git a/include/linux/nospec.h b/include/linux/nospec.h
index 8af35be1869e..b8a9222e34d1 100644
--- a/include/linux/nospec.h
+++ b/include/linux/nospec.h
@@ -37,7 +37,7 @@ static inline unsigned long array_ptr_mask(unsigned long idx, unsigned long sz)
unsigned long _i = (idx); \
unsigned long _mask = array_ptr_mask(_i, (sz)); \
\
- __u._ptr = _arr + (_i & _mask); \
+ __u._ptr = _arr + _i; \
__u._bit &= _mask; \
__u._ptr; \
})

2018-01-20 17:03:29

by Alexei Starovoitov

Subject: Re: [PATCH v4 00/10] prevent bounds-check bypass via speculative execution

On Fri, Jan 19, 2018 at 10:58:44PM -0800, Dan Williams wrote:
> On Thu, Jan 18, 2018 at 4:01 PM, Dan Williams <[email protected]> wrote:
> > Changes since v3 [1]
> > * Drop 'ifence_array_ptr' and associated compile-time + run-time
> > switching and just use the masking approach all the time.
> >
> > * Convert 'get_user' to use pointer sanitization via masking rather than
> > lfence. '__get_user' and associated paths still rely on
> > lfence. (Linus)
> >
> > "Basically, the rule is trivial: find all 'stac' users, and use
> > address masking if those users already integrate the limit
> > check, and lfence if they don't."
> >
> > * At syscall entry sanitize the syscall number under speculation
> > to remove a user controlled pointer de-reference in kernel
> > space. (Linus)
> >
> > * Fix a raw lfence in the kvm code (added for v4.15-rc8) to use
> > 'array_ptr'.
> >
> > * Propose 'array_idx' as a way to sanitize user input that is
> > later used as an array index, but where the validation is
> > happening in a different code block than the array reference.
> > (Christian).
> >
> > * Fix grammar in speculation.txt (Kees)
> >
> > ---
> >
> > Quoting Mark's original RFC:
> >
> > "Recently, Google Project Zero discovered several classes of attack
> > against speculative execution. One of these, known as variant-1, allows
> > explicit bounds checks to be bypassed under speculation, providing an
> > arbitrary read gadget. Further details can be found on the GPZ blog [2]
> > and the Documentation patch in this series."
> >
> > A precondition of using this attack on the kernel is to get a user
> > controlled pointer de-referenced (under speculation) in privileged code.
> > The primary source of user controlled pointers in the kernel is the
> > arguments passed to 'get_user' and '__get_user'. Other examples of user
> > controlled pointers are user-controlled array / pointer offsets.
> >
> > Better tooling is needed to find more arrays / pointers with user
> > controlled indices / offsets that can be converted to use 'array_ptr' or
> > 'array_idx'. A few are included in this set, and these are not expected
> > to be complete. That said, the 'get_user' protections raise the bar on
> > finding a vulnerable gadget in the kernel.
> >
> > These patches are also available via the 'nospec-v4' git branch here:
> >
> > git://git.kernel.org/pub/scm/linux/kernel/git/djbw/linux nospec-v4
>
> I've pushed out a nospec-v4.1 with the below minor cleanup, a fixup of
> the changelog for "kvm, x86: fix spectre-v1 mitigation", and added
> Paolo's ack.
>
> git://git.kernel.org/pub/scm/linux/kernel/git/djbw/linux nospec-v4.1
>
> diff --git a/include/linux/nospec.h b/include/linux/nospec.h
> index 8af35be1869e..b8a9222e34d1 100644
> --- a/include/linux/nospec.h
> +++ b/include/linux/nospec.h
> @@ -37,7 +37,7 @@ static inline unsigned long array_ptr_mask(unsigned long idx, unsigned long sz)
> unsigned long _i = (idx); \
> unsigned long _mask = array_ptr_mask(_i, (sz)); \
> \
> - __u._ptr = _arr + (_i & _mask); \
> + __u._ptr = _arr + _i; \
> __u._bit &= _mask; \
> __u._ptr; \

hmm. I'm not sure it's the right thing to do, since the macro is
forcing the cpu to speculate a subsequent load from NULL instead of a
valid pointer.
As Linus said: "
So that __u._bit masking wasn't masking
the pointer, it was masking the value that was *read* from the
pointer, so that you could know that an invalid access returned
0/NULL, not just the first value in the array.
"
imo just
return _arr + (_i & _mask);
is enough. No need for union games.
The cpu will speculate the load from _arr[0] if _i is out of bounds
which is the same as if user passed _i == 0 which would have passed
bounds check anyway, so I don't see any data leak from populating
cache with _arr[0] data. In-bounds access can do that just as well
without any speculation.
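
For illustration, the two behaviors being weighed here can be written
out side by side; 'array_ptr_mask' is the helper from the series, while
the two macro names below are hypothetical, added only to contrast the
variants:

/* Variant A (the v4.1 diff): out-of-bounds speculation sees NULL */
#define array_ptr_null(base, idx, sz)                                  \
({                                                                     \
        union { typeof(*(base)) *_ptr; unsigned long _bit; } __u;     \
        unsigned long _i = (idx);                                      \
        unsigned long _mask = array_ptr_mask(_i, (sz));                \
                                                                       \
        __u._ptr = (base) + _i;                                        \
        __u._bit &= _mask;  /* pointer becomes NULL when out of bounds */ \
        __u._ptr;                                                      \
})

/* Variant B (suggested above): out-of-bounds speculation sees &base[0] */
#define array_ptr_zero(base, idx, sz)                                  \
({                                                                     \
        unsigned long __i = (idx);                                     \
        (base) + (__i & array_ptr_mask(__i, (sz)));                    \
})

Either way the mask is computed from data dependencies only, so it is
honored even while a mispredicted branch is being speculated past.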


2018-01-20 17:12:45

by Alexei Starovoitov

[permalink] [raw]
Subject: Re: [PATCH v4 00/10] prevent bounds-check bypass via speculative execution

On Sat, Jan 20, 2018 at 8:56 AM, Alexei Starovoitov
<[email protected]> wrote:
> On Fri, Jan 19, 2018 at 10:58:44PM -0800, Dan Williams wrote:
>> On Thu, Jan 18, 2018 at 4:01 PM, Dan Williams <[email protected]> wrote:
>> > [...]
>> >
>> > These patches are also available via the 'nospec-v4' git branch here:
>> >
>> > git://git.kernel.org/pub/scm/linux/kernel/git/djbw/linux nospec-v4
>>
>> I've pushed out a nospec-v4.1 with the below minor cleanup, a fixup of
>> the changelog for "kvm, x86: fix spectre-v1 mitigation", and added
>> Paolo's ack.
>>
>> git://git.kernel.org/pub/scm/linux/kernel/git/djbw/linux nospec-v4.1
>>
>> diff --git a/include/linux/nospec.h b/include/linux/nospec.h
>> index 8af35be1869e..b8a9222e34d1 100644
>> --- a/include/linux/nospec.h
>> +++ b/include/linux/nospec.h
>> @@ -37,7 +37,7 @@ static inline unsigned long array_ptr_mask(unsigned
>> long idx, unsigned long sz)
>> unsigned long _i = (idx); \
>> unsigned long _mask = array_ptr_mask(_i, (sz)); \
>> \
>> - __u._ptr = _arr + (_i & _mask); \
>> + __u._ptr = _arr + _i; \
>> __u._bit &= _mask; \
>> __u._ptr; \
>
> Hmm. I'm not sure it's the right thing to do, since the macro is
> forcing the cpu to speculate the subsequent load from NULL instead of
> from a valid pointer.
> As Linus said: "So that __u._bit masking wasn't masking the pointer,
> it was masking the value that was *read* from the pointer, so that
> you could know that an invalid access returned 0/NULL, not just the
> first value in the array."
> imo just
> return _arr + (_i & _mask);
> is enough. No need for union games.
> The cpu will speculate the load from _arr[0] if _i is out of bounds,
> which is the same as if the user passed _i == 0, which would have
> passed the bounds check anyway; so I don't see any data leak from
> populating the cache with _arr[0] data. An in-bounds access can do
> that just as well without any speculation.

scratch that. It's array_ptr, not array_access.
The code will do if (!ptr) later, so yeah this api is fine.
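
For reference, the caller pattern being alluded to looks roughly like
the fdtable conversion elsewhere in this series; the function below is
a simplified sketch, not a quote of that patch:

/* NULL from array_ptr() doubles as the failed bounds check */
static struct file *lookup_fd(struct fdtable *fdt, unsigned int fd)
{
        struct file **fdp;

        fdp = array_ptr(fdt->fd, fd, fdt->max_fds);
        if (!fdp)
                return NULL;
        return *fdp;
}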

2018-01-21 10:38:34

by Johannes Berg

[permalink] [raw]
Subject: Re: [PATCH v4 10/10] nl80211: sanitize array index in parse_txq_params

On Thu, 2018-01-18 at 16:02 -0800, Dan Williams wrote:
> Wireless drivers rely on parse_txq_params to validate that
> txq_params->ac is less than NL80211_NUM_ACS by the time the low-level
> driver's ->conf_tx() handler is called. Use a new helper, 'array_idx',
> to sanitize txq_params->ac with respect to speculation. I.e. ensure that
> any speculation into ->conf_tx() handlers is done with a value of
> txq_params->ac that is within the bounds of [0, NL80211_NUM_ACS).

Sorry, I didn't realize you were waiting for me to review.

LGTM.

Reviewed-by: Johannes Berg <[email protected]>


> Reported-by: Christian Lamparter <[email protected]>
> Reported-by: Elena Reshetova <[email protected]>
> Cc: Johannes Berg <[email protected]>
> Cc: "David S. Miller" <[email protected]>
> Cc: [email protected]
> Signed-off-by: Dan Williams <[email protected]>
> ---
> include/linux/nospec.h | 21 +++++++++++++++++++++
> net/wireless/nl80211.c | 10 +++++++---
> 2 files changed, 28 insertions(+), 3 deletions(-)
>
> diff --git a/include/linux/nospec.h b/include/linux/nospec.h
> index f841c11f3f1f..8af35be1869e 100644
> --- a/include/linux/nospec.h
> +++ b/include/linux/nospec.h
> @@ -41,4 +41,25 @@ static inline unsigned long array_ptr_mask(unsigned long idx, unsigned long sz)
> __u._bit &= _mask; \
> __u._ptr; \
> })
> +
> +/**
> + * array_idx - Generate a pointer to an array index, ensuring the
> + * pointer is bounded under speculation to NULL.
> + *
> + * @idx: the index of the element, must be less than LONG_MAX
> + * @sz: the number of elements in the array, must be less than LONG_MAX
> + *
> + * If @idx falls in the interval [0, @sz), returns &@idx, otherwise
> + * returns NULL.
> + */
> +#define array_idx(idx, sz) \
> +({ \
> + union { typeof((idx)) *_ptr; unsigned long _bit; } __u; \
> + typeof(idx) *_i = &(idx); \
> + unsigned long _mask = array_ptr_mask(*_i, (sz)); \
> + \
> + __u._ptr = _i; \
> + __u._bit &= _mask; \
> + __u._ptr; \
> +})
> #endif /* __NOSPEC_H__ */
> diff --git a/net/wireless/nl80211.c b/net/wireless/nl80211.c
> index 2b3dbcd40e46..202cb1dc03ee 100644
> --- a/net/wireless/nl80211.c
> +++ b/net/wireless/nl80211.c
> @@ -16,6 +16,7 @@
> #include <linux/nl80211.h>
> #include <linux/rtnetlink.h>
> #include <linux/netlink.h>
> +#include <linux/nospec.h>
> #include <linux/etherdevice.h>
> #include <net/net_namespace.h>
> #include <net/genetlink.h>
> @@ -2056,20 +2057,23 @@ static const struct nla_policy txq_params_policy[NL80211_TXQ_ATTR_MAX + 1] = {
> static int parse_txq_params(struct nlattr *tb[],
> struct ieee80211_txq_params *txq_params)
> {
> + u8 ac, *idx;
> +
> if (!tb[NL80211_TXQ_ATTR_AC] || !tb[NL80211_TXQ_ATTR_TXOP] ||
> !tb[NL80211_TXQ_ATTR_CWMIN] || !tb[NL80211_TXQ_ATTR_CWMAX] ||
> !tb[NL80211_TXQ_ATTR_AIFS])
> return -EINVAL;
>
> - txq_params->ac = nla_get_u8(tb[NL80211_TXQ_ATTR_AC]);
> + ac = nla_get_u8(tb[NL80211_TXQ_ATTR_AC]);
> txq_params->txop = nla_get_u16(tb[NL80211_TXQ_ATTR_TXOP]);
> txq_params->cwmin = nla_get_u16(tb[NL80211_TXQ_ATTR_CWMIN]);
> txq_params->cwmax = nla_get_u16(tb[NL80211_TXQ_ATTR_CWMAX]);
> txq_params->aifs = nla_get_u8(tb[NL80211_TXQ_ATTR_AIFS]);
>
> - if (txq_params->ac >= NL80211_NUM_ACS)
> + idx = array_idx(ac, NL80211_NUM_ACS);
> + if (!idx)
> return -EINVAL;
> -
> + txq_params->ac = *idx;
> return 0;
> }
>
>
>

2018-01-24 14:41:24

by Jiri Slaby

[permalink] [raw]
Subject: Re: [PATCH v4 07/10] x86: narrow out of bounds syscalls to sys_read under speculation

On 01/19/2018, 01:02 AM, Dan Williams wrote:
> The syscall table base is a user controlled function pointer in kernel
> space. Like 'get_user', use 'MASK_NOSPEC' to prevent any out of bounds
> speculation. While retpoline prevents speculating into the user
> controlled target, it does not stop the pointer de-reference; the concern
> is leaking memory relative to the syscall table base.
>
> Reported-by: Linus Torvalds <[email protected]>
> Cc: Thomas Gleixner <[email protected]>
> Cc: Ingo Molnar <[email protected]>
> Cc: "H. Peter Anvin" <[email protected]>
> Cc: [email protected]
> Cc: Andy Lutomirski <[email protected]>
> Signed-off-by: Dan Williams <[email protected]>
> ---
> arch/x86/entry/entry_64.S | 2 ++
> arch/x86/include/asm/smap.h | 9 ++++++++-
> 2 files changed, 10 insertions(+), 1 deletion(-)
>
> diff --git a/arch/x86/entry/entry_64.S b/arch/x86/entry/entry_64.S
> index 4f8e1d35a97c..2320017077d4 100644
> --- a/arch/x86/entry/entry_64.S
> +++ b/arch/x86/entry/entry_64.S
> @@ -35,6 +35,7 @@
> #include <asm/asm.h>
> #include <asm/smap.h>
> #include <asm/pgtable_types.h>
> +#include <asm/smap.h>

This is already included 2 lines above

thanks,
--
js
suse labs

2018-01-25 07:17:46

by Cyril Novikov

[permalink] [raw]
Subject: Re: [PATCH v4 02/10] asm/nospec, array_ptr: sanitize speculative array de-references

On 1/18/2018 4:01 PM, Dan Williams wrote:
'array_ptr' is proposed as a generic mechanism to mitigate
Spectre-variant-1 attacks (i.e., an attack that bypasses boundary
checks via speculative execution). The 'array_ptr' implementation is
expected to be safe for current generation cpus across multiple
architectures (ARM, x86).

I'm an outside reviewer, not subscribed to the list, so forgive me if I
do something not according to protocol. I have the following comments on
this change:

After discarding the speculation barrier variant, is array_ptr() needed
at all? You could have a simpler sanitizing macro, say

#define array_sanitize_idx(idx, sz) ((idx) & array_ptr_mask((idx), (sz)))

(adjusted to not evaluate idx twice). And use it as follows:

if (idx < array_size) {
idx = array_sanitize_idx(idx, array_size);
do_something(array[idx]);
}

If I understand the speculation stuff correctly, unlike array_ptr(),
this "leaks" array[0] rather than nothing (*NULL) when executed
speculatively. However, it's still much better than leaking an arbitrary
location in memory. The attacker can likely get array[0] "leaked" by
passing 0 as idx anyway.
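
A single-evaluation spelling of that macro, following the
statement-expression style of the quoted patch (the exact form here is
hypothetical), might be:

#define array_sanitize_idx(idx, sz)                     \
({                                                      \
        unsigned long __idx = (idx);                    \
        __idx & array_ptr_mask(__idx, (sz));            \
})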

> +/*
> + * If idx is negative or if idx >= sz then bit 63 is set in the mask,
> + * and the value of ~(-1L) is zero. When the mask is zero, bounds check
> + * failed, array_ptr will return NULL.
> + */
> +#ifndef array_ptr_mask
> +static inline unsigned long array_ptr_mask(unsigned long idx, unsigned long sz)
> +{
> + return ~(long)(idx | (sz - 1 - idx)) >> (BITS_PER_LONG - 1);
> +}
> +#endif

Why does this have to resort to the undefined behavior of shifting a
negative number to the right? You can do without it:

return ((idx | (sz - 1 - idx)) >> (BITS_PER_LONG - 1)) - 1;

Of course, you could argue that subtracting 1 from 0 to get all ones is
also an undefined behavior, but it's still much better than the shift,
isn't it?
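
For what it's worth, the two mask expressions can be laid side by side
(the helper names here are hypothetical, shown only to make the
comparison concrete):

/* the posted version: arithmetic right shift of a possibly negative value */
static inline unsigned long mask_by_shift(unsigned long idx, unsigned long sz)
{
        return ~(long)(idx | (sz - 1 - idx)) >> (BITS_PER_LONG - 1);
}

/* the suggested version: same result, no right shift of a negative value */
static inline unsigned long mask_by_sub(unsigned long idx, unsigned long sz)
{
        return ((idx | (sz - 1 - idx)) >> (BITS_PER_LONG - 1)) - 1;
}

/* both evaluate to ~0UL when idx < sz and to 0UL when idx >= sz */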

> +#define array_ptr(base, idx, sz) \
> +({ \
> + union { typeof(*(base)) *_ptr; unsigned long _bit; } __u; \
> + typeof(*(base)) *_arr = (base); \
> + unsigned long _i = (idx); \
> + unsigned long _mask = array_ptr_mask(_i, (sz)); \
> + \
> + __u._ptr = _arr + (_i & _mask); \
> + __u._bit &= _mask; \
> + __u._ptr; \
> +})

Call me paranoid, but I think this may actually create an exploitable
bug on 32-bit systems due to casting the index to an unsigned long, if
the index as it comes from userland is a 64-bit value. You have
*replaced* the "if (idx < array_size)" check with checking if
array_ptr() returns NULL. Well, it doesn't return NULL if the low 32
bits of the index are in-bounds, but the high 32 bits are not zero.
Apart from the return value pointing to the wrong place, the subsequent
code may then assume that the 64-bit idx is actually valid and trip on
it badly.
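
A sketch of that hazard, assuming a 32-bit build and a hypothetical
caller (the constants are made up for illustration):

/* 'arr' has 16 elements; 'idx' arrives from userland as a u64 */
int get_elem(int *arr, u64 idx)
{
        int *p = array_ptr(arr, idx, 16);

        /*
         * With a 32-bit unsigned long, idx == 0x100000001ULL truncates
         * to 0x1 inside array_ptr(), so p comes back non-NULL even
         * though the full 64-bit index is far out of bounds.
         */
        return p ? *p : -1;
}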

--
Cyril

2018-01-25 22:38:37

by Dan Williams

[permalink] [raw]
Subject: Re: [PATCH v4 02/10] asm/nospec, array_ptr: sanitize speculative array de-references

On Wed, Jan 24, 2018 at 11:09 PM, Cyril Novikov <[email protected]> wrote:
> On 1/18/2018 4:01 PM, Dan Williams wrote:
>>
>> 'array_ptr' is proposed as a generic mechanism to mitigate
>> Spectre-variant-1 attacks (i.e., an attack that bypasses boundary
>> checks via speculative execution). The 'array_ptr' implementation is
>> expected to be safe for current generation cpus across multiple
>> architectures (ARM, x86).
>
>
> I'm an outside reviewer, not subscribed to the list, so forgive me if I do
> something not according to protocol. I have the following comments on this
> change:
>
> After discarding the speculation barrier variant, is array_ptr() needed at
> all? You could have a simpler sanitizing macro, say
>
> #define array_sanitize_idx(idx, sz) ((idx) & array_ptr_mask((idx), (sz)))
>
> (adjusted to not evaluate idx twice). And use it as follows:
>
> if (idx < array_size) {
> idx = array_sanitize_idx(idx, array_size);
> do_something(array[idx]);
> }
>
> If I understand the speculation stuff correctly, unlike array_ptr(), this
> "leaks" array[0] rather than nothing (*NULL) when executed speculatively.
> However, it's still much better than leaking an arbitrary location in
> memory. The attacker can likely get array[0] "leaked" by passing 0 as idx
> anyway.

True, we could simplify it to just sanitize the index with the
expectation that speculating with index 0 is safe. Although we'd want
to document it, just in case there are some odd code paths where the
valid portion of the array is offset from 0.

I like this approach better because it handles cases where the bounds
check is far away from the array de-reference. I.e. instead of having
array_ptr() and array_idx() for different cases, just sanitize the
index.

>
>> +/*
>> + * If idx is negative or if idx >= sz then bit 63 is set in the mask,
>> + * and the value of ~(-1L) is zero. When the mask is zero, bounds check
>> + * failed, array_ptr will return NULL.
>> + */
>> +#ifndef array_ptr_mask
>> +static inline unsigned long array_ptr_mask(unsigned long idx, unsigned
>> long sz)
>> +{
>> + return ~(long)(idx | (sz - 1 - idx)) >> (BITS_PER_LONG - 1);
>> +}
>> +#endif
>
>
> Why does this have to resort to the undefined behavior of shifting a
> negative number to the right? You can do without it:
>
> return ((idx | (sz - 1 - idx)) >> (BITS_PER_LONG - 1)) - 1;
>
> Of course, you could argue that subtracting 1 from 0 to get all ones is also
> an undefined behavior, but it's still much better than the shift, isn't it?

The goal is to prevent the compiler from emitting conditional
instructions. For example, the simplest form of this sanitize
operation from Adam:

return 0 - ((long) (idx < sz));

...produces the desired "cmp, sbb" sequence on x86, but leads to a
"csel" instruction being emitted for arm64.

Stepping back, all this churn is around the un-optimized form of the
comparison vs the inline asm that an arch can provide; since we're
effectively handling the carry bit in software, we can have a
WARN_ON_ONCE(idx < 0 || sz < 0) to catch places where the expectations
of the macro are violated.

Archs can remove the overhead of that extra branch with an instruction
sequence that handles the carry bit in hardware; otherwise we get
runtime coverage of places where array_idx() gets used incorrectly.

>
>> +#define array_ptr(base, idx, sz) \
>> +({ \
>> + union { typeof(*(base)) *_ptr; unsigned long _bit; } __u; \
>> + typeof(*(base)) *_arr = (base); \
>> + unsigned long _i = (idx); \
>> + unsigned long _mask = array_ptr_mask(_i, (sz)); \
>> + \
>> + __u._ptr = _arr + (_i & _mask); \
>> + __u._bit &= _mask; \
>> + __u._ptr; \
>> +})
>
>
> Call me paranoid, but I think this may actually create an exploitable bug on
> 32-bit systems due to casting the index to an unsigned long, if the index as
> it comes from userland is a 64-bit value. You have *replaced* the "if (idx <
> array_size)" check with checking if array_ptr() returns NULL. Well, it
> doesn't return NULL if the low 32 bits of the index are in-bounds, but the
> high 32 bits are not zero. Apart from the return value pointing to the wrong
> place, the subsequent code may then assume that the 64-bit idx is actually
> valid and trip on it badly.

Yes, I'll add some BUILD_BUG_ON safety for cases where sizeof(idx) >
sizeof(long).
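
Something like the following, sketched against the macro as posted
(where exactly the assertion lands is an assumption, not the final
patch):

#define array_ptr(base, idx, sz)                                       \
({                                                                     \
        union { typeof(*(base)) *_ptr; unsigned long _bit; } __u;     \
        typeof(*(base)) *_arr = (base);                                \
        unsigned long _i = (idx);                                      \
        unsigned long _mask = array_ptr_mask(_i, (sz));                \
                                                                       \
        /* refuse indices wider than unsigned long at compile time */  \
        BUILD_BUG_ON(sizeof(idx) > sizeof(unsigned long));             \
        __u._ptr = _arr + (_i & _mask);                                \
        __u._bit &= _mask;                                             \
        __u._ptr;                                                      \
})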

Thanks for taking a look.

2018-02-06 19:30:51

by Luis Henriques

[permalink] [raw]
Subject: Re: [PATCH v4 07/10] x86: narrow out of bounds syscalls to sys_read under speculation

On Thu, Jan 18, 2018 at 04:02:21PM -0800, Dan Williams wrote:
> The syscall table base is a user controlled function pointer in kernel
> space. Like 'get_user', use 'MASK_NOSPEC' to prevent any out of bounds
> speculation. While retpoline prevents speculating into the user
> controlled target, it does not stop the pointer de-reference; the concern
> is leaking memory relative to the syscall table base.

This patch seems to cause a regression. An easy way to reproduce what
I'm seeing is to run the samples/statx/test-statx. Here's what I see
when I have this patchset applied:

# ./test-statx /tmp
statx(/tmp) = -1
/tmp: Bad file descriptor

Reverting this single patch seems to fix it.

Cheers,
--
Luís

>
> Reported-by: Linus Torvalds <[email protected]>
> Cc: Thomas Gleixner <[email protected]>
> Cc: Ingo Molnar <[email protected]>
> Cc: "H. Peter Anvin" <[email protected]>
> Cc: [email protected]
> Cc: Andy Lutomirski <[email protected]>
> Signed-off-by: Dan Williams <[email protected]>
> ---
> arch/x86/entry/entry_64.S | 2 ++
> arch/x86/include/asm/smap.h | 9 ++++++++-
> 2 files changed, 10 insertions(+), 1 deletion(-)
>
> diff --git a/arch/x86/entry/entry_64.S b/arch/x86/entry/entry_64.S
> index 4f8e1d35a97c..2320017077d4 100644
> --- a/arch/x86/entry/entry_64.S
> +++ b/arch/x86/entry/entry_64.S
> @@ -35,6 +35,7 @@
> #include <asm/asm.h>
> #include <asm/smap.h>
> #include <asm/pgtable_types.h>
> +#include <asm/smap.h>
> #include <asm/export.h>
> #include <asm/frame.h>
> #include <asm/nospec-branch.h>
> @@ -264,6 +265,7 @@ entry_SYSCALL_64_fastpath:
> cmpl $__NR_syscall_max, %eax
> #endif
> ja 1f /* return -ENOSYS (already in pt_regs->ax) */
> + MASK_NOSPEC %r11 %rax /* sanitize syscall_nr wrt speculation */
> movq %r10, %rcx
>
> /*
> diff --git a/arch/x86/include/asm/smap.h b/arch/x86/include/asm/smap.h
> index 2b4ad4c6a226..3b5b2cf58dc6 100644
> --- a/arch/x86/include/asm/smap.h
> +++ b/arch/x86/include/asm/smap.h
> @@ -35,7 +35,14 @@
> * this directs the cpu to speculate with a NULL ptr rather than
> * something targeting kernel memory.
> *
> - * assumes CF is set from a previous 'cmp TASK_addr_limit, %ptr'
> + * In the syscall entry path it is possible to speculate past the
> + * validation of the system call number. Use MASK_NOSPEC to sanitize the
> + * syscall array index to zero (sys_read) rather than an arbitrary
> + * target.
> + *
> + * assumes CF is set from a previous 'cmp' i.e.:
> + * cmp TASK_addr_limit, %ptr
> + * cmp __NR_syscall_max, %idx
> */
> .macro MASK_NOSPEC mask val
> sbb \mask, \mask
>
>

2018-02-06 19:49:59

by Dan Williams

[permalink] [raw]
Subject: Re: [PATCH v4 07/10] x86: narrow out of bounds syscalls to sys_read under speculation

On Tue, Feb 6, 2018 at 11:29 AM, Luis Henriques <[email protected]> wrote:
> On Thu, Jan 18, 2018 at 04:02:21PM -0800, Dan Williams wrote:
>> The syscall table base is a user controlled function pointer in kernel
>> space. Like 'get_user', use 'MASK_NOSPEC' to prevent any out of bounds
>> speculation. While retpoline prevents speculating into the user
>> controlled target, it does not stop the pointer de-reference; the concern
>> is leaking memory relative to the syscall table base.
>
> This patch seems to cause a regression. An easy way to reproduce what
> I'm seeing is to run the samples/statx/test-statx. Here's what I see
> when I have this patchset applied:
>
> # ./test-statx /tmp
> statx(/tmp) = -1
> /tmp: Bad file descriptor
>
> Reverting this single patch seems to fix it.

Just to clarify, when you say "this patch" you mean:

2fbd7af5af86 x86/syscall: Sanitize syscall table de-references
under speculation

...not this early MASK_NOSPEC version of the patch, right?


2018-02-06 20:27:30

by Linus Torvalds

[permalink] [raw]
Subject: Re: [PATCH v4 07/10] x86: narrow out of bounds syscalls to sys_read under speculation

On Tue, Feb 6, 2018 at 11:48 AM, Dan Williams <[email protected]> wrote:
>
> Just to clarify, when you say "this patch" you mean:
>
> 2fbd7af5af86 x86/syscall: Sanitize syscall table de-references
> under speculation
>
> ...not this early MASK_NOSPEC version of the patch, right?

I suspect not. If that patch were broken, the system wouldn't even boot.

That said, looking at 2fbd7af5af86, I do note that the code generation
is horribly stupid.

It's due to two different issues:

(a) the x86 asm constraints for that inline asm are nasty, and
require a register for 'size', even though an immediate works just
fine.

(b) the "cmp" is inside the asm, so gcc can't combine it with the
*other* cmp in the C code.

Fixing (a) is easy:

+++ b/arch/x86/include/asm/barrier.h
@@ -43 +43 @@ static inline unsigned long
array_index_mask_nospec(unsigned long index,
- :"r"(size),"r" (index)
+ :"ir"(size),"r" (index)

but fixing (b) looks fundamentally hard. Gcc generates (for do_syscall()):

cmpq $332, %rbp #, nr
ja .L295 #,
cmp $333,%rbp
sbb %rax,%rax; #, nr, mask

note how it completely pointlessly does the comparison twice, even
though it could have just done

cmp $333,%rbp
jae .L295 #,
sbb %rax,%rax; #, nr, mask

Ho humm. Sad.

Linus

2018-02-06 20:38:58

by Dan Williams

[permalink] [raw]
Subject: Re: [PATCH v4 07/10] x86: narrow out of bounds syscalls to sys_read under speculation

On Tue, Feb 6, 2018 at 12:26 PM, Linus Torvalds
<[email protected]> wrote:
> On Tue, Feb 6, 2018 at 11:48 AM, Dan Williams <[email protected]> wrote:
>>
>> Just to clarify, when you say "this patch" you mean:
>>
>> 2fbd7af5af86 x86/syscall: Sanitize syscall table de-references
>> under speculation
>>
>> ...not this early MASK_NOSPEC version of the patch, right?
>
> I suspect not. If that patch is broken, the system wouldn't even boot.
>
> That said, looking at 2fbd7af5af86, I do note that the code generation
> is horribly stupid.
>
> It's due to two different issues:
>
> (a) the x86 asm constraints for that inline asm are nasty, and
> require a register for 'size', even though an immediate works just
> fine.
>
> (b) the "cmp" is inside the asm, so gcc can't combine it with the
> *other* cmp in the C code.
>
> Fixing (a) is easy:
>
> +++ b/arch/x86/include/asm/barrier.h
> @@ -43 +43 @@ static inline unsigned long
> array_index_mask_nospec(unsigned long index,
> - :"r"(size),"r" (index)
> + :"ir"(size),"r" (index)
>
> but fixing (b) looks fundamentally hard. Gcc generates (for do_syscall()):
>
> cmpq $332, %rbp #, nr
> ja .L295 #,
> cmp $333,%rbp
> sbb %rax,%rax; #, nr, mask
>
> note how it completely pointlessly does the comparison twice, even
> though it could have just done
>
> cmp $333,%rbp
> jae .L295 #,
> sbb %rax,%rax; #, nr, mask
>
> Ho humm. Sad.

Are there any compilers that would miscompile:

mask = 0 - (index < size);

That might be a way to improve the assembly.

2018-02-06 20:43:22

by Linus Torvalds

[permalink] [raw]
Subject: Re: [PATCH v4 07/10] x86: narrow out of bounds syscalls to sys_read under speculation

On Tue, Feb 6, 2018 at 12:37 PM, Dan Williams <[email protected]> wrote:
>
> Are there any compilers that would miscompile:
>
> mask = 0 - (index < size);
>
> That might be a way to improve the assembly.

Sadly, that is *very* easy to miscompile. In fact, I'd be very
surprised indeed if any compiler worth its name wouldn't combine the
comparison with the conditional branch it accompanies, and just turn
that into a constant. IOW, you'd get

mask = 0 - (index < size);
if (index <= size) {
... use mask ..

and the compiler would just turn that into

if (index <= size) {
mask = -1;

and be done with it.

Linus

2018-02-06 20:45:11

by Linus Torvalds

[permalink] [raw]
Subject: Re: [PATCH v4 07/10] x86: narrow out of bounds syscalls to sys_read under speculation

On Tue, Feb 6, 2018 at 12:42 PM, Linus Torvalds
<[email protected]> wrote:
>
> Sadly, that is *very* easy to miscompile.

Side note: don't read email, go watch the Falcon Heavy takeoff.

Linus

2018-02-06 20:50:59

by Andy Lutomirski

[permalink] [raw]
Subject: Re: [PATCH v4 07/10] x86: narrow out of bounds syscalls to sys_read under speculation

On Tue, Feb 6, 2018 at 8:42 PM, Linus Torvalds
<[email protected]> wrote:
> On Tue, Feb 6, 2018 at 12:37 PM, Dan Williams <[email protected]> wrote:
>>
>> Are there any compilers that would miscompile:
>>
>> mask = 0 - (index < size);
>>
>> That might be a way to improve the assembly.
>
> Sadly, that is *very* easy to miscompile. In fact, I'd be very
> surprised indeed if any compiler worth its name wouldn't combine the
> comparison with the conditional branch it accompanies, and just turn
> that into a constant. IOW, you'd get
>
> mask = 0 - (index < size);
> if (index <= size) {
> ... use mask ..
>
> and the compiler would just turn that into
>
> if (index <= size) {
> mask = -1;
>
> and be done with it.
>
> Linus

Can you use @cc to make an asm statement that outputs both the masked
array index and the "if" condition? I can never remember the syntax,
but something like:

asm ("cmp %[limit], %[index]\n\tcmovae %[zero], %[index]" : [index]
"+" (index), "@ccb" (result));

Then you shove this into a statement expression macro so you can do:

if (index_mask_nospec(&nr, NR_syscalls)) {
... sys_call_table[nr] ..;
}

(Caveat emptor: I can also *never* remember which way the $*!& AT&T
syntax cmp instruction goes.)

A down side is that nr actually ends up containing zero outside the
if. *That* could be avoided with jump labels.

--Andy

2018-02-06 20:59:24

by Linus Torvalds

[permalink] [raw]
Subject: Re: [PATCH v4 07/10] x86: narrow out of bounds syscalls to sys_read under speculation

On Tue, Feb 6, 2018 at 12:49 PM, Andy Lutomirski <[email protected]> wrote:
>
> Can you use @cc to make an asm statement that outputs both the masked
> array index and the "if" condition? I can never remember the syntax,
> but something like:

Yes. Although I'd actually suggest just using an "asm goto" if we
really want to optimize this. Give the "index_mask_nospec()" a third
argument that is the label to jump to for overflow.

Then you can just decide how to implement it best for any particular
architecture (and compiler limitation).

Linus
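
Concretely, a sketch of that asm-goto flavor (the helper name and
constraints here are assumptions; note also that 'asm goto' with
outputs needs newer compiler support, so this is illustrative only):

/* mask an index, or jump to the 'oob' label when it is out of bounds */
#define index_mask_nospec(idx, sz, oob)                         \
({                                                              \
        unsigned long _mask;                                    \
        asm goto ("cmp %1, %2\n\t"                              \
                  "jae %l3\n\t"    /* %l3 is the 'oob' label */ \
                  "sbb %0, %0"     /* CF set on fall-through */ \
                  : "=r" (_mask)                                \
                  : "ir" ((unsigned long)(sz)),                 \
                    "r" ((unsigned long)(idx))                  \
                  : "cc"                                        \
                  : oob);                                       \
        _mask;                                                  \
})

/* hypothetical caller shape for the syscall path */
static void do_syscall_sketch(struct pt_regs *regs, unsigned long nr)
{
        nr &= index_mask_nospec(nr, NR_syscalls, bad_nr);
        regs->ax = sys_call_table[nr](regs->di, regs->si, regs->dx,
                                      regs->r10, regs->r8, regs->r9);
        return;
bad_nr:
        regs->ax = -ENOSYS;
}

This fuses the bounds check with the mask generation, so the 'cmp' is
emitted exactly once and the taken path skips the 'sbb' entirely.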

2018-02-06 21:38:25

by Dan Williams

[permalink] [raw]
Subject: Re: [PATCH v4 07/10] x86: narrow out of bounds syscalls to sys_read under speculation

On Tue, Feb 6, 2018 at 12:58 PM, Linus Torvalds
<[email protected]> wrote:
> On Tue, Feb 6, 2018 at 12:49 PM, Andy Lutomirski <[email protected]> wrote:
>>
>> Can you use @cc to make an asm statement that outputs both the masked
>> array index and the "if" condition? I can never remember the syntax,
>> but something like:
>
> Yes. Although I'd actually suggest just using an "asm goto" if we
> really want to optimize this. Give the "index_mask_nospec()" a third
> argument that is the label to jump to for overflow.
>
> Then you can just decide how to implement it best for any particular
> architecture (and compiler limitation).

At that point we're basically just back to the array_ptr() version
that returned a sanitized pointer to an array element.

call = array_ptr(sys_call_table, nr & __SYSCALL_MASK, NR_syscalls);
if (likely(call))
regs->ax = (*call)(
regs->di, regs->si, regs->dx,
regs->r10, regs->r8, regs->r9);


e1e: ba 4d 01 00 00 mov $0x14d,%edx
e23: 48 39 d5 cmp %rdx,%rbp
e26: 48 19 d2 sbb %rdx,%rdx
call = array_ptr(sys_call_table, nr & __SYSCALL_MASK, NR_syscalls);
e29: 48 21 d5 and %rdx,%rbp
e2c: 48 8d 04 ed 00 00 00 lea 0x0(,%rbp,8),%rax
e33: 00
if (likely(call))
e34: 48 21 d0 and %rdx,%rax
e37: 74 1e je e57 <do_syscall_64+0x77>
regs->ax = (*call)(
e39: 48 8b 4b 38 mov 0x38(%rbx),%rcx
e3d: 48 8b 53 60 mov 0x60(%rbx),%rdx
e41: 48 8b 73 68 mov 0x68(%rbx),%rsi
e45: 48 8b 7b 70 mov 0x70(%rbx),%rdi
e49: 4c 8b 4b 40 mov 0x40(%rbx),%r9
e4d: 4c 8b 43 48 mov 0x48(%rbx),%r8
e51: ff 10 callq *(%rax)
e53: 48 89 43 50 mov %rax,0x50(%rbx)
e57: 65 48 8b 04 25 00 00 mov %gs:0x0,%rax

2018-02-06 22:53:38

by Luis Henriques

[permalink] [raw]
Subject: Re: [PATCH v4 07/10] x86: narrow out of bounds syscalls to sys_read under speculation

On Tue, Feb 06, 2018 at 11:48:45AM -0800, Dan Williams wrote:
> On Tue, Feb 6, 2018 at 11:29 AM, Luis Henriques <[email protected]> wrote:
> > On Thu, Jan 18, 2018 at 04:02:21PM -0800, Dan Williams wrote:
> >> The syscall table base is a user controlled function pointer in kernel
> >> space. Like 'get_user', use 'MASK_NOSPEC' to prevent any out of bounds
> >> speculation. While retpoline prevents speculating into the user
> >> controlled target, it does not stop the pointer de-reference; the concern
> >> is leaking memory relative to the syscall table base.
> >
> > This patch seems to cause a regression. An easy way to reproduce what
> > I'm seeing is to run the samples/statx/test-statx. Here's what I see
> > when I have this patchset applied:
> >
> > # ./test-statx /tmp
> > statx(/tmp) = -1
> > /tmp: Bad file descriptor
> >
> > Reverting this single patch seems to fix it.
>
> Just to clarify, when you say "this patch" you mean:
>
> 2fbd7af5af86 x86/syscall: Sanitize syscall table de-references
> under speculation
>
> ...not this early MASK_NOSPEC version of the patch, right?

*sigh*

Looks like I spent some good amount of time hunting a non-issue just
because I have enough old branches hanging around to confuse me :-(

Sorry for the noise.

Cheers,
--
Luís

2018-02-06 22:54:28

by Linus Torvalds

[permalink] [raw]
Subject: Re: [PATCH v4 07/10] x86: narrow out of bounds syscalls to sys_read under speculation

On Tue, Feb 6, 2018 at 1:37 PM, Dan Williams <[email protected]> wrote:
>
> At that point we're basically just back to the array_ptr() version
> that returned a sanitized pointer to an array element.

.. that one does an extra unnecessary 'andq' instead of the duplicated
cmp. But at least it avoids comparing that 32-bit integer twice, so
it's probably slightly smaller.

(And your code generation is without the "r" -> "ir" fix for the size argument)

Probably doesn't matter. But a "asm goto" would give you at least
potentially optimal code.

Linus

2018-02-07 00:34:09

by Dan Williams

[permalink] [raw]
Subject: Re: [PATCH v4 07/10] x86: narrow out of bounds syscalls to sys_read under speculation

On Tue, Feb 6, 2018 at 2:52 PM, Linus Torvalds
<[email protected]> wrote:
> On Tue, Feb 6, 2018 at 1:37 PM, Dan Williams <[email protected]> wrote:
>>
>> At that point we're basically just back to the array_ptr() version
>> that returned a sanitized pointer to an array element.
>
> .. that one does an extra unnecessary 'andq' instead of the duplicated
> cmp. But at least it avoids comparing that 32-bit integer twice, so
> it's probably slightly smaller.
>
> (And your code generation is without the "r" -> "ir" fix for the size argument)
>
> Probably doesn't matter. But a "asm goto" would give you at least
> potentially optimal code.
>

Should we go with array_element_nospec() in the meantime, so we're not
depending on jump labels? With the constraint fix, and killing that
superfluous AND, the assembly is now:

e26: 48 81 fd 4d 01 00 00 cmp $0x14d,%rbp
e2d: 48 19 d2 sbb %rdx,%rdx
NR_syscalls);
if (likely(call))
e30: 48 21 d0 and %rdx,%rax
e33: 74 1e je e53 <do_syscall_64+0x73>
regs->ax = (*call)(regs->di, regs->si, regs->dx,
e35: 48 8b 4b 38 mov 0x38(%rbx),%rcx
e39: 48 8b 53 60 mov 0x60(%rbx),%rdx
e3d: 48 8b 73 68 mov 0x68(%rbx),%rsi
e41: 48 8b 7b 70 mov 0x70(%rbx),%rdi
e45: 4c 8b 4b 40 mov 0x40(%rbx),%r9
e49: 4c 8b 43 48 mov 0x48(%rbx),%r8
e4d: ff 10 callq *(%rax)
e4f: 48 89 43 50 mov %rax,0x50(%rbx)
e53: 65 48 8b 04 25 00 00 mov %gs:0x0,%rax

2018-02-07 01:26:02

by Linus Torvalds

[permalink] [raw]
Subject: Re: [PATCH v4 07/10] x86: narrow out of bounds syscalls to sys_read under speculation

On Tue, Feb 6, 2018 at 4:33 PM, Dan Williams <[email protected]> wrote:
>
> Should we go with array_element_nospec() in the meantime? So we're not
> depending on jump labels? With the constraint fix and killing that
> superfluous AND the assembly is now:
>
> e26: 48 81 fd 4d 01 00 00 cmp $0x14d,%rbp
> e2d: 48 19 d2 sbb %rdx,%rdx
> NR_syscalls);
> if (likely(call))
> e30: 48 21 d0 and %rdx,%rax
> e33: 74 1e je e53 <do_syscall_64+0x73>
> regs->ax = (*call)(regs->di, regs->si, regs->dx,
> e35: 48 8b 4b 38 mov 0x38(%rbx),%rcx
> e39: 48 8b 53 60 mov 0x60(%rbx),%rdx
> e3d: 48 8b 73 68 mov 0x68(%rbx),%rsi
> e41: 48 8b 7b 70 mov 0x70(%rbx),%rdi
> e45: 4c 8b 4b 40 mov 0x40(%rbx),%r9
> e49: 4c 8b 43 48 mov 0x48(%rbx),%r8
> e4d: ff 10 callq *(%rax)

That looks fairly optimal, except for the fact that the callq is
through a register.

Of course, that register-indirect calling convention is forced on us
by retpoline anyway (which you don't have enabled, likely because of a
lack of compiler). But without retpoline that callq could be

callq sys_call_table(,%rax,8)

if the masking is done on the index (and if the conditional jump had
been done on the cmp rather than the later 'and').

Instead, you have a

leaq sys_call_table(,%rbp,8),%rax

hiding somewhere earlier that doesn't show in your asm snippet.

Oh well. We'll have an extra instruction however we do this. I guess
that's just something we'll have to live with. No more bikeshedding..

Linus