Changes since v1 [1]:
* fixup the ifence definition to use alternative_2 per recent AMD
changes in tip/x86/pti (Tom)
* drop 'nospec_ptr' (Linus, Mark)
* rename 'nospec_array_ptr' to 'array_ptr' (Alexei)
* rename 'nospec_barrier' to 'ifence' (Peter, Ingo)
* clean up occasions of 'variable assignment in if()' (Sergei, Stephen)
* make 'array_ptr' use a mask instead of an architectural ifence by
default (Linus, Alexei)
* provide a command line and compile-time opt-in to the ifence
mechanism, if an architecture provides 'ifence_array_ptr'.
* provide an optimized mask generation helper, 'array_ptr_mask', for
x86 (Linus)
* move 'get_user' hardening from '__range_not_ok' to '__uaccess_begin'
(Linus)
* drop "Thermal/int340x: prevent bounds-check..." since userspace does
not have arbitrary control over the 'trip' index (Srinivas)
* update the changelog of "net: mpls: prevent bounds-check..." and keep
it in the series to continue the debate about Spectre hygiene patches
(Eric).
* record a reviewed-by from Laurent on "[media] uvcvideo: prevent
bounds-check..."
* update the cover letter
[1]: https://lwn.net/Articles/743376/
---
Quoting Mark's original RFC:
"Recently, Google Project Zero discovered several classes of attack
against speculative execution. One of these, known as variant-1, allows
explicit bounds checks to be bypassed under speculation, providing an
arbitrary read gadget. Further details can be found on the GPZ blog [2]
and the Documentation patch in this series."
This series incorporates Mark Rutland's latest ARM changes and adds
the x86 specific implementation of 'ifence_array_ptr'. That ifence
based approach is provided as an opt-in fallback, but the default
mitigation, '__array_ptr', uses a 'mask' approach that removes
conditional branch instructions, and otherwise aims to redirect
speculation to use a NULL pointer rather than a user controlled value.
The mask is generated by the following expression from Alexei and Linus:
mask = ~(long)(_i | (_s - 1 - _i)) >> (BITS_PER_LONG - 1);
...and Linus provided an optimized mask generation helper for x86:
asm ("cmpq %1,%2; sbbq %0,%0;"
:"=r" (mask)
:"r"(sz),"r" (idx)
:"cc");
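For review purposes, here is a sketch of how that expression behaves (not
part of the series; 'example_mask' is just an illustrative name that
mirrors the generic array_ptr_mask() fallback):

static inline unsigned long example_mask(unsigned long idx, unsigned long sz)
{
	/*
	 * idx < sz (both below LONG_MAX): the sign bit of
	 * (idx | (sz - 1 - idx)) is clear, '~' sets it, and the
	 * arithmetic shift smears it into ~0UL.
	 * idx >= sz: (sz - 1 - idx) underflows and sets the sign bit,
	 * '~' clears it, and the shift yields 0.
	 */
	return ~(long)(idx | (sz - 1 - idx)) >> (BITS_PER_LONG - 1);
}

So 'idx & example_mask(idx, sz)' is idx when the index is in bounds and 0
otherwise, even on a mis-speculated path.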
The 'array_ptr' mechanism can be switched between 'mask' and 'ifence'
via the spectre_v1={mask,ifence} command line option, and the
compile-time default is set by selecting either CONFIG_SPECTRE1_MASK or
CONFIG_SPECTRE1_IFENCE.
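For example, assuming an architecture that selects ARCH_HAS_IFENCE and a
kernel built with the default CONFIG_SPECTRE1_MASK=y, booting with:

    spectre_v1=ifence

...switches array_ptr() to the fence-based implementation, while
'spectre_v1=mask' restores the mask-based default.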
The 'array_ptr' infrastructure is the primary focus of this patch set.
The individual patches that perform 'array_ptr' conversions represent a
point-in-time (i.e. earlier kernel, early analysis tooling, x86 only,
etc...) start at finding some of these gadgets.
Another consideration for reviewing these patches is the 'hygiene'
argument. When a patch refers to hygiene it is concerned with stopping
speculation on an unconstrained or insufficiently constrained pointer
value under userspace control. That by itself is not sufficient for
attack (per current understanding) [3], but it is a necessary
pre-condition. So 'hygiene' refers to cleaning up those suspect
pointers regardless of whether they are usable as a gadget.
These patches are also available via the 'nospec-v2' git branch
here:
git://git.kernel.org/pub/scm/linux/kernel/git/djbw/linux nospec-v2
Note that the BPF fix for Spectre variant1 is merged in the bpf.git
tree [4], and is not included in this branch.
[2]: https://googleprojectzero.blogspot.co.uk/2018/01/reading-privileged-memory-with-side.html
[3]: https://spectreattack.com/spectre.pdf
[4]: https://git.kernel.org/pub/scm/linux/kernel/git/bpf/bpf.git/commit/?id=b2157399cc98
---
Dan Williams (16):
x86: implement ifence()
x86: implement ifence_array_ptr() and array_ptr_mask()
asm-generic/barrier: mask speculative execution flows
x86: introduce __uaccess_begin_nospec and ASM_IFENCE
x86: use __uaccess_begin_nospec and ASM_IFENCE in get_user paths
ipv6: prevent bounds-check bypass via speculative execution
ipv4: prevent bounds-check bypass via speculative execution
vfs, fdtable: prevent bounds-check bypass via speculative execution
userns: prevent bounds-check bypass via speculative execution
udf: prevent bounds-check bypass via speculative execution
[media] uvcvideo: prevent bounds-check bypass via speculative execution
carl9170: prevent bounds-check bypass via speculative execution
p54: prevent bounds-check bypass via speculative execution
qla2xxx: prevent bounds-check bypass via speculative execution
cw1200: prevent bounds-check bypass via speculative execution
net: mpls: prevent bounds-check bypass via speculative execution
Mark Rutland (3):
Documentation: document array_ptr
arm64: implement ifence_array_ptr()
arm: implement ifence_array_ptr()
Documentation/speculation.txt | 142 ++++++++++++++++++++++++++++++
arch/arm/Kconfig | 1
arch/arm/include/asm/barrier.h | 24 +++++
arch/arm64/Kconfig | 1
arch/arm64/include/asm/barrier.h | 24 +++++
arch/x86/Kconfig | 3 +
arch/x86/include/asm/barrier.h | 46 ++++++++++
arch/x86/include/asm/msr.h | 3 -
arch/x86/include/asm/smap.h | 4 +
arch/x86/include/asm/uaccess.h | 16 +++
arch/x86/include/asm/uaccess_32.h | 6 +
arch/x86/include/asm/uaccess_64.h | 12 +--
arch/x86/lib/copy_user_64.S | 3 +
arch/x86/lib/usercopy_32.c | 8 +-
drivers/media/usb/uvc/uvc_v4l2.c | 9 +-
drivers/net/wireless/ath/carl9170/main.c | 7 +
drivers/net/wireless/intersil/p54/main.c | 9 +-
drivers/net/wireless/st/cw1200/sta.c | 11 +-
drivers/net/wireless/st/cw1200/wsm.h | 4 -
drivers/scsi/qla2xxx/qla_mr.c | 17 ++--
fs/udf/misc.c | 40 +++++---
include/linux/fdtable.h | 7 +
include/linux/nospec.h | 71 +++++++++++++++
kernel/Kconfig.nospec | 31 +++++++
kernel/Makefile | 1
kernel/nospec.c | 52 +++++++++++
kernel/user_namespace.c | 11 +-
lib/Kconfig | 3 +
net/ipv4/raw.c | 10 +-
net/ipv6/raw.c | 10 +-
net/mpls/af_mpls.c | 12 +--
31 files changed, 521 insertions(+), 77 deletions(-)
create mode 100644 Documentation/speculation.txt
create mode 100644 include/linux/nospec.h
create mode 100644 kernel/Kconfig.nospec
create mode 100644 kernel/nospec.c
From: Mark Rutland <[email protected]>
Document the rationale and usage of the new array_ptr() helper.
Signed-off-by: Mark Rutland <[email protected]>
Signed-off-by: Will Deacon <[email protected]>
Cc: Dan Williams <[email protected]>
Cc: Jonathan Corbet <[email protected]>
Cc: Peter Zijlstra <[email protected]>
Signed-off-by: Dan Williams <[email protected]>
---
Documentation/speculation.txt | 142 +++++++++++++++++++++++++++++++++++++++++
1 file changed, 142 insertions(+)
create mode 100644 Documentation/speculation.txt
diff --git a/Documentation/speculation.txt b/Documentation/speculation.txt
new file mode 100644
index 000000000000..a4d465fd42cd
--- /dev/null
+++ b/Documentation/speculation.txt
@@ -0,0 +1,142 @@
+This document explains potential effects of speculation, and how undesirable
+effects can be mitigated portably using common APIs.
+
+===========
+Speculation
+===========
+
+To improve performance and minimize average latencies, many contemporary CPUs
+employ speculative execution techniques such as branch prediction, performing
+work which may be discarded at a later stage.
+
+Typically speculative execution cannot be observed from architectural state,
+such as the contents of registers. However, in some cases it is possible to
+observe its impact on microarchitectural state, such as the presence or
+absence of data in caches. Such state may form side-channels which can be
+observed to extract secret information.
+
+For example, in the presence of branch prediction, it is possible for bounds
+checks to be ignored by code which is speculatively executed. Consider the
+following code:
+
+ int load_array(int *array, unsigned int idx) {
+ if (idx >= MAX_ARRAY_ELEMS)
+ return 0;
+ else
+ return array[idx];
+ }
+
+Which, on arm64, may be compiled to an assembly sequence such as:
+
+ CMP <idx>, #MAX_ARRAY_ELEMS
+ B.LT less
+ MOV <returnval>, #0
+ RET
+ less:
+ LDR <returnval>, [<array>, <idx>]
+ RET
+
+It is possible that a CPU mis-predicts the conditional branch, and
+speculatively loads array[idx], even if idx >= MAX_ARRAY_ELEMS. This value
+will subsequently be discarded, but the speculated load may affect
+microarchitectural state which can be subsequently measured.
+
+More complex sequences involving multiple dependent memory accesses may result
+in sensitive information being leaked. Consider the following code, building
+on the prior example:
+
+ int load_dependent_arrays(int *arr1, int *arr2, int idx)
+ {
+	int val1, val2;
+
+ val1 = load_array(arr1, idx);
+ val2 = load_array(arr2, val1);
+
+ return val2;
+ }
+
+Under speculation, the first call to load_array() may return the value of an
+out-of-bounds address, while the second call will influence microarchitectural
+state dependent on this value. This may provide an arbitrary read primitive.
+
+====================================
+Mitigating speculation side-channels
+====================================
+
+The kernel provides a generic API to ensure that bounds checks are respected
+even under speculation. Architectures which are affected by speculation-based
+side-channels are expected to implement these primitives.
+
+The array_ptr() helper in <asm/barrier.h> can be used to prevent
+information from being leaked via side-channels.
+
+A call to array_ptr(arr, idx, sz) returns a sanitized pointer to
+arr[idx] only if idx falls in the [0, sz) interval. When idx < 0 or idx >= sz,
+NULL is returned. Additionally, array_ptr() ensures that an out-of-bounds
+pointer is not propagated to code which is speculatively executed.
+
+This can be used to protect the earlier load_array() example:
+
+ int load_array(int *array, unsigned int idx)
+ {
+ int *elem;
+
+ elem = array_ptr(array, idx, MAX_ARRAY_ELEMS);
+ if (elem)
+ return *elem;
+ else
+ return 0;
+ }
+
+This can also be used in situations where multiple fields on a structure are
+accessed:
+
+ struct foo array[SIZE];
+ int a, b;
+
+ void do_thing(int idx)
+ {
+ struct foo *elem;
+
+ elem = array_ptr(array, idx, SIZE);
+ if (elem) {
+ a = elem->field_a;
+ b = elem->field_b;
+ }
+ }
+
+It is imperative that the returned pointer is used. Pointers which are
+generated separately are subject to a number of potential CPU and compiler
+optimizations, and may still be used speculatively. For example, this means
+that the following sequence is unsafe:
+
+ struct foo array[SIZE];
+ int a, b;
+
+ void do_thing(int idx)
+ {
+ if (array_ptr(array, idx, SIZE) != NULL) {
+ // unsafe as wrong pointer is used
+ a = array[idx].field_a;
+ b = array[idx].field_b;
+ }
+ }
+
+Similarly, it is unsafe to compare the returned pointer with other pointers,
+as this may permit the compiler to substitute one pointer with another,
+permitting speculation. For example, the following sequence is unsafe:
+
+ struct foo array[SIZE];
+ int a, b;
+
+ void do_thing(int idx)
+ {
+	struct foo *elem = array_ptr(array, idx, SIZE);
+
+ // unsafe due to pointer substitution
+ if (elem == &array[idx]) {
+ a = elem->field_a;
+ b = elem->field_b;
+ }
+ }
+
From: Mark Rutland <[email protected]>
This patch implements ifence_array_ptr() for arm64, using an
LDR+CSEL+CSDB sequence to inhibit speculative use of the returned value.
Signed-off-by: Mark Rutland <[email protected]>
Signed-off-by: Will Deacon <[email protected]>
Cc: Peter Zijlstra <[email protected]>
Signed-off-by: Dan Williams <[email protected]>
---
arch/arm64/include/asm/barrier.h | 24 ++++++++++++++++++++++++
1 file changed, 24 insertions(+)
diff --git a/arch/arm64/include/asm/barrier.h b/arch/arm64/include/asm/barrier.h
index 77651c49ef44..74ffcddb26e6 100644
--- a/arch/arm64/include/asm/barrier.h
+++ b/arch/arm64/include/asm/barrier.h
@@ -40,6 +40,30 @@
#define dma_rmb() dmb(oshld)
#define dma_wmb() dmb(oshst)
+#define ifence_array_ptr(arr, idx, sz) \
+({ \
+ typeof(&(arr)[0]) __nap_arr = (arr); \
+ typeof(idx) __nap_idx = (idx); \
+ typeof(sz) __nap_sz = (sz); \
+ \
+ unsigned long __nap_ptr = (unsigned long)__nap_arr + \
+		sizeof(__nap_arr[0]) * __nap_idx;			\
+ \
+ asm volatile( \
+ " cmp %[i], %[s]\n" \
+ " b.cs 1f\n" \
+ " ldr %[p], %[pp]\n" \
+ "1: csel %[p], %[p], xzr, cc\n" \
+ " hint #0x14 // CSDB\n" \
+ : [p] "=&r" (__nap_ptr) \
+ : [pp] "m" (__nap_ptr), \
+ [i] "r" ((unsigned long)__nap_idx), \
+ [s] "r" ((unsigned long)__nap_sz) \
+ : "cc"); \
+ \
+ (typeof(&(__nap_arr)[0]))__nap_ptr; \
+})
+
#define __smp_mb() dmb(ish)
#define __smp_rmb() dmb(ishld)
#define __smp_wmb() dmb(ishst)
The new barrier, 'ifence', ensures that no instructions past the
boundary are speculatively executed.
Previously the kernel only needed this fence in 'rdtsc_ordered', but it
can also be used as a mitigation against Spectre variant1 attacks that
speculatively access memory past an array bounds check.
'ifence', via 'ifence_array_ptr', is an opt-in fallback to the default
mitigation provided by '__array_ptr'. It is also proposed for blocking
speculation in the 'get_user' path to bypass 'access_ok' checks. For
now, just provide the common definition for later patches to build upon.
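For illustration, a caller could also use the barrier directly between a
bounds check and the dependent access (a sketch only, not from this patch;
'ifence_example', 'foo' and 'foo_size' are made-up names):

static int ifence_example(const int *foo, unsigned long idx,
		unsigned long foo_size)
{
	if (idx < foo_size) {
		/* no later instruction executes until the check retires */
		ifence();
		return foo[idx];
	}
	return 0;
}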
Suggested-by: Peter Zijlstra <[email protected]>
Suggested-by: Alan Cox <[email protected]>
Cc: Tom Lendacky <[email protected]>
Cc: Mark Rutland <[email protected]>
Cc: Greg KH <[email protected]>
Cc: Thomas Gleixner <[email protected]>
Cc: Ingo Molnar <[email protected]>
Cc: "H. Peter Anvin" <[email protected]>
Cc: [email protected]
Signed-off-by: Elena Reshetova <[email protected]>
Signed-off-by: Dan Williams <[email protected]>
---
arch/x86/include/asm/barrier.h | 4 ++++
arch/x86/include/asm/msr.h | 3 +--
2 files changed, 5 insertions(+), 2 deletions(-)
diff --git a/arch/x86/include/asm/barrier.h b/arch/x86/include/asm/barrier.h
index 7fb336210e1b..b04f572d6d97 100644
--- a/arch/x86/include/asm/barrier.h
+++ b/arch/x86/include/asm/barrier.h
@@ -24,6 +24,10 @@
#define wmb() asm volatile("sfence" ::: "memory")
#endif
+/* prevent speculative execution past this barrier */
+#define ifence() alternative_2("", "mfence", X86_FEATURE_MFENCE_RDTSC, \
+ "lfence", X86_FEATURE_LFENCE_RDTSC)
+
#ifdef CONFIG_X86_PPRO_FENCE
#define dma_rmb() rmb()
#else
diff --git a/arch/x86/include/asm/msr.h b/arch/x86/include/asm/msr.h
index 07962f5f6fba..e426d2a33ff3 100644
--- a/arch/x86/include/asm/msr.h
+++ b/arch/x86/include/asm/msr.h
@@ -214,8 +214,7 @@ static __always_inline unsigned long long rdtsc_ordered(void)
* that some other imaginary CPU is updating continuously with a
* time stamp.
*/
- alternative_2("", "mfence", X86_FEATURE_MFENCE_RDTSC,
- "lfence", X86_FEATURE_LFENCE_RDTSC);
+ ifence();
return rdtsc();
}
'ifence_array_ptr' is provided as an alternative to the default
'__array_ptr' implementation that uses a mask to sanitize user
controllable pointers. Later patches will allow it to be selected via the
kernel command line. The '__array_ptr' implementation otherwise appears
safe for current generation cpus across multiple architectures.
'array_ptr_mask' is used by the default 'array_ptr' implementation to
cheaply calculate an array bounds mask.
Suggested-by: Linus Torvalds <[email protected]>
Cc: Thomas Gleixner <[email protected]>
Cc: Ingo Molnar <[email protected]>
Cc: "H. Peter Anvin" <[email protected]>
Cc: [email protected]
Signed-off-by: Dan Williams <[email protected]>
---
arch/x86/include/asm/barrier.h | 42 ++++++++++++++++++++++++++++++++++++++++
1 file changed, 42 insertions(+)
diff --git a/arch/x86/include/asm/barrier.h b/arch/x86/include/asm/barrier.h
index b04f572d6d97..4450d25d8cde 100644
--- a/arch/x86/include/asm/barrier.h
+++ b/arch/x86/include/asm/barrier.h
@@ -28,6 +28,48 @@
#define ifence() alternative_2("", "mfence", X86_FEATURE_MFENCE_RDTSC, \
"lfence", X86_FEATURE_LFENCE_RDTSC)
+/**
+ * ifence_array_ptr - Generate a pointer to an array element,
+ * ensuring the pointer is bounded under speculation.
+ *
+ * @arr: the base of the array
+ * @idx: the index of the element
+ * @sz: the number of elements in the array
+ *
+ * If @idx falls in the interval [0, @sz), returns the pointer to
+ * @arr[@idx], otherwise returns NULL.
+ */
+#define ifence_array_ptr(arr, idx, sz) \
+({ \
+ typeof(*(arr)) *__arr = (arr), *__ret; \
+ typeof(idx) __idx = (idx); \
+ typeof(sz) __sz = (sz); \
+ \
+ __ret = __idx < __sz ? __arr + __idx : NULL; \
+ ifence(); \
+ __ret; \
+})
+
+/**
+ * array_ptr_mask - generate a mask for array_ptr() that is ~0UL when
+ * the bounds check succeeds and 0 otherwise
+ */
+#define array_ptr_mask array_ptr_mask
+static inline unsigned long array_ptr_mask(unsigned long idx, unsigned long sz)
+{
+ unsigned long mask;
+
+ /*
+	 * mask = index - size; if that result is >= 0 then the index is
+	 * invalid and the mask is 0, otherwise the mask is ~0
+ */
+ asm ("cmpq %1,%2; sbbq %0,%0;"
+ :"=r" (mask)
+ :"r"(sz),"r" (idx)
+ :"cc");
+ return mask;
+}
+
#ifdef CONFIG_X86_PPRO_FENCE
#define dma_rmb() rmb()
#else
'__array_ptr' is proposed as a generic mechanism to mitigate against
Spectre-variant-1 attacks (i.e. an attack that bypasses memory bounds
checks via speculative execution). The '__array_ptr' implementation
appears safe for current generation cpus across multiple architectures.
In comparison, 'ifence_array_ptr' uses a hard / architectural 'ifence'
approach to preclude the possibility of speculative execution. However, it
is not the default given a concern for avoiding instruction-execution
barriers in potential fast paths.
Based on an original implementation by Linus Torvalds, tweaked to remove
speculative flows by Alexei Starovoitov, and tweaked again by Linus to
introduce an x86 assembly implementation for the mask generation.
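For reference, a typical call-site conversion looks like the following
sketch (illustrative only, not a hunk from this series; 'array_ptr_example'
and its arguments are made-up names):

#include <linux/nospec.h>

static int array_ptr_example(const int *array, unsigned long idx,
		unsigned long nr_elems)
{
	const int *elem = array_ptr(array, idx, nr_elems);

	/* NULL when idx is out of bounds, and under mis-speculation */
	return elem ? *elem : 0;
}

Only the returned pointer is dereferenced, so neither the cpu nor the
compiler is handed an unsanitized 'array + idx' value.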
Suggested-by: Linus Torvalds <[email protected]>
Suggested-by: Alexei Starovoitov <[email protected]>
Cc: Russell King <[email protected]>
Cc: Catalin Marinas <[email protected]>
Cc: Will Deacon <[email protected]>
Cc: Thomas Gleixner <[email protected]>
Cc: Ingo Molnar <[email protected]>
Cc: "H. Peter Anvin" <[email protected]>
Cc: [email protected]
Signed-off-by: Dan Williams <[email protected]>
---
arch/arm/Kconfig | 1 +
arch/arm64/Kconfig | 1 +
arch/x86/Kconfig | 3 ++
include/linux/nospec.h | 71 ++++++++++++++++++++++++++++++++++++++++++++++++
kernel/Kconfig.nospec | 31 +++++++++++++++++++++
kernel/Makefile | 1 +
kernel/nospec.c | 52 +++++++++++++++++++++++++++++++++++
lib/Kconfig | 3 ++
8 files changed, 163 insertions(+)
create mode 100644 include/linux/nospec.h
create mode 100644 kernel/Kconfig.nospec
create mode 100644 kernel/nospec.c
diff --git a/arch/arm/Kconfig b/arch/arm/Kconfig
index 51c8df561077..fd4789ec8cac 100644
--- a/arch/arm/Kconfig
+++ b/arch/arm/Kconfig
@@ -7,6 +7,7 @@ config ARM
select ARCH_HAS_DEBUG_VIRTUAL
select ARCH_HAS_DEVMEM_IS_ALLOWED
select ARCH_HAS_ELF_RANDOMIZE
+ select ARCH_HAS_IFENCE
select ARCH_HAS_SET_MEMORY
select ARCH_HAS_STRICT_KERNEL_RWX if MMU && !XIP_KERNEL
select ARCH_HAS_STRICT_MODULE_RWX if MMU
diff --git a/arch/arm64/Kconfig b/arch/arm64/Kconfig
index c9a7e9e1414f..22765c4b6986 100644
--- a/arch/arm64/Kconfig
+++ b/arch/arm64/Kconfig
@@ -16,6 +16,7 @@ config ARM64
select ARCH_HAS_GCOV_PROFILE_ALL
select ARCH_HAS_GIGANTIC_PAGE if (MEMORY_ISOLATION && COMPACTION) || CMA
select ARCH_HAS_KCOV
+ select ARCH_HAS_IFENCE
select ARCH_HAS_SET_MEMORY
select ARCH_HAS_SG_CHAIN
select ARCH_HAS_STRICT_KERNEL_RWX
diff --git a/arch/x86/Kconfig b/arch/x86/Kconfig
index d4fc98c50378..68698289c83c 100644
--- a/arch/x86/Kconfig
+++ b/arch/x86/Kconfig
@@ -54,6 +54,7 @@ config X86
select ARCH_HAS_FORTIFY_SOURCE
select ARCH_HAS_GCOV_PROFILE_ALL
select ARCH_HAS_KCOV if X86_64
+ select ARCH_HAS_IFENCE
select ARCH_HAS_PMEM_API if X86_64
# Causing hangs/crashes, see the commit that added this change for details.
select ARCH_HAS_REFCOUNT
@@ -442,6 +443,8 @@ config INTEL_RDT
Say N if unsure.
+source "kernel/Kconfig.nospec"
+
if X86_32
config X86_EXTENDED_PLATFORM
bool "Support for extended (non-PC) x86 platforms"
diff --git a/include/linux/nospec.h b/include/linux/nospec.h
new file mode 100644
index 000000000000..5c66fc30f919
--- /dev/null
+++ b/include/linux/nospec.h
@@ -0,0 +1,71 @@
+// SPDX-License-Identifier: GPL-2.0
+// Copyright(c) 2018 Intel Corporation. All rights reserved.
+
+#ifndef __NOSPEC_H__
+#define __NOSPEC_H__
+
+#include <linux/jump_label.h>
+#include <asm/barrier.h>
+
+#ifndef array_ptr_mask
+#define array_ptr_mask(idx, sz) \
+({ \
+ unsigned long mask; \
+ unsigned long _i = (idx); \
+ unsigned long _s = (sz); \
+ \
+ mask = ~(long)(_i | (_s - 1 - _i)) >> (BITS_PER_LONG - 1); \
+ mask; \
+})
+#endif
+
+/**
+ * __array_ptr - Generate a pointer to an array element, ensuring
+ * the pointer is bounded under speculation to NULL.
+ *
+ * @base: the base of the array
+ * @idx: the index of the element, must be less than LONG_MAX
+ * @sz: the number of elements in the array, must be less than LONG_MAX
+ *
+ * If @idx falls in the interval [0, @sz), returns the pointer to
+ * @base[@idx], otherwise returns NULL.
+ */
+#define __array_ptr(base, idx, sz) \
+({ \
+ union { typeof(*(base)) *_ptr; unsigned long _bit; } __u; \
+ typeof(*(base)) *_arr = (base); \
+ unsigned long _i = (idx); \
+ unsigned long _mask = array_ptr_mask(_i, (sz)); \
+ \
+ __u._ptr = _arr + (_i & _mask); \
+ __u._bit &= _mask; \
+ __u._ptr; \
+})
+
+#ifdef CONFIG_SPECTRE1_IFENCE
+DECLARE_STATIC_KEY_TRUE(nospec_key);
+#else
+DECLARE_STATIC_KEY_FALSE(nospec_key);
+#endif
+
+#ifdef ifence_array_ptr
+/*
+ * The expectation is that no compiler or cpu will mishandle __array_ptr
+ * leading to problematic speculative execution. Bypass the ifence
+ * based implementation by default.
+ */
+#define array_ptr(base, idx, sz) \
+({ \
+ typeof(*(base)) *__ret; \
+ \
+ if (static_branch_unlikely(&nospec_key)) \
+ __ret = ifence_array_ptr(base, idx, sz); \
+ else \
+ __ret = __array_ptr(base, idx, sz); \
+ __ret; \
+})
+#else
+#define array_ptr __array_ptr
+#endif
+
+#endif /* __NOSPEC_H__ */
diff --git a/kernel/Kconfig.nospec b/kernel/Kconfig.nospec
new file mode 100644
index 000000000000..883929184f47
--- /dev/null
+++ b/kernel/Kconfig.nospec
@@ -0,0 +1,31 @@
+# SPDX-License-Identifier: GPL-2.0
+
+choice
+ prompt "Speculative execution past bounds check"
+ default SPECTRE1_MASK
+ help
+ Select the default mechanism for guarding against kernel
+ memory leaks via speculative execution past a boundary-check
+	  (Spectre variant1). This choice determines the contents of
+	  the array_ptr() helper. Note that vulnerable code paths need
+ to be instrumented with this helper to be protected.
+
+config SPECTRE1_MASK
+ bool "mask"
+ help
+ Provide an array_ptr() implementation that arranges for only
+ safe speculative flows to be exposed to the compiler/cpu. It
+ is preferred over "ifence" since it arranges for problematic
+ speculation to be disabled without need of an instruction
+ barrier.
+
+config SPECTRE1_IFENCE
+ bool "ifence"
+ depends on ARCH_HAS_IFENCE
+ help
+	  Provide an array_ptr() implementation that is specified by the
+ cpu architecture to barrier all speculative execution. Unless
+ you have specific knowledge of the "mask" approach being
+ unsuitable with a given compiler/cpu, select "mask".
+
+endchoice
diff --git a/kernel/Makefile b/kernel/Makefile
index 172d151d429c..09fe269b7d05 100644
--- a/kernel/Makefile
+++ b/kernel/Makefile
@@ -101,6 +101,7 @@ obj-$(CONFIG_TRACEPOINTS) += trace/
obj-$(CONFIG_IRQ_WORK) += irq_work.o
obj-$(CONFIG_CPU_PM) += cpu_pm.o
obj-$(CONFIG_BPF) += bpf/
+obj-$(CONFIG_ARCH_HAS_IFENCE) += nospec.o
obj-$(CONFIG_PERF_EVENTS) += events/
diff --git a/kernel/nospec.c b/kernel/nospec.c
new file mode 100644
index 000000000000..992de957216d
--- /dev/null
+++ b/kernel/nospec.c
@@ -0,0 +1,52 @@
+// SPDX-License-Identifier: GPL-2.0
+// Copyright(c) 2018 Intel Corporation. All rights reserved.
+#include <linux/module.h>
+#include <linux/compiler.h>
+#include <linux/jump_label.h>
+#include <linux/moduleparam.h>
+
+enum {
+ F_IFENCE,
+};
+
+#ifdef CONFIG_SPECTRE1_IFENCE
+static unsigned long nospec_flag = 1 << F_IFENCE;
+DEFINE_STATIC_KEY_TRUE(nospec_key);
+#else
+static unsigned long nospec_flag;
+DEFINE_STATIC_KEY_FALSE(nospec_key);
+#endif
+
+EXPORT_SYMBOL(nospec_key);
+
+static int param_set_nospec(const char *val, const struct kernel_param *kp)
+{
+ unsigned long *flags = kp->arg;
+
+ if (strcmp(val, "ifence") == 0 || strcmp(val, "ifence\n") == 0) {
+ if (!test_and_set_bit(F_IFENCE, flags))
+ static_key_enable(&nospec_key.key);
+ return 0;
+ } else if (strcmp(val, "mask") == 0 || strcmp(val, "mask\n") == 0) {
+ if (test_and_clear_bit(F_IFENCE, flags))
+ static_key_disable(&nospec_key.key);
+ return 0;
+ }
+ return -EINVAL;
+}
+
+static int param_get_nospec(char *buffer, const struct kernel_param *kp)
+{
+ unsigned long *flags = kp->arg;
+
+ return sprintf(buffer, "%s\n", test_bit(F_IFENCE, flags)
+ ? "ifence" : "mask");
+}
+
+static struct kernel_param_ops nospec_param_ops = {
+ .set = param_set_nospec,
+ .get = param_get_nospec,
+};
+
+core_param_cb(spectre_v1, &nospec_param_ops, &nospec_flag, 0600);
+MODULE_PARM_DESC(spectre_v1, "Spectre-v1 mitigation: 'mask' (default) vs 'ifence'");
diff --git a/lib/Kconfig b/lib/Kconfig
index c5e84fbcb30b..3cc7e7a03781 100644
--- a/lib/Kconfig
+++ b/lib/Kconfig
@@ -570,6 +570,9 @@ config STACKDEPOT
bool
select STACKTRACE
+config ARCH_HAS_IFENCE
+ bool
+
config SBITMAP
bool
For 'get_user' paths, do not allow the kernel to speculate on the value
of a user controlled pointer. In addition to the 'stac' instruction for
Supervisor Mode Access Protection, an 'ifence' causes the 'access_ok'
result to resolve in the pipeline before the cpu might take any
speculative action on the pointer value.
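The intended ordering looks like the following sketch (illustrative only,
this is not the actual get_user() implementation; 'example_fetch', 'uptr'
and 'kval' are made-up names):

static long example_fetch(const int __user *uptr, int *kval)
{
	if (!access_ok(VERIFY_READ, uptr, sizeof(*uptr)))
		return -EFAULT;

	__uaccess_begin_nospec();	/* stac(); ifence(); */
	/*
	 * The user load below cannot issue, even speculatively, before
	 * the access_ok() result above has resolved.
	 */
	unsafe_get_user(*kval, uptr, efault);
	__uaccess_end();
	return 0;

efault:
	__uaccess_end();
	return -EFAULT;
}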
Since this is a major kernel interface that deals with user controlled
data, the '__uaccess_begin_nospec' mechanism will prevent speculative
execution past an 'access_ok' permission check. While speculative
execution past 'access_ok' is not enough to lead to a kernel memory
leak, it is a necessary precondition.
To be clear, '__uaccess_begin_nospec' and ASM_IFENCE are not addressing
any known issues with 'get_user'; they are addressing a class of
potential problems that could be near 'get_user' usages. In other words,
these helpers are for hygiene not clinical fixes.
There are no functional changes in this patch.
Suggested-by: Linus Torvalds <[email protected]>
Suggested-by: Andi Kleen <[email protected]>
Cc: Al Viro <[email protected]>
Cc: Kees Cook <[email protected]>
Cc: Thomas Gleixner <[email protected]>
Cc: "H. Peter Anvin" <[email protected]>
Cc: Ingo Molnar <[email protected]>
Cc: [email protected]
Signed-off-by: Dan Williams <[email protected]>
---
arch/x86/include/asm/smap.h | 4 ++++
arch/x86/include/asm/uaccess.h | 10 ++++++++++
2 files changed, 14 insertions(+)
diff --git a/arch/x86/include/asm/smap.h b/arch/x86/include/asm/smap.h
index db333300bd4b..0b59707e0b46 100644
--- a/arch/x86/include/asm/smap.h
+++ b/arch/x86/include/asm/smap.h
@@ -40,6 +40,10 @@
#endif /* CONFIG_X86_SMAP */
+#define ASM_IFENCE \
+ ALTERNATIVE_2 "", "mfence", X86_FEATURE_MFENCE_RDTSC, \
+ "lfence", X86_FEATURE_LFENCE_RDTSC
+
#else /* __ASSEMBLY__ */
#include <asm/alternative.h>
diff --git a/arch/x86/include/asm/uaccess.h b/arch/x86/include/asm/uaccess.h
index 574dff4d2913..a31fd4fc6483 100644
--- a/arch/x86/include/asm/uaccess.h
+++ b/arch/x86/include/asm/uaccess.h
@@ -124,6 +124,11 @@ extern int __get_user_bad(void);
#define __uaccess_begin() stac()
#define __uaccess_end() clac()
+#define __uaccess_begin_nospec() \
+({ \
+ stac(); \
+ ifence(); \
+})
/*
* This is a type: either unsigned long, if the argument fits into
@@ -487,6 +492,11 @@ struct __large_struct { unsigned long buf[100]; };
__uaccess_begin(); \
barrier();
+#define uaccess_try_nospec do { \
+ current->thread.uaccess_err = 0; \
+ __uaccess_begin_nospec(); \
+ barrier();
+
#define uaccess_catch(err) \
__uaccess_end(); \
(err) |= (current->thread.uaccess_err ? -EFAULT : 0); \
Since this is a major kernel interface that deals with user controlled
data, the '__uaccess_begin_nospec' mechanism will prevent speculative
execution past an 'access_ok' permission check. While speculative
execution past 'access_ok' is not enough to lead to a kernel memory
leak, it is a necessary precondition.
To be clear, '__uaccess_begin_nospec' and ASM_IFENCE are not addressing
any known issues with 'get_user'; they are addressing a class of
potential problems that could be near 'get_user' usages. In other words,
these helpers are for hygiene not clinical fixes.
Note that '__copy_user_ll' is also called in the 'put_user' case, but
there is currently no indication that put_user in general deserves the
same hygiene as 'get_user'.
Suggested-by: Linus Torvalds <[email protected]>
Suggested-by: Andi Kleen <[email protected]>
Cc: Al Viro <[email protected]>
Cc: Kees Cook <[email protected]>
Cc: Thomas Gleixner <[email protected]>
Cc: "H. Peter Anvin" <[email protected]>
Cc: Ingo Molnar <[email protected]>
Cc: [email protected]
Signed-off-by: Dan Williams <[email protected]>
---
arch/x86/include/asm/uaccess.h | 6 +++---
arch/x86/include/asm/uaccess_32.h | 6 +++---
arch/x86/include/asm/uaccess_64.h | 12 ++++++------
arch/x86/lib/copy_user_64.S | 3 +++
arch/x86/lib/usercopy_32.c | 8 ++++----
5 files changed, 19 insertions(+), 16 deletions(-)
diff --git a/arch/x86/include/asm/uaccess.h b/arch/x86/include/asm/uaccess.h
index a31fd4fc6483..82c73f064e76 100644
--- a/arch/x86/include/asm/uaccess.h
+++ b/arch/x86/include/asm/uaccess.h
@@ -450,7 +450,7 @@ do { \
({ \
int __gu_err; \
__inttype(*(ptr)) __gu_val; \
- __uaccess_begin(); \
+ __uaccess_begin_nospec(); \
__get_user_size(__gu_val, (ptr), (size), __gu_err, -EFAULT); \
__uaccess_end(); \
(x) = (__force __typeof__(*(ptr)))__gu_val; \
@@ -558,7 +558,7 @@ struct __large_struct { unsigned long buf[100]; };
* get_user_ex(...);
* } get_user_catch(err)
*/
-#define get_user_try uaccess_try
+#define get_user_try uaccess_try_nospec
#define get_user_catch(err) uaccess_catch(err)
#define get_user_ex(x, ptr) do { \
@@ -592,7 +592,7 @@ extern void __cmpxchg_wrong_size(void)
__typeof__(ptr) __uval = (uval); \
__typeof__(*(ptr)) __old = (old); \
__typeof__(*(ptr)) __new = (new); \
- __uaccess_begin(); \
+ __uaccess_begin_nospec(); \
switch (size) { \
case 1: \
{ \
diff --git a/arch/x86/include/asm/uaccess_32.h b/arch/x86/include/asm/uaccess_32.h
index 72950401b223..ba2dc1930630 100644
--- a/arch/x86/include/asm/uaccess_32.h
+++ b/arch/x86/include/asm/uaccess_32.h
@@ -29,21 +29,21 @@ raw_copy_from_user(void *to, const void __user *from, unsigned long n)
switch (n) {
case 1:
ret = 0;
- __uaccess_begin();
+ __uaccess_begin_nospec();
__get_user_asm_nozero(*(u8 *)to, from, ret,
"b", "b", "=q", 1);
__uaccess_end();
return ret;
case 2:
ret = 0;
- __uaccess_begin();
+ __uaccess_begin_nospec();
__get_user_asm_nozero(*(u16 *)to, from, ret,
"w", "w", "=r", 2);
__uaccess_end();
return ret;
case 4:
ret = 0;
- __uaccess_begin();
+ __uaccess_begin_nospec();
__get_user_asm_nozero(*(u32 *)to, from, ret,
"l", "k", "=r", 4);
__uaccess_end();
diff --git a/arch/x86/include/asm/uaccess_64.h b/arch/x86/include/asm/uaccess_64.h
index f07ef3c575db..62546b3a398e 100644
--- a/arch/x86/include/asm/uaccess_64.h
+++ b/arch/x86/include/asm/uaccess_64.h
@@ -55,31 +55,31 @@ raw_copy_from_user(void *dst, const void __user *src, unsigned long size)
return copy_user_generic(dst, (__force void *)src, size);
switch (size) {
case 1:
- __uaccess_begin();
+ __uaccess_begin_nospec();
__get_user_asm_nozero(*(u8 *)dst, (u8 __user *)src,
ret, "b", "b", "=q", 1);
__uaccess_end();
return ret;
case 2:
- __uaccess_begin();
+ __uaccess_begin_nospec();
__get_user_asm_nozero(*(u16 *)dst, (u16 __user *)src,
ret, "w", "w", "=r", 2);
__uaccess_end();
return ret;
case 4:
- __uaccess_begin();
+ __uaccess_begin_nospec();
__get_user_asm_nozero(*(u32 *)dst, (u32 __user *)src,
ret, "l", "k", "=r", 4);
__uaccess_end();
return ret;
case 8:
- __uaccess_begin();
+ __uaccess_begin_nospec();
__get_user_asm_nozero(*(u64 *)dst, (u64 __user *)src,
ret, "q", "", "=r", 8);
__uaccess_end();
return ret;
case 10:
- __uaccess_begin();
+ __uaccess_begin_nospec();
__get_user_asm_nozero(*(u64 *)dst, (u64 __user *)src,
ret, "q", "", "=r", 10);
if (likely(!ret))
@@ -89,7 +89,7 @@ raw_copy_from_user(void *dst, const void __user *src, unsigned long size)
__uaccess_end();
return ret;
case 16:
- __uaccess_begin();
+ __uaccess_begin_nospec();
__get_user_asm_nozero(*(u64 *)dst, (u64 __user *)src,
ret, "q", "", "=r", 16);
if (likely(!ret))
diff --git a/arch/x86/lib/copy_user_64.S b/arch/x86/lib/copy_user_64.S
index 020f75cc8cf6..2429ca38dee6 100644
--- a/arch/x86/lib/copy_user_64.S
+++ b/arch/x86/lib/copy_user_64.S
@@ -31,6 +31,7 @@
*/
ENTRY(copy_user_generic_unrolled)
ASM_STAC
+ ASM_IFENCE
cmpl $8,%edx
jb 20f /* less then 8 bytes, go to byte copy loop */
ALIGN_DESTINATION
@@ -135,6 +136,7 @@ EXPORT_SYMBOL(copy_user_generic_unrolled)
*/
ENTRY(copy_user_generic_string)
ASM_STAC
+ ASM_IFENCE
cmpl $8,%edx
jb 2f /* less than 8 bytes, go to byte copy loop */
ALIGN_DESTINATION
@@ -175,6 +177,7 @@ EXPORT_SYMBOL(copy_user_generic_string)
*/
ENTRY(copy_user_enhanced_fast_string)
ASM_STAC
+ ASM_IFENCE
cmpl $64,%edx
jb .L_copy_short_string /* less then 64 bytes, avoid the costly 'rep' */
movl %edx,%ecx
diff --git a/arch/x86/lib/usercopy_32.c b/arch/x86/lib/usercopy_32.c
index 1b377f734e64..7add8ba06887 100644
--- a/arch/x86/lib/usercopy_32.c
+++ b/arch/x86/lib/usercopy_32.c
@@ -331,12 +331,12 @@ do { \
unsigned long __copy_user_ll(void *to, const void *from, unsigned long n)
{
- stac();
+ __uaccess_begin_nospec();
if (movsl_is_ok(to, from, n))
__copy_user(to, from, n);
else
n = __copy_user_intel(to, from, n);
- clac();
+ __uaccess_end();
return n;
}
EXPORT_SYMBOL(__copy_user_ll);
@@ -344,7 +344,7 @@ EXPORT_SYMBOL(__copy_user_ll);
unsigned long __copy_from_user_ll_nocache_nozero(void *to, const void __user *from,
unsigned long n)
{
- stac();
+ __uaccess_begin_nospec();
#ifdef CONFIG_X86_INTEL_USERCOPY
if (n > 64 && static_cpu_has(X86_FEATURE_XMM2))
n = __copy_user_intel_nocache(to, from, n);
@@ -353,7 +353,7 @@ unsigned long __copy_from_user_ll_nocache_nozero(void *to, const void __user *fr
#else
__copy_user(to, from, n);
#endif
- clac();
+ __uaccess_end();
return n;
}
EXPORT_SYMBOL(__copy_from_user_ll_nocache_nozero);
Static analysis reports that 'offset' may be a user controlled value
that is used as a data dependency reading from a raw6_frag_vec buffer.
In order to avoid potential leaks of kernel memory values, block
speculative execution of the instruction stream that could issue further
reads based on an invalid '*(rfv->c + offset)' value.
Based on an original patch by Elena Reshetova.
Cc: "David S. Miller" <[email protected]>
Cc: Alexey Kuznetsov <[email protected]>
Cc: Hideaki YOSHIFUJI <[email protected]>
Cc: [email protected]
Signed-off-by: Elena Reshetova <[email protected]>
Signed-off-by: Dan Williams <[email protected]>
---
net/ipv6/raw.c | 10 ++++++----
1 file changed, 6 insertions(+), 4 deletions(-)
diff --git a/net/ipv6/raw.c b/net/ipv6/raw.c
index 761a473a07c5..0b7ceeb6f709 100644
--- a/net/ipv6/raw.c
+++ b/net/ipv6/raw.c
@@ -33,6 +33,7 @@
#include <linux/skbuff.h>
#include <linux/compat.h>
#include <linux/uaccess.h>
+#include <linux/nospec.h>
#include <asm/ioctls.h>
#include <net/net_namespace.h>
@@ -725,17 +726,18 @@ static int raw6_getfrag(void *from, char *to, int offset, int len, int odd,
struct sk_buff *skb)
{
struct raw6_frag_vec *rfv = from;
+ char *rfv_buf;
- if (offset < rfv->hlen) {
+ rfv_buf = array_ptr(rfv->c, offset, rfv->hlen);
+ if (rfv_buf) {
int copy = min(rfv->hlen - offset, len);
if (skb->ip_summed == CHECKSUM_PARTIAL)
- memcpy(to, rfv->c + offset, copy);
+ memcpy(to, rfv_buf, copy);
else
skb->csum = csum_block_add(
skb->csum,
- csum_partial_copy_nocheck(rfv->c + offset,
- to, copy, 0),
+ csum_partial_copy_nocheck(rfv_buf, to, copy, 0),
odd);
odd = 0;
Static analysis reports that 'offset' may be a user controlled value
that is used as a data dependency reading from a raw_frag_vec buffer.
In order to avoid potential leaks of kernel memory values, block
speculative execution of the instruction stream that could issue further
reads based on an invalid '*(rfv->c + offset)' value.
Based on an original patch by Elena Reshetova.
Cc: "David S. Miller" <[email protected]>
Cc: Alexey Kuznetsov <[email protected]>
Cc: Hideaki YOSHIFUJI <[email protected]>
Cc: [email protected]
Signed-off-by: Elena Reshetova <[email protected]>
Signed-off-by: Dan Williams <[email protected]>
---
net/ipv4/raw.c | 10 ++++++----
1 file changed, 6 insertions(+), 4 deletions(-)
diff --git a/net/ipv4/raw.c b/net/ipv4/raw.c
index 125c1eab3eaa..91091a10294f 100644
--- a/net/ipv4/raw.c
+++ b/net/ipv4/raw.c
@@ -57,6 +57,7 @@
#include <linux/in_route.h>
#include <linux/route.h>
#include <linux/skbuff.h>
+#include <linux/nospec.h>
#include <linux/igmp.h>
#include <net/net_namespace.h>
#include <net/dst.h>
@@ -472,17 +473,18 @@ static int raw_getfrag(void *from, char *to, int offset, int len, int odd,
struct sk_buff *skb)
{
struct raw_frag_vec *rfv = from;
+ char *rfv_buf;
- if (offset < rfv->hlen) {
+ rfv_buf = array_ptr(rfv->hdr.c, offset, rfv->hlen);
+ if (rfv_buf) {
int copy = min(rfv->hlen - offset, len);
if (skb->ip_summed == CHECKSUM_PARTIAL)
- memcpy(to, rfv->hdr.c + offset, copy);
+ memcpy(to, rfv_buf, copy);
else
skb->csum = csum_block_add(
skb->csum,
- csum_partial_copy_nocheck(rfv->hdr.c + offset,
- to, copy, 0),
+ csum_partial_copy_nocheck(rfv_buf, to, copy, 0),
odd);
odd = 0;
As expected, static analysis reports that 'fd' is a user controlled value
that is used as a data dependency to read from the 'fdt->fd' array. In
order to avoid potential leaks of kernel memory values, block
speculative execution of the instruction stream that could issue reads
based on an invalid 'file *' returned from __fcheck_files.
Based on an original patch by Elena Reshetova.
Cc: Al Viro <[email protected]>
Signed-off-by: Elena Reshetova <[email protected]>
Signed-off-by: Dan Williams <[email protected]>
---
include/linux/fdtable.h | 7 +++++--
1 file changed, 5 insertions(+), 2 deletions(-)
diff --git a/include/linux/fdtable.h b/include/linux/fdtable.h
index 1c65817673db..9731f1a255db 100644
--- a/include/linux/fdtable.h
+++ b/include/linux/fdtable.h
@@ -10,6 +10,7 @@
#include <linux/compiler.h>
#include <linux/spinlock.h>
#include <linux/rcupdate.h>
+#include <linux/nospec.h>
#include <linux/types.h>
#include <linux/init.h>
#include <linux/fs.h>
@@ -81,9 +82,11 @@ struct dentry;
static inline struct file *__fcheck_files(struct files_struct *files, unsigned int fd)
{
struct fdtable *fdt = rcu_dereference_raw(files->fdt);
+ struct file __rcu **fdp;
- if (fd < fdt->max_fds)
- return rcu_dereference_raw(fdt->fd[fd]);
+ fdp = array_ptr(fdt->fd, fd, fdt->max_fds);
+ if (fdp)
+ return rcu_dereference_raw(*fdp);
return NULL;
}
Static analysis reports that 'pos' may be a user controlled value that
is used as a data dependency determining which extent to return out of
'map'. In order to avoid potential leaks of kernel memory values, block
speculative execution of the instruction stream that could issue further
reads based on an invalid speculative result from 'm_start()'.
Based on an original patch by Elena Reshetova.
Cc: "Eric W. Biederman" <[email protected]>
Signed-off-by: Elena Reshetova <[email protected]>
Signed-off-by: Dan Williams <[email protected]>
---
kernel/user_namespace.c | 11 +++++------
1 file changed, 5 insertions(+), 6 deletions(-)
diff --git a/kernel/user_namespace.c b/kernel/user_namespace.c
index 246d4d4ce5c7..8c803eae186f 100644
--- a/kernel/user_namespace.c
+++ b/kernel/user_namespace.c
@@ -20,6 +20,7 @@
#include <linux/seq_file.h>
#include <linux/fs.h>
#include <linux/uaccess.h>
+#include <linux/nospec.h>
#include <linux/ctype.h>
#include <linux/projid.h>
#include <linux/fs_struct.h>
@@ -648,15 +649,13 @@ static void *m_start(struct seq_file *seq, loff_t *ppos,
{
loff_t pos = *ppos;
unsigned extents = map->nr_extents;
- smp_rmb();
- if (pos >= extents)
- return NULL;
+ /* paired with smp_wmb in map_write */
+ smp_rmb();
if (extents <= UID_GID_MAP_MAX_BASE_EXTENTS)
- return &map->extent[pos];
-
- return &map->forward[pos];
+ return array_ptr(map->extent, pos, extents);
+ return array_ptr(map->forward, pos, extents);
}
static void *uid_m_start(struct seq_file *seq, loff_t *ppos)
Static analysis reports that 'eahd->appAttrLocation' and
'eahd->impAttrLocation' may be user controlled values that are used as
data dependencies for calculating source and destination buffers for
memmove operations. In order to avoid potential leaks of kernel memory
values, block speculative execution of the instruction stream that could
issue further reads based on invalid 'aal' or 'ial' values.
Based on an original patch by Elena Reshetova.
Cc: Jan Kara <[email protected]>
Signed-off-by: Elena Reshetova <[email protected]>
Signed-off-by: Dan Williams <[email protected]>
---
fs/udf/misc.c | 40 ++++++++++++++++++++++------------------
1 file changed, 22 insertions(+), 18 deletions(-)
diff --git a/fs/udf/misc.c b/fs/udf/misc.c
index 401e64cde1be..693e24699928 100644
--- a/fs/udf/misc.c
+++ b/fs/udf/misc.c
@@ -22,6 +22,7 @@
#include "udfdecl.h"
#include <linux/fs.h>
+#include <linux/nospec.h>
#include <linux/string.h>
#include <linux/crc-itu-t.h>
@@ -51,6 +52,8 @@ struct genericFormat *udf_add_extendedattr(struct inode *inode, uint32_t size,
int offset;
uint16_t crclen;
struct udf_inode_info *iinfo = UDF_I(inode);
+ uint8_t *ea_dst, *ea_src;
+ uint32_t aal, ial;
ea = iinfo->i_ext.i_data;
if (iinfo->i_lenEAttr) {
@@ -100,33 +103,34 @@ struct genericFormat *udf_add_extendedattr(struct inode *inode, uint32_t size,
offset = iinfo->i_lenEAttr;
if (type < 2048) {
- if (le32_to_cpu(eahd->appAttrLocation) <
- iinfo->i_lenEAttr) {
- uint32_t aal =
- le32_to_cpu(eahd->appAttrLocation);
- memmove(&ea[offset - aal + size],
- &ea[aal], offset - aal);
+ aal = le32_to_cpu(eahd->appAttrLocation);
+ ea_dst = array_ptr(ea, offset - aal + size,
+ iinfo->i_lenEAttr);
+ ea_src = array_ptr(ea, aal, iinfo->i_lenEAttr);
+ if (ea_dst && ea_src) {
+ memmove(ea_dst, ea_src, offset - aal);
offset -= aal;
eahd->appAttrLocation =
cpu_to_le32(aal + size);
}
- if (le32_to_cpu(eahd->impAttrLocation) <
- iinfo->i_lenEAttr) {
- uint32_t ial =
- le32_to_cpu(eahd->impAttrLocation);
- memmove(&ea[offset - ial + size],
- &ea[ial], offset - ial);
+
+ ial = le32_to_cpu(eahd->impAttrLocation);
+ ea_dst = array_ptr(ea, offset - ial + size,
+ iinfo->i_lenEAttr);
+ ea_src = array_ptr(ea, ial, iinfo->i_lenEAttr);
+ if (ea_dst && ea_src) {
+ memmove(ea_dst, ea_src, offset - ial);
offset -= ial;
eahd->impAttrLocation =
cpu_to_le32(ial + size);
}
} else if (type < 65536) {
- if (le32_to_cpu(eahd->appAttrLocation) <
- iinfo->i_lenEAttr) {
- uint32_t aal =
- le32_to_cpu(eahd->appAttrLocation);
- memmove(&ea[offset - aal + size],
- &ea[aal], offset - aal);
+ aal = le32_to_cpu(eahd->appAttrLocation);
+ ea_dst = array_ptr(ea, offset - aal + size,
+ iinfo->i_lenEAttr);
+ ea_src = array_ptr(ea, aal, iinfo->i_lenEAttr);
+ if (ea_dst && ea_src) {
+ memmove(ea_dst, ea_src, offset - aal);
offset -= aal;
eahd->appAttrLocation =
cpu_to_le32(aal + size);
Static analysis reports that 'index' may be a user controlled value that
is used as a data dependency to read 'pin' from the
'selector->baSourceID' array. In order to avoid potential leaks of
kernel memory values, block speculative execution of the instruction
stream that could issue reads based on an invalid value of 'pin'.
Based on an original patch by Elena Reshetova.
Laurent notes:
"...as this is nowhere close to being a fast path, I think we can close
this potential hole as proposed in the patch"
Cc: Mauro Carvalho Chehab <[email protected]>
Cc: [email protected]
Reviewed-by: Laurent Pinchart <[email protected]>
Signed-off-by: Elena Reshetova <[email protected]>
Signed-off-by: Dan Williams <[email protected]>
---
drivers/media/usb/uvc/uvc_v4l2.c | 9 +++++++--
1 file changed, 7 insertions(+), 2 deletions(-)
diff --git a/drivers/media/usb/uvc/uvc_v4l2.c b/drivers/media/usb/uvc/uvc_v4l2.c
index 3e7e283a44a8..30ee200206ee 100644
--- a/drivers/media/usb/uvc/uvc_v4l2.c
+++ b/drivers/media/usb/uvc/uvc_v4l2.c
@@ -22,6 +22,7 @@
#include <linux/mm.h>
#include <linux/wait.h>
#include <linux/atomic.h>
+#include <linux/nospec.h>
#include <media/v4l2-common.h>
#include <media/v4l2-ctrls.h>
@@ -809,8 +810,12 @@ static int uvc_ioctl_enum_input(struct file *file, void *fh,
const struct uvc_entity *selector = chain->selector;
struct uvc_entity *iterm = NULL;
u32 index = input->index;
+ __u8 *elem = NULL;
int pin = 0;
+ if (selector)
+ elem = array_ptr(selector->baSourceID, index,
+ selector->bNrInPins);
if (selector == NULL ||
(chain->dev->quirks & UVC_QUIRK_IGNORE_SELECTOR_UNIT)) {
if (index != 0)
@@ -820,8 +825,8 @@ static int uvc_ioctl_enum_input(struct file *file, void *fh,
break;
}
pin = iterm->id;
- } else if (index < selector->bNrInPins) {
- pin = selector->baSourceID[index];
+ } else if (elem) {
+ pin = *elem;
list_for_each_entry(iterm, &chain->entities, chain) {
if (!UVC_ENTITY_IS_ITERM(iterm))
continue;
Static analysis reports that 'queue' may be a user controlled value that
is used as a data dependency to read from the 'ar9170_qmap' array. In
order to avoid potential leaks of kernel memory values, block
speculative execution of the instruction stream that could issue reads
based on an invalid result of 'ar9170_qmap[queue]'. In this case the
value of 'ar9170_qmap[queue]' is immediately reused as an index to the
'ar->edcf' array.
Based on an original patch by Elena Reshetova.
Cc: Christian Lamparter <[email protected]>
Cc: Kalle Valo <[email protected]>
Cc: [email protected]
Cc: [email protected]
Signed-off-by: Elena Reshetova <[email protected]>
Signed-off-by: Dan Williams <[email protected]>
---
drivers/net/wireless/ath/carl9170/main.c | 7 +++++--
1 file changed, 5 insertions(+), 2 deletions(-)
diff --git a/drivers/net/wireless/ath/carl9170/main.c b/drivers/net/wireless/ath/carl9170/main.c
index 988c8857d78c..0acfa8c22b7d 100644
--- a/drivers/net/wireless/ath/carl9170/main.c
+++ b/drivers/net/wireless/ath/carl9170/main.c
@@ -41,6 +41,7 @@
#include <linux/module.h>
#include <linux/etherdevice.h>
#include <linux/random.h>
+#include <linux/nospec.h>
#include <net/mac80211.h>
#include <net/cfg80211.h>
#include "hw.h"
@@ -1384,11 +1385,13 @@ static int carl9170_op_conf_tx(struct ieee80211_hw *hw,
const struct ieee80211_tx_queue_params *param)
{
struct ar9170 *ar = hw->priv;
+ const u8 *elem;
int ret;
mutex_lock(&ar->mutex);
- if (queue < ar->hw->queues) {
- memcpy(&ar->edcf[ar9170_qmap[queue]], param, sizeof(*param));
+ elem = array_ptr(ar9170_qmap, queue, ar->hw->queues);
+ if (elem) {
+ memcpy(&ar->edcf[*elem], param, sizeof(*param));
ret = carl9170_set_qos(ar);
} else {
ret = -EINVAL;
Static analysis reports that 'queue' may be a user controlled value that
is used as a data dependency to read from the 'priv->qos_params' array.
In order to avoid potential leaks of kernel memory values, block
speculative execution of the instruction stream that could issue reads
based on an invalid result of 'priv->qos_params[queue]'.
Based on an original patch by Elena Reshetova.
Cc: Christian Lamparter <[email protected]>
Cc: Kalle Valo <[email protected]>
Cc: [email protected]
Cc: [email protected]
Signed-off-by: Elena Reshetova <[email protected]>
Signed-off-by: Dan Williams <[email protected]>
---
drivers/net/wireless/intersil/p54/main.c | 9 ++++++---
1 file changed, 6 insertions(+), 3 deletions(-)
diff --git a/drivers/net/wireless/intersil/p54/main.c b/drivers/net/wireless/intersil/p54/main.c
index ab6d39e12069..5ce693ff547e 100644
--- a/drivers/net/wireless/intersil/p54/main.c
+++ b/drivers/net/wireless/intersil/p54/main.c
@@ -20,6 +20,7 @@
#include <linux/firmware.h>
#include <linux/etherdevice.h>
#include <linux/module.h>
+#include <linux/nospec.h>
#include <net/mac80211.h>
@@ -411,12 +412,14 @@ static int p54_conf_tx(struct ieee80211_hw *dev,
const struct ieee80211_tx_queue_params *params)
{
struct p54_common *priv = dev->priv;
+ struct p54_edcf_queue_param *p54_q;
int ret;
mutex_lock(&priv->conf_mutex);
- if (queue < dev->queues) {
- P54_SET_QUEUE(priv->qos_params[queue], params->aifs,
- params->cw_min, params->cw_max, params->txop);
+ p54_q = array_ptr(priv->qos_params, queue, dev->queues);
+ if (p54_q) {
+ P54_SET_QUEUE(p54_q[0], params->aifs, params->cw_min,
+ params->cw_max, params->txop);
ret = p54_set_edcf(priv);
} else
ret = -EINVAL;
Static analysis reports that 'handle' may be a user controlled value
that is used as a data dependency to read 'sp' from the
'req->outstanding_cmds' array. In order to avoid potential leaks of
kernel memory values, block speculative execution of the instruction
stream that could issue reads based on an invalid value of 'sp'. In this
case 'sp' is directly dereferenced later in the function.
If completion tags can be specified by userspace or otherwise
controlled, we should sanitize 'handle' with array_ptr.
Based on an original patch by Elena Reshetova.
Cc: [email protected]
Cc: "James E.J. Bottomley" <[email protected]>
Cc: "Martin K. Petersen" <[email protected]>
Cc: [email protected]
Signed-off-by: Elena Reshetova <[email protected]>
Signed-off-by: Dan Williams <[email protected]>
---
drivers/scsi/qla2xxx/qla_mr.c | 17 +++++++++++------
1 file changed, 11 insertions(+), 6 deletions(-)
diff --git a/drivers/scsi/qla2xxx/qla_mr.c b/drivers/scsi/qla2xxx/qla_mr.c
index d5da3981cefe..6de3f0fddcfc 100644
--- a/drivers/scsi/qla2xxx/qla_mr.c
+++ b/drivers/scsi/qla2xxx/qla_mr.c
@@ -9,6 +9,7 @@
#include <linux/ktime.h>
#include <linux/pci.h>
#include <linux/ratelimit.h>
+#include <linux/nospec.h>
#include <linux/vmalloc.h>
#include <linux/bsg-lib.h>
#include <scsi/scsi_tcq.h>
@@ -2275,7 +2276,7 @@ qlafx00_ioctl_iosb_entry(scsi_qla_host_t *vha, struct req_que *req,
static void
qlafx00_status_entry(scsi_qla_host_t *vha, struct rsp_que *rsp, void *pkt)
{
- srb_t *sp;
+ srb_t *sp, **elem;
fc_port_t *fcport;
struct scsi_cmnd *cp;
struct sts_entry_fx00 *sts;
@@ -2304,8 +2305,10 @@ qlafx00_status_entry(scsi_qla_host_t *vha, struct rsp_que *rsp, void *pkt)
req = ha->req_q_map[que];
/* Validate handle. */
- if (handle < req->num_outstanding_cmds)
- sp = req->outstanding_cmds[handle];
+ elem = array_ptr(req->outstanding_cmds, handle,
+ req->num_outstanding_cmds);
+ if (elem)
+ sp = *elem;
else
sp = NULL;
@@ -2626,7 +2629,7 @@ static void
qlafx00_multistatus_entry(struct scsi_qla_host *vha,
struct rsp_que *rsp, void *pkt)
{
- srb_t *sp;
+ srb_t *sp, **elem;
struct multi_sts_entry_fx00 *stsmfx;
struct qla_hw_data *ha = vha->hw;
uint32_t handle, hindex, handle_count, i;
@@ -2655,8 +2658,10 @@ qlafx00_multistatus_entry(struct scsi_qla_host *vha,
req = ha->req_q_map[que];
/* Validate handle. */
- if (handle < req->num_outstanding_cmds)
- sp = req->outstanding_cmds[handle];
+ elem = array_ptr(req->outstanding_cmds, handle,
+ req->num_outstanding_cmds);
+ if (elem)
+ sp = *elem;
else
sp = NULL;
Static analysis reports that 'queue' may be a user controlled value that
is used as a data dependency to read 'txq_params' from the
'priv->tx_queue_params.params' array. In order to avoid potential leaks
of kernel memory values, block speculative execution of the instruction
stream that could issue reads based on an invalid value of 'txq_params'.
In this case 'txq_params' is referenced later in the function.
Based on an original patch by Elena Reshetova.
Cc: Solomon Peachy <[email protected]>
Cc: Kalle Valo <[email protected]>
Cc: [email protected]
Cc: [email protected]
Signed-off-by: Elena Reshetova <[email protected]>
Signed-off-by: Dan Williams <[email protected]>
---
drivers/net/wireless/st/cw1200/sta.c | 11 +++++++----
drivers/net/wireless/st/cw1200/wsm.h | 4 +---
2 files changed, 8 insertions(+), 7 deletions(-)
diff --git a/drivers/net/wireless/st/cw1200/sta.c b/drivers/net/wireless/st/cw1200/sta.c
index 38678e9a0562..7521077e50a4 100644
--- a/drivers/net/wireless/st/cw1200/sta.c
+++ b/drivers/net/wireless/st/cw1200/sta.c
@@ -14,6 +14,7 @@
#include <linux/firmware.h>
#include <linux/module.h>
#include <linux/etherdevice.h>
+#include <linux/nospec.h>
#include "cw1200.h"
#include "sta.h"
@@ -612,18 +613,20 @@ int cw1200_conf_tx(struct ieee80211_hw *dev, struct ieee80211_vif *vif,
u16 queue, const struct ieee80211_tx_queue_params *params)
{
struct cw1200_common *priv = dev->priv;
+ struct wsm_set_tx_queue_params *txq_params;
int ret = 0;
/* To prevent re-applying PM request OID again and again*/
bool old_uapsd_flags;
mutex_lock(&priv->conf_mutex);
- if (queue < dev->queues) {
+ txq_params = array_ptr(priv->tx_queue_params.params, queue,
+ dev->queues);
+ if (txq_params) {
old_uapsd_flags = le16_to_cpu(priv->uapsd_info.uapsd_flags);
- WSM_TX_QUEUE_SET(&priv->tx_queue_params, queue, 0, 0, 0);
- ret = wsm_set_tx_queue_params(priv,
- &priv->tx_queue_params.params[queue], queue);
+ WSM_TX_QUEUE_SET(txq_params, 0, 0, 0);
+ ret = wsm_set_tx_queue_params(priv, txq_params, queue);
if (ret) {
ret = -EINVAL;
goto out;
diff --git a/drivers/net/wireless/st/cw1200/wsm.h b/drivers/net/wireless/st/cw1200/wsm.h
index 48086e849515..8c8d9191e233 100644
--- a/drivers/net/wireless/st/cw1200/wsm.h
+++ b/drivers/net/wireless/st/cw1200/wsm.h
@@ -1099,10 +1099,8 @@ struct wsm_tx_queue_params {
};
-#define WSM_TX_QUEUE_SET(queue_params, queue, ack_policy, allowed_time,\
- max_life_time) \
+#define WSM_TX_QUEUE_SET(p, ack_policy, allowed_time, max_life_time) \
do { \
- struct wsm_set_tx_queue_params *p = &(queue_params)->params[queue]; \
p->ackPolicy = (ack_policy); \
p->allowedMediumTime = (allowed_time); \
p->maxTransmitLifetime = (max_life_time); \
Static analysis reports that 'index' may be a user controlled value that
is used as a data dependency reading 'rt' from the 'platform_label'
array. In order to avoid potential leaks of kernel memory values, block
speculative execution of the instruction stream that could issue further
reads based on an invalid 'rt' value.
Based on an original patch by Elena Reshetova.
Eric notes:
"
When val is a pointer not an integer.
Then
array2[val] = y;
/* or */
y = array2[val];
Won't happen.
val->field;
Will happen.
Which looks similar. However the address space of pointers is too
large. Making it impossible for an attack to know where to look in
the cache to see if "val->field" happened. At least on the
assumption that val is an arbitrary value.
Further mpls_forward is small enough the entire scope of "rt" the
value read possibly past the bound check is auditable without too
much trouble. I have looked and I don't see anything that could
possibly allow the value of "rt" to be exfiltrated. The problem
continuing to be that it is a pointer and the only operation on the
pointer besides dereferencing it is testing if it is NULL.
Other types of timing attacks are very hard if not impossible
because any packet presenting with a value outside the bounds check
will be dropped. So it will hard if not impossible to find
something to time to see how long it took to drop the packet.
"
The motivation of resending this patch despite the NAK is to
continue a community wide discussion on the bar for judging Spectre
changes. I.e. is any user controlled speculative pointer in the
kernel a pointer too far, especially given the current array_ptr()
implementation.
Cc: "David S. Miller" <[email protected]>
Cc: Eric W. Biederman <[email protected]>
Cc: [email protected]
Signed-off-by: Elena Reshetova <[email protected]>
Signed-off-by: Dan Williams <[email protected]>
---
net/mpls/af_mpls.c | 12 +++++++-----
1 file changed, 7 insertions(+), 5 deletions(-)
diff --git a/net/mpls/af_mpls.c b/net/mpls/af_mpls.c
index 8ca9915befc8..c92b1033adc2 100644
--- a/net/mpls/af_mpls.c
+++ b/net/mpls/af_mpls.c
@@ -8,6 +8,7 @@
#include <linux/ipv6.h>
#include <linux/mpls.h>
#include <linux/netconf.h>
+#include <linux/nospec.h>
#include <linux/vmalloc.h>
#include <linux/percpu.h>
#include <net/ip.h>
@@ -77,12 +78,13 @@ static void rtmsg_lfib(int event, u32 label, struct mpls_route *rt,
static struct mpls_route *mpls_route_input_rcu(struct net *net, unsigned index)
{
struct mpls_route *rt = NULL;
+ struct mpls_route __rcu **platform_label =
+ rcu_dereference(net->mpls.platform_label);
+ struct mpls_route __rcu **rtp;
- if (index < net->mpls.platform_labels) {
- struct mpls_route __rcu **platform_label =
- rcu_dereference(net->mpls.platform_label);
- rt = rcu_dereference(platform_label[index]);
- }
+ rtp = array_ptr(platform_label, index, net->mpls.platform_labels);
+ if (rtp)
+ rt = rcu_dereference(*rtp);
return rt;
}
From: Mark Rutland <[email protected]>
This patch implements ifence_array_ptr() for arm, using an
LDR+MOVCS+CSDB sequence to inhibit speculative use of the returned
value.
Signed-off-by: Mark Rutland <[email protected]>
Signed-off-by: Dan Williams <[email protected]>
---
arch/arm/include/asm/barrier.h | 24 ++++++++++++++++++++++++
1 file changed, 24 insertions(+)
diff --git a/arch/arm/include/asm/barrier.h b/arch/arm/include/asm/barrier.h
index 40f5c410fd8c..919235ed6e68 100644
--- a/arch/arm/include/asm/barrier.h
+++ b/arch/arm/include/asm/barrier.h
@@ -59,6 +59,30 @@ extern void arm_heavy_mb(void);
#define dma_wmb() barrier()
#endif
+#define ifence_array_ptr(arr, idx, sz) \
+({ \
+ typeof(&(arr)[0]) __nap_arr = (arr); \
+ typeof(idx) __nap_idx = (idx); \
+ typeof(sz) __nap_sz = (sz); \
+ \
+ unsigned long __nap_ptr = (unsigned long)__nap_arr + \
+ sizeof(__nap_arr[0]) * __nap_idx; \
+ \
+ asm volatile( \
+ " cmp %[i], %[s]\n" \
+ " bcs 1f\n" \
+ " ldr %[p], %[pp]\n" \
+ "1: movcs %[p], #0\n" \
+ " .inst 0xe320f018 @ CSDB\n" \
+ : [p] "=&r" (__nap_ptr) \
+ : [pp] "m" (__nap_ptr), \
+ [i] "r" ((unsigned long)__nap_idx), \
+ [s] "r" ((unsigned long)__nap_sz) \
+ : "cc"); \
+ \
+ (typeof(&(__nap_arr)[0]))__nap_ptr; \
+})
+
#define __smp_mb() dmb(ish)
#define __smp_rmb() __smp_mb()
#define __smp_wmb() dmb(ishst)
This patch doesn't affect arch/x86/lib/getuser.S, which I find surprising.
Of all the user access functions, I actually think that get_user() is
the one most likely to have the result then used speculatively as an
index (the required second dependent read to actually leak data).
I do *not* see people doing "copy_from_user()" and then somehow using
the thing as an index to another array. I mean, it can happen (copy a
structure, use a member in that structure), but it doesn't seem to be
the most likely thing.
The most likely thing would seem to be some random ioctl() doing a
"get_user()" to get an index, and then using that index. That would
seem to be one of the easier ways to perhaps get that kind of kernel
spectre attack.
Adding the ASM_IFENCE to __get_user_X() in arch/x86/lib/getuser.S
would seem to go naturally together with the copy_user_64.S changes in
this patch.
Is there some reason __get_user_X() was overlooked? Those are _the_
most common user accessor functions that do the address limit
checking.
Linus
On Thu, Jan 11, 2018 at 5:11 PM, Linus Torvalds
<[email protected]> wrote:
> This patch doesn't affect arch/x86/lib/getuser.S, which I find surprising.
>
> Of all the user access functions, I actually think that get_user() is
> the one most likely to have the result then used speculatively as an
> index (the required second dependent read to actually leak data).
>
> I do *not* see people doing "copy_from_user()" and then somehow using
> the thing as an index to another array. I mean, it can happen (copy a
> structure, use a member in that structure), but it doesn't seem to be
> the most likely thing.
>
> The most likely thing would seem to be some random ioctl() do a
> "get_user()" to get an index, and then using that index. That would
> seem to be one of the easier ways to perhaps get that kind of kernel
> spectre attack.
>
> Adding the ASM_IFENCE to __get_user_X() in arch/x86/lib/getuser.S
> would seem to go naturally together with the copy_user_64.S changes in
> this patch.
>
> Is there some reason __get_user_X() was overlooked? Those are _the_
> most common user accessor functions that do the address limit
> checking.
Oversight, I was focused on the uaccess_begin conversions. Yes, let me
go add ASM_IFENCE after the ASM_STAC in those paths.
On Thu, 2018-01-11 at 16:47 -0800, Dan Williams wrote:
> Static analysis reports that 'handle' may be a user controlled value
> that is used as a data dependency to read 'sp' from the
> 'req->outstanding_cmds' array.
Greg already told you it comes from hardware, specifically the hardware
response queue. If you don't believe him, I can confirm it's quite
definitely all copied from the iomem where the mailbox response is, so
it can't be a user controlled value (well, unless the user has some
influence over the firmware of the qla2xxx controller, which probably
means you have other things to worry about than speculative information
leaks).
I think what it actually is is a handle returned in the mailbox that's
used to find other mailbox entries in different queues (the packet
handle actually contains an entry in the lower 16 bits and a queue
designation in the upper). Perhaps the qla2xxx people should comment
on this because the code seems to check and print an error if there's a
problem with the handle being too big, but we don't check the que value
and use it blindly to read into the req_q_map: if handle could be
wrong, couldn't que?
James
On Thu, Jan 11, 2018 at 4:46 PM, Dan Williams <[email protected]> wrote:
>
> This series incorporates Mark Rutland's latest ARM changes and adds
> the x86 specific implementation of 'ifence_array_ptr'. That ifence
> based approach is provided as an opt-in fallback, but the default
> mitigation, '__array_ptr', uses a 'mask' approach that removes
> conditional branches instructions, and otherwise aims to redirect
> speculation to use a NULL pointer rather than a user controlled value.
Do you have any performance numbers and perhaps example code
generation? Is this noticeable? Are there any microbenchmarks showing
the difference between lfence use and the masking model?
Having both seems good for testing, but wouldn't we want to pick one in the end?
Also, I do think that there is one particular array load that would
seem to be pretty obvious: the system call function pointer array.
Yes, yes, the actual call is now behind a retpoline, but that protects
against a speculative BTB access, it's not obvious that it protects
against the mispredict of the __NR_syscall_max comparison in
arch/x86/entry/entry_64.S.
The act of fetching code is a kind of read too. And retpoline protects
against BTB stuffing etc, but what if the _actual_ system call
function address is wrong (due to mis-prediction of the system call
index check)?
Should the array access in entry_SYSCALL_64_fastpath be made to use
the masking approach?
Linus
On Thu, Jan 11, 2018 at 5:19 PM, Linus Torvalds
<[email protected]> wrote:
> On Thu, Jan 11, 2018 at 4:46 PM, Dan Williams <[email protected]> wrote:
>>
>> This series incorporates Mark Rutland's latest ARM changes and adds
>> the x86 specific implementation of 'ifence_array_ptr'. That ifence
>> based approach is provided as an opt-in fallback, but the default
>> mitigation, '__array_ptr', uses a 'mask' approach that removes
>> conditional branches instructions, and otherwise aims to redirect
>> speculation to use a NULL pointer rather than a user controlled value.
>
> Do you have any performance numbers and perhaps example code
> generation? Is this noticeable? Are there any microbenchmarks showing
> the difference between lfence use and the masking model?
I don't have performance numbers, but here's a sample of the code
generated for __fcheck_files, where the 'and; lea; and' sequence is the
portion of array_ptr() that follows the mask generation with 'sbb'.
fdp = array_ptr(fdt->fd, fd, fdt->max_fds);
8e7: 8b 02 mov (%rdx),%eax
8e9: 48 39 c7 cmp %rax,%rdi
8ec: 48 19 c9 sbb %rcx,%rcx
8ef: 48 8b 42 08 mov 0x8(%rdx),%rax
8f3: 48 89 fe mov %rdi,%rsi
8f6: 48 21 ce and %rcx,%rsi
8f9: 48 8d 04 f0 lea (%rax,%rsi,8),%rax
8fd: 48 21 c8 and %rcx,%rax
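For reference, the C that produces this sequence is roughly the following
(a sketch of the fdtable conversion in this series, reconstructed from the
array_ptr() call shown above rather than quoted verbatim from the patch):

static inline struct file *__fcheck_files(struct files_struct *files,
					  unsigned int fd)
{
	struct fdtable *fdt = rcu_dereference_raw(files->fdt);
	struct file __rcu **fdp;

	/* 'fdp' resolves to NULL under speculation when 'fd' is out of bounds */
	fdp = array_ptr(fdt->fd, fd, fdt->max_fds);
	if (fdp)
		return rcu_dereference_raw(*fdp);
	return NULL;
}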
> Having both seems good for testing, but wouldn't we want to pick one in the end?
I was thinking we'd keep it as a 'just in case' sort of thing, at
least until the 'probably safe' assumption of the 'mask' approach has
more time to settle out.
>
> Also, I do think that there is one particular array load that would
> seem to be pretty obvious: the system call function pointer array.
>
> Yes, yes, the actual call is now behind a retpoline, but that protects
> against a speculative BTB access, it's not obvious that it protects
> against the mispredict of the __NR_syscall_max comparison in
> arch/x86/entry/entry_64.S.
>
> The act of fetching code is a kind of read too. And retpoline protects
> against BTB stuffing etc, but what if the _actual_ system call
> function address is wrong (due to mis-prediction of the system call
> index check)?
>
> Should the array access in entry_SYSCALL_64_fastpath be made to use
> the masking approach?
I'll take a look. I'm firmly in the 'patch first / worry later' stance
on these investigations.
Dan Williams <[email protected]> writes:
> The new barrier, 'ifence', ensures that no instructions past the
> boundary are speculatively executed.
This needs a much better description.
If that description was valid we could add ifence in the syscall
entry path and not have any speculative execution to worry about in the
kernel.
Perhaps:
'ifence', ensures that no speculative execution that reaches the 'ifence'
boundary continues past the 'ifence' boundary.
> Previously the kernel only needed this fence in 'rdtsc_ordered', but it
> can also be used as a mitigation against Spectre variant1 attacks that
> speculative access memory past an array bounds check.
>
> 'ifence', via 'ifence_array_ptr', is an opt-in fallback to the default
> mitigation provided by '__array_ptr'. It is also proposed for blocking
> speculation in the 'get_user' path to bypass 'access_ok' checks. For
> now, just provide the common definition for later patches to build
> upon.
This part of the description is probably unnecessary.
Eric
>
> Suggested-by: Peter Zijlstra <[email protected]>
> Suggested-by: Alan Cox <[email protected]>
> Cc: Tom Lendacky <[email protected]>
> Cc: Mark Rutland <[email protected]>
> Cc: Greg KH <[email protected]>
> Cc: Thomas Gleixner <[email protected]>
> Cc: Ingo Molnar <[email protected]>
> Cc: "H. Peter Anvin" <[email protected]>
> Cc: [email protected]
> Signed-off-by: Elena Reshetova <[email protected]>
> Signed-off-by: Dan Williams <[email protected]>
> ---
> arch/x86/include/asm/barrier.h | 4 ++++
> arch/x86/include/asm/msr.h | 3 +--
> 2 files changed, 5 insertions(+), 2 deletions(-)
>
> diff --git a/arch/x86/include/asm/barrier.h b/arch/x86/include/asm/barrier.h
> index 7fb336210e1b..b04f572d6d97 100644
> --- a/arch/x86/include/asm/barrier.h
> +++ b/arch/x86/include/asm/barrier.h
> @@ -24,6 +24,10 @@
> #define wmb() asm volatile("sfence" ::: "memory")
> #endif
>
> +/* prevent speculative execution past this barrier */
> +#define ifence() alternative_2("", "mfence", X86_FEATURE_MFENCE_RDTSC, \
> + "lfence", X86_FEATURE_LFENCE_RDTSC)
> +
> #ifdef CONFIG_X86_PPRO_FENCE
> #define dma_rmb() rmb()
> #else
> diff --git a/arch/x86/include/asm/msr.h b/arch/x86/include/asm/msr.h
> index 07962f5f6fba..e426d2a33ff3 100644
> --- a/arch/x86/include/asm/msr.h
> +++ b/arch/x86/include/asm/msr.h
> @@ -214,8 +214,7 @@ static __always_inline unsigned long long rdtsc_ordered(void)
> * that some other imaginary CPU is updating continuously with a
> * time stamp.
> */
> - alternative_2("", "mfence", X86_FEATURE_MFENCE_RDTSC,
> - "lfence", X86_FEATURE_LFENCE_RDTSC);
> + ifence();
> return rdtsc();
> }
>
Dan Williams <[email protected]> writes:
> diff --git a/include/linux/nospec.h b/include/linux/nospec.h
> new file mode 100644
> index 000000000000..5c66fc30f919
> --- /dev/null
> +++ b/include/linux/nospec.h
> @@ -0,0 +1,71 @@
> +// SPDX-License-Identifier: GPL-2.0
> +// Copyright(c) 2018 Intel Corporation. All rights reserved.
> +
> +#ifndef __NOSPEC_H__
> +#define __NOSPEC_H__
> +
> +#include <linux/jump_label.h>
> +#include <asm/barrier.h>
> +
> +#ifndef array_ptr_mask
> +#define array_ptr_mask(idx, sz) \
> +({ \
> + unsigned long mask; \
> + unsigned long _i = (idx); \
> + unsigned long _s = (sz); \
> + \
> + mask = ~(long)(_i | (_s - 1 - _i)) >> (BITS_PER_LONG - 1); \
> + mask; \
> +})
> +#endif
This could really use a comment that explains it generates 0
for out of bound accesses and -1L aka 0xffffffffffffffff for
all other accesses.
The code is clever enough that the values it generates are not obvious.
Eric
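Concretely, with the generic definition above, the mask comes out as
(values worked out by hand as a sketch, not taken from the patch):

	array_ptr_mask(0, 4)  == ~0UL	/* in bounds: the OR has the sign bit clear */
	array_ptr_mask(3, 4)  == ~0UL	/* last valid index, still all-ones */
	array_ptr_mask(4, 4)  == 0	/* _s - 1 - _i wraps, sign bit set, mask collapses to 0 */
	array_ptr_mask(99, 4) == 0	/* any out-of-bounds index also yields 0 */

so __array_ptr() hands back the real element pointer for a valid index and
a NULL pointer (with the offset also zeroed) for an invalid one.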
On Thu, Jan 11, 2018 at 6:27 PM, Eric W. Biederman
<[email protected]> wrote:
>
> Dan Williams <[email protected]> writes:
>
> > The new barrier, 'ifence', ensures that no instructions past the
> > boundary are speculatively executed.
>
> This needs a much better description.
>
> If that description was valid we could add ifence in the syscall
> entry path and not have any speculative execution to worry about in the
> kernel.
True, I'll fix that up.
>
> Perhaps:
> 'ifence', ensures that no speculative execution that reaches the 'ifence'
> boundary continues past the 'ifence' boundary.
>
> > Previously the kernel only needed this fence in 'rdtsc_ordered', but it
> > can also be used as a mitigation against Spectre variant1 attacks that
> > speculative access memory past an array bounds check.
> >
> > 'ifence', via 'ifence_array_ptr', is an opt-in fallback to the default
> > mitigation provided by '__array_ptr'. It is also proposed for blocking
> > speculation in the 'get_user' path to bypass 'access_ok' checks. For
> > now, just provide the common definition for later patches to build
> > upon.
>
> This part of the description is probably unnecessary.
Probably, but having some redundant information in the changelog eases
'git blame' archaeology expeditions in the future.
>
> Eric
>
> >
> > Suggested-by: Peter Zijlstra <[email protected]>
> > Suggested-by: Alan Cox <[email protected]>
> > Cc: Tom Lendacky <[email protected]>
> > Cc: Mark Rutland <[email protected]>
> > Cc: Greg KH <[email protected]>
> > Cc: Thomas Gleixner <[email protected]>
> > Cc: Ingo Molnar <[email protected]>
> > Cc: "H. Peter Anvin" <[email protected]>
> > Cc: [email protected]
> > Signed-off-by: Elena Reshetova <[email protected]>
> > Signed-off-by: Dan Williams <[email protected]>
> > ---
> > arch/x86/include/asm/barrier.h | 4 ++++
> > arch/x86/include/asm/msr.h | 3 +--
> > 2 files changed, 5 insertions(+), 2 deletions(-)
> >
> > diff --git a/arch/x86/include/asm/barrier.h b/arch/x86/include/asm/barrier.h
> > index 7fb336210e1b..b04f572d6d97 100644
> > --- a/arch/x86/include/asm/barrier.h
> > +++ b/arch/x86/include/asm/barrier.h
> > @@ -24,6 +24,10 @@
> > #define wmb() asm volatile("sfence" ::: "memory")
> > #endif
> >
> > +/* prevent speculative execution past this barrier */
> > +#define ifence() alternative_2("", "mfence", X86_FEATURE_MFENCE_RDTSC, \
> > + "lfence", X86_FEATURE_LFENCE_RDTSC)
> > +
> > #ifdef CONFIG_X86_PPRO_FENCE
> > #define dma_rmb() rmb()
> > #else
> > diff --git a/arch/x86/include/asm/msr.h b/arch/x86/include/asm/msr.h
> > index 07962f5f6fba..e426d2a33ff3 100644
> > --- a/arch/x86/include/asm/msr.h
> > +++ b/arch/x86/include/asm/msr.h
> > @@ -214,8 +214,7 @@ static __always_inline unsigned long long rdtsc_ordered(void)
> > * that some other imaginary CPU is updating continuously with a
> > * time stamp.
> > */
> > - alternative_2("", "mfence", X86_FEATURE_MFENCE_RDTSC,
> > - "lfence", X86_FEATURE_LFENCE_RDTSC);
> > + ifence();
> > return rdtsc();
> > }
> >
On Thu, Jan 11, 2018 at 5:19 PM, James Bottomley
<[email protected]> wrote:
> On Thu, 2018-01-11 at 16:47 -0800, Dan Williams wrote:
>> Static analysis reports that 'handle' may be a user controlled value
>> that is used as a data dependency to read 'sp' from the
>> 'req->outstanding_cmds' array.
>
> Greg already told you it comes from hardware, specifically the hardware
> response queue. If you don't believe him, I can confirm it's quite
> definitely all copied from the iomem where the mailbox response is, so
> it can't be a user controlled value (well, unless the user has some
> influence over the firmware of the qla2xxx controller, which probably
> means you have other things to worry about than speculative information
> leaks).
I do believe him, and I still submitted this. I'm trying to probe at
the meta question of where we draw the line with these, especially
when it costs us relatively little to apply a few-line patch. We fix
theoretical lockdep races, why not theoretical data leak paths?
On Thu, 2018-01-11 at 21:38 -0800, Dan Williams wrote:
> On Thu, Jan 11, 2018 at 5:19 PM, James Bottomley
> <[email protected]> wrote:
> >
> > On Thu, 2018-01-11 at 16:47 -0800, Dan Williams wrote:
> > >
> > > Static analysis reports that 'handle' may be a user controlled
> > > value that is used as a data dependency to read 'sp' from the
> > > 'req->outstanding_cmds' array.
> >
> > Greg already told you it comes from hardware, specifically the
> > hardware response queue.  If you don't believe him, I can confirm
> > it's quite definitely all copied from the iomem where the mailbox
> > response is, so it can't be a user controlled value (well, unless
> > the user has some influence over the firmware of the
> > qla2xxx  controller, which probably means you have other things to
> > worry about than speculative information leaks).
>
> I do believe him, and I still submitted this. I'm trying to probe at
> the meta question of where do we draw the line with these especially
> when it costs us relatively little to apply a few line patch? We fix
> theoretical lockdep races, why not theoretical data leak paths?
I think I've lost the thread of what you're after. I thought you were
asking for the domain experts to look and see if there is the potential
for attack; if there's no theoretical way for a user to influence the
value, what's the point of killing speculation? Furthermore, if the
user could affect that 32 bit value, what they'd actually do is extract
information via the que variable which you didn't fix and which could
be used to compromise the kernel without resorting to side channel
attacks.
What's most puzzling to me is the inconsistency of the positions: if it
doesn't cost that much to turn off speculation, just do it on kernel
entry as Jiří suggested; we can make it a dynamic option and the cloud
providers can do it and the rest of us don't need to bother. If it
does cost a lot to turn it off as Alan said, then you need us to
identify the cases above where there's no need to disrupt the
speculation pipeline and not turn it off there. Which is it?
James
On Thu, Jan 11, 2018 at 04:47:18PM -0800, Dan Williams wrote:
> Static analysis reports that 'offset' may be a user controlled value
> that is used as a data dependency reading from a raw_frag_vec buffer.
> In order to avoid potential leaks of kernel memory values, block
> speculative execution of the instruction stream that could issue further
> reads based on an invalid '*(rfv->c + offset)' value.
>
> Based on an original patch by Elena Reshetova.
There is the "Co-Developed-by:" tag now, if you want to use it...
> Cc: "David S. Miller" <[email protected]>
> Cc: Alexey Kuznetsov <[email protected]>
> Cc: Hideaki YOSHIFUJI <[email protected]>
> Cc: [email protected]
> Signed-off-by: Elena Reshetova <[email protected]>
> Signed-off-by: Dan Williams <[email protected]>
> ---
> net/ipv4/raw.c | 10 ++++++----
> 1 file changed, 6 insertions(+), 4 deletions(-)
Ugh, what is this, the 4th time I've said "I don't think this is an
issue, so why are you changing this code." to this patch. To be
followed by an "oh yeah, you are right, I'll drop it", only to see it
show back up the next time this patch series is sent out?
Same for the other patches in this series that I have reviewed 4, maybe
5, times already. The "v2" is not very true here...
thanks,
greg k-h
On Thu, Jan 11, 2018 at 04:46:56PM -0800, Dan Williams wrote:
> diff --git a/include/linux/nospec.h b/include/linux/nospec.h
> new file mode 100644
> index 000000000000..5c66fc30f919
> --- /dev/null
> +++ b/include/linux/nospec.h
> @@ -0,0 +1,71 @@
> +// SPDX-License-Identifier: GPL-2.0
> +// Copyright(c) 2018 Intel Corporation. All rights reserved.
> +
> +#ifndef __NOSPEC_H__
> +#define __NOSPEC_H__
> +
> +#include <linux/jump_label.h>
> +#include <asm/barrier.h>
> +
> +#ifndef array_ptr_mask
> +#define array_ptr_mask(idx, sz) \
> +({ \
> + unsigned long mask; \
> + unsigned long _i = (idx); \
> + unsigned long _s = (sz); \
> + \
> + mask = ~(long)(_i | (_s - 1 - _i)) >> (BITS_PER_LONG - 1); \
> + mask; \
> +})
> +#endif
> +
> +/**
> + * __array_ptr - Generate a pointer to an array element, ensuring
> + * the pointer is bounded under speculation to NULL.
> + *
> + * @base: the base of the array
> + * @idx: the index of the element, must be less than LONG_MAX
> + * @sz: the number of elements in the array, must be less than LONG_MAX
> + *
> + * If @idx falls in the interval [0, @sz), returns the pointer to
> + * @arr[@idx], otherwise returns NULL.
> + */
> +#define __array_ptr(base, idx, sz) \
> +({ \
> + union { typeof(*(base)) *_ptr; unsigned long _bit; } __u; \
> + typeof(*(base)) *_arr = (base); \
> + unsigned long _i = (idx); \
> + unsigned long _mask = array_ptr_mask(_i, (sz)); \
> + \
> + __u._ptr = _arr + (_i & _mask); \
> + __u._bit &= _mask; \
> + __u._ptr; \
> +})
> +
> +#ifdef CONFIG_SPECTRE1_IFENCE
> +DECLARE_STATIC_KEY_TRUE(nospec_key);
> +#else
> +DECLARE_STATIC_KEY_FALSE(nospec_key);
> +#endif
> +
> +#ifdef ifence_array_ptr
> +/*
> + * The expectation is that no compiler or cpu will mishandle __array_ptr
> + * leading to problematic speculative execution. Bypass the ifence
> + * based implementation by default.
> + */
> +#define array_ptr(base, idx, sz) \
> +({ \
> + typeof(*(base)) *__ret; \
> + \
> + if (static_branch_unlikely(&nospec_key)) \
> + __ret = ifence_array_ptr(base, idx, sz); \
> + else \
> + __ret = __array_ptr(base, idx, sz); \
> + __ret; \
> +})
So I think this wants:
#ifndef HAVE_JUMP_LABEL
#error Compiler lacks asm-goto, can generate unsafe code
#endif
Suppose the generic array_ptr_mask() is unsafe on some arch and they
only implement ifence_array_ptr() and they compile without asm-goto,
then the above reverts to a dynamic condition, which can be speculated.
If we then speculate into the 'bad' __array_ptr we're screwed.
> +#else
> +#define array_ptr __array_ptr
> +#endif
> +
> +#endif /* __NOSPEC_H__ */
In general I think I would write all this in a form like:
#define __array_ptr(base, idx, sz) \
({ \
union { typeof(*(base)) *_ptr; unsigned long _bit; } __u; \
typeof(*(base)) *_arr = (base); \
unsigned long _i = (idx); \
unsigned long _mask = array_ptr_mask(_i, (sz)); \
\
__u._ptr = _arr + (_i & _mask); \
__u._bit &= _mask; \
__u._ptr; \
})
#if defined(array_ptr_mask) && defined(ifence_array_ptr)
#ifndef HAVE_JUMP_LABEL
#error Compiler lacks asm-goto, can generate unsafe code
#endif
#define array_ptr(base, idx, sz) \
({ \
typeof(*(base)) *__ret; \
\
if (static_branch_unlikely(&nospec_key)) \
__ret = ifence_array_ptr(base, idx, sz); \
else \
__ret = __array_ptr(base, idx, sz); \
__ret; \
})
#elif defined(array_ptr_mask)
#define array_ptr(base, idx, sz) __array_ptr(base, idx, sz)
#elif defined(ifence_array_ptr)
#define array_ptr(base, idx, sz) ifence_array_ptr(base, idx, sz)
#else
/* XXX we want a suitable warning here ? */
#define array_ptr(base, idx, sz) (idx < sz ? base + idx : NULL)
#endif
and stick the generic array_ptr_mask into asm-generic/nospec.h or
something.
Then the static key stuff is limited to architectures that define _both_
array_ptr_mask and ifence_array_ptr.
Do you think that the appropriate patches could be copied to the
appropriate people please?
On Thu, Jan 11, 2018 at 04:46:24PM -0800, Dan Williams wrote:
> Changes since v1 [1]:
> * fixup the ifence definition to use alternative_2 per recent AMD
> changes in tip/x86/pti (Tom)
>
> * drop 'nospec_ptr' (Linus, Mark)
>
> * rename 'nospec_array_ptr' to 'array_ptr' (Alexei)
>
> * rename 'nospec_barrier' to 'ifence' (Peter, Ingo)
>
> * clean up occasions of 'variable assignment in if()' (Sergei, Stephen)
>
> * make 'array_ptr' use a mask instead of an architectural ifence by
> default (Linus, Alexei)
>
> * provide a command line and compile-time opt-in to the ifence
> mechanism, if an architecture provides 'ifence_array_ptr'.
>
> * provide an optimized mask generation helper, 'array_ptr_mask', for
> x86 (Linus)
>
> * move 'get_user' hardening from '__range_not_ok' to '__uaccess_begin'
> (Linus)
>
> * drop "Thermal/int340x: prevent bounds-check..." since userspace does
> not have arbitrary control over the 'trip' index (Srinivas)
>
> * update the changelog of "net: mpls: prevent bounds-check..." and keep
> it in the series to continue the debate about Spectre hygiene patches.
> (Eric).
>
> * record a reviewed-by from Laurent on "[media] uvcvideo: prevent
> bounds-check..."
>
> * update the cover letter
>
> [1]: https://lwn.net/Articles/743376/
>
> ---
>
> Quoting Mark's original RFC:
>
> "Recently, Google Project Zero discovered several classes of attack
> against speculative execution. One of these, known as variant-1, allows
> explicit bounds checks to be bypassed under speculation, providing an
> arbitrary read gadget. Further details can be found on the GPZ blog [2]
> and the Documentation patch in this series."
>
> This series incorporates Mark Rutland's latest ARM changes and adds
> the x86 specific implementation of 'ifence_array_ptr'. That ifence
> based approach is provided as an opt-in fallback, but the default
> mitigation, '__array_ptr', uses a 'mask' approach that removes
> conditional branches instructions, and otherwise aims to redirect
> speculation to use a NULL pointer rather than a user controlled value.
>
> The mask is generated by the following from Alexei, and Linus:
>
> mask = ~(long)(_i | (_s - 1 - _i)) >> (BITS_PER_LONG - 1);
>
> ...and Linus provided an optimized mask generation helper for x86:
>
> asm ("cmpq %1,%2; sbbq %0,%0;"
> :"=r" (mask)
> :"r"(sz),"r" (idx)
> :"cc");
>
> The 'array_ptr' mechanism can be switched between 'mask' and 'ifence'
> via the spectre_v1={mask,ifence} command line option, and the
> compile-time default is set by selecting either CONFIG_SPECTRE1_MASK or
> CONFIG_SPECTRE1_IFENCE.
>
> The 'array_ptr' infrastructure is the primary focus this patch set. The
> individual patches that perform 'array_ptr' conversions are a point in
> time (i.e. earlier kernel, early analysis tooling, x86 only etc...)
> start at finding some of these gadgets.
>
> Another consideration for reviewing these patches is the 'hygiene'
> argument. When a patch refers to hygiene it is concerned with stopping
> speculation on an unconstrained or insufficiently constrained pointer
> value under userspace control. That by itself is not sufficient for
> attack (per current understanding) [3], but it is a necessary
> pre-condition. So 'hygiene' refers to cleaning up those suspect
> pointers regardless of whether they are usable as a gadget.
>
> These patches are also be available via the 'nospec-v2' git branch
> here:
>
> git://git.kernel.org/pub/scm/linux/kernel/git/djbw/linux nospec-v2
>
> Note that the BPF fix for Spectre variant1 is merged in the bpf.git
> tree [4], and is not included in this branch.
>
> [2]: https://googleprojectzero.blogspot.co.uk/2018/01/reading-privileged-memory-with-side.html
> [3]: https://spectreattack.com/spectre.pdf
> [4]: https://git.kernel.org/pub/scm/linux/kernel/git/bpf/bpf.git/commit/?id=b2157399cc98
>
> ---
>
> Dan Williams (16):
> x86: implement ifence()
> x86: implement ifence_array_ptr() and array_ptr_mask()
> asm-generic/barrier: mask speculative execution flows
> x86: introduce __uaccess_begin_nospec and ASM_IFENCE
> x86: use __uaccess_begin_nospec and ASM_IFENCE in get_user paths
> ipv6: prevent bounds-check bypass via speculative execution
> ipv4: prevent bounds-check bypass via speculative execution
> vfs, fdtable: prevent bounds-check bypass via speculative execution
> userns: prevent bounds-check bypass via speculative execution
> udf: prevent bounds-check bypass via speculative execution
> [media] uvcvideo: prevent bounds-check bypass via speculative execution
> carl9170: prevent bounds-check bypass via speculative execution
> p54: prevent bounds-check bypass via speculative execution
> qla2xxx: prevent bounds-check bypass via speculative execution
> cw1200: prevent bounds-check bypass via speculative execution
> net: mpls: prevent bounds-check bypass via speculative execution
>
> Mark Rutland (3):
> Documentation: document array_ptr
> arm64: implement ifence_array_ptr()
> arm: implement ifence_array_ptr()
>
> Documentation/speculation.txt | 142 ++++++++++++++++++++++++++++++
> arch/arm/Kconfig | 1
> arch/arm/include/asm/barrier.h | 24 +++++
> arch/arm64/Kconfig | 1
> arch/arm64/include/asm/barrier.h | 24 +++++
> arch/x86/Kconfig | 3 +
> arch/x86/include/asm/barrier.h | 46 ++++++++++
> arch/x86/include/asm/msr.h | 3 -
> arch/x86/include/asm/smap.h | 4 +
> arch/x86/include/asm/uaccess.h | 16 +++
> arch/x86/include/asm/uaccess_32.h | 6 +
> arch/x86/include/asm/uaccess_64.h | 12 +--
> arch/x86/lib/copy_user_64.S | 3 +
> arch/x86/lib/usercopy_32.c | 8 +-
> drivers/media/usb/uvc/uvc_v4l2.c | 9 +-
> drivers/net/wireless/ath/carl9170/main.c | 7 +
> drivers/net/wireless/intersil/p54/main.c | 9 +-
> drivers/net/wireless/st/cw1200/sta.c | 11 +-
> drivers/net/wireless/st/cw1200/wsm.h | 4 -
> drivers/scsi/qla2xxx/qla_mr.c | 17 ++--
> fs/udf/misc.c | 40 +++++---
> include/linux/fdtable.h | 7 +
> include/linux/nospec.h | 71 +++++++++++++++
> kernel/Kconfig.nospec | 31 +++++++
> kernel/Makefile | 1
> kernel/nospec.c | 52 +++++++++++
> kernel/user_namespace.c | 11 +-
> lib/Kconfig | 3 +
> net/ipv4/raw.c | 10 +-
> net/ipv6/raw.c | 10 +-
> net/mpls/af_mpls.c | 12 +--
> 31 files changed, 521 insertions(+), 77 deletions(-)
> create mode 100644 Documentation/speculation.txt
> create mode 100644 include/linux/nospec.h
> create mode 100644 kernel/Kconfig.nospec
> create mode 100644 kernel/nospec.c
>
--
RMK's Patch system: http://www.armlinux.org.uk/developer/patches/
FTTC broadband for 0.8mile line in suburbia: sync at 8.8Mbps down 630kbps up
According to speedtest.net: 8.21Mbps down 510kbps up
Hi Dan,
On Fri, Jan 12, 2018 at 1:46 AM, Dan Williams <[email protected]> wrote:
> From: Mark Rutland <[email protected]>
>
> Document the rationale and usage of the new array_ptr() helper.
>
> Signed-off-by: Mark Rutland <[email protected]>
> Signed-off-by: Will Deacon <[email protected]>
> Cc: Dan Williams <[email protected]>
> Cc: Jonathan Corbet <[email protected]>
> Cc: Peter Zijlstra <[email protected]>
> Signed-off-by: Dan Williams <[email protected]>
Thanks for the update!
> --- /dev/null
> +++ b/Documentation/speculation.txt
> @@ -0,0 +1,142 @@
> +This document explains potential effects of speculation, and how undesirable
> +effects can be mitigated portably using common APIs.
> +
> +===========
> +Speculation
> +===========
> +
> +To improve performance and minimize average latencies, many contemporary CPUs
> +employ speculative execution techniques such as branch prediction, performing
> +work which may be discarded at a later stage.
> +
> +Typically speculative execution cannot be observed from architectural state,
> +such as the contents of registers. However, in some cases it is possible to
> +observe its impact on microarchitectural state, such as the presence or
> +absence of data in caches. Such state may form side-channels which can be
> +observed to extract secret information.
> +
> +For example, in the presence of branch prediction, it is possible for bounds
> +checks to be ignored by code which is speculatively executed. Consider the
> +following code:
> +
> + int load_array(int *array, unsigned int idx) {
One more opening curly brace to move to the next line.
Gr{oetje,eeting}s,
Geert
--
Geert Uytterhoeven -- There's lots of Linux beyond ia32 -- [email protected]
In personal conversations with technical people, I call myself a hacker. But
when I'm talking to journalists I just say "programmer" or something like that.
-- Linus Torvalds
On Friday, January 12, 2018 1:47:46 AM CET Dan Williams wrote:
> Static analysis reports that 'queue' may be a user controlled value that
> is used as a data dependency to read from the 'ar9170_qmap' array. In
> order to avoid potential leaks of kernel memory values, block
> speculative execution of the instruction stream that could issue reads
> based on an invalid result of 'ar9170_qmap[queue]'. In this case the
> value of 'ar9170_qmap[queue]' is immediately reused as an index to the
> 'ar->edcf' array.
>
> Based on an original patch by Elena Reshetova.
>
> Cc: Christian Lamparter <[email protected]>
> Cc: Kalle Valo <[email protected]>
> Cc: [email protected]
> Cc: [email protected]
> Signed-off-by: Elena Reshetova <[email protected]>
> Signed-off-by: Dan Williams <[email protected]>
> ---
This patch (and p54, cw1200) looks like the same patch?!
Can you tell me what happened to:
On Saturday, January 6, 2018 5:34:03 PM CET Dan Williams wrote:
> On Sat, Jan 6, 2018 at 6:23 AM, Christian Lamparter <[email protected]> wrote:
> > And Furthermore a invalid queue (param->ac) would cause a crash in
> > this line in mac80211 before it even reaches the driver [1]:
> > | sdata->tx_conf[params->ac] = p;
> > | ^^^^^^^^
> > | if (drv_conf_tx(local, sdata, >>>> params->ac <<<<, &p)) {
> > | ^^ (this is a wrapper for the *_op_conf_tx)
> >
> > I don't think these chin-up exercises are needed.
>
> Quite the contrary, you've identified a better place in the call stack
> to sanitize the input and disable speculation. Then we can kill the
> whole class of the wireless driver reports at once it seems.
<https://www.spinics.net/lists/netdev/msg476353.html>
Anyway, I think there's an easy way to solve this: remove the
"if (queue < ar->hw->queues)" check altogether. It's no longer needed,
as the "queue" value is validated long before the driver code
gets called. And from my understanding, this will fix the "In this case
the value of 'ar9170_qmap[queue]' is immediately reused as an index to
the 'ar->edcf' array." gripe your tool complains about.
Here's a quick test-case for carl9170:
---
diff --git a/drivers/net/wireless/ath/carl9170/main.c b/drivers/net/wireless/ath/carl9170/main.c
index 988c8857d78c..2d3afb15bb62 100644
--- a/drivers/net/wireless/ath/carl9170/main.c
+++ b/drivers/net/wireless/ath/carl9170/main.c
@@ -1387,13 +1387,8 @@ static int carl9170_op_conf_tx(struct ieee80211_hw *hw,
int ret;
mutex_lock(&ar->mutex);
- if (queue < ar->hw->queues) {
- memcpy(&ar->edcf[ar9170_qmap[queue]], param, sizeof(*param));
- ret = carl9170_set_qos(ar);
- } else {
- ret = -EINVAL;
- }
-
+ memcpy(&ar->edcf[ar9170_qmap[queue]], param, sizeof(*param));
+ ret = carl9170_set_qos(ar);
mutex_unlock(&ar->mutex);
return ret;
}
---
What does your tool say about this?
(If necessary, the "queue" value could be even sanitized with a
queue %= ARRAY_SIZE(ar9170_qmap); before the mutex_lock.)
On Thu, Jan 11, 2018 at 04:47:02PM -0800, Dan Williams wrote:
> For 'get_user' paths, do not allow the kernel to speculate on the value
> of a user controlled pointer. In addition to the 'stac' instruction for
> Supervisor Mode Access Protection, an 'ifence' causes the 'access_ok'
> result to resolve in the pipeline before the cpu might take any
> speculative action on the pointer value.
So I understand the need to "patch first and ask questions later". I
also understand that usercopy is an obvious attack point for speculative
bugs. However, I'm still hopelessly confused about what exactly this
patch (and the next one) are supposed to accomplish.
I can't figure out if:
a) I'm missing something completely obvious;
b) this is poorly described; or
c) it doesn't actually fix/protect/harden anything.
The commit log doesn't help me at all. In fact, it confuses me more.
For example, this paragraph:
> Since this is a major kernel interface that deals with user controlled
> data, the '__uaccess_begin_nospec' mechanism will prevent speculative
> execution past an 'access_ok' permission check. While speculative
> execution past 'access_ok' is not enough to lead to a kernel memory
> leak, it is a necessary precondition.
That just sounds wrong. What if the speculation starts *after* the
access_ok() check? Then the barrier has no purpose.
Most access_ok/get_user/copy_from_user calls are like this:
if (copy_from_user(...uptr..)) /* or access_ok() or get_user() */
return -EFAULT;
So in other words, the usercopy function is called *before* the branch.
But to halt speculation, the lfence needs to come *after* the branch.
So putting lfences *before* the branch doesn't solve anything.
So what am I missing?
--
Josh
On Fri, Jan 12, 2018 at 9:51 AM, Josh Poimboeuf <[email protected]> wrote:
> On Thu, Jan 11, 2018 at 04:47:02PM -0800, Dan Williams wrote:
>> For 'get_user' paths, do not allow the kernel to speculate on the value
>> of a user controlled pointer. In addition to the 'stac' instruction for
>> Supervisor Mode Access Protection, an 'ifence' causes the 'access_ok'
>> result to resolve in the pipeline before the cpu might take any
>> speculative action on the pointer value.
>
> So I understand the need to "patch first and ask questions later". I
> also understand that usercopy is an obvious attack point for speculative
> bugs. However, I'm still hopelessly confused about what exactly this
> patch (and the next one) are supposed to accomplish.
>
> I can't figure out if:
>
> a) I'm missing something completely obvious;
> b) this is poorly described; or
> c) it doesn't actually fix/protect/harden anything.
>
> The commit log doesn't help me at all. In fact, it confuses me more.
> For example, this paragraph:
>
>> Since this is a major kernel interface that deals with user controlled
>> data, the '__uaccess_begin_nospec' mechanism will prevent speculative
>> execution past an 'access_ok' permission check. While speculative
>> execution past 'access_ok' is not enough to lead to a kernel memory
>> leak, it is a necessary precondition.
>
> That just sounds wrong. What if the speculation starts *after* the
> access_ok() check? Then the barrier has no purpose.
>
> Most access_ok/get_user/copy_from_user calls are like this:
>
> if (copy_from_user(...uptr..)) /* or access_ok() or get_user() */
> return -EFAULT;
>
> So in other words, the usercopy function is called *before* the branch.
>
> But to halt speculation, the lfence needs to come *after* the branch.
> So putting lfences *before* the branch doesn't solve anything.
>
> So what am I missing?
We're trying to prevent a pointer under user control from being
de-referenced inside the kernel, before we know it has been limited to
something safe. In the following sequence the branch we are worried
about speculating is the privilege check:
if (access_ok(uptr)) /* <--- Privilege Check */
	if (copy_from_user(uptr))
The cpu can speculatively skip that access_ok() check and cause a read
of kernel memory.
On Fri, Jan 12, 2018 at 6:42 AM, Christian Lamparter <[email protected]> wrote:
> On Friday, January 12, 2018 1:47:46 AM CET Dan Williams wrote:
>> Static analysis reports that 'queue' may be a user controlled value that
>> is used as a data dependency to read from the 'ar9170_qmap' array. In
>> order to avoid potential leaks of kernel memory values, block
>> speculative execution of the instruction stream that could issue reads
>> based on an invalid result of 'ar9170_qmap[queue]'. In this case the
>> value of 'ar9170_qmap[queue]' is immediately reused as an index to the
>> 'ar->edcf' array.
>>
>> Based on an original patch by Elena Reshetova.
>>
>> Cc: Christian Lamparter <[email protected]>
>> Cc: Kalle Valo <[email protected]>
>> Cc: [email protected]
>> Cc: [email protected]
>> Signed-off-by: Elena Reshetova <[email protected]>
>> Signed-off-by: Dan Williams <[email protected]>
>> ---
> This patch (and p54, cw1200) look like the same patch?!
> Can you tell me what happend to:
>
> On Saturday, January 6, 2018 5:34:03 PM CET Dan Williams wrote:
>> On Sat, Jan 6, 2018 at 6:23 AM, Christian Lamparter <[email protected]> wrote:
>> > And Furthermore a invalid queue (param->ac) would cause a crash in
>> > this line in mac80211 before it even reaches the driver [1]:
>> > | sdata->tx_conf[params->ac] = p;
>> > | ^^^^^^^^
>> > | if (drv_conf_tx(local, sdata, >>>> params->ac <<<<, &p)) {
>> > | ^^ (this is a wrapper for the *_op_conf_tx)
>> >
>> > I don't think these chin-up exercises are needed.
>>
>> Quite the contrary, you've identified a better place in the call stack
>> to sanitize the input and disable speculation. Then we can kill the
>> whole class of the wireless driver reports at once it seems.
> <https://www.spinics.net/lists/netdev/msg476353.html>
I didn't see where ac is being validated against the driver specific
'queues' value in that earlier patch.
>
> Anyway, I think there's an easy way to solve this: remove the
> "if (queue < ar->hw->queues)" check altogether. It's no longer needed
> anymore as the "queue" value is validated long before the driver code
> gets called.
Can you point me to where that validation happens?
> And from my understanding, this will fix the "In this case
> the value of 'ar9170_qmap[queue]' is immediately reused as an index to
> the 'ar->edcf' array." gripe your tool complains about.
>
> This is here's a quick test-case for carl9170.:
> ---
> diff --git a/drivers/net/wireless/ath/carl9170/main.c b/drivers/net/wireless/ath/carl9170/main.c
> index 988c8857d78c..2d3afb15bb62 100644
> --- a/drivers/net/wireless/ath/carl9170/main.c
> +++ b/drivers/net/wireless/ath/carl9170/main.c
> @@ -1387,13 +1387,8 @@ static int carl9170_op_conf_tx(struct ieee80211_hw *hw,
> int ret;
>
> mutex_lock(&ar->mutex);
> - if (queue < ar->hw->queues) {
> - memcpy(&ar->edcf[ar9170_qmap[queue]], param, sizeof(*param));
> - ret = carl9170_set_qos(ar);
> - } else {
> - ret = -EINVAL;
> - }
> -
> + memcpy(&ar->edcf[ar9170_qmap[queue]], param, sizeof(*param));
> + ret = carl9170_set_qos(ar);
> mutex_unlock(&ar->mutex);
> return ret;
> }
> ---
> What does your tool say about this?
If you take away the 'if' then I don't think the tool will report on this.
> (If necessary, the "queue" value could be even sanitized with a
> queue %= ARRAY_SIZE(ar9170_qmap); before the mutex_lock.)
That is what array_ptr() is doing in a more sophisticated way.
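Roughly, the conversion this series applies here looks like the following
(a sketch against the array_ptr() API; 'qmap_elem' is a name picked for
illustration, not the literal hunk from the carl9170 patch):

	const u8 *qmap_elem;

	mutex_lock(&ar->mutex);
	/* NULL under speculation when 'queue' is out of bounds */
	qmap_elem = array_ptr(ar9170_qmap, queue, ar->hw->queues);
	if (qmap_elem) {
		memcpy(&ar->edcf[*qmap_elem], param, sizeof(*param));
		ret = carl9170_set_qos(ar);
	} else {
		ret = -EINVAL;
	}
	mutex_unlock(&ar->mutex);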
On Thu, Jan 11, 2018 at 11:59 PM, Greg KH <[email protected]> wrote:
> On Thu, Jan 11, 2018 at 04:47:18PM -0800, Dan Williams wrote:
>> Static analysis reports that 'offset' may be a user controlled value
>> that is used as a data dependency reading from a raw_frag_vec buffer.
>> In order to avoid potential leaks of kernel memory values, block
>> speculative execution of the instruction stream that could issue further
>> reads based on an invalid '*(rfv->c + offset)' value.
>>
>> Based on an original patch by Elena Reshetova.
>
> There is the "Co-Developed-by:" tag now, if you want to use it...
Ok, thanks.
>
>> Cc: "David S. Miller" <[email protected]>
>> Cc: Alexey Kuznetsov <[email protected]>
>> Cc: Hideaki YOSHIFUJI <[email protected]>
>> Cc: [email protected]
>> Signed-off-by: Elena Reshetova <[email protected]>
>> Signed-off-by: Dan Williams <[email protected]>
>> ---
>> net/ipv4/raw.c | 10 ++++++----
>> 1 file changed, 6 insertions(+), 4 deletions(-)
>
> Ugh, what is this, the 4th time I've said "I don't think this is an
> issue, so why are you changing this code." to this patch. To be
> followed by a "oh yeah, you are right, I'll drop it", only to see it
> show back up in the next time this patch series is sent out?
>
> Same for the other patches in this series that I have reviewed 4, maybe
> 5, times already. The "v2" is not very true here...
The theme of the review feedback on v1 was 'don't put ifence in any
net/ code', and that was addressed.
I honestly thought the new definition of array_ptr() changed the
calculus on this patch. Given that the same pattern appears in the ipv6
case, and I have yet to hear that we should drop the ipv6 patch, it
makes sense to keep the code symmetric just for readability purposes.
Otherwise we need a comment saying why this is safe for ipv4 but maybe
not safe for ipv6; I think 'array_ptr()' is effectively that comment,
i.e. 'array_ptr()' is designed to be low impact for instrumenting false
positives. If that new argument does not hold water I will definitely
drop this patch.
On Fri, Jan 12, 2018 at 10:21:43AM -0800, Dan Williams wrote:
> > That just sounds wrong. What if the speculation starts *after* the
> > access_ok() check? Then the barrier has no purpose.
> >
> > Most access_ok/get_user/copy_from_user calls are like this:
> >
> > if (copy_from_user(...uptr..)) /* or access_ok() or get_user() */
> > return -EFAULT;
> >
> > So in other words, the usercopy function is called *before* the branch.
> >
> > But to halt speculation, the lfence needs to come *after* the branch.
> > So putting lfences *before* the branch doesn't solve anything.
> >
> > So what am I missing?
>
> We're trying to prevent a pointer under user control from being
> de-referenced inside the kernel, before we know it has been limited to
> something safe. In the following sequence the branch we are worried
> about speculating is the privilege check:
>
> if (access_ok(uptr)) /* <--- Privelege Check */
> if (copy_from_user_(uptr))
>
> The cpu can speculatively skip that access_ok() check and cause a read
> of kernel memory.
Converting your example code to assembly:
call access_ok # no speculation which started before this point is allowed to continue past this point
test %rax, %rax
jne error_path
dereference_uptr:
(do nefarious things with the user pointer)
error_path:
mov -EINVAL, %rax
ret
So the CPU is still free to speculately execute the dereference_uptr
block because the lfence was before the 'jne error_path' branch.
--
Josh
On Fri, Jan 12, 2018 at 10:58 AM, Josh Poimboeuf <[email protected]> wrote:
> On Fri, Jan 12, 2018 at 10:21:43AM -0800, Dan Williams wrote:
>> > That just sounds wrong. What if the speculation starts *after* the
>> > access_ok() check? Then the barrier has no purpose.
>> >
>> > Most access_ok/get_user/copy_from_user calls are like this:
>> >
>> > if (copy_from_user(...uptr..)) /* or access_ok() or get_user() */
>> > return -EFAULT;
>> >
>> > So in other words, the usercopy function is called *before* the branch.
>> >
>> > But to halt speculation, the lfence needs to come *after* the branch.
>> > So putting lfences *before* the branch doesn't solve anything.
>> >
>> > So what am I missing?
>>
>> We're trying to prevent a pointer under user control from being
>> de-referenced inside the kernel, before we know it has been limited to
>> something safe. In the following sequence the branch we are worried
>> about speculating is the privilege check:
>>
>> if (access_ok(uptr)) /* <--- Privelege Check */
>> if (copy_from_user_(uptr))
>>
>> The cpu can speculatively skip that access_ok() check and cause a read
>> of kernel memory.
>
> Converting your example code to assembly:
>
> call access_ok # no speculation which started before this point is allowed to continue past this point
> test %rax, %rax
> jne error_path
> dereference_uptr:
> (do nefarious things with the user pointer)
>
> error_path:
> mov -EINVAL, %rax
> ret
>
> So the CPU is still free to speculately execute the dereference_uptr
> block because the lfence was before the 'jne error_path' branch.
By the time we get to de-reference uptr we know it is not pointing at
kernel memory, because access_ok would have failed and the cpu would
have waited for that failure result before doing anything else.
Now the vulnerability that still exists after this fence is if we do
something with the value returned from de-referencing that pointer.
I.e. if we do:
get_user(val, uptr);
if (val < array_max)
array[val];
...then we're back to having a user controlled pointer in the kernel.
This is the point where we need array_ptr() to sanitize things.
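In other words, the fence bounds the pointer that get_user() dereferences,
and array_ptr() bounds what is subsequently done with the value read through
it. A sketch of the combined pattern ('val', 'array', 'array_max' and
'do_something' are placeholders, not code from the series):

	if (get_user(val, uptr))	/* uptr resolved before use via the ifence */
		return -EFAULT;

	elem = array_ptr(array, val, array_max);
	if (elem)			/* NULL under speculation if val >= array_max */
		do_something(*elem);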
On Fri, Jan 12, 2018 at 11:26 AM, Dan Williams <[email protected]> wrote:
>
> By the time we get to de-reference uptr we know it is not pointing at
> kernel memory, because access_ok would have failed and the cpu would
> have waited for that failure result before doing anything else.
I'm not actually convinced that's right in the original patches,
exactly because of the issue that Josh pointed out: even if there is a
comparison inside access_ok() that will be properly serialized, then
that comparison can (and sometimes does) just cause a truth value to
be generated, and then there might be *another* comparison of that
return value after the lfence. And while the return value is stable,
the conditional branch on that comparison isn't.
The new model of just doing it together with the STAC should be fine, though.
I do think that it would be a good idea to very expressly document the
fact that it's not that the user access itself is unsafe. I do agree
that things like "get_user()" want to be protected, but not because of
any direct bugs or problems with get_user() and friends, but simply
because get_user() is an excellent source of a pointer that is
obviously controlled from a potentially attacking user space. So it's
a prime candidate for then finding _subsequent_ accesses that can then
be used to perturb the cache.
Linus
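A minimal sketch of the model being converged on here, assuming
__uaccess_begin_nospec() simply pairs the fence with the 'stac' (the exact
definitions live in the __uaccess_begin_nospec/ASM_IFENCE patches in this
series; this is only an approximation):

	#define __uaccess_begin_nospec()	\
	({					\
		stac();				\
		ifence();			\
	})

With that, the serialization happens after the access_ok() branch has
already been taken, at the point the kernel is about to touch the user
pointer, which is what addresses the concern about fences placed before
the branch.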
On Friday, January 12, 2018 7:39:50 PM CET Dan Williams wrote:
> On Fri, Jan 12, 2018 at 6:42 AM, Christian Lamparter <[email protected]> wrote:
> > On Friday, January 12, 2018 1:47:46 AM CET Dan Williams wrote:
> >> Static analysis reports that 'queue' may be a user controlled value that
> >> is used as a data dependency to read from the 'ar9170_qmap' array. In
> >> order to avoid potential leaks of kernel memory values, block
> >> speculative execution of the instruction stream that could issue reads
> >> based on an invalid result of 'ar9170_qmap[queue]'. In this case the
> >> value of 'ar9170_qmap[queue]' is immediately reused as an index to the
> >> 'ar->edcf' array.
> >>
> >> Based on an original patch by Elena Reshetova.
> >>
> >> Cc: Christian Lamparter <[email protected]>
> >> Cc: Kalle Valo <[email protected]>
> >> Cc: [email protected]
> >> Cc: [email protected]
> >> Signed-off-by: Elena Reshetova <[email protected]>
> >> Signed-off-by: Dan Williams <[email protected]>
> >> ---
> > This patch (and p54, cw1200) look like the same patch?!
> > Can you tell me what happend to:
> >
> > On Saturday, January 6, 2018 5:34:03 PM CET Dan Williams wrote:
> >> On Sat, Jan 6, 2018 at 6:23 AM, Christian Lamparter <[email protected]> wrote:
> >> > And Furthermore a invalid queue (param->ac) would cause a crash in
> >> > this line in mac80211 before it even reaches the driver [1]:
> >> > | sdata->tx_conf[params->ac] = p;
> >> > | ^^^^^^^^
> >> > | if (drv_conf_tx(local, sdata, >>>> params->ac <<<<, &p)) {
> >> > | ^^ (this is a wrapper for the *_op_conf_tx)
> >> >
> >> > I don't think these chin-up exercises are needed.
> >>
> >> Quite the contrary, you've identified a better place in the call stack
> >> to sanitize the input and disable speculation. Then we can kill the
> >> whole class of the wireless driver reports at once it seems.
> > <https://www.spinics.net/lists/netdev/msg476353.html>
>
> I didn't see where ac is being validated against the driver specific
> 'queues' value in that earlier patch.
The link to the check is right there in the earlier post. It's in
parse_txq_params():
<https://github.com/torvalds/linux/blob/master/net/wireless/nl80211.c#L2070>
| if (txq_params->ac >= NL80211_NUM_ACS)
| return -EINVAL;
NL80211_NUM_ACS is 4
<http://elixir.free-electrons.com/linux/v4.15-rc7/source/include/uapi/linux/nl80211.h#L3748>
This check has been needed ever since mac80211's ieee80211_set_txq_params() started doing:
| sdata->tx_conf[params->ac] = p;
For cw1200: the driver just sets dev->queues to 4.
In carl9170 dev->queues is set to __AR9170_NUM_TXQ and
p54 uses P54_QUEUE_AC_NUM.
Both __AR9170_NUM_TXQ and P54_QUEUE_AC_NUM are 4.
And this is not going to change since all drivers
have to follow mac80211's queue API:
<https://wireless.wiki.kernel.org/en/developers/documentation/mac80211/queues>
Some background:
In the old days (linux 2.6 and early 3.x), the parse_txq_params() function did
not verify the "queue" value. That's why these drivers had to do it.
Here's the relevant code from 2.6.39:
<http://elixir.free-electrons.com/linux/v2.6.39/source/net/wireless/nl80211.c#L879>
You'll notice that the check is missing there.
Here's mac80211's ieee80211_set_txq_params for reference:
<http://elixir.free-electrons.com/linux/v2.6.39/source/net/mac80211/cfg.c#L1197>
However over time, the check in the driver has become redundant.
> > Anyway, I think there's an easy way to solve this: remove the
> > "if (queue < ar->hw->queues)" check altogether. It's no longer needed
> > anymore as the "queue" value is validated long before the driver code
> > gets called.
> >
> > And from my understanding, this will fix the "In this case
> > the value of 'ar9170_qmap[queue]' is immediately reused as an index to
> > the 'ar->edcf' array." gripe your tool complains about.
> >
> > This is here's a quick test-case for carl9170.:
> > ---
> > diff --git a/drivers/net/wireless/ath/carl9170/main.c b/drivers/net/wireless/ath/carl9170/main.c
> > index 988c8857d78c..2d3afb15bb62 100644
> > --- a/drivers/net/wireless/ath/carl9170/main.c
> > +++ b/drivers/net/wireless/ath/carl9170/main.c
> > @@ -1387,13 +1387,8 @@ static int carl9170_op_conf_tx(struct ieee80211_hw *hw,
> > int ret;
> >
> > mutex_lock(&ar->mutex);
> > - if (queue < ar->hw->queues) {
> > - memcpy(&ar->edcf[ar9170_qmap[queue]], param, sizeof(*param));
> > - ret = carl9170_set_qos(ar);
> > - } else {
> > - ret = -EINVAL;
> > - }
> > -
> > + memcpy(&ar->edcf[ar9170_qmap[queue]], param, sizeof(*param));
> > + ret = carl9170_set_qos(ar);
> > mutex_unlock(&ar->mutex);
> > return ret;
> > }
> > ---
> > What does your tool say about this?
>
> If you take away the 'if' then I don't the tool will report on this.
>
> > (If necessary, the "queue" value could be even sanitized with a
> > queue %= ARRAY_SIZE(ar9170_qmap); before the mutex_lock.)
>
> That is what array_ptr() is doing in a more sophisticated way.
I think it's a very roundabout way :D. In any case the queue %= ...
could also be replaced by:
BUILD_BUG_ON(ARRAY_SIZE(ar9170_qmap) != NL80211_NUM_ACS);
(And the equivalent for p54, cw1200)
On Fri, Jan 12, 2018 at 12:01:04PM -0800, Linus Torvalds wrote:
> On Fri, Jan 12, 2018 at 11:26 AM, Dan Williams <[email protected]> wrote:
> >
> > By the time we get to de-reference uptr we know it is not pointing at
> > kernel memory, because access_ok would have failed and the cpu would
> > have waited for that failure result before doing anything else.
>
> I'm not actually convinced that's right in the original patches,
> exactly because of the issue that Josh pointed out: even if there is a
> comparison inside access_ok() that will be properly serialized, then
> that comparison can (and sometimes does) just cause a truth value to
> be generated, and then there might be *another* comparison of that
> return value after the lfence. And while the return value is table,
> the conditional branch on that comparison isn't.
>
> The new model of just doing it together with the STAC should be fine, though.
Aha, that clears it up for me, thanks. I was still thinking about the
previous version of the patch which had the barrier in access_ok(). I
didn't realize the new version moved the barrier to after the
access_ok() checks.
--
Josh
On Fri, Jan 12, 2018 at 12:01 PM, Christian Lamparter
<[email protected]> wrote:
> On Friday, January 12, 2018 7:39:50 PM CET Dan Williams wrote:
>> On Fri, Jan 12, 2018 at 6:42 AM, Christian Lamparter <[email protected]> wrote:
>> > On Friday, January 12, 2018 1:47:46 AM CET Dan Williams wrote:
>> >> Static analysis reports that 'queue' may be a user controlled value that
>> >> is used as a data dependency to read from the 'ar9170_qmap' array. In
>> >> order to avoid potential leaks of kernel memory values, block
>> >> speculative execution of the instruction stream that could issue reads
>> >> based on an invalid result of 'ar9170_qmap[queue]'. In this case the
>> >> value of 'ar9170_qmap[queue]' is immediately reused as an index to the
>> >> 'ar->edcf' array.
>> >>
>> >> Based on an original patch by Elena Reshetova.
>> >>
>> >> Cc: Christian Lamparter <[email protected]>
>> >> Cc: Kalle Valo <[email protected]>
>> >> Cc: [email protected]
>> >> Cc: [email protected]
>> >> Signed-off-by: Elena Reshetova <[email protected]>
>> >> Signed-off-by: Dan Williams <[email protected]>
>> >> ---
>> > This patch (and p54, cw1200) look like the same patch?!
>> > Can you tell me what happend to:
>> >
>> > On Saturday, January 6, 2018 5:34:03 PM CET Dan Williams wrote:
>> >> On Sat, Jan 6, 2018 at 6:23 AM, Christian Lamparter <[email protected]> wrote:
>> >> > And Furthermore a invalid queue (param->ac) would cause a crash in
>> >> > this line in mac80211 before it even reaches the driver [1]:
>> >> > | sdata->tx_conf[params->ac] = p;
>> >> > | ^^^^^^^^
>> >> > | if (drv_conf_tx(local, sdata, >>>> params->ac <<<<, &p)) {
>> >> > | ^^ (this is a wrapper for the *_op_conf_tx)
>> >> >
>> >> > I don't think these chin-up exercises are needed.
>> >>
>> >> Quite the contrary, you've identified a better place in the call stack
>> >> to sanitize the input and disable speculation. Then we can kill the
>> >> whole class of the wireless driver reports at once it seems.
>> > <https://www.spinics.net/lists/netdev/msg476353.html>
>>
>> I didn't see where ac is being validated against the driver specific
>> 'queues' value in that earlier patch.
> The link to the check is right there in the earlier post. It's in
> parse_txq_params():
> <https://github.com/torvalds/linux/blob/master/net/wireless/nl80211.c#L2070>
> | if (txq_params->ac >= NL80211_NUM_ACS)
> | return -EINVAL;
>
> NL80211_NUM_ACS is 4
> <http://elixir.free-electrons.com/linux/v4.15-rc7/source/include/uapi/linux/nl80211.h#L3748>
>
> This check has been there ever since mac80211's ieee80211_set_txq_params() started doing:
> | sdata->tx_conf[params->ac] = p;
>
> For cw1200: the driver just sets dev->queues to 4.
> In carl9170 dev->queues is set to __AR9170_NUM_TXQ and
> p54 uses P54_QUEUE_AC_NUM.
>
> Both __AR9170_NUM_TXQ and P54_QUEUE_AC_NUM are 4.
> And this is not going to change since all drivers
> have to follow mac80211's queue API:
> <https://wireless.wiki.kernel.org/en/developers/documentation/mac80211/queues>
>
> Some background:
> In the old days (linux 2.6 and early 3.x), the parse_txq_params() function did
> not verify the "queue" value. That's why these drivers had to do it.
>
> Here's the relevant code from 2.6.39:
> <http://elixir.free-electrons.com/linux/v2.6.39/source/net/wireless/nl80211.c#L879>
> You'll notice that the check is missing there.
> Here's mac80211's ieee80211_set_txq_params for reference:
> <http://elixir.free-electrons.com/linux/v2.6.39/source/net/mac80211/cfg.c#L1197>
>
> However over time, the check in the driver has become redundant.
>
Excellent, thank you for pointing that out and for laying out the background so clearly.
What this tells me though is that we want to inject an ifence() at
this input validation point, i.e.:
if (txq_params->ac >= NL80211_NUM_ACS) {
	ifence();
	return -EINVAL;
}
...but the kernel, in these patches, only has ifence() defined for
x86. The only way we can sanitize the 'txq_params->ac' value without
ifence() is to do it at array access time, but then we're stuck
touching all drivers when standard kernel development practice says
'refactor common code out of drivers'.
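For reference, the array-access-time conversion being weighed here looks
roughly like the following in each driver (a sketch of the shape of the
change, not the literal patch body):

/*
 * Sketch: bound 'queue' at the array access itself.  For an
 * out-of-bounds index array_ptr() returns NULL, so speculation cannot
 * read ar9170_qmap[queue] and then go on to index ar->edcf with an
 * attacker-influenced value.
 */
const u8 *qmap_entry = array_ptr(ar9170_qmap, queue,
				 ARRAY_SIZE(ar9170_qmap));

if (!qmap_entry)
	return -EINVAL;
queue = *qmap_entry;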
Ugh, the bigger concern is that it is this driver access that gets flagged
and not that initial bounds check. Imagine if cw1200 and p54 had already been
converted to assume that they can just trust 'queue'. It would never
have been flagged.
I think we should focus on the get_user path and __fcheck_files for v3.
On Thu, Jan 11, 2018 at 5:19 PM, Linus Torvalds
<[email protected]> wrote:
> Should the array access in entry_SYSCALL_64_fastpath be made to use
> the masking approach?
That one has a bounds check against an inline constant:
	cmpq $__NR_syscall_max, %rax
so it should be safe.
The classic Spectre variant #1 code sequence is:
int array_size;
if (x < array_size) {
	something with array[x]
}
which runs into problems because the array_size variable may not
be in cache, and while the CPU core is waiting for the value it
speculates inside the "if" body.
The syscall entry is more like:
#define ARRAY_SIZE 10
if (x < ARRAY_SIZE) {
	something with array[x]
}
Here there isn't any reason for speculation. The core has the
value of 'x' in a register and the upper bound encoded into the
"cmp" instruction. Both are right there, no waiting, no speculation.
-Tony
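For contrast, the variable-bound sequence above is the pattern this series
targets; a sketch of it rewritten with array_ptr(), reusing the placeholder
names from the example (do_something() stands in for "something with
array[x]"):

/*
 * Sketch: when 'x' is out of bounds array_ptr() returns NULL, and
 * under speculation the masked pointer resolves toward NULL rather
 * than toward array[x].
 */
static void classic_v1(int *array, unsigned long x, unsigned long array_size)
{
	int *elem = array_ptr(array, x, array_size);

	if (elem)
		do_something(*elem);
}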
On Fri, Jan 12, 2018 at 1:12 AM, Peter Zijlstra <[email protected]> wrote:
> On Thu, Jan 11, 2018 at 04:46:56PM -0800, Dan Williams wrote:
>> diff --git a/include/linux/nospec.h b/include/linux/nospec.h
>> new file mode 100644
>> index 000000000000..5c66fc30f919
>> --- /dev/null
>> +++ b/include/linux/nospec.h
>> @@ -0,0 +1,71 @@
>> +// SPDX-License-Identifier: GPL-2.0
>> +// Copyright(c) 2018 Intel Corporation. All rights reserved.
>> +
>> +#ifndef __NOSPEC_H__
>> +#define __NOSPEC_H__
>> +
>> +#include <linux/jump_label.h>
>> +#include <asm/barrier.h>
>> +
>> +#ifndef array_ptr_mask
>> +#define array_ptr_mask(idx, sz) \
>> +({ \
>> + unsigned long mask; \
>> + unsigned long _i = (idx); \
>> + unsigned long _s = (sz); \
>> + \
>> + mask = ~(long)(_i | (_s - 1 - _i)) >> (BITS_PER_LONG - 1); \
>> + mask; \
>> +})
>> +#endif
>> +
>> +/**
>> + * __array_ptr - Generate a pointer to an array element, ensuring
>> + * the pointer is bounded under speculation to NULL.
>> + *
>> + * @base: the base of the array
>> + * @idx: the index of the element, must be less than LONG_MAX
>> + * @sz: the number of elements in the array, must be less than LONG_MAX
>> + *
>> + * If @idx falls in the interval [0, @sz), returns the pointer to
>> + * @base[@idx], otherwise returns NULL.
>> + */
>> +#define __array_ptr(base, idx, sz) \
>> +({ \
>> + union { typeof(*(base)) *_ptr; unsigned long _bit; } __u; \
>> + typeof(*(base)) *_arr = (base); \
>> + unsigned long _i = (idx); \
>> + unsigned long _mask = array_ptr_mask(_i, (sz)); \
>> + \
>> + __u._ptr = _arr + (_i & _mask); \
>> + __u._bit &= _mask; \
>> + __u._ptr; \
>> +})
>> +
>> +#ifdef CONFIG_SPECTRE1_IFENCE
>> +DECLARE_STATIC_KEY_TRUE(nospec_key);
>> +#else
>> +DECLARE_STATIC_KEY_FALSE(nospec_key);
>> +#endif
>> +
>> +#ifdef ifence_array_ptr
>> +/*
>> + * The expectation is that no compiler or cpu will mishandle __array_ptr
>> + * leading to problematic speculative execution. Bypass the ifence
>> + * based implementation by default.
>> + */
>> +#define array_ptr(base, idx, sz) \
>> +({ \
>> + typeof(*(base)) *__ret; \
>> + \
>> + if (static_branch_unlikely(&nospec_key)) \
>> + __ret = ifence_array_ptr(base, idx, sz); \
>> + else \
>> + __ret = __array_ptr(base, idx, sz); \
>> + __ret; \
>> +})
>
>
> So I think this wants:
>
> #ifndef HAVE_JUMP_LABEL
> #error Compiler lacks asm-goto, can generate unsafe code
> #endif
>
> Suppose the generic array_ptr_mask() is unsafe on some arch and they
> only implement ifence_array_ptr() and they compile without asm-goto,
> then the above reverts to a dynamic condition, which can be speculated.
> If we then speculate into the 'bad' __array_ptr we're screwed.
True.
>
>> +#else
>> +#define array_ptr __array_ptr
>> +#endif
>> +
>> +#endif /* __NOSPEC_H__ */
>
>
> In general I think I would write all this in a form like:
>
> #define __array_ptr(base, idx, sz) \
> ({ \
> union { typeof(*(base)) *_ptr; unsigned long _bit; } __u; \
> typeof(*(base)) *_arr = (base); \
> unsigned long _i = (idx); \
> unsigned long _mask = array_ptr_mask(_i, (sz)); \
> \
> __u._ptr = _arr + (_i & _mask); \
> __u._bit &= _mask; \
> __u._ptr; \
> })
>
> #if defined(array_ptr_mask) && defined(ifence_array_ptr)
>
> #ifndef HAVE_JUMP_LABEL
> #error Compiler lacks asm-goto, can generate unsafe code
> #endif
>
> #define array_ptr(base, idx, sz) \
> ({ \
> typeof(*(base)) *__ret; \
> \
> if (static_branch_unlikely(&nospec_key)) \
> __ret = ifence_array_ptr(base, idx, sz); \
> else \
> __ret = __array_ptr(base, idx, sz); \
> __ret; \
> })
>
> #elif defined(array_ptr_mask)
>
> #define array_ptr(base, idx, sz) __array_ptr(base, idx, sz)
>
> #elif defined(ifence_array_ptr)
>
> #define array_ptr(base, idx, sz) ifence_array_ptr(base, idx, sz)
>
> #else
>
> /* XXX we want a suitable warning here ? */
>
> #define array_ptr(base, idx, sz) (idx < sz ? base + idx : NULL)
>
> #endif
>
> and stick the generic array_ptr_mask into asm-generic/nospec.h or
> something.
>
> Then the static key stuff is limited to architectures that define _both_
> array_ptr_mask and ifence_array_ptr.
This certainly needs a Kconfig "depends on JUMP_LABEL" to turn on the
dynamic switching at all, and a HAVE_JUMP_LABEL compile time failure
if the compiler lacks support. I don't think we need the checks on
'defined(array_ptr_mask)' or that 'XXX' warning case, because the default
mask is assumed safe, and is otherwise better than nothing.
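Concretely, keeping Dan's original structure and adding only Peter's
asm-goto guard would collapse the #if ladder to something like this (a
sketch; the corresponding Kconfig "depends on JUMP_LABEL" side is not
shown):

#ifdef ifence_array_ptr

#ifndef HAVE_JUMP_LABEL
#error Compiler lacks asm-goto, can generate unsafe code
#endif

/* Runtime opt-in to the ifence implementation via the static key */
#define array_ptr(base, idx, sz)				\
({								\
	typeof(*(base)) *__ret;					\
								\
	if (static_branch_unlikely(&nospec_key))		\
		__ret = ifence_array_ptr(base, idx, sz);	\
	else							\
		__ret = __array_ptr(base, idx, sz);		\
	__ret;							\
})

#else
/* No arch ifence available: the mask-based helper is always used */
#define array_ptr __array_ptr
#endif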
On Fri, Jan 12, 2018 at 10:47:44AM -0800, Dan Williams wrote:
> On Thu, Jan 11, 2018 at 11:59 PM, Greg KH <[email protected]> wrote:
> >> Cc: "David S. Miller" <[email protected]>
> >> Cc: Alexey Kuznetsov <[email protected]>
> >> Cc: Hideaki YOSHIFUJI <[email protected]>
> >> Cc: [email protected]
> >> Signed-off-by: Elena Reshetova <[email protected]>
> >> Signed-off-by: Dan Williams <[email protected]>
> >> ---
> >> net/ipv4/raw.c | 10 ++++++----
> >> 1 file changed, 6 insertions(+), 4 deletions(-)
> >
> > Ugh, what is this, the 4th time I've said "I don't think this is an
> > issue, so why are you changing this code." to this patch. To be
> > followed by a "oh yeah, you are right, I'll drop it", only to see it
> > show back up in the next time this patch series is sent out?
> >
> > Same for the other patches in this series that I have reviewed 4, maybe
> > 5, times already. The "v2" is not very true here...
>
> The theme of the review feedback on v1 was 'don't put ifence in any
> net/ code', and that was addressed.
>
> I honestly thought the new definition of array_ptr() changed the
> calculus on this patch. Given that the same pattern appears in the ipv6
> case, and I have yet to hear that we should drop the ipv6 patch, keeping
> this one makes the code symmetric just for readability purposes. Otherwise
> we need a comment saying why this is safe for ipv4 but maybe not safe for
> ipv6; I think 'array_ptr()' is effectively that comment. I.e. 'array_ptr()' is
> designed to be low impact for instrumenting false positives. If that
> new argument does not hold water I will definitely drop this patch.
I also argued, again in an older review of this same patch series, that
the ipv6 patch should be dropped as well for this same exact reason.
I didn't think you wanted to hear me rant about the same thing on both
patches :)
greg k-h
On Fri, Jan 12, 2018 at 4:15 PM, Tony Luck <[email protected]> wrote:
>
> Here there isn't any reason for speculation. The core has the
> value of 'x' in a register and the upper bound encoded into the
> "cmp" instruction. Both are right there, no waiting, no speculation.
So this is an argument I haven't seen before (although it was brought
up in private long ago), but that is very relevant: the actual scope
and depth of speculation.
Your argument basically depends on just what gets speculated, and on
the _actual_ order of execution.
So your argument depends on "the uarch will actually run the code in
order if there are no events that block the pipeline".
Or at least it depends on a certain latency of the killing of any OoO
execution being low enough that the cache access doesn't even begin.
I realize that that is very much a particular microarchitectural
detail, but it's actually a *big* deal. Do we have a set of rules for
what is not a worry, simply because the speculated accesses get killed
early enough?
Apparently "test a register value against a constant" is good enough,
assuming that register is also needed for the address of the access.
Linus
On Fri, Jan 12, 2018 at 04:41:37PM -0800, Dan Williams wrote:
> This certainly needs a Kconfig "depends on JUMP_LABEL" to turn on the
> dynamic switching at all, and a HAVE_JUMP_LABEL compile time failure
> if the compiler lacks support.
We're ready to drop all compilers that don't support asm-goto on the
floor for x86. So don't over engineer it just to avoid that.
On Thu 11-01-18 16:47:35, Dan Williams wrote:
> Static analysis reports that 'eahd->appAttrLocation' and
> 'eahd->impAttrLocation' may be user controlled values that are used as
> data dependencies for calculating source and destination buffers for
> memmove operations. In order to avoid potential leaks of kernel memory
> values, block speculative execution of the instruction stream that could
> issue further reads based on invalid 'aal' or 'ial' values.
>
> Based on an original patch by Elena Reshetova.
>
> Cc: Jan Kara <[email protected]>
> Signed-off-by: Elena Reshetova <[email protected]>
> Signed-off-by: Dan Williams <[email protected]>
Dan, I've already emailed you [1] explaining why I don't think this patch is needed
at all. Do you disagree or did my email just get lost?
[1] https://marc.info/?l=linux-arch&m=151540683024125&w=2
Honza
> ---
> fs/udf/misc.c | 40 ++++++++++++++++++++++------------------
> 1 file changed, 22 insertions(+), 18 deletions(-)
>
> diff --git a/fs/udf/misc.c b/fs/udf/misc.c
> index 401e64cde1be..693e24699928 100644
> --- a/fs/udf/misc.c
> +++ b/fs/udf/misc.c
> @@ -22,6 +22,7 @@
> #include "udfdecl.h"
>
> #include <linux/fs.h>
> +#include <linux/nospec.h>
> #include <linux/string.h>
> #include <linux/crc-itu-t.h>
>
> @@ -51,6 +52,8 @@ struct genericFormat *udf_add_extendedattr(struct inode *inode, uint32_t size,
> int offset;
> uint16_t crclen;
> struct udf_inode_info *iinfo = UDF_I(inode);
> + uint8_t *ea_dst, *ea_src;
> + uint32_t aal, ial;
>
> ea = iinfo->i_ext.i_data;
> if (iinfo->i_lenEAttr) {
> @@ -100,33 +103,34 @@ struct genericFormat *udf_add_extendedattr(struct inode *inode, uint32_t size,
>
> offset = iinfo->i_lenEAttr;
> if (type < 2048) {
> - if (le32_to_cpu(eahd->appAttrLocation) <
> - iinfo->i_lenEAttr) {
> - uint32_t aal =
> - le32_to_cpu(eahd->appAttrLocation);
> - memmove(&ea[offset - aal + size],
> - &ea[aal], offset - aal);
> + aal = le32_to_cpu(eahd->appAttrLocation);
> + ea_dst = array_ptr(ea, offset - aal + size,
> + iinfo->i_lenEAttr);
> + ea_src = array_ptr(ea, aal, iinfo->i_lenEAttr);
> + if (ea_dst && ea_src) {
> + memmove(ea_dst, ea_src, offset - aal);
> offset -= aal;
> eahd->appAttrLocation =
> cpu_to_le32(aal + size);
> }
> - if (le32_to_cpu(eahd->impAttrLocation) <
> - iinfo->i_lenEAttr) {
> - uint32_t ial =
> - le32_to_cpu(eahd->impAttrLocation);
> - memmove(&ea[offset - ial + size],
> - &ea[ial], offset - ial);
> +
> + ial = le32_to_cpu(eahd->impAttrLocation);
> + ea_dst = array_ptr(ea, offset - ial + size,
> + iinfo->i_lenEAttr);
> + ea_src = array_ptr(ea, ial, iinfo->i_lenEAttr);
> + if (ea_dst && ea_src) {
> + memmove(ea_dst, ea_src, offset - ial);
> offset -= ial;
> eahd->impAttrLocation =
> cpu_to_le32(ial + size);
> }
> } else if (type < 65536) {
> - if (le32_to_cpu(eahd->appAttrLocation) <
> - iinfo->i_lenEAttr) {
> - uint32_t aal =
> - le32_to_cpu(eahd->appAttrLocation);
> - memmove(&ea[offset - aal + size],
> - &ea[aal], offset - aal);
> + aal = le32_to_cpu(eahd->appAttrLocation);
> + ea_dst = array_ptr(ea, offset - aal + size,
> + iinfo->i_lenEAttr);
> + ea_src = array_ptr(ea, aal, iinfo->i_lenEAttr);
> + if (ea_dst && ea_src) {
> + memmove(ea_dst, ea_src, offset - aal);
> offset -= aal;
> eahd->appAttrLocation =
> cpu_to_le32(aal + size);
>
>
--
Jan Kara <[email protected]>
SUSE Labs, CR
On Mon, Jan 15, 2018 at 2:32 AM, Jan Kara <[email protected]> wrote:
> On Thu 11-01-18 16:47:35, Dan Williams wrote:
>> Static analysis reports that 'eahd->appAttrLocation' and
>> 'eahd->impAttrLocation' may be user controlled values that are used as
>> data dependencies for calculating source and destination buffers for
>> memmove operations. In order to avoid potential leaks of kernel memory
>> values, block speculative execution of the instruction stream that could
>> issue further reads based on invalid 'aal' or 'ial' values.
>>
>> Based on an original patch by Elena Reshetova.
>>
>> Cc: Jan Kara <[email protected]>
>> Signed-off-by: Elena Reshetova <[email protected]>
>> Signed-off-by: Dan Williams <[email protected]>
>
> Dan, I've already emailed you [1] explaining why I don't think this patch is needed
> at all. Do you disagree or did my email just get lost?
>
> [1] https://marc.info/?l=linux-arch&m=151540683024125&w=2
Sorry, I missed that one before the v2 posting went out. I've dropped
this from the v3 [1] posting.
[1]: https://marc.info/?l=linux-kernel&m=151586794400997&w=2
On Sat, Jan 13, 2018 at 10:51 AM, Linus Torvalds
<[email protected]> wrote:
> On Fri, Jan 12, 2018 at 4:15 PM, Tony Luck <[email protected]> wrote:
> So your argument depends on "the uarch will actually run the code in
> order if there are no events that block the pipeline".
And might be bogus ... I'm a software person not a u-arch expert. That
sounded good in my head, but the level of parallelism may be greater
than I can imagine.
> Or at least it depends on a certain latency of the killing of any OoO
> execution being low enough that the cache access doesn't even begin.
>
> I realize that that is very much a particular microarchitectural
> detail, but it's actually a *big* deal. Do we have a set of rules for
> what is not a worry, simply because the speculated accesses get killed
> early enough?
>
> Apparently "test a register value against a constant" is good enough,
> assuming that register is also needed for the address of the access.
People who do understand this are working on what can be guaranteed.
For now don't make big plans based on my ramblings.
-Tony
On Thu, Jan 11, 2018 at 4:46 PM, Dan Williams <[email protected]> wrote:
> From: Mark Rutland <[email protected]>
>
> Document the rationale and usage of the new array_ptr() helper.
>
> Signed-off-by: Mark Rutland <[email protected]>
> Signed-off-by: Will Deacon <[email protected]>
> Cc: Dan Williams <[email protected]>
> Cc: Jonathan Corbet <[email protected]>
> Cc: Peter Zijlstra <[email protected]>
> Signed-off-by: Dan Williams <[email protected]>
> ---
> Documentation/speculation.txt | 142 +++++++++++++++++++++++++++++++++++++++++
> [...]
> +NULL is returned. Additionally, array_ptr() an out-of-bounds poitner is
> +not propagated to code which is speculatively executed.
I think this meant to say:
Additionally, array_ptr() of an out-of-bounds pointer is
not propagated to code which is speculatively executed.
Other than that:
Reviewed-by: Kees Cook <[email protected]>
-Kees
--
Kees Cook
Pixel Security
Hi Dan, Linus,
On Thu, Jan 11, 2018 at 05:41:08PM -0800, Dan Williams wrote:
> On Thu, Jan 11, 2018 at 5:19 PM, Linus Torvalds
> <[email protected]> wrote:
> > On Thu, Jan 11, 2018 at 4:46 PM, Dan Williams <[email protected]> wrote:
> >>
> >> This series incorporates Mark Rutland's latest ARM changes and adds
> >> the x86 specific implementation of 'ifence_array_ptr'. That ifence
> >> based approach is provided as an opt-in fallback, but the default
> >> mitigation, '__array_ptr', uses a 'mask' approach that removes
> >> conditional branches instructions, and otherwise aims to redirect
> >> speculation to use a NULL pointer rather than a user controlled value.
> >
> > Do you have any performance numbers and perhaps example code
> > generation? Is this noticeable? Are there any microbenchmarks showing
> > the difference between lfence use and the masking model?
>
> I don't have performance numbers, but here's a sample code generation
> from __fcheck_files, where the 'and; lea; and' sequence is the portion of
> array_ptr() after the mask generation with 'sbb'.
>
> fdp = array_ptr(fdt->fd, fd, fdt->max_fds);
> 8e7: 8b 02 mov (%rdx),%eax
> 8e9: 48 39 c7 cmp %rax,%rdi
> 8ec: 48 19 c9 sbb %rcx,%rcx
> 8ef: 48 8b 42 08 mov 0x8(%rdx),%rax
> 8f3: 48 89 fe mov %rdi,%rsi
> 8f6: 48 21 ce and %rcx,%rsi
> 8f9: 48 8d 04 f0 lea (%rax,%rsi,8),%rax
> 8fd: 48 21 c8 and %rcx,%rax
>
>
> > Having both seems good for testing, but wouldn't we want to pick one in the end?
>
> I was thinking we'd keep it as a 'just in case' sort of thing, at
> least until the 'probably safe' assumption of the 'mask' approach has
> more time to settle out.
From the arm64 side, the only concern I have (and this actually applies to
our CSDB sequence as well) is the calculation of the array size by the
caller. As Linus mentioned at the end of [1], if the determination of the
size argument is based on a conditional branch, then masking doesn't help
because you bound within the wrong range under speculation.
We ran into this when trying to use masking to protect our uaccess routines
where the conditional bound is either KERNEL_DS or USER_DS. It's possible
that a prior conditional set_fs(KERNEL_DS) could defeat the masking and so
we'd need to throw some heavy barriers in set_fs to make it robust.
The question then is whether or not we're likely to run into this elsewhere,
and I don't have a good feel for that.
Will
[1] http://lkml.kernel.org/r/CA+55aFz0tsreoa=5Ud2noFCpng-dizLBhT9WU9asyhpLfjdcYA@mail.gmail.com
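To make that concern concrete, a purely hypothetical sketch (none of these
names exist in the kernel) of a conditionally selected bound defeating the
mask:

#define SMALL_LIMIT	16
#define BIG_LIMIT	1024
static int table[BIG_LIMIT];

/*
 * Hypothetical: the bound itself is picked by a conditional branch.  If
 * that branch is mispredicted, array_ptr() masks 'idx' against the wrong
 * limit under speculation -- the same shape of problem as a prior
 * conditional set_fs(KERNEL_DS) ahead of a user access.
 */
static int lookup(unsigned long idx, bool privileged)
{
	unsigned long limit = privileged ? BIG_LIMIT : SMALL_LIMIT;
	int *elem = array_ptr(table, idx, limit);

	return elem ? *elem : -EINVAL;
}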
On Thu, Jan 18, 2018 at 5:18 AM, Will Deacon <[email protected]> wrote:
> Hi Dan, Linus,
>
> On Thu, Jan 11, 2018 at 05:41:08PM -0800, Dan Williams wrote:
>> On Thu, Jan 11, 2018 at 5:19 PM, Linus Torvalds
>> <[email protected]> wrote:
>> > On Thu, Jan 11, 2018 at 4:46 PM, Dan Williams <[email protected]> wrote:
>> >>
>> >> This series incorporates Mark Rutland's latest ARM changes and adds
>> >> the x86 specific implementation of 'ifence_array_ptr'. That ifence
>> >> based approach is provided as an opt-in fallback, but the default
>> >> mitigation, '__array_ptr', uses a 'mask' approach that removes
>> >> conditional branches instructions, and otherwise aims to redirect
>> >> speculation to use a NULL pointer rather than a user controlled value.
>> >
>> > Do you have any performance numbers and perhaps example code
>> > generation? Is this noticeable? Are there any microbenchmarks showing
>> > the difference between lfence use and the masking model?
>>
>> I don't have performance numbers, but here's a sample code generation
>> from __fcheck_files, where the 'and; lea; and' sequence is the portion of
>> array_ptr() after the mask generation with 'sbb'.
>>
>> fdp = array_ptr(fdt->fd, fd, fdt->max_fds);
>> 8e7: 8b 02 mov (%rdx),%eax
>> 8e9: 48 39 c7 cmp %rax,%rdi
>> 8ec: 48 19 c9 sbb %rcx,%rcx
>> 8ef: 48 8b 42 08 mov 0x8(%rdx),%rax
>> 8f3: 48 89 fe mov %rdi,%rsi
>> 8f6: 48 21 ce and %rcx,%rsi
>> 8f9: 48 8d 04 f0 lea (%rax,%rsi,8),%rax
>> 8fd: 48 21 c8 and %rcx,%rax
>>
>>
>> > Having both seems good for testing, but wouldn't we want to pick one in the end?
>>
>> I was thinking we'd keep it as a 'just in case' sort of thing, at
>> least until the 'probably safe' assumption of the 'mask' approach has
>> more time to settle out.
>
> From the arm64 side, the only concern I have (and this actually applies to
> our CSDB sequence as well) is the calculation of the array size by the
> caller. As Linus mentioned at the end of [1], if the determination of the
> size argument is based on a conditional branch, then masking doesn't help
> because you bound within the wrong range under speculation.
>
> We ran into this when trying to use masking to protect our uaccess routines
> where the conditional bound is either KERNEL_DS or USER_DS. It's possible
> that a prior conditional set_fs(KERNEL_DS) could defeat the masking and so
> we'd need to throw some heavy barriers in set_fs to make it robust.
At least in the conditional mask case near set_fs() usage, the approach
we are taking is to use a barrier, i.e. the following guidance from
Linus:
"Basically, the rule is trivial: find all 'stac' users, and use address
masking if those users already integrate the limit check, and lfence if
they don't."
...which translates to narrowing the pointer for get_user() and using a
barrier for __get_user().
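A rough sketch of that distinction, using this series' helpers from
<linux/nospec.h> rather than the real x86 uaccess code (the helper name
narrow_user_ptr() and the use of user_addr_max() as the limit are
illustrative only):

/*
 * get_user() integrates the limit check, so the user pointer can be
 * narrowed with a mask derived from that same check: an out-of-range
 * pointer is forced to NULL even under speculation.
 */
static inline void __user *narrow_user_ptr(void __user *uptr)
{
	unsigned long mask = array_ptr_mask((unsigned long)uptr,
					    user_addr_max());

	return (void __user *)((unsigned long)uptr & mask);
}

/*
 * __get_user() has no integrated limit check, so it falls back to a
 * serializing ifence() before the access instead of masking.
 */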
On Thu, Jan 18, 2018 at 08:58:08AM -0800, Dan Williams wrote:
> On Thu, Jan 18, 2018 at 5:18 AM, Will Deacon <[email protected]> wrote:
> > On Thu, Jan 11, 2018 at 05:41:08PM -0800, Dan Williams wrote:
> >> On Thu, Jan 11, 2018 at 5:19 PM, Linus Torvalds
> >> <[email protected]> wrote:
> >> > On Thu, Jan 11, 2018 at 4:46 PM, Dan Williams <[email protected]> wrote:
> >> >>
> >> >> This series incorporates Mark Rutland's latest ARM changes and adds
> >> >> the x86 specific implementation of 'ifence_array_ptr'. That ifence
> >> >> based approach is provided as an opt-in fallback, but the default
> >> >> mitigation, '__array_ptr', uses a 'mask' approach that removes
> >> >> conditional branches instructions, and otherwise aims to redirect
> >> >> speculation to use a NULL pointer rather than a user controlled value.
> >> >
> >> > Do you have any performance numbers and perhaps example code
> >> > generation? Is this noticeable? Are there any microbenchmarks showing
> >> > the difference between lfence use and the masking model?
> >>
> >> I don't have performance numbers, but here's a sample code generation
> >> from __fcheck_files, where the 'and; lea; and' sequence is the portion of
> >> array_ptr() after the mask generation with 'sbb'.
> >>
> >> fdp = array_ptr(fdt->fd, fd, fdt->max_fds);
> >> 8e7: 8b 02 mov (%rdx),%eax
> >> 8e9: 48 39 c7 cmp %rax,%rdi
> >> 8ec: 48 19 c9 sbb %rcx,%rcx
> >> 8ef: 48 8b 42 08 mov 0x8(%rdx),%rax
> >> 8f3: 48 89 fe mov %rdi,%rsi
> >> 8f6: 48 21 ce and %rcx,%rsi
> >> 8f9: 48 8d 04 f0 lea (%rax,%rsi,8),%rax
> >> 8fd: 48 21 c8 and %rcx,%rax
> >>
> >>
> >> > Having both seems good for testing, but wouldn't we want to pick one in the end?
> >>
> >> I was thinking we'd keep it as a 'just in case' sort of thing, at
> >> least until the 'probably safe' assumption of the 'mask' approach has
> >> more time to settle out.
> >
> > From the arm64 side, the only concern I have (and this actually applies to
> > our CSDB sequence as well) is the calculation of the array size by the
> > caller. As Linus mentioned at the end of [1], if the determination of the
> > size argument is based on a conditional branch, then masking doesn't help
> > because you bound within the wrong range under speculation.
> >
> > We ran into this when trying to use masking to protect our uaccess routines
> > where the conditional bound is either KERNEL_DS or USER_DS. It's possible
> > that a prior conditional set_fs(KERNEL_DS) could defeat the masking and so
> > we'd need to throw some heavy barriers in set_fs to make it robust.
>
> At least in the conditional mask case near set_fs() usage the approach
> we are taking is to use a barrier. I.e. the following guidance from
> Linus:
>
> "Basically, the rule is trivial: find all 'stac' users, and use address
> masking if those users already integrate the limit check, and lfence if
> they don't."
>
> ...which translates to narrow the pointer for get_user() and use a
> barrier for __get_user().
Great, that matches my thinking re set_fs but I'm still worried about
finding all the places where the bound is conditional for other users
of the macro. Then again, finding the places that need this macro in the
first place is tough enough so perhaps analysing the bound calculation
doesn't make it much worse.
Will
Hi Will,
On Thursday, 18 January 2018 19:05:47 EET Will Deacon wrote:
> On Thu, Jan 18, 2018 at 08:58:08AM -0800, Dan Williams wrote:
> > On Thu, Jan 18, 2018 at 5:18 AM, Will Deacon wrote:
> >> On Thu, Jan 11, 2018 at 05:41:08PM -0800, Dan Williams wrote:
> >>> On Thu, Jan 11, 2018 at 5:19 PM, Linus Torvalds wrote:
> >>>> On Thu, Jan 11, 2018 at 4:46 PM, Dan Williams wrote:
> >>>>> This series incorporates Mark Rutland's latest ARM changes and adds
> >>>>> the x86 specific implementation of 'ifence_array_ptr'. That ifence
> >>>>> based approach is provided as an opt-in fallback, but the default
> >>>>> mitigation, '__array_ptr', uses a 'mask' approach that removes
> >>>>> conditional branches instructions, and otherwise aims to redirect
> >>>>> speculation to use a NULL pointer rather than a user controlled
> >>>>> value.
> >>>>
> >>>> Do you have any performance numbers and perhaps example code
> >>>> generation? Is this noticeable? Are there any microbenchmarks showing
> >>>> the difference between lfence use and the masking model?
> >>>
> >>> I don't have performance numbers, but here's a sample code generation
> >>> from __fcheck_files, where the 'and; lea; and' sequence is the portion of
> >>> array_ptr() after the mask generation with 'sbb'.
> >>>
> >>> fdp = array_ptr(fdt->fd, fd, fdt->max_fds);
> >>>
> >>> 8e7: 8b 02 mov (%rdx),%eax
> >>> 8e9: 48 39 c7 cmp %rax,%rdi
> >>> 8ec: 48 19 c9 sbb %rcx,%rcx
> >>> 8ef: 48 8b 42 08 mov 0x8(%rdx),%rax
> >>> 8f3: 48 89 fe mov %rdi,%rsi
> >>> 8f6: 48 21 ce and %rcx,%rsi
> >>> 8f9: 48 8d 04 f0 lea (%rax,%rsi,8),%rax
> >>> 8fd: 48 21 c8 and %rcx,%rax
> >>>>
> >>>> Having both seems good for testing, but wouldn't we want to pick one
> >>>> in the end?
> >>>
> >>> I was thinking we'd keep it as a 'just in case' sort of thing, at
> >>> least until the 'probably safe' assumption of the 'mask' approach has
> >>> more time to settle out.
> >>
> >> From the arm64 side, the only concern I have (and this actually applies
> >> to our CSDB sequence as well) is the calculation of the array size by
> >> the caller. As Linus mentioned at the end of [1], if the determination
> >> of the size argument is based on a conditional branch, then masking
> >> doesn't help because you bound within the wrong range under speculation.
> >>
> >> We ran into this when trying to use masking to protect our uaccess
> >> routines where the conditional bound is either KERNEL_DS or USER_DS.
> >> It's possible that a prior conditional set_fs(KERNEL_DS) could defeat
> >> the masking and so we'd need to throw some heavy barriers in set_fs to
> >> make it robust.
> >
> > At least in the conditional mask case near set_fs() usage the approach
> > we are taking is to use a barrier. I.e. the following guidance from
> > Linus:
> >
> > "Basically, the rule is trivial: find all 'stac' users, and use address
> > masking if those users already integrate the limit check, and lfence if
> > they don't."
> >
> > ...which translates to narrow the pointer for get_user() and use a
> > barrier for __get_user().
>
> Great, that matches my thinking re set_fs but I'm still worried about
> finding all the places where the bound is conditional for other users
> of the macro. Then again, finding the places that need this macro in the
> first place is tough enough so perhaps analysing the bound calculation
> doesn't make it much worse.
It might not now, but if the bound calculation changes later, I'm pretty sure
we'll forget to update the speculation barrier macro at least in some cases.
Without the help of static (or possibly dynamic) code analysis I think we're
bound to reintroduce problems over time, but that's true for finding places
where the barrier is needed, not just for barrier selection based on bound
calculation.
--
Regards,
Laurent Pinchart
Hi Dan,
Thank you for the patch.
On Friday, 12 January 2018 02:47:40 EEST Dan Williams wrote:
> Static analysis reports that 'index' may be a user controlled value that
> is used as a data dependency to read 'pin' from the
> 'selector->baSourceID' array. In order to avoid potential leaks of
> kernel memory values, block speculative execution of the instruction
> stream that could issue reads based on an invalid value of 'pin'.
>
> Based on an original patch by Elena Reshetova.
>
> Laurent notes:
>
> "...as this is nowhere close to being a fast path, I think we can close
> this potential hole as proposed in the patch"
>
> Cc: Mauro Carvalho Chehab <[email protected]>
> Cc: [email protected]
> Reviewed-by: Laurent Pinchart <[email protected]>
> Signed-off-by: Elena Reshetova <[email protected]>
> Signed-off-by: Dan Williams <[email protected]>
What's the status of this series (and of this patch in particular)?
> ---
> drivers/media/usb/uvc/uvc_v4l2.c | 9 +++++++--
> 1 file changed, 7 insertions(+), 2 deletions(-)
>
> diff --git a/drivers/media/usb/uvc/uvc_v4l2.c b/drivers/media/usb/uvc/uvc_v4l2.c
> index 3e7e283a44a8..30ee200206ee 100644
> --- a/drivers/media/usb/uvc/uvc_v4l2.c
> +++ b/drivers/media/usb/uvc/uvc_v4l2.c
> @@ -22,6 +22,7 @@
> #include <linux/mm.h>
> #include <linux/wait.h>
> #include <linux/atomic.h>
> +#include <linux/nospec.h>
>
> #include <media/v4l2-common.h>
> #include <media/v4l2-ctrls.h>
> @@ -809,8 +810,12 @@ static int uvc_ioctl_enum_input(struct file *file, void *fh,
> const struct uvc_entity *selector = chain->selector;
> struct uvc_entity *iterm = NULL;
> u32 index = input->index;
> + __u8 *elem = NULL;
> int pin = 0;
>
> + if (selector)
> + elem = array_ptr(selector->baSourceID, index,
> + selector->bNrInPins);
> if (selector == NULL ||
> (chain->dev->quirks & UVC_QUIRK_IGNORE_SELECTOR_UNIT)) {
> if (index != 0)
> @@ -820,8 +825,8 @@ static int uvc_ioctl_enum_input(struct file *file, void *fh,
> break;
> }
> pin = iterm->id;
> - } else if (index < selector->bNrInPins) {
> - pin = selector->baSourceID[index];
> + } else if (elem) {
> + pin = *elem;
> list_for_each_entry(iterm, &chain->entities, chain) {
> if (!UVC_ENTITY_IS_ITERM(iterm))
> continue;
--
Regards,
Laurent Pinchart