Recently, Google Project Zero discovered several classes of attack
against speculative execution. One of these, known as variant-1, allows
explicit bounds checks to be bypassed under speculation, providing an
arbitrary read gadget. Further details can be found on the GPZ blog [1]
and the Documentation patch in this series.
There are a number of potential gadgets in the Linux codebase, and
mitigations for these are architecture-specific.
This RFC attempts to provide a cross-architecture API for inhibiting
these primitives. Hopefully, architecture-specific mitigations can be
unified behind this. An arm64 implementation is provided following the
architecturally recommended sequence laid out in the Arm whitepaper [2].
The API is based on a proposed compiler intrinsic [3].
I've provided a patch to BPF as an example use of the API. I know that
this is incomplete and less than optimal. I'd appreciate feedback from
other affected architectures as to whether this API is suitable for
their required mitigation.
I've pushed the series to my kernel.org repo [4].
Since v1 [5]:
* Removed the nospec_*load helpers
* Added nospec_array_ptr()
* Reworked the asm-generic implementation to fit other architectures
* Improved documentation
[1] https://googleprojectzero.blogspot.co.uk/2018/01/reading-privileged-memory-with-side.html
[2] https://developer.arm.com/support/security-update
[3] https://developer.arm.com/support/security-update/compiler-support-for-mitigations
[4] git://git.kernel.org/pub/scm/linux/kernel/git/mark/linux.git core/nospec
[5] https://lkml.kernel.org/r/[email protected]
Thanks,
Mark.
Mark Rutland (4):
asm-generic/barrier: add generic nospec helpers
Documentation: document nospec helpers
arm64: implement nospec_{load,ptr}()
bpf: inhibit speculated out-of-bounds pointers
Documentation/speculation.txt | 166 +++++++++++++++++++++++++++++++++++++++
arch/arm64/include/asm/barrier.h | 55 +++++++++++++
include/asm-generic/barrier.h | 68 ++++++++++++++++
kernel/bpf/arraymap.c | 20 +++--
kernel/bpf/cpumap.c | 5 +-
kernel/bpf/devmap.c | 3 +-
kernel/bpf/sockmap.c | 3 +-
7 files changed, 308 insertions(+), 12 deletions(-)
create mode 100644 Documentation/speculation.txt
--
2.11.0
Under speculation, CPUs may mis-predict branches in bounds checks. Thus,
memory accesses under a bounds check may be speculated even if the
bounds check fails, providing a primitive for building a side channel.
This patch adds helpers which can be used to inhibit the use of
out-of-bounds pointers under speculation.
A generic implementation is provided for compatibility, but does not
guarantee safety under speculation. Architectures are expected to
override these helpers as necessary.
Signed-off-by: Mark Rutland <[email protected]>
Signed-off-by: Will Deacon <[email protected]>
Cc: Dan Williams <[email protected]>
Cc: Peter Zijlstra <[email protected]>
---
include/asm-generic/barrier.h | 68 +++++++++++++++++++++++++++++++++++++++++++
1 file changed, 68 insertions(+)
Dan, I've reworked this so that nospec_ptr() can take an arch-specific barrier
sequence. I believe that for x86 you just need to implement __nospec_barrier()
as osb().
Mark.
diff --git a/include/asm-generic/barrier.h b/include/asm-generic/barrier.h
index fe297b599b0a..91c3071f49e5 100644
--- a/include/asm-generic/barrier.h
+++ b/include/asm-generic/barrier.h
@@ -54,6 +54,74 @@
#define read_barrier_depends() do { } while (0)
#endif
+/*
+ * Inhibit subsequent speculative memory accesses.
+ *
+ * Architectures with a suitable memory barrier should provide an
+ * implementation. This is non-portable, and generic code should use
+ * nospec_ptr().
+ */
+#ifndef __nospec_barrier
+#define __nospec_barrier() do { } while (0)
+#endif
+
+/**
+ * nospec_ptr() - Ensure a pointer is bounded, even under speculation.
+ *
+ * @ptr: the pointer to test
+ * @lo: the lower valid bound for @ptr, inclusive
+ * @hi: the upper valid bound for @ptr, exclusive
+ *
+ * If @ptr falls in the interval [@lo, @hi), returns @ptr, otherwise returns
+ * NULL.
+ *
+ * Architectures which do not provide __nospec_barrier() should override this
+ * to ensure that ptr falls in the [lo, hi) interval both under architectural
+ * execution and under speculation, preventing propagation of an out-of-bounds
+ * pointer to code which is speculatively executed.
+ */
+#ifndef nospec_ptr
+#define nospec_ptr(ptr, lo, hi) \
+({ \
+ typeof (ptr) __ret; \
+ typeof (ptr) __ptr = (ptr); \
+ typeof (ptr) __lo = (lo); \
+ typeof (ptr) __hi = (hi); \
+ \
+ __ret = (__lo <= __ptr && __ptr < __hi) ? __ptr : NULL; \
+ \
+ __nospec_barrier(); \
+ \
+ __ret; \
+})
+#endif
+
+/**
+ * nospec_array_ptr - Generate a pointer to an array element, ensuring the
+ * pointer is bounded under speculation.
+ *
+ * @arr: the base of the array
+ * @idx: the index of the element
+ * @sz: the number of elements in the array
+ *
+ * If @idx falls in the interval [0, @sz), returns the pointer to @arr[@idx],
+ * otherwise returns NULL.
+ *
+ * This is a wrapper around nospec_ptr(), provided for convenience.
+ * Architectures should implement nospec_ptr() to ensure that the returned
+ * pointer is bounded even under speculation.
+ */
+#define nospec_array_ptr(arr, idx, sz) \
+({ \
+ typeof(*(arr)) *__arr = (arr); \
+ typeof(idx) __idx = (idx); \
+ typeof(sz) __sz = (sz); \
+ \
+ nospec_ptr(__arr + __idx, __arr, __arr + __sz); \
+})
+
+#undef __nospec_barrier
+
#ifndef __smp_mb
#define __smp_mb() mb()
#endif
--
2.11.0
Document the rationale and usage of the new nospec*() helpers.
Signed-off-by: Mark Rutland <[email protected]>
Signed-off-by: Will Deacon <[email protected]>
Cc: Dan Williams <[email protected]>
Cc: Jonathan Corbet <[email protected]>
Cc: Peter Zijlstra <[email protected]>
---
Documentation/speculation.txt | 166 ++++++++++++++++++++++++++++++++++++++++++
1 file changed, 166 insertions(+)
create mode 100644 Documentation/speculation.txt
diff --git a/Documentation/speculation.txt b/Documentation/speculation.txt
new file mode 100644
index 000000000000..748fcd4dcda4
--- /dev/null
+++ b/Documentation/speculation.txt
@@ -0,0 +1,166 @@
+This document explains potential effects of speculation, and how undesirable
+effects can be mitigated portably using common APIs.
+
+===========
+Speculation
+===========
+
+To improve performance and minimize average latencies, many contemporary CPUs
+employ speculative execution techniques such as branch prediction, performing
+work which may be discarded at a later stage.
+
+Typically speculative execution cannot be observed from architectural state,
+such as the contents of registers. However, in some cases it is possible to
+observe its impact on microarchitectural state, such as the presence or
+absence of data in caches. Such state may form side-channels which can be
+observed to extract secret information.
+
+For example, in the presence of branch prediction, it is possible for bounds
+checks to be ignored by code which is speculatively executed. Consider the
+following code:
+
+ int load_array(int *array, unsigned int idx) {
+ if (idx >= MAX_ARRAY_ELEMS)
+ return 0;
+ else
+ return array[idx];
+ }
+
+On arm64, this may be compiled to an assembly sequence such as:
+
+ CMP <idx>, #MAX_ARRAY_ELEMS
+ B.LT less
+ MOV <returnval>, #0
+ RET
+ less:
+ LDR <returnval>, [<array>, <idx>]
+ RET
+
+It is possible that a CPU mis-predicts the conditional branch, and
+speculatively loads array[idx], even if idx >= MAX_ARRAY_ELEMS. This value
+will subsequently be discarded, but the speculated load may affect
+microarchitectural state which can be subsequently measured.
+
+More complex sequences involving multiple dependent memory accesses may result
+in sensitive information being leaked. Consider the following code, building on
+the prior example:
+
+ int load_dependent_arrays(int *arr1, int *arr2, int idx) {
+ int val1, val2;
+
+ val1 = load_array(arr1, idx);
+ val2 = load_array(arr2, val1);
+
+ return val2;
+ }
+
+Under speculation, the first call to load_array() may return the value of an
+out-of-bounds address, while the second call will influence microarchitectural
+state dependent on this value. This may provide an arbitrary read primitive.
+
+====================================
+Mitigating speculation side-channels
+====================================
+
+The kernel provides a generic API to ensure that bounds checks are respected
+even under speculation. Architectures which are affected by speculation-based
+side-channels are expected to implement these primitives.
+
+The following helpers found in <asm/barrier.h> can be used to prevent
+information from being leaked via side-channels.
+
+* nospec_ptr(ptr, lo, hi)
+
+ Returns a sanitized pointer that is bounded by the [lo, hi) interval. When
+ ptr < lo or ptr >= hi, NULL is returned. This prevents an out-of-bounds
+ pointer from being propagated to code which is speculatively executed.
+
+ This is expected to be used by code which computes pointers to data
+ structures, where part of the address (such as an array index) may be
+ user-controlled.
+
+ This can be used to protect the earlier load_array() example:
+
+ int load_array(int *array, unsigned int idx)
+ {
+ int *elem;
+
+ if ((elem = nospec_ptr(array + idx, array, array + MAX_ARRAY_ELEMS)))
+ return *elem;
+ else
+ return 0;
+ }
+
+ This can also be used in situations where multiple fields on a structure are
+ accessed:
+
+ struct foo array[SIZE];
+ int a, b;
+
+ void do_thing(int idx)
+ {
+ struct foo *elem;
+
+ if ((elem = nospec_ptr(array + idx, array, array + SIZE))) {
+ a = elem->field_a;
+ b = elem->field_b;
+ }
+ }
+
+ It is imperative that the returned pointer is used. Pointers which are
+ generated separately are subject to a number of potential CPU and compiler
+ optimizations, and may still be used speculatively. For example, this means
+ that the following sequence is unsafe:
+
+ struct foo array[SIZE];
+ int a, b;
+
+ void do_thing(int idx)
+ {
+ if (nospec_ptr(array + idx, array, array + SIZE) != NULL) {
+ // unsafe as wrong pointer is used
+ a = array[idx].field_a;
+ b = array[idx].field_b;
+ }
+ }
+
+ Similarly, it is unsafe to compare the returned pointer with other pointers,
+ as this may permit the compiler to substitute one pointer with another,
+ permitting speculation. For example, the following sequence is unsafe:
+
+ struct foo array[SIZE];
+ int a, b;
+
+ void do_thing(int idx)
+ {
+ struct foo *elem = nospec_ptr(array + idx, array, array + SIZE);
+
+ // unsafe due to pointer substitution
+ if (elem == &array[idx]) {
+ a = elem->field_a;
+ b = elem->field_b;
+ }
+ }
+
+* nospec_array_ptr(arr, idx, sz)
+
+ Returns a sanitized pointer to arr[idx] only if idx falls in the [0, sz)
+ interval. When idx < 0 or idx >= sz, NULL is returned. Prevents an
+ out-of-bounds pointer being propagated to code which is speculatively
+ executed.
+
+ This is a convenience function which wraps nospec_ptr(), and has the same
+ caveats w.r.t. the use of the returned pointer.
+
+ For example, this may be used as follows:
+
+ int load_array(int *array, unsigned int idx)
+ {
+ int *elem;
+
+ if ((elem = nospec_array_ptr(array, idx, MAX_ARRAY_ELEMS)))
+ return *elem;
+ else
+ return 0;
+ }
+
--
2.11.0
This patch implements nospec_ptr() for arm64, following the recommended
architectural sequence.
Signed-off-by: Mark Rutland <[email protected]>
Signed-off-by: Will Deacon <[email protected]>
Cc: Dan Williams <[email protected]>
Cc: Peter Zijlstra <[email protected]>
---
arch/arm64/include/asm/barrier.h | 55 ++++++++++++++++++++++++++++++++++++++++
1 file changed, 55 insertions(+)
diff --git a/arch/arm64/include/asm/barrier.h b/arch/arm64/include/asm/barrier.h
index 77651c49ef44..b4819f6a0e5c 100644
--- a/arch/arm64/include/asm/barrier.h
+++ b/arch/arm64/include/asm/barrier.h
@@ -40,6 +40,61 @@
#define dma_rmb() dmb(oshld)
#define dma_wmb() dmb(oshst)
+#define __load_no_speculate_n(ptr, lo, hi, failval, cmpptr, w, sz) \
+({ \
+ typeof(*ptr) __nln_val; \
+ typeof(*ptr) __failval = \
+ (typeof(*ptr))(unsigned long)(failval); \
+ \
+ asm volatile ( \
+ " cmp %[c], %[l]\n" \
+ " ccmp %[c], %[h], 2, cs\n" \
+ " b.cs 1f\n" \
+ " ldr" #sz " %" #w "[v], %[p]\n" \
+ "1: csel %" #w "[v], %" #w "[v], %" #w "[f], cc\n" \
+ " hint #0x14 // CSDB\n" \
+ : [v] "=&r" (__nln_val) \
+ : [p] "m" (*(ptr)), [l] "r" (lo), [h] "r" (hi), \
+ [f] "rZ" (__failval), [c] "r" (cmpptr) \
+ : "cc"); \
+ \
+ __nln_val; \
+})
+
+#define __load_no_speculate(ptr, lo, hi, failval, cmpptr) \
+({ \
+ typeof(*(ptr)) __nl_val; \
+ \
+ switch (sizeof(__nl_val)) { \
+ case 1: \
+ __nl_val = __load_no_speculate_n(ptr, lo, hi, failval, \
+ cmpptr, w, b); \
+ break; \
+ case 2: \
+ __nl_val = __load_no_speculate_n(ptr, lo, hi, failval, \
+ cmpptr, w, h); \
+ break; \
+ case 4: \
+ __nl_val = __load_no_speculate_n(ptr, lo, hi, failval, \
+ cmpptr, w, ); \
+ break; \
+ case 8: \
+ __nl_val = __load_no_speculate_n(ptr, lo, hi, failval, \
+ cmpptr, x, ); \
+ break; \
+ default: \
+ BUILD_BUG(); \
+ } \
+ \
+ __nl_val; \
+})
+
+#define nospec_ptr(ptr, lo, hi) \
+({ \
+ typeof(ptr) __np_ptr = (ptr); \
+ __load_no_speculate(&__np_ptr, lo, hi, 0, __np_ptr); \
+})
+
#define __smp_mb() dmb(ish)
#define __smp_rmb() dmb(ishld)
#define __smp_wmb() dmb(ishst)
--
2.11.0
Note: this patch is an *example* use of the nospec API. It is understood
that this is incomplete, etc.
Under speculation, CPUs may mis-predict branches in bounds checks. Thus,
memory accesses under a bounds check may be speculated even if the
bounds check fails, providing a primitive for building a side channel.
The EBPF map code has a number of such bounds-checked accesses in
map_lookup_elem implementations. This patch modifies these to use the
nospec helpers to inhibit such side channels.
The JITted lookup_elem implementations remain potentially vulnerable,
and are disabled (with JITted code falling back to the C
implementations).
Signed-off-by: Mark Rutland <[email protected]>
Signed-off-by: Will Deacon <[email protected]>
Cc: Dan Williams <[email protected]>
Cc: Peter Zijlstra <[email protected]>
---
kernel/bpf/arraymap.c | 20 +++++++++++++-------
kernel/bpf/cpumap.c | 5 ++---
kernel/bpf/devmap.c | 3 ++-
kernel/bpf/sockmap.c | 3 ++-
4 files changed, 19 insertions(+), 12 deletions(-)
diff --git a/kernel/bpf/arraymap.c b/kernel/bpf/arraymap.c
index 7c25426d3cf5..deaad334a100 100644
--- a/kernel/bpf/arraymap.c
+++ b/kernel/bpf/arraymap.c
@@ -117,15 +117,20 @@ static void *array_map_lookup_elem(struct bpf_map *map, void *key)
{
struct bpf_array *array = container_of(map, struct bpf_array, map);
u32 index = *(u32 *)key;
+ void *ptr, *high;
if (unlikely(index >= array->map.max_entries))
return NULL;
- return array->value + array->elem_size * index;
+ ptr = array->value + array->elem_size * index;
+ high = array->value + array->elem_size * array->map.max_entries;
+
+ return nospec_ptr(ptr, array->value, high);
}
/* emit BPF instructions equivalent to C code of array_map_lookup_elem() */
-static u32 array_map_gen_lookup(struct bpf_map *map, struct bpf_insn *insn_buf)
+static u32 __maybe_unused array_map_gen_lookup(struct bpf_map *map,
+ struct bpf_insn *insn_buf)
{
struct bpf_insn *insn = insn_buf;
u32 elem_size = round_up(map->value_size, 8);
@@ -153,11 +158,14 @@ static void *percpu_array_map_lookup_elem(struct bpf_map *map, void *key)
{
struct bpf_array *array = container_of(map, struct bpf_array, map);
u32 index = *(u32 *)key;
+ void __percpu *pptr;
if (unlikely(index >= array->map.max_entries))
return NULL;
- return this_cpu_ptr(array->pptrs[index]);
+ pptr = nospec_array_ptr(array->pptrs, index, array->map.max_entries);
+
+ return this_cpu_ptr(pptr);
}
int bpf_percpu_array_copy(struct bpf_map *map, void *key, void *value)
@@ -302,7 +310,6 @@ const struct bpf_map_ops array_map_ops = {
.map_lookup_elem = array_map_lookup_elem,
.map_update_elem = array_map_update_elem,
.map_delete_elem = array_map_delete_elem,
- .map_gen_lookup = array_map_gen_lookup,
};
const struct bpf_map_ops percpu_array_map_ops = {
@@ -610,8 +617,8 @@ static void *array_of_map_lookup_elem(struct bpf_map *map, void *key)
return READ_ONCE(*inner_map);
}
-static u32 array_of_map_gen_lookup(struct bpf_map *map,
- struct bpf_insn *insn_buf)
+static u32 __maybe_unused array_of_map_gen_lookup(struct bpf_map *map,
+ struct bpf_insn *insn_buf)
{
u32 elem_size = round_up(map->value_size, 8);
struct bpf_insn *insn = insn_buf;
@@ -644,5 +651,4 @@ const struct bpf_map_ops array_of_maps_map_ops = {
.map_fd_get_ptr = bpf_map_fd_get_ptr,
.map_fd_put_ptr = bpf_map_fd_put_ptr,
.map_fd_sys_lookup_elem = bpf_map_fd_sys_lookup_elem,
- .map_gen_lookup = array_of_map_gen_lookup,
};
diff --git a/kernel/bpf/cpumap.c b/kernel/bpf/cpumap.c
index ce5b669003b2..6769a0e30c8c 100644
--- a/kernel/bpf/cpumap.c
+++ b/kernel/bpf/cpumap.c
@@ -551,13 +551,12 @@ void cpu_map_free(struct bpf_map *map)
struct bpf_cpu_map_entry *__cpu_map_lookup_elem(struct bpf_map *map, u32 key)
{
struct bpf_cpu_map *cmap = container_of(map, struct bpf_cpu_map, map);
- struct bpf_cpu_map_entry *rcpu;
if (key >= map->max_entries)
return NULL;
- rcpu = READ_ONCE(cmap->cpu_map[key]);
- return rcpu;
+ return READ_ONCE(*nospec_array_ptr(cmap->cpu_map, key,
+ map->max_entries));
}
static void *cpu_map_lookup_elem(struct bpf_map *map, void *key)
diff --git a/kernel/bpf/devmap.c b/kernel/bpf/devmap.c
index ebdef54bf7df..5a1050d270a0 100644
--- a/kernel/bpf/devmap.c
+++ b/kernel/bpf/devmap.c
@@ -254,7 +254,8 @@ struct net_device *__dev_map_lookup_elem(struct bpf_map *map, u32 key)
if (key >= map->max_entries)
return NULL;
- dev = READ_ONCE(dtab->netdev_map[key]);
+ dev = READ_ONCE(*nospec_array_ptr(dtab->netdev_map, key,
+ map->max_entries));
return dev ? dev->dev : NULL;
}
diff --git a/kernel/bpf/sockmap.c b/kernel/bpf/sockmap.c
index 5ee2e41893d9..e912de3cd4ce 100644
--- a/kernel/bpf/sockmap.c
+++ b/kernel/bpf/sockmap.c
@@ -630,7 +630,8 @@ struct sock *__sock_map_lookup_elem(struct bpf_map *map, u32 key)
if (key >= map->max_entries)
return NULL;
- return READ_ONCE(stab->sock_map[key]);
+ return READ_ONCE(*nospec_array_ptr(stab->sock_map, key,
+ map->max_entries));
}
static int sock_map_delete_elem(struct bpf_map *map, void *key)
--
2.11.0
On Fri, Jan 5, 2018 at 6:57 AM, Mark Rutland <[email protected]> wrote:
> Note: this patch is an *example* use of the nospec API. It is understood
> that this is incomplete, etc.
>
> Under speculation, CPUs may mis-predict branches in bounds checks. Thus,
> memory accesses under a bounds check may be speculated even if the
> bounds check fails, providing a primitive for building a side channel.
>
> The EBPF map code has a number of such bounds-checks accesses in
> map_lookup_elem implementations. This patch modifies these to use the
> nospec helpers to inhibit such side channels.
>
> The JITted lookup_elem implementations remain potentially vulnerable,
> and are disabled (with JITted code falling back to the C
> implementations).
Do we still need this given this patch from the bpf folks:
https://patchwork.ozlabs.org/patch/855911/
?
On Fri, Jan 05, 2018 at 08:38:43AM -0800, Dan Williams wrote:
> On Fri, Jan 5, 2018 at 6:57 AM, Mark Rutland <[email protected]> wrote:
> > Note: this patch is an *example* use of the nospec API. It is understood
> > that this is incomplete, etc.
> >
> > Under speculation, CPUs may mis-predict branches in bounds checks. Thus,
> > memory accesses under a bounds check may be speculated even if the
> > bounds check fails, providing a primitive for building a side channel.
> >
> > The EBPF map code has a number of such bounds-checks accesses in
> > map_lookup_elem implementations. This patch modifies these to use the
> > nospec helpers to inhibit such side channels.
> >
> > The JITted lookup_elem implementations remain potentially vulnerable,
> > and are disabled (with JITted code falling back to the C
> > implementations).
>
> Do we still need this given this patch from the bpf folks:
>
> https://patchwork.ozlabs.org/patch/855911/
Probably not; it was just easier to update this example than to write
new ones.
I've started on the set of cases Elena reported. Most cases fall out
quite nicely, though in places where there's a lot of pointer
arithmetic it's somewhat more painful. I'll try to use those in future,
unless someone beats me to implementing them. ;)
Thanks,
Mark.
On Fri, Jan 05, 2018 at 02:57:50PM +0000, Mark Rutland wrote:
> Note: this patch is an *example* use of the nospec API. It is understood
> that this is incomplete, etc.
>
> Under speculation, CPUs may mis-predict branches in bounds checks. Thus,
> memory accesses under a bounds check may be speculated even if the
> bounds check fails, providing a primitive for building a side channel.
>
> The EBPF map code has a number of such bounds-checks accesses in
> map_lookup_elem implementations. This patch modifies these to use the
> nospec helpers to inhibit such side channels.
>
> The JITted lookup_elem implementations remain potentially vulnerable,
> and are disabled (with JITted code falling back to the C
> implementations).
>
> Signed-off-by: Mark Rutland <[email protected]>
> Signed-off-by: Will Deacon <[email protected]>
> Cc: Dan Williams <[email protected]>
> Cc: Peter Zijlstra <[email protected]>
> ---
> kernel/bpf/arraymap.c | 20 +++++++++++++-------
> kernel/bpf/cpumap.c | 5 ++---
> kernel/bpf/devmap.c | 3 ++-
> kernel/bpf/sockmap.c | 3 ++-
> 4 files changed, 19 insertions(+), 12 deletions(-)
Mark, did you see my email with this patch yesterday ?
https://patchwork.ozlabs.org/patch/855911/
btw your patch does not fix the variant 1 exploit.
Also all of the pre-embargo patches from Elena that add lfence
in the bpf interpreter and x64 JIT also do not fix it.
The exploit works via bpf_tail_call and not via map_lookup.
I'm trying to make both safer with minimal run-time cost
with above patch.
Also, as I tried to explain earlier, variant 1 relies on 64-bit
speculative address math in bpf_tail_call that was changed to
32-bit math in October, so the risk is already close to zero.
If both x64 and arm folks can test the above patch at least
we will be able to merge it and close one known hole in the tree.
In parallel we can work on adding nospec/osb primitives
and sprinkling them all over the kernel, but please do not use
bpf side as an 'example'. It's unnecessary.
Thanks
On 01/05/18 06:57, Mark Rutland wrote:
> Document the rationale and usage of the new nospec*() helpers.
>
> Signed-off-by: Mark Rutland <[email protected]>
> Signed-off-by: Will Deacon <[email protected]>
> Cc: Dan Williams <[email protected]>
> Cc: Jonathan Corbet <[email protected]>
> Cc: Peter Zijlstra <[email protected]>
> ---
> Documentation/speculation.txt | 166 ++++++++++++++++++++++++++++++++++++++++++
> 1 file changed, 166 insertions(+)
> create mode 100644 Documentation/speculation.txt
>
> diff --git a/Documentation/speculation.txt b/Documentation/speculation.txt
> new file mode 100644
> index 000000000000..748fcd4dcda4
> --- /dev/null
> +++ b/Documentation/speculation.txt
> @@ -0,0 +1,166 @@
> +
> +Typically speculative execution cannot be observed from architectural state,
> +such as the contents of registers. However, in some cases it is possible to
> +observe its impact on microarchitectural state, such as the presence or
> +absence of data in caches. Such state may form side-channels which can be
> +observed to extract secret information.
I'm curious about what it takes to observe this...
or is that covered in the exploit papers?
thanks,
--
~Randy
Hi Mark,
On Fri, Jan 5, 2018 at 3:57 PM, Mark Rutland <[email protected]> wrote:
> Document the rationale and usage of the new nospec*() helpers.
>
> Signed-off-by: Mark Rutland <[email protected]>
> Signed-off-by: Will Deacon <[email protected]>
> Cc: Dan Williams <[email protected]>
> Cc: Jonathan Corbet <[email protected]>
> Cc: Peter Zijlstra <[email protected]>
I love your patch! Yet something to improve:
(borrowed from another Intel division)
> --- /dev/null
> +++ b/Documentation/speculation.txt
> @@ -0,0 +1,166 @@
> +This document explains potential effects of speculation, and how undesirable
> +effects can be mitigated portably using common APIs.
> +
> +===========
> +Speculation
> +===========
> +
> +To improve performance and minimize average latencies, many contemporary CPUs
> +employ speculative execution techniques such as branch prediction, performing
> +work which may be discarded at a later stage.
> +
> +Typically speculative execution cannot be observed from architectural state,
> +such as the contents of registers. However, in some cases it is possible to
> +observe its impact on microarchitectural state, such as the presence or
> +absence of data in caches. Such state may form side-channels which can be
> +observed to extract secret information.
> +
> +For example, in the presence of branch prediction, it is possible for bounds
> +checks to be ignored by code which is speculatively executed. Consider the
> +following code:
> +
> + int load_array(int *array, unsigned int idx) {
"{" on next line?
> + if (idx >= MAX_ARRAY_ELEMS)
> + return 0;
> + else
> + return array[idx];
> + }
> +
> +Which, on arm64, may be compiled to an assembly sequence such as:
> +
> + CMP <idx>, #MAX_ARRAY_ELEMS
> + B.LT less
> + MOV <returnval>, #0
> + RET
> + less:
> + LDR <returnval>, [<array>, <idx>]
> + RET
> +
> +It is possible that a CPU mis-predicts the conditional branch, and
> +speculatively loads array[idx], even if idx >= MAX_ARRAY_ELEMS. This value
> +will subsequently be discarded, but the speculated load may affect
> +microarchitectural state which can be subsequently measured.
> +
> +More complex sequences involving multiple dependent memory accesses may result
> +in sensitive information being leaked. Consider the following code, building on
> +the prior example:
> +
> + int load_dependent_arrays(int *arr1, int *arr2, int idx) {
"{" on next line
> + int val1, val2,
> +
> + val1 = load_array(arr1, idx);
> + val2 = load_array(arr2, val1);
> +
> + return val2;
> + }
> +
> +Under speculation, the first call to load_array() may return the value of an
> +out-of-bounds address, while the second call will influence microarchitectural
> +state dependent on this value. This may provide an arbitrary read primitive.
> +
> +====================================
> +Mitigating speculation side-channels
> +====================================
> +
> +The kernel provides a generic API to ensure that bounds checks are respected
> +even under speculation. Architectures which are affected by speculation-based
> +side-channels are expected to implement these primitives.
> +
> +The following helpers found in <asm/barrier.h> can be used to prevent
> +information from being leaked via side-channels.
> +
> +* nospec_ptr(ptr, lo, hi)
> +
> + Returns a sanitized pointer that is bounded by the [lo, hi) interval. When
> + ptr < lo, or ptr >= hi, NULL is returned. Prevents an out-of-bounds pointer
> + being propagated to code which is speculatively executed.
> +
> + This is expected to be used by code which computes pointers to data
> + structures, where part of the address (such as an array index) may be
> + user-controlled.
> +
> + This can be used to protect the earlier load_array() example:
> +
> + int load_array(int *array, unsigned int idx)
> + {
> + int *elem;
> +
> + if ((elem = nospec_ptr(array + idx, array, array + MAX_ARRAY_ELEMS)))
elem = nospec_ptr(array + idx, array, array + MAX_ARRAY_ELEMS);
if (elem)
> + return *elem;
> + else
> + return 0;
> + }
> +
> + This can also be used in situations where multiple fields on a structure are
> + accessed:
> +
> + struct foo array[SIZE];
> + int a, b;
> +
> + void do_thing(int idx)
> + {
> + struct foo *elem;
> +
> + if ((elem = nospec_ptr(array + idx, array, array + SIZE)) {
elem = nospec_ptr(array + idx, array, array + SIZE);
if (elem) {
> + a = elem->field_a;
> + b = elem->field_b;
> + }
> + }
> +
> + It is imperative that the returned pointer is used. Pointers which are
> + generated separately are subject to a number of potential CPU and compiler
> + optimizations, and may still be used speculatively. For example, this means
> + that the following sequence is unsafe:
> +
> + struct foo array[SIZE];
> + int a, b;
> +
> + void do_thing(int idx)
> + {
> + if (nospec_ptr(array + idx, array, array + SIZE) != NULL) {
> + // unsafe as wrong pointer is used
> + a = array[idx].field_a;
> + b = array[idx].field_b;
> + }
> + }
> +
> + Similarly, it is unsafe to compare the returned pointer with other pointers,
> + as this may permit the compiler to substitute one pointer with another,
> + permitting speculation. For example, the following sequence is unsafe:
> +
> + struct foo array[SIZE];
> + int a, b;
> +
> + void do_thing(int idx)
> + {
> + struct foo *elem = nospec_ptr(array + idx, array, array + size);
> +
> + // unsafe due to pointer substitution
> + if (elem == &array[idx]) {
> + a = elem->field_a;
> + b = elem->field_b;
> + }
> + }
> +
> +* nospec_array_ptr(arr, idx, sz)
> +
> + Returns a sanitized pointer to arr[idx] only if idx falls in the [0, sz)
> + interval. When idx < 0 or idx > sz, NULL is returned. Prevents an
> + out-of-bounds pointer being propagated to code which is speculatively
> + executed.
> +
> + This is a convenience function which wraps nospec_ptr(), and has the same
> + caveats w.r.t. the use of the returned pointer.
> +
> + For example, this may be used as follows:
> +
> + int load_array(int *array, unsigned int idx)
> + {
> + int *elem;
> +
> + if ((elem = nospec_array_ptr(array, idx, MAX_ARRAY_ELEMS)))
elem = nospec_array_ptr(array, idx, MAX_ARRAY_ELEMS);
if (elem)
> + return *elem;
> + else
> + return 0;
> + }
> +
Gr{oetje,eeting}s,
Geert
--
Geert Uytterhoeven -- There's lots of Linux beyond ia32 -- [email protected]
In personal conversations with technical people, I call myself a hacker. But
when I'm talking to journalists I just say "programmer" or something like that.
-- Linus Torvalds
On Sat, Jan 06, 2018 at 09:20:59PM -0800, Randy Dunlap wrote:
> On 01/05/18 06:57, Mark Rutland wrote:
> > Document the rationale and usage of the new nospec*() helpers.
> >
> > Signed-off-by: Mark Rutland <[email protected]>
> > Signed-off-by: Will Deacon <[email protected]>
> > Cc: Dan Williams <[email protected]>
> > Cc: Jonathan Corbet <[email protected]>
> > Cc: Peter Zijlstra <[email protected]>
> > ---
> > Documentation/speculation.txt | 166 ++++++++++++++++++++++++++++++++++++++++++
> > 1 file changed, 166 insertions(+)
> > create mode 100644 Documentation/speculation.txt
> >
> > diff --git a/Documentation/speculation.txt b/Documentation/speculation.txt
> > new file mode 100644
> > index 000000000000..748fcd4dcda4
> > --- /dev/null
> > +++ b/Documentation/speculation.txt
> > @@ -0,0 +1,166 @@
> > +
> > +Typically speculative execution cannot be observed from architectural state,
> > +such as the contents of registers. However, in some cases it is possible to
> > +observe its impact on microarchitectural state, such as the presence or
> > +absence of data in caches. Such state may form side-channels which can be
> > +observed to extract secret information.
>
> I'm curious about what it takes to observe this...
>
> or is that covered in the exploit papers?
That's covered elsewhere, e.g.
https://googleprojectzero.blogspot.co.uk/2018/01/reading-privileged-memory-with-side.html
I'll add some references.
Thanks,
Mark.
> On Fri, Jan 05, 2018 at 02:57:50PM +0000, Mark Rutland wrote:
> > Note: this patch is an *example* use of the nospec API. It is understood
> > that this is incomplete, etc.
> >
> > Under speculation, CPUs may mis-predict branches in bounds checks. Thus,
> > memory accesses under a bounds check may be speculated even if the
> > bounds check fails, providing a primitive for building a side channel.
> >
> > The EBPF map code has a number of such bounds-checks accesses in
> > map_lookup_elem implementations. This patch modifies these to use the
> > nospec helpers to inhibit such side channels.
> >
> > The JITted lookup_elem implementations remain potentially vulnerable,
> > and are disabled (with JITted code falling back to the C
> > implementations).
> >
> > Signed-off-by: Mark Rutland <[email protected]>
> > Signed-off-by: Will Deacon <[email protected]>
> > Cc: Dan Williams <[email protected]>
> > Cc: Peter Zijlstra <[email protected]>
> > ---
> > kernel/bpf/arraymap.c | 20 +++++++++++++-------
> > kernel/bpf/cpumap.c | 5 ++---
> > kernel/bpf/devmap.c | 3 ++-
> > kernel/bpf/sockmap.c | 3 ++-
> > 4 files changed, 19 insertions(+), 12 deletions(-)
>
> Mark, did you see my email with this patch yesterday ?
> https://patchwork.ozlabs.org/patch/855911/
>
> btw your patch does not fix the variant 1 exploit.
>
> Also all of the pre-embargo patches from Elena that add lfence
> in the bpf interpreter and x64 JIT also do not fix it.
>
> The exploit works via bpf_tail_call and not via map_lookup.
Could you please clarify this part? The actual jump
to the out-of-bounds index is indeed made by bpf_tail_call,
but the "speculation" bypassing step happens when it does map_lookup_elem
on the out-of-bounds index.
Best Regards,
Elena.
On Fri, 2018-01-05 at 14:57 +0000, Mark Rutland wrote:
> Under speculation, CPUs may mis-predict branches in bounds checks. Thus,
> memory accesses under a bounds check may be speculated even if the
> bounds check fails, providing a primitive for building a side channel.
>
> This patch adds helpers which can be used to inhibit the use of
> out-of-bounds pointers under speculation.
>
> A generic implementation is provided for compatibility, but does not
> guarantee safety under speculation. Architectures are expected to
> override these helpers as necessary.
>
> Signed-off-by: Mark Rutland <[email protected]>
> Signed-off-by: Will Deacon <[email protected]>
> Cc: Dan Williams <[email protected]>
> Cc: Peter Zijlstra <[email protected]>
> ---
> include/asm-generic/barrier.h | 68 +++++++++++++++++++++++++++++++++++++++++++
> 1 file changed, 68 insertions(+)
>
> Dan, I've reworked this so that nospec_ptr() can take an arch-specific barrier
> sequence. I believe that for x86 you just need to implement __nospec_barrier()
> as osb().
>
> Mark.
>
> diff --git a/include/asm-generic/barrier.h b/include/asm-generic/barrier.h
> index fe297b599b0a..91c3071f49e5 100644
> --- a/include/asm-generic/barrier.h
> +++ b/include/asm-generic/barrier.h
> @@ -54,6 +54,74 @@
> #define read_barrier_depends() do { } while (0)
> #endif
>
> +/*
> + * Inhibit subsequent speculative memory accesses.
> + *
> + * Architectures with a suitable memory barrier should provide an
> + * implementation. This is non-portable, and generic code should use
> + * nospec_ptr().
> + */
> +#ifndef __nospec_barrier
> +#define __nospec_barrier() do { } while (0)
> +#endif
> +
> +/**
> + * nospec_ptr() - Ensure a pointer is bounded, even under speculation.
> + *
> + * @ptr: the pointer to test
> + * @lo: the lower valid bound for @ptr, inclusive
> + * @hi: the upper valid bound for @ptr, exclusive
> + *
> + * If @ptr falls in the interval [@lo, @hi), returns @ptr, otherwise returns
> + * NULL.
> + *
> + * Architectures which do not provide __nospec_barrier() should override this
> + * to ensure that ptr falls in the [lo, hi) interval both under architectural
> + * execution and under speculation, preventing propagation of an out-of-bounds
> + * pointer to code which is speculatively executed.
> + */
> +#ifndef nospec_ptr
> +#define nospec_ptr(ptr, lo, hi) \
> +({ \
> + typeof (ptr) __ret; \
> + typeof (ptr) __ptr = (ptr); \
> + typeof (ptr) __lo = (lo); \
> + typeof (ptr) __hi = (hi); \
> + \
> + __ret = (__lo <= __ptr && __ptr < __hi) ? __ptr : NULL; \
> + \
> + __nospec_barrier(); \
> + \
> + __ret; \
> +})
> +#endif
> +
> +/**
> + * nospec_array_ptr - Generate a pointer to an array element, ensuring the
> + * pointer is bounded under speculation.
> + *
> + * @arr: the base of the array
> + * @idx: the index of the element
> + * @sz: the number of elements in the array
> + *
> + * If @idx falls in the interval [0, @sz), returns the pointer to @arr[@idx],
> + * otherwise returns NULL.
> + *
> + * This is a wrapper around nospec_ptr(), provided for convenience.
> + * Architectures should implement nospec_ptr() to ensure the returned pointer
> + * remains bounded under speculation.
> + */
> +#define nospec_array_ptr(arr, idx, sz) \
> +({ \
> + typeof(*(arr)) *__arr = (arr); \
> + typeof(idx) __idx = (idx); \
> + typeof(sz) __sz = (sz); \
> + \
> + nospec_ptr(__arr + __idx, __arr, __arr + __sz); \
> +})
> +
> +#undef __nospec_barrier
I see:
kernel/bpf/arraymap.c: In function 'array_map_lookup_elem':
kernel/bpf/arraymap.c:124:2: error: implicit declaration of function '__nospec_barrier' [-Werror=implicit-function-declaration]
return nospec_ptr(ptr, array->value, high);
^
cc1: all warnings being treated as errors
when building with a 4.8-ish gcc. Removing the above #undef avoids that
error. I don't get the build error on Fedora with gcc 7.
> +
> #ifndef __smp_mb
> #define __smp_mb() mb()
> #endif