This patch adds the Documentation/x86/intel_mpx.txt file with some
information about Intel MPX.
Signed-off-by: Qiaowei Ren <[email protected]>
---
Documentation/x86/intel_mpx.txt | 76 +++++++++++++++++++++++++++++++++++++++
1 files changed, 76 insertions(+), 0 deletions(-)
create mode 100644 Documentation/x86/intel_mpx.txt
diff --git a/Documentation/x86/intel_mpx.txt b/Documentation/x86/intel_mpx.txt
new file mode 100644
index 0000000..778d06e
--- /dev/null
+++ b/Documentation/x86/intel_mpx.txt
@@ -0,0 +1,76 @@
+Intel(R) MPX Overview:
+======================
+
+Intel(R) Memory Protection Extensions (Intel(R) MPX) is a new
+capability introduced into the Intel architecture. Intel MPX can
+increase the robustness of software when used in conjunction
+with compiler changes to check that memory references intended
+at compile time do not become unsafe at runtime.
+
+Two of the most important goals of Intel MPX are to provide
+this capability at very low performance overhead for newly
+compiled code, and to provide compatibility mechanisms with
+legacy software components. A direct benefit Intel MPX provides
+is hardening software against malicious attacks designed to
+cause or exploit buffer overruns.
+
+For details about the Intel MPX instructions, see "Intel(R)
+Architecture Instruction Set Extensions Programming Reference".
+
+Intel(R) MPX Programming Model
+------------------------------
+
+Intel MPX introduces new registers and new instructions that
+operate on these registers. Some of the registers added are
+bounds registers which store a pointer's lower bound and upper
+bound limits. Whenever the pointer is used, the requested
+reference is checked against the pointer's associated bounds,
+thereby preventing out-of-bound memory access (such as buffer
+overflows and overruns). Out-of-bounds memory references
+initiate a #BR exception which can then be handled in an
+appropriate manner.
+
+Loading and Storing Bounds using Translation
+--------------------------------------------
+
+Intel MPX defines two instructions, BNDLDX and BNDSTX, for
+loading and storing the linear address of a pointer to a buffer
+together with the bounds of that buffer, using a paging-like
+structure of extended bounds. Specifically, when storing
+extended bounds, the processor translates the address at which
+the pointer is stored into an address in a Bound Table (BT) to
+determine the store location of the extended bounds. Loading
+extended bounds performs the reverse sequence.
+
+The in-memory structure used to load/store an extended bound is
+a 4-tuple consisting of the lower bound, the upper bound, the
+pointer value and a reserved field. Bound loads and stores use a
+32-bit or 64-bit operand size according to the operating mode;
+thus, a bound table entry is 4*32 bits in 32-bit mode and 4*64
+bits in 64-bit mode.
+
+The linear address of a bound table is stored in a Bound
+Directory (BD) entry, and the linear address of the bound
+directory itself is derived from either the BNDCFGU or BNDCFGS
+register. Bounds in memory are stored in Bound Tables (BT) as
+extended bounds, which are accessed via the Bound Directory (BD)
+and the address translation performed by the BNDLDX/BNDSTX
+instructions.
+
+The bound directory (BD) and bound tables (BT) are stored in
+application memory and are allocated by the application (for
+kernel use, the structures would reside in kernel memory). The
+bound directory and each instance of a bound table occupy
+contiguous linear memory.
+
+XSAVE/XRSTOR Support of Intel MPX State
+---------------------------------------
+
+Enabling Intel MPX requires an OS to manage two bits in XCR0:
+ - BNDREGS for saving and restoring registers BND0-BND3,
+ - BNDCSR for saving and restoring the user-mode configuration
+(BNDCFGU) and the status register (BNDSTATUS).
+
+The reason for having two separate bits is that BND0-BND3 are
+likely to be volatile state, while BNDCFGU and BNDSTATUS are not.
+Therefore, an OS has the flexibility to handle these two pieces
+of state differently when saving or restoring them.
--
1.7.1
An access to an invalid bound directory entry will cause a #BR
exception. This patch hooks the #BR exception handler to allocate
a bound table and bind it to that bound directory entry.
This avoids the need to forward the #BR exception
to user space when the bound directory has an invalid entry.
Signed-off-by: Qiaowei Ren <[email protected]>
---
arch/x86/include/asm/mpx.h | 35 +++++++++++++++++++++++++++++++++
arch/x86/kernel/Makefile | 1 +
arch/x86/kernel/mpx.c | 44 ++++++++++++++++++++++++++++++++++++++++++
arch/x86/kernel/traps.c | 46 +++++++++++++++++++++++++++++++++++++++++++-
4 files changed, 125 insertions(+), 1 deletions(-)
create mode 100644 arch/x86/include/asm/mpx.h
create mode 100644 arch/x86/kernel/mpx.c
diff --git a/arch/x86/include/asm/mpx.h b/arch/x86/include/asm/mpx.h
new file mode 100644
index 0000000..d074153
--- /dev/null
+++ b/arch/x86/include/asm/mpx.h
@@ -0,0 +1,35 @@
+#ifndef _ASM_X86_MPX_H
+#define _ASM_X86_MPX_H
+
+#include <linux/types.h>
+#include <asm/ptrace.h>
+
+#ifdef CONFIG_X86_64
+
+#define MPX_L1_BITS 28
+#define MPX_L1_SHIFT 3
+#define MPX_L2_BITS 17
+#define MPX_L2_SHIFT 5
+#define MPX_IGN_BITS 3
+#define MPX_L2_NODE_ADDR_MASK 0xfffffffffffffff8UL
+
+#define MPX_BNDSTA_ADDR_MASK 0xfffffffffffffffcUL
+#define MPX_BNDCFG_ADDR_MASK 0xfffffffffffff000UL
+
+#else
+
+#define MPX_L1_BITS 20
+#define MPX_L1_SHIFT 2
+#define MPX_L2_BITS 10
+#define MPX_L2_SHIFT 4
+#define MPX_IGN_BITS 2
+#define MPX_L2_NODE_ADDR_MASK 0xfffffffcUL
+
+#define MPX_BNDSTA_ADDR_MASK 0xfffffffcUL
+#define MPX_BNDCFG_ADDR_MASK 0xfffff000UL
+
+#endif
+
+void do_mpx_bt_fault(struct xsave_struct *xsave_buf);
+
+#endif /* _ASM_X86_MPX_H */
diff --git a/arch/x86/kernel/Makefile b/arch/x86/kernel/Makefile
index a5408b9..bba7a71 100644
--- a/arch/x86/kernel/Makefile
+++ b/arch/x86/kernel/Makefile
@@ -38,6 +38,7 @@ obj-y += resource.o
obj-y += process.o
obj-y += i387.o xsave.o
+obj-y += mpx.o
obj-y += ptrace.o
obj-$(CONFIG_X86_32) += tls.o
obj-$(CONFIG_IA32_EMULATION) += tls.o
diff --git a/arch/x86/kernel/mpx.c b/arch/x86/kernel/mpx.c
new file mode 100644
index 0000000..767b3bf
--- /dev/null
+++ b/arch/x86/kernel/mpx.c
@@ -0,0 +1,44 @@
+#include <linux/kernel.h>
+#include <linux/syscalls.h>
+#include <asm/processor.h>
+#include <asm/mpx.h>
+#include <asm/mman.h>
+#include <asm/i387.h>
+#include <asm/fpu-internal.h>
+#include <asm/alternative.h>
+
+static bool allocate_bt(unsigned long bd_entry)
+{
+ unsigned long bt_size = 1UL << (MPX_L2_BITS+MPX_L2_SHIFT);
+ unsigned long bt_addr, old_val;
+
+ bt_addr = sys_mmap_pgoff(0, bt_size, PROT_READ | PROT_WRITE,
+ MAP_ANONYMOUS | MAP_PRIVATE | MAP_POPULATE, -1, 0);
+ if (bt_addr == -1) {
+ pr_err("L2 Node Allocation Failed at L1 addr %lx\n",
+ bd_entry);
+ return false;
+ }
+ bt_addr = (bt_addr & MPX_L2_NODE_ADDR_MASK) | 0x01;
+
+ user_atomic_cmpxchg_inatomic(&old_val,
+ (long __user *)bd_entry, 0, bt_addr);
+ if (old_val)
+ vm_munmap(bt_addr & MPX_L2_NODE_ADDR_MASK, bt_size);
+
+ return true;
+}
+
+void do_mpx_bt_fault(struct xsave_struct *xsave_buf)
+{
+ unsigned long status;
+ unsigned long bd_entry, bd_base;
+ unsigned long bd_size = 1UL << (MPX_L1_BITS+MPX_L1_SHIFT);
+
+ bd_base = xsave_buf->bndcsr.cfg_reg_u & MPX_BNDCFG_ADDR_MASK;
+ status = xsave_buf->bndcsr.status_reg;
+
+ bd_entry = status & MPX_BNDSTA_ADDR_MASK;
+ if ((bd_entry >= bd_base) && (bd_entry < bd_base + bd_size))
+ allocate_bt(bd_entry);
+}
diff --git a/arch/x86/kernel/traps.c b/arch/x86/kernel/traps.c
index 8c8093b..eb04039 100644
--- a/arch/x86/kernel/traps.c
+++ b/arch/x86/kernel/traps.c
@@ -59,6 +59,7 @@
#include <asm/fixmap.h>
#include <asm/mach_traps.h>
#include <asm/alternative.h>
+#include <asm/mpx.h>
#ifdef CONFIG_X86_64
#include <asm/x86_init.h>
@@ -214,7 +215,6 @@ dotraplinkage void do_##name(struct pt_regs *regs, long error_code) \
DO_ERROR_INFO(X86_TRAP_DE, SIGFPE, "divide error", divide_error, FPE_INTDIV,
regs->ip)
DO_ERROR(X86_TRAP_OF, SIGSEGV, "overflow", overflow)
-DO_ERROR(X86_TRAP_BR, SIGSEGV, "bounds", bounds)
DO_ERROR_INFO(X86_TRAP_UD, SIGILL, "invalid opcode", invalid_op, ILL_ILLOPN,
regs->ip)
DO_ERROR(X86_TRAP_OLD_MF, SIGFPE, "coprocessor segment overrun",
@@ -267,6 +267,50 @@ dotraplinkage void do_double_fault(struct pt_regs *regs, long error_code)
}
#endif
+dotraplinkage void do_bounds(struct pt_regs *regs, long error_code)
+{
+ enum ctx_state prev_state;
+ unsigned long status;
+ struct xsave_struct *xsave_buf;
+ struct task_struct *tsk = current;
+
+ prev_state = exception_enter();
+ if (notify_die(DIE_TRAP, "bounds", regs, error_code,
+ X86_TRAP_BR, SIGSEGV) == NOTIFY_STOP)
+ goto exit;
+ conditional_sti(regs);
+
+ if (!boot_cpu_has(X86_FEATURE_MPX)) {
+ do_trap(X86_TRAP_BR, SIGSEGV, "bounds", regs, error_code, NULL);
+ goto exit;
+ }
+
+ fpu_xsave(&tsk->thread.fpu);
+ xsave_buf = &(tsk->thread.fpu.state->xsave);
+ status = xsave_buf->bndcsr.status_reg;
+
+ switch (status & 0x3) {
+ case 2:
+ /*
+ * Bound directory has invalid entry.
+ * No signal will be sent to the user space.
+ */
+ do_mpx_bt_fault(xsave_buf);
+ break;
+
+ case 1: /* Bound violation. */
+ case 0: /* No MPX exception. */
+ do_trap(X86_TRAP_BR, SIGSEGV, "bounds", regs, error_code, NULL);
+ break;
+
+ default:
+ break;
+ }
+
+exit:
+ exception_exit(prev_state);
+}
+
dotraplinkage void __kprobes
do_general_protection(struct pt_regs *regs, long error_code)
{
--
1.7.1
This patch adds new fields describing a bound violation to the
siginfo structure: si_lower and si_upper hold, respectively, the
lower and upper bounds in effect when the violation occurred.
These fields will be set in the #BR exception handler by decoding
the user instruction and constructing the faulting pointer.
A userspace application can then obtain the violation address and
the lower and upper bounds from this new siginfo structure.
Signed-off-by: Qiaowei Ren <[email protected]>
---
arch/x86/include/asm/mpx.h | 39 +++++
arch/x86/kernel/mpx.c | 289 ++++++++++++++++++++++++++++++++++++
arch/x86/kernel/traps.c | 6 +
include/uapi/asm-generic/siginfo.h | 9 +-
kernel/signal.c | 4 +
5 files changed, 346 insertions(+), 1 deletions(-)
diff --git a/arch/x86/include/asm/mpx.h b/arch/x86/include/asm/mpx.h
index 9652e9e..8c1c914 100644
--- a/arch/x86/include/asm/mpx.h
+++ b/arch/x86/include/asm/mpx.h
@@ -30,6 +30,43 @@
#endif
+struct mpx_insn_field {
+ union {
+ signed int value;
+ unsigned char bytes[4];
+ };
+ unsigned char nbytes;
+};
+
+struct mpx_insn {
+ struct mpx_insn_field rex_prefix; /* REX prefix */
+ struct mpx_insn_field modrm;
+ struct mpx_insn_field sib;
+ struct mpx_insn_field displacement;
+
+ unsigned char addr_bytes; /* effective address size */
+ unsigned char limit;
+ unsigned char x86_64;
+
+ const unsigned char *kaddr; /* kernel address of insn to analyze */
+ const unsigned char *next_byte;
+};
+
+#define MAX_MPX_INSN_SIZE 15
+
+#define X86_MODRM_MOD(modrm) (((modrm) & 0xc0) >> 6)
+#define X86_MODRM_REG(modrm) (((modrm) & 0x38) >> 3)
+#define X86_MODRM_RM(modrm) ((modrm) & 0x07)
+
+#define X86_SIB_SCALE(sib) (((sib) & 0xc0) >> 6)
+#define X86_SIB_INDEX(sib) (((sib) & 0x38) >> 3)
+#define X86_SIB_BASE(sib) ((sib) & 0x07)
+
+#define X86_REX_W(rex) ((rex) & 8)
+#define X86_REX_R(rex) ((rex) & 4)
+#define X86_REX_X(rex) ((rex) & 2)
+#define X86_REX_B(rex) ((rex) & 1)
+
typedef union {
struct {
unsigned long ignored:MPX_IGN_BITS;
@@ -40,5 +77,7 @@ typedef union {
} mpx_addr;
void do_mpx_bt_fault(struct xsave_struct *xsave_buf);
+void do_mpx_bounds(struct pt_regs *regs, siginfo_t *info,
+ struct xsave_struct *xsave_buf);
#endif /* _ASM_X86_MPX_H */
diff --git a/arch/x86/kernel/mpx.c b/arch/x86/kernel/mpx.c
index ffe5aee..3770991 100644
--- a/arch/x86/kernel/mpx.c
+++ b/arch/x86/kernel/mpx.c
@@ -91,6 +91,269 @@ int mpx_release(struct task_struct *tsk)
return 0;
}
+typedef enum {REG_TYPE_RM, REG_TYPE_INDEX, REG_TYPE_BASE} reg_type_t;
+static unsigned long get_reg(struct mpx_insn *insn, struct pt_regs *regs,
+ reg_type_t type)
+{
+ int regno = 0;
+ unsigned char modrm = (unsigned char)insn->modrm.value;
+ unsigned char sib = (unsigned char)insn->sib.value;
+
+ static const int regoff[] = {
+ offsetof(struct pt_regs, ax),
+ offsetof(struct pt_regs, cx),
+ offsetof(struct pt_regs, dx),
+ offsetof(struct pt_regs, bx),
+ offsetof(struct pt_regs, sp),
+ offsetof(struct pt_regs, bp),
+ offsetof(struct pt_regs, si),
+ offsetof(struct pt_regs, di),
+#ifdef CONFIG_X86_64
+ offsetof(struct pt_regs, r8),
+ offsetof(struct pt_regs, r9),
+ offsetof(struct pt_regs, r10),
+ offsetof(struct pt_regs, r11),
+ offsetof(struct pt_regs, r12),
+ offsetof(struct pt_regs, r13),
+ offsetof(struct pt_regs, r14),
+ offsetof(struct pt_regs, r15),
+#endif
+ };
+
+ switch (type) {
+ case REG_TYPE_RM:
+ regno = X86_MODRM_RM(modrm);
+ if (X86_REX_B(insn->rex_prefix.value) == 1)
+ regno += 8;
+ break;
+
+ case REG_TYPE_INDEX:
+ regno = X86_SIB_INDEX(sib);
+ if (X86_REX_X(insn->rex_prefix.value) == 1)
+ regno += 8;
+ break;
+
+ case REG_TYPE_BASE:
+ regno = X86_SIB_BASE(sib);
+ if (X86_REX_B(insn->rex_prefix.value) == 1)
+ regno += 8;
+ break;
+
+ default:
+ break;
+ }
+
+ return regs_get_register(regs, regoff[regno]);
+}
+
+/*
+ * Return the address being referenced by the instruction:
+ * for mod == 3, return the content of the rm register;
+ * for mod != 3, calculate the address using the SIB byte and
+ * displacement.
+ */
+static unsigned long get_addr_ref(struct mpx_insn *insn, struct pt_regs *regs)
+{
+ unsigned long addr;
+ unsigned long base;
+ unsigned long indx;
+ unsigned char modrm = (unsigned char)insn->modrm.value;
+ unsigned char sib = (unsigned char)insn->sib.value;
+
+ if (X86_MODRM_MOD(modrm) == 3) {
+ addr = get_reg(insn, regs, REG_TYPE_RM);
+ } else {
+ if (insn->sib.nbytes) {
+ base = get_reg(insn, regs, REG_TYPE_BASE);
+ indx = get_reg(insn, regs, REG_TYPE_INDEX);
+ addr = base + indx * (1 << X86_SIB_SCALE(sib));
+ } else {
+ addr = get_reg(insn, regs, REG_TYPE_RM);
+ }
+ addr += insn->displacement.value;
+ }
+
+ return addr;
+}
+
+/* Verify that the next sizeof(t) + n bytes lie within the instruction buffer */
+#define validate_next(t, insn, n) \
+ ((insn)->next_byte + sizeof(t) + n - (insn)->kaddr <= (insn)->limit)
+
+#define __get_next(t, insn) \
+({ \
+ t r = *(t*)insn->next_byte; \
+ insn->next_byte += sizeof(t); \
+ r; \
+})
+
+#define __peek_next(t, insn) \
+({ \
+ t r = *(t*)insn->next_byte; \
+ r; \
+})
+
+#define get_next(t, insn) \
+({ \
+ if (unlikely(!validate_next(t, insn, 0))) \
+ goto err_out; \
+ __get_next(t, insn); \
+})
+
+#define peek_next(t, insn) \
+({ \
+ if (unlikely(!validate_next(t, insn, 0))) \
+ goto err_out; \
+ __peek_next(t, insn); \
+})
+
+static void mpx_insn_get_prefixes(struct mpx_insn *insn)
+{
+ unsigned char b;
+
+ /* Decode legacy prefix and REX prefix */
+ b = peek_next(unsigned char, insn);
+ while (b != 0x0f) {
+ /*
+ * Look for a REX prefix; a REX prefix cannot be
+ * followed by a legacy prefix.
+ */
+ if (insn->x86_64 && (b & 0xf0) == 0x40) {
+ insn->rex_prefix.value = b;
+ insn->rex_prefix.nbytes = 1;
+ insn->next_byte++;
+ break;
+ }
+
+ /* check the other legacy prefixes */
+ switch (b) {
+ case 0xf2:
+ case 0xf3:
+ case 0xf0:
+ case 0x64:
+ case 0x65:
+ case 0x2e:
+ case 0x3e:
+ case 0x26:
+ case 0x36:
+ case 0x66:
+ case 0x67:
+ insn->next_byte++;
+ break;
+ default: /* everything else is garbage */
+ goto err_out;
+ }
+ b = peek_next(unsigned char, insn);
+ }
+
+err_out:
+ return;
+}
+
+static void mpx_insn_get_modrm(struct mpx_insn *insn)
+{
+ insn->modrm.value = get_next(unsigned char, insn);
+ insn->modrm.nbytes = 1;
+
+err_out:
+ return;
+}
+
+static void mpx_insn_get_sib(struct mpx_insn *insn)
+{
+ unsigned char modrm = (unsigned char)insn->modrm.value;
+
+ if (X86_MODRM_MOD(modrm) != 3 && X86_MODRM_RM(modrm) == 4) {
+ insn->sib.value = get_next(unsigned char, insn);
+ insn->sib.nbytes = 1;
+ }
+
+err_out:
+ return;
+}
+
+static void mpx_insn_get_displacement(struct mpx_insn *insn)
+{
+ unsigned char mod, rm, base;
+
+ /*
+ * Interpreting the modrm byte:
+ * mod = 00 - no displacement fields (exceptions below)
+ * mod = 01 - 1-byte displacement field
+ * mod = 10 - displacement field is 4 bytes
+ * mod = 11 - no memory operand
+ *
+ * mod != 11, r/m = 100 - SIB byte exists
+ * mod = 00, SIB base = 101 - displacement field is 4 bytes
+ * mod = 00, r/m = 101 - rip-relative addressing, displacement
+ * field is 4 bytes
+ */
+ mod = X86_MODRM_MOD(insn->modrm.value);
+ rm = X86_MODRM_RM(insn->modrm.value);
+ base = X86_SIB_BASE(insn->sib.value);
+ if (mod == 3)
+ return;
+ if (mod == 1) {
+ insn->displacement.value = get_next(unsigned char, insn);
+ insn->displacement.nbytes = 1;
+ } else if ((mod == 0 && rm == 5) || mod == 2 ||
+ (mod == 0 && base == 5)) {
+ insn->displacement.value = get_next(int, insn);
+ insn->displacement.nbytes = 4;
+ }
+
+err_out:
+ return;
+}
+
+static void mpx_insn_init(struct mpx_insn *insn, struct pt_regs *regs)
+{
+ unsigned char buf[MAX_MPX_INSN_SIZE];
+ int bytes;
+
+ memset(insn, 0, sizeof(*insn));
+
+ bytes = copy_from_user(buf, (void __user *)regs->ip, MAX_MPX_INSN_SIZE);
+ insn->limit = MAX_MPX_INSN_SIZE - bytes;
+ insn->kaddr = buf;
+ insn->next_byte = buf;
+
+ /*
+ * In 64-bit Mode, all Intel MPX instructions use 64-bit
+ * operands for bounds and 64 bit addressing, i.e. REX.W &
+ * 67H have no effect on data or address size.
+ *
+ * In compatibility and legacy modes (including 16-bit code
+ * segments, real and virtual 8086 modes) all Intel MPX
+ * instructions use 32-bit operands for bounds and 32 bit
+ * addressing.
+ */
+#ifdef CONFIG_X86_64
+ insn->x86_64 = 1;
+ insn->addr_bytes = 8;
+#else
+ insn->x86_64 = 0;
+ insn->addr_bytes = 4;
+#endif
+}
+
+static unsigned long mpx_insn_decode(struct mpx_insn *insn, struct pt_regs *regs)
+{
+ mpx_insn_init(insn, regs);
+
+ /*
+ * Here we only need to decode bndcl/bndcn/bndcu, so we can
+ * use private disassembly interfaces to get the prefixes,
+ * modrm, sib, displacement, etc.
+ */
+ mpx_insn_get_prefixes(insn);
+ insn->next_byte += 2; /* ignore opcode */
+ mpx_insn_get_modrm(insn);
+ mpx_insn_get_sib(insn);
+ mpx_insn_get_displacement(insn);
+
+ return get_addr_ref(insn, regs);
+}
+
static bool allocate_bt(unsigned long bd_entry)
{
unsigned long bt_size = 1UL << (MPX_L2_BITS+MPX_L2_SHIFT);
@@ -126,3 +389,29 @@ void do_mpx_bt_fault(struct xsave_struct *xsave_buf)
if ((bd_entry >= bd_base) && (bd_entry < bd_base + bd_size))
allocate_bt(bd_entry);
}
+
+void do_mpx_bounds(struct pt_regs *regs, siginfo_t *info,
+ struct xsave_struct *xsave_buf)
+{
+ struct mpx_insn insn;
+ uint8_t bndregno;
+ unsigned long addr_vio;
+
+ addr_vio = mpx_insn_decode(&insn, regs);
+
+ bndregno = X86_MODRM_REG(insn.modrm.value);
+ if (bndregno > 3)
+ return;
+
+ info->si_lower =
+ (void __user *)(xsave_buf->bndregs.bndregs[2*bndregno]);
+ info->si_upper =
+ (void __user *)(~xsave_buf->bndregs.bndregs[2*bndregno+1]);
+ info->si_addr_lsb = 0;
+ info->si_signo = SIGSEGV;
+ info->si_errno = 0;
+ info->si_code = SEGV_BNDERR;
+ info->si_addr = (void __user *)addr_vio;
+ pr_debug("bound violation addr 0x%lx, lower 0x%lx, upper 0x%lx\n",
+ addr_vio, (unsigned long)info->si_lower,
+ (unsigned long)info->si_upper);
+}
diff --git a/arch/x86/kernel/traps.c b/arch/x86/kernel/traps.c
index eb04039..2069bf4 100644
--- a/arch/x86/kernel/traps.c
+++ b/arch/x86/kernel/traps.c
@@ -273,6 +273,7 @@ dotraplinkage void do_bounds(struct pt_regs *regs, long error_code)
unsigned long status;
struct xsave_struct *xsave_buf;
struct task_struct *tsk = current;
+ siginfo_t info;
prev_state = exception_enter();
if (notify_die(DIE_TRAP, "bounds", regs, error_code,
@@ -299,6 +300,11 @@ dotraplinkage void do_bounds(struct pt_regs *regs, long error_code)
break;
case 1: /* Bound violation. */
+ do_mpx_bounds(regs, &info, xsave_buf);
+ do_trap(X86_TRAP_BR, SIGSEGV, "bounds", regs,
+ error_code, &info);
+ break;
+
case 0: /* No MPX exception. */
do_trap(X86_TRAP_BR, SIGSEGV, "bounds", regs, error_code, NULL);
break;
diff --git a/include/uapi/asm-generic/siginfo.h b/include/uapi/asm-generic/siginfo.h
index ba5be7f..1e35520 100644
--- a/include/uapi/asm-generic/siginfo.h
+++ b/include/uapi/asm-generic/siginfo.h
@@ -91,6 +91,10 @@ typedef struct siginfo {
int _trapno; /* TRAP # which caused the signal */
#endif
short _addr_lsb; /* LSB of the reported address */
+ struct {
+ void __user *_lower;
+ void __user *_upper;
+ } _addr_bnd;
} _sigfault;
/* SIGPOLL */
@@ -131,6 +135,8 @@ typedef struct siginfo {
#define si_trapno _sifields._sigfault._trapno
#endif
#define si_addr_lsb _sifields._sigfault._addr_lsb
+#define si_lower _sifields._sigfault._addr_bnd._lower
+#define si_upper _sifields._sigfault._addr_bnd._upper
#define si_band _sifields._sigpoll._band
#define si_fd _sifields._sigpoll._fd
#ifdef __ARCH_SIGSYS
@@ -199,7 +205,8 @@ typedef struct siginfo {
*/
#define SEGV_MAPERR (__SI_FAULT|1) /* address not mapped to object */
#define SEGV_ACCERR (__SI_FAULT|2) /* invalid permissions for mapped object */
-#define NSIGSEGV 2
+#define SEGV_BNDERR (__SI_FAULT|3) /* failed address bound checks */
+#define NSIGSEGV 3
/*
* SIGBUS si_codes
diff --git a/kernel/signal.c b/kernel/signal.c
index ded28b9..3cd6eaf 100644
--- a/kernel/signal.c
+++ b/kernel/signal.c
@@ -2771,6 +2771,10 @@ int copy_siginfo_to_user(siginfo_t __user *to, siginfo_t *from)
if (from->si_code == BUS_MCEERR_AR || from->si_code == BUS_MCEERR_AO)
err |= __put_user(from->si_addr_lsb, &to->si_addr_lsb);
#endif
+#ifdef SEGV_BNDERR
+ err |= __put_user(from->si_lower, &to->si_lower);
+ err |= __put_user(from->si_upper, &to->si_upper);
+#endif
break;
case __SI_CHLD:
err |= __put_user(from->si_pid, &to->si_pid);
--
1.7.1
This patch adds all the MPX instructions to the x86 opcode map so
that the x86 instruction decoder can decode MPX instructions used
in the kernel.
Signed-off-by: Qiaowei Ren <[email protected]>
---
arch/x86/lib/x86-opcode-map.txt | 4 ++--
1 files changed, 2 insertions(+), 2 deletions(-)
diff --git a/arch/x86/lib/x86-opcode-map.txt b/arch/x86/lib/x86-opcode-map.txt
index 533a85e..1a2be7c 100644
--- a/arch/x86/lib/x86-opcode-map.txt
+++ b/arch/x86/lib/x86-opcode-map.txt
@@ -346,8 +346,8 @@ AVXcode: 1
17: vmovhps Mq,Vq (v1) | vmovhpd Mq,Vq (66),(v1)
18: Grp16 (1A)
19:
-1a:
-1b:
+1a: BNDCL Ev,Gv | BNDCU Ev,Gv | BNDMOV Gv,Ev | BNDLDX Gv,Ev,Gv
+1b: BNDCN Ev,Gv | BNDMOV Ev,Gv | BNDMK Gv,Ev | BNDSTX Ev,Gv,Gv
1c:
1d:
1e:
--
1.7.1
This patch adds the PR_MPX_INIT and PR_MPX_RELEASE prctl()
commands on the x86 platform. These commands can be used to
initialize and release MPX-related resources.
An MMU notifier is registered during PR_MPX_INIT
command execution, so that bound tables can be deallocated
automatically when a memory area is unmapped.
Signed-off-by: Qiaowei Ren <[email protected]>
---
arch/x86/Kconfig | 4 ++
arch/x86/include/asm/mpx.h | 9 ++++
arch/x86/include/asm/processor.h | 16 +++++++
arch/x86/kernel/mpx.c | 84 ++++++++++++++++++++++++++++++++++++++
include/uapi/linux/prctl.h | 6 +++
kernel/sys.c | 12 +++++
6 files changed, 131 insertions(+), 0 deletions(-)
diff --git a/arch/x86/Kconfig b/arch/x86/Kconfig
index ee2fb9d..695101a 100644
--- a/arch/x86/Kconfig
+++ b/arch/x86/Kconfig
@@ -233,6 +233,10 @@ config HAVE_INTEL_TXT
def_bool y
depends on INTEL_IOMMU && ACPI
+config HAVE_INTEL_MPX
+ def_bool y
+ select MMU_NOTIFIER
+
config X86_32_SMP
def_bool y
depends on X86_32 && SMP
diff --git a/arch/x86/include/asm/mpx.h b/arch/x86/include/asm/mpx.h
index d074153..9652e9e 100644
--- a/arch/x86/include/asm/mpx.h
+++ b/arch/x86/include/asm/mpx.h
@@ -30,6 +30,15 @@
#endif
+typedef union {
+ struct {
+ unsigned long ignored:MPX_IGN_BITS;
+ unsigned long l2index:MPX_L2_BITS;
+ unsigned long l1index:MPX_L1_BITS;
+ };
+ unsigned long addr;
+} mpx_addr;
+
void do_mpx_bt_fault(struct xsave_struct *xsave_buf);
#endif /* _ASM_X86_MPX_H */
diff --git a/arch/x86/include/asm/processor.h b/arch/x86/include/asm/processor.h
index 43be6f6..ea4e72d 100644
--- a/arch/x86/include/asm/processor.h
+++ b/arch/x86/include/asm/processor.h
@@ -963,6 +963,22 @@ extern void start_thread(struct pt_regs *regs, unsigned long new_ip,
extern int get_tsc_mode(unsigned long adr);
extern int set_tsc_mode(unsigned int val);
+#ifdef CONFIG_HAVE_INTEL_MPX
+
+/* Init/release a process's MPX-related resources */
+#define MPX_INIT(tsk) mpx_init((tsk))
+#define MPX_RELEASE(tsk) mpx_release((tsk))
+
+extern int mpx_init(struct task_struct *tsk);
+extern int mpx_release(struct task_struct *tsk);
+
+#else /* CONFIG_HAVE_INTEL_MPX */
+
+#define MPX_INIT(tsk) (-EINVAL)
+#define MPX_RELEASE(tsk) (-EINVAL)
+
+#endif /* CONFIG_HAVE_INTEL_MPX */
+
extern u16 amd_get_nb_id(int cpu);
static inline uint32_t hypervisor_cpuid_base(const char *sig, uint32_t leaves)
diff --git a/arch/x86/kernel/mpx.c b/arch/x86/kernel/mpx.c
index 767b3bf..ffe5aee 100644
--- a/arch/x86/kernel/mpx.c
+++ b/arch/x86/kernel/mpx.c
@@ -1,5 +1,7 @@
#include <linux/kernel.h>
#include <linux/syscalls.h>
+#include <linux/prctl.h>
+#include <linux/mmu_notifier.h>
#include <asm/processor.h>
#include <asm/mpx.h>
#include <asm/mman.h>
@@ -7,6 +9,88 @@
#include <asm/fpu-internal.h>
#include <asm/alternative.h>
+static struct mmu_notifier mpx_mn;
+
+static void mpx_invl_range_end(struct mmu_notifier *mn,
+ struct mm_struct *mm,
+ unsigned long start, unsigned long end)
+{
+ struct xsave_struct *xsave_buf;
+ unsigned long bt_size = 1UL << (MPX_L2_BITS+MPX_L2_SHIFT);
+ unsigned long bt_addr;
+ unsigned long bd_base;
+ unsigned long bd_entry, bde_start, bde_end;
+ mpx_addr lap;
+
+ pgd_t *pgd;
+ pud_t *pud;
+ pmd_t *pmd;
+ pte_t *pte;
+
+ /* ignore swap notifications */
+ pgd = pgd_offset(mm, start);
+ pud = pud_offset(pgd, start);
+ pmd = pmd_offset(pud, start);
+ pte = pte_offset_kernel(pmd, start);
+ if (!pte_present(*pte) && !pte_none(*pte) && !pte_file(*pte))
+ return;
+
+ /* get bound directory base address */
+ fpu_xsave(¤t->thread.fpu);
+ xsave_buf = &(current->thread.fpu.state->xsave);
+ bd_base = xsave_buf->bndcsr.cfg_reg_u & MPX_BNDCFG_ADDR_MASK;
+
+ /* get related bde range */
+ lap.addr = start;
+ bde_start = bd_base + (lap.l1index << MPX_L1_SHIFT);
+
+ lap.addr = end;
+ if (lap.ignored || lap.l2index)
+ bde_end = bd_base + (lap.l1index<<MPX_L1_SHIFT) + sizeof(long);
+ else
+ bde_end = bd_base + (lap.l1index<<MPX_L1_SHIFT);
+
+ for (bd_entry = bde_start; bd_entry < bde_end;
+ bd_entry += sizeof(unsigned long)) {
+
+ if (get_user(bt_addr, (long __user *)bd_entry))
+ return;
+
+ if (bt_addr & 0x1) {
+ user_atomic_cmpxchg_inatomic(&bt_addr,
+ (long __user *)bd_entry, bt_addr, 0);
+ do_munmap(mm, bt_addr & MPX_L2_NODE_ADDR_MASK, bt_size);
+ }
+ }
+}
+
+static struct mmu_notifier_ops mpx_mmuops = {
+ .invalidate_range_end = mpx_invl_range_end,
+};
+
+int mpx_init(struct task_struct *tsk)
+{
+ if (!boot_cpu_has(X86_FEATURE_MPX))
+ return -EINVAL;
+
+ /* register an mmu_notifier to deallocate the bound tables */
+ mpx_mn.ops = &mpx_mmuops;
+ mmu_notifier_register(&mpx_mn, current->mm);
+
+ return 0;
+}
+
+int mpx_release(struct task_struct *tsk)
+{
+ if (!boot_cpu_has(X86_FEATURE_MPX))
+ return -EINVAL;
+
+ /* unregister mmu_notifier */
+ mmu_notifier_unregister(&mpx_mn, current->mm);
+
+ return 0;
+}
+
static bool allocate_bt(unsigned long bd_entry)
{
unsigned long bt_size = 1UL << (MPX_L2_BITS+MPX_L2_SHIFT);
diff --git a/include/uapi/linux/prctl.h b/include/uapi/linux/prctl.h
index 289760f..19ab881 100644
--- a/include/uapi/linux/prctl.h
+++ b/include/uapi/linux/prctl.h
@@ -149,4 +149,10 @@
#define PR_GET_TID_ADDRESS 40
+/*
+ * Init/release MPX-related resources.
+ */
+#define PR_MPX_INIT 41
+#define PR_MPX_RELEASE 42
+
#endif /* _LINUX_PRCTL_H */
diff --git a/kernel/sys.c b/kernel/sys.c
index c18ecca..bbaf573 100644
--- a/kernel/sys.c
+++ b/kernel/sys.c
@@ -92,6 +92,12 @@
#ifndef SET_TSC_CTL
# define SET_TSC_CTL(a) (-EINVAL)
#endif
+#ifndef MPX_INIT
+# define MPX_INIT(a) (-EINVAL)
+#endif
+#ifndef MPX_RELEASE
+# define MPX_RELEASE(a) (-EINVAL)
+#endif
/*
* this is where the system-wide overflow UID and GID are defined, for
@@ -1999,6 +2005,12 @@ SYSCALL_DEFINE5(prctl, int, option, unsigned long, arg2, unsigned long, arg3,
if (arg2 || arg3 || arg4 || arg5)
return -EINVAL;
return current->no_new_privs ? 1 : 0;
+ case PR_MPX_INIT:
+ error = MPX_INIT(me);
+ break;
+ case PR_MPX_RELEASE:
+ error = MPX_RELEASE(me);
+ break;
default:
error = -EINVAL;
break;
--
1.7.1
On Sun, Jan 12, 2014 at 05:20:00PM +0800, Qiaowei Ren wrote:
> An access to an invalid bound directory entry will cause a #BR
> exception. This patch hooks the #BR exception handler to allocate
> a bound table and bind it to that bound directory entry.
>
> This avoids the need to forward the #BR exception
> to user space when the bound directory has an invalid entry.
>
> Signed-off-by: Qiaowei Ren <[email protected]>
> ---
> arch/x86/include/asm/mpx.h | 35 +++++++++++++++++++++++++++++++++
> arch/x86/kernel/Makefile | 1 +
> arch/x86/kernel/mpx.c | 44 ++++++++++++++++++++++++++++++++++++++++++
> arch/x86/kernel/traps.c | 46 +++++++++++++++++++++++++++++++++++++++++++-
> 4 files changed, 125 insertions(+), 1 deletions(-)
> create mode 100644 arch/x86/include/asm/mpx.h
> create mode 100644 arch/x86/kernel/mpx.c
...
> diff --git a/arch/x86/kernel/mpx.c b/arch/x86/kernel/mpx.c
> new file mode 100644
> index 0000000..767b3bf
> --- /dev/null
> +++ b/arch/x86/kernel/mpx.c
> @@ -0,0 +1,44 @@
> +#include <linux/kernel.h>
> +#include <linux/syscalls.h>
> +#include <asm/processor.h>
> +#include <asm/mpx.h>
> +#include <asm/mman.h>
> +#include <asm/i387.h>
> +#include <asm/fpu-internal.h>
> +#include <asm/alternative.h>
> +
> +static bool allocate_bt(unsigned long bd_entry)
> +{
> + unsigned long bt_size = 1UL << (MPX_L2_BITS+MPX_L2_SHIFT);
> + unsigned long bt_addr, old_val;
> +
> + bt_addr = sys_mmap_pgoff(0, bt_size, PROT_READ | PROT_WRITE,
> + MAP_ANONYMOUS | MAP_PRIVATE | MAP_POPULATE, -1, 0);
Are we sure about this? We can do a possible memory allocation in
mmap_region() in this exception handler context. And yes, we do a
conditional_sti(), which makes it all the more susceptible.
Have you run this with lockdep enabled?
--
Regards/Gruss,
Boris.
Sent from a fat crate under my desk. Formatting is fine.
--
On Sun, Jan 12, 2014 at 05:20:03PM +0800, Qiaowei Ren wrote:
> This patch adds new fields describing a bound violation to the
> siginfo structure: si_lower and si_upper hold, respectively, the
> lower and upper bounds in effect when the violation occurred.
>
> These fields will be set in the #BR exception handler by decoding
> the user instruction and constructing the faulting pointer.
> A userspace application can then obtain the violation address and
> the lower and upper bounds from this new siginfo structure.
>
> Signed-off-by: Qiaowei Ren <[email protected]>
> ---
> arch/x86/include/asm/mpx.h | 39 +++++
> arch/x86/kernel/mpx.c | 289 ++++++++++++++++++++++++++++++++++++
This thing looks like a partial duplication of functionality which we
already have - inat.*/insn.*, etc.
It would be cleaner to integrate the mpx pieces into the existing x86
insn analysis code and use it instead of growing your own, IMHO.
--
Regards/Gruss,
Boris.
Sent from a fat crate under my desk. Formatting is fine.
--
>
> This thing looks like a partial duplication of functionality which we
> already have - inat.*/insn.*, etc.
>
> It would be cleaner to integrate the mpx pieces into the existing x86
> insn analysis code and use it instead of growing your own, IMHO.
>
I saw a previous version of the code that did that, and it really didn't
work out well -- it ended up being more complex and slower.
-hpa
On Sun, Jan 12, 2014 at 08:49:21AM -0800, H. Peter Anvin wrote:
> I saw a previous version of the code that did that, and it really
> didn't work out well -- it ended up being more complex and slower.
I suspected as much.
But, we still probably should use the generic struct insn, insn_field,
etc and act on them in mpx.c instead of defining our own mpx_insn,
mpx_insn_field, X86_MODRM_MOD, etc in the header which are more or less
copied from insn.h, right?
--
Regards/Gruss,
Boris.
Sent from a fat crate under my desk. Formatting is fine.
--
On 01/12/2014 09:03 AM, Borislav Petkov wrote:
> On Sun, Jan 12, 2014 at 08:49:21AM -0800, H. Peter Anvin wrote:
>> I saw a previous version of the code that did that, and it really
>> didn't work out well -- it ended up being more complex and slower.
>
> I suspected as much.
>
> But, we still probably should use the generic struct insn, insn_field,
> etc and act on them in mpx.c instead of defining our own mpx_insn,
> mpx_insn_field, X86_MODRM_MOD, etc in the header which are more or less
> copied from insn.h, right?
>
That may be.
-hpa
On 01/13/2014 01:03 AM, Borislav Petkov wrote:
> On Sun, Jan 12, 2014 at 08:49:21AM -0800, H. Peter Anvin wrote:
>> I saw a previous version of the code that did that, and it really
>> didn't work out well -- it ended up being more complex and slower.
>
> I suspected as much.
>
> But, we still probably should use the generic struct insn, insn_field,
> etc and act on them in mpx.c instead of defining our own mpx_insn,
> mpx_insn_field, X86_MODRM_MOD, etc in the header which are more or less
> copied from insn.h, right?
>
I tried to use the generic structure and macros, but many members of the
generic struct insn are not used for MPX, and besides that I would have
to add one member to the structure. So I defined an MPX-specific struct insn.
I guess only struct insn_field and several macros like X86_XXX could use
the generic versions. Anyway, I will try to use the generic versions in
the next version of this patchset.
Thanks,
Qiaowei
On 01/12/2014 05:20 PM, Borislav Petkov wrote:
> On Sun, Jan 12, 2014 at 05:20:00PM +0800, Qiaowei Ren wrote:
>> +static bool allocate_bt(unsigned long bd_entry)
>> +{
>> + unsigned long bt_size = 1UL << (MPX_L2_BITS+MPX_L2_SHIFT);
>> + unsigned long bt_addr, old_val;
>> +
>> + bt_addr = sys_mmap_pgoff(0, bt_size, PROT_READ | PROT_WRITE,
>> + MAP_ANONYMOUS | MAP_PRIVATE | MAP_POPULATE, -1, 0);
>
> Are we sure about this? We can do a possible memory allocation in
> mmap_region() in this exception handler context. And yes, we do a
> conditional_sti(), which makes it all the more susceptible.
>
> Have you run this with lockdep enabled?
>
Yes, I ran this with lockdep enabled.
Thanks,
Qiaowei
On 01/13/2014 11:09 AM, Ren Qiaowei wrote:
> On 01/13/2014 01:03 AM, Borislav Petkov wrote:
>> On Sun, Jan 12, 2014 at 08:49:21AM -0800, H. Peter Anvin wrote:
>>> I saw a previous version of the code that did that, and it really
>>> didn't work out well -- it ended up being more complex and slower.
>>
>> I suspected as much.
>>
>> But, we still probably should use the generic struct insn, insn_field,
>> etc and act on them in mpx.c instead of defining our own mpx_insn,
>> mpx_insn_field, X86_MODRM_MOD, etc in the header which are more or less
>> copied from insn.h, right?
>>
>
> I tried to use the generic structure and macros, but many members of the
> generic struct insn are not used for MPX, and besides that I would have
> to add one member to the structure. So I defined an MPX-specific struct insn.
>
> I guess only struct insn_field and several macros like X86_XXX could use
> the generic versions. Anyway, I will try to use the generic versions in
> the next version of this patchset.
>
Because only struct insn_field and several macros could be replaced with
the generic versions, I'm afraid that including the generic insn header
might just be confusing. What do you think?
Thanks,
Qiaowei
On Mon, Jan 13, 2014 at 11:17:15AM +0800, Ren Qiaowei wrote:
> Yes, I ran this with lockdep enabled.
Ok, but you still are doing memory allocation in the exception handler,
AFAICT:
do_bounds
|->do_mpx_bt_fault
|->allocate_bt
|->sys_mmap_pgoff
|->vm_mmap_pgoff
|->do_mmap_pgoff
|->mmap_region
|-> kmem_cache_zalloc(vm_area_cachep, GFP_KERNEL);
--
Regards/Gruss,
Boris.
Sent from a fat crate under my desk. Formatting is fine.
--
On Mon, Jan 13, 2014 at 04:22:27PM +0800, Ren Qiaowei wrote:
> >I tried to use the generic structure and macros, but many members of
> >the generic struct insn are not used for MPX,
I think that's ok - there are a lot of examples in the kernel where only
a subset of the struct members are used by a particular functionality.
> Because only struct insn_field and several macros could be replaced
> with the generic versions, I'm afraid that including the generic insn
> header might just be confusing. What do you think?
Yes, I think the idea is to use and, if needed, extend the generic
functionality instead of growing our own here and there for obvious
benefits.
Thanks.
--
Regards/Gruss,
Boris.
Sent from a fat crate under my desk. Formatting is fine.
--
> -----Original Message-----
> From: Borislav Petkov [mailto:[email protected]]
> Sent: Monday, January 13, 2014 6:38 PM
> To: Ren, Qiaowei
> Cc: H. Peter Anvin; Thomas Gleixner; Ingo Molnar; [email protected];
> [email protected]
> Subject: Re: [PATCH 2/5] x86, mpx: hook #BR exception handler to allocate
> bound tables
>
> On Mon, Jan 13, 2014 at 11:17:15AM +0800, Ren Qiaowei wrote:
> > Yes, I ran this with lockdep enabled.
>
> Ok, but you still are doing memory allocation in the exception handler,
> AFAICT:
>
> do_bounds
> |->do_mpx_bt_fault
> |->allocate_bt
> |->sys_mmap_pgoff
> |->vm_mmap_pgoff
> |->do_mmap_pgoff
> |->mmap_region
> |-> kmem_cache_zalloc(vm_area_cachep, GFP_KERNEL);
>
Sorry for my late reply.
Petkov, could you please detail the problem? Can memory allocation not be done in the exception handler? I guess it is like do_page_fault(), right?
Thanks,
Qiaowei
> -----Original Message-----
> From: Borislav Petkov [mailto:[email protected]]
> Sent: Monday, January 13, 2014 6:43 PM
> To: Ren, Qiaowei
> Cc: H. Peter Anvin; Thomas Gleixner; Ingo Molnar; [email protected];
> [email protected]
> Subject: Re: [PATCH 5/5] x86, mpx: extend siginfo structure to include bound
> violation information
>
> On Mon, Jan 13, 2014 at 04:22:27PM +0800, Ren Qiaowei wrote:
> > >I tried to use the generic structure and macros, but many members of
> > >the generic struct insn are not used for MPX,
>
> I think that's ok - there are a lot of examples in the kernel where only a subset
> of the struct members are used by a particular functionality.
>
> > Because only struct insn_field and several macros could be replaced
> > with the generic versions, I'm afraid that including the generic insn
> > header might just be confusing. What do you think?
>
> Yes, I think the idea is to use and, if needed, extend the generic functionality
> instead of growing our own here and there for obvious benefits.
>
Yes, I once tried to use and extend the generic functionality to implement the decoding, but it didn't work out well for this specific MPX case. Anyway, I will try to use the generic version more.
Thanks,
Qiaowei
On 01/13/2014 02:38 AM, Borislav Petkov wrote:
> On Mon, Jan 13, 2014 at 11:17:15AM +0800, Ren Qiaowei wrote:
>> Yes, I ran this with lockdep enabled.
>
> Ok, but you still are doing memory allocation in the exception handler,
> AFAICT:
>
Sure... what is the problem with that? We do that in several other
exception handlers, e.g. #NM, and of course page faults. As long as the
exception originates from user space there shouldn't be any issue.
-hpa
On Fri, Jan 17, 2014 at 02:47:15PM +0000, Ren, Qiaowei wrote:
> > do_bounds
> > |->do_mpx_bt_fault
> > |->allocate_bt
> > |->sys_mmap_pgoff
> > |->vm_mmap_pgoff
> > |->do_mmap_pgoff
> > |->mmap_region
> > |-> kmem_cache_zalloc(vm_area_cachep, GFP_KERNEL);
> >
> Sorry for my late reply.
>
> Petkov, could you please detail the problem? Can memory allocation
> not be done in the exception handler? I guess it is like do_page_fault(),
> right?
Right, so Steve and I played a couple of scenarios in IRC with this. So
#BR is comparable with #PF, AFAICT, and as expected we don't take any
locks when handling page faults in kernel space as we might deadlock.
Now, what happens if a thread is sleeping on some lock down that
GFP_KERNEL allocation path and another thread gets a #BR and goes that
same mmap_pgoff path and tries to grab that same lock?
Also, what happens if you take a #BR in NMI context, say the NMI
handler?
All I'm trying to say is, it might not be such a good idea to sleep in a
fault handler...
--
Regards/Gruss,
Boris.
Sent from a fat crate under my desk. Formatting is fine.
--
On Fri, Jan 17, 2014 at 08:31:16AM -0800, H. Peter Anvin wrote:
> As long as the exception originates from user space there shouldn't
> be any issue.
And if it originates from kernel space?
--
Regards/Gruss,
Boris.
Sent from a fat crate under my desk. Formatting is fine.
--
On 01/17/2014 08:48 AM, Borislav Petkov wrote:
> On Fri, Jan 17, 2014 at 08:31:16AM -0800, H. Peter Anvin wrote:
>> As long as the exception originates from user space there shouldn't
>> be any issue.
>
> And if it originates from kernel space?
>
We shouldn't have any BOUND or MPX instructions in kernel space. Panic.
-hpa
On 01/17/2014 08:47 AM, Borislav Petkov wrote:
>
> Right, so Steve and I played a couple of scenarios in IRC with this. So
> #BR is comparable with #PF, AFAICT, and as expected we don't take any
> locks when handling page faults in kernel space as we might deadlock.
>
> Now, what happens if a thread is sleeping on some lock down that
> GFP_KERNEL allocation path and another thread gets a #BR and goes that
> same mmap_pgoff path and tries to grab that same lock?
It goes to sleep. Same as if we take a page fault and have to page
something in.
> Also, what happens if you take a #BR in NMI context, say the NMI
> handler?
You should never, ever do that. We should never take a #BR in the
kernel, full stop -- if we do it is panic time.
> All I'm trying to say is, it might not be such a good idea to sleep in a
> fault handler...
A fault handler from user space is really nothing other than a different
kind of system call. There is nothing magic about it.
-hpa
On Fri, Jan 17, 2014 at 08:51:31AM -0800, H. Peter Anvin wrote:
> We shouldn't have any BOUND or MPX instructions in kernel space. Panic.
How do we enforce that?
--
Regards/Gruss,
Boris.
Sent from a fat crate under my desk. Formatting is fine.
--
On 01/17/2014 08:56 AM, Borislav Petkov wrote:
> On Fri, Jan 17, 2014 at 08:51:31AM -0800, H. Peter Anvin wrote:
>> We shouldn't have any BOUND or MPX instructions in kernel space. Panic.
>
> How do we enforce that?
#BR exception from kernel space -> panic.
We have tons of rules on kernel code... e.g. no FPU usage. This one is
minor in comparison.
-hpa
On Fri, Jan 17, 2014 at 08:58:49AM -0800, H. Peter Anvin wrote:
> #BR exception from kernel space -> panic.
>
> We have tons of rules on kernel code... e.g. no FPU usage. This one is
> minor in comparison.
So basically review should catch those wrong BOUND usages?
--
Regards/Gruss,
Boris.
Sent from a fat crate under my desk. Formatting is fine.
--
On Fri, 17 Jan 2014 17:47:36 +0100
Borislav Petkov <[email protected]> wrote:
> On Fri, Jan 17, 2014 at 02:47:15PM +0000, Ren, Qiaowei wrote:
> > > do_bounds
> > > |->do_mpx_bt_fault
> > > |->allocate_bt
> > > |->sys_mmap_pgoff
> > > |->vm_mmap_pgoff
> > > |->do_mmap_pgoff
> > > |->mmap_region
> > > |-> kmem_cache_zalloc(vm_area_cachep, GFP_KERNEL);
> > >
> > Sorry for my late reply.
> >
> > Petkov, could you please detail the problem? Memory allocation can't
> > be done in the eception handler? I guess it is like do_page_fault(),
> > right?
>
> Right, so Steve and I played a couple of scenarios in IRC with this. So
> #BR is comparable with #PF, AFAICT, and as expected we don't take any
> locks when handling page faults in kernel space as we might deadlock.
>
> Now, what happens if a thread is sleeping on some lock down that
> GFP_KERNEL allocation path and another thread gets a #BR and goes that
> same mmap_pgoff path and tries to grab that same lock?
>
> Also, what happens if you take a #BR in NMI context, say the NMI
> handler?
>
> All I'm trying to say is, it might not be such a good idea to sleep in a
> fault handler...
>
Or do what #PF does: check if the fault happened in the kernel and take
one path (probably following what do_fault() does); otherwise, if it is
userspace, it's OK to sleep or grab locks or whatever you want.
-- Steve
On Fri, 17 Jan 2014 08:51:03 -0800
"H. Peter Anvin" <[email protected]> wrote:
> On 01/17/2014 08:47 AM, Borislav Petkov wrote:
> >
> > Right, so Steve and I played a couple of scenarios in IRC with this. So
> > #BR is comparable with #PF, AFAICT, and as expected we don't take any
> > locks when handling page faults in kernel space as we might deadlock.
> >
> > Now, what happens if a thread is sleeping on some lock down that
> > GFP_KERNEL allocation path and another thread gets a #BR and goes that
> > same mmap_pgoff path and tries to grab that same lock?
>
> It goes to sleep. Same as if we take a page fault and have to page
> something in.
Yep, which is what I was explaining to Boris on IRC.
>
> > Also, what happens if you take a #BR in NMI context, say the NMI
> > handler?
>
> You should never, ever do that. We should never take a #BR in the
> kernel, full stop -- if we do it is panic time.
Right. It should actually do what a page fault does too. If we page
fault in NMI, it reports it and crashes.
>
> > All I'm trying to say is, it might not be such a good idea to sleep in a
> > fault handler...
>
> A fault handler from user space is really nothing other than a different
> kind of system call. There is nothing magic about it.
Exactly. I was saying that #BR should be just like #PF, as it can
detect bugs in the kernel too. The first thing the handler should do is
check to see if the fault occurred in userspace or kernel space. If it
is userspace, then there's no restrictions. If it is kernel space then
we should do the bare minimum to report the bug and then kill whatever
task happened to do it.
-- Steve
On 01/17/2014 09:04 AM, Borislav Petkov wrote:
> On Fri, Jan 17, 2014 at 08:58:49AM -0800, H. Peter Anvin wrote:
>> #BR exception from kernel space -> panic.
>>
>> We have tons of rules on kernel code... e.g. no FPU usage. This one is
>> minor in comparison.
>
> So basically review should catch those wrong BOUND usages?
>
Yes. It would take some serious effort to get BOUND or MPX instructions
into the kernel -- you'd have to put them in inline assembly, and BOUND
doesn't even exist in 64-bit mode.
-hpa
On 01/17/2014 09:14 AM, Steven Rostedt wrote:
>>
>>> All I'm trying to say is, it might not be such a good idea to sleep in a
>>> fault handler...
>>
>> A fault handler from user space is really nothing other than a different
>> kind of system call. There is nothing magic about it.
>
> Exactly. I was saying that #BR should be just like #PF, as it can
> detect bugs in the kernel too. The first thing the handler should do is
> check to see if the fault occurred in userspace or kernel space. If it
> is userspace, then there's no restrictions. If it is kernel space then
> we should do the bare minimum to report the bug and then kill whatever
> task happened to do it.
>
Yes. We call this "oopsing" ;)
-hpa
On Fri, Jan 17, 2014 at 09:30:45AM -0800, H. Peter Anvin wrote:
> Yes. It would take some serious effort to get BOUND or MPX
> instructions into the kernel -- you'd have to put them in inline
> assembly, and BOUND doesn't even exist in 64-bit mode.
Which reminds me - we will have to explicitly turn off any compiler
switches that utilize MPX insns so that they don't get issued when
building the kernel. Provided there will be compiler support, that is.
--
Regards/Gruss,
Boris.
Sent from a fat crate under my desk. Formatting is fine.
--
On 01/17/2014 10:23 AM, Borislav Petkov wrote:
> On Fri, Jan 17, 2014 at 09:30:45AM -0800, H. Peter Anvin wrote:
>> Yes. It would take some serious effort to get BOUND or MPX
>> instructions into the kernel -- you'd have to put them in inline
>> assembly, and BOUND doesn't even exist in 64-bit mode.
>
> Which reminds me - we will have to explicitly turn off any compiler
> switches that utilize MPX insns so that they don't get issued when
> building the kernel. Provided there will be compiler support, that is.
>
There will be, whether or not it will be on by default is another matter.
Either way, the kernel would have to explicitly enable the use of MPX
instructions in kernel space or they are simply treated as NOPs.
-hpa
On 01/12/2014 01:20 AM, Qiaowei Ren wrote:
> This patch adds all the MPX instructions to the x86 opcode map, so that
> the x86 instruction decoder can decode MPX instructions used in the kernel.
>
> Signed-off-by: Qiaowei Ren <[email protected]>
> ---
> arch/x86/lib/x86-opcode-map.txt | 4 ++--
> 1 files changed, 2 insertions(+), 2 deletions(-)
This patch seems rather trivial and uncontroversial, so I'm going to
pull it in.
-hpa
Commit-ID: fb09b78151361f5001ad462e4b242b10845830e2
Gitweb: http://git.kernel.org/tip/fb09b78151361f5001ad462e4b242b10845830e2
Author: Qiaowei Ren <[email protected]>
AuthorDate: Sun, 12 Jan 2014 17:20:02 +0800
Committer: H. Peter Anvin <[email protected]>
CommitDate: Fri, 17 Jan 2014 11:04:09 -0800
x86, mpx: Add MPX related opcodes to the x86 opcode map
This patch adds all the MPX instructions to the x86 opcode map, so the x86
instruction decoder can decode MPX instructions.
Signed-off-by: Qiaowei Ren <[email protected]>
Link: http://lkml.kernel.org/r/[email protected]
Cc: Masami Hiramatsu <[email protected]>
Signed-off-by: H. Peter Anvin <[email protected]>
---
arch/x86/lib/x86-opcode-map.txt | 4 ++--
1 file changed, 2 insertions(+), 2 deletions(-)
diff --git a/arch/x86/lib/x86-opcode-map.txt b/arch/x86/lib/x86-opcode-map.txt
index 533a85e..1a2be7c 100644
--- a/arch/x86/lib/x86-opcode-map.txt
+++ b/arch/x86/lib/x86-opcode-map.txt
@@ -346,8 +346,8 @@ AVXcode: 1
17: vmovhps Mq,Vq (v1) | vmovhpd Mq,Vq (66),(v1)
18: Grp16 (1A)
19:
-1a:
-1b:
+1a: BNDCL Ev,Gv | BNDCU Ev,Gv | BNDMOV Gv,Ev | BNDLDX Gv,Ev,Gv
+1b: BNDCN Ev,Gv | BNDMOV Ev,Gv | BNDMK Gv,Ev | BNDSTX Ev,GV,Gv
1c:
1d:
1e:
> -----Original Message-----
> From: Steven Rostedt [mailto:[email protected]]
> Sent: Saturday, January 18, 2014 1:15 AM
> To: H. Peter Anvin
> Cc: Borislav Petkov; Ren, Qiaowei; Thomas Gleixner; Ingo Molnar;
> [email protected]; [email protected]
> Subject: Re: [PATCH 2/5] x86, mpx: hook #BR exception handler to allocate
> bound tables
>
> On Fri, 17 Jan 2014 08:51:03 -0800
> "H. Peter Anvin" <[email protected]> wrote:
>
> > On 01/17/2014 08:47 AM, Borislav Petkov wrote:
> > >
> > > Right, so Steve and I played a couple of scenarios in IRC with this.
> > > So #BR is comparable with #PF, AFAICT, and as expected we don't take
> > > any locks when handling page faults in kernel space as we might deadlock.
> > >
> > > Now, what happens if a thread is sleeping on some lock down that
> > > GFP_KERNEL allocation path and another thread gets a #BR and goes
> > > that same mmap_pgoff path and tries to grab that same lock?
> >
> > It goes to sleep. Same as if we take a page fault and have to page
> > something in.
>
> Yep, which is what I was explaining to Boris on IRC.
>
> >
> > > Also, what happens if you take a #BR in NMI context, say the NMI
> > > handler?
> >
> > You should never, ever do that. We should never take a #BR in the
> > kernel, full stop -- if we do it is panic time.
>
> Right. It should actually do what a page fault does too. If we page fault in NMI,
> it reports it and crashes.
>
> >
> > > All I'm trying to say is, it might not be such a good idea to sleep
> > > in a fault handler...
> >
> > A fault handler from user space is really nothing other than a
> > different kind of system call. There is nothing magic about it.
>
> Exactly. I was saying that #BR should be just like #PF, as it can detect bugs in
> the kernel too. The first thing the handler should do is check to see if the fault
> occurred in userspace or kernel space. If it is userspace, then there's no
> restrictions. If it is kernel space then we should do the bare minimum to report
> the bug and then kill whatever task happened to do it.
>
Yes, I see what you mean, and this detection is necessary.
The following check should be added at the beginning of the handler:
	if (!user_mode(regs)) {
		if (!fixup_exception(regs)) {
			tsk->thread.error_code = error_code;
			tsk->thread.trap_nr = trapnr;
			die(str, regs, error_code);
		}
	}
Thanks,
Qiaowei