2014-01-26 01:56:09

by Ren Qiaowei

[permalink] [raw]
Subject: [PATCH v3 0/4] Intel MPX support

This patchset adds support for the Memory Protection Extensions
(MPX) feature found in future Intel processors.

MPX can be used in conjunction with compiler changes to check memory
references, for those references whose compile-time normal intentions
are usurped at runtime due to buffer overflow or underflow.

MPX provides this capability at very low performance overhead for
newly compiled code, and provides compatibility mechanisms with legacy
software components. MPX architecture is designed allow a machine to
run both MPX enabled software and legacy software that is MPX unaware.
In such a case, the legacy software does not benefit from MPX, but it
also does not experience any change in functionality or reduction in
performance.

More information about Intel MPX can be found in "Intel(R) Architecture
Instruction Set Extensions Programming Reference".

To get the advantage of MPX, changes are required in the OS kernel,
binutils, compiler, system libraries support.

New GCC option -fmpx is introduced to utilize MPX instructions.
Currently GCC compiler sources with MPX support is available in a
separate branch in common GCC SVN repository. See GCC SVN page
(http://gcc.gnu.org/svn.html) for details.

To have the full protection, we had to add MPX instrumentation to all
the necessary Glibc routines (e.g. memcpy) written on assembler, and
compile Glibc with the MPX enabled GCC compiler. Currently MPX enabled
Glibc source can be found in Glibc git repository.

Enabling an application to use MPX will generally not require source
code updates but there is some runtime code, which is responsible for
configuring and enabling MPX, needed in order to make use of MPX.
For most applications this runtime support will be available by linking
to a library supplied by the compiler or possibly it will come directly
from the OS once OS versions that support MPX are available.

MPX kernel code, namely this patchset, has mainly the 2 responsibilities:
provide handlers for bounds faults (#BR), and manage bounds memory.

Currently no hardware with MPX ISA is available but it is always
possible to use SDE (Intel(R) software Development Emulator) instead,
which can be downloaded from
http://software.intel.com/en-us/articles/intel-software-development-emulator


Changes since v1:
* check to see if #BR occurred in userspace or kernel space.
* use generic structure and macro as much as possible when
decode mpx instructions.

Changes since v2:
* fix some compile warnings.
* update documentation.

Qiaowei Ren (4):
x86, mpx: add documentation on Intel MPX
x86, mpx: hook #BR exception handler to allocate bound tables
x86, mpx: add prctl commands PR_MPX_INIT, PR_MPX_RELEASE
x86, mpx: extend siginfo structure to include bound violation
information

Documentation/x86/intel_mpx.txt | 226 ++++++++++++++++++++
arch/x86/Kconfig | 4 +
arch/x86/include/asm/mpx.h | 63 ++++++
arch/x86/include/asm/processor.h | 16 ++
arch/x86/kernel/Makefile | 1 +
arch/x86/kernel/mpx.c | 415 ++++++++++++++++++++++++++++++++++++
arch/x86/kernel/traps.c | 61 +++++-
include/uapi/asm-generic/siginfo.h | 9 +-
include/uapi/linux/prctl.h | 6 +
kernel/signal.c | 4 +
kernel/sys.c | 12 +
11 files changed, 815 insertions(+), 2 deletions(-)
create mode 100644 Documentation/x86/intel_mpx.txt
create mode 100644 arch/x86/include/asm/mpx.h
create mode 100644 arch/x86/kernel/mpx.c


2014-01-26 01:56:48

by Ren Qiaowei

[permalink] [raw]
Subject: [PATCH v3 2/4] x86, mpx: hook #BR exception handler to allocate bound tables

An access to an invalid bound directory entry will cause a #BR
exception. This patch hook #BR exception handler to allocate
one bound table and bind it with that buond directory entry.

This will avoid the need of forwarding the #BR exception
to the user space when bound directory has invalid entry.

Signed-off-by: Qiaowei Ren <[email protected]>
---
arch/x86/include/asm/mpx.h | 35 ++++++++++++++++++++++++++++
arch/x86/kernel/Makefile | 1 +
arch/x86/kernel/mpx.c | 44 +++++++++++++++++++++++++++++++++++
arch/x86/kernel/traps.c | 55 +++++++++++++++++++++++++++++++++++++++++++-
4 files changed, 134 insertions(+), 1 deletions(-)
create mode 100644 arch/x86/include/asm/mpx.h
create mode 100644 arch/x86/kernel/mpx.c

diff --git a/arch/x86/include/asm/mpx.h b/arch/x86/include/asm/mpx.h
new file mode 100644
index 0000000..d074153
--- /dev/null
+++ b/arch/x86/include/asm/mpx.h
@@ -0,0 +1,35 @@
+#ifndef _ASM_X86_MPX_H
+#define _ASM_X86_MPX_H
+
+#include <linux/types.h>
+#include <asm/ptrace.h>
+
+#ifdef CONFIG_X86_64
+
+#define MPX_L1_BITS 28
+#define MPX_L1_SHIFT 3
+#define MPX_L2_BITS 17
+#define MPX_L2_SHIFT 5
+#define MPX_IGN_BITS 3
+#define MPX_L2_NODE_ADDR_MASK 0xfffffffffffffff8UL
+
+#define MPX_BNDSTA_ADDR_MASK 0xfffffffffffffffcUL
+#define MPX_BNDCFG_ADDR_MASK 0xfffffffffffff000UL
+
+#else
+
+#define MPX_L1_BITS 20
+#define MPX_L1_SHIFT 2
+#define MPX_L2_BITS 10
+#define MPX_L2_SHIFT 4
+#define MPX_IGN_BITS 2
+#define MPX_L2_NODE_ADDR_MASK 0xfffffffcUL
+
+#define MPX_BNDSTA_ADDR_MASK 0xfffffffcUL
+#define MPX_BNDCFG_ADDR_MASK 0xfffff000UL
+
+#endif
+
+void do_mpx_bt_fault(struct xsave_struct *xsave_buf);
+
+#endif /* _ASM_X86_MPX_H */
diff --git a/arch/x86/kernel/Makefile b/arch/x86/kernel/Makefile
index cb648c8..becb970 100644
--- a/arch/x86/kernel/Makefile
+++ b/arch/x86/kernel/Makefile
@@ -41,6 +41,7 @@ obj-$(CONFIG_PREEMPT) += preempt.o

obj-y += process.o
obj-y += i387.o xsave.o
+obj-y += mpx.o
obj-y += ptrace.o
obj-$(CONFIG_X86_32) += tls.o
obj-$(CONFIG_IA32_EMULATION) += tls.o
diff --git a/arch/x86/kernel/mpx.c b/arch/x86/kernel/mpx.c
new file mode 100644
index 0000000..e055e0e
--- /dev/null
+++ b/arch/x86/kernel/mpx.c
@@ -0,0 +1,44 @@
+#include <linux/kernel.h>
+#include <linux/syscalls.h>
+#include <asm/processor.h>
+#include <asm/mpx.h>
+#include <asm/mman.h>
+#include <asm/i387.h>
+#include <asm/fpu-internal.h>
+#include <asm/alternative.h>
+
+static bool allocate_bt(unsigned long bd_entry)
+{
+ unsigned long bt_size = 1UL << (MPX_L2_BITS+MPX_L2_SHIFT);
+ unsigned long bt_addr, old_val = 0;
+
+ bt_addr = sys_mmap_pgoff(0, bt_size, PROT_READ | PROT_WRITE,
+ MAP_ANONYMOUS | MAP_PRIVATE | MAP_POPULATE, -1, 0);
+ if (bt_addr == -1) {
+ pr_err("L2 Node Allocation Failed at L1 addr %lx\n",
+ bd_entry);
+ return false;
+ }
+ bt_addr = (bt_addr & MPX_L2_NODE_ADDR_MASK) | 0x01;
+
+ user_atomic_cmpxchg_inatomic(&old_val,
+ (long __user *)bd_entry, 0, bt_addr);
+ if (old_val)
+ vm_munmap(bt_addr & MPX_L2_NODE_ADDR_MASK, bt_size);
+
+ return true;
+}
+
+void do_mpx_bt_fault(struct xsave_struct *xsave_buf)
+{
+ unsigned long status;
+ unsigned long bd_entry, bd_base;
+ unsigned long bd_size = 1UL << (MPX_L1_BITS+MPX_L1_SHIFT);
+
+ bd_base = xsave_buf->bndcsr.cfg_reg_u & MPX_BNDCFG_ADDR_MASK;
+ status = xsave_buf->bndcsr.status_reg;
+
+ bd_entry = status & MPX_BNDSTA_ADDR_MASK;
+ if ((bd_entry >= bd_base) && (bd_entry < bd_base + bd_size))
+ allocate_bt(bd_entry);
+}
diff --git a/arch/x86/kernel/traps.c b/arch/x86/kernel/traps.c
index 57409f6..6b284a4 100644
--- a/arch/x86/kernel/traps.c
+++ b/arch/x86/kernel/traps.c
@@ -59,6 +59,7 @@
#include <asm/fixmap.h>
#include <asm/mach_traps.h>
#include <asm/alternative.h>
+#include <asm/mpx.h>

#ifdef CONFIG_X86_64
#include <asm/x86_init.h>
@@ -213,7 +214,6 @@ dotraplinkage void do_##name(struct pt_regs *regs, long error_code) \

DO_ERROR_INFO(X86_TRAP_DE, SIGFPE, "divide error", divide_error, FPE_INTDIV, regs->ip )
DO_ERROR (X86_TRAP_OF, SIGSEGV, "overflow", overflow )
-DO_ERROR (X86_TRAP_BR, SIGSEGV, "bounds", bounds )
DO_ERROR_INFO(X86_TRAP_UD, SIGILL, "invalid opcode", invalid_op, ILL_ILLOPN, regs->ip )
DO_ERROR (X86_TRAP_OLD_MF, SIGFPE, "coprocessor segment overrun", coprocessor_segment_overrun )
DO_ERROR (X86_TRAP_TS, SIGSEGV, "invalid TSS", invalid_TSS )
@@ -263,6 +263,59 @@ dotraplinkage void do_double_fault(struct pt_regs *regs, long error_code)
}
#endif

+dotraplinkage void do_bounds(struct pt_regs *regs, long error_code)
+{
+ enum ctx_state prev_state;
+ unsigned long status;
+ struct xsave_struct *xsave_buf;
+ struct task_struct *tsk = current;
+
+ prev_state = exception_enter();
+ if (notify_die(DIE_TRAP, "bounds", regs, error_code,
+ X86_TRAP_BR, SIGSEGV) == NOTIFY_STOP)
+ goto exit;
+ conditional_sti(regs);
+
+ if (!user_mode(regs)) {
+ if (!fixup_exception(regs)) {
+ tsk->thread.error_code = error_code;
+ tsk->thread.trap_nr = X86_TRAP_BR;
+ die("bounds", regs, error_code);
+ }
+ goto exit;
+ }
+
+ if (!boot_cpu_has(X86_FEATURE_MPX)) {
+ do_trap(X86_TRAP_BR, SIGSEGV, "bounds", regs, error_code, NULL);
+ goto exit;
+ }
+
+ fpu_xsave(&tsk->thread.fpu);
+ xsave_buf = &(tsk->thread.fpu.state->xsave);
+ status = xsave_buf->bndcsr.status_reg;
+
+ switch (status & 0x3) {
+ case 2:
+ /*
+ * Bound directory has invalid entry.
+ * No signal will be sent to the user space.
+ */
+ do_mpx_bt_fault(xsave_buf);
+ break;
+
+ case 1: /* Bound violation. */
+ case 0: /* No MPX exception. */
+ do_trap(X86_TRAP_BR, SIGSEGV, "bounds", regs, error_code, NULL);
+ break;
+
+ default:
+ break;
+ }
+
+exit:
+ exception_exit(prev_state);
+}
+
dotraplinkage void __kprobes
do_general_protection(struct pt_regs *regs, long error_code)
{
--
1.7.1

2014-01-26 01:56:50

by Ren Qiaowei

[permalink] [raw]
Subject: [PATCH v3 1/4] x86, mpx: add documentation on Intel MPX

This patch adds the Documentation/x86/intel_mpx.txt file with some
information about Intel MPX.

Signed-off-by: Qiaowei Ren <[email protected]>
---
Documentation/x86/intel_mpx.txt | 226 +++++++++++++++++++++++++++++++++++++++
1 files changed, 226 insertions(+), 0 deletions(-)
create mode 100644 Documentation/x86/intel_mpx.txt

diff --git a/Documentation/x86/intel_mpx.txt b/Documentation/x86/intel_mpx.txt
new file mode 100644
index 0000000..052001c
--- /dev/null
+++ b/Documentation/x86/intel_mpx.txt
@@ -0,0 +1,226 @@
+1. Intel(R) MPX Overview
+========================
+
+
+Intel(R) Memory Protection Extensions (Intel(R) MPX) is a new
+capability introduced into Intel Architecture. Intel MPX provides
+hardware features that can be used in conjunction with compiler
+changes to check memory references, for those references whose
+compile-time normal intentions are usurped at runtime due to
+buffer overflow or underflow.
+
+Two of the most important goals of Intel MPX are to provide
+this capability at very low performance overhead for newly
+compiled code, and to provide compatibility mechanisms with
+legacy software components. MPX architecture is designed
+allow a machine (i.e., the processor(s) and the OS software)
+to run both MPX enabled software and legacy software that
+is MPX unaware. In such a case, the legacy software does not
+benefit from MPX, but it also does not experience any change
+in functionality or reduction in performance.
+
+Intel(R) MPX Programming Model
+------------------------------
+
+Intel MPX introduces new registers and new instructions that
+operate on these registers. Some of the registers added are
+bounds registers which store a pointer's lower bound and upper
+bound limits. Whenever the pointer is used, the requested
+reference is checked against the pointer's associated bounds,
+thereby preventing out-of-bound memory access (such as buffer
+overflows and overruns). Out-of-bounds memory references
+initiate a #BR exception which can then be handled in an
+appropriate manner.
+
+Loading and Storing Bounds using Translation
+--------------------------------------------
+
+Intel MPX defines two instructions for load/store of the linear
+address of a pointer to a buffer, along with the bounds of the
+buffer into a paging structure of extended bounds. Specifically
+when storing extended bounds, the processor will perform address
+translation of the address where the pointer is stored to an
+address in the Bound Table (BT) to determine the store location
+of extended bounds. Loading of an extended bounds performs the
+reverse sequence.
+
+The structure in memory to load/store an extended bound is a
+4-tuple consisting of lower bound, upper bound, pointer value
+and a reserved field. Bound loads and stores access 32-bit or
+64-bit operand size according to the operation mode. Thus,
+a bound table entry is 4*32 bits in 32-bit mode and 4*64 bits
+in 64-bit mode.
+
+The linear address of a bound table is stored in a Bound
+Directory (BD) entry. And the linear address of the bound
+directory is derived from either BNDCFGU or BNDCFGS registers.
+Bounds in memory are stored in Bound Tables (BT) as an extended
+bound, which are accessed via Bound Directory (BD) and address
+translation performed by BNDLDX/BNDSTX instructions.
+
+Bounds Directory (BD) and Bounds Tables (BT) are stored in
+application memory and are allocated by the application (in case
+of kernel use, the structures will be in kernel memory). The
+bound directory and each instance of bound table are in contiguous
+linear memory.
+
+XSAVE/XRESTOR Support of Intel MPX State
+----------------------------------------
+
+Enabling Intel MPX requires an OS to manage two bits in XCR0:
+ - BNDREGS for saving and restoring registers BND0-BND3,
+ - BNDCSR for saving and restoring the user-mode configuration
+(BNDCFGU) and the status register (BNDSTATUS).
+
+The reason for having two separate bits is that BND0-BND3 is
+likely to be volatile state, while BNDCFGU and BNDSTATUS are not.
+Therefore, an OS has flexibility in handling these two states
+differently in saving or restoring them.
+
+For details about the Intel MPX instructions, see "Intel(R)
+Architecture Instruction Set Extensions Programming Reference".
+
+
+2. How to get the advantage of MPX
+==================================
+
+
+To get the advantage of MPX, changes are required in
+the OS kernel, binutils, compiler, system libraries support.
+
+MPX support in the GNU toolchain
+--------------------------------
+
+This section describes changes in GNU Binutils, GCC and Glibc
+to support MPX.
+
+The first step of MPX support is to implement support for new
+hardware features in binutils and the GCC.
+
+The second step is implementation of MPX instrumentation pass
+in the GCC compiler which is responsible for instrumenting all
+memory accesses with pointer checks. Compiler changes for runtime
+bound checks include:
+
+ * Bounds creation for statically allocated objects, objects
+ allocated on the stack and statically initialized pointers.
+
+ * MPX support in ABI: ABI extension allows passing bounds for
+ the pointers passed as function arguments and provide returned
+ bounds with the pointers.
+
+ * Bounds table content management: each pointer is stored into
+ the memory should have its bounds stored in the corresponding
+ row of the bounds table; compiler generates appropriate code
+ to have the bounds table in the consistent state.
+
+ * Memory accesses instrumentation: compiler analyzes data flow
+ to compute bounds corresponding to each memory access and
+ inserts code to check used address against computed bounds.
+
+Dynamically created objects in heap using memory allocators need
+to set bounds for objects (buffers) at allocation time. So the
+next step is to add MPX support into standard memory allocators
+in Glibc.
+
+To have the full protection, an application has to use libraries
+compiled with MPX instrumentation. It means we had to compile
+Glibc with the MPX enabled GCC compiler because it is used in
+most applications. Also we had to add MPX instrumentation to all
+the necessary Glibc routines (e.g. memcpy) written on assembler.
+
+New GCC option -fmpx is introduced to utilize MPX instructions.
+Also binutils with MPX enabled should be used to get binaries
+with memory protection.
+
+Consider following simple test for MPX compiled program:
+
+ int main(int argc, char* argv)
+ {
+ int buf[100];
+ return buf[argc];
+ }
+
+Snippet of the original assembler output (compiled with -O2):
+
+ movslq %edi, %rdi
+ movl -120(%rsp,%rdi,4), %eax // memory access buf[argc]
+
+Compile test as follows: mpx-gcc/gcc test.c -fmpx -O2.
+
+Resulted assembler snippet:
+
+ movl $399, %edx // load array length to edx
+ movslq %edi, %rdi // rdi contains value of argc
+ leaq -104(%rsp), %rax // load start address of buf to rax
+ bndmk (%rax,%rdx), %bnd0 // create bounds for buf
+ bndcl (%rax,%rdi,4), %bnd0 // check that memory access doesn't
+ // violate buf's low bound
+ bndcu 3(%rax,%rdi,4), %bnd0 // check that memory access doesn't
+ // violate buf's upper bound
+ movl -104(%rsp,%rdi,4), %eax // original memory access
+
+Code looks pretty clear. Note only that we added displacement 3 for
+upper bound checking since we have 4 byte (integer) access here.
+
+Several MPX specific compiler options besides -fmpx were introduced
+in the compiler. Most of them, like -fmpx-check-read and
+-fmpx-check-write, control number of inserted runtime bound checks.
+Also developers always can use intrinsics to insert MPX instructions
+manually.
+
+Currently GCC compiler sources with MPX support is available in a
+separate branch in common GCC SVN repository. See GCC SVN page
+(http://gcc.gnu.org/svn.html) for details.
+
+Currently no hardware with MPX ISA is available but it is always
+possible to use SDE (Intel(R) software Development Emulator) instead,
+which can be downloaded from
+http://software.intel.com/en-us/articles/intel-software-development-emulator
+
+MPX runtime support
+-------------------
+
+Enabling an application to use MPX will generally not require source
+code updates but there is some runtime code needed in order to make
+use of MPX. For most applications this runtime support will be available
+by linking to a library supplied by the compiler or possibly it will
+come directly from the OS once OS versions that support MPX are available.
+
+The runtime is responsible for configuring and enabling MPX. The
+configuration and enabling of MPX consists of the runtime writing
+the base address of the Bound Directory(BD) to the BNDCFGU register
+and setting the enable bit.
+
+MPX kernel support
+------------------
+
+MPX kernel code has mainly the following responsibilities.
+
+1) Providing handlers for bounds faults (#BR).
+
+When MPX is enabled, there are 2 new situations that can generate
+#BR faults. If a bounds overflow occurs then a #BR is generated.
+The fault handler will decode MPX instructions to get violation
+address and set this address into extended struct siginfo.
+
+The other case that generates a #BR is when a BNDSTX instruction
+attempts to save bounds to a BD entry marked as invalid. This is
+an indication that no BT exists for this entry. In this case the
+fault handler will allocate a new BT.
+
+2) Managing bounds memory.
+
+MPX defines 4 sets of bound registers. When an application needs
+more than 4 sets of bounds it uses the BNDSTX instruction to save
+the additional bounds out to memory. The kernel dynamically allocates
+the memory used to store these bounds. The bounds memory is organized
+into a 2 level structure consisting of a BD which contains pointers
+to a set of Bound Tables (BT) which contain the actual bound information
+In order to minimize the Intel MPX memory usage the BTs are allocated
+on demand by the Intel MPX runtime.
+
+Also the PR_MPX_INIT and PR_MPX_RELEASE prctl() commands are added
+to init and release MPX related resource. A MMU notifier will be
+registered during PR_MPX_INIT command execution. So the bound tables
+can be automatically deallocated when one memory area is unmapped.
--
1.7.1

2014-01-26 01:56:47

by Ren Qiaowei

[permalink] [raw]
Subject: [PATCH v3 4/4] x86, mpx: extend siginfo structure to include bound violation information

This patch adds new fields about bound violation into siginfo
structure. si_lower and si_upper are respectively lower bound
and upper bound when bound violation is caused.

These fields will be set in #BR exception handler by decoding
the user instruction and constructing the faulting pointer.
A userspace application can get violation address, lower bound
and upper bound for bound violation from this new siginfo structure.

Signed-off-by: Qiaowei Ren <[email protected]>
---
arch/x86/include/asm/mpx.h | 19 +++
arch/x86/kernel/mpx.c | 287 ++++++++++++++++++++++++++++++++++++
arch/x86/kernel/traps.c | 6 +
include/uapi/asm-generic/siginfo.h | 9 +-
kernel/signal.c | 4 +
5 files changed, 324 insertions(+), 1 deletions(-)

diff --git a/arch/x86/include/asm/mpx.h b/arch/x86/include/asm/mpx.h
index 9652e9e..e099573 100644
--- a/arch/x86/include/asm/mpx.h
+++ b/arch/x86/include/asm/mpx.h
@@ -3,6 +3,7 @@

#include <linux/types.h>
#include <asm/ptrace.h>
+#include <asm/insn.h>

#ifdef CONFIG_X86_64

@@ -30,6 +31,22 @@

#endif

+struct mpx_insn {
+ struct insn_field rex_prefix; /* REX prefix */
+ struct insn_field modrm;
+ struct insn_field sib;
+ struct insn_field displacement;
+
+ unsigned char addr_bytes; /* effective address size */
+ unsigned char limit;
+ unsigned char x86_64;
+
+ const unsigned char *kaddr; /* kernel address of insn to analyze */
+ const unsigned char *next_byte;
+};
+
+#define MAX_MPX_INSN_SIZE 15
+
typedef union {
struct {
unsigned long ignored:MPX_IGN_BITS;
@@ -40,5 +57,7 @@ typedef union {
} mpx_addr;

void do_mpx_bt_fault(struct xsave_struct *xsave_buf);
+void do_mpx_bounds(struct pt_regs *regs, siginfo_t *info,
+ struct xsave_struct *xsave_buf);

#endif /* _ASM_X86_MPX_H */
diff --git a/arch/x86/kernel/mpx.c b/arch/x86/kernel/mpx.c
index 9e91178..983abf7 100644
--- a/arch/x86/kernel/mpx.c
+++ b/arch/x86/kernel/mpx.c
@@ -91,6 +91,269 @@ int mpx_release(struct task_struct *tsk)
return 0;
}

+typedef enum {REG_TYPE_RM, REG_TYPE_INDEX, REG_TYPE_BASE} reg_type_t;
+static unsigned long get_reg(struct mpx_insn *insn, struct pt_regs *regs,
+ reg_type_t type)
+{
+ int regno = 0;
+ unsigned char modrm = (unsigned char)insn->modrm.value;
+ unsigned char sib = (unsigned char)insn->sib.value;
+
+ static const int regoff[] = {
+ offsetof(struct pt_regs, ax),
+ offsetof(struct pt_regs, cx),
+ offsetof(struct pt_regs, dx),
+ offsetof(struct pt_regs, bx),
+ offsetof(struct pt_regs, sp),
+ offsetof(struct pt_regs, bp),
+ offsetof(struct pt_regs, si),
+ offsetof(struct pt_regs, di),
+#ifdef CONFIG_X86_64
+ offsetof(struct pt_regs, r8),
+ offsetof(struct pt_regs, r9),
+ offsetof(struct pt_regs, r10),
+ offsetof(struct pt_regs, r11),
+ offsetof(struct pt_regs, r12),
+ offsetof(struct pt_regs, r13),
+ offsetof(struct pt_regs, r14),
+ offsetof(struct pt_regs, r15),
+#endif
+ };
+
+ switch (type) {
+ case REG_TYPE_RM:
+ regno = X86_MODRM_RM(modrm);
+ if (X86_REX_B(insn->rex_prefix.value) == 1)
+ regno += 8;
+ break;
+
+ case REG_TYPE_INDEX:
+ regno = X86_SIB_INDEX(sib);
+ if (X86_REX_X(insn->rex_prefix.value) == 1)
+ regno += 8;
+ break;
+
+ case REG_TYPE_BASE:
+ regno = X86_SIB_BASE(sib);
+ if (X86_REX_B(insn->rex_prefix.value) == 1)
+ regno += 8;
+ break;
+
+ default:
+ break;
+ }
+
+ return regs_get_register(regs, regoff[regno]);
+}
+
+/*
+ * return the address being referenced be instruction
+ * for rm=3 returning the content of the rm reg
+ * for rm!=3 calculates the address using SIB and Disp
+ */
+static unsigned long get_addr_ref(struct mpx_insn *insn, struct pt_regs *regs)
+{
+ unsigned long addr;
+ unsigned long base;
+ unsigned long indx;
+ unsigned char modrm = (unsigned char)insn->modrm.value;
+ unsigned char sib = (unsigned char)insn->sib.value;
+
+ if (X86_MODRM_MOD(modrm) == 3) {
+ addr = get_reg(insn, regs, REG_TYPE_RM);
+ } else {
+ if (insn->sib.nbytes) {
+ base = get_reg(insn, regs, REG_TYPE_BASE);
+ indx = get_reg(insn, regs, REG_TYPE_INDEX);
+ addr = base + indx * (1 << X86_SIB_SCALE(sib));
+ } else {
+ addr = get_reg(insn, regs, REG_TYPE_RM);
+ }
+ addr += insn->displacement.value;
+ }
+
+ return addr;
+}
+
+/* Verify next sizeof(t) bytes can be on the same instruction */
+#define validate_next(t, insn, n) \
+ ((insn)->next_byte + sizeof(t) + n - (insn)->kaddr <= (insn)->limit)
+
+#define __get_next(t, insn) \
+({ \
+ t r = *(t *)insn->next_byte; \
+ insn->next_byte += sizeof(t); \
+ r; \
+})
+
+#define __peek_next(t, insn) \
+({ \
+ t r = *(t *)insn->next_byte; \
+ r; \
+})
+
+#define get_next(t, insn) \
+({ \
+ if (unlikely(!validate_next(t, insn, 0))) \
+ goto err_out; \
+ __get_next(t, insn); \
+})
+
+#define peek_next(t, insn) \
+({ \
+ if (unlikely(!validate_next(t, insn, 0))) \
+ goto err_out; \
+ __peek_next(t, insn); \
+})
+
+static void mpx_insn_get_prefixes(struct mpx_insn *insn)
+{
+ unsigned char b;
+
+ /* Decode legacy prefix and REX prefix */
+ b = peek_next(unsigned char, insn);
+ while (b != 0x0f) {
+ /*
+ * look for a rex prefix
+ * a REX prefix cannot be followed by a legacy prefix.
+ */
+ if (insn->x86_64 && ((b&0xf0)==0x40)) {
+ insn->rex_prefix.value = b;
+ insn->rex_prefix.nbytes = 1;
+ insn->next_byte++;
+ break;
+ }
+
+ /* check the other legacy prefixes */
+ switch (b) {
+ case 0xf2:
+ case 0xf3:
+ case 0xf0:
+ case 0x64:
+ case 0x65:
+ case 0x2e:
+ case 0x3e:
+ case 0x26:
+ case 0x36:
+ case 0x66:
+ case 0x67:
+ insn->next_byte++;
+ break;
+ default: /* everything else is garbage */
+ goto err_out;
+ }
+ b = peek_next(unsigned char, insn);
+ }
+
+err_out:
+ return;
+}
+
+static void mpx_insn_get_modrm(struct mpx_insn *insn)
+{
+ insn->modrm.value = get_next(unsigned char, insn);
+ insn->modrm.nbytes = 1;
+
+err_out:
+ return;
+}
+
+static void mpx_insn_get_sib(struct mpx_insn *insn)
+{
+ unsigned char modrm = (unsigned char)insn->modrm.value;
+
+ if (X86_MODRM_MOD(modrm) != 3 && X86_MODRM_RM(modrm) == 4) {
+ insn->sib.value = get_next(unsigned char, insn);
+ insn->sib.nbytes = 1;
+ }
+
+err_out:
+ return;
+}
+
+static void mpx_insn_get_displacement(struct mpx_insn *insn)
+{
+ unsigned char mod, rm, base;
+
+ /*
+ * Interpreting the modrm byte:
+ * mod = 00 - no displacement fields (exceptions below)
+ * mod = 01 - 1-byte displacement field
+ * mod = 10 - displacement field is 4 bytes
+ * mod = 11 - no memory operand
+ *
+ * mod != 11, r/m = 100 - SIB byte exists
+ * mod = 00, SIB base = 101 - displacement field is 4 bytes
+ * mod = 00, r/m = 101 - rip-relative addressing, displacement
+ * field is 4 bytes
+ */
+ mod = X86_MODRM_MOD(insn->modrm.value);
+ rm = X86_MODRM_RM(insn->modrm.value);
+ base = X86_SIB_BASE(insn->sib.value);
+ if (mod == 3)
+ return;
+ if (mod == 1) {
+ insn->displacement.value = get_next(unsigned char, insn);
+ insn->displacement.nbytes = 1;
+ } else if ((mod == 0 && rm == 5) || mod == 2 ||
+ (mod == 0 && base == 5)) {
+ insn->displacement.value = get_next(int, insn);
+ insn->displacement.nbytes = 4;
+ }
+
+err_out:
+ return;
+}
+
+static void mpx_insn_init(struct mpx_insn *insn, struct pt_regs *regs)
+{
+ unsigned char buf[MAX_MPX_INSN_SIZE];
+ int bytes;
+
+ memset(insn, 0, sizeof(*insn));
+
+ bytes = copy_from_user(buf, (void __user *)regs->ip, MAX_MPX_INSN_SIZE);
+ insn->limit = MAX_MPX_INSN_SIZE - bytes;
+ insn->kaddr = buf;
+ insn->next_byte = buf;
+
+ /*
+ * In 64-bit Mode, all Intel MPX instructions use 64-bit
+ * operands for bounds and 64 bit addressing, i.e. REX.W &
+ * 67H have no effect on data or address size.
+ *
+ * In compatibility and legacy modes (including 16-bit code
+ * segments, real and virtual 8086 modes) all Intel MPX
+ * instructions use 32-bit operands for bounds and 32 bit
+ * addressing.
+ */
+#ifdef CONFIG_X86_64
+ insn->x86_64 = 1;
+ insn->addr_bytes = 8;
+#else
+ insn->x86_64 = 0;
+ insn->addr_bytes = 4;
+#endif
+}
+
+static unsigned long mpx_insn_decode(struct mpx_insn *insn, struct pt_regs *regs)
+{
+ mpx_insn_init(insn, regs);
+
+ /*
+ * In this case, we only need decode bndcl/bndcn/bndcu,
+ * so we can use private diassembly interfaces to get
+ * prefixes, modrm, sib, displacement, etc..
+ */
+ mpx_insn_get_prefixes(insn);
+ insn->next_byte += 2; /* ignore opcode */
+ mpx_insn_get_modrm(insn);
+ mpx_insn_get_sib(insn);
+ mpx_insn_get_displacement(insn);
+
+ return get_addr_ref(insn, regs);
+}
+
static bool allocate_bt(unsigned long bd_entry)
{
unsigned long bt_size = 1UL << (MPX_L2_BITS+MPX_L2_SHIFT);
@@ -126,3 +389,27 @@ void do_mpx_bt_fault(struct xsave_struct *xsave_buf)
if ((bd_entry >= bd_base) && (bd_entry < bd_base + bd_size))
allocate_bt(bd_entry);
}
+
+void do_mpx_bounds(struct pt_regs *regs, siginfo_t *info,
+ struct xsave_struct *xsave_buf)
+{
+ struct mpx_insn insn;
+ uint8_t bndregno;
+ unsigned long addr_vio;
+
+ addr_vio = mpx_insn_decode(&insn, regs);
+
+ bndregno = X86_MODRM_REG(insn.modrm.value);
+ if (bndregno > 3)
+ return;
+
+ info->si_lower =
+ (void __user *)(xsave_buf->bndregs.bndregs[2*bndregno]);
+ info->si_upper =
+ (void __user *)(~xsave_buf->bndregs.bndregs[2*bndregno+1]);
+ info->si_addr_lsb = 0;
+ info->si_signo = SIGSEGV;
+ info->si_errno = 0;
+ info->si_code = SEGV_BNDERR;
+ info->si_addr = (void __user *)addr_vio;
+}
diff --git a/arch/x86/kernel/traps.c b/arch/x86/kernel/traps.c
index 6b284a4..4fda5c6 100644
--- a/arch/x86/kernel/traps.c
+++ b/arch/x86/kernel/traps.c
@@ -269,6 +269,7 @@ dotraplinkage void do_bounds(struct pt_regs *regs, long error_code)
unsigned long status;
struct xsave_struct *xsave_buf;
struct task_struct *tsk = current;
+ siginfo_t info;

prev_state = exception_enter();
if (notify_die(DIE_TRAP, "bounds", regs, error_code,
@@ -304,6 +305,11 @@ dotraplinkage void do_bounds(struct pt_regs *regs, long error_code)
break;

case 1: /* Bound violation. */
+ do_mpx_bounds(regs, &info, xsave_buf);
+ do_trap(X86_TRAP_BR, SIGSEGV, "bounds", regs,
+ error_code, &info);
+ break;
+
case 0: /* No MPX exception. */
do_trap(X86_TRAP_BR, SIGSEGV, "bounds", regs, error_code, NULL);
break;
diff --git a/include/uapi/asm-generic/siginfo.h b/include/uapi/asm-generic/siginfo.h
index ba5be7f..1e35520 100644
--- a/include/uapi/asm-generic/siginfo.h
+++ b/include/uapi/asm-generic/siginfo.h
@@ -91,6 +91,10 @@ typedef struct siginfo {
int _trapno; /* TRAP # which caused the signal */
#endif
short _addr_lsb; /* LSB of the reported address */
+ struct {
+ void __user *_lower;
+ void __user *_upper;
+ } _addr_bnd;
} _sigfault;

/* SIGPOLL */
@@ -131,6 +135,8 @@ typedef struct siginfo {
#define si_trapno _sifields._sigfault._trapno
#endif
#define si_addr_lsb _sifields._sigfault._addr_lsb
+#define si_lower _sifields._sigfault._addr_bnd._lower
+#define si_upper _sifields._sigfault._addr_bnd._upper
#define si_band _sifields._sigpoll._band
#define si_fd _sifields._sigpoll._fd
#ifdef __ARCH_SIGSYS
@@ -199,7 +205,8 @@ typedef struct siginfo {
*/
#define SEGV_MAPERR (__SI_FAULT|1) /* address not mapped to object */
#define SEGV_ACCERR (__SI_FAULT|2) /* invalid permissions for mapped object */
-#define NSIGSEGV 2
+#define SEGV_BNDERR (__SI_FAULT|3) /* failed address bound checks */
+#define NSIGSEGV 3

/*
* SIGBUS si_codes
diff --git a/kernel/signal.c b/kernel/signal.c
index 940b30e..7b774d9 100644
--- a/kernel/signal.c
+++ b/kernel/signal.c
@@ -2771,6 +2771,10 @@ int copy_siginfo_to_user(siginfo_t __user *to, const siginfo_t *from)
if (from->si_code == BUS_MCEERR_AR || from->si_code == BUS_MCEERR_AO)
err |= __put_user(from->si_addr_lsb, &to->si_addr_lsb);
#endif
+#ifdef SEGV_BNDERR
+ err |= __put_user(from->si_lower, &to->si_lower);
+ err |= __put_user(from->si_upper, &to->si_upper);
+#endif
break;
case __SI_CHLD:
err |= __put_user(from->si_pid, &to->si_pid);
--
1.7.1

2014-01-26 01:56:45

by Ren Qiaowei

[permalink] [raw]
Subject: [PATCH v3 3/4] x86, mpx: add prctl commands PR_MPX_INIT, PR_MPX_RELEASE

This patch adds the PR_MPX_INIT and PR_MPX_RELEASE prctl()
commands on the x86 platform. These commands can be used to
init and release MPX related resource.

A MMU notifier will be registered during PR_MPX_INIT
command execution. So the bound tables can be automatically
deallocated when one memory area is unmapped.

Signed-off-by: Qiaowei Ren <[email protected]>
---
arch/x86/Kconfig | 4 ++
arch/x86/include/asm/mpx.h | 9 ++++
arch/x86/include/asm/processor.h | 16 +++++++
arch/x86/kernel/mpx.c | 84 ++++++++++++++++++++++++++++++++++++++
include/uapi/linux/prctl.h | 6 +++
kernel/sys.c | 12 +++++
6 files changed, 131 insertions(+), 0 deletions(-)

diff --git a/arch/x86/Kconfig b/arch/x86/Kconfig
index cd18b83..28916e1 100644
--- a/arch/x86/Kconfig
+++ b/arch/x86/Kconfig
@@ -236,6 +236,10 @@ config HAVE_INTEL_TXT
def_bool y
depends on INTEL_IOMMU && ACPI

+config HAVE_INTEL_MPX
+ def_bool y
+ select MMU_NOTIFIER
+
config X86_32_SMP
def_bool y
depends on X86_32 && SMP
diff --git a/arch/x86/include/asm/mpx.h b/arch/x86/include/asm/mpx.h
index d074153..9652e9e 100644
--- a/arch/x86/include/asm/mpx.h
+++ b/arch/x86/include/asm/mpx.h
@@ -30,6 +30,15 @@

#endif

+typedef union {
+ struct {
+ unsigned long ignored:MPX_IGN_BITS;
+ unsigned long l2index:MPX_L2_BITS;
+ unsigned long l1index:MPX_L1_BITS;
+ };
+ unsigned long addr;
+} mpx_addr;
+
void do_mpx_bt_fault(struct xsave_struct *xsave_buf);

#endif /* _ASM_X86_MPX_H */
diff --git a/arch/x86/include/asm/processor.h b/arch/x86/include/asm/processor.h
index fdedd38..5962413 100644
--- a/arch/x86/include/asm/processor.h
+++ b/arch/x86/include/asm/processor.h
@@ -943,6 +943,22 @@ extern void start_thread(struct pt_regs *regs, unsigned long new_ip,
extern int get_tsc_mode(unsigned long adr);
extern int set_tsc_mode(unsigned int val);

+#ifdef CONFIG_HAVE_INTEL_MPX
+
+/* Init/release a process' MPX related resource */
+#define MPX_INIT(tsk) mpx_init((tsk))
+#define MPX_RELEASE(tsk) mpx_release((tsk))
+
+extern int mpx_init(struct task_struct *tsk);
+extern int mpx_release(struct task_struct *tsk);
+
+#else /* CONFIG_HAVE_INTEL_MPX */
+
+#define MPX_INIT(tsk) (-EINVAL)
+#define MPX_RELEASE(tsk) (-EINVAL)
+
+#endif /* CONFIG_HAVE_INTEL_MPX */
+
extern u16 amd_get_nb_id(int cpu);

static inline uint32_t hypervisor_cpuid_base(const char *sig, uint32_t leaves)
diff --git a/arch/x86/kernel/mpx.c b/arch/x86/kernel/mpx.c
index e055e0e..9e91178 100644
--- a/arch/x86/kernel/mpx.c
+++ b/arch/x86/kernel/mpx.c
@@ -1,5 +1,7 @@
#include <linux/kernel.h>
#include <linux/syscalls.h>
+#include <linux/prctl.h>
+#include <linux/mmu_notifier.h>
#include <asm/processor.h>
#include <asm/mpx.h>
#include <asm/mman.h>
@@ -7,6 +9,88 @@
#include <asm/fpu-internal.h>
#include <asm/alternative.h>

+static struct mmu_notifier mpx_mn;
+
+static void mpx_invl_range_end(struct mmu_notifier *mn,
+ struct mm_struct *mm,
+ unsigned long start, unsigned long end)
+{
+ struct xsave_struct *xsave_buf;
+ unsigned long bt_size = 1UL << (MPX_L2_BITS+MPX_L2_SHIFT);
+ unsigned long bt_addr;
+ unsigned long bd_base;
+ unsigned long bd_entry, bde_start, bde_end;
+ mpx_addr lap;
+
+ pgd_t *pgd;
+ pud_t *pud;
+ pmd_t *pmd;
+ pte_t *pte;
+
+ /* ignore swap notifications */
+ pgd = pgd_offset(mm, start);
+ pud = pud_offset(pgd, start);
+ pmd = pmd_offset(pud, start);
+ pte = pte_offset_kernel(pmd, start);
+ if (!pte_present(*pte) && !pte_none(*pte) && !pte_file(*pte))
+ return;
+
+ /* get bound directory base address */
+ fpu_xsave(&current->thread.fpu);
+ xsave_buf = &(current->thread.fpu.state->xsave);
+ bd_base = xsave_buf->bndcsr.cfg_reg_u & MPX_BNDCFG_ADDR_MASK;
+
+ /* get related bde range */
+ lap.addr = start;
+ bde_start = bd_base + (lap.l1index << MPX_L1_SHIFT);
+
+ lap.addr = end;
+ if (lap.ignored || lap.l2index)
+ bde_end = bd_base + (lap.l1index<<MPX_L1_SHIFT) + sizeof(long);
+ else
+ bde_end = bd_base + (lap.l1index<<MPX_L1_SHIFT);
+
+ for (bd_entry = bde_start; bd_entry < bde_end;
+ bd_entry += sizeof(unsigned long)) {
+
+ if (get_user(bt_addr, (long __user *)bd_entry))
+ return;
+
+ if (bt_addr & 0x1) {
+ user_atomic_cmpxchg_inatomic(&bt_addr,
+ (long __user *)bd_entry, bt_addr, 0);
+ do_munmap(mm, bt_addr & MPX_L2_NODE_ADDR_MASK, bt_size);
+ }
+ }
+}
+
+static struct mmu_notifier_ops mpx_mmuops = {
+ .invalidate_range_end = mpx_invl_range_end,
+};
+
+int mpx_init(struct task_struct *tsk)
+{
+ if (!boot_cpu_has(X86_FEATURE_MPX))
+ return -EINVAL;
+
+ /* register mmu_notifier to delallocate the bound tables */
+ mpx_mn.ops = &mpx_mmuops;
+ mmu_notifier_register(&mpx_mn, current->mm);
+
+ return 0;
+}
+
+int mpx_release(struct task_struct *tsk)
+{
+ if (!boot_cpu_has(X86_FEATURE_MPX))
+ return -EINVAL;
+
+ /* unregister mmu_notifier */
+ mmu_notifier_unregister(&mpx_mn, current->mm);
+
+ return 0;
+}
+
static bool allocate_bt(unsigned long bd_entry)
{
unsigned long bt_size = 1UL << (MPX_L2_BITS+MPX_L2_SHIFT);
diff --git a/include/uapi/linux/prctl.h b/include/uapi/linux/prctl.h
index 289760f..19ab881 100644
--- a/include/uapi/linux/prctl.h
+++ b/include/uapi/linux/prctl.h
@@ -149,4 +149,10 @@

#define PR_GET_TID_ADDRESS 40

+/*
+ * Init/release MPX related resource.
+ */
+#define PR_MPX_INIT 41
+#define PR_MPX_RELEASE 42
+
#endif /* _LINUX_PRCTL_H */
diff --git a/kernel/sys.c b/kernel/sys.c
index c723113..0334d03 100644
--- a/kernel/sys.c
+++ b/kernel/sys.c
@@ -91,6 +91,12 @@
#ifndef SET_TSC_CTL
# define SET_TSC_CTL(a) (-EINVAL)
#endif
+#ifndef MPX_INIT
+# define MPX_INIT(a) (-EINVAL)
+#endif
+#ifndef MPX_RELEASE
+# define MPX_RELEASE(a) (-EINVAL)
+#endif

/*
* this is where the system-wide overflow UID and GID are defined, for
@@ -1998,6 +2004,12 @@ SYSCALL_DEFINE5(prctl, int, option, unsigned long, arg2, unsigned long, arg3,
if (arg2 || arg3 || arg4 || arg5)
return -EINVAL;
return current->no_new_privs ? 1 : 0;
+ case PR_MPX_INIT:
+ error = MPX_INIT(me);
+ break;
+ case PR_MPX_RELEASE:
+ error = MPX_RELEASE(me);
+ break;
default:
error = -EINVAL;
break;
--
1.7.1

2014-01-26 03:07:20

by Randy Dunlap

[permalink] [raw]
Subject: Re: [PATCH v3 1/4] x86, mpx: add documentation on Intel MPX

On 01/26/2014 01:08 AM, Qiaowei Ren wrote:
> This patch adds the Documentation/x86/intel_mpx.txt file with some
> information about Intel MPX.
>
> Signed-off-by: Qiaowei Ren <[email protected]>
> ---
> Documentation/x86/intel_mpx.txt | 226 +++++++++++++++++++++++++++++++++++++++
> 1 files changed, 226 insertions(+), 0 deletions(-)
> create mode 100644 Documentation/x86/intel_mpx.txt
>
> diff --git a/Documentation/x86/intel_mpx.txt b/Documentation/x86/intel_mpx.txt
> new file mode 100644
> index 0000000..052001c
> --- /dev/null
> +++ b/Documentation/x86/intel_mpx.txt
> @@ -0,0 +1,226 @@
> +1. Intel(R) MPX Overview
> +========================
> +
> +
> +Intel(R) Memory Protection Extensions (Intel(R) MPX) is a new
> +capability introduced into Intel Architecture. Intel MPX provides
> +hardware features that can be used in conjunction with compiler
> +changes to check memory references, for those references whose
> +compile-time normal intentions are usurped at runtime due to
> +buffer overflow or underflow.
> +
> +Two of the most important goals of Intel MPX are to provide
> +this capability at very low performance overhead for newly
> +compiled code, and to provide compatibility mechanisms with
> +legacy software components. MPX architecture is designed

designed to

> +allow a machine (i.e., the processor(s) and the OS software)
> +to run both MPX enabled software and legacy software that
> +is MPX unaware. In such a case, the legacy software does not
> +benefit from MPX, but it also does not experience any change
> +in functionality or reduction in performance.
> +
> +Intel(R) MPX Programming Model
> +------------------------------
> +
> +Intel MPX introduces new registers and new instructions that
> +operate on these registers. Some of the registers added are
> +bounds registers which store a pointer's lower bound and upper
> +bound limits. Whenever the pointer is used, the requested
> +reference is checked against the pointer's associated bounds,
> +thereby preventing out-of-bound memory access (such as buffer
> +overflows and overruns). Out-of-bounds memory references
> +initiate a #BR exception which can then be handled in an
> +appropriate manner.
> +
> +Loading and Storing Bounds using Translation
> +--------------------------------------------
> +
> +Intel MPX defines two instructions for load/store of the linear
> +address of a pointer to a buffer, along with the bounds of the
> +buffer into a paging structure of extended bounds. Specifically
> +when storing extended bounds, the processor will perform address
> +translation of the address where the pointer is stored to an
> +address in the Bound Table (BT) to determine the store location
> +of extended bounds. Loading of an extended bounds performs the
> +reverse sequence.
> +
> +The structure in memory to load/store an extended bound is a
> +4-tuple consisting of lower bound, upper bound, pointer value
> +and a reserved field. Bound loads and stores access 32-bit or
> +64-bit operand size according to the operation mode. Thus,
> +a bound table entry is 4*32 bits in 32-bit mode and 4*64 bits
> +in 64-bit mode.
> +
> +The linear address of a bound table is stored in a Bound
> +Directory (BD) entry. And the linear address of the bound

The linear address

> +directory is derived from either BNDCFGU or BNDCFGS registers.
> +Bounds in memory are stored in Bound Tables (BT) as an extended
> +bound, which are accessed via Bound Directory (BD) and address
> +translation performed by BNDLDX/BNDSTX instructions.
> +
> +Bounds Directory (BD) and Bounds Tables (BT) are stored in
> +application memory and are allocated by the application (in case
> +of kernel use, the structures will be in kernel memory). The
> +bound directory and each instance of bound table are in contiguous
> +linear memory.
> +
> +XSAVE/XRESTOR Support of Intel MPX State
> +----------------------------------------
> +
> +Enabling Intel MPX requires an OS to manage two bits in XCR0:
> + - BNDREGS for saving and restoring registers BND0-BND3,
> + - BNDCSR for saving and restoring the user-mode configuration
> +(BNDCFGU) and the status register (BNDSTATUS).
> +
> +The reason for having two separate bits is that BND0-BND3 is

are

> +likely to be volatile state, while BNDCFGU and BNDSTATUS are not.
> +Therefore, an OS has flexibility in handling these two states
> +differently in saving or restoring them.
> +
> +For details about the Intel MPX instructions, see "Intel(R)
> +Architecture Instruction Set Extensions Programming Reference".
> +
> +
> +2. How to get the advantage of MPX

drop trailing space above.

> +==================================
> +
> +
> +To get the advantage of MPX, changes are required in
> +the OS kernel, binutils, compiler, system libraries support.

and system libraries support.

> +
> +MPX support in the GNU toolchain
> +--------------------------------
> +
> +This section describes changes in GNU Binutils, GCC and Glibc
> +to support MPX.
> +
> +The first step of MPX support is to implement support for new
> +hardware features in binutils and the GCC.
> +
> +The second step is implementation of MPX instrumentation pass
> +in the GCC compiler which is responsible for instrumenting all
> +memory accesses with pointer checks. Compiler changes for runtime
> +bound checks include:
> +
> + * Bounds creation for statically allocated objects, objects
> + allocated on the stack and statically initialized pointers.
> +
> + * MPX support in ABI: ABI extension allows passing bounds for
> + the pointers passed as function arguments and provide returned

provides

> + bounds with the pointers.
> +
> + * Bounds table content management: each pointer is stored into

that is stored into

> + the memory should have its bounds stored in the corresponding

drop "the":
memory should

> + row of the bounds table; compiler generates appropriate code
> + to have the bounds table in the consistent state.
> +
> + * Memory accesses instrumentation: compiler analyzes data flow
> + to compute bounds corresponding to each memory access and
> + inserts code to check used address against computed bounds.
> +
> +Dynamically created objects in heap using memory allocators need
> +to set bounds for objects (buffers) at allocation time. So the
> +next step is to add MPX support into standard memory allocators
> +in Glibc.
> +
> +To have the full protection, an application has to use libraries

drop "the" ^^^

> +compiled with MPX instrumentation. It means we had to compile
> +Glibc with the MPX enabled GCC compiler because it is used in

MPX-enabled

> +most applications. Also we had to add MPX instrumentation to all
> +the necessary Glibc routines (e.g. memcpy) written on assembler.

in

> +
> +New GCC option -fmpx is introduced to utilize MPX instructions.

A new GCC option

> +Also binutils with MPX enabled should be used to get binaries
> +with memory protection.
> +
> +Consider following simple test for MPX compiled program:

Consider the following

> +
> + int main(int argc, char* argv)
> + {
> + int buf[100];
> + return buf[argc];
> + }
> +
> +Snippet of the original assembler output (compiled with -O2):
> +
> + movslq %edi, %rdi
> + movl -120(%rsp,%rdi,4), %eax // memory access buf[argc]
> +
> +Compile test as follows: mpx-gcc/gcc test.c -fmpx -O2.

drop ending "."

> +
> +Resulted assembler snippet:
> +
> + movl $399, %edx // load array length to edx

Array length (in bytes) is 400, not 399, so this is more like
the upper bound of the buffer (array).

> + movslq %edi, %rdi // rdi contains value of argc
> + leaq -104(%rsp), %rax // load start address of buf to rax
> + bndmk (%rax,%rdx), %bnd0 // create bounds for buf
> + bndcl (%rax,%rdi,4), %bnd0 // check that memory access doesn't
> + // violate buf's low bound
> + bndcu 3(%rax,%rdi,4), %bnd0 // check that memory access doesn't
> + // violate buf's upper bound
> + movl -104(%rsp,%rdi,4), %eax // original memory access
> +
> +Code looks pretty clear. Note only that we added displacement 3 for
> +upper bound checking since we have 4 byte (integer) access here.
> +
> +Several MPX specific compiler options besides -fmpx were introduced

MPX-specific

> +in the compiler. Most of them, like -fmpx-check-read and
> +-fmpx-check-write, control number of inserted runtime bound checks.
> +Also developers always can use intrinsics to insert MPX instructions
> +manually.
> +
> +Currently GCC compiler sources with MPX support is available in a
> +separate branch in common GCC SVN repository. See GCC SVN page
> +(http://gcc.gnu.org/svn.html) for details.
> +
> +Currently no hardware with MPX ISA is available but it is always
> +possible to use SDE (Intel(R) software Development Emulator) instead,

Software (?)

> +which can be downloaded from
> +http://software.intel.com/en-us/articles/intel-software-development-emulator
> +
> +MPX runtime support
> +-------------------
> +
> +Enabling an application to use MPX will generally not require source
> +code updates but there is some runtime code needed in order to make
> +use of MPX. For most applications this runtime support will be available
> +by linking to a library supplied by the compiler or possibly it will
> +come directly from the OS once OS versions that support MPX are available.
> +
> +The runtime is responsible for configuring and enabling MPX. The
> +configuration and enabling of MPX consists of the runtime writing
> +the base address of the Bound Directory(BD) to the BNDCFGU register
> +and setting the enable bit.
> +
> +MPX kernel support
> +------------------
> +
> +MPX kernel code has mainly the following responsibilities.
> +
> +1) Providing handlers for bounds faults (#BR).
> +
> +When MPX is enabled, there are 2 new situations that can generate
> +#BR faults. If a bounds overflow occurs then a #BR is generated.
> +The fault handler will decode MPX instructions to get violation
> +address and set this address into extended struct siginfo.
> +
> +The other case that generates a #BR is when a BNDSTX instruction
> +attempts to save bounds to a BD entry marked as invalid. This is
> +an indication that no BT exists for this entry. In this case the
> +fault handler will allocate a new BT.
> +
> +2) Managing bounds memory.
> +
> +MPX defines 4 sets of bound registers. When an application needs
> +more than 4 sets of bounds it uses the BNDSTX instruction to save
> +the additional bounds out to memory. The kernel dynamically allocates
> +the memory used to store these bounds. The bounds memory is organized
> +into a 2 level structure consisting of a BD which contains pointers

2-level

> +to a set of Bound Tables (BT) which contain the actual bound information

information.

> +In order to minimize the Intel MPX memory usage the BTs are allocated
> +on demand by the Intel MPX runtime.
> +
> +Also the PR_MPX_INIT and PR_MPX_RELEASE prctl() commands are added
> +to init and release MPX related resource. A MMU notifier will be

initialize resources. An MMU


> +registered during PR_MPX_INIT command execution. So the bound tables
> +can be automatically deallocated when one memory area is unmapped.
>


--
~Randy

2014-01-26 03:20:43

by Ren Qiaowei

[permalink] [raw]
Subject: Re: [PATCH v3 1/4] x86, mpx: add documentation on Intel MPX

On 01/26/2014 11:06 AM, Randy Dunlap wrote:
> On 01/26/2014 01:08 AM, Qiaowei Ren wrote:
>> This patch adds the Documentation/x86/intel_mpx.txt file with some
>> information about Intel MPX.
>>
>> Signed-off-by: Qiaowei Ren <[email protected]>
>> ---
>> Documentation/x86/intel_mpx.txt | 226 +++++++++++++++++++++++++++++++++++++++
>> 1 files changed, 226 insertions(+), 0 deletions(-)
>> create mode 100644 Documentation/x86/intel_mpx.txt
>>
>> diff --git a/Documentation/x86/intel_mpx.txt b/Documentation/x86/intel_mpx.txt
>> new file mode 100644
>> index 0000000..052001c
>> --- /dev/null
>> +++ b/Documentation/x86/intel_mpx.txt
>> @@ -0,0 +1,226 @@
>> +1. Intel(R) MPX Overview
>> +========================
>> +
>> +
>> +Intel(R) Memory Protection Extensions (Intel(R) MPX) is a new
>> +capability introduced into Intel Architecture. Intel MPX provides
>> +hardware features that can be used in conjunction with compiler
>> +changes to check memory references, for those references whose
>> +compile-time normal intentions are usurped at runtime due to
>> +buffer overflow or underflow.
>> +
>> +Two of the most important goals of Intel MPX are to provide
>> +this capability at very low performance overhead for newly
>> +compiled code, and to provide compatibility mechanisms with
>> +legacy software components. MPX architecture is designed
>
> designed to
>
>> +allow a machine (i.e., the processor(s) and the OS software)
>> +to run both MPX enabled software and legacy software that
>> +is MPX unaware. In such a case, the legacy software does not
>> +benefit from MPX, but it also does not experience any change
>> +in functionality or reduction in performance.
>> +
>> +Intel(R) MPX Programming Model
>> +------------------------------
>> +
>> +Intel MPX introduces new registers and new instructions that
>> +operate on these registers. Some of the registers added are
>> +bounds registers which store a pointer's lower bound and upper
>> +bound limits. Whenever the pointer is used, the requested
>> +reference is checked against the pointer's associated bounds,
>> +thereby preventing out-of-bound memory access (such as buffer
>> +overflows and overruns). Out-of-bounds memory references
>> +initiate a #BR exception which can then be handled in an
>> +appropriate manner.
>> +
>> +Loading and Storing Bounds using Translation
>> +--------------------------------------------
>> +
>> +Intel MPX defines two instructions for load/store of the linear
>> +address of a pointer to a buffer, along with the bounds of the
>> +buffer into a paging structure of extended bounds. Specifically
>> +when storing extended bounds, the processor will perform address
>> +translation of the address where the pointer is stored to an
>> +address in the Bound Table (BT) to determine the store location
>> +of extended bounds. Loading of an extended bounds performs the
>> +reverse sequence.
>> +
>> +The structure in memory to load/store an extended bound is a
>> +4-tuple consisting of lower bound, upper bound, pointer value
>> +and a reserved field. Bound loads and stores access 32-bit or
>> +64-bit operand size according to the operation mode. Thus,
>> +a bound table entry is 4*32 bits in 32-bit mode and 4*64 bits
>> +in 64-bit mode.
>> +
>> +The linear address of a bound table is stored in a Bound
>> +Directory (BD) entry. And the linear address of the bound
>
> The linear address
>
>> +directory is derived from either BNDCFGU or BNDCFGS registers.
>> +Bounds in memory are stored in Bound Tables (BT) as an extended
>> +bound, which are accessed via Bound Directory (BD) and address
>> +translation performed by BNDLDX/BNDSTX instructions.
>> +
>> +Bounds Directory (BD) and Bounds Tables (BT) are stored in
>> +application memory and are allocated by the application (in case
>> +of kernel use, the structures will be in kernel memory). The
>> +bound directory and each instance of bound table are in contiguous
>> +linear memory.
>> +
>> +XSAVE/XRESTOR Support of Intel MPX State
>> +----------------------------------------
>> +
>> +Enabling Intel MPX requires an OS to manage two bits in XCR0:
>> + - BNDREGS for saving and restoring registers BND0-BND3,
>> + - BNDCSR for saving and restoring the user-mode configuration
>> +(BNDCFGU) and the status register (BNDSTATUS).
>> +
>> +The reason for having two separate bits is that BND0-BND3 is
>
> are
>
>> +likely to be volatile state, while BNDCFGU and BNDSTATUS are not.
>> +Therefore, an OS has flexibility in handling these two states
>> +differently in saving or restoring them.
>> +
>> +For details about the Intel MPX instructions, see "Intel(R)
>> +Architecture Instruction Set Extensions Programming Reference".
>> +
>> +
>> +2. How to get the advantage of MPX
>
> drop trailing space above.
>
>> +==================================
>> +
>> +
>> +To get the advantage of MPX, changes are required in
>> +the OS kernel, binutils, compiler, system libraries support.
>
> and system libraries support.
>
>> +
>> +MPX support in the GNU toolchain
>> +--------------------------------
>> +
>> +This section describes changes in GNU Binutils, GCC and Glibc
>> +to support MPX.
>> +
>> +The first step of MPX support is to implement support for new
>> +hardware features in binutils and the GCC.
>> +
>> +The second step is implementation of MPX instrumentation pass
>> +in the GCC compiler which is responsible for instrumenting all
>> +memory accesses with pointer checks. Compiler changes for runtime
>> +bound checks include:
>> +
>> + * Bounds creation for statically allocated objects, objects
>> + allocated on the stack and statically initialized pointers.
>> +
>> + * MPX support in ABI: ABI extension allows passing bounds for
>> + the pointers passed as function arguments and provide returned
>
> provides
>
>> + bounds with the pointers.
>> +
>> + * Bounds table content management: each pointer is stored into
>
> that is stored into
>
>> + the memory should have its bounds stored in the corresponding
>
> drop "the":
> memory should
>
>
Ok. Thanks for your review.

Thanks,
Qiaowei

2014-01-26 04:23:07

by David Rientjes

[permalink] [raw]
Subject: Re: [PATCH v3 4/4] x86, mpx: extend siginfo structure to include bound violation information

On Sun, 26 Jan 2014, Qiaowei Ren wrote:

> This patch adds new fields about bound violation into siginfo
> structure. si_lower and si_upper are respectively lower bound
> and upper bound when bound violation is caused.
>
> These fields will be set in #BR exception handler by decoding
> the user instruction and constructing the faulting pointer.
> A userspace application can get violation address, lower bound
> and upper bound for bound violation from this new siginfo structure.
>
> Signed-off-by: Qiaowei Ren <[email protected]>

Same 32-bit warnings I reported for v2:

arch/x86/kernel/mpx.c: In function ‘do_mpx_bounds’:
arch/x86/kernel/mpx.c:407:3: warning: cast to pointer from integer of different size [-Wint-to-pointer-cast]
arch/x86/kernel/mpx.c:409:3: warning: cast to pointer from integer of different size [-Wint-to-pointer-cast]

and the documentation says you explicitly want to support this config.

These types of warnings are usually indicative of real problems when
you're storing upper and lower bits in 32-bit fields after casting them
from 64-bit values.

I'm also not sure if the added fields to the generic struct siginfo can be
justified for this.

2014-01-26 04:45:33

by Ren Qiaowei

[permalink] [raw]
Subject: Re: [PATCH v3 4/4] x86, mpx: extend siginfo structure to include bound violation information

On 01/26/2014 12:22 PM, David Rientjes wrote:
> On Sun, 26 Jan 2014, Qiaowei Ren wrote:
>
>> This patch adds new fields about bound violation into siginfo
>> structure. si_lower and si_upper are respectively lower bound
>> and upper bound when bound violation is caused.
>>
>> These fields will be set in #BR exception handler by decoding
>> the user instruction and constructing the faulting pointer.
>> A userspace application can get violation address, lower bound
>> and upper bound for bound violation from this new siginfo structure.
>>
>> Signed-off-by: Qiaowei Ren <[email protected]>
>
> Same 32-bit warnings I reported for v2:
>
> arch/x86/kernel/mpx.c: In function ‘do_mpx_bounds’:
> arch/x86/kernel/mpx.c:407:3: warning: cast to pointer from integer of different size [-Wint-to-pointer-cast]
> arch/x86/kernel/mpx.c:409:3: warning: cast to pointer from integer of different size [-Wint-to-pointer-cast]
>
> and the documentation says you explicitly want to support this config.
>
> These types of warnings are usually indicative of real problems when
> you're storing upper and lower bits in 32-bit fields after casting them
> from 64-bit values.
>
> I'm also not sure if the added fields to the generic struct siginfo can be
> justified for this.
>
According to MPX spec, for 32-bit case, the upper 32-bits of 64-bits
bound register are ignored, and so casting to pointer from 64-bit values
should be not produce any problems.

Thanks,
Qiaowei

2014-01-26 08:19:17

by Ingo Molnar

[permalink] [raw]
Subject: Re: [PATCH v3 0/4] Intel MPX support


* Qiaowei Ren <[email protected]> wrote:

> This patchset adds support for the Memory Protection Extensions
> (MPX) feature found in future Intel processors.
>
> MPX can be used in conjunction with compiler changes to check memory
> references, for those references whose compile-time normal intentions
> are usurped at runtime due to buffer overflow or underflow.
>
> MPX provides this capability at very low performance overhead for
> newly compiled code, and provides compatibility mechanisms with legacy
> software components. MPX architecture is designed allow a machine to
> run both MPX enabled software and legacy software that is MPX unaware.
> In such a case, the legacy software does not benefit from MPX, but it
> also does not experience any change in functionality or reduction in
> performance.
>
> More information about Intel MPX can be found in "Intel(R) Architecture
> Instruction Set Extensions Programming Reference".
>
> To get the advantage of MPX, changes are required in the OS kernel,
> binutils, compiler, system libraries support.
>
> New GCC option -fmpx is introduced to utilize MPX instructions.
> Currently GCC compiler sources with MPX support is available in a
> separate branch in common GCC SVN repository. See GCC SVN page
> (http://gcc.gnu.org/svn.html) for details.
>
> To have the full protection, we had to add MPX instrumentation to all
> the necessary Glibc routines (e.g. memcpy) written on assembler, and
> compile Glibc with the MPX enabled GCC compiler. Currently MPX enabled
> Glibc source can be found in Glibc git repository.
>
> Enabling an application to use MPX will generally not require source
> code updates but there is some runtime code, which is responsible for
> configuring and enabling MPX, needed in order to make use of MPX.
> For most applications this runtime support will be available by linking
> to a library supplied by the compiler or possibly it will come directly
> from the OS once OS versions that support MPX are available.
>
> MPX kernel code, namely this patchset, has mainly the 2 responsibilities:
> provide handlers for bounds faults (#BR), and manage bounds memory.

AFAICS the kernel side implementation causes no runtime overhead for
non-MPX workloads, and also causes no runtime overhead for non-MPX
hardware, right?

> Currently no hardware with MPX ISA is available but it is always
> possible to use SDE (Intel(R) software Development Emulator) instead,
> which can be downloaded from
> http://software.intel.com/en-us/articles/intel-software-development-emulator
>
>
> Changes since v1:
> * check to see if #BR occurred in userspace or kernel space.
> * use generic structure and macro as much as possible when
> decode mpx instructions.
>
> Changes since v2:
> * fix some compile warnings.
> * update documentation.
>
> Qiaowei Ren (4):
> x86, mpx: add documentation on Intel MPX
> x86, mpx: hook #BR exception handler to allocate bound tables
> x86, mpx: add prctl commands PR_MPX_INIT, PR_MPX_RELEASE
> x86, mpx: extend siginfo structure to include bound violation
> information
>
> Documentation/x86/intel_mpx.txt | 226 ++++++++++++++++++++
> arch/x86/Kconfig | 4 +
> arch/x86/include/asm/mpx.h | 63 ++++++
> arch/x86/include/asm/processor.h | 16 ++
> arch/x86/kernel/Makefile | 1 +
> arch/x86/kernel/mpx.c | 415 ++++++++++++++++++++++++++++++++++++
> arch/x86/kernel/traps.c | 61 +++++-
> include/uapi/asm-generic/siginfo.h | 9 +-
> include/uapi/linux/prctl.h | 6 +
> kernel/signal.c | 4 +
> kernel/sys.c | 12 +
> 11 files changed, 815 insertions(+), 2 deletions(-)
> create mode 100644 Documentation/x86/intel_mpx.txt
> create mode 100644 arch/x86/include/asm/mpx.h
> create mode 100644 arch/x86/kernel/mpx.c

Ok, this summary was pretty good - it addresses my prior objections
wrt. submission quality. Once the details are fleshed out this sure
looks like a useful feature.

Thanks,

Ingo

2014-01-26 08:22:06

by Ingo Molnar

[permalink] [raw]
Subject: Re: [PATCH v3 3/4] x86, mpx: add prctl commands PR_MPX_INIT, PR_MPX_RELEASE


* Qiaowei Ren <[email protected]> wrote:

> This patch adds the PR_MPX_INIT and PR_MPX_RELEASE prctl()
> commands on the x86 platform. These commands can be used to
> init and release MPX related resource.
>
> A MMU notifier will be registered during PR_MPX_INIT
> command execution. So the bound tables can be automatically
> deallocated when one memory area is unmapped.
>
> Signed-off-by: Qiaowei Ren <[email protected]>
> ---
> arch/x86/Kconfig | 4 ++
> arch/x86/include/asm/mpx.h | 9 ++++
> arch/x86/include/asm/processor.h | 16 +++++++
> arch/x86/kernel/mpx.c | 84 ++++++++++++++++++++++++++++++++++++++
> include/uapi/linux/prctl.h | 6 +++
> kernel/sys.c | 12 +++++
> 6 files changed, 131 insertions(+), 0 deletions(-)

Hm. What is the expected typical frequency of these syscalls for a
larger application like a web browser? Only once per startup
typically, or will they be called more frequently?

Thanks,

Ingo

2014-01-26 08:26:12

by Ren Qiaowei

[permalink] [raw]
Subject: Re: [PATCH v3 0/4] Intel MPX support

On 01/26/2014 04:19 PM, Ingo Molnar wrote:
>
> * Qiaowei Ren <[email protected]> wrote:
>
>> This patchset adds support for the Memory Protection Extensions
>> (MPX) feature found in future Intel processors.
>>
>> MPX can be used in conjunction with compiler changes to check memory
>> references, for those references whose compile-time normal intentions
>> are usurped at runtime due to buffer overflow or underflow.
>>
>> MPX provides this capability at very low performance overhead for
>> newly compiled code, and provides compatibility mechanisms with legacy
>> software components. MPX architecture is designed allow a machine to
>> run both MPX enabled software and legacy software that is MPX unaware.
>> In such a case, the legacy software does not benefit from MPX, but it
>> also does not experience any change in functionality or reduction in
>> performance.
>>
>> More information about Intel MPX can be found in "Intel(R) Architecture
>> Instruction Set Extensions Programming Reference".
>>
>> To get the advantage of MPX, changes are required in the OS kernel,
>> binutils, compiler, system libraries support.
>>
>> New GCC option -fmpx is introduced to utilize MPX instructions.
>> Currently GCC compiler sources with MPX support is available in a
>> separate branch in common GCC SVN repository. See GCC SVN page
>> (http://gcc.gnu.org/svn.html) for details.
>>
>> To have the full protection, we had to add MPX instrumentation to all
>> the necessary Glibc routines (e.g. memcpy) written on assembler, and
>> compile Glibc with the MPX enabled GCC compiler. Currently MPX enabled
>> Glibc source can be found in Glibc git repository.
>>
>> Enabling an application to use MPX will generally not require source
>> code updates but there is some runtime code, which is responsible for
>> configuring and enabling MPX, needed in order to make use of MPX.
>> For most applications this runtime support will be available by linking
>> to a library supplied by the compiler or possibly it will come directly
>> from the OS once OS versions that support MPX are available.
>>
>> MPX kernel code, namely this patchset, has mainly the 2 responsibilities:
>> provide handlers for bounds faults (#BR), and manage bounds memory.
>
> AFAICS the kernel side implementation causes no runtime overhead for
> non-MPX workloads, and also causes no runtime overhead for non-MPX
> hardware, right?
>
Yes.

>> Currently no hardware with MPX ISA is available but it is always
>> possible to use SDE (Intel(R) software Development Emulator) instead,
>> which can be downloaded from
>> http://software.intel.com/en-us/articles/intel-software-development-emulator
>>
>>
>> Changes since v1:
>> * check to see if #BR occurred in userspace or kernel space.
>> * use generic structure and macro as much as possible when
>> decode mpx instructions.
>>
>> Changes since v2:
>> * fix some compile warnings.
>> * update documentation.
>>
>> Qiaowei Ren (4):
>> x86, mpx: add documentation on Intel MPX
>> x86, mpx: hook #BR exception handler to allocate bound tables
>> x86, mpx: add prctl commands PR_MPX_INIT, PR_MPX_RELEASE
>> x86, mpx: extend siginfo structure to include bound violation
>> information
>>
>> Documentation/x86/intel_mpx.txt | 226 ++++++++++++++++++++
>> arch/x86/Kconfig | 4 +
>> arch/x86/include/asm/mpx.h | 63 ++++++
>> arch/x86/include/asm/processor.h | 16 ++
>> arch/x86/kernel/Makefile | 1 +
>> arch/x86/kernel/mpx.c | 415 ++++++++++++++++++++++++++++++++++++
>> arch/x86/kernel/traps.c | 61 +++++-
>> include/uapi/asm-generic/siginfo.h | 9 +-
>> include/uapi/linux/prctl.h | 6 +
>> kernel/signal.c | 4 +
>> kernel/sys.c | 12 +
>> 11 files changed, 815 insertions(+), 2 deletions(-)
>> create mode 100644 Documentation/x86/intel_mpx.txt
>> create mode 100644 arch/x86/include/asm/mpx.h
>> create mode 100644 arch/x86/kernel/mpx.c
>
> Ok, this summary was pretty good - it addresses my prior objections
> wrt. submission quality. Once the details are fleshed out this sure
> looks like a useful feature.
>
Thanks. I apologize for previous submission.

Thanks,
Qiaowei

2014-01-26 08:29:06

by Ren Qiaowei

[permalink] [raw]
Subject: Re: [PATCH v3 3/4] x86, mpx: add prctl commands PR_MPX_INIT, PR_MPX_RELEASE

On 01/26/2014 04:22 PM, Ingo Molnar wrote:
>
> * Qiaowei Ren <[email protected]> wrote:
>
>> This patch adds the PR_MPX_INIT and PR_MPX_RELEASE prctl()
>> commands on the x86 platform. These commands can be used to
>> init and release MPX related resource.
>>
>> A MMU notifier will be registered during PR_MPX_INIT
>> command execution. So the bound tables can be automatically
>> deallocated when one memory area is unmapped.
>>
>> Signed-off-by: Qiaowei Ren <[email protected]>
>> ---
>> arch/x86/Kconfig | 4 ++
>> arch/x86/include/asm/mpx.h | 9 ++++
>> arch/x86/include/asm/processor.h | 16 +++++++
>> arch/x86/kernel/mpx.c | 84 ++++++++++++++++++++++++++++++++++++++
>> include/uapi/linux/prctl.h | 6 +++
>> kernel/sys.c | 12 +++++
>> 6 files changed, 131 insertions(+), 0 deletions(-)
>
> Hm. What is the expected typical frequency of these syscalls for a
> larger application like a web browser? Only once per startup
> typically, or will they be called more frequently?
>

It will be only once per startup.

Thanks,
Qiaowei

2014-01-26 08:39:47

by Ingo Molnar

[permalink] [raw]
Subject: Re: [PATCH v3 3/4] x86, mpx: add prctl commands PR_MPX_INIT, PR_MPX_RELEASE


* Ren Qiaowei <[email protected]> wrote:

> On 01/26/2014 04:22 PM, Ingo Molnar wrote:
> >
> >* Qiaowei Ren <[email protected]> wrote:
> >
> >>This patch adds the PR_MPX_INIT and PR_MPX_RELEASE prctl()
> >>commands on the x86 platform. These commands can be used to
> >>init and release MPX related resource.
> >>
> >>A MMU notifier will be registered during PR_MPX_INIT
> >>command execution. So the bound tables can be automatically
> >>deallocated when one memory area is unmapped.
> >>
> >>Signed-off-by: Qiaowei Ren <[email protected]>
> >>---
> >> arch/x86/Kconfig | 4 ++
> >> arch/x86/include/asm/mpx.h | 9 ++++
> >> arch/x86/include/asm/processor.h | 16 +++++++
> >> arch/x86/kernel/mpx.c | 84 ++++++++++++++++++++++++++++++++++++++
> >> include/uapi/linux/prctl.h | 6 +++
> >> kernel/sys.c | 12 +++++
> >> 6 files changed, 131 insertions(+), 0 deletions(-)
> >
> > Hm. What is the expected typical frequency of these syscalls for a
> > larger application like a web browser? Only once per startup
> > typically, or will they be called more frequently?
>
> It will be only once per startup.

In that case it would be more efficient to make this part of the
binary execution environment so that exec() sets it up automatically,
not a separate prctl() syscall.

Thanks,

Ingo

2014-01-26 09:08:20

by Ingo Molnar

[permalink] [raw]
Subject: Re: [PATCH v3 3/4] x86, mpx: add prctl commands PR_MPX_INIT, PR_MPX_RELEASE


* Qiaowei Ren <[email protected]> wrote:

> @@ -7,6 +9,88 @@
> #include <asm/fpu-internal.h>
> #include <asm/alternative.h>
>
> +static struct mmu_notifier mpx_mn;

> +static struct mmu_notifier_ops mpx_mmuops = {
> + .invalidate_range_end = mpx_invl_range_end,
> +};
> +
> +int mpx_init(struct task_struct *tsk)
> +{
> + if (!boot_cpu_has(X86_FEATURE_MPX))
> + return -EINVAL;
> +
> + /* register mmu_notifier to delallocate the bound tables */
> + mpx_mn.ops = &mpx_mmuops;
> + mmu_notifier_register(&mpx_mn, current->mm);

0)

I do think MPX should be supported by Linux, but this is the best
thing I can say about the code at the moment:

1)

The above MMU notifier bit is absolutely disgusting: it adds an
O(nr_mpx_tasks) overhead to every MMU operation!

MPX needs to be called from architecture methods. (And the MMU
notifier should probably be removed, it literally invites such abuse.)

2)

The whole MPX_INIT() macro wrappery is ugly beyond relief.

3)

The kernel/sys.c bits are x86-only, it probably won't even build on
other architectures.

4)

I also argue that MPX setup should be automatic through the ELF
loader:

- MPX setup could also be initiated through the ELF notes section or
so - similar to the executable bit stack attribute in ELF binaries.
(See PT_GNU_STACK handling in fs/binfmt_elf.c.)

- What is the typical life time of the bounds table? Does user-space
get access to it? I don't see where it's discoverable to
user-space. (for example for debuggers)

5)

Teardown can be done through regular munmap() if the notifier is
eliminated, so that step falls away as well.

6)

How many entries are in the bounds table? Because mpx_invl_range_end()
looks utterly unscalable if it has any size beyond 1-2 entries.

Thanks,

Ingo

2014-01-26 11:37:19

by Ren Qiaowei

[permalink] [raw]
Subject: RE: [PATCH v3 3/4] x86, mpx: add prctl commands PR_MPX_INIT, PR_MPX_RELEASE


> -----Original Message-----
> From: [email protected]
> [mailto:[email protected]] On Behalf Of Ingo Molnar
> Sent: Sunday, January 26, 2014 4:40 PM
> To: Ren, Qiaowei
> Cc: H. Peter Anvin; Thomas Gleixner; Ingo Molnar; [email protected];
> [email protected]; Peter Zijlstra
> Subject: Re: [PATCH v3 3/4] x86, mpx: add prctl commands PR_MPX_INIT,
> PR_MPX_RELEASE
>
>
> * Ren Qiaowei <[email protected]> wrote:
>
> > On 01/26/2014 04:22 PM, Ingo Molnar wrote:
> > >
> > >* Qiaowei Ren <[email protected]> wrote:
> > >
> > >>This patch adds the PR_MPX_INIT and PR_MPX_RELEASE prctl() commands
> > >>on the x86 platform. These commands can be used to init and release
> > >>MPX related resource.
> > >>
> > >>A MMU notifier will be registered during PR_MPX_INIT command
> > >>execution. So the bound tables can be automatically deallocated when
> > >>one memory area is unmapped.
> > >>
> > >>Signed-off-by: Qiaowei Ren <[email protected]>
> > >>---
> > >> arch/x86/Kconfig | 4 ++
> > >> arch/x86/include/asm/mpx.h | 9 ++++
> > >> arch/x86/include/asm/processor.h | 16 +++++++
> > >> arch/x86/kernel/mpx.c | 84
> ++++++++++++++++++++++++++++++++++++++
> > >> include/uapi/linux/prctl.h | 6 +++
> > >> kernel/sys.c | 12 +++++
> > >> 6 files changed, 131 insertions(+), 0 deletions(-)
> > >
> > > Hm. What is the expected typical frequency of these syscalls for a
> > > larger application like a web browser? Only once per startup
> > > typically, or will they be called more frequently?
> >
> > It will be only once per startup.
>
> In that case it would be more efficient to make this part of the binary execution
> environment so that exec() sets it up automatically, not a separate prctl()
> syscall.
>
Sorry, I guess what I said is not accurate. Normally it will be only once per startup. But user application maybe only partly want to use MPX, e.g. only one thread of process. For those cases, user application need to enable MPX for specific part of the code. So it is not enough to set it up automatically.

Thanks,
Qiaowei

2014-01-26 12:49:51

by Ren Qiaowei

[permalink] [raw]
Subject: RE: [PATCH v3 3/4] x86, mpx: add prctl commands PR_MPX_INIT, PR_MPX_RELEASE


> -----Original Message-----
> From: Ingo Molnar [mailto:[email protected]] On Behalf Of Ingo
> Molnar
> Sent: Sunday, January 26, 2014 5:08 PM
> To: Ren, Qiaowei
> Cc: H. Peter Anvin; Thomas Gleixner; Ingo Molnar; [email protected];
> [email protected]; Peter Zijlstra; Linus Torvalds; Andrew Morton
> Subject: Re: [PATCH v3 3/4] x86, mpx: add prctl commands PR_MPX_INIT,
> PR_MPX_RELEASE
>
>
> * Qiaowei Ren <[email protected]> wrote:
>
> > @@ -7,6 +9,88 @@
> > #include <asm/fpu-internal.h>
> > #include <asm/alternative.h>
> >
> > +static struct mmu_notifier mpx_mn;
>
> > +static struct mmu_notifier_ops mpx_mmuops = {
> > + .invalidate_range_end = mpx_invl_range_end, };
> > +
> > +int mpx_init(struct task_struct *tsk) {
> > + if (!boot_cpu_has(X86_FEATURE_MPX))
> > + return -EINVAL;
> > +
> > + /* register mmu_notifier to delallocate the bound tables */
> > + mpx_mn.ops = &mpx_mmuops;
> > + mmu_notifier_register(&mpx_mn, current->mm);
>
> 0)
>
> I do think MPX should be supported by Linux, but this is the best thing I can say
> about the code at the moment:
>
> 1)
>
> The above MMU notifier bit is absolutely disgusting: it adds an
> O(nr_mpx_tasks) overhead to every MMU operation!
>
> MPX needs to be called from architecture methods. (And the MMU notifier
> should probably be removed, it literally invites such abuse.)
>
> 2)
>
> The whole MPX_INIT() macro wrappery is ugly beyond relief.
>
> 3)
>
> The kernel/sys.c bits are x86-only, it probably won't even build on other
> architectures.
>
> 4)
>
> I also argue that MPX setup should be automatic through the ELF
> loader:
>
> - MPX setup could also be initiated through the ELF notes section or
> so - similar to the executable bit stack attribute in ELF binaries.
> (See PT_GNU_STACK handling in fs/binfmt_elf.c.)
>
> - What is the typical life time of the bounds table? Does user-space
> get access to it? I don't see where it's discoverable to
> user-space. (for example for debuggers)
>
> 5)
>
> Teardown can be done through regular munmap() if the notifier is eliminated,
> so that step falls away as well.
>
> 6)
>
> How many entries are in the bounds table? Because mpx_invl_range_end()
> looks utterly unscalable if it has any size beyond 1-2 entries.
>
The size of one bound table is 4M bytes for 64bit, and 16K bytes for 32bit. It can not be accessed by user-space, and it will be accessed automatically by hardware.

When #BR faults come, and the entry of the bound directory is invalid, one bound table have to be allocated to save value of the bounds. It is what the second patch does. As for the lifetime of the bound tables, we can see the following case:
User application use malloc or mmap to dynamically allocate memory. when address in the memory region is first accessed, related entry in the bound directory will be checked and it should be invalid. Then one new bound table will be allocated.
After a time, this memory area is released, and so the bound tables related with this memory area become meaningless and may be released. If we don't release these unnecessary bound tables, it will save the workload of allocation of new bound tables when the memory area related with the entry of the bound directory is allocated and accessed again. But in this way the loss of virtual space will have to be worried.

Anyway, what this patch does is only that when a piece of address space is unmapped, we destroy/unmap the bounds tables that correspond to that same address space. MMU notifier will add some overhead for those tasks which use MPX. But I have no better way to know when one address space is unmapped, and then destroy the related bounds tables. Do you have any suggestion?

Thanks,
Qiaowei

2014-01-26 15:14:27

by Ingo Molnar

[permalink] [raw]
Subject: Re: [PATCH v3 3/4] x86, mpx: add prctl commands PR_MPX_INIT, PR_MPX_RELEASE


* Ren, Qiaowei <[email protected]> wrote:

> The size of one bound table is 4M bytes for 64bit, and 16K bytes for
> 32bit. It can not be accessed by user-space, and it will be accessed
> automatically by hardware.

So, here's the bound-table allocation AFAICS:

+static bool allocate_bt(unsigned long bd_entry)
+{
+ unsigned long bt_size = 1UL << (MPX_L2_BITS+MPX_L2_SHIFT);
+ unsigned long bt_addr, old_val = 0;
+
+ bt_addr = sys_mmap_pgoff(0, bt_size, PROT_READ | PROT_WRITE,
+ MAP_ANONYMOUS | MAP_PRIVATE | MAP_POPULATE, -1, 0);

What ensures that user-space cannot access (and in particular, modify)
the pages at bt_addr? It's a read-write anonymous mapping AFAICS.

Thanks,

Ingo

2014-01-26 21:38:42

by David Rientjes

[permalink] [raw]
Subject: Re: [PATCH v3 4/4] x86, mpx: extend siginfo structure to include bound violation information

On Sun, 26 Jan 2014, Ren Qiaowei wrote:

> > arch/x86/kernel/mpx.c: In function ‘do_mpx_bounds’:
> > arch/x86/kernel/mpx.c:407:3: warning: cast to pointer from integer of
> > different size [-Wint-to-pointer-cast]
> > arch/x86/kernel/mpx.c:409:3: warning: cast to pointer from integer of
> > different size [-Wint-to-pointer-cast]
> >
> > and the documentation says you explicitly want to support this config.
> >
> > These types of warnings are usually indicative of real problems when
> > you're storing upper and lower bits in 32-bit fields after casting them
> > from 64-bit values.
> >
> > I'm also not sure if the added fields to the generic struct siginfo can be
> > justified for this.
> >
> According to MPX spec, for 32-bit case, the upper 32-bits of 64-bits bound
> register are ignored, and so casting to pointer from 64-bit values should be
> not produce any problems.
>

Ok, so this is intended per the spec which nobody reading the code is
going to know and people who report the compile warnings are going to
continue to question it.

How are you planning on suppressing the warnings? It will probably
require either

- separate 64-bit and 32-bit helper functions to do_mpx_bounds() to
do appropriate casts before casting to a pointer, or

- a macro defined as a no-op for 64-bit and as a cast to 32-bit value
for 32-bit configs that will be used in do_mpx_bounds() and casted
to the pointer.

2014-01-27 01:40:23

by Ren Qiaowei

[permalink] [raw]
Subject: Re: [PATCH v3 4/4] x86, mpx: extend siginfo structure to include bound violation information

On 01/27/2014 05:38 AM, David Rientjes wrote:
> On Sun, 26 Jan 2014, Ren Qiaowei wrote:
>
>>> arch/x86/kernel/mpx.c: In function ‘do_mpx_bounds’:
>>> arch/x86/kernel/mpx.c:407:3: warning: cast to pointer from integer of
>>> different size [-Wint-to-pointer-cast]
>>> arch/x86/kernel/mpx.c:409:3: warning: cast to pointer from integer of
>>> different size [-Wint-to-pointer-cast]
>>>
>>> and the documentation says you explicitly want to support this config.
>>>
>>> These types of warnings are usually indicative of real problems when
>>> you're storing upper and lower bits in 32-bit fields after casting them
>>> from 64-bit values.
>>>
>>> I'm also not sure if the added fields to the generic struct siginfo can be
>>> justified for this.
>>>
>> According to MPX spec, for 32-bit case, the upper 32-bits of 64-bits bound
>> register are ignored, and so casting to pointer from 64-bit values should be
>> not produce any problems.
>>
>
> Ok, so this is intended per the spec which nobody reading the code is
> going to know and people who report the compile warnings are going to
> continue to question it.
>
> How are you planning on suppressing the warnings? It will probably
> require either
>
> - separate 64-bit and 32-bit helper functions to do_mpx_bounds() to
> do appropriate casts before casting to a pointer, or
>
> - a macro defined as a no-op for 64-bit and as a cast to 32-bit value
> for 32-bit configs that will be used in do_mpx_bounds() and casted
> to the pointer.
>
I agree with you and we should suppress all the warnings as possible. If
I use (unsgined long) to cast like the following code, what do you think
about it? sizeof(long) will be 4 for 32-bit.

info->si_lower = (void __user *)(unsigned long)
(xsave_buf->bndregs.bndregs[2*bndregno]);
info->si_upper = (void __user *)(unsigned long)
(~xsave_buf->bndregs.bndregs[2*bndregno+1]);

Thanks,
Qiaowei

2014-01-27 01:50:58

by H. Peter Anvin

[permalink] [raw]
Subject: Re: [PATCH v3 3/4] x86, mpx: add prctl commands PR_MPX_INIT, PR_MPX_RELEASE

On 01/26/2014 12:39 AM, Ingo Molnar wrote:
>>
>> It will be only once per startup.
>
> In that case it would be more efficient to make this part of the
> binary execution environment so that exec() sets it up automatically,
> not a separate prctl() syscall.
>

This is not necessarily possible, and in particular it might need to be
deferred until the MPX runtime has initialized.

What isn't clear to me is if these syscalls are needed at all, or if it
would be better to just let the MPX runtile set BNDSTATUS and BNDCFGU
directly in userspace. The kernel cannot rely on them staying
consistent across userspace anyway.

Now, it might be beneficial for the kernel to have them anyway. It's a
bit of a tough call.

-hpa

2014-01-27 01:53:23

by H. Peter Anvin

[permalink] [raw]
Subject: Re: [PATCH v3 4/4] x86, mpx: extend siginfo structure to include bound violation information

On 01/26/2014 05:34 PM, Ren Qiaowei wrote:
>>
> I agree with you and we should suppress all the warnings as possible. If
> I use (unsgined long) to cast like the following code, what do you think
> about it? sizeof(long) will be 4 for 32-bit.
>
> info->si_lower = (void __user *)(unsigned long)
> (xsave_buf->bndregs.bndregs[2*bndregno]);
> info->si_upper = (void __user *)(unsigned long)
> (~xsave_buf->bndregs.bndregs[2*bndregno+1]);
>

That is the way it is usually done, yes. Add a comment saying something
like:

/* Note: the upper 32 bits are ignored in 32-bit mode. */

It is worth watching out a bit here, though, because you could be
running a 32-bit process on top of a 64-bit kernel.

-hpa

2014-01-27 02:01:27

by Ren Qiaowei

[permalink] [raw]
Subject: Re: [PATCH v3 3/4] x86, mpx: add prctl commands PR_MPX_INIT, PR_MPX_RELEASE

On 01/27/2014 09:50 AM, H. Peter Anvin wrote:
> On 01/26/2014 12:39 AM, Ingo Molnar wrote:
>>>
>>> It will be only once per startup.
>>
>> In that case it would be more efficient to make this part of the
>> binary execution environment so that exec() sets it up automatically,
>> not a separate prctl() syscall.
>>
>
> This is not necessarily possible, and in particular it might need to be
> deferred until the MPX runtime has initialized.
>
> What isn't clear to me is if these syscalls are needed at all, or if it
> would be better to just let the MPX runtile set BNDSTATUS and BNDCFGU
> directly in userspace. The kernel cannot rely on them staying
> consistent across userspace anyway.
>
> Now, it might be beneficial for the kernel to have them anyway. It's a
> bit of a tough call.
>
> -hpa
>
Peter, you mean we should remove these two call and do what they do in
user-space, right?

Thanks,
Qiaowei

2014-01-27 02:02:17

by Ren Qiaowei

[permalink] [raw]
Subject: Re: [PATCH v3 4/4] x86, mpx: extend siginfo structure to include bound violation information

On 01/27/2014 09:53 AM, H. Peter Anvin wrote:
> On 01/26/2014 05:34 PM, Ren Qiaowei wrote:
>>>
>> I agree with you and we should suppress all the warnings as possible. If
>> I use (unsgined long) to cast like the following code, what do you think
>> about it? sizeof(long) will be 4 for 32-bit.
>>
>> info->si_lower = (void __user *)(unsigned long)
>> (xsave_buf->bndregs.bndregs[2*bndregno]);
>> info->si_upper = (void __user *)(unsigned long)
>> (~xsave_buf->bndregs.bndregs[2*bndregno+1]);
>>
>
> That is the way it is usually done, yes. Add a comment saying something
> like:
>
> /* Note: the upper 32 bits are ignored in 32-bit mode. */
>
> It is worth watching out a bit here, though, because you could be
> running a 32-bit process on top of a 64-bit kernel.
>
Ok. I will update it in next version.

Thanks,
Qiaowei

2014-01-27 02:07:01

by Ren Qiaowei

[permalink] [raw]
Subject: Re: [PATCH v3 3/4] x86, mpx: add prctl commands PR_MPX_INIT, PR_MPX_RELEASE

On 01/26/2014 11:14 PM, Ingo Molnar wrote:
>
> * Ren, Qiaowei <[email protected]> wrote:
>
>> The size of one bound table is 4M bytes for 64bit, and 16K bytes for
>> 32bit. It can not be accessed by user-space, and it will be accessed
>> automatically by hardware.
>
> So, here's the bound-table allocation AFAICS:
>
> +static bool allocate_bt(unsigned long bd_entry)
> +{
> + unsigned long bt_size = 1UL << (MPX_L2_BITS+MPX_L2_SHIFT);
> + unsigned long bt_addr, old_val = 0;
> +
> + bt_addr = sys_mmap_pgoff(0, bt_size, PROT_READ | PROT_WRITE,
> + MAP_ANONYMOUS | MAP_PRIVATE | MAP_POPULATE, -1, 0);
>
> What ensures that user-space cannot access (and in particular, modify)
> the pages at bt_addr? It's a read-write anonymous mapping AFAICS.
>
Looks like that we can not be able to ensure this. I just mean that
user-space doesn't know the bound tables, and it should not access them
also.

Thanks,
Qiaowei

2014-01-27 02:10:29

by H. Peter Anvin

[permalink] [raw]
Subject: Re: [PATCH v3 3/4] x86, mpx: add prctl commands PR_MPX_INIT, PR_MPX_RELEASE

On 01/26/2014 05:55 PM, Ren Qiaowei wrote:
>
> Peter, you mean we should remove these two call and do what they do in
> user-space, right?
>

Unless we think there is a benefit to the kernel to have a on/off switch
for the #BR exception (if disabled, all #BR exceptions are signals,
regardless of source.)

There might be, would like other people's opinion.

-hpa

2014-01-27 02:22:28

by Ren Qiaowei

[permalink] [raw]
Subject: Re: [PATCH v3 3/4] x86, mpx: add prctl commands PR_MPX_INIT, PR_MPX_RELEASE

On 01/27/2014 10:10 AM, H. Peter Anvin wrote:
> On 01/26/2014 05:55 PM, Ren Qiaowei wrote:
>>
>> Peter, you mean we should remove these two call and do what they do in
>> user-space, right?
>>
>
> Unless we think there is a benefit to the kernel to have a on/off switch
> for the #BR exception (if disabled, all #BR exceptions are signals,
> regardless of source.)
>
> There might be, would like other people's opinion.
>

These two syscalls are only used to release automaticlly the bound
tables, and they will not set BNDSTATUS and BNDCFGU.

I want to remove them also. :) But if so, we have to release the bound
tables in user-space.

Thanks,
Qiaowei

2014-01-27 20:27:37

by Andy Lutomirski

[permalink] [raw]
Subject: Re: [PATCH v3 1/4] x86, mpx: add documentation on Intel MPX

On 01/26/2014 01:08 AM, Qiaowei Ren wrote:
> This patch adds the Documentation/x86/intel_mpx.txt file with some
> information about Intel MPX.
>
> Signed-off-by: Qiaowei Ren <[email protected]>
> ---
> Documentation/x86/intel_mpx.txt | 226 +++++++++++++++++++++++++++++++++++++++
> 1 files changed, 226 insertions(+), 0 deletions(-)
> create mode 100644 Documentation/x86/intel_mpx.txt
>
> diff --git a/Documentation/x86/intel_mpx.txt b/Documentation/x86/intel_mpx.txt
> new file mode 100644
> index 0000000..052001c
> --- /dev/null
> +++ b/Documentation/x86/intel_mpx.txt
> @@ -0,0 +1,226 @@
> +1. Intel(R) MPX Overview
> +========================
> +
> +
> +Intel(R) Memory Protection Extensions (Intel(R) MPX) is a new
> +capability introduced into Intel Architecture. Intel MPX provides
> +hardware features that can be used in conjunction with compiler
> +changes to check memory references, for those references whose
> +compile-time normal intentions are usurped at runtime due to
> +buffer overflow or underflow.
> +
> +Two of the most important goals of Intel MPX are to provide
> +this capability at very low performance overhead for newly
> +compiled code, and to provide compatibility mechanisms with
> +legacy software components. MPX architecture is designed
> +allow a machine (i.e., the processor(s) and the OS software)
> +to run both MPX enabled software and legacy software that
> +is MPX unaware. In such a case, the legacy software does not
> +benefit from MPX, but it also does not experience any change
> +in functionality or reduction in performance.
> +
> +Intel(R) MPX Programming Model
> +------------------------------
> +
> +Intel MPX introduces new registers and new instructions that
> +operate on these registers. Some of the registers added are
> +bounds registers which store a pointer's lower bound and upper
> +bound limits. Whenever the pointer is used, the requested
> +reference is checked against the pointer's associated bounds,
> +thereby preventing out-of-bound memory access (such as buffer
> +overflows and overruns). Out-of-bounds memory references
> +initiate a #BR exception which can then be handled in an
> +appropriate manner.
> +
> +Loading and Storing Bounds using Translation
> +--------------------------------------------
> +
> +Intel MPX defines two instructions for load/store of the linear
> +address of a pointer to a buffer, along with the bounds of the
> +buffer into a paging structure of extended bounds. Specifically
> +when storing extended bounds, the processor will perform address
> +translation of the address where the pointer is stored to an
> +address in the Bound Table (BT) to determine the store location
> +of extended bounds. Loading of an extended bounds performs the
> +reverse sequence.
> +
> +The structure in memory to load/store an extended bound is a
> +4-tuple consisting of lower bound, upper bound, pointer value
> +and a reserved field. Bound loads and stores access 32-bit or
> +64-bit operand size according to the operation mode. Thus,
> +a bound table entry is 4*32 bits in 32-bit mode and 4*64 bits
> +in 64-bit mode.
> +
> +The linear address of a bound table is stored in a Bound
> +Directory (BD) entry. And the linear address of the bound
> +directory is derived from either BNDCFGU or BNDCFGS registers.
> +Bounds in memory are stored in Bound Tables (BT) as an extended
> +bound, which are accessed via Bound Directory (BD) and address
> +translation performed by BNDLDX/BNDSTX instructions.
> +
> +Bounds Directory (BD) and Bounds Tables (BT) are stored in
> +application memory and are allocated by the application (in case
> +of kernel use, the structures will be in kernel memory). The
> +bound directory and each instance of bound table are in contiguous
> +linear memory.
> +
> +XSAVE/XRESTOR Support of Intel MPX State
> +----------------------------------------
> +
> +Enabling Intel MPX requires an OS to manage two bits in XCR0:
> + - BNDREGS for saving and restoring registers BND0-BND3,
> + - BNDCSR for saving and restoring the user-mode configuration
> +(BNDCFGU) and the status register (BNDSTATUS).
> +
> +The reason for having two separate bits is that BND0-BND3 is
> +likely to be volatile state, while BNDCFGU and BNDSTATUS are not.
> +Therefore, an OS has flexibility in handling these two states
> +differently in saving or restoring them.
> +
> +For details about the Intel MPX instructions, see "Intel(R)
> +Architecture Instruction Set Extensions Programming Reference".
> +
> +
> +2. How to get the advantage of MPX
> +==================================
> +
> +
> +To get the advantage of MPX, changes are required in
> +the OS kernel, binutils, compiler, system libraries support.
> +
> +MPX support in the GNU toolchain
> +--------------------------------
> +
> +This section describes changes in GNU Binutils, GCC and Glibc
> +to support MPX.
> +
> +The first step of MPX support is to implement support for new
> +hardware features in binutils and the GCC.
> +
> +The second step is implementation of MPX instrumentation pass
> +in the GCC compiler which is responsible for instrumenting all
> +memory accesses with pointer checks. Compiler changes for runtime
> +bound checks include:
> +
> + * Bounds creation for statically allocated objects, objects
> + allocated on the stack and statically initialized pointers.
> +
> + * MPX support in ABI: ABI extension allows passing bounds for
> + the pointers passed as function arguments and provide returned
> + bounds with the pointers.
> +
> + * Bounds table content management: each pointer is stored into
> + the memory should have its bounds stored in the corresponding
> + row of the bounds table; compiler generates appropriate code
> + to have the bounds table in the consistent state.
> +
> + * Memory accesses instrumentation: compiler analyzes data flow
> + to compute bounds corresponding to each memory access and
> + inserts code to check used address against computed bounds.
> +
> +Dynamically created objects in heap using memory allocators need
> +to set bounds for objects (buffers) at allocation time. So the
> +next step is to add MPX support into standard memory allocators
> +in Glibc.
> +
> +To have the full protection, an application has to use libraries
> +compiled with MPX instrumentation. It means we had to compile
> +Glibc with the MPX enabled GCC compiler because it is used in
> +most applications. Also we had to add MPX instrumentation to all
> +the necessary Glibc routines (e.g. memcpy) written on assembler.
> +
> +New GCC option -fmpx is introduced to utilize MPX instructions.
> +Also binutils with MPX enabled should be used to get binaries
> +with memory protection.
> +
> +Consider following simple test for MPX compiled program:
> +
> + int main(int argc, char* argv)
> + {
> + int buf[100];
> + return buf[argc];
> + }
> +
> +Snippet of the original assembler output (compiled with -O2):
> +
> + movslq %edi, %rdi
> + movl -120(%rsp,%rdi,4), %eax // memory access buf[argc]
> +
> +Compile test as follows: mpx-gcc/gcc test.c -fmpx -O2.
> +
> +Resulted assembler snippet:
> +
> + movl $399, %edx // load array length to edx
> + movslq %edi, %rdi // rdi contains value of argc
> + leaq -104(%rsp), %rax // load start address of buf to rax
> + bndmk (%rax,%rdx), %bnd0 // create bounds for buf
> + bndcl (%rax,%rdi,4), %bnd0 // check that memory access doesn't
> + // violate buf's low bound
> + bndcu 3(%rax,%rdi,4), %bnd0 // check that memory access doesn't
> + // violate buf's upper bound
> + movl -104(%rsp,%rdi,4), %eax // original memory access
> +
> +Code looks pretty clear. Note only that we added displacement 3 for
> +upper bound checking since we have 4 byte (integer) access here.

I know I'm off topic here, but is using MPX for a trivial case like this
actually faster than just comparing argc to 100 and branching to a fault
handler function on failure?

> +
> +Several MPX specific compiler options besides -fmpx were introduced
> +in the compiler. Most of them, like -fmpx-check-read and
> +-fmpx-check-write, control number of inserted runtime bound checks.
> +Also developers always can use intrinsics to insert MPX instructions
> +manually.
> +
> +Currently GCC compiler sources with MPX support is available in a
> +separate branch in common GCC SVN repository. See GCC SVN page
> +(http://gcc.gnu.org/svn.html) for details.
> +
> +Currently no hardware with MPX ISA is available but it is always
> +possible to use SDE (Intel(R) software Development Emulator) instead,
> +which can be downloaded from
> +http://software.intel.com/en-us/articles/intel-software-development-emulator
> +
> +MPX runtime support
> +-------------------
> +
> +Enabling an application to use MPX will generally not require source
> +code updates but there is some runtime code needed in order to make
> +use of MPX. For most applications this runtime support will be available
> +by linking to a library supplied by the compiler or possibly it will
> +come directly from the OS once OS versions that support MPX are available.
> +
> +The runtime is responsible for configuring and enabling MPX. The
> +configuration and enabling of MPX consists of the runtime writing
> +the base address of the Bound Directory(BD) to the BNDCFGU register
> +and setting the enable bit.
> +
> +MPX kernel support
> +------------------
> +
> +MPX kernel code has mainly the following responsibilities.
> +
> +1) Providing handlers for bounds faults (#BR).
> +
> +When MPX is enabled, there are 2 new situations that can generate
> +#BR faults. If a bounds overflow occurs then a #BR is generated.
> +The fault handler will decode MPX instructions to get violation
> +address and set this address into extended struct siginfo.

Can you document exactly where the insn address and pointer value end
up? (If it's in the (IMO hideous) cr2 field in ucontext, this needs to
be documented. If it's somewhere useful in siginfo, that should also be
documented to save people the time it takes to figure it out.)

2014-01-27 20:36:18

by Andy Lutomirski

[permalink] [raw]
Subject: Re: [PATCH v3 2/4] x86, mpx: hook #BR exception handler to allocate bound tables

On 01/26/2014 01:08 AM, Qiaowei Ren wrote:
> An access to an invalid bound directory entry will cause a #BR
> exception. This patch hook #BR exception handler to allocate
> one bound table and bind it with that buond directory entry.
>
> This will avoid the need of forwarding the #BR exception
> to the user space when bound directory has invalid entry.
>
> Signed-off-by: Qiaowei Ren <[email protected]>
> ---
> arch/x86/include/asm/mpx.h | 35 ++++++++++++++++++++++++++++
> arch/x86/kernel/Makefile | 1 +
> arch/x86/kernel/mpx.c | 44 +++++++++++++++++++++++++++++++++++
> arch/x86/kernel/traps.c | 55 +++++++++++++++++++++++++++++++++++++++++++-
> 4 files changed, 134 insertions(+), 1 deletions(-)
> create mode 100644 arch/x86/include/asm/mpx.h
> create mode 100644 arch/x86/kernel/mpx.c
>
> diff --git a/arch/x86/include/asm/mpx.h b/arch/x86/include/asm/mpx.h
> new file mode 100644
> index 0000000..d074153
> --- /dev/null
> +++ b/arch/x86/include/asm/mpx.h
> @@ -0,0 +1,35 @@
> +#ifndef _ASM_X86_MPX_H
> +#define _ASM_X86_MPX_H
> +
> +#include <linux/types.h>
> +#include <asm/ptrace.h>
> +
> +#ifdef CONFIG_X86_64
> +
> +#define MPX_L1_BITS 28
> +#define MPX_L1_SHIFT 3
> +#define MPX_L2_BITS 17
> +#define MPX_L2_SHIFT 5
> +#define MPX_IGN_BITS 3
> +#define MPX_L2_NODE_ADDR_MASK 0xfffffffffffffff8UL
> +
> +#define MPX_BNDSTA_ADDR_MASK 0xfffffffffffffffcUL
> +#define MPX_BNDCFG_ADDR_MASK 0xfffffffffffff000UL
> +
> +#else
> +
> +#define MPX_L1_BITS 20
> +#define MPX_L1_SHIFT 2
> +#define MPX_L2_BITS 10
> +#define MPX_L2_SHIFT 4
> +#define MPX_IGN_BITS 2
> +#define MPX_L2_NODE_ADDR_MASK 0xfffffffcUL
> +
> +#define MPX_BNDSTA_ADDR_MASK 0xfffffffcUL
> +#define MPX_BNDCFG_ADDR_MASK 0xfffff000UL
> +
> +#endif
> +
> +void do_mpx_bt_fault(struct xsave_struct *xsave_buf);
> +
> +#endif /* _ASM_X86_MPX_H */
> diff --git a/arch/x86/kernel/Makefile b/arch/x86/kernel/Makefile
> index cb648c8..becb970 100644
> --- a/arch/x86/kernel/Makefile
> +++ b/arch/x86/kernel/Makefile
> @@ -41,6 +41,7 @@ obj-$(CONFIG_PREEMPT) += preempt.o
>
> obj-y += process.o
> obj-y += i387.o xsave.o
> +obj-y += mpx.o
> obj-y += ptrace.o
> obj-$(CONFIG_X86_32) += tls.o
> obj-$(CONFIG_IA32_EMULATION) += tls.o
> diff --git a/arch/x86/kernel/mpx.c b/arch/x86/kernel/mpx.c
> new file mode 100644
> index 0000000..e055e0e
> --- /dev/null
> +++ b/arch/x86/kernel/mpx.c
> @@ -0,0 +1,44 @@
> +#include <linux/kernel.h>
> +#include <linux/syscalls.h>
> +#include <asm/processor.h>
> +#include <asm/mpx.h>
> +#include <asm/mman.h>
> +#include <asm/i387.h>
> +#include <asm/fpu-internal.h>
> +#include <asm/alternative.h>
> +
> +static bool allocate_bt(unsigned long bd_entry)
> +{
> + unsigned long bt_size = 1UL << (MPX_L2_BITS+MPX_L2_SHIFT);
> + unsigned long bt_addr, old_val = 0;
> +
> + bt_addr = sys_mmap_pgoff(0, bt_size, PROT_READ | PROT_WRITE,
> + MAP_ANONYMOUS | MAP_PRIVATE | MAP_POPULATE, -1, 0);
> + if (bt_addr == -1) {
> + pr_err("L2 Node Allocation Failed at L1 addr %lx\n",
> + bd_entry);
> + return false;
> + }
> + bt_addr = (bt_addr & MPX_L2_NODE_ADDR_MASK) | 0x01;
> +
> + user_atomic_cmpxchg_inatomic(&old_val,
> + (long __user *)bd_entry, 0, bt_addr);
> + if (old_val)
> + vm_munmap(bt_addr & MPX_L2_NODE_ADDR_MASK, bt_size);
> +
> + return true;
> +}

If anyone ever wants to use hugepages for the L2 tables, will this break
ABI? If so, can we get the ABI right to start with? (For example, this
could allocate 2MB blocks, clear MAP_POPULATE, and keep track of which
pages within the blocks are within use.)

> +
> +void do_mpx_bt_fault(struct xsave_struct *xsave_buf)
> +{
> + unsigned long status;
> + unsigned long bd_entry, bd_base;
> + unsigned long bd_size = 1UL << (MPX_L1_BITS+MPX_L1_SHIFT);
> +
> + bd_base = xsave_buf->bndcsr.cfg_reg_u & MPX_BNDCFG_ADDR_MASK;
> + status = xsave_buf->bndcsr.status_reg;
> +
> + bd_entry = status & MPX_BNDSTA_ADDR_MASK;
> + if ((bd_entry >= bd_base) && (bd_entry < bd_base + bd_size))
> + allocate_bt(bd_entry);

What happens if this fails? Retrying forever isn't very nice.

> +}
> diff --git a/arch/x86/kernel/traps.c b/arch/x86/kernel/traps.c
> index 57409f6..6b284a4 100644
> --- a/arch/x86/kernel/traps.c
> +++ b/arch/x86/kernel/traps.c
> @@ -59,6 +59,7 @@
> #include <asm/fixmap.h>
> #include <asm/mach_traps.h>
> #include <asm/alternative.h>
> +#include <asm/mpx.h>
>
> #ifdef CONFIG_X86_64
> #include <asm/x86_init.h>
> @@ -213,7 +214,6 @@ dotraplinkage void do_##name(struct pt_regs *regs, long error_code) \
>
> DO_ERROR_INFO(X86_TRAP_DE, SIGFPE, "divide error", divide_error, FPE_INTDIV, regs->ip )
> DO_ERROR (X86_TRAP_OF, SIGSEGV, "overflow", overflow )
> -DO_ERROR (X86_TRAP_BR, SIGSEGV, "bounds", bounds )
> DO_ERROR_INFO(X86_TRAP_UD, SIGILL, "invalid opcode", invalid_op, ILL_ILLOPN, regs->ip )
> DO_ERROR (X86_TRAP_OLD_MF, SIGFPE, "coprocessor segment overrun", coprocessor_segment_overrun )
> DO_ERROR (X86_TRAP_TS, SIGSEGV, "invalid TSS", invalid_TSS )
> @@ -263,6 +263,59 @@ dotraplinkage void do_double_fault(struct pt_regs *regs, long error_code)
> }
> #endif
>
> +dotraplinkage void do_bounds(struct pt_regs *regs, long error_code)
> +{
> + enum ctx_state prev_state;
> + unsigned long status;
> + struct xsave_struct *xsave_buf;
> + struct task_struct *tsk = current;
> +
> + prev_state = exception_enter();
> + if (notify_die(DIE_TRAP, "bounds", regs, error_code,
> + X86_TRAP_BR, SIGSEGV) == NOTIFY_STOP)
> + goto exit;
> + conditional_sti(regs);
> +
> + if (!user_mode(regs)) {
> + if (!fixup_exception(regs)) {
> + tsk->thread.error_code = error_code;
> + tsk->thread.trap_nr = X86_TRAP_BR;
> + die("bounds", regs, error_code);
> + }

Why the fixup? Unless I'm missing something, the kernel has no business
getting #BR on access to a user address.

Or are you adding code to allow the kernel to use MPX itself? If so,
shouldn't this use an MPX-specific fixup to allow normal C code to use
this stuff?

> + goto exit;
> + }
> +
> + if (!boot_cpu_has(X86_FEATURE_MPX)) {
> + do_trap(X86_TRAP_BR, SIGSEGV, "bounds", regs, error_code, NULL);
> + goto exit;

This, as well as the status == 0 case, should probably document that the
exception is from BOUND, not MPX.

> + }
> +
> + fpu_xsave(&tsk->thread.fpu);
> + xsave_buf = &(tsk->thread.fpu.state->xsave);
> + status = xsave_buf->bndcsr.status_reg;
> +
> + switch (status & 0x3) {
> + case 2:
> + /*
> + * Bound directory has invalid entry.
> + * No signal will be sent to the user space.
> + */
> + do_mpx_bt_fault(xsave_buf);

If mmap fails, this should presumably generate SIGBUS. (See above).

> + break;
> +
> + case 1: /* Bound violation. */
> + case 0: /* No MPX exception. */
> + do_trap(X86_TRAP_BR, SIGSEGV, "bounds", regs, error_code, NULL);
> + break;

What does "No Intel MPX exception" mean? Surely that has business
sending #BR.

> +
> + default:
> + break;

What does status 3 mean? The docs say "reserved". Presumably this
should log and kill the process.

In general, should this function decode BNDSTATUS and write the output
into siginfo somewhere useful?

--Andy

2014-01-27 20:59:33

by Andy Lutomirski

[permalink] [raw]
Subject: Re: [PATCH v3 3/4] x86, mpx: add prctl commands PR_MPX_INIT, PR_MPX_RELEASE

On 01/26/2014 01:08 AM, Qiaowei Ren wrote:
> This patch adds the PR_MPX_INIT and PR_MPX_RELEASE prctl()
> commands on the x86 platform. These commands can be used to
> init and release MPX related resource.
>
> A MMU notifier will be registered during PR_MPX_INIT
> command execution. So the bound tables can be automatically
> deallocated when one memory area is unmapped.
>
> Signed-off-by: Qiaowei Ren <[email protected]>
> ---
> arch/x86/Kconfig | 4 ++
> arch/x86/include/asm/mpx.h | 9 ++++
> arch/x86/include/asm/processor.h | 16 +++++++
> arch/x86/kernel/mpx.c | 84 ++++++++++++++++++++++++++++++++++++++
> include/uapi/linux/prctl.h | 6 +++
> kernel/sys.c | 12 +++++
> 6 files changed, 131 insertions(+), 0 deletions(-)
>
> diff --git a/arch/x86/Kconfig b/arch/x86/Kconfig
> index cd18b83..28916e1 100644
> --- a/arch/x86/Kconfig
> +++ b/arch/x86/Kconfig
> @@ -236,6 +236,10 @@ config HAVE_INTEL_TXT
> def_bool y
> depends on INTEL_IOMMU && ACPI
>
> +config HAVE_INTEL_MPX
> + def_bool y
> + select MMU_NOTIFIER
> +
> config X86_32_SMP
> def_bool y
> depends on X86_32 && SMP
> diff --git a/arch/x86/include/asm/mpx.h b/arch/x86/include/asm/mpx.h
> index d074153..9652e9e 100644
> --- a/arch/x86/include/asm/mpx.h
> +++ b/arch/x86/include/asm/mpx.h
> @@ -30,6 +30,15 @@
>
> #endif
>
> +typedef union {
> + struct {
> + unsigned long ignored:MPX_IGN_BITS;
> + unsigned long l2index:MPX_L2_BITS;
> + unsigned long l1index:MPX_L1_BITS;
> + };
> + unsigned long addr;
> +} mpx_addr;
> +
> void do_mpx_bt_fault(struct xsave_struct *xsave_buf);
>
> #endif /* _ASM_X86_MPX_H */
> diff --git a/arch/x86/include/asm/processor.h b/arch/x86/include/asm/processor.h
> index fdedd38..5962413 100644
> --- a/arch/x86/include/asm/processor.h
> +++ b/arch/x86/include/asm/processor.h
> @@ -943,6 +943,22 @@ extern void start_thread(struct pt_regs *regs, unsigned long new_ip,
> extern int get_tsc_mode(unsigned long adr);
> extern int set_tsc_mode(unsigned int val);
>
> +#ifdef CONFIG_HAVE_INTEL_MPX
> +
> +/* Init/release a process' MPX related resource */
> +#define MPX_INIT(tsk) mpx_init((tsk))
> +#define MPX_RELEASE(tsk) mpx_release((tsk))
> +
> +extern int mpx_init(struct task_struct *tsk);
> +extern int mpx_release(struct task_struct *tsk);
> +
> +#else /* CONFIG_HAVE_INTEL_MPX */
> +
> +#define MPX_INIT(tsk) (-EINVAL)
> +#define MPX_RELEASE(tsk) (-EINVAL)
> +
> +#endif /* CONFIG_HAVE_INTEL_MPX */
> +
> extern u16 amd_get_nb_id(int cpu);
>
> static inline uint32_t hypervisor_cpuid_base(const char *sig, uint32_t leaves)
> diff --git a/arch/x86/kernel/mpx.c b/arch/x86/kernel/mpx.c
> index e055e0e..9e91178 100644
> --- a/arch/x86/kernel/mpx.c
> +++ b/arch/x86/kernel/mpx.c
> @@ -1,5 +1,7 @@
> #include <linux/kernel.h>
> #include <linux/syscalls.h>
> +#include <linux/prctl.h>
> +#include <linux/mmu_notifier.h>
> #include <asm/processor.h>
> #include <asm/mpx.h>
> #include <asm/mman.h>
> @@ -7,6 +9,88 @@
> #include <asm/fpu-internal.h>
> #include <asm/alternative.h>
>
> +static struct mmu_notifier mpx_mn;
> +
> +static void mpx_invl_range_end(struct mmu_notifier *mn,
> + struct mm_struct *mm,
> + unsigned long start, unsigned long end)
> +{
> + struct xsave_struct *xsave_buf;
> + unsigned long bt_size = 1UL << (MPX_L2_BITS+MPX_L2_SHIFT);
> + unsigned long bt_addr;
> + unsigned long bd_base;
> + unsigned long bd_entry, bde_start, bde_end;
> + mpx_addr lap;
> +
> + pgd_t *pgd;
> + pud_t *pud;
> + pmd_t *pmd;
> + pte_t *pte;
> +
> + /* ignore swap notifications */
> + pgd = pgd_offset(mm, start);
> + pud = pud_offset(pgd, start);
> + pmd = pmd_offset(pud, start);
> + pte = pte_offset_kernel(pmd, start);
> + if (!pte_present(*pte) && !pte_none(*pte) && !pte_file(*pte))
> + return;
> +
> + /* get bound directory base address */
> + fpu_xsave(&current->thread.fpu);

I may be wrong, but I think that this will corrupt things if you call it
from within kernel_fpu_begin.

One of these days I think we should just start unconditionally doing
xsaveopt on kernel entry.

> + xsave_buf = &(current->thread.fpu.state->xsave);
> + bd_base = xsave_buf->bndcsr.cfg_reg_u & MPX_BNDCFG_ADDR_MASK;
> +
> + /* get related bde range */
> + lap.addr = start;
> + bde_start = bd_base + (lap.l1index << MPX_L1_SHIFT);
> +
> + lap.addr = end;
> + if (lap.ignored || lap.l2index)
> + bde_end = bd_base + (lap.l1index<<MPX_L1_SHIFT) + sizeof(long);
> + else
> + bde_end = bd_base + (lap.l1index<<MPX_L1_SHIFT);
> +
> + for (bd_entry = bde_start; bd_entry < bde_end;
> + bd_entry += sizeof(unsigned long)) {
> +
> + if (get_user(bt_addr, (long __user *)bd_entry))
> + return;
> +
> + if (bt_addr & 0x1) {
> + user_atomic_cmpxchg_inatomic(&bt_addr,
> + (long __user *)bd_entry, bt_addr, 0);

Shouldn't this check whether the cmpxchg worked?

> + do_munmap(mm, bt_addr & MPX_L2_NODE_ADDR_MASK, bt_size);

Is it safe for munmap to recurse like this?

> + }
> + }
> +}
> +
> +static struct mmu_notifier_ops mpx_mmuops = {
> + .invalidate_range_end = mpx_invl_range_end,
> +};
> +
> +int mpx_init(struct task_struct *tsk)
> +{
> + if (!boot_cpu_has(X86_FEATURE_MPX))
> + return -EINVAL;
> +
> + /* register mmu_notifier to delallocate the bound tables */
> + mpx_mn.ops = &mpx_mmuops;
> + mmu_notifier_register(&mpx_mn, current->mm);

What happens if evil userspace runs this in a loop?

FWIW, I'm at a loss to understand why the mmu_notifier code works if
there are multiple struct mmu_notifiers in play.

> +
> + return 0;
> +}
> +
> +int mpx_release(struct task_struct *tsk)
> +{
> + if (!boot_cpu_has(X86_FEATURE_MPX))
> + return -EINVAL;
> +
> + /* unregister mmu_notifier */
> + mmu_notifier_unregister(&mpx_mn, current->mm);
> +
> + return 0;
> +}
> +
> static bool allocate_bt(unsigned long bd_entry)
> {
> unsigned long bt_size = 1UL << (MPX_L2_BITS+MPX_L2_SHIFT);
> diff --git a/include/uapi/linux/prctl.h b/include/uapi/linux/prctl.h
> index 289760f..19ab881 100644
> --- a/include/uapi/linux/prctl.h
> +++ b/include/uapi/linux/prctl.h
> @@ -149,4 +149,10 @@
>
> #define PR_GET_TID_ADDRESS 40
>
> +/*
> + * Init/release MPX related resource.
> + */
> +#define PR_MPX_INIT 41
> +#define PR_MPX_RELEASE 42
> +
> #endif /* _LINUX_PRCTL_H */
> diff --git a/kernel/sys.c b/kernel/sys.c
> index c723113..0334d03 100644
> --- a/kernel/sys.c
> +++ b/kernel/sys.c
> @@ -91,6 +91,12 @@
> #ifndef SET_TSC_CTL
> # define SET_TSC_CTL(a) (-EINVAL)
> #endif
> +#ifndef MPX_INIT
> +# define MPX_INIT(a) (-EINVAL)
> +#endif
> +#ifndef MPX_RELEASE
> +# define MPX_RELEASE(a) (-EINVAL)
> +#endif
>
> /*
> * this is where the system-wide overflow UID and GID are defined, for
> @@ -1998,6 +2004,12 @@ SYSCALL_DEFINE5(prctl, int, option, unsigned long, arg2, unsigned long, arg3,
> if (arg2 || arg3 || arg4 || arg5)
> return -EINVAL;
> return current->no_new_privs ? 1 : 0;
> + case PR_MPX_INIT:
> + error = MPX_INIT(me);
> + break;
> + case PR_MPX_RELEASE:
> + error = MPX_RELEASE(me);
> + break;

IMO this is ugly. I'd personally rather see the CONFIG_ checks in here.

Also, usual nit: please return -EINVAL if any of the unused arguments
are zero.

Who allocates the original L1 table? Userspace or the kernel? If
userspace, what happens if there are multiple userspace things trying to
manage this? Is configuredness reference-counted?

--Andy

2014-01-27 21:54:06

by Andy Lutomirski

[permalink] [raw]
Subject: Re: [PATCH v3 3/4] x86, mpx: add prctl commands PR_MPX_INIT, PR_MPX_RELEASE

On 01/26/2014 06:10 PM, H. Peter Anvin wrote:
> On 01/26/2014 05:55 PM, Ren Qiaowei wrote:
>>
>> Peter, you mean we should remove these two call and do what they do in
>> user-space, right?
>>
>
> Unless we think there is a benefit to the kernel to have a on/off switch
> for the #BR exception (if disabled, all #BR exceptions are signals,
> regardless of source.)

Yes.

For example, wouldn't UML want to have all of this stuff disabled?
Presumably it would much prefer to receive the exception directly.

The same goes for seccomp users -- as it currently stands, this code
allows mmap without a system call.

This probably means that the prctl should (optionally) take a parameter
that fixes the address of the L1 table -- seccomp users would probably
want that. (Actually, everyone might -- this is going to have weird
results if the L1 table moves.)

--Andy

>
> There might be, would like other people's opinion.
>
> -hpa
>
>

2014-01-27 21:58:23

by Andy Lutomirski

[permalink] [raw]
Subject: Re: [PATCH v3 4/4] x86, mpx: extend siginfo structure to include bound violation information

On 01/26/2014 01:08 AM, Qiaowei Ren wrote:
> This patch adds new fields about bound violation into siginfo
> structure. si_lower and si_upper are respectively lower bound
> and upper bound when bound violation is caused.
>
> These fields will be set in #BR exception handler by decoding
> the user instruction and constructing the faulting pointer.
> A userspace application can get violation address, lower bound
> and upper bound for bound violation from this new siginfo structure.
>
> Signed-off-by: Qiaowei Ren <[email protected]>
> ---
> arch/x86/include/asm/mpx.h | 19 +++
> arch/x86/kernel/mpx.c | 287 ++++++++++++++++++++++++++++++++++++
> arch/x86/kernel/traps.c | 6 +
> include/uapi/asm-generic/siginfo.h | 9 +-
> kernel/signal.c | 4 +
> 5 files changed, 324 insertions(+), 1 deletions(-)
>
> diff --git a/arch/x86/include/asm/mpx.h b/arch/x86/include/asm/mpx.h
> index 9652e9e..e099573 100644
> --- a/arch/x86/include/asm/mpx.h
> +++ b/arch/x86/include/asm/mpx.h
> @@ -3,6 +3,7 @@
>
> #include <linux/types.h>
> #include <asm/ptrace.h>
> +#include <asm/insn.h>
>
> #ifdef CONFIG_X86_64
>
> @@ -30,6 +31,22 @@
>
> #endif
>
> +struct mpx_insn {
> + struct insn_field rex_prefix; /* REX prefix */
> + struct insn_field modrm;
> + struct insn_field sib;
> + struct insn_field displacement;
> +
> + unsigned char addr_bytes; /* effective address size */
> + unsigned char limit;
> + unsigned char x86_64;
> +
> + const unsigned char *kaddr; /* kernel address of insn to analyze */
> + const unsigned char *next_byte;
> +};
> +
> +#define MAX_MPX_INSN_SIZE 15
> +
> typedef union {
> struct {
> unsigned long ignored:MPX_IGN_BITS;
> @@ -40,5 +57,7 @@ typedef union {
> } mpx_addr;
>
> void do_mpx_bt_fault(struct xsave_struct *xsave_buf);
> +void do_mpx_bounds(struct pt_regs *regs, siginfo_t *info,
> + struct xsave_struct *xsave_buf);
>
> #endif /* _ASM_X86_MPX_H */
> diff --git a/arch/x86/kernel/mpx.c b/arch/x86/kernel/mpx.c
> index 9e91178..983abf7 100644
> --- a/arch/x86/kernel/mpx.c
> +++ b/arch/x86/kernel/mpx.c
> @@ -91,6 +91,269 @@ int mpx_release(struct task_struct *tsk)
> return 0;
> }
>
> +typedef enum {REG_TYPE_RM, REG_TYPE_INDEX, REG_TYPE_BASE} reg_type_t;
> +static unsigned long get_reg(struct mpx_insn *insn, struct pt_regs *regs,
> + reg_type_t type)
> +{
> + int regno = 0;
> + unsigned char modrm = (unsigned char)insn->modrm.value;
> + unsigned char sib = (unsigned char)insn->sib.value;
> +
> + static const int regoff[] = {
> + offsetof(struct pt_regs, ax),
> + offsetof(struct pt_regs, cx),
> + offsetof(struct pt_regs, dx),
> + offsetof(struct pt_regs, bx),
> + offsetof(struct pt_regs, sp),
> + offsetof(struct pt_regs, bp),
> + offsetof(struct pt_regs, si),
> + offsetof(struct pt_regs, di),
> +#ifdef CONFIG_X86_64
> + offsetof(struct pt_regs, r8),
> + offsetof(struct pt_regs, r9),
> + offsetof(struct pt_regs, r10),
> + offsetof(struct pt_regs, r11),
> + offsetof(struct pt_regs, r12),
> + offsetof(struct pt_regs, r13),
> + offsetof(struct pt_regs, r14),
> + offsetof(struct pt_regs, r15),
> +#endif
> + };
> +
> + switch (type) {
> + case REG_TYPE_RM:
> + regno = X86_MODRM_RM(modrm);
> + if (X86_REX_B(insn->rex_prefix.value) == 1)
> + regno += 8;
> + break;
> +
> + case REG_TYPE_INDEX:
> + regno = X86_SIB_INDEX(sib);
> + if (X86_REX_X(insn->rex_prefix.value) == 1)
> + regno += 8;
> + break;
> +
> + case REG_TYPE_BASE:
> + regno = X86_SIB_BASE(sib);
> + if (X86_REX_B(insn->rex_prefix.value) == 1)
> + regno += 8;
> + break;
> +
> + default:
> + break;
> + }
> +
> + return regs_get_register(regs, regoff[regno]);
> +}

This (and the rest of the decoder) is IMO hideous. Is there any reason
that this belongs in the kernel and not in, say, a libmpx?

(Why on earth does Intel not expose this stuff in cr2 or an MSR or
something?)

--Andy

2014-01-27 22:01:46

by H. Peter Anvin

[permalink] [raw]
Subject: Re: [PATCH v3 3/4] x86, mpx: add prctl commands PR_MPX_INIT, PR_MPX_RELEASE

On 01/27/2014 01:54 PM, Andy Lutomirski wrote:
> On 01/26/2014 06:10 PM, H. Peter Anvin wrote:
>> On 01/26/2014 05:55 PM, Ren Qiaowei wrote:
>>>
>>> Peter, you mean we should remove these two call and do what they do in
>>> user-space, right?
>>>
>>
>> Unless we think there is a benefit to the kernel to have a on/off switch
>> for the #BR exception (if disabled, all #BR exceptions are signals,
>> regardless of source.)
>
> Yes.
>
> For example, wouldn't UML want to have all of this stuff disabled?
> Presumably it would much prefer to receive the exception directly.
>
> The same goes for seccomp users -- as it currently stands, this code
> allows mmap without a system call.
>
> This probably means that the prctl should (optionally) take a parameter
> that fixes the address of the L1 table -- seccomp users would probably
> want that. (Actually, everyone might -- this is going to have weird
> results if the L1 table moves.)
>

That seems like a good argument. If the table has been moved by
userspace we can deliver the signal as a violation.

-hpa

2014-01-28 02:48:49

by Ren Qiaowei

[permalink] [raw]
Subject: Re: [PATCH v3 4/4] x86, mpx: extend siginfo structure to include bound violation information

On 01/28/2014 05:58 AM, Andy Lutomirski wrote:
> On 01/26/2014 01:08 AM, Qiaowei Ren wrote:
> (Why on earth does Intel not expose this stuff in cr2 or an MSR or
> something?)
>

I guess it is due to some design reason.

Thanks,
Qiaowei

2014-01-28 03:40:57

by Ren Qiaowei

[permalink] [raw]
Subject: Re: [PATCH v3 2/4] x86, mpx: hook #BR exception handler to allocate bound tables

On 01/28/2014 04:36 AM, Andy Lutomirski wrote:
>> + bd_entry = status & MPX_BNDSTA_ADDR_MASK;
>> + if ((bd_entry >= bd_base) && (bd_entry < bd_base + bd_size))
>> + allocate_bt(bd_entry);
>
> What happens if this fails? Retrying forever isn't very nice.
>
If allocation of the bound table fail, the related entry in the bound
directory is still invalid. The following access to this entry still
produce #BR fault.

>> + if (!user_mode(regs)) {
>> + if (!fixup_exception(regs)) {
>> + tsk->thread.error_code = error_code;
>> + tsk->thread.trap_nr = X86_TRAP_BR;
>> + die("bounds", regs, error_code);
>> + }
>
> Why the fixup? Unless I'm missing something, the kernel has no business
> getting #BR on access to a user address.
>
> Or are you adding code to allow the kernel to use MPX itself? If so,
> shouldn't this use an MPX-specific fixup to allow normal C code to use
> this stuff?
>
It checks whether #BR come from user-space. You can see do_trap_no_signal().

>> + goto exit;
>> + }
>> +
>> + if (!boot_cpu_has(X86_FEATURE_MPX)) {
>> + do_trap(X86_TRAP_BR, SIGSEGV, "bounds", regs, error_code, NULL);
>> + goto exit;
>
> This, as well as the status == 0 case, should probably document that the
> exception is from BOUND, not MPX.
>
Ok. I will add one comment for this.


>> + break;
>> +
>> + case 1: /* Bound violation. */
>> + case 0: /* No MPX exception. */
>> + do_trap(X86_TRAP_BR, SIGSEGV, "bounds", regs, error_code, NULL);
>> + break;
>
> What does "No Intel MPX exception" mean? Surely that has business
> sending #BR.
>
Oh. It comes from spec, and just mean it is not from MPX. :) I will
change it to be accurate.

>> +
>> + default:
>> + break;
>
> What does status 3 mean? The docs say "reserved". Presumably this
> should log and kill the process.
I guess it should be a good suggestion.

Thanks,
Qiaowei

2014-01-28 03:46:28

by Ren Qiaowei

[permalink] [raw]
Subject: Re: [PATCH v3 1/4] x86, mpx: add documentation on Intel MPX

On 01/28/2014 04:27 AM, Andy Lutomirski wrote:
> On 01/26/2014 01:08 AM, Qiaowei Ren wrote:
>> +1) Providing handlers for bounds faults (#BR).
>> +
>> +When MPX is enabled, there are 2 new situations that can generate
>> +#BR faults. If a bounds overflow occurs then a #BR is generated.
>> +The fault handler will decode MPX instructions to get violation
>> +address and set this address into extended struct siginfo.
>
> Can you document exactly where the insn address and pointer value end
> up? (If it's in the (IMO hideous) cr2 field in ucontext, this needs to
> be documented. If it's somewhere useful in siginfo, that should also be
> documented to save people the time it takes to figure it out.)
>
Ok. I will describe extended siginfo at this documentation.

Thanks,
Qiaowei

2014-01-28 05:22:07

by Andy Lutomirski

[permalink] [raw]
Subject: Re: [PATCH v3 2/4] x86, mpx: hook #BR exception handler to allocate bound tables

On Mon, Jan 27, 2014 at 7:35 PM, Ren Qiaowei <[email protected]> wrote:
> On 01/28/2014 04:36 AM, Andy Lutomirski wrote:
>>>
>>> + bd_entry = status & MPX_BNDSTA_ADDR_MASK;
>>> + if ((bd_entry >= bd_base) && (bd_entry < bd_base + bd_size))
>>> + allocate_bt(bd_entry);
>>
>>
>> What happens if this fails? Retrying forever isn't very nice.
>>
> If allocation of the bound table fail, the related entry in the bound
> directory is still invalid. The following access to this entry still produce
> #BR fault.
>

By the "following access" I think you mean the same instruction that
just trapped -- it will trap again because the exception hasn't been
fixed up. Then mmap will fail again, and you'll retry again, leading
to an infinite loop.

I think that failure to fix up the exception should either let the
normal bounds error through or should raise SIGBUS.

>
>>> + if (!user_mode(regs)) {
>>> + if (!fixup_exception(regs)) {
>>> + tsk->thread.error_code = error_code;
>>> + tsk->thread.trap_nr = X86_TRAP_BR;
>>> + die("bounds", regs, error_code);
>>> + }
>>
>>
>> Why the fixup? Unless I'm missing something, the kernel has no business
>> getting #BR on access to a user address.
>>
>> Or are you adding code to allow the kernel to use MPX itself? If so,
>> shouldn't this use an MPX-specific fixup to allow normal C code to use
>> this stuff?
>>
> It checks whether #BR come from user-space. You can see do_trap_no_signal().

Wasn't #BR using do_trap before? do_trap doesn't call
fixup_exception. I don't see why it should do it now. (I also don't
think it should come from kernel space until someone adds kernel-mode
MPX support.)

>
>
>>> + goto exit;
>>> + }
>>> +
>>> + if (!boot_cpu_has(X86_FEATURE_MPX)) {
>>> + do_trap(X86_TRAP_BR, SIGSEGV, "bounds", regs, error_code,
>>> NULL);
>>> + goto exit;
>>
>>
>> This, as well as the status == 0 case, should probably document that the
>> exception is from BOUND, not MPX.
>>
> Ok. I will add one comment for this.
>
>
>
>>> + break;
>>> +
>>> + case 1: /* Bound violation. */
>>> + case 0: /* No MPX exception. */
>>> + do_trap(X86_TRAP_BR, SIGSEGV, "bounds", regs, error_code,
>>> NULL);
>>> + break;
>>
>>
>> What does "No Intel MPX exception" mean? Surely that has business
>> sending #BR.
>>
> Oh. It comes from spec, and just mean it is not from MPX. :) I will change
> it to be accurate.
>
>
>>> +
>>> + default:
>>> + break;
>>
>>
>> What does status 3 mean? The docs say "reserved". Presumably this
>> should log and kill the process.
>
> I guess it should be a good suggestion.
>
> Thanks,
> Qiaowei
>



--
Andy Lutomirski
AMA Capital Management, LLC

2014-01-28 05:45:26

by Ren Qiaowei

[permalink] [raw]
Subject: Re: [PATCH v3 2/4] x86, mpx: hook #BR exception handler to allocate bound tables

On 01/28/2014 01:21 PM, Andy Lutomirski wrote:
> On Mon, Jan 27, 2014 at 7:35 PM, Ren Qiaowei <[email protected]> wrote:
>> On 01/28/2014 04:36 AM, Andy Lutomirski wrote:
>>>>
>>>> + bd_entry = status & MPX_BNDSTA_ADDR_MASK;
>>>> + if ((bd_entry >= bd_base) && (bd_entry < bd_base + bd_size))
>>>> + allocate_bt(bd_entry);
>>>
>>>
>>> What happens if this fails? Retrying forever isn't very nice.
>>>
>> If allocation of the bound table fail, the related entry in the bound
>> directory is still invalid. The following access to this entry still produce
>> #BR fault.
>>
>
> By the "following access" I think you mean the same instruction that
> just trapped -- it will trap again because the exception hasn't been
> fixed up. Then mmap will fail again, and you'll retry again, leading
> to an infinite loop.
>
I don't mean the same instruction that just trapped.

> I think that failure to fix up the exception should either let the
> normal bounds error through or should raise SIGBUS.
>
Maybe we need HPA help answer this question. Peter, what do you think
about it? If allocation of the bound table fail, what should we do?

>>
>>>> + if (!user_mode(regs)) {
>>>> + if (!fixup_exception(regs)) {
>>>> + tsk->thread.error_code = error_code;
>>>> + tsk->thread.trap_nr = X86_TRAP_BR;
>>>> + die("bounds", regs, error_code);
>>>> + }
>>>
>>>
>>> Why the fixup? Unless I'm missing something, the kernel has no business
>>> getting #BR on access to a user address.
>>>
>>> Or are you adding code to allow the kernel to use MPX itself? If so,
>>> shouldn't this use an MPX-specific fixup to allow normal C code to use
>>> this stuff?
>>>
>> It checks whether #BR come from user-space. You can see do_trap_no_signal().
>
> Wasn't #BR using do_trap before? do_trap doesn't call
> fixup_exception. I don't see why it should do it now. (I also don't
> think it should come from kernel space until someone adds kernel-mode
> MPX support.)
>
do_trap() -> do_trap_no_signal() call similar code to check if the fault
occurred in userspace or kernel space. You can see previous discussion
for the first version of this patchset.

Thanks,
Qiaowei

2014-01-28 06:42:08

by Ingo Molnar

[permalink] [raw]
Subject: Re: [PATCH v3 0/4] Intel MPX support


* Ren Qiaowei <[email protected]> wrote:

> >> MPX kernel code, namely this patchset, has mainly the 2
> >> responsibilities: provide handlers for bounds faults (#BR), and
> >> manage bounds memory.
> >
> > AFAICS the kernel side implementation causes no runtime overhead
> > for non-MPX workloads, and also causes no runtime overhead for
> > non-MPX hardware, right?
>
> Yes.

Actually, I think that's not entirely true.

For example if within the same mm there's a lot of non-MPX threads and
an MPX thread, then the MMU notifier will be called for MMU operations
of every non-MPX thread as well!

So MPX state of a thread will slow down all the other non-MPX threads
as well.

The statement is only true for non-MPX tasks that have their separate
mm's that does not have a single MPX thread.

Thanks,

Ingo

2014-01-28 06:43:05

by Andy Lutomirski

[permalink] [raw]
Subject: Re: [PATCH v3 2/4] x86, mpx: hook #BR exception handler to allocate bound tables

On Mon, Jan 27, 2014 at 9:39 PM, Ren Qiaowei <[email protected]> wrote:
> On 01/28/2014 01:21 PM, Andy Lutomirski wrote:
>>
>> On Mon, Jan 27, 2014 at 7:35 PM, Ren Qiaowei <[email protected]>
>> wrote:
>>>
>>> On 01/28/2014 04:36 AM, Andy Lutomirski wrote:
>>>>>
>>>>>
>>>>> + bd_entry = status & MPX_BNDSTA_ADDR_MASK;
>>>>> + if ((bd_entry >= bd_base) && (bd_entry < bd_base + bd_size))
>>>>> + allocate_bt(bd_entry);
>>>>
>>>>
>>>>
>>>> What happens if this fails? Retrying forever isn't very nice.
>>>>
>>> If allocation of the bound table fail, the related entry in the bound
>>> directory is still invalid. The following access to this entry still
>>> produce
>>> #BR fault.
>>>
>>
>> By the "following access" I think you mean the same instruction that
>> just trapped -- it will trap again because the exception hasn't been
>> fixed up. Then mmap will fail again, and you'll retry again, leading
>> to an infinite loop.
>>
> I don't mean the same instruction that just trapped.

I haven't dug to the right page of the docs, I guess. What is RIP set
to when an MPX instruction causes #BR?

It's *certainly* not okay to fail the fixup and skip the offending instruction.

>
>
>> I think that failure to fix up the exception should either let the
>> normal bounds error through or should raise SIGBUS.
>>
> Maybe we need HPA help answer this question. Peter, what do you think about
> it? If allocation of the bound table fail, what should we do?
>
>
>>>
>>>>> + if (!user_mode(regs)) {
>>>>> + if (!fixup_exception(regs)) {
>>>>> + tsk->thread.error_code = error_code;
>>>>> + tsk->thread.trap_nr = X86_TRAP_BR;
>>>>> + die("bounds", regs, error_code);
>>>>> + }
>>>>
>>>>
>>>>
>>>> Why the fixup? Unless I'm missing something, the kernel has no business
>>>> getting #BR on access to a user address.
>>>>
>>>> Or are you adding code to allow the kernel to use MPX itself? If so,
>>>> shouldn't this use an MPX-specific fixup to allow normal C code to use
>>>> this stuff?
>>>>
>>> It checks whether #BR come from user-space. You can see
>>> do_trap_no_signal().
>>
>>
>> Wasn't #BR using do_trap before? do_trap doesn't call
>> fixup_exception. I don't see why it should do it now. (I also don't
>> think it should come from kernel space until someone adds kernel-mode
>> MPX support.)
>>
> do_trap() -> do_trap_no_signal() call similar code to check if the fault
> occurred in userspace or kernel space. You can see previous discussion for
> the first version of this patchset.

I just read it. do_trap_no_signal presumably calls fixup_exception
because #UD uses it and #UD needs that handling. (I'm guessing that
there is actually a legitimate use for a kernel fixup on #UD somewhere
-- there's probably something that isn't covered by cpuid.)

There should not be a #BR from the kernel using the fixup mechanism.
IMO if the exception comes from the kernel, it should unconditionally
call die.

At some point there might be legitimate #BR faults from the kernel due
to actual in-kernel use of the MPX translation table. This is a whole
different story.

(Presumably the right thing to do is to have gcc support for wide
pointers that contain their own bounds.)

--Andy

2014-01-28 06:52:43

by Ren Qiaowei

[permalink] [raw]
Subject: Re: [PATCH v3 2/4] x86, mpx: hook #BR exception handler to allocate bound tables

On 01/28/2014 02:42 PM, Andy Lutomirski wrote:
> I just read it. do_trap_no_signal presumably calls fixup_exception
> because #UD uses it and #UD needs that handling. (I'm guessing that
> there is actually a legitimate use for a kernel fixup on #UD somewhere
> -- there's probably something that isn't covered by cpuid.)
>
> There should not be a #BR from the kernel using the fixup mechanism.
> IMO if the exception comes from the kernel, it should unconditionally
> call die.
>
Oh. I agree with you, and if a #BR from the kernel it should
unconditionally call die.

if (!user_mode(regs))
die("bounds", regs, error_code);

Thanks,
Qiaowei

2014-01-28 07:06:57

by Ren Qiaowei

[permalink] [raw]
Subject: Re: [PATCH v3 0/4] Intel MPX support

On 01/28/2014 02:42 PM, Ingo Molnar wrote:
>
> * Ren Qiaowei <[email protected]> wrote:
>
>>>> MPX kernel code, namely this patchset, has mainly the 2
>>>> responsibilities: provide handlers for bounds faults (#BR), and
>>>> manage bounds memory.
>>>
>>> AFAICS the kernel side implementation causes no runtime overhead
>>> for non-MPX workloads, and also causes no runtime overhead for
>>> non-MPX hardware, right?
>>
>> Yes.
>
> Actually, I think that's not entirely true.
>
> For example if within the same mm there's a lot of non-MPX threads and
> an MPX thread, then the MMU notifier will be called for MMU operations
> of every non-MPX thread as well!
>
> So MPX state of a thread will slow down all the other non-MPX threads
> as well.
>
> The statement is only true for non-MPX tasks that have their separate
> mm's that does not have a single MPX thread.
>

Yes. Though all non-MPX threads are slowed down, the whole process
benefit from MPX.

Anyway, HPA suggest these syscalls, which use MMU notifier, should be
not needed, we can do what they do in userspace runtime. What do you
think about it? I guess that I should remove the third patch which adds
new prctl() syscalls in next version of this patchset.

Thanks,
Qiaowei

2014-01-28 18:27:06

by H. Peter Anvin

[permalink] [raw]
Subject: Re: [PATCH v3 0/4] Intel MPX support

On 01/27/2014 11:01 PM, Ren Qiaowei wrote:
>
> Yes. Though all non-MPX threads are slowed down, the whole process
> benefit from MPX.
>
> Anyway, HPA suggest these syscalls, which use MMU notifier, should be
> not needed, we can do what they do in userspace runtime. What do you
> think about it? I guess that I should remove the third patch which adds
> new prctl() syscalls in next version of this patchset.
>

The syscalls is one thing, managing the bounds map in kernel space is
another.

We could manage the bounds map entirely in user space in a signal
handler, but that has both ABI issues (#BR currently turns into SIGSEGV
which is commonly hooked by applications; we could switch to a different
signal but there aren't many unclaimed ones) and performance issues.

I would think it would be extremely unusual for an application to have
some MPX and some non-MPX threads, since they would share the same
address space and the non-MPX threads would mess up the bounds.

-hpa