2014-02-23 06:17:22

by Ren Qiaowei

[permalink] [raw]
Subject: [PATCH v5 0/3] Intel MPX support

This patchset adds support for the Memory Protection Extensions
(MPX) feature found in future Intel processors.

MPX can be used in conjunction with compiler changes to check memory
references, for those references whose compile-time normal intentions
are usurped at runtime due to buffer overflow or underflow.

MPX provides this capability at very low performance overhead for
newly compiled code, and provides compatibility mechanisms with legacy
software components. MPX architecture is designed allow a machine to
run both MPX enabled software and legacy software that is MPX unaware.
In such a case, the legacy software does not benefit from MPX, but it
also does not experience any change in functionality or reduction in
performance.

More information about Intel MPX can be found in "Intel(R) Architecture
Instruction Set Extensions Programming Reference".

To get the advantage of MPX, changes are required in the OS kernel,
binutils, compiler, system libraries support.

New GCC option -fmpx is introduced to utilize MPX instructions.
Currently GCC compiler sources with MPX support is available in a
separate branch in common GCC SVN repository. See GCC SVN page
(http://gcc.gnu.org/svn.html) for details.

To have the full protection, we had to add MPX instrumentation to all
the necessary Glibc routines (e.g. memcpy) written on assembler, and
compile Glibc with the MPX enabled GCC compiler. Currently MPX enabled
Glibc source can be found in Glibc git repository.

Enabling an application to use MPX will generally not require source
code updates but there is some runtime code, which is responsible for
configuring and enabling MPX, needed in order to make use of MPX.
For most applications this runtime support will be available by linking
to a library supplied by the compiler or possibly it will come directly
from the OS once OS versions that support MPX are available.

MPX kernel code, namely this patchset, has mainly the 2 responsibilities:
provide handlers for bounds faults (#BR), and manage bounds memory.

Currently no hardware with MPX ISA is available but it is always
possible to use SDE (Intel(R) software Development Emulator) instead,
which can be downloaded from
http://software.intel.com/en-us/articles/intel-software-development-emulator


Changes since v1:
* check to see if #BR occurred in userspace or kernel space.
* use generic structure and macro as much as possible when
decode mpx instructions.

Changes since v2:
* fix some compile warnings.
* update documentation.

Changes since v3:
* correct some syntax errors at documentation, and document
extended struct siginfo.
* for kill the process when the error code of BNDSTATUS is 3.
* add some comments.
* remove new prctl() commands.
* fix some compile warnings for 32-bit.

Changes since v4:
* raise SIGBUS if the allocations of the bound tables fail.

Qiaowei Ren (3):
x86, mpx: add documentation on Intel MPX
x86, mpx: hook #BR exception handler to allocate bound tables
x86, mpx: extend siginfo structure to include bound violation
information

Documentation/x86/intel_mpx.txt | 239 +++++++++++++++++++++++++
arch/x86/include/asm/mpx.h | 54 ++++++
arch/x86/kernel/Makefile | 1 +
arch/x86/kernel/mpx.c | 335 ++++++++++++++++++++++++++++++++++++
arch/x86/kernel/traps.c | 62 +++++++-
include/uapi/asm-generic/siginfo.h | 9 +-
kernel/signal.c | 4 +
7 files changed, 702 insertions(+), 2 deletions(-)
create mode 100644 Documentation/x86/intel_mpx.txt
create mode 100644 arch/x86/include/asm/mpx.h
create mode 100644 arch/x86/kernel/mpx.c


2014-02-23 06:18:09

by Ren Qiaowei

[permalink] [raw]
Subject: [PATCH v5 3/3] x86, mpx: extend siginfo structure to include bound violation information

This patch adds new fields about bound violation into siginfo
structure. si_lower and si_upper are respectively lower bound
and upper bound when bound violation is caused.

These fields will be set in #BR exception handler by decoding
the user instruction and constructing the faulting pointer.
A userspace application can get violation address, lower bound
and upper bound for bound violation from this new siginfo structure.

Signed-off-by: Qiaowei Ren <[email protected]>
---
arch/x86/include/asm/mpx.h | 19 +++
arch/x86/kernel/mpx.c | 289 ++++++++++++++++++++++++++++++++++++
arch/x86/kernel/traps.c | 6 +
include/uapi/asm-generic/siginfo.h | 9 +-
kernel/signal.c | 4 +
5 files changed, 326 insertions(+), 1 deletions(-)

diff --git a/arch/x86/include/asm/mpx.h b/arch/x86/include/asm/mpx.h
index d9a61a8..3296052 100644
--- a/arch/x86/include/asm/mpx.h
+++ b/arch/x86/include/asm/mpx.h
@@ -3,6 +3,7 @@

#include <linux/types.h>
#include <asm/ptrace.h>
+#include <asm/insn.h>

#ifdef CONFIG_X86_64

@@ -30,6 +31,24 @@

#endif

+struct mpx_insn {
+ struct insn_field rex_prefix; /* REX prefix */
+ struct insn_field modrm;
+ struct insn_field sib;
+ struct insn_field displacement;
+
+ unsigned char addr_bytes; /* effective address size */
+ unsigned char limit;
+ unsigned char x86_64;
+
+ const unsigned char *kaddr; /* kernel address of insn to analyze */
+ const unsigned char *next_byte;
+};
+
+#define MAX_MPX_INSN_SIZE 15
+
bool do_mpx_bt_fault(struct xsave_struct *xsave_buf);
+void do_mpx_bounds(struct pt_regs *regs, siginfo_t *info,
+ struct xsave_struct *xsave_buf);

#endif /* _ASM_X86_MPX_H */
diff --git a/arch/x86/kernel/mpx.c b/arch/x86/kernel/mpx.c
index f1a16c0..7c0e36c 100644
--- a/arch/x86/kernel/mpx.c
+++ b/arch/x86/kernel/mpx.c
@@ -7,6 +7,270 @@
#include <asm/fpu-internal.h>
#include <asm/alternative.h>

+typedef enum {REG_TYPE_RM, REG_TYPE_INDEX, REG_TYPE_BASE} reg_type_t;
+static unsigned long get_reg(struct mpx_insn *insn, struct pt_regs *regs,
+ reg_type_t type)
+{
+ int regno = 0;
+ unsigned char modrm = (unsigned char)insn->modrm.value;
+ unsigned char sib = (unsigned char)insn->sib.value;
+
+ static const int regoff[] = {
+ offsetof(struct pt_regs, ax),
+ offsetof(struct pt_regs, cx),
+ offsetof(struct pt_regs, dx),
+ offsetof(struct pt_regs, bx),
+ offsetof(struct pt_regs, sp),
+ offsetof(struct pt_regs, bp),
+ offsetof(struct pt_regs, si),
+ offsetof(struct pt_regs, di),
+#ifdef CONFIG_X86_64
+ offsetof(struct pt_regs, r8),
+ offsetof(struct pt_regs, r9),
+ offsetof(struct pt_regs, r10),
+ offsetof(struct pt_regs, r11),
+ offsetof(struct pt_regs, r12),
+ offsetof(struct pt_regs, r13),
+ offsetof(struct pt_regs, r14),
+ offsetof(struct pt_regs, r15),
+#endif
+ };
+
+ switch (type) {
+ case REG_TYPE_RM:
+ regno = X86_MODRM_RM(modrm);
+ if (X86_REX_B(insn->rex_prefix.value) == 1)
+ regno += 8;
+ break;
+
+ case REG_TYPE_INDEX:
+ regno = X86_SIB_INDEX(sib);
+ if (X86_REX_X(insn->rex_prefix.value) == 1)
+ regno += 8;
+ break;
+
+ case REG_TYPE_BASE:
+ regno = X86_SIB_BASE(sib);
+ if (X86_REX_B(insn->rex_prefix.value) == 1)
+ regno += 8;
+ break;
+
+ default:
+ break;
+ }
+
+ return regs_get_register(regs, regoff[regno]);
+}
+
+/*
+ * return the address being referenced be instruction
+ * for rm=3 returning the content of the rm reg
+ * for rm!=3 calculates the address using SIB and Disp
+ */
+static unsigned long get_addr_ref(struct mpx_insn *insn, struct pt_regs *regs)
+{
+ unsigned long addr;
+ unsigned long base;
+ unsigned long indx;
+ unsigned char modrm = (unsigned char)insn->modrm.value;
+ unsigned char sib = (unsigned char)insn->sib.value;
+
+ if (X86_MODRM_MOD(modrm) == 3) {
+ addr = get_reg(insn, regs, REG_TYPE_RM);
+ } else {
+ if (insn->sib.nbytes) {
+ base = get_reg(insn, regs, REG_TYPE_BASE);
+ indx = get_reg(insn, regs, REG_TYPE_INDEX);
+ addr = base + indx * (1 << X86_SIB_SCALE(sib));
+ } else {
+ addr = get_reg(insn, regs, REG_TYPE_RM);
+ }
+ addr += insn->displacement.value;
+ }
+
+ return addr;
+}
+
+/* Verify next sizeof(t) bytes can be on the same instruction */
+#define validate_next(t, insn, n) \
+ ((insn)->next_byte + sizeof(t) + n - (insn)->kaddr <= (insn)->limit)
+
+#define __get_next(t, insn) \
+({ \
+ t r = *(t *)insn->next_byte; \
+ insn->next_byte += sizeof(t); \
+ r; \
+})
+
+#define __peek_next(t, insn) \
+({ \
+ t r = *(t *)insn->next_byte; \
+ r; \
+})
+
+#define get_next(t, insn) \
+({ \
+ if (unlikely(!validate_next(t, insn, 0))) \
+ goto err_out; \
+ __get_next(t, insn); \
+})
+
+#define peek_next(t, insn) \
+({ \
+ if (unlikely(!validate_next(t, insn, 0))) \
+ goto err_out; \
+ __peek_next(t, insn); \
+})
+
+static void mpx_insn_get_prefixes(struct mpx_insn *insn)
+{
+ unsigned char b;
+
+ /* Decode legacy prefix and REX prefix */
+ b = peek_next(unsigned char, insn);
+ while (b != 0x0f) {
+ /*
+ * look for a rex prefix
+ * a REX prefix cannot be followed by a legacy prefix.
+ */
+ if (insn->x86_64 && ((b&0xf0) == 0x40)) {
+ insn->rex_prefix.value = b;
+ insn->rex_prefix.nbytes = 1;
+ insn->next_byte++;
+ break;
+ }
+
+ /* check the other legacy prefixes */
+ switch (b) {
+ case 0xf2:
+ case 0xf3:
+ case 0xf0:
+ case 0x64:
+ case 0x65:
+ case 0x2e:
+ case 0x3e:
+ case 0x26:
+ case 0x36:
+ case 0x66:
+ case 0x67:
+ insn->next_byte++;
+ break;
+ default: /* everything else is garbage */
+ goto err_out;
+ }
+ b = peek_next(unsigned char, insn);
+ }
+
+err_out:
+ return;
+}
+
+static void mpx_insn_get_modrm(struct mpx_insn *insn)
+{
+ insn->modrm.value = get_next(unsigned char, insn);
+ insn->modrm.nbytes = 1;
+
+err_out:
+ return;
+}
+
+static void mpx_insn_get_sib(struct mpx_insn *insn)
+{
+ unsigned char modrm = (unsigned char)insn->modrm.value;
+
+ if (X86_MODRM_MOD(modrm) != 3 && X86_MODRM_RM(modrm) == 4) {
+ insn->sib.value = get_next(unsigned char, insn);
+ insn->sib.nbytes = 1;
+ }
+
+err_out:
+ return;
+}
+
+static void mpx_insn_get_displacement(struct mpx_insn *insn)
+{
+ unsigned char mod, rm, base;
+
+ /*
+ * Interpreting the modrm byte:
+ * mod = 00 - no displacement fields (exceptions below)
+ * mod = 01 - 1-byte displacement field
+ * mod = 10 - displacement field is 4 bytes
+ * mod = 11 - no memory operand
+ *
+ * mod != 11, r/m = 100 - SIB byte exists
+ * mod = 00, SIB base = 101 - displacement field is 4 bytes
+ * mod = 00, r/m = 101 - rip-relative addressing, displacement
+ * field is 4 bytes
+ */
+ mod = X86_MODRM_MOD(insn->modrm.value);
+ rm = X86_MODRM_RM(insn->modrm.value);
+ base = X86_SIB_BASE(insn->sib.value);
+ if (mod == 3)
+ return;
+ if (mod == 1) {
+ insn->displacement.value = get_next(unsigned char, insn);
+ insn->displacement.nbytes = 1;
+ } else if ((mod == 0 && rm == 5) || mod == 2 ||
+ (mod == 0 && base == 5)) {
+ insn->displacement.value = get_next(int, insn);
+ insn->displacement.nbytes = 4;
+ }
+
+err_out:
+ return;
+}
+
+static void mpx_insn_init(struct mpx_insn *insn, struct pt_regs *regs)
+{
+ unsigned char buf[MAX_MPX_INSN_SIZE];
+ int bytes;
+
+ memset(insn, 0, sizeof(*insn));
+
+ bytes = copy_from_user(buf, (void __user *)regs->ip, MAX_MPX_INSN_SIZE);
+ insn->limit = MAX_MPX_INSN_SIZE - bytes;
+ insn->kaddr = buf;
+ insn->next_byte = buf;
+
+ /*
+ * In 64-bit Mode, all Intel MPX instructions use 64-bit
+ * operands for bounds and 64 bit addressing, i.e. REX.W &
+ * 67H have no effect on data or address size.
+ *
+ * In compatibility and legacy modes (including 16-bit code
+ * segments, real and virtual 8086 modes) all Intel MPX
+ * instructions use 32-bit operands for bounds and 32 bit
+ * addressing.
+ */
+#ifdef CONFIG_X86_64
+ insn->x86_64 = 1;
+ insn->addr_bytes = 8;
+#else
+ insn->x86_64 = 0;
+ insn->addr_bytes = 4;
+#endif
+}
+
+static unsigned long mpx_insn_decode(struct mpx_insn *insn,
+ struct pt_regs *regs)
+{
+ mpx_insn_init(insn, regs);
+
+ /*
+ * In this case, we only need decode bndcl/bndcn/bndcu,
+ * so we can use private diassembly interfaces to get
+ * prefixes, modrm, sib, displacement, etc..
+ */
+ mpx_insn_get_prefixes(insn);
+ insn->next_byte += 2; /* ignore opcode */
+ mpx_insn_get_modrm(insn);
+ mpx_insn_get_sib(insn);
+ mpx_insn_get_displacement(insn);
+
+ return get_addr_ref(insn, regs);
+}
+
static bool allocate_bt(unsigned long bd_entry)
{
unsigned long bt_size = 1UL << (MPX_L2_BITS+MPX_L2_SHIFT);
@@ -44,3 +308,28 @@ bool do_mpx_bt_fault(struct xsave_struct *xsave_buf)

return false;
}
+
+void do_mpx_bounds(struct pt_regs *regs, siginfo_t *info,
+ struct xsave_struct *xsave_buf)
+{
+ struct mpx_insn insn;
+ uint8_t bndregno;
+ unsigned long addr_vio;
+
+ addr_vio = mpx_insn_decode(&insn, regs);
+
+ bndregno = X86_MODRM_REG(insn.modrm.value);
+ if (bndregno > 3)
+ return;
+
+ /* Note: the upper 32 bits are ignored in 32-bit mode. */
+ info->si_lower = (void __user *)(unsigned long)
+ (xsave_buf->bndregs.bndregs[2*bndregno]);
+ info->si_upper = (void __user *)(unsigned long)
+ (~xsave_buf->bndregs.bndregs[2*bndregno+1]);
+ info->si_addr_lsb = 0;
+ info->si_signo = SIGSEGV;
+ info->si_errno = 0;
+ info->si_code = SEGV_BNDERR;
+ info->si_addr = (void __user *)addr_vio;
+}
diff --git a/arch/x86/kernel/traps.c b/arch/x86/kernel/traps.c
index b894f09..ecfc706 100644
--- a/arch/x86/kernel/traps.c
+++ b/arch/x86/kernel/traps.c
@@ -269,6 +269,7 @@ dotraplinkage void do_bounds(struct pt_regs *regs, long error_code)
unsigned long status;
struct xsave_struct *xsave_buf;
struct task_struct *tsk = current;
+ siginfo_t info;

prev_state = exception_enter();
if (notify_die(DIE_TRAP, "bounds", regs, error_code,
@@ -305,6 +306,11 @@ dotraplinkage void do_bounds(struct pt_regs *regs, long error_code)
break;

case 1: /* Bound violation. */
+ do_mpx_bounds(regs, &info, xsave_buf);
+ do_trap(X86_TRAP_BR, SIGSEGV, "bounds", regs,
+ error_code, &info);
+ break;
+
case 0: /* No exception caused by Intel MPX operations. */
do_trap(X86_TRAP_BR, SIGSEGV, "bounds", regs, error_code, NULL);
break;
diff --git a/include/uapi/asm-generic/siginfo.h b/include/uapi/asm-generic/siginfo.h
index ba5be7f..1e35520 100644
--- a/include/uapi/asm-generic/siginfo.h
+++ b/include/uapi/asm-generic/siginfo.h
@@ -91,6 +91,10 @@ typedef struct siginfo {
int _trapno; /* TRAP # which caused the signal */
#endif
short _addr_lsb; /* LSB of the reported address */
+ struct {
+ void __user *_lower;
+ void __user *_upper;
+ } _addr_bnd;
} _sigfault;

/* SIGPOLL */
@@ -131,6 +135,8 @@ typedef struct siginfo {
#define si_trapno _sifields._sigfault._trapno
#endif
#define si_addr_lsb _sifields._sigfault._addr_lsb
+#define si_lower _sifields._sigfault._addr_bnd._lower
+#define si_upper _sifields._sigfault._addr_bnd._upper
#define si_band _sifields._sigpoll._band
#define si_fd _sifields._sigpoll._fd
#ifdef __ARCH_SIGSYS
@@ -199,7 +205,8 @@ typedef struct siginfo {
*/
#define SEGV_MAPERR (__SI_FAULT|1) /* address not mapped to object */
#define SEGV_ACCERR (__SI_FAULT|2) /* invalid permissions for mapped object */
-#define NSIGSEGV 2
+#define SEGV_BNDERR (__SI_FAULT|3) /* failed address bound checks */
+#define NSIGSEGV 3

/*
* SIGBUS si_codes
diff --git a/kernel/signal.c b/kernel/signal.c
index 52f881d..b9ea074 100644
--- a/kernel/signal.c
+++ b/kernel/signal.c
@@ -2771,6 +2771,10 @@ int copy_siginfo_to_user(siginfo_t __user *to, const siginfo_t *from)
if (from->si_code == BUS_MCEERR_AR || from->si_code == BUS_MCEERR_AO)
err |= __put_user(from->si_addr_lsb, &to->si_addr_lsb);
#endif
+#ifdef SEGV_BNDERR
+ err |= __put_user(from->si_lower, &to->si_lower);
+ err |= __put_user(from->si_upper, &to->si_upper);
+#endif
break;
case __SI_CHLD:
err |= __put_user(from->si_pid, &to->si_pid);
--
1.7.1

2014-02-23 06:17:38

by Ren Qiaowei

[permalink] [raw]
Subject: [PATCH v5 1/3] x86, mpx: add documentation on Intel MPX

This patch adds the Documentation/x86/intel_mpx.txt file with some
information about Intel MPX.

Signed-off-by: Qiaowei Ren <[email protected]>
---
Documentation/x86/intel_mpx.txt | 239 +++++++++++++++++++++++++++++++++++++++
1 files changed, 239 insertions(+), 0 deletions(-)
create mode 100644 Documentation/x86/intel_mpx.txt

diff --git a/Documentation/x86/intel_mpx.txt b/Documentation/x86/intel_mpx.txt
new file mode 100644
index 0000000..9af8636
--- /dev/null
+++ b/Documentation/x86/intel_mpx.txt
@@ -0,0 +1,239 @@
+1. Intel(R) MPX Overview
+========================
+
+
+Intel(R) Memory Protection Extensions (Intel(R) MPX) is a new
+capability introduced into Intel Architecture. Intel MPX provides
+hardware features that can be used in conjunction with compiler
+changes to check memory references, for those references whose
+compile-time normal intentions are usurped at runtime due to
+buffer overflow or underflow.
+
+Two of the most important goals of Intel MPX are to provide
+this capability at very low performance overhead for newly
+compiled code, and to provide compatibility mechanisms with
+legacy software components. MPX architecture is designed to
+allow a machine (i.e., the processor(s) and the OS software)
+to run both MPX enabled software and legacy software that
+is MPX unaware. In such a case, the legacy software does not
+benefit from MPX, but it also does not experience any change
+in functionality or reduction in performance.
+
+Intel(R) MPX Programming Model
+------------------------------
+
+Intel MPX introduces new registers and new instructions that
+operate on these registers. Some of the registers added are
+bounds registers which store a pointer's lower bound and upper
+bound limits. Whenever the pointer is used, the requested
+reference is checked against the pointer's associated bounds,
+thereby preventing out-of-bound memory access (such as buffer
+overflows and overruns). Out-of-bounds memory references
+initiate a #BR exception which can then be handled in an
+appropriate manner.
+
+Loading and Storing Bounds using Translation
+--------------------------------------------
+
+Intel MPX defines two instructions for load/store of the linear
+address of a pointer to a buffer, along with the bounds of the
+buffer into a paging structure of extended bounds. Specifically
+when storing extended bounds, the processor will perform address
+translation of the address where the pointer is stored to an
+address in the Bound Table (BT) to determine the store location
+of extended bounds. Loading of an extended bounds performs the
+reverse sequence.
+
+The structure in memory to load/store an extended bound is a
+4-tuple consisting of lower bound, upper bound, pointer value
+and a reserved field. Bound loads and stores access 32-bit or
+64-bit operand size according to the operation mode. Thus,
+a bound table entry is 4*32 bits in 32-bit mode and 4*64 bits
+in 64-bit mode.
+
+The linear address of a bound table is stored in a Bound
+Directory (BD) entry. The linear address of the bound
+directory is derived from either BNDCFGU or BNDCFGS registers.
+Bounds in memory are stored in Bound Tables (BT) as an extended
+bound, which are accessed via Bound Directory (BD) and address
+translation performed by BNDLDX/BNDSTX instructions.
+
+Bounds Directory (BD) and Bounds Tables (BT) are stored in
+application memory and are allocated by the application (in case
+of kernel use, the structures will be in kernel memory). The
+bound directory and each instance of bound table are in contiguous
+linear memory.
+
+XSAVE/XRESTOR Support of Intel MPX State
+----------------------------------------
+
+Enabling Intel MPX requires an OS to manage two bits in XCR0:
+ - BNDREGS for saving and restoring registers BND0-BND3,
+ - BNDCSR for saving and restoring the user-mode configuration
+(BNDCFGU) and the status register (BNDSTATUS).
+
+The reason for having two separate bits is that BND0-BND3 are
+likely to be volatile state, while BNDCFGU and BNDSTATUS are not.
+Therefore, an OS has flexibility in handling these two states
+differently in saving or restoring them.
+
+For details about the Intel MPX instructions, see "Intel(R)
+Architecture Instruction Set Extensions Programming Reference".
+
+
+2. How to get the advantage of MPX
+==================================
+
+
+To get the advantage of MPX, changes are required in
+the OS kernel, binutils, compiler, and system libraries support.
+
+MPX support in the GNU toolchain
+--------------------------------
+
+This section describes changes in GNU Binutils, GCC and Glibc
+to support MPX.
+
+The first step of MPX support is to implement support for new
+hardware features in binutils and the GCC.
+
+The second step is implementation of MPX instrumentation pass
+in the GCC compiler which is responsible for instrumenting all
+memory accesses with pointer checks. Compiler changes for runtime
+bound checks include:
+
+ * Bounds creation for statically allocated objects, objects
+ allocated on the stack and statically initialized pointers.
+
+ * MPX support in ABI: ABI extension allows passing bounds for
+ the pointers passed as function arguments and provides returned
+ bounds with the pointers.
+
+ * Bounds table content management: each pointer that is stored
+ into memory should have its bounds stored in the corresponding
+ row of the bounds table; compiler generates appropriate code
+ to have the bounds table in the consistent state.
+
+ * Memory accesses instrumentation: compiler analyzes data flow
+ to compute bounds corresponding to each memory access and
+ inserts code to check used address against computed bounds.
+
+Dynamically created objects in heap using memory allocators need
+to set bounds for objects (buffers) at allocation time. So the
+next step is to add MPX support into standard memory allocators
+in Glibc.
+
+To have full protection, an application has to use libraries
+compiled with MPX instrumentation. It means we had to compile
+Glibc with the MPX-enabled GCC compiler because it is used in
+most applications. Also we had to add MPX instrumentation to all
+the necessary Glibc routines (e.g. memcpy) written in assembler.
+
+A new GCC option -fmpx is introduced to utilize MPX instructions.
+Also binutils with MPX enabled should be used to get binaries
+with memory protection.
+
+Consider the following simple test for MPX compiled program:
+
+ int main(int argc, char* argv)
+ {
+ int buf[100];
+ return buf[argc];
+ }
+
+Snippet of the original assembler output (compiled with -O2):
+
+ movslq %edi, %rdi
+ movl -120(%rsp,%rdi,4), %eax // memory access buf[argc]
+
+Compile test as follows: mpx-gcc/gcc test.c -fmpx -O2
+
+Resulted assembler snippet:
+
+ movl $399, %edx
+ movslq %edi, %rdi // rdi contains value of argc
+ leaq -104(%rsp), %rax // load start address of buf to rax
+ bndmk (%rax,%rdx), %bnd0 // create bounds for buf
+ bndcl (%rax,%rdi,4), %bnd0 // check that memory access doesn't
+ // violate buf's low bound
+ bndcu 3(%rax,%rdi,4), %bnd0 // check that memory access doesn't
+ // violate buf's upper bound
+ movl -104(%rsp,%rdi,4), %eax // original memory access
+
+Code looks pretty clear. Note only that we added displacement 3 for
+upper bound checking since we have 4 byte (integer) access here.
+
+Several MPX-specific compiler options besides -fmpx were introduced
+in the compiler. Most of them, like -fmpx-check-read and
+-fmpx-check-write, control number of inserted runtime bound checks.
+Also developers always can use intrinsics to insert MPX instructions
+manually.
+
+Currently GCC compiler sources with MPX support is available in a
+separate branch in common GCC SVN repository. See GCC SVN page
+(http://gcc.gnu.org/svn.html) for details.
+
+Currently no hardware with MPX ISA is available but it is always
+possible to use SDE (Intel(R) Software Development Emulator) instead,
+which can be downloaded from
+http://software.intel.com/en-us/articles/intel-software-development-emulator
+
+MPX runtime support
+-------------------
+
+Enabling an application to use MPX will generally not require source
+code updates but there is some runtime code needed in order to make
+use of MPX. For most applications this runtime support will be available
+by linking to a library supplied by the compiler or possibly it will
+come directly from the OS once OS versions that support MPX are available.
+
+The runtime is responsible for configuring and enabling MPX. The
+configuration and enabling of MPX consists of the runtime writing
+the base address of the Bound Directory(BD) to the BNDCFGU register
+and setting the enable bit.
+
+MPX kernel support
+------------------
+
+MPX kernel code has mainly the following responsibilities.
+
+1) Providing handlers for bounds faults (#BR).
+
+When MPX is enabled, there are 2 new situations that can generate
+#BR faults. If a bounds overflow occurs then a #BR is generated.
+The fault handler will decode MPX instructions to get violation
+address and set this address into extended struct siginfo.
+
+The _sigfault feild of struct siginfo is extended as follow:
+
+87 /* SIGILL, SIGFPE, SIGSEGV, SIGBUS */
+88 struct {
+89 void __user *_addr; /* faulting insn/memory ref. */
+90 #ifdef __ARCH_SI_TRAPNO
+91 int _trapno; /* TRAP # which caused the signal */
+92 #endif
+93 short _addr_lsb; /* LSB of the reported address */
+94 struct {
+95 void __user *_lower;
+96 void __user *_upper;
+97 } _addr_bnd;
+98 } _sigfault;
+
+The '_addr' field refers to violation address, and new '_addr_and'
+field refers to the upper/lower bounds when a #BR is caused.
+
+The other case that generates a #BR is when a BNDSTX instruction
+attempts to save bounds to a BD entry marked as invalid. This is
+an indication that no BT exists for this entry. In this case the
+fault handler will allocate a new BT.
+
+2) Managing bounds memory.
+
+MPX defines 4 sets of bound registers. When an application needs
+more than 4 sets of bounds it uses the BNDSTX instruction to save
+the additional bounds out to memory. The kernel dynamically allocates
+the memory used to store these bounds. The bounds memory is organized
+into a 2-level structure consisting of a BD which contains pointers
+to a set of Bound Tables (BT) which contain the actual bound information.
+In order to minimize the Intel MPX memory usage the BTs are allocated
+on demand by the Intel MPX runtime.
--
1.7.1

2014-02-23 06:18:07

by Ren Qiaowei

[permalink] [raw]
Subject: [PATCH v5 2/3] x86, mpx: hook #BR exception handler to allocate bound tables

An access to an invalid bound directory entry will cause a #BR
exception. This patch hook #BR exception handler to allocate
one bound table and bind it with that buond directory entry.

This will avoid the need of forwarding the #BR exception
to the user space when bound directory has invalid entry.

Signed-off-by: Qiaowei Ren <[email protected]>
---
arch/x86/include/asm/mpx.h | 35 +++++++++++++++++++++++++++
arch/x86/kernel/Makefile | 1 +
arch/x86/kernel/mpx.c | 46 ++++++++++++++++++++++++++++++++++++
arch/x86/kernel/traps.c | 56 +++++++++++++++++++++++++++++++++++++++++++-
4 files changed, 137 insertions(+), 1 deletions(-)
create mode 100644 arch/x86/include/asm/mpx.h
create mode 100644 arch/x86/kernel/mpx.c

diff --git a/arch/x86/include/asm/mpx.h b/arch/x86/include/asm/mpx.h
new file mode 100644
index 0000000..d9a61a8
--- /dev/null
+++ b/arch/x86/include/asm/mpx.h
@@ -0,0 +1,35 @@
+#ifndef _ASM_X86_MPX_H
+#define _ASM_X86_MPX_H
+
+#include <linux/types.h>
+#include <asm/ptrace.h>
+
+#ifdef CONFIG_X86_64
+
+#define MPX_L1_BITS 28
+#define MPX_L1_SHIFT 3
+#define MPX_L2_BITS 17
+#define MPX_L2_SHIFT 5
+#define MPX_IGN_BITS 3
+#define MPX_L2_NODE_ADDR_MASK 0xfffffffffffffff8UL
+
+#define MPX_BNDSTA_ADDR_MASK 0xfffffffffffffffcUL
+#define MPX_BNDCFG_ADDR_MASK 0xfffffffffffff000UL
+
+#else
+
+#define MPX_L1_BITS 20
+#define MPX_L1_SHIFT 2
+#define MPX_L2_BITS 10
+#define MPX_L2_SHIFT 4
+#define MPX_IGN_BITS 2
+#define MPX_L2_NODE_ADDR_MASK 0xfffffffcUL
+
+#define MPX_BNDSTA_ADDR_MASK 0xfffffffcUL
+#define MPX_BNDCFG_ADDR_MASK 0xfffff000UL
+
+#endif
+
+bool do_mpx_bt_fault(struct xsave_struct *xsave_buf);
+
+#endif /* _ASM_X86_MPX_H */
diff --git a/arch/x86/kernel/Makefile b/arch/x86/kernel/Makefile
index cb648c8..becb970 100644
--- a/arch/x86/kernel/Makefile
+++ b/arch/x86/kernel/Makefile
@@ -41,6 +41,7 @@ obj-$(CONFIG_PREEMPT) += preempt.o

obj-y += process.o
obj-y += i387.o xsave.o
+obj-y += mpx.o
obj-y += ptrace.o
obj-$(CONFIG_X86_32) += tls.o
obj-$(CONFIG_IA32_EMULATION) += tls.o
diff --git a/arch/x86/kernel/mpx.c b/arch/x86/kernel/mpx.c
new file mode 100644
index 0000000..f1a16c0
--- /dev/null
+++ b/arch/x86/kernel/mpx.c
@@ -0,0 +1,46 @@
+#include <linux/kernel.h>
+#include <linux/syscalls.h>
+#include <asm/processor.h>
+#include <asm/mpx.h>
+#include <asm/mman.h>
+#include <asm/i387.h>
+#include <asm/fpu-internal.h>
+#include <asm/alternative.h>
+
+static bool allocate_bt(unsigned long bd_entry)
+{
+ unsigned long bt_size = 1UL << (MPX_L2_BITS+MPX_L2_SHIFT);
+ unsigned long bt_addr, old_val = 0;
+
+ bt_addr = sys_mmap_pgoff(0, bt_size, PROT_READ | PROT_WRITE,
+ MAP_ANONYMOUS | MAP_PRIVATE | MAP_POPULATE, -1, 0);
+ if (bt_addr == -1) {
+ pr_err("L2 Node Allocation Failed at L1 addr %lx\n",
+ bd_entry);
+ return false;
+ }
+ bt_addr = (bt_addr & MPX_L2_NODE_ADDR_MASK) | 0x01;
+
+ user_atomic_cmpxchg_inatomic(&old_val,
+ (long __user *)bd_entry, 0, bt_addr);
+ if (old_val)
+ vm_munmap(bt_addr & MPX_L2_NODE_ADDR_MASK, bt_size);
+
+ return true;
+}
+
+bool do_mpx_bt_fault(struct xsave_struct *xsave_buf)
+{
+ unsigned long status;
+ unsigned long bd_entry, bd_base;
+ unsigned long bd_size = 1UL << (MPX_L1_BITS+MPX_L1_SHIFT);
+
+ bd_base = xsave_buf->bndcsr.cfg_reg_u & MPX_BNDCFG_ADDR_MASK;
+ status = xsave_buf->bndcsr.status_reg;
+
+ bd_entry = status & MPX_BNDSTA_ADDR_MASK;
+ if ((bd_entry >= bd_base) && (bd_entry < bd_base + bd_size))
+ return allocate_bt(bd_entry);
+
+ return false;
+}
diff --git a/arch/x86/kernel/traps.c b/arch/x86/kernel/traps.c
index 57409f6..b894f09 100644
--- a/arch/x86/kernel/traps.c
+++ b/arch/x86/kernel/traps.c
@@ -59,6 +59,7 @@
#include <asm/fixmap.h>
#include <asm/mach_traps.h>
#include <asm/alternative.h>
+#include <asm/mpx.h>

#ifdef CONFIG_X86_64
#include <asm/x86_init.h>
@@ -213,7 +214,6 @@ dotraplinkage void do_##name(struct pt_regs *regs, long error_code) \

DO_ERROR_INFO(X86_TRAP_DE, SIGFPE, "divide error", divide_error, FPE_INTDIV, regs->ip )
DO_ERROR (X86_TRAP_OF, SIGSEGV, "overflow", overflow )
-DO_ERROR (X86_TRAP_BR, SIGSEGV, "bounds", bounds )
DO_ERROR_INFO(X86_TRAP_UD, SIGILL, "invalid opcode", invalid_op, ILL_ILLOPN, regs->ip )
DO_ERROR (X86_TRAP_OLD_MF, SIGFPE, "coprocessor segment overrun", coprocessor_segment_overrun )
DO_ERROR (X86_TRAP_TS, SIGSEGV, "invalid TSS", invalid_TSS )
@@ -263,6 +263,60 @@ dotraplinkage void do_double_fault(struct pt_regs *regs, long error_code)
}
#endif

+dotraplinkage void do_bounds(struct pt_regs *regs, long error_code)
+{
+ enum ctx_state prev_state;
+ unsigned long status;
+ struct xsave_struct *xsave_buf;
+ struct task_struct *tsk = current;
+
+ prev_state = exception_enter();
+ if (notify_die(DIE_TRAP, "bounds", regs, error_code,
+ X86_TRAP_BR, SIGSEGV) == NOTIFY_STOP)
+ goto exit;
+ conditional_sti(regs);
+
+ if (!user_mode(regs))
+ die("bounds", regs, error_code);
+
+ if (!boot_cpu_has(X86_FEATURE_MPX)) {
+ /* The exception is not from Intel MPX */
+ do_trap(X86_TRAP_BR, SIGSEGV, "bounds", regs, error_code, NULL);
+ goto exit;
+ }
+
+ fpu_xsave(&tsk->thread.fpu);
+ xsave_buf = &(tsk->thread.fpu.state->xsave);
+ status = xsave_buf->bndcsr.status_reg;
+
+ /*
+ * The error code field of the BNDSTATUS register communicates status
+ * information of a bound range exception #BR or operation involving
+ * bound directory.
+ */
+ switch (status & 0x3) {
+ case 2:
+ /*
+ * Bound directory has invalid entry.
+ * No signal will be sent to the user space.
+ */
+ if (!do_mpx_bt_fault(xsave_buf))
+ force_sig(SIGBUS, tsk);
+ break;
+
+ case 1: /* Bound violation. */
+ case 0: /* No exception caused by Intel MPX operations. */
+ do_trap(X86_TRAP_BR, SIGSEGV, "bounds", regs, error_code, NULL);
+ break;
+
+ default:
+ die("bounds", regs, error_code);
+ }
+
+exit:
+ exception_exit(prev_state);
+}
+
dotraplinkage void __kprobes
do_general_protection(struct pt_regs *regs, long error_code)
{
--
1.7.1

2014-02-24 17:28:27

by Dave Hansen

[permalink] [raw]
Subject: Re: [PATCH v5 2/3] x86, mpx: hook #BR exception handler to allocate bound tables

On 02/23/2014 05:27 AM, Qiaowei Ren wrote:
> +static bool allocate_bt(unsigned long bd_entry)
> +{
> + unsigned long bt_size = 1UL << (MPX_L2_BITS+MPX_L2_SHIFT);
> + unsigned long bt_addr, old_val = 0;
> +
> + bt_addr = sys_mmap_pgoff(0, bt_size, PROT_READ | PROT_WRITE,
> + MAP_ANONYMOUS | MAP_PRIVATE | MAP_POPULATE, -1, 0);
> + if (bt_addr == -1) {
> + pr_err("L2 Node Allocation Failed at L1 addr %lx\n",
> + bd_entry);
> + return false;
> + }
> + bt_addr = (bt_addr & MPX_L2_NODE_ADDR_MASK) | 0x01;
> +
> + user_atomic_cmpxchg_inatomic(&old_val,
> + (long __user *)bd_entry, 0, bt_addr);
> + if (old_val)
> + vm_munmap(bt_addr & MPX_L2_NODE_ADDR_MASK, bt_size);
> +
> + return true;
> +}
> +
> +bool do_mpx_bt_fault(struct xsave_struct *xsave_buf)
> +{
> + unsigned long status;
> + unsigned long bd_entry, bd_base;
> + unsigned long bd_size = 1UL << (MPX_L1_BITS+MPX_L1_SHIFT);
> +
> + bd_base = xsave_buf->bndcsr.cfg_reg_u & MPX_BNDCFG_ADDR_MASK;
> + status = xsave_buf->bndcsr.status_reg;
> +
> + bd_entry = status & MPX_BNDSTA_ADDR_MASK;
> + if ((bd_entry >= bd_base) && (bd_entry < bd_base + bd_size))
> + return allocate_bt(bd_entry);
> +
> + return false;
> +}

Can you talk a little bit about what the design is here? Why does the
kernel have to do the allocation of the virtual address space? Does it
really need to MAP_POPULATE? bt_size looks like 4MB, and that's an
awful lot of memory to eat up at once. Shouldn't we just let the kernel
demand-fault this like everything else?

2014-02-24 17:51:47

by H. Peter Anvin

[permalink] [raw]
Subject: Re: [PATCH v5 2/3] x86, mpx: hook #BR exception handler to allocate bound tables

On 02/24/2014 09:27 AM, Dave Hansen wrote:
>
> Can you talk a little bit about what the design is here? Why does the
> kernel have to do the allocation of the virtual address space? Does it
> really need to MAP_POPULATE? bt_size looks like 4MB, and that's an
> awful lot of memory to eat up at once. Shouldn't we just let the kernel
> demand-fault this like everything else?
>

MAP_POPULATE definitely seems like the wrong thing.

-hpa

2014-02-25 13:34:56

by Ren Qiaowei

[permalink] [raw]
Subject: RE: [PATCH v5 2/3] x86, mpx: hook #BR exception handler to allocate bound tables


> -----Original Message-----
> From: H. Peter Anvin [mailto:[email protected]]
> Sent: Tuesday, February 25, 2014 1:52 AM
> To: Hansen, Dave; Ren, Qiaowei; Thomas Gleixner; Ingo Molnar
> Cc: [email protected]; [email protected]
> Subject: Re: [PATCH v5 2/3] x86, mpx: hook #BR exception handler to allocate
> bound tables
>
> On 02/24/2014 09:27 AM, Dave Hansen wrote:
> >
> > Can you talk a little bit about what the design is here? Why does the
> > kernel have to do the allocation of the virtual address space? Does
> > it really need to MAP_POPULATE? bt_size looks like 4MB, and that's an
> > awful lot of memory to eat up at once. Shouldn't we just let the
> > kernel demand-fault this like everything else?
> >
>
> MAP_POPULATE definitely seems like the wrong thing.
>
Oh. This option should be removed.

Thanks,
Qiaowei

2014-02-26 19:17:31

by Dave Hansen

[permalink] [raw]
Subject: Re: [PATCH v5 1/3] x86, mpx: add documentation on Intel MPX

On 02/23/2014 05:27 AM, Qiaowei Ren wrote:
> +Bounds Directory (BD) and Bounds Tables (BT) are stored in
> +application memory and are allocated by the application (in case
> +of kernel use, the structures will be in kernel memory). The
> +bound directory and each instance of bound table are in contiguous
> +linear memory.

Hi Qiaowei,

Does this mean that if userspace decided to map something in the way of
the bounds tables that it would break MPX?

Also, in the description, could we s/linear/virtual/? Linear seems to
be the term that Intel likes to use in its documents, but we almost
universally call them virtual addresses in the kernel.

2014-02-26 20:58:49

by Dave Hansen

[permalink] [raw]
Subject: Re: [PATCH v5 1/3] x86, mpx: add documentation on Intel MPX

On 02/26/2014 11:17 AM, Dave Hansen wrote:
> On 02/23/2014 05:27 AM, Qiaowei Ren wrote:
>> +Bounds Directory (BD) and Bounds Tables (BT) are stored in
>> +application memory and are allocated by the application (in case
>> +of kernel use, the structures will be in kernel memory). The
>> +bound directory and each instance of bound table are in contiguous
>> +linear memory.
>
> Hi Qiaowei,
>
> Does this mean that if userspace decided to map something in the way of
> the bounds tables that it would break MPX?

Oh, or does this mean: "The Bounds Directory is contiguous in virtual
memory." and then "Each Bound Table is contiguous in virtual memory."
But, the directory and tables do not have to be contiguous to each other?

The wording there sounds a bit confusing.

2014-02-26 21:19:59

by Dave Hansen

[permalink] [raw]
Subject: Re: [PATCH v5 1/3] x86, mpx: add documentation on Intel MPX

On 02/23/2014 05:27 AM, Qiaowei Ren wrote:
> +The other case that generates a #BR is when a BNDSTX instruction
> +attempts to save bounds to a BD entry marked as invalid. This is
> +an indication that no BT exists for this entry. In this case the
> +fault handler will allocate a new BT.

Hi Qiaowei,

Can you make an effort to move these comments over to the code? The
bounds-table allocation function is a much more appropriate place for
this text than the Documentation/ file.

2014-02-26 21:45:11

by H. Peter Anvin

[permalink] [raw]
Subject: Re: [PATCH v5 1/3] x86, mpx: add documentation on Intel MPX

On 02/26/2014 11:17 AM, Dave Hansen wrote:
> On 02/23/2014 05:27 AM, Qiaowei Ren wrote:
>> +Bounds Directory (BD) and Bounds Tables (BT) are stored in
>> +application memory and are allocated by the application (in case
>> +of kernel use, the structures will be in kernel memory). The
>> +bound directory and each instance of bound table are in contiguous
>> +linear memory.
>
> Hi Qiaowei,
>
> Does this mean that if userspace decided to map something in the way of
> the bounds tables that it would break MPX?
>
> Also, in the description, could we s/linear/virtual/? Linear seems to
> be the term that Intel likes to use in its documents, but we almost
> universally call them virtual addresses in the kernel.
>

It might be useful to clarify, though, just so noone gets confused with
effective addresses (offsets).

-hpa

2014-02-27 02:05:14

by Ren Qiaowei

[permalink] [raw]
Subject: Re: [PATCH v5 1/3] x86, mpx: add documentation on Intel MPX

On 02/27/2014 05:19 AM, Dave Hansen wrote:
> On 02/23/2014 05:27 AM, Qiaowei Ren wrote:
>> +The other case that generates a #BR is when a BNDSTX instruction
>> +attempts to save bounds to a BD entry marked as invalid. This is
>> +an indication that no BT exists for this entry. In this case the
>> +fault handler will allocate a new BT.
>
> Hi Qiaowei,
>
> Can you make an effort to move these comments over to the code? The
> bounds-table allocation function is a much more appropriate place for
> this text than the Documentation/ file.
>
Ok. They will be moved.

Thanks,
Qiaowei

2014-02-27 02:09:48

by Ren Qiaowei

[permalink] [raw]
Subject: Re: [PATCH v5 1/3] x86, mpx: add documentation on Intel MPX

On 02/27/2014 05:44 AM, H. Peter Anvin wrote:
> On 02/26/2014 11:17 AM, Dave Hansen wrote:
>> On 02/23/2014 05:27 AM, Qiaowei Ren wrote:
>>> +Bounds Directory (BD) and Bounds Tables (BT) are stored in
>>> +application memory and are allocated by the application (in case
>>> +of kernel use, the structures will be in kernel memory). The
>>> +bound directory and each instance of bound table are in contiguous
>>> +linear memory.
>>
>> Hi Qiaowei,
>>
>> Does this mean that if userspace decided to map something in the way of
>> the bounds tables that it would break MPX?
>>
>> Also, in the description, could we s/linear/virtual/? Linear seems to
>> be the term that Intel likes to use in its documents, but we almost
>> universally call them virtual addresses in the kernel.
>>
>
> It might be useful to clarify, though, just so noone gets confused with
> effective addresses (offsets).
>
Ok. Some words here come from some MPX public documents. I need to go
through more carefully.

Thanks,
Qiaowei