2014-07-21 05:42:26

by Ren Qiaowei

Subject: [PATCH v7 00/10] Intel MPX support

This patchset adds support for the Memory Protection Extensions
(MPX) feature found in future Intel processors.

MPX can be used in conjunction with compiler changes to check memory
references, catching those references whose compile-time intentions
are usurped at runtime due to buffer overflow or underflow.

MPX provides this capability at very low performance overhead for
newly compiled code, and provides compatibility mechanisms for legacy
software components. The MPX architecture is designed to allow a machine
to run both MPX-enabled software and legacy software that is MPX unaware.
In such a case, the legacy software does not benefit from MPX, but it
also does not experience any change in functionality or reduction in
performance.

More information about Intel MPX can be found in "Intel(R) Architecture
Instruction Set Extensions Programming Reference".

To take advantage of MPX, changes are required in the OS kernel,
binutils, the compiler, and system library support.

A new GCC option, -fmpx, is introduced to utilize MPX instructions.
Currently, GCC compiler sources with MPX support are available in a
separate branch of the common GCC SVN repository. See the GCC SVN page
(http://gcc.gnu.org/svn.html) for details.

To get full protection, we had to add MPX instrumentation to all
the necessary Glibc routines (e.g. memcpy) written in assembly, and
compile Glibc with the MPX-enabled GCC compiler. Currently, MPX-enabled
Glibc sources can be found in the Glibc git repository.

Enabling an application to use MPX will generally not require source
code updates, but some runtime code, responsible for configuring and
enabling MPX, is needed in order to make use of it.
For most applications this runtime support will be available by linking
to a library supplied by the compiler, or possibly it will come directly
from the OS once OS versions that support MPX are available.

The MPX kernel code, namely this patchset, has two main responsibilities:
providing handlers for bounds faults (#BR), and managing bounds memory.

The high-level areas modified in the patchset are as follows:
1) struct siginfo is extended to include bound violation information.
2) two prctl() commands are added as a performance optimization.

Currently no hardware with the MPX ISA is available, but it is always
possible to use SDE (Intel(R) Software Development Emulator) instead,
which can be downloaded from
http://software.intel.com/en-us/articles/intel-software-development-emulator

In addition, this patchset has been tested on an Intel internal
hardware platform for MPX testing.

Future TODO items:
1) support 32-bit binaries on 64-bit kernels.

Changes since v1:
* check whether the #BR occurred in userspace or kernel space.
* use generic structures and macros as much as possible when
decoding MPX instructions.

Changes since v2:
* fix some compile warnings.
* update documentation.

Changes since v3:
* correct some syntax errors in the documentation, and document
the extended struct siginfo.
* kill the process when the error code in BNDSTATUS is 3.
* add some comments.
* remove new prctl() commands.
* fix some compile warnings for 32-bit.

Changes since v4:
* raise SIGBUS if allocation of a bounds table fails.

Changes since v5:
* hook the unmap() path to clean up unused bounds tables, and use a
new prctl() command to register the bounds directory address in
struct mm_struct, to check whether a process is MPX enabled
during unmap().
* in order to track MPX memory usage precisely, add an MPX-specific
mmap interface and a VM_MPX flag to check whether a VMA
is an MPX bounds table.
* add the macro cpu_has_mpx as a performance optimization.
* sync struct siginfo for MIPS with the generic version to avoid
a build issue.

Changes since v6:
* because arch_vma_name() was removed, this patchset has to set an
MPX-specific ->vm_ops to do the same thing.
* fix warnings for 32-bit arches.
* add more description to these patches.

Qiaowei Ren (10):
x86, mpx: introduce VM_MPX to indicate that a VMA is MPX specific
x86, mpx: add MPX specific mmap interface
x86, mpx: add macro cpu_has_mpx
x86, mpx: hook #BR exception handler to allocate bound tables
x86, mpx: extend siginfo structure to include bound violation
information
mips: sync struct siginfo with general version
x86, mpx: decode MPX instruction to get bound violation information
x86, mpx: add prctl commands PR_MPX_REGISTER, PR_MPX_UNREGISTER
x86, mpx: cleanup unused bound tables
x86, mpx: add documentation on Intel MPX

Documentation/x86/intel_mpx.txt | 127 +++++++++++
arch/mips/include/uapi/asm/siginfo.h | 4 +
arch/x86/Kconfig | 4 +
arch/x86/include/asm/cpufeature.h | 6 +
arch/x86/include/asm/mmu_context.h | 16 ++
arch/x86/include/asm/mpx.h | 91 ++++++++
arch/x86/include/asm/processor.h | 18 ++
arch/x86/kernel/Makefile | 1 +
arch/x86/kernel/mpx.c | 415 ++++++++++++++++++++++++++++++++++
arch/x86/kernel/traps.c | 61 +++++-
arch/x86/mm/Makefile | 2 +
arch/x86/mm/mpx.c | 260 +++++++++++++++++++++
fs/proc/task_mmu.c | 1 +
include/asm-generic/mmu_context.h | 6 +
include/linux/mm.h | 2 +
include/linux/mm_types.h | 3 +
include/uapi/asm-generic/siginfo.h | 9 +-
include/uapi/linux/prctl.h | 6 +
kernel/signal.c | 4 +
kernel/sys.c | 12 +
mm/mmap.c | 2 +
21 files changed, 1048 insertions(+), 2 deletions(-)
create mode 100644 Documentation/x86/intel_mpx.txt
create mode 100644 arch/x86/include/asm/mpx.h
create mode 100644 arch/x86/kernel/mpx.c
create mode 100644 arch/x86/mm/mpx.c


2014-07-21 05:42:27

by Ren Qiaowei

Subject: [PATCH v7 01/10] x86, mpx: introduce VM_MPX to indicate that a VMA is MPX specific

An MPX-enabled application will possibly create a lot of bounds tables
in its process address space to save bounds information. These tables
can take up huge swaths of memory (as much as 80% of the memory on
the system) even if we clean them up aggressively. Being this huge,
we need a way to track their memory use. If we want to track them,
we essentially have two options:

1. Walk the multi-GB (in virtual space) bounds directory to locate
all the VMAs and walk them
2. Find a way to distinguish MPX bounds-table VMAs from normal
anonymous VMAs and use some existing mechanism to walk them

We expect (1) to be prohibitively expensive. For (2), we only
need a single bit, and we've chosen to use a VM_ flag. We understand
that these are scarce and are open to other options.

There is one potential hybrid approach: check the bounds directory
entry for any anonymous VMA that could possibly contain a bounds table.
This is less expensive than (1), but still requires reading a pointer
out of userspace for every VMA that we iterate over.

Signed-off-by: Qiaowei Ren <[email protected]>
---
fs/proc/task_mmu.c | 1 +
include/linux/mm.h | 2 ++
2 files changed, 3 insertions(+), 0 deletions(-)

diff --git a/fs/proc/task_mmu.c b/fs/proc/task_mmu.c
index cfa63ee..b2bc755 100644
--- a/fs/proc/task_mmu.c
+++ b/fs/proc/task_mmu.c
@@ -549,6 +549,7 @@ static void show_smap_vma_flags(struct seq_file *m, struct vm_area_struct *vma)
[ilog2(VM_GROWSDOWN)] = "gd",
[ilog2(VM_PFNMAP)] = "pf",
[ilog2(VM_DENYWRITE)] = "dw",
+ [ilog2(VM_MPX)] = "mp",
[ilog2(VM_LOCKED)] = "lo",
[ilog2(VM_IO)] = "io",
[ilog2(VM_SEQ_READ)] = "sr",
diff --git a/include/linux/mm.h b/include/linux/mm.h
index e03dd29..44c75d7 100644
--- a/include/linux/mm.h
+++ b/include/linux/mm.h
@@ -127,6 +127,8 @@ extern unsigned int kobjsize(const void *objp);
#define VM_HUGETLB 0x00400000 /* Huge TLB Page VM */
#define VM_NONLINEAR 0x00800000 /* Is non-linear (remap_file_pages) */
#define VM_ARCH_1 0x01000000 /* Architecture-specific flag */
+/* MPX specific bounds table or bounds directory (x86) */
+#define VM_MPX 0x02000000
#define VM_DONTDUMP 0x04000000 /* Do not include in the core dump */

#ifdef CONFIG_MEM_SOFT_DIRTY
--
1.7.1

2014-07-21 05:42:33

by Ren Qiaowei

Subject: [PATCH v7 03/10] x86, mpx: add macro cpu_has_mpx

As a performance optimization, this patch adds the macro
cpu_has_mpx, which directly returns 0 when MPX is not supported
by the kernel.

The community gave a lot of comments on this macro in the previous
version. Dave will introduce a patchset about disabled features to
address this later.

In this code:
if (cpu_has_mpx)
do_some_mpx_thing();

The patch series from Dave will introduce a new macro, cpu_feature_enabled()
(if merged after this patchset), to replace cpu_has_mpx:
if (cpu_feature_enabled(X86_FEATURE_MPX))
do_some_mpx_thing();

Signed-off-by: Qiaowei Ren <[email protected]>
---
arch/x86/include/asm/cpufeature.h | 6 ++++++
1 files changed, 6 insertions(+), 0 deletions(-)

diff --git a/arch/x86/include/asm/cpufeature.h b/arch/x86/include/asm/cpufeature.h
index e265ff9..f302d08 100644
--- a/arch/x86/include/asm/cpufeature.h
+++ b/arch/x86/include/asm/cpufeature.h
@@ -339,6 +339,12 @@ extern const char * const x86_power_flags[32];
#define cpu_has_eager_fpu boot_cpu_has(X86_FEATURE_EAGER_FPU)
#define cpu_has_topoext boot_cpu_has(X86_FEATURE_TOPOEXT)

+#ifdef CONFIG_X86_INTEL_MPX
+#define cpu_has_mpx boot_cpu_has(X86_FEATURE_MPX)
+#else
+#define cpu_has_mpx 0
+#endif /* CONFIG_X86_INTEL_MPX */
+
#ifdef CONFIG_X86_64

#undef cpu_has_vme
--
1.7.1

2014-07-21 05:42:39

by Ren Qiaowei

Subject: [PATCH v7 06/10] mips: sync struct siginfo with general version

Because new bound-violation fields were added to struct siginfo,
this patch syncs the MIPS version with the generic one to avoid a
build issue.

Signed-off-by: Qiaowei Ren <[email protected]>
---
arch/mips/include/uapi/asm/siginfo.h | 4 ++++
1 files changed, 4 insertions(+), 0 deletions(-)

diff --git a/arch/mips/include/uapi/asm/siginfo.h b/arch/mips/include/uapi/asm/siginfo.h
index e811744..d08f83f 100644
--- a/arch/mips/include/uapi/asm/siginfo.h
+++ b/arch/mips/include/uapi/asm/siginfo.h
@@ -92,6 +92,10 @@ typedef struct siginfo {
int _trapno; /* TRAP # which caused the signal */
#endif
short _addr_lsb;
+ struct {
+ void __user *_lower;
+ void __user *_upper;
+ } _addr_bnd;
} _sigfault;

/* SIGPOLL, SIGXFSZ (To do ...) */
--
1.7.1

2014-07-21 05:42:43

by Ren Qiaowei

Subject: [PATCH v7 07/10] x86, mpx: decode MPX instruction to get bound violation information

This patch sets the bound violation fields of the siginfo struct in
the #BR exception handler by decoding the user instruction and
constructing the faulting pointer.

This patch doesn't use the generic decoder; it implements a limited,
special-purpose decoder for MPX instructions, simply because the
generic decoder is very heavyweight, not just in terms of performance
but in terms of interface -- because it has to be.

Signed-off-by: Qiaowei Ren <[email protected]>
---
arch/x86/include/asm/mpx.h | 23 ++++
arch/x86/kernel/mpx.c | 299 ++++++++++++++++++++++++++++++++++++++++++++
arch/x86/kernel/traps.c | 6 +
3 files changed, 328 insertions(+), 0 deletions(-)

diff --git a/arch/x86/include/asm/mpx.h b/arch/x86/include/asm/mpx.h
index b7598ac..780af63 100644
--- a/arch/x86/include/asm/mpx.h
+++ b/arch/x86/include/asm/mpx.h
@@ -3,6 +3,7 @@

#include <linux/types.h>
#include <asm/ptrace.h>
+#include <asm/insn.h>

#ifdef CONFIG_X86_64

@@ -44,15 +45,37 @@
#define MPX_BNDSTA_ERROR_CODE 0x3
#define MPX_BD_ENTRY_VALID_FLAG 0x1

+struct mpx_insn {
+ struct insn_field rex_prefix; /* REX prefix */
+ struct insn_field modrm;
+ struct insn_field sib;
+ struct insn_field displacement;
+
+ unsigned char addr_bytes; /* effective address size */
+ unsigned char limit;
+ unsigned char x86_64;
+
+ const unsigned char *kaddr; /* kernel address of insn to analyze */
+ const unsigned char *next_byte;
+};
+
+#define MAX_MPX_INSN_SIZE 15
+
unsigned long mpx_mmap(unsigned long len);

#ifdef CONFIG_X86_INTEL_MPX
int do_mpx_bt_fault(struct xsave_struct *xsave_buf);
+void do_mpx_bounds(struct pt_regs *regs, siginfo_t *info,
+ struct xsave_struct *xsave_buf);
#else
static inline int do_mpx_bt_fault(struct xsave_struct *xsave_buf)
{
return -EINVAL;
}
+static inline void do_mpx_bounds(struct pt_regs *regs, siginfo_t *info,
+ struct xsave_struct *xsave_buf)
+{
+}
#endif /* CONFIG_X86_INTEL_MPX */

#endif /* _ASM_X86_MPX_H */
diff --git a/arch/x86/kernel/mpx.c b/arch/x86/kernel/mpx.c
index f02dcea..c1957a8 100644
--- a/arch/x86/kernel/mpx.c
+++ b/arch/x86/kernel/mpx.c
@@ -2,6 +2,275 @@
#include <linux/syscalls.h>
#include <asm/mpx.h>

+enum reg_type {
+ REG_TYPE_RM = 0,
+ REG_TYPE_INDEX,
+ REG_TYPE_BASE,
+};
+
+static unsigned long get_reg(struct mpx_insn *insn, struct pt_regs *regs,
+ enum reg_type type)
+{
+ int regno = 0;
+ unsigned char modrm = (unsigned char)insn->modrm.value;
+ unsigned char sib = (unsigned char)insn->sib.value;
+
+ static const int regoff[] = {
+ offsetof(struct pt_regs, ax),
+ offsetof(struct pt_regs, cx),
+ offsetof(struct pt_regs, dx),
+ offsetof(struct pt_regs, bx),
+ offsetof(struct pt_regs, sp),
+ offsetof(struct pt_regs, bp),
+ offsetof(struct pt_regs, si),
+ offsetof(struct pt_regs, di),
+#ifdef CONFIG_X86_64
+ offsetof(struct pt_regs, r8),
+ offsetof(struct pt_regs, r9),
+ offsetof(struct pt_regs, r10),
+ offsetof(struct pt_regs, r11),
+ offsetof(struct pt_regs, r12),
+ offsetof(struct pt_regs, r13),
+ offsetof(struct pt_regs, r14),
+ offsetof(struct pt_regs, r15),
+#endif
+ };
+
+ switch (type) {
+ case REG_TYPE_RM:
+ regno = X86_MODRM_RM(modrm);
+ if (X86_REX_B(insn->rex_prefix.value) == 1)
+ regno += 8;
+ break;
+
+ case REG_TYPE_INDEX:
+ regno = X86_SIB_INDEX(sib);
+ if (X86_REX_X(insn->rex_prefix.value) == 1)
+ regno += 8;
+ break;
+
+ case REG_TYPE_BASE:
+ regno = X86_SIB_BASE(sib);
+ if (X86_REX_B(insn->rex_prefix.value) == 1)
+ regno += 8;
+ break;
+
+ default:
+ break;
+ }
+
+ return regs_get_register(regs, regoff[regno]);
+}
+
+/*
+ * Return the address being referenced by the instruction:
+ * for mod == 3, return the content of the r/m register;
+ * for mod != 3, calculate the address using SIB and displacement.
+ */
+static unsigned long get_addr_ref(struct mpx_insn *insn, struct pt_regs *regs)
+{
+ unsigned long addr;
+ unsigned long base;
+ unsigned long indx;
+ unsigned char modrm = (unsigned char)insn->modrm.value;
+ unsigned char sib = (unsigned char)insn->sib.value;
+
+ if (X86_MODRM_MOD(modrm) == 3) {
+ addr = get_reg(insn, regs, REG_TYPE_RM);
+ } else {
+ if (insn->sib.nbytes) {
+ base = get_reg(insn, regs, REG_TYPE_BASE);
+ indx = get_reg(insn, regs, REG_TYPE_INDEX);
+ addr = base + indx * (1 << X86_SIB_SCALE(sib));
+ } else {
+ addr = get_reg(insn, regs, REG_TYPE_RM);
+ }
+ addr += insn->displacement.value;
+ }
+
+ return addr;
+}
+
+/* Verify that the next sizeof(t) + n bytes lie within the same instruction */
+#define validate_next(t, insn, n) \
+ ((insn)->next_byte + sizeof(t) + n - (insn)->kaddr <= (insn)->limit)
+
+#define __get_next(t, insn) \
+({ \
+ t r = *(t *)insn->next_byte; \
+ insn->next_byte += sizeof(t); \
+ r; \
+})
+
+#define __peek_next(t, insn) \
+({ \
+ t r = *(t *)insn->next_byte; \
+ r; \
+})
+
+#define get_next(t, insn) \
+({ \
+ if (unlikely(!validate_next(t, insn, 0))) \
+ goto err_out; \
+ __get_next(t, insn); \
+})
+
+#define peek_next(t, insn) \
+({ \
+ if (unlikely(!validate_next(t, insn, 0))) \
+ goto err_out; \
+ __peek_next(t, insn); \
+})
+
+static void mpx_insn_get_prefixes(struct mpx_insn *insn)
+{
+ unsigned char b;
+
+ /* Decode legacy prefix and REX prefix */
+ b = peek_next(unsigned char, insn);
+ while (b != 0x0f) {
+ /*
+ * Look for a REX prefix; a REX prefix cannot be
+ * followed by a legacy prefix.
+ */
+ if (insn->x86_64 && ((b&0xf0) == 0x40)) {
+ insn->rex_prefix.value = b;
+ insn->rex_prefix.nbytes = 1;
+ insn->next_byte++;
+ break;
+ }
+
+ /* check the other legacy prefixes */
+ switch (b) {
+ case 0xf2:
+ case 0xf3:
+ case 0xf0:
+ case 0x64:
+ case 0x65:
+ case 0x2e:
+ case 0x3e:
+ case 0x26:
+ case 0x36:
+ case 0x66:
+ case 0x67:
+ insn->next_byte++;
+ break;
+ default: /* everything else is garbage */
+ goto err_out;
+ }
+ b = peek_next(unsigned char, insn);
+ }
+
+err_out:
+ return;
+}
+
+static void mpx_insn_get_modrm(struct mpx_insn *insn)
+{
+ insn->modrm.value = get_next(unsigned char, insn);
+ insn->modrm.nbytes = 1;
+
+err_out:
+ return;
+}
+
+static void mpx_insn_get_sib(struct mpx_insn *insn)
+{
+ unsigned char modrm = (unsigned char)insn->modrm.value;
+
+ if (X86_MODRM_MOD(modrm) != 3 && X86_MODRM_RM(modrm) == 4) {
+ insn->sib.value = get_next(unsigned char, insn);
+ insn->sib.nbytes = 1;
+ }
+
+err_out:
+ return;
+}
+
+static void mpx_insn_get_displacement(struct mpx_insn *insn)
+{
+ unsigned char mod, rm, base;
+
+ /*
+ * Interpreting the modrm byte:
+ * mod = 00 - no displacement fields (exceptions below)
+ * mod = 01 - 1-byte displacement field
+ * mod = 10 - displacement field is 4 bytes
+ * mod = 11 - no memory operand
+ *
+ * mod != 11, r/m = 100 - SIB byte exists
+ * mod = 00, SIB base = 101 - displacement field is 4 bytes
+ * mod = 00, r/m = 101 - rip-relative addressing, displacement
+ * field is 4 bytes
+ */
+ mod = X86_MODRM_MOD(insn->modrm.value);
+ rm = X86_MODRM_RM(insn->modrm.value);
+ base = X86_SIB_BASE(insn->sib.value);
+ if (mod == 3)
+ return;
+ if (mod == 1) {
+ insn->displacement.value = get_next(unsigned char, insn);
+ insn->displacement.nbytes = 1;
+ } else if ((mod == 0 && rm == 5) || mod == 2 ||
+ (mod == 0 && base == 5)) {
+ insn->displacement.value = get_next(int, insn);
+ insn->displacement.nbytes = 4;
+ }
+
+err_out:
+ return;
+}
+
+static void mpx_insn_init(struct mpx_insn *insn, struct pt_regs *regs)
+{
+ unsigned char buf[MAX_MPX_INSN_SIZE];
+ int bytes;
+
+ memset(insn, 0, sizeof(*insn));
+
+ bytes = copy_from_user(buf, (void __user *)regs->ip, MAX_MPX_INSN_SIZE);
+ insn->limit = MAX_MPX_INSN_SIZE - bytes;
+ insn->kaddr = buf;
+ insn->next_byte = buf;
+
+ /*
+ * In 64-bit Mode, all Intel MPX instructions use 64-bit
+ * operands for bounds and 64 bit addressing, i.e. REX.W &
+ * 67H have no effect on data or address size.
+ *
+ * In compatibility and legacy modes (including 16-bit code
+ * segments, real and virtual 8086 modes) all Intel MPX
+ * instructions use 32-bit operands for bounds and 32 bit
+ * addressing.
+ */
+#ifdef CONFIG_X86_64
+ insn->x86_64 = 1;
+ insn->addr_bytes = 8;
+#else
+ insn->x86_64 = 0;
+ insn->addr_bytes = 4;
+#endif
+}
+
+static unsigned long mpx_insn_decode(struct mpx_insn *insn,
+ struct pt_regs *regs)
+{
+ mpx_insn_init(insn, regs);
+
+ /*
+ * In this case, we only need to decode bndcl/bndcn/bndcu,
+ * so we can use private disassembly interfaces to get the
+ * prefixes, modrm, sib, displacement, etc.
+ */
+ mpx_insn_get_prefixes(insn);
+ insn->next_byte += 2; /* ignore opcode */
+ mpx_insn_get_modrm(insn);
+ mpx_insn_get_sib(insn);
+ mpx_insn_get_displacement(insn);
+
+ return get_addr_ref(insn, regs);
+}
+
static int allocate_bt(long __user *bd_entry)
{
unsigned long bt_addr, old_val = 0;
@@ -58,3 +327,33 @@ int do_mpx_bt_fault(struct xsave_struct *xsave_buf)

return allocate_bt((long __user *)bd_entry);
}
+
+/*
+ * If a bounds overflow occurs then a #BR is generated. The fault
+ * handler decodes the MPX instruction to get the violation address
+ * and sets this address into the extended struct siginfo.
+ */
+void do_mpx_bounds(struct pt_regs *regs, siginfo_t *info,
+ struct xsave_struct *xsave_buf)
+{
+ struct mpx_insn insn;
+ uint8_t bndregno;
+ unsigned long addr_vio;
+
+ addr_vio = mpx_insn_decode(&insn, regs);
+
+ bndregno = X86_MODRM_REG(insn.modrm.value);
+ if (bndregno > 3)
+ return;
+
+ /* Note: the upper 32 bits are ignored in 32-bit mode. */
+ info->si_lower = (void __user *)(unsigned long)
+ (xsave_buf->bndregs.bndregs[2*bndregno]);
+ info->si_upper = (void __user *)(unsigned long)
+ (~xsave_buf->bndregs.bndregs[2*bndregno+1]);
+ info->si_addr_lsb = 0;
+ info->si_signo = SIGSEGV;
+ info->si_errno = 0;
+ info->si_code = SEGV_BNDERR;
+ info->si_addr = (void __user *)addr_vio;
+}
diff --git a/arch/x86/kernel/traps.c b/arch/x86/kernel/traps.c
index 396a88b..93ce924 100644
--- a/arch/x86/kernel/traps.c
+++ b/arch/x86/kernel/traps.c
@@ -284,6 +284,7 @@ dotraplinkage void do_bounds(struct pt_regs *regs, long error_code)
unsigned long status;
struct xsave_struct *xsave_buf;
struct task_struct *tsk = current;
+ siginfo_t info;

prev_state = exception_enter();
if (notify_die(DIE_TRAP, "bounds", regs, error_code,
@@ -319,6 +320,11 @@ dotraplinkage void do_bounds(struct pt_regs *regs, long error_code)
break;

case 1: /* Bound violation. */
+ do_mpx_bounds(regs, &info, xsave_buf);
+ do_trap(X86_TRAP_BR, SIGSEGV, "bounds", regs,
+ error_code, &info);
+ break;
+
case 0: /* No exception caused by Intel MPX operations. */
do_trap(X86_TRAP_BR, SIGSEGV, "bounds", regs, error_code, NULL);
break;
--
1.7.1

2014-07-21 05:42:52

by Ren Qiaowei

Subject: [PATCH v7 10/10] x86, mpx: add documentation on Intel MPX

This patch adds the Documentation/x86/intel_mpx.txt file with some
information about Intel MPX.

Signed-off-by: Qiaowei Ren <[email protected]>
---
Documentation/x86/intel_mpx.txt | 127 +++++++++++++++++++++++++++++++++++++++
1 files changed, 127 insertions(+), 0 deletions(-)
create mode 100644 Documentation/x86/intel_mpx.txt

diff --git a/Documentation/x86/intel_mpx.txt b/Documentation/x86/intel_mpx.txt
new file mode 100644
index 0000000..1af9809
--- /dev/null
+++ b/Documentation/x86/intel_mpx.txt
@@ -0,0 +1,127 @@
+1. Intel(R) MPX Overview
+========================
+
+Intel(R) Memory Protection Extensions (Intel(R) MPX) is a new
+capability introduced into Intel Architecture. Intel MPX provides
+hardware features that can be used in conjunction with compiler
+changes to check memory references, for those references whose
+compile-time normal intentions are usurped at runtime due to
+buffer overflow or underflow.
+
+For more information, please refer to Intel(R) Architecture
+Instruction Set Extensions Programming Reference, Chapter 9:
+Intel(R) Memory Protection Extensions.
+
+Note: Currently no hardware with MPX ISA is available but it is always
+possible to use SDE (Intel(R) Software Development Emulator) instead,
+which can be downloaded from
+http://software.intel.com/en-us/articles/intel-software-development-emulator
+
+
+2. How does MPX kernel code work
+================================
+
+Handling #BR faults caused by MPX
+---------------------------------
+
+When MPX is enabled, there are two new situations that can generate
+#BR faults:
+ * bounds violation caused by MPX instructions.
+ * new bounds tables (BT) need to be allocated to save bounds.
+
+We hook #BR handler to handle these two new situations.
+
+Decoding MPX instructions
+-------------------------
+
+If a #BR is generated due to a bounds violation caused by MPX,
+we need to decode the MPX instruction to get the violation address
+and set this address into the extended struct siginfo.
+
+The _sigfault field of struct siginfo is extended as follows:
+
+87 /* SIGILL, SIGFPE, SIGSEGV, SIGBUS */
+88 struct {
+89 void __user *_addr; /* faulting insn/memory ref. */
+90 #ifdef __ARCH_SI_TRAPNO
+91 int _trapno; /* TRAP # which caused the signal */
+92 #endif
+93 short _addr_lsb; /* LSB of the reported address */
+94 struct {
+95 void __user *_lower;
+96 void __user *_upper;
+97 } _addr_bnd;
+98 } _sigfault;
+
+The '_addr' field refers to the violation address, and the new
+'_addr_bnd' field holds the lower/upper bounds when a #BR is raised.
+
+Glibc will also be updated to support this new siginfo, so users
+can get the violation address and bounds when bounds violations occur.
+
+Freeing unused bounds tables
+----------------------------
+
+When a BNDSTX instruction attempts to save bounds to a bounds directory
+entry marked as invalid, a #BR is generated. This is an indication that
+no bounds table exists for this entry. In this case the fault handler
+will allocate a new bounds table on demand.
+
+Since the kernel allocated those tables on-demand without userspace
+knowledge, it is also responsible for freeing them when the associated
+mappings go away.
+
+The solution for this issue is to hook do_munmap() to check whether
+the process is MPX enabled. If so, the bounds tables covered by the
+virtual address region being unmapped are freed as well.
+
+Adding new prctl commands
+-------------------------
+
+The runtime library in userspace is responsible for allocating the
+bounds directory, so the kernel has to use the XSAVE instruction to
+get the base of the bounds directory from the BNDCFG register.
+
+But XSAVE is expected to be very expensive. As a performance
+optimization, a new prctl command is added that caches the base of
+the bounds directory for future use.
+
+Two new prctl commands are added to register and unregister MPX-related
+resources.
+
+155 #define PR_MPX_REGISTER 41
+156 #define PR_MPX_UNREGISTER 42
+
+The base of the bounds directory is stored in mm_struct during
+PR_MPX_REGISTER command execution. This member can be used to
+check whether an application is MPX enabled.
+
+
+3. Tips
+=======
+
+1) Users are not allowed to create bounds tables and point the bounds
+directory at them in userspace. In fact, it is also not necessary
+for users to create bounds tables in userspace.
+
+When a #BR fault is produced due to an invalid entry, a bounds table is
+created in the kernel on demand, and the kernel will not forward this
+fault to userspace. So userspace never receives a #BR fault for an
+invalid entry, and users need not create bounds tables by themselves.
+
+Certainly, users can allocate bounds tables and forcibly point the
+bounds directory at them via the XSAVE instruction, setting the valid
+bit of the directory entry. But we have no way to track the memory
+usage of such user-created bounds tables, so this behaviour is
+outlawed here.
+
+2) We will not support the case where multiple bounds directory entries
+point at the same bounds table.
+
+The hardware allows multiple bounds directory entries to point at the
+same bounds table. For more information, see "Intel(R) Architecture
+Instruction Set Extensions Programming Reference" (9.3.4).
+
+If userspace did this, it would be possible for the kernel to unmap an
+in-use bounds table, since it does not recognize sharing. So this
+behavior is also outlawed here.
--
1.7.1

2014-07-21 05:42:37

by Ren Qiaowei

Subject: [PATCH v7 05/10] x86, mpx: extend siginfo structure to include bound violation information

This patch adds new bound-violation fields to the siginfo
structure. si_lower and si_upper are, respectively, the lower and
upper bounds in effect when the bound violation occurred.

Signed-off-by: Qiaowei Ren <[email protected]>
---
include/uapi/asm-generic/siginfo.h | 9 ++++++++-
kernel/signal.c | 4 ++++
2 files changed, 12 insertions(+), 1 deletions(-)

diff --git a/include/uapi/asm-generic/siginfo.h b/include/uapi/asm-generic/siginfo.h
index ba5be7f..1e35520 100644
--- a/include/uapi/asm-generic/siginfo.h
+++ b/include/uapi/asm-generic/siginfo.h
@@ -91,6 +91,10 @@ typedef struct siginfo {
int _trapno; /* TRAP # which caused the signal */
#endif
short _addr_lsb; /* LSB of the reported address */
+ struct {
+ void __user *_lower;
+ void __user *_upper;
+ } _addr_bnd;
} _sigfault;

/* SIGPOLL */
@@ -131,6 +135,8 @@ typedef struct siginfo {
#define si_trapno _sifields._sigfault._trapno
#endif
#define si_addr_lsb _sifields._sigfault._addr_lsb
+#define si_lower _sifields._sigfault._addr_bnd._lower
+#define si_upper _sifields._sigfault._addr_bnd._upper
#define si_band _sifields._sigpoll._band
#define si_fd _sifields._sigpoll._fd
#ifdef __ARCH_SIGSYS
@@ -199,7 +205,8 @@ typedef struct siginfo {
*/
#define SEGV_MAPERR (__SI_FAULT|1) /* address not mapped to object */
#define SEGV_ACCERR (__SI_FAULT|2) /* invalid permissions for mapped object */
-#define NSIGSEGV 2
+#define SEGV_BNDERR (__SI_FAULT|3) /* failed address bound checks */
+#define NSIGSEGV 3

/*
* SIGBUS si_codes
diff --git a/kernel/signal.c b/kernel/signal.c
index a4077e9..2131636 100644
--- a/kernel/signal.c
+++ b/kernel/signal.c
@@ -2748,6 +2748,10 @@ int copy_siginfo_to_user(siginfo_t __user *to, const siginfo_t *from)
if (from->si_code == BUS_MCEERR_AR || from->si_code == BUS_MCEERR_AO)
err |= __put_user(from->si_addr_lsb, &to->si_addr_lsb);
#endif
+#ifdef SEGV_BNDERR
+ err |= __put_user(from->si_lower, &to->si_lower);
+ err |= __put_user(from->si_upper, &to->si_upper);
+#endif
break;
case __SI_CHLD:
err |= __put_user(from->si_pid, &to->si_pid);
--
1.7.1

2014-07-21 05:42:47

by Ren Qiaowei

Subject: [PATCH v7 08/10] x86, mpx: add prctl commands PR_MPX_REGISTER, PR_MPX_UNREGISTER

This patch adds the PR_MPX_REGISTER and PR_MPX_UNREGISTER prctl()
commands. These commands can be used to register and unregister MPX
related resources on the x86 platform.

The base of the bounds directory is stored in mm_struct during
PR_MPX_REGISTER command execution. This member can be used to
check whether an application is MPX enabled.

Signed-off-by: Qiaowei Ren <[email protected]>
---
arch/x86/include/asm/mpx.h | 1 +
arch/x86/include/asm/processor.h | 18 ++++++++++++
arch/x86/kernel/mpx.c | 56 ++++++++++++++++++++++++++++++++++++++
include/linux/mm_types.h | 3 ++
include/uapi/linux/prctl.h | 6 ++++
kernel/sys.c | 12 ++++++++
6 files changed, 96 insertions(+), 0 deletions(-)

diff --git a/arch/x86/include/asm/mpx.h b/arch/x86/include/asm/mpx.h
index 780af63..6cb0853 100644
--- a/arch/x86/include/asm/mpx.h
+++ b/arch/x86/include/asm/mpx.h
@@ -43,6 +43,7 @@
#define MPX_BT_SIZE_BYTES (1UL<<(MPX_BT_ENTRY_OFFSET+MPX_BT_ENTRY_SHIFT))

#define MPX_BNDSTA_ERROR_CODE 0x3
+#define MPX_BNDCFG_ENABLE_FLAG 0x1
#define MPX_BD_ENTRY_VALID_FLAG 0x1

struct mpx_insn {
diff --git a/arch/x86/include/asm/processor.h b/arch/x86/include/asm/processor.h
index a4ea023..6e0966e 100644
--- a/arch/x86/include/asm/processor.h
+++ b/arch/x86/include/asm/processor.h
@@ -952,6 +952,24 @@ extern void start_thread(struct pt_regs *regs, unsigned long new_ip,
extern int get_tsc_mode(unsigned long adr);
extern int set_tsc_mode(unsigned int val);

+/* Register/unregister a process' MPX related resource */
+#define MPX_REGISTER(tsk) mpx_register((tsk))
+#define MPX_UNREGISTER(tsk) mpx_unregister((tsk))
+
+#ifdef CONFIG_X86_INTEL_MPX
+extern int mpx_register(struct task_struct *tsk);
+extern int mpx_unregister(struct task_struct *tsk);
+#else
+static inline int mpx_register(struct task_struct *tsk)
+{
+ return -EINVAL;
+}
+static inline int mpx_unregister(struct task_struct *tsk)
+{
+ return -EINVAL;
+}
+#endif /* CONFIG_X86_INTEL_MPX */
+
extern u16 amd_get_nb_id(int cpu);

static inline uint32_t hypervisor_cpuid_base(const char *sig, uint32_t leaves)
diff --git a/arch/x86/kernel/mpx.c b/arch/x86/kernel/mpx.c
index c1957a8..6b7e526 100644
--- a/arch/x86/kernel/mpx.c
+++ b/arch/x86/kernel/mpx.c
@@ -1,6 +1,62 @@
#include <linux/kernel.h>
#include <linux/syscalls.h>
+#include <linux/prctl.h>
#include <asm/mpx.h>
+#include <asm/i387.h>
+#include <asm/fpu-internal.h>
+
+/*
+ * This should only be called when cpuid has been checked
+ * and we are sure that MPX is available.
+ */
+static __user void *task_get_bounds_dir(struct task_struct *tsk)
+{
+ struct xsave_struct *xsave_buf;
+
+ fpu_xsave(&tsk->thread.fpu);
+ xsave_buf = &(tsk->thread.fpu.state->xsave);
+ if (!(xsave_buf->bndcsr.cfg_reg_u & MPX_BNDCFG_ENABLE_FLAG))
+ return NULL;
+
+ return (void __user *)(unsigned long)(xsave_buf->bndcsr.cfg_reg_u &
+ MPX_BNDCFG_ADDR_MASK);
+}
+
+int mpx_register(struct task_struct *tsk)
+{
+ struct mm_struct *mm = tsk->mm;
+
+ if (!cpu_has_mpx)
+ return -EINVAL;
+
+ /*
+ * The runtime in userspace is responsible for allocating the
+ * bounds directory. It saves the base of the bounds directory
+ * into the XSAVE/XRSTOR save area and enables MPX through the
+ * XRSTOR instruction.
+ *
+ * fpu_xsave() is expected to be very expensive. As a performance
+ * optimization, we read the base of the bounds directory here and
+ * save it into mm_struct to be used in the future.
+ */
+ mm->bd_addr = task_get_bounds_dir(tsk);
+ if (!mm->bd_addr)
+ return -EINVAL;
+
+ pr_debug("MPX BD base address %p\n", mm->bd_addr);
+ return 0;
+}
+
+int mpx_unregister(struct task_struct *tsk)
+{
+ struct mm_struct *mm = tsk->mm;
+
+ if (!cpu_has_mpx)
+ return -EINVAL;
+
+ mm->bd_addr = NULL;
+ return 0;
+}

enum reg_type {
REG_TYPE_RM = 0,
diff --git a/include/linux/mm_types.h b/include/linux/mm_types.h
index 96c5750..131b5b3 100644
--- a/include/linux/mm_types.h
+++ b/include/linux/mm_types.h
@@ -454,6 +454,9 @@ struct mm_struct {
bool tlb_flush_pending;
#endif
struct uprobes_state uprobes_state;
+#ifdef CONFIG_X86_INTEL_MPX
+ void __user *bd_addr; /* address of the bounds directory */
+#endif
};

static inline void mm_init_cpumask(struct mm_struct *mm)
diff --git a/include/uapi/linux/prctl.h b/include/uapi/linux/prctl.h
index 58afc04..ce86fa9 100644
--- a/include/uapi/linux/prctl.h
+++ b/include/uapi/linux/prctl.h
@@ -152,4 +152,10 @@
#define PR_SET_THP_DISABLE 41
#define PR_GET_THP_DISABLE 42

+/*
+ * Register/unregister MPX related resource.
+ */
+#define PR_MPX_REGISTER 43
+#define PR_MPX_UNREGISTER 44
+
#endif /* _LINUX_PRCTL_H */
diff --git a/kernel/sys.c b/kernel/sys.c
index 66a751e..eadff9c 100644
--- a/kernel/sys.c
+++ b/kernel/sys.c
@@ -91,6 +91,12 @@
#ifndef SET_TSC_CTL
# define SET_TSC_CTL(a) (-EINVAL)
#endif
+#ifndef MPX_REGISTER
+# define MPX_REGISTER(a) (-EINVAL)
+#endif
+#ifndef MPX_UNREGISTER
+# define MPX_UNREGISTER(a) (-EINVAL)
+#endif

/*
* this is where the system-wide overflow UID and GID are defined, for
@@ -2011,6 +2017,12 @@ SYSCALL_DEFINE5(prctl, int, option, unsigned long, arg2, unsigned long, arg3,
me->mm->def_flags &= ~VM_NOHUGEPAGE;
up_write(&me->mm->mmap_sem);
break;
+ case PR_MPX_REGISTER:
+ error = MPX_REGISTER(me);
+ break;
+ case PR_MPX_UNREGISTER:
+ error = MPX_UNREGISTER(me);
+ break;
default:
error = -EINVAL;
break;
--
1.7.1

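Behind PR_MPX_REGISTER, mpx_register() recovers the bounds-directory base by checking the enable bit in BNDCFGU and masking off the low 12 configuration bits. A minimal user-space sketch of that decode (the constant and function names here are local to the sketch, mirroring the patch's MPX_BNDCFG_* macros):

```c
#include <assert.h>
#include <stdint.h>

/* Local mirrors of MPX_BNDCFG_ENABLE_FLAG / MPX_BNDCFG_ADDR_MASK. */
#define BNDCFG_ENABLE_FLAG 0x1UL
#define BNDCFG_ADDR_MASK   (~((1UL << 12) - 1)) /* bits 63:12 hold the BD base */

/* Same decision task_get_bounds_dir() makes on bndcsr.cfg_reg_u:
 * return the bounds-directory base, or 0 if MPX is not enabled. */
static uint64_t bounds_dir_base(uint64_t cfg)
{
	if (!(cfg & BNDCFG_ENABLE_FLAG))
		return 0;
	return cfg & BNDCFG_ADDR_MASK;
}
```

Since the base is page-aligned by the runtime, masking the low bits also strips the enable flag itself.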
2014-07-21 05:43:17

by Ren Qiaowei

[permalink] [raw]
Subject: [PATCH v7 09/10] x86, mpx: cleanup unused bound tables

Since the kernel allocates those tables on-demand without userspace
knowledge, it is also responsible for freeing them when the associated
mappings go away.

The solution here is to hook do_munmap() and check whether the process
is MPX enabled. If so, the bounds tables covered by the virtual address
region being unmapped are freed as well.

Signed-off-by: Qiaowei Ren <[email protected]>
---
arch/x86/include/asm/mmu_context.h | 16 +++
arch/x86/include/asm/mpx.h | 9 ++
arch/x86/mm/mpx.c | 181 ++++++++++++++++++++++++++++++++++++
include/asm-generic/mmu_context.h | 6 +
mm/mmap.c | 2 +
5 files changed, 214 insertions(+), 0 deletions(-)

diff --git a/arch/x86/include/asm/mmu_context.h b/arch/x86/include/asm/mmu_context.h
index be12c53..af70d4f 100644
--- a/arch/x86/include/asm/mmu_context.h
+++ b/arch/x86/include/asm/mmu_context.h
@@ -6,6 +6,7 @@
#include <asm/pgalloc.h>
#include <asm/tlbflush.h>
#include <asm/paravirt.h>
+#include <asm/mpx.h>
#ifndef CONFIG_PARAVIRT
#include <asm-generic/mm_hooks.h>

@@ -96,4 +97,19 @@ do { \
} while (0)
#endif

+static inline void arch_unmap(struct mm_struct *mm,
+ struct vm_area_struct *vma,
+ unsigned long start, unsigned long end)
+{
+#ifdef CONFIG_X86_INTEL_MPX
+ /*
+ * Check whether this vma comes from an MPX-enabled application.
+ * If so, release the bounds tables related to this vma.
+ */
+ if (mm->bd_addr && !(vma->vm_flags & VM_MPX))
+ mpx_unmap(mm, start, end);
+
+#endif
+}
+
#endif /* _ASM_X86_MMU_CONTEXT_H */
diff --git a/arch/x86/include/asm/mpx.h b/arch/x86/include/asm/mpx.h
index 6cb0853..e848a74 100644
--- a/arch/x86/include/asm/mpx.h
+++ b/arch/x86/include/asm/mpx.h
@@ -42,6 +42,13 @@
#define MPX_BD_SIZE_BYTES (1UL<<(MPX_BD_ENTRY_OFFSET+MPX_BD_ENTRY_SHIFT))
#define MPX_BT_SIZE_BYTES (1UL<<(MPX_BT_ENTRY_OFFSET+MPX_BT_ENTRY_SHIFT))

+#define MPX_BD_ENTRY_MASK ((1<<MPX_BD_ENTRY_OFFSET)-1)
+#define MPX_BT_ENTRY_MASK ((1<<MPX_BT_ENTRY_OFFSET)-1)
+#define MPX_GET_BD_ENTRY_OFFSET(addr) ((((addr)>>(MPX_BT_ENTRY_OFFSET+ \
+ MPX_IGN_BITS)) & MPX_BD_ENTRY_MASK) << MPX_BD_ENTRY_SHIFT)
+#define MPX_GET_BT_ENTRY_OFFSET(addr) ((((addr)>>MPX_IGN_BITS) & \
+ MPX_BT_ENTRY_MASK) << MPX_BT_ENTRY_SHIFT)
+
#define MPX_BNDSTA_ERROR_CODE 0x3
#define MPX_BNDCFG_ENABLE_FLAG 0x1
#define MPX_BD_ENTRY_VALID_FLAG 0x1
@@ -63,6 +70,8 @@ struct mpx_insn {
#define MAX_MPX_INSN_SIZE 15

unsigned long mpx_mmap(unsigned long len);
+void mpx_unmap(struct mm_struct *mm,
+ unsigned long start, unsigned long end);

#ifdef CONFIG_X86_INTEL_MPX
int do_mpx_bt_fault(struct xsave_struct *xsave_buf);
diff --git a/arch/x86/mm/mpx.c b/arch/x86/mm/mpx.c
index e1b28e6..d29ec9c 100644
--- a/arch/x86/mm/mpx.c
+++ b/arch/x86/mm/mpx.c
@@ -2,6 +2,7 @@
#include <linux/syscalls.h>
#include <asm/mpx.h>
#include <asm/mman.h>
+#include <asm/mmu_context.h>
#include <linux/sched/sysctl.h>

static const char *mpx_mapping_name(struct vm_area_struct *vma)
@@ -77,3 +78,183 @@ out:
up_write(&mm->mmap_sem);
return ret;
}
+
+/*
+ * Get the base of the bounds table pointed to by a specific
+ * bounds directory entry.
+ */
+static int get_bt_addr(long __user *bd_entry, unsigned long *bt_addr,
+ unsigned int *valid)
+{
+ if (get_user(*bt_addr, bd_entry))
+ return -EFAULT;
+
+ *valid = *bt_addr & MPX_BD_ENTRY_VALID_FLAG;
+ *bt_addr &= MPX_BT_ADDR_MASK;
+
+ /*
+ * If this bounds directory entry is nonzero but its valid bit
+ * is clear, the state is inconsistent, so a SIGSEGV is delivered
+ * to the task.
+ */
+ if (!(*valid) && *bt_addr)
+ force_sig(SIGSEGV, current);
+
+ return 0;
+}
+
+/*
+ * Free the backing physical pages of bounds table 'bt_addr'.
+ * Assume start...end is within that bounds table.
+ */
+static void zap_bt_entries(struct mm_struct *mm, unsigned long bt_addr,
+ unsigned long start, unsigned long end)
+{
+ struct vm_area_struct *vma;
+
+ /* Find the vma which overlaps this bounds table */
+ vma = find_vma(mm, bt_addr);
+ if (!vma || vma->vm_start > bt_addr ||
+ vma->vm_end < bt_addr+MPX_BT_SIZE_BYTES)
+ return;
+
+ zap_page_range(vma, start, end, NULL);
+}
+
+static void unmap_single_bt(struct mm_struct *mm, long __user *bd_entry,
+ unsigned long bt_addr)
+{
+ if (user_atomic_cmpxchg_inatomic(&bt_addr, bd_entry,
+ bt_addr | MPX_BD_ENTRY_VALID_FLAG, 0))
+ return;
+
+ /*
+ * To avoid recursion, do_munmap() uses the VM_MPX flag to check
+ * whether the region being unmapped is itself a bounds table.
+ */
+ do_munmap(mm, bt_addr & MPX_BT_ADDR_MASK, MPX_BT_SIZE_BYTES);
+}
+
+/*
+ * If the bounds table pointed to by bounds directory entry 'bd_entry' is
+ * not shared, unmap this whole bounds table. Otherwise, only free
+ * those backing physical pages of bounds table entries covered
+ * in this virtual address region start...end.
+ */
+static void unmap_shared_bt(struct mm_struct *mm, long __user *bd_entry,
+ unsigned long start, unsigned long end,
+ bool prev_shared, bool next_shared)
+{
+ unsigned long bt_addr;
+ unsigned int bde_valid = 0;
+
+ if (get_bt_addr(bd_entry, &bt_addr, &bde_valid) || !bde_valid)
+ return;
+
+ if (prev_shared && next_shared)
+ zap_bt_entries(mm, bt_addr,
+ bt_addr+MPX_GET_BT_ENTRY_OFFSET(start),
+ bt_addr+MPX_GET_BT_ENTRY_OFFSET(end-1));
+ else if (prev_shared)
+ zap_bt_entries(mm, bt_addr,
+ bt_addr+MPX_GET_BT_ENTRY_OFFSET(start),
+ bt_addr+MPX_BT_SIZE_BYTES);
+ else if (next_shared)
+ zap_bt_entries(mm, bt_addr, bt_addr,
+ bt_addr+MPX_GET_BT_ENTRY_OFFSET(end-1));
+ else
+ unmap_single_bt(mm, bd_entry, bt_addr);
+}
+
+/*
+ * A virtual address region being munmap()ed might share bounds
+ * tables with adjacent VMAs. We only need to free the backing
+ * physical memory of the shared bounds-table entries covered by
+ * this virtual address region.
+ *
+ * The VMAs covering the virtual address region start...end have
+ * already been split if necessary and removed from the VMA list.
+ */
+static void unmap_side_bts(struct mm_struct *mm, unsigned long start,
+ unsigned long end)
+{
+ long __user *bde_start, *bde_end;
+ struct vm_area_struct *prev, *next;
+ bool prev_shared = false, next_shared = false;
+
+ bde_start = mm->bd_addr + MPX_GET_BD_ENTRY_OFFSET(start);
+ bde_end = mm->bd_addr + MPX_GET_BD_ENTRY_OFFSET(end-1);
+
+ /*
+ * Check whether bde_start and bde_end are shared with adjacent
+ * VMAs. Because the VMAs covering the virtual address region
+ * start...end have already been removed from the VMA list, if
+ * next is not NULL it will satisfy start < end <= next->vm_start.
+ * And if prev is not NULL, prev->vm_end <= start < end.
+ */
+ next = find_vma_prev(mm, start, &prev);
+ if (prev && MPX_GET_BD_ENTRY_OFFSET(prev->vm_end-1) == (long)bde_start)
+ prev_shared = true;
+ if (next && MPX_GET_BD_ENTRY_OFFSET(next->vm_start) == (long)bde_end)
+ next_shared = true;
+
+ /*
+ * This virtual address region being munmap()ed is only
+ * covered by one bounds table.
+ *
+ * In this case, if this table is also shared with adjacent
+ * VMAs, only part of the backing physical memory of the bounds
+ * table needs to be freed. Otherwise the whole bounds table
+ * needs to be unmapped.
+ */
+ if (bde_start == bde_end) {
+ unmap_shared_bt(mm, bde_start, start, end,
+ prev_shared, next_shared);
+ return;
+ }
+
+ /*
+ * If more than one bounds table is covered in this virtual
+ * address region being munmap()ed, we need to separately check
+ * whether bde_start and bde_end are shared with adjacent VMAs.
+ */
+ unmap_shared_bt(mm, bde_start, start, end, prev_shared, false);
+ unmap_shared_bt(mm, bde_end, start, end, false, next_shared);
+}
+
+/*
+ * Free unused bounds tables covered in a virtual address region being
+ * munmap()ed. Assume end > start.
+ *
+ * This function will be called by do_munmap(), and the VMAs covering
+ * the virtual address region start...end have already been split if
+ * necessary and removed from the VMA list.
+ */
+void mpx_unmap(struct mm_struct *mm,
+ unsigned long start, unsigned long end)
+{
+ long __user *bd_entry, *bde_start, *bde_end;
+ unsigned long bt_addr;
+ unsigned int bde_valid;
+
+ /*
+ * Unmap the bounds tables pointed to by the start/end bounds
+ * directory entries, or only free part of their backing physical
+ * memory if they are shared with adjacent VMAs.
+ */
+ unmap_side_bts(mm, start, end);
+
+ /*
+ * Unmap those bounds tables which are entirely covered by this
+ * virtual address region.
+ */
+ bde_start = mm->bd_addr + MPX_GET_BD_ENTRY_OFFSET(start);
+ bde_end = mm->bd_addr + MPX_GET_BD_ENTRY_OFFSET(end-1);
+ for (bd_entry = bde_start + 1; bd_entry < bde_end; bd_entry++) {
+ if (get_bt_addr(bd_entry, &bt_addr, &bde_valid))
+ return;
+ if (!bde_valid)
+ continue;
+ unmap_single_bt(mm, bd_entry, bt_addr);
+ }
+}
diff --git a/include/asm-generic/mmu_context.h b/include/asm-generic/mmu_context.h
index a7eec91..ac558ca 100644
--- a/include/asm-generic/mmu_context.h
+++ b/include/asm-generic/mmu_context.h
@@ -42,4 +42,10 @@ static inline void activate_mm(struct mm_struct *prev_mm,
{
}

+static inline void arch_unmap(struct mm_struct *mm,
+ struct vm_area_struct *vma,
+ unsigned long start, unsigned long end)
+{
+}
+
#endif /* __ASM_GENERIC_MMU_CONTEXT_H */
diff --git a/mm/mmap.c b/mm/mmap.c
index 129b847..8550d84 100644
--- a/mm/mmap.c
+++ b/mm/mmap.c
@@ -2560,6 +2560,8 @@ int do_munmap(struct mm_struct *mm, unsigned long start, size_t len)
/* Fix up all other VM information */
remove_vma_list(mm, vma);

+ arch_unmap(mm, vma, start, end);
+
return 0;
}

--
1.7.1

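For reference, the MPX_GET_BD_ENTRY_OFFSET()/MPX_GET_BT_ENTRY_OFFSET() arithmetic used by unmap_side_bts() and mpx_unmap() can be checked in user space. A sketch with the 64-bit constants (names shortened; this mirrors the patch's macros, it does not replace them):

```c
#include <assert.h>
#include <stdint.h>

/* 64-bit layout from the patch: bits [47:20] index the bounds directory,
 * bits [19:3] index a bounds table, bits [2:0] are ignored. */
#define BD_ENTRY_OFFSET 28
#define BD_ENTRY_SHIFT  3
#define BT_ENTRY_OFFSET 17
#define BT_ENTRY_SHIFT  5
#define IGN_BITS        3

/* Byte offset of the directory entry covering 'addr'. */
static uint64_t bd_entry_offset(uint64_t addr)
{
	return ((addr >> (BT_ENTRY_OFFSET + IGN_BITS)) &
		((1UL << BD_ENTRY_OFFSET) - 1)) << BD_ENTRY_SHIFT;
}

/* Byte offset of the table entry covering 'addr', within its table. */
static uint64_t bt_entry_offset(uint64_t addr)
{
	return ((addr >> IGN_BITS) &
		((1UL << BT_ENTRY_OFFSET) - 1)) << BT_ENTRY_SHIFT;
}
```

Two addresses within the same 1MB-aligned region share a directory entry, and consecutive 8-byte pointer slots map to table entries 32 bytes apart, which is why adjacent VMAs can share a bounds table.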
2014-07-21 05:43:59

by Ren Qiaowei

[permalink] [raw]
Subject: [PATCH v7 04/10] x86, mpx: hook #BR exception handler to allocate bound tables

This patch handles a #BR exception for non-existent tables by
carving the space out of the normal process's address space
(essentially calling mmap() from inside the kernel) and then
pointing the bounds-directory over to it.

The tables need to be accessed and controlled by userspace
because the compiler generates instructions for MPX-enabled
code which frequently store and retrieve entries from the bounds
tables. Any direct kernel involvement (like a syscall) to access
the tables would destroy performance since these are so frequent.

The tables are carved out of userspace because we have no better
spot to put them. For each pointer which is being tracked by MPX,
the bounds tables contain 4 longs worth of data, and the tables
are indexed virtually. If we were to preallocate the tables, we
would theoretically need to allocate 4x the virtual space that
we have available for userspace somewhere else. We don't have
that room in the kernel address space.

Signed-off-by: Qiaowei Ren <[email protected]>
---
arch/x86/include/asm/mpx.h | 20 ++++++++++++++
arch/x86/kernel/Makefile | 1 +
arch/x86/kernel/mpx.c | 60 ++++++++++++++++++++++++++++++++++++++++++++
arch/x86/kernel/traps.c | 55 +++++++++++++++++++++++++++++++++++++++-
4 files changed, 135 insertions(+), 1 deletions(-)
create mode 100644 arch/x86/kernel/mpx.c

diff --git a/arch/x86/include/asm/mpx.h b/arch/x86/include/asm/mpx.h
index 5725ac4..b7598ac 100644
--- a/arch/x86/include/asm/mpx.h
+++ b/arch/x86/include/asm/mpx.h
@@ -18,6 +18,8 @@
#define MPX_BT_ENTRY_SHIFT 5
#define MPX_IGN_BITS 3

+#define MPX_BD_ENTRY_TAIL 3
+
#else

#define MPX_BD_ENTRY_OFFSET 20
@@ -26,13 +28,31 @@
#define MPX_BT_ENTRY_SHIFT 4
#define MPX_IGN_BITS 2

+#define MPX_BD_ENTRY_TAIL 2
+
#endif

+#define MPX_BNDSTA_TAIL 2
+#define MPX_BNDCFG_TAIL 12
+#define MPX_BNDSTA_ADDR_MASK (~((1UL<<MPX_BNDSTA_TAIL)-1))
+#define MPX_BNDCFG_ADDR_MASK (~((1UL<<MPX_BNDCFG_TAIL)-1))
+#define MPX_BT_ADDR_MASK (~((1UL<<MPX_BD_ENTRY_TAIL)-1))
+
#define MPX_BD_SIZE_BYTES (1UL<<(MPX_BD_ENTRY_OFFSET+MPX_BD_ENTRY_SHIFT))
#define MPX_BT_SIZE_BYTES (1UL<<(MPX_BT_ENTRY_OFFSET+MPX_BT_ENTRY_SHIFT))

#define MPX_BNDSTA_ERROR_CODE 0x3
+#define MPX_BD_ENTRY_VALID_FLAG 0x1

unsigned long mpx_mmap(unsigned long len);

+#ifdef CONFIG_X86_INTEL_MPX
+int do_mpx_bt_fault(struct xsave_struct *xsave_buf);
+#else
+static inline int do_mpx_bt_fault(struct xsave_struct *xsave_buf)
+{
+ return -EINVAL;
+}
+#endif /* CONFIG_X86_INTEL_MPX */
+
#endif /* _ASM_X86_MPX_H */
diff --git a/arch/x86/kernel/Makefile b/arch/x86/kernel/Makefile
index 047f9ff..5e81e16 100644
--- a/arch/x86/kernel/Makefile
+++ b/arch/x86/kernel/Makefile
@@ -43,6 +43,7 @@ obj-$(CONFIG_PREEMPT) += preempt.o

obj-y += process.o
obj-y += i387.o xsave.o
+obj-$(CONFIG_X86_INTEL_MPX) += mpx.o
obj-y += ptrace.o
obj-$(CONFIG_X86_32) += tls.o
obj-$(CONFIG_IA32_EMULATION) += tls.o
diff --git a/arch/x86/kernel/mpx.c b/arch/x86/kernel/mpx.c
new file mode 100644
index 0000000..f02dcea
--- /dev/null
+++ b/arch/x86/kernel/mpx.c
@@ -0,0 +1,60 @@
+#include <linux/kernel.h>
+#include <linux/syscalls.h>
+#include <asm/mpx.h>
+
+static int allocate_bt(long __user *bd_entry)
+{
+ unsigned long bt_addr, old_val = 0;
+ int ret = 0;
+
+ bt_addr = mpx_mmap(MPX_BT_SIZE_BYTES);
+ if (IS_ERR((void *)bt_addr))
+ return bt_addr;
+ bt_addr = (bt_addr & MPX_BT_ADDR_MASK) | MPX_BD_ENTRY_VALID_FLAG;
+
+ ret = user_atomic_cmpxchg_inatomic(&old_val, bd_entry, 0, bt_addr);
+ if (ret)
+ goto out;
+
+ /*
+ * there is an existing bounds table pointed at by this bounds
+ * directory entry, and so we need to free the bounds table
+ * allocated just now.
+ */
+ if (old_val)
+ goto out;
+
+ pr_debug("Allocate bounds table %lx at entry %p\n",
+ bt_addr, bd_entry);
+ return 0;
+
+out:
+ vm_munmap(bt_addr & MPX_BT_ADDR_MASK, MPX_BT_SIZE_BYTES);
+ return ret;
+}
+
+/*
+ * When a BNDSTX instruction attempts to save bounds to a BD entry
+ * whose valid bit is not set, a #BR is generated.
+ * This is an indication that no BT exists for this entry. In this
+ * case the fault handler will allocate a new BT.
+ *
+ * In 32-bit mode, the BD is 4MB and each bounds table is 16KB.
+ * In 64-bit mode, the BD is 2GB and each bounds table is 4MB.
+ */
+int do_mpx_bt_fault(struct xsave_struct *xsave_buf)
+{
+ unsigned long status;
+ unsigned long bd_entry, bd_base;
+
+ bd_base = xsave_buf->bndcsr.cfg_reg_u & MPX_BNDCFG_ADDR_MASK;
+ status = xsave_buf->bndcsr.status_reg;
+
+ bd_entry = status & MPX_BNDSTA_ADDR_MASK;
+ if ((bd_entry < bd_base) ||
+ (bd_entry >= bd_base + MPX_BD_SIZE_BYTES))
+ return -EINVAL;
+
+ return allocate_bt((long __user *)bd_entry);
+}
diff --git a/arch/x86/kernel/traps.c b/arch/x86/kernel/traps.c
index 0d0e922..396a88b 100644
--- a/arch/x86/kernel/traps.c
+++ b/arch/x86/kernel/traps.c
@@ -60,6 +60,7 @@
#include <asm/fixmap.h>
#include <asm/mach_traps.h>
#include <asm/alternative.h>
+#include <asm/mpx.h>

#ifdef CONFIG_X86_64
#include <asm/x86_init.h>
@@ -228,7 +229,6 @@ dotraplinkage void do_##name(struct pt_regs *regs, long error_code) \

DO_ERROR(X86_TRAP_DE, SIGFPE, "divide error", divide_error)
DO_ERROR(X86_TRAP_OF, SIGSEGV, "overflow", overflow)
-DO_ERROR(X86_TRAP_BR, SIGSEGV, "bounds", bounds)
DO_ERROR(X86_TRAP_UD, SIGILL, "invalid opcode", invalid_op)
DO_ERROR(X86_TRAP_OLD_MF, SIGFPE, "coprocessor segment overrun",coprocessor_segment_overrun)
DO_ERROR(X86_TRAP_TS, SIGSEGV, "invalid TSS", invalid_TSS)
@@ -278,6 +278,59 @@ dotraplinkage void do_double_fault(struct pt_regs *regs, long error_code)
}
#endif

+dotraplinkage void do_bounds(struct pt_regs *regs, long error_code)
+{
+ enum ctx_state prev_state;
+ unsigned long status;
+ struct xsave_struct *xsave_buf;
+ struct task_struct *tsk = current;
+
+ prev_state = exception_enter();
+ if (notify_die(DIE_TRAP, "bounds", regs, error_code,
+ X86_TRAP_BR, SIGSEGV) == NOTIFY_STOP)
+ goto exit;
+ conditional_sti(regs);
+
+ if (!user_mode(regs))
+ die("bounds", regs, error_code);
+
+ if (!cpu_has_mpx) {
+ /* The exception is not from Intel MPX */
+ do_trap(X86_TRAP_BR, SIGSEGV, "bounds", regs, error_code, NULL);
+ goto exit;
+ }
+
+ fpu_xsave(&tsk->thread.fpu);
+ xsave_buf = &(tsk->thread.fpu.state->xsave);
+ status = xsave_buf->bndcsr.status_reg;
+
+ /*
+ * The error code field of the BNDSTATUS register communicates status
+ * information of a bound range exception #BR or operation involving
+ * bound directory.
+ */
+ switch (status & MPX_BNDSTA_ERROR_CODE) {
+ case 2:
+ /*
+ * Bound directory has invalid entry.
+ */
+ if (do_mpx_bt_fault(xsave_buf))
+ force_sig(SIGSEGV, tsk);
+ break;
+
+ case 1: /* Bound violation. */
+ case 0: /* No exception caused by Intel MPX operations. */
+ do_trap(X86_TRAP_BR, SIGSEGV, "bounds", regs, error_code, NULL);
+ break;
+
+ default:
+ die("bounds", regs, error_code);
+ }
+
+exit:
+ exception_exit(prev_state);
+}
+
dotraplinkage void
do_general_protection(struct pt_regs *regs, long error_code)
{
--
1.7.1

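allocate_bt() allocates the new table first and only then publishes it with a compare-and-exchange on the directory entry, freeing the fresh table if another thread installed one in the meantime. The same allocate-then-publish pattern can be sketched in user space with C11 atomics (an analogue for illustration, not the kernel's user_atomic_cmpxchg_inatomic()):

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdint.h>
#include <stdlib.h>

/* Try to install a freshly allocated table into an empty slot; on losing
 * the race, free our copy and return the winner's table instead. */
static void *install_table(_Atomic(uintptr_t) *slot, size_t size)
{
	uintptr_t expected = 0;
	void *table = calloc(1, size);

	if (!table)
		return NULL;
	if (!atomic_compare_exchange_strong(slot, &expected, (uintptr_t)table)) {
		/* Lost the race: someone else's table is already published. */
		free(table);
		return (void *)expected;
	}
	return table;
}

/* Single-threaded demo: the first install wins; the second must observe
 * and reuse the winner rather than leak a second table. */
static int install_demo(void)
{
	_Atomic(uintptr_t) slot = 0;
	void *first = install_table(&slot, 64);
	void *second = install_table(&slot, 64);
	int ok = first && second == first;

	free(first);
	return ok;
}
```

Allocating before publishing keeps the slot either empty or pointing at a fully initialized table, so concurrent #BR faults never see a half-built table.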
2014-07-21 05:44:33

by Ren Qiaowei

[permalink] [raw]
Subject: [PATCH v7 02/10] x86, mpx: add MPX specific mmap interface

This patch adds an MPX-specific mmap interface, which only handles
MPX-related maps, including bounds tables and the bounds directory.

In order to track MPX-specific memory usage, this interface sets the
new vm_flag VM_MPX in the vm_area_struct when a bounds table or bounds
directory is created.

These bounds tables can take huge amounts of memory. In the
worst-case scenario, the tables can be 4x the size of the data
structure being tracked. IOW, a 1-page structure can require 4
bounds-table pages.

My expectation is that folks using MPX are going to be keen on
figuring out how much memory is being dedicated to it. With this
feature, plus some grepping in /proc/$pid/smaps one could take a
pretty good stab at it.

Signed-off-by: Qiaowei Ren <[email protected]>
---
arch/x86/Kconfig | 4 ++
arch/x86/include/asm/mpx.h | 38 +++++++++++++++++++++
arch/x86/mm/Makefile | 2 +
arch/x86/mm/mpx.c | 79 ++++++++++++++++++++++++++++++++++++++++++++
4 files changed, 123 insertions(+), 0 deletions(-)
create mode 100644 arch/x86/include/asm/mpx.h
create mode 100644 arch/x86/mm/mpx.c

diff --git a/arch/x86/Kconfig b/arch/x86/Kconfig
index a8f749e..020db35 100644
--- a/arch/x86/Kconfig
+++ b/arch/x86/Kconfig
@@ -238,6 +238,10 @@ config HAVE_INTEL_TXT
def_bool y
depends on INTEL_IOMMU && ACPI

+config X86_INTEL_MPX
+ def_bool y
+ depends on CPU_SUP_INTEL
+
config X86_32_SMP
def_bool y
depends on X86_32 && SMP
diff --git a/arch/x86/include/asm/mpx.h b/arch/x86/include/asm/mpx.h
new file mode 100644
index 0000000..5725ac4
--- /dev/null
+++ b/arch/x86/include/asm/mpx.h
@@ -0,0 +1,38 @@
+#ifndef _ASM_X86_MPX_H
+#define _ASM_X86_MPX_H
+
+#include <linux/types.h>
+#include <asm/ptrace.h>
+
+#ifdef CONFIG_X86_64
+
+/* Bits [47:20] of the 64-bit virtual address are used to index
+ * into the bounds directory (BD).
+ */
+#define MPX_BD_ENTRY_OFFSET 28
+#define MPX_BD_ENTRY_SHIFT 3
+/* Bits [19:3] of the 64-bit virtual address are used to index
+ * into a bounds table (BT).
+ */
+#define MPX_BT_ENTRY_OFFSET 17
+#define MPX_BT_ENTRY_SHIFT 5
+#define MPX_IGN_BITS 3
+
+#else
+
+#define MPX_BD_ENTRY_OFFSET 20
+#define MPX_BD_ENTRY_SHIFT 2
+#define MPX_BT_ENTRY_OFFSET 10
+#define MPX_BT_ENTRY_SHIFT 4
+#define MPX_IGN_BITS 2
+
+#endif
+
+#define MPX_BD_SIZE_BYTES (1UL<<(MPX_BD_ENTRY_OFFSET+MPX_BD_ENTRY_SHIFT))
+#define MPX_BT_SIZE_BYTES (1UL<<(MPX_BT_ENTRY_OFFSET+MPX_BT_ENTRY_SHIFT))
+
+#define MPX_BNDSTA_ERROR_CODE 0x3
+
+unsigned long mpx_mmap(unsigned long len);
+
+#endif /* _ASM_X86_MPX_H */
diff --git a/arch/x86/mm/Makefile b/arch/x86/mm/Makefile
index 6a19ad9..ecfdc46 100644
--- a/arch/x86/mm/Makefile
+++ b/arch/x86/mm/Makefile
@@ -30,3 +30,5 @@ obj-$(CONFIG_ACPI_NUMA) += srat.o
obj-$(CONFIG_NUMA_EMU) += numa_emulation.o

obj-$(CONFIG_MEMTEST) += memtest.o
+
+obj-$(CONFIG_X86_INTEL_MPX) += mpx.o
diff --git a/arch/x86/mm/mpx.c b/arch/x86/mm/mpx.c
new file mode 100644
index 0000000..e1b28e6
--- /dev/null
+++ b/arch/x86/mm/mpx.c
@@ -0,0 +1,79 @@
+#include <linux/kernel.h>
+#include <linux/syscalls.h>
+#include <asm/mpx.h>
+#include <asm/mman.h>
+#include <linux/sched/sysctl.h>
+
+static const char *mpx_mapping_name(struct vm_area_struct *vma)
+{
+ return "[mpx]";
+}
+
+static struct vm_operations_struct mpx_vma_ops = {
+ .name = mpx_mapping_name,
+};
+
+/*
+ * this is really a simplified "vm_mmap". it only handles mpx
+ * related maps, including bounds table and bounds directory.
+ *
+ * here we stick the new vm_flag VM_MPX in the vm_area_struct
+ * when creating a bounds table or bounds directory, in order to
+ * track MPX specific memory.
+ */
+unsigned long mpx_mmap(unsigned long len)
+{
+ unsigned long ret;
+ unsigned long addr, pgoff;
+ struct mm_struct *mm = current->mm;
+ vm_flags_t vm_flags;
+ struct vm_area_struct *vma;
+
+ /* Only bounds table and bounds directory can be allocated here */
+ if (len != MPX_BD_SIZE_BYTES && len != MPX_BT_SIZE_BYTES)
+ return -EINVAL;
+
+ down_write(&mm->mmap_sem);
+
+ /* Too many mappings? */
+ if (mm->map_count > sysctl_max_map_count) {
+ ret = -ENOMEM;
+ goto out;
+ }
+
+ /* Obtain the address to map to. we verify (or select) it and ensure
+ * that it represents a valid section of the address space.
+ */
+ addr = get_unmapped_area(NULL, 0, len, 0, MAP_ANONYMOUS | MAP_PRIVATE);
+ if (addr & ~PAGE_MASK) {
+ ret = addr;
+ goto out;
+ }
+
+ vm_flags = VM_READ | VM_WRITE | VM_MPX |
+ mm->def_flags | VM_MAYREAD | VM_MAYWRITE | VM_MAYEXEC;
+
+ /* Set pgoff according to addr for anon_vma */
+ pgoff = addr >> PAGE_SHIFT;
+
+ ret = mmap_region(NULL, addr, len, vm_flags, pgoff);
+ if (IS_ERR_VALUE(ret))
+ goto out;
+
+ vma = find_vma(mm, ret);
+ if (!vma) {
+ ret = -ENOMEM;
+ goto out;
+ }
+ vma->vm_ops = &mpx_vma_ops;
+
+ if (vm_flags & VM_LOCKED) {
+ up_write(&mm->mmap_sem);
+ mm_populate(ret, len);
+ return ret;
+ }
+
+out:
+ up_write(&mm->mmap_sem);
+ return ret;
+}
--
1.7.1

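mpx_mmap() only accepts the two fixed sizes MPX_BD_SIZE_BYTES and MPX_BT_SIZE_BYTES, which follow directly from the entry-count and entry-size constants: a 2GB directory with 4MB tables in 64-bit mode, and a 4MB directory with 16KB tables in 32-bit mode. A quick check of that arithmetic:

```c
#include <assert.h>

/* Region size implied by the patch's constants: number of entries
 * (1 << entry_offset) times entry size (1 << entry_shift). */
static unsigned long region_bytes(int entry_offset, int entry_shift)
{
	return 1UL << (entry_offset + entry_shift);
}
```

This is also why the cover letter's worst case is 4x the tracked data: each 8-byte pointer slot in 64-bit mode has a 32-byte bounds-table entry.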
2014-07-21 06:08:02

by Andi Kleen

[permalink] [raw]
Subject: Re: [PATCH v7 07/10] x86, mpx: decode MPX instruction to get bound violation information

Qiaowei Ren <[email protected]> writes:
> + */
> +#ifdef CONFIG_X86_64
> + insn->x86_64 = 1;
> + insn->addr_bytes = 8;
> +#else
> + insn->x86_64 = 0;
> + insn->addr_bytes = 4;
> +#endif

How would that handle compat mode on a 64bit kernel?
Should likely look at the code segment instead of ifdef.
> + /* Note: the upper 32 bits are ignored in 32-bit mode. */

Again correct for compat mode? I believe the upper bits
are undefined.


-Andi
--
[email protected] -- Speaking for myself only

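Deriving the width from the code segment rather than from CONFIG_X86_64, as Andi suggests, would look roughly like the following. The L/D-bit logic follows the SDM's segment-descriptor rules; this is a sketch of the decision, not the kernel's decoder:

```c
#include <assert.h>

/* Default address width from the code segment's L and D bits:
 * CS.L=1 means a 64-bit code segment; otherwise CS.D selects
 * 32-bit vs 16-bit protected mode. */
static int insn_addr_bytes(int cs_l, int cs_d)
{
	if (cs_l)
		return 8;
	return cs_d ? 4 : 2;
}
```

A 32-bit task on a 64-bit kernel (compat mode) runs with CS.L=0, so an ifdef on CONFIG_X86_64 would wrongly decode its instructions as 64-bit.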
2014-07-21 06:10:20

by Andi Kleen

[permalink] [raw]
Subject: Re: [PATCH v7 08/10] x86, mpx: add prctl commands PR_MPX_REGISTER, PR_MPX_UNREGISTER

Qiaowei Ren <[email protected]> writes:

> This patch adds the PR_MPX_REGISTER and PR_MPX_UNREGISTER prctl()
> commands. These commands can be used to register and unregister MPX
> related resource on the x86 platform.

Please provide a manpage for the API. This is needed
for proper review. Your description is far too vague.

-Andi

--
[email protected] -- Speaking for myself only

2014-07-21 06:11:34

by Ren Qiaowei

[permalink] [raw]
Subject: RE: [PATCH v7 07/10] x86, mpx: decode MPX instruction to get bound violation information



On 2014-07-21, Andi Kleen wrote:
> Qiaowei Ren <[email protected]> writes:
>> + */
>> +#ifdef CONFIG_X86_64
>> + insn->x86_64 = 1;
>> + insn->addr_bytes = 8;
>> +#else
>> + insn->x86_64 = 0;
>> + insn->addr_bytes = 4;
>> +#endif
>
> How would that handle compat mode on a 64bit kernel?
> Should likely look at the code segment instead of ifdef.
>> + /* Note: the upper 32 bits are ignored in 32-bit mode. */
>
> Again correct for compat mode? I believe the upper bits are undefined.
>
Compat mode will be supported in a future patch, as the 00/10 cover letter mentioned. ^-^

Thanks,
Qiaowei

2014-07-22 00:42:59

by Zhang, Tianfei

[permalink] [raw]
Subject: RE: [PATCH v7 05/10] x86, mpx: extend siginfo structure to include bound violation information



> -----Original Message-----
> From: [email protected] [mailto:[email protected]] On
> Behalf Of Qiaowei Ren
> Sent: Monday, July 21, 2014 1:39 PM
> To: H. Peter Anvin; Thomas Gleixner; Ingo Molnar; Hansen, Dave
> Cc: [email protected]; [email protected]; [email protected]; Ren,
> Qiaowei
> Subject: [PATCH v7 05/10] x86, mpx: extend siginfo structure to include bound
> violation information
>
> This patch adds new fields about bound violation into siginfo structure.
> si_lower and si_upper are respectively lower bound and upper bound when
> bound violation is caused.
>
> Signed-off-by: Qiaowei Ren <[email protected]>
> ---
> include/uapi/asm-generic/siginfo.h | 9 ++++++++-
> kernel/signal.c | 4 ++++
> 2 files changed, 12 insertions(+), 1 deletions(-)
>
> diff --git a/include/uapi/asm-generic/siginfo.h
> b/include/uapi/asm-generic/siginfo.h
> index ba5be7f..1e35520 100644
> --- a/include/uapi/asm-generic/siginfo.h
> +++ b/include/uapi/asm-generic/siginfo.h
> @@ -91,6 +91,10 @@ typedef struct siginfo {
> int _trapno; /* TRAP # which caused the signal */
> #endif
> short _addr_lsb; /* LSB of the reported address */
> + struct {
> + void __user *_lower;
> + void __user *_upper;
> + } _addr_bnd;
> } _sigfault;
>
> /* SIGPOLL */
> @@ -131,6 +135,8 @@ typedef struct siginfo {
> #define si_trapno _sifields._sigfault._trapno
> #endif
> #define si_addr_lsb _sifields._sigfault._addr_lsb
> +#define si_lower _sifields._sigfault._addr_bnd._lower
> +#define si_upper _sifields._sigfault._addr_bnd._upper
> #define si_band _sifields._sigpoll._band
> #define si_fd _sifields._sigpoll._fd
> #ifdef __ARCH_SIGSYS
> @@ -199,7 +205,8 @@ typedef struct siginfo {
> */
> #define SEGV_MAPERR (__SI_FAULT|1) /* address not mapped to object
> */
> #define SEGV_ACCERR (__SI_FAULT|2) /* invalid permissions for mapped
> object */
> -#define NSIGSEGV 2
> +#define SEGV_BNDERR (__SI_FAULT|3) /* failed address bound checks */
> +#define NSIGSEGV 3
>
> /*
> * SIGBUS si_codes
> diff --git a/kernel/signal.c b/kernel/signal.c index a4077e9..2131636 100644
> --- a/kernel/signal.c
> +++ b/kernel/signal.c
> @@ -2748,6 +2748,10 @@ int copy_siginfo_to_user(siginfo_t __user *to,
> const siginfo_t *from)
> if (from->si_code == BUS_MCEERR_AR || from->si_code ==
> BUS_MCEERR_AO)
> err |= __put_user(from->si_addr_lsb, &to->si_addr_lsb); #endif
> +#ifdef SEGV_BNDERR
> + err |= __put_user(from->si_lower, &to->si_lower);
> + err |= __put_user(from->si_upper, &to->si_upper); #endif

"#endif" should be on a new line.

> break;
> case __SI_CHLD:
> err |= __put_user(from->si_pid, &to->si_pid);
> --
> 1.7.1
>
> --
> To unsubscribe, send a message with 'unsubscribe linux-mm' in the body to
> [email protected]. For more info on Linux MM,
> see: http://www.linux-mm.org/ .
> Don't email: [email protected]

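With SEGV_BNDERR in place, a user-space handler can dispatch on si_code and then read si_lower/si_upper for the violated bounds. A sketch of the dispatch (constants defined locally with a _V suffix to avoid clashing with <signal.h>; the values mirror the patch, where the __SI_FAULT tag is 0 as seen from user space):

```c
#include <assert.h>
#include <string.h>

/* SIGSEGV si_code values after the quoted patch. */
#define SEGV_MAPERR_V 1 /* address not mapped to object */
#define SEGV_ACCERR_V 2 /* invalid permissions for mapped object */
#define SEGV_BNDERR_V 3 /* failed address bound checks (new) */

static const char *segv_code_name(int si_code)
{
	switch (si_code) {
	case SEGV_MAPERR_V: return "SEGV_MAPERR";
	case SEGV_ACCERR_V: return "SEGV_ACCERR";
	case SEGV_BNDERR_V: return "SEGV_BNDERR";
	default:            return "?";
	}
}
```

In a real SA_SIGINFO handler, only the SEGV_BNDERR case would consult si_lower and si_upper; the other codes leave those fields meaningless.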
2014-07-22 00:50:26

by Zhang, Tianfei

[permalink] [raw]
Subject: RE: [PATCH v7 09/10] x86, mpx: cleanup unused bound tables



> -----Original Message-----
> From: [email protected] [mailto:[email protected]] On
> Behalf Of Qiaowei Ren
> Sent: Monday, July 21, 2014 1:39 PM
> To: H. Peter Anvin; Thomas Gleixner; Ingo Molnar; Hansen, Dave
> Cc: [email protected]; [email protected]; [email protected]; Ren,
> Qiaowei
> Subject: [PATCH v7 09/10] x86, mpx: cleanup unused bound tables
>
> Since the kernel allocated those tables on-demand without userspace
> knowledge, it is also responsible for freeing them when the associated
> mappings go away.
>
> Here, the solution for this issue is to hook do_munmap() to check whether one
> process is MPX enabled. If yes, those bounds tables covered in the virtual
> address region which is being unmapped will be freed also.
>
> Signed-off-by: Qiaowei Ren <[email protected]>
> ---
> arch/x86/include/asm/mmu_context.h | 16 +++
> arch/x86/include/asm/mpx.h | 9 ++
> arch/x86/mm/mpx.c | 181
> ++++++++++++++++++++++++++++++++++++
> include/asm-generic/mmu_context.h | 6 +
> mm/mmap.c | 2 +
> 5 files changed, 214 insertions(+), 0 deletions(-)
>
> diff --git a/arch/x86/include/asm/mmu_context.h
> b/arch/x86/include/asm/mmu_context.h
> index be12c53..af70d4f 100644
> --- a/arch/x86/include/asm/mmu_context.h
> +++ b/arch/x86/include/asm/mmu_context.h
> @@ -6,6 +6,7 @@
> #include <asm/pgalloc.h>
> #include <asm/tlbflush.h>
> #include <asm/paravirt.h>
> +#include <asm/mpx.h>
> #ifndef CONFIG_PARAVIRT
> #include <asm-generic/mm_hooks.h>
>
> @@ -96,4 +97,19 @@ do { \
> } while (0)
> #endif
>
> +static inline void arch_unmap(struct mm_struct *mm,
> + struct vm_area_struct *vma,
> + unsigned long start, unsigned long end) { #ifdef
> CONFIG_X86_INTEL_MPX

The "#ifdef" should be on a new line.

> + /*
> + * Check whether this vma comes from MPX-enabled application.
> + * If so, release this vma related bound tables.
> + */
> + if (mm->bd_addr && !(vma->vm_flags & VM_MPX))
> + mpx_unmap(mm, start, end);
> +
> +#endif
> +}
> +
> #endif /* _ASM_X86_MMU_CONTEXT_H */
> diff --git a/arch/x86/include/asm/mpx.h b/arch/x86/include/asm/mpx.h index
> 6cb0853..e848a74 100644
> --- a/arch/x86/include/asm/mpx.h
> +++ b/arch/x86/include/asm/mpx.h
> @@ -42,6 +42,13 @@
> #define MPX_BD_SIZE_BYTES
> (1UL<<(MPX_BD_ENTRY_OFFSET+MPX_BD_ENTRY_SHIFT))
> #define MPX_BT_SIZE_BYTES
> (1UL<<(MPX_BT_ENTRY_OFFSET+MPX_BT_ENTRY_SHIFT))
>
> +#define MPX_BD_ENTRY_MASK ((1<<MPX_BD_ENTRY_OFFSET)-1)
> +#define MPX_BT_ENTRY_MASK ((1<<MPX_BT_ENTRY_OFFSET)-1)
> +#define MPX_GET_BD_ENTRY_OFFSET(addr)
> ((((addr)>>(MPX_BT_ENTRY_OFFSET+ \
> + MPX_IGN_BITS)) & MPX_BD_ENTRY_MASK) <<
> MPX_BD_ENTRY_SHIFT)
> +#define MPX_GET_BT_ENTRY_OFFSET(addr) ((((addr)>>MPX_IGN_BITS) & \
> + MPX_BT_ENTRY_MASK) << MPX_BT_ENTRY_SHIFT)
> +
> #define MPX_BNDSTA_ERROR_CODE 0x3
> #define MPX_BNDCFG_ENABLE_FLAG 0x1
> #define MPX_BD_ENTRY_VALID_FLAG 0x1
> @@ -63,6 +70,8 @@ struct mpx_insn {
> #define MAX_MPX_INSN_SIZE 15
>
> unsigned long mpx_mmap(unsigned long len);
> +void mpx_unmap(struct mm_struct *mm,
> + unsigned long start, unsigned long end);
>
> #ifdef CONFIG_X86_INTEL_MPX
> int do_mpx_bt_fault(struct xsave_struct *xsave_buf); diff --git
> a/arch/x86/mm/mpx.c b/arch/x86/mm/mpx.c index e1b28e6..d29ec9c 100644
> --- a/arch/x86/mm/mpx.c
> +++ b/arch/x86/mm/mpx.c
> @@ -2,6 +2,7 @@
> #include <linux/syscalls.h>
> #include <asm/mpx.h>
> #include <asm/mman.h>
> +#include <asm/mmu_context.h>
> #include <linux/sched/sysctl.h>
>
> static const char *mpx_mapping_name(struct vm_area_struct *vma) @@
> -77,3 +78,183 @@ out:
> up_write(&mm->mmap_sem);
> return ret;
> }
> +
> +/*
> + * Get the base of the bounds table pointed to by a specific
> + * bounds directory entry.
> + */
> +static int get_bt_addr(long __user *bd_entry, unsigned long *bt_addr,
> + unsigned int *valid)
> +{
> + if (get_user(*bt_addr, bd_entry))
> + return -EFAULT;
> +
> + *valid = *bt_addr & MPX_BD_ENTRY_VALID_FLAG;
> + *bt_addr &= MPX_BT_ADDR_MASK;
> +
> + /*
> + * If this bounds directory entry is nonzero but its valid bit
> + * is clear, something unexpected happened; raise SIGSEGV.
> + */
> + if (!(*valid) && *bt_addr)
> + force_sig(SIGSEGV, current);
> +
> + return 0;
> +}
> +
> +/*
> + * Free the backing physical pages of bounds table 'bt_addr'.
> + * Assume start...end is within that bounds table.
> + */
> +static void zap_bt_entries(struct mm_struct *mm, unsigned long bt_addr,
> +		unsigned long start, unsigned long end)
> +{
> + struct vm_area_struct *vma;
> +
> + /* Find the vma which overlaps this bounds table */
> + vma = find_vma(mm, bt_addr);
> + if (!vma || vma->vm_start > bt_addr ||
> + vma->vm_end < bt_addr+MPX_BT_SIZE_BYTES)
> + return;
> +
> + zap_page_range(vma, start, end, NULL);
> +}
> +
> +static void unmap_single_bt(struct mm_struct *mm, long __user *bd_entry,
> + unsigned long bt_addr)
> +{
> + if (user_atomic_cmpxchg_inatomic(&bt_addr, bd_entry,
> + bt_addr | MPX_BD_ENTRY_VALID_FLAG, 0))
> + return;
> +
> + /*
> + * To avoid recursion, do_munmap() checks whether the region it
> + * is unmapping comes from a bounds table via the VM_MPX flag.
> + */
> + do_munmap(mm, bt_addr & MPX_BT_ADDR_MASK, MPX_BT_SIZE_BYTES);
> +}
> +
> +/*
> + * If the bounds table pointed to by bounds directory entry
> + * 'bd_entry' is not shared, unmap the whole bounds table.
> + * Otherwise, free only the backing physical pages of the bounds
> + * table entries covered by the virtual address region start...end.
> + */
> +static void unmap_shared_bt(struct mm_struct *mm, long __user *bd_entry,
> + unsigned long start, unsigned long end,
> + bool prev_shared, bool next_shared)
> +{
> + unsigned long bt_addr;
> + unsigned int bde_valid = 0;
> +
> + if (get_bt_addr(bd_entry, &bt_addr, &bde_valid) || !bde_valid)
> + return;
> +
> + if (prev_shared && next_shared)
> + zap_bt_entries(mm, bt_addr,
> + bt_addr+MPX_GET_BT_ENTRY_OFFSET(start),
> + bt_addr+MPX_GET_BT_ENTRY_OFFSET(end-1));
> + else if (prev_shared)
> + zap_bt_entries(mm, bt_addr,
> + bt_addr+MPX_GET_BT_ENTRY_OFFSET(start),
> + bt_addr+MPX_BT_SIZE_BYTES);
> + else if (next_shared)
> + zap_bt_entries(mm, bt_addr, bt_addr,
> + bt_addr+MPX_GET_BT_ENTRY_OFFSET(end-1));
> + else
> + unmap_single_bt(mm, bd_entry, bt_addr);
> +}
> +
> +/*
> + * A virtual address region being munmap()ed might share bounds
> + * tables with adjacent VMAs. We only need to free the backing
> + * physical memory of the shared bounds table entries covered by
> + * this virtual address region.
> + *
> + * The VMAs covering the virtual address region start...end have
> + * already been split if necessary and removed from the VMA list.
> + */
> +static void unmap_side_bts(struct mm_struct *mm, unsigned long start,
> + unsigned long end)
> +{
> + long __user *bde_start, *bde_end;
> + struct vm_area_struct *prev, *next;
> + bool prev_shared = false, next_shared = false;
> +
> + bde_start = mm->bd_addr + MPX_GET_BD_ENTRY_OFFSET(start);
> + bde_end = mm->bd_addr + MPX_GET_BD_ENTRY_OFFSET(end-1);
> +
> + /*
> + * Check whether bde_start and bde_end are shared with adjacent
> + * VMAs. Because the VMAs covering the virtual address region
> + * start...end have already been removed from the VMA list, if
> + * next is not NULL it will satisfy start < end <= next->vm_start.
> + * And if prev is not NULL, prev->vm_end <= start < end.
> + */
> + next = find_vma_prev(mm, start, &prev);
> + if (prev && MPX_GET_BD_ENTRY_OFFSET(prev->vm_end-1) == (long)bde_start)
> + prev_shared = true;
> + if (next && MPX_GET_BD_ENTRY_OFFSET(next->vm_start) == (long)bde_end)
> + next_shared = true;
> +
> + /*
> + * This virtual address region being munmap()ed is only
> + * covered by one bounds table.
> + *
> + * In this case, if this table is also shared with adjacent
> + * VMAs, only part of the backing physical memory of the bounds
> + * table needs to be freed. Otherwise the whole bounds table
> + * needs to be unmapped.
> + */
> + if (bde_start == bde_end) {
> + unmap_shared_bt(mm, bde_start, start, end,
> + prev_shared, next_shared);
> + return;
> + }
> +
> + /*
> + * If more than one bounds table is covered by the virtual
> + * address region being munmap()ed, we need to check separately
> + * whether bde_start and bde_end are shared with adjacent VMAs.
> + */
> + unmap_shared_bt(mm, bde_start, start, end, prev_shared, false);
> + unmap_shared_bt(mm, bde_end, start, end, false, next_shared);
> +}
> +
> +/*
> + * Free unused bounds tables covered by a virtual address region
> + * being munmap()ed. Assume end > start.
> + *
> + * This function will be called by do_munmap(), and the VMAs
> + * covering the virtual address region start...end have already
> + * been split if necessary and removed from the VMA list.
> + */
> +void mpx_unmap(struct mm_struct *mm,
> +		unsigned long start, unsigned long end)
> +{
> + long __user *bd_entry, *bde_start, *bde_end;
> + unsigned long bt_addr;
> + unsigned int bde_valid;
> +
> + /*
> + * Unmap the bounds tables pointed to by the start/end bounds
> + * directory entries, or free only part of their backing
> + * physical memory if they are shared with adjacent VMAs.
> + */
> + unmap_side_bts(mm, start, end);
> +
> + /*
> + * Unmap those bounds tables which are entirely covered by this
> + * virtual address region.
> + */
> + bde_start = mm->bd_addr + MPX_GET_BD_ENTRY_OFFSET(start);
> + bde_end = mm->bd_addr + MPX_GET_BD_ENTRY_OFFSET(end-1);
> + for (bd_entry = bde_start + 1; bd_entry < bde_end; bd_entry++) {
> + if (get_bt_addr(bd_entry, &bt_addr, &bde_valid))
> + return;
> + if (!bde_valid)
> + continue;
> + unmap_single_bt(mm, bd_entry, bt_addr);
> + }
> +}
> diff --git a/include/asm-generic/mmu_context.h b/include/asm-generic/mmu_context.h
> index a7eec91..ac558ca 100644
> --- a/include/asm-generic/mmu_context.h
> +++ b/include/asm-generic/mmu_context.h
> @@ -42,4 +42,10 @@ static inline void activate_mm(struct mm_struct *prev_mm,
> {
> }
>
> +static inline void arch_unmap(struct mm_struct *mm,
> +		struct vm_area_struct *vma,
> +		unsigned long start, unsigned long end)
> +{
> +}
> +
> #endif /* __ASM_GENERIC_MMU_CONTEXT_H */
> diff --git a/mm/mmap.c b/mm/mmap.c
> index 129b847..8550d84 100644
> --- a/mm/mmap.c
> +++ b/mm/mmap.c
> @@ -2560,6 +2560,8 @@ int do_munmap(struct mm_struct *mm, unsigned long start, size_t len)
> /* Fix up all other VM information */
> remove_vma_list(mm, vma);
>
> + arch_unmap(mm, vma, start, end);
> +
> return 0;
> }
>
> --
> 1.7.1
>
> --
> To unsubscribe, send a message with 'unsubscribe linux-mm' in the body to
> [email protected]. For more info on Linux MM,
> see: http://www.linux-mm.org/ .
> Don't email: [email protected]

2014-07-22 16:19:05

by Dave Hansen

[permalink] [raw]
Subject: Re: [PATCH v7 03/10] x86, mpx: add macro cpu_has_mpx

On 07/20/2014 10:38 PM, Qiaowei Ren wrote:
> +#ifdef CONFIG_X86_INTEL_MPX
> +#define cpu_has_mpx boot_cpu_has(X86_FEATURE_MPX)
> +#else
> +#define cpu_has_mpx 0
> +#endif /* CONFIG_X86_INTEL_MPX */

Is this enough checking? Looking at the extension reference, it says:

> 9.3.3
> Enabling of Intel MPX States
> An OS can enable Intel MPX states to support software operation using bounds registers with the following steps:
> * Verify the processor supports XSAVE/XRSTOR/XSETBV/XGETBV instructions and XCR0 by checking
> CPUID.1.ECX.XSAVE[bit 26]=1.

That, I assume the xsave code is already doing.

> * Verify the processor supports both Intel MPX states by checking CPUID.(EAX=0x0D, ECX=0):EAX[4:3] is 11b.

I see these bits _attempting_ to get set in pcntxt_mask via XCNTXT_MASK.
But, I don't see us ever actually checking that they _do_ get set. For
instance, we do this for:

> if ((pcntxt_mask & XSTATE_FPSSE) != XSTATE_FPSSE) {
> pr_err("FP/SSE not shown under xsave features 0x%llx\n",
> pcntxt_mask);
> BUG();
> }

2014-07-23 02:45:09

by Ren Qiaowei

[permalink] [raw]
Subject: RE: [PATCH v7 03/10] x86, mpx: add macro cpu_has_mpx



On 2014-07-23, Hansen, Dave wrote:
> On 07/20/2014 10:38 PM, Qiaowei Ren wrote:
>> +#ifdef CONFIG_X86_INTEL_MPX
>> +#define cpu_has_mpx boot_cpu_has(X86_FEATURE_MPX)
>> +#else
>> +#define cpu_has_mpx 0
>> +#endif /* CONFIG_X86_INTEL_MPX */
>
> Is this enough checking? Looking at the extension reference, it says:
>
>> 9.3.3 Enabling of Intel MPX States
>> An OS can enable Intel MPX states to support software operation using
>> bounds registers with the following steps:
>> * Verify the processor supports XSAVE/XRSTOR/XSETBV/XGETBV
>> instructions and XCR0 by checking CPUID.1.ECX.XSAVE[bit 26]=1.
>
> That, I assume the xsave code is already doing.
>
>> * Verify the processor supports both Intel MPX states by checking
>> CPUID.(EAX=0x0D, ECX=0):EAX[4:3] is 11b.
>
> I see these bits _attempting_ to get set in pcntxt_mask via XCNTXT_MASK.
> But, I don't see us ever actually checking that they _do_ get set.
> For instance, we do this for:
>
>> if ((pcntxt_mask & XSTATE_FPSSE) != XSTATE_FPSSE) {
>> 	pr_err("FP/SSE not shown under xsave features 0x%llx\n",
>> 		pcntxt_mask);
>> 	BUG();
>> }

The check for the MPX feature should be as follows:

if (pcntxt_mask & XSTATE_EAGER) {
	if (eagerfpu == DISABLE) {
		pr_err("eagerfpu not present, disabling some xstate features: 0x%llx\n",
			pcntxt_mask & XSTATE_EAGER);
		pcntxt_mask &= ~XSTATE_EAGER;
	} else {
		eagerfpu = ENABLE;
	}
}

This patch was merged into the kernel at the end of last year (https://git.kernel.org/cgit/linux/kernel/git/torvalds/linux.git/commit/?id=e7d820a5e549b3eb6c3f9467507566565646a669 )

Thanks,
Qiaowei

2014-07-23 16:02:18

by Dave Hansen

[permalink] [raw]
Subject: Re: [PATCH v7 03/10] x86, mpx: add macro cpu_has_mpx

On 07/22/2014 07:35 PM, Ren, Qiaowei wrote:
> The checking about MPX feature should be as follow:
>
> if (pcntxt_mask & XSTATE_EAGER) {
> 	if (eagerfpu == DISABLE) {
> 		pr_err("eagerfpu not present, disabling some xstate features: 0x%llx\n",
> 			pcntxt_mask & XSTATE_EAGER);
> 		pcntxt_mask &= ~XSTATE_EAGER;
> 	} else {
> 		eagerfpu = ENABLE;
> 	}
> }
>
> This patch was merged into kernel the ending of last year (https://git.kernel.org/cgit/linux/kernel/git/torvalds/linux.git/commit/?id=e7d820a5e549b3eb6c3f9467507566565646a669 )

Should we be doing a clear_cpu_cap(X86_FEATURE_MPX) in here?

This isn't major, but I can't _ever_ imagine a user being able to track
down why MPX is not working from this message. Should we spruce it up
somehow?

2014-07-23 16:20:29

by Dave Hansen

[permalink] [raw]
Subject: Re: [PATCH v7 08/10] x86, mpx: add prctl commands PR_MPX_REGISTER, PR_MPX_UNREGISTER

On 07/20/2014 10:38 PM, Qiaowei Ren wrote:
> + pr_debug("MPX BD base address %p\n", mm->bd_addr);
> + return 0;
> +}

Please remove all of the pr_debug()s. They're not appropriate for
common paths in production code.

There's another one in allocate_bt(), btw.

2014-07-23 16:38:35

by Dave Hansen

[permalink] [raw]
Subject: Re: [PATCH v7 09/10] x86, mpx: cleanup unused bound tables

On 07/20/2014 10:38 PM, Qiaowei Ren wrote:
> Since the kernel allocated those tables on-demand without userspace
> knowledge, it is also responsible for freeing them when the associated
> mappings go away.
>
> Here, the solution for this issue is to hook do_munmap() to check
> whether one process is MPX enabled. If yes, those bounds tables covered
> in the virtual address region which is being unmapped will be freed also.

This is the part of the code that I'm the most concerned about.

Could you elaborate on how you've tested this to make sure it works OK?

2014-07-24 00:49:48

by Ren Qiaowei

[permalink] [raw]
Subject: RE: [PATCH v7 09/10] x86, mpx: cleanup unused bound tables



On 2014-07-24, Hansen, Dave wrote:
> On 07/20/2014 10:38 PM, Qiaowei Ren wrote:
>> Since the kernel allocated those tables on-demand without userspace
>> knowledge, it is also responsible for freeing them when the
>> associated mappings go away.
>>
>> Here, the solution for this issue is to hook do_munmap() to check
>> whether one process is MPX enabled. If yes, those bounds tables
>> covered in the virtual address region which is being unmapped will
>> be freed
> also.
>
> This is the part of the code that I'm the most concerned about.
>
> Could you elaborate on how you've tested this to make sure it works OK?

I verified the allocation and freeing of VMAs and their related bounds tables by adding printk()-style logging to the kernel and the runtime, and checking the resulting debug output. Do you think this is enough?

Thanks,
Qiaowei

2014-07-24 00:57:10

by Ren Qiaowei

[permalink] [raw]
Subject: RE: [PATCH v7 03/10] x86, mpx: add macro cpu_has_mpx



On 2014-07-24, Hansen, Dave wrote:
> On 07/22/2014 07:35 PM, Ren, Qiaowei wrote:
>> The checking about MPX feature should be as follow:
>>
>> if (pcntxt_mask & XSTATE_EAGER) {
>> 	if (eagerfpu == DISABLE) {
>> 		pr_err("eagerfpu not present, disabling some xstate features: 0x%llx\n",
>> 			pcntxt_mask & XSTATE_EAGER);
>> 		pcntxt_mask &= ~XSTATE_EAGER;
>> 	} else {
>> 		eagerfpu = ENABLE;
>> 	}
>> }
>> This patch was merged into kernel the ending of last year
>> (https://git.kernel.org/cgit/linux/kernel/git/torvalds/linux.git/commit/?id=e7d820a5e549b3eb6c3f9467507566565646a669 )
>
> Should we be doing a clear_cpu_cap(X86_FEATURE_MPX) in here?
>
> This isn't major, but I can't _ever_ imagine a user being able to
> track down why MPX is not working from this message. Should we spruce it up somehow?

Maybe. If the error log "disabling some xstate features:" is changed to "disabling MPX xstate features:", do you think it is OK?

Thanks,
Qiaowei

2014-07-24 01:04:59

by Dave Hansen

[permalink] [raw]
Subject: Re: [PATCH v7 09/10] x86, mpx: cleanup unused bound tables

On 07/23/2014 05:49 PM, Ren, Qiaowei wrote:
> I can check a lot of debug information when one VMA and related
> bounds tables are allocated and freed through adding a lot of print()
> like log into kernel/runtime. Do you think this is enough?

I thought the entire reason we grabbed a VM_ flag was to make it
possible to figure out without resorting to this.

2014-07-24 01:27:56

by Ren Qiaowei

[permalink] [raw]
Subject: RE: [PATCH v7 09/10] x86, mpx: cleanup unused bound tables



On 2014-07-24, Hansen, Dave wrote:
> On 07/23/2014 05:49 PM, Ren, Qiaowei wrote:
>> I can check a lot of debug information when one VMA and related
>> bounds tables are allocated and freed through adding a lot of
>> print() like log into kernel/runtime. Do you think this is enough?
>
> I thought the entire reason we grabbed a VM_ flag was to make it
> possible to figure out without resorting to this.

All of the cleanup work certainly depends on this VM_ flag. In addition, as we discussed, the new VM_ flag also lets the runtime know how much memory is occupied by MPX.

Thanks,
Qiaowei

2014-07-24 04:46:32

by Dave Hansen

[permalink] [raw]
Subject: Re: [PATCH v7 03/10] x86, mpx: add macro cpu_has_mpx

On 07/23/2014 05:56 PM, Ren, Qiaowei wrote:
> On 2014-07-24, Hansen, Dave wrote:
>> On 07/22/2014 07:35 PM, Ren, Qiaowei wrote:
>>> The checking about MPX feature should be as follow:
>>>
>>> if (pcntxt_mask & XSTATE_EAGER) {
>>> 	if (eagerfpu == DISABLE) {
>>> 		pr_err("eagerfpu not present, disabling some xstate features: 0x%llx\n",
>>> 			pcntxt_mask & XSTATE_EAGER);
>>> 		pcntxt_mask &= ~XSTATE_EAGER;
>>> 	} else {
>>> 		eagerfpu = ENABLE;
>>> 	}
>>> }
>>> This patch was merged into kernel the ending of last year
>>> (https://git.kernel.org/cgit/linux/kernel/git/torvalds/linux.git/commit/?id=e7d820a5e549b3eb6c3f9467507566565646a669 )
>>
>> Should we be doing a clear_cpu_cap(X86_FEATURE_MPX) in here?
>>
>> This isn't major, but I can't _ever_ imagine a user being able to
>> track down why MPX is not working from this message. Should we
>> spruce it up somehow?
>
> Maybe. If the error log "disabling some xstate features:" is changed
> to "disabling MPX xstate features:", do you think it is OK?

That's better. Is it really disabling MPX, though?

And shouldn't we clear the cpu feature bit too?

2014-07-24 05:24:00

by Ren Qiaowei

[permalink] [raw]
Subject: RE: [PATCH v7 03/10] x86, mpx: add macro cpu_has_mpx



On 2014-07-24, Hansen, Dave wrote:
> On 07/23/2014 05:56 PM, Ren, Qiaowei wrote:
>> On 2014-07-24, Hansen, Dave wrote:
>>> On 07/22/2014 07:35 PM, Ren, Qiaowei wrote:
>>>> The checking about MPX feature should be as follow:
>>>>
>>>> if (pcntxt_mask & XSTATE_EAGER) {
>>>> 	if (eagerfpu == DISABLE) {
>>>> 		pr_err("eagerfpu not present, disabling some xstate features: 0x%llx\n",
>>>> 			pcntxt_mask & XSTATE_EAGER);
>>>> 		pcntxt_mask &= ~XSTATE_EAGER;
>>>> 	} else {
>>>> 		eagerfpu = ENABLE;
>>>> 	}
>>>> }
>>>> This patch was merged into kernel the ending of last year
>>>> (https://git.kernel.org/cgit/linux/kernel/git/torvalds/linux.git/commit/?id=e7d820a5e549b3eb6c3f9467507566565646a669 )
>>>
>>> Should we be doing a clear_cpu_cap(X86_FEATURE_MPX) in here?
>>>
>>> This isn't major, but I can't _ever_ imagine a user being able to
>>> track down why MPX is not working from this message. Should we
>>> spruce it up somehow?
>>
>> Maybe. If the error log "disabling some xstate features:" is changed
>> to "disabling MPX xstate features:", do you think it is OK?
>
> That's better. Is it really disabling MPX, though?
>
> And shouldn't we clear the cpu feature bit too?

I am not sure. I suspect this check should be moved to before xstate_enable().

Peter, what do you think of it?

Thanks,
Qiaowei

2014-10-13 17:42:40

by Dave Hansen

[permalink] [raw]
Subject: Re: [PATCH v7 08/10] x86, mpx: add prctl commands PR_MPX_REGISTER, PR_MPX_UNREGISTER

On 07/20/2014 11:09 PM, Andi Kleen wrote:
> Qiaowei Ren <[email protected]> writes:
>> This patch adds the PR_MPX_REGISTER and PR_MPX_UNREGISTER prctl()
>> commands. These commands can be used to register and unregister MPX
>> related resource on the x86 platform.
>
> Please provide a manpage for the API. This is needed
> for proper review. Your description is far too vague.

Qiaowei, have you written this manpage yet? I see the new patches, but
no manpage yet.

2014-10-14 01:44:19

by Ren Qiaowei

[permalink] [raw]
Subject: RE: [PATCH v7 08/10] x86, mpx: add prctl commands PR_MPX_REGISTER, PR_MPX_UNREGISTER



On 2014-10-14, Hansen, Dave wrote:
> On 07/20/2014 11:09 PM, Andi Kleen wrote:
>> Qiaowei Ren <[email protected]> writes:
>>> This patch adds the PR_MPX_REGISTER and PR_MPX_UNREGISTER prctl()
>>> commands. These commands can be used to register and unregister MPX
>>> related resource on the x86 platform.
>>
>> Please provide a manpage for the API. This is needed for proper
>> review. Your description is far too vague.
>
> Qiaowei, have you written this manpage yet? I see the new patches,
> but no manpage yet.

It will be added in a subsequent version or in a separate mainline patchset.

Thanks,
Qiaowei