This is v2 of my first submission done a while ago[1]. I apologize for the
delay.
User-Mode Instruction Prevention (UMIP) is a security feature present in
new Intel Processors. If enabled, it prevents the execution of certain
instructions if the Current Privilege Level (CPL) is greater than 0. If
these instructions were executed while in CPL > 0, user space applications
could have access to system-wide settings such as the global and local
descriptor tables, the segment selectors to the current task state and the
local descriptor table.
These are the instructions covered by UMIP:
* SGDT - Store Global Descriptor Table
* SIDT - Store Interrupt Descriptor Table
* SLDT - Store Local Descriptor Table
* SMSW - Store Machine Status Word
* STR - Store Task Register
If any of these instructions is executed with CPL > 0, a general protection
exception is issued when UMIP is enabled.
There is a caveat, however. Certain applications rely on some of these
instructions to function. An example of this are applications that use
WineHQ[2]. For instance, these applications rely on sidt returning a non-
accesible memory location[3]. During the discussions, it was proposed that
the fault could be relied to the user-space and perform the emulation in
user-mode. However, this would break existing applications until, for
instance, they update to a new WineHQ version. However, this approach
would require UMIP to be disabled by default. The concensus in this forum
is to always enable it.
This patchset initially treated tasks running in virtual-8086 mode as a
special case. However, I received clarification that DOSEMU[4] does not
support applications that use these instructions. It relies on WineHQ for
this[4]. Furthermore, the applications for which the concern was raised
run in protected mode [3].
This version keeps UMIP enabled at all times and by default. If a general
protection fault caused by the instructions protected by UMIP is
detected, such fault will be fixed-up by returning dummy values as follows:
* SGDT and SIDT return a base address to a dummy location in kernel memory
and a limit of 0.
* STR, SLDT returns 0 as the segment selector. This seems OK since we are
providing a dummy value as the base address of the global descriptor
table.
* SMSW returns 0.
Lastly, I found very useful the code for Intel MPX (Memory Protection
Extensions) used to parse opcodes and the memory locations contained in the
general purpose registers when used as operands. I put some of this code in
a separate file that both MPX and UMIP can access and avoid code
duplication. While here, I fixed two small bugs that I found in the MPX
implementation.
The code that I used to test the emulated instructions can be found in [6].
[1]. https://lwn.net/Articles/705877/
[2]. https://www.winehq.org/
[3]. https://www.winehq.org/pipermail/wine-devel/2016-November/115320.html
[4]. http://www.dosemu.org/
[5]. http://marc.info/?l=linux-kernel&m=147876798717927&w=2
[6]. https://github.com/01org/luv-yocto/tree/rneri/umip/meta-luv/recipes-core/umip/files
Thanks and BR,
Ricardo
Cc: Andy Lutomirski <[email protected]>
Cc: Andrew Morton <[email protected]>
Cc: Borislav Petkov <[email protected]>
Cc: Brian Gerst <[email protected]>
Cc: Chen Yucong <[email protected]>
Cc: Chris Metcalf <[email protected]>
Cc: Dave Hansen <[email protected]>
Cc: Fenghua Yu <[email protected]>
Cc: Huang Rui <[email protected]>
Cc: Jiri Slaby <[email protected]>
Cc: Jonathan Corbet <[email protected]>
Cc: Michael S. Tsirkin <[email protected]>
Cc: Paul Gortmaker <[email protected]>
Cc: Peter Zijlstra <[email protected]>
Cc: Ravi V. Shankar <[email protected]>
Cc: Vlastimil Babka <[email protected]>
Cc: Shuah Khan <[email protected]>
Cc: Paolo Bonzini <[email protected]>
Cc: Liang Z Li <[email protected]>
Cc: [email protected]
Cc: [email protected]
Changes since V1:
* Virtual-8086 mode tasks are not treated in a special manner. All code
for this purpose was removed.
* Instead of attempting to disable UMIP during a context switch or when
entering virtual-8086 mode, UMIP remains enabled all the time. General
protection faults that occur are fixed-up by returning dummy values as
detailed above.
* Removed umip= kernel parameter in favor of using clearcpuid=514 to
disable UMIP.
* Removed selftests designed to detect the absence of SIGSEGV signals when
running in virtual-8086 mode.
* Reused code from MPX to decode instructions operands. For this purpose
code was put in a common location.
* Fixed two bugs in MPX code that decodes operands.
Ricardo Neri (7):
x86/mpx: Do not use SIB index if index points to R/ESP
x86/mpx: Fail when implicit zero-displacement is used along with R/EBP
x86/mpx, x86/insn: Relocate insn util functions to a new insn-utils
x86/cpufeature: Add User-Mode Instruction Prevention definitions
x86: Add emulation code for UMIP instructions
x86/traps: Fixup general protection faults caused by UMIP
x86: Enable User-Mode Instruction Prevention
arch/x86/Kconfig | 10 ++
arch/x86/include/asm/cpufeatures.h | 1 +
arch/x86/include/asm/disabled-features.h | 8 +-
arch/x86/include/asm/insn.h | 6 +
arch/x86/include/asm/umip.h | 16 +++
arch/x86/include/uapi/asm/processor-flags.h | 2 +
arch/x86/kernel/Makefile | 1 +
arch/x86/kernel/cpu/common.c | 16 ++-
arch/x86/kernel/traps.c | 4 +
arch/x86/kernel/umip.c | 170 ++++++++++++++++++++++++++++
arch/x86/lib/Makefile | 2 +-
arch/x86/lib/insn-utils.c | 148 ++++++++++++++++++++++++
arch/x86/mm/mpx.c | 119 +------------------
13 files changed, 382 insertions(+), 121 deletions(-)
create mode 100644 arch/x86/include/asm/umip.h
create mode 100644 arch/x86/kernel/umip.c
create mode 100644 arch/x86/lib/insn-utils.c
--
2.9.3
Other kernel submodules can benefit from using the utility functions
defined in mpx.c to obtain the addresses and values of operands contained
in the general purpose registers. An instance of this is the emulation code
used for instructions protected by the Intel User-Mode Instruction
Prevention feature.
Thus, these functions are relocated to a new insn-utils.c file. The reason
to not relocate these utilities for insn.c is that the latter solely
analyses instructions given by a struct insn. The file insn-utils.c intends
to be used when, for instance, determining addresses from the contents
of the general purpose registers.
To avoid creating a new insn-utils.h, insn.h is used. One caveat, however,
is that several #include's were needed by the utility functions.
Functions are simply relocated. There are not functional or indentation
changes.
Cc: Dave Hansen <[email protected]>
Cc: Adam Buchbinder <[email protected]>
Cc: Colin Ian King <[email protected]>
Cc: Lorenzo Stoakes <[email protected]>
Cc: Qiaowei Ren <[email protected]>
Cc: Arnaldo Carvalho de Melo <[email protected]>
Cc: Masami Hiramatsu <[email protected]>
Cc: Adrian Hunter <[email protected]>
Cc: Kees Cook <[email protected]>
Cc: Thomas Garnier <[email protected]>
Cc: Peter Zijlstra <[email protected]>
Cc: Borislav Petkov <[email protected]>
Cc: Dmitry Vyukov <[email protected]>
Cc: Ravi V. Shankar <[email protected]>
Cc: [email protected]
Signed-off-by: Ricardo Neri <[email protected]>
---
arch/x86/include/asm/insn.h | 6 ++
arch/x86/lib/Makefile | 2 +-
arch/x86/lib/insn-utils.c | 148 ++++++++++++++++++++++++++++++++++++++++++++
arch/x86/mm/mpx.c | 136 +---------------------------------------
4 files changed, 156 insertions(+), 136 deletions(-)
create mode 100644 arch/x86/lib/insn-utils.c
diff --git a/arch/x86/include/asm/insn.h b/arch/x86/include/asm/insn.h
index b3e32b0..9dc9d42 100644
--- a/arch/x86/include/asm/insn.h
+++ b/arch/x86/include/asm/insn.h
@@ -22,6 +22,10 @@
/* insn_attr_t is defined in inat.h */
#include <asm/inat.h>
+#include <linux/compiler.h>
+#include <linux/bug.h>
+#include <linux/err.h>
+#include <asm/ptrace.h>
struct insn_field {
union {
@@ -106,6 +110,8 @@ extern void insn_get_sib(struct insn *insn);
extern void insn_get_displacement(struct insn *insn);
extern void insn_get_immediate(struct insn *insn);
extern void insn_get_length(struct insn *insn);
+extern void __user *insn_get_addr_ref(struct insn *insn, struct pt_regs *regs);
+extern int get_reg_offset_rm(struct insn *insn, struct pt_regs *regs);
/* Attribute will be determined after getting ModRM (for opcode groups) */
static inline void insn_get_attribute(struct insn *insn)
diff --git a/arch/x86/lib/Makefile b/arch/x86/lib/Makefile
index 34a7413..0d01d82 100644
--- a/arch/x86/lib/Makefile
+++ b/arch/x86/lib/Makefile
@@ -23,7 +23,7 @@ lib-y := delay.o misc.o cmdline.o cpu.o
lib-y += usercopy_$(BITS).o usercopy.o getuser.o putuser.o
lib-y += memcpy_$(BITS).o
lib-$(CONFIG_RWSEM_XCHGADD_ALGORITHM) += rwsem.o
-lib-$(CONFIG_INSTRUCTION_DECODER) += insn.o inat.o
+lib-$(CONFIG_INSTRUCTION_DECODER) += insn.o inat.o insn-utils.o
lib-$(CONFIG_RANDOMIZE_BASE) += kaslr.o
obj-y += msr.o msr-reg.o msr-reg-export.o hweight.o
diff --git a/arch/x86/lib/insn-utils.c b/arch/x86/lib/insn-utils.c
new file mode 100644
index 0000000..598bbd6
--- /dev/null
+++ b/arch/x86/lib/insn-utils.c
@@ -0,0 +1,148 @@
+/*
+ * Utility functions for x86 operand and address decoding
+ *
+ * Copyright (C) Intel Corporation 2016
+ */
+#include <linux/kernel.h>
+#include <linux/string.h>
+#include <asm/inat.h>
+#include <asm/insn.h>
+
+enum reg_type {
+ REG_TYPE_RM = 0,
+ REG_TYPE_INDEX,
+ REG_TYPE_BASE,
+};
+
+static int get_reg_offset(struct insn *insn, struct pt_regs *regs,
+ enum reg_type type)
+{
+ int regno = 0;
+
+ static const int regoff[] = {
+ offsetof(struct pt_regs, ax),
+ offsetof(struct pt_regs, cx),
+ offsetof(struct pt_regs, dx),
+ offsetof(struct pt_regs, bx),
+ offsetof(struct pt_regs, sp),
+ offsetof(struct pt_regs, bp),
+ offsetof(struct pt_regs, si),
+ offsetof(struct pt_regs, di),
+#ifdef CONFIG_X86_64
+ offsetof(struct pt_regs, r8),
+ offsetof(struct pt_regs, r9),
+ offsetof(struct pt_regs, r10),
+ offsetof(struct pt_regs, r11),
+ offsetof(struct pt_regs, r12),
+ offsetof(struct pt_regs, r13),
+ offsetof(struct pt_regs, r14),
+ offsetof(struct pt_regs, r15),
+#endif
+ };
+ int nr_registers = ARRAY_SIZE(regoff);
+ /*
+ * Don't possibly decode a 32-bit instructions as
+ * reading a 64-bit-only register.
+ */
+ if (IS_ENABLED(CONFIG_X86_64) && !insn->x86_64)
+ nr_registers -= 8;
+
+ switch (type) {
+ case REG_TYPE_RM:
+ regno = X86_MODRM_RM(insn->modrm.value);
+ if (X86_REX_B(insn->rex_prefix.value))
+ regno += 8;
+ break;
+
+ case REG_TYPE_INDEX:
+ regno = X86_SIB_INDEX(insn->sib.value);
+ if (X86_REX_X(insn->rex_prefix.value))
+ regno += 8;
+ /*
+ * If mod !=3, SP is not used as index. Check is done after
+ * looking at REX.X This is because R12 can be used as an
+ * index.
+ */
+ if (regno == 4 && X86_MODRM_RM(insn->modrm.value) != 3)
+ return 0;
+ break;
+
+ case REG_TYPE_BASE:
+ regno = X86_SIB_BASE(insn->sib.value);
+ if (regno == 5 && X86_MODRM_RM(insn->modrm.value) == 0) {
+ WARN_ONCE(1, "An explicit displacement is required when %sBP used as SIB base.",
+ (IS_ENABLED(CONFIG_X86_64) && insn->x86_64) ?
+ "R13 or R" : "E");
+ return -EINVAL;
+ }
+
+ if (X86_REX_B(insn->rex_prefix.value))
+ regno += 8;
+ break;
+
+ default:
+ pr_err("invalid register type");
+ BUG();
+ break;
+ }
+
+ if (regno >= nr_registers) {
+ WARN_ONCE(1, "decoded an instruction with an invalid register");
+ return -EINVAL;
+ }
+ return regoff[regno];
+}
+
+int get_reg_offset_rm(struct insn *insn, struct pt_regs *regs)
+{
+ return get_reg_offset(insn, regs, REG_TYPE_RM);
+}
+
+/*
+ * return the address being referenced be instruction
+ * for rm=3 returning the content of the rm reg
+ * for rm!=3 calculates the address using SIB and Disp
+ */
+void __user *insn_get_addr_ref(struct insn *insn, struct pt_regs *regs)
+{
+ unsigned long addr, base, indx;
+ int addr_offset, base_offset, indx_offset;
+ insn_byte_t sib;
+
+ insn_get_modrm(insn);
+ insn_get_sib(insn);
+ sib = insn->sib.value;
+
+ if (X86_MODRM_MOD(insn->modrm.value) == 3) {
+ addr_offset = get_reg_offset(insn, regs, REG_TYPE_RM);
+ if (addr_offset < 0)
+ goto out_err;
+ addr = regs_get_register(regs, addr_offset);
+ } else {
+ if (insn->sib.nbytes) {
+ base_offset = get_reg_offset(insn, regs, REG_TYPE_BASE);
+ if (base_offset < 0)
+ goto out_err;
+
+ indx_offset = get_reg_offset(insn, regs, REG_TYPE_INDEX);
+ if (indx_offset < 0)
+ goto out_err;
+
+ base = regs_get_register(regs, base_offset);
+ if (indx_offset)
+ indx = regs_get_register(regs, indx_offset);
+ else
+ indx = 0;
+ addr = base + indx * (1 << X86_SIB_SCALE(sib));
+ } else {
+ addr_offset = get_reg_offset(insn, regs, REG_TYPE_RM);
+ if (addr_offset < 0)
+ goto out_err;
+ addr = regs_get_register(regs, addr_offset);
+ }
+ addr += insn->displacement.value;
+ }
+ return (void __user *)addr;
+out_err:
+ return (void __user *)-1;
+}
diff --git a/arch/x86/mm/mpx.c b/arch/x86/mm/mpx.c
index 71681d0..32ba342 100644
--- a/arch/x86/mm/mpx.c
+++ b/arch/x86/mm/mpx.c
@@ -59,140 +59,6 @@ static unsigned long mpx_mmap(unsigned long len)
return addr;
}
-enum reg_type {
- REG_TYPE_RM = 0,
- REG_TYPE_INDEX,
- REG_TYPE_BASE,
-};
-
-static int get_reg_offset(struct insn *insn, struct pt_regs *regs,
- enum reg_type type)
-{
- int regno = 0;
-
- static const int regoff[] = {
- offsetof(struct pt_regs, ax),
- offsetof(struct pt_regs, cx),
- offsetof(struct pt_regs, dx),
- offsetof(struct pt_regs, bx),
- offsetof(struct pt_regs, sp),
- offsetof(struct pt_regs, bp),
- offsetof(struct pt_regs, si),
- offsetof(struct pt_regs, di),
-#ifdef CONFIG_X86_64
- offsetof(struct pt_regs, r8),
- offsetof(struct pt_regs, r9),
- offsetof(struct pt_regs, r10),
- offsetof(struct pt_regs, r11),
- offsetof(struct pt_regs, r12),
- offsetof(struct pt_regs, r13),
- offsetof(struct pt_regs, r14),
- offsetof(struct pt_regs, r15),
-#endif
- };
- int nr_registers = ARRAY_SIZE(regoff);
- /*
- * Don't possibly decode a 32-bit instructions as
- * reading a 64-bit-only register.
- */
- if (IS_ENABLED(CONFIG_X86_64) && !insn->x86_64)
- nr_registers -= 8;
-
- switch (type) {
- case REG_TYPE_RM:
- regno = X86_MODRM_RM(insn->modrm.value);
- if (X86_REX_B(insn->rex_prefix.value))
- regno += 8;
- break;
-
- case REG_TYPE_INDEX:
- regno = X86_SIB_INDEX(insn->sib.value);
- if (X86_REX_X(insn->rex_prefix.value))
- regno += 8;
- /*
- * If mod !=3, SP is not used as index. Check is done after
- * looking at REX.X This is because R12 can be used as an
- * index.
- */
- if (regno == 4 && X86_MODRM_RM(insn->modrm.value) != 3)
- return 0;
- break;
-
- case REG_TYPE_BASE:
- regno = X86_SIB_BASE(insn->sib.value);
- if (regno == 5 && X86_MODRM_RM(insn->modrm.value) == 0) {
- WARN_ONCE(1, "An explicit displacement is required when %sBP used as SIB base.",
- (IS_ENABLED(CONFIG_X86_64) && insn->x86_64) ?
- "R13 or R" : "E");
- return -EINVAL;
- }
-
- if (X86_REX_B(insn->rex_prefix.value))
- regno += 8;
- break;
-
- default:
- pr_err("invalid register type");
- BUG();
- break;
- }
-
- if (regno >= nr_registers) {
- WARN_ONCE(1, "decoded an instruction with an invalid register");
- return -EINVAL;
- }
- return regoff[regno];
-}
-
-/*
- * return the address being referenced be instruction
- * for rm=3 returning the content of the rm reg
- * for rm!=3 calculates the address using SIB and Disp
- */
-static void __user *mpx_get_addr_ref(struct insn *insn, struct pt_regs *regs)
-{
- unsigned long addr, base, indx;
- int addr_offset, base_offset, indx_offset;
- insn_byte_t sib;
-
- insn_get_modrm(insn);
- insn_get_sib(insn);
- sib = insn->sib.value;
-
- if (X86_MODRM_MOD(insn->modrm.value) == 3) {
- addr_offset = get_reg_offset(insn, regs, REG_TYPE_RM);
- if (addr_offset < 0)
- goto out_err;
- addr = regs_get_register(regs, addr_offset);
- } else {
- if (insn->sib.nbytes) {
- base_offset = get_reg_offset(insn, regs, REG_TYPE_BASE);
- if (base_offset < 0)
- goto out_err;
-
- indx_offset = get_reg_offset(insn, regs, REG_TYPE_INDEX);
- if (indx_offset < 0)
- goto out_err;
-
- base = regs_get_register(regs, base_offset);
- if (indx_offset)
- indx = regs_get_register(regs, indx_offset);
- else
- indx = 0;
- addr = base + indx * (1 << X86_SIB_SCALE(sib));
- } else {
- addr_offset = get_reg_offset(insn, regs, REG_TYPE_RM);
- if (addr_offset < 0)
- goto out_err;
- addr = regs_get_register(regs, addr_offset);
- }
- addr += insn->displacement.value;
- }
- return (void __user *)addr;
-out_err:
- return (void __user *)-1;
-}
-
static int mpx_insn_decode(struct insn *insn,
struct pt_regs *regs)
{
@@ -305,7 +171,7 @@ siginfo_t *mpx_generate_siginfo(struct pt_regs *regs)
info->si_signo = SIGSEGV;
info->si_errno = 0;
info->si_code = SEGV_BNDERR;
- info->si_addr = mpx_get_addr_ref(&insn, regs);
+ info->si_addr = insn_get_addr_ref(&insn, regs);
/*
* We were not able to extract an address from the instruction,
* probably because there was something invalid in it.
--
2.9.3
User-Mode Instruction Prevention is a security feature present in new
Intel processors that, when set, prevents the execution of a subset of
instructions if such instructions are executed in user mode (CPL > 0).
Attempting to execute such instructions causes a general protection
exception.
The subset of instructions comprises:
* SGDT - Store Global Descriptor Table
* SIDT - Store Interrupt Descriptor Table
* SLDT - Store Local Descriptor Table
* SMSW - Store Machine Status Word
* STR - Store Task Register
This feature is also added to the list of disabled-features to allow
a cleaner handling of build-time configuration.
Cc: Andy Lutomirski <[email protected]>
Cc: Andrew Morton <[email protected]>
Cc: H. Peter Anvin <[email protected]>
Cc: Borislav Petkov <[email protected]>
Cc: Brian Gerst <[email protected]>
Cc: Chen Yucong <[email protected]>
Cc: Chris Metcalf <[email protected]>
Cc: Dave Hansen <[email protected]>
Cc: Fenghua Yu <[email protected]>
Cc: Huang Rui <[email protected]>
Cc: Jiri Slaby <[email protected]>
Cc: Jonathan Corbet <[email protected]>
Cc: Michael S. Tsirkin <[email protected]>
Cc: Paul Gortmaker <[email protected]>
Cc: Peter Zijlstra <[email protected]>
Cc: Ravi V. Shankar <[email protected]>
Cc: Shuah Khan <[email protected]>
Cc: Vlastimil Babka <[email protected]>
Cc: Tony Luck <[email protected]>
Cc: Paolo Bonzini <[email protected]>
Cc: Liang Z. Li <[email protected]>
Cc: Alexandre Julliard <[email protected]>
Cc: Stas Sergeev <[email protected]>
Cc: [email protected]
Cc: [email protected]
Signed-off-by: Ricardo Neri <[email protected]>
---
arch/x86/include/asm/cpufeatures.h | 1 +
arch/x86/include/asm/disabled-features.h | 8 +++++++-
arch/x86/include/uapi/asm/processor-flags.h | 2 ++
3 files changed, 10 insertions(+), 1 deletion(-)
diff --git a/arch/x86/include/asm/cpufeatures.h b/arch/x86/include/asm/cpufeatures.h
index eafee31..be61bf9 100644
--- a/arch/x86/include/asm/cpufeatures.h
+++ b/arch/x86/include/asm/cpufeatures.h
@@ -286,6 +286,7 @@
/* Intel-defined CPU features, CPUID level 0x00000007:0 (ecx), word 16 */
#define X86_FEATURE_AVX512VBMI (16*32+ 1) /* AVX512 Vector Bit Manipulation instructions*/
+#define X86_FEATURE_UMIP (16*32+ 2) /* User Mode Instruction Protection */
#define X86_FEATURE_PKU (16*32+ 3) /* Protection Keys for Userspace */
#define X86_FEATURE_OSPKE (16*32+ 4) /* OS Protection Keys Enable */
#define X86_FEATURE_RDPID (16*32+ 22) /* RDPID instruction */
diff --git a/arch/x86/include/asm/disabled-features.h b/arch/x86/include/asm/disabled-features.h
index 85599ad..4707445 100644
--- a/arch/x86/include/asm/disabled-features.h
+++ b/arch/x86/include/asm/disabled-features.h
@@ -16,6 +16,12 @@
# define DISABLE_MPX (1<<(X86_FEATURE_MPX & 31))
#endif
+#ifdef CONFIG_X86_INTEL_UMIP
+# define DISABLE_UMIP 0
+#else
+# define DISABLE_UMIP (1<<(X86_FEATURE_UMIP & 31))
+#endif
+
#ifdef CONFIG_X86_64
# define DISABLE_VME (1<<(X86_FEATURE_VME & 31))
# define DISABLE_K6_MTRR (1<<(X86_FEATURE_K6_MTRR & 31))
@@ -55,7 +61,7 @@
#define DISABLED_MASK13 0
#define DISABLED_MASK14 0
#define DISABLED_MASK15 0
-#define DISABLED_MASK16 (DISABLE_PKU|DISABLE_OSPKE)
+#define DISABLED_MASK16 (DISABLE_PKU|DISABLE_OSPKE|DISABLE_UMIP)
#define DISABLED_MASK17 0
#define DISABLED_MASK_CHECK BUILD_BUG_ON_ZERO(NCAPINTS != 18)
diff --git a/arch/x86/include/uapi/asm/processor-flags.h b/arch/x86/include/uapi/asm/processor-flags.h
index 567de50..d2c2af8 100644
--- a/arch/x86/include/uapi/asm/processor-flags.h
+++ b/arch/x86/include/uapi/asm/processor-flags.h
@@ -104,6 +104,8 @@
#define X86_CR4_OSFXSR _BITUL(X86_CR4_OSFXSR_BIT)
#define X86_CR4_OSXMMEXCPT_BIT 10 /* enable unmasked SSE exceptions */
#define X86_CR4_OSXMMEXCPT _BITUL(X86_CR4_OSXMMEXCPT_BIT)
+#define X86_CR4_UMIP_BIT 11 /* enable UMIP support */
+#define X86_CR4_UMIP _BITUL(X86_CR4_UMIP_BIT)
#define X86_CR4_VMXE_BIT 13 /* enable VMX virtualization */
#define X86_CR4_VMXE _BITUL(X86_CR4_VMXE_BIT)
#define X86_CR4_SMXE_BIT 14 /* enable safer mode (TXT) */
--
2.9.3
The feature User-Mode Instruction Prevention present in recent Intel
processor prevents a group of instructions from being executed with
CPL > 0. Otherwise, a general protection fault is issued.
Rather than relaying this fault to the user space (in the form of a SIGSEGV
signal), the instructions protected by UMIP can be emulated to provide
dummy results. This allows to conserve the current kernel behavior and not
reveal the system resources that UMIP intends to protect (the global
descriptor and interrupt descriptor tables, the segment selectors of the
local descriptor table and the task state and the machine status word).
This emulation is needed because certain applications (e.g., WineHQ) rely
on this subset of instructions to function.
The instructions protected by UMIP can be split in two groups. Those who
return a kernel memory address (sgdt and sidt) and those who return a
value (sldt, str and smsw).
For the instructions that return a kernel memory address, the result is
emulated as the location of a dummy variable in the kernel memory space.
This is needed as applications such as WineHQ rely on the result being
located in the kernel memory space function. The limit for the GDT and the
IDT are set to zero.
The instructions sldt and str return a segment selector relative to the
base address of the global descriptor table. Since the actual address of
such table is not revealed, it makes sense to emulate the result as zero.
The instruction smsw is emulated to return zero.
Cc: Andy Lutomirski <[email protected]>
Cc: Andrew Morton <[email protected]>
Cc: H. Peter Anvin <[email protected]>
Cc: Borislav Petkov <[email protected]>
Cc: Brian Gerst <[email protected]>
Cc: Chen Yucong <[email protected]>
Cc: Chris Metcalf <[email protected]>
Cc: Dave Hansen <[email protected]>
Cc: Fenghua Yu <[email protected]>
Cc: Huang Rui <[email protected]>
Cc: Jiri Slaby <[email protected]>
Cc: Jonathan Corbet <[email protected]>
Cc: Michael S. Tsirkin <[email protected]>
Cc: Paul Gortmaker <[email protected]>
Cc: Peter Zijlstra <[email protected]>
Cc: Ravi V. Shankar <[email protected]>
Cc: Shuah Khan <[email protected]>
Cc: Vlastimil Babka <[email protected]>
Cc: Tony Luck <[email protected]>
Cc: Paolo Bonzini <[email protected]>
Cc: Liang Z. Li <[email protected]>
Cc: Alexandre Julliard <[email protected]>
Cc: Stas Sergeev <[email protected]>
Cc: [email protected]
Cc: [email protected]
Signed-off-by: Ricardo Neri <[email protected]>
---
arch/x86/include/asm/umip.h | 16 +++++
arch/x86/kernel/Makefile | 1 +
arch/x86/kernel/umip.c | 170 ++++++++++++++++++++++++++++++++++++++++++++
3 files changed, 187 insertions(+)
create mode 100644 arch/x86/include/asm/umip.h
create mode 100644 arch/x86/kernel/umip.c
diff --git a/arch/x86/include/asm/umip.h b/arch/x86/include/asm/umip.h
new file mode 100644
index 0000000..7bcaca6
--- /dev/null
+++ b/arch/x86/include/asm/umip.h
@@ -0,0 +1,16 @@
+#ifndef _ASM_X86_UMIP_H
+#define _ASM_X86_UMIP_H
+
+#include <linux/types.h>
+#include <asm/ptrace.h>
+#include <asm/insn.h>
+
+#ifdef CONFIG_X86_INTEL_UMIP
+int fixup_umip_exception(struct pt_regs *regs);
+#else
+static inline int fixup_umip_exception(struct pt_regs *regs)
+{
+ return -EINVAL;
+}
+#endif /* CONFIG_X86_INTEL_UMIP */
+#endif /* _ASM_X86_UMIP_H */
diff --git a/arch/x86/kernel/Makefile b/arch/x86/kernel/Makefile
index 581386c..c4aec02 100644
--- a/arch/x86/kernel/Makefile
+++ b/arch/x86/kernel/Makefile
@@ -124,6 +124,7 @@ obj-$(CONFIG_EFI) += sysfb_efi.o
obj-$(CONFIG_PERF_EVENTS) += perf_regs.o
obj-$(CONFIG_TRACING) += tracepoint.o
obj-$(CONFIG_SCHED_MC_PRIO) += itmt.o
+obj-$(CONFIG_X86_INTEL_UMIP) += umip.o
ifdef CONFIG_FRAME_POINTER
obj-y += unwind_frame.o
diff --git a/arch/x86/kernel/umip.c b/arch/x86/kernel/umip.c
new file mode 100644
index 0000000..a104aea
--- /dev/null
+++ b/arch/x86/kernel/umip.c
@@ -0,0 +1,170 @@
+/*
+ * umip.c Emulation for instruction protected by the Intel User-Mode
+ * Instruction Prevention. The instructions are:
+ * sgdt
+ * sldt
+ * sidt
+ * str
+ * smsw
+ *
+ * Copyright (c) 2016, Intel Corporation.
+ * Ricardo Neri <[email protected]>
+ */
+
+#include <linux/compiler.h>
+#include <linux/bug.h>
+#include <linux/uaccess.h>
+#include <linux/err.h>
+#include <asm/ptrace.h>
+#include <asm/umip.h>
+#include <linux/thread_info.h>
+#include <linux/thread_info.h>
+
+/*
+ * The address of this dummy values need to be readable by
+ * the user space
+ */
+
+static const long umip_dummy_gdt_base;
+static const long umip_dummy_idt_base;
+
+enum umip_insn {
+ UMIP_SGDT = 0, /* opcode 0f 01 ModR/M reg 0 */
+ UMIP_SIDT, /* opcode 0f 01 ModR/M reg 1 */
+ UMIP_SLDT, /* opcode 0f 00 ModR/M reg 0 */
+ UMIP_SMSW, /* opcode 0f 01 ModR/M reg 4 */
+ UMIP_STR, /* opcode 0f 00 ModR/M reg 1 */
+};
+
+static int __identify_insn(struct insn *insn)
+{
+ /* by getting modrm we also get the opcode */
+ insn_get_modrm(insn);
+ if (insn->opcode.bytes[0] != 0xf)
+ return -EINVAL;
+
+ if (insn->opcode.bytes[1] == 0x1) {
+ switch (X86_MODRM_REG(insn->modrm.value)) {
+ case 0:
+ return UMIP_SGDT;
+ case 1:
+ return UMIP_SIDT;
+ case 4:
+ return UMIP_SMSW;
+ default:
+ return -EINVAL;
+ }
+ } else if (insn->opcode.bytes[1] == 0x0) {
+ if (X86_MODRM_REG(insn->modrm.value) == 0)
+ return UMIP_SLDT;
+ else if (X86_MODRM_REG(insn->modrm.value) == 1)
+ return UMIP_STR;
+ else
+ return -EINVAL;
+ }
+}
+
+static int __emulate_umip_insn(struct insn *insn, enum umip_insn umip_inst,
+ unsigned char *data, int *data_size)
+{
+ unsigned long const *dummy_base_addr;
+ unsigned short dummy_limit = 0;
+ unsigned short dummy_value = 0;
+
+ switch (umip_inst) {
+ /*
+ * These two instructions return the base address and limit of the
+ * global and interrupt descriptor table. The base address can be
+ * 32-bit or 64-bit. Limit is always 16-bit.
+ */
+ case UMIP_SGDT:
+ case UMIP_SIDT:
+ if (umip_inst == UMIP_SGDT)
+ dummy_base_addr = &umip_dummy_gdt_base;
+ else
+ dummy_base_addr = &umip_dummy_idt_base;
+ if (X86_MODRM_MOD(insn->modrm.value) == 3) {
+ WARN_ONCE(1, "SGDT cannot take register as argument!\n");
+ return -EINVAL;
+ }
+ /* 16-bit operand. fill most significant byte with zeros */
+ if (insn->opnd_bytes == 2)
+ dummy_base_addr = (unsigned long *)
+ ((unsigned long)
+ dummy_base_addr & 0xffffff);
+ memcpy(data + 2, &dummy_base_addr, sizeof(dummy_base_addr));
+ memcpy(data, &dummy_limit, sizeof(dummy_limit));
+ *data_size = sizeof(dummy_base_addr) + sizeof(dummy_limit);
+ break;
+ /*
+ * These three instructions return a 16-bit value. We return
+ * all zeros. This is equivalent to a null descriptor for
+ * str and sldt. For smsw, is equivalent to an all-zero CR0.
+ */
+ case UMIP_SLDT:
+ case UMIP_SMSW:
+ case UMIP_STR:
+ /* if operand is a register, it is zero-extended*/
+ if (X86_MODRM_MOD(insn->modrm.value) == 3) {
+ memset(data, 0, insn->opnd_bytes);
+ *data_size = insn->opnd_bytes;
+ } else
+ *data_size = sizeof(dummy_value);
+ memcpy(data, &dummy_value, sizeof(dummy_value));
+ break;
+ default:
+ return -EINVAL;
+ }
+ return 0;
+}
+
+int fixup_umip_exception(struct pt_regs *regs)
+{
+ struct insn insn;
+ unsigned char buf[MAX_INSN_SIZE];
+ /* 10 bytes is the maximum size of the result of UMIP instructions */
+ unsigned char dummy_data[10] = {0, 0, 0, 0, 0, 0, 0, 0, 0, 0};
+ int x86_64 = !test_thread_flag(TIF_IA32);
+ int not_copied, nr_copied, reg_offset, dummy_data_size;
+ void __user *uaddr;
+ unsigned long *reg_addr;
+ enum umip_insn umip_inst;
+
+ not_copied = copy_from_user(buf, (void __user *)regs->ip, sizeof(buf));
+ nr_copied = sizeof(buf) - not_copied;
+ /*
+ * The decoder _should_ fail nicely if we pass it a short buffer.
+ * But, let's not depend on that implementation detail. If we
+ * did not get anything, just error out now.
+ */
+ if (!nr_copied)
+ return -EFAULT;
+ insn_init(&insn, buf, nr_copied, x86_64);
+ insn_get_length(&insn);
+ if (nr_copied < insn.length)
+ return -EFAULT;
+
+ umip_inst = __identify_insn(&insn);
+ /* Check if we found an instruction protected by UMIP */
+ if (umip_inst < 0)
+ return -EINVAL;
+
+ if (__emulate_umip_insn(&insn, umip_inst, dummy_data, &dummy_data_size))
+ return -EINVAL;
+
+ /* If operand is a register, write directly to it */
+ if (X86_MODRM_MOD(insn.modrm.value) == 3) {
+ reg_offset = get_reg_offset_rm(&insn, regs);
+ reg_addr = (unsigned long *)((unsigned long)regs + reg_offset);
+ memcpy(reg_addr, dummy_data, dummy_data_size);
+ } else {
+ uaddr = insn_get_addr_ref(&insn, regs);
+ nr_copied = copy_to_user(uaddr, dummy_data, dummy_data_size);
+ if (nr_copied > 0)
+ return -EFAULT;
+ }
+
+ /* increase IP to let the program keep going */
+ regs->ip += insn.length;
+ return 0;
+}
--
2.9.3
If the User-Mode Instruction Prevention CPU feature is available and
enabled, a general protection fault will be issued if the instructions
sgdt, sldt, sidt, str or smsw are executed from user-mode context
(CPL > 0). If the fault was caused by any of the instructions protected
by UMIP, fixup_umip_exceptino will emulate dummy results for these
instructions.
Cc: Andy Lutomirski <[email protected]>
Cc: Andrew Morton <[email protected]>
Cc: H. Peter Anvin <[email protected]>
Cc: Borislav Petkov <[email protected]>
Cc: Brian Gerst <[email protected]>
Cc: Chen Yucong <[email protected]>
Cc: Chris Metcalf <[email protected]>
Cc: Dave Hansen <[email protected]>
Cc: Fenghua Yu <[email protected]>
Cc: Huang Rui <[email protected]>
Cc: Jiri Slaby <[email protected]>
Cc: Jonathan Corbet <[email protected]>
Cc: Michael S. Tsirkin <[email protected]>
Cc: Paul Gortmaker <[email protected]>
Cc: Peter Zijlstra <[email protected]>
Cc: Ravi V. Shankar <[email protected]>
Cc: Shuah Khan <[email protected]>
Cc: Vlastimil Babka <[email protected]>
Cc: Tony Luck <[email protected]>
Cc: Paolo Bonzini <[email protected]>
Cc: Liang Z. Li <[email protected]>
Cc: Alexandre Julliard <[email protected]>
Cc: Stas Sergeev <[email protected]>
Cc: [email protected]
Cc: [email protected]
Signed-off-by: Ricardo Neri <[email protected]>
---
arch/x86/kernel/traps.c | 4 ++++
1 file changed, 4 insertions(+)
diff --git a/arch/x86/kernel/traps.c b/arch/x86/kernel/traps.c
index bf0c6d0..5044fb3 100644
--- a/arch/x86/kernel/traps.c
+++ b/arch/x86/kernel/traps.c
@@ -64,6 +64,7 @@
#include <asm/trace/mpx.h>
#include <asm/mpx.h>
#include <asm/vm86.h>
+#include <asm/umip.h>
#ifdef CONFIG_X86_64
#include <asm/x86_init.h>
@@ -491,6 +492,9 @@ do_general_protection(struct pt_regs *regs, long error_code)
RCU_LOCKDEP_WARN(!rcu_is_watching(), "entry code didn't wake RCU");
cond_local_irq_enable(regs);
+ if (user_mode(regs) && !fixup_umip_exception(regs))
+ return;
+
if (v8086_mode(regs)) {
local_irq_enable();
handle_vm86_fault((struct kernel_vm86_regs *) regs, error_code);
--
2.9.3
User_mode Instruction Prevention (UMIP) is enabled by setting/clearing a
bit in %cr4.
It makes sense to enable UMIP at some point while booting, before user
spaces come up. Like SMAP and SMEP, is not critical to have it enabled
very early during boot. This is because UMIP is relevant only when there is
a userspace to be protected from. Given the similarities in relevance, it
makes sense to enable UMIP along with SMAP and SMEP.
UMIP is enabled by default. It can be disabled by adding clearcpuid=514
to the kernel parameters.
Cc: Andy Lutomirski <[email protected]>
Cc: Andrew Morton <[email protected]>
Cc: H. Peter Anvin <[email protected]>
Cc: Borislav Petkov <[email protected]>
Cc: Brian Gerst <[email protected]>
Cc: Chen Yucong <[email protected]>
Cc: Chris Metcalf <[email protected]>
Cc: Dave Hansen <[email protected]>
Cc: Fenghua Yu <[email protected]>
Cc: Huang Rui <[email protected]>
Cc: Jiri Slaby <[email protected]>
Cc: Jonathan Corbet <[email protected]>
Cc: Michael S. Tsirkin <[email protected]>
Cc: Paul Gortmaker <[email protected]>
Cc: Peter Zijlstra <[email protected]>
Cc: Ravi V. Shankar <[email protected]>
Cc: Shuah Khan <[email protected]>
Cc: Vlastimil Babka <[email protected]>
Cc: Tony Luck <[email protected]>
Cc: Paolo Bonzini <[email protected]>
Cc: Liang Z. Li <[email protected]>
Cc: Alexandre Julliard <[email protected]>
Cc: Stas Sergeev <[email protected]>
Cc: [email protected]
Cc: [email protected]
Signed-off-by: Ricardo Neri <[email protected]>
---
arch/x86/Kconfig | 10 ++++++++++
arch/x86/kernel/cpu/common.c | 16 +++++++++++++++-
2 files changed, 25 insertions(+), 1 deletion(-)
diff --git a/arch/x86/Kconfig b/arch/x86/Kconfig
index e487493..bae1a8f 100644
--- a/arch/x86/Kconfig
+++ b/arch/x86/Kconfig
@@ -1733,6 +1733,16 @@ config X86_SMAP
If unsure, say Y.
+config X86_INTEL_UMIP
+ def_bool y
+ depends on CPU_SUP_INTEL
+ prompt "User Mode Instruction Prevention" if EXPERT
+ ---help---
+ The User Mode Instruction Prevention (UMIP) is a security
+ feature in newer Intel processors. If enabled, a general
+ protection fault is issued if the instructions SGDT, SLDT,
+ SIDT, SMSW and STR are executed in user mode.
+
config X86_INTEL_MPX
prompt "Intel MPX (Memory Protection Extensions)"
def_bool n
diff --git a/arch/x86/kernel/cpu/common.c b/arch/x86/kernel/cpu/common.c
index dc1697c..b38a639 100644
--- a/arch/x86/kernel/cpu/common.c
+++ b/arch/x86/kernel/cpu/common.c
@@ -306,6 +306,19 @@ static __always_inline void setup_smap(struct cpuinfo_x86 *c)
}
}
+static __always_inline void setup_umip(struct cpuinfo_x86 *c)
+{
+ if (cpu_feature_enabled(X86_FEATURE_UMIP) &&
+ cpu_has(c, X86_FEATURE_UMIP))
+ cr4_set_bits(X86_CR4_UMIP);
+ else
+ /*
+ * Make sure UMIP is disabled in case it was enabled in a
+ * previous boot (e.g., via kexec).
+ */
+ cr4_clear_bits(X86_CR4_UMIP);
+}
+
/*
* Protection Keys are not available in 32-bit mode.
*/
@@ -1059,9 +1072,10 @@ static void identify_cpu(struct cpuinfo_x86 *c)
/* Disable the PN if appropriate */
squash_the_stupid_serial_number(c);
- /* Set up SMEP/SMAP */
+ /* Set up SMEP/SMAP/UMIP */
setup_smep(c);
setup_smap(c);
+ setup_umip(c);
/*
* The vendor-specific functions might have changed features.
--
2.9.3
Section 2.2.1.2 of the Intel 64 and IA-32 Architectures Software
Developer's Manual volume 2A states that when memory addressing is used
(i.e., mod part of ModR/M is not 3), a SIB byte is used and the index of
the SIB byte points to the R/ESP (i.e.,index = 4), the index should not be
used in the computation of the memory address.
An example of such instruction could be
insn -0x80(%rsp)
This is represented as:
[opcode] 4c 24 80
ModR/M: mod: 1, reg: 1: r/m: 4 (R/ESP)
SIB 24: sc: 0, index: 100 (R/ESP), base(R/ESP): 100
Displacement -0x80
The correct address is (base) + displacement; no index is used.
Care is taken to allow R12 to be used as index, which is a valid scenario.
Cc: Dave Hansen <[email protected]>
Cc: Adam Buchbinder <[email protected]>
Cc: Colin Ian King <[email protected]>
Cc: Lorenzo Stoakes <[email protected]>
Cc: Qiaowei Ren <[email protected]>
Cc: Ravi V. Shankar <[email protected]>
Cc: [email protected]
Signed-off-by: Ricardo Neri <[email protected]>
---
arch/x86/mm/mpx.c | 12 +++++++++++-
1 file changed, 11 insertions(+), 1 deletion(-)
diff --git a/arch/x86/mm/mpx.c b/arch/x86/mm/mpx.c
index 324e571..6a75a75 100644
--- a/arch/x86/mm/mpx.c
+++ b/arch/x86/mm/mpx.c
@@ -109,6 +109,13 @@ static int get_reg_offset(struct insn *insn, struct pt_regs *regs,
regno = X86_SIB_INDEX(insn->sib.value);
if (X86_REX_X(insn->rex_prefix.value))
regno += 8;
+ /*
+ * If mod !=3, SP is not used as index. Check is done after
+ * looking at REX.X This is because R12 can be used as an
+ * index.
+ */
+ if (regno == 4 && X86_MODRM_RM(insn->modrm.value) != 3)
+ return 0;
break;
case REG_TYPE_BASE:
@@ -161,7 +168,10 @@ static void __user *mpx_get_addr_ref(struct insn *insn, struct pt_regs *regs)
goto out_err;
base = regs_get_register(regs, base_offset);
- indx = regs_get_register(regs, indx_offset);
+ if (indx_offset)
+ indx = regs_get_register(regs, indx_offset);
+ else
+ indx = 0;
addr = base + indx * (1 << X86_SIB_SCALE(sib));
} else {
addr_offset = get_reg_offset(insn, regs, REG_TYPE_RM);
--
2.9.3
Section 2.2.1.2 of the Intel 64 and IA-32 Architectures Software
Developer's Manual volume 2A states that when memory addressing with no
explicit displacement (i.e, mod part of ModR/M is 0), a SIB byte is used
and the base of the SIB byte points to (R/EBP) (i.e., base = 5), an
explicit displacement of 0 must be used.
Make the address decoder to return -EINVAL in such a case.
Cc: Dave Hansen <[email protected]>
Cc: Adam Buchbinder <[email protected]>
Cc: Colin Ian King <[email protected]>
Cc: Lorenzo Stoakes <[email protected]>
Cc: Qiaowei Ren <[email protected]>
Cc: Ravi V. Shankar <[email protected]>
Cc: [email protected]
Signed-off-by: Ricardo Neri <[email protected]>
---
arch/x86/mm/mpx.c | 7 +++++++
1 file changed, 7 insertions(+)
diff --git a/arch/x86/mm/mpx.c b/arch/x86/mm/mpx.c
index 6a75a75..71681d0 100644
--- a/arch/x86/mm/mpx.c
+++ b/arch/x86/mm/mpx.c
@@ -120,6 +120,13 @@ static int get_reg_offset(struct insn *insn, struct pt_regs *regs,
case REG_TYPE_BASE:
regno = X86_SIB_BASE(insn->sib.value);
+ if (regno == 5 && X86_MODRM_RM(insn->modrm.value) == 0) {
+ WARN_ONCE(1, "An explicit displacement is required when %sBP used as SIB base.",
+ (IS_ENABLED(CONFIG_X86_64) && insn->x86_64) ?
+ "R13 or R" : "E");
+ return -EINVAL;
+ }
+
if (X86_REX_B(insn->rex_prefix.value))
regno += 8;
break;
--
2.9.3
On Fri, Dec 23, 2016 at 5:37 PM, Ricardo Neri
<[email protected]> wrote:
> Section 2.2.1.2 of the Intel 64 and IA-32 Architectures Software
> Developer's Manual volume 2A states that when memory addressing with no
> explicit displacement (i.e, mod part of ModR/M is 0), a SIB byte is used
> and the base of the SIB byte points to (R/EBP) (i.e., base = 5), an
> explicit displacement of 0 must be used.
>
> Make the address decoder to return -EINVAL in such a case.
>
> Cc: Dave Hansen <[email protected]>
> Cc: Adam Buchbinder <[email protected]>
> Cc: Colin Ian King <[email protected]>
> Cc: Lorenzo Stoakes <[email protected]>
> Cc: Qiaowei Ren <[email protected]>
> Cc: Ravi V. Shankar <[email protected]>
> Cc: [email protected]
> Signed-off-by: Ricardo Neri <[email protected]>
> ---
> arch/x86/mm/mpx.c | 7 +++++++
> 1 file changed, 7 insertions(+)
>
> diff --git a/arch/x86/mm/mpx.c b/arch/x86/mm/mpx.c
> index 6a75a75..71681d0 100644
> --- a/arch/x86/mm/mpx.c
> +++ b/arch/x86/mm/mpx.c
> @@ -120,6 +120,13 @@ static int get_reg_offset(struct insn *insn, struct pt_regs *regs,
>
> case REG_TYPE_BASE:
> regno = X86_SIB_BASE(insn->sib.value);
> + if (regno == 5 && X86_MODRM_RM(insn->modrm.value) == 0) {
> + WARN_ONCE(1, "An explicit displacement is required when %sBP used as SIB base.",
> + (IS_ENABLED(CONFIG_X86_64) && insn->x86_64) ?
> + "R13 or R" : "E");
> + return -EINVAL;
> + }
> +
Now that I've read the cover letter, I see what's going on. This
should not warn -- user code can easily trigger this deliberately.
On Fri, Dec 23, 2016 at 5:37 PM, Ricardo Neri
<[email protected]> wrote:
> Section 2.2.1.2 of the Intel 64 and IA-32 Architectures Software
> Developer's Manual volume 2A states that when memory addressing is used
> (i.e., mod part of ModR/M is not 3), a SIB byte is used and the index of
> the SIB byte points to the R/ESP (i.e.,index = 4), the index should not be
> used in the computation of the memory address.
>
> An example of such instruction could be
>
> insn -0x80(%rsp)
>
> This is represented as:
>
> [opcode] 4c 24 80
>
> ModR/M: mod: 1, reg: 1: r/m: 4 (R/ESP)
> SIB 24: sc: 0, index: 100 (R/ESP), base(R/ESP): 100
> Displacement -0x80
>
> The correct address is (base) + displacement; no index is used.
>
> Care is taken to allow R12 to be used as index, which is a valid scenario.
Since I have no idea what this patch has to do with the rest of the
series, I'll ask a question:
Why isn't this code in the standard x86 instruction decoder? Is the
decoder similarly buggy?
On Fri, Dec 23, 2016 at 5:37 PM, Ricardo Neri
<[email protected]> wrote:
> If the User-Mode Instruction Prevention CPU feature is available and
> enabled, a general protection fault will be issued if the instructions
> sgdt, sldt, sidt, str or smsw are executed from user-mode context
> (CPL > 0). If the fault was caused by any of the instructions protected
> by UMIP, fixup_umip_exceptino will emulate dummy results for these
> instructions.
>
> Cc: Andy Lutomirski <[email protected]>
> Cc: Andrew Morton <[email protected]>
> Cc: H. Peter Anvin <[email protected]>
> Cc: Borislav Petkov <[email protected]>
> Cc: Brian Gerst <[email protected]>
> Cc: Chen Yucong <[email protected]>
> Cc: Chris Metcalf <[email protected]>
> Cc: Dave Hansen <[email protected]>
> Cc: Fenghua Yu <[email protected]>
> Cc: Huang Rui <[email protected]>
> Cc: Jiri Slaby <[email protected]>
> Cc: Jonathan Corbet <[email protected]>
> Cc: Michael S. Tsirkin <[email protected]>
> Cc: Paul Gortmaker <[email protected]>
> Cc: Peter Zijlstra <[email protected]>
> Cc: Ravi V. Shankar <[email protected]>
> Cc: Shuah Khan <[email protected]>
> Cc: Vlastimil Babka <[email protected]>
> Cc: Tony Luck <[email protected]>
> Cc: Paolo Bonzini <[email protected]>
> Cc: Liang Z. Li <[email protected]>
> Cc: Alexandre Julliard <[email protected]>
> Cc: Stas Sergeev <[email protected]>
> Cc: [email protected]
> Cc: [email protected]
> Signed-off-by: Ricardo Neri <[email protected]>
> ---
> arch/x86/kernel/traps.c | 4 ++++
> 1 file changed, 4 insertions(+)
>
> diff --git a/arch/x86/kernel/traps.c b/arch/x86/kernel/traps.c
> index bf0c6d0..5044fb3 100644
> --- a/arch/x86/kernel/traps.c
> +++ b/arch/x86/kernel/traps.c
> @@ -64,6 +64,7 @@
> #include <asm/trace/mpx.h>
> #include <asm/mpx.h>
> #include <asm/vm86.h>
> +#include <asm/umip.h>
>
> #ifdef CONFIG_X86_64
> #include <asm/x86_init.h>
> @@ -491,6 +492,9 @@ do_general_protection(struct pt_regs *regs, long error_code)
> RCU_LOCKDEP_WARN(!rcu_is_watching(), "entry code didn't wake RCU");
> cond_local_irq_enable(regs);
>
> + if (user_mode(regs) && !fixup_umip_exception(regs))
> + return;
> +
I would do fixup_umip_exception(regs) == 0 to make it more obvious
what's going on.
Also, since you're allowing this in v8086 mode, I think this should
have an explicit test in
tools/testing/selftests/x86/entry_from_vm86.c. I *think* it will work
fine, but the pt_regs handling in vm86 mode is quite scary and has
been rewritten recently.
On Fri, Dec 23, 2016 at 5:37 PM, Ricardo Neri
<[email protected]> wrote:
> The feature User-Mode Instruction Prevention present in recent Intel
> processor prevents a group of instructions from being executed with
> CPL > 0. Otherwise, a general protection fault is issued.
>
> Rather than relaying this fault to the user space (in the form of a SIGSEGV
> signal), the instructions protected by UMIP can be emulated to provide
> dummy results. This allows to conserve the current kernel behavior and not
> reveal the system resources that UMIP intends to protect (the global
> descriptor and interrupt descriptor tables, the segment selectors of the
> local descriptor table and the task state and the machine status word).
>
> This emulation is needed because certain applications (e.g., WineHQ) rely
> on this subset of instructions to function.
>
> The instructions protected by UMIP can be split in two groups. Those who
> return a kernel memory address (sgdt and sidt) and those who return a
> value (sldt, str and smsw).
>
> For the instructions that return a kernel memory address, the result is
> emulated as the location of a dummy variable in the kernel memory space.
> This is needed as applications such as WineHQ rely on the result being
> located in the kernel memory space function. The limit for the GDT and the
> IDT are set to zero.
Nak. This is a trivial KASLR bypass. Just give them hardcoded
values. For x86_64, I would suggest 0xfffffffffffe0000 and
0xffffffffffff0000.
>
> The instructions sldt and str return a segment selector relative to the
> base address of the global descriptor table. Since the actual address of
> such table is not revealed, it makes sense to emulate the result as zero.
Hmm, now I wonder if anything uses SLDT to see if there is an LDT. If
so, we could emulate it better, but I doubt this matters.
>
> The instruction smsw is emulated to return zero.
If you're going to emulate it, please return something plausible. The
protected mode bit should be on, for example. 0x33 is probably
reasonable.
> +static int __emulate_umip_insn(struct insn *insn, enum umip_insn umip_inst,
> + unsigned char *data, int *data_size)
> +{
> + unsigned long const *dummy_base_addr;
> + unsigned short dummy_limit = 0;
> + unsigned short dummy_value = 0;
> +
> + switch (umip_inst) {
> + /*
> + * These two instructions return the base address and limit of the
> + * global and interrupt descriptor table. The base address can be
> + * 32-bit or 64-bit. Limit is always 16-bit.
> + */
> + case UMIP_SGDT:
> + case UMIP_SIDT:
> + if (umip_inst == UMIP_SGDT)
> + dummy_base_addr = &umip_dummy_gdt_base;
> + else
> + dummy_base_addr = &umip_dummy_idt_base;
> + if (X86_MODRM_MOD(insn->modrm.value) == 3) {
> + WARN_ONCE(1, "SGDT cannot take register as argument!\n");
No warnings please.
> +int fixup_umip_exception(struct pt_regs *regs)
> +{
> + struct insn insn;
> + unsigned char buf[MAX_INSN_SIZE];
> + /* 10 bytes is the maximum size of the result of UMIP instructions */
> + unsigned char dummy_data[10] = {0, 0, 0, 0, 0, 0, 0, 0, 0, 0};
> + int x86_64 = !test_thread_flag(TIF_IA32);
user_64bit_mode(regs)
> + int not_copied, nr_copied, reg_offset, dummy_data_size;
> + void __user *uaddr;
> + unsigned long *reg_addr;
> + enum umip_insn umip_inst;
> +
> + not_copied = copy_from_user(buf, (void __user *)regs->ip, sizeof(buf));
This is slightly wrong due to PKRU. I doubt we care.
> + nr_copied = sizeof(buf) - not_copied;
> + /*
> + * The decoder _should_ fail nicely if we pass it a short buffer.
> + * But, let's not depend on that implementation detail. If we
> + * did not get anything, just error out now.
> + */
> + if (!nr_copied)
> + return -EFAULT;
If the caller cares about EINVAL vs EFAULT, it cares because it is
considering changing the signal to a fake page fault. If so, then
this should be EINVAL -- failure to read the text should just prevent
emulation.
> + insn_init(&insn, buf, nr_copied, x86_64);
> + insn_get_length(&insn);
> + if (nr_copied < insn.length)
> + return -EFAULT;
Ditto.
> +
> + umip_inst = __identify_insn(&insn);
> + /* Check if we found an instruction protected by UMIP */
> + if (umip_inst < 0)
> + return -EINVAL;
> +
> + if (__emulate_umip_insn(&insn, umip_inst, dummy_data, &dummy_data_size))
> + return -EINVAL;
> +
> + /* If operand is a register, write directly to it */
> + if (X86_MODRM_MOD(insn.modrm.value) == 3) {
> + reg_offset = get_reg_offset_rm(&insn, regs);
> + reg_addr = (unsigned long *)((unsigned long)regs + reg_offset);
> + memcpy(reg_addr, dummy_data, dummy_data_size);
> + } else {
> + uaddr = insn_get_addr_ref(&insn, regs);
> + nr_copied = copy_to_user(uaddr, dummy_data, dummy_data_size);
> + if (nr_copied > 0)
> + return -EFAULT;
This should be the only EFAULT case.
Hi Ricardo,
[auto build test WARNING on tip/auto-latest]
[also build test WARNING on v4.9 next-20161223]
[cannot apply to tip/x86/core]
[if your patch is applied to the wrong git tree, please drop us a note to help improve the system]
url: https://github.com/0day-ci/linux/commits/Ricardo-Neri/x86-enable-User-Mode-Instruction-Prevention/20161224-094338
config: x86_64-randconfig-x008-201651 (attached as .config)
compiler: gcc-6 (Debian 6.2.0-3) 6.2.0 20160901
reproduce:
# save the attached .config to linux build tree
make ARCH=x86_64
All warnings (new ones prefixed by >>):
>> warning: objtool: x86 instruction decoder differs from kernel
---
0-DAY kernel test infrastructure Open Source Technology Center
https://lists.01.org/pipermail/kbuild-all Intel Corporation
Hi Ricardo,
[auto build test WARNING on tip/auto-latest]
[also build test WARNING on next-20161223]
[cannot apply to tip/x86/core v4.9]
[if your patch is applied to the wrong git tree, please drop us a note to help improve the system]
url: https://github.com/0day-ci/linux/commits/Ricardo-Neri/x86-enable-User-Mode-Instruction-Prevention/20161224-094338
config: x86_64-acpi-redef (attached as .config)
compiler: gcc-6 (Debian 6.2.0-3) 6.2.0 20160901
reproduce:
# save the attached .config to linux build tree
make ARCH=x86_64
All warnings (new ones prefixed by >>):
arch/x86/kernel/umip.c: In function '__identify_insn':
>> arch/x86/kernel/umip.c:65:1: warning: control reaches end of non-void function [-Wreturn-type]
}
^
vim +65 arch/x86/kernel/umip.c
78301541 Ricardo Neri 2016-12-23 49 return UMIP_SGDT;
78301541 Ricardo Neri 2016-12-23 50 case 1:
78301541 Ricardo Neri 2016-12-23 51 return UMIP_SIDT;
78301541 Ricardo Neri 2016-12-23 52 case 4:
78301541 Ricardo Neri 2016-12-23 53 return UMIP_SMSW;
78301541 Ricardo Neri 2016-12-23 54 default:
78301541 Ricardo Neri 2016-12-23 55 return -EINVAL;
78301541 Ricardo Neri 2016-12-23 56 }
78301541 Ricardo Neri 2016-12-23 57 } else if (insn->opcode.bytes[1] == 0x0) {
78301541 Ricardo Neri 2016-12-23 58 if (X86_MODRM_REG(insn->modrm.value) == 0)
78301541 Ricardo Neri 2016-12-23 59 return UMIP_SLDT;
78301541 Ricardo Neri 2016-12-23 60 else if (X86_MODRM_REG(insn->modrm.value) == 1)
78301541 Ricardo Neri 2016-12-23 61 return UMIP_STR;
78301541 Ricardo Neri 2016-12-23 62 else
78301541 Ricardo Neri 2016-12-23 63 return -EINVAL;
78301541 Ricardo Neri 2016-12-23 64 }
78301541 Ricardo Neri 2016-12-23 @65 }
78301541 Ricardo Neri 2016-12-23 66
78301541 Ricardo Neri 2016-12-23 67 static int __emulate_umip_insn(struct insn *insn, enum umip_insn umip_inst,
78301541 Ricardo Neri 2016-12-23 68 unsigned char *data, int *data_size)
78301541 Ricardo Neri 2016-12-23 69 {
78301541 Ricardo Neri 2016-12-23 70 unsigned long const *dummy_base_addr;
78301541 Ricardo Neri 2016-12-23 71 unsigned short dummy_limit = 0;
78301541 Ricardo Neri 2016-12-23 72 unsigned short dummy_value = 0;
78301541 Ricardo Neri 2016-12-23 73
:::::: The code at line 65 was first introduced by commit
:::::: 7830154191c35fde49ef59c2b9328f6b32203be4 x86: Add emulation code for UMIP instructions
:::::: TO: Ricardo Neri <[email protected]>
:::::: CC: 0day robot <[email protected]>
---
0-DAY kernel test infrastructure Open Source Technology Center
https://lists.01.org/pipermail/kbuild-all Intel Corporation
Hi Ricardo,
[auto build test ERROR on tip/auto-latest]
[also build test ERROR on next-20161223]
[cannot apply to tip/x86/core v4.9]
[if your patch is applied to the wrong git tree, please drop us a note to help improve the system]
url: https://github.com/0day-ci/linux/commits/Ricardo-Neri/x86-enable-User-Mode-Instruction-Prevention/20161224-094338
config: x86_64-rhel (attached as .config)
compiler: gcc-6 (Debian 6.2.0-3) 6.2.0 20160901
reproduce:
# save the attached .config to linux build tree
make ARCH=x86_64
All errors (new ones prefixed by >>):
In file included from arch/x86/tools/test_get_len.c:27:0:
>> arch/x86/include/asm/insn.h:26:23: fatal error: linux/bug.h: No such file or directory
#include <linux/bug.h>
^
compilation terminated.
--
In file included from tools/include/linux/compiler.h:55:0,
from arch/x86/include/asm/insn.h:25,
from arch/x86/tools/insn_sanity.c:34:
>> tools/include/linux/types.h:28:18: error: conflicting types for 'u64'
typedef uint64_t u64;
^~~
In file included from /usr/include/asm-generic/types.h:6:0,
from /usr/include/x86_64-linux-gnu/asm/types.h:4,
from tools/include/linux/types.h:9,
from tools/include/linux/compiler.h:55,
from arch/x86/include/asm/insn.h:25,
from arch/x86/tools/insn_sanity.c:34:
include/asm-generic/int-ll64.h:25:28: note: previous declaration of 'u64' was here
typedef unsigned long long u64;
^~~
In file included from tools/include/linux/compiler.h:55:0,
from arch/x86/include/asm/insn.h:25,
from arch/x86/tools/insn_sanity.c:34:
>> tools/include/linux/types.h:29:17: error: conflicting types for 's64'
typedef int64_t s64;
^~~
In file included from /usr/include/asm-generic/types.h:6:0,
from /usr/include/x86_64-linux-gnu/asm/types.h:4,
from tools/include/linux/types.h:9,
from tools/include/linux/compiler.h:55,
from arch/x86/include/asm/insn.h:25,
from arch/x86/tools/insn_sanity.c:34:
include/asm-generic/int-ll64.h:24:26: note: previous declaration of 's64' was here
typedef signed long long s64;
^~~
In file included from tools/include/linux/types.h:4:0,
from tools/include/linux/compiler.h:55,
from arch/x86/include/asm/insn.h:25,
from arch/x86/tools/insn_sanity.c:34:
>> include/linux/stddef.h:10:2: error: expected identifier before numeric constant
false = 0,
^
In file included from arch/x86/include/asm/ptrace.h:5:0,
from arch/x86/include/asm/insn.h:28,
from arch/x86/tools/insn_sanity.c:34:
>> arch/x86/include/asm/page_types.h:61:15: error: unknown type name 'phys_addr_t'
static inline phys_addr_t get_max_mapped(void)
^~~~~~~~~~~
arch/x86/include/asm/page_types.h: In function 'get_max_mapped':
>> arch/x86/include/asm/page_types.h:63:10: error: 'phys_addr_t' undeclared (first use in this function)
return (phys_addr_t)max_pfn_mapped << PAGE_SHIFT;
^~~~~~~~~~~
arch/x86/include/asm/page_types.h:63:10: note: each undeclared identifier is reported only once for each function it appears in
>> arch/x86/include/asm/page_types.h:63:22: error: expected ';' before 'max_pfn_mapped'
return (phys_addr_t)max_pfn_mapped << PAGE_SHIFT;
^~~~~~~~~~~~~~
In file included from arch/x86/include/asm/insn.h:28:0,
from arch/x86/tools/insn_sanity.c:34:
arch/x86/include/asm/ptrace.h: At top level:
>> arch/x86/include/asm/ptrace.h:33:8: error: redefinition of 'struct pt_regs'
struct pt_regs {
^~~~~~~
In file included from arch/x86/include/asm/insn.h:26:0,
from arch/x86/tools/insn_sanity.c:34:
include/linux/bug.h:13:8: note: originally defined here
struct pt_regs;
^~~~~~~
vim +33 arch/x86/include/asm/ptrace.h
92bc2056 include/asm-x86/ptrace.h Harvey Harrison 2008-02-08 27 unsigned long sp;
92bc2056 include/asm-x86/ptrace.h Harvey Harrison 2008-02-08 28 unsigned long ss;
65ea5b03 include/asm-x86/ptrace.h H. Peter Anvin 2008-01-30 29 };
8fc37f2c include/asm-x86/ptrace.h Thomas Gleixner 2007-10-23 30
8fc37f2c include/asm-x86/ptrace.h Thomas Gleixner 2007-10-23 31 #else /* __i386__ */
8fc37f2c include/asm-x86/ptrace.h Thomas Gleixner 2007-10-23 32
65ea5b03 include/asm-x86/ptrace.h H. Peter Anvin 2008-01-30 @33 struct pt_regs {
e90e147c arch/x86/include/asm/ptrace.h Denys Vlasenko 2015-02-26 34 /*
e90e147c arch/x86/include/asm/ptrace.h Denys Vlasenko 2015-02-26 35 * C ABI says these regs are callee-preserved. They aren't saved on kernel entry
e90e147c arch/x86/include/asm/ptrace.h Denys Vlasenko 2015-02-26 36 * unless syscall needs a complete, fully filled "struct pt_regs".
:::::: The code at line 33 was first introduced by commit
:::::: 65ea5b0349903585bfed9720fa06f5edb4f1cd25 x86: rename the struct pt_regs members for 32/64-bit consistency
:::::: TO: H. Peter Anvin <[email protected]>
:::::: CC: Ingo Molnar <[email protected]>
---
0-DAY kernel test infrastructure Open Source Technology Center
https://lists.01.org/pipermail/kbuild-all Intel Corporation
Hi Ricado,
On Fri, 23 Dec 2016 17:37:41 -0800
Ricardo Neri <[email protected]> wrote:
> Other kernel submodules can benefit from using the utility functions
> defined in mpx.c to obtain the addresses and values of operands contained
> in the general purpose registers. An instance of this is the emulation code
> used for instructions protected by the Intel User-Mode Instruction
> Prevention feature.
>
> Thus, these functions are relocated to a new insn-utils.c file. The reason
> to not relocate these utilities for insn.c is that the latter solely
> analyses instructions given by a struct insn. The file insn-utils.c intends
> to be used when, for instance, determining addresses from the contents
> of the general purpose registers.
>
> To avoid creating a new insn-utils.h, insn.h is used. One caveat, however,
> is that several #include's were needed by the utility functions.
>
> Functions are simply relocated. There are not functional or indentation
> changes.
Thank you for your great work! :)
> ---
> arch/x86/include/asm/insn.h | 6 ++
> arch/x86/lib/Makefile | 2 +-
> arch/x86/lib/insn-utils.c | 148 ++++++++++++++++++++++++++++++++++++++++++++
> arch/x86/mm/mpx.c | 136 +---------------------------------------
> 4 files changed, 156 insertions(+), 136 deletions(-)
> create mode 100644 arch/x86/lib/insn-utils.c
>
> diff --git a/arch/x86/include/asm/insn.h b/arch/x86/include/asm/insn.h
> index b3e32b0..9dc9d42 100644
> --- a/arch/x86/include/asm/insn.h
> +++ b/arch/x86/include/asm/insn.h
> @@ -22,6 +22,10 @@
>
> /* insn_attr_t is defined in inat.h */
> #include <asm/inat.h>
> +#include <linux/compiler.h>
> +#include <linux/bug.h>
> +#include <linux/err.h>
> +#include <asm/ptrace.h>
>
> struct insn_field {
> union {
> @@ -106,6 +110,8 @@ extern void insn_get_sib(struct insn *insn);
> extern void insn_get_displacement(struct insn *insn);
> extern void insn_get_immediate(struct insn *insn);
> extern void insn_get_length(struct insn *insn);
> +extern void __user *insn_get_addr_ref(struct insn *insn, struct pt_regs *regs);
> +extern int get_reg_offset_rm(struct insn *insn, struct pt_regs *regs);
Could you also rename this to add "insn_" prefix?
Other part looks good to me :)
(btw, I saw a kbuild bot warning, would you also test it with
CONFIG_X86_DECODER_SELFTEST=y?)
Thanks again!
>
> /* Attribute will be determined after getting ModRM (for opcode groups) */
> static inline void insn_get_attribute(struct insn *insn)
> diff --git a/arch/x86/lib/Makefile b/arch/x86/lib/Makefile
> index 34a7413..0d01d82 100644
> --- a/arch/x86/lib/Makefile
> +++ b/arch/x86/lib/Makefile
> @@ -23,7 +23,7 @@ lib-y := delay.o misc.o cmdline.o cpu.o
> lib-y += usercopy_$(BITS).o usercopy.o getuser.o putuser.o
> lib-y += memcpy_$(BITS).o
> lib-$(CONFIG_RWSEM_XCHGADD_ALGORITHM) += rwsem.o
> -lib-$(CONFIG_INSTRUCTION_DECODER) += insn.o inat.o
> +lib-$(CONFIG_INSTRUCTION_DECODER) += insn.o inat.o insn-utils.o
> lib-$(CONFIG_RANDOMIZE_BASE) += kaslr.o
>
> obj-y += msr.o msr-reg.o msr-reg-export.o hweight.o
> diff --git a/arch/x86/lib/insn-utils.c b/arch/x86/lib/insn-utils.c
> new file mode 100644
> index 0000000..598bbd6
> --- /dev/null
> +++ b/arch/x86/lib/insn-utils.c
> @@ -0,0 +1,148 @@
> +/*
> + * Utility functions for x86 operand and address decoding
> + *
> + * Copyright (C) Intel Corporation 2016
> + */
> +#include <linux/kernel.h>
> +#include <linux/string.h>
> +#include <asm/inat.h>
> +#include <asm/insn.h>
> +
> +enum reg_type {
> + REG_TYPE_RM = 0,
> + REG_TYPE_INDEX,
> + REG_TYPE_BASE,
> +};
> +
> +static int get_reg_offset(struct insn *insn, struct pt_regs *regs,
> + enum reg_type type)
> +{
> + int regno = 0;
> +
> + static const int regoff[] = {
> + offsetof(struct pt_regs, ax),
> + offsetof(struct pt_regs, cx),
> + offsetof(struct pt_regs, dx),
> + offsetof(struct pt_regs, bx),
> + offsetof(struct pt_regs, sp),
> + offsetof(struct pt_regs, bp),
> + offsetof(struct pt_regs, si),
> + offsetof(struct pt_regs, di),
> +#ifdef CONFIG_X86_64
> + offsetof(struct pt_regs, r8),
> + offsetof(struct pt_regs, r9),
> + offsetof(struct pt_regs, r10),
> + offsetof(struct pt_regs, r11),
> + offsetof(struct pt_regs, r12),
> + offsetof(struct pt_regs, r13),
> + offsetof(struct pt_regs, r14),
> + offsetof(struct pt_regs, r15),
> +#endif
> + };
> + int nr_registers = ARRAY_SIZE(regoff);
> + /*
> + * Don't possibly decode a 32-bit instructions as
> + * reading a 64-bit-only register.
> + */
> + if (IS_ENABLED(CONFIG_X86_64) && !insn->x86_64)
> + nr_registers -= 8;
> +
> + switch (type) {
> + case REG_TYPE_RM:
> + regno = X86_MODRM_RM(insn->modrm.value);
> + if (X86_REX_B(insn->rex_prefix.value))
> + regno += 8;
> + break;
> +
> + case REG_TYPE_INDEX:
> + regno = X86_SIB_INDEX(insn->sib.value);
> + if (X86_REX_X(insn->rex_prefix.value))
> + regno += 8;
> + /*
> + * If mod !=3, SP is not used as index. Check is done after
> + * looking at REX.X This is because R12 can be used as an
> + * index.
> + */
> + if (regno == 4 && X86_MODRM_RM(insn->modrm.value) != 3)
> + return 0;
> + break;
> +
> + case REG_TYPE_BASE:
> + regno = X86_SIB_BASE(insn->sib.value);
> + if (regno == 5 && X86_MODRM_RM(insn->modrm.value) == 0) {
> + WARN_ONCE(1, "An explicit displacement is required when %sBP used as SIB base.",
> + (IS_ENABLED(CONFIG_X86_64) && insn->x86_64) ?
> + "R13 or R" : "E");
> + return -EINVAL;
> + }
> +
> + if (X86_REX_B(insn->rex_prefix.value))
> + regno += 8;
> + break;
> +
> + default:
> + pr_err("invalid register type");
> + BUG();
> + break;
> + }
> +
> + if (regno >= nr_registers) {
> + WARN_ONCE(1, "decoded an instruction with an invalid register");
> + return -EINVAL;
> + }
> + return regoff[regno];
> +}
> +
> +int get_reg_offset_rm(struct insn *insn, struct pt_regs *regs)
> +{
> + return get_reg_offset(insn, regs, REG_TYPE_RM);
> +}
> +
> +/*
> + * return the address being referenced be instruction
> + * for rm=3 returning the content of the rm reg
> + * for rm!=3 calculates the address using SIB and Disp
> + */
> +void __user *insn_get_addr_ref(struct insn *insn, struct pt_regs *regs)
> +{
> + unsigned long addr, base, indx;
> + int addr_offset, base_offset, indx_offset;
> + insn_byte_t sib;
> +
> + insn_get_modrm(insn);
> + insn_get_sib(insn);
> + sib = insn->sib.value;
> +
> + if (X86_MODRM_MOD(insn->modrm.value) == 3) {
> + addr_offset = get_reg_offset(insn, regs, REG_TYPE_RM);
> + if (addr_offset < 0)
> + goto out_err;
> + addr = regs_get_register(regs, addr_offset);
> + } else {
> + if (insn->sib.nbytes) {
> + base_offset = get_reg_offset(insn, regs, REG_TYPE_BASE);
> + if (base_offset < 0)
> + goto out_err;
> +
> + indx_offset = get_reg_offset(insn, regs, REG_TYPE_INDEX);
> + if (indx_offset < 0)
> + goto out_err;
> +
> + base = regs_get_register(regs, base_offset);
> + if (indx_offset)
> + indx = regs_get_register(regs, indx_offset);
> + else
> + indx = 0;
> + addr = base + indx * (1 << X86_SIB_SCALE(sib));
> + } else {
> + addr_offset = get_reg_offset(insn, regs, REG_TYPE_RM);
> + if (addr_offset < 0)
> + goto out_err;
> + addr = regs_get_register(regs, addr_offset);
> + }
> + addr += insn->displacement.value;
> + }
> + return (void __user *)addr;
> +out_err:
> + return (void __user *)-1;
> +}
> diff --git a/arch/x86/mm/mpx.c b/arch/x86/mm/mpx.c
> index 71681d0..32ba342 100644
> --- a/arch/x86/mm/mpx.c
> +++ b/arch/x86/mm/mpx.c
> @@ -59,140 +59,6 @@ static unsigned long mpx_mmap(unsigned long len)
> return addr;
> }
>
> -enum reg_type {
> - REG_TYPE_RM = 0,
> - REG_TYPE_INDEX,
> - REG_TYPE_BASE,
> -};
> -
> -static int get_reg_offset(struct insn *insn, struct pt_regs *regs,
> - enum reg_type type)
> -{
> - int regno = 0;
> -
> - static const int regoff[] = {
> - offsetof(struct pt_regs, ax),
> - offsetof(struct pt_regs, cx),
> - offsetof(struct pt_regs, dx),
> - offsetof(struct pt_regs, bx),
> - offsetof(struct pt_regs, sp),
> - offsetof(struct pt_regs, bp),
> - offsetof(struct pt_regs, si),
> - offsetof(struct pt_regs, di),
> -#ifdef CONFIG_X86_64
> - offsetof(struct pt_regs, r8),
> - offsetof(struct pt_regs, r9),
> - offsetof(struct pt_regs, r10),
> - offsetof(struct pt_regs, r11),
> - offsetof(struct pt_regs, r12),
> - offsetof(struct pt_regs, r13),
> - offsetof(struct pt_regs, r14),
> - offsetof(struct pt_regs, r15),
> -#endif
> - };
> - int nr_registers = ARRAY_SIZE(regoff);
> - /*
> - * Don't possibly decode a 32-bit instructions as
> - * reading a 64-bit-only register.
> - */
> - if (IS_ENABLED(CONFIG_X86_64) && !insn->x86_64)
> - nr_registers -= 8;
> -
> - switch (type) {
> - case REG_TYPE_RM:
> - regno = X86_MODRM_RM(insn->modrm.value);
> - if (X86_REX_B(insn->rex_prefix.value))
> - regno += 8;
> - break;
> -
> - case REG_TYPE_INDEX:
> - regno = X86_SIB_INDEX(insn->sib.value);
> - if (X86_REX_X(insn->rex_prefix.value))
> - regno += 8;
> - /*
> - * If mod !=3, SP is not used as index. Check is done after
> - * looking at REX.X This is because R12 can be used as an
> - * index.
> - */
> - if (regno == 4 && X86_MODRM_RM(insn->modrm.value) != 3)
> - return 0;
> - break;
> -
> - case REG_TYPE_BASE:
> - regno = X86_SIB_BASE(insn->sib.value);
> - if (regno == 5 && X86_MODRM_RM(insn->modrm.value) == 0) {
> - WARN_ONCE(1, "An explicit displacement is required when %sBP used as SIB base.",
> - (IS_ENABLED(CONFIG_X86_64) && insn->x86_64) ?
> - "R13 or R" : "E");
> - return -EINVAL;
> - }
> -
> - if (X86_REX_B(insn->rex_prefix.value))
> - regno += 8;
> - break;
> -
> - default:
> - pr_err("invalid register type");
> - BUG();
> - break;
> - }
> -
> - if (regno >= nr_registers) {
> - WARN_ONCE(1, "decoded an instruction with an invalid register");
> - return -EINVAL;
> - }
> - return regoff[regno];
> -}
> -
> -/*
> - * return the address being referenced be instruction
> - * for rm=3 returning the content of the rm reg
> - * for rm!=3 calculates the address using SIB and Disp
> - */
> -static void __user *mpx_get_addr_ref(struct insn *insn, struct pt_regs *regs)
> -{
> - unsigned long addr, base, indx;
> - int addr_offset, base_offset, indx_offset;
> - insn_byte_t sib;
> -
> - insn_get_modrm(insn);
> - insn_get_sib(insn);
> - sib = insn->sib.value;
> -
> - if (X86_MODRM_MOD(insn->modrm.value) == 3) {
> - addr_offset = get_reg_offset(insn, regs, REG_TYPE_RM);
> - if (addr_offset < 0)
> - goto out_err;
> - addr = regs_get_register(regs, addr_offset);
> - } else {
> - if (insn->sib.nbytes) {
> - base_offset = get_reg_offset(insn, regs, REG_TYPE_BASE);
> - if (base_offset < 0)
> - goto out_err;
> -
> - indx_offset = get_reg_offset(insn, regs, REG_TYPE_INDEX);
> - if (indx_offset < 0)
> - goto out_err;
> -
> - base = regs_get_register(regs, base_offset);
> - if (indx_offset)
> - indx = regs_get_register(regs, indx_offset);
> - else
> - indx = 0;
> - addr = base + indx * (1 << X86_SIB_SCALE(sib));
> - } else {
> - addr_offset = get_reg_offset(insn, regs, REG_TYPE_RM);
> - if (addr_offset < 0)
> - goto out_err;
> - addr = regs_get_register(regs, addr_offset);
> - }
> - addr += insn->displacement.value;
> - }
> - return (void __user *)addr;
> -out_err:
> - return (void __user *)-1;
> -}
> -
> static int mpx_insn_decode(struct insn *insn,
> struct pt_regs *regs)
> {
> @@ -305,7 +171,7 @@ siginfo_t *mpx_generate_siginfo(struct pt_regs *regs)
> info->si_signo = SIGSEGV;
> info->si_errno = 0;
> info->si_code = SEGV_BNDERR;
> - info->si_addr = mpx_get_addr_ref(&insn, regs);
> + info->si_addr = insn_get_addr_ref(&insn, regs);
> /*
> * We were not able to extract an address from the instruction,
> * probably because there was something invalid in it.
> --
> 2.9.3
>
--
Masami Hiramatsu <[email protected]>
On Fri, 23 Dec 2016 17:37:43 -0800
Ricardo Neri <[email protected]> wrote:
> +static int __identify_insn(struct insn *insn)
> +{
> + /* by getting modrm we also get the opcode */
> + insn_get_modrm(insn);
> + if (insn->opcode.bytes[0] != 0xf)
> + return -EINVAL;
> +
> + if (insn->opcode.bytes[1] == 0x1) {
> + switch (X86_MODRM_REG(insn->modrm.value)) {
> + case 0:
> + return UMIP_SGDT;
> + case 1:
> + return UMIP_SIDT;
> + case 4:
> + return UMIP_SMSW;
> + default:
> + return -EINVAL;
> + }
> + } else if (insn->opcode.bytes[1] == 0x0) {
> + if (X86_MODRM_REG(insn->modrm.value) == 0)
> + return UMIP_SLDT;
> + else if (X86_MODRM_REG(insn->modrm.value) == 1)
> + return UMIP_STR;
> + else
> + return -EINVAL;
> + }
gcc detected an error here, you may need return "-EINVAL".
Thanks,
--
Masami Hiramatsu <[email protected]>
On Fri, 2016-12-23 at 17:57 -0800, Andy Lutomirski wrote:
> On Fri, Dec 23, 2016 at 5:37 PM, Ricardo Neri
> <[email protected]> wrote:
> > Section 2.2.1.2 of the Intel 64 and IA-32 Architectures Software
> > Developer's Manual volume 2A states that when memory addressing is used
> > (i.e., mod part of ModR/M is not 3), a SIB byte is used and the index of
> > the SIB byte points to the R/ESP (i.e.,index = 4), the index should not be
> > used in the computation of the memory address.
> >
> > An example of such instruction could be
> >
> > insn -0x80(%rsp)
> >
> > This is represented as:
> >
> > [opcode] 4c 24 80
> >
> > ModR/M: mod: 1, reg: 1: r/m: 4 (R/ESP)
> > SIB 24: sc: 0, index: 100 (R/ESP), base(R/ESP): 100
> > Displacement -0x80
> >
> > The correct address is (base) + displacement; no index is used.
> >
> > Care is taken to allow R12 to be used as index, which is a valid scenario.
>
> Since I have no idea what this patch has to do with the rest of the
> series, I'll ask a question:
Thanks for your feedback! I saw in a previous e-mail that you read the
cover-letter. :)
>
> Why isn't this code in the standard x86 instruction decoder? Is the
> decoder similarly buggy?
I did not find any bug in the instruction decoder. I think the reason
this code is not in the decoder is that the decoder only gives you the
bytes of the instructions without any meaning. For instance, it gives
you the ModRM byte but it does not tell you what register or addressing
mode is used.
To fully emulate the UMIP instructions I need to give meaning to the
ModRM and SIB bytes. Since I was trying many operand combinations, I ran
into this issue.
Thanks and BR,
Ricardo
On Fri, 2016-12-23 at 17:58 -0800, Andy Lutomirski wrote:
> On Fri, Dec 23, 2016 at 5:37 PM, Ricardo Neri
> <[email protected]> wrote:
> > Section 2.2.1.2 of the Intel 64 and IA-32 Architectures Software
> > Developer's Manual volume 2A states that when memory addressing with no
> > explicit displacement (i.e, mod part of ModR/M is 0), a SIB byte is used
> > and the base of the SIB byte points to (R/EBP) (i.e., base = 5), an
> > explicit displacement of 0 must be used.
> >
> > Make the address decoder to return -EINVAL in such a case.
> >
> > Cc: Dave Hansen <[email protected]>
> > Cc: Adam Buchbinder <[email protected]>
> > Cc: Colin Ian King <[email protected]>
> > Cc: Lorenzo Stoakes <[email protected]>
> > Cc: Qiaowei Ren <[email protected]>
> > Cc: Ravi V. Shankar <[email protected]>
> > Cc: [email protected]
> > Signed-off-by: Ricardo Neri <[email protected]>
> > ---
> > arch/x86/mm/mpx.c | 7 +++++++
> > 1 file changed, 7 insertions(+)
> >
> > diff --git a/arch/x86/mm/mpx.c b/arch/x86/mm/mpx.c
> > index 6a75a75..71681d0 100644
> > --- a/arch/x86/mm/mpx.c
> > +++ b/arch/x86/mm/mpx.c
> > @@ -120,6 +120,13 @@ static int get_reg_offset(struct insn *insn, struct pt_regs *regs,
> >
> > case REG_TYPE_BASE:
> > regno = X86_SIB_BASE(insn->sib.value);
> > + if (regno == 5 && X86_MODRM_RM(insn->modrm.value) == 0) {
> > + WARN_ONCE(1, "An explicit displacement is required when %sBP used as SIB base.",
> > + (IS_ENABLED(CONFIG_X86_64) && insn->x86_64) ?
> > + "R13 or R" : "E");
> > + return -EINVAL;
> > + }
> > +
>
> Now that I've read the cover letter, I see what's going on. This
> should not warn -- user code can easily trigger this deliberately.
OK, I'll remove it. Are you concerned about the warning printing the
calltrace, even only once?
On Fri, 2016-12-23 at 18:11 -0800, Andy Lutomirski wrote:
> On Fri, Dec 23, 2016 at 5:37 PM, Ricardo Neri
> <[email protected]> wrote:
> > If the User-Mode Instruction Prevention CPU feature is available and
> > enabled, a general protection fault will be issued if the instructions
> > sgdt, sldt, sidt, str or smsw are executed from user-mode context
> > (CPL > 0). If the fault was caused by any of the instructions protected
> > by UMIP, fixup_umip_exceptino will emulate dummy results for these
> > instructions.
> >
> > Cc: Andy Lutomirski <[email protected]>
> > Cc: Andrew Morton <[email protected]>
> > Cc: H. Peter Anvin <[email protected]>
> > Cc: Borislav Petkov <[email protected]>
> > Cc: Brian Gerst <[email protected]>
> > Cc: Chen Yucong <[email protected]>
> > Cc: Chris Metcalf <[email protected]>
> > Cc: Dave Hansen <[email protected]>
> > Cc: Fenghua Yu <[email protected]>
> > Cc: Huang Rui <[email protected]>
> > Cc: Jiri Slaby <[email protected]>
> > Cc: Jonathan Corbet <[email protected]>
> > Cc: Michael S. Tsirkin <[email protected]>
> > Cc: Paul Gortmaker <[email protected]>
> > Cc: Peter Zijlstra <[email protected]>
> > Cc: Ravi V. Shankar <[email protected]>
> > Cc: Shuah Khan <[email protected]>
> > Cc: Vlastimil Babka <[email protected]>
> > Cc: Tony Luck <[email protected]>
> > Cc: Paolo Bonzini <[email protected]>
> > Cc: Liang Z. Li <[email protected]>
> > Cc: Alexandre Julliard <[email protected]>
> > Cc: Stas Sergeev <[email protected]>
> > Cc: [email protected]
> > Cc: [email protected]
> > Signed-off-by: Ricardo Neri <[email protected]>
> > ---
> > arch/x86/kernel/traps.c | 4 ++++
> > 1 file changed, 4 insertions(+)
> >
> > diff --git a/arch/x86/kernel/traps.c b/arch/x86/kernel/traps.c
> > index bf0c6d0..5044fb3 100644
> > --- a/arch/x86/kernel/traps.c
> > +++ b/arch/x86/kernel/traps.c
> > @@ -64,6 +64,7 @@
> > #include <asm/trace/mpx.h>
> > #include <asm/mpx.h>
> > #include <asm/vm86.h>
> > +#include <asm/umip.h>
> >
> > #ifdef CONFIG_X86_64
> > #include <asm/x86_init.h>
> > @@ -491,6 +492,9 @@ do_general_protection(struct pt_regs *regs, long error_code)
> > RCU_LOCKDEP_WARN(!rcu_is_watching(), "entry code didn't wake RCU");
> > cond_local_irq_enable(regs);
> >
> > + if (user_mode(regs) && !fixup_umip_exception(regs))
> > + return;
> > +
>
> I would do fixup_umip_exception(regs) == 0 to make it more obvious
> what's going on.
Sure. I will make this change.
>
> Also, since you're allowing this in v8086 mode, I think this should
> have an explicit test in
> tools/testing/selftests/x86/entry_from_vm86.c. I *think* it will work
> fine, but the pt_regs handling in vm86 mode is quite scary and has
> been rewritten recently.
I will include a test for this.
Thanks and BR,
Ricardo
On Sun, 2016-12-25 at 15:17 +0900, Masami Hiramatsu wrote:
> Hi Ricado,
>
> On Fri, 23 Dec 2016 17:37:41 -0800
> Ricardo Neri <[email protected]> wrote:
>
> > Other kernel submodules can benefit from using the utility functions
> > defined in mpx.c to obtain the addresses and values of operands contained
> > in the general purpose registers. An instance of this is the emulation code
> > used for instructions protected by the Intel User-Mode Instruction
> > Prevention feature.
> >
> > Thus, these functions are relocated to a new insn-utils.c file. The reason
> > to not relocate these utilities for insn.c is that the latter solely
> > analyses instructions given by a struct insn. The file insn-utils.c intends
> > to be used when, for instance, determining addresses from the contents
> > of the general purpose registers.
> >
> > To avoid creating a new insn-utils.h, insn.h is used. One caveat, however,
> > is that several #include's were needed by the utility functions.
> >
> > Functions are simply relocated. There are not functional or indentation
> > changes.
>
> Thank you for your great work! :)
Thanks a lot for taking the time to review!
>
> > ---
> > arch/x86/include/asm/insn.h | 6 ++
> > arch/x86/lib/Makefile | 2 +-
> > arch/x86/lib/insn-utils.c | 148 ++++++++++++++++++++++++++++++++++++++++++++
> > arch/x86/mm/mpx.c | 136 +---------------------------------------
> > 4 files changed, 156 insertions(+), 136 deletions(-)
> > create mode 100644 arch/x86/lib/insn-utils.c
> >
> > diff --git a/arch/x86/include/asm/insn.h b/arch/x86/include/asm/insn.h
> > index b3e32b0..9dc9d42 100644
> > --- a/arch/x86/include/asm/insn.h
> > +++ b/arch/x86/include/asm/insn.h
> > @@ -22,6 +22,10 @@
> >
> > /* insn_attr_t is defined in inat.h */
> > #include <asm/inat.h>
> > +#include <linux/compiler.h>
> > +#include <linux/bug.h>
> > +#include <linux/err.h>
> > +#include <asm/ptrace.h>
> >
> > struct insn_field {
> > union {
> > @@ -106,6 +110,8 @@ extern void insn_get_sib(struct insn *insn);
> > extern void insn_get_displacement(struct insn *insn);
> > extern void insn_get_immediate(struct insn *insn);
> > extern void insn_get_length(struct insn *insn);
> > +extern void __user *insn_get_addr_ref(struct insn *insn, struct pt_regs *regs);
> > +extern int get_reg_offset_rm(struct insn *insn, struct pt_regs *regs);
>
> Could you also rename this to add "insn_" prefix?
Sure. This should have the prefix. I missed this.
>
> Other part looks good to me :)
> (btw, I saw a kbuild bot warning, would you also test it with
> CONFIG_X86_DECODER_SELFTEST=y?)
I will do.
Thanks and BR,
Ricardo
>
> Thanks again!
>
> >
> > /* Attribute will be determined after getting ModRM (for opcode groups) */
> > static inline void insn_get_attribute(struct insn *insn)
> > diff --git a/arch/x86/lib/Makefile b/arch/x86/lib/Makefile
> > index 34a7413..0d01d82 100644
> > --- a/arch/x86/lib/Makefile
> > +++ b/arch/x86/lib/Makefile
> > @@ -23,7 +23,7 @@ lib-y := delay.o misc.o cmdline.o cpu.o
> > lib-y += usercopy_$(BITS).o usercopy.o getuser.o putuser.o
> > lib-y += memcpy_$(BITS).o
> > lib-$(CONFIG_RWSEM_XCHGADD_ALGORITHM) += rwsem.o
> > -lib-$(CONFIG_INSTRUCTION_DECODER) += insn.o inat.o
> > +lib-$(CONFIG_INSTRUCTION_DECODER) += insn.o inat.o insn-utils.o
> > lib-$(CONFIG_RANDOMIZE_BASE) += kaslr.o
> >
> > obj-y += msr.o msr-reg.o msr-reg-export.o hweight.o
> > diff --git a/arch/x86/lib/insn-utils.c b/arch/x86/lib/insn-utils.c
> > new file mode 100644
> > index 0000000..598bbd6
> > --- /dev/null
> > +++ b/arch/x86/lib/insn-utils.c
> > @@ -0,0 +1,148 @@
> > +/*
> > + * Utility functions for x86 operand and address decoding
> > + *
> > + * Copyright (C) Intel Corporation 2016
> > + */
> > +#include <linux/kernel.h>
> > +#include <linux/string.h>
> > +#include <asm/inat.h>
> > +#include <asm/insn.h>
> > +
> > +enum reg_type {
> > + REG_TYPE_RM = 0,
> > + REG_TYPE_INDEX,
> > + REG_TYPE_BASE,
> > +};
> > +
> > +static int get_reg_offset(struct insn *insn, struct pt_regs *regs,
> > + enum reg_type type)
> > +{
> > + int regno = 0;
> > +
> > + static const int regoff[] = {
> > + offsetof(struct pt_regs, ax),
> > + offsetof(struct pt_regs, cx),
> > + offsetof(struct pt_regs, dx),
> > + offsetof(struct pt_regs, bx),
> > + offsetof(struct pt_regs, sp),
> > + offsetof(struct pt_regs, bp),
> > + offsetof(struct pt_regs, si),
> > + offsetof(struct pt_regs, di),
> > +#ifdef CONFIG_X86_64
> > + offsetof(struct pt_regs, r8),
> > + offsetof(struct pt_regs, r9),
> > + offsetof(struct pt_regs, r10),
> > + offsetof(struct pt_regs, r11),
> > + offsetof(struct pt_regs, r12),
> > + offsetof(struct pt_regs, r13),
> > + offsetof(struct pt_regs, r14),
> > + offsetof(struct pt_regs, r15),
> > +#endif
> > + };
> > + int nr_registers = ARRAY_SIZE(regoff);
> > + /*
> > + * Don't possibly decode a 32-bit instructions as
> > + * reading a 64-bit-only register.
> > + */
> > + if (IS_ENABLED(CONFIG_X86_64) && !insn->x86_64)
> > + nr_registers -= 8;
> > +
> > + switch (type) {
> > + case REG_TYPE_RM:
> > + regno = X86_MODRM_RM(insn->modrm.value);
> > + if (X86_REX_B(insn->rex_prefix.value))
> > + regno += 8;
> > + break;
> > +
> > + case REG_TYPE_INDEX:
> > + regno = X86_SIB_INDEX(insn->sib.value);
> > + if (X86_REX_X(insn->rex_prefix.value))
> > + regno += 8;
> > + /*
> > + * If mod !=3, SP is not used as index. Check is done after
> > + * looking at REX.X This is because R12 can be used as an
> > + * index.
> > + */
> > + if (regno == 4 && X86_MODRM_RM(insn->modrm.value) != 3)
> > + return 0;
> > + break;
> > +
> > + case REG_TYPE_BASE:
> > + regno = X86_SIB_BASE(insn->sib.value);
> > + if (regno == 5 && X86_MODRM_RM(insn->modrm.value) == 0) {
> > + WARN_ONCE(1, "An explicit displacement is required when %sBP used as SIB base.",
> > + (IS_ENABLED(CONFIG_X86_64) && insn->x86_64) ?
> > + "R13 or R" : "E");
> > + return -EINVAL;
> > + }
> > +
> > + if (X86_REX_B(insn->rex_prefix.value))
> > + regno += 8;
> > + break;
> > +
> > + default:
> > + pr_err("invalid register type");
> > + BUG();
> > + break;
> > + }
> > +
> > + if (regno >= nr_registers) {
> > + WARN_ONCE(1, "decoded an instruction with an invalid register");
> > + return -EINVAL;
> > + }
> > + return regoff[regno];
> > +}
> > +
> > +int get_reg_offset_rm(struct insn *insn, struct pt_regs *regs)
> > +{
> > + return get_reg_offset(insn, regs, REG_TYPE_RM);
> > +}
> > +
> > +/*
> > + * return the address being referenced be instruction
> > + * for rm=3 returning the content of the rm reg
> > + * for rm!=3 calculates the address using SIB and Disp
> > + */
> > +void __user *insn_get_addr_ref(struct insn *insn, struct pt_regs *regs)
> > +{
> > + unsigned long addr, base, indx;
> > + int addr_offset, base_offset, indx_offset;
> > + insn_byte_t sib;
> > +
> > + insn_get_modrm(insn);
> > + insn_get_sib(insn);
> > + sib = insn->sib.value;
> > +
> > + if (X86_MODRM_MOD(insn->modrm.value) == 3) {
> > + addr_offset = get_reg_offset(insn, regs, REG_TYPE_RM);
> > + if (addr_offset < 0)
> > + goto out_err;
> > + addr = regs_get_register(regs, addr_offset);
> > + } else {
> > + if (insn->sib.nbytes) {
> > + base_offset = get_reg_offset(insn, regs, REG_TYPE_BASE);
> > + if (base_offset < 0)
> > + goto out_err;
> > +
> > + indx_offset = get_reg_offset(insn, regs, REG_TYPE_INDEX);
> > + if (indx_offset < 0)
> > + goto out_err;
> > +
> > + base = regs_get_register(regs, base_offset);
> > + if (indx_offset)
> > + indx = regs_get_register(regs, indx_offset);
> > + else
> > + indx = 0;
> > + addr = base + indx * (1 << X86_SIB_SCALE(sib));
> > + } else {
> > + addr_offset = get_reg_offset(insn, regs, REG_TYPE_RM);
> > + if (addr_offset < 0)
> > + goto out_err;
> > + addr = regs_get_register(regs, addr_offset);
> > + }
> > + addr += insn->displacement.value;
> > + }
> > + return (void __user *)addr;
> > +out_err:
> > + return (void __user *)-1;
> > +}
> > diff --git a/arch/x86/mm/mpx.c b/arch/x86/mm/mpx.c
> > index 71681d0..32ba342 100644
> > --- a/arch/x86/mm/mpx.c
> > +++ b/arch/x86/mm/mpx.c
> > @@ -59,140 +59,6 @@ static unsigned long mpx_mmap(unsigned long len)
> > return addr;
> > }
> >
> > -enum reg_type {
> > - REG_TYPE_RM = 0,
> > - REG_TYPE_INDEX,
> > - REG_TYPE_BASE,
> > -};
> > -
> > -static int get_reg_offset(struct insn *insn, struct pt_regs *regs,
> > - enum reg_type type)
> > -{
> > - int regno = 0;
> > -
> > - static const int regoff[] = {
> > - offsetof(struct pt_regs, ax),
> > - offsetof(struct pt_regs, cx),
> > - offsetof(struct pt_regs, dx),
> > - offsetof(struct pt_regs, bx),
> > - offsetof(struct pt_regs, sp),
> > - offsetof(struct pt_regs, bp),
> > - offsetof(struct pt_regs, si),
> > - offsetof(struct pt_regs, di),
> > -#ifdef CONFIG_X86_64
> > - offsetof(struct pt_regs, r8),
> > - offsetof(struct pt_regs, r9),
> > - offsetof(struct pt_regs, r10),
> > - offsetof(struct pt_regs, r11),
> > - offsetof(struct pt_regs, r12),
> > - offsetof(struct pt_regs, r13),
> > - offsetof(struct pt_regs, r14),
> > - offsetof(struct pt_regs, r15),
> > -#endif
> > - };
> > - int nr_registers = ARRAY_SIZE(regoff);
> > - /*
> > - * Don't possibly decode a 32-bit instructions as
> > - * reading a 64-bit-only register.
> > - */
> > - if (IS_ENABLED(CONFIG_X86_64) && !insn->x86_64)
> > - nr_registers -= 8;
> > -
> > - switch (type) {
> > - case REG_TYPE_RM:
> > - regno = X86_MODRM_RM(insn->modrm.value);
> > - if (X86_REX_B(insn->rex_prefix.value))
> > - regno += 8;
> > - break;
> > -
> > - case REG_TYPE_INDEX:
> > - regno = X86_SIB_INDEX(insn->sib.value);
> > - if (X86_REX_X(insn->rex_prefix.value))
> > - regno += 8;
> > - /*
> > - * If mod !=3, SP is not used as index. Check is done after
> > - * looking at REX.X This is because R12 can be used as an
> > - * index.
> > - */
> > - if (regno == 4 && X86_MODRM_RM(insn->modrm.value) != 3)
> > - return 0;
> > - break;
> > -
> > - case REG_TYPE_BASE:
> > - regno = X86_SIB_BASE(insn->sib.value);
> > - if (regno == 5 && X86_MODRM_RM(insn->modrm.value) == 0) {
> > - WARN_ONCE(1, "An explicit displacement is required when %sBP used as SIB base.",
> > - (IS_ENABLED(CONFIG_X86_64) && insn->x86_64) ?
> > - "R13 or R" : "E");
> > - return -EINVAL;
> > - }
> > -
> > - if (X86_REX_B(insn->rex_prefix.value))
> > - regno += 8;
> > - break;
> > -
> > - default:
> > - pr_err("invalid register type");
> > - BUG();
> > - break;
> > - }
> > -
> > - if (regno >= nr_registers) {
> > - WARN_ONCE(1, "decoded an instruction with an invalid register");
> > - return -EINVAL;
> > - }
> > - return regoff[regno];
> > -}
> > -
> > -/*
> > - * return the address being referenced be instruction
> > - * for rm=3 returning the content of the rm reg
> > - * for rm!=3 calculates the address using SIB and Disp
> > - */
> > -static void __user *mpx_get_addr_ref(struct insn *insn, struct pt_regs *regs)
> > -{
> > - unsigned long addr, base, indx;
> > - int addr_offset, base_offset, indx_offset;
> > - insn_byte_t sib;
> > -
> > - insn_get_modrm(insn);
> > - insn_get_sib(insn);
> > - sib = insn->sib.value;
> > -
> > - if (X86_MODRM_MOD(insn->modrm.value) == 3) {
> > - addr_offset = get_reg_offset(insn, regs, REG_TYPE_RM);
> > - if (addr_offset < 0)
> > - goto out_err;
> > - addr = regs_get_register(regs, addr_offset);
> > - } else {
> > - if (insn->sib.nbytes) {
> > - base_offset = get_reg_offset(insn, regs, REG_TYPE_BASE);
> > - if (base_offset < 0)
> > - goto out_err;
> > -
> > - indx_offset = get_reg_offset(insn, regs, REG_TYPE_INDEX);
> > - if (indx_offset < 0)
> > - goto out_err;
> > -
> > - base = regs_get_register(regs, base_offset);
> > - if (indx_offset)
> > - indx = regs_get_register(regs, indx_offset);
> > - else
> > - indx = 0;
> > - addr = base + indx * (1 << X86_SIB_SCALE(sib));
> > - } else {
> > - addr_offset = get_reg_offset(insn, regs, REG_TYPE_RM);
> > - if (addr_offset < 0)
> > - goto out_err;
> > - addr = regs_get_register(regs, addr_offset);
> > - }
> > - addr += insn->displacement.value;
> > - }
> > - return (void __user *)addr;
> > -out_err:
> > - return (void __user *)-1;
> > -}
> > -
> > static int mpx_insn_decode(struct insn *insn,
> > struct pt_regs *regs)
> > {
> > @@ -305,7 +171,7 @@ siginfo_t *mpx_generate_siginfo(struct pt_regs *regs)
> > info->si_signo = SIGSEGV;
> > info->si_errno = 0;
> > info->si_code = SEGV_BNDERR;
> > - info->si_addr = mpx_get_addr_ref(&insn, regs);
> > + info->si_addr = insn_get_addr_ref(&insn, regs);
> > /*
> > * We were not able to extract an address from the instruction,
> > * probably because there was something invalid in it.
> > --
> > 2.9.3
> >
>
>
On Fri, 2016-12-23 at 18:11 -0800, Andy Lutomirski wrote:
> On Fri, Dec 23, 2016 at 5:37 PM, Ricardo Neri
> <[email protected]> wrote:
> > The feature User-Mode Instruction Prevention present in recent Intel
> > processor prevents a group of instructions from being executed with
> > CPL > 0. Otherwise, a general protection fault is issued.
> >
> > Rather than relaying this fault to the user space (in the form of a SIGSEGV
> > signal), the instructions protected by UMIP can be emulated to provide
> > dummy results. This allows to conserve the current kernel behavior and not
> > reveal the system resources that UMIP intends to protect (the global
> > descriptor and interrupt descriptor tables, the segment selectors of the
> > local descriptor table and the task state and the machine status word).
> >
> > This emulation is needed because certain applications (e.g., WineHQ) rely
> > on this subset of instructions to function.
> >
> > The instructions protected by UMIP can be split in two groups. Those who
> > return a kernel memory address (sgdt and sidt) and those who return a
> > value (sldt, str and smsw).
> >
> > For the instructions that return a kernel memory address, the result is
> > emulated as the location of a dummy variable in the kernel memory space.
> > This is needed as applications such as WineHQ rely on the result being
> > located in the kernel memory space function. The limit for the GDT and the
> > IDT are set to zero.
>
> Nak. This is a trivial KASLR bypass. Just give them hardcoded
> values. For x86_64, I would suggest 0xfffffffffffe0000 and
> 0xffffffffffff0000.
I see. I assume you are suggesting these values for x86_64 because they
lie in an unused hole. That makes sense to me.
For the case of x86_32, I have trouble finding a suitable place as there
are not many available holes. It could be put before VMALLOC_START or
after VMALLOC_END but this would reveal the position of the vmalloc
area. Although, to my knowledge, randomized memory is not available for
x86_32. Without randomization, does it hurt to make sidt/sgdt return the
address of a kernel static variable?
>
> >
> > The instructions sldt and str return a segment selector relative to the
> > base address of the global descriptor table. Since the actual address of
> > such table is not revealed, it makes sense to emulate the result as zero.
>
> Hmm, now I wonder if anything uses SLDT to see if there is an LDT. If
> so, we could emulate it better, but I doubt this matters.
So you are saying that the emulated sldt should return a different value
based on the presence/absence of a LDT? This could reveal this very
fact.
>
> >
> > The instruction smsw is emulated to return zero.
>
> If you're going to emulate it, please return something plausible. The
> protected mode bit should be on, for example. 0x33 is probably
> reasonable.
Sure. Will do.
>
> > +static int __emulate_umip_insn(struct insn *insn, enum umip_insn umip_inst,
> > + unsigned char *data, int *data_size)
> > +{
> > + unsigned long const *dummy_base_addr;
> > + unsigned short dummy_limit = 0;
> > + unsigned short dummy_value = 0;
> > +
> > + switch (umip_inst) {
> > + /*
> > + * These two instructions return the base address and limit of the
> > + * global and interrupt descriptor table. The base address can be
> > + * 32-bit or 64-bit. Limit is always 16-bit.
> > + */
> > + case UMIP_SGDT:
> > + case UMIP_SIDT:
> > + if (umip_inst == UMIP_SGDT)
> > + dummy_base_addr = &umip_dummy_gdt_base;
> > + else
> > + dummy_base_addr = &umip_dummy_idt_base;
> > + if (X86_MODRM_MOD(insn->modrm.value) == 3) {
> > + WARN_ONCE(1, "SGDT cannot take register as argument!\n");
>
> No warnings please.
I'll. Remove it.
>
> > +int fixup_umip_exception(struct pt_regs *regs)
> > +{
> > + struct insn insn;
> > + unsigned char buf[MAX_INSN_SIZE];
> > + /* 10 bytes is the maximum size of the result of UMIP instructions */
> > + unsigned char dummy_data[10] = {0, 0, 0, 0, 0, 0, 0, 0, 0, 0};
> > + int x86_64 = !test_thread_flag(TIF_IA32);
>
> user_64bit_mode(regs)
I'll make this change.
>
> > + int not_copied, nr_copied, reg_offset, dummy_data_size;
> > + void __user *uaddr;
> > + unsigned long *reg_addr;
> > + enum umip_insn umip_inst;
> > +
> > + not_copied = copy_from_user(buf, (void __user *)regs->ip, sizeof(buf));
>
> This is slightly wrong due to PKRU. I doubt we care.
I see. If I am not mistaken, if the memory is protected by a protection
key this would cause a page fault. I'll make a note of it.
>
> > + nr_copied = sizeof(buf) - not_copied;
> > + /*
> > + * The decoder _should_ fail nicely if we pass it a short buffer.
> > + * But, let's not depend on that implementation detail. If we
> > + * did not get anything, just error out now.
> > + */
> > + if (!nr_copied)
> > + return -EFAULT;
>
> If the caller cares about EINVAL vs EFAULT, it cares because it is
> considering changing the signal to a fake page fault. If so, then
> this should be EINVAL -- failure to read the text should just prevent
> emulation.
I see. The caller in this case do_general_protection, which will issue a
SIGSEGV to the user space anyways. I don't think it cares about the
EINVAL vs EFAULT. It does care about whether the emulation was
successful.
>
> > + insn_init(&insn, buf, nr_copied, x86_64);
> > + insn_get_length(&insn);
> > + if (nr_copied < insn.length)
> > + return -EFAULT;
>
> Ditto.
I will change to EINVAL.
>
> > +
> > + umip_inst = __identify_insn(&insn);
> > + /* Check if we found an instruction protected by UMIP */
> > + if (umip_inst < 0)
> > + return -EINVAL;
> > +
> > + if (__emulate_umip_insn(&insn, umip_inst, dummy_data, &dummy_data_size))
> > + return -EINVAL;
> > +
> > + /* If operand is a register, write directly to it */
> > + if (X86_MODRM_MOD(insn.modrm.value) == 3) {
> > + reg_offset = get_reg_offset_rm(&insn, regs);
> > + reg_addr = (unsigned long *)((unsigned long)regs + reg_offset);
> > + memcpy(reg_addr, dummy_data, dummy_data_size);
> > + } else {
> > + uaddr = insn_get_addr_ref(&insn, regs);
> > + nr_copied = copy_to_user(uaddr, dummy_data, dummy_data_size);
> > + if (nr_copied > 0)
> > + return -EFAULT;
>
> This should be the only EFAULT case.
Should this be EFAULT event if the caller cares only about successful
(return 0) vs failed (return non-0) emulation?
Thanks for your thorough review! I really appreciate it.
Thanks and BR,
Ricardo
On Mon, 2016-12-26 at 00:49 +0900, Masami Hiramatsu wrote:
> On Fri, 23 Dec 2016 17:37:43 -0800
> Ricardo Neri <[email protected]> wrote:
>
> > +static int __identify_insn(struct insn *insn)
> > +{
> > + /* by getting modrm we also get the opcode */
> > + insn_get_modrm(insn);
> > + if (insn->opcode.bytes[0] != 0xf)
> > + return -EINVAL;
> > +
> > + if (insn->opcode.bytes[1] == 0x1) {
> > + switch (X86_MODRM_REG(insn->modrm.value)) {
> > + case 0:
> > + return UMIP_SGDT;
> > + case 1:
> > + return UMIP_SIDT;
> > + case 4:
> > + return UMIP_SMSW;
> > + default:
> > + return -EINVAL;
> > + }
> > + } else if (insn->opcode.bytes[1] == 0x0) {
> > + if (X86_MODRM_REG(insn->modrm.value) == 0)
> > + return UMIP_SLDT;
> > + else if (X86_MODRM_REG(insn->modrm.value) == 1)
> > + return UMIP_STR;
> > + else
> > + return -EINVAL;
> > + }
>
> gcc detected an error here, you may need return "-EINVAL".
I will make this change. I removed this EINVAL at the last minute as it
didn't look right. It was indeed right.
Thanks and BR,
Ricardo
>
> Thanks,
>
>
>
On Tue, Dec 27, 2016 at 4:39 PM, Ricardo Neri
<[email protected]> wrote:
> On Fri, 2016-12-23 at 18:11 -0800, Andy Lutomirski wrote:
>> On Fri, Dec 23, 2016 at 5:37 PM, Ricardo Neri
>> <[email protected]> wrote:
>> > The feature User-Mode Instruction Prevention present in recent Intel
>> > processor prevents a group of instructions from being executed with
>> > CPL > 0. Otherwise, a general protection fault is issued.
>> >
>> > Rather than relaying this fault to the user space (in the form of a SIGSEGV
>> > signal), the instructions protected by UMIP can be emulated to provide
>> > dummy results. This allows to conserve the current kernel behavior and not
>> > reveal the system resources that UMIP intends to protect (the global
>> > descriptor and interrupt descriptor tables, the segment selectors of the
>> > local descriptor table and the task state and the machine status word).
>> >
>> > This emulation is needed because certain applications (e.g., WineHQ) rely
>> > on this subset of instructions to function.
>> >
>> > The instructions protected by UMIP can be split in two groups. Those who
>> > return a kernel memory address (sgdt and sidt) and those who return a
>> > value (sldt, str and smsw).
>> >
>> > For the instructions that return a kernel memory address, the result is
>> > emulated as the location of a dummy variable in the kernel memory space.
>> > This is needed as applications such as WineHQ rely on the result being
>> > located in the kernel memory space function. The limit for the GDT and the
>> > IDT are set to zero.
>>
>> Nak. This is a trivial KASLR bypass. Just give them hardcoded
>> values. For x86_64, I would suggest 0xfffffffffffe0000 and
>> 0xffffffffffff0000.
>
> I see. I assume you are suggesting these values for x86_64 because they
> lie in an unused hole. That makes sense to me.
>
> For the case of x86_32, I have trouble finding a suitable place as there
> are not many available holes. It could be put before VMALLOC_START or
> after VMALLOC_END but this would reveal the position of the vmalloc
> area. Although, to my knowledge, randomized memory is not available for
> x86_32. Without randomization, does it hurt to make sidt/sgdt return the
> address of a kernel static variable?
I would just use the same addresses, truncated. There's no reason
that the address needs to be truly not present -- it just needs to be
inaccessible to user code. Anything near the top of the address space
should work.
>
>>
>> >
>> > The instructions sldt and str return a segment selector relative to the
>> > base address of the global descriptor table. Since the actual address of
>> > such table is not revealed, it makes sense to emulate the result as zero.
>>
>> Hmm, now I wonder if anything uses SLDT to see if there is an LDT. If
>> so, we could emulate it better, but I doubt this matters.
>
> So you are saying that the emulated sldt should return a different value
> based on the presence/absence of a LDT? This could reveal this very
> fact.
User code knows whether the LDT exists because an LDT only exists if
the program called modify_ldt(). But I doubt this matters in
practice.
>> > +static int __emulate_umip_insn(struct insn *insn, enum umip_insn umip_inst,
>> > + unsigned char *data, int *data_size)
>> > +{
>> > + unsigned long const *dummy_base_addr;
>> > + unsigned short dummy_limit = 0;
>> > + unsigned short dummy_value = 0;
>> > +
>> > + switch (umip_inst) {
>> > + /*
>> > + * These two instructions return the base address and limit of the
>> > + * global and interrupt descriptor table. The base address can be
>> > + * 32-bit or 64-bit. Limit is always 16-bit.
>> > + */
>> > + case UMIP_SGDT:
>> > + case UMIP_SIDT:
>> > + if (umip_inst == UMIP_SGDT)
>> > + dummy_base_addr = &umip_dummy_gdt_base;
>> > + else
>> > + dummy_base_addr = &umip_dummy_idt_base;
>> > + if (X86_MODRM_MOD(insn->modrm.value) == 3) {
>> > + WARN_ONCE(1, "SGDT cannot take register as argument!\n");
>>
>> No warnings please.
>
> I'll. Remove it.
Thanks. In general, WARN_ONCE, etc are supposed to indicate kernel
bugs, not user bugs.
>> > + int not_copied, nr_copied, reg_offset, dummy_data_size;
>> > + void __user *uaddr;
>> > + unsigned long *reg_addr;
>> > + enum umip_insn umip_inst;
>> > +
>> > + not_copied = copy_from_user(buf, (void __user *)regs->ip, sizeof(buf));
>>
>> This is slightly wrong due to PKRU. I doubt we care.
>
> I see. If I am not mistaken, if the memory is protected by a protection
> key this would cause a page fault. I'll make a note of it.
Exactly. This is correct behavior unless the key happens to be set up
so it can be executed but not read, in which case emulation will fail.
>>
>> > + nr_copied = sizeof(buf) - not_copied;
>> > + /*
>> > + * The decoder _should_ fail nicely if we pass it a short buffer.
>> > + * But, let's not depend on that implementation detail. If we
>> > + * did not get anything, just error out now.
>> > + */
>> > + if (!nr_copied)
>> > + return -EFAULT;
>>
>> If the caller cares about EINVAL vs EFAULT, it cares because it is
>> considering changing the signal to a fake page fault. If so, then
>> this should be EINVAL -- failure to read the text should just prevent
>> emulation.
>
> I see. The caller in this case do_general_protection, which will issue a
> SIGSEGV to the user space anyways. I don't think it cares about the
> EINVAL vs EFAULT. It does care about whether the emulation was
> successful.
Maybe just make it return bool then? But fixing up the return codes
would be fine, too. I just think that, if it returns int, the value
should be meaningful.
>> > + if (nr_copied > 0)
>> > + return -EFAULT;
>>
>> This should be the only EFAULT case.
> Should this be EFAULT event if the caller cares only about successful
> (return 0) vs failed (return non-0) emulation?
In theory this particular error would be a page fault not a general
protection fault (in the UMIP off case). If you were emulating it
extra carefully, you could change the signal accordingly. But, as I
said, I really doubt this matters.
On Tue, 2016-12-27 at 16:48 -0800, Andy Lutomirski wrote:
> On Tue, Dec 27, 2016 at 4:39 PM, Ricardo Neri
> <[email protected]> wrote:
> > On Fri, 2016-12-23 at 18:11 -0800, Andy Lutomirski wrote:
> >> On Fri, Dec 23, 2016 at 5:37 PM, Ricardo Neri
> >> <[email protected]> wrote:
> >> > The feature User-Mode Instruction Prevention present in recent Intel
> >> > processor prevents a group of instructions from being executed with
> >> > CPL > 0. Otherwise, a general protection fault is issued.
> >> >
> >> > Rather than relaying this fault to the user space (in the form of a SIGSEGV
> >> > signal), the instructions protected by UMIP can be emulated to provide
> >> > dummy results. This allows to conserve the current kernel behavior and not
> >> > reveal the system resources that UMIP intends to protect (the global
> >> > descriptor and interrupt descriptor tables, the segment selectors of the
> >> > local descriptor table and the task state and the machine status word).
> >> >
> >> > This emulation is needed because certain applications (e.g., WineHQ) rely
> >> > on this subset of instructions to function.
> >> >
> >> > The instructions protected by UMIP can be split in two groups. Those who
> >> > return a kernel memory address (sgdt and sidt) and those who return a
> >> > value (sldt, str and smsw).
> >> >
> >> > For the instructions that return a kernel memory address, the result is
> >> > emulated as the location of a dummy variable in the kernel memory space.
> >> > This is needed as applications such as WineHQ rely on the result being
> >> > located in the kernel memory space function. The limit for the GDT and the
> >> > IDT are set to zero.
> >>
> >> Nak. This is a trivial KASLR bypass. Just give them hardcoded
> >> values. For x86_64, I would suggest 0xfffffffffffe0000 and
> >> 0xffffffffffff0000.
> >
> > I see. I assume you are suggesting these values for x86_64 because they
> > lie in an unused hole. That makes sense to me.
> >
> > For the case of x86_32, I have trouble finding a suitable place as there
> > are not many available holes. It could be put before VMALLOC_START or
> > after VMALLOC_END but this would reveal the position of the vmalloc
> > area. Although, to my knowledge, randomized memory is not available for
> > x86_32. Without randomization, does it hurt to make sidt/sgdt return the
> > address of a kernel static variable?
>
> I would just use the same addresses, truncated. There's no reason
> that the address needs to be truly not present -- it just needs to be
> inaccessible to user code. Anything near the top of the address space
> should work.
Right. Then I will reuse the same addresses.
>
> >
> >>
> >> >
> >> > The instructions sldt and str return a segment selector relative to the
> >> > base address of the global descriptor table. Since the actual address of
> >> > such table is not revealed, it makes sense to emulate the result as zero.
> >>
> >> Hmm, now I wonder if anything uses SLDT to see if there is an LDT. If
> >> so, we could emulate it better, but I doubt this matters.
> >
> > So you are saying that the emulated sldt should return a different value
> > based on the presence/absence of a LDT? This could reveal this very
> > fact.
>
> User code knows whether the LDT exists because an LDT only exists if
> the program called modify_ldt(). But I doubt this matters in
> practice.
In such a case sldt would return a non-null segment selector. I will
keep giving the null segment selector in all cases and make a note in
the code.
>
> >> > +static int __emulate_umip_insn(struct insn *insn, enum umip_insn umip_inst,
> >> > + unsigned char *data, int *data_size)
> >> > +{
> >> > + unsigned long const *dummy_base_addr;
> >> > + unsigned short dummy_limit = 0;
> >> > + unsigned short dummy_value = 0;
> >> > +
> >> > + switch (umip_inst) {
> >> > + /*
> >> > + * These two instructions return the base address and limit of the
> >> > + * global and interrupt descriptor table. The base address can be
> >> > + * 32-bit or 64-bit. Limit is always 16-bit.
> >> > + */
> >> > + case UMIP_SGDT:
> >> > + case UMIP_SIDT:
> >> > + if (umip_inst == UMIP_SGDT)
> >> > + dummy_base_addr = &umip_dummy_gdt_base;
> >> > + else
> >> > + dummy_base_addr = &umip_dummy_idt_base;
> >> > + if (X86_MODRM_MOD(insn->modrm.value) == 3) {
> >> > + WARN_ONCE(1, "SGDT cannot take register as argument!\n");
> >>
> >> No warnings please.
> >
> > I'll. Remove it.
>
> Thanks. In general, WARN_ONCE, etc are supposed to indicate kernel
> bugs, not user bugs.
Agreed. Your statement makes it very clear. I didn't have it that clear
in my mind.
>
> >> > + int not_copied, nr_copied, reg_offset, dummy_data_size;
> >> > + void __user *uaddr;
> >> > + unsigned long *reg_addr;
> >> > + enum umip_insn umip_inst;
> >> > +
> >> > + not_copied = copy_from_user(buf, (void __user *)regs->ip, sizeof(buf));
> >>
> >> This is slightly wrong due to PKRU. I doubt we care.
> >
> > I see. If I am not mistaken, if the memory is protected by a protection
> > key this would cause a page fault. I'll make a note of it.
>
> Exactly. This is correct behavior unless the key happens to be set up
> so it can be executed but not read, in which case emulation will fail.
If we can't read we can't emulate anyways. I will add another note.
>
> >>
> >> > + nr_copied = sizeof(buf) - not_copied;
> >> > + /*
> >> > + * The decoder _should_ fail nicely if we pass it a short buffer.
> >> > + * But, let's not depend on that implementation detail. If we
> >> > + * did not get anything, just error out now.
> >> > + */
> >> > + if (!nr_copied)
> >> > + return -EFAULT;
> >>
> >> If the caller cares about EINVAL vs EFAULT, it cares because it is
> >> considering changing the signal to a fake page fault. If so, then
> >> this should be EINVAL -- failure to read the text should just prevent
> >> emulation.
> >
> > I see. The caller in this case do_general_protection, which will issue a
> > SIGSEGV to the user space anyways. I don't think it cares about the
> > EINVAL vs EFAULT. It does care about whether the emulation was
> > successful.
>
> Maybe just make it return bool then? But fixing up the return codes
> would be fine, too. I just think that, if it returns int, the value
> should be meaningful.
Right. I have a small query/proposal related to this below.
>
> >> > + if (nr_copied > 0)
> >> > + return -EFAULT;
> >>
> >> This should be the only EFAULT case.
> > Should this be EFAULT event if the caller cares only about successful
> > (return 0) vs failed (return non-0) emulation?
>
> In theory this particular error would be a page fault not a general
> protection fault (in the UMIP off case). If you were emulating it
> extra carefully, you could change the signal accordingly. But, as I
> said, I really doubt this matters.
If simple enough and for the sake of accuracy, I could try to issue the
page fault. It seems to me that this entitles calling
force_sig_info_fault in this particular case as opposed to the
force_sig_info(SIGSEGV, SEND_SIG_PRIV, tsk) that do_general_protection
calls.
Thanks and BR,
Ricardo
On Thu, Dec 29, 2016 at 9:23 PM, Ricardo Neri
<[email protected]> wrote:
> On Tue, 2016-12-27 at 16:48 -0800, Andy Lutomirski wrote:
>>
>> >> > + if (nr_copied > 0)
>> >> > + return -EFAULT;
>> >>
>> >> This should be the only EFAULT case.
>> > Should this be EFAULT event if the caller cares only about successful
>> > (return 0) vs failed (return non-0) emulation?
>>
>> In theory this particular error would be a page fault not a general
>> protection fault (in the UMIP off case). If you were emulating it
>> extra carefully, you could change the signal accordingly. But, as I
>> said, I really doubt this matters.
>
> If simple enough and for the sake of accuracy, I could try to issue the
> page fault. It seems to me that this entitles calling
> force_sig_info_fault in this particular case as opposed to the
> force_sig_info(SIGSEGV, SEND_SIG_PRIV, tsk) that do_general_protection
> calls.
Sure. You could even do it by sending the signal in the emulation
code and returning true.
--Andy