This is v8 of this series. The seven previous submissions can be found
here [1], here [2], here [3], here [4], here [5], here [6] and here [7]. This
version addresses the feedback comments from Borislav Petkov received on
v7. Please see details in the change log.
=== What is UMIP?
User-Mode Instruction Prevention (UMIP) is a security feature present in
new Intel processors. If enabled, it prevents the execution of certain
instructions when the Current Privilege Level (CPL) is greater than 0. If
these instructions were executed while in CPL > 0, user space applications
could gain access to system-wide settings such as the global, interrupt and
local descriptor tables, the segment selector of the current task state
segment and the machine status word. Hiding these system resources reduces
the tools available to craft privilege escalation attacks such as [8].
These are the instructions covered by UMIP:
* SGDT - Store Global Descriptor Table
* SIDT - Store Interrupt Descriptor Table
* SLDT - Store Local Descriptor Table
* SMSW - Store Machine Status Word
* STR - Store Task Register
If any of these instructions is executed with CPL > 0, a general protection
exception is issued when UMIP is enabled.
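For illustration only, here is a minimal user-space sketch (not part of
this series) of the kind of program UMIP restricts; with CR4.UMIP set,
both instructions below would trigger a #GP fault at CPL > 0 instead of
revealing the real descriptor table base and machine status word:

#include <stdio.h>
#include <stdint.h>

int main(void)
{
        /* Room for SGDT output: 2-byte limit plus up to an 8-byte base */
        struct __attribute__((packed)) {
                uint16_t limit;
                uint64_t base;
        } gdt = { 0, 0 };
        unsigned long msw = 0;

        /* Without UMIP, these execute at CPL 3 and expose kernel data */
        asm volatile("sgdt %0" : "=m" (gdt));
        asm volatile("smsw %0" : "=r" (msw));

        printf("GDT limit: 0x%x base: 0x%llx\n", gdt.limit,
               (unsigned long long)gdt.base);
        printf("MSW (low bits of CR0): 0x%lx\n", msw);
        return 0;
}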
=== How does it impact applications?
However, when enabled, UMIP will change the behavior that certain
applications expect from the operating system. For instance, programs
running on WineHQ and DOSEMU2 rely on some of these instructions to
function. Stas Sergeev found that Microsoft Windows 3.1 and dos4gw use the
instruction SMSW when running in virtual-8086 mode [9]. SGDT and SIDT can
also be used in virtual-8086 mode.
In order to not change the behavior of the system, this patchset emulates
SGDT, SIDT and SMSW. This should be sufficient to not break the
applications mentioned above. Regarding the two remaining instructions,
STR and SLDT, the WineHQ team has shown interest in catching the general
protection fault and using it as a vehicle to fix broken applications [10].
Furthermore, STR and SLDT can only run in protected and long modes.
DOSEMU2 emulates virtual-8086 mode via KVM. No applications will be broken
unless DOSEMU2 decides to enable the CR4.UMIP bit in platforms that support
it. Also, this should not pose a security risk as no system resources would
be revealed. Instead, code running inside the KVM guest would only see the
guest's GDT, IDT and MSW.
Please note that UMIP is always enabled for both 64-bit and 32-bit Linux
builds. However, emulation of the UMIP-protected instructions is not done
for 64-bit processes. 64-bit user space applications will receive the
SIGSEGV signal when a UMIP-protected instruction causes a general
protection fault.
=== How are UMIP-protected instructions emulated?
UMIP is kept enabled at all times when the CONFIG_X86_INTEL_UMIP option is
selected. If a general protection fault caused by one of the instructions
protected by UMIP is detected, such a fault will be trapped and fixed up.
Dummy values are returned as follows:
* SGDT and SIDT return hard-coded dummy values as the base of the global
descriptor and interrupt descriptor tables. These hard-coded values
correspond to memory addresses that are near the end of the kernel
memory map. This is also the case for virtual-8086 mode tasks. In all
my experiments with 32-bit processes, the base of GDT and IDT was always
a 4-byte address, even for 16-bit operands. Thus, my emulation code does
the same. In all cases, the limit of the table is set to 0.
* SMSW returns the value with which the CR0 register is programmed in
head_32/64.S at boot time. That is, the following bits are set:
CR0.0 for Protection Enable, CR0.1 for Monitor Coprocessor, CR0.4 for
Extension Type, which will always be 1 in recent processors with UMIP;
CR0.5 for Numeric Error, CR0.16 for Write Protect, CR0.18 for Alignment
Mask and CR0.31 for Paging. As per the Intel 64 and IA-32 Architectures
Software Developer's Manual, SMSW returns a 16-bit result for memory
operands. However, when the operand is a register, the result can be up
to CR0[63:0]. Since the emulation code only kicks in for 32-bit
processes, we return up to CR0[31:0]. A sketch of these dummy values
appears after this list.
* The proposed emulation code handles faults that happen in both
protected and virtual-8086 modes.
* Again, STR and SLDT are not emulated.
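For reference, a minimal sketch of the dummy data described above, derived
only from this description; the macro and struct names, and the concrete
base address, are illustrative rather than taken from the patches:

/* CR0 bits listed above (see X86_CR0_* in processor-flags.h) */
#define CR0_PE  (1UL << 0)      /* Protection Enable */
#define CR0_MP  (1UL << 1)      /* Monitor Coprocessor */
#define CR0_ET  (1UL << 4)      /* Extension Type */
#define CR0_NE  (1UL << 5)      /* Numeric Error */
#define CR0_WP  (1UL << 16)     /* Write Protect */
#define CR0_AM  (1UL << 18)     /* Alignment Mask */
#define CR0_PG  (1UL << 31)     /* Paging */

/* Dummy machine status word returned by the emulated SMSW */
#define UMIP_DUMMY_MSW  (CR0_PE | CR0_MP | CR0_ET | CR0_NE | \
                         CR0_WP | CR0_AM | CR0_PG)

/*
 * Dummy table "register" copied to user space by the emulated SGDT and
 * SIDT: a limit of 0 and a hard-coded 4-byte base near the end of the
 * kernel memory map (the address below is only an example).
 */
struct umip_dummy_table {
        unsigned short limit;   /* always 0 */
        unsigned int   base;    /* e.g. 0xfffe0000 */
} __attribute__((packed));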
=== How is this series laid out?
++ Preparatory work
As per suggestions from Andy Lutomirski and Borislav Petkov, I moved
the x86 page fault error codes to a header. Also, I made user_64bit_mode
available to x86_32 builds. This helps to reuse code and reduce the number
of #ifdef's in these patches. Borislav also suggested that uprobes use
the existing definitions in arch/x86/include/asm/inat.h instead of hard-
coded values when checking instruction prefixes. I included this change
in the series.
++ Fix bugs in MPX address decoder
The code that Intel MPX (Memory Protection Extensions) uses to parse
opcodes and the memory locations contained in the general purpose
registers when used as operands is very useful for this work. I put this
code in a separate library file that MPX, UMIP and potentially others can
use, avoiding code duplication.
Before creating the new library, I fixed a couple of bugs that I found in
corner cases of how MPX determines the address contained in the
instruction and operands.
++ Provide a new x86 instruction evaluating library
With the bugs fixed, the MPX evaluating code is relocated into a new
insn-eval.c library. The basic functionality of this library is extended
to obtain the segment descriptor selected by either segment override
prefixes or the default segment of the registers involved in the
calculation of the effective address. It was also extended to obtain the
default address and operand sizes as well as the segment base address.
Support to process 16-bit address encodings was added as well. Armed with
this arsenal, it is now possible to determine the linear address onto
which the emulated results shall be copied. Furthermore, this new library
relies on and extends the capabilities of the existing instruction decoder
in arch/x86/lib/insn.c.
This code supports long mode with 32 and 64-bit addresses, protected mode
with 16 and 32-bit addresses and virtual-8086 mode with 16 and 32-bit
addresses. Both global and local descriptor tables are supported.
Segmentation is supported in protected mode; in long mode, segmentation is
supported only via the FS and GS registers.
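As a rough, illustrative sketch (the function names below are mine, not
the library's API), the core computation the library performs amounts to:

/*
 * Effective address as encoded by ModRM/SIB plus displacement:
 *   eff = base + index * 2^scale + displacement
 */
static long effective_address(long base, long index, int scale, long disp)
{
        return base + index * (1L << scale) + disp;
}

/*
 * Linear address: the base of the segment selected by override prefixes
 * or by the default rules for the registers involved, plus the effective
 * address. In protected mode the effective address must also be checked
 * against the segment limit before this step.
 */
static unsigned long linear_address(unsigned long seg_base, long eff_addr)
{
        return seg_base + (unsigned long)eff_addr;
}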
++ Emulate UMIP instructions
A new fixup_umip_exception() function inspects the instruction at the
instruction pointer. If it is a UMIP-protected instruction, it executes
the emulation code. This uses all the address-computing code of the
previous section.
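In rough terms, and only as a sketch of the intended flow (the actual hook
in the #GP handler may look different in the patches), this amounts to:

        /* Inside the general protection fault handler, before signaling */
        if (user_mode(regs) && fixup_umip_exception(regs)) {
                /*
                 * The faulting instruction was a UMIP-protected one and
                 * was successfully emulated; the saved instruction
                 * pointer is advanced past it, so just resume the task.
                 */
                return;
        }
        /* otherwise, fall through to the usual SIGSEGV/die() handling */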
++ Add self-tests
Lastly, self-tests are added to entry_from_vm86.c to exercise the most
typical use cases of UMIP-protected instructions in virtual-8086 mode.
++ Extensive tests
Extensive tests were performed to cover all the combinations of ModRM,
SIB and displacements for 16-bit and 32-bit encodings for the SS, DS,
ES, FS and GS segments. Tests also include a 64-bit program that uses
segmentation via FS and GS. For this purpose, I temporarily enabled UMIP
support for 64-bit processes. This change is not part of this patchset.
The intention is to test the computations of linear addresses in 64-bit
mode, including the extra R8-R15 registers. Extensive tests were also
implemented for virtual-8086 tasks. Code of these tests can be found here
[11] and here [12].
++ Merging this series?
Eight versions of this series have been submitted. Am I any closer to
seeing these patches merged? :)
[1]. https://lwn.net/Articles/705877/
[2]. https://lkml.org/lkml/2016/12/23/265
[3]. https://lkml.org/lkml/2017/1/25/622
[4]. https://lkml.org/lkml/2017/2/23/40
[5]. https://lkml.org/lkml/2017/3/3/678
[6]. https://lkml.org/lkml/2017/3/7/866
[7]. https://lkml.org/lkml/2017/5/5/398
[8]. http://timetobleed.com/a-closer-look-at-a-recent-privilege-escalation-bug-in-linux-cve-2013-2094/
[9]. https://www.winehq.org/pipermail/wine-devel/2017-April/117159.html
[10]. https://marc.info/?l=linux-kernel&m=147876798717927&w=2
[11]. https://github.com/01org/luv-yocto/tree/rneri/umip/meta-luv/recipes-core/umip/files
[12]. https://github.com/01org/luv-yocto/commit/a72a7fe7d68693c0f4100ad86de6ecabde57334f#diff-3860c136a63add269bce4ea50222c248R1
Thanks and BR,
Ricardo
Changes since V7:
*UMIP is not enabled by default.
*Relocated definition of the initial state of CR0 into processor-flags.h
*Updated uprobes to use the autogenerated INAT_PFX_xS definitions instead of
hard-coded values.
*In insn-eval.c, refer to segment override prefixes using the autogenerated
INAT_PFX_xS definitions.
*Removed enumeration for segment registers that reused the segment override
instruction prefixes. Instead, a new, separate, set of #defines is used in
arch/x86/include/asm/inat.h
*Simplified function to identify string instruction.
*Split the code used to determine the relevant segment register into two
functions: one to inspect segment overrides and a second one to determine
default segment registers based on the instruction and operands. A third
function reads the segment register to obtain the segment selector.
*Reworked arithmetic to compute 32-bit and 64-bit effective addresses. Instead
of type casts, two separate functions are used in each case.
*Removed structure to hold segment default address and operand sizes. Used
#defines instead.
*Corrected bug when determining the limit of a segment.
*Updated various functions to use error codes from errno-base.h
*Replaced printk_ratelimited with pr_err_ratelimited.
*Corrected typos and format errors in functions' documentation.
*Fixed unimplemented handling of emulation of the SMSW instruction.
*Added documentation to file containing implementation for UMIP.
*Improved error handling in fixup_umip_exception() function.
Changes since V6:
*Reworded and added more details on the special cases of ModRM and SIB
bytes. To avoid confusion, I omitted mentioning the involved registers
(EBP and ESP).
*Replaced BUG() with printk_ratelimited in function get_reg_offset of
insn-eval.c
*Removed unused utility functions that obtain a register value from pt_regs
given a SIB base and index.
*Clarified nomenclature to call CS, DS, ES, FS, GS and SS segment registers
and their values segment selectors.
*Reworked function resolve_seg_register to issue an error when more than
one segment override prefix is used in the instruction.
*Added logic in resolve_seg_register to ignore segment register when in
long mode and not using FS or GS.
*Added logic to ensure the effective address is within the limits of the
segment in protected mode.
*Added logic to ensure segment override prefixes are ignored when resolving
the segment of EIP and EDI with string instructions.
*Added code to make user_64bit_mode() available in CONFIG_X86_32... and
make it return false, of course.
*Merged the two functions that obtain the default address and operand size
of a code segment into one as they are always used together.
*Corrected logic of displacement-only addressing in long mode to make the
displacement relative to the RIP of the next instruction.
*Reworked logic to sign-extend 32-bit memory offsets into 64-bit signed
memory offsets. This includes more checks and putting everything together
in a utility function.
*Removed the 'unlikely' of conditional statements as we are not in a
critical path.
*In virtual-8086 mode, ensure that effective addresses are always less
than 0x10000, even when address override prefixes are used. Also, ensure
that linear addresses have a size of 20-bits.
Changes since V5:
* Relocate the page fault error code enumerations to traps.h
Changes since V4:
* Audited patches to use braces in all the branches of conditional
statements, except those in which the conditional action only takes one
line.
* Implemented support in 64-bit builds for both 32-bit and 64-bit tasks in
the instruction evaluating library.
* Split segment selector function in the instruction evaluating library
into two functions to resolve the segment type by instruction override
or default and a separate function to actually read the segment selector.
* Fixed a bug when evaluating 32-bit effective addresses with 64-bit
kernels.
* Split patches further for easier review.
* Use signed variables for computation of effective address.
* Fixed issue with a spurious static modifier in function insn_get_addr_ref
found by kbuild test bot.
* Removed comparison between true and fixup_umip_exception.
* Reworked check logic when identifying erroneous vs invalid values of the
SiB base and index.
Changes since V3:
* Limited emulation to 32-bit and 16-bit modes. For 64-bit mode, a general
protection fault is still issued when UMIP-protected instructions are
executed with CPL > 0.
* Expanded instruction-evaluating code to obtain segment descriptor along
with their attributes such as base address and default address and
operand sizes. Also, support for 16-bit encodings in protected mode was
implemented.
* When getting a segment descriptor, support is included to obtain those
from a local descriptor table.
* Now the instruction-evaluating code returns -EDOM when the value of
registers should not be used in calculating the effective address. The
value -EINVAL is left for errors.
* Incorporate the value of the segment base address in the computation of
linear addresses.
* Renamed new instruction evaluation library from insn-kernel.c to
insn-eval.c
* Exported functions insn_get_reg_offset_* to obtain the register offset
by ModRM r/m, SiB base and SiB index.
* Improved documentation of functions.
* Split patches further for easier review.
Changes since V2:
* Added new utility functions to decode the memory addresses contained in
registers when the 16-bit addressing encodings are used. This includes
code to obtain and compute memory addresses using segment selectors for
real-mode address translation.
* Added support to emulate UMIP-protected instructions for virtual-8086
tasks.
* Added self-tests for virtual-8086 mode that contains representative
use cases: address represented as a displacement, address in registers
and registers as operands.
* Instead of maintaining a static variable for the dummy base addresses
of the IDT and GDT, a hard-coded value is used.
* The emulated SMSW instructions now return the value with which the CR0
register is programmed in head_32/64.S. This is: PE | MP | ET | NE | WP
| AM. For x86_64, PG is also enabled.
* The new file arch/x86/lib/insn-utils.c is now renamed as arch/x86/lib/
insn-kernel.c. It also has its own header. This helps keep in sync the
kernel and objtool instruction decoders. Also, the new insn-kernel.c
contains utility functions that are only relevant in a kernel context.
* Removed printed warnings for errors that occur when decoding instructions
with invalid operands.
* Added more comments on fixes in the instruction-decoding MPX functions.
* Now user_64bit_mode(regs) is used instead of test_thread_flag(TIF_IA32)
to determine if the task is 32-bit or 64-bit.
* Found and fixed a bug in insn-decoder in which X86_MODRM_RM was
incorrectly used to obtain the mod part of the ModRM byte.
* Added more explanatory comments to the emulation and instruction decoding
code. This includes a comment noting that copy_from_user could fail if
there is a memory protection key in place.
* Tested code with CONFIG_X86_DECODER_SELFTEST=y and everything passes now.
* Prefixed get_reg_offset_rm with insn_ as this function is exposed
via a header file. For clarity, this function was added in a separate
patch.
Changes since V1:
* Virtual-8086 mode tasks are not treated in a special manner. All code
for this purpose was removed.
* Instead of attempting to disable UMIP during a context switch or when
entering virtual-8086 mode, UMIP remains enabled all the time. General
protection faults that occur are fixed-up by returning dummy values as
detailed above.
* Removed umip= kernel parameter in favor of using clearcpuid=514 to
disable UMIP.
* Removed selftests designed to detect the absence of SIGSEGV signals when
running in virtual-8086 mode.
* Reused code from MPX to decode instruction operands. For this purpose,
code was put in a common location.
* Fixed two bugs in MPX code that decodes operands.
Ricardo Neri (28):
x86/mm: Relocate page fault error codes to traps.h
x86/boot: Relocate definition of the initial state of CR0
ptrace,x86: Make user_64bit_mode() available to 32-bit builds
uprobes/x86: Use existing definitions for segment override prefixes
x86/mpx: Use signed variables to compute effective addresses
x86/mpx: Do not use SIB.index if its value is 100b and ModRM.mod is
not 11b
x86/mpx: Do not use SIB.base if its value is 101b and ModRM.mod = 0
x86/mpx, x86/insn: Relocate insn util functions to a new insn-eval
file
x86/insn-eval: Do not BUG on invalid register type
x86/insn-eval: Add a utility function to get register offsets
x86/insn-eval: Add utility function to identify string instructions
x86/insn-eval: Add utility functions to get segment selector
x86/insn-eval: Add utility function to get segment descriptor
x86/insn-eval: Add utility functions to get segment descriptor base
address and limit
x86/insn-eval: Add function to get default params of code segment
x86/insn-eval: Indicate a 32-bit displacement if ModRM.mod is 0 and
ModRM.rm is 101b
x86/insn-eval: Incorporate segment base in linear address computation
x86/insn-eval: Add support to resolve 32-bit address encodings
x86/insn-eval: Add wrapper function for 32 and 64-bit addresses
x86/insn-eval: Handle 32-bit address encodings in virtual-8086 mode
x86/insn-eval: Add support to resolve 16-bit addressing encodings
x86/cpufeature: Add User-Mode Instruction Prevention definitions
x86: Add emulation code for UMIP instructions
x86/umip: Force a page fault when unable to copy emulated result to
user
x86: Enable User-Mode Instruction Prevention
x86/traps: Fixup general protection faults caused by UMIP
selftests/x86: Add tests for User-Mode Instruction Prevention
selftests/x86: Add tests for instruction str and sldt
arch/x86/Kconfig | 10 +
arch/x86/include/asm/cpufeatures.h | 1 +
arch/x86/include/asm/disabled-features.h | 8 +-
arch/x86/include/asm/inat.h | 10 +
arch/x86/include/asm/insn-eval.h | 24 +
arch/x86/include/asm/ptrace.h | 6 +-
arch/x86/include/asm/traps.h | 18 +
arch/x86/include/asm/umip.h | 12 +
arch/x86/include/uapi/asm/processor-flags.h | 8 +
arch/x86/kernel/Makefile | 1 +
arch/x86/kernel/cpu/common.c | 25 +-
arch/x86/kernel/head_32.S | 3 -
arch/x86/kernel/head_64.S | 3 -
arch/x86/kernel/traps.c | 5 +
arch/x86/kernel/umip.c | 344 ++++++++
arch/x86/kernel/uprobes.c | 15 +-
arch/x86/lib/Makefile | 2 +-
arch/x86/lib/insn-eval.c | 1127 +++++++++++++++++++++++++
arch/x86/mm/fault.c | 88 +-
arch/x86/mm/mpx.c | 120 +--
tools/testing/selftests/x86/entry_from_vm86.c | 89 +-
21 files changed, 1730 insertions(+), 189 deletions(-)
create mode 100644 arch/x86/include/asm/insn-eval.h
create mode 100644 arch/x86/include/asm/umip.h
create mode 100644 arch/x86/kernel/umip.c
create mode 100644 arch/x86/lib/insn-eval.c
--
2.13.0
Rather than using hard-coded values of the segment override prefixes,
leverage the existing definitions provided in inat.h.
Suggested-by: Borislav Petkov <[email protected]>
Cc: Andy Lutomirski <[email protected]>
Cc: Andrew Morton <[email protected]>
Cc: Borislav Petkov <[email protected]>
Cc: Masami Hiramatsu <[email protected]>
Cc: Denys Vlasenko <[email protected]>
Cc: Srikar Dronamraju <[email protected]>
Cc: Ravi V. Shankar <[email protected]>
Signed-off-by: Ricardo Neri <[email protected]>
---
arch/x86/kernel/uprobes.c | 15 +++++++++------
1 file changed, 9 insertions(+), 6 deletions(-)
diff --git a/arch/x86/kernel/uprobes.c b/arch/x86/kernel/uprobes.c
index 495c776de4b4..a3755d293a48 100644
--- a/arch/x86/kernel/uprobes.c
+++ b/arch/x86/kernel/uprobes.c
@@ -271,12 +271,15 @@ static bool is_prefix_bad(struct insn *insn)
int i;
for (i = 0; i < insn->prefixes.nbytes; i++) {
- switch (insn->prefixes.bytes[i]) {
- case 0x26: /* INAT_PFX_ES */
- case 0x2E: /* INAT_PFX_CS */
- case 0x36: /* INAT_PFX_DS */
- case 0x3E: /* INAT_PFX_SS */
- case 0xF0: /* INAT_PFX_LOCK */
+ insn_attr_t attr;
+
+ attr = inat_get_opcode_attribute(insn->prefixes.bytes[i]);
+ switch (attr) {
+ case INAT_MAKE_PREFIX(INAT_PFX_ES):
+ case INAT_MAKE_PREFIX(INAT_PFX_CS):
+ case INAT_MAKE_PREFIX(INAT_PFX_DS):
+ case INAT_MAKE_PREFIX(INAT_PFX_SS):
+ case INAT_MAKE_PREFIX(INAT_PFX_LOCK):
return true;
}
}
--
2.13.0
Both head_32.S and head_64.S utilize the same value to initialize the
control register CR0. Also, other parts of the kernel might want access to
this initial definition (e.g., emulation code for User-Mode Instruction
Prevention uses this state to provide a sane dummy value for CR0 when
emulating the smsw instruction). Thus, relocate this definition to a
header file from which it can be conveniently accessed.
Cc: Andrew Morton <[email protected]>
Cc: Andy Lutomirski <[email protected]>
Cc: Borislav Petkov <[email protected]>
Cc: Brian Gerst <[email protected]>
Cc: Dave Hansen <[email protected]>
Cc: Denys Vlasenko <[email protected]>
Cc: H. Peter Anvin <[email protected]>
Cc: Josh Poimboeuf <[email protected]>
Cc: Linus Torvalds <[email protected]>
Cc: Peter Zijlstra <[email protected]>
Cc: Thomas Gleixner <[email protected]>
Cc: [email protected]
Cc: [email protected]
Suggested-by: Borislav Petkov <[email protected]>
Signed-off-by: Ricardo Neri <[email protected]>
---
arch/x86/include/uapi/asm/processor-flags.h | 6 ++++++
arch/x86/kernel/head_32.S | 3 ---
arch/x86/kernel/head_64.S | 3 ---
3 files changed, 6 insertions(+), 6 deletions(-)
diff --git a/arch/x86/include/uapi/asm/processor-flags.h b/arch/x86/include/uapi/asm/processor-flags.h
index 185f3d10c194..aae1f2aa7563 100644
--- a/arch/x86/include/uapi/asm/processor-flags.h
+++ b/arch/x86/include/uapi/asm/processor-flags.h
@@ -151,5 +151,11 @@
#define CX86_ARR_BASE 0xc4
#define CX86_RCR_BASE 0xdc
+/*
+ * Initial state of CR0 for head_32/64.S
+ */
+#define CR0_STATE (X86_CR0_PE | X86_CR0_MP | X86_CR0_ET | \
+ X86_CR0_NE | X86_CR0_WP | X86_CR0_AM | \
+ X86_CR0_PG)
#endif /* _UAPI_ASM_X86_PROCESSOR_FLAGS_H */
diff --git a/arch/x86/kernel/head_32.S b/arch/x86/kernel/head_32.S
index 0332664eb158..f64059835863 100644
--- a/arch/x86/kernel/head_32.S
+++ b/arch/x86/kernel/head_32.S
@@ -213,9 +213,6 @@ ENTRY(startup_32_smp)
#endif
.Ldefault_entry:
-#define CR0_STATE (X86_CR0_PE | X86_CR0_MP | X86_CR0_ET | \
- X86_CR0_NE | X86_CR0_WP | X86_CR0_AM | \
- X86_CR0_PG)
movl $(CR0_STATE & ~X86_CR0_PG),%eax
movl %eax,%cr0
diff --git a/arch/x86/kernel/head_64.S b/arch/x86/kernel/head_64.S
index 513cbb012ecc..5e1bfdd86b5b 100644
--- a/arch/x86/kernel/head_64.S
+++ b/arch/x86/kernel/head_64.S
@@ -149,9 +149,6 @@ ENTRY(secondary_startup_64)
1: wrmsr /* Make changes effective */
/* Setup cr0 */
-#define CR0_STATE (X86_CR0_PE | X86_CR0_MP | X86_CR0_ET | \
- X86_CR0_NE | X86_CR0_WP | X86_CR0_AM | \
- X86_CR0_PG)
movl $CR0_STATE, %eax
/* Make changes effective */
movq %rax, %cr0
--
2.13.0
Other kernel submodules can benefit from using the utility functions
defined in mpx.c to obtain the addresses and values of operands contained
in the general purpose registers. An instance of this is the emulation code
used for instructions protected by the Intel User-Mode Instruction
Prevention feature.
Thus, these functions are relocated to a new insn-eval.c file. The reason
for not relocating these utilities into insn.c is that the latter solely
analyzes instructions given by a struct insn without any knowledge of the
meaning of the values of the instruction operands. The new insn-eval.c
library aims to be used to resolve userspace linear addresses based on
the contents of the instruction operands as well as the contents of the
pt_regs structure.
These utilities come with a separate header. This is to avoid taking
insn.c out of sync from the instruction decoders under tools/objtool and
tools/perf. This also avoids adding cumbersome #ifdef's for the
#include'd files required to decode instructions in a kernel context.
Functions are simply relocated. There are no functional or indentation
changes. The checkpatch script issues the following warning with this
commit:
WARNING: Avoid crashing the kernel - try using WARN_ON & recovery code
rather than BUG() or BUG_ON()
+ BUG();
This warning will be fixed in a subsequent patch.
Cc: Borislav Petkov <[email protected]>
Cc: Andy Lutomirski <[email protected]>
Cc: Dave Hansen <[email protected]>
Cc: Adam Buchbinder <[email protected]>
Cc: Colin Ian King <[email protected]>
Cc: Lorenzo Stoakes <[email protected]>
Cc: Qiaowei Ren <[email protected]>
Cc: Arnaldo Carvalho de Melo <[email protected]>
Cc: Masami Hiramatsu <[email protected]>
Cc: Adrian Hunter <[email protected]>
Cc: Kees Cook <[email protected]>
Cc: Thomas Garnier <[email protected]>
Cc: Peter Zijlstra <[email protected]>
Cc: Dmitry Vyukov <[email protected]>
Cc: Ravi V. Shankar <[email protected]>
Cc: [email protected]
Signed-off-by: Ricardo Neri <[email protected]>
---
arch/x86/include/asm/insn-eval.h | 16 ++++
arch/x86/lib/Makefile | 2 +-
arch/x86/lib/insn-eval.c | 163 +++++++++++++++++++++++++++++++++++++++
arch/x86/mm/mpx.c | 156 +------------------------------------
4 files changed, 182 insertions(+), 155 deletions(-)
create mode 100644 arch/x86/include/asm/insn-eval.h
create mode 100644 arch/x86/lib/insn-eval.c
diff --git a/arch/x86/include/asm/insn-eval.h b/arch/x86/include/asm/insn-eval.h
new file mode 100644
index 000000000000..5cab1b1da84d
--- /dev/null
+++ b/arch/x86/include/asm/insn-eval.h
@@ -0,0 +1,16 @@
+#ifndef _ASM_X86_INSN_EVAL_H
+#define _ASM_X86_INSN_EVAL_H
+/*
+ * A collection of utility functions for x86 instruction analysis to be
+ * used in a kernel context. Useful when, for instance, making sense
+ * of the registers indicated by operands.
+ */
+
+#include <linux/compiler.h>
+#include <linux/bug.h>
+#include <linux/err.h>
+#include <asm/ptrace.h>
+
+void __user *insn_get_addr_ref(struct insn *insn, struct pt_regs *regs);
+
+#endif /* _ASM_X86_INSN_EVAL_H */
diff --git a/arch/x86/lib/Makefile b/arch/x86/lib/Makefile
index 34a74131a12c..675d7b075fed 100644
--- a/arch/x86/lib/Makefile
+++ b/arch/x86/lib/Makefile
@@ -23,7 +23,7 @@ lib-y := delay.o misc.o cmdline.o cpu.o
lib-y += usercopy_$(BITS).o usercopy.o getuser.o putuser.o
lib-y += memcpy_$(BITS).o
lib-$(CONFIG_RWSEM_XCHGADD_ALGORITHM) += rwsem.o
-lib-$(CONFIG_INSTRUCTION_DECODER) += insn.o inat.o
+lib-$(CONFIG_INSTRUCTION_DECODER) += insn.o inat.o insn-eval.o
lib-$(CONFIG_RANDOMIZE_BASE) += kaslr.o
obj-y += msr.o msr-reg.o msr-reg-export.o hweight.o
diff --git a/arch/x86/lib/insn-eval.c b/arch/x86/lib/insn-eval.c
new file mode 100644
index 000000000000..2bb8303ba92f
--- /dev/null
+++ b/arch/x86/lib/insn-eval.c
@@ -0,0 +1,163 @@
+/*
+ * Utility functions for x86 operand and address decoding
+ *
+ * Copyright (C) Intel Corporation 2017
+ */
+#include <linux/kernel.h>
+#include <linux/string.h>
+#include <asm/inat.h>
+#include <asm/insn.h>
+#include <asm/insn-eval.h>
+
+enum reg_type {
+ REG_TYPE_RM = 0,
+ REG_TYPE_INDEX,
+ REG_TYPE_BASE,
+};
+
+static int get_reg_offset(struct insn *insn, struct pt_regs *regs,
+ enum reg_type type)
+{
+ int regno = 0;
+
+ static const int regoff[] = {
+ offsetof(struct pt_regs, ax),
+ offsetof(struct pt_regs, cx),
+ offsetof(struct pt_regs, dx),
+ offsetof(struct pt_regs, bx),
+ offsetof(struct pt_regs, sp),
+ offsetof(struct pt_regs, bp),
+ offsetof(struct pt_regs, si),
+ offsetof(struct pt_regs, di),
+#ifdef CONFIG_X86_64
+ offsetof(struct pt_regs, r8),
+ offsetof(struct pt_regs, r9),
+ offsetof(struct pt_regs, r10),
+ offsetof(struct pt_regs, r11),
+ offsetof(struct pt_regs, r12),
+ offsetof(struct pt_regs, r13),
+ offsetof(struct pt_regs, r14),
+ offsetof(struct pt_regs, r15),
+#endif
+ };
+ int nr_registers = ARRAY_SIZE(regoff);
+ /*
+ * Don't possibly decode a 32-bit instructions as
+ * reading a 64-bit-only register.
+ */
+ if (IS_ENABLED(CONFIG_X86_64) && !insn->x86_64)
+ nr_registers -= 8;
+
+ switch (type) {
+ case REG_TYPE_RM:
+ regno = X86_MODRM_RM(insn->modrm.value);
+ if (X86_REX_B(insn->rex_prefix.value))
+ regno += 8;
+ break;
+
+ case REG_TYPE_INDEX:
+ regno = X86_SIB_INDEX(insn->sib.value);
+ if (X86_REX_X(insn->rex_prefix.value))
+ regno += 8;
+
+ /*
+ * If ModRM.mod != 3 and SIB.index = 4 the scale*index
+ * portion of the address computation is null. This is
+ * true only if REX.X is 0. In such a case, the SIB index
+ * is used in the address computation.
+ */
+ if (X86_MODRM_MOD(insn->modrm.value) != 3 && regno == 4)
+ return -EDOM;
+ break;
+
+ case REG_TYPE_BASE:
+ regno = X86_SIB_BASE(insn->sib.value);
+ /*
+ * If ModRM.mod is 0 and SIB.base == 5, the base of the
+ * register-indirect addressing is 0. In this case, a
+ * 32-bit displacement follows the SIB byte.
+ */
+ if (!X86_MODRM_MOD(insn->modrm.value) && regno == 5)
+ return -EDOM;
+
+ if (X86_REX_B(insn->rex_prefix.value))
+ regno += 8;
+ break;
+
+ default:
+ pr_err("invalid register type");
+ BUG();
+ break;
+ }
+
+ if (regno >= nr_registers) {
+ WARN_ONCE(1, "decoded an instruction with an invalid register");
+ return -EINVAL;
+ }
+ return regoff[regno];
+}
+
+/*
+ * return the address being referenced be instruction
+ * for rm=3 returning the content of the rm reg
+ * for rm!=3 calculates the address using SIB and Disp
+ */
+void __user *insn_get_addr_ref(struct insn *insn, struct pt_regs *regs)
+{
+ int addr_offset, base_offset, indx_offset;
+ unsigned long linear_addr;
+ long eff_addr, base, indx;
+ insn_byte_t sib;
+
+ insn_get_modrm(insn);
+ insn_get_sib(insn);
+ sib = insn->sib.value;
+
+ if (X86_MODRM_MOD(insn->modrm.value) == 3) {
+ addr_offset = get_reg_offset(insn, regs, REG_TYPE_RM);
+ if (addr_offset < 0)
+ goto out_err;
+
+ eff_addr = regs_get_register(regs, addr_offset);
+ } else {
+ if (insn->sib.nbytes) {
+ /*
+ * Negative values in the base and index offset means
+ * an error when decoding the SIB byte. Except -EDOM,
+ * which means that the registers should not be used
+ * in the address computation.
+ */
+ base_offset = get_reg_offset(insn, regs, REG_TYPE_BASE);
+ if (base_offset == -EDOM)
+ base = 0;
+ else if (base_offset < 0)
+ goto out_err;
+ else
+ base = regs_get_register(regs, base_offset);
+
+ indx_offset = get_reg_offset(insn, regs, REG_TYPE_INDEX);
+ if (indx_offset == -EDOM)
+ indx = 0;
+ else if (indx_offset < 0)
+ goto out_err;
+ else
+ indx = regs_get_register(regs, indx_offset);
+
+ eff_addr = base + indx * (1 << X86_SIB_SCALE(sib));
+ } else {
+ addr_offset = get_reg_offset(insn, regs, REG_TYPE_RM);
+ if (addr_offset < 0)
+ goto out_err;
+
+ eff_addr = regs_get_register(regs, addr_offset);
+ }
+
+ eff_addr += insn->displacement.value;
+ }
+
+ linear_addr = (unsigned long)eff_addr;
+
+ return (void __user *)linear_addr;
+out_err:
+ return (void __user *)-1;
+}
diff --git a/arch/x86/mm/mpx.c b/arch/x86/mm/mpx.c
index 53e24ca01f29..28782059ad2d 100644
--- a/arch/x86/mm/mpx.c
+++ b/arch/x86/mm/mpx.c
@@ -12,6 +12,7 @@
#include <linux/sched/sysctl.h>
#include <asm/insn.h>
+#include <asm/insn-eval.h>
#include <asm/mman.h>
#include <asm/mmu_context.h>
#include <asm/mpx.h>
@@ -60,159 +61,6 @@ static unsigned long mpx_mmap(unsigned long len)
return addr;
}
-enum reg_type {
- REG_TYPE_RM = 0,
- REG_TYPE_INDEX,
- REG_TYPE_BASE,
-};
-
-static int get_reg_offset(struct insn *insn, struct pt_regs *regs,
- enum reg_type type)
-{
- int regno = 0;
-
- static const int regoff[] = {
- offsetof(struct pt_regs, ax),
- offsetof(struct pt_regs, cx),
- offsetof(struct pt_regs, dx),
- offsetof(struct pt_regs, bx),
- offsetof(struct pt_regs, sp),
- offsetof(struct pt_regs, bp),
- offsetof(struct pt_regs, si),
- offsetof(struct pt_regs, di),
-#ifdef CONFIG_X86_64
- offsetof(struct pt_regs, r8),
- offsetof(struct pt_regs, r9),
- offsetof(struct pt_regs, r10),
- offsetof(struct pt_regs, r11),
- offsetof(struct pt_regs, r12),
- offsetof(struct pt_regs, r13),
- offsetof(struct pt_regs, r14),
- offsetof(struct pt_regs, r15),
-#endif
- };
- int nr_registers = ARRAY_SIZE(regoff);
- /*
- * Don't possibly decode a 32-bit instructions as
- * reading a 64-bit-only register.
- */
- if (IS_ENABLED(CONFIG_X86_64) && !insn->x86_64)
- nr_registers -= 8;
-
- switch (type) {
- case REG_TYPE_RM:
- regno = X86_MODRM_RM(insn->modrm.value);
- if (X86_REX_B(insn->rex_prefix.value))
- regno += 8;
- break;
-
- case REG_TYPE_INDEX:
- regno = X86_SIB_INDEX(insn->sib.value);
- if (X86_REX_X(insn->rex_prefix.value))
- regno += 8;
-
- /*
- * If ModRM.mod != 3 and SIB.index = 4 the scale*index
- * portion of the address computation is null. This is
- * true only if REX.X is 0. In such a case, the SIB index
- * is used in the address computation.
- */
- if (X86_MODRM_MOD(insn->modrm.value) != 3 && regno == 4)
- return -EDOM;
- break;
-
- case REG_TYPE_BASE:
- regno = X86_SIB_BASE(insn->sib.value);
- /*
- * If ModRM.mod is 0 and SIB.base == 5, the base of the
- * register-indirect addressing is 0. In this case, a
- * 32-bit displacement follows the SIB byte.
- */
- if (!X86_MODRM_MOD(insn->modrm.value) && regno == 5)
- return -EDOM;
-
- if (X86_REX_B(insn->rex_prefix.value))
- regno += 8;
- break;
-
- default:
- pr_err("invalid register type");
- BUG();
- break;
- }
-
- if (regno >= nr_registers) {
- WARN_ONCE(1, "decoded an instruction with an invalid register");
- return -EINVAL;
- }
- return regoff[regno];
-}
-
-/*
- * return the address being referenced be instruction
- * for rm=3 returning the content of the rm reg
- * for rm!=3 calculates the address using SIB and Disp
- */
-static void __user *mpx_get_addr_ref(struct insn *insn, struct pt_regs *regs)
-{
- int addr_offset, base_offset, indx_offset;
- unsigned long linear_addr;
- long eff_addr, base, indx;
- insn_byte_t sib;
-
- insn_get_modrm(insn);
- insn_get_sib(insn);
- sib = insn->sib.value;
-
- if (X86_MODRM_MOD(insn->modrm.value) == 3) {
- addr_offset = get_reg_offset(insn, regs, REG_TYPE_RM);
- if (addr_offset < 0)
- goto out_err;
-
- eff_addr = regs_get_register(regs, addr_offset);
- } else {
- if (insn->sib.nbytes) {
- /*
- * Negative values in the base and index offset means
- * an error when decoding the SIB byte. Except -EDOM,
- * which means that the registers should not be used
- * in the address computation.
- */
- base_offset = get_reg_offset(insn, regs, REG_TYPE_BASE);
- if (base_offset == -EDOM)
- base = 0;
- else if (base_offset < 0)
- goto out_err;
- else
- base = regs_get_register(regs, base_offset);
-
- indx_offset = get_reg_offset(insn, regs, REG_TYPE_INDEX);
- if (indx_offset == -EDOM)
- indx = 0;
- else if (indx_offset < 0)
- goto out_err;
- else
- indx = regs_get_register(regs, indx_offset);
-
- eff_addr = base + indx * (1 << X86_SIB_SCALE(sib));
- } else {
- addr_offset = get_reg_offset(insn, regs, REG_TYPE_RM);
- if (addr_offset < 0)
- goto out_err;
-
- eff_addr = regs_get_register(regs, addr_offset);
- }
-
- eff_addr += insn->displacement.value;
- }
-
- linear_addr = (unsigned long)eff_addr;
-
- return (void __user *)linear_addr;
-out_err:
- return (void __user *)-1;
-}
-
static int mpx_insn_decode(struct insn *insn,
struct pt_regs *regs)
{
@@ -325,7 +173,7 @@ siginfo_t *mpx_generate_siginfo(struct pt_regs *regs)
info->si_signo = SIGSEGV;
info->si_errno = 0;
info->si_code = SEGV_BNDERR;
- info->si_addr = mpx_get_addr_ref(&insn, regs);
+ info->si_addr = insn_get_addr_ref(&insn, regs);
/*
* We were not able to extract an address from the instruction,
* probably because there was something invalid in it.
--
2.13.0
32-bit and 64-bit address encodings are identical. Thus, the same logic
could be used to resolve the effective address. However, there are two key
differences: address size and enforcement of segment limits.
If running a 32-bit process on a 64-bit kernel, it is best to perform
the address calculation using 32-bit data types. In this manner hardware
is used for the arithmetic, including handling of signs and overflows.
32-bit addresses are generally used in protected mode; segment limits are
enforced in this mode. This implementation obtains the limit of the
segment associated with the instruction operands and prefixes. If the
computed address is outside the segment limits, an error is returned. It
is also possible to use 32-bit addresses in long mode and virtual-8086
mode by using an address override prefix. In such cases, segment limits
are not enforced.
The new function get_addr_ref_32() is almost identical to the existing
function insn_get_addr_ref() (used for 64-bit addresses), except for the
differences mentioned above. For the sake of simplicity and readability,
it is better to use two separate functions.
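As an aside (not part of the patch), a small user-space example of the
wraparound behavior that motivates doing the arithmetic in 32-bit types,
assuming it is compiled as a 64-bit program:

#include <stdio.h>

int main(void)
{
        /* A 32-bit register value as it would be read from pt_regs */
        unsigned long reg = 0xffffff00UL;
        int disp = 0x200;

        /* 64-bit arithmetic does not wrap at 32 bits... */
        unsigned long eff64 = reg + disp;       /* 0x100000100 */
        /* ...while 32-bit arithmetic wraps just like the CPU would */
        int eff32 = (int)reg + disp;            /* 0x00000100 */

        printf("64-bit math: 0x%lx\n", eff64);
        printf("32-bit math: 0x%x\n", (unsigned int)eff32);
        return 0;
}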
Cc: Dave Hansen <[email protected]>
Cc: Adam Buchbinder <[email protected]>
Cc: Colin Ian King <[email protected]>
Cc: Lorenzo Stoakes <[email protected]>
Cc: Qiaowei Ren <[email protected]>
Cc: Arnaldo Carvalho de Melo <[email protected]>
Cc: Masami Hiramatsu <[email protected]>
Cc: Adrian Hunter <[email protected]>
Cc: Kees Cook <[email protected]>
Cc: Thomas Garnier <[email protected]>
Cc: Peter Zijlstra <[email protected]>
Cc: Borislav Petkov <[email protected]>
Cc: Dmitry Vyukov <[email protected]>
Cc: Ravi V. Shankar <[email protected]>
Cc: [email protected]
Signed-off-by: Ricardo Neri <[email protected]>
---
arch/x86/lib/insn-eval.c | 147 +++++++++++++++++++++++++++++++++++++++++++++++
1 file changed, 147 insertions(+)
diff --git a/arch/x86/lib/insn-eval.c b/arch/x86/lib/insn-eval.c
index 8ae110a273de..6730c9ba02c5 100644
--- a/arch/x86/lib/insn-eval.c
+++ b/arch/x86/lib/insn-eval.c
@@ -665,6 +665,153 @@ int insn_get_modrm_rm_off(struct insn *insn, struct pt_regs *regs)
return get_reg_offset(insn, regs, REG_TYPE_RM);
}
+/**
+ * get_addr_ref_32() - Obtain a 32-bit linear address
+ * @insn: Instruction struct with ModRM and SIB bytes and displacement
+ * @regs: Structure with register values as seen when entering kernel mode
+ *
+ * This function is to be used with 32-bit address encodings to obtain the
+ * linear memory address referred by the instruction's ModRM, SIB,
+ * displacement bytes and segment base address, as applicable. If in protected
+ * mode, segment limits are enforced.
+ *
+ * Return: linear address referenced by instruction and registers on success.
+ * -1L on error.
+ */
+static void __user *get_addr_ref_32(struct insn *insn, struct pt_regs *regs)
+{
+ int eff_addr, base, indx, addr_offset, base_offset, indx_offset;
+ unsigned long linear_addr, seg_base_addr, seg_limit, tmp;
+ insn_byte_t sib;
+
+ insn_get_modrm(insn);
+ insn_get_sib(insn);
+ sib = insn->sib.value;
+
+ if (insn->addr_bytes != 4)
+ goto out_err;
+
+ if (X86_MODRM_MOD(insn->modrm.value) == 3) {
+ addr_offset = get_reg_offset(insn, regs, REG_TYPE_RM);
+ if (addr_offset < 0)
+ goto out_err;
+
+ tmp = regs_get_register(regs, addr_offset);
+ /* The 4 most significant bytes must be zero. */
+ if (tmp & ~0xffffffffL)
+ goto out_err;
+
+ eff_addr = (int)(tmp & 0xffffffff);
+
+ seg_base_addr = insn_get_seg_base(regs, insn, addr_offset);
+ if (seg_base_addr == -1L)
+ goto out_err;
+
+ seg_limit = get_seg_limit(regs, insn, addr_offset);
+ } else {
+ if (insn->sib.nbytes) {
+ /*
+ * Negative values in the base and index offset means
+ * an error when decoding the SIB byte. Except -EDOM,
+ * which means that the registers should not be used
+ * in the address computation.
+ */
+ base_offset = get_reg_offset(insn, regs, REG_TYPE_BASE);
+ if (base_offset == -EDOM) {
+ base = 0;
+ } else if (base_offset < 0) {
+ goto out_err;
+ } else {
+ tmp = regs_get_register(regs, base_offset);
+ /* The 4 most significant bytes must be zero. */
+ if (tmp & ~0xffffffffL)
+ goto out_err;
+
+ base = (int)(tmp & 0xffffffff);
+ }
+
+ indx_offset = get_reg_offset(insn, regs, REG_TYPE_INDEX);
+ if (indx_offset == -EDOM) {
+ indx = 0;
+ } else if (indx_offset < 0) {
+ goto out_err;
+ } else {
+ tmp = regs_get_register(regs, indx_offset);
+ /* The 4 most significant bytes must be zero. */
+ if (tmp & ~0xffffffffL)
+ goto out_err;
+
+ indx = (int)(tmp & 0xffffffff);
+ }
+
+ eff_addr = base + indx * (1 << X86_SIB_SCALE(sib));
+
+ seg_base_addr = insn_get_seg_base(regs, insn,
+ base_offset);
+ if (seg_base_addr == -1L)
+ goto out_err;
+
+ seg_limit = get_seg_limit(regs, insn, base_offset);
+ } else {
+ addr_offset = get_reg_offset(insn, regs, REG_TYPE_RM);
+
+ /*
+ * -EDOM means that we must ignore the address_offset.
+ * In such a case, in 64-bit mode the effective address
+ * relative to the RIP of the following instruction.
+ */
+ if (addr_offset == -EDOM) {
+ if (user_64bit_mode(regs))
+ eff_addr = (long)regs->ip + insn->length;
+ else
+ eff_addr = 0;
+ } else if (addr_offset < 0) {
+ goto out_err;
+ } else {
+ tmp = regs_get_register(regs, addr_offset);
+ /* The 4 most significant bytes must be zero. */
+ if (tmp & ~0xffffffffL)
+ goto out_err;
+
+ eff_addr = (int)(tmp & 0xffffffff);
+ }
+
+ seg_base_addr = insn_get_seg_base(regs, insn,
+ addr_offset);
+ if (seg_base_addr == -1L)
+ goto out_err;
+
+ seg_limit = get_seg_limit(regs, insn, addr_offset);
+ }
+ eff_addr += insn->displacement.value;
+ }
+
+ /*
+ * In protected mode, before computing the linear address, make sure
+ * the effective address is within the limits of the segment.
+ * 32-bit addresses can be used in long and virtual-8086 modes if an
+ * address override prefix is used. In such cases, segment limits are
+ * not enforced. When in virtual-8086 mode, the segment limit is -1L
+ * to reflect this situation.
+ *
+ * After computed, the effective address is treated as an unsigned
+ * quantity.
+ */
+ if (!user_64bit_mode(regs) && ((unsigned int)eff_addr > seg_limit))
+ goto out_err;
+
+ /*
+ * Data type long could be 64 bits in size. Ensure that our 32-bit
+ * effective address is not sign-extended when computing the linear
+ * address.
+ */
+ linear_addr = (unsigned long)(eff_addr & 0xffffffff) + seg_base_addr;
+
+ return (void __user *)linear_addr;
+out_err:
+ return (void __user *)-1L;
+}
+
/*
* return the address being referenced be instruction
* for rm=3 returning the content of the rm reg
--
2.13.0
insn_get_addr_ref() returns the effective address as defined in section
3.7.5.1, Vol. 1 of the Intel 64 and IA-32 Architectures Software
Developer's Manual. In order to compute the linear address, we must add
to the effective address the segment base address as set in the segment
descriptor. The segment descriptor to use depends on the register used as
operand and segment override prefixes, if any.
In most cases, the segment base address will be 0 if the USER_DS/USER32_DS
segment is used or if segmentation is not used. However, the base address
is not necessarily zero if a user program defines its own segments. This
is possible by using a local descriptor table.
Since the effective address is a signed quantity, the unsigned segment
base address is saved in a separate variable and added to the final,
unsigned, effective address.
Cc: Dave Hansen <[email protected]>
Cc: Adam Buchbinder <[email protected]>
Cc: Colin Ian King <[email protected]>
Cc: Lorenzo Stoakes <[email protected]>
Cc: Qiaowei Ren <[email protected]>
Cc: Arnaldo Carvalho de Melo <[email protected]>
Cc: Masami Hiramatsu <[email protected]>
Cc: Adrian Hunter <[email protected]>
Cc: Kees Cook <[email protected]>
Cc: Thomas Garnier <[email protected]>
Cc: Peter Zijlstra <[email protected]>
Cc: Borislav Petkov <[email protected]>
Cc: Dmitry Vyukov <[email protected]>
Cc: Ravi V. Shankar <[email protected]>
Cc: [email protected]
Signed-off-by: Ricardo Neri <[email protected]>
---
arch/x86/lib/insn-eval.c | 18 ++++++++++++++++--
1 file changed, 16 insertions(+), 2 deletions(-)
diff --git a/arch/x86/lib/insn-eval.c b/arch/x86/lib/insn-eval.c
index 04f696c3793e..8ae110a273de 100644
--- a/arch/x86/lib/insn-eval.c
+++ b/arch/x86/lib/insn-eval.c
@@ -673,7 +673,7 @@ int insn_get_modrm_rm_off(struct insn *insn, struct pt_regs *regs)
void __user *insn_get_addr_ref(struct insn *insn, struct pt_regs *regs)
{
int addr_offset, base_offset, indx_offset;
- unsigned long linear_addr;
+ unsigned long linear_addr, seg_base_addr;
long eff_addr, base, indx;
insn_byte_t sib;
@@ -687,6 +687,10 @@ void __user *insn_get_addr_ref(struct insn *insn, struct pt_regs *regs)
goto out_err;
eff_addr = regs_get_register(regs, addr_offset);
+
+ seg_base_addr = insn_get_seg_base(regs, insn, addr_offset);
+ if (seg_base_addr == -1L)
+ goto out_err;
} else {
if (insn->sib.nbytes) {
/*
@@ -712,6 +716,11 @@ void __user *insn_get_addr_ref(struct insn *insn, struct pt_regs *regs)
indx = regs_get_register(regs, indx_offset);
eff_addr = base + indx * (1 << X86_SIB_SCALE(sib));
+
+ seg_base_addr = insn_get_seg_base(regs, insn,
+ base_offset);
+ if (seg_base_addr == -1L)
+ goto out_err;
} else {
addr_offset = get_reg_offset(insn, regs, REG_TYPE_RM);
@@ -730,12 +739,17 @@ void __user *insn_get_addr_ref(struct insn *insn, struct pt_regs *regs)
} else {
eff_addr = regs_get_register(regs, addr_offset);
}
+
+ seg_base_addr = insn_get_seg_base(regs, insn,
+ addr_offset);
+ if (seg_base_addr == -1L)
+ goto out_err;
}
eff_addr += insn->displacement.value;
}
- linear_addr = (unsigned long)eff_addr;
+ linear_addr = (unsigned long)eff_addr + seg_base_addr;
return (void __user *)linear_addr;
out_err:
--
2.13.0
String instructions are special because, in protected mode, the linear
address is always obtained via the ES segment register for operands that
use the (E)DI register, and via the DS segment register for operands that
use the (E)SI register. Furthermore, segment override prefixes are ignored
when calculating a linear address involving the (E)DI register, whereas
segment override prefixes can be used when calculating linear addresses
involving the (E)SI register.
It follows that linear addresses are calculated differently for the case of
string instructions. The purpose of this utility function is to identify
such instructions for callers to determine a linear address correctly.
Note that this function only identifies string instructions; it does not
determine what segment register to use in the address computation. That is
left to callers. A subsequent commit introduces a function to determine
the segment register to use given the instruction, operands and
segment override prefixes.
Cc: Dave Hansen <[email protected]>
Cc: Adam Buchbinder <[email protected]>
Cc: Colin Ian King <[email protected]>
Cc: Lorenzo Stoakes <[email protected]>
Cc: Qiaowei Ren <[email protected]>
Cc: Arnaldo Carvalho de Melo <[email protected]>
Cc: Masami Hiramatsu <[email protected]>
Cc: Adrian Hunter <[email protected]>
Cc: Kees Cook <[email protected]>
Cc: Thomas Garnier <[email protected]>
Cc: Peter Zijlstra <[email protected]>
Cc: Borislav Petkov <[email protected]>
Cc: Dmitry Vyukov <[email protected]>
Cc: Ravi V. Shankar <[email protected]>
Cc: [email protected]
Signed-off-by: Ricardo Neri <[email protected]>
---
arch/x86/lib/insn-eval.c | 26 ++++++++++++++++++++++++++
1 file changed, 26 insertions(+)
diff --git a/arch/x86/lib/insn-eval.c b/arch/x86/lib/insn-eval.c
index c6120e9298f5..25b2eb3c64c1 100644
--- a/arch/x86/lib/insn-eval.c
+++ b/arch/x86/lib/insn-eval.c
@@ -16,6 +16,32 @@ enum reg_type {
REG_TYPE_BASE,
};
+/**
+ * is_string_insn() - Determine if instruction is a string instruction
+ * @insn: Instruction structure containing the opcode
+ *
+ * Return: true if the instruction, determined by the opcode, is any of the
+ * string instructions as defined in the Intel Software Development manual.
+ * False otherwise.
+ */
+static bool is_string_insn(struct insn *insn)
+{
+ insn_get_opcode(insn);
+
+ /* All string instructions have a 1-byte opcode. */
+ if (insn->opcode.nbytes != 1)
+ return false;
+
+ switch (insn->opcode.bytes[0]) {
+ case 0x6c ... 0x6f: /* INS, OUTS */
+ case 0xa4 ... 0xa7: /* MOVS, CMPS */
+ case 0xaa ... 0xaf: /* STOS, LODS, SCAS */
+ return true;
+ default:
+ return false;
+ }
+}
+
static int get_reg_offset(struct insn *insn, struct pt_regs *regs,
enum reg_type type)
{
--
2.13.0
The function get_reg_offset() returns the offset to the register that its
argument specifies, as indicated by an enumeration of type reg_type.
Callers of this function would need the definition of such an enumeration.
This should not be needed. Instead, add helper functions for this purpose.
These functions are useful in cases when, for instance, the caller needs
to decide whether the operand is a register or a memory location by
looking at the rm part of the ModRM byte. As of now, this is the only
helper function that is needed.
Cc: Dave Hansen <[email protected]>
Cc: Adam Buchbinder <[email protected]>
Cc: Colin Ian King <[email protected]>
Cc: Lorenzo Stoakes <[email protected]>
Cc: Qiaowei Ren <[email protected]>
Cc: Arnaldo Carvalho de Melo <[email protected]>
Cc: Masami Hiramatsu <[email protected]>
Cc: Adrian Hunter <[email protected]>
Cc: Kees Cook <[email protected]>
Cc: Thomas Garnier <[email protected]>
Cc: Peter Zijlstra <[email protected]>
Cc: Borislav Petkov <[email protected]>
Cc: Dmitry Vyukov <[email protected]>
Cc: Ravi V. Shankar <[email protected]>
Cc: [email protected]
Signed-off-by: Ricardo Neri <[email protected]>
---
arch/x86/include/asm/insn-eval.h | 1 +
arch/x86/lib/insn-eval.c | 15 +++++++++++++++
2 files changed, 16 insertions(+)
diff --git a/arch/x86/include/asm/insn-eval.h b/arch/x86/include/asm/insn-eval.h
index 5cab1b1da84d..7e8c9633a377 100644
--- a/arch/x86/include/asm/insn-eval.h
+++ b/arch/x86/include/asm/insn-eval.h
@@ -12,5 +12,6 @@
#include <asm/ptrace.h>
void __user *insn_get_addr_ref(struct insn *insn, struct pt_regs *regs);
+int insn_get_modrm_rm_off(struct insn *insn, struct pt_regs *regs);
#endif /* _ASM_X86_INSN_EVAL_H */
diff --git a/arch/x86/lib/insn-eval.c b/arch/x86/lib/insn-eval.c
index 3919458fecbf..c6120e9298f5 100644
--- a/arch/x86/lib/insn-eval.c
+++ b/arch/x86/lib/insn-eval.c
@@ -97,6 +97,21 @@ static int get_reg_offset(struct insn *insn, struct pt_regs *regs,
return regoff[regno];
}
+/**
+ * insn_get_modrm_rm_off() - Obtain register in r/m part of ModRM byte
+ * @insn: Instruction structure containing the ModRM byte
+ * @regs: Structure with register values as seen when entering kernel mode
+ *
+ * Return: The register indicated by the r/m part of the ModRM byte. The
+ * register is obtained as an offset from the base of pt_regs. In specific
+ * cases, the returned value can be -EDOM to indicate that the particular value
+ * of ModRM does not refer to a register and shall be ignored.
+ */
+int insn_get_modrm_rm_off(struct insn *insn, struct pt_regs *regs)
+{
+ return get_reg_offset(insn, regs, REG_TYPE_RM);
+}
+
/*
* return the address being referenced be instruction
* for rm=3 returning the content of the rm reg
--
2.13.0
We are not in a critical failure path. The invalid register type is caused
when trying to decode invalid instruction bytes from a user-space program.
Thus, simply print an error message. To prevent this message from being
abused by user-space programs, use the rate-limited variant of pr_err().
Cc: Borislav Petkov <[email protected]>
Cc: Andy Lutomirski <[email protected]>
Cc: Dave Hansen <[email protected]>
Cc: Adam Buchbinder <[email protected]>
Cc: Colin Ian King <[email protected]>
Cc: Lorenzo Stoakes <[email protected]>
Cc: Qiaowei Ren <[email protected]>
Cc: Arnaldo Carvalho de Melo <[email protected]>
Cc: Masami Hiramatsu <[email protected]>
Cc: Adrian Hunter <[email protected]>
Cc: Kees Cook <[email protected]>
Cc: Thomas Garnier <[email protected]>
Cc: Peter Zijlstra <[email protected]>
Cc: Dmitry Vyukov <[email protected]>
Cc: Ravi V. Shankar <[email protected]>
Cc: [email protected]
Signed-off-by: Ricardo Neri <[email protected]>
---
arch/x86/lib/insn-eval.c | 6 +++---
1 file changed, 3 insertions(+), 3 deletions(-)
diff --git a/arch/x86/lib/insn-eval.c b/arch/x86/lib/insn-eval.c
index 2bb8303ba92f..3919458fecbf 100644
--- a/arch/x86/lib/insn-eval.c
+++ b/arch/x86/lib/insn-eval.c
@@ -5,6 +5,7 @@
*/
#include <linux/kernel.h>
#include <linux/string.h>
+#include <linux/ratelimit.h>
#include <asm/inat.h>
#include <asm/insn.h>
#include <asm/insn-eval.h>
@@ -85,9 +86,8 @@ static int get_reg_offset(struct insn *insn, struct pt_regs *regs,
break;
default:
- pr_err("invalid register type");
- BUG();
- break;
+ pr_err_ratelimited("insn: x86: invalid register type");
+ return -EINVAL;
}
if (regno >= nr_registers) {
--
2.13.0
Certain user space programs that run in virtual-8086 mode may utilize
instructions protected by the User-Mode Instruction Prevention (UMIP)
security feature present in new Intel processors: SGDT, SIDT and SMSW. In
such a case, a general protection fault is issued if UMIP is enabled. When
such a fault happens, the kernel traps it and emulates the results of
these instructions with dummy values. The purpose of this new
test is to verify whether the impacted instructions can be executed
without causing such a #GP fault. If no #GP exceptions occur, we expect to
exit virtual-8086 mode via INT3.
The instructions protected by UMIP are executed in representative use
cases:
a) displacement-only memory addressing
b) register-indirect memory addressing
c) results stored directly in operands
Unfortunately, it is not possible to check the results against a set of
expected values because no emulation will occur in systems that do not
have the UMIP feature. Instead, results are printed for verification. A
simple verification is done to ensure that results of all tests are
identical.
Cc: Andy Lutomirski <[email protected]>
Cc: Andrew Morton <[email protected]>
Cc: Borislav Petkov <[email protected]>
Cc: Brian Gerst <[email protected]>
Cc: Chen Yucong <[email protected]>
Cc: Chris Metcalf <[email protected]>
Cc: Dave Hansen <[email protected]>
Cc: Fenghua Yu <[email protected]>
Cc: Huang Rui <[email protected]>
Cc: Jiri Slaby <[email protected]>
Cc: Jonathan Corbet <[email protected]>
Cc: Michael S. Tsirkin <[email protected]>
Cc: Paul Gortmaker <[email protected]>
Cc: Peter Zijlstra <[email protected]>
Cc: Ravi V. Shankar <[email protected]>
Cc: Shuah Khan <[email protected]>
Cc: Vlastimil Babka <[email protected]>
Signed-off-by: Ricardo Neri <[email protected]>
---
tools/testing/selftests/x86/entry_from_vm86.c | 73 ++++++++++++++++++++++++++-
1 file changed, 72 insertions(+), 1 deletion(-)
diff --git a/tools/testing/selftests/x86/entry_from_vm86.c b/tools/testing/selftests/x86/entry_from_vm86.c
index d075ea0e5ca1..130e8ad1db05 100644
--- a/tools/testing/selftests/x86/entry_from_vm86.c
+++ b/tools/testing/selftests/x86/entry_from_vm86.c
@@ -95,6 +95,22 @@ asm (
"int3\n\t"
"vmcode_int80:\n\t"
"int $0x80\n\t"
+ "vmcode_umip:\n\t"
+ /* addressing via displacements */
+ "smsw (2052)\n\t"
+ "sidt (2054)\n\t"
+ "sgdt (2060)\n\t"
+ /* addressing via registers */
+ "mov $2066, %bx\n\t"
+ "smsw (%bx)\n\t"
+ "mov $2068, %bx\n\t"
+ "sidt (%bx)\n\t"
+ "mov $2074, %bx\n\t"
+ "sgdt (%bx)\n\t"
+ /* register operands, only for smsw */
+ "smsw %ax\n\t"
+ "mov %ax, (2080)\n\t"
+ "int3\n\t"
".size vmcode, . - vmcode\n\t"
"end_vmcode:\n\t"
".code32\n\t"
@@ -103,7 +119,7 @@ asm (
extern unsigned char vmcode[], end_vmcode[];
extern unsigned char vmcode_bound[], vmcode_sysenter[], vmcode_syscall[],
- vmcode_sti[], vmcode_int3[], vmcode_int80[];
+ vmcode_sti[], vmcode_int3[], vmcode_int80[], vmcode_umip[];
/* Returns false if the test was skipped. */
static bool do_test(struct vm86plus_struct *v86, unsigned long eip,
@@ -160,6 +176,58 @@ static bool do_test(struct vm86plus_struct *v86, unsigned long eip,
return true;
}
+void do_umip_tests(struct vm86plus_struct *vm86, unsigned char *test_mem)
+{
+ struct table_desc {
+ unsigned short limit;
+ unsigned long base;
+ } __attribute__((packed));
+
+ /* Initialize variables with arbitrary values */
+ struct table_desc gdt1 = { .base = 0x3c3c3c3c, .limit = 0x9999 };
+ struct table_desc gdt2 = { .base = 0x1a1a1a1a, .limit = 0xaeae };
+ struct table_desc idt1 = { .base = 0x7b7b7b7b, .limit = 0xf1f1 };
+ struct table_desc idt2 = { .base = 0x89898989, .limit = 0x1313 };
+ unsigned short msw1 = 0x1414, msw2 = 0x2525, msw3 = 3737;
+
+ /* UMIP -- exit with INT3 unless kernel emulation did not trap #GP */
+ do_test(vm86, vmcode_umip - vmcode, VM86_TRAP, 3, "UMIP tests");
+
+ /* Results from displacement-only addressing */
+ msw1 = *(unsigned short *)(test_mem + 2052);
+ memcpy(&idt1, test_mem + 2054, sizeof(idt1));
+ memcpy(&gdt1, test_mem + 2060, sizeof(gdt1));
+
+ /* Results from register-indirect addressing */
+ msw2 = *(unsigned short *)(test_mem + 2066);
+ memcpy(&idt2, test_mem + 2068, sizeof(idt2));
+ memcpy(&gdt2, test_mem + 2074, sizeof(gdt2));
+
+ /* Results when using register operands */
+ msw3 = *(unsigned short *)(test_mem + 2080);
+
+ printf("[INFO]\tResult from SMSW:[0x%04x]\n", msw1);
+ printf("[INFO]\tResult from SIDT: limit[0x%04x]base[0x%08lx]\n",
+ idt1.limit, idt1.base);
+ printf("[INFO]\tResult from SGDT: limit[0x%04x]base[0x%08lx]\n",
+ gdt1.limit, gdt1.base);
+
+ if ((msw1 != msw2) || (msw1 != msw3))
+ printf("[FAIL]\tAll the results of SMSW should be the same.\n");
+ else
+ printf("[PASS]\tAll the results from SMSW are identical.\n");
+
+ if (memcmp(&gdt1, &gdt2, sizeof(gdt1)))
+ printf("[FAIL]\tAll the results of SGDT should be the same.\n");
+ else
+ printf("[PASS]\tAll the results from SGDT are identical.\n");
+
+ if (memcmp(&idt1, &idt2, sizeof(idt1)))
+ printf("[FAIL]\tAll the results of SIDT should be the same.\n");
+ else
+ printf("[PASS]\tAll the results from SIDT are identical.\n");
+}
+
int main(void)
{
struct vm86plus_struct v86;
@@ -218,6 +286,9 @@ int main(void)
v86.regs.eax = (unsigned int)-1;
do_test(&v86, vmcode_int80 - vmcode, VM86_INTx, 0x80, "int80");
+ /* UMIP -- the tests exit with INT3 once the protected instructions are emulated */
+ do_umip_tests(&v86, addr);
+
/* Execute a null pointer */
v86.regs.cs = 0;
v86.regs.ss = 0;
--
2.13.0
The instructions str and sldt are not recognized in virtual-8086 mode and
generate an invalid operand exception. These two instructions are protected
by the Intel User-Mode Instruction Prevention (UMIP) security feature. In
protected mode, if UMIP is enabled, these instructions generate a general
protection fault if called from CPL > 0.
Linux traps the general protection fault and emulates the instructions
sgdt, sidt and smsw; but not str and sldt.
These tests verify that the emulation code does not emulate these two
instructions and that the expected invalid operand exception is seen
instead.
The tests fall back to exiting with int3 in case emulation does happen.
Cc: Andy Lutomirski <[email protected]>
Cc: Andrew Morton <[email protected]>
Cc: Borislav Petkov <[email protected]>
Cc: Brian Gerst <[email protected]>
Cc: Chen Yucong <[email protected]>
Cc: Chris Metcalf <[email protected]>
Cc: Dave Hansen <[email protected]>
Cc: Fenghua Yu <[email protected]>
Cc: Huang Rui <[email protected]>
Cc: Jiri Slaby <[email protected]>
Cc: Jonathan Corbet <[email protected]>
Cc: Michael S. Tsirkin <[email protected]>
Cc: Paul Gortmaker <[email protected]>
Cc: Peter Zijlstra <[email protected]>
Cc: Ravi V. Shankar <[email protected]>
Cc: Shuah Khan <[email protected]>
Cc: Vlastimil Babka <[email protected]>
Signed-off-by: Ricardo Neri <[email protected]>
---
tools/testing/selftests/x86/entry_from_vm86.c | 18 +++++++++++++++++-
1 file changed, 17 insertions(+), 1 deletion(-)
diff --git a/tools/testing/selftests/x86/entry_from_vm86.c b/tools/testing/selftests/x86/entry_from_vm86.c
index 130e8ad1db05..b7a0c9024477 100644
--- a/tools/testing/selftests/x86/entry_from_vm86.c
+++ b/tools/testing/selftests/x86/entry_from_vm86.c
@@ -111,6 +111,11 @@ asm (
"smsw %ax\n\t"
"mov %ax, (2080)\n\t"
"int3\n\t"
+ "vmcode_umip_str:\n\t"
+ "str %eax\n\t"
+ "vmcode_umip_sldt:\n\t"
+ "sldt %eax\n\t"
+ "int3\n\t"
".size vmcode, . - vmcode\n\t"
"end_vmcode:\n\t"
".code32\n\t"
@@ -119,7 +124,8 @@ asm (
extern unsigned char vmcode[], end_vmcode[];
extern unsigned char vmcode_bound[], vmcode_sysenter[], vmcode_syscall[],
- vmcode_sti[], vmcode_int3[], vmcode_int80[], vmcode_umip[];
+ vmcode_sti[], vmcode_int3[], vmcode_int80[], vmcode_umip[],
+ vmcode_umip_str[], vmcode_umip_sldt[];
/* Returns false if the test was skipped. */
static bool do_test(struct vm86plus_struct *v86, unsigned long eip,
@@ -226,6 +232,16 @@ void do_umip_tests(struct vm86plus_struct *vm86, unsigned char *test_mem)
printf("[FAIL]\tAll the results of SIDT should be the same.\n");
else
printf("[PASS]\tAll the results from SIDT are identical.\n");
+
+ sethandler(SIGILL, sighandler, 0);
+ do_test(vm86, vmcode_umip_str - vmcode, VM86_SIGNAL, 0,
+ "STR instruction");
+ clearhandler(SIGILL);
+
+ sethandler(SIGILL, sighandler, 0);
+ do_test(vm86, vmcode_umip_sldt - vmcode, VM86_SIGNAL, 0,
+ "SLDT instruction");
+ clearhandler(SIGILL);
}
int main(void)
--
2.13.0
If the User-Mode Instruction Prevention CPU feature is available and
enabled, a general protection fault will be issued if the instructions
sgdt, sldt, sidt, str or smsw are executed from user-mode context
(CPL > 0). If the fault was caused by any of the instructions protected
by UMIP, fixup_umip_exception() will emulate dummy results for these
instructions as follows: if running a 32-bit process, sgdt, sidt and smsw
are emulated; str and sldt are not emulated. No emulation is done for
64-bit processes.
If emulation is successful, the result is passed to the user space program
and no SIGSEGV signal is emitted.
Please note that fixup_umip_exception() also caters for the case when
the fault originated while running in virtual-8086 mode.
Cc: Andy Lutomirski <[email protected]>
Cc: Andrew Morton <[email protected]>
Cc: H. Peter Anvin <[email protected]>
Cc: Borislav Petkov <[email protected]>
Cc: Brian Gerst <[email protected]>
Cc: Chen Yucong <[email protected]>
Cc: Chris Metcalf <[email protected]>
Cc: Dave Hansen <[email protected]>
Cc: Fenghua Yu <[email protected]>
Cc: Huang Rui <[email protected]>
Cc: Jiri Slaby <[email protected]>
Cc: Jonathan Corbet <[email protected]>
Cc: Michael S. Tsirkin <[email protected]>
Cc: Paul Gortmaker <[email protected]>
Cc: Peter Zijlstra <[email protected]>
Cc: Ravi V. Shankar <[email protected]>
Cc: Shuah Khan <[email protected]>
Cc: Vlastimil Babka <[email protected]>
Cc: Tony Luck <[email protected]>
Cc: Paolo Bonzini <[email protected]>
Cc: Liang Z. Li <[email protected]>
Cc: [email protected]
Reviewed-by: Andy Lutomirski <[email protected]>
Signed-off-by: Ricardo Neri <[email protected]>
---
arch/x86/kernel/traps.c | 5 +++++
1 file changed, 5 insertions(+)
diff --git a/arch/x86/kernel/traps.c b/arch/x86/kernel/traps.c
index bf54309b85da..1c1bb7992f70 100644
--- a/arch/x86/kernel/traps.c
+++ b/arch/x86/kernel/traps.c
@@ -65,6 +65,7 @@
#include <asm/trace/mpx.h>
#include <asm/mpx.h>
#include <asm/vm86.h>
+#include <asm/umip.h>
#ifdef CONFIG_X86_64
#include <asm/x86_init.h>
@@ -526,6 +527,10 @@ do_general_protection(struct pt_regs *regs, long error_code)
RCU_LOCKDEP_WARN(!rcu_is_watching(), "entry code didn't wake RCU");
cond_local_irq_enable(regs);
+ if (static_cpu_has(X86_FEATURE_UMIP))
+ if (user_mode(regs) && fixup_umip_exception(regs))
+ return;
+
if (v8086_mode(regs)) {
local_irq_enable();
handle_vm86_fault((struct kernel_vm86_regs *) regs, error_code);
--
2.13.0
User-Mode Instruction Prevention (UMIP) is enabled by setting a bit in
%cr4.
It makes sense to enable UMIP at some point while booting, before user
space comes up. Like SMAP and SMEP, it is not critical to have it enabled
very early during boot. This is because UMIP is relevant only once there is
a user space to be protected from. Given the similarities in relevance, it
makes sense to enable UMIP along with SMAP and SMEP.
UMIP is enabled by default. It can be disabled by adding clearcpuid=514
to the kernel parameters.
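As a sanity check on that number (based on the X86_FEATURE_UMIP definition
added later in this series, word 16, bit 2), the clearcpuid index is simply
16 * 32 + 2. A minimal sketch:

    /* Sketch only: verifies that clearcpuid=514 maps to the UMIP feature bit. */
    #include <assert.h>

    #define X86_FEATURE_UMIP (16 * 32 + 2)  /* word 16, bit 2, as defined in this series */

    int main(void)
    {
            assert(X86_FEATURE_UMIP == 514); /* 16 * 32 + 2 = 514 */
            return 0;
    }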
Cc: Andy Lutomirski <[email protected]>
Cc: Andrew Morton <[email protected]>
Cc: H. Peter Anvin <[email protected]>
Cc: Borislav Petkov <[email protected]>
Cc: Brian Gerst <[email protected]>
Cc: Chen Yucong <[email protected]>
Cc: Chris Metcalf <[email protected]>
Cc: Dave Hansen <[email protected]>
Cc: Fenghua Yu <[email protected]>
Cc: Huang Rui <[email protected]>
Cc: Jiri Slaby <[email protected]>
Cc: Jonathan Corbet <[email protected]>
Cc: Michael S. Tsirkin <[email protected]>
Cc: Paul Gortmaker <[email protected]>
Cc: Peter Zijlstra <[email protected]>
Cc: Ravi V. Shankar <[email protected]>
Cc: Shuah Khan <[email protected]>
Cc: Vlastimil Babka <[email protected]>
Cc: Tony Luck <[email protected]>
Cc: Paolo Bonzini <[email protected]>
Cc: Liang Z. Li <[email protected]>
Cc: [email protected]
Signed-off-by: Ricardo Neri <[email protected]>
---
arch/x86/Kconfig | 10 ++++++++++
arch/x86/kernel/cpu/common.c | 25 ++++++++++++++++++++++++-
2 files changed, 34 insertions(+), 1 deletion(-)
diff --git a/arch/x86/Kconfig b/arch/x86/Kconfig
index ce3ed304288d..5c384d926937 100644
--- a/arch/x86/Kconfig
+++ b/arch/x86/Kconfig
@@ -1801,6 +1801,16 @@ config X86_SMAP
If unsure, say Y.
+config X86_INTEL_UMIP
+ def_bool n
+ depends on CPU_SUP_INTEL
+ prompt "Intel User Mode Instruction Prevention" if EXPERT
+ ---help---
+ The User Mode Instruction Prevention (UMIP) is a security
+ feature in newer Intel processors. If enabled, a general
+ protection fault is issued if the instructions SGDT, SLDT,
+ SIDT, SMSW and STR are executed in user mode.
+
config X86_INTEL_MPX
prompt "Intel MPX (Memory Protection Extensions)"
def_bool n
diff --git a/arch/x86/kernel/cpu/common.c b/arch/x86/kernel/cpu/common.c
index b95cd94ca97b..5066d7ffa55e 100644
--- a/arch/x86/kernel/cpu/common.c
+++ b/arch/x86/kernel/cpu/common.c
@@ -348,6 +348,28 @@ static void setup_pcid(struct cpuinfo_x86 *c)
}
}
+static __always_inline void setup_umip(struct cpuinfo_x86 *c)
+{
+ /* Check the boot processor, plus build option for UMIP. */
+ if (!cpu_feature_enabled(X86_FEATURE_UMIP))
+ goto out;
+
+ /* Check the current processor's cpuid bits. */
+ if (!cpu_has(c, X86_FEATURE_UMIP))
+ goto out;
+
+ cr4_set_bits(X86_CR4_UMIP);
+
+ return;
+
+out:
+ /*
+ * Make sure UMIP is disabled in case it was enabled in a
+ * previous boot (e.g., via kexec).
+ */
+ cr4_clear_bits(X86_CR4_UMIP);
+}
+
/*
* Protection Keys are not available in 32-bit mode.
*/
@@ -1158,9 +1180,10 @@ static void identify_cpu(struct cpuinfo_x86 *c)
/* Disable the PN if appropriate */
squash_the_stupid_serial_number(c);
- /* Set up SMEP/SMAP */
+ /* Set up SMEP/SMAP/UMIP */
setup_smep(c);
setup_smap(c);
+ setup_umip(c);
/* Set up PCID */
setup_pcid(c);
--
2.13.0
Even though memory addresses are unsigned, the operands used to compute the
effective address do have a sign. This is true for ModRM.rm, SIB.base,
SIB.index as well as the displacement bytes. Thus, signed variables shall
be used when computing the effective address from these operands. Once the
signed effective address has been computed, it is cast to an unsigned
long to determine the linear address.
Variables are renamed to better reflect the type of address being
computed.
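A minimal user-space illustration of why the sign matters (not part of the
patch; the values are made up):

    #include <stdio.h>

    int main(void)
    {
            long base = 0x1000;
            long indx = -1;   /* e.g., a register holding 0xffffffff */

            /* Signed arithmetic, as the CPU would compute it: base - 4 */
            long s_eff = base + indx * 4;                      /* 0xffc */
            /* Without the sign, the index is not sign-extended on a 64-bit build */
            unsigned long u_eff = 0x1000UL + 0xffffffffUL * 4; /* 0x400000ffc */

            printf("signed: 0x%lx  unsigned: 0x%lx\n", (unsigned long)s_eff, u_eff);
            return 0;
    }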
Cc: Borislav Petkov <[email protected]>
Cc: Andy Lutomirski <[email protected]>
Cc: Dave Hansen <[email protected]>
Cc: Adam Buchbinder <[email protected]>
Cc: Colin Ian King <[email protected]>
Cc: Lorenzo Stoakes <[email protected]>
Cc: Qiaowei Ren <[email protected]>
Cc: Peter Zijlstra <[email protected]>
Cc: Nathan Howard <[email protected]>
Cc: Adan Hawthorn <[email protected]>
Cc: Joe Perches <[email protected]>
Cc: Ravi V. Shankar <[email protected]>
Cc: [email protected]
Signed-off-by: Ricardo Neri <[email protected]>
---
arch/x86/mm/mpx.c | 20 ++++++++++++++------
1 file changed, 14 insertions(+), 6 deletions(-)
diff --git a/arch/x86/mm/mpx.c b/arch/x86/mm/mpx.c
index 9ceaa955d2ba..9eec98022510 100644
--- a/arch/x86/mm/mpx.c
+++ b/arch/x86/mm/mpx.c
@@ -138,8 +138,9 @@ static int get_reg_offset(struct insn *insn, struct pt_regs *regs,
*/
static void __user *mpx_get_addr_ref(struct insn *insn, struct pt_regs *regs)
{
- unsigned long addr, base, indx;
int addr_offset, base_offset, indx_offset;
+ unsigned long linear_addr;
+ long eff_addr, base, indx;
insn_byte_t sib;
insn_get_modrm(insn);
@@ -150,7 +151,8 @@ static void __user *mpx_get_addr_ref(struct insn *insn, struct pt_regs *regs)
addr_offset = get_reg_offset(insn, regs, REG_TYPE_RM);
if (addr_offset < 0)
goto out_err;
- addr = regs_get_register(regs, addr_offset);
+
+ eff_addr = regs_get_register(regs, addr_offset);
} else {
if (insn->sib.nbytes) {
base_offset = get_reg_offset(insn, regs, REG_TYPE_BASE);
@@ -163,16 +165,22 @@ static void __user *mpx_get_addr_ref(struct insn *insn, struct pt_regs *regs)
base = regs_get_register(regs, base_offset);
indx = regs_get_register(regs, indx_offset);
- addr = base + indx * (1 << X86_SIB_SCALE(sib));
+
+ eff_addr = base + indx * (1 << X86_SIB_SCALE(sib));
} else {
addr_offset = get_reg_offset(insn, regs, REG_TYPE_RM);
if (addr_offset < 0)
goto out_err;
- addr = regs_get_register(regs, addr_offset);
+
+ eff_addr = regs_get_register(regs, addr_offset);
}
- addr += insn->displacement.value;
+
+ eff_addr += insn->displacement.value;
}
- return (void __user *)addr;
+
+ linear_addr = (unsigned long)eff_addr;
+
+ return (void __user *)linear_addr;
out_err:
return (void __user *)-1;
}
--
2.13.0
The User-Mode Instruction Prevention feature present in recent Intel
processors prevents a group of instructions (sgdt, sidt, sldt, smsw, and
str) from being executed with CPL > 0. If they are attempted, a general
protection fault is issued.
Rather than relaying the general protection fault caused by the
UMIP-protected instructions to user space (in the form of a SIGSEGV
signal), the fault can be trapped and the result of such instructions
emulated with dummy values. This both preserves the current kernel behavior
and avoids revealing the system resources that UMIP intends to protect
(i.e., the locations of the global descriptor and interrupt descriptor
tables, the segment selector of the local descriptor table, the value of
the task register and the contents of the CR0 register).
This emulation is needed because certain applications (e.g., WineHQ and
DOSEMU2) rely on this subset of instructions to function. Given that sldt
and str are not commonly used in programs that run on WineHQ or DOSEMU2,
they are not emulated. Also, emulation is provided only for 32-bit
processes; 64-bit processes that attempt to use the instructions that UMIP
protects will receive the SIGSEGV signal issued as a consequence of the
general protection fault.
The instructions protected by UMIP can be split into two groups: those
which return a kernel memory address (sgdt and sidt) and those which return
a value (sldt, str and smsw).
For the instructions that return a kernel memory address, applications such
as WineHQ rely on the result being located in the kernel memory space, not
the actual location of the table. The result is emulated as a hard-coded
value that lies close to the top of the kernel memory. The limits of the
GDT and the IDT are set to zero.
The instruction smsw is emulated to return the value with which the
register CR0 is programmed in head_32.S at boot time.
Care is taken to appropriately emulate the results when segmentation is
used. That is, rather than relying on USER_DS and USER_CS, the function
insn_get_addr_ref() inspects the segment descriptor pointed to by the
registers in pt_regs. This ensures that we correctly obtain the segment
base address and the address and operand sizes even if the user space
application uses a local descriptor table.
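For illustration only, a hypothetical 32-bit user-space test like the one
below would, with the emulation in place, see sgdt succeed and return the
dummy base and a zero limit instead of faulting:

    /* Hypothetical test, not part of this series; build as a 32-bit binary. */
    #include <stdio.h>
    #include <stdint.h>

    struct table_desc {
            uint16_t limit;
            uint32_t base;
    } __attribute__((packed));

    int main(void)
    {
            struct table_desc gdt;

            /* With UMIP emulation, this no longer raises SIGSEGV. */
            asm volatile("sgdt %0" : "=m" (gdt));

            /* Expect: limit == 0, base == a fixed kernel-range dummy address */
            printf("GDT base: 0x%08x, limit: 0x%04x\n", gdt.base, gdt.limit);
            return 0;
    }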
Cc: Andy Lutomirski <[email protected]>
Cc: Andrew Morton <[email protected]>
Cc: H. Peter Anvin <[email protected]>
Cc: Borislav Petkov <[email protected]>
Cc: Brian Gerst <[email protected]>
Cc: Chen Yucong <[email protected]>
Cc: Chris Metcalf <[email protected]>
Cc: Dave Hansen <[email protected]>
Cc: Fenghua Yu <[email protected]>
Cc: Huang Rui <[email protected]>
Cc: Jiri Slaby <[email protected]>
Cc: Jonathan Corbet <[email protected]>
Cc: Michael S. Tsirkin <[email protected]>
Cc: Paul Gortmaker <[email protected]>
Cc: Peter Zijlstra <[email protected]>
Cc: Ravi V. Shankar <[email protected]>
Cc: Shuah Khan <[email protected]>
Cc: Vlastimil Babka <[email protected]>
Cc: Tony Luck <[email protected]>
Cc: Paolo Bonzini <[email protected]>
Cc: Liang Z. Li <[email protected]>
Cc: [email protected]
Signed-off-by: Ricardo Neri <[email protected]>
---
arch/x86/include/asm/umip.h | 12 ++
arch/x86/kernel/Makefile | 1 +
arch/x86/kernel/umip.c | 303 ++++++++++++++++++++++++++++++++++++++++++++
3 files changed, 316 insertions(+)
create mode 100644 arch/x86/include/asm/umip.h
create mode 100644 arch/x86/kernel/umip.c
diff --git a/arch/x86/include/asm/umip.h b/arch/x86/include/asm/umip.h
new file mode 100644
index 000000000000..db43f2a0d92c
--- /dev/null
+++ b/arch/x86/include/asm/umip.h
@@ -0,0 +1,12 @@
+#ifndef _ASM_X86_UMIP_H
+#define _ASM_X86_UMIP_H
+
+#include <linux/types.h>
+#include <asm/ptrace.h>
+
+#ifdef CONFIG_X86_INTEL_UMIP
+bool fixup_umip_exception(struct pt_regs *regs);
+#else
+static inline bool fixup_umip_exception(struct pt_regs *regs) { return false; }
+#endif /* CONFIG_X86_INTEL_UMIP */
+#endif /* _ASM_X86_UMIP_H */
diff --git a/arch/x86/kernel/Makefile b/arch/x86/kernel/Makefile
index 287eac7d207f..e057e22cd0d4 100644
--- a/arch/x86/kernel/Makefile
+++ b/arch/x86/kernel/Makefile
@@ -125,6 +125,7 @@ obj-$(CONFIG_EFI) += sysfb_efi.o
obj-$(CONFIG_PERF_EVENTS) += perf_regs.o
obj-$(CONFIG_TRACING) += tracepoint.o
obj-$(CONFIG_SCHED_MC_PRIO) += itmt.o
+obj-$(CONFIG_X86_INTEL_UMIP) += umip.o
obj-$(CONFIG_ORC_UNWINDER) += unwind_orc.o
obj-$(CONFIG_FRAME_POINTER_UNWINDER) += unwind_frame.o
diff --git a/arch/x86/kernel/umip.c b/arch/x86/kernel/umip.c
new file mode 100644
index 000000000000..cd3201a7deca
--- /dev/null
+++ b/arch/x86/kernel/umip.c
@@ -0,0 +1,303 @@
+/*
+ * umip.c Emulation for instructions protected by the Intel User-Mode
+ * Instruction Prevention feature
+ *
+ * Copyright (c) 2017, Intel Corporation.
+ * Ricardo Neri <[email protected]>
+ */
+
+#include <linux/uaccess.h>
+#include <asm/umip.h>
+#include <asm/traps.h>
+#include <asm/insn.h>
+#include <asm/insn-eval.h>
+#include <linux/ratelimit.h>
+
+/** DOC: Emulation for User-Mode Instruction Prevention (UMIP)
+ *
+ * The User-Mode Instruction Prevention feature present in recent Intel
+ * processors prevents a group of instructions (sgdt, sidt, sldt, smsw, and str)
+ * from being executed with CPL > 0. If they are attempted, a general protection
+ * fault is issued.
+ *
+ * Rather than relaying the general protection fault caused by the
+ * UMIP-protected instructions to user space (in the form of a SIGSEGV signal),
+ * the fault can be trapped and the result of such instructions emulated with
+ * dummy values. This both preserves the current kernel behavior and avoids
+ * revealing the system resources that UMIP intends to protect (i.e., the
+ * locations of the global descriptor and interrupt descriptor tables, the
+ * segment selector of the local descriptor table, the value of the task
+ * register and the contents of the CR0 register).
+ *
+ * This emulation is needed because certain applications (e.g., WineHQ and
+ * DOSEMU2) rely on this subset of instructions to function.
+ *
+ * The instructions protected by UMIP can be split in two groups. Those which
+ * return a kernel memory address (sgdt and sidt) and those which return a
+ * value (sldt, str and smsw).
+ *
+ * For the instructions that return a kernel memory address, applications
+ * such as WineHQ rely on the result being located in the kernel memory space,
+ * not the actual location of the table. The result is emulated as a hard-coded
+ * value that lies close to the top of the kernel memory. The limits of the GDT
+ * and the IDT are set to zero.
+ *
+ * Given that sldt and str are not commonly used in programs that run on WineHQ
+ * or DOSEMU2, they are not emulated.
+ *
+ * The instruction smsw is emulated to return the value with which the register
+ * CR0 is programmed in head_32.S at boot time.
+ *
+ * Also, emulation is provided only for 32-bit processes; 64-bit processes
+ * that attempt to use the instructions that UMIP protects will receive the
+ * SIGSEGV signal issued as a consequence of the general protection fault.
+ *
+ * Care is taken to appropriately emulate the results when segmentation is
+ * used. That is, rather than relying on USER_DS and USER_CS, the function
+ * insn_get_addr_ref() inspects the segment descriptor pointed to by the
+ * registers in pt_regs. This ensures that we correctly obtain the segment
+ * base address and the address and operand sizes even if the user space
+ * application uses a local descriptor table.
+ */
+
+#define UMIP_DUMMY_GDT_BASE 0xfffe0000
+#define UMIP_DUMMY_IDT_BASE 0xffff0000
+
+/*
+ * The SGDT and SIDT instructions store the contents of the global descriptor
+ * table and interrupt table registers, respectively. The destination is a
+ * memory operand of X+2 bytes. X bytes are used to store the base address of
+ * the table and 2 bytes are used to store the limit. In 32-bit processes, the
+ * only processes for which emulation is provided, X has a value of 4.
+ */
+#define UMIP_GDT_IDT_BASE_SIZE 4
+#define UMIP_GDT_IDT_LIMIT_SIZE 2
+
+#define UMIP_INST_SGDT 0 /* 0F 01 /0 */
+#define UMIP_INST_SIDT 1 /* 0F 01 /1 */
+#define UMIP_INST_SMSW 3 /* 0F 01 /4 */
+
+/**
+ * identify_insn() - Identify a UMIP-protected instruction
+ * @insn: Instruction structure with opcode and ModRM byte.
+ *
+ * From the instruction opcode and the reg part of the ModRM byte, identify,
+ * if any, a UMIP-protected instruction.
+ *
+ * Return: a constant that identifies a specific UMIP-protected instruction.
+ * -EINVAL when not an UMIP-protected instruction.
+ */
+static int identify_insn(struct insn *insn)
+{
+ /* By getting modrm we also get the opcode. */
+ insn_get_modrm(insn);
+
+ /* All the instructions of interest start with 0x0f. */
+ if (insn->opcode.bytes[0] != 0xf)
+ return -EINVAL;
+
+ if (insn->opcode.bytes[1] == 0x1) {
+ switch (X86_MODRM_REG(insn->modrm.value)) {
+ case 0:
+ return UMIP_INST_SGDT;
+ case 1:
+ return UMIP_INST_SIDT;
+ case 4:
+ return UMIP_INST_SMSW;
+ default:
+ return -EINVAL;
+ }
+ }
+ /* SLDT AND STR are not emulated */
+ return -EINVAL;
+}
+
+/**
+ * emulate_umip_insn() - Emulate UMIP instructions with dummy values
+ * @insn: Instruction structure with operands
+ * @umip_inst: Instruction to emulate
+ * @data: Buffer into which the dummy values will be copied
+ * @data_size: Size of the emulated result
+ *
+ * Emulate an instruction protected by UMIP. The result of the emulation
+ * is saved in the provided buffer. The size of the results depends on both
+ * the instruction and type of operand (register vs memory address). Thus,
+ * the size of the result needs to be updated.
+ *
+ * Result: 0 if success, -EINVAL on error while emulating.
+ */
+static int emulate_umip_insn(struct insn *insn, int umip_inst,
+ unsigned char *data, int *data_size)
+{
+ unsigned long dummy_base_addr, dummy_value;
+ unsigned short dummy_limit = 0;
+
+ if (!data || !data_size || !insn)
+ return -EINVAL;
+ /*
+ * These two instructions return the base address and limit of the
+ * global and interrupt descriptor table, respectively. According to the
+ * Intel Software Developer's Manual, the base address can be 24-bit,
+ * 32-bit or 64-bit. The limit is always 16-bit. If the operand size is
+ * 16-bit, the returned value of the base address is supposed to be a
+ * zero-extended 24-bit number. However, it seems that a 32-bit number
+ * is always returned irrespective of the operand size.
+ */
+
+ if (umip_inst == UMIP_INST_SGDT || umip_inst == UMIP_INST_SIDT) {
+ /* SGDT and SIDT do not use register operands. */
+ if (X86_MODRM_MOD(insn->modrm.value) == 3)
+ return -EINVAL;
+
+ if (umip_inst == UMIP_INST_SGDT)
+ dummy_base_addr = UMIP_DUMMY_GDT_BASE;
+ else
+ dummy_base_addr = UMIP_DUMMY_IDT_BASE;
+
+ *data_size = UMIP_GDT_IDT_LIMIT_SIZE + UMIP_GDT_IDT_BASE_SIZE;
+
+ memcpy(data + 2, &dummy_base_addr, UMIP_GDT_IDT_BASE_SIZE);
+ memcpy(data, &dummy_limit, UMIP_GDT_IDT_LIMIT_SIZE);
+
+ } else if (umip_inst == UMIP_INST_SMSW) {
+ dummy_value = CR0_STATE;
+
+ /*
+ * Even though the CR0 register has 4 bytes, the number
+ * of bytes to be copied in the result buffer is determined
+ * by whether the operand is a register or a memory location.
+ * If operand is a register, return as many bytes as the operand
+ * size. If operand is memory, return only the two least
+ * significant bytes of CR0.
+ */
+ if (X86_MODRM_MOD(insn->modrm.value) == 3)
+ *data_size = insn->opnd_bytes;
+ else
+ *data_size = 2;
+
+ memcpy(data, &dummy_value, *data_size);
+ /* STR and SLDT are not emulated */
+ } else {
+ return -EINVAL;
+ }
+
+ return 0;
+}
+
+/**
+ * fixup_umip_exception() - Fixup #GP faults caused by UMIP
+ * @regs: Registers as saved when entering the #GP trap
+ *
+ * The instructions sgdt, sidt, str, smsw, sldt cause a general protection
+ * fault if executed with CPL > 0 (i.e., from user space). If the offending
+ * user-space process is 32-bit, this function fixes the exception up and
+ * provides dummy values for the sgdt, sidt and smsw; str and sldt are not
+ * fixed up. Also 64-bit user-space processes are not fixed up.
+ *
+ * If operands are memory addresses, results are copied to user-
+ * space memory as indicated by the instruction pointed by EIP using the
+ * registers indicated in the instruction operands. If operands are registers,
+ * results are copied into the context that was saved when entering kernel mode.
+ *
+ * Result: true if emulation was successful; false if not.
+ */
+bool fixup_umip_exception(struct pt_regs *regs)
+{
+ int not_copied, nr_copied, reg_offset, dummy_data_size, umip_inst;
+ /* 10 bytes is the maximum size of the result of UMIP instructions */
+ unsigned char dummy_data[10] = { 0 };
+ unsigned long seg_base, *reg_addr;
+ unsigned char buf[MAX_INSN_SIZE];
+ void __user *uaddr;
+ struct insn insn;
+ char seg_defs;
+
+ /* Do not emulate 64-bit processes. */
+ if (user_64bit_mode(regs))
+ return false;
+
+ /*
+ * Use the segment base in case user space used a different code
+ * segment, either in protected (e.g., from an LDT), virtual-8086
+ * or long (via the FS or GS registers) modes. In most of the cases
+ * seg_base will be zero as in USER_CS.
+ */
+ seg_base = insn_get_seg_base(regs, NULL, offsetof(struct pt_regs, ip));
+ if (seg_base == -1L)
+ return false;
+
+ not_copied = copy_from_user(buf, (void __user *)(seg_base + regs->ip),
+ sizeof(buf));
+ nr_copied = sizeof(buf) - not_copied;
+
+ /*
+ * The copy_from_user above could have failed if user code is protected
+ * by a memory protection key. Give up on emulation in such a case.
+ * Should we issue a page fault?
+ */
+ if (!nr_copied)
+ return false;
+
+ insn_init(&insn, buf, nr_copied, user_64bit_mode(regs));
+
+ /*
+ * Override the default operand and address sizes with what is specified
+ * in the code segment descriptor. The instruction decoder only sets
+ * the address size to either 4 or 8 address bytes and does nothing
+ * for the operand bytes. This is OK for most of the cases, but we could
+ * have special cases where, for instance, a 16-bit code segment
+ * descriptor is used.
+ * If there is an address override prefix, the instruction decoder
+ * correctly updates these values, even for 16-bit defaults.
+ */
+ seg_defs = insn_get_code_seg_defaults(regs);
+ if (seg_defs == -EINVAL)
+ return false;
+
+ insn.addr_bytes = (unsigned char)INSN_CODE_SEG_ADDR_SZ(seg_defs);
+ insn.opnd_bytes = (unsigned char)INSN_CODE_SEG_OPND_SZ(seg_defs);
+
+ insn_get_length(&insn);
+ if (nr_copied < insn.length)
+ return false;
+
+ umip_inst = identify_insn(&insn);
+ if (umip_inst < 0)
+ return false;
+
+ if (emulate_umip_insn(&insn, umip_inst, dummy_data, &dummy_data_size))
+ return false;
+
+ /*
+ * If operand is a register, write result to the copy of the register
+ * value that was pushed to the stack when entering into kernel mode.
+ * Upon exit, the value we write will be restored to the actual hardware
+ * register.
+ */
+ if (X86_MODRM_MOD(insn.modrm.value) == 3) {
+ reg_offset = insn_get_modrm_rm_off(&insn, regs);
+
+ /*
+ * Negative values are usually errors. In memory addressing,
+ * the exception is -EDOM. Since we expect a register operand,
+ * all negative values are errors.
+ */
+ if (reg_offset < 0)
+ return false;
+
+ reg_addr = (unsigned long *)((unsigned long)regs + reg_offset);
+ memcpy(reg_addr, dummy_data, dummy_data_size);
+ } else {
+ uaddr = insn_get_addr_ref(&insn, regs);
+ if ((unsigned long)uaddr == -1L)
+ return false;
+
+ nr_copied = copy_to_user(uaddr, dummy_data, dummy_data_size);
+ if (nr_copied > 0)
+ return false;
+ }
+
+ /* increase IP to let the program keep going */
+ regs->ip += insn.length;
+ return true;
+}
--
2.13.0
fixup_umip_exception() will be called from do_general_protection(). If the
former returns false, the latter will issue a SIGSEGV with SEND_SIG_PRIV.
However, when emulation is successful but the emulated result cannot be
copied to user space memory, it is more accurate to issue a SIGSEGV with
SEGV_MAPERR with the offending address. A new function, inspired by
force_sig_info_fault(), is introduced to model the page fault.
Cc: Andy Lutomirski <[email protected]>
Cc: Andrew Morton <[email protected]>
Cc: H. Peter Anvin <[email protected]>
Cc: Borislav Petkov <[email protected]>
Cc: Brian Gerst <[email protected]>
Cc: Chen Yucong <[email protected]>
Cc: Chris Metcalf <[email protected]>
Cc: Dave Hansen <[email protected]>
Cc: Fenghua Yu <[email protected]>
Cc: Huang Rui <[email protected]>
Cc: Jiri Slaby <[email protected]>
Cc: Jonathan Corbet <[email protected]>
Cc: Michael S. Tsirkin <[email protected]>
Cc: Paul Gortmaker <[email protected]>
Cc: Peter Zijlstra <[email protected]>
Cc: Ravi V. Shankar <[email protected]>
Cc: Shuah Khan <[email protected]>
Cc: Vlastimil Babka <[email protected]>
Cc: Tony Luck <[email protected]>
Cc: Paolo Bonzini <[email protected]>
Cc: Liang Z. Li <[email protected]>
Cc: [email protected]
Signed-off-by: Ricardo Neri <[email protected]>
---
arch/x86/kernel/umip.c | 45 +++++++++++++++++++++++++++++++++++++++++++--
1 file changed, 43 insertions(+), 2 deletions(-)
diff --git a/arch/x86/kernel/umip.c b/arch/x86/kernel/umip.c
index cd3201a7deca..6e38b8f5d305 100644
--- a/arch/x86/kernel/umip.c
+++ b/arch/x86/kernel/umip.c
@@ -185,6 +185,41 @@ static int emulate_umip_insn(struct insn *insn, int umip_inst,
}
/**
+ * force_sig_info_umip_fault() - Force a SIGSEGV with SEGV_MAPERR
+ * @addr: Address that caused the signal
+ * @regs: Register set containing the instruction pointer
+ *
+ * Force a SIGSEGV signal with SEGV_MAPERR as the error code. This function is
+ * intended to be used to provide a segmentation fault when the result of the
+ * UMIP emulation could not be copied to the user space memory.
+ *
+ * Return: none
+ */
+static void force_sig_info_umip_fault(void __user *addr, struct pt_regs *regs)
+{
+ siginfo_t info;
+ struct task_struct *tsk = current;
+
+ tsk->thread.cr2 = (unsigned long)addr;
+ tsk->thread.error_code = X86_PF_USER | X86_PF_WRITE;
+ tsk->thread.trap_nr = X86_TRAP_PF;
+
+ info.si_signo = SIGSEGV;
+ info.si_errno = 0;
+ info.si_code = SEGV_MAPERR;
+ info.si_addr = addr;
+ force_sig_info(SIGSEGV, &info, tsk);
+
+ if (!(show_unhandled_signals && unhandled_signal(tsk, SIGSEGV)))
+ return;
+
+ pr_err_ratelimited("%s[%d] umip emulation segfault ip:%lx sp:%lx error:%x in %lx\n",
+ tsk->comm, task_pid_nr(tsk), regs->ip,
+ regs->sp, X86_PF_USER | X86_PF_WRITE,
+ regs->ip);
+}
+
+/**
* fixup_umip_exception() - Fixup #GP faults caused by UMIP
* @regs: Registers as saved when entering the #GP trap
*
@@ -293,8 +328,14 @@ bool fixup_umip_exception(struct pt_regs *regs)
return false;
nr_copied = copy_to_user(uaddr, dummy_data, dummy_data_size);
- if (nr_copied > 0)
- return false;
+ if (nr_copied > 0) {
+ /*
+ * If copy fails, send a signal and tell caller that
+ * fault was fixed up.
+ */
+ force_sig_info_umip_fault(uaddr, regs);
+ return true;
+ }
}
/* increase IP to let the program keep going */
--
2.13.0
User-Mode Instruction Prevention is a security feature present in new
Intel processors that, when enabled, prevents the execution of a subset of
instructions in user mode (CPL > 0). Attempting to execute such
instructions causes a general protection exception.
The subset of instructions comprises:
* SGDT - Store Global Descriptor Table
* SIDT - Store Interrupt Descriptor Table
* SLDT - Store Local Descriptor Table
* SMSW - Store Machine Status Word
* STR - Store Task Register
This feature is also added to the list of disabled-features to allow
a cleaner handling of build-time configuration.
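Roughly, the disabled-features entry lets feature checks be folded to a
compile-time constant when CONFIG_X86_INTEL_UMIP is off. A simplified model
of the idea (these macro and function names are illustrative, not the
kernel's):

    /* Simplified sketch of the disabled-features mechanism; names are made up. */
    #define MY_UMIP_BIT         2                     /* bit within feature word 16 */
    #define MY_DISABLED_MASK16  (1u << MY_UMIP_BIT)   /* non-zero when the option is off */

    /* With the bit in the disabled mask, this folds to a constant 0 at build time. */
    static inline int my_umip_enabled(unsigned int cpuid_word16)
    {
            if (MY_DISABLED_MASK16 & (1u << MY_UMIP_BIT))
                    return 0;
            return !!(cpuid_word16 & (1u << MY_UMIP_BIT));
    }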
Cc: Andy Lutomirski <[email protected]>
Cc: Andrew Morton <[email protected]>
Cc: H. Peter Anvin <[email protected]>
Cc: Borislav Petkov <[email protected]>
Cc: Brian Gerst <[email protected]>
Cc: Chen Yucong <[email protected]>
Cc: Chris Metcalf <[email protected]>
Cc: Dave Hansen <[email protected]>
Cc: Fenghua Yu <[email protected]>
Cc: Huang Rui <[email protected]>
Cc: Jiri Slaby <[email protected]>
Cc: Jonathan Corbet <[email protected]>
Cc: Michael S. Tsirkin <[email protected]>
Cc: Paul Gortmaker <[email protected]>
Cc: Peter Zijlstra <[email protected]>
Cc: Ravi V. Shankar <[email protected]>
Cc: Shuah Khan <[email protected]>
Cc: Vlastimil Babka <[email protected]>
Cc: Tony Luck <[email protected]>
Cc: Paolo Bonzini <[email protected]>
Cc: Liang Z. Li <[email protected]>
Cc: [email protected]
Reviewed-by: Borislav Petkov <[email protected]>
Signed-off-by: Ricardo Neri <[email protected]>
---
arch/x86/include/asm/cpufeatures.h | 1 +
arch/x86/include/asm/disabled-features.h | 8 +++++++-
arch/x86/include/uapi/asm/processor-flags.h | 2 ++
3 files changed, 10 insertions(+), 1 deletion(-)
diff --git a/arch/x86/include/asm/cpufeatures.h b/arch/x86/include/asm/cpufeatures.h
index 42bbbf0f173d..7b1aa7fc8657 100644
--- a/arch/x86/include/asm/cpufeatures.h
+++ b/arch/x86/include/asm/cpufeatures.h
@@ -291,6 +291,7 @@
/* Intel-defined CPU features, CPUID level 0x00000007:0 (ecx), word 16 */
#define X86_FEATURE_AVX512VBMI (16*32+ 1) /* AVX512 Vector Bit Manipulation instructions*/
+#define X86_FEATURE_UMIP (16*32+ 2) /* User Mode Instruction Protection */
#define X86_FEATURE_PKU (16*32+ 3) /* Protection Keys for Userspace */
#define X86_FEATURE_OSPKE (16*32+ 4) /* OS Protection Keys Enable */
#define X86_FEATURE_AVX512_VPOPCNTDQ (16*32+14) /* POPCNT for vectors of DW/QW */
diff --git a/arch/x86/include/asm/disabled-features.h b/arch/x86/include/asm/disabled-features.h
index c10c9128f54e..14d6d5007314 100644
--- a/arch/x86/include/asm/disabled-features.h
+++ b/arch/x86/include/asm/disabled-features.h
@@ -16,6 +16,12 @@
# define DISABLE_MPX (1<<(X86_FEATURE_MPX & 31))
#endif
+#ifdef CONFIG_X86_INTEL_UMIP
+# define DISABLE_UMIP 0
+#else
+# define DISABLE_UMIP (1<<(X86_FEATURE_UMIP & 31))
+#endif
+
#ifdef CONFIG_X86_64
# define DISABLE_VME (1<<(X86_FEATURE_VME & 31))
# define DISABLE_K6_MTRR (1<<(X86_FEATURE_K6_MTRR & 31))
@@ -63,7 +69,7 @@
#define DISABLED_MASK13 0
#define DISABLED_MASK14 0
#define DISABLED_MASK15 0
-#define DISABLED_MASK16 (DISABLE_PKU|DISABLE_OSPKE|DISABLE_LA57)
+#define DISABLED_MASK16 (DISABLE_PKU|DISABLE_OSPKE|DISABLE_LA57|DISABLE_UMIP)
#define DISABLED_MASK17 0
#define DISABLED_MASK_CHECK BUILD_BUG_ON_ZERO(NCAPINTS != 18)
diff --git a/arch/x86/include/uapi/asm/processor-flags.h b/arch/x86/include/uapi/asm/processor-flags.h
index aae1f2aa7563..6ee8425b92d9 100644
--- a/arch/x86/include/uapi/asm/processor-flags.h
+++ b/arch/x86/include/uapi/asm/processor-flags.h
@@ -104,6 +104,8 @@
#define X86_CR4_OSFXSR _BITUL(X86_CR4_OSFXSR_BIT)
#define X86_CR4_OSXMMEXCPT_BIT 10 /* enable unmasked SSE exceptions */
#define X86_CR4_OSXMMEXCPT _BITUL(X86_CR4_OSXMMEXCPT_BIT)
+#define X86_CR4_UMIP_BIT 11 /* enable UMIP support */
+#define X86_CR4_UMIP _BITUL(X86_CR4_UMIP_BIT)
#define X86_CR4_LA57_BIT 12 /* enable 5-level page tables */
#define X86_CR4_LA57 _BITUL(X86_CR4_LA57_BIT)
#define X86_CR4_VMXE_BIT 13 /* enable VMX virtualization */
--
2.13.0
It is possible to utilize 32-bit address encodings in virtual-8086 mode via
an address override instruction prefix. However, the range of the
effective address is still limited to [0x0000, 0xffff]. If an effective
address outside that range is computed, an error is returned.
Also, linear addresses in virtual-8086 mode are limited to 20 bits. Enforce
such a limit by truncating the most significant bits of the computed linear
address.
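A worked example of the truncation, with illustrative values (a segment
base of 0xffff0 plus a small offset overflows 20 bits and wraps):

    #include <stdio.h>

    int main(void)
    {
            unsigned long seg_base = 0xffffUL << 4;       /* 0xffff0 */
            unsigned long eff_addr = 0x10;
            unsigned long linear   = seg_base + eff_addr; /* 0x100000 */

            linear &= 0xfffff;                            /* truncated to 0x00000 */
            printf("linear address: 0x%05lx\n", linear);
            return 0;
    }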
Cc: Dave Hansen <[email protected]>
Cc: Adam Buchbinder <[email protected]>
Cc: Colin Ian King <[email protected]>
Cc: Lorenzo Stoakes <[email protected]>
Cc: Qiaowei Ren <[email protected]>
Cc: Arnaldo Carvalho de Melo <[email protected]>
Cc: Masami Hiramatsu <[email protected]>
Cc: Adrian Hunter <[email protected]>
Cc: Kees Cook <[email protected]>
Cc: Thomas Garnier <[email protected]>
Cc: Peter Zijlstra <[email protected]>
Cc: Borislav Petkov <[email protected]>
Cc: Dmitry Vyukov <[email protected]>
Cc: Ravi V. Shankar <[email protected]>
Cc: [email protected]
Signed-off-by: Ricardo Neri <[email protected]>
---
arch/x86/lib/insn-eval.c | 11 +++++++++++
1 file changed, 11 insertions(+)
diff --git a/arch/x86/lib/insn-eval.c b/arch/x86/lib/insn-eval.c
index 6537b613d0b3..93a6d1f57c2d 100644
--- a/arch/x86/lib/insn-eval.c
+++ b/arch/x86/lib/insn-eval.c
@@ -801,12 +801,23 @@ static void __user *get_addr_ref_32(struct insn *insn, struct pt_regs *regs)
goto out_err;
/*
+ * Even though 32-bit address encodings are allowed in virtual-8086
+ * mode, the address range is still limited to [0x0000, 0xffff].
+ */
+ if (v8086_mode(regs) && (eff_addr & ~0xffff))
+ goto out_err;
+
+ /*
* Data type long could be 64 bits in size. Ensure that our 32-bit
* effective address is not sign-extended when computing the linear
* address.
*/
linear_addr = (unsigned long)(eff_addr & 0xffffffff) + seg_base_addr;
+ /* Limit linear address to 20 bits */
+ if (v8086_mode(regs))
+ linear_addr &= 0xfffff;
+
return (void __user *)linear_addr;
out_err:
return (void __user *)-1L;
--
2.13.0
Tasks running in virtual-8086 mode, in protected mode with code segment
descriptors that specify 16-bit default address sizes via the
D bit, or via an address override prefix will use 16-bit addressing form
encodings as described in the Intel 64 and IA-32 Architecture Software
Developer's Manual Volume 2A Section 2.1.5, Table 2-1.
16-bit addressing encodings differ in several ways from the 32-bit/64-bit
addressing form encodings: ModRM.rm points to different registers and, in
some cases, effective addresses are indicated by the addition of the value
of two registers. Also, there is no support for SIB bytes. Thus, a
separate function is needed to parse this form of addressing.
A couple of functions are introduced. get_reg_offset_16() obtains the
offset from the base of pt_regs of the registers indicated by the ModRM
byte of the address encoding. get_addr_ref_16() computes the linear
address indicated by the instructions using the value of the registers
given by ModRM and the base address of the applicable segment.
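For illustration (values chosen arbitrarily), an encoding such as
mov -4(%bx,%si), %ax adds BX, SI and the signed displacement, with the sum
wrapping at 16 bits:

    #include <stdio.h>
    #include <stdint.h>

    int main(void)
    {
            uint16_t bx = 0x1000, si = 0x0020;
            int16_t  disp = -4;                 /* displacement bytes are signed */

            uint16_t eff_addr = bx + si + disp; /* 0x101c; wraps at 16 bits */
            printf("effective address: 0x%04x\n", eff_addr);
            return 0;
    }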
Cc: Dave Hansen <[email protected]>
Cc: Adam Buchbinder <[email protected]>
Cc: Colin Ian King <[email protected]>
Cc: Lorenzo Stoakes <[email protected]>
Cc: Qiaowei Ren <[email protected]>
Cc: Arnaldo Carvalho de Melo <[email protected]>
Cc: Masami Hiramatsu <[email protected]>
Cc: Adrian Hunter <[email protected]>
Cc: Kees Cook <[email protected]>
Cc: Thomas Garnier <[email protected]>
Cc: Peter Zijlstra <[email protected]>
Cc: Borislav Petkov <[email protected]>
Cc: Dmitry Vyukov <[email protected]>
Cc: Ravi V. Shankar <[email protected]>
Cc: [email protected]
Signed-off-by: Ricardo Neri <[email protected]>
---
arch/x86/lib/insn-eval.c | 171 +++++++++++++++++++++++++++++++++++++++++++++++
1 file changed, 171 insertions(+)
diff --git a/arch/x86/lib/insn-eval.c b/arch/x86/lib/insn-eval.c
index 93a6d1f57c2d..6abe46aed6fd 100644
--- a/arch/x86/lib/insn-eval.c
+++ b/arch/x86/lib/insn-eval.c
@@ -414,6 +414,78 @@ static int get_reg_offset(struct insn *insn, struct pt_regs *regs,
}
/**
+ * get_reg_offset_16() - Obtain offset of register indicated by instruction
+ * @insn: Instruction structure containing ModRM and SIB bytes
+ * @regs: Structure with register values as seen when entering kernel mode
+ * @offs1: Offset of the first operand register
+ * @offs2: Offset of the second operand register, if applicable
+ *
+ * Obtain the offset, in pt_regs, of the registers indicated by the ModRM byte
+ * within insn. This function is to be used with 16-bit address encodings. The
+ * offs1 and offs2 will be written with the offset of the two registers
+ * indicated by the instruction. In cases where any of the registers is not
+ * referenced by the instruction, the value will be set to -EDOM.
+ *
+ * Return: 0 on success, -EINVAL on failure.
+ */
+static int get_reg_offset_16(struct insn *insn, struct pt_regs *regs,
+ int *offs1, int *offs2)
+{
+ /*
+ * 16-bit addressing can use one or two registers. Specifics of
+ * encodings are given in Table 2-1. "16-Bit Addressing Forms with the
+ * ModR/M Byte" of the Intel Software Development Manual.
+ */
+ static const int regoff1[] = {
+ offsetof(struct pt_regs, bx),
+ offsetof(struct pt_regs, bx),
+ offsetof(struct pt_regs, bp),
+ offsetof(struct pt_regs, bp),
+ offsetof(struct pt_regs, si),
+ offsetof(struct pt_regs, di),
+ offsetof(struct pt_regs, bp),
+ offsetof(struct pt_regs, bx),
+ };
+
+ static const int regoff2[] = {
+ offsetof(struct pt_regs, si),
+ offsetof(struct pt_regs, di),
+ offsetof(struct pt_regs, si),
+ offsetof(struct pt_regs, di),
+ -EDOM,
+ -EDOM,
+ -EDOM,
+ -EDOM,
+ };
+
+ if (!offs1 || !offs2)
+ return -EINVAL;
+
+ /* Operand is a register, use the generic function. */
+ if (X86_MODRM_MOD(insn->modrm.value) == 3) {
+ *offs1 = insn_get_modrm_rm_off(insn, regs);
+ *offs2 = -EDOM;
+ return 0;
+ }
+
+ *offs1 = regoff1[X86_MODRM_RM(insn->modrm.value)];
+ *offs2 = regoff2[X86_MODRM_RM(insn->modrm.value)];
+
+ /*
+ * If ModRM.mod is 0 and ModRM.rm is 110b, then we use displacement-
+ * only addressing. This means that no registers are involved in
+ * computing the effective address. Thus, ensure that the first
+ * register offset is invalid. The second register offset is already
+ * invalid under the aforementioned conditions.
+ */
+ if ((X86_MODRM_MOD(insn->modrm.value) == 0) &&
+ (X86_MODRM_RM(insn->modrm.value) == 6))
+ *offs1 = -EDOM;
+
+ return 0;
+}
+
+/**
* get_desc() - Obtain address of segment descriptor
* @sel: Segment selector
*
@@ -666,6 +738,103 @@ int insn_get_modrm_rm_off(struct insn *insn, struct pt_regs *regs)
}
/**
+ * get_addr_ref_16() - Obtain the 16-bit address referred by instruction
+ * @insn: Instruction structure containing ModRM byte and displacement
+ * @regs: Structure with register values as seen when entering kernel mode
+ *
+ * This function is to be used with 16-bit address encodings. Obtain the memory
+ * address referred by the instruction's ModRM and displacement bytes. Also, the
+ * segment used as base is determined by either any segment override prefixes in
+ * insn or the default segment of the registers involved in the address
+ * computation. In protected mode, segment limits are enforced.
+ *
+ * Return: linear address referenced by instruction and registers on success.
+ * -1L on error.
+ */
+static void __user *get_addr_ref_16(struct insn *insn, struct pt_regs *regs)
+{
+ unsigned long linear_addr, seg_base_addr, seg_limit;
+ short eff_addr, addr1 = 0, addr2 = 0;
+ int addr_offset1, addr_offset2;
+ int ret;
+
+ insn_get_modrm(insn);
+ insn_get_displacement(insn);
+
+ if (insn->addr_bytes != 2)
+ goto out_err;
+
+ /*
+ * If operand is a register, the layout is the same as in
+ * 32-bit and 64-bit addressing.
+ */
+ if (X86_MODRM_MOD(insn->modrm.value) == 3) {
+ addr_offset1 = get_reg_offset(insn, regs, REG_TYPE_RM);
+ if (addr_offset1 < 0)
+ goto out_err;
+
+ eff_addr = regs_get_register(regs, addr_offset1);
+
+ seg_base_addr = insn_get_seg_base(regs, insn, addr_offset1);
+ if (seg_base_addr == -1L)
+ goto out_err;
+
+ seg_limit = get_seg_limit(regs, insn, addr_offset1);
+ } else {
+ ret = get_reg_offset_16(insn, regs, &addr_offset1,
+ &addr_offset2);
+ if (ret < 0)
+ goto out_err;
+
+ /*
+ * Don't fail on invalid offset values. They might be invalid
+ * because they cannot be used for this particular value of
+ * the ModRM. Instead, use them in the computation only if
+ * they contain a valid value.
+ */
+ if (addr_offset1 != -EDOM)
+ addr1 = 0xffff & regs_get_register(regs, addr_offset1);
+ if (addr_offset2 != -EDOM)
+ addr2 = 0xffff & regs_get_register(regs, addr_offset2);
+
+ eff_addr = addr1 + addr2;
+
+ /*
+ * The first operand register could indicate the use of either SS
+ * or DS registers to obtain the segment selector. The second
+ * operand register can only indicate the use of DS. Thus, use
+ * the first register to obtain the segment selector.
+ */
+ seg_base_addr = insn_get_seg_base(regs, insn, addr_offset1);
+ if (seg_base_addr == -1L)
+ goto out_err;
+
+ seg_limit = get_seg_limit(regs, insn, addr_offset1);
+
+ eff_addr += (insn->displacement.value & 0xffff);
+ }
+
+ /*
+ * Before computing the linear address, make sure the effective address
+ * is within the limits of the segment. In virtual-8086 mode, segment
+ * limits are not enforced. In such a case, the segment limit is -1L to
+ * reflect this fact.
+ */
+ if ((unsigned long)(eff_addr & 0xffff) > seg_limit)
+ goto out_err;
+
+ linear_addr = (unsigned long)(eff_addr & 0xffff) + seg_base_addr;
+
+ /* Limit linear address to 20 bits */
+ if (v8086_mode(regs))
+ linear_addr &= 0xfffff;
+
+ return (void __user *)linear_addr;
+out_err:
+ return (void __user *)-1L;
+}
+
+/**
* get_addr_ref_32() - Obtain a 32-bit linear address
* @insn: Instruction struct with ModRM and SIB bytes and displacement
* @regs: Structure with register values as seen when entering kernel mode
@@ -946,6 +1115,8 @@ static void __user *get_addr_ref_64(struct insn *insn, struct pt_regs *regs)
void __user *insn_get_addr_ref(struct insn *insn, struct pt_regs *regs)
{
switch (insn->addr_bytes) {
+ case 2:
+ return get_addr_ref_16(insn, regs);
case 4:
return get_addr_ref_32(insn, regs);
case 8:
--
2.13.0
The function insn_get_addr_ref() is capable of handling only 64-bit
addresses. A previous commit introduced a function to handle 32-bit
addresses. Invoke these two functions from a third wrapper function that
calls the appropriate routine based on the address size specified in the
instruction structure (obtained by looking at the code segment default
address size and the address override prefix, if present).
While doing this, rename the original function insn_get_addr_ref() to the
more appropriate name get_addr_ref_64(), and ensure it is only used for
64-bit addresses and returns a 64-bit error value.
Also, since 64-bit addresses are not possible in 32-bit builds, provide a
dummy function for such a case.
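A sketch of how a caller would use the wrapper after this change, mirroring
what the UMIP fix-up code elsewhere in this series does (resolve_operand()
is a hypothetical helper; insn_init(), insn_get_length(),
insn_get_addr_ref() and user_64bit_mode() are the existing interfaces):

    /* Illustrative caller, not an additional patch. */
    #include <asm/insn.h>
    #include <asm/insn-eval.h>
    #include <asm/ptrace.h>

    static void __user *resolve_operand(struct insn *insn, struct pt_regs *regs,
                                        unsigned char *buf, int nr_copied)
    {
            insn_init(insn, buf, nr_copied, user_64bit_mode(regs));
            insn_get_length(insn);

            /* insn->addr_bytes selects the 16-, 32- or 64-bit resolver internally. */
            return insn_get_addr_ref(insn, regs);
    }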
Cc: Dave Hansen <[email protected]>
Cc: Adam Buchbinder <[email protected]>
Cc: Colin Ian King <[email protected]>
Cc: Lorenzo Stoakes <[email protected]>
Cc: Qiaowei Ren <[email protected]>
Cc: Arnaldo Carvalho de Melo <[email protected]>
Cc: Masami Hiramatsu <[email protected]>
Cc: Adrian Hunter <[email protected]>
Cc: Kees Cook <[email protected]>
Cc: Thomas Garnier <[email protected]>
Cc: Peter Zijlstra <[email protected]>
Cc: Borislav Petkov <[email protected]>
Cc: Dmitry Vyukov <[email protected]>
Cc: Ravi V. Shankar <[email protected]>
Cc: [email protected]
Signed-off-by: Ricardo Neri <[email protected]>
---
arch/x86/lib/insn-eval.c | 53 ++++++++++++++++++++++++++++++++++++++++++------
1 file changed, 47 insertions(+), 6 deletions(-)
diff --git a/arch/x86/lib/insn-eval.c b/arch/x86/lib/insn-eval.c
index 6730c9ba02c5..6537b613d0b3 100644
--- a/arch/x86/lib/insn-eval.c
+++ b/arch/x86/lib/insn-eval.c
@@ -812,12 +812,25 @@ static void __user *get_addr_ref_32(struct insn *insn, struct pt_regs *regs)
return (void __user *)-1L;
}
-/*
- * return the address being referenced be instruction
- * for rm=3 returning the content of the rm reg
- * for rm!=3 calculates the address using SIB and Disp
+/**
+ * get_addr_ref_64() - Obtain a 64-bit linear address
+ * @insn: Instruction struct with ModRM and SIB bytes and displacement
+ * @regs: Structure with register values as seen when entering kernel mode
+ *
+ * This function is to be used with 64-bit address encodings to obtain the
+ * linear memory address referred by the instruction's ModRM, SIB,
+ * displacement bytes and segment base address, as applicable.
+ *
+ * Return: linear address referenced by instruction and registers on success.
+ * -1L on error.
*/
-void __user *insn_get_addr_ref(struct insn *insn, struct pt_regs *regs)
+#ifndef CONFIG_X86_64
+static void __user *get_addr_ref_64(struct insn *insn, struct pt_regs *regs)
+{
+ return (void __user *)-1L;
+}
+#else
+static void __user *get_addr_ref_64(struct insn *insn, struct pt_regs *regs)
{
int addr_offset, base_offset, indx_offset;
unsigned long linear_addr, seg_base_addr;
@@ -828,6 +841,9 @@ void __user *insn_get_addr_ref(struct insn *insn, struct pt_regs *regs)
insn_get_sib(insn);
sib = insn->sib.value;
+ if (insn->addr_bytes != 8)
+ goto out_err;
+
if (X86_MODRM_MOD(insn->modrm.value) == 3) {
addr_offset = get_reg_offset(insn, regs, REG_TYPE_RM);
if (addr_offset < 0)
@@ -900,5 +916,30 @@ void __user *insn_get_addr_ref(struct insn *insn, struct pt_regs *regs)
return (void __user *)linear_addr;
out_err:
- return (void __user *)-1;
+ return (void __user *)-1L;
+}
+#endif /* CONFIG_X86_64 */
+
+/**
+ * insn_get_addr_ref() - Obtain the linear address referred by instruction
+ * @insn: Instruction structure containing ModRM byte and displacement
+ * @regs: Structure with register values as seen when entering kernel mode
+ *
+ * Obtain the linear address referred by the instruction's ModRM, SIB and
+ * displacement bytes, and segment base, as applicable. In protected mode,
+ * segment limits are enforced.
+ *
+ * Return: linear address referenced by instruction and registers on success.
+ * -1L on error.
+ */
+void __user *insn_get_addr_ref(struct insn *insn, struct pt_regs *regs)
+{
+ switch (insn->addr_bytes) {
+ case 4:
+ return get_addr_ref_32(insn, regs);
+ case 8:
+ return get_addr_ref_64(insn, regs);
+ default:
+ return (void __user *)-1L;
+ }
}
--
2.13.0
Section 2.2.1.3 of the Intel 64 and IA-32 Architectures Software
Developer's Manual volume 2A states that when ModRM.mod is zero and
ModRM.rm is 101b, a 32-bit displacement follows the ModRM byte. This means
that none of the registers are used in the computation of the effective
address. A return value of -EDOM indicates callers that they should not
use the value of registers when computing the effective address for the
instruction.
In long mode, the effective address is given by the 32-bit displacement
plus the location of the next instruction. In protected mode, only the
displacement is used.
The instruction decoder takes care of obtaining the displacement.
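A worked example with illustrative numbers: for a 7-byte instruction at
IP 0x400000 carrying a displacement of 0x100, the 64-bit effective address
is the address of the next instruction plus the displacement:

    #include <stdio.h>

    int main(void)
    {
            unsigned long ip     = 0x400000; /* IP of the faulting instruction */
            int           length = 7;        /* instruction length from the decoder */
            long          disp   = 0x100;    /* sign-extended 32-bit displacement */

            long eff_addr = (long)ip + length + disp; /* 0x400107 */
            printf("effective address: 0x%lx\n", (unsigned long)eff_addr);
            return 0;
    }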
Cc: Dave Hansen <[email protected]>
Cc: Adam Buchbinder <[email protected]>
Cc: Colin Ian King <[email protected]>
Cc: Lorenzo Stoakes <[email protected]>
Cc: Qiaowei Ren <[email protected]>
Cc: Arnaldo Carvalho de Melo <[email protected]>
Cc: Masami Hiramatsu <[email protected]>
Cc: Adrian Hunter <[email protected]>
Cc: Kees Cook <[email protected]>
Cc: Thomas Garnier <[email protected]>
Cc: Peter Zijlstra <[email protected]>
Cc: Borislav Petkov <[email protected]>
Cc: Dmitry Vyukov <[email protected]>
Cc: Ravi V. Shankar <[email protected]>
Cc: [email protected]
Signed-off-by: Ricardo Neri <[email protected]>
---
arch/x86/lib/insn-eval.c | 26 +++++++++++++++++++++++---
1 file changed, 23 insertions(+), 3 deletions(-)
diff --git a/arch/x86/lib/insn-eval.c b/arch/x86/lib/insn-eval.c
index a8e12bd0aecd..04f696c3793e 100644
--- a/arch/x86/lib/insn-eval.c
+++ b/arch/x86/lib/insn-eval.c
@@ -360,6 +360,14 @@ static int get_reg_offset(struct insn *insn, struct pt_regs *regs,
switch (type) {
case REG_TYPE_RM:
regno = X86_MODRM_RM(insn->modrm.value);
+
+ /*
+ * ModRM.mod == 0 and ModRM.rm == 5 means a 32-bit displacement
+ * follows the ModRM byte.
+ */
+ if (!X86_MODRM_MOD(insn->modrm.value) && regno == 5)
+ return -EDOM;
+
if (X86_REX_B(insn->rex_prefix.value))
regno += 8;
break;
@@ -706,10 +714,22 @@ void __user *insn_get_addr_ref(struct insn *insn, struct pt_regs *regs)
eff_addr = base + indx * (1 << X86_SIB_SCALE(sib));
} else {
addr_offset = get_reg_offset(insn, regs, REG_TYPE_RM);
- if (addr_offset < 0)
- goto out_err;
- eff_addr = regs_get_register(regs, addr_offset);
+ /*
+ * -EDOM means that we must ignore the addr_offset.
+ * In such a case, in 64-bit mode the effective address is
+ * relative to the RIP of the following instruction.
+ */
+ if (addr_offset == -EDOM) {
+ if (user_64bit_mode(regs))
+ eff_addr = (long)regs->ip + insn->length;
+ else
+ eff_addr = 0;
+ } else if (addr_offset < 0) {
+ goto out_err;
+ } else {
+ eff_addr = regs_get_register(regs, addr_offset);
+ }
}
eff_addr += insn->displacement.value;
--
2.13.0
With segmentation, the base address of the segment is needed to compute a
linear address. This base address is obtained from the applicable segment
descriptor. Such segment descriptor is referenced from a segment selector.
The segment selector is stored in the segment register associated with
operands in the instruction being executed or indicated in the instruction
prefixes. Thus, both a structure containing the instruction and its
prefixes and the register operand (specified as the offset from the
base of pt_regs) are given as inputs to the new function
insn_get_seg_base(), which retrieves the base address indicated in the
segment descriptor.
The logic to obtain the segment selector is wrapped in the function
get_seg_selector() with the inputs described above. Once the selector is
known, the base address is determined. In protected mode, the selector is
used to obtain the segment descriptor and then its base address. In 64-bit
user mode, the segment base address is zero except when FS or GS are used.
In virtual-8086 mode, the base address is computed as the value of the
segment selector shifted 4 positions to the left.
In protected mode, segment limits are enforced. Thus, a function to
determine the limit of the segment is added. Segment limits are not
enforced in long or virtual-8086 modes. For the latter, addresses are
limited to 20 bits; the address size will be handled when computing the
linear address.
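As a quick illustration of the virtual-8086 case (the protected-mode path
instead walks the GDT or LDT to read the descriptor base):

    #include <stdio.h>
    #include <stdint.h>

    int main(void)
    {
            uint16_t sel = 0x1234;                         /* segment selector */
            unsigned long base = (unsigned long)sel << 4;  /* 0x12340 in virtual-8086 mode */

            printf("segment base: 0x%05lx\n", base);
            return 0;
    }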
Cc: Dave Hansen <[email protected]>
Cc: Adam Buchbinder <[email protected]>
Cc: Colin Ian King <[email protected]>
Cc: Lorenzo Stoakes <[email protected]>
Cc: Qiaowei Ren <[email protected]>
Cc: Arnaldo Carvalho de Melo <[email protected]>
Cc: Masami Hiramatsu <[email protected]>
Cc: Adrian Hunter <[email protected]>
Cc: Kees Cook <[email protected]>
Cc: Thomas Garnier <[email protected]>
Cc: Peter Zijlstra <[email protected]>
Cc: Borislav Petkov <[email protected]>
Cc: Dmitry Vyukov <[email protected]>
Cc: Ravi V. Shankar <[email protected]>
Cc: [email protected]
Signed-off-by: Ricardo Neri <[email protected]>
---
arch/x86/include/asm/insn-eval.h | 2 +
arch/x86/lib/insn-eval.c | 127 +++++++++++++++++++++++++++++++++++++++
2 files changed, 129 insertions(+)
diff --git a/arch/x86/include/asm/insn-eval.h b/arch/x86/include/asm/insn-eval.h
index 7e8c9633a377..7f3c7fe72cd0 100644
--- a/arch/x86/include/asm/insn-eval.h
+++ b/arch/x86/include/asm/insn-eval.h
@@ -13,5 +13,7 @@
void __user *insn_get_addr_ref(struct insn *insn, struct pt_regs *regs);
int insn_get_modrm_rm_off(struct insn *insn, struct pt_regs *regs);
+unsigned long insn_get_seg_base(struct pt_regs *regs, struct insn *insn,
+ int regoff);
#endif /* _ASM_X86_INSN_EVAL_H */
diff --git a/arch/x86/lib/insn-eval.c b/arch/x86/lib/insn-eval.c
index 9cf2c49afc15..2c5e7081957d 100644
--- a/arch/x86/lib/insn-eval.c
+++ b/arch/x86/lib/insn-eval.c
@@ -457,6 +457,133 @@ static struct desc_struct *get_desc(unsigned short sel)
}
/**
+ * insn_get_seg_base() - Obtain base address of segment descriptor.
+ * @regs: Structure with register values as seen when entering kernel mode
+ * @insn: Instruction structure with selector override prefixes
+ * @regoff: Operand offset, in pt_regs, of which the selector is needed
+ *
+ * Obtain the base address of the segment descriptor as indicated by either
+ * any segment override prefixes contained in insn or the default segment
+ * applicable to the register indicated by regoff. regoff is specified as the
+ * offset in bytes from the base of pt_regs.
+ *
+ * Return: In protected mode, base address of the segment. Zero in long mode,
+ * except when FS or GS are used. In virtual-8086 mode, the segment
+ * selector shifted 4 positions to the left. -1L in case of
+ * error.
+ */
+unsigned long insn_get_seg_base(struct pt_regs *regs, struct insn *insn,
+ int regoff)
+{
+ struct desc_struct *desc;
+ int seg_reg;
+ short sel;
+
+ seg_reg = resolve_seg_register(insn, regs, regoff);
+ if (seg_reg < 0)
+ return -1L;
+
+ sel = get_segment_selector(regs, seg_reg);
+ if (sel < 0)
+ return -1L;
+
+ if (v8086_mode(regs))
+ /*
+ * Base is simply the segment selector shifted 4
+ * positions to the left.
+ */
+ return (unsigned long)(sel << 4);
+
+ if (user_64bit_mode(regs)) {
+ /*
+ * Only FS or GS will have a base address, the rest of
+ * the segments' bases are forced to 0.
+ */
+ unsigned long base;
+
+ if (seg_reg == INAT_SEG_REG_FS)
+ rdmsrl(MSR_FS_BASE, base);
+ else if (seg_reg == INAT_SEG_REG_GS)
+ /*
+ * swapgs was called at the kernel entry point. Thus,
+ * MSR_KERNEL_GS_BASE will have the user-space GS base.
+ */
+ rdmsrl(MSR_KERNEL_GS_BASE, base);
+ else if (seg_reg != INAT_SEG_REG_IGNORE)
+ /* We should ignore the rest of segment registers. */
+ base = -1L;
+ else
+ base = 0;
+ return base;
+ }
+
+ /* In protected mode the segment selector cannot be null. */
+ if (!sel)
+ return -1L;
+
+ desc = get_desc(sel);
+ if (!desc)
+ return -1L;
+
+ return get_desc_base(desc);
+}
+
+/**
+ * get_seg_limit() - Obtain the limit of a segment descriptor
+ * @regs: Structure with register values as seen when entering kernel mode
+ * @insn: Instruction structure with selector override prefixes
+ * @regoff: Operand offset, in pt_regs, for which the segment selector is needed
+ *
+ * Obtain the limit of the segment descriptor. The segment selector is obtained
+ * from the relevant segment register determined by inspecting any segment
+ * override prefixes or the default segment register associated with regoff.
+ * regoff is specified as the offset in bytes from the base of pt_regs.
+ *
+ * Return: In protected mode, the limit of the segment descriptor in bytes.
+ * In long mode and virtual-8086 mode, segment limits are not enforced. Thus,
+ * limit is returned as -1L to imply a limit-less segment. Zero is returned on
+ * error.
+ */
+static unsigned long get_seg_limit(struct pt_regs *regs, struct insn *insn,
+ int regoff)
+{
+ struct desc_struct *desc;
+ unsigned long limit;
+ int seg_reg;
+ short sel;
+
+ seg_reg = resolve_seg_register(insn, regs, regoff);
+ if (seg_reg < 0)
+ return 0;
+
+ sel = get_segment_selector(regs, seg_reg);
+ if (sel < 0)
+ return 0;
+
+ if (user_64bit_mode(regs) || v8086_mode(regs))
+ return -1L;
+
+ if (!sel)
+ return 0;
+
+ desc = get_desc(sel);
+ if (!desc)
+ return 0;
+
+ /*
+ * If the granularity bit is set, the limit is given in multiples
+ * of 4096. This also means that the 12 least significant bits are
+ * not tested when checking the segment limits. In practice,
+ * this means that the segment ends in (limit << 12) + 0xfff.
+ */
+ limit = get_desc_limit(desc);
+ if (desc->g)
+ limit = (limit << 12) + 0xfff;
+
+ return limit;
+}
+
+/**
* insn_get_modrm_rm_off() - Obtain register in r/m part of ModRM byte
* @insn: Instruction structure containing the ModRM byte
* @regs: Structure with register values as seen when entering kernel mode
--
2.13.0
The segment descriptor contains information that is relevant to how linear
addresses need to be computed. It contains the default size of addresses
as well as the base address of the segment. Thus, given a segment
selector, we ought to look at the segment descriptor to correctly calculate the
linear address.
In protected mode, the segment selector might indicate a segment
descriptor from either the global descriptor table or a local descriptor
table. Both cases are considered in this function.
This function is a prerequisite for functions in subsequent commits that
will obtain the aforementioned attributes of the segment descriptor.
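As a rough illustration (not part of this patch) of the selector layout that
get_desc() relies on, using a hypothetical selector value:
	unsigned short sel = 0x2b;		/* example value only */
	unsigned int rpl   = sel & 0x3;		/* requested privilege level */
	unsigned int ti    = (sel >> 2) & 0x1;	/* table indicator: 0 GDT, 1 LDT */
	unsigned int index = sel >> 3;		/* index into the GDT or LDT */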
Cc: Dave Hansen <[email protected]>
Cc: Adam Buchbinder <[email protected]>
Cc: Colin Ian King <[email protected]>
Cc: Lorenzo Stoakes <[email protected]>
Cc: Qiaowei Ren <[email protected]>
Cc: Arnaldo Carvalho de Melo <[email protected]>
Cc: Masami Hiramatsu <[email protected]>
Cc: Adrian Hunter <[email protected]>
Cc: Kees Cook <[email protected]>
Cc: Thomas Garnier <[email protected]>
Cc: Peter Zijlstra <[email protected]>
Cc: Borislav Petkov <[email protected]>
Cc: Dmitry Vyukov <[email protected]>
Cc: Ravi V. Shankar <[email protected]>
Cc: [email protected]
Signed-off-by: Ricardo Neri <[email protected]>
---
arch/x86/lib/insn-eval.c | 55 ++++++++++++++++++++++++++++++++++++++++++++++++
1 file changed, 55 insertions(+)
diff --git a/arch/x86/lib/insn-eval.c b/arch/x86/lib/insn-eval.c
index 86f58ce6c302..9cf2c49afc15 100644
--- a/arch/x86/lib/insn-eval.c
+++ b/arch/x86/lib/insn-eval.c
@@ -6,9 +6,13 @@
#include <linux/kernel.h>
#include <linux/string.h>
#include <linux/ratelimit.h>
+#include <linux/mmu_context.h>
+#include <asm/desc_defs.h>
+#include <asm/desc.h>
#include <asm/inat.h>
#include <asm/insn.h>
#include <asm/insn-eval.h>
+#include <asm/ldt.h>
#include <asm/vm86.h>
enum reg_type {
@@ -402,6 +406,57 @@ static int get_reg_offset(struct insn *insn, struct pt_regs *regs,
}
/**
+ * get_desc() - Obtain address of segment descriptor
+ * @sel: Segment selector
+ *
+ * Given a segment selector, obtain a pointer to the segment descriptor.
+ * Both global and local descriptor tables are supported.
+ *
+ * Return: pointer to segment descriptor on success. NULL on error.
+ */
+static struct desc_struct *get_desc(unsigned short sel)
+{
+ struct desc_ptr gdt_desc = {0, 0};
+ struct desc_struct *desc = NULL;
+ unsigned long desc_base;
+
+#ifdef CONFIG_MODIFY_LDT_SYSCALL
+ if ((sel & SEGMENT_TI_MASK) == SEGMENT_LDT) {
+ /* Bits [15:3] contain the index of the desired entry. */
+ sel >>= 3;
+
+ mutex_lock(¤t->active_mm->context.lock);
+ /* The size of the LDT refers to the number of entries. */
+ if (!current->active_mm->context.ldt ||
+ sel >= current->active_mm->context.ldt->nr_entries) {
+ mutex_unlock(¤t->active_mm->context.lock);
+ return NULL;
+ }
+
+ desc = ¤t->active_mm->context.ldt->entries[sel];
+ mutex_unlock(¤t->active_mm->context.lock);
+ return desc;
+ }
+#endif
+ native_store_gdt(&gdt_desc);
+
+ /*
+ * Segment descriptors have a size of 8 bytes. Thus, the index is
+ * multiplied by 8 to obtain the memory offset of the desired descriptor
+ * from the base of the GDT. As bits [15:3] of the segment selector
+ * contain the index, it can be regarded as multiplied by 8 already.
+ * All that remains is to clear bits [2:0].
+ */
+ desc_base = sel & ~(SEGMENT_RPL_MASK | SEGMENT_TI_MASK);
+
+ if (desc_base > gdt_desc.size)
+ return NULL;
+
+ desc = (struct desc_struct *)(gdt_desc.address + desc_base);
+ return desc;
+}
+
+/**
* insn_get_modrm_rm_off() - Obtain register in r/m part of ModRM byte
* @insn: Instruction structure containing the ModRM byte
* @regs: Structure with register values as seen when entering kernel mode
--
2.13.0
Obtain the default values of the address and operand sizes as specified in
the D and L bits of the segment descriptor selected by the register
CS. The function can be used for both protected and long modes.
For virtual-8086 mode, the default address and operand sizes are always 2
bytes.
The returned parameters are encoded in a signed 8-bit data type. Auxiliary
macros are provided to encode and decode such values.
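A brief sketch (not part of this patch) of how a caller might consume the
encoded value using the new macros; the assignment targets are illustrative:
	char seg_defs;
	seg_defs = insn_get_code_seg_defaults(regs);
	if (seg_defs == -EINVAL)
		return false;
	/* e.g., 8 and 4 when CS.L=1, CS.D=0 (64-bit code segment) */
	insn->addr_bytes = INSN_CODE_SEG_ADDR_SZ(seg_defs);
	insn->opnd_bytes = INSN_CODE_SEG_OPND_SZ(seg_defs);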
Cc: Dave Hansen <[email protected]>
Cc: Adam Buchbinder <[email protected]>
Cc: Colin Ian King <[email protected]>
Cc: Lorenzo Stoakes <[email protected]>
Cc: Qiaowei Ren <[email protected]>
Cc: Arnaldo Carvalho de Melo <[email protected]>
Cc: Masami Hiramatsu <[email protected]>
Cc: Adrian Hunter <[email protected]>
Cc: Kees Cook <[email protected]>
Cc: Thomas Garnier <[email protected]>
Cc: Peter Zijlstra <[email protected]>
Cc: Borislav Petkov <[email protected]>
Cc: Dmitry Vyukov <[email protected]>
Cc: Ravi V. Shankar <[email protected]>
Cc: [email protected]
Signed-off-by: Ricardo Neri <[email protected]>
---
arch/x86/include/asm/insn-eval.h | 5 ++++
arch/x86/lib/insn-eval.c | 59 ++++++++++++++++++++++++++++++++++++++++
2 files changed, 64 insertions(+)
diff --git a/arch/x86/include/asm/insn-eval.h b/arch/x86/include/asm/insn-eval.h
index 7f3c7fe72cd0..e8c3e7cd1673 100644
--- a/arch/x86/include/asm/insn-eval.h
+++ b/arch/x86/include/asm/insn-eval.h
@@ -11,9 +11,14 @@
#include <linux/err.h>
#include <asm/ptrace.h>
+#define INSN_CODE_SEG_ADDR_SZ(params) ((params >> 4) & 0xf)
+#define INSN_CODE_SEG_OPND_SZ(params) (params & 0xf)
+#define INSN_CODE_SEG_PARAMS(oper_sz, addr_sz) (oper_sz | (addr_sz << 4))
+
void __user *insn_get_addr_ref(struct insn *insn, struct pt_regs *regs);
int insn_get_modrm_rm_off(struct insn *insn, struct pt_regs *regs);
unsigned long insn_get_seg_base(struct pt_regs *regs, struct insn *insn,
int regoff);
+char insn_get_code_seg_defaults(struct pt_regs *regs);
#endif /* _ASM_X86_INSN_EVAL_H */
diff --git a/arch/x86/lib/insn-eval.c b/arch/x86/lib/insn-eval.c
index 2c5e7081957d..a8e12bd0aecd 100644
--- a/arch/x86/lib/insn-eval.c
+++ b/arch/x86/lib/insn-eval.c
@@ -584,6 +584,65 @@ static unsigned long get_seg_limit(struct pt_regs *regs, struct insn *insn,
}
/**
+ * insn_get_code_seg_defaults() - Obtain code segment default parameters
+ * @regs: Structure with register values as seen when entering kernel mode
+ *
+ * Obtain the default parameters of the code segment: address and operand sizes.
+ * The code segment is obtained from the selector contained in the CS register
+ * in regs. In protected mode, the default address and operand sizes are
+ * determined by inspecting the L and D bits of the segment descriptor. In
+ * virtual-8086 mode, the default is always two bytes for both sizes.
+ *
+ * Return: A signed 8-bit value containing the default parameters on success and
+ * -EINVAL on error.
+ */
+char insn_get_code_seg_defaults(struct pt_regs *regs)
+{
+ struct desc_struct *desc;
+ unsigned short sel;
+
+ if (v8086_mode(regs))
+ /* Address and operand size are both 16-bit. */
+ return INSN_CODE_SEG_PARAMS(2, 2);
+
+ sel = (unsigned short)regs->cs;
+
+ desc = get_desc(sel);
+ if (!desc)
+ return -EINVAL;
+
+ /*
+ * The most significant byte of the Type field of the segment descriptor
+ * determines whether a segment contains data or code. If this is a data
+ * segment, return error.
+ */
+ if (!(desc->type & BIT(3)))
+ return -EINVAL;
+
+ switch ((desc->l << 1) | desc->d) {
+ case 0: /*
+ * Legacy mode. CS.L=0, CS.D=0. Address and operand size are
+ * both 16-bit.
+ */
+ return INSN_CODE_SEG_PARAMS(2, 2);
+ case 1: /*
+ * Legacy mode. CS.L=0, CS.D=1. Address and operand size are
+ * both 32-bit.
+ */
+ return INSN_CODE_SEG_PARAMS(4, 4);
+ case 2: /*
+ * IA-32e 64-bit mode. CS.L=1, CS.D=0. Address size is 64-bit;
+ * operand size is 32-bit.
+ */
+ return INSN_CODE_SEG_PARAMS(4, 8);
+ case 3: /* Invalid setting. CS.L=1, CS.D=1 */
+ /* fall through */
+ default:
+ return -EINVAL;
+ }
+}
+
+/**
* insn_get_modrm_rm_off() - Obtain register in r/m part of ModRM byte
* @insn: Instruction structure containing the ModRM byte
* @regs: Structure with register values as seen when entering kernel mode
--
2.13.0
In its current form, user_64bit_mode() can only be used when CONFIG_X86_64
is selected. This implies that code built with CONFIG_X86_64=n cannot use
it. If a piece of code needs to be built for both CONFIG_X86_64=y and
CONFIG_X86_64=n and wants to use this function, it needs to wrap it in
an #ifdef/#endif, potentially in multiple places.
This can be easily avoided with a single #ifdef/#endif pair within
user_64bit_mode() itself.
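For illustration, a hypothetical caller built for both configurations can now
be written without any #ifdef of its own (handle_long_mode() and
handle_legacy_mode() are made-up names):
	/* Illustrative only; not part of this patch. */
	if (user_64bit_mode(regs))
		return handle_long_mode(regs);
	/* On CONFIG_X86_64=n builds this path is taken unconditionally. */
	return handle_legacy_mode(regs);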
Suggested-by: Borislav Petkov <[email protected]>
Cc: Dave Hansen <[email protected]>
Cc: Adam Buchbinder <[email protected]>
Cc: Colin Ian King <[email protected]>
Cc: Lorenzo Stoakes <[email protected]>
Cc: Qiaowei Ren <[email protected]>
Cc: Arnaldo Carvalho de Melo <[email protected]>
Cc: Masami Hiramatsu <[email protected]>
Cc: Adrian Hunter <[email protected]>
Cc: Kees Cook <[email protected]>
Cc: Thomas Garnier <[email protected]>
Cc: Peter Zijlstra <[email protected]>
Cc: Borislav Petkov <[email protected]>
Cc: Dmitry Vyukov <[email protected]>
Cc: Ravi V. Shankar <[email protected]>
Cc: [email protected]
Reviewed-by: Borislav Petkov <[email protected]>
Signed-off-by: Ricardo Neri <[email protected]>
---
arch/x86/include/asm/ptrace.h | 6 +++++-
1 file changed, 5 insertions(+), 1 deletion(-)
diff --git a/arch/x86/include/asm/ptrace.h b/arch/x86/include/asm/ptrace.h
index 91c04c8e67fa..e2afbf689309 100644
--- a/arch/x86/include/asm/ptrace.h
+++ b/arch/x86/include/asm/ptrace.h
@@ -135,9 +135,9 @@ static inline int v8086_mode(struct pt_regs *regs)
#endif
}
-#ifdef CONFIG_X86_64
static inline bool user_64bit_mode(struct pt_regs *regs)
{
+#ifdef CONFIG_X86_64
#ifndef CONFIG_PARAVIRT
/*
* On non-paravirt systems, this is the only long mode CPL 3
@@ -148,8 +148,12 @@ static inline bool user_64bit_mode(struct pt_regs *regs)
/* Headers are too twisted for this to go in paravirt.h. */
return regs->cs == __USER_CS || regs->cs == pv_info.extra_user_64bit_cs;
#endif
+#else /* !CONFIG_X86_64 */
+ return false;
+#endif
}
+#ifdef CONFIG_X86_64
#define current_user_stack_pointer() current_pt_regs()->sp
#define compat_user_stack_pointer() current_pt_regs()->sp
#endif
--
2.13.0
When computing a linear address and segmentation is used, we need to know
the base address of the segment involved in the computation. In most of
the cases, the segment base address will be zero as in USER_DS/USER32_DS.
However, it may be possible that a user space program defines its own
segments via a local descriptor table. In such a case, the segment base
address may not be zero. Thus, the segment base address is needed to
correctly calculate the linear address.
If running in protected mode, the segment selector to be used when
computing a linear address is determined either by any segment override
prefixes in the instruction or is inferred from the registers involved in
the computation of the effective address, in that order. Also, there are cases
when the segment override prefixes shall be ignored (i.e., code segments
are always selected by the CS segment register; string instructions always
use the ES segment register when using (E)DI register as operand). In long
mode, segment registers are ignored, except for FS and GS. In these two
cases, base addresses are obtained from the respective MSRs.
For clarity, this process can be split into three steps (and an equal
number of functions): parse the segment override prefixes, if any; resolve
the relevant segment register to use, and, once known, read its value to
obtain the segment selector.
The method to obtain the segment selector depends on several factors. In
32-bit builds, segment selectors are saved into a pt_regs structure
when switching to kernel mode. The same is also true for virtual-8086
mode. In 64-bit builds, segmentation is mostly ignored, except when
running a program in 32-bit legacy mode. In this case, CS and SS can be
obtained from pt_regs. DS, ES, FS and GS can be read directly from
the respective segment registers.
In order to identify the segment registers, a new set of #defines is
introduced. It also includes two special identifiers. One of them
indicates when the default segment register associated with instruction
operands shall be used. Another one indicates that the contents of the
segment register shall be ignored; this identifier is used when in long
mode.
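The three steps map to the functions added by this patch. A simplified sketch
of the flow for an operand located at offset regoff within pt_regs (error
handling trimmed for brevity):
	/* Resolve prefixes and defaults to a segment register identifier. */
	seg_reg = resolve_seg_register(insn, regs, regoff);
	if (seg_reg < 0)
		return -EINVAL;
	/* Read the segment selector from that register. */
	sel = get_segment_selector(regs, seg_reg);
	if (sel < 0)
		return -EINVAL;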
Cc: Dave Hansen <[email protected]>
Cc: Adam Buchbinder <[email protected]>
Cc: Colin Ian King <[email protected]>
Cc: Lorenzo Stoakes <[email protected]>
Cc: Qiaowei Ren <[email protected]>
Cc: Arnaldo Carvalho de Melo <[email protected]>
Cc: Masami Hiramatsu <[email protected]>
Cc: Adrian Hunter <[email protected]>
Cc: Kees Cook <[email protected]>
Cc: Thomas Garnier <[email protected]>
Cc: Peter Zijlstra <[email protected]>
Cc: Borislav Petkov <[email protected]>
Cc: Dmitry Vyukov <[email protected]>
Cc: Ravi V. Shankar <[email protected]>
Cc: [email protected]
Signed-off-by: Ricardo Neri <[email protected]>
---
arch/x86/include/asm/inat.h | 10 ++
arch/x86/lib/insn-eval.c | 278 ++++++++++++++++++++++++++++++++++++++++++++
2 files changed, 288 insertions(+)
diff --git a/arch/x86/include/asm/inat.h b/arch/x86/include/asm/inat.h
index 02aff0867211..1c78580e58be 100644
--- a/arch/x86/include/asm/inat.h
+++ b/arch/x86/include/asm/inat.h
@@ -97,6 +97,16 @@
#define INAT_MAKE_GROUP(grp) ((grp << INAT_GRP_OFFS) | INAT_MODRM)
#define INAT_MAKE_IMM(imm) (imm << INAT_IMM_OFFS)
+/* Identifiers for segment registers */
+#define INAT_SEG_REG_IGNORE 0
+#define INAT_SEG_REG_DEFAULT 1
+#define INAT_SEG_REG_CS 2
+#define INAT_SEG_REG_SS 3
+#define INAT_SEG_REG_DS 4
+#define INAT_SEG_REG_ES 5
+#define INAT_SEG_REG_FS 6
+#define INAT_SEG_REG_GS 7
+
/* Attribute search APIs */
extern insn_attr_t inat_get_opcode_attribute(insn_byte_t opcode);
extern int inat_get_last_prefix_id(insn_byte_t last_pfx);
diff --git a/arch/x86/lib/insn-eval.c b/arch/x86/lib/insn-eval.c
index 25b2eb3c64c1..86f58ce6c302 100644
--- a/arch/x86/lib/insn-eval.c
+++ b/arch/x86/lib/insn-eval.c
@@ -9,6 +9,7 @@
#include <asm/inat.h>
#include <asm/insn.h>
#include <asm/insn-eval.h>
+#include <asm/vm86.h>
enum reg_type {
REG_TYPE_RM = 0,
@@ -42,6 +43,283 @@ static bool is_string_insn(struct insn *insn)
}
}
+/**
+ * get_overridden_seg_reg() - obtain segment register to use from prefixes
+ * @insn: Instruction structure with segment override prefixes
+ * @regs: Structure with register values as seen when entering kernel mode
+ * @regoff: Operand offset, in pt_regs, used to determine the segment register
+ *
+ * The segment register to which an effective address refers depends on
+ * a) whether running in long mode (in such a case segment override prefixes
+ * are ignored). b) Whether segment override prefixes must be ignored for certain
+ * registers: always use CS when the register is (R|E)IP; always use ES when the
+ * operand register is (E)DI with a string instruction as defined in the Intel
+ * documentation. c) If segment override prefixes are found in the instruction
+ * prefixes. d) Use the default segment register associated with the operand
+ * register.
+ *
+ * This function returns the overridden segment register to use, if any, as per
+ * the conditions described above. Please note that this function
+ * does not return the value in the segment register (i.e., the segment
+ * selector). The segment selector needs to be obtained using
+ * get_segment_selector() and passing the segment register resolved by
+ * this function.
+ *
+ * Return: A constant identifying the segment register to use, among CS, SS, DS,
+ * ES, FS, or GS. INAT_SEG_REG_IGNORE is returned if running in long mode.
+ * INAT_SEG_REG_DEFAULT is returned if no segment override prefixes were found
+ * and the default segment register shall be used. -EINVAL in case of error.
+ */
+static int get_overridden_seg_reg(struct insn *insn, struct pt_regs *regs,
+ int regoff)
+{
+ int i;
+ int sel_overrides = 0;
+ int seg_register = INAT_SEG_REG_DEFAULT;
+
+ /*
+ * Segment override prefixes should not be used for (E)IP. Check this
+ * case first as we might not have (and not needed at all) a
+ * valid insn structure to evaluate segment override prefixes.
+ */
+ if (regoff == offsetof(struct pt_regs, ip)) {
+ if (user_64bit_mode(regs))
+ return INAT_SEG_REG_IGNORE;
+ else
+ return INAT_SEG_REG_DEFAULT;
+ }
+
+ if (!insn)
+ return -EINVAL;
+
+ insn_get_prefixes(insn);
+
+ /* Look for any segment override prefixes. */
+ for (i = 0; i < insn->prefixes.nbytes; i++) {
+ insn_attr_t attr;
+
+ attr = inat_get_opcode_attribute(insn->prefixes.bytes[i]);
+ switch (attr) {
+ case INAT_MAKE_PREFIX(INAT_PFX_CS):
+ seg_register = INAT_SEG_REG_CS;
+ sel_overrides++;
+ break;
+ case INAT_MAKE_PREFIX(INAT_PFX_SS):
+ seg_register = INAT_SEG_REG_SS;
+ sel_overrides++;
+ break;
+ case INAT_MAKE_PREFIX(INAT_PFX_DS):
+ seg_register = INAT_SEG_REG_DS;
+ sel_overrides++;
+ break;
+ case INAT_MAKE_PREFIX(INAT_PFX_ES):
+ seg_register = INAT_SEG_REG_ES;
+ sel_overrides++;
+ break;
+ case INAT_MAKE_PREFIX(INAT_PFX_FS):
+ seg_register = INAT_SEG_REG_FS;
+ sel_overrides++;
+ break;
+ case INAT_MAKE_PREFIX(INAT_PFX_GS):
+ seg_register = INAT_SEG_REG_GS;
+ sel_overrides++;
+ break;
+ /* No default action needed. */
+ }
+ }
+
+ /*
+ * In long mode, segment override prefixes are ignored, except for
+ * overrides for FS and GS.
+ */
+ if (user_64bit_mode(regs)) {
+ if (seg_register != INAT_SEG_REG_FS &&
+ seg_register != INAT_SEG_REG_GS)
+ return INAT_SEG_REG_IGNORE;
+ /* More than one segment override prefix leads to undefined behavior. */
+ } else if (sel_overrides > 1) {
+ return -EINVAL;
+ /*
+ * Segment override prefixes are always ignored for string instructions
+ * that involve the use the (E)DI register.
+ */
+ } else if ((regoff == offsetof(struct pt_regs, di)) &&
+ is_string_insn(insn)) {
+ return INAT_SEG_REG_DEFAULT;
+ }
+
+ return seg_register;
+}
+
+/**
+ * resolve_seg_register() - obtain segment register
+ * @insn: Instruction structure with segment override prefixes
+ * @regs: Structure with register values as seen when entering kernel mode
+ * @regoff: Operand offset, in pt_regs, used to determine the segment register
+ *
+ * Determine the segment register associated with the operands and, if
+ * applicable, prefixes and the instruction pointed by insn. The function first
+ * checks if the segment register shall be ignored or has been overridden in the
+ * instruction prefixes. Otherwise, it resolves the segment register to use
+ * based on the defaults described in the Intel documentation.
+ *
+ * The operand register, regoff, is represented as the offset from the base of
+ * pt_regs. Also, regoff can be -EDOM for cases in which registers are not
+ * used as operands (e.g., displacement-only memory addressing).
+ *
+ * Return: A constant identifying the segment register to use, among CS, SS, DS,
+ * ES, FS, or GS. INAT_SEG_REG_IGNORE is returned if running in long mode.
+ * -EINVAL in case of error.
+ */
+static int resolve_seg_register(struct insn *insn, struct pt_regs *regs,
+ int regoff)
+{
+ int seg_reg;
+
+ seg_reg = get_overridden_seg_reg(insn, regs, regoff);
+
+ if (seg_reg < 0)
+ return seg_reg;
+
+ if (seg_reg == INAT_SEG_REG_IGNORE)
+ return seg_reg;
+
+ if (seg_reg != INAT_SEG_REG_DEFAULT)
+ return seg_reg;
+
+ /*
+ * If we are here, we use the default segment register as described
+ * in the Intel documentation:
+ * + DS for all references involving (E)AX, (E)CX, (E)DX, (E)BX, and
+ * (E)SI.
+ * + If used in a string instruction, ES for (E)DI. Otherwise, DS.
+ * + AX, CX and DX are not valid register operands in 16-bit address
+ * encodings but are valid for 32-bit and 64-bit encodings.
+ * + -EDOM is reserved to identify cases in which no register
+ * is used (i.e., displacement-only addressing). Use DS.
+ * + SS for (E)SP or (E)BP.
+ * + CS for (E)IP.
+ */
+
+ switch (regoff) {
+ case offsetof(struct pt_regs, ax):
+ case offsetof(struct pt_regs, cx):
+ case offsetof(struct pt_regs, dx):
+ /* Need insn to verify address size. */
+ if (!insn || insn->addr_bytes == 2)
+ return -EINVAL;
+ case -EDOM:
+ case offsetof(struct pt_regs, bx):
+ case offsetof(struct pt_regs, si):
+ return INAT_SEG_REG_DS;
+ case offsetof(struct pt_regs, di):
+ /* Need insn to see if insn is string instruction. */
+ if (!insn)
+ return -EINVAL;
+ if (is_string_insn(insn))
+ return INAT_SEG_REG_ES;
+ return INAT_SEG_REG_DS;
+ case offsetof(struct pt_regs, bp):
+ case offsetof(struct pt_regs, sp):
+ return INAT_SEG_REG_SS;
+ case offsetof(struct pt_regs, ip):
+ return INAT_SEG_REG_CS;
+ default:
+ return -EINVAL;
+ }
+}
+
+/**
+ * get_segment_selector() - obtain segment selector
+ * @regs: Structure with register values as seen when entering kernel mode
+ * @seg_reg: Segment register to use
+ *
+ * Obtain the segment selector from any of the CS, SS, DS, ES, FS, GS segment
+ * registers. In CONFIG_X86_32, the segment is obtained from either pt_regs or
+ * kernel_vm86_regs as applicable. In CONFIG_X86_64, CS and SS are obtained
+ * from pt_regs. DS, ES, FS and GS are obtained by reading the actual CPU
+ * registers. This is done only for completeness as in CONFIG_X86_64 segment
+ * registers are ignored.
+ *
+ * Return: Value of the segment selector, including null when running in
+ * long mode. -1 on error.
+ */
+static short get_segment_selector(struct pt_regs *regs, int seg_reg)
+{
+#ifdef CONFIG_X86_64
+ unsigned short sel;
+
+ switch (seg_reg) {
+ case INAT_SEG_REG_IGNORE:
+ return 0;
+ case INAT_SEG_REG_CS:
+ return (unsigned short)(regs->cs & 0xffff);
+ case INAT_SEG_REG_SS:
+ return (unsigned short)(regs->ss & 0xffff);
+ case INAT_SEG_REG_DS:
+ savesegment(ds, sel);
+ return sel;
+ case INAT_SEG_REG_ES:
+ savesegment(es, sel);
+ return sel;
+ case INAT_SEG_REG_FS:
+ savesegment(fs, sel);
+ return sel;
+ case INAT_SEG_REG_GS:
+ savesegment(gs, sel);
+ return sel;
+ default:
+ return -EINVAL;
+ }
+#else /* CONFIG_X86_32 */
+ struct kernel_vm86_regs *vm86regs = (struct kernel_vm86_regs *)regs;
+
+ if (v8086_mode(regs)) {
+ switch (seg_reg) {
+ case INAT_SEG_REG_CS:
+ return (unsigned short)(regs->cs & 0xffff);
+ case INAT_SEG_REG_SS:
+ return (unsigned short)(regs->ss & 0xffff);
+ case INAT_SEG_REG_DS:
+ return vm86regs->ds;
+ case INAT_SEG_REG_ES:
+ return vm86regs->es;
+ case INAT_SEG_REG_FS:
+ return vm86regs->fs;
+ case INAT_SEG_REG_GS:
+ return vm86regs->gs;
+ case INAT_SEG_REG_IGNORE:
+ /* fall through */
+ default:
+ return -EINVAL;
+ }
+ }
+
+ switch (seg_reg) {
+ case INAT_SEG_REG_CS:
+ return (unsigned short)(regs->cs & 0xffff);
+ case INAT_SEG_REG_SS:
+ return (unsigned short)(regs->ss & 0xffff);
+ case INAT_SEG_REG_DS:
+ return (unsigned short)(regs->ds & 0xffff);
+ case INAT_SEG_REG_ES:
+ return (unsigned short)(regs->es & 0xffff);
+ case INAT_SEG_REG_FS:
+ return (unsigned short)(regs->fs & 0xffff);
+ case INAT_SEG_REG_GS:
+ /*
+ * GS may or may not be in regs as per CONFIG_X86_32_LAZY_GS.
+ * The macro below takes care of both cases.
+ */
+ return get_user_gs(regs);
+ case INAT_SEG_REG_IGNORE:
+ /* fall through */
+ default:
+ return -EINVAL;
+ }
+#endif /* CONFIG_X86_64 */
+}
+
static int get_reg_offset(struct insn *insn, struct pt_regs *regs,
enum reg_type type)
{
--
2.13.0
Section 2.2.1.2 of the Intel 64 and IA-32 Architectures Software
Developer's Manual volume 2A states that when ModRM.mod !=11b and
ModRM.rm = 100b indexed register-indirect addressing is used. In other
words, a SIB byte follows the ModRM byte. In the specific case of
SIB.index = 100b, the scale*index portion of the computation of the
effective address is null. To signal callers of this particular situation,
get_reg_offset() can return -EDOM (-EINVAL continues to indicate an error
when decoding the SIB byte).
An example of this situation can be the following instruction:
8b 4c 23 80 mov -0x80(%rbx,%riz,1),%rcx
ModRM: 0x4c [mod:1b][reg:1b][rm:100b]
SIB: 0x23 [scale:0b][index:100b][base:11b]
Displacement: 0x80 (1-byte, as per ModRM.mod = 1b)
The %riz 'register' indicates a null index.
In long mode, a REX prefix may be used. When a REX prefix is present,
REX.X adds a fourth bit to the register selection of SIB.index. This gives
the ability to refer to all the 16 general purpose registers. When REX.X is
1b and SIB.index is 100b, the index is indicated in %r12. In our example,
this would look like:
42 8b 4c 23 80 mov -0x80(%rbx,%r12,1),%rcx
REX: 0x42 [W:0b][R:0b][X:1b][B:0b]
ModRM: 0x4c [mod:1b][reg:1b][rm:100b]
SIB: 0x23 [scale:0b][.X: 1b, index:100b][.B:0b, base:11b]
Displacement: 0x80 (1-byte, as per ModRM.mod = 1b)
%r12 is a valid register to use in the scale*index part of the effective
address computation.
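In the decoder, the new return value is meant to be handled roughly as follows
(this mirrors the mpx.c hunk below):
	indx_offset = get_reg_offset(insn, regs, REG_TYPE_INDEX);
	if (indx_offset == -EDOM)
		indx = 0;	/* null index: drop the scale*index term */
	else if (indx_offset < 0)
		goto out_err;	/* genuine decode error */
	else
		indx = regs_get_register(regs, indx_offset);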
Cc: Borislav Petkov <[email protected]>
Cc: Andy Lutomirski <[email protected]>
Cc: Dave Hansen <[email protected]>
Cc: Adam Buchbinder <[email protected]>
Cc: Colin Ian King <[email protected]>
Cc: Lorenzo Stoakes <[email protected]>
Cc: Qiaowei Ren <[email protected]>
Cc: Peter Zijlstra <[email protected]>
Cc: Nathan Howard <[email protected]>
Cc: Adan Hawthorn <[email protected]>
Cc: Joe Perches <[email protected]>
Cc: Ravi V. Shankar <[email protected]>
Cc: [email protected]
Signed-off-by: Ricardo Neri <[email protected]>
---
arch/x86/mm/mpx.c | 22 ++++++++++++++++++++--
1 file changed, 20 insertions(+), 2 deletions(-)
diff --git a/arch/x86/mm/mpx.c b/arch/x86/mm/mpx.c
index 9eec98022510..892aa6468805 100644
--- a/arch/x86/mm/mpx.c
+++ b/arch/x86/mm/mpx.c
@@ -110,6 +110,15 @@ static int get_reg_offset(struct insn *insn, struct pt_regs *regs,
regno = X86_SIB_INDEX(insn->sib.value);
if (X86_REX_X(insn->rex_prefix.value))
regno += 8;
+
+ /*
+ * If ModRM.mod != 3 and SIB.index = 4, the scale*index
+ * portion of the address computation is null. This is
+ * true only if REX.X is 0; if REX.X is 1, the index
+ * encodes %r12 and is used in the address computation.
+ */
+ if (X86_MODRM_MOD(insn->modrm.value) != 3 && regno == 4)
+ return -EDOM;
break;
case REG_TYPE_BASE:
@@ -160,11 +169,20 @@ static void __user *mpx_get_addr_ref(struct insn *insn, struct pt_regs *regs)
goto out_err;
indx_offset = get_reg_offset(insn, regs, REG_TYPE_INDEX);
- if (indx_offset < 0)
+
+ /*
+ * A negative offset generally means an error, except
+ * -EDOM, which means that the contents of the register
+ * should not be used as index.
+ */
+ if (indx_offset == -EDOM)
+ indx = 0;
+ else if (indx_offset < 0)
goto out_err;
+ else
+ indx = regs_get_register(regs, indx_offset);
base = regs_get_register(regs, base_offset);
- indx = regs_get_register(regs, indx_offset);
eff_addr = base + indx * (1 << X86_SIB_SCALE(sib));
} else {
--
2.13.0
Section 2.2.1.2 of the Intel 64 and IA-32 Architectures Software
Developer's Manual volume 2A states that if a SIB byte is used and
SIB.base is 101b and ModRM.mod is zero, then the base part of the
effective address computation is null. To signal this situation, a -EDOM
error is returned to indicate to callers that the base value present in
the register operand should be ignored.
In this scenario, a 32-bit displacement follows the SIB byte. Displacement
is obtained when the instruction decoder parses the operands.
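For illustration, an encoding of this form (constructed here, not taken from
the patch) looks like:
  8b 0c 25 f0 00 00 00 	mov 0xf0,%ecx
  ModRM: 0x0c [mod:0b][reg:1b][rm:100b]
  SIB: 0x25 [scale:0b][index:100b][base:101b]
  Displacement: 0xf0 (4 bytes, as per ModRM.mod = 0b and SIB.base = 101b)
The effective address is simply the 32-bit displacement.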
Cc: Borislav Petkov <[email protected]>
Cc: Andy Lutomirski <[email protected]>
Cc: Dave Hansen <[email protected]>
Cc: Adam Buchbinder <[email protected]>
Cc: Colin Ian King <[email protected]>
Cc: Lorenzo Stoakes <[email protected]>
Cc: Qiaowei Ren <[email protected]>
Cc: Peter Zijlstra <[email protected]>
Cc: Nathan Howard <[email protected]>
Cc: Adan Hawthorn <[email protected]>
Cc: Joe Perches <[email protected]>
Cc: Ravi V. Shankar <[email protected]>
Cc: [email protected]
Signed-off-by: Ricardo Neri <[email protected]>
---
arch/x86/mm/mpx.c | 28 +++++++++++++++++++---------
1 file changed, 19 insertions(+), 9 deletions(-)
diff --git a/arch/x86/mm/mpx.c b/arch/x86/mm/mpx.c
index 892aa6468805..53e24ca01f29 100644
--- a/arch/x86/mm/mpx.c
+++ b/arch/x86/mm/mpx.c
@@ -123,6 +123,14 @@ static int get_reg_offset(struct insn *insn, struct pt_regs *regs,
case REG_TYPE_BASE:
regno = X86_SIB_BASE(insn->sib.value);
+ /*
+ * If ModRM.mod is 0 and SIB.base == 5, the base of the
+ * register-indirect addressing is 0. In this case, a
+ * 32-bit displacement follows the SIB byte.
+ */
+ if (!X86_MODRM_MOD(insn->modrm.value) && regno == 5)
+ return -EDOM;
+
if (X86_REX_B(insn->rex_prefix.value))
regno += 8;
break;
@@ -164,17 +172,21 @@ static void __user *mpx_get_addr_ref(struct insn *insn, struct pt_regs *regs)
eff_addr = regs_get_register(regs, addr_offset);
} else {
if (insn->sib.nbytes) {
+ /*
+ * Negative values in the base and index offset means
+ * an error when decoding the SIB byte. Except -EDOM,
+ * which means that the registers should not be used
+ * in the address computation.
+ */
base_offset = get_reg_offset(insn, regs, REG_TYPE_BASE);
- if (base_offset < 0)
+ if (base_offset == -EDOM)
+ base = 0;
+ else if (base_offset < 0)
goto out_err;
+ else
+ base = regs_get_register(regs, base_offset);
indx_offset = get_reg_offset(insn, regs, REG_TYPE_INDEX);
-
- /*
- * A negative offset generally means an error, except
- * -EDOM, which means that the contents of the register
- * should not be used as index.
- */
if (indx_offset == -EDOM)
indx = 0;
else if (indx_offset < 0)
@@ -182,8 +194,6 @@ static void __user *mpx_get_addr_ref(struct insn *insn, struct pt_regs *regs)
else
indx = regs_get_register(regs, indx_offset);
- base = regs_get_register(regs, base_offset);
-
eff_addr = base + indx * (1 << X86_SIB_SCALE(sib));
} else {
addr_offset = get_reg_offset(insn, regs, REG_TYPE_RM);
--
2.13.0
Up to this point, only fault.c used the definitions of the page fault error
codes. Thus, it made sense to keep them within that file. Other portions of
code might be interested in those definitions too. For instance, the User-
Mode Instruction Prevention emulation code will use such definitions to
emulate a page fault when it is unable to successfully copy the results
of the emulated instructions to user space.
While relocating the error code enumeration, the prefix X86_ is used to
make it consistent with the rest of the definitions in traps.h. Of course,
code using the enumeration had to be updated as well. No functional changes
were performed.
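As a rough sketch of that intended use (hypothetical code, not part of this
patch; uaddr, data and len are illustrative), the emulation code could report
a page-fault-like error when the copy to user space fails:
	if (copy_to_user(uaddr, data, len)) {
		/* Make the failure look like a page fault to the task. */
		current->thread.cr2 = (unsigned long)uaddr;
		current->thread.error_code = X86_PF_USER | X86_PF_WRITE;
		current->thread.trap_nr = X86_TRAP_PF;
		force_sig(SIGSEGV, current);
	}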
Cc: Thomas Gleixner <[email protected]>
Cc: Ingo Molnar <[email protected]>
Cc: "H. Peter Anvin" <[email protected]>
Cc: Andy Lutomirski <[email protected]>
Cc: "Kirill A. Shutemov" <[email protected]>
Cc: Josh Poimboeuf <[email protected]>
Cc: Dave Hansen <[email protected]>
Cc: Paul Gortmaker <[email protected]>
Cc: [email protected]
Reviewed-by: Andy Lutomirski <[email protected]>
Reviewed-by: Borislav Petkov <[email protected]>
Signed-off-by: Ricardo Neri <[email protected]>
---
arch/x86/include/asm/traps.h | 18 +++++++++
arch/x86/mm/fault.c | 88 +++++++++++++++++---------------------------
2 files changed, 52 insertions(+), 54 deletions(-)
diff --git a/arch/x86/include/asm/traps.h b/arch/x86/include/asm/traps.h
index 01fd0a7f48cd..4a2e5852eacc 100644
--- a/arch/x86/include/asm/traps.h
+++ b/arch/x86/include/asm/traps.h
@@ -148,4 +148,22 @@ enum {
X86_TRAP_IRET = 32, /* 32, IRET Exception */
};
+/*
+ * Page fault error code bits:
+ *
+ * bit 0 == 0: no page found 1: protection fault
+ * bit 1 == 0: read access 1: write access
+ * bit 2 == 0: kernel-mode access 1: user-mode access
+ * bit 3 == 1: use of reserved bit detected
+ * bit 4 == 1: fault was an instruction fetch
+ * bit 5 == 1: protection keys block access
+ */
+enum x86_pf_error_code {
+ X86_PF_PROT = 1 << 0,
+ X86_PF_WRITE = 1 << 1,
+ X86_PF_USER = 1 << 2,
+ X86_PF_RSVD = 1 << 3,
+ X86_PF_INSTR = 1 << 4,
+ X86_PF_PK = 1 << 5,
+};
#endif /* _ASM_X86_TRAPS_H */
diff --git a/arch/x86/mm/fault.c b/arch/x86/mm/fault.c
index 2a1fa10c6a98..dc87badd16e9 100644
--- a/arch/x86/mm/fault.c
+++ b/arch/x86/mm/fault.c
@@ -29,26 +29,6 @@
#include <asm/trace/exceptions.h>
/*
- * Page fault error code bits:
- *
- * bit 0 == 0: no page found 1: protection fault
- * bit 1 == 0: read access 1: write access
- * bit 2 == 0: kernel-mode access 1: user-mode access
- * bit 3 == 1: use of reserved bit detected
- * bit 4 == 1: fault was an instruction fetch
- * bit 5 == 1: protection keys block access
- */
-enum x86_pf_error_code {
-
- PF_PROT = 1 << 0,
- PF_WRITE = 1 << 1,
- PF_USER = 1 << 2,
- PF_RSVD = 1 << 3,
- PF_INSTR = 1 << 4,
- PF_PK = 1 << 5,
-};
-
-/*
* Returns 0 if mmiotrace is disabled, or if the fault is not
* handled by mmiotrace:
*/
@@ -149,7 +129,7 @@ is_prefetch(struct pt_regs *regs, unsigned long error_code, unsigned long addr)
* If it was a exec (instruction fetch) fault on NX page, then
* do not ignore the fault:
*/
- if (error_code & PF_INSTR)
+ if (error_code & X86_PF_INSTR)
return 0;
instr = (void *)convert_ip_to_linear(current, regs);
@@ -179,7 +159,7 @@ is_prefetch(struct pt_regs *regs, unsigned long error_code, unsigned long addr)
* siginfo so userspace can discover which protection key was set
* on the PTE.
*
- * If we get here, we know that the hardware signaled a PF_PK
+ * If we get here, we know that the hardware signaled a X86_PF_PK
* fault and that there was a VMA once we got in the fault
* handler. It does *not* guarantee that the VMA we find here
* was the one that we faulted on.
@@ -205,7 +185,7 @@ static void fill_sig_info_pkey(int si_code, siginfo_t *info,
/*
* force_sig_info_fault() is called from a number of
* contexts, some of which have a VMA and some of which
- * do not. The PF_PK handing happens after we have a
+ * do not. The X86_PF_PK handing happens after we have a
* valid VMA, so we should never reach this without a
* valid VMA.
*/
@@ -695,7 +675,7 @@ show_fault_oops(struct pt_regs *regs, unsigned long error_code,
if (!oops_may_print())
return;
- if (error_code & PF_INSTR) {
+ if (error_code & X86_PF_INSTR) {
unsigned int level;
pgd_t *pgd;
pte_t *pte;
@@ -779,7 +759,7 @@ no_context(struct pt_regs *regs, unsigned long error_code,
*/
if (current->thread.sig_on_uaccess_err && signal) {
tsk->thread.trap_nr = X86_TRAP_PF;
- tsk->thread.error_code = error_code | PF_USER;
+ tsk->thread.error_code = error_code | X86_PF_USER;
tsk->thread.cr2 = address;
/* XXX: hwpoison faults will set the wrong code. */
@@ -899,7 +879,7 @@ __bad_area_nosemaphore(struct pt_regs *regs, unsigned long error_code,
struct task_struct *tsk = current;
/* User mode accesses just cause a SIGSEGV */
- if (error_code & PF_USER) {
+ if (error_code & X86_PF_USER) {
/*
* It's possible to have interrupts off here:
*/
@@ -920,7 +900,7 @@ __bad_area_nosemaphore(struct pt_regs *regs, unsigned long error_code,
* Instruction fetch faults in the vsyscall page might need
* emulation.
*/
- if (unlikely((error_code & PF_INSTR) &&
+ if (unlikely((error_code & X86_PF_INSTR) &&
((address & ~0xfff) == VSYSCALL_ADDR))) {
if (emulate_vsyscall(regs, address))
return;
@@ -933,7 +913,7 @@ __bad_area_nosemaphore(struct pt_regs *regs, unsigned long error_code,
* are always protection faults.
*/
if (address >= TASK_SIZE_MAX)
- error_code |= PF_PROT;
+ error_code |= X86_PF_PROT;
if (likely(show_unhandled_signals))
show_signal_msg(regs, error_code, address, tsk);
@@ -989,11 +969,11 @@ static inline bool bad_area_access_from_pkeys(unsigned long error_code,
if (!boot_cpu_has(X86_FEATURE_OSPKE))
return false;
- if (error_code & PF_PK)
+ if (error_code & X86_PF_PK)
return true;
/* this checks permission keys on the VMA: */
- if (!arch_vma_access_permitted(vma, (error_code & PF_WRITE),
- (error_code & PF_INSTR), foreign))
+ if (!arch_vma_access_permitted(vma, (error_code & X86_PF_WRITE),
+ (error_code & X86_PF_INSTR), foreign))
return true;
return false;
}
@@ -1021,7 +1001,7 @@ do_sigbus(struct pt_regs *regs, unsigned long error_code, unsigned long address,
int code = BUS_ADRERR;
/* Kernel mode? Handle exceptions or die: */
- if (!(error_code & PF_USER)) {
+ if (!(error_code & X86_PF_USER)) {
no_context(regs, error_code, address, SIGBUS, BUS_ADRERR);
return;
}
@@ -1050,14 +1030,14 @@ mm_fault_error(struct pt_regs *regs, unsigned long error_code,
unsigned long address, struct vm_area_struct *vma,
unsigned int fault)
{
- if (fatal_signal_pending(current) && !(error_code & PF_USER)) {
+ if (fatal_signal_pending(current) && !(error_code & X86_PF_USER)) {
no_context(regs, error_code, address, 0, 0);
return;
}
if (fault & VM_FAULT_OOM) {
/* Kernel mode? Handle exceptions or die: */
- if (!(error_code & PF_USER)) {
+ if (!(error_code & X86_PF_USER)) {
no_context(regs, error_code, address,
SIGSEGV, SEGV_MAPERR);
return;
@@ -1082,16 +1062,16 @@ mm_fault_error(struct pt_regs *regs, unsigned long error_code,
static int spurious_fault_check(unsigned long error_code, pte_t *pte)
{
- if ((error_code & PF_WRITE) && !pte_write(*pte))
+ if ((error_code & X86_PF_WRITE) && !pte_write(*pte))
return 0;
- if ((error_code & PF_INSTR) && !pte_exec(*pte))
+ if ((error_code & X86_PF_INSTR) && !pte_exec(*pte))
return 0;
/*
* Note: We do not do lazy flushing on protection key
- * changes, so no spurious fault will ever set PF_PK.
+ * changes, so no spurious fault will ever set X86_PF_PK.
*/
- if ((error_code & PF_PK))
+ if ((error_code & X86_PF_PK))
return 1;
return 1;
@@ -1137,8 +1117,8 @@ spurious_fault(unsigned long error_code, unsigned long address)
* change, so user accesses are not expected to cause spurious
* faults.
*/
- if (error_code != (PF_WRITE | PF_PROT)
- && error_code != (PF_INSTR | PF_PROT))
+ if (error_code != (X86_PF_WRITE | X86_PF_PROT) &&
+ error_code != (X86_PF_INSTR | X86_PF_PROT))
return 0;
pgd = init_mm.pgd + pgd_index(address);
@@ -1198,19 +1178,19 @@ access_error(unsigned long error_code, struct vm_area_struct *vma)
* always an unconditional error and can never result in
* a follow-up action to resolve the fault, like a COW.
*/
- if (error_code & PF_PK)
+ if (error_code & X86_PF_PK)
return 1;
/*
* Make sure to check the VMA so that we do not perform
- * faults just to hit a PF_PK as soon as we fill in a
+ * faults just to hit a X86_PF_PK as soon as we fill in a
* page.
*/
- if (!arch_vma_access_permitted(vma, (error_code & PF_WRITE),
- (error_code & PF_INSTR), foreign))
+ if (!arch_vma_access_permitted(vma, (error_code & X86_PF_WRITE),
+ (error_code & X86_PF_INSTR), foreign))
return 1;
- if (error_code & PF_WRITE) {
+ if (error_code & X86_PF_WRITE) {
/* write, present and write, not present: */
if (unlikely(!(vma->vm_flags & VM_WRITE)))
return 1;
@@ -1218,7 +1198,7 @@ access_error(unsigned long error_code, struct vm_area_struct *vma)
}
/* read, present: */
- if (unlikely(error_code & PF_PROT))
+ if (unlikely(error_code & X86_PF_PROT))
return 1;
/* read, not present: */
@@ -1241,7 +1221,7 @@ static inline bool smap_violation(int error_code, struct pt_regs *regs)
if (!static_cpu_has(X86_FEATURE_SMAP))
return false;
- if (error_code & PF_USER)
+ if (error_code & X86_PF_USER)
return false;
if (!user_mode(regs) && (regs->flags & X86_EFLAGS_AC))
@@ -1297,7 +1277,7 @@ __do_page_fault(struct pt_regs *regs, unsigned long error_code,
* protection error (error_code & 9) == 0.
*/
if (unlikely(fault_in_kernel_space(address))) {
- if (!(error_code & (PF_RSVD | PF_USER | PF_PROT))) {
+ if (!(error_code & (X86_PF_RSVD | X86_PF_USER | X86_PF_PROT))) {
if (vmalloc_fault(address) >= 0)
return;
@@ -1325,7 +1305,7 @@ __do_page_fault(struct pt_regs *regs, unsigned long error_code,
if (unlikely(kprobes_fault(regs)))
return;
- if (unlikely(error_code & PF_RSVD))
+ if (unlikely(error_code & X86_PF_RSVD))
pgtable_bad(regs, error_code, address);
if (unlikely(smap_violation(error_code, regs))) {
@@ -1351,7 +1331,7 @@ __do_page_fault(struct pt_regs *regs, unsigned long error_code,
*/
if (user_mode(regs)) {
local_irq_enable();
- error_code |= PF_USER;
+ error_code |= X86_PF_USER;
flags |= FAULT_FLAG_USER;
} else {
if (regs->flags & X86_EFLAGS_IF)
@@ -1360,9 +1340,9 @@ __do_page_fault(struct pt_regs *regs, unsigned long error_code,
perf_sw_event(PERF_COUNT_SW_PAGE_FAULTS, 1, regs, address);
- if (error_code & PF_WRITE)
+ if (error_code & X86_PF_WRITE)
flags |= FAULT_FLAG_WRITE;
- if (error_code & PF_INSTR)
+ if (error_code & X86_PF_INSTR)
flags |= FAULT_FLAG_INSTRUCTION;
/*
@@ -1382,7 +1362,7 @@ __do_page_fault(struct pt_regs *regs, unsigned long error_code,
* space check, thus avoiding the deadlock:
*/
if (unlikely(!down_read_trylock(&mm->mmap_sem))) {
- if ((error_code & PF_USER) == 0 &&
+ if (!(error_code & X86_PF_USER) &&
!search_exception_tables(regs->ip)) {
bad_area_nosemaphore(regs, error_code, address, NULL);
return;
@@ -1409,7 +1389,7 @@ __do_page_fault(struct pt_regs *regs, unsigned long error_code,
bad_area(regs, error_code, address);
return;
}
- if (error_code & PF_USER) {
+ if (error_code & X86_PF_USER) {
/*
* Accessing the stack below %sp is always a bug.
* The large cushion allows instructions like enter
--
2.13.0
On Fri, Aug 18, 2017 at 05:27:43PM -0700, Ricardo Neri wrote:
> Both head_32.S and head_64.S utilize the same value to initialize the
> control register CR0. Also, other parts of the kernel might want to access
> to this initial definition (e.g., emulation code for User-Mode Instruction
s/to //
> Prevention uses this state to provide a sane dummy value for CR0 when
> emulating the smsw instruction). Thus, relocate this definition to a
> header file from which it can be conveniently accessed.
>
> Cc: Andrew Morton <[email protected]>
> Cc: Andy Lutomirski <[email protected]>
> Cc: Andy Lutomirski <[email protected]>
> Cc: Borislav Petkov <[email protected]>
> Cc: Brian Gerst <[email protected]>
> Cc: Dave Hansen <[email protected]>
> Cc: Denys Vlasenko <[email protected]>
> Cc: H. Peter Anvin <[email protected]>
> Cc: Josh Poimboeuf <[email protected]>
> Cc: Linus Torvalds <[email protected]>
> Cc: Peter Zijlstra <[email protected]>
> Cc: Thomas Gleixner <[email protected]>
> Cc: [email protected]
> Cc: [email protected]
> Suggested-by: Borislav Petkov <[email protected]>
> Signed-off-by: Ricardo Neri <[email protected]>
> ---
> arch/x86/include/uapi/asm/processor-flags.h | 6 ++++++
> arch/x86/kernel/head_32.S | 3 ---
> arch/x86/kernel/head_64.S | 3 ---
> 3 files changed, 6 insertions(+), 6 deletions(-)
>
> diff --git a/arch/x86/include/uapi/asm/processor-flags.h b/arch/x86/include/uapi/asm/processor-flags.h
> index 185f3d10c194..aae1f2aa7563 100644
> --- a/arch/x86/include/uapi/asm/processor-flags.h
> +++ b/arch/x86/include/uapi/asm/processor-flags.h
> @@ -151,5 +151,11 @@
> #define CX86_ARR_BASE 0xc4
> #define CX86_RCR_BASE 0xdc
>
> +/*
> + * Initial state of CR0 for head_32/64.S
> + */
No need for that comment.
With the minor nitpicks addressed, you can add:
Reviewed-by: Borislav Petkov <[email protected]>
Thx.
--
Regards/Gruss,
Boris.
SUSE Linux GmbH, GF: Felix Imendörffer, Jane Smithard, Graham Norton, HRB 21284 (AG Nürnberg)
--
On Fri, Aug 18, 2017 at 05:27:46PM -0700, Ricardo Neri wrote:
> Even though memory addresses are unsigned, the operands used to compute the
> effective address do have a sign. This is true for ModRM.rm, SIB.base,
> SIB.index as well as the displacement bytes. Thus, signed variables shall
> be used when computing the effective address from these operands. Once the
> signed effective address has been computed, it is casted to an unsigned
> long to determine the linear address.
>
> Variables are renamed to better reflect the type of address being
> computed.
>
> Cc: Borislav Petkov <[email protected]>
> Cc: Andy Lutomirski <[email protected]>
> Cc: Dave Hansen <[email protected]>
> Cc: Adam Buchbinder <[email protected]>
> Cc: Colin Ian King <[email protected]>
> Cc: Lorenzo Stoakes <[email protected]>
> Cc: Qiaowei Ren <[email protected]>
> Cc: Peter Zijlstra <[email protected]>
> Cc: Nathan Howard <[email protected]>
> Cc: Adan Hawthorn <[email protected]>
> Cc: Joe Perches <[email protected]>
> Cc: Ravi V. Shankar <[email protected]>
> Cc: [email protected]
> Signed-off-by: Ricardo Neri <[email protected]>
> ---
> arch/x86/mm/mpx.c | 20 ++++++++++++++------
> 1 file changed, 14 insertions(+), 6 deletions(-)
I think you can simplify this function even more (diff ontop):
diff --git a/arch/x86/mm/mpx.c b/arch/x86/mm/mpx.c
index 9eec98022510..d0ec5c9b2a57 100644
--- a/arch/x86/mm/mpx.c
+++ b/arch/x86/mm/mpx.c
@@ -139,7 +139,7 @@ static int get_reg_offset(struct insn *insn, struct pt_regs *regs,
static void __user *mpx_get_addr_ref(struct insn *insn, struct pt_regs *regs)
{
int addr_offset, base_offset, indx_offset;
- unsigned long linear_addr;
+ unsigned long linear_addr = -1;
long eff_addr, base, indx;
insn_byte_t sib;
@@ -150,18 +150,18 @@ static void __user *mpx_get_addr_ref(struct insn *insn, struct pt_regs *regs)
if (X86_MODRM_MOD(insn->modrm.value) == 3) {
addr_offset = get_reg_offset(insn, regs, REG_TYPE_RM);
if (addr_offset < 0)
- goto out_err;
+ goto out;
eff_addr = regs_get_register(regs, addr_offset);
} else {
if (insn->sib.nbytes) {
base_offset = get_reg_offset(insn, regs, REG_TYPE_BASE);
if (base_offset < 0)
- goto out_err;
+ goto out;
indx_offset = get_reg_offset(insn, regs, REG_TYPE_INDEX);
if (indx_offset < 0)
- goto out_err;
+ goto out;
base = regs_get_register(regs, base_offset);
indx = regs_get_register(regs, indx_offset);
@@ -170,7 +170,7 @@ static void __user *mpx_get_addr_ref(struct insn *insn, struct pt_regs *regs)
} else {
addr_offset = get_reg_offset(insn, regs, REG_TYPE_RM);
if (addr_offset < 0)
- goto out_err;
+ goto out;
eff_addr = regs_get_register(regs, addr_offset);
}
@@ -180,9 +180,8 @@ static void __user *mpx_get_addr_ref(struct insn *insn, struct pt_regs *regs)
linear_addr = (unsigned long)eff_addr;
+out:
return (void __user *)linear_addr;
-out_err:
- return (void __user *)-1;
}
static int mpx_insn_decode(struct insn *insn,
--
Regards/Gruss,
Boris.
SUSE Linux GmbH, GF: Felix Imendörffer, Jane Smithard, Graham Norton, HRB 21284 (AG Nürnberg)
--
On Fri, 2017-08-25 at 19:41 +0200, Borislav Petkov wrote:
Thanks Borislav for your feedback!
> On Fri, Aug 18, 2017 at 05:27:43PM -0700, Ricardo Neri wrote:
> > Both head_32.S and head_64.S utilize the same value to initialize the
> > control register CR0. Also, other parts of the kernel might want to access
> > to this initial definition (e.g., emulation code for User-Mode Instruction
>
> s/to //
>
> > Prevention uses this state to provide a sane dummy value for CR0 when
I'll make this change.
> > emulating the smsw instruction). Thus, relocate this definition to a
> > header file from which it can be conveniently accessed.
> >
> > Cc: Andrew Morton <[email protected]>
> > Cc: Andy Lutomirski <[email protected]>
> > Cc: Andy Lutomirski <[email protected]>
> > Cc: Borislav Petkov <[email protected]>
> > Cc: Brian Gerst <[email protected]>
> > Cc: Dave Hansen <[email protected]>
> > Cc: Denys Vlasenko <[email protected]>
> > Cc: H. Peter Anvin <[email protected]>
> > Cc: Josh Poimboeuf <[email protected]>
> > Cc: Linus Torvalds <[email protected]>
> > Cc: Peter Zijlstra <[email protected]>
> > Cc: Thomas Gleixner <[email protected]>
> > Cc: [email protected]
> > Cc: [email protected]
> > Suggested-by: Borislav Petkov <[email protected]>
> > Signed-off-by: Ricardo Neri <[email protected]>
> > ---
> > arch/x86/include/uapi/asm/processor-flags.h | 6 ++++++
> > arch/x86/kernel/head_32.S | 3 ---
> > arch/x86/kernel/head_64.S | 3 ---
> > 3 files changed, 6 insertions(+), 6 deletions(-)
> >
> > diff --git a/arch/x86/include/uapi/asm/processor-flags.h b/arch/x86/include/uapi/asm/processor-flags.h
> > index 185f3d10c194..aae1f2aa7563 100644
> > --- a/arch/x86/include/uapi/asm/processor-flags.h
> > +++ b/arch/x86/include/uapi/asm/processor-flags.h
> > @@ -151,5 +151,11 @@
> > #define CX86_ARR_BASE 0xc4
> > #define CX86_RCR_BASE 0xdc
> >
> > +/*
> > + * Initial state of CR0 for head_32/64.S
> > + */
>
> No need for that comment.
>
> With the minor nitpicks addressed, you can add:
>
> Reviewed-by: Borislav Petkov <[email protected]>
Thank you! Is it necessary for me to submit a v9 with these updates?
Perhaps I can make these updates in branch for the maintainers to pull
when/if this series is ack'ed.
Thanks and BR,
Ricardo
On Tue, 2017-08-29 at 18:09 +0200, Borislav Petkov wrote:
> On Fri, Aug 18, 2017 at 05:27:46PM -0700, Ricardo Neri wrote:
> > Even though memory addresses are unsigned, the operands used to compute the
> > effective address do have a sign. This is true for ModRM.rm, SIB.base,
> > SIB.index as well as the displacement bytes. Thus, signed variables shall
> > be used when computing the effective address from these operands. Once the
> > signed effective address has been computed, it is casted to an unsigned
> > long to determine the linear address.
> >
> > Variables are renamed to better reflect the type of address being
> > computed.
> >
> > Cc: Borislav Petkov <[email protected]>
> > Cc: Andy Lutomirski <[email protected]>
> > Cc: Dave Hansen <[email protected]>
> > Cc: Adam Buchbinder <[email protected]>
> > Cc: Colin Ian King <[email protected]>
> > Cc: Lorenzo Stoakes <[email protected]>
> > Cc: Qiaowei Ren <[email protected]>
> > Cc: Peter Zijlstra <[email protected]>
> > Cc: Nathan Howard <[email protected]>
> > Cc: Adan Hawthorn <[email protected]>
> > Cc: Joe Perches <[email protected]>
> > Cc: Ravi V. Shankar <[email protected]>
> > Cc: [email protected]
> > Signed-off-by: Ricardo Neri <[email protected]>
> > ---
> > arch/x86/mm/mpx.c | 20 ++++++++++++++------
> > 1 file changed, 14 insertions(+), 6 deletions(-)
>
> I think you can simplify this function even more (diff ontop):
>
> diff --git a/arch/x86/mm/mpx.c b/arch/x86/mm/mpx.c
> index 9eec98022510..d0ec5c9b2a57 100644
> --- a/arch/x86/mm/mpx.c
> +++ b/arch/x86/mm/mpx.c
> @@ -139,7 +139,7 @@ static int get_reg_offset(struct insn *insn, struct pt_regs *regs,
> static void __user *mpx_get_addr_ref(struct insn *insn, struct pt_regs *regs)
> {
> int addr_offset, base_offset, indx_offset;
> - unsigned long linear_addr;
> + unsigned long linear_addr = -1;
> long eff_addr, base, indx;
> insn_byte_t sib;
>
> @@ -150,18 +150,18 @@ static void __user *mpx_get_addr_ref(struct insn *insn, struct pt_regs *regs)
> if (X86_MODRM_MOD(insn->modrm.value) == 3) {
> addr_offset = get_reg_offset(insn, regs, REG_TYPE_RM);
> if (addr_offset < 0)
> - goto out_err;
> + goto out;
>
> eff_addr = regs_get_register(regs, addr_offset);
> } else {
> if (insn->sib.nbytes) {
> base_offset = get_reg_offset(insn, regs, REG_TYPE_BASE);
> if (base_offset < 0)
> - goto out_err;
> + goto out;
> indx_offset = get_reg_offset(insn, regs, REG_TYPE_INDEX);
> if (indx_offset < 0)
> - goto out_err;
> + goto out;
>
> base = regs_get_register(regs, base_offset);
> indx = regs_get_register(regs, indx_offset);
> @@ -170,7 +170,7 @@ static void __user *mpx_get_addr_ref(struct insn *insn, struct pt_regs *regs)
> } else {
> addr_offset = get_reg_offset(insn, regs, REG_TYPE_RM);
> if (addr_offset < 0)
> - goto out_err;
> + goto out;
>
> eff_addr = regs_get_register(regs, addr_offset);
> }
> @@ -180,9 +180,8 @@ static void __user *mpx_get_addr_ref(struct insn *insn, struct pt_regs *regs)
>
> linear_addr = (unsigned long)eff_addr;
>
> +out:
> return (void __user *)linear_addr;
> -out_err:
> - return (void __user *)-1;
This is a good suggestion. I will work on it. By now my series comprises
28 patches. If you plan to review the rest of the series and you don't
have major objections, could I work on these updates as increments from
my v8 series? I think that with 28 patches in the series is becoming
difficult to review.
Thanks and BR,
Ricardo
On Wed, Aug 30, 2017 at 09:04:18PM -0700, Ricardo Neri wrote:
> Thank you! Is it necessary for me to submit a v9 with these updates?
> Perhaps I can make these updates in branch for the maintainers to pull
> when/if this series is ack'ed.
Don't do anything and let me go through the rest of them first. It is
too late for this merge window anyway so we can take our time. Once you
receive full feedback from me (and hopefully others) you can send what
looks like to be a final v9 with all feedback incorporated. :-)
Thx.
--
Regards/Gruss,
Boris.
SUSE Linux GmbH, GF: Felix Imendörffer, Jane Smithard, Graham Norton, HRB 21284 (AG Nürnberg)
--
On Wed, Aug 30, 2017 at 09:19:14PM -0700, Ricardo Neri wrote:
> This is a good suggestion. I will work on it. By now my series comprises
> 28 patches. If you plan to review the rest of the series and you don't
> have major objections, could I work on these updates as increments from
> my v8 series? I think that with 28 patches in the series is becoming
> difficult to review.
See my other reply. Just merge this diff with your patch - no need for a
separate one.
--
Regards/Gruss,
Boris.
SUSE Linux GmbH, GF: Felix Imendörffer, Jane Smithard, Graham Norton, HRB 21284 (AG Nürnberg)
--
On Fri, Aug 18, 2017 at 05:27:47PM -0700, Ricardo Neri wrote:
> Section 2.2.1.2 of the Intel 64 and IA-32 Architectures Software
> Developer's Manual volume 2A states that when ModRM.mod !=11b and
> ModRM.rm = 100b indexed register-indirect addressing is used. In other
> words, a SIB byte follows the ModRM byte. In the specific case of
> SIB.index = 100b, the scale*index portion of the computation of the
> effective address is null. To signal callers of this particular situation,
> get_reg_offset() can return -EDOM (-EINVAL continues to indicate that an
> error when decoding the SIB byte).
>
> An example of this situation can be the following instruction:
>
> 8b 4c 23 80 mov -0x80(%rbx,%riz,1),%rcx
> ModRM: 0x4c [mod:1b][reg:1b][rm:100b]
> SIB: 0x23 [scale:0b][index:100b][base:11b]
> Displacement: 0x80 (1-byte, as per ModRM.mod = 1b)
>
> The %riz 'register' indicates a null index.
>
> In long mode, a REX prefix may be used. When a REX prefix is present,
> REX.X adds a fourth bit to the register selection of SIB.index. This gives
> the ability to refer to all the 16 general purpose registers. When REX.X is
> 1b and SIB.index is 100b, the index is indicated in %r12. In our example,
> this would look like:
>
> 42 8b 4c 23 80 mov -0x80(%rbx,%r12,1),%rcx
> REX: 0x42 [W:0b][R:0b][X:1b][B:0b]
> ModRM: 0x4c [mod:1b][reg:1b][rm:100b]
> SIB: 0x23 [scale:0b][.X: 1b, index:100b][.B:0b, base:11b]
> Displacement: 0x80 (1-byte, as per ModRM.mod = 1b)
>
> %r12 is a valid register to use in the scale*index part of the effective
> address computation.
>
> Cc: Borislav Petkov <[email protected]>
> Cc: Andy Lutomirski <[email protected]>
> Cc: Dave Hansen <[email protected]>
> Cc: Adam Buchbinder <[email protected]>
> Cc: Colin Ian King <[email protected]>
> Cc: Lorenzo Stoakes <[email protected]>
> Cc: Qiaowei Ren <[email protected]>
> Cc: Peter Zijlstra <[email protected]>
> Cc: Nathan Howard <[email protected]>
> Cc: Adan Hawthorn <[email protected]>
> Cc: Joe Perches <[email protected]>
> Cc: Ravi V. Shankar <[email protected]>
> Cc: [email protected]
> Signed-off-by: Ricardo Neri <[email protected]>
> ---
> arch/x86/mm/mpx.c | 22 ++++++++++++++++++++--
> 1 file changed, 20 insertions(+), 2 deletions(-)
Reviewed-by: Borislav Petkov <[email protected]>
--
Regards/Gruss,
Boris.
SUSE Linux GmbH, GF: Felix Imendörffer, Jane Smithard, Graham Norton, HRB 21284 (AG Nürnberg)
--
On Thu, 2017-08-31 at 21:38 +0200, Borislav Petkov wrote:
> On Fri, Aug 18, 2017 at 05:27:47PM -0700, Ricardo Neri wrote:
> > Section 2.2.1.2 of the Intel 64 and IA-32 Architectures Software
> > Developer's Manual volume 2A states that when ModRM.mod !=11b and
> > ModRM.rm = 100b indexed register-indirect addressing is used. In other
> > words, a SIB byte follows the ModRM byte. In the specific case of
> > SIB.index = 100b, the scale*index portion of the computation of the
> > effective address is null. To signal callers of this particular situation,
> > get_reg_offset() can return -EDOM (-EINVAL continues to indicate an error
> > when decoding the SIB byte).
> >
> > An example of this situation can be the following instruction:
> >
> > 8b 4c 23 80 mov -0x80(%rbx,%riz,1),%rcx
> > ModRM: 0x4c [mod:1b][reg:1b][rm:100b]
> > SIB: 0x23 [scale:0b][index:100b][base:11b]
> > Displacement: 0x80 (1-byte, as per ModRM.mod = 1b)
> >
> > The %riz 'register' indicates a null index.
> >
> > In long mode, a REX prefix may be used. When a REX prefix is present,
> > REX.X adds a fourth bit to the register selection of SIB.index. This gives
> > the ability to refer to all the 16 general purpose registers. When REX.X is
> > 1b and SIB.index is 100b, the index is indicated in %r12. In our example,
> > this would look like:
> >
> > 42 8b 4c 23 80 mov -0x80(%rbx,%r12,1),%rcx
> > REX: 0x42 [W:0b][R:0b][X:1b][B:0b]
> > ModRM: 0x4c [mod:1b][reg:1b][rm:100b]
> > SIB: 0x23 [scale:0b][.X: 1b, index:100b][.B:0b, base:11b]
> > Displacement: 0x80 (1-byte, as per ModRM.mod = 1b)
> >
> > %r12 is a valid register to use in the scale*index part of the effective
> > address computation.
> >
> > Cc: Borislav Petkov <[email protected]>
> > Cc: Andy Lutomirski <[email protected]>
> > Cc: Dave Hansen <[email protected]>
> > Cc: Adam Buchbinder <[email protected]>
> > Cc: Colin Ian King <[email protected]>
> > Cc: Lorenzo Stoakes <[email protected]>
> > Cc: Qiaowei Ren <[email protected]>
> > Cc: Peter Zijlstra <[email protected]>
> > Cc: Nathan Howard <[email protected]>
> > Cc: Adan Hawthorn <[email protected]>
> > Cc: Joe Perches <[email protected]>
> > Cc: Ravi V. Shankar <[email protected]>
> > Cc: [email protected]
> > Signed-off-by: Ricardo Neri <[email protected]>
> > ---
> > arch/x86/mm/mpx.c | 22 ++++++++++++++++++++--
> > 1 file changed, 20 insertions(+), 2 deletions(-)
>
> Reviewed-by: Borislav Petkov <[email protected]>
Thanks for your review!
On Thu, 2017-08-31 at 11:51 +0200, Borislav Petkov wrote:
> On Wed, Aug 30, 2017 at 09:04:18PM -0700, Ricardo Neri wrote:
> > Thank you! Is it necessary for me to submit a v9 with these updates?
> > Perhaps I can make these updates in branch for the maintainers to pull
> > when/if this series is ack'ed.
>
> Don't do anything and let me go through the rest of them first. It is
> too late for this merge window anyway so we can take our time. Once you
> receive full feedback from me (and hopefully others) you can send what
> looks like to be a final v9 with all feedback incorporated. :-)
Sure, I will wait until you (and hopefully others) are done reviewing.
Thanks and BR,
Ricardo
On Fri, Aug 18, 2017 at 05:27:48PM -0700, Ricardo Neri wrote:
> Section 2.2.1.2 of the Intel 64 and IA-32 Architectures Software
> Developer's Manual volume 2A states that if a SIB byte is used and
> SIB.base is 101b and ModRM.mod is zero, then the base part of the
> effective address computation is null. To signal this situation, a -EDOM
> error is returned to indicate to callers that the base value present in
> the register operand should be ignored.
>
> In this scenario, a 32-bit displacement follows the SIB byte. Displacement
> is obtained when the instruction decoder parses the operands.
>
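For illustration, the check described above boils down to something like the
following (a standalone sketch, not the mpx.c change; the function name is
invented):

#include <stdio.h>

/*
 * Toy check, not the patch's code: ModRM.mod == 0 with SIB.base == 101b
 * means no base register; only a 32-bit displacement follows the SIB byte.
 */
static int sib_base_is_null(unsigned char modrm, unsigned char sib)
{
	return ((modrm >> 6) & 0x3) == 0 && (sib & 0x7) == 5;
}

int main(void)
{
	/* 8b 0c 25 44 33 22 11:  mov 0x11223344,%ecx  (disp32 only) */
	printf("%d\n", sib_base_is_null(0x0c, 0x25));	/* prints 1 */
	/* 8b 4c 23 80:  mov -0x80(%rbx,%riz,1),%rcx  (base is %rbx) */
	printf("%d\n", sib_base_is_null(0x4c, 0x23));	/* prints 0 */
	return 0;
}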
> Cc: Borislav Petkov <[email protected]>
> Cc: Andy Lutomirski <[email protected]>
> Cc: Dave Hansen <[email protected]>
> Cc: Adam Buchbinder <[email protected]>
> Cc: Colin Ian King <[email protected]>
> Cc: Lorenzo Stoakes <[email protected]>
> Cc: Qiaowei Ren <[email protected]>
> Cc: Peter Zijlstra <[email protected]>
> Cc: Nathan Howard <[email protected]>
> Cc: Adan Hawthorn <[email protected]>
> Cc: Joe Perches <[email protected]>
> Cc: Ravi V. Shankar <[email protected]>
> Cc: [email protected]
> Signed-off-by: Ricardo Neri <[email protected]>
> ---
> arch/x86/mm/mpx.c | 28 +++++++++++++++++++---------
> 1 file changed, 19 insertions(+), 9 deletions(-)
Reviewed-by: Borislav Petkov <[email protected]>
--
Regards/Gruss,
Boris.
SUSE Linux GmbH, GF: Felix Imendörffer, Jane Smithard, Graham Norton, HRB 21284 (AG Nürnberg)
--
On Fri, Aug 18, 2017 at 05:27:49PM -0700, Ricardo Neri wrote:
> Other kernel submodules can benefit from using the utility functions
> defined in mpx.c to obtain the addresses and values of operands contained
> in the general purpose registers. An instance of this is the emulation code
> used for instructions protected by the Intel User-Mode Instruction
> Prevention feature.
>
> Thus, these functions are relocated to a new insn-eval.c file. The reason
> to not relocate these utilities into insn.c is that the latter solely
> analyses instructions given by a struct insn without any knowledge of the
> meaning of the values of instruction operands. This new utility insn-
> eval.c aims to be used to resolve and userspace linear addresses based on
^
|
something's missing there - "kernel" maybe?
> the contents of the instruction operands as well as the contents of pt_regs
> structure.
>
> These utilities come with a separate header. This is to avoid taking insn.c
> out of sync with the instruction decoders under tools/obj and tools/perf.
> This also avoids adding cumbersome #ifdef's for the #include'd files
> required to decode instructions in a kernel context.
>
> Functions are simply relocated. There are no functional or indentation
> changes.
That text below you don't need to have in the commit message. Patch
handling and other modalities are usually put after the "---" and before
the diffstat below...
> The checkpatch script issues the following warning with this
> commit:
>
> WARNING: Avoid crashing the kernel - try using WARN_ON & recovery code
> rather than BUG() or BUG_ON()
> + BUG();
>
> This warning will be fixed in a subsequent patch.
>
> Cc: Borislav Petkov <[email protected]>
> Cc: Andy Lutomirski <[email protected]>
> Cc: Dave Hansen <[email protected]>
> Cc: Adam Buchbinder <[email protected]>
> Cc: Colin Ian King <[email protected]>
> Cc: Lorenzo Stoakes <[email protected]>
> Cc: Qiaowei Ren <[email protected]>
> Cc: Arnaldo Carvalho de Melo <[email protected]>
> Cc: Masami Hiramatsu <[email protected]>
> Cc: Adrian Hunter <[email protected]>
> Cc: Kees Cook <[email protected]>
> Cc: Thomas Garnier <[email protected]>
> Cc: Peter Zijlstra <[email protected]>
> Cc: Dmitry Vyukov <[email protected]>
> Cc: Ravi V. Shankar <[email protected]>
> Cc: [email protected]
> Signed-off-by: Ricardo Neri <[email protected]>
> ---
<--- ... here. Put such notes here.
> arch/x86/include/asm/insn-eval.h | 16 ++++
> arch/x86/lib/Makefile | 2 +-
> arch/x86/lib/insn-eval.c | 163 +++++++++++++++++++++++++++++++++++++++
> arch/x86/mm/mpx.c | 156 +------------------------------------
> 4 files changed, 182 insertions(+), 155 deletions(-)
> create mode 100644 arch/x86/include/asm/insn-eval.h
> create mode 100644 arch/x86/lib/insn-eval.c
Reviewed-by: Borislav Petkov <[email protected]>
--
Regards/Gruss,
Boris.
SUSE Linux GmbH, GF: Felix Imendörffer, Jane Smithard, Graham Norton, HRB 21284 (AG Nürnberg)
--
On Wed, 2017-09-06 at 17:54 +0200, Borislav Petkov wrote:
> On Fri, Aug 18, 2017 at 05:27:49PM -0700, Ricardo Neri wrote:
> > Other kernel submodules can benefit from using the utility functions
> > defined in mpx.c to obtain the addresses and values of operands contained
> > in the general purpose registers. An instance of this is the emulation code
> > used for instructions protected by the Intel User-Mode Instruction
> > Prevention feature.
> >
> > Thus, these functions are relocated to a new insn-eval.c file. The reason
> > to not relocate these utilities into insn.c is that the latter solely
> > analyses instructions given by a struct insn without any knowledge of the
> > meaning of the values of instruction operands. This new utility insn-
> > eval.c aims to be used to resolve and userspace linear addresses based on
> ^
> |
>
> something's missing there - "kernel" maybe?
I have updated this line to read "This new utility insn-eval.c aims to
be used to resolve userspace linear addresses based on the contents of
the instruction operands as well as the contents of pt_regs structure."
>
> > the contents of the instruction operands as well as the contents of pt_regs
> > structure.
> >
> > These utilities come with a separate header. This is to avoid taking insn.c
> > out of sync with the instruction decoders under tools/obj and tools/perf.
> > This also avoids adding cumbersome #ifdef's for the #include'd files
> > required to decode instructions in a kernel context.
> >
> > Functions are simply relocated. There are no functional or indentation
> > changes.
>
> That text below you don't need to have in the commit message. Patch
> handling and other modalities are usually put after the "---" and before
> the diffstat below...
>
> > The checkpatch script issues the following warning with this
> > commit:
> >
> > WARNING: Avoid crashing the kernel - try using WARN_ON & recovery code
> > rather than BUG() or BUG_ON()
> > + BUG();
> >
> > This warning will be fixed in a subsequent patch.
> >
> > Cc: Borislav Petkov <[email protected]>
> > Cc: Andy Lutomirski <[email protected]>
> > Cc: Dave Hansen <[email protected]>
> > Cc: Adam Buchbinder <[email protected]>
> > Cc: Colin Ian King <[email protected]>
> > Cc: Lorenzo Stoakes <[email protected]>
> > Cc: Qiaowei Ren <[email protected]>
> > Cc: Arnaldo Carvalho de Melo <[email protected]>
> > Cc: Masami Hiramatsu <[email protected]>
> > Cc: Adrian Hunter <[email protected]>
> > Cc: Kees Cook <[email protected]>
> > Cc: Thomas Garnier <[email protected]>
> > Cc: Peter Zijlstra <[email protected]>
> > Cc: Dmitry Vyukov <[email protected]>
> > Cc: Ravi V. Shankar <[email protected]>
> > Cc: [email protected]
> > Signed-off-by: Ricardo Neri <[email protected]>
> > ---
>
> <--- ... here. Put such notes here.
Thanks for explaining this to me. I will move the note about the warning
here.
>
> > arch/x86/include/asm/insn-eval.h | 16 ++++
> > arch/x86/lib/Makefile | 2 +-
> > arch/x86/lib/insn-eval.c | 163 +++++++++++++++++++++++++++++++++++++++
> > arch/x86/mm/mpx.c | 156 +------------------------------------
> > 4 files changed, 182 insertions(+), 155 deletions(-)
> > create mode 100644 arch/x86/include/asm/insn-eval.h
> > create mode 100644 arch/x86/lib/insn-eval.c
>
> Reviewed-by: Borislav Petkov <[email protected]>
Thank you!
BR,
Ricardo
On Fri, Aug 18, 2017 at 05:27:50PM -0700, Ricardo Neri wrote:
> We are not in a critical failure path. The invalid register type is caused
> when trying to decode invalid instruction bytes from a user-space program.
> Thus, simply print an error message. To prevent this warning from being
> abused by user-space programs, use the rate-limited variant of pr_err().
>
> Cc: Borislav Petkov <[email protected]>
> Cc: Andy Lutomirski <[email protected]>
> Cc: Dave Hansen <[email protected]>
> Cc: Adam Buchbinder <[email protected]>
> Cc: Colin Ian King <[email protected]>
> Cc: Lorenzo Stoakes <[email protected]>
> Cc: Qiaowei Ren <[email protected]>
> Cc: Arnaldo Carvalho de Melo <[email protected]>
> Cc: Masami Hiramatsu <[email protected]>
> Cc: Adrian Hunter <[email protected]>
> Cc: Kees Cook <[email protected]>
> Cc: Thomas Garnier <[email protected]>
> Cc: Peter Zijlstra <[email protected]>
> Cc: Dmitry Vyukov <[email protected]>
> Cc: Ravi V. Shankar <[email protected]>
> Cc: [email protected]
> Signed-off-by: Ricardo Neri <[email protected]>
> ---
> arch/x86/lib/insn-eval.c | 6 +++---
> 1 file changed, 3 insertions(+), 3 deletions(-)
>
> diff --git a/arch/x86/lib/insn-eval.c b/arch/x86/lib/insn-eval.c
> index 2bb8303ba92f..3919458fecbf 100644
> --- a/arch/x86/lib/insn-eval.c
> +++ b/arch/x86/lib/insn-eval.c
> @@ -5,6 +5,7 @@
> */
> #include <linux/kernel.h>
> #include <linux/string.h>
> +#include <linux/ratelimit.h>
> #include <asm/inat.h>
> #include <asm/insn.h>
> #include <asm/insn-eval.h>
> @@ -85,9 +86,8 @@ static int get_reg_offset(struct insn *insn, struct pt_regs *regs,
> break;
>
> default:
> - pr_err("invalid register type");
> - BUG();
> - break;
> + pr_err_ratelimited("insn: x86: invalid register type");
Also, I meant to add it to pr_fmt. Feel free to merge this hunk ontop of
yours:
---
diff --git a/arch/x86/lib/insn-eval.c b/arch/x86/lib/insn-eval.c
index 3919458fecbf..d46034ddfbb7 100644
--- a/arch/x86/lib/insn-eval.c
+++ b/arch/x86/lib/insn-eval.c
@@ -10,6 +10,9 @@
#include <asm/insn.h>
#include <asm/insn-eval.h>
+#undef pr_fmt
+#define pr_fmt(fmt) "insn: " fmt
+
enum reg_type {
REG_TYPE_RM = 0,
REG_TYPE_INDEX,
@@ -86,7 +89,7 @@ static int get_reg_offset(struct insn *insn, struct pt_regs *regs,
break;
default:
- pr_err_ratelimited("insn: x86: invalid register type");
+ pr_err_ratelimited("invalid register type: %d\n", type);
return -EINVAL;
}
--
Regards/Gruss,
Boris.
SUSE Linux GmbH, GF: Felix Imendörffer, Jane Smithard, Graham Norton, HRB 21284 (AG Nürnberg)
--
On Thu, 2017-09-07 at 19:54 +0200, Borislav Petkov wrote:
>
> Also, I meant to add it to pr_fmt. Feel free to merge this hunk ontop
> of
> yours:
>
> ---
> diff --git a/arch/x86/lib/insn-eval.c b/arch/x86/lib/insn-eval.c
> index 3919458fecbf..d46034ddfbb7 100644
> --- a/arch/x86/lib/insn-eval.c
> +++ b/arch/x86/lib/insn-eval.c
> @@ -10,6 +10,9 @@
> #include <asm/insn.h>
> #include <asm/insn-eval.h>
>
> +#undef pr_fmt
> +#define pr_fmt(fmt) "insn: " fmt
> +
> enum reg_type {
> REG_TYPE_RM = 0,
> REG_TYPE_INDEX,
> @@ -86,7 +89,7 @@ static int get_reg_offset(struct insn *insn, struct
> pt_regs *regs,
> break;
>
> default:
> - pr_err_ratelimited("insn: x86: invalid register
> type");
> + pr_err_ratelimited("invalid register type: %d\n",
> type);
> return -EINVAL;
> }
>
Oh, I didn't understand your comment initially. Sure, I will merge
this on top of my patch.
Thanks and BR,
Ricardo
On Fri, Aug 18, 2017 at 05:27:51PM -0700, Ricardo Neri wrote:
> The function get_reg_offset() returns the offset to the register the
> argument specifies as indicated in an enumeration of type offset. Callers
> of this function would need the definition of such enumeration. This is
> not needed. Instead, add helper functions for this purpose. These functions
> are useful in cases when, for instance, the caller needs to decide whether
> the operand is a register or a memory location by looking at the rm part
> of the ModRM byte. As of now, this is the only helper function that is
> needed.
>
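As a rough sketch of the kind of check such a helper performs (illustrative
only; this function name is invented and is not the helper added by the
patch):

#include <stdio.h>

/* ModRM.mod == 11b selects a register operand; anything else is memory. */
static int modrm_is_register_operand(unsigned char modrm)
{
	return ((modrm >> 6) & 0x3) == 0x3;
}

int main(void)
{
	printf("0xc1 -> %d\n", modrm_is_register_operand(0xc1)); /* 1: register */
	printf("0x4c -> %d\n", modrm_is_register_operand(0x4c)); /* 0: memory  */
	return 0;
}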
> Cc: Dave Hansen <[email protected]>
> Cc: Adam Buchbinder <[email protected]>
> Cc: Colin Ian King <[email protected]>
> Cc: Lorenzo Stoakes <[email protected]>
> Cc: Qiaowei Ren <[email protected]>
> Cc: Arnaldo Carvalho de Melo <[email protected]>
> Cc: Masami Hiramatsu <[email protected]>
> Cc: Adrian Hunter <[email protected]>
> Cc: Kees Cook <[email protected]>
> Cc: Thomas Garnier <[email protected]>
> Cc: Peter Zijlstra <[email protected]>
> Cc: Borislav Petkov <[email protected]>
> Cc: Dmitry Vyukov <[email protected]>
> Cc: Ravi V. Shankar <[email protected]>
> Cc: [email protected]
> Signed-off-by: Ricardo Neri <[email protected]>
> ---
> arch/x86/include/asm/insn-eval.h | 1 +
> arch/x86/lib/insn-eval.c | 15 +++++++++++++++
> 2 files changed, 16 insertions(+)
Reviewed-by: Borislav Petkov <[email protected]>
--
Regards/Gruss,
Boris.
SUSE Linux GmbH, GF: Felix Imendörffer, Jane Smithard, Graham Norton, HRB 21284 (AG Nürnberg)
--
On Fri, Aug 18, 2017 at 05:27:52PM -0700, Ricardo Neri wrote:
> String instructions are special because, in protected mode, the linear
> address is always obtained via the ES segment register in operands that
> use the (E)DI register; the DS segment register in operands that use
> the (E)SI register. Furthermore, segment override prefixes are ignored
> when calculating a linear address involving the (E)DI register; segment
> override prefixes can be used when calculating linear addresses involving
> the (E)SI register.
>
> It follows that linear addresses are calculated differently for the case of
> string instructions. The purpose of this utility function is to identify
> such instructions for callers to determine a linear address correctly.
>
> Note that this function only identifies string instructions; it does not
> determine what segment register to use in the address computation. That is
> left to callers. A subsequent commit introduces a function to determine
> the segment register to use given the instruction, operands and
> segment override prefixes.
>
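A compact way to picture the rule described above (a toy sketch only, not the
patch's segment-resolution code):

#include <stdio.h>

enum seg { SEG_DS, SEG_ES, SEG_CS, SEG_SS, SEG_FS, SEG_GS };

/*
 * Toy model of the rule for string instructions: the (E)DI operand always
 * uses ES and ignores any segment override prefix; the (E)SI operand
 * defaults to DS but does honor an override prefix.
 */
static enum seg string_op_seg(int uses_edi, int has_override, enum seg ovr)
{
	if (uses_edi)
		return SEG_ES;

	return has_override ? ovr : SEG_DS;
}

int main(void)
{
	/* movsb: destination is ES:(E)DI, an override on (E)DI is ignored */
	printf("dst seg: %d\n", string_op_seg(1, 1, SEG_FS));	/* SEG_ES */
	/* movsb with an FS override prefix: source is FS:(E)SI */
	printf("src seg: %d\n", string_op_seg(0, 1, SEG_FS));	/* SEG_FS */
	return 0;
}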
> Cc: Dave Hansen <[email protected]>
> Cc: Adam Buchbinder <[email protected]>
> Cc: Colin Ian King <[email protected]>
> Cc: Lorenzo Stoakes <[email protected]>
> Cc: Qiaowei Ren <[email protected]>
> Cc: Arnaldo Carvalho de Melo <[email protected]>
> Cc: Masami Hiramatsu <[email protected]>
> Cc: Adrian Hunter <[email protected]>
> Cc: Kees Cook <[email protected]>
> Cc: Thomas Garnier <[email protected]>
> Cc: Peter Zijlstra <[email protected]>
> Cc: Borislav Petkov <[email protected]>
> Cc: Dmitry Vyukov <[email protected]>
> Cc: Ravi V. Shankar <[email protected]>
> Cc: [email protected]
> Signed-off-by: Ricardo Neri <[email protected]>
> ---
> arch/x86/lib/insn-eval.c | 26 ++++++++++++++++++++++++++
> 1 file changed, 26 insertions(+)
Reviewed-by: Borislav Petkov <[email protected]>
--
Regards/Gruss,
Boris.
SUSE Linux GmbH, GF: Felix Imendörffer, Jane Smithard, Graham Norton, HRB 21284 (AG Nürnberg)
--
On Fri, 2017-09-08 at 15:35 +0200, Borislav Petkov wrote:
> On Fri, Aug 18, 2017 at 05:27:51PM -0700, Ricardo Neri wrote:
> >
> > The function get_reg_offset() returns the offset to the register
> > the
> > argument specifies as indicated in an enumeration of type offset.
> > Callers
> > of this function would need the definition of such enumeration.
> > This is
> > not needed. Instead, add helper functions for this purpose. These
> > functions
> > are useful in cases when, for instance, the caller needs to decide
> > whether
> > the operand is a register or a memory location by looking at the rm
> > part
> > of the ModRM byte. As of now, this is the only helper function that
> > is
> > needed.
> >
> > Cc: Dave Hansen <[email protected]>
> > Cc: Adam Buchbinder <[email protected]>
> > Cc: Colin Ian King <[email protected]>
> > Cc: Lorenzo Stoakes <[email protected]>
> > Cc: Qiaowei Ren <[email protected]>
> > Cc: Arnaldo Carvalho de Melo <[email protected]>
> > Cc: Masami Hiramatsu <[email protected]>
> > Cc: Adrian Hunter <[email protected]>
> > Cc: Kees Cook <[email protected]>
> > Cc: Thomas Garnier <[email protected]>
> > Cc: Peter Zijlstra <[email protected]>
> > Cc: Borislav Petkov <[email protected]>
> > Cc: Dmitry Vyukov <[email protected]>
> > Cc: Ravi V. Shankar <[email protected]>
> > Cc: [email protected]
> > Signed-off-by: Ricardo Neri <[email protected]>
> > ---
> > arch/x86/include/asm/insn-eval.h | 1 +
> > arch/x86/lib/insn-eval.c | 15 +++++++++++++++
> > 2 files changed, 16 insertions(+)
> Reviewed-by: Borislav Petkov <[email protected]>
Thanks for your review!
BR,
Ricardo
On Fri, 2017-09-08 at 15:57 +0200, Borislav Petkov wrote:
> On Fri, Aug 18, 2017 at 05:27:52PM -0700, Ricardo Neri wrote:
> >
> > String instructions are special because, in protected mode, the
> > linear
> > address is always obtained via the ES segment register in operands
> > that
> > use the (E)DI register; the DS segment register in operands that
> > use
> > the (E)SI register. Furthermore, segment override prefixes are
> > ignored
> > when calculating a linear address involving the (E)DI register;
> > segment
> > override prefixes can be used when calculating linear addresses
> > involving
> > the (E)SI register.
> >
> > It follows that linear addresses are calculated differently for the
> > case of
> > string instructions. The purpose of this utility function is to
> > identify
> > such instructions for callers to determine a linear address
> > correctly.
> >
> > Note that this function only identifies string instructions; it
> > does not
> > determine what segment register to use in the address computation.
> > That is
> > left to callers. A subsequent commit introduces a function to
> > determine
> > the segment register to use given the instruction, operands and
> > segment override prefixes.
> >
> > Cc: Dave Hansen <[email protected]>
> > Cc: Adam Buchbinder <[email protected]>
> > Cc: Colin Ian King <[email protected]>
> > Cc: Lorenzo Stoakes <[email protected]>
> > Cc: Qiaowei Ren <[email protected]>
> > Cc: Arnaldo Carvalho de Melo <[email protected]>
> > Cc: Masami Hiramatsu <[email protected]>
> > Cc: Adrian Hunter <[email protected]>
> > Cc: Kees Cook <[email protected]>
> > Cc: Thomas Garnier <[email protected]>
> > Cc: Peter Zijlstra <[email protected]>
> > Cc: Borislav Petkov <[email protected]>
> > Cc: Dmitry Vyukov <[email protected]>
> > Cc: Ravi V. Shankar <[email protected]>
> > Cc: [email protected]
> > Signed-off-by: Ricardo Neri <[email protected]>
> > ---
> > arch/x86/lib/insn-eval.c | 26 ++++++++++++++++++++++++++
> > 1 file changed, 26 insertions(+)
> Reviewed-by: Borislav Petkov <[email protected]>
Thanks for your review!
BR,
Ricardo
Hi,
On Fri, Aug 18, 2017 at 05:27:53PM -0700, Ricardo Neri wrote:
> When computing a linear address and segmentation is used, we need to know
> the base address of the segment involved in the computation. In most
> cases, the segment base address will be zero, as with USER_DS/USER32_DS.
...
> arch/x86/include/asm/inat.h | 10 ++
> arch/x86/lib/insn-eval.c | 278 ++++++++++++++++++++++++++++++++++++++++++++
> 2 files changed, 288 insertions(+)
so I did a bunch of simplifications on top, see if you agree:
* we should always test for if (!insn) first because otherwise we can't talk
about a segment at all.
* the nomenclature should be clear: if we return INAT_SEG_REG_* those are our own
defined indices and not registers or prefixes or whatever else, so everywhere we
state that we're returning an *index*.
* and then shorten local variables' names as reading "reg" every
other line doesn't make it clearer :)
* also some comments formatting for better readability.
* and prefixing register names with "r" in the comments then means all
register widths, not only 32-bit. Dunno, is "(E)" SDM nomenclature for
the different register widths?
---
diff --git a/arch/x86/lib/insn-eval.c b/arch/x86/lib/insn-eval.c
index 86f58ce6c302..720529573d72 100644
--- a/arch/x86/lib/insn-eval.c
+++ b/arch/x86/lib/insn-eval.c
@@ -44,50 +44,45 @@ static bool is_string_insn(struct insn *insn)
}
/**
- * get_overridden_seg_reg() - obtain segment register to use from prefixes
- * @insn: Instruction structure with segment override prefixes
- * @regs: Structure with register values as seen when entering kernel mode
+ * get_seg_reg_idx() - obtain segment register index to use from prefixes
+ * @insn: Instruction with segment override prefixes
+ * @regs: Register values as seen when entering kernel mode
* @regoff: Operand offset, in pt_regs, used to deterimine segment register
*
- * The segment register to which an effective address refers depends on
- * a) whether running in long mode (in such a case semgment override prefixes
- * are ignored. b) Whether segment override prefixes must be ignored for certain
- * registers: always use CS when the register is (R|E)IP; always use ES when
- * operand register is (E)DI with a string instruction as defined in the Intel
- * documentation. c) If segment overrides prefixes are found in the instruction
- * prefixes. d) Use the default segment register associated with the operand
- * register.
+ * The segment register to which an effective address refers, depends on:
+ *
+ * a) whether running in long mode (in such a case segment override prefixes
+ * are ignored).
+ *
+ * b) Whether segment override prefixes must be ignored for certain
+ * registers: always use CS when the register is rIP; always use ES when
+ * operand register is rDI with a string instruction as defined in the Intel
+ * documentation.
*
- * This function returns the overridden segment register to use, if any, as per
- * the conditions described above. Please note that this function
+ * c) If segment overrides prefixes are found in the instruction prefixes.
+ *
+ * d) Use the default segment register associated with the operand register.
+ *
+ * This function returns the segment register override to use, if any,
+ * as per the conditions described above. Please note that this function
* does not return the value in the segment register (i.e., the segment
- * selector). The segment selector needs to be obtained using
- * get_segment_selector() and passing the segment register resolved by
+ * selector) but our defined index. The segment selector needs to be obtained
+ * using get_segment_selector() and passing the segment register resolved by
* this function.
*
- * Return: A constant identifying the segment register to use, among CS, SS, DS,
+ * Returns:
+ *
+ * A constant identifying the segment register to use, among CS, SS, DS,
* ES, FS, or GS. INAT_SEG_REG_IGNORE is returned if running in long mode.
* INAT_SEG_REG_DEFAULT is returned if no segment override prefixes were found
- * and the default segment register shall be used. -EINVAL in case of error.
+ * and the default segment register shall be used.
+ *
+ * -EINVAL in case of error.
*/
-static int get_overridden_seg_reg(struct insn *insn, struct pt_regs *regs,
- int regoff)
+static int get_seg_reg_idx(struct insn *insn, struct pt_regs *regs, int regoff)
{
- int i;
- int sel_overrides = 0;
- int seg_register = INAT_SEG_REG_DEFAULT;
-
- /*
- * Segment override prefixes should not be used for (E)IP. Check this
- * case first as we might not have (and not needed at all) a
- * valid insn structure to evaluate segment override prefixes.
- */
- if (regoff == offsetof(struct pt_regs, ip)) {
- if (user_64bit_mode(regs))
- return INAT_SEG_REG_IGNORE;
- else
- return INAT_SEG_REG_DEFAULT;
- }
+ int idx = INAT_SEG_REG_DEFAULT;
+ int sel_overrides = 0, i;
if (!insn)
return -EINVAL;
@@ -101,27 +96,27 @@ static int get_overridden_seg_reg(struct insn *insn, struct pt_regs *regs,
attr = inat_get_opcode_attribute(insn->prefixes.bytes[i]);
switch (attr) {
case INAT_MAKE_PREFIX(INAT_PFX_CS):
- seg_register = INAT_SEG_REG_CS;
+ idx = INAT_SEG_REG_CS;
sel_overrides++;
break;
case INAT_MAKE_PREFIX(INAT_PFX_SS):
- seg_register = INAT_SEG_REG_SS;
+ idx = INAT_SEG_REG_SS;
sel_overrides++;
break;
case INAT_MAKE_PREFIX(INAT_PFX_DS):
- seg_register = INAT_SEG_REG_DS;
+ idx = INAT_SEG_REG_DS;
sel_overrides++;
break;
case INAT_MAKE_PREFIX(INAT_PFX_ES):
- seg_register = INAT_SEG_REG_ES;
+ idx = INAT_SEG_REG_ES;
sel_overrides++;
break;
case INAT_MAKE_PREFIX(INAT_PFX_FS):
- seg_register = INAT_SEG_REG_FS;
+ idx = INAT_SEG_REG_FS;
sel_overrides++;
break;
case INAT_MAKE_PREFIX(INAT_PFX_GS):
- seg_register = INAT_SEG_REG_GS;
+ idx = INAT_SEG_REG_GS;
sel_overrides++;
break;
/* No default action needed. */
@@ -133,26 +128,26 @@ static int get_overridden_seg_reg(struct insn *insn, struct pt_regs *regs,
* overrides for FS and GS.
*/
if (user_64bit_mode(regs)) {
- if (seg_register != INAT_SEG_REG_FS &&
- seg_register != INAT_SEG_REG_GS)
+ if (idx != INAT_SEG_REG_FS &&
+ idx != INAT_SEG_REG_GS)
return INAT_SEG_REG_IGNORE;
/* More than one segment override prefix leads to undefined behavior. */
} else if (sel_overrides > 1) {
return -EINVAL;
/*
* Segment override prefixes are always ignored for string instructions
- * that involve the use the (E)DI register.
+ * that use the (E)DI register.
*/
} else if ((regoff == offsetof(struct pt_regs, di)) &&
is_string_insn(insn)) {
return INAT_SEG_REG_DEFAULT;
}
- return seg_register;
+ return idx;
}
/**
- * resolve_seg_register() - obtain segment register
+ * resolve_seg_reg() - obtain segment register index
* @insn: Instruction structure with segment override prefixes
* @regs: Structure with register values as seen when entering kernel mode
* @regoff: Operand offset, in pt_regs, used to deterimine segment register
@@ -169,36 +164,38 @@ static int get_overridden_seg_reg(struct insn *insn, struct pt_regs *regs,
*
* Return: A constant identifying the segment register to use, among CS, SS, DS,
* ES, FS, or GS. INAT_SEG_REG_IGNORE is returned if running in long mode.
+ *
* -EINVAL in case of error.
*/
-static int resolve_seg_register(struct insn *insn, struct pt_regs *regs,
- int regoff)
+static int resolve_seg_reg(struct insn *insn, struct pt_regs *regs, int regoff)
{
- int seg_reg;
+ int idx;
- seg_reg = get_overridden_seg_reg(insn, regs, regoff);
+ if (!insn)
+ return -EINVAL;
- if (seg_reg < 0)
- return seg_reg;
+ idx = get_seg_reg_idx(insn, regs, regoff);
+ if (idx < 0)
+ return idx;
- if (seg_reg == INAT_SEG_REG_IGNORE)
- return seg_reg;
+ if (idx == INAT_SEG_REG_IGNORE)
+ return idx;
- if (seg_reg != INAT_SEG_REG_DEFAULT)
- return seg_reg;
+ if (idx != INAT_SEG_REG_DEFAULT)
+ return idx;
/*
* If we are here, we use the default segment register as described
* in the Intel documentation:
- * + DS for all references involving (E)AX, (E)CX, (E)DX, (E)BX, and
- * (E)SI.
- * + If used in a string instruction, ES for (E)DI. Otherwise, DS.
+ *
+ * + DS for all references involving r[ABCD]X, and rSI.
+ * + If used in a string instruction, ES for rDI. Otherwise, DS.
* + AX, CX and DX are not valid register operands in 16-bit address
* encodings but are valid for 32-bit and 64-bit encodings.
* + -EDOM is reserved to identify for cases in which no register
* is used (i.e., displacement-only addressing). Use DS.
- * + SS for (E)SP or (E)BP.
- * + CS for (E)IP.
+ * + SS for rSP or rBP.
+ * + CS for rIP.
*/
switch (regoff) {
@@ -206,24 +203,26 @@ static int resolve_seg_register(struct insn *insn, struct pt_regs *regs,
case offsetof(struct pt_regs, cx):
case offsetof(struct pt_regs, dx):
/* Need insn to verify address size. */
- if (!insn || insn->addr_bytes == 2)
+ if (insn->addr_bytes == 2)
return -EINVAL;
+
case -EDOM:
case offsetof(struct pt_regs, bx):
case offsetof(struct pt_regs, si):
return INAT_SEG_REG_DS;
+
case offsetof(struct pt_regs, di):
- /* Need insn to see if insn is string instruction. */
- if (!insn)
- return -EINVAL;
if (is_string_insn(insn))
return INAT_SEG_REG_ES;
return INAT_SEG_REG_DS;
+
case offsetof(struct pt_regs, bp):
case offsetof(struct pt_regs, sp):
return INAT_SEG_REG_SS;
+
case offsetof(struct pt_regs, ip):
return INAT_SEG_REG_CS;
+
default:
return -EINVAL;
}
@@ -232,17 +231,20 @@ static int resolve_seg_register(struct insn *insn, struct pt_regs *regs,
/**
* get_segment_selector() - obtain segment selector
* @regs: Structure with register values as seen when entering kernel mode
- * @seg_reg: Segment register to use
+ * @seg_reg: Segment register index to use
*
- * Obtain the segment selector from any of the CS, SS, DS, ES, FS, GS segment
- * registers. In CONFIG_X86_32, the segment is obtained from either pt_regs or
- * kernel_vm86_regs as applicable. In CONFIG_X86_64, CS and SS are obtained
+ * Obtain the segment selector from any of the CS, SS, DS, ES, FS, GS
+ * segment registers. In CONFIG_X86_32, the segment is obtained from either
+ * pt_regs or kernel_vm86_regs as applicable. On 64-bit, CS and SS are obtained
* from pt_regs. DS, ES, FS and GS are obtained by reading the actual CPU
- * registers. This done for only for completeness as in CONFIG_X86_64 segment
- * registers are ignored.
+ * registers. This done only for completeness as in long mode segment registers
+ * are ignored.
+ *
+ * Returns:
+ *
+ * Value of the segment selector, including null when running in long mode.
*
- * Return: Value of the segment selector, including null when running in
- * long mode. -1 on error.
+ * -EINVAL on error.
*/
static short get_segment_selector(struct pt_regs *regs, int seg_reg)
{
--
Regards/Gruss,
Boris.
SUSE Linux GmbH, GF: Felix Imendörffer, Jane Smithard, Graham Norton, HRB 21284 (AG Nürnberg)
--
On Fri, Aug 18, 2017 at 05:27:54PM -0700, Ricardo Neri wrote:
> The segment descriptor contains information that is relevant to how linear
> addresses need to be computed. It contains the default size of addresses
> as well as the base address of the segment. Thus, given a segment
> selector, we ought look at segment descriptor to correctly calculate the
^
to
> linear address.
>
> In protected mode, the segment selector might indicate a segment
> descriptor from either the global descriptor table or a local descriptor
> table. Both cases are considered in this function.
>
> This function is a prerequisite for functions in subsequent commits that
> will obtain the aforementioned attributes of the segment descriptor.
>
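For reference, the selector layout that drives the GDT/LDT choice can be
sketched as follows (illustrative only, independent of the patch):

#include <stdio.h>

/*
 * Segment selector layout: bits [1:0] = RPL, bit 2 = TI (0: GDT, 1: LDT),
 * bits [15:3] = descriptor index.
 */
int main(void)
{
	unsigned short sel = 0x2b;	/* e.g. __USER_DS on x86_64 */

	printf("index=%u table=%s rpl=%u\n",
	       sel >> 3, (sel & 0x4) ? "LDT" : "GDT", sel & 0x3);
	return 0;
}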
> Cc: Dave Hansen <[email protected]>
> Cc: Adam Buchbinder <[email protected]>
> Cc: Colin Ian King <[email protected]>
> Cc: Lorenzo Stoakes <[email protected]>
> Cc: Qiaowei Ren <[email protected]>
> Cc: Arnaldo Carvalho de Melo <[email protected]>
> Cc: Masami Hiramatsu <[email protected]>
> Cc: Adrian Hunter <[email protected]>
> Cc: Kees Cook <[email protected]>
> Cc: Thomas Garnier <[email protected]>
> Cc: Peter Zijlstra <[email protected]>
> Cc: Borislav Petkov <[email protected]>
> Cc: Dmitry Vyukov <[email protected]>
> Cc: Ravi V. Shankar <[email protected]>
> Cc: [email protected]
> Signed-off-by: Ricardo Neri <[email protected]>
> ---
> arch/x86/lib/insn-eval.c | 55 ++++++++++++++++++++++++++++++++++++++++++++++++
> 1 file changed, 55 insertions(+)
>
> diff --git a/arch/x86/lib/insn-eval.c b/arch/x86/lib/insn-eval.c
> index 86f58ce6c302..9cf2c49afc15 100644
> --- a/arch/x86/lib/insn-eval.c
> +++ b/arch/x86/lib/insn-eval.c
> @@ -6,9 +6,13 @@
> #include <linux/kernel.h>
> #include <linux/string.h>
> #include <linux/ratelimit.h>
> +#include <linux/mmu_context.h>
> +#include <asm/desc_defs.h>
> +#include <asm/desc.h>
> #include <asm/inat.h>
> #include <asm/insn.h>
> #include <asm/insn-eval.h>
> +#include <asm/ldt.h>
> #include <asm/vm86.h>
>
> enum reg_type {
> @@ -402,6 +406,57 @@ static int get_reg_offset(struct insn *insn, struct pt_regs *regs,
> }
>
> /**
> + * get_desc() - Obtain address of segment descriptor
Get segment descriptor.
> + * @sel: Segment selector
> + *
> + * Given a segment selector, obtain a pointer to the segment descriptor.
> + * Both global and local descriptor tables are supported.
> + *
> + * Return: pointer to segment descriptor on success. NULL on error.
> + */
> +static struct desc_struct *get_desc(unsigned short sel)
I've simplified this function to be more readable, here's a diff ontop.
More specifically, if you flip the logic and move @desc inside the if,
you don't need to have mutex_unlock() twice in there.
And having a local @ldt ptr makes the selector check more readable.
---
diff --git a/arch/x86/lib/insn-eval.c b/arch/x86/lib/insn-eval.c
index 9cf2c49afc15..48af787cb160 100644
--- a/arch/x86/lib/insn-eval.c
+++ b/arch/x86/lib/insn-eval.c
@@ -417,24 +417,24 @@ static int get_reg_offset(struct insn *insn, struct pt_regs *regs,
static struct desc_struct *get_desc(unsigned short sel)
{
struct desc_ptr gdt_desc = {0, 0};
- struct desc_struct *desc = NULL;
unsigned long desc_base;
#ifdef CONFIG_MODIFY_LDT_SYSCALL
if ((sel & SEGMENT_TI_MASK) == SEGMENT_LDT) {
+ struct desc_struct *desc = NULL;
+ struct ldt_struct *ldt;
+
/* Bits [15:3] contain the index of the desired entry. */
sel >>= 3;
mutex_lock(¤t->active_mm->context.lock);
- /* The size of the LDT refers to the number of entries. */
- if (!current->active_mm->context.ldt ||
- sel >= current->active_mm->context.ldt->nr_entries) {
- mutex_unlock(&current->active_mm->context.lock);
- return NULL;
- }
- desc = &current->active_mm->context.ldt->entries[sel];
+ ldt = current->active_mm->context.ldt;
+ if (ldt && sel < ldt->nr_entries)
+ desc = &ldt->entries[sel];
+
mutex_unlock(&current->active_mm->context.lock);
+
return desc;
}
#endif
@@ -452,8 +452,7 @@ static struct desc_struct *get_desc(unsigned short sel)
if (desc_base > gdt_desc.size)
return NULL;
- desc = (struct desc_struct *)(gdt_desc.address + desc_base);
- return desc;
+ return (struct desc_struct *)(gdt_desc.address + desc_base);
}
/**
--
Regards/Gruss,
Boris.
SUSE Linux GmbH, GF: Felix Imendörffer, Jane Smithard, Graham Norton, HRB 21284 (AG Nürnberg)
--
On Tue, 2017-09-26 at 12:43 +0200, Borislav Petkov wrote:
> Hi,
>
> On Fri, Aug 18, 2017 at 05:27:53PM -0700, Ricardo Neri wrote:
> >
> > When computing a linear address and segmentation is used, we need
> > to know
> > the base address of the segment involved in the computation. In
> > most of
> > the cases, the segment base address will be zero as in
> > USER_DS/USER32_DS.
> ...
>
> >
> > arch/x86/include/asm/inat.h | 10 ++
> > arch/x86/lib/insn-eval.c | 278
> > ++++++++++++++++++++++++++++++++++++++++++++
> > 2 files changed, 288 insertions(+)
> so I did a bunch of simplifications on top, see if you agree:
>
> * we should always test for if (!insn) first because otherwise we
> can't talk
> about a segment at all.
This is true except when we don't have an insn at all (well, it may be
non-NULL but it will only contain garbage). The case to which I am
referring is when we begin decoding our instruction. The first step is
to copy_from_user the instruction and populate insn. For this we must
calculate the linear address from where we copy using CS and rIP.
Furthermore, in this particular case we don't need to look at insn at
all, as the only register involved is rIP and no segment override
prefixes are allowed.
Please see my comment below.
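A rough sketch of that flow (hedged pseudo-kernel fragment based on this
discussion, not the actual UMIP fixup hunk; CS base handling and most error
checks are omitted):

	unsigned char buf[MAX_INSN_SIZE];
	struct insn insn;
	int nr_copied;

	/* No insn exists yet; the bytes at CS:rIP must be fetched first. */
	nr_copied = MAX_INSN_SIZE -
		    copy_from_user(buf, (void __user *)regs->ip, MAX_INSN_SIZE);
	if (nr_copied <= 0)
		return false;

	/* Only now can a struct insn be initialized and decoded. */
	insn_init(&insn, buf, nr_copied, user_64bit_mode(regs));
	insn_get_length(&insn);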
>
> * the nomenclature should be clear: if we return INAT_SEG_REG_* those
> are own
> defined indices and not registers or prefixes or whatever else, so
> everywhere we
> state that we're returning an *index*.
I agree.
>
> * and then shorten local variables' names as reading "reg" every
> other line doesn't make it clearer :)
I agree.
>
> * also some comments formatting for better readability.
Thanks!
>
> * and prefixing register names with "r" in the comments means then
> all
> register widths, not only 32-bit. Dunno, is "(E)" SDM nomenclature
> for
> the different register widths?
A quick look at section 3.1.1.3 of the Intel Software Developer's
Manual Vol 2 reveals that r/m16 operands are referred to as [ACDB]X,
[SB]P, and [SD]I. r/m32 operands are referred to as E[ACDB]X, E[SB]P and
E[SD]I. r/m64 operands are referred to as R[ACDB]X, R[SB]P, R[SD]I and
R[8-15].
Also, some instructions (e.g., string instructions) do use the
nomenclature (E)[SD]I in protected mode and (R|E)[SD]I in long mode.
I only used "(E)" (i.e., not the "(R|)" part) as these utility
functions will deal mostly with protected mode, unless FS or GS are
used in long mode.
>
> ---
> diff --git a/arch/x86/lib/insn-eval.c b/arch/x86/lib/insn-eval.c
> index 86f58ce6c302..720529573d72 100644
> --- a/arch/x86/lib/insn-eval.c
> +++ b/arch/x86/lib/insn-eval.c
> @@ -44,50 +44,45 @@ static bool is_string_insn(struct insn *insn)
> }
>
> /**
> - * get_overridden_seg_reg() - obtain segment register to use from
> prefixes
> - * @insn: Instruction structure with segment override
> prefixes
> - * @regs: Structure with register values as seen when
> entering kernel mode
> + * get_seg_reg_idx() - obtain segment register index to use from
> prefixes
> + * @insn: Instruction with segment override prefixes
> + * @regs: Register values as seen when entering kernel mode
> * @regoff: Operand offset, in pt_regs, used to deterimine
> segment register
> *
> - * The segment register to which an effective address refers depends
> on
> - * a) whether running in long mode (in such a case semgment override
> prefixes
> - * are ignored. b) Whether segment override prefixes must be ignored
> for certain
> - * registers: always use CS when the register is (R|E)IP; always use
> ES when
> - * operand register is (E)DI with a string instruction as defined in
> the Intel
> - * documentation. c) If segment overrides prefixes are found in the
> instruction
> - * prefixes. d) Use the default segment register associated with the
> operand
> - * register.
> + * The segment register to which an effective address refers,
> depends on:
> + *
> + * a) whether running in long mode (in such a case segment override
> prefixes
> + * are ignored).
> + *
> + * b) Whether segment override prefixes must be ignored for certain
> + * registers: always use CS when the register is rIP; always use ES
> when
> + * operand register is rDI with a string instruction as defined in
> the Intel
> + * documentation.
> *
> - * This function returns the overridden segment register to use, if
> any, as per
> - * the conditions described above. Please note that this function
> + * c) If segment overrides prefixes are found in the instruction
> prefixes.
> + *
> + * d) Use the default segment register associated with the operand
> register.
> + *
> + * This function returns the segment register override to use, if
> any,
> + * as per the conditions described above. Please note that this
> function
> * does not return the value in the segment register (i.e., the
> segment
> - * selector). The segment selector needs to be obtained using
> - * get_segment_selector() and passing the segment register resolved
> by
> + * selector) but our defined index. The segment selector needs to be
> obtained
> + * using get_segment_selector() and passing the segment register
> resolved by
> * this function.
> *
> - * Return: A constant identifying the segment register to use, among
> CS, SS, DS,
> + * Returns:
> + *
> + * A constant identifying the segment register to use, among CS, SS,
> DS,
> * ES, FS, or GS. INAT_SEG_REG_IGNORE is returned if running in long
> mode.
> * INAT_SEG_REG_DEFAULT is returned if no segment override prefixes
> were found
> - * and the default segment register shall be used. -EINVAL in case
> of error.
> + * and the default segment register shall be used.
> + *
> + * -EINVAL in case of error.
> */
This rewording looks OK to me. Thanks!
> -static int get_overridden_seg_reg(struct insn *insn, struct pt_regs
> *regs,
> - int regoff)
> +static int get_seg_reg_idx(struct insn *insn, struct pt_regs *regs,
> int regoff)
> {
> - int i;
> - int sel_overrides = 0;
> - int seg_register = INAT_SEG_REG_DEFAULT;
> -
> - /*
> - * Segment override prefixes should not be used for (E)IP.
> Check this
> - * case first as we might not have (and not needed at all) a
> - * valid insn structure to evaluate segment override
> prefixes.
> - */
> - if (regoff == offsetof(struct pt_regs, ip)) {
> - if (user_64bit_mode(regs))
> - return INAT_SEG_REG_IGNORE;
> - else
> - return INAT_SEG_REG_DEFAULT;
> - }
This function essentially inspects insn to find segment override
prefixes. However, if called with rIP, we still don't have any
instruction to inspect (we have yet to copy_from_user() it); insn would
essentially contain garbage. I guess callers could zero-init insn in
such a case. However, I think that keeping this check makes things
clearer.
> + int idx = INAT_SEG_REG_DEFAULT;
> + int sel_overrides = 0, i;
>
> if (!insn)
> return -EINVAL;
> @@ -101,27 +96,27 @@ static int get_overridden_seg_reg(struct insn
> *insn, struct pt_regs *regs,
> attr = inat_get_opcode_attribute(insn-
> >prefixes.bytes[i]);
> switch (attr) {
> case INAT_MAKE_PREFIX(INAT_PFX_CS):
> - seg_register = INAT_SEG_REG_CS;
> + idx = INAT_SEG_REG_CS;
> sel_overrides++;
> break;
> case INAT_MAKE_PREFIX(INAT_PFX_SS):
> - seg_register = INAT_SEG_REG_SS;
> + idx = INAT_SEG_REG_SS;
> sel_overrides++;
> break;
> case INAT_MAKE_PREFIX(INAT_PFX_DS):
> - seg_register = INAT_SEG_REG_DS;
> + idx = INAT_SEG_REG_DS;
> sel_overrides++;
> break;
> case INAT_MAKE_PREFIX(INAT_PFX_ES):
> - seg_register = INAT_SEG_REG_ES;
> + idx = INAT_SEG_REG_ES;
> sel_overrides++;
> break;
> case INAT_MAKE_PREFIX(INAT_PFX_FS):
> - seg_register = INAT_SEG_REG_FS;
> + idx = INAT_SEG_REG_FS;
> sel_overrides++;
> break;
> case INAT_MAKE_PREFIX(INAT_PFX_GS):
> - seg_register = INAT_SEG_REG_GS;
> + idx = INAT_SEG_REG_GS;
> sel_overrides++;
> break;
> /* No default action needed. */
> @@ -133,26 +128,26 @@ static int get_overridden_seg_reg(struct insn
> *insn, struct pt_regs *regs,
> * overrides for FS and GS.
> */
> if (user_64bit_mode(regs)) {
> - if (seg_register != INAT_SEG_REG_FS &&
> - seg_register != INAT_SEG_REG_GS)
> + if (idx != INAT_SEG_REG_FS &&
> + idx != INAT_SEG_REG_GS)
> return INAT_SEG_REG_IGNORE;
> /* More than one segment override prefix leads to undefined
> behavior. */
> } else if (sel_overrides > 1) {
> return -EINVAL;
> /*
> * Segment override prefixes are always ignored for string
> instructions
> - * that involve the use the (E)DI register.
> + * that use the (E)DI register.
> */
> } else if ((regoff == offsetof(struct pt_regs, di)) &&
> is_string_insn(insn)) {
> return INAT_SEG_REG_DEFAULT;
> }
>
> - return seg_register;
I will change to use indexes as you suggested.
> + return idx;
> }
>
> /**
> - * resolve_seg_register() - obtain segment register
> + * resolve_seg_reg() - obtain segment register index
> * @insn: Instruction structure with segment override
> prefixes
> * @regs: Structure with register values as seen when
> entering kernel mode
> * @regoff: Operand offset, in pt_regs, used to deterimine
> segment register
> @@ -169,36 +164,38 @@ static int get_overridden_seg_reg(struct insn
> *insn, struct pt_regs *regs,
> *
> * Return: A constant identifying the segment register to use, among
> CS, SS, DS,
> * ES, FS, or GS. INAT_SEG_REG_IGNORE is returned if running in long
> mode.
> + *
> * -EINVAL in case of error.
> */
> -static int resolve_seg_register(struct insn *insn, struct pt_regs
> *regs,
> - int regoff)
> +static int resolve_seg_reg(struct insn *insn, struct pt_regs *regs,
> int regoff)
> {
> - int seg_reg;
> + int idx;
>
> - seg_reg = get_overridden_seg_reg(insn, regs, regoff);
> + if (!insn)
> + return -EINVAL;
I checked for a NULL insn only after get_overridden_seg_reg() (now
get_seg_reg_idx()) because that function is able to handle a NULL insn.
However, this function does not always need a non-NULL insn. When
obtaining the segment register for rIP, there is no need to inspect the
instruction at all.
I only check for a NULL insn when needed (i.e., when the contents of the
instruction could change the segment register used).
>
> - if (seg_reg < 0)
> - return seg_reg;
> + idx = get_seg_reg_idx(insn, regs, regoff);
> + if (idx < 0)
> + return idx;
>
> - if (seg_reg == INAT_SEG_REG_IGNORE)
> - return seg_reg;
> + if (idx == INAT_SEG_REG_IGNORE)
> + return idx;
>
> - if (seg_reg != INAT_SEG_REG_DEFAULT)
> - return seg_reg;
> + if (idx != INAT_SEG_REG_DEFAULT)
> + return idx;
>
> /*
> * If we are here, we use the default segment register as
> described
> * in the Intel documentation:
> - * + DS for all references involving (E)AX, (E)CX, (E)DX,
> (E)BX, and
> - * (E)SI.
> - * + If used in a string instruction, ES for (E)DI.
> Otherwise, DS.
> + *
> + * + DS for all references involving r[ABCD]X, and rSI.
> + * + If used in a string instruction, ES for rDI.
> Otherwise, DS.
> * + AX, CX and DX are not valid register operands in 16-
> bit address
> * encodings but are valid for 32-bit and 64-bit
> encodings.
> * + -EDOM is reserved to identify for cases in which no
> register
> * is used (i.e., displacement-only addressing). Use DS.
> - * + SS for (E)SP or (E)BP.
> - * + CS for (E)IP.
> + * + SS for rSP or rBP.
> + * + CS for rIP.
> */
Thanks for the rewording!
>
> switch (regoff) {
> @@ -206,24 +203,26 @@ static int resolve_seg_register(struct insn
> *insn, struct pt_regs *regs,
> case offsetof(struct pt_regs, cx):
> case offsetof(struct pt_regs, dx):
> /* Need insn to verify address size. */
> - if (!insn || insn->addr_bytes == 2)
> + if (insn->addr_bytes == 2)
Here we care if insn is NULL as we need to look at the address size.
> return -EINVAL;
> +
> case -EDOM:
> case offsetof(struct pt_regs, bx):
> case offsetof(struct pt_regs, si):
> return INAT_SEG_REG_DS;
> +
> case offsetof(struct pt_regs, di):
> - /* Need insn to see if insn is string instruction.
> */
> - if (!insn)
> - return -EINVAL;
Here we need a valid insn to determine if it contains a string
instruction.
> if (is_string_insn(insn))
> return INAT_SEG_REG_ES;
> return INAT_SEG_REG_DS;
> +
> case offsetof(struct pt_regs, bp):
> case offsetof(struct pt_regs, sp):
> return INAT_SEG_REG_SS;
> +
> case offsetof(struct pt_regs, ip):
> return INAT_SEG_REG_CS;
For CS we don't need insn at all.
> +
> default:
> return -EINVAL;
> }
> @@ -232,17 +231,20 @@ static int resolve_seg_register(struct insn
> *insn, struct pt_regs *regs,
> /**
> * get_segment_selector() - obtain segment selector
> * @regs: Structure with register values as seen when
> entering kernel mode
> - * @seg_reg: Segment register to use
> + * @seg_reg: Segment register index to use
> *
> - * Obtain the segment selector from any of the CS, SS, DS, ES, FS,
> GS segment
> - * registers. In CONFIG_X86_32, the segment is obtained from either
> pt_regs or
> - * kernel_vm86_regs as applicable. In CONFIG_X86_64, CS and SS are
> obtained
> + * Obtain the segment selector from any of the CS, SS, DS, ES, FS,
> GS
> + * segment registers. In CONFIG_X86_32, the segment is obtained from
> either
> + * pt_regs or kernel_vm86_regs as applicable. On 64-bit, CS and SS
> are obtained
> * from pt_regs. DS, ES, FS and GS are obtained by reading the
> actual CPU
> - * registers. This done for only for completeness as in
> CONFIG_X86_64 segment
> - * registers are ignored.
> + * registers. This done only for completeness as in long mode
> segment registers
> + * are ignored.
> + *
> + * Returns:
> + *
> + * Value of the segment selector, including null when running in
> long mode.
> *
> - * Return: Value of the segment selector, including null when
> running in
> - * long mode. -1 on error.
> + * -EINVAL on error.
Thanks for the rewording. I will incorporate it in the series.
Thanks and BR,
Ricardo
On Tue, Sep 26, 2017 at 09:21:44PM -0700, Ricardo Neri wrote:
> This is true except when we don't have an insn at all (well, it may be
> non-NULL but it will only contain garbage). The case to which I am
> referring is when we begin decoding our instruction. The first step is
> to copy_from_user the instruction and populate insn. For this we must
> calculate the linear address from where we copy using CS and rIP.
Where do we do that?
> Furthermore, in this only case we don't need to look at insn at all as
> the only register involved is rIP no segment override prefixes are
> allowed.
In any case, as it is now it sounds convoluted: you may or may not
have an insn, and yet you call get_overridden_seg_reg() on it but you
don't really need segment overrides because you only need CS and rIP
initially.
Sounds to me like this initial parsing should be done separately from
this function...
> I only used "(E)" (i.e., not the "(R|)" part) as these utility
> functions will deal mostly with protected mode, unless FS or GS are
> used in long mode.
eIP or rIP is simply much easier to type and parse. Those brackets, not
really.
> I only check for a NULL insn when needed (i.e., the contents of the
> instruction could change the used segment register).
... and those if (!insn) tests sprinkled around simply make the code
unreadable and if we can get rid of them, we should.
--
Regards/Gruss,
Boris.
SUSE Linux GmbH, GF: Felix Imendörffer, Jane Smithard, Graham Norton, HRB 21284 (AG Nürnberg)
--
On Tue, 2017-09-26 at 20:05 +0200, Borislav Petkov wrote:
> On Fri, Aug 18, 2017 at 05:27:54PM -0700, Ricardo Neri wrote:
> >
> > The segment descriptor contains information that is relevant to how
> > linear
> > addresses need to be computed. It contains the default size of
> > addresses
> > as well as the base address of the segment. Thus, given a segment
> > selector, we ought look at segment descriptor to correctly
> > calculate the
> ^
> to
I will correct this syntax error.
>
> >
> > linear address.
> >
> > In protected mode, the segment selector might indicate a segment
> > descriptor from either the global descriptor table or a local
> > descriptor
> > table. Both cases are considered in this function.
> >
> > This function is a prerequisite for functions in subsequent commits
> > that
> > will obtain the aforementioned attributes of the segment
> > descriptor.
> >
> > Cc: Dave Hansen <[email protected]>
> > Cc: Adam Buchbinder <[email protected]>
> > Cc: Colin Ian King <[email protected]>
> > Cc: Lorenzo Stoakes <[email protected]>
> > Cc: Qiaowei Ren <[email protected]>
> > Cc: Arnaldo Carvalho de Melo <[email protected]>
> > Cc: Masami Hiramatsu <[email protected]>
> > Cc: Adrian Hunter <[email protected]>
> > Cc: Kees Cook <[email protected]>
> > Cc: Thomas Garnier <[email protected]>
> > Cc: Peter Zijlstra <[email protected]>
> > Cc: Borislav Petkov <[email protected]>
> > Cc: Dmitry Vyukov <[email protected]>
> > Cc: Ravi V. Shankar <[email protected]>
> > Cc: [email protected]
> > Signed-off-by: Ricardo Neri <[email protected]>
> > ---
> > arch/x86/lib/insn-eval.c | 55
> > ++++++++++++++++++++++++++++++++++++++++++++++++
> > 1 file changed, 55 insertions(+)
> >
> > diff --git a/arch/x86/lib/insn-eval.c b/arch/x86/lib/insn-eval.c
> > index 86f58ce6c302..9cf2c49afc15 100644
> > --- a/arch/x86/lib/insn-eval.c
> > +++ b/arch/x86/lib/insn-eval.c
> > @@ -6,9 +6,13 @@
> > #include <linux/kernel.h>
> > #include <linux/string.h>
> > #include <linux/ratelimit.h>
> > +#include <linux/mmu_context.h>
> > +#include <asm/desc_defs.h>
> > +#include <asm/desc.h>
> > #include <asm/inat.h>
> > #include <asm/insn.h>
> > #include <asm/insn-eval.h>
> > +#include <asm/ldt.h>
> > #include <asm/vm86.h>
> >
> > enum reg_type {
> > @@ -402,6 +406,57 @@ static int get_reg_offset(struct insn *insn,
> > struct pt_regs *regs,
> > }
> >
> > /**
> > + * get_desc() - Obtain address of segment descriptor
> Get segment descriptor.
>
> >
> > + * @sel: Segment selector
> > + *
> > + * Given a segment selector, obtain a pointer to the segment
> > descriptor.
> > + * Both global and local descriptor tables are supported.
> > + *
> > + * Return: pointer to segment descriptor on success. NULL on
> > error.
> > + */
> > +static struct desc_struct *get_desc(unsigned short sel)
> I've simplified this function to be more readable, here's a diff
> ontop.
>
> More specifically, if you flip the logic and move @desc inside the
> if,
> you don't need to have mutex_unlock() twice in there.
>
> And having a local @ldt ptr makes the selector check more readable.
>
> ---
> diff --git a/arch/x86/lib/insn-eval.c b/arch/x86/lib/insn-eval.c
> index 9cf2c49afc15..48af787cb160 100644
> --- a/arch/x86/lib/insn-eval.c
> +++ b/arch/x86/lib/insn-eval.c
> @@ -417,24 +417,24 @@ static int get_reg_offset(struct insn *insn,
> struct pt_regs *regs,
> static struct desc_struct *get_desc(unsigned short sel)
> {
> struct desc_ptr gdt_desc = {0, 0};
> - struct desc_struct *desc = NULL;
> unsigned long desc_base;
>
> #ifdef CONFIG_MODIFY_LDT_SYSCALL
> if ((sel & SEGMENT_TI_MASK) == SEGMENT_LDT) {
> + struct desc_struct *desc = NULL;
> + struct ldt_struct *ldt;
> +
> /* Bits [15:3] contain the index of the desired entry. */
> sel >>= 3;
> 
> mutex_lock(&current->active_mm->context.lock);
> - /* The size of the LDT refers to the number of entries. */
> - if (!current->active_mm->context.ldt ||
> - sel >= current->active_mm->context.ldt->nr_entries) {
> - mutex_unlock(&current->active_mm->context.lock);
> - return NULL;
> - }
> 
> - desc = &current->active_mm->context.ldt->entries[sel];
> + ldt = current->active_mm->context.ldt;
> + if (ldt && sel < ldt->nr_entries)
> + desc = &ldt->entries[sel];
> +
> mutex_unlock(&current->active_mm->context.lock);
> +
> return desc;
> }
> #endif
> @@ -452,8 +452,7 @@ static struct desc_struct *get_desc(unsigned short sel)
> if (desc_base > gdt_desc.size)
> return NULL;
>
> - desc = (struct desc_struct *)(gdt_desc.address + desc_base);
> - return desc;
> + return (struct desc_struct *)(gdt_desc.address + desc_base);
I have incorporated these changes in my code.
Thanks and BR,
Ricardo
On Wed, 2017-09-27 at 13:47 +0200, Borislav Petkov wrote:
> On Tue, Sep 26, 2017 at 09:21:44PM -0700, Ricardo Neri wrote:
> >
> > This is true except when we don't have an insn at all (well, it may
> > be
> > non-NULL but it will only contain garbage). The case to which I am
> > referring is when we begin decoding our instruction. The first step
> > is
> > to copy_from_user the instruction and populate insn. For this we
> > must
> > calculate the linear address from where we copy using CS and rIP.
> Where do we do that?
UMIP emulation does it when evaluating whether emulation is needed after a
#GP(0). It uses copy_from_user() to fetch the code at the rIP that caused the
exception and then populates insn with it [1].
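Roughly, that path looks like the sketch below. This is illustration only:
fetch_faulting_insn() is a made-up name, and the
insn_get_seg_base(regs, INAT_SEG_REG_CS) call assumes the reworked interface
discussed further down in this thread.

        /*
         * Sketch only: fetch and decode the instruction that caused #GP(0).
         * Needs <asm/insn.h>, <asm/insn-eval.h>, <asm/ptrace.h> and
         * <linux/uaccess.h>.
         */
        static bool fetch_faulting_insn(struct pt_regs *regs, struct insn *insn)
        {
                unsigned char buf[MAX_INSN_SIZE];
                unsigned long seg_base, not_copied;

                /* Linear address of the faulting instruction: CS base + rIP. */
                seg_base = insn_get_seg_base(regs, INAT_SEG_REG_CS);
                if (seg_base == -1L)
                        return false;

                not_copied = copy_from_user(buf,
                                            (void __user *)(seg_base + regs->ip),
                                            sizeof(buf));
                if (not_copied == sizeof(buf))
                        return false;

                insn_init(insn, buf, sizeof(buf) - not_copied, user_64bit_mode(regs));
                insn_get_length(insn);

                return true;
        }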
>
> >
> > Furthermore, in this case only, we don't need to look at insn at all,
> > as the only register involved is rIP and no segment override prefixes
> > are allowed.
> In any case, as it is now it sounds convoluted: you may or may not
> have an insn, and yet you call get_overridden_seg_reg() on it but you
> don't really need segment overrides because you only need CS and rIP
> initially.
The idea is that get_overridden_seg_reg() would implement the logic you
just described. It would return INAT_SEG_REG_DEFAULT/IGNORE when segment
override prefixes are not allowed (i.e., a valid insn with operand rDI
and a string instruction; and rIP) or not needed (i.e., long mode, except
if there are override prefixes for FS or GS); or INAT_SEG_REG_[CSDEFG]S
otherwise.
Then resolve_seg_register() resolves the default segment if needed as
per the value returned by get_overridden_seg_reg().
Summarizing, a more accurate function name for the intended behavior is
get_overridden_seg_reg_if_any_or_needed().
> Sounds to me like this initial parsing should be done separately from
> this function...
I decided to put all the handling of segment override prefixes in a
single function.
Perhaps it could be split into two functions as follows (diff on top of
my original patches):
* Rename get_overridden_seg_reg to get_overridden_seg_reg_idx
* Remove from get_overridden_seg_reg_idx checks for rIP and rDI...
* Checks for rIP and rDI are done in a new function
* Now resolve_seg_reg calls the two functions above to determine if it
needs to resolve the default segment register index.
@@ -77,24 +77,12 @@ static bool is_string_insn(struct insn *insn)
  * INAT_SEG_REG_DEFAULT is returned if no segment override prefixes were found
  * and the default segment register shall be used. -EINVAL in case of error.
  */
-static int get_overridden_seg_reg(struct insn *insn, struct pt_regs *regs,
-                                  int regoff)
+static int get_overridden_seg_reg_idx(struct insn *insn, struct pt_regs *regs,
+                                      int regoff)
 {
         int idx = INAT_SEG_REG_DEFAULT;
         int sel_overrides = 0, i;
 
-        /*
-         * Segment override prefixes should not be used for (E)IP.
-         * Check this case first as we might not have (and not needed
-         * at all) a valid insn structure to evaluate segment override
-         * prefixes.
-         */
-        if (regoff == offsetof(struct pt_regs, ip)) {
-                if (user_64bit_mode(regs))
-                        return INAT_SEG_REG_IGNORE;
-                else
-                        return INAT_SEG_REG_DEFAULT;
-        }
-
         if (!insn)
                 return -EINVAL;
@@ -145,18 +133,32 @@ static int get_overridden_seg_reg(struct insn *insn, struct pt_regs *regs,
         /*
          * More than one segment override prefix leads to undefined
          * behavior.
          */
         } else if (sel_overrides > 1) {
                 return -EINVAL;
-        /*
-         * Segment override prefixes are always ignored for string
-         * instructions
-         * that involve the use the (E)DI register.
-         */
-        } else if ((regoff == offsetof(struct pt_regs, di)) &&
-                   is_string_insn(insn)) {
-                return INAT_SEG_REG_DEFAULT;
         }
         return idx;
 }
+static int use_seg_reg_overrides(struct insn *insn, int regoff)
+{
+        /*
+         * Segment override prefixes should not be used for rIP. Check
+         * this case first as we might not have (and not needed at all)
+         * a valid insn structure to evaluate segment override
+         * prefixes.
+         */
+        if (regoff == offsetof(struct pt_regs, ip))
+                return 0;
+
+        /* Subsequent checks require a valid insn. */
+        if (!insn)
+                return -EINVAL;
+
+        if ((regoff == offsetof(struct pt_regs, di)) &&
+            is_string_insn(insn))
+                return 0;
+
+        return 1;
+}
+
 /**
  * resolve_seg_register() - obtain segment register
  * @insn: Instruction structure with segment override prefixes
@@ -179,22 +181,20 @@ static int get_overridden_seg_reg(struct insn *insn, struct pt_regs *regs,
  */
 static int resolve_seg_reg(struct insn *insn, struct pt_regs *regs, int regoff)
 {
-        int idx;
-
-        idx = get_overridden_seg_reg(insn, regs, regoff);
+        int use_pfx_overrides;
-        if (idx < 0)
-                return idx;
-
-        if (idx == INAT_SEG_REG_IGNORE)
-                return idx;
+        use_pfx_overrides = use_seg_reg_overrides(insn, regoff);
+        if (use_pfx_overrides < 0)
+                return -EINVAL;
-        if (idx != INAT_SEG_REG_DEFAULT)
-                return idx;
+        if (use_pfx_overrides == 0)
+                goto resolve_default_idx;
-        if (!insn)
-                return -EINVAL;
+        return get_overridden_seg_reg_idx(insn, regs, regoff);
+resolve_default_idx:
+        if (user_64bit_mode(regs))
+                return INAT_SEG_REG_IGNORE;
         /*
          * If we are here, we use the default segment register as
          * described in the Intel documentation:
@@ -209,6 +209,9 @@ static int resolve_seg_reg(struct insn *insn, struct pt_regs *regs, int regoff)
          * + CS for (E)IP.
          */
+        if (!insn)
+                return -EINVAL;
+
         switch (regoff) {
         case offsetof(struct pt_regs, ax):
         case offsetof(struct pt_regs, cx):
Does this make sense?
>
> >
> > I only used "(E)" (i.e., not the "(R|)" part) as these utility
> > functions will deal mostly with protected mode, unless FS or GS are
> > used in long mode.
> eIP or rIP is simply much easier to type and parse. Those brackets,
> not
> really.
Agreed. Then I will use rIP.
>
> >
> > I only check for a NULL insn when needed (i.e., the contents of the
> > instruction could change the used segment register).
> ... and those if (!insn) tests sprinkled around simply make the code
> unreadable and if we can get rid of them, we should.
Sure, you are correct; this will make the code more readable.
Thanks and BR,
Ricardo
[1]. https://github.com/ricardon/tip/blob/rneri/umip_v9/arch/x86/kernel/umip.c#L276
On Wed, Sep 27, 2017 at 03:32:26PM -0700, Ricardo Neri wrote:
> The idea is that get_overridden_seg_reg() would implement the logic you
> just described. It would return INAT_SEG_REG_DEFAULT/IGNORE when segment
> override prefixes are not allowed (i.e., a valid insn with operand rDI
> and a string instruction; and rIP) or not needed (i.e., long mode, except
> if there are override prefixes for FS or GS); or INAT_SEG_REG_[CSDEFG]S
> otherwise.
Ok, lemme see if we're talking the same thing. Your diff is linewrapped
so parsing that is hard.
Do this
        if (regoff == offsetof(struct pt_regs, ip)) {
                if (user_64bit_mode(regs))
                        return INAT_SEG_REG_IGNORE;
                else
                        return INAT_SEG_REG_DEFAULT;
        }
and all the other checking *before* you do insn_init(). Because you have
crazy stuff like:
        if (seg_reg == INAT_SEG_REG_IGNORE)
                return seg_reg;
which shortcuts those functions and is simply clumsy and complicates
following the code. The mere fact that you have to call the function
"get_overridden_seg_reg_if_any_or_needed()" already tells you that that
function is doing too many things at once.
When the function is called get_segment_register() then it should do
only that. And all the checking is done before or in wrappers.
IOW, all the rIP checking and early return down the
insn_get_seg_base() -> resolve_seg_register() -> .. should be done
separately.
*Then* you do insn_init() and hand it down to insn_get_seg_base() and
from now on you have a proper insn pointer which you hand around and
check for NULL only once, on function entry.
Then your code flow is much simpler: first you take care of the case
where rIP doesn't do segment overrides and all the other cases are
handled by the normal path, with a proper struct insn.
Makes more sense?
--
Regards/Gruss,
Boris.
SUSE Linux GmbH, GF: Felix Imendörffer, Jane Smithard, Graham Norton, HRB 21284 (AG Nürnberg)
--
On Thu, 2017-09-28 at 11:36 +0200, Borislav Petkov wrote:
> On Wed, Sep 27, 2017 at 03:32:26PM -0700, Ricardo Neri wrote:
> >
> > The idea is that get_overridden_seg_reg() would implement the logic you
> > just described. It would return INAT_SEG_REG_DEFAULT/IGNORE when segment
> > override prefixes are not allowed (i.e., a valid insn with operand rDI
> > and a string instruction; and rIP) or not needed (i.e., long mode, except
> > if there are override prefixes for FS or GS); or INAT_SEG_REG_[CSDEFG]S
> > otherwise.
> Ok, lemme see if we're talking the same thing. Your diff is linewrapped
> so parsing that is hard.
>
> Do this
>
>         if (regoff == offsetof(struct pt_regs, ip)) {
>                 if (user_64bit_mode(regs))
>                         return INAT_SEG_REG_IGNORE;
>                 else
>                         return INAT_SEG_REG_DEFAULT;
>         }
>
> and all the other checking *before* you do insn_init(). Because you have
> crazy stuff like:
>
>         if (seg_reg == INAT_SEG_REG_IGNORE)
>                 return seg_reg;
>
> which shortcuts those functions and is simply clumsy and complicates
> following the code. The mere fact that you have to call the function
> "get_overridden_seg_reg_if_any_or_needed()" already tells you that that
> function is doing too many things at once.
>
> When the function is called get_segment_register() then it should do
> only that. And all the checking is done before or in wrappers.
Yes, I realized this while I was typing.
>
> IOW, all the rIP checking and early return down the
> insn_get_seg_base() -> resolve_seg_register() -> .. should be done
> separately.
Agreed now.
>
> *Then* you do insn_init() and hand it down to insn_get_seg_base() and
> from now on you have a proper insn pointer which you hand around and
> check for NULL only once, on function entry.
I agree. In fact, insn_get_seg_base() does not need insn at all. All it needs is
an INAT_SEG_REG_* index. This would make things clearer. UMIP (and callers that
need to copy_from_user code) can do insn_get_seg_base(regs, INAT_SEG_REG_CS). No
insn needed.
In fact, it is only the insn_get_addr_ref_xx() family of functions that needs to
inspect insn (which will be populated and validated) to determine what registers
are used as operands... and to determine the applicable segment register.
However, the insn_get_addr_ref_xx() functions call insn_get_seg_base() several
times each. Each time they would need to do:
        if (can_use_seg_override_prefixes(insn, regoff))
                idx = get_overridden_seg_reg(insn, regs)
        else
                idx = get_default_seg_reg()
The pseudocode above looks like a resolve_reg_idx() to me.
Then insn_get_addr_ref_xx() can call insn_get_seg_base(idx).
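In C, such a resolve_reg_idx() could look roughly like the sketch below.
use_seg_reg_overrides() and get_overridden_seg_reg_idx() are the helpers from
the diff earlier in this message; get_default_seg_reg_idx() is only a
placeholder name for the default-segment switch, not a real function.

        /* Sketch of the resolve_reg_idx() idea above; not the final code. */
        static int resolve_seg_reg_idx(struct insn *insn, struct pt_regs *regs,
                                       int regoff)
        {
                int can_override = use_seg_reg_overrides(insn, regoff);

                if (can_override < 0)
                        return -EINVAL;

                /* Override prefixes are allowed: honor them if present. */
                if (can_override)
                        return get_overridden_seg_reg_idx(insn, regs, regoff);

                /* Overrides not allowed or not needed: fall back to defaults. */
                if (user_64bit_mode(regs))
                        return INAT_SEG_REG_IGNORE;

                return get_default_seg_reg_idx(regs, regoff);
        }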
>
> Then your code flow is much simpler: first you take care of the case
> where rIP doesn't do segment overrides and all the other cases are
> handled by the normal path, with a proper struct insn.
Do you think the pseudocode above addresses your concerns?
* insn_get_seg_base() will take an INAT_SEG_REG_* index.
* insn_get_ref_xx() receives an initialized insn that it can check for a NULL value.
* A reworked resolve_seg_reg_idx() will clearly check whether it can use segment
  override prefixes and obtain them. If not, it will use default values.
Thanks and BR,
Ricardo
On Thu, Sep 28, 2017 at 11:06:42PM -0700, Ricardo Neri wrote:
> I agree. In fact, insn_get_seg_base() does not need insn at all. All it needs is
> an INAT_SEG_REG_* index. This would make things clearer. UMIP (and callers that
> need to copy_from_user code) can do insn_get_seg_base(regs, INAT_SEG_REG_CS). No
> insn needed.
Yap.
> In fact, it is only the insn_get_addr_ref_xx() family of functions that needs
I think you mean get_addr_ref_xx() here.
> Do you think the pseudocode above addresses your concerns?
>
> * insn_get_seg_base() will take an INAT_SEG_REG_* index.
> * insn_get_ref_xx() receives an initialized insn that it can check for a NULL value.
> * A reworked resolve_seg_reg_idx() will clearly check whether it can use segment
>   override prefixes and obtain them. If not, it will use default values.
Makes sense, but send me the final version to take a look at it too.
Thanks.
--
Regards/Gruss,
Boris.
SUSE Linux GmbH, GF: Felix Imendörffer, Jane Smithard, Graham Norton, HRB 21284 (AG Nürnberg)
--