2018-12-13 21:34:52

by Sean Christopherson

Subject: [RFC PATCH v4 0/5] x86: Add vDSO exception fixup for SGX

Episode IV: The vDSO Strikes Back

After a brief detour into ioctl-based fixup, the vDSO implementation
is back. Relative to v2 (the previous vDSO RFC), patch 5/5 once again
contains the vast majority of changes.

__vdso_sgx_enter_enclave() is now written entirely in straight assembly.
Implementing the expanded enclave ABI in inline assembly was an absolute
train wreck as GCC's constraints don't play nice with the full spectrum
of registers. And I suspect the previous vDSO RFCs were broken due to
incorrect register clobbering (I never tested them end-to-end).

The exit_handler() concept proposed in v2 is gone. I expect most SGX
libraries will directly call __vdso_sgx_enter_enclave(), at which point
the overhead of returning effectively boils down to restoring the
non-volatile registers, which is likely outweighed by the cost of the
retpoline call (to the handler) alone. In other words, I doubt anyone
will actually use the exit_handler()...

...except for libraries that want to manipulate the untrusted stack,
i.e. force __vdso_sgx_enter_enclave() to treat RSP as volatile. At that
point we're effectively maintaining two separate ABIs since the normal
ABI would be e.g. "RBP, RSP and the red zone must be preserved" vs. the
exit_handler() ABI's "RBP must be preserved". And *if* we want to deal
with maintaining two ABIs, supporting the kernel's existing signal ABI
is a lot cleaner (from a kernel perspective) than polluting the vDSO.

v1: https://lkml.kernel.org/r/[email protected]
v2: https://lkml.kernel.org/r/[email protected]
v3: https://lkml.kernel.org/r/[email protected]
v4:
- Back to vDSO
- Implement __vdso_sgx_enter_enclave() directly in assembly
- Modify effective enclave register ABI to follow x86-64 kernel ABI
- Split __vdso_sgx_enter_enclave input into separate non-union params
- Drop the exit_handler() concept

Sean Christopherson (5):
x86/vdso: Add support for exception fixup in vDSO functions
x86/fault: Add helper function to sanitize error code
x86/fault: Attempt to fixup unhandled #PF on ENCLU before signaling
x86/traps: Attempt to fixup exceptions in vDSO before signaling
x86/vdso: Add __vdso_sgx_enter_enclave() to wrap SGX enclave
transitions

arch/x86/entry/vdso/Makefile | 6 +-
arch/x86/entry/vdso/extable.c | 37 ++++++
arch/x86/entry/vdso/extable.h | 29 +++++
arch/x86/entry/vdso/vdso-layout.lds.S | 9 +-
arch/x86/entry/vdso/vdso.lds.S | 1 +
arch/x86/entry/vdso/vdso2c.h | 58 ++++++++--
arch/x86/entry/vdso/vsgx_enter_enclave.S | 136 +++++++++++++++++++++++
arch/x86/include/asm/vdso.h | 5 +
arch/x86/include/uapi/asm/sgx.h | 44 ++++++++
arch/x86/kernel/traps.c | 14 +++
arch/x86/mm/fault.c | 33 ++++--
11 files changed, 349 insertions(+), 23 deletions(-)
create mode 100644 arch/x86/entry/vdso/extable.c
create mode 100644 arch/x86/entry/vdso/extable.h
create mode 100644 arch/x86/entry/vdso/vsgx_enter_enclave.S

--
2.19.2



2018-12-13 21:33:07

by Sean Christopherson

Subject: [RFC PATCH v4 4/5] x86/traps: Attempt to fixup exceptions in vDSO before signaling

Call fixup_vdso_exception() in all trap flows that generate signals to
userspace immediately prior to generating any such signal. If the
exception is fixed, return cleanly and do not generate a signal.

The goal of vDSO fixup is not to fix up all faults, nor is it to avoid
all signals, but rather to report faults directly to userspace when the
fault would otherwise directly result in a signal being sent to the
process.

Suggested-by: Andy Lutomirski <[email protected]>
Cc: Andy Lutomirski <[email protected]>
Cc: Jarkko Sakkinen <[email protected]>
Cc: Dave Hansen <[email protected]>
Cc: Josh Triplett <[email protected]>
Signed-off-by: Sean Christopherson <[email protected]>
---
arch/x86/kernel/traps.c | 14 ++++++++++++++
1 file changed, 14 insertions(+)

diff --git a/arch/x86/kernel/traps.c b/arch/x86/kernel/traps.c
index 9b7c4ca8f0a7..b1ca05efb15e 100644
--- a/arch/x86/kernel/traps.c
+++ b/arch/x86/kernel/traps.c
@@ -61,6 +61,7 @@
#include <asm/mpx.h>
#include <asm/vm86.h>
#include <asm/umip.h>
+#include <asm/vdso.h>

#ifdef CONFIG_X86_64
#include <asm/x86_init.h>
@@ -209,6 +210,9 @@ do_trap_no_signal(struct task_struct *tsk, int trapnr, const char *str,
tsk->thread.error_code = error_code;
tsk->thread.trap_nr = trapnr;
die(str, regs, error_code);
+ } else {
+ if (fixup_vdso_exception(regs, trapnr, error_code, 0))
+ return 0;
}

/*
@@ -560,6 +564,9 @@ do_general_protection(struct pt_regs *regs, long error_code)
return;
}

+ if (fixup_vdso_exception(regs, X86_TRAP_GP, error_code, 0))
+ return;
+
tsk->thread.error_code = error_code;
tsk->thread.trap_nr = X86_TRAP_GP;

@@ -774,6 +781,10 @@ dotraplinkage void do_debug(struct pt_regs *regs, long error_code)
SIGTRAP) == NOTIFY_STOP)
goto exit;

+ if (user_mode(regs) &&
+ fixup_vdso_exception(regs, X86_TRAP_DB, error_code, 0))
+ goto exit;
+
/*
* Let others (NMI) know that the debug stack is in use
* as we may switch to the interrupt stack.
@@ -854,6 +865,9 @@ static void math_error(struct pt_regs *regs, int error_code, int trapnr)
if (!si_code)
return;

+ if (fixup_vdso_exception(regs, trapnr, error_code, 0))
+ return;
+
force_sig_fault(SIGFPE, si_code,
(void __user *)uprobe_get_trap_addr(regs), task);
}
--
2.19.2


2018-12-13 21:33:08

by Sean Christopherson

Subject: [RFC PATCH v4 5/5] x86/vdso: Add __vdso_sgx_enter_enclave() to wrap SGX enclave transitions

Intel Software Guard Extensions (SGX) introduces a new CPL3-only
enclave mode that runs as a sort of black box shared object that is
hosted by an untrusted normal CPL3 process.

Enclave transitions have semantics that are a lovely blend of SYSCALL,
SYSRET and VM-Exit. In a non-faulting scenario, entering and exiting
an enclave can only be done through SGX-specific instructions, EENTER
and EEXIT respectively. EENTER+EEXIT is analogous to SYSCALL+SYSRET,
e.g. EENTER/SYSCALL load RCX with the next RIP and EEXIT/SYSRET load
RIP from R{B,C}X.

But in a faulting/interrupting scenario, enclave transitions act more
like VM-Exit and VMRESUME. Maintaining the black box nature of the
enclave means that hardware must automatically switch CPU context when
an Asynchronous Exiting Event (AEE) occurs, an AEE being any interrupt
or exception (exceptions are AEEs because asynchronous in this context
is relative to the enclave and not CPU execution, e.g. the enclave
doesn't get an opportunity to save/fuzz CPU state).

Like VM-Exits, all AEEs jump to a common location, referred to as the
Asynchronous Exiting Point (AEP). The AEP is specified at enclave entry
via register passed to EENTER/ERESUME, similar to how the hypervisor
specifies the VM-Exit point (via VMCS.HOST_RIP at VMLAUNCH/VMRESUME).
Resuming the enclave/VM after the exiting event is handled is done via
ERESUME/VMRESUME respectively. In SGX, for AEEs that are handled by
the kernel, e.g. INTR, NMI and most page faults, IRET journeys back
to the AEP, which then ERESUMEs the enclave.

Enclaves also behave a bit like VMs in the sense that they can generate
exceptions as part of their normal operation that for all intents and
purposes need to be handled in the enclave/VM. However, unlike VMX, SGX
doesn't allow the host to modify its guest's, a.k.a. enclave's, state,
as doing so would circumvent the enclave's security. So to handle an
exception, the enclave must first be re-entered through the normal
EENTER flow (SYSCALL/SYSRET behavior), and then resumed via ERESUME
(VMRESUME behavior) after the source of the exception is resolved.
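To make the EENTER/ERESUME dance concrete, the lifecycle of a single
enclave thread might look roughly like the sketch below. This is
illustrative only: enclave_handle_exception() is a hypothetical helper,
the function pointer plumbing is left to the runtime, and SGX_EENTER
and SGX_ERESUME stand for ENCLU leafs 2 and 3.

static long run_enclave_thread(__vsgx_enter_enclave_t enter_enclave,
			       void *tcs, struct sgx_enclave_regs *regs)
{
	struct sgx_enclave_exception e = { 0 };
	__u32 leaf = SGX_EENTER;
	long ret;

	for (;;) {
		ret = enter_enclave(leaf, tcs, regs, &e);
		if (ret != -EFAULT)
			return ret;	/* clean EEXIT (0) or bad input */

		/*
		 * Unhandled exception: EENTER the enclave's exception
		 * entry point so it can fix up its internal state...
		 */
		if (enclave_handle_exception(enter_enclave, tcs, &e))
			return -1;	/* not the enclave's fault */

		/* ...then ERESUME at the point of the AEX. */
		leaf = SGX_ERESUME;
	}
}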

All of the above is just the tip of the iceberg when it comes to running
an enclave. But, SGX was designed in such a way that the host process
can utilize a library to build, launch and run an enclave. This is
roughly analogous to how e.g. libc implementations are used by most
applications so that the application can focus on its business logic.

The big gotcha is that because enclaves can generate *and* handle
exceptions, any SGX library must be prepared to handle nearly any
exception at any time (well, any time a thread is executing in an
enclave). In Linux, this means the SGX library must register a
signal handler in order to intercept relevant exceptions and forward
them to the enclave (or in some cases, take action on behalf of the
enclave). Unfortunately, Linux's signal mechanism doesn't mesh well
with libraries, e.g. signal handlers are process wide, are difficult
to chain, etc... This becomes particularly nasty when using multiple
levels of libraries that register signal handlers, e.g. running an
enclave via cgo inside of the Go runtime.

In comes vDSO to save the day. Now that vDSO can fixup exceptions,
add a function, __vdso_sgx_enter_enclave(), to wrap enclave transitions
and intercept any exceptions that occur when running the enclave.

__vdso_sgx_enter_enclave() accepts four parameters:

- The ENCLU leaf to execute (must be EENTER or ERESUME).

- A pointer to a Thread Control Structure (TCS). A TCS is a page
within the enclave that defines/tracks the context of an enclave
thread.

- An optional 'struct sgx_enclave_regs' pointer. If defined, the
corresponding registers are loaded prior to entering the enclave
and saved after (cleanly) exiting the enclave.

The effective enclave register ABI follows the kernel x86-64 ABI.
The x86-64 userspace ABI is not used due to RCX being usurped by
hardware to pass the return RIP to the enclave.

- An optional 'struct sgx_enclave_exception' pointer. If provided,
the struct is filled with the faulting ENCLU leaf, trapnr, error
code and address if an unhandled exception occurs on ENCLU or in
the enclave. An unhandled exception is an exception that would
normally be delivered to userspace via a signal, e.g. SIGSEGV.

Note that this means not all enclave exits are reported to the
caller; interrupts and faults that are handled by the kernel do
not trigger fixup, i.e. the kernel IRETs back to ENCLU[ERESUME]
and unconditionally resumes the enclave.
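As a usage sketch (hedged: vdso_sym() below is a hypothetical helper
that resolves a symbol in the vDSO mapping, e.g. built on top of
getauxval(AT_SYSINFO_EHDR); 2 is the EENTER leaf):

#include <string.h>
#include <asm/sgx.h>

extern void *vdso_sym(const char *name);	/* hypothetical */

static long eenter(void *tcs, struct sgx_enclave_regs *regs,
		   struct sgx_enclave_exception *e)
{
	__vsgx_enter_enclave_t enter_enclave =
		(__vsgx_enter_enclave_t)vdso_sym("__vdso_sgx_enter_enclave");

	memset(e, 0, sizeof(*e));
	return enter_enclave(2 /* EENTER */, tcs, regs, e);
}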

Suggested-by: Andy Lutomirski <[email protected]>
Cc: Andy Lutomirski <[email protected]>
Cc: Jarkko Sakkinen <[email protected]>
Cc: Dave Hansen <[email protected]>
Cc: Josh Triplett <[email protected]>
Cc: Haitao Huang <[email protected]>
Cc: Jethro Beekman <[email protected]>
Cc: Dr. Greg Wettstein <[email protected]>
Signed-off-by: Sean Christopherson <[email protected]>
---
arch/x86/entry/vdso/Makefile | 2 +
arch/x86/entry/vdso/vdso.lds.S | 1 +
arch/x86/entry/vdso/vsgx_enter_enclave.S | 136 +++++++++++++++++++++++
arch/x86/include/uapi/asm/sgx.h | 44 ++++++++
4 files changed, 183 insertions(+)
create mode 100644 arch/x86/entry/vdso/vsgx_enter_enclave.S

diff --git a/arch/x86/entry/vdso/Makefile b/arch/x86/entry/vdso/Makefile
index b8f7c301b88f..5e28f838d8aa 100644
--- a/arch/x86/entry/vdso/Makefile
+++ b/arch/x86/entry/vdso/Makefile
@@ -18,6 +18,7 @@ VDSO32-$(CONFIG_IA32_EMULATION) := y

# files to link into the vdso
vobjs-y := vdso-note.o vclock_gettime.o vgetcpu.o
+vobjs-$(VDSO64-y) += vsgx_enter_enclave.o

# files to link into kernel
obj-y += vma.o extable.o
@@ -85,6 +86,7 @@ CFLAGS_REMOVE_vdso-note.o = -pg
CFLAGS_REMOVE_vclock_gettime.o = -pg
CFLAGS_REMOVE_vgetcpu.o = -pg
CFLAGS_REMOVE_vvar.o = -pg
+CFLAGS_REMOVE_vsgx_enter_enclave.o = -pg

#
# X32 processes use x32 vDSO to access 64bit kernel data.
diff --git a/arch/x86/entry/vdso/vdso.lds.S b/arch/x86/entry/vdso/vdso.lds.S
index d3a2dce4cfa9..50952a995a6c 100644
--- a/arch/x86/entry/vdso/vdso.lds.S
+++ b/arch/x86/entry/vdso/vdso.lds.S
@@ -25,6 +25,7 @@ VERSION {
__vdso_getcpu;
time;
__vdso_time;
+ __vdso_sgx_enter_enclave;
local: *;
};
}
diff --git a/arch/x86/entry/vdso/vsgx_enter_enclave.S b/arch/x86/entry/vdso/vsgx_enter_enclave.S
new file mode 100644
index 000000000000..0e4cd8a9549a
--- /dev/null
+++ b/arch/x86/entry/vdso/vsgx_enter_enclave.S
@@ -0,0 +1,136 @@
+/* SPDX-License-Identifier: GPL-2.0 */
+
+#include <linux/linkage.h>
+#include <asm/export.h>
+#include <asm/errno.h>
+
+#include "extable.h"
+
+#define RDI 0*8
+#define RSI 1*8
+#define RDX 2*8
+#define R8 3*8
+#define R9 4*8
+#define R10 5*8
+
+#define EX_LEAF 0*8
+#define EX_TRAPNR 0*8+4
+#define EX_ERROR_CODE 0*8+6
+#define EX_ADDRESS 1*8
+
+.code64
+.section .text, "ax"
+
+/*
+ * long __vdso_sgx_enter_enclave(__u32 leaf, void *tcs,
+ * struct sgx_enclave_regs *regs,
+ * struct sgx_enclave_exception *e)
+ * {
+ * if (leaf != SGX_EENTER && leaf != SGX_ERESUME)
+ * return -EINVAL;
+ *
+ * if (!tcs)
+ * return -EINVAL;
+ *
+ * if (regs)
+ * copy_regs_to_cpu(regs);
+ *
+ * try {
+ * ENCLU[leaf];
+ * } catch (exception) {
+ * if (e)
+ * *e = exception;
+ * return -EFAULT;
+ * }
+ *
+ * if (regs)
+ * copy_cpu_to_regs(regs);
+ * return 0;
+ * }
+ */
+ENTRY(__vdso_sgx_enter_enclave)
+ /* EENTER <= leaf <= ERESUME */
+ lea -0x2(%edi), %eax
+ cmp $0x1, %eax
+ ja bad_input
+
+ /* TCS must be non-NULL */
+ test %rsi, %rsi
+ je bad_input
+
+ /* save non-volatile registers */
+ push %rbp
+ mov %rsp, %rbp
+ push %r15
+ push %r14
+ push %r13
+ push %r12
+ push %rbx
+
+ /* save @regs and @e to the red zone */
+ mov %rdx, -0x8(%rsp)
+ mov %rcx, -0x10(%rsp)
+
+ /* load leaf, TCS and AEP for ENCLU */
+ mov %edi, %eax
+ mov %rsi, %rbx
+ lea 1f(%rip), %rcx
+
+ /* optionally copy @regs to registers */
+ test %rdx, %rdx
+ je 1f
+
+ mov %rdx, %r11
+ mov RDI(%r11), %rdi
+ mov RSI(%r11), %rsi
+ mov RDX(%r11), %rdx
+ mov R8(%r11), %r8
+ mov R9(%r11), %r9
+ mov R10(%r11), %r10
+
+1: enclu
+
+ /* ret = 0 */
+ xor %eax, %eax
+
+ /* optionally copy registers to @regs */
+ mov -0x8(%rsp), %r11
+ test %r11, %r11
+ je 2f
+
+ mov %rdi, RDI(%r11)
+ mov %rsi, RSI(%r11)
+ mov %rdx, RDX(%r11)
+ mov %r8, R8(%r11)
+ mov %r9, R9(%r11)
+ mov %r10, R10(%r11)
+
+ /* restore non-volatile registers and return */
+2: pop %rbx
+ pop %r12
+ pop %r13
+ pop %r14
+ pop %r15
+ pop %rbp
+ ret
+
+bad_input:
+ mov $(-EINVAL), %rax
+ ret
+
+.pushsection .fixup, "ax"
+3: mov -0x10(%rsp), %r11
+ test %r11, %r11
+ je 4f
+
+ mov %eax, EX_LEAF(%r11)
+ mov %di, EX_TRAPNR(%r11)
+ mov %si, EX_ERROR_CODE(%r11)
+ mov %rdx, EX_ADDRESS(%r11)
+4: mov $(-EFAULT), %rax
+ jmp 2b
+.popsection
+
+_ASM_VDSO_EXTABLE_HANDLE(1b, 3b)
+
+ENDPROC(__vdso_sgx_enter_enclave)
diff --git a/arch/x86/include/uapi/asm/sgx.h b/arch/x86/include/uapi/asm/sgx.h
index 266b813eefa1..4f840b334369 100644
--- a/arch/x86/include/uapi/asm/sgx.h
+++ b/arch/x86/include/uapi/asm/sgx.h
@@ -96,4 +96,48 @@ struct sgx_enclave_modify_pages {
__u8 op;
} __attribute__((__packed__));

+/**
+ * struct sgx_enclave_exception - structure to pass register in/out of enclave
+ * by way of __vdso_sgx_enter_enclave
+ *
+ * @rdi: value of %rdi, loaded/saved on enter/exit
+ * @rsi: value of %rsi, loaded/saved on enter/exit
+ * @rdx: value of %rdx, loaded/saved on enter/exit
+ * @r8: value of %r8, loaded/saved on enter/exit
+ * @r9: value of %r9, loaded/saved on enter/exit
+ * @r10: value of %r10, loaded/saved on enter/exit
+ */
+struct sgx_enclave_regs {
+ __u64 rdi;
+ __u64 rsi;
+ __u64 rdx;
+ __u64 r8;
+ __u64 r9;
+ __u64 r10;
+};
+
+/**
+ * struct sgx_enclave_exception - structure to report exceptions encountered in
+ * __vdso_sgx_enter_enclave
+ *
+ * @leaf: ENCLU leaf from %rax at time of exception
+ * @trapnr: exception trap number, a.k.a. fault vector
+ * @error_code: exception error code
+ * @address: exception address, e.g. CR2 on a #PF
+ */
+struct sgx_enclave_exception {
+ __u32 leaf;
+ __u16 trapnr;
+ __u16 error_code;
+ __u64 address;
+};
+
+/**
+ * typedef __vsgx_enter_enclave_t - Function pointer prototype for
+ * __vdso_sgx_enter_enclave
+ */
+typedef long (*__vsgx_enter_enclave_t)(__u32 leaf, void *tcs,
+ struct sgx_enclave_regs *regs,
+ struct sgx_enclave_exception *e);
+
#endif /* _UAPI_ASM_X86_SGX_H */
--
2.19.2


2018-12-13 21:33:16

by Sean Christopherson

Subject: [RFC PATCH v4 3/5] x86/fault: Attempt to fixup unhandled #PF on ENCLU before signaling

Call fixup_sgx_enclu_exception() in the SIGSEGV and SIGBUS paths of
the page fault handler immediately prior to signaling. If the fault
is fixed, return cleanly and do not generate a signal.

In the SIGSEGV flow, make sure the error code passed to userspace has
been sanitized.

Suggested-by: Andy Lutomirski <[email protected]>
Cc: Andy Lutomirski <[email protected]>
Cc: Jarkko Sakkinen <[email protected]>
Cc: Dave Hansen <[email protected]>
Cc: Josh Triplett <[email protected]>
Signed-off-by: Sean Christopherson <[email protected]>
---
arch/x86/mm/fault.c | 7 +++++++
1 file changed, 7 insertions(+)

diff --git a/arch/x86/mm/fault.c b/arch/x86/mm/fault.c
index fefeb745d21d..c6f5f77ffabd 100644
--- a/arch/x86/mm/fault.c
+++ b/arch/x86/mm/fault.c
@@ -28,6 +28,7 @@
#include <asm/mmu_context.h> /* vma_pkey() */
#include <asm/efi.h> /* efi_recover_from_page_fault()*/
#include <asm/desc.h> /* store_idt(), ... */
+#include <asm/vdso.h> /* fixup_vdso_exception() */

#define CREATE_TRACE_POINTS
#include <asm/trace/exceptions.h>
@@ -928,6 +929,9 @@ __bad_area_nosemaphore(struct pt_regs *regs, unsigned long error_code,

sanitize_error_code(address, &error_code);

+ if (fixup_vdso_exception(regs, X86_TRAP_PF, error_code, address))
+ return;
+
if (likely(show_unhandled_signals))
show_signal_msg(regs, error_code, address, tsk);

@@ -1047,6 +1051,9 @@ do_sigbus(struct pt_regs *regs, unsigned long error_code, unsigned long address,

sanitize_error_code(address, &error_code);

+ if (fixup_vdso_exception(regs, X86_TRAP_PF, error_code, address))
+ return;
+
set_signal_archinfo(address, error_code);

#ifdef CONFIG_MEMORY_FAILURE
--
2.19.2


2018-12-13 21:33:21

by Sean Christopherson

Subject: [RFC PATCH v4 1/5] x86/vdso: Add support for exception fixup in vDSO functions

The basic concept and implementation are very similar to the kernel's
exception fixup mechanism. The key differences are that the kernel
handler is hardcoded and the fixup entry addresses are relative to
the overall table as opposed to individual entries.

Hardcoding the kernel handler avoids the need to figure out how to
get userspace code to point at a kernel function. Given that the
expected usage is to propagate information to userspace, dumping all
fault information into registers is likely the desired behavior for
the vast majority of yet-to-be-created functions. Use registers
DI, SI and DX to communicate fault information, which follows Linux's
ABI for register consumption and hopefully avoids conflict with
hardware features that might leverage the fixup capabilities, e.g.
register usage for SGX instructions was at least partially designed
with calling conventions in mind.

Making fixup addresses relative to the overall table allows the table
to be stripped from the final vDSO image (it's a kernel construct)
without complicating the offset logic, e.g. entry-relative addressing
would also need to account for the table's location relative to the
image.

Regarding stripping the table, modify vdso2c to extract the table from
the raw, a.k.a. unstripped, data and dump it as a standalone byte array
in the resulting .c file. The original base of the table, its length
and a pointer to the byte array are captured in struct vdso_image.
Alternatively, the table could be dumped directly into the struct,
but because the number of entries can vary per image, that would
require either hardcoding a max sized table into the struct definition
or defining the table as a flexible length array. The flexible length
array approach has zero benefits, e.g. the base/size are still needed,
and prevents reusing the extraction code, while hardcoding the max size
adds ongoing maintenance just to avoid exporting the explicit size.
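For reference, the generated .c file then carries something along
these lines (a sketch; the image name, sizes and byte values below are
purely illustrative):

/* Illustrative vdso2c output, not actual values. */
static const unsigned char extable[8] = {
	/* one {insn, fixup} pair; offsets are signed and relative to
	 * the __ex_table section itself, hence typically negative */
	0x60, 0xef, 0xff, 0xff, 0xa4, 0xef, 0xff, 0xff,
};

const struct vdso_image vdso_image_64 = {
	.data		= raw_data,
	.size		= 8192,
	.extable_base	= 4096,	/* offset of __ex_table in the raw image */
	.extable_len	= 8,
	.extable	= extable,
	/* ... symbol offsets ... */
};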

The immediate use case is for Intel Software Guard Extensions (SGX).
SGX introduces a new CPL3-only "enclave" mode that runs as a sort of
black box shared object that is hosted by an untrusted "normal" CPL3
process.

Entering an enclave can only be done through SGX-specific instructions,
EENTER and ERESUME, and is a non-trivial process. Because of the
complexity of transitioning to/from an enclave, the vast majority of
enclaves are expected to utilize a library to handle the actual
transitions. This is roughly analogous to how e.g. libc implementations
are used by most applications.

Another crucial characteristic of SGX enclaves is that they can generate
exceptions as part of their normal (at least as "normal" as SGX can be)
operation that need to be handled *in* the enclave and/or are unique
to SGX.

And because they are essentially fancy shared objects, a process can
host any number of enclaves, each of which can execute multiple threads
simultaneously.

Putting everything together, userspace enclaves will utilize a library
that must be prepared to handle any and (almost) all exceptions any time
at least one thread may be executing in an enclave. Leveraging signals
to handle the enclave exceptions is unpleasant, to put it mildly, e.g.
the SGX library must constantly (un)register its signal handler based
on whether or not at least one thread is executing in an enclave, and
filter and forward exceptions that aren't related to its enclaves. This
becomes particularly nasty when using multiple levels of libraries that
register signal handlers, e.g. running an enclave via cgo inside of the
Go runtime.

Enabling exception fixup in vDSO allows the kernel to provide a vDSO
function that wraps the low-level transitions to/from the enclave, i.e.
the EENTER and ERESUME instructions. The vDSO function can intercept
exceptions that would otherwise generate a signal and return the fault
information directly to its caller, thus avoiding the need to juggle
signal handlers.

Suggested-by: Andy Lutomirski <[email protected]>
Cc: Andy Lutomirski <[email protected]>
Cc: Jarkko Sakkinen <[email protected]>
Cc: Dave Hansen <[email protected]>
Cc: Josh Triplett <[email protected]>
Signed-off-by: Sean Christopherson <[email protected]>
---
arch/x86/entry/vdso/Makefile | 4 +-
arch/x86/entry/vdso/extable.c | 37 +++++++++++++++++
arch/x86/entry/vdso/extable.h | 29 ++++++++++++++
arch/x86/entry/vdso/vdso-layout.lds.S | 9 ++++-
arch/x86/entry/vdso/vdso2c.h | 58 +++++++++++++++++++++++----
arch/x86/include/asm/vdso.h | 5 +++
6 files changed, 131 insertions(+), 11 deletions(-)
create mode 100644 arch/x86/entry/vdso/extable.c
create mode 100644 arch/x86/entry/vdso/extable.h

diff --git a/arch/x86/entry/vdso/Makefile b/arch/x86/entry/vdso/Makefile
index 0624bf2266fd..b8f7c301b88f 100644
--- a/arch/x86/entry/vdso/Makefile
+++ b/arch/x86/entry/vdso/Makefile
@@ -20,7 +20,7 @@ VDSO32-$(CONFIG_IA32_EMULATION) := y
vobjs-y := vdso-note.o vclock_gettime.o vgetcpu.o

# files to link into kernel
-obj-y += vma.o
+obj-y += vma.o extable.o
OBJECT_FILES_NON_STANDARD_vma.o := n

# vDSO images to build
@@ -115,7 +115,7 @@ $(obj)/%-x32.o: $(obj)/%.o FORCE

targets += vdsox32.lds $(vobjx32s-y)

-$(obj)/%.so: OBJCOPYFLAGS := -S
+$(obj)/%.so: OBJCOPYFLAGS := -S --remove-section __ex_table
$(obj)/%.so: $(obj)/%.so.dbg
$(call if_changed,objcopy)

diff --git a/arch/x86/entry/vdso/extable.c b/arch/x86/entry/vdso/extable.c
new file mode 100644
index 000000000000..49284d560d36
--- /dev/null
+++ b/arch/x86/entry/vdso/extable.c
@@ -0,0 +1,37 @@
+// SPDX-License-Identifier: GPL-2.0
+#include <linux/err.h>
+#include <linux/mm.h>
+#include <asm/current.h>
+#include <asm/vdso.h>
+
+struct vdso_exception_table_entry {
+ int insn, fixup;
+};
+
+bool fixup_vdso_exception(struct pt_regs *regs, int trapnr,
+ unsigned long error_code, unsigned long fault_addr)
+{
+ const struct vdso_image *image = current->mm->context.vdso_image;
+ const struct vdso_exception_table_entry *extable;
+ unsigned int nr_entries, i;
+ unsigned long base;
+
+ if (!current->mm->context.vdso)
+ return false;
+
+ base = (unsigned long)current->mm->context.vdso + image->extable_base;
+ nr_entries = image->extable_len / (sizeof(*extable));
+ extable = image->extable;
+
+ for (i = 0; i < nr_entries; i++) {
+ if (regs->ip == base + extable[i].insn) {
+ regs->ip = base + extable[i].fixup;
+ regs->di = trapnr;
+ regs->si = error_code;
+ regs->dx = fault_addr;
+ return true;
+ }
+ }
+
+ return false;
+}
diff --git a/arch/x86/entry/vdso/extable.h b/arch/x86/entry/vdso/extable.h
new file mode 100644
index 000000000000..aafdac396948
--- /dev/null
+++ b/arch/x86/entry/vdso/extable.h
@@ -0,0 +1,29 @@
+/* SPDX-License-Identifier: GPL-2.0 */
+#ifndef __VDSO_EXTABLE_H
+#define __VDSO_EXTABLE_H
+
+/*
+ * Inject exception fixup for vDSO code. Unlike normal exception fixup,
+ * vDSO uses a dedicated handler and the addresses are relative to the overall
+ * exception table, not each individual entry.
+ */
+#ifdef __ASSEMBLY__
+#define _ASM_VDSO_EXTABLE_HANDLE(from, to) \
+ ASM_VDSO_EXTABLE_HANDLE from to
+
+.macro ASM_VDSO_EXTABLE_HANDLE from:req to:req
+ .pushsection __ex_table, "a"
+ .long (\from) - __ex_table
+ .long (\to) - __ex_table
+ .popsection
+.endm
+#else
+#define _ASM_VDSO_EXTABLE_HANDLE(from, to) \
+ ".pushsection __ex_table, \"a\"\n" \
+ ".long (" #from ") - __ex_table\n" \
+ ".long (" #to ") - __ex_table\n" \
+ ".popsection\n"
+#endif
+
+#endif /* __VDSO_EXTABLE_H */
+
diff --git a/arch/x86/entry/vdso/vdso-layout.lds.S b/arch/x86/entry/vdso/vdso-layout.lds.S
index 93c6dc7812d0..8ef849064501 100644
--- a/arch/x86/entry/vdso/vdso-layout.lds.S
+++ b/arch/x86/entry/vdso/vdso-layout.lds.S
@@ -63,11 +63,18 @@ SECTIONS
* stuff that isn't used at runtime in between.
*/

- .text : { *(.text*) } :text =0x90909090,
+ .text : {
+ *(.text*)
+ *(.fixup)
+ } :text =0x90909090,
+
+

.altinstructions : { *(.altinstructions) } :text
.altinstr_replacement : { *(.altinstr_replacement) } :text

+ __ex_table : { *(__ex_table) } :text
+
/DISCARD/ : {
*(.discard)
*(.discard.*)
diff --git a/arch/x86/entry/vdso/vdso2c.h b/arch/x86/entry/vdso/vdso2c.h
index fa847a620f40..eca2f808bec3 100644
--- a/arch/x86/entry/vdso/vdso2c.h
+++ b/arch/x86/entry/vdso/vdso2c.h
@@ -5,6 +5,41 @@
* are built for 32-bit userspace.
*/

+static void BITSFUNC(copy)(FILE *outfile, const unsigned char *data, size_t len)
+{
+ size_t i;
+
+ for (i = 0; i < len; i++) {
+ if (i % 10 == 0)
+ fprintf(outfile, "\n\t");
+ fprintf(outfile, "0x%02X, ", (int)(data)[i]);
+ }
+}
+
+
+/*
+ * Extract a section from the input data into a standalone blob. Used to
+ * capture kernel-only data that needs to persist indefinitely, e.g. the
+ * exception fixup tables, but only in the kernel, i.e. the section can
+ * be stripped from the final vDSO image.
+ */
+static void BITSFUNC(extract)(const unsigned char *data, size_t data_len,
+ FILE *outfile, ELF(Shdr) *sec, const char *name)
+{
+ unsigned long offset;
+ size_t len;
+
+ offset = (unsigned long)GET_LE(&sec->sh_offset);
+ len = (size_t)GET_LE(&sec->sh_size);
+
+ if (offset + len > data_len)
+ fail("section to extract overruns input data");
+
+ fprintf(outfile, "static const unsigned char %s[%lu] = {", name, len);
+ BITSFUNC(copy)(outfile, data + offset, len);
+ fprintf(outfile, "\n};\n\n");
+}
+
static void BITSFUNC(go)(void *raw_addr, size_t raw_len,
void *stripped_addr, size_t stripped_len,
FILE *outfile, const char *name)
@@ -14,9 +49,8 @@ static void BITSFUNC(go)(void *raw_addr, size_t raw_len,
unsigned long mapping_size;
ELF(Ehdr) *hdr = (ELF(Ehdr) *)raw_addr;
int i;
- unsigned long j;
ELF(Shdr) *symtab_hdr = NULL, *strtab_hdr, *secstrings_hdr,
- *alt_sec = NULL;
+ *alt_sec = NULL, *extable_sec = NULL;
ELF(Dyn) *dyn = 0, *dyn_end = 0;
const char *secstrings;
INT_BITS syms[NSYMS] = {};
@@ -78,6 +112,8 @@ static void BITSFUNC(go)(void *raw_addr, size_t raw_len,
if (!strcmp(secstrings + GET_LE(&sh->sh_name),
".altinstructions"))
alt_sec = sh;
+ if (!strcmp(secstrings + GET_LE(&sh->sh_name), "__ex_table"))
+ extable_sec = sh;
}

if (!symtab_hdr)
@@ -149,13 +185,11 @@ static void BITSFUNC(go)(void *raw_addr, size_t raw_len,
fprintf(outfile,
"static unsigned char raw_data[%lu] __ro_after_init __aligned(PAGE_SIZE) = {",
mapping_size);
- for (j = 0; j < stripped_len; j++) {
- if (j % 10 == 0)
- fprintf(outfile, "\n\t");
- fprintf(outfile, "0x%02X, ",
- (int)((unsigned char *)stripped_addr)[j]);
- }
+ BITSFUNC(copy)(outfile, stripped_addr, stripped_len);
fprintf(outfile, "\n};\n\n");
+ if (extable_sec)
+ BITSFUNC(extract)(raw_addr, raw_len, outfile,
+ extable_sec, "extable");

fprintf(outfile, "const struct vdso_image %s = {\n", name);
fprintf(outfile, "\t.data = raw_data,\n");
@@ -166,6 +200,14 @@ static void BITSFUNC(go)(void *raw_addr, size_t raw_len,
fprintf(outfile, "\t.alt_len = %lu,\n",
(unsigned long)GET_LE(&alt_sec->sh_size));
}
+ if (extable_sec) {
+ fprintf(outfile, "\t.extable_base = %lu,\n",
+ (unsigned long)GET_LE(&extable_sec->sh_offset));
+ fprintf(outfile, "\t.extable_len = %lu,\n",
+ (unsigned long)GET_LE(&extable_sec->sh_size));
+ fprintf(outfile, "\t.extable = extable,\n");
+ }
+
for (i = 0; i < NSYMS; i++) {
if (required_syms[i].export && syms[i])
fprintf(outfile, "\t.sym_%s = %" PRIi64 ",\n",
diff --git a/arch/x86/include/asm/vdso.h b/arch/x86/include/asm/vdso.h
index 27566e57e87d..1c8a6a8f7b59 100644
--- a/arch/x86/include/asm/vdso.h
+++ b/arch/x86/include/asm/vdso.h
@@ -15,6 +15,8 @@ struct vdso_image {
unsigned long size; /* Always a multiple of PAGE_SIZE */

unsigned long alt, alt_len;
+ unsigned long extable_base, extable_len;
+ const void *extable;

long sym_vvar_start; /* Negative offset to the vvar area */

@@ -45,6 +47,9 @@ extern void __init init_vdso_image(const struct vdso_image *image);

extern int map_vdso_once(const struct vdso_image *image, unsigned long addr);

+extern bool fixup_vdso_exception(struct pt_regs *regs, int trapnr,
+ unsigned long error_code,
+ unsigned long fault_addr);
#endif /* __ASSEMBLER__ */

#endif /* _ASM_X86_VDSO_H */
--
2.19.2


2018-12-13 21:34:14

by Sean Christopherson

Subject: [RFC PATCH v4 2/5] x86/fault: Add helper function to sanitize error code

...to prepare for vDSO exception fixup, which will expose the error
code to userspace. Fixup runs before set_signal_archinfo(), i.e. it
squashes the signal when successful.

Signed-off-by: Sean Christopherson <[email protected]>
---
arch/x86/mm/fault.c | 26 ++++++++++++++------------
1 file changed, 14 insertions(+), 12 deletions(-)

diff --git a/arch/x86/mm/fault.c b/arch/x86/mm/fault.c
index 7e8a7558ca07..fefeb745d21d 100644
--- a/arch/x86/mm/fault.c
+++ b/arch/x86/mm/fault.c
@@ -719,18 +719,22 @@ pgtable_bad(struct pt_regs *regs, unsigned long error_code,
oops_end(flags, regs, sig);
}

-static void set_signal_archinfo(unsigned long address,
- unsigned long error_code)
+static void sanitize_error_code(unsigned long address,
+ unsigned long *error_code)
{
- struct task_struct *tsk = current;
-
/*
* To avoid leaking information about the kernel page
* table layout, pretend that user-mode accesses to
* kernel addresses are always protection faults.
*/
if (address >= TASK_SIZE_MAX)
- error_code |= X86_PF_PROT;
+ *error_code |= X86_PF_PROT;
+}
+
+static void set_signal_archinfo(unsigned long address,
+ unsigned long error_code)
+{
+ struct task_struct *tsk = current;

tsk->thread.trap_nr = X86_TRAP_PF;
tsk->thread.error_code = error_code | X86_PF_USER;
@@ -771,6 +775,8 @@ no_context(struct pt_regs *regs, unsigned long error_code,
* faulting through the emulate_vsyscall() logic.
*/
if (current->thread.sig_on_uaccess_err && signal) {
+ sanitize_error_code(address, &error_code);
+
set_signal_archinfo(address, error_code);

/* XXX: hwpoison faults will set the wrong code. */
@@ -920,13 +926,7 @@ __bad_area_nosemaphore(struct pt_regs *regs, unsigned long error_code,
if (is_errata100(regs, address))
return;

- /*
- * To avoid leaking information about the kernel page table
- * layout, pretend that user-mode accesses to kernel addresses
- * are always protection faults.
- */
- if (address >= TASK_SIZE_MAX)
- error_code |= X86_PF_PROT;
+ sanitize_error_code(address, &error_code);

if (likely(show_unhandled_signals))
show_signal_msg(regs, error_code, address, tsk);
@@ -1045,6 +1045,8 @@ do_sigbus(struct pt_regs *regs, unsigned long error_code, unsigned long address,
if (is_prefetch(regs, error_code, address))
return;

+ sanitize_error_code(address, &error_code);
+
set_signal_archinfo(address, error_code);

#ifdef CONFIG_MEMORY_FAILURE
--
2.19.2


2018-12-14 09:57:17

by Jethro Beekman

Subject: Re: [RFC PATCH v4 5/5] x86/vdso: Add __vdso_sgx_enter_enclave() to wrap SGX enclave transitions

On 2018-12-14 03:01, Sean Christopherson wrote:
> +struct sgx_enclave_regs {
> + __u64 rdi;
> + __u64 rsi;
> + __u64 rdx;
> + __u64 r8;
> + __u64 r9;
> + __u64 r10;
> +};

This is fine, but why not just cover all 13 normal registers that are
not used by SGX?

Minor comments below.

> +/**
> + * struct sgx_enclave_exception - structure to pass register in/out of enclave

Typo in struct name.

> + * by way of __vdso_sgx_enter_enclave
> + *
> + * @rdi: value of %rdi, loaded/saved on enter/exit
> + * @rsi: value of %rsi, loaded/saved on enter/exit
> + * @rdx: value of %rdx, loaded/saved on enter/exit
> + * @r8: value of %r8, loaded/saved on enter/exit
> + * @r9: value of %r9, loaded/saved on enter/exit
> + * @r10: value of %r10, loaded/saved on enter/exit
> + */

> + /* load leaf, TCS and AEP for ENCLU */
> + mov %edi, %eax
> + mov %rsi, %rbx
> + lea 1f(%rip), %rcx

If you move this below the jump, you can use %rcx for @regs

> +
> + /* optionally copy @regs to registers */
> + test %rdx, %rdx
> + je 1f
> +
> + mov %rdx, %r11
> + mov RDI(%r11), %rdi
> + mov RSI(%r11), %rsi
> + mov RDX(%r11), %rdx
> + mov R8(%r11), %r8
> + mov R9(%r11), %r9
> + mov R10(%r11), %r10
> +
> +1: enclu
> +
> + /* ret = 0 */
> + xor %eax, %eax
> +
> + /* optionally copy registers to @regs */
> + mov -0x8(%rsp), %r11
> + test %r11, %r11
> + je 2f
> +
> + mov %rdi, RDI(%r11)
> + mov %rsi, RSI(%r11)
> + mov %rdx, RDX(%r11)
> + mov %r8, R8(%r11)
> + mov %r9, R9(%r11)
> + mov %r10, R10(%r11)

Here you can use %rax for @regs and clear it at the end.

> +2: pop %rbx
> + pop %r12
> + pop %r13
> + pop %r14
> + pop %r15
> + pop %rbp
> + ret

x86-64 ABI requires that you call CLD here (enclave may set it).

--
Jethro Beekman | Fortanix



2018-12-14 15:15:35

by Sean Christopherson

Subject: Re: [RFC PATCH v4 5/5] x86/vdso: Add __vdso_sgx_enter_enclave() to wrap SGX enclave transitions

On Fri, Dec 14, 2018 at 09:55:49AM +0000, Jethro Beekman wrote:
> On 2018-12-14 03:01, Sean Christopherson wrote:
> >+struct sgx_enclave_regs {
> >+ __u64 rdi;
> >+ __u64 rsi;
> >+ __u64 rdx;
> >+ __u64 r8;
> >+ __u64 r9;
> >+ __u64 r10;
> >+};
>
> This is fine, but why not just cover all 13 normal registers that are not
> used by SGX?

Trying to balance flexibility/usability with unnecessary overhead. And I
think this ABI meshes well with the idea of requiring the enclave to be
compliant with the x86-64 ABI (see below).

> Minor comments below.
>
> >+/**
> >+ * struct sgx_enclave_exception - structure to pass register in/out of enclave
>
> Typo in struct name.

Doh, thanks.

> >+ * by way of __vdso_sgx_enter_enclave
> >+ *
> >+ * @rdi: value of %rdi, loaded/saved on enter/exit
> >+ * @rsi: value of %rsi, loaded/saved on enter/exit
> >+ * @rdx: value of %rdx, loaded/saved on enter/exit
> >+ * @r8: value of %r8, loaded/saved on enter/exit
> >+ * @r9: value of %r9, loaded/saved on enter/exit
> >+ * @r10: value of %r10, loaded/saved on enter/exit
> >+ */
>
> >+ /* load leaf, TCS and AEP for ENCLU */
> >+ mov %edi, %eax
> >+ mov %rsi, %rbx
> >+ lea 1f(%rip), %rcx
>
> If you move this below the jump, you can use %rcx for @regs

EDI needs to be moved to EAX before it is potentially overwritten
below and I wanted the loading of the three registers used by hardware
grouped together. And IMO using the same register for accessing the
structs in all flows improves readability.

> >+
> >+ /* optionally copy @regs to registers */
> >+ test %rdx, %rdx
> >+ je 1f
> >+
> >+ mov %rdx, %r11
> >+ mov RDI(%r11), %rdi
> >+ mov RSI(%r11), %rsi
> >+ mov RDX(%r11), %rdx
> >+ mov R8(%r11), %r8
> >+ mov R9(%r11), %r9
> >+ mov R10(%r11), %r10
> >+
> >+1: enclu
> >+
> >+ /* ret = 0 */
> >+ xor %eax, %eax
> >+
> >+ /* optionally copy registers to @regs */
> >+ mov -0x8(%rsp), %r11
> >+ test %r11, %r11
> >+ je 2f
> >+
> >+ mov %rdi, RDI(%r11)
> >+ mov %rsi, RSI(%r11)
> >+ mov %rdx, RDX(%r11)
> >+ mov %r8, R8(%r11)
> >+ mov %r9, R9(%r11)
> >+ mov %r10, R10(%r11)
>
> Here you can use %rax for @regs and clear it at the end.

Clearing RAX early avoids the use of another label, though obviously
that's not exactly critical. The comment about using the same register
for accessing structs applies here as well.

> >+2: pop %rbx
> >+ pop %r12
> >+ pop %r13
> >+ pop %r14
> >+ pop %r15
> >+ pop %rbp
> >+ ret
>
> x86-64 ABI requires that you call CLD here (enclave may set it).

Ugh. Technically MXCSR and the x87 CW also need to be preserved.

What if rather than treating the enclave as hostile we require it to be
compliant with the x86-64 ABI like any other function? That would solve
the EFLAGS.DF, MXCSR and x87 issues without adding unnecessary overhead.
And we wouldn't have to save/restore R12-R15. It'd mean we couldn't use
the stack's red zone to hold @regs and @e, but that's poor form anyways.

>
> --
> Jethro Beekman | Fortanix
>


2018-12-14 15:39:54

by Sean Christopherson

Subject: Re: [RFC PATCH v4 5/5] x86/vdso: Add __vdso_sgx_enter_enclave() to wrap SGX enclave transitions

On Fri, Dec 14, 2018 at 07:12:04AM -0800, Sean Christopherson wrote:
> On Fri, Dec 14, 2018 at 09:55:49AM +0000, Jethro Beekman wrote:
> > On 2018-12-14 03:01, Sean Christopherson wrote:
> > >+2: pop %rbx
> > >+ pop %r12
> > >+ pop %r13
> > >+ pop %r14
> > >+ pop %r15
> > >+ pop %rbp
> > >+ ret
> >
> > x86-64 ABI requires that you call CLD here (enclave may set it).
>
> Ugh. Technically MXCSR and the x87 CW also need to be preserved.
>
> What if rather than treating the enclave as hostile we require it to be
> compliant with the x86-64 ABI like any other function? That would solve
> the EFLAGS.DF, MXCSR and x87 issues without adding unnecessary overhead.
> And we wouldn't have to save/restore R12-R15. It'd mean we couldn't use
> the stack's red zone to hold @regs and @e, but that's poor form anyways.

Grr, except the processor crushes R12-R15, FCW and MXCSR on asynchronous
exits. But not EFLAGS.DF, that's real helpful.

2018-12-14 17:04:57

by Sean Christopherson

Subject: Re: [RFC PATCH v4 5/5] x86/vdso: Add __vdso_sgx_enter_enclave() to wrap SGX enclave transitions

On Fri, Dec 14, 2018 at 07:38:30AM -0800, Sean Christopherson wrote:
> On Fri, Dec 14, 2018 at 07:12:04AM -0800, Sean Christopherson wrote:
> > On Fri, Dec 14, 2018 at 09:55:49AM +0000, Jethro Beekman wrote:
> > > On 2018-12-14 03:01, Sean Christopherson wrote:
> > > >+2: pop %rbx
> > > >+ pop %r12
> > > >+ pop %r13
> > > >+ pop %r14
> > > >+ pop %r15
> > > >+ pop %rbp
> > > >+ ret
> > >
> > > x86-64 ABI requires that you call CLD here (enclave may set it).
> >
> > Ugh. Technically MXCSR and the x87 CW also need to be preserved.
> >
> > What if rather than treating the enclave as hostile we require it to be
> > compliant with the x86-64 ABI like any other function? That would solve
> > the EFLAGS.DF, MXCSR and x87 issues without adding unnecessary overhead.
> > And we wouldn't have to save/restore R12-R15. It'd mean we couldn't use
> > the stack's red zone to hold @regs and @e, but that's poor form anyways.
>
> Grr, except the processor crushes R12-R15, FCW and MXCSR on asynchronous
> exits. But not EFLAGS.DF, that's real helpful.

I can think of three options that are at least somewhat reasonable:

1) Save/restore MXCSR and FCW

+ 100% compliant with the x86-64 ABI
+ Callable from any code
+ Minimal documentation required
- Restoring MXCSR/FCW is likely unnecessary 99% of the time
- Slow

2) Clear EFLAGS.DF but not save/restore MXCSR and FCW

+ Mostly compliant with the x86-64 ABI
+ Callable from any code that doesn't use SIMD registers
- Need to document deviations from x86-64 ABI

3) Require the caller to save/restore everything.

+ Fast
+ Userspace can pass all GPRs to the enclave (minus EAX, RBX and RCX)
- Completely custom ABI
- For all intents and purposes must be called from an assembly wrapper


Option (3) actually isn't all that awful. RCX can be used to pass an
optional pointer to a 'struct sgx_enclave_exception' and we can still
return standard error codes, e.g. -EFAULT.

E.g.:

/**
* __vdso_sgx_enter_enclave() - Enter an SGX enclave
*
* %eax: ENCLU leaf, must be EENTER or ERESUME
* %rbx: TCS, must be non-NULL
* %rcx: Optional pointer to 'struct sgx_enclave_exception'
*
* Return:
* 0 on a clean entry/exit to/from the enclave
* -EINVAL if ENCLU leaf is not allowed or if TCS is NULL
* -EFAULT if ENCLU or the enclave faults
*/
ENTRY(__vdso_sgx_enter_enclave)
/* EENTER <= leaf <= ERESUME */
cmp $0x2, %eax
jb bad_input

cmp $0x3, %eax
ja bad_input

/* TCS must be non-NULL */
test %rbx, %rbx
je bad_input

/* save @exception pointer */
push %rcx

/* load leaf, TCS and AEP for ENCLU */
lea 1f(%rip), %rcx
1: enclu

add 0x8, %rsp
xor %eax, %eax
ret

bad_input:
mov $(-EINVAL), %rax
ret

.pushsection .fixup, "ax"
2: pop %rcx
test %rcx, %rcx
je 3f

mov %eax, EX_LEAF(%rcx)
mov %di, EX_TRAPNR(%rcx)
mov %si, EX_ERROR_CODE(%rcx)
mov %rdx, EX_ADDRESS(%rcx)
3: mov $(-EFAULT), %rax
ret
.popsection

_ASM_VDSO_EXTABLE_HANDLE(1b, 3b)

ENDPROC(__vdso_sgx_enter_enclave)

2018-12-14 18:22:33

by Josh Triplett

Subject: Re: [RFC PATCH v4 5/5] x86/vdso: Add __vdso_sgx_enter_enclave() to wrap SGX enclave transitions

On Fri, Dec 14, 2018 at 09:03:11AM -0800, Sean Christopherson wrote:
> On Fri, Dec 14, 2018 at 07:38:30AM -0800, Sean Christopherson wrote:
> > On Fri, Dec 14, 2018 at 07:12:04AM -0800, Sean Christopherson wrote:
> > > On Fri, Dec 14, 2018 at 09:55:49AM +0000, Jethro Beekman wrote:
> > > > On 2018-12-14 03:01, Sean Christopherson wrote:
> > > > >+2: pop %rbx
> > > > >+ pop %r12
> > > > >+ pop %r13
> > > > >+ pop %r14
> > > > >+ pop %r15
> > > > >+ pop %rbp
> > > > >+ ret
> > > >
> > > > x86-64 ABI requires that you call CLD here (enclave may set it).
> > >
> > > Ugh. Technically MXCSR and the x87 CW also need to be preserved.
> > >
> > > What if rather than treating the enclave as hostile we require it to be
> > > compliant with the x86-64 ABI like any other function? That would solve
> > > the EFLAGS.DF, MXCSR and x87 issues without adding unnecessary overhead.
> > > And we wouldn't have to save/restore R12-R15. It'd mean we couldn't use
> > > the stack's red zone to hold @regs and @e, but that's poor form anyways.
> >
> > Grr, except the processor crushes R12-R15, FCW and MXCSR on asynchronous
> > exits. But not EFLAGS.DF, that's real helpful.
>
> I can think of three options that are at least somewhat reasonable:
>
> 1) Save/restore MXCSR and FCW
>
> + 100% compliant with the x86-64 ABI
> + Callable from any code
> + Minimal documentation required
> - Restoring MXCSR/FCW is likely unnecessary 99% of the time
> - Slow
>
> 2) Clear EFLAGS.DF but not save/restore MXCSR and FCW
>
> + Mostly compliant with the x86-64 ABI
> + Callable from any code that doesn't use SIMD registers
> - Need to document deviations from x86-64 ABI
>
> 3) Require the caller to save/restore everything.
>
> + Fast
> + Userspace can pass all GPRs to the enclave (minus EAX, RBX and RCX)
> - Completely custom ABI
> - For all intents and purposes must be called from an assembly wrapper
>
>
> Option (3) actually isn't all that awful. RCX can be used to pass an
> optional pointer to a 'struct sgx_enclave_exception' and we can still
> return standard error codes, e.g. -EFAULT.

Entering and exiting a syscall requires an assembly wrapper, and that
doesn't seem completely unreasonable. It's an easy bit of inline
assembly.
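For instance, something like the following untested sketch for the
proposed register ABI (leaf in %eax, TCS in %rbx, optional exception
pointer in %rcx, everything else caller-saved; vdso_fn is assumed to
have already been resolved from the vDSO by the caller):

static long sgx_enter_enclave(void *vdso_fn, unsigned int leaf,
			      void *tcs, struct sgx_enclave_exception *e)
{
	/* rbx can't be named in a clobber list, so pin it directly */
	register void *rbx asm("rbx") = tcs;
	long ret = leaf;

	asm volatile("call *%[fn]"
		     : "+a"(ret), "+r"(rbx), "+c"(e)
		     : [fn] "m"(vdso_fn)
		     : "rdx", "rdi", "rsi", "r8", "r9", "r10", "r11",
		       "r12", "r13", "r14", "r15", "memory", "cc");
	return ret;
}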

2018-12-14 18:40:01

by Sean Christopherson

Subject: Re: [RFC PATCH v4 5/5] x86/vdso: Add __vdso_sgx_enter_enclave() to wrap SGX enclave transitions

On Fri, Dec 14, 2018 at 10:20:39AM -0800, Josh Triplett wrote:
> On Fri, Dec 14, 2018 at 09:03:11AM -0800, Sean Christopherson wrote:
> > On Fri, Dec 14, 2018 at 07:38:30AM -0800, Sean Christopherson wrote:
> > > On Fri, Dec 14, 2018 at 07:12:04AM -0800, Sean Christopherson wrote:
> > > > On Fri, Dec 14, 2018 at 09:55:49AM +0000, Jethro Beekman wrote:
> > > > > On 2018-12-14 03:01, Sean Christopherson wrote:
> > > > > >+2: pop %rbx
> > > > > >+ pop %r12
> > > > > >+ pop %r13
> > > > > >+ pop %r14
> > > > > >+ pop %r15
> > > > > >+ pop %rbp
> > > > > >+ ret
> > > > >
> > > > > x86-64 ABI requires that you call CLD here (enclave may set it).
> > > >
> > > > Ugh. Technically MXCSR and the x87 CW also need to be preserved.
> > > >
> > > > What if rather than treating the enclave as hostile we require it to be
> > > > compliant with the x86-64 ABI like any other function? That would solve
> > > > the EFLAGS.DF, MXCSR and x87 issues without adding unnecessary overhead.
> > > > And we wouldn't have to save/restore R12-R15. It'd mean we couldn't use
> > > > the stack's red zone to hold @regs and @e, but that's poor form anyways.
> > >
> > > Grr, except the processor crushes R12-R15, FCW and MXCSR on asynchronous
> > > exits. But not EFLAGS.DF, that's real helpful.
> >
> > I can think of three options that are at least somewhat reasonable:
> >
> > 1) Save/restore MXCSR and FCW
> >
> > + 100% compliant with the x86-64 ABI
> > + Callable from any code
> > + Minimal documentation required
> > - Restoring MXCSR/FCW is likely unnecessary 99% of the time
> > - Slow
> >
> > 2) Clear EFLAGS.DF but not save/restore MXCSR and FCW
> >
> > + Mostly compliant with the x86-64 ABI
> > + Callable from any code that doesn't use SIMD registers
> > - Need to document deviations from x86-64 ABI
> >
> > 3) Require the caller to save/restore everything.
> >
> > + Fast
> > + Userspace can pass all GPRs to the enclave (minus EAX, RBX and RCX)
> > - Completely custom ABI
> > - For all intents and purposes must be called from an assembly wrapper
> >
> >
> > Option (3) actually isn't all that awful. RCX can be used to pass an
> > optional pointer to a 'struct sgx_enclave_exception' and we can still
> > return standard error codes, e.g. -EFAULT.
>
> Entering and exiting a syscall requires an assembly wrapper, and that
> doesn't seem completely unreasonable. It's an easy bit of inline
> assembly.

The code I posted had a few typos (stupid AT&T syntax), but with those
fixed the idea checks out.

My initial reaction to a barebones ABI was that it would be a
"documentation nightmare", but it's not too bad if it returns actual
error codes and fills in a struct on exceptions instead of stuffing
registers. And with the MXCSR/FCW issues it might actually be less
documentation in the long run since we can simply say that all state
is the caller's responsibility.

I *really* like that we basically eliminate bikeshedding on which GPRs
to pass to/from the enclave.

2018-12-14 18:45:59

by Andy Lutomirski

Subject: Re: [RFC PATCH v4 5/5] x86/vdso: Add __vdso_sgx_enter_enclave() to wrap SGX enclave transitions



> On Dec 14, 2018, at 9:03 AM, Sean Christopherson <[email protected]> wrote:
>
>> On Fri, Dec 14, 2018 at 07:38:30AM -0800, Sean Christopherson wrote:
>>> On Fri, Dec 14, 2018 at 07:12:04AM -0800, Sean Christopherson wrote:
>>>> On Fri, Dec 14, 2018 at 09:55:49AM +0000, Jethro Beekman wrote:
>>>>> On 2018-12-14 03:01, Sean Christopherson wrote:
>>>>> +2: pop %rbx
>>>>> + pop %r12
>>>>> + pop %r13
>>>>> + pop %r14
>>>>> + pop %r15
>>>>> + pop %rbp
>>>>> + ret
>>>>
>>>> x86-64 ABI requires that you call CLD here (enclave may set it).
>>>
>>> Ugh. Technically MXCSR and the x87 CW also need to be preserved.
>>>
>>> What if rather than treating the enclave as hostile we require it to be
>>> compliant with the x86-64 ABI like any other function? That would solve
>>> the EFLAGS.DF, MXCSR and x87 issues without adding unnecessary overhead.
>>> And we wouldn't have to save/restore R12-R15. It'd mean we couldn't use
>>> the stack's red zone to hold @regs and @e, but that's poor form anyways.
>>
>> Grr, except the processor crushes R12-R15, FCW and MXCSR on asynchronous
>> exits. But not EFLAGS.DF, that's real helpful.
>
> I can think of three options that are at least somewhat reasonable:
>
> 1) Save/restore MXCSR and FCW
>
> + 100% compliant with the x86-64 ABI
> + Callable from any code
> + Minimal documentation required
> - Restoring MXCSR/FCW is likely unnecessary 99% of the time
> - Slow
>
> 2) Clear EFLAGS.DF but not save/restore MXCSR and FCW
>
> + Mostly compliant with the x86-64 ABI
> + Callable from any code that doesn't use SIMD registers
> - Need to document deviations from x86-64 ABI
>
> 3) Require the caller to save/restore everything.
>
> + Fast
> + Userspace can pass all GPRs to the enclave (minus EAX, RBX and RCX)
> - Completely custom ABI
> - For all intents and purposes must be called from an assembly wrapper
>
> Option (3) actually isn't all that awful. RCX can be used to pass an
> optional pointer to a 'struct sgx_enclave_exception' and we can still
> return standard error codes, e.g. -EFAULT.

I like 3, but:

>
> E.g.:
>
> /**
> * __vdso_sgx_enter_enclave() - Enter an SGX enclave
> *
> * %eax: ENCLU leaf, must be EENTER or ERESUME
> * %rbx: TCS, must be non-NULL
> * %rcx: Optional pointer to 'struct sgx_enclave_exception'
> *
> * Return:
> * 0 on a clean entry/exit to/from the enclave
> * -EINVAL if ENCLU leaf is not allowed or if TCS is NULL
> * -EFAULT if ENCLU or the enclave faults
> */
> ENTRY(__vdso_sgx_enter_enclave)
> /* EENTER <= leaf <= ERESUME */
> cmp $0x2, %eax
> jb bad_input
>
> cmp $0x3, %eax
> ja bad_input
>
> /* TCS must be non-NULL */
> test %rbx, %rbx
> je bad_input
>
> /* save @exception pointer */
> push %rcx
>
> /* load leaf, TCS and AEP for ENCLU */
> lea 1f(%rip), %rcx
> 1: enclu
>
> add 0x8, %rsp
> xor %eax, %eax
> ret
>
> bad_input:
> mov $(-EINVAL), %rax
> ret
>
> .pushsection .fixup, "ax"
> 2: pop %rcx
> test %rcx, %rcx
> je 3f
>
> mov %eax, EX_LEAF(%rcx)
> mov %di, EX_TRAPNR(%rcx)
> mov %si, EX_ERROR_CODE(%rcx)
> mov %rdx, EX_ADDRESS(%rcx)
> 3: mov $(-EFAULT), %rax
> ret

I’m not totally sold on -EFAULT as the error code. That usually indicates a bad pointer. I’m not sure I have a better suggestion.

> .popsection
>
> _ASM_VDSO_EXTABLE_HANDLE(1b, 3b)
>
> ENDPROC(__vdso_sgx_enter_enclave)

2018-12-14 19:21:46

by Sean Christopherson

Subject: Re: [RFC PATCH v4 5/5] x86/vdso: Add __vdso_sgx_enter_enclave() to wrap SGX enclave transitions

On Fri, Dec 14, 2018 at 10:44:10AM -0800, Andy Lutomirski wrote:
>
> > On Dec 14, 2018, at 9:03 AM, Sean Christopherson <[email protected]> wrote:
> >
> > .pushsection .fixup, "ax"
> > 2: pop %rcx
> > test %rcx, %rcx
> > je 3f
> >
> > mov %eax, EX_LEAF(%rcx)
> > mov %di, EX_TRAPNR(%rcx)
> > mov %si, EX_ERROR_CODE(%rcx)
> > mov %rdx, EX_ADDRESS(%rcx)
> > 3: mov $(-EFAULT), %rax
> > ret
>
> I’m not totally sold on -EFAULT as the error code. That usually
> indicates a bad pointer. I’m not sure I have a better suggestion.

Hmm, one idea would be to return positive signal numbers, e.g. SIGILL
for #UD. I don't like that approach though as it adds a fair amount
of code to the fixup handler for dubious value, e.g. userspace would
still need to check the exception error code to determine if the EPC
is lost. And we'd have to update the vDSO if a new exception and/or
signal was added, e.g. #CP for CET.

Encapsulating "you faulted" in a single error code seems cleaner for
both kernel and userspace code, and -EFAULT makes that pretty obvious
even though we're bastardizing its meaning a bit.

In general, I'd prefer to return only 0 or negative values so that
userspace can easily merge in their own (positive value) error codes
from the enclave, e.g. in the vDSO wrapper:

/* Enclave's return value is in RDI, overwrite RAX on success */
test %rax, %rax
cmove %rdi, %rax
ret
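To spell out the resulting convention on the C side (illustrative
sketch; handle_enclave_fault() and map_enclave_error() are hypothetical,
and sgx_enter_enclave() is the kind of wrapper sketched earlier):

static int run_once(void *vdso_fn, void *tcs,
		    struct sgx_enclave_exception *e)
{
	long ret = sgx_enter_enclave(vdso_fn, 2 /* EENTER */, tcs, e);

	if (ret == -EFAULT)	/* unhandled exception, details in *e */
		return handle_enclave_fault(e);
	if (ret < 0)		/* bad parameters, e.g. -EINVAL */
		return (int)ret;
	if (ret > 0)		/* enclave's own error code, via %rdi */
		return map_enclave_error(ret);
	return 0;		/* clean EEXIT, no enclave error */
}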