2018-12-14 22:01:49

by Sean Christopherson

Subject: [RFC PATCH v5 0/5] x86: Add vDSO exception fixup for SGX

__vdso_sgx_enter_enclave() gets another rewrite, this time to strip
it down to the bare minimum and explicitly break compliance with the
x86-64 ABI. Feedback from v4 revealed that __vdso_sgx_enter_enclave()
would need to save (a lot) more than just the non-volatile GPRs to be
compliant with the x86-64 ABI, at which point breaking from the ABI
completely became much more palatable.

The non-standard ABI also solves the question of "which GPRs should be
marshalled to/from the enclave" by getting out of the way entirely and
letting userspace have free rein (except for RSP, which has a big ol'
DO NOT TOUCH sign on it).

[1] https://lkml.kernel.org/r/[email protected]


v1: https://lkml.kernel.org/r/[email protected]
v2: https://lkml.kernel.org/r/[email protected]
v3: https://lkml.kernel.org/r/[email protected]
v4: https://lkml.kernel.org/r/[email protected]
v5:
- Strip down __vdso_sgx_enter_enclave() so it only touches RSP and
GPRs that are already collateral damage of ENCLU[EENTER/ERESUME].
- Add a 16-byte reserved field to 'struct sgx_enclave_exception' so
the struct can grow in a backwards compatible fashion.
- Add a blurb at the end of the changelog for patch 1/5 explaining
why the vDSO fixup macros look different than the equivalent kernel
macros.

Sean Christopherson (5):
x86/vdso: Add support for exception fixup in vDSO functions
x86/fault: Add helper function to sanitize error code
x86/fault: Attempt to fixup unhandled #PF on ENCLU before signaling
x86/traps: Attempt to fixup exceptions in vDSO before signaling
x86/vdso: Add __vdso_sgx_enter_enclave() to wrap SGX enclave
transitions

arch/x86/entry/vdso/Makefile | 6 +-
arch/x86/entry/vdso/extable.c | 37 +++++++++
arch/x86/entry/vdso/extable.h | 29 +++++++
arch/x86/entry/vdso/vdso-layout.lds.S | 9 ++-
arch/x86/entry/vdso/vdso.lds.S | 1 +
arch/x86/entry/vdso/vdso2c.h | 58 ++++++++++++--
arch/x86/entry/vdso/vsgx_enter_enclave.S | 97 ++++++++++++++++++++++++
arch/x86/include/asm/vdso.h | 5 ++
arch/x86/include/uapi/asm/sgx.h | 18 +++++
arch/x86/kernel/traps.c | 14 ++++
arch/x86/mm/fault.c | 33 +++++---
11 files changed, 284 insertions(+), 23 deletions(-)
create mode 100644 arch/x86/entry/vdso/extable.c
create mode 100644 arch/x86/entry/vdso/extable.h
create mode 100644 arch/x86/entry/vdso/vsgx_enter_enclave.S

--
2.19.2



2018-12-14 21:58:48

by Sean Christopherson

Subject: [RFC PATCH v5 2/5] x86/fault: Add helper function to sanitize error code

...to prepare for vDSO exception fixup, which will expose the error
code to userspace and runs before set_signal_archinfo(), i.e. squashes
the signal when fixup is successful. Factoring the sanitization into a
helper allows it to run before the fixup sees (and exposes) the error code.

Signed-off-by: Sean Christopherson <[email protected]>
---
arch/x86/mm/fault.c | 26 ++++++++++++++------------
1 file changed, 14 insertions(+), 12 deletions(-)

diff --git a/arch/x86/mm/fault.c b/arch/x86/mm/fault.c
index 7e8a7558ca07..fefeb745d21d 100644
--- a/arch/x86/mm/fault.c
+++ b/arch/x86/mm/fault.c
@@ -719,18 +719,22 @@ pgtable_bad(struct pt_regs *regs, unsigned long error_code,
oops_end(flags, regs, sig);
}

-static void set_signal_archinfo(unsigned long address,
- unsigned long error_code)
+static void sanitize_error_code(unsigned long address,
+ unsigned long *error_code)
{
- struct task_struct *tsk = current;
-
/*
* To avoid leaking information about the kernel page
* table layout, pretend that user-mode accesses to
* kernel addresses are always protection faults.
*/
if (address >= TASK_SIZE_MAX)
- error_code |= X86_PF_PROT;
+ *error_code |= X86_PF_PROT;
+}
+
+static void set_signal_archinfo(unsigned long address,
+ unsigned long error_code)
+{
+ struct task_struct *tsk = current;

tsk->thread.trap_nr = X86_TRAP_PF;
tsk->thread.error_code = error_code | X86_PF_USER;
@@ -771,6 +775,8 @@ no_context(struct pt_regs *regs, unsigned long error_code,
* faulting through the emulate_vsyscall() logic.
*/
if (current->thread.sig_on_uaccess_err && signal) {
+ sanitize_error_code(address, &error_code);
+
set_signal_archinfo(address, error_code);

/* XXX: hwpoison faults will set the wrong code. */
@@ -920,13 +926,7 @@ __bad_area_nosemaphore(struct pt_regs *regs, unsigned long error_code,
if (is_errata100(regs, address))
return;

- /*
- * To avoid leaking information about the kernel page table
- * layout, pretend that user-mode accesses to kernel addresses
- * are always protection faults.
- */
- if (address >= TASK_SIZE_MAX)
- error_code |= X86_PF_PROT;
+ sanitize_error_code(address, &error_code);

if (likely(show_unhandled_signals))
show_signal_msg(regs, error_code, address, tsk);
@@ -1045,6 +1045,8 @@ do_sigbus(struct pt_regs *regs, unsigned long error_code, unsigned long address,
if (is_prefetch(regs, error_code, address))
return;

+ sanitize_error_code(address, &error_code);
+
set_signal_archinfo(address, error_code);

#ifdef CONFIG_MEMORY_FAILURE
--
2.19.2


2018-12-14 21:59:15

by Sean Christopherson

Subject: [RFC PATCH v5 4/5] x86/traps: Attempt to fixup exceptions in vDSO before signaling

Call fixup_vdso_exception() in all trap flows that generate signals to
userspace immediately prior to generating any such signal. If the
exception is fixed, return cleanly and do not generate a signal.

The goal of vDSO fixup is not to fixup all faults, nor is it to avoid
all signals, but rather to report faults directly to userspace when the
fault would otherwise directly result in a signal being sent to the
process.

Suggested-by: Andy Lutomirski <[email protected]>
Cc: Andy Lutomirski <[email protected]>
Cc: Jarkko Sakkinen <[email protected]>
Cc: Dave Hansen <[email protected]>
Cc: Josh Triplett <[email protected]>
Signed-off-by: Sean Christopherson <[email protected]>
---
arch/x86/kernel/traps.c | 14 ++++++++++++++
1 file changed, 14 insertions(+)

diff --git a/arch/x86/kernel/traps.c b/arch/x86/kernel/traps.c
index 9b7c4ca8f0a7..b1ca05efb15e 100644
--- a/arch/x86/kernel/traps.c
+++ b/arch/x86/kernel/traps.c
@@ -61,6 +61,7 @@
#include <asm/mpx.h>
#include <asm/vm86.h>
#include <asm/umip.h>
+#include <asm/vdso.h>

#ifdef CONFIG_X86_64
#include <asm/x86_init.h>
@@ -209,6 +210,9 @@ do_trap_no_signal(struct task_struct *tsk, int trapnr, const char *str,
tsk->thread.error_code = error_code;
tsk->thread.trap_nr = trapnr;
die(str, regs, error_code);
+ } else {
+ if (fixup_vdso_exception(regs, trapnr, error_code, 0))
+ return 0;
}

/*
@@ -560,6 +564,9 @@ do_general_protection(struct pt_regs *regs, long error_code)
return;
}

+ if (fixup_vdso_exception(regs, X86_TRAP_GP, error_code, 0))
+ return;
+
tsk->thread.error_code = error_code;
tsk->thread.trap_nr = X86_TRAP_GP;

@@ -774,6 +781,10 @@ dotraplinkage void do_debug(struct pt_regs *regs, long error_code)
SIGTRAP) == NOTIFY_STOP)
goto exit;

+ if (user_mode(regs) &&
+ fixup_vdso_exception(regs, X86_TRAP_DB, error_code, 0))
+ goto exit;
+
/*
* Let others (NMI) know that the debug stack is in use
* as we may switch to the interrupt stack.
@@ -854,6 +865,9 @@ static void math_error(struct pt_regs *regs, int error_code, int trapnr)
if (!si_code)
return;

+ if (fixup_vdso_exception(regs, trapnr, error_code, 0))
+ return;
+
force_sig_fault(SIGFPE, si_code,
(void __user *)uprobe_get_trap_addr(regs), task);
}
--
2.19.2


2018-12-14 21:59:17

by Sean Christopherson

Subject: [RFC PATCH v5 3/5] x86/fault: Attempt to fixup unhandled #PF on ENCLU before signaling

Call fixup_sgx_enclu_exception() in the SIGSEGV and SIGBUS paths of
the page fault handler immediately prior to signaling. If the fault
is fixed, return cleanly and do not generate a signal.

In the SIGSEGV flow, make sure the error code passed to userspace has
been sanitized.
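
For reference, the resulting ordering in do_sigbus() (and analogously in
__bad_area_nosemaphore()) is sketched below, condensed from the diffs of
this patch and the previous one; the point is that the fixup only ever
sees, and therefore only ever exposes, a sanitized error code:

	sanitize_error_code(address, &error_code);

	/* On fixup, the sanitized error code is handed to userspace in RSI. */
	if (fixup_vdso_exception(regs, X86_TRAP_PF, error_code, address))
		return;

	set_signal_archinfo(address, error_code);
	/* ...signal delivery follows as before... */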

Suggested-by: Andy Lutomirski <[email protected]>
Cc: Andy Lutomirski <[email protected]>
Cc: Jarkko Sakkinen <[email protected]>
Cc: Dave Hansen <[email protected]>
Cc: Josh Triplett <[email protected]>
Signed-off-by: Sean Christopherson <[email protected]>
---
arch/x86/mm/fault.c | 7 +++++++
1 file changed, 7 insertions(+)

diff --git a/arch/x86/mm/fault.c b/arch/x86/mm/fault.c
index fefeb745d21d..c6f5f77ffabd 100644
--- a/arch/x86/mm/fault.c
+++ b/arch/x86/mm/fault.c
@@ -28,6 +28,7 @@
#include <asm/mmu_context.h> /* vma_pkey() */
#include <asm/efi.h> /* efi_recover_from_page_fault()*/
#include <asm/desc.h> /* store_idt(), ... */
+#include <asm/vdso.h> /* fixup_vdso_exception() */

#define CREATE_TRACE_POINTS
#include <asm/trace/exceptions.h>
@@ -928,6 +929,9 @@ __bad_area_nosemaphore(struct pt_regs *regs, unsigned long error_code,

sanitize_error_code(address, &error_code);

+ if (fixup_vdso_exception(regs, X86_TRAP_PF, error_code, address))
+ return;
+
if (likely(show_unhandled_signals))
show_signal_msg(regs, error_code, address, tsk);

@@ -1047,6 +1051,9 @@ do_sigbus(struct pt_regs *regs, unsigned long error_code, unsigned long address,

sanitize_error_code(address, &error_code);

+ if (fixup_vdso_exception(regs, X86_TRAP_PF, error_code, address))
+ return;
+
set_signal_archinfo(address, error_code);

#ifdef CONFIG_MEMORY_FAILURE
--
2.19.2


2018-12-14 22:00:00

by Sean Christopherson

Subject: [RFC PATCH v5 1/5] x86/vdso: Add support for exception fixup in vDSO functions

The basic concept and implementation are very similar to the kernel's
exception fixup mechanism. The key differences are that the kernel
handler is hardcoded and the fixup entry addresses are relative to
the overall table as opposed to individual entries.

Hardcoding the kernel handler avoids the need to figure out how to
get userspace code to point at a kernel function. Given that the
expected usage is to propagate information to userspace, dumping all
fault information into registers is likely the desired behavior for
the vast majority of yet-to-be-created functions. Use registers
DI, SI and DX to communicate fault information, which follows Linux's
ABI for register consumption and hopefully avoids conflict with
hardware features that might leverage the fixup capabilities, e.g.
register usage for SGX instructions was at least partially designed
with calling conventions in mind.
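
As a purely hypothetical illustration (vdso_probe_read() is an invented
name, not part of this patch), a vDSO function written in C could tag a
faulting instruction for fixup as shown below. If the tagged load faults,
the kernel resumes execution at the '2:' label with the trap number in
RDI, the error code in RSI and the fault address in RDX; otherwise RDI is
left untouched:

#include "extable.h"

/* Hypothetical vDSO helper: returns -1 on success, else the trap number. */
static long vdso_probe_read(const unsigned long *ptr)
{
	long trapnr = -1;
	unsigned long val;

	asm volatile("1: mov (%[ptr]), %[val]\n"
		     "2:\n"
		     _ASM_VDSO_EXTABLE_HANDLE(1b, 2b)
		     : [val] "=r" (val), "+D" (trapnr)
		     : [ptr] "r" (ptr)
		     : "rsi", "rdx", "cc", "memory");

	(void)val;	/* the loaded value itself is unused in this example */
	return trapnr;
}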

Making fixup addresses relative to the overall table allows the table
to be stripped from the final vDSO image (it's a kernel construct)
without complicating the offset logic, e.g. entry-relative addressing
would also need to account for the table's location relative to the
image.

Regarding stripping the table, modify vdso2c to extract the table from
the raw, a.k.a. unstripped, data and dump it as a standalone byte array
in the resulting .c file. The original base of the table, its length
and a pointer to the byte array are captured in struct vdso_image.
Alternatively, the table could be dumped directly into the struct,
but because the number of entries can vary per image, that would
require either hardcoding a max sized table into the struct definition
or defining the table as a flexible length array. The flexible length
array approach has zero benefits, e.g. the base/size are still needed,
and prevents reusing the extraction code, while hardcoding the max size
adds ongoing maintenance just to avoid exporting the explicit size.
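
For illustration, the net effect on the generated .c file is roughly the
following; the image name, sizes, offsets and bytes are placeholders, not
values from a real build:

static const unsigned char extable[16] = {
	/* raw bytes of the __ex_table section from the unstripped image */
	0x10, 0x25, 0x00, 0x00, 0x60, 0x25, 0x00, 0x00,
	0x90, 0x25, 0x00, 0x00, 0xe0, 0x25, 0x00, 0x00,
};

const struct vdso_image vdso_image_64 = {
	.data = raw_data,
	.size = 8192,
	/* ... */
	.extable_base = 4096,	/* sh_offset of __ex_table in the raw image */
	.extable_len = 16,	/* sh_size of __ex_table */
	.extable = extable,
	/* ... */
};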

The immediate use case is for Intel Software Guard Extensions (SGX).
SGX introduces a new CPL3-only "enclave" mode that runs as a sort of
black box shared object that is hosted by an untrusted "normal" CPl3
process.

Entering an enclave can only be done through SGX-specific instructions,
EENTER and ERESUME, and is a non-trivial process. Because of the
complexity of transitioning to/from an enclave, the vast majority of
enclaves are expected to utilize a library to handle the actual
transitions. This is roughly analogous to how e.g. libc implementations
are used by most applications.

Another crucial characteristic of SGX enclaves is that they can generate
exceptions as part of their normal (at least as "normal" as SGX can be)
operation that need to be handled *in* the enclave and/or are unique
to SGX.

And because they are essentially fancy shared objects, a process can
host any number of enclaves, each of which can execute multiple threads
simultaneously.

Putting everything together, userspace enclaves will utilize a library
that must be prepared to handle any and (almost) all exceptions any time
at least one thread may be executing in an enclave. Leveraging signals
to handle the enclave exceptions is unpleasant, to put it mildly, e.g.
the SGX library must constantly (un)register its signal handler based
on whether or not at least one thread is executing in an enclave, and
filter and forward exceptions that aren't related to its enclaves. This
becomes particularly nasty when using multiple levels of libraries that
register signal handlers, e.g. running an enclave via cgo inside of the
Go runtime.

Enabling exception fixup in vDSO allows the kernel to provide a vDSO
function that wraps the low-level transitions to/from the enclave, i.e.
the EENTER and ERESUME instructions. The vDSO function can intercept
exceptions that would otherwise generate a signal and return the fault
information directly to its caller, thus avoiding the need to juggle
signal handlers.

Note that unlike the kernel's _ASM_EXTABLE_HANDLE implementation, the
'C' version of _ASM_VDSO_EXTABLE_HANDLE doesn't use a pre-compiled
assembly macro. Duplicating four lines of code is simpler than adding
the necessary infrastructure to generate pre-compiled assembly and the
intended benefit of massaging GCC's inlining algorithm is unlikely to be
realized in the vDSO any time soon, if ever.

Suggested-by: Andy Lutomirski <[email protected]>
Cc: Andy Lutomirski <[email protected]>
Cc: Jarkko Sakkinen <[email protected]>
Cc: Dave Hansen <[email protected]>
Cc: Josh Triplett <[email protected]>
Signed-off-by: Sean Christopherson <[email protected]>
---
arch/x86/entry/vdso/Makefile | 4 +-
arch/x86/entry/vdso/extable.c | 37 +++++++++++++++++
arch/x86/entry/vdso/extable.h | 29 ++++++++++++++
arch/x86/entry/vdso/vdso-layout.lds.S | 9 ++++-
arch/x86/entry/vdso/vdso2c.h | 58 +++++++++++++++++++++++----
arch/x86/include/asm/vdso.h | 5 +++
6 files changed, 131 insertions(+), 11 deletions(-)
create mode 100644 arch/x86/entry/vdso/extable.c
create mode 100644 arch/x86/entry/vdso/extable.h

diff --git a/arch/x86/entry/vdso/Makefile b/arch/x86/entry/vdso/Makefile
index 0624bf2266fd..b8f7c301b88f 100644
--- a/arch/x86/entry/vdso/Makefile
+++ b/arch/x86/entry/vdso/Makefile
@@ -20,7 +20,7 @@ VDSO32-$(CONFIG_IA32_EMULATION) := y
vobjs-y := vdso-note.o vclock_gettime.o vgetcpu.o

# files to link into kernel
-obj-y += vma.o
+obj-y += vma.o extable.o
OBJECT_FILES_NON_STANDARD_vma.o := n

# vDSO images to build
@@ -115,7 +115,7 @@ $(obj)/%-x32.o: $(obj)/%.o FORCE

targets += vdsox32.lds $(vobjx32s-y)

-$(obj)/%.so: OBJCOPYFLAGS := -S
+$(obj)/%.so: OBJCOPYFLAGS := -S --remove-section __ex_table
$(obj)/%.so: $(obj)/%.so.dbg
$(call if_changed,objcopy)

diff --git a/arch/x86/entry/vdso/extable.c b/arch/x86/entry/vdso/extable.c
new file mode 100644
index 000000000000..49284d560d36
--- /dev/null
+++ b/arch/x86/entry/vdso/extable.c
@@ -0,0 +1,37 @@
+// SPDX-License-Identifier: GPL-2.0
+#include <linux/err.h>
+#include <linux/mm.h>
+#include <asm/current.h>
+#include <asm/vdso.h>
+
+struct vdso_exception_table_entry {
+ int insn, fixup;
+};
+
+bool fixup_vdso_exception(struct pt_regs *regs, int trapnr,
+ unsigned long error_code, unsigned long fault_addr)
+{
+ const struct vdso_image *image = current->mm->context.vdso_image;
+ const struct vdso_exception_table_entry *extable;
+ unsigned int nr_entries, i;
+ unsigned long base;
+
+ if (!current->mm->context.vdso)
+ return false;
+
+ base = (unsigned long)current->mm->context.vdso + image->extable_base;
+ nr_entries = image->extable_len / (sizeof(*extable));
+ extable = image->extable;
+
+ for (i = 0; i < nr_entries; i++) {
+ if (regs->ip == base + extable[i].insn) {
+ regs->ip = base + extable[i].fixup;
+ regs->di = trapnr;
+ regs->si = error_code;
+ regs->dx = fault_addr;
+ return true;
+ }
+ }
+
+ return false;
+}
diff --git a/arch/x86/entry/vdso/extable.h b/arch/x86/entry/vdso/extable.h
new file mode 100644
index 000000000000..aafdac396948
--- /dev/null
+++ b/arch/x86/entry/vdso/extable.h
@@ -0,0 +1,29 @@
+/* SPDX-License-Identifier: GPL-2.0 */
+#ifndef __VDSO_EXTABLE_H
+#define __VDSO_EXTABLE_H
+
+/*
+ * Inject exception fixup for vDSO code. Unlike normal exception fixup,
+ * vDSO uses a dedicated handler, and the fixup addresses are relative to
+ * the overall exception table, not each individual entry.
+ */
+#ifdef __ASSEMBLY__
+#define _ASM_VDSO_EXTABLE_HANDLE(from, to) \
+ ASM_VDSO_EXTABLE_HANDLE from to
+
+.macro ASM_VDSO_EXTABLE_HANDLE from:req to:req
+ .pushsection __ex_table, "a"
+ .long (\from) - __ex_table
+ .long (\to) - __ex_table
+ .popsection
+.endm
+#else
+#define _ASM_VDSO_EXTABLE_HANDLE(from, to) \
+ ".pushsection __ex_table, \"a\"\n" \
+ ".long (" #from ") - __ex_table\n" \
+ ".long (" #to ") - __ex_table\n" \
+ ".popsection\n"
+#endif
+
+#endif /* __VDSO_EXTABLE_H */
+
diff --git a/arch/x86/entry/vdso/vdso-layout.lds.S b/arch/x86/entry/vdso/vdso-layout.lds.S
index 93c6dc7812d0..8ef849064501 100644
--- a/arch/x86/entry/vdso/vdso-layout.lds.S
+++ b/arch/x86/entry/vdso/vdso-layout.lds.S
@@ -63,11 +63,18 @@ SECTIONS
* stuff that isn't used at runtime in between.
*/

- .text : { *(.text*) } :text =0x90909090,
+ .text : {
+ *(.text*)
+ *(.fixup)
+ } :text =0x90909090,
+
+

.altinstructions : { *(.altinstructions) } :text
.altinstr_replacement : { *(.altinstr_replacement) } :text

+ __ex_table : { *(__ex_table) } :text
+
/DISCARD/ : {
*(.discard)
*(.discard.*)
diff --git a/arch/x86/entry/vdso/vdso2c.h b/arch/x86/entry/vdso/vdso2c.h
index fa847a620f40..eca2f808bec3 100644
--- a/arch/x86/entry/vdso/vdso2c.h
+++ b/arch/x86/entry/vdso/vdso2c.h
@@ -5,6 +5,41 @@
* are built for 32-bit userspace.
*/

+static void BITSFUNC(copy)(FILE *outfile, const unsigned char *data, size_t len)
+{
+ size_t i;
+
+ for (i = 0; i < len; i++) {
+ if (i % 10 == 0)
+ fprintf(outfile, "\n\t");
+ fprintf(outfile, "0x%02X, ", (int)(data)[i]);
+ }
+}
+
+
+/*
+ * Extract a section from the input data into a standalone blob. Used to
+ * capture kernel-only data that needs to persist indefinitely, e.g. the
+ * exception fixup tables, but only in the kernel, i.e. the section can
+ * be stripped from the final vDSO image.
+ */
+static void BITSFUNC(extract)(const unsigned char *data, size_t data_len,
+ FILE *outfile, ELF(Shdr) *sec, const char *name)
+{
+ unsigned long offset;
+ size_t len;
+
+ offset = (unsigned long)GET_LE(&sec->sh_offset);
+ len = (size_t)GET_LE(&sec->sh_size);
+
+ if (offset + len > data_len)
+ fail("section to extract overruns input data");
+
+ fprintf(outfile, "static const unsigned char %s[%lu] = {", name, len);
+ BITSFUNC(copy)(outfile, data + offset, len);
+ fprintf(outfile, "\n};\n\n");
+}
+
static void BITSFUNC(go)(void *raw_addr, size_t raw_len,
void *stripped_addr, size_t stripped_len,
FILE *outfile, const char *name)
@@ -14,9 +49,8 @@ static void BITSFUNC(go)(void *raw_addr, size_t raw_len,
unsigned long mapping_size;
ELF(Ehdr) *hdr = (ELF(Ehdr) *)raw_addr;
int i;
- unsigned long j;
ELF(Shdr) *symtab_hdr = NULL, *strtab_hdr, *secstrings_hdr,
- *alt_sec = NULL;
+ *alt_sec = NULL, *extable_sec = NULL;
ELF(Dyn) *dyn = 0, *dyn_end = 0;
const char *secstrings;
INT_BITS syms[NSYMS] = {};
@@ -78,6 +112,8 @@ static void BITSFUNC(go)(void *raw_addr, size_t raw_len,
if (!strcmp(secstrings + GET_LE(&sh->sh_name),
".altinstructions"))
alt_sec = sh;
+ if (!strcmp(secstrings + GET_LE(&sh->sh_name), "__ex_table"))
+ extable_sec = sh;
}

if (!symtab_hdr)
@@ -149,13 +185,11 @@ static void BITSFUNC(go)(void *raw_addr, size_t raw_len,
fprintf(outfile,
"static unsigned char raw_data[%lu] __ro_after_init __aligned(PAGE_SIZE) = {",
mapping_size);
- for (j = 0; j < stripped_len; j++) {
- if (j % 10 == 0)
- fprintf(outfile, "\n\t");
- fprintf(outfile, "0x%02X, ",
- (int)((unsigned char *)stripped_addr)[j]);
- }
+ BITSFUNC(copy)(outfile, stripped_addr, stripped_len);
fprintf(outfile, "\n};\n\n");
+ if (extable_sec)
+ BITSFUNC(extract)(raw_addr, raw_len, outfile,
+ extable_sec, "extable");

fprintf(outfile, "const struct vdso_image %s = {\n", name);
fprintf(outfile, "\t.data = raw_data,\n");
@@ -166,6 +200,14 @@ static void BITSFUNC(go)(void *raw_addr, size_t raw_len,
fprintf(outfile, "\t.alt_len = %lu,\n",
(unsigned long)GET_LE(&alt_sec->sh_size));
}
+ if (extable_sec) {
+ fprintf(outfile, "\t.extable_base = %lu,\n",
+ (unsigned long)GET_LE(&extable_sec->sh_offset));
+ fprintf(outfile, "\t.extable_len = %lu,\n",
+ (unsigned long)GET_LE(&extable_sec->sh_size));
+ fprintf(outfile, "\t.extable = extable,\n");
+ }
+
for (i = 0; i < NSYMS; i++) {
if (required_syms[i].export && syms[i])
fprintf(outfile, "\t.sym_%s = %" PRIi64 ",\n",
diff --git a/arch/x86/include/asm/vdso.h b/arch/x86/include/asm/vdso.h
index 27566e57e87d..1c8a6a8f7b59 100644
--- a/arch/x86/include/asm/vdso.h
+++ b/arch/x86/include/asm/vdso.h
@@ -15,6 +15,8 @@ struct vdso_image {
unsigned long size; /* Always a multiple of PAGE_SIZE */

unsigned long alt, alt_len;
+ unsigned long extable_base, extable_len;
+ const void *extable;

long sym_vvar_start; /* Negative offset to the vvar area */

@@ -45,6 +47,9 @@ extern void __init init_vdso_image(const struct vdso_image *image);

extern int map_vdso_once(const struct vdso_image *image, unsigned long addr);

+extern bool fixup_vdso_exception(struct pt_regs *regs, int trapnr,
+ unsigned long error_code,
+ unsigned long fault_addr);
#endif /* __ASSEMBLER__ */

#endif /* _ASM_X86_VDSO_H */
--
2.19.2


2018-12-14 22:00:30

by Sean Christopherson

Subject: [RFC PATCH v5 5/5] x86/vdso: Add __vdso_sgx_enter_enclave() to wrap SGX enclave transitions

Intel Software Guard Extensions (SGX) introduces a new CPL3-only
enclave mode that runs as a sort of black box shared object that is
hosted by an untrusted normal CPL3 process.

Enclave transitions have semantics that are a lovely blend of SYSCALL,
SYSRET and VM-Exit. In a non-faulting scenario, entering and exiting
an enclave can only be done through SGX-specific instructions, EENTER
and EEXIT respectively. EENTER+EEXIT is analogous to SYSCALL+SYSRET,
e.g. EENTER/SYSCALL load RCX with the next RIP and EEXIT/SYSRET load
RIP from R{B,C}X.

But in a faulting/interrupting scenario, enclave transitions act more
like VM-Exit and VMRESUME. Maintaining the black box nature of the
enclave means that hardware must automatically switch CPU context when
an Asynchronous Exiting Event (AEE) occurs, an AEE being any interrupt
or exception (exceptions are AEEs because asynchronous in this context
is relative to the enclave and not CPU execution, e.g. the enclave
doesn't get an opportunity to save/fuzz CPU state).

Like VM-Exits, all AEEs jump to a common location, referred to as the
Asynchronous Exiting Point (AEP). The AEP is specified at enclave entry
via register passed to EENTER/ERESUME, similar to how the hypervisor
specifies the VM-Exit point (via VMCS.HOST_RIP at VMLAUNCH/VMRESUME).
Resuming the enclave/VM after the exiting event is handled is done via
ERESUME/VMRESUME respectively. In SGX, for AEEs that are handled by the
kernel, e.g. INTR, NMI and most page faults, IRET journeys back to
the AEP, which then ERESUMEs the enclave.

Enclaves also behave a bit like VMs in the sense that they can generate
exceptions as part of their normal operation that for all intents and
purposes need to be handled in the enclave/VM. However, unlike VMX, SGX
doesn't allow the host to modify its guest's, a.k.a. enclave's, state,
as doing so would circumvent the enclave's security. So to handle an
exception, the enclave must first be re-entered through the normal
EENTER flow (SYSCALL/SYSRET behavior), and then resumed via ERESUME
(VMRESUME behavior) after the source of the exception is resolved.

All of the above is just the tip of the iceberg when it comes to running
an enclave. But, SGX was designed in such a way that the host process
can utilize a library to build, launch and run an enclave. This is
roughly analogous to how e.g. libc implementations are used by most
applications so that the application can focus on its business logic.

The big gotcha is that because enclaves can generate *and* handle
exceptions, any SGX library must be prepared to handle nearly any
exception at any time (well, any time a thread is executing in an
enclave). In Linux, this means the SGX library must register a
signal handler in order to intercept relevant exceptions and forward
them to the enclave (or in some cases, take action on behalf of the
enclave). Unfortunately, Linux's signal mechanism doesn't mesh well
with libraries, e.g. signal handlers are process wide, are difficult
to chain, etc... This becomes particularly nasty when using multiple
levels of libraries that register signal handlers, e.g. running an
enclave via cgo inside of the Go runtime.

In comes vDSO to save the day. Now that vDSO can fixup exceptions,
add a function, __vdso_sgx_enter_enclave(), to wrap enclave transitions
and intercept any exceptions that occur when running the enclave.

__vdso_sgx_enter_enclave() does NOT adhere to the x86-64 ABI and instead
uses a custom calling convention. The primary motivation is to avoid
issues that arise due to asynchronous enclave exits. The x86-64 ABI
requires that EFLAGS.DF, MXCSR and FCW be preserved by the callee, and
unfortunately for the vDSO, the aforementioned registers/bits are not
restored after an asynchronous exit, e.g. EFLAGS.DF is in an unknown
state while MXCSR and FCW are reset to their init values. So the vDSO
cannot simply pass the buck by requiring enclaves to adhere to the
x86-64 ABI. That leaves three somewhat reasonable options:

1) Save/restore non-volatile GPRs, MXCSR and FCW, and clear EFLAGS.DF

+ 100% compliant with the x86-64 ABI
+ Callable from any code
+ Minimal documentation required
- Restoring MXCSR/FCW is likely unnecessary 99% of the time
- Slow

2) Save/restore non-volatile GPRs and clear EFLAGS.DF

+ Mostly compliant with the x86-64 ABI
+ Callable from any code that doesn't use SIMD registers
- Need to document deviations from x86-64 ABI, i.e. MXCSR and FCW

3) Require the caller to save/restore everything.

+ Fast
+ Userspace can pass all GPRs to the enclave (minus EAX, RBX and RCX)
- Custom ABI
- For all intents and purposes must be called from an assembly wrapper

__vdso_sgx_enter_enclave() implements option (3). The custom ABI is
mostly a documentation issue, and even that is offset by the fact that
being more similar to hardware's ENCLU[EENTER/ERESUME] ABI reduces the
amount of documentation needed for the vDSO, e.g. options (1) and (2)
would need to document which registers are marshalled to/from enclaves.
Requiring an assembly wrapper imparts minimal pain on userspace as SGX
libraries and/or applications need a healthy chunk of assembly, e.g. in
the enclave, regardless of the vDSO's implementation.
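
To make the custom convention concrete, here is a caller-side sketch;
it is an illustration only, not part of this patch, and sgx_enter_enclave()
and sgx_enter_enclave_fn are invented names. It assumes the caller has
already located __vdso_sgx_enter_enclave() in the vDSO and that the enclave
preserves RBP; a real SGX library would more likely use the hand-written
assembly wrapper mentioned above, not least because inline asm cannot
express an RBP clobber or protect the caller's red zone across the call.

#include <stdint.h>

struct sgx_enclave_exception;	/* from the uapi header added below */

typedef int (*sgx_enter_enclave_fn)(void);

static int sgx_enter_enclave(sgx_enter_enclave_fn enter_enclave,
			     uint32_t leaf, void *tcs,
			     struct sgx_enclave_exception *e)
{
	int ret;

	/* Everything except RSP (and RBP, see above) is fair game. */
	asm volatile("call *%[enter]"
		     : "=a" (ret), "+b" (tcs), "+c" (e)
		     : "0" (leaf), [enter] "m" (enter_enclave)
		     : "rdx", "rsi", "rdi", "r8", "r9", "r10", "r11",
		       "r12", "r13", "r14", "r15", "cc", "memory");
	return ret;
}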

Suggested-by: Andy Lutomirski <[email protected]>
Cc: Andy Lutomirski <[email protected]>
Cc: Jarkko Sakkinen <[email protected]>
Cc: Dave Hansen <[email protected]>
Cc: Josh Triplett <[email protected]>
Cc: Haitao Huang <[email protected]>
Cc: Jethro Beekman <[email protected]>
Cc: Dr. Greg Wettstein <[email protected]>
Signed-off-by: Sean Christopherson <[email protected]>
---
arch/x86/entry/vdso/Makefile | 2 +
arch/x86/entry/vdso/vdso.lds.S | 1 +
arch/x86/entry/vdso/vsgx_enter_enclave.S | 97 ++++++++++++++++++++++++
arch/x86/include/uapi/asm/sgx.h | 18 +++++
4 files changed, 118 insertions(+)
create mode 100644 arch/x86/entry/vdso/vsgx_enter_enclave.S

diff --git a/arch/x86/entry/vdso/Makefile b/arch/x86/entry/vdso/Makefile
index b8f7c301b88f..5e28f838d8aa 100644
--- a/arch/x86/entry/vdso/Makefile
+++ b/arch/x86/entry/vdso/Makefile
@@ -18,6 +18,7 @@ VDSO32-$(CONFIG_IA32_EMULATION) := y

# files to link into the vdso
vobjs-y := vdso-note.o vclock_gettime.o vgetcpu.o
+vobjs-$(VDSO64-y) += vsgx_enter_enclave.o

# files to link into kernel
obj-y += vma.o extable.o
@@ -85,6 +86,7 @@ CFLAGS_REMOVE_vdso-note.o = -pg
CFLAGS_REMOVE_vclock_gettime.o = -pg
CFLAGS_REMOVE_vgetcpu.o = -pg
CFLAGS_REMOVE_vvar.o = -pg
+CFLAGS_REMOVE_vsgx_enter_enclave.o = -pg

#
# X32 processes use x32 vDSO to access 64bit kernel data.
diff --git a/arch/x86/entry/vdso/vdso.lds.S b/arch/x86/entry/vdso/vdso.lds.S
index d3a2dce4cfa9..50952a995a6c 100644
--- a/arch/x86/entry/vdso/vdso.lds.S
+++ b/arch/x86/entry/vdso/vdso.lds.S
@@ -25,6 +25,7 @@ VERSION {
__vdso_getcpu;
time;
__vdso_time;
+ __vdso_sgx_enter_enclave;
local: *;
};
}
diff --git a/arch/x86/entry/vdso/vsgx_enter_enclave.S b/arch/x86/entry/vdso/vsgx_enter_enclave.S
new file mode 100644
index 000000000000..af572adcd8ed
--- /dev/null
+++ b/arch/x86/entry/vdso/vsgx_enter_enclave.S
@@ -0,0 +1,97 @@
+/* SPDX-License-Identifier: GPL-2.0 */
+
+#include <linux/linkage.h>
+#include <asm/export.h>
+#include <asm/errno.h>
+
+#include "extable.h"
+
+#define EX_LEAF 0*8
+#define EX_TRAPNR 0*8+4
+#define EX_ERROR_CODE 0*8+6
+#define EX_ADDRESS 1*8
+
+.code64
+.section .text, "ax"
+
+/**
+ * __vdso_sgx_enter_enclave() - Enter an SGX enclave
+ *
+ * %eax: ENCLU leaf, must be EENTER or ERESUME
+ * %rbx: TCS, must be non-NULL
+ * %rcx: Optional pointer to 'struct sgx_enclave_exception'
+ *
+ * Return:
+ * 0 on a clean entry/exit to/from the enclave
+ * -EINVAL if ENCLU leaf is not allowed or if TCS is NULL
+ * -EFAULT if ENCLU or the enclave faults
+ *
+ * Note that __vdso_sgx_enter_enclave() is not compliant with the x86-64 ABI.
+ * All registers except RSP must be treated as volatile from the caller's
+ * perspective, including but not limited to GPRs, EFLAGS.DF, MXCSR, FCW, etc...
+ * Conversely, the enclave being run must preserve the untrusted RSP and stack.
+ *
+ * __vdso_sgx_enter_enclave(u32 leaf, void *tcs,
+ * struct sgx_enclave_exception *exception_info)
+ * {
+ * if (leaf != SGX_EENTER && leaf != SGX_ERESUME)
+ * return -EINVAL;
+ *
+ * if (!tcs)
+ * return -EINVAL;
+ *
+ * try {
+ * ENCLU[leaf];
+ * } catch (exception) {
+ * if (e)
+ * *e = exception;
+ * return -EFAULT;
+ * }
+ *
+ * return 0;
+ * }
+ */
+ENTRY(__vdso_sgx_enter_enclave)
+ /* EENTER <= leaf <= ERESUME */
+ cmp $0x2, %eax
+ jb bad_input
+
+ cmp $0x3, %eax
+ ja bad_input
+
+ /* TCS must be non-NULL */
+ test %rbx, %rbx
+ je bad_input
+
+ /* Save @exception_info */
+ push %rcx
+
+ /* Load AEP for ENCLU */
+ lea 1f(%rip), %rcx
+1: enclu
+
+ add $0x8, %rsp
+ xor %eax, %eax
+ ret
+
+bad_input:
+ mov $(-EINVAL), %rax
+ ret
+
+.pushsection .fixup, "ax"
+ /* Re-load @exception_info and fill it (if it's non-NULL) */
+2: pop %rcx
+ test %rcx, %rcx
+ je 3f
+
+ mov %eax, EX_LEAF(%rcx)
+ mov %di, EX_TRAPNR(%rcx)
+ mov %si, EX_ERROR_CODE(%rcx)
+ mov %rdx, EX_ADDRESS(%rcx)
+3: mov $(-EFAULT), %rax
+ ret
+.popsection
+
+_ASM_VDSO_EXTABLE_HANDLE(1b, 2b)
+
+ENDPROC(__vdso_sgx_enter_enclave)
diff --git a/arch/x86/include/uapi/asm/sgx.h b/arch/x86/include/uapi/asm/sgx.h
index 266b813eefa1..099cf8025dd8 100644
--- a/arch/x86/include/uapi/asm/sgx.h
+++ b/arch/x86/include/uapi/asm/sgx.h
@@ -96,4 +96,22 @@ struct sgx_enclave_modify_pages {
__u8 op;
} __attribute__((__packed__));

+/**
+ * struct sgx_enclave_exception - structure to report exceptions encountered in
+ * __vdso_sgx_enter_enclave
+ *
+ * @leaf: ENCLU leaf from %rax at time of exception
+ * @trapnr: exception trap number, a.k.a. fault vector
+ * @error_code: exception error code
+ * @address: exception address, e.g. CR2 on a #PF
+ * @reserved: reserved for future use
+ */
+struct sgx_enclave_exception {
+ __u32 leaf;
+ __u16 trapnr;
+ __u16 error_code;
+ __u64 address;
+ __u64 reserved[2];
+};
+
#endif /* _UAPI_ASM_X86_SGX_H */
--
2.19.2


2018-12-18 04:19:36

by Jarkko Sakkinen

Subject: Re: [RFC PATCH v5 0/5] x86: Add vDSO exception fixup for SGX

On Fri, Dec 14, 2018 at 01:57:24PM -0800, Sean Christopherson wrote:
> __vdso_sgx_enter_enclave() gets another rewrite, this time to strip
> it down to the bare minimum and explicitly break compliance with the
> x86-64 ABI. Feedback from v4 revealed that __vdso_sgx_enter_enclave()
> would need to save (a lot) more than just the non-volatile GPRs to be
> compliant with the x86-64 ABI, at which point breaking from the ABI
> completely became much more palatable.
>
> The non-standard ABI also solves the question of "which GPRs should be
> marshalled to/from the enclave" by getting out of the way entirely and
> letting userspace have free reign (except for RSP, which has a big ol'
> DO NOT TOUCH sign on it).

Can you share a reference, or is this better documented in the
accompanied patches?

/Jarkko

2018-12-18 15:10:47

by Sean Christopherson

Subject: Re: [RFC PATCH v5 0/5] x86: Add vDSO exception fixup for SGX

On Tue, Dec 18, 2018 at 06:18:15AM +0200, Jarkko Sakkinen wrote:
> On Fri, Dec 14, 2018 at 01:57:24PM -0800, Sean Christopherson wrote:
> > __vdso_sgx_enter_enclave() gets another rewrite, this time to strip
> > it down to the bare minimum and explicitly break compliance with the
> > x86-64 ABI. Feedback from v4 revealed that __vdso_sgx_enter_enclave()
> > would need to save (a lot) more than just the non-volatile GPRs to be
> > compliant with the x86-64 ABI, at which point breaking from the ABI
> > completely became much more palatable.
> >
> > The non-standard ABI also solves the question of "which GPRs should be
> > marshalled to/from the enclave" by getting out of the way entirely and
> > letting userspace have free reign (except for RSP, which has a big ol'
> > DO NOT TOUCH sign on it).
>
> Can you share a reference, or is this better documented in the
> accompanied patches?

Patch 5/5 has more details, and v4 of the series has a lot of background
info by way of discussion:

https://lkml.kernel.org/r/[email protected]

2018-12-19 05:24:43

by Jarkko Sakkinen

Subject: Re: [RFC PATCH v5 0/5] x86: Add vDSO exception fixup for SGX

On Tue, Dec 18, 2018 at 07:08:15AM -0800, Sean Christopherson wrote:
> On Tue, Dec 18, 2018 at 06:18:15AM +0200, Jarkko Sakkinen wrote:
> > On Fri, Dec 14, 2018 at 01:57:24PM -0800, Sean Christopherson wrote:
> > > __vdso_sgx_enter_enclave() gets another rewrite, this time to strip
> > > it down to the bare minimum and explicitly break compliance with the
> > > x86-64 ABI. Feedback from v4 revealed that __vdso_sgx_enter_enclave()
> > > would need to save (a lot) more than just the non-volatile GPRs to be
> > > compliant with the x86-64 ABI, at which point breaking from the ABI
> > > completely became much more palatable.
> > >
> > > The non-standard ABI also solves the question of "which GPRs should be
> > > marshalled to/from the enclave" by getting out of the way entirely and
> > > letting userspace have free reign (except for RSP, which has a big ol'
> > > DO NOT TOUCH sign on it).
> >
> > Can you share a reference, or is this better documented in the
> > accompanied patches?
>
> Patch 5/5 has more details, and v4 of the series has a lot of background
> info by way of discussion:
>
> https://lkml.kernel.org/r/[email protected]

Thanks Sean, I will. Have not had yet much time to look at this.

/Jarkko

2018-12-19 05:27:11

by Jarkko Sakkinen

Subject: Re: [RFC PATCH v5 0/5] x86: Add vDSO exception fixup for SGX

On Wed, Dec 19, 2018 at 06:43:46AM +0200, Jarkko Sakkinen wrote:
> On Tue, Dec 18, 2018 at 07:08:15AM -0800, Sean Christopherson wrote:
> > On Tue, Dec 18, 2018 at 06:18:15AM +0200, Jarkko Sakkinen wrote:
> > > On Fri, Dec 14, 2018 at 01:57:24PM -0800, Sean Christopherson wrote:
> > > > __vdso_sgx_enter_enclave() gets another rewrite, this time to strip
> > > > it down to the bare minimum and explicitly break compliance with the
> > > > x86-64 ABI. Feedback from v4 revealed that __vdso_sgx_enter_enclave()
> > > > would need to save (a lot) more than just the non-volatile GPRs to be
> > > > compliant with the x86-64 ABI, at which point breaking from the ABI
> > > > completely became much more palatable.
> > > >
> > > > The non-standard ABI also solves the question of "which GPRs should be
> > > > marshalled to/from the enclave" by getting out of the way entirely and
> > > > letting userspace have free reign (except for RSP, which has a big ol'
> > > > DO NOT TOUCH sign on it).
> > >
> > > Can you share a reference, or is this better documented in the
> > > accompanied patches?
> >
> > Patch 5/5 has more details, and v4 of the series has a lot of background
> > info by way of discussion:
> >
> > https://lkml.kernel.org/r/[email protected]
>
> Thanks Sean, I will. Have not had yet much time to look at this.

Might make sense to summarize the design decisions for future
patch set versions to the cover letter at least, namely the
reasoning why not to follow the ABI.

/Jarkko

2018-12-19 08:40:11

by Jarkko Sakkinen

Subject: x86/sgx: uapi change proposal

I have pretty much figured out how to change the driver implementation
from VMA-based to file-based. Most of the code in the driver can be
reused without enormous changes. I take it as a clue that the
architecture is somewhat right that changing the driver this
radically does not seem to require any changes to the core.

Using an anon inode is the right choice because it is a more robust
interface for creating multiple enclaves.

The only remaining open question I have when it comes to implementing
this is the backing storage. From an API perspective the most robust
choice would be to revert to using a shmem file. It would then be easy to
create a complete construction flow without any dependencies on mm_struct.

I do recognize the issue with accounting, but to which process should the
backing storage be accounted anyway in this new paradigm?
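
A minimal sketch of what a shmem-backed store could look like is below;
it is an illustration only, and struct sgx_encl with its size and backing
fields is assumed from the existing driver:

#include <linux/err.h>
#include <linux/mm.h>
#include <linux/shmem_fs.h>

static int sgx_encl_alloc_backing(struct sgx_encl *encl)
{
	struct file *backing;

	/* Per-enclave backing store, no dependency on any mm_struct. */
	backing = shmem_file_setup("SGX backing", encl->size, VM_NORESERVE);
	if (IS_ERR(backing))
		return PTR_ERR(backing);

	encl->backing = backing;
	return 0;
}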

I've attached the new uapi header to this email that I'm going forward
with.

/Jarkko


Attachments:
sgx.h (2.03 kB)

2018-12-19 09:26:06

by Jethro Beekman

Subject: Re: x86/sgx: uapi change proposal

On 2018-12-19 13:28, Jarkko Sakkinen wrote:
> I have pretty much figured out how to change the driver implementation
> from VMA based to file based. Most of the code in the driver can be
> reused with not that enormous changes. I think it is a clue that the
> architecture is somewhat right because changing the driver this
> radically does not seem to require any changes to the core.

One weird thing is the departure from the normal mmap behavior that the
memory mapping persists even if the original fd is closed. (See man
mmap: "closing the file descriptor does not unmap the region.")

> Using anon inode is the right choice because it is more robust interface
> to be able to create multiple enclaves.
>
> The only remaining open that I have when it comes to implementing this
> is the backing storage. From API perspective the most robust choice
> would be to revert to use shmem file. It would be easy then to create a
> complete construction flow without any dependencies to mm_struct.
>
> I do recognize the issue with accounting but to which process the
> backing storage should be accounted anyway in this new paradigm.
>
> I've attached the new uapi header to this email that I'm going forward
> with.
>
> /Jarkko
>
> sgx.h
>
> /* SPDX-License-Identifier: (GPL-2.0 OR BSD-3-Clause) */
> /**
> * Copyright(c) 2016-18 Intel Corporation.
> */
> #ifndef _UAPI_ASM_X86_SGX_H
> #define _UAPI_ASM_X86_SGX_H
>
> #include <linux/types.h>
> #include <linux/ioctl.h>
>
> #define SGX_MAGIC 0xA4
>
> #define SGX_IOC_ENCLAVE_CREATE \
> _IOW(SGX_MAGIC, 0x00, struct sgx_enclave_create)
> #define SGX_IOC_ENCLAVE_ADD_PAGE \
> _IOW(SGX_MAGIC, 0x01, struct sgx_enclave_add_page)
> #define SGX_IOC_ENCLAVE_INIT \
> _IOW(SGX_MAGIC, 0x02, struct sgx_enclave_init)
> #define SGX_IOC_ENCLAVE_SET_ATTRIBUTE \
> _IOW(SGX_MAGIC, 0x03, struct sgx_enclave_set_attribute)
>
> /* IOCTL return values */
> #define SGX_POWER_LOST_ENCLAVE 0x40000000
>
> /**
> * struct sgx_enclave_create - parameter structure for the
> * %SGX_IOC_ENCLAVE_CREATE ioctl
> * @src: address for the SECS page data
> * @enclave_fd: file handle to the enclave address space (out)
> */
> struct sgx_enclave_create {
> __u64 src;
> __u64 enclave_fd;
> };
>
> /**
> * struct sgx_enclave_add_page - parameter structure for the
> * %SGX_IOC_ENCLAVE_ADD_PAGE ioctl
> * @eclave_fd: file handle to the enclave address space
> * @src: address for the page data
> * @secinfo: address for the SECINFO data
> * @mrmask: bitmask for the measured 256 byte chunks
> */
> struct sgx_enclave_add_page {
> __u64 enclave_fd;
> __u64 src;
> __u64 secinfo;
> __u16 mrmask;
> } __attribute__((__packed__));

Wouldn't you just pass enclave_fd as the ioctl fd parameter?

How to specify the address of the page that is being added?

>
>
> /**
> * struct sgx_enclave_init - parameter structure for the
> * %SGX_IOC_ENCLAVE_INIT ioctl
> * @eclave_fd: file handle to the enclave address space
> * @sigstruct: address for the SIGSTRUCT data
> */
> struct sgx_enclave_init {
> __u64 enclave_fd;
> __u64 sigstruct;
> }; >
> /**
> * struct sgx_enclave_set_attribute - parameter structure for the
> * %SGX_IOC_ENCLAVE_INIT ioctl
> * @addr: address within the ELRANGE

Stray parameter in docs

> * @eclave_fd: file handle to the enclave address space
> * @attribute_fd: file handle of the attribute file in the securityfs
> */
> struct sgx_enclave_set_attribute {
> __u64 enclave_fd;
> __u64 attribute_fd;
> };

What is this for?

>
> #endif /* _UAPI_ASM_X86_SGX_H */

--
Jethro Beekman | Fortanix



2018-12-19 09:30:22

by Jarkko Sakkinen

Subject: Re: x86/sgx: uapi change proposal

On Wed, Dec 19, 2018 at 08:41:12AM +0000, Jethro Beekman wrote:
> One weird thing is the departure from the normal mmap behavior that the
> memory mapping persists even if the original fd is closed. (See man mmap:
> "closing the file descriptor does not unmap the region.")

The mmapped region and enclave would be completely disjoint to start
with. The enclave driver code would assume that an enclave VMA exists
when it maps enclave address space to a process.

I.e. the VMA would no longer reference the enclave or vice versa, but
you would still create an enclave VMA with mmap().

This is IMHO very clear and well-defined semantics.

> > struct sgx_enclave_add_page {
> > __u64 enclave_fd;
> > __u64 src;
> > __u64 secinfo;
> > __u16 mrmask;
> > } __attribute__((__packed__));
>
> Wouldn't you just pass enclave_fd as the ioctl fd parameter?

I'm still planning to keep the API in the device fd and use enclave_fd
as a handle to the enclave address space. I don't see any obvious reason
to change that behavior.

And if we ever add any "global" ioctls, then we would have to define
APIs for both fds, which would become a mess.

> How to specify the address of the page that is being added?

Yes, that is correct and my bad to remove it (just quickly drafted what
I had in mind).

/Jarkko

2018-12-19 09:48:10

by Jethro Beekman

Subject: Re: x86/sgx: uapi change proposal

On 2018-12-19 14:41, Jarkko Sakkinen wrote:
> On Wed, Dec 19, 2018 at 08:41:12AM +0000, Jethro Beekman wrote:
>> One weird thing is the departure from the normal mmap behavior that the
>> memory mapping persists even if the original fd is closed. (See man mmap:
>> "closing the file descriptor does not unmap the region.")
>
> The mmapped region and enclave would be completely disjoint to start
> with. The enclave driver code would assume that an enclave VMA exists
> when it maps enclave address space to a process.
>
> I.e. VMA would no longer reference to the enclave or vice versa but
> you would still create an enclave VMA with mmap().
>
> This is IMHO very clear and well-defined semantics.
>
>>> struct sgx_enclave_add_page {
>>> __u64 enclave_fd;
>>> __u64 src;
>>> __u64 secinfo;
>>> __u16 mrmask;
>>> } __attribute__((__packed__));
>>
>> Wouldn't you just pass enclave_fd as the ioctl fd parameter?
>
> I'm still planning to keep the API in the device fd and use enclave_fd
> as handle to the enclave address space. I don't see any obvious reason
> to change that behavior.
>
> And if we ever add any "global" ioctls, then we would have to define
> APIs to both fd's, which would become a mess.
>
>> How to specify the address of the page that is being added?
>
> Yes, that is correct and my bad to remove it (just quickly drafted what
> I had in mind).

So your plan is that to call EADD, userspace has to pass the device fd
AND the enclave fd AND the enclave address? That seems a little superfluous.

--
Jethro Beekman | Fortanix



2018-12-19 11:44:00

by Jarkko Sakkinen

Subject: Re: [RFC PATCH v5 5/5] x86/vdso: Add __vdso_sgx_enter_enclave() to wrap SGX enclave transitions

On Fri, Dec 14, 2018 at 01:57:29PM -0800, Sean Christopherson wrote:
> Intel Software Guard Extensions (SGX) SGX introduces a new CPL3-only
> enclave mode that runs as a sort of black box shared object that is
> hosted by an untrusted normal CPL3 process.
>
> Enclave transitions have semantics that are a lovely blend of SYCALL,
> SYSRET and VM-Exit. In a non-faulting scenario, entering and exiting
> an enclave can only be done through SGX-specific instructions, EENTER
> and EEXIT respectively. EENTER+EEXIT is analogous to SYSCALL+SYSRET,
> e.g. EENTER/SYSCALL load RCX with the next RIP and EEXIT/SYSRET load
> RIP from R{B,C}X.
>
> But in a faulting/interrupting scenario, enclave transitions act more
> like VM-Exit and VMRESUME. Maintaining the black box nature of the
> enclave means that hardware must automatically switch CPU context when
> an Asynchronous Exiting Event (AEE) occurs, an AEE being any interrupt
> or exception (exceptions are AEEs because asynchronous in this context
> is relative to the enclave and not CPU execution, e.g. the enclave
> doesn't get an opportunity to save/fuzz CPU state).
>
> Like VM-Exits, all AEEs jump to a common location, referred to as the
> Asynchronous Exiting Point (AEP). The AEP is specified at enclave entry
> via register passed to EENTER/ERESUME, similar to how the hypervisor
> specifies the VM-Exit point (via VMCS.HOST_RIP at VMLAUNCH/VMRESUME).
> Resuming the enclave/VM after the exiting event is handled is done via
> ERESUME/VMRESUME respectively. In SGX, AEEs that are handled by the
> kernel, e.g. INTR, NMI and most page faults, IRET will journey back to
> the AEP which then ERESUMEs th enclave.
>
> Enclaves also behave a bit like VMs in the sense that they can generate
> exceptions as part of their normal operation that for all intents and
> purposes need to handled in the enclave/VM. However, unlike VMX, SGX
> doesn't allow the host to modify its guest's, a.k.a. enclave's, state,
> as doing so would circumvent the enclave's security. So to handle an
> exception, the enclave must first be re-entered through the normal
> EENTER flow (SYSCALL/SYSRET behavior), and then resumed via ERESUME
> (VMRESUME behavior) after the source of the exception is resolved.
>
> All of the above is just the tip of the iceberg when it comes to running
> an enclave. But, SGX was designed in such a way that the host process
> can utilize a library to build, launch and run an enclave. This is
> roughly analogous to how e.g. libc implementations are used by most
> applications so that the application can focus on its business logic.
>
> The big gotcha is that because enclaves can generate *and* handle
> exceptions, any SGX library must be prepared to handle nearly any
> exception at any time (well, any time a thread is executing in an
> enclave). In Linux, this means the SGX library must register a
> signal handler in order to intercept relevant exceptions and forward
> them to the enclave (or in some cases, take action on behalf of the
> enclave). Unfortunately, Linux's signal mechanism doesn't mesh well
> with libraries, e.g. signal handlers are process wide, are difficult
> to chain, etc... This becomes particularly nasty when using multiple
> levels of libraries that register signal handlers, e.g. running an
> enclave via cgo inside of the Go runtime.
>
> In comes vDSO to save the day. Now that vDSO can fixup exceptions,
> add a function, __vdso_sgx_enter_enclave(), to wrap enclave transitions
> and intercept any exceptions that occur when running the enclave.
>
> __vdso_sgx_enter_enclave() does NOT adhere to the x86-64 ABI and instead
> uses a custom calling convention. The primary motivation is to avoid
> issues that arise due to asynchronous enclave exits. The x86-64 ABI
> requires that EFLAGS.DF, MXCSR and FCW be preserved by the callee, and
> unfortunately for the vDSO, the aformentioned registers/bits are not
> restored after an asynchronous exit, e.g. EFLAGS.DF is in an unknown
> state while MXCSR and FCW are reset to their init values. So the vDSO
> cannot simply pass the buck by requiring enclaves to adhere to the
> x86-64 ABI. That leaves three somewhat reasonable options:
>
> 1) Save/restore non-volatile GPRs, MXCSR and FCW, and clear EFLAGS.DF
>
> + 100% compliant with the x86-64 ABI
> + Callable from any code
> + Minimal documentation required
> - Restoring MXCSR/FCW is likely unnecessary 99% of the time
> - Slow
>
> 2) Save/restore non-volatile GPRs and clear EFLAGS.DF
>
> + Mostly compliant with the x86-64 ABI
> + Callable from any code that doesn't use SIMD registers
> - Need to document deviations from x86-64 ABI, i.e. MXCSR and FCW
>
> 3) Require the caller to save/restore everything.
>
> + Fast
> + Userspace can pass all GPRs to the enclave (minus EAX, RBX and RCX)
> - Custom ABI
> - For all intents and purposes must be called from an assembly wrapper
>
> __vdso_sgx_enter_enclave() implements option (3). The custom ABI is
> mostly a documentation issue, and even that is offset by the fact that
> being more similar to hardware's ENCLU[EENTER/ERESUME] ABI reduces the
> amount of documentation needed for the vDSO, e.g. options (2) and (3)
> would need to document which registers are marshalled to/from enclaves.
> Requiring an assembly wrapper imparts minimal pain on userspace as SGX
> libraries and/or applications need a healthy chunk of assembly, e.g. in
> the enclave, regardless of the vDSO's implementation.
>
> Suggested-by: Andy Lutomirski <[email protected]>
> Cc: Andy Lutomirski <[email protected]>
> Cc: Jarkko Sakkinen <[email protected]>
> Cc: Dave Hansen <[email protected]>
> Cc: Josh Triplett <[email protected]>
> Cc: Haitao Huang <[email protected]>
> Cc: Jethro Beekman <[email protected]>
> Cc: Dr. Greg Wettstein <[email protected]>
> Signed-off-by: Sean Christopherson <[email protected]>

Looks good to me, but without testing it is too early for a reviewed-by.
This is a fairly easy patch to give one to because all the details are in
the underlying patches.

I think I will test the patch set as soon as I'm done with the new API
changes for v19, i.e. make an updated version of my smoke test program
that uses this vDSO and the new enclave ioctl API. If that works,
I'll give this patch a tested-by at that point.

/Jarkko

2018-12-19 14:18:23

by Jarkko Sakkinen

Subject: Re: x86/sgx: uapi change proposal

On Wed, Dec 19, 2018 at 09:36:16AM +0000, Jethro Beekman wrote:
> On 2018-12-19 14:41, Jarkko Sakkinen wrote:
> > On Wed, Dec 19, 2018 at 08:41:12AM +0000, Jethro Beekman wrote:
> > > One weird thing is the departure from the normal mmap behavior that the
> > > memory mapping persists even if the original fd is closed. (See man mmap:
> > > "closing the file descriptor does not unmap the region.")
> >
> > The mmapped region and enclave would be completely disjoint to start
> > with. The enclave driver code would assume that an enclave VMA exists
> > when it maps enclave address space to a process.
> >
> > I.e. VMA would no longer reference to the enclave or vice versa but
> > you would still create an enclave VMA with mmap().
> >
> > This is IMHO very clear and well-defined semantics.
> >
> > > > struct sgx_enclave_add_page {
> > > > __u64 enclave_fd;
> > > > __u64 src;
> > > > __u64 secinfo;
> > > > __u16 mrmask;
> > > > } __attribute__((__packed__));
> > >
> > > Wouldn't you just pass enclave_fd as the ioctl fd parameter?
> >
> > I'm still planning to keep the API in the device fd and use enclave_fd
> > as handle to the enclave address space. I don't see any obvious reason
> > to change that behavior.
> >
> > And if we ever add any "global" ioctls, then we would have to define
> > APIs to both fd's, which would become a mess.
> >
> > > How to specify the address of the page that is being added?
> >
> > Yes, that is correct and my bad to remove it (just quickly drafted what
> > I had in mind).
>
> So your plan is that to call EADD, userspace has to pass the device fd AND
> the enclave fd AND the enclave address? That seems a little superfluous.

If the enclave fd had ioctls to add pages etc., two ioctl APIs
would already be needed (create on the device fd and the rest on the
enclave fd). Which one would be more superfluous?

/Jarkko

2018-12-19 15:58:08

by Sean Christopherson

Subject: Re: x86/sgx: uapi change proposal

On Wed, Dec 19, 2018 at 09:36:16AM +0000, Jethro Beekman wrote:
> On 2018-12-19 14:41, Jarkko Sakkinen wrote:
> >On Wed, Dec 19, 2018 at 08:41:12AM +0000, Jethro Beekman wrote:
> >>One weird thing is the departure from the normal mmap behavior that the
> >>memory mapping persists even if the original fd is closed. (See man mmap:
> >>"closing the file descriptor does not unmap the region.")

It won't (be a departure). mmap() on a file grabs a reference to the
file, i.e. each VMA keeps a reference to the file. Closing the original
enclave fd will only put its reference to the file/enclave, not destroy
it outright.
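
As a sketch of that semantic with the proposed enclave fd (enclave_fd and
size assumed to exist, illustration only):

	void *addr = mmap(NULL, size, PROT_READ | PROT_WRITE | PROT_EXEC,
			  MAP_SHARED, enclave_fd, 0);

	/*
	 * The VMA holds its own reference to the enclave's file, so the
	 * mapping (and the enclave) stays alive after this close(); only
	 * dropping the last reference tears the enclave down.
	 */
	close(enclave_fd);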

> >
> >The mmapped region and enclave would be completely disjoint to start
> >with. The enclave driver code would assume that an enclave VMA exists
> >when it maps enclave address space to a process.
> >
> >I.e. VMA would no longer reference to the enclave or vice versa but
> >you would still create an enclave VMA with mmap().
> >
> >This is IMHO very clear and well-defined semantics.
> >
> >>>struct sgx_enclave_add_page {
> >>> __u64 enclave_fd;
> >>> __u64 src;
> >>> __u64 secinfo;
> >>> __u16 mrmask;
> >>>} __attribute__((__packed__));
> >>
> >>Wouldn't you just pass enclave_fd as the ioctl fd parameter?
> >
> >I'm still planning to keep the API in the device fd and use enclave_fd
> >as handle to the enclave address space. I don't see any obvious reason
> >to change that behavior.
> >
> >And if we ever add any "global" ioctls, then we would have to define
> >APIs to both fd's, which would become a mess.
> >
> >>How to specify the address of the page that is being added?
> >
> >Yes, that is correct and my bad to remove it (just quickly drafted what
> >I had in mind).
>
> So your plan is that to call EADD, userspace has to pass the device fd AND
> the enclave fd AND the enclave address? That seems a little superfluous.

I agree with Jethro, passing the enclave_fd as a param is obnoxious.
And it means the user needs to open /dev/sgx to do anything with an
enclave fd, e.g. the enclave fd might be passed to a builder thread,
which shouldn't also need the device fd.

E.g.:

sgx_fd = open("/dev/sgx", O_RDWR);
BUG_ON(sgx_fd < 0);

enclave_fd = ioctl(sgx_fd, SGX_ENCLAVE_CREATE, &ecreate);
BUG_ON(enclave_fd < 0);

ret = ioctl(enclave_fd, SGX_ENCLAVE_ADD_PAGE, &eadd);
BUG_ON(ret);

...

ret = ioctl(enclave_fd, SGX_ENCLAVE_INIT, &einit);
BUG_ON(ret);

...

close(enclave_fd);
close(sgx_fd);


Take a look at virt/kvm/kvm_main.c to see how KVM manages anon inodes
and ioctls for VMs and vCPUs.
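
A rough kernel-side sketch of that KVM-like flow, purely for illustration:
SGX_ENCLAVE_CREATE on /dev/sgx allocates an enclave and returns an
anon-inode-backed enclave fd (returning the fd directly rather than via the
out-field in the proposed header), and all further ioctls target that fd.
sgx_encl_alloc(), sgx_encl_release() and sgx_encl_fops are hypothetical
names.

#include <linux/anon_inodes.h>
#include <linux/err.h>
#include <linux/fs.h>
#include <linux/uaccess.h>

static long sgx_ioc_enclave_create(struct file *filep, void __user *arg)
{
	struct sgx_enclave_create params;
	struct sgx_encl *encl;
	int fd;

	if (copy_from_user(&params, arg, sizeof(params)))
		return -EFAULT;

	encl = sgx_encl_alloc(params.src);		/* hypothetical helper */
	if (IS_ERR(encl))
		return PTR_ERR(encl);

	fd = anon_inode_getfd("sgx-enclave", &sgx_encl_fops, encl, O_RDWR);
	if (fd < 0)
		sgx_encl_release(encl);			/* hypothetical helper */

	return fd;
}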

2018-12-19 15:59:23

by Dr. Greg

Subject: Re: x86/sgx: uapi change proposal

On Wed, Dec 19, 2018 at 08:41:12AM +0000, Jethro Beekman wrote:

Good morning, I hope everyone is weathering the pre-holiday season well.

> On 2018-12-19 13:28, Jarkko Sakkinen wrote:
> > * @eclave_fd: file handle to the enclave address space
> > * @attribute_fd: file handle of the attribute file in the securityfs
> > */
> >struct sgx_enclave_set_attribute {
> > __u64 enclave_fd;
> > __u64 attribute_fd;
> >};

> What is this for?

I believe it is a silent response to the issues we were prosecuting
4-5 weeks ago, regarding the requirement for an SGX driver on an FLC
hardware platform to have some semblance of policy management to be
relevant from a security/privacy perspective. It would have certainly
been collegial to include a reference to our discussions and concerns
in the changelog.

See 364f68f5a3c in Jarkko's next/master.

The changeset addresses enclave access to the PROVISION key but is
still insufficient to deliver guarantees that are consistent with the
SGX security model. In order to achieve that, policy management needs
to embrace the use of MRSIGNER values, which is what our SFLC patchset
uses.

The noted changeset actually implements most of the 'kernel bloat'
that our SFLC patchset needs to bolt onto.

As of yesterday afternoon next/master still won't initialize a
non-trivial enclave. Since there now appears to be a wholesale change
in the driver architecture and UAPI we are sitting on the sidelines
waiting for an indication all of that has some hope of working before
we introduce our approach.

Part of SFLC won't be popular but it is driven by clients who are
actually paying for SGX security engineering and architectures.

> Jethro Beekman | Fortanix

Best wishes for a pleasant holiday season to everyone.

Dr. Greg

As always,
Dr. G.W. Wettstein, Ph.D. Enjellic Systems Development, LLC.
4206 N. 19th Ave. Specializing in information infra-structure
Fargo, ND 58102 development.
PH: 701-281-1686
FAX: 701-281-3949 EMAIL: [email protected]
------------------------------------------------------------------------------
"Politics is the business of getting power and privilege without possessing
merit."
-- P.J. O'Rourke

2018-12-20 06:19:05

by Andy Lutomirski

[permalink] [raw]
Subject: Re: x86/sgx: uapi change proposal

> On Dec 19, 2018, at 6:45 AM, Sean Christopherson <[email protected]> wrote:
>
>> On Wed, Dec 19, 2018 at 09:36:16AM +0000, Jethro Beekman wrote:

> I agree with Jethro, passing the enclave_fd as a param is obnoxious.
> And it means the user needs to open /dev/sgx to do anything with an
> enclave fd, e.g. the enclave fd might be passed to a builder thread,
> it shouldn't also need the device fd.
>
> E.g.:
>
> sgx_fd = open("/dev/sgx", O_RDWR);
> BUG_ON(sgx_fd < 0);
>
> enclave_fd = ioctl(sgx_fd, SGX_ENCLAVE_CREATE, &ecreate);
> BUG_ON(enclave_fd < 0);
>
> ret = ioctl(enclave_fd, SGX_ENCLAVE_ADD_PAGE, &eadd);
> BUG_ON(ret);
>
> ...
>
> ret = ioctl(enclave_fd, SGX_ENCLAVE_INIT, &einit);
> BUG_ON(ret);
>
> ...
>
> close(enclave_fd);
> close(sgx_fd);
>
>
> Take a look at virt/kvm/kvm_main.c to see how KVM manages anon inodes
> and ioctls for VMs and vCPUs.

Can one of you explain why SGX_ENCLAVE_CREATE is better than just
opening a new instance of /dev/sgx for each enclave?

2018-12-20 14:57:11

by Jarkko Sakkinen

[permalink] [raw]
Subject: Re: x86/sgx: uapi change proposal

On Wed, Dec 19, 2018 at 06:45:15AM -0800, Sean Christopherson wrote:
> I agree with Jethro, passing the enclave_fd as a param is obnoxious.
> And it means the user needs to open /dev/sgx to do anything with an
> enclave fd, e.g. the enclave fd might be passed to a builder thread,

Please note that this is not really a thing that I care that much about
at the end of the day, because either approach is straightforward to
implement. That is why I asked Jethro which one is more superfluous.

> Take a look at virt/kvm/kvm_main.c to see how KVM manages anon inodes
> and ioctls for VMs and vCPUs.

I actually grabbed the anon inode code from the in-kernel LE code and
started to transform it to this framework just because I was familiar
with that snippet (because I wrote it), but yeah, the idea is similar to
what's there.

/Jarkko

2018-12-20 14:59:56

by Jarkko Sakkinen

[permalink] [raw]
Subject: Re: x86/sgx: uapi change proposal

On Wed, Dec 19, 2018 at 08:43:43AM -0600, Dr. Greg wrote:
> I believe it is a silent response to the issues we were prosecuting
> 4-5 weeks ago, regarding the requirement for an SGX driver on an FLC
> hardware platform to have some semblance of policy management to be
> relevant from a security/privacy perspective. It would have certainly
> been collegial to include a reference to our discussions and concerns
> in the changelog.
>
> See 364f68f5a3c in Jarkko's next/master.
>
> The changeset addresses enclave access to the PROVISION key but is
> still insufficient to deliver guarantees that are consistent with the
> SGX security model. In order to achieve that, policy management needs
> to embrace the use of MRSIGNER values, which is what our SFLC patchset
> uses.
>
> The noted changeset actually implements most of the 'kernel bloat'
> that our SFLC patchset needs to bolt onto.
>
> As of yesterday afternoon next/master still won't initialize a
> non-trivial enclave. Since there now appears to be a wholesale change
> in the driver architecture and UAPI we are sitting on the sidelines
> waiting for an indication all of that has some hope of working before
> we introduce our approach.
>
> Part of SFLC won't be popular but it is driven by clients who are
> actually paying for SGX security engineering and architectures.

How many of these people are actually posting here?

/Jarkko

2018-12-20 15:00:35

by Jarkko Sakkinen

[permalink] [raw]
Subject: Re: x86/sgx: uapi change proposal

On Wed, Dec 19, 2018 at 06:58:48PM -0800, Andy Lutomirski wrote:
> Can one of you explain why SGX_ENCLAVE_CREATE is better than just
> opening a new instance of /dev/sgx for each encalve?

I think that fits better with the SCM_RIGHTS scenario, i.e. you could
send the enclave to a process that does not necessarily have rights to
/dev/sgx. That gives a more robust environment for configuring SGX.
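
For reference, passing an fd with SCM_RIGHTS is plain POSIX and needs no
SGX support. A minimal sketch, assuming 'sock' is an already connected
AF_UNIX socket:

static int send_enclave_fd(int sock, int enclave_fd)
{
        char dummy = 0;
        struct iovec iov = { .iov_base = &dummy, .iov_len = 1 };
        union {
                struct cmsghdr hdr;
                char buf[CMSG_SPACE(sizeof(int))];
        } u = {};
        struct msghdr msg = {
                .msg_iov = &iov,
                .msg_iovlen = 1,
                .msg_control = u.buf,
                .msg_controllen = sizeof(u.buf),
        };
        struct cmsghdr *cmsg = CMSG_FIRSTHDR(&msg);

        cmsg->cmsg_level = SOL_SOCKET;
        cmsg->cmsg_type = SCM_RIGHTS;
        cmsg->cmsg_len = CMSG_LEN(sizeof(int));
        memcpy(CMSG_DATA(cmsg), &enclave_fd, sizeof(int));

        /* The receiver ends up with its own fd for the same enclave. */
        return sendmsg(sock, &msg, 0) < 0 ? -1 : 0;
}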

/Jarkko

2018-12-20 15:36:38

by Arnd Bergmann

[permalink] [raw]
Subject: Re: x86/sgx: uapi change proposal

On Wed, Dec 19, 2018 at 10:26 AM Jethro Beekman <[email protected]> wrote:
>
> On 2018-12-19 13:28, Jarkko Sakkinen wrote:
> > /**
> > * struct sgx_enclave_add_page - parameter structure for the
> > * %SGX_IOC_ENCLAVE_ADD_PAGE ioctl
> > * @eclave_fd: file handle to the enclave address space
> > * @src: address for the page data
> > * @secinfo: address for the SECINFO data
> > * @mrmask: bitmask for the measured 256 byte chunks
> > */
> > struct sgx_enclave_add_page {
> > __u64 enclave_fd;
> > __u64 src;
> > __u64 secinfo;
> > __u16 mrmask;
> > } __attribute__((__packed__));
>
> Wouldn't you just pass enclave_fd as the ioctl fd parameter?
>
> How to specify the address of the page that is being added?

One more comment about the structure: I would generally recommend
against packing structures like this. Instead just extend the mrmask
member to 64 bits as well.
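
I.e., a minimal sketch of the same structure with every member naturally
aligned, so the packed attribute can simply be dropped:

struct sgx_enclave_add_page {
        __u64 enclave_fd;
        __u64 src;
        __u64 secinfo;
        __u64 mrmask;   /* widened from __u16: no padding, no packing needed */
};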

Arnd

2018-12-20 16:04:31

by Jarkko Sakkinen

[permalink] [raw]
Subject: Re: x86/sgx: uapi change proposal

On Thu, Dec 20, 2018 at 01:08:46PM +0100, Arnd Bergmann wrote:
> On Wed, Dec 19, 2018 at 10:26 AM Jethro Beekman <[email protected]> wrote:
> >
> > On 2018-12-19 13:28, Jarkko Sakkinen wrote:
> > > /**
> > > * struct sgx_enclave_add_page - parameter structure for the
> > > * %SGX_IOC_ENCLAVE_ADD_PAGE ioctl
> > > * @eclave_fd: file handle to the enclave address space
> > > * @src: address for the page data
> > > * @secinfo: address for the SECINFO data
> > > * @mrmask: bitmask for the measured 256 byte chunks
> > > */
> > > struct sgx_enclave_add_page {
> > > __u64 enclave_fd;
> > > __u64 src;
> > > __u64 secinfo;
> > > __u16 mrmask;
> > > } __attribute__((__packed__));
> >
> > Wouldn't you just pass enclave_fd as the ioctl fd parameter?
> >
> > How to specify the address of the page that is being added?
>
> One more comment about the structure: I would generally recommend
> against packing structures like this. Instead just extend the mrmask
> member to 64 bits as well.

Can do. Thanks for the remark.

/Jarkko

2018-12-20 16:09:01

by Jarkko Sakkinen

[permalink] [raw]
Subject: Re: x86/sgx: uapi change proposal

On Thu, Dec 20, 2018 at 12:32:04PM +0200, Jarkko Sakkinen wrote:
> On Wed, Dec 19, 2018 at 06:58:48PM -0800, Andy Lutomirski wrote:
> > Can one of you explain why SGX_ENCLAVE_CREATE is better than just
> > opening a new instance of /dev/sgx for each encalve?
>
> I think that fits better to the SCM_RIGHTS scenario i.e. you could send
> the enclav to a process that does not have necessarily have rights to
> /dev/sgx. Gives more robust environment to configure SGX.

My only open question for the implementation is where to swap? If it is
a VMA, whose VMA?

Please share your views here. Not a blocker for me to work on the
implementation, though. I'll use a private shmem file up until there
is a better option.

This ioctl API discussion is kind of meaningless for me ATM because it
does not have that much effect on the internals, even if it isn't
perfect in v19. Very trivial to change.

/Jarkko

2018-12-20 16:09:52

by Jarkko Sakkinen

[permalink] [raw]
Subject: Re: x86/sgx: uapi change proposal

On Thu, Dec 20, 2018 at 03:12:13PM +0200, Jarkko Sakkinen wrote:
> On Thu, Dec 20, 2018 at 12:32:04PM +0200, Jarkko Sakkinen wrote:
> > On Wed, Dec 19, 2018 at 06:58:48PM -0800, Andy Lutomirski wrote:
> > > Can one of you explain why SGX_ENCLAVE_CREATE is better than just
> > > opening a new instance of /dev/sgx for each encalve?
> >
> > I think that fits better to the SCM_RIGHTS scenario i.e. you could send
> > the enclav to a process that does not have necessarily have rights to
> > /dev/sgx. Gives more robust environment to configure SGX.
>
> My only open for the implementation is where to swap? If it is a VMA,
> whose VMA?
>
> Please share your views here. Not a blocker for me to work on the
> implementation, though. I'll use a private shmem file up until there
> is a better option.
>
> This ioctl API discussion is kind of meaningless for me ATM because it
> does not have that much effect to the internals even if it wouldn't be
> perfect in v19. Very trival to change.

Oops, and after sending I realized that I started this thread asking for
comments about the API (I think I mentioned swapping too) :-) The
feedback has been valuable and I gained the required understanding about
enclave_fd, but I think the discussion has now saturated into minor
details.

Appreciate all the feedback so far. Sorry for the somewhat harsh
statement.

/Jarkko

2018-12-21 07:36:38

by Dr. Greg

[permalink] [raw]
Subject: Re: x86/sgx: uapi change proposal

On Thu, Dec 20, 2018 at 12:34:00PM +0200, Jarkko Sakkinen wrote:

Good afternoon to everyone.

> On Wed, Dec 19, 2018 at 08:43:43AM -0600, Dr. Greg wrote:
> > I believe it is a silent response to the issues we were
> > prosecuting 4-5 weeks ago, regarding the requirement for an SGX
> > driver on an FLC hardware platform to have some semblance of
> > policy management to be relevant from a security/privacy
> > perspective. It would have certainly been collegial to include a
> > reference to our discussions and concerns in the changelog.
> >
> > See 364f68f5a3c in Jarkko's next/master.
> >
> > The changeset addresses enclave access to the PROVISION key but is
> > still insufficient to deliver guarantees that are consistent with
> > the SGX security model. In order to achieve that, policy
> > management needs to embrace the use of MRSIGNER values, which is
> > what our SFLC patchset uses.
> >
> > The noted changeset actually implements most of the 'kernel bloat'
> > that our SFLC patchset needs to bolt onto.
> >
> > As of yesterday afternoon next/master still won't initialize a
> > non-trivial enclave. Since there now appears to be a wholesale
> > change in the driver architecture and UAPI we are sitting on the
> > sidelines waiting for an indication all of that has some hope of
> > working before we introduce our approach.
> >
> > Part of SFLC won't be popular but it is driven by clients who are
> > actually paying for SGX security engineering and architectures.

> How many of these people are actually posting here?

None that I know of.

The individuals I was referring to are CISO's and security risk
managers of multi-billion dollar corporations and/or 3-letter
entities. It has been my own personal observation that they don't
have time to post to the Linux Kernel Mailing List.

The time they do spend on this technology seems to involve sitting in
meetings and making decisions on whether or not to authorize capital
expenditure budgets for Intel processors and chipsets, based on
whether or not an SGX security stack can definably implement the
security controls that are being imposed on their organizations by the
government and/or their liability carriers.

Such issues may be out of mainstream kernel concerns but hopefully not
conceptually elusive with respect to their implications.

> /Jarkko

Merry Christmas and happy holidays to everyone.

Dr. Greg

As always,
Dr. G.W. Wettstein, Ph.D. Enjellic Systems Development, LLC.
4206 N. 19th Ave. Specializing in information infra-structure
Fargo, ND 58102 development.
PH: 701-281-1686
FAX: 701-281-3949 EMAIL: [email protected]
------------------------------------------------------------------------------
"Don't talk unless you can improve the silence."
-- George Will

2018-12-21 22:31:15

by Jarkko Sakkinen

[permalink] [raw]
Subject: Re: x86/sgx: uapi change proposal

On Thu, Dec 20, 2018 at 04:06:38PM -0600, Dr. Greg wrote:
> On Thu, Dec 20, 2018 at 12:34:00PM +0200, Jarkko Sakkinen wrote:
>
> Good afternoon to everyone.
>
> > On Wed, Dec 19, 2018 at 08:43:43AM -0600, Dr. Greg wrote:
> > > I believe it is a silent response to the issues we were
> > > prosecuting 4-5 weeks ago, regarding the requirement for an SGX
> > > driver on an FLC hardware platform to have some semblance of
> > > policy management to be relevant from a security/privacy
> > > perspective. It would have certainly been collegial to include a
> > > reference to our discussions and concerns in the changelog.
> > >
> > > See 364f68f5a3c in Jarkko's next/master.
> > >
> > > The changeset addresses enclave access to the PROVISION key but is
> > > still insufficient to deliver guarantees that are consistent with
> > > the SGX security model. In order to achieve that, policy
> > > management needs to embrace the use of MRSIGNER values, which is
> > > what our SFLC patchset uses.
> > >
> > > The noted changeset actually implements most of the 'kernel bloat'
> > > that our SFLC patchset needs to bolt onto.
> > >
> > > As of yesterday afternoon next/master still won't initialize a
> > > non-trivial enclave. Since there now appears to be a wholesale
> > > change in the driver architecture and UAPI we are sitting on the
> > > sidelines waiting for an indication all of that has some hope of
> > > working before we introduce our approach.
> > >
> > > Part of SFLC won't be popular but it is driven by clients who are
> > > actually paying for SGX security engineering and architectures.
>
> > How many of these people are actually posting here?
>
> None that I know of.
>
> The individuals I was referring to are CISO's and security risk
> managers of multi-billion dollar corporations and/or 3-letter
> entities. It has been my own personal observation that they don't
> have time to post to the Linux Kernel Mailing List.
>
> The time they do spend on this technology seems to involve sitting in
> meetings and making decisions on whether or not to authorize capital
> expenditure budgets for Intel processors and chipsets, based on
> whether or not an SGX security stack can definably implement the
> security controls that are being imposed on their organizations by the
> government and/or their liability carriers.
>
> Such issues may be out of mainstream kernel concerns but hopefully not
> conceptually elusive with respect to their implications.

Well, the process is what it is anyway. I'm sending v18 and you are free
to comment on the code change associated with the provisioning.

/Jarkko

2018-12-22 17:04:10

by Andy Lutomirski

[permalink] [raw]
Subject: Re: x86/sgx: uapi change proposal

> On Dec 21, 2018, at 9:28 AM, Sean Christopherson <[email protected]> wrote:
>
> On Wed, Dec 19, 2018 at 06:58:48PM -0800, Andy Lutomirski wrote:
>>> On Dec 19, 2018, at 6:45 AM, Sean Christopherson <[email protected]> wrote:
>>>
>>>>> On Wed, Dec 19, 2018 at 09:36:16AM +0000, Jethro Beekman wrote:
>>>
>>> I agree with Jethro, passing the enclave_fd as a param is obnoxious.
>>> And it means the user needs to open /dev/sgx to do anything with an
>>> enclave fd, e.g. the enclave fd might be passed to a builder thread,
>>> it shouldn't also need the device fd.
>>>
>>> E.g.:
>>>
>>> sgx_fd = open("/dev/sgx", O_RDWR);
>>> BUG_ON(sgx_fd < 0);
>>>
>>> enclave_fd = ioctl(sgx_fd, SGX_ENCLAVE_CREATE, &ecreate);
>>> BUG_ON(enclave_fd < 0);
>>>
>>> ret = ioctl(enclave_fd, SGX_ENCLAVE_ADD_PAGE, &eadd);
>>> BUG_ON(ret);
>>>
>>> ...
>>>
>>> ret = ioctl(enclave_fd, SGX_ENCLAVE_INIT, &einit);
>>> BUG_ON(ret);
>>>
>>> ...
>>>
>>> close(enclave_fd);
>>> close(sgx_fd);
>>>
>>>
>>> Take a look at virt/kvm/kvm_main.c to see how KVM manages anon inodes
>>> and ioctls for VMs and vCPUs.
>>
>> Can one of you explain why SGX_ENCLAVE_CREATE is better than just
>> opening a new instance of /dev/sgx for each encalve?
>
> Directly associating /dev/sgx with an enclave means /dev/sgx can't be
> used to provide ioctl()'s for other SGX-related needs, e.g. to mmap()
> raw EPC and expose it a VM. Proposed layout in the link below. I'll
> also respond to Jarkko's question about exposing EPC through /dev/sgx
> instead of having KVM allocate it on behalf of the VM.

Hmm. I guess this makes some sense. My instinct would be to do it a
little differently and have:

/dev/sgx/enclave: Each instance is an enclave.

/dev/sgx/epc: Used to get raw EPC for KVM. Might have different
permissions, perhaps 0660 and group kvm.

/dev/sgx/something_else: For when SGX v3 adds something else :)
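
With that layout, the earlier example flow would collapse to something
roughly like this (sketch only, ioctl names as in the earlier example):

enclave_fd = open("/dev/sgx/enclave", O_RDWR);  /* each open() is a new enclave */
BUG_ON(enclave_fd < 0);

ret = ioctl(enclave_fd, SGX_ENCLAVE_CREATE, &ecreate);  /* binds the SECS to this fd */
BUG_ON(ret);

ret = ioctl(enclave_fd, SGX_ENCLAVE_ADD_PAGE, &eadd);
BUG_ON(ret);

ret = ioctl(enclave_fd, SGX_ENCLAVE_INIT, &einit);
BUG_ON(ret);

close(enclave_fd);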

2018-12-22 17:24:35

by Sean Christopherson

[permalink] [raw]
Subject: Re: x86/sgx: uapi change proposal

On Fri, Dec 21, 2018 at 09:12:46AM -0800, Andy Lutomirski wrote:
> > On Dec 21, 2018, at 9:28 AM, Sean Christopherson <[email protected]> wrote:
> >
> > On Wed, Dec 19, 2018 at 06:58:48PM -0800, Andy Lutomirski wrote:
> >>> On Dec 19, 2018, at 6:45 AM, Sean Christopherson <[email protected]> wrote:
> >>>
> >>>>> On Wed, Dec 19, 2018 at 09:36:16AM +0000, Jethro Beekman wrote:
> >>>
> >>> I agree with Jethro, passing the enclave_fd as a param is obnoxious.
> >>> And it means the user needs to open /dev/sgx to do anything with an
> >>> enclave fd, e.g. the enclave fd might be passed to a builder thread,
> >>> it shouldn't also need the device fd.
> >>>
> >>> E.g.:
> >>>
> >>> sgx_fd = open("/dev/sgx", O_RDWR);
> >>> BUG_ON(sgx_fd < 0);
> >>>
> >>> enclave_fd = ioctl(sgx_fd, SGX_ENCLAVE_CREATE, &ecreate);
> >>> BUG_ON(enclave_fd < 0);
> >>>
> >>> ret = ioctl(enclave_fd, SGX_ENCLAVE_ADD_PAGE, &eadd);
> >>> BUG_ON(ret);
> >>>
> >>> ...
> >>>
> >>> ret = ioctl(enclave_fd, SGX_ENCLAVE_INIT, &einit);
> >>> BUG_ON(ret);
> >>>
> >>> ...
> >>>
> >>> close(enclave_fd);
> >>> close(sgx_fd);
> >>>
> >>>
> >>> Take a look at virt/kvm/kvm_main.c to see how KVM manages anon inodes
> >>> and ioctls for VMs and vCPUs.
> >>
> >> Can one of you explain why SGX_ENCLAVE_CREATE is better than just
> >> opening a new instance of /dev/sgx for each encalve?
> >
> > Directly associating /dev/sgx with an enclave means /dev/sgx can't be
> > used to provide ioctl()'s for other SGX-related needs, e.g. to mmap()
> > raw EPC and expose it a VM. Proposed layout in the link below. I'll
> > also respond to Jarkko's question about exposing EPC through /dev/sgx
> > instead of having KVM allocate it on behalf of the VM.
>
> Hmm. I guess this makes some sense. My instinct would be to do it a
> little differently and have:
>
> /dev/sgx/enclave: Each instance is an enclave.
>
> /dev/sgx/epc: Used to get raw EPC for KVM. Might have different
> permissions, perhaps 0660 and group kvm.
>
> /dev/sgx/something_else: For when SGX v3 adds something else :)

Mmmm, I like this approach a lot. It would allow userspace to easily
manage permissions for each "feature", e.g. give all users access to
/dev/sgx/epc but restrict /dev/sgx/enclave.

And we could add e.g. /dev/sgx/admin if we wanted to expose ioctls()
that apply to all aspects of SGX.

Do you know if /dev/sgx/epc could be dynamically created, e.g. by
KVM when the kvm_intel module is loaded? That would seal the deal for
me as it'd keep open the option of having KVM handle oversubscription
of guest EPC while exposing EPC through /dev/sgx instead of /dev/kvm.

2018-12-22 19:05:03

by Sean Christopherson

[permalink] [raw]
Subject: Re: x86/sgx: uapi change proposal

On Wed, Dec 19, 2018 at 06:58:48PM -0800, Andy Lutomirski wrote:
> > On Dec 19, 2018, at 6:45 AM, Sean Christopherson <[email protected]> wrote:
> >
> >> On Wed, Dec 19, 2018 at 09:36:16AM +0000, Jethro Beekman wrote:
>
> > I agree with Jethro, passing the enclave_fd as a param is obnoxious.
> > And it means the user needs to open /dev/sgx to do anything with an
> > enclave fd, e.g. the enclave fd might be passed to a builder thread,
> > it shouldn't also need the device fd.
> >
> > E.g.:
> >
> > sgx_fd = open("/dev/sgx", O_RDWR);
> > BUG_ON(sgx_fd < 0);
> >
> > enclave_fd = ioctl(sgx_fd, SGX_ENCLAVE_CREATE, &ecreate);
> > BUG_ON(enclave_fd < 0);
> >
> > ret = ioctl(enclave_fd, SGX_ENCLAVE_ADD_PAGE, &eadd);
> > BUG_ON(ret);
> >
> > ...
> >
> > ret = ioctl(enclave_fd, SGX_ENCLAVE_INIT, &einit);
> > BUG_ON(ret);
> >
> > ...
> >
> > close(enclave_fd);
> > close(sgx_fd);
> >
> >
> > Take a look at virt/kvm/kvm_main.c to see how KVM manages anon inodes
> > and ioctls for VMs and vCPUs.
>
> Can one of you explain why SGX_ENCLAVE_CREATE is better than just
> opening a new instance of /dev/sgx for each encalve?

Directly associating /dev/sgx with an enclave means /dev/sgx can't be
used to provide ioctl()'s for other SGX-related needs, e.g. to mmap()
raw EPC and expose it to a VM. Proposed layout in the link below. I'll
also respond to Jarkko's question about exposing EPC through /dev/sgx
instead of having KVM allocate it on behalf of the VM.

https://lkml.kernel.org/r/[email protected]

2018-12-23 09:37:57

by Jarkko Sakkinen

[permalink] [raw]
Subject: Re: x86/sgx: uapi change proposal

On Fri, Dec 21, 2018 at 09:12:46AM -0800, Andy Lutomirski wrote:
> Hmm. I guess this makes some sense. My instinct would be to do it a
> little differently and have:
>
> /dev/sgx/enclave: Each instance is an enclave.
>
> /dev/sgx/epc: Used to get raw EPC for KVM. Might have different
> permissions, perhaps 0660 and group kvm.
>
> /dev/sgx/something_else: For when SGX v3 adds something else :)

What if we call it a draw: I will go with the "ioctls for enclave fd"
approach, as I'm already making good progress on the implementation for
v19, and we can look at it then?

It does not look like a fruitful conversation to continue without
functional code. I will also update my selftest (as it is part of the
patch set) to align with whatever we have, so you can immediately run
something.

And since no one is giving me anything at all on swapping but instead
splitting hairs here, I will lock in (at least for v19) to using shmem
again.

Sounds like a plan? All the internals will work for whatever mess we
want to introduce to the uapi.

/Jarkko

2018-12-23 09:38:10

by Jarkko Sakkinen

[permalink] [raw]
Subject: Re: x86/sgx: uapi change proposal

On Fri, Dec 21, 2018 at 10:24:54AM -0800, Sean Christopherson wrote:
> On Fri, Dec 21, 2018 at 09:12:46AM -0800, Andy Lutomirski wrote:
> > > On Dec 21, 2018, at 9:28 AM, Sean Christopherson <[email protected]> wrote:
> > >
> > > On Wed, Dec 19, 2018 at 06:58:48PM -0800, Andy Lutomirski wrote:
> > >>> On Dec 19, 2018, at 6:45 AM, Sean Christopherson <[email protected]> wrote:
> > >>>
> > >>>>> On Wed, Dec 19, 2018 at 09:36:16AM +0000, Jethro Beekman wrote:
> > >>>
> > >>> I agree with Jethro, passing the enclave_fd as a param is obnoxious.
> > >>> And it means the user needs to open /dev/sgx to do anything with an
> > >>> enclave fd, e.g. the enclave fd might be passed to a builder thread,
> > >>> it shouldn't also need the device fd.
> > >>>
> > >>> E.g.:
> > >>>
> > >>> sgx_fd = open("/dev/sgx", O_RDWR);
> > >>> BUG_ON(sgx_fd < 0);
> > >>>
> > >>> enclave_fd = ioctl(sgx_fd, SGX_ENCLAVE_CREATE, &ecreate);
> > >>> BUG_ON(enclave_fd < 0);
> > >>>
> > >>> ret = ioctl(enclave_fd, SGX_ENCLAVE_ADD_PAGE, &eadd);
> > >>> BUG_ON(ret);
> > >>>
> > >>> ...
> > >>>
> > >>> ret = ioctl(enclave_fd, SGX_ENCLAVE_INIT, &einit);
> > >>> BUG_ON(ret);
> > >>>
> > >>> ...
> > >>>
> > >>> close(enclave_fd);
> > >>> close(sgx_fd);
> > >>>
> > >>>
> > >>> Take a look at virt/kvm/kvm_main.c to see how KVM manages anon inodes
> > >>> and ioctls for VMs and vCPUs.
> > >>
> > >> Can one of you explain why SGX_ENCLAVE_CREATE is better than just
> > >> opening a new instance of /dev/sgx for each encalve?
> > >
> > > Directly associating /dev/sgx with an enclave means /dev/sgx can't be
> > > used to provide ioctl()'s for other SGX-related needs, e.g. to mmap()
> > > raw EPC and expose it a VM. Proposed layout in the link below. I'll
> > > also respond to Jarkko's question about exposing EPC through /dev/sgx
> > > instead of having KVM allocate it on behalf of the VM.
> >
> > Hmm. I guess this makes some sense. My instinct would be to do it a
> > little differently and have:
> >
> > /dev/sgx/enclave: Each instance is an enclave.
> >
> > /dev/sgx/epc: Used to get raw EPC for KVM. Might have different
> > permissions, perhaps 0660 and group kvm.
> >
> > /dev/sgx/something_else: For when SGX v3 adds something else :)
>
> Mmmm, I like this approach a lot. It would allow userspace to easily
> manage permissions for each "feature", e.g. give all users access to
> /dev/sgx/epc but restrict /dev/sgx/enclave.
>
> And we could add e.g. /dev/sgx/admin if we wanted to exposed ioctls()
> that apply to all aspects of SGX.
>
> Do you know if /dev/sgx/epc could be dynamically created, e.g. by
> KVM when the kvm_intel module is loaded? That would seal the deal for
> me as it'd keep open the option of having KVM handle oversubscription
> of guest EPC while exposing EPC through /dev/sgx instead of /dev/kvm.

No issues with this approach from my side, but I won't use it for v19.

Please share these comments when reviewing v19. I have no objections to
having this interface either.

/Jarkko

2018-12-23 16:21:28

by Jarkko Sakkinen

[permalink] [raw]
Subject: Re: x86/sgx: uapi change proposal

On Thu, Dec 20, 2018 at 12:32:04PM +0200, Jarkko Sakkinen wrote:
> On Wed, Dec 19, 2018 at 06:58:48PM -0800, Andy Lutomirski wrote:
> > Can one of you explain why SGX_ENCLAVE_CREATE is better than just
> > opening a new instance of /dev/sgx for each encalve?
>
> I think that fits better to the SCM_RIGHTS scenario i.e. you could send
> the enclav to a process that does not have necessarily have rights to
> /dev/sgx. Gives more robust environment to configure SGX.

Sean, is this why you wanted an enclave fd and anon inode instead of
just using the address space of /dev/sgx? Just taking notes of all the
observations. I'm not sure what your rationale was (maybe it was stated
somewhere). This was something I made up, and it is a wrong deduction.
You can easily get the same benefit with a /dev/sgx-associated fd
representing the enclave.

This all means that for v19 I'm going without an enclave fd; an fd to
/dev/sgx will represent the enclave. No anon inodes will be involved.

/Jarkko

2018-12-23 16:44:35

by Jarkko Sakkinen

[permalink] [raw]
Subject: Re: x86/sgx: uapi change proposal

On Sat, Dec 22, 2018 at 10:25:02AM +0200, Jarkko Sakkinen wrote:
> On Sat, Dec 22, 2018 at 10:16:49AM +0200, Jarkko Sakkinen wrote:
> > On Thu, Dec 20, 2018 at 12:32:04PM +0200, Jarkko Sakkinen wrote:
> > > On Wed, Dec 19, 2018 at 06:58:48PM -0800, Andy Lutomirski wrote:
> > > > Can one of you explain why SGX_ENCLAVE_CREATE is better than just
> > > > opening a new instance of /dev/sgx for each encalve?
> > >
> > > I think that fits better to the SCM_RIGHTS scenario i.e. you could send
> > > the enclav to a process that does not have necessarily have rights to
> > > /dev/sgx. Gives more robust environment to configure SGX.
> >
> > Sean, is this why you wanted enclave fd and anon inode and not just use
> > the address space of /dev/sgx? Just taking notes of all observations.
> > I'm not sure what your rationale was (maybe it was somewhere). This was
> > something I made up, and this one is wrong deduction. You can easily
> > get the same benefit with /dev/sgx associated fd representing the
> > enclave.
> >
> > This all means that for v19 I'm going without enclave fd involved with
> > fd to /dev/sgx representing the enclave. No anon inodes will be
> > involved.
>
> Based on these observations I updated the uapi.
>
> As far as I'm concerned there has to be a solution to do EPC mapping
> with a sequence:
>
> 1. Ping /dev/kvm to do something.
> 2. KVM asks SGX core to do something.
> 3. SGX core does something.
>
> I don't care what the something is exactly is, but KVM is the only sane
> place for KVM uapi. I would be surprised if KVM maintainers didn't agree
> that they don't want to sprinkle KVM uapi to random places in other
> subsystems.

One option to consider would be to have a device driver for KVM if you
really want this, e.g. something like /dev/vsgx. With the current
knowledge I'm not yet sure why it all could not be done just through
/dev/kvm.

/Jarkko

2018-12-23 16:50:13

by Jarkko Sakkinen

[permalink] [raw]
Subject: Re: x86/sgx: uapi change proposal

On Fri, Dec 21, 2018 at 09:12:46AM -0800, Andy Lutomirski wrote:
> /dev/sgx/enclave: Each instance is an enclave.
>
> /dev/sgx/epc: Used to get raw EPC for KVM. Might have different
> permissions, perhaps 0660 and group kvm.
>
> /dev/sgx/something_else: For when SGX v3 adds something else :)

Responding again to this anyway now that I have had time to think about
it.

Here is how I see it:

1. /dev/sgx/enclave should be /dev/sgx as it is now.
2. /dev/sgx/epc should be something that you'd reach through /dev/kvm.
This is essentially a circular dependency. KVM uapi should provide
KVM services. Now you sprinkle KVM uapi to two subsystems.
3. "something else" is securityfs (e.g. provisioning). That is kind of
stuff that it is meant for.

I'm sorry but from my perspective this does not look too good no
matter what glasses I put on...

/Jarkko

2018-12-23 20:45:14

by Andy Lutomirski

[permalink] [raw]
Subject: Re: x86/sgx: uapi change proposal

On Sun, Dec 23, 2018 at 4:52 AM Jarkko Sakkinen
<[email protected]> wrote:
>
> On Sat, Dec 22, 2018 at 10:25:02AM +0200, Jarkko Sakkinen wrote:
> > On Sat, Dec 22, 2018 at 10:16:49AM +0200, Jarkko Sakkinen wrote:
> > > On Thu, Dec 20, 2018 at 12:32:04PM +0200, Jarkko Sakkinen wrote:
> > > > On Wed, Dec 19, 2018 at 06:58:48PM -0800, Andy Lutomirski wrote:
> > > > > Can one of you explain why SGX_ENCLAVE_CREATE is better than just
> > > > > opening a new instance of /dev/sgx for each encalve?
> > > >
> > > > I think that fits better to the SCM_RIGHTS scenario i.e. you could send
> > > > the enclav to a process that does not have necessarily have rights to
> > > > /dev/sgx. Gives more robust environment to configure SGX.
> > >
> > > Sean, is this why you wanted enclave fd and anon inode and not just use
> > > the address space of /dev/sgx? Just taking notes of all observations.
> > > I'm not sure what your rationale was (maybe it was somewhere). This was
> > > something I made up, and this one is wrong deduction. You can easily
> > > get the same benefit with /dev/sgx associated fd representing the
> > > enclave.
> > >
> > > This all means that for v19 I'm going without enclave fd involved with
> > > fd to /dev/sgx representing the enclave. No anon inodes will be
> > > involved.
> >
> > Based on these observations I updated the uapi.
> >
> > As far as I'm concerned there has to be a solution to do EPC mapping
> > with a sequence:
> >
> > 1. Ping /dev/kvm to do something.
> > 2. KVM asks SGX core to do something.
> > 3. SGX core does something.
> >
> > I don't care what the something is exactly is, but KVM is the only sane
> > place for KVM uapi. I would be surprised if KVM maintainers didn't agree
> > that they don't want to sprinkle KVM uapi to random places in other
> > subsystems.
>
> The one option to consider to do would be to have a device driver for
> KVM if you really want this e.g. something like /dev/vsgx. With the
> current knowledge I'm not yet sure why all could not be done just
> through /dev/kvm.
>

That seems reasonable too. I don't really care about the path to the
device node, but it does seem reasonable to me to have it be a
separate node entirely from the normal enclave interface.

2018-12-23 20:45:14

by Andy Lutomirski

[permalink] [raw]
Subject: Re: x86/sgx: uapi change proposal

> On Dec 21, 2018, at 11:24 AM, Sean Christopherson <[email protected]> wrote:
>
> On Fri, Dec 21, 2018 at 09:12:46AM -0800, Andy Lutomirski wrote:
>>> On Dec 21, 2018, at 9:28 AM, Sean Christopherson <[email protected]> wrote:
>>>
>>> On Wed, Dec 19, 2018 at 06:58:48PM -0800, Andy Lutomirski wrote:
>>>>> On Dec 19, 2018, at 6:45 AM, Sean Christopherson <[email protected]> wrote:
>>>>>
>>>>>>> On Wed, Dec 19, 2018 at 09:36:16AM +0000, Jethro Beekman wrote:
>>>>>
>>>>> I agree with Jethro, passing the enclave_fd as a param is obnoxious.
>>>>> And it means the user needs to open /dev/sgx to do anything with an
>>>>> enclave fd, e.g. the enclave fd might be passed to a builder thread,
>>>>> it shouldn't also need the device fd.
>>>>>
>>>>> E.g.:
>>>>>
>>>>> sgx_fd = open("/dev/sgx", O_RDWR);
>>>>> BUG_ON(sgx_fd < 0);
>>>>>
>>>>> enclave_fd = ioctl(sgx_fd, SGX_ENCLAVE_CREATE, &ecreate);
>>>>> BUG_ON(enclave_fd < 0);
>>>>>
>>>>> ret = ioctl(enclave_fd, SGX_ENCLAVE_ADD_PAGE, &eadd);
>>>>> BUG_ON(ret);
>>>>>
>>>>> ...
>>>>>
>>>>> ret = ioctl(enclave_fd, SGX_ENCLAVE_INIT, &einit);
>>>>> BUG_ON(ret);
>>>>>
>>>>> ...
>>>>>
>>>>> close(enclave_fd);
>>>>> close(sgx_fd);
>>>>>
>>>>>
>>>>> Take a look at virt/kvm/kvm_main.c to see how KVM manages anon inodes
>>>>> and ioctls for VMs and vCPUs.
>>>>
>>>> Can one of you explain why SGX_ENCLAVE_CREATE is better than just
>>>> opening a new instance of /dev/sgx for each encalve?
>>>
>>> Directly associating /dev/sgx with an enclave means /dev/sgx can't be
>>> used to provide ioctl()'s for other SGX-related needs, e.g. to mmap()
>>> raw EPC and expose it a VM. Proposed layout in the link below. I'll
>>> also respond to Jarkko's question about exposing EPC through /dev/sgx
>>> instead of having KVM allocate it on behalf of the VM.
>>
>> Hmm. I guess this makes some sense. My instinct would be to do it a
>> little differently and have:
>>
>> /dev/sgx/enclave: Each instance is an enclave.
>>
>> /dev/sgx/epc: Used to get raw EPC for KVM. Might have different
>> permissions, perhaps 0660 and group kvm.
>>
>> /dev/sgx/something_else: For when SGX v3 adds something else :)
>
> Mmmm, I like this approach a lot. It would allow userspace to easily
> manage permissions for each "feature", e.g. give all users access to
> /dev/sgx/epc but restrict /dev/sgx/enclave.
>
> And we could add e.g. /dev/sgx/admin if we wanted to exposed ioctls()
> that apply to all aspects of SGX.
>
> Do you know if /dev/sgx/epc could be dynamically created, e.g. by
> KVM when the kvm_intel module is loaded?

Presumably sgx would create a bus and enumerate the devices as needed.
Or I suppose these things could be platform or system devices. I’m not
really a device model expert, and the one time I tried to implement a
character device, I got so disgusted that I wrote a whole library for
it. It’s still in limbo somewhere.


> That would seal the deal for
> me as it'd keep open the option of having KVM handle oversubscription
> of guest EPC while exposing EPC through /dev/sgx instead of /dev/kvm.

2018-12-24 11:53:14

by Jarkko Sakkinen

[permalink] [raw]
Subject: Re: x86/sgx: uapi change proposal

On Sun, Dec 23, 2018 at 12:42:48PM -0800, Andy Lutomirski wrote:
> On Sun, Dec 23, 2018 at 4:52 AM Jarkko Sakkinen
> <[email protected]> wrote:
> >
> > On Sat, Dec 22, 2018 at 10:25:02AM +0200, Jarkko Sakkinen wrote:
> > > On Sat, Dec 22, 2018 at 10:16:49AM +0200, Jarkko Sakkinen wrote:
> > > > On Thu, Dec 20, 2018 at 12:32:04PM +0200, Jarkko Sakkinen wrote:
> > > > > On Wed, Dec 19, 2018 at 06:58:48PM -0800, Andy Lutomirski wrote:
> > > > > > Can one of you explain why SGX_ENCLAVE_CREATE is better than just
> > > > > > opening a new instance of /dev/sgx for each encalve?
> > > > >
> > > > > I think that fits better to the SCM_RIGHTS scenario i.e. you could send
> > > > > the enclav to a process that does not have necessarily have rights to
> > > > > /dev/sgx. Gives more robust environment to configure SGX.
> > > >
> > > > Sean, is this why you wanted enclave fd and anon inode and not just use
> > > > the address space of /dev/sgx? Just taking notes of all observations.
> > > > I'm not sure what your rationale was (maybe it was somewhere). This was
> > > > something I made up, and this one is wrong deduction. You can easily
> > > > get the same benefit with /dev/sgx associated fd representing the
> > > > enclave.
> > > >
> > > > This all means that for v19 I'm going without enclave fd involved with
> > > > fd to /dev/sgx representing the enclave. No anon inodes will be
> > > > involved.
> > >
> > > Based on these observations I updated the uapi.
> > >
> > > As far as I'm concerned there has to be a solution to do EPC mapping
> > > with a sequence:
> > >
> > > 1. Ping /dev/kvm to do something.
> > > 2. KVM asks SGX core to do something.
> > > 3. SGX core does something.
> > >
> > > I don't care what the something is exactly is, but KVM is the only sane
> > > place for KVM uapi. I would be surprised if KVM maintainers didn't agree
> > > that they don't want to sprinkle KVM uapi to random places in other
> > > subsystems.
> >
> > The one option to consider to do would be to have a device driver for
> > KVM if you really want this e.g. something like /dev/vsgx. With the
> > current knowledge I'm not yet sure why all could not be done just
> > through /dev/kvm.
> >
>
> That seems reasonable too. I don't really care about the path to the
> device node, but it does seem reasonable to me to have it be a
> separate node entirely from the normal enclave interface.

What is the core reason anyway that /dev/kvm is out of the question?

/Jarkko

2018-12-24 12:03:38

by Jarkko Sakkinen

[permalink] [raw]
Subject: Re: x86/sgx: uapi change proposal

On Sun, Dec 23, 2018 at 12:41:49PM -0800, Andy Lutomirski wrote:
> Presumably sgx would create a bus and enumerate the devices as needed.
> Or I suppose these things could be platform or system devices. I’m not
> really a device model expert, and the one time I tried to implement a
> character device, I got so disgusted that I wrote a whole library for
> it. It’s still in limbo somewhere.

In driver/main.c I have a fairly good framework for device creation, so
if there is any reason to add more devices it is dead easy. I just don't
see any reason for that. And I already have a bus there for SGX. All of
this was done a long time ago.

/Jarkko

2019-01-02 23:41:15

by Sean Christopherson

[permalink] [raw]
Subject: Re: x86/sgx: uapi change proposal

On Sat, Dec 22, 2018 at 10:25:02AM +0200, Jarkko Sakkinen wrote:
> On Sat, Dec 22, 2018 at 10:16:49AM +0200, Jarkko Sakkinen wrote:
> > On Thu, Dec 20, 2018 at 12:32:04PM +0200, Jarkko Sakkinen wrote:
> > > On Wed, Dec 19, 2018 at 06:58:48PM -0800, Andy Lutomirski wrote:
> > > > Can one of you explain why SGX_ENCLAVE_CREATE is better than just
> > > > opening a new instance of /dev/sgx for each encalve?
> > >
> > > I think that fits better to the SCM_RIGHTS scenario i.e. you could send
> > > the enclav to a process that does not have necessarily have rights to
> > > /dev/sgx. Gives more robust environment to configure SGX.
> >
> > Sean, is this why you wanted enclave fd and anon inode and not just use
> > the address space of /dev/sgx? Just taking notes of all observations.
> > I'm not sure what your rationale was (maybe it was somewhere). This was
> > something I made up, and this one is wrong deduction. You can easily
> > get the same benefit with /dev/sgx associated fd representing the
> > enclave.
> >
> > This all means that for v19 I'm going without enclave fd involved with
> > fd to /dev/sgx representing the enclave. No anon inodes will be
> > involved.
>
> Based on these observations I updated the uapi.
>
> As far as I'm concerned there has to be a solution to do EPC mapping
> with a sequence:
>
> 1. Ping /dev/kvm to do something.
> 2. KVM asks SGX core to do something.
> 3. SGX core does something.
>
> I don't care what the something is exactly is, but KVM is the only sane
> place for KVM uapi. I would be surprised if KVM maintainers didn't agree
> that they don't want to sprinkle KVM uapi to random places in other
> subsystems.

It's not a KVM uapi.

KVM isn't a hypervisor in the traditional sense. The "real" hypervisor
lives in userspace, e.g. Qemu, KVM is essentially just a (very fancy)
driver for hardware accelerators, e.g. VMX. Qemu for example is fully
capable of running an x86 VM without KVM, it's just substantially slower.

In terms of guest memory, KVM doesn't care or even know what a particular
region of memory represents or what, if anything, is backing a region in
the host. There are cases when KVM is made aware of certain aspects of
guest memory for performance or functional reasons, e.g. emulated MMIO
and encrypted memory, but in all cases the control logic ultimately
resides in userspace.

SGX is a weird case because ENCLS can't be emulated in software, i.e.
exposing SGX to a VM without KVM's help would be difficult. But, it
wouldn't be impossible, just slow and ugly.

And so, ignoring host oversubscription for the moment, there is no hard
requirement that SGX EPC can only be exposed to a VM through KVM. In
other words, allocating and exposing EPC to a VM is orthogonal to KVM
supporting SGX. Exposing EPC to userspace via /dev/sgx/epc would mean
that KVM would handle it like any other guest memory region, and all EPC
related code/logic would reside in the SGX subsystem.
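
As a sketch of what that would mean on the Qemu side (the /dev/sgx/epc
node doesn't exist yet, and epc_size, guest_epc_base, epc_slot and vm_fd
are placeholders), EPC would be registered like any other RAM region:

epc_fd = open("/dev/sgx/epc", O_RDWR);
epc_hva = mmap(NULL, epc_size, PROT_READ | PROT_WRITE, MAP_SHARED, epc_fd, 0);

struct kvm_userspace_memory_region epc_region = {
        .slot            = epc_slot,
        .guest_phys_addr = guest_epc_base,
        .memory_size     = epc_size,
        .userspace_addr  = (__u64)epc_hva,
};

ret = ioctl(vm_fd, KVM_SET_USER_MEMORY_REGION, &epc_region);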

Oversubscription throws a wrench in the system because ENCLV can only
be executed post-VMXON and EPC conflicts generate VMX VM-Exits. But
even then, KVM doesn't need to own the EPC uapi, e.g. it can call into
the SGX subsystem to handle EPC conflict VM-Exits and the SGX subsystem
can wrap ENCLV with exception fixup and forcefully reclaim EPC pages if
ENCLV faults.

I can't be 100% certain the oversubscription scheme will be sane without
actually writing the code, but I'd like to at least keep the option open,
i.e. not structure /dev/sgx/ in such a way that adding e.g. /dev/sgx/epc
is impossible or ugly.

2019-01-03 21:33:32

by Jarkko Sakkinen

[permalink] [raw]
Subject: Re: x86/sgx: uapi change proposal

On Wed, Jan 02, 2019 at 12:47:52PM -0800, Sean Christopherson wrote:
> On Sat, Dec 22, 2018 at 10:25:02AM +0200, Jarkko Sakkinen wrote:
> > On Sat, Dec 22, 2018 at 10:16:49AM +0200, Jarkko Sakkinen wrote:
> > > On Thu, Dec 20, 2018 at 12:32:04PM +0200, Jarkko Sakkinen wrote:
> > > > On Wed, Dec 19, 2018 at 06:58:48PM -0800, Andy Lutomirski wrote:
> > > > > Can one of you explain why SGX_ENCLAVE_CREATE is better than just
> > > > > opening a new instance of /dev/sgx for each encalve?
> > > >
> > > > I think that fits better to the SCM_RIGHTS scenario i.e. you could send
> > > > the enclav to a process that does not have necessarily have rights to
> > > > /dev/sgx. Gives more robust environment to configure SGX.
> > >
> > > Sean, is this why you wanted enclave fd and anon inode and not just use
> > > the address space of /dev/sgx? Just taking notes of all observations.
> > > I'm not sure what your rationale was (maybe it was somewhere). This was
> > > something I made up, and this one is wrong deduction. You can easily
> > > get the same benefit with /dev/sgx associated fd representing the
> > > enclave.
> > >
> > > This all means that for v19 I'm going without enclave fd involved with
> > > fd to /dev/sgx representing the enclave. No anon inodes will be
> > > involved.
> >
> > Based on these observations I updated the uapi.
> >
> > As far as I'm concerned there has to be a solution to do EPC mapping
> > with a sequence:
> >
> > 1. Ping /dev/kvm to do something.
> > 2. KVM asks SGX core to do something.
> > 3. SGX core does something.
> >
> > I don't care what the something is exactly is, but KVM is the only sane
> > place for KVM uapi. I would be surprised if KVM maintainers didn't agree
> > that they don't want to sprinkle KVM uapi to random places in other
> > subsystems.
>
> It's not a KVM uapi.
>
> KVM isn't a hypervisor in the traditional sense. The "real" hypervisor
> lives in userspace, e.g. Qemu, KVM is essentially just a (very fancy)
> driver for hardware accelerators, e.g. VMX. Qemu for example is fully
> capable of running an x86 VM without KVM, it's just substantially slower.
>
> In terms of guest memory, KVM doesn't care or even know what a particular
> region of memory represents or what, if anything, is backing a region in
> the host. There are cases when KVM is made aware of certain aspects of
> guest memory for performance or functional reasons, e.g. emulated MMIO
> and encrypted memory, but in all cases the control logic ultimately
> resides in userspace.
>
> SGX is a weird case because ENCLS can't be emulated in software, i.e.
> exposing SGX to a VM without KVM's help would be difficult. But, it
> wouldn't be impossible, just slow and ugly.
>
> And so, ignoring host oversubscription for the moment, there is no hard
> requirement that SGX EPC can only be exposed to a VM through KVM. In
> other words, allocating and exposing EPC to a VM is orthogonal to KVM
> supporting SGX. Exposing EPC to userspace via /dev/sgx/epc would mean
> that KVM would handle it like any other guest memory region, and all EPC
> related code/logic would reside in the SGX subsystem.

I'm fine doing that if it makes sense. I just don't understand why you
cannot add ioctls to /dev/kvm for allocating the region. Why isn't that
possible? As I said to Andy earlier, adding new device files is easy as
everything related to device creation is nicely encapsulated.

> Oversubscription throws a wrench in the system because ENCLV can only
> be executed post-VMXON and EPC conflicts generate VMX VM-Exits. But
> even then, KVM doesn't need to own the EPC uapi, e.g. it can call into
> the SGX subsystem to handle EPC conflict VM-Exits and the SGX subsystem
> can wrap ENCLV with exception fixup and forcefully reclaim EPC pages if
> ENCLV faults.

If the uapi is *only* for KVM, it should definitely own it. KVM calling
SGX subsystem on a conflict is KVM using in-kernel APIs provided by the
SGX core.

> I can't be 100% certain the oversubscription scheme will be sane without
> actually writing the code, but I'd like to at least keep the option open,
> i.e. not structure /dev/sgx/ in such a way that adding e.g. /dev/sgx/epc
> is impossible or ugly.

/Jarkko

2019-01-08 20:12:07

by Kai Huang

[permalink] [raw]
Subject: RE: x86/sgx: uapi change proposal

> >
> > Can one of you explain why SGX_ENCLAVE_CREATE is better than just
> > opening a new instance of /dev/sgx for each encalve?
>
> Directly associating /dev/sgx with an enclave means /dev/sgx can't be used
> to provide ioctl()'s for other SGX-related needs, e.g. to mmap() raw EPC and
> expose it a VM. Proposed layout in the link below. I'll also respond to
> Jarkko's question about exposing EPC through /dev/sgx instead of having
> KVM allocate it on behalf of the VM.
>
> https://lkml.kernel.org/r/[email protected]

Hi Sean,

Sorry for replying to an old email. But IMHO it is not a must that Qemu
opens some /dev/sgx and allocates/mmaps EPC for the guest's virtual EPC
slot; instead, KVM could create a private slot, which is not visible to
Qemu, for virtual EPC, and KVM could call the core-SGX EPC allocation
API directly.

I am not sure what the benefit of allowing userspace to alloc/mmap a raw
EPC region is. Userspace is not allowed to touch EPC anyway, except
enclave code.

To me, having KVM create a private EPC slot is cleaner than exposing
/dev/sgx/epc and allowing userspace to map some raw EPC region.

Thanks,
-Kai

2019-01-08 22:57:43

by Sean Christopherson

[permalink] [raw]
Subject: Re: x86/sgx: uapi change proposal

On Tue, Jan 08, 2019 at 11:27:11AM -0800, Huang, Kai wrote:
> > >
> > > Can one of you explain why SGX_ENCLAVE_CREATE is better than just
> > > opening a new instance of /dev/sgx for each encalve?
> >
> > Directly associating /dev/sgx with an enclave means /dev/sgx can't be used
> > to provide ioctl()'s for other SGX-related needs, e.g. to mmap() raw EPC and
> > expose it a VM. Proposed layout in the link below. I'll also respond to
> > Jarkko's question about exposing EPC through /dev/sgx instead of having
> > KVM allocate it on behalf of the VM.
> >
> > https://lkml.kernel.org/r/[email protected]
>
> Hi Sean,
>
> Sorry for replying to old email. But IMHO it is not a must that Qemu
> needs to open some /dev/sgx and allocate/mmap EPC for guest's virtual
> EPC slot, instead, KVM could create private slot, which is not visible
> to Qemu, for virtual EPC, and KVM could call core-SGX EPC allocation
> API directly.

That's possible, but it has several downsides.

- Duplicates a lot of code in KVM for managing memory regions.
- Artificially restricts userspace to a single EPC region, unless
even more code is duplicated to handle multiple private regions.
- Requires additional ioctls() or capabilities to probe EPC support
- Does not fit with Qemu/KVM's memory model, e.g. all other types of
memory are exposed to a guest through KVM_SET_USER_MEMORY_REGION.
- Prevents userspace from debugging a guest's enclave. I'm not saying
this is a likely scenario, but I also don't think we should preclude
it without good reason.
- KVM is now responsible for managing the lifecycle of EPC, e.g. what
happens if an EPC cgroup limit is lowered on a running VM and
KVM can't gracefully reclaim EPC? The userspace hypervisor should
ultimately decide how to handle such an event.
- SGX logic is split between SGX and KVM, e.g. VA page management for
oversubscription will likely be common to SGX and KVM. From a long
term maintenance perspective, this means that changes to the EPC
management could potentially need to be Acked by KVM, and vice versa.

> I am not sure what's the good of allowing userspace to alloc/mmap a
> raw EPC region? Userspace is not allowed to touch EPC anyway, expect
> enclave code.
>
> To me KVM creates private EPC slot is cleaner than exposing /dev/sgx/epc
> and allowing userspace to map some raw EPC region.

Cleaner in the sense that it's faster to get basic support up and running
since there are fewer touchpoints, but there are long term ramifications
to cramming EPC management in KVM.

And at this point I'm not stating any absolutes, e.g. how EPC will be
handled by KVM. What I'm pushing for is to not eliminate the possibility
of having the SGX subsystem own all EPC management, e.g. don't tie
/dev/sgx to a single enclave.

>
> Thanks,
> -Kai

2019-01-08 23:06:48

by Andy Lutomirski

[permalink] [raw]
Subject: Re: x86/sgx: uapi change proposal

On Tue, Jan 8, 2019 at 2:09 PM Sean Christopherson
<[email protected]> wrote:
>
> On Tue, Jan 08, 2019 at 11:27:11AM -0800, Huang, Kai wrote:
> > > >
> > > > Can one of you explain why SGX_ENCLAVE_CREATE is better than just
> > > > opening a new instance of /dev/sgx for each encalve?
> > >
> > > Directly associating /dev/sgx with an enclave means /dev/sgx can't be used
> > > to provide ioctl()'s for other SGX-related needs, e.g. to mmap() raw EPC and
> > > expose it a VM. Proposed layout in the link below. I'll also respond to
> > > Jarkko's question about exposing EPC through /dev/sgx instead of having
> > > KVM allocate it on behalf of the VM.
> > >
> > > https://lkml.kernel.org/r/[email protected]
> >
> > Hi Sean,
> >
> > Sorry for replying to old email. But IMHO it is not a must that Qemu
> > needs to open some /dev/sgx and allocate/mmap EPC for guest's virtual
> > EPC slot, instead, KVM could create private slot, which is not visible
> > to Qemu, for virtual EPC, and KVM could call core-SGX EPC allocation
> > API directly.
>
> That's possible, but it has several downsides.
>
> - Duplicates a lot of code in KVM for managing memory regions.
> - Artificially restricts userspace to a single EPC region, unless
> even more code is duplicated to handle multiple private regions.
> - Requires additional ioctls() or capabilities to probe EPC support
> - Does not fit with Qemu/KVM's memory model, e.g. all other types of
> memory are exposed to a guest through KVM_SET_USER_MEMORY_REGION.
> - Prevents userspace from debugging a guest's enclave. I'm not saying
> this is a likely scenario, but I also don't think we should preclude
> it without good reason.
> - KVM is now responsible for managing the lifecycle of EPC, e.g. what
> happens if an EPC cgroup limit is lowered on a running VM and
> KVM can't gracefully reclaim EPC? The userspace hypervisor should
> ultimately decide how to handle such an event.
> - SGX logic is split between SGX and KVM, e.g. VA page management for
> oversubscription will likely be common to SGX and KVM. From a long
> term maintenance perspective, this means that changes to the EPC
> management could potentially need to be Acked by KVM, and vice versa.
>
> > I am not sure what's the good of allowing userspace to alloc/mmap a
> > raw EPC region? Userspace is not allowed to touch EPC anyway, expect
> > enclave code.
> >
> > To me KVM creates private EPC slot is cleaner than exposing /dev/sgx/epc
> > and allowing userspace to map some raw EPC region.
>
> Cleaner in the sense that it's faster to get basic support up and running
> since there are fewer touchpoints, but there are long term ramifications
> to cramming EPC management in KVM.
>
> And at this point I'm not stating any absolutes, e.g. how EPC will be
> handled by KVM. What I'm pushing for is to not eliminate the possibility
> of having the SGX subsystem own all EPC management, e.g. don't tie
> /dev/sgx to a single enclave.

I haven't gone and re-read all the relevant SDM bits, so I'll just
ask: what, if anything, are the actual semantics of mapping "raw EPC"
like this? You can't actually do anything with the mapping from user
mode unless you actually get an enclave created and initialized in it
and have it mapped at the correct linear address, right? I still
think you have the right idea, but it is a bit unusual.

I do think it makes sense to have QEMU delegate the various ENCLS
operations (especially EINIT) to the regular SGX interface, which will
mean that VM guests will have exactly the same access controls applied
as regular user programs, which is probably what we want. If so,
there will need to be a way to get INITTOKEN privilege for the purpose
of running non-Linux OSes in the VM, which isn't the end of the world.
We might still want the actual ioctl to do EINIT using an actual
explicit token to be somehow restricted in a way that strongly
discourages its use by anything other than a hypervisor. Or I suppose
we could just straight-up ignore the guest-provided init token.

--Andy

P.S. Is Intel ever going to consider a way to make guests get their
own set of keys that are different from the host's keys and other
guests' keys?

2019-01-09 05:27:26

by Kai Huang

[permalink] [raw]
Subject: RE: x86/sgx: uapi change proposal

> On Tue, Jan 08, 2019 at 11:27:11AM -0800, Huang, Kai wrote:
> > > >
> > > > Can one of you explain why SGX_ENCLAVE_CREATE is better than just
> > > > opening a new instance of /dev/sgx for each encalve?
> > >
> > > Directly associating /dev/sgx with an enclave means /dev/sgx can't
> > > be used to provide ioctl()'s for other SGX-related needs, e.g. to
> > > mmap() raw EPC and expose it a VM. Proposed layout in the link
> > > below. I'll also respond to Jarkko's question about exposing EPC
> > > through /dev/sgx instead of having KVM allocate it on behalf of the VM.
> > >
> > > https://lkml.kernel.org/r/[email protected]
> >
> > Hi Sean,
> >
> > Sorry for replying to old email. But IMHO it is not a must that Qemu
> > needs to open some /dev/sgx and allocate/mmap EPC for guest's virtual
> > EPC slot, instead, KVM could create private slot, which is not visible
> > to Qemu, for virtual EPC, and KVM could call core-SGX EPC allocation
> > API directly.
>
> That's possible, but it has several downsides.
>
> - Duplicates a lot of code in KVM for managing memory regions.

I don't see why there would be duplicated code. You can simply call
__x86_set_memory_region to create a private slot. It is the KVM x86
equivalent of KVM_SET_USER_MEMORY_REGION from userspace. The only
difference is that Qemu is not aware of the private slot.
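
For reference, a rough sketch of that private-slot approach, modeled on
how the APIC access page is set up; SGX_EPC_PRIVATE_MEMSLOT and the
epc_gpa/epc_size arguments are assumptions for illustration only:

static int kvm_alloc_guest_epc_slot(struct kvm *kvm, gpa_t epc_gpa, u32 epc_size)
{
        int r;

        mutex_lock(&kvm->slots_lock);
        /* A private slot never shows up in Qemu's view of guest memory. */
        r = __x86_set_memory_region(kvm, SGX_EPC_PRIVATE_MEMSLOT, epc_gpa,
                                    epc_size);
        mutex_unlock(&kvm->slots_lock);

        return r;
}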

> - Artificially restricts userspace to a single EPC region, unless
> even more code is duplicated to handle multiple private regions.

You can have multiple private slots by calling __x86_set_memory_region
for each EPC section. KVM receives the EPC section info from Qemu, via
CPUID or a dedicated IOCTL (is this what you are going to add?), and
simply creates the private EPC slot or slots.

> - Requires additional ioctls() or capabilities to probe EPC support

No. The EPC info comes from Qemu at the beginning (the size is given by a parameter, the base is calculated by Qemu), and it is actually Qemu that notifies KVM of the EPC info, so I don't think we require additional ioctls or capabilities here.

> - Does not fit with Qemu/KVM's memory model, e.g. all other types of
> memory are exposed to a guest through
> KVM_SET_USER_MEMORY_REGION.

EPC is different. I am not sure whether EPC needs to fit such a model. There are already examples in KVM that use a private slot w/o using KVM_SET_USER_MEMORY_REGION, for example the APIC access page.

> - Prevents userspace from debugging a guest's enclave. I'm not saying
> this is a likely scenario, but I also don't think we should preclude
> it without good reason.

I am not sure how important that is, so I don't know whether it can be a justification. To me we don't need to consider this. Qemu normally doesn't access guest memory unless it has to (i.e., for the device model).

> - KVM is now responsible for managing the lifecycle of EPC, e.g. what
> happens if an EPC cgroup limit is lowered on a running VM and
> KVM can't gracefully reclaim EPC? The userspace hypervisor should
> ultimately decide how to handle such an event.

Even using KVM_SET_USER_MEMORY_REGION, KVM is still responsible for managing the lifecycle of EPC, no? And by "managing the lifecycle of EPC" I suppose you don't mean "managing EPC itself", since EPC should always be managed by the core SGX code.

I don't see the difference between a private slot and KVM_SET_USER_MEMORY_REGION in terms of how KVM reclaims EPC, or what KVM does when it fails to reclaim EPC.

> - SGX logic is split between SGX and KVM, e.g. VA page management for
> oversubscription will likely be common to SGX and KVM. From a long
> term maintenance perspective, this means that changes to the EPC
> management could potentially need to be Acked by KVM, and vice versa.

I think most of the code should be in the core SGX code under x86. KVM should only have the code that is specifically related to virtualization, i.e., ENCLV.

VA page allocation should be under the core SGX code. KVM might need to call functions such as alloc_va_page(), etc., but this is not a problem; there are many other such cases already.

And this is not related to private slot vs. KVM_SET_USER_MEMORY_REGION.

>
> > I am not sure what's the good of allowing userspace to alloc/mmap a
> > raw EPC region? Userspace is not allowed to touch EPC anyway, expect
> > enclave code.
> >
> > To me KVM creates private EPC slot is cleaner than exposing
> > /dev/sgx/epc and allowing userspace to map some raw EPC region.
>
> Cleaner in the sense that it's faster to get basic support up and running since
> there are fewer touchpoints, but there are long term ramifications to
> cramming EPC management in KVM.
>
> And at this point I'm not stating any absolutes, e.g. how EPC will be handled
> by KVM. What I'm pushing for is to not eliminate the possibility of having
> the SGX subsystem own all EPC management, e.g. don't tie /dev/sgx to a
> single enclave.

I suppose "SGX subsystem" you mean here is core SGX code under x86. IMHO EPC should always be managed by such SGX subsystem, and KVM and SGX driver are just consumers (ie, calling EPC allocation function, etc).

IMHO /dev/sgx (or whatever /dev/sgx/xxx) should be in the SGX driver, not the SGX core code. For example, if we don't consider KVM EPC oversubscription here, theoretically we only need the code below in the core SGX code to make KVM SGX work:

1) SGX feature detection. There should be some core structure (such as boot_cpu_data, etc.) where KVM can query SGX capabilities, i.e., whether FLC is available.
2) EPC management. KVM simply calls the EPC management APIs when it needs to (a rough sketch of what such an API surface might look like follows this list). For example, when an EPC slot is created, we should populate all EPC pages and fail VM creation if we run out of EPC.
3) Other code to deal with ENCLS, SGX data structures, etc., since we have agreed that even with static EPC allocation we should trap EINIT.
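
A purely hypothetical sketch of what that "core SGX exports, KVM consumes" split could look like; none of these names are taken from a posted patch, and the real allocator in the driver series may look different:

#include <linux/types.h>

struct sgx_epc_page;	/* defined and owned by the core SGX code */

/*
 * Allocate/free raw EPC pages on behalf of a consumer (driver or KVM);
 * 'reclaim' indicates whether the core code may evict other pages to
 * satisfy the request.
 */
struct sgx_epc_page *sgx_alloc_epc_page(void *owner, bool reclaim);
void sgx_free_epc_page(struct sgx_epc_page *page);

/* Capability query so that KVM doesn't poke the feature MSRs itself. */
bool sgx_launch_control_enabled(void);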

We probably don't even need code to deal with enclaves, if only for KVM. Of course, in order to support KVM EPC oversubscription, we should also include enclave management code in the core SGX code, but this doesn't necessarily mean we should include /dev/sgx/xxx in the core SGX code.

It seems there are still lots of design issues on which we haven't reached consensus in terms of how the SGX driver should look (or how the SGX stack is supposed to work), but if we can focus on what the core SGX code should really have (w/o involving SGX driver logic), we can at least get the KVM SGX code working. After all, we also have Windows SGX support to consider.

Thanks,
-Kai

2019-01-09 14:48:40

by Jarkko Sakkinen

[permalink] [raw]
Subject: Re: x86/sgx: uapi change proposal

On Thu, Jan 03, 2019 at 08:26:35AM -0800, Sean Christopherson wrote:
> What I was trying to explain is that the uapi isn't for KVM, it's for
> the userspace hypervisor, e.g. Qemu. Qemu will inform KVM of the
> resulting guest memory region so that KVM can configure its guest page
> tables accordingly, but that is done through KVM's existing memory uapi.

OK, now I got it, apologies it took such a long time :-)

Now I see the analogy, e.g. qemu independently creates VMAs and then
feeds those regions to KVM. Similarly, qemu would create regions for
KVM using "/dev/sgx/mem".

For me this is perfectly fine now that I understand the reasoning,
and it doesn't make my job of implementing the file-based enclave
change any more difficult.

Thanks for the patience with this...

/Jarkko

2019-01-09 16:45:38

by Sean Christopherson

[permalink] [raw]
Subject: Re: x86/sgx: uapi change proposal

On Tue, Jan 08, 2019 at 02:54:11PM -0800, Andy Lutomirski wrote:
> On Tue, Jan 8, 2019 at 2:09 PM Sean Christopherson
> <[email protected]> wrote:
> >
> > Cleaner in the sense that it's faster to get basic support up and running
> > since there are fewer touchpoints, but there are long term ramifications
> > to cramming EPC management in KVM.
> >
> > And at this point I'm not stating any absolutes, e.g. how EPC will be
> > handled by KVM. What I'm pushing for is to not eliminate the possibility
> > of having the SGX subsystem own all EPC management, e.g. don't tie
> > /dev/sgx to a single enclave.
>
> I haven't gone and re-read all the relevant SDM bits, so I'll just
> ask: what, if anything, are the actual semantics of mapping "raw EPC"
> like this? You can't actually do anything with the mapping from user
> mode unless you actually get an enclave created and initialized in it
> and have it mapped at the correct linear address, right? I still
> think you have the right idea, but it is a bit unusual.

Correct, the EPC is inaccessible until a range is "mapped" with ECREATE.
But I'd argue that it's not unusual, just different. And really it's not
all that different than userspace mmap'ing /dev/sgx/enclave prior to
ioctl(ENCLAVE_CREATE). In that case, userspace can still (attempt to)
access the "raw" EPC, i.e. generate a #PF, the kernel/driver just happens
to consider any faulting EPC address without an associated enclave as
illegal, e.g. signals SIGBUS.

The /dev/sgx/epc case simply has different semantics for moving pages in
and out of the EPC, i.e. different fault and eviction semantics. Yes,
this allows the guest kernel to directly access the "raw" EPC, but that's
conceptually in line with hardware where privileged software can directly
"access" the EPC (or rather, the abort page for all intents and purposes).
I.e. it's an argument for requiring certain privileges to open /dev/sgx/epc,
but IMO it's not unusual.

Maybe /dev/sgx/epc is a poor name and is causing confusion, e.g.
/dev/sgx/virtualmachine might be more appropriate.
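
To make the model concrete, here is a userspace sketch of what Qemu could do with such a device. /dev/sgx/epc itself is only a proposal in this thread, so the open()/mmap() semantics are assumptions; the KVM_SET_USER_MEMORY_REGION call is the existing KVM ABI, and the slot number and guest_epc_base are placeholders:

#include <fcntl.h>
#include <string.h>
#include <sys/ioctl.h>
#include <sys/mman.h>
#include <linux/kvm.h>

static int map_epc_into_guest(int vm_fd, __u64 guest_epc_base, __u64 epc_size)
{
        struct kvm_userspace_memory_region region;
        void *epc;
        int fd;

        fd = open("/dev/sgx/epc", O_RDWR);      /* proposed device node */
        if (fd < 0)
                return -1;

        epc = mmap(NULL, epc_size, PROT_READ | PROT_WRITE, MAP_SHARED, fd, 0);
        if (epc == MAP_FAILED)
                return -1;

        memset(&region, 0, sizeof(region));
        region.slot = 1;                                /* any unused slot */
        region.guest_phys_addr = guest_epc_base;        /* advertised to the guest via CPUID.0x12 */
        region.memory_size = epc_size;
        region.userspace_addr = (unsigned long)epc;

        return ioctl(vm_fd, KVM_SET_USER_MEMORY_REGION, &region);
}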

> I do think it makes sense to have QEMU delegate the various ENCLS
> operations (especially EINIT) to the regular SGX interface, which will
> mean that VM guests will have exactly the same access controls applied
> as regular user programs, which is probably what we want.

To what end? Except for EINIT, none of the ENCLS leafs are interesting
from a permissions perspective. Trapping and re-executing ENCLS leafs
is painful, e.g. most leafs have multiple virtual addresses that need to
be translated. And routing everything through the regular interface
would make SGX even slower than it already is, e.g. every ENCLS would
take an additional ~900 cycles just to handle the VM-Exit, and that's
not accounting for any additional overhead in the SGX code, e.g. using
the regular interface would mean superfluous locks, etc...

The only benefit is that it would theoretically allow oversubscribing
guest EPC without hardware support, but that would require a lot of code
to do the necessary SECS tracking and refcounting.

> If so,
> there will need to be a way to get INITTOKEN privilege for the purpose
> of running non-Linux OSes in the VM, which isn't the end of the world.

Couldn't we require the same privilege/capability for VMs and EINIT
tokens? I.e. /dev/sgx/virtualmachine can only be opened by a user that
can also generate tokens.

> We might still want the actual ioctl to do EINIT using an actual
> explicit token to be somehow restricted in a way that strongly
> discourages its use by anything other than a hypervisor. Or I suppose
> we could just straight-up ignore the guest-provided init token.
>
> P.S. Is Intel ever going to consider a way to make guests get their
> own set of keys that are different from the host's keys and other
> guests' keys?


2019-01-09 17:17:53

by Sean Christopherson

[permalink] [raw]
Subject: Re: x86/sgx: uapi change proposal

On Tue, Jan 08, 2019 at 09:24:38PM -0800, Huang, Kai wrote:
> > On Tue, Jan 08, 2019 at 11:27:11AM -0800, Huang, Kai wrote:
> > > > >
> > > > > Can one of you explain why SGX_ENCLAVE_CREATE is better than just
> > > > > opening a new instance of /dev/sgx for each encalve?
> > > >
> > > > Directly associating /dev/sgx with an enclave means /dev/sgx can't
> > > > be used to provide ioctl()'s for other SGX-related needs, e.g. to
> > > > mmap() raw EPC and expose it a VM. Proposed layout in the link
> > > > below. I'll also respond to Jarkko's question about exposing EPC
> > > > through /dev/sgx instead of having KVM allocate it on behalf of the VM.
> > > >
> > > > https://lkml.kernel.org/r/[email protected]
> > >
> > > Hi Sean,
> > >
> > > Sorry for replying to old email. But IMHO it is not a must that Qemu
> > > needs to open some /dev/sgx and allocate/mmap EPC for guest's virtual
> > > EPC slot, instead, KVM could create private slot, which is not visible
> > > to Qemu, for virtual EPC, and KVM could call core-SGX EPC allocation
> > > API directly.
> >
> > That's possible, but it has several downsides.
> >
> > - Duplicates a lot of code in KVM for managing memory regions.
>
> I don't see why there will be duplicated code. you can simply call
> __x86_set_memory_region to create private slot. It is KVM x86 equivalent
> to KVM_SET_USER_MEMORY_REGION from userspace. The only difference is Qemu
> is not aware of the private slot.

What about when we allow removing an EPC region? At that point you'd be
fully duplicating KVM_SET_USER_MEMORY_REGION. And that's not a purely
theoretical idea, it's something I'd like to support in the future, e.g.
via the WIP virtio-mem interface.

https://events.linuxfoundation.org/wp-content/uploads/2017/12/virtio-mem-Paravirtualized-Memory-David-Hildenbrand-Red-Hat-1.pdf

>
> > - Artificially restricts userspace to a single EPC region, unless
> > even more code is duplicated to handle multiple private regions.
>
> You can have multiple private slots, by calling __x86_set_memory_region
> for each EPC section. KVM receives EPC section/sections info from Qemu,
> via CPUID, or dedicated IOCTL (is this you are going to add?), and simply
> creates private EPC slot/slots.

This would require a dynamic number of private memslots, which breaks (or
at least changes) the semantics of KVM_CAP_NR_MEMSLOTS.
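
For reference, the way userspace learns its memslot budget today is a simple capability check on the /dev/kvm fd; the concern above is that a dynamic pool of KVM-internal slots doesn't show up in this accounting at all. A minimal sketch of the existing ABI:

#include <fcntl.h>
#include <sys/ioctl.h>
#include <linux/kvm.h>

static int max_user_memslots(void)
{
        int kvm_fd = open("/dev/kvm", O_RDWR);

        if (kvm_fd < 0)
                return -1;

        /* Returns the number of user-configurable slots (KVM_USER_MEM_SLOTS). */
        return ioctl(kvm_fd, KVM_CHECK_EXTENSION, KVM_CAP_NR_MEMSLOTS);
}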

>
> > - Requires additional ioctls() or capabilities to probe EPC support
>
> No. EPC info is from Qemu at the beginning (size is given by parameter,
> base is calculated by Qemu), and actually it is Qemu notifies KVM EPC info,
> so I don't think we require additional ioctls or capabilities here.

How does Qemu know KVM supports virtual EPC? Probing /dev/sgx doesn't
convey any information about KVM support. Maybe you could report it via
KVM_GET_SUPPORTED_CPUID, but that would be problematic for Qemu since it
would have to create vCPUs before initializing the machine.

>
> > - Does not fit with Qemu/KVM's memory model, e.g. all other types of
> > memory are exposed to a guest through
> > KVM_SET_USER_MEMORY_REGION.
>
> EPC is different. I am not sure whether EPC needs to fit such model. There
> are already examples in KVM which uses private slot w/o using
> KVM_SET_USER_MEMORY_REGION, for example, APIC access page.

EPC has unique access and lifecycle semantics, but that doesn't make it a
special snowflake, e.g. NVDIMM has special properties, as does memory that
is encrypted via SME or MKTME, and so on and so forth.

The private memslots are private for a reason, e.g. the guest is completely
unaware that they exist and Qemu is only aware of their existence because
KVM needs userspace to tell it what GPA range won't conflict with its memory
model. And in the APIC access page case, Qemu isn't aware at all since KVM
doesn't allow relocating the guest's APIC.

The other aspect of private memslots is that they are not exposed to L2,
because again, from the guest's perspective, they do not exist. We can
obviously hack around that restriction, but it's yet another hint that
shoving EPC into a private memslot is the wrong approach.

>
> > - Prevents userspace from debugging a guest's enclave. I'm not saying
> > this is a likely scenario, but I also don't think we should preclude
> > it without good reason.
>
> I am not sure how important it is, so don't know whether this can be a
> justification. To me we don't need to consider this. Qemu normally doesn't
> access guest memory unless it has to (ie, for device model).

In terms of debug, "Qemu" accesses guest memory all the time, e.g. reading
guest memory via Qemu's monitor interface is something I do on a regular
basis. KVM even provides an ABI (KVM_SET_GUEST_DEBUG) to allow userspace
to set breakpoints on guest memory. We'd be saying "you can break on
accesses in a debug enclave, but you can't actually access its memory".
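
For context, a minimal sketch of the guest-debug ABI referenced above; note that the software-breakpoint flavour additionally relies on userspace patching INT3 into guest memory itself, i.e. on userspace being able to access that memory in the first place:

#include <string.h>
#include <sys/ioctl.h>
#include <linux/kvm.h>

static int enable_sw_breakpoints(int vcpu_fd)
{
        struct kvm_guest_debug dbg;

        memset(&dbg, 0, sizeof(dbg));
        /* Reflect guest #BP exits to userspace instead of injecting them. */
        dbg.control = KVM_GUESTDBG_ENABLE | KVM_GUESTDBG_USE_SW_BP;

        return ioctl(vcpu_fd, KVM_SET_GUEST_DEBUG, &dbg);
}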

>
> > - KVM is now responsible for managing the lifecycle of EPC, e.g. what
> > happens if an EPC cgroup limit is lowered on a running VM and
> > KVM can't gracefully reclaim EPC? The userspace hypervisor should
> > ultimately decide how to handle such an event.
>
> Even using KVM_SET_USER_MEMORY_REGION, KVM is also responsible for managing the lifecycle of EPC, no? And "managing the lifecycle of EPC" you mean doesn't need to be "managing EPC itself" I suppose, since EPC should always be managed by core SGX code.
>
> I don't see the difference between private slot and KVM_SET_USER_MEMORY_REGION,
> in terms of how does KVM reclaim EPC, or how does KVM do when it fails to
> reclaim EPC.

Userspace can delete the EPC region via KVM_SET_USER_MEMORY_REGION. As
mentioned above, on-demand deletion of a private memslot means duplicating
basically all of the KVM_SET_USER_MEMORY_REGION flow.

For example, consider a paravirtualized guest that allows EPC regions to
be "unplugged" if the host runs out of EPC.

>
> > - SGX logic is split between SGX and KVM, e.g. VA page management for
> > oversubscription will likely be common to SGX and KVM. From a long
> > term maintenance perspective, this means that changes to the EPC
> > management could potentially need to be Acked by KVM, and vice versa.
>
> I think most of the code should be in core SGX code under x86. KVM should
> only have the code that is specifically related to virtualization, ie, ENCLV.
>
> VA page allocation should be under code SGX code. KVM might need to call
> function such as alloc_va_page, etc, but this is not a problem. There are
> many other cases now.
>
> And this is not related to private slot vs KVM_SET_USER_MEMORY_REGION.

Yes it is. Allocating EPC via KVM means exporting additional functions
from SGX core, and using a private memslot effectively forces KVM to
allocate guest EPC. E.g. using KVM_SET_USER_MEMORY_REGION and having
SGX own EPC allocation means KVM can be blissfully unaware of VA pages,
EPC swap backing, etc...

>
> >
> > > I am not sure what's the good of allowing userspace to alloc/mmap a
> > > raw EPC region? Userspace is not allowed to touch EPC anyway, expect
> > > enclave code.
> > >
> > > To me KVM creates private EPC slot is cleaner than exposing
> > > /dev/sgx/epc and allowing userspace to map some raw EPC region.
> >
> > Cleaner in the sense that it's faster to get basic support up and running since
> > there are fewer touchpoints, but there are long term ramifications to
> > cramming EPC management in KVM.
> >
> > And at this point I'm not stating any absolutes, e.g. how EPC will be handled
> > by KVM. What I'm pushing for is to not eliminate the possibility of having
> > the SGX subsystem own all EPC management, e.g. don't tie /dev/sgx to a
> > single enclave.
>
> I suppose "SGX subsystem" you mean here is core SGX code under x86. IMHO
> EPC should always be managed by such SGX subsystem, and KVM and SGX driver
> are just consumers (ie, calling EPC allocation function, etc).
>
> IMHO I think /dev/sgx (or whatever /dev/sgx/xxx) should be in SGX driver,
> but not SGX core code. For example, if we don't consider KVM EPC
> oversubscription here, theoretically we only need below code in core SGX
> code to make KVM SGX work:
>
> 1) SGX feature detection. There should be some core structure (such as
> boot_cpu_data, etc) where KVM can query SGX capabilities, ie, whether FLC
> is available.
> 2) EPC management. KVM simply calls EPC management APIs when it needs. For
> example, when EPC slot is created, we should populate all EPC pages, and
> fail to create VM if running out of EPC.
> 3) Other code to deal with ENCLS, SGX data structure, etc, since we have
> agreed even with EPC static allocation, we should trap EINIT.
>
> We probably even don't need code to deal with enclave, if only for KVM. Of
> course, in order to support KVM EPC oversubscription, we should also include
> enclave management code in core SGX code, but this doesn't necessary mean we
> should include /dev/sgx/xxx in core SGX code.
>
> It seems there still are lots of design issues we haven't got consensus in
> terms of how SGX driver should be (or how SGX stack is supposed to work),
> but if we can focus on what core SGX code should really have (w/o involving
> SGX driver logic), we can at least get KVM SGX code working. After all, we
> also have window SGX support.

I'm all for getting KVM support in ASAP, i.e. I'd love to avoid having to
wait for "full" SGX support, but that doesn't obviate the need to hammer
out the uapi. Taking a shortcut and shoving everything into KVM could bite
us in the long run.

2019-01-10 00:23:44

by Kai Huang

[permalink] [raw]
Subject: RE: x86/sgx: uapi change proposal

> > > That's possible, but it has several downsides.
> > >
> > > - Duplicates a lot of code in KVM for managing memory regions.
> >
> > I don't see why there will be duplicated code. you can simply call
> > __x86_set_memory_region to create private slot. It is KVM x86
> > equivalent to KVM_SET_USER_MEMORY_REGION from userspace. The only
> > difference is Qemu is not aware of the private slot.
>
> What about when we allow removing an EPC region? At that point you'd be
> fully duplicating KVM_SET_USER_MEMORY_REGION. And that's not a purely
> theoretical idea, it's something I'd like to support in the future, e.g.
> via the WIP virtio-mem interface.

OK. Isn't virtio-balloon good enough for us?

Removing EPC is not consistent with hardware behaviour, but if you really want to support it then it should also be fine, since we are not strictly following the HW spec anyway.

>
> https://events.linuxfoundation.org/wp-content/uploads/2017/12/virtio-
> mem-Paravirtualized-Memory-David-Hildenbrand-Red-Hat-1.pdf
>
> >
> > > - Artificially restricts userspace to a single EPC region, unless
> > > even more code is duplicated to handle multiple private regions.
> >
> > You can have multiple private slots, by calling
> > __x86_set_memory_region for each EPC section. KVM receives EPC
> > section/sections info from Qemu, via CPUID, or dedicated IOCTL (is
> > this you are going to add?), and simply creates private EPC slot/slots.
>
> This would require a dynamic number of private memslots, which breaks (or
> at least changes) the semantics of KVM_CAP_NR_MEMSLOTS.

You are right. I forgot this one.

>
> >
> > > - Requires additional ioctls() or capabilities to probe EPC
> > > support
> >
> > No. EPC info is from Qemu at the beginning (size is given by
> > parameter, base is calculated by Qemu), and actually it is Qemu
> > notifies KVM EPC info, so I don't think we require additional ioctls or
> capabilities here.
>
> How does Qemu know KVM supports virtual EPC? Probing /dev/sgx doesn't
> convey any information about KVM support. Maybe you could report it via
> KVM_GET_SUPPORTED_CPUID, but that would be problematic for Qemu
> since it would have to create vCPUs before initializing the machine.

KVM_GET_SUPPORTED_CPUID is the one. I don't think KVM_GET_SUPPORTED_CPUID requires creating a vCPU first, since it reports a global thing that the platform supports. No?

>
> >
> > > - Does not fit with Qemu/KVM's memory model, e.g. all other types of
> > > memory are exposed to a guest through
> > > KVM_SET_USER_MEMORY_REGION.
> >
> > EPC is different. I am not sure whether EPC needs to fit such model.
> > There are already examples in KVM which uses private slot w/o using
> > KVM_SET_USER_MEMORY_REGION, for example, APIC access page.
>
> EPC has unique access and lifecycle semantics, but that doesnt make it a
> special snowflake, e.g. NVDIMM has special properties, as does memory that
> is encrypted via SME or MKTME, and so on and so forth.
>
> The private memslots are private for a reason, e.g. the guest is completely
> unaware that they exist and Qemu is only aware of their existence because
> KVM needs userspace to tell it what GPA range won't conflict with its
> memory model. And in the APIC access page case, Qemu isn't aware at all
> since KVM doesn't allow relocating the guest's APIC.
>
> The other aspect of private memslots is that they are not exposed to L2,
> because again, from the guest's perspective, they do not exist. We can
> obviously hackaround that restriction, but it's yet another hint that shoving
> EPC into a private memslot is the wrong approach.

But the guest is aware of SGX and EPC, so I don't see why it cannot be exposed to L2 even with a private slot.

But that doesn't matter. I agree with you that letting Qemu create EPC slots is probably better, since we can support multiple EPC sections better (and potentially EPC removal if needed).

And

[snip]

>
> I'm all for getting KVM support in ASAP, i.e. I'd love to avoid having to wait
> for "full" SGX support, but that doesn't obviate the need to hammer out the
> uapi. Taking a shortcut and shoving everything into KVM could bite us in the
> long run.

I agree.

Thanks,
-Kai



2019-01-10 00:43:31

by Sean Christopherson

[permalink] [raw]
Subject: Re: x86/sgx: uapi change proposal

On Wed, Jan 09, 2019 at 04:21:19PM -0800, Huang, Kai wrote:
> > > > That's possible, but it has several downsides.
> > > >
> > > > - Duplicates a lot of code in KVM for managing memory regions.
> > >
> > > I don't see why there will be duplicated code. you can simply call
> > > __x86_set_memory_region to create private slot. It is KVM x86
> > > equivalent to KVM_SET_USER_MEMORY_REGION from userspace. The only
> > > difference is Qemu is not aware of the private slot.
> >
> > What about when we allow removing an EPC region? At that point you'd be
> > fully duplicating KVM_SET_USER_MEMORY_REGION. And that's not a purely
> > theoretical idea, it's something I'd like to support in the future, e.g.
> > via the WIP virtio-mem interface.
>
> OK. Isn't virtio-balloon good enough for us?
>
> Removing EPC is not consistent with hardware behaviour, but if you really
> want to support then should also be fine since we are not strictly
> following HW spec anyway.

Even if virtio-balloon is sufficient from a performance perspective, the
virtio-mem approach can provide unique capabilities, e.g. hotplugging
EPC into a VM or "I'm done with SGX, have all of my EPC back".

> >
> > https://events.linuxfoundation.org/wp-content/uploads/2017/12/virtio-
> > mem-Paravirtualized-Memory-David-Hildenbrand-Red-Hat-1.pdf
> >
> > > No. EPC info is from Qemu at the beginning (size is given by
> > > parameter, base is calculated by Qemu), and actually it is Qemu
> > > notifies KVM EPC info, so I don't think we require additional ioctls or
> > capabilities here.
> >
> > How does Qemu know KVM supports virtual EPC? Probing /dev/sgx doesn't
> > convey any information about KVM support. Maybe you could report it via
> > KVM_GET_SUPPORTED_CPUID, but that would be problematic for Qemu
> > since it would have to create vCPUs before initializing the machine.
>
> KVM_GET_SUPPORTED_CPUID is the one. I don't think KVM_GET_SUPPORTED_CPUID
> require creating vcpu prior, since it is global thing that platform supports. No?

It's not a KVM requirement, but rather Qemu's code flow. It sets up the
"machine" and then creates the vCPUs. The CPUID info is a CPU capability
and so isn't necessarily available when the "machine" is being created.
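
To illustrate the mechanics only (whether KVM would actually advertise virtual EPC through CPUID leaf 0x12 is exactly what's being debated here), KVM_GET_SUPPORTED_CPUID itself needs nothing more than the /dev/kvm system fd:

#include <fcntl.h>
#include <stdlib.h>
#include <sys/ioctl.h>
#include <linux/kvm.h>

static int kvm_reports_sgx_leaf(void)
{
        struct kvm_cpuid2 *cpuid;
        unsigned int i;
        int kvm_fd, found = 0;

        kvm_fd = open("/dev/kvm", O_RDWR);
        if (kvm_fd < 0)
                return 0;

        cpuid = calloc(1, sizeof(*cpuid) + 128 * sizeof(struct kvm_cpuid_entry2));
        if (!cpuid)
                return 0;
        cpuid->nent = 128;

        /* System-scope ioctl: no VM or vCPU has been created at this point. */
        if (!ioctl(kvm_fd, KVM_GET_SUPPORTED_CPUID, cpuid)) {
                for (i = 0; i < cpuid->nent; i++) {
                        if (cpuid->entries[i].function == 0x12)
                                found = 1;
                }
        }

        free(cpuid);
        return found;
}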

> >
> > The other aspect of private memslots is that they are not exposed to L2,
> > because again, from the guest's perspective, they do not exist. We can
> > obviously hackaround that restriction, but it's yet another hint that shoving
> > EPC into a private memslot is the wrong approach.
>
> But guest is aware of SGX and EPC so I don't see why it cannot be exposed
> to L2 even with private slot.

It's not that it can't be done, but rather we'd have to explicitly tell
KVM "this private slot isn't really private, expose it to L2". See
commit 3a2936dedd20 ("kvm: mmu: Don't expose private memslots to L2").

2019-01-10 17:45:24

by Jarkko Sakkinen

[permalink] [raw]
Subject: Re: x86/sgx: uapi change proposal

On Tue, Jan 08, 2019 at 07:27:11PM +0000, Huang, Kai wrote:
> > >
> > > Can one of you explain why SGX_ENCLAVE_CREATE is better than just
> > > opening a new instance of /dev/sgx for each encalve?
> >
> > Directly associating /dev/sgx with an enclave means /dev/sgx can't be used
> > to provide ioctl()'s for other SGX-related needs, e.g. to mmap() raw EPC and
> > expose it a VM. Proposed layout in the link below. I'll also respond to
> > Jarkko's question about exposing EPC through /dev/sgx instead of having
> > KVM allocate it on behalf of the VM.
> >
> > https://lkml.kernel.org/r/[email protected]
>
> Hi Sean,
>
> Sorry for replying to old email. But IMHO it is not a must that Qemu needs to open some /dev/sgx and allocate/mmap EPC for guest's virtual EPC slot, instead, KVM could create private slot, which is not visible to Qemu, for virtual EPC, and KVM could call core-SGX EPC allocation API directly.
>
> I am not sure what's the good of allowing userspace to alloc/mmap a raw EPC region? Userspace is not allowed to touch EPC anyway, expect enclave code.
>
> To me KVM creates private EPC slot is cleaner than exposing /dev/sgx/epc and allowing userspace to map some raw EPC region.
>
> Thanks,
> -Kai

Side-note: this particular use case does not necessarily have to be
solved in the first upstream patch set, does it? Just trying to keep
the patch set as small as possible (it will still be a huge one).

/Jarkko

2019-01-10 17:48:09

by Jarkko Sakkinen

[permalink] [raw]
Subject: Re: x86/sgx: uapi change proposal

On Tue, Jan 08, 2019 at 02:54:11PM -0800, Andy Lutomirski wrote:
> I do think it makes sense to have QEMU delegate the various ENCLS
> operations (especially EINIT) to the regular SGX interface, which will
> mean that VM guests will have exactly the same access controls applied
> as regular user programs, which is probably what we want. If so,
> there will need to be a way to get INITTOKEN privilege for the purpose
> of running non-Linux OSes in the VM, which isn't the end of the world.
> We might still want the actual ioctl to do EINIT using an actual
> explicit token to be somehow restricted in a way that strongly
> discourages its use by anything other than a hypervisor. Or I suppose
> we could just straight-up ignore the guest-provided init token.

Does it even matter if we just leave the EINITTOKENKEY attribute
unprivileged, given that Linux requires that the MSRs are writable?
Maybe I'll just whitelist that attribute for any enclave?

/Jarkko

2019-01-10 22:05:05

by Andy Lutomirski

[permalink] [raw]
Subject: Re: x86/sgx: uapi change proposal

>> On Jan 9, 2019, at 8:31 AM, Sean Christopherson <[email protected]> wrote:
>>
>> On Tue, Jan 08, 2019 at 02:54:11PM -0800, Andy Lutomirski wrote:
>> On Tue, Jan 8, 2019 at 2:09 PM Sean Christopherson
>> <[email protected]> wrote:
>>>
>>> Cleaner in the sense that it's faster to get basic support up and running
>>> since there are fewer touchpoints, but there are long term ramifications
>>> to cramming EPC management in KVM.
>>>
>>> And at this point I'm not stating any absolutes, e.g. how EPC will be
>>> handled by KVM. What I'm pushing for is to not eliminate the possibility
>>> of having the SGX subsystem own all EPC management, e.g. don't tie
>>> /dev/sgx to a single enclave.
>>
>> I haven't gone and re-read all the relevant SDM bits, so I'll just
>> ask: what, if anything, are the actual semantics of mapping "raw EPC"
>> like this? You can't actually do anything with the mapping from user
>> mode unless you actually get an enclave created and initialized in it
>> and have it mapped at the correct linear address, right? I still
>> think you have the right idea, but it is a bit unusual.
>
> Correct, the EPC is inaccessible until a range is "mapped" with ECREATE.
> But I'd argue that it's not unusual, just different. And really it's not
> all that different than userspace mmap'ing /dev/sgx/enclave prior to
> ioctl(ENCLAVE_CREATE). In that case, userspace can still (attempt to)
> access the "raw" EPC, i.e. generate a #PF, the kernel/driver just happens
> to consider any faulting EPC address without an associated enclave as
> illegal, e.g. signals SIGBUS.
>
> The /dev/sgx/epc case simply has different semantics for moving pages in
> and out of the EPC, i.e. different fault and eviction semantics. Yes,
> this allows the guest kernel to directly access the "raw" EPC, but that's
> conceptually in line with hardware where priveleged software can directly
> "access" the EPC (or rather, the abort page for all intents and purposes).
> I.e. it's an argument for requiring certain privileges to open /dev/sgx/epc,
> but IMO it's not unusual.
>
> Maybe /dev/sgx/epc is a poor name and is causing confusion, e.g.
> /dev/sgx/virtualmachine might be more appropriate.
>
>> I do think it makes sense to have QEMU delegate the various ENCLS
>> operations (especially EINIT) to the regular SGX interface, which will
>> mean that VM guests will have exactly the same access controls applied
>> as regular user programs, which is probably what we want.
>
> To what end? Except for EINIT, none of the ENCLS leafs are interesting
> from a permissions perspective. Trapping and re-executing ENCLS leafs
> is painful, e.g. most leafs have multiple virtual addresses that need to
> be translated. And routing everything through the regular interface
> would make SGX even slower than it already is, e.g. every ENCLS would
> take an additional ~900 cycles just to handle the VM-Exit, and that's
> not accounting for any additional overhead in the SGX code, e.g. using
> the regular interface would mean superfluous locks, etc...

Trapping EINIT is what I have in mind.

>
> Couldn't we require the same privilege/capability for VMs and and EINIT
> tokens? I.e. /dev/sgx/virtualmachine can only be opened by a user that
> can also generate tokens.

Hmm, maybe. Or we can use Jarkko’s securityfs attribute thingy.

Concretely, I think there are two things we care about:

First, if the host enforces some policy as to which enclaves can
launch, then it should apply the same policy to guests — otherwise KVM
lets programs do an end run around the policy. So, in the initial
incarnation of this, QEMU should probably have to open the provision
attribute fd if it wants its guest to be able to EINIT a provisioning
enclave. When someone inevitably adds an EINIT LSM hook, the KVM
interface should also call it.

Second, the normal enclave interface won't allow user code to supply
an EINITTOKEN, so the KVM interface will presumably need to be
different, unless we're going to emulate EINIT by ignoring the token.
That seems like a very strange thing to do.

2019-01-10 22:05:29

by Andy Lutomirski

[permalink] [raw]
Subject: Re: x86/sgx: uapi change proposal

On Thu, Jan 10, 2019 at 9:46 AM Jarkko Sakkinen
<[email protected]> wrote:
>
> On Tue, Jan 08, 2019 at 02:54:11PM -0800, Andy Lutomirski wrote:
> > I do think it makes sense to have QEMU delegate the various ENCLS
> > operations (especially EINIT) to the regular SGX interface, which will
> > mean that VM guests will have exactly the same access controls applied
> > as regular user programs, which is probably what we want. If so,
> > there will need to be a way to get INITTOKEN privilege for the purpose
> > of running non-Linux OSes in the VM, which isn't the end of the world.
> > We might still want the actual ioctl to do EINIT using an actual
> > explicit token to be somehow restricted in a way that strongly
> > discourages its use by anything other than a hypervisor. Or I suppose
> > we could just straight-up ignore the guest-provided init token.
>
> Does it even matter if just leave EINITTOKENKEY attribute unprivileged
> given that Linux requires that MSRs are writable? Maybe I'll just
> whitelist that attribute to any enclave?
>

I would at least make it work like the PROVISIONKEY bit (or whatever
it's called). Or just deny it at first. It's easy to start allowing
it if we need to down the road, but it's harder to start denying it.

Also, I don't really want to see some SDK invoke a launch enclave
because that's how it works on Windows and then do nothing with the
resulting EINITTOKEN. If we don't allow it, then the SDKs will be
forced to do a better job, which is probably a good thing.

2019-01-11 00:57:51

by Andy Lutomirski

[permalink] [raw]
Subject: Re: x86/sgx: uapi change proposal

On Thu, Jan 10, 2019 at 3:54 PM Sean Christopherson
<[email protected]> wrote:
>
> On Thu, Jan 10, 2019 at 01:34:44PM -0800, Andy Lutomirski wrote:
> > >> On Jan 9, 2019, at 8:31 AM, Sean Christopherson <[email protected]> wrote:
> > >>
> > >> On Tue, Jan 08, 2019 at 02:54:11PM -0800, Andy Lutomirski wrote:
> > >
> > >> I do think it makes sense to have QEMU delegate the various ENCLS
> > >> operations (especially EINIT) to the regular SGX interface, which will
> > >> mean that VM guests will have exactly the same access controls applied
> > >> as regular user programs, which is probably what we want.
> > >
> > > To what end? Except for EINIT, none of the ENCLS leafs are interesting
> > > from a permissions perspective. Trapping and re-executing ENCLS leafs
> > > is painful, e.g. most leafs have multiple virtual addresses that need to
> > > be translated. And routing everything through the regular interface
> > > would make SGX even slower than it already is, e.g. every ENCLS would
> > > take an additional ~900 cycles just to handle the VM-Exit, and that's
> > > not accounting for any additional overhead in the SGX code, e.g. using
> > > the regular interface would mean superfluous locks, etc...
> >
> > Trapping EINIT is what I have in mind.
>
> Phew, had me worried :-)
>
> > >
> > > Couldn't we require the same privilege/capability for VMs and and EINIT
> > > tokens? I.e. /dev/sgx/virtualmachine can only be opened by a user that
> > > can also generate tokens.
> >
> > Hmm, maybe. Or we can use Jarkko’s securityfs attribute thingy.
> >
> > Concretely, I think there are two things we care about:
> >
> > First, if the host enforces some policy as to which enclaves can
> > launch, then it should apply the same policy to guests — otherwise KVM
> > lets programs do an end run around the policy. So, in the initial
> > incarnation of this, QEMU should probably have to open the provision
> > attribute fd if it wants its guest to be able to EINIT a provisioning
> > enclave. When someone inevitably adds an EINIT LSM hook, the KVM
> > interface should also call it.
>
> Sort of. A guest that is running under KVM (i.e. VMX) is much more
> contained than a random userspace program. A rogue enclave in a VMX
> guest can attack the guest kernel/OS, but barring a bug (or more likely,
> several major bugs) elsewhere in the virtualization stack the enclave
> can't do anything nasty to the host. An enclave would let someone hide
> code, but enclaves are even more restricted than cpl3, i.e. there's not
> a lot it can do without coordinating with unencrypted code in the guest.
>
> And if someone has sufficient permissions to run a KVM guest, they're
> much more likely to do something malcious in the guest kernel, not an
> enclave.

Are you sure? On my laptop, /dev/kvm is 0666, and that's the distro
default. I don't think this is at all unusual. I'm not particularly
concerned about a guest attacking itself, but it's conceptually
straightforward to bypass whatever restrictions the host has by simply
opening /dev/kvm and sticking your enclave in a VM.

>
> All that aside, I don't see any justification for singling out SGX for
> extra scrutiny, there are other ways for a user with KVM permissions to
> hide malicious code in guest (and at cpl0!), e.g. AMD's SEV{-ES}.

I'm not singling out SGX. I'm just saying that the KVM should not
magically bypass host policy. If you want to assign a virtual
function on your NIC to a KVM guest, you need to give your QEMU
process that privilege. Similarly, if someone has a MAC policy that
controls which processes can launch which enclaves and they want to
run Windows with full SGX support in a VM guest, then they should
authorize that in their MAC policy by giving QEMU unrestricted launch
privileges.

Similarly, if access to a persistent provisioning identifier is
restricted, access to /dev/kvm shouldn't magically bypass it. Just
give the QEMU process the relevant privileges.

2019-01-11 01:34:49

by Sean Christopherson

[permalink] [raw]
Subject: Re: x86/sgx: uapi change proposal

On Thu, Jan 10, 2019 at 04:30:06PM -0800, Andy Lutomirski wrote:
> On Thu, Jan 10, 2019 at 3:54 PM Sean Christopherson
> <[email protected]> wrote:
> >
> > Sort of. A guest that is running under KVM (i.e. VMX) is much more
> > contained than a random userspace program. A rogue enclave in a VMX
> > guest can attack the guest kernel/OS, but barring a bug (or more likely,
> > several major bugs) elsewhere in the virtualization stack the enclave
> > can't do anything nasty to the host. An enclave would let someone hide
> > code, but enclaves are even more restricted than cpl3, i.e. there's not
> > a lot it can do without coordinating with unencrypted code in the guest.
> >
> > And if someone has sufficient permissions to run a KVM guest, they're
> > much more likely to do something malcious in the guest kernel, not an
> > enclave.
>
> Are you sure? On my laptop, /dev/kvm is 0666, and that's the distro
> default. I don't think this is at all unusual.

Wow, that's surprising. A quick search suggests that this may be Debian
specific[1], e.g. my Ubuntu systems have:

crw-rw---- 1 root kvm 10, 232 Jan 9 09:30 /dev/kvm

[1] https://bugzilla.redhat.com/show_bug.cgi?id=1431876

> I'm not particularly
> concerned about a guest attacking itself, but it's conceptually
> straightforward to bypass whatever restrictions the host has by simply
> opening /dev/kvm and sticking your enclave in a VM.

VMs by nature allow a user to bypass all sorts of restrictions, e.g. the
kernel doesn't let userspace run arbitrary cpl0 code, but launch a VM
and voila. It's what you can do with the new privileges that matters.

> > All that aside, I don't see any justification for singling out SGX for
> > extra scrutiny, there are other ways for a user with KVM permissions to
> > hide malicious code in guest (and at cpl0!), e.g. AMD's SEV{-ES}.
>
> I'm not singling out SGX. I'm just saying that the KVM should not
> magically bypass host policy. If you want to assign a virtual
> function on your NIC to a KVM guest, you need to give your QEMU
> process that privilege. Similarly, if someone has a MAC policy that
> controls which processes can launch which enclaves and they want to
> run Windows with full SGX support in a VM guest, then they should
> authorize that in their MAC policy by giving QEMU unrestricted launch
> privileges.

MAC systems exist to protect assets, and IMO an enclave isn't an asset.
E.g. AppArmor (via LSM) isn't protecting files, it's protecting the
contents of the file or what can be done with the file. And the MAC
is only part of the overall protection scheme, e.g. userspace is also
relying on the kernel to not screw up the page tables.

In SGX terms, a LSM hook might use enclave signatures to protect some
asset 'X', e.g. access to persistent identifier. But that doesn't mean
that whitelisting enclave signatures is the only way to protect 'X'.

> Similarly, if access to a persistent provisioning identifier is
> restricted, access to /dev/kvm shouldn't magically bypass it. Just
> give the QEMU process the relevant privileges.

Agreed, but that's not the same as applying a host's whitelist against a
guest's enclaves.

2019-01-11 01:50:48

by Kai Huang

[permalink] [raw]
Subject: Re: x86/sgx: uapi change proposal

On Thu, 2019-01-10 at 13:34 -0800, Andy Lutomirski wrote:
> > > On Jan 9, 2019, at 8:31 AM, Sean Christopherson <[email protected]> wrote:
> > >
> > > On Tue, Jan 08, 2019 at 02:54:11PM -0800, Andy Lutomirski wrote:
> > > On Tue, Jan 8, 2019 at 2:09 PM Sean Christopherson
> > > <[email protected]> wrote:
> > > >
> > > > Cleaner in the sense that it's faster to get basic support up and running
> > > > since there are fewer touchpoints, but there are long term ramifications
> > > > to cramming EPC management in KVM.
> > > >
> > > > And at this point I'm not stating any absolutes, e.g. how EPC will be
> > > > handled by KVM. What I'm pushing for is to not eliminate the possibility
> > > > of having the SGX subsystem own all EPC management, e.g. don't tie
> > > > /dev/sgx to a single enclave.
> > >
> > > I haven't gone and re-read all the relevant SDM bits, so I'll just
> > > ask: what, if anything, are the actual semantics of mapping "raw EPC"
> > > like this? You can't actually do anything with the mapping from user
> > > mode unless you actually get an enclave created and initialized in it
> > > and have it mapped at the correct linear address, right? I still
> > > think you have the right idea, but it is a bit unusual.
> >
> > Correct, the EPC is inaccessible until a range is "mapped" with ECREATE.
> > But I'd argue that it's not unusual, just different. And really it's not
> > all that different than userspace mmap'ing /dev/sgx/enclave prior to
> > ioctl(ENCLAVE_CREATE). In that case, userspace can still (attempt to)
> > access the "raw" EPC, i.e. generate a #PF, the kernel/driver just happens
> > to consider any faulting EPC address without an associated enclave as
> > illegal, e.g. signals SIGBUS.
> >
> > The /dev/sgx/epc case simply has different semantics for moving pages in
> > and out of the EPC, i.e. different fault and eviction semantics. Yes,
> > this allows the guest kernel to directly access the "raw" EPC, but that's
> > conceptually in line with hardware where priveleged software can directly
> > "access" the EPC (or rather, the abort page for all intents and purposes).
> > I.e. it's an argument for requiring certain privileges to open /dev/sgx/epc,
> > but IMO it's not unusual.
> >
> > Maybe /dev/sgx/epc is a poor name and is causing confusion, e.g.
> > /dev/sgx/virtualmachine might be more appropriate.
> >
> > > I do think it makes sense to have QEMU delegate the various ENCLS
> > > operations (especially EINIT) to the regular SGX interface, which will
> > > mean that VM guests will have exactly the same access controls applied
> > > as regular user programs, which is probably what we want.
> >
> > To what end? Except for EINIT, none of the ENCLS leafs are interesting
> > from a permissions perspective. Trapping and re-executing ENCLS leafs
> > is painful, e.g. most leafs have multiple virtual addresses that need to
> > be translated. And routing everything through the regular interface
> > would make SGX even slower than it already is, e.g. every ENCLS would
> > take an additional ~900 cycles just to handle the VM-Exit, and that's
> > not accounting for any additional overhead in the SGX code, e.g. using
> > the regular interface would mean superfluous locks, etc...
>
> Trapping EINIT is what I have in mind.
>
> >
> > Couldn't we require the same privilege/capability for VMs and and EINIT
> > tokens? I.e. /dev/sgx/virtualmachine can only be opened by a user that
> > can also generate tokens.
>
> Hmm, maybe. Or we can use Jarkko’s securityfs attribute thingy.
>
> Concretely, I think there are two things we care about:
>
> First, if the host enforces some policy as to which enclaves can
> launch, then it should apply the same policy to guests — otherwise KVM
> lets programs do an end run around the policy. So, in the initial
> incarnation of this, QEMU should probably have to open the provision
> attribute fd if it wants its guest to be able to EINIT a provisioning
> enclave. When someone inevitably adds an EINIT LSM hook, the KVM
> interface should also call it.
>
> Second, the normal enclave interface won't allow user code to supply
> an EINITTOKEN, so the KVM interface will presumably need to be
> different, unless we're going to emulate EINIT by ignoring the token.
> That seems like a very strange thing to do.

Hi Andy,

IMHO applying policy to an enclave in a VM should be different from applying policy to an enclave in the
host. The host SGX SW stack should be able to run inside a VM without any modification, so for example, if
the host sets a policy that we cannot run an LE (except the LE in the host), then basically we are disabling
SGX in the VM. In general KVM SGX is supposed to run all guest OSes with SGX. And for the provisioning
enclave, do you see any reason that we need to disallow running it inside a VM?

Maybe some more general questions: What policy/policies should we have in the host? Should they be in the
core SGX code, or should they belong to the SGX driver's scope? Do we need to figure out all of them, and
how to control them, before we can actually think about upstreaming virtualization support?

Thanks,
-Kai

2019-01-11 02:40:38

by Sean Christopherson

[permalink] [raw]
Subject: Re: x86/sgx: uapi change proposal

On Thu, Jan 10, 2019 at 01:34:44PM -0800, Andy Lutomirski wrote:
> >> On Jan 9, 2019, at 8:31 AM, Sean Christopherson <[email protected]> wrote:
> >>
> >> On Tue, Jan 08, 2019 at 02:54:11PM -0800, Andy Lutomirski wrote:
> >
> >> I do think it makes sense to have QEMU delegate the various ENCLS
> >> operations (especially EINIT) to the regular SGX interface, which will
> >> mean that VM guests will have exactly the same access controls applied
> >> as regular user programs, which is probably what we want.
> >
> > To what end? Except for EINIT, none of the ENCLS leafs are interesting
> > from a permissions perspective. Trapping and re-executing ENCLS leafs
> > is painful, e.g. most leafs have multiple virtual addresses that need to
> > be translated. And routing everything through the regular interface
> > would make SGX even slower than it already is, e.g. every ENCLS would
> > take an additional ~900 cycles just to handle the VM-Exit, and that's
> > not accounting for any additional overhead in the SGX code, e.g. using
> > the regular interface would mean superfluous locks, etc...
>
> Trapping EINIT is what I have in mind.

Phew, had me worried :-)

> >
> > Couldn't we require the same privilege/capability for VMs and and EINIT
> > tokens? I.e. /dev/sgx/virtualmachine can only be opened by a user that
> > can also generate tokens.
>
> Hmm, maybe. Or we can use Jarkko’s securityfs attribute thingy.
>
> Concretely, I think there are two things we care about:
>
> First, if the host enforces some policy as to which enclaves can
> launch, then it should apply the same policy to guests — otherwise KVM
> lets programs do an end run around the policy. So, in the initial
> incarnation of this, QEMU should probably have to open the provision
> attribute fd if it wants its guest to be able to EINIT a provisioning
> enclave. When someone inevitably adds an EINIT LSM hook, the KVM
> interface should also call it.

Sort of. A guest that is running under KVM (i.e. VMX) is much more
contained than a random userspace program. A rogue enclave in a VMX
guest can attack the guest kernel/OS, but barring a bug (or more likely,
several major bugs) elsewhere in the virtualization stack the enclave
can't do anything nasty to the host. An enclave would let someone hide
code, but enclaves are even more restricted than cpl3, i.e. there's not
a lot it can do without coordinating with unencrypted code in the guest.

And if someone has sufficient permissions to run a KVM guest, they're
much more likely to do something malicious in the guest kernel, not an
enclave.

All that aside, I don't see any justification for singling out SGX for
extra scrutiny, there are other ways for a user with KVM permissions to
hide malicious code in guest (and at cpl0!), e.g. AMD's SEV{-ES}.

> Second, the normal enclave interface won't allow user code to supply
> an EINITTOKEN, so the KVM interface will presumably need to be
> different, unless we're going to emulate EINIT by ignoring the token.
> That seems like a very strange thing to do.

2019-01-11 14:20:50

by Jarkko Sakkinen

[permalink] [raw]
Subject: Re: x86/sgx: uapi change proposal

On Wed, Jan 09, 2019 at 08:31:37AM -0800, Sean Christopherson wrote:
> On Tue, Jan 08, 2019 at 02:54:11PM -0800, Andy Lutomirski wrote:
> > On Tue, Jan 8, 2019 at 2:09 PM Sean Christopherson
> > <[email protected]> wrote:
> > >
> > > Cleaner in the sense that it's faster to get basic support up and running
> > > since there are fewer touchpoints, but there are long term ramifications
> > > to cramming EPC management in KVM.
> > >
> > > And at this point I'm not stating any absolutes, e.g. how EPC will be
> > > handled by KVM. What I'm pushing for is to not eliminate the possibility
> > > of having the SGX subsystem own all EPC management, e.g. don't tie
> > > /dev/sgx to a single enclave.
> >
> > I haven't gone and re-read all the relevant SDM bits, so I'll just
> > ask: what, if anything, are the actual semantics of mapping "raw EPC"
> > like this? You can't actually do anything with the mapping from user
> > mode unless you actually get an enclave created and initialized in it
> > and have it mapped at the correct linear address, right? I still
> > think you have the right idea, but it is a bit unusual.
>
> Correct, the EPC is inaccessible until a range is "mapped" with ECREATE.
> But I'd argue that it's not unusual, just different. And really it's not
> all that different than userspace mmap'ing /dev/sgx/enclave prior to
> ioctl(ENCLAVE_CREATE). In that case, userspace can still (attempt to)
> access the "raw" EPC, i.e. generate a #PF, the kernel/driver just happens
> to consider any faulting EPC address without an associated enclave as
> illegal, e.g. signals SIGBUS.
>
> The /dev/sgx/epc case simply has different semantics for moving pages in
> and out of the EPC, i.e. different fault and eviction semantics. Yes,
> this allows the guest kernel to directly access the "raw" EPC, but that's
> conceptually in line with hardware where priveleged software can directly
> "access" the EPC (or rather, the abort page for all intents and purposes).
> I.e. it's an argument for requiring certain privileges to open /dev/sgx/epc,
> but IMO it's not unusual.
>
> Maybe /dev/sgx/epc is a poor name and is causing confusion, e.g.
> /dev/sgx/virtualmachine might be more appropriate.

What do you mean by "requiring certain privileges"? Are you saying
that the "raw EPC" device (let's say /dev/vmsgx, which is probably the
name I will use for the device *if* it is required) would require
different privileges than /dev/sgx?

/Jarkko

2019-01-11 17:12:25

by Jarkko Sakkinen

[permalink] [raw]
Subject: Re: x86/sgx: uapi change proposal

On Fri, Jan 11, 2019 at 02:58:26PM +0200, Jarkko Sakkinen wrote:
> On Wed, Jan 09, 2019 at 08:31:37AM -0800, Sean Christopherson wrote:
> > On Tue, Jan 08, 2019 at 02:54:11PM -0800, Andy Lutomirski wrote:
> > > On Tue, Jan 8, 2019 at 2:09 PM Sean Christopherson
> > > <[email protected]> wrote:
> > > >
> > > > Cleaner in the sense that it's faster to get basic support up and running
> > > > since there are fewer touchpoints, but there are long term ramifications
> > > > to cramming EPC management in KVM.
> > > >
> > > > And at this point I'm not stating any absolutes, e.g. how EPC will be
> > > > handled by KVM. What I'm pushing for is to not eliminate the possibility
> > > > of having the SGX subsystem own all EPC management, e.g. don't tie
> > > > /dev/sgx to a single enclave.
> > >
> > > I haven't gone and re-read all the relevant SDM bits, so I'll just
> > > ask: what, if anything, are the actual semantics of mapping "raw EPC"
> > > like this? You can't actually do anything with the mapping from user
> > > mode unless you actually get an enclave created and initialized in it
> > > and have it mapped at the correct linear address, right? I still
> > > think you have the right idea, but it is a bit unusual.
> >
> > Correct, the EPC is inaccessible until a range is "mapped" with ECREATE.
> > But I'd argue that it's not unusual, just different. And really it's not
> > all that different than userspace mmap'ing /dev/sgx/enclave prior to
> > ioctl(ENCLAVE_CREATE). In that case, userspace can still (attempt to)
> > access the "raw" EPC, i.e. generate a #PF, the kernel/driver just happens
> > to consider any faulting EPC address without an associated enclave as
> > illegal, e.g. signals SIGBUS.
> >
> > The /dev/sgx/epc case simply has different semantics for moving pages in
> > and out of the EPC, i.e. different fault and eviction semantics. Yes,
> > this allows the guest kernel to directly access the "raw" EPC, but that's
> > conceptually in line with hardware where priveleged software can directly
> > "access" the EPC (or rather, the abort page for all intents and purposes).
> > I.e. it's an argument for requiring certain privileges to open /dev/sgx/epc,
> > but IMO it's not unusual.
> >
> > Maybe /dev/sgx/epc is a poor name and is causing confusion, e.g.
> > /dev/sgx/virtualmachine might be more appropriate.
>
> What do you mean by saying "requiring certain privileges"? Are you
> saying that "raw EPC" (lets say /dev/vmsgx, which probably the name I
> will use for the device *if* it is required) device would require
> differet privileged than /dev/sgx?

I'd say that it would be a good goal to be able to create a VM as a
normal user and use SGX.

/Jarkko

2019-01-11 20:52:41

by Jarkko Sakkinen

[permalink] [raw]
Subject: Re: x86/sgx: uapi change proposal

On Thu, Jan 10, 2019 at 01:36:15PM -0800, Andy Lutomirski wrote:
> > Does it even matter if just leave EINITTOKENKEY attribute unprivileged
> > given that Linux requires that MSRs are writable? Maybe I'll just
> > whitelist that attribute to any enclave?
> >
>
> I would at least make it work like the PROVISIONKEY bit (or whatever
> it's called). Or just deny it at first. It's easy to start allowing
> it if we need to down the road, but it's harder to start denying it.

I think it would be a great idea to add another file to securityfs
for this. It would fit perfectly with your "systemd privilege sharing"
daemon example. Consistency here would be really nice.

/Jarkko

2019-01-11 23:21:18

by Sean Christopherson

[permalink] [raw]
Subject: Re: x86/sgx: uapi change proposal

On Fri, Jan 11, 2019 at 02:58:26PM +0200, Jarkko Sakkinen wrote:
> On Wed, Jan 09, 2019 at 08:31:37AM -0800, Sean Christopherson wrote:
> > On Tue, Jan 08, 2019 at 02:54:11PM -0800, Andy Lutomirski wrote:
> > > On Tue, Jan 8, 2019 at 2:09 PM Sean Christopherson
> > > <[email protected]> wrote:
> > > >
> > > > Cleaner in the sense that it's faster to get basic support up and running
> > > > since there are fewer touchpoints, but there are long term ramifications
> > > > to cramming EPC management in KVM.
> > > >
> > > > And at this point I'm not stating any absolutes, e.g. how EPC will be
> > > > handled by KVM. What I'm pushing for is to not eliminate the possibility
> > > > of having the SGX subsystem own all EPC management, e.g. don't tie
> > > > /dev/sgx to a single enclave.
> > >
> > > I haven't gone and re-read all the relevant SDM bits, so I'll just
> > > ask: what, if anything, are the actual semantics of mapping "raw EPC"
> > > like this? You can't actually do anything with the mapping from user
> > > mode unless you actually get an enclave created and initialized in it
> > > and have it mapped at the correct linear address, right? I still
> > > think you have the right idea, but it is a bit unusual.
> >
> > Correct, the EPC is inaccessible until a range is "mapped" with ECREATE.
> > But I'd argue that it's not unusual, just different. And really it's not
> > all that different than userspace mmap'ing /dev/sgx/enclave prior to
> > ioctl(ENCLAVE_CREATE). In that case, userspace can still (attempt to)
> > access the "raw" EPC, i.e. generate a #PF, the kernel/driver just happens
> > to consider any faulting EPC address without an associated enclave as
> > illegal, e.g. signals SIGBUS.
> >
> > The /dev/sgx/epc case simply has different semantics for moving pages in
> > and out of the EPC, i.e. different fault and eviction semantics. Yes,
> > this allows the guest kernel to directly access the "raw" EPC, but that's
> > conceptually in line with hardware where privileged software can directly
> > "access" the EPC (or rather, the abort page for all intents and purposes).
> > I.e. it's an argument for requiring certain privileges to open /dev/sgx/epc,
> > but IMO it's not unusual.
> >
> > Maybe /dev/sgx/epc is a poor name and is causing confusion, e.g.
> > /dev/sgx/virtualmachine might be more appropriate.
>
> What do you mean by saying "requiring certain privileges"? Are you
> saying that "raw EPC" (lets say /dev/vmsgx, which probably the name I
> will use for the device *if* it is required) device would require
> differet privileged than /dev/sgx?

I don't think it would be mandatory, especially if PROVISION and EINITTOKEN
attributes are routed through securityfs, but it might be nice to have
since the functionality provided by /dev/vmsgx would be different from
that of /dev/sgx.

Side topic, what's the reasoning for doing /dev/sgx and /dev/vmsgx instead
of /dev/sgx/{enclave,vm,etc...}?
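
For comparison, a hedged userspace sketch of how a VM-facing device could
back guest EPC: the host mmaps the raw EPC region and hands it to KVM as
ordinary guest memory. The /dev/vmsgx path is just the name floated above,
and the absence of any setup ioctl is purely an assumption; only
KVM_SET_USER_MEMORY_REGION is a real interface here:

        /*
         * Hypothetical sketch of backing guest EPC with a raw-EPC device.
         * The device path and the "just mmap it" model are assumptions
         * drawn from the discussion above.
         */
        #include <fcntl.h>
        #include <string.h>
        #include <sys/ioctl.h>
        #include <sys/mman.h>
        #include <linux/kvm.h>

        int back_guest_epc(int vm_fd, __u64 guest_epc_base, __u64 epc_size,
                           __u32 slot)
        {
                int epc_fd = open("/dev/vmsgx", O_RDWR);
                if (epc_fd < 0)
                        return -1;

                /* Host mapping of raw EPC; faults are resolved by the driver. */
                void *epc = mmap(NULL, epc_size, PROT_READ | PROT_WRITE,
                                 MAP_SHARED, epc_fd, 0);
                if (epc == MAP_FAILED)
                        return -1;

                struct kvm_userspace_memory_region region;
                memset(&region, 0, sizeof(region));
                region.slot = slot;
                region.guest_phys_addr = guest_epc_base;
                region.memory_size = epc_size;
                region.userspace_addr = (__u64)epc;

                /* Expose the mapping to the guest as its EPC section. */
                return ioctl(vm_fd, KVM_SET_USER_MEMORY_REGION, &region);
        }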

2019-01-18 14:39:20

by Jarkko Sakkinen

[permalink] [raw]
Subject: Re: x86/sgx: uapi change proposal

On Fri, Jan 11, 2019 at 03:19:56PM -0800, Sean Christopherson wrote:
> On Fri, Jan 11, 2019 at 02:58:26PM +0200, Jarkko Sakkinen wrote:
> > On Wed, Jan 09, 2019 at 08:31:37AM -0800, Sean Christopherson wrote:
> > > On Tue, Jan 08, 2019 at 02:54:11PM -0800, Andy Lutomirski wrote:
> > > > On Tue, Jan 8, 2019 at 2:09 PM Sean Christopherson
> > > > <[email protected]> wrote:
> > > > >
> > > > > Cleaner in the sense that it's faster to get basic support up and running
> > > > > since there are fewer touchpoints, but there are long term ramifications
> > > > > to cramming EPC management in KVM.
> > > > >
> > > > > And at this point I'm not stating any absolutes, e.g. how EPC will be
> > > > > handled by KVM. What I'm pushing for is to not eliminate the possibility
> > > > > of having the SGX subsystem own all EPC management, e.g. don't tie
> > > > > /dev/sgx to a single enclave.
> > > >
> > > > I haven't gone and re-read all the relevant SDM bits, so I'll just
> > > > ask: what, if anything, are the actual semantics of mapping "raw EPC"
> > > > like this? You can't actually do anything with the mapping from user
> > > > mode unless you actually get an enclave created and initialized in it
> > > > and have it mapped at the correct linear address, right? I still
> > > > think you have the right idea, but it is a bit unusual.
> > >
> > > Correct, the EPC is inaccessible until a range is "mapped" with ECREATE.
> > > But I'd argue that it's not unusual, just different. And really it's not
> > > all that different than userspace mmap'ing /dev/sgx/enclave prior to
> > > ioctl(ENCLAVE_CREATE). In that case, userspace can still (attempt to)
> > > access the "raw" EPC, i.e. generate a #PF, the kernel/driver just happens
> > > to consider any faulting EPC address without an associated enclave as
> > > illegal, e.g. signals SIGBUS.
> > >
> > > The /dev/sgx/epc case simply has different semantics for moving pages in
> > > and out of the EPC, i.e. different fault and eviction semantics. Yes,
> > > this allows the guest kernel to directly access the "raw" EPC, but that's
> > > conceptually in line with hardware where privileged software can directly
> > > "access" the EPC (or rather, the abort page for all intents and purposes).
> > > I.e. it's an argument for requiring certain privileges to open /dev/sgx/epc,
> > > but IMO it's not unusual.
> > >
> > > Maybe /dev/sgx/epc is a poor name and is causing confusion, e.g.
> > > /dev/sgx/virtualmachine might be more appropriate.
> >
> > What do you mean by saying "requiring certain privileges"? Are you
> > saying that "raw EPC" (lets say /dev/vmsgx, which probably the name I
> > will use for the device *if* it is required) device would require
> > differet privileged than /dev/sgx?
>
> I don't think it would be mandatory, especially if PROVISION and EINITTOKEN
> attributes are routed through securityfs, but it might be nice to have
> since the functionality provided by /dev/vmsgx would be different from
> that of /dev/sgx.
>
> Side topic, what's the reasoning for doing /dev/sgx and /dev/vmsgx instead
> of /dev/sgx/{enclave,vm,etc...}?

I don't see us having more than two devices.

Directory hierarchies would make sense if there was a variable number of
device nodes to initialize.

/Jarkko
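
Illustrative only: with miscdevice, the "two flat devices" versus
"directory" question largely reduces to the node name passed at
registration; a nodename containing '/' yields a devtmpfs subdirectory, as
vfio does with "vfio/vfio". The fops and names below are placeholders, not
the proposed driver:

        #include <linux/fs.h>
        #include <linux/init.h>
        #include <linux/miscdevice.h>
        #include <linux/module.h>

        static const struct file_operations sgx_encl_fops;      /* placeholder */
        static const struct file_operations sgx_vm_fops;        /* placeholder */

        static struct miscdevice sgx_encl_dev = {
                .minor    = MISC_DYNAMIC_MINOR,
                .name     = "sgx_enclave",
                .nodename = "sgx/enclave",      /* -> /dev/sgx/enclave */
                .fops     = &sgx_encl_fops,
        };

        static struct miscdevice sgx_vm_dev = {
                .minor    = MISC_DYNAMIC_MINOR,
                .name     = "vmsgx",
                .nodename = "vmsgx",            /* flat alternative: /dev/vmsgx */
                .fops     = &sgx_vm_fops,
        };

        static int __init sgx_dev_init(void)
        {
                int ret = misc_register(&sgx_encl_dev);

                if (ret)
                        return ret;

                ret = misc_register(&sgx_vm_dev);
                if (ret)
                        misc_deregister(&sgx_encl_dev);
                return ret;
        }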