2020-04-28 15:28:03

by Joerg Roedel

[permalink] [raw]
Subject: [PATCH v3 00/75] x86: SEV-ES Guest Support

Hi,

here is the next version of changes to enable Linux to run as an SEV-ES
guest. The code was rebased to v5.7-rc3 and got a fair number of changes
since the last version.

What is SEV-ES
==============

SEV-ES is an acronym for 'Secure Encrypted Virtualization - Encrypted
State' and means a hardware feature of AMD processors which hides the
register state of VCPUs to the hypervisor by encrypting it. The
hypervisor can't read or make changes to the guests register state.

Most intercepts set by the hypervisor do not cause a #VMEXIT of the
guest anymore, but turn into a VMM Communication Exception (#VC
exception, vector 29) inside the guest. The error-code of this exception
is the intercept exit-code that caused the exception. The guest handles
the #VC exception by communicating with the hypervisor through a shared
data structure, the 'Guest-Hypervisor-Communication-Block'.

With SEV-ES an untrusted hypervisor can no longer steal secrets from a
guest via inspecting guest memory or guest register contents. A
malicious hypervisor can still interfere with guest operations, but the
SEV-ES client code does its best to detect such situations and crash the
VM if it happens.

More information about the implementation details can be found in the
cover letter of the initial post of these patches:

https://lore.kernel.org/lkml/[email protected]/

Current State of the Patches
============================

The patches posted here are considered feature complete and survived a
good amount of functional testing:

1) Booting an SEV-ES guest VM to the graphical desktop.

2) Running a 16-vcpu SEV-ES VM with 'perf top' and the x86
selftests shipped with the kernel source in a loop in
parallel for 18 hours showed no issues.

3) Various compile testing has been done, including
allno/allmod/allyes/defconfig for x86-64 and i386.

4) Compiled every single commit (single .config only) to check
if they build and do not introduce new warnings.

5) Boot-tested the changes on various machines, including
bare-metal on an AMD (with and without mem_encrypt=on) and
Intel machine.

A git-tree with these patches applied can be found here:

https://git.kernel.org/pub/scm/linux/kernel/git/joro/linux.git/log/?h=sev-es-client-v5.7-rc3

Changes to the previous version
===============================

Here is an incomplete list of changes to the previous version of this
patch-set. There have been so many small changes that I havn't kept
track of all, but the most important ones are listed:


- Rebased to v5.7-rc3

- Changes the #VC exception handler to use an IST stack. This
includes a couple of additional patches to set up and map the
IST stack, to make dumpstack aware of them and to fix a race
with the NMI handler when shifting the #VC handlers IST entry.

- The NMI_Complete message to the hypervisor the re-open the NMI
window is now sent at the very beginning of do_nmi().

- The GHCB used in the pre-decompression code is now re-mapped
encrypted and flushed from the cache before jumping to the
decompressed kernel image. This is needed to make sure the
running kernel can safely re-use the page. Actually the page
is also unmapped after being re-encrypted to catch any
use-after-remap.

- Added CPUID caching.

- Mapped the GHCBs to the EFI page-tables as the UEFI BIOS
expects to be able to re-use the OS GHCBs.

- RDTSC and RDTSCP are now also handled in the pre-decompression
boot code. These instructions are used by KASLR and some
hypervisors might intercept them.

- Re-implemented nested GHCB handling by keeping a backup GHCB
around. This supports one level of GHCB nesting, which is
sufficient for now.

- Moved all SEV-ES related per-cpu data into
'struct sev_es_runtime_data'. This struct is allocated and
initialized at boot per cpu.

- Correctly set the protocol and ghcb_usage information when
talking to the hypervisor.

The previous version of the patch-set can be found here:

https://lore.kernel.org/lkml/[email protected]/

Please review.


Thanks,

Joerg

Borislav Petkov (1):
KVM: SVM: Use __packed shorthand

Doug Covelli (1):
x86/vmware: Add VMware specific handling for VMMCALL under SEV-ES

Joerg Roedel (53):
KVM: SVM: Add GHCB Accessor functions
x86/traps: Move some definitions to <asm/trap_defs.h>
x86/insn: Make inat-tables.c suitable for pre-decompression code
x86/umip: Factor out instruction fetch
x86/umip: Factor out instruction decoding
x86/insn: Add insn_get_modrm_reg_off()
x86/insn: Add insn_rep_prefix() helper
x86/boot/compressed/64: Disable red-zone usage
x86/boot/compressed/64: Switch to __KERNEL_CS after GDT is loaded
x86/boot/compressed/64: Add IDT Infrastructure
x86/boot/compressed/64: Rename kaslr_64.c to ident_map_64.c
x86/boot/compressed/64: Add page-fault handler
x86/boot/compressed/64: Always switch to own page-table
x86/boot/compressed/64: Don't pre-map memory in KASLR code
x86/boot/compressed/64: Change add_identity_map() to take start and
end
x86/boot/compressed/64: Add stage1 #VC handler
x86/boot/compressed/64: Call set_sev_encryption_mask earlier
x86/boot/compressed/64: Check return value of
kernel_ident_mapping_init()
x86/boot/compressed/64: Add set_page_en/decrypted() helpers
x86/boot/compressed/64: Setup GHCB Based VC Exception handler
x86/boot/compressed/64: Unmap GHCB page before booting the kernel
x86/fpu: Move xgetbv()/xsetbv() into separate header
x86/idt: Move IDT to data segment
x86/idt: Split idt_data setup out of set_intr_gate()
x86/idt: Move two function from k/idt.c to i/a/desc.h
x86/head/64: Install boot GDT
x86/head/64: Reload GDT after switch to virtual addresses
x86/head/64: Load segment registers earlier
x86/head/64: Switch to initial stack earlier
x86/head/64: Build k/head64.c with -fno-stack-protector
x86/head/64: Load IDT earlier
x86/head/64: Move early exception dispatch to C code
x86/sev-es: Add SEV-ES Feature Detection
x86/sev-es: Print SEV-ES info into kernel log
x86/sev-es: Compile early handler code into kernel image
x86/sev-es: Setup early #VC handler
x86/sev-es: Setup GHCB based boot #VC handler
x86/sev-es: Allocate and Map IST stacks for #VC handler
x86/dumpstack/64: Handle #VC exception stacks
x86/sev-es: Shift #VC IST Stack in nmi_enter()/nmi_exit()
x86/sev-es: Wire up existing #VC exit-code handlers
x86/sev-es: Handle instruction fetches from user-space
x86/sev-es: Do not crash on #VC exceptions from user-space
x86/sev-es: Handle MMIO String Instructions
x86/sev-es: Handle #AC Events
x86/sev-es: Handle #DB Events
x86/paravirt: Allow hypervisor specific VMMCALL handling under SEV-ES
x86/realmode: Add SEV-ES specific trampoline entry point
x86/head/64: Setup TSS early for secondary CPUs
x86/head/64: Don't call verify_cpu() on starting APs
x86/head/64: Rename start_cpu0
x86/sev-es: Support CPU offline/online
x86/sev-es: Handle NMI State

Mike Stunes (1):
x86/sev-es: Cache CPUID results for improved performance

Tom Lendacky (19):
KVM: SVM: Add GHCB definitions
x86/cpufeatures: Add SEV-ES CPU feature
x86/sev-es: Add support for handling IOIO exceptions
x86/sev-es: Add CPUID handling to #VC handler
x86/sev-es: Setup per-cpu GHCBs for the runtime handler
x86/sev-es: Add Runtime #VC Exception Handler
x86/sev-es: Handle MMIO events
x86/sev-es: Handle MSR events
x86/sev-es: Handle DR7 read/write events
x86/sev-es: Handle WBINVD Events
x86/sev-es: Handle RDTSC(P) Events
x86/sev-es: Handle RDPMC Events
x86/sev-es: Handle INVD Events
x86/sev-es: Handle MONITOR/MONITORX Events
x86/sev-es: Handle MWAIT/MWAITX Events
x86/sev-es: Handle VMMCALL Events
x86/kvm: Add KVM specific VMMCALL handling under SEV-ES
x86/realmode: Setup AP jump table
x86/efi: Add GHCB mappings when SEV-ES is active

arch/x86/Kconfig | 1 +
arch/x86/boot/Makefile | 2 +-
arch/x86/boot/compressed/Makefile | 9 +-
arch/x86/boot/compressed/head_64.S | 40 +-
arch/x86/boot/compressed/ident_map_64.c | 339 +++++
arch/x86/boot/compressed/idt_64.c | 53 +
arch/x86/boot/compressed/idt_handlers_64.S | 76 ++
arch/x86/boot/compressed/kaslr.c | 36 +-
arch/x86/boot/compressed/kaslr_64.c | 153 ---
arch/x86/boot/compressed/misc.c | 7 +
arch/x86/boot/compressed/misc.h | 45 +-
arch/x86/boot/compressed/sev-es.c | 210 +++
arch/x86/entry/entry_64.S | 4 +
arch/x86/include/asm/cpu.h | 2 +-
arch/x86/include/asm/cpu_entry_area.h | 62 +
arch/x86/include/asm/cpufeatures.h | 1 +
arch/x86/include/asm/desc.h | 30 +
arch/x86/include/asm/desc_defs.h | 10 +
arch/x86/include/asm/fpu/internal.h | 29 +-
arch/x86/include/asm/fpu/xcr.h | 32 +
arch/x86/include/asm/hardirq.h | 14 +
arch/x86/include/asm/insn-eval.h | 6 +
arch/x86/include/asm/mem_encrypt.h | 5 +
arch/x86/include/asm/msr-index.h | 3 +
arch/x86/include/asm/page_64_types.h | 1 +
arch/x86/include/asm/pgtable.h | 2 +-
arch/x86/include/asm/processor.h | 1 +
arch/x86/include/asm/realmode.h | 4 +
arch/x86/include/asm/segment.h | 2 +-
arch/x86/include/asm/setup.h | 1 -
arch/x86/include/asm/sev-es.h | 107 ++
arch/x86/include/asm/stacktrace.h | 4 +
arch/x86/include/asm/svm.h | 115 +-
arch/x86/include/asm/trap_defs.h | 50 +
arch/x86/include/asm/traps.h | 51 +-
arch/x86/include/asm/x86_init.h | 16 +-
arch/x86/include/uapi/asm/svm.h | 11 +
arch/x86/kernel/Makefile | 5 +
arch/x86/kernel/asm-offsets_64.c | 1 +
arch/x86/kernel/cpu/amd.c | 3 +-
arch/x86/kernel/cpu/common.c | 1 +
arch/x86/kernel/cpu/scattered.c | 1 +
arch/x86/kernel/cpu/vmware.c | 50 +-
arch/x86/kernel/dumpstack_64.c | 47 +
arch/x86/kernel/head64.c | 70 +-
arch/x86/kernel/head_32.S | 4 +-
arch/x86/kernel/head_64.S | 176 ++-
arch/x86/kernel/idt.c | 52 +-
arch/x86/kernel/kvm.c | 35 +-
arch/x86/kernel/nmi.c | 8 +
arch/x86/kernel/sev-es-shared.c | 479 +++++++
arch/x86/kernel/sev-es.c | 1428 ++++++++++++++++++++
arch/x86/kernel/smpboot.c | 4 +-
arch/x86/kernel/traps.c | 3 +
arch/x86/kernel/umip.c | 49 +-
arch/x86/lib/insn-eval.c | 130 ++
arch/x86/mm/extable.c | 1 +
arch/x86/mm/mem_encrypt.c | 39 +-
arch/x86/mm/mem_encrypt_identity.c | 3 +
arch/x86/platform/efi/efi_64.c | 10 +
arch/x86/realmode/init.c | 12 +
arch/x86/realmode/rm/header.S | 3 +
arch/x86/realmode/rm/trampoline_64.S | 20 +
arch/x86/tools/gen-insn-attr-x86.awk | 50 +-
tools/arch/x86/tools/gen-insn-attr-x86.awk | 50 +-
65 files changed, 3842 insertions(+), 426 deletions(-)
create mode 100644 arch/x86/boot/compressed/ident_map_64.c
create mode 100644 arch/x86/boot/compressed/idt_64.c
create mode 100644 arch/x86/boot/compressed/idt_handlers_64.S
delete mode 100644 arch/x86/boot/compressed/kaslr_64.c
create mode 100644 arch/x86/boot/compressed/sev-es.c
create mode 100644 arch/x86/include/asm/fpu/xcr.h
create mode 100644 arch/x86/include/asm/sev-es.h
create mode 100644 arch/x86/include/asm/trap_defs.h
create mode 100644 arch/x86/kernel/sev-es-shared.c
create mode 100644 arch/x86/kernel/sev-es.c

--
2.17.1


2020-04-28 15:28:06

by Joerg Roedel

[permalink] [raw]
Subject: [PATCH v3 28/75] x86/idt: Move IDT to data segment

From: Joerg Roedel <[email protected]>

With SEV-ES, exception handling is needed very early, even before the
kernel has cleared the bss segment. In order to prevent clearing the
currently used IDT, move the IDT to the data segment.

Signed-off-by: Joerg Roedel <[email protected]>
---
arch/x86/kernel/idt.c | 8 ++++++--
1 file changed, 6 insertions(+), 2 deletions(-)

diff --git a/arch/x86/kernel/idt.c b/arch/x86/kernel/idt.c
index 87ef69a72c52..a8fc01ea602a 100644
--- a/arch/x86/kernel/idt.c
+++ b/arch/x86/kernel/idt.c
@@ -165,8 +165,12 @@ static const __initconst struct idt_data dbg_idts[] = {
};
#endif

-/* Must be page-aligned because the real IDT is used in a fixmap. */
-gate_desc idt_table[IDT_ENTRIES] __page_aligned_bss;
+/*
+ * Must be page-aligned because the real IDT is used in a fixmap.
+ * Also needs to be in the .data segment, because the idt_table is
+ * needed before the kernel clears the .bss segment.
+ */
+gate_desc idt_table[IDT_ENTRIES] __page_aligned_data;

struct desc_ptr idt_descr __ro_after_init = {
.size = (IDT_ENTRIES * 2 * sizeof(unsigned long)) - 1,
--
2.17.1

2020-04-28 15:28:28

by Joerg Roedel

[permalink] [raw]
Subject: [PATCH v3 29/75] x86/idt: Split idt_data setup out of set_intr_gate()

From: Joerg Roedel <[email protected]>

The code to setup idt_data is needed for early exception handling, but
set_intr_gate() can't be used that early because it has pv-ops in its
code path, which don't work that early.

Split out the idt_data initialization part from set_intr_gate() so
that it can be used separatly.

Signed-off-by: Joerg Roedel <[email protected]>
---
arch/x86/kernel/idt.c | 22 ++++++++++++++--------
1 file changed, 14 insertions(+), 8 deletions(-)

diff --git a/arch/x86/kernel/idt.c b/arch/x86/kernel/idt.c
index a8fc01ea602a..c752027abc9e 100644
--- a/arch/x86/kernel/idt.c
+++ b/arch/x86/kernel/idt.c
@@ -231,18 +231,24 @@ idt_setup_from_table(gate_desc *idt, const struct idt_data *t, int size, bool sy
}
}

+static void init_idt_data(struct idt_data *data, unsigned int n,
+ const void *addr)
+{
+ BUG_ON(n > 0xFF);
+
+ memset(data, 0, sizeof(*data));
+ data->vector = n;
+ data->addr = addr;
+ data->segment = __KERNEL_CS;
+ data->bits.type = GATE_INTERRUPT;
+ data->bits.p = 1;
+}
+
static void set_intr_gate(unsigned int n, const void *addr)
{
struct idt_data data;

- BUG_ON(n > 0xFF);
-
- memset(&data, 0, sizeof(data));
- data.vector = n;
- data.addr = addr;
- data.segment = __KERNEL_CS;
- data.bits.type = GATE_INTERRUPT;
- data.bits.p = 1;
+ init_idt_data(&data, n, addr);

idt_setup_from_table(idt_table, &data, 1, false);
}
--
2.17.1

2020-04-28 15:28:46

by Joerg Roedel

[permalink] [raw]
Subject: [PATCH v3 23/75] x86/boot/compressed/64: Setup GHCB Based VC Exception handler

From: Joerg Roedel <[email protected]>

Install an exception handler for #VC exception that uses a GHCB. Also
add the infrastructure for handling different exit-codes by decoding
the instruction that caused the exception and error handling.

Signed-off-by: Joerg Roedel <[email protected]>
---
arch/x86/Kconfig | 1 +
arch/x86/boot/compressed/Makefile | 3 +
arch/x86/boot/compressed/idt_64.c | 4 +
arch/x86/boot/compressed/idt_handlers_64.S | 3 +-
arch/x86/boot/compressed/misc.c | 7 +
arch/x86/boot/compressed/misc.h | 7 +
arch/x86/boot/compressed/sev-es.c | 110 +++++++++++++++
arch/x86/include/asm/sev-es.h | 39 ++++++
arch/x86/include/uapi/asm/svm.h | 1 +
arch/x86/kernel/sev-es-shared.c | 154 +++++++++++++++++++++
10 files changed, 328 insertions(+), 1 deletion(-)

diff --git a/arch/x86/Kconfig b/arch/x86/Kconfig
index 1197b5596d5a..2ba5f74f186d 100644
--- a/arch/x86/Kconfig
+++ b/arch/x86/Kconfig
@@ -1523,6 +1523,7 @@ config AMD_MEM_ENCRYPT
select DYNAMIC_PHYSICAL_MASK
select ARCH_USE_MEMREMAP_PROT
select ARCH_HAS_FORCE_DMA_UNENCRYPTED
+ select INSTRUCTION_DECODER
---help---
Say yes to enable support for the encryption of system memory.
This requires an AMD processor that supports Secure Memory
diff --git a/arch/x86/boot/compressed/Makefile b/arch/x86/boot/compressed/Makefile
index a7847a1ef63a..8372b85c9c0e 100644
--- a/arch/x86/boot/compressed/Makefile
+++ b/arch/x86/boot/compressed/Makefile
@@ -41,6 +41,9 @@ KBUILD_CFLAGS += -Wno-pointer-sign
KBUILD_CFLAGS += $(call cc-option,-fmacro-prefix-map=$(srctree)/=)
KBUILD_CFLAGS += -fno-asynchronous-unwind-tables

+# sev-es.c inludes generated $(objtree)/arch/x86/lib/inat-tables.c
+CFLAGS_sev-es.o += -I$(objtree)/arch/x86/lib/
+
KBUILD_AFLAGS := $(KBUILD_CFLAGS) -D__ASSEMBLY__
GCOV_PROFILE := n
UBSAN_SANITIZE :=n
diff --git a/arch/x86/boot/compressed/idt_64.c b/arch/x86/boot/compressed/idt_64.c
index f8295d68b3e1..44d20c4f47c9 100644
--- a/arch/x86/boot/compressed/idt_64.c
+++ b/arch/x86/boot/compressed/idt_64.c
@@ -45,5 +45,9 @@ void load_stage2_idt(void)

set_idt_entry(X86_TRAP_PF, boot_page_fault);

+#ifdef CONFIG_AMD_MEM_ENCRYPT
+ set_idt_entry(X86_TRAP_VC, boot_stage2_vc);
+#endif
+
load_boot_idt(&boot_idt_desc);
}
diff --git a/arch/x86/boot/compressed/idt_handlers_64.S b/arch/x86/boot/compressed/idt_handlers_64.S
index 8473bf88e64e..bd058aa21e4f 100644
--- a/arch/x86/boot/compressed/idt_handlers_64.S
+++ b/arch/x86/boot/compressed/idt_handlers_64.S
@@ -71,5 +71,6 @@ SYM_FUNC_END(\name)
EXCEPTION_HANDLER boot_page_fault do_boot_page_fault error_code=1

#ifdef CONFIG_AMD_MEM_ENCRYPT
-EXCEPTION_HANDLER boot_stage1_vc do_vc_no_ghcb error_code=1
+EXCEPTION_HANDLER boot_stage1_vc do_vc_no_ghcb error_code=1
+EXCEPTION_HANDLER boot_stage2_vc do_boot_stage2_vc error_code=1
#endif
diff --git a/arch/x86/boot/compressed/misc.c b/arch/x86/boot/compressed/misc.c
index 9652d5c2afda..dba49e75095a 100644
--- a/arch/x86/boot/compressed/misc.c
+++ b/arch/x86/boot/compressed/misc.c
@@ -441,6 +441,13 @@ asmlinkage __visible void *extract_kernel(void *rmode, memptr heap,
parse_elf(output);
handle_relocations(output, output_len, virt_addr);
debug_putstr("done.\nBooting the kernel.\n");
+
+ /*
+ * Flush GHCB from cache and map it encrypted again when running as
+ * SEV-ES guest.
+ */
+ sev_es_shutdown_ghcb();
+
return output;
}

diff --git a/arch/x86/boot/compressed/misc.h b/arch/x86/boot/compressed/misc.h
index 5e569e8a7d75..4d37a28370ed 100644
--- a/arch/x86/boot/compressed/misc.h
+++ b/arch/x86/boot/compressed/misc.h
@@ -115,6 +115,12 @@ static inline void console_init(void)

void set_sev_encryption_mask(void);

+#ifdef CONFIG_AMD_MEM_ENCRYPT
+void sev_es_shutdown_ghcb(void);
+#else
+static inline void sev_es_shutdown_ghcb(void) { }
+#endif
+
/* acpi.c */
#ifdef CONFIG_ACPI
acpi_physical_address get_rsdp_addr(void);
@@ -144,5 +150,6 @@ extern struct desc_ptr boot_idt_desc;
/* IDT Entry Points */
void boot_page_fault(void);
void boot_stage1_vc(void);
+void boot_stage2_vc(void);

#endif /* BOOT_COMPRESSED_MISC_H */
diff --git a/arch/x86/boot/compressed/sev-es.c b/arch/x86/boot/compressed/sev-es.c
index bb91cbb5920e..940d72571fc9 100644
--- a/arch/x86/boot/compressed/sev-es.c
+++ b/arch/x86/boot/compressed/sev-es.c
@@ -13,10 +13,16 @@
#include "misc.h"

#include <asm/sev-es.h>
+#include <asm/trap_defs.h>
#include <asm/msr-index.h>
#include <asm/ptrace.h>
#include <asm/svm.h>

+#include "error.h"
+
+struct ghcb boot_ghcb_page __aligned(PAGE_SIZE);
+struct ghcb *boot_ghcb;
+
static inline u64 sev_es_rd_ghcb_msr(void)
{
unsigned long low, high;
@@ -38,8 +44,112 @@ static inline void sev_es_wr_ghcb_msr(u64 val)
"a"(low), "d" (high) : "memory");
}

+static enum es_result vc_decode_insn(struct es_em_ctxt *ctxt)
+{
+ char buffer[MAX_INSN_SIZE];
+ enum es_result ret;
+
+ memcpy(buffer, (unsigned char *)ctxt->regs->ip, MAX_INSN_SIZE);
+
+ insn_init(&ctxt->insn, buffer, MAX_INSN_SIZE, 1);
+ insn_get_length(&ctxt->insn);
+
+ ret = ctxt->insn.immediate.got ? ES_OK : ES_DECODE_FAILED;
+
+ return ret;
+}
+
+static enum es_result vc_write_mem(struct es_em_ctxt *ctxt,
+ void *dst, char *buf, size_t size)
+{
+ memcpy(dst, buf, size);
+
+ return ES_OK;
+}
+
+static enum es_result vc_read_mem(struct es_em_ctxt *ctxt,
+ void *src, char *buf, size_t size)
+{
+ memcpy(buf, src, size);
+
+ return ES_OK;
+}
+
#undef __init
+#undef __pa
#define __init
+#define __pa(x) ((unsigned long)(x))
+
+#define __BOOT_COMPRESSED
+
+/* Basic instruction decoding support needed */
+#include "../../lib/inat.c"
+#include "../../lib/insn.c"

/* Include code for early handlers */
#include "../../kernel/sev-es-shared.c"
+
+static bool sev_es_setup_ghcb(void)
+{
+ if (!sev_es_negotiate_protocol())
+ sev_es_terminate(GHCB_SEV_ES_REASON_PROTOCOL_UNSUPPORTED);
+
+ if (set_page_decrypted((unsigned long)&boot_ghcb_page))
+ return false;
+
+ /* Page is now mapped decrypted, clear it */
+ memset(&boot_ghcb_page, 0, sizeof(boot_ghcb_page));
+
+ boot_ghcb = &boot_ghcb_page;
+
+ /* Initialize lookup tables for the instruction decoder */
+ inat_init_tables();
+
+ return true;
+}
+
+void sev_es_shutdown_ghcb(void)
+{
+ if (!boot_ghcb)
+ return;
+
+ /*
+ * GHCB Page must be flushed from the cache and mapped encrypted again.
+ * Otherwise the running kernel will see strange cache effects when
+ * trying to use that page.
+ */
+ if (set_page_encrypted((unsigned long)&boot_ghcb_page))
+ error("Can't map GHCB page encrypted");
+}
+
+void do_boot_stage2_vc(struct pt_regs *regs, unsigned long exit_code)
+{
+ struct es_em_ctxt ctxt;
+ enum es_result result;
+
+ if (!boot_ghcb && !sev_es_setup_ghcb())
+ sev_es_terminate(GHCB_SEV_ES_REASON_GENERAL_REQUEST);
+
+ vc_ghcb_invalidate(boot_ghcb);
+ result = vc_init_em_ctxt(&ctxt, regs, exit_code);
+ if (result != ES_OK)
+ goto finish;
+
+ switch (exit_code) {
+ default:
+ result = ES_UNSUPPORTED;
+ break;
+ }
+
+finish:
+ if (result == ES_OK) {
+ vc_finish_insn(&ctxt);
+ } else if (result != ES_RETRY) {
+ /*
+ * For now, just halt the machine. That makes debugging easier,
+ * later we just call sev_es_terminate() here.
+ */
+ while (true)
+ asm volatile("hlt\n");
+ }
+}
diff --git a/arch/x86/include/asm/sev-es.h b/arch/x86/include/asm/sev-es.h
index 5d49a8a429d3..7c0807b84546 100644
--- a/arch/x86/include/asm/sev-es.h
+++ b/arch/x86/include/asm/sev-es.h
@@ -9,7 +9,14 @@
#define __ASM_ENCRYPTED_STATE_H

#include <linux/types.h>
+#include <asm/insn.h>

+#define GHCB_SEV_INFO 0x001UL
+#define GHCB_SEV_INFO_REQ 0x002UL
+#define GHCB_INFO(v) ((v) & 0xfffUL)
+#define GHCB_PROTO_MAX(v) (((v) >> 48) & 0xffffUL)
+#define GHCB_PROTO_MIN(v) (((v) >> 32) & 0xffffUL)
+#define GHCB_PROTO_OUR 0x0001UL
#define GHCB_SEV_CPUID_REQ 0x004UL
#define GHCB_CPUID_REQ_EAX 0
#define GHCB_CPUID_REQ_EBX 1
@@ -19,12 +26,44 @@
(((unsigned long)reg & 3) << 30) | \
(((unsigned long)fn) << 32))

+#define GHCB_PROTOCOL_MAX 0x0001UL
+#define GHCB_DEFAULT_USAGE 0x0000UL
+
#define GHCB_SEV_CPUID_RESP 0x005UL
#define GHCB_SEV_TERMINATE 0x100UL
+#define GHCB_SEV_TERMINATE_REASON(reason_set, reason_val) \
+ (((((u64)reason_set) & 0x7) << 12) | \
+ ((((u64)reason_val) & 0xff) << 16))
+#define GHCB_SEV_ES_REASON_GENERAL_REQUEST 0
+#define GHCB_SEV_ES_REASON_PROTOCOL_UNSUPPORTED 1

#define GHCB_SEV_GHCB_RESP_CODE(v) ((v) & 0xfff)
#define VMGEXIT() { asm volatile("rep; vmmcall\n\r"); }

+enum es_result {
+ ES_OK, /* All good */
+ ES_UNSUPPORTED, /* Requested operation not supported */
+ ES_VMM_ERROR, /* Unexpected state from the VMM */
+ ES_DECODE_FAILED, /* Instruction decoding failed */
+ ES_EXCEPTION, /* Instruction caused exception */
+ ES_RETRY, /* Retry instruction emulation */
+};
+
+struct es_fault_info {
+ unsigned long vector;
+ unsigned long error_code;
+ unsigned long cr2;
+};
+
+struct pt_regs;
+
+/* ES instruction emulation context */
+struct es_em_ctxt {
+ struct pt_regs *regs;
+ struct insn insn;
+ struct es_fault_info fi;
+};
+
void __init do_vc_no_ghcb(struct pt_regs *regs, unsigned long exit_code);

static inline u64 lower_bits(u64 val, unsigned int bits)
diff --git a/arch/x86/include/uapi/asm/svm.h b/arch/x86/include/uapi/asm/svm.h
index 2e8a30f06c74..c68d1618c9b0 100644
--- a/arch/x86/include/uapi/asm/svm.h
+++ b/arch/x86/include/uapi/asm/svm.h
@@ -29,6 +29,7 @@
#define SVM_EXIT_WRITE_DR6 0x036
#define SVM_EXIT_WRITE_DR7 0x037
#define SVM_EXIT_EXCP_BASE 0x040
+#define SVM_EXIT_LAST_EXCP 0x05f
#define SVM_EXIT_INTR 0x060
#define SVM_EXIT_NMI 0x061
#define SVM_EXIT_SMI 0x062
diff --git a/arch/x86/kernel/sev-es-shared.c b/arch/x86/kernel/sev-es-shared.c
index 5927152487ad..22eb3ed89186 100644
--- a/arch/x86/kernel/sev-es-shared.c
+++ b/arch/x86/kernel/sev-es-shared.c
@@ -9,6 +9,118 @@
* and is included directly into both code-bases.
*/

+static void sev_es_terminate(unsigned int reason)
+{
+ u64 val = GHCB_SEV_TERMINATE;
+
+ /*
+ * Tell the hypervisor what went wrong - only reason-set 0 is
+ * currently supported.
+ */
+ val |= GHCB_SEV_TERMINATE_REASON(0, reason);
+
+ /* Request Guest Termination from Hypvervisor */
+ sev_es_wr_ghcb_msr(val);
+ VMGEXIT();
+
+ while (true)
+ asm volatile("hlt\n" : : : "memory");
+}
+
+static bool sev_es_negotiate_protocol(void)
+{
+ u64 val;
+
+ /* Do the GHCB protocol version negotiation */
+ sev_es_wr_ghcb_msr(GHCB_SEV_INFO_REQ);
+ VMGEXIT();
+ val = sev_es_rd_ghcb_msr();
+
+ if (GHCB_INFO(val) != GHCB_SEV_INFO)
+ return false;
+
+ if (GHCB_PROTO_MAX(val) < GHCB_PROTO_OUR ||
+ GHCB_PROTO_MIN(val) > GHCB_PROTO_OUR)
+ return false;
+
+ return true;
+}
+
+static void vc_ghcb_invalidate(struct ghcb *ghcb)
+{
+ memset(ghcb->save.valid_bitmap, 0, sizeof(ghcb->save.valid_bitmap));
+}
+
+static bool vc_decoding_needed(unsigned long exit_code)
+{
+ /* Exceptions don't require to decode the instruction */
+ return !(exit_code >= SVM_EXIT_EXCP_BASE &&
+ exit_code <= SVM_EXIT_LAST_EXCP);
+}
+
+static enum es_result vc_init_em_ctxt(struct es_em_ctxt *ctxt,
+ struct pt_regs *regs,
+ unsigned long exit_code)
+{
+ enum es_result ret = ES_OK;
+
+ memset(ctxt, 0, sizeof(*ctxt));
+ ctxt->regs = regs;
+
+ if (vc_decoding_needed(exit_code))
+ ret = vc_decode_insn(ctxt);
+
+ return ret;
+}
+
+static void vc_finish_insn(struct es_em_ctxt *ctxt)
+{
+ ctxt->regs->ip += ctxt->insn.length;
+}
+
+static enum es_result sev_es_ghcb_hv_call(struct ghcb *ghcb,
+ struct es_em_ctxt *ctxt,
+ u64 exit_code, u64 exit_info_1,
+ u64 exit_info_2)
+{
+ enum es_result ret;
+
+ /* Fill in protocol and format specifiers */
+ ghcb->protocol_version = GHCB_PROTOCOL_MAX;
+ ghcb->ghcb_usage = GHCB_DEFAULT_USAGE;
+
+ ghcb_set_sw_exit_code(ghcb, exit_code);
+ ghcb_set_sw_exit_info_1(ghcb, exit_info_1);
+ ghcb_set_sw_exit_info_2(ghcb, exit_info_2);
+
+ sev_es_wr_ghcb_msr(__pa(ghcb));
+ VMGEXIT();
+
+ if ((ghcb->save.sw_exit_info_1 & 0xffffffff) == 1) {
+ u64 info = ghcb->save.sw_exit_info_2;
+ unsigned long v;
+
+ info = ghcb->save.sw_exit_info_2;
+ v = info & SVM_EVTINJ_VEC_MASK;
+
+ /* Check if exception information from hypervisor is sane. */
+ if ((info & SVM_EVTINJ_VALID) &&
+ ((v == X86_TRAP_GP) || (v == X86_TRAP_UD)) &&
+ ((info & SVM_EVTINJ_TYPE_MASK) == SVM_EVTINJ_TYPE_EXEPT)) {
+ ctxt->fi.vector = v;
+ if (info & SVM_EVTINJ_VALID_ERR)
+ ctxt->fi.error_code = info >> 32;
+ ret = ES_EXCEPTION;
+ } else {
+ ret = ES_VMM_ERROR;
+ }
+ } else {
+ ret = ES_OK;
+ }
+
+ return ret;
+}
+
/*
* Boot VC Handler - This is the first VC handler during boot, there is no GHCB
* page yet, so it only supports the MSR based communication with the
@@ -63,3 +175,45 @@ void __init do_vc_no_ghcb(struct pt_regs *regs, unsigned long exit_code)
while (true)
asm volatile("hlt\n");
}
+
+static enum es_result vc_insn_string_read(struct es_em_ctxt *ctxt,
+ void *src, char *buf,
+ unsigned int data_size,
+ unsigned int count,
+ bool backwards)
+{
+ int i, b = backwards ? -1 : 1;
+ enum es_result ret = ES_OK;
+
+ for (i = 0; i < count; i++) {
+ void *s = src + (i * data_size * b);
+ char *d = buf + (i * data_size);
+
+ ret = vc_read_mem(ctxt, s, d, data_size);
+ if (ret != ES_OK)
+ break;
+ }
+
+ return ret;
+}
+
+static enum es_result vc_insn_string_write(struct es_em_ctxt *ctxt,
+ void *dst, char *buf,
+ unsigned int data_size,
+ unsigned int count,
+ bool backwards)
+{
+ int i, s = backwards ? -1 : 1;
+ enum es_result ret = ES_OK;
+
+ for (i = 0; i < count; i++) {
+ void *d = dst + (i * data_size * s);
+ char *b = buf + (i * data_size);
+
+ ret = vc_write_mem(ctxt, d, b, data_size);
+ if (ret != ES_OK)
+ break;
+ }
+
+ return ret;
+}
--
2.17.1

2020-04-28 15:29:18

by Joerg Roedel

[permalink] [raw]
Subject: [PATCH v3 02/75] KVM: SVM: Add GHCB Accessor functions

From: Joerg Roedel <[email protected]>

Building a correct GHCB for the hypervisor requires setting valid bits
in the GHCB. Simplify that process by providing accessor functions to
set values and to update the valid bitmap.

Signed-off-by: Joerg Roedel <[email protected]>
---
arch/x86/include/asm/svm.h | 61 ++++++++++++++++++++++++++++++++++++++
1 file changed, 61 insertions(+)

diff --git a/arch/x86/include/asm/svm.h b/arch/x86/include/asm/svm.h
index f36288c659b5..e4e9f6bacfaa 100644
--- a/arch/x86/include/asm/svm.h
+++ b/arch/x86/include/asm/svm.h
@@ -333,4 +333,65 @@ struct __attribute__ ((__packed__)) vmcb {

#define SVM_CR0_SELECTIVE_MASK (X86_CR0_TS | X86_CR0_MP)

+/* GHCB Accessor functions */
+
+#define DEFINE_GHCB_INDICES(field) \
+ u16 idx = offsetof(struct vmcb_save_area, field) / 8; \
+ u16 byte_idx = idx / 8; \
+ u16 bit_idx = idx % 8; \
+ BUILD_BUG_ON(byte_idx > ARRAY_SIZE(ghcb->save.valid_bitmap));
+
+#define GHCB_SET_VALID(ghcb, field) \
+ { \
+ DEFINE_GHCB_INDICES(field) \
+ (ghcb)->save.valid_bitmap[byte_idx] |= BIT(bit_idx); \
+ }
+
+#define DEFINE_GHCB_SETTER(field) \
+ static inline void \
+ ghcb_set_##field(struct ghcb *ghcb, u64 value) \
+ { \
+ GHCB_SET_VALID(ghcb, field) \
+ (ghcb)->save.field = value; \
+ }
+
+#define DEFINE_GHCB_ACCESSORS(field) \
+ static inline bool ghcb_is_valid_##field(const struct ghcb *ghcb) \
+ { \
+ DEFINE_GHCB_INDICES(field) \
+ return !!((ghcb)->save.valid_bitmap[byte_idx] \
+ & BIT(bit_idx)); \
+ } \
+ \
+ static inline void \
+ ghcb_set_##field(struct ghcb *ghcb, u64 value) \
+ { \
+ GHCB_SET_VALID(ghcb, field) \
+ (ghcb)->save.field = value; \
+ }
+
+DEFINE_GHCB_ACCESSORS(cpl)
+DEFINE_GHCB_ACCESSORS(rip)
+DEFINE_GHCB_ACCESSORS(rsp)
+DEFINE_GHCB_ACCESSORS(rax)
+DEFINE_GHCB_ACCESSORS(rcx)
+DEFINE_GHCB_ACCESSORS(rdx)
+DEFINE_GHCB_ACCESSORS(rbx)
+DEFINE_GHCB_ACCESSORS(rbp)
+DEFINE_GHCB_ACCESSORS(rsi)
+DEFINE_GHCB_ACCESSORS(rdi)
+DEFINE_GHCB_ACCESSORS(r8)
+DEFINE_GHCB_ACCESSORS(r9)
+DEFINE_GHCB_ACCESSORS(r10)
+DEFINE_GHCB_ACCESSORS(r11)
+DEFINE_GHCB_ACCESSORS(r12)
+DEFINE_GHCB_ACCESSORS(r13)
+DEFINE_GHCB_ACCESSORS(r14)
+DEFINE_GHCB_ACCESSORS(r15)
+DEFINE_GHCB_ACCESSORS(sw_exit_code)
+DEFINE_GHCB_ACCESSORS(sw_exit_info_1)
+DEFINE_GHCB_ACCESSORS(sw_exit_info_2)
+DEFINE_GHCB_ACCESSORS(sw_scratch)
+DEFINE_GHCB_ACCESSORS(xcr0)
+
#endif
--
2.17.1

2020-04-28 15:29:51

by Joerg Roedel

[permalink] [raw]
Subject: [PATCH v3 16/75] x86/boot/compressed/64: Always switch to own page-table

From: Joerg Roedel <[email protected]>

When booted through startup_64 the kernel keeps running on the EFI
page-table until the KASLR code sets up its own page-table. Without
KASLR the pre-decompression boot code never switches off the EFI
page-table. Change that by unconditionally switching to a kernel
controlled page-table after relocation.

This makes sure we can make changes to the mapping when necessary, for
example map pages unencrypted in SEV and SEV-ES guests.

Also remove the debug_putstr() calls in initialize_identity_maps()
because the function now runs before console_init() is called.

Signed-off-by: Joerg Roedel <[email protected]>
---
arch/x86/boot/compressed/head_64.S | 3 +-
arch/x86/boot/compressed/ident_map_64.c | 51 +++++++++++++++----------
arch/x86/boot/compressed/kaslr.c | 3 --
3 files changed, 32 insertions(+), 25 deletions(-)

diff --git a/arch/x86/boot/compressed/head_64.S b/arch/x86/boot/compressed/head_64.S
index 089b9e676498..af571127c9ba 100644
--- a/arch/x86/boot/compressed/head_64.S
+++ b/arch/x86/boot/compressed/head_64.S
@@ -534,10 +534,11 @@ SYM_FUNC_START_LOCAL_NOALIGN(.Lrelocated)
rep stosq

/*
- * Load stage2 IDT
+ * Load stage2 IDT and switch to our own page-table
*/
pushq %rsi
call load_stage2_idt
+ call initialize_identity_maps
popq %rsi

/*
diff --git a/arch/x86/boot/compressed/ident_map_64.c b/arch/x86/boot/compressed/ident_map_64.c
index 33bdf923cbab..aa55e7b5cade 100644
--- a/arch/x86/boot/compressed/ident_map_64.c
+++ b/arch/x86/boot/compressed/ident_map_64.c
@@ -88,9 +88,31 @@ phys_addr_t physical_mask = (1ULL << __PHYSICAL_MASK_SHIFT) - 1;
*/
static struct x86_mapping_info mapping_info;

+/*
+ * Adds the specified range to what will become the new identity mappings.
+ * Once all ranges have been added, the new mapping is activated by calling
+ * finalize_identity_maps() below.
+ */
+void add_identity_map(unsigned long start, unsigned long size)
+{
+ unsigned long end = start + size;
+
+ /* Align boundary to 2M. */
+ start = round_down(start, PMD_SIZE);
+ end = round_up(end, PMD_SIZE);
+ if (start >= end)
+ return;
+
+ /* Build the mapping. */
+ kernel_ident_mapping_init(&mapping_info, (pgd_t *)top_level_pgt,
+ start, end);
+}
+
/* Locates and clears a region for a new top level page table. */
void initialize_identity_maps(void)
{
+ unsigned long start, size;
+
/* If running as an SEV guest, the encryption mask is required. */
set_sev_encryption_mask();

@@ -123,37 +145,24 @@ void initialize_identity_maps(void)
*/
top_level_pgt = read_cr3_pa();
if (p4d_offset((pgd_t *)top_level_pgt, 0) == (p4d_t *)_pgtable) {
- debug_putstr("booted via startup_32()\n");
pgt_data.pgt_buf = _pgtable + BOOT_INIT_PGT_SIZE;
pgt_data.pgt_buf_size = BOOT_PGT_SIZE - BOOT_INIT_PGT_SIZE;
memset(pgt_data.pgt_buf, 0, pgt_data.pgt_buf_size);
} else {
- debug_putstr("booted via startup_64()\n");
pgt_data.pgt_buf = _pgtable;
pgt_data.pgt_buf_size = BOOT_PGT_SIZE;
memset(pgt_data.pgt_buf, 0, pgt_data.pgt_buf_size);
top_level_pgt = (unsigned long)alloc_pgt_page(&pgt_data);
}
-}

-/*
- * Adds the specified range to what will become the new identity mappings.
- * Once all ranges have been added, the new mapping is activated by calling
- * finalize_identity_maps() below.
- */
-void add_identity_map(unsigned long start, unsigned long size)
-{
- unsigned long end = start + size;
-
- /* Align boundary to 2M. */
- start = round_down(start, PMD_SIZE);
- end = round_up(end, PMD_SIZE);
- if (start >= end)
- return;
-
- /* Build the mapping. */
- kernel_ident_mapping_init(&mapping_info, (pgd_t *)top_level_pgt,
- start, end);
+ /*
+ * New page-table is set up - map the kernel image and load it
+ * into cr3.
+ */
+ start = (unsigned long)_head;
+ size = _end - _head;
+ add_identity_map(start, size);
+ write_cr3(top_level_pgt);
}

/*
diff --git a/arch/x86/boot/compressed/kaslr.c b/arch/x86/boot/compressed/kaslr.c
index 7c61a8c5b9cf..856dc1c9bb0d 100644
--- a/arch/x86/boot/compressed/kaslr.c
+++ b/arch/x86/boot/compressed/kaslr.c
@@ -903,9 +903,6 @@ void choose_random_location(unsigned long input,

boot_params->hdr.loadflags |= KASLR_FLAG;

- /* Prepare to add new identity pagetables on demand. */
- initialize_identity_maps();
-
/* Record the various known unsafe memory ranges. */
mem_avoid_init(input, input_size, *output);

--
2.17.1

2020-04-28 15:29:55

by Joerg Roedel

[permalink] [raw]
Subject: [PATCH v3 15/75] x86/boot/compressed/64: Add page-fault handler

From: Joerg Roedel <[email protected]>

Install a page-fault handler to add an identity mapping to addresses
not yet mapped. Also do some checking whether the error code is sane.

This makes non SEV-ES machines use the exception handling
infrastructure in the pre-decompressions boot code too, making it less
likely to break in the future.

Signed-off-by: Joerg Roedel <[email protected]>
---
arch/x86/boot/compressed/ident_map_64.c | 33 ++++++++++++++++++++++
arch/x86/boot/compressed/idt_64.c | 2 ++
arch/x86/boot/compressed/idt_handlers_64.S | 2 ++
arch/x86/boot/compressed/misc.h | 6 ++++
4 files changed, 43 insertions(+)

diff --git a/arch/x86/boot/compressed/ident_map_64.c b/arch/x86/boot/compressed/ident_map_64.c
index 3a2115582920..33bdf923cbab 100644
--- a/arch/x86/boot/compressed/ident_map_64.c
+++ b/arch/x86/boot/compressed/ident_map_64.c
@@ -19,11 +19,13 @@
/* No PAGE_TABLE_ISOLATION support needed either: */
#undef CONFIG_PAGE_TABLE_ISOLATION

+#include "error.h"
#include "misc.h"

/* These actually do the work of building the kernel identity maps. */
#include <asm/init.h>
#include <asm/pgtable.h>
+#include <asm/trap_defs.h>
/* Use the static base for this part of the boot process */
#undef __PAGE_OFFSET
#define __PAGE_OFFSET __PAGE_OFFSET_BASE
@@ -163,3 +165,34 @@ void finalize_identity_maps(void)
{
write_cr3(top_level_pgt);
}
+
+void do_boot_page_fault(struct pt_regs *regs, unsigned long error_code)
+{
+ unsigned long address = native_read_cr2();
+
+ /*
+ * Check for unexpected error codes. Unexpected are:
+ * - Faults on present pages
+ * - User faults
+ * - Reserved bits set
+ */
+ if (error_code & (X86_PF_PROT | X86_PF_USER | X86_PF_RSVD)) {
+ /* Print some information for debugging */
+ error_putstr("Unexpected page-fault:");
+ error_putstr("\nError Code: ");
+ error_puthex(error_code);
+ error_putstr("\nCR2: 0x");
+ error_puthex(address);
+ error_putstr("\nRIP relative to _head: 0x");
+ error_puthex(regs->ip - (unsigned long)_head);
+ error_putstr("\n");
+
+ error("Stopping.\n");
+ }
+
+ /*
+ * Error code is sane - now identity map the 2M region around
+ * the faulting address.
+ */
+ add_identity_map(address & PMD_MASK, PMD_SIZE);
+}
diff --git a/arch/x86/boot/compressed/idt_64.c b/arch/x86/boot/compressed/idt_64.c
index 46ecea671b90..99cc78062684 100644
--- a/arch/x86/boot/compressed/idt_64.c
+++ b/arch/x86/boot/compressed/idt_64.c
@@ -39,5 +39,7 @@ void load_stage2_idt(void)
{
boot_idt_desc.address = (unsigned long)boot_idt;

+ set_idt_entry(X86_TRAP_PF, boot_page_fault);
+
load_boot_idt(&boot_idt_desc);
}
diff --git a/arch/x86/boot/compressed/idt_handlers_64.S b/arch/x86/boot/compressed/idt_handlers_64.S
index f86ea872d860..eda50cbdafa0 100644
--- a/arch/x86/boot/compressed/idt_handlers_64.S
+++ b/arch/x86/boot/compressed/idt_handlers_64.S
@@ -67,3 +67,5 @@ SYM_FUNC_END(\name)

.text
.code64
+
+EXCEPTION_HANDLER boot_page_fault do_boot_page_fault error_code=1
diff --git a/arch/x86/boot/compressed/misc.h b/arch/x86/boot/compressed/misc.h
index 3a030a878d53..345c90fbc500 100644
--- a/arch/x86/boot/compressed/misc.h
+++ b/arch/x86/boot/compressed/misc.h
@@ -37,6 +37,9 @@
#define memptr unsigned
#endif

+/* boot/compressed/vmlinux start and end markers */
+extern char _head[], _end[];
+
/* misc.c */
extern memptr free_mem_ptr;
extern memptr free_mem_end_ptr;
@@ -146,4 +149,7 @@ extern pteval_t __default_kernel_pte_mask;
extern gate_desc boot_idt[BOOT_IDT_ENTRIES];
extern struct desc_ptr boot_idt_desc;

+/* IDT Entry Points */
+void boot_page_fault(void);
+
#endif /* BOOT_COMPRESSED_MISC_H */
--
2.17.1

2020-04-28 15:30:09

by Joerg Roedel

[permalink] [raw]
Subject: [PATCH v3 01/75] KVM: SVM: Add GHCB definitions

From: Tom Lendacky <[email protected]>

Extend the vmcb_safe_area with SEV-ES fields and add a new
'struct ghcb' which will be used for guest-hypervisor communication.

Signed-off-by: Tom Lendacky <[email protected]>
Signed-off-by: Joerg Roedel <[email protected]>
---
arch/x86/include/asm/svm.h | 42 ++++++++++++++++++++++++++++++++++++++
1 file changed, 42 insertions(+)

diff --git a/arch/x86/include/asm/svm.h b/arch/x86/include/asm/svm.h
index 6ece8561ba66..f36288c659b5 100644
--- a/arch/x86/include/asm/svm.h
+++ b/arch/x86/include/asm/svm.h
@@ -201,6 +201,48 @@ struct __attribute__ ((__packed__)) vmcb_save_area {
u64 br_to;
u64 last_excp_from;
u64 last_excp_to;
+
+ /*
+ * The following part of the save area is valid only for
+ * SEV-ES guests when referenced through the GHCB.
+ */
+ u8 reserved_7[104];
+ u64 reserved_8; /* rax already available at 0x01f8 */
+ u64 rcx;
+ u64 rdx;
+ u64 rbx;
+ u64 reserved_9; /* rsp already available at 0x01d8 */
+ u64 rbp;
+ u64 rsi;
+ u64 rdi;
+ u64 r8;
+ u64 r9;
+ u64 r10;
+ u64 r11;
+ u64 r12;
+ u64 r13;
+ u64 r14;
+ u64 r15;
+ u8 reserved_10[16];
+ u64 sw_exit_code;
+ u64 sw_exit_info_1;
+ u64 sw_exit_info_2;
+ u64 sw_scratch;
+ u8 reserved_11[56];
+ u64 xcr0;
+ u8 valid_bitmap[16];
+ u64 x87_state_gpa;
+ u8 reserved_12[1016];
+};
+
+struct __attribute__ ((__packed__)) ghcb {
+ struct vmcb_save_area save;
+
+ u8 shared_buffer[2032];
+
+ u8 reserved_1[10];
+ u16 protocol_version; /* negotiated SEV-ES/GHCB protocol version */
+ u32 ghcb_usage;
};

struct __attribute__ ((__packed__)) vmcb {
--
2.17.1

2020-04-28 15:30:21

by Joerg Roedel

[permalink] [raw]
Subject: [PATCH v3 08/75] x86/umip: Factor out instruction decoding

From: Joerg Roedel <[email protected]>

Factor out the code used to decode an instruction with the correct
address and operand sizes to a helper function.

No functional changes.

Signed-off-by: Joerg Roedel <[email protected]>
---
arch/x86/include/asm/insn-eval.h | 2 ++
arch/x86/kernel/umip.c | 23 +---------------
arch/x86/lib/insn-eval.c | 45 ++++++++++++++++++++++++++++++++
3 files changed, 48 insertions(+), 22 deletions(-)

diff --git a/arch/x86/include/asm/insn-eval.h b/arch/x86/include/asm/insn-eval.h
index b8b9ef1bbd06..b4ff3e3316d1 100644
--- a/arch/x86/include/asm/insn-eval.h
+++ b/arch/x86/include/asm/insn-eval.h
@@ -21,5 +21,7 @@ unsigned long insn_get_seg_base(struct pt_regs *regs, int seg_reg_idx);
int insn_get_code_seg_params(struct pt_regs *regs);
int insn_fetch_from_user(struct pt_regs *regs,
unsigned char buf[MAX_INSN_SIZE]);
+bool insn_decode(struct pt_regs *regs, struct insn *insn,
+ unsigned char buf[MAX_INSN_SIZE], int buf_size);

#endif /* _ASM_X86_INSN_EVAL_H */
diff --git a/arch/x86/kernel/umip.c b/arch/x86/kernel/umip.c
index c9e5345da793..47d4d32e9cad 100644
--- a/arch/x86/kernel/umip.c
+++ b/arch/x86/kernel/umip.c
@@ -324,7 +324,6 @@ bool fixup_umip_exception(struct pt_regs *regs)
unsigned long *reg_addr;
void __user *uaddr;
struct insn insn;
- int seg_defs;

if (!regs)
return false;
@@ -339,27 +338,7 @@ bool fixup_umip_exception(struct pt_regs *regs)
if (!nr_copied)
return false;

- insn_init(&insn, buf, nr_copied, user_64bit_mode(regs));
-
- /*
- * Override the default operand and address sizes with what is specified
- * in the code segment descriptor. The instruction decoder only sets
- * the address size it to either 4 or 8 address bytes and does nothing
- * for the operand bytes. This OK for most of the cases, but we could
- * have special cases where, for instance, a 16-bit code segment
- * descriptor is used.
- * If there is an address override prefix, the instruction decoder
- * correctly updates these values, even for 16-bit defaults.
- */
- seg_defs = insn_get_code_seg_params(regs);
- if (seg_defs == -EINVAL)
- return false;
-
- insn.addr_bytes = INSN_CODE_SEG_ADDR_SZ(seg_defs);
- insn.opnd_bytes = INSN_CODE_SEG_OPND_SZ(seg_defs);
-
- insn_get_length(&insn);
- if (nr_copied < insn.length)
+ if (!insn_decode(regs, &insn, buf, nr_copied))
return false;

umip_inst = identify_insn(&insn);
diff --git a/arch/x86/lib/insn-eval.c b/arch/x86/lib/insn-eval.c
index 0c4f7ebc261b..0bbb814d4851 100644
--- a/arch/x86/lib/insn-eval.c
+++ b/arch/x86/lib/insn-eval.c
@@ -1407,3 +1407,48 @@ int insn_fetch_from_user(struct pt_regs *regs, unsigned char buf[MAX_INSN_SIZE])

return MAX_INSN_SIZE - not_copied;
}
+
+/**
+ * insn_decode() - Decode an instruction
+ * @regs: Structure with register values as seen when entering kernel mode
+ * @insn: Structure to store decoded instruction
+ * @buf: Buffer containing the instruction bytes
+ * @buf_size: Number of instruction bytes available in buf
+ *
+ * Decodes the instruction provided in buf and stores the decoding results in
+ * insn. Also determines the correct address and operand sizes.
+ *
+ * Returns:
+ *
+ * True if instruction was decoded, False otherwise.
+ */
+bool insn_decode(struct pt_regs *regs, struct insn *insn,
+ unsigned char buf[MAX_INSN_SIZE], int buf_size)
+{
+ int seg_defs;
+
+ insn_init(insn, buf, buf_size, user_64bit_mode(regs));
+
+ /*
+ * Override the default operand and address sizes with what is specified
+ * in the code segment descriptor. The instruction decoder only sets
+ * the address size it to either 4 or 8 address bytes and does nothing
+ * for the operand bytes. This OK for most of the cases, but we could
+ * have special cases where, for instance, a 16-bit code segment
+ * descriptor is used.
+ * If there is an address override prefix, the instruction decoder
+ * correctly updates these values, even for 16-bit defaults.
+ */
+ seg_defs = insn_get_code_seg_params(regs);
+ if (seg_defs == -EINVAL)
+ return false;
+
+ insn->addr_bytes = INSN_CODE_SEG_ADDR_SZ(seg_defs);
+ insn->opnd_bytes = INSN_CODE_SEG_OPND_SZ(seg_defs);
+
+ insn_get_length(insn);
+ if (buf_size < insn->length)
+ return false;
+
+ return true;
+}
--
2.17.1

2020-04-28 15:30:25

by Joerg Roedel

[permalink] [raw]
Subject: [PATCH v3 11/75] x86/boot/compressed/64: Disable red-zone usage

From: Joerg Roedel <[email protected]>

The x86-64 ABI defines a red-zone on the stack:

The 128-byte area beyond the location pointed to by %rsp is considered
to be reserved and shall not be modified by signal or interrupt
handlers. Therefore, functions may use this area for temporary data
that is not needed across function calls. In particular, leaf
functions may use this area for their entire stack frame, rather than
adjusting the stack pointer in the prologue and epilogue. This area is
known as the red zone.

This is not compatible with exception handling, because the IRET frame
written by the hardware at the stack pointer and the functions to handle
the exception will overwrite the temporary variables of the interrupted
function, causing undefined behavior. So disable red-zones for the
pre-decompression boot code.

Signed-off-by: Joerg Roedel <[email protected]>
---
arch/x86/boot/Makefile | 2 +-
arch/x86/boot/compressed/Makefile | 2 +-
2 files changed, 2 insertions(+), 2 deletions(-)

diff --git a/arch/x86/boot/Makefile b/arch/x86/boot/Makefile
index e17be90ab312..93f1320fc7bf 100644
--- a/arch/x86/boot/Makefile
+++ b/arch/x86/boot/Makefile
@@ -65,7 +65,7 @@ clean-files += cpustr.h

# ---------------------------------------------------------------------------

-KBUILD_CFLAGS := $(REALMODE_CFLAGS) -D_SETUP
+KBUILD_CFLAGS := $(REALMODE_CFLAGS) -D_SETUP -mno-red-zone
KBUILD_AFLAGS := $(KBUILD_CFLAGS) -D__ASSEMBLY__
KBUILD_CFLAGS += $(call cc-option,-fmacro-prefix-map=$(srctree)/=)
KBUILD_CFLAGS += -fno-asynchronous-unwind-tables
diff --git a/arch/x86/boot/compressed/Makefile b/arch/x86/boot/compressed/Makefile
index 5f7c262bcc99..085d5f083f50 100644
--- a/arch/x86/boot/compressed/Makefile
+++ b/arch/x86/boot/compressed/Makefile
@@ -30,7 +30,7 @@ KBUILD_CFLAGS := -m$(BITS) -O2
KBUILD_CFLAGS += -fno-strict-aliasing $(call cc-option, -fPIE, -fPIC)
KBUILD_CFLAGS += -DDISABLE_BRANCH_PROFILING
cflags-$(CONFIG_X86_32) := -march=i386
-cflags-$(CONFIG_X86_64) := -mcmodel=small
+cflags-$(CONFIG_X86_64) := -mcmodel=small -mno-red-zone
KBUILD_CFLAGS += $(cflags-y)
KBUILD_CFLAGS += -mno-mmx -mno-sse
KBUILD_CFLAGS += $(call cc-option,-ffreestanding)
--
2.17.1

2020-04-30 16:34:21

by Borislav Petkov

[permalink] [raw]
Subject: Re: [PATCH v3 08/75] x86/umip: Factor out instruction decoding

On Tue, Apr 28, 2020 at 05:16:18PM +0200, Joerg Roedel wrote:
> +/**
> + * insn_decode() - Decode an instruction
> + * @regs: Structure with register values as seen when entering kernel mode
> + * @insn: Structure to store decoded instruction
> + * @buf: Buffer containing the instruction bytes
> + * @buf_size: Number of instruction bytes available in buf
> + *
> + * Decodes the instruction provided in buf and stores the decoding results in
> + * insn. Also determines the correct address and operand sizes.
> + *
> + * Returns:
> + *
> + * True if instruction was decoded, False otherwise.
> + */
> +bool insn_decode(struct pt_regs *regs, struct insn *insn,
> + unsigned char buf[MAX_INSN_SIZE], int buf_size)

Right, let's have @insn be the first function argument in all those
insn-handling functions.

Thx.

--
Regards/Gruss,
Boris.

https://people.kernel.org/tglx/notes-about-netiquette

2020-05-11 20:11:30

by Borislav Petkov

[permalink] [raw]
Subject: Re: [PATCH v3 23/75] x86/boot/compressed/64: Setup GHCB Based VC Exception handler

On Tue, Apr 28, 2020 at 05:16:33PM +0200, Joerg Roedel wrote:
> @@ -63,3 +175,45 @@ void __init do_vc_no_ghcb(struct pt_regs *regs, unsigned long exit_code)
> while (true)
> asm volatile("hlt\n");
> }
> +
> +static enum es_result vc_insn_string_read(struct es_em_ctxt *ctxt,
> + void *src, char *buf,
> + unsigned int data_size,
> + unsigned int count,
> + bool backwards)
> +{
> + int i, b = backwards ? -1 : 1;
> + enum es_result ret = ES_OK;
> +
> + for (i = 0; i < count; i++) {
> + void *s = src + (i * data_size * b);
> + char *d = buf + (i * data_size);

From a previous review:

Where are we checking whether that count is not exceeding @buf or
similar discrepancies?

Ditto below.

> +
> + ret = vc_read_mem(ctxt, s, d, data_size);
> + if (ret != ES_OK)
> + break;
> + }
> +
> + return ret;
> +}
> +
> +static enum es_result vc_insn_string_write(struct es_em_ctxt *ctxt,
> + void *dst, char *buf,
> + unsigned int data_size,
> + unsigned int count,
> + bool backwards)
> +{
> + int i, s = backwards ? -1 : 1;
> + enum es_result ret = ES_OK;
> +
> + for (i = 0; i < count; i++) {
> + void *d = dst + (i * data_size * s);
> + char *b = buf + (i * data_size);
> +
> + ret = vc_write_mem(ctxt, d, b, data_size);
> + if (ret != ES_OK)
> + break;
> + }
> +
> + return ret;
> +}
> --
> 2.17.1
>

--
Regards/Gruss,
Boris.

https://people.kernel.org/tglx/notes-about-netiquette

2020-05-12 18:15:06

by Borislav Petkov

[permalink] [raw]
Subject: Re: [PATCH v3 23/75] x86/boot/compressed/64: Setup GHCB Based VC Exception handler

On Tue, Apr 28, 2020 at 05:16:33PM +0200, Joerg Roedel wrote:
> From: Joerg Roedel <[email protected]>
>
> Install an exception handler for #VC exception that uses a GHCB. Also
> add the infrastructure for handling different exit-codes by decoding
> the instruction that caused the exception and error handling.
>
> Signed-off-by: Joerg Roedel <[email protected]>
> ---
> arch/x86/Kconfig | 1 +
> arch/x86/boot/compressed/Makefile | 3 +
> arch/x86/boot/compressed/idt_64.c | 4 +
> arch/x86/boot/compressed/idt_handlers_64.S | 3 +-
> arch/x86/boot/compressed/misc.c | 7 +
> arch/x86/boot/compressed/misc.h | 7 +
> arch/x86/boot/compressed/sev-es.c | 110 +++++++++++++++
> arch/x86/include/asm/sev-es.h | 39 ++++++
> arch/x86/include/uapi/asm/svm.h | 1 +
> arch/x86/kernel/sev-es-shared.c | 154 +++++++++++++++++++++
> 10 files changed, 328 insertions(+), 1 deletion(-)
>
> diff --git a/arch/x86/Kconfig b/arch/x86/Kconfig
> index 1197b5596d5a..2ba5f74f186d 100644
> --- a/arch/x86/Kconfig
> +++ b/arch/x86/Kconfig
> @@ -1523,6 +1523,7 @@ config AMD_MEM_ENCRYPT
> select DYNAMIC_PHYSICAL_MASK
> select ARCH_USE_MEMREMAP_PROT
> select ARCH_HAS_FORCE_DMA_UNENCRYPTED
> + select INSTRUCTION_DECODER
> ---help---
> Say yes to enable support for the encryption of system memory.
> This requires an AMD processor that supports Secure Memory
> diff --git a/arch/x86/boot/compressed/Makefile b/arch/x86/boot/compressed/Makefile
> index a7847a1ef63a..8372b85c9c0e 100644
> --- a/arch/x86/boot/compressed/Makefile
> +++ b/arch/x86/boot/compressed/Makefile
> @@ -41,6 +41,9 @@ KBUILD_CFLAGS += -Wno-pointer-sign
> KBUILD_CFLAGS += $(call cc-option,-fmacro-prefix-map=$(srctree)/=)
> KBUILD_CFLAGS += -fno-asynchronous-unwind-tables
>
> +# sev-es.c inludes generated $(objtree)/arch/x86/lib/inat-tables.c

"includes"

> +CFLAGS_sev-es.o += -I$(objtree)/arch/x86/lib/

Does it?

I see

#include "../../lib/inat.c"
#include "../../lib/insn.c"

only and with the above CFLAGS-line removed, it builds still.

Leftover from earlier?

> +
> KBUILD_AFLAGS := $(KBUILD_CFLAGS) -D__ASSEMBLY__
> GCOV_PROFILE := n
> UBSAN_SANITIZE :=n
> diff --git a/arch/x86/boot/compressed/idt_64.c b/arch/x86/boot/compressed/idt_64.c
> index f8295d68b3e1..44d20c4f47c9 100644
> --- a/arch/x86/boot/compressed/idt_64.c
> +++ b/arch/x86/boot/compressed/idt_64.c
> @@ -45,5 +45,9 @@ void load_stage2_idt(void)
>
> set_idt_entry(X86_TRAP_PF, boot_page_fault);
>
> +#ifdef CONFIG_AMD_MEM_ENCRYPT
> + set_idt_entry(X86_TRAP_VC, boot_stage2_vc);
> +#endif

if IS_ENABLED()...

...

> +static enum es_result vc_decode_insn(struct es_em_ctxt *ctxt)
> +{
> + char buffer[MAX_INSN_SIZE];
> + enum es_result ret;
> +
> + memcpy(buffer, (unsigned char *)ctxt->regs->ip, MAX_INSN_SIZE);
> +
> + insn_init(&ctxt->insn, buffer, MAX_INSN_SIZE, 1);
> + insn_get_length(&ctxt->insn);
> +
> + ret = ctxt->insn.immediate.got ? ES_OK : ES_DECODE_FAILED;

Why are we checking whether the immediate? insn_get_length() sets
insn->length unconditionally while insn_get_immediate() can error out
and not set ->got... ?

> +
> + return ret;
> +}

...

> +static bool sev_es_setup_ghcb(void)
> +{
> + if (!sev_es_negotiate_protocol())
> + sev_es_terminate(GHCB_SEV_ES_REASON_PROTOCOL_UNSUPPORTED);
> +
> + if (set_page_decrypted((unsigned long)&boot_ghcb_page))
> + return false;
> +
> + /* Page is now mapped decrypted, clear it */
> + memset(&boot_ghcb_page, 0, sizeof(boot_ghcb_page));
> +
> + boot_ghcb = &boot_ghcb_page;
> +
> + /* Initialize lookup tables for the instruction decoder */
> + inat_init_tables();

Yeah, that call doesn't logically belong in this function AFAICT as this
function should setup the GHCB only. You can move it to the caller.

> +
> + return true;
> +}
> +
> +void sev_es_shutdown_ghcb(void)
> +{
> + if (!boot_ghcb)
> + return;
> +
> + /*
> + * GHCB Page must be flushed from the cache and mapped encrypted again.
> + * Otherwise the running kernel will see strange cache effects when
> + * trying to use that page.
> + */
> + if (set_page_encrypted((unsigned long)&boot_ghcb_page))
> + error("Can't map GHCB page encrypted");

Is that error() call enough?

Shouldn't we BUG_ON() here or mark that page Reserved or so, so that
nothing uses it during the system lifetime and thus avoid the strange
cache effects?

...

> +static enum es_result sev_es_ghcb_hv_call(struct ghcb *ghcb,
> + struct es_em_ctxt *ctxt,
> + u64 exit_code, u64 exit_info_1,
> + u64 exit_info_2)
> +{
> + enum es_result ret;
> +
> + /* Fill in protocol and format specifiers */
> + ghcb->protocol_version = GHCB_PROTOCOL_MAX;
> + ghcb->ghcb_usage = GHCB_DEFAULT_USAGE;
> +
> + ghcb_set_sw_exit_code(ghcb, exit_code);
> + ghcb_set_sw_exit_info_1(ghcb, exit_info_1);
> + ghcb_set_sw_exit_info_2(ghcb, exit_info_2);
> +
> + sev_es_wr_ghcb_msr(__pa(ghcb));
> + VMGEXIT();
> +
> + if ((ghcb->save.sw_exit_info_1 & 0xffffffff) == 1) {
^^^^^^^^^^^

(1UL << 32) - 1

I guess.

--
Regards/Gruss,
Boris.

https://people.kernel.org/tglx/notes-about-netiquette

2020-05-12 21:10:29

by Joerg Roedel

[permalink] [raw]
Subject: Re: [PATCH v3 23/75] x86/boot/compressed/64: Setup GHCB Based VC Exception handler

On Tue, May 12, 2020 at 08:11:57PM +0200, Borislav Petkov wrote:
> > +# sev-es.c inludes generated $(objtree)/arch/x86/lib/inat-tables.c
>
> "includes"
>
> > +CFLAGS_sev-es.o += -I$(objtree)/arch/x86/lib/
>
> Does it?
>
> I see
>
> #include "../../lib/inat.c"
> #include "../../lib/insn.c"
>
> only and with the above CFLAGS-line removed, it builds still.
>
> Leftover from earlier?

No, this is a recent addition, otherwise this breaks out-of-tree builds
(make O=/some/path ...) because inat-tables.c (included from inat.c) is
generated during build and ends up in the $(objtree).

> > + insn_init(&ctxt->insn, buffer, MAX_INSN_SIZE, 1);
> > + insn_get_length(&ctxt->insn);
> > +
> > + ret = ctxt->insn.immediate.got ? ES_OK : ES_DECODE_FAILED;
>
> Why are we checking whether the immediate? insn_get_length() sets
> insn->length unconditionally while insn_get_immediate() can error out
> and not set ->got... ?

Because the immediate is the last part of the instruction which is
decoded (even if there is no immediate). The .got field is set when
either the immediate was decoded successfully or, in case the
instruction has no immediate, when the rest of the instruction was
decoded successfully. So testing immediate.got is the indicator whether
decoding was successful.

>
> > +
> > + return ret;
> > +}
>
> ...
>
> > +static bool sev_es_setup_ghcb(void)
> > +{
> > + if (!sev_es_negotiate_protocol())
> > + sev_es_terminate(GHCB_SEV_ES_REASON_PROTOCOL_UNSUPPORTED);
> > +
> > + if (set_page_decrypted((unsigned long)&boot_ghcb_page))
> > + return false;
> > +
> > + /* Page is now mapped decrypted, clear it */
> > + memset(&boot_ghcb_page, 0, sizeof(boot_ghcb_page));
> > +
> > + boot_ghcb = &boot_ghcb_page;
> > +
> > + /* Initialize lookup tables for the instruction decoder */
> > + inat_init_tables();
>
> Yeah, that call doesn't logically belong in this function AFAICT as this
> function should setup the GHCB only. You can move it to the caller.

Probably better rename the function, it also does the sev-es protocol
version negotiation and all other related setup tasks. Maybe
sev_es_setup() is a better name?

> > + if (set_page_encrypted((unsigned long)&boot_ghcb_page))
> > + error("Can't map GHCB page encrypted");
>
> Is that error() call enough?
>
> Shouldn't we BUG_ON() here or mark that page Reserved or so, so that
> nothing uses it during the system lifetime and thus avoid the strange
> cache effects?

If the above call fails its the end of the systems lifetime, because we
can't continue to boot an SEV-ES guest when we have no GHCB.

BUG_ON() and friends are also not available in the pre-decompression
boot stage.

>
> ...
>
> > +static enum es_result sev_es_ghcb_hv_call(struct ghcb *ghcb,
> > + struct es_em_ctxt *ctxt,
> > + u64 exit_code, u64 exit_info_1,
> > + u64 exit_info_2)
> > +{
> > + enum es_result ret;
> > +
> > + /* Fill in protocol and format specifiers */
> > + ghcb->protocol_version = GHCB_PROTOCOL_MAX;
> > + ghcb->ghcb_usage = GHCB_DEFAULT_USAGE;
> > +
> > + ghcb_set_sw_exit_code(ghcb, exit_code);
> > + ghcb_set_sw_exit_info_1(ghcb, exit_info_1);
> > + ghcb_set_sw_exit_info_2(ghcb, exit_info_2);
> > +
> > + sev_es_wr_ghcb_msr(__pa(ghcb));
> > + VMGEXIT();
> > +
> > + if ((ghcb->save.sw_exit_info_1 & 0xffffffff) == 1) {
> ^^^^^^^^^^^
>
> (1UL << 32) - 1
>
> I guess.

Or lower_32_bits(), probably. I'll change it.

Thanks,

Joerg

2020-05-13 09:02:19

by Borislav Petkov

[permalink] [raw]
Subject: Re: [PATCH v3 23/75] x86/boot/compressed/64: Setup GHCB Based VC Exception handler

On Tue, May 12, 2020 at 11:08:12PM +0200, Joerg Roedel wrote:
> No, this is a recent addition, otherwise this breaks out-of-tree builds
> (make O=/some/path ...) because inat-tables.c (included from inat.c) is
> generated during build and ends up in the $(objtree).

Please add a blurb about this above it otherwise no one would have a
clue why it is there.

> Because the immediate is the last part of the instruction which is
> decoded (even if there is no immediate). The .got field is set when
> either the immediate was decoded successfully or, in case the
> instruction has no immediate, when the rest of the instruction was
> decoded successfully. So testing immediate.got is the indicator whether
> decoding was successful.

Hm, whether the immediate was parsed correctly or it wasn't there and
using that as the sign whether the instruction was decoded successfully
sounds kinda arbitrary.

@Masami: shouldn't that insn_get_length() thing return a value to denote
whether the decode was successful or not instead of testing arbitrary
fields?

Pasting for you the code sequence again:

...
insn_get_length(&ctxt->insn);

ret = ctxt->insn.immediate.got ? ES_OK : ES_DECODE_FAILED;
...

I wonder if one would be able to do instead:

if (insn_get_length(&ctxt->insn))
return ES_OK;

return ES_DECODE_FAILED;

to have it straight-forward...

> Probably better rename the function, it also does the sev-es protocol
> version negotiation and all other related setup tasks. Maybe
> sev_es_setup() is a better name?

Sure.

> If the above call fails its the end of the systems lifetime, because we
> can't continue to boot an SEV-ES guest when we have no GHCB.
>
> BUG_ON() and friends are also not available in the pre-decompression
> boot stage.

Oh ok, error() does hlt the system.

Thx.

--
Regards/Gruss,
Boris.

https://people.kernel.org/tglx/notes-about-netiquette

2020-06-03 10:11:30

by Joerg Roedel

[permalink] [raw]
Subject: Re: [PATCH v3 23/75] x86/boot/compressed/64: Setup GHCB Based VC Exception handler

On Mon, May 11, 2020 at 10:07:09PM +0200, Borislav Petkov wrote:
> On Tue, Apr 28, 2020 at 05:16:33PM +0200, Joerg Roedel wrote:
> > @@ -63,3 +175,45 @@ void __init do_vc_no_ghcb(struct pt_regs *regs, unsigned long exit_code)
> > while (true)
> > asm volatile("hlt\n");
> > }
> > +
> > +static enum es_result vc_insn_string_read(struct es_em_ctxt *ctxt,
> > + void *src, char *buf,
> > + unsigned int data_size,
> > + unsigned int count,
> > + bool backwards)
> > +{
> > + int i, b = backwards ? -1 : 1;
> > + enum es_result ret = ES_OK;
> > +
> > + for (i = 0; i < count; i++) {
> > + void *s = src + (i * data_size * b);
> > + char *d = buf + (i * data_size);
>
> >From a previous review:
>
> Where are we checking whether that count is not exceeding @buf or
> similar discrepancies?

These two functions are only called from vc_handle_ioio() and buf always
points to ghcb->shared_buffer.

In general, the caller has to make sure that sizeof(*buf) is at least
data_size*count, and handle_ioio() calculates count based on the size of
*buf.


Joerg