2023-05-15 17:29:58

by Tianyu Lan

Subject: [RFC PATCH V6 00/14] x86/hyperv/sev: Add AMD sev-snp enlightened guest support on hyperv

From: Tianyu Lan <[email protected]>

This patchset adds AMD SEV-SNP enlightened guest
support on Hyper-V. Hyper-V uses the Linux direct boot
mode to boot the Linux kernel, so the kernel needs to
pvalidate system memory by itself.

In the Hyper-V case there is no boot loader, so the cc
blob is prepared by the hypervisor. In this series, the
hypervisor sets the cc blob address directly in the boot
parameters of the Linux kernel.

Memory shared between the guest and the hypervisor should
be decrypted, and zeroed after the decryption: the data at
the target address may be smeared by the encryption change,
so zeroing it avoids exposing that stale data.
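
The decrypt-then-zero ordering described above can be sketched as follows. This is a minimal standalone sketch: `set_memory_decrypted_stub()` is a hypothetical stand-in for the kernel's set_memory_decrypted() (which flips the C-bit in the page tables), used here only to make the example self-contained:

```c
#include <stddef.h>
#include <string.h>

/* Hypothetical stand-in for set_memory_decrypted(); in a real SEV-SNP
 * guest this would change the page-table encryption attribute. */
static int set_memory_decrypted_stub(void *addr, int numpages)
{
	(void)addr;
	(void)numpages;
	return 0;
}

/* Share a buffer with the host: decrypt first, then zero it, so that
 * the smeared (ciphertext-turned-garbage) content exposed by the
 * mapping change is never visible to the hypervisor. */
static int share_page_with_host(void *page, size_t size)
{
	int ret = set_memory_decrypted_stub(page, 1);

	if (ret)
		return ret;
	/* Zero AFTER decryption: zeroing before would write through the
	 * encrypted mapping and still read back as garbage afterwards. */
	memset(page, 0, size);
	return 0;
}
```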

Introduce #HV exception support in the AMD SEV-SNP code
and add a #HV exception handler.

Change since v5:
- Merge Ashish Kalra's patch https://github.com/
ashkalra/linux/commit/6975484094b7cb8d703c45066780dd85043cd040
- Merge patch "x86/sev: Fix interrupt exit code paths from
#HV exception" with patch "x86/sev: Add AMD sev-snp enlightened guest
support on hyperv".
- Fix getting the processor count in hv_snp_get_smp_config() when early is false.

Change since v4:
- Use pgcount to free the input arg page.
- Fix encrypt and free page order.
- Use struct_size() to calculate the array size.
- Share asm code between the #HV and #VC exceptions.

Change since v3:
- Replace struct sev_es_save_area with struct vmcb_save_area
- Move smp, cpu and memory enumerating code from mshyperv.c to ivm.c
- Handle the nested entry case of do_exc_hv()
- Check for NMI events when irqs are disabled

Change since v2:
- Remove the kernel memory validation code at the boot stage
- Split the #HV page patch into two parts
- Remove the HV-APIC change due to enabling x2apic from
the host side
- Rework the vmbus code to handle decrypt page errors
- Split the memory and cpu initialization patch.

Change since v1:
- Remove boot param changes for the cc blob address and
use the setup header to pass cc blob info
- Remove unnecessary WARN and BUG checks
- Add a system vector table map in the #HV exception
- Fix an interrupt exit issue when using the #HV exception

Ashish Kalra (1):
x86/sev: optimize system vector processing invoked from #HV exception

Tianyu Lan (13):
x86/sev: Add a #HV exception handler
x86/sev: Add Check of #HV event in path
x86/sev: Add AMD sev-snp enlightened guest support on hyperv
x86/hyperv: Add sev-snp enlightened guest static key
x86/hyperv: Mark Hyper-V vp assist page unencrypted in SEV-SNP
enlightened guest
x86/hyperv: Set Virtual Trust Level in VMBus init message
x86/hyperv: Use vmmcall to implement Hyper-V hypercall in sev-snp
enlightened guest
clocksource/drivers/hyper-v: decrypt hyperv tsc page in sev-snp
enlightened guest
hv: vmbus: Mask VMBus pages unencrypted for sev-snp enlightened guest
drivers: hv: Decrypt percpu hvcall input arg page in sev-snp
enlightened guest
x86/hyperv: Initialize cpu and memory for sev-snp enlightened guest
x86/hyperv: Add smp support for sev-snp guest
x86/hyperv: Add hyperv-specific handling for VMMCALL under SEV-ES

arch/x86/entry/entry_64.S | 46 ++-
arch/x86/hyperv/hv_init.c | 42 +++
arch/x86/hyperv/ivm.c | 196 +++++++++++++
arch/x86/include/asm/cpu_entry_area.h | 6 +
arch/x86/include/asm/hyperv-tlfs.h | 7 +
arch/x86/include/asm/idtentry.h | 52 +++-
arch/x86/include/asm/irqflags.h | 14 +-
arch/x86/include/asm/mem_encrypt.h | 2 +
arch/x86/include/asm/mshyperv.h | 74 ++++-
arch/x86/include/asm/page_64_types.h | 1 +
arch/x86/include/asm/trapnr.h | 1 +
arch/x86/include/asm/traps.h | 1 +
arch/x86/include/uapi/asm/svm.h | 4 +
arch/x86/kernel/cpu/common.c | 1 +
arch/x86/kernel/cpu/mshyperv.c | 42 ++-
arch/x86/kernel/dumpstack_64.c | 9 +-
arch/x86/kernel/idt.c | 1 +
arch/x86/kernel/sev.c | 404 +++++++++++++++++++++++---
arch/x86/kernel/traps.c | 60 ++++
arch/x86/kernel/vmlinux.lds.S | 7 +
arch/x86/mm/cpu_entry_area.c | 2 +
drivers/clocksource/hyperv_timer.c | 2 +-
drivers/hv/connection.c | 1 +
drivers/hv/hv.c | 37 ++-
drivers/hv/hv_common.c | 27 +-
include/asm-generic/hyperv-tlfs.h | 3 +-
include/asm-generic/mshyperv.h | 1 +
include/linux/hyperv.h | 4 +-
28 files changed, 960 insertions(+), 87 deletions(-)

--
2.25.1



2023-05-15 17:31:22

by Tianyu Lan

Subject: [RFC PATCH V6 01/14] x86/sev: Add a #HV exception handler

From: Tianyu Lan <[email protected]>

Add a #HV exception handler that uses an IST stack.

Co-developed-by: Kalra Ashish <[email protected]>
Signed-off-by: Tianyu Lan <[email protected]>
---
Change since RFC V5:
* Merge Ashish Kalra's patch https://github.com/
ashkalra/linux/commit/6975484094b7cb8d703c45066780dd85043cd040

Change since RFC V2:
* Remove unnecessary line in the change log.
---
arch/x86/entry/entry_64.S | 22 ++++++----
arch/x86/include/asm/cpu_entry_area.h | 6 +++
arch/x86/include/asm/idtentry.h | 40 +++++++++++++++++-
arch/x86/include/asm/page_64_types.h | 1 +
arch/x86/include/asm/trapnr.h | 1 +
arch/x86/include/asm/traps.h | 1 +
arch/x86/kernel/cpu/common.c | 1 +
arch/x86/kernel/dumpstack_64.c | 9 ++++-
arch/x86/kernel/idt.c | 1 +
arch/x86/kernel/sev.c | 53 ++++++++++++++++++++++++
arch/x86/kernel/traps.c | 58 +++++++++++++++++++++++++++
arch/x86/mm/cpu_entry_area.c | 2 +
12 files changed, 183 insertions(+), 12 deletions(-)

diff --git a/arch/x86/entry/entry_64.S b/arch/x86/entry/entry_64.S
index eccc3431e515..653b1f10699b 100644
--- a/arch/x86/entry/entry_64.S
+++ b/arch/x86/entry/entry_64.S
@@ -496,7 +496,7 @@ SYM_CODE_END(\asmsym)

#ifdef CONFIG_AMD_MEM_ENCRYPT
/**
- * idtentry_vc - Macro to generate entry stub for #VC
+ * idtentry_sev - Macro to generate entry stub for #VC
* @vector: Vector number
* @asmsym: ASM symbol for the entry point
* @cfunc: C function to be called
@@ -515,14 +515,18 @@ SYM_CODE_END(\asmsym)
*
* The macro is only used for one vector, but it is planned to be extended in
* the future for the #HV exception.
- */
-.macro idtentry_vc vector asmsym cfunc
+*/
+.macro idtentry_sev vector asmsym cfunc has_error_code:req
SYM_CODE_START(\asmsym)
UNWIND_HINT_IRET_REGS
ENDBR
ASM_CLAC
cld

+ .if \vector == X86_TRAP_HV
+ pushq $-1 /* ORIG_RAX: no syscall */
+ .endif
+
/*
* If the entry is from userspace, switch stacks and treat it as
* a normal entry.
@@ -545,7 +549,12 @@ SYM_CODE_START(\asmsym)
* stack.
*/
movq %rsp, %rdi /* pt_regs pointer */
- call vc_switch_off_ist
+ .if \vector == X86_TRAP_VC
+ call vc_switch_off_ist
+ .else
+ call hv_switch_off_ist
+ .endif
+
movq %rax, %rsp /* Switch to new stack */

ENCODE_FRAME_POINTER
@@ -568,10 +577,7 @@ SYM_CODE_START(\asmsym)

/* Switch to the regular task stack */
.Lfrom_usermode_switch_stack_\@:
- idtentry_body user_\cfunc, has_error_code=1
-
-_ASM_NOKPROBE(\asmsym)
-SYM_CODE_END(\asmsym)
+ idtentry_body user_\cfunc, \has_error_code
.endm
#endif

diff --git a/arch/x86/include/asm/cpu_entry_area.h b/arch/x86/include/asm/cpu_entry_area.h
index 462fc34f1317..2186ed601b4a 100644
--- a/arch/x86/include/asm/cpu_entry_area.h
+++ b/arch/x86/include/asm/cpu_entry_area.h
@@ -30,6 +30,10 @@
char VC_stack[optional_stack_size]; \
char VC2_stack_guard[guardsize]; \
char VC2_stack[optional_stack_size]; \
+ char HV_stack_guard[guardsize]; \
+ char HV_stack[optional_stack_size]; \
+ char HV2_stack_guard[guardsize]; \
+ char HV2_stack[optional_stack_size]; \
char IST_top_guard[guardsize]; \

/* The exception stacks' physical storage. No guard pages required */
@@ -52,6 +56,8 @@ enum exception_stack_ordering {
ESTACK_MCE,
ESTACK_VC,
ESTACK_VC2,
+ ESTACK_HV,
+ ESTACK_HV2,
N_EXCEPTION_STACKS
};

diff --git a/arch/x86/include/asm/idtentry.h b/arch/x86/include/asm/idtentry.h
index b241af4ce9b4..b0f3501b2767 100644
--- a/arch/x86/include/asm/idtentry.h
+++ b/arch/x86/include/asm/idtentry.h
@@ -317,6 +317,19 @@ static __always_inline void __##func(struct pt_regs *regs)
__visible noinstr void kernel_##func(struct pt_regs *regs, unsigned long error_code); \
__visible noinstr void user_##func(struct pt_regs *regs, unsigned long error_code)

+
+/**
+ * DECLARE_IDTENTRY_HV - Declare functions for the HV entry point
+ * @vector: Vector number (ignored for C)
+ * @func: Function name of the entry point
+ *
+ * Maps to DECLARE_IDTENTRY_RAW, but declares also the user C handler.
+ */
+#define DECLARE_IDTENTRY_HV(vector, func) \
+ DECLARE_IDTENTRY_RAW_ERRORCODE(vector, func); \
+ __visible noinstr void kernel_##func(struct pt_regs *regs); \
+ __visible noinstr void user_##func(struct pt_regs *regs)
+
/**
* DEFINE_IDTENTRY_IST - Emit code for IST entry points
* @func: Function name of the entry point
@@ -376,6 +389,26 @@ static __always_inline void __##func(struct pt_regs *regs)
#define DEFINE_IDTENTRY_VC_USER(func) \
DEFINE_IDTENTRY_RAW_ERRORCODE(user_##func)

+/**
+ * DEFINE_IDTENTRY_HV_KERNEL - Emit code for HV injection handler
+ * when raised from kernel mode
+ * @func: Function name of the entry point
+ *
+ * Maps to DEFINE_IDTENTRY_RAW
+ */
+#define DEFINE_IDTENTRY_HV_KERNEL(func) \
+ DEFINE_IDTENTRY_RAW(kernel_##func)
+
+/**
+ * DEFINE_IDTENTRY_HV_USER - Emit code for HV injection handler
+ * when raised from user mode
+ * @func: Function name of the entry point
+ *
+ * Maps to DEFINE_IDTENTRY_RAW
+ */
+#define DEFINE_IDTENTRY_HV_USER(func) \
+ DEFINE_IDTENTRY_RAW(user_##func)
+
#else /* CONFIG_X86_64 */

/**
@@ -463,8 +496,10 @@ __visible noinstr void func(struct pt_regs *regs, \
DECLARE_IDTENTRY(vector, func)

# define DECLARE_IDTENTRY_VC(vector, func) \
- idtentry_vc vector asm_##func func
+ idtentry_sev vector asm_##func func has_error_code=1

+# define DECLARE_IDTENTRY_HV(vector, func) \
+ idtentry_sev vector asm_##func func has_error_code=0
#else
# define DECLARE_IDTENTRY_MCE(vector, func) \
DECLARE_IDTENTRY(vector, func)
@@ -618,9 +653,10 @@ DECLARE_IDTENTRY_RAW_ERRORCODE(X86_TRAP_DF, xenpv_exc_double_fault);
DECLARE_IDTENTRY_ERRORCODE(X86_TRAP_CP, exc_control_protection);
#endif

-/* #VC */
+/* #VC & #HV */
#ifdef CONFIG_AMD_MEM_ENCRYPT
DECLARE_IDTENTRY_VC(X86_TRAP_VC, exc_vmm_communication);
+DECLARE_IDTENTRY_HV(X86_TRAP_HV, exc_hv_injection);
#endif

#ifdef CONFIG_XEN_PV
diff --git a/arch/x86/include/asm/page_64_types.h b/arch/x86/include/asm/page_64_types.h
index e9e2c3ba5923..0bd7dab676c5 100644
--- a/arch/x86/include/asm/page_64_types.h
+++ b/arch/x86/include/asm/page_64_types.h
@@ -29,6 +29,7 @@
#define IST_INDEX_DB 2
#define IST_INDEX_MCE 3
#define IST_INDEX_VC 4
+#define IST_INDEX_HV 5

/*
* Set __PAGE_OFFSET to the most negative possible address +
diff --git a/arch/x86/include/asm/trapnr.h b/arch/x86/include/asm/trapnr.h
index f5d2325aa0b7..c6583631cecb 100644
--- a/arch/x86/include/asm/trapnr.h
+++ b/arch/x86/include/asm/trapnr.h
@@ -26,6 +26,7 @@
#define X86_TRAP_XF 19 /* SIMD Floating-Point Exception */
#define X86_TRAP_VE 20 /* Virtualization Exception */
#define X86_TRAP_CP 21 /* Control Protection Exception */
+#define X86_TRAP_HV 28 /* HV injected exception in SNP restricted mode */
#define X86_TRAP_VC 29 /* VMM Communication Exception */
#define X86_TRAP_IRET 32 /* IRET Exception */

diff --git a/arch/x86/include/asm/traps.h b/arch/x86/include/asm/traps.h
index 47ecfff2c83d..6795d3e517d6 100644
--- a/arch/x86/include/asm/traps.h
+++ b/arch/x86/include/asm/traps.h
@@ -16,6 +16,7 @@ asmlinkage __visible notrace
struct pt_regs *fixup_bad_iret(struct pt_regs *bad_regs);
void __init trap_init(void);
asmlinkage __visible noinstr struct pt_regs *vc_switch_off_ist(struct pt_regs *eregs);
+asmlinkage __visible noinstr struct pt_regs *hv_switch_off_ist(struct pt_regs *eregs);
#endif

extern bool ibt_selftest(void);
diff --git a/arch/x86/kernel/cpu/common.c b/arch/x86/kernel/cpu/common.c
index 8cd4126d8253..5bc44bcf6e48 100644
--- a/arch/x86/kernel/cpu/common.c
+++ b/arch/x86/kernel/cpu/common.c
@@ -2172,6 +2172,7 @@ static inline void tss_setup_ist(struct tss_struct *tss)
tss->x86_tss.ist[IST_INDEX_MCE] = __this_cpu_ist_top_va(MCE);
/* Only mapped when SEV-ES is active */
tss->x86_tss.ist[IST_INDEX_VC] = __this_cpu_ist_top_va(VC);
+ tss->x86_tss.ist[IST_INDEX_HV] = __this_cpu_ist_top_va(HV);
}

#else /* CONFIG_X86_64 */
diff --git a/arch/x86/kernel/dumpstack_64.c b/arch/x86/kernel/dumpstack_64.c
index f05339fee778..6d8f8864810c 100644
--- a/arch/x86/kernel/dumpstack_64.c
+++ b/arch/x86/kernel/dumpstack_64.c
@@ -26,11 +26,14 @@ static const char * const exception_stack_names[] = {
[ ESTACK_MCE ] = "#MC",
[ ESTACK_VC ] = "#VC",
[ ESTACK_VC2 ] = "#VC2",
+ [ ESTACK_HV ] = "#HV",
+ [ ESTACK_HV2 ] = "#HV2",
+
};

const char *stack_type_name(enum stack_type type)
{
- BUILD_BUG_ON(N_EXCEPTION_STACKS != 6);
+ BUILD_BUG_ON(N_EXCEPTION_STACKS != 8);

if (type == STACK_TYPE_TASK)
return "TASK";
@@ -89,6 +92,8 @@ struct estack_pages estack_pages[CEA_ESTACK_PAGES] ____cacheline_aligned = {
EPAGERANGE(MCE),
EPAGERANGE(VC),
EPAGERANGE(VC2),
+ EPAGERANGE(HV),
+ EPAGERANGE(HV2),
};

static __always_inline bool in_exception_stack(unsigned long *stack, struct stack_info *info)
@@ -98,7 +103,7 @@ static __always_inline bool in_exception_stack(unsigned long *stack, struct stac
struct pt_regs *regs;
unsigned int k;

- BUILD_BUG_ON(N_EXCEPTION_STACKS != 6);
+ BUILD_BUG_ON(N_EXCEPTION_STACKS != 8);

begin = (unsigned long)__this_cpu_read(cea_exception_stacks);
/*
diff --git a/arch/x86/kernel/idt.c b/arch/x86/kernel/idt.c
index a58c6bc1cd68..48c0a7e1dbcb 100644
--- a/arch/x86/kernel/idt.c
+++ b/arch/x86/kernel/idt.c
@@ -113,6 +113,7 @@ static const __initconst struct idt_data def_idts[] = {

#ifdef CONFIG_AMD_MEM_ENCRYPT
ISTG(X86_TRAP_VC, asm_exc_vmm_communication, IST_INDEX_VC),
+ ISTG(X86_TRAP_HV, asm_exc_hv_injection, IST_INDEX_HV),
#endif

SYSG(X86_TRAP_OF, asm_exc_overflow),
diff --git a/arch/x86/kernel/sev.c b/arch/x86/kernel/sev.c
index b031244d6d2d..e25445de0957 100644
--- a/arch/x86/kernel/sev.c
+++ b/arch/x86/kernel/sev.c
@@ -2006,6 +2006,59 @@ DEFINE_IDTENTRY_VC_USER(exc_vmm_communication)
irqentry_exit_to_user_mode(regs);
}

+static bool hv_raw_handle_exception(struct pt_regs *regs)
+{
+ return false;
+}
+
+static __always_inline bool on_hv_fallback_stack(struct pt_regs *regs)
+{
+ unsigned long sp = (unsigned long)regs;
+
+ return (sp >= __this_cpu_ist_bottom_va(HV2) && sp < __this_cpu_ist_top_va(HV2));
+}
+
+DEFINE_IDTENTRY_HV_USER(exc_hv_injection)
+{
+ irqentry_enter_from_user_mode(regs);
+ instrumentation_begin();
+
+ if (!hv_raw_handle_exception(regs)) {
+ /*
+ * Do not kill the machine if user-space triggered the
+ * exception. Send SIGBUS instead and let user-space deal
+ * with it.
+ */
+ force_sig_fault(SIGBUS, BUS_OBJERR, (void __user *)0);
+ }
+
+ instrumentation_end();
+ irqentry_exit_to_user_mode(regs);
+}
+
+DEFINE_IDTENTRY_HV_KERNEL(exc_hv_injection)
+{
+ irqentry_state_t irq_state;
+
+ irq_state = irqentry_enter(regs);
+ instrumentation_begin();
+
+ if (!hv_raw_handle_exception(regs)) {
+ pr_emerg("PANIC: Unhandled #HV exception in kernel space\n");
+
+ /* Show some debug info */
+ show_regs(regs);
+
+ /* Ask hypervisor to sev_es_terminate */
+ sev_es_terminate(SEV_TERM_SET_GEN, GHCB_SEV_ES_GEN_REQ);
+
+ panic("Returned from Terminate-Request to Hypervisor\n");
+ }
+
+ instrumentation_end();
+ irqentry_exit(regs, irq_state);
+}
+
bool __init handle_vc_boot_ghcb(struct pt_regs *regs)
{
unsigned long exit_code = regs->orig_ax;
diff --git a/arch/x86/kernel/traps.c b/arch/x86/kernel/traps.c
index d317dc3d06a3..5dca05d0fa38 100644
--- a/arch/x86/kernel/traps.c
+++ b/arch/x86/kernel/traps.c
@@ -905,6 +905,64 @@ asmlinkage __visible noinstr struct pt_regs *vc_switch_off_ist(struct pt_regs *r

return regs_ret;
}
+
+asmlinkage __visible noinstr struct pt_regs *hv_switch_off_ist(struct pt_regs *regs)
+{
+ unsigned long sp, *stack;
+ struct stack_info info;
+ struct pt_regs *regs_ret;
+
+ /*
+ * In the SYSCALL entry path the RSP value comes from user-space - don't
+ * trust it and switch to the current kernel stack
+ */
+ if (ip_within_syscall_gap(regs)) {
+ sp = this_cpu_read(pcpu_hot.top_of_stack);
+ goto sync;
+ }
+
+ /*
+ * From here on the RSP value is trusted. Now check whether entry
+ * happened from a safe stack. Not safe are the entry or unknown stacks,
+ * use the fall-back stack instead in this case.
+ */
+ sp = regs->sp;
+ stack = (unsigned long *)sp;
+
+ /*
+ * We support nested #HV exceptions once the IST stack is
+ * switched out. The HV can always inject an #HV, but as per
+ * GHCB specs, the HV will not inject another #HV, if
+ * PendingEvent.NoFurtherSignal is set and we only clear this
+ * after switching out the IST stack and handling the current
+ * #HV. But there is still a window before the IST stack is
+ * switched out, where a malicious HV can inject nested #HV.
+ * The code below checks the interrupted stack to check if
+ * it is the IST stack, and if so panic as this is
+ * not supported and this nested #HV would have corrupted
+ * the iret frame of the previous #HV on the IST stack.
+ */
+ if (get_stack_info_noinstr(stack, current, &info) &&
+ (info.type == (STACK_TYPE_EXCEPTION + ESTACK_HV) ||
+ info.type == (STACK_TYPE_EXCEPTION + ESTACK_HV2)))
+ panic("Nested #HV exception, HV IST corrupted, stack type = %d\n", info.type);
+
+ if (!get_stack_info_noinstr(stack, current, &info) || info.type == STACK_TYPE_ENTRY ||
+ info.type > STACK_TYPE_EXCEPTION_LAST)
+ sp = __this_cpu_ist_top_va(HV2);
+sync:
+ /*
+ * Found a safe stack - switch to it as if the entry didn't happen via
+ * IST stack. The code below only copies pt_regs, the real switch happens
+ * in assembly code.
+ */
+ sp = ALIGN_DOWN(sp, 8) - sizeof(*regs_ret);
+
+ regs_ret = (struct pt_regs *)sp;
+ *regs_ret = *regs;
+
+ return regs_ret;
+}
#endif

asmlinkage __visible noinstr struct pt_regs *fixup_bad_iret(struct pt_regs *bad_regs)
diff --git a/arch/x86/mm/cpu_entry_area.c b/arch/x86/mm/cpu_entry_area.c
index e91500a80963..97554fa0ff30 100644
--- a/arch/x86/mm/cpu_entry_area.c
+++ b/arch/x86/mm/cpu_entry_area.c
@@ -160,6 +160,8 @@ static void __init percpu_setup_exception_stacks(unsigned int cpu)
if (cc_platform_has(CC_ATTR_GUEST_STATE_ENCRYPT)) {
cea_map_stack(VC);
cea_map_stack(VC2);
+ cea_map_stack(HV);
+ cea_map_stack(HV2);
}
}
}
--
2.25.1
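
The stack checks in hv_switch_off_ist() and on_hv_fallback_stack() in the patch above boil down to a half-open interval test on the interrupted stack pointer: a pointer is "on" a stack iff it lies in [bottom, top). A minimal standalone sketch (the bounds here are illustrative; in the kernel they come from the per-CPU cea_exception_stacks area):

```c
#include <stdbool.h>

/* Half-open interval test mirroring on_hv_fallback_stack(): top is
 * exclusive because a pt_regs frame pushed at the very top of the
 * stack yields a regs pointer strictly below the top address. */
static bool on_stack(unsigned long sp, unsigned long bottom, unsigned long top)
{
	return sp >= bottom && sp < top;
}
```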


2023-05-15 17:31:56

by Tianyu Lan

Subject: [RFC PATCH V6 09/14] clocksource/drivers/hyper-v: decrypt hyperv tsc page in sev-snp enlightened guest

From: Tianyu Lan <[email protected]>

The Hyper-V TSC page is shared with the hypervisor, so it
should be decrypted in an SEV-SNP enlightened guest before
it is used.

Signed-off-by: Tianyu Lan <[email protected]>
---
Change since RFC V2:
* Change the Subject line prefix
---
drivers/clocksource/hyperv_timer.c | 2 +-
1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/drivers/clocksource/hyperv_timer.c b/drivers/clocksource/hyperv_timer.c
index bcd9042a0c9f..66e29a19770b 100644
--- a/drivers/clocksource/hyperv_timer.c
+++ b/drivers/clocksource/hyperv_timer.c
@@ -376,7 +376,7 @@ EXPORT_SYMBOL_GPL(hv_stimer_global_cleanup);
static union {
struct ms_hyperv_tsc_page page;
u8 reserved[PAGE_SIZE];
-} tsc_pg __aligned(PAGE_SIZE);
+} tsc_pg __bss_decrypted __aligned(PAGE_SIZE);

static struct ms_hyperv_tsc_page *tsc_page = &tsc_pg.page;
static unsigned long tsc_pfn;
--
2.25.1


2023-05-15 17:32:29

by Tianyu Lan

Subject: [RFC PATCH V6 13/14] x86/hyperv: Add smp support for sev-snp guest

From: Tianyu Lan <[email protected]>

The wakeup_secondary_cpu callback was populated with
wakeup_cpu_via_vmgexit(), which doesn't work on Hyper-V:
Hyper-V requires a Hyper-V specific hypercall to start APs.
So override it with a Hyper-V specific hook that builds a
sev_es_save_area data structure for the AP and starts it
via the HvCallStartVirtualProcessor hypercall.
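
The hv_populate_vmcb_seg() macro in this patch packs bytes 5-6 of a GDT descriptor into the VMSA segment attribute format: the access byte stays in bits 0-7 and the G/DB/L/AVL flags nibble moves down to bits 8-11. A standalone sketch of that packing (the descriptor values in the test are typical for 64-bit kernel code and data segments, not taken from the patch):

```c
#include <stdint.h>

/* Derive the VMSA/VMCB segment "attrib" field from the u16 read at
 * offset 5 of a GDT descriptor (access byte at offset 5, flags +
 * limit-high nibbles at offset 6), as hv_populate_vmcb_seg() does:
 * keep the access byte in bits 0-7 and shift the flags nibble
 * (bits 12-15) down into bits 8-11, dropping the limit-high nibble. */
static uint16_t vmcb_seg_attrib(uint16_t gdt_bytes_5_6)
{
	return (gdt_bytes_5_6 & 0xFF) | ((gdt_bytes_5_6 >> 4) & 0xF00);
}
```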

Signed-off-by: Tianyu Lan <[email protected]>
---
Change since RFC v5:
* Remove some redundant structure definitions

Change since RFC v3:
* Replace struct sev_es_save_area with struct
vmcb_save_area
* Move code from mshyperv.c to ivm.c

Change since RFC v2:
* Add helper function to initialize segment
* Fix some coding style
---
arch/x86/hyperv/ivm.c | 98 +++++++++++++++++++++++++++++++
arch/x86/include/asm/mshyperv.h | 10 ++++
arch/x86/kernel/cpu/mshyperv.c | 13 +++-
include/asm-generic/hyperv-tlfs.h | 3 +-
4 files changed, 121 insertions(+), 3 deletions(-)

diff --git a/arch/x86/hyperv/ivm.c b/arch/x86/hyperv/ivm.c
index 85e4378f052f..b7b8e1ba8223 100644
--- a/arch/x86/hyperv/ivm.c
+++ b/arch/x86/hyperv/ivm.c
@@ -22,11 +22,15 @@
#include <asm/sev.h>
#include <asm/realmode.h>
#include <asm/e820/api.h>
+#include <asm/desc.h>

#ifdef CONFIG_AMD_MEM_ENCRYPT

#define GHCB_USAGE_HYPERV_CALL 1

+static u8 ap_start_input_arg[PAGE_SIZE] __bss_decrypted __aligned(PAGE_SIZE);
+static u8 ap_start_stack[PAGE_SIZE] __aligned(PAGE_SIZE);
+
union hv_ghcb {
struct ghcb ghcb;
struct {
@@ -443,6 +447,100 @@ __init void hv_sev_init_mem_and_cpu(void)
}
}

+#define hv_populate_vmcb_seg(seg, gdtr_base) \
+do { \
+ if (seg.selector) { \
+ seg.base = 0; \
+ seg.limit = HV_AP_SEGMENT_LIMIT; \
+ seg.attrib = *(u16 *)(gdtr_base + seg.selector + 5); \
+ seg.attrib = (seg.attrib & 0xFF) | ((seg.attrib >> 4) & 0xF00); \
+ } \
+} while (0) \
+
+int hv_snp_boot_ap(int cpu, unsigned long start_ip)
+{
+ struct sev_es_save_area *vmsa = (struct sev_es_save_area *)
+ __get_free_page(GFP_KERNEL | __GFP_ZERO);
+ struct desc_ptr gdtr;
+ u64 ret, rmp_adjust, retry = 5;
+ struct hv_enable_vp_vtl *start_vp_input;
+ unsigned long flags;
+
+ native_store_gdt(&gdtr);
+
+ vmsa->gdtr.base = gdtr.address;
+ vmsa->gdtr.limit = gdtr.size;
+
+ asm volatile("movl %%es, %%eax;" : "=a" (vmsa->es.selector));
+ hv_populate_vmcb_seg(vmsa->es, vmsa->gdtr.base);
+
+ asm volatile("movl %%cs, %%eax;" : "=a" (vmsa->cs.selector));
+ hv_populate_vmcb_seg(vmsa->cs, vmsa->gdtr.base);
+
+ asm volatile("movl %%ss, %%eax;" : "=a" (vmsa->ss.selector));
+ hv_populate_vmcb_seg(vmsa->ss, vmsa->gdtr.base);
+
+ asm volatile("movl %%ds, %%eax;" : "=a" (vmsa->ds.selector));
+ hv_populate_vmcb_seg(vmsa->ds, vmsa->gdtr.base);
+
+ vmsa->efer = native_read_msr(MSR_EFER);
+
+ asm volatile("movq %%cr4, %%rax;" : "=a" (vmsa->cr4));
+ asm volatile("movq %%cr3, %%rax;" : "=a" (vmsa->cr3));
+ asm volatile("movq %%cr0, %%rax;" : "=a" (vmsa->cr0));
+
+ vmsa->xcr0 = 1;
+ vmsa->g_pat = HV_AP_INIT_GPAT_DEFAULT;
+ vmsa->rip = (u64)secondary_startup_64_no_verify;
+ vmsa->rsp = (u64)&ap_start_stack[PAGE_SIZE];
+
+ /*
+ * Set the SNP-specific fields for this VMSA:
+ * VMPL level
+ * SEV_FEATURES (matches the SEV STATUS MSR right shifted 2 bits)
+ */
+ vmsa->vmpl = 0;
+ vmsa->sev_features = sev_status >> 2;
+
+ /*
+ * Running at VMPL0 allows the kernel to change the VMSA bit for a page
+ * using the RMPADJUST instruction. However, for the instruction to
+ * succeed it must target the permissions of a lesser privileged
+ * (higher numbered) VMPL level, so use VMPL1 (refer to the RMPADJUST
+ * instruction in the AMD64 APM Volume 3).
+ */
+ rmp_adjust = RMPADJUST_VMSA_PAGE_BIT | 1;
+ ret = rmpadjust((unsigned long)vmsa, RMP_PG_SIZE_4K,
+ rmp_adjust);
+ if (ret != 0) {
+ pr_err("RMPADJUST(%llx) failed: %llx\n", (u64)vmsa, ret);
+ return ret;
+ }
+
+ local_irq_save(flags);
+ start_vp_input =
+ (struct hv_enable_vp_vtl *)ap_start_input_arg;
+ memset(start_vp_input, 0, sizeof(*start_vp_input));
+ start_vp_input->partition_id = -1;
+ start_vp_input->vp_index = cpu;
+ start_vp_input->target_vtl.target_vtl = ms_hyperv.vtl;
+ *(u64 *)&start_vp_input->vp_context = __pa(vmsa) | 1;
+
+ do {
+ ret = hv_do_hypercall(HVCALL_START_VP,
+ start_vp_input, NULL);
+ } while (hv_result(ret) == HV_STATUS_TIME_OUT && retry--);
+
+ if (!hv_result_success(ret)) {
+ pr_err("HvCallStartVirtualProcessor failed: %llx\n", ret);
+ goto done;
+ }
+
+done:
+ local_irq_restore(flags);
+ return ret;
+}
+
void __init hv_vtom_init(void)
{
/*
diff --git a/arch/x86/include/asm/mshyperv.h b/arch/x86/include/asm/mshyperv.h
index 84e024ffacd5..9ad2a0f21d68 100644
--- a/arch/x86/include/asm/mshyperv.h
+++ b/arch/x86/include/asm/mshyperv.h
@@ -65,6 +65,13 @@ struct memory_map_entry {
u32 reserved;
};

+/*
+ * DEFAULT INIT GPAT and SEGMENT LIMIT value in struct VMSA
+ * to start AP in enlightened SEV guest.
+ */
+#define HV_AP_INIT_GPAT_DEFAULT 0x0007040600070406ULL
+#define HV_AP_SEGMENT_LIMIT 0xffffffff
+
int hv_call_deposit_pages(int node, u64 partition_id, u32 num_pages);
int hv_call_add_logical_proc(int node, u32 lp_index, u32 acpi_id);
int hv_call_create_vp(int node, u64 partition_id, u32 vp_index, u32 flags);
@@ -263,6 +270,7 @@ struct irq_domain *hv_create_pci_msi_domain(void);
int hv_map_ioapic_interrupt(int ioapic_id, bool level, int vcpu, int vector,
struct hv_interrupt_entry *entry);
int hv_unmap_ioapic_interrupt(int ioapic_id, struct hv_interrupt_entry *entry);
+int hv_snp_boot_ap(int cpu, unsigned long start_ip);

#ifdef CONFIG_AMD_MEM_ENCRYPT
void hv_ghcb_msr_write(u64 msr, u64 value);
@@ -271,6 +279,7 @@ bool hv_ghcb_negotiate_protocol(void);
void hv_ghcb_terminate(unsigned int set, unsigned int reason);
void hv_vtom_init(void);
void hv_sev_init_mem_and_cpu(void);
+int hv_snp_boot_ap(int cpu, unsigned long start_ip);
#else
static inline void hv_ghcb_msr_write(u64 msr, u64 value) {}
static inline void hv_ghcb_msr_read(u64 msr, u64 *value) {}
@@ -278,6 +287,7 @@ static inline bool hv_ghcb_negotiate_protocol(void) { return false; }
static inline void hv_ghcb_terminate(unsigned int set, unsigned int reason) {}
static inline void hv_vtom_init(void) {}
static inline void hv_sev_init_mem_and_cpu(void) {}
+static inline int hv_snp_boot_ap(int cpu, unsigned long start_ip) { return 0; }
#endif

extern bool hv_isolation_type_snp(void);
diff --git a/arch/x86/kernel/cpu/mshyperv.c b/arch/x86/kernel/cpu/mshyperv.c
index dea9b881180b..0c5f9f7bd7ba 100644
--- a/arch/x86/kernel/cpu/mshyperv.c
+++ b/arch/x86/kernel/cpu/mshyperv.c
@@ -295,6 +295,16 @@ static void __init hv_smp_prepare_cpus(unsigned int max_cpus)

native_smp_prepare_cpus(max_cpus);

+ /*
+ * Override wakeup_secondary_cpu_64 callback for SEV-SNP
+ * enlightened guest.
+ */
+ if (hv_isolation_type_en_snp())
+ apic->wakeup_secondary_cpu_64 = hv_snp_boot_ap;
+
+ if (!hv_root_partition)
+ return;
+
#ifdef CONFIG_X86_64
for_each_present_cpu(i) {
if (i == 0)
@@ -502,8 +512,7 @@ static void __init ms_hyperv_init_platform(void)

# ifdef CONFIG_SMP
smp_ops.smp_prepare_boot_cpu = hv_smp_prepare_boot_cpu;
- if (hv_root_partition)
- smp_ops.smp_prepare_cpus = hv_smp_prepare_cpus;
+ smp_ops.smp_prepare_cpus = hv_smp_prepare_cpus;
# endif

/*
diff --git a/include/asm-generic/hyperv-tlfs.h b/include/asm-generic/hyperv-tlfs.h
index f4e4cc4f965f..92dcc530350c 100644
--- a/include/asm-generic/hyperv-tlfs.h
+++ b/include/asm-generic/hyperv-tlfs.h
@@ -146,9 +146,9 @@ union hv_reference_tsc_msr {
/* Declare the various hypercall operations. */
#define HVCALL_FLUSH_VIRTUAL_ADDRESS_SPACE 0x0002
#define HVCALL_FLUSH_VIRTUAL_ADDRESS_LIST 0x0003
-#define HVCALL_ENABLE_VP_VTL 0x000f
#define HVCALL_NOTIFY_LONG_SPIN_WAIT 0x0008
#define HVCALL_SEND_IPI 0x000b
+#define HVCALL_ENABLE_VP_VTL 0x000f
#define HVCALL_FLUSH_VIRTUAL_ADDRESS_SPACE_EX 0x0013
#define HVCALL_FLUSH_VIRTUAL_ADDRESS_LIST_EX 0x0014
#define HVCALL_SEND_IPI_EX 0x0015
@@ -223,6 +223,7 @@ enum HV_GENERIC_SET_FORMAT {
#define HV_STATUS_INVALID_PORT_ID 17
#define HV_STATUS_INVALID_CONNECTION_ID 18
#define HV_STATUS_INSUFFICIENT_BUFFERS 19
+#define HV_STATUS_TIME_OUT 120
#define HV_STATUS_VTL_ALREADY_ENABLED 134

/*
--
2.25.1


2023-05-15 17:33:03

by Tianyu Lan

Subject: [RFC PATCH V6 14/14] x86/hyperv: Add hyperv-specific handling for VMMCALL under SEV-ES

From: Tianyu Lan <[email protected]>

Add Hyper-V specific handling for faults caused by VMMCALL
instructions.
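
A minimal sketch of what the prepare hook forwards, using mock structures (the real code operates on struct ghcb and struct pt_regs): per the Hyper-V hypercall calling convention the control word is passed in RCX, the input GPA in RDX and the output GPA in R8, so only those registers need to be exposed in the GHCB in addition to RAX/CPL, which the #VC handler has already filled in:

```c
#include <stdint.h>

/* Mock of just the GHCB fields touched by hv_sev_es_hcall_prepare(). */
struct mock_ghcb { uint64_t rcx, rdx, r8; };
struct mock_regs { uint64_t cx, dx, r8; };

static void hcall_prepare(struct mock_ghcb *ghcb, const struct mock_regs *regs)
{
	ghcb->rcx = regs->cx; /* hypercall control word */
	ghcb->rdx = regs->dx; /* input parameter GPA */
	ghcb->r8  = regs->r8; /* output parameter GPA */
}
```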

Signed-off-by: Tianyu Lan <[email protected]>
---
arch/x86/kernel/cpu/mshyperv.c | 17 +++++++++++++++++
1 file changed, 17 insertions(+)

diff --git a/arch/x86/kernel/cpu/mshyperv.c b/arch/x86/kernel/cpu/mshyperv.c
index 0c5f9f7bd7ba..3469b369e627 100644
--- a/arch/x86/kernel/cpu/mshyperv.c
+++ b/arch/x86/kernel/cpu/mshyperv.c
@@ -32,6 +32,7 @@
#include <asm/nmi.h>
#include <clocksource/hyperv_timer.h>
#include <asm/numa.h>
+#include <asm/svm.h>

/* Is Linux running as the root partition? */
bool hv_root_partition;
@@ -577,6 +578,20 @@ static bool __init ms_hyperv_msi_ext_dest_id(void)
return eax & HYPERV_VS_PROPERTIES_EAX_EXTENDED_IOAPIC_RTE;
}

+static void hv_sev_es_hcall_prepare(struct ghcb *ghcb, struct pt_regs *regs)
+{
+ /* RAX and CPL are already in the GHCB */
+ ghcb_set_rcx(ghcb, regs->cx);
+ ghcb_set_rdx(ghcb, regs->dx);
+ ghcb_set_r8(ghcb, regs->r8);
+}
+
+static bool hv_sev_es_hcall_finish(struct ghcb *ghcb, struct pt_regs *regs)
+{
+ /* No checking of the return state needed */
+ return true;
+}
+
const __initconst struct hypervisor_x86 x86_hyper_ms_hyperv = {
.name = "Microsoft Hyper-V",
.detect = ms_hyperv_platform,
@@ -584,4 +599,6 @@ const __initconst struct hypervisor_x86 x86_hyper_ms_hyperv = {
.init.x2apic_available = ms_hyperv_x2apic_available,
.init.msi_ext_dest_id = ms_hyperv_msi_ext_dest_id,
.init.init_platform = ms_hyperv_init_platform,
+ .runtime.sev_es_hcall_prepare = hv_sev_es_hcall_prepare,
+ .runtime.sev_es_hcall_finish = hv_sev_es_hcall_finish,
};
--
2.25.1


2023-05-15 17:34:14

by Tianyu Lan

Subject: [RFC PATCH V6 07/14] x86/hyperv: Set Virtual Trust Level in VMBus init message

From: Tianyu Lan <[email protected]>

An SEV-SNP guest runs at a VTL (Virtual Trust Level), which
is retrieved from Hyper-V via the HVCALL_GET_VP_REGISTERS
hypercall. Set the target VTL in the VMBus init message.
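
The VTL extraction added later in this patch (output->as64.low & HV_X64_VTL_MASK, with HV_X64_VTL_MASK = GENMASK(3, 0)) reduces to masking the low four bits of the VSM VP status register; a standalone sketch with a made-up register value:

```c
#include <stdint.h>

/* GENMASK(3, 0) expands to 0xF: the VTL occupies the low four bits of
 * HV_X64_REGISTER_VSM_VP_STATUS. */
#define HV_X64_VTL_MASK 0xFULL

static uint8_t vtl_from_vsm_vp_status(uint64_t reg_low)
{
	return (uint8_t)(reg_low & HV_X64_VTL_MASK);
}
```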

Signed-off-by: Tianyu Lan <[email protected]>
---
Change since RFC v4:
* Use struct_size to calculate array size.
* Fix some coding style

Change since RFC v3:
* Use the standard helper functions to check hypercall result
* Fix coding style

Change since RFC v2:
* Rename get_current_vtl() to get_vtl()
* Fix some coding style issues
---
arch/x86/hyperv/hv_init.c | 36 ++++++++++++++++++++++++++++++
arch/x86/include/asm/hyperv-tlfs.h | 7 ++++++
drivers/hv/connection.c | 1 +
include/asm-generic/mshyperv.h | 1 +
include/linux/hyperv.h | 4 ++--
5 files changed, 47 insertions(+), 2 deletions(-)

diff --git a/arch/x86/hyperv/hv_init.c b/arch/x86/hyperv/hv_init.c
index 9f3e2d71d015..331b855314b7 100644
--- a/arch/x86/hyperv/hv_init.c
+++ b/arch/x86/hyperv/hv_init.c
@@ -384,6 +384,40 @@ static void __init hv_get_partition_id(void)
local_irq_restore(flags);
}

+static u8 __init get_vtl(void)
+{
+ u64 control = HV_HYPERCALL_REP_COMP_1 | HVCALL_GET_VP_REGISTERS;
+ struct hv_get_vp_registers_input *input;
+ struct hv_get_vp_registers_output *output;
+ u64 vtl = 0;
+ u64 ret;
+ unsigned long flags;
+
+ local_irq_save(flags);
+ input = *this_cpu_ptr(hyperv_pcpu_input_arg);
+ output = (struct hv_get_vp_registers_output *)input;
+ if (!input) {
+ local_irq_restore(flags);
+ goto done;
+ }
+
+ memset(input, 0, struct_size(input, element, 1));
+ input->header.partitionid = HV_PARTITION_ID_SELF;
+ input->header.vpindex = HV_VP_INDEX_SELF;
+ input->header.inputvtl = 0;
+ input->element[0].name0 = HV_X64_REGISTER_VSM_VP_STATUS;
+
+ ret = hv_do_hypercall(control, input, output);
+ if (hv_result_success(ret))
+ vtl = output->as64.low & HV_X64_VTL_MASK;
+ else
+ pr_err("Hyper-V: failed to get VTL! %lld", ret);
+ local_irq_restore(flags);
+
+done:
+ return vtl;
+}
+
/*
* This function is to be invoked early in the boot sequence after the
* hypervisor has been detected.
@@ -512,6 +546,8 @@ void __init hyperv_init(void)
/* Query the VMs extended capability once, so that it can be cached. */
hv_query_ext_cap(0);

+ /* Find the VTL */
+ ms_hyperv.vtl = get_vtl();
return;

clean_guest_os_id:
diff --git a/arch/x86/include/asm/hyperv-tlfs.h b/arch/x86/include/asm/hyperv-tlfs.h
index cea95dcd27c2..4bf0b315b0ce 100644
--- a/arch/x86/include/asm/hyperv-tlfs.h
+++ b/arch/x86/include/asm/hyperv-tlfs.h
@@ -301,6 +301,13 @@ enum hv_isolation_type {
#define HV_X64_MSR_TIME_REF_COUNT HV_REGISTER_TIME_REF_COUNT
#define HV_X64_MSR_REFERENCE_TSC HV_REGISTER_REFERENCE_TSC

+/*
+ * Registers are only accessible via HVCALL_GET_VP_REGISTERS hvcall and
+ * there is no associated MSR address.
+ */
+#define HV_X64_REGISTER_VSM_VP_STATUS 0x000D0003
+#define HV_X64_VTL_MASK GENMASK(3, 0)
+
/* Hyper-V memory host visibility */
enum hv_mem_host_visibility {
VMBUS_PAGE_NOT_VISIBLE = 0,
diff --git a/drivers/hv/connection.c b/drivers/hv/connection.c
index 5978e9dbc286..02b54f85dc60 100644
--- a/drivers/hv/connection.c
+++ b/drivers/hv/connection.c
@@ -98,6 +98,7 @@ int vmbus_negotiate_version(struct vmbus_channel_msginfo *msginfo, u32 version)
*/
if (version >= VERSION_WIN10_V5) {
msg->msg_sint = VMBUS_MESSAGE_SINT;
+ msg->msg_vtl = ms_hyperv.vtl;
vmbus_connection.msg_conn_id = VMBUS_MESSAGE_CONNECTION_ID_4;
} else {
msg->interrupt_page = virt_to_phys(vmbus_connection.int_page);
diff --git a/include/asm-generic/mshyperv.h b/include/asm-generic/mshyperv.h
index 402a8c1c202d..3052130ba4ef 100644
--- a/include/asm-generic/mshyperv.h
+++ b/include/asm-generic/mshyperv.h
@@ -48,6 +48,7 @@ struct ms_hyperv_info {
};
};
u64 shared_gpa_boundary;
+ u8 vtl;
};
extern struct ms_hyperv_info ms_hyperv;
extern bool hv_nested;
diff --git a/include/linux/hyperv.h b/include/linux/hyperv.h
index bfbc37ce223b..1f2bfec4abde 100644
--- a/include/linux/hyperv.h
+++ b/include/linux/hyperv.h
@@ -665,8 +665,8 @@ struct vmbus_channel_initiate_contact {
u64 interrupt_page;
struct {
u8 msg_sint;
- u8 padding1[3];
- u32 padding2;
+ u8 msg_vtl;
+ u8 reserved[6];
};
};
u64 monitor_page1;
--
2.25.1


2023-05-15 17:34:38

by Tianyu Lan

[permalink] [raw]
Subject: [RFC PATCH V6 10/14] hv: vmbus: Mask VMBus pages unencrypted for sev-snp enlightened guest

From: Tianyu Lan <[email protected]>

The VMBus post message, SynIC event and SynIC message pages need to
be shared with the hypervisor, so mark these pages unencrypted in
the SEV-SNP guest.

Signed-off-by: Tianyu Lan <[email protected]>
---
Change since RFC V4:
* Fix encrypt and free page order.

Change since RFC V3:
* Set encrypt page back in the hv_synic_free()

Change since RFC V2:
* Fix error in the error code path and encrypt
pages correctly when decryption failure happens.
---
drivers/hv/hv.c | 37 ++++++++++++++++++++++++++++++++++---
1 file changed, 34 insertions(+), 3 deletions(-)

diff --git a/drivers/hv/hv.c b/drivers/hv/hv.c
index de6708dbe0df..d29bbf0c7108 100644
--- a/drivers/hv/hv.c
+++ b/drivers/hv/hv.c
@@ -20,6 +20,7 @@
#include <linux/interrupt.h>
#include <clocksource/hyperv_timer.h>
#include <asm/mshyperv.h>
+#include <linux/set_memory.h>
#include "hyperv_vmbus.h"

/* The one and only */
@@ -78,7 +79,7 @@ int hv_post_message(union hv_connection_id connection_id,

int hv_synic_alloc(void)
{
- int cpu;
+ int cpu, ret;
struct hv_per_cpu_context *hv_cpu;

/*
@@ -123,9 +124,29 @@ int hv_synic_alloc(void)
goto err;
}
}
+
+ if (hv_isolation_type_en_snp()) {
+ ret = set_memory_decrypted((unsigned long)
+ hv_cpu->synic_message_page, 1);
+ if (ret)
+ goto err;
+
+ ret = set_memory_decrypted((unsigned long)
+ hv_cpu->synic_event_page, 1);
+ if (ret)
+ goto err_decrypt_event_page;
+
+ memset(hv_cpu->synic_message_page, 0, PAGE_SIZE);
+ memset(hv_cpu->synic_event_page, 0, PAGE_SIZE);
+ }
}

return 0;
+
+err_decrypt_event_page:
+ set_memory_encrypted((unsigned long)
+ hv_cpu->synic_message_page, 1);
+
err:
/*
* Any memory allocations that succeeded will be freed when
@@ -143,8 +164,18 @@ void hv_synic_free(void)
struct hv_per_cpu_context *hv_cpu
= per_cpu_ptr(hv_context.cpu_context, cpu);

- free_page((unsigned long)hv_cpu->synic_event_page);
- free_page((unsigned long)hv_cpu->synic_message_page);
+ if (hv_isolation_type_en_snp()) {
+ if (!set_memory_encrypted((unsigned long)
+ hv_cpu->synic_message_page, 1))
+ free_page((unsigned long)hv_cpu->synic_message_page);
+
+ if (!set_memory_encrypted((unsigned long)
+ hv_cpu->synic_event_page, 1))
+ free_page((unsigned long)hv_cpu->synic_event_page);
+ } else {
+ free_page((unsigned long)hv_cpu->synic_event_page);
+ free_page((unsigned long)hv_cpu->synic_message_page);
+ }
}

kfree(hv_context.hv_numa_map);
--
2.25.1


2023-05-15 17:35:32

by Tianyu Lan

[permalink] [raw]
Subject: [RFC PATCH V6 11/14] drivers: hv: Decrypt percpu hvcall input arg page in sev-snp enlightened guest

From: Tianyu Lan <[email protected]>

The hypervisor needs to access the per-cpu hypercall input arg page,
so the guest should decrypt the page.

Signed-off-by: Tianyu Lan <[email protected]>
---
Change since RFC V4:
* Use pgcount to free input arg page

Change since RFC V3:
* Use pgcount to decrypt memory.

Change since RFC V2:
* Set inputarg to be zero after kfree()
* Not free mem when fail to encrypt mem in the hv_common_cpu_die().
---
drivers/hv/hv_common.c | 21 ++++++++++++++++++++-
1 file changed, 20 insertions(+), 1 deletion(-)

diff --git a/drivers/hv/hv_common.c b/drivers/hv/hv_common.c
index 179bc5f5bf52..15d3054f3440 100644
--- a/drivers/hv/hv_common.c
+++ b/drivers/hv/hv_common.c
@@ -24,6 +24,7 @@
#include <linux/kmsg_dump.h>
#include <linux/slab.h>
#include <linux/dma-map-ops.h>
+#include <linux/set_memory.h>
#include <asm/hyperv-tlfs.h>
#include <asm/mshyperv.h>

@@ -359,6 +360,7 @@ int hv_common_cpu_init(unsigned int cpu)
u64 msr_vp_index;
gfp_t flags;
int pgcount = hv_root_partition ? 2 : 1;
+ int ret;

/* hv_cpu_init() can be called with IRQs disabled from hv_resume() */
flags = irqs_disabled() ? GFP_ATOMIC : GFP_KERNEL;
@@ -368,6 +370,17 @@ int hv_common_cpu_init(unsigned int cpu)
if (!(*inputarg))
return -ENOMEM;

+ if (hv_isolation_type_en_snp()) {
+ ret = set_memory_decrypted((unsigned long)*inputarg, pgcount);
+ if (ret) {
+ kfree(*inputarg);
+ *inputarg = NULL;
+ return ret;
+ }
+
+ memset(*inputarg, 0x00, pgcount * PAGE_SIZE);
+ }
+
if (hv_root_partition) {
outputarg = (void **)this_cpu_ptr(hyperv_pcpu_output_arg);
*outputarg = (char *)(*inputarg) + HV_HYP_PAGE_SIZE;
@@ -387,6 +400,7 @@ int hv_common_cpu_die(unsigned int cpu)
{
unsigned long flags;
void **inputarg, **outputarg;
+ int pgcount = hv_root_partition ? 2 : 1;
void *mem;

local_irq_save(flags);
@@ -402,7 +416,12 @@ int hv_common_cpu_die(unsigned int cpu)

local_irq_restore(flags);

- kfree(mem);
+ if (hv_isolation_type_en_snp()) {
+ if (!set_memory_encrypted((unsigned long)mem, pgcount))
+ kfree(mem);
+ } else {
+ kfree(mem);
+ }

return 0;
}
--
2.25.1


2023-05-15 17:35:46

by Tianyu Lan

[permalink] [raw]
Subject: [RFC PATCH V6 06/14] x86/hyperv: Mark Hyper-V vp assist page unencrypted in SEV-SNP enlightened guest

From: Tianyu Lan <[email protected]>

hv vp assist page needs to be shared between SEV-SNP guest and Hyper-V.
So mark the page unencrypted in the SEV-SNP guest.

Signed-off-by: Tianyu Lan <[email protected]>
---
arch/x86/hyperv/hv_init.c | 6 ++++++
1 file changed, 6 insertions(+)

diff --git a/arch/x86/hyperv/hv_init.c b/arch/x86/hyperv/hv_init.c
index a5f9474f08e1..9f3e2d71d015 100644
--- a/arch/x86/hyperv/hv_init.c
+++ b/arch/x86/hyperv/hv_init.c
@@ -18,6 +18,7 @@
#include <asm/hyperv-tlfs.h>
#include <asm/mshyperv.h>
#include <asm/idtentry.h>
+#include <asm/set_memory.h>
#include <linux/kexec.h>
#include <linux/version.h>
#include <linux/vmalloc.h>
@@ -113,6 +114,11 @@ static int hv_cpu_init(unsigned int cpu)

}
if (!WARN_ON(!(*hvp))) {
+ if (hv_isolation_type_en_snp()) {
+ WARN_ON_ONCE(set_memory_decrypted((unsigned long)(*hvp), 1));
+ memset(*hvp, 0, PAGE_SIZE);
+ }
+
msr.enable = 1;
wrmsrl(HV_X64_MSR_VP_ASSIST_PAGE, msr.as_uint64);
}
--
2.25.1


2023-05-15 17:36:07

by Tianyu Lan

[permalink] [raw]
Subject: [RFC PATCH V6 12/14] x86/hyperv: Initialize cpu and memory for sev-snp enlightened guest

From: Tianyu Lan <[email protected]>

Read processor and memory info from the specific addresses which are
populated by Hyper-V. Initialize SMP CPU related ops, pvalidate
system memory and add it into the e820 table.

Signed-off-by: Tianyu Lan <[email protected]>
---
Change since RFCv5:
* Fix getting processor num in the
hv_snp_get_smp_config() when early is false.

Change since RFCv4:
* Add mem info addr to get mem layout info
---
arch/x86/hyperv/ivm.c | 87 +++++++++++++++++++++++++++++++++
arch/x86/include/asm/mshyperv.h | 17 +++++++
arch/x86/kernel/cpu/mshyperv.c | 3 ++
3 files changed, 107 insertions(+)

diff --git a/arch/x86/hyperv/ivm.c b/arch/x86/hyperv/ivm.c
index 368b2731950e..85e4378f052f 100644
--- a/arch/x86/hyperv/ivm.c
+++ b/arch/x86/hyperv/ivm.c
@@ -17,6 +17,11 @@
#include <asm/mem_encrypt.h>
#include <asm/mshyperv.h>
#include <asm/hypervisor.h>
+#include <asm/coco.h>
+#include <asm/io_apic.h>
+#include <asm/sev.h>
+#include <asm/realmode.h>
+#include <asm/e820/api.h>

#ifdef CONFIG_AMD_MEM_ENCRYPT

@@ -57,6 +62,8 @@ union hv_ghcb {

static u16 hv_ghcb_version __ro_after_init;

+static u32 processor_count;
+
u64 hv_ghcb_hypercall(u64 control, void *input, void *output, u32 input_size)
{
union hv_ghcb *hv_ghcb;
@@ -356,6 +363,86 @@ static bool hv_is_private_mmio(u64 addr)
return false;
}

+static __init void hv_snp_get_smp_config(unsigned int early)
+{
+ if (early)
+ return;
+
+ /*
+ * There is no firmware or ACPI MADT table support
+ * in the Hyper-V SEV-SNP enlightened guest, so set
+ * the SMP related config variables here.
+ */
+ while (num_processors < processor_count) {
+ early_per_cpu(x86_cpu_to_apicid, num_processors) = num_processors;
+ early_per_cpu(x86_bios_cpu_apicid, num_processors) = num_processors;
+ physid_set(num_processors, phys_cpu_present_map);
+ set_cpu_possible(num_processors, true);
+ set_cpu_present(num_processors, true);
+ num_processors++;
+ }
+}
+
+__init void hv_sev_init_mem_and_cpu(void)
+{
+ struct memory_map_entry *entry;
+ struct e820_entry *e820_entry;
+ u64 e820_end;
+ u64 ram_end;
+ u64 page;
+
+ /*
+ * The Hyper-V enlightened SNP guest boots the kernel
+ * directly without a bootloader, so ROMs, BIOS
+ * regions and reserved resources are not available.
+ * Set the corresponding callbacks to no-ops.
+ */
+ x86_platform.legacy.rtc = 0;
+ x86_platform.legacy.reserve_bios_regions = 0;
+ x86_platform.set_wallclock = set_rtc_noop;
+ x86_platform.get_wallclock = get_rtc_noop;
+ x86_init.resources.probe_roms = x86_init_noop;
+ x86_init.resources.reserve_resources = x86_init_noop;
+ x86_init.mpparse.find_smp_config = x86_init_noop;
+ x86_init.mpparse.get_smp_config = hv_snp_get_smp_config;
+
+ /*
+ * Hyper-V SEV-SNP enlightened guest doesn't support ioapic
+ * and legacy APIC page read/write. Switch to hv apic here.
+ */
+ disable_ioapic_support();
+
+ /* Get processor and mem info. */
+ processor_count = *(u32 *)__va(EN_SEV_SNP_PROCESSOR_INFO_ADDR);
+ entry = (struct memory_map_entry *)__va(EN_SEV_SNP_MEM_INFO_ADDR);
+
+ /*
+ * There is no bootloader/EFI firmware in the SEV SNP guest.
+ * E820 table in the memory just describes memory for kernel,
+ * ACPI table, cmdline, boot params and ramdisk. The dynamic
+ * data (e.g., vcpu number and the rest of the memory layout) needs to
+ * be read from EN_SEV_SNP_PROCESSOR_INFO_ADDR.
+ */
+ for (; entry->numpages != 0; entry++) {
+ e820_entry = &e820_table->entries[
+ e820_table->nr_entries - 1];
+ e820_end = e820_entry->addr + e820_entry->size;
+ ram_end = (entry->starting_gpn +
+ entry->numpages) * PAGE_SIZE;
+
+ if (e820_end < entry->starting_gpn * PAGE_SIZE)
+ e820_end = entry->starting_gpn * PAGE_SIZE;
+
+ if (e820_end < ram_end) {
+ pr_info("Hyper-V: add e820 entry [mem %#018Lx-%#018Lx]\n", e820_end, ram_end - 1);
+ e820__range_add(e820_end, ram_end - e820_end,
+ E820_TYPE_RAM);
+ for (page = e820_end; page < ram_end; page += PAGE_SIZE)
+ pvalidate((unsigned long)__va(page), RMP_PG_SIZE_4K, true);
+ }
+ }
+}
+
void __init hv_vtom_init(void)
{
/*
diff --git a/arch/x86/include/asm/mshyperv.h b/arch/x86/include/asm/mshyperv.h
index 939373791249..84e024ffacd5 100644
--- a/arch/x86/include/asm/mshyperv.h
+++ b/arch/x86/include/asm/mshyperv.h
@@ -50,6 +50,21 @@ extern bool hv_isolation_type_en_snp(void);

extern union hv_ghcb * __percpu *hv_ghcb_pg;

+/*
+ * Hyper-V puts processor and memory layout info
+ * to this address in SEV-SNP enlightened guest.
+ */
+#define EN_SEV_SNP_PROCESSOR_INFO_ADDR 0x802000
+#define EN_SEV_SNP_MEM_INFO_ADDR 0x802018
+
+struct memory_map_entry {
+ u64 starting_gpn;
+ u64 numpages;
+ u16 type;
+ u16 flags;
+ u32 reserved;
+};
+
int hv_call_deposit_pages(int node, u64 partition_id, u32 num_pages);
int hv_call_add_logical_proc(int node, u32 lp_index, u32 acpi_id);
int hv_call_create_vp(int node, u64 partition_id, u32 vp_index, u32 flags);
@@ -255,12 +270,14 @@ void hv_ghcb_msr_read(u64 msr, u64 *value);
bool hv_ghcb_negotiate_protocol(void);
void hv_ghcb_terminate(unsigned int set, unsigned int reason);
void hv_vtom_init(void);
+void hv_sev_init_mem_and_cpu(void);
#else
static inline void hv_ghcb_msr_write(u64 msr, u64 value) {}
static inline void hv_ghcb_msr_read(u64 msr, u64 *value) {}
static inline bool hv_ghcb_negotiate_protocol(void) { return false; }
static inline void hv_ghcb_terminate(unsigned int set, unsigned int reason) {}
static inline void hv_vtom_init(void) {}
+static inline void hv_sev_init_mem_and_cpu(void) {}
#endif

extern bool hv_isolation_type_snp(void);
diff --git a/arch/x86/kernel/cpu/mshyperv.c b/arch/x86/kernel/cpu/mshyperv.c
index 63a2bfbfe701..dea9b881180b 100644
--- a/arch/x86/kernel/cpu/mshyperv.c
+++ b/arch/x86/kernel/cpu/mshyperv.c
@@ -529,6 +529,9 @@ static void __init ms_hyperv_init_platform(void)
if (!(ms_hyperv.features & HV_ACCESS_TSC_INVARIANT))
mark_tsc_unstable("running on Hyper-V");

+ if (hv_isolation_type_en_snp())
+ hv_sev_init_mem_and_cpu();
+
hardlockup_detector_disable();
}

--
2.25.1


2023-05-16 05:50:34

by Saurabh Singh Sengar

[permalink] [raw]
Subject: RE: [EXTERNAL] [RFC PATCH V6 13/14] x86/hyperv: Add smp support for sev-snp guest



> -----Original Message-----
> From: Tianyu Lan <[email protected]>
> Sent: Monday, May 15, 2023 10:29 PM
> Subject: [EXTERNAL] [RFC PATCH V6 13/14] x86/hyperv: Add smp support for
> sev-snp guest
>
> From: Tianyu Lan <[email protected]>
>
> The wakeup_secondary_cpu callback was populated with
> wakeup_cpu_via_vmgexit(), which doesn't work for Hyper-V; Hyper-V
> requires calling a Hyper-V specific hvcall to start APs. So override it
> with a Hyper-V specific hook that builds the AP's sev_es_save_area data
> structure and starts the AP.
>
> Signed-off-by: Tianyu Lan <[email protected]>
> ---
> Change since RFC v5:
> * Remove some redundant structure definitions
>
> Change since RFC v3:
> * Replace struct sev_es_save_area with struct
> vmcb_save_area
> * Move code from mshyperv.c to ivm.c
>
> Change since RFC v2:
> * Add helper function to initialize segment
> * Fix some coding style
> ---
> arch/x86/hyperv/ivm.c | 98 +++++++++++++++++++++++++++++++
> arch/x86/include/asm/mshyperv.h | 10 ++++
> arch/x86/kernel/cpu/mshyperv.c | 13 +++-
> include/asm-generic/hyperv-tlfs.h | 3 +-
> 4 files changed, 121 insertions(+), 3 deletions(-)
>
> diff --git a/arch/x86/hyperv/ivm.c b/arch/x86/hyperv/ivm.c
> index 85e4378f052f..b7b8e1ba8223 100644
> --- a/arch/x86/hyperv/ivm.c
> +++ b/arch/x86/hyperv/ivm.c
> @@ -22,11 +22,15 @@
> #include <asm/sev.h>
> #include <asm/realmode.h>
> #include <asm/e820/api.h>
> +#include <asm/desc.h>
>
> #ifdef CONFIG_AMD_MEM_ENCRYPT
>
> #define GHCB_USAGE_HYPERV_CALL 1
>
> +static u8 ap_start_input_arg[PAGE_SIZE] __bss_decrypted __aligned(PAGE_SIZE);
> +static u8 ap_start_stack[PAGE_SIZE] __aligned(PAGE_SIZE);
> +
> union hv_ghcb {
> struct ghcb ghcb;
> struct {
> @@ -443,6 +447,100 @@ __init void hv_sev_init_mem_and_cpu(void)
> }
> }
>
> +#define hv_populate_vmcb_seg(seg, gdtr_base) \
> +do { \
> + if (seg.selector) { \
> + seg.base = 0; \
> + seg.limit = HV_AP_SEGMENT_LIMIT; \
> + seg.attrib = *(u16 *)(gdtr_base + seg.selector + 5); \
> + seg.attrib = (seg.attrib & 0xFF) | ((seg.attrib >> 4) & 0xF00); \
> + } \
> +} while (0) \
> +
> +int hv_snp_boot_ap(int cpu, unsigned long start_ip)
> +{
> + struct sev_es_save_area *vmsa = (struct sev_es_save_area *)
> + __get_free_page(GFP_KERNEL | __GFP_ZERO);
> + struct desc_ptr gdtr;
> + u64 ret, rmp_adjust, retry = 5;
> + struct hv_enable_vp_vtl *start_vp_input;
> + unsigned long flags;
> +
> + native_store_gdt(&gdtr);
> +
> + vmsa->gdtr.base = gdtr.address;
> + vmsa->gdtr.limit = gdtr.size;
> +
> + asm volatile("movl %%es, %%eax;" : "=a" (vmsa->es.selector));
> + hv_populate_vmcb_seg(vmsa->es, vmsa->gdtr.base);
> +
> + asm volatile("movl %%cs, %%eax;" : "=a" (vmsa->cs.selector));
> + hv_populate_vmcb_seg(vmsa->cs, vmsa->gdtr.base);
> +
> + asm volatile("movl %%ss, %%eax;" : "=a" (vmsa->ss.selector));
> + hv_populate_vmcb_seg(vmsa->ss, vmsa->gdtr.base);
> +
> + asm volatile("movl %%ds, %%eax;" : "=a" (vmsa->ds.selector));
> + hv_populate_vmcb_seg(vmsa->ds, vmsa->gdtr.base);
> +
> + vmsa->efer = native_read_msr(MSR_EFER);
> +
> + asm volatile("movq %%cr4, %%rax;" : "=a" (vmsa->cr4));
> + asm volatile("movq %%cr3, %%rax;" : "=a" (vmsa->cr3));
> + asm volatile("movq %%cr0, %%rax;" : "=a" (vmsa->cr0));
> +
> + vmsa->xcr0 = 1;
> + vmsa->g_pat = HV_AP_INIT_GPAT_DEFAULT;
> + vmsa->rip = (u64)secondary_startup_64_no_verify;
> + vmsa->rsp = (u64)&ap_start_stack[PAGE_SIZE];
> +
> + /*
> + * Set the SNP-specific fields for this VMSA:
> + * VMPL level
> + * SEV_FEATURES (matches the SEV STATUS MSR right shifted 2 bits)
> + */
> + vmsa->vmpl = 0;
> + vmsa->sev_features = sev_status >> 2;
> +
> + /*
> + * Running at VMPL0 allows the kernel to change the VMSA bit for a page
> + * using the RMPADJUST instruction. However, for the instruction to
> + * succeed it must target the permissions of a lesser privileged
> + * (higher numbered) VMPL level, so use VMPL1 (refer to the RMPADJUST
> + * instruction in the AMD64 APM Volume 3).
> + */
> + rmp_adjust = RMPADJUST_VMSA_PAGE_BIT | 1;
> + ret = rmpadjust((unsigned long)vmsa, RMP_PG_SIZE_4K,
> + rmp_adjust);
> + if (ret != 0) {
> + pr_err("RMPADJUST(%llx) failed: %llx\n", (u64)vmsa, ret);
> + return ret;
> + }
> +
> + local_irq_save(flags);
> + start_vp_input =
> + (struct hv_enable_vp_vtl *)ap_start_input_arg;
> + memset(start_vp_input, 0, sizeof(*start_vp_input));
> + start_vp_input->partition_id = -1;
> + start_vp_input->vp_index = cpu;
> + start_vp_input->target_vtl.target_vtl = ms_hyperv.vtl;
> + *(u64 *)&start_vp_input->vp_context = __pa(vmsa) | 1;
> +
> + do {
> + ret = hv_do_hypercall(HVCALL_START_VP,
> + start_vp_input, NULL);
> + } while (hv_result(ret) == HV_STATUS_TIME_OUT && retry--);

can we restore local_irq here ?

> +
> + if (!hv_result_success(ret)) {
> + pr_err("HvCallStartVirtualProcessor failed: %llx\n", ret);
> + goto done;

No need of goto here.

Regards,
Saurabh

> + }
> +
> +done:
> + local_irq_restore(flags);
> + return ret;
> +}
> +
> void __init hv_vtom_init(void)
> {
> /*
> diff --git a/arch/x86/include/asm/mshyperv.h b/arch/x86/include/asm/mshyperv.h
> index 84e024ffacd5..9ad2a0f21d68 100644
> --- a/arch/x86/include/asm/mshyperv.h
> +++ b/arch/x86/include/asm/mshyperv.h
> @@ -65,6 +65,13 @@ struct memory_map_entry {
> u32 reserved;
> };
>
> +/*
> + * DEFAULT INIT GPAT and SEGMENT LIMIT value in struct VMSA
> + * to start AP in enlightened SEV guest.
> + */
> +#define HV_AP_INIT_GPAT_DEFAULT 0x0007040600070406ULL
> +#define HV_AP_SEGMENT_LIMIT 0xffffffff
> +
> int hv_call_deposit_pages(int node, u64 partition_id, u32 num_pages);
> int hv_call_add_logical_proc(int node, u32 lp_index, u32 acpi_id);
> int hv_call_create_vp(int node, u64 partition_id, u32 vp_index, u32 flags);
> @@ -263,6 +270,7 @@ struct irq_domain *hv_create_pci_msi_domain(void);
> int hv_map_ioapic_interrupt(int ioapic_id, bool level, int vcpu, int vector,
> 		struct hv_interrupt_entry *entry);
> int hv_unmap_ioapic_interrupt(int ioapic_id, struct hv_interrupt_entry *entry);
> +int hv_snp_boot_ap(int cpu, unsigned long start_ip);
>
> #ifdef CONFIG_AMD_MEM_ENCRYPT
> void hv_ghcb_msr_write(u64 msr, u64 value);
> @@ -271,6 +279,7 @@ bool hv_ghcb_negotiate_protocol(void);
> void hv_ghcb_terminate(unsigned int set, unsigned int reason);
> void hv_vtom_init(void);
> void hv_sev_init_mem_and_cpu(void);
> +int hv_snp_boot_ap(int cpu, unsigned long start_ip);
> #else
> static inline void hv_ghcb_msr_write(u64 msr, u64 value) {}
> static inline void hv_ghcb_msr_read(u64 msr, u64 *value) {}
> @@ -278,6 +287,7 @@ static inline bool hv_ghcb_negotiate_protocol(void) { return false; }
> static inline void hv_ghcb_terminate(unsigned int set, unsigned int reason) {}
> static inline void hv_vtom_init(void) {}
> static inline void hv_sev_init_mem_and_cpu(void) {}
> +static inline int hv_snp_boot_ap(int cpu, unsigned long start_ip) { return 0; }
> #endif
>
> extern bool hv_isolation_type_snp(void);
> diff --git a/arch/x86/kernel/cpu/mshyperv.c b/arch/x86/kernel/cpu/mshyperv.c
> index dea9b881180b..0c5f9f7bd7ba 100644
> --- a/arch/x86/kernel/cpu/mshyperv.c
> +++ b/arch/x86/kernel/cpu/mshyperv.c
> @@ -295,6 +295,16 @@ static void __init hv_smp_prepare_cpus(unsigned int max_cpus)
>
> native_smp_prepare_cpus(max_cpus);
>
> + /*
> + * Override wakeup_secondary_cpu_64 callback for SEV-SNP
> + * enlightened guest.
> + */
> + if (hv_isolation_type_en_snp())
> + apic->wakeup_secondary_cpu_64 = hv_snp_boot_ap;
> +
> + if (!hv_root_partition)
> + return;
> +
> #ifdef CONFIG_X86_64
> for_each_present_cpu(i) {
> if (i == 0)
> @@ -502,8 +512,7 @@ static void __init ms_hyperv_init_platform(void)
>
> # ifdef CONFIG_SMP
> smp_ops.smp_prepare_boot_cpu = hv_smp_prepare_boot_cpu;
> - if (hv_root_partition)
> - smp_ops.smp_prepare_cpus = hv_smp_prepare_cpus;
> + smp_ops.smp_prepare_cpus = hv_smp_prepare_cpus;
> # endif
>
> /*
> diff --git a/include/asm-generic/hyperv-tlfs.h b/include/asm-generic/hyperv-tlfs.h
> index f4e4cc4f965f..92dcc530350c 100644
> --- a/include/asm-generic/hyperv-tlfs.h
> +++ b/include/asm-generic/hyperv-tlfs.h
> @@ -146,9 +146,9 @@ union hv_reference_tsc_msr {
> /* Declare the various hypercall operations. */
> #define HVCALL_FLUSH_VIRTUAL_ADDRESS_SPACE 0x0002
> #define HVCALL_FLUSH_VIRTUAL_ADDRESS_LIST 0x0003
> -#define HVCALL_ENABLE_VP_VTL 0x000f
> #define HVCALL_NOTIFY_LONG_SPIN_WAIT 0x0008
> #define HVCALL_SEND_IPI 0x000b
> +#define HVCALL_ENABLE_VP_VTL 0x000f
> #define HVCALL_FLUSH_VIRTUAL_ADDRESS_SPACE_EX 0x0013
> #define HVCALL_FLUSH_VIRTUAL_ADDRESS_LIST_EX 0x0014
> #define HVCALL_SEND_IPI_EX 0x0015
> @@ -223,6 +223,7 @@ enum HV_GENERIC_SET_FORMAT {
> #define HV_STATUS_INVALID_PORT_ID 17
> #define HV_STATUS_INVALID_CONNECTION_ID 18
> #define HV_STATUS_INSUFFICIENT_BUFFERS 19
> +#define HV_STATUS_TIME_OUT 120
> #define HV_STATUS_VTL_ALREADY_ENABLED 134
>
> /*
> --
> 2.25.1


2023-05-16 09:43:41

by Peter Zijlstra

[permalink] [raw]
Subject: Re: [RFC PATCH V6 01/14] x86/sev: Add a #HV exception handler

On Mon, May 15, 2023 at 12:59:03PM -0400, Tianyu Lan wrote:
> From: Tianyu Lan <[email protected]>
>
> Add a #HV exception handler that uses IST stack.
>

Urgh.. that is entirely insufficient. Like it doesn't even begin to
start to cover things.

The whole existing VC IST stack abuse is already a nightmare and you're
duplicating that.. without any explanation for why this would be needed
and how it is correct.

Please try again.

2023-05-17 08:36:19

by Tianyu Lan

[permalink] [raw]
Subject: Re: [EXTERNAL] [RFC PATCH V6 13/14] x86/hyperv: Add smp support for sev-snp guest

On 5/16/2023 1:16 PM, Saurabh Singh Sengar wrote:
>> + (struct hv_enable_vp_vtl *)ap_start_input_arg;
>> + memset(start_vp_input, 0, sizeof(*start_vp_input));
>> + start_vp_input->partition_id = -1;
>> + start_vp_input->vp_index = cpu;
>> + start_vp_input->target_vtl.target_vtl = ms_hyperv.vtl;
>> + *(u64 *)&start_vp_input->vp_context = __pa(vmsa) | 1;
>> +
>> + do {
>> + ret = hv_do_hypercall(HVCALL_START_VP,
>> + start_vp_input, NULL);
>> + } while (hv_result(ret) == HV_STATUS_TIME_OUT && retry--);
> can we restore local_irq here ?
>
>> +
>> + if (!hv_result_success(ret)) {
>> + pr_err("HvCallStartVirtualProcessor failed: %llx\n", ret);
>> + goto done;
> No need of goto here.
>

Nice catch. The goto label should be removed here. Will update in the
next version.

2023-05-17 09:26:19

by Tianyu Lan

[permalink] [raw]
Subject: Re: [RFC PATCH V6 01/14] x86/sev: Add a #HV exception handler

On 5/16/2023 5:30 PM, Peter Zijlstra wrote:
> On Mon, May 15, 2023 at 12:59:03PM -0400, Tianyu Lan wrote:
>> From: Tianyu Lan<[email protected]>
>>
>> Add a #HV exception handler that uses IST stack.
>>
> Urgh.. that is entirely insufficient. Like it doesn't even begin to
> start to cover things.
>
> The whole existing VC IST stack abuse is already a nightmare and you're
> duplicating that.. without any explanation for why this would be needed
> and how it is correct.
>
> Please try again.

Hi Peter:
Thanks for your review. Will add more explanation in the next version.

2023-05-30 12:19:51

by Gupta, Pankaj

[permalink] [raw]
Subject: Re: [RFC PATCH V6 01/14] x86/sev: Add a #HV exception handler


>> Add a #HV exception handler that uses IST stack.
>>
>
> Urgh.. that is entirely insufficient. Like it doesn't even begin to
> start to cover things.
>
> The whole existing VC IST stack abuse is already a nightmare and you're
> duplicating that.. without any explanation for why this would be needed
> and how it is correct.
>
> Please try again.

The #HV handler handles both #NMI and #MCE in the guest, and a nested
#HV is never raised by the hypervisor. The next #HV exception is only
raised by the hypervisor after the guest acknowledges the pending #HV
exception by clearing the "NoFurtherSignal" bit in the doorbell page.
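For illustration, the acknowledgement handshake described above might be
modeled roughly as below. The struct layout and helper name are
simplified assumptions for this sketch, not the exact GHCB doorbell page
definition:

```c
#include <stdint.h>

/* Simplified stand-in for the #HV doorbell page; only the field used
 * in this sketch is shown. Bit 15 of pending_events plays the role of
 * the "NoFurtherSignal" bit described above. */
struct hv_doorbell_page {
	uint16_t pending_events;
};

#define HV_NO_FURTHER_SIGNAL (1u << 15)

/*
 * Guest side: read the pending events, then clear NoFurtherSignal so
 * the hypervisor is allowed to raise the next #HV. Returns the event
 * bits that were pending.
 */
static uint16_t hv_ack_doorbell(volatile struct hv_doorbell_page *db)
{
	uint16_t events = db->pending_events;

	db->pending_events = events & (uint16_t)~HV_NO_FURTHER_SIGNAL;
	return events & (uint16_t)~HV_NO_FURTHER_SIGNAL;
}
```

Until the write clearing NoFurtherSignal lands, the hypervisor keeps
queuing events in the doorbell page instead of injecting another #HV,
which is what keeps the handler non-reentrant in the benign case.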

There is still protection (please see hv_switch_off_ist()) to
gracefully exit the guest if by any chance a malicious hypervisor sends
a nested #HV. This avoids most of the nested IST stack pitfalls with
#NMI and #MCE; #DB is also handled in a noinstr code block
(exc_vmm_communication()->vc_is_db {...}), which avoids any recursive
#DBs.

Do you see anything else needs to be handled in #HV IST handling?

Thanks,
Pankaj



2023-05-30 14:46:28

by Peter Zijlstra

[permalink] [raw]
Subject: Re: [RFC PATCH V6 01/14] x86/sev: Add a #HV exception handler

On Tue, May 30, 2023 at 02:16:55PM +0200, Gupta, Pankaj wrote:
>
> > > Add a #HV exception handler that uses IST stack.
> > >
> >
> > Urgh.. that is entirely insufficient. Like it doesn't even begin to
> > start to cover things.
> >
> > The whole existing VC IST stack abuse is already a nightmare and you're
> > duplicating that.. without any explanation for why this would be needed
> > and how it is correct.
> >
> > Please try again.
>
> #HV handler handles both #NMI & #MCE in the guest and nested #HV is never
> raised by the hypervisor.

I thought all this confidential computing nonsense was about not trusting
the hypervisor, so how come we're now relying on the hypervisor being
sane?

2023-05-30 15:22:14

by Dave Hansen

[permalink] [raw]
Subject: Re: [RFC PATCH V6 01/14] x86/sev: Add a #HV exception handler

On 5/30/23 05:16, Gupta, Pankaj wrote:
> #HV handler handles both #NMI & #MCE in the guest and nested #HV is
> never raised by the hypervisor. Next #HV exception is only raised by the
> hypervisor when Guest acknowledges the pending #HV exception by clearing
> "NoFurtherSignal” bit in the doorbell page.

There's a big difference between "is never raised by" and "cannot be
raised by".

Either way, this series (and this patch in particular) needs some much
better changelogs so that this behavior is clear. It would also be nice
to reference the relevant parts of the hardware specs if the "hardware"*
is helping to provide these guarantees.

* I say "hardware" in quotes because on TDX a big chunk of this behavior
is implemented in software in the TDX module. SEV probably does it in
microcode (or maybe in the secure processor), but I kinda doubt it's
purely silicon.

2023-05-30 16:26:28

by Tom Lendacky

[permalink] [raw]
Subject: Re: [RFC PATCH V6 01/14] x86/sev: Add a #HV exception handler

On 5/30/23 09:35, Peter Zijlstra wrote:
> On Tue, May 30, 2023 at 02:16:55PM +0200, Gupta, Pankaj wrote:
>>
>>>> Add a #HV exception handler that uses IST stack.
>>>>
>>>
>>> Urgh.. that is entirely insufficient. Like it doesn't even begin to
>>> start to cover things.
>>>
>>> The whole existing VC IST stack abuse is already a nightmare and you're
>>> duplicating that.. without any explanation for why this would be needed
>>> and how it is correct.
>>>
>>> Please try again.
>>
>> #HV handler handles both #NMI & #MCE in the guest and nested #HV is never
>> raised by the hypervisor.
>
> I thought all this confidential computing nonsense was about not trusting
> the hypervisor, so how come we're now relying on the hypervisor being
> sane?

That should really say that a nested #HV should never be raised by the
hypervisor, but if it is, then the guest should detect that and
self-terminate knowing that the hypervisor is possibly being malicious.

Thanks,
Tom

2023-05-30 19:05:58

by Peter Zijlstra

[permalink] [raw]
Subject: Re: [RFC PATCH V6 01/14] x86/sev: Add a #HV exception handler

On Tue, May 30, 2023 at 10:59:01AM -0500, Tom Lendacky wrote:
> On 5/30/23 09:35, Peter Zijlstra wrote:
> > On Tue, May 30, 2023 at 02:16:55PM +0200, Gupta, Pankaj wrote:
> > >
> > > > > Add a #HV exception handler that uses IST stack.
> > > > >
> > > >
> > > > Urgh.. that is entirely insufficient. Like it doesn't even begin to
> > > > start to cover things.
> > > >
> > > > The whole existing VC IST stack abuse is already a nightmare and you're
> > > > duplicating that.. without any explanation for why this would be needed
> > > > and how it is correct.
> > > >
> > > > Please try again.
> > >
> > > #HV handler handles both #NMI & #MCE in the guest and nested #HV is never
> > > raised by the hypervisor.
> >
> > I thought all this confidential computing nonsense was about not trusting
> > the hypervisor, so how come we're now relying on the hypervisor being
> > sane?
>
> That should really say that a nested #HV should never be raised by the
> hypervisor, but if it is, then the guest should detect that and
> self-terminate knowing that the hypervisor is possibly being malicious.

I've yet to see code that can do that reliably.

2023-05-30 19:36:41

by Dave Hansen

[permalink] [raw]
Subject: Re: [RFC PATCH V6 01/14] x86/sev: Add a #HV exception handler

On 5/30/23 11:52, Peter Zijlstra wrote:
>> That should really say that a nested #HV should never be raised by the
>> hypervisor, but if it is, then the guest should detect that and
>> self-terminate knowing that the hypervisor is possibly being malicious.
> I've yet to see code that can do that reliably.

By "#HV should never be raised by the hypervisor", I think Tom means:

#HV can and will be raised by malicious hypervisors and the
guest must be able to unambiguously handle it in a way that
will not result in the guest getting rooted.

Right? ;)

2023-05-31 09:50:25

by Peter Zijlstra

[permalink] [raw]
Subject: Re: [RFC PATCH V6 01/14] x86/sev: Add a #HV exception handler

On Tue, May 30, 2023 at 08:52:32PM +0200, Peter Zijlstra wrote:

> > That should really say that a nested #HV should never be raised by the
> > hypervisor, but if it is, then the guest should detect that and
> > self-terminate knowing that the hypervisor is possibly being malicious.
>
> I've yet to see code that can do that reliably.

Tom; could you please investigate if this can be enforced in ucode?

Ideally #HV would have an internal latch such that a recursive #HV will
terminate the guest (much like double #MC and tripple-fault).

But unlike the #MC trainwreck, can we please not leave a glaring hole in
this latch and use a spare bit in the IRET frame please?

So have #HV delivery:
- check internal latch; if set, terminate machine
- set latch
- write IRET frame with magic bit set

have IRET:
- check magic bit and reset #HV latch


2023-06-06 06:18:26

by Gupta, Pankaj

[permalink] [raw]
Subject: Re: [RFC PATCH V6 01/14] x86/sev: Add a #HV exception handler


>> That should really say that a nested #HV should never be raised by the
>> hypervisor, but if it is, then the guest should detect that and
>> self-terminate knowing that the hypervisor is possibly being malicious.
>
> I've yet to see code that can do that reliably.

- Currently, we are detecting a directly nested #HV with the check below, and
the guest self-terminates.

<snip>
if (get_stack_info_noinstr(stack, current, &info) &&
    (info.type == (STACK_TYPE_EXCEPTION + ESTACK_HV) ||
     info.type == (STACK_TYPE_EXCEPTION + ESTACK_HV2)))
	panic("Nested #HV exception, HV IST corrupted, stack type = %d\n",
	      info.type);
</snip>

- I am thinking about the below solution to detect nested
#HV reliably:

-- Make the IST stack switching reliable for the #VC -> #HV -> #VC case
(similar to what is done in __sev_es_ist_enter/__sev_es_ist_exit for the
NMI IST stack).

-- In addition to this, we can make nested #HV detection (with another
exception type) more reliable with refcounting (percpu?).

I need your inputs before I implement this solution. Or do you have any
other software-based idea in mind?

Thanks,
Pankaj


2023-06-06 08:06:46

by Peter Zijlstra

[permalink] [raw]
Subject: Re: [RFC PATCH V6 01/14] x86/sev: Add a #HV exception handler

On Tue, Jun 06, 2023 at 08:00:32AM +0200, Gupta, Pankaj wrote:
>
> > > That should really say that a nested #HV should never be raised by the
> > > hypervisor, but if it is, then the guest should detect that and
> > > self-terminate knowing that the hypervisor is possibly being malicious.
> >
> > I've yet to see code that can do that reliably.
>
> - Currently, we are detecting a directly nested #HV with the check below, and
> the guest self-terminates.
>
> <snip>
> if (get_stack_info_noinstr(stack, current, &info) &&
>     (info.type == (STACK_TYPE_EXCEPTION + ESTACK_HV) ||
>      info.type == (STACK_TYPE_EXCEPTION + ESTACK_HV2)))
> 	panic("Nested #HV exception, HV IST corrupted, stack type = %d\n",
> 	      info.type);
> </snip>
>
> - I am thinking about the below solution to detect nested
> #HV reliably:
>
> -- Make the IST stack switching reliable for the #VC -> #HV -> #VC case
> (similar to what is done in __sev_es_ist_enter/__sev_es_ist_exit for the
> NMI IST stack).

I'm not convinced any of that is actually correct; there is a *huge*
window between NMI hitting and calling __sev_es_ist_enter(), idem on the
exit side.

> -- In addition to this, we can make nested #HV detection (with another
> exception type) more reliable with refcounting (percpu?).

There is also #DB and the MOVSS shadow.

And no, I don't think any of that is what you'd call 'robust'. This is
what I call a trainwreck :/

And I'm more than willing to say no until the hardware is more sane.

Supervisor Shadow Stack support is in the same boat, that's on hold
until FRED makes things workable.

2023-06-07 18:44:23

by Tom Lendacky

[permalink] [raw]
Subject: Re: [RFC PATCH V6 01/14] x86/sev: Add a #HV exception handler

On 5/31/23 04:14, Peter Zijlstra wrote:
> On Tue, May 30, 2023 at 08:52:32PM +0200, Peter Zijlstra wrote:
>
>>> That should really say that a nested #HV should never be raised by the
>>> hypervisor, but if it is, then the guest should detect that and
>>> self-terminate knowing that the hypervisor is possibly being malicious.
>>
>> I've yet to see code that can do that reliably.
>
> Tom; could you please investigate if this can be enforced in ucode?
>
> Ideally #HV would have an internal latch such that a recursive #HV will
> terminate the guest (much like double #MC and triple-fault).
>
> But unlike the #MC trainwreck, can we please not leave a glaring hole in
> this latch and use a spare bit in the IRET frame please?
>
> So have #HV delivery:
> - check internal latch; if set, terminate machine
> - set latch
> - write IRET frame with magic bit set
>
> have IRET:
> - check magic bit and reset #HV latch

Hi Peter,

I talked with the hardware team about this and, unfortunately, it is not
practical to implement. The main concerns are that there are already two
generations of hardware out there with the current support and that, given
limited ucode patch space, in addition to the ucode needed to track and
enforce the latch, further ucode support would be required to save/restore
the latch state when handling a VMEXIT during #HV processing.

Thanks,
Tom
