2023-01-22 03:11:52

by Tianyu Lan

Subject: [RFC PATCH V3 00/16] x86/hyperv/sev: Add AMD sev-snp enlightened guest support on hyperv

From: Tianyu Lan <[email protected]>

This patchset adds AMD SEV-SNP enlightened guest
support on Hyper-V. Hyper-V uses the Linux direct boot
mode to boot the Linux kernel, so the kernel needs to
pvalidate system memory by itself.
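
As a rough sketch (assuming the page at vaddr is already mapped
and its RMP entry has been assigned to the guest), validating a
single 4K page with the pvalidate() helper from <asm/sev.h>
looks like:

	/* Validate one private 4K page after it has been accepted. */
	if (pvalidate(vaddr, RMP_PG_SIZE_4K, true))
		panic("SEV-SNP: PVALIDATE failed for 0x%lx\n", vaddr);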

In the Hyper-V case there is no boot loader, so the CC blob
is prepared by the hypervisor. In this series the hypervisor
sets the CC blob address directly in the boot parameters
of the Linux kernel. If the magic number at the CC blob
address is valid, the kernel reads the CC blob.
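
Roughly (a hedged sketch; struct and constant names follow the
existing SEV code in arch/x86, bp is the struct boot_params
pointer, error handling elided):

	struct cc_blob_sev_info *cc_info;

	/* bp->cc_blob_address was filled in by the hypervisor. */
	cc_info = (struct cc_blob_sev_info *)(unsigned long)bp->cc_blob_address;
	if (cc_info && cc_info->magic == CC_BLOB_SEV_HDR_MAGIC)
		/* Valid CC blob - proceed with SNP setup. */
		snp_init(bp);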

Memory shared between the guest and the hypervisor must
be decrypted, and it must be zeroed after decryption: once
the encryption attribute is flipped, the old contents at the
target address read back as smeared (ciphertext) data, so
zeroing avoids exposing stale data.
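
That is, the pattern used throughout the series is (sketch):

	/*
	 * Share the page with the hypervisor, then scrub it: after the
	 * encryption attribute changes, the old contents read back as
	 * smeared ciphertext, so zero the page before use.
	 */
	ret = set_memory_decrypted((unsigned long)page, 1);
	if (ret)
		return ret;
	memset(page, 0, PAGE_SIZE);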

Introduce #HV exception support in the AMD SEV-SNP code
and add a #HV handler.

Change since v2:
- Remove the kernel memory validation code at boot stage
- Split the #HV page patch into two parts
- Remove the HV-APIC change because x2apic is enabled from
the host side
- Rework the vmbus code to handle errors when decrypting pages
- Split the memory and cpu initialization patch.

Change since v1:
- Remove boot param changes for the cc blob address and
use the setup header to pass cc blob info
- Remove unnecessary WARN and BUG checks
- Add a system vector table map in the #HV exception
- Fix an interrupt exit issue when using the #HV exception

Ashish Kalra (2):
x86/sev: optimize system vector processing invoked from #HV exception
x86/sev: Fix interrupt exit code paths from #HV exception

Tianyu Lan (14):
x86/hyperv: Add sev-snp enlightened guest specific config
x86/hyperv: Decrypt hv vp assist page in sev-snp enlightened guest
x86/hyperv: Set Virtual Trust Level in vmbus init message
x86/hyperv: Use vmmcall to implement Hyper-V hypercall in sev-snp
enlightened guest
clocksource/drivers/hyper-v: decrypt hyperv tsc page in sev-snp
enlightened guest
x86/hyperv: decrypt vmbus pages for sev-snp enlightened guest
drivers: hv: Decrypt percpu hvcall input arg page in sev-snp
enlightened guest
x86/hyperv: Initialize cpu and memory for sev-snp enlightened guest
x86/hyperv: SEV-SNP enlightened guest doesn't support legacy rtc
x86/hyperv: Add smp support for sev-snp guest
x86/hyperv: Add hyperv-specific handling for VMMCALL under SEV-ES
x86/sev: Add a #HV exception handler
x86/sev: Add check of #HV event in path
x86/sev: Initialize #HV doorbell and handle interrupt requests

arch/x86/entry/entry_64.S | 82 ++++++
arch/x86/hyperv/hv_init.c | 43 +++
arch/x86/hyperv/ivm.c | 10 +
arch/x86/include/asm/cpu_entry_area.h | 6 +
arch/x86/include/asm/hyperv-tlfs.h | 4 +
arch/x86/include/asm/idtentry.h | 105 ++++++-
arch/x86/include/asm/irqflags.h | 10 +
arch/x86/include/asm/mem_encrypt.h | 2 +
arch/x86/include/asm/mshyperv.h | 56 +++-
arch/x86/include/asm/msr-index.h | 6 +
arch/x86/include/asm/page_64_types.h | 1 +
arch/x86/include/asm/sev.h | 13 +
arch/x86/include/asm/svm.h | 59 +++-
arch/x86/include/asm/trapnr.h | 1 +
arch/x86/include/asm/traps.h | 1 +
arch/x86/include/asm/x86_init.h | 2 +
arch/x86/include/uapi/asm/svm.h | 4 +
arch/x86/kernel/cpu/common.c | 1 +
arch/x86/kernel/cpu/mshyperv.c | 228 ++++++++++++++-
arch/x86/kernel/dumpstack_64.c | 9 +-
arch/x86/kernel/idt.c | 1 +
arch/x86/kernel/sev.c | 395 ++++++++++++++++++++++----
arch/x86/kernel/traps.c | 42 +++
arch/x86/kernel/vmlinux.lds.S | 7 +
arch/x86/kernel/x86_init.c | 4 +-
arch/x86/mm/cpu_entry_area.c | 2 +
drivers/clocksource/hyperv_timer.c | 2 +-
drivers/hv/connection.c | 1 +
drivers/hv/hv.c | 33 ++-
drivers/hv/hv_common.c | 26 +-
include/asm-generic/hyperv-tlfs.h | 19 ++
include/asm-generic/mshyperv.h | 2 +
include/linux/hyperv.h | 4 +-
33 files changed, 1102 insertions(+), 79 deletions(-)

--
2.25.1


2023-01-22 03:11:58

by Tianyu Lan

Subject: [RFC PATCH V3 04/16] x86/hyperv: Use vmmcall to implement Hyper-V hypercall in sev-snp enlightened guest

From: Tianyu Lan <[email protected]>

In an SEV-SNP enlightened guest, a Hyper-V hypercall needs
to use VMMCALL to trigger a VMEXIT and notify the hypervisor
to handle the hypercall request.
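
The register convention is unchanged: RCX carries the control
word, RDX the input GPA, R8 the output GPA, and the status
returns in RAX. A hypothetical caller (hypercall code and page
arguments here are placeholders) is unaffected by which
instruction is used underneath:

	u64 status;

	status = hv_do_hypercall(HVCALL_POST_MESSAGE, input_pg, output_pg);
	if (!hv_result_success(status))
		pr_err("hypercall failed: 0x%llx\n", status);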

Signed-off-by: Tianyu Lan <[email protected]>
---
Change since RFC V2:
* Fix indentation style
---
arch/x86/include/asm/mshyperv.h | 44 ++++++++++++++++++++++++---------
1 file changed, 33 insertions(+), 11 deletions(-)

diff --git a/arch/x86/include/asm/mshyperv.h b/arch/x86/include/asm/mshyperv.h
index 285df71150e4..1a4af0a4f29a 100644
--- a/arch/x86/include/asm/mshyperv.h
+++ b/arch/x86/include/asm/mshyperv.h
@@ -44,16 +44,25 @@ static inline u64 hv_do_hypercall(u64 control, void *input, void *output)
u64 hv_status;

#ifdef CONFIG_X86_64
- if (!hv_hypercall_pg)
- return U64_MAX;
+ if (hv_isolation_type_en_snp()) {
+ __asm__ __volatile__("mov %4, %%r8\n"
+ "vmmcall"
+ : "=a" (hv_status), ASM_CALL_CONSTRAINT,
+ "+c" (control), "+d" (input_address)
+ : "r" (output_address)
+ : "cc", "memory", "r8", "r9", "r10", "r11");
+ } else {
+ if (!hv_hypercall_pg)
+ return U64_MAX;

- __asm__ __volatile__("mov %4, %%r8\n"
- CALL_NOSPEC
- : "=a" (hv_status), ASM_CALL_CONSTRAINT,
- "+c" (control), "+d" (input_address)
- : "r" (output_address),
- THUNK_TARGET(hv_hypercall_pg)
- : "cc", "memory", "r8", "r9", "r10", "r11");
+ __asm__ __volatile__("mov %4, %%r8\n"
+ CALL_NOSPEC
+ : "=a" (hv_status), ASM_CALL_CONSTRAINT,
+ "+c" (control), "+d" (input_address)
+ : "r" (output_address),
+ THUNK_TARGET(hv_hypercall_pg)
+ : "cc", "memory", "r8", "r9", "r10", "r11");
+ }
#else
u32 input_address_hi = upper_32_bits(input_address);
u32 input_address_lo = lower_32_bits(input_address);
@@ -81,7 +90,13 @@ static inline u64 hv_do_fast_hypercall8(u16 code, u64 input1)
u64 hv_status, control = (u64)code | HV_HYPERCALL_FAST_BIT;

#ifdef CONFIG_X86_64
- {
+ if (hv_isolation_type_en_snp()) {
+ __asm__ __volatile__(
+ "vmmcall"
+ : "=a" (hv_status), ASM_CALL_CONSTRAINT,
+ "+c" (control), "+d" (input1)
+ :: "cc", "r8", "r9", "r10", "r11");
+ } else {
__asm__ __volatile__(CALL_NOSPEC
: "=a" (hv_status), ASM_CALL_CONSTRAINT,
"+c" (control), "+d" (input1)
@@ -112,7 +127,14 @@ static inline u64 hv_do_fast_hypercall16(u16 code, u64 input1, u64 input2)
u64 hv_status, control = (u64)code | HV_HYPERCALL_FAST_BIT;

#ifdef CONFIG_X86_64
- {
+ if (hv_isolation_type_en_snp()) {
+ __asm__ __volatile__("mov %4, %%r8\n"
+ "vmmcall"
+ : "=a" (hv_status), ASM_CALL_CONSTRAINT,
+ "+c" (control), "+d" (input1)
+ : "r" (input2)
+ : "cc", "r8", "r9", "r10", "r11");
+ } else {
__asm__ __volatile__("mov %4, %%r8\n"
CALL_NOSPEC
: "=a" (hv_status), ASM_CALL_CONSTRAINT,
--
2.25.1

2023-01-22 03:12:56

by Tianyu Lan

Subject: [RFC PATCH V3 10/16] x86/hyperv: Add smp support for sev-snp guest

From: Tianyu Lan <[email protected]>

The wakeup_secondary_cpu callback was populated with
wakeup_cpu_via_vmgexit(), which doesn't work for Hyper-V.
Override it with a Hyper-V specific hook which uses the
HVCALL_START_VIRTUAL_PROCESSOR hypercall to start the AP
with the VMSA data structure.
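
For reference, the generic SMP boot path invokes this callback
roughly like the sketch below (simplified from the native
do_boot_cpu() logic; boot_error, apicid and start_ip are the
surrounding locals):

	/* Prefer the APIC-provided wakeup hook when one is installed. */
	if (apic->wakeup_secondary_cpu)
		boot_error = apic->wakeup_secondary_cpu(apicid, start_ip);
	else
		boot_error = wakeup_secondary_cpu_via_init(apicid, start_ip);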

Signed-off-by: Tianyu Lan <[email protected]>
---
Change since RFC v2:
* Add helper function to initialize segment
* Fix some coding style
---
arch/x86/include/asm/mshyperv.h | 2 +
arch/x86/include/asm/sev.h | 13 ++++
arch/x86/include/asm/svm.h | 47 +++++++++++++
arch/x86/kernel/cpu/mshyperv.c | 112 ++++++++++++++++++++++++++++--
include/asm-generic/hyperv-tlfs.h | 19 +++++
5 files changed, 189 insertions(+), 4 deletions(-)

diff --git a/arch/x86/include/asm/mshyperv.h b/arch/x86/include/asm/mshyperv.h
index 7266d71d30d6..c69051eec0e1 100644
--- a/arch/x86/include/asm/mshyperv.h
+++ b/arch/x86/include/asm/mshyperv.h
@@ -203,6 +203,8 @@ struct irq_domain *hv_create_pci_msi_domain(void);
int hv_map_ioapic_interrupt(int ioapic_id, bool level, int vcpu, int vector,
struct hv_interrupt_entry *entry);
int hv_unmap_ioapic_interrupt(int ioapic_id, struct hv_interrupt_entry *entry);
+int hv_set_mem_host_visibility(unsigned long addr, int numpages, bool visible);
+int hv_snp_boot_ap(int cpu, unsigned long start_ip);

#ifdef CONFIG_AMD_MEM_ENCRYPT
void hv_ghcb_msr_write(u64 msr, u64 value);
diff --git a/arch/x86/include/asm/sev.h b/arch/x86/include/asm/sev.h
index ebc271bb6d8e..e34aaf730220 100644
--- a/arch/x86/include/asm/sev.h
+++ b/arch/x86/include/asm/sev.h
@@ -86,6 +86,19 @@ extern bool handle_vc_boot_ghcb(struct pt_regs *regs);

#define RMPADJUST_VMSA_PAGE_BIT BIT(16)

+union sev_rmp_adjust {
+ u64 as_uint64;
+ struct {
+ unsigned long target_vmpl : 8;
+ unsigned long enable_read : 1;
+ unsigned long enable_write : 1;
+ unsigned long enable_user_execute : 1;
+ unsigned long enable_kernel_execute : 1;
+ unsigned long reserved1 : 4;
+ unsigned long vmsa : 1;
+ };
+};
+
/* SNP Guest message request */
struct snp_req_data {
unsigned long req_gpa;
diff --git a/arch/x86/include/asm/svm.h b/arch/x86/include/asm/svm.h
index cb1ee53ad3b1..f8b321a11ee4 100644
--- a/arch/x86/include/asm/svm.h
+++ b/arch/x86/include/asm/svm.h
@@ -336,6 +336,53 @@ struct vmcb_save_area {
u64 last_excp_to;
u8 reserved_0x298[72];
u32 spec_ctrl; /* Guest version of SPEC_CTRL at 0x2E0 */
+ u8 reserved_7b[4];
+ u32 pkru;
+ u8 reserved_7a[20];
+ u64 reserved_8; /* rax already available at 0x01f8 */
+ u64 rcx;
+ u64 rdx;
+ u64 rbx;
+ u64 reserved_9; /* rsp already available at 0x01d8 */
+ u64 rbp;
+ u64 rsi;
+ u64 rdi;
+ u64 r8;
+ u64 r9;
+ u64 r10;
+ u64 r11;
+ u64 r12;
+ u64 r13;
+ u64 r14;
+ u64 r15;
+ u8 reserved_10[16];
+ u64 sw_exit_code;
+ u64 sw_exit_info_1;
+ u64 sw_exit_info_2;
+ u64 sw_scratch;
+ union {
+ u64 sev_features;
+ struct {
+ u64 sev_feature_snp : 1;
+ u64 sev_feature_vtom : 1;
+ u64 sev_feature_reflectvc : 1;
+ u64 sev_feature_restrict_injection : 1;
+ u64 sev_feature_alternate_injection : 1;
+ u64 sev_feature_full_debug : 1;
+ u64 sev_feature_reserved1 : 1;
+ u64 sev_feature_snpbtb_isolation : 1;
+ u64 sev_feature_resrved2 : 56;
+ };
+ };
+ u64 vintr_ctrl;
+ u64 guest_error_code;
+ u64 virtual_tom;
+ u64 tlb_id;
+ u64 pcpu_id;
+ u64 event_inject;
+ u64 xcr0;
+ u8 valid_bitmap[16];
+ u64 x87_state_gpa;
} __packed;

/* Save area definition for SEV-ES and SEV-SNP guests */
diff --git a/arch/x86/kernel/cpu/mshyperv.c b/arch/x86/kernel/cpu/mshyperv.c
index 197c8f2ec4eb..9d547751a1a7 100644
--- a/arch/x86/kernel/cpu/mshyperv.c
+++ b/arch/x86/kernel/cpu/mshyperv.c
@@ -39,6 +39,13 @@
#include <asm/realmode.h>
#include <asm/e820/api.h>

+/*
+ * DEFAULT INIT GPAT and SEGMENT LIMIT value in struct VMSA
+ * to start AP in enlightened SEV guest.
+ */
+#define HV_AP_INIT_GPAT_DEFAULT 0x0007040600070406ULL
+#define HV_AP_SEGMENT_LIMIT 0xffffffff
+
/* Is Linux running as the root partition? */
bool hv_root_partition;
struct ms_hyperv_info ms_hyperv;
@@ -230,6 +237,94 @@ static void __init hv_smp_prepare_boot_cpu(void)
#endif
}

+static u8 ap_start_input_arg[PAGE_SIZE] __bss_decrypted __aligned(PAGE_SIZE);
+static u8 ap_start_stack[PAGE_SIZE] __aligned(PAGE_SIZE);
+
+#define hv_populate_vmcb_seg(seg, gdtr_base) \
+do { \
+ if (seg.selector) { \
+ seg.base = 0; \
+ seg.limit = HV_AP_SEGMENT_LIMIT; \
+ seg.attrib = *(u16 *)(gdtr_base + seg.selector + 5); \
+ seg.attrib = (seg.attrib & 0xFF) | ((seg.attrib >> 4) & 0xF00); \
+ } \
+} while (0) \
+
+int hv_snp_boot_ap(int cpu, unsigned long start_ip)
+{
+ struct vmcb_save_area *vmsa = (struct vmcb_save_area *)
+ __get_free_page(GFP_KERNEL | __GFP_ZERO);
+ struct desc_ptr gdtr;
+ u64 ret, retry = 5;
+ struct hv_start_virtual_processor_input *start_vp_input;
+ union sev_rmp_adjust rmp_adjust;
+ unsigned long flags;
+
+ native_store_gdt(&gdtr);
+
+ vmsa->gdtr.base = gdtr.address;
+ vmsa->gdtr.limit = gdtr.size;
+
+ asm volatile("movl %%es, %%eax;" : "=a" (vmsa->es.selector));
+ hv_populate_vmcb_seg(vmsa->es, vmsa->gdtr.base);
+
+ asm volatile("movl %%cs, %%eax;" : "=a" (vmsa->cs.selector));
+ hv_populate_vmcb_seg(vmsa->cs, vmsa->gdtr.base);
+
+ asm volatile("movl %%ss, %%eax;" : "=a" (vmsa->ss.selector));
+ hv_populate_vmcb_seg(vmsa->ss, vmsa->gdtr.base);
+
+ asm volatile("movl %%ds, %%eax;" : "=a" (vmsa->ds.selector));
+ hv_populate_vmcb_seg(vmsa->ds, vmsa->gdtr.base);
+
+ vmsa->efer = native_read_msr(MSR_EFER);
+
+ asm volatile("movq %%cr4, %%rax;" : "=a" (vmsa->cr4));
+ asm volatile("movq %%cr3, %%rax;" : "=a" (vmsa->cr3));
+ asm volatile("movq %%cr0, %%rax;" : "=a" (vmsa->cr0));
+
+ vmsa->xcr0 = 1;
+ vmsa->g_pat = HV_AP_INIT_GPAT_DEFAULT;
+ vmsa->rip = (u64)secondary_startup_64_no_verify;
+ vmsa->rsp = (u64)&ap_start_stack[PAGE_SIZE];
+
+ vmsa->sev_feature_snp = 1;
+ vmsa->sev_feature_restrict_injection = 1;
+
+ rmp_adjust.as_uint64 = 0;
+ rmp_adjust.target_vmpl = 1;
+ rmp_adjust.vmsa = 1;
+ ret = rmpadjust((unsigned long)vmsa, RMP_PG_SIZE_4K,
+ rmp_adjust.as_uint64);
+ if (ret != 0) {
+ pr_err("RMPADJUST(%llx) failed: %llx\n", (u64)vmsa, ret);
+ return ret;
+ }
+
+ local_irq_save(flags);
+ start_vp_input =
+ (struct hv_start_virtual_processor_input *)ap_start_input_arg;
+ memset(start_vp_input, 0, sizeof(*start_vp_input));
+ start_vp_input->partitionid = -1;
+ start_vp_input->vpindex = cpu;
+ start_vp_input->targetvtl = ms_hyperv.vtl;
+ *(u64 *)&start_vp_input->context[0] = __pa(vmsa) | 1;
+
+ do {
+ ret = hv_do_hypercall(HVCALL_START_VIRTUAL_PROCESSOR,
+ start_vp_input, NULL);
+ } while (hv_result(ret) == HV_STATUS_TIME_OUT && retry--);
+
+ if (!hv_result_success(ret)) {
+ pr_err("HvCallStartVirtualProcessor failed: %llx\n", ret);
+ goto done;
+ }
+
+done:
+ local_irq_restore(flags);
+ return ret;
+}
+
static void __init hv_smp_prepare_cpus(unsigned int max_cpus)
{
#ifdef CONFIG_X86_64
@@ -239,6 +334,16 @@ static void __init hv_smp_prepare_cpus(unsigned int max_cpus)

native_smp_prepare_cpus(max_cpus);

+ /*
+ * Override wakeup_secondary_cpu callback for SEV-SNP
+ * enlightened guest.
+ */
+ if (hv_isolation_type_en_snp())
+ apic->wakeup_secondary_cpu = hv_snp_boot_ap;
+
+ if (!hv_root_partition)
+ return;
+
#ifdef CONFIG_X86_64
for_each_present_cpu(i) {
if (i == 0)
@@ -475,8 +580,7 @@ static void __init ms_hyperv_init_platform(void)

# ifdef CONFIG_SMP
smp_ops.smp_prepare_boot_cpu = hv_smp_prepare_boot_cpu;
- if (hv_root_partition)
- smp_ops.smp_prepare_cpus = hv_smp_prepare_cpus;
+ smp_ops.smp_prepare_cpus = hv_smp_prepare_cpus;
# endif

/*
@@ -501,7 +605,7 @@ static void __init ms_hyperv_init_platform(void)
if (!(ms_hyperv.features & HV_ACCESS_TSC_INVARIANT))
mark_tsc_unstable("running on Hyper-V");

- if (isolation_type_en_snp()) {
+ if (hv_isolation_type_en_snp()) {
/*
* Hyper-V enlightened snp guest boots kernel
* directly without bootloader and so roms,
@@ -511,7 +615,7 @@ static void __init ms_hyperv_init_platform(void)
x86_platform.legacy.rtc = 0;
x86_platform.set_wallclock = set_rtc_noop;
x86_platform.get_wallclock = get_rtc_noop;
- x86_platform.legacy.reserve_bios_regions = x86_init_noop;
+ x86_platform.legacy.reserve_bios_regions = 0;
x86_init.resources.probe_roms = x86_init_noop;
x86_init.resources.reserve_resources = x86_init_noop;
x86_init.mpparse.find_smp_config = x86_init_noop;
diff --git a/include/asm-generic/hyperv-tlfs.h b/include/asm-generic/hyperv-tlfs.h
index c1cc3ec36ad5..3d7c67be9f56 100644
--- a/include/asm-generic/hyperv-tlfs.h
+++ b/include/asm-generic/hyperv-tlfs.h
@@ -148,6 +148,7 @@ union hv_reference_tsc_msr {
#define HVCALL_FLUSH_VIRTUAL_ADDRESS_LIST 0x0003
#define HVCALL_NOTIFY_LONG_SPIN_WAIT 0x0008
#define HVCALL_SEND_IPI 0x000b
+#define HVCALL_ENABLE_VP_VTL 0x000f
#define HVCALL_FLUSH_VIRTUAL_ADDRESS_SPACE_EX 0x0013
#define HVCALL_FLUSH_VIRTUAL_ADDRESS_LIST_EX 0x0014
#define HVCALL_SEND_IPI_EX 0x0015
@@ -165,6 +166,7 @@ union hv_reference_tsc_msr {
#define HVCALL_MAP_DEVICE_INTERRUPT 0x007c
#define HVCALL_UNMAP_DEVICE_INTERRUPT 0x007d
#define HVCALL_RETARGET_INTERRUPT 0x007e
+#define HVCALL_START_VIRTUAL_PROCESSOR 0x0099
#define HVCALL_FLUSH_GUEST_PHYSICAL_ADDRESS_SPACE 0x00af
#define HVCALL_FLUSH_GUEST_PHYSICAL_ADDRESS_LIST 0x00b0
#define HVCALL_MODIFY_SPARSE_GPA_PAGE_HOST_VISIBILITY 0x00db
@@ -219,6 +221,7 @@ enum HV_GENERIC_SET_FORMAT {
#define HV_STATUS_INVALID_PORT_ID 17
#define HV_STATUS_INVALID_CONNECTION_ID 18
#define HV_STATUS_INSUFFICIENT_BUFFERS 19
+#define HV_STATUS_TIME_OUT 0x78

/*
* The Hyper-V TimeRefCount register and the TSC
@@ -778,6 +781,22 @@ struct hv_input_unmap_device_interrupt {
struct hv_interrupt_entry interrupt_entry;
} __packed;

+struct hv_enable_vp_vtl_input {
+ u64 partitionid;
+ u32 vpindex;
+ u8 targetvtl;
+ u8 padding[3];
+ u8 context[0xe0];
+} __packed;
+
+struct hv_start_virtual_processor_input {
+ u64 partitionid;
+ u32 vpindex;
+ u8 targetvtl;
+ u8 padding[3];
+ u8 context[0xe0];
+} __packed;
+
#define HV_SOURCE_SHADOW_NONE 0x0
#define HV_SOURCE_SHADOW_BRIDGE_BUS_RANGE 0x1

--
2.25.1

2023-01-22 03:16:54

by Tianyu Lan

Subject: [RFC PATCH V3 16/16] x86/sev: Fix interrupt exit code paths from #HV exception

From: Ashish Kalra <[email protected]>

Add checks to the interrupt exit code paths for returns to
user mode: if we are currently executing the #HV handler,
don't follow the irqentry_exit_to_user_mode() path, as that
can potentially cause the #HV handler to be preempted and
rescheduled on another CPU. A #HV handler rescheduled on
another CPU would handle interrupts on a different CPU than
the one they were injected on, causing invalid EOIs,
missed/lost guest interrupts and the corresponding hangs,
and/or per-CPU IRQs handled on an unintended CPU.

Signed-off-by: Ashish Kalra <[email protected]>
---
arch/x86/include/asm/idtentry.h | 66 +++++++++++++++++++++++++++++++++
arch/x86/kernel/sev.c | 30 +++++++++++++++
2 files changed, 96 insertions(+)

diff --git a/arch/x86/include/asm/idtentry.h b/arch/x86/include/asm/idtentry.h
index 652fea10d377..45b47132be7c 100644
--- a/arch/x86/include/asm/idtentry.h
+++ b/arch/x86/include/asm/idtentry.h
@@ -13,6 +13,10 @@

#include <asm/irq_stack.h>

+#ifdef CONFIG_AMD_MEM_ENCRYPT
+noinstr void irqentry_exit_hv_cond(struct pt_regs *regs, irqentry_state_t state);
+#endif
+
/**
* DECLARE_IDTENTRY - Declare functions for simple IDT entry points
* No error code pushed by hardware
@@ -176,6 +180,7 @@ __visible noinstr void func(struct pt_regs *regs, unsigned long error_code)
#define DECLARE_IDTENTRY_IRQ(vector, func) \
DECLARE_IDTENTRY_ERRORCODE(vector, func)

+#ifndef CONFIG_AMD_MEM_ENCRYPT
/**
* DEFINE_IDTENTRY_IRQ - Emit code for device interrupt IDT entry points
* @func: Function name of the entry point
@@ -205,6 +210,26 @@ __visible noinstr void func(struct pt_regs *regs, \
} \
\
static noinline void __##func(struct pt_regs *regs, u32 vector)
+#else
+
+#define DEFINE_IDTENTRY_IRQ(func) \
+static void __##func(struct pt_regs *regs, u32 vector); \
+ \
+__visible noinstr void func(struct pt_regs *regs, \
+ unsigned long error_code) \
+{ \
+ irqentry_state_t state = irqentry_enter(regs); \
+ u32 vector = (u32)(u8)error_code; \
+ \
+ instrumentation_begin(); \
+ kvm_set_cpu_l1tf_flush_l1d(); \
+ run_irq_on_irqstack_cond(__##func, regs, vector); \
+ instrumentation_end(); \
+ irqentry_exit_hv_cond(regs, state); \
+} \
+ \
+static noinline void __##func(struct pt_regs *regs, u32 vector)
+#endif

/**
* DECLARE_IDTENTRY_SYSVEC - Declare functions for system vector entry points
@@ -221,6 +246,7 @@ static noinline void __##func(struct pt_regs *regs, u32 vector)
#define DECLARE_IDTENTRY_SYSVEC(vector, func) \
DECLARE_IDTENTRY(vector, func)

+#ifndef CONFIG_AMD_MEM_ENCRYPT
/**
* DEFINE_IDTENTRY_SYSVEC - Emit code for system vector IDT entry points
* @func: Function name of the entry point
@@ -245,6 +271,26 @@ __visible noinstr void func(struct pt_regs *regs) \
} \
\
static noinline void __##func(struct pt_regs *regs)
+#else
+
+#define DEFINE_IDTENTRY_SYSVEC(func) \
+static void __##func(struct pt_regs *regs); \
+ \
+__visible noinstr void func(struct pt_regs *regs) \
+{ \
+ irqentry_state_t state = irqentry_enter(regs); \
+ \
+ instrumentation_begin(); \
+ kvm_set_cpu_l1tf_flush_l1d(); \
+ run_sysvec_on_irqstack_cond(__##func, regs); \
+ instrumentation_end(); \
+ irqentry_exit_hv_cond(regs, state); \
+} \
+ \
+static noinline void __##func(struct pt_regs *regs)
+#endif
+
+#ifndef CONFIG_AMD_MEM_ENCRYPT

/**
* DEFINE_IDTENTRY_SYSVEC_SIMPLE - Emit code for simple system vector IDT
@@ -274,6 +320,26 @@ __visible noinstr void func(struct pt_regs *regs) \
} \
\
static __always_inline void __##func(struct pt_regs *regs)
+#else
+
+#define DEFINE_IDTENTRY_SYSVEC_SIMPLE(func) \
+static __always_inline void __##func(struct pt_regs *regs); \
+ \
+__visible noinstr void func(struct pt_regs *regs) \
+{ \
+ irqentry_state_t state = irqentry_enter(regs); \
+ \
+ instrumentation_begin(); \
+ __irq_enter_raw(); \
+ kvm_set_cpu_l1tf_flush_l1d(); \
+ __##func(regs); \
+ __irq_exit_raw(); \
+ instrumentation_end(); \
+ irqentry_exit_hv_cond(regs, state); \
+} \
+ \
+static __always_inline void __##func(struct pt_regs *regs)
+#endif

/**
* DECLARE_IDTENTRY_XENCB - Declare functions for XEN HV callback entry point
diff --git a/arch/x86/kernel/sev.c b/arch/x86/kernel/sev.c
index b1a98c2a52f8..23f15e95838b 100644
--- a/arch/x86/kernel/sev.c
+++ b/arch/x86/kernel/sev.c
@@ -147,6 +147,10 @@ struct sev_hv_doorbell_page {

struct sev_snp_runtime_data {
struct sev_hv_doorbell_page hv_doorbell_page;
+ /*
+ * Indication that we are currently handling #HV events.
+ */
+ bool hv_handling_events;
};

static DEFINE_PER_CPU(struct sev_snp_runtime_data*, snp_runtime_data);
@@ -200,6 +204,8 @@ static void do_exc_hv(struct pt_regs *regs)
union hv_pending_events pending_events;
u8 vector;

+ this_cpu_read(snp_runtime_data)->hv_handling_events = true;
+
while (sev_hv_pending()) {
pending_events.events = xchg(
&sev_snp_current_doorbell_page()->pending_events.events,
@@ -234,6 +240,8 @@ static void do_exc_hv(struct pt_regs *regs)
common_interrupt(regs, pending_events.vector);
}
}
+
+ this_cpu_read(snp_runtime_data)->hv_handling_events = false;
}

static __always_inline bool on_vc_stack(struct pt_regs *regs)
@@ -2529,3 +2537,25 @@ static int __init snp_init_platform_device(void)
return 0;
}
device_initcall(snp_init_platform_device);
+
+noinstr void irqentry_exit_hv_cond(struct pt_regs *regs, irqentry_state_t state)
+{
+ /*
+ * Check whether this returns to user mode. If so, and if we
+ * are currently executing the #HV handler, don't follow the
+ * irqentry_exit_to_user_mode() path: that can potentially
+ * cause the #HV handler to be preempted and rescheduled on
+ * another CPU. A #HV handler rescheduled on another CPU would
+ * handle interrupts on a different CPU than the one they were
+ * injected on, causing invalid EOIs, missed/lost guest
+ * interrupts and the corresponding hangs, and/or per-CPU IRQs
+ * handled on an unintended CPU.
+ */
+ if (user_mode(regs) &&
+ this_cpu_read(snp_runtime_data)->hv_handling_events)
+ return;
+
+ /* follow normal interrupt return/exit path */
+ irqentry_exit(regs, state);
+}
--
2.25.1

2023-01-22 03:19:50

by Tianyu Lan

Subject: [RFC PATCH V3 14/16] x86/sev: Initialize #HV doorbell and handle interrupt requests

From: Tianyu Lan <[email protected]>

Enable the #HV exception to handle interrupt requests from the hypervisor.
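
At a high level the flow is (a sketch; 'doorbell' is the per-CPU
page registered below and dispatch_vector() is a placeholder for
the vector routing done in do_exc_hv()):

	union hv_pending_events ev;

	/* Drain pending events the hypervisor queued in the shared page. */
	while (doorbell->pending_events.events) {
		ev.events = xchg(&doorbell->pending_events.events, 0);

		if (ev.nmi)
			exc_nmi(regs);
		if (ev.vector)
			dispatch_vector(regs, ev.vector); /* placeholder */
	}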

Co-developed-by: Lendacky Thomas <[email protected]>
Co-developed-by: Kalra Ashish <[email protected]>
Signed-off-by: Tianyu Lan <[email protected]>
---
arch/x86/include/asm/mem_encrypt.h | 2 +
arch/x86/include/asm/msr-index.h | 6 +
arch/x86/include/asm/svm.h | 12 +-
arch/x86/include/uapi/asm/svm.h | 4 +
arch/x86/kernel/sev.c | 307 +++++++++++++++++++++++------
arch/x86/kernel/traps.c | 2 +
6 files changed, 272 insertions(+), 61 deletions(-)

diff --git a/arch/x86/include/asm/mem_encrypt.h b/arch/x86/include/asm/mem_encrypt.h
index 72ca90552b6a..7264ca5f5b2d 100644
--- a/arch/x86/include/asm/mem_encrypt.h
+++ b/arch/x86/include/asm/mem_encrypt.h
@@ -50,6 +50,7 @@ void __init early_set_mem_enc_dec_hypercall(unsigned long vaddr, int npages,
void __init mem_encrypt_free_decrypted_mem(void);

void __init sev_es_init_vc_handling(void);
+void __init sev_snp_init_hv_handling(void);

#define __bss_decrypted __section(".bss..decrypted")

@@ -72,6 +73,7 @@ static inline void __init sme_encrypt_kernel(struct boot_params *bp) { }
static inline void __init sme_enable(struct boot_params *bp) { }

static inline void sev_es_init_vc_handling(void) { }
+static inline void sev_snp_init_hv_handling(void) { }

static inline int __init
early_set_memory_decrypted(unsigned long vaddr, unsigned long size) { return 0; }
diff --git a/arch/x86/include/asm/msr-index.h b/arch/x86/include/asm/msr-index.h
index 6a6e70e792a4..70af0ce5f2c4 100644
--- a/arch/x86/include/asm/msr-index.h
+++ b/arch/x86/include/asm/msr-index.h
@@ -562,11 +562,17 @@
#define MSR_AMD64_SEV_ENABLED_BIT 0
#define MSR_AMD64_SEV_ES_ENABLED_BIT 1
#define MSR_AMD64_SEV_SNP_ENABLED_BIT 2
+#define MSR_AMD64_SEV_REFLECTVC_ENABLED_BIT 4
+#define MSR_AMD64_SEV_RESTRICTED_INJECTION_ENABLED_BIT 5
+#define MSR_AMD64_SEV_ALTERNATE_INJECTION_ENABLED_BIT 6
#define MSR_AMD64_SEV_ENABLED BIT_ULL(MSR_AMD64_SEV_ENABLED_BIT)
#define MSR_AMD64_SEV_ES_ENABLED BIT_ULL(MSR_AMD64_SEV_ES_ENABLED_BIT)
#define MSR_AMD64_SEV_SNP_ENABLED BIT_ULL(MSR_AMD64_SEV_SNP_ENABLED_BIT)
#define MSR_AMD64_SNP_VTOM_ENABLED BIT_ULL(3)

+#define MSR_AMD64_SEV_REFLECTVC_ENABLED BIT_ULL(MSR_AMD64_SEV_REFLECTVC_ENABLED_BIT)
+#define MSR_AMD64_SEV_RESTRICTED_INJECTION_ENABLED BIT_ULL(MSR_AMD64_SEV_RESTRICTED_INJECTION_ENABLED_BIT)
+#define MSR_AMD64_SEV_ALTERNATE_INJECTION_ENABLED BIT_ULL(MSR_AMD64_SEV_ALTERNATE_INJECTION_ENABLED_BIT)
#define MSR_AMD64_VIRT_SPEC_CTRL 0xc001011f

/* AMD Collaborative Processor Performance Control MSRs */
diff --git a/arch/x86/include/asm/svm.h b/arch/x86/include/asm/svm.h
index f8b321a11ee4..911c991fec78 100644
--- a/arch/x86/include/asm/svm.h
+++ b/arch/x86/include/asm/svm.h
@@ -568,12 +568,12 @@ static inline void __unused_size_checks(void)

/* Check offsets of reserved fields */

- BUILD_BUG_RESERVED_OFFSET(vmcb_save_area, 0xa0);
- BUILD_BUG_RESERVED_OFFSET(vmcb_save_area, 0xcc);
- BUILD_BUG_RESERVED_OFFSET(vmcb_save_area, 0xd8);
- BUILD_BUG_RESERVED_OFFSET(vmcb_save_area, 0x180);
- BUILD_BUG_RESERVED_OFFSET(vmcb_save_area, 0x248);
- BUILD_BUG_RESERVED_OFFSET(vmcb_save_area, 0x298);
+// BUILD_BUG_RESERVED_OFFSET(vmcb_save_area, 0xa0);
+// BUILD_BUG_RESERVED_OFFSET(vmcb_save_area, 0xcc);
+// BUILD_BUG_RESERVED_OFFSET(vmcb_save_area, 0xd8);
+// BUILD_BUG_RESERVED_OFFSET(vmcb_save_area, 0x180);
+// BUILD_BUG_RESERVED_OFFSET(vmcb_save_area, 0x248);
+// BUILD_BUG_RESERVED_OFFSET(vmcb_save_area, 0x298);

BUILD_BUG_RESERVED_OFFSET(sev_es_save_area, 0xc8);
BUILD_BUG_RESERVED_OFFSET(sev_es_save_area, 0xcc);
diff --git a/arch/x86/include/uapi/asm/svm.h b/arch/x86/include/uapi/asm/svm.h
index f69c168391aa..85d6882262e7 100644
--- a/arch/x86/include/uapi/asm/svm.h
+++ b/arch/x86/include/uapi/asm/svm.h
@@ -115,6 +115,10 @@
#define SVM_VMGEXIT_AP_CREATE_ON_INIT 0
#define SVM_VMGEXIT_AP_CREATE 1
#define SVM_VMGEXIT_AP_DESTROY 2
+#define SVM_VMGEXIT_HV_DOORBELL_PAGE 0x80000014
+#define SVM_VMGEXIT_GET_PREFERRED_HV_DOORBELL_PAGE 0
+#define SVM_VMGEXIT_SET_HV_DOORBELL_PAGE 1
+#define SVM_VMGEXIT_QUERY_HV_DOORBELL_PAGE 2
#define SVM_VMGEXIT_HV_FEATURES 0x8000fffd
#define SVM_VMGEXIT_UNSUPPORTED_EVENT 0x8000ffff

diff --git a/arch/x86/kernel/sev.c b/arch/x86/kernel/sev.c
index fe5e5e41433d..03d99fad9e76 100644
--- a/arch/x86/kernel/sev.c
+++ b/arch/x86/kernel/sev.c
@@ -122,6 +122,150 @@ struct sev_config {

static struct sev_config sev_cfg __read_mostly;

+static noinstr struct ghcb *__sev_get_ghcb(struct ghcb_state *state);
+static noinstr void __sev_put_ghcb(struct ghcb_state *state);
+static int vmgexit_hv_doorbell_page(struct ghcb *ghcb, u64 op, u64 pa);
+static void sev_snp_setup_hv_doorbell_page(struct ghcb *ghcb);
+
+union hv_pending_events {
+ u16 events;
+ struct {
+ u8 vector;
+ u8 nmi : 1;
+ u8 mc : 1;
+ u8 reserved1 : 5;
+ u8 no_further_signal : 1;
+ };
+};
+
+struct sev_hv_doorbell_page {
+ union hv_pending_events pending_events;
+ u8 no_eoi_required;
+ u8 reserved2[61];
+ u8 padding[4032];
+};
+
+struct sev_snp_runtime_data {
+ struct sev_hv_doorbell_page hv_doorbell_page;
+};
+
+static DEFINE_PER_CPU(struct sev_snp_runtime_data*, snp_runtime_data);
+
+static inline u64 sev_es_rd_ghcb_msr(void)
+{
+ return __rdmsr(MSR_AMD64_SEV_ES_GHCB);
+}
+
+static __always_inline void sev_es_wr_ghcb_msr(u64 val)
+{
+ u32 low, high;
+
+ low = (u32)(val);
+ high = (u32)(val >> 32);
+
+ native_wrmsr(MSR_AMD64_SEV_ES_GHCB, low, high);
+}
+
+struct sev_hv_doorbell_page *sev_snp_current_doorbell_page(void)
+{
+ return &this_cpu_read(snp_runtime_data)->hv_doorbell_page;
+}
+
+static u8 sev_hv_pending(void)
+{
+ return sev_snp_current_doorbell_page()->pending_events.events;
+}
+
+static void hv_doorbell_apic_eoi_write(u32 reg, u32 val)
+{
+ if (xchg(&sev_snp_current_doorbell_page()->no_eoi_required, 0) & 0x1)
+ return;
+
+ BUG_ON(reg != APIC_EOI);
+ apic->write(reg, val);
+}
+
+static void do_exc_hv(struct pt_regs *regs)
+{
+ union hv_pending_events pending_events;
+ u8 vector;
+
+ while (sev_hv_pending()) {
+ pending_events.events = xchg(
+ &sev_snp_current_doorbell_page()->pending_events.events,
+ 0);
+
+ if (pending_events.nmi)
+ exc_nmi(regs);
+
+#ifdef CONFIG_X86_MCE
+ if (pending_events.mc)
+ exc_machine_check(regs);
+#endif
+
+ if (!pending_events.vector)
+ return;
+
+ if (pending_events.vector < FIRST_EXTERNAL_VECTOR) {
+ /* Exception vectors */
+ WARN(1, "exception shouldn't happen\n");
+ } else if (pending_events.vector == FIRST_EXTERNAL_VECTOR) {
+ sysvec_irq_move_cleanup(regs);
+ } else if (pending_events.vector == IA32_SYSCALL_VECTOR) {
+ WARN(1, "syscall shouldn't happen\n");
+ } else if (pending_events.vector >= FIRST_SYSTEM_VECTOR) {
+ switch (pending_events.vector) {
+#if IS_ENABLED(CONFIG_HYPERV)
+ case HYPERV_STIMER0_VECTOR:
+ sysvec_hyperv_stimer0(regs);
+ break;
+ case HYPERVISOR_CALLBACK_VECTOR:
+ sysvec_hyperv_callback(regs);
+ break;
+#endif
+#ifdef CONFIG_SMP
+ case RESCHEDULE_VECTOR:
+ sysvec_reschedule_ipi(regs);
+ break;
+ case IRQ_MOVE_CLEANUP_VECTOR:
+ sysvec_irq_move_cleanup(regs);
+ break;
+ case REBOOT_VECTOR:
+ sysvec_reboot(regs);
+ break;
+ case CALL_FUNCTION_SINGLE_VECTOR:
+ sysvec_call_function_single(regs);
+ break;
+ case CALL_FUNCTION_VECTOR:
+ sysvec_call_function(regs);
+ break;
+#endif
+#ifdef CONFIG_X86_LOCAL_APIC
+ case ERROR_APIC_VECTOR:
+ sysvec_error_interrupt(regs);
+ break;
+ case SPURIOUS_APIC_VECTOR:
+ sysvec_spurious_apic_interrupt(regs);
+ break;
+ case LOCAL_TIMER_VECTOR:
+ sysvec_apic_timer_interrupt(regs);
+ break;
+ case X86_PLATFORM_IPI_VECTOR:
+ sysvec_x86_platform_ipi(regs);
+ break;
+#endif
+ case 0x0:
+ break;
+ default:
+ panic("Unexpected vector %d\n", vector);
+ unreachable();
+ }
+ } else {
+ common_interrupt(regs, pending_events.vector);
+ }
+ }
+}
+
static __always_inline bool on_vc_stack(struct pt_regs *regs)
{
unsigned long sp = regs->sp;
@@ -179,11 +323,6 @@ void noinstr __sev_es_ist_enter(struct pt_regs *regs)
this_cpu_write(cpu_tss_rw.x86_tss.ist[IST_INDEX_VC], new_ist);
}

-static void do_exc_hv(struct pt_regs *regs)
-{
- /* Handle #HV exception. */
-}
-
void check_hv_pending(struct pt_regs *regs)
{
if (!cc_platform_has(CC_ATTR_GUEST_SEV_SNP))
@@ -232,68 +371,38 @@ void noinstr __sev_es_ist_exit(void)
this_cpu_write(cpu_tss_rw.x86_tss.ist[IST_INDEX_VC], *(unsigned long *)ist);
}

-/*
- * Nothing shall interrupt this code path while holding the per-CPU
- * GHCB. The backup GHCB is only for NMIs interrupting this path.
- *
- * Callers must disable local interrupts around it.
- */
-static noinstr struct ghcb *__sev_get_ghcb(struct ghcb_state *state)
+static bool sev_restricted_injection_enabled(void)
+{
+ return sev_status & MSR_AMD64_SEV_RESTRICTED_INJECTION_ENABLED;
+}
+
+void __init sev_snp_init_hv_handling(void)
{
+ struct sev_snp_runtime_data *snp_data;
struct sev_es_runtime_data *data;
+ struct ghcb_state state;
struct ghcb *ghcb;
+ unsigned long flags;
+ int cpu;
+ int err;

WARN_ON(!irqs_disabled());
+ if (!cc_platform_has(CC_ATTR_GUEST_SEV_SNP) || !sev_restricted_injection_enabled())
+ return;

data = this_cpu_read(runtime_data);
- ghcb = &data->ghcb_page;

- if (unlikely(data->ghcb_active)) {
- /* GHCB is already in use - save its contents */
-
- if (unlikely(data->backup_ghcb_active)) {
- /*
- * Backup-GHCB is also already in use. There is no way
- * to continue here so just kill the machine. To make
- * panic() work, mark GHCBs inactive so that messages
- * can be printed out.
- */
- data->ghcb_active = false;
- data->backup_ghcb_active = false;
-
- instrumentation_begin();
- panic("Unable to handle #VC exception! GHCB and Backup GHCB are already in use");
- instrumentation_end();
- }
-
- /* Mark backup_ghcb active before writing to it */
- data->backup_ghcb_active = true;
-
- state->ghcb = &data->backup_ghcb;
+ local_irq_save(flags);

- /* Backup GHCB content */
- *state->ghcb = *ghcb;
- } else {
- state->ghcb = NULL;
- data->ghcb_active = true;
- }
+ ghcb = __sev_get_ghcb(&state);

- return ghcb;
-}
+ sev_snp_setup_hv_doorbell_page(ghcb);

-static inline u64 sev_es_rd_ghcb_msr(void)
-{
- return __rdmsr(MSR_AMD64_SEV_ES_GHCB);
-}
-
-static __always_inline void sev_es_wr_ghcb_msr(u64 val)
-{
- u32 low, high;
+ __sev_put_ghcb(&state);

- low = (u32)(val);
- high = (u32)(val >> 32);
+ apic_set_eoi_write(hv_doorbell_apic_eoi_write);

- native_wrmsr(MSR_AMD64_SEV_ES_GHCB, low, high);
+ local_irq_restore(flags);
}

static int vc_fetch_insn_kernel(struct es_em_ctxt *ctxt,
@@ -554,6 +663,69 @@ static enum es_result vc_slow_virt_to_phys(struct ghcb *ghcb, struct es_em_ctxt
/* Include code shared with pre-decompression boot stage */
#include "sev-shared.c"

+/*
+ * Nothing shall interrupt this code path while holding the per-CPU
+ * GHCB. The backup GHCB is only for NMIs interrupting this path.
+ *
+ * Callers must disable local interrupts around it.
+ */
+static noinstr struct ghcb *__sev_get_ghcb(struct ghcb_state *state)
+{
+ struct sev_es_runtime_data *data;
+ struct ghcb *ghcb;
+
+ WARN_ON(!irqs_disabled());
+
+ data = this_cpu_read(runtime_data);
+ ghcb = &data->ghcb_page;
+
+ if (unlikely(data->ghcb_active)) {
+ /* GHCB is already in use - save its contents */
+
+ if (unlikely(data->backup_ghcb_active)) {
+ /*
+ * Backup-GHCB is also already in use. There is no way
+ * to continue here so just kill the machine. To make
+ * panic() work, mark GHCBs inactive so that messages
+ * can be printed out.
+ */
+ data->ghcb_active = false;
+ data->backup_ghcb_active = false;
+
+ instrumentation_begin();
+ panic("Unable to handle #VC exception! GHCB and Backup GHCB are already in use");
+ instrumentation_end();
+ }
+
+ /* Mark backup_ghcb active before writing to it */
+ data->backup_ghcb_active = true;
+
+ state->ghcb = &data->backup_ghcb;
+
+ /* Backup GHCB content */
+ *state->ghcb = *ghcb;
+ } else {
+ state->ghcb = NULL;
+ data->ghcb_active = true;
+ }
+
+ return ghcb;
+}
+
+static void sev_snp_setup_hv_doorbell_page(struct ghcb *ghcb)
+{
+ u64 pa;
+ enum es_result ret;
+
+ pa = __pa(sev_snp_current_doorbell_page());
+ vc_ghcb_invalidate(ghcb);
+ ret = vmgexit_hv_doorbell_page(ghcb,
+ SVM_VMGEXIT_SET_HV_DOORBELL_PAGE,
+ pa);
+ if (ret != ES_OK)
+ panic("SEV-SNP: failed to set up #HV doorbell page");
+}
+
static noinstr void __sev_put_ghcb(struct ghcb_state *state)
{
struct sev_es_runtime_data *data;
@@ -1282,6 +1454,7 @@ static void snp_register_per_cpu_ghcb(void)
ghcb = &data->ghcb_page;

snp_register_ghcb_early(__pa(ghcb));
+ sev_snp_setup_hv_doorbell_page(ghcb);
}

void setup_ghcb(void)
@@ -1321,6 +1494,11 @@ void setup_ghcb(void)
snp_register_ghcb_early(__pa(&boot_ghcb_page));
}

+int vmgexit_hv_doorbell_page(struct ghcb *ghcb, u64 op, u64 pa)
+{
+ return sev_es_ghcb_hv_call(ghcb, NULL, SVM_VMGEXIT_HV_DOORBELL_PAGE, op, pa);
+}
+
#ifdef CONFIG_HOTPLUG_CPU
static void sev_es_ap_hlt_loop(void)
{
@@ -1394,6 +1572,7 @@ static void __init alloc_runtime_data(int cpu)
static void __init init_ghcb(int cpu)
{
struct sev_es_runtime_data *data;
+ struct sev_snp_runtime_data *snp_data;
int err;

data = per_cpu(runtime_data, cpu);
@@ -1405,6 +1584,19 @@ static void __init init_ghcb(int cpu)

memset(&data->ghcb_page, 0, sizeof(data->ghcb_page));

+ snp_data = memblock_alloc(sizeof(*snp_data), PAGE_SIZE);
+ if (!snp_data)
+ panic("Can't allocate SEV-SNP runtime data");
+
+ err = early_set_memory_decrypted((unsigned long)&snp_data->hv_doorbell_page,
+ sizeof(snp_data->hv_doorbell_page));
+ if (err)
+ panic("Can't map #HV doorbell pages unencrypted");
+
+ memset(&snp_data->hv_doorbell_page, 0, sizeof(snp_data->hv_doorbell_page));
+
+ per_cpu(snp_runtime_data, cpu) = snp_data;
+
data->ghcb_active = false;
data->backup_ghcb_active = false;
}
@@ -2045,7 +2237,12 @@ DEFINE_IDTENTRY_VC_USER(exc_vmm_communication)

static bool hv_raw_handle_exception(struct pt_regs *regs)
{
- return false;
+ /* Clear the no_further_signal bit */
+ sev_snp_current_doorbell_page()->pending_events.events &= 0x7fff;
+
+ check_hv_pending(regs);
+
+ return true;
}

static __always_inline bool on_hv_fallback_stack(struct pt_regs *regs)
diff --git a/arch/x86/kernel/traps.c b/arch/x86/kernel/traps.c
index d29debec8134..1aa6cab2394b 100644
--- a/arch/x86/kernel/traps.c
+++ b/arch/x86/kernel/traps.c
@@ -1503,5 +1503,7 @@ void __init trap_init(void)
cpu_init_exception_handling();
/* Setup traps as cpu_init() might #GP */
idt_setup_traps();
+ sev_snp_init_hv_handling();
+
cpu_init();
}
--
2.25.1

2023-01-22 04:00:27

by Tianyu Lan

Subject: [RFC PATCH V3 06/16] x86/hyperv: decrypt vmbus pages for sev-snp enlightened guest

From: Tianyu Lan <[email protected]>

Vmbus post msg, synic event and message pages are shared
with hypervisor and so decrypt these pages in the sev-snp guest.

Signed-off-by: Tianyu Lan <[email protected]>
---
Change since RFC V2:
* Fix error in the error code path and encrypt
pages correctly when decryption failure happens.
---
drivers/hv/hv.c | 33 ++++++++++++++++++++++++++++++++-
1 file changed, 32 insertions(+), 1 deletion(-)

diff --git a/drivers/hv/hv.c b/drivers/hv/hv.c
index 410e6c4e80d3..52edc54c8172 100644
--- a/drivers/hv/hv.c
+++ b/drivers/hv/hv.c
@@ -20,6 +20,7 @@
#include <linux/interrupt.h>
#include <clocksource/hyperv_timer.h>
#include <asm/mshyperv.h>
+#include <linux/set_memory.h>
#include "hyperv_vmbus.h"

/* The one and only */
@@ -117,7 +118,7 @@ int hv_post_message(union hv_connection_id connection_id,

int hv_synic_alloc(void)
{
- int cpu;
+ int cpu, ret;
struct hv_per_cpu_context *hv_cpu;

/*
@@ -168,9 +169,39 @@ int hv_synic_alloc(void)
pr_err("Unable to allocate post msg page\n");
goto err;
}
+
+ if (hv_isolation_type_en_snp()) {
+ ret = set_memory_decrypted((unsigned long)
+ hv_cpu->synic_message_page, 1);
+ if (ret)
+ goto err;
+
+ ret = set_memory_decrypted((unsigned long)
+ hv_cpu->synic_event_page, 1);
+ if (ret)
+ goto err_decrypt_event_page;
+
+ ret = set_memory_decrypted((unsigned long)
+ hv_cpu->post_msg_page, 1);
+ if (ret)
+ goto err_decrypt_msg_page;
+
+ memset(hv_cpu->synic_message_page, 0, PAGE_SIZE);
+ memset(hv_cpu->synic_event_page, 0, PAGE_SIZE);
+ memset(hv_cpu->post_msg_page, 0, PAGE_SIZE);
+ }
}

return 0;
+
+err_decrypt_msg_page:
+ set_memory_encrypted((unsigned long)
+ hv_cpu->synic_event_page, 1);
+
+err_decrypt_event_page:
+ set_memory_encrypted((unsigned long)
+ hv_cpu->synic_message_page, 1);
+
err:
/*
* Any memory allocations that succeeded will be freed when
--
2.25.1

2023-01-22 04:00:36

by Tianyu Lan

Subject: [RFC PATCH V3 05/16] clocksource/drivers/hyper-v: decrypt hyperv tsc page in sev-snp enlightened guest

From: Tianyu Lan <[email protected]>

The Hyper-V TSC page is shared with the hypervisor and should
be decrypted in an SEV-SNP enlightened guest when it's used.
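
This relies on the __bss_decrypted attribute, which places the
object in a section that early boot code maps as shared with the
host; its definition in <asm/mem_encrypt.h> is simply:

	#define __bss_decrypted __section(".bss..decrypted")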

Signed-off-by: Tianyu Lan <[email protected]>
---
Change since RFC V2:
* Change the Subject line prefix
---
drivers/clocksource/hyperv_timer.c | 2 +-
1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/drivers/clocksource/hyperv_timer.c b/drivers/clocksource/hyperv_timer.c
index c0cef92b12b8..44da16ca203c 100644
--- a/drivers/clocksource/hyperv_timer.c
+++ b/drivers/clocksource/hyperv_timer.c
@@ -365,7 +365,7 @@ EXPORT_SYMBOL_GPL(hv_stimer_global_cleanup);
static union {
struct ms_hyperv_tsc_page page;
u8 reserved[PAGE_SIZE];
-} tsc_pg __aligned(PAGE_SIZE);
+} tsc_pg __bss_decrypted __aligned(PAGE_SIZE);

static struct ms_hyperv_tsc_page *tsc_page = &tsc_pg.page;
static unsigned long tsc_pfn;
--
2.25.1

2023-01-23 15:30:34

by Tom Lendacky

Subject: Re: [RFC PATCH V3 10/16] x86/hyperv: Add smp support for sev-snp guest

On 1/21/23 20:46, Tianyu Lan wrote:
> From: Tianyu Lan <[email protected]>
>
> The wakeup_secondary_cpu callback was populated with
> wakeup_cpu_via_vmgexit(), which doesn't work for Hyper-V. Override it

An explanation as to why it doesn't work would be nice here.

> with a Hyper-V specific hook which uses the
> HVCALL_START_VIRTUAL_PROCESSOR hypercall to start the AP with the
> VMSA data structure.
>
> Signed-off-by: Tianyu Lan <[email protected]>
> ---
> Change since RFC v2:
> * Add helper function to initialize segment
> * Fix some coding style
> ---
> arch/x86/include/asm/mshyperv.h | 2 +
> arch/x86/include/asm/sev.h | 13 ++++
> arch/x86/include/asm/svm.h | 47 +++++++++++++
> arch/x86/kernel/cpu/mshyperv.c | 112 ++++++++++++++++++++++++++++--
> include/asm-generic/hyperv-tlfs.h | 19 +++++
> 5 files changed, 189 insertions(+), 4 deletions(-)
>
> diff --git a/arch/x86/include/asm/mshyperv.h b/arch/x86/include/asm/mshyperv.h
> index 7266d71d30d6..c69051eec0e1 100644
> --- a/arch/x86/include/asm/mshyperv.h
> +++ b/arch/x86/include/asm/mshyperv.h
> @@ -203,6 +203,8 @@ struct irq_domain *hv_create_pci_msi_domain(void);
> int hv_map_ioapic_interrupt(int ioapic_id, bool level, int vcpu, int vector,
> struct hv_interrupt_entry *entry);
> int hv_unmap_ioapic_interrupt(int ioapic_id, struct hv_interrupt_entry *entry);
> +int hv_set_mem_host_visibility(unsigned long addr, int numpages, bool visible);
> +int hv_snp_boot_ap(int cpu, unsigned long start_ip);
>
> #ifdef CONFIG_AMD_MEM_ENCRYPT
> void hv_ghcb_msr_write(u64 msr, u64 value);
> diff --git a/arch/x86/include/asm/sev.h b/arch/x86/include/asm/sev.h
> index ebc271bb6d8e..e34aaf730220 100644
> --- a/arch/x86/include/asm/sev.h
> +++ b/arch/x86/include/asm/sev.h
> @@ -86,6 +86,19 @@ extern bool handle_vc_boot_ghcb(struct pt_regs *regs);
>
> #define RMPADJUST_VMSA_PAGE_BIT BIT(16)
>
> +union sev_rmp_adjust {
> + u64 as_uint64;
> + struct {
> + unsigned long target_vmpl : 8;
> + unsigned long enable_read : 1;
> + unsigned long enable_write : 1;
> + unsigned long enable_user_execute : 1;
> + unsigned long enable_kernel_execute : 1;
> + unsigned long reserved1 : 4;
> + unsigned long vmsa : 1;
> + };
> +};
> +
> /* SNP Guest message request */
> struct snp_req_data {
> unsigned long req_gpa;
> diff --git a/arch/x86/include/asm/svm.h b/arch/x86/include/asm/svm.h
> index cb1ee53ad3b1..f8b321a11ee4 100644
> --- a/arch/x86/include/asm/svm.h
> +++ b/arch/x86/include/asm/svm.h
> @@ -336,6 +336,53 @@ struct vmcb_save_area {

Please don't update the vmcb_save_area; you should be using/updating the
sev_es_save_area structure for SNP.

> u64 last_excp_to;
> u8 reserved_0x298[72];
> u32 spec_ctrl; /* Guest version of SPEC_CTRL at 0x2E0 */
> + u8 reserved_7b[4];
> + u32 pkru;
> + u8 reserved_7a[20];
> + u64 reserved_8; /* rax already available at 0x01f8 */
> + u64 rcx;
> + u64 rdx;
> + u64 rbx;
> + u64 reserved_9; /* rsp already available at 0x01d8 */
> + u64 rbp;
> + u64 rsi;
> + u64 rdi;
> + u64 r8;
> + u64 r9;
> + u64 r10;
> + u64 r11;
> + u64 r12;
> + u64 r13;
> + u64 r14;
> + u64 r15;
> + u8 reserved_10[16];
> + u64 sw_exit_code;
> + u64 sw_exit_info_1;
> + u64 sw_exit_info_2;
> + u64 sw_scratch;
> + union {
> + u64 sev_features;
> + struct {
> + u64 sev_feature_snp : 1;
> + u64 sev_feature_vtom : 1;
> + u64 sev_feature_reflectvc : 1;
> + u64 sev_feature_restrict_injection : 1;
> + u64 sev_feature_alternate_injection : 1;
> + u64 sev_feature_full_debug : 1;
> + u64 sev_feature_reserved1 : 1;
> + u64 sev_feature_snpbtb_isolation : 1;
> + u64 sev_feature_resrved2 : 56;

For the bits definition, use:

u64 sev_feature_snp : 1,
sev_feature_vtom : 1,
sev_feature_reflectvc : 1,
...

Thanks,
Tom

> + };
> + };
> + u64 vintr_ctrl;
> + u64 guest_error_code;
> + u64 virtual_tom;
> + u64 tlb_id;
> + u64 pcpu_id;
> + u64 event_inject;
> + u64 xcr0;
> + u8 valid_bitmap[16];
> + u64 x87_state_gpa;
> } __packed;
>
> /* Save area definition for SEV-ES and SEV-SNP guests */
> diff --git a/arch/x86/kernel/cpu/mshyperv.c b/arch/x86/kernel/cpu/mshyperv.c
> index 197c8f2ec4eb..9d547751a1a7 100644
> --- a/arch/x86/kernel/cpu/mshyperv.c
> +++ b/arch/x86/kernel/cpu/mshyperv.c
> @@ -39,6 +39,13 @@
> #include <asm/realmode.h>
> #include <asm/e820/api.h>
>
> +/*
> + * DEFAULT INIT GPAT and SEGMENT LIMIT value in struct VMSA
> + * to start AP in enlightened SEV guest.
> + */
> +#define HV_AP_INIT_GPAT_DEFAULT 0x0007040600070406ULL
> +#define HV_AP_SEGMENT_LIMIT 0xffffffff
> +
> /* Is Linux running as the root partition? */
> bool hv_root_partition;
> struct ms_hyperv_info ms_hyperv;
> @@ -230,6 +237,94 @@ static void __init hv_smp_prepare_boot_cpu(void)
> #endif
> }
>
> +static u8 ap_start_input_arg[PAGE_SIZE] __bss_decrypted __aligned(PAGE_SIZE);
> +static u8 ap_start_stack[PAGE_SIZE] __aligned(PAGE_SIZE);
> +
> +#define hv_populate_vmcb_seg(seg, gdtr_base) \
> +do { \
> + if (seg.selector) { \
> + seg.base = 0; \
> + seg.limit = HV_AP_SEGMENT_LIMIT; \
> + seg.attrib = *(u16 *)(gdtr_base + seg.selector + 5); \
> + seg.attrib = (seg.attrib & 0xFF) | ((seg.attrib >> 4) & 0xF00); \
> + } \
> +} while (0) \
> +
> +int hv_snp_boot_ap(int cpu, unsigned long start_ip)
> +{
> + struct vmcb_save_area *vmsa = (struct vmcb_save_area *)
> + __get_free_page(GFP_KERNEL | __GFP_ZERO);
> + struct desc_ptr gdtr;
> + u64 ret, retry = 5;
> + struct hv_start_virtual_processor_input *start_vp_input;
> + union sev_rmp_adjust rmp_adjust;
> + unsigned long flags;
> +
> + native_store_gdt(&gdtr);
> +
> + vmsa->gdtr.base = gdtr.address;
> + vmsa->gdtr.limit = gdtr.size;
> +
> + asm volatile("movl %%es, %%eax;" : "=a" (vmsa->es.selector));
> + hv_populate_vmcb_seg(vmsa->es, vmsa->gdtr.base);
> +
> + asm volatile("movl %%cs, %%eax;" : "=a" (vmsa->cs.selector));
> + hv_populate_vmcb_seg(vmsa->cs, vmsa->gdtr.base);
> +
> + asm volatile("movl %%ss, %%eax;" : "=a" (vmsa->ss.selector));
> + hv_populate_vmcb_seg(vmsa->ss, vmsa->gdtr.base);
> +
> + asm volatile("movl %%ds, %%eax;" : "=a" (vmsa->ds.selector));
> + hv_populate_vmcb_seg(vmsa->ds, vmsa->gdtr.base);
> +
> + vmsa->efer = native_read_msr(MSR_EFER);
> +
> + asm volatile("movq %%cr4, %%rax;" : "=a" (vmsa->cr4));
> + asm volatile("movq %%cr3, %%rax;" : "=a" (vmsa->cr3));
> + asm volatile("movq %%cr0, %%rax;" : "=a" (vmsa->cr0));
> +
> + vmsa->xcr0 = 1;
> + vmsa->g_pat = HV_AP_INIT_GPAT_DEFAULT;
> + vmsa->rip = (u64)secondary_startup_64_no_verify;
> + vmsa->rsp = (u64)&ap_start_stack[PAGE_SIZE];
> +
> + vmsa->sev_feature_snp = 1;
> + vmsa->sev_feature_restrict_injection = 1;
> +
> + rmp_adjust.as_uint64 = 0;
> + rmp_adjust.target_vmpl = 1;
> + rmp_adjust.vmsa = 1;
> + ret = rmpadjust((unsigned long)vmsa, RMP_PG_SIZE_4K,
> + rmp_adjust.as_uint64);
> + if (ret != 0) {
> + pr_err("RMPADJUST(%llx) failed: %llx\n", (u64)vmsa, ret);
> + return ret;
> + }
> +
> + local_irq_save(flags);
> + start_vp_input =
> + (struct hv_start_virtual_processor_input *)ap_start_input_arg;
> + memset(start_vp_input, 0, sizeof(*start_vp_input));
> + start_vp_input->partitionid = -1;
> + start_vp_input->vpindex = cpu;
> + start_vp_input->targetvtl = ms_hyperv.vtl;
> + *(u64 *)&start_vp_input->context[0] = __pa(vmsa) | 1;
> +
> + do {
> + ret = hv_do_hypercall(HVCALL_START_VIRTUAL_PROCESSOR,
> + start_vp_input, NULL);
> + } while (hv_result(ret) == HV_STATUS_TIME_OUT && retry--);
> +
> + if (!hv_result_success(ret)) {
> + pr_err("HvCallStartVirtualProcessor failed: %llx\n", ret);
> + goto done;
> + }
> +
> +done:
> + local_irq_restore(flags);
> + return ret;
> +}
> +
> static void __init hv_smp_prepare_cpus(unsigned int max_cpus)
> {
> #ifdef CONFIG_X86_64
> @@ -239,6 +334,16 @@ static void __init hv_smp_prepare_cpus(unsigned int max_cpus)
>
> native_smp_prepare_cpus(max_cpus);
>
> + /*
> + * Override wakeup_secondary_cpu callback for SEV-SNP
> + * enlightened guest.
> + */
> + if (hv_isolation_type_en_snp())
> + apic->wakeup_secondary_cpu = hv_snp_boot_ap;
> +
> + if (!hv_root_partition)
> + return;
> +
> #ifdef CONFIG_X86_64
> for_each_present_cpu(i) {
> if (i == 0)
> @@ -475,8 +580,7 @@ static void __init ms_hyperv_init_platform(void)
>
> # ifdef CONFIG_SMP
> smp_ops.smp_prepare_boot_cpu = hv_smp_prepare_boot_cpu;
> - if (hv_root_partition)
> - smp_ops.smp_prepare_cpus = hv_smp_prepare_cpus;
> + smp_ops.smp_prepare_cpus = hv_smp_prepare_cpus;
> # endif
>
> /*
> @@ -501,7 +605,7 @@ static void __init ms_hyperv_init_platform(void)
> if (!(ms_hyperv.features & HV_ACCESS_TSC_INVARIANT))
> mark_tsc_unstable("running on Hyper-V");
>
> - if (isolation_type_en_snp()) {
> + if (hv_isolation_type_en_snp()) {
> /*
> * Hyper-V enlightened snp guest boots kernel
> * directly without bootloader and so roms,
> @@ -511,7 +615,7 @@ static void __init ms_hyperv_init_platform(void)
> x86_platform.legacy.rtc = 0;
> x86_platform.set_wallclock = set_rtc_noop;
> x86_platform.get_wallclock = get_rtc_noop;
> - x86_platform.legacy.reserve_bios_regions = x86_init_noop;
> + x86_platform.legacy.reserve_bios_regions = 0;
> x86_init.resources.probe_roms = x86_init_noop;
> x86_init.resources.reserve_resources = x86_init_noop;
> x86_init.mpparse.find_smp_config = x86_init_noop;
> diff --git a/include/asm-generic/hyperv-tlfs.h b/include/asm-generic/hyperv-tlfs.h
> index c1cc3ec36ad5..3d7c67be9f56 100644
> --- a/include/asm-generic/hyperv-tlfs.h
> +++ b/include/asm-generic/hyperv-tlfs.h
> @@ -148,6 +148,7 @@ union hv_reference_tsc_msr {
> #define HVCALL_FLUSH_VIRTUAL_ADDRESS_LIST 0x0003
> #define HVCALL_NOTIFY_LONG_SPIN_WAIT 0x0008
> #define HVCALL_SEND_IPI 0x000b
> +#define HVCALL_ENABLE_VP_VTL 0x000f
> #define HVCALL_FLUSH_VIRTUAL_ADDRESS_SPACE_EX 0x0013
> #define HVCALL_FLUSH_VIRTUAL_ADDRESS_LIST_EX 0x0014
> #define HVCALL_SEND_IPI_EX 0x0015
> @@ -165,6 +166,7 @@ union hv_reference_tsc_msr {
> #define HVCALL_MAP_DEVICE_INTERRUPT 0x007c
> #define HVCALL_UNMAP_DEVICE_INTERRUPT 0x007d
> #define HVCALL_RETARGET_INTERRUPT 0x007e
> +#define HVCALL_START_VIRTUAL_PROCESSOR 0x0099
> #define HVCALL_FLUSH_GUEST_PHYSICAL_ADDRESS_SPACE 0x00af
> #define HVCALL_FLUSH_GUEST_PHYSICAL_ADDRESS_LIST 0x00b0
> #define HVCALL_MODIFY_SPARSE_GPA_PAGE_HOST_VISIBILITY 0x00db
> @@ -219,6 +221,7 @@ enum HV_GENERIC_SET_FORMAT {
> #define HV_STATUS_INVALID_PORT_ID 17
> #define HV_STATUS_INVALID_CONNECTION_ID 18
> #define HV_STATUS_INSUFFICIENT_BUFFERS 19
> +#define HV_STATUS_TIME_OUT 0x78
>
> /*
> * The Hyper-V TimeRefCount register and the TSC
> @@ -778,6 +781,22 @@ struct hv_input_unmap_device_interrupt {
> struct hv_interrupt_entry interrupt_entry;
> } __packed;
>
> +struct hv_enable_vp_vtl_input {
> + u64 partitionid;
> + u32 vpindex;
> + u8 targetvtl;
> + u8 padding[3];
> + u8 context[0xe0];
> +} __packed;
> +
> +struct hv_start_virtual_processor_input {
> + u64 partitionid;
> + u32 vpindex;
> + u8 targetvtl;
> + u8 padding[3];
> + u8 context[0xe0];
> +} __packed;
> +
> #define HV_SOURCE_SHADOW_NONE 0x0
> #define HV_SOURCE_SHADOW_BRIDGE_BUS_RANGE 0x1
>

2023-01-31 17:59:06

by Michael Kelley (LINUX)

Subject: RE: [RFC PATCH V3 06/16] x86/hyperv: decrypt vmbus pages for sev-snp enlightened guest

From: Tianyu Lan <[email protected]> Sent: Saturday, January 21, 2023 6:46 PM
>

As I commented on v2 of this patch, the Subject prefix should be
"Drivers: hv: vmbus:"

> Vmbus post msg, synic event and message pages are shared

s/Vmbus/VMBus/

We're trying to be consistent about the capitalization of "VMBus"
in comments and other text. :-)

> with hypervisor and so decrypt these pages in the sev-snp guest.
>
> Signed-off-by: Tianyu Lan <[email protected]>
> ---
> Change since RFC V2:
> * Fix error in the error code path and encrypt
> pages correctly when decryption failure happens.
> ---
> drivers/hv/hv.c | 33 ++++++++++++++++++++++++++++++++-
> 1 file changed, 32 insertions(+), 1 deletion(-)
>
> diff --git a/drivers/hv/hv.c b/drivers/hv/hv.c
> index 410e6c4e80d3..52edc54c8172 100644
> --- a/drivers/hv/hv.c
> +++ b/drivers/hv/hv.c
> @@ -20,6 +20,7 @@
> #include <linux/interrupt.h>
> #include <clocksource/hyperv_timer.h>
> #include <asm/mshyperv.h>
> +#include <linux/set_memory.h>
> #include "hyperv_vmbus.h"
>
> /* The one and only */
> @@ -117,7 +118,7 @@ int hv_post_message(union hv_connection_id connection_id,
>
> int hv_synic_alloc(void)
> {
> - int cpu;
> + int cpu, ret;
> struct hv_per_cpu_context *hv_cpu;
>
> /*
> @@ -168,9 +169,39 @@ int hv_synic_alloc(void)
> pr_err("Unable to allocate post msg page\n");
> goto err;
> }
> +
> + if (hv_isolation_type_en_snp()) {
> + ret = set_memory_decrypted((unsigned long)
> + hv_cpu->synic_message_page, 1);
> + if (ret)
> + goto err;
> +
> + ret = set_memory_decrypted((unsigned long)
> + hv_cpu->synic_event_page, 1);
> + if (ret)
> + goto err_decrypt_event_page;
> +
> + ret = set_memory_decrypted((unsigned long)
> + hv_cpu->post_msg_page, 1);
> + if (ret)
> + goto err_decrypt_msg_page;
> +
> + memset(hv_cpu->synic_message_page, 0, PAGE_SIZE);
> + memset(hv_cpu->synic_event_page, 0, PAGE_SIZE);
> + memset(hv_cpu->post_msg_page, 0, PAGE_SIZE);
> + }

Having decrypted the pages here in hv_synic_alloc(), shouldn't
there be corresponding re-encryption in hv_synic_free()?
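
Something along these lines, mirroring the decryption above (a
sketch; assumes the same hv_isolation_type_en_snp() gate and that
it runs before the pages are freed):

	if (hv_isolation_type_en_snp()) {
		set_memory_encrypted((unsigned long)
				     hv_cpu->synic_message_page, 1);
		set_memory_encrypted((unsigned long)
				     hv_cpu->synic_event_page, 1);
		set_memory_encrypted((unsigned long)
				     hv_cpu->post_msg_page, 1);
	}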

> }
>
> return 0;
> +
> +err_decrypt_msg_page:
> + set_memory_encrypted((unsigned long)
> + hv_cpu->synic_event_page, 1);
> +
> +err_decrypt_event_page:
> + set_memory_encrypted((unsigned long)
> + hv_cpu->synic_message_page, 1);
> +
> err:
> /*
> * Any memory allocations that succeeded will be freed when
> --
> 2.25.1


2023-01-31 18:35:08

by Michael Kelley (LINUX)

Subject: RE: [RFC PATCH V3 10/16] x86/hyperv: Add smp support for sev-snp guest

From: Tianyu Lan <[email protected]> Sent: Saturday, January 21, 2023 6:46 PM
>
> The wakeup_secondary_cpu callback was populated with
> wakeup_cpu_via_vmgexit(), which doesn't work for Hyper-V. Override it
> with a Hyper-V specific hook which uses the
> HVCALL_START_VIRTUAL_PROCESSOR hypercall to start the AP with the
> VMSA data structure.
>
> Signed-off-by: Tianyu Lan <[email protected]>
> ---
> Change since RFC v2:
> * Add helper function to initialize segment
> * Fix some coding style
> ---
> arch/x86/include/asm/mshyperv.h | 2 +
> arch/x86/include/asm/sev.h | 13 ++++
> arch/x86/include/asm/svm.h | 47 +++++++++++++
> arch/x86/kernel/cpu/mshyperv.c | 112 ++++++++++++++++++++++++++++--
> include/asm-generic/hyperv-tlfs.h | 19 +++++
> 5 files changed, 189 insertions(+), 4 deletions(-)
>
> diff --git a/arch/x86/include/asm/mshyperv.h b/arch/x86/include/asm/mshyperv.h
> index 7266d71d30d6..c69051eec0e1 100644
> --- a/arch/x86/include/asm/mshyperv.h
> +++ b/arch/x86/include/asm/mshyperv.h
> @@ -203,6 +203,8 @@ struct irq_domain *hv_create_pci_msi_domain(void);
> int hv_map_ioapic_interrupt(int ioapic_id, bool level, int vcpu, int vector,
> struct hv_interrupt_entry *entry);
> int hv_unmap_ioapic_interrupt(int ioapic_id, struct hv_interrupt_entry *entry);
> +int hv_set_mem_host_visibility(unsigned long addr, int numpages, bool visible);
> +int hv_snp_boot_ap(int cpu, unsigned long start_ip);
>
> #ifdef CONFIG_AMD_MEM_ENCRYPT
> void hv_ghcb_msr_write(u64 msr, u64 value);
> diff --git a/arch/x86/include/asm/sev.h b/arch/x86/include/asm/sev.h
> index ebc271bb6d8e..e34aaf730220 100644
> --- a/arch/x86/include/asm/sev.h
> +++ b/arch/x86/include/asm/sev.h
> @@ -86,6 +86,19 @@ extern bool handle_vc_boot_ghcb(struct pt_regs *regs);
>
> #define RMPADJUST_VMSA_PAGE_BIT BIT(16)
>
> +union sev_rmp_adjust {
> + u64 as_uint64;
> + struct {
> + unsigned long target_vmpl : 8;
> + unsigned long enable_read : 1;
> + unsigned long enable_write : 1;
> + unsigned long enable_user_execute : 1;
> + unsigned long enable_kernel_execute : 1;
> + unsigned long reserved1 : 4;
> + unsigned long vmsa : 1;
> + };
> +};
> +
> /* SNP Guest message request */
> struct snp_req_data {
> unsigned long req_gpa;
> diff --git a/arch/x86/include/asm/svm.h b/arch/x86/include/asm/svm.h
> index cb1ee53ad3b1..f8b321a11ee4 100644
> --- a/arch/x86/include/asm/svm.h
> +++ b/arch/x86/include/asm/svm.h
> @@ -336,6 +336,53 @@ struct vmcb_save_area {
> u64 last_excp_to;
> u8 reserved_0x298[72];
> u32 spec_ctrl; /* Guest version of SPEC_CTRL at 0x2E0 */
> + u8 reserved_7b[4];
> + u32 pkru;
> + u8 reserved_7a[20];
> + u64 reserved_8; /* rax already available at 0x01f8 */
> + u64 rcx;
> + u64 rdx;
> + u64 rbx;
> + u64 reserved_9; /* rsp already available at 0x01d8 */
> + u64 rbp;
> + u64 rsi;
> + u64 rdi;
> + u64 r8;
> + u64 r9;
> + u64 r10;
> + u64 r11;
> + u64 r12;
> + u64 r13;
> + u64 r14;
> + u64 r15;
> + u8 reserved_10[16];
> + u64 sw_exit_code;
> + u64 sw_exit_info_1;
> + u64 sw_exit_info_2;
> + u64 sw_scratch;
> + union {
> + u64 sev_features;
> + struct {
> + u64 sev_feature_snp : 1;
> + u64 sev_feature_vtom : 1;
> + u64 sev_feature_reflectvc : 1;
> + u64 sev_feature_restrict_injection : 1;
> + u64 sev_feature_alternate_injection : 1;
> + u64 sev_feature_full_debug : 1;
> + u64 sev_feature_reserved1 : 1;
> + u64 sev_feature_snpbtb_isolation : 1;
> + u64 sev_feature_resrved2 : 56;
> + };
> + };
> + u64 vintr_ctrl;
> + u64 guest_error_code;
> + u64 virtual_tom;
> + u64 tlb_id;
> + u64 pcpu_id;
> + u64 event_inject;
> + u64 xcr0;
> + u8 valid_bitmap[16];
> + u64 x87_state_gpa;
> } __packed;
>
> /* Save area definition for SEV-ES and SEV-SNP guests */
> diff --git a/arch/x86/kernel/cpu/mshyperv.c b/arch/x86/kernel/cpu/mshyperv.c
> index 197c8f2ec4eb..9d547751a1a7 100644
> --- a/arch/x86/kernel/cpu/mshyperv.c
> +++ b/arch/x86/kernel/cpu/mshyperv.c
> @@ -39,6 +39,13 @@
> #include <asm/realmode.h>
> #include <asm/e820/api.h>
>
> +/*
> + * DEFAULT INIT GPAT and SEGMENT LIMIT value in struct VMSA
> + * to start AP in enlightened SEV guest.
> + */
> +#define HV_AP_INIT_GPAT_DEFAULT 0x0007040600070406ULL
> +#define HV_AP_SEGMENT_LIMIT 0xffffffff

If these values are defined by Hyper-V, they should probably go in
hyperv-tlfs.h.
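
For illustration, the relocation would be trivial; a sketch, assuming
the values really are defined by the Hyper-V TLFS (the placement below
is the reviewer's suggestion, not what the posted patch does):

	/* In hyperv-tlfs.h (hypothetical placement, per the review comment) */
	#define HV_AP_INIT_GPAT_DEFAULT	0x0007040600070406ULL
	#define HV_AP_SEGMENT_LIMIT	0xffffffff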

> +
> /* Is Linux running as the root partition? */
> bool hv_root_partition;
> struct ms_hyperv_info ms_hyperv;
> @@ -230,6 +237,94 @@ static void __init hv_smp_prepare_boot_cpu(void)
> #endif
> }
>
> +static u8 ap_start_input_arg[PAGE_SIZE] __bss_decrypted __aligned(PAGE_SIZE);
> +static u8 ap_start_stack[PAGE_SIZE] __aligned(PAGE_SIZE);
> +
> +#define hv_populate_vmcb_seg(seg, gdtr_base) \
> +do { \
> + if (seg.selector) { \
> + seg.base = 0; \
> + seg.limit = HV_AP_SEGMENT_LIMIT; \
> + seg.attrib = *(u16 *)(gdtr_base + seg.selector + 5); \
> + seg.attrib = (seg.attrib & 0xFF) | ((seg.attrib >> 4) & 0xF00); \
> + } \
> +} while (0) \
> +
> +int hv_snp_boot_ap(int cpu, unsigned long start_ip)
> +{
> + struct vmcb_save_area *vmsa = (struct vmcb_save_area *)
> + __get_free_page(GFP_KERNEL | __GFP_ZERO);
> + struct desc_ptr gdtr;
> + u64 ret, retry = 5;
> + struct hv_start_virtual_processor_input *start_vp_input;
> + union sev_rmp_adjust rmp_adjust;
> + unsigned long flags;
> +
> + native_store_gdt(&gdtr);
> +
> + vmsa->gdtr.base = gdtr.address;
> + vmsa->gdtr.limit = gdtr.size;
> +
> + asm volatile("movl %%es, %%eax;" : "=a" (vmsa->es.selector));
> + hv_populate_vmcb_seg(vmsa->es, vmsa->gdtr.base);
> +
> + asm volatile("movl %%cs, %%eax;" : "=a" (vmsa->cs.selector));
> + hv_populate_vmcb_seg(vmsa->cs, vmsa->gdtr.base);
> +
> + asm volatile("movl %%ss, %%eax;" : "=a" (vmsa->ss.selector));
> + hv_populate_vmcb_seg(vmsa->ss, vmsa->gdtr.base);
> +
> + asm volatile("movl %%ds, %%eax;" : "=a" (vmsa->ds.selector));
> + hv_populate_vmcb_seg(vmsa->ds, vmsa->gdtr.base);
> +
> + vmsa->efer = native_read_msr(MSR_EFER);
> +
> + asm volatile("movq %%cr4, %%rax;" : "=a" (vmsa->cr4));
> + asm volatile("movq %%cr3, %%rax;" : "=a" (vmsa->cr3));
> + asm volatile("movq %%cr0, %%rax;" : "=a" (vmsa->cr0));
> +
> + vmsa->xcr0 = 1;
> + vmsa->g_pat = HV_AP_INIT_GPAT_DEFAULT;
> + vmsa->rip = (u64)secondary_startup_64_no_verify;
> + vmsa->rsp = (u64)&ap_start_stack[PAGE_SIZE];
> +
> + vmsa->sev_feature_snp = 1;
> + vmsa->sev_feature_restrict_injection = 1;
> +
> + rmp_adjust.as_uint64 = 0;
> + rmp_adjust.target_vmpl = 1;
> + rmp_adjust.vmsa = 1;
> + ret = rmpadjust((unsigned long)vmsa, RMP_PG_SIZE_4K,
> + rmp_adjust.as_uint64);
> + if (ret != 0) {
> + pr_err("RMPADJUST(%llx) failed: %llx\n", (u64)vmsa, ret);
> + return ret;
> + }
> +
> + local_irq_save(flags);
> + start_vp_input =
> + (struct hv_start_virtual_processor_input *)ap_start_input_arg;
> + memset(start_vp_input, 0, sizeof(*start_vp_input));
> + start_vp_input->partitionid = -1;
> + start_vp_input->vpindex = cpu;
> + start_vp_input->targetvtl = ms_hyperv.vtl;
> + *(u64 *)&start_vp_input->context[0] = __pa(vmsa) | 1;
> +
> + do {
> + ret = hv_do_hypercall(HVCALL_START_VIRTUAL_PROCESSOR,
> + start_vp_input, NULL);
> + } while (hv_result(ret) == HV_STATUS_TIME_OUT && retry--);
> +
> + if (!hv_result_success(ret)) {
> + pr_err("HvCallStartVirtualProcessor failed: %llx\n", ret);
> + goto done;
> + }
> +
> +done:
> + local_irq_restore(flags);
> + return ret;
> +}
> +

Like a comment in an earlier patch, I'm wondering if the bulk of
this code could move to ivm.c, to avoid overloading mshyperv.c.

> static void __init hv_smp_prepare_cpus(unsigned int max_cpus)
> {
> #ifdef CONFIG_X86_64
> @@ -239,6 +334,16 @@ static void __init hv_smp_prepare_cpus(unsigned int max_cpus)
>
> native_smp_prepare_cpus(max_cpus);
>
> + /*
> + * Override wakeup_secondary_cpu callback for SEV-SNP
> + * enlightened guest.
> + */
> + if (hv_isolation_type_en_snp())
> + apic->wakeup_secondary_cpu = hv_snp_boot_ap;
> +
> + if (!hv_root_partition)
> + return;
> +
> #ifdef CONFIG_X86_64
> for_each_present_cpu(i) {
> if (i == 0)
> @@ -475,8 +580,7 @@ static void __init ms_hyperv_init_platform(void)
>
> # ifdef CONFIG_SMP
> smp_ops.smp_prepare_boot_cpu = hv_smp_prepare_boot_cpu;
> - if (hv_root_partition)
> - smp_ops.smp_prepare_cpus = hv_smp_prepare_cpus;
> + smp_ops.smp_prepare_cpus = hv_smp_prepare_cpus;
> # endif
>
> /*
> @@ -501,7 +605,7 @@ static void __init ms_hyperv_init_platform(void)
> if (!(ms_hyperv.features & HV_ACCESS_TSC_INVARIANT))
> mark_tsc_unstable("running on Hyper-V");
>
> - if (isolation_type_en_snp()) {
> + if (hv_isolation_type_en_snp()) {

Also a bug fix to an earlier patch in this series.

> /*
> * Hyper-V enlightened snp guest boots kernel
> * directly without bootloader and so roms,
> @@ -511,7 +615,7 @@ static void __init ms_hyperv_init_platform(void)
> x86_platform.legacy.rtc = 0;
> x86_platform.set_wallclock = set_rtc_noop;
> x86_platform.get_wallclock = get_rtc_noop;
> - x86_platform.legacy.reserve_bios_regions = x86_init_noop;
> + x86_platform.legacy.reserve_bios_regions = 0;

This looks like a bug fix to Patch 8 of the series. It should be fixed
in patch 8.

> x86_init.resources.probe_roms = x86_init_noop;
> x86_init.resources.reserve_resources = x86_init_noop;
> x86_init.mpparse.find_smp_config = x86_init_noop;
> diff --git a/include/asm-generic/hyperv-tlfs.h b/include/asm-generic/hyperv-tlfs.h
> index c1cc3ec36ad5..3d7c67be9f56 100644
> --- a/include/asm-generic/hyperv-tlfs.h
> +++ b/include/asm-generic/hyperv-tlfs.h
> @@ -148,6 +148,7 @@ union hv_reference_tsc_msr {
> #define HVCALL_FLUSH_VIRTUAL_ADDRESS_LIST 0x0003
> #define HVCALL_NOTIFY_LONG_SPIN_WAIT 0x0008
> #define HVCALL_SEND_IPI 0x000b
> +#define HVCALL_ENABLE_VP_VTL 0x000f
> #define HVCALL_FLUSH_VIRTUAL_ADDRESS_SPACE_EX 0x0013
> #define HVCALL_FLUSH_VIRTUAL_ADDRESS_LIST_EX 0x0014
> #define HVCALL_SEND_IPI_EX 0x0015
> @@ -165,6 +166,7 @@ union hv_reference_tsc_msr {
> #define HVCALL_MAP_DEVICE_INTERRUPT 0x007c
> #define HVCALL_UNMAP_DEVICE_INTERRUPT 0x007d
> #define HVCALL_RETARGET_INTERRUPT 0x007e
> +#define HVCALL_START_VIRTUAL_PROCESSOR 0x0099
> #define HVCALL_FLUSH_GUEST_PHYSICAL_ADDRESS_SPACE 0x00af
> #define HVCALL_FLUSH_GUEST_PHYSICAL_ADDRESS_LIST 0x00b0
> #define HVCALL_MODIFY_SPARSE_GPA_PAGE_HOST_VISIBILITY 0x00db
> @@ -219,6 +221,7 @@ enum HV_GENERIC_SET_FORMAT {
> #define HV_STATUS_INVALID_PORT_ID 17
> #define HV_STATUS_INVALID_CONNECTION_ID 18
> #define HV_STATUS_INSUFFICIENT_BUFFERS 19
> +#define HV_STATUS_TIME_OUT 0x78
>
> /*
> * The Hyper-V TimeRefCount register and the TSC
> @@ -778,6 +781,22 @@ struct hv_input_unmap_device_interrupt {
> struct hv_interrupt_entry interrupt_entry;
> } __packed;
>
> +struct hv_enable_vp_vtl_input {
> + u64 partitionid;
> + u32 vpindex;
> + u8 targetvtl;
> + u8 padding[3];
> + u8 context[0xe0];
> +} __packed;
> +
> +struct hv_start_virtual_processor_input {
> + u64 partitionid;
> + u32 vpindex;
> + u8 targetvtl;
> + u8 padding[3];
> + u8 context[0xe0];
> +} __packed;
> +
> #define HV_SOURCE_SHADOW_NONE 0x0
> #define HV_SOURCE_SHADOW_BRIDGE_BUS_RANGE 0x1
>
> --
> 2.25.1


2023-02-02 23:01:12

by Zhi Wang

[permalink] [raw]
Subject: Re: [RFC PATCH V3 00/16] x86/hyperv/sev: Add AMD sev-snp enlightened guest support on hyperv

On Sat, 21 Jan 2023 21:45:50 -0500
Tianyu Lan <[email protected]> wrote:

1) I am wondering if it is a good time to organize a common code path for
enlightened VMs on Hyper-V.

Wouldn't it be better to have a common flag for enlightened VMs?
Like bool hv_isolation_type_enlightened()

Many of the operations, like the decryption of the post msg page..., are
also required in the enlightened TDX guest; they are not AMD-specific.

Then in the "TDX guest on hyper-V" patch set, Dexuan can save some LOCs instead
of ending up with if (hv_isolation_type_en_snp() ||
hv_isolation_type_en_tdx())...
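
A rough sketch of the suggested helper; hv_isolation_type_en_tdx() is
assumed from the separate TDX series and does not exist in this one:

	/*
	 * Hypothetical common test for enlightened confidential VMs on
	 * Hyper-V; both callees are assumptions here.
	 */
	static inline bool hv_isolation_type_enlightened(void)
	{
		return hv_isolation_type_en_snp() || hv_isolation_type_en_tdx();
	}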

2) It seems the AMD SEV-SNP enlightened guest on hyper-V is implemented as
CC_VENDOR_AMD, while TDX enlightened guest is still implemented as
CC_VENDOR_HYPERV. I am curious about the reason.

> [...]


2023-02-02 23:20:44

by Zhi Wang

[permalink] [raw]
Subject: Re: [RFC PATCH V3 16/16] x86/sev: Fix interrupt exit code paths from #HV exception

On Sat, 21 Jan 2023 21:46:06 -0500
Tianyu Lan <[email protected]> wrote:

> From: Ashish Kalra <[email protected]>
>
> Add checks in the interrupt exit code paths for returns to user
> mode: if we are currently executing the #HV handler, don't follow the
> irqentry_exit_to_user_mode path, as that can potentially cause the
> #HV handler to be preempted and rescheduled on another CPU. An #HV
> handler rescheduled on another cpu will cause interrupts to be handled
> on a different cpu than the injected one, causing invalid EOIs and
> missed/lost guest interrupts and corresponding hangs and/or per-cpu
> IRQs handled on a non-intended cpu.
>

Why doesn't this problem happen in the #VC handler? The #VC handler
doesn't have this special handling.

> Signed-off-by: Ashish Kalra <[email protected]>
> ---
> arch/x86/include/asm/idtentry.h | 66 +++++++++++++++++++++++++++++++++
> arch/x86/kernel/sev.c | 30 +++++++++++++++
> 2 files changed, 96 insertions(+)
>
> diff --git a/arch/x86/include/asm/idtentry.h b/arch/x86/include/asm/idtentry.h
> index 652fea10d377..45b47132be7c 100644
> --- a/arch/x86/include/asm/idtentry.h
> +++ b/arch/x86/include/asm/idtentry.h
> @@ -13,6 +13,10 @@
>
> #include <asm/irq_stack.h>
>
> +#ifdef CONFIG_AMD_MEM_ENCRYPT
> +noinstr void irqentry_exit_hv_cond(struct pt_regs *regs, irqentry_state_t state);
> +#endif
> +
> /**
> * DECLARE_IDTENTRY - Declare functions for simple IDT entry points
> * No error code pushed by hardware
> @@ -176,6 +180,7 @@ __visible noinstr void func(struct pt_regs *regs, unsigned long error_code)
> #define DECLARE_IDTENTRY_IRQ(vector, func) \
> DECLARE_IDTENTRY_ERRORCODE(vector, func)
>
> +#ifndef CONFIG_AMD_MEM_ENCRYPT
> /**
> * DEFINE_IDTENTRY_IRQ - Emit code for device interrupt IDT entry points
> * @func: Function name of the entry point
> @@ -205,6 +210,26 @@ __visible noinstr void func(struct pt_regs *regs, \
> } \
> \
> static noinline void __##func(struct pt_regs *regs, u32 vector)
> +#else
> +
> +#define DEFINE_IDTENTRY_IRQ(func) \
> +static void __##func(struct pt_regs *regs, u32 vector); \
> + \
> +__visible noinstr void func(struct pt_regs *regs, \
> + unsigned long error_code) \
> +{ \
> + irqentry_state_t state = irqentry_enter(regs); \
> + u32 vector = (u32)(u8)error_code; \
> + \
> + instrumentation_begin(); \
> + kvm_set_cpu_l1tf_flush_l1d(); \
> + run_irq_on_irqstack_cond(__##func, regs, vector); \
> + instrumentation_end(); \
> + irqentry_exit_hv_cond(regs, state); \
> +} \
> + \
> +static noinline void __##func(struct pt_regs *regs, u32 vector)
> +#endif
>
> /**
> * DECLARE_IDTENTRY_SYSVEC - Declare functions for system vector entry points
> @@ -221,6 +246,7 @@ static noinline void __##func(struct pt_regs *regs, u32 vector)
> #define DECLARE_IDTENTRY_SYSVEC(vector, func) \
> DECLARE_IDTENTRY(vector, func)
>
> +#ifndef CONFIG_AMD_MEM_ENCRYPT
> /**
> * DEFINE_IDTENTRY_SYSVEC - Emit code for system vector IDT entry points
> * @func: Function name of the entry point
> @@ -245,6 +271,26 @@ __visible noinstr void func(struct pt_regs *regs) \
> } \
> \
> static noinline void __##func(struct pt_regs *regs)
> +#else
> +
> +#define DEFINE_IDTENTRY_SYSVEC(func) \
> +static void __##func(struct pt_regs *regs); \
> + \
> +__visible noinstr void func(struct pt_regs *regs) \
> +{ \
> + irqentry_state_t state = irqentry_enter(regs); \
> + \
> + instrumentation_begin(); \
> + kvm_set_cpu_l1tf_flush_l1d(); \
> + run_sysvec_on_irqstack_cond(__##func, regs); \
> + instrumentation_end(); \
> + irqentry_exit_hv_cond(regs, state); \
> +} \
> + \
> +static noinline void __##func(struct pt_regs *regs)
> +#endif
> +
> +#ifndef CONFIG_AMD_MEM_ENCRYPT
>
> /**
> * DEFINE_IDTENTRY_SYSVEC_SIMPLE - Emit code for simple system vector IDT
> @@ -274,6 +320,26 @@ __visible noinstr void func(struct pt_regs *regs) \
> } \
> \
> static __always_inline void __##func(struct pt_regs *regs)
> +#else
> +
> +#define DEFINE_IDTENTRY_SYSVEC_SIMPLE(func) \
> +static __always_inline void __##func(struct pt_regs *regs); \
> + \
> +__visible noinstr void func(struct pt_regs *regs) \
> +{ \
> + irqentry_state_t state = irqentry_enter(regs); \
> + \
> + instrumentation_begin(); \
> + __irq_enter_raw(); \
> + kvm_set_cpu_l1tf_flush_l1d(); \
> + __##func(regs); \
> + __irq_exit_raw(); \
> + instrumentation_end(); \
> + irqentry_exit_hv_cond(regs, state); \
> +} \
> + \
> +static __always_inline void __##func(struct pt_regs *regs)
> +#endif
>
> /**
> * DECLARE_IDTENTRY_XENCB - Declare functions for XEN HV callback entry point
> diff --git a/arch/x86/kernel/sev.c b/arch/x86/kernel/sev.c
> index b1a98c2a52f8..23f15e95838b 100644
> --- a/arch/x86/kernel/sev.c
> +++ b/arch/x86/kernel/sev.c
> @@ -147,6 +147,10 @@ struct sev_hv_doorbell_page {
>
> struct sev_snp_runtime_data {
> struct sev_hv_doorbell_page hv_doorbell_page;
> + /*
> + * Indication that we are currently handling #HV events.
> + */
> + bool hv_handling_events;
> };
>
> static DEFINE_PER_CPU(struct sev_snp_runtime_data*, snp_runtime_data);
> @@ -200,6 +204,8 @@ static void do_exc_hv(struct pt_regs *regs)
> union hv_pending_events pending_events;
> u8 vector;
>
> + this_cpu_read(snp_runtime_data)->hv_handling_events = true;
> +
> while (sev_hv_pending()) {
> pending_events.events = xchg(
> &sev_snp_current_doorbell_page()->pending_events.events,
> @@ -234,6 +240,8 @@ static void do_exc_hv(struct pt_regs *regs)
> common_interrupt(regs, pending_events.vector);
> }
> }
> +
> + this_cpu_read(snp_runtime_data)->hv_handling_events = false;
> }
>
> static __always_inline bool on_vc_stack(struct pt_regs *regs)
> @@ -2529,3 +2537,25 @@ static int __init snp_init_platform_device(void)
> return 0;
> }
> device_initcall(snp_init_platform_device);
> +
> +noinstr void irqentry_exit_hv_cond(struct pt_regs *regs, irqentry_state_t state)
> +{
> + /*
> + * Check whether this returns to user mode, if so and if
> + * we are currently executing the #HV handler then we don't
> + * want to follow the irqentry_exit_to_user_mode path as
> + * that can potentially cause the #HV handler to be
> + * preempted and rescheduled on another CPU. Rescheduled #HV
> + * handler on another cpu will cause interrupts to be handled
> + * on a different cpu than the injected one, causing
> + * invalid EOIs and missed/lost guest interrupts and
> + * corresponding hangs and/or per-cpu IRQs handled on
> + * non-intended cpu.
> + */
> + if (user_mode(regs) &&
> + this_cpu_read(snp_runtime_data)->hv_handling_events)
> + return;
> +
> + /* follow normal interrupt return/exit path */
> + irqentry_exit(regs, state);
> +}


2023-02-03 04:04:25

by Michael Kelley (LINUX)

[permalink] [raw]
Subject: RE: [RFC PATCH V3 00/16] x86/hyperv/sev: Add AMD sev-snp enlightened guest support on hyperv

From: Zhi Wang <[email protected]> Sent: Thursday, February 2, 2023 3:01 PM
>
> On Sat, 21 Jan 2023 21:45:50 -0500
> Tianyu Lan <[email protected]> wrote:
>
> 1) I am wondering if it is a good time to organize a common code path for
> enlightened VMs on Hyper-V.
>
> Wouldn't it be better to have a common flag for enlightened VMs?
> Like bool hv_isolation_type_enlightened()
>
> Many of the operations, like the decryption of the post msg page..., are
> also required in the enlightened TDX guest; they are not AMD-specific.
>
> Then in the "TDX guest on hyper-V" patch set, Dexuan can save some LOCs instead
> of ending up with if (hv_isolation_type_en_snp() ||
> hv_isolation_type_en_tdx())...

I've had the same thought, and have briefly discussed the
idea with Dexuan and Tianyu. But there's some code coming
for a non-confidential VM scenario that hasn't yet been posted
upstream, and it adds yet more cases to consider. We were
thinking to wait a bit until all the cases were evident, and then
find the right simplification. If we try to do the simplification
now, we may need to do it again.

>
> 2) It seems the AMD SEV-SNP enlightened guest on hyper-V is implemented as
> CC_VENDOR_AMD, while TDX enlightened guest is still implemented as
> CC_VENDOR_HYPERV. I am curious about the reason.

Patch set [1] makes CC_VENDOR_HYPERV go away. Once that
happens, the TDX enlightened guest uses CC_VENDOR_INTEL.

Michael

[1] https://lore.kernel.org/linux-hyperv/[email protected]/T/#m4639d697e9a6619edfcdceffc1b0613a9016f601



>
> > [...]


2023-02-03 04:11:52

by Tianyu Lan

[permalink] [raw]
Subject: Re: [RFC PATCH V3 06/16] x86/hyperv: decrypt vmbus pages for sev-snp enlightened guest

On 2/1/2023 1:58 AM, Michael Kelley (LINUX) wrote:
>> +
>> + ret = set_memory_decrypted((unsigned long)
>> + hv_cpu->post_msg_page, 1);
>> + if (ret)
>> + goto err_decrypt_msg_page;
>> +
>> + memset(hv_cpu->synic_message_page, 0, PAGE_SIZE);
>> + memset(hv_cpu->synic_event_page, 0, PAGE_SIZE);
>> + memset(hv_cpu->post_msg_page, 0, PAGE_SIZE);
>> + }
> Having decrypted the pages here in hv_synic_alloc(), shouldn't
> there be corresponding re-encryption in hv_synic_free()?
>

Yes, I overlooked this in this version and will fix it.

Thanks.

2023-02-03 06:11:44

by Tianyu Lan

[permalink] [raw]
Subject: Re: [RFC PATCH V3 10/16] x86/hyperv: Add smp support for sev-snp guest

On 2/1/2023 2:34 AM, Michael Kelley (LINUX) wrote:
>> + pr_err("HvCallStartVirtualProcessor failed: %llx\n", ret);
>> + goto done;
>> + }
>> +
>> +done:
>> + local_irq_restore(flags);
>> + return ret;
>> +}
>> +
> Like a comment in an earlier patch, I'm wondering if the bulk of
> this code could move to ivm.c, to avoid overloading mshyperv.c.

Sure. Will update in the next version.
>

2023-02-03 07:01:00

by Tianyu Lan

[permalink] [raw]
Subject: Re: [RFC PATCH V3 10/16] x86/hyperv: Add smp support for sev-snp guest

On 1/23/2023 11:30 PM, Tom Lendacky wrote:
> On 1/21/23 20:46, Tianyu Lan wrote:
>> From: Tianyu Lan <[email protected]>
>>
>> The wakeup_secondary_cpu callback was populated with
>> wakeup_cpu_via_vmgexit() which doesn't work for Hyper-V. Override it
>
> An explanation as to why it doesn't work would be nice here.

Hi Thomas:
Thanks for your review. Good idea. Will update.

>> diff --git a/arch/x86/include/asm/svm.h b/arch/x86/include/asm/svm.h
>> index cb1ee53ad3b1..f8b321a11ee4 100644
>> --- a/arch/x86/include/asm/svm.h
>> +++ b/arch/x86/include/asm/svm.h
>> @@ -336,6 +336,53 @@ struct vmcb_save_area {
>
> Please don't update the vmcb_save_area, you should be using/updating the
> sev_es_save_area structure for SNP.

OK. Will update in the next version.

>>			u64 sev_feature_restrict_injection	: 1;
>> +			u64 sev_feature_alternate_injection	: 1;
>> +			u64 sev_feature_full_debug		: 1;
>> +			u64 sev_feature_reserved1		: 1;
>> +			u64 sev_feature_snpbtb_isolation	: 1;
>> +			u64 sev_feature_resrved2		: 56;
>
> For the bits definition, use:
>
>		u64 sev_feature_snp		: 1,
>		    sev_feature_vtom		: 1,
>		    sev_feature_reflectvc	: 1,
>		    ...
>

Good suggestion. Thanks.

2023-02-06 20:11:25

by Borislav Petkov

[permalink] [raw]
Subject: Re: [RFC PATCH V3 10/16] x86/hyperv: Add smp support for sev-snp guest

On Fri, Feb 03, 2023 at 03:00:44PM +0800, Tianyu Lan wrote:
> > For the bits definition, use:
> >
> >		u64 sev_feature_snp		: 1,
> >		    sev_feature_vtom		: 1,
> >		    sev_feature_reflectvc	: 1,
> >		    ...
> >
>
> Good suggestion. Thanks.

Actually, I'd prefer if you used a named union and dropped all these
"sev_feature_" prefixes everywhere:

union {
	struct {
		u64 snp : 1;
		u64 vtom : 1;
		u64 reflectvc : 1;
		u64 restrict_injection : 1;
		u64 alternate_injection : 1;
		u64 full_debug : 1;
		u64 reserved1 : 1;
		u64 snpbtb_isolation : 1;
		u64 resrved2 : 56;
	};
	u64 val;
} sev_features;



so that you can do in code:

struct sev_es_save_area *sev;

...

sev->sev_features.snp = ...

and so on.

Thx.

--
Regards/Gruss,
Boris.

https://people.kernel.org/tglx/notes-about-netiquette

2023-02-07 13:50:04

by Tianyu Lan

[permalink] [raw]
Subject: Re: [RFC PATCH V3 10/16] x86/hyperv: Add smp support for sev-snp guest

On 2/7/2023 4:11 AM, Borislav Petkov wrote:
> [...]

Hi Boris:
Thanks a lot for your suggestion. Will update.

2023-02-08 23:53:43

by Ashish Kalra

[permalink] [raw]
Subject: Re: [RFC PATCH V3 16/16] x86/sev: Fix interrupt exit code paths from #HV exception

On 2/2/2023 5:20 PM, Zhi Wang wrote:
> On Sat, 21 Jan 2023 21:46:06 -0500
> Tianyu Lan <[email protected]> wrote:
>
>> From: Ashish Kalra <[email protected]>
>>
>> Add checks in the interrupt exit code paths for returns to user
>> mode: if we are currently executing the #HV handler, don't follow the
>> irqentry_exit_to_user_mode path, as that can potentially cause the
>> #HV handler to be preempted and rescheduled on another CPU. An #HV
>> handler rescheduled on another cpu will cause interrupts to be handled
>> on a different cpu than the injected one, causing invalid EOIs and
>> missed/lost guest interrupts and corresponding hangs and/or per-cpu
>> IRQs handled on a non-intended cpu.
>>
>
> Why doesn't this problem happen in the #VC handler? The #VC handler
> doesn't have this special handling.
>

Because the #VC handler does not invoke the common_interrupt() handler
to do IRQ processing. IRQ handling is specific to the #HV exception
handler, as all guest interrupt handling is invoked from the #HV
exception handler once restricted interrupt injection support is enabled.

Thanks,
Ashish

>> [...]

2023-02-09 11:50:56

by Gupta, Pankaj

[permalink] [raw]
Subject: Re: [RFC PATCH V3 00/16] x86/hyperv/sev: Add AMD sev-snp enlightened guest support on hyperv

Hi Tianyu,

> This patchset is to add AMD sev-snp enlightened guest
> support on hyperv. Hyperv uses Linux direct boot mode
> to boot up Linux kernel and so it needs to pvalidate
> system memory by itself.
>
> In hyperv case, there is no boot loader and so cc blob
> is prepared by hypervisor. In this series, hypervisor
> set the cc blob address directly into boot parameter
> of Linux kernel. If the magic number on cc blob address
> is valid, kernel will read cc blob.
>
> Shared memory between the guest and hypervisor should be
> decrypted, and the memory zeroed after decryption, since the
> data at the target address may otherwise be smeared.
>
> Introduce #HV exception support in AMD sev snp code and
> #HV handler.

I am interested in testing the Linux guest #HV exception handling (patches
12-16 in this series) for restricted interrupt injection with a
Linux/KVM host.

Do you have a git tree, or a base commit, that
I can use to apply these patches?

Thank You,
Pankaj

2023-02-16 14:46:55

by Gupta, Pankaj

[permalink] [raw]
Subject: Re: [RFC PATCH V3 14/16] x86/sev: Initialize #HV doorbell and handle interrupt requests

On 1/22/2023 3:46 AM, Tianyu Lan wrote:
> From: Tianyu Lan <[email protected]>
>
> Enable #HV exception to handle interrupt requests from hypervisor.
>
> Co-developed-by: Lendacky Thomas <[email protected]>
> Co-developed-by: Kalra Ashish <[email protected]>
> Signed-off-by: Tianyu Lan <[email protected]>
> ---
> arch/x86/include/asm/mem_encrypt.h | 2 +
> arch/x86/include/asm/msr-index.h | 6 +
> arch/x86/include/asm/svm.h | 12 +-
> arch/x86/include/uapi/asm/svm.h | 4 +
> arch/x86/kernel/sev.c | 307 +++++++++++++++++++++++------
> arch/x86/kernel/traps.c | 2 +
> 6 files changed, 272 insertions(+), 61 deletions(-)
>
> diff --git a/arch/x86/include/asm/mem_encrypt.h b/arch/x86/include/asm/mem_encrypt.h
> index 72ca90552b6a..7264ca5f5b2d 100644
> --- a/arch/x86/include/asm/mem_encrypt.h
> +++ b/arch/x86/include/asm/mem_encrypt.h
> @@ -50,6 +50,7 @@ void __init early_set_mem_enc_dec_hypercall(unsigned long vaddr, int npages,
> void __init mem_encrypt_free_decrypted_mem(void);
>
> void __init sev_es_init_vc_handling(void);
> +void __init sev_snp_init_hv_handling(void);
>
> #define __bss_decrypted __section(".bss..decrypted")
>
> @@ -72,6 +73,7 @@ static inline void __init sme_encrypt_kernel(struct boot_params *bp) { }
> static inline void __init sme_enable(struct boot_params *bp) { }
>
> static inline void sev_es_init_vc_handling(void) { }
> +static inline void sev_snp_init_hv_handling(void) { }
>
> static inline int __init
> early_set_memory_decrypted(unsigned long vaddr, unsigned long size) { return 0; }
> diff --git a/arch/x86/include/asm/msr-index.h b/arch/x86/include/asm/msr-index.h
> index 6a6e70e792a4..70af0ce5f2c4 100644
> --- a/arch/x86/include/asm/msr-index.h
> +++ b/arch/x86/include/asm/msr-index.h
> @@ -562,11 +562,17 @@
> #define MSR_AMD64_SEV_ENABLED_BIT 0
> #define MSR_AMD64_SEV_ES_ENABLED_BIT 1
> #define MSR_AMD64_SEV_SNP_ENABLED_BIT 2
> +#define MSR_AMD64_SEV_REFLECTVC_ENABLED_BIT 4
> +#define MSR_AMD64_SEV_RESTRICTED_INJECTION_ENABLED_BIT 5
> +#define MSR_AMD64_SEV_ALTERNATE_INJECTION_ENABLED_BIT 6
> #define MSR_AMD64_SEV_ENABLED BIT_ULL(MSR_AMD64_SEV_ENABLED_BIT)
> #define MSR_AMD64_SEV_ES_ENABLED BIT_ULL(MSR_AMD64_SEV_ES_ENABLED_BIT)
> #define MSR_AMD64_SEV_SNP_ENABLED BIT_ULL(MSR_AMD64_SEV_SNP_ENABLED_BIT)
> #define MSR_AMD64_SNP_VTOM_ENABLED BIT_ULL(3)
>
> +#define MSR_AMD64_SEV_REFLECTVC_ENABLED BIT_ULL(MSR_AMD64_SEV_REFLECTVC_ENABLED_BIT)
> +#define MSR_AMD64_SEV_RESTRICTED_INJECTION_ENABLED BIT_ULL(MSR_AMD64_SEV_RESTRICTED_INJECTION_ENABLED_BIT)
> +#define MSR_AMD64_SEV_ALTERNATE_INJECTION_ENABLED BIT_ULL(MSR_AMD64_SEV_ALTERNATE_INJECTION_ENABLED_BIT)
> #define MSR_AMD64_VIRT_SPEC_CTRL 0xc001011f
>
> /* AMD Collaborative Processor Performance Control MSRs */
> diff --git a/arch/x86/include/asm/svm.h b/arch/x86/include/asm/svm.h
> index f8b321a11ee4..911c991fec78 100644
> --- a/arch/x86/include/asm/svm.h
> +++ b/arch/x86/include/asm/svm.h
> @@ -568,12 +568,12 @@ static inline void __unused_size_checks(void)
>
> /* Check offsets of reserved fields */
>
> - BUILD_BUG_RESERVED_OFFSET(vmcb_save_area, 0xa0);
> - BUILD_BUG_RESERVED_OFFSET(vmcb_save_area, 0xcc);
> - BUILD_BUG_RESERVED_OFFSET(vmcb_save_area, 0xd8);
> - BUILD_BUG_RESERVED_OFFSET(vmcb_save_area, 0x180);
> - BUILD_BUG_RESERVED_OFFSET(vmcb_save_area, 0x248);
> - BUILD_BUG_RESERVED_OFFSET(vmcb_save_area, 0x298);
> +// BUILD_BUG_RESERVED_OFFSET(vmcb_save_area, 0xa0);
> +// BUILD_BUG_RESERVED_OFFSET(vmcb_save_area, 0xcc);
> +// BUILD_BUG_RESERVED_OFFSET(vmcb_save_area, 0xd8);
> +// BUILD_BUG_RESERVED_OFFSET(vmcb_save_area, 0x180);
> +// BUILD_BUG_RESERVED_OFFSET(vmcb_save_area, 0x248);
> +// BUILD_BUG_RESERVED_OFFSET(vmcb_save_area, 0x298);
>
> BUILD_BUG_RESERVED_OFFSET(sev_es_save_area, 0xc8);
> BUILD_BUG_RESERVED_OFFSET(sev_es_save_area, 0xcc);
> diff --git a/arch/x86/include/uapi/asm/svm.h b/arch/x86/include/uapi/asm/svm.h
> index f69c168391aa..85d6882262e7 100644
> --- a/arch/x86/include/uapi/asm/svm.h
> +++ b/arch/x86/include/uapi/asm/svm.h
> @@ -115,6 +115,10 @@
> #define SVM_VMGEXIT_AP_CREATE_ON_INIT 0
> #define SVM_VMGEXIT_AP_CREATE 1
> #define SVM_VMGEXIT_AP_DESTROY 2
> +#define SVM_VMGEXIT_HV_DOORBELL_PAGE 0x80000014
> +#define SVM_VMGEXIT_GET_PREFERRED_HV_DOORBELL_PAGE 0
> +#define SVM_VMGEXIT_SET_HV_DOORBELL_PAGE 1
> +#define SVM_VMGEXIT_QUERY_HV_DOORBELL_PAGE 2
> #define SVM_VMGEXIT_HV_FEATURES 0x8000fffd
> #define SVM_VMGEXIT_UNSUPPORTED_EVENT 0x8000ffff
>
> diff --git a/arch/x86/kernel/sev.c b/arch/x86/kernel/sev.c
> index fe5e5e41433d..03d99fad9e76 100644
> --- a/arch/x86/kernel/sev.c
> +++ b/arch/x86/kernel/sev.c
> @@ -122,6 +122,150 @@ struct sev_config {
>
> static struct sev_config sev_cfg __read_mostly;
>
> +static noinstr struct ghcb *__sev_get_ghcb(struct ghcb_state *state);
> +static noinstr void __sev_put_ghcb(struct ghcb_state *state);
> +static int vmgexit_hv_doorbell_page(struct ghcb *ghcb, u64 op, u64 pa);
> +static void sev_snp_setup_hv_doorbell_page(struct ghcb *ghcb);
> +
> +union hv_pending_events {
> + u16 events;
> + struct {
> + u8 vector;
> + u8 nmi : 1;
> + u8 mc : 1;
> + u8 reserved1 : 5;
> + u8 no_further_signal : 1;
> + };
> +};
> +
> +struct sev_hv_doorbell_page {
> + union hv_pending_events pending_events;
> + u8 no_eoi_required;
> + u8 reserved2[61];
> + u8 padding[4032];
> +};
> +
> +struct sev_snp_runtime_data {
> + struct sev_hv_doorbell_page hv_doorbell_page;
> +};
> +
> +static DEFINE_PER_CPU(struct sev_snp_runtime_data*, snp_runtime_data);
> +
> +static inline u64 sev_es_rd_ghcb_msr(void)
> +{
> + return __rdmsr(MSR_AMD64_SEV_ES_GHCB);
> +}
> +
> +static __always_inline void sev_es_wr_ghcb_msr(u64 val)
> +{
> + u32 low, high;
> +
> + low = (u32)(val);
> + high = (u32)(val >> 32);
> +
> + native_wrmsr(MSR_AMD64_SEV_ES_GHCB, low, high);
> +}
> +
> +struct sev_hv_doorbell_page *sev_snp_current_doorbell_page(void)
> +{
> + return &this_cpu_read(snp_runtime_data)->hv_doorbell_page;
> +}
> +
> +static u8 sev_hv_pending(void)
> +{
> + return sev_snp_current_doorbell_page()->pending_events.events;
> +}
> +
> +static void hv_doorbell_apic_eoi_write(u32 reg, u32 val)
> +{
> + if (xchg(&sev_snp_current_doorbell_page()->no_eoi_required, 0) & 0x1)
> + return;
> +
> + BUG_ON(reg != APIC_EOI);
> + apic->write(reg, val);
> +}
> +
> +static void do_exc_hv(struct pt_regs *regs)
> +{
> + union hv_pending_events pending_events;
> + u8 vector;

Unused variable.

> +
> + while (sev_hv_pending()) {
> + pending_events.events = xchg(
> + &sev_snp_current_doorbell_page()->pending_events.events,
> + 0);
> +
> + if (pending_events.nmi)
> + exc_nmi(regs);
> +
> +#ifdef CONFIG_X86_MCE
> + if (pending_events.mc)
> + exc_machine_check(regs);
> +#endif
> +
> + if (!pending_events.vector)
> + return;
> +
> + if (pending_events.vector < FIRST_EXTERNAL_VECTOR) {
> + /* Exception vectors */
> + WARN(1, "exception shouldn't happen\n");
> + } else if (pending_events.vector == FIRST_EXTERNAL_VECTOR) {
> + sysvec_irq_move_cleanup(regs);
> + } else if (pending_events.vector == IA32_SYSCALL_VECTOR) {
> + WARN(1, "syscall shouldn't happen\n");
> + } else if (pending_events.vector >= FIRST_SYSTEM_VECTOR) {
> + switch (pending_events.vector) {
> +#if IS_ENABLED(CONFIG_HYPERV)
> + case HYPERV_STIMER0_VECTOR:
> + sysvec_hyperv_stimer0(regs);
> + break;
> + case HYPERVISOR_CALLBACK_VECTOR:
> + sysvec_hyperv_callback(regs);
> + break;
> +#endif
> +#ifdef CONFIG_SMP
> + case RESCHEDULE_VECTOR:
> + sysvec_reschedule_ipi(regs);
> + break;
> + case IRQ_MOVE_CLEANUP_VECTOR:
> + sysvec_irq_move_cleanup(regs);
> + break;
> + case REBOOT_VECTOR:
> + sysvec_reboot(regs);
> + break;
> + case CALL_FUNCTION_SINGLE_VECTOR:
> + sysvec_call_function_single(regs);
> + break;
> + case CALL_FUNCTION_VECTOR:
> + sysvec_call_function(regs);
> + break;
> +#endif
> +#ifdef CONFIG_X86_LOCAL_APIC
> + case ERROR_APIC_VECTOR:
> + sysvec_error_interrupt(regs);
> + break;
> + case SPURIOUS_APIC_VECTOR:
> + sysvec_spurious_apic_interrupt(regs);
> + break;
> + case LOCAL_TIMER_VECTOR:
> + sysvec_apic_timer_interrupt(regs);
> + break;
> + case X86_PLATFORM_IPI_VECTOR:
> + sysvec_x86_platform_ipi(regs);
> + break;
> +#endif
> + case 0x0:
> + break;
> + default:
> + panic("Unexpected vector %d\n", vector);
> + unreachable();
> + }
> + } else {
> + common_interrupt(regs, pending_events.vector);
> + }
> + }
> +}
> +
> static __always_inline bool on_vc_stack(struct pt_regs *regs)
> {
> unsigned long sp = regs->sp;
> @@ -179,11 +323,6 @@ void noinstr __sev_es_ist_enter(struct pt_regs *regs)
> this_cpu_write(cpu_tss_rw.x86_tss.ist[IST_INDEX_VC], new_ist);
> }
>
> -static void do_exc_hv(struct pt_regs *regs)
> -{
> - /* Handle #HV exception. */
> -}
> -
> void check_hv_pending(struct pt_regs *regs)
> {
> if (!cc_platform_has(CC_ATTR_GUEST_SEV_SNP))
> @@ -232,68 +371,38 @@ void noinstr __sev_es_ist_exit(void)
> this_cpu_write(cpu_tss_rw.x86_tss.ist[IST_INDEX_VC], *(unsigned long *)ist);
> }
>
> -/*
> - * Nothing shall interrupt this code path while holding the per-CPU
> - * GHCB. The backup GHCB is only for NMIs interrupting this path.
> - *
> - * Callers must disable local interrupts around it.
> - */
> -static noinstr struct ghcb *__sev_get_ghcb(struct ghcb_state *state)
> +static bool sev_restricted_injection_enabled(void)
> +{
> + return sev_status & MSR_AMD64_SEV_RESTRICTED_INJECTION_ENABLED;
> +}
> +
> +void __init sev_snp_init_hv_handling(void)
> {
> + struct sev_snp_runtime_data *snp_data;
Unused variable.

> struct sev_es_runtime_data *data;
> + struct ghcb_state state;
> struct ghcb *ghcb;
> + unsigned long flags;
> + int cpu;

Unused variable.

> + int err;

Unused variable.
>
> WARN_ON(!irqs_disabled());
> + if (!cc_platform_has(CC_ATTR_GUEST_SEV_SNP) || !sev_restricted_injection_enabled())
> + return;
>
> data = this_cpu_read(runtime_data);
> - ghcb = &data->ghcb_page;
>
> - if (unlikely(data->ghcb_active)) {
> - /* GHCB is already in use - save its contents */
> -
> - if (unlikely(data->backup_ghcb_active)) {
> - /*
> - * Backup-GHCB is also already in use. There is no way
> - * to continue here so just kill the machine. To make
> - * panic() work, mark GHCBs inactive so that messages
> - * can be printed out.
> - */
> - data->ghcb_active = false;
> - data->backup_ghcb_active = false;
> -
> - instrumentation_begin();
> - panic("Unable to handle #VC exception! GHCB and Backup GHCB are already in use");
> - instrumentation_end();
> - }
> -
> - /* Mark backup_ghcb active before writing to it */
> - data->backup_ghcb_active = true;
> -
> - state->ghcb = &data->backup_ghcb;
> + local_irq_save(flags);
>
> - /* Backup GHCB content */
> - *state->ghcb = *ghcb;
> - } else {
> - state->ghcb = NULL;
> - data->ghcb_active = true;
> - }
> + ghcb = __sev_get_ghcb(&state);
>
> - return ghcb;
> -}
> + sev_snp_setup_hv_doorbell_page(ghcb);
>
> -static inline u64 sev_es_rd_ghcb_msr(void)
> -{
> - return __rdmsr(MSR_AMD64_SEV_ES_GHCB);
> -}
> -
> -static __always_inline void sev_es_wr_ghcb_msr(u64 val)
> -{
> - u32 low, high;
> + __sev_put_ghcb(&state);
>
> - low = (u32)(val);
> - high = (u32)(val >> 32);
> + apic_set_eoi_write(hv_doorbell_apic_eoi_write);
>
> - native_wrmsr(MSR_AMD64_SEV_ES_GHCB, low, high);
> + local_irq_restore(flags);
> }
>
> static int vc_fetch_insn_kernel(struct es_em_ctxt *ctxt,
> @@ -554,6 +663,69 @@ static enum es_result vc_slow_virt_to_phys(struct ghcb *ghcb, struct es_em_ctxt
> /* Include code shared with pre-decompression boot stage */
> #include "sev-shared.c"
>
> +/*
> + * Nothing shall interrupt this code path while holding the per-CPU
> + * GHCB. The backup GHCB is only for NMIs interrupting this path.
> + *
> + * Callers must disable local interrupts around it.
> + */
> +static noinstr struct ghcb *__sev_get_ghcb(struct ghcb_state *state)
> +{
> + struct sev_es_runtime_data *data;
> + struct ghcb *ghcb;
> +
> + WARN_ON(!irqs_disabled());
> +
> + data = this_cpu_read(runtime_data);
> + ghcb = &data->ghcb_page;
> +
> + if (unlikely(data->ghcb_active)) {
> + /* GHCB is already in use - save its contents */
> +
> + if (unlikely(data->backup_ghcb_active)) {
> + /*
> + * Backup-GHCB is also already in use. There is no way
> + * to continue here so just kill the machine. To make
> + * panic() work, mark GHCBs inactive so that messages
> + * can be printed out.
> + */
> + data->ghcb_active = false;
> + data->backup_ghcb_active = false;
> +
> + instrumentation_begin();
> + panic("Unable to handle #VC exception! GHCB and Backup GHCB are already in use");
> + instrumentation_end();
> + }
> +
> + /* Mark backup_ghcb active before writing to it */
> + data->backup_ghcb_active = true;
> +
> + state->ghcb = &data->backup_ghcb;
> +
> + /* Backup GHCB content */
> + *state->ghcb = *ghcb;
> + } else {
> + state->ghcb = NULL;
> + data->ghcb_active = true;
> + }
> +
> + return ghcb;
> +}
> +
> +static void sev_snp_setup_hv_doorbell_page(struct ghcb *ghcb)
> +{
> + u64 pa;
> + enum es_result ret;
> +
> + pa = __pa(sev_snp_current_doorbell_page());
> + vc_ghcb_invalidate(ghcb);
> + ret = vmgexit_hv_doorbell_page(ghcb,
> + SVM_VMGEXIT_SET_HV_DOORBELL_PAGE,
> + pa);
> + if (ret != ES_OK)
> + panic("SEV-SNP: failed to set up #HV doorbell page");
> +}
> +
> static noinstr void __sev_put_ghcb(struct ghcb_state *state)
> {
> struct sev_es_runtime_data *data;
> @@ -1282,6 +1454,7 @@ static void snp_register_per_cpu_ghcb(void)
> ghcb = &data->ghcb_page;
>
> snp_register_ghcb_early(__pa(ghcb));
> + sev_snp_setup_hv_doorbell_page(ghcb);
> }
>
> void setup_ghcb(void)
> @@ -1321,6 +1494,11 @@ void setup_ghcb(void)
> snp_register_ghcb_early(__pa(&boot_ghcb_page));
> }
>
> +int vmgexit_hv_doorbell_page(struct ghcb *ghcb, u64 op, u64 pa)
> +{
> + return sev_es_ghcb_hv_call(ghcb, NULL, SVM_VMGEXIT_HV_DOORBELL_PAGE, op, pa);
> +}
> +
> #ifdef CONFIG_HOTPLUG_CPU
> static void sev_es_ap_hlt_loop(void)
> {
> @@ -1394,6 +1572,7 @@ static void __init alloc_runtime_data(int cpu)
> static void __init init_ghcb(int cpu)
> {
> struct sev_es_runtime_data *data;
> + struct sev_snp_runtime_data *snp_data;
> int err;
>
> data = per_cpu(runtime_data, cpu);
> @@ -1405,6 +1584,19 @@ static void __init init_ghcb(int cpu)
>
> memset(&data->ghcb_page, 0, sizeof(data->ghcb_page));
>
> + snp_data = memblock_alloc(sizeof(*snp_data), PAGE_SIZE);
> + if (!snp_data)
> + panic("Can't allocate SEV-SNP runtime data");
> +
> + err = early_set_memory_decrypted((unsigned long)&snp_data->hv_doorbell_page,
> + sizeof(snp_data->hv_doorbell_page));
> + if (err)
> + panic("Can't map #HV doorbell pages unencrypted");
> +
> + memset(&snp_data->hv_doorbell_page, 0, sizeof(snp_data->hv_doorbell_page));
> +
> + per_cpu(snp_runtime_data, cpu) = snp_data;
> +
> data->ghcb_active = false;
> data->backup_ghcb_active = false;
> }
> @@ -2045,7 +2237,12 @@ DEFINE_IDTENTRY_VC_USER(exc_vmm_communication)
>
> static bool hv_raw_handle_exception(struct pt_regs *regs)
> {
> - return false;
> + /* Clear the no_further_signal bit */
> + sev_snp_current_doorbell_page()->pending_events.events &= 0x7fff;
> +
> + check_hv_pending(regs);
> +
> + return true;
> }
>
> static __always_inline bool on_hv_fallback_stack(struct pt_regs *regs)
> diff --git a/arch/x86/kernel/traps.c b/arch/x86/kernel/traps.c
> index d29debec8134..1aa6cab2394b 100644
> --- a/arch/x86/kernel/traps.c
> +++ b/arch/x86/kernel/traps.c
> @@ -1503,5 +1503,7 @@ void __init trap_init(void)
> cpu_init_exception_handling();
> /* Setup traps as cpu_init() might #GP */
> idt_setup_traps();
> + sev_snp_init_hv_handling();
> +
> cpu_init();
> }


2023-02-17 12:46:01

by Gupta, Pankaj

[permalink] [raw]
Subject: Re: [RFC PATCH V3 14/16] x86/sev: Initialize #HV doorbell and handle interrupt requests


> Enable #HV exception to handle interrupt requests from hypervisor.
>
> Co-developed-by: Lendacky Thomas <[email protected]>
> Co-developed-by: Kalra Ashish <[email protected]>
> Signed-off-by: Tianyu Lan <[email protected]>
> ---
> arch/x86/include/asm/mem_encrypt.h | 2 +
> arch/x86/include/asm/msr-index.h | 6 +
> arch/x86/include/asm/svm.h | 12 +-
> arch/x86/include/uapi/asm/svm.h | 4 +
> arch/x86/kernel/sev.c | 307 +++++++++++++++++++++++------
> arch/x86/kernel/traps.c | 2 +
> 6 files changed, 272 insertions(+), 61 deletions(-)
>
> diff --git a/arch/x86/include/asm/mem_encrypt.h b/arch/x86/include/asm/mem_encrypt.h
> index 72ca90552b6a..7264ca5f5b2d 100644
> --- a/arch/x86/include/asm/mem_encrypt.h
> +++ b/arch/x86/include/asm/mem_encrypt.h
> @@ -50,6 +50,7 @@ void __init early_set_mem_enc_dec_hypercall(unsigned long vaddr, int npages,
> void __init mem_encrypt_free_decrypted_mem(void);
>
> void __init sev_es_init_vc_handling(void);
> +void __init sev_snp_init_hv_handling(void);
>
> #define __bss_decrypted __section(".bss..decrypted")
>
> @@ -72,6 +73,7 @@ static inline void __init sme_encrypt_kernel(struct boot_params *bp) { }
> static inline void __init sme_enable(struct boot_params *bp) { }
>
> static inline void sev_es_init_vc_handling(void) { }
> +static inline void sev_snp_init_hv_handling(void) { }
>
> static inline int __init
> early_set_memory_decrypted(unsigned long vaddr, unsigned long size) { return 0; }
> diff --git a/arch/x86/include/asm/msr-index.h b/arch/x86/include/asm/msr-index.h
> index 6a6e70e792a4..70af0ce5f2c4 100644
> --- a/arch/x86/include/asm/msr-index.h
> +++ b/arch/x86/include/asm/msr-index.h
> @@ -562,11 +562,17 @@
> #define MSR_AMD64_SEV_ENABLED_BIT 0
> #define MSR_AMD64_SEV_ES_ENABLED_BIT 1
> #define MSR_AMD64_SEV_SNP_ENABLED_BIT 2
> +#define MSR_AMD64_SEV_REFLECTVC_ENABLED_BIT 4
> +#define MSR_AMD64_SEV_RESTRICTED_INJECTION_ENABLED_BIT 5
> +#define MSR_AMD64_SEV_ALTERNATE_INJECTION_ENABLED_BIT 6

These are already committed as part of:
commit 8c29f0165405 ("x86/sev: Add SEV-SNP guest feature negotiation support")

Thanks,
Pankaj
> #define MSR_AMD64_SEV_ENABLED BIT_ULL(MSR_AMD64_SEV_ENABLED_BIT)
> #define MSR_AMD64_SEV_ES_ENABLED BIT_ULL(MSR_AMD64_SEV_ES_ENABLED_BIT)
> #define MSR_AMD64_SEV_SNP_ENABLED BIT_ULL(MSR_AMD64_SEV_SNP_ENABLED_BIT)
> #define MSR_AMD64_SNP_VTOM_ENABLED BIT_ULL(3)
>
> +#define MSR_AMD64_SEV_REFLECTVC_ENABLED BIT_ULL(MSR_AMD64_SEV_REFLECTVC_ENABLED_BIT)
> +#define MSR_AMD64_SEV_RESTRICTED_INJECTION_ENABLED BIT_ULL(MSR_AMD64_SEV_RESTRICTED_INJECTION_ENABLED_BIT)
> +#define MSR_AMD64_SEV_ALTERNATE_INJECTION_ENABLED BIT_ULL(MSR_AMD64_SEV_ALTERNATE_INJECTION_ENABLED_BIT)
> #define MSR_AMD64_VIRT_SPEC_CTRL 0xc001011f
>
> /* AMD Collaborative Processor Performance Control MSRs */
> diff --git a/arch/x86/include/asm/svm.h b/arch/x86/include/asm/svm.h
> index f8b321a11ee4..911c991fec78 100644
> --- a/arch/x86/include/asm/svm.h
> +++ b/arch/x86/include/asm/svm.h
> @@ -568,12 +568,12 @@ static inline void __unused_size_checks(void)
>
> /* Check offsets of reserved fields */
>
> - BUILD_BUG_RESERVED_OFFSET(vmcb_save_area, 0xa0);
> - BUILD_BUG_RESERVED_OFFSET(vmcb_save_area, 0xcc);
> - BUILD_BUG_RESERVED_OFFSET(vmcb_save_area, 0xd8);
> - BUILD_BUG_RESERVED_OFFSET(vmcb_save_area, 0x180);
> - BUILD_BUG_RESERVED_OFFSET(vmcb_save_area, 0x248);
> - BUILD_BUG_RESERVED_OFFSET(vmcb_save_area, 0x298);
> +// BUILD_BUG_RESERVED_OFFSET(vmcb_save_area, 0xa0);
> +// BUILD_BUG_RESERVED_OFFSET(vmcb_save_area, 0xcc);
> +// BUILD_BUG_RESERVED_OFFSET(vmcb_save_area, 0xd8);
> +// BUILD_BUG_RESERVED_OFFSET(vmcb_save_area, 0x180);
> +// BUILD_BUG_RESERVED_OFFSET(vmcb_save_area, 0x248);
> +// BUILD_BUG_RESERVED_OFFSET(vmcb_save_area, 0x298);
>
> BUILD_BUG_RESERVED_OFFSET(sev_es_save_area, 0xc8);
> BUILD_BUG_RESERVED_OFFSET(sev_es_save_area, 0xcc);
> diff --git a/arch/x86/include/uapi/asm/svm.h b/arch/x86/include/uapi/asm/svm.h
> index f69c168391aa..85d6882262e7 100644
> --- a/arch/x86/include/uapi/asm/svm.h
> +++ b/arch/x86/include/uapi/asm/svm.h
> @@ -115,6 +115,10 @@
> #define SVM_VMGEXIT_AP_CREATE_ON_INIT 0
> #define SVM_VMGEXIT_AP_CREATE 1
> #define SVM_VMGEXIT_AP_DESTROY 2
> +#define SVM_VMGEXIT_HV_DOORBELL_PAGE 0x80000014
> +#define SVM_VMGEXIT_GET_PREFERRED_HV_DOORBELL_PAGE 0
> +#define SVM_VMGEXIT_SET_HV_DOORBELL_PAGE 1
> +#define SVM_VMGEXIT_QUERY_HV_DOORBELL_PAGE 2
> #define SVM_VMGEXIT_HV_FEATURES 0x8000fffd
> #define SVM_VMGEXIT_UNSUPPORTED_EVENT 0x8000ffff
>
> diff --git a/arch/x86/kernel/sev.c b/arch/x86/kernel/sev.c
> index fe5e5e41433d..03d99fad9e76 100644
> --- a/arch/x86/kernel/sev.c
> +++ b/arch/x86/kernel/sev.c
> @@ -122,6 +122,150 @@ struct sev_config {
>
> static struct sev_config sev_cfg __read_mostly;
>
> +static noinstr struct ghcb *__sev_get_ghcb(struct ghcb_state *state);
> +static noinstr void __sev_put_ghcb(struct ghcb_state *state);
> +static int vmgexit_hv_doorbell_page(struct ghcb *ghcb, u64 op, u64 pa);
> +static void sev_snp_setup_hv_doorbell_page(struct ghcb *ghcb);
> +
> +union hv_pending_events {
> + u16 events;
> + struct {
> + u8 vector;
> + u8 nmi : 1;
> + u8 mc : 1;
> + u8 reserved1 : 5;
> + u8 no_further_signal : 1;
> + };
> +};
> +
> +struct sev_hv_doorbell_page {
> + union hv_pending_events pending_events;
> + u8 no_eoi_required;
> + u8 reserved2[61];
> + u8 padding[4032];
> +};
> +
> +struct sev_snp_runtime_data {
> + struct sev_hv_doorbell_page hv_doorbell_page;
> +};
> +
> +static DEFINE_PER_CPU(struct sev_snp_runtime_data*, snp_runtime_data);
> +
> +static inline u64 sev_es_rd_ghcb_msr(void)
> +{
> + return __rdmsr(MSR_AMD64_SEV_ES_GHCB);
> +}
> +
> +static __always_inline void sev_es_wr_ghcb_msr(u64 val)
> +{
> + u32 low, high;
> +
> + low = (u32)(val);
> + high = (u32)(val >> 32);
> +
> + native_wrmsr(MSR_AMD64_SEV_ES_GHCB, low, high);
> +}
> +
> +struct sev_hv_doorbell_page *sev_snp_current_doorbell_page(void)
> +{
> + return &this_cpu_read(snp_runtime_data)->hv_doorbell_page;
> +}
> +
> +static u8 sev_hv_pending(void)
> +{
> + return sev_snp_current_doorbell_page()->pending_events.events;
> +}
> +
> +static void hv_doorbell_apic_eoi_write(u32 reg, u32 val)
> +{
> + if (xchg(&sev_snp_current_doorbell_page()->no_eoi_required, 0) & 0x1)
> + return;
> +
> + BUG_ON(reg != APIC_EOI);
> + apic->write(reg, val);
> +}
> +
> +static void do_exc_hv(struct pt_regs *regs)
> +{
> + union hv_pending_events pending_events;
> + u8 vector;
> +
> + while (sev_hv_pending()) {
> + pending_events.events = xchg(
> + &sev_snp_current_doorbell_page()->pending_events.events,
> + 0);
> +
> + if (pending_events.nmi)
> + exc_nmi(regs);
> +
> +#ifdef CONFIG_X86_MCE
> + if (pending_events.mc)
> + exc_machine_check(regs);
> +#endif
> +
> + if (!pending_events.vector)
> + return;
> +
> + if (pending_events.vector < FIRST_EXTERNAL_VECTOR) {
> + /* Exception vectors */
> + WARN(1, "exception shouldn't happen\n");
> + } else if (pending_events.vector == FIRST_EXTERNAL_VECTOR) {
> + sysvec_irq_move_cleanup(regs);
> + } else if (pending_events.vector == IA32_SYSCALL_VECTOR) {
> + WARN(1, "syscall shouldn't happen\n");
> + } else if (pending_events.vector >= FIRST_SYSTEM_VECTOR) {
> + switch (pending_events.vector) {
> +#if IS_ENABLED(CONFIG_HYPERV)
> + case HYPERV_STIMER0_VECTOR:
> + sysvec_hyperv_stimer0(regs);
> + break;
> + case HYPERVISOR_CALLBACK_VECTOR:
> + sysvec_hyperv_callback(regs);
> + break;
> +#endif
> +#ifdef CONFIG_SMP
> + case RESCHEDULE_VECTOR:
> + sysvec_reschedule_ipi(regs);
> + break;
> + case IRQ_MOVE_CLEANUP_VECTOR:
> + sysvec_irq_move_cleanup(regs);
> + break;
> + case REBOOT_VECTOR:
> + sysvec_reboot(regs);
> + break;
> + case CALL_FUNCTION_SINGLE_VECTOR:
> + sysvec_call_function_single(regs);
> + break;
> + case CALL_FUNCTION_VECTOR:
> + sysvec_call_function(regs);
> + break;
> +#endif
> +#ifdef CONFIG_X86_LOCAL_APIC
> + case ERROR_APIC_VECTOR:
> + sysvec_error_interrupt(regs);
> + break;
> + case SPURIOUS_APIC_VECTOR:
> + sysvec_spurious_apic_interrupt(regs);
> + break;
> + case LOCAL_TIMER_VECTOR:
> + sysvec_apic_timer_interrupt(regs);
> + break;
> + case X86_PLATFORM_IPI_VECTOR:
> + sysvec_x86_platform_ipi(regs);
> + break;
> +#endif
> + case 0x0:
> + break;
> + default:
> + panic("Unexpected vector %d\n", vector);
> + unreachable();
> + }
> + } else {
> + common_interrupt(regs, pending_events.vector);
> + }
> + }
> +}
> +
> static __always_inline bool on_vc_stack(struct pt_regs *regs)
> {
> unsigned long sp = regs->sp;
> @@ -179,11 +323,6 @@ void noinstr __sev_es_ist_enter(struct pt_regs *regs)
> this_cpu_write(cpu_tss_rw.x86_tss.ist[IST_INDEX_VC], new_ist);
> }
>
> -static void do_exc_hv(struct pt_regs *regs)
> -{
> - /* Handle #HV exception. */
> -}
> -
> void check_hv_pending(struct pt_regs *regs)
> {
> if (!cc_platform_has(CC_ATTR_GUEST_SEV_SNP))
> @@ -232,68 +371,38 @@ void noinstr __sev_es_ist_exit(void)
> this_cpu_write(cpu_tss_rw.x86_tss.ist[IST_INDEX_VC], *(unsigned long *)ist);
> }
>
> -/*
> - * Nothing shall interrupt this code path while holding the per-CPU
> - * GHCB. The backup GHCB is only for NMIs interrupting this path.
> - *
> - * Callers must disable local interrupts around it.
> - */
> -static noinstr struct ghcb *__sev_get_ghcb(struct ghcb_state *state)
> +static bool sev_restricted_injection_enabled(void)
> +{
> + return sev_status & MSR_AMD64_SEV_RESTRICTED_INJECTION_ENABLED;
> +}
> +
> +void __init sev_snp_init_hv_handling(void)
> {
> + struct sev_snp_runtime_data *snp_data;
> struct sev_es_runtime_data *data;
> + struct ghcb_state state;
> struct ghcb *ghcb;
> + unsigned long flags;
> + int cpu;
> + int err;
>
> WARN_ON(!irqs_disabled());
> + if (!cc_platform_has(CC_ATTR_GUEST_SEV_SNP) || !sev_restricted_injection_enabled())
> + return;
>
> data = this_cpu_read(runtime_data);
> - ghcb = &data->ghcb_page;
>
> - if (unlikely(data->ghcb_active)) {
> - /* GHCB is already in use - save its contents */
> -
> - if (unlikely(data->backup_ghcb_active)) {
> - /*
> - * Backup-GHCB is also already in use. There is no way
> - * to continue here so just kill the machine. To make
> - * panic() work, mark GHCBs inactive so that messages
> - * can be printed out.
> - */
> - data->ghcb_active = false;
> - data->backup_ghcb_active = false;
> -
> - instrumentation_begin();
> - panic("Unable to handle #VC exception! GHCB and Backup GHCB are already in use");
> - instrumentation_end();
> - }
> -
> - /* Mark backup_ghcb active before writing to it */
> - data->backup_ghcb_active = true;
> -
> - state->ghcb = &data->backup_ghcb;
> + local_irq_save(flags);
>
> - /* Backup GHCB content */
> - *state->ghcb = *ghcb;
> - } else {
> - state->ghcb = NULL;
> - data->ghcb_active = true;
> - }
> + ghcb = __sev_get_ghcb(&state);
>
> - return ghcb;
> -}
> + sev_snp_setup_hv_doorbell_page(ghcb);
>
> -static inline u64 sev_es_rd_ghcb_msr(void)
> -{
> - return __rdmsr(MSR_AMD64_SEV_ES_GHCB);
> -}
> -
> -static __always_inline void sev_es_wr_ghcb_msr(u64 val)
> -{
> - u32 low, high;
> + __sev_put_ghcb(&state);
>
> - low = (u32)(val);
> - high = (u32)(val >> 32);
> + apic_set_eoi_write(hv_doorbell_apic_eoi_write);
>
> - native_wrmsr(MSR_AMD64_SEV_ES_GHCB, low, high);
> + local_irq_restore(flags);
> }
>
> static int vc_fetch_insn_kernel(struct es_em_ctxt *ctxt,
> @@ -554,6 +663,69 @@ static enum es_result vc_slow_virt_to_phys(struct ghcb *ghcb, struct es_em_ctxt
> /* Include code shared with pre-decompression boot stage */
> #include "sev-shared.c"
>
> +/*
> + * Nothing shall interrupt this code path while holding the per-CPU
> + * GHCB. The backup GHCB is only for NMIs interrupting this path.
> + *
> + * Callers must disable local interrupts around it.
> + */
> +static noinstr struct ghcb *__sev_get_ghcb(struct ghcb_state *state)
> +{
> + struct sev_es_runtime_data *data;
> + struct ghcb *ghcb;
> +
> + WARN_ON(!irqs_disabled());
> +
> + data = this_cpu_read(runtime_data);
> + ghcb = &data->ghcb_page;
> +
> + if (unlikely(data->ghcb_active)) {
> + /* GHCB is already in use - save its contents */
> +
> + if (unlikely(data->backup_ghcb_active)) {
> + /*
> + * Backup-GHCB is also already in use. There is no way
> + * to continue here so just kill the machine. To make
> + * panic() work, mark GHCBs inactive so that messages
> + * can be printed out.
> + */
> + data->ghcb_active = false;
> + data->backup_ghcb_active = false;
> +
> + instrumentation_begin();
> + panic("Unable to handle #VC exception! GHCB and Backup GHCB are already in use");
> + instrumentation_end();
> + }
> +
> + /* Mark backup_ghcb active before writing to it */
> + data->backup_ghcb_active = true;
> +
> + state->ghcb = &data->backup_ghcb;
> +
> + /* Backup GHCB content */
> + *state->ghcb = *ghcb;
> + } else {
> + state->ghcb = NULL;
> + data->ghcb_active = true;
> + }
> +
> + return ghcb;
> +}
> +
> +static void sev_snp_setup_hv_doorbell_page(struct ghcb *ghcb)
> +{
> + u64 pa;
> + enum es_result ret;
> +
> + pa = __pa(sev_snp_current_doorbell_page());
> + vc_ghcb_invalidate(ghcb);
> + ret = vmgexit_hv_doorbell_page(ghcb,
> + SVM_VMGEXIT_SET_HV_DOORBELL_PAGE,
> + pa);
> + if (ret != ES_OK)
> + panic("SEV-SNP: failed to set up #HV doorbell page");
> +}
> +
> static noinstr void __sev_put_ghcb(struct ghcb_state *state)
> {
> struct sev_es_runtime_data *data;
> @@ -1282,6 +1454,7 @@ static void snp_register_per_cpu_ghcb(void)
> ghcb = &data->ghcb_page;
>
> snp_register_ghcb_early(__pa(ghcb));
> + sev_snp_setup_hv_doorbell_page(ghcb);
> }
>
> void setup_ghcb(void)
> @@ -1321,6 +1494,11 @@ void setup_ghcb(void)
> snp_register_ghcb_early(__pa(&boot_ghcb_page));
> }
>
> +int vmgexit_hv_doorbell_page(struct ghcb *ghcb, u64 op, u64 pa)
> +{
> + return sev_es_ghcb_hv_call(ghcb, NULL, SVM_VMGEXIT_HV_DOORBELL_PAGE, op, pa);
> +}
> +
> #ifdef CONFIG_HOTPLUG_CPU
> static void sev_es_ap_hlt_loop(void)
> {
> @@ -1394,6 +1572,7 @@ static void __init alloc_runtime_data(int cpu)
> static void __init init_ghcb(int cpu)
> {
> struct sev_es_runtime_data *data;
> + struct sev_snp_runtime_data *snp_data;
> int err;
>
> data = per_cpu(runtime_data, cpu);
> @@ -1405,6 +1584,19 @@ static void __init init_ghcb(int cpu)
>
> memset(&data->ghcb_page, 0, sizeof(data->ghcb_page));
>
> + snp_data = memblock_alloc(sizeof(*snp_data), PAGE_SIZE);
> + if (!snp_data)
> + panic("Can't allocate SEV-SNP runtime data");
> +
> + err = early_set_memory_decrypted((unsigned long)&snp_data->hv_doorbell_page,
> + sizeof(snp_data->hv_doorbell_page));
> + if (err)
> + panic("Can't map #HV doorbell pages unencrypted");
> +
> + memset(&snp_data->hv_doorbell_page, 0, sizeof(snp_data->hv_doorbell_page));
> +
> + per_cpu(snp_runtime_data, cpu) = snp_data;
> +
> data->ghcb_active = false;
> data->backup_ghcb_active = false;
> }
> @@ -2045,7 +2237,12 @@ DEFINE_IDTENTRY_VC_USER(exc_vmm_communication)
>
> static bool hv_raw_handle_exception(struct pt_regs *regs)
> {
> - return false;
> + /* Clear the no_further_signal bit */
> + sev_snp_current_doorbell_page()->pending_events.events &= 0x7fff;
> +
> + check_hv_pending(regs);
> +
> + return true;
> }
>
> static __always_inline bool on_hv_fallback_stack(struct pt_regs *regs)
> diff --git a/arch/x86/kernel/traps.c b/arch/x86/kernel/traps.c
> index d29debec8134..1aa6cab2394b 100644
> --- a/arch/x86/kernel/traps.c
> +++ b/arch/x86/kernel/traps.c
> @@ -1503,5 +1503,7 @@ void __init trap_init(void)
> cpu_init_exception_handling();
> /* Setup traps as cpu_init() might #GP */
> idt_setup_traps();
> + sev_snp_init_hv_handling();
> +
> cpu_init();
> }


2023-02-17 12:48:12

by Gupta, Pankaj

[permalink] [raw]
Subject: Re: [RFC PATCH V3 00/16] x86/hyperv/sev: Add AMD sev-snp enlightened guest support on hyperv

On 2/9/2023 12:36 PM, Gupta, Pankaj wrote:
> Hi Tianyu,
>
>> This patchset is to add AMD sev-snp enlightened guest
>> support on hyperv. Hyperv uses Linux direct boot mode
>> to boot up Linux kernel and so it needs to pvalidate
>> system memory by itself.
>>
>> In hyperv case, there is no boot loader and so cc blob
>> is prepared by hypervisor. In this series, hypervisor
>> set the cc blob address directly into boot parameter
>> of Linux kernel. If the magic number on cc blob address
>> is valid, kernel will read cc blob.
>>
>> Shared memory between guests and hypervisor should be
>> decrypted and zero memory after decrypt memory. The data
>> in the target address. It maybe smearedto avoid smearing
>> data.
>>
>> Introduce #HV exception support in AMD sev snp code and
>> #HV handler.
>
> I am interested to test the Linux guest #HV exception handling (patches
> 12-16 in this series) for the restricted interrupt injection with the
> Linux/KVM host.
>
> Do you have a git tree or a base commit on which
> I can apply these patches?

Never mind. I could apply patches 12-16 on master (except for a minor
tweak in patch 14). Now I will try to test.

Thanks,
Pankaj


2023-02-18 07:16:20

by Tianyu Lan

[permalink] [raw]
Subject: Re: [RFC PATCH V3 00/16] x86/hyperv/sev: Add AMD sev-snp enlightened guest support on hyperv

On 2/17/2023 8:47 PM, Gupta, Pankaj wrote:
> On 2/9/2023 12:36 PM, Gupta, Pankaj wrote:
>> Hi Tianyu,
>>
>>> This patchset is to add AMD sev-snp enlightened guest
>>> support on hyperv. Hyperv uses Linux direct boot mode
>>> to boot up Linux kernel and so it needs to pvalidate
>>> system memory by itself.
>>>
>>> In hyperv case, there is no boot loader and so cc blob
>>> is prepared by hypervisor. In this series, hypervisor
>>> set the cc blob address directly into boot parameter
>>> of Linux kernel. If the magic number on cc blob address
>>> is valid, kernel will read cc blob.
>>>
>>> Shared memory between guests and hypervisor should be
>>> decrypted and zero memory after decrypt memory. The data
>>> in the target address. It maybe smearedto avoid smearing
>>> data.
>>>
>>> Introduce #HV exception support in AMD sev snp code and
>>> #HV handler.
>>
>> I am interested to test the Linux guest #HV exception handling
>> (patches 12-16 in this series) for the restricted interrupt injection
>> with the Linux/KVM host.
>>
>> Do you have a git tree or a base commit on which
>> I can apply these patches?
>
> Never mind. I could apply patches 12-16 on master (except for a minor
> tweak in patch 14). Now I will try to test.
>

Hi Pankaj:
Sorry, I missed your first mail. Please let me know about any issues on
the KVM side, if any. Thanks in advance.

2023-02-21 16:46:53

by Gupta, Pankaj

[permalink] [raw]
Subject: Re: [RFC PATCH V3 16/16] x86/sev: Fix interrupt exit code paths from #HV exception

On 1/22/2023 3:46 AM, Tianyu Lan wrote:
> From: Ashish Kalra <[email protected]>
>
> Add checks in the interrupt exit code paths for returns to user
> mode: if we are currently executing the #HV handler, don't follow
> the irqentry_exit_to_user_mode path, as that can potentially cause
> the #HV handler to be preempted and rescheduled on another CPU.
> A #HV handler rescheduled on another cpu causes interrupts to be
> handled on a different cpu than the one they were injected on,
> leading to invalid EOIs, missed/lost guest interrupts, and
> corresponding hangs and/or per-cpu IRQs handled on a
> non-intended cpu.
>
> Signed-off-by: Ashish Kalra <[email protected]>
> ---
> arch/x86/include/asm/idtentry.h | 66 +++++++++++++++++++++++++++++++++
> arch/x86/kernel/sev.c | 30 +++++++++++++++
> 2 files changed, 96 insertions(+)
>
> diff --git a/arch/x86/include/asm/idtentry.h b/arch/x86/include/asm/idtentry.h
> index 652fea10d377..45b47132be7c 100644
> --- a/arch/x86/include/asm/idtentry.h
> +++ b/arch/x86/include/asm/idtentry.h
> @@ -13,6 +13,10 @@
>
> #include <asm/irq_stack.h>
>
> +#ifdef CONFIG_AMD_MEM_ENCRYPT
> +noinstr void irqentry_exit_hv_cond(struct pt_regs *regs, irqentry_state_t state);
> +#endif
> +
> /**
> * DECLARE_IDTENTRY - Declare functions for simple IDT entry points
> * No error code pushed by hardware
> @@ -176,6 +180,7 @@ __visible noinstr void func(struct pt_regs *regs, unsigned long error_code)
> #define DECLARE_IDTENTRY_IRQ(vector, func) \
> DECLARE_IDTENTRY_ERRORCODE(vector, func)
>
> +#ifndef CONFIG_AMD_MEM_ENCRYPT
> /**
> * DEFINE_IDTENTRY_IRQ - Emit code for device interrupt IDT entry points
> * @func: Function name of the entry point
> @@ -205,6 +210,26 @@ __visible noinstr void func(struct pt_regs *regs, \
> } \
> \
> static noinline void __##func(struct pt_regs *regs, u32 vector)
> +#else
> +
> +#define DEFINE_IDTENTRY_IRQ(func) \
> +static void __##func(struct pt_regs *regs, u32 vector); \
> + \
> +__visible noinstr void func(struct pt_regs *regs, \
> + unsigned long error_code) \
> +{ \
> + irqentry_state_t state = irqentry_enter(regs); \
> + u32 vector = (u32)(u8)error_code; \
> + \
> + instrumentation_begin(); \
> + kvm_set_cpu_l1tf_flush_l1d(); \
> + run_irq_on_irqstack_cond(__##func, regs, vector); \
> + instrumentation_end(); \
> + irqentry_exit_hv_cond(regs, state); \
> +} \
> + \
> +static noinline void __##func(struct pt_regs *regs, u32 vector)
> +#endif
>
> /**
> * DECLARE_IDTENTRY_SYSVEC - Declare functions for system vector entry points
> @@ -221,6 +246,7 @@ static noinline void __##func(struct pt_regs *regs, u32 vector)
> #define DECLARE_IDTENTRY_SYSVEC(vector, func) \
> DECLARE_IDTENTRY(vector, func)
>
> +#ifndef CONFIG_AMD_MEM_ENCRYPT
> /**
> * DEFINE_IDTENTRY_SYSVEC - Emit code for system vector IDT entry points
> * @func: Function name of the entry point
> @@ -245,6 +271,26 @@ __visible noinstr void func(struct pt_regs *regs) \
> } \
> \
> static noinline void __##func(struct pt_regs *regs)
> +#else
> +
> +#define DEFINE_IDTENTRY_SYSVEC(func) \
> +static void __##func(struct pt_regs *regs); \
> + \
> +__visible noinstr void func(struct pt_regs *regs) \
> +{ \
> + irqentry_state_t state = irqentry_enter(regs); \
> + \
> + instrumentation_begin(); \
> + kvm_set_cpu_l1tf_flush_l1d(); \
> + run_sysvec_on_irqstack_cond(__##func, regs); \
> + instrumentation_end(); \
> + irqentry_exit_hv_cond(regs, state); \
> +} \
> + \
> +static noinline void __##func(struct pt_regs *regs)
> +#endif
> +
> +#ifndef CONFIG_AMD_MEM_ENCRYPT
>
> /**
> * DEFINE_IDTENTRY_SYSVEC_SIMPLE - Emit code for simple system vector IDT
> @@ -274,6 +320,26 @@ __visible noinstr void func(struct pt_regs *regs) \
> } \
> \
> static __always_inline void __##func(struct pt_regs *regs)
> +#else
> +
> +#define DEFINE_IDTENTRY_SYSVEC_SIMPLE(func) \
> +static __always_inline void __##func(struct pt_regs *regs); \
> + \
> +__visible noinstr void func(struct pt_regs *regs) \
> +{ \
> + irqentry_state_t state = irqentry_enter(regs); \
> + \
> + instrumentation_begin(); \
> + __irq_enter_raw(); \
> + kvm_set_cpu_l1tf_flush_l1d(); \
> + __##func(regs); \
> + __irq_exit_raw(); \
> + instrumentation_end(); \
> + irqentry_exit_hv_cond(regs, state); \
> +} \
> + \
> +static __always_inline void __##func(struct pt_regs *regs)
> +#endif
>
> /**
> * DECLARE_IDTENTRY_XENCB - Declare functions for XEN HV callback entry point
> diff --git a/arch/x86/kernel/sev.c b/arch/x86/kernel/sev.c
> index b1a98c2a52f8..23f15e95838b 100644
> --- a/arch/x86/kernel/sev.c
> +++ b/arch/x86/kernel/sev.c
> @@ -147,6 +147,10 @@ struct sev_hv_doorbell_page {
>
> struct sev_snp_runtime_data {
> struct sev_hv_doorbell_page hv_doorbell_page;
> + /*
> + * Indication that we are currently handling #HV events.
> + */
> + bool hv_handling_events;
> };
>
> static DEFINE_PER_CPU(struct sev_snp_runtime_data*, snp_runtime_data);
> @@ -200,6 +204,8 @@ static void do_exc_hv(struct pt_regs *regs)
> union hv_pending_events pending_events;
> u8 vector;
>
> + this_cpu_read(snp_runtime_data)->hv_handling_events = true;
> +
> while (sev_hv_pending()) {
> pending_events.events = xchg(
> &sev_snp_current_doorbell_page()->pending_events.events,
> @@ -234,6 +240,8 @@ static void do_exc_hv(struct pt_regs *regs)
> common_interrupt(regs, pending_events.vector);
> }
> }
> +
> + this_cpu_read(snp_runtime_data)->hv_handling_events = false;
> }
>
> static __always_inline bool on_vc_stack(struct pt_regs *regs)
> @@ -2529,3 +2537,25 @@ static int __init snp_init_platform_device(void)
> return 0;
> }
> device_initcall(snp_init_platform_device);
> +
> +noinstr void irqentry_exit_hv_cond(struct pt_regs *regs, irqentry_state_t state)
> +{

This code path is being called even for guests without SNP. I ran
a SEV guest and it crashed in this code path. Checking and returning
early made the (non-SNP) guest boot, though with some call traces. This
branch needs to be avoided for non-SNP guests and for the host as well.

Thanks,
Pankaj

+++ b/arch/x86/kernel/sev.c
@@ -2540,6 +2540,9 @@ device_initcall(snp_init_platform_device);

noinstr void irqentry_exit_hv_cond(struct pt_regs *regs, irqentry_state_t state)
{
+
+ if (!cc_platform_has(CC_ATTR_GUEST_SEV_SNP))
+ return;

> + /*
> + * Check whether this returns to user mode; if so, and if
> + * we are currently executing the #HV handler, then don't
> + * follow the irqentry_exit_to_user_mode path, as that can
> + * potentially cause the #HV handler to be preempted and
> + * rescheduled on another CPU. A #HV handler rescheduled on
> + * another cpu causes interrupts to be handled on a different
> + * cpu than the one they were injected on, leading to invalid
> + * EOIs, missed/lost guest interrupts, and corresponding
> + * hangs and/or per-cpu IRQs handled on a non-intended cpu.
> + */
> + if (user_mode(regs) &&
> + this_cpu_read(snp_runtime_data)->hv_handling_events)
> + return;
> +
> + /* follow normal interrupt return/exit path */
> + irqentry_exit(regs, state);
> +}
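
For reference, a variant of the early check that still performs the
normal exit work for non-SNP guests might look like this (an untested
sketch combining the check with the quoted function, not the author's
code):

noinstr void irqentry_exit_hv_cond(struct pt_regs *regs, irqentry_state_t state)
{
        /* Non-SNP guests and the host: always take the normal exit path. */
        if (!cc_platform_has(CC_ATTR_GUEST_SEV_SNP)) {
                irqentry_exit(regs, state);
                return;
        }

        /*
         * Returning to user mode while the #HV handler is executing
         * must not go through irqentry_exit_to_user_mode (see above).
         */
        if (user_mode(regs) &&
            this_cpu_read(snp_runtime_data)->hv_handling_events)
                return;

        irqentry_exit(regs, state);
}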


2023-03-01 19:34:36

by Gupta, Pankaj

[permalink] [raw]
Subject: Re: [RFC PATCH V3 14/16] x86/sev: Initialize #HV doorbell and handle interrupt requests

On 1/22/2023 3:46 AM, Tianyu Lan wrote:
> From: Tianyu Lan <[email protected]>
>
> Enable #HV exception to handle interrupt requests from hypervisor.
>
> Co-developed-by: Lendacky Thomas <[email protected]>
> Co-developed-by: Kalra Ashish <[email protected]>
> Signed-off-by: Tianyu Lan <[email protected]>
> ---
> arch/x86/include/asm/mem_encrypt.h | 2 +
> arch/x86/include/asm/msr-index.h | 6 +
> arch/x86/include/asm/svm.h | 12 +-
> arch/x86/include/uapi/asm/svm.h | 4 +
> arch/x86/kernel/sev.c | 307 +++++++++++++++++++++++------
> arch/x86/kernel/traps.c | 2 +
> 6 files changed, 272 insertions(+), 61 deletions(-)
>
> diff --git a/arch/x86/include/asm/mem_encrypt.h b/arch/x86/include/asm/mem_encrypt.h
> index 72ca90552b6a..7264ca5f5b2d 100644
> --- a/arch/x86/include/asm/mem_encrypt.h
> +++ b/arch/x86/include/asm/mem_encrypt.h
> @@ -50,6 +50,7 @@ void __init early_set_mem_enc_dec_hypercall(unsigned long vaddr, int npages,
> void __init mem_encrypt_free_decrypted_mem(void);
>
> void __init sev_es_init_vc_handling(void);
> +void __init sev_snp_init_hv_handling(void);
>
> #define __bss_decrypted __section(".bss..decrypted")
>
> @@ -72,6 +73,7 @@ static inline void __init sme_encrypt_kernel(struct boot_params *bp) { }
> static inline void __init sme_enable(struct boot_params *bp) { }
>
> static inline void sev_es_init_vc_handling(void) { }
> +static inline void sev_snp_init_hv_handling(void) { }
>
> static inline int __init
> early_set_memory_decrypted(unsigned long vaddr, unsigned long size) { return 0; }
> diff --git a/arch/x86/include/asm/msr-index.h b/arch/x86/include/asm/msr-index.h
> index 6a6e70e792a4..70af0ce5f2c4 100644
> --- a/arch/x86/include/asm/msr-index.h
> +++ b/arch/x86/include/asm/msr-index.h
> @@ -562,11 +562,17 @@
> #define MSR_AMD64_SEV_ENABLED_BIT 0
> #define MSR_AMD64_SEV_ES_ENABLED_BIT 1
> #define MSR_AMD64_SEV_SNP_ENABLED_BIT 2
> +#define MSR_AMD64_SEV_REFLECTVC_ENABLED_BIT 4
> +#define MSR_AMD64_SEV_RESTRICTED_INJECTION_ENABLED_BIT 5
> +#define MSR_AMD64_SEV_ALTERNATE_INJECTION_ENABLED_BIT 6
> #define MSR_AMD64_SEV_ENABLED BIT_ULL(MSR_AMD64_SEV_ENABLED_BIT)
> #define MSR_AMD64_SEV_ES_ENABLED BIT_ULL(MSR_AMD64_SEV_ES_ENABLED_BIT)
> #define MSR_AMD64_SEV_SNP_ENABLED BIT_ULL(MSR_AMD64_SEV_SNP_ENABLED_BIT)
> #define MSR_AMD64_SNP_VTOM_ENABLED BIT_ULL(3)
>
> +#define MSR_AMD64_SEV_REFLECTVC_ENABLED BIT_ULL(MSR_AMD64_SEV_REFLECTVC_ENABLED_BIT)
> +#define MSR_AMD64_SEV_RESTRICTED_INJECTION_ENABLED BIT_ULL(MSR_AMD64_SEV_RESTRICTED_INJECTION_ENABLED_BIT)
> +#define MSR_AMD64_SEV_ALTERNATE_INJECTION_ENABLED BIT_ULL(MSR_AMD64_SEV_ALTERNATE_INJECTION_ENABLED_BIT)
> #define MSR_AMD64_VIRT_SPEC_CTRL 0xc001011f
>
> /* AMD Collaborative Processor Performance Control MSRs */
> diff --git a/arch/x86/include/asm/svm.h b/arch/x86/include/asm/svm.h
> index f8b321a11ee4..911c991fec78 100644
> --- a/arch/x86/include/asm/svm.h
> +++ b/arch/x86/include/asm/svm.h
> @@ -568,12 +568,12 @@ static inline void __unused_size_checks(void)
>
> /* Check offsets of reserved fields */
>
> - BUILD_BUG_RESERVED_OFFSET(vmcb_save_area, 0xa0);
> - BUILD_BUG_RESERVED_OFFSET(vmcb_save_area, 0xcc);
> - BUILD_BUG_RESERVED_OFFSET(vmcb_save_area, 0xd8);
> - BUILD_BUG_RESERVED_OFFSET(vmcb_save_area, 0x180);
> - BUILD_BUG_RESERVED_OFFSET(vmcb_save_area, 0x248);
> - BUILD_BUG_RESERVED_OFFSET(vmcb_save_area, 0x298);
> +// BUILD_BUG_RESERVED_OFFSET(vmcb_save_area, 0xa0);
> +// BUILD_BUG_RESERVED_OFFSET(vmcb_save_area, 0xcc);
> +// BUILD_BUG_RESERVED_OFFSET(vmcb_save_area, 0xd8);
> +// BUILD_BUG_RESERVED_OFFSET(vmcb_save_area, 0x180);
> +// BUILD_BUG_RESERVED_OFFSET(vmcb_save_area, 0x248);
> +// BUILD_BUG_RESERVED_OFFSET(vmcb_save_area, 0x298);
>
> BUILD_BUG_RESERVED_OFFSET(sev_es_save_area, 0xc8);
> BUILD_BUG_RESERVED_OFFSET(sev_es_save_area, 0xcc);
> diff --git a/arch/x86/include/uapi/asm/svm.h b/arch/x86/include/uapi/asm/svm.h
> index f69c168391aa..85d6882262e7 100644
> --- a/arch/x86/include/uapi/asm/svm.h
> +++ b/arch/x86/include/uapi/asm/svm.h
> @@ -115,6 +115,10 @@
> #define SVM_VMGEXIT_AP_CREATE_ON_INIT 0
> #define SVM_VMGEXIT_AP_CREATE 1
> #define SVM_VMGEXIT_AP_DESTROY 2
> +#define SVM_VMGEXIT_HV_DOORBELL_PAGE 0x80000014
> +#define SVM_VMGEXIT_GET_PREFERRED_HV_DOORBELL_PAGE 0
> +#define SVM_VMGEXIT_SET_HV_DOORBELL_PAGE 1
> +#define SVM_VMGEXIT_QUERY_HV_DOORBELL_PAGE 2
> #define SVM_VMGEXIT_HV_FEATURES 0x8000fffd
> #define SVM_VMGEXIT_UNSUPPORTED_EVENT 0x8000ffff
>
> diff --git a/arch/x86/kernel/sev.c b/arch/x86/kernel/sev.c
> index fe5e5e41433d..03d99fad9e76 100644
> --- a/arch/x86/kernel/sev.c
> +++ b/arch/x86/kernel/sev.c
> @@ -122,6 +122,150 @@ struct sev_config {
>
> static struct sev_config sev_cfg __read_mostly;
>
> +static noinstr struct ghcb *__sev_get_ghcb(struct ghcb_state *state);
> +static noinstr void __sev_put_ghcb(struct ghcb_state *state);
> +static int vmgexit_hv_doorbell_page(struct ghcb *ghcb, u64 op, u64 pa);
> +static void sev_snp_setup_hv_doorbell_page(struct ghcb *ghcb);
> +
> +union hv_pending_events {
> + u16 events;
> + struct {
> + u8 vector;
> + u8 nmi : 1;
> + u8 mc : 1;
> + u8 reserved1 : 5;
> + u8 no_further_signal : 1;
> + };
> +};
> +
> +struct sev_hv_doorbell_page {
> + union hv_pending_events pending_events;
> + u8 no_eoi_required;
> + u8 reserved2[61];
> + u8 padding[4032];
> +};
> +
> +struct sev_snp_runtime_data {
> + struct sev_hv_doorbell_page hv_doorbell_page;
> +};
> +
> +static DEFINE_PER_CPU(struct sev_snp_runtime_data*, snp_runtime_data);
> +
> +static inline u64 sev_es_rd_ghcb_msr(void)
> +{
> + return __rdmsr(MSR_AMD64_SEV_ES_GHCB);
> +}
> +
> +static __always_inline void sev_es_wr_ghcb_msr(u64 val)
> +{
> + u32 low, high;
> +
> + low = (u32)(val);
> + high = (u32)(val >> 32);
> +
> + native_wrmsr(MSR_AMD64_SEV_ES_GHCB, low, high);
> +}
> +
> +struct sev_hv_doorbell_page *sev_snp_current_doorbell_page(void)
> +{
> + return &this_cpu_read(snp_runtime_data)->hv_doorbell_page;
> +}
> +
> +static u8 sev_hv_pending(void)
> +{
> + return sev_snp_current_doorbell_page()->pending_events.events;
> +}
> +
> +static void hv_doorbell_apic_eoi_write(u32 reg, u32 val)
> +{
> + if (xchg(&sev_snp_current_doorbell_page()->no_eoi_required, 0) & 0x1)
> + return;
> +
> + BUG_ON(reg != APIC_EOI);
> + apic->write(reg, val);
> +}
> +
> +static void do_exc_hv(struct pt_regs *regs)
> +{
> + union hv_pending_events pending_events;
> + u8 vector;
> +
> + while (sev_hv_pending()) {
> + pending_events.events = xchg(
> + &sev_snp_current_doorbell_page()->pending_events.events,
> + 0);
> +
> + if (pending_events.nmi)
> + exc_nmi(regs);
> +
> +#ifdef CONFIG_X86_MCE
> + if (pending_events.mc)
> + exc_machine_check(regs);
> +#endif
> +
> + if (!pending_events.vector)
> + return;
> +
> + if (pending_events.vector < FIRST_EXTERNAL_VECTOR) {
> + /* Exception vectors */
> + WARN(1, "exception shouldn't happen\n");
> + } else if (pending_events.vector == FIRST_EXTERNAL_VECTOR) {
> + sysvec_irq_move_cleanup(regs);
> + } else if (pending_events.vector == IA32_SYSCALL_VECTOR) {
> + WARN(1, "syscall shouldn't happen\n");
> + } else if (pending_events.vector >= FIRST_SYSTEM_VECTOR) {
> + switch (pending_events.vector) {
> +#if IS_ENABLED(CONFIG_HYPERV)
> + case HYPERV_STIMER0_VECTOR:
> + sysvec_hyperv_stimer0(regs);
> + break;
> + case HYPERVISOR_CALLBACK_VECTOR:
> + sysvec_hyperv_callback(regs);
> + break;
> +#endif
> +#ifdef CONFIG_SMP
> + case RESCHEDULE_VECTOR:
> + sysvec_reschedule_ipi(regs);
> + break;
> + case IRQ_MOVE_CLEANUP_VECTOR:
> + sysvec_irq_move_cleanup(regs);
> + break;
> + case REBOOT_VECTOR:
> + sysvec_reboot(regs);
> + break;
> + case CALL_FUNCTION_SINGLE_VECTOR:
> + sysvec_call_function_single(regs);
> + break;
> + case CALL_FUNCTION_VECTOR:
> + sysvec_call_function(regs);
> + break;
> +#endif
> +#ifdef CONFIG_X86_LOCAL_APIC
> + case ERROR_APIC_VECTOR:
> + sysvec_error_interrupt(regs);
> + break;
> + case SPURIOUS_APIC_VECTOR:
> + sysvec_spurious_apic_interrupt(regs);
> + break;
> + case LOCAL_TIMER_VECTOR:
> + sysvec_apic_timer_interrupt(regs);
> + break;
> + case X86_PLATFORM_IPI_VECTOR:
> + sysvec_x86_platform_ipi(regs);
> + break;
> +#endif
> + case 0x0:
> + break;
> + default:
> + panic("Unexpected vector %d\n", vector);
> + unreachable();
> + }
> + } else {
> + common_interrupt(regs, pending_events.vector);
> + }
> + }
> +}
> +
> static __always_inline bool on_vc_stack(struct pt_regs *regs)
> {
> unsigned long sp = regs->sp;
> @@ -179,11 +323,6 @@ void noinstr __sev_es_ist_enter(struct pt_regs *regs)
> this_cpu_write(cpu_tss_rw.x86_tss.ist[IST_INDEX_VC], new_ist);
> }
>
> -static void do_exc_hv(struct pt_regs *regs)
> -{
> - /* Handle #HV exception. */
> -}
> -
> void check_hv_pending(struct pt_regs *regs)
> {
> if (!cc_platform_has(CC_ATTR_GUEST_SEV_SNP))
> @@ -232,68 +371,38 @@ void noinstr __sev_es_ist_exit(void)
> this_cpu_write(cpu_tss_rw.x86_tss.ist[IST_INDEX_VC], *(unsigned long *)ist);
> }
>
> -/*
> - * Nothing shall interrupt this code path while holding the per-CPU
> - * GHCB. The backup GHCB is only for NMIs interrupting this path.
> - *
> - * Callers must disable local interrupts around it.
> - */
> -static noinstr struct ghcb *__sev_get_ghcb(struct ghcb_state *state)
> +static bool sev_restricted_injection_enabled(void)
> +{
> + return sev_status & MSR_AMD64_SEV_RESTRICTED_INJECTION_ENABLED;
> +}
> +
> +void __init sev_snp_init_hv_handling(void)
> {
> + struct sev_snp_runtime_data *snp_data;
> struct sev_es_runtime_data *data;
> + struct ghcb_state state;
> struct ghcb *ghcb;
> + unsigned long flags;
> + int cpu;
> + int err;
>
> WARN_ON(!irqs_disabled());
> + if (!cc_platform_has(CC_ATTR_GUEST_SEV_SNP) || !sev_restricted_injection_enabled())
> + return;
>
> data = this_cpu_read(runtime_data);
> - ghcb = &data->ghcb_page;
>
> - if (unlikely(data->ghcb_active)) {
> - /* GHCB is already in use - save its contents */
> -
> - if (unlikely(data->backup_ghcb_active)) {
> - /*
> - * Backup-GHCB is also already in use. There is no way
> - * to continue here so just kill the machine. To make
> - * panic() work, mark GHCBs inactive so that messages
> - * can be printed out.
> - */
> - data->ghcb_active = false;
> - data->backup_ghcb_active = false;
> -
> - instrumentation_begin();
> - panic("Unable to handle #VC exception! GHCB and Backup GHCB are already in use");
> - instrumentation_end();
> - }
> -
> - /* Mark backup_ghcb active before writing to it */
> - data->backup_ghcb_active = true;
> -
> - state->ghcb = &data->backup_ghcb;
> + local_irq_save(flags);
>
> - /* Backup GHCB content */
> - *state->ghcb = *ghcb;
> - } else {
> - state->ghcb = NULL;
> - data->ghcb_active = true;
> - }
> + ghcb = __sev_get_ghcb(&state);
>
> - return ghcb;
> -}
> + sev_snp_setup_hv_doorbell_page(ghcb);
>
> -static inline u64 sev_es_rd_ghcb_msr(void)
> -{
> - return __rdmsr(MSR_AMD64_SEV_ES_GHCB);
> -}
> -
> -static __always_inline void sev_es_wr_ghcb_msr(u64 val)
> -{
> - u32 low, high;
> + __sev_put_ghcb(&state);
>
> - low = (u32)(val);
> - high = (u32)(val >> 32);
> + apic_set_eoi_write(hv_doorbell_apic_eoi_write);
>
> - native_wrmsr(MSR_AMD64_SEV_ES_GHCB, low, high);
> + local_irq_restore(flags);
> }
>
> static int vc_fetch_insn_kernel(struct es_em_ctxt *ctxt,
> @@ -554,6 +663,69 @@ static enum es_result vc_slow_virt_to_phys(struct ghcb *ghcb, struct es_em_ctxt
> /* Include code shared with pre-decompression boot stage */
> #include "sev-shared.c"
>
> +/*
> + * Nothing shall interrupt this code path while holding the per-CPU
> + * GHCB. The backup GHCB is only for NMIs interrupting this path.
> + *
> + * Callers must disable local interrupts around it.
> + */
> +static noinstr struct ghcb *__sev_get_ghcb(struct ghcb_state *state)
> +{
> + struct sev_es_runtime_data *data;
> + struct ghcb *ghcb;
> +
> + WARN_ON(!irqs_disabled());
> +
> + data = this_cpu_read(runtime_data);
> + ghcb = &data->ghcb_page;
> +
> + if (unlikely(data->ghcb_active)) {
> + /* GHCB is already in use - save its contents */
> +
> + if (unlikely(data->backup_ghcb_active)) {
> + /*
> + * Backup-GHCB is also already in use. There is no way
> + * to continue here so just kill the machine. To make
> + * panic() work, mark GHCBs inactive so that messages
> + * can be printed out.
> + */
> + data->ghcb_active = false;
> + data->backup_ghcb_active = false;
> +
> + instrumentation_begin();
> + panic("Unable to handle #VC exception! GHCB and Backup GHCB are already in use");
> + instrumentation_end();
> + }
> +
> + /* Mark backup_ghcb active before writing to it */
> + data->backup_ghcb_active = true;
> +
> + state->ghcb = &data->backup_ghcb;
> +
> + /* Backup GHCB content */
> + *state->ghcb = *ghcb;
> + } else {
> + state->ghcb = NULL;
> + data->ghcb_active = true;
> + }
> +
> + return ghcb;
> +}
> +
> +static void sev_snp_setup_hv_doorbell_page(struct ghcb *ghcb)
> +{
> + u64 pa;
> + enum es_result ret;
> +
> + pa = __pa(sev_snp_current_doorbell_page());
> + vc_ghcb_invalidate(ghcb);
> + ret = vmgexit_hv_doorbell_page(ghcb,
> + SVM_VMGEXIT_SET_HV_DOORBELL_PAGE,
> + pa);
> + if (ret != ES_OK)
> + panic("SEV-SNP: failed to set up #HV doorbell page");
> +}
> +
> static noinstr void __sev_put_ghcb(struct ghcb_state *state)
> {
> struct sev_es_runtime_data *data;
> @@ -1282,6 +1454,7 @@ static void snp_register_per_cpu_ghcb(void)
> ghcb = &data->ghcb_page;
>
> snp_register_ghcb_early(__pa(ghcb));
> + sev_snp_setup_hv_doorbell_page(ghcb);
> }
>
> void setup_ghcb(void)
> @@ -1321,6 +1494,11 @@ void setup_ghcb(void)
> snp_register_ghcb_early(__pa(&boot_ghcb_page));
> }
>
> +int vmgexit_hv_doorbell_page(struct ghcb *ghcb, u64 op, u64 pa)
> +{
> + return sev_es_ghcb_hv_call(ghcb, NULL, SVM_VMGEXIT_HV_DOORBELL_PAGE, op, pa);
> +}
> +
> #ifdef CONFIG_HOTPLUG_CPU
> static void sev_es_ap_hlt_loop(void)
> {
> @@ -1394,6 +1572,7 @@ static void __init alloc_runtime_data(int cpu)
> static void __init init_ghcb(int cpu)
> {
> struct sev_es_runtime_data *data;
> + struct sev_snp_runtime_data *snp_data;
> int err;
>
> data = per_cpu(runtime_data, cpu);
> @@ -1405,6 +1584,19 @@ static void __init init_ghcb(int cpu)
>
> memset(&data->ghcb_page, 0, sizeof(data->ghcb_page));
>
> + snp_data = memblock_alloc(sizeof(*snp_data), PAGE_SIZE);
> + if (!snp_data)
> + panic("Can't allocate SEV-SNP runtime data");
> +
> + err = early_set_memory_decrypted((unsigned long)&snp_data->hv_doorbell_page,
> + sizeof(snp_data->hv_doorbell_page));
> + if (err)
> + panic("Can't map #HV doorbell pages unencrypted");
> +
> + memset(&snp_data->hv_doorbell_page, 0, sizeof(snp_data->hv_doorbell_page));
> +
> + per_cpu(snp_runtime_data, cpu) = snp_data;
> +
> data->ghcb_active = false;
> data->backup_ghcb_active = false;
> }
> @@ -2045,7 +2237,12 @@ DEFINE_IDTENTRY_VC_USER(exc_vmm_communication)
>
> static bool hv_raw_handle_exception(struct pt_regs *regs)
> {
> - return false;
> + /* Clear the no_further_signal bit */
> + sev_snp_current_doorbell_page()->pending_events.events &= 0x7fff;

Do we need to clear "no_further_signal" here, given that we also reset
it in the handler (do_exc_hv())?
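
For reference, no_further_signal is bit 15 of the 16-bit events word in
the layout defined earlier in this patch, so on x86 (little-endian) the
0x7fff mask clears only that bit and preserves any pending vector. A
quick illustration with a hypothetical value:

        union hv_pending_events ev = { .events = 0x8021 };
        /* vector = 0x21 (low byte), no_further_signal = 1 (bit 15) */
        ev.events &= 0x7fff;    /* clears only no_further_signal */
        /* ev.events is now 0x0021: the pending vector is preserved */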

Thanks,
Pankaj

> +
> + check_hv_pending(regs);
> +
> + return true;
> }
>
> static __always_inline bool on_hv_fallback_stack(struct pt_regs *regs)
> diff --git a/arch/x86/kernel/traps.c b/arch/x86/kernel/traps.c
> index d29debec8134..1aa6cab2394b 100644
> --- a/arch/x86/kernel/traps.c
> +++ b/arch/x86/kernel/traps.c
> @@ -1503,5 +1503,7 @@ void __init trap_init(void)
> cpu_init_exception_handling();
> /* Setup traps as cpu_init() might #GP */
> idt_setup_traps();
> + sev_snp_init_hv_handling();
> +
> cpu_init();
> }


2023-03-10 15:46:32

by Gupta, Pankaj

[permalink] [raw]
Subject: Re: [RFC PATCH V3 00/16] x86/hyperv/sev: Add AMD sev-snp enlightened guest support on hyperv



Hi Tianyu,

While testing the guest patches on a KVM host, my guest kernel gets
stuck at early bootup. It does not seem to be a hang, but rather a loop
where interrupts are processed repeatedly from the
"pv_native_irq_enable" path and prevent the boot process from making
progress, IIUC. Did you face any such scenario in your testing?

It seems to me "native_irq_enable" enable interrupts and
"check_hv_pending_irq_enable" starts handling the interrupts (after
disabling irqs). But "check_hv_pending_irq_enable=>do_exc_hv" can again
call "pv_native_irq_enable" in interrupt handling path and execute the
same loop?
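
If that is what happens, one way to break such a cycle might be a
per-CPU re-entrancy guard around the drain loop; a rough sketch (purely
hypothetical, not from this series, and the flag name is made up):

        static DEFINE_PER_CPU(bool, hv_draining);       /* hypothetical */

        void check_hv_pending_irq_enable(void)
        {
                if (this_cpu_read(hv_draining))
                        return;         /* already draining on this cpu */

                this_cpu_write(hv_draining, true);
                /* ... existing irqs-off drain via do_exc_hv() ... */
                this_cpu_write(hv_draining, false);
        }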

I am also pasting the stack dump below [1].

Thanks,
Pankaj

[1]
[ 20.530786] Call Trace:
[ 20.531099] <IRQ>
[ 20.531360] dump_stack_lvl+0x4d/0x67
[ 20.531820] dump_stack+0x14/0x1a
[ 20.532235] do_exc_hv.cold+0x11/0xec
[ 20.532792] check_hv_pending_irq_enable+0x64/0x80
[ 20.533390] pv_native_irq_enable+0xe/0x20 ====> here
[ 20.533902] __do_softirq+0x89/0x2f3
[ 20.534352] __irq_exit_rcu+0x9f/0x110
[ 20.534825] irq_exit_rcu+0x12/0x20
[ 20.535267] common_interrupt+0xca/0xf0
[ 20.535745] </IRQ>
[ 20.536014] <TASK>
[ 20.536286] do_exc_hv.cold+0xda/0xec
[ 20.536826] check_hv_pending_irq_enable+0x64/0x80
[ 20.537429] pv_native_irq_enable+0xe/0x20 ====> here
[ 20.537942] _raw_spin_unlock_irqrestore+0x21/0x50
[ 20.538539] __setup_irq+0x3be/0x740
[ 20.538990] request_threaded_irq+0x116/0x180
[ 20.539533] hpet_time_init+0x35/0x56
[ 20.539994] x86_late_time_init+0x1f/0x3d
[ 20.540556] start_kernel+0x8af/0x970
[ 20.541033] x86_64_start_reservations+0x28/0x2e
[ 20.541607] x86_64_start_kernel+0x96/0xa0
[ 20.542126] secondary_startup_64_no_verify+0xe5/0xeb
[ 20.542757] </TASK>

2023-03-10 16:06:11

by Tianyu Lan

[permalink] [raw]
Subject: Re: [RFC PATCH V3 16/16] x86/sev: Fix interrupt exit code paths from #HV exception

On 2/22/2023 12:44 AM, Gupta, Pankaj wrote:
>> @@ -2529,3 +2537,25 @@ static int __init snp_init_platform_device(void)
>>       return 0;
>>   }
>>   device_initcall(snp_init_platform_device);
>> +
>> +noinstr void irqentry_exit_hv_cond(struct pt_regs *regs, irqentry_state_t state)
>> +{
>
> This code path is being called even for guests without SNP. I ran
> a SEV guest and it crashed in this code path. Checking and returning
> early made the (non-SNP) guest boot, though with some call traces. This
> branch needs to be avoided for non-SNP guests and for the host as well.
>

Nice catch! I will fix it in the next version.

Thanks.

2023-03-10 16:24:16

by Tianyu Lan

[permalink] [raw]
Subject: Re: [RFC PATCH V3 00/16] x86/hyperv/sev: Add AMD sev-snp enlightened guest support on hyperv


On 3/10/2023 11:35 PM, Gupta, Pankaj wrote:
>
>
> Hi Tianyu,
>
> While testing the guest patches on a KVM host, my guest kernel gets
> stuck at early bootup. It does not seem to be a hang, but rather a loop
> where interrupts are processed repeatedly from the
> "pv_native_irq_enable" path and prevent the boot process from making
> progress, IIUC. Did you face any such scenario in your testing?
>
> It seems to me "native_irq_enable" enable interrupts and
> "check_hv_pending_irq_enable" starts handling the interrupts (after
> disabling irqs). But "check_hv_pending_irq_enable=>do_exc_hv" can again
> call "pv_native_irq_enable" in interrupt handling path and execute the
> same loop?


I haven't hit this issue. Thanks for the report. I will double-check
and report back.

> I am also pasting the stack dump below [1].
>
> Thanks,
> Pankaj
>
> [1]
> [   20.530786] Call Trace:
> [   20.531099]  <IRQ>
> [   20.531360]  dump_stack_lvl+0x4d/0x67
> [   20.531820]  dump_stack+0x14/0x1a
> [   20.532235]  do_exc_hv.cold+0x11/0xec
> [   20.532792]  check_hv_pending_irq_enable+0x64/0x80
> [   20.533390]  pv_native_irq_enable+0xe/0x20   ====> here
> [   20.533902]  __do_softirq+0x89/0x2f3
> [   20.534352]  __irq_exit_rcu+0x9f/0x110
> [   20.534825]  irq_exit_rcu+0x12/0x20
> [   20.535267]  common_interrupt+0xca/0xf0
> [   20.535745]  </IRQ>
> [   20.536014]  <TASK>
> [   20.536286]  do_exc_hv.cold+0xda/0xec
> [   20.536826]  check_hv_pending_irq_enable+0x64/0x80
> [   20.537429]  pv_native_irq_enable+0xe/0x20    ====> here
> [   20.537942]  _raw_spin_unlock_irqrestore+0x21/0x50
> [   20.538539]  __setup_irq+0x3be/0x740
> [   20.538990]  request_threaded_irq+0x116/0x180
> [   20.539533]  hpet_time_init+0x35/0x56
> [   20.539994]  x86_late_time_init+0x1f/0x3d
> [   20.540556]  start_kernel+0x8af/0x970
> [   20.541033]  x86_64_start_reservations+0x28/0x2e
> [   20.541607]  x86_64_start_kernel+0x96/0xa0
> [   20.542126]  secondary_startup_64_no_verify+0xe5/0xeb
> [   20.542757]  </TASK>

2023-03-15 06:40:42

by Gupta, Pankaj

[permalink] [raw]
Subject: Re: [RFC PATCH V3 00/16] x86/hyperv/sev: Add AMD sev-snp enlightened guest support on hyperv

Hi Tianyu,

>> Hi Tianyu,
>>
>> While testing the guest patches on a KVM host, my guest kernel gets
>> stuck at early bootup. It does not seem to be a hang, but rather a
>> loop where interrupts are processed repeatedly from the
>> "pv_native_irq_enable" path and prevent the boot process from making
>> progress, IIUC. Did you face any such scenario in your testing?
>>
>> It seems to me "native_irq_enable" enable interrupts and
>> "check_hv_pending_irq_enable" starts handling the interrupts (after
>> disabling irqs). But "check_hv_pending_irq_enable=>do_exc_hv" can
>> again call "pv_native_irq_enable" in interrupt handling path and
>> execute the same loop?
>
>
> I haven't hit this issue. Thanks for the report. I will double-check
> and report back.

Thank you!

More testing with the patches: after I commented out "do_exc_hv" in the
pv_native_irq_enable()->check_hv_pending_irq_enable() code path, I now
get the stack trace below [2] repeatedly when I dump the stack.

It seems to me that, after the IST stack return from #VC handling
for "native_cpuid", paranoid_exit=>"do_exc_hv" is handling interrupts.
Since we don't disable interrupts in check_hv_pending()=>do_exc_hv(),
interrupts are handled continuously here. This also prevents the boot
processor from making progress; it is stuck here.
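
For illustration, keeping interrupts masked for the whole drain would
look roughly like this (a sketch against the quoted patch, not a tested
fix):

        void check_hv_pending(struct pt_regs *regs)
        {
                unsigned long flags;

                if (!cc_platform_has(CC_ATTR_GUEST_SEV_SNP))
                        return;

                local_irq_save(flags);
                do_exc_hv(regs);        /* drain doorbell with irqs masked */
                local_irq_restore(flags);
        }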

Thoughts, please? I might be missing some important details here.

Thanks,
Pankaj

[2]

[ 59.845396] Call Trace:
[ 59.845703] <TASK>
[ 59.845980] dump_stack_lvl+0x4d/0x67
[ 59.846432] dump_stack+0x14/0x1a
[ 59.846842] do_exc_hv.cold+0x22/0xfd
[ 59.847301] check_hv_pending+0x38/0x50
[ 59.847773] paranoid_exit+0x8/0x70
[ 59.848205] RIP: 0010:native_cpuid+0x19/0x30
[ 59.848729] Code: 00 90 90 90 90 90 90 90 90 90 90 90 90 90 90 90 90
f3 0f 1e fa 55 49 89 f8 49 89 c9 48 89 d7 41 8b 00 48 89 e5 53 8b 0a 0f
a2 <41> 89 00 89 1e 48 8b 5d f8 89 0f 41 89 11 c9 e9 f7 bc df 00 0f 1f
[ 59.850995] RSP: 0000:ffffffffbd403e48 EFLAGS: 00010202
[ 59.851636] RAX: 000000000100007b RBX: 0000000000000000 RCX: 0000000000000000
[ 59.852498] RDX: 0000000000000000 RSI: ffffffffbd403e64 RDI: ffffffffbd403e68
[ 59.853361] RBP: ffffffffbd403e50 R08: ffffffffbd403e60 R09: ffffffffbd403e6c
[ 59.854240] R10: ffffffffbd403d10 R11: ffff9af5bff3cfe8 R12: 0000000000000056
[ 59.855111] R13: ffff9af5bffc8e40 R14: 0000000000000000 R15: ffffffffbd41a120
[ 59.855976] kvm_arch_para_features+0x4e/0x80
[ 59.856511] pv_ipi_supported+0xe/0x34
[ 59.856973] kvm_apic_init+0x12/0x3f
[ 59.857414] apic_intr_mode_init+0x8d/0x10d
[ 59.857939] x86_late_time_init+0x28/0x3d
[ 59.858435] start_kernel+0x8af/0x970
[ 59.858894] x86_64_start_reservations+0x28/0x2e
[ 59.859461] x86_64_start_kernel+0x96/0xa0
[ 59.859965] secondary_startup_64_no_verify+0xe5/0xeb
[ 59.860583] </TASK>