2022-11-19 03:58:00

by Tianyu Lan

[permalink] [raw]
Subject: [RFC PATCH V2 00/18] x86/hyperv/sev: Add AMD sev-snp enlightened guest support on hyperv

From: Tianyu Lan <[email protected]>

This patchset is to add AMD sev-snp enlightened guest
support on hyperv. Hyperv uses Linux direct boot mode
to boot up Linux kernel and so it needs to pvalidate
system memory by itself.

In hyperv case, there is no boot loader and so cc blob
is prepared by hypervisor. In this series, hypervisor
set the cc blob address directly into boot parameter
of Linux kernel. If the magic number on cc blob address
is valid, kernel will read cc blob.

Shared memory between guests and hypervisor should be
decrypted and zero memory after decrypt memory. The data
in the target address. It maybe smearedto avoid smearing
data.

Introduce #HV exception support in AMD sev snp code and
#HV handler.

Change since v1:
- Remove boot param changes for cc blob address and
use setup head to pass cc blob info
- Remove unnessary WARN and BUG check
- Add system vector table map in the #HV exception
- Fix interrupt exit issue when use #HV exception

Ashish Kalra (2):
x86/sev: optimize system vector processing invoked from #HV exception
x86/sev: Fix interrupt exit code paths from #HV exception

Tianyu Lan (16):
x86/sev: Pvalidate memory gab for decompressing kernel
x86/hyperv: Add sev-snp enlightened guest specific config
x86/hyperv: apic change for sev-snp enlightened guest
x86/hyperv: Decrypt hv vp assist page in sev-snp enlightened guest
x86/hyperv: Get Virtual Trust Level via hvcall
x86/hyperv: Use vmmcall to implement hvcall in sev-snp enlightened
guest
clocksource: hyper-v: decrypt hyperv tsc page in sev-snp enlightened
guest
x86/hyperv: decrypt vmbus pages for sev-snp enlightened guest
x86/hyperv: set target vtl in the vmbus init message
drivers: hv: Decrypt percpu hvcall input arg page in sev-snp
enlightened guest
Drivers: hv: vmbus: Decrypt vmbus ring buffer
x86/hyperv: Initialize cpu and memory for sev-snp enlightened guest
x86/hyperv: Add smp support for sev-snp guest
x86/hyperv: Add hyperv-specific hadling for VMMCALL under SEV-ES
x86/sev: Add a #HV exception handler
x86/sev: Initialize #HV doorbell and handle interrupt requests

arch/x86/boot/compressed/head_64.S | 8 +
arch/x86/boot/compressed/sev.c | 84 ++++++
arch/x86/entry/entry_64.S | 82 +++++
arch/x86/hyperv/hv_apic.c | 79 +++--
arch/x86/hyperv/hv_init.c | 47 +++
arch/x86/hyperv/ivm.c | 12 +-
arch/x86/include/asm/cpu_entry_area.h | 6 +
arch/x86/include/asm/idtentry.h | 107 ++++++-
arch/x86/include/asm/irqflags.h | 19 ++
arch/x86/include/asm/mem_encrypt.h | 2 +
arch/x86/include/asm/mshyperv.h | 68 +++--
arch/x86/include/asm/msr-index.h | 6 +
arch/x86/include/asm/page_64_types.h | 1 +
arch/x86/include/asm/sev.h | 13 +
arch/x86/include/asm/svm.h | 55 +++-
arch/x86/include/asm/trapnr.h | 1 +
arch/x86/include/asm/traps.h | 1 +
arch/x86/include/uapi/asm/svm.h | 4 +
arch/x86/kernel/cpu/common.c | 1 +
arch/x86/kernel/cpu/mshyperv.c | 267 ++++++++++++++++-
arch/x86/kernel/dumpstack_64.c | 9 +-
arch/x86/kernel/idt.c | 1 +
arch/x86/kernel/sev.c | 412 ++++++++++++++++++++++----
arch/x86/kernel/traps.c | 50 ++++
arch/x86/kernel/vmlinux.lds.S | 7 +
arch/x86/mm/cpu_entry_area.c | 2 +
drivers/clocksource/hyperv_timer.c | 2 +-
drivers/hv/connection.c | 14 +
drivers/hv/hv.c | 32 +-
drivers/hv/hv_common.c | 22 ++
drivers/hv/ring_buffer.c | 7 +-
include/asm-generic/hyperv-tlfs.h | 19 ++
include/asm-generic/mshyperv.h | 2 +
include/linux/hyperv.h | 4 +-
34 files changed, 1340 insertions(+), 106 deletions(-)

--
2.25.1



2022-11-19 04:00:12

by Tianyu Lan

[permalink] [raw]
Subject: [RFC PATCH V2 02/18] x86/hyperv: Add sev-snp enlightened guest specific config

From: Tianyu Lan <[email protected]>

Introduce static key isolation_type_en_snp for enlightened
guest check and add some specific options in ms_hyperv_init_
platform().

Signed-off-by: Tianyu Lan <[email protected]>
---
arch/x86/hyperv/ivm.c | 12 +++++++++++-
arch/x86/include/asm/mshyperv.h | 2 ++
arch/x86/kernel/cpu/mshyperv.c | 29 ++++++++++++++++++++++++-----
drivers/hv/hv_common.c | 7 +++++++
4 files changed, 44 insertions(+), 6 deletions(-)

diff --git a/arch/x86/hyperv/ivm.c b/arch/x86/hyperv/ivm.c
index 1dbcbd9da74d..e9c30dad3419 100644
--- a/arch/x86/hyperv/ivm.c
+++ b/arch/x86/hyperv/ivm.c
@@ -259,10 +259,20 @@ bool hv_is_isolation_supported(void)
}

DEFINE_STATIC_KEY_FALSE(isolation_type_snp);
+DEFINE_STATIC_KEY_FALSE(isolation_type_en_snp);
+
+/*
+ * hv_isolation_type_en_snp - Check system runs in the AMD SEV-SNP based
+ * isolation enlightened VM.
+ */
+bool hv_isolation_type_en_snp(void)
+{
+ return static_branch_unlikely(&isolation_type_en_snp);
+}

/*
* hv_isolation_type_snp - Check system runs in the AMD SEV-SNP based
- * isolation VM.
+ * isolation VM with vTOM support.
*/
bool hv_isolation_type_snp(void)
{
diff --git a/arch/x86/include/asm/mshyperv.h b/arch/x86/include/asm/mshyperv.h
index 61f0c206bff0..9b8c3f638845 100644
--- a/arch/x86/include/asm/mshyperv.h
+++ b/arch/x86/include/asm/mshyperv.h
@@ -14,6 +14,7 @@
union hv_ghcb;

DECLARE_STATIC_KEY_FALSE(isolation_type_snp);
+DECLARE_STATIC_KEY_FALSE(isolation_type_en_snp);

typedef int (*hyperv_fill_flush_list_func)(
struct hv_guest_mapping_flush_list *flush,
@@ -32,6 +33,7 @@ extern u64 hv_current_partition_id;

extern union hv_ghcb * __percpu *hv_ghcb_pg;

+extern bool hv_isolation_type_en_snp(void);
int hv_call_deposit_pages(int node, u64 partition_id, u32 num_pages);
int hv_call_add_logical_proc(int node, u32 lp_index, u32 acpi_id);
int hv_call_create_vp(int node, u64 partition_id, u32 vp_index, u32 flags);
diff --git a/arch/x86/kernel/cpu/mshyperv.c b/arch/x86/kernel/cpu/mshyperv.c
index 831613959a92..2ea4f21c6172 100644
--- a/arch/x86/kernel/cpu/mshyperv.c
+++ b/arch/x86/kernel/cpu/mshyperv.c
@@ -273,6 +273,21 @@ static void __init ms_hyperv_init_platform(void)
ms_hyperv.misc_features = cpuid_edx(HYPERV_CPUID_FEATURES);
ms_hyperv.hints = cpuid_eax(HYPERV_CPUID_ENLIGHTMENT_INFO);

+ /*
+ * Add custom configuration for SEV-SNP Enlightened guest
+ */
+ if (cc_platform_has(CC_ATTR_GUEST_SEV_SNP)) {
+ ms_hyperv.features |= HV_ACCESS_FREQUENCY_MSRS;
+ ms_hyperv.misc_features |= HV_FEATURE_FREQUENCY_MSRS_AVAILABLE;
+ ms_hyperv.misc_features &= ~HV_FEATURE_GUEST_CRASH_MSR_AVAILABLE;
+ ms_hyperv.hints |= HV_DEPRECATING_AEOI_RECOMMENDED;
+ ms_hyperv.hints |= HV_X64_APIC_ACCESS_RECOMMENDED;
+ ms_hyperv.hints |= HV_X64_CLUSTER_IPI_RECOMMENDED;
+ }
+
+ pr_info("Hyper-V: enlightment features 0x%x, hints 0x%x, misc 0x%x\n",
+ ms_hyperv.features, ms_hyperv.hints, ms_hyperv.misc_features);
+
hv_max_functions_eax = cpuid_eax(HYPERV_CPUID_VENDOR_AND_MAX_FUNCTIONS);

pr_info("Hyper-V: privilege flags low 0x%x, high 0x%x, hints 0x%x, misc 0x%x\n",
@@ -328,18 +343,22 @@ static void __init ms_hyperv_init_platform(void)
ms_hyperv.shared_gpa_boundary =
BIT_ULL(ms_hyperv.shared_gpa_boundary_bits);

- pr_info("Hyper-V: Isolation Config: Group A 0x%x, Group B 0x%x\n",
- ms_hyperv.isolation_config_a, ms_hyperv.isolation_config_b);
-
- if (hv_get_isolation_type() == HV_ISOLATION_TYPE_SNP) {
+ if (cc_platform_has(CC_ATTR_GUEST_SEV_SNP)) {
+ static_branch_enable(&isolation_type_en_snp);
+ } else if (hv_get_isolation_type() == HV_ISOLATION_TYPE_SNP) {
static_branch_enable(&isolation_type_snp);
#ifdef CONFIG_SWIOTLB
swiotlb_unencrypted_base = ms_hyperv.shared_gpa_boundary;
#endif
}
+
+ pr_info("Hyper-V: Isolation Config: Group A 0x%x, Group B 0x%x\n",
+ ms_hyperv.isolation_config_a, ms_hyperv.isolation_config_b);
+
/* Isolation VMs are unenlightened SEV-based VMs, thus this check: */
if (IS_ENABLED(CONFIG_AMD_MEM_ENCRYPT)) {
- if (hv_get_isolation_type() != HV_ISOLATION_TYPE_NONE)
+ if (hv_get_isolation_type() != HV_ISOLATION_TYPE_NONE
+ && !cc_platform_has(CC_ATTR_GUEST_SEV_SNP))
cc_set_vendor(CC_VENDOR_HYPERV);
}
}
diff --git a/drivers/hv/hv_common.c b/drivers/hv/hv_common.c
index ae68298c0dca..2c6602571c47 100644
--- a/drivers/hv/hv_common.c
+++ b/drivers/hv/hv_common.c
@@ -268,6 +268,13 @@ bool __weak hv_isolation_type_snp(void)
}
EXPORT_SYMBOL_GPL(hv_isolation_type_snp);

+bool __weak hv_isolation_type_en_snp(void)
+{
+ return false;
+}
+EXPORT_SYMBOL_GPL(hv_isolation_type_en_snp);
+
+
void __weak hv_setup_vmbus_handler(void (*handler)(void))
{
}
--
2.25.1


2022-11-19 04:27:21

by Tianyu Lan

[permalink] [raw]
Subject: [RFC PATCH V2 18/18] x86/sev: Fix interrupt exit code paths from #HV exception

From: Ashish Kalra <[email protected]>

Add checks in interrupt exit code paths in case of returns
to user mode to check if currently executing the #HV handler
then don't follow the irqentry_exit_to_user_mode path as
that can potentially cause the #HV handler to be
preempted and rescheduled on another CPU. Rescheduled #HV
handler on another cpu will cause interrupts to be handled
on a different cpu than the injected one, causing
invalid EOIs and missed/lost guest interrupts and
corresponding hangs and/or per-cpu IRQs handled on
non-intended cpu.

Signed-off-by: Ashish Kalra <[email protected]>
---
arch/x86/include/asm/idtentry.h | 66 +++++++++++++++++++++++++++++++++
arch/x86/kernel/sev.c | 30 +++++++++++++++
2 files changed, 96 insertions(+)

diff --git a/arch/x86/include/asm/idtentry.h b/arch/x86/include/asm/idtentry.h
index 652fea10d377..45b47132be7c 100644
--- a/arch/x86/include/asm/idtentry.h
+++ b/arch/x86/include/asm/idtentry.h
@@ -13,6 +13,10 @@

#include <asm/irq_stack.h>

+#ifdef CONFIG_AMD_MEM_ENCRYPT
+noinstr void irqentry_exit_hv_cond(struct pt_regs *regs, irqentry_state_t state);
+#endif
+
/**
* DECLARE_IDTENTRY - Declare functions for simple IDT entry points
* No error code pushed by hardware
@@ -176,6 +180,7 @@ __visible noinstr void func(struct pt_regs *regs, unsigned long error_code)
#define DECLARE_IDTENTRY_IRQ(vector, func) \
DECLARE_IDTENTRY_ERRORCODE(vector, func)

+#ifndef CONFIG_AMD_MEM_ENCRYPT
/**
* DEFINE_IDTENTRY_IRQ - Emit code for device interrupt IDT entry points
* @func: Function name of the entry point
@@ -205,6 +210,26 @@ __visible noinstr void func(struct pt_regs *regs, \
} \
\
static noinline void __##func(struct pt_regs *regs, u32 vector)
+#else
+
+#define DEFINE_IDTENTRY_IRQ(func) \
+static void __##func(struct pt_regs *regs, u32 vector); \
+ \
+__visible noinstr void func(struct pt_regs *regs, \
+ unsigned long error_code) \
+{ \
+ irqentry_state_t state = irqentry_enter(regs); \
+ u32 vector = (u32)(u8)error_code; \
+ \
+ instrumentation_begin(); \
+ kvm_set_cpu_l1tf_flush_l1d(); \
+ run_irq_on_irqstack_cond(__##func, regs, vector); \
+ instrumentation_end(); \
+ irqentry_exit_hv_cond(regs, state); \
+} \
+ \
+static noinline void __##func(struct pt_regs *regs, u32 vector)
+#endif

/**
* DECLARE_IDTENTRY_SYSVEC - Declare functions for system vector entry points
@@ -221,6 +246,7 @@ static noinline void __##func(struct pt_regs *regs, u32 vector)
#define DECLARE_IDTENTRY_SYSVEC(vector, func) \
DECLARE_IDTENTRY(vector, func)

+#ifndef CONFIG_AMD_MEM_ENCRYPT
/**
* DEFINE_IDTENTRY_SYSVEC - Emit code for system vector IDT entry points
* @func: Function name of the entry point
@@ -245,6 +271,26 @@ __visible noinstr void func(struct pt_regs *regs) \
} \
\
static noinline void __##func(struct pt_regs *regs)
+#else
+
+#define DEFINE_IDTENTRY_SYSVEC(func) \
+static void __##func(struct pt_regs *regs); \
+ \
+__visible noinstr void func(struct pt_regs *regs) \
+{ \
+ irqentry_state_t state = irqentry_enter(regs); \
+ \
+ instrumentation_begin(); \
+ kvm_set_cpu_l1tf_flush_l1d(); \
+ run_sysvec_on_irqstack_cond(__##func, regs); \
+ instrumentation_end(); \
+ irqentry_exit_hv_cond(regs, state); \
+} \
+ \
+static noinline void __##func(struct pt_regs *regs)
+#endif
+
+#ifndef CONFIG_AMD_MEM_ENCRYPT

/**
* DEFINE_IDTENTRY_SYSVEC_SIMPLE - Emit code for simple system vector IDT
@@ -274,6 +320,26 @@ __visible noinstr void func(struct pt_regs *regs) \
} \
\
static __always_inline void __##func(struct pt_regs *regs)
+#else
+
+#define DEFINE_IDTENTRY_SYSVEC_SIMPLE(func) \
+static __always_inline void __##func(struct pt_regs *regs); \
+ \
+__visible noinstr void func(struct pt_regs *regs) \
+{ \
+ irqentry_state_t state = irqentry_enter(regs); \
+ \
+ instrumentation_begin(); \
+ __irq_enter_raw(); \
+ kvm_set_cpu_l1tf_flush_l1d(); \
+ __##func(regs); \
+ __irq_exit_raw(); \
+ instrumentation_end(); \
+ irqentry_exit_hv_cond(regs, state); \
+} \
+ \
+static __always_inline void __##func(struct pt_regs *regs)
+#endif

/**
* DECLARE_IDTENTRY_XENCB - Declare functions for XEN HV callback entry point
diff --git a/arch/x86/kernel/sev.c b/arch/x86/kernel/sev.c
index 5a2f59022c98..ef6a123c50fe 100644
--- a/arch/x86/kernel/sev.c
+++ b/arch/x86/kernel/sev.c
@@ -153,6 +153,10 @@ struct sev_hv_doorbell_page {

struct sev_snp_runtime_data {
struct sev_hv_doorbell_page hv_doorbell_page;
+ /*
+ * Indication that we are currently handling #HV events.
+ */
+ bool hv_handling_events;
};

static DEFINE_PER_CPU(struct sev_snp_runtime_data*, snp_runtime_data);
@@ -206,6 +210,8 @@ static void do_exc_hv(struct pt_regs *regs)
union hv_pending_events pending_events;
u8 vector;

+ this_cpu_read(snp_runtime_data)->hv_handling_events = true;
+
while (sev_hv_pending()) {
asm volatile("cli" : : : "memory");

@@ -244,6 +250,8 @@ static void do_exc_hv(struct pt_regs *regs)

asm volatile("sti" : : : "memory");
}
+
+ this_cpu_read(snp_runtime_data)->hv_handling_events = false;
}

void check_hv_pending(struct pt_regs *regs)
@@ -2541,3 +2549,25 @@ static int __init snp_init_platform_device(void)
return 0;
}
device_initcall(snp_init_platform_device);
+
+noinstr void irqentry_exit_hv_cond(struct pt_regs *regs, irqentry_state_t state)
+{
+ /*
+ * Check whether this returns to user mode, if so and if
+ * we are currently executing the #HV handler then we don't
+ * want to follow the irqentry_exit_to_user_mode path as
+ * that can potentially cause the #HV handler to be
+ * preempted and rescheduled on another CPU. Rescheduled #HV
+ * handler on another cpu will cause interrupts to be handled
+ * on a different cpu than the injected one, causing
+ * invalid EOIs and missed/lost guest interrupts and
+ * corresponding hangs and/or per-cpu IRQs handled on
+ * non-intended cpu.
+ */
+ if (user_mode(regs) &&
+ this_cpu_read(snp_runtime_data)->hv_handling_events)
+ return;
+
+ /* follow normal interrupt return/exit path */
+ irqentry_exit(regs, state);
+}
--
2.25.1


2022-11-19 04:47:08

by Tianyu Lan

[permalink] [raw]
Subject: [RFC PATCH V2 11/18] Drivers: hv: vmbus: Decrypt vmbus ring buffer

From: Tianyu Lan <[email protected]>

The ring buffer is remapped in the hv_ringbuffer_init()
and it should be with decrypt flag in order to share it
with hypervisor in sev-snp enlightened guest.

Signed-off-by: Tianyu Lan <[email protected]>
---
drivers/hv/ring_buffer.c | 7 ++++++-
1 file changed, 6 insertions(+), 1 deletion(-)

diff --git a/drivers/hv/ring_buffer.c b/drivers/hv/ring_buffer.c
index 59a4aa86d1f3..391995c76be7 100644
--- a/drivers/hv/ring_buffer.c
+++ b/drivers/hv/ring_buffer.c
@@ -18,6 +18,7 @@
#include <linux/slab.h>
#include <linux/prefetch.h>
#include <linux/io.h>
+#include <linux/set_memory.h>
#include <asm/mshyperv.h>

#include "hyperv_vmbus.h"
@@ -233,14 +234,18 @@ int hv_ringbuffer_init(struct hv_ring_buffer_info *ring_info,

ring_info->ring_buffer = (struct hv_ring_buffer *)
vmap(pages_wraparound, page_cnt * 2 - 1, VM_MAP,
+ hv_isolation_type_en_snp() ?
+ pgprot_decrypted(PAGE_KERNEL_NOENC) :
PAGE_KERNEL);

+ if (hv_isolation_type_en_snp())
+ memset(ring_info->ring_buffer, 0x00, page_cnt * PAGE_SIZE);
+
kfree(pages_wraparound);
if (!ring_info->ring_buffer)
return -ENOMEM;
}

-
ring_info->ring_buffer->read_index =
ring_info->ring_buffer->write_index = 0;

--
2.25.1


2022-11-19 04:49:14

by Tianyu Lan

[permalink] [raw]
Subject: [RFC PATCH V2 07/18] clocksource: hyper-v: decrypt hyperv tsc page in sev-snp enlightened guest

From: Tianyu Lan <[email protected]>

Hyper-V tsc page is shared with hypervisor and it should be decrypted
in sev-snp enlightened guest when it's used.

Signed-off-by: Tianyu Lan <[email protected]>
---
drivers/clocksource/hyperv_timer.c | 2 +-
1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/drivers/clocksource/hyperv_timer.c b/drivers/clocksource/hyperv_timer.c
index bb47610bbd1c..aa68eebed5ee 100644
--- a/drivers/clocksource/hyperv_timer.c
+++ b/drivers/clocksource/hyperv_timer.c
@@ -364,7 +364,7 @@ EXPORT_SYMBOL_GPL(hv_stimer_global_cleanup);
static union {
struct ms_hyperv_tsc_page page;
u8 reserved[PAGE_SIZE];
-} tsc_pg __aligned(PAGE_SIZE);
+} tsc_pg __bss_decrypted __aligned(PAGE_SIZE);

struct ms_hyperv_tsc_page *hv_get_tsc_page(void)
{
--
2.25.1


2022-11-19 04:53:04

by Tianyu Lan

[permalink] [raw]
Subject: [RFC PATCH V2 13/18] x86/hyperv: Add smp support for sev-snp guest

From: Tianyu Lan <[email protected]>

The wakeup_secondary_cpu callback was populated with wakeup_
cpu_via_vmgexit() which doesn't work for Hyper-V. Override it
with Hyper-V specific hook which uses HVCALL_START_VIRTUAL_
PROCESSOR hvcall to start AP with vmsa data structure.

Signed-off-by: Tianyu Lan <[email protected]>
---
arch/x86/include/asm/sev.h | 13 +++
arch/x86/include/asm/svm.h | 55 ++++++++++-
arch/x86/kernel/cpu/mshyperv.c | 147 +++++++++++++++++++++++++++++-
include/asm-generic/hyperv-tlfs.h | 18 ++++
4 files changed, 230 insertions(+), 3 deletions(-)

diff --git a/arch/x86/include/asm/sev.h b/arch/x86/include/asm/sev.h
index ebc271bb6d8e..e34aaf730220 100644
--- a/arch/x86/include/asm/sev.h
+++ b/arch/x86/include/asm/sev.h
@@ -86,6 +86,19 @@ extern bool handle_vc_boot_ghcb(struct pt_regs *regs);

#define RMPADJUST_VMSA_PAGE_BIT BIT(16)

+union sev_rmp_adjust {
+ u64 as_uint64;
+ struct {
+ unsigned long target_vmpl : 8;
+ unsigned long enable_read : 1;
+ unsigned long enable_write : 1;
+ unsigned long enable_user_execute : 1;
+ unsigned long enable_kernel_execute : 1;
+ unsigned long reserved1 : 4;
+ unsigned long vmsa : 1;
+ };
+};
+
/* SNP Guest message request */
struct snp_req_data {
unsigned long req_gpa;
diff --git a/arch/x86/include/asm/svm.h b/arch/x86/include/asm/svm.h
index 0361626841bc..fc54d3e7f817 100644
--- a/arch/x86/include/asm/svm.h
+++ b/arch/x86/include/asm/svm.h
@@ -328,8 +328,61 @@ struct vmcb_save_area {
u64 br_to;
u64 last_excp_from;
u64 last_excp_to;
- u8 reserved_6[72];
+
+ /*
+ * The following part of the save area is valid only for
+ * SEV-ES guests when referenced through the GHCB or for
+ * saving to the host save area.
+ */
+ u8 reserved_7[72];
u32 spec_ctrl; /* Guest version of SPEC_CTRL at 0x2E0 */
+ u8 reserved_7b[4];
+ u32 pkru;
+ u8 reserved_7a[20];
+ u64 reserved_8; /* rax already available at 0x01f8 */
+ u64 rcx;
+ u64 rdx;
+ u64 rbx;
+ u64 reserved_9; /* rsp already available at 0x01d8 */
+ u64 rbp;
+ u64 rsi;
+ u64 rdi;
+ u64 r8;
+ u64 r9;
+ u64 r10;
+ u64 r11;
+ u64 r12;
+ u64 r13;
+ u64 r14;
+ u64 r15;
+ u8 reserved_10[16];
+ u64 sw_exit_code;
+ u64 sw_exit_info_1;
+ u64 sw_exit_info_2;
+ u64 sw_scratch;
+ union {
+ u64 sev_features;
+ struct {
+ u64 sev_feature_snp : 1;
+ u64 sev_feature_vtom : 1;
+ u64 sev_feature_reflectvc : 1;
+ u64 sev_feature_restrict_injection : 1;
+ u64 sev_feature_alternate_injection : 1;
+ u64 sev_feature_full_debug : 1;
+ u64 sev_feature_reserved1 : 1;
+ u64 sev_feature_snpbtb_isolation : 1;
+ u64 sev_feature_resrved2 : 56;
+ };
+ };
+ u64 vintr_ctrl;
+ u64 guest_error_code;
+ u64 virtual_tom;
+ u64 tlb_id;
+ u64 pcpu_id;
+ u64 event_inject;
+ u64 xcr0;
+ u8 valid_bitmap[16];
+ u64 x87_state_gpa;
} __packed;

/* Save area definition for SEV-ES and SEV-SNP guests */
diff --git a/arch/x86/kernel/cpu/mshyperv.c b/arch/x86/kernel/cpu/mshyperv.c
index f0c97210c64a..b266f648e5cd 100644
--- a/arch/x86/kernel/cpu/mshyperv.c
+++ b/arch/x86/kernel/cpu/mshyperv.c
@@ -41,6 +41,10 @@
#include <asm/realmode.h>
#include <asm/e820/api.h>

+#define EN_SEV_SNP_PROCESSOR_INFO_ADDR 0x802000
+#define HV_AP_INIT_GPAT_DEFAULT 0x0007040600070406ULL
+#define HV_AP_SEGMENT_LIMIT 0xffffffff
+
/* Is Linux running as the root partition? */
bool hv_root_partition;
struct ms_hyperv_info ms_hyperv;
@@ -232,6 +236,136 @@ static void __init hv_smp_prepare_boot_cpu(void)
#endif
}

+static u8 ap_start_input_arg[PAGE_SIZE] __bss_decrypted __aligned(PAGE_SIZE);
+static u8 ap_start_stack[PAGE_SIZE] __aligned(PAGE_SIZE);
+
+int hv_snp_boot_ap(int cpu, unsigned long start_ip)
+{
+ struct vmcb_save_area *vmsa = (struct vmcb_save_area *)
+ __get_free_page(GFP_KERNEL | __GFP_ZERO);
+ struct desc_ptr gdtr;
+ u64 ret, retry = 5;
+ struct hv_enable_vp_vtl_input *enable_vtl_input;
+ struct hv_start_virtual_processor_input *start_vp_input;
+ union sev_rmp_adjust rmp_adjust;
+ void **arg;
+ unsigned long flags;
+
+ *(void **)per_cpu_ptr(hyperv_pcpu_input_arg, cpu) = ap_start_input_arg;
+
+ hv_vp_index[cpu] = cpu;
+
+ /* Prevent APs from entering busy calibration loop */
+ preset_lpj = lpj_fine;
+
+ /* Replace the provided real-mode start_ip */
+ start_ip = (unsigned long)secondary_startup_64_no_verify;
+
+ native_store_gdt(&gdtr);
+
+ vmsa->gdtr.base = gdtr.address;
+ vmsa->gdtr.limit = gdtr.size;
+
+ asm volatile("movl %%es, %%eax;" : "=a" (vmsa->es.selector));
+ if (vmsa->es.selector) {
+ vmsa->es.base = 0;
+ vmsa->es.limit = HV_AP_SEGMENT_LIMIT;
+ vmsa->es.attrib = *(u16 *)(vmsa->gdtr.base + vmsa->es.selector + 5);
+ vmsa->es.attrib = (vmsa->es.attrib & 0xFF) | ((vmsa->es.attrib >> 4) & 0xF00);
+ }
+
+ asm volatile("movl %%cs, %%eax;" : "=a" (vmsa->cs.selector));
+ if (vmsa->cs.selector) {
+ vmsa->cs.base = 0;
+ vmsa->cs.limit = HV_AP_SEGMENT_LIMIT;
+ vmsa->cs.attrib = *(u16 *)(vmsa->gdtr.base + vmsa->cs.selector + 5);
+ vmsa->cs.attrib = (vmsa->cs.attrib & 0xFF) | ((vmsa->cs.attrib >> 4) & 0xF00);
+ }
+
+ asm volatile("movl %%ss, %%eax;" : "=a" (vmsa->ss.selector));
+ if (vmsa->ss.selector) {
+ vmsa->ss.base = 0;
+ vmsa->ss.limit = HV_AP_SEGMENT_LIMIT;
+ vmsa->ss.attrib = *(u16 *)(vmsa->gdtr.base + vmsa->ss.selector + 5);
+ vmsa->ss.attrib = (vmsa->ss.attrib & 0xFF) | ((vmsa->ss.attrib >> 4) & 0xF00);
+ }
+
+ asm volatile("movl %%ds, %%eax;" : "=a" (vmsa->ds.selector));
+ if (vmsa->ds.selector) {
+ vmsa->ds.base = 0;
+ vmsa->ds.limit = HV_AP_SEGMENT_LIMIT;
+ vmsa->ds.attrib = *(u16 *)(vmsa->gdtr.base + vmsa->ds.selector + 5);
+ vmsa->ds.attrib = (vmsa->ds.attrib & 0xFF) | ((vmsa->ds.attrib >> 4) & 0xF00);
+ }
+
+ vmsa->efer = native_read_msr(MSR_EFER);
+
+ asm volatile("movq %%cr4, %%rax;" : "=a" (vmsa->cr4));
+ asm volatile("movq %%cr3, %%rax;" : "=a" (vmsa->cr3));
+ asm volatile("movq %%cr0, %%rax;" : "=a" (vmsa->cr0));
+
+ vmsa->xcr0 = 1;
+ vmsa->g_pat = HV_AP_INIT_GPAT_DEFAULT;
+ vmsa->rip = (u64)start_ip;
+ vmsa->rsp = (u64)&ap_start_stack[PAGE_SIZE];
+
+ vmsa->sev_feature_snp = 1;
+ vmsa->sev_feature_restrict_injection = 1;
+
+ rmp_adjust.as_uint64 = 0;
+ rmp_adjust.target_vmpl = 1;
+ rmp_adjust.vmsa = 1;
+ ret = rmpadjust((unsigned long)vmsa, RMP_PG_SIZE_4K,
+ rmp_adjust.as_uint64);
+ if (ret != 0) {
+ pr_err("RMPADJUST(%llx) failed: %llx\n", (u64)vmsa, ret);
+ return ret;
+ }
+
+ local_irq_save(flags);
+ arg = (void **)this_cpu_ptr(hyperv_pcpu_input_arg);
+ if (unlikely(!*arg)) {
+ ret = -ENOMEM;
+ goto done;
+ }
+
+ if (ms_hyperv.vtl != 0) {
+ enable_vtl_input = (struct hv_enable_vp_vtl_input *)*arg;
+ memset(enable_vtl_input, 0, sizeof(*enable_vtl_input));
+ enable_vtl_input->partitionid = -1;
+ enable_vtl_input->vpindex = cpu;
+ enable_vtl_input->targetvtl = ms_hyperv.vtl;
+ *(u64 *)&enable_vtl_input->context[0] = __pa(vmsa) | 1;
+
+ ret = hv_do_hypercall(HVCALL_ENABLE_VP_VTL, enable_vtl_input, NULL);
+ if (ret != 0) {
+ pr_err("HvCallEnableVpVtl failed: %llx\n", ret);
+ goto done;
+ }
+ }
+
+ start_vp_input = (struct hv_start_virtual_processor_input *)*arg;
+ memset(start_vp_input, 0, sizeof(*start_vp_input));
+ start_vp_input->partitionid = -1;
+ start_vp_input->vpindex = cpu;
+ start_vp_input->targetvtl = ms_hyperv.vtl;
+ *(u64 *)&start_vp_input->context[0] = __pa(vmsa) | 1;
+
+ do {
+ ret = hv_do_hypercall(HVCALL_START_VIRTUAL_PROCESSOR,
+ start_vp_input, NULL);
+ } while (ret == HV_STATUS_TIME_OUT && retry--);
+
+ if (ret != 0) {
+ pr_err("HvCallStartVirtualProcessor failed: %llx\n", ret);
+ goto done;
+ }
+
+done:
+ local_irq_restore(flags);
+ return ret;
+}
+
static void __init hv_smp_prepare_cpus(unsigned int max_cpus)
{
#ifdef CONFIG_X86_64
@@ -241,6 +375,16 @@ static void __init hv_smp_prepare_cpus(unsigned int max_cpus)

native_smp_prepare_cpus(max_cpus);

+ /*
+ * Override wakeup_secondary_cpu callback for SEV-SNP
+ * enlightened guest.
+ */
+ if (hv_isolation_type_en_snp())
+ apic->wakeup_secondary_cpu = hv_snp_boot_ap;
+
+ if (!hv_root_partition)
+ return;
+
#ifdef CONFIG_X86_64
for_each_present_cpu(i) {
if (i == 0)
@@ -489,8 +633,7 @@ static void __init ms_hyperv_init_platform(void)

# ifdef CONFIG_SMP
smp_ops.smp_prepare_boot_cpu = hv_smp_prepare_boot_cpu;
- if (hv_root_partition)
- smp_ops.smp_prepare_cpus = hv_smp_prepare_cpus;
+ smp_ops.smp_prepare_cpus = hv_smp_prepare_cpus;
# endif

/*
diff --git a/include/asm-generic/hyperv-tlfs.h b/include/asm-generic/hyperv-tlfs.h
index 6e2a090e2649..7072adbf5540 100644
--- a/include/asm-generic/hyperv-tlfs.h
+++ b/include/asm-generic/hyperv-tlfs.h
@@ -139,6 +139,7 @@ struct ms_hyperv_tsc_page {
#define HVCALL_FLUSH_VIRTUAL_ADDRESS_LIST 0x0003
#define HVCALL_NOTIFY_LONG_SPIN_WAIT 0x0008
#define HVCALL_SEND_IPI 0x000b
+#define HVCALL_ENABLE_VP_VTL 0x000f
#define HVCALL_FLUSH_VIRTUAL_ADDRESS_SPACE_EX 0x0013
#define HVCALL_FLUSH_VIRTUAL_ADDRESS_LIST_EX 0x0014
#define HVCALL_SEND_IPI_EX 0x0015
@@ -156,6 +157,7 @@ struct ms_hyperv_tsc_page {
#define HVCALL_MAP_DEVICE_INTERRUPT 0x007c
#define HVCALL_UNMAP_DEVICE_INTERRUPT 0x007d
#define HVCALL_RETARGET_INTERRUPT 0x007e
+#define HVCALL_START_VIRTUAL_PROCESSOR 0x0099
#define HVCALL_FLUSH_GUEST_PHYSICAL_ADDRESS_SPACE 0x00af
#define HVCALL_FLUSH_GUEST_PHYSICAL_ADDRESS_LIST 0x00b0
#define HVCALL_MODIFY_SPARSE_GPA_PAGE_HOST_VISIBILITY 0x00db
@@ -763,6 +765,22 @@ struct hv_input_unmap_device_interrupt {
struct hv_interrupt_entry interrupt_entry;
} __packed;

+struct hv_enable_vp_vtl_input {
+ u64 partitionid;
+ u32 vpindex;
+ u8 targetvtl;
+ u8 padding[3];
+ u8 context[0xe0];
+} __packed;
+
+struct hv_start_virtual_processor_input {
+ u64 partitionid;
+ u32 vpindex;
+ u8 targetvtl;
+ u8 padding[3];
+ u8 context[0xe0];
+} __packed;
+
#define HV_SOURCE_SHADOW_NONE 0x0
#define HV_SOURCE_SHADOW_BRIDGE_BUS_RANGE 0x1

--
2.25.1


2022-11-19 04:57:20

by Tianyu Lan

[permalink] [raw]
Subject: [RFC PATCH V2 14/18] x86/hyperv: Add hyperv-specific hadling for VMMCALL under SEV-ES

From: Tianyu Lan <[email protected]>

Add Hyperv-specific handling for faults caused by VMMCALL
instructions.

Signed-off-by: Tianyu Lan <[email protected]>
---
arch/x86/kernel/cpu/mshyperv.c | 16 ++++++++++++++++
1 file changed, 16 insertions(+)

diff --git a/arch/x86/kernel/cpu/mshyperv.c b/arch/x86/kernel/cpu/mshyperv.c
index b266f648e5cd..a4e526378603 100644
--- a/arch/x86/kernel/cpu/mshyperv.c
+++ b/arch/x86/kernel/cpu/mshyperv.c
@@ -725,6 +725,20 @@ static bool __init ms_hyperv_msi_ext_dest_id(void)
return eax & HYPERV_VS_PROPERTIES_EAX_EXTENDED_IOAPIC_RTE;
}

+static void hv_sev_es_hcall_prepare(struct ghcb *ghcb, struct pt_regs *regs)
+{
+ /* RAX and CPL are already in the GHCB */
+ ghcb_set_rcx(ghcb, regs->cx);
+ ghcb_set_rdx(ghcb, regs->dx);
+ ghcb_set_r8(ghcb, regs->r8);
+}
+
+static bool hv_sev_es_hcall_finish(struct ghcb *ghcb, struct pt_regs *regs)
+{
+ /* No checking of the return state needed */
+ return true;
+}
+
const __initconst struct hypervisor_x86 x86_hyper_ms_hyperv = {
.name = "Microsoft Hyper-V",
.detect = ms_hyperv_platform,
@@ -732,4 +746,6 @@ const __initconst struct hypervisor_x86 x86_hyper_ms_hyperv = {
.init.x2apic_available = ms_hyperv_x2apic_available,
.init.msi_ext_dest_id = ms_hyperv_msi_ext_dest_id,
.init.init_platform = ms_hyperv_init_platform,
+ .runtime.sev_es_hcall_prepare = hv_sev_es_hcall_prepare,
+ .runtime.sev_es_hcall_finish = hv_sev_es_hcall_finish,
};
--
2.25.1


2022-11-19 05:03:19

by Tianyu Lan

[permalink] [raw]
Subject: [RFC PATCH V2 05/18] x86/hyperv: Get Virtual Trust Level via hvcall

From: Tianyu Lan <[email protected]>

sev-snp guest provides vtl(Virtual Trust Level) and get it from
hyperv hvcall via HVCALL_GET_VP_REGISTERS.

Signed-off-by: Tianyu Lan <[email protected]>
---
arch/x86/hyperv/hv_init.c | 35 ++++++++++++++++++++++++++++++++++
include/asm-generic/mshyperv.h | 2 ++
2 files changed, 37 insertions(+)

diff --git a/arch/x86/hyperv/hv_init.c b/arch/x86/hyperv/hv_init.c
index 4600c5941957..5b919d4d24c0 100644
--- a/arch/x86/hyperv/hv_init.c
+++ b/arch/x86/hyperv/hv_init.c
@@ -390,6 +390,39 @@ static void __init hv_get_partition_id(void)
local_irq_restore(flags);
}

+static u8 __init get_current_vtl(void)
+{
+ u64 control = ((u64)1 << HV_HYPERCALL_REP_COMP_OFFSET) | HVCALL_GET_VP_REGISTERS;
+ struct hv_get_vp_registers_input *input = NULL;
+ struct hv_get_vp_registers_output *output = NULL;
+ u8 vtl = 0;
+ int ret;
+ unsigned long flags;
+
+ local_irq_save(flags);
+ input = *(struct hv_get_vp_registers_input **)this_cpu_ptr(hyperv_pcpu_input_arg);
+ output = (struct hv_get_vp_registers_output *)input;
+ if (!input || !output) {
+ pr_err("Hyper-V: cannot allocate a shared page!");
+ goto done;
+ }
+
+ memset(input, 0, sizeof(*input) + sizeof(input->element[0]));
+ input->header.partitionid = HV_PARTITION_ID_SELF;
+ input->header.inputvtl = 0;
+ input->element[0].name0 = 0x000D0003;
+
+ ret = hv_do_hypercall(control, input, output);
+ if (ret == 0)
+ vtl = output->as64.low & 0xf;
+ else
+ pr_err("Hyper-V: failed to get the current VTL!");
+ local_irq_restore(flags);
+
+done:
+ return vtl;
+}
+
/*
* This function is to be invoked early in the boot sequence after the
* hypervisor has been detected.
@@ -527,6 +560,8 @@ void __init hyperv_init(void)
if (hv_is_isolation_supported())
swiotlb_update_mem_attributes();
#endif
+ /* Find the current VTL */
+ ms_hyperv.vtl = get_current_vtl();

return;

diff --git a/include/asm-generic/mshyperv.h b/include/asm-generic/mshyperv.h
index bfb9eb9d7215..68133de044ec 100644
--- a/include/asm-generic/mshyperv.h
+++ b/include/asm-generic/mshyperv.h
@@ -46,6 +46,7 @@ struct ms_hyperv_info {
};
};
u64 shared_gpa_boundary;
+ u8 vtl;
};
extern struct ms_hyperv_info ms_hyperv;

@@ -55,6 +56,7 @@ extern void * __percpu *hyperv_pcpu_output_arg;
extern u64 hv_do_hypercall(u64 control, void *inputaddr, void *outputaddr);
extern u64 hv_do_fast_hypercall8(u16 control, u64 input8);
extern bool hv_isolation_type_snp(void);
+extern bool hv_isolation_type_en_snp(void);

/* Helper functions that provide a consistent pattern for checking Hyper-V hypercall status. */
static inline int hv_result(u64 status)
--
2.25.1


2022-11-19 05:04:41

by Tianyu Lan

[permalink] [raw]
Subject: [RFC PATCH V2 15/18] x86/sev: Add a #HV exception handler

From: Tianyu Lan <[email protected]>

Add a #HV exception handler that uses IST stack.

Signed-off-by: Tianyu Lan <[email protected]>
---
arch/x86/entry/entry_64.S | 58 +++++++++++++++++++++++++++
arch/x86/include/asm/cpu_entry_area.h | 6 +++
arch/x86/include/asm/idtentry.h | 39 +++++++++++++++++-
arch/x86/include/asm/page_64_types.h | 1 +
arch/x86/include/asm/trapnr.h | 1 +
arch/x86/include/asm/traps.h | 1 +
arch/x86/kernel/cpu/common.c | 1 +
arch/x86/kernel/dumpstack_64.c | 9 ++++-
arch/x86/kernel/idt.c | 1 +
arch/x86/kernel/sev.c | 53 ++++++++++++++++++++++++
arch/x86/mm/cpu_entry_area.c | 2 +
11 files changed, 169 insertions(+), 3 deletions(-)

diff --git a/arch/x86/entry/entry_64.S b/arch/x86/entry/entry_64.S
index 9953d966d124..b2059df43c57 100644
--- a/arch/x86/entry/entry_64.S
+++ b/arch/x86/entry/entry_64.S
@@ -560,6 +560,64 @@ SYM_CODE_START(\asmsym)
.Lfrom_usermode_switch_stack_\@:
idtentry_body user_\cfunc, has_error_code=1

+_ASM_NOKPROBE(\asmsym)
+SYM_CODE_END(\asmsym)
+.endm
+/*
+ * idtentry_hv - Macro to generate entry stub for #HV
+ * @vector: Vector number
+ * @asmsym: ASM symbol for the entry point
+ * @cfunc: C function to be called
+ *
+ * The macro emits code to set up the kernel context for #HV. The #HV handler
+ * runs on an IST stack and needs to be able to support nested #HV exceptions.
+ *
+ * To make this work the #HV entry code tries its best to pretend it doesn't use
+ * an IST stack by switching to the task stack if coming from user-space (which
+ * includes early SYSCALL entry path) or back to the stack in the IRET frame if
+ * entered from kernel-mode.
+ *
+ * If entered from kernel-mode the return stack is validated first, and if it is
+ * not safe to use (e.g. because it points to the entry stack) the #HV handler
+ * will switch to a fall-back stack (HV2) and call a special handler function.
+ *
+ * The macro is only used for one vector, but it is planned to be extended in
+ * the future for the #HV exception.
+ */
+.macro idtentry_hv vector asmsym cfunc
+SYM_CODE_START(\asmsym)
+ UNWIND_HINT_IRET_REGS
+ ASM_CLAC
+ pushq $-1 /* ORIG_RAX: no syscall to restart */
+
+ testb $3, CS-ORIG_RAX(%rsp)
+ jnz .Lfrom_usermode_switch_stack_\@
+
+ call paranoid_entry
+
+ UNWIND_HINT_REGS
+
+ /*
+ * Switch off the IST stack to make it free for nested exceptions.
+ */
+ movq %rsp, %rdi /* pt_regs pointer */
+ call hv_switch_off_ist
+ movq %rax, %rsp /* Switch to new stack */
+
+ UNWIND_HINT_REGS
+
+ /* Update pt_regs */
+ movq ORIG_RAX(%rsp), %rsi /* get error code into 2nd argument*/
+ movq $-1, ORIG_RAX(%rsp) /* no syscall to restart */
+
+ movq %rsp, %rdi /* pt_regs pointer */
+ call kernel_\cfunc
+
+ jmp paranoid_exit
+
+.Lfrom_usermode_switch_stack_\@:
+ idtentry_body user_\cfunc, has_error_code=1
+
_ASM_NOKPROBE(\asmsym)
SYM_CODE_END(\asmsym)
.endm
diff --git a/arch/x86/include/asm/cpu_entry_area.h b/arch/x86/include/asm/cpu_entry_area.h
index 75efc4c6f076..f173a16cfc59 100644
--- a/arch/x86/include/asm/cpu_entry_area.h
+++ b/arch/x86/include/asm/cpu_entry_area.h
@@ -30,6 +30,10 @@
char VC_stack[optional_stack_size]; \
char VC2_stack_guard[guardsize]; \
char VC2_stack[optional_stack_size]; \
+ char HV_stack_guard[guardsize]; \
+ char HV_stack[optional_stack_size]; \
+ char HV2_stack_guard[guardsize]; \
+ char HV2_stack[optional_stack_size]; \
char IST_top_guard[guardsize]; \

/* The exception stacks' physical storage. No guard pages required */
@@ -52,6 +56,8 @@ enum exception_stack_ordering {
ESTACK_MCE,
ESTACK_VC,
ESTACK_VC2,
+ ESTACK_HV,
+ ESTACK_HV2,
N_EXCEPTION_STACKS
};

diff --git a/arch/x86/include/asm/idtentry.h b/arch/x86/include/asm/idtentry.h
index 72184b0b2219..652fea10d377 100644
--- a/arch/x86/include/asm/idtentry.h
+++ b/arch/x86/include/asm/idtentry.h
@@ -317,6 +317,19 @@ static __always_inline void __##func(struct pt_regs *regs)
__visible noinstr void kernel_##func(struct pt_regs *regs, unsigned long error_code); \
__visible noinstr void user_##func(struct pt_regs *regs, unsigned long error_code)

+
+/**
+ * DECLARE_IDTENTRY_HV - Declare functions for the HV entry point
+ * @vector: Vector number (ignored for C)
+ * @func: Function name of the entry point
+ *
+ * Maps to DECLARE_IDTENTRY_RAW, but declares also the user C handler.
+ */
+#define DECLARE_IDTENTRY_HV(vector, func) \
+ DECLARE_IDTENTRY_RAW_ERRORCODE(vector, func); \
+ __visible noinstr void kernel_##func(struct pt_regs *regs); \
+ __visible noinstr void user_##func(struct pt_regs *regs)
+
/**
* DEFINE_IDTENTRY_IST - Emit code for IST entry points
* @func: Function name of the entry point
@@ -376,6 +389,26 @@ static __always_inline void __##func(struct pt_regs *regs)
#define DEFINE_IDTENTRY_VC_USER(func) \
DEFINE_IDTENTRY_RAW_ERRORCODE(user_##func)

+/**
+ * DEFINE_IDTENTRY_HV_KERNEL - Emit code for HV injection handler
+ * when raised from kernel mode
+ * @func: Function name of the entry point
+ *
+ * Maps to DEFINE_IDTENTRY_RAW
+ */
+#define DEFINE_IDTENTRY_HV_KERNEL(func) \
+ DEFINE_IDTENTRY_RAW(kernel_##func)
+
+/**
+ * DEFINE_IDTENTRY_HV_USER - Emit code for HV injection handler
+ * when raised from user mode
+ * @func: Function name of the entry point
+ *
+ * Maps to DEFINE_IDTENTRY_RAW
+ */
+#define DEFINE_IDTENTRY_HV_USER(func) \
+ DEFINE_IDTENTRY_RAW(user_##func)
+
#else /* CONFIG_X86_64 */

/**
@@ -465,6 +498,9 @@ __visible noinstr void func(struct pt_regs *regs, \
# define DECLARE_IDTENTRY_VC(vector, func) \
idtentry_vc vector asm_##func func

+# define DECLARE_IDTENTRY_HV(vector, func) \
+ idtentry_hv vector asm_##func func
+
#else
# define DECLARE_IDTENTRY_MCE(vector, func) \
DECLARE_IDTENTRY(vector, func)
@@ -622,9 +658,10 @@ DECLARE_IDTENTRY_RAW_ERRORCODE(X86_TRAP_DF, xenpv_exc_double_fault);
DECLARE_IDTENTRY_ERRORCODE(X86_TRAP_CP, exc_control_protection);
#endif

-/* #VC */
+/* #VC & #HV */
#ifdef CONFIG_AMD_MEM_ENCRYPT
DECLARE_IDTENTRY_VC(X86_TRAP_VC, exc_vmm_communication);
+DECLARE_IDTENTRY_HV(X86_TRAP_HV, exc_hv_injection);
#endif

#ifdef CONFIG_XEN_PV
diff --git a/arch/x86/include/asm/page_64_types.h b/arch/x86/include/asm/page_64_types.h
index e9e2c3ba5923..0bd7dab676c5 100644
--- a/arch/x86/include/asm/page_64_types.h
+++ b/arch/x86/include/asm/page_64_types.h
@@ -29,6 +29,7 @@
#define IST_INDEX_DB 2
#define IST_INDEX_MCE 3
#define IST_INDEX_VC 4
+#define IST_INDEX_HV 5

/*
* Set __PAGE_OFFSET to the most negative possible address +
diff --git a/arch/x86/include/asm/trapnr.h b/arch/x86/include/asm/trapnr.h
index f5d2325aa0b7..c6583631cecb 100644
--- a/arch/x86/include/asm/trapnr.h
+++ b/arch/x86/include/asm/trapnr.h
@@ -26,6 +26,7 @@
#define X86_TRAP_XF 19 /* SIMD Floating-Point Exception */
#define X86_TRAP_VE 20 /* Virtualization Exception */
#define X86_TRAP_CP 21 /* Control Protection Exception */
+#define X86_TRAP_HV 28 /* HV injected exception in SNP restricted mode */
#define X86_TRAP_VC 29 /* VMM Communication Exception */
#define X86_TRAP_IRET 32 /* IRET Exception */

diff --git a/arch/x86/include/asm/traps.h b/arch/x86/include/asm/traps.h
index 47ecfff2c83d..6795d3e517d6 100644
--- a/arch/x86/include/asm/traps.h
+++ b/arch/x86/include/asm/traps.h
@@ -16,6 +16,7 @@ asmlinkage __visible notrace
struct pt_regs *fixup_bad_iret(struct pt_regs *bad_regs);
void __init trap_init(void);
asmlinkage __visible noinstr struct pt_regs *vc_switch_off_ist(struct pt_regs *eregs);
+asmlinkage __visible noinstr struct pt_regs *hv_switch_off_ist(struct pt_regs *eregs);
#endif

extern bool ibt_selftest(void);
diff --git a/arch/x86/kernel/cpu/common.c b/arch/x86/kernel/cpu/common.c
index 3e508f239098..87afa3a4c8b1 100644
--- a/arch/x86/kernel/cpu/common.c
+++ b/arch/x86/kernel/cpu/common.c
@@ -2165,6 +2165,7 @@ static inline void tss_setup_ist(struct tss_struct *tss)
tss->x86_tss.ist[IST_INDEX_MCE] = __this_cpu_ist_top_va(MCE);
/* Only mapped when SEV-ES is active */
tss->x86_tss.ist[IST_INDEX_VC] = __this_cpu_ist_top_va(VC);
+ tss->x86_tss.ist[IST_INDEX_HV] = __this_cpu_ist_top_va(HV);
}

#else /* CONFIG_X86_64 */
diff --git a/arch/x86/kernel/dumpstack_64.c b/arch/x86/kernel/dumpstack_64.c
index 6c5defd6569a..23aa5912c87a 100644
--- a/arch/x86/kernel/dumpstack_64.c
+++ b/arch/x86/kernel/dumpstack_64.c
@@ -26,11 +26,14 @@ static const char * const exception_stack_names[] = {
[ ESTACK_MCE ] = "#MC",
[ ESTACK_VC ] = "#VC",
[ ESTACK_VC2 ] = "#VC2",
+ [ ESTACK_HV ] = "#HV",
+ [ ESTACK_HV2 ] = "#HV2",
+
};

const char *stack_type_name(enum stack_type type)
{
- BUILD_BUG_ON(N_EXCEPTION_STACKS != 6);
+ BUILD_BUG_ON(N_EXCEPTION_STACKS != 8);

if (type == STACK_TYPE_TASK)
return "TASK";
@@ -89,6 +92,8 @@ struct estack_pages estack_pages[CEA_ESTACK_PAGES] ____cacheline_aligned = {
EPAGERANGE(MCE),
EPAGERANGE(VC),
EPAGERANGE(VC2),
+ EPAGERANGE(HV),
+ EPAGERANGE(HV2),
};

static __always_inline bool in_exception_stack(unsigned long *stack, struct stack_info *info)
@@ -98,7 +103,7 @@ static __always_inline bool in_exception_stack(unsigned long *stack, struct stac
struct pt_regs *regs;
unsigned int k;

- BUILD_BUG_ON(N_EXCEPTION_STACKS != 6);
+ BUILD_BUG_ON(N_EXCEPTION_STACKS != 8);

begin = (unsigned long)__this_cpu_read(cea_exception_stacks);
/*
diff --git a/arch/x86/kernel/idt.c b/arch/x86/kernel/idt.c
index a58c6bc1cd68..48c0a7e1dbcb 100644
--- a/arch/x86/kernel/idt.c
+++ b/arch/x86/kernel/idt.c
@@ -113,6 +113,7 @@ static const __initconst struct idt_data def_idts[] = {

#ifdef CONFIG_AMD_MEM_ENCRYPT
ISTG(X86_TRAP_VC, asm_exc_vmm_communication, IST_INDEX_VC),
+ ISTG(X86_TRAP_HV, asm_exc_hv_injection, IST_INDEX_HV),
#endif

SYSG(X86_TRAP_OF, asm_exc_overflow),
diff --git a/arch/x86/kernel/sev.c b/arch/x86/kernel/sev.c
index a428c62330d3..b54ee3ba37b0 100644
--- a/arch/x86/kernel/sev.c
+++ b/arch/x86/kernel/sev.c
@@ -2004,6 +2004,59 @@ DEFINE_IDTENTRY_VC_USER(exc_vmm_communication)
irqentry_exit_to_user_mode(regs);
}

+static bool hv_raw_handle_exception(struct pt_regs *regs)
+{
+ return false;
+}
+
+static __always_inline bool on_hv_fallback_stack(struct pt_regs *regs)
+{
+ unsigned long sp = (unsigned long)regs;
+
+ return (sp >= __this_cpu_ist_bottom_va(HV2) && sp < __this_cpu_ist_top_va(HV2));
+}
+
+DEFINE_IDTENTRY_HV_USER(exc_hv_injection)
+{
+ irqentry_enter_from_user_mode(regs);
+ instrumentation_begin();
+
+ if (!hv_raw_handle_exception(regs)) {
+ /*
+ * Do not kill the machine if user-space triggered the
+ * exception. Send SIGBUS instead and let user-space deal
+ * with it.
+ */
+ force_sig_fault(SIGBUS, BUS_OBJERR, (void __user *)0);
+ }
+
+ instrumentation_end();
+ irqentry_exit_to_user_mode(regs);
+}
+
+DEFINE_IDTENTRY_HV_KERNEL(exc_hv_injection)
+{
+ irqentry_state_t irq_state;
+
+ irq_state = irqentry_nmi_enter(regs);
+ instrumentation_begin();
+
+ if (!hv_raw_handle_exception(regs)) {
+ pr_emerg("PANIC: Unhandled #HV exception in kernel space\n");
+
+ /* Show some debug info */
+ show_regs(regs);
+
+ /* Ask hypervisor to sev_es_terminate */
+ sev_es_terminate(SEV_TERM_SET_GEN, GHCB_SEV_ES_GEN_REQ);
+
+ panic("Returned from Terminate-Request to Hypervisor\n");
+ }
+
+ instrumentation_end();
+ irqentry_nmi_exit(regs, irq_state);
+}
+
bool __init handle_vc_boot_ghcb(struct pt_regs *regs)
{
unsigned long exit_code = regs->orig_ax;
diff --git a/arch/x86/mm/cpu_entry_area.c b/arch/x86/mm/cpu_entry_area.c
index 6c2f1b76a0b6..8b1670fa2b81 100644
--- a/arch/x86/mm/cpu_entry_area.c
+++ b/arch/x86/mm/cpu_entry_area.c
@@ -115,6 +115,8 @@ static void __init percpu_setup_exception_stacks(unsigned int cpu)
if (cc_platform_has(CC_ATTR_GUEST_STATE_ENCRYPT)) {
cea_map_stack(VC);
cea_map_stack(VC2);
+ cea_map_stack(HV);
+ cea_map_stack(HV2);
}
}
}
--
2.25.1


2022-11-19 05:04:59

by Tianyu Lan

[permalink] [raw]
Subject: [RFC PATCH V2 08/18] x86/hyperv: decrypt vmbus pages for sev-snp enlightened guest

From: Tianyu Lan <[email protected]>

Vmbus int, synic and post message pages are shared with hypervisor
and so decrypt these pages in the sev-snp guest.

Signed-off-by: Tianyu Lan <[email protected]>
---
drivers/hv/connection.c | 13 +++++++++++++
drivers/hv/hv.c | 32 +++++++++++++++++++++++++++++++-
2 files changed, 44 insertions(+), 1 deletion(-)

diff --git a/drivers/hv/connection.c b/drivers/hv/connection.c
index 9dc27e5d367a..43141225ea15 100644
--- a/drivers/hv/connection.c
+++ b/drivers/hv/connection.c
@@ -215,6 +215,15 @@ int vmbus_connect(void)
(void *)((unsigned long)vmbus_connection.int_page +
(HV_HYP_PAGE_SIZE >> 1));

+ if (hv_isolation_type_snp() || hv_isolation_type_en_snp()) {
+ ret = set_memory_decrypted((unsigned long)
+ vmbus_connection.int_page, 1);
+ if (ret)
+ goto cleanup;
+
+ memset(vmbus_connection.int_page, 0, PAGE_SIZE);
+ }
+
/*
* Setup the monitor notification facility. The 1st page for
* parent->child and the 2nd page for child->parent
@@ -372,6 +381,10 @@ void vmbus_disconnect(void)
destroy_workqueue(vmbus_connection.work_queue);

if (vmbus_connection.int_page) {
+ if (hv_isolation_type_en_snp())
+ set_memory_encrypted((unsigned long)
+ vmbus_connection.int_page, 1);
+
hv_free_hyperv_page((unsigned long)vmbus_connection.int_page);
vmbus_connection.int_page = NULL;
}
diff --git a/drivers/hv/hv.c b/drivers/hv/hv.c
index 4d6480d57546..f9111eb32739 100644
--- a/drivers/hv/hv.c
+++ b/drivers/hv/hv.c
@@ -20,6 +20,7 @@
#include <linux/interrupt.h>
#include <clocksource/hyperv_timer.h>
#include <asm/mshyperv.h>
+#include <linux/set_memory.h>
#include "hyperv_vmbus.h"

/* The one and only */
@@ -117,7 +118,7 @@ int hv_post_message(union hv_connection_id connection_id,

int hv_synic_alloc(void)
{
- int cpu;
+ int cpu, ret;
struct hv_per_cpu_context *hv_cpu;

/*
@@ -168,6 +169,29 @@ int hv_synic_alloc(void)
pr_err("Unable to allocate post msg page\n");
goto err;
}
+
+ if (hv_isolation_type_en_snp()) {
+ ret = set_memory_decrypted((unsigned long)
+ hv_cpu->synic_message_page, 1);
+ ret |= set_memory_decrypted((unsigned long)
+ hv_cpu->synic_event_page, 1);
+ ret |= set_memory_decrypted((unsigned long)
+ hv_cpu->post_msg_page, 1);
+
+ if (ret) {
+ set_memory_encrypted((unsigned long)
+ hv_cpu->synic_message_page, 1);
+ set_memory_encrypted((unsigned long)
+ hv_cpu->synic_event_page, 1);
+ set_memory_encrypted((unsigned long)
+ hv_cpu->post_msg_page, 1);
+ goto err;
+ }
+
+ memset(hv_cpu->synic_message_page, 0, PAGE_SIZE);
+ memset(hv_cpu->synic_event_page, 0, PAGE_SIZE);
+ memset(hv_cpu->post_msg_page, 0, PAGE_SIZE);
+ }
}

return 0;
@@ -188,6 +212,12 @@ void hv_synic_free(void)
struct hv_per_cpu_context *hv_cpu
= per_cpu_ptr(hv_context.cpu_context, cpu);

+ if (hv_isolation_type_en_snp()) {
+ set_memory_encrypted((unsigned long)hv_cpu->synic_message_page, 1);
+ set_memory_encrypted((unsigned long)hv_cpu->synic_event_page, 1);
+ set_memory_encrypted((unsigned long)hv_cpu->post_msg_page, 1);
+ }
+
free_page((unsigned long)hv_cpu->synic_event_page);
free_page((unsigned long)hv_cpu->synic_message_page);
free_page((unsigned long)hv_cpu->post_msg_page);
--
2.25.1


2022-11-19 05:15:12

by Tianyu Lan

[permalink] [raw]
Subject: [RFC PATCH V2 12/18] x86/hyperv: Initialize cpu and memory for sev-snp enlightened guest

From: Tianyu Lan <[email protected]>

Read processor amd memory info from specific address which are
populated by Hyper-V. Initialize smp cpu related ops, pvalidate
system memory and add it into e820 table.

Signed-off-by: Tianyu Lan <[email protected]>
---
arch/x86/kernel/cpu/mshyperv.c | 75 ++++++++++++++++++++++++++++++++++
1 file changed, 75 insertions(+)

diff --git a/arch/x86/kernel/cpu/mshyperv.c b/arch/x86/kernel/cpu/mshyperv.c
index 2ea4f21c6172..f0c97210c64a 100644
--- a/arch/x86/kernel/cpu/mshyperv.c
+++ b/arch/x86/kernel/cpu/mshyperv.c
@@ -34,6 +34,12 @@
#include <clocksource/hyperv_timer.h>
#include <asm/numa.h>
#include <asm/coco.h>
+#include <asm/io_apic.h>
+#include <asm/svm.h>
+#include <asm/sev.h>
+#include <asm/sev-snp.h>
+#include <asm/realmode.h>
+#include <asm/e820/api.h>

/* Is Linux running as the root partition? */
bool hv_root_partition;
@@ -253,6 +259,33 @@ static void __init hv_smp_prepare_cpus(unsigned int max_cpus)
}
#endif

+static __init int hv_snp_set_rtc_noop(const struct timespec64 *now) { return -EINVAL; }
+static __init void hv_snp_get_rtc_noop(struct timespec64 *now) { }
+
+static u32 processor_count;
+
+static __init void hv_snp_get_smp_config(unsigned int early)
+{
+ if (!early) {
+ while (num_processors < processor_count) {
+ early_per_cpu(x86_cpu_to_apicid, num_processors) = num_processors;
+ early_per_cpu(x86_bios_cpu_apicid, num_processors) = num_processors;
+ physid_set(num_processors, phys_cpu_present_map);
+ set_cpu_possible(num_processors, true);
+ set_cpu_present(num_processors, true);
+ num_processors++;
+ }
+ }
+}
+
+struct memory_map_entry {
+ u64 starting_gpn;
+ u64 numpages;
+ u16 type;
+ u16 flags;
+ u32 reserved;
+};
+
static void __init ms_hyperv_init_platform(void)
{
int hv_max_functions_eax;
@@ -260,6 +293,11 @@ static void __init ms_hyperv_init_platform(void)
int hv_host_info_ebx;
int hv_host_info_ecx;
int hv_host_info_edx;
+ struct memory_map_entry *entry;
+ struct e820_entry *e820_entry;
+ u64 e820_end;
+ u64 ram_end;
+ u64 page;

#ifdef CONFIG_PARAVIRT
pv_info.name = "Hyper-V";
@@ -477,6 +515,43 @@ static void __init ms_hyperv_init_platform(void)
if (!(ms_hyperv.features & HV_ACCESS_TSC_INVARIANT))
mark_tsc_unstable("running on Hyper-V");

+ if (cc_platform_has(CC_ATTR_GUEST_SEV_SNP)) {
+ x86_platform.legacy.reserve_bios_regions = 0;
+ x86_platform.set_wallclock = hv_snp_set_rtc_noop;
+ x86_platform.get_wallclock = hv_snp_get_rtc_noop;
+ x86_init.resources.probe_roms = x86_init_noop;
+ x86_init.resources.reserve_resources = x86_init_noop;
+ x86_init.mpparse.find_smp_config = x86_init_noop;
+ x86_init.mpparse.get_smp_config = hv_snp_get_smp_config;
+
+ /*
+ * Hyper-V SEV-SNP enlightened guest doesn't support ioapic
+ * and legacy APIC page read/write. Switch to hv apic here.
+ */
+ disable_ioapic_support();
+ hv_apic_init();
+
+ processor_count = *(u32 *)__va(EN_SEV_SNP_PROCESSOR_INFO_ADDR);
+
+ entry = (struct memory_map_entry *)(__va(EN_SEV_SNP_PROCESSOR_INFO_ADDR)
+ + sizeof(struct memory_map_entry));
+
+ for (; entry->numpages != 0; entry++) {
+ e820_entry = &e820_table->entries[e820_table->nr_entries - 1];
+ e820_end = e820_entry->addr + e820_entry->size;
+ ram_end = (entry->starting_gpn + entry->numpages) * PAGE_SIZE;
+
+ if (e820_end < entry->starting_gpn * PAGE_SIZE)
+ e820_end = entry->starting_gpn * PAGE_SIZE;
+ if (e820_end < ram_end) {
+ pr_info("Hyper-V: add [mem %#018Lx-%#018Lx]\n", e820_end, ram_end - 1);
+ e820__range_add(e820_end, ram_end - e820_end, E820_TYPE_RAM);
+ for (page = e820_end; page < ram_end; page += PAGE_SIZE)
+ pvalidate((unsigned long)__va(page), RMP_PG_SIZE_4K, true);
+ }
+ }
+ }
+
hardlockup_detector_disable();
}

--
2.25.1


2022-11-19 05:21:22

by Tianyu Lan

[permalink] [raw]
Subject: [RFC PATCH V2 17/18] x86/sev: optimize system vector processing invoked from #HV exception

From: Ashish Kalra <[email protected]>

Construct system vector table and dispatch system vector exceptions through
sysvec_table from #HV exception handler instead of explicitly calling each
system vector. The system vector table is created dynamically and is placed
in a new named ELF section.

Signed-off-by: Ashish Kalra <[email protected]>
---
arch/x86/entry/entry_64.S | 6 +++
arch/x86/kernel/sev.c | 70 +++++++++++++----------------------
arch/x86/kernel/vmlinux.lds.S | 7 ++++
3 files changed, 38 insertions(+), 45 deletions(-)

diff --git a/arch/x86/entry/entry_64.S b/arch/x86/entry/entry_64.S
index fe460cf44ab5..05f4fcc60652 100644
--- a/arch/x86/entry/entry_64.S
+++ b/arch/x86/entry/entry_64.S
@@ -409,6 +409,12 @@ SYM_CODE_START(\asmsym)

_ASM_NOKPROBE(\asmsym)
SYM_CODE_END(\asmsym)
+ .if \vector >= FIRST_SYSTEM_VECTOR && \vector < NR_VECTORS
+ .section .system_vectors, "aw"
+ .byte \vector
+ .quad \cfunc
+ .previous
+ .endif
.endm

/*
diff --git a/arch/x86/kernel/sev.c b/arch/x86/kernel/sev.c
index 23cd025f97dc..5a2f59022c98 100644
--- a/arch/x86/kernel/sev.c
+++ b/arch/x86/kernel/sev.c
@@ -157,6 +157,16 @@ struct sev_snp_runtime_data {

static DEFINE_PER_CPU(struct sev_snp_runtime_data*, snp_runtime_data);

+static void (*sysvec_table[NR_VECTORS - FIRST_SYSTEM_VECTOR])
+ (struct pt_regs *regs) __ro_after_init;
+
+struct sysvec_entry {
+ unsigned char vector;
+ void (*sysvec_func)(struct pt_regs *regs);
+} __packed;
+
+extern struct sysvec_entry __system_vectors[], __system_vectors_end[];
+
static inline u64 sev_es_rd_ghcb_msr(void)
{
return __rdmsr(MSR_AMD64_SEV_ES_GHCB);
@@ -222,51 +232,11 @@ static void do_exc_hv(struct pt_regs *regs)
} else if (pending_events.vector == IA32_SYSCALL_VECTOR) {
WARN(1, "syscall shouldn't happen\n");
} else if (pending_events.vector >= FIRST_SYSTEM_VECTOR) {
- switch (pending_events.vector) {
-#if IS_ENABLED(CONFIG_HYPERV)
- case HYPERV_STIMER0_VECTOR:
- sysvec_hyperv_stimer0(regs);
- break;
- case HYPERVISOR_CALLBACK_VECTOR:
- sysvec_hyperv_callback(regs);
- break;
-#endif
-#ifdef CONFIG_SMP
- case RESCHEDULE_VECTOR:
- sysvec_reschedule_ipi(regs);
- break;
- case IRQ_MOVE_CLEANUP_VECTOR:
- sysvec_irq_move_cleanup(regs);
- break;
- case REBOOT_VECTOR:
- sysvec_reboot(regs);
- break;
- case CALL_FUNCTION_SINGLE_VECTOR:
- sysvec_call_function_single(regs);
- break;
- case CALL_FUNCTION_VECTOR:
- sysvec_call_function(regs);
- break;
-#endif
-#ifdef CONFIG_X86_LOCAL_APIC
- case ERROR_APIC_VECTOR:
- sysvec_error_interrupt(regs);
- break;
- case SPURIOUS_APIC_VECTOR:
- sysvec_spurious_apic_interrupt(regs);
- break;
- case LOCAL_TIMER_VECTOR:
- sysvec_apic_timer_interrupt(regs);
- break;
- case X86_PLATFORM_IPI_VECTOR:
- sysvec_x86_platform_ipi(regs);
- break;
-#endif
- case 0x0:
- break;
- default:
- panic("Unexpected vector %d\n", vector);
- unreachable();
+ if (!(sysvec_table[pending_events.vector - FIRST_SYSTEM_VECTOR])) {
+ WARN(1, "system vector entry 0x%x is NULL\n",
+ pending_events.vector);
+ } else {
+ (*sysvec_table[pending_events.vector - FIRST_SYSTEM_VECTOR])(regs);
}
} else {
common_interrupt(regs, pending_events.vector);
@@ -381,6 +351,14 @@ static bool sev_restricted_injection_enabled(void)
return sev_status & MSR_AMD64_SEV_RESTRICTED_INJECTION_ENABLED;
}

+static void __init construct_sysvec_table(void)
+{
+ struct sysvec_entry *p;
+
+ for (p = __system_vectors; p < __system_vectors_end; p++)
+ sysvec_table[p->vector - FIRST_SYSTEM_VECTOR] = p->sysvec_func;
+}
+
void __init sev_snp_init_hv_handling(void)
{
struct sev_snp_runtime_data *snp_data;
@@ -405,6 +383,8 @@ void __init sev_snp_init_hv_handling(void)
apic_set_eoi_write(hv_doorbell_apic_eoi_write);

local_irq_restore(flags);
+
+ construct_sysvec_table();
}

static int vc_fetch_insn_kernel(struct es_em_ctxt *ctxt,
diff --git a/arch/x86/kernel/vmlinux.lds.S b/arch/x86/kernel/vmlinux.lds.S
index 15f29053cec4..160c7cdaa3de 100644
--- a/arch/x86/kernel/vmlinux.lds.S
+++ b/arch/x86/kernel/vmlinux.lds.S
@@ -322,6 +322,13 @@ SECTIONS
*(.altinstr_replacement)
}

+ . = ALIGN(8);
+ .system_vectors : AT(ADDR(.system_vectors) - LOAD_OFFSET) {
+ __system_vectors = .;
+ *(.system_vectors)
+ __system_vectors_end = .;
+ }
+
. = ALIGN(8);
.apicdrivers : AT(ADDR(.apicdrivers) - LOAD_OFFSET) {
__apicdrivers = .;
--
2.25.1


2022-12-12 18:20:42

by Michael Kelley (LINUX)

[permalink] [raw]
Subject: RE: [RFC PATCH V2 02/18] x86/hyperv: Add sev-snp enlightened guest specific config

From: Tianyu Lan <[email protected]> Sent: Friday, November 18, 2022 7:46 PM
>
> Introduce static key isolation_type_en_snp for enlightened
> guest check and add some specific options in ms_hyperv_init_
> platform().
>
> Signed-off-by: Tianyu Lan <[email protected]>
> ---
> arch/x86/hyperv/ivm.c | 12 +++++++++++-
> arch/x86/include/asm/mshyperv.h | 2 ++
> arch/x86/kernel/cpu/mshyperv.c | 29 ++++++++++++++++++++++++-----
> drivers/hv/hv_common.c | 7 +++++++
> 4 files changed, 44 insertions(+), 6 deletions(-)
>
> diff --git a/arch/x86/hyperv/ivm.c b/arch/x86/hyperv/ivm.c
> index 1dbcbd9da74d..e9c30dad3419 100644
> --- a/arch/x86/hyperv/ivm.c
> +++ b/arch/x86/hyperv/ivm.c
> @@ -259,10 +259,20 @@ bool hv_is_isolation_supported(void)
> }
>
> DEFINE_STATIC_KEY_FALSE(isolation_type_snp);
> +DEFINE_STATIC_KEY_FALSE(isolation_type_en_snp);
> +
> +/*
> + * hv_isolation_type_en_snp - Check system runs in the AMD SEV-SNP based
> + * isolation enlightened VM.
> + */
> +bool hv_isolation_type_en_snp(void)
> +{
> + return static_branch_unlikely(&isolation_type_en_snp);
> +}
>
> /*
> * hv_isolation_type_snp - Check system runs in the AMD SEV-SNP based
> - * isolation VM.
> + * isolation VM with vTOM support.
> */
> bool hv_isolation_type_snp(void)
> {
> diff --git a/arch/x86/include/asm/mshyperv.h b/arch/x86/include/asm/mshyperv.h
> index 61f0c206bff0..9b8c3f638845 100644
> --- a/arch/x86/include/asm/mshyperv.h
> +++ b/arch/x86/include/asm/mshyperv.h
> @@ -14,6 +14,7 @@
> union hv_ghcb;
>
> DECLARE_STATIC_KEY_FALSE(isolation_type_snp);
> +DECLARE_STATIC_KEY_FALSE(isolation_type_en_snp);
>
> typedef int (*hyperv_fill_flush_list_func)(
> struct hv_guest_mapping_flush_list *flush,
> @@ -32,6 +33,7 @@ extern u64 hv_current_partition_id;
>
> extern union hv_ghcb * __percpu *hv_ghcb_pg;
>
> +extern bool hv_isolation_type_en_snp(void);

This file also has a declaration for hv_isolation_type_snp(). I
think this new declaration is near the beginning of this file so
that it can be referenced by hv_do_hypercall() and related
functions in Patch 6 of this series. So maybe move the
declaration of hv_isolation_type_snp() up so it is adjacent
to this one? It would make sense for the two to be together.

> int hv_call_deposit_pages(int node, u64 partition_id, u32 num_pages);
> int hv_call_add_logical_proc(int node, u32 lp_index, u32 acpi_id);
> int hv_call_create_vp(int node, u64 partition_id, u32 vp_index, u32 flags);
> diff --git a/arch/x86/kernel/cpu/mshyperv.c b/arch/x86/kernel/cpu/mshyperv.c
> index 831613959a92..2ea4f21c6172 100644
> --- a/arch/x86/kernel/cpu/mshyperv.c
> +++ b/arch/x86/kernel/cpu/mshyperv.c
> @@ -273,6 +273,21 @@ static void __init ms_hyperv_init_platform(void)
> ms_hyperv.misc_features = cpuid_edx(HYPERV_CPUID_FEATURES);
> ms_hyperv.hints = cpuid_eax(HYPERV_CPUID_ENLIGHTMENT_INFO);
>
> + /*
> + * Add custom configuration for SEV-SNP Enlightened guest
> + */
> + if (cc_platform_has(CC_ATTR_GUEST_SEV_SNP)) {
> + ms_hyperv.features |= HV_ACCESS_FREQUENCY_MSRS;
> + ms_hyperv.misc_features |= HV_FEATURE_FREQUENCY_MSRS_AVAILABLE;
> + ms_hyperv.misc_features &= ~HV_FEATURE_GUEST_CRASH_MSR_AVAILABLE;
> + ms_hyperv.hints |= HV_DEPRECATING_AEOI_RECOMMENDED;
> + ms_hyperv.hints |= HV_X64_APIC_ACCESS_RECOMMENDED;
> + ms_hyperv.hints |= HV_X64_CLUSTER_IPI_RECOMMENDED;
> + }
> +
> + pr_info("Hyper-V: enlightment features 0x%x, hints 0x%x, misc 0x%x\n",
> + ms_hyperv.features, ms_hyperv.hints, ms_hyperv.misc_features);
> +

What's the reason for this additional call to pr_info()? There's a call to pr_info()
a couple of lines below that displays the same information, and a little bit more.
It seems like the above call should be deleted, as I think we should try to be as
consistent as possible in the output.

> hv_max_functions_eax = cpuid_eax(HYPERV_CPUID_VENDOR_AND_MAX_FUNCTIONS);
>
> pr_info("Hyper-V: privilege flags low 0x%x, high 0x%x, hints 0x%x, misc 0x%x\n",
> @@ -328,18 +343,22 @@ static void __init ms_hyperv_init_platform(void)
> ms_hyperv.shared_gpa_boundary =
> BIT_ULL(ms_hyperv.shared_gpa_boundary_bits);
>
> - pr_info("Hyper-V: Isolation Config: Group A 0x%x, Group B 0x%x\n",
> - ms_hyperv.isolation_config_a, ms_hyperv.isolation_config_b);
> -
> - if (hv_get_isolation_type() == HV_ISOLATION_TYPE_SNP) {
> + if (cc_platform_has(CC_ATTR_GUEST_SEV_SNP)) {
> + static_branch_enable(&isolation_type_en_snp);
> + } else if (hv_get_isolation_type() == HV_ISOLATION_TYPE_SNP) {
> static_branch_enable(&isolation_type_snp);
> #ifdef CONFIG_SWIOTLB
> swiotlb_unencrypted_base = ms_hyperv.shared_gpa_boundary;
> #endif
> }
> +
> + pr_info("Hyper-V: Isolation Config: Group A 0x%x, Group B 0x%x\n",
> + ms_hyperv.isolation_config_a, ms_hyperv.isolation_config_b);
> +

Is there a reason for moving this pr_info() down a few lines? I can't see that the
intervening code changes any of the settings that are displayed, so it should be
good in the original location. Just trying to minimize changes that don't add value ...

> /* Isolation VMs are unenlightened SEV-based VMs, thus this check: */
> if (IS_ENABLED(CONFIG_AMD_MEM_ENCRYPT)) {
> - if (hv_get_isolation_type() != HV_ISOLATION_TYPE_NONE)
> + if (hv_get_isolation_type() != HV_ISOLATION_TYPE_NONE
> + && !cc_platform_has(CC_ATTR_GUEST_SEV_SNP))
> cc_set_vendor(CC_VENDOR_HYPERV);
> }
> }
> diff --git a/drivers/hv/hv_common.c b/drivers/hv/hv_common.c
> index ae68298c0dca..2c6602571c47 100644
> --- a/drivers/hv/hv_common.c
> +++ b/drivers/hv/hv_common.c
> @@ -268,6 +268,13 @@ bool __weak hv_isolation_type_snp(void)
> }
> EXPORT_SYMBOL_GPL(hv_isolation_type_snp);
>
> +bool __weak hv_isolation_type_en_snp(void)
> +{
> + return false;
> +}
> +EXPORT_SYMBOL_GPL(hv_isolation_type_en_snp);
> +
> +

Nit: One blank line is sufficient. Don't need to add two.

> void __weak hv_setup_vmbus_handler(void (*handler)(void))
> {
> }
> --
> 2.25.1

2022-12-13 00:18:04

by Michael Kelley (LINUX)

[permalink] [raw]
Subject: RE: [RFC PATCH V2 05/18] x86/hyperv: Get Virtual Trust Level via hvcall

From: Tianyu Lan <[email protected]> Sent: Friday, November 18, 2022 7:46 PM
>
> sev-snp guest provides vtl(Virtual Trust Level) and get it from
> hyperv hvcall via HVCALL_GET_VP_REGISTERS.

Two general comments:

1) Could this patch be combined with Patch 9 of the series? It seems
like they go together since Patch 9 is the consumer of the VTL.

2) What is the bigger picture motivation for this patch and Patch 9
being part of the patch series for support fully enlightened SEV-SNP
guests? Won't the VTL always be 0 in such a guest? The code currently
assumes VTL 0, so it seems like this patch doesn't change anything. Or
maybe there's a scenario that I'm not aware of where the VTL is
other than 0.

>
> Signed-off-by: Tianyu Lan <[email protected]>
> ---
> arch/x86/hyperv/hv_init.c | 35 ++++++++++++++++++++++++++++++++++
> include/asm-generic/mshyperv.h | 2 ++
> 2 files changed, 37 insertions(+)
>
> diff --git a/arch/x86/hyperv/hv_init.c b/arch/x86/hyperv/hv_init.c
> index 4600c5941957..5b919d4d24c0 100644
> --- a/arch/x86/hyperv/hv_init.c
> +++ b/arch/x86/hyperv/hv_init.c
> @@ -390,6 +390,39 @@ static void __init hv_get_partition_id(void)
> local_irq_restore(flags);
> }
>
> +static u8 __init get_current_vtl(void)

The name get_current_vtl() seems to imply that there might be a
"previous" VTL, or that the VTL might change over time. I'm not aware
that either is the case. Couldn't this just be get_vtl()?

> +{
> + u64 control = ((u64)1 << HV_HYPERCALL_REP_COMP_OFFSET) | HVCALL_GET_VP_REGISTERS;

Simplify by using HV_HYPERCALL_REP_COMP_1.

> + struct hv_get_vp_registers_input *input = NULL;
> + struct hv_get_vp_registers_output *output = NULL;

It doesn't seem like the above two initializations to NULL are needed.

> + u8 vtl = 0;
> + int ret;

The result of hv_do_hypercall() should always be a u64.

> + unsigned long flags;
> +
> + local_irq_save(flags);
> + input = *(struct hv_get_vp_registers_input **)this_cpu_ptr(hyperv_pcpu_input_arg);
> + output = (struct hv_get_vp_registers_output *)input;
> + if (!input || !output) {

Don't need to check both values since one is assigned from the other. :-)

> + pr_err("Hyper-V: cannot allocate a shared page!");

Error message text isn't correct.

> + goto done;

Need to do local_irq_restore() before goto done.

> + }
> +
> + memset(input, 0, sizeof(*input) + sizeof(input->element[0]));
> + input->header.partitionid = HV_PARTITION_ID_SELF;
> + input->header.inputvtl = 0;
> + input->element[0].name0 = 0x000D0003;

This constant should go in one of the hyperv-tlfs.h header files. If I
recall correctly, we're currently treating VTLs as x86-specific, so should
go in arch/x86/include/asm/hyperv-tlfs.h.

> +
> + ret = hv_do_hypercall(control, input, output);
> + if (ret == 0)

Use hv_result_success(ret).

> + vtl = output->as64.low & 0xf;

The 0xF mask should be defined in the hyperv-tlfs.h per
above.

> + else
> + pr_err("Hyper-V: failed to get the current VTL!");

Again, drop the word "current". And the exclamation mark isn't needed. :-)

> + local_irq_restore(flags);
> +
> +done:
> + return vtl;
> +}
> +
> /*
> * This function is to be invoked early in the boot sequence after the
> * hypervisor has been detected.
> @@ -527,6 +560,8 @@ void __init hyperv_init(void)
> if (hv_is_isolation_supported())
> swiotlb_update_mem_attributes();
> #endif
> + /* Find the current VTL */
> + ms_hyperv.vtl = get_current_vtl();

Drop "current" in the comment and function name.

>
> return;
>
> diff --git a/include/asm-generic/mshyperv.h b/include/asm-generic/mshyperv.h
> index bfb9eb9d7215..68133de044ec 100644
> --- a/include/asm-generic/mshyperv.h
> +++ b/include/asm-generic/mshyperv.h
> @@ -46,6 +46,7 @@ struct ms_hyperv_info {
> };
> };
> u64 shared_gpa_boundary;
> + u8 vtl;
> };
> extern struct ms_hyperv_info ms_hyperv;
>
> @@ -55,6 +56,7 @@ extern void * __percpu *hyperv_pcpu_output_arg;
> extern u64 hv_do_hypercall(u64 control, void *inputaddr, void *outputaddr);
> extern u64 hv_do_fast_hypercall8(u16 control, u64 input8);
> extern bool hv_isolation_type_snp(void);
> +extern bool hv_isolation_type_en_snp(void);

This declaration of hv_isolation_type_en_snp() shouldn't be needed here
as it has already been added to arch/x86/include/asm/mshyperv.h.

The declaration of hv_isolation_type_snp() occurs both places, but I
think that's some sloppiness from the past that could be fixed. In fact,
hv_isolation_type_snp() occurs *twice* in include/asm-generic/mshyperv.h.

>
> /* Helper functions that provide a consistent pattern for checking Hyper-V hypercall
> status. */
> static inline int hv_result(u64 status)
> --
> 2.25.1

2022-12-13 07:47:18

by Gupta, Pankaj

[permalink] [raw]
Subject: Re: [RFC PATCH V2 18/18] x86/sev: Fix interrupt exit code paths from #HV exception


> Add checks in interrupt exit code paths in case of returns
> to user mode to check if currently executing the #HV handler
> then don't follow the irqentry_exit_to_user_mode path as
> that can potentially cause the #HV handler to be
> preempted and rescheduled on another CPU. Rescheduled #HV
> handler on another cpu will cause interrupts to be handled
> on a different cpu than the injected one, causing
> invalid EOIs and missed/lost guest interrupts and
> corresponding hangs and/or per-cpu IRQs handled on
> non-intended cpu.
>
> Signed-off-by: Ashish Kalra <[email protected]>
> ---
> arch/x86/include/asm/idtentry.h | 66 +++++++++++++++++++++++++++++++++
> arch/x86/kernel/sev.c | 30 +++++++++++++++
> 2 files changed, 96 insertions(+)
>
> diff --git a/arch/x86/include/asm/idtentry.h b/arch/x86/include/asm/idtentry.h
> index 652fea10d377..45b47132be7c 100644
> --- a/arch/x86/include/asm/idtentry.h
> +++ b/arch/x86/include/asm/idtentry.h
> @@ -13,6 +13,10 @@
>
> #include <asm/irq_stack.h>
>
> +#ifdef CONFIG_AMD_MEM_ENCRYPT
> +noinstr void irqentry_exit_hv_cond(struct pt_regs *regs, irqentry_state_t state);

For linux host, CONFIG_AMD_MEM_ENCRYPT also gets enabled at host side
(for SME) and 'irqentry_exit_hv_cond' gets called. So, we need to handle
the below cases even when host CONFIG_AMD_MEM_ENCRYPT is enabled?

Thanks,
Pankaj

> +#endif
> +
> /**
> * DECLARE_IDTENTRY - Declare functions for simple IDT entry points
> * No error code pushed by hardware
> @@ -176,6 +180,7 @@ __visible noinstr void func(struct pt_regs *regs, unsigned long error_code)
> #define DECLARE_IDTENTRY_IRQ(vector, func) \
> DECLARE_IDTENTRY_ERRORCODE(vector, func)
>
> +#ifndef CONFIG_AMD_MEM_ENCRYPT
> /**
> * DEFINE_IDTENTRY_IRQ - Emit code for device interrupt IDT entry points
> * @func: Function name of the entry point
> @@ -205,6 +210,26 @@ __visible noinstr void func(struct pt_regs *regs, \
> } \
> \
> static noinline void __##func(struct pt_regs *regs, u32 vector)
> +#else
> +
> +#define DEFINE_IDTENTRY_IRQ(func) \
> +static void __##func(struct pt_regs *regs, u32 vector); \
> + \
> +__visible noinstr void func(struct pt_regs *regs, \
> + unsigned long error_code) \
> +{ \
> + irqentry_state_t state = irqentry_enter(regs); \
> + u32 vector = (u32)(u8)error_code; \
> + \
> + instrumentation_begin(); \
> + kvm_set_cpu_l1tf_flush_l1d(); \
> + run_irq_on_irqstack_cond(__##func, regs, vector); \
> + instrumentation_end(); \
> + irqentry_exit_hv_cond(regs, state); \
> +} \
> + \
> +static noinline void __##func(struct pt_regs *regs, u32 vector)
> +#endif
>
> /**
> * DECLARE_IDTENTRY_SYSVEC - Declare functions for system vector entry points
> @@ -221,6 +246,7 @@ static noinline void __##func(struct pt_regs *regs, u32 vector)
> #define DECLARE_IDTENTRY_SYSVEC(vector, func) \
> DECLARE_IDTENTRY(vector, func)
>
> +#ifndef CONFIG_AMD_MEM_ENCRYPT
> /**
> * DEFINE_IDTENTRY_SYSVEC - Emit code for system vector IDT entry points
> * @func: Function name of the entry point
> @@ -245,6 +271,26 @@ __visible noinstr void func(struct pt_regs *regs) \
> } \
> \
> static noinline void __##func(struct pt_regs *regs)
> +#else
> +
> +#define DEFINE_IDTENTRY_SYSVEC(func) \
> +static void __##func(struct pt_regs *regs); \
> + \
> +__visible noinstr void func(struct pt_regs *regs) \
> +{ \
> + irqentry_state_t state = irqentry_enter(regs); \
> + \
> + instrumentation_begin(); \
> + kvm_set_cpu_l1tf_flush_l1d(); \
> + run_sysvec_on_irqstack_cond(__##func, regs); \
> + instrumentation_end(); \
> + irqentry_exit_hv_cond(regs, state); \
> +} \
> + \
> +static noinline void __##func(struct pt_regs *regs)
> +#endif
> +
> +#ifndef CONFIG_AMD_MEM_ENCRYPT
>
> /**
> * DEFINE_IDTENTRY_SYSVEC_SIMPLE - Emit code for simple system vector IDT
> @@ -274,6 +320,26 @@ __visible noinstr void func(struct pt_regs *regs) \
> } \
> \
> static __always_inline void __##func(struct pt_regs *regs)
> +#else
> +
> +#define DEFINE_IDTENTRY_SYSVEC_SIMPLE(func) \
> +static __always_inline void __##func(struct pt_regs *regs); \
> + \
> +__visible noinstr void func(struct pt_regs *regs) \
> +{ \
> + irqentry_state_t state = irqentry_enter(regs); \
> + \
> + instrumentation_begin(); \
> + __irq_enter_raw(); \
> + kvm_set_cpu_l1tf_flush_l1d(); \
> + __##func(regs); \
> + __irq_exit_raw(); \
> + instrumentation_end(); \
> + irqentry_exit_hv_cond(regs, state); \
> +} \
> + \
> +static __always_inline void __##func(struct pt_regs *regs)
> +#endif
>
> /**
> * DECLARE_IDTENTRY_XENCB - Declare functions for XEN HV callback entry point
> diff --git a/arch/x86/kernel/sev.c b/arch/x86/kernel/sev.c
> index 5a2f59022c98..ef6a123c50fe 100644
> --- a/arch/x86/kernel/sev.c
> +++ b/arch/x86/kernel/sev.c
> @@ -153,6 +153,10 @@ struct sev_hv_doorbell_page {
>
> struct sev_snp_runtime_data {
> struct sev_hv_doorbell_page hv_doorbell_page;
> + /*
> + * Indication that we are currently handling #HV events.
> + */
> + bool hv_handling_events;
> };
>
> static DEFINE_PER_CPU(struct sev_snp_runtime_data*, snp_runtime_data);
> @@ -206,6 +210,8 @@ static void do_exc_hv(struct pt_regs *regs)
> union hv_pending_events pending_events;
> u8 vector;
>
> + this_cpu_read(snp_runtime_data)->hv_handling_events = true;
> +
> while (sev_hv_pending()) {
> asm volatile("cli" : : : "memory");
>
> @@ -244,6 +250,8 @@ static void do_exc_hv(struct pt_regs *regs)
>
> asm volatile("sti" : : : "memory");
> }
> +
> + this_cpu_read(snp_runtime_data)->hv_handling_events = false;
> }
>
> void check_hv_pending(struct pt_regs *regs)
> @@ -2541,3 +2549,25 @@ static int __init snp_init_platform_device(void)
> return 0;
> }
> device_initcall(snp_init_platform_device);
> +
> +noinstr void irqentry_exit_hv_cond(struct pt_regs *regs, irqentry_state_t state)
> +{
> + /*
> + * Check whether this returns to user mode, if so and if
> + * we are currently executing the #HV handler then we don't
> + * want to follow the irqentry_exit_to_user_mode path as
> + * that can potentially cause the #HV handler to be
> + * preempted and rescheduled on another CPU. Rescheduled #HV
> + * handler on another cpu will cause interrupts to be handled
> + * on a different cpu than the injected one, causing
> + * invalid EOIs and missed/lost guest interrupts and
> + * corresponding hangs and/or per-cpu IRQs handled on
> + * non-intended cpu.
> + */
> + if (user_mode(regs) &&
> + this_cpu_read(snp_runtime_data)->hv_handling_events)
> + return;
> +
> + /* follow normal interrupt return/exit path */
> + irqentry_exit(regs, state);
> +}

2022-12-13 10:08:57

by Tianyu Lan

[permalink] [raw]
Subject: Re: [RFC PATCH V2 02/18] x86/hyperv: Add sev-snp enlightened guest specific config

On 12/13/2022 1:56 AM, Michael Kelley (LINUX) wrote:
>> @@ -32,6 +33,7 @@ extern u64 hv_current_partition_id;
>>
>> extern union hv_ghcb * __percpu *hv_ghcb_pg;
>>
>> +extern bool hv_isolation_type_en_snp(void);
> This file also has a declaration for hv_isolation_type_snp(). I
> think this new declaration is near the beginning of this file so
> that it can be referenced by hv_do_hypercall() and related
> functions in Patch 6 of this series. So maybe move the
> declaration of hv_isolation_type_snp() up so it is adjacent
> to this one? It would make sense for the two to be together.

Agree. Will update in the next version.

>
>> diff --git a/arch/x86/kernel/cpu/mshyperv.c b/arch/x86/kernel/cpu/mshyperv.c
>> index 831613959a92..2ea4f21c6172 100644
>> --- a/arch/x86/kernel/cpu/mshyperv.c
>> +++ b/arch/x86/kernel/cpu/mshyperv.c
>> @@ -273,6 +273,21 @@ static void __init ms_hyperv_init_platform(void)
>> ms_hyperv.misc_features = cpuid_edx(HYPERV_CPUID_FEATURES);
>> ms_hyperv.hints = cpuid_eax(HYPERV_CPUID_ENLIGHTMENT_INFO);
>>
>> + /*
>> + * Add custom configuration for SEV-SNP Enlightened guest
>> + */
>> + if (cc_platform_has(CC_ATTR_GUEST_SEV_SNP)) {
>> + ms_hyperv.features |= HV_ACCESS_FREQUENCY_MSRS;
>> + ms_hyperv.misc_features |= HV_FEATURE_FREQUENCY_MSRS_AVAILABLE;
>> + ms_hyperv.misc_features &= ~HV_FEATURE_GUEST_CRASH_MSR_AVAILABLE;
>> + ms_hyperv.hints |= HV_DEPRECATING_AEOI_RECOMMENDED;
>> + ms_hyperv.hints |= HV_X64_APIC_ACCESS_RECOMMENDED;
>> + ms_hyperv.hints |= HV_X64_CLUSTER_IPI_RECOMMENDED;
>> + }
>> +
>> + pr_info("Hyper-V: enlightment features 0x%x, hints 0x%x, misc 0x%x\n",
>> + ms_hyperv.features, ms_hyperv.hints, ms_hyperv.misc_features);
>> +
> What's the reason for this additional call to pr_info()? There's a call to pr_info()
> a couple of lines below that displays the same information, and a little bit more.
> It seems like the above call should be deleted, as I think we should try to be as
> consistent as possible in the output.

Sorry for noise. This one should be redundant. Will remove in the next
version.

>
>> @@ -328,18 +343,22 @@ static void __init ms_hyperv_init_platform(void)
>> ms_hyperv.shared_gpa_boundary =
>> BIT_ULL(ms_hyperv.shared_gpa_boundary_bits);
>>
>> - pr_info("Hyper-V: Isolation Config: Group A 0x%x, Group B 0x%x\n",
>> - ms_hyperv.isolation_config_a, ms_hyperv.isolation_config_b);
>> -
>> - if (hv_get_isolation_type() == HV_ISOLATION_TYPE_SNP) {
>> + if (cc_platform_has(CC_ATTR_GUEST_SEV_SNP)) {
>> + static_branch_enable(&isolation_type_en_snp);
>> + } else if (hv_get_isolation_type() == HV_ISOLATION_TYPE_SNP) {
>> static_branch_enable(&isolation_type_snp);
>> #ifdef CONFIG_SWIOTLB
>> swiotlb_unencrypted_base = ms_hyperv.shared_gpa_boundary;
>> #endif
>> }
>> +
>> + pr_info("Hyper-V: Isolation Config: Group A 0x%x, Group B 0x%x\n",
>> + ms_hyperv.isolation_config_a, ms_hyperv.isolation_config_b);
>> +
> Is there a reason for moving this pr_info() down a few lines? I can't see that the
> intervening code changes any of the settings that are displayed, so it should be
> good in the original location. Just trying to minimize changes that don't add value ...
>

Agree. Will keep previous order. Thanks.

2022-12-13 17:40:05

by Michael Kelley (LINUX)

[permalink] [raw]
Subject: RE: [RFC PATCH V2 07/18] clocksource: hyper-v: decrypt hyperv tsc page in sev-snp enlightened guest

From: Tianyu Lan <[email protected]> Sent: Friday, November 18, 2022 7:46 PM
>

Previous patches to the Hyper-V clocksource driver have not been very
consistent in the Subject line prefix, but let's use
"clocksource/drivers/hyper-v:" since it has been used the most.

> Hyper-V tsc page is shared with hypervisor and it should be decrypted
> in sev-snp enlightened guest when it's used.
>
> Signed-off-by: Tianyu Lan <[email protected]>
> ---
> drivers/clocksource/hyperv_timer.c | 2 +-
> 1 file changed, 1 insertion(+), 1 deletion(-)
>
> diff --git a/drivers/clocksource/hyperv_timer.c b/drivers/clocksource/hyperv_timer.c
> index bb47610bbd1c..aa68eebed5ee 100644
> --- a/drivers/clocksource/hyperv_timer.c
> +++ b/drivers/clocksource/hyperv_timer.c
> @@ -364,7 +364,7 @@ EXPORT_SYMBOL_GPL(hv_stimer_global_cleanup);
> static union {
> struct ms_hyperv_tsc_page page;
> u8 reserved[PAGE_SIZE];
> -} tsc_pg __aligned(PAGE_SIZE);
> +} tsc_pg __bss_decrypted __aligned(PAGE_SIZE);
>
> struct ms_hyperv_tsc_page *hv_get_tsc_page(void)
> {
> --
> 2.25.1

2022-12-13 18:17:04

by Michael Kelley (LINUX)

[permalink] [raw]
Subject: RE: [RFC PATCH V2 08/18] x86/hyperv: decrypt vmbus pages for sev-snp enlightened guest

From: Tianyu Lan <[email protected]> Sent: Friday, November 18, 2022 7:46 PM
>

The Subject prefix for this patch should be "Drivers: hv: vmbus:"

> Vmbus int, synic and post message pages are shared with hypervisor
> and so decrypt these pages in the sev-snp guest.
>
> Signed-off-by: Tianyu Lan <[email protected]>
> ---
> drivers/hv/connection.c | 13 +++++++++++++
> drivers/hv/hv.c | 32 +++++++++++++++++++++++++++++++-
> 2 files changed, 44 insertions(+), 1 deletion(-)
>
> diff --git a/drivers/hv/connection.c b/drivers/hv/connection.c
> index 9dc27e5d367a..43141225ea15 100644
> --- a/drivers/hv/connection.c
> +++ b/drivers/hv/connection.c
> @@ -215,6 +215,15 @@ int vmbus_connect(void)
> (void *)((unsigned long)vmbus_connection.int_page +
> (HV_HYP_PAGE_SIZE >> 1));
>
> + if (hv_isolation_type_snp() || hv_isolation_type_en_snp()) {

This decryption should be done only for a fully enlightened SEV-SNP
guest, not for a vTOM guest.

> + ret = set_memory_decrypted((unsigned long)
> + vmbus_connection.int_page, 1);
> + if (ret)
> + goto cleanup;

This cleanup path doesn't work correctly. It calls
vmbus_disconnect(), which will try to re-encrypt the memory.
But if the original decryption failed, re-encrypting is the wrong
thing to do.

It looks like this same bug exists in current code if the decryption
of the monitor pages fails or if just one of the original memory
allocations fails. vmbus_disconnect() doesn't know whether it
should re-encrypt the pages.

> +
> + memset(vmbus_connection.int_page, 0, PAGE_SIZE);
> + }
> +
> /*
> * Setup the monitor notification facility. The 1st page for
> * parent->child and the 2nd page for child->parent
> @@ -372,6 +381,10 @@ void vmbus_disconnect(void)
> destroy_workqueue(vmbus_connection.work_queue);
>
> if (vmbus_connection.int_page) {
> + if (hv_isolation_type_en_snp())
> + set_memory_encrypted((unsigned long)
> + vmbus_connection.int_page, 1);
> +
> hv_free_hyperv_page((unsigned long)vmbus_connection.int_page);
> vmbus_connection.int_page = NULL;
> }
> diff --git a/drivers/hv/hv.c b/drivers/hv/hv.c
> index 4d6480d57546..f9111eb32739 100644
> --- a/drivers/hv/hv.c
> +++ b/drivers/hv/hv.c
> @@ -20,6 +20,7 @@
> #include <linux/interrupt.h>
> #include <clocksource/hyperv_timer.h>
> #include <asm/mshyperv.h>
> +#include <linux/set_memory.h>
> #include "hyperv_vmbus.h"
>
> /* The one and only */
> @@ -117,7 +118,7 @@ int hv_post_message(union hv_connection_id connection_id,
>
> int hv_synic_alloc(void)
> {
> - int cpu;
> + int cpu, ret;
> struct hv_per_cpu_context *hv_cpu;
>
> /*
> @@ -168,6 +169,29 @@ int hv_synic_alloc(void)
> pr_err("Unable to allocate post msg page\n");
> goto err;
> }
> +
> + if (hv_isolation_type_en_snp()) {
> + ret = set_memory_decrypted((unsigned long)
> + hv_cpu->synic_message_page, 1);
> + ret |= set_memory_decrypted((unsigned long)
> + hv_cpu->synic_event_page, 1);
> + ret |= set_memory_decrypted((unsigned long)
> + hv_cpu->post_msg_page, 1);
> +
> + if (ret) {
> + set_memory_encrypted((unsigned long)
> + hv_cpu->synic_message_page, 1);
> + set_memory_encrypted((unsigned long)
> + hv_cpu->synic_event_page, 1);
> + set_memory_encrypted((unsigned long)
> + hv_cpu->post_msg_page, 1);
> + goto err;

Same kind of cleanup problem here. Some of the memory may have
been decrypted, but some may not have. Re-encrypting all three pages
risks re-encrypting a page that failed to be decrypted, and that might
cause problems.

> + }
> +
> + memset(hv_cpu->synic_message_page, 0, PAGE_SIZE);
> + memset(hv_cpu->synic_event_page, 0, PAGE_SIZE);
> + memset(hv_cpu->post_msg_page, 0, PAGE_SIZE);
> + }
> }
>
> return 0;
> @@ -188,6 +212,12 @@ void hv_synic_free(void)
> struct hv_per_cpu_context *hv_cpu
> = per_cpu_ptr(hv_context.cpu_context, cpu);
>
> + if (hv_isolation_type_en_snp()) {
> + set_memory_encrypted((unsigned long)hv_cpu->synic_message_page, 1);
> + set_memory_encrypted((unsigned long)hv_cpu->synic_event_page, 1);
> + set_memory_encrypted((unsigned long)hv_cpu->post_msg_page, 1);

This cleanup doesn't always work correctly. There are multiple memory
allocations in hv_synic_alloc(). If some succeeded, but some failed, then
might get here with some memory that was allocated but not decrypted.
Trying to re-encrypt that memory before freeing it could cause problems.

> + }
> +
> free_page((unsigned long)hv_cpu->synic_event_page);
> free_page((unsigned long)hv_cpu->synic_message_page);
> free_page((unsigned long)hv_cpu->post_msg_page);
> --
> 2.25.1

2022-12-14 16:53:23

by Tianyu Lan

[permalink] [raw]
Subject: Re: [RFC PATCH V2 07/18] clocksource: hyper-v: decrypt hyperv tsc page in sev-snp enlightened guest

On 12/14/2022 1:30 AM, Michael Kelley (LINUX) wrote:
> From: Tianyu Lan<[email protected]> Sent: Friday, November 18, 2022 7:46 PM
> Previous patches to the Hyper-V clocksource driver have not been very
> consistent in the Subject line prefix, but let's use
> "clocksource/drivers/hyper-v:" since it has been used the most.
>

OK. Will update in the next version.

Thanks.

2022-12-14 18:39:08

by Michael Kelley (LINUX)

[permalink] [raw]
Subject: RE: [RFC PATCH V2 11/18] Drivers: hv: vmbus: Decrypt vmbus ring buffer

From: Tianyu Lan <[email protected]> Sent: Friday, November 18, 2022 7:46 PM
>
> The ring buffer is remapped in the hv_ringbuffer_init()
> and it should be with decrypt flag in order to share it
> with hypervisor in sev-snp enlightened guest.

FWIW, the change in this patch is included in Patch 9
in my vTOM-related patch series.

>
> Signed-off-by: Tianyu Lan <[email protected]>
> ---
> drivers/hv/ring_buffer.c | 7 ++++++-
> 1 file changed, 6 insertions(+), 1 deletion(-)
>
> diff --git a/drivers/hv/ring_buffer.c b/drivers/hv/ring_buffer.c
> index 59a4aa86d1f3..391995c76be7 100644
> --- a/drivers/hv/ring_buffer.c
> +++ b/drivers/hv/ring_buffer.c
> @@ -18,6 +18,7 @@
> #include <linux/slab.h>
> #include <linux/prefetch.h>
> #include <linux/io.h>
> +#include <linux/set_memory.h>
> #include <asm/mshyperv.h>
>
> #include "hyperv_vmbus.h"
> @@ -233,14 +234,18 @@ int hv_ringbuffer_init(struct hv_ring_buffer_info *ring_info,
>
> ring_info->ring_buffer = (struct hv_ring_buffer *)
> vmap(pages_wraparound, page_cnt * 2 - 1, VM_MAP,
> + hv_isolation_type_en_snp() ?
> + pgprot_decrypted(PAGE_KERNEL_NOENC) :
> PAGE_KERNEL);

Note that pgprot_decrypted(PAGE_KERNEL_NOENC) can always be
used, as it does nothing in a normal VM. Its use doesn't need to be
conditional on the isolation type.

>
> + if (hv_isolation_type_en_snp())
> + memset(ring_info->ring_buffer, 0x00, page_cnt * PAGE_SIZE);

My version of the change always does the zero'ing, but only of
the first page, as Hyper-V expects the ring buffer header page to be
clean. The initial contents of the rest of the ring buffer doesn't
really matter.

> +
> kfree(pages_wraparound);
> if (!ring_info->ring_buffer)
> return -ENOMEM;
> }
>
> -

The above looks like a spurious whitespace change.

> ring_info->ring_buffer->read_index =
> ring_info->ring_buffer->write_index = 0;
>
> --
> 2.25.1

2022-12-26 04:27:31

by Tianyu Lan

[permalink] [raw]
Subject: Re: [RFC PATCH V2 08/18] x86/hyperv: decrypt vmbus pages for sev-snp enlightened guest

On 12/14/2022 2:08 AM, Michael Kelley (LINUX) wrote:
> From: Tianyu Lan <[email protected]> Sent: Friday, November 18, 2022 7:46 PM
>>
>
> The Subject prefix for this patch should be "Drivers: hv: vmbus:"

Sure. Will update in the next version.

>
>> Vmbus int, synic and post message pages are shared with hypervisor
>> and so decrypt these pages in the sev-snp guest.
>>
>> Signed-off-by: Tianyu Lan <[email protected]>
>> ---
>> drivers/hv/connection.c | 13 +++++++++++++
>> drivers/hv/hv.c | 32 +++++++++++++++++++++++++++++++-
>> 2 files changed, 44 insertions(+), 1 deletion(-)
>>
>> diff --git a/drivers/hv/connection.c b/drivers/hv/connection.c
>> index 9dc27e5d367a..43141225ea15 100644
>> --- a/drivers/hv/connection.c
>> +++ b/drivers/hv/connection.c
>> @@ -215,6 +215,15 @@ int vmbus_connect(void)
>> (void *)((unsigned long)vmbus_connection.int_page +
>> (HV_HYP_PAGE_SIZE >> 1));
>>
>> + if (hv_isolation_type_snp() || hv_isolation_type_en_snp()) {
>
> This decryption should be done only for a fully enlightened SEV-SNP
> guest, not for a vTOM guest.
>
>> + ret = set_memory_decrypted((unsigned long)
>> + vmbus_connection.int_page, 1);
>> + if (ret)
>> + goto cleanup;
>
> This cleanup path doesn't work correctly. It calls
> vmbus_disconnect(), which will try to re-encrypt the memory.
> But if the original decryption failed, re-encrypting is the wrong
> thing to do.
>
> It looks like this same bug exists in current code if the decryption
> of the monitor pages fails or if just one of the original memory
> allocations fails. vmbus_disconnect() doesn't know whether it
> should re-encrypt the pages.

Agree. It's necessary to handle decryption failure case by case in stead
of re-encryting all pages. Will fix this in the next version. Thanks to
point out.

2022-12-26 08:14:47

by Tianyu Lan

[permalink] [raw]
Subject: Re: [RFC PATCH V2 11/18] Drivers: hv: vmbus: Decrypt vmbus ring buffer

On 12/15/2022 2:25 AM, Michael Kelley (LINUX) wrote:
> From: Tianyu Lan<[email protected]> Sent: Friday, November 18, 2022 7:46 PM
>> The ring buffer is remapped in the hv_ringbuffer_init()
>> and it should be with decrypt flag in order to share it
>> with hypervisor in sev-snp enlightened guest.
> FWIW, the change in this patch is included in Patch 9
> in my vTOM-related patch series.
>

I will rebase next version on your series. Thanks for reminder.

2022-12-28 17:17:46

by Michael Kelley (LINUX)

[permalink] [raw]
Subject: RE: [RFC PATCH V2 12/18] x86/hyperv: Initialize cpu and memory for sev-snp enlightened guest

From: Tianyu Lan <[email protected]> Sent: Friday, November 18, 2022 7:46 PM
>
> Read processor amd memory info from specific address which are
> populated by Hyper-V. Initialize smp cpu related ops, pvalidate
> system memory and add it into e820 table.
>
> Signed-off-by: Tianyu Lan <[email protected]>
> ---
> arch/x86/kernel/cpu/mshyperv.c | 75 ++++++++++++++++++++++++++++++++++
> 1 file changed, 75 insertions(+)
>
> diff --git a/arch/x86/kernel/cpu/mshyperv.c b/arch/x86/kernel/cpu/mshyperv.c
> index 2ea4f21c6172..f0c97210c64a 100644
> --- a/arch/x86/kernel/cpu/mshyperv.c
> +++ b/arch/x86/kernel/cpu/mshyperv.c
> @@ -34,6 +34,12 @@
> #include <clocksource/hyperv_timer.h>
> #include <asm/numa.h>
> #include <asm/coco.h>
> +#include <asm/io_apic.h>
> +#include <asm/svm.h>
> +#include <asm/sev.h>
> +#include <asm/sev-snp.h>
> +#include <asm/realmode.h>
> +#include <asm/e820/api.h>
>
> /* Is Linux running as the root partition? */
> bool hv_root_partition;
> @@ -253,6 +259,33 @@ static void __init hv_smp_prepare_cpus(unsigned int max_cpus)
> }
> #endif
>
> +static __init int hv_snp_set_rtc_noop(const struct timespec64 *now) { return -EINVAL; }
> +static __init void hv_snp_get_rtc_noop(struct timespec64 *now) { }

These two functions duplicate set_rtc_noop() and get_rtc_noop() in x86_init.c. Couldn't
the "static" be removed from these two, and just use them, like with x86_init_noop()?
You'd also need to add extern statements in x86_init.h. But making them like the
other *_init_noop() functions seems better than duplicating them.

Also, it looks like these two functions might be called well after Linux is booted, so
having them marked as "__init" seems problematic. The same is true for set_rtc_noop()
and get_rtc_noop().

I'd suggesting breaking out this RTC handling into a separate patch in the series.

> +
> +static u32 processor_count;
> +
> +static __init void hv_snp_get_smp_config(unsigned int early)
> +{
> + if (!early) {
> + while (num_processors < processor_count) {
> + early_per_cpu(x86_cpu_to_apicid, num_processors) = num_processors;
> + early_per_cpu(x86_bios_cpu_apicid, num_processors) = num_processors;
> + physid_set(num_processors, phys_cpu_present_map);
> + set_cpu_possible(num_processors, true);
> + set_cpu_present(num_processors, true);
> + num_processors++;
> + }
> + }
> +}
> +
> +struct memory_map_entry {
> + u64 starting_gpn;
> + u64 numpages;
> + u16 type;
> + u16 flags;
> + u32 reserved;
> +};
> +
> static void __init ms_hyperv_init_platform(void)
> {
> int hv_max_functions_eax;
> @@ -260,6 +293,11 @@ static void __init ms_hyperv_init_platform(void)
> int hv_host_info_ebx;
> int hv_host_info_ecx;
> int hv_host_info_edx;
> + struct memory_map_entry *entry;
> + struct e820_entry *e820_entry;
> + u64 e820_end;
> + u64 ram_end;
> + u64 page;
>
> #ifdef CONFIG_PARAVIRT
> pv_info.name = "Hyper-V";
> @@ -477,6 +515,43 @@ static void __init ms_hyperv_init_platform(void)
> if (!(ms_hyperv.features & HV_ACCESS_TSC_INVARIANT))
> mark_tsc_unstable("running on Hyper-V");
>
> + if (cc_platform_has(CC_ATTR_GUEST_SEV_SNP)) {
> + x86_platform.legacy.reserve_bios_regions = 0;
> + x86_platform.set_wallclock = hv_snp_set_rtc_noop;
> + x86_platform.get_wallclock = hv_snp_get_rtc_noop;

Should x86_platform.legacy.rtc be set to 0 as well so that add_rtc_cmos()
does not try to register the RTC as a platform device?

> + x86_init.resources.probe_roms = x86_init_noop;
> + x86_init.resources.reserve_resources = x86_init_noop;

reserve_bios_regions, probe_roms, and reserve_resources are all being
set to 0 or NULL. Seems like there should be a comment or text in the
commit message saying why. I can work with you offline to write a concise
message.

> + x86_init.mpparse.find_smp_config = x86_init_noop;
> + x86_init.mpparse.get_smp_config = hv_snp_get_smp_config;
> +
> + /*
> + * Hyper-V SEV-SNP enlightened guest doesn't support ioapic
> + * and legacy APIC page read/write. Switch to hv apic here.
> + */
> + disable_ioapic_support();
> + hv_apic_init();
> +
> + processor_count = *(u32 *)__va(EN_SEV_SNP_PROCESSOR_INFO_ADDR);

EN_SEV_SNP_PROCESSOR_INFO_ADDR is not defined until Patch 13 of this series.

> +
> + entry = (struct memory_map_entry *)(__va(EN_SEV_SNP_PROCESSOR_INFO_ADDR)
> + + sizeof(struct memory_map_entry));

This calculation isn't what I expected. A brief summary of the layout of the memory
at this fixed address would be helpful. Evidently, the first 32 bits is the processor count,
and then there are multiple memory map entries. But why skip over an entire memory
map entry to account for the 32-bit processor_count?

Maybe the definition of struct memory_map_entry and EN_SEV_SNP_PROCESSOR_INFO_ADDR
should go in arch/x86/include/asm/mshyperv.h, along with some explanatory comments.

> +
> + for (; entry->numpages != 0; entry++) {
> + e820_entry = &e820_table->entries[e820_table->nr_entries - 1];
> + e820_end = e820_entry->addr + e820_entry->size;
> + ram_end = (entry->starting_gpn + entry->numpages) * PAGE_SIZE;
> +
> + if (e820_end < entry->starting_gpn * PAGE_SIZE)
> + e820_end = entry->starting_gpn * PAGE_SIZE;
> + if (e820_end < ram_end) {

I'm curious about any gaps or overlaps with the existing e820 map entries. What
is expected? Or is this code just trying to be defensive?

> + pr_info("Hyper-V: add [mem %#018Lx-%#018Lx]\n", e820_end, ram_end - 1);

Seems like the above message should include "e820" to make clear this is an update
to the e820 map.

> + e820__range_add(e820_end, ram_end - e820_end, E820_TYPE_RAM);
> + for (page = e820_end; page < ram_end; page += PAGE_SIZE)
> + pvalidate((unsigned long)__va(page), RMP_PG_SIZE_4K, true);
> + }
> + }

Does e820__update_table() need to be called once any range adds are complete?
I don't know, so I'm just asking.

> + }
> +
> hardlockup_detector_disable();
> }
>
> --
> 2.25.1

2022-12-28 18:31:30

by Michael Kelley (LINUX)

[permalink] [raw]
Subject: RE: [RFC PATCH V2 13/18] x86/hyperv: Add smp support for sev-snp guest

From: Tianyu Lan <[email protected]> Sent: Friday, November 18, 2022 7:46 PM
>
> The wakeup_secondary_cpu callback was populated with wakeup_
> cpu_via_vmgexit() which doesn't work for Hyper-V. Override it
> with Hyper-V specific hook which uses HVCALL_START_VIRTUAL_
> PROCESSOR hvcall to start AP with vmsa data structure.
>
> Signed-off-by: Tianyu Lan <[email protected]>
> ---
> arch/x86/include/asm/sev.h | 13 +++
> arch/x86/include/asm/svm.h | 55 ++++++++++-
> arch/x86/kernel/cpu/mshyperv.c | 147 +++++++++++++++++++++++++++++-
> include/asm-generic/hyperv-tlfs.h | 18 ++++
> 4 files changed, 230 insertions(+), 3 deletions(-)
>
> diff --git a/arch/x86/include/asm/sev.h b/arch/x86/include/asm/sev.h
> index ebc271bb6d8e..e34aaf730220 100644
> --- a/arch/x86/include/asm/sev.h
> +++ b/arch/x86/include/asm/sev.h
> @@ -86,6 +86,19 @@ extern bool handle_vc_boot_ghcb(struct pt_regs *regs);
>
> #define RMPADJUST_VMSA_PAGE_BIT BIT(16)
>
> +union sev_rmp_adjust {
> + u64 as_uint64;
> + struct {
> + unsigned long target_vmpl : 8;
> + unsigned long enable_read : 1;
> + unsigned long enable_write : 1;
> + unsigned long enable_user_execute : 1;
> + unsigned long enable_kernel_execute : 1;
> + unsigned long reserved1 : 4;
> + unsigned long vmsa : 1;
> + };
> +};
> +
> /* SNP Guest message request */
> struct snp_req_data {
> unsigned long req_gpa;
> diff --git a/arch/x86/include/asm/svm.h b/arch/x86/include/asm/svm.h
> index 0361626841bc..fc54d3e7f817 100644
> --- a/arch/x86/include/asm/svm.h
> +++ b/arch/x86/include/asm/svm.h
> @@ -328,8 +328,61 @@ struct vmcb_save_area {
> u64 br_to;
> u64 last_excp_from;
> u64 last_excp_to;
> - u8 reserved_6[72];
> +
> + /*
> + * The following part of the save area is valid only for
> + * SEV-ES guests when referenced through the GHCB or for
> + * saving to the host save area.
> + */

It seems unexpected to add these SEV-ES specific fields to a structure
with a comment that says for legacy and SEV-MEM guests. There's already
a struct sev_es_save_area with a comment that says for SEV-ES and
SEV_SNP guests, and that struct seems to have most or all of what is being
added here. Hopefully there's a way to use struct sev_es_save_area,
perhaps with some minor tweaks if necessary.

> + u8 reserved_7[72];
> u32 spec_ctrl; /* Guest version of SPEC_CTRL at 0x2E0 */
> + u8 reserved_7b[4];
> + u32 pkru;
> + u8 reserved_7a[20];
> + u64 reserved_8; /* rax already available at 0x01f8 */
> + u64 rcx;
> + u64 rdx;
> + u64 rbx;
> + u64 reserved_9; /* rsp already available at 0x01d8 */
> + u64 rbp;
> + u64 rsi;
> + u64 rdi;
> + u64 r8;
> + u64 r9;
> + u64 r10;
> + u64 r11;
> + u64 r12;
> + u64 r13;
> + u64 r14;
> + u64 r15;
> + u8 reserved_10[16];
> + u64 sw_exit_code;
> + u64 sw_exit_info_1;
> + u64 sw_exit_info_2;
> + u64 sw_scratch;
> + union {
> + u64 sev_features;
> + struct {
> + u64 sev_feature_snp : 1;
> + u64 sev_feature_vtom : 1;
> + u64 sev_feature_reflectvc : 1;
> + u64 sev_feature_restrict_injection : 1;
> + u64 sev_feature_alternate_injection : 1;
> + u64 sev_feature_full_debug : 1;
> + u64 sev_feature_reserved1 : 1;
> + u64 sev_feature_snpbtb_isolation : 1;
> + u64 sev_feature_resrved2 : 56;
> + };
> + };
> + u64 vintr_ctrl;
> + u64 guest_error_code;
> + u64 virtual_tom;
> + u64 tlb_id;
> + u64 pcpu_id;
> + u64 event_inject;
> + u64 xcr0;
> + u8 valid_bitmap[16];
> + u64 x87_state_gpa;
> } __packed;
>
> /* Save area definition for SEV-ES and SEV-SNP guests */
> diff --git a/arch/x86/kernel/cpu/mshyperv.c b/arch/x86/kernel/cpu/mshyperv.c
> index f0c97210c64a..b266f648e5cd 100644
> --- a/arch/x86/kernel/cpu/mshyperv.c
> +++ b/arch/x86/kernel/cpu/mshyperv.c
> @@ -41,6 +41,10 @@
> #include <asm/realmode.h>
> #include <asm/e820/api.h>
>
> +#define EN_SEV_SNP_PROCESSOR_INFO_ADDR 0x802000
> +#define HV_AP_INIT_GPAT_DEFAULT 0x0007040600070406ULL
> +#define HV_AP_SEGMENT_LIMIT 0xffffffff

The above three definitions would benefit from some comments explaining
what they are.

> +
> /* Is Linux running as the root partition? */
> bool hv_root_partition;
> struct ms_hyperv_info ms_hyperv;
> @@ -232,6 +236,136 @@ static void __init hv_smp_prepare_boot_cpu(void)
> #endif
> }
>
> +static u8 ap_start_input_arg[PAGE_SIZE] __bss_decrypted __aligned(PAGE_SIZE);
> +static u8 ap_start_stack[PAGE_SIZE] __aligned(PAGE_SIZE);
> +
> +int hv_snp_boot_ap(int cpu, unsigned long start_ip)
> +{
> + struct vmcb_save_area *vmsa = (struct vmcb_save_area *)
> + __get_free_page(GFP_KERNEL | __GFP_ZERO);
> + struct desc_ptr gdtr;
> + u64 ret, retry = 5;
> + struct hv_enable_vp_vtl_input *enable_vtl_input;
> + struct hv_start_virtual_processor_input *start_vp_input;
> + union sev_rmp_adjust rmp_adjust;
> + void **arg;
> + unsigned long flags;
> +
> + *(void **)per_cpu_ptr(hyperv_pcpu_input_arg, cpu) = ap_start_input_arg;

I don't understand the above. It seems like the hyperv_pcpu_input_arg is being
set to the same static location for all APs. The static location gets overwritten in
hv_common_cpu_init(), so maybe everything works. But it seems like
ap_start_input_arg can just be used directly in this function without having to
update hyperv_pcpu_input_arg.

> +
> + hv_vp_index[cpu] = cpu;

The hv_vp_index[cpu] is also updated in hv_common_cpu_init(). Is there a
reason to initialize the value here? This code also assumes that Linux CPU
numbers and Hyper-V VP indices are the same. I've always observed that they
are indeed the same, but Hyper-V doesn't guarantee that. Hence we set the
value in hv_common_cpu_init() based on reading the per-CPU synthetic
register that contains the VP index.

> +
> + /* Prevent APs from entering busy calibration loop */
> + preset_lpj = lpj_fine;

I wonder if this is really needed. In a SEV-SNP guest that isn't running on
Hyper-V, how is this handled?

> +
> + /* Replace the provided real-mode start_ip */
> + start_ip = (unsigned long)secondary_startup_64_no_verify;

Any reason to update this global value? The starting IP is passed to Hyper-V
via the VMSA, so it doesn't seem like a global update is needed.

> +
> + native_store_gdt(&gdtr);
> +
> + vmsa->gdtr.base = gdtr.address;
> + vmsa->gdtr.limit = gdtr.size;
> +
> + asm volatile("movl %%es, %%eax;" : "=a" (vmsa->es.selector));
> + if (vmsa->es.selector) {
> + vmsa->es.base = 0;
> + vmsa->es.limit = HV_AP_SEGMENT_LIMIT;
> + vmsa->es.attrib = *(u16 *)(vmsa->gdtr.base + vmsa->es.selector + 5);
> + vmsa->es.attrib = (vmsa->es.attrib & 0xFF) | ((vmsa->es.attrib >> 4) & 0xF00);
> + }

The above "if" statement is repeated four times with different registers. Seems
like a helper function could easily encapsulate it, though not the "asm volatile"
statement.

> +
> + asm volatile("movl %%cs, %%eax;" : "=a" (vmsa->cs.selector));
> + if (vmsa->cs.selector) {
> + vmsa->cs.base = 0;
> + vmsa->cs.limit = HV_AP_SEGMENT_LIMIT;
> + vmsa->cs.attrib = *(u16 *)(vmsa->gdtr.base + vmsa->cs.selector + 5);
> + vmsa->cs.attrib = (vmsa->cs.attrib & 0xFF) | ((vmsa->cs.attrib >> 4) & 0xF00);
> + }
> +
> + asm volatile("movl %%ss, %%eax;" : "=a" (vmsa->ss.selector));
> + if (vmsa->ss.selector) {
> + vmsa->ss.base = 0;
> + vmsa->ss.limit = HV_AP_SEGMENT_LIMIT;
> + vmsa->ss.attrib = *(u16 *)(vmsa->gdtr.base + vmsa->ss.selector + 5);
> + vmsa->ss.attrib = (vmsa->ss.attrib & 0xFF) | ((vmsa->ss.attrib >> 4) & 0xF00);
> + }
> +
> + asm volatile("movl %%ds, %%eax;" : "=a" (vmsa->ds.selector));
> + if (vmsa->ds.selector) {
> + vmsa->ds.base = 0;
> + vmsa->ds.limit = HV_AP_SEGMENT_LIMIT;
> + vmsa->ds.attrib = *(u16 *)(vmsa->gdtr.base + vmsa->ds.selector + 5);
> + vmsa->ds.attrib = (vmsa->ds.attrib & 0xFF) | ((vmsa->ds.attrib >> 4) & 0xF00);
> + }
> +
> + vmsa->efer = native_read_msr(MSR_EFER);
> +
> + asm volatile("movq %%cr4, %%rax;" : "=a" (vmsa->cr4));
> + asm volatile("movq %%cr3, %%rax;" : "=a" (vmsa->cr3));
> + asm volatile("movq %%cr0, %%rax;" : "=a" (vmsa->cr0));
> +
> + vmsa->xcr0 = 1;
> + vmsa->g_pat = HV_AP_INIT_GPAT_DEFAULT;
> + vmsa->rip = (u64)start_ip;
> + vmsa->rsp = (u64)&ap_start_stack[PAGE_SIZE];
> +
> + vmsa->sev_feature_snp = 1;
> + vmsa->sev_feature_restrict_injection = 1;
> +
> + rmp_adjust.as_uint64 = 0;
> + rmp_adjust.target_vmpl = 1;
> + rmp_adjust.vmsa = 1;
> + ret = rmpadjust((unsigned long)vmsa, RMP_PG_SIZE_4K,
> + rmp_adjust.as_uint64);
> + if (ret != 0) {
> + pr_err("RMPADJUST(%llx) failed: %llx\n", (u64)vmsa, ret);
> + return ret;
> + }
> +
> + local_irq_save(flags);
> + arg = (void **)this_cpu_ptr(hyperv_pcpu_input_arg);
> + if (unlikely(!*arg)) {
> + ret = -ENOMEM;
> + goto done;
> + }

This code seems unnecessary. Just use ap_start_input_arg directly.
No need to disable interrupts.

> +
> + if (ms_hyperv.vtl != 0) {
> + enable_vtl_input = (struct hv_enable_vp_vtl_input *)*arg;
> + memset(enable_vtl_input, 0, sizeof(*enable_vtl_input));
> + enable_vtl_input->partitionid = -1;
> + enable_vtl_input->vpindex = cpu;
> + enable_vtl_input->targetvtl = ms_hyperv.vtl;
> + *(u64 *)&enable_vtl_input->context[0] = __pa(vmsa) | 1;
> +
> + ret = hv_do_hypercall(HVCALL_ENABLE_VP_VTL, enable_vtl_input, NULL);
> + if (ret != 0) {

Use hv_result_success() to test the hypercall result.

> + pr_err("HvCallEnableVpVtl failed: %llx\n", ret);
> + goto done;
> + }
> + }
> +
> + start_vp_input = (struct hv_start_virtual_processor_input *)*arg;
> + memset(start_vp_input, 0, sizeof(*start_vp_input));
> + start_vp_input->partitionid = -1;
> + start_vp_input->vpindex = cpu;
> + start_vp_input->targetvtl = ms_hyperv.vtl;
> + *(u64 *)&start_vp_input->context[0] = __pa(vmsa) | 1;
> +
> + do {
> + ret = hv_do_hypercall(HVCALL_START_VIRTUAL_PROCESSOR,
> + start_vp_input, NULL);
> + } while (ret == HV_STATUS_TIME_OUT && retry--);

Use hv_result() to check for HV_STATUS_TIME_OUT.

> +
> + if (ret != 0) {

Use hv_result_success().

> + pr_err("HvCallStartVirtualProcessor failed: %llx\n", ret);
> + goto done;
> + }
> +
> +done:
> + local_irq_restore(flags);

The entry to this function allocates a page for the VMSA. Does
that page ever get freed?

> + return ret;
> +}
> +
> static void __init hv_smp_prepare_cpus(unsigned int max_cpus)
> {
> #ifdef CONFIG_X86_64
> @@ -241,6 +375,16 @@ static void __init hv_smp_prepare_cpus(unsigned int max_cpus)
>
> native_smp_prepare_cpus(max_cpus);
>
> + /*
> + * Override wakeup_secondary_cpu callback for SEV-SNP
> + * enlightened guest.
> + */
> + if (hv_isolation_type_en_snp())
> + apic->wakeup_secondary_cpu = hv_snp_boot_ap;
> +
> + if (!hv_root_partition)
> + return;
> +
> #ifdef CONFIG_X86_64
> for_each_present_cpu(i) {
> if (i == 0)
> @@ -489,8 +633,7 @@ static void __init ms_hyperv_init_platform(void)
>
> # ifdef CONFIG_SMP
> smp_ops.smp_prepare_boot_cpu = hv_smp_prepare_boot_cpu;
> - if (hv_root_partition)
> - smp_ops.smp_prepare_cpus = hv_smp_prepare_cpus;
> + smp_ops.smp_prepare_cpus = hv_smp_prepare_cpus;
> # endif
>
> /*
> diff --git a/include/asm-generic/hyperv-tlfs.h b/include/asm-generic/hyperv-tlfs.h
> index 6e2a090e2649..7072adbf5540 100644
> --- a/include/asm-generic/hyperv-tlfs.h
> +++ b/include/asm-generic/hyperv-tlfs.h
> @@ -139,6 +139,7 @@ struct ms_hyperv_tsc_page {
> #define HVCALL_FLUSH_VIRTUAL_ADDRESS_LIST 0x0003
> #define HVCALL_NOTIFY_LONG_SPIN_WAIT 0x0008
> #define HVCALL_SEND_IPI 0x000b
> +#define HVCALL_ENABLE_VP_VTL 0x000f
> #define HVCALL_FLUSH_VIRTUAL_ADDRESS_SPACE_EX 0x0013
> #define HVCALL_FLUSH_VIRTUAL_ADDRESS_LIST_EX 0x0014
> #define HVCALL_SEND_IPI_EX 0x0015
> @@ -156,6 +157,7 @@ struct ms_hyperv_tsc_page {
> #define HVCALL_MAP_DEVICE_INTERRUPT 0x007c
> #define HVCALL_UNMAP_DEVICE_INTERRUPT 0x007d
> #define HVCALL_RETARGET_INTERRUPT 0x007e
> +#define HVCALL_START_VIRTUAL_PROCESSOR 0x0099
> #define HVCALL_FLUSH_GUEST_PHYSICAL_ADDRESS_SPACE 0x00af
> #define HVCALL_FLUSH_GUEST_PHYSICAL_ADDRESS_LIST 0x00b0
> #define HVCALL_MODIFY_SPARSE_GPA_PAGE_HOST_VISIBILITY 0x00db
> @@ -763,6 +765,22 @@ struct hv_input_unmap_device_interrupt {
> struct hv_interrupt_entry interrupt_entry;
> } __packed;
>
> +struct hv_enable_vp_vtl_input {
> + u64 partitionid;
> + u32 vpindex;
> + u8 targetvtl;
> + u8 padding[3];
> + u8 context[0xe0];

It looks like the 0xe0 comes from the Hyper-V TLFS, but your
code is doing something different -- it's setting the VMSA address
instead of putting the context values inline.

> +} __packed;
> +
> +struct hv_start_virtual_processor_input {
> + u64 partitionid;
> + u32 vpindex;
> + u8 targetvtl;
> + u8 padding[3];
> + u8 context[0xe0];

Same here.

> +} __packed;
> +
> #define HV_SOURCE_SHADOW_NONE 0x0
> #define HV_SOURCE_SHADOW_BRIDGE_BUS_RANGE 0x1
>
> --
> 2.25.1

2023-01-10 13:07:04

by Gupta, Pankaj

[permalink] [raw]
Subject: Re: [RFC PATCH V2 15/18] x86/sev: Add a #HV exception handler


> Add a #HV exception handler that uses IST stack.
>
> Signed-off-by: Tianyu Lan <[email protected]>
> ---
> arch/x86/entry/entry_64.S | 58 +++++++++++++++++++++++++++
> arch/x86/include/asm/cpu_entry_area.h | 6 +++
> arch/x86/include/asm/idtentry.h | 39 +++++++++++++++++-
> arch/x86/include/asm/page_64_types.h | 1 +
> arch/x86/include/asm/trapnr.h | 1 +
> arch/x86/include/asm/traps.h | 1 +
> arch/x86/kernel/cpu/common.c | 1 +
> arch/x86/kernel/dumpstack_64.c | 9 ++++-
> arch/x86/kernel/idt.c | 1 +
> arch/x86/kernel/sev.c | 53 ++++++++++++++++++++++++
> arch/x86/mm/cpu_entry_area.c | 2 +
> 11 files changed, 169 insertions(+), 3 deletions(-)
>
> diff --git a/arch/x86/entry/entry_64.S b/arch/x86/entry/entry_64.S
> index 9953d966d124..b2059df43c57 100644
> --- a/arch/x86/entry/entry_64.S
> +++ b/arch/x86/entry/entry_64.S
> @@ -560,6 +560,64 @@ SYM_CODE_START(\asmsym)
> .Lfrom_usermode_switch_stack_\@:
> idtentry_body user_\cfunc, has_error_code=1
>
> +_ASM_NOKPROBE(\asmsym)
> +SYM_CODE_END(\asmsym)
> +.endm
> +/*
> + * idtentry_hv - Macro to generate entry stub for #HV
> + * @vector: Vector number
> + * @asmsym: ASM symbol for the entry point
> + * @cfunc: C function to be called
> + *
> + * The macro emits code to set up the kernel context for #HV. The #HV handler
> + * runs on an IST stack and needs to be able to support nested #HV exceptions.
> + *
> + * To make this work the #HV entry code tries its best to pretend it doesn't use
> + * an IST stack by switching to the task stack if coming from user-space (which
> + * includes early SYSCALL entry path) or back to the stack in the IRET frame if
> + * entered from kernel-mode.
> + *
> + * If entered from kernel-mode the return stack is validated first, and if it is
> + * not safe to use (e.g. because it points to the entry stack) the #HV handler
> + * will switch to a fall-back stack (HV2) and call a special handler function.
> + *
> + * The macro is only used for one vector, but it is planned to be extended in
> + * the future for the #HV exception.

Noticed same comment line in the #VC exception handling section (macro
idtentry_vc). Just wondering we need to remove this line as the patch
being proposed for the #HV exception handling? unless this is planned to
be extended in some other way.

Thanks,
Pankaj

> + */
> +.macro idtentry_hv vector asmsym cfunc
> +SYM_CODE_START(\asmsym)
> + UNWIND_HINT_IRET_REGS
> + ASM_CLAC
> + pushq $-1 /* ORIG_RAX: no syscall to restart */
> +
> + testb $3, CS-ORIG_RAX(%rsp)
> + jnz .Lfrom_usermode_switch_stack_\@
> +
> + call paranoid_entry
> +
> + UNWIND_HINT_REGS
> +
> + /*
> + * Switch off the IST stack to make it free for nested exceptions.
> + */
> + movq %rsp, %rdi /* pt_regs pointer */
> + call hv_switch_off_ist
> + movq %rax, %rsp /* Switch to new stack */
> +
> + UNWIND_HINT_REGS
> +
> + /* Update pt_regs */
> + movq ORIG_RAX(%rsp), %rsi /* get error code into 2nd argument*/
> + movq $-1, ORIG_RAX(%rsp) /* no syscall to restart */
> +
> + movq %rsp, %rdi /* pt_regs pointer */
> + call kernel_\cfunc
> +
> + jmp paranoid_exit
> +
> +.Lfrom_usermode_switch_stack_\@:
> + idtentry_body user_\cfunc, has_error_code=1
> +
> _ASM_NOKPROBE(\asmsym)
> SYM_CODE_END(\asmsym)
> .endm
> diff --git a/arch/x86/include/asm/cpu_entry_area.h b/arch/x86/include/asm/cpu_entry_area.h
> index 75efc4c6f076..f173a16cfc59 100644
> --- a/arch/x86/include/asm/cpu_entry_area.h
> +++ b/arch/x86/include/asm/cpu_entry_area.h
> @@ -30,6 +30,10 @@
> char VC_stack[optional_stack_size]; \
> char VC2_stack_guard[guardsize]; \
> char VC2_stack[optional_stack_size]; \
> + char HV_stack_guard[guardsize]; \
> + char HV_stack[optional_stack_size]; \
> + char HV2_stack_guard[guardsize]; \
> + char HV2_stack[optional_stack_size]; \
> char IST_top_guard[guardsize]; \
>
> /* The exception stacks' physical storage. No guard pages required */
> @@ -52,6 +56,8 @@ enum exception_stack_ordering {
> ESTACK_MCE,
> ESTACK_VC,
> ESTACK_VC2,
> + ESTACK_HV,
> + ESTACK_HV2,
> N_EXCEPTION_STACKS
> };
>
> diff --git a/arch/x86/include/asm/idtentry.h b/arch/x86/include/asm/idtentry.h
> index 72184b0b2219..652fea10d377 100644
> --- a/arch/x86/include/asm/idtentry.h
> +++ b/arch/x86/include/asm/idtentry.h
> @@ -317,6 +317,19 @@ static __always_inline void __##func(struct pt_regs *regs)
> __visible noinstr void kernel_##func(struct pt_regs *regs, unsigned long error_code); \
> __visible noinstr void user_##func(struct pt_regs *regs, unsigned long error_code)
>
> +
> +/**
> + * DECLARE_IDTENTRY_HV - Declare functions for the HV entry point
> + * @vector: Vector number (ignored for C)
> + * @func: Function name of the entry point
> + *
> + * Maps to DECLARE_IDTENTRY_RAW, but declares also the user C handler.
> + */
> +#define DECLARE_IDTENTRY_HV(vector, func) \
> + DECLARE_IDTENTRY_RAW_ERRORCODE(vector, func); \
> + __visible noinstr void kernel_##func(struct pt_regs *regs); \
> + __visible noinstr void user_##func(struct pt_regs *regs)
> +
> /**
> * DEFINE_IDTENTRY_IST - Emit code for IST entry points
> * @func: Function name of the entry point
> @@ -376,6 +389,26 @@ static __always_inline void __##func(struct pt_regs *regs)
> #define DEFINE_IDTENTRY_VC_USER(func) \
> DEFINE_IDTENTRY_RAW_ERRORCODE(user_##func)
>
> +/**
> + * DEFINE_IDTENTRY_HV_KERNEL - Emit code for HV injection handler
> + * when raised from kernel mode
> + * @func: Function name of the entry point
> + *
> + * Maps to DEFINE_IDTENTRY_RAW
> + */
> +#define DEFINE_IDTENTRY_HV_KERNEL(func) \
> + DEFINE_IDTENTRY_RAW(kernel_##func)
> +
> +/**
> + * DEFINE_IDTENTRY_HV_USER - Emit code for HV injection handler
> + * when raised from user mode
> + * @func: Function name of the entry point
> + *
> + * Maps to DEFINE_IDTENTRY_RAW
> + */
> +#define DEFINE_IDTENTRY_HV_USER(func) \
> + DEFINE_IDTENTRY_RAW(user_##func)
> +
> #else /* CONFIG_X86_64 */
>
> /**
> @@ -465,6 +498,9 @@ __visible noinstr void func(struct pt_regs *regs, \
> # define DECLARE_IDTENTRY_VC(vector, func) \
> idtentry_vc vector asm_##func func
>
> +# define DECLARE_IDTENTRY_HV(vector, func) \
> + idtentry_hv vector asm_##func func
> +
> #else
> # define DECLARE_IDTENTRY_MCE(vector, func) \
> DECLARE_IDTENTRY(vector, func)
> @@ -622,9 +658,10 @@ DECLARE_IDTENTRY_RAW_ERRORCODE(X86_TRAP_DF, xenpv_exc_double_fault);
> DECLARE_IDTENTRY_ERRORCODE(X86_TRAP_CP, exc_control_protection);
> #endif
>
> -/* #VC */
> +/* #VC & #HV */
> #ifdef CONFIG_AMD_MEM_ENCRYPT
> DECLARE_IDTENTRY_VC(X86_TRAP_VC, exc_vmm_communication);
> +DECLARE_IDTENTRY_HV(X86_TRAP_HV, exc_hv_injection);
> #endif
>
> #ifdef CONFIG_XEN_PV
> diff --git a/arch/x86/include/asm/page_64_types.h b/arch/x86/include/asm/page_64_types.h
> index e9e2c3ba5923..0bd7dab676c5 100644
> --- a/arch/x86/include/asm/page_64_types.h
> +++ b/arch/x86/include/asm/page_64_types.h
> @@ -29,6 +29,7 @@
> #define IST_INDEX_DB 2
> #define IST_INDEX_MCE 3
> #define IST_INDEX_VC 4
> +#define IST_INDEX_HV 5
>
> /*
> * Set __PAGE_OFFSET to the most negative possible address +
> diff --git a/arch/x86/include/asm/trapnr.h b/arch/x86/include/asm/trapnr.h
> index f5d2325aa0b7..c6583631cecb 100644
> --- a/arch/x86/include/asm/trapnr.h
> +++ b/arch/x86/include/asm/trapnr.h
> @@ -26,6 +26,7 @@
> #define X86_TRAP_XF 19 /* SIMD Floating-Point Exception */
> #define X86_TRAP_VE 20 /* Virtualization Exception */
> #define X86_TRAP_CP 21 /* Control Protection Exception */
> +#define X86_TRAP_HV 28 /* HV injected exception in SNP restricted mode */
> #define X86_TRAP_VC 29 /* VMM Communication Exception */
> #define X86_TRAP_IRET 32 /* IRET Exception */
>
> diff --git a/arch/x86/include/asm/traps.h b/arch/x86/include/asm/traps.h
> index 47ecfff2c83d..6795d3e517d6 100644
> --- a/arch/x86/include/asm/traps.h
> +++ b/arch/x86/include/asm/traps.h
> @@ -16,6 +16,7 @@ asmlinkage __visible notrace
> struct pt_regs *fixup_bad_iret(struct pt_regs *bad_regs);
> void __init trap_init(void);
> asmlinkage __visible noinstr struct pt_regs *vc_switch_off_ist(struct pt_regs *eregs);
> +asmlinkage __visible noinstr struct pt_regs *hv_switch_off_ist(struct pt_regs *eregs);
> #endif
>
> extern bool ibt_selftest(void);
> diff --git a/arch/x86/kernel/cpu/common.c b/arch/x86/kernel/cpu/common.c
> index 3e508f239098..87afa3a4c8b1 100644
> --- a/arch/x86/kernel/cpu/common.c
> +++ b/arch/x86/kernel/cpu/common.c
> @@ -2165,6 +2165,7 @@ static inline void tss_setup_ist(struct tss_struct *tss)
> tss->x86_tss.ist[IST_INDEX_MCE] = __this_cpu_ist_top_va(MCE);
> /* Only mapped when SEV-ES is active */
> tss->x86_tss.ist[IST_INDEX_VC] = __this_cpu_ist_top_va(VC);
> + tss->x86_tss.ist[IST_INDEX_HV] = __this_cpu_ist_top_va(HV);
> }
>
> #else /* CONFIG_X86_64 */
> diff --git a/arch/x86/kernel/dumpstack_64.c b/arch/x86/kernel/dumpstack_64.c
> index 6c5defd6569a..23aa5912c87a 100644
> --- a/arch/x86/kernel/dumpstack_64.c
> +++ b/arch/x86/kernel/dumpstack_64.c
> @@ -26,11 +26,14 @@ static const char * const exception_stack_names[] = {
> [ ESTACK_MCE ] = "#MC",
> [ ESTACK_VC ] = "#VC",
> [ ESTACK_VC2 ] = "#VC2",
> + [ ESTACK_HV ] = "#HV",
> + [ ESTACK_HV2 ] = "#HV2",
> +
> };
>
> const char *stack_type_name(enum stack_type type)
> {
> - BUILD_BUG_ON(N_EXCEPTION_STACKS != 6);
> + BUILD_BUG_ON(N_EXCEPTION_STACKS != 8);
>
> if (type == STACK_TYPE_TASK)
> return "TASK";
> @@ -89,6 +92,8 @@ struct estack_pages estack_pages[CEA_ESTACK_PAGES] ____cacheline_aligned = {
> EPAGERANGE(MCE),
> EPAGERANGE(VC),
> EPAGERANGE(VC2),
> + EPAGERANGE(HV),
> + EPAGERANGE(HV2),
> };
>
> static __always_inline bool in_exception_stack(unsigned long *stack, struct stack_info *info)
> @@ -98,7 +103,7 @@ static __always_inline bool in_exception_stack(unsigned long *stack, struct stac
> struct pt_regs *regs;
> unsigned int k;
>
> - BUILD_BUG_ON(N_EXCEPTION_STACKS != 6);
> + BUILD_BUG_ON(N_EXCEPTION_STACKS != 8);
>
> begin = (unsigned long)__this_cpu_read(cea_exception_stacks);
> /*
> diff --git a/arch/x86/kernel/idt.c b/arch/x86/kernel/idt.c
> index a58c6bc1cd68..48c0a7e1dbcb 100644
> --- a/arch/x86/kernel/idt.c
> +++ b/arch/x86/kernel/idt.c
> @@ -113,6 +113,7 @@ static const __initconst struct idt_data def_idts[] = {
>
> #ifdef CONFIG_AMD_MEM_ENCRYPT
> ISTG(X86_TRAP_VC, asm_exc_vmm_communication, IST_INDEX_VC),
> + ISTG(X86_TRAP_HV, asm_exc_hv_injection, IST_INDEX_HV),
> #endif
>
> SYSG(X86_TRAP_OF, asm_exc_overflow),
> diff --git a/arch/x86/kernel/sev.c b/arch/x86/kernel/sev.c
> index a428c62330d3..b54ee3ba37b0 100644
> --- a/arch/x86/kernel/sev.c
> +++ b/arch/x86/kernel/sev.c
> @@ -2004,6 +2004,59 @@ DEFINE_IDTENTRY_VC_USER(exc_vmm_communication)
> irqentry_exit_to_user_mode(regs);
> }
>
> +static bool hv_raw_handle_exception(struct pt_regs *regs)
> +{
> + return false;
> +}
> +
> +static __always_inline bool on_hv_fallback_stack(struct pt_regs *regs)
> +{
> + unsigned long sp = (unsigned long)regs;
> +
> + return (sp >= __this_cpu_ist_bottom_va(HV2) && sp < __this_cpu_ist_top_va(HV2));
> +}
> +
> +DEFINE_IDTENTRY_HV_USER(exc_hv_injection)
> +{
> + irqentry_enter_from_user_mode(regs);
> + instrumentation_begin();
> +
> + if (!hv_raw_handle_exception(regs)) {
> + /*
> + * Do not kill the machine if user-space triggered the
> + * exception. Send SIGBUS instead and let user-space deal
> + * with it.
> + */
> + force_sig_fault(SIGBUS, BUS_OBJERR, (void __user *)0);
> + }
> +
> + instrumentation_end();
> + irqentry_exit_to_user_mode(regs);
> +}
> +
> +DEFINE_IDTENTRY_HV_KERNEL(exc_hv_injection)
> +{
> + irqentry_state_t irq_state;
> +
> + irq_state = irqentry_nmi_enter(regs);
> + instrumentation_begin();
> +
> + if (!hv_raw_handle_exception(regs)) {
> + pr_emerg("PANIC: Unhandled #HV exception in kernel space\n");
> +
> + /* Show some debug info */
> + show_regs(regs);
> +
> + /* Ask hypervisor to sev_es_terminate */
> + sev_es_terminate(SEV_TERM_SET_GEN, GHCB_SEV_ES_GEN_REQ);
> +
> + panic("Returned from Terminate-Request to Hypervisor\n");
> + }
> +
> + instrumentation_end();
> + irqentry_nmi_exit(regs, irq_state);
> +}
> +
> bool __init handle_vc_boot_ghcb(struct pt_regs *regs)
> {
> unsigned long exit_code = regs->orig_ax;
> diff --git a/arch/x86/mm/cpu_entry_area.c b/arch/x86/mm/cpu_entry_area.c
> index 6c2f1b76a0b6..8b1670fa2b81 100644
> --- a/arch/x86/mm/cpu_entry_area.c
> +++ b/arch/x86/mm/cpu_entry_area.c
> @@ -115,6 +115,8 @@ static void __init percpu_setup_exception_stacks(unsigned int cpu)
> if (cc_platform_has(CC_ATTR_GUEST_STATE_ENCRYPT)) {
> cea_map_stack(VC);
> cea_map_stack(VC2);
> + cea_map_stack(HV);
> + cea_map_stack(HV2);
> }
> }
> }

2023-01-10 14:23:17

by Tianyu Lan

[permalink] [raw]
Subject: Re: [RFC PATCH V2 15/18] x86/sev: Add a #HV exception handler

On 1/10/2023 8:47 PM, Gupta, Pankaj wrote:
>> + *
>> + * To make this work the #HV entry code tries its best to pretend it
>> doesn't use
>> + * an IST stack by switching to the task stack if coming from
>> user-space (which
>> + * includes early SYSCALL entry path) or back to the stack in the
>> IRET frame if
>> + * entered from kernel-mode.
>> + *
>> + * If entered from kernel-mode the return stack is validated first,
>> and if it is
>> + * not safe to use (e.g. because it points to the entry stack) the
>> #HV handler
>> + * will switch to a fall-back stack (HV2) and call a special handler
>> function.
>> + *
>> + * The macro is only used for one vector, but it is planned to be
>> extended in
>> + * the future for the #HV exception.
>
> Noticed same comment line in the #VC exception handling section (macro
> idtentry_vc). Just wondering we need to remove this line as the patch
> being proposed for the #HV exception handling? unless this is planned to
> be extended in some other way.

Nice catch! Will remove this in the next version.

Thanks.

2023-01-12 07:50:48

by Gupta, Pankaj

[permalink] [raw]
Subject: Re: [RFC PATCH V2 15/18] x86/sev: Add a #HV exception handler


>>> + * To make this work the #HV entry code tries its best to pretend it
>>> doesn't use
>>> + * an IST stack by switching to the task stack if coming from
>>> user-space (which
>>> + * includes early SYSCALL entry path) or back to the stack in the
>>> IRET frame if
>>> + * entered from kernel-mode.
>>> + *
>>> + * If entered from kernel-mode the return stack is validated first,
>>> and if it is
>>> + * not safe to use (e.g. because it points to the entry stack) the
>>> #HV handler
>>> + * will switch to a fall-back stack (HV2) and call a special handler
>>> function.
>>> + *
>>> + * The macro is only used for one vector, but it is planned to be
>>> extended in
>>> + * the future for the #HV exception.
>>
>> Noticed same comment line in the #VC exception handling section (macro
>> idtentry_vc). Just wondering we need to remove this line as the patch
>> being proposed for the #HV exception handling? unless this is planned
>> to be extended in some other way.
>
> Nice catch! Will remove this in the next version.

Thanks. Just a note: the referred 'idtentry_vc' macro has some
instructions added/moved (e.g CLD is moved at start of IDT entry, ENDBR
added) plus some additional comments added later. Don't know what all of
them are required to be replicated in 'idtentry_hv', just want to share
my observation.

Thanks,
Pankaj