2023-04-03 17:45:26

by Tianyu Lan

[permalink] [raw]
Subject: [RFC PATCH V4 00/17] x86/hyperv/sev: Add AMD sev-snp enlightened guest support on hyperv

From: Tianyu Lan <[email protected]>


This patchset is to add AMD sev-snp enlightened guest
support on hyperv. Hyperv uses Linux direct boot mode
to boot up Linux kernel and so it needs to pvalidate
system memory by itself.

In hyperv case, there is no boot loader and so cc blob
is prepared by hypervisor. In this series, hypervisor
set the cc blob address directly into boot parameter
of Linux kernel. If the magic number on cc blob address
is valid, kernel will read cc blob.

Shared memory between guests and hypervisor should be
decrypted and zero memory after decrypt memory. The data
in the target address. It maybe smearedto avoid smearing
data.

Introduce #HV exception support in AMD sev snp code and
#HV handler.

Change since v3:
- Replace struct sev_es_save_area with struct vmcb_save_area
- Move smp, cpu and memory enumerating code from mshyperv.c to ivm.c
- Handle nested entry case of do_exc_hv() case.
- Check NMI event when irq is disabled

Change since v2:
- Remove validate kernel memory code at boot stage
- Split #HV page patch into two parts
- Remove HV-APIC change due to enable x2apic from
host side
- Rework vmbus code to handle error of decrypt page
- Spilt memory and cpu initialization patch.
Change since v1:
- Remove boot param changes for cc blob address and
use setup head to pass cc blob info
- Remove unnessary WARN and BUG check
- Add system vector table map in the #HV exception
- Fix interrupt exit issue when use #HV exception

Ashish Kalra (2):
x86/sev: optimize system vector processing invoked from #HV exception
x86/sev: Fix interrupt exit code paths from #HV exception

Tianyu Lan (15):
x86/hyperv: Add sev-snp enlightened guest static key
Drivers: hv: vmbus: Decrypt vmbus ring buffer
x86/hyperv: Set Virtual Trust Level in VMBus init message
x86/hyperv: Use vmmcall to implement Hyper-V hypercall in sev-snp
enlightened guest
clocksource/drivers/hyper-v: decrypt hyperv tsc page in sev-snp
enlightened guest
x86/hyperv: decrypt VMBus pages for sev-snp enlightened guest
drivers: hv: Decrypt percpu hvcall input arg page in sev-snp
enlightened guest
x86/hyperv: Initialize cpu and memory for sev-snp enlightened guest
x86/hyperv: SEV-SNP enlightened guest don't support legacy rtc
x86/hyperv: Add smp support for sev-snp guest
x86/hyperv: Add hyperv-specific handling for VMMCALL under SEV-ES
x86/sev: Add a #HV exception handler
x86/sev: Add Check of #HV event in path
x86/hyperv/sev: Add AMD sev-snp enlightened guest support on hyperv
x86/sev: Remove restrict interrupt injection from
SNP_FEATURES_IMPL_REQ

arch/x86/boot/compressed/sev.c | 1 -
arch/x86/entry/entry_64.S | 84 ++++++
arch/x86/hyperv/hv_init.c | 42 +++
arch/x86/hyperv/ivm.c | 181 ++++++++++++
arch/x86/include/asm/cpu_entry_area.h | 6 +
arch/x86/include/asm/hyperv-tlfs.h | 7 +
arch/x86/include/asm/idtentry.h | 105 ++++++-
arch/x86/include/asm/irqflags.h | 11 +
arch/x86/include/asm/mem_encrypt.h | 2 +
arch/x86/include/asm/mshyperv.h | 81 ++++-
arch/x86/include/asm/page_64_types.h | 1 +
arch/x86/include/asm/sev.h | 13 +
arch/x86/include/asm/svm.h | 15 +-
arch/x86/include/asm/trapnr.h | 1 +
arch/x86/include/asm/traps.h | 1 +
arch/x86/include/asm/x86_init.h | 2 +
arch/x86/include/uapi/asm/svm.h | 4 +
arch/x86/kernel/cpu/common.c | 1 +
arch/x86/kernel/cpu/mshyperv.c | 42 ++-
arch/x86/kernel/dumpstack_64.c | 9 +-
arch/x86/kernel/idt.c | 1 +
arch/x86/kernel/sev.c | 407 ++++++++++++++++++++++----
arch/x86/kernel/traps.c | 42 +++
arch/x86/kernel/vmlinux.lds.S | 7 +
arch/x86/kernel/x86_init.c | 4 +-
arch/x86/mm/cpu_entry_area.c | 2 +
drivers/clocksource/hyperv_timer.c | 2 +-
drivers/hv/connection.c | 1 +
drivers/hv/hv.c | 42 ++-
drivers/hv/hv_common.c | 27 +-
include/asm-generic/hyperv-tlfs.h | 19 ++
include/asm-generic/mshyperv.h | 2 +
include/linux/hyperv.h | 4 +-
33 files changed, 1092 insertions(+), 77 deletions(-)

--
2.25.1


2023-04-03 17:45:37

by Tianyu Lan

[permalink] [raw]
Subject: [RFC PATCH V4 02/17] Drivers: hv: vmbus: Decrypt vmbus ring buffer

From: Tianyu Lan <[email protected]>

The ring buffer is remapped in the hv_ringbuffer_init()
and it should be with decrypt flag in order to share it
with hypervisor in sev-snp enlightened guest.

Signed-off-by: Tianyu Lan <[email protected]>
---
arch/x86/hyperv/hv_init.c | 6 ++++++
1 file changed, 6 insertions(+)

diff --git a/arch/x86/hyperv/hv_init.c b/arch/x86/hyperv/hv_init.c
index a5f9474f08e1..9f3e2d71d015 100644
--- a/arch/x86/hyperv/hv_init.c
+++ b/arch/x86/hyperv/hv_init.c
@@ -18,6 +18,7 @@
#include <asm/hyperv-tlfs.h>
#include <asm/mshyperv.h>
#include <asm/idtentry.h>
+#include <asm/set_memory.h>
#include <linux/kexec.h>
#include <linux/version.h>
#include <linux/vmalloc.h>
@@ -113,6 +114,11 @@ static int hv_cpu_init(unsigned int cpu)

}
if (!WARN_ON(!(*hvp))) {
+ if (hv_isolation_type_en_snp()) {
+ WARN_ON_ONCE(set_memory_decrypted((unsigned long)(*hvp), 1));
+ memset(*hvp, 0, PAGE_SIZE);
+ }
+
msr.enable = 1;
wrmsrl(HV_X64_MSR_VP_ASSIST_PAGE, msr.as_uint64);
}
--
2.25.1

2023-04-03 17:46:09

by Tianyu Lan

[permalink] [raw]
Subject: [RFC PATCH V4 11/17] x86/hyperv: Add hyperv-specific handling for VMMCALL under SEV-ES

From: Tianyu Lan <[email protected]>

Add Hyperv-specific handling for faults caused by VMMCALL
instructions.

Signed-off-by: Tianyu Lan <[email protected]>
---
arch/x86/kernel/cpu/mshyperv.c | 17 +++++++++++++++++
1 file changed, 17 insertions(+)

diff --git a/arch/x86/kernel/cpu/mshyperv.c b/arch/x86/kernel/cpu/mshyperv.c
index 829234ba8da5..69ca4b8f615a 100644
--- a/arch/x86/kernel/cpu/mshyperv.c
+++ b/arch/x86/kernel/cpu/mshyperv.c
@@ -32,6 +32,7 @@
#include <asm/nmi.h>
#include <clocksource/hyperv_timer.h>
#include <asm/numa.h>
+#include <asm/svm.h>

/* Is Linux running as the root partition? */
bool hv_root_partition;
@@ -577,6 +578,20 @@ static bool __init ms_hyperv_msi_ext_dest_id(void)
return eax & HYPERV_VS_PROPERTIES_EAX_EXTENDED_IOAPIC_RTE;
}

+static void hv_sev_es_hcall_prepare(struct ghcb *ghcb, struct pt_regs *regs)
+{
+ /* RAX and CPL are already in the GHCB */
+ ghcb_set_rcx(ghcb, regs->cx);
+ ghcb_set_rdx(ghcb, regs->dx);
+ ghcb_set_r8(ghcb, regs->r8);
+}
+
+static bool hv_sev_es_hcall_finish(struct ghcb *ghcb, struct pt_regs *regs)
+{
+ /* No checking of the return state needed */
+ return true;
+}
+
const __initconst struct hypervisor_x86 x86_hyper_ms_hyperv = {
.name = "Microsoft Hyper-V",
.detect = ms_hyperv_platform,
@@ -584,4 +599,6 @@ const __initconst struct hypervisor_x86 x86_hyper_ms_hyperv = {
.init.x2apic_available = ms_hyperv_x2apic_available,
.init.msi_ext_dest_id = ms_hyperv_msi_ext_dest_id,
.init.init_platform = ms_hyperv_init_platform,
+ .runtime.sev_es_hcall_prepare = hv_sev_es_hcall_prepare,
+ .runtime.sev_es_hcall_finish = hv_sev_es_hcall_finish,
};
--
2.25.1

2023-04-03 17:46:14

by Tianyu Lan

[permalink] [raw]
Subject: [RFC PATCH V4 06/17] x86/hyperv: decrypt VMBus pages for sev-snp enlightened guest

From: Tianyu Lan <[email protected]>

VMBus post msg, synic event and message pages are shared
with hypervisor and so decrypt these pages in the sev-snp guest.

Signed-off-by: Tianyu Lan <[email protected]>
---
Change since RFC V3:
* Set encrypt page back in the hv_synic_free()

Change since RFC V2:
* Fix error in the error code path and encrypt
pages correctly when decryption failure happens.
---
drivers/hv/hv.c | 42 +++++++++++++++++++++++++++++++++++++++++-
1 file changed, 41 insertions(+), 1 deletion(-)

diff --git a/drivers/hv/hv.c b/drivers/hv/hv.c
index 008234894d28..e09cea8f2f04 100644
--- a/drivers/hv/hv.c
+++ b/drivers/hv/hv.c
@@ -20,6 +20,7 @@
#include <linux/interrupt.h>
#include <clocksource/hyperv_timer.h>
#include <asm/mshyperv.h>
+#include <linux/set_memory.h>
#include "hyperv_vmbus.h"

/* The one and only */
@@ -117,7 +118,7 @@ int hv_post_message(union hv_connection_id connection_id,

int hv_synic_alloc(void)
{
- int cpu;
+ int cpu, ret;
struct hv_per_cpu_context *hv_cpu;

/*
@@ -168,9 +169,39 @@ int hv_synic_alloc(void)
pr_err("Unable to allocate post msg page\n");
goto err;
}
+
+ if (hv_isolation_type_en_snp()) {
+ ret = set_memory_decrypted((unsigned long)
+ hv_cpu->synic_message_page, 1);
+ if (ret)
+ goto err;
+
+ ret = set_memory_decrypted((unsigned long)
+ hv_cpu->synic_event_page, 1);
+ if (ret)
+ goto err_decrypt_event_page;
+
+ ret = set_memory_decrypted((unsigned long)
+ hv_cpu->post_msg_page, 1);
+ if (ret)
+ goto err_decrypt_msg_page;
+
+ memset(hv_cpu->synic_message_page, 0, PAGE_SIZE);
+ memset(hv_cpu->synic_event_page, 0, PAGE_SIZE);
+ memset(hv_cpu->post_msg_page, 0, PAGE_SIZE);
+ }
}

return 0;
+
+err_decrypt_msg_page:
+ set_memory_encrypted((unsigned long)
+ hv_cpu->synic_event_page, 1);
+
+err_decrypt_event_page:
+ set_memory_encrypted((unsigned long)
+ hv_cpu->synic_message_page, 1);
+
err:
/*
* Any memory allocations that succeeded will be freed when
@@ -191,6 +222,15 @@ void hv_synic_free(void)
free_page((unsigned long)hv_cpu->synic_event_page);
free_page((unsigned long)hv_cpu->synic_message_page);
free_page((unsigned long)hv_cpu->post_msg_page);
+
+ if (hv_isolation_type_en_snp()) {
+ set_memory_encrypted((unsigned long)
+ hv_cpu->synic_message_page, 1);
+ set_memory_encrypted((unsigned long)
+ hv_cpu->synic_event_page, 1);
+ set_memory_encrypted((unsigned long)
+ hv_cpu->post_msg_page, 1);
+ }
}

kfree(hv_context.hv_numa_map);
--
2.25.1

2023-04-03 17:46:24

by Tianyu Lan

[permalink] [raw]
Subject: [RFC PATCH V4 03/17] x86/hyperv: Set Virtual Trust Level in VMBus init message

From: Tianyu Lan <[email protected]>

sev-snp guest provides vtl(Virtual Trust Level) and
get it from hyperv hvcall via HVCALL_GET_VP_REGISTERS.
Set target vtl in the VMBus init message.

Signed-off-by: Tianyu Lan <[email protected]>
---
Change since RFC v3:
* Use the standard helper functions to check hypercall result
* Fix coding style

Change since RFC v2:
* Rename get_current_vtl() to get_vtl()
* Fix some coding style issues
---
arch/x86/hyperv/hv_init.c | 36 ++++++++++++++++++++++++++++++
arch/x86/include/asm/hyperv-tlfs.h | 7 ++++++
drivers/hv/connection.c | 1 +
include/asm-generic/mshyperv.h | 2 ++
include/linux/hyperv.h | 4 ++--
5 files changed, 48 insertions(+), 2 deletions(-)

diff --git a/arch/x86/hyperv/hv_init.c b/arch/x86/hyperv/hv_init.c
index 9f3e2d71d015..7602db5ad4aa 100644
--- a/arch/x86/hyperv/hv_init.c
+++ b/arch/x86/hyperv/hv_init.c
@@ -384,6 +384,40 @@ static void __init hv_get_partition_id(void)
local_irq_restore(flags);
}

+static u8 __init get_vtl(void)
+{
+ u64 control = HV_HYPERCALL_REP_COMP_1 | HVCALL_GET_VP_REGISTERS;
+ struct hv_get_vp_registers_input *input;
+ struct hv_get_vp_registers_output *output;
+ u64 vtl = 0;
+ u64 ret;
+ unsigned long flags;
+
+ local_irq_save(flags);
+ input = *(struct hv_get_vp_registers_input **)this_cpu_ptr(hyperv_pcpu_input_arg);
+ output = (struct hv_get_vp_registers_output *)input;
+ if (!input) {
+ local_irq_restore(flags);
+ goto done;
+ }
+
+ memset(input, 0, sizeof(*input) + sizeof(input->element[0]));
+ input->header.partitionid = HV_PARTITION_ID_SELF;
+ input->header.vpindex = HV_VP_INDEX_SELF;
+ input->header.inputvtl = 0;
+ input->element[0].name0 = HV_X64_REGISTER_VSM_VP_STATUS;
+
+ ret = hv_do_hypercall(control, input, output);
+ if (hv_result_success(ret))
+ vtl = output->as64.low & HV_X64_VTL_MASK;
+ else
+ pr_err("Hyper-V: failed to get VTL!");
+ local_irq_restore(flags);
+
+done:
+ return vtl;
+}
+
/*
* This function is to be invoked early in the boot sequence after the
* hypervisor has been detected.
@@ -512,6 +546,8 @@ void __init hyperv_init(void)
/* Query the VMs extended capability once, so that it can be cached. */
hv_query_ext_cap(0);

+ /* Find the VTL */
+ ms_hyperv.vtl = get_vtl();
return;

clean_guest_os_id:
diff --git a/arch/x86/include/asm/hyperv-tlfs.h b/arch/x86/include/asm/hyperv-tlfs.h
index b4fb75bd1013..3d3f014976e8 100644
--- a/arch/x86/include/asm/hyperv-tlfs.h
+++ b/arch/x86/include/asm/hyperv-tlfs.h
@@ -301,6 +301,13 @@ enum hv_isolation_type {
#define HV_X64_MSR_TIME_REF_COUNT HV_REGISTER_TIME_REF_COUNT
#define HV_X64_MSR_REFERENCE_TSC HV_REGISTER_REFERENCE_TSC

+/*
+ * Registers are only accessible via HVCALL_GET_VP_REGISTERS hvcall and
+ * there is not associated MSR address.
+ */
+#define HV_X64_REGISTER_VSM_VP_STATUS 0x000D0003
+#define HV_X64_VTL_MASK GENMASK(3, 0)
+
/* Hyper-V memory host visibility */
enum hv_mem_host_visibility {
VMBUS_PAGE_NOT_VISIBLE = 0,
diff --git a/drivers/hv/connection.c b/drivers/hv/connection.c
index f670cfd2e056..e4c39f4016ad 100644
--- a/drivers/hv/connection.c
+++ b/drivers/hv/connection.c
@@ -98,6 +98,7 @@ int vmbus_negotiate_version(struct vmbus_channel_msginfo *msginfo, u32 version)
*/
if (version >= VERSION_WIN10_V5) {
msg->msg_sint = VMBUS_MESSAGE_SINT;
+ msg->msg_vtl = ms_hyperv.vtl;
vmbus_connection.msg_conn_id = VMBUS_MESSAGE_CONNECTION_ID_4;
} else {
msg->interrupt_page = virt_to_phys(vmbus_connection.int_page);
diff --git a/include/asm-generic/mshyperv.h b/include/asm-generic/mshyperv.h
index afcd9ae9588c..f91fb06634f0 100644
--- a/include/asm-generic/mshyperv.h
+++ b/include/asm-generic/mshyperv.h
@@ -48,6 +48,7 @@ struct ms_hyperv_info {
};
};
u64 shared_gpa_boundary;
+ u8 vtl;
};
extern struct ms_hyperv_info ms_hyperv;
extern bool hv_nested;
@@ -58,6 +59,7 @@ extern void * __percpu *hyperv_pcpu_output_arg;
extern u64 hv_do_hypercall(u64 control, void *inputaddr, void *outputaddr);
extern u64 hv_do_fast_hypercall8(u16 control, u64 input8);
extern bool hv_isolation_type_snp(void);
+extern bool hv_isolation_type_en_snp(void);

/* Helper functions that provide a consistent pattern for checking Hyper-V hypercall status. */
static inline int hv_result(u64 status)
diff --git a/include/linux/hyperv.h b/include/linux/hyperv.h
index bfbc37ce223b..1f2bfec4abde 100644
--- a/include/linux/hyperv.h
+++ b/include/linux/hyperv.h
@@ -665,8 +665,8 @@ struct vmbus_channel_initiate_contact {
u64 interrupt_page;
struct {
u8 msg_sint;
- u8 padding1[3];
- u32 padding2;
+ u8 msg_vtl;
+ u8 reserved[6];
};
};
u64 monitor_page1;
--
2.25.1

2023-04-03 17:46:38

by Tianyu Lan

[permalink] [raw]
Subject: [RFC PATCH V4 09/17] x86/hyperv: SEV-SNP enlightened guest don't support legacy rtc

From: Tianyu Lan <[email protected]>

SEV-SNP enlightened guest doesn't support legacy rtc. Set
legacy.rtc, x86_platform.set_wallclock and get_wallclock to
0 or noop(). Make get/set_rtc_noop() to be public and reuse
them in the ms_hyperv_init_platform().

Signed-off-by: Tianyu Lan <[email protected]>
---
arch/x86/hyperv/ivm.c | 3 +++
arch/x86/include/asm/x86_init.h | 2 ++
arch/x86/kernel/x86_init.c | 4 ++--
3 files changed, 7 insertions(+), 2 deletions(-)

diff --git a/arch/x86/hyperv/ivm.c b/arch/x86/hyperv/ivm.c
index fa4de2761460..c0f3fa924163 100644
--- a/arch/x86/hyperv/ivm.c
+++ b/arch/x86/hyperv/ivm.c
@@ -391,6 +391,9 @@ __init void hv_sev_init_mem_and_cpu(void)
* bios regions and reserve resources are not
* available. Set these callback to NULL.
*/
+ x86_platform.legacy.rtc = 0;
+ x86_platform.set_wallclock = set_rtc_noop;
+ x86_platform.get_wallclock = get_rtc_noop;
x86_platform.legacy.reserve_bios_regions = 0;
x86_init.resources.probe_roms = x86_init_noop;
x86_init.resources.reserve_resources = x86_init_noop;
diff --git a/arch/x86/include/asm/x86_init.h b/arch/x86/include/asm/x86_init.h
index 6f873c686cd3..e5a6f9baec67 100644
--- a/arch/x86/include/asm/x86_init.h
+++ b/arch/x86/include/asm/x86_init.h
@@ -330,5 +330,7 @@ extern void x86_init_uint_noop(unsigned int unused);
extern bool bool_x86_init_noop(void);
extern void x86_op_int_noop(int cpu);
extern bool x86_pnpbios_disabled(void);
+extern int set_rtc_noop(const struct timespec64 *now);
+extern void get_rtc_noop(struct timespec64 *now);

#endif
diff --git a/arch/x86/kernel/x86_init.c b/arch/x86/kernel/x86_init.c
index 95be3831df73..d82f4fa2f1bf 100644
--- a/arch/x86/kernel/x86_init.c
+++ b/arch/x86/kernel/x86_init.c
@@ -33,8 +33,8 @@ static int __init iommu_init_noop(void) { return 0; }
static void iommu_shutdown_noop(void) { }
bool __init bool_x86_init_noop(void) { return false; }
void x86_op_int_noop(int cpu) { }
-static __init int set_rtc_noop(const struct timespec64 *now) { return -EINVAL; }
-static __init void get_rtc_noop(struct timespec64 *now) { }
+int set_rtc_noop(const struct timespec64 *now) { return -EINVAL; }
+void get_rtc_noop(struct timespec64 *now) { }

static __initconst const struct of_device_id of_cmos_match[] = {
{ .compatible = "motorola,mc146818" },
--
2.25.1

2023-04-03 17:46:38

by Tianyu Lan

[permalink] [raw]
Subject: [RFC PATCH V4 07/17] drivers: hv: Decrypt percpu hvcall input arg page in sev-snp enlightened guest

From: Tianyu Lan <[email protected]>

Hypervisor needs to access iput arg page and guest should decrypt
the page.

Signed-off-by: Tianyu Lan <[email protected]>
---
Change since RFC V3:
* Use pgcount to decrypt memory.

Change since RFC V2:
* Set inputarg to be zero after kfree()
* Not free mem when fail to encrypt mem in the hv_common_cpu_die().
---
drivers/hv/hv_common.c | 21 ++++++++++++++++++++-
1 file changed, 20 insertions(+), 1 deletion(-)

diff --git a/drivers/hv/hv_common.c b/drivers/hv/hv_common.c
index e3b04e978932..455432d49fd3 100644
--- a/drivers/hv/hv_common.c
+++ b/drivers/hv/hv_common.c
@@ -21,6 +21,7 @@
#include <linux/ptrace.h>
#include <linux/slab.h>
#include <linux/dma-map-ops.h>
+#include <linux/set_memory.h>
#include <asm/hyperv-tlfs.h>
#include <asm/mshyperv.h>

@@ -128,6 +129,7 @@ int hv_common_cpu_init(unsigned int cpu)
u64 msr_vp_index;
gfp_t flags;
int pgcount = hv_root_partition ? 2 : 1;
+ int ret;

/* hv_cpu_init() can be called with IRQs disabled from hv_resume() */
flags = irqs_disabled() ? GFP_ATOMIC : GFP_KERNEL;
@@ -137,6 +139,17 @@ int hv_common_cpu_init(unsigned int cpu)
if (!(*inputarg))
return -ENOMEM;

+ if (hv_isolation_type_en_snp()) {
+ ret = set_memory_decrypted((unsigned long)*inputarg, pgcount);
+ if (ret) {
+ kfree(*inputarg);
+ *inputarg = NULL;
+ return ret;
+ }
+
+ memset(*inputarg, 0x00, PAGE_SIZE);
+ }
+
if (hv_root_partition) {
outputarg = (void **)this_cpu_ptr(hyperv_pcpu_output_arg);
*outputarg = (char *)(*inputarg) + HV_HYP_PAGE_SIZE;
@@ -156,6 +169,7 @@ int hv_common_cpu_die(unsigned int cpu)
{
unsigned long flags;
void **inputarg, **outputarg;
+ int pgcount = hv_root_partition ? 2 : 1;
void *mem;

local_irq_save(flags);
@@ -171,7 +185,12 @@ int hv_common_cpu_die(unsigned int cpu)

local_irq_restore(flags);

- kfree(mem);
+ if (hv_isolation_type_en_snp()) {
+ if (!set_memory_encrypted((unsigned long)mem, pgcount))
+ kfree(mem);
+ } else {
+ kfree(mem);
+ }

return 0;
}
--
2.25.1

2023-04-03 17:46:49

by Tianyu Lan

[permalink] [raw]
Subject: [RFC PATCH V4 10/17] x86/hyperv: Add smp support for sev-snp guest

From: Tianyu Lan <[email protected]>

The wakeup_secondary_cpu callback was populated with wakeup_
cpu_via_vmgexit() which doesn't work for Hyper-V and Hyper-V
requires to call Hyper-V specific hvcall to start APs. So override
it with Hyper-V specific hook to start AP sev_es_save_area data
structure.

Signed-off-by: Tianyu Lan <[email protected]>
---
Change sicne RFC v3:
* Replace struct sev_es_save_area with struct
vmcb_save_area
* Move code from mshyperv.c to ivm.c

Change since RFC v2:
* Add helper function to initialize segment
* Fix some coding style
---
arch/x86/hyperv/ivm.c | 89 +++++++++++++++++++++++++++++++
arch/x86/include/asm/mshyperv.h | 18 +++++++
arch/x86/include/asm/sev.h | 13 +++++
arch/x86/include/asm/svm.h | 15 +++++-
arch/x86/kernel/cpu/mshyperv.c | 13 ++++-
arch/x86/kernel/sev.c | 4 +-
include/asm-generic/hyperv-tlfs.h | 19 +++++++
7 files changed, 166 insertions(+), 5 deletions(-)

diff --git a/arch/x86/hyperv/ivm.c b/arch/x86/hyperv/ivm.c
index c0f3fa924163..51243148b8e6 100644
--- a/arch/x86/hyperv/ivm.c
+++ b/arch/x86/hyperv/ivm.c
@@ -22,11 +22,15 @@
#include <asm/sev.h>
#include <asm/realmode.h>
#include <asm/e820/api.h>
+#include <asm/desc.h>

#ifdef CONFIG_AMD_MEM_ENCRYPT

#define GHCB_USAGE_HYPERV_CALL 1

+static u8 ap_start_input_arg[PAGE_SIZE] __bss_decrypted __aligned(PAGE_SIZE);
+static u8 ap_start_stack[PAGE_SIZE] __aligned(PAGE_SIZE);
+
union hv_ghcb {
struct ghcb ghcb;
struct {
@@ -437,6 +441,91 @@ __init void hv_sev_init_mem_and_cpu(void)
}
}

+#define hv_populate_vmcb_seg(seg, gdtr_base) \
+do { \
+ if (seg.selector) { \
+ seg.base = 0; \
+ seg.limit = HV_AP_SEGMENT_LIMIT; \
+ seg.attrib = *(u16 *)(gdtr_base + seg.selector + 5); \
+ seg.attrib = (seg.attrib & 0xFF) | ((seg.attrib >> 4) & 0xF00); \
+ } \
+} while (0) \
+
+int hv_snp_boot_ap(int cpu, unsigned long start_ip)
+{
+ struct sev_es_save_area *vmsa = (struct sev_es_save_area *)
+ __get_free_page(GFP_KERNEL | __GFP_ZERO);
+ struct desc_ptr gdtr;
+ u64 ret, retry = 5;
+ struct hv_start_virtual_processor_input *start_vp_input;
+ union sev_rmp_adjust rmp_adjust;
+ unsigned long flags;
+
+ native_store_gdt(&gdtr);
+
+ vmsa->gdtr.base = gdtr.address;
+ vmsa->gdtr.limit = gdtr.size;
+
+ asm volatile("movl %%es, %%eax;" : "=a" (vmsa->es.selector));
+ hv_populate_vmcb_seg(vmsa->es, vmsa->gdtr.base);
+
+ asm volatile("movl %%cs, %%eax;" : "=a" (vmsa->cs.selector));
+ hv_populate_vmcb_seg(vmsa->cs, vmsa->gdtr.base);
+
+ asm volatile("movl %%ss, %%eax;" : "=a" (vmsa->ss.selector));
+ hv_populate_vmcb_seg(vmsa->ss, vmsa->gdtr.base);
+
+ asm volatile("movl %%ds, %%eax;" : "=a" (vmsa->ds.selector));
+ hv_populate_vmcb_seg(vmsa->ds, vmsa->gdtr.base);
+
+ vmsa->efer = native_read_msr(MSR_EFER);
+
+ asm volatile("movq %%cr4, %%rax;" : "=a" (vmsa->cr4));
+ asm volatile("movq %%cr3, %%rax;" : "=a" (vmsa->cr3));
+ asm volatile("movq %%cr0, %%rax;" : "=a" (vmsa->cr0));
+
+ vmsa->xcr0 = 1;
+ vmsa->g_pat = HV_AP_INIT_GPAT_DEFAULT;
+ vmsa->rip = (u64)secondary_startup_64_no_verify;
+ vmsa->rsp = (u64)&ap_start_stack[PAGE_SIZE];
+
+ vmsa->sev_features.snp = 1;
+ vmsa->sev_features.restrict_injection = 1;
+
+ rmp_adjust.as_uint64 = 0;
+ rmp_adjust.target_vmpl = 1;
+ rmp_adjust.vmsa = 1;
+ ret = rmpadjust((unsigned long)vmsa, RMP_PG_SIZE_4K,
+ rmp_adjust.as_uint64);
+ if (ret != 0) {
+ pr_err("RMPADJUST(%llx) failed: %llx\n", (u64)vmsa, ret);
+ return ret;
+ }
+
+ local_irq_save(flags);
+ start_vp_input =
+ (struct hv_start_virtual_processor_input *)ap_start_input_arg;
+ memset(start_vp_input, 0, sizeof(*start_vp_input));
+ start_vp_input->partitionid = -1;
+ start_vp_input->vpindex = cpu;
+ start_vp_input->targetvtl = ms_hyperv.vtl;
+ *(u64 *)&start_vp_input->context[0] = __pa(vmsa) | 1;
+
+ do {
+ ret = hv_do_hypercall(HVCALL_START_VIRTUAL_PROCESSOR,
+ start_vp_input, NULL);
+ } while (hv_result(ret) == HV_STATUS_TIME_OUT && retry--);
+
+ if (!hv_result_success(ret)) {
+ pr_err("HvCallStartVirtualProcessor failed: %llx\n", ret);
+ goto done;
+ }
+
+done:
+ local_irq_restore(flags);
+ return ret;
+}
+
void __init hv_vtom_init(void)
{
/*
diff --git a/arch/x86/include/asm/mshyperv.h b/arch/x86/include/asm/mshyperv.h
index a4a59007b5f2..e23c987deb7a 100644
--- a/arch/x86/include/asm/mshyperv.h
+++ b/arch/x86/include/asm/mshyperv.h
@@ -55,6 +55,20 @@ struct memory_map_entry {
u32 reserved;
};

+/*
+ * DEFAULT INIT GPAT and SEGMENT LIMIT value in struct VMSA
+ * to start AP in enlightened SEV guest.
+ */
+#define HV_AP_INIT_GPAT_DEFAULT 0x0007040600070406ULL
+#define HV_AP_SEGMENT_LIMIT 0xffffffff
+
+/*
+ * DEFAULT INIT GPAT and SEGMENT LIMIT value in struct VMSA
+ * to start AP in enlightened SEV guest.
+ */
+#define HV_AP_INIT_GPAT_DEFAULT 0x0007040600070406ULL
+#define HV_AP_SEGMENT_LIMIT 0xffffffff
+
int hv_call_deposit_pages(int node, u64 partition_id, u32 num_pages);
int hv_call_add_logical_proc(int node, u32 lp_index, u32 acpi_id);
int hv_call_create_vp(int node, u64 partition_id, u32 vp_index, u32 flags);
@@ -253,6 +267,8 @@ struct irq_domain *hv_create_pci_msi_domain(void);
int hv_map_ioapic_interrupt(int ioapic_id, bool level, int vcpu, int vector,
struct hv_interrupt_entry *entry);
int hv_unmap_ioapic_interrupt(int ioapic_id, struct hv_interrupt_entry *entry);
+int hv_set_mem_host_visibility(unsigned long addr, int numpages, bool visible);
+int hv_snp_boot_ap(int cpu, unsigned long start_ip);

#ifdef CONFIG_AMD_MEM_ENCRYPT
void hv_ghcb_msr_write(u64 msr, u64 value);
@@ -261,6 +277,7 @@ bool hv_ghcb_negotiate_protocol(void);
void hv_ghcb_terminate(unsigned int set, unsigned int reason);
void hv_vtom_init(void);
void hv_sev_init_mem_and_cpu(void);
+int hv_snp_boot_ap(int cpu, unsigned long start_ip);
#else
static inline void hv_ghcb_msr_write(u64 msr, u64 value) {}
static inline void hv_ghcb_msr_read(u64 msr, u64 *value) {}
@@ -268,6 +285,7 @@ static inline bool hv_ghcb_negotiate_protocol(void) { return false; }
static inline void hv_ghcb_terminate(unsigned int set, unsigned int reason) {}
static inline void hv_vtom_init(void) {}
static inline void hv_sev_init_mem_and_cpu(void) {}
+static int hv_snp_boot_ap(int cpu, unsigned long start_ip) {}
#endif

extern bool hv_isolation_type_snp(void);
diff --git a/arch/x86/include/asm/sev.h b/arch/x86/include/asm/sev.h
index ebc271bb6d8e..e34aaf730220 100644
--- a/arch/x86/include/asm/sev.h
+++ b/arch/x86/include/asm/sev.h
@@ -86,6 +86,19 @@ extern bool handle_vc_boot_ghcb(struct pt_regs *regs);

#define RMPADJUST_VMSA_PAGE_BIT BIT(16)

+union sev_rmp_adjust {
+ u64 as_uint64;
+ struct {
+ unsigned long target_vmpl : 8;
+ unsigned long enable_read : 1;
+ unsigned long enable_write : 1;
+ unsigned long enable_user_execute : 1;
+ unsigned long enable_kernel_execute : 1;
+ unsigned long reserved1 : 4;
+ unsigned long vmsa : 1;
+ };
+};
+
/* SNP Guest message request */
struct snp_req_data {
unsigned long req_gpa;
diff --git a/arch/x86/include/asm/svm.h b/arch/x86/include/asm/svm.h
index cb1ee53ad3b1..bcf970901512 100644
--- a/arch/x86/include/asm/svm.h
+++ b/arch/x86/include/asm/svm.h
@@ -423,7 +423,20 @@ struct sev_es_save_area {
u64 guest_exit_info_2;
u64 guest_exit_int_info;
u64 guest_nrip;
- u64 sev_features;
+ union {
+ struct {
+ u64 snp : 1;
+ u64 vtom : 1;
+ u64 reflectvc : 1;
+ u64 restrict_injection : 1;
+ u64 alternate_injection : 1;
+ u64 full_debug : 1;
+ u64 reserved1 : 1;
+ u64 snpbtb_isolation : 1;
+ u64 resrved2 : 56;
+ };
+ u64 val;
+ } sev_features;
u64 vintr_ctrl;
u64 guest_exit_code;
u64 virtual_tom;
diff --git a/arch/x86/kernel/cpu/mshyperv.c b/arch/x86/kernel/cpu/mshyperv.c
index 71820bbf9e90..829234ba8da5 100644
--- a/arch/x86/kernel/cpu/mshyperv.c
+++ b/arch/x86/kernel/cpu/mshyperv.c
@@ -300,6 +300,16 @@ static void __init hv_smp_prepare_cpus(unsigned int max_cpus)

native_smp_prepare_cpus(max_cpus);

+ /*
+ * Override wakeup_secondary_cpu callback for SEV-SNP
+ * enlightened guest.
+ */
+ if (hv_isolation_type_en_snp())
+ apic->wakeup_secondary_cpu = hv_snp_boot_ap;
+
+ if (!hv_root_partition)
+ return;
+
#ifdef CONFIG_X86_64
for_each_present_cpu(i) {
if (i == 0)
@@ -503,8 +513,7 @@ static void __init ms_hyperv_init_platform(void)

# ifdef CONFIG_SMP
smp_ops.smp_prepare_boot_cpu = hv_smp_prepare_boot_cpu;
- if (hv_root_partition)
- smp_ops.smp_prepare_cpus = hv_smp_prepare_cpus;
+ smp_ops.smp_prepare_cpus = hv_smp_prepare_cpus;
# endif

/*
diff --git a/arch/x86/kernel/sev.c b/arch/x86/kernel/sev.c
index 679026a640ef..b3f95fcb8b18 100644
--- a/arch/x86/kernel/sev.c
+++ b/arch/x86/kernel/sev.c
@@ -1080,7 +1080,7 @@ static int wakeup_cpu_via_vmgexit(int apic_id, unsigned long start_ip)
* SEV_FEATURES (matches the SEV STATUS MSR right shifted 2 bits)
*/
vmsa->vmpl = 0;
- vmsa->sev_features = sev_status >> 2;
+ vmsa->sev_features.val = sev_status >> 2;

/* Switch the page over to a VMSA page now that it is initialized */
ret = snp_set_vmsa(vmsa, true);
@@ -1097,7 +1097,7 @@ static int wakeup_cpu_via_vmgexit(int apic_id, unsigned long start_ip)
ghcb = __sev_get_ghcb(&state);

vc_ghcb_invalidate(ghcb);
- ghcb_set_rax(ghcb, vmsa->sev_features);
+ ghcb_set_rax(ghcb, vmsa->sev_features.val);
ghcb_set_sw_exit_code(ghcb, SVM_VMGEXIT_AP_CREATION);
ghcb_set_sw_exit_info_1(ghcb, ((u64)apic_id << 32) | SVM_VMGEXIT_AP_CREATE);
ghcb_set_sw_exit_info_2(ghcb, __pa(vmsa));
diff --git a/include/asm-generic/hyperv-tlfs.h b/include/asm-generic/hyperv-tlfs.h
index ea406e901469..b2f14aa608f7 100644
--- a/include/asm-generic/hyperv-tlfs.h
+++ b/include/asm-generic/hyperv-tlfs.h
@@ -148,6 +148,7 @@ union hv_reference_tsc_msr {
#define HVCALL_FLUSH_VIRTUAL_ADDRESS_LIST 0x0003
#define HVCALL_NOTIFY_LONG_SPIN_WAIT 0x0008
#define HVCALL_SEND_IPI 0x000b
+#define HVCALL_ENABLE_VP_VTL 0x000f
#define HVCALL_FLUSH_VIRTUAL_ADDRESS_SPACE_EX 0x0013
#define HVCALL_FLUSH_VIRTUAL_ADDRESS_LIST_EX 0x0014
#define HVCALL_SEND_IPI_EX 0x0015
@@ -165,6 +166,7 @@ union hv_reference_tsc_msr {
#define HVCALL_MAP_DEVICE_INTERRUPT 0x007c
#define HVCALL_UNMAP_DEVICE_INTERRUPT 0x007d
#define HVCALL_RETARGET_INTERRUPT 0x007e
+#define HVCALL_START_VIRTUAL_PROCESSOR 0x0099
#define HVCALL_FLUSH_GUEST_PHYSICAL_ADDRESS_SPACE 0x00af
#define HVCALL_FLUSH_GUEST_PHYSICAL_ADDRESS_LIST 0x00b0
#define HVCALL_MODIFY_SPARSE_GPA_PAGE_HOST_VISIBILITY 0x00db
@@ -220,6 +222,7 @@ enum HV_GENERIC_SET_FORMAT {
#define HV_STATUS_INVALID_PORT_ID 17
#define HV_STATUS_INVALID_CONNECTION_ID 18
#define HV_STATUS_INSUFFICIENT_BUFFERS 19
+#define HV_STATUS_TIME_OUT 0x78

/*
* The Hyper-V TimeRefCount register and the TSC
@@ -779,6 +782,22 @@ struct hv_input_unmap_device_interrupt {
struct hv_interrupt_entry interrupt_entry;
} __packed;

+struct hv_enable_vp_vtl_input {
+ u64 partitionid;
+ u32 vpindex;
+ u8 targetvtl;
+ u8 padding[3];
+ u8 context[0xe0];
+} __packed;
+
+struct hv_start_virtual_processor_input {
+ u64 partitionid;
+ u32 vpindex;
+ u8 targetvtl;
+ u8 padding[3];
+ u8 context[0xe0];
+} __packed;
+
#define HV_SOURCE_SHADOW_NONE 0x0
#define HV_SOURCE_SHADOW_BRIDGE_BUS_RANGE 0x1

--
2.25.1

2023-04-03 17:46:55

by Tianyu Lan

[permalink] [raw]
Subject: [RFC PATCH V4 05/17] clocksource/drivers/hyper-v: decrypt hyperv tsc page in sev-snp enlightened guest

From: Tianyu Lan <[email protected]>

Hyper-V tsc page is shared with hypervisor and it should
be decrypted in sev-snp enlightened guest when it's used.

Signed-off-by: Tianyu Lan <[email protected]>
---
Change since RFC V2:
* Change the Subject line prefix
---
drivers/clocksource/hyperv_timer.c | 2 +-
1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/drivers/clocksource/hyperv_timer.c b/drivers/clocksource/hyperv_timer.c
index c0cef92b12b8..44da16ca203c 100644
--- a/drivers/clocksource/hyperv_timer.c
+++ b/drivers/clocksource/hyperv_timer.c
@@ -365,7 +365,7 @@ EXPORT_SYMBOL_GPL(hv_stimer_global_cleanup);
static union {
struct ms_hyperv_tsc_page page;
u8 reserved[PAGE_SIZE];
-} tsc_pg __aligned(PAGE_SIZE);
+} tsc_pg __bss_decrypted __aligned(PAGE_SIZE);

static struct ms_hyperv_tsc_page *tsc_page = &tsc_pg.page;
static unsigned long tsc_pfn;
--
2.25.1

2023-04-03 17:47:07

by Tianyu Lan

[permalink] [raw]
Subject: [RFC PATCH V4 13/17] x86/sev: Add Check of #HV event in path

From: Tianyu Lan <[email protected]>

Add check_hv_pending() and check_hv_pending_after_irq() to
check queued #HV event when irq is disabled.

Signed-off-by: Tianyu Lan <[email protected]>
---
arch/x86/entry/entry_64.S | 18 ++++++++++++++++
arch/x86/include/asm/irqflags.h | 11 ++++++++++
arch/x86/kernel/sev.c | 38 +++++++++++++++++++++++++++++++++
3 files changed, 67 insertions(+)

diff --git a/arch/x86/entry/entry_64.S b/arch/x86/entry/entry_64.S
index d877774c3141..efa56dfde19e 100644
--- a/arch/x86/entry/entry_64.S
+++ b/arch/x86/entry/entry_64.S
@@ -1073,6 +1073,15 @@ SYM_CODE_END(paranoid_entry)
* R15 - old SPEC_CTRL
*/
SYM_CODE_START_LOCAL(paranoid_exit)
+#ifdef CONFIG_AMD_MEM_ENCRYPT
+ /*
+ * If a #HV was delivered during execution and interrupts were
+ * disabled, then check if it can be handled before the iret
+ * (which may re-enable interrupts).
+ */
+ mov %rsp, %rdi
+ call check_hv_pending
+#endif
UNWIND_HINT_REGS

/*
@@ -1197,6 +1206,15 @@ SYM_CODE_START(error_entry)
SYM_CODE_END(error_entry)

SYM_CODE_START_LOCAL(error_return)
+#ifdef CONFIG_AMD_MEM_ENCRYPT
+ /*
+ * If a #HV was delivered during execution and interrupts were
+ * disabled, then check if it can be handled before the iret
+ * (which may re-enable interrupts).
+ */
+ mov %rsp, %rdi
+ call check_hv_pending
+#endif
UNWIND_HINT_REGS
DEBUG_ENTRY_ASSERT_IRQS_OFF
testb $3, CS(%rsp)
diff --git a/arch/x86/include/asm/irqflags.h b/arch/x86/include/asm/irqflags.h
index 8c5ae649d2df..8368e3fe2d36 100644
--- a/arch/x86/include/asm/irqflags.h
+++ b/arch/x86/include/asm/irqflags.h
@@ -11,6 +11,10 @@
/*
* Interrupt control:
*/
+#ifdef CONFIG_AMD_MEM_ENCRYPT
+void check_hv_pending(struct pt_regs *regs);
+void check_hv_pending_irq_enable(void);
+#endif

/* Declaration required for gcc < 4.9 to prevent -Werror=missing-prototypes */
extern inline unsigned long native_save_fl(void);
@@ -40,12 +44,19 @@ static __always_inline void native_irq_disable(void)
static __always_inline void native_irq_enable(void)
{
asm volatile("sti": : :"memory");
+#ifdef CONFIG_AMD_MEM_ENCRYPT
+ check_hv_pending_irq_enable();
+#endif
}

static __always_inline void native_safe_halt(void)
{
mds_idle_clear_cpu_buffers();
asm volatile("sti; hlt": : :"memory");
+
+#ifdef CONFIG_AMD_MEM_ENCRYPT
+ check_hv_pending_irq_enable();
+#endif
}

static __always_inline void native_halt(void)
diff --git a/arch/x86/kernel/sev.c b/arch/x86/kernel/sev.c
index 2684a45b50a6..6445f5356c45 100644
--- a/arch/x86/kernel/sev.c
+++ b/arch/x86/kernel/sev.c
@@ -179,6 +179,44 @@ void noinstr __sev_es_ist_enter(struct pt_regs *regs)
this_cpu_write(cpu_tss_rw.x86_tss.ist[IST_INDEX_VC], new_ist);
}

+static void do_exc_hv(struct pt_regs *regs)
+{
+ /* Handle #HV exception. */
+}
+
+void check_hv_pending(struct pt_regs *regs)
+{
+ if (!cc_platform_has(CC_ATTR_GUEST_SEV_SNP))
+ return;
+
+ if ((regs->flags & X86_EFLAGS_IF) == 0)
+ return;
+
+ do_exc_hv(regs);
+}
+
+void check_hv_pending_irq_enable(void)
+{
+ struct pt_regs regs;
+
+ if (!cc_platform_has(CC_ATTR_GUEST_SEV_SNP))
+ return;
+
+ memset(&regs, 0, sizeof(struct pt_regs));
+ asm volatile("movl %%cs, %%eax;" : "=a" (regs.cs));
+ asm volatile("movl %%ss, %%eax;" : "=a" (regs.ss));
+ regs.orig_ax = 0xffffffff;
+ regs.flags = native_save_fl();
+
+ /*
+ * Disable irq when handle pending #HV events after
+ * re-enabling irq.
+ */
+ asm volatile("cli" : : : "memory");
+ do_exc_hv(&regs);
+ asm volatile("sti" : : : "memory");
+}
+
void noinstr __sev_es_ist_exit(void)
{
unsigned long ist;
--
2.25.1

2023-04-03 17:47:14

by Tianyu Lan

[permalink] [raw]
Subject: [RFC PATCH V4 15/17] x86/sev: optimize system vector processing invoked from #HV exception

From: Ashish Kalra <[email protected]>

Construct system vector table and dispatch system vector exceptions through
sysvec_table from #HV exception handler instead of explicitly calling each
system vector. The system vector table is created dynamically and is placed
in a new named ELF section.

Signed-off-by: Ashish Kalra <[email protected]>
---
arch/x86/entry/entry_64.S | 6 +++
arch/x86/kernel/sev.c | 70 +++++++++++++----------------------
arch/x86/kernel/vmlinux.lds.S | 7 ++++
3 files changed, 38 insertions(+), 45 deletions(-)

diff --git a/arch/x86/entry/entry_64.S b/arch/x86/entry/entry_64.S
index efa56dfde19e..802fb6dec244 100644
--- a/arch/x86/entry/entry_64.S
+++ b/arch/x86/entry/entry_64.S
@@ -419,6 +419,12 @@ SYM_CODE_START(\asmsym)

_ASM_NOKPROBE(\asmsym)
SYM_CODE_END(\asmsym)
+ .if \vector >= FIRST_SYSTEM_VECTOR && \vector < NR_VECTORS
+ .section .system_vectors, "aw"
+ .byte \vector
+ .quad \cfunc
+ .previous
+ .endif
.endm

/*
diff --git a/arch/x86/kernel/sev.c b/arch/x86/kernel/sev.c
index 7fcb3b548215..2c964f7ac7dc 100644
--- a/arch/x86/kernel/sev.c
+++ b/arch/x86/kernel/sev.c
@@ -151,6 +151,16 @@ struct sev_snp_runtime_data {

static DEFINE_PER_CPU(struct sev_snp_runtime_data*, snp_runtime_data);

+static void (*sysvec_table[NR_VECTORS - FIRST_SYSTEM_VECTOR])
+ (struct pt_regs *regs) __ro_after_init;
+
+struct sysvec_entry {
+ unsigned char vector;
+ void (*sysvec_func)(struct pt_regs *regs);
+} __packed;
+
+extern struct sysvec_entry __system_vectors[], __system_vectors_end[];
+
static inline u64 sev_es_rd_ghcb_msr(void)
{
return __rdmsr(MSR_AMD64_SEV_ES_GHCB);
@@ -216,51 +226,11 @@ static void do_exc_hv(struct pt_regs *regs)
} else if (pending_events.vector == IA32_SYSCALL_VECTOR) {
WARN(1, "syscall shouldn't happen\n");
} else if (pending_events.vector >= FIRST_SYSTEM_VECTOR) {
- switch (pending_events.vector) {
-#if IS_ENABLED(CONFIG_HYPERV)
- case HYPERV_STIMER0_VECTOR:
- sysvec_hyperv_stimer0(regs);
- break;
- case HYPERVISOR_CALLBACK_VECTOR:
- sysvec_hyperv_callback(regs);
- break;
-#endif
-#ifdef CONFIG_SMP
- case RESCHEDULE_VECTOR:
- sysvec_reschedule_ipi(regs);
- break;
- case IRQ_MOVE_CLEANUP_VECTOR:
- sysvec_irq_move_cleanup(regs);
- break;
- case REBOOT_VECTOR:
- sysvec_reboot(regs);
- break;
- case CALL_FUNCTION_SINGLE_VECTOR:
- sysvec_call_function_single(regs);
- break;
- case CALL_FUNCTION_VECTOR:
- sysvec_call_function(regs);
- break;
-#endif
-#ifdef CONFIG_X86_LOCAL_APIC
- case ERROR_APIC_VECTOR:
- sysvec_error_interrupt(regs);
- break;
- case SPURIOUS_APIC_VECTOR:
- sysvec_spurious_apic_interrupt(regs);
- break;
- case LOCAL_TIMER_VECTOR:
- sysvec_apic_timer_interrupt(regs);
- break;
- case X86_PLATFORM_IPI_VECTOR:
- sysvec_x86_platform_ipi(regs);
- break;
-#endif
- case 0x0:
- break;
- default:
- panic("Unexpected vector %d\n", vector);
- unreachable();
+ if (!(sysvec_table[pending_events.vector - FIRST_SYSTEM_VECTOR])) {
+ WARN(1, "system vector entry 0x%x is NULL\n",
+ pending_events.vector);
+ } else {
+ (*sysvec_table[pending_events.vector - FIRST_SYSTEM_VECTOR])(regs);
}
} else {
common_interrupt(regs, pending_events.vector);
@@ -383,6 +353,14 @@ static bool sev_restricted_injection_enabled(void)
return sev_status & MSR_AMD64_SNP_RESTRICTED_INJ;
}

+static void __init construct_sysvec_table(void)
+{
+ struct sysvec_entry *p;
+
+ for (p = __system_vectors; p < __system_vectors_end; p++)
+ sysvec_table[p->vector - FIRST_SYSTEM_VECTOR] = p->sysvec_func;
+}
+
void __init sev_snp_init_hv_handling(void)
{
struct sev_es_runtime_data *data;
@@ -407,6 +385,8 @@ void __init sev_snp_init_hv_handling(void)
apic_set_eoi_write(hv_doorbell_apic_eoi_write);

local_irq_restore(flags);
+
+ construct_sysvec_table();
}

static int vc_fetch_insn_kernel(struct es_em_ctxt *ctxt,
diff --git a/arch/x86/kernel/vmlinux.lds.S b/arch/x86/kernel/vmlinux.lds.S
index 25f155205770..c37165d8e877 100644
--- a/arch/x86/kernel/vmlinux.lds.S
+++ b/arch/x86/kernel/vmlinux.lds.S
@@ -338,6 +338,13 @@ SECTIONS
*(.altinstr_replacement)
}

+ . = ALIGN(8);
+ .system_vectors : AT(ADDR(.system_vectors) - LOAD_OFFSET) {
+ __system_vectors = .;
+ *(.system_vectors)
+ __system_vectors_end = .;
+ }
+
. = ALIGN(8);
.apicdrivers : AT(ADDR(.apicdrivers) - LOAD_OFFSET) {
__apicdrivers = .;
--
2.25.1

2023-04-03 17:47:19

by Tianyu Lan

[permalink] [raw]
Subject: [RFC PATCH V4 08/17] x86/hyperv: Initialize cpu and memory for sev-snp enlightened guest

From: Tianyu Lan <[email protected]>

Read processor amd memory info from specific address which are
populated by Hyper-V. Initialize smp cpu related ops, pvalidate
system memory and add it into e820 table.

Signed-off-by: Tianyu Lan <[email protected]>
---
arch/x86/hyperv/ivm.c | 78 +++++++++++++++++++++++++++++++++
arch/x86/include/asm/mshyperv.h | 16 +++++++
arch/x86/kernel/cpu/mshyperv.c | 3 ++
3 files changed, 97 insertions(+)

diff --git a/arch/x86/hyperv/ivm.c b/arch/x86/hyperv/ivm.c
index 368b2731950e..fa4de2761460 100644
--- a/arch/x86/hyperv/ivm.c
+++ b/arch/x86/hyperv/ivm.c
@@ -17,6 +17,11 @@
#include <asm/mem_encrypt.h>
#include <asm/mshyperv.h>
#include <asm/hypervisor.h>
+#include <asm/coco.h>
+#include <asm/io_apic.h>
+#include <asm/sev.h>
+#include <asm/realmode.h>
+#include <asm/e820/api.h>

#ifdef CONFIG_AMD_MEM_ENCRYPT

@@ -57,6 +62,22 @@ union hv_ghcb {

static u16 hv_ghcb_version __ro_after_init;

+static u32 processor_count;
+
+static __init void hv_snp_get_smp_config(unsigned int early)
+{
+ if (!early) {
+ while (num_processors < processor_count) {
+ early_per_cpu(x86_cpu_to_apicid, num_processors) = num_processors;
+ early_per_cpu(x86_bios_cpu_apicid, num_processors) = num_processors;
+ physid_set(num_processors, phys_cpu_present_map);
+ set_cpu_possible(num_processors, true);
+ set_cpu_present(num_processors, true);
+ num_processors++;
+ }
+ }
+}
+
u64 hv_ghcb_hypercall(u64 control, void *input, void *output, u32 input_size)
{
union hv_ghcb *hv_ghcb;
@@ -356,6 +377,63 @@ static bool hv_is_private_mmio(u64 addr)
return false;
}

+__init void hv_sev_init_mem_and_cpu(void)
+{
+ struct memory_map_entry *entry;
+ struct e820_entry *e820_entry;
+ u64 e820_end;
+ u64 ram_end;
+ u64 page;
+
+ /*
+ * Hyper-V enlightened snp guest boots kernel
+ * directly without bootloader and so roms,
+ * bios regions and reserve resources are not
+ * available. Set these callback to NULL.
+ */
+ x86_platform.legacy.reserve_bios_regions = 0;
+ x86_init.resources.probe_roms = x86_init_noop;
+ x86_init.resources.reserve_resources = x86_init_noop;
+ x86_init.mpparse.find_smp_config = x86_init_noop;
+ x86_init.mpparse.get_smp_config = hv_snp_get_smp_config;
+
+ /*
+ * Hyper-V SEV-SNP enlightened guest doesn't support ioapic
+ * and legacy APIC page read/write. Switch to hv apic here.
+ */
+ disable_ioapic_support();
+
+ /* Read processor number and memory layout. */
+ processor_count = *(u32 *)__va(EN_SEV_SNP_PROCESSOR_INFO_ADDR);
+ entry = (struct memory_map_entry *)(__va(EN_SEV_SNP_PROCESSOR_INFO_ADDR)
+ + sizeof(struct memory_map_entry));
+
+ /*
+ * E820 table in the memory just describes memory for
+ * kernel, ACPI table, cmdline, boot params and ramdisk.
+ * Hyper-V popoulates the rest memory layout in the EN_SEV_
+ * SNP_PROCESSOR_INFO_ADDR.
+ */
+ for (; entry->numpages != 0; entry++) {
+ e820_entry = &e820_table->entries[
+ e820_table->nr_entries - 1];
+ e820_end = e820_entry->addr + e820_entry->size;
+ ram_end = (entry->starting_gpn +
+ entry->numpages) * PAGE_SIZE;
+
+ if (e820_end < entry->starting_gpn * PAGE_SIZE)
+ e820_end = entry->starting_gpn * PAGE_SIZE;
+
+ if (e820_end < ram_end) {
+ pr_info("Hyper-V: add e820 entry [mem %#018Lx-%#018Lx]\n", e820_end, ram_end - 1);
+ e820__range_add(e820_end, ram_end - e820_end,
+ E820_TYPE_RAM);
+ for (page = e820_end; page < ram_end; page += PAGE_SIZE)
+ pvalidate((unsigned long)__va(page), RMP_PG_SIZE_4K, true);
+ }
+ }
+}
+
void __init hv_vtom_init(void)
{
/*
diff --git a/arch/x86/include/asm/mshyperv.h b/arch/x86/include/asm/mshyperv.h
index 3c15e23162e7..a4a59007b5f2 100644
--- a/arch/x86/include/asm/mshyperv.h
+++ b/arch/x86/include/asm/mshyperv.h
@@ -41,6 +41,20 @@ extern bool hv_isolation_type_en_snp(void);

extern union hv_ghcb * __percpu *hv_ghcb_pg;

+/*
+ * Hyper-V puts processor and memory layout info
+ * to this address in SEV-SNP enlightened guest.
+ */
+#define EN_SEV_SNP_PROCESSOR_INFO_ADDR 0x802000
+
+struct memory_map_entry {
+ u64 starting_gpn;
+ u64 numpages;
+ u16 type;
+ u16 flags;
+ u32 reserved;
+};
+
int hv_call_deposit_pages(int node, u64 partition_id, u32 num_pages);
int hv_call_add_logical_proc(int node, u32 lp_index, u32 acpi_id);
int hv_call_create_vp(int node, u64 partition_id, u32 vp_index, u32 flags);
@@ -246,12 +260,14 @@ void hv_ghcb_msr_read(u64 msr, u64 *value);
bool hv_ghcb_negotiate_protocol(void);
void hv_ghcb_terminate(unsigned int set, unsigned int reason);
void hv_vtom_init(void);
+void hv_sev_init_mem_and_cpu(void);
#else
static inline void hv_ghcb_msr_write(u64 msr, u64 value) {}
static inline void hv_ghcb_msr_read(u64 msr, u64 *value) {}
static inline bool hv_ghcb_negotiate_protocol(void) { return false; }
static inline void hv_ghcb_terminate(unsigned int set, unsigned int reason) {}
static inline void hv_vtom_init(void) {}
+static inline void hv_sev_init_mem_and_cpu(void) {}
#endif

extern bool hv_isolation_type_snp(void);
diff --git a/arch/x86/kernel/cpu/mshyperv.c b/arch/x86/kernel/cpu/mshyperv.c
index 2f2dcb2370b6..71820bbf9e90 100644
--- a/arch/x86/kernel/cpu/mshyperv.c
+++ b/arch/x86/kernel/cpu/mshyperv.c
@@ -529,6 +529,9 @@ static void __init ms_hyperv_init_platform(void)
if (!(ms_hyperv.features & HV_ACCESS_TSC_INVARIANT))
mark_tsc_unstable("running on Hyper-V");

+ if (hv_isolation_type_en_snp())
+ hv_sev_init_mem_and_cpu();
+
hardlockup_detector_disable();
}

--
2.25.1

2023-04-03 17:47:26

by Tianyu Lan

[permalink] [raw]
Subject: [RFC PATCH V4 04/17] x86/hyperv: Use vmmcall to implement Hyper-V hypercall in sev-snp enlightened guest

From: Tianyu Lan <[email protected]>

In sev-snp enlightened guest, Hyper-V hypercall needs
to use vmmcall to trigger vmexit and notify hypervisor
to handle hypercall request.

Signed-off-by: Tianyu Lan <[email protected]>
---
Change since RFC V2:
* Fix indentation style
---
arch/x86/include/asm/mshyperv.h | 44 ++++++++++++++++++++++++---------
1 file changed, 33 insertions(+), 11 deletions(-)

diff --git a/arch/x86/include/asm/mshyperv.h b/arch/x86/include/asm/mshyperv.h
index 2fa40c6ca99f..3c15e23162e7 100644
--- a/arch/x86/include/asm/mshyperv.h
+++ b/arch/x86/include/asm/mshyperv.h
@@ -52,16 +52,25 @@ static inline u64 hv_do_hypercall(u64 control, void *input, void *output)
u64 hv_status;

#ifdef CONFIG_X86_64
- if (!hv_hypercall_pg)
- return U64_MAX;
+ if (hv_isolation_type_en_snp()) {
+ __asm__ __volatile__("mov %4, %%r8\n"
+ "vmmcall"
+ : "=a" (hv_status), ASM_CALL_CONSTRAINT,
+ "+c" (control), "+d" (input_address)
+ : "r" (output_address)
+ : "cc", "memory", "r8", "r9", "r10", "r11");
+ } else {
+ if (!hv_hypercall_pg)
+ return U64_MAX;

- __asm__ __volatile__("mov %4, %%r8\n"
- CALL_NOSPEC
- : "=a" (hv_status), ASM_CALL_CONSTRAINT,
- "+c" (control), "+d" (input_address)
- : "r" (output_address),
- THUNK_TARGET(hv_hypercall_pg)
- : "cc", "memory", "r8", "r9", "r10", "r11");
+ __asm__ __volatile__("mov %4, %%r8\n"
+ CALL_NOSPEC
+ : "=a" (hv_status), ASM_CALL_CONSTRAINT,
+ "+c" (control), "+d" (input_address)
+ : "r" (output_address),
+ THUNK_TARGET(hv_hypercall_pg)
+ : "cc", "memory", "r8", "r9", "r10", "r11");
+ }
#else
u32 input_address_hi = upper_32_bits(input_address);
u32 input_address_lo = lower_32_bits(input_address);
@@ -95,7 +104,13 @@ static inline u64 _hv_do_fast_hypercall8(u64 control, u64 input1)
u64 hv_status;

#ifdef CONFIG_X86_64
- {
+ if (hv_isolation_type_en_snp()) {
+ __asm__ __volatile__(
+ "vmmcall"
+ : "=a" (hv_status), ASM_CALL_CONSTRAINT,
+ "+c" (control), "+d" (input1)
+ :: "cc", "r8", "r9", "r10", "r11");
+ } else {
__asm__ __volatile__(CALL_NOSPEC
: "=a" (hv_status), ASM_CALL_CONSTRAINT,
"+c" (control), "+d" (input1)
@@ -140,7 +155,14 @@ static inline u64 _hv_do_fast_hypercall16(u64 control, u64 input1, u64 input2)
u64 hv_status;

#ifdef CONFIG_X86_64
- {
+ if (hv_isolation_type_en_snp()) {
+ __asm__ __volatile__("mov %4, %%r8\n"
+ "vmmcall"
+ : "=a" (hv_status), ASM_CALL_CONSTRAINT,
+ "+c" (control), "+d" (input1)
+ : "r" (input2)
+ : "cc", "r8", "r9", "r10", "r11");
+ } else {
__asm__ __volatile__("mov %4, %%r8\n"
CALL_NOSPEC
: "=a" (hv_status), ASM_CALL_CONSTRAINT,
--
2.25.1

2023-04-03 17:47:27

by Tianyu Lan

[permalink] [raw]
Subject: [RFC PATCH V4 12/17] x86/sev: Add a #HV exception handler

From: Tianyu Lan <[email protected]>

Add a #HV exception handler that uses IST stack.

Signed-off-by: Tianyu Lan <[email protected]>
---
Change since RFC V2:
* Remove unnecessary line in the change log.
---
arch/x86/entry/entry_64.S | 60 +++++++++++++++++++++++++++
arch/x86/include/asm/cpu_entry_area.h | 6 +++
arch/x86/include/asm/idtentry.h | 39 ++++++++++++++++-
arch/x86/include/asm/page_64_types.h | 1 +
arch/x86/include/asm/trapnr.h | 1 +
arch/x86/include/asm/traps.h | 1 +
arch/x86/kernel/cpu/common.c | 1 +
arch/x86/kernel/dumpstack_64.c | 9 +++-
arch/x86/kernel/idt.c | 1 +
arch/x86/kernel/sev.c | 53 +++++++++++++++++++++++
arch/x86/kernel/traps.c | 40 ++++++++++++++++++
arch/x86/mm/cpu_entry_area.c | 2 +
12 files changed, 211 insertions(+), 3 deletions(-)

diff --git a/arch/x86/entry/entry_64.S b/arch/x86/entry/entry_64.S
index eccc3431e515..d877774c3141 100644
--- a/arch/x86/entry/entry_64.S
+++ b/arch/x86/entry/entry_64.S
@@ -570,6 +570,66 @@ SYM_CODE_START(\asmsym)
.Lfrom_usermode_switch_stack_\@:
idtentry_body user_\cfunc, has_error_code=1

+_ASM_NOKPROBE(\asmsym)
+SYM_CODE_END(\asmsym)
+.endm
+/*
+ * idtentry_hv - Macro to generate entry stub for #HV
+ * @vector: Vector number
+ * @asmsym: ASM symbol for the entry point
+ * @cfunc: C function to be called
+ *
+ * The macro emits code to set up the kernel context for #HV. The #HV handler
+ * runs on an IST stack and needs to be able to support nested #HV exceptions.
+ *
+ * To make this work the #HV entry code tries its best to pretend it doesn't use
+ * an IST stack by switching to the task stack if coming from user-space (which
+ * includes early SYSCALL entry path) or back to the stack in the IRET frame if
+ * entered from kernel-mode.
+ *
+ * If entered from kernel-mode the return stack is validated first, and if it is
+ * not safe to use (e.g. because it points to the entry stack) the #HV handler
+ * will switch to a fall-back stack (HV2) and call a special handler function.
+ *
+ * The macro is only used for one vector, but it is planned to be extended in
+ * the future for the #HV exception.
+ */
+.macro idtentry_hv vector asmsym cfunc
+SYM_CODE_START(\asmsym)
+ UNWIND_HINT_IRET_REGS
+ ENDBR
+ ASM_CLAC
+ pushq $-1 /* ORIG_RAX: no syscall to restart */
+
+ testb $3, CS-ORIG_RAX(%rsp)
+ jnz .Lfrom_usermode_switch_stack_\@
+
+ call paranoid_entry
+
+ UNWIND_HINT_REGS
+
+ /*
+ * Switch off the IST stack to make it free for nested exceptions.
+ */
+ movq %rsp, %rdi /* pt_regs pointer */
+ call hv_switch_off_ist
+ movq %rax, %rsp /* Switch to new stack */
+
+ ENCODE_FRAME_POINTER
+ UNWIND_HINT_REGS
+
+ /* Update pt_regs */
+ movq ORIG_RAX(%rsp), %rsi /* get error code into 2nd argument*/
+ movq $-1, ORIG_RAX(%rsp) /* no syscall to restart */
+
+ movq %rsp, %rdi /* pt_regs pointer */
+ call kernel_\cfunc
+
+ jmp paranoid_exit
+
+.Lfrom_usermode_switch_stack_\@:
+ idtentry_body user_\cfunc, has_error_code=1
+
_ASM_NOKPROBE(\asmsym)
SYM_CODE_END(\asmsym)
.endm
diff --git a/arch/x86/include/asm/cpu_entry_area.h b/arch/x86/include/asm/cpu_entry_area.h
index 462fc34f1317..2186ed601b4a 100644
--- a/arch/x86/include/asm/cpu_entry_area.h
+++ b/arch/x86/include/asm/cpu_entry_area.h
@@ -30,6 +30,10 @@
char VC_stack[optional_stack_size]; \
char VC2_stack_guard[guardsize]; \
char VC2_stack[optional_stack_size]; \
+ char HV_stack_guard[guardsize]; \
+ char HV_stack[optional_stack_size]; \
+ char HV2_stack_guard[guardsize]; \
+ char HV2_stack[optional_stack_size]; \
char IST_top_guard[guardsize]; \

/* The exception stacks' physical storage. No guard pages required */
@@ -52,6 +56,8 @@ enum exception_stack_ordering {
ESTACK_MCE,
ESTACK_VC,
ESTACK_VC2,
+ ESTACK_HV,
+ ESTACK_HV2,
N_EXCEPTION_STACKS
};

diff --git a/arch/x86/include/asm/idtentry.h b/arch/x86/include/asm/idtentry.h
index b241af4ce9b4..22ac10a7b34f 100644
--- a/arch/x86/include/asm/idtentry.h
+++ b/arch/x86/include/asm/idtentry.h
@@ -317,6 +317,19 @@ static __always_inline void __##func(struct pt_regs *regs)
__visible noinstr void kernel_##func(struct pt_regs *regs, unsigned long error_code); \
__visible noinstr void user_##func(struct pt_regs *regs, unsigned long error_code)

+
+/**
+ * DECLARE_IDTENTRY_HV - Declare functions for the HV entry point
+ * @vector: Vector number (ignored for C)
+ * @func: Function name of the entry point
+ *
+ * Maps to DECLARE_IDTENTRY_RAW, but declares also the user C handler.
+ */
+#define DECLARE_IDTENTRY_HV(vector, func) \
+ DECLARE_IDTENTRY_RAW_ERRORCODE(vector, func); \
+ __visible noinstr void kernel_##func(struct pt_regs *regs); \
+ __visible noinstr void user_##func(struct pt_regs *regs)
+
/**
* DEFINE_IDTENTRY_IST - Emit code for IST entry points
* @func: Function name of the entry point
@@ -376,6 +389,26 @@ static __always_inline void __##func(struct pt_regs *regs)
#define DEFINE_IDTENTRY_VC_USER(func) \
DEFINE_IDTENTRY_RAW_ERRORCODE(user_##func)

+/**
+ * DEFINE_IDTENTRY_HV_KERNEL - Emit code for HV injection handler
+ * when raised from kernel mode
+ * @func: Function name of the entry point
+ *
+ * Maps to DEFINE_IDTENTRY_RAW
+ */
+#define DEFINE_IDTENTRY_HV_KERNEL(func) \
+ DEFINE_IDTENTRY_RAW(kernel_##func)
+
+/**
+ * DEFINE_IDTENTRY_HV_USER - Emit code for HV injection handler
+ * when raised from user mode
+ * @func: Function name of the entry point
+ *
+ * Maps to DEFINE_IDTENTRY_RAW
+ */
+#define DEFINE_IDTENTRY_HV_USER(func) \
+ DEFINE_IDTENTRY_RAW(user_##func)
+
#else /* CONFIG_X86_64 */

/**
@@ -465,6 +498,9 @@ __visible noinstr void func(struct pt_regs *regs, \
# define DECLARE_IDTENTRY_VC(vector, func) \
idtentry_vc vector asm_##func func

+# define DECLARE_IDTENTRY_HV(vector, func) \
+ idtentry_hv vector asm_##func func
+
#else
# define DECLARE_IDTENTRY_MCE(vector, func) \
DECLARE_IDTENTRY(vector, func)
@@ -618,9 +654,10 @@ DECLARE_IDTENTRY_RAW_ERRORCODE(X86_TRAP_DF, xenpv_exc_double_fault);
DECLARE_IDTENTRY_ERRORCODE(X86_TRAP_CP, exc_control_protection);
#endif

-/* #VC */
+/* #VC & #HV */
#ifdef CONFIG_AMD_MEM_ENCRYPT
DECLARE_IDTENTRY_VC(X86_TRAP_VC, exc_vmm_communication);
+DECLARE_IDTENTRY_HV(X86_TRAP_HV, exc_hv_injection);
#endif

#ifdef CONFIG_XEN_PV
diff --git a/arch/x86/include/asm/page_64_types.h b/arch/x86/include/asm/page_64_types.h
index e9e2c3ba5923..0bd7dab676c5 100644
--- a/arch/x86/include/asm/page_64_types.h
+++ b/arch/x86/include/asm/page_64_types.h
@@ -29,6 +29,7 @@
#define IST_INDEX_DB 2
#define IST_INDEX_MCE 3
#define IST_INDEX_VC 4
+#define IST_INDEX_HV 5

/*
* Set __PAGE_OFFSET to the most negative possible address +
diff --git a/arch/x86/include/asm/trapnr.h b/arch/x86/include/asm/trapnr.h
index f5d2325aa0b7..c6583631cecb 100644
--- a/arch/x86/include/asm/trapnr.h
+++ b/arch/x86/include/asm/trapnr.h
@@ -26,6 +26,7 @@
#define X86_TRAP_XF 19 /* SIMD Floating-Point Exception */
#define X86_TRAP_VE 20 /* Virtualization Exception */
#define X86_TRAP_CP 21 /* Control Protection Exception */
+#define X86_TRAP_HV 28 /* HV injected exception in SNP restricted mode */
#define X86_TRAP_VC 29 /* VMM Communication Exception */
#define X86_TRAP_IRET 32 /* IRET Exception */

diff --git a/arch/x86/include/asm/traps.h b/arch/x86/include/asm/traps.h
index 47ecfff2c83d..6795d3e517d6 100644
--- a/arch/x86/include/asm/traps.h
+++ b/arch/x86/include/asm/traps.h
@@ -16,6 +16,7 @@ asmlinkage __visible notrace
struct pt_regs *fixup_bad_iret(struct pt_regs *bad_regs);
void __init trap_init(void);
asmlinkage __visible noinstr struct pt_regs *vc_switch_off_ist(struct pt_regs *eregs);
+asmlinkage __visible noinstr struct pt_regs *hv_switch_off_ist(struct pt_regs *eregs);
#endif

extern bool ibt_selftest(void);
diff --git a/arch/x86/kernel/cpu/common.c b/arch/x86/kernel/cpu/common.c
index 8cd4126d8253..5bc44bcf6e48 100644
--- a/arch/x86/kernel/cpu/common.c
+++ b/arch/x86/kernel/cpu/common.c
@@ -2172,6 +2172,7 @@ static inline void tss_setup_ist(struct tss_struct *tss)
tss->x86_tss.ist[IST_INDEX_MCE] = __this_cpu_ist_top_va(MCE);
/* Only mapped when SEV-ES is active */
tss->x86_tss.ist[IST_INDEX_VC] = __this_cpu_ist_top_va(VC);
+ tss->x86_tss.ist[IST_INDEX_HV] = __this_cpu_ist_top_va(HV);
}

#else /* CONFIG_X86_64 */
diff --git a/arch/x86/kernel/dumpstack_64.c b/arch/x86/kernel/dumpstack_64.c
index f05339fee778..6d8f8864810c 100644
--- a/arch/x86/kernel/dumpstack_64.c
+++ b/arch/x86/kernel/dumpstack_64.c
@@ -26,11 +26,14 @@ static const char * const exception_stack_names[] = {
[ ESTACK_MCE ] = "#MC",
[ ESTACK_VC ] = "#VC",
[ ESTACK_VC2 ] = "#VC2",
+ [ ESTACK_HV ] = "#HV",
+ [ ESTACK_HV2 ] = "#HV2",
+
};

const char *stack_type_name(enum stack_type type)
{
- BUILD_BUG_ON(N_EXCEPTION_STACKS != 6);
+ BUILD_BUG_ON(N_EXCEPTION_STACKS != 8);

if (type == STACK_TYPE_TASK)
return "TASK";
@@ -89,6 +92,8 @@ struct estack_pages estack_pages[CEA_ESTACK_PAGES] ____cacheline_aligned = {
EPAGERANGE(MCE),
EPAGERANGE(VC),
EPAGERANGE(VC2),
+ EPAGERANGE(HV),
+ EPAGERANGE(HV2),
};

static __always_inline bool in_exception_stack(unsigned long *stack, struct stack_info *info)
@@ -98,7 +103,7 @@ static __always_inline bool in_exception_stack(unsigned long *stack, struct stac
struct pt_regs *regs;
unsigned int k;

- BUILD_BUG_ON(N_EXCEPTION_STACKS != 6);
+ BUILD_BUG_ON(N_EXCEPTION_STACKS != 8);

begin = (unsigned long)__this_cpu_read(cea_exception_stacks);
/*
diff --git a/arch/x86/kernel/idt.c b/arch/x86/kernel/idt.c
index a58c6bc1cd68..48c0a7e1dbcb 100644
--- a/arch/x86/kernel/idt.c
+++ b/arch/x86/kernel/idt.c
@@ -113,6 +113,7 @@ static const __initconst struct idt_data def_idts[] = {

#ifdef CONFIG_AMD_MEM_ENCRYPT
ISTG(X86_TRAP_VC, asm_exc_vmm_communication, IST_INDEX_VC),
+ ISTG(X86_TRAP_HV, asm_exc_hv_injection, IST_INDEX_HV),
#endif

SYSG(X86_TRAP_OF, asm_exc_overflow),
diff --git a/arch/x86/kernel/sev.c b/arch/x86/kernel/sev.c
index b3f95fcb8b18..2684a45b50a6 100644
--- a/arch/x86/kernel/sev.c
+++ b/arch/x86/kernel/sev.c
@@ -2004,6 +2004,59 @@ DEFINE_IDTENTRY_VC_USER(exc_vmm_communication)
irqentry_exit_to_user_mode(regs);
}

+static bool hv_raw_handle_exception(struct pt_regs *regs)
+{
+ return false;
+}
+
+static __always_inline bool on_hv_fallback_stack(struct pt_regs *regs)
+{
+ unsigned long sp = (unsigned long)regs;
+
+ return (sp >= __this_cpu_ist_bottom_va(HV2) && sp < __this_cpu_ist_top_va(HV2));
+}
+
+DEFINE_IDTENTRY_HV_USER(exc_hv_injection)
+{
+ irqentry_enter_from_user_mode(regs);
+ instrumentation_begin();
+
+ if (!hv_raw_handle_exception(regs)) {
+ /*
+ * Do not kill the machine if user-space triggered the
+ * exception. Send SIGBUS instead and let user-space deal
+ * with it.
+ */
+ force_sig_fault(SIGBUS, BUS_OBJERR, (void __user *)0);
+ }
+
+ instrumentation_end();
+ irqentry_exit_to_user_mode(regs);
+}
+
+DEFINE_IDTENTRY_HV_KERNEL(exc_hv_injection)
+{
+ irqentry_state_t irq_state;
+
+ irq_state = irqentry_nmi_enter(regs);
+ instrumentation_begin();
+
+ if (!hv_raw_handle_exception(regs)) {
+ pr_emerg("PANIC: Unhandled #HV exception in kernel space\n");
+
+ /* Show some debug info */
+ show_regs(regs);
+
+ /* Ask hypervisor to sev_es_terminate */
+ sev_es_terminate(SEV_TERM_SET_GEN, GHCB_SEV_ES_GEN_REQ);
+
+ panic("Returned from Terminate-Request to Hypervisor\n");
+ }
+
+ instrumentation_end();
+ irqentry_nmi_exit(regs, irq_state);
+}
+
bool __init handle_vc_boot_ghcb(struct pt_regs *regs)
{
unsigned long exit_code = regs->orig_ax;
diff --git a/arch/x86/kernel/traps.c b/arch/x86/kernel/traps.c
index d317dc3d06a3..d29debec8134 100644
--- a/arch/x86/kernel/traps.c
+++ b/arch/x86/kernel/traps.c
@@ -905,6 +905,46 @@ asmlinkage __visible noinstr struct pt_regs *vc_switch_off_ist(struct pt_regs *r

return regs_ret;
}
+
+asmlinkage __visible noinstr struct pt_regs *hv_switch_off_ist(struct pt_regs *regs)
+{
+ unsigned long sp, *stack;
+ struct stack_info info;
+ struct pt_regs *regs_ret;
+
+ /*
+ * In the SYSCALL entry path the RSP value comes from user-space - don't
+ * trust it and switch to the current kernel stack
+ */
+ if (ip_within_syscall_gap(regs)) {
+ sp = this_cpu_read(pcpu_hot.top_of_stack);
+ goto sync;
+ }
+
+ /*
+ * From here on the RSP value is trusted. Now check whether entry
+ * happened from a safe stack. Not safe are the entry or unknown stacks,
+ * use the fall-back stack instead in this case.
+ */
+ sp = regs->sp;
+ stack = (unsigned long *)sp;
+
+ if (!get_stack_info_noinstr(stack, current, &info) || info.type == STACK_TYPE_ENTRY ||
+ info.type > STACK_TYPE_EXCEPTION_LAST)
+ sp = __this_cpu_ist_top_va(HV2);
+sync:
+ /*
+ * Found a safe stack - switch to it as if the entry didn't happen via
+ * IST stack. The code below only copies pt_regs, the real switch happens
+ * in assembly code.
+ */
+ sp = ALIGN_DOWN(sp, 8) - sizeof(*regs_ret);
+
+ regs_ret = (struct pt_regs *)sp;
+ *regs_ret = *regs;
+
+ return regs_ret;
+}
#endif

asmlinkage __visible noinstr struct pt_regs *fixup_bad_iret(struct pt_regs *bad_regs)
diff --git a/arch/x86/mm/cpu_entry_area.c b/arch/x86/mm/cpu_entry_area.c
index 7316a8224259..3ec844cef652 100644
--- a/arch/x86/mm/cpu_entry_area.c
+++ b/arch/x86/mm/cpu_entry_area.c
@@ -153,6 +153,8 @@ static void __init percpu_setup_exception_stacks(unsigned int cpu)
if (cc_platform_has(CC_ATTR_GUEST_STATE_ENCRYPT)) {
cea_map_stack(VC);
cea_map_stack(VC2);
+ cea_map_stack(HV);
+ cea_map_stack(HV2);
}
}
}
--
2.25.1

2023-04-03 17:48:56

by Tianyu Lan

[permalink] [raw]
Subject: [RFC PATCH V4 17/17] x86/sev: Remove restrict interrupt injection from SNP_FEATURES_IMPL_REQ

From: Tianyu Lan <[email protected]>

Enabled restrict interrupt injection function. Remove MSR_AMD64_
SNP_RESTRICTED_INJ from SNP_FEATURES_IMPL_REQ to let kernel boot
up with this function.
---
arch/x86/boot/compressed/sev.c | 1 -
1 file changed, 1 deletion(-)

diff --git a/arch/x86/boot/compressed/sev.c b/arch/x86/boot/compressed/sev.c
index d63ad8f99f83..a5f41301a600 100644
--- a/arch/x86/boot/compressed/sev.c
+++ b/arch/x86/boot/compressed/sev.c
@@ -299,7 +299,6 @@ static void enforce_vmpl0(void)
*/
#define SNP_FEATURES_IMPL_REQ (MSR_AMD64_SNP_VTOM | \
MSR_AMD64_SNP_REFLECT_VC | \
- MSR_AMD64_SNP_RESTRICTED_INJ | \
MSR_AMD64_SNP_ALT_INJ | \
MSR_AMD64_SNP_DEBUG_SWAP | \
MSR_AMD64_SNP_VMPL_SSS | \
--
2.25.1

2023-04-03 17:48:57

by Tianyu Lan

[permalink] [raw]
Subject: [RFC PATCH V4 14/17] x86/hyperv/sev: Add AMD sev-snp enlightened guest support on hyperv

From: Tianyu Lan <[email protected]>

Enable #HV exception to handle interrupt requests from hypervisor.

Co-developed-by: Lendacky Thomas <[email protected]>
Co-developed-by: Kalra Ashish <[email protected]>
Signed-off-by: Tianyu Lan <[email protected]>
---
Change since RFC V3:
* Check NMI event when irq is disabled.
* Remove redundant variable
---
arch/x86/include/asm/mem_encrypt.h | 2 +
arch/x86/include/uapi/asm/svm.h | 4 +
arch/x86/kernel/sev.c | 314 ++++++++++++++++++++++++-----
arch/x86/kernel/traps.c | 2 +
4 files changed, 266 insertions(+), 56 deletions(-)

diff --git a/arch/x86/include/asm/mem_encrypt.h b/arch/x86/include/asm/mem_encrypt.h
index b7126701574c..9299caeca69f 100644
--- a/arch/x86/include/asm/mem_encrypt.h
+++ b/arch/x86/include/asm/mem_encrypt.h
@@ -50,6 +50,7 @@ void __init early_set_mem_enc_dec_hypercall(unsigned long vaddr, int npages,
void __init mem_encrypt_free_decrypted_mem(void);

void __init sev_es_init_vc_handling(void);
+void __init sev_snp_init_hv_handling(void);

#define __bss_decrypted __section(".bss..decrypted")

@@ -73,6 +74,7 @@ static inline void __init sme_encrypt_kernel(struct boot_params *bp) { }
static inline void __init sme_enable(struct boot_params *bp) { }

static inline void sev_es_init_vc_handling(void) { }
+static inline void sev_snp_init_hv_handling(void) { }

static inline int __init
early_set_memory_decrypted(unsigned long vaddr, unsigned long size) { return 0; }
diff --git a/arch/x86/include/uapi/asm/svm.h b/arch/x86/include/uapi/asm/svm.h
index 80e1df482337..828d624a38cf 100644
--- a/arch/x86/include/uapi/asm/svm.h
+++ b/arch/x86/include/uapi/asm/svm.h
@@ -115,6 +115,10 @@
#define SVM_VMGEXIT_AP_CREATE_ON_INIT 0
#define SVM_VMGEXIT_AP_CREATE 1
#define SVM_VMGEXIT_AP_DESTROY 2
+#define SVM_VMGEXIT_HV_DOORBELL_PAGE 0x80000014
+#define SVM_VMGEXIT_GET_PREFERRED_HV_DOORBELL_PAGE 0
+#define SVM_VMGEXIT_SET_HV_DOORBELL_PAGE 1
+#define SVM_VMGEXIT_QUERY_HV_DOORBELL_PAGE 2
#define SVM_VMGEXIT_HV_FEATURES 0x8000fffd
#define SVM_VMGEXIT_TERM_REQUEST 0x8000fffe
#define SVM_VMGEXIT_TERM_REASON(reason_set, reason_code) \
diff --git a/arch/x86/kernel/sev.c b/arch/x86/kernel/sev.c
index 6445f5356c45..7fcb3b548215 100644
--- a/arch/x86/kernel/sev.c
+++ b/arch/x86/kernel/sev.c
@@ -122,6 +122,152 @@ struct sev_config {

static struct sev_config sev_cfg __read_mostly;

+static noinstr struct ghcb *__sev_get_ghcb(struct ghcb_state *state);
+static noinstr void __sev_put_ghcb(struct ghcb_state *state);
+static int vmgexit_hv_doorbell_page(struct ghcb *ghcb, u64 op, u64 pa);
+static void sev_snp_setup_hv_doorbell_page(struct ghcb *ghcb);
+
+union hv_pending_events {
+ u16 events;
+ struct {
+ u8 vector;
+ u8 nmi : 1;
+ u8 mc : 1;
+ u8 reserved1 : 5;
+ u8 no_further_signal : 1;
+ };
+};
+
+struct sev_hv_doorbell_page {
+ union hv_pending_events pending_events;
+ u8 no_eoi_required;
+ u8 reserved2[61];
+ u8 padding[4032];
+};
+
+struct sev_snp_runtime_data {
+ struct sev_hv_doorbell_page hv_doorbell_page;
+};
+
+static DEFINE_PER_CPU(struct sev_snp_runtime_data*, snp_runtime_data);
+
+static inline u64 sev_es_rd_ghcb_msr(void)
+{
+ return __rdmsr(MSR_AMD64_SEV_ES_GHCB);
+}
+
+static __always_inline void sev_es_wr_ghcb_msr(u64 val)
+{
+ u32 low, high;
+
+ low = (u32)(val);
+ high = (u32)(val >> 32);
+
+ native_wrmsr(MSR_AMD64_SEV_ES_GHCB, low, high);
+}
+
+struct sev_hv_doorbell_page *sev_snp_current_doorbell_page(void)
+{
+ return &this_cpu_read(snp_runtime_data)->hv_doorbell_page;
+}
+
+static u8 sev_hv_pending(void)
+{
+ return sev_snp_current_doorbell_page()->pending_events.events;
+}
+
+#define sev_hv_pending_nmi \
+ sev_snp_current_doorbell_page()->pending_events.nmi
+
+static void hv_doorbell_apic_eoi_write(u32 reg, u32 val)
+{
+ if (xchg(&sev_snp_current_doorbell_page()->no_eoi_required, 0) & 0x1)
+ return;
+
+ BUG_ON(reg != APIC_EOI);
+ apic->write(reg, val);
+}
+
+static void do_exc_hv(struct pt_regs *regs)
+{
+ union hv_pending_events pending_events;
+
+ while (sev_hv_pending()) {
+ pending_events.events = xchg(
+ &sev_snp_current_doorbell_page()->pending_events.events,
+ 0);
+
+ if (pending_events.nmi)
+ exc_nmi(regs);
+
+#ifdef CONFIG_X86_MCE
+ if (pending_events.mc)
+ exc_machine_check(regs);
+#endif
+
+ if (!pending_events.vector)
+ return;
+
+ if (pending_events.vector < FIRST_EXTERNAL_VECTOR) {
+ /* Exception vectors */
+ WARN(1, "exception shouldn't happen\n");
+ } else if (pending_events.vector == FIRST_EXTERNAL_VECTOR) {
+ sysvec_irq_move_cleanup(regs);
+ } else if (pending_events.vector == IA32_SYSCALL_VECTOR) {
+ WARN(1, "syscall shouldn't happen\n");
+ } else if (pending_events.vector >= FIRST_SYSTEM_VECTOR) {
+ switch (pending_events.vector) {
+#if IS_ENABLED(CONFIG_HYPERV)
+ case HYPERV_STIMER0_VECTOR:
+ sysvec_hyperv_stimer0(regs);
+ break;
+ case HYPERVISOR_CALLBACK_VECTOR:
+ sysvec_hyperv_callback(regs);
+ break;
+#endif
+#ifdef CONFIG_SMP
+ case RESCHEDULE_VECTOR:
+ sysvec_reschedule_ipi(regs);
+ break;
+ case IRQ_MOVE_CLEANUP_VECTOR:
+ sysvec_irq_move_cleanup(regs);
+ break;
+ case REBOOT_VECTOR:
+ sysvec_reboot(regs);
+ break;
+ case CALL_FUNCTION_SINGLE_VECTOR:
+ sysvec_call_function_single(regs);
+ break;
+ case CALL_FUNCTION_VECTOR:
+ sysvec_call_function(regs);
+ break;
+#endif
+#ifdef CONFIG_X86_LOCAL_APIC
+ case ERROR_APIC_VECTOR:
+ sysvec_error_interrupt(regs);
+ break;
+ case SPURIOUS_APIC_VECTOR:
+ sysvec_spurious_apic_interrupt(regs);
+ break;
+ case LOCAL_TIMER_VECTOR:
+ sysvec_apic_timer_interrupt(regs);
+ break;
+ case X86_PLATFORM_IPI_VECTOR:
+ sysvec_x86_platform_ipi(regs);
+ break;
+#endif
+ case 0x0:
+ break;
+ default:
+ panic("Unexpected vector %d\n", vector);
+ unreachable();
+ }
+ } else {
+ common_interrupt(regs, pending_events.vector);
+ }
+ }
+}
+
static __always_inline bool on_vc_stack(struct pt_regs *regs)
{
unsigned long sp = regs->sp;
@@ -179,18 +325,19 @@ void noinstr __sev_es_ist_enter(struct pt_regs *regs)
this_cpu_write(cpu_tss_rw.x86_tss.ist[IST_INDEX_VC], new_ist);
}

-static void do_exc_hv(struct pt_regs *regs)
-{
- /* Handle #HV exception. */
-}
-
void check_hv_pending(struct pt_regs *regs)
{
if (!cc_platform_has(CC_ATTR_GUEST_SEV_SNP))
return;

- if ((regs->flags & X86_EFLAGS_IF) == 0)
+ /* Handle NMI when irq is disabled. */
+ if ((regs->flags & X86_EFLAGS_IF) == 0) {
+ if (sev_hv_pending_nmi) {
+ exc_nmi(regs);
+ sev_hv_pending_nmi = 0;
+ }
return;
+ }

do_exc_hv(regs);
}
@@ -231,68 +378,35 @@ void noinstr __sev_es_ist_exit(void)
this_cpu_write(cpu_tss_rw.x86_tss.ist[IST_INDEX_VC], *(unsigned long *)ist);
}

-/*
- * Nothing shall interrupt this code path while holding the per-CPU
- * GHCB. The backup GHCB is only for NMIs interrupting this path.
- *
- * Callers must disable local interrupts around it.
- */
-static noinstr struct ghcb *__sev_get_ghcb(struct ghcb_state *state)
+static bool sev_restricted_injection_enabled(void)
+{
+ return sev_status & MSR_AMD64_SNP_RESTRICTED_INJ;
+}
+
+void __init sev_snp_init_hv_handling(void)
{
struct sev_es_runtime_data *data;
+ struct ghcb_state state;
struct ghcb *ghcb;
+ unsigned long flags;

WARN_ON(!irqs_disabled());
+ if (!cc_platform_has(CC_ATTR_GUEST_SEV_SNP) || !sev_restricted_injection_enabled())
+ return;

data = this_cpu_read(runtime_data);
- ghcb = &data->ghcb_page;
-
- if (unlikely(data->ghcb_active)) {
- /* GHCB is already in use - save its contents */
-
- if (unlikely(data->backup_ghcb_active)) {
- /*
- * Backup-GHCB is also already in use. There is no way
- * to continue here so just kill the machine. To make
- * panic() work, mark GHCBs inactive so that messages
- * can be printed out.
- */
- data->ghcb_active = false;
- data->backup_ghcb_active = false;
-
- instrumentation_begin();
- panic("Unable to handle #VC exception! GHCB and Backup GHCB are already in use");
- instrumentation_end();
- }
-
- /* Mark backup_ghcb active before writing to it */
- data->backup_ghcb_active = true;

- state->ghcb = &data->backup_ghcb;
+ local_irq_save(flags);

- /* Backup GHCB content */
- *state->ghcb = *ghcb;
- } else {
- state->ghcb = NULL;
- data->ghcb_active = true;
- }
+ ghcb = __sev_get_ghcb(&state);

- return ghcb;
-}
+ sev_snp_setup_hv_doorbell_page(ghcb);

-static inline u64 sev_es_rd_ghcb_msr(void)
-{
- return __rdmsr(MSR_AMD64_SEV_ES_GHCB);
-}
-
-static __always_inline void sev_es_wr_ghcb_msr(u64 val)
-{
- u32 low, high;
+ __sev_put_ghcb(&state);

- low = (u32)(val);
- high = (u32)(val >> 32);
+ apic_set_eoi_write(hv_doorbell_apic_eoi_write);

- native_wrmsr(MSR_AMD64_SEV_ES_GHCB, low, high);
+ local_irq_restore(flags);
}

static int vc_fetch_insn_kernel(struct es_em_ctxt *ctxt,
@@ -553,6 +667,69 @@ static enum es_result vc_slow_virt_to_phys(struct ghcb *ghcb, struct es_em_ctxt
/* Include code shared with pre-decompression boot stage */
#include "sev-shared.c"

+/*
+ * Nothing shall interrupt this code path while holding the per-CPU
+ * GHCB. The backup GHCB is only for NMIs interrupting this path.
+ *
+ * Callers must disable local interrupts around it.
+ */
+static noinstr struct ghcb *__sev_get_ghcb(struct ghcb_state *state)
+{
+ struct sev_es_runtime_data *data;
+ struct ghcb *ghcb;
+
+ WARN_ON(!irqs_disabled());
+
+ data = this_cpu_read(runtime_data);
+ ghcb = &data->ghcb_page;
+
+ if (unlikely(data->ghcb_active)) {
+ /* GHCB is already in use - save its contents */
+
+ if (unlikely(data->backup_ghcb_active)) {
+ /*
+ * Backup-GHCB is also already in use. There is no way
+ * to continue here so just kill the machine. To make
+ * panic() work, mark GHCBs inactive so that messages
+ * can be printed out.
+ */
+ data->ghcb_active = false;
+ data->backup_ghcb_active = false;
+
+ instrumentation_begin();
+ panic("Unable to handle #VC exception! GHCB and Backup GHCB are already in use");
+ instrumentation_end();
+ }
+
+ /* Mark backup_ghcb active before writing to it */
+ data->backup_ghcb_active = true;
+
+ state->ghcb = &data->backup_ghcb;
+
+ /* Backup GHCB content */
+ *state->ghcb = *ghcb;
+ } else {
+ state->ghcb = NULL;
+ data->ghcb_active = true;
+ }
+
+ return ghcb;
+}
+
+static void sev_snp_setup_hv_doorbell_page(struct ghcb *ghcb)
+{
+ u64 pa;
+ enum es_result ret;
+
+ pa = __pa(sev_snp_current_doorbell_page());
+ vc_ghcb_invalidate(ghcb);
+ ret = vmgexit_hv_doorbell_page(ghcb,
+ SVM_VMGEXIT_SET_HV_DOORBELL_PAGE,
+ pa);
+ if (ret != ES_OK)
+ panic("SEV-SNP: failed to set up #HV doorbell page");
+}
+
static noinstr void __sev_put_ghcb(struct ghcb_state *state)
{
struct sev_es_runtime_data *data;
@@ -1281,6 +1458,7 @@ static void snp_register_per_cpu_ghcb(void)
ghcb = &data->ghcb_page;

snp_register_ghcb_early(__pa(ghcb));
+ sev_snp_setup_hv_doorbell_page(ghcb);
}

void setup_ghcb(void)
@@ -1320,6 +1498,11 @@ void setup_ghcb(void)
snp_register_ghcb_early(__pa(&boot_ghcb_page));
}

+int vmgexit_hv_doorbell_page(struct ghcb *ghcb, u64 op, u64 pa)
+{
+ return sev_es_ghcb_hv_call(ghcb, NULL, SVM_VMGEXIT_HV_DOORBELL_PAGE, op, pa);
+}
+
#ifdef CONFIG_HOTPLUG_CPU
static void sev_es_ap_hlt_loop(void)
{
@@ -1393,6 +1576,7 @@ static void __init alloc_runtime_data(int cpu)
static void __init init_ghcb(int cpu)
{
struct sev_es_runtime_data *data;
+ struct sev_snp_runtime_data *snp_data;
int err;

data = per_cpu(runtime_data, cpu);
@@ -1404,6 +1588,19 @@ static void __init init_ghcb(int cpu)

memset(&data->ghcb_page, 0, sizeof(data->ghcb_page));

+ snp_data = memblock_alloc(sizeof(*snp_data), PAGE_SIZE);
+ if (!snp_data)
+ panic("Can't allocate SEV-SNP runtime data");
+
+ err = early_set_memory_decrypted((unsigned long)&snp_data->hv_doorbell_page,
+ sizeof(snp_data->hv_doorbell_page));
+ if (err)
+ panic("Can't map #HV doorbell pages unencrypted");
+
+ memset(&snp_data->hv_doorbell_page, 0, sizeof(snp_data->hv_doorbell_page));
+
+ per_cpu(snp_runtime_data, cpu) = snp_data;
+
data->ghcb_active = false;
data->backup_ghcb_active = false;
}
@@ -2044,7 +2241,12 @@ DEFINE_IDTENTRY_VC_USER(exc_vmm_communication)

static bool hv_raw_handle_exception(struct pt_regs *regs)
{
- return false;
+ /* Clear the no_further_signal bit */
+ sev_snp_current_doorbell_page()->pending_events.events &= 0x7fff;
+
+ check_hv_pending(regs);
+
+ return true;
}

static __always_inline bool on_hv_fallback_stack(struct pt_regs *regs)
diff --git a/arch/x86/kernel/traps.c b/arch/x86/kernel/traps.c
index d29debec8134..1aa6cab2394b 100644
--- a/arch/x86/kernel/traps.c
+++ b/arch/x86/kernel/traps.c
@@ -1503,5 +1503,7 @@ void __init trap_init(void)
cpu_init_exception_handling();
/* Setup traps as cpu_init() might #GP */
idt_setup_traps();
+ sev_snp_init_hv_handling();
+
cpu_init();
}
--
2.25.1

2023-04-03 17:49:07

by Tianyu Lan

[permalink] [raw]
Subject: [RFC PATCH V4 16/17] x86/sev: Fix interrupt exit code paths from #HV exception

From: Ashish Kalra <[email protected]>

Add checks in interrupt exit code paths in case of returns
to user mode to check if currently executing the #HV handler
then don't follow the irqentry_exit_to_user_mode path as
that can potentially cause the #HV handler to be
preempted and rescheduled on another CPU. Rescheduled #HV
handler on another cpu will cause interrupts to be handled
on a different cpu than the injected one, causing
invalid EOIs and missed/lost guest interrupts and
corresponding hangs and/or per-cpu IRQs handled on
non-intended cpu.

Signed-off-by: Ashish Kalra <[email protected]>
---
Change since RFC v3:
* Add check of hv_handling_events in the do_exc_hv()
to avoid nested entry.
---
arch/x86/include/asm/idtentry.h | 66 +++++++++++++++++++++++++++++++++
arch/x86/kernel/sev.c | 34 +++++++++++++++++
2 files changed, 100 insertions(+)

diff --git a/arch/x86/include/asm/idtentry.h b/arch/x86/include/asm/idtentry.h
index 22ac10a7b34f..08cc99936531 100644
--- a/arch/x86/include/asm/idtentry.h
+++ b/arch/x86/include/asm/idtentry.h
@@ -13,6 +13,10 @@

#include <asm/irq_stack.h>

+#ifdef CONFIG_AMD_MEM_ENCRYPT
+noinstr void irqentry_exit_hv_cond(struct pt_regs *regs, irqentry_state_t state);
+#endif
+
/**
* DECLARE_IDTENTRY - Declare functions for simple IDT entry points
* No error code pushed by hardware
@@ -176,6 +180,7 @@ __visible noinstr void func(struct pt_regs *regs, unsigned long error_code)
#define DECLARE_IDTENTRY_IRQ(vector, func) \
DECLARE_IDTENTRY_ERRORCODE(vector, func)

+#ifndef CONFIG_AMD_MEM_ENCRYPT
/**
* DEFINE_IDTENTRY_IRQ - Emit code for device interrupt IDT entry points
* @func: Function name of the entry point
@@ -205,6 +210,26 @@ __visible noinstr void func(struct pt_regs *regs, \
} \
\
static noinline void __##func(struct pt_regs *regs, u32 vector)
+#else
+
+#define DEFINE_IDTENTRY_IRQ(func) \
+static void __##func(struct pt_regs *regs, u32 vector); \
+ \
+__visible noinstr void func(struct pt_regs *regs, \
+ unsigned long error_code) \
+{ \
+ irqentry_state_t state = irqentry_enter(regs); \
+ u32 vector = (u32)(u8)error_code; \
+ \
+ instrumentation_begin(); \
+ kvm_set_cpu_l1tf_flush_l1d(); \
+ run_irq_on_irqstack_cond(__##func, regs, vector); \
+ instrumentation_end(); \
+ irqentry_exit_hv_cond(regs, state); \
+} \
+ \
+static noinline void __##func(struct pt_regs *regs, u32 vector)
+#endif

/**
* DECLARE_IDTENTRY_SYSVEC - Declare functions for system vector entry points
@@ -221,6 +246,7 @@ static noinline void __##func(struct pt_regs *regs, u32 vector)
#define DECLARE_IDTENTRY_SYSVEC(vector, func) \
DECLARE_IDTENTRY(vector, func)

+#ifndef CONFIG_AMD_MEM_ENCRYPT
/**
* DEFINE_IDTENTRY_SYSVEC - Emit code for system vector IDT entry points
* @func: Function name of the entry point
@@ -245,6 +271,26 @@ __visible noinstr void func(struct pt_regs *regs) \
} \
\
static noinline void __##func(struct pt_regs *regs)
+#else
+
+#define DEFINE_IDTENTRY_SYSVEC(func) \
+static void __##func(struct pt_regs *regs); \
+ \
+__visible noinstr void func(struct pt_regs *regs) \
+{ \
+ irqentry_state_t state = irqentry_enter(regs); \
+ \
+ instrumentation_begin(); \
+ kvm_set_cpu_l1tf_flush_l1d(); \
+ run_sysvec_on_irqstack_cond(__##func, regs); \
+ instrumentation_end(); \
+ irqentry_exit_hv_cond(regs, state); \
+} \
+ \
+static noinline void __##func(struct pt_regs *regs)
+#endif
+
+#ifndef CONFIG_AMD_MEM_ENCRYPT

/**
* DEFINE_IDTENTRY_SYSVEC_SIMPLE - Emit code for simple system vector IDT
@@ -274,6 +320,26 @@ __visible noinstr void func(struct pt_regs *regs) \
} \
\
static __always_inline void __##func(struct pt_regs *regs)
+#else
+
+#define DEFINE_IDTENTRY_SYSVEC_SIMPLE(func) \
+static __always_inline void __##func(struct pt_regs *regs); \
+ \
+__visible noinstr void func(struct pt_regs *regs) \
+{ \
+ irqentry_state_t state = irqentry_enter(regs); \
+ \
+ instrumentation_begin(); \
+ __irq_enter_raw(); \
+ kvm_set_cpu_l1tf_flush_l1d(); \
+ __##func(regs); \
+ __irq_exit_raw(); \
+ instrumentation_end(); \
+ irqentry_exit_hv_cond(regs, state); \
+} \
+ \
+static __always_inline void __##func(struct pt_regs *regs)
+#endif

/**
* DECLARE_IDTENTRY_XENCB - Declare functions for XEN HV callback entry point
diff --git a/arch/x86/kernel/sev.c b/arch/x86/kernel/sev.c
index 2c964f7ac7dc..32d7dd0159ac 100644
--- a/arch/x86/kernel/sev.c
+++ b/arch/x86/kernel/sev.c
@@ -147,6 +147,10 @@ struct sev_hv_doorbell_page {

struct sev_snp_runtime_data {
struct sev_hv_doorbell_page hv_doorbell_page;
+ /*
+ * Indication that we are currently handling #HV events.
+ */
+ bool hv_handling_events;
};

static DEFINE_PER_CPU(struct sev_snp_runtime_data*, snp_runtime_data);
@@ -202,6 +206,12 @@ static void do_exc_hv(struct pt_regs *regs)
{
union hv_pending_events pending_events;

+ /* Avoid nested entry. */
+ if (this_cpu_read(snp_runtime_data)->hv_handling_events)
+ return;
+
+ this_cpu_read(snp_runtime_data)->hv_handling_events = true;
+
while (sev_hv_pending()) {
pending_events.events = xchg(
&sev_snp_current_doorbell_page()->pending_events.events,
@@ -236,6 +246,8 @@ static void do_exc_hv(struct pt_regs *regs)
common_interrupt(regs, pending_events.vector);
}
}
+
+ this_cpu_read(snp_runtime_data)->hv_handling_events = false;
}

static __always_inline bool on_vc_stack(struct pt_regs *regs)
@@ -2533,3 +2545,25 @@ static int __init snp_init_platform_device(void)
return 0;
}
device_initcall(snp_init_platform_device);
+
+noinstr void irqentry_exit_hv_cond(struct pt_regs *regs, irqentry_state_t state)
+{
+ /*
+ * Check whether this returns to user mode, if so and if
+ * we are currently executing the #HV handler then we don't
+ * want to follow the irqentry_exit_to_user_mode path as
+ * that can potentially cause the #HV handler to be
+ * preempted and rescheduled on another CPU. Rescheduled #HV
+ * handler on another cpu will cause interrupts to be handled
+ * on a different cpu than the injected one, causing
+ * invalid EOIs and missed/lost guest interrupts and
+ * corresponding hangs and/or per-cpu IRQs handled on
+ * non-intended cpu.
+ */
+ if (user_mode(regs) &&
+ this_cpu_read(snp_runtime_data)->hv_handling_events)
+ return;
+
+ /* follow normal interrupt return/exit path */
+ irqentry_exit(regs, state);
+}
--
2.25.1

2023-04-03 18:25:02

by Borislav Petkov

[permalink] [raw]
Subject: Re: [RFC PATCH V4 12/17] x86/sev: Add a #HV exception handler

On Mon, Apr 03, 2023 at 01:44:00PM -0400, Tianyu Lan wrote:
> From: Tianyu Lan <[email protected]>
>
> Add a #HV exception handler that uses IST stack.
>
> Signed-off-by: Tianyu Lan <[email protected]>
> ---
> Change since RFC V2:
> * Remove unnecessary line in the change log.
> ---
> arch/x86/entry/entry_64.S | 60 +++++++++++++++++++++++++++
> arch/x86/include/asm/cpu_entry_area.h | 6 +++
> arch/x86/include/asm/idtentry.h | 39 ++++++++++++++++-
> arch/x86/include/asm/page_64_types.h | 1 +
> arch/x86/include/asm/trapnr.h | 1 +
> arch/x86/include/asm/traps.h | 1 +
> arch/x86/kernel/cpu/common.c | 1 +
> arch/x86/kernel/dumpstack_64.c | 9 +++-
> arch/x86/kernel/idt.c | 1 +
> arch/x86/kernel/sev.c | 53 +++++++++++++++++++++++
> arch/x86/kernel/traps.c | 40 ++++++++++++++++++
> arch/x86/mm/cpu_entry_area.c | 2 +
> 12 files changed, 211 insertions(+), 3 deletions(-)

Same comment as here:

https://lore.kernel.org/r/20230331155714.GCZCcC2pHVZgIHr8k8@fat_crate.local

--
Regards/Gruss,
Boris.

https://people.kernel.org/tglx/notes-about-netiquette

2023-04-04 12:27:48

by Gupta, Pankaj

[permalink] [raw]
Subject: Re: [RFC PATCH V4 17/17] x86/sev: Remove restrict interrupt injection from SNP_FEATURES_IMPL_REQ


> Enabled restrict interrupt injection function. Remove MSR_AMD64_
> SNP_RESTRICTED_INJ from SNP_FEATURES_IMPL_REQ to let kernel boot
> up with this function.
> ---
> arch/x86/boot/compressed/sev.c | 1 -
> 1 file changed, 1 deletion(-)
>
> diff --git a/arch/x86/boot/compressed/sev.c b/arch/x86/boot/compressed/sev.c
> index d63ad8f99f83..a5f41301a600 100644
> --- a/arch/x86/boot/compressed/sev.c
> +++ b/arch/x86/boot/compressed/sev.c
> @@ -299,7 +299,6 @@ static void enforce_vmpl0(void)
> */
> #define SNP_FEATURES_IMPL_REQ (MSR_AMD64_SNP_VTOM | \
> MSR_AMD64_SNP_REFLECT_VC | \
> - MSR_AMD64_SNP_RESTRICTED_INJ | \

Should we update the bit in "SNP_FEATURES_PRESENT" instead?

Thanks,
Pankaj
> MSR_AMD64_SNP_ALT_INJ | \
> MSR_AMD64_SNP_DEBUG_SWAP | \
> MSR_AMD64_SNP_VMPL_SSS | \

2023-04-04 13:34:25

by Tianyu Lan

[permalink] [raw]
Subject: Re: [RFC PATCH V4 17/17] x86/sev: Remove restrict interrupt injection from SNP_FEATURES_IMPL_REQ


On 4/4/2023 8:25 PM, Gupta, Pankaj wrote:
>
>> Enabled restrict interrupt injection function. Remove MSR_AMD64_
>> SNP_RESTRICTED_INJ from SNP_FEATURES_IMPL_REQ to let kernel boot
>> up with this function.
>> ---
>>   arch/x86/boot/compressed/sev.c | 1 -
>>   1 file changed, 1 deletion(-)
>>
>> diff --git a/arch/x86/boot/compressed/sev.c
>> b/arch/x86/boot/compressed/sev.c
>> index d63ad8f99f83..a5f41301a600 100644
>> --- a/arch/x86/boot/compressed/sev.c
>> +++ b/arch/x86/boot/compressed/sev.c
>> @@ -299,7 +299,6 @@ static void enforce_vmpl0(void)
>>    */
>>   #define SNP_FEATURES_IMPL_REQ    (MSR_AMD64_SNP_VTOM |            \
>>                    MSR_AMD64_SNP_REFLECT_VC |        \
>> -                 MSR_AMD64_SNP_RESTRICTED_INJ |        \
>
> Should we update the bit in "SNP_FEATURES_PRESENT" instead?
>

Nice cath! we should add the bit in the SNP_FEATURES_PRESENT.

2023-04-05 05:48:12

by Gupta, Pankaj

[permalink] [raw]
Subject: Re: [RFC PATCH V4 14/17] x86/hyperv/sev: Add AMD sev-snp enlightened guest support on hyperv


> Enable #HV exception to handle interrupt requests from hypervisor.
>
> Co-developed-by: Lendacky Thomas <[email protected]>
> Co-developed-by: Kalra Ashish <[email protected]>
> Signed-off-by: Tianyu Lan <[email protected]>
> ---
> Change since RFC V3:
> * Check NMI event when irq is disabled.
> * Remove redundant variable
> ---
> arch/x86/include/asm/mem_encrypt.h | 2 +
> arch/x86/include/uapi/asm/svm.h | 4 +
> arch/x86/kernel/sev.c | 314 ++++++++++++++++++++++++-----
> arch/x86/kernel/traps.c | 2 +
> 4 files changed, 266 insertions(+), 56 deletions(-)
>
> diff --git a/arch/x86/include/asm/mem_encrypt.h b/arch/x86/include/asm/mem_encrypt.h
> index b7126701574c..9299caeca69f 100644
> --- a/arch/x86/include/asm/mem_encrypt.h
> +++ b/arch/x86/include/asm/mem_encrypt.h
> @@ -50,6 +50,7 @@ void __init early_set_mem_enc_dec_hypercall(unsigned long vaddr, int npages,
> void __init mem_encrypt_free_decrypted_mem(void);
>
> void __init sev_es_init_vc_handling(void);
> +void __init sev_snp_init_hv_handling(void);
>
> #define __bss_decrypted __section(".bss..decrypted")
>
> @@ -73,6 +74,7 @@ static inline void __init sme_encrypt_kernel(struct boot_params *bp) { }
> static inline void __init sme_enable(struct boot_params *bp) { }
>
> static inline void sev_es_init_vc_handling(void) { }
> +static inline void sev_snp_init_hv_handling(void) { }
>
> static inline int __init
> early_set_memory_decrypted(unsigned long vaddr, unsigned long size) { return 0; }
> diff --git a/arch/x86/include/uapi/asm/svm.h b/arch/x86/include/uapi/asm/svm.h
> index 80e1df482337..828d624a38cf 100644
> --- a/arch/x86/include/uapi/asm/svm.h
> +++ b/arch/x86/include/uapi/asm/svm.h
> @@ -115,6 +115,10 @@
> #define SVM_VMGEXIT_AP_CREATE_ON_INIT 0
> #define SVM_VMGEXIT_AP_CREATE 1
> #define SVM_VMGEXIT_AP_DESTROY 2
> +#define SVM_VMGEXIT_HV_DOORBELL_PAGE 0x80000014
> +#define SVM_VMGEXIT_GET_PREFERRED_HV_DOORBELL_PAGE 0
> +#define SVM_VMGEXIT_SET_HV_DOORBELL_PAGE 1
> +#define SVM_VMGEXIT_QUERY_HV_DOORBELL_PAGE 2
> #define SVM_VMGEXIT_HV_FEATURES 0x8000fffd
> #define SVM_VMGEXIT_TERM_REQUEST 0x8000fffe
> #define SVM_VMGEXIT_TERM_REASON(reason_set, reason_code) \
> diff --git a/arch/x86/kernel/sev.c b/arch/x86/kernel/sev.c
> index 6445f5356c45..7fcb3b548215 100644
> --- a/arch/x86/kernel/sev.c
> +++ b/arch/x86/kernel/sev.c
> @@ -122,6 +122,152 @@ struct sev_config {
>
> static struct sev_config sev_cfg __read_mostly;
>
> +static noinstr struct ghcb *__sev_get_ghcb(struct ghcb_state *state);
> +static noinstr void __sev_put_ghcb(struct ghcb_state *state);
> +static int vmgexit_hv_doorbell_page(struct ghcb *ghcb, u64 op, u64 pa);
> +static void sev_snp_setup_hv_doorbell_page(struct ghcb *ghcb);
> +
> +union hv_pending_events {
> + u16 events;
> + struct {
> + u8 vector;
> + u8 nmi : 1;
> + u8 mc : 1;
> + u8 reserved1 : 5;
> + u8 no_further_signal : 1;
> + };
> +};
> +
> +struct sev_hv_doorbell_page {
> + union hv_pending_events pending_events;
> + u8 no_eoi_required;
> + u8 reserved2[61];
> + u8 padding[4032];
> +};
> +
> +struct sev_snp_runtime_data {
> + struct sev_hv_doorbell_page hv_doorbell_page;
> +};
> +
> +static DEFINE_PER_CPU(struct sev_snp_runtime_data*, snp_runtime_data);
> +
> +static inline u64 sev_es_rd_ghcb_msr(void)
> +{
> + return __rdmsr(MSR_AMD64_SEV_ES_GHCB);
> +}
> +
> +static __always_inline void sev_es_wr_ghcb_msr(u64 val)
> +{
> + u32 low, high;
> +
> + low = (u32)(val);
> + high = (u32)(val >> 32);
> +
> + native_wrmsr(MSR_AMD64_SEV_ES_GHCB, low, high);
> +}
> +
> +struct sev_hv_doorbell_page *sev_snp_current_doorbell_page(void)
> +{
> + return &this_cpu_read(snp_runtime_data)->hv_doorbell_page;
> +}
> +
> +static u8 sev_hv_pending(void)
> +{
> + return sev_snp_current_doorbell_page()->pending_events.events;
> +}
> +
> +#define sev_hv_pending_nmi \
> + sev_snp_current_doorbell_page()->pending_events.nmi
> +
> +static void hv_doorbell_apic_eoi_write(u32 reg, u32 val)
> +{
> + if (xchg(&sev_snp_current_doorbell_page()->no_eoi_required, 0) & 0x1)
> + return;
> +
> + BUG_ON(reg != APIC_EOI);
> + apic->write(reg, val);
> +}
> +
> +static void do_exc_hv(struct pt_regs *regs)
> +{
> + union hv_pending_events pending_events;
> +
> + while (sev_hv_pending()) {
> + pending_events.events = xchg(
> + &sev_snp_current_doorbell_page()->pending_events.events,
> + 0);
> +
> + if (pending_events.nmi)
> + exc_nmi(regs);
> +
> +#ifdef CONFIG_X86_MCE
> + if (pending_events.mc)
> + exc_machine_check(regs);
> +#endif
> +
> + if (!pending_events.vector)
> + return;

This prevents do_exc_hv() executing from #hv handler as
"hv_handling_events" never resets?

Thanks,
Pankaj
> +
> + if (pending_events.vector < FIRST_EXTERNAL_VECTOR) {
> + /* Exception vectors */
> + WARN(1, "exception shouldn't happen\n");
> + } else if (pending_events.vector == FIRST_EXTERNAL_VECTOR) {
> + sysvec_irq_move_cleanup(regs);
> + } else if (pending_events.vector == IA32_SYSCALL_VECTOR) {
> + WARN(1, "syscall shouldn't happen\n");
> + } else if (pending_events.vector >= FIRST_SYSTEM_VECTOR) {
> + switch (pending_events.vector) {
> +#if IS_ENABLED(CONFIG_HYPERV)
> + case HYPERV_STIMER0_VECTOR:
> + sysvec_hyperv_stimer0(regs);
> + break;
> + case HYPERVISOR_CALLBACK_VECTOR:
> + sysvec_hyperv_callback(regs);
> + break;
> +#endif
> +#ifdef CONFIG_SMP
> + case RESCHEDULE_VECTOR:
> + sysvec_reschedule_ipi(regs);
> + break;
> + case IRQ_MOVE_CLEANUP_VECTOR:
> + sysvec_irq_move_cleanup(regs);
> + break;
> + case REBOOT_VECTOR:
> + sysvec_reboot(regs);
> + break;
> + case CALL_FUNCTION_SINGLE_VECTOR:
> + sysvec_call_function_single(regs);
> + break;
> + case CALL_FUNCTION_VECTOR:
> + sysvec_call_function(regs);
> + break;
> +#endif
> +#ifdef CONFIG_X86_LOCAL_APIC
> + case ERROR_APIC_VECTOR:
> + sysvec_error_interrupt(regs);
> + break;
> + case SPURIOUS_APIC_VECTOR:
> + sysvec_spurious_apic_interrupt(regs);
> + break;
> + case LOCAL_TIMER_VECTOR:
> + sysvec_apic_timer_interrupt(regs);
> + break;
> + case X86_PLATFORM_IPI_VECTOR:
> + sysvec_x86_platform_ipi(regs);
> + break;
> +#endif
> + case 0x0:
> + break;
> + default:
> + panic("Unexpected vector %d\n", vector);
> + unreachable();
> + }
> + } else {
> + common_interrupt(regs, pending_events.vector);
> + }
> + }
> +}
> +
> static __always_inline bool on_vc_stack(struct pt_regs *regs)
> {
> unsigned long sp = regs->sp;
> @@ -179,18 +325,19 @@ void noinstr __sev_es_ist_enter(struct pt_regs *regs)
> this_cpu_write(cpu_tss_rw.x86_tss.ist[IST_INDEX_VC], new_ist);
> }
>
> -static void do_exc_hv(struct pt_regs *regs)
> -{
> - /* Handle #HV exception. */
> -}
> -
> void check_hv_pending(struct pt_regs *regs)
> {
> if (!cc_platform_has(CC_ATTR_GUEST_SEV_SNP))
> return;
>
> - if ((regs->flags & X86_EFLAGS_IF) == 0)
> + /* Handle NMI when irq is disabled. */
> + if ((regs->flags & X86_EFLAGS_IF) == 0) {
> + if (sev_hv_pending_nmi) {
> + exc_nmi(regs);
> + sev_hv_pending_nmi = 0;
> + }
> return;
> + }
>
> do_exc_hv(regs);
> }
> @@ -231,68 +378,35 @@ void noinstr __sev_es_ist_exit(void)
> this_cpu_write(cpu_tss_rw.x86_tss.ist[IST_INDEX_VC], *(unsigned long *)ist);
> }
>
> -/*
> - * Nothing shall interrupt this code path while holding the per-CPU
> - * GHCB. The backup GHCB is only for NMIs interrupting this path.
> - *
> - * Callers must disable local interrupts around it.
> - */
> -static noinstr struct ghcb *__sev_get_ghcb(struct ghcb_state *state)
> +static bool sev_restricted_injection_enabled(void)
> +{
> + return sev_status & MSR_AMD64_SNP_RESTRICTED_INJ;
> +}
> +
> +void __init sev_snp_init_hv_handling(void)
> {
> struct sev_es_runtime_data *data;
> + struct ghcb_state state;
> struct ghcb *ghcb;
> + unsigned long flags;
>
> WARN_ON(!irqs_disabled());
> + if (!cc_platform_has(CC_ATTR_GUEST_SEV_SNP) || !sev_restricted_injection_enabled())
> + return;
>
> data = this_cpu_read(runtime_data);
> - ghcb = &data->ghcb_page;
> -
> - if (unlikely(data->ghcb_active)) {
> - /* GHCB is already in use - save its contents */
> -
> - if (unlikely(data->backup_ghcb_active)) {
> - /*
> - * Backup-GHCB is also already in use. There is no way
> - * to continue here so just kill the machine. To make
> - * panic() work, mark GHCBs inactive so that messages
> - * can be printed out.
> - */
> - data->ghcb_active = false;
> - data->backup_ghcb_active = false;
> -
> - instrumentation_begin();
> - panic("Unable to handle #VC exception! GHCB and Backup GHCB are already in use");
> - instrumentation_end();
> - }
> -
> - /* Mark backup_ghcb active before writing to it */
> - data->backup_ghcb_active = true;
>
> - state->ghcb = &data->backup_ghcb;
> + local_irq_save(flags);
>
> - /* Backup GHCB content */
> - *state->ghcb = *ghcb;
> - } else {
> - state->ghcb = NULL;
> - data->ghcb_active = true;
> - }
> + ghcb = __sev_get_ghcb(&state);
>
> - return ghcb;
> -}
> + sev_snp_setup_hv_doorbell_page(ghcb);
>
> -static inline u64 sev_es_rd_ghcb_msr(void)
> -{
> - return __rdmsr(MSR_AMD64_SEV_ES_GHCB);
> -}
> -
> -static __always_inline void sev_es_wr_ghcb_msr(u64 val)
> -{
> - u32 low, high;
> + __sev_put_ghcb(&state);
>
> - low = (u32)(val);
> - high = (u32)(val >> 32);
> + apic_set_eoi_write(hv_doorbell_apic_eoi_write);
>
> - native_wrmsr(MSR_AMD64_SEV_ES_GHCB, low, high);
> + local_irq_restore(flags);
> }
>
> static int vc_fetch_insn_kernel(struct es_em_ctxt *ctxt,
> @@ -553,6 +667,69 @@ static enum es_result vc_slow_virt_to_phys(struct ghcb *ghcb, struct es_em_ctxt
> /* Include code shared with pre-decompression boot stage */
> #include "sev-shared.c"
>
> +/*
> + * Nothing shall interrupt this code path while holding the per-CPU
> + * GHCB. The backup GHCB is only for NMIs interrupting this path.
> + *
> + * Callers must disable local interrupts around it.
> + */
> +static noinstr struct ghcb *__sev_get_ghcb(struct ghcb_state *state)
> +{
> + struct sev_es_runtime_data *data;
> + struct ghcb *ghcb;
> +
> + WARN_ON(!irqs_disabled());
> +
> + data = this_cpu_read(runtime_data);
> + ghcb = &data->ghcb_page;
> +
> + if (unlikely(data->ghcb_active)) {
> + /* GHCB is already in use - save its contents */
> +
> + if (unlikely(data->backup_ghcb_active)) {
> + /*
> + * Backup-GHCB is also already in use. There is no way
> + * to continue here so just kill the machine. To make
> + * panic() work, mark GHCBs inactive so that messages
> + * can be printed out.
> + */
> + data->ghcb_active = false;
> + data->backup_ghcb_active = false;
> +
> + instrumentation_begin();
> + panic("Unable to handle #VC exception! GHCB and Backup GHCB are already in use");
> + instrumentation_end();
> + }
> +
> + /* Mark backup_ghcb active before writing to it */
> + data->backup_ghcb_active = true;
> +
> + state->ghcb = &data->backup_ghcb;
> +
> + /* Backup GHCB content */
> + *state->ghcb = *ghcb;
> + } else {
> + state->ghcb = NULL;
> + data->ghcb_active = true;
> + }
> +
> + return ghcb;
> +}
> +
> +static void sev_snp_setup_hv_doorbell_page(struct ghcb *ghcb)
> +{
> + u64 pa;
> + enum es_result ret;
> +
> + pa = __pa(sev_snp_current_doorbell_page());
> + vc_ghcb_invalidate(ghcb);
> + ret = vmgexit_hv_doorbell_page(ghcb,
> + SVM_VMGEXIT_SET_HV_DOORBELL_PAGE,
> + pa);
> + if (ret != ES_OK)
> + panic("SEV-SNP: failed to set up #HV doorbell page");
> +}
> +
> static noinstr void __sev_put_ghcb(struct ghcb_state *state)
> {
> struct sev_es_runtime_data *data;
> @@ -1281,6 +1458,7 @@ static void snp_register_per_cpu_ghcb(void)
> ghcb = &data->ghcb_page;
>
> snp_register_ghcb_early(__pa(ghcb));
> + sev_snp_setup_hv_doorbell_page(ghcb);
> }
>
> void setup_ghcb(void)
> @@ -1320,6 +1498,11 @@ void setup_ghcb(void)
> snp_register_ghcb_early(__pa(&boot_ghcb_page));
> }
>
> +int vmgexit_hv_doorbell_page(struct ghcb *ghcb, u64 op, u64 pa)
> +{
> + return sev_es_ghcb_hv_call(ghcb, NULL, SVM_VMGEXIT_HV_DOORBELL_PAGE, op, pa);
> +}
> +
> #ifdef CONFIG_HOTPLUG_CPU
> static void sev_es_ap_hlt_loop(void)
> {
> @@ -1393,6 +1576,7 @@ static void __init alloc_runtime_data(int cpu)
> static void __init init_ghcb(int cpu)
> {
> struct sev_es_runtime_data *data;
> + struct sev_snp_runtime_data *snp_data;
> int err;
>
> data = per_cpu(runtime_data, cpu);
> @@ -1404,6 +1588,19 @@ static void __init init_ghcb(int cpu)
>
> memset(&data->ghcb_page, 0, sizeof(data->ghcb_page));
>
> + snp_data = memblock_alloc(sizeof(*snp_data), PAGE_SIZE);
> + if (!snp_data)
> + panic("Can't allocate SEV-SNP runtime data");
> +
> + err = early_set_memory_decrypted((unsigned long)&snp_data->hv_doorbell_page,
> + sizeof(snp_data->hv_doorbell_page));
> + if (err)
> + panic("Can't map #HV doorbell pages unencrypted");
> +
> + memset(&snp_data->hv_doorbell_page, 0, sizeof(snp_data->hv_doorbell_page));
> +
> + per_cpu(snp_runtime_data, cpu) = snp_data;
> +
> data->ghcb_active = false;
> data->backup_ghcb_active = false;
> }
> @@ -2044,7 +2241,12 @@ DEFINE_IDTENTRY_VC_USER(exc_vmm_communication)
>
> static bool hv_raw_handle_exception(struct pt_regs *regs)
> {
> - return false;
> + /* Clear the no_further_signal bit */
> + sev_snp_current_doorbell_page()->pending_events.events &= 0x7fff;
> +
> + check_hv_pending(regs);
> +
> + return true;
> }
>
> static __always_inline bool on_hv_fallback_stack(struct pt_regs *regs)
> diff --git a/arch/x86/kernel/traps.c b/arch/x86/kernel/traps.c
> index d29debec8134..1aa6cab2394b 100644
> --- a/arch/x86/kernel/traps.c
> +++ b/arch/x86/kernel/traps.c
> @@ -1503,5 +1503,7 @@ void __init trap_init(void)
> cpu_init_exception_handling();
> /* Setup traps as cpu_init() might #GP */
> idt_setup_traps();
> + sev_snp_init_hv_handling();
> +
> cpu_init();
> }

2023-04-12 14:13:39

by Michael Kelley (LINUX)

[permalink] [raw]
Subject: RE: [RFC PATCH V4 02/17] Drivers: hv: vmbus: Decrypt vmbus ring buffer

From: Tianyu Lan <[email protected]> Sent: Monday, April 3, 2023 10:44 AM
>
> The ring buffer is remapped in the hv_ringbuffer_init()
> and it should be with decrypt flag in order to share it
> with hypervisor in sev-snp enlightened guest.

This is the wrong commit message and subject for this patch. The v3 version
of this patch had the correct information about the VP Assist Page.

>
> Signed-off-by: Tianyu Lan <[email protected]>
> ---
> arch/x86/hyperv/hv_init.c | 6 ++++++
> 1 file changed, 6 insertions(+)
>
> diff --git a/arch/x86/hyperv/hv_init.c b/arch/x86/hyperv/hv_init.c
> index a5f9474f08e1..9f3e2d71d015 100644
> --- a/arch/x86/hyperv/hv_init.c
> +++ b/arch/x86/hyperv/hv_init.c
> @@ -18,6 +18,7 @@
> #include <asm/hyperv-tlfs.h>
> #include <asm/mshyperv.h>
> #include <asm/idtentry.h>
> +#include <asm/set_memory.h>
> #include <linux/kexec.h>
> #include <linux/version.h>
> #include <linux/vmalloc.h>
> @@ -113,6 +114,11 @@ static int hv_cpu_init(unsigned int cpu)
>
> }
> if (!WARN_ON(!(*hvp))) {
> + if (hv_isolation_type_en_snp()) {
> + WARN_ON_ONCE(set_memory_decrypted((unsigned long)(*hvp), 1));
> + memset(*hvp, 0, PAGE_SIZE);
> + }
> +
> msr.enable = 1;
> wrmsrl(HV_X64_MSR_VP_ASSIST_PAGE, msr.as_uint64);
> }
> --
> 2.25.1

2023-04-12 14:32:24

by Michael Kelley (LINUX)

[permalink] [raw]
Subject: RE: [RFC PATCH V4 03/17] x86/hyperv: Set Virtual Trust Level in VMBus init message

From: Tianyu Lan <[email protected]> Sent: Monday, April 3, 2023 10:44 AM
>
> sev-snp guest provides vtl(Virtual Trust Level) and
> get it from hyperv hvcall via HVCALL_GET_VP_REGISTERS.
> Set target vtl in the VMBus init message.
>
> Signed-off-by: Tianyu Lan <[email protected]>
> ---
> Change since RFC v3:
> * Use the standard helper functions to check hypercall result
> * Fix coding style
>
> Change since RFC v2:
> * Rename get_current_vtl() to get_vtl()
> * Fix some coding style issues
> ---
> arch/x86/hyperv/hv_init.c | 36 ++++++++++++++++++++++++++++++
> arch/x86/include/asm/hyperv-tlfs.h | 7 ++++++
> drivers/hv/connection.c | 1 +
> include/asm-generic/mshyperv.h | 2 ++
> include/linux/hyperv.h | 4 ++--
> 5 files changed, 48 insertions(+), 2 deletions(-)
>
> diff --git a/arch/x86/hyperv/hv_init.c b/arch/x86/hyperv/hv_init.c
> index 9f3e2d71d015..7602db5ad4aa 100644
> --- a/arch/x86/hyperv/hv_init.c
> +++ b/arch/x86/hyperv/hv_init.c
> @@ -384,6 +384,40 @@ static void __init hv_get_partition_id(void)
> local_irq_restore(flags);
> }
>
> +static u8 __init get_vtl(void)
> +{
> + u64 control = HV_HYPERCALL_REP_COMP_1 | HVCALL_GET_VP_REGISTERS;
> + struct hv_get_vp_registers_input *input;
> + struct hv_get_vp_registers_output *output;
> + u64 vtl = 0;
> + u64 ret;
> + unsigned long flags;
> +
> + local_irq_save(flags);
> + input = *(struct hv_get_vp_registers_input **)this_cpu_ptr(hyperv_pcpu_input_arg);

The cast to struct hv_get_vp_registers_input isn't needed here. Just do:

input = *this_cpu_ptr(hyperv_pcpu_input_arg);

I know we have other code that references hyperv_pcpu_input_arg with a more
complicated code sequence, but it's really not necessary. At some point, we
should go back and clean those up, but let's not add any new cases. :-)


> + output = (struct hv_get_vp_registers_output *)input;
> + if (!input) {
> + local_irq_restore(flags);
> + goto done;
> + }
> +
> + memset(input, 0, sizeof(*input) + sizeof(input->element[0]));

Use struct_size() to calculate the size of a structure plus a trailing
variable size array.

> + input->header.partitionid = HV_PARTITION_ID_SELF;
> + input->header.vpindex = HV_VP_INDEX_SELF;
> + input->header.inputvtl = 0;
> + input->element[0].name0 = HV_X64_REGISTER_VSM_VP_STATUS;
> +
> + ret = hv_do_hypercall(control, input, output);
> + if (hv_result_success(ret))
> + vtl = output->as64.low & HV_X64_VTL_MASK;
> + else
> + pr_err("Hyper-V: failed to get VTL!");

Let's include the hypercall status in the failure message. If a failure ever
happens, we will really want to know what that status is. :-)

> + local_irq_restore(flags);
> +
> +done:
> + return vtl;
> +}
> +
> /*
> * This function is to be invoked early in the boot sequence after the
> * hypervisor has been detected.
> @@ -512,6 +546,8 @@ void __init hyperv_init(void)
> /* Query the VMs extended capability once, so that it can be cached. */
> hv_query_ext_cap(0);
>
> + /* Find the VTL */
> + ms_hyperv.vtl = get_vtl();
> return;
>
> clean_guest_os_id:
> diff --git a/arch/x86/include/asm/hyperv-tlfs.h b/arch/x86/include/asm/hyperv-tlfs.h
> index b4fb75bd1013..3d3f014976e8 100644
> --- a/arch/x86/include/asm/hyperv-tlfs.h
> +++ b/arch/x86/include/asm/hyperv-tlfs.h
> @@ -301,6 +301,13 @@ enum hv_isolation_type {
> #define HV_X64_MSR_TIME_REF_COUNT HV_REGISTER_TIME_REF_COUNT
> #define HV_X64_MSR_REFERENCE_TSC HV_REGISTER_REFERENCE_TSC
>
> +/*
> + * Registers are only accessible via HVCALL_GET_VP_REGISTERS hvcall and
> + * there is not associated MSR address.
> + */
> +#define HV_X64_REGISTER_VSM_VP_STATUS 0x000D0003
> +#define HV_X64_VTL_MASK GENMASK(3, 0)
> +
> /* Hyper-V memory host visibility */
> enum hv_mem_host_visibility {
> VMBUS_PAGE_NOT_VISIBLE = 0,
> diff --git a/drivers/hv/connection.c b/drivers/hv/connection.c
> index f670cfd2e056..e4c39f4016ad 100644
> --- a/drivers/hv/connection.c
> +++ b/drivers/hv/connection.c
> @@ -98,6 +98,7 @@ int vmbus_negotiate_version(struct vmbus_channel_msginfo *msginfo, u32 version)
> */
> if (version >= VERSION_WIN10_V5) {
> msg->msg_sint = VMBUS_MESSAGE_SINT;
> + msg->msg_vtl = ms_hyperv.vtl;
> vmbus_connection.msg_conn_id = VMBUS_MESSAGE_CONNECTION_ID_4;
> } else {
> msg->interrupt_page = virt_to_phys(vmbus_connection.int_page);
> diff --git a/include/asm-generic/mshyperv.h b/include/asm-generic/mshyperv.h
> index afcd9ae9588c..f91fb06634f0 100644
> --- a/include/asm-generic/mshyperv.h
> +++ b/include/asm-generic/mshyperv.h
> @@ -48,6 +48,7 @@ struct ms_hyperv_info {
> };
> };
> u64 shared_gpa_boundary;
> + u8 vtl;
> };
> extern struct ms_hyperv_info ms_hyperv;
> extern bool hv_nested;
> @@ -58,6 +59,7 @@ extern void * __percpu *hyperv_pcpu_output_arg;
> extern u64 hv_do_hypercall(u64 control, void *inputaddr, void *outputaddr);
> extern u64 hv_do_fast_hypercall8(u16 control, u64 input8);
> extern bool hv_isolation_type_snp(void);
> +extern bool hv_isolation_type_en_snp(void);

Why is this extern added as part of this patch? There are not any new
references to hv_isolation_type_en_snp() that would need it.

>
> /* Helper functions that provide a consistent pattern for checking Hyper-V hypercall status. */
> static inline int hv_result(u64 status)
> diff --git a/include/linux/hyperv.h b/include/linux/hyperv.h
> index bfbc37ce223b..1f2bfec4abde 100644
> --- a/include/linux/hyperv.h
> +++ b/include/linux/hyperv.h
> @@ -665,8 +665,8 @@ struct vmbus_channel_initiate_contact {
> u64 interrupt_page;
> struct {
> u8 msg_sint;
> - u8 padding1[3];
> - u32 padding2;
> + u8 msg_vtl;
> + u8 reserved[6];
> };
> };
> u64 monitor_page1;
> --
> 2.25.1

2023-04-12 14:35:02

by Michael Kelley (LINUX)

[permalink] [raw]
Subject: RE: [RFC PATCH V4 06/17] x86/hyperv: decrypt VMBus pages for sev-snp enlightened guest

From: Tianyu Lan <[email protected]> Sent: Monday, April 3, 2023 10:44 AM
>

The Subject prefix for this patch is still wrong. I previously commented on
this. :-( It should be Drivers: hv: vmbus:


> VMBus post msg, synic event and message pages are shared
> with hypervisor and so decrypt these pages in the sev-snp guest.
>
> Signed-off-by: Tianyu Lan <[email protected]>
> ---
> Change since RFC V3:
> * Set encrypt page back in the hv_synic_free()
>
> Change since RFC V2:
> * Fix error in the error code path and encrypt
> pages correctly when decryption failure happens.
> ---
> drivers/hv/hv.c | 42 +++++++++++++++++++++++++++++++++++++++++-
> 1 file changed, 41 insertions(+), 1 deletion(-)
>
> diff --git a/drivers/hv/hv.c b/drivers/hv/hv.c
> index 008234894d28..e09cea8f2f04 100644
> --- a/drivers/hv/hv.c
> +++ b/drivers/hv/hv.c
> @@ -20,6 +20,7 @@
> #include <linux/interrupt.h>
> #include <clocksource/hyperv_timer.h>
> #include <asm/mshyperv.h>
> +#include <linux/set_memory.h>
> #include "hyperv_vmbus.h"
>
> /* The one and only */
> @@ -117,7 +118,7 @@ int hv_post_message(union hv_connection_id connection_id,
>
> int hv_synic_alloc(void)
> {
> - int cpu;
> + int cpu, ret;
> struct hv_per_cpu_context *hv_cpu;
>
> /*
> @@ -168,9 +169,39 @@ int hv_synic_alloc(void)
> pr_err("Unable to allocate post msg page\n");
> goto err;
> }
> +
> + if (hv_isolation_type_en_snp()) {
> + ret = set_memory_decrypted((unsigned long)
> + hv_cpu->synic_message_page, 1);
> + if (ret)
> + goto err;
> +
> + ret = set_memory_decrypted((unsigned long)
> + hv_cpu->synic_event_page, 1);
> + if (ret)
> + goto err_decrypt_event_page;
> +
> + ret = set_memory_decrypted((unsigned long)
> + hv_cpu->post_msg_page, 1);
> + if (ret)
> + goto err_decrypt_msg_page;
> +
> + memset(hv_cpu->synic_message_page, 0, PAGE_SIZE);
> + memset(hv_cpu->synic_event_page, 0, PAGE_SIZE);
> + memset(hv_cpu->post_msg_page, 0, PAGE_SIZE);
> + }
> }
>
> return 0;
> +
> +err_decrypt_msg_page:
> + set_memory_encrypted((unsigned long)
> + hv_cpu->synic_event_page, 1);
> +
> +err_decrypt_event_page:
> + set_memory_encrypted((unsigned long)
> + hv_cpu->synic_message_page, 1);
> +
> err:
> /*
> * Any memory allocations that succeeded will be freed when
> @@ -191,6 +222,15 @@ void hv_synic_free(void)
> free_page((unsigned long)hv_cpu->synic_event_page);
> free_page((unsigned long)hv_cpu->synic_message_page);
> free_page((unsigned long)hv_cpu->post_msg_page);
> +
> + if (hv_isolation_type_en_snp()) {
> + set_memory_encrypted((unsigned long)
> + hv_cpu->synic_message_page, 1);
> + set_memory_encrypted((unsigned long)
> + hv_cpu->synic_event_page, 1);
> + set_memory_encrypted((unsigned long)
> + hv_cpu->post_msg_page, 1);
> + }

The re-encryption must be done *before* pages are freed!

Furthermore, if the re-encryption fails, we should not free
the page as it would pollute the free memory pool. The best
we can do is leak the memory. See Patch 5 in Dexuan's
TDX series, which does the same thing (but still doesn't
get it quite right, per my comments).

Michael

> }
>
> kfree(hv_context.hv_numa_map);
> --
> 2.25.1

2023-04-12 14:36:39

by Michael Kelley (LINUX)

[permalink] [raw]
Subject: RE: [RFC PATCH V4 07/17] drivers: hv: Decrypt percpu hvcall input arg page in sev-snp enlightened guest

From: Tianyu Lan <[email protected]> Sent: Monday, April 3, 2023 10:44 AM
>
> Hypervisor needs to access iput arg page and guest should decrypt
> the page.
>
> Signed-off-by: Tianyu Lan <[email protected]>
> ---
> Change since RFC V3:
> * Use pgcount to decrypt memory.
>
> Change since RFC V2:
> * Set inputarg to be zero after kfree()
> * Not free mem when fail to encrypt mem in the hv_common_cpu_die().
> ---
> drivers/hv/hv_common.c | 21 ++++++++++++++++++++-
> 1 file changed, 20 insertions(+), 1 deletion(-)
>
> diff --git a/drivers/hv/hv_common.c b/drivers/hv/hv_common.c
> index e3b04e978932..455432d49fd3 100644
> --- a/drivers/hv/hv_common.c
> +++ b/drivers/hv/hv_common.c
> @@ -21,6 +21,7 @@
> #include <linux/ptrace.h>
> #include <linux/slab.h>
> #include <linux/dma-map-ops.h>
> +#include <linux/set_memory.h>
> #include <asm/hyperv-tlfs.h>
> #include <asm/mshyperv.h>
>
> @@ -128,6 +129,7 @@ int hv_common_cpu_init(unsigned int cpu)
> u64 msr_vp_index;
> gfp_t flags;
> int pgcount = hv_root_partition ? 2 : 1;
> + int ret;
>
> /* hv_cpu_init() can be called with IRQs disabled from hv_resume() */
> flags = irqs_disabled() ? GFP_ATOMIC : GFP_KERNEL;
> @@ -137,6 +139,17 @@ int hv_common_cpu_init(unsigned int cpu)
> if (!(*inputarg))
> return -ENOMEM;
>
> + if (hv_isolation_type_en_snp()) {
> + ret = set_memory_decrypted((unsigned long)*inputarg, pgcount);
> + if (ret) {
> + kfree(*inputarg);
> + *inputarg = NULL;
> + return ret;
> + }
> +
> + memset(*inputarg, 0x00, PAGE_SIZE);

The memset() needs to use pgcount * PAGE_SIZE.

> + }
> +
> if (hv_root_partition) {
> outputarg = (void **)this_cpu_ptr(hyperv_pcpu_output_arg);
> *outputarg = (char *)(*inputarg) + HV_HYP_PAGE_SIZE;
> @@ -156,6 +169,7 @@ int hv_common_cpu_die(unsigned int cpu)
> {
> unsigned long flags;
> void **inputarg, **outputarg;
> + int pgcount = hv_root_partition ? 2 : 1;
> void *mem;
>
> local_irq_save(flags);
> @@ -171,7 +185,12 @@ int hv_common_cpu_die(unsigned int cpu)
>
> local_irq_restore(flags);
>
> - kfree(mem);
> + if (hv_isolation_type_en_snp()) {
> + if (!set_memory_encrypted((unsigned long)mem, pgcount))
> + kfree(mem);
> + } else {
> + kfree(mem);
> + }
>
> return 0;
> }
> --
> 2.25.1

2023-04-12 14:55:38

by Michael Kelley (LINUX)

[permalink] [raw]
Subject: RE: [RFC PATCH V4 08/17] x86/hyperv: Initialize cpu and memory for sev-snp enlightened guest

From: Tianyu Lan <[email protected]> Sent: Monday, April 3, 2023 10:44 AM
>
> Read processor amd memory info from specific address which are
> populated by Hyper-V. Initialize smp cpu related ops, pvalidate
> system memory and add it into e820 table.
>
> Signed-off-by: Tianyu Lan <[email protected]>
> ---
> arch/x86/hyperv/ivm.c | 78 +++++++++++++++++++++++++++++++++
> arch/x86/include/asm/mshyperv.h | 16 +++++++
> arch/x86/kernel/cpu/mshyperv.c | 3 ++
> 3 files changed, 97 insertions(+)
>
> diff --git a/arch/x86/hyperv/ivm.c b/arch/x86/hyperv/ivm.c
> index 368b2731950e..fa4de2761460 100644
> --- a/arch/x86/hyperv/ivm.c
> +++ b/arch/x86/hyperv/ivm.c
> @@ -17,6 +17,11 @@
> #include <asm/mem_encrypt.h>
> #include <asm/mshyperv.h>
> #include <asm/hypervisor.h>
> +#include <asm/coco.h>
> +#include <asm/io_apic.h>
> +#include <asm/sev.h>
> +#include <asm/realmode.h>
> +#include <asm/e820/api.h>
>
> #ifdef CONFIG_AMD_MEM_ENCRYPT
>
> @@ -57,6 +62,22 @@ union hv_ghcb {
>
> static u16 hv_ghcb_version __ro_after_init;
>
> +static u32 processor_count;
> +
> +static __init void hv_snp_get_smp_config(unsigned int early)
> +{
> + if (!early) {
> + while (num_processors < processor_count) {
> + early_per_cpu(x86_cpu_to_apicid, num_processors) = num_processors;
> + early_per_cpu(x86_bios_cpu_apicid, num_processors) = num_processors;
> + physid_set(num_processors, phys_cpu_present_map);
> + set_cpu_possible(num_processors, true);
> + set_cpu_present(num_processors, true);
> + num_processors++;
> + }
> + }
> +}
> +
> u64 hv_ghcb_hypercall(u64 control, void *input, void *output, u32 input_size)
> {
> union hv_ghcb *hv_ghcb;
> @@ -356,6 +377,63 @@ static bool hv_is_private_mmio(u64 addr)
> return false;
> }
>
> +__init void hv_sev_init_mem_and_cpu(void)
> +{
> + struct memory_map_entry *entry;
> + struct e820_entry *e820_entry;
> + u64 e820_end;
> + u64 ram_end;
> + u64 page;
> +
> + /*
> + * Hyper-V enlightened snp guest boots kernel
> + * directly without bootloader and so roms,
> + * bios regions and reserve resources are not
> + * available. Set these callback to NULL.
> + */
> + x86_platform.legacy.reserve_bios_regions = 0;
> + x86_init.resources.probe_roms = x86_init_noop;
> + x86_init.resources.reserve_resources = x86_init_noop;
> + x86_init.mpparse.find_smp_config = x86_init_noop;
> + x86_init.mpparse.get_smp_config = hv_snp_get_smp_config;
> +
> + /*
> + * Hyper-V SEV-SNP enlightened guest doesn't support ioapic
> + * and legacy APIC page read/write. Switch to hv apic here.
> + */
> + disable_ioapic_support();
> +
> + /* Read processor number and memory layout. */
> + processor_count = *(u32 *)__va(EN_SEV_SNP_PROCESSOR_INFO_ADDR);
> + entry = (struct memory_map_entry *)(__va(EN_SEV_SNP_PROCESSOR_INFO_ADDR)
> + + sizeof(struct memory_map_entry));

Why is the first map entry being skipped?

> +
> + /*
> + * E820 table in the memory just describes memory for
> + * kernel, ACPI table, cmdline, boot params and ramdisk.
> + * Hyper-V popoulates the rest memory layout in the EN_SEV_
> + * SNP_PROCESSOR_INFO_ADDR.
> + */
> + for (; entry->numpages != 0; entry++) {
> + e820_entry = &e820_table->entries[
> + e820_table->nr_entries - 1];
> + e820_end = e820_entry->addr + e820_entry->size;
> + ram_end = (entry->starting_gpn +
> + entry->numpages) * PAGE_SIZE;
> +
> + if (e820_end < entry->starting_gpn * PAGE_SIZE)
> + e820_end = entry->starting_gpn * PAGE_SIZE;
> +
> + if (e820_end < ram_end) {
> + pr_info("Hyper-V: add e820 entry [mem %#018Lx-%#018Lx]\n", e820_end, ram_end - 1);
> + e820__range_add(e820_end, ram_end - e820_end,
> + E820_TYPE_RAM);
> + for (page = e820_end; page < ram_end; page += PAGE_SIZE)
> + pvalidate((unsigned long)__va(page), RMP_PG_SIZE_4K, true);
> + }
> + }
> +}
> +
> void __init hv_vtom_init(void)
> {
> /*
> diff --git a/arch/x86/include/asm/mshyperv.h b/arch/x86/include/asm/mshyperv.h
> index 3c15e23162e7..a4a59007b5f2 100644
> --- a/arch/x86/include/asm/mshyperv.h
> +++ b/arch/x86/include/asm/mshyperv.h
> @@ -41,6 +41,20 @@ extern bool hv_isolation_type_en_snp(void);
>
> extern union hv_ghcb * __percpu *hv_ghcb_pg;
>
> +/*
> + * Hyper-V puts processor and memory layout info
> + * to this address in SEV-SNP enlightened guest.
> + */
> +#define EN_SEV_SNP_PROCESSOR_INFO_ADDR 0x802000
> +
> +struct memory_map_entry {
> + u64 starting_gpn;
> + u64 numpages;
> + u16 type;
> + u16 flags;
> + u32 reserved;
> +};
> +
> int hv_call_deposit_pages(int node, u64 partition_id, u32 num_pages);
> int hv_call_add_logical_proc(int node, u32 lp_index, u32 acpi_id);
> int hv_call_create_vp(int node, u64 partition_id, u32 vp_index, u32 flags);
> @@ -246,12 +260,14 @@ void hv_ghcb_msr_read(u64 msr, u64 *value);
> bool hv_ghcb_negotiate_protocol(void);
> void hv_ghcb_terminate(unsigned int set, unsigned int reason);
> void hv_vtom_init(void);
> +void hv_sev_init_mem_and_cpu(void);
> #else
> static inline void hv_ghcb_msr_write(u64 msr, u64 value) {}
> static inline void hv_ghcb_msr_read(u64 msr, u64 *value) {}
> static inline bool hv_ghcb_negotiate_protocol(void) { return false; }
> static inline void hv_ghcb_terminate(unsigned int set, unsigned int reason) {}
> static inline void hv_vtom_init(void) {}
> +static inline void hv_sev_init_mem_and_cpu(void) {}
> #endif
>
> extern bool hv_isolation_type_snp(void);
> diff --git a/arch/x86/kernel/cpu/mshyperv.c b/arch/x86/kernel/cpu/mshyperv.c
> index 2f2dcb2370b6..71820bbf9e90 100644
> --- a/arch/x86/kernel/cpu/mshyperv.c
> +++ b/arch/x86/kernel/cpu/mshyperv.c
> @@ -529,6 +529,9 @@ static void __init ms_hyperv_init_platform(void)
> if (!(ms_hyperv.features & HV_ACCESS_TSC_INVARIANT))
> mark_tsc_unstable("running on Hyper-V");
>
> + if (hv_isolation_type_en_snp())
> + hv_sev_init_mem_and_cpu();

This looks good now. Moving the code into a separate function in
ivm.c works well.

> +
> hardlockup_detector_disable();
> }
>
> --
> 2.25.1

2023-04-12 15:01:15

by Michael Kelley (LINUX)

[permalink] [raw]
Subject: RE: [RFC PATCH V4 10/17] x86/hyperv: Add smp support for sev-snp guest

From: Tianyu Lan <[email protected]> Sent: Monday, April 3, 2023 10:44 AM
>
> The wakeup_secondary_cpu callback was populated with wakeup_
> cpu_via_vmgexit() which doesn't work for Hyper-V and Hyper-V
> requires to call Hyper-V specific hvcall to start APs. So override
> it with Hyper-V specific hook to start AP sev_es_save_area data
> structure.
>
> Signed-off-by: Tianyu Lan <[email protected]>
> ---
> Change sicne RFC v3:
> * Replace struct sev_es_save_area with struct
> vmcb_save_area
> * Move code from mshyperv.c to ivm.c
>
> Change since RFC v2:
> * Add helper function to initialize segment
> * Fix some coding style
> ---
> arch/x86/hyperv/ivm.c | 89 +++++++++++++++++++++++++++++++
> arch/x86/include/asm/mshyperv.h | 18 +++++++
> arch/x86/include/asm/sev.h | 13 +++++
> arch/x86/include/asm/svm.h | 15 +++++-
> arch/x86/kernel/cpu/mshyperv.c | 13 ++++-
> arch/x86/kernel/sev.c | 4 +-
> include/asm-generic/hyperv-tlfs.h | 19 +++++++
> 7 files changed, 166 insertions(+), 5 deletions(-)
>
> diff --git a/arch/x86/hyperv/ivm.c b/arch/x86/hyperv/ivm.c
> index c0f3fa924163..51243148b8e6 100644
> --- a/arch/x86/hyperv/ivm.c
> +++ b/arch/x86/hyperv/ivm.c
> @@ -22,11 +22,15 @@
> #include <asm/sev.h>
> #include <asm/realmode.h>
> #include <asm/e820/api.h>
> +#include <asm/desc.h>
>
> #ifdef CONFIG_AMD_MEM_ENCRYPT
>
> #define GHCB_USAGE_HYPERV_CALL 1
>
> +static u8 ap_start_input_arg[PAGE_SIZE] __bss_decrypted __aligned(PAGE_SIZE);
> +static u8 ap_start_stack[PAGE_SIZE] __aligned(PAGE_SIZE);

Just a question: ap_start_stack is a static variable that gets used as the
starting stack for every AP. So obviously, once each AP is started, we must
be sure that the AP moves off the ap_start_stack before the next AP is
started. How is that synchronization done? I see that do_boot_cpu() is
where the wakeup_secondary_cpu() function is called. Then there's
some waiting until the AP completes "initial initialization" per the
comment in the code. Is there where we know that the AP is no
longer using ap_start_stack?

> +
> union hv_ghcb {
> struct ghcb ghcb;
> struct {
> @@ -437,6 +441,91 @@ __init void hv_sev_init_mem_and_cpu(void)
> }
> }
>
> +#define hv_populate_vmcb_seg(seg, gdtr_base) \
> +do { \
> + if (seg.selector) { \
> + seg.base = 0; \
> + seg.limit = HV_AP_SEGMENT_LIMIT; \
> + seg.attrib = *(u16 *)(gdtr_base + seg.selector + 5); \
> + seg.attrib = (seg.attrib & 0xFF) | ((seg.attrib >> 4) & 0xF00); \
> + } \
> +} while (0) \
> +
> +int hv_snp_boot_ap(int cpu, unsigned long start_ip)
> +{
> + struct sev_es_save_area *vmsa = (struct sev_es_save_area *)
> + __get_free_page(GFP_KERNEL | __GFP_ZERO);
> + struct desc_ptr gdtr;
> + u64 ret, retry = 5;
> + struct hv_start_virtual_processor_input *start_vp_input;
> + union sev_rmp_adjust rmp_adjust;
> + unsigned long flags;
> +
> + native_store_gdt(&gdtr);
> +
> + vmsa->gdtr.base = gdtr.address;
> + vmsa->gdtr.limit = gdtr.size;
> +
> + asm volatile("movl %%es, %%eax;" : "=a" (vmsa->es.selector));
> + hv_populate_vmcb_seg(vmsa->es, vmsa->gdtr.base);
> +
> + asm volatile("movl %%cs, %%eax;" : "=a" (vmsa->cs.selector));
> + hv_populate_vmcb_seg(vmsa->cs, vmsa->gdtr.base);
> +
> + asm volatile("movl %%ss, %%eax;" : "=a" (vmsa->ss.selector));
> + hv_populate_vmcb_seg(vmsa->ss, vmsa->gdtr.base);
> +
> + asm volatile("movl %%ds, %%eax;" : "=a" (vmsa->ds.selector));
> + hv_populate_vmcb_seg(vmsa->ds, vmsa->gdtr.base);
> +
> + vmsa->efer = native_read_msr(MSR_EFER);
> +
> + asm volatile("movq %%cr4, %%rax;" : "=a" (vmsa->cr4));
> + asm volatile("movq %%cr3, %%rax;" : "=a" (vmsa->cr3));
> + asm volatile("movq %%cr0, %%rax;" : "=a" (vmsa->cr0));
> +
> + vmsa->xcr0 = 1;
> + vmsa->g_pat = HV_AP_INIT_GPAT_DEFAULT;
> + vmsa->rip = (u64)secondary_startup_64_no_verify;
> + vmsa->rsp = (u64)&ap_start_stack[PAGE_SIZE];
> +
> + vmsa->sev_features.snp = 1;
> + vmsa->sev_features.restrict_injection = 1;
> +
> + rmp_adjust.as_uint64 = 0;
> + rmp_adjust.target_vmpl = 1;
> + rmp_adjust.vmsa = 1;
> + ret = rmpadjust((unsigned long)vmsa, RMP_PG_SIZE_4K,
> + rmp_adjust.as_uint64);
> + if (ret != 0) {
> + pr_err("RMPADJUST(%llx) failed: %llx\n", (u64)vmsa, ret);
> + return ret;
> + }
> +
> + local_irq_save(flags);
> + start_vp_input =
> + (struct hv_start_virtual_processor_input *)ap_start_input_arg;
> + memset(start_vp_input, 0, sizeof(*start_vp_input));
> + start_vp_input->partitionid = -1;
> + start_vp_input->vpindex = cpu;
> + start_vp_input->targetvtl = ms_hyperv.vtl;
> + *(u64 *)&start_vp_input->context[0] = __pa(vmsa) | 1;
> +
> + do {
> + ret = hv_do_hypercall(HVCALL_START_VIRTUAL_PROCESSOR,
> + start_vp_input, NULL);
> + } while (hv_result(ret) == HV_STATUS_TIME_OUT && retry--);
> +
> + if (!hv_result_success(ret)) {
> + pr_err("HvCallStartVirtualProcessor failed: %llx\n", ret);
> + goto done;
> + }
> +
> +done:
> + local_irq_restore(flags);
> + return ret;
> +}
> +
> void __init hv_vtom_init(void)
> {
> /*
> diff --git a/arch/x86/include/asm/mshyperv.h b/arch/x86/include/asm/mshyperv.h
> index a4a59007b5f2..e23c987deb7a 100644
> --- a/arch/x86/include/asm/mshyperv.h
> +++ b/arch/x86/include/asm/mshyperv.h
> @@ -55,6 +55,20 @@ struct memory_map_entry {
> u32 reserved;
> };
>
> +/*
> + * DEFAULT INIT GPAT and SEGMENT LIMIT value in struct VMSA
> + * to start AP in enlightened SEV guest.
> + */
> +#define HV_AP_INIT_GPAT_DEFAULT 0x0007040600070406ULL
> +#define HV_AP_SEGMENT_LIMIT 0xffffffff
> +
> +/*
> + * DEFAULT INIT GPAT and SEGMENT LIMIT value in struct VMSA
> + * to start AP in enlightened SEV guest.
> + */
> +#define HV_AP_INIT_GPAT_DEFAULT 0x0007040600070406ULL
> +#define HV_AP_SEGMENT_LIMIT 0xffffffff

This code looks like it is erroneously added twice.

> +
> int hv_call_deposit_pages(int node, u64 partition_id, u32 num_pages);
> int hv_call_add_logical_proc(int node, u32 lp_index, u32 acpi_id);
> int hv_call_create_vp(int node, u64 partition_id, u32 vp_index, u32 flags);
> @@ -253,6 +267,8 @@ struct irq_domain *hv_create_pci_msi_domain(void);
> int hv_map_ioapic_interrupt(int ioapic_id, bool level, int vcpu, int vector,
> struct hv_interrupt_entry *entry);
> int hv_unmap_ioapic_interrupt(int ioapic_id, struct hv_interrupt_entry *entry);
> +int hv_set_mem_host_visibility(unsigned long addr, int numpages, bool visible);
> +int hv_snp_boot_ap(int cpu, unsigned long start_ip);
>
> #ifdef CONFIG_AMD_MEM_ENCRYPT
> void hv_ghcb_msr_write(u64 msr, u64 value);
> @@ -261,6 +277,7 @@ bool hv_ghcb_negotiate_protocol(void);
> void hv_ghcb_terminate(unsigned int set, unsigned int reason);
> void hv_vtom_init(void);
> void hv_sev_init_mem_and_cpu(void);
> +int hv_snp_boot_ap(int cpu, unsigned long start_ip);
> #else
> static inline void hv_ghcb_msr_write(u64 msr, u64 value) {}
> static inline void hv_ghcb_msr_read(u64 msr, u64 *value) {}
> @@ -268,6 +285,7 @@ static inline bool hv_ghcb_negotiate_protocol(void) { return
> false; }
> static inline void hv_ghcb_terminate(unsigned int set, unsigned int reason) {}
> static inline void hv_vtom_init(void) {}
> static inline void hv_sev_init_mem_and_cpu(void) {}
> +static int hv_snp_boot_ap(int cpu, unsigned long start_ip) {}
> #endif
>
> extern bool hv_isolation_type_snp(void);
> diff --git a/arch/x86/include/asm/sev.h b/arch/x86/include/asm/sev.h
> index ebc271bb6d8e..e34aaf730220 100644
> --- a/arch/x86/include/asm/sev.h
> +++ b/arch/x86/include/asm/sev.h
> @@ -86,6 +86,19 @@ extern bool handle_vc_boot_ghcb(struct pt_regs *regs);
>
> #define RMPADJUST_VMSA_PAGE_BIT BIT(16)
>
> +union sev_rmp_adjust {
> + u64 as_uint64;
> + struct {
> + unsigned long target_vmpl : 8;
> + unsigned long enable_read : 1;
> + unsigned long enable_write : 1;
> + unsigned long enable_user_execute : 1;
> + unsigned long enable_kernel_execute : 1;
> + unsigned long reserved1 : 4;
> + unsigned long vmsa : 1;
> + };
> +};
> +
> /* SNP Guest message request */
> struct snp_req_data {
> unsigned long req_gpa;
> diff --git a/arch/x86/include/asm/svm.h b/arch/x86/include/asm/svm.h
> index cb1ee53ad3b1..bcf970901512 100644
> --- a/arch/x86/include/asm/svm.h
> +++ b/arch/x86/include/asm/svm.h
> @@ -423,7 +423,20 @@ struct sev_es_save_area {
> u64 guest_exit_info_2;
> u64 guest_exit_int_info;
> u64 guest_nrip;
> - u64 sev_features;
> + union {
> + struct {
> + u64 snp : 1;
> + u64 vtom : 1;
> + u64 reflectvc : 1;
> + u64 restrict_injection : 1;
> + u64 alternate_injection : 1;
> + u64 full_debug : 1;
> + u64 reserved1 : 1;
> + u64 snpbtb_isolation : 1;
> + u64 resrved2 : 56;
> + };
> + u64 val;
> + } sev_features;
> u64 vintr_ctrl;
> u64 guest_exit_code;
> u64 virtual_tom;
> diff --git a/arch/x86/kernel/cpu/mshyperv.c b/arch/x86/kernel/cpu/mshyperv.c
> index 71820bbf9e90..829234ba8da5 100644
> --- a/arch/x86/kernel/cpu/mshyperv.c
> +++ b/arch/x86/kernel/cpu/mshyperv.c
> @@ -300,6 +300,16 @@ static void __init hv_smp_prepare_cpus(unsigned int max_cpus)
>
> native_smp_prepare_cpus(max_cpus);
>
> + /*
> + * Override wakeup_secondary_cpu callback for SEV-SNP
> + * enlightened guest.
> + */
> + if (hv_isolation_type_en_snp())
> + apic->wakeup_secondary_cpu = hv_snp_boot_ap;

In another patch set for starting CPUs at VTL2, there was some discussion
about using wakeup_secondary_cpu vs. wakeup_secondary_cpu_64. The
decision was to use wakeup_secondary_cpu_64. See
https://lore.kernel.org/linux-hyperv/[email protected]/T/#u

I'm not sure of all the tradeoffs, but I wonder if wakeup_secondary_cpu_64
should be used here.

> +
> + if (!hv_root_partition)
> + return;
> +
> #ifdef CONFIG_X86_64
> for_each_present_cpu(i) {
> if (i == 0)
> @@ -503,8 +513,7 @@ static void __init ms_hyperv_init_platform(void)
>
> # ifdef CONFIG_SMP
> smp_ops.smp_prepare_boot_cpu = hv_smp_prepare_boot_cpu;
> - if (hv_root_partition)
> - smp_ops.smp_prepare_cpus = hv_smp_prepare_cpus;
> + smp_ops.smp_prepare_cpus = hv_smp_prepare_cpus;
> # endif
>
> /*
> diff --git a/arch/x86/kernel/sev.c b/arch/x86/kernel/sev.c
> index 679026a640ef..b3f95fcb8b18 100644
> --- a/arch/x86/kernel/sev.c
> +++ b/arch/x86/kernel/sev.c
> @@ -1080,7 +1080,7 @@ static int wakeup_cpu_via_vmgexit(int apic_id, unsigned long start_ip)
> * SEV_FEATURES (matches the SEV STATUS MSR right shifted 2 bits)
> */
> vmsa->vmpl = 0;
> - vmsa->sev_features = sev_status >> 2;
> + vmsa->sev_features.val = sev_status >> 2;
>
> /* Switch the page over to a VMSA page now that it is initialized */
> ret = snp_set_vmsa(vmsa, true);
> @@ -1097,7 +1097,7 @@ static int wakeup_cpu_via_vmgexit(int apic_id, unsigned long start_ip)
> ghcb = __sev_get_ghcb(&state);
>
> vc_ghcb_invalidate(ghcb);
> - ghcb_set_rax(ghcb, vmsa->sev_features);
> + ghcb_set_rax(ghcb, vmsa->sev_features.val);
> ghcb_set_sw_exit_code(ghcb, SVM_VMGEXIT_AP_CREATION);
> ghcb_set_sw_exit_info_1(ghcb, ((u64)apic_id << 32) | SVM_VMGEXIT_AP_CREATE);
> ghcb_set_sw_exit_info_2(ghcb, __pa(vmsa));
> diff --git a/include/asm-generic/hyperv-tlfs.h b/include/asm-generic/hyperv-tlfs.h
> index ea406e901469..b2f14aa608f7 100644
> --- a/include/asm-generic/hyperv-tlfs.h
> +++ b/include/asm-generic/hyperv-tlfs.h
> @@ -148,6 +148,7 @@ union hv_reference_tsc_msr {
> #define HVCALL_FLUSH_VIRTUAL_ADDRESS_LIST 0x0003
> #define HVCALL_NOTIFY_LONG_SPIN_WAIT 0x0008
> #define HVCALL_SEND_IPI 0x000b
> +#define HVCALL_ENABLE_VP_VTL 0x000f
> #define HVCALL_FLUSH_VIRTUAL_ADDRESS_SPACE_EX 0x0013
> #define HVCALL_FLUSH_VIRTUAL_ADDRESS_LIST_EX 0x0014
> #define HVCALL_SEND_IPI_EX 0x0015
> @@ -165,6 +166,7 @@ union hv_reference_tsc_msr {
> #define HVCALL_MAP_DEVICE_INTERRUPT 0x007c
> #define HVCALL_UNMAP_DEVICE_INTERRUPT 0x007d
> #define HVCALL_RETARGET_INTERRUPT 0x007e
> +#define HVCALL_START_VIRTUAL_PROCESSOR 0x0099
> #define HVCALL_FLUSH_GUEST_PHYSICAL_ADDRESS_SPACE 0x00af
> #define HVCALL_FLUSH_GUEST_PHYSICAL_ADDRESS_LIST 0x00b0
> #define HVCALL_MODIFY_SPARSE_GPA_PAGE_HOST_VISIBILITY 0x00db
> @@ -220,6 +222,7 @@ enum HV_GENERIC_SET_FORMAT {
> #define HV_STATUS_INVALID_PORT_ID 17
> #define HV_STATUS_INVALID_CONNECTION_ID 18
> #define HV_STATUS_INSUFFICIENT_BUFFERS 19
> +#define HV_STATUS_TIME_OUT 0x78
>
> /*
> * The Hyper-V TimeRefCount register and the TSC
> @@ -779,6 +782,22 @@ struct hv_input_unmap_device_interrupt {
> struct hv_interrupt_entry interrupt_entry;
> } __packed;
>
> +struct hv_enable_vp_vtl_input {
> + u64 partitionid;
> + u32 vpindex;
> + u8 targetvtl;
> + u8 padding[3];
> + u8 context[0xe0];
> +} __packed;
> +
> +struct hv_start_virtual_processor_input {
> + u64 partitionid;
> + u32 vpindex;
> + u8 targetvtl;
> + u8 padding[3];
> + u8 context[0xe0];
> +} __packed;
> +
> #define HV_SOURCE_SHADOW_NONE 0x0
> #define HV_SOURCE_SHADOW_BRIDGE_BUS_RANGE 0x1
>
> --
> 2.25.1

2023-04-12 15:04:12

by Michael Kelley (LINUX)

[permalink] [raw]
Subject: RE: [RFC PATCH V4 14/17] x86/hyperv/sev: Add AMD sev-snp enlightened guest support on hyperv

From: Tianyu Lan <[email protected]> Sent: Monday, April 3, 2023 10:44 AM
>

The patch subject prefix of "x86/hyperv/sev:" doesn't make sense.
There's no pathname like that in the kernel code. I think it should just be
"x86/sev:".

> Enable #HV exception to handle interrupt requests from hypervisor.
>
> Co-developed-by: Lendacky Thomas <[email protected]>
> Co-developed-by: Kalra Ashish <[email protected]>
> Signed-off-by: Tianyu Lan <[email protected]>
> ---
> Change since RFC V3:
> * Check NMI event when irq is disabled.
> * Remove redundant variable
> ---
> arch/x86/include/asm/mem_encrypt.h | 2 +
> arch/x86/include/uapi/asm/svm.h | 4 +
> arch/x86/kernel/sev.c | 314 ++++++++++++++++++++++++-----
> arch/x86/kernel/traps.c | 2 +
> 4 files changed, 266 insertions(+), 56 deletions(-)
>
> diff --git a/arch/x86/include/asm/mem_encrypt.h
> b/arch/x86/include/asm/mem_encrypt.h
> index b7126701574c..9299caeca69f 100644
> --- a/arch/x86/include/asm/mem_encrypt.h
> +++ b/arch/x86/include/asm/mem_encrypt.h
> @@ -50,6 +50,7 @@ void __init early_set_mem_enc_dec_hypercall(unsigned long
> vaddr, int npages,
> void __init mem_encrypt_free_decrypted_mem(void);
>
> void __init sev_es_init_vc_handling(void);
> +void __init sev_snp_init_hv_handling(void);
>
> #define __bss_decrypted __section(".bss..decrypted")
>
> @@ -73,6 +74,7 @@ static inline void __init sme_encrypt_kernel(struct boot_params
> *bp) { }
> static inline void __init sme_enable(struct boot_params *bp) { }
>
> static inline void sev_es_init_vc_handling(void) { }
> +static inline void sev_snp_init_hv_handling(void) { }
>
> static inline int __init
> early_set_memory_decrypted(unsigned long vaddr, unsigned long size) { return 0; }
> diff --git a/arch/x86/include/uapi/asm/svm.h b/arch/x86/include/uapi/asm/svm.h
> index 80e1df482337..828d624a38cf 100644
> --- a/arch/x86/include/uapi/asm/svm.h
> +++ b/arch/x86/include/uapi/asm/svm.h
> @@ -115,6 +115,10 @@
> #define SVM_VMGEXIT_AP_CREATE_ON_INIT 0
> #define SVM_VMGEXIT_AP_CREATE 1
> #define SVM_VMGEXIT_AP_DESTROY 2
> +#define SVM_VMGEXIT_HV_DOORBELL_PAGE 0x80000014
> +#define SVM_VMGEXIT_GET_PREFERRED_HV_DOORBELL_PAGE 0
> +#define SVM_VMGEXIT_SET_HV_DOORBELL_PAGE 1
> +#define SVM_VMGEXIT_QUERY_HV_DOORBELL_PAGE 2
> #define SVM_VMGEXIT_HV_FEATURES 0x8000fffd
> #define SVM_VMGEXIT_TERM_REQUEST 0x8000fffe
> #define SVM_VMGEXIT_TERM_REASON(reason_set, reason_code) \
> diff --git a/arch/x86/kernel/sev.c b/arch/x86/kernel/sev.c
> index 6445f5356c45..7fcb3b548215 100644
> --- a/arch/x86/kernel/sev.c
> +++ b/arch/x86/kernel/sev.c
> @@ -122,6 +122,152 @@ struct sev_config {
>
> static struct sev_config sev_cfg __read_mostly;
>
> +static noinstr struct ghcb *__sev_get_ghcb(struct ghcb_state *state);
> +static noinstr void __sev_put_ghcb(struct ghcb_state *state);
> +static int vmgexit_hv_doorbell_page(struct ghcb *ghcb, u64 op, u64 pa);
> +static void sev_snp_setup_hv_doorbell_page(struct ghcb *ghcb);
> +
> +union hv_pending_events {
> + u16 events;
> + struct {
> + u8 vector;
> + u8 nmi : 1;
> + u8 mc : 1;
> + u8 reserved1 : 5;
> + u8 no_further_signal : 1;
> + };
> +};
> +
> +struct sev_hv_doorbell_page {
> + union hv_pending_events pending_events;
> + u8 no_eoi_required;
> + u8 reserved2[61];
> + u8 padding[4032];
> +};
> +
> +struct sev_snp_runtime_data {
> + struct sev_hv_doorbell_page hv_doorbell_page;
> +};
> +
> +static DEFINE_PER_CPU(struct sev_snp_runtime_data*, snp_runtime_data);
> +
> +static inline u64 sev_es_rd_ghcb_msr(void)
> +{
> + return __rdmsr(MSR_AMD64_SEV_ES_GHCB);
> +}
> +
> +static __always_inline void sev_es_wr_ghcb_msr(u64 val)
> +{
> + u32 low, high;
> +
> + low = (u32)(val);
> + high = (u32)(val >> 32);
> +
> + native_wrmsr(MSR_AMD64_SEV_ES_GHCB, low, high);
> +}
> +
> +struct sev_hv_doorbell_page *sev_snp_current_doorbell_page(void)
> +{
> + return &this_cpu_read(snp_runtime_data)->hv_doorbell_page;
> +}
> +
> +static u8 sev_hv_pending(void)
> +{
> + return sev_snp_current_doorbell_page()->pending_events.events;
> +}
> +
> +#define sev_hv_pending_nmi \
> + sev_snp_current_doorbell_page()->pending_events.nmi
> +
> +static void hv_doorbell_apic_eoi_write(u32 reg, u32 val)
> +{
> + if (xchg(&sev_snp_current_doorbell_page()->no_eoi_required, 0) & 0x1)
> + return;
> +
> + BUG_ON(reg != APIC_EOI);
> + apic->write(reg, val);
> +}
> +
> +static void do_exc_hv(struct pt_regs *regs)
> +{
> + union hv_pending_events pending_events;
> +
> + while (sev_hv_pending()) {
> + pending_events.events = xchg(
> + &sev_snp_current_doorbell_page()->pending_events.events,
> + 0);
> +
> + if (pending_events.nmi)
> + exc_nmi(regs);
> +
> +#ifdef CONFIG_X86_MCE
> + if (pending_events.mc)
> + exc_machine_check(regs);
> +#endif
> +
> + if (!pending_events.vector)
> + return;
> +
> + if (pending_events.vector < FIRST_EXTERNAL_VECTOR) {
> + /* Exception vectors */
> + WARN(1, "exception shouldn't happen\n");
> + } else if (pending_events.vector == FIRST_EXTERNAL_VECTOR) {
> + sysvec_irq_move_cleanup(regs);
> + } else if (pending_events.vector == IA32_SYSCALL_VECTOR) {
> + WARN(1, "syscall shouldn't happen\n");
> + } else if (pending_events.vector >= FIRST_SYSTEM_VECTOR) {
> + switch (pending_events.vector) {
> +#if IS_ENABLED(CONFIG_HYPERV)
> + case HYPERV_STIMER0_VECTOR:
> + sysvec_hyperv_stimer0(regs);
> + break;
> + case HYPERVISOR_CALLBACK_VECTOR:
> + sysvec_hyperv_callback(regs);
> + break;
> +#endif
> +#ifdef CONFIG_SMP
> + case RESCHEDULE_VECTOR:
> + sysvec_reschedule_ipi(regs);
> + break;
> + case IRQ_MOVE_CLEANUP_VECTOR:
> + sysvec_irq_move_cleanup(regs);
> + break;
> + case REBOOT_VECTOR:
> + sysvec_reboot(regs);
> + break;
> + case CALL_FUNCTION_SINGLE_VECTOR:
> + sysvec_call_function_single(regs);
> + break;
> + case CALL_FUNCTION_VECTOR:
> + sysvec_call_function(regs);
> + break;
> +#endif
> +#ifdef CONFIG_X86_LOCAL_APIC
> + case ERROR_APIC_VECTOR:
> + sysvec_error_interrupt(regs);
> + break;
> + case SPURIOUS_APIC_VECTOR:
> + sysvec_spurious_apic_interrupt(regs);
> + break;
> + case LOCAL_TIMER_VECTOR:
> + sysvec_apic_timer_interrupt(regs);
> + break;
> + case X86_PLATFORM_IPI_VECTOR:
> + sysvec_x86_platform_ipi(regs);
> + break;
> +#endif
> + case 0x0:
> + break;
> + default:
> + panic("Unexpected vector %d\n", vector);
> + unreachable();
> + }
> + } else {
> + common_interrupt(regs, pending_events.vector);
> + }
> + }
> +}
> +
> static __always_inline bool on_vc_stack(struct pt_regs *regs)
> {
> unsigned long sp = regs->sp;
> @@ -179,18 +325,19 @@ void noinstr __sev_es_ist_enter(struct pt_regs *regs)
> this_cpu_write(cpu_tss_rw.x86_tss.ist[IST_INDEX_VC], new_ist);
> }
>
> -static void do_exc_hv(struct pt_regs *regs)
> -{
> - /* Handle #HV exception. */
> -}
> -
> void check_hv_pending(struct pt_regs *regs)
> {
> if (!cc_platform_has(CC_ATTR_GUEST_SEV_SNP))
> return;
>
> - if ((regs->flags & X86_EFLAGS_IF) == 0)
> + /* Handle NMI when irq is disabled. */
> + if ((regs->flags & X86_EFLAGS_IF) == 0) {
> + if (sev_hv_pending_nmi) {
> + exc_nmi(regs);
> + sev_hv_pending_nmi = 0;
> + }
> return;
> + }
>
> do_exc_hv(regs);
> }
> @@ -231,68 +378,35 @@ void noinstr __sev_es_ist_exit(void)
> this_cpu_write(cpu_tss_rw.x86_tss.ist[IST_INDEX_VC], *(unsigned long *)ist);
> }
>
> -/*
> - * Nothing shall interrupt this code path while holding the per-CPU
> - * GHCB. The backup GHCB is only for NMIs interrupting this path.
> - *
> - * Callers must disable local interrupts around it.
> - */
> -static noinstr struct ghcb *__sev_get_ghcb(struct ghcb_state *state)
> +static bool sev_restricted_injection_enabled(void)
> +{
> + return sev_status & MSR_AMD64_SNP_RESTRICTED_INJ;
> +}
> +
> +void __init sev_snp_init_hv_handling(void)
> {
> struct sev_es_runtime_data *data;
> + struct ghcb_state state;
> struct ghcb *ghcb;
> + unsigned long flags;
>
> WARN_ON(!irqs_disabled());
> + if (!cc_platform_has(CC_ATTR_GUEST_SEV_SNP) ||
> !sev_restricted_injection_enabled())
> + return;
>
> data = this_cpu_read(runtime_data);
> - ghcb = &data->ghcb_page;
> -
> - if (unlikely(data->ghcb_active)) {
> - /* GHCB is already in use - save its contents */
> -
> - if (unlikely(data->backup_ghcb_active)) {
> - /*
> - * Backup-GHCB is also already in use. There is no way
> - * to continue here so just kill the machine. To make
> - * panic() work, mark GHCBs inactive so that messages
> - * can be printed out.
> - */
> - data->ghcb_active = false;
> - data->backup_ghcb_active = false;
> -
> - instrumentation_begin();
> - panic("Unable to handle #VC exception! GHCB and Backup
> GHCB are already in use");
> - instrumentation_end();
> - }
> -
> - /* Mark backup_ghcb active before writing to it */
> - data->backup_ghcb_active = true;
>
> - state->ghcb = &data->backup_ghcb;
> + local_irq_save(flags);
>
> - /* Backup GHCB content */
> - *state->ghcb = *ghcb;
> - } else {
> - state->ghcb = NULL;
> - data->ghcb_active = true;
> - }
> + ghcb = __sev_get_ghcb(&state);
>
> - return ghcb;
> -}
> + sev_snp_setup_hv_doorbell_page(ghcb);
>
> -static inline u64 sev_es_rd_ghcb_msr(void)
> -{
> - return __rdmsr(MSR_AMD64_SEV_ES_GHCB);
> -}
> -
> -static __always_inline void sev_es_wr_ghcb_msr(u64 val)
> -{
> - u32 low, high;
> + __sev_put_ghcb(&state);
>
> - low = (u32)(val);
> - high = (u32)(val >> 32);
> + apic_set_eoi_write(hv_doorbell_apic_eoi_write);
>
> - native_wrmsr(MSR_AMD64_SEV_ES_GHCB, low, high);
> + local_irq_restore(flags);
> }
>
> static int vc_fetch_insn_kernel(struct es_em_ctxt *ctxt,
> @@ -553,6 +667,69 @@ static enum es_result vc_slow_virt_to_phys(struct ghcb
> *ghcb, struct es_em_ctxt
> /* Include code shared with pre-decompression boot stage */
> #include "sev-shared.c"
>
> +/*
> + * Nothing shall interrupt this code path while holding the per-CPU
> + * GHCB. The backup GHCB is only for NMIs interrupting this path.
> + *
> + * Callers must disable local interrupts around it.
> + */
> +static noinstr struct ghcb *__sev_get_ghcb(struct ghcb_state *state)
> +{
> + struct sev_es_runtime_data *data;
> + struct ghcb *ghcb;
> +
> + WARN_ON(!irqs_disabled());
> +
> + data = this_cpu_read(runtime_data);
> + ghcb = &data->ghcb_page;
> +
> + if (unlikely(data->ghcb_active)) {
> + /* GHCB is already in use - save its contents */
> +
> + if (unlikely(data->backup_ghcb_active)) {
> + /*
> + * Backup-GHCB is also already in use. There is no way
> + * to continue here so just kill the machine. To make
> + * panic() work, mark GHCBs inactive so that messages
> + * can be printed out.
> + */
> + data->ghcb_active = false;
> + data->backup_ghcb_active = false;
> +
> + instrumentation_begin();
> + panic("Unable to handle #VC exception! GHCB and Backup
> GHCB are already in use");
> + instrumentation_end();
> + }
> +
> + /* Mark backup_ghcb active before writing to it */
> + data->backup_ghcb_active = true;
> +
> + state->ghcb = &data->backup_ghcb;
> +
> + /* Backup GHCB content */
> + *state->ghcb = *ghcb;
> + } else {
> + state->ghcb = NULL;
> + data->ghcb_active = true;
> + }
> +
> + return ghcb;
> +}
> +
> +static void sev_snp_setup_hv_doorbell_page(struct ghcb *ghcb)
> +{
> + u64 pa;
> + enum es_result ret;
> +
> + pa = __pa(sev_snp_current_doorbell_page());
> + vc_ghcb_invalidate(ghcb);
> + ret = vmgexit_hv_doorbell_page(ghcb,
> + SVM_VMGEXIT_SET_HV_DOORBELL_PAGE,
> + pa);
> + if (ret != ES_OK)
> + panic("SEV-SNP: failed to set up #HV doorbell page");
> +}
> +
> static noinstr void __sev_put_ghcb(struct ghcb_state *state)
> {
> struct sev_es_runtime_data *data;
> @@ -1281,6 +1458,7 @@ static void snp_register_per_cpu_ghcb(void)
> ghcb = &data->ghcb_page;
>
> snp_register_ghcb_early(__pa(ghcb));
> + sev_snp_setup_hv_doorbell_page(ghcb);
> }
>
> void setup_ghcb(void)
> @@ -1320,6 +1498,11 @@ void setup_ghcb(void)
> snp_register_ghcb_early(__pa(&boot_ghcb_page));
> }
>
> +int vmgexit_hv_doorbell_page(struct ghcb *ghcb, u64 op, u64 pa)
> +{
> + return sev_es_ghcb_hv_call(ghcb, NULL, SVM_VMGEXIT_HV_DOORBELL_PAGE,
> op, pa);
> +}
> +
> #ifdef CONFIG_HOTPLUG_CPU
> static void sev_es_ap_hlt_loop(void)
> {
> @@ -1393,6 +1576,7 @@ static void __init alloc_runtime_data(int cpu)
> static void __init init_ghcb(int cpu)
> {
> struct sev_es_runtime_data *data;
> + struct sev_snp_runtime_data *snp_data;
> int err;
>
> data = per_cpu(runtime_data, cpu);
> @@ -1404,6 +1588,19 @@ static void __init init_ghcb(int cpu)
>
> memset(&data->ghcb_page, 0, sizeof(data->ghcb_page));
>
> + snp_data = memblock_alloc(sizeof(*snp_data), PAGE_SIZE);
> + if (!snp_data)
> + panic("Can't allocate SEV-SNP runtime data");
> +
> + err = early_set_memory_decrypted((unsigned long)&snp_data-
> >hv_doorbell_page,
> + sizeof(snp_data->hv_doorbell_page));
> + if (err)
> + panic("Can't map #HV doorbell pages unencrypted");
> +
> + memset(&snp_data->hv_doorbell_page, 0, sizeof(snp_data-
> >hv_doorbell_page));
> +
> + per_cpu(snp_runtime_data, cpu) = snp_data;
> +
> data->ghcb_active = false;
> data->backup_ghcb_active = false;
> }
> @@ -2044,7 +2241,12 @@ DEFINE_IDTENTRY_VC_USER(exc_vmm_communication)
>
> static bool hv_raw_handle_exception(struct pt_regs *regs)
> {
> - return false;
> + /* Clear the no_further_signal bit */
> + sev_snp_current_doorbell_page()->pending_events.events &= 0x7fff;
> +
> + check_hv_pending(regs);
> +
> + return true;
> }
>
> static __always_inline bool on_hv_fallback_stack(struct pt_regs *regs)
> diff --git a/arch/x86/kernel/traps.c b/arch/x86/kernel/traps.c
> index d29debec8134..1aa6cab2394b 100644
> --- a/arch/x86/kernel/traps.c
> +++ b/arch/x86/kernel/traps.c
> @@ -1503,5 +1503,7 @@ void __init trap_init(void)
> cpu_init_exception_handling();
> /* Setup traps as cpu_init() might #GP */
> idt_setup_traps();
> + sev_snp_init_hv_handling();
> +
> cpu_init();
> }
> --
> 2.25.1

2023-04-12 15:59:50

by Dave Hansen

[permalink] [raw]
Subject: Re: [RFC PATCH V4 08/17] x86/hyperv: Initialize cpu and memory for sev-snp enlightened guest

On 4/3/23 10:43, Tianyu Lan wrote:
> From: Tianyu Lan <[email protected]>
>
> Read processor amd memory info from specific address which are
> populated by Hyper-V. Initialize smp cpu related ops, pvalidate
> system memory and add it into e820 table.
>
> Signed-off-by: Tianyu Lan <[email protected]>
> ---
> arch/x86/hyperv/ivm.c | 78 +++++++++++++++++++++++++++++++++
> arch/x86/include/asm/mshyperv.h | 16 +++++++
> arch/x86/kernel/cpu/mshyperv.c | 3 ++
> 3 files changed, 97 insertions(+)
>
> diff --git a/arch/x86/hyperv/ivm.c b/arch/x86/hyperv/ivm.c
> index 368b2731950e..fa4de2761460 100644
> --- a/arch/x86/hyperv/ivm.c
> +++ b/arch/x86/hyperv/ivm.c
> @@ -17,6 +17,11 @@
> #include <asm/mem_encrypt.h>
> #include <asm/mshyperv.h>
> #include <asm/hypervisor.h>
> +#include <asm/coco.h>
> +#include <asm/io_apic.h>
> +#include <asm/sev.h>
> +#include <asm/realmode.h>
> +#include <asm/e820/api.h>
>
> #ifdef CONFIG_AMD_MEM_ENCRYPT
>
> @@ -57,6 +62,22 @@ union hv_ghcb {
>
> static u16 hv_ghcb_version __ro_after_init;
>
> +static u32 processor_count;
> +
> +static __init void hv_snp_get_smp_config(unsigned int early)
> +{
> + if (!early) {
> + while (num_processors < processor_count) {
> + early_per_cpu(x86_cpu_to_apicid, num_processors) = num_processors;
> + early_per_cpu(x86_bios_cpu_apicid, num_processors) = num_processors;
> + physid_set(num_processors, phys_cpu_present_map);
> + set_cpu_possible(num_processors, true);
> + set_cpu_present(num_processors, true);
> + num_processors++;
> + }
> + }
> +}

Folks, please minimize indentation:

if (early)
return;

It would also be nice to see *some* explanation in the changelog or
comments about why it's best and correct to just do nothing if early==1.

Also, this _consumes_ data from hv_sev_init_mem_and_cpu(). It would
make more sense to me to have them ordered the other way.
hv_sev_init_mem_and_cpu() first, this second.

> u64 hv_ghcb_hypercall(u64 control, void *input, void *output, u32 input_size)
> {
> union hv_ghcb *hv_ghcb;
> @@ -356,6 +377,63 @@ static bool hv_is_private_mmio(u64 addr)
> return false;
> }
>
> +__init void hv_sev_init_mem_and_cpu(void)
> +{
> + struct memory_map_entry *entry;
> + struct e820_entry *e820_entry;
> + u64 e820_end;
> + u64 ram_end;
> + u64 page;
> +
> + /*
> + * Hyper-V enlightened snp guest boots kernel
> + * directly without bootloader and so roms,
> + * bios regions and reserve resources are not
> + * available. Set these callback to NULL.
> + */
> + x86_platform.legacy.reserve_bios_regions = 0;
> + x86_init.resources.probe_roms = x86_init_noop;
> + x86_init.resources.reserve_resources = x86_init_noop;
> + x86_init.mpparse.find_smp_config = x86_init_noop;
> + x86_init.mpparse.get_smp_config = hv_snp_get_smp_config;

This is one of those places that vertical alignment adds clarity:

> + x86_init.resources.probe_roms = x86_init_noop;
> + x86_init.resources.reserve_resources = x86_init_noop;
> + x86_init.mpparse.find_smp_config = x86_init_noop;
> + x86_init.mpparse.get_smp_config = hv_snp_get_smp_config;

See? 3 noops and only one actual implemented function. Clear as day now.

> + /*> + * Hyper-V SEV-SNP enlightened guest doesn't support ioapic
> + * and legacy APIC page read/write. Switch to hv apic here.
> + */
> + disable_ioapic_support();

Do these systems have X86_FEATURE_APIC set? Why is this needed in
addition to the architectural enumeration that already exists?

Is there any other place in the kernel that has this one-off disabling
of the APIC?

> + /* Read processor number and memory layout. */
> + processor_count = *(u32 *)__va(EN_SEV_SNP_PROCESSOR_INFO_ADDR);
> + entry = (struct memory_map_entry *)(__va(EN_SEV_SNP_PROCESSOR_INFO_ADDR)
> + + sizeof(struct memory_map_entry));

Ick.

There are a lot of ways to do this. But, this is an awfully ugly way.

struct snp_processor_info {
u32 processor_count;
struct memory_map_entry[] entries;
}

struct snp_processor_info *snp_pi =
__va(EN_SEV_SNP_PROCESSOR_INFO_ADDR);
processor_count = snp_pi->processor_count;

Then, have your for() loop through snp_pi->entries;

Actually, I'm not _quite_ sure that processor_count and entries are next
to each other. But, either way, I do think a struct makes sense.

Also, what guarantees that EN_SEV_SNP_PROCESSOR_INFO_ADDR is mapped?
It's up above 8MB which I don't remember off the top of my head as being
a special address.

> + /*
> + * E820 table in the memory just describes memory for
> + * kernel, ACPI table, cmdline, boot params and ramdisk.
> + * Hyper-V popoulates the rest memory layout in the EN_SEV_
> + * SNP_PROCESSOR_INFO_ADDR.
> + */

Really? That is not very cool. We need a better explanation of why
there was no way to use the decades-old e820 or EFI memory map and why
this needs to be a special snowflake.

> + for (; entry->numpages != 0; entry++) {
> + e820_entry = &e820_table->entries[
> + e820_table->nr_entries - 1];
> + e820_end = e820_entry->addr + e820_entry->size;
> + ram_end = (entry->starting_gpn +
> + entry->numpages) * PAGE_SIZE;
> +
> + if (e820_end < entry->starting_gpn * PAGE_SIZE)
> + e820_end = entry->starting_gpn * PAGE_SIZE;
> +
> + if (e820_end < ram_end) {
> + pr_info("Hyper-V: add e820 entry [mem %#018Lx-%#018Lx]\n", e820_end, ram_end - 1);
> + e820__range_add(e820_end, ram_end - e820_end,
> + E820_TYPE_RAM);
> + for (page = e820_end; page < ram_end; page += PAGE_SIZE)
> + pvalidate((unsigned long)__va(page), RMP_PG_SIZE_4K, true);
> + }
> + }
> +}

Oh, is this just about having a pre-accepted area and a non-accepted
area? Is this basically another one-off implementation of unaccepted
memory ... that doesn't use the EFI standard?

> void __init hv_vtom_init(void)
> {
> /*
> diff --git a/arch/x86/include/asm/mshyperv.h b/arch/x86/include/asm/mshyperv.h
> index 3c15e23162e7..a4a59007b5f2 100644
> --- a/arch/x86/include/asm/mshyperv.h
> +++ b/arch/x86/include/asm/mshyperv.h
> @@ -41,6 +41,20 @@ extern bool hv_isolation_type_en_snp(void);
>
> extern union hv_ghcb * __percpu *hv_ghcb_pg;
>
> +/*
> + * Hyper-V puts processor and memory layout info
> + * to this address in SEV-SNP enlightened guest.
> + */
> +#define EN_SEV_SNP_PROCESSOR_INFO_ADDR 0x802000
> +
> +struct memory_map_entry {
> + u64 starting_gpn;
> + u64 numpages;
> + u16 type;
> + u16 flags;
> + u32 reserved;
> +};
> +
> int hv_call_deposit_pages(int node, u64 partition_id, u32 num_pages);
> int hv_call_add_logical_proc(int node, u32 lp_index, u32 acpi_id);
> int hv_call_create_vp(int node, u64 partition_id, u32 vp_index, u32 flags);
> @@ -246,12 +260,14 @@ void hv_ghcb_msr_read(u64 msr, u64 *value);
> bool hv_ghcb_negotiate_protocol(void);
> void hv_ghcb_terminate(unsigned int set, unsigned int reason);
> void hv_vtom_init(void);
> +void hv_sev_init_mem_and_cpu(void);
> #else
> static inline void hv_ghcb_msr_write(u64 msr, u64 value) {}
> static inline void hv_ghcb_msr_read(u64 msr, u64 *value) {}
> static inline bool hv_ghcb_negotiate_protocol(void) { return false; }
> static inline void hv_ghcb_terminate(unsigned int set, unsigned int reason) {}
> static inline void hv_vtom_init(void) {}
> +static inline void hv_sev_init_mem_and_cpu(void) {}
> #endif
>
> extern bool hv_isolation_type_snp(void);
> diff --git a/arch/x86/kernel/cpu/mshyperv.c b/arch/x86/kernel/cpu/mshyperv.c
> index 2f2dcb2370b6..71820bbf9e90 100644
> --- a/arch/x86/kernel/cpu/mshyperv.c
> +++ b/arch/x86/kernel/cpu/mshyperv.c
> @@ -529,6 +529,9 @@ static void __init ms_hyperv_init_platform(void)
> if (!(ms_hyperv.features & HV_ACCESS_TSC_INVARIANT))
> mark_tsc_unstable("running on Hyper-V");
>
> + if (hv_isolation_type_en_snp())
> + hv_sev_init_mem_and_cpu();
> +
> hardlockup_detector_disable();
> }
>

2023-04-13 03:31:20

by Tianyu Lan

[permalink] [raw]
Subject: Re: [RFC PATCH V4 03/17] x86/hyperv: Set Virtual Trust Level in VMBus init message

On 4/12/2023 10:24 PM, Michael Kelley (LINUX) wrote:
>> +static u8 __init get_vtl(void)
>> +{
>> + u64 control = HV_HYPERCALL_REP_COMP_1 | HVCALL_GET_VP_REGISTERS;
>> + struct hv_get_vp_registers_input *input;
>> + struct hv_get_vp_registers_output *output;
>> + u64 vtl = 0;
>> + u64 ret;
>> + unsigned long flags;
>> +
>> + local_irq_save(flags);
>> + input = *(struct hv_get_vp_registers_input **)this_cpu_ptr(hyperv_pcpu_input_arg);
> The cast to struct hv_get_vp_registers_input isn't needed here. Just do:
>
> input = *this_cpu_ptr(hyperv_pcpu_input_arg);
>
> I know we have other code that references hyperv_pcpu_input_arg with a more
> complicated code sequence, but it's really not necessary. At some point, we
> should go back and clean those up, but let's not add any new cases. ????


Hi Michael:
Thanks for your review. Yes, agree. I just follow the old coding style
and will update.

>
>
>> + output = (struct hv_get_vp_registers_output *)input;
>> + if (!input) {
>> + local_irq_restore(flags);
>> + goto done;
>> + }
>> +
>> + memset(input, 0, sizeof(*input) + sizeof(input->element[0]));
> Use struct_size() to calculate the size of a structure plus a trailing
> variable size array.
>
>> + input->header.partitionid = HV_PARTITION_ID_SELF;
>> + input->header.vpindex = HV_VP_INDEX_SELF;
>> + input->header.inputvtl = 0;
>> + input->element[0].name0 = HV_X64_REGISTER_VSM_VP_STATUS;
>> +
>> + ret = hv_do_hypercall(control, input, output);
>> + if (hv_result_success(ret))
>> + vtl = output->as64.low & HV_X64_VTL_MASK;
>> + else
>> + pr_err("Hyper-V: failed to get VTL!");
> Let's include the hypercall status in the failure message. If a failure ever
> happens, we will really want to know what that status is. ????
>

Agree. Will update.

>> rv;
>> extern bool hv_nested;
>> @@ -58,6 +59,7 @@ extern void * __percpu *hyperv_pcpu_output_arg;
>> extern u64 hv_do_hypercall(u64 control, void *inputaddr, void *outputaddr);
>> extern u64 hv_do_fast_hypercall8(u16 control, u64 input8);
>> extern bool hv_isolation_type_snp(void);
>> +extern bool hv_isolation_type_en_snp(void);
>

2023-04-14 04:44:08

by Tianyu Lan

[permalink] [raw]
Subject: Re: [RFC PATCH V4 06/17] x86/hyperv: decrypt VMBus pages for sev-snp enlightened guest

On 4/12/2023 10:32 PM, Michael Kelley (LINUX) wrote:
>> @@ -191,6 +222,15 @@ void hv_synic_free(void)
>> free_page((unsigned long)hv_cpu->synic_event_page);
>> free_page((unsigned long)hv_cpu->synic_message_page);
>> free_page((unsigned long)hv_cpu->post_msg_page);
>> +
>> + if (hv_isolation_type_en_snp()) {
>> + set_memory_encrypted((unsigned long)
>> + hv_cpu->synic_message_page, 1);
>> + set_memory_encrypted((unsigned long)
>> + hv_cpu->synic_event_page, 1);
>> + set_memory_encrypted((unsigned long)
>> + hv_cpu->post_msg_page, 1);
>> + }
> The re-encryption must be done*before* pages are freed!
>
> Furthermore, if the re-encryption fails, we should not free
> the page as it would pollute the free memory pool. The best
> we can do is leak the memory. See Patch 5 in Dexuan's
> TDX series, which does the same thing (but still doesn't
> get it quite right, per my comments).
>

You are right. The order is wrong. we should figure out a right solution
to handle such case.

2023-04-14 11:06:01

by Pankaj Gupta

[permalink] [raw]
Subject: Re: [RFC PATCH V4 13/17] x86/sev: Add Check of #HV event in path

> Add check_hv_pending() and check_hv_pending_after_irq() to
> check queued #HV event when irq is disabled.
>
> Signed-off-by: Tianyu Lan <[email protected]>
> ---
> arch/x86/entry/entry_64.S | 18 ++++++++++++++++
> arch/x86/include/asm/irqflags.h | 11 ++++++++++
> arch/x86/kernel/sev.c | 38 +++++++++++++++++++++++++++++++++
> 3 files changed, 67 insertions(+)
>
> diff --git a/arch/x86/entry/entry_64.S b/arch/x86/entry/entry_64.S
> index d877774c3141..efa56dfde19e 100644
> --- a/arch/x86/entry/entry_64.S
> +++ b/arch/x86/entry/entry_64.S
> @@ -1073,6 +1073,15 @@ SYM_CODE_END(paranoid_entry)
> * R15 - old SPEC_CTRL
> */
> SYM_CODE_START_LOCAL(paranoid_exit)
> +#ifdef CONFIG_AMD_MEM_ENCRYPT
> + /*
> + * If a #HV was delivered during execution and interrupts were
> + * disabled, then check if it can be handled before the iret
> + * (which may re-enable interrupts).
> + */
> + mov %rsp, %rdi
> + call check_hv_pending
> +#endif
> UNWIND_HINT_REGS
>
> /*
> @@ -1197,6 +1206,15 @@ SYM_CODE_START(error_entry)
> SYM_CODE_END(error_entry)
>
> SYM_CODE_START_LOCAL(error_return)
> +#ifdef CONFIG_AMD_MEM_ENCRYPT
> + /*
> + * If a #HV was delivered during execution and interrupts were
> + * disabled, then check if it can be handled before the iret
> + * (which may re-enable interrupts).
> + */
> + mov %rsp, %rdi
> + call check_hv_pending
> +#endif
> UNWIND_HINT_REGS
> DEBUG_ENTRY_ASSERT_IRQS_OFF
> testb $3, CS(%rsp)
> diff --git a/arch/x86/include/asm/irqflags.h b/arch/x86/include/asm/irqflags.h
> index 8c5ae649d2df..8368e3fe2d36 100644
> --- a/arch/x86/include/asm/irqflags.h
> +++ b/arch/x86/include/asm/irqflags.h
> @@ -11,6 +11,10 @@
> /*
> * Interrupt control:
> */
> +#ifdef CONFIG_AMD_MEM_ENCRYPT
> +void check_hv_pending(struct pt_regs *regs);
> +void check_hv_pending_irq_enable(void);
> +#endif
>
> /* Declaration required for gcc < 4.9 to prevent -Werror=missing-prototypes */
> extern inline unsigned long native_save_fl(void);
> @@ -40,12 +44,19 @@ static __always_inline void native_irq_disable(void)
> static __always_inline void native_irq_enable(void)
> {
> asm volatile("sti": : :"memory");
> +#ifdef CONFIG_AMD_MEM_ENCRYPT
> + check_hv_pending_irq_enable();
> +#endif
> }
>
> static __always_inline void native_safe_halt(void)
> {
> mds_idle_clear_cpu_buffers();
> asm volatile("sti; hlt": : :"memory");
> +
> +#ifdef CONFIG_AMD_MEM_ENCRYPT
> + check_hv_pending_irq_enable();
> +#endif
> }
>
> static __always_inline void native_halt(void)
> diff --git a/arch/x86/kernel/sev.c b/arch/x86/kernel/sev.c
> index 2684a45b50a6..6445f5356c45 100644
> --- a/arch/x86/kernel/sev.c
> +++ b/arch/x86/kernel/sev.c
> @@ -179,6 +179,44 @@ void noinstr __sev_es_ist_enter(struct pt_regs *regs)
> this_cpu_write(cpu_tss_rw.x86_tss.ist[IST_INDEX_VC], new_ist);
> }
>
> +static void do_exc_hv(struct pt_regs *regs)
> +{
> + /* Handle #HV exception. */
> +}
> +
> +void check_hv_pending(struct pt_regs *regs)
> +{
> + if (!cc_platform_has(CC_ATTR_GUEST_SEV_SNP))
> + return;
> +
> + if ((regs->flags & X86_EFLAGS_IF) == 0)
> + return;
> +
> + do_exc_hv(regs);
> +}
> +
> +void check_hv_pending_irq_enable(void)
> +{
> + struct pt_regs regs;
> +
> + if (!cc_platform_has(CC_ATTR_GUEST_SEV_SNP))
> + return;
> +
> + memset(&regs, 0, sizeof(struct pt_regs));
> + asm volatile("movl %%cs, %%eax;" : "=a" (regs.cs));
> + asm volatile("movl %%ss, %%eax;" : "=a" (regs.ss));
> + regs.orig_ax = 0xffffffff;
> + regs.flags = native_save_fl();
> +
> + /*
> + * Disable irq when handle pending #HV events after
> + * re-enabling irq.
> + */
> + asm volatile("cli" : : : "memory");

Just curious, Does the hypervisor injects irqs via doorbell page when
interrupts are disabled with "cli" ? Trying to understand the need to
cli/sti covering on "do_exc_hv".

Thanks,
Pankaj

> + do_exc_hv(&regs);
> + asm volatile("sti" : : : "memory");
> +}
> +
> void noinstr __sev_es_ist_exit(void)
> {
> unsigned long ist;
> --
> 2.25.1
>

2023-04-14 16:28:26

by Tianyu Lan

[permalink] [raw]
Subject: Re: [RFC PATCH V4 10/17] x86/hyperv: Add smp support for sev-snp guest

On 4/12/2023 10:59 PM, Michael Kelley (LINUX) wrote:
>> #ifdef CONFIG_AMD_MEM_ENCRYPT
>>
>> #define GHCB_USAGE_HYPERV_CALL 1
>>
>> +static u8 ap_start_input_arg[PAGE_SIZE] __bss_decrypted __aligned(PAGE_SIZE);
>> +static u8 ap_start_stack[PAGE_SIZE] __aligned(PAGE_SIZE);
> Just a question: ap_start_stack is a static variable that gets used as the
> starting stack for every AP. So obviously, once each AP is started, we must
> be sure that the AP moves off the ap_start_stack before the next AP is
> started. How is that synchronization done? I see that do_boot_cpu() is
> where the wakeup_secondary_cpu() function is called. Then there's
> some waiting until the AP completes "initial initialization" per the
> comment in the code. Is there where we know that the AP is no
> longer using ap_start_stack?
>

Hi Micahel:
secondary_startup_64_no_verify() in the head_64.S initializes
a boot time stack to replace the old stack. It's very begining stage of
starting AP. The initial_stack was initialized with idle->thread.sp in
the do_boot_cpu(). The AP is started one by one in current code and so
It's safe to reuse the stack for all APs to boot up.

278 /*
279 * Setup a boot time stack - Any secondary CPU will have lost
its stack
280 * by now because the cr3-switch above unmaps the real-mode stack
281 */
282 movq initial_stack(%rip), %rsp
283

2023-04-14 16:40:47

by Tianyu Lan

[permalink] [raw]
Subject: Re: [RFC PATCH V4 13/17] x86/sev: Add Check of #HV event in path



On 4/14/2023 7:02 PM, Pankaj Gupta wrote:
>> +void check_hv_pending_irq_enable(void)
>> +{
>> + struct pt_regs regs;
>> +
>> + if (!cc_platform_has(CC_ATTR_GUEST_SEV_SNP))
>> + return;
>> +
>> + memset(&regs, 0, sizeof(struct pt_regs));
>> + asm volatile("movl %%cs, %%eax;" : "=a" (regs.cs));
>> + asm volatile("movl %%ss, %%eax;" : "=a" (regs.ss));
>> + regs.orig_ax = 0xffffffff;
>> + regs.flags = native_save_fl();
>> +
>> + /*
>> + * Disable irq when handle pending #HV events after
>> + * re-enabling irq.
>> + */
>> + asm volatile("cli" : : : "memory");
> Just curious, Does the hypervisor injects irqs via doorbell page when
> interrupts are disabled with "cli" ? Trying to understand the need to
> cli/sti covering on "do_exc_hv".


Hi Pankaj:
Thanks for your review. Yes, Hypervisor still injects #HV exception
when irq was disabled check_hv_pending() is called when
there is a #HV exception. It checks irq flag and return back without
handling irq event when irq was disabled.

2023-04-16 07:22:56

by Tianyu Lan

[permalink] [raw]
Subject: Re: [RFC PATCH V4 08/17] x86/hyperv: Initialize cpu and memory for sev-snp enlightened guest

On 4/12/2023 10:39 PM, Michael Kelley (LINUX) wrote:
>> + /* Read processor number and memory layout. */
>> + processor_count = *(u32 *)__va(EN_SEV_SNP_PROCESSOR_INFO_ADDR);
>> + entry = (struct memory_map_entry *)(__va(EN_SEV_SNP_PROCESSOR_INFO_ADDR)
>> + + sizeof(struct memory_map_entry));
> Why is the first map entry being skipped?

The first entry is populated with processor count by Hyper-V.

2023-04-16 07:24:27

by Tianyu Lan

[permalink] [raw]
Subject: Re: [RFC PATCH V4 14/17] x86/hyperv/sev: Add AMD sev-snp enlightened guest support on hyperv

On 4/5/2023 1:43 PM, Gupta, Pankaj wrote:
>> +
>> +        if (!pending_events.vector)
>> +            return;
>
> This prevents do_exc_hv() executing from #hv handler as
> "hv_handling_events" never resets?
>

Nice catch! Will fix in the next version.

2023-04-16 07:36:38

by Tianyu Lan

[permalink] [raw]
Subject: Re: [RFC PATCH V4 08/17] x86/hyperv: Initialize cpu and memory for sev-snp enlightened guest

On 4/12/2023 10:39 PM, Michael Kelley (LINUX) wrote:
>> + /* Read processor number and memory layout. */
>> + processor_count = *(u32 *)__va(EN_SEV_SNP_PROCESSOR_INFO_ADDR);
>> + entry = (struct memory_map_entry *)(__va(EN_SEV_SNP_PROCESSOR_INFO_ADDR)
>> + + sizeof(struct memory_map_entry));
> Why is the first map entry being skipped?

The first entry is populated with processor count by Hyper-V.

2023-04-16 08:04:21

by Tianyu Lan

[permalink] [raw]
Subject: Re: [RFC PATCH V4 14/17] x86/hyperv/sev: Add AMD sev-snp enlightened guest support on hyperv

On 4/12/2023 11:02 PM, Michael Kelley (LINUX) wrote:
> From: Tianyu Lan<[email protected]> Sent: Monday, April 3, 2023 10:44 AM
> The patch subject prefix of "x86/hyperv/sev:" doesn't make sense.
> There's no pathname like that in the kernel code. I think it should just be
> "x86/sev:".
>

Agree. Will update in the next version.

2023-04-17 08:15:00

by Pankaj Gupta

[permalink] [raw]
Subject: Re: [RFC PATCH V4 13/17] x86/sev: Add Check of #HV event in path

> >> +void check_hv_pending_irq_enable(void)
> >> +{
> >> + struct pt_regs regs;
> >> +
> >> + if (!cc_platform_has(CC_ATTR_GUEST_SEV_SNP))
> >> + return;
> >> +
> >> + memset(&regs, 0, sizeof(struct pt_regs));
> >> + asm volatile("movl %%cs, %%eax;" : "=a" (regs.cs));
> >> + asm volatile("movl %%ss, %%eax;" : "=a" (regs.ss));
> >> + regs.orig_ax = 0xffffffff;
> >> + regs.flags = native_save_fl();
> >> +
> >> + /*
> >> + * Disable irq when handle pending #HV events after
> >> + * re-enabling irq.
> >> + */
> >> + asm volatile("cli" : : : "memory");
> > Just curious, Does the hypervisor injects irqs via doorbell page when
> > interrupts are disabled with "cli" ? Trying to understand the need to
> > cli/sti covering on "do_exc_hv".
>
>
> Hi Pankaj:
> Thanks for your review. Yes, Hypervisor still injects #HV exception
> when irq was disabled check_hv_pending() is called when
> there is a #HV exception. It checks irq flag and return back without
> handling irq event when irq was disabled.

o.k. Thanks for your reply! I am clear with this part.

But want to know if there is possibility when "do_exc_hv" would keep
handling irqs in the continuous while loop i.e from the update in the
hv doorbell page and that can result in DOS like scenario? Is there is already
a protection for this?

Thanks,
Pankaj

2023-04-17 12:58:14

by Michael Kelley (LINUX)

[permalink] [raw]
Subject: RE: [RFC PATCH V4 08/17] x86/hyperv: Initialize cpu and memory for sev-snp enlightened guest

From: Tianyu Lan <[email protected]> Sent: Sunday, April 16, 2023 12:21 AM
>
> On 4/12/2023 10:39 PM, Michael Kelley (LINUX) wrote:
> >> + /* Read processor number and memory layout. */
> >> + processor_count = *(u32 *)__va(EN_SEV_SNP_PROCESSOR_INFO_ADDR);
> >> + entry = (struct memory_map_entry *)(__va(EN_SEV_SNP_PROCESSOR_INFO_ADDR)
> >> + + sizeof(struct memory_map_entry));
> > Why is the first map entry being skipped?
>
> The first entry is populated with processor count by Hyper-V.

Perhaps add a comment to acknowledge that the behavior
is a bit unexpected:

The 0th entry in the memory layout array contains just a 32-bit
processor count. Read that value and then skip over the reminder
of the 0th entry. Start processing memory_map_entry's with array
element 1.

Michael


2023-04-18 13:37:05

by Tianyu Lan

[permalink] [raw]
Subject: Re: [RFC PATCH V4 08/17] x86/hyperv: Initialize cpu and memory for sev-snp enlightened guest

On 4/12/2023 11:53 PM, Dave Hansen wrote:
>>
>> +static u32 processor_count;
>> +
>> +static __init void hv_snp_get_smp_config(unsigned int early)
>> +{
>> + if (!early) {
>> + while (num_processors < processor_count) {
>> + early_per_cpu(x86_cpu_to_apicid, num_processors) = num_processors;
>> + early_per_cpu(x86_bios_cpu_apicid, num_processors) = num_processors;
>> + physid_set(num_processors, phys_cpu_present_map);
>> + set_cpu_possible(num_processors, true);
>> + set_cpu_present(num_processors, true);
>> + num_processors++;
>> + }
>> + }
>> +}
> Folks, please minimize indentation:
>
> if (early)
> return;
>
> It would also be nice to see*some* explanation in the changelog or
> comments about why it's best and correct to just do nothing if early==1.
>
> Also, this_consumes_ data from hv_sev_init_mem_and_cpu(). It would
> make more sense to me to have them ordered the other way.
> hv_sev_init_mem_and_cpu() first, this second.

Hi Dave
Thanks for your review. Good suggestion! Will update in the next
verison.

>
>> u64 hv_ghcb_hypercall(u64 control, void *input, void *output, u32 input_size)
>> {
>> union hv_ghcb *hv_ghcb;
>> @@ -356,6 +377,63 @@ static bool hv_is_private_mmio(u64 addr)
>> return false;
>> }
>>
>> +__init void hv_sev_init_mem_and_cpu(void)
>> +{
>> + struct memory_map_entry *entry;
>> + struct e820_entry *e820_entry;
>> + u64 e820_end;
>> + u64 ram_end;
>> + u64 page;
>> +
>> + /*
>> + * Hyper-V enlightened snp guest boots kernel
>> + * directly without bootloader and so roms,
>> + * bios regions and reserve resources are not
>> + * available. Set these callback to NULL.
>> + */
>> + x86_platform.legacy.reserve_bios_regions = 0;
>> + x86_init.resources.probe_roms = x86_init_noop;
>> + x86_init.resources.reserve_resources = x86_init_noop;
>> + x86_init.mpparse.find_smp_config = x86_init_noop;
>> + x86_init.mpparse.get_smp_config = hv_snp_get_smp_config;
> This is one of those places that vertical alignment adds clarity:
>
>> + x86_init.resources.probe_roms = x86_init_noop;
>> + x86_init.resources.reserve_resources = x86_init_noop;
>> + x86_init.mpparse.find_smp_config = x86_init_noop;
>> + x86_init.mpparse.get_smp_config = hv_snp_get_smp_config;
> See? 3 noops and only one actual implemented function. Clear as day now.
>

Yes, this looks better. Will update.

>> + /*> + * Hyper-V SEV-SNP enlightened guest doesn't support ioapic
>> + * and legacy APIC page read/write. Switch to hv apic here.
>> + */
>> + disable_ioapic_support();
> Do these systems have X86_FEATURE_APIC set? Why is this needed in
> addition to the architectural enumeration that already exists?
>

X86_FEATURE_APIC is still set. Hyper-V provides parav-virtualized local
apic interface to replace APIC page opeartion. In the SEV-SNP guest.

> Is there any other place in the kernel that has this one-off disabling
> of the APIC?

In current kernel code, ioapic support still may be disabled when there
is no MP table or ACPI MADT configuration. Please see
__apic_intr_mode_select() and disable_smp() for detial where ioapic is
disabled.

>
>> + /* Read processor number and memory layout. */
>> + processor_count = *(u32 *)__va(EN_SEV_SNP_PROCESSOR_INFO_ADDR);
>> + entry = (struct memory_map_entry *)(__va(EN_SEV_SNP_PROCESSOR_INFO_ADDR)
>> + + sizeof(struct memory_map_entry));
> Ick.
>
> There are a lot of ways to do this. But, this is an awfully ugly way.
>
> struct snp_processor_info {
> u32 processor_count;
> struct memory_map_entry[] entries;
> }
>
> struct snp_processor_info *snp_pi =
> __va(EN_SEV_SNP_PROCESSOR_INFO_ADDR);
> processor_count = snp_pi->processor_count;
>
> Then, have your for() loop through snp_pi->entries;
>
> Actually, I'm not_quite_ sure that processor_count and entries are next
> to each other. But, either way, I do think a struct makes sense.

Agree. Will update.

>
> Also, what guarantees that EN_SEV_SNP_PROCESSOR_INFO_ADDR is mapped?
> It's up above 8MB which I don't remember off the top of my head as being
> a special address.

This EN_SEV_SNP_PROCESSOR_INFO_ADDR is specified by hypervisor tool.
Hypervisor populates mem and cpu info to the page in the memory and
kernel may access it via adding PHYS_OFFSET_OFFSET directly.

>
>> + /*
>> + * E820 table in the memory just describes memory for
>> + * kernel, ACPI table, cmdline, boot params and ramdisk.
>> + * Hyper-V popoulates the rest memory layout in the EN_SEV_
>> + * SNP_PROCESSOR_INFO_ADDR.
>> + */
> Really? That is not very cool. We need a better explanation of why
> there was no way to use the decades-old e820 or EFI memory map and why
> this needs to be a special snowflake.

Agree. There should be a comment to describe that there is no virtual
Bios in the guest and hypervisor boots Linux kernel directly. So kernel
needs to populdate e820 tables which should be prepared by virtual Bios.

>
>> + for (; entry->numpages != 0; entry++) {
>> + e820_entry = &e820_table->entries[
>> + e820_table->nr_entries - 1];
>> + e820_end = e820_entry->addr + e820_entry->size;
>> + ram_end = (entry->starting_gpn +
>> + entry->numpages) * PAGE_SIZE;
>> +
>> + if (e820_end < entry->starting_gpn * PAGE_SIZE)
>> + e820_end = entry->starting_gpn * PAGE_SIZE;
>> +
>> + if (e820_end < ram_end) {
>> + pr_info("Hyper-V: add e820 entry [mem %#018Lx-%#018Lx]\n", e820_end, ram_end - 1);
>> + e820__range_add(e820_end, ram_end - e820_end,
>> + E820_TYPE_RAM);
>> + for (page = e820_end; page < ram_end; page += PAGE_SIZE)
>> + pvalidate((unsigned long)__va(page), RMP_PG_SIZE_4K, true);
>> + }
>> + }
>> +}
> Oh, is this just about having a pre-accepted area and a non-accepted
> area? Is this basically another one-off implementation of unaccepted
> memory ... that doesn't use the EFI standard?

No, there is no virtual EFI firmware inside VM and so kernel gets mem
and vcpu info directly from Hyper-V.

2023-04-18 14:12:42

by Gupta, Pankaj

[permalink] [raw]
Subject: Re: [RFC PATCH V4 13/17] x86/sev: Add Check of #HV event in path

Hi Tianyu,


> Add check_hv_pending() and check_hv_pending_after_irq() to
> check queued #HV event when irq is disabled.
>
> Signed-off-by: Tianyu Lan <[email protected]>
> ---
> arch/x86/entry/entry_64.S | 18 ++++++++++++++++
> arch/x86/include/asm/irqflags.h | 11 ++++++++++
> arch/x86/kernel/sev.c | 38 +++++++++++++++++++++++++++++++++
> 3 files changed, 67 insertions(+)
>
> diff --git a/arch/x86/entry/entry_64.S b/arch/x86/entry/entry_64.S
> index d877774c3141..efa56dfde19e 100644
> --- a/arch/x86/entry/entry_64.S
> +++ b/arch/x86/entry/entry_64.S
> @@ -1073,6 +1073,15 @@ SYM_CODE_END(paranoid_entry)
> * R15 - old SPEC_CTRL
> */
> SYM_CODE_START_LOCAL(paranoid_exit)
> +#ifdef CONFIG_AMD_MEM_ENCRYPT
> + /*
> + * If a #HV was delivered during execution and interrupts were
> + * disabled, then check if it can be handled before the iret
> + * (which may re-enable interrupts).
> + */
> + mov %rsp, %rdi
> + call check_hv_pending
> +#endif
> UNWIND_HINT_REGS
>
> /*
> @@ -1197,6 +1206,15 @@ SYM_CODE_START(error_entry)
> SYM_CODE_END(error_entry)
>
> SYM_CODE_START_LOCAL(error_return)
> +#ifdef CONFIG_AMD_MEM_ENCRYPT
> + /*
> + * If a #HV was delivered during execution and interrupts were
> + * disabled, then check if it can be handled before the iret
> + * (which may re-enable interrupts).
> + */
> + mov %rsp, %rdi
> + call check_hv_pending
> +#endif
> UNWIND_HINT_REGS
> DEBUG_ENTRY_ASSERT_IRQS_OFF
> testb $3, CS(%rsp)
> diff --git a/arch/x86/include/asm/irqflags.h b/arch/x86/include/asm/irqflags.h
> index 8c5ae649d2df..8368e3fe2d36 100644
> --- a/arch/x86/include/asm/irqflags.h
> +++ b/arch/x86/include/asm/irqflags.h
> @@ -11,6 +11,10 @@
> /*
> * Interrupt control:
> */
> +#ifdef CONFIG_AMD_MEM_ENCRYPT
> +void check_hv_pending(struct pt_regs *regs);
> +void check_hv_pending_irq_enable(void);
> +#endif
>
> /* Declaration required for gcc < 4.9 to prevent -Werror=missing-prototypes */
> extern inline unsigned long native_save_fl(void);
> @@ -40,12 +44,19 @@ static __always_inline void native_irq_disable(void)
> static __always_inline void native_irq_enable(void)
> {
> asm volatile("sti": : :"memory");
> +#ifdef CONFIG_AMD_MEM_ENCRYPT
> + check_hv_pending_irq_enable();
> +#endif
> }
>
> static __always_inline void native_safe_halt(void)
> {
> mds_idle_clear_cpu_buffers();
> asm volatile("sti; hlt": : :"memory");

I was able to boot the Linux guest on KVM SNP host with below [1]
change above your patch series.

Thanks,
Pankaj

[1]
diff --git a/arch/x86/include/asm/irqflags.h
b/arch/x86/include/asm/irqflags.h
index 8368e3fe2d36..df993bec56c4 100644
--- a/arch/x86/include/asm/irqflags.h
+++ b/arch/x86/include/asm/irqflags.h
@@ -52,11 +52,13 @@ static __always_inline void native_irq_enable(void)
static __always_inline void native_safe_halt(void)
{
mds_idle_clear_cpu_buffers();
- asm volatile("sti; hlt": : :"memory");
+ asm volatile("sti": : :"memory");

#ifdef CONFIG_AMD_MEM_ENCRYPT
check_hv_pending_irq_enable();
#endif
+ asm volatile("hlt": : :"memory");
}

2023-04-18 14:17:36

by Tianyu Lan

[permalink] [raw]
Subject: Re: [RFC PATCH V4 08/17] x86/hyperv: Initialize cpu and memory for sev-snp enlightened guest



On 4/17/2023 8:49 PM, Michael Kelley (LINUX) wrote:
>>> Why is the first map entry being skipped?
>> The first entry is populated with processor count by Hyper-V.
> Perhaps add a comment to acknowledge that the behavior
> is a bit unexpected:
>
> The 0th entry in the memory layout array contains just a 32-bit
> processor count. Read that value and then skip over the reminder
> of the 0th entry. Start processing memory_map_entry's with array
> element 1.
>

Good suggestion! Will update in the next version.