2023-04-08 20:51:49

by Dexuan Cui

[permalink] [raw]
Subject: [PATCH v4 0/6] Support TDX guests on Hyper-V

The patchset adds the Hyper-V specific code so that a TDX guest can run
on Hyper-V. Please review. Thanks!

This v4 patchset is based on Michael Kelley's v7 DDA patchset:
https://github.com/kelleymh/linux/commits/v7
Michael's patches are being merged into the upstream (some of the
patches are already in tip.git and the others are going upstream via
the Hyper-V tree).

This v4 patchset addressed the comments from Kirill, Sathyanarayanan
and Michael. Please see each patch's log message for the changes.

If you want to view the patches on github, it is also in this branch:
https://github.com/dcui/tdx/commits/decui/michaelv7dda/tdx/v4

FYI, v1-v3 are here:
https://lwn.net/ml/linux-kernel/[email protected]/
https://lwn.net/ml/linux-kernel/[email protected]/
https://lwn.net/ml/linux-kernel/[email protected]/

Thanks,
Dexuan

Dexuan Cui (6):
x86/tdx: Retry TDVMCALL_MAP_GPA() when needed
x86/tdx: Support vmalloc() for tdx_enc_status_changed()
x86/hyperv: Add hv_isolation_type_tdx() to detect TDX guests
x86/hyperv: Support hypercalls for TDX guests
Drivers: hv: vmbus: Support TDX guests
x86/hyperv: Fix serial console interrupts for TDX guests

arch/x86/coco/tdx/tdx.c | 122 ++++++++++++++++++++++-------
arch/x86/hyperv/hv_apic.c | 6 +-
arch/x86/hyperv/hv_init.c | 27 ++++++-
arch/x86/hyperv/ivm.c | 20 +++++
arch/x86/include/asm/hyperv-tlfs.h | 3 +-
arch/x86/include/asm/mshyperv.h | 20 +++++
arch/x86/kernel/cpu/mshyperv.c | 43 ++++++++++
drivers/hv/hv.c | 62 +++++++++++++--
drivers/hv/hv_common.c | 30 +++++++
include/asm-generic/mshyperv.h | 1 +
10 files changed, 295 insertions(+), 39 deletions(-)

--
2.25.1


2023-04-08 20:52:03

by Dexuan Cui

[permalink] [raw]
Subject: [PATCH v4 6/6] x86/hyperv: Fix serial console interrupts for TDX guests

When a TDX guest runs on Hyper-V, the UEFI firmware sets the HW_REDUCED
flag, and consequently ttyS0 interrupts can't work. Fix the issue by
overriding x86_init.acpi.reduced_hw_early_init().

Signed-off-by: Dexuan Cui <[email protected]>
---

Changes since v1:
None.

arch/x86/kernel/cpu/mshyperv.c | 22 ++++++++++++++++++++++
1 file changed, 22 insertions(+)

diff --git a/arch/x86/kernel/cpu/mshyperv.c b/arch/x86/kernel/cpu/mshyperv.c
index e9106c9d92f81..deedced0f2bb0 100644
--- a/arch/x86/kernel/cpu/mshyperv.c
+++ b/arch/x86/kernel/cpu/mshyperv.c
@@ -318,6 +318,26 @@ static void __init hv_smp_prepare_cpus(unsigned int max_cpus)
}
#endif

+/*
+ * When a TDX guest runs on Hyper-V, the firmware sets the HW_REDUCED flag: see
+ * acpi_tb_create_local_fadt(). Consequently ttyS0 interrupts can't work because
+ * request_irq() -> ... -> irq_to_desc() returns NULL for ttyS0. This happens
+ * because mp_config_acpi_legacy_irqs() sees a nr_legacy_irqs() of 0, so it
+ * doesn't initialize the array 'mp_irqs[]', and later setup_IO_APIC_irqs() ->
+ * find_irq_entry() fails to find the legacy irqs from the array, and hence
+ * doesn't create the necessary irq description info.
+ *
+ * Copy arch/x86/kernel/acpi/boot.c: acpi_generic_reduced_hw_init() but doesn't
+ * change 'legacy_pic', so it keeps its default value 'default_legacy_pic' in
+ * mp_config_acpi_legacy_irqs(), which sees a non-zero nr_legacy_irqs(), and
+ * eventually serial console interrupts can work properly.
+ */
+static void __init reduced_hw_init(void)
+{
+ x86_init.timers.timer_init = x86_init_noop;
+ x86_init.irqs.pre_vector_init = x86_init_noop;
+}
+
static void __init ms_hyperv_init_platform(void)
{
int hv_max_functions_eax;
@@ -425,6 +445,8 @@ static void __init ms_hyperv_init_platform(void)

/* A TDX VM must use x2APIC and doesn't use lazy EOI */
ms_hyperv.hints &= ~HV_X64_APIC_ACCESS_RECOMMENDED;
+
+ x86_init.acpi.reduced_hw_early_init = reduced_hw_init;
}
}

--
2.25.1

2023-04-08 20:52:46

by Dexuan Cui

[permalink] [raw]
Subject: [PATCH v4 5/6] Drivers: hv: vmbus: Support TDX guests

Add Hyper-V specific code so that a TDX guest can run on Hyper-V:
No need to use hv_vp_assist_page.
Don't use the unsafe Hyper-V TSC page.
Don't try to use HV_REGISTER_CRASH_CTL.
Don't trust Hyper-V's TLB-flushing hypercalls.
Don't use lazy EOI.
Share SynIC Event/Message pages and VMBus Monitor pages with the host.
Use pgprot_decrypted(PAGE_KERNEL)in hv_ringbuffer_init().

Signed-off-by: Dexuan Cui <[email protected]>
---

Changes in v2:
Used a new function hv_set_memory_enc_dec_needed() in
__set_memory_enc_pgtable().
Added the missing set_memory_encrypted() in hv_synic_free().

Changes in v3:
Use pgprot_decrypted(PAGE_KERNEL)in hv_ringbuffer_init().
(Do not use PAGE_KERNEL_NOENC, which doesn't exist for ARM64).

Used cc_mkdec() in hv_synic_enable_regs().

ms_hyperv_init_platform():
Explicitly do not use HV_X64_REMOTE_TLB_FLUSH_RECOMMENDED.
Explicitly do not use HV_X64_APIC_ACCESS_RECOMMENDED.

Enabled __send_ipi_mask() and __send_ipi_one() for TDX guests.

Changes in v4:
A minor rebase to Michael's v7 DDA patchset. I'm very happy that
I can drop my v3 change to arch/x86/mm/pat/set_memory.c due to
Michael's work.

arch/x86/hyperv/hv_apic.c | 6 ++--
arch/x86/hyperv/hv_init.c | 19 ++++++++---
arch/x86/kernel/cpu/mshyperv.c | 21 +++++++++++-
drivers/hv/hv.c | 62 +++++++++++++++++++++++++++++++---
4 files changed, 96 insertions(+), 12 deletions(-)

diff --git a/arch/x86/hyperv/hv_apic.c b/arch/x86/hyperv/hv_apic.c
index fb8b2c088681a..16919c7b3196e 100644
--- a/arch/x86/hyperv/hv_apic.c
+++ b/arch/x86/hyperv/hv_apic.c
@@ -173,7 +173,8 @@ static bool __send_ipi_mask(const struct cpumask *mask, int vector,
(exclude_self && weight == 1 && cpumask_test_cpu(this_cpu, mask)))
return true;

- if (!hv_hypercall_pg)
+ /* A TDX guest doesn't use hv_hypercall_pg. */
+ if (!hv_isolation_type_tdx() && !hv_hypercall_pg)
return false;

if ((vector < HV_IPI_LOW_VECTOR) || (vector > HV_IPI_HIGH_VECTOR))
@@ -227,7 +228,8 @@ static bool __send_ipi_one(int cpu, int vector)

trace_hyperv_send_ipi_one(cpu, vector);

- if (!hv_hypercall_pg || (vp == VP_INVAL))
+ /* A TDX guest doesn't use hv_hypercall_pg. */
+ if ((!hv_isolation_type_tdx() && !hv_hypercall_pg) || (vp == VP_INVAL))
return false;

if ((vector < HV_IPI_LOW_VECTOR) || (vector > HV_IPI_HIGH_VECTOR))
diff --git a/arch/x86/hyperv/hv_init.c b/arch/x86/hyperv/hv_init.c
index f175e0de821c3..f28357ecad7d9 100644
--- a/arch/x86/hyperv/hv_init.c
+++ b/arch/x86/hyperv/hv_init.c
@@ -79,7 +79,7 @@ static int hyperv_init_ghcb(void)
static int hv_cpu_init(unsigned int cpu)
{
union hv_vp_assist_msr_contents msr = { 0 };
- struct hv_vp_assist_page **hvp = &hv_vp_assist_page[cpu];
+ struct hv_vp_assist_page **hvp;
int ret;

ret = hv_common_cpu_init(cpu);
@@ -89,6 +89,7 @@ static int hv_cpu_init(unsigned int cpu)
if (!hv_vp_assist_page)
return 0;

+ hvp = &hv_vp_assist_page[cpu];
if (hv_root_partition) {
/*
* For root partition we get the hypervisor provided VP assist
@@ -398,11 +399,21 @@ void __init hyperv_init(void)
if (hv_common_init())
return;

- hv_vp_assist_page = kcalloc(num_possible_cpus(),
- sizeof(*hv_vp_assist_page), GFP_KERNEL);
+ /*
+ * The VP assist page is useless to a TDX guest: the only use we
+ * would have for it is lazy EOI, which can not be used with TDX.
+ */
+ if (hv_isolation_type_tdx())
+ hv_vp_assist_page = NULL;
+ else
+ hv_vp_assist_page = kcalloc(num_possible_cpus(),
+ sizeof(*hv_vp_assist_page),
+ GFP_KERNEL);
if (!hv_vp_assist_page) {
ms_hyperv.hints &= ~HV_X64_ENLIGHTENED_VMCS_RECOMMENDED;
- goto common_free;
+
+ if (!hv_isolation_type_tdx())
+ goto common_free;
}

if (hv_isolation_type_snp()) {
diff --git a/arch/x86/kernel/cpu/mshyperv.c b/arch/x86/kernel/cpu/mshyperv.c
index a87fb934cd4b4..e9106c9d92f81 100644
--- a/arch/x86/kernel/cpu/mshyperv.c
+++ b/arch/x86/kernel/cpu/mshyperv.c
@@ -405,8 +405,27 @@ static void __init ms_hyperv_init_platform(void)

if (hv_get_isolation_type() == HV_ISOLATION_TYPE_SNP)
static_branch_enable(&isolation_type_snp);
- else if (hv_get_isolation_type() == HV_ISOLATION_TYPE_TDX)
+ else if (hv_get_isolation_type() == HV_ISOLATION_TYPE_TDX) {
static_branch_enable(&isolation_type_tdx);
+
+ /*
+ * The GPAs of SynIC Event/Message pages and VMBus
+ * Moniter pages need to be added by this offset.
+ */
+ ms_hyperv.shared_gpa_boundary = cc_mkdec(0);
+
+ /* Don't use the unsafe Hyper-V TSC page */
+ ms_hyperv.features &= ~HV_MSR_REFERENCE_TSC_AVAILABLE;
+
+ /* HV_REGISTER_CRASH_CTL is unsupported */
+ ms_hyperv.misc_features &= ~HV_FEATURE_GUEST_CRASH_MSR_AVAILABLE;
+
+ /* Don't trust Hyper-V's TLB-flushing hypercalls */
+ ms_hyperv.hints &= ~HV_X64_REMOTE_TLB_FLUSH_RECOMMENDED;
+
+ /* A TDX VM must use x2APIC and doesn't use lazy EOI */
+ ms_hyperv.hints &= ~HV_X64_APIC_ACCESS_RECOMMENDED;
+ }
}

if (hv_max_functions_eax >= HYPERV_CPUID_NESTED_FEATURES) {
diff --git a/drivers/hv/hv.c b/drivers/hv/hv.c
index 008234894d287..22ecb79d21efd 100644
--- a/drivers/hv/hv.c
+++ b/drivers/hv/hv.c
@@ -18,6 +18,7 @@
#include <linux/clockchips.h>
#include <linux/delay.h>
#include <linux/interrupt.h>
+#include <linux/set_memory.h>
#include <clocksource/hyperv_timer.h>
#include <asm/mshyperv.h>
#include "hyperv_vmbus.h"
@@ -119,6 +120,7 @@ int hv_synic_alloc(void)
{
int cpu;
struct hv_per_cpu_context *hv_cpu;
+ int ret = -ENOMEM;

/*
* First, zero all per-cpu memory areas so hv_synic_free() can
@@ -168,6 +170,30 @@ int hv_synic_alloc(void)
pr_err("Unable to allocate post msg page\n");
goto err;
}
+
+
+ if (hv_isolation_type_tdx()) {
+ ret = set_memory_decrypted(
+ (unsigned long)hv_cpu->synic_message_page, 1);
+ if (ret) {
+ pr_err("Failed to decrypt SYNIC msg page\n");
+ goto err;
+ }
+
+ ret = set_memory_decrypted(
+ (unsigned long)hv_cpu->synic_event_page, 1);
+ if (ret) {
+ pr_err("Failed to decrypt SYNIC event page\n");
+ goto err;
+ }
+
+ ret = set_memory_decrypted(
+ (unsigned long)hv_cpu->post_msg_page, 1);
+ if (ret) {
+ pr_err("Failed to decrypt post msg page\n");
+ goto err;
+ }
+ }
}

return 0;
@@ -176,18 +202,42 @@ int hv_synic_alloc(void)
* Any memory allocations that succeeded will be freed when
* the caller cleans up by calling hv_synic_free()
*/
- return -ENOMEM;
+ return ret;
}


void hv_synic_free(void)
{
int cpu;
+ int ret;

for_each_present_cpu(cpu) {
struct hv_per_cpu_context *hv_cpu
= per_cpu_ptr(hv_context.cpu_context, cpu);

+ if (hv_isolation_type_tdx()) {
+ ret = set_memory_encrypted(
+ (unsigned long)hv_cpu->synic_message_page, 1);
+ if (ret) {
+ pr_err("Failed to encrypt SYNIC msg page\n");
+ continue;
+ }
+
+ ret = set_memory_encrypted(
+ (unsigned long)hv_cpu->synic_event_page, 1);
+ if (ret) {
+ pr_err("Failed to encrypt SYNIC event page\n");
+ continue;
+ }
+
+ ret = set_memory_encrypted(
+ (unsigned long)hv_cpu->post_msg_page, 1);
+ if (ret) {
+ pr_err("Failed to encrypt post msg page\n");
+ continue;
+ }
+ }
+
free_page((unsigned long)hv_cpu->synic_event_page);
free_page((unsigned long)hv_cpu->synic_message_page);
free_page((unsigned long)hv_cpu->post_msg_page);
@@ -225,8 +275,9 @@ void hv_synic_enable_regs(unsigned int cpu)
if (!hv_cpu->synic_message_page)
pr_err("Fail to map synic message page.\n");
} else {
- simp.base_simp_gpa = virt_to_phys(hv_cpu->synic_message_page)
- >> HV_HYP_PAGE_SHIFT;
+ simp.base_simp_gpa =
+ cc_mkdec(virt_to_phys(hv_cpu->synic_message_page)) >>
+ HV_HYP_PAGE_SHIFT;
}

hv_set_register(HV_REGISTER_SIMP, simp.as_uint64);
@@ -244,8 +295,9 @@ void hv_synic_enable_regs(unsigned int cpu)
if (!hv_cpu->synic_event_page)
pr_err("Fail to map synic event page.\n");
} else {
- siefp.base_siefp_gpa = virt_to_phys(hv_cpu->synic_event_page)
- >> HV_HYP_PAGE_SHIFT;
+ siefp.base_siefp_gpa =
+ cc_mkdec(virt_to_phys(hv_cpu->synic_event_page)) >>
+ HV_HYP_PAGE_SHIFT;
}

hv_set_register(HV_REGISTER_SIEFP, siefp.as_uint64);
--
2.25.1

2023-04-08 20:52:49

by Dexuan Cui

[permalink] [raw]
Subject: [PATCH v4 4/6] x86/hyperv: Support hypercalls for TDX guests

A TDX guest uses the GHCI call rather than hv_hypercall_pg.

In hv_do_hypercall(), Hyper-V requires that the input/output addresses
must have the cc_mask.

Reviewed-by: Kuppuswamy Sathyanarayanan <[email protected]>
Signed-off-by: Dexuan Cui <[email protected]>
---
Changes in v2:
Implemented hv_tdx_hypercall() in C rather than in assembly code.
Renamed the parameter names of hv_tdx_hypercall().
Used cc_mkdec() directly in hv_do_hypercall().

Changes in v3:
Decrypted/encrypted hyperv_pcpu_input_arg in
hv_common_cpu_init() and hv_common_cpu_die().

Changes in v4:
__tdx_hypercall(&args, TDX_HCALL_HAS_OUTPUT) -> __tdx_hypercall_ret()
hv_common_cpu_die(): explicitly ignore the error set_memory_encrypted() [Michael Kelley]
Added Sathyanarayanan's Reviewed-by.

arch/x86/hyperv/hv_init.c | 8 ++++++++
arch/x86/hyperv/ivm.c | 14 ++++++++++++++
arch/x86/include/asm/mshyperv.h | 17 +++++++++++++++++
drivers/hv/hv_common.c | 24 ++++++++++++++++++++++++
include/asm-generic/mshyperv.h | 1 +
5 files changed, 64 insertions(+)

diff --git a/arch/x86/hyperv/hv_init.c b/arch/x86/hyperv/hv_init.c
index a5f9474f08e12..f175e0de821c3 100644
--- a/arch/x86/hyperv/hv_init.c
+++ b/arch/x86/hyperv/hv_init.c
@@ -432,6 +432,10 @@ void __init hyperv_init(void)
/* Hyper-V requires to write guest os id via ghcb in SNP IVM. */
hv_ghcb_msr_write(HV_X64_MSR_GUEST_OS_ID, guest_id);

+ /* A TDX guest uses the GHCI call rather than hv_hypercall_pg. */
+ if (hv_isolation_type_tdx())
+ goto skip_hypercall_pg_init;
+
hv_hypercall_pg = __vmalloc_node_range(PAGE_SIZE, 1, VMALLOC_START,
VMALLOC_END, GFP_KERNEL, PAGE_KERNEL_ROX,
VM_FLUSH_RESET_PERMS, NUMA_NO_NODE,
@@ -471,6 +475,7 @@ void __init hyperv_init(void)
wrmsrl(HV_X64_MSR_HYPERCALL, hypercall_msr.as_uint64);
}

+skip_hypercall_pg_init:
/*
* hyperv_init() is called before LAPIC is initialized: see
* apic_intr_mode_init() -> x86_platform.apic_post_init() and
@@ -594,6 +599,9 @@ bool hv_is_hyperv_initialized(void)
if (x86_hyper_type != X86_HYPER_MS_HYPERV)
return false;

+ /* A TDX guest uses the GHCI call rather than hv_hypercall_pg. */
+ if (hv_isolation_type_tdx())
+ return true;
/*
* Verify that earlier initialization succeeded by checking
* that the hypercall page is setup
diff --git a/arch/x86/hyperv/ivm.c b/arch/x86/hyperv/ivm.c
index 3658ade4f4121..23304c9ddd34f 100644
--- a/arch/x86/hyperv/ivm.c
+++ b/arch/x86/hyperv/ivm.c
@@ -415,3 +415,17 @@ bool hv_isolation_type_tdx(void)
{
return static_branch_unlikely(&isolation_type_tdx);
}
+
+u64 hv_tdx_hypercall(u64 control, u64 param1, u64 param2)
+{
+ struct tdx_hypercall_args args = { };
+
+ args.r10 = control;
+ args.rdx = param1;
+ args.r8 = param2;
+
+ (void)__tdx_hypercall_ret(&args);
+
+ return args.r11;
+}
+EXPORT_SYMBOL_GPL(hv_tdx_hypercall);
diff --git a/arch/x86/include/asm/mshyperv.h b/arch/x86/include/asm/mshyperv.h
index de7ceae9e65e9..71077326f57bc 100644
--- a/arch/x86/include/asm/mshyperv.h
+++ b/arch/x86/include/asm/mshyperv.h
@@ -10,6 +10,7 @@
#include <asm/nospec-branch.h>
#include <asm/paravirt.h>
#include <asm/mshyperv.h>
+#include <asm/coco.h>

/*
* Hyper-V always provides a single IO-APIC at this MMIO address.
@@ -45,6 +46,12 @@ int hv_call_deposit_pages(int node, u64 partition_id, u32 num_pages);
int hv_call_add_logical_proc(int node, u32 lp_index, u32 acpi_id);
int hv_call_create_vp(int node, u64 partition_id, u32 vp_index, u32 flags);

+u64 hv_tdx_hypercall(u64 control, u64 param1, u64 param2);
+
+/*
+ * If the hypercall involves no input or output parameters, the hypervisor
+ * ignores the corresponding GPA pointer.
+ */
static inline u64 hv_do_hypercall(u64 control, void *input, void *output)
{
u64 input_address = input ? virt_to_phys(input) : 0;
@@ -52,6 +59,10 @@ static inline u64 hv_do_hypercall(u64 control, void *input, void *output)
u64 hv_status;

#ifdef CONFIG_X86_64
+ if (hv_isolation_type_tdx())
+ return hv_tdx_hypercall(control,
+ cc_mkdec(input_address),
+ cc_mkdec(output_address));
if (!hv_hypercall_pg)
return U64_MAX;

@@ -95,6 +106,9 @@ static inline u64 _hv_do_fast_hypercall8(u64 control, u64 input1)
u64 hv_status;

#ifdef CONFIG_X86_64
+ if (hv_isolation_type_tdx())
+ return hv_tdx_hypercall(control, input1, 0);
+
{
__asm__ __volatile__(CALL_NOSPEC
: "=a" (hv_status), ASM_CALL_CONSTRAINT,
@@ -140,6 +154,9 @@ static inline u64 _hv_do_fast_hypercall16(u64 control, u64 input1, u64 input2)
u64 hv_status;

#ifdef CONFIG_X86_64
+ if (hv_isolation_type_tdx())
+ return hv_tdx_hypercall(control, input1, input2);
+
{
__asm__ __volatile__("mov %4, %%r8\n"
CALL_NOSPEC
diff --git a/drivers/hv/hv_common.c b/drivers/hv/hv_common.c
index c55db7ea6580b..10e85682e83eb 100644
--- a/drivers/hv/hv_common.c
+++ b/drivers/hv/hv_common.c
@@ -21,6 +21,7 @@
#include <linux/ptrace.h>
#include <linux/slab.h>
#include <linux/dma-map-ops.h>
+#include <linux/set_memory.h>
#include <asm/hyperv-tlfs.h>
#include <asm/mshyperv.h>

@@ -128,6 +129,7 @@ int hv_common_cpu_init(unsigned int cpu)
u64 msr_vp_index;
gfp_t flags;
int pgcount = hv_root_partition ? 2 : 1;
+ int ret;

/* hv_cpu_init() can be called with IRQs disabled from hv_resume() */
flags = irqs_disabled() ? GFP_ATOMIC : GFP_KERNEL;
@@ -137,6 +139,17 @@ int hv_common_cpu_init(unsigned int cpu)
if (!(*inputarg))
return -ENOMEM;

+ if (hv_isolation_type_tdx()) {
+ ret = set_memory_decrypted((unsigned long)*inputarg, pgcount);
+ if (ret) {
+ /* It may be unsafe to free *inputarg */
+ *inputarg = NULL;
+ return ret;
+ }
+
+ memset(*inputarg, 0x00, pgcount * HV_HYP_PAGE_SIZE);
+ }
+
if (hv_root_partition) {
outputarg = (void **)this_cpu_ptr(hyperv_pcpu_output_arg);
*outputarg = (char *)(*inputarg) + HV_HYP_PAGE_SIZE;
@@ -157,6 +170,8 @@ int hv_common_cpu_die(unsigned int cpu)
unsigned long flags;
void **inputarg, **outputarg;
void *mem;
+ int pgcount = hv_root_partition ? 2 : 1;
+ int ret;

local_irq_save(flags);

@@ -171,6 +186,15 @@ int hv_common_cpu_die(unsigned int cpu)

local_irq_restore(flags);

+ if (hv_isolation_type_tdx()) {
+ ret = set_memory_encrypted((unsigned long)mem, pgcount);
+ if (ret)
+ pr_warn("Hyper-V: Failed to encrypt input arg on cpu%d: %d\n",
+ cpu, ret);
+ /* It's unsafe to free 'mem'. */
+ return 0;
+ }
+
kfree(mem);

return 0;
diff --git a/include/asm-generic/mshyperv.h b/include/asm-generic/mshyperv.h
index afcd9ae9588cb..83e56ebe0cb7f 100644
--- a/include/asm-generic/mshyperv.h
+++ b/include/asm-generic/mshyperv.h
@@ -58,6 +58,7 @@ extern void * __percpu *hyperv_pcpu_output_arg;
extern u64 hv_do_hypercall(u64 control, void *inputaddr, void *outputaddr);
extern u64 hv_do_fast_hypercall8(u16 control, u64 input8);
extern bool hv_isolation_type_snp(void);
+extern bool hv_isolation_type_tdx(void);

/* Helper functions that provide a consistent pattern for checking Hyper-V hypercall status. */
static inline int hv_result(u64 status)
--
2.25.1

2023-04-11 16:44:04

by Michael Kelley (LINUX)

[permalink] [raw]
Subject: RE: [PATCH v4 4/6] x86/hyperv: Support hypercalls for TDX guests

From: Dexuan Cui <[email protected]> Sent: Saturday, April 8, 2023 1:48 PM
>
> A TDX guest uses the GHCI call rather than hv_hypercall_pg.
>
> In hv_do_hypercall(), Hyper-V requires that the input/output addresses
> must have the cc_mask.
>
> Reviewed-by: Kuppuswamy Sathyanarayanan
> <[email protected]>
> Signed-off-by: Dexuan Cui <[email protected]>
> ---
> Changes in v2:
> Implemented hv_tdx_hypercall() in C rather than in assembly code.
> Renamed the parameter names of hv_tdx_hypercall().
> Used cc_mkdec() directly in hv_do_hypercall().
>
> Changes in v3:
> Decrypted/encrypted hyperv_pcpu_input_arg in
> hv_common_cpu_init() and hv_common_cpu_die().
>
> Changes in v4:
> __tdx_hypercall(&args, TDX_HCALL_HAS_OUTPUT) -> __tdx_hypercall_ret()
> hv_common_cpu_die(): explicitly ignore the error set_memory_encrypted() [Michael Kelley]
> Added Sathyanarayanan's Reviewed-by.
>
> arch/x86/hyperv/hv_init.c | 8 ++++++++
> arch/x86/hyperv/ivm.c | 14 ++++++++++++++
> arch/x86/include/asm/mshyperv.h | 17 +++++++++++++++++
> drivers/hv/hv_common.c | 24 ++++++++++++++++++++++++
> include/asm-generic/mshyperv.h | 1 +
> 5 files changed, 64 insertions(+)
>
> diff --git a/arch/x86/hyperv/hv_init.c b/arch/x86/hyperv/hv_init.c
> index a5f9474f08e12..f175e0de821c3 100644
> --- a/arch/x86/hyperv/hv_init.c
> +++ b/arch/x86/hyperv/hv_init.c
> @@ -432,6 +432,10 @@ void __init hyperv_init(void)
> /* Hyper-V requires to write guest os id via ghcb in SNP IVM. */
> hv_ghcb_msr_write(HV_X64_MSR_GUEST_OS_ID, guest_id);
>
> + /* A TDX guest uses the GHCI call rather than hv_hypercall_pg. */
> + if (hv_isolation_type_tdx())
> + goto skip_hypercall_pg_init;
> +
> hv_hypercall_pg = __vmalloc_node_range(PAGE_SIZE, 1, VMALLOC_START,
> VMALLOC_END, GFP_KERNEL, PAGE_KERNEL_ROX,
> VM_FLUSH_RESET_PERMS, NUMA_NO_NODE,
> @@ -471,6 +475,7 @@ void __init hyperv_init(void)
> wrmsrl(HV_X64_MSR_HYPERCALL, hypercall_msr.as_uint64);
> }
>
> +skip_hypercall_pg_init:
> /*
> * hyperv_init() is called before LAPIC is initialized: see
> * apic_intr_mode_init() -> x86_platform.apic_post_init() and
> @@ -594,6 +599,9 @@ bool hv_is_hyperv_initialized(void)
> if (x86_hyper_type != X86_HYPER_MS_HYPERV)
> return false;
>
> + /* A TDX guest uses the GHCI call rather than hv_hypercall_pg. */
> + if (hv_isolation_type_tdx())
> + return true;
> /*
> * Verify that earlier initialization succeeded by checking
> * that the hypercall page is setup
> diff --git a/arch/x86/hyperv/ivm.c b/arch/x86/hyperv/ivm.c
> index 3658ade4f4121..23304c9ddd34f 100644
> --- a/arch/x86/hyperv/ivm.c
> +++ b/arch/x86/hyperv/ivm.c
> @@ -415,3 +415,17 @@ bool hv_isolation_type_tdx(void)
> {
> return static_branch_unlikely(&isolation_type_tdx);
> }
> +
> +u64 hv_tdx_hypercall(u64 control, u64 param1, u64 param2)
> +{
> + struct tdx_hypercall_args args = { };
> +
> + args.r10 = control;
> + args.rdx = param1;
> + args.r8 = param2;
> +
> + (void)__tdx_hypercall_ret(&args);
> +
> + return args.r11;
> +}
> +EXPORT_SYMBOL_GPL(hv_tdx_hypercall);
> diff --git a/arch/x86/include/asm/mshyperv.h b/arch/x86/include/asm/mshyperv.h
> index de7ceae9e65e9..71077326f57bc 100644
> --- a/arch/x86/include/asm/mshyperv.h
> +++ b/arch/x86/include/asm/mshyperv.h
> @@ -10,6 +10,7 @@
> #include <asm/nospec-branch.h>
> #include <asm/paravirt.h>
> #include <asm/mshyperv.h>
> +#include <asm/coco.h>
>
> /*
> * Hyper-V always provides a single IO-APIC at this MMIO address.
> @@ -45,6 +46,12 @@ int hv_call_deposit_pages(int node, u64 partition_id, u32 num_pages);
> int hv_call_add_logical_proc(int node, u32 lp_index, u32 acpi_id);
> int hv_call_create_vp(int node, u64 partition_id, u32 vp_index, u32 flags);
>
> +u64 hv_tdx_hypercall(u64 control, u64 param1, u64 param2);
> +
> +/*
> + * If the hypercall involves no input or output parameters, the hypervisor
> + * ignores the corresponding GPA pointer.
> + */
> static inline u64 hv_do_hypercall(u64 control, void *input, void *output)
> {
> u64 input_address = input ? virt_to_phys(input) : 0;
> @@ -52,6 +59,10 @@ static inline u64 hv_do_hypercall(u64 control, void *input, void *output)
> u64 hv_status;
>
> #ifdef CONFIG_X86_64
> + if (hv_isolation_type_tdx())
> + return hv_tdx_hypercall(control,
> + cc_mkdec(input_address),
> + cc_mkdec(output_address));
> if (!hv_hypercall_pg)
> return U64_MAX;
>
> @@ -95,6 +106,9 @@ static inline u64 _hv_do_fast_hypercall8(u64 control, u64 input1)
> u64 hv_status;
>
> #ifdef CONFIG_X86_64
> + if (hv_isolation_type_tdx())
> + return hv_tdx_hypercall(control, input1, 0);
> +
> {
> __asm__ __volatile__(CALL_NOSPEC
> : "=a" (hv_status), ASM_CALL_CONSTRAINT,
> @@ -140,6 +154,9 @@ static inline u64 _hv_do_fast_hypercall16(u64 control, u64 input1, u64 input2)
> u64 hv_status;
>
> #ifdef CONFIG_X86_64
> + if (hv_isolation_type_tdx())
> + return hv_tdx_hypercall(control, input1, input2);
> +
> {
> __asm__ __volatile__("mov %4, %%r8\n"
> CALL_NOSPEC
> diff --git a/drivers/hv/hv_common.c b/drivers/hv/hv_common.c
> index c55db7ea6580b..10e85682e83eb 100644
> --- a/drivers/hv/hv_common.c
> +++ b/drivers/hv/hv_common.c
> @@ -21,6 +21,7 @@
> #include <linux/ptrace.h>
> #include <linux/slab.h>
> #include <linux/dma-map-ops.h>
> +#include <linux/set_memory.h>
> #include <asm/hyperv-tlfs.h>
> #include <asm/mshyperv.h>
>
> @@ -128,6 +129,7 @@ int hv_common_cpu_init(unsigned int cpu)
> u64 msr_vp_index;
> gfp_t flags;
> int pgcount = hv_root_partition ? 2 : 1;
> + int ret;
>
> /* hv_cpu_init() can be called with IRQs disabled from hv_resume() */
> flags = irqs_disabled() ? GFP_ATOMIC : GFP_KERNEL;
> @@ -137,6 +139,17 @@ int hv_common_cpu_init(unsigned int cpu)
> if (!(*inputarg))
> return -ENOMEM;
>
> + if (hv_isolation_type_tdx()) {
> + ret = set_memory_decrypted((unsigned long)*inputarg, pgcount);
> + if (ret) {
> + /* It may be unsafe to free *inputarg */
> + *inputarg = NULL;
> + return ret;
> + }
> +
> + memset(*inputarg, 0x00, pgcount * HV_HYP_PAGE_SIZE);
> + }
> +
> if (hv_root_partition) {
> outputarg = (void **)this_cpu_ptr(hyperv_pcpu_output_arg);
> *outputarg = (char *)(*inputarg) + HV_HYP_PAGE_SIZE;
> @@ -157,6 +170,8 @@ int hv_common_cpu_die(unsigned int cpu)
> unsigned long flags;
> void **inputarg, **outputarg;
> void *mem;
> + int pgcount = hv_root_partition ? 2 : 1;
> + int ret;
>
> local_irq_save(flags);
>
> @@ -171,6 +186,15 @@ int hv_common_cpu_die(unsigned int cpu)
>
> local_irq_restore(flags);
>
> + if (hv_isolation_type_tdx()) {
> + ret = set_memory_encrypted((unsigned long)mem, pgcount);
> + if (ret)
> + pr_warn("Hyper-V: Failed to encrypt input arg on cpu%d: %d\n",
> + cpu, ret);
> + /* It's unsafe to free 'mem'. */
> + return 0;
> + }
> +
> kfree(mem);
>
> return 0;
> diff --git a/include/asm-generic/mshyperv.h b/include/asm-generic/mshyperv.h
> index afcd9ae9588cb..83e56ebe0cb7f 100644
> --- a/include/asm-generic/mshyperv.h
> +++ b/include/asm-generic/mshyperv.h
> @@ -58,6 +58,7 @@ extern void * __percpu *hyperv_pcpu_output_arg;
> extern u64 hv_do_hypercall(u64 control, void *inputaddr, void *outputaddr);
> extern u64 hv_do_fast_hypercall8(u16 control, u64 input8);
> extern bool hv_isolation_type_snp(void);
> +extern bool hv_isolation_type_tdx(void);
>
> /* Helper functions that provide a consistent pattern for checking Hyper-V hypercall status. */
> static inline int hv_result(u64 status)
> --
> 2.25.1

Reviewed-by: Michael Kelley <[email protected]>

2023-04-11 17:19:28

by Michael Kelley (LINUX)

[permalink] [raw]
Subject: RE: [PATCH v4 6/6] x86/hyperv: Fix serial console interrupts for TDX guests

From: Dexuan Cui <[email protected]> Sent: Saturday, April 8, 2023 1:48 PM
>
> When a TDX guest runs on Hyper-V, the UEFI firmware sets the HW_REDUCED
> flag, and consequently ttyS0 interrupts can't work. Fix the issue by
> overriding x86_init.acpi.reduced_hw_early_init().
>
> Signed-off-by: Dexuan Cui <[email protected]>
> ---
>
> Changes since v1:
> None.
>
> arch/x86/kernel/cpu/mshyperv.c | 22 ++++++++++++++++++++++
> 1 file changed, 22 insertions(+)
>
> diff --git a/arch/x86/kernel/cpu/mshyperv.c b/arch/x86/kernel/cpu/mshyperv.c
> index e9106c9d92f81..deedced0f2bb0 100644
> --- a/arch/x86/kernel/cpu/mshyperv.c
> +++ b/arch/x86/kernel/cpu/mshyperv.c
> @@ -318,6 +318,26 @@ static void __init hv_smp_prepare_cpus(unsigned int max_cpus)
> }
> #endif
>
> +/*
> + * When a TDX guest runs on Hyper-V, the firmware sets the HW_REDUCED flag: see
> + * acpi_tb_create_local_fadt(). Consequently ttyS0 interrupts can't work because
> + * request_irq() -> ... -> irq_to_desc() returns NULL for ttyS0. This happens
> + * because mp_config_acpi_legacy_irqs() sees a nr_legacy_irqs() of 0, so it
> + * doesn't initialize the array 'mp_irqs[]', and later setup_IO_APIC_irqs() ->
> + * find_irq_entry() fails to find the legacy irqs from the array, and hence
> + * doesn't create the necessary irq description info.
> + *
> + * Copy arch/x86/kernel/acpi/boot.c: acpi_generic_reduced_hw_init() but doesn't
> + * change 'legacy_pic', so it keeps its default value 'default_legacy_pic' in
> + * mp_config_acpi_legacy_irqs(), which sees a non-zero nr_legacy_irqs(), and
> + * eventually serial console interrupts can work properly.

I had a little trouble parsing this comment. Slightly better wording is:

* Clone arch/x86/kernel/acpi/boot.c: acpi_generic_reduced_hw_init() here,
* except don't change 'legacy_pic'. It keeps its default value 'default_legacy_pic'.
* mp_config_acpi_legacy_irqs() sees a non-zero nr_legacy_irqs(), and
* eventually serial console interrupts can work properly.

Otherwise,

Reviewed-by: Michael Kelley <[email protected]>

> + */
> +static void __init reduced_hw_init(void)
> +{
> + x86_init.timers.timer_init = x86_init_noop;
> + x86_init.irqs.pre_vector_init = x86_init_noop;
> +}
> +
> static void __init ms_hyperv_init_platform(void)
> {
> int hv_max_functions_eax;
> @@ -425,6 +445,8 @@ static void __init ms_hyperv_init_platform(void)
>
> /* A TDX VM must use x2APIC and doesn't use lazy EOI */
> ms_hyperv.hints &= ~HV_X64_APIC_ACCESS_RECOMMENDED;
> +
> + x86_init.acpi.reduced_hw_early_init = reduced_hw_init;
> }
> }
>
> --
> 2.25.1

2023-04-12 14:12:28

by Michael Kelley (LINUX)

[permalink] [raw]
Subject: RE: [PATCH v4 5/6] Drivers: hv: vmbus: Support TDX guests

From: Dexuan Cui <[email protected]> Sent: Saturday, April 8, 2023 1:48 PM
>
> Add Hyper-V specific code so that a TDX guest can run on Hyper-V:
> No need to use hv_vp_assist_page.
> Don't use the unsafe Hyper-V TSC page.
> Don't try to use HV_REGISTER_CRASH_CTL.
> Don't trust Hyper-V's TLB-flushing hypercalls.
> Don't use lazy EOI.
> Share SynIC Event/Message pages and VMBus Monitor pages with the host.
> Use pgprot_decrypted(PAGE_KERNEL)in hv_ringbuffer_init().
>
> Signed-off-by: Dexuan Cui <[email protected]>
> ---
>
> Changes in v2:
> Used a new function hv_set_memory_enc_dec_needed() in
> __set_memory_enc_pgtable().
> Added the missing set_memory_encrypted() in hv_synic_free().
>
> Changes in v3:
> Use pgprot_decrypted(PAGE_KERNEL)in hv_ringbuffer_init().
> (Do not use PAGE_KERNEL_NOENC, which doesn't exist for ARM64).
>
> Used cc_mkdec() in hv_synic_enable_regs().
>
> ms_hyperv_init_platform():
> Explicitly do not use HV_X64_REMOTE_TLB_FLUSH_RECOMMENDED.
> Explicitly do not use HV_X64_APIC_ACCESS_RECOMMENDED.
>
> Enabled __send_ipi_mask() and __send_ipi_one() for TDX guests.
>
> Changes in v4:
> A minor rebase to Michael's v7 DDA patchset. I'm very happy that
> I can drop my v3 change to arch/x86/mm/pat/set_memory.c due to
> Michael's work.
>
> arch/x86/hyperv/hv_apic.c | 6 ++--
> arch/x86/hyperv/hv_init.c | 19 ++++++++---
> arch/x86/kernel/cpu/mshyperv.c | 21 +++++++++++-
> drivers/hv/hv.c | 62 +++++++++++++++++++++++++++++++---
> 4 files changed, 96 insertions(+), 12 deletions(-)
>
> diff --git a/arch/x86/hyperv/hv_apic.c b/arch/x86/hyperv/hv_apic.c
> index fb8b2c088681a..16919c7b3196e 100644
> --- a/arch/x86/hyperv/hv_apic.c
> +++ b/arch/x86/hyperv/hv_apic.c
> @@ -173,7 +173,8 @@ static bool __send_ipi_mask(const struct cpumask *mask, int
> vector,
> (exclude_self && weight == 1 && cpumask_test_cpu(this_cpu, mask)))
> return true;
>
> - if (!hv_hypercall_pg)
> + /* A TDX guest doesn't use hv_hypercall_pg. */
> + if (!hv_isolation_type_tdx() && !hv_hypercall_pg)
> return false;
>
> if ((vector < HV_IPI_LOW_VECTOR) || (vector > HV_IPI_HIGH_VECTOR))
> @@ -227,7 +228,8 @@ static bool __send_ipi_one(int cpu, int vector)
>
> trace_hyperv_send_ipi_one(cpu, vector);
>
> - if (!hv_hypercall_pg || (vp == VP_INVAL))
> + /* A TDX guest doesn't use hv_hypercall_pg. */
> + if ((!hv_isolation_type_tdx() && !hv_hypercall_pg) || (vp == VP_INVAL))
> return false;
>
> if ((vector < HV_IPI_LOW_VECTOR) || (vector > HV_IPI_HIGH_VECTOR))
> diff --git a/arch/x86/hyperv/hv_init.c b/arch/x86/hyperv/hv_init.c
> index f175e0de821c3..f28357ecad7d9 100644
> --- a/arch/x86/hyperv/hv_init.c
> +++ b/arch/x86/hyperv/hv_init.c
> @@ -79,7 +79,7 @@ static int hyperv_init_ghcb(void)
> static int hv_cpu_init(unsigned int cpu)
> {
> union hv_vp_assist_msr_contents msr = { 0 };
> - struct hv_vp_assist_page **hvp = &hv_vp_assist_page[cpu];
> + struct hv_vp_assist_page **hvp;
> int ret;
>
> ret = hv_common_cpu_init(cpu);
> @@ -89,6 +89,7 @@ static int hv_cpu_init(unsigned int cpu)
> if (!hv_vp_assist_page)
> return 0;
>
> + hvp = &hv_vp_assist_page[cpu];
> if (hv_root_partition) {
> /*
> * For root partition we get the hypervisor provided VP assist
> @@ -398,11 +399,21 @@ void __init hyperv_init(void)
> if (hv_common_init())
> return;
>
> - hv_vp_assist_page = kcalloc(num_possible_cpus(),
> - sizeof(*hv_vp_assist_page), GFP_KERNEL);
> + /*
> + * The VP assist page is useless to a TDX guest: the only use we
> + * would have for it is lazy EOI, which can not be used with TDX.
> + */
> + if (hv_isolation_type_tdx())
> + hv_vp_assist_page = NULL;
> + else
> + hv_vp_assist_page = kcalloc(num_possible_cpus(),
> + sizeof(*hv_vp_assist_page),
> + GFP_KERNEL);
> if (!hv_vp_assist_page) {
> ms_hyperv.hints &= ~HV_X64_ENLIGHTENED_VMCS_RECOMMENDED;
> - goto common_free;
> +
> + if (!hv_isolation_type_tdx())
> + goto common_free;
> }
>
> if (hv_isolation_type_snp()) {
> diff --git a/arch/x86/kernel/cpu/mshyperv.c b/arch/x86/kernel/cpu/mshyperv.c
> index a87fb934cd4b4..e9106c9d92f81 100644
> --- a/arch/x86/kernel/cpu/mshyperv.c
> +++ b/arch/x86/kernel/cpu/mshyperv.c
> @@ -405,8 +405,27 @@ static void __init ms_hyperv_init_platform(void)
>
> if (hv_get_isolation_type() == HV_ISOLATION_TYPE_SNP)
> static_branch_enable(&isolation_type_snp);
> - else if (hv_get_isolation_type() == HV_ISOLATION_TYPE_TDX)
> + else if (hv_get_isolation_type() == HV_ISOLATION_TYPE_TDX) {
> static_branch_enable(&isolation_type_tdx);
> +
> + /*
> + * The GPAs of SynIC Event/Message pages and VMBus
> + * Moniter pages need to be added by this offset.
> + */
> + ms_hyperv.shared_gpa_boundary = cc_mkdec(0);
> +
> + /* Don't use the unsafe Hyper-V TSC page */
> + ms_hyperv.features &= ~HV_MSR_REFERENCE_TSC_AVAILABLE;
> +
> + /* HV_REGISTER_CRASH_CTL is unsupported */
> + ms_hyperv.misc_features &= ~HV_FEATURE_GUEST_CRASH_MSR_AVAILABLE;
> +
> + /* Don't trust Hyper-V's TLB-flushing hypercalls */
> + ms_hyperv.hints &= ~HV_X64_REMOTE_TLB_FLUSH_RECOMMENDED;
> +
> + /* A TDX VM must use x2APIC and doesn't use lazy EOI */
> + ms_hyperv.hints &= ~HV_X64_APIC_ACCESS_RECOMMENDED;
> + }
> }
>
> if (hv_max_functions_eax >= HYPERV_CPUID_NESTED_FEATURES) {
> diff --git a/drivers/hv/hv.c b/drivers/hv/hv.c
> index 008234894d287..22ecb79d21efd 100644
> --- a/drivers/hv/hv.c
> +++ b/drivers/hv/hv.c
> @@ -18,6 +18,7 @@
> #include <linux/clockchips.h>
> #include <linux/delay.h>
> #include <linux/interrupt.h>
> +#include <linux/set_memory.h>
> #include <clocksource/hyperv_timer.h>
> #include <asm/mshyperv.h>
> #include "hyperv_vmbus.h"
> @@ -119,6 +120,7 @@ int hv_synic_alloc(void)
> {
> int cpu;
> struct hv_per_cpu_context *hv_cpu;
> + int ret = -ENOMEM;
>
> /*
> * First, zero all per-cpu memory areas so hv_synic_free() can
> @@ -168,6 +170,30 @@ int hv_synic_alloc(void)
> pr_err("Unable to allocate post msg page\n");
> goto err;
> }
> +
> +
> + if (hv_isolation_type_tdx()) {
> + ret = set_memory_decrypted(
> + (unsigned long)hv_cpu->synic_message_page, 1);
> + if (ret) {
> + pr_err("Failed to decrypt SYNIC msg page\n");
> + goto err;
> + }
> +
> + ret = set_memory_decrypted(
> + (unsigned long)hv_cpu->synic_event_page, 1);
> + if (ret) {
> + pr_err("Failed to decrypt SYNIC event page\n");
> + goto err;
> + }
> +
> + ret = set_memory_decrypted(
> + (unsigned long)hv_cpu->post_msg_page, 1);
> + if (ret) {
> + pr_err("Failed to decrypt post msg page\n");
> + goto err;
> + }
> + }

One other comment: The memory for the synic_message_page,
synic_event_page, and post_msg_page is obtained using get_zeroed_page().
But after the decryption, the memory contents will be random garbage that
isn't all zeroes. You'll need to do a memset() after the decryption to get the
contents back to zero. Compare with Patch 6 in Tianyu's fully enlightened
SNP patch series.

Michael

> }
>
> return 0;
> @@ -176,18 +202,42 @@ int hv_synic_alloc(void)
> * Any memory allocations that succeeded will be freed when
> * the caller cleans up by calling hv_synic_free()
> */
> - return -ENOMEM;
> + return ret;
> }
>
>
> void hv_synic_free(void)
> {
> int cpu;
> + int ret;
>
> for_each_present_cpu(cpu) {
> struct hv_per_cpu_context *hv_cpu
> = per_cpu_ptr(hv_context.cpu_context, cpu);
>
> + if (hv_isolation_type_tdx()) {
> + ret = set_memory_encrypted(
> + (unsigned long)hv_cpu->synic_message_page, 1);
> + if (ret) {
> + pr_err("Failed to encrypt SYNIC msg page\n");
> + continue;
> + }
> +
> + ret = set_memory_encrypted(
> + (unsigned long)hv_cpu->synic_event_page, 1);
> + if (ret) {
> + pr_err("Failed to encrypt SYNIC event page\n");
> + continue;
> + }
> +
> + ret = set_memory_encrypted(
> + (unsigned long)hv_cpu->post_msg_page, 1);
> + if (ret) {
> + pr_err("Failed to encrypt post msg page\n");
> + continue;
> + }
> + }
> +
> free_page((unsigned long)hv_cpu->synic_event_page);
> free_page((unsigned long)hv_cpu->synic_message_page);
> free_page((unsigned long)hv_cpu->post_msg_page);
> @@ -225,8 +275,9 @@ void hv_synic_enable_regs(unsigned int cpu)
> if (!hv_cpu->synic_message_page)
> pr_err("Fail to map synic message page.\n");
> } else {
> - simp.base_simp_gpa = virt_to_phys(hv_cpu->synic_message_page)
> - >> HV_HYP_PAGE_SHIFT;
> + simp.base_simp_gpa =
> + cc_mkdec(virt_to_phys(hv_cpu->synic_message_page)) >>
> + HV_HYP_PAGE_SHIFT;
> }
>
> hv_set_register(HV_REGISTER_SIMP, simp.as_uint64);
> @@ -244,8 +295,9 @@ void hv_synic_enable_regs(unsigned int cpu)
> if (!hv_cpu->synic_event_page)
> pr_err("Fail to map synic event page.\n");
> } else {
> - siefp.base_siefp_gpa = virt_to_phys(hv_cpu->synic_event_page)
> - >> HV_HYP_PAGE_SHIFT;
> + siefp.base_siefp_gpa =
> + cc_mkdec(virt_to_phys(hv_cpu->synic_event_page)) >>
> + HV_HYP_PAGE_SHIFT;
> }
>
> hv_set_register(HV_REGISTER_SIEFP, siefp.as_uint64);
> --
> 2.25.1

2023-04-21 03:46:00

by Dexuan Cui

[permalink] [raw]
Subject: RE: [PATCH v4 5/6] Drivers: hv: vmbus: Support TDX guests

> From: Michael Kelley (LINUX) <[email protected]>
> Sent: Wednesday, April 12, 2023 7:05 AM
> > @@ -168,6 +170,30 @@ int hv_synic_alloc(void)
> > pr_err("Unable to allocate post msg page\n");
> > goto err;
> > }
> > +
> > +
> > + if (hv_isolation_type_tdx()) {
> > + ret = set_memory_decrypted(
> > + (unsigned long)hv_cpu->synic_message_page, 1);
> > + if (ret) {
> > + pr_err("Failed to decrypt SYNIC msg page\n");
> > + goto err;
> > + }
> > +
> > + ret = set_memory_decrypted(
> > + (unsigned long)hv_cpu->synic_event_page, 1);
> > + if (ret) {
> > + pr_err("Failed to decrypt SYNIC event page\n");
> > + goto err;
> > + }
> > +
> > + ret = set_memory_decrypted(
> > + (unsigned long)hv_cpu->post_msg_page, 1);
> > + if (ret) {
> > + pr_err("Failed to decrypt post msg page\n");
> > + goto err;
> > + }
> > + }
>
> One other comment: The memory for the synic_message_page,
> synic_event_page, and post_msg_page is obtained using get_zeroed_page().
> But after the decryption, the memory contents will be random garbage that

I'm not sure about this, as I don't see any "unknown msgtype=" message from
vmbus_on_msg_dpc() :-)

I agree it's good to add a memset(). Will do it in v5.

> isn't all zeroes. You'll need to do a memset() after the decryption to get the
> contents back to zero. Compare with Patch 6 in Tianyu's fully enlightened
> SNP patch series.
>
> Michael

2023-04-21 04:03:16

by Dexuan Cui

[permalink] [raw]
Subject: RE: [PATCH v4 6/6] x86/hyperv: Fix serial console interrupts for TDX guests

> From: Michael Kelley (LINUX) <[email protected]>
> Sent: Tuesday, April 11, 2023 10:14 AM
> > ...
> > + * Copy arch/x86/kernel/acpi/boot.c: acpi_generic_reduced_hw_init() but
> > doesn't change 'legacy_pic', so it keeps its default value 'default_legacy_pic' in
> > + * mp_config_acpi_legacy_irqs(), which sees a non-zero nr_legacy_irqs(),
> > and eventually serial console interrupts can work properly.
>
> I had a little trouble parsing this comment. Slightly better wording is:
>
> * Clone arch/x86/kernel/acpi/boot.c: acpi_generic_reduced_hw_init()
> here,
> * except don't change 'legacy_pic'. It keeps its default value
> 'default_legacy_pic'.
> * mp_config_acpi_legacy_irqs() sees a non-zero nr_legacy_irqs(), and
> * eventually serial console interrupts can work properly.
>
> Otherwise,
>
> Reviewed-by: Michael Kelley <[email protected]>

Thanks! Will use this in v5.

2023-04-22 01:12:42

by Dexuan Cui

[permalink] [raw]
Subject: RE: [PATCH v4 5/6] Drivers: hv: vmbus: Support TDX guests

> From: Michael Kelley (LINUX) <[email protected]>
> Sent: Tuesday, April 11, 2023 9:53 AM
> ...
> > @@ -168,6 +170,30 @@ int hv_synic_alloc(void)
> > pr_err("Unable to allocate post msg page\n");
> > goto err;
> > ...
> The error path here doesn't always work correctly. If one or more of the
> three pages is decrypted, and then one of the decryptions fails, we're left
> in a state where all three pages are allocated, but some are decrypted
> and some are not. hv_synic_free() won't know which pages are allocated
> but still encrypted, and will try to re-encrypt a page that wasn't
> successfully decrypted.
>
> The code to clean up from an error is messy because there are three pages
> involved. You've posted a separate patch to eliminate the need for the
> post_msg_page. If that patch came before this patch series (or maybe
> incorporated into this series), the code here only has to deal with two
> pages instead of three, making the cleanup of decryption errors easier.
>
> > }
> >. ..
> > void hv_synic_free(void)
> > {
> > int cpu;
> > + int ret;
> >...
> If any of the three re-encryptions fails, we'll leak all three pages. That's
> probably OK. Eliminating the post_msg_page will help.

The post_msg_page has been removed in the hyperv-next branch.
I'm rebasing this patch and is going to post v5.
I think I can use the below changes:

@@ -159,6 +161,28 @@ int hv_synic_alloc(void)
goto err;
}
}
+
+ /* It's better to leak the page if the decryption fails. */
+ if (hv_isolation_type_tdx()) {
+ ret = set_memory_decrypted(
+ (unsigned long)hv_cpu->synic_message_page, 1);
+ if (ret) {
+ pr_err("Failed to decrypt SYNIC msg page\n");
+ hv_cpu->synic_message_page = NULL;
+ goto err;
+ }
+
+ ret = set_memory_decrypted(
+ (unsigned long)hv_cpu->synic_event_page, 1);
+ if (ret) {
+ pr_err("Failed to decrypt SYNIC event page\n");
+ hv_cpu->synic_event_page = NULL;
+ goto err;
+ }
+
+ memset(hv_cpu->synic_message_page, 0, PAGE_SIZE);
+ memset(hv_cpu->synic_event_page, 0, PAGE_SIZE);
+ }
}
...
void hv_synic_free(void)
{
int cpu;
+ int ret;

for_each_present_cpu(cpu) {
struct hv_per_cpu_context *hv_cpu
= per_cpu_ptr(hv_context.cpu_context, cpu);

+ /* It's better to leak the page if the encryption fails. */
+ if (hv_isolation_type_tdx()) {
+ if (hv_cpu->synic_message_page) {
+ ret = set_memory_encrypted((unsigned long)
+ hv_cpu->synic_message_page, 1);
+ if (ret) {
+ pr_err("Failed to encrypt SYNIC msg page\n");
+ hv_cpu->synic_message_page = NULL;
+ }
+ }
+
+ if (hv_cpu->synic_event_page) {
+ ret = set_memory_encrypted((unsigned long)
+ hv_cpu->synic_event_page, 1);
+ if (ret) {
+ pr_err("Failed to encrypt SYNIC event page\n");
+ hv_cpu->synic_event_page = NULL;
+ }
+ }
+ }
+
free_page((unsigned long)hv_cpu->synic_event_page);
free_page((unsigned long)hv_cpu->synic_message_page);
}

BTW, at the end of hv_synic_alloc() there is a comment saying:
/*
* Any memory allocations that succeeded will be freed when
* the caller cleans up by calling hv_synic_free()
*/

If hv_synic_alloc() -> get_zeroed_page() fails, hv_context.hv_numa_map() is
not freed() and hv_synic_alloc() returns -ENOMEM to vmbus_bus_init(); next,
we do "goto err_alloc;", i.e. hv_synic_free() is not called. We can post a
separate patch for this.

Thanks,
Dexuan