2023-04-22 02:20:53

by Dexuan Cui

Subject: [PATCH v5 0/6] Support TDX guests on Hyper-V

The patchset adds the Hyper-V specific code so that a TDX guest can run
on Hyper-V. Please review.

This v5 patchset is based on Michael Kelley's v7 DDA patchset:
https://github.com/kelleymh/linux/commits/v7
Some of Michael's patches are in tip.git, and the others are in the Hyper-V tree.

This v5 patchset addressed the comments from Kirill and Michael:
1. Added Michael's Reviewed-by to all the 6 patches except for patch 5.
2. Added Kirill's Signed-off-by to patch 2.
3. Improved the error handling in hv_synic_alloc() and hv_synic_free().

Please see each patch's log message for the changes.

@x86 maintainers: Can you please take patches 1 and 2? They can apply
cleanly to the tip.git tree's branch x86/tdx.

@Michael Kelley: Can you please review patch 5?

@Wei Liu: Patches 3, 4, 5 and 6 need to go through the Hyper-V tree's
hyperv-next branch because they're based on Michael Kelley's DDA
patches. These 4 patches apply cleanly to hyperv-next.

If you want to view the patches on github, they are in this branch:
https://github.com/dcui/tdx/commits/decui/michaelv7dda/tdx/v5

FYI, v1-v4 are here:
https://lwn.net/ml/linux-kernel/[email protected]/
https://lwn.net/ml/linux-kernel/[email protected]/
https://lwn.net/ml/linux-kernel/[email protected]/
https://lwn.net/ml/linux-kernel/[email protected]/

Thanks,
Dexuan

Dexuan Cui (6):
x86/tdx: Retry TDVMCALL_MAP_GPA() when needed
x86/tdx: Support vmalloc() for tdx_enc_status_changed()
x86/hyperv: Add hv_isolation_type_tdx() to detect TDX guests
x86/hyperv: Support hypercalls for TDX guests
Drivers: hv: vmbus: Support TDX guests
x86/hyperv: Fix serial console interrupts for TDX guests

arch/x86/coco/tdx/tdx.c | 122 ++++++++++++++++++++++-------
arch/x86/hyperv/hv_apic.c | 6 +-
arch/x86/hyperv/hv_init.c | 27 ++++++-
arch/x86/hyperv/ivm.c | 20 +++++
arch/x86/include/asm/hyperv-tlfs.h | 3 +-
arch/x86/include/asm/mshyperv.h | 20 +++++
arch/x86/kernel/cpu/mshyperv.c | 43 ++++++++++
drivers/hv/hv.c | 54 ++++++++++++-
drivers/hv/hv_common.c | 30 +++++++
include/asm-generic/mshyperv.h | 1 +
10 files changed, 289 insertions(+), 37 deletions(-)

--
2.25.1


2023-04-22 02:20:53

by Dexuan Cui

Subject: [PATCH v5 1/6] x86/tdx: Retry TDVMCALL_MAP_GPA() when needed

The GHCI spec for TDX 1.0 says that the MapGPA call may fail with the R10
error code = TDG.VP.VMCALL_RETRY (1), in which case the guest must retry
the operation for the pages in the region starting at the GPA specified
in R11.

When a TDX guest runs on Hyper-V, Hyper-V returns the retry error
when hyperv_init() -> swiotlb_update_mem_attributes() ->
set_memory_decrypted() decrypts up to 1GB of swiotlb bounce buffers.
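A minimal user-space sketch of the retry rule above, for illustration only. struct fake_vmm and fake_map_gpa() are hypothetical stand-ins for the MapGPA TDVMCALL (they are not kernel or GHCI APIs), and *fail_paddr plays the role of R11. The loop is modeled on tdx_map_gpa() in the diff below: the restart address must lie inside [start, end), forward progress resets the retry counter, and a VMM that keeps returning the same address is given up on after max_retry_cnt retries.

```c
#include <stdbool.h>
#include <stdint.h>

#define MAPGPA_SUCCESS 0
#define MAPGPA_RETRY   1 /* TDG.VP.VMCALL_RETRY, per the GHCI spec */

/* Hypothetical VMM model: converts at most 'chunk' bytes per call and
 * answers the first 'stalls' calls with RETRY without making progress. */
struct fake_vmm {
	uint64_t chunk;
	int stalls;
};

/* Stand-in for MapGPA: on RETRY, *fail_paddr plays the role of R11. */
static uint64_t fake_map_gpa(struct fake_vmm *vmm, uint64_t start,
			     uint64_t end, uint64_t *fail_paddr)
{
	if (vmm->stalls > 0) {
		vmm->stalls--;
		*fail_paddr = start;		/* no progress made */
		return MAPGPA_RETRY;
	}
	if (end - start <= vmm->chunk)
		return MAPGPA_SUCCESS;
	*fail_paddr = start + vmm->chunk;	/* partial progress */
	return MAPGPA_RETRY;
}

static bool map_gpa_with_retry(struct fake_vmm *vmm, uint64_t start,
			       uint64_t end, int max_retry_cnt)
{
	int retry_cnt = 0;

	for (;;) {
		uint64_t fail_paddr = 0;
		uint64_t ret = fake_map_gpa(vmm, start, end, &fail_paddr);

		if (ret != MAPGPA_RETRY)
			return ret == MAPGPA_SUCCESS;

		/* Reject a restart address outside the requested range. */
		if (fail_paddr < start || fail_paddr >= end)
			return false;

		if (fail_paddr == start) {
			/* No progress: allow only a bounded number of retries. */
			if (++retry_cnt > max_retry_cnt)
				return false;
		} else {
			/* Progress: resume from R11 and reset the counter. */
			retry_cnt = 0;
			start = fail_paddr;
		}
	}
}
```

With max_retry_cnt = 3 (the value used since v3), a VMM that stalls three times in a row and then succeeds is tolerated, while a fourth consecutive stall fails the conversion.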

Acked-by: Kirill A. Shutemov <[email protected]>
Reviewed-by: Michael Kelley <[email protected]>
Signed-off-by: Dexuan Cui <[email protected]>
---
arch/x86/coco/tdx/tdx.c | 64 +++++++++++++++++++++++++++++++++--------
1 file changed, 52 insertions(+), 12 deletions(-)

Changes in v2:
Used __tdx_hypercall() directly in tdx_map_gpa().
Added a max_retry_cnt of 1000.
Renamed a few variables, e.g., r11 -> map_fail_paddr.

Changes in v3:
Changed max_retry_cnt from 1000 to 3.

Changes in v4:
__tdx_hypercall(&args, TDX_HCALL_HAS_OUTPUT) -> __tdx_hypercall_ret()
Added Kirill's Acked-by.

Changes in v5:
Added Michael's Reviewed-by.

diff --git a/arch/x86/coco/tdx/tdx.c b/arch/x86/coco/tdx/tdx.c
index 4c4c6db39eca..5574c91541a2 100644
--- a/arch/x86/coco/tdx/tdx.c
+++ b/arch/x86/coco/tdx/tdx.c
@@ -28,6 +28,8 @@
#define TDVMCALL_MAP_GPA 0x10001
#define TDVMCALL_REPORT_FATAL_ERROR 0x10003

+#define TDVMCALL_STATUS_RETRY 1
+
/* MMIO direction */
#define EPT_READ 0
#define EPT_WRITE 1
@@ -788,14 +790,15 @@ static bool try_accept_one(phys_addr_t *start, unsigned long len,
}

/*
- * Inform the VMM of the guest's intent for this physical page: shared with
- * the VMM or private to the guest. The VMM is expected to change its mapping
- * of the page in response.
+ * Notify the VMM about page mapping conversion. More info about ABI
+ * can be found in TDX Guest-Host-Communication Interface (GHCI),
+ * section "TDG.VP.VMCALL<MapGPA>".
*/
-static bool tdx_enc_status_changed(unsigned long vaddr, int numpages, bool enc)
+static bool tdx_map_gpa(phys_addr_t start, phys_addr_t end, bool enc)
{
- phys_addr_t start = __pa(vaddr);
- phys_addr_t end = __pa(vaddr + numpages * PAGE_SIZE);
+ int max_retry_cnt = 3, retry_cnt = 0;
+ struct tdx_hypercall_args args;
+ u64 map_fail_paddr, ret;

if (!enc) {
/* Set the shared (decrypted) bits: */
@@ -803,12 +806,49 @@ static bool tdx_enc_status_changed(unsigned long vaddr, int numpages, bool enc)
end |= cc_mkdec(0);
}

- /*
- * Notify the VMM about page mapping conversion. More info about ABI
- * can be found in TDX Guest-Host-Communication Interface (GHCI),
- * section "TDG.VP.VMCALL<MapGPA>"
- */
- if (_tdx_hypercall(TDVMCALL_MAP_GPA, start, end - start, 0, 0))
+ while (1) {
+ memset(&args, 0, sizeof(args));
+ args.r10 = TDX_HYPERCALL_STANDARD;
+ args.r11 = TDVMCALL_MAP_GPA;
+ args.r12 = start;
+ args.r13 = end - start;
+
+ ret = __tdx_hypercall_ret(&args);
+ if (ret != TDVMCALL_STATUS_RETRY)
+ break;
+ /*
+ * The guest must retry the operation for the pages in the
+ * region starting at the GPA specified in R11. Make sure R11
+ * contains a sane value.
+ */
+ map_fail_paddr = args.r11;
+ if (map_fail_paddr < start || map_fail_paddr >= end)
+ return false;
+
+ if (map_fail_paddr == start) {
+ retry_cnt++;
+ if (retry_cnt > max_retry_cnt)
+ return false;
+ } else {
+ retry_cnt = 0;
+ start = map_fail_paddr;
+ }
+ }
+
+ return !ret;
+}
+
+/*
+ * Inform the VMM of the guest's intent for this physical page: shared with
+ * the VMM or private to the guest. The VMM is expected to change its mapping
+ * of the page in response.
+ */
+static bool tdx_enc_status_changed(unsigned long vaddr, int numpages, bool enc)
+{
+ phys_addr_t start = __pa(vaddr);
+ phys_addr_t end = __pa(vaddr + numpages * PAGE_SIZE);
+
+ if (!tdx_map_gpa(start, end, enc))
return false;

/* private->shared conversion requires only MapGPA call */
--
2.25.1

2023-04-22 02:20:53

by Dexuan Cui

Subject: [PATCH v5 3/6] x86/hyperv: Add hv_isolation_type_tdx() to detect TDX guests

No logic change to SNP/VBS guests.

hv_isolation_type_tdx() will be used to instruct a TDX guest on Hyper-V to
do some TDX-specific operations, e.g. hv_do_hypercall() should use
__tdx_hypercall(), and a TDX guest on Hyper-V should handle the Hyper-V
Event/Message/Monitor pages specially.
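As a rough illustration of how the new predicate is meant to be consumed, here is a user-space model. Plain booleans stand in for the isolation_type_snp/isolation_type_tdx static keys, and sketch_init_platform(), enum hc_path and hypercall_path() are hypothetical names; only the enum values and the SNP/TDX selection follow the patch. In the kernel, builds without the x86 implementation get the __weak hv_isolation_type_tdx() in hv_common.c, which always returns false.

```c
#include <stdbool.h>

/* Values as in enum hv_isolation_type in hyperv-tlfs.h. */
enum hv_isolation_type {
	HV_ISOLATION_TYPE_NONE = 0,
	HV_ISOLATION_TYPE_VBS = 1,
	HV_ISOLATION_TYPE_SNP = 2,
	HV_ISOLATION_TYPE_TDX = 3
};

/* Hypothetical: which hypercall mechanism the guest would use. */
enum hc_path { HC_PATH_HYPERCALL_PG, HC_PATH_GHCI };

/* Plain booleans stand in for the static keys. */
static bool isolation_type_snp;
static bool isolation_type_tdx;

/* Models the selection done in ms_hyperv_init_platform(). */
static void sketch_init_platform(enum hv_isolation_type type)
{
	isolation_type_snp = false;
	isolation_type_tdx = false;
	if (type == HV_ISOLATION_TYPE_SNP)
		isolation_type_snp = true;
	else if (type == HV_ISOLATION_TYPE_TDX)
		isolation_type_tdx = true;
}

static bool hv_isolation_type_snp(void) { return isolation_type_snp; }
static bool hv_isolation_type_tdx(void) { return isolation_type_tdx; }

/* A TDX guest uses the GHCI call rather than hv_hypercall_pg. */
static enum hc_path hypercall_path(void)
{
	return hv_isolation_type_tdx() ? HC_PATH_GHCI : HC_PATH_HYPERCALL_PG;
}
```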

Reviewed-by: Kuppuswamy Sathyanarayanan <[email protected]>
Reviewed-by: Michael Kelley <[email protected]>
Signed-off-by: Dexuan Cui <[email protected]>
---
arch/x86/hyperv/ivm.c | 6 ++++++
arch/x86/include/asm/hyperv-tlfs.h | 3 ++-
arch/x86/include/asm/mshyperv.h | 3 +++
arch/x86/kernel/cpu/mshyperv.c | 2 ++
drivers/hv/hv_common.c | 6 ++++++
include/asm-generic/mshyperv.h | 1 +
6 files changed, 20 insertions(+), 1 deletion(-)

Changes in v2:
Added "#ifdef CONFIG_INTEL_TDX_GUEST and #endif" for
hv_isolation_type_tdx() in arch/x86/hyperv/ivm.c.

Simplified the changes in ms_hyperv_init_platform().

Changes in v3:
Added Kuppuswamy's Reviewed-by.

Changes in v4:
A minor rebase to Michael's v7 DDA patchset.

Changes in v5:
Added Michael's Reviewed-by.

diff --git a/arch/x86/hyperv/ivm.c b/arch/x86/hyperv/ivm.c
index 127d5b7b63de..3658ade4f412 100644
--- a/arch/x86/hyperv/ivm.c
+++ b/arch/x86/hyperv/ivm.c
@@ -400,6 +400,7 @@ bool hv_is_isolation_supported(void)
}

DEFINE_STATIC_KEY_FALSE(isolation_type_snp);
+DEFINE_STATIC_KEY_FALSE(isolation_type_tdx);

/*
* hv_isolation_type_snp - Check system runs in the AMD SEV-SNP based
@@ -409,3 +410,8 @@ bool hv_isolation_type_snp(void)
{
return static_branch_unlikely(&isolation_type_snp);
}
+
+bool hv_isolation_type_tdx(void)
+{
+ return static_branch_unlikely(&isolation_type_tdx);
+}
diff --git a/arch/x86/include/asm/hyperv-tlfs.h b/arch/x86/include/asm/hyperv-tlfs.h
index b4fb75bd1013..338f383c721c 100644
--- a/arch/x86/include/asm/hyperv-tlfs.h
+++ b/arch/x86/include/asm/hyperv-tlfs.h
@@ -169,7 +169,8 @@
enum hv_isolation_type {
HV_ISOLATION_TYPE_NONE = 0,
HV_ISOLATION_TYPE_VBS = 1,
- HV_ISOLATION_TYPE_SNP = 2
+ HV_ISOLATION_TYPE_SNP = 2,
+ HV_ISOLATION_TYPE_TDX = 3
};

/* Hyper-V specific model specific registers (MSRs) */
diff --git a/arch/x86/include/asm/mshyperv.h b/arch/x86/include/asm/mshyperv.h
index e3cef98a0142..de7ceae9e65e 100644
--- a/arch/x86/include/asm/mshyperv.h
+++ b/arch/x86/include/asm/mshyperv.h
@@ -22,6 +22,7 @@
union hv_ghcb;

DECLARE_STATIC_KEY_FALSE(isolation_type_snp);
+DECLARE_STATIC_KEY_FALSE(isolation_type_tdx);

typedef int (*hyperv_fill_flush_list_func)(
struct hv_guest_mapping_flush_list *flush,
@@ -38,6 +39,8 @@ extern u64 hv_current_partition_id;

extern union hv_ghcb * __percpu *hv_ghcb_pg;

+extern bool hv_isolation_type_tdx(void);
+
int hv_call_deposit_pages(int node, u64 partition_id, u32 num_pages);
int hv_call_add_logical_proc(int node, u32 lp_index, u32 acpi_id);
int hv_call_create_vp(int node, u64 partition_id, u32 vp_index, u32 flags);
diff --git a/arch/x86/kernel/cpu/mshyperv.c b/arch/x86/kernel/cpu/mshyperv.c
index ff348ebb6ae2..a87fb934cd4b 100644
--- a/arch/x86/kernel/cpu/mshyperv.c
+++ b/arch/x86/kernel/cpu/mshyperv.c
@@ -405,6 +405,8 @@ static void __init ms_hyperv_init_platform(void)

if (hv_get_isolation_type() == HV_ISOLATION_TYPE_SNP)
static_branch_enable(&isolation_type_snp);
+ else if (hv_get_isolation_type() == HV_ISOLATION_TYPE_TDX)
+ static_branch_enable(&isolation_type_tdx);
}

if (hv_max_functions_eax >= HYPERV_CPUID_NESTED_FEATURES) {
diff --git a/drivers/hv/hv_common.c b/drivers/hv/hv_common.c
index 6d40b6c7b23b..c55db7ea6580 100644
--- a/drivers/hv/hv_common.c
+++ b/drivers/hv/hv_common.c
@@ -271,6 +271,12 @@ bool __weak hv_isolation_type_snp(void)
}
EXPORT_SYMBOL_GPL(hv_isolation_type_snp);

+bool __weak hv_isolation_type_tdx(void)
+{
+ return false;
+}
+EXPORT_SYMBOL_GPL(hv_isolation_type_tdx);
+
void __weak hv_setup_vmbus_handler(void (*handler)(void))
{
}
diff --git a/include/asm-generic/mshyperv.h b/include/asm-generic/mshyperv.h
index afcd9ae9588c..83e56ebe0cb7 100644
--- a/include/asm-generic/mshyperv.h
+++ b/include/asm-generic/mshyperv.h
@@ -58,6 +58,7 @@ extern void * __percpu *hyperv_pcpu_output_arg;
extern u64 hv_do_hypercall(u64 control, void *inputaddr, void *outputaddr);
extern u64 hv_do_fast_hypercall8(u16 control, u64 input8);
extern bool hv_isolation_type_snp(void);
+extern bool hv_isolation_type_tdx(void);

/* Helper functions that provide a consistent pattern for checking Hyper-V hypercall status. */
static inline int hv_result(u64 status)
--
2.25.1

2023-04-22 02:21:03

by Dexuan Cui

Subject: [PATCH v5 4/6] x86/hyperv: Support hypercalls for TDX guests

A TDX guest uses the GHCI call rather than hv_hypercall_pg.

In hv_do_hypercall(), Hyper-V requires that the input/output addresses
have the cc_mask bit set, so they are passed through cc_mkdec().
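A hedged sketch of what the cc_mask requirement means for the register values. It assumes, as the TDX guest code does, that the shared bit is GPA bit (GPA width - 1), e.g. bit 51 for a 52-bit GPA width, and that cc_mkdec() on TDX sets that bit. struct sketch_args is a hypothetical subset of struct tdx_hypercall_args holding only the registers hv_tdx_hypercall() fills in: the control word in R10, the input GPA in RDX and the output GPA in R8 (the Hyper-V status comes back in R11).

```c
#include <stdint.h>

/* Assumption: the TDX shared bit is GPA bit (gpa_width - 1). */
static uint64_t sketch_cc_mask(unsigned int gpa_width)
{
	return 1ULL << (gpa_width - 1);
}

/* TDX flavour of cc_mkdec(): set the shared bit. */
static uint64_t sketch_cc_mkdec(uint64_t gpa, uint64_t cc_mask)
{
	return gpa | cc_mask;
}

/* Hypothetical subset of struct tdx_hypercall_args. */
struct sketch_args {
	uint64_t r10;	/* Hyper-V hypercall control word */
	uint64_t rdx;	/* input GPA */
	uint64_t r8;	/* output GPA */
};

/* Mirrors how hv_do_hypercall() fills the registers for a TDX guest;
 * input_pa/output_pa are 0 when there is no input/output page. */
static struct sketch_args pack_hv_do_hypercall(uint64_t control,
					       uint64_t input_pa,
					       uint64_t output_pa,
					       uint64_t cc_mask)
{
	struct sketch_args args = {
		.r10 = control,
		.rdx = sketch_cc_mkdec(input_pa, cc_mask),
		.r8  = sketch_cc_mkdec(output_pa, cc_mask),
	};
	return args;
}
```

Note that a missing page (address 0) still becomes cc_mkdec(0), i.e. just the shared bit; per the comment added above hv_do_hypercall(), the hypervisor ignores the corresponding GPA when the hypercall has no input or output parameters.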

Reviewed-by: Kuppuswamy Sathyanarayanan <[email protected]>
Reviewed-by: Michael Kelley <[email protected]>
Signed-off-by: Dexuan Cui <[email protected]>
---
arch/x86/hyperv/hv_init.c | 8 ++++++++
arch/x86/hyperv/ivm.c | 14 ++++++++++++++
arch/x86/include/asm/mshyperv.h | 17 +++++++++++++++++
drivers/hv/hv_common.c | 24 ++++++++++++++++++++++++
4 files changed, 63 insertions(+)

Changes in v2:
Implemented hv_tdx_hypercall() in C rather than in assembly code.
Renamed the parameter names of hv_tdx_hypercall().
Used cc_mkdec() directly in hv_do_hypercall().

Changes in v3:
Decrypted/encrypted hyperv_pcpu_input_arg in
hv_common_cpu_init() and hv_common_cpu_die().

Changes in v4:
__tdx_hypercall(&args, TDX_HCALL_HAS_OUTPUT) -> __tdx_hypercall_ret()
hv_common_cpu_die(): explicitly ignore the error from set_memory_encrypted() [Michael Kelley]
Added Sathyanarayanan's Reviewed-by.

Changes in v5:
Added Michael's Reviewed-by.

diff --git a/arch/x86/hyperv/hv_init.c b/arch/x86/hyperv/hv_init.c
index a5f9474f08e1..f175e0de821c 100644
--- a/arch/x86/hyperv/hv_init.c
+++ b/arch/x86/hyperv/hv_init.c
@@ -432,6 +432,10 @@ void __init hyperv_init(void)
/* Hyper-V requires to write guest os id via ghcb in SNP IVM. */
hv_ghcb_msr_write(HV_X64_MSR_GUEST_OS_ID, guest_id);

+ /* A TDX guest uses the GHCI call rather than hv_hypercall_pg. */
+ if (hv_isolation_type_tdx())
+ goto skip_hypercall_pg_init;
+
hv_hypercall_pg = __vmalloc_node_range(PAGE_SIZE, 1, VMALLOC_START,
VMALLOC_END, GFP_KERNEL, PAGE_KERNEL_ROX,
VM_FLUSH_RESET_PERMS, NUMA_NO_NODE,
@@ -471,6 +475,7 @@ void __init hyperv_init(void)
wrmsrl(HV_X64_MSR_HYPERCALL, hypercall_msr.as_uint64);
}

+skip_hypercall_pg_init:
/*
* hyperv_init() is called before LAPIC is initialized: see
* apic_intr_mode_init() -> x86_platform.apic_post_init() and
@@ -594,6 +599,9 @@ bool hv_is_hyperv_initialized(void)
if (x86_hyper_type != X86_HYPER_MS_HYPERV)
return false;

+ /* A TDX guest uses the GHCI call rather than hv_hypercall_pg. */
+ if (hv_isolation_type_tdx())
+ return true;
/*
* Verify that earlier initialization succeeded by checking
* that the hypercall page is setup
diff --git a/arch/x86/hyperv/ivm.c b/arch/x86/hyperv/ivm.c
index 3658ade4f412..23304c9ddd34 100644
--- a/arch/x86/hyperv/ivm.c
+++ b/arch/x86/hyperv/ivm.c
@@ -415,3 +415,17 @@ bool hv_isolation_type_tdx(void)
{
return static_branch_unlikely(&isolation_type_tdx);
}
+
+u64 hv_tdx_hypercall(u64 control, u64 param1, u64 param2)
+{
+ struct tdx_hypercall_args args = { };
+
+ args.r10 = control;
+ args.rdx = param1;
+ args.r8 = param2;
+
+ (void)__tdx_hypercall_ret(&args);
+
+ return args.r11;
+}
+EXPORT_SYMBOL_GPL(hv_tdx_hypercall);
diff --git a/arch/x86/include/asm/mshyperv.h b/arch/x86/include/asm/mshyperv.h
index de7ceae9e65e..71077326f57b 100644
--- a/arch/x86/include/asm/mshyperv.h
+++ b/arch/x86/include/asm/mshyperv.h
@@ -10,6 +10,7 @@
#include <asm/nospec-branch.h>
#include <asm/paravirt.h>
#include <asm/mshyperv.h>
+#include <asm/coco.h>

/*
* Hyper-V always provides a single IO-APIC at this MMIO address.
@@ -45,6 +46,12 @@ int hv_call_deposit_pages(int node, u64 partition_id, u32 num_pages);
int hv_call_add_logical_proc(int node, u32 lp_index, u32 acpi_id);
int hv_call_create_vp(int node, u64 partition_id, u32 vp_index, u32 flags);

+u64 hv_tdx_hypercall(u64 control, u64 param1, u64 param2);
+
+/*
+ * If the hypercall involves no input or output parameters, the hypervisor
+ * ignores the corresponding GPA pointer.
+ */
static inline u64 hv_do_hypercall(u64 control, void *input, void *output)
{
u64 input_address = input ? virt_to_phys(input) : 0;
@@ -52,6 +59,10 @@ static inline u64 hv_do_hypercall(u64 control, void *input, void *output)
u64 hv_status;

#ifdef CONFIG_X86_64
+ if (hv_isolation_type_tdx())
+ return hv_tdx_hypercall(control,
+ cc_mkdec(input_address),
+ cc_mkdec(output_address));
if (!hv_hypercall_pg)
return U64_MAX;

@@ -95,6 +106,9 @@ static inline u64 _hv_do_fast_hypercall8(u64 control, u64 input1)
u64 hv_status;

#ifdef CONFIG_X86_64
+ if (hv_isolation_type_tdx())
+ return hv_tdx_hypercall(control, input1, 0);
+
{
__asm__ __volatile__(CALL_NOSPEC
: "=a" (hv_status), ASM_CALL_CONSTRAINT,
@@ -140,6 +154,9 @@ static inline u64 _hv_do_fast_hypercall16(u64 control, u64 input1, u64 input2)
u64 hv_status;

#ifdef CONFIG_X86_64
+ if (hv_isolation_type_tdx())
+ return hv_tdx_hypercall(control, input1, input2);
+
{
__asm__ __volatile__("mov %4, %%r8\n"
CALL_NOSPEC
diff --git a/drivers/hv/hv_common.c b/drivers/hv/hv_common.c
index c55db7ea6580..10e85682e83e 100644
--- a/drivers/hv/hv_common.c
+++ b/drivers/hv/hv_common.c
@@ -21,6 +21,7 @@
#include <linux/ptrace.h>
#include <linux/slab.h>
#include <linux/dma-map-ops.h>
+#include <linux/set_memory.h>
#include <asm/hyperv-tlfs.h>
#include <asm/mshyperv.h>

@@ -128,6 +129,7 @@ int hv_common_cpu_init(unsigned int cpu)
u64 msr_vp_index;
gfp_t flags;
int pgcount = hv_root_partition ? 2 : 1;
+ int ret;

/* hv_cpu_init() can be called with IRQs disabled from hv_resume() */
flags = irqs_disabled() ? GFP_ATOMIC : GFP_KERNEL;
@@ -137,6 +139,17 @@ int hv_common_cpu_init(unsigned int cpu)
if (!(*inputarg))
return -ENOMEM;

+ if (hv_isolation_type_tdx()) {
+ ret = set_memory_decrypted((unsigned long)*inputarg, pgcount);
+ if (ret) {
+ /* It may be unsafe to free *inputarg */
+ *inputarg = NULL;
+ return ret;
+ }
+
+ memset(*inputarg, 0x00, pgcount * HV_HYP_PAGE_SIZE);
+ }
+
if (hv_root_partition) {
outputarg = (void **)this_cpu_ptr(hyperv_pcpu_output_arg);
*outputarg = (char *)(*inputarg) + HV_HYP_PAGE_SIZE;
@@ -157,6 +170,8 @@ int hv_common_cpu_die(unsigned int cpu)
unsigned long flags;
void **inputarg, **outputarg;
void *mem;
+ int pgcount = hv_root_partition ? 2 : 1;
+ int ret;

local_irq_save(flags);

@@ -171,6 +186,15 @@ int hv_common_cpu_die(unsigned int cpu)

local_irq_restore(flags);

+ if (hv_isolation_type_tdx()) {
+ ret = set_memory_encrypted((unsigned long)mem, pgcount);
+ if (ret)
+ pr_warn("Hyper-V: Failed to encrypt input arg on cpu%d: %d\n",
+ cpu, ret);
+ /* It's unsafe to free 'mem'. */
+ return 0;
+ }
+
kfree(mem);

return 0;
--
2.25.1

2023-04-22 02:21:05

by Dexuan Cui

Subject: [PATCH v5 2/6] x86/tdx: Support vmalloc() for tdx_enc_status_changed()

When a TDX guest runs on Hyper-V, the hv_netvsc driver's netvsc_init_buf()
allocates buffers using vzalloc(), and needs to share the buffers with the
host OS by calling set_memory_decrypted(), which does not yet work for
vmalloc() addresses. Add the support by handling the pages one by one.
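To illustrate why a single __pa(start)..__pa(end) range cannot describe a vmalloc() buffer, here is a user-space sketch. page_pa[] is a hypothetical stand-in for slow_virt_to_phys() on each page, and record_conversion() for tdx_enc_status_changed_phys(); only the structure (page-alignment check first, then one single-page conversion per page) follows the patch.

```c
#include <stdbool.h>
#include <stdint.h>

#define SKETCH_PAGE_SIZE 4096UL

/* Hypothetical recorder standing in for tdx_enc_status_changed_phys(). */
struct conv_log {
	int calls;
	uint64_t last_start, last_end;
};

static bool record_conversion(struct conv_log *log, uint64_t start_pa,
			      uint64_t end_pa)
{
	log->calls++;
	log->last_start = start_pa;
	log->last_end = end_pa;
	return true;
}

/*
 * page_pa[i] stands in for slow_virt_to_phys(vaddr + i * PAGE_SIZE).
 * vmalloc() pages need not be physically contiguous, so each page is
 * converted as its own one-page physical range.
 */
static bool convert_vmalloc_range(uint64_t vaddr, const uint64_t *page_pa,
				  int numpages, struct conv_log *log)
{
	if (vaddr % SKETCH_PAGE_SIZE != 0)	/* offset_in_page() check */
		return false;

	for (int i = 0; i < numpages; i++) {
		if (!record_conversion(log, page_pa[i],
				       page_pa[i] + SKETCH_PAGE_SIZE))
			return false;
	}
	return true;
}
```

In the usage below the three pages sit at unrelated physical addresses, so three separate conversions are issued.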

Co-developed-by: Kirill A. Shutemov <[email protected]>
Signed-off-by: Kirill A. Shutemov <[email protected]>
Reviewed-by: Michael Kelley <[email protected]>
Signed-off-by: Dexuan Cui <[email protected]>
---
arch/x86/coco/tdx/tdx.c | 76 ++++++++++++++++++++++++++++-------------
1 file changed, 52 insertions(+), 24 deletions(-)

Changes in v2:
Changed tdx_enc_status_changed() in place.

Hi, Dave, I checked the huge vmalloc mapping code, but still don't know
how to get the underlying huge page info (if huge page is in use) and
try to use PG_LEVEL_2M/1G in try_accept_page() for vmalloc: I checked
is_vm_area_hugepages() and __vfree() -> __vunmap(), and I think the
underlying page allocation info is internal to the mm code, and there
is no mm API for me to get the info in tdx_enc_status_changed().

Changes in v3:
No change since v2.

Changes in v4:
Added Kirill's Co-developed-by since Kirill helped to improve the
code by adding tdx_enc_status_changed_phys().

Thanks Kirill for the clarification on load_unaligned_zeropad()!

The vzalloc() usage in drivers/net/hyperv/netvsc.c: netvsc_init_buf()
remains the same. It may not be worth it to "allocate a vmalloc region,
allocate pages manually", because we have to consider the worst case
where the system is suffering from severe memory fragmentation and
we can only allocate multiple single pages. We may not want to
complicate the code in netvsc_init_buf(). We'll support NIC SR-IOV
for TDX VMs on Hyper-V, so the netvsc send/recv buffers won't be
used when the VF NIC is up.

Changes in v5:
Added Kirill's Signed-off-by.
Added Michael's Reviewed-by.

diff --git a/arch/x86/coco/tdx/tdx.c b/arch/x86/coco/tdx/tdx.c
index 5574c91541a2..731be50b3d09 100644
--- a/arch/x86/coco/tdx/tdx.c
+++ b/arch/x86/coco/tdx/tdx.c
@@ -7,6 +7,7 @@
#include <linux/cpufeature.h>
#include <linux/export.h>
#include <linux/io.h>
+#include <linux/mm.h>
#include <asm/coco.h>
#include <asm/tdx.h>
#include <asm/vmx.h>
@@ -789,6 +790,34 @@ static bool try_accept_one(phys_addr_t *start, unsigned long len,
return true;
}

+static bool try_accept_page(phys_addr_t start, phys_addr_t end)
+{
+ /*
+ * For shared->private conversion, accept the page using
+ * TDX_ACCEPT_PAGE TDX module call.
+ */
+ while (start < end) {
+ unsigned long len = end - start;
+
+ /*
+ * Try larger accepts first. It gives chance to VMM to keep
+ * 1G/2M SEPT entries where possible and speeds up process by
+ * cutting number of hypercalls (if successful).
+ */
+
+ if (try_accept_one(&start, len, PG_LEVEL_1G))
+ continue;
+
+ if (try_accept_one(&start, len, PG_LEVEL_2M))
+ continue;
+
+ if (!try_accept_one(&start, len, PG_LEVEL_4K))
+ return false;
+ }
+
+ return true;
+}
+
/*
* Notify the VMM about page mapping conversion. More info about ABI
* can be found in TDX Guest-Host-Communication Interface (GHCI),
@@ -838,6 +867,19 @@ static bool tdx_map_gpa(phys_addr_t start, phys_addr_t end, bool enc)
return !ret;
}

+static bool tdx_enc_status_changed_phys(phys_addr_t start, phys_addr_t end,
+ bool enc)
+{
+ if (!tdx_map_gpa(start, end, enc))
+ return false;
+
+ /* private->shared conversion requires only MapGPA call */
+ if (!enc)
+ return true;
+
+ return try_accept_page(start, end);
+}
+
/*
* Inform the VMM of the guest's intent for this physical page: shared with
* the VMM or private to the guest. The VMM is expected to change its mapping
@@ -845,37 +887,23 @@ static bool tdx_map_gpa(phys_addr_t start, phys_addr_t end, bool enc)
*/
static bool tdx_enc_status_changed(unsigned long vaddr, int numpages, bool enc)
{
- phys_addr_t start = __pa(vaddr);
- phys_addr_t end = __pa(vaddr + numpages * PAGE_SIZE);
+ unsigned long start = vaddr;
+ unsigned long end = start + numpages * PAGE_SIZE;

- if (!tdx_map_gpa(start, end, enc))
+ if (offset_in_page(start) != 0)
return false;

- /* private->shared conversion requires only MapGPA call */
- if (!enc)
- return true;
+ if (!is_vmalloc_addr((void *)start))
+ return tdx_enc_status_changed_phys(__pa(start), __pa(end), enc);

- /*
- * For shared->private conversion, accept the page using
- * TDX_ACCEPT_PAGE TDX module call.
- */
while (start < end) {
- unsigned long len = end - start;
+ phys_addr_t start_pa = slow_virt_to_phys((void *)start);
+ phys_addr_t end_pa = start_pa + PAGE_SIZE;

- /*
- * Try larger accepts first. It gives chance to VMM to keep
- * 1G/2M SEPT entries where possible and speeds up process by
- * cutting number of hypercalls (if successful).
- */
-
- if (try_accept_one(&start, len, PG_LEVEL_1G))
- continue;
-
- if (try_accept_one(&start, len, PG_LEVEL_2M))
- continue;
-
- if (!try_accept_one(&start, len, PG_LEVEL_4K))
+ if (!tdx_enc_status_changed_phys(start_pa, end_pa, enc))
return false;
+
+ start += PAGE_SIZE;
}

return true;
--
2.25.1

2023-04-22 02:21:40

by Dexuan Cui

Subject: [PATCH v5 5/6] Drivers: hv: vmbus: Support TDX guests

Add Hyper-V specific code so that a TDX guest can run on Hyper-V:
No need to use hv_vp_assist_page.
Don't use the unsafe Hyper-V TSC page.
Don't try to use HV_REGISTER_CRASH_CTL.
Don't trust Hyper-V's TLB-flushing hypercalls.
Don't use lazy EOI.
Share SynIC Event/Message pages and VMBus Monitor pages with the host.
Use pgprot_decrypted(PAGE_KERNEL) in hv_ringbuffer_init().

Signed-off-by: Dexuan Cui <[email protected]>
---
arch/x86/hyperv/hv_apic.c | 6 ++--
arch/x86/hyperv/hv_init.c | 19 +++++++++---
arch/x86/kernel/cpu/mshyperv.c | 21 ++++++++++++-
drivers/hv/hv.c | 54 ++++++++++++++++++++++++++++++++--
4 files changed, 90 insertions(+), 10 deletions(-)

Changes in v2:
Used a new function hv_set_memory_enc_dec_needed() in
__set_memory_enc_pgtable().
Added the missing set_memory_encrypted() in hv_synic_free().

Changes in v3:
Use pgprot_decrypted(PAGE_KERNEL) in hv_ringbuffer_init().
(Do not use PAGE_KERNEL_NOENC, which doesn't exist for ARM64).

Used cc_mkdec() in hv_synic_enable_regs().

ms_hyperv_init_platform():
Explicitly do not use HV_X64_REMOTE_TLB_FLUSH_RECOMMENDED.
Explicitly do not use HV_X64_APIC_ACCESS_RECOMMENDED.

Enabled __send_ipi_mask() and __send_ipi_one() for TDX guests.

Changes in v4:
A minor rebase to Michael's v7 DDA patchset. I'm very happy that
I can drop my v3 change to arch/x86/mm/pat/set_memory.c due to
Michael's work.

Changes in v5:
Added memset() to clear synic_message_page and synic_event_page
after set_memory_decrypted().
Rebased the patch since "post_msg_page" has been removed in
hyperv-next.
Improved the error handling in hv_synic_alloc()/free() [Michael
Kelley]
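The error-handling rule used in hv_synic_alloc()/hv_synic_free() (leak a page rather than free it when its encryption state cannot be restored) can be modeled in user space as follows. Everything here is hypothetical: struct sketch_env simulates set_memory_decrypted()/set_memory_encrypted() failures and counts pages that are freed or deliberately leaked, and a caller-provided buffer stands in for the allocated page.

```c
#include <stdbool.h>
#include <string.h>

#define SKETCH_PAGE_SIZE 4096

struct sketch_env {
	bool fail_decrypt;	/* make the set_memory_decrypted() stand-in fail */
	bool fail_encrypt;	/* make the set_memory_encrypted() stand-in fail */
	int leaked;		/* pages deliberately dropped */
	int freed;		/* pages handed back (free_page() stand-in) */
};

struct sketch_page {
	void *mem;
	bool shared;
};

/* Alloc path: 'buf' stands in for a freshly allocated page. */
static int sketch_share_page(struct sketch_env *env, struct sketch_page *pg,
			     void *buf)
{
	pg->mem = buf;
	pg->shared = false;
	if (env->fail_decrypt) {
		/* Encryption state unknown: forget the page (leak it) so
		 * the free path cannot return it to the allocator. */
		pg->mem = NULL;
		env->leaked++;
		return -1;
	}
	pg->shared = true;
	/* Clear the page only after it has been converted to shared. */
	memset(pg->mem, 0, SKETCH_PAGE_SIZE);
	return 0;
}

/* Free path: re-encrypt first; if that fails, leak instead of freeing. */
static void sketch_unshare_and_free(struct sketch_env *env,
				    struct sketch_page *pg)
{
	if (!pg->mem)
		return;		/* never set up, or already leaked */
	if (pg->shared && env->fail_encrypt) {
		pg->mem = NULL;	/* still shared with the host */
		env->leaked++;
		return;
	}
	pg->mem = NULL;
	pg->shared = false;
	env->freed++;
}
```

The design choice being modeled is that a page of unknown or shared state must never be recycled by the allocator, so losing one page per failure is accepted as the safer outcome.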

diff --git a/arch/x86/hyperv/hv_apic.c b/arch/x86/hyperv/hv_apic.c
index fb8b2c088681..16919c7b3196 100644
--- a/arch/x86/hyperv/hv_apic.c
+++ b/arch/x86/hyperv/hv_apic.c
@@ -173,7 +173,8 @@ static bool __send_ipi_mask(const struct cpumask *mask, int vector,
(exclude_self && weight == 1 && cpumask_test_cpu(this_cpu, mask)))
return true;

- if (!hv_hypercall_pg)
+ /* A TDX guest doesn't use hv_hypercall_pg. */
+ if (!hv_isolation_type_tdx() && !hv_hypercall_pg)
return false;

if ((vector < HV_IPI_LOW_VECTOR) || (vector > HV_IPI_HIGH_VECTOR))
@@ -227,7 +228,8 @@ static bool __send_ipi_one(int cpu, int vector)

trace_hyperv_send_ipi_one(cpu, vector);

- if (!hv_hypercall_pg || (vp == VP_INVAL))
+ /* A TDX guest doesn't use hv_hypercall_pg. */
+ if ((!hv_isolation_type_tdx() && !hv_hypercall_pg) || (vp == VP_INVAL))
return false;

if ((vector < HV_IPI_LOW_VECTOR) || (vector > HV_IPI_HIGH_VECTOR))
diff --git a/arch/x86/hyperv/hv_init.c b/arch/x86/hyperv/hv_init.c
index f175e0de821c..f28357ecad7d 100644
--- a/arch/x86/hyperv/hv_init.c
+++ b/arch/x86/hyperv/hv_init.c
@@ -79,7 +79,7 @@ static int hyperv_init_ghcb(void)
static int hv_cpu_init(unsigned int cpu)
{
union hv_vp_assist_msr_contents msr = { 0 };
- struct hv_vp_assist_page **hvp = &hv_vp_assist_page[cpu];
+ struct hv_vp_assist_page **hvp;
int ret;

ret = hv_common_cpu_init(cpu);
@@ -89,6 +89,7 @@ static int hv_cpu_init(unsigned int cpu)
if (!hv_vp_assist_page)
return 0;

+ hvp = &hv_vp_assist_page[cpu];
if (hv_root_partition) {
/*
* For root partition we get the hypervisor provided VP assist
@@ -398,11 +399,21 @@ void __init hyperv_init(void)
if (hv_common_init())
return;

- hv_vp_assist_page = kcalloc(num_possible_cpus(),
- sizeof(*hv_vp_assist_page), GFP_KERNEL);
+ /*
+ * The VP assist page is useless to a TDX guest: the only use we
+ * would have for it is lazy EOI, which can not be used with TDX.
+ */
+ if (hv_isolation_type_tdx())
+ hv_vp_assist_page = NULL;
+ else
+ hv_vp_assist_page = kcalloc(num_possible_cpus(),
+ sizeof(*hv_vp_assist_page),
+ GFP_KERNEL);
if (!hv_vp_assist_page) {
ms_hyperv.hints &= ~HV_X64_ENLIGHTENED_VMCS_RECOMMENDED;
- goto common_free;
+
+ if (!hv_isolation_type_tdx())
+ goto common_free;
}

if (hv_isolation_type_snp()) {
diff --git a/arch/x86/kernel/cpu/mshyperv.c b/arch/x86/kernel/cpu/mshyperv.c
index a87fb934cd4b..e9106c9d92f8 100644
--- a/arch/x86/kernel/cpu/mshyperv.c
+++ b/arch/x86/kernel/cpu/mshyperv.c
@@ -405,8 +405,27 @@ static void __init ms_hyperv_init_platform(void)

if (hv_get_isolation_type() == HV_ISOLATION_TYPE_SNP)
static_branch_enable(&isolation_type_snp);
- else if (hv_get_isolation_type() == HV_ISOLATION_TYPE_TDX)
+ else if (hv_get_isolation_type() == HV_ISOLATION_TYPE_TDX) {
static_branch_enable(&isolation_type_tdx);
+
+ /*
+ * The GPAs of SynIC Event/Message pages and VMBus
+ * Monitor pages must have this offset added.
+ */
+ ms_hyperv.shared_gpa_boundary = cc_mkdec(0);
+
+ /* Don't use the unsafe Hyper-V TSC page */
+ ms_hyperv.features &= ~HV_MSR_REFERENCE_TSC_AVAILABLE;
+
+ /* HV_REGISTER_CRASH_CTL is unsupported */
+ ms_hyperv.misc_features &= ~HV_FEATURE_GUEST_CRASH_MSR_AVAILABLE;
+
+ /* Don't trust Hyper-V's TLB-flushing hypercalls */
+ ms_hyperv.hints &= ~HV_X64_REMOTE_TLB_FLUSH_RECOMMENDED;
+
+ /* A TDX VM must use x2APIC and doesn't use lazy EOI */
+ ms_hyperv.hints &= ~HV_X64_APIC_ACCESS_RECOMMENDED;
+ }
}

if (hv_max_functions_eax >= HYPERV_CPUID_NESTED_FEATURES) {
diff --git a/drivers/hv/hv.c b/drivers/hv/hv.c
index 4e1407d59ba0..fa7dce26ec67 100644
--- a/drivers/hv/hv.c
+++ b/drivers/hv/hv.c
@@ -18,6 +18,7 @@
#include <linux/clockchips.h>
#include <linux/delay.h>
#include <linux/interrupt.h>
+#include <linux/set_memory.h>
#include <clocksource/hyperv_timer.h>
#include <asm/mshyperv.h>
#include "hyperv_vmbus.h"
@@ -116,6 +117,7 @@ int hv_synic_alloc(void)
{
int cpu;
struct hv_per_cpu_context *hv_cpu;
+ int ret = -ENOMEM;

/*
* First, zero all per-cpu memory areas so hv_synic_free() can
@@ -159,6 +161,28 @@ int hv_synic_alloc(void)
goto err;
}
}
+
+ /* It's better to leak the page if the decryption fails. */
+ if (hv_isolation_type_tdx()) {
+ ret = set_memory_decrypted(
+ (unsigned long)hv_cpu->synic_message_page, 1);
+ if (ret) {
+ pr_err("Failed to decrypt SYNIC msg page\n");
+ hv_cpu->synic_message_page = NULL;
+ goto err;
+ }
+
+ ret = set_memory_decrypted(
+ (unsigned long)hv_cpu->synic_event_page, 1);
+ if (ret) {
+ pr_err("Failed to decrypt SYNIC event page\n");
+ hv_cpu->synic_event_page = NULL;
+ goto err;
+ }
+
+ memset(hv_cpu->synic_message_page, 0, PAGE_SIZE);
+ memset(hv_cpu->synic_event_page, 0, PAGE_SIZE);
+ }
}

return 0;
@@ -167,18 +191,40 @@ int hv_synic_alloc(void)
* Any memory allocations that succeeded will be freed when
* the caller cleans up by calling hv_synic_free()
*/
- return -ENOMEM;
+ return ret;
}


void hv_synic_free(void)
{
int cpu;
+ int ret;

for_each_present_cpu(cpu) {
struct hv_per_cpu_context *hv_cpu
= per_cpu_ptr(hv_context.cpu_context, cpu);

+ /* It's better to leak the page if the encryption fails. */
+ if (hv_isolation_type_tdx()) {
+ if (hv_cpu->synic_message_page) {
+ ret = set_memory_encrypted((unsigned long)
+ hv_cpu->synic_message_page, 1);
+ if (ret) {
+ pr_err("Failed to encrypt SYNIC msg page\n");
+ hv_cpu->synic_message_page = NULL;
+ }
+ }
+
+ if (hv_cpu->synic_event_page) {
+ ret = set_memory_encrypted((unsigned long)
+ hv_cpu->synic_event_page, 1);
+ if (ret) {
+ pr_err("Failed to encrypt SYNIC event page\n");
+ hv_cpu->synic_event_page = NULL;
+ }
+ }
+ }
+
free_page((unsigned long)hv_cpu->synic_event_page);
free_page((unsigned long)hv_cpu->synic_message_page);
}
@@ -215,7 +261,8 @@ void hv_synic_enable_regs(unsigned int cpu)
if (!hv_cpu->synic_message_page)
pr_err("Fail to map synic message page.\n");
} else {
- simp.base_simp_gpa = virt_to_phys(hv_cpu->synic_message_page)
+ simp.base_simp_gpa =
+ cc_mkdec(virt_to_phys(hv_cpu->synic_message_page))
>> HV_HYP_PAGE_SHIFT;
}

@@ -234,7 +281,8 @@ void hv_synic_enable_regs(unsigned int cpu)
if (!hv_cpu->synic_event_page)
pr_err("Fail to map synic event page.\n");
} else {
- siefp.base_siefp_gpa = virt_to_phys(hv_cpu->synic_event_page)
+ siefp.base_siefp_gpa =
+ cc_mkdec(virt_to_phys(hv_cpu->synic_event_page))
>> HV_HYP_PAGE_SHIFT;
}

--
2.25.1

2023-04-22 02:22:24

by Dexuan Cui

Subject: [PATCH v5 6/6] x86/hyperv: Fix serial console interrupts for TDX guests

When a TDX guest runs on Hyper-V, the UEFI firmware sets the HW_REDUCED
flag, and consequently ttyS0 interrupts can't work. Fix the issue by
overriding x86_init.acpi.reduced_hw_early_init().
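A compact user-space model of the override, as a sketch only: struct x86_init_model, struct legacy_pic_model and simulate_reduced_hw_boot() are hypothetical. What it carries over from the kernel is that acpi_generic_reduced_hw_init() turns the timer_init and pre_vector_init hooks into no-ops and switches legacy_pic to null_legacy_pic (zero legacy IRQs), while the Hyper-V reduced_hw_init() leaves legacy_pic at default_legacy_pic, so nr_legacy_irqs() still reports NR_IRQS_LEGACY (16).

```c
#include <stdbool.h>

#define NR_IRQS_LEGACY 16

struct legacy_pic_model {
	int nr_legacy_irqs;
};

static struct legacy_pic_model default_legacy_pic = { NR_IRQS_LEGACY };
static struct legacy_pic_model null_legacy_pic = { 0 };
static struct legacy_pic_model *legacy_pic = &default_legacy_pic;

static void x86_init_noop(void) { }
static void default_timer_init(void) { }
static void default_pre_vector_init(void) { }

/* Flattened stand-in for the relevant x86_init hooks. */
struct x86_init_model {
	void (*timer_init)(void);
	void (*pre_vector_init)(void);
	void (*reduced_hw_early_init)(void);
};

static struct x86_init_model x86_init = {
	.timer_init = default_timer_init,
	.pre_vector_init = default_pre_vector_init,
	.reduced_hw_early_init = 0,
};

static int nr_legacy_irqs(void)
{
	return legacy_pic->nr_legacy_irqs;
}

/* Models acpi_generic_reduced_hw_init(). */
static void generic_reduced_hw_init(void)
{
	x86_init.timer_init = x86_init_noop;
	x86_init.pre_vector_init = x86_init_noop;
	legacy_pic = &null_legacy_pic;	/* nr_legacy_irqs() becomes 0 */
}

/* Models the Hyper-V reduced_hw_init(): legacy_pic is left alone. */
static void hv_reduced_hw_init(void)
{
	x86_init.timer_init = x86_init_noop;
	x86_init.pre_vector_init = x86_init_noop;
}

/* Hypothetical boot flow: reset state, pick the hook, run it. */
static int simulate_reduced_hw_boot(bool tdx_on_hyperv)
{
	legacy_pic = &default_legacy_pic;
	x86_init.timer_init = default_timer_init;
	x86_init.pre_vector_init = default_pre_vector_init;
	x86_init.reduced_hw_early_init =
		tdx_on_hyperv ? hv_reduced_hw_init : generic_reduced_hw_init;
	x86_init.reduced_hw_early_init();
	return nr_legacy_irqs();
}
```

With the generic hook the model reports 0 legacy IRQs (so mp_irqs[] would stay empty), while the Hyper-V hook keeps 16.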

Reviewed-by: Michael Kelley <[email protected]>
Signed-off-by: Dexuan Cui <[email protected]>
---
arch/x86/kernel/cpu/mshyperv.c | 22 ++++++++++++++++++++++
1 file changed, 22 insertions(+)

Changes since v1:
None.

Changes in v5:
Improved the comment [Michael Kelley]
Added Michael's Reviewed-by.

diff --git a/arch/x86/kernel/cpu/mshyperv.c b/arch/x86/kernel/cpu/mshyperv.c
index e9106c9d92f8..942170ea6a5d 100644
--- a/arch/x86/kernel/cpu/mshyperv.c
+++ b/arch/x86/kernel/cpu/mshyperv.c
@@ -318,6 +318,26 @@ static void __init hv_smp_prepare_cpus(unsigned int max_cpus)
}
#endif

+/*
+ * When a TDX guest runs on Hyper-V, the firmware sets the HW_REDUCED flag: see
+ * acpi_tb_create_local_fadt(). Consequently ttyS0 interrupts can't work because
+ * request_irq() -> ... -> irq_to_desc() returns NULL for ttyS0. This happens
+ * because mp_config_acpi_legacy_irqs() sees a nr_legacy_irqs() of 0, so it
+ * doesn't initialize the array 'mp_irqs[]', and later setup_IO_APIC_irqs() ->
+ * find_irq_entry() fails to find the legacy irqs from the array, and hence
+ * doesn't create the necessary irq description info.
+ *
+ * Clone arch/x86/kernel/acpi/boot.c: acpi_generic_reduced_hw_init() here,
+ * except don't change 'legacy_pic'. It keeps its default value
+ * 'default_legacy_pic'. mp_config_acpi_legacy_irqs() sees a non-zero
+ * nr_legacy_irqs(), and eventually serial console interrupts work properly.
+ */
+static void __init reduced_hw_init(void)
+{
+ x86_init.timers.timer_init = x86_init_noop;
+ x86_init.irqs.pre_vector_init = x86_init_noop;
+}
+
static void __init ms_hyperv_init_platform(void)
{
int hv_max_functions_eax;
@@ -425,6 +445,8 @@ static void __init ms_hyperv_init_platform(void)

/* A TDX VM must use x2APIC and doesn't use lazy EOI */
ms_hyperv.hints &= ~HV_X64_APIC_ACCESS_RECOMMENDED;
+
+ x86_init.acpi.reduced_hw_early_init = reduced_hw_init;
}
}

--
2.25.1
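The comment in patch 6 says the only difference from acpi_generic_reduced_hw_init() is that 'legacy_pic' is left at 'default_legacy_pic', so mp_config_acpi_legacy_irqs() still sees a non-zero nr_legacy_irqs(). The following is a heavily simplified user-space model of that hook override. Every *_sk type and function is hypothetical and stands in for the real legacy_pic, x86_init and ACPI boot code; the only fact taken from the kernel is that the generic hook also switches to the null PIC (0 legacy IRQs) while the default PIC has NR_IRQS_LEGACY (16).

```c
#include <assert.h>

/* Hypothetical, simplified stand-ins for legacy_pic and x86_init. */
struct legacy_pic_sk {
	int nr_legacy_irqs;
};

static struct legacy_pic_sk default_legacy_pic_sk = { .nr_legacy_irqs = 16 };
static struct legacy_pic_sk null_legacy_pic_sk = { .nr_legacy_irqs = 0 };
static struct legacy_pic_sk *legacy_pic_sk;

static void x86_init_noop_sk(void) { }
static void default_timer_init_sk(void) { }
static void default_pre_vector_init_sk(void) { }

struct x86_init_ops_sk {
	void (*timer_init)(void);
	void (*pre_vector_init)(void);
	void (*reduced_hw_early_init)(void);
};

static struct x86_init_ops_sk x86_init_sk;

/* Like acpi_generic_reduced_hw_init(): also drops the legacy PIC. */
static void generic_reduced_hw_init_sk(void)
{
	x86_init_sk.timer_init = x86_init_noop_sk;
	x86_init_sk.pre_vector_init = x86_init_noop_sk;
	legacy_pic_sk = &null_legacy_pic_sk;
}

/* Like the patch's reduced_hw_init(): leaves legacy_pic untouched. */
static void hyperv_reduced_hw_init_sk(void)
{
	x86_init_sk.timer_init = x86_init_noop_sk;
	x86_init_sk.pre_vector_init = x86_init_noop_sk;
}

/* Simulate early boot with HW_REDUCED set and return the number of legacy
 * IRQs an mp_config_acpi_legacy_irqs()-like step would see. The patch
 * relies on the override being installed before the ACPI code calls it. */
static int boot_hw_reduced_sk(int is_hyperv_tdx)
{
	legacy_pic_sk = &default_legacy_pic_sk;
	x86_init_sk.timer_init = default_timer_init_sk;
	x86_init_sk.pre_vector_init = default_pre_vector_init_sk;
	x86_init_sk.reduced_hw_early_init = generic_reduced_hw_init_sk;

	if (is_hyperv_tdx)
		x86_init_sk.reduced_hw_early_init = hyperv_reduced_hw_init_sk;

	x86_init_sk.reduced_hw_early_init();
	return legacy_pic_sk->nr_legacy_irqs;
}
```

With the generic hook the model reports 0 legacy IRQs, which is the situation in which ttyS0 gets no irq descriptor; with the Hyper-V override the default PIC's 16 legacy IRQs survive, while the timer and pre-vector hooks are still turned into no-ops.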

2023-05-01 17:36:04

by Michael Kelley (LINUX)

Subject: RE: [PATCH v5 5/6] Drivers: hv: vmbus: Support TDX guests

From: Dexuan Cui <[email protected]> Sent: Friday, April 21, 2023 7:18 PM
>
> Add Hyper-V specific code so that a TDX guest can run on Hyper-V:
> No need to use hv_vp_assist_page.
> Don't use the unsafe Hyper-V TSC page.
> Don't try to use HV_REGISTER_CRASH_CTL.
> Don't trust Hyper-V's TLB-flushing hypercalls.
> Don't use lazy EOI.
> Share SynIC Event/Message pages and VMBus Monitor pages with the host.

This patch no longer does anything with the VMBus monitor pages.

> Use pgprot_decrypted(PAGE_KERNEL) in hv_ringbuffer_init().

The above line in the commit message is stale and can be dropped.

>
> Signed-off-by: Dexuan Cui <[email protected]>
> ---
> arch/x86/hyperv/hv_apic.c | 6 ++--
> arch/x86/hyperv/hv_init.c | 19 +++++++++---
> arch/x86/kernel/cpu/mshyperv.c | 21 ++++++++++++-
> drivers/hv/hv.c | 54 ++++++++++++++++++++++++++++++++--
> 4 files changed, 90 insertions(+), 10 deletions(-)
>
> Changes in v2:
> Used a new function hv_set_memory_enc_dec_needed() in
> __set_memory_enc_pgtable().
> Added the missing set_memory_encrypted() in hv_synic_free().
>
> Changes in v3:
> Use pgprot_decrypted(PAGE_KERNEL) in hv_ringbuffer_init().
> (Do not use PAGE_KERNEL_NOENC, which doesn't exist for ARM64).
>
> Used cc_mkdec() in hv_synic_enable_regs().
>
> ms_hyperv_init_platform():
> Explicitly do not use HV_X64_REMOTE_TLB_FLUSH_RECOMMENDED.
> Explicitly do not use HV_X64_APIC_ACCESS_RECOMMENDED.
>
> Enabled __send_ipi_mask() and __send_ipi_one() for TDX guests.
>
> Changes in v4:
> A minor rebase to Michael's v7 DDA patchset. I'm very happy that
> I can drop my v3 change to arch/x86/mm/pat/set_memory.c due to
> Michael's work.
>
> Changes in v5:
> Added memset() to clear synic_message_page and synic_event_page
> after set_memory_decrypted().
> Rebased the patch since "post_msg_page" has been removed in
> hyperv-next.
> Improved the error handling in hv_synic_alloc()/free() [Michael Kelley]
>
> diff --git a/arch/x86/hyperv/hv_apic.c b/arch/x86/hyperv/hv_apic.c
> index fb8b2c088681..16919c7b3196 100644
> --- a/arch/x86/hyperv/hv_apic.c
> +++ b/arch/x86/hyperv/hv_apic.c
> @@ -173,7 +173,8 @@ static bool __send_ipi_mask(const struct cpumask *mask, int vector,
> (exclude_self && weight == 1 && cpumask_test_cpu(this_cpu, mask)))
> return true;
>
> - if (!hv_hypercall_pg)
> + /* A TDX guest doesn't use hv_hypercall_pg. */
> + if (!hv_isolation_type_tdx() && !hv_hypercall_pg)
> return false;
>
> if ((vector < HV_IPI_LOW_VECTOR) || (vector > HV_IPI_HIGH_VECTOR))
> @@ -227,7 +228,8 @@ static bool __send_ipi_one(int cpu, int vector)
>
> trace_hyperv_send_ipi_one(cpu, vector);
>
> - if (!hv_hypercall_pg || (vp == VP_INVAL))
> + /* A TDX guest doesn't use hv_hypercall_pg. */
> + if ((!hv_isolation_type_tdx() && !hv_hypercall_pg) || (vp == VP_INVAL))
> return false;
>
> if ((vector < HV_IPI_LOW_VECTOR) || (vector > HV_IPI_HIGH_VECTOR))
> diff --git a/arch/x86/hyperv/hv_init.c b/arch/x86/hyperv/hv_init.c
> index f175e0de821c..f28357ecad7d 100644
> --- a/arch/x86/hyperv/hv_init.c
> +++ b/arch/x86/hyperv/hv_init.c
> @@ -79,7 +79,7 @@ static int hyperv_init_ghcb(void)
> static int hv_cpu_init(unsigned int cpu)
> {
> union hv_vp_assist_msr_contents msr = { 0 };
> - struct hv_vp_assist_page **hvp = &hv_vp_assist_page[cpu];
> + struct hv_vp_assist_page **hvp;
> int ret;
>
> ret = hv_common_cpu_init(cpu);
> @@ -89,6 +89,7 @@ static int hv_cpu_init(unsigned int cpu)
> if (!hv_vp_assist_page)
> return 0;
>
> + hvp = &hv_vp_assist_page[cpu];
> if (hv_root_partition) {
> /*
> * For root partition we get the hypervisor provided VP assist
> @@ -398,11 +399,21 @@ void __init hyperv_init(void)
> if (hv_common_init())
> return;
>
> - hv_vp_assist_page = kcalloc(num_possible_cpus(),
> - sizeof(*hv_vp_assist_page), GFP_KERNEL);
> + /*
> + * The VP assist page is useless to a TDX guest: the only use we
> + * would have for it is lazy EOI, which cannot be used with TDX.
> + */
> + if (hv_isolation_type_tdx())
> + hv_vp_assist_page = NULL;
> + else
> + hv_vp_assist_page = kcalloc(num_possible_cpus(),
> + sizeof(*hv_vp_assist_page),
> + GFP_KERNEL);
> if (!hv_vp_assist_page) {
> ms_hyperv.hints &= ~HV_X64_ENLIGHTENED_VMCS_RECOMMENDED;
> - goto common_free;
> +
> + if (!hv_isolation_type_tdx())
> + goto common_free;
> }
>
> if (hv_isolation_type_snp()) {
> diff --git a/arch/x86/kernel/cpu/mshyperv.c b/arch/x86/kernel/cpu/mshyperv.c
> index a87fb934cd4b..e9106c9d92f8 100644
> --- a/arch/x86/kernel/cpu/mshyperv.c
> +++ b/arch/x86/kernel/cpu/mshyperv.c
> @@ -405,8 +405,27 @@ static void __init ms_hyperv_init_platform(void)
>
> if (hv_get_isolation_type() == HV_ISOLATION_TYPE_SNP)
> static_branch_enable(&isolation_type_snp);
> - else if (hv_get_isolation_type() == HV_ISOLATION_TYPE_TDX)
> + else if (hv_get_isolation_type() == HV_ISOLATION_TYPE_TDX) {
> static_branch_enable(&isolation_type_tdx);
> +
> + /*
> + * The GPAs of SynIC Event/Message pages and VMBus
> + * Monitor pages must have this offset added.
> + */
> + ms_hyperv.shared_gpa_boundary = cc_mkdec(0);
> +
> + /* Don't use the unsafe Hyper-V TSC page */
> + ms_hyperv.features &= ~HV_MSR_REFERENCE_TSC_AVAILABLE;
> +
> + /* HV_REGISTER_CRASH_CTL is unsupported */
> + ms_hyperv.misc_features &= ~HV_FEATURE_GUEST_CRASH_MSR_AVAILABLE;
> +
> + /* Don't trust Hyper-V's TLB-flushing hypercalls */
> + ms_hyperv.hints &= ~HV_X64_REMOTE_TLB_FLUSH_RECOMMENDED;
> +
> + /* A TDX VM must use x2APIC and doesn't use lazy EOI */
> + ms_hyperv.hints &= ~HV_X64_APIC_ACCESS_RECOMMENDED;
> + }
> }
>
> if (hv_max_functions_eax >= HYPERV_CPUID_NESTED_FEATURES) {
> diff --git a/drivers/hv/hv.c b/drivers/hv/hv.c
> index 4e1407d59ba0..fa7dce26ec67 100644
> --- a/drivers/hv/hv.c
> +++ b/drivers/hv/hv.c
> @@ -18,6 +18,7 @@
> #include <linux/clockchips.h>
> #include <linux/delay.h>
> #include <linux/interrupt.h>
> +#include <linux/set_memory.h>
> #include <clocksource/hyperv_timer.h>
> #include <asm/mshyperv.h>
> #include "hyperv_vmbus.h"
> @@ -116,6 +117,7 @@ int hv_synic_alloc(void)
> {
> int cpu;
> struct hv_per_cpu_context *hv_cpu;
> + int ret = -ENOMEM;
>
> /*
> * First, zero all per-cpu memory areas so hv_synic_free() can
> @@ -159,6 +161,28 @@ int hv_synic_alloc(void)
> goto err;
> }
> }
> +
> + /* It's better to leak the page if the decryption fails. */
> + if (hv_isolation_type_tdx()) {
> + ret = set_memory_decrypted(
> + (unsigned long)hv_cpu->synic_message_page, 1);
> + if (ret) {
> + pr_err("Failed to decrypt SYNIC msg page\n");
> + hv_cpu->synic_message_page = NULL;
> + goto err;
> + }
> +
> + ret = set_memory_decrypted(
> + (unsigned long)hv_cpu->synic_event_page, 1);
> + if (ret) {
> + pr_err("Failed to decrypt SYNIC event page\n");
> + hv_cpu->synic_event_page = NULL;
> + goto err;
> + }

The error handling still doesn't work quite correctly. In the TDX case, upon
exiting this function, the synic_message_page and the synic_event_page must
each either be mapped decrypted or be NULL. This requirement is so
that hv_synic_free() will do the right thing in changing the mapping back to
encrypted. hv_synic_free() can't handle a non-NULL page being encrypted.

In the above code, if we fail to decrypt the synic_message_page, then setting
it to NULL will leak the page (which we'll live with) and ensures that hv_synic_free()
will handle it correctly. But at that point we'll exit with synic_event_page
non-NULL and in the encrypted state, which hv_synic_free() can't handle.

Michael

> +
> + memset(hv_cpu->synic_message_page, 0, PAGE_SIZE);
> + memset(hv_cpu->synic_event_page, 0, PAGE_SIZE);
> + }
> }
>
> return 0;
> @@ -167,18 +191,40 @@ int hv_synic_alloc(void)
> * Any memory allocations that succeeded will be freed when
> * the caller cleans up by calling hv_synic_free()
> */
> - return -ENOMEM;
> + return ret;
> }
>
>
> void hv_synic_free(void)
> {
> int cpu;
> + int ret;
>
> for_each_present_cpu(cpu) {
> struct hv_per_cpu_context *hv_cpu
> = per_cpu_ptr(hv_context.cpu_context, cpu);
>
> + /* It's better to leak the page if the encryption fails. */
> + if (hv_isolation_type_tdx()) {
> + if (hv_cpu->synic_message_page) {
> + ret = set_memory_encrypted((unsigned long)
> + hv_cpu->synic_message_page, 1);
> + if (ret) {
> + pr_err("Failed to encrypt SYNIC msg page\n");
> + hv_cpu->synic_message_page = NULL;
> + }
> + }
> +
> + if (hv_cpu->synic_event_page) {
> + ret = set_memory_encrypted((unsigned long)
> + hv_cpu->synic_event_page, 1);
> + if (ret) {
> + pr_err("Failed to encrypt SYNIC event page\n");
> + hv_cpu->synic_event_page = NULL;
> + }
> + }
> + }
> +
> free_page((unsigned long)hv_cpu->synic_event_page);
> free_page((unsigned long)hv_cpu->synic_message_page);
> }
> @@ -215,7 +261,8 @@ void hv_synic_enable_regs(unsigned int cpu)
> if (!hv_cpu->synic_message_page)
> pr_err("Fail to map synic message page.\n");
> } else {
> - simp.base_simp_gpa = virt_to_phys(hv_cpu->synic_message_page)
> + simp.base_simp_gpa =
> + cc_mkdec(virt_to_phys(hv_cpu->synic_message_page))
> >> HV_HYP_PAGE_SHIFT;
> }
>
> @@ -234,7 +281,8 @@ void hv_synic_enable_regs(unsigned int cpu)
> if (!hv_cpu->synic_event_page)
> pr_err("Fail to map synic event page.\n");
> } else {
> - siefp.base_siefp_gpa = virt_to_phys(hv_cpu->synic_event_page)
> + siefp.base_siefp_gpa =
> + cc_mkdec(virt_to_phys(hv_cpu->synic_event_page))
> >> HV_HYP_PAGE_SHIFT;
> }
>
> --
> 2.25.1
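Michael's requirement (in a TDX guest, each SynIC page must leave hv_synic_alloc() either NULL or mapped decrypted, because hv_synic_free() only re-encrypts pages it assumes are decrypted) can be checked with a small single-CPU model. This is a hypothetical user-space sketch, not kernel code: the *_sk names are made up, and failures of set_memory_decrypted() are injected through boolean parameters.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical model: each SynIC page is absent, private, or shared. */
enum page_state_sk { PAGE_NULL_SK, PAGE_ENCRYPTED_SK, PAGE_DECRYPTED_SK };

struct synic_pages_sk {
	enum page_state_sk msg;
	enum page_state_sk event;
};

/* The requirement: hv_synic_free() cannot cope with a non-NULL page
 * that is still encrypted. */
static bool free_can_handle_sk(const struct synic_pages_sk *p)
{
	return p->msg != PAGE_ENCRYPTED_SK && p->event != PAGE_ENCRYPTED_SK;
}

/* Models the v5 TDX branch after both pages were allocated (encrypted).
 * fail_msg / fail_event inject a set_memory_decrypted() failure. */
static struct synic_pages_sk v5_alloc_sk(bool fail_msg, bool fail_event)
{
	struct synic_pages_sk p = { PAGE_ENCRYPTED_SK, PAGE_ENCRYPTED_SK };

	if (fail_msg) {
		p.msg = PAGE_NULL_SK;   /* page leaked, which is acceptable */
		return p;               /* goto err: event page is still private */
	}
	p.msg = PAGE_DECRYPTED_SK;

	if (fail_event) {
		p.event = PAGE_NULL_SK; /* page leaked, which is acceptable */
		return p;
	}
	p.event = PAGE_DECRYPTED_SK;
	return p;
}
```

In this model only the message-page failure violates the requirement, which matches the case Michael points out.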

2023-05-02 01:49:10

by Dexuan Cui

Subject: RE: [PATCH v5 5/6] Drivers: hv: vmbus: Support TDX guests

> From: Michael Kelley (LINUX) <[email protected]>
> Sent: Monday, May 1, 2023 10:33 AM
> ...
> From: Dexuan Cui
> >
> > Add Hyper-V specific code so that a TDX guest can run on Hyper-V:
> > No need to use hv_vp_assist_page.
> > Don't use the unsafe Hyper-V TSC page.
> > Don't try to use HV_REGISTER_CRASH_CTL.
> > Don't trust Hyper-V's TLB-flushing hypercalls.
> > Don't use lazy EOI.
> > Share SynIC Event/Message pages and VMBus Monitor pages with the
> > host.
>
> This patch no longer does anything with the VMBus monitor pages.
Sorry, I forgot to update the commit log. Will drop this from the log.

> > Use pgprot_decrypted(PAGE_KERNEL) in hv_ringbuffer_init().
>
> The above line in the commit message is stale and can be dropped.
Will drop this from the commit log.

> > @@ -116,6 +117,7 @@ int hv_synic_alloc(void)
> > {
> > int cpu;
> > struct hv_per_cpu_context *hv_cpu;
> > + int ret = -ENOMEM;
> >
> > /*
> > * First, zero all per-cpu memory areas so hv_synic_free() can
> > @@ -159,6 +161,28 @@ int hv_synic_alloc(void)
> > goto err;
> > }
> > }
> > +
> > + /* It's better to leak the page if the decryption fails. */
> > + if (hv_isolation_type_tdx()) {
> > + ret = set_memory_decrypted(
> > + (unsigned long)hv_cpu->synic_message_page, 1);
> > + if (ret) {
> > + pr_err("Failed to decrypt SYNIC msg page\n");
> > + hv_cpu->synic_message_page = NULL;
> > + goto err;
> > + }
> > +
> > + ret = set_memory_decrypted(
> > + (unsigned long)hv_cpu->synic_event_page, 1);
> > + if (ret) {
> > + pr_err("Failed to decrypt SYNIC event page\n");
> > + hv_cpu->synic_event_page = NULL;
> > + goto err;
> > + }
>
> The error handling still doesn't work quite correctly. In the TDX case, upon
> exiting this function, the synic_message_page and the synic_event_page must
> each either be mapped decrypted or be NULL. This requirement is so
> that hv_synic_free() will do the right thing in changing the mapping back to
> encrypted. hv_synic_free() can't handle a non-NULL page being encrypted.
>
> In the above code, if we fail to decrypt the synic_message_page, then setting
> it to NULL will leak the page (which we'll live with) and ensures that hv_synic_free()
> will handle it correctly. But at that point we'll exit with synic_event_page
> non-NULL and in the encrypted state, which hv_synic_free() can't handle.
>
> Michael

Thanks for spotting the issue!
I think the below extra changes should do the job:

@@ -121,91 +121,102 @@ int hv_synic_alloc(void)

/*
* First, zero all per-cpu memory areas so hv_synic_free() can
* detect what memory has been allocated and cleanup properly
* after any failures.
*/
for_each_present_cpu(cpu) {
hv_cpu = per_cpu_ptr(hv_context.cpu_context, cpu);
memset(hv_cpu, 0, sizeof(*hv_cpu));
}

hv_context.hv_numa_map = kcalloc(nr_node_ids, sizeof(struct cpumask),
GFP_KERNEL);
if (hv_context.hv_numa_map == NULL) {
pr_err("Unable to allocate NUMA map\n");
goto err;
}

for_each_present_cpu(cpu) {
hv_cpu = per_cpu_ptr(hv_context.cpu_context, cpu);

tasklet_init(&hv_cpu->msg_dpc,
vmbus_on_msg_dpc, (unsigned long) hv_cpu);

/*
* Synic message and event pages are allocated by paravisor.
* Skip these pages allocation here.
*/
if (!hv_isolation_type_snp() && !hv_root_partition) {
hv_cpu->synic_message_page =
(void *)get_zeroed_page(GFP_ATOMIC);
if (hv_cpu->synic_message_page == NULL) {
pr_err("Unable to allocate SYNIC message page\n");
goto err;
}

hv_cpu->synic_event_page =
(void *)get_zeroed_page(GFP_ATOMIC);
if (hv_cpu->synic_event_page == NULL) {
pr_err("Unable to allocate SYNIC event page\n");
+
+ free_page((unsigned long)hv_cpu->synic_message_page);
+ hv_cpu->synic_message_page = NULL;
+
goto err;
}
}

/* It's better to leak the page if the decryption fails. */
if (hv_isolation_type_tdx()) {
ret = set_memory_decrypted(
(unsigned long)hv_cpu->synic_message_page, 1);
if (ret) {
pr_err("Failed to decrypt SYNIC msg page\n");
hv_cpu->synic_message_page = NULL;
+
+ /*
+ * Free the event page so that a TDX VM won't
+ * try to encrypt the page in hv_synic_free().
+ */
+ free_page((unsigned long)hv_cpu->synic_event_page);
+ hv_cpu->synic_event_page = NULL;
goto err;
}

ret = set_memory_decrypted(
(unsigned long)hv_cpu->synic_event_page, 1);
if (ret) {
pr_err("Failed to decrypt SYNIC event page\n");
hv_cpu->synic_event_page = NULL;
goto err;
}

memset(hv_cpu->synic_message_page, 0, PAGE_SIZE);
memset(hv_cpu->synic_event_page, 0, PAGE_SIZE);
}
}

return 0;
err:
/*
* Any memory allocations that succeeded will be freed when
* the caller cleans up by calling hv_synic_free()
*/
return ret;
}

I'm going to use the below (i.e. v5 + the above extra changes) in v6.
Please let me know if there is still any bug.

@@ -116,6 +117,7 @@ int hv_synic_alloc(void)
{
int cpu;
struct hv_per_cpu_context *hv_cpu;
+ int ret = -ENOMEM;

/*
* First, zero all per-cpu memory areas so hv_synic_free() can
@@ -156,9 +158,42 @@ int hv_synic_alloc(void)
(void *)get_zeroed_page(GFP_ATOMIC);
if (hv_cpu->synic_event_page == NULL) {
pr_err("Unable to allocate SYNIC event page\n");
+
+ free_page((unsigned long)hv_cpu->synic_message_page);
+ hv_cpu->synic_message_page = NULL;
+
goto err;
}
}
+
+ /* It's better to leak the page if the decryption fails. */
+ if (hv_isolation_type_tdx()) {
+ ret = set_memory_decrypted(
+ (unsigned long)hv_cpu->synic_message_page, 1);
+ if (ret) {
+ pr_err("Failed to decrypt SYNIC msg page\n");
+ hv_cpu->synic_message_page = NULL;
+
+ /*
+ * Free the event page so that a TDX VM won't
+ * try to encrypt the page in hv_synic_free().
+ */
+ free_page((unsigned long)hv_cpu->synic_event_page);
+ hv_cpu->synic_event_page = NULL;
+ goto err;
+ }
+
+ ret = set_memory_decrypted(
+ (unsigned long)hv_cpu->synic_event_page, 1);
+ if (ret) {
+ pr_err("Failed to decrypt SYNIC event page\n");
+ hv_cpu->synic_event_page = NULL;
+ goto err;
+ }
+
+ memset(hv_cpu->synic_message_page, 0, PAGE_SIZE);
+ memset(hv_cpu->synic_event_page, 0, PAGE_SIZE);
+ }
}

return 0;
@@ -167,18 +202,40 @@ int hv_synic_alloc(void)
* Any memory allocations that succeeded will be freed when
* the caller cleans up by calling hv_synic_free()
*/
- return -ENOMEM;
+ return ret;
}


void hv_synic_free(void)
{
int cpu;
+ int ret;

for_each_present_cpu(cpu) {
struct hv_per_cpu_context *hv_cpu
= per_cpu_ptr(hv_context.cpu_context, cpu);

+ /* It's better to leak the page if the encryption fails. */
+ if (hv_isolation_type_tdx()) {
+ if (hv_cpu->synic_message_page) {
+ ret = set_memory_encrypted((unsigned long)
+ hv_cpu->synic_message_page, 1);
+ if (ret) {
+ pr_err("Failed to encrypt SYNIC msg page\n");
+ hv_cpu->synic_message_page = NULL;
+ }
+ }
+
+ if (hv_cpu->synic_event_page) {
+ ret = set_memory_encrypted((unsigned long)
+ hv_cpu->synic_event_page, 1);
+ if (ret) {
+ pr_err("Failed to encrypt SYNIC event page\n");
+ hv_cpu->synic_event_page = NULL;
+ }
+ }
+ }
+
free_page((unsigned long)hv_cpu->synic_event_page);
free_page((unsigned long)hv_cpu->synic_message_page);
}
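A hypothetical single-CPU model of the v6 TDX path above can be used to check that every exit leaves each SynIC page either NULL or decrypted, so that hv_synic_free() is safe. The *_sk names are made up for this user-space sketch. The three booleans inject, in order, a failure of the second get_zeroed_page(), of set_memory_decrypted() on the message page, and of set_memory_decrypted() on the event page; "leaked" counts pages deliberately leaked after a failed conversion.

```c
#include <assert.h>
#include <stdbool.h>

enum page_state_sk { PAGE_NULL_SK, PAGE_ENCRYPTED_SK, PAGE_DECRYPTED_SK };

struct synic_pages_sk {
	enum page_state_sk msg;
	enum page_state_sk event;
	int leaked;
};

/* hv_synic_free() cannot cope with a non-NULL page that is still encrypted. */
static bool free_can_handle_sk(const struct synic_pages_sk *p)
{
	return p->msg != PAGE_ENCRYPTED_SK && p->event != PAGE_ENCRYPTED_SK;
}

static struct synic_pages_sk v6_alloc_sk(bool fail_event_alloc,
					 bool fail_msg_dec, bool fail_event_dec)
{
	/* The message page has been allocated; it is still private. */
	struct synic_pages_sk p = { PAGE_ENCRYPTED_SK, PAGE_NULL_SK, 0 };

	if (fail_event_alloc) {
		p.msg = PAGE_NULL_SK;   /* free_page() the message page */
		return p;
	}
	p.event = PAGE_ENCRYPTED_SK;

	if (fail_msg_dec) {
		p.msg = PAGE_NULL_SK;   /* leak the message page ... */
		p.leaked++;
		p.event = PAGE_NULL_SK; /* ... and free_page() the private event page */
		return p;
	}
	p.msg = PAGE_DECRYPTED_SK;

	if (fail_event_dec) {
		p.event = PAGE_NULL_SK; /* leak the event page */
		p.leaked++;
		return p;
	}
	p.event = PAGE_DECRYPTED_SK;
	return p;
}
```

Enumerating all eight failure combinations in this model shows the requirement holding on every path, with at most one page leaked.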


I'll post a separate patch (currently if hv_synic_alloc() --> get_zeroed_page() fails,
hv_context.hv_numa_map is leaked):


--- a/drivers/hv/vmbus_drv.c
+++ b/drivers/hv/vmbus_drv.c
@@ -1515,27 +1515,27 @@ static int vmbus_bus_init(void)
}

ret = hv_synic_alloc();
if (ret)
goto err_alloc;

/*
* Initialize the per-cpu interrupt state and stimer state.
* Then connect to the host.
*/
ret = cpuhp_setup_state(CPUHP_AP_ONLINE_DYN, "hyperv/vmbus:online",
hv_synic_init, hv_synic_cleanup);
if (ret < 0)
- goto err_cpuhp;
+ goto err_alloc;
hyperv_cpuhp_online = ret;

ret = vmbus_connect();
if (ret)
goto err_connect;

if (hv_is_isolation_supported())
sysctl_record_panic_msg = 0;

/*
* Only register if the crash MSRs are available
*/
if (ms_hyperv.misc_features & HV_FEATURE_GUEST_CRASH_MSR_AVAILABLE) {
@@ -1567,29 +1567,28 @@ static int vmbus_bus_init(void)
/*
* Always register the vmbus unload panic notifier because we
* need to shut the VMbus channel connection on panic.
*/
atomic_notifier_chain_register(&panic_notifier_list,
&hyperv_panic_vmbus_unload_block);

vmbus_request_offers();

return 0;

err_connect:
cpuhp_remove_state(hyperv_cpuhp_online);
-err_cpuhp:
- hv_synic_free();
err_alloc:
+ hv_synic_free();
if (vmbus_irq == -1) {
hv_remove_vmbus_handler();
} else {
free_percpu_irq(vmbus_irq, vmbus_evt);
free_percpu(vmbus_evt);
}
err_setup:
bus_unregister(&hv_bus);
unregister_sysctl_table(hv_ctl_table_hdr);
hv_ctl_table_hdr = NULL;
return ret;
}
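The label change above moves hv_synic_free() under err_alloc, so it also runs when hv_synic_alloc() itself fails. That relies on hv_synic_free() being safe to call on partially allocated state. The following user-space sketch models the two label orders with hypothetical *_sk helpers: a "NUMA map" allocation that succeeds, a per-CPU page that may fail, and a counter of live allocations. It is an illustration of goto-based unwinding, not the vmbus code.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdlib.h>

static int live_sk;          /* number of live allocations in the model */
static void *numa_map_sk;    /* stands in for hv_context.hv_numa_map */
static void *page_sk;        /* stands in for a per-CPU SynIC page */

static int synic_alloc_sk(bool fail_page)
{
	numa_map_sk = malloc(16);
	if (!numa_map_sk)
		return -1;
	live_sk++;
	if (fail_page)
		return -1;       /* numa_map_sk is left for synic_free_sk() */
	page_sk = malloc(16);
	if (!page_sk)
		return -1;
	live_sk++;
	return 0;
}

/* Safe on partially allocated state, like hv_synic_free(). */
static void synic_free_sk(void)
{
	if (numa_map_sk) { free(numa_map_sk); numa_map_sk = NULL; live_sk--; }
	if (page_sk) { free(page_sk); page_sk = NULL; live_sk--; }
}

/* Old order: an allocation failure skips synic_free_sk(). */
static int bus_init_old_sk(bool fail_page, bool fail_cpuhp)
{
	int ret = synic_alloc_sk(fail_page);

	if (ret)
		goto err_alloc;
	if (fail_cpuhp) {
		ret = -1;
		goto err_cpuhp;
	}
	return 0;

err_cpuhp:
	synic_free_sk();
err_alloc:
	return ret;
}

/* Proposed order: both failures reach synic_free_sk(). */
static int bus_init_new_sk(bool fail_page, bool fail_cpuhp)
{
	int ret = synic_alloc_sk(fail_page);

	if (ret)
		goto err_alloc;
	if (fail_cpuhp) {
		ret = -1;
		goto err_alloc;
	}
	return 0;

err_alloc:
	synic_free_sk();
	return ret;
}
```

In the model, the old order leaves the NUMA map allocated after a page-allocation failure, while the new order frees everything on both failure paths.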

2023-05-02 15:30:31

by Michael Kelley (LINUX)

Subject: RE: [PATCH v5 5/6] Drivers: hv: vmbus: Support TDX guests

From: Dexuan Cui <[email protected]> Sent: Monday, May 1, 2023 6:34 PM
>
> > From: Michael Kelley (LINUX) <[email protected]>
> > Sent: Monday, May 1, 2023 10:33 AM
> > ...
> > From: Dexuan Cui
> > >
> > > Add Hyper-V specific code so that a TDX guest can run on Hyper-V:
> > > No need to use hv_vp_assist_page.
> > > Don't use the unsafe Hyper-V TSC page.
> > > Don't try to use HV_REGISTER_CRASH_CTL.
> > > Don't trust Hyper-V's TLB-flushing hypercalls.
> > > Don't use lazy EOI.
> > > Share SynIC Event/Message pages and VMBus Monitor pages with the
> > > host.
> >
> > This patch no longer does anything with the VMBus monitor pages.
> Sorry, I forgot to update the commit log. Will drop this from the log.
>
> > > Use pgprot_decrypted(PAGE_KERNEL) in hv_ringbuffer_init().
> >
> > The above line in the commit message is stale and can be dropped.
> Will drop this from the commit log.
>
> > > @@ -116,6 +117,7 @@ int hv_synic_alloc(void)
> > > {
> > > int cpu;
> > > struct hv_per_cpu_context *hv_cpu;
> > > + int ret = -ENOMEM;
> > >
> > > /*
> > > * First, zero all per-cpu memory areas so hv_synic_free() can
> > > @@ -159,6 +161,28 @@ int hv_synic_alloc(void)
> > > goto err;
> > > }
> > > }
> > > +
> > > + /* It's better to leak the page if the decryption fails. */
> > > + if (hv_isolation_type_tdx()) {
> > > + ret = set_memory_decrypted(
> > > + (unsigned long)hv_cpu->synic_message_page, 1);
> > > + if (ret) {
> > > + pr_err("Failed to decrypt SYNIC msg page\n");
> > > + hv_cpu->synic_message_page = NULL;
> > > + goto err;
> > > + }
> > > +
> > > + ret = set_memory_decrypted(
> > > + (unsigned long)hv_cpu->synic_event_page, 1);
> > > + if (ret) {
> > > + pr_err("Failed to decrypt SYNIC event page\n");
> > > + hv_cpu->synic_event_page = NULL;
> > > + goto err;
> > > + }
> >
> > The error handling still doesn't work quite correctly. In the TDX case, upon
> > exiting this function, the synic_message_page and the synic_event_page must
> > each either be mapped decrypted or be NULL. This requirement is so
> > that hv_synic_free() will do the right thing in changing the mapping back to
> > encrypted. hv_synic_free() can't handle a non-NULL page being encrypted.
> >
> > In the above code, if we fail to decrypt the synic_message_page, then setting
> > it to NULL will leak the page (which we'll live with) and ensures that hv_synic_free()
> > will handle it correctly. But at that point we'll exit with synic_event_page
> > non-NULL and in the encrypted state, which hv_synic_free() can't handle.
> >
> > Michael
>
> Thanks for spotting the issue!
> I think the below extra changes should do the job:
>
> @@ -121,91 +121,102 @@ int hv_synic_alloc(void)
>
> /*
> * First, zero all per-cpu memory areas so hv_synic_free() can
> * detect what memory has been allocated and cleanup properly
> * after any failures.
> */
> for_each_present_cpu(cpu) {
> hv_cpu = per_cpu_ptr(hv_context.cpu_context, cpu);
> memset(hv_cpu, 0, sizeof(*hv_cpu));
> }
>
> hv_context.hv_numa_map = kcalloc(nr_node_ids, sizeof(struct cpumask),
> GFP_KERNEL);
> if (hv_context.hv_numa_map == NULL) {
> pr_err("Unable to allocate NUMA map\n");
> goto err;
> }
>
> for_each_present_cpu(cpu) {
> hv_cpu = per_cpu_ptr(hv_context.cpu_context, cpu);
>
> tasklet_init(&hv_cpu->msg_dpc,
> vmbus_on_msg_dpc, (unsigned long) hv_cpu);
>
> /*
> * Synic message and event pages are allocated by paravisor.
> * Skip these pages allocation here.
> */
> if (!hv_isolation_type_snp() && !hv_root_partition) {
> hv_cpu->synic_message_page =
> (void *)get_zeroed_page(GFP_ATOMIC);
> if (hv_cpu->synic_message_page == NULL) {
> pr_err("Unable to allocate SYNIC message page\n");
> goto err;
> }
>
> hv_cpu->synic_event_page =
> (void *)get_zeroed_page(GFP_ATOMIC);
> if (hv_cpu->synic_event_page == NULL) {
> pr_err("Unable to allocate SYNIC event page\n");
> +
> + free_page((unsigned long)hv_cpu->synic_message_page);
> + hv_cpu->synic_message_page = NULL;
> +
> goto err;
> }
> }
>
> /* It's better to leak the page if the decryption fails. */
> if (hv_isolation_type_tdx()) {
> ret = set_memory_decrypted(
> (unsigned long)hv_cpu->synic_message_page, 1);
> if (ret) {
> pr_err("Failed to decrypt SYNIC msg page\n");
> hv_cpu->synic_message_page = NULL;
> +
> + /*
> + * Free the event page so that a TDX VM won't
> + * try to encrypt the page in hv_synic_free().
> + */
> + free_page((unsigned long)hv_cpu->synic_event_page);
> + hv_cpu->synic_event_page = NULL;
> goto err;
> }
>
> ret = set_memory_decrypted(
> (unsigned long)hv_cpu->synic_event_page, 1);
> if (ret) {
> pr_err("Failed to decrypt SYNIC event page\n");
> hv_cpu->synic_event_page = NULL;
> goto err;
> }
>
> memset(hv_cpu->synic_message_page, 0, PAGE_SIZE);
> memset(hv_cpu->synic_event_page, 0, PAGE_SIZE);
> }
> }
>
> return 0;
> err:
> /*
> * Any memory allocations that succeeded will be freed when
> * the caller cleans up by calling hv_synic_free()
> */
> return ret;
> }
>
> I'm going to use the below (i.e. v5 + the above extra changes) in v6.
> Please let me know if there is still any bug.
>
> @@ -116,6 +117,7 @@ int hv_synic_alloc(void)
> {
> int cpu;
> struct hv_per_cpu_context *hv_cpu;
> + int ret = -ENOMEM;
>
> /*
> * First, zero all per-cpu memory areas so hv_synic_free() can
> @@ -156,9 +158,42 @@ int hv_synic_alloc(void)
> (void *)get_zeroed_page(GFP_ATOMIC);
> if (hv_cpu->synic_event_page == NULL) {
> pr_err("Unable to allocate SYNIC event page\n");
> +
> + free_page((unsigned long)hv_cpu->synic_message_page);
> + hv_cpu->synic_message_page = NULL;
> +
> goto err;
> }
> }
> +
> + /* It's better to leak the page if the decryption fails. */
> + if (hv_isolation_type_tdx()) {
> + ret = set_memory_decrypted(
> + (unsigned long)hv_cpu->synic_message_page, 1);
> + if (ret) {
> + pr_err("Failed to decrypt SYNIC msg page\n");
> + hv_cpu->synic_message_page = NULL;
> +
> + /*
> + * Free the event page so that a TDX VM won't
> + * try to encrypt the page in hv_synic_free().
> + */
> + free_page((unsigned long)hv_cpu->synic_event_page);
> + hv_cpu->synic_event_page = NULL;
> + goto err;
> + }
> +
> + ret = set_memory_decrypted(
> + (unsigned long)hv_cpu->synic_event_page, 1);
> + if (ret) {
> + pr_err("Failed to decrypt SYNIC event page\n");
> + hv_cpu->synic_event_page = NULL;
> + goto err;
> + }
> +
> + memset(hv_cpu->synic_message_page, 0, PAGE_SIZE);
> + memset(hv_cpu->synic_event_page, 0, PAGE_SIZE);
> + }

Yes, this looks good to me. A minor point: In the two calls to set decrypted,
if there is a failure, output the value of "ret" in the error message. It should
never happen, but if it did, it could be hard to diagnose, and we'll want all
the info we can get about the failure. And do the same in hv_synic_free()
if setting back to encrypted should fail.

Michael

> }
>
> return 0;
> @@ -167,18 +202,40 @@ int hv_synic_alloc(void)
> * Any memory allocations that succeeded will be freed when
> * the caller cleans up by calling hv_synic_free()
> */
> - return -ENOMEM;
> + return ret;
> }
>
>
> void hv_synic_free(void)
> {
> int cpu;
> + int ret;
>
> for_each_present_cpu(cpu) {
> struct hv_per_cpu_context *hv_cpu
> = per_cpu_ptr(hv_context.cpu_context, cpu);
>
> + /* It's better to leak the page if the encryption fails. */
> + if (hv_isolation_type_tdx()) {
> + if (hv_cpu->synic_message_page) {
> + ret = set_memory_encrypted((unsigned long)
> + hv_cpu->synic_message_page, 1);
> + if (ret) {
> + pr_err("Failed to encrypt SYNIC msg page\n");
> + hv_cpu->synic_message_page = NULL;
> + }
> + }
> +
> + if (hv_cpu->synic_event_page) {
> + ret = set_memory_encrypted((unsigned long)
> + hv_cpu->synic_event_page, 1);
> + if (ret) {
> + pr_err("Failed to encrypt SYNIC event page\n");
> + hv_cpu->synic_event_page = NULL;
> + }
> + }
> + }
> +
> free_page((unsigned long)hv_cpu->synic_event_page);
> free_page((unsigned long)hv_cpu->synic_message_page);
> }
>
>
> I'll post a separate patch (currently if hv_synic_alloc() --> get_zeroed_page() fails,
> hv_context.hv_numa_map is leaked):
>
>
> --- a/drivers/hv/vmbus_drv.c
> +++ b/drivers/hv/vmbus_drv.c
> @@ -1515,27 +1515,27 @@ static int vmbus_bus_init(void)
> }
>
> ret = hv_synic_alloc();
> if (ret)
> goto err_alloc;
>
> /*
> * Initialize the per-cpu interrupt state and stimer state.
> * Then connect to the host.
> */
> ret = cpuhp_setup_state(CPUHP_AP_ONLINE_DYN, "hyperv/vmbus:online",
> hv_synic_init, hv_synic_cleanup);
> if (ret < 0)
> - goto err_cpuhp;
> + goto err_alloc;
> hyperv_cpuhp_online = ret;
>
> ret = vmbus_connect();
> if (ret)
> goto err_connect;
>
> if (hv_is_isolation_supported())
> sysctl_record_panic_msg = 0;
>
> /*
> * Only register if the crash MSRs are available
> */
> if (ms_hyperv.misc_features & HV_FEATURE_GUEST_CRASH_MSR_AVAILABLE) {
> @@ -1567,29 +1567,28 @@ static int vmbus_bus_init(void)
> /*
> * Always register the vmbus unload panic notifier because we
> * need to shut the VMbus channel connection on panic.
> */
> atomic_notifier_chain_register(&panic_notifier_list,
> &hyperv_panic_vmbus_unload_block);
>
> vmbus_request_offers();
>
> return 0;
>
> err_connect:
> cpuhp_remove_state(hyperv_cpuhp_online);
> -err_cpuhp:
> - hv_synic_free();
> err_alloc:
> + hv_synic_free();
> if (vmbus_irq == -1) {
> hv_remove_vmbus_handler();
> } else {
> free_percpu_irq(vmbus_irq, vmbus_evt);
> free_percpu(vmbus_evt);
> }
> err_setup:
> bus_unregister(&hv_bus);
> unregister_sysctl_table(hv_ctl_table_hdr);
> hv_ctl_table_hdr = NULL;
> return ret;
> }

2023-05-02 19:26:20

by Dexuan Cui

Subject: RE: [PATCH v5 5/6] Drivers: hv: vmbus: Support TDX guests

> From: Michael Kelley (LINUX) <[email protected]>
> Sent: Tuesday, May 2, 2023 8:26 AM
> ...
> Yes, this looks good to me.
Thanks for the confirmation!

> A minor point: In the two calls to set decrypted,
> if there is a failure, output the value of "ret" in the error message. It should
> never happen, but if it did, it could be hard to diagnose, and we'll want all
> the info we can get about the failure. And do the same in hv_synic_free()
> if setting back to encrypted should fail.
>
> Michael

Will do in v6.
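Michael's suggestion amounts to appending the return code to each error message, for example pr_err("Failed to decrypt SYNIC msg page: %d\n", ret); in hv_synic_alloc(), and the equivalent "encrypt" messages in hv_synic_free(). The user-space sketch below only shows what such a message would look like; format_decrypt_error_sk() is a hypothetical helper using snprintf() in place of pr_err().

```c
#include <assert.h>
#include <errno.h>
#include <stdio.h>
#include <string.h>

/* Hypothetical stand-in for the pr_err() call: format the message that
 * would be logged, including the negative errno returned by
 * set_memory_decrypted(). */
static int format_decrypt_error_sk(char *buf, size_t len, const char *what,
				   int ret)
{
	return snprintf(buf, len, "Failed to decrypt SYNIC %s page: %d\n",
			what, ret);
}
```

Printing the raw errno keeps the message cheap while giving enough information to tell, for instance, a hypercall failure from an allocation failure if the conversion ever does fail.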