The patchset adds the Hyper-V specific code so that a TDX guest can run
on Hyper-V. Please review. Thanks!
FYI, v1 and v2 are here:
https://lwn.net/ml/linux-kernel/[email protected]/
https://lwn.net/ml/linux-kernel/[email protected]/
This v3 patchset is based on tip.git's x86/tdx branch:
https://git.kernel.org/pub/scm/linux/kernel/git/tip/tip.git/log/?h=x86/tdx
(The x86/tdx branch now has Kirill's patch "x86/tdx: Expand __tdx_hypercall() to handle more arguments")
If you want to view the patches on GitHub, they are in this branch:
https://github.com/dcui/tdx/commits/decui/upstream-tip/x86/tdx/2023-0205
This v3 patchset also applies cleanly to:
https://git.kernel.org/pub/scm/linux/kernel/git/hyperv/linux.git/log/?h=hyperv-next
Thanks,
Dexuan
Dexuan Cui (6):
x86/tdx: Retry TDVMCALL_MAP_GPA() when needed
x86/tdx: Support vmalloc() for tdx_enc_status_changed()
x86/hyperv: Add hv_isolation_type_tdx() to detect TDX guests
x86/hyperv: Support hypercalls for TDX guests
Drivers: hv: vmbus: Support TDX guests
x86/hyperv: Fix serial console interrupts for TDX guests
arch/x86/coco/tdx/tdx.c | 113 ++++++++++++++++++++++-------
arch/x86/hyperv/hv_apic.c | 6 +-
arch/x86/hyperv/hv_init.c | 27 ++++++-
arch/x86/hyperv/ivm.c | 28 +++++++
arch/x86/include/asm/hyperv-tlfs.h | 3 +-
arch/x86/include/asm/mshyperv.h | 20 +++++
arch/x86/kernel/cpu/mshyperv.c | 50 ++++++++++++-
arch/x86/mm/pat/set_memory.c | 2 +-
drivers/hv/connection.c | 4 +-
drivers/hv/hv.c | 62 ++++++++++++++--
drivers/hv/hv_common.c | 33 +++++++++
drivers/hv/ring_buffer.c | 2 +-
include/asm-generic/mshyperv.h | 2 +
13 files changed, 309 insertions(+), 43 deletions(-)
--
2.25.1
GHCI spec for TDX 1.0 says that the MapGPA call may fail with the R10
error code = TDG.VP.VMCALL_RETRY (1), and the guest must retry this
operation for the pages in the region starting at the GPA specified
in R11.
When a TDX guest runs on Hyper-V, Hyper-V returns the retry error
when hyperv_init() -> swiotlb_update_mem_attributes() ->
set_memory_decrypted() decrypts up to 1GB of swiotlb bounce buffers.
Signed-off-by: Dexuan Cui <[email protected]>
---
Changes in v2:
Used __tdx_hypercall() directly in tdx_map_gpa().
Added a max_retry_cnt of 1000.
Renamed a few variables, e.g., r11 -> map_fail_paddr.
Changes in v3:
Changed max_retry_cnt from 1000 to 3.
arch/x86/coco/tdx/tdx.c | 64 +++++++++++++++++++++++++++++++++--------
1 file changed, 52 insertions(+), 12 deletions(-)
diff --git a/arch/x86/coco/tdx/tdx.c b/arch/x86/coco/tdx/tdx.c
index 583ac2f1a5fe..6e2665c07395 100644
--- a/arch/x86/coco/tdx/tdx.c
+++ b/arch/x86/coco/tdx/tdx.c
@@ -28,6 +28,8 @@
#define TDVMCALL_MAP_GPA 0x10001
#define TDVMCALL_REPORT_FATAL_ERROR 0x10003
+#define TDVMCALL_STATUS_RETRY 1
+
/* MMIO direction */
#define EPT_READ 0
#define EPT_WRITE 1
@@ -799,14 +801,15 @@ static bool try_accept_one(phys_addr_t *start, unsigned long len,
}
/*
- * Inform the VMM of the guest's intent for this physical page: shared with
- * the VMM or private to the guest. The VMM is expected to change its mapping
- * of the page in response.
+ * Notify the VMM about page mapping conversion. More info about ABI
+ * can be found in TDX Guest-Host-Communication Interface (GHCI),
+ * section "TDG.VP.VMCALL<MapGPA>".
*/
-static bool tdx_enc_status_changed(unsigned long vaddr, int numpages, bool enc)
+static bool tdx_map_gpa(phys_addr_t start, phys_addr_t end, bool enc)
{
- phys_addr_t start = __pa(vaddr);
- phys_addr_t end = __pa(vaddr + numpages * PAGE_SIZE);
+ int max_retry_cnt = 3, retry_cnt = 0;
+ struct tdx_hypercall_args args;
+ u64 map_fail_paddr, ret;
if (!enc) {
/* Set the shared (decrypted) bits: */
@@ -814,12 +817,49 @@ static bool tdx_enc_status_changed(unsigned long vaddr, int numpages, bool enc)
end |= cc_mkdec(0);
}
- /*
- * Notify the VMM about page mapping conversion. More info about ABI
- * can be found in TDX Guest-Host-Communication Interface (GHCI),
- * section "TDG.VP.VMCALL<MapGPA>"
- */
- if (_tdx_hypercall(TDVMCALL_MAP_GPA, start, end - start, 0, 0))
+ while (1) {
+ memset(&args, 0, sizeof(args));
+ args.r10 = TDX_HYPERCALL_STANDARD;
+ args.r11 = TDVMCALL_MAP_GPA;
+ args.r12 = start;
+ args.r13 = end - start;
+
+ ret = __tdx_hypercall(&args, TDX_HCALL_HAS_OUTPUT);
+ if (ret != TDVMCALL_STATUS_RETRY)
+ break;
+ /*
+ * The guest must retry the operation for the pages in the
+ * region starting at the GPA specified in R11. Make sure R11
+ * contains a sane value.
+ */
+ map_fail_paddr = args.r11;
+ if (map_fail_paddr < start || map_fail_paddr >= end)
+ return false;
+
+ if (map_fail_paddr == start) {
+ retry_cnt++;
+ if (retry_cnt > max_retry_cnt)
+ return false;
+ } else {
+ retry_cnt = 0;
+ start = map_fail_paddr;
+ }
+ }
+
+ return !ret;
+}
+
+/*
+ * Inform the VMM of the guest's intent for this physical page: shared with
+ * the VMM or private to the guest. The VMM is expected to change its mapping
+ * of the page in response.
+ */
+static bool tdx_enc_status_changed(unsigned long vaddr, int numpages, bool enc)
+{
+ phys_addr_t start = __pa(vaddr);
+ phys_addr_t end = __pa(vaddr + numpages * PAGE_SIZE);
+
+ if (!tdx_map_gpa(start, end, enc))
return false;
/* private->shared conversion requires only MapGPA call */
--
2.25.1
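For readers following the retry logic in tdx_map_gpa() above, here is a
minimal userspace sketch of the same control flow. It is not kernel code:
mock_map_gpa() and struct mock_vmm are hypothetical stand-ins for the
hypercall and the VMM, and the status values merely mirror the patch.

```c
#include <stdbool.h>
#include <stdint.h>

#define STATUS_OK    0u
#define STATUS_RETRY 1u  /* mirrors TDVMCALL_STATUS_RETRY in the patch */
#define MAX_RETRY    3   /* mirrors max_retry_cnt in v3 */

/* Hypothetical VMM model: converts at most `chunk` bytes per call. */
struct mock_vmm {
	uint64_t chunk;   /* bytes converted per call; 0 means no progress */
	unsigned calls;   /* number of mock hypercalls issued so far */
};

/* Hypothetical stand-in for the MapGPA hypercall. */
static uint64_t mock_map_gpa(struct mock_vmm *vmm, uint64_t start,
			     uint64_t len, uint64_t *fail_addr)
{
	vmm->calls++;
	if (vmm->chunk >= len)
		return STATUS_OK;
	*fail_addr = start + vmm->chunk;  /* first address not yet converted */
	return STATUS_RETRY;
}

/* Same loop structure as tdx_map_gpa() in the patch. */
static bool map_gpa_with_retry(struct mock_vmm *vmm, uint64_t start,
			       uint64_t end)
{
	int retry_cnt = 0;
	uint64_t ret, fail = 0;

	for (;;) {
		ret = mock_map_gpa(vmm, start, end - start, &fail);
		if (ret != STATUS_RETRY)
			break;

		/* Reject a resume address outside the requested range. */
		if (fail < start || fail >= end)
			return false;

		if (fail == start) {
			/* No progress: give up after MAX_RETRY retries. */
			if (++retry_cnt > MAX_RETRY)
				return false;
		} else {
			/* Progress was made: resume from the new address. */
			retry_cnt = 0;
			start = fail;
		}
	}

	return ret == STATUS_OK;
}
```

With this model, a VMM that converts one 4K page per call completes a
four-page range in four calls, while a VMM that never makes progress is
abandoned after the initial call plus three retries.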
When a TDX guest runs on Hyper-V, the hv_netvsc driver's netvsc_init_buf()
allocates buffers using vzalloc(), and needs to share the buffers with the
host OS by calling set_memory_decrypted(), which does not yet work for
vmalloc() memory. Add the support by handling the pages one by one.
Signed-off-by: Dexuan Cui <[email protected]>
---
Changes in v2:
Changed tdx_enc_status_changed() in place.
Hi, Dave, I checked the huge vmalloc mapping code, but still don't know
how to get the underlying huge page info (if huge page is in use) and
try to use PG_LEVEL_2M/1G in try_accept_page() for vmalloc: I checked
is_vm_area_hugepages() and __vfree() -> __vunmap(), and I think the
underlying page allocation info is internal to the mm code, and there
is no mm API for me to get the info in tdx_enc_status_changed().
Hi, Kirill, the load_unaligned_zeropad() issue is not addressed in
this patch. The issue looks like a generic issue that also happens to
AMD SNP vTOM mode and C-bit mode. Will need to figure out how to
address the issue. If we decide to adjust direct mapping to have the
shared bit set, it looks like we need to do the below for each
'start_va' vmalloc page:
pa = slow_virt_to_phys(start_va);
set_memory_decrypted(phys_to_virt(pa), 1); -- this line calls
tdx_enc_status_changed() the second time for the same page, which is not
great. It looks like we need to find a way to reuse the cpa_flush()
related code in __set_memory_enc_pgtable() and make sure we call
tdx_enc_status_changed() only once for the same page from vmalloc()?
Changes in v3:
No change since v2.
arch/x86/coco/tdx/tdx.c | 69 ++++++++++++++++++++++++++---------------
1 file changed, 44 insertions(+), 25 deletions(-)
diff --git a/arch/x86/coco/tdx/tdx.c b/arch/x86/coco/tdx/tdx.c
index 6e2665c07395..2cad4b8c4dc4 100644
--- a/arch/x86/coco/tdx/tdx.c
+++ b/arch/x86/coco/tdx/tdx.c
@@ -7,6 +7,7 @@
#include <linux/cpufeature.h>
#include <linux/export.h>
#include <linux/io.h>
+#include <linux/mm.h>
#include <asm/coco.h>
#include <asm/tdx.h>
#include <asm/vmx.h>
@@ -800,6 +801,34 @@ static bool try_accept_one(phys_addr_t *start, unsigned long len,
return true;
}
+static bool try_accept_page(phys_addr_t start, phys_addr_t end)
+{
+ /*
+ * For shared->private conversion, accept the page using
+ * TDX_ACCEPT_PAGE TDX module call.
+ */
+ while (start < end) {
+ unsigned long len = end - start;
+
+ /*
+ * Try larger accepts first. It gives chance to VMM to keep
+ * 1G/2M SEPT entries where possible and speeds up process by
+ * cutting number of hypercalls (if successful).
+ */
+
+ if (try_accept_one(&start, len, PG_LEVEL_1G))
+ continue;
+
+ if (try_accept_one(&start, len, PG_LEVEL_2M))
+ continue;
+
+ if (!try_accept_one(&start, len, PG_LEVEL_4K))
+ return false;
+ }
+
+ return true;
+}
+
/*
* Notify the VMM about page mapping conversion. More info about ABI
* can be found in TDX Guest-Host-Communication Interface (GHCI),
@@ -856,37 +885,27 @@ static bool tdx_map_gpa(phys_addr_t start, phys_addr_t end, bool enc)
*/
static bool tdx_enc_status_changed(unsigned long vaddr, int numpages, bool enc)
{
- phys_addr_t start = __pa(vaddr);
- phys_addr_t end = __pa(vaddr + numpages * PAGE_SIZE);
+ bool is_vmalloc = is_vmalloc_addr((void *)vaddr);
+ unsigned long len = numpages * PAGE_SIZE;
+ void *start_va = (void *)vaddr, *end_va = start_va + len;
+ phys_addr_t start_pa, end_pa;
- if (!tdx_map_gpa(start, end, enc))
+ if (offset_in_page(start_va) != 0)
return false;
- /* private->shared conversion requires only MapGPA call */
- if (!enc)
- return true;
-
- /*
- * For shared->private conversion, accept the page using
- * TDX_ACCEPT_PAGE TDX module call.
- */
- while (start < end) {
- unsigned long len = end - start;
-
- /*
- * Try larger accepts first. It gives chance to VMM to keep
- * 1G/2M SEPT entries where possible and speeds up process by
- * cutting number of hypercalls (if successful).
- */
-
- if (try_accept_one(&start, len, PG_LEVEL_1G))
- continue;
+ while (start_va < end_va) {
+ start_pa = is_vmalloc ? slow_virt_to_phys(start_va) :
+ __pa(start_va);
+ end_pa = start_pa + (is_vmalloc ? PAGE_SIZE : len);
- if (try_accept_one(&start, len, PG_LEVEL_2M))
- continue;
+ if (!tdx_map_gpa(start_pa, end_pa, enc))
+ return false;
- if (!try_accept_one(&start, len, PG_LEVEL_4K))
+ /* private->shared conversion requires only MapGPA call */
+ if (enc && !try_accept_page(start_pa, end_pa))
return false;
+
+ start_va += is_vmalloc ? PAGE_SIZE : len;
}
return true;
--
2.25.1
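The per-page walk in the reworked tdx_enc_status_changed() can be
illustrated with a small userspace sketch. The address bases and the
translate() helper below are made up for illustration (a linear mapping
stands in for __pa(), a scattered one for slow_virt_to_phys()); only the
loop shape follows the patch.

```c
#include <stdbool.h>
#include <stdint.h>

#define SKETCH_PAGE_SIZE 4096u
#define DIRECT_MAP_BASE  0x100000000ull  /* hypothetical */
#define VMALLOC_BASE     0x200000000ull  /* hypothetical */

struct range {
	uint64_t start_pa, end_pa;
};

/* Hypothetical address translation for the two kinds of mappings. */
static uint64_t translate(uint64_t va, bool is_vmalloc)
{
	uint64_t vpn;

	if (!is_vmalloc)
		return va - DIRECT_MAP_BASE;  /* linear, like __pa() */

	/* Made-up scatter: virtual page n lives in physical page 2n + 1. */
	vpn = (va - VMALLOC_BASE) / SKETCH_PAGE_SIZE;
	return (2 * vpn + 1) * SKETCH_PAGE_SIZE;
}

/*
 * Split [va, va + numpages pages) into physical ranges the way the patch
 * does: one range for physically contiguous memory, one range per page
 * for vmalloc memory. Returns the number of ranges, or -1 on error.
 */
static int split_ranges(uint64_t va, int numpages, bool is_vmalloc,
			struct range *out, int max_out)
{
	uint64_t len = (uint64_t)numpages * SKETCH_PAGE_SIZE;
	uint64_t end_va = va + len;
	int n = 0;

	if (va % SKETCH_PAGE_SIZE != 0)
		return -1;  /* the patch also rejects unaligned addresses */

	while (va < end_va) {
		if (n == max_out)
			return -1;
		out[n].start_pa = translate(va, is_vmalloc);
		out[n].end_pa = out[n].start_pa +
				(is_vmalloc ? SKETCH_PAGE_SIZE : len);
		n++;
		va += is_vmalloc ? SKETCH_PAGE_SIZE : len;
	}

	return n;
}
```

In the patch, each resulting range is passed to tdx_map_gpa() and, for
shared->private conversion, to try_accept_page(). The cost of the vmalloc
path is one MapGPA call per page, which is why the changelog discusses
huge-page information.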
No logic change to SNP/VBS guests.
hv_isolation_type_tdx() will be used to instruct a TDX guest on Hyper-V to
do some TDX-specific operations, e.g. hv_do_hypercall() should use
__tdx_hypercall(), and a TDX guest on Hyper-V should handle the Hyper-V
Event/Message/Monitor pages specially.
Signed-off-by: Dexuan Cui <[email protected]>
Reviewed-by: Kuppuswamy Sathyanarayanan <[email protected]>
---
Changes in v2:
Added "#ifdef CONFIG_INTEL_TDX_GUEST and #endif" for
hv_isolation_type_tdx() in arch/x86/hyperv/ivm.c.
Simplified the changes in ms_hyperv_init_platform().
Changes in v3:
Added Kuppuswamy's Reviewed-by.
arch/x86/hyperv/ivm.c | 9 +++++++++
arch/x86/include/asm/hyperv-tlfs.h | 3 ++-
arch/x86/include/asm/mshyperv.h | 3 +++
arch/x86/kernel/cpu/mshyperv.c | 7 ++++++-
drivers/hv/hv_common.c | 6 ++++++
5 files changed, 26 insertions(+), 2 deletions(-)
diff --git a/arch/x86/hyperv/ivm.c b/arch/x86/hyperv/ivm.c
index 1dbcbd9da74d..13ccb52eecd7 100644
--- a/arch/x86/hyperv/ivm.c
+++ b/arch/x86/hyperv/ivm.c
@@ -269,6 +269,15 @@ bool hv_isolation_type_snp(void)
return static_branch_unlikely(&isolation_type_snp);
}
+#ifdef CONFIG_INTEL_TDX_GUEST
+DEFINE_STATIC_KEY_FALSE(isolation_type_tdx);
+
+bool hv_isolation_type_tdx(void)
+{
+ return static_branch_unlikely(&isolation_type_tdx);
+}
+#endif
+
/*
* hv_mark_gpa_visibility - Set pages visible to host via hvcall.
*
diff --git a/arch/x86/include/asm/hyperv-tlfs.h b/arch/x86/include/asm/hyperv-tlfs.h
index 08e822bd7aa6..1f4a967b48c5 100644
--- a/arch/x86/include/asm/hyperv-tlfs.h
+++ b/arch/x86/include/asm/hyperv-tlfs.h
@@ -163,7 +163,8 @@
enum hv_isolation_type {
HV_ISOLATION_TYPE_NONE = 0,
HV_ISOLATION_TYPE_VBS = 1,
- HV_ISOLATION_TYPE_SNP = 2
+ HV_ISOLATION_TYPE_SNP = 2,
+ HV_ISOLATION_TYPE_TDX = 3
};
/* Hyper-V specific model specific registers (MSRs) */
diff --git a/arch/x86/include/asm/mshyperv.h b/arch/x86/include/asm/mshyperv.h
index 6d502f3efb0f..49bca07bbd2c 100644
--- a/arch/x86/include/asm/mshyperv.h
+++ b/arch/x86/include/asm/mshyperv.h
@@ -14,6 +14,7 @@
union hv_ghcb;
DECLARE_STATIC_KEY_FALSE(isolation_type_snp);
+DECLARE_STATIC_KEY_FALSE(isolation_type_tdx);
typedef int (*hyperv_fill_flush_list_func)(
struct hv_guest_mapping_flush_list *flush,
@@ -30,6 +31,8 @@ extern u64 hv_current_partition_id;
extern union hv_ghcb * __percpu *hv_ghcb_pg;
+extern bool hv_isolation_type_tdx(void);
+
int hv_call_deposit_pages(int node, u64 partition_id, u32 num_pages);
int hv_call_add_logical_proc(int node, u32 lp_index, u32 acpi_id);
int hv_call_create_vp(int node, u64 partition_id, u32 vp_index, u32 flags);
diff --git a/arch/x86/kernel/cpu/mshyperv.c b/arch/x86/kernel/cpu/mshyperv.c
index 46668e255421..941372449ff2 100644
--- a/arch/x86/kernel/cpu/mshyperv.c
+++ b/arch/x86/kernel/cpu/mshyperv.c
@@ -339,9 +339,14 @@ static void __init ms_hyperv_init_platform(void)
}
/* Isolation VMs are unenlightened SEV-based VMs, thus this check: */
if (IS_ENABLED(CONFIG_AMD_MEM_ENCRYPT)) {
- if (hv_get_isolation_type() != HV_ISOLATION_TYPE_NONE)
+ if (hv_get_isolation_type() == HV_ISOLATION_TYPE_VBS ||
+ hv_get_isolation_type() == HV_ISOLATION_TYPE_SNP)
cc_set_vendor(CC_VENDOR_HYPERV);
}
+
+ if (IS_ENABLED(CONFIG_INTEL_TDX_GUEST) &&
+ hv_get_isolation_type() == HV_ISOLATION_TYPE_TDX)
+ static_branch_enable(&isolation_type_tdx);
}
if (hv_max_functions_eax >= HYPERV_CPUID_NESTED_FEATURES) {
diff --git a/drivers/hv/hv_common.c b/drivers/hv/hv_common.c
index ae68298c0dca..a9a03ab04b97 100644
--- a/drivers/hv/hv_common.c
+++ b/drivers/hv/hv_common.c
@@ -268,6 +268,12 @@ bool __weak hv_isolation_type_snp(void)
}
EXPORT_SYMBOL_GPL(hv_isolation_type_snp);
+bool __weak hv_isolation_type_tdx(void)
+{
+ return false;
+}
+EXPORT_SYMBOL_GPL(hv_isolation_type_tdx);
+
void __weak hv_setup_vmbus_handler(void (*handler)(void))
{
}
--
2.25.1
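The decision made in ms_hyperv_init_platform() after this patch can be
summarized in a few lines of plain C. This is only a sketch: the struct
and its two booleans are hypothetical stand-ins for
cc_set_vendor(CC_VENDOR_HYPERV) and static_branch_enable(&isolation_type_tdx),
and the two config parameters model IS_ENABLED() checks.

```c
#include <stdbool.h>

/* Same numeric values as enum hv_isolation_type in the patch. */
enum isolation_type {
	ISO_NONE = 0,
	ISO_VBS  = 1,
	ISO_SNP  = 2,
	ISO_TDX  = 3,
};

/* Hypothetical record of what platform init decided. */
struct platform_state {
	bool vendor_hyperv;    /* models cc_set_vendor(CC_VENDOR_HYPERV) */
	bool tdx_key_enabled;  /* models the isolation_type_tdx static key */
};

static struct platform_state init_platform(enum isolation_type type,
					   bool amd_mem_encrypt,
					   bool intel_tdx_guest)
{
	struct platform_state s = { false, false };

	/* Only VBS and SNP isolation select the Hyper-V CC vendor. */
	if (amd_mem_encrypt && (type == ISO_VBS || type == ISO_SNP))
		s.vendor_hyperv = true;

	/* TDX isolation only flips the TDX static key. */
	if (intel_tdx_guest && type == ISO_TDX)
		s.tdx_key_enabled = true;

	return s;
}
```

The point of narrowing the first check from "!= NONE" to "VBS or SNP" is
visible here: a TDX guest no longer takes the SEV-oriented vendor path
even when CONFIG_AMD_MEM_ENCRYPT is enabled in the same kernel.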
A TDX guest uses the GHCI call rather than hv_hypercall_pg.
In hv_do_hypercall(), Hyper-V requires that the input/output addresses
have the cc_mask set.
Signed-off-by: Dexuan Cui <[email protected]>
---
Changes in v2:
Implemented hv_tdx_hypercall() in C rather than in assembly code.
Renamed the parameter names of hv_tdx_hypercall().
Used cc_mkdec() directly in hv_do_hypercall().
Changes in v3:
Decrypted/encrypted hyperv_pcpu_input_arg in
hv_common_cpu_init() and hv_common_cpu_die().
arch/x86/hyperv/hv_init.c | 8 ++++++++
arch/x86/hyperv/ivm.c | 14 ++++++++++++++
arch/x86/include/asm/mshyperv.h | 17 +++++++++++++++++
drivers/hv/hv_common.c | 21 +++++++++++++++++++++
4 files changed, 60 insertions(+)
diff --git a/arch/x86/hyperv/hv_init.c b/arch/x86/hyperv/hv_init.c
index 41ef036ebb7b..6a0bcbd18306 100644
--- a/arch/x86/hyperv/hv_init.c
+++ b/arch/x86/hyperv/hv_init.c
@@ -430,6 +430,10 @@ void __init hyperv_init(void)
/* Hyper-V requires to write guest os id via ghcb in SNP IVM. */
hv_ghcb_msr_write(HV_X64_MSR_GUEST_OS_ID, guest_id);
+ /* A TDX guest uses the GHCI call rather than hv_hypercall_pg. */
+ if (hv_isolation_type_tdx())
+ goto skip_hypercall_pg_init;
+
hv_hypercall_pg = __vmalloc_node_range(PAGE_SIZE, 1, VMALLOC_START,
VMALLOC_END, GFP_KERNEL, PAGE_KERNEL_ROX,
VM_FLUSH_RESET_PERMS, NUMA_NO_NODE,
@@ -469,6 +473,7 @@ void __init hyperv_init(void)
wrmsrl(HV_X64_MSR_HYPERCALL, hypercall_msr.as_uint64);
}
+skip_hypercall_pg_init:
/*
* hyperv_init() is called before LAPIC is initialized: see
* apic_intr_mode_init() -> x86_platform.apic_post_init() and
@@ -602,6 +607,9 @@ bool hv_is_hyperv_initialized(void)
if (x86_hyper_type != X86_HYPER_MS_HYPERV)
return false;
+ /* A TDX guest uses the GHCI call rather than hv_hypercall_pg. */
+ if (hv_isolation_type_tdx())
+ return true;
/*
* Verify that earlier initialization succeeded by checking
* that the hypercall page is setup
diff --git a/arch/x86/hyperv/ivm.c b/arch/x86/hyperv/ivm.c
index 13ccb52eecd7..07e4253b5809 100644
--- a/arch/x86/hyperv/ivm.c
+++ b/arch/x86/hyperv/ivm.c
@@ -276,6 +276,20 @@ bool hv_isolation_type_tdx(void)
{
return static_branch_unlikely(&isolation_type_tdx);
}
+
+u64 hv_tdx_hypercall(u64 control, u64 param1, u64 param2)
+{
+ struct tdx_hypercall_args args = { };
+
+ args.r10 = control;
+ args.rdx = param1;
+ args.r8 = param2;
+
+ (void)__tdx_hypercall(&args, TDX_HCALL_HAS_OUTPUT);
+
+ return args.r11;
+}
+EXPORT_SYMBOL_GPL(hv_tdx_hypercall);
#endif
/*
diff --git a/arch/x86/include/asm/mshyperv.h b/arch/x86/include/asm/mshyperv.h
index 49bca07bbd2c..159ab74d80e6 100644
--- a/arch/x86/include/asm/mshyperv.h
+++ b/arch/x86/include/asm/mshyperv.h
@@ -10,6 +10,7 @@
#include <asm/nospec-branch.h>
#include <asm/paravirt.h>
#include <asm/mshyperv.h>
+#include <asm/coco.h>
union hv_ghcb;
@@ -37,6 +38,12 @@ int hv_call_deposit_pages(int node, u64 partition_id, u32 num_pages);
int hv_call_add_logical_proc(int node, u32 lp_index, u32 acpi_id);
int hv_call_create_vp(int node, u64 partition_id, u32 vp_index, u32 flags);
+u64 hv_tdx_hypercall(u64 control, u64 param1, u64 param2);
+
+/*
+ * If the hypercall involves no input or output parameters, the hypervisor
+ * ignores the corresponding GPA pointer.
+ */
static inline u64 hv_do_hypercall(u64 control, void *input, void *output)
{
u64 input_address = input ? virt_to_phys(input) : 0;
@@ -44,6 +51,10 @@ static inline u64 hv_do_hypercall(u64 control, void *input, void *output)
u64 hv_status;
#ifdef CONFIG_X86_64
+ if (hv_isolation_type_tdx())
+ return hv_tdx_hypercall(control,
+ cc_mkdec(input_address),
+ cc_mkdec(output_address));
if (!hv_hypercall_pg)
return U64_MAX;
@@ -81,6 +92,9 @@ static inline u64 hv_do_fast_hypercall8(u16 code, u64 input1)
u64 hv_status, control = (u64)code | HV_HYPERCALL_FAST_BIT;
#ifdef CONFIG_X86_64
+ if (hv_isolation_type_tdx())
+ return hv_tdx_hypercall(control, input1, 0);
+
{
__asm__ __volatile__(CALL_NOSPEC
: "=a" (hv_status), ASM_CALL_CONSTRAINT,
@@ -112,6 +126,9 @@ static inline u64 hv_do_fast_hypercall16(u16 code, u64 input1, u64 input2)
u64 hv_status, control = (u64)code | HV_HYPERCALL_FAST_BIT;
#ifdef CONFIG_X86_64
+ if (hv_isolation_type_tdx())
+ return hv_tdx_hypercall(control, input1, input2);
+
{
__asm__ __volatile__("mov %4, %%r8\n"
CALL_NOSPEC
diff --git a/drivers/hv/hv_common.c b/drivers/hv/hv_common.c
index a9a03ab04b97..219c3f235c50 100644
--- a/drivers/hv/hv_common.c
+++ b/drivers/hv/hv_common.c
@@ -21,6 +21,7 @@
#include <linux/ptrace.h>
#include <linux/slab.h>
#include <linux/dma-map-ops.h>
+#include <linux/set_memory.h>
#include <asm/hyperv-tlfs.h>
#include <asm/mshyperv.h>
@@ -125,6 +126,7 @@ int hv_common_cpu_init(unsigned int cpu)
u64 msr_vp_index;
gfp_t flags;
int pgcount = hv_root_partition ? 2 : 1;
+ int ret;
/* hv_cpu_init() can be called with IRQs disabled from hv_resume() */
flags = irqs_disabled() ? GFP_ATOMIC : GFP_KERNEL;
@@ -134,6 +136,17 @@ int hv_common_cpu_init(unsigned int cpu)
if (!(*inputarg))
return -ENOMEM;
+ if (hv_isolation_type_tdx()) {
+ ret = set_memory_decrypted((unsigned long)*inputarg, pgcount);
+ if (ret) {
+ /* It may be unsafe to free *inputarg */
+ *inputarg = NULL;
+ return ret;
+ }
+
+ memset(*inputarg, 0x00, pgcount * HV_HYP_PAGE_SIZE);
+ }
+
if (hv_root_partition) {
outputarg = (void **)this_cpu_ptr(hyperv_pcpu_output_arg);
*outputarg = (char *)(*inputarg) + HV_HYP_PAGE_SIZE;
@@ -154,6 +167,8 @@ int hv_common_cpu_die(unsigned int cpu)
unsigned long flags;
void **inputarg, **outputarg;
void *mem;
+ int pgcount = hv_root_partition ? 2 : 1;
+ int ret;
local_irq_save(flags);
@@ -168,6 +183,12 @@ int hv_common_cpu_die(unsigned int cpu)
local_irq_restore(flags);
+ if (hv_isolation_type_tdx()) {
+ ret = set_memory_encrypted((unsigned long)mem, pgcount);
+ if (ret)
+ return ret;
+ }
+
kfree(mem);
return 0;
--
2.25.1
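As a rough illustration of how the register arguments differ between the
memory-based and the fast hypercall paths in this patch, consider the
sketch below. SHARED_MASK is an assumption (bit 47 is used here purely as
an example; the real mask is whatever cc_mkdec() applies on the running
system), and struct hcall_regs is a hypothetical subset of
struct tdx_hypercall_args.

```c
#include <stdint.h>

/* Assumption for illustration: the shared bit is GPA bit 47. */
#define SHARED_MASK (1ull << 47)

/* Hypothetical subset of the registers used by hv_tdx_hypercall(). */
struct hcall_regs {
	uint64_t r10;  /* hypercall control word */
	uint64_t rdx;  /* first parameter */
	uint64_t r8;   /* second parameter */
};

/* Models cc_mkdec(): mark a guest physical address as shared. */
static uint64_t mk_shared(uint64_t gpa)
{
	return gpa | SHARED_MASK;
}

/*
 * Memory-based hypercall: the parameters are GPAs of the input/output
 * pages, so both get the shared bit. As in the patch, a missing buffer
 * (address 0) still gets the mask applied.
 */
static struct hcall_regs build_slow_hypercall(uint64_t control,
					      uint64_t input_pa,
					      uint64_t output_pa)
{
	struct hcall_regs r = { control, mk_shared(input_pa),
				mk_shared(output_pa) };
	return r;
}

/* Fast hypercall: the parameters are plain values, passed unmodified. */
static struct hcall_regs build_fast_hypercall(uint64_t control,
					      uint64_t input1,
					      uint64_t input2)
{
	struct hcall_regs r = { control, input1, input2 };
	return r;
}
```

This also shows why v3 decrypts hyperv_pcpu_input_arg in
hv_common_cpu_init(): an address carrying the shared bit is only useful
to the host if the page behind it has actually been converted to shared.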
When a TDX guest runs on Hyper-V, the UEFI firmware sets the HW_REDUCED
flag, and consequently ttyS0 interrupts can't work. Fix the issue by
overriding x86_init.acpi.reduced_hw_early_init().
Signed-off-by: Dexuan Cui <[email protected]>
---
This patch appears in the patchset for the first time.
arch/x86/kernel/cpu/mshyperv.c | 22 ++++++++++++++++++++++
1 file changed, 22 insertions(+)
diff --git a/arch/x86/kernel/cpu/mshyperv.c b/arch/x86/kernel/cpu/mshyperv.c
index 6a57af60ec9f..26490bc5a060 100644
--- a/arch/x86/kernel/cpu/mshyperv.c
+++ b/arch/x86/kernel/cpu/mshyperv.c
@@ -253,6 +253,26 @@ static void __init hv_smp_prepare_cpus(unsigned int max_cpus)
}
#endif
+/*
+ * When a TDX guest runs on Hyper-V, the firmware sets the HW_REDUCED flag: see
+ * acpi_tb_create_local_fadt(). Consequently ttyS0 interrupts can't work because
+ * request_irq() -> ... -> irq_to_desc() returns NULL for ttyS0. This happens
+ * because mp_config_acpi_legacy_irqs() sees a nr_legacy_irqs() of 0, so it
+ * doesn't initialize the array 'mp_irqs[]', and later setup_IO_APIC_irqs() ->
+ * find_irq_entry() fails to find the legacy irqs from the array, and hence
+ * doesn't create the necessary irq description info.
+ *
+ * Copy arch/x86/kernel/acpi/boot.c: acpi_generic_reduced_hw_init() but doesn't
+ * change 'legacy_pic', so it keeps its default value 'default_legacy_pic' in
+ * mp_config_acpi_legacy_irqs(), which sees a non-zero nr_legacy_irqs(), and
+ * eventually serial console interrupts can work properly.
+ */
+static void __init reduced_hw_init(void)
+{
+ x86_init.timers.timer_init = x86_init_noop;
+ x86_init.irqs.pre_vector_init = x86_init_noop;
+}
+
static void __init ms_hyperv_init_platform(void)
{
int hv_max_functions_eax;
@@ -367,6 +387,8 @@ static void __init ms_hyperv_init_platform(void)
/* A TDX VM must use x2APIC and doesn't use lazy EOI */
ms_hyperv.hints &= ~HV_X64_APIC_ACCESS_RECOMMENDED;
+
+ x86_init.acpi.reduced_hw_early_init = reduced_hw_init;
}
}
--
2.25.1
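The difference between the generic reduced-hardware init and the override
added by this patch can be modeled in userspace as follows. This is a
sketch under assumptions: struct boot_state is hypothetical, the
nr_legacy_irqs field stands in for the effect of the 'legacy_pic'
selection, and the value 16 is illustrative.

```c
#define DEFAULT_LEGACY_IRQS 16  /* illustrative value */

/* Hypothetical model of the x86_init hooks touched by the patch. */
struct boot_state {
	void (*timer_init)(void);
	void (*pre_vector_init)(void);
	int nr_legacy_irqs;  /* models the effect of 'legacy_pic' */
};

static void noop(void) { }
static void real_timer_init(void) { }
static void real_pre_vector_init(void) { }

static struct boot_state default_state(void)
{
	struct boot_state s = { real_timer_init, real_pre_vector_init,
				DEFAULT_LEGACY_IRQS };
	return s;
}

/* Models acpi_generic_reduced_hw_init(): also drops the legacy PIC. */
static void generic_reduced_hw_init(struct boot_state *s)
{
	s->timer_init = noop;
	s->pre_vector_init = noop;
	s->nr_legacy_irqs = 0;
}

/* Models the patch's reduced_hw_init(): legacy PIC choice is untouched. */
static void hv_tdx_reduced_hw_init(struct boot_state *s)
{
	s->timer_init = noop;
	s->pre_vector_init = noop;
}
```

With the override, the legacy IRQ count stays non-zero, which per the
comment in the patch is what lets mp_config_acpi_legacy_irqs() populate
mp_irqs[] so that ttyS0 gets an IRQ descriptor.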
Add Hyper-V specific code so that a TDX guest can run on Hyper-V:
No need to use hv_vp_assist_page.
Don't use the unsafe Hyper-V TSC page.
Don't try to use HV_REGISTER_CRASH_CTL.
Don't trust Hyper-V's TLB-flushing hypercalls.
Don't use lazy EOI.
Share SynIC Event/Message pages and VMBus Monitor pages with the host.
Use pgprot_decrypted(PAGE_KERNEL) in hv_ringbuffer_init().
Signed-off-by: Dexuan Cui <[email protected]>
---
Changes in v2:
Used a new function hv_set_memory_enc_dec_needed() in
__set_memory_enc_pgtable().
Added the missing set_memory_encrypted() in hv_synic_free().
Changes in v3:
Use pgprot_decrypted(PAGE_KERNEL) in hv_ringbuffer_init().
(Do not use PAGE_KERNEL_NOENC, which doesn't exist for ARM64).
Used cc_mkdec() in hv_synic_enable_regs().
ms_hyperv_init_platform():
Explicitly do not use HV_X64_REMOTE_TLB_FLUSH_RECOMMENDED.
Explicitly do not use HV_X64_APIC_ACCESS_RECOMMENDED.
Enabled __send_ipi_mask() and __send_ipi_one() for TDX guests.
arch/x86/hyperv/hv_apic.c | 6 ++--
arch/x86/hyperv/hv_init.c | 19 ++++++++---
arch/x86/hyperv/ivm.c | 5 +++
arch/x86/kernel/cpu/mshyperv.c | 23 ++++++++++++-
arch/x86/mm/pat/set_memory.c | 2 +-
drivers/hv/connection.c | 4 ++-
drivers/hv/hv.c | 62 +++++++++++++++++++++++++++++++---
drivers/hv/hv_common.c | 6 ++++
drivers/hv/ring_buffer.c | 2 +-
include/asm-generic/mshyperv.h | 2 ++
10 files changed, 116 insertions(+), 15 deletions(-)
diff --git a/arch/x86/hyperv/hv_apic.c b/arch/x86/hyperv/hv_apic.c
index fb8b2c088681..16919c7b3196 100644
--- a/arch/x86/hyperv/hv_apic.c
+++ b/arch/x86/hyperv/hv_apic.c
@@ -173,7 +173,8 @@ static bool __send_ipi_mask(const struct cpumask *mask, int vector,
(exclude_self && weight == 1 && cpumask_test_cpu(this_cpu, mask)))
return true;
- if (!hv_hypercall_pg)
+ /* A TDX guest doesn't use hv_hypercall_pg. */
+ if (!hv_isolation_type_tdx() && !hv_hypercall_pg)
return false;
if ((vector < HV_IPI_LOW_VECTOR) || (vector > HV_IPI_HIGH_VECTOR))
@@ -227,7 +228,8 @@ static bool __send_ipi_one(int cpu, int vector)
trace_hyperv_send_ipi_one(cpu, vector);
- if (!hv_hypercall_pg || (vp == VP_INVAL))
+ /* A TDX guest doesn't use hv_hypercall_pg. */
+ if ((!hv_isolation_type_tdx() && !hv_hypercall_pg) || (vp == VP_INVAL))
return false;
if ((vector < HV_IPI_LOW_VECTOR) || (vector > HV_IPI_HIGH_VECTOR))
diff --git a/arch/x86/hyperv/hv_init.c b/arch/x86/hyperv/hv_init.c
index 6a0bcbd18306..d641c9808c31 100644
--- a/arch/x86/hyperv/hv_init.c
+++ b/arch/x86/hyperv/hv_init.c
@@ -77,7 +77,7 @@ static int hyperv_init_ghcb(void)
static int hv_cpu_init(unsigned int cpu)
{
union hv_vp_assist_msr_contents msr = { 0 };
- struct hv_vp_assist_page **hvp = &hv_vp_assist_page[cpu];
+ struct hv_vp_assist_page **hvp;
int ret;
ret = hv_common_cpu_init(cpu);
@@ -87,6 +87,7 @@ static int hv_cpu_init(unsigned int cpu)
if (!hv_vp_assist_page)
return 0;
+ hvp = &hv_vp_assist_page[cpu];
if (hv_root_partition) {
/*
* For root partition we get the hypervisor provided VP assist
@@ -396,11 +397,21 @@ void __init hyperv_init(void)
if (hv_common_init())
return;
- hv_vp_assist_page = kcalloc(num_possible_cpus(),
- sizeof(*hv_vp_assist_page), GFP_KERNEL);
+ /*
+ * The VP assist page is useless to a TDX guest: the only use we
+ * would have for it is lazy EOI, which can not be used with TDX.
+ */
+ if (hv_isolation_type_tdx())
+ hv_vp_assist_page = NULL;
+ else
+ hv_vp_assist_page = kcalloc(num_possible_cpus(),
+ sizeof(*hv_vp_assist_page),
+ GFP_KERNEL);
if (!hv_vp_assist_page) {
ms_hyperv.hints &= ~HV_X64_ENLIGHTENED_VMCS_RECOMMENDED;
- goto common_free;
+
+ if (!hv_isolation_type_tdx())
+ goto common_free;
}
if (hv_isolation_type_snp()) {
diff --git a/arch/x86/hyperv/ivm.c b/arch/x86/hyperv/ivm.c
index 07e4253b5809..4398042f10d5 100644
--- a/arch/x86/hyperv/ivm.c
+++ b/arch/x86/hyperv/ivm.c
@@ -258,6 +258,11 @@ bool hv_is_isolation_supported(void)
return hv_get_isolation_type() != HV_ISOLATION_TYPE_NONE;
}
+bool hv_set_memory_enc_dec_needed(void)
+{
+ return hv_is_isolation_supported() && !hv_isolation_type_tdx();
+}
+
DEFINE_STATIC_KEY_FALSE(isolation_type_snp);
/*
diff --git a/arch/x86/kernel/cpu/mshyperv.c b/arch/x86/kernel/cpu/mshyperv.c
index 941372449ff2..6a57af60ec9f 100644
--- a/arch/x86/kernel/cpu/mshyperv.c
+++ b/arch/x86/kernel/cpu/mshyperv.c
@@ -345,8 +345,29 @@ static void __init ms_hyperv_init_platform(void)
}
if (IS_ENABLED(CONFIG_INTEL_TDX_GUEST) &&
- hv_get_isolation_type() == HV_ISOLATION_TYPE_TDX)
+ hv_get_isolation_type() == HV_ISOLATION_TYPE_TDX) {
static_branch_enable(&isolation_type_tdx);
+
+ /*
+ * The GPAs of SynIC Event/Message pages and VMBus
+ * Monitor pages need to be added by this offset.
+ */
+ ms_hyperv.shared_gpa_boundary = cc_mkdec(0);
+
+ /* Don't use the unsafe Hyper-V TSC page */
+ ms_hyperv.features &=
+ ~HV_MSR_REFERENCE_TSC_AVAILABLE;
+
+ /* HV_REGISTER_CRASH_CTL is unsupported */
+ ms_hyperv.misc_features &=
+ ~HV_FEATURE_GUEST_CRASH_MSR_AVAILABLE;
+
+ /* Don't trust Hyper-V's TLB-flushing hypercalls */
+ ms_hyperv.hints &= ~HV_X64_REMOTE_TLB_FLUSH_RECOMMENDED;
+
+ /* A TDX VM must use x2APIC and doesn't use lazy EOI */
+ ms_hyperv.hints &= ~HV_X64_APIC_ACCESS_RECOMMENDED;
+ }
}
if (hv_max_functions_eax >= HYPERV_CPUID_NESTED_FEATURES) {
diff --git a/arch/x86/mm/pat/set_memory.c b/arch/x86/mm/pat/set_memory.c
index 356758b7d4b4..e069bf28d683 100644
--- a/arch/x86/mm/pat/set_memory.c
+++ b/arch/x86/mm/pat/set_memory.c
@@ -2175,7 +2175,7 @@ static int __set_memory_enc_pgtable(unsigned long addr, int numpages, bool enc)
static int __set_memory_enc_dec(unsigned long addr, int numpages, bool enc)
{
- if (hv_is_isolation_supported())
+ if (hv_set_memory_enc_dec_needed())
return hv_set_mem_host_visibility(addr, numpages, !enc);
if (cc_platform_has(CC_ATTR_MEM_ENCRYPT))
diff --git a/drivers/hv/connection.c b/drivers/hv/connection.c
index 9dc27e5d367a..1ecc3c29e3f7 100644
--- a/drivers/hv/connection.c
+++ b/drivers/hv/connection.c
@@ -250,12 +250,14 @@ int vmbus_connect(void)
* Isolation VM with AMD SNP needs to access monitor page via
* address space above shared gpa boundary.
*/
- if (hv_isolation_type_snp()) {
+ if (hv_isolation_type_snp() || hv_isolation_type_tdx()) {
vmbus_connection.monitor_pages_pa[0] +=
ms_hyperv.shared_gpa_boundary;
vmbus_connection.monitor_pages_pa[1] +=
ms_hyperv.shared_gpa_boundary;
+ }
+ if (hv_isolation_type_snp()) {
vmbus_connection.monitor_pages[0]
= memremap(vmbus_connection.monitor_pages_pa[0],
HV_HYP_PAGE_SIZE,
diff --git a/drivers/hv/hv.c b/drivers/hv/hv.c
index 4d6480d57546..76a19cc56894 100644
--- a/drivers/hv/hv.c
+++ b/drivers/hv/hv.c
@@ -18,6 +18,7 @@
#include <linux/clockchips.h>
#include <linux/delay.h>
#include <linux/interrupt.h>
+#include <linux/set_memory.h>
#include <clocksource/hyperv_timer.h>
#include <asm/mshyperv.h>
#include "hyperv_vmbus.h"
@@ -119,6 +120,7 @@ int hv_synic_alloc(void)
{
int cpu;
struct hv_per_cpu_context *hv_cpu;
+ int ret = -ENOMEM;
/*
* First, zero all per-cpu memory areas so hv_synic_free() can
@@ -168,6 +170,30 @@ int hv_synic_alloc(void)
pr_err("Unable to allocate post msg page\n");
goto err;
}
+
+
+ if (hv_isolation_type_tdx()) {
+ ret = set_memory_decrypted(
+ (unsigned long)hv_cpu->synic_message_page, 1);
+ if (ret) {
+ pr_err("Failed to decrypt SYNIC msg page\n");
+ goto err;
+ }
+
+ ret = set_memory_decrypted(
+ (unsigned long)hv_cpu->synic_event_page, 1);
+ if (ret) {
+ pr_err("Failed to decrypt SYNIC event page\n");
+ goto err;
+ }
+
+ ret = set_memory_decrypted(
+ (unsigned long)hv_cpu->post_msg_page, 1);
+ if (ret) {
+ pr_err("Failed to decrypt post msg page\n");
+ goto err;
+ }
+ }
}
return 0;
@@ -176,18 +202,42 @@ int hv_synic_alloc(void)
* Any memory allocations that succeeded will be freed when
* the caller cleans up by calling hv_synic_free()
*/
- return -ENOMEM;
+ return ret;
}
void hv_synic_free(void)
{
int cpu;
+ int ret;
for_each_present_cpu(cpu) {
struct hv_per_cpu_context *hv_cpu
= per_cpu_ptr(hv_context.cpu_context, cpu);
+ if (hv_isolation_type_tdx()) {
+ ret = set_memory_encrypted(
+ (unsigned long)hv_cpu->synic_message_page, 1);
+ if (ret) {
+ pr_err("Failed to encrypt SYNIC msg page\n");
+ continue;
+ }
+
+ ret = set_memory_encrypted(
+ (unsigned long)hv_cpu->synic_event_page, 1);
+ if (ret) {
+ pr_err("Failed to encrypt SYNIC event page\n");
+ continue;
+ }
+
+ ret = set_memory_encrypted(
+ (unsigned long)hv_cpu->post_msg_page, 1);
+ if (ret) {
+ pr_err("Failed to encrypt post msg page\n");
+ continue;
+ }
+ }
+
free_page((unsigned long)hv_cpu->synic_event_page);
free_page((unsigned long)hv_cpu->synic_message_page);
free_page((unsigned long)hv_cpu->post_msg_page);
@@ -223,8 +273,9 @@ void hv_synic_enable_regs(unsigned int cpu)
if (!hv_cpu->synic_message_page)
pr_err("Fail to map syinc message page.\n");
} else {
- simp.base_simp_gpa = virt_to_phys(hv_cpu->synic_message_page)
- >> HV_HYP_PAGE_SHIFT;
+ simp.base_simp_gpa =
+ cc_mkdec(virt_to_phys(hv_cpu->synic_message_page)) >>
+ HV_HYP_PAGE_SHIFT;
}
hv_set_register(HV_REGISTER_SIMP, simp.as_uint64);
@@ -241,8 +292,9 @@ void hv_synic_enable_regs(unsigned int cpu)
if (!hv_cpu->synic_event_page)
pr_err("Fail to map syinc event page.\n");
} else {
- siefp.base_siefp_gpa = virt_to_phys(hv_cpu->synic_event_page)
- >> HV_HYP_PAGE_SHIFT;
+ siefp.base_siefp_gpa =
+ cc_mkdec(virt_to_phys(hv_cpu->synic_event_page)) >>
+ HV_HYP_PAGE_SHIFT;
}
hv_set_register(HV_REGISTER_SIEFP, siefp.as_uint64);
diff --git a/drivers/hv/hv_common.c b/drivers/hv/hv_common.c
index 219c3f235c50..42f9274a3f82 100644
--- a/drivers/hv/hv_common.c
+++ b/drivers/hv/hv_common.c
@@ -283,6 +283,12 @@ bool __weak hv_is_isolation_supported(void)
}
EXPORT_SYMBOL_GPL(hv_is_isolation_supported);
+bool __weak hv_set_memory_enc_dec_needed(void)
+{
+ return false;
+}
+EXPORT_SYMBOL_GPL(hv_set_memory_enc_dec_needed);
+
bool __weak hv_isolation_type_snp(void)
{
return false;
diff --git a/drivers/hv/ring_buffer.c b/drivers/hv/ring_buffer.c
index c6692fd5ab15..ae3b3cd7ddbe 100644
--- a/drivers/hv/ring_buffer.c
+++ b/drivers/hv/ring_buffer.c
@@ -233,7 +233,7 @@ int hv_ringbuffer_init(struct hv_ring_buffer_info *ring_info,
ring_info->ring_buffer = (struct hv_ring_buffer *)
vmap(pages_wraparound, page_cnt * 2 - 1, VM_MAP,
- PAGE_KERNEL);
+ pgprot_decrypted(PAGE_KERNEL));
kfree(pages_wraparound);
if (!ring_info->ring_buffer)
diff --git a/include/asm-generic/mshyperv.h b/include/asm-generic/mshyperv.h
index d55d2833a37b..c8c16b3df68d 100644
--- a/include/asm-generic/mshyperv.h
+++ b/include/asm-generic/mshyperv.h
@@ -263,6 +263,7 @@ bool hv_is_hyperv_initialized(void);
bool hv_is_hibernation_supported(void);
enum hv_isolation_type hv_get_isolation_type(void);
bool hv_is_isolation_supported(void);
+bool hv_set_memory_enc_dec_needed(void);
bool hv_isolation_type_snp(void);
u64 hv_ghcb_hypercall(u64 control, void *input, void *output, u32 input_size);
void hyperv_cleanup(void);
@@ -275,6 +276,7 @@ static inline bool hv_is_hyperv_initialized(void) { return false; }
static inline bool hv_is_hibernation_supported(void) { return false; }
static inline void hyperv_cleanup(void) {}
static inline bool hv_is_isolation_supported(void) { return false; }
+static inline bool hv_set_memory_enc_dec_needed(void) { return false; }
static inline enum hv_isolation_type hv_get_isolation_type(void)
{
return HV_ISOLATION_TYPE_NONE;
--
2.25.1
On Mon, Feb 06, 2023 at 11:24:14AM -0800, Dexuan Cui wrote:
> GHCI spec for TDX 1.0 says that the MapGPA call may fail with the R10
> error code = TDG.VP.VMCALL_RETRY (1), and the guest must retry this
> operation for the pages in the region starting at the GPA specified
> in R11.
>
> When a TDX guest runs on Hyper-V, Hyper-V returns the retry error
> when hyperv_init() -> swiotlb_update_mem_attributes() ->
> set_memory_decrypted() decrypts up to 1GB of swiotlb bounce buffers.
>
> Signed-off-by: Dexuan Cui <[email protected]>
Looks good to me.
Acked-by: Kirill A. Shutemov <[email protected]>
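For readers following the retry semantics described in the commit message, here is a minimal userspace sketch of the loop shape. It is not the code from the patch. fake_map_gpa(), its "budget" parameter, and MAX_STALLS are hypothetical stand-ins invented for illustration; only the RETRY status value (1) and the "resume at the GPA returned in R11" rule come from the text quoted above.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

#define PAGE_SIZE      4096ULL
#define VMCALL_SUCCESS 0ULL
#define VMCALL_RETRY   1ULL  /* TDG.VP.VMCALL_RETRY, per the GHCI text quoted above */
#define MAX_STALLS     3     /* hypothetical cap on retries that make no progress */

/*
 * Hypothetical stand-in for the hypervisor: it converts at most 'budget'
 * pages per call and reports where the guest must resume via *r11.
 */
static uint64_t fake_map_gpa(uint64_t start, uint64_t end, uint64_t budget,
			     uint64_t *r11)
{
	uint64_t pages = (end - start) / PAGE_SIZE;

	if (pages <= budget)
		return VMCALL_SUCCESS;

	*r11 = start + budget * PAGE_SIZE;
	return VMCALL_RETRY;
}

/* Keep re-issuing the call from the returned GPA until the range is done. */
static bool map_gpa_with_retry(uint64_t start, uint64_t end, uint64_t budget,
			       unsigned int *calls)
{
	unsigned int stalls = 0;

	while (stalls < MAX_STALLS) {
		uint64_t resume = 0;
		uint64_t ret = fake_map_gpa(start, end, budget, &resume);

		(*calls)++;
		if (ret != VMCALL_RETRY)
			return ret == VMCALL_SUCCESS;

		/* The resume address must lie inside the requested range. */
		if (resume < start || resume >= end)
			return false;

		/* No forward progress: count it and try the same range again. */
		if (resume == start) {
			stalls++;
			continue;
		}

		start = resume;
		stalls = 0;
	}

	return false;
}
```

Bounding only the retries that make no progress lets a large range (such as the swiotlb buffers mentioned above) complete over many calls, while a host that never advances cannot spin the guest forever.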
--
Kiryl Shutsemau / Kirill A. Shutemov
On 2/6/23 11:24 AM, Dexuan Cui wrote:
> A TDX guest uses the GHCI call rather than hv_hypercall_pg.
>
> In hv_do_hypercall(), Hyper-V requires that the input/output addresses
> must have the cc_mask.
>
> Signed-off-by: Dexuan Cui <[email protected]>
>
> ---
Looks good to me
Reviewed-by: Kuppuswamy Sathyanarayanan <[email protected]>
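As a side note on "must have the cc_mask": on TDX, cc_mkdec() ORs the shared bit into the address. The userspace sketch below models that and the page-frame shift used for the SynIC base fields. The names cc_mkdec_tdx() and synic_base_gpa() are hypothetical, and bit 47 is only an illustrative position for the shared bit; the real position depends on the guest physical address width.

```c
#include <assert.h>
#include <stdint.h>

#define HV_HYP_PAGE_SHIFT 12

/* Assumption: shared bit at GPA bit 47, chosen purely for illustration. */
static const uint64_t cc_mask = 1ULL << 47;

/* Model of cc_mkdec() on TDX: "decrypted" (shared) means the bit is set. */
static uint64_t cc_mkdec_tdx(uint64_t pa)
{
	return pa | cc_mask;
}

/* Hypothetical helper: page frame number as written to a SynIC base field. */
static uint64_t synic_base_gpa(uint64_t pa)
{
	return cc_mkdec_tdx(pa) >> HV_HYP_PAGE_SHIFT;
}
```

Note that a NULL input or output pointer yields a GPA of 0 before the mask is applied, so the value passed down is just the shared bit; per the comment added in the patch, the hypervisor ignores the pointer in that case.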
>
> Changes in v2:
> Implemented hv_tdx_hypercall() in C rather than in assembly code.
> Renamed the parameter names of hv_tdx_hypercall().
> Used cc_mkdec() directly in hv_do_hypercall().
>
> Changes in v3:
> Decrypted/encrypted hyperv_pcpu_input_arg in
> hv_common_cpu_init() and hv_common_cpu_die().
>
> arch/x86/hyperv/hv_init.c | 8 ++++++++
> arch/x86/hyperv/ivm.c | 14 ++++++++++++++
> arch/x86/include/asm/mshyperv.h | 17 +++++++++++++++++
> drivers/hv/hv_common.c | 21 +++++++++++++++++++++
> 4 files changed, 60 insertions(+)
>
> diff --git a/arch/x86/hyperv/hv_init.c b/arch/x86/hyperv/hv_init.c
> index 41ef036ebb7b..6a0bcbd18306 100644
> --- a/arch/x86/hyperv/hv_init.c
> +++ b/arch/x86/hyperv/hv_init.c
> @@ -430,6 +430,10 @@ void __init hyperv_init(void)
> /* Hyper-V requires to write guest os id via ghcb in SNP IVM. */
> hv_ghcb_msr_write(HV_X64_MSR_GUEST_OS_ID, guest_id);
>
> + /* A TDX guest uses the GHCI call rather than hv_hypercall_pg. */
> + if (hv_isolation_type_tdx())
> + goto skip_hypercall_pg_init;
> +
> hv_hypercall_pg = __vmalloc_node_range(PAGE_SIZE, 1, VMALLOC_START,
> VMALLOC_END, GFP_KERNEL, PAGE_KERNEL_ROX,
> VM_FLUSH_RESET_PERMS, NUMA_NO_NODE,
> @@ -469,6 +473,7 @@ void __init hyperv_init(void)
> wrmsrl(HV_X64_MSR_HYPERCALL, hypercall_msr.as_uint64);
> }
>
> +skip_hypercall_pg_init:
> /*
> * hyperv_init() is called before LAPIC is initialized: see
> * apic_intr_mode_init() -> x86_platform.apic_post_init() and
> @@ -602,6 +607,9 @@ bool hv_is_hyperv_initialized(void)
> if (x86_hyper_type != X86_HYPER_MS_HYPERV)
> return false;
>
> + /* A TDX guest uses the GHCI call rather than hv_hypercall_pg. */
> + if (hv_isolation_type_tdx())
> + return true;
> /*
> * Verify that earlier initialization succeeded by checking
> * that the hypercall page is setup
> diff --git a/arch/x86/hyperv/ivm.c b/arch/x86/hyperv/ivm.c
> index 13ccb52eecd7..07e4253b5809 100644
> --- a/arch/x86/hyperv/ivm.c
> +++ b/arch/x86/hyperv/ivm.c
> @@ -276,6 +276,20 @@ bool hv_isolation_type_tdx(void)
> {
> return static_branch_unlikely(&isolation_type_tdx);
> }
> +
> +u64 hv_tdx_hypercall(u64 control, u64 param1, u64 param2)
> +{
> + struct tdx_hypercall_args args = { };
> +
> + args.r10 = control;
> + args.rdx = param1;
> + args.r8 = param2;
> +
> + (void)__tdx_hypercall(&args, TDX_HCALL_HAS_OUTPUT);
> +
> + return args.r11;
> +}
> +EXPORT_SYMBOL_GPL(hv_tdx_hypercall);
> #endif
>
> /*
> diff --git a/arch/x86/include/asm/mshyperv.h b/arch/x86/include/asm/mshyperv.h
> index 49bca07bbd2c..159ab74d80e6 100644
> --- a/arch/x86/include/asm/mshyperv.h
> +++ b/arch/x86/include/asm/mshyperv.h
> @@ -10,6 +10,7 @@
> #include <asm/nospec-branch.h>
> #include <asm/paravirt.h>
> #include <asm/mshyperv.h>
> +#include <asm/coco.h>
>
> union hv_ghcb;
>
> @@ -37,6 +38,12 @@ int hv_call_deposit_pages(int node, u64 partition_id, u32 num_pages);
> int hv_call_add_logical_proc(int node, u32 lp_index, u32 acpi_id);
> int hv_call_create_vp(int node, u64 partition_id, u32 vp_index, u32 flags);
>
> +u64 hv_tdx_hypercall(u64 control, u64 param1, u64 param2);
> +
> +/*
> + * If the hypercall involves no input or output parameters, the hypervisor
> + * ignores the corresponding GPA pointer.
> + */
> static inline u64 hv_do_hypercall(u64 control, void *input, void *output)
> {
> u64 input_address = input ? virt_to_phys(input) : 0;
> @@ -44,6 +51,10 @@ static inline u64 hv_do_hypercall(u64 control, void *input, void *output)
> u64 hv_status;
>
> #ifdef CONFIG_X86_64
> + if (hv_isolation_type_tdx())
> + return hv_tdx_hypercall(control,
> + cc_mkdec(input_address),
> + cc_mkdec(output_address));
> if (!hv_hypercall_pg)
> return U64_MAX;
>
> @@ -81,6 +92,9 @@ static inline u64 hv_do_fast_hypercall8(u16 code, u64 input1)
> u64 hv_status, control = (u64)code | HV_HYPERCALL_FAST_BIT;
>
> #ifdef CONFIG_X86_64
> + if (hv_isolation_type_tdx())
> + return hv_tdx_hypercall(control, input1, 0);
> +
> {
> __asm__ __volatile__(CALL_NOSPEC
> : "=a" (hv_status), ASM_CALL_CONSTRAINT,
> @@ -112,6 +126,9 @@ static inline u64 hv_do_fast_hypercall16(u16 code, u64 input1, u64 input2)
> u64 hv_status, control = (u64)code | HV_HYPERCALL_FAST_BIT;
>
> #ifdef CONFIG_X86_64
> + if (hv_isolation_type_tdx())
> + return hv_tdx_hypercall(control, input1, input2);
> +
> {
> __asm__ __volatile__("mov %4, %%r8\n"
> CALL_NOSPEC
> diff --git a/drivers/hv/hv_common.c b/drivers/hv/hv_common.c
> index a9a03ab04b97..219c3f235c50 100644
> --- a/drivers/hv/hv_common.c
> +++ b/drivers/hv/hv_common.c
> @@ -21,6 +21,7 @@
> #include <linux/ptrace.h>
> #include <linux/slab.h>
> #include <linux/dma-map-ops.h>
> +#include <linux/set_memory.h>
> #include <asm/hyperv-tlfs.h>
> #include <asm/mshyperv.h>
>
> @@ -125,6 +126,7 @@ int hv_common_cpu_init(unsigned int cpu)
> u64 msr_vp_index;
> gfp_t flags;
> int pgcount = hv_root_partition ? 2 : 1;
> + int ret;
>
> /* hv_cpu_init() can be called with IRQs disabled from hv_resume() */
> flags = irqs_disabled() ? GFP_ATOMIC : GFP_KERNEL;
> @@ -134,6 +136,17 @@ int hv_common_cpu_init(unsigned int cpu)
> if (!(*inputarg))
> return -ENOMEM;
>
> + if (hv_isolation_type_tdx()) {
> + ret = set_memory_decrypted((unsigned long)*inputarg, pgcount);
> + if (ret) {
> + /* It may be unsafe to free *inputarg */
> + *inputarg = NULL;
> + return ret;
> + }
> +
> + memset(*inputarg, 0x00, pgcount * HV_HYP_PAGE_SIZE);
> + }
> +
> if (hv_root_partition) {
> outputarg = (void **)this_cpu_ptr(hyperv_pcpu_output_arg);
> *outputarg = (char *)(*inputarg) + HV_HYP_PAGE_SIZE;
> @@ -154,6 +167,8 @@ int hv_common_cpu_die(unsigned int cpu)
> unsigned long flags;
> void **inputarg, **outputarg;
> void *mem;
> + int pgcount = hv_root_partition ? 2 : 1;
> + int ret;
>
> local_irq_save(flags);
>
> @@ -168,6 +183,12 @@ int hv_common_cpu_die(unsigned int cpu)
>
> local_irq_restore(flags);
>
> + if (hv_isolation_type_tdx()) {
> + ret = set_memory_encrypted((unsigned long)mem, pgcount);
> + if (ret)
> + return ret;
> + }
> +
> kfree(mem);
>
> return 0;
--
Sathyanarayanan Kuppuswamy
Linux Kernel Developer
On Mon, Feb 06, 2023 at 11:24:15AM -0800, Dexuan Cui wrote:
> When a TDX guest runs on Hyper-V, the hv_netvsc driver's netvsc_init_buf()
> allocates buffers using vzalloc(), and needs to share the buffers with the
> host OS by calling set_memory_decrypted(), which is not working for
> vmalloc() yet. Add the support by handling the pages one by one.
>
> Signed-off-by: Dexuan Cui <[email protected]>
>
> ---
>
> Changes in v2:
> Changed tdx_enc_status_changed() in place.
>
> Hi, Dave, I checked the huge vmalloc mapping code, but still don't know
> how to get the underlying huge page info (if huge page is in use) and
> try to use PG_LEVEL_2M/1G in try_accept_page() for vmalloc: I checked
> is_vm_area_hugepages() and __vfree() -> __vunmap(), and I think the
> underlying page allocation info is internal to the mm code, and there
> is no mm API for me to get the info in tdx_enc_status_changed().
I also don't see an obvious way to retrieve this info after vmalloc() is
complete. split_page() makes all pages independent.
I think you can try to do this manually: allocate a vmalloc region,
allocate pages manually, and put into the region. This way you always know
page sizes and can optimize conversion to shared memory.
But it is tedious and I'm not sure if it is worth the gain.
> Hi, Kirill, the load_unaligned_zeropad() issue is not addressed in
> this patch. The issue looks like a generic issue that also happens to
> AMD SNP vTOM mode and C-bit mode. Will need to figure out how to
> address the issue. If we decide to adjust direct mapping to have the
> shared bit set, it looks like we need to do the below for each
> 'start_va' vmalloc page:
> pa = slow_virt_to_phys(start_va);
> set_memory_decrypted(phys_to_virt(pa), 1); -- this line calls
> tdx_enc_status_changed() the second time for the same page, which is not
> great. It looks like we need to find a way to reuse the cpa_flush()
> related code in __set_memory_enc_pgtable() and make sure we call
> tdx_enc_status_changed() only once for the same page from vmalloc()?
Actually, current code will change direct mapping for you. I just
double-checked: the alias processing in __change_page_attr_set_clr() will
change direct mapping if you call it on vmalloc()ed memory.
Splitting direct mapping is still unfortunate, but well.
>
> Changes in v3:
> No change since v2.
>
> arch/x86/coco/tdx/tdx.c | 69 ++++++++++++++++++++++++++---------------
> 1 file changed, 44 insertions(+), 25 deletions(-)
I don't hate what you did here. But I think the code below is a bit
cleaner.
Any opinions?
static bool tdx_enc_status_changed_phys(phys_addr_t start, phys_addr_t end,
					bool enc)
{
	if (!tdx_map_gpa(start, end, enc))
		return false;

	/* private->shared conversion requires only MapGPA call */
	if (!enc)
		return true;

	return try_accept_page(start, end);
}

/*
 * Inform the VMM of the guest's intent for this physical page: shared with
 * the VMM or private to the guest. The VMM is expected to change its mapping
 * of the page in response.
 */
static bool tdx_enc_status_changed(unsigned long start, int numpages, bool enc)
{
	unsigned long end = start + numpages * PAGE_SIZE;

	if (offset_in_page(start) != 0)
		return false;

	if (!is_vmalloc_addr((void *)start))
		return tdx_enc_status_changed_phys(__pa(start), __pa(end), enc);

	while (start < end) {
		phys_addr_t start_pa = slow_virt_to_phys((void *)start);
		phys_addr_t end_pa = start_pa + PAGE_SIZE;

		if (!tdx_enc_status_changed_phys(start_pa, end_pa, enc))
			return false;

		start += PAGE_SIZE;
	}

	return true;
}
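To make the cost difference between the two paths concrete, here is a small userspace model. It is only a sketch: struct conv_log, convert_phys(), and the frame array are hypothetical and stand in for the hypervisor call and for what slow_virt_to_phys() would return per page. It shows that a direct-map range needs one conversion call, while a vmalloc-style range with scattered backing frames needs one call per page.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

#define PAGE_SIZE 4096ULL

/* Hypothetical record of how many conversion calls were issued. */
struct conv_log {
	size_t calls;
	uint64_t bytes;
};

/* Stand-in for one physical-range conversion request. */
static bool convert_phys(struct conv_log *log, uint64_t start_pa,
			 uint64_t end_pa)
{
	log->calls++;
	log->bytes += end_pa - start_pa;
	return true;
}

/* Direct map: virtually contiguous implies physically contiguous. */
static bool convert_direct_map(struct conv_log *log, uint64_t start_pa,
			       size_t npages)
{
	return convert_phys(log, start_pa, start_pa + npages * PAGE_SIZE);
}

/* vmalloc-like: frames[i] is the physical address backing virtual page i. */
static bool convert_vmalloc_like(struct conv_log *log, const uint64_t *frames,
				 size_t npages)
{
	for (size_t i = 0; i < npages; i++) {
		if (!convert_phys(log, frames[i], frames[i] + PAGE_SIZE))
			return false;
	}

	return true;
}
```

The per-page loop trades extra calls for simplicity; knowing the backing page sizes up front, as discussed earlier in this mail, is what would allow larger conversions.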
--
Kiryl Shutsemau / Kirill A. Shutemov
> From: Kirill A. Shutemov <[email protected]>
> Sent: Friday, February 17, 2023 5:20 AM
> To: Dexuan Cui <[email protected]>
> > ...
Hi Kirill, sorry for my late reply!
> > Hi, Dave, I checked the huge vmalloc mapping code, but still don't know
> > how to get the underlying huge page info (if huge page is in use) and
> > try to use PG_LEVEL_2M/1G in try_accept_page() for vmalloc: I checked
> > is_vm_area_hugepages() and __vfree() -> __vunmap(), and I think the
> > underlying page allocation info is internal to the mm code, and there
> > is no mm API for me to get the info in tdx_enc_status_changed().
>
> I also don't see an obvious way to retrieve this info after vmalloc() is
> complete. split_page() makes all pages independent.
>
> I think you can try to do this manually: allocate a vmalloc region,
> allocate pages manually, and put into the region. This way you always know
> page sizes and can optimize conversion to shared memory.
>
> But it is tedious and I'm not sure if it is worth the gain.
Thanks, I'll do some research on this idea.
> > Hi, Kirill, the load_unaligned_zeropad() issue is not addressed in
> > this patch. The issue looks like a generic issue that also happens to
> > AMD SNP vTOM mode and C-bit mode. Will need to figure out how to
> > address the issue. If we decide to adjust direct mapping to have the
> > shared bit set, it looks like we need to do the below for each
> > 'start_va' vmalloc page:
> > pa = slow_virt_to_phys(start_va);
> > set_memory_decrypted(phys_to_virt(pa), 1); -- this line calls
> > tdx_enc_status_changed() the second time for the same page, which is not
> > great. It looks like we need to find a way to reuse the cpa_flush()
> > related code in __set_memory_enc_pgtable() and make sure we call
> > tdx_enc_status_changed() only once for the same page from vmalloc()?
>
> Actually, current code will change direct mapping for you. I just
> double-checked: the alias processing in __change_page_attr_set_clr() will
> change direct mapping if you call it on vmalloc()ed memory.
>
> Splitting direct mapping is still unfortunate, but well.
>
> >
> > Changes in v3:
> > No change since v2.
> >
> > arch/x86/coco/tdx/tdx.c | 69 ++++++++++++++++++++++++++---------------
> > 1 file changed, 44 insertions(+), 25 deletions(-)
>
> I don't hate what you did here. But I think the code below is a bit
> cleaner.
>
> Any opinions?
Thanks! Your version looks much better. I'll use it in v4.