2023-08-11 23:22:21

by Dexuan Cui

[permalink] [raw]
Subject: [PATCH 0/9] Support TDX guests on Hyper-V (the Hyper-V specific part)

Hyper-V provides two modes for running Intel TDX VMs:

1) TD Partitioning mode with a paravisor (see [1]).
2) In "fully enlightened" mode with normal TDX shared bit control
over page encryption, and no paravisor

The first mode is similar to AMD SEV-SNP's vTOM mode (see [2]).
The second is similar to AMD SEV-SNP's C-bit mode(see [2]).

For #2, the v6 patchset was [3], which is later split into 2 parts:
the generic TDX part (see [4][5]), and the Hyper-V specific part, i.e.
the first 5 patches of this patchset. For the second part, I rebased
the patches to Tianyu's fully enlighted SNP v5 patchset [2]. Since this
is a straightforward rebasing, I keep the existing Acked-by and
Reviewed-by in the v6 patchset [3].

The next 3 patches of this patchset add the support for #1.

The last patch (the 9th patch) just makes some cleanup.

Please review all the 9 patches, which are also on my github
branch [6].

I tested the patches for a regular VM, a VBS VM, a SNP VM
with the paravisor, and a TDX VM with the paravisor and a TDX
VM without the paravisor, and an ARM64 VM. All the VMs worked
as expected.

If the patches all look good, I expect that Tianyu's patches [2]
are merged into the Hyper-V tree's hyperv-next branch first, then
these 9 patches can be merged afterwards.

Note: Tianyu will post v6, and I expect that the 9 patches can
still apply atop Tianyu's v6 -- if not, I'll post a new version.

Thanks,
Dexuan

References:
[1] Intel TDX Module v1.5 TD Partitioning Architecture Specification
[2] https://lwn.net/ml/linux-kernel/20230810160412.820246-1-ltykernel%40gmail.com/
[3] https://lwn.net/ml/linux-kernel/[email protected]/
[4] https://lwn.net/ml/linux-kernel/20230811214826.9609-1-decui%40microsoft.com/
[5] https://github.com/dcui/tdx/commits/decui/mainline/x86/tdx/v10
[6] https://github.com/dcui/tdx/commits/decui/mainline/x86/hyperv/tdx

Dexuan Cui (9):
x86/hyperv: Add hv_isolation_type_tdx() to detect TDX guests
x86/hyperv: Support hypercalls for fully enlightened TDX guests
Drivers: hv: vmbus: Support fully enlightened TDX guests
x86/hyperv: Fix serial console interrupts for fully enlightened TDX
guests
Drivers: hv: vmbus: Support >64 VPs for a fully enlightened TDX/SNP VM
x86/hyperv: Introduce a global variable hyperv_paravisor_present
Drivers: hv: vmbus: Bring the post_msg_page back for TDX VMs with the
paravisor
x86/hyperv: Use TDX GHCI to access some MSRs in a TDX VM with the
paravisor
x86/hyperv: Remove hv_isolation_type_en_snp

arch/x86/hyperv/hv_apic.c | 15 +++-
arch/x86/hyperv/hv_init.c | 53 ++++++++++---
arch/x86/hyperv/ivm.c | 120 ++++++++++++++++++++++++++---
arch/x86/include/asm/hyperv-tlfs.h | 3 +-
arch/x86/include/asm/mshyperv.h | 29 ++++++-
arch/x86/kernel/cpu/mshyperv.c | 72 ++++++++++++++---
drivers/hv/connection.c | 2 +-
drivers/hv/hv.c | 88 +++++++++++++++++----
drivers/hv/hv_common.c | 46 ++++++++---
drivers/hv/hyperv_vmbus.h | 11 +++
include/asm-generic/mshyperv.h | 3 +
11 files changed, 379 insertions(+), 63 deletions(-)

--
2.25.1



2023-08-11 23:28:00

by Dexuan Cui

[permalink] [raw]
Subject: [PATCH 3/9] Drivers: hv: vmbus: Support fully enlightened TDX guests

Add Hyper-V specific code so that a fully enlightened TDX guest (i.e.
without the paravisor) can run on Hyper-V:
Don't use hv_vp_assist_page. Use GHCI instead.
Don't try to use the unsupported HV_REGISTER_CRASH_CTL.
Don't trust (use) Hyper-V's TLB-flushing hypercalls.
Don't use lazy EOI.
Share the SynIC Event/Message pages with the hypervisor.
Don't use the Hyper-V TSC page for now, because non-trivial work is
required to share the page with the hypervisor.

Reviewed-by: Michael Kelley <[email protected]>
Signed-off-by: Dexuan Cui <[email protected]>
---
arch/x86/hyperv/hv_apic.c | 15 ++++++++++++---
arch/x86/hyperv/hv_init.c | 19 +++++++++++++++----
arch/x86/kernel/cpu/mshyperv.c | 23 +++++++++++++++++++++++
drivers/hv/hv.c | 17 +++++++++++++++--
4 files changed, 65 insertions(+), 9 deletions(-)

diff --git a/arch/x86/hyperv/hv_apic.c b/arch/x86/hyperv/hv_apic.c
index 1fbda2f94184e..cb7429046d18d 100644
--- a/arch/x86/hyperv/hv_apic.c
+++ b/arch/x86/hyperv/hv_apic.c
@@ -177,8 +177,11 @@ static bool __send_ipi_mask(const struct cpumask *mask, int vector,
(exclude_self && weight == 1 && cpumask_test_cpu(this_cpu, mask)))
return true;

- if (!hv_hypercall_pg)
- return false;
+ /* A fully enlightened TDX VM uses GHCI rather than hv_hypercall_pg. */
+ if (!hv_hypercall_pg) {
+ if (ms_hyperv.paravisor_present || !hv_isolation_type_tdx())
+ return false;
+ }

if ((vector < HV_IPI_LOW_VECTOR) || (vector > HV_IPI_HIGH_VECTOR))
return false;
@@ -231,9 +234,15 @@ static bool __send_ipi_one(int cpu, int vector)

trace_hyperv_send_ipi_one(cpu, vector);

- if (!hv_hypercall_pg || (vp == VP_INVAL))
+ if (vp == VP_INVAL)
return false;

+ /* A fully enlightened TDX VM uses GHCI rather than hv_hypercall_pg. */
+ if (!hv_hypercall_pg) {
+ if (ms_hyperv.paravisor_present || !hv_isolation_type_tdx())
+ return false;
+ }
+
if ((vector < HV_IPI_LOW_VECTOR) || (vector > HV_IPI_HIGH_VECTOR))
return false;

diff --git a/arch/x86/hyperv/hv_init.c b/arch/x86/hyperv/hv_init.c
index d8ea54663113c..4bcd0a6f94760 100644
--- a/arch/x86/hyperv/hv_init.c
+++ b/arch/x86/hyperv/hv_init.c
@@ -80,7 +80,7 @@ static int hyperv_init_ghcb(void)
static int hv_cpu_init(unsigned int cpu)
{
union hv_vp_assist_msr_contents msr = { 0 };
- struct hv_vp_assist_page **hvp = &hv_vp_assist_page[cpu];
+ struct hv_vp_assist_page **hvp;
int ret;

ret = hv_common_cpu_init(cpu);
@@ -90,6 +90,7 @@ static int hv_cpu_init(unsigned int cpu)
if (!hv_vp_assist_page)
return 0;

+ hvp = &hv_vp_assist_page[cpu];
if (hv_root_partition) {
/*
* For root partition we get the hypervisor provided VP assist
@@ -447,11 +448,21 @@ void __init hyperv_init(void)
if (hv_common_init())
return;

- hv_vp_assist_page = kcalloc(num_possible_cpus(),
- sizeof(*hv_vp_assist_page), GFP_KERNEL);
+ /*
+ * The VP assist page is useless to a TDX guest: the only use we
+ * would have for it is lazy EOI, which can not be used with TDX.
+ */
+ if (hv_isolation_type_tdx())
+ hv_vp_assist_page = NULL;
+ else
+ hv_vp_assist_page = kcalloc(num_possible_cpus(),
+ sizeof(*hv_vp_assist_page),
+ GFP_KERNEL);
if (!hv_vp_assist_page) {
ms_hyperv.hints &= ~HV_X64_ENLIGHTENED_VMCS_RECOMMENDED;
- goto common_free;
+
+ if (!hv_isolation_type_tdx())
+ goto common_free;
}

if (hv_isolation_type_snp()) {
diff --git a/arch/x86/kernel/cpu/mshyperv.c b/arch/x86/kernel/cpu/mshyperv.c
index a50fd3650ea9b..507df0f64ae18 100644
--- a/arch/x86/kernel/cpu/mshyperv.c
+++ b/arch/x86/kernel/cpu/mshyperv.c
@@ -420,6 +420,29 @@ static void __init ms_hyperv_init_platform(void)
static_branch_enable(&isolation_type_snp);
} else if (hv_get_isolation_type() == HV_ISOLATION_TYPE_TDX) {
static_branch_enable(&isolation_type_tdx);
+
+ /* A TDX VM must use x2APIC and doesn't use lazy EOI. */
+ ms_hyperv.hints &= ~HV_X64_APIC_ACCESS_RECOMMENDED;
+
+ if (!ms_hyperv.paravisor_present) {
+ /*
+ * The ms_hyperv.shared_gpa_boundary_active in
+ * a fully enlightened TDX VM is 0, but the GPAs
+ * of the SynIC Event/Message pages and VMBus
+ * Moniter pages in such a VM still need to be
+ * added by this offset.
+ */
+ ms_hyperv.shared_gpa_boundary = cc_mkdec(0);
+
+ /* To be supported: more work is required. */
+ ms_hyperv.features &= ~HV_MSR_REFERENCE_TSC_AVAILABLE;
+
+ /* HV_REGISTER_CRASH_CTL is unsupported. */
+ ms_hyperv.misc_features &= ~HV_FEATURE_GUEST_CRASH_MSR_AVAILABLE;
+
+ /* Don't trust Hyper-V's TLB-flushing hypercalls. */
+ ms_hyperv.hints &= ~HV_X64_REMOTE_TLB_FLUSH_RECOMMENDED;
+ }
}
}

diff --git a/drivers/hv/hv.c b/drivers/hv/hv.c
index ec6e35a0d9bf6..28bbb354324d6 100644
--- a/drivers/hv/hv.c
+++ b/drivers/hv/hv.c
@@ -121,11 +121,15 @@ int hv_synic_alloc(void)
(void *)get_zeroed_page(GFP_ATOMIC);
if (hv_cpu->synic_event_page == NULL) {
pr_err("Unable to allocate SYNIC event page\n");
+
+ free_page((unsigned long)hv_cpu->synic_message_page);
+ hv_cpu->synic_message_page = NULL;
goto err;
}
}

- if (hv_isolation_type_en_snp()) {
+ if (!ms_hyperv.paravisor_present &&
+ (hv_isolation_type_en_snp() || hv_isolation_type_tdx())) {
ret = set_memory_decrypted((unsigned long)
hv_cpu->synic_message_page, 1);
if (ret) {
@@ -174,7 +178,8 @@ void hv_synic_free(void)
= per_cpu_ptr(hv_context.cpu_context, cpu);

/* It's better to leak the page if the encryption fails. */
- if (hv_isolation_type_en_snp()) {
+ if (!ms_hyperv.paravisor_present &&
+ (hv_isolation_type_en_snp() || hv_isolation_type_tdx())) {
if (hv_cpu->synic_message_page) {
ret = set_memory_encrypted((unsigned long)
hv_cpu->synic_message_page, 1);
@@ -232,6 +237,10 @@ void hv_synic_enable_regs(unsigned int cpu)
} else {
simp.base_simp_gpa = virt_to_phys(hv_cpu->synic_message_page)
>> HV_HYP_PAGE_SHIFT;
+
+ if (hv_isolation_type_tdx())
+ simp.base_simp_gpa |= ms_hyperv.shared_gpa_boundary
+ >> HV_HYP_PAGE_SHIFT;
}

hv_set_register(HV_REGISTER_SIMP, simp.as_uint64);
@@ -251,6 +260,10 @@ void hv_synic_enable_regs(unsigned int cpu)
} else {
siefp.base_siefp_gpa = virt_to_phys(hv_cpu->synic_event_page)
>> HV_HYP_PAGE_SHIFT;
+
+ if (hv_isolation_type_tdx())
+ siefp.base_siefp_gpa |= ms_hyperv.shared_gpa_boundary
+ >> HV_HYP_PAGE_SHIFT;
}

hv_set_register(HV_REGISTER_SIEFP, siefp.as_uint64);
--
2.25.1


2023-08-12 02:17:17

by Dexuan Cui

[permalink] [raw]
Subject: [PATCH 4/9] x86/hyperv: Fix serial console interrupts for fully enlightened TDX guests

When a fully enlightened TDX guest runs on Hyper-V, the UEFI firmware sets
the HW_REDUCED flag and consequently ttyS0 interrupts can't work. Fix the
issue by overriding x86_init.acpi.reduced_hw_early_init().

Reviewed-by: Michael Kelley <[email protected]>
Signed-off-by: Dexuan Cui <[email protected]>
---
arch/x86/kernel/cpu/mshyperv.c | 22 ++++++++++++++++++++++
1 file changed, 22 insertions(+)

diff --git a/arch/x86/kernel/cpu/mshyperv.c b/arch/x86/kernel/cpu/mshyperv.c
index 507df0f64ae18..b4214e37e9124 100644
--- a/arch/x86/kernel/cpu/mshyperv.c
+++ b/arch/x86/kernel/cpu/mshyperv.c
@@ -324,6 +324,26 @@ static void __init hv_smp_prepare_cpus(unsigned int max_cpus)
}
#endif

+/*
+ * When a fully enlightened TDX VM runs on Hyper-V, the firmware sets the
+ * HW_REDUCED flag: refer to acpi_tb_create_local_fadt(). Consequently ttyS0
+ * interrupts can't work because request_irq() -> ... -> irq_to_desc() returns
+ * NULL for ttyS0. This happens because mp_config_acpi_legacy_irqs() sees a
+ * nr_legacy_irqs() of 0, so it doesn't initialize the array 'mp_irqs[]', and
+ * later setup_IO_APIC_irqs() -> find_irq_entry() fails to find the legacy irqs
+ * from the array and hence doesn't create the necessary irq description info.
+ *
+ * Clone arch/x86/kernel/acpi/boot.c: acpi_generic_reduced_hw_init() here,
+ * except don't change 'legacy_pic', which keeps its default value
+ * 'default_legacy_pic'. This way, mp_config_acpi_legacy_irqs() sees a non-zero
+ * nr_legacy_irqs() and eventually serial console interrupts works properly.
+ */
+static void __init reduced_hw_init(void)
+{
+ x86_init.timers.timer_init = x86_init_noop;
+ x86_init.irqs.pre_vector_init = x86_init_noop;
+}
+
static void __init ms_hyperv_init_platform(void)
{
int hv_max_functions_eax;
@@ -442,6 +462,8 @@ static void __init ms_hyperv_init_platform(void)

/* Don't trust Hyper-V's TLB-flushing hypercalls. */
ms_hyperv.hints &= ~HV_X64_REMOTE_TLB_FLUSH_RECOMMENDED;
+
+ x86_init.acpi.reduced_hw_early_init = reduced_hw_init;
}
}
}
--
2.25.1