2023-12-06 16:37:44

by Vitaly Kuznetsov

Subject: [PATCH RFC] x86/sev: Temporarily disable CPU re-onlining for SEV-SNP

It was discovered that an attempt to re-online a CPU in a SEV-SNP enabled
instance on AWS leads to an immediate reboot upon the
SVM_VMGEXIT_AP_CREATE VMGEXIT. As support for SEV-SNP in KVM is not yet
upstream, it is unclear whether the problem is guest related or whether the
hypervisor is not handling the case correctly. Note that Linux currently
doesn't issue SVM_VMGEXIT_AP_DESTROY upon CPU offlining, and it is not
entirely clear from the specification whether this is a must or a
nice-to-have action. When SVM_VMGEXIT_AP_DESTROY is issued prior to
SVM_VMGEXIT_AP_CREATE on AWS, the guest reboot is no longer observed.
Unfortunately, the CPU still fails to come up ("CPU1 failed to report alive
state").

Note that the SEV-SNP feature on Hyper-V uses a different CPU wakeup
path (see hv_snp_boot_ap() in arch/x86/hyperv/ivm.c), which is based on a
hypercall. That path does not seem to have any issues with CPU re-onlining,
at least on publicly available Azure instances.

Signed-off-by: Vitaly Kuznetsov <[email protected]>
---
RFC: I'm using this silly patch (which makes the problem a bit less severe
though) to ask if there are plans to make this work, either on the host or
on the guest side. Thanks!
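
For illustration only, here is a minimal userspace model of the check this
patch adds. It is a sketch under stated assumptions, not kernel code:
vmsa_of[], vmsa_pages[], NR_CPUS, model_wakeup_cpu() and model_demo() are
hypothetical names standing in for the per-CPU sev_vmsa pointer and for
wakeup_cpu_via_vmgexit(). The only behaviour it models is that a CPU which
already has a recorded VMSA is treated as a re-online and refused.

```c
#include <assert.h>
#include <errno.h>
#include <stddef.h>

#define NR_CPUS 4	/* hypothetical, for the model only */

/* Hypothetical stand-in for the per-CPU sev_vmsa pointer. */
static void *vmsa_of[NR_CPUS];
/* Dummy backing storage so each CPU gets a distinct non-NULL "VMSA". */
static char vmsa_pages[NR_CPUS][1];

/* Models only the check added by this patch, not the real wakeup path. */
static int model_wakeup_cpu(unsigned int cpu)
{
	if (cpu >= NR_CPUS)
		return -EINVAL;

	/* A VMSA left over from an earlier online means this is a re-online. */
	if (vmsa_of[cpu])
		return -EOPNOTSUPP;

	/* First online: remember the VMSA so later attempts are detected. */
	vmsa_of[cpu] = vmsa_pages[cpu];
	return 0;
}

/* Usage: the first online of a CPU succeeds, the second is refused. */
static void model_demo(void)
{
	assert(model_wakeup_cpu(2) == 0);
	assert(model_wakeup_cpu(2) == -EOPNOTSUPP);
}
```

The model never clears the recorded pointer on offline, which is why any
second wakeup of the same CPU is seen as a re-online.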
---
arch/x86/kernel/sev.c | 4 ++++
1 file changed, 4 insertions(+)

diff --git a/arch/x86/kernel/sev.c b/arch/x86/kernel/sev.c
index 70472eebe719..f7e56cae05c5 100644
--- a/arch/x86/kernel/sev.c
+++ b/arch/x86/kernel/sev.c
@@ -1005,6 +1005,10 @@ static int wakeup_cpu_via_vmgexit(u32 apic_id, unsigned long start_ip)

 	cur_vmsa = per_cpu(sev_vmsa, cpu);
 
+	/* Re-onlining CPUs is currently unsupported */
+	if (cur_vmsa)
+		return -EOPNOTSUPP;
+
 	/*
 	 * A new VMSA is created each time because there is no guarantee that
 	 * the current VMSA is the kernels or that the vCPU is not running. If
--
2.43.0