LinuxLists.cc - Re: [PATCH Part2 v5 44/45] KVM: SVM: Support SEV-SNP AP Creation NAE event

2021-10-17 19:45:34

Subject: Re: [PATCH Part2 v5 44/45] KVM: SVM: Support SEV-SNP AP Creation NAE event

On Fri, Aug 20, 2021, Brijesh Singh wrote:
> From: Tom Lendacky <[email protected]>
>
> Add support for the SEV-SNP AP Creation NAE event. This allows SEV-SNP
> guests to alter the register state of the APs on their own. This allows
> the guest a way of simulating INIT-SIPI.
>
> A new event, KVM_REQ_UPDATE_PROTECTED_GUEST_STATE, is created and used
> so as to avoid updating the VMSA pointer while the vCPU is running.
>
> For CREATE
> The guest supplies the GPA of the VMSA to be used for the vCPU with the
> specified APIC ID. The GPA is saved in the svm struct of the target
> vCPU, the KVM_REQ_UPDATE_PROTECTED_GUEST_STATE event is added to the
> vCPU and then the vCPU is kicked.
>
> For CREATE_ON_INIT:
> The guest supplies the GPA of the VMSA to be used for the vCPU with the
> specified APIC ID the next time an INIT is performed. The GPA is saved
> in the svm struct of the target vCPU.
>
> For DESTROY:
> The guest indicates it wishes to stop the vCPU. The GPA is cleared from
> the svm struct, the KVM_REQ_UPDATE_PROTECTED_GUEST_STATE event is added
> to vCPU and then the vCPU is kicked.
>
>
> The KVM_REQ_UPDATE_PROTECTED_GUEST_STATE event handler will be invoked as
> a result of the event or as a result of an INIT. The handler sets the vCPU
> to the KVM_MP_STATE_UNINITIALIZED state, so that any errors will leave the
> vCPU as not runnable. Any previous VMSA pages that were installed as
> part of an SEV-SNP AP Creation NAE event are un-pinned. If a new VMSA is
> to be installed, the VMSA guest page is pinned and set as the VMSA in the
> vCPU VMCB and the vCPU state is set to KVM_MP_STATE_RUNNABLE. If a new
> VMSA is not to be installed, the VMSA is cleared in the vCPU VMCB and the
> vCPU state is left as KVM_MP_STATE_UNINITIALIZED to prevent it from being
> run.

LOL, this part of the GHCB is debatable, though I guess it does say "may"...

Using VMGEXIT SW_EXITCODE 0x8000_0013, an SEV-SNP guest can create or update the
vCPU state of an AP, which may allow for a simpler and more secure method of
^^^^^^^
booting an AP.

> + if (VALID_PAGE(svm->snp_vmsa_pfn)) {

KVM's VMSA page should be freed on a successful "switch", because AFAICT it's
incorrect for KVM to ever go back to the original VMSA.

> + /*
> + * The snp_vmsa_pfn fields holds the hypervisor physical address
> + * of the about to be replaced VMSA which will no longer be used
> + * or referenced, so un-pin it.
> + */
> + kvm_release_pfn_dirty(svm->snp_vmsa_pfn);
> + svm->snp_vmsa_pfn = INVALID_PAGE;
> + }
> +
> + if (VALID_PAGE(svm->snp_vmsa_gpa)) {
> + /*
> + * The VMSA is referenced by the hypervisor physical address,
> + * so retrieve the PFN and pin it.
> + */
> + pfn = gfn_to_pfn(vcpu->kvm, gpa_to_gfn(svm->snp_vmsa_gpa));

Oh yay, a gfn. That means that the page is subject to memslot movement. I don't
think the code will break per se, but it's a wrinkle that's not handled.

I'm also pretty sure the page will effectively be leaked, I don't see a

kvm_release_pfn_dirty(svm->snp_vmsa_pfn);

in vCPU teardown.

Furthermore, letting the guest specify the page would open up to exploits of the
erratum where a spurious RMP violation is signaled if an in-use page, a.k.a. VMSA
page, is 2mb aligned. That also means the _guest_ needs to be somehow be aware
of the erratum.

And digging through the guest patches, this gives the guest _full_ control over
the VMSA contents. That is bonkers. At _best_ it gives the guest the ability to
fuzz VMRUN ucode by stuffing garbage into the VMSA.

Honestly, why should KVM even support guest-provided VMSAs? It's far, far simpler
to handle this fully in the guest with a BIOS<=>kernel mailbox; see the MP wakeup
protocol being added for TDX. That would allow improving the security for SEV-ES
as well, though I'm guessing no one actually cares about that in practice.

IIUC, the use case for VMPLs is that VMPL0 would be fully trusted by both the host
and guest, i.e. attacks via the VMSA are out-of-scope. That is very much not the
case here.

2021-10-20 21:49:47

by Brijesh Singh

[permalink] [raw]

Subject: Re: [PATCH Part2 v5 44/45] KVM: SVM: Support SEV-SNP AP Creation NAE event

On 10/15/21 2:50 PM, Sean Christopherson wrote:
> On Fri, Aug 20, 2021, Brijesh Singh wrote:
>> From: Tom Lendacky <[email protected]>
>>
>> Add support for the SEV-SNP AP Creation NAE event. This allows SEV-SNP
>> guests to alter the register state of the APs on their own. This allows
>> the guest a way of simulating INIT-SIPI.
>>
>> A new event, KVM_REQ_UPDATE_PROTECTED_GUEST_STATE, is created and used
>> so as to avoid updating the VMSA pointer while the vCPU is running.
>>
>> For CREATE
>> The guest supplies the GPA of the VMSA to be used for the vCPU with the
>> specified APIC ID. The GPA is saved in the svm struct of the target
>> vCPU, the KVM_REQ_UPDATE_PROTECTED_GUEST_STATE event is added to the
>> vCPU and then the vCPU is kicked.
>>
>> For CREATE_ON_INIT:
>> The guest supplies the GPA of the VMSA to be used for the vCPU with the
>> specified APIC ID the next time an INIT is performed. The GPA is saved
>> in the svm struct of the target vCPU.
>>
>> For DESTROY:
>> The guest indicates it wishes to stop the vCPU. The GPA is cleared from
>> the svm struct, the KVM_REQ_UPDATE_PROTECTED_GUEST_STATE event is added
>> to vCPU and then the vCPU is kicked.
>>
>>
>> The KVM_REQ_UPDATE_PROTECTED_GUEST_STATE event handler will be invoked as
>> a result of the event or as a result of an INIT. The handler sets the vCPU
>> to the KVM_MP_STATE_UNINITIALIZED state, so that any errors will leave the
>> vCPU as not runnable. Any previous VMSA pages that were installed as
>> part of an SEV-SNP AP Creation NAE event are un-pinned. If a new VMSA is
>> to be installed, the VMSA guest page is pinned and set as the VMSA in the
>> vCPU VMCB and the vCPU state is set to KVM_MP_STATE_RUNNABLE. If a new
>> VMSA is not to be installed, the VMSA is cleared in the vCPU VMCB and the
>> vCPU state is left as KVM_MP_STATE_UNINITIALIZED to prevent it from being
>> run.
> LOL, this part of the GHCB is debatable, though I guess it does say "may"...
>
> Using VMGEXIT SW_EXITCODE 0x8000_0013, an SEV-SNP guest can create or update the
> vCPU state of an AP, which may allow for a simpler and more secure method of
> ^^^^^^^
> booting an AP.
>
>> + if (VALID_PAGE(svm->snp_vmsa_pfn)) {
> KVM's VMSA page should be freed on a successful "switch", because AFAICT it's
> incorrect for KVM to ever go back to the original VMSA.
>
>> + /*
>> + * The snp_vmsa_pfn fields holds the hypervisor physical address
>> + * of the about to be replaced VMSA which will no longer be used
>> + * or referenced, so un-pin it.
>> + */
>> + kvm_release_pfn_dirty(svm->snp_vmsa_pfn);
>> + svm->snp_vmsa_pfn = INVALID_PAGE;
>> + }
>> +
>> + if (VALID_PAGE(svm->snp_vmsa_gpa)) {
>> + /*
>> + * The VMSA is referenced by the hypervisor physical address,
>> + * so retrieve the PFN and pin it.
>> + */
>> + pfn = gfn_to_pfn(vcpu->kvm, gpa_to_gfn(svm->snp_vmsa_gpa));
> Oh yay, a gfn. That means that the page is subject to memslot movement. I don't
> think the code will break per se, but it's a wrinkle that's not handled.
>
> I'm also pretty sure the page will effectively be leaked, I don't see a
>
> kvm_release_pfn_dirty(svm->snp_vmsa_pfn);
>
> in vCPU teardown.
>
> Furthermore, letting the guest specify the page would open up to exploits of the
> erratum where a spurious RMP violation is signaled if an in-use page, a.k.a. VMSA
> page, is 2mb aligned. That also means the _guest_ needs to be somehow be aware
> of the erratum.

Good point Sean, a guest could exploit the IN_USE erratum in this case.
We need to somehow communicate this to guest so that it does not
allocate the VMSA at 2MB boundary. It would be nice if GHCB spec can add
a requirement that VMSA should not be a 2MB aligned. I will see what we
can do to address this.

> And digging through the guest patches, this gives the guest _full_ control over
> the VMSA contents. That is bonkers. At _best_ it gives the guest the ability to
> fuzz VMRUN ucode by stuffing garbage into the VMSA.

If guest puts garbage in VMSA then VMRUN will fail. I am sure ucode is
doing all kind of sanity checks to ensure that VMSA does not contain
invalid value before the run.

> Honestly, why should KVM even support guest-provided VMSAs? It's far, far simpler
> to handle this fully in the guest with a BIOS<=>kernel mailbox; see the MP wakeup
> protocol being added for TDX. That would allow improving the security for SEV-ES
> as well, though I'm guessing no one actually cares about that in practice.
> IIUC, the use case for VMPLs is that VMPL0 would be fully trusted by both the host
> and guest, i.e. attacks via the VMSA are out-of-scope. That is very much not the
> case here.

2021-10-20 23:02:03

by Sean Christopherson

[permalink] [raw]

Subject: Re: [PATCH Part2 v5 44/45] KVM: SVM: Support SEV-SNP AP Creation NAE event

On Wed, Oct 20, 2021, Brijesh Singh wrote:
>
> On 10/15/21 2:50 PM, Sean Christopherson wrote:
> > And digging through the guest patches, this gives the guest _full_ control over
> > the VMSA contents. That is bonkers. At _best_ it gives the guest the ability to
> > fuzz VMRUN ucode by stuffing garbage into the VMSA.
>
> If guest puts garbage in VMSA then VMRUN will fail. I am sure ucode is
> doing all kind of sanity checks to ensure that VMSA does not contain
> invalid value before the run.

Oh, I'm well aware of the number of sanity checks that are in VM-Enter ucode, and
that's precisely why I'm of the opinion that letting the guest fuzz VMRUN is a
non-trivial security risk for the host. I know of at least at least two VMX bugs
(one erratum that I could find, one that must have been fixed with a ucode patch?)
where ucode failed to detect invalid state. Those were "benign" in that they
caused a missed VM-Fail but didn't corrupt CPU state, but it's not a stretch to
imagine a ucode bug that leads to corruption of CPU state and a system crash.

The sheer number of checks involved, combined with the fact that there likely
hasn't been much fuzzing of VM-Enter outside of the hardware vendor's own
validation, means I'm not exactly brimming with confidence that VMRUN's ucode
is perfect.

I fully acknowledge that the host kernel obviously "trusts" CPU ucode to a great
extent. My point here is that the design exposes the host to unnecessary risk.