2023-07-20 23:36:14

by Isaku Yamahata

Subject: [RFC PATCH v4 00/10] KVM: guest_memfd(), X86: Common base for SNP and TDX (was KVM: guest memory: Misc enhancement)

From: Isaku Yamahata <[email protected]>

Hello. I've updated the "KVM: guest memory: Misc enhancement" patch series,
rebasing it on "[RFC PATCH v11 00/29] KVM: guest_memfd() and per-page
attributes" [1]. I changed the subject to better describe the patch series.

The purpose is to get agreement on the common base patches for both SNP [2]
and TDX [3] (and hopefully for other technologies that protect guest memory),
so that SNP and TDX can make progress without stepping on each other.

The main changes from the previous version are
- Rebased to v11 of KVM guest_memfd()
- Introduce KVM_X86_SNP_VM and KVM_X86_TDX_VM
- Make the KVM_MEM_ENC_OP uABI common for SNP and TDX

[1] https://lore.kernel.org/all/[email protected]/

[2] https://lore.kernel.org/lkml/[email protected]/
Add AMD Secure Nested Paging (SEV-SNP) Hypervisor Support

[3] https://lore.kernel.org/all/[email protected]/
KVM TDX basic feature support

Changes:
v4:
- Rebased to v11 of KVM guest_memfd()
- Introduce KVM_X86_SNP_VM and KVM_X86_TDX_VM
- Newly include a patch to make the KVM_MEM_ENC_OP uABI common for SNP and TDX
- Include a patch to address PFERR_IMPLICIT_ACCESS

v3:
https://lore.kernel.org/all/[email protected]/

v2:
https://lore.kernel.org/all/[email protected]/

v1:
https://lore.kernel.org/all/[email protected]/

Brijesh Singh (1):
KVM: x86: Export the kvm_zap_gfn_range() for the SNP use

Isaku Yamahata (6):
KVM: x86: Add is_vm_type_supported callback
KVM: x86/mmu: Pass around full 64-bit error code for the KVM page
fault
KVM: x86: Introduce PFERR_GUEST_ENC_MASK to indicate fault is private
KVM: Add new members to struct kvm_gfn_range to operate on
KVM: x86: Make struct sev_cmd common for KVM_MEM_ENC_OP
KVM: X86: KVM_MEM_ENC_OP check if unused field (flags, error) is zero

Michael Roth (2):
KVM: x86: Add gmem hook for initializing private memory
KVM: x86: Add gmem hook for invalidating private memory

Sean Christopherson (1):
KVM: x86/mmu: Guard against collision with KVM-defined
PFERR_IMPLICIT_ACCESS

arch/x86/include/asm/kvm-x86-ops.h | 3 ++
arch/x86/include/asm/kvm_host.h | 10 ++++-
arch/x86/include/uapi/asm/kvm.h | 35 +++++++++++++++
arch/x86/kvm/mmu.h | 2 -
arch/x86/kvm/mmu/mmu.c | 37 +++++++++++++---
arch/x86/kvm/mmu/mmu_internal.h | 18 ++++++--
arch/x86/kvm/mmu/mmutrace.h | 2 +-
arch/x86/kvm/mmu/paging_tmpl.h | 2 +-
arch/x86/kvm/svm/sev.c | 68 ++++++++++++++++--------------
arch/x86/kvm/svm/svm.c | 7 +++
arch/x86/kvm/svm/svm.h | 2 +-
arch/x86/kvm/vmx/vmx.c | 7 +++
arch/x86/kvm/x86.c | 50 +++++++++++++++++++++-
arch/x86/kvm/x86.h | 2 +
include/linux/kvm_host.h | 5 +++
virt/kvm/guest_mem.c | 44 +++++++++++++++++++
virt/kvm/kvm_main.c | 4 ++
17 files changed, 249 insertions(+), 49 deletions(-)


base-commit: bfa3037d828050896ae52f6467b6ca2489ae6fb1
prerequisite-patch-id: 3bd3037b3803e2d84f0ef98bb6c678be44eddd08
prerequisite-patch-id: b474cbf4f0ea21cf945036271f5286017e0efc84
prerequisite-patch-id: bd96a89fafe51956a55fdfc08a3ea2a37a2e55e4
prerequisite-patch-id: f15d178f9000430e0089c546756ab1d8d29341a7
prerequisite-patch-id: 5b34829d7433fa81ed574d724ee476b9cc2e6a50
prerequisite-patch-id: bf75388851ee37a83b37bfa7cb0084f27301f6bc
prerequisite-patch-id: 9d77fb0e8ce8c8c21e22ff3f26bd168eb5446df0
prerequisite-patch-id: 7152514149d4b4525a0057e3460ff78861e162f5
prerequisite-patch-id: a1d688257a210564ebeb23b1eef4b9ad1f5d7be3
prerequisite-patch-id: 0b1e771c370a03e1588ed97ee77cb0493d9304f4
prerequisite-patch-id: 313219882d617e4d4cb226760d1f071f52b3f882
prerequisite-patch-id: a8ebe373e3913fd0e0a55c57f55690f432975ec0
prerequisite-patch-id: 8b06f2333214e355b145113e33c65ade85d7eac4
prerequisite-patch-id: e739dd58995d35b0f888d02a6bf4ea144476f264
prerequisite-patch-id: 0e93d19cb59f3a052a377a56ff0a4399046818aa
prerequisite-patch-id: 4e0839abbfb8885154e278b4b0071a760199ad46
prerequisite-patch-id: be193bb3393ad8a16ea376a530df20a145145259
prerequisite-patch-id: 301dbdf8448175ea609664c890a3694750ecf740
prerequisite-patch-id: ba8e6068bcef7865bb5523065e19edd49fbc02de
prerequisite-patch-id: 81b25d13169b3617c12992dce85613a2730b0e1b
prerequisite-patch-id: b4526dee5b5a95da0a13116ae0c73d4e69efa3c6
prerequisite-patch-id: 8c62bacc52a75d4a9038a3f597fe436c50e07de3
prerequisite-patch-id: 5618d2414a1ef641b4c247b5e28076f67a765b24
prerequisite-patch-id: 022b4620f6ff729eca842192259e986d126e7fa6
prerequisite-patch-id: 73ebc581a3ce9a51167785d273fe69406ccccaed
prerequisite-patch-id: 1225df90aeae430a74354bc5ad0ddf508d0707db
prerequisite-patch-id: 1e38df398ee370ad7e457f4890d6e4457e8a83fa
prerequisite-patch-id: b8812b613f5674351565ea28354e91a756efd56e
prerequisite-patch-id: e231eff2baba07c2de984dd6cf83ad1a31b792b8
--
2.25.1



2023-07-20 23:38:51

by Isaku Yamahata

Subject: [RFC PATCH v4 09/10] KVM: x86: Make struct sev_cmd common for KVM_MEM_ENC_OP

From: Isaku Yamahata <[email protected]>

TDX KVM will also use KVM_MEM_ENC_OP. Rename struct sev_cmd and make it
common to both vendor backends, SEV and TDX, so that the struct becomes the
common uABI for KVM_MEM_ENC_OP. The TDX backend wants to return a 64-bit
error code instead of a 32-bit one; to keep the ABI for the SEV backend, use
a union to accommodate the 64-bit member. Opportunistically turn the
implicit padding after the id member into an explicit flags member, for
future use and clarity.

Some data structures for sub-commands could also be made common. The
current candidates are KVM_SEV{,_ES}_INIT, KVM_SEV_LAUNCH_FINISH,
KVM_SEV_LAUNCH_UPDATE_VMSA, KVM_SEV_DBG_DECRYPT, and KVM_SEV_DBG_ENCRYPT.

Only compile tested for SEV code.
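
For illustration only, a minimal userspace sketch of issuing a sub-command
through the proposed common struct (assuming the uapi header from this patch
is installed); the helper name and the lack of error handling are mine, not
part of this patch:

#include <stdio.h>
#include <string.h>
#include <sys/ioctl.h>
#include <linux/kvm.h>

/* Unused fields (flags, data, error/error64) must be left zero. */
static int mem_enc_op(int vm_fd, __u32 id, __u64 data)
{
	struct kvm_mem_enc_cmd cmd;
	int ret;

	memset(&cmd, 0, sizeof(cmd));
	cmd.id = id;
	cmd.data = data;	/* immediate value or pointer to the payload */

	ret = ioctl(vm_fd, KVM_MEMORY_ENCRYPT_OP, &cmd);
	if (ret)
		fprintf(stderr, "sub-command %u failed: %d (fw error 0x%x)\n",
			id, ret, cmd.error);
	return ret;
}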

Signed-off-by: Isaku Yamahata <[email protected]>
---
Changes v3 -> v4:
- newly added
---
arch/x86/include/asm/kvm_host.h | 2 +-
arch/x86/include/uapi/asm/kvm.h | 33 ++++++++++++++++
arch/x86/kvm/svm/sev.c | 68 ++++++++++++++++++---------------
arch/x86/kvm/svm/svm.h | 2 +-
arch/x86/kvm/x86.c | 16 +++++++-
5 files changed, 87 insertions(+), 34 deletions(-)

diff --git a/arch/x86/include/asm/kvm_host.h b/arch/x86/include/asm/kvm_host.h
index 440a4a13a93f..5ede982442a0 100644
--- a/arch/x86/include/asm/kvm_host.h
+++ b/arch/x86/include/asm/kvm_host.h
@@ -1710,7 +1710,7 @@ struct kvm_x86_ops {
void (*enable_smi_window)(struct kvm_vcpu *vcpu);
#endif

- int (*mem_enc_ioctl)(struct kvm *kvm, void __user *argp);
+ int (*mem_enc_ioctl)(struct kvm *kvm, struct kvm_mem_enc_cmd *cmd);
int (*mem_enc_register_region)(struct kvm *kvm, struct kvm_enc_region *argp);
int (*mem_enc_unregister_region)(struct kvm *kvm, struct kvm_enc_region *argp);
int (*vm_copy_enc_context_from)(struct kvm *kvm, unsigned int source_fd);
diff --git a/arch/x86/include/uapi/asm/kvm.h b/arch/x86/include/uapi/asm/kvm.h
index aa7a56a47564..32883e520b00 100644
--- a/arch/x86/include/uapi/asm/kvm.h
+++ b/arch/x86/include/uapi/asm/kvm.h
@@ -562,6 +562,39 @@ struct kvm_pmu_event_filter {
/* x86-specific KVM_EXIT_HYPERCALL flags. */
#define KVM_EXIT_HYPERCALL_LONG_MODE BIT(0)

+struct kvm_mem_enc_cmd {
+ /* sub-command id of KVM_MEM_ENC_OP. */
+ __u32 id;
+ /*
+ * Auxiliary flags for sub-command. If sub-command doesn't use it,
+ * set zero.
+ */
+ __u32 flags;
+ /*
+ * Data for sub-command. An immediate or a pointer to the actual
+ * data in process virtual address. If sub-command doesn't use it,
+ * set zero.
+ */
+ __u64 data;
+ /*
+ * Supplemental error code in the case of error.
+ * SEV error code from the PSP or TDX SEAMCALL status code.
+ * The caller should set zero.
+ */
+ union {
+ struct {
+ __u32 error;
+ /*
+ * KVM_SEV_LAUNCH_START and KVM_SEV_RECEIVE_START
+ * require extra data. Not included in struct
+ * kvm_sev_launch_start or struct kvm_sev_receive_start.
+ */
+ __u32 sev_fd;
+ };
+ __u64 error64;
+ };
+};
+
#define KVM_X86_DEFAULT_VM 0
#define KVM_X86_SW_PROTECTED_VM 1
#define KVM_X86_TDX_VM 2
diff --git a/arch/x86/kvm/svm/sev.c b/arch/x86/kvm/svm/sev.c
index 07756b7348ae..94e13bb49c86 100644
--- a/arch/x86/kvm/svm/sev.c
+++ b/arch/x86/kvm/svm/sev.c
@@ -1835,30 +1835,39 @@ int sev_vm_move_enc_context_from(struct kvm *kvm, unsigned int source_fd)
return ret;
}

-int sev_mem_enc_ioctl(struct kvm *kvm, void __user *argp)
+int sev_mem_enc_ioctl(struct kvm *kvm, struct kvm_mem_enc_cmd *cmd)
{
- struct kvm_sev_cmd sev_cmd;
+ struct kvm_sev_cmd *sev_cmd = (struct kvm_sev_cmd *)cmd;
int r;

+ /* TODO: replace struct kvm_sev_cmd with kvm_mem_enc_cmd. */
+ BUILD_BUG_ON(sizeof(*sev_cmd) != sizeof(*cmd));
+ BUILD_BUG_ON(offsetof(struct kvm_sev_cmd, id) !=
+ offsetof(struct kvm_mem_enc_cmd, id));
+ BUILD_BUG_ON(sizeof(sev_cmd->id) != sizeof(cmd->id));
+ BUILD_BUG_ON(offsetof(struct kvm_sev_cmd, data) !=
+ offsetof(struct kvm_mem_enc_cmd, data));
+ BUILD_BUG_ON(sizeof(sev_cmd->data) != sizeof(cmd->data));
+ BUILD_BUG_ON(offsetof(struct kvm_sev_cmd, error) !=
+ offsetof(struct kvm_mem_enc_cmd, error));
+ BUILD_BUG_ON(sizeof(sev_cmd->error) != sizeof(cmd->error));
+ BUILD_BUG_ON(offsetof(struct kvm_sev_cmd, sev_fd) !=
+ offsetof(struct kvm_mem_enc_cmd, sev_fd));
+ BUILD_BUG_ON(sizeof(sev_cmd->sev_fd) != sizeof(cmd->sev_fd));
+
if (!sev_enabled)
return -ENOTTY;

- if (!argp)
- return 0;
-
- if (copy_from_user(&sev_cmd, argp, sizeof(struct kvm_sev_cmd)))
- return -EFAULT;
-
mutex_lock(&kvm->lock);

/* Only the enc_context_owner handles some memory enc operations. */
if (is_mirroring_enc_context(kvm) &&
- !is_cmd_allowed_from_mirror(sev_cmd.id)) {
+ !is_cmd_allowed_from_mirror(sev_cmd->id)) {
r = -EINVAL;
goto out;
}

- switch (sev_cmd.id) {
+ switch (sev_cmd->id) {
case KVM_SEV_ES_INIT:
if (!sev_es_enabled) {
r = -ENOTTY;
@@ -1866,67 +1875,64 @@ int sev_mem_enc_ioctl(struct kvm *kvm, void __user *argp)
}
fallthrough;
case KVM_SEV_INIT:
- r = sev_guest_init(kvm, &sev_cmd);
+ r = sev_guest_init(kvm, sev_cmd);
break;
case KVM_SEV_LAUNCH_START:
- r = sev_launch_start(kvm, &sev_cmd);
+ r = sev_launch_start(kvm, sev_cmd);
break;
case KVM_SEV_LAUNCH_UPDATE_DATA:
- r = sev_launch_update_data(kvm, &sev_cmd);
+ r = sev_launch_update_data(kvm, sev_cmd);
break;
case KVM_SEV_LAUNCH_UPDATE_VMSA:
- r = sev_launch_update_vmsa(kvm, &sev_cmd);
+ r = sev_launch_update_vmsa(kvm, sev_cmd);
break;
case KVM_SEV_LAUNCH_MEASURE:
- r = sev_launch_measure(kvm, &sev_cmd);
+ r = sev_launch_measure(kvm, sev_cmd);
break;
case KVM_SEV_LAUNCH_FINISH:
- r = sev_launch_finish(kvm, &sev_cmd);
+ r = sev_launch_finish(kvm, sev_cmd);
break;
case KVM_SEV_GUEST_STATUS:
- r = sev_guest_status(kvm, &sev_cmd);
+ r = sev_guest_status(kvm, sev_cmd);
break;
case KVM_SEV_DBG_DECRYPT:
- r = sev_dbg_crypt(kvm, &sev_cmd, true);
+ r = sev_dbg_crypt(kvm, sev_cmd, true);
break;
case KVM_SEV_DBG_ENCRYPT:
- r = sev_dbg_crypt(kvm, &sev_cmd, false);
+ r = sev_dbg_crypt(kvm, sev_cmd, false);
break;
case KVM_SEV_LAUNCH_SECRET:
- r = sev_launch_secret(kvm, &sev_cmd);
+ r = sev_launch_secret(kvm, sev_cmd);
break;
case KVM_SEV_GET_ATTESTATION_REPORT:
- r = sev_get_attestation_report(kvm, &sev_cmd);
+ r = sev_get_attestation_report(kvm, sev_cmd);
break;
case KVM_SEV_SEND_START:
- r = sev_send_start(kvm, &sev_cmd);
+ r = sev_send_start(kvm, sev_cmd);
break;
case KVM_SEV_SEND_UPDATE_DATA:
- r = sev_send_update_data(kvm, &sev_cmd);
+ r = sev_send_update_data(kvm, sev_cmd);
break;
case KVM_SEV_SEND_FINISH:
- r = sev_send_finish(kvm, &sev_cmd);
+ r = sev_send_finish(kvm, sev_cmd);
break;
case KVM_SEV_SEND_CANCEL:
- r = sev_send_cancel(kvm, &sev_cmd);
+ r = sev_send_cancel(kvm, sev_cmd);
break;
case KVM_SEV_RECEIVE_START:
- r = sev_receive_start(kvm, &sev_cmd);
+ r = sev_receive_start(kvm, sev_cmd);
break;
case KVM_SEV_RECEIVE_UPDATE_DATA:
- r = sev_receive_update_data(kvm, &sev_cmd);
+ r = sev_receive_update_data(kvm, sev_cmd);
break;
case KVM_SEV_RECEIVE_FINISH:
- r = sev_receive_finish(kvm, &sev_cmd);
+ r = sev_receive_finish(kvm, sev_cmd);
break;
default:
r = -EINVAL;
goto out;
}

- if (copy_to_user(argp, &sev_cmd, sizeof(struct kvm_sev_cmd)))
- r = -EFAULT;
-
out:
mutex_unlock(&kvm->lock);
return r;
diff --git a/arch/x86/kvm/svm/svm.h b/arch/x86/kvm/svm/svm.h
index 18af7e712a5a..74ecab20c24b 100644
--- a/arch/x86/kvm/svm/svm.h
+++ b/arch/x86/kvm/svm/svm.h
@@ -716,7 +716,7 @@ void avic_refresh_virtual_apic_mode(struct kvm_vcpu *vcpu);
extern unsigned int max_sev_asid;

void sev_vm_destroy(struct kvm *kvm);
-int sev_mem_enc_ioctl(struct kvm *kvm, void __user *argp);
+int sev_mem_enc_ioctl(struct kvm *kvm, struct kvm_mem_enc_cmd *cmd);
int sev_mem_enc_register_region(struct kvm *kvm,
struct kvm_enc_region *range);
int sev_mem_enc_unregister_region(struct kvm *kvm,
diff --git a/arch/x86/kvm/x86.c b/arch/x86/kvm/x86.c
index 2ae40fa8e178..ab36e8940f1b 100644
--- a/arch/x86/kvm/x86.c
+++ b/arch/x86/kvm/x86.c
@@ -7040,11 +7040,25 @@ int kvm_arch_vm_ioctl(struct file *filp, unsigned int ioctl, unsigned long arg)
goto out;
}
case KVM_MEMORY_ENCRYPT_OP: {
+ struct kvm_mem_enc_cmd cmd;
+
r = -ENOTTY;
if (!kvm_x86_ops.mem_enc_ioctl)
goto out;

- r = static_call(kvm_x86_mem_enc_ioctl)(kvm, argp);
+ if (!argp) {
+ r = 0;
+ goto out;
+ }
+
+ if (copy_from_user(&cmd, argp, sizeof(cmd))) {
+ r = -EFAULT;
+ goto out;
+ }
+ r = static_call(kvm_x86_mem_enc_ioctl)(kvm, &cmd);
+ if (copy_to_user(argp, &cmd, sizeof(cmd)))
+ r = -EFAULT;
+
break;
}
case KVM_MEMORY_ENCRYPT_REG_REGION: {
--
2.25.1


2023-07-20 23:50:40

by Isaku Yamahata

Subject: [RFC PATCH v4 06/10] KVM: x86: Export the kvm_zap_gfn_range() for the SNP use

From: Brijesh Singh <[email protected]>

While resolving an RMP page fault, there may be cases where the page level
of the RMP entry and of the TDP entry do not match and the 2M RMP entry must
be split into 4K RMP entries, or a 2M TDP page needs to be broken into
multiple 4K pages.

To keep the RMP and TDP page levels in sync, zap the gfn range after
splitting the pages in the RMP entry. The zap forces the TDP mapping to be
rebuilt with the new page level.
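
As a rough illustration (not part of this patch), the SNP RMP fault handler
could use the newly exported symbol along these lines; psmash() and the
surrounding fault decoding are assumptions based on the SNP host series:

static void snp_handle_rmp_page_size_mismatch(struct kvm *kvm, gfn_t gfn,
					      kvm_pfn_t pfn)
{
	gfn_t gfn_2m = gfn & ~(KVM_PAGES_PER_HPAGE(PG_LEVEL_2M) - 1);
	kvm_pfn_t pfn_2m = pfn & ~(KVM_PAGES_PER_HPAGE(PG_LEVEL_2M) - 1);

	/* Split the 2M RMP entry covering @pfn into 4K RMP entries. */
	if (psmash(pfn_2m))
		return;

	/*
	 * Zap the covering 2M GPA range so the TDP mapping is rebuilt at the
	 * (now 4K) RMP level on the next fault.
	 */
	kvm_zap_gfn_range(kvm, gfn_2m, gfn_2m + KVM_PAGES_PER_HPAGE(PG_LEVEL_2M));
}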

Signed-off-by: Brijesh Singh <[email protected]>
Signed-off-by: Ashish Kalra <[email protected]>
Signed-off-by: Michael Roth <[email protected]>
Link: https://lore.kernel.org/r/[email protected]

---
Changes v3 -> v4:
- removed redundant blank line

Changes v2 -> v3:
- Newly added
---
arch/x86/include/asm/kvm_host.h | 1 +
arch/x86/kvm/mmu.h | 2 --
arch/x86/kvm/mmu/mmu.c | 1 +
3 files changed, 2 insertions(+), 2 deletions(-)

diff --git a/arch/x86/include/asm/kvm_host.h b/arch/x86/include/asm/kvm_host.h
index ab7d080bf544..e4f2938bb1fc 100644
--- a/arch/x86/include/asm/kvm_host.h
+++ b/arch/x86/include/asm/kvm_host.h
@@ -1842,6 +1842,7 @@ void kvm_mmu_slot_leaf_clear_dirty(struct kvm *kvm,
void kvm_mmu_zap_all(struct kvm *kvm);
void kvm_mmu_invalidate_mmio_sptes(struct kvm *kvm, u64 gen);
void kvm_mmu_change_mmu_pages(struct kvm *kvm, unsigned long kvm_nr_mmu_pages);
+void kvm_zap_gfn_range(struct kvm *kvm, gfn_t gfn_start, gfn_t gfn_end);

int load_pdptrs(struct kvm_vcpu *vcpu, unsigned long cr3);

diff --git a/arch/x86/kvm/mmu.h b/arch/x86/kvm/mmu.h
index 92d5a1924fc1..963c734642f6 100644
--- a/arch/x86/kvm/mmu.h
+++ b/arch/x86/kvm/mmu.h
@@ -235,8 +235,6 @@ static inline u8 permission_fault(struct kvm_vcpu *vcpu, struct kvm_mmu *mmu,
return -(u32)fault & errcode;
}

-void kvm_zap_gfn_range(struct kvm *kvm, gfn_t gfn_start, gfn_t gfn_end);
-
int kvm_arch_write_log_dirty(struct kvm_vcpu *vcpu);

int kvm_mmu_post_init_vm(struct kvm *kvm);
diff --git a/arch/x86/kvm/mmu/mmu.c b/arch/x86/kvm/mmu/mmu.c
index d2ebe26fb822..a73ddb43a2cf 100644
--- a/arch/x86/kvm/mmu/mmu.c
+++ b/arch/x86/kvm/mmu/mmu.c
@@ -6759,6 +6759,7 @@ static bool kvm_mmu_zap_collapsible_spte(struct kvm *kvm,

return need_tlb_flush;
}
+EXPORT_SYMBOL_GPL(kvm_zap_gfn_range);

static void kvm_rmap_zap_collapsible_sptes(struct kvm *kvm,
const struct kvm_memory_slot *slot)
--
2.25.1


2023-07-20 23:50:57

by Isaku Yamahata

Subject: [RFC PATCH v4 01/10] KVM: x86: Add is_vm_type_supported callback

From: Isaku Yamahata <[email protected]>

For TDX, allow the backend to override the supported VM types. Add
KVM_X86_TDX_VM and KVM_X86_SNP_VM to reserve the bits.
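
For context, a hypothetical userspace sketch of how a VMM could consume the
new types once a backend reports them; KVM_CAP_VM_TYPES is assumed from the
guest_memfd() series this is based on, and error handling is elided:

#include <fcntl.h>
#include <sys/ioctl.h>
#include <linux/kvm.h>

static int create_vm(void)
{
	int kvm_fd = open("/dev/kvm", O_RDWR);
	unsigned long types = ioctl(kvm_fd, KVM_CHECK_EXTENSION, KVM_CAP_VM_TYPES);
	unsigned long type = KVM_X86_DEFAULT_VM;

	if (types & (1UL << KVM_X86_TDX_VM))
		type = KVM_X86_TDX_VM;
	else if (types & (1UL << KVM_X86_SNP_VM))
		type = KVM_X86_SNP_VM;

	return ioctl(kvm_fd, KVM_CREATE_VM, type);
}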

Signed-off-by: Isaku Yamahata <[email protected]>

---
Changes v3 -> v4:
- Added KVM_X86_SNP_VM

Changes v2 -> v3:
- no change
- didn't bother to rename KVM_X86_PROTECTED_VM to KVM_X86_SW_PROTECTED_VM

Changes v1 -> v2
- no change
---
arch/x86/include/asm/kvm-x86-ops.h | 1 +
arch/x86/include/asm/kvm_host.h | 1 +
arch/x86/include/uapi/asm/kvm.h | 2 ++
arch/x86/kvm/svm/svm.c | 7 +++++++
arch/x86/kvm/vmx/vmx.c | 7 +++++++
arch/x86/kvm/x86.c | 12 +++++++++++-
arch/x86/kvm/x86.h | 2 ++
7 files changed, 31 insertions(+), 1 deletion(-)

diff --git a/arch/x86/include/asm/kvm-x86-ops.h b/arch/x86/include/asm/kvm-x86-ops.h
index 13bc212cd4bc..c0143906fe6d 100644
--- a/arch/x86/include/asm/kvm-x86-ops.h
+++ b/arch/x86/include/asm/kvm-x86-ops.h
@@ -20,6 +20,7 @@ KVM_X86_OP(hardware_disable)
KVM_X86_OP(hardware_unsetup)
KVM_X86_OP(has_emulated_msr)
KVM_X86_OP(vcpu_after_set_cpuid)
+KVM_X86_OP(is_vm_type_supported)
KVM_X86_OP(vm_init)
KVM_X86_OP_OPTIONAL(vm_destroy)
KVM_X86_OP_OPTIONAL_RET0(vcpu_precreate)
diff --git a/arch/x86/include/asm/kvm_host.h b/arch/x86/include/asm/kvm_host.h
index bbefd79b7950..2c9350aa0da4 100644
--- a/arch/x86/include/asm/kvm_host.h
+++ b/arch/x86/include/asm/kvm_host.h
@@ -1543,6 +1543,7 @@ struct kvm_x86_ops {
bool (*has_emulated_msr)(struct kvm *kvm, u32 index);
void (*vcpu_after_set_cpuid)(struct kvm_vcpu *vcpu);

+ bool (*is_vm_type_supported)(unsigned long vm_type);
unsigned int vm_size;
int (*vm_init)(struct kvm *kvm);
void (*vm_destroy)(struct kvm *kvm);
diff --git a/arch/x86/include/uapi/asm/kvm.h b/arch/x86/include/uapi/asm/kvm.h
index a448d0964fc0..aa7a56a47564 100644
--- a/arch/x86/include/uapi/asm/kvm.h
+++ b/arch/x86/include/uapi/asm/kvm.h
@@ -564,5 +564,7 @@ struct kvm_pmu_event_filter {

#define KVM_X86_DEFAULT_VM 0
#define KVM_X86_SW_PROTECTED_VM 1
+#define KVM_X86_TDX_VM 2
+#define KVM_X86_SNP_VM 3

#endif /* _ASM_X86_KVM_H */
diff --git a/arch/x86/kvm/svm/svm.c b/arch/x86/kvm/svm/svm.c
index d381ad424554..d681dd7ad397 100644
--- a/arch/x86/kvm/svm/svm.c
+++ b/arch/x86/kvm/svm/svm.c
@@ -4768,6 +4768,12 @@ static void svm_vm_destroy(struct kvm *kvm)
sev_vm_destroy(kvm);
}

+static bool svm_is_vm_type_supported(unsigned long type)
+{
+ /* FIXME: Check if CPU is capable of SEV-SNP. */
+ return __kvm_is_vm_type_supported(type);
+}
+
static int svm_vm_init(struct kvm *kvm)
{
if (!pause_filter_count || !pause_filter_thresh)
@@ -4796,6 +4802,7 @@ static struct kvm_x86_ops svm_x86_ops __initdata = {
.vcpu_free = svm_vcpu_free,
.vcpu_reset = svm_vcpu_reset,

+ .is_vm_type_supported = svm_is_vm_type_supported,
.vm_size = sizeof(struct kvm_svm),
.vm_init = svm_vm_init,
.vm_destroy = svm_vm_destroy,
diff --git a/arch/x86/kvm/vmx/vmx.c b/arch/x86/kvm/vmx/vmx.c
index 946380b53cf5..693f07b80966 100644
--- a/arch/x86/kvm/vmx/vmx.c
+++ b/arch/x86/kvm/vmx/vmx.c
@@ -7511,6 +7511,12 @@ static int vmx_vcpu_create(struct kvm_vcpu *vcpu)
return err;
}

+static bool vmx_is_vm_type_supported(unsigned long type)
+{
+ /* TODO: Check if TDX is supported. */
+ return __kvm_is_vm_type_supported(type);
+}
+
#define L1TF_MSG_SMT "L1TF CPU bug present and SMT on, data leak possible. See CVE-2018-3646 and https://www.kernel.org/doc/html/latest/admin-guide/hw-vuln/l1tf.html for details.\n"
#define L1TF_MSG_L1D "L1TF CPU bug present and virtualization mitigation disabled, data leak possible. See CVE-2018-3646 and https://www.kernel.org/doc/html/latest/admin-guide/hw-vuln/l1tf.html for details.\n"

@@ -8180,6 +8186,7 @@ static struct kvm_x86_ops vmx_x86_ops __initdata = {
.hardware_disable = vmx_hardware_disable,
.has_emulated_msr = vmx_has_emulated_msr,

+ .is_vm_type_supported = vmx_is_vm_type_supported,
.vm_size = sizeof(struct kvm_vmx),
.vm_init = vmx_vm_init,
.vm_destroy = vmx_vm_destroy,
diff --git a/arch/x86/kvm/x86.c b/arch/x86/kvm/x86.c
index de195ad83ec0..fd6c05d1883c 100644
--- a/arch/x86/kvm/x86.c
+++ b/arch/x86/kvm/x86.c
@@ -4427,12 +4427,18 @@ static int kvm_ioctl_get_supported_hv_cpuid(struct kvm_vcpu *vcpu,
return 0;
}

-static bool kvm_is_vm_type_supported(unsigned long type)
+bool __kvm_is_vm_type_supported(unsigned long type)
{
return type == KVM_X86_DEFAULT_VM ||
(type == KVM_X86_SW_PROTECTED_VM &&
IS_ENABLED(CONFIG_KVM_SW_PROTECTED_VM) && tdp_enabled);
}
+EXPORT_SYMBOL_GPL(__kvm_is_vm_type_supported);
+
+static bool kvm_is_vm_type_supported(unsigned long type)
+{
+ return static_call(kvm_x86_is_vm_type_supported)(type);
+}

int kvm_vm_ioctl_check_extension(struct kvm *kvm, long ext)
{
@@ -4628,6 +4634,10 @@ int kvm_vm_ioctl_check_extension(struct kvm *kvm, long ext)
r = BIT(KVM_X86_DEFAULT_VM);
if (kvm_is_vm_type_supported(KVM_X86_SW_PROTECTED_VM))
r |= BIT(KVM_X86_SW_PROTECTED_VM);
+ if (kvm_is_vm_type_supported(KVM_X86_TDX_VM))
+ r |= BIT(KVM_X86_TDX_VM);
+ if (kvm_is_vm_type_supported(KVM_X86_SNP_VM))
+ r |= BIT(KVM_X86_SNP_VM);
break;
default:
break;
diff --git a/arch/x86/kvm/x86.h b/arch/x86/kvm/x86.h
index 82e3dafc5453..7de3a45f655a 100644
--- a/arch/x86/kvm/x86.h
+++ b/arch/x86/kvm/x86.h
@@ -9,6 +9,8 @@
#include "kvm_cache_regs.h"
#include "kvm_emulate.h"

+bool __kvm_is_vm_type_supported(unsigned long type);
+
struct kvm_caps {
/* control of guest tsc rate supported? */
bool has_tsc_control;
--
2.25.1


2023-07-20 23:51:03

by Isaku Yamahata

Subject: [RFC PATCH v4 07/10] KVM: x86: Add gmem hook for initializing private memory

From: Michael Roth <[email protected]>

All gmem pages are expected to be 'private' as defined by a particular
arch/platform. Platforms like SEV-SNP require additional operations to
move these pages into a private state, so implement a hook that can be
used to prepare this memory prior to mapping it into a guest.

In the case of SEV-SNP, whether or not a 2MB page can be mapped via a
2MB mapping in the guest's nested page table depends on whether or not
any subpages within the range have already been initialized as private
in the RMP table, so this hook will also be used by the KVM MMU to clamp
the maximum mapping size accordingly.
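
As a rough sketch (not part of this patch), an SEV-SNP implementation of the
hook might look something like the following; rmp_make_private() and
sev_get_asid() are assumed from the SNP host series, and the
already-assigned/error paths are elided:

static int snp_gmem_prepare(struct kvm *kvm, struct kvm_memory_slot *slot,
			    kvm_pfn_t pfn, gfn_t gfn, u8 *max_level)
{
	int level = min_t(int, *max_level, PG_LEVEL_2M);
	int rc;

	/* Assign the page to the guest's ASID in the RMP table. */
	rc = rmp_make_private(pfn, gfn_to_gpa(gfn), level, sev_get_asid(kvm), false);
	if (rc)
		return rc;

	/* Don't let the NPT map at a larger level than the RMP assignment. */
	*max_level = level;
	return 0;
}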

Signed-off-by: Michael Roth <[email protected]>
Link: https://lore.kernel.org/r/[email protected]

---
Changes v2 -> v3:
- Newly added
---
arch/x86/include/asm/kvm-x86-ops.h | 1 +
arch/x86/include/asm/kvm_host.h | 3 +++
arch/x86/kvm/mmu/mmu.c | 12 ++++++++++--
3 files changed, 14 insertions(+), 2 deletions(-)

diff --git a/arch/x86/include/asm/kvm-x86-ops.h b/arch/x86/include/asm/kvm-x86-ops.h
index c0143906fe6d..a4cb248519cf 100644
--- a/arch/x86/include/asm/kvm-x86-ops.h
+++ b/arch/x86/include/asm/kvm-x86-ops.h
@@ -134,6 +134,7 @@ KVM_X86_OP(msr_filter_changed)
KVM_X86_OP(complete_emulated_msr)
KVM_X86_OP(vcpu_deliver_sipi_vector)
KVM_X86_OP_OPTIONAL_RET0(vcpu_get_apicv_inhibit_reasons);
+KVM_X86_OP_OPTIONAL_RET0(gmem_prepare)

#undef KVM_X86_OP
#undef KVM_X86_OP_OPTIONAL
diff --git a/arch/x86/include/asm/kvm_host.h b/arch/x86/include/asm/kvm_host.h
index e4f2938bb1fc..de7f0dffa135 100644
--- a/arch/x86/include/asm/kvm_host.h
+++ b/arch/x86/include/asm/kvm_host.h
@@ -1735,6 +1735,9 @@ struct kvm_x86_ops {
* Returns vCPU specific APICv inhibit reasons
*/
unsigned long (*vcpu_get_apicv_inhibit_reasons)(struct kvm_vcpu *vcpu);
+
+ int (*gmem_prepare)(struct kvm *kvm, struct kvm_memory_slot *slot,
+ kvm_pfn_t pfn, gfn_t gfn, u8 *max_level);
};

struct kvm_x86_nested_ops {
diff --git a/arch/x86/kvm/mmu/mmu.c b/arch/x86/kvm/mmu/mmu.c
index a73ddb43a2cf..35bb14363828 100644
--- a/arch/x86/kvm/mmu/mmu.c
+++ b/arch/x86/kvm/mmu/mmu.c
@@ -4352,6 +4352,7 @@ static int kvm_faultin_pfn_private(struct kvm_vcpu *vcpu,
struct kvm_page_fault *fault)
{
int max_order, r;
+ u8 max_level;

if (!kvm_slot_can_be_private(fault->slot))
return kvm_do_memory_fault_exit(vcpu, fault);
@@ -4361,8 +4362,15 @@ static int kvm_faultin_pfn_private(struct kvm_vcpu *vcpu,
if (r)
return r;

- fault->max_level = min(kvm_max_level_for_order(max_order),
- fault->max_level);
+ max_level = kvm_max_level_for_order(max_order);
+ r = static_call(kvm_x86_gmem_prepare)(vcpu->kvm, fault->slot, fault->pfn,
+ fault->gfn, &max_level);
+ if (r) {
+ kvm_release_pfn_clean(fault->pfn);
+ return r;
+ }
+
+ fault->max_level = min(max_level, fault->max_level);
fault->map_writable = !(fault->slot->flags & KVM_MEM_READONLY);
return RET_PF_CONTINUE;
}
--
2.25.1


2023-07-20 23:57:51

by Isaku Yamahata

Subject: [RFC PATCH v4 04/10] KVM: x86: Introduce PFERR_GUEST_ENC_MASK to indicate fault is private

From: Isaku Yamahata <[email protected]>

Add the PFERR_GUEST_ENC_MASK PFERR code to designate that the page fault is
private and that it requires looking up memory attributes. The vendor KVM
page fault handler should set the PFERR_GUEST_ENC_MASK bit based on its
fault information; it may use the hardware value directly or parse the
hardware value to decide whether to set the bit.

For KVM_X86_PROTECTED_VM, consult the memory attributes to determine whether
the fault is private.
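
As an illustration of the expected usage (not part of this patch), a
TDX-style EPT violation handler might synthesize the bit roughly as follows;
tdx_is_private_gpa() is a placeholder for the vendor-specific fault decoding:

static int tdx_handle_ept_violation(struct kvm_vcpu *vcpu, gpa_t gpa,
				    u64 exit_qual)
{
	/* In practice the access bits would be derived from exit_qual. */
	u64 error_code = PFERR_WRITE_MASK;

	if (tdx_is_private_gpa(vcpu->kvm, gpa))
		error_code |= PFERR_GUEST_ENC_MASK;

	return kvm_mmu_page_fault(vcpu, gpa, error_code, NULL, 0);
}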

Signed-off-by: Isaku Yamahata <[email protected]>

---
Changes v3 -> v4:
- rename back struct kvm_page_fault::private => is_private
- catch up rename: KVM_X86_PROTECTED_VM => KVM_X86_SW_PROTECTED_VM

Changes v2 -> v3:
- Revive PFERR_GUEST_ENC_MASK
- rename struct kvm_page_fault::is_private => private
- Add check KVM_X86_PROTECTED_VM

Changes v1 -> v2:
- Introduced fault type and replaced is_private with fault_type.
- Add kvm_get_fault_type() to encapsulate the difference.
---
arch/x86/include/asm/kvm_host.h | 2 ++
arch/x86/kvm/mmu/mmu.c | 8 ++++++--
arch/x86/kvm/mmu/mmu_internal.h | 14 +++++++++++++-
3 files changed, 21 insertions(+), 3 deletions(-)

diff --git a/arch/x86/include/asm/kvm_host.h b/arch/x86/include/asm/kvm_host.h
index 2c9350aa0da4..ab7d080bf544 100644
--- a/arch/x86/include/asm/kvm_host.h
+++ b/arch/x86/include/asm/kvm_host.h
@@ -255,6 +255,7 @@ enum x86_intercept_stage;
#define PFERR_SGX_BIT 15
#define PFERR_GUEST_FINAL_BIT 32
#define PFERR_GUEST_PAGE_BIT 33
+#define PFERR_GUEST_ENC_BIT 34
#define PFERR_IMPLICIT_ACCESS_BIT 48

#define PFERR_PRESENT_MASK BIT(PFERR_PRESENT_BIT)
@@ -266,6 +267,7 @@ enum x86_intercept_stage;
#define PFERR_SGX_MASK BIT(PFERR_SGX_BIT)
#define PFERR_GUEST_FINAL_MASK BIT_ULL(PFERR_GUEST_FINAL_BIT)
#define PFERR_GUEST_PAGE_MASK BIT_ULL(PFERR_GUEST_PAGE_BIT)
+#define PFERR_GUEST_ENC_MASK BIT_ULL(PFERR_GUEST_ENC_BIT)
#define PFERR_IMPLICIT_ACCESS BIT_ULL(PFERR_IMPLICIT_ACCESS_BIT)

#define PFERR_NESTED_GUEST_PAGE (PFERR_GUEST_PAGE_MASK | \
diff --git a/arch/x86/kvm/mmu/mmu.c b/arch/x86/kvm/mmu/mmu.c
index a2fe091e327a..d2ebe26fb822 100644
--- a/arch/x86/kvm/mmu/mmu.c
+++ b/arch/x86/kvm/mmu/mmu.c
@@ -4399,8 +4399,12 @@ static int __kvm_faultin_pfn(struct kvm_vcpu *vcpu, struct kvm_page_fault *fault
return RET_PF_EMULATE;
}

- if (fault->is_private != kvm_mem_is_private(vcpu->kvm, fault->gfn))
- return kvm_do_memory_fault_exit(vcpu, fault);
+ if (fault->is_private != kvm_mem_is_private(vcpu->kvm, fault->gfn)) {
+ if (vcpu->kvm->arch.vm_type == KVM_X86_SW_PROTECTED_VM)
+ return RET_PF_RETRY;
+ else
+ return kvm_do_memory_fault_exit(vcpu, fault);
+ }

if (fault->is_private)
return kvm_faultin_pfn_private(vcpu, fault);
diff --git a/arch/x86/kvm/mmu/mmu_internal.h b/arch/x86/kvm/mmu/mmu_internal.h
index 7f9ec1e5b136..4f8f83546c37 100644
--- a/arch/x86/kvm/mmu/mmu_internal.h
+++ b/arch/x86/kvm/mmu/mmu_internal.h
@@ -282,6 +282,18 @@ enum {
RET_PF_SPURIOUS,
};

+static inline bool kvm_is_fault_private(struct kvm *kvm, gpa_t gpa, u64 error_code)
+{
+ /*
+ * This is racy with mmu_seq. If we hit a race, it would result in a
+ * spurious KVM_EXIT_MEMORY_FAULT.
+ */
+ if (kvm->arch.vm_type == KVM_X86_SW_PROTECTED_VM)
+ return kvm_mem_is_private(kvm, gpa_to_gfn(gpa));
+
+ return error_code & PFERR_GUEST_ENC_MASK;
+}
+
static inline int kvm_mmu_do_page_fault(struct kvm_vcpu *vcpu, gpa_t cr2_or_gpa,
u64 err, bool prefetch, int *emulation_type)
{
@@ -295,13 +307,13 @@ static inline int kvm_mmu_do_page_fault(struct kvm_vcpu *vcpu, gpa_t cr2_or_gpa,
.user = err & PFERR_USER_MASK,
.prefetch = prefetch,
.is_tdp = likely(vcpu->arch.mmu->page_fault == kvm_tdp_page_fault),
+ .is_private = kvm_is_fault_private(vcpu->kvm, cr2_or_gpa, err),
.nx_huge_page_workaround_enabled =
is_nx_huge_page_enabled(vcpu->kvm),

.max_level = KVM_MAX_HUGEPAGE_LEVEL,
.req_level = PG_LEVEL_4K,
.goal_level = PG_LEVEL_4K,
- .is_private = kvm_mem_is_private(vcpu->kvm, cr2_or_gpa >> PAGE_SHIFT),
};
int r;

--
2.25.1


2023-07-21 00:01:19

by Isaku Yamahata

Subject: [RFC PATCH v4 03/10] KVM: x86/mmu: Pass around full 64-bit error code for the KVM page fault

From: Isaku Yamahata <[email protected]>

Because the full 64-bit error code, i.e. more info about the fault, will be
needed for the protected VMs, TDX and SEV-SNP, update kvm_mmu_do_page_fault()
to accept the 64-bit value so it can pass it on to the callbacks.

The upper 32 bits of the error code are currently discarded at
kvm_mmu_page_fault() by lower_32_bits(); now the full 64 bits are passed
down. In the upper bits, two hardware-defined bits, PFERR_GUEST_FINAL_MASK
and PFERR_GUEST_PAGE_MASK, and one software-defined bit,
PFERR_IMPLICIT_ACCESS, are defined.

PFERR_IMPLICIT_ACCESS:
commit 4f4aa80e3b88 ("KVM: X86: Handle implicit supervisor access with SMAP")
introduced the software-defined bit PFERR_IMPLICIT_ACCESS at bit 48 to
indicate an implicit access for SMAP by the instruction emulator.
Concretely, emulator_read_std() and emulator_write_std() set the bit, and
permission_fault() checks the bit as an SMAP implicit access. The vendor
page fault handler shouldn't pass the bit to kvm_mmu_page_fault().

PFERR_GUEST_FINAL_MASK and PFERR_GUEST_PAGE_MASK:
commit 147277540bbc ("kvm: svm: Add support for additional SVM NPF error codes")
introduced them to optimize the nested page fault handling. No other code
paths use the bits, so those two bits can be safely passed down without
functional change.

The accesses of fault->error_code are as follows
- FNAME(page_fault): PFERR_IMPLICIT_ACCESS shouldn't be passed down.
PFERR_GUEST_FINAL_MASK and PFERR_GUEST_PAGE_MASK
aren't used.
- kvm_mmu_page_fault(): explicitly masks with PFERR_RSVD_MASK, and
PFERR_NESTED_GUEST_PAGE is checked outside of the masking of
the upper 32 bits.
- mmutrace: change u32 -> u64
- pgprintk(): change %x -> %llx

No functional change is intended. This is a preparation to pass more
information along with the page fault error code.

Signed-off-by: Isaku Yamahata <[email protected]>

---
Changes v2 -> v3:
- Make depends on a patch to clear PFERR_IMPLICIT_ACCESS
- drop clearing the upper 32 bit, instead just pass whole 64 bits
- update commit message to mention about PFERR_IMPLICIT_ACCESS and
PFERR_NESTED_GUEST_PAGE

Changes v1 -> v2:
- no change
---
arch/x86/kvm/mmu/mmu.c | 5 ++---
arch/x86/kvm/mmu/mmu_internal.h | 4 ++--
arch/x86/kvm/mmu/mmutrace.h | 2 +-
arch/x86/kvm/mmu/paging_tmpl.h | 2 +-
4 files changed, 6 insertions(+), 7 deletions(-)

diff --git a/arch/x86/kvm/mmu/mmu.c b/arch/x86/kvm/mmu/mmu.c
index a9bbc20c7dfd..a2fe091e327a 100644
--- a/arch/x86/kvm/mmu/mmu.c
+++ b/arch/x86/kvm/mmu/mmu.c
@@ -4523,7 +4523,7 @@ static int direct_page_fault(struct kvm_vcpu *vcpu, struct kvm_page_fault *fault
static int nonpaging_page_fault(struct kvm_vcpu *vcpu,
struct kvm_page_fault *fault)
{
- pgprintk("%s: gva %lx error %x\n", __func__, fault->addr, fault->error_code);
+ pgprintk("%s: gva %llx error %llx\n", __func__, fault->addr, fault->error_code);

/* This path builds a PAE pagetable, we can map 2mb pages at maximum. */
fault->max_level = PG_LEVEL_2M;
@@ -5844,8 +5844,7 @@ int noinline kvm_mmu_page_fault(struct kvm_vcpu *vcpu, gpa_t cr2_or_gpa, u64 err
}

if (r == RET_PF_INVALID) {
- r = kvm_mmu_do_page_fault(vcpu, cr2_or_gpa,
- lower_32_bits(error_code), false,
+ r = kvm_mmu_do_page_fault(vcpu, cr2_or_gpa, error_code, false,
&emulation_type);
if (KVM_BUG_ON(r == RET_PF_INVALID, vcpu->kvm))
return -EIO;
diff --git a/arch/x86/kvm/mmu/mmu_internal.h b/arch/x86/kvm/mmu/mmu_internal.h
index f1786698ae00..7f9ec1e5b136 100644
--- a/arch/x86/kvm/mmu/mmu_internal.h
+++ b/arch/x86/kvm/mmu/mmu_internal.h
@@ -191,7 +191,7 @@ static inline bool is_nx_huge_page_enabled(struct kvm *kvm)
struct kvm_page_fault {
/* arguments to kvm_mmu_do_page_fault. */
const gpa_t addr;
- const u32 error_code;
+ const u64 error_code;
const bool prefetch;

/* Derived from error_code. */
@@ -283,7 +283,7 @@ enum {
};

static inline int kvm_mmu_do_page_fault(struct kvm_vcpu *vcpu, gpa_t cr2_or_gpa,
- u32 err, bool prefetch, int *emulation_type)
+ u64 err, bool prefetch, int *emulation_type)
{
struct kvm_page_fault fault = {
.addr = cr2_or_gpa,
diff --git a/arch/x86/kvm/mmu/mmutrace.h b/arch/x86/kvm/mmu/mmutrace.h
index 2d7555381955..2e77883c92f6 100644
--- a/arch/x86/kvm/mmu/mmutrace.h
+++ b/arch/x86/kvm/mmu/mmutrace.h
@@ -261,7 +261,7 @@ TRACE_EVENT(
TP_STRUCT__entry(
__field(int, vcpu_id)
__field(gpa_t, cr2_or_gpa)
- __field(u32, error_code)
+ __field(u64, error_code)
__field(u64 *, sptep)
__field(u64, old_spte)
__field(u64, new_spte)
diff --git a/arch/x86/kvm/mmu/paging_tmpl.h b/arch/x86/kvm/mmu/paging_tmpl.h
index 0662e0278e70..42d48b1ec7b3 100644
--- a/arch/x86/kvm/mmu/paging_tmpl.h
+++ b/arch/x86/kvm/mmu/paging_tmpl.h
@@ -758,7 +758,7 @@ static int FNAME(page_fault)(struct kvm_vcpu *vcpu, struct kvm_page_fault *fault
struct guest_walker walker;
int r;

- pgprintk("%s: addr %lx err %x\n", __func__, fault->addr, fault->error_code);
+ pgprintk("%s: addr %llx err %llx\n", __func__, fault->addr, fault->error_code);
WARN_ON_ONCE(fault->is_tdp);

/*
--
2.25.1


2023-07-21 00:05:17

by Isaku Yamahata

Subject: [RFC PATCH v4 05/10] KVM: Add new members to struct kvm_gfn_range to operate on

From: Isaku Yamahata <[email protected]>

TDX needs to know which mapping to operate on: Shared-EPT vs. Secure-EPT.
The following sequence to convert a GPA to private doesn't work for TDX
because the page can already be private.

1) Update memory attributes to private in memory attributes xarray
2) Zap the GPA range irrespective of private-or-shared.
Even if the page is already private, zap the entry.
3) EPT violation on the GPA
4) Populate the GPA as private
The page is zeroed, and the guest has to accept the page again.

In step 2, TDX wants to zap only shared pages and skip private ones.

Add new members to strut kvm_gfn_range to indicate which mapping
(private-vs-shared) to operate on. only_private and only_shared. Update
mmu notifier, set memory attributes ioctl or KVM gmem callback to
initialize them.

- If operating on a file that backs shared pages, zap shared pages only.
This is the mmu notifier case.
(only_private, only_shared) = (false, true)

- If operating on a file that backs private pages, zap private pages only.
This is the KVM gmem case.
(only_private, only_shared) = (true, false)

- If setting memory attributes, the vendor callback checks the new
attributes and makes decisions.
SNP would do nothing and handle it later with the gmem callback.
The TDX callback would do as follows:
when converting pages to shared, zap private pages only;
when converting pages to private, zap shared pages only.
(only_private, only_shared) = (false, false)

- If operating on both backing files, zap both private and shared pages.
This is the case when destroying the guest.
(only_private, only_shared) = (true, true)
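
As a rough sketch of how a vendor callback might consume the new members
(the zap helpers below are placeholders, not code from this patch):

static bool vt_unmap_gfn_range(struct kvm *kvm, struct kvm_gfn_range *range)
{
	bool flush = false;

	/*
	 * (only_private, only_shared) == (false, false) is the "vendor
	 * decides" case; a real implementation would also consult the new
	 * attributes (range->arg) before zapping both halves.
	 */
	if (!range->only_shared)
		flush |= zap_private_mapping(kvm, range->start, range->end);
	if (!range->only_private)
		flush |= zap_shared_mapping(kvm, range->start, range->end);

	return flush;
}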

Suggested-by: Sean Christopherson <[email protected]>
Signed-off-by: Isaku Yamahata <[email protected]>

---
Changes v3 -> v4:
- rebased to v11 KVM gmem

Changes v2 -> v3:
- Drop the KVM_GFN_RANGE flags
- Updated struct kvm_gfn_range
- Change kvm_arch_set_memory_attributes() to return bool for flush
- Added set_memory_attributes x86 op for vendor backends
- Refined commit message to describe TDX care concretely

Changes v1 -> v2:
- consolidate KVM_GFN_RANGE_FLAGS_GMEM_{PUNCH_HOLE, RELEASE} into
KVM_GFN_RANGE_FLAGS_GMEM.
- Update the commit message to describe TDX more. Drop SEV_SNP.
---
include/linux/kvm_host.h | 2 ++
virt/kvm/guest_mem.c | 2 ++
virt/kvm/kvm_main.c | 4 ++++
3 files changed, 8 insertions(+)

diff --git a/include/linux/kvm_host.h b/include/linux/kvm_host.h
index 091bc89ae805..ce4d91585368 100644
--- a/include/linux/kvm_host.h
+++ b/include/linux/kvm_host.h
@@ -268,6 +268,8 @@ struct kvm_gfn_range {
u64 raw;
} arg;
bool may_block;
+ bool only_private;
+ bool only_shared;
};
bool kvm_unmap_gfn_range(struct kvm *kvm, struct kvm_gfn_range *range);
bool kvm_age_gfn(struct kvm *kvm, struct kvm_gfn_range *range);
diff --git a/virt/kvm/guest_mem.c b/virt/kvm/guest_mem.c
index 384671a55b41..ac185c776cda 100644
--- a/virt/kvm/guest_mem.c
+++ b/virt/kvm/guest_mem.c
@@ -105,6 +105,8 @@ static void kvm_gmem_invalidate_begin(struct kvm_gmem *gmem, pgoff_t start,
.end = slot->base_gfn + min(pgoff + slot->npages, end) - pgoff,
.slot = slot,
.may_block = true,
+ .only_private = true,
+ .only_shared = false,
};

flush |= kvm_mmu_unmap_gfn_range(kvm, &gfn_range);
diff --git a/virt/kvm/kvm_main.c b/virt/kvm/kvm_main.c
index ee331cf8ba54..4e2a2463ab19 100644
--- a/virt/kvm/kvm_main.c
+++ b/virt/kvm/kvm_main.c
@@ -603,6 +603,8 @@ static __always_inline int __kvm_handle_hva_range(struct kvm *kvm,
*/
gfn_range.arg.raw = range->arg.raw;
gfn_range.may_block = range->may_block;
+ gfn_range.only_private = false;
+ gfn_range.only_shared = true;

/*
* {gfn(page) | page intersects with [hva_start, hva_end)} =
@@ -2405,6 +2407,8 @@ static __always_inline void kvm_handle_gfn_range(struct kvm *kvm,

gfn_range.arg.raw = range->arg.raw;
gfn_range.may_block = range->may_block;
+ gfn_range.only_private = false;
+ gfn_range.only_shared = false;

for (i = 0; i < kvm_arch_nr_memslot_as_ids(kvm); i++) {
slots = __kvm_memslots(kvm, i);
--
2.25.1


2023-07-21 00:05:17

by Isaku Yamahata

Subject: [RFC PATCH v4 02/10] KVM: x86/mmu: Guard against collision with KVM-defined PFERR_IMPLICIT_ACCESS

From: Sean Christopherson <[email protected]>

Add an assertion in kvm_mmu_page_fault() to ensure the error code provided
by hardware doesn't conflict with KVM's software-defined IMPLICIT_ACCESS
flag. In the unlikely scenario that future hardware starts using bit 48
for a hardware-defined flag, preserving the bit could result in KVM
incorrectly interpreting the unknown flag as KVM's IMPLICIT_ACCESS flag.

WARN so that any such conflict can be surfaced to KVM developers and
resolved, but otherwise ignore the bit as KVM can't possibly rely on a
flag it knows nothing about.

Fixes: 4f4aa80e3b88 ("KVM: X86: Handle implicit supervisor access with SMAP")
Signed-off-by: Sean Christopherson <[email protected]>
---
arch/x86/kvm/mmu/mmu.c | 11 +++++++++++
1 file changed, 11 insertions(+)

diff --git a/arch/x86/kvm/mmu/mmu.c b/arch/x86/kvm/mmu/mmu.c
index 05943ccb55a4..a9bbc20c7dfd 100644
--- a/arch/x86/kvm/mmu/mmu.c
+++ b/arch/x86/kvm/mmu/mmu.c
@@ -5822,6 +5822,17 @@ int noinline kvm_mmu_page_fault(struct kvm_vcpu *vcpu, gpa_t cr2_or_gpa, u64 err
int r, emulation_type = EMULTYPE_PF;
bool direct = vcpu->arch.mmu->root_role.direct;

+ /*
+ * IMPLICIT_ACCESS is a KVM-defined flag used to correctly perform SMAP
+ * checks when emulating instructions that triggers implicit access.
+ * WARN if hardware generates a fault with an error code that collides
+ * with the KVM-defined value. Clear the flag and continue on, i.e.
+ * don't terminate the VM, as KVM can't possibly be relying on a flag
+ * that KVM doesn't know about.
+ */
+ if (WARN_ON_ONCE(error_code & PFERR_IMPLICIT_ACCESS))
+ error_code &= ~PFERR_IMPLICIT_ACCESS;
+
if (WARN_ON(!VALID_PAGE(vcpu->arch.mmu->root.hpa)))
return RET_PF_RETRY;

--
2.25.1


2023-07-21 00:06:18

by Isaku Yamahata

Subject: [RFC PATCH v4 08/10] KVM: x86: Add gmem hook for invalidating private memory

From: Michael Roth <[email protected]>

TODO: add a CONFIG option that can be used to completely skip the arch
invalidation loop and avoid __weak references for arches/platforms that
don't need an additional invalidation hook.

In some cases, like with SEV-SNP, guest memory needs to be updated in a
platform-specific manner before it can be safely freed back to the host.
Add hooks to wire up handling of this sort when freeing memory in
response to FALLOC_FL_PUNCH_HOLE operations.

Also issue invalidations of all allocated pages when releasing the gmem
file so that the pages are not left in an unusable state when they get
freed back to the host.
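
As a rough sketch (not part of this patch), an SEV-SNP backend for the new
hook might look like the following; snp_rmp_make_shared() is a placeholder
for the RMP reclaim helper from the SNP host series:

static void sev_gmem_invalidate(struct kvm *kvm, kvm_pfn_t start, kvm_pfn_t end)
{
	kvm_pfn_t pfn;

	for (pfn = start; pfn < end; pfn++) {
		/* Return the page to a hypervisor-owned state before it is freed. */
		if (snp_rmp_make_shared(pfn, PG_LEVEL_4K))
			pr_warn_ratelimited("Failed to reclaim RMP entry for pfn 0x%llx\n",
					    pfn);
	}
}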

Signed-off-by: Michael Roth <[email protected]>
Link: https://lore.kernel.org/r/[email protected]

---
Changes v2 -> v3:
- Newly added
---
arch/x86/include/asm/kvm-x86-ops.h | 1 +
arch/x86/include/asm/kvm_host.h | 1 +
arch/x86/kvm/x86.c | 6 +++++
include/linux/kvm_host.h | 3 +++
virt/kvm/guest_mem.c | 42 ++++++++++++++++++++++++++++++
5 files changed, 53 insertions(+)

diff --git a/arch/x86/include/asm/kvm-x86-ops.h b/arch/x86/include/asm/kvm-x86-ops.h
index a4cb248519cf..d520c6370cd6 100644
--- a/arch/x86/include/asm/kvm-x86-ops.h
+++ b/arch/x86/include/asm/kvm-x86-ops.h
@@ -135,6 +135,7 @@ KVM_X86_OP(complete_emulated_msr)
KVM_X86_OP(vcpu_deliver_sipi_vector)
KVM_X86_OP_OPTIONAL_RET0(vcpu_get_apicv_inhibit_reasons);
KVM_X86_OP_OPTIONAL_RET0(gmem_prepare)
+KVM_X86_OP_OPTIONAL(gmem_invalidate)

#undef KVM_X86_OP
#undef KVM_X86_OP_OPTIONAL
diff --git a/arch/x86/include/asm/kvm_host.h b/arch/x86/include/asm/kvm_host.h
index de7f0dffa135..440a4a13a93f 100644
--- a/arch/x86/include/asm/kvm_host.h
+++ b/arch/x86/include/asm/kvm_host.h
@@ -1738,6 +1738,7 @@ struct kvm_x86_ops {

int (*gmem_prepare)(struct kvm *kvm, struct kvm_memory_slot *slot,
kvm_pfn_t pfn, gfn_t gfn, u8 *max_level);
+ void (*gmem_invalidate)(struct kvm *kvm, kvm_pfn_t start, kvm_pfn_t end);
};

struct kvm_x86_nested_ops {
diff --git a/arch/x86/kvm/x86.c b/arch/x86/kvm/x86.c
index fd6c05d1883c..2ae40fa8e178 100644
--- a/arch/x86/kvm/x86.c
+++ b/arch/x86/kvm/x86.c
@@ -13284,6 +13284,12 @@ bool kvm_arch_no_poll(struct kvm_vcpu *vcpu)
}
EXPORT_SYMBOL_GPL(kvm_arch_no_poll);

+#ifdef CONFIG_KVM_PRIVATE_MEM
+void kvm_arch_gmem_invalidate(struct kvm *kvm, kvm_pfn_t start, kvm_pfn_t end)
+{
+ static_call_cond(kvm_x86_gmem_invalidate)(kvm, start, end);
+}
+#endif

int kvm_spec_ctrl_test_value(u64 value)
{
diff --git a/include/linux/kvm_host.h b/include/linux/kvm_host.h
index ce4d91585368..6c5d39e429e9 100644
--- a/include/linux/kvm_host.h
+++ b/include/linux/kvm_host.h
@@ -2360,6 +2360,7 @@ static inline bool kvm_mem_is_private(struct kvm *kvm, gfn_t gfn)
#ifdef CONFIG_KVM_PRIVATE_MEM
int kvm_gmem_get_pfn(struct kvm *kvm, struct kvm_memory_slot *slot,
gfn_t gfn, kvm_pfn_t *pfn, int *max_order);
+void kvm_arch_gmem_invalidate(struct kvm *kvm, kvm_pfn_t start, kvm_pfn_t end);
#else
static inline int kvm_gmem_get_pfn(struct kvm *kvm,
struct kvm_memory_slot *slot, gfn_t gfn,
@@ -2368,6 +2369,8 @@ static inline int kvm_gmem_get_pfn(struct kvm *kvm,
KVM_BUG_ON(1, kvm);
return -EIO;
}
+
+static inline void kvm_arch_gmem_invalidate(struct kvm *kvm, kvm_pfn_t start, kvm_pfn_t end) { }
#endif /* CONFIG_KVM_PRIVATE_MEM */

#endif
diff --git a/virt/kvm/guest_mem.c b/virt/kvm/guest_mem.c
index ac185c776cda..a14eaac9dbad 100644
--- a/virt/kvm/guest_mem.c
+++ b/virt/kvm/guest_mem.c
@@ -129,6 +129,46 @@ static void kvm_gmem_invalidate_end(struct kvm_gmem *gmem, pgoff_t start,
KVM_MMU_UNLOCK(kvm);
}

+void __weak kvm_arch_gmem_invalidate(struct kvm *kvm, kvm_pfn_t start, kvm_pfn_t end)
+{
+}
+
+/* Handle arch-specific hooks needed before releasing guarded pages. */
+static void kvm_gmem_issue_arch_invalidate(struct kvm *kvm, struct inode *inode,
+ pgoff_t start, pgoff_t end)
+{
+ pgoff_t file_end = i_size_read(inode) >> PAGE_SHIFT;
+ pgoff_t index = start;
+
+ end = min(end, file_end);
+
+ while (index < end) {
+ struct folio *folio;
+ unsigned int order;
+ struct page *page;
+ kvm_pfn_t pfn;
+
+ folio = __filemap_get_folio(inode->i_mapping, index,
+ FGP_LOCK, 0);
+ if (!folio) {
+ index++;
+ continue;
+ }
+
+ page = folio_file_page(folio, index);
+ pfn = page_to_pfn(page);
+ order = folio_order(folio);
+
+ kvm_arch_gmem_invalidate(kvm, pfn, pfn + min((1ul << order), end - index));
+
+ index = folio_next_index(folio);
+ folio_unlock(folio);
+ folio_put(folio);
+
+ cond_resched();
+ }
+}
+
static long kvm_gmem_punch_hole(struct inode *inode, loff_t offset, loff_t len)
{
struct list_head *gmem_list = &inode->i_mapping->private_list;
@@ -145,6 +185,7 @@ static long kvm_gmem_punch_hole(struct inode *inode, loff_t offset, loff_t len)
list_for_each_entry(gmem, gmem_list, entry)
kvm_gmem_invalidate_begin(gmem, start, end);

+ kvm_gmem_issue_arch_invalidate(gmem->kvm, inode, start, end);
truncate_inode_pages_range(inode->i_mapping, offset, offset + len - 1);

list_for_each_entry(gmem, gmem_list, entry)
@@ -255,6 +296,7 @@ static int kvm_gmem_release(struct inode *inode, struct file *file)
* memory, as its lifetime is associated with the inode, not the file.
*/
kvm_gmem_invalidate_begin(gmem, 0, -1ul);
+ kvm_gmem_issue_arch_invalidate(gmem->kvm, inode, 0, -1ul);
kvm_gmem_invalidate_end(gmem, 0, -1ul);

mutex_unlock(&kvm->slots_lock);
--
2.25.1


2023-07-21 00:26:08

by Isaku Yamahata

Subject: [RFC PATCH v4 10/10] KVM: X86: KVM_MEM_ENC_OP check if unused field (flags, error) is zero

From: Isaku Yamahata <[email protected]>

Check that the unused fields (flags and error/error64) are zero. Note that
this breaks the uABI, as the current code doesn't check the padding and
sev_fd fields when they are unused, so existing callers may be passing
non-zero garbage there.
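
For callers, the practical upshot (illustration only, assuming the common
struct from the previous patch and the includes from its example) is that
unused fields now must be zero, e.g. via a designated initializer:

static int sev_launch_finish(int vm_fd)
{
	/* flags and error/error64 are left zero; non-zero values now fail. */
	struct kvm_mem_enc_cmd cmd = {
		.id = KVM_SEV_LAUNCH_FINISH,
	};

	return ioctl(vm_fd, KVM_MEMORY_ENCRYPT_OP, &cmd);
}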

Signed-off-by: Isaku Yamahata <[email protected]>
---
Changes v3 -> v4:
- newly added
---
arch/x86/kvm/x86.c | 16 ++++++++++++++++
1 file changed, 16 insertions(+)

diff --git a/arch/x86/kvm/x86.c b/arch/x86/kvm/x86.c
index ab36e8940f1b..1d6085af6a00 100644
--- a/arch/x86/kvm/x86.c
+++ b/arch/x86/kvm/x86.c
@@ -7055,6 +7055,22 @@ int kvm_arch_vm_ioctl(struct file *filp, unsigned int ioctl, unsigned long arg)
r = -EFAULT;
goto out;
}
+ /* No sub-command uses flags at the moment. */
+ if (cmd.flags) {
+ r = -EINVAL;
+ goto out;
+ }
+ if (cmd.id != KVM_SEV_LAUNCH_START &&
+ cmd.id != KVM_SEV_RECEIVE_START && cmd.error64) {
+ r = -EINVAL;
+ goto out;
+ }
+ if ((cmd.id == KVM_SEV_LAUNCH_START ||
+ cmd.id == KVM_SEV_RECEIVE_START) && cmd.error) {
+ r = -EINVAL;
+ goto out;
+ }
+
r = static_call(kvm_x86_mem_enc_ioctl)(kvm, &cmd);
if (copy_to_user(argp, &cmd, sizeof(cmd)))
r = -EFAULT;
--
2.25.1


2023-07-21 14:27:13

by Sean Christopherson

Subject: Re: [RFC PATCH v4 04/10] KVM: x86: Introduce PFERR_GUEST_ENC_MASK to indicate fault is private

s/Introduce/Use

This doesn't "introduce" anything, in the sense that it's an AMD-defined error
code flag. That matters because KVM *did* introduce/define PFERR_IMPLICIT_ACCESS.

On Thu, Jul 20, 2023, [email protected] wrote:
> From: Isaku Yamahata <[email protected]>
>
> Add two PFERR codes to designate that the page fault is private and that
> it requires looking up memory attributes. The vendor kvm page fault
> handler should set PFERR_GUEST_ENC_MASK bit based on their fault
> information. It may or may not use the hardware value directly or
> parse the hardware value to set the bit.
>
> For KVM_X86_PROTECTED_VM, ask memory attributes for the fault privateness.

...

> +static inline bool kvm_is_fault_private(struct kvm *kvm, gpa_t gpa, u64 error_code)
> +{
> + /*
> + * This is racy with mmu_seq. If we hit a race, it would result in a
> + * spurious KVM_EXIT_MEMORY_FAULT.
> + */
> + if (kvm->arch.vm_type == KVM_X86_SW_PROTECTED_VM)
> + return kvm_mem_is_private(kvm, gpa_to_gfn(gpa));

Please synthesize the error code flag for SW-protected VMs, same as TDX, e.g.

diff --git a/arch/x86/kvm/mmu/mmu.c b/arch/x86/kvm/mmu/mmu.c
index 20e289e872eb..de9e0a9c41e6 100644
--- a/arch/x86/kvm/mmu/mmu.c
+++ b/arch/x86/kvm/mmu/mmu.c
@@ -5751,6 +5751,10 @@ int noinline kvm_mmu_page_fault(struct kvm_vcpu *vcpu, gpa_t cr2_or_gpa, u64 err
if (WARN_ON(!VALID_PAGE(vcpu->arch.mmu->root.hpa)))
return RET_PF_RETRY;

+ if (vcpu->kvm->arch.vm_type == KVM_X86_SW_PROTECTED_VM &&
+ kvm_mem_is_private(vcpu->kvm, gpa_to_gfn(cr2_or_gpa)))
+ error_code |= PFERR_GUEST_ENC_MASK;
+
r = RET_PF_INVALID;
if (unlikely(error_code & PFERR_RSVD_MASK)) {
r = handle_mmio_page_fault(vcpu, cr2_or_gpa, direct);

Functionally it's the same, but I want all VM types to have the same source of
truth for private versus shared, and I really don't want kvm_is_fault_private()
to exist.

2023-07-21 14:51:47

by Sean Christopherson

Subject: Re: [RFC PATCH v4 07/10] KVM: x86: Add gmem hook for initializing private memory

On Thu, Jul 20, 2023, [email protected] wrote:
> diff --git a/arch/x86/kvm/mmu/mmu.c b/arch/x86/kvm/mmu/mmu.c
> index a73ddb43a2cf..35bb14363828 100644
> --- a/arch/x86/kvm/mmu/mmu.c
> +++ b/arch/x86/kvm/mmu/mmu.c
> @@ -4352,6 +4352,7 @@ static int kvm_faultin_pfn_private(struct kvm_vcpu *vcpu,
> struct kvm_page_fault *fault)
> {
> int max_order, r;
> + u8 max_level;
>
> if (!kvm_slot_can_be_private(fault->slot))
> return kvm_do_memory_fault_exit(vcpu, fault);
> @@ -4361,8 +4362,15 @@ static int kvm_faultin_pfn_private(struct kvm_vcpu *vcpu,
> if (r)
> return r;
>
> - fault->max_level = min(kvm_max_level_for_order(max_order),
> - fault->max_level);
> + max_level = kvm_max_level_for_order(max_order);
> + r = static_call(kvm_x86_gmem_prepare)(vcpu->kvm, fault->slot, fault->pfn,
> + fault->gfn, &max_level);

I think KVM should hook gmem itself, i.e. convert pages to private when they're
allocated, and then back to shared when they're freed. I don't like having
asymmetric APIs (convert on fault instead of allocate, but then convert back on
free instead of on zap), and hooking the page fault path also violates the approach
of tying the RMP/HKID to the physical allocation.

Practically speaking, hooking the fault path will result in undesirable behavior.
Just because KVM *maps* at 4KiB granularity doesn't mean the RMP must be assigned
at 4KiB granularity, e.g. if userspace chooses to *not* PUNCH_HOLE when the guest
shares a single 4KiB page in a 2MiB chunk. Dirty logging is another case where
the RMP can stay at 2MiB. Or as a very silly example, imagine userspace pulls a
stupid and covers a single 2MiB chunk of gmem with 512 memslots.

That likely means KVM will need an additional hook to clamp the max_level at the
RMP, but that shouldn't be a big deal, e.g. if we convert on allocation, then KVM
should be able to blindly do the conversion because it would be a kernel bug if
the page is already assigned to an ASID in the RMP, i.e. the additional hook
shouldn't incur an extra RMP lookup.

2023-07-21 15:06:33

by Sean Christopherson

Subject: Re: [RFC PATCH v4 09/10] KVM: x86: Make struct sev_cmd common for KVM_MEM_ENC_OP

On Thu, Jul 20, 2023, [email protected] wrote:
> diff --git a/arch/x86/include/uapi/asm/kvm.h b/arch/x86/include/uapi/asm/kvm.h
> index aa7a56a47564..32883e520b00 100644
> --- a/arch/x86/include/uapi/asm/kvm.h
> +++ b/arch/x86/include/uapi/asm/kvm.h
> @@ -562,6 +562,39 @@ struct kvm_pmu_event_filter {
> /* x86-specific KVM_EXIT_HYPERCALL flags. */
> #define KVM_EXIT_HYPERCALL_LONG_MODE BIT(0)
>
> +struct kvm_mem_enc_cmd {
> + /* sub-command id of KVM_MEM_ENC_OP. */
> + __u32 id;
> + /*
> + * Auxiliary flags for sub-command. If sub-command doesn't use it,
> + * set zero.
> + */
> + __u32 flags;
> + /*
> + * Data for sub-command. An immediate or a pointer to the actual
> + * data in process virtual address. If sub-command doesn't use it,
> + * set zero.
> + */
> + __u64 data;
> + /*
> + * Supplemental error code in the case of error.
> + * SEV error code from the PSP or TDX SEAMCALL status code.
> + * The caller should set zero.
> + */
> + union {
> + struct {
> + __u32 error;
> + /*
> + * KVM_SEV_LAUNCH_START and KVM_SEV_RECEIVE_START
> + * require extra data. Not included in struct
> + * kvm_sev_launch_start or struct kvm_sev_receive_start.
> + */
> + __u32 sev_fd;
> + };
> + __u64 error64;
> + };
> +};

Eww. Why not just use an entirely different struct for TDX? I don't see what
benefit this provides other than a warm fuzzy feeling that TDX and SEV share a
struct. Practically speaking, KVM will likely take on more work to forcefully
smush the two together than if they're separate things.

2023-07-21 19:47:30

by Isaku Yamahata

Subject: Re: [RFC PATCH v4 09/10] KVM: x86: Make struct sev_cmd common for KVM_MEM_ENC_OP

On Fri, Jul 21, 2023 at 07:51:04AM -0700,
Sean Christopherson <[email protected]> wrote:

> On Thu, Jul 20, 2023, [email protected] wrote:
> > diff --git a/arch/x86/include/uapi/asm/kvm.h b/arch/x86/include/uapi/asm/kvm.h
> > index aa7a56a47564..32883e520b00 100644
> > --- a/arch/x86/include/uapi/asm/kvm.h
> > +++ b/arch/x86/include/uapi/asm/kvm.h
> > @@ -562,6 +562,39 @@ struct kvm_pmu_event_filter {
> > /* x86-specific KVM_EXIT_HYPERCALL flags. */
> > #define KVM_EXIT_HYPERCALL_LONG_MODE BIT(0)
> >
> > +struct kvm_mem_enc_cmd {
> > + /* sub-command id of KVM_MEM_ENC_OP. */
> > + __u32 id;
> > + /*
> > + * Auxiliary flags for sub-command. If sub-command doesn't use it,
> > + * set zero.
> > + */
> > + __u32 flags;
> > + /*
> > + * Data for sub-command. An immediate or a pointer to the actual
> > + * data in process virtual address. If sub-command doesn't use it,
> > + * set zero.
> > + */
> > + __u64 data;
> > + /*
> > + * Supplemental error code in the case of error.
> > + * SEV error code from the PSP or TDX SEAMCALL status code.
> > + * The caller should set zero.
> > + */
> > + union {
> > + struct {
> > + __u32 error;
> > + /*
> > + * KVM_SEV_LAUNCH_START and KVM_SEV_RECEIVE_START
> > + * require extra data. Not included in struct
> > + * kvm_sev_launch_start or struct kvm_sev_receive_start.
> > + */
> > + __u32 sev_fd;
> > + };
> > + __u64 error64;
> > + };
> > +};
>
> Eww. Why not just use an entirely different struct for TDX? I don't see what
> benefit this provides other than a warm fuzzy feeling that TDX and SEV share a
> struct. Practically speaking, KVM will likely take on more work to forcefully
> smush the two together than if they're separate things.

Ok, let's drop this patch. Keep the ABI different for now.
--
Isaku Yamahata <[email protected]>

2023-07-22 00:42:00

by Michael Roth

Subject: Re: [RFC PATCH v4 07/10] KVM: x86: Add gmem hook for initializing private memory

On Fri, Jul 21, 2023 at 07:25:58AM -0700, Sean Christopherson wrote:
> On Thu, Jul 20, 2023, [email protected] wrote:
> > diff --git a/arch/x86/kvm/mmu/mmu.c b/arch/x86/kvm/mmu/mmu.c
> > index a73ddb43a2cf..35bb14363828 100644
> > --- a/arch/x86/kvm/mmu/mmu.c
> > +++ b/arch/x86/kvm/mmu/mmu.c
> > @@ -4352,6 +4352,7 @@ static int kvm_faultin_pfn_private(struct kvm_vcpu *vcpu,
> > struct kvm_page_fault *fault)
> > {
> > int max_order, r;
> > + u8 max_level;
> >
> > if (!kvm_slot_can_be_private(fault->slot))
> > return kvm_do_memory_fault_exit(vcpu, fault);
> > @@ -4361,8 +4362,15 @@ static int kvm_faultin_pfn_private(struct kvm_vcpu *vcpu,
> > if (r)
> > return r;
> >
> > - fault->max_level = min(kvm_max_level_for_order(max_order),
> > - fault->max_level);
> > + max_level = kvm_max_level_for_order(max_order);
> > + r = static_call(kvm_x86_gmem_prepare)(vcpu->kvm, fault->slot, fault->pfn,
> > + fault->gfn, &max_level);
>
> I think KVM should hook gmem itself, i.e. convert pages to private when they're
> allocated, and then back to shared when they're freed. I don't like having
> asymmetric APIs (convert on fault instead of allocate, but then convert back on
> free instead of on zap), and hooking the page fault path also violates the approach
> of tying the RMP/HKID to the physical allocation.

I'd originally added an arch hook in kvm_gmem_get_pfn() to handle
RMP/HKID when the folio is allocated:

https://github.com/mdroth/linux/commit/adf78d224126f31e9096b80be21619e1ba447304

Based on your response it seemed like there was a preference to either:

a) do the RMP/HKID update via KVM MMU prior to mapping into the guest
b) implement an ioctl() to let userspace pre-allocate/pre-prepare and
do the RMP updates in advance.

https://lore.kernel.org/lkml/[email protected]/

So this patch basically implements suggestion a), or at least my
understanding of it. Is the original patch that does it via
kvm_gmem_get_pfn() more in line with what you're suggesting here, or
were you still thinking of an ioctl? Or some other place in the code?
Keep in mind the GPA is needed to do the RMP updates so that prevents
us from hooking into the more obvious place like kvm_gmem_get_folio().

>
> Practically speaking, hooking the fault path will result in undesirable behavior.
> Just because KVM *maps* at 4KiB granularity doesn't mean the RMP must be assigned
> at 4KiB granularity, e.g. if userspace chooses to *not* PUNCH_HOLE when the guest
> shares a single 4KiB page in a 2MiB chunk. Dirty logging is another case where
> the RMP can stay at 2MiB. Or as a very silly example, imagine userspace pulls a
> stupid and covers a single 2MiB chunk of gmem with 512 memslots.

Unfortunately I don't think things are quite that flexible with SNP. If the
RMP entry is 2MB, and you map a sub-page as 4K in the NPT, you'll immediately
get a PFERR_GUEST_SIZEM on the first access (presumably when the guest tries
to PVALIDATE it before use). The RMP fault handler will then subsequently
need to PSMASH the 2MB entry into 4K before that guest can access it. So you
get an extra page fault for every 2MB page that's mapped this way.
(APM Volume 2, Section 15.36.10).

That might not be a big deal for guests that are at least somewhat optimized
to make use of 2MB pages, but another situation is:

- gmem allocates 2MB page
- guest issues PVALIDATE on 2MB page
- guest later converts a subpage to shared but doesn't holepunch
- SNP host code issues PSMASH to split 2MB RMP mapping to 4K
- KVM MMU splits NPT mapping to 4K
- guest converts that shared page back to private

At this point there are no mixed attributes, and KVM would normally allow
2MB NPT mappings again, but that is actually not possible here: the RMP
table entries are validated/4K and cannot be promoted by the hypervisor, so
the NPT mappings must still be limited to 4K to match. Otherwise we hit the
reverse of the PFERR_GUEST_SIZEM scenario above, where the NPT mapping level
is *larger* than the RMP entry level. Unfortunately that does not result in
a PFERR_GUEST_SIZEM where we can fix things up in response; instead it's a
general RMP fault that would be tricky to distinguish from an RMP fault
caused by an implicit page conversion or some other guest weirdness without
doing RMP table checks every time we get a general RMP fault.

So for all intents and purposes the mapping size and the RMP entry size end
up being intertwined and can't be fully decoupled, and if you don't take
that into account when updating the RMP entry, you'll have to do extra
PSMASHes in response to PFERR_GUEST_SIZEM RMP faults later.

>
> That likely means KVM will need an additional hook to clamp the max_level at the
> RMP, but that shouldn't be a big deal, e.g. if we convert on allocation, then KVM
> should be able to blindly do the conversion because it would be a kernel bug if
> the page is already assigned to an ASID in the RMP, i.e. the additional hook
> shouldn't incur an extra RMP lookup.

Yah we'd still need a hook in roughly this same place for clamping
max_level. Previous versions of SNP hypervisor patches all had a separate
hook for handling these cases, but since the work of updating the RMP table
prior to mapping isn't too dissimilar from the work of determining max
mapping size, I combined both of them into the kvm_x86_gmem_prepare()
hook/implementation.
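(For reference, a clamp-only hook could be as small as the sketch below.
snp_lookup_rmpentry() is a stand-in for whatever RMP-lookup helper the SNP
host patches provide; the name and signature here are assumptions, not final
code.)

/*
 * Sketch: clamp the NPT mapping level to the current RMP entry size so that
 * KVM never maps at a larger level than the RMP entry.
 */
static int sev_gmem_max_level(kvm_pfn_t pfn)
{
	bool assigned;
	int rmp_level;

	if (snp_lookup_rmpentry(pfn, &assigned, &rmp_level))
		return PG_LEVEL_4K;	/* be conservative on lookup failure */

	return assigned ? rmp_level : PG_LEVEL_4K;
}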

But I don't see any major issue with moving RMPUPDATE handling to an
allocation-time hook. As mentioned above we'd get additional
PFERR_GUEST_SIZEM faults by not taking MMU mapping level into account, but
I previously had it implemented similarly via a hook in kvm_gmem_get_pfn()
(because we need the GPA) and didn't notice anything major. But I'm not
sure exactly where you're suggesting we do it now, so I could use some
clarification on that if kvm_gmem_get_pfn() isn't what you had in mind.

WRT avoiding the RMP lookup, I'd tried __filemap_get_folio() without the
FGP_CREAT flag as a way to check whether the folio was already allocated
or not, but it seemed to result in a lockup, and didn't look like it
would be any better performance-wise than just doing the RMP lookup
anyway (which is just a normal memory read), so that's the route I ended
up going.
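(A sketch of that kind of probe, for context. Note that newer kernels have
__filemap_get_folio() return an ERR_PTR rather than NULL when the folio
doesn't exist, so the exact return-value handling depends on the base the
series is on.)

/*
 * Sketch: check whether the folio backing a gmem offset has already been
 * allocated, without allocating it (no FGP_CREAT).
 */
static bool kvm_gmem_folio_allocated(struct inode *inode, pgoff_t index)
{
	struct folio *folio;

	folio = __filemap_get_folio(inode->i_mapping, index, 0, 0);
	if (IS_ERR_OR_NULL(folio))
		return false;

	folio_put(folio);
	return true;
}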

-Mike

2023-07-22 01:25:53

by Isaku Yamahata

[permalink] [raw]
Subject: Re: [RFC PATCH v4 04/10] KVM: x86: Introduce PFERR_GUEST_ENC_MASK to indicate fault is private

On Fri, Jul 21, 2023 at 07:11:43AM -0700,
Sean Christopherson <[email protected]> wrote:

> s/Introduce/Use
>
> This doesn't "introduce" anything, in the sense that it's an AMD-defined error
> code flag. That matters because KVM *did* introduce/define PFERR_IMPLICIT_ACCESS.
>
> On Thu, Jul 20, 2023, [email protected] wrote:
> > From: Isaku Yamahata <[email protected]>
> >
> > Add two PFERR codes to designate that the page fault is private and that
> > it requires looking up memory attributes. The vendor kvm page fault
> > handler should set PFERR_GUEST_ENC_MASK bit based on their fault
> > information. It may or may not use the hardware value directly or
> > parse the hardware value to set the bit.
> >
> > For KVM_X86_PROTECTED_VM, ask memory attributes for the fault privateness.
>
> ...
>
> > +static inline bool kvm_is_fault_private(struct kvm *kvm, gpa_t gpa, u64 error_code)
> > +{
> > + /*
> > + * This is racy with mmu_seq. If we hit a race, it would result in a
> > + * spurious KVM_EXIT_MEMORY_FAULT.
> > + */
> > + if (kvm->arch.vm_type == KVM_X86_SW_PROTECTED_VM)
> > + return kvm_mem_is_private(kvm, gpa_to_gfn(gpa));
>
> Please synthesize the error code flag for SW-protected VMs, same as TDX, e.g.
>
> diff --git a/arch/x86/kvm/mmu/mmu.c b/arch/x86/kvm/mmu/mmu.c
> index 20e289e872eb..de9e0a9c41e6 100644
> --- a/arch/x86/kvm/mmu/mmu.c
> +++ b/arch/x86/kvm/mmu/mmu.c
> @@ -5751,6 +5751,10 @@ int noinline kvm_mmu_page_fault(struct kvm_vcpu *vcpu, gpa_t cr2_or_gpa, u64 err
> if (WARN_ON(!VALID_PAGE(vcpu->arch.mmu->root.hpa)))
> return RET_PF_RETRY;
>
> + if (vcpu->kvm->arch.vm_type == KVM_X86_SW_PROTECTED_VM &&
> + kvm_mem_is_private(vcpu->kvm, gpa_to_gfn(cr2_or_gpa)))
> + error_code |= PFERR_GUEST_ENC_MASK;
> +
> r = RET_PF_INVALID;
> if (unlikely(error_code & PFERR_RSVD_MASK)) {
> r = handle_mmio_page_fault(vcpu, cr2_or_gpa, direct);
>
> Functionally it's the same, but I want all VM types to have the same source of
> truth for private versus shared, and I really don't want kvm_is_fault_private()
> to exist.

Here is the updated patch.


From 30c452cd6a94b485eaa5f92dee4c222dd30ebcbe Mon Sep 17 00:00:00 2001
Message-Id: <30c452cd6a94b485eaa5f92dee4c222dd30ebcbe.1689987085.git.isaku.yamahata@intel.com>
In-Reply-To: <ab9d8654bd98ae24de05788a2ecaa4bea6c0c44b.1689987085.git.isaku.yamahata@intel.com>
References: <ab9d8654bd98ae24de05788a2ecaa4bea6c0c44b.1689987085.git.isaku.yamahata@intel.com>
From: Isaku Yamahata <[email protected]>
Date: Wed, 14 Jun 2023 12:34:00 -0700
Subject: [PATCH 4/8] KVM: x86: Use PFERR_GUEST_ENC_MASK to indicate fault is
private

SEV-SNP defines PFERR_GUEST_ENC_MASK (bit 34) in the page-fault error code to
represent that the guest page is encrypted. Use the bit to designate that the
page fault is private and that it requires looking up memory attributes.
The vendor kvm page fault handler should set the PFERR_GUEST_ENC_MASK bit
based on its fault information. It may use the hardware value directly or
parse the hardware value to set the bit.

For KVM_X86_SW_PROTECTED_VM, ask the memory attributes for the fault's
privateness. For async page faults, carry the bit and use it in the kvm page
fault handler.

Signed-off-by: Isaku Yamahata <[email protected]>

---
Changes v4 -> v5:
- Eliminate kvm_is_fault_private() by open coding the function
- Make the async page fault handler carry the is_private bit

Changes v3 -> v4:
- rename back struct kvm_page_fault::private => is_private
- catch up rename: KVM_X86_PROTECTED_VM => KVM_X86_SW_PROTECTED_VM

Changes v2 -> v3:
- Revive PFERR_GUEST_ENC_MASK
- rename struct kvm_page_fault::is_private => private
- Add check KVM_X86_PROTECTED_VM

Changes v1 -> v2:
- Introduced fault type and replaced is_private with fault_type.
- Add kvm_get_fault_type() to encapsulate the difference.
---
arch/x86/include/asm/kvm_host.h | 3 +++
arch/x86/kvm/mmu/mmu.c | 32 +++++++++++++++++++++++---------
arch/x86/kvm/mmu/mmu_internal.h | 2 +-
3 files changed, 27 insertions(+), 10 deletions(-)

diff --git a/arch/x86/include/asm/kvm_host.h b/arch/x86/include/asm/kvm_host.h
index 2c9350aa0da4..a1ae3d881063 100644
--- a/arch/x86/include/asm/kvm_host.h
+++ b/arch/x86/include/asm/kvm_host.h
@@ -255,6 +255,7 @@ enum x86_intercept_stage;
#define PFERR_SGX_BIT 15
#define PFERR_GUEST_FINAL_BIT 32
#define PFERR_GUEST_PAGE_BIT 33
+#define PFERR_GUEST_ENC_BIT 34
#define PFERR_IMPLICIT_ACCESS_BIT 48

#define PFERR_PRESENT_MASK BIT(PFERR_PRESENT_BIT)
@@ -266,6 +267,7 @@ enum x86_intercept_stage;
#define PFERR_SGX_MASK BIT(PFERR_SGX_BIT)
#define PFERR_GUEST_FINAL_MASK BIT_ULL(PFERR_GUEST_FINAL_BIT)
#define PFERR_GUEST_PAGE_MASK BIT_ULL(PFERR_GUEST_PAGE_BIT)
+#define PFERR_GUEST_ENC_MASK BIT_ULL(PFERR_GUEST_ENC_BIT)
#define PFERR_IMPLICIT_ACCESS BIT_ULL(PFERR_IMPLICIT_ACCESS_BIT)

#define PFERR_NESTED_GUEST_PAGE (PFERR_GUEST_PAGE_MASK | \
@@ -1770,6 +1772,7 @@ struct kvm_arch_async_pf {
gfn_t gfn;
unsigned long cr3;
bool direct_map;
+ u64 error_code;
};

extern u32 __read_mostly kvm_nr_uret_msrs;
diff --git a/arch/x86/kvm/mmu/mmu.c b/arch/x86/kvm/mmu/mmu.c
index a2fe091e327a..01e74af48e4c 100644
--- a/arch/x86/kvm/mmu/mmu.c
+++ b/arch/x86/kvm/mmu/mmu.c
@@ -4285,18 +4285,19 @@ static u32 alloc_apf_token(struct kvm_vcpu *vcpu)
return (vcpu->arch.apf.id++ << 12) | vcpu->vcpu_id;
}

-static bool kvm_arch_setup_async_pf(struct kvm_vcpu *vcpu, gpa_t cr2_or_gpa,
- gfn_t gfn)
+static bool kvm_arch_setup_async_pf(struct kvm_vcpu *vcpu,
+ struct kvm_page_fault *fault)
{
struct kvm_arch_async_pf arch;

arch.token = alloc_apf_token(vcpu);
- arch.gfn = gfn;
+ arch.gfn = fault->gfn;
arch.direct_map = vcpu->arch.mmu->root_role.direct;
arch.cr3 = kvm_mmu_get_guest_pgd(vcpu, vcpu->arch.mmu);
+ arch.error_code = fault->error_code & PFERR_GUEST_ENC_MASK;

- return kvm_setup_async_pf(vcpu, cr2_or_gpa,
- kvm_vcpu_gfn_to_hva(vcpu, gfn), &arch);
+ return kvm_setup_async_pf(vcpu, fault->addr,
+ kvm_vcpu_gfn_to_hva(vcpu, fault->gfn), &arch);
}

void kvm_arch_async_page_ready(struct kvm_vcpu *vcpu, struct kvm_async_pf *work)
@@ -4315,7 +4316,8 @@ void kvm_arch_async_page_ready(struct kvm_vcpu *vcpu, struct kvm_async_pf *work)
work->arch.cr3 != kvm_mmu_get_guest_pgd(vcpu, vcpu->arch.mmu))
return;

- kvm_mmu_do_page_fault(vcpu, work->cr2_or_gpa, 0, true, NULL);
+ kvm_mmu_do_page_fault(vcpu, work->cr2_or_gpa, work->arch.error_code,
+ true, NULL);
}

static inline u8 kvm_max_level_for_order(int order)
@@ -4399,8 +4401,12 @@ static int __kvm_faultin_pfn(struct kvm_vcpu *vcpu, struct kvm_page_fault *fault
return RET_PF_EMULATE;
}

- if (fault->is_private != kvm_mem_is_private(vcpu->kvm, fault->gfn))
- return kvm_do_memory_fault_exit(vcpu, fault);
+ if (fault->is_private != kvm_mem_is_private(vcpu->kvm, fault->gfn)) {
+ if (vcpu->kvm->arch.vm_type == KVM_X86_SW_PROTECTED_VM)
+ return RET_PF_RETRY;
+ else
+ return kvm_do_memory_fault_exit(vcpu, fault);
+ }

if (fault->is_private)
return kvm_faultin_pfn_private(vcpu, fault);
@@ -4418,7 +4424,7 @@ static int __kvm_faultin_pfn(struct kvm_vcpu *vcpu, struct kvm_page_fault *fault
trace_kvm_async_pf_repeated_fault(fault->addr, fault->gfn);
kvm_make_request(KVM_REQ_APF_HALT, vcpu);
return RET_PF_RETRY;
- } else if (kvm_arch_setup_async_pf(vcpu, fault->addr, fault->gfn)) {
+ } else if (kvm_arch_setup_async_pf(vcpu, fault)) {
return RET_PF_RETRY;
}
}
@@ -5836,6 +5842,14 @@ int noinline kvm_mmu_page_fault(struct kvm_vcpu *vcpu, gpa_t cr2_or_gpa, u64 err
if (WARN_ON(!VALID_PAGE(vcpu->arch.mmu->root.hpa)))
return RET_PF_RETRY;

+ /*
+ * This is racy with updating memory attributes with mmu_seq. If we
+ * hit a race, it would result in retrying page fault.
+ */
+ if (vcpu->kvm->arch.vm_type == KVM_X86_SW_PROTECTED_VM &&
+ kvm_mem_is_private(vcpu->kvm, gpa_to_gfn(cr2_or_gpa)))
+ error_code |= PFERR_GUEST_ENC_MASK;
+
r = RET_PF_INVALID;
if (unlikely(error_code & PFERR_RSVD_MASK)) {
r = handle_mmio_page_fault(vcpu, cr2_or_gpa, direct);
diff --git a/arch/x86/kvm/mmu/mmu_internal.h b/arch/x86/kvm/mmu/mmu_internal.h
index 7f9ec1e5b136..3a423403af01 100644
--- a/arch/x86/kvm/mmu/mmu_internal.h
+++ b/arch/x86/kvm/mmu/mmu_internal.h
@@ -295,13 +295,13 @@ static inline int kvm_mmu_do_page_fault(struct kvm_vcpu *vcpu, gpa_t cr2_or_gpa,
.user = err & PFERR_USER_MASK,
.prefetch = prefetch,
.is_tdp = likely(vcpu->arch.mmu->page_fault == kvm_tdp_page_fault),
+ .is_private = err & PFERR_GUEST_ENC_MASK,
.nx_huge_page_workaround_enabled =
is_nx_huge_page_enabled(vcpu->kvm),

.max_level = KVM_MAX_HUGEPAGE_LEVEL,
.req_level = PG_LEVEL_4K,
.goal_level = PG_LEVEL_4K,
- .is_private = kvm_mem_is_private(vcpu->kvm, cr2_or_gpa >> PAGE_SHIFT),
};
int r;

--
2.25.1



--
Isaku Yamahata <[email protected]>

2023-07-25 10:55:24

by Xiaoyao Li

[permalink] [raw]
Subject: Re: [RFC PATCH v4 09/10] KVM: x86: Make struct sev_cmd common for KVM_MEM_ENC_OP

On 7/21/2023 10:51 PM, Sean Christopherson wrote:
> On Thu, Jul 20, 2023, [email protected] wrote:
>> diff --git a/arch/x86/include/uapi/asm/kvm.h b/arch/x86/include/uapi/asm/kvm.h
>> index aa7a56a47564..32883e520b00 100644
>> --- a/arch/x86/include/uapi/asm/kvm.h
>> +++ b/arch/x86/include/uapi/asm/kvm.h
>> @@ -562,6 +562,39 @@ struct kvm_pmu_event_filter {
>> /* x86-specific KVM_EXIT_HYPERCALL flags. */
>> #define KVM_EXIT_HYPERCALL_LONG_MODE BIT(0)
>>
>> +struct kvm_mem_enc_cmd {
>> + /* sub-command id of KVM_MEM_ENC_OP. */
>> + __u32 id;
>> + /*
>> + * Auxiliary flags for sub-command. If sub-command doesn't use it,
>> + * set zero.
>> + */
>> + __u32 flags;
>> + /*
>> + * Data for sub-command. An immediate or a pointer to the actual
>> + * data in process virtual address. If sub-command doesn't use it,
>> + * set zero.
>> + */
>> + __u64 data;
>> + /*
>> + * Supplemental error code in the case of error.
>> + * SEV error code from the PSP or TDX SEAMCALL status code.
>> + * The caller should set zero.
>> + */
>> + union {
>> + struct {
>> + __u32 error;
>> + /*
>> + * KVM_SEV_LAUNCH_START and KVM_SEV_RECEIVE_START
>> + * require extra data. Not included in struct
>> + * kvm_sev_launch_start or struct kvm_sev_receive_start.
>> + */
>> + __u32 sev_fd;
>> + };
>> + __u64 error64;
>> + };
>> +};
>
> Eww. Why not just use an entirely different struct for TDX? I don't see what
> benefit this provides other than a warm fuzzy feeling that TDX and SEV share a
> struct. Practically speaking, KVM will likely take on more work to forcefully
> smush the two together than if they're separate things.

generalizing the struct of KVM_MEM_ENC_OP should be the first step. The
final target should be generalizing a set of commands for confidential
VMs (SEV-* VMs and TDs, maybe even for other arches), e.g., the commands
to create a confidential VM and the commands to live migrate a
confidential VM.

However, there seems to be no small divergence between the commands to
create SEV-* VMs and TDX VMs. I'm not sure if it is worth investigating and
pursuing.

2023-07-25 16:16:21

by Sean Christopherson

[permalink] [raw]
Subject: Re: [RFC PATCH v4 09/10] KVM: x86: Make struct sev_cmd common for KVM_MEM_ENC_OP

On Tue, Jul 25, 2023, Xiaoyao Li wrote:
> On 7/21/2023 10:51 PM, Sean Christopherson wrote:
> > On Thu, Jul 20, 2023, [email protected] wrote:
> > > diff --git a/arch/x86/include/uapi/asm/kvm.h b/arch/x86/include/uapi/asm/kvm.h
> > > index aa7a56a47564..32883e520b00 100644
> > > --- a/arch/x86/include/uapi/asm/kvm.h
> > > +++ b/arch/x86/include/uapi/asm/kvm.h
> > > @@ -562,6 +562,39 @@ struct kvm_pmu_event_filter {
> > > /* x86-specific KVM_EXIT_HYPERCALL flags. */
> > > #define KVM_EXIT_HYPERCALL_LONG_MODE BIT(0)
> > > +struct kvm_mem_enc_cmd {
> > > + /* sub-command id of KVM_MEM_ENC_OP. */
> > > + __u32 id;
> > > + /*
> > > + * Auxiliary flags for sub-command. If sub-command doesn't use it,
> > > + * set zero.
> > > + */
> > > + __u32 flags;
> > > + /*
> > > + * Data for sub-command. An immediate or a pointer to the actual
> > > + * data in process virtual address. If sub-command doesn't use it,
> > > + * set zero.
> > > + */
> > > + __u64 data;
> > > + /*
> > > + * Supplemental error code in the case of error.
> > > + * SEV error code from the PSP or TDX SEAMCALL status code.
> > > + * The caller should set zero.
> > > + */
> > > + union {
> > > + struct {
> > > + __u32 error;
> > > + /*
> > > + * KVM_SEV_LAUNCH_START and KVM_SEV_RECEIVE_START
> > > + * require extra data. Not included in struct
> > > + * kvm_sev_launch_start or struct kvm_sev_receive_start.
> > > + */
> > > + __u32 sev_fd;
> > > + };
> > > + __u64 error64;
> > > + };
> > > +};
> >
> > Eww. Why not just use an entirely different struct for TDX? I don't see what
> > benefit this provides other than a warm fuzzy feeling that TDX and SEV share a
> > struct. Practically speaking, KVM will likely take on more work to forcefully
> > smush the two together than if they're separate things.
>
> generalizing the struct of KVM_MEM_ENC_OP should be the first step.

It's not just the one structure though. The "data" field is a pointer to yet
another layer of commands, and SEV has a rather big pile of those. Making
kvm_mem_enc_cmd common is just putting lipstick on a pig since the vast majority
of the structures associated with the ioctl() would still be vendor specific.

struct kvm_sev_launch_start
struct kvm_sev_launch_update_data
struct kvm_sev_launch_secret
struct kvm_sev_launch_measure
struct kvm_sev_guest_status
struct kvm_sev_dbg
struct kvm_sev_attestation_report
struct kvm_sev_send_start
struct kvm_sev_send_update_data
struct kvm_sev_receive_start
struct kvm_sev_receive_update_data

FWIW, I really dislike KVM's uAPI for KVM_MEM_ENC_OP. The above structures are
all basically copied verbatim from PSP firmware structures, i.e. the commands and
their payloads are tightly coupled to "hardware" and essentially have no abstraction
whatsoever. But that ship has already sailed, and practically speaking trying to
provide a layer of abstraction might not have worked very well anyways.

In other words, unless there's an obvious and easy path to convergence, I
recommend you don't spend much time/effort on trying to share code with TDX.

2023-07-27 01:16:43

by Isaku Yamahata

[permalink] [raw]
Subject: Re: [RFC PATCH v4 09/10] KVM: x86: Make struct sev_cmd common for KVM_MEM_ENC_OP

On Tue, Jul 25, 2023 at 08:36:09AM -0700,
Sean Christopherson <[email protected]> wrote:

> On Tue, Jul 25, 2023, Xiaoyao Li wrote:
> > On 7/21/2023 10:51 PM, Sean Christopherson wrote:
> > > On Thu, Jul 20, 2023, [email protected] wrote:
> > > > diff --git a/arch/x86/include/uapi/asm/kvm.h b/arch/x86/include/uapi/asm/kvm.h
> > > > index aa7a56a47564..32883e520b00 100644
> > > > --- a/arch/x86/include/uapi/asm/kvm.h
> > > > +++ b/arch/x86/include/uapi/asm/kvm.h
> > > > @@ -562,6 +562,39 @@ struct kvm_pmu_event_filter {
> > > > /* x86-specific KVM_EXIT_HYPERCALL flags. */
> > > > #define KVM_EXIT_HYPERCALL_LONG_MODE BIT(0)
> > > > +struct kvm_mem_enc_cmd {
> > > > + /* sub-command id of KVM_MEM_ENC_OP. */
> > > > + __u32 id;
> > > > + /*
> > > > + * Auxiliary flags for sub-command. If sub-command doesn't use it,
> > > > + * set zero.
> > > > + */
> > > > + __u32 flags;
> > > > + /*
> > > > + * Data for sub-command. An immediate or a pointer to the actual
> > > > + * data in process virtual address. If sub-command doesn't use it,
> > > > + * set zero.
> > > > + */
> > > > + __u64 data;
> > > > + /*
> > > > + * Supplemental error code in the case of error.
> > > > + * SEV error code from the PSP or TDX SEAMCALL status code.
> > > > + * The caller should set zero.
> > > > + */
> > > > + union {
> > > > + struct {
> > > > + __u32 error;
> > > > + /*
> > > > + * KVM_SEV_LAUNCH_START and KVM_SEV_RECEIVE_START
> > > > + * require extra data. Not included in struct
> > > > + * kvm_sev_launch_start or struct kvm_sev_receive_start.
> > > > + */
> > > > + __u32 sev_fd;
> > > > + };
> > > > + __u64 error64;
> > > > + };
> > > > +};
> > >
> > > Eww. Why not just use an entirely different struct for TDX? I don't see what
> > > benefit this provides other than a warm fuzzy feeling that TDX and SEV share a
> > > struct. Practically speaking, KVM will likely take on more work to forcefully
> > > smush the two together than if they're separate things.
> >
> > generalizing the struct of KVM_MEM_ENC_OP should be the first step.
>
> It's not just the one structure though. The "data" field is a pointer to yet
> another layer of commands, and SEV has a rather big pile of those. Making
> kvm_mem_enc_cmd common is just putting lipstick on a pig since the vast majority
> of the structures associated with the ioctl() would still be vendor specific.

> struct kvm_sev_launch_start
> struct kvm_sev_launch_update_data
> struct kvm_sev_launch_secret
> struct kvm_sev_launch_measure
> struct kvm_sev_guest_status
> struct kvm_sev_dbg
> struct kvm_sev_attestation_report
> struct kvm_sev_send_start
> struct kvm_sev_send_update_data
> struct kvm_sev_receive_start
> struct kvm_sev_receive_update_data
>
> FWIW, I really dislike KVM's uAPI for KVM_MEM_ENC_OP. The above structures are
> all basically copied verbatim from PSP firmware structures, i.e. the commands and
> their payloads are tightly coupled to "hardware" and essentially have no abstraction
> whatsoever. But that ship has already sailed, and practically speaking trying to
> provide a layer of abstraction might not have worked very well anyways.
>
> In other words, unless there's an obvious and easy path to convergence, I
> recommend you don't spend much time/effort on trying to share code with TDX.

I think we can easily unify vcpu initialization, populating/measuring initial
memory, completing guest creation, and guest memory access for debug.

KVM_SEV_LAUNCH_UPDATE_VMSA <-> KVM_TDX_INIT_VCPU
KVM_SEV_LAUNCH_UPDATE_DATA and KVM_SEV_LAUNCH_MEASURE <-> KVM_INIT_MEM_REGION
KVM_SEV_LAUNCH_FINISH <-> KVM_TDX_FINALIZE_VM
KVM_SEV_DBG_DECRYPT, KVM_SEV_DBG_ENCRYPT: KVM common API for access protected guest memory


Here's my assessment. For now I don't address migration.

For creating confidential guest:

- Get the capability of underlying platform
KVM_TDX_CAPABILITY: no sev correspondence.

- Initialize VM as confidential VM
struct kvm_sev_launch_start
KVM_SEV{,_ES}_INIT, and KVM_SEV_LAUNCH_START:
KVM_TDX_INIT_VM
They take vendor specific data.


- Initialize vcpu
KVM_SEV_LAUNCH_UPDATE_VMSA: no extra argument
KVM_TDX_INIT_VCPU: no extra argument


- populate initial memory + measurement
KVM_SEV_LAUNCH_UPDATE_DATA and KVM_SEV_LAUNCH_MEASURE,
struct kvm_sev_launch_update_data {
__u64 uaddr;
__u32 len;
};
struct kvm_sev_launch_measure {
__u64 uaddr;
__u32 len;
};
=> GPA is calculated from uaddr.

KVM_INIT_MEM_REGION:
struct kvm_tdx_init_mem_region {
__u64 source_addr; // uaddr
__u64 gpa;
__u64 nr_pages;
};

I think those can share the same structure, or be used for prefaulting or
prepopulating memory, e.g.
struct {
__u64 uaddr;
__u64 gpa;
__u64 len;
#define FLAG_MEASURE BIT(0)
#define FLAG_GPA BIT(1) // GPA is valid or calculated from uaddr
__u64 flags;
};


- Complete initialization. Make the guest ready to run vcpu
KVM_SEV_LAUNCH_FINISH: no argument
KVM_TDX_FINALIZE_VM: no argument

- KVM_SEV_LAUNCH_SECRET: no TDX correspondence
struct kvm_sev_launch_secret


For guest debug

- KVM_SEV_DBG_DECRYPT, KVM_SEV_DBG_ENCRYPT: struct kvm_sev_dbg
This is to read/write guest memory for debug. We can easily have a common
API.

- KVM_SEV_GUEST_STATUS
struct kvm_sev_guest_status
No TDX correspondence

Thanks,
--
Isaku Yamahata <[email protected]>

2023-08-29 19:19:19

by Michael Roth

[permalink] [raw]
Subject: Re: [RFC PATCH v4 07/10] KVM: x86: Add gmem hook for initializing private memory

On Fri, Aug 25, 2023 at 07:59:41PM -0500, Michael Roth wrote:
> On Fri, Aug 18, 2023 at 03:27:47PM -0700, Sean Christopherson wrote:
> > This seems like a bug in the SNP code. (a) why would KVM/SNP PSMASH in that
> > scenario and (b) why can't it zap/split the NPT before PSMASH?
>
> a) A PSMASH will be needed at some point, since, as detailed above, the 4K
> NPT mapping requires the RMP entries for the pages it maps to be
> limited to 4K RMP entries, but...
> b) What would happen normally[1] is the guest would issue PVALIDATE to
> *rescind* the validated status of that 4K GPA before issuing the GHCB
> request to convert it to shared. This would cause an
> #NPF(RMP+SIZE_MISMATCH) and handle_rmp_page_fault() would PSMASH the RMP
> entry so the PVALIDATE can succeed.
>
> So KVM doesn't even really have the option of deferring the PSMASH, it
> happens before the SET_MEMORY_ATTRIBUTES is even issued to zap the 2MB
> NPT mapping and switch the 2M range to 'mixed'. Because of that, we also
> need a hook in the KVM MMU code to clamp the max mapping level based
> on RMP entry size. Currently the kvm_gmem_prepare() in this patch
> doubles for handling that clamping, so we would still need a similar
> hook for that if we move the RMP initialization stuff to allocation
> time.
>
> [1] This handling is recommended for 'well-behaved' guests according to
> GHCB, but I don't see it documented as a hard requirement anywhere, so there
> is a possibility that we have to deal with a guest that doesn't do this.
> What would happen then is the PVALIDATE wouldn't trigger the #NPF(RMP+SIZEM),
> and instead the SET_MEMORY_ATTRIBUTES would zap the 2MB mapping, install
> 4K entries on next #NPF(NOT_PRESENT), and at *that* point we would get
> an #NPF(RMP) without the SIZEM bit set, due to the behavior described in
> the beginning of this email.
>
> handle_rmp_page_fault() can do the corresponding PSMASH to deal with that,
> but it is a little unfortunate since we can't differentiate that case from a
> spurious/unexpected RMP faults, so would need to attempt a PSMASH in all
> cases, sometimes failing.

The spurious case here is when the guest is accessing a private page
that's just been PSMASH'd by another thread. I thought these might still
occur before the PSMASH has completed so we'd still potentially see the
page-size bit set in the RMP entry, but the RMP faults only happen after
the PSMASH has finished, so the spurious cases can be filtered out by
just checking if the page-size bit is set before attempting a PSMASH.
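(Sketch of that filtering. snp_lookup_rmpentry() and psmash() stand in for
whatever helpers the SNP host patches provide, so the names and signatures
here are assumptions.)

/*
 * Sketch: only attempt a PSMASH when the RMP entry is still 2MB.  If another
 * thread already split it, the fault was spurious and the access can simply
 * be retried.
 */
static int snp_handle_rmp_size_fault(u64 pfn)
{
	bool assigned;
	int rmp_level;

	if (snp_lookup_rmpentry(pfn, &assigned, &rmp_level))
		return -EINVAL;

	if (!assigned || rmp_level == PG_LEVEL_4K)
		return 0;	/* spurious: already split */

	return psmash(pfn);
}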

>
> gmem itself could also trigger this case if the lpage_info_slot() tracking
> ever became more granular than what the guest expected (which I don't
> think would happen normally, but I may have hit one case where it does, but
> haven't had a chance to debug whether that's on the lpage_info_slot() side or
> something else on the SNP end).

Turns out that was for a case where the shared/private attributes for the
2M range really were mixed at the time of access. It happens when OVMF
converts some shared memory to private during early boot: the end address is
not 2MB-aligned, so it is in a mixed region, and the NPT mapping is 4K, but
the RMP entry is initialized as 2MB. The PVALIDATE and NPT then agree on the
4K mapping size, so the SIZEM bit isn't set, just the #NPF(RMP).

So we need to be able to deal with that even for 'well-behaved' guests.
With the RMP-init-during-mapping-time approach I had some checks that avoided
creating the 2MB RMP entry in this mixed case, which is why I didn't need
handling for this previously. But it's just one extra #NPF(RMP), and it can
be handled cleanly since it can be distinguished from the spurious cases.

-Mike

2024-02-22 02:06:19

by Sean Christopherson

[permalink] [raw]
Subject: Re: [RFC PATCH v4 04/10] KVM: x86: Introduce PFERR_GUEST_ENC_MASK to indicate fault is private

On Fri, Jul 21, 2023, Isaku Yamahata wrote:
> From: Isaku Yamahata <[email protected]>
> Date: Wed, 14 Jun 2023 12:34:00 -0700
> Subject: [PATCH 4/8] KVM: x86: Use PFERR_GUEST_ENC_MASK to indicate fault is
> private
>
> SEV-SNP defines PFERR_GUEST_ENC_MASK (bit 34) in the page-fault error code to
> represent that the guest page is encrypted. Use the bit to designate that the
> page fault is private and that it requires looking up memory attributes.
> The vendor kvm page fault handler should set the PFERR_GUEST_ENC_MASK bit
> based on its fault information. It may use the hardware value directly or
> parse the hardware value to set the bit.
>
> For KVM_X86_SW_PROTECTED_VM, ask the memory attributes for the fault's
> privateness. For async page faults, carry the bit and use it in the kvm page
> fault handler.
>
> Signed-off-by: Isaku Yamahata <[email protected]>

..

> @@ -4315,7 +4316,8 @@ void kvm_arch_async_page_ready(struct kvm_vcpu *vcpu, struct kvm_async_pf *work)
> work->arch.cr3 != kvm_mmu_get_guest_pgd(vcpu, vcpu->arch.mmu))
> return;
>
> - kvm_mmu_do_page_fault(vcpu, work->cr2_or_gpa, 0, true, NULL);
> + kvm_mmu_do_page_fault(vcpu, work->cr2_or_gpa, work->arch.error_code,
> + true, NULL);

This is unnecessary; KVM doesn't support async page fault behavior for private
memory (and doesn't need to, because guest_memfd() doesn't support swap).

> diff --git a/arch/x86/kvm/mmu/mmu_internal.h b/arch/x86/kvm/mmu/mmu_internal.h
> index 7f9ec1e5b136..3a423403af01 100644
> --- a/arch/x86/kvm/mmu/mmu_internal.h
> +++ b/arch/x86/kvm/mmu/mmu_internal.h
> @@ -295,13 +295,13 @@ static inline int kvm_mmu_do_page_fault(struct kvm_vcpu *vcpu, gpa_t cr2_or_gpa,
> .user = err & PFERR_USER_MASK,
> .prefetch = prefetch,
> .is_tdp = likely(vcpu->arch.mmu->page_fault == kvm_tdp_page_fault),
> + .is_private = err & PFERR_GUEST_ENC_MASK,

This breaks SEV and SEV-ES guests, because AFAICT, the APM is lying by defining
PFERR_GUEST_ENC_MASK in the context of SNP. The flag isn't just set when running
SEV-SNP guests, it's set for all C-bit=1 effective accesses when running on SNP
capable hardware (at least, that's my observation).

Grumpiness about discovering yet another problem that I would have expected
_someone_ to stumble upon...

FYI, I'm going to post a rambling series to cleanup code in the page fault path
(it started as a cleanup of the "no slot" code and then grew a few more heads).
One of the patches I'm going to include is something that looks like this patch,
but I'm going to use a KVM-defined synthetic bit, because stuffing a bit that KVM
would need to _clear_ on _some_ hardware is gross.
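(For illustration only, a KVM-defined synthetic flag might look like the
sketch below; the bit position and name are made up here, not necessarily
what will be merged.)

/* Sketch: a synthetic, KVM-defined "private access" error code bit. */
#define PFERR_PRIVATE_ACCESS_BIT	49
#define PFERR_PRIVATE_ACCESS		BIT_ULL(PFERR_PRIVATE_ACCESS_BIT)

/* In kvm_mmu_page_fault(), before dispatching the fault: */
if (vcpu->kvm->arch.vm_type == KVM_X86_SW_PROTECTED_VM &&
    kvm_mem_is_private(vcpu->kvm, gpa_to_gfn(cr2_or_gpa)))
	error_code |= PFERR_PRIVATE_ACCESS;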