On 5/10/24 15:58, Sean Christopherson wrote:
> On Thu, May 09, 2024, Michael Roth wrote:
>> The intended logic when handling #NPFs with the RMP bit set (31) is to
>> first check to see if the #NPF requires a shared<->private transition
>> and, if so, to go ahead and let the corresponding KVM_EXIT_MEMORY_FAULT
>> get forwarded on to userspace before proceeding with any handling of
>> other potential RMP fault conditions like needing to PSMASH the RMP
>> entry/etc (which will be done later if the guest still re-faults after
>> the KVM_EXIT_MEMORY_FAULT is processed by userspace).
>>
>> The determination of whether any userspace handling of
>> KVM_EXIT_MEMORY_FAULT is needed is done by interpreting the return code
>> of kvm_mmu_page_fault(). However, the current code misinterprets the
>> return code, expecting 0 to indicate a userspace exit rather than less
>> than 0 (-EFAULT). This leads to the following unexpected behavior:
>>
>> - for KVM_EXIT_MEMORY_FAULTs resulting for implicit shared->private
>> conversions, warnings get printed from sev_handle_rmp_fault()
>> because it does not expect to be called for GPAs where
>> KVM_MEMORY_ATTRIBUTE_PRIVATE is not set. Standard linux guests don't
>> generally do this, but it is allowed and should be handled
>> similarly to private->shared conversions rather than triggering any
>> sort of warnings
>>
>> - if gmem support for 2MB folios is enabled (via currently out-of-tree
>> code), implicit shared<->private conversions will always result in
>> a PSMASH being attempted, even if it's not actually needed to
>> resolve the RMP fault. This doesn't cause any harm, but results in a
>> needless PSMASH and zapping of the sPTE
>>
>> Resolve these issues by calling sev_handle_rmp_fault() only when
>> kvm_mmu_page_fault()'s return code is greater than or equal to 0,
>> indicating a KVM_MEMORY_EXIT_FAULT/-EFAULT isn't needed. While here,
>> simplify the code slightly and fix up the associated comments for better
>> clarity.
>>
>> Fixes: ccc9d836c5c3 ("KVM: SEV: Add support to handle RMP nested page faults")
>>
>> Signed-off-by: Michael Roth <[email protected]>
>> ---
>> arch/x86/kvm/svm/svm.c | 10 ++++------
>> 1 file changed, 4 insertions(+), 6 deletions(-)
>>
>> diff --git a/arch/x86/kvm/svm/svm.c b/arch/x86/kvm/svm/svm.c
>> index 426ad49325d7..9431ce74c7d4 100644
>> --- a/arch/x86/kvm/svm/svm.c
>> +++ b/arch/x86/kvm/svm/svm.c
>> @@ -2070,14 +2070,12 @@ static int npf_interception(struct kvm_vcpu *vcpu)
>> svm->vmcb->control.insn_len);
>>
>> /*
>> - * rc == 0 indicates a userspace exit is needed to handle page
>> - * transitions, so do that first before updating the RMP table.
>> + * rc < 0 indicates a userspace exit may be needed to handle page
>> + * attribute updates, so deal with that first before handling other
>> + * potential RMP fault conditions.
>> */
>> - if (error_code & PFERR_GUEST_RMP_MASK) {
>> - if (rc == 0)
>> - return rc;
>> + if (rc >= 0 && error_code & PFERR_GUEST_RMP_MASK)
>
> This isn't correct either. A return of '0' also indiciates "exit to userspace",
> it just doesn't happen with SNP because '0' is returned only when KVM attempts
> emulation, and that too gets short-circuited by svm_check_emulate_instruction().
>
> And I would honestly drop the comment, KVM's less-than-pleasant 1/0/-errno return
> values overload is ubiquitous enough that it should be relatively self-explanatory.
>
> Or if you prefer to keep a comment, drop the part that specifically calls out
> attributes updates, because that incorrectly implies that's the _only_ reason
> why KVM checks the return. But my vote is to drop the comment, because it
> essentially becomes "don't proceed to step 2 if step 1 failed", which kind of
> makes the reader go "well, yeah".
So IIUC you're suggesting
diff --git a/arch/x86/kvm/svm/svm.c b/arch/x86/kvm/svm/svm.c
index 426ad49325d7..c39eaeb21981 100644
--- a/arch/x86/kvm/svm/svm.c
+++ b/arch/x86/kvm/svm/svm.c
@@ -2068,16 +2068,11 @@ static int npf_interception(struct kvm_vcpu *vcpu)
static_cpu_has(X86_FEATURE_DECODEASSISTS) ?
svm->vmcb->control.insn_bytes : NULL,
svm->vmcb->control.insn_len);
+ if (rc <= 0)
+ return rc;
- /*
- * rc == 0 indicates a userspace exit is needed to handle page
- * transitions, so do that first before updating the RMP table.
- */
- if (error_code & PFERR_GUEST_RMP_MASK) {
- if (rc == 0)
- return rc;
+ if (error_code & PFERR_GUEST_RMP_MASK)
sev_handle_rmp_fault(vcpu, fault_address, error_code);
- }
return rc;
}
?
So, we're... a bit tight for 6.10 to include SNP and that is an
understatement. My plan is to merge it for 6.11, but do so
immediately after the merge window ends. In other words, it
is a delay in terms of release but not in terms of time. I
don't want QEMU and kvm-unit-tests work to be delayed any
further, in particular.
Once we sort out the loose ends of patches 21-23, you could send
it as a pull request.
Paolo