2023-08-19 17:01:41

by Ashish Kalra

Subject: Re: [PATCH 4/8] KVM: gmem: protect kvm_mmu_invalidate_end()


On 8/18/2023 12:55 PM, Sean Christopherson wrote:
> On Tue, Aug 15, 2023, [email protected] wrote:
>> From: Isaku Yamahata <[email protected]>
>>
>> kvm_mmu_invalidate_end() updates struct kvm::mmu_invalidate_in_progress,
>> which is protected by kvm::mmu_lock. Call kvm_mmu_invalidate_end() before
>> unlocking it, not after the unlock.
>>
>> Fixes: 8e9009ca6d14 ("KVM: Introduce per-page memory attributes")
>
> This Fixes: tag is wrong. It won't matter in the long run, but it makes my life that
> much harder.
>
>> Signed-off-by: Isaku Yamahata <[email protected]>
>> ---
>> virt/kvm/kvm_main.c | 15 ++++++++++++++-
>> 1 file changed, 14 insertions(+), 1 deletion(-)
>>
>> diff --git a/virt/kvm/kvm_main.c b/virt/kvm/kvm_main.c
>> index 8bfeb615fc4d..49380cd62367 100644
>> --- a/virt/kvm/kvm_main.c
>> +++ b/virt/kvm/kvm_main.c
>> @@ -535,6 +535,7 @@ struct kvm_mmu_notifier_range {
>> } arg;
>> gfn_handler_t handler;
>> on_lock_fn_t on_lock;
>> + on_unlock_fn_t before_unlock;
>> on_unlock_fn_t on_unlock;
>
> Ugh, shame on my past me. Having on_lock and on_unlock be asymmetrical with respect
> to the lock is nasty.
>
> I would much rather we either (a) be explicit, e.g. before_(un)lock and after_(un)lock,
> or (b) have just on_(un)lock, make them symmetrical, and handle the SEV mess a
> different way.
>
> The SEV hook doesn't actually care about running immediately after unlock, it just
> wants to know if there was an overlapping memslot. It can run after SRCU is dropped,
> because even if we make the behavior more precise (right now it blasts WBINVD),
> just having a reference to memslots isn't sufficient, the code needs to guarantee
> memslots are *stable*. And that is already guaranteed by the notifier code, i.e.
> the SEV code could just reacquire SRCU.
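
For reference, the explicit naming in option (a) can be modelled in plain
userspace C. This is only a sketch to show the call ordering: struct
range_model, run_range() and the hook names below are hypothetical and are
not the kernel's types or functions, and the "lock" is just a trace entry
rather than a real mmu_lock.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Userspace model only: none of these names exist in the kernel. */
#define TRACE_MAX 8

struct trace {
	const char *events[TRACE_MAX];
	size_t count;
};

static void trace_add(struct trace *t, const char *event)
{
	assert(t->count < TRACE_MAX);
	t->events[t->count++] = event;
}

typedef void (*hook_fn)(struct trace *t);

/* Hypothetical range descriptor with explicitly named hooks. */
struct range_model {
	hook_fn after_lock;	/* lock held, right after taking it */
	hook_fn before_unlock;	/* lock held, right before dropping it */
	hook_fn after_unlock;	/* lock already dropped */
};

/* Records the order in which the lock, handler and hooks would run. */
static void run_range(const struct range_model *r, struct trace *t)
{
	trace_add(t, "lock");
	if (r->after_lock)
		r->after_lock(t);

	trace_add(t, "handler");

	if (r->before_unlock)
		r->before_unlock(t);
	trace_add(t, "unlock");

	if (r->after_unlock)
		r->after_unlock(t);
}

/* Stand-ins for the real callbacks; they only log their own name. */
static void invalidate_begin(struct trace *t) { trace_add(t, "invalidate_begin"); }
static void invalidate_end(struct trace *t) { trace_add(t, "invalidate_end"); }
static void arch_flush(struct trace *t) { trace_add(t, "arch_flush"); }
```

With invalidate_begin as after_lock, invalidate_end as before_unlock and
arch_flush as after_unlock, the recorded order puts invalidate_end before
"unlock" and arch_flush after it, which is the ordering the patch is after.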

On a separate note here, the SEV hook blasting WBINVD is still causing
serious performance degradation issues with SNP triggered via
AutoNUMA/numad/KSM, etc. With reference to previous discussions related
to it, we have plans to replace WBINVD with CLFLUSHOPT.

Pasting your previous thoughts on the same:

For SNP guests, KVM should use CLFLUSHOPT and not WBINVD.
That will slow down the SNP guest itself, but it should eliminate the
noisy neighbor problems.

In theory, KVM could do the same for SEV/SEV-ES guests, but that's
subtly quite difficult, because in order to use CLFLUSHOPT, the kernel
needs a valid VA=>PA mapping.
Because mmu_notifier_invalidate_range_start() calls aren't fully
serialized, KVM would encounter situations where there is no valid
mapping for the userspace VA.
KVM could ignore those, but IIRC when Mingwei and I last looked at this,
we weren't super confident that KVM wouldn't miss edge cases.

Using KVM's SPTEs to get the PA isn't a great option, as that would
require KVM to flush whenever a leaf SPTE were zapped, i.e. even when
_KVM_ initiates the zap.

UPM is supposed to make this easier because the notifier should be able
to provide the PFN(s) being unmapped and then use the direct map to
flush. I don't think the proposed series actually provides the PFN, but
it should not be difficult to add.
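
To illustrate why the address matters: WBINVD needs no address at all,
while a CLFLUSHOPT-style flush has to visit every cache line in the range
through a valid mapping. Below is a minimal userspace sketch of that
per-line walk. flush_range_by_line(), flush_line_fn and the 64-byte
CACHE_LINE_SIZE are assumptions made for illustration, not KVM code; the
callback stands in for the actual flush instruction. In the kernel,
clflush_cache_range() plays a similar role for a mapped virtual range.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Assumption for this sketch: 64-byte cache lines. */
#define CACHE_LINE_SIZE 64u

/* Hypothetical per-line callback, standing in for the flush instruction. */
typedef void (*flush_line_fn)(uintptr_t line_addr, void *ctx);

/*
 * Call flush_line once for every cache line overlapping
 * [start, start + len). Returns the number of lines visited.
 */
static size_t flush_range_by_line(uintptr_t start, size_t len,
				  flush_line_fn flush_line, void *ctx)
{
	const uintptr_t mask = ~(uintptr_t)(CACHE_LINE_SIZE - 1);
	uintptr_t addr, last;
	size_t lines = 0;

	if (len == 0)
		return 0;

	addr = start & mask;			/* first line, rounded down */
	last = (start + len - 1) & mask;	/* line holding the last byte */

	for (;;) {
		flush_line(addr, ctx);
		lines++;
		if (addr == last)
			break;
		addr += CACHE_LINE_SIZE;
	}
	return lines;
}

/* Example callback: counts lines instead of flushing anything. */
static void count_only(uintptr_t line_addr, void *ctx)
{
	(void)line_addr;
	(*(size_t *)ctx)++;
}
```

A 4KiB page costs 64 such calls under these assumptions, which is the
per-guest cost being traded against the system-wide cost of WBINVD.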

Thanks,
Ashish