2022-12-22 03:02:18

by Vipin Sharma

Subject: [Patch v3 0/9] NUMA aware page table's pages allocation

Hi,

This series has expanded from v2 based on the feedback. The main items in
this series are:

1. KVM MMU shrinker now shrinks KVM caches.
The MMU shrinker frees the shadow page caches and split caches whenever it
is invoked.

2. Page table pages are allocated on the correct NUMA node during fault and split.
Page table pages are allocated on the NUMA node of the underlying physical
page that a page table entry points to (a rough sketch follows this list).
This gave a performance improvement of up to 150% in a 416-vCPU VM during
live migration.

3. Cache size is reduced from 40 to 5.
40 is the current cache size for KVM memory caches; this series reduces them
to 5. I didn't see any negative performance impact while running perf and
dirty_log_perf_test, and I saw fewer calls to get a free page.
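
As a rough illustration of item 2 above (a sketch only, not part of the
patches below; alloc_spt_on_pfn_node() is a made-up helper name), the NUMA
node is derived from the physical page that the page table entry maps:

static void *alloc_spt_on_pfn_node(kvm_pfn_t pfn)
{
	struct page *page = kvm_pfn_to_refcounted_page(pfn);
	int nid = page ? page_to_nid(page) : NUMA_NO_NODE;
	struct page *spt_page;

	/* NUMA_NO_NODE falls back to the default local allocation policy. */
	spt_page = alloc_pages_node(nid, GFP_KERNEL_ACCOUNT | __GFP_ZERO, 0);

	return spt_page ? page_address(spt_page) : NULL;
}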

Thanks
Vipin

v3:
- Split patches into smaller ones.
- Repurposed KVM MMU shrinker to free cache pages instead of oldest page table
pages
- Reduced cache size from 40 to 5
- Removed the __weak function; the node value is now initialized in all
architectures.
- Some name changes.

v2: https://lore.kernel.org/lkml/[email protected]/
- All page table pages will be allocated on the underlying physical page's
NUMA node.
- Introduced module parameter, numa_aware_pagetable, to disable this
feature.
- Using kvm_pfn_to_refcounted_page to get page from a pfn.

v1: https://lore.kernel.org/all/[email protected]/

Vipin Sharma (9):
KVM: x86/mmu: Repurpose KVM MMU shrinker to purge shadow page caches
KVM: x86/mmu: Remove zapped_obsolete_pages from struct kvm_arch{}
KVM: x86/mmu: Shrink split_shadow_page_cache via KVM MMU shrinker
KVM: Add module param to make page tables NUMA aware
KVM: x86/mmu: Allocate TDP page table's page on correct NUMA node on
split
KVM: Provide NUMA node support to kvm_mmu_memory_cache{}
KVM: x86/mmu: Allocate page table's pages on NUMA node of the
underlying pages
KVM: x86/mmu: Make split_shadow_page_cache NUMA aware
KVM: x86/mmu: Reduce default cache size in KVM from 40 to
PT64_ROOT_MAX_LEVEL

arch/arm64/kvm/arm.c | 2 +-
arch/arm64/kvm/mmu.c | 4 +-
arch/mips/kvm/mips.c | 2 +
arch/riscv/kvm/mmu.c | 2 +-
arch/riscv/kvm/vcpu.c | 2 +-
arch/x86/include/asm/kvm_host.h | 15 +-
arch/x86/include/asm/kvm_types.h | 2 +-
arch/x86/kvm/mmu/mmu.c | 282 +++++++++++++++++++------------
arch/x86/kvm/mmu/mmu_internal.h | 2 +
arch/x86/kvm/mmu/paging_tmpl.h | 4 +-
arch/x86/kvm/mmu/tdp_mmu.c | 24 ++-
include/linux/kvm_host.h | 27 +++
include/linux/kvm_types.h | 2 +
virt/kvm/kvm_main.c | 35 +++-
14 files changed, 277 insertions(+), 128 deletions(-)

--
2.39.0.314.g84b9a713c41-goog


2022-12-22 03:02:37

by Vipin Sharma

Subject: [Patch v3 2/9] KVM: x86/mmu: Remove zapped_obsolete_pages from struct kvm_arch{}

The zapped_obsolete_pages list in struct kvm_arch{} was used to provide
pages to the KVM MMU shrinker. It is no longer needed now that the KVM MMU
shrinker has been repurposed to free shadow page caches instead of
zapped_obsolete_pages.

Remove zapped_obsolete_pages from struct kvm_arch{} and use a local list
in kvm_zap_obsolete_pages().

Signed-off-by: Vipin Sharma <[email protected]>
---
arch/x86/include/asm/kvm_host.h | 1 -
arch/x86/kvm/mmu/mmu.c | 8 ++++----
2 files changed, 4 insertions(+), 5 deletions(-)

diff --git a/arch/x86/include/asm/kvm_host.h b/arch/x86/include/asm/kvm_host.h
index 89cc809e4a00..f89f02e18080 100644
--- a/arch/x86/include/asm/kvm_host.h
+++ b/arch/x86/include/asm/kvm_host.h
@@ -1215,7 +1215,6 @@ struct kvm_arch {
u8 mmu_valid_gen;
struct hlist_head mmu_page_hash[KVM_NUM_MMU_PAGES];
struct list_head active_mmu_pages;
- struct list_head zapped_obsolete_pages;
/*
* A list of kvm_mmu_page structs that, if zapped, could possibly be
* replaced by an NX huge page. A shadow page is on this list if its
diff --git a/arch/x86/kvm/mmu/mmu.c b/arch/x86/kvm/mmu/mmu.c
index 157417e1cb6e..3364760a1695 100644
--- a/arch/x86/kvm/mmu/mmu.c
+++ b/arch/x86/kvm/mmu/mmu.c
@@ -5987,6 +5987,7 @@ static void kvm_zap_obsolete_pages(struct kvm *kvm)
{
struct kvm_mmu_page *sp, *node;
int nr_zapped, batch = 0;
+ LIST_HEAD(zapped_pages);
bool unstable;

restart:
@@ -6019,8 +6020,8 @@ static void kvm_zap_obsolete_pages(struct kvm *kvm)
goto restart;
}

- unstable = __kvm_mmu_prepare_zap_page(kvm, sp,
- &kvm->arch.zapped_obsolete_pages, &nr_zapped);
+ unstable = __kvm_mmu_prepare_zap_page(kvm, sp, &zapped_pages,
+ &nr_zapped);
batch += nr_zapped;

if (unstable)
@@ -6036,7 +6037,7 @@ static void kvm_zap_obsolete_pages(struct kvm *kvm)
* kvm_mmu_load()), and the reload in the caller ensure no vCPUs are
* running with an obsolete MMU.
*/
- kvm_mmu_commit_zap_page(kvm, &kvm->arch.zapped_obsolete_pages);
+ kvm_mmu_commit_zap_page(kvm, &zapped_pages);
}

/*
@@ -6112,7 +6113,6 @@ int kvm_mmu_init_vm(struct kvm *kvm)
int r;

INIT_LIST_HEAD(&kvm->arch.active_mmu_pages);
- INIT_LIST_HEAD(&kvm->arch.zapped_obsolete_pages);
INIT_LIST_HEAD(&kvm->arch.possible_nx_huge_pages);
spin_lock_init(&kvm->arch.mmu_unsync_pages_lock);

--
2.39.0.314.g84b9a713c41-goog

2022-12-22 03:07:47

by Vipin Sharma

Subject: [Patch v3 1/9] KVM: x86/mmu: Repurpose KVM MMU shrinker to purge shadow page caches

mmu_shrink_scan() is very disruptive to VMs. It picks the first VM in the
vm_list and zaps the oldest pages, which are most likely upper level SPTEs
and most likely to be reused. Prior to the TDP MMU, this was even more
disruptive in the nested VM case, since L1 SPTEs will be the oldest even
though most of the entries are for L2 SPTEs.

As discussed in
https://lore.kernel.org/lkml/[email protected]/
the shrinker logic has not been very useful in actually keeping VMs
performant or reducing memory usage.

Change mmu_shrink_scan() to free pages from the vCPU's shadow page cache.
Freeing pages from the cache doesn't cause vCPU exits, so a VM's
performance should not be affected.

This also allows changing cache capacities without worrying too much
about high memory usage in the caches.

Tested this change by running dirty_log_perf_test while continuously
dropping caches via "echo 2 > /proc/sys/vm/drop_caches" at a 1-second
interval. There were WARN_ON(!mc->nobjs) messages printed in the kernel
logs from kvm_mmu_memory_cache_alloc(), which is expected.

Suggested-by: Sean Christopherson <[email protected]>
Signed-off-by: Vipin Sharma <[email protected]>
---
arch/x86/include/asm/kvm_host.h | 5 +
arch/x86/kvm/mmu/mmu.c | 163 +++++++++++++++++++-------------
arch/x86/kvm/mmu/mmu_internal.h | 2 +
arch/x86/kvm/mmu/tdp_mmu.c | 3 +-
include/linux/kvm_host.h | 1 +
virt/kvm/kvm_main.c | 11 ++-
6 files changed, 114 insertions(+), 71 deletions(-)

diff --git a/arch/x86/include/asm/kvm_host.h b/arch/x86/include/asm/kvm_host.h
index aa4eb8cfcd7e..89cc809e4a00 100644
--- a/arch/x86/include/asm/kvm_host.h
+++ b/arch/x86/include/asm/kvm_host.h
@@ -786,6 +786,11 @@ struct kvm_vcpu_arch {
struct kvm_mmu_memory_cache mmu_shadowed_info_cache;
struct kvm_mmu_memory_cache mmu_page_header_cache;

+ /*
+ * Protects change in size of mmu_shadow_page_cache cache.
+ */
+ spinlock_t mmu_shadow_page_cache_lock;
+
/*
* QEMU userspace and the guest each have their own FPU state.
* In vcpu_run, we switch between the user and guest FPU contexts.
diff --git a/arch/x86/kvm/mmu/mmu.c b/arch/x86/kvm/mmu/mmu.c
index 254bc46234e0..157417e1cb6e 100644
--- a/arch/x86/kvm/mmu/mmu.c
+++ b/arch/x86/kvm/mmu/mmu.c
@@ -164,7 +164,10 @@ struct kvm_shadow_walk_iterator {

static struct kmem_cache *pte_list_desc_cache;
struct kmem_cache *mmu_page_header_cache;
-static struct percpu_counter kvm_total_used_mmu_pages;
+/*
+ * Total number of unused pages in MMU shadow page cache.
+ */
+static struct percpu_counter kvm_total_unused_mmu_pages;

static void mmu_spte_set(u64 *sptep, u64 spte);

@@ -655,6 +658,22 @@ static void walk_shadow_page_lockless_end(struct kvm_vcpu *vcpu)
}
}

+static int mmu_topup_sp_memory_cache(struct kvm_mmu_memory_cache *cache,
+ spinlock_t *cache_lock)
+{
+ int orig_nobjs;
+ int r;
+
+ spin_lock(cache_lock);
+ orig_nobjs = cache->nobjs;
+ r = kvm_mmu_topup_memory_cache(cache, PT64_ROOT_MAX_LEVEL);
+ if (orig_nobjs != cache->nobjs)
+ percpu_counter_add(&kvm_total_unused_mmu_pages,
+ (cache->nobjs - orig_nobjs));
+ spin_unlock(cache_lock);
+ return r;
+}
+
static int mmu_topup_memory_caches(struct kvm_vcpu *vcpu, bool maybe_indirect)
{
int r;
@@ -664,8 +683,8 @@ static int mmu_topup_memory_caches(struct kvm_vcpu *vcpu, bool maybe_indirect)
1 + PT64_ROOT_MAX_LEVEL + PTE_PREFETCH_NUM);
if (r)
return r;
- r = kvm_mmu_topup_memory_cache(&vcpu->arch.mmu_shadow_page_cache,
- PT64_ROOT_MAX_LEVEL);
+ r = mmu_topup_sp_memory_cache(&vcpu->arch.mmu_shadow_page_cache,
+ &vcpu->arch.mmu_shadow_page_cache_lock);
if (r)
return r;
if (maybe_indirect) {
@@ -678,10 +697,25 @@ static int mmu_topup_memory_caches(struct kvm_vcpu *vcpu, bool maybe_indirect)
PT64_ROOT_MAX_LEVEL);
}

+static void mmu_free_sp_memory_cache(struct kvm_mmu_memory_cache *cache,
+ spinlock_t *cache_lock)
+{
+ int orig_nobjs;
+
+ spin_lock(cache_lock);
+ orig_nobjs = cache->nobjs;
+ kvm_mmu_free_memory_cache(cache);
+ if (orig_nobjs)
+ percpu_counter_sub(&kvm_total_unused_mmu_pages, orig_nobjs);
+
+ spin_unlock(cache_lock);
+}
+
static void mmu_free_memory_caches(struct kvm_vcpu *vcpu)
{
kvm_mmu_free_memory_cache(&vcpu->arch.mmu_pte_list_desc_cache);
- kvm_mmu_free_memory_cache(&vcpu->arch.mmu_shadow_page_cache);
+ mmu_free_sp_memory_cache(&vcpu->arch.mmu_shadow_page_cache,
+ &vcpu->arch.mmu_shadow_page_cache_lock);
kvm_mmu_free_memory_cache(&vcpu->arch.mmu_shadowed_info_cache);
kvm_mmu_free_memory_cache(&vcpu->arch.mmu_page_header_cache);
}
@@ -1693,27 +1727,15 @@ static int is_empty_shadow_page(u64 *spt)
}
#endif

-/*
- * This value is the sum of all of the kvm instances's
- * kvm->arch.n_used_mmu_pages values. We need a global,
- * aggregate version in order to make the slab shrinker
- * faster
- */
-static inline void kvm_mod_used_mmu_pages(struct kvm *kvm, long nr)
-{
- kvm->arch.n_used_mmu_pages += nr;
- percpu_counter_add(&kvm_total_used_mmu_pages, nr);
-}
-
static void kvm_account_mmu_page(struct kvm *kvm, struct kvm_mmu_page *sp)
{
- kvm_mod_used_mmu_pages(kvm, +1);
+ kvm->arch.n_used_mmu_pages++;
kvm_account_pgtable_pages((void *)sp->spt, +1);
}

static void kvm_unaccount_mmu_page(struct kvm *kvm, struct kvm_mmu_page *sp)
{
- kvm_mod_used_mmu_pages(kvm, -1);
+ kvm->arch.n_used_mmu_pages--;
kvm_account_pgtable_pages((void *)sp->spt, -1);
}

@@ -2150,8 +2172,31 @@ struct shadow_page_caches {
struct kvm_mmu_memory_cache *page_header_cache;
struct kvm_mmu_memory_cache *shadow_page_cache;
struct kvm_mmu_memory_cache *shadowed_info_cache;
+ /*
+ * Protects change in size of shadow_page_cache cache.
+ */
+ spinlock_t *shadow_page_cache_lock;
};

+void *kvm_mmu_sp_memory_cache_alloc(struct kvm_mmu_memory_cache *shadow_page_cache,
+ spinlock_t *cache_lock)
+{
+ int orig_nobjs;
+ void *page;
+
+ if (!cache_lock) {
+ spin_lock(cache_lock);
+ orig_nobjs = shadow_page_cache->nobjs;
+ }
+ page = kvm_mmu_memory_cache_alloc(shadow_page_cache);
+ if (!cache_lock) {
+ if (orig_nobjs)
+ percpu_counter_dec(&kvm_total_unused_mmu_pages);
+ spin_unlock(cache_lock);
+ }
+ return page;
+}
+
static struct kvm_mmu_page *kvm_mmu_alloc_shadow_page(struct kvm *kvm,
struct shadow_page_caches *caches,
gfn_t gfn,
@@ -2161,7 +2206,8 @@ static struct kvm_mmu_page *kvm_mmu_alloc_shadow_page(struct kvm *kvm,
struct kvm_mmu_page *sp;

sp = kvm_mmu_memory_cache_alloc(caches->page_header_cache);
- sp->spt = kvm_mmu_memory_cache_alloc(caches->shadow_page_cache);
+ sp->spt = kvm_mmu_sp_memory_cache_alloc(caches->shadow_page_cache,
+ caches->shadow_page_cache_lock);
if (!role.direct)
sp->shadowed_translation = kvm_mmu_memory_cache_alloc(caches->shadowed_info_cache);

@@ -2218,6 +2264,7 @@ static struct kvm_mmu_page *kvm_mmu_get_shadow_page(struct kvm_vcpu *vcpu,
.page_header_cache = &vcpu->arch.mmu_page_header_cache,
.shadow_page_cache = &vcpu->arch.mmu_shadow_page_cache,
.shadowed_info_cache = &vcpu->arch.mmu_shadowed_info_cache,
+ .shadow_page_cache_lock = &vcpu->arch.mmu_shadow_page_cache_lock
};

return __kvm_mmu_get_shadow_page(vcpu->kvm, vcpu, &caches, gfn, role);
@@ -5916,6 +5963,7 @@ int kvm_mmu_create(struct kvm_vcpu *vcpu)
vcpu->arch.mmu_page_header_cache.gfp_zero = __GFP_ZERO;

vcpu->arch.mmu_shadow_page_cache.gfp_zero = __GFP_ZERO;
+ spin_lock_init(&vcpu->arch.mmu_shadow_page_cache_lock);

vcpu->arch.mmu = &vcpu->arch.root_mmu;
vcpu->arch.walk_mmu = &vcpu->arch.root_mmu;
@@ -6051,11 +6099,6 @@ static void kvm_mmu_zap_all_fast(struct kvm *kvm)
kvm_tdp_mmu_zap_invalidated_roots(kvm);
}

-static bool kvm_has_zapped_obsolete_pages(struct kvm *kvm)
-{
- return unlikely(!list_empty_careful(&kvm->arch.zapped_obsolete_pages));
-}
-
static void kvm_mmu_invalidate_zap_pages_in_memslot(struct kvm *kvm,
struct kvm_memory_slot *slot,
struct kvm_page_track_notifier_node *node)
@@ -6277,6 +6320,7 @@ static struct kvm_mmu_page *shadow_mmu_get_sp_for_split(struct kvm *kvm, u64 *hu
/* Direct SPs do not require a shadowed_info_cache. */
caches.page_header_cache = &kvm->arch.split_page_header_cache;
caches.shadow_page_cache = &kvm->arch.split_shadow_page_cache;
+ caches.shadow_page_cache_lock = NULL;

/* Safe to pass NULL for vCPU since requesting a direct SP. */
return __kvm_mmu_get_shadow_page(kvm, NULL, &caches, gfn, role);
@@ -6646,66 +6690,49 @@ void kvm_mmu_invalidate_mmio_sptes(struct kvm *kvm, u64 gen)
static unsigned long
mmu_shrink_scan(struct shrinker *shrink, struct shrink_control *sc)
{
- struct kvm *kvm;
- int nr_to_scan = sc->nr_to_scan;
+ struct kvm_mmu_memory_cache *cache;
+ struct kvm *kvm, *first_kvm = NULL;
unsigned long freed = 0;
+ /* spinlock for memory cache */
+ spinlock_t *cache_lock;
+ struct kvm_vcpu *vcpu;
+ unsigned long i;

mutex_lock(&kvm_lock);

list_for_each_entry(kvm, &vm_list, vm_list) {
- int idx;
- LIST_HEAD(invalid_list);
-
- /*
- * Never scan more than sc->nr_to_scan VM instances.
- * Will not hit this condition practically since we do not try
- * to shrink more than one VM and it is very unlikely to see
- * !n_used_mmu_pages so many times.
- */
- if (!nr_to_scan--)
+ if (first_kvm == kvm)
break;
- /*
- * n_used_mmu_pages is accessed without holding kvm->mmu_lock
- * here. We may skip a VM instance errorneosly, but we do not
- * want to shrink a VM that only started to populate its MMU
- * anyway.
- */
- if (!kvm->arch.n_used_mmu_pages &&
- !kvm_has_zapped_obsolete_pages(kvm))
- continue;
+ if (!first_kvm)
+ first_kvm = kvm;
+ list_move_tail(&kvm->vm_list, &vm_list);

- idx = srcu_read_lock(&kvm->srcu);
- write_lock(&kvm->mmu_lock);
+ kvm_for_each_vcpu(i, vcpu, kvm) {
+ cache = &vcpu->arch.mmu_shadow_page_cache;
+ cache_lock = &vcpu->arch.mmu_shadow_page_cache_lock;
+ if (READ_ONCE(cache->nobjs)) {
+ spin_lock(cache_lock);
+ freed += kvm_mmu_empty_memory_cache(cache);
+ spin_unlock(cache_lock);
+ }

- if (kvm_has_zapped_obsolete_pages(kvm)) {
- kvm_mmu_commit_zap_page(kvm,
- &kvm->arch.zapped_obsolete_pages);
- goto unlock;
}

- freed = kvm_mmu_zap_oldest_mmu_pages(kvm, sc->nr_to_scan);
-
-unlock:
- write_unlock(&kvm->mmu_lock);
- srcu_read_unlock(&kvm->srcu, idx);
-
- /*
- * unfair on small ones
- * per-vm shrinkers cry out
- * sadness comes quickly
- */
- list_move_tail(&kvm->vm_list, &vm_list);
- break;
+ if (freed >= sc->nr_to_scan)
+ break;
}

+ if (freed)
+ percpu_counter_sub(&kvm_total_unused_mmu_pages, freed);
mutex_unlock(&kvm_lock);
+ percpu_counter_sync(&kvm_total_unused_mmu_pages);
return freed;
}

static unsigned long
mmu_shrink_count(struct shrinker *shrink, struct shrink_control *sc)
{
- return percpu_counter_read_positive(&kvm_total_used_mmu_pages);
+ return percpu_counter_sum_positive(&kvm_total_unused_mmu_pages);
}

static struct shrinker mmu_shrinker = {
@@ -6820,7 +6847,7 @@ int kvm_mmu_vendor_module_init(void)
if (!mmu_page_header_cache)
goto out;

- if (percpu_counter_init(&kvm_total_used_mmu_pages, 0, GFP_KERNEL))
+ if (percpu_counter_init(&kvm_total_unused_mmu_pages, 0, GFP_KERNEL))
goto out;

ret = register_shrinker(&mmu_shrinker, "x86-mmu");
@@ -6830,7 +6857,7 @@ int kvm_mmu_vendor_module_init(void)
return 0;

out_shrinker:
- percpu_counter_destroy(&kvm_total_used_mmu_pages);
+ percpu_counter_destroy(&kvm_total_unused_mmu_pages);
out:
mmu_destroy_caches();
return ret;
@@ -6847,7 +6874,7 @@ void kvm_mmu_destroy(struct kvm_vcpu *vcpu)
void kvm_mmu_vendor_module_exit(void)
{
mmu_destroy_caches();
- percpu_counter_destroy(&kvm_total_used_mmu_pages);
+ percpu_counter_destroy(&kvm_total_unused_mmu_pages);
unregister_shrinker(&mmu_shrinker);
}

diff --git a/arch/x86/kvm/mmu/mmu_internal.h b/arch/x86/kvm/mmu/mmu_internal.h
index ac00bfbf32f6..c2a342028b6a 100644
--- a/arch/x86/kvm/mmu/mmu_internal.h
+++ b/arch/x86/kvm/mmu/mmu_internal.h
@@ -325,4 +325,6 @@ void *mmu_memory_cache_alloc(struct kvm_mmu_memory_cache *mc);
void track_possible_nx_huge_page(struct kvm *kvm, struct kvm_mmu_page *sp);
void untrack_possible_nx_huge_page(struct kvm *kvm, struct kvm_mmu_page *sp);

+void *kvm_mmu_sp_memory_cache_alloc(struct kvm_mmu_memory_cache *shadow_page_cache,
+ spinlock_t *cache_lock);
#endif /* __KVM_X86_MMU_INTERNAL_H */
diff --git a/arch/x86/kvm/mmu/tdp_mmu.c b/arch/x86/kvm/mmu/tdp_mmu.c
index 764f7c87286f..4974fa96deff 100644
--- a/arch/x86/kvm/mmu/tdp_mmu.c
+++ b/arch/x86/kvm/mmu/tdp_mmu.c
@@ -264,7 +264,8 @@ static struct kvm_mmu_page *tdp_mmu_alloc_sp(struct kvm_vcpu *vcpu)
struct kvm_mmu_page *sp;

sp = kvm_mmu_memory_cache_alloc(&vcpu->arch.mmu_page_header_cache);
- sp->spt = kvm_mmu_memory_cache_alloc(&vcpu->arch.mmu_shadow_page_cache);
+ sp->spt = kvm_mmu_sp_memory_cache_alloc(&vcpu->arch.mmu_shadow_page_cache,
+ &vcpu->arch.mmu_shadow_page_cache_lock);

return sp;
}
diff --git a/include/linux/kvm_host.h b/include/linux/kvm_host.h
index 01aad8b74162..efd9b38ea9a2 100644
--- a/include/linux/kvm_host.h
+++ b/include/linux/kvm_host.h
@@ -1362,6 +1362,7 @@ void kvm_flush_remote_tlbs(struct kvm *kvm);
int kvm_mmu_topup_memory_cache(struct kvm_mmu_memory_cache *mc, int min);
int __kvm_mmu_topup_memory_cache(struct kvm_mmu_memory_cache *mc, int capacity, int min);
int kvm_mmu_memory_cache_nr_free_objects(struct kvm_mmu_memory_cache *mc);
+int kvm_mmu_empty_memory_cache(struct kvm_mmu_memory_cache *mc);
void kvm_mmu_free_memory_cache(struct kvm_mmu_memory_cache *mc);
void *kvm_mmu_memory_cache_alloc(struct kvm_mmu_memory_cache *mc);
#endif
diff --git a/virt/kvm/kvm_main.c b/virt/kvm/kvm_main.c
index 13e88297f999..f2d762878b97 100644
--- a/virt/kvm/kvm_main.c
+++ b/virt/kvm/kvm_main.c
@@ -438,8 +438,10 @@ int kvm_mmu_memory_cache_nr_free_objects(struct kvm_mmu_memory_cache *mc)
return mc->nobjs;
}

-void kvm_mmu_free_memory_cache(struct kvm_mmu_memory_cache *mc)
+int kvm_mmu_empty_memory_cache(struct kvm_mmu_memory_cache *mc)
{
+ int freed = mc->nobjs;
+
while (mc->nobjs) {
if (mc->kmem_cache)
kmem_cache_free(mc->kmem_cache, mc->objects[--mc->nobjs]);
@@ -447,8 +449,13 @@ void kvm_mmu_free_memory_cache(struct kvm_mmu_memory_cache *mc)
free_page((unsigned long)mc->objects[--mc->nobjs]);
}

- kvfree(mc->objects);
+ return freed;
+}

+void kvm_mmu_free_memory_cache(struct kvm_mmu_memory_cache *mc)
+{
+ kvm_mmu_empty_memory_cache(mc);
+ kvfree(mc->objects);
mc->objects = NULL;
mc->capacity = 0;
}
--
2.39.0.314.g84b9a713c41-goog

2022-12-27 18:49:25

by Ben Gardon

Subject: Re: [Patch v3 1/9] KVM: x86/mmu: Repurpose KVM MMU shrinker to purge shadow page caches

On Wed, Dec 21, 2022 at 6:35 PM Vipin Sharma <[email protected]> wrote:
>
> mmu_shrink_scan() is very disruptive to VMs. It picks the first
> VM in the vm_list, zaps the oldest page which is most likely an upper
> level SPTEs and most like to be reused. Prior to TDP MMU, this is even
> more disruptive in nested VMs case, considering L1 SPTEs will be the
> oldest even though most of the entries are for L2 SPTEs.
>
> As discussed in
> https://lore.kernel.org/lkml/[email protected]/
> shrinker logic has not be very useful in actually keeping VMs performant
> and reducing memory usage.
>
> Change mmu_shrink_scan() to free pages from the vCPU's shadow page
> cache. Freeing pages from cache doesn't cause vCPU exits, therefore, a
> VM's performance should not be affected.
>
> This also allows to change cache capacities without worrying too much
> about high memory usage in cache.
>
> Tested this change by running dirty_log_perf_test while dropping cache
> via "echo 2 > /proc/sys/vm/drop_caches" at 1 second interval
> continuously. There were WARN_ON(!mc->nobjs) messages printed in kernel
> logs from kvm_mmu_memory_cache_alloc(), which is expected.

Oh, that's not a good thing. I don't think we want to be hitting those
warnings. For one, kernel warnings should not be expected behavior,
probably for many reasons, but at least because Syzbot will find it.
In this particular case, we don't want to hit that because in that
case we'll try to do a GFP_ATOMIC, which can fail, and if it fails,
we'll BUG:

void *kvm_mmu_memory_cache_alloc(struct kvm_mmu_memory_cache *mc)
{
void *p;

if (WARN_ON(!mc->nobjs))
p = mmu_memory_cache_alloc_obj(mc, GFP_ATOMIC | __GFP_ACCOUNT);
else
p = mc->objects[--mc->nobjs];
BUG_ON(!p);
return p;
}

Perhaps the risk of actually panicking is small, but it probably
indicates that we need better error handling around failed allocations
from the cache.
Or, the slightly less elegant approach might be to just hold the cache
lock around the cache topup and use of pages from the cache, but
adding better error handling would probably be cleaner.
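
For example, something in this direction (rough, untested sketch; the
counter bookkeeping is omitted) where an empty cache is handed back to the
caller instead of falling back to GFP_ATOMIC:

void *kvm_mmu_sp_memory_cache_alloc(struct kvm_mmu_memory_cache *cache,
				    spinlock_t *cache_lock)
{
	void *page = NULL;

	if (cache_lock)
		spin_lock(cache_lock);
	/* No GFP_ATOMIC fallback: let the caller see the empty cache. */
	if (cache->nobjs)
		page = cache->objects[--cache->nobjs];
	if (cache_lock)
		spin_unlock(cache_lock);

	/* Callers would need to check for NULL, e.g. return RET_PF_RETRY. */
	return page;
}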

>
> Suggested-by: Sean Christopherson <[email protected]>
> Signed-off-by: Vipin Sharma <[email protected]>
> ---
> arch/x86/include/asm/kvm_host.h | 5 +
> arch/x86/kvm/mmu/mmu.c | 163 +++++++++++++++++++-------------
> arch/x86/kvm/mmu/mmu_internal.h | 2 +
> arch/x86/kvm/mmu/tdp_mmu.c | 3 +-
> include/linux/kvm_host.h | 1 +
> virt/kvm/kvm_main.c | 11 ++-
> 6 files changed, 114 insertions(+), 71 deletions(-)
>
> diff --git a/arch/x86/include/asm/kvm_host.h b/arch/x86/include/asm/kvm_host.h
> index aa4eb8cfcd7e..89cc809e4a00 100644
> --- a/arch/x86/include/asm/kvm_host.h
> +++ b/arch/x86/include/asm/kvm_host.h
> @@ -786,6 +786,11 @@ struct kvm_vcpu_arch {
> struct kvm_mmu_memory_cache mmu_shadowed_info_cache;
> struct kvm_mmu_memory_cache mmu_page_header_cache;
>
> + /*
> + * Protects change in size of mmu_shadow_page_cache cache.
> + */
> + spinlock_t mmu_shadow_page_cache_lock;
> +
> /*
> * QEMU userspace and the guest each have their own FPU state.
> * In vcpu_run, we switch between the user and guest FPU contexts.
> diff --git a/arch/x86/kvm/mmu/mmu.c b/arch/x86/kvm/mmu/mmu.c
> index 254bc46234e0..157417e1cb6e 100644
> --- a/arch/x86/kvm/mmu/mmu.c
> +++ b/arch/x86/kvm/mmu/mmu.c
> @@ -164,7 +164,10 @@ struct kvm_shadow_walk_iterator {
>
> static struct kmem_cache *pte_list_desc_cache;
> struct kmem_cache *mmu_page_header_cache;
> -static struct percpu_counter kvm_total_used_mmu_pages;
> +/*
> + * Total number of unused pages in MMU shadow page cache.
> + */
> +static struct percpu_counter kvm_total_unused_mmu_pages;
>
> static void mmu_spte_set(u64 *sptep, u64 spte);
>
> @@ -655,6 +658,22 @@ static void walk_shadow_page_lockless_end(struct kvm_vcpu *vcpu)
> }
> }
>
> +static int mmu_topup_sp_memory_cache(struct kvm_mmu_memory_cache *cache,
> + spinlock_t *cache_lock)
> +{
> + int orig_nobjs;
> + int r;
> +
> + spin_lock(cache_lock);
> + orig_nobjs = cache->nobjs;
> + r = kvm_mmu_topup_memory_cache(cache, PT64_ROOT_MAX_LEVEL);
> + if (orig_nobjs != cache->nobjs)
> + percpu_counter_add(&kvm_total_unused_mmu_pages,
> + (cache->nobjs - orig_nobjs));
> + spin_unlock(cache_lock);
> + return r;
> +}
> +
> static int mmu_topup_memory_caches(struct kvm_vcpu *vcpu, bool maybe_indirect)
> {
> int r;
> @@ -664,8 +683,8 @@ static int mmu_topup_memory_caches(struct kvm_vcpu *vcpu, bool maybe_indirect)
> 1 + PT64_ROOT_MAX_LEVEL + PTE_PREFETCH_NUM);
> if (r)
> return r;
> - r = kvm_mmu_topup_memory_cache(&vcpu->arch.mmu_shadow_page_cache,
> - PT64_ROOT_MAX_LEVEL);
> + r = mmu_topup_sp_memory_cache(&vcpu->arch.mmu_shadow_page_cache,
> + &vcpu->arch.mmu_shadow_page_cache_lock);
> if (r)
> return r;
> if (maybe_indirect) {
> @@ -678,10 +697,25 @@ static int mmu_topup_memory_caches(struct kvm_vcpu *vcpu, bool maybe_indirect)
> PT64_ROOT_MAX_LEVEL);
> }
>
> +static void mmu_free_sp_memory_cache(struct kvm_mmu_memory_cache *cache,
> + spinlock_t *cache_lock)
> +{
> + int orig_nobjs;
> +
> + spin_lock(cache_lock);
> + orig_nobjs = cache->nobjs;
> + kvm_mmu_free_memory_cache(cache);
> + if (orig_nobjs)
> + percpu_counter_sub(&kvm_total_unused_mmu_pages, orig_nobjs);
> +
> + spin_unlock(cache_lock);
> +}
> +
> static void mmu_free_memory_caches(struct kvm_vcpu *vcpu)
> {
> kvm_mmu_free_memory_cache(&vcpu->arch.mmu_pte_list_desc_cache);
> - kvm_mmu_free_memory_cache(&vcpu->arch.mmu_shadow_page_cache);
> + mmu_free_sp_memory_cache(&vcpu->arch.mmu_shadow_page_cache,
> + &vcpu->arch.mmu_shadow_page_cache_lock);
> kvm_mmu_free_memory_cache(&vcpu->arch.mmu_shadowed_info_cache);
> kvm_mmu_free_memory_cache(&vcpu->arch.mmu_page_header_cache);
> }
> @@ -1693,27 +1727,15 @@ static int is_empty_shadow_page(u64 *spt)
> }
> #endif
>
> -/*
> - * This value is the sum of all of the kvm instances's
> - * kvm->arch.n_used_mmu_pages values. We need a global,
> - * aggregate version in order to make the slab shrinker
> - * faster
> - */
> -static inline void kvm_mod_used_mmu_pages(struct kvm *kvm, long nr)
> -{
> - kvm->arch.n_used_mmu_pages += nr;
> - percpu_counter_add(&kvm_total_used_mmu_pages, nr);
> -}
> -
> static void kvm_account_mmu_page(struct kvm *kvm, struct kvm_mmu_page *sp)
> {
> - kvm_mod_used_mmu_pages(kvm, +1);
> + kvm->arch.n_used_mmu_pages++;
> kvm_account_pgtable_pages((void *)sp->spt, +1);
> }
>
> static void kvm_unaccount_mmu_page(struct kvm *kvm, struct kvm_mmu_page *sp)
> {
> - kvm_mod_used_mmu_pages(kvm, -1);
> + kvm->arch.n_used_mmu_pages--;
> kvm_account_pgtable_pages((void *)sp->spt, -1);
> }
>
> @@ -2150,8 +2172,31 @@ struct shadow_page_caches {
> struct kvm_mmu_memory_cache *page_header_cache;
> struct kvm_mmu_memory_cache *shadow_page_cache;
> struct kvm_mmu_memory_cache *shadowed_info_cache;
> + /*
> + * Protects change in size of shadow_page_cache cache.
> + */
> + spinlock_t *shadow_page_cache_lock;
> };
>
> +void *kvm_mmu_sp_memory_cache_alloc(struct kvm_mmu_memory_cache *shadow_page_cache,
> + spinlock_t *cache_lock)
> +{
> + int orig_nobjs;
> + void *page;
> +
> + if (!cache_lock) {
> + spin_lock(cache_lock);
> + orig_nobjs = shadow_page_cache->nobjs;
> + }

I believe this is guaranteed to cause a null pointer dereference.

> + page = kvm_mmu_memory_cache_alloc(shadow_page_cache);
> + if (!cache_lock) {
> + if (orig_nobjs)
> + percpu_counter_dec(&kvm_total_unused_mmu_pages);
> + spin_unlock(cache_lock);

Again, this will cause a null-pointer dereference. The check above
just needs to be inverted.

> + }
> + return page;
> +}
> +
> static struct kvm_mmu_page *kvm_mmu_alloc_shadow_page(struct kvm *kvm,
> struct shadow_page_caches *caches,
> gfn_t gfn,
> @@ -2161,7 +2206,8 @@ static struct kvm_mmu_page *kvm_mmu_alloc_shadow_page(struct kvm *kvm,
> struct kvm_mmu_page *sp;
>
> sp = kvm_mmu_memory_cache_alloc(caches->page_header_cache);
> - sp->spt = kvm_mmu_memory_cache_alloc(caches->shadow_page_cache);
> + sp->spt = kvm_mmu_sp_memory_cache_alloc(caches->shadow_page_cache,
> + caches->shadow_page_cache_lock);
> if (!role.direct)
> sp->shadowed_translation = kvm_mmu_memory_cache_alloc(caches->shadowed_info_cache);
>
> @@ -2218,6 +2264,7 @@ static struct kvm_mmu_page *kvm_mmu_get_shadow_page(struct kvm_vcpu *vcpu,
> .page_header_cache = &vcpu->arch.mmu_page_header_cache,
> .shadow_page_cache = &vcpu->arch.mmu_shadow_page_cache,
> .shadowed_info_cache = &vcpu->arch.mmu_shadowed_info_cache,
> + .shadow_page_cache_lock = &vcpu->arch.mmu_shadow_page_cache_lock
> };
>
> return __kvm_mmu_get_shadow_page(vcpu->kvm, vcpu, &caches, gfn, role);
> @@ -5916,6 +5963,7 @@ int kvm_mmu_create(struct kvm_vcpu *vcpu)
> vcpu->arch.mmu_page_header_cache.gfp_zero = __GFP_ZERO;
>
> vcpu->arch.mmu_shadow_page_cache.gfp_zero = __GFP_ZERO;
> + spin_lock_init(&vcpu->arch.mmu_shadow_page_cache_lock);
>
> vcpu->arch.mmu = &vcpu->arch.root_mmu;
> vcpu->arch.walk_mmu = &vcpu->arch.root_mmu;
> @@ -6051,11 +6099,6 @@ static void kvm_mmu_zap_all_fast(struct kvm *kvm)
> kvm_tdp_mmu_zap_invalidated_roots(kvm);
> }
>
> -static bool kvm_has_zapped_obsolete_pages(struct kvm *kvm)
> -{
> - return unlikely(!list_empty_careful(&kvm->arch.zapped_obsolete_pages));
> -}
> -
> static void kvm_mmu_invalidate_zap_pages_in_memslot(struct kvm *kvm,
> struct kvm_memory_slot *slot,
> struct kvm_page_track_notifier_node *node)
> @@ -6277,6 +6320,7 @@ static struct kvm_mmu_page *shadow_mmu_get_sp_for_split(struct kvm *kvm, u64 *hu
> /* Direct SPs do not require a shadowed_info_cache. */
> caches.page_header_cache = &kvm->arch.split_page_header_cache;
> caches.shadow_page_cache = &kvm->arch.split_shadow_page_cache;
> + caches.shadow_page_cache_lock = NULL;
>
> /* Safe to pass NULL for vCPU since requesting a direct SP. */
> return __kvm_mmu_get_shadow_page(kvm, NULL, &caches, gfn, role);
> @@ -6646,66 +6690,49 @@ void kvm_mmu_invalidate_mmio_sptes(struct kvm *kvm, u64 gen)
> static unsigned long
> mmu_shrink_scan(struct shrinker *shrink, struct shrink_control *sc)
> {
> - struct kvm *kvm;
> - int nr_to_scan = sc->nr_to_scan;
> + struct kvm_mmu_memory_cache *cache;
> + struct kvm *kvm, *first_kvm = NULL;
> unsigned long freed = 0;
> + /* spinlock for memory cache */
> + spinlock_t *cache_lock;
> + struct kvm_vcpu *vcpu;
> + unsigned long i;
>
> mutex_lock(&kvm_lock);
>
> list_for_each_entry(kvm, &vm_list, vm_list) {
> - int idx;
> - LIST_HEAD(invalid_list);
> -
> - /*
> - * Never scan more than sc->nr_to_scan VM instances.
> - * Will not hit this condition practically since we do not try
> - * to shrink more than one VM and it is very unlikely to see
> - * !n_used_mmu_pages so many times.
> - */
> - if (!nr_to_scan--)
> + if (first_kvm == kvm)
> break;
> - /*
> - * n_used_mmu_pages is accessed without holding kvm->mmu_lock
> - * here. We may skip a VM instance errorneosly, but we do not
> - * want to shrink a VM that only started to populate its MMU
> - * anyway.
> - */
> - if (!kvm->arch.n_used_mmu_pages &&
> - !kvm_has_zapped_obsolete_pages(kvm))
> - continue;
> + if (!first_kvm)
> + first_kvm = kvm;
> + list_move_tail(&kvm->vm_list, &vm_list);
>
> - idx = srcu_read_lock(&kvm->srcu);

I think we still want to do the SRCU read lock here to prevent
use-after-free on the vCPUs.

> - write_lock(&kvm->mmu_lock);
> + kvm_for_each_vcpu(i, vcpu, kvm) {
> + cache = &vcpu->arch.mmu_shadow_page_cache;
> + cache_lock = &vcpu->arch.mmu_shadow_page_cache_lock;
> + if (READ_ONCE(cache->nobjs)) {
> + spin_lock(cache_lock);
> + freed += kvm_mmu_empty_memory_cache(cache);

Would it make sense to just have kvm_mmu_empty_memory_cache()
decrement the per-cpu counter itself? I don't think there's much perf
to be gained by reducing percpu counter updates here and it would
consolidate the bookkeeping.

> + spin_unlock(cache_lock);
> + }
>
> - if (kvm_has_zapped_obsolete_pages(kvm)) {
> - kvm_mmu_commit_zap_page(kvm,
> - &kvm->arch.zapped_obsolete_pages);
> - goto unlock;
> }
>
> - freed = kvm_mmu_zap_oldest_mmu_pages(kvm, sc->nr_to_scan);
> -
> -unlock:
> - write_unlock(&kvm->mmu_lock);
> - srcu_read_unlock(&kvm->srcu, idx);
> -
> - /*
> - * unfair on small ones
> - * per-vm shrinkers cry out
> - * sadness comes quickly
> - */

Nooooo, don't delete the beautiful poem!

> - list_move_tail(&kvm->vm_list, &vm_list);
> - break;
> + if (freed >= sc->nr_to_scan)
> + break;
> }
>
> + if (freed)
> + percpu_counter_sub(&kvm_total_unused_mmu_pages, freed);
> mutex_unlock(&kvm_lock);
> + percpu_counter_sync(&kvm_total_unused_mmu_pages);
> return freed;
> }
>
> static unsigned long
> mmu_shrink_count(struct shrinker *shrink, struct shrink_control *sc)
> {
> - return percpu_counter_read_positive(&kvm_total_used_mmu_pages);
> + return percpu_counter_sum_positive(&kvm_total_unused_mmu_pages);

This will return 0 if the sum of all the per-cpu counters is negative.
It should never be negative though. Might be nice to add a warning if
we would get a negative sum.
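
Something like this (untested):

static unsigned long
mmu_shrink_count(struct shrinker *shrink, struct shrink_control *sc)
{
	s64 unused = percpu_counter_sum(&kvm_total_unused_mmu_pages);

	WARN_ON_ONCE(unused < 0);

	return unused > 0 ? unused : 0;
}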

> }
>
> static struct shrinker mmu_shrinker = {
> @@ -6820,7 +6847,7 @@ int kvm_mmu_vendor_module_init(void)
> if (!mmu_page_header_cache)
> goto out;
>
> - if (percpu_counter_init(&kvm_total_used_mmu_pages, 0, GFP_KERNEL))
> + if (percpu_counter_init(&kvm_total_unused_mmu_pages, 0, GFP_KERNEL))
> goto out;
>
> ret = register_shrinker(&mmu_shrinker, "x86-mmu");
> @@ -6830,7 +6857,7 @@ int kvm_mmu_vendor_module_init(void)
> return 0;
>
> out_shrinker:
> - percpu_counter_destroy(&kvm_total_used_mmu_pages);
> + percpu_counter_destroy(&kvm_total_unused_mmu_pages);
> out:
> mmu_destroy_caches();
> return ret;
> @@ -6847,7 +6874,7 @@ void kvm_mmu_destroy(struct kvm_vcpu *vcpu)
> void kvm_mmu_vendor_module_exit(void)
> {
> mmu_destroy_caches();
> - percpu_counter_destroy(&kvm_total_used_mmu_pages);
> + percpu_counter_destroy(&kvm_total_unused_mmu_pages);
> unregister_shrinker(&mmu_shrinker);
> }
>
> diff --git a/arch/x86/kvm/mmu/mmu_internal.h b/arch/x86/kvm/mmu/mmu_internal.h
> index ac00bfbf32f6..c2a342028b6a 100644
> --- a/arch/x86/kvm/mmu/mmu_internal.h
> +++ b/arch/x86/kvm/mmu/mmu_internal.h
> @@ -325,4 +325,6 @@ void *mmu_memory_cache_alloc(struct kvm_mmu_memory_cache *mc);
> void track_possible_nx_huge_page(struct kvm *kvm, struct kvm_mmu_page *sp);
> void untrack_possible_nx_huge_page(struct kvm *kvm, struct kvm_mmu_page *sp);
>
> +void *kvm_mmu_sp_memory_cache_alloc(struct kvm_mmu_memory_cache *shadow_page_cache,
> + spinlock_t *cache_lock);
> #endif /* __KVM_X86_MMU_INTERNAL_H */
> diff --git a/arch/x86/kvm/mmu/tdp_mmu.c b/arch/x86/kvm/mmu/tdp_mmu.c
> index 764f7c87286f..4974fa96deff 100644
> --- a/arch/x86/kvm/mmu/tdp_mmu.c
> +++ b/arch/x86/kvm/mmu/tdp_mmu.c
> @@ -264,7 +264,8 @@ static struct kvm_mmu_page *tdp_mmu_alloc_sp(struct kvm_vcpu *vcpu)
> struct kvm_mmu_page *sp;
>
> sp = kvm_mmu_memory_cache_alloc(&vcpu->arch.mmu_page_header_cache);
> - sp->spt = kvm_mmu_memory_cache_alloc(&vcpu->arch.mmu_shadow_page_cache);
> + sp->spt = kvm_mmu_sp_memory_cache_alloc(&vcpu->arch.mmu_shadow_page_cache,
> + &vcpu->arch.mmu_shadow_page_cache_lock);
>
> return sp;
> }
> diff --git a/include/linux/kvm_host.h b/include/linux/kvm_host.h
> index 01aad8b74162..efd9b38ea9a2 100644
> --- a/include/linux/kvm_host.h
> +++ b/include/linux/kvm_host.h
> @@ -1362,6 +1362,7 @@ void kvm_flush_remote_tlbs(struct kvm *kvm);
> int kvm_mmu_topup_memory_cache(struct kvm_mmu_memory_cache *mc, int min);
> int __kvm_mmu_topup_memory_cache(struct kvm_mmu_memory_cache *mc, int capacity, int min);
> int kvm_mmu_memory_cache_nr_free_objects(struct kvm_mmu_memory_cache *mc);
> +int kvm_mmu_empty_memory_cache(struct kvm_mmu_memory_cache *mc);
> void kvm_mmu_free_memory_cache(struct kvm_mmu_memory_cache *mc);
> void *kvm_mmu_memory_cache_alloc(struct kvm_mmu_memory_cache *mc);
> #endif
> diff --git a/virt/kvm/kvm_main.c b/virt/kvm/kvm_main.c
> index 13e88297f999..f2d762878b97 100644
> --- a/virt/kvm/kvm_main.c
> +++ b/virt/kvm/kvm_main.c
> @@ -438,8 +438,10 @@ int kvm_mmu_memory_cache_nr_free_objects(struct kvm_mmu_memory_cache *mc)
> return mc->nobjs;
> }
>
> -void kvm_mmu_free_memory_cache(struct kvm_mmu_memory_cache *mc)
> +int kvm_mmu_empty_memory_cache(struct kvm_mmu_memory_cache *mc)
> {
> + int freed = mc->nobjs;
> +
> while (mc->nobjs) {
> if (mc->kmem_cache)
> kmem_cache_free(mc->kmem_cache, mc->objects[--mc->nobjs]);
> @@ -447,8 +449,13 @@ void kvm_mmu_free_memory_cache(struct kvm_mmu_memory_cache *mc)
> free_page((unsigned long)mc->objects[--mc->nobjs]);
> }
>
> - kvfree(mc->objects);
> + return freed;
> +}
>
> +void kvm_mmu_free_memory_cache(struct kvm_mmu_memory_cache *mc)
> +{
> + kvm_mmu_empty_memory_cache(mc);
> + kvfree(mc->objects);
> mc->objects = NULL;
> mc->capacity = 0;
> }
> --
> 2.39.0.314.g84b9a713c41-goog
>

2022-12-28 22:22:35

by Vipin Sharma

Subject: Re: [Patch v3 1/9] KVM: x86/mmu: Repurpose KVM MMU shrinker to purge shadow page caches

On Tue, Dec 27, 2022 at 10:37 AM Ben Gardon <[email protected]> wrote:
>
> On Wed, Dec 21, 2022 at 6:35 PM Vipin Sharma <[email protected]> wrote:
> >
> > mmu_shrink_scan() is very disruptive to VMs. It picks the first
> > VM in the vm_list, zaps the oldest page which is most likely an upper
> > level SPTEs and most like to be reused. Prior to TDP MMU, this is even
> > more disruptive in nested VMs case, considering L1 SPTEs will be the
> > oldest even though most of the entries are for L2 SPTEs.
> >
> > As discussed in
> > https://lore.kernel.org/lkml/[email protected]/
> > shrinker logic has not be very useful in actually keeping VMs performant
> > and reducing memory usage.
> >
> > Change mmu_shrink_scan() to free pages from the vCPU's shadow page
> > cache. Freeing pages from cache doesn't cause vCPU exits, therefore, a
> > VM's performance should not be affected.
> >
> > This also allows to change cache capacities without worrying too much
> > about high memory usage in cache.
> >
> > Tested this change by running dirty_log_perf_test while dropping cache
> > via "echo 2 > /proc/sys/vm/drop_caches" at 1 second interval
> > continuously. There were WARN_ON(!mc->nobjs) messages printed in kernel
> > logs from kvm_mmu_memory_cache_alloc(), which is expected.
>
> Oh, that's not a good thing. I don't think we want to be hitting those
> warnings. For one, kernel warnings should not be expected behavior,
> probably for many reasons, but at least because Syzbot will find it.
> In this particular case, we don't want to hit that because in that
> case we'll try to do a GFP_ATOMIC, which can fail, and if it fails,
> we'll BUG:
>
> void *kvm_mmu_memory_cache_alloc(struct kvm_mmu_memory_cache *mc)
> {
> void *p;
>
> if (WARN_ON(!mc->nobjs))
> p = mmu_memory_cache_alloc_obj(mc, GFP_ATOMIC | __GFP_ACCOUNT);
> else
> p = mc->objects[--mc->nobjs];
> BUG_ON(!p);
> return p;
> }
>
> Perhaps the risk of actually panicking is small, but it probably
> indicates that we need better error handling around failed allocations
> from the cache.
> Or, the slightly less elegant approach might be to just hold the cache
> lock around the cache topup and use of pages from the cache, but
> adding better error handling would probably be cleaner.

I was counting on the fact that the shrinker will ideally run only in
extreme cases, i.e. when the host is running low on memory, so this
WARN_ON will only rarely be hit. I was not aware of Syzbot; it seems like
it will be a concern if it does this kind of testing.

I thought about keeping a mutex, taking it during topup and releasing it
after the whole operation is done, but I decided against it as the mutex
would be held for a long time and might block the memory shrinker longer.
I am not sure, though, if this is a valid concern.

I can't think of better error handling for this situation. I can change
the logic to hold a mutex if the above mutex hold duration concern isn't
an issue compared to the current WARN_ON() approach.

>
> >
> > Suggested-by: Sean Christopherson <[email protected]>
> > Signed-off-by: Vipin Sharma <[email protected]>
> > ---
> > arch/x86/include/asm/kvm_host.h | 5 +
> > arch/x86/kvm/mmu/mmu.c | 163 +++++++++++++++++++-------------
> > arch/x86/kvm/mmu/mmu_internal.h | 2 +
> > arch/x86/kvm/mmu/tdp_mmu.c | 3 +-
> > include/linux/kvm_host.h | 1 +
> > virt/kvm/kvm_main.c | 11 ++-
> > 6 files changed, 114 insertions(+), 71 deletions(-)
> >
> > diff --git a/arch/x86/include/asm/kvm_host.h b/arch/x86/include/asm/kvm_host.h
> > index aa4eb8cfcd7e..89cc809e4a00 100644
> > --- a/arch/x86/include/asm/kvm_host.h
> > +++ b/arch/x86/include/asm/kvm_host.h
> > @@ -786,6 +786,11 @@ struct kvm_vcpu_arch {
> > struct kvm_mmu_memory_cache mmu_shadowed_info_cache;
> > struct kvm_mmu_memory_cache mmu_page_header_cache;
> >
> > + /*
> > + * Protects change in size of mmu_shadow_page_cache cache.
> > + */
> > + spinlock_t mmu_shadow_page_cache_lock;
> > +
> > /*
> > * QEMU userspace and the guest each have their own FPU state.
> > * In vcpu_run, we switch between the user and guest FPU contexts.
> > diff --git a/arch/x86/kvm/mmu/mmu.c b/arch/x86/kvm/mmu/mmu.c
> > index 254bc46234e0..157417e1cb6e 100644
> > --- a/arch/x86/kvm/mmu/mmu.c
> > +++ b/arch/x86/kvm/mmu/mmu.c
> > @@ -164,7 +164,10 @@ struct kvm_shadow_walk_iterator {
> >
> > static struct kmem_cache *pte_list_desc_cache;
> > struct kmem_cache *mmu_page_header_cache;
> > -static struct percpu_counter kvm_total_used_mmu_pages;
> > +/*
> > + * Total number of unused pages in MMU shadow page cache.
> > + */
> > +static struct percpu_counter kvm_total_unused_mmu_pages;
> >
> > static void mmu_spte_set(u64 *sptep, u64 spte);
> >
> > @@ -655,6 +658,22 @@ static void walk_shadow_page_lockless_end(struct kvm_vcpu *vcpu)
> > }
> > }
> >
> > +static int mmu_topup_sp_memory_cache(struct kvm_mmu_memory_cache *cache,
> > + spinlock_t *cache_lock)
> > +{
> > + int orig_nobjs;
> > + int r;
> > +
> > + spin_lock(cache_lock);
> > + orig_nobjs = cache->nobjs;
> > + r = kvm_mmu_topup_memory_cache(cache, PT64_ROOT_MAX_LEVEL);
> > + if (orig_nobjs != cache->nobjs)
> > + percpu_counter_add(&kvm_total_unused_mmu_pages,
> > + (cache->nobjs - orig_nobjs));
> > + spin_unlock(cache_lock);
> > + return r;
> > +}
> > +
> > static int mmu_topup_memory_caches(struct kvm_vcpu *vcpu, bool maybe_indirect)
> > {
> > int r;
> > @@ -664,8 +683,8 @@ static int mmu_topup_memory_caches(struct kvm_vcpu *vcpu, bool maybe_indirect)
> > 1 + PT64_ROOT_MAX_LEVEL + PTE_PREFETCH_NUM);
> > if (r)
> > return r;
> > - r = kvm_mmu_topup_memory_cache(&vcpu->arch.mmu_shadow_page_cache,
> > - PT64_ROOT_MAX_LEVEL);
> > + r = mmu_topup_sp_memory_cache(&vcpu->arch.mmu_shadow_page_cache,
> > + &vcpu->arch.mmu_shadow_page_cache_lock);
> > if (r)
> > return r;
> > if (maybe_indirect) {
> > @@ -678,10 +697,25 @@ static int mmu_topup_memory_caches(struct kvm_vcpu *vcpu, bool maybe_indirect)
> > PT64_ROOT_MAX_LEVEL);
> > }
> >
> > +static void mmu_free_sp_memory_cache(struct kvm_mmu_memory_cache *cache,
> > + spinlock_t *cache_lock)
> > +{
> > + int orig_nobjs;
> > +
> > + spin_lock(cache_lock);
> > + orig_nobjs = cache->nobjs;
> > + kvm_mmu_free_memory_cache(cache);
> > + if (orig_nobjs)
> > + percpu_counter_sub(&kvm_total_unused_mmu_pages, orig_nobjs);
> > +
> > + spin_unlock(cache_lock);
> > +}
> > +
> > static void mmu_free_memory_caches(struct kvm_vcpu *vcpu)
> > {
> > kvm_mmu_free_memory_cache(&vcpu->arch.mmu_pte_list_desc_cache);
> > - kvm_mmu_free_memory_cache(&vcpu->arch.mmu_shadow_page_cache);
> > + mmu_free_sp_memory_cache(&vcpu->arch.mmu_shadow_page_cache,
> > + &vcpu->arch.mmu_shadow_page_cache_lock);
> > kvm_mmu_free_memory_cache(&vcpu->arch.mmu_shadowed_info_cache);
> > kvm_mmu_free_memory_cache(&vcpu->arch.mmu_page_header_cache);
> > }
> > @@ -1693,27 +1727,15 @@ static int is_empty_shadow_page(u64 *spt)
> > }
> > #endif
> >
> > -/*
> > - * This value is the sum of all of the kvm instances's
> > - * kvm->arch.n_used_mmu_pages values. We need a global,
> > - * aggregate version in order to make the slab shrinker
> > - * faster
> > - */
> > -static inline void kvm_mod_used_mmu_pages(struct kvm *kvm, long nr)
> > -{
> > - kvm->arch.n_used_mmu_pages += nr;
> > - percpu_counter_add(&kvm_total_used_mmu_pages, nr);
> > -}
> > -
> > static void kvm_account_mmu_page(struct kvm *kvm, struct kvm_mmu_page *sp)
> > {
> > - kvm_mod_used_mmu_pages(kvm, +1);
> > + kvm->arch.n_used_mmu_pages++;
> > kvm_account_pgtable_pages((void *)sp->spt, +1);
> > }
> >
> > static void kvm_unaccount_mmu_page(struct kvm *kvm, struct kvm_mmu_page *sp)
> > {
> > - kvm_mod_used_mmu_pages(kvm, -1);
> > + kvm->arch.n_used_mmu_pages--;
> > kvm_account_pgtable_pages((void *)sp->spt, -1);
> > }
> >
> > @@ -2150,8 +2172,31 @@ struct shadow_page_caches {
> > struct kvm_mmu_memory_cache *page_header_cache;
> > struct kvm_mmu_memory_cache *shadow_page_cache;
> > struct kvm_mmu_memory_cache *shadowed_info_cache;
> > + /*
> > + * Protects change in size of shadow_page_cache cache.
> > + */
> > + spinlock_t *shadow_page_cache_lock;
> > };
> >
> > +void *kvm_mmu_sp_memory_cache_alloc(struct kvm_mmu_memory_cache *shadow_page_cache,
> > + spinlock_t *cache_lock)
> > +{
> > + int orig_nobjs;
> > + void *page;
> > +
> > + if (!cache_lock) {
> > + spin_lock(cache_lock);
> > + orig_nobjs = shadow_page_cache->nobjs;
> > + }
>
> I believe this is guaranteed to cause a null pointer dereference.
>
> > + page = kvm_mmu_memory_cache_alloc(shadow_page_cache);
> > + if (!cache_lock) {
> > + if (orig_nobjs)
> > + percpu_counter_dec(&kvm_total_unused_mmu_pages);
> > + spin_unlock(cache_lock);
>
> Again, this will cause a null-pointer dereference. The check above
> just needs to be inverted.

Yes, I forgot to change it in this commit, and a later patch in the series
removes this whole "if (!cache_lock)" condition, so it skipped my
attention. Thanks for catching it.

>
> > + }
> > + return page;
> > +}
> > +
> > static struct kvm_mmu_page *kvm_mmu_alloc_shadow_page(struct kvm *kvm,
> > struct shadow_page_caches *caches,
> > gfn_t gfn,
> > @@ -2161,7 +2206,8 @@ static struct kvm_mmu_page *kvm_mmu_alloc_shadow_page(struct kvm *kvm,
> > struct kvm_mmu_page *sp;
> >
> > sp = kvm_mmu_memory_cache_alloc(caches->page_header_cache);
> > - sp->spt = kvm_mmu_memory_cache_alloc(caches->shadow_page_cache);
> > + sp->spt = kvm_mmu_sp_memory_cache_alloc(caches->shadow_page_cache,
> > + caches->shadow_page_cache_lock);
> > if (!role.direct)
> > sp->shadowed_translation = kvm_mmu_memory_cache_alloc(caches->shadowed_info_cache);
> >
> > @@ -2218,6 +2264,7 @@ static struct kvm_mmu_page *kvm_mmu_get_shadow_page(struct kvm_vcpu *vcpu,
> > .page_header_cache = &vcpu->arch.mmu_page_header_cache,
> > .shadow_page_cache = &vcpu->arch.mmu_shadow_page_cache,
> > .shadowed_info_cache = &vcpu->arch.mmu_shadowed_info_cache,
> > + .shadow_page_cache_lock = &vcpu->arch.mmu_shadow_page_cache_lock
> > };
> >
> > return __kvm_mmu_get_shadow_page(vcpu->kvm, vcpu, &caches, gfn, role);
> > @@ -5916,6 +5963,7 @@ int kvm_mmu_create(struct kvm_vcpu *vcpu)
> > vcpu->arch.mmu_page_header_cache.gfp_zero = __GFP_ZERO;
> >
> > vcpu->arch.mmu_shadow_page_cache.gfp_zero = __GFP_ZERO;
> > + spin_lock_init(&vcpu->arch.mmu_shadow_page_cache_lock);
> >
> > vcpu->arch.mmu = &vcpu->arch.root_mmu;
> > vcpu->arch.walk_mmu = &vcpu->arch.root_mmu;
> > @@ -6051,11 +6099,6 @@ static void kvm_mmu_zap_all_fast(struct kvm *kvm)
> > kvm_tdp_mmu_zap_invalidated_roots(kvm);
> > }
> >
> > -static bool kvm_has_zapped_obsolete_pages(struct kvm *kvm)
> > -{
> > - return unlikely(!list_empty_careful(&kvm->arch.zapped_obsolete_pages));
> > -}
> > -
> > static void kvm_mmu_invalidate_zap_pages_in_memslot(struct kvm *kvm,
> > struct kvm_memory_slot *slot,
> > struct kvm_page_track_notifier_node *node)
> > @@ -6277,6 +6320,7 @@ static struct kvm_mmu_page *shadow_mmu_get_sp_for_split(struct kvm *kvm, u64 *hu
> > /* Direct SPs do not require a shadowed_info_cache. */
> > caches.page_header_cache = &kvm->arch.split_page_header_cache;
> > caches.shadow_page_cache = &kvm->arch.split_shadow_page_cache;
> > + caches.shadow_page_cache_lock = NULL;
> >
> > /* Safe to pass NULL for vCPU since requesting a direct SP. */
> > return __kvm_mmu_get_shadow_page(kvm, NULL, &caches, gfn, role);
> > @@ -6646,66 +6690,49 @@ void kvm_mmu_invalidate_mmio_sptes(struct kvm *kvm, u64 gen)
> > static unsigned long
> > mmu_shrink_scan(struct shrinker *shrink, struct shrink_control *sc)
> > {
> > - struct kvm *kvm;
> > - int nr_to_scan = sc->nr_to_scan;
> > + struct kvm_mmu_memory_cache *cache;
> > + struct kvm *kvm, *first_kvm = NULL;
> > unsigned long freed = 0;
> > + /* spinlock for memory cache */
> > + spinlock_t *cache_lock;
> > + struct kvm_vcpu *vcpu;
> > + unsigned long i;
> >
> > mutex_lock(&kvm_lock);
> >
> > list_for_each_entry(kvm, &vm_list, vm_list) {
> > - int idx;
> > - LIST_HEAD(invalid_list);
> > -
> > - /*
> > - * Never scan more than sc->nr_to_scan VM instances.
> > - * Will not hit this condition practically since we do not try
> > - * to shrink more than one VM and it is very unlikely to see
> > - * !n_used_mmu_pages so many times.
> > - */
> > - if (!nr_to_scan--)
> > + if (first_kvm == kvm)
> > break;
> > - /*
> > - * n_used_mmu_pages is accessed without holding kvm->mmu_lock
> > - * here. We may skip a VM instance errorneosly, but we do not
> > - * want to shrink a VM that only started to populate its MMU
> > - * anyway.
> > - */
> > - if (!kvm->arch.n_used_mmu_pages &&
> > - !kvm_has_zapped_obsolete_pages(kvm))
> > - continue;
> > + if (!first_kvm)
> > + first_kvm = kvm;
> > + list_move_tail(&kvm->vm_list, &vm_list);
> >
> > - idx = srcu_read_lock(&kvm->srcu);
>
> I think we still want to do the SRCU read lock here to prevent
> use-after-free on the vCPUs.

Since I am holding kvm_lock, a kvm will not be removed from vm_list; this
blocks kvm_destroy_vm() from going further to destroy vCPUs via
kvm_arch_destroy_vm() -> kvm_destroy_vcpus(). Do we still need the
srcu_read_lock()? Also, kvm_for_each_vcpu() uses xa_for_each_range(),
which uses RCU for traversing the loop; won't these two be sufficient to
avoid needing srcu_read_lock() here?

>
> > - write_lock(&kvm->mmu_lock);
> > + kvm_for_each_vcpu(i, vcpu, kvm) {
> > + cache = &vcpu->arch.mmu_shadow_page_cache;
> > + cache_lock = &vcpu->arch.mmu_shadow_page_cache_lock;
> > + if (READ_ONCE(cache->nobjs)) {
> > + spin_lock(cache_lock);
> > + freed += kvm_mmu_empty_memory_cache(cache);
>
> Would it make sense to just have kvm_mmu_empty_memory_cache()
> decrement the per-cpu counter itself? I don't think there's much perf
> to be gained by reducing percpu counter updates here and it would
> consolidate the bookkeeping.

kvm_mmu_empty_memory_cache() is also used by other caches for which we
are not keeping a count.

>
> > + spin_unlock(cache_lock);
> > + }
> >
> > - if (kvm_has_zapped_obsolete_pages(kvm)) {
> > - kvm_mmu_commit_zap_page(kvm,
> > - &kvm->arch.zapped_obsolete_pages);
> > - goto unlock;
> > }
> >
> > - freed = kvm_mmu_zap_oldest_mmu_pages(kvm, sc->nr_to_scan);
> > -
> > -unlock:
> > - write_unlock(&kvm->mmu_lock);
> > - srcu_read_unlock(&kvm->srcu, idx);
> > -
> > - /*
> > - * unfair on small ones
> > - * per-vm shrinkers cry out
> > - * sadness comes quickly
> > - */
>
> Nooooo, don't delete the beautiful poem!

I will fix this mistake in the next version, pardon my ignorance :)

>
> > - list_move_tail(&kvm->vm_list, &vm_list);
> > - break;
> > + if (freed >= sc->nr_to_scan)
> > + break;
> > }
> >
> > + if (freed)
> > + percpu_counter_sub(&kvm_total_unused_mmu_pages, freed);
> > mutex_unlock(&kvm_lock);
> > + percpu_counter_sync(&kvm_total_unused_mmu_pages);
> > return freed;
> > }
> >
> > static unsigned long
> > mmu_shrink_count(struct shrinker *shrink, struct shrink_control *sc)
> > {
> > - return percpu_counter_read_positive(&kvm_total_used_mmu_pages);
> > + return percpu_counter_sum_positive(&kvm_total_unused_mmu_pages);
>
> This will return 0 if the sum of all the per-cpu counters is negative.
> It should never be negative though. Might be nice to add a warning if
> we would get a negative sum.
>

Sounds good.


> > }
> >
> > static struct shrinker mmu_shrinker = {
> > @@ -6820,7 +6847,7 @@ int kvm_mmu_vendor_module_init(void)
> > if (!mmu_page_header_cache)
> > goto out;
> >
> > - if (percpu_counter_init(&kvm_total_used_mmu_pages, 0, GFP_KERNEL))
> > + if (percpu_counter_init(&kvm_total_unused_mmu_pages, 0, GFP_KERNEL))
> > goto out;
> >
> > ret = register_shrinker(&mmu_shrinker, "x86-mmu");
> > @@ -6830,7 +6857,7 @@ int kvm_mmu_vendor_module_init(void)
> > return 0;
> >
> > out_shrinker:
> > - percpu_counter_destroy(&kvm_total_used_mmu_pages);
> > + percpu_counter_destroy(&kvm_total_unused_mmu_pages);
> > out:
> > mmu_destroy_caches();
> > return ret;
> > @@ -6847,7 +6874,7 @@ void kvm_mmu_destroy(struct kvm_vcpu *vcpu)
> > void kvm_mmu_vendor_module_exit(void)
> > {
> > mmu_destroy_caches();
> > - percpu_counter_destroy(&kvm_total_used_mmu_pages);
> > + percpu_counter_destroy(&kvm_total_unused_mmu_pages);
> > unregister_shrinker(&mmu_shrinker);
> > }
> >
> > diff --git a/arch/x86/kvm/mmu/mmu_internal.h b/arch/x86/kvm/mmu/mmu_internal.h
> > index ac00bfbf32f6..c2a342028b6a 100644
> > --- a/arch/x86/kvm/mmu/mmu_internal.h
> > +++ b/arch/x86/kvm/mmu/mmu_internal.h
> > @@ -325,4 +325,6 @@ void *mmu_memory_cache_alloc(struct kvm_mmu_memory_cache *mc);
> > void track_possible_nx_huge_page(struct kvm *kvm, struct kvm_mmu_page *sp);
> > void untrack_possible_nx_huge_page(struct kvm *kvm, struct kvm_mmu_page *sp);
> >
> > +void *kvm_mmu_sp_memory_cache_alloc(struct kvm_mmu_memory_cache *shadow_page_cache,
> > + spinlock_t *cache_lock);
> > #endif /* __KVM_X86_MMU_INTERNAL_H */
> > diff --git a/arch/x86/kvm/mmu/tdp_mmu.c b/arch/x86/kvm/mmu/tdp_mmu.c
> > index 764f7c87286f..4974fa96deff 100644
> > --- a/arch/x86/kvm/mmu/tdp_mmu.c
> > +++ b/arch/x86/kvm/mmu/tdp_mmu.c
> > @@ -264,7 +264,8 @@ static struct kvm_mmu_page *tdp_mmu_alloc_sp(struct kvm_vcpu *vcpu)
> > struct kvm_mmu_page *sp;
> >
> > sp = kvm_mmu_memory_cache_alloc(&vcpu->arch.mmu_page_header_cache);
> > - sp->spt = kvm_mmu_memory_cache_alloc(&vcpu->arch.mmu_shadow_page_cache);
> > + sp->spt = kvm_mmu_sp_memory_cache_alloc(&vcpu->arch.mmu_shadow_page_cache,
> > + &vcpu->arch.mmu_shadow_page_cache_lock);
> >
> > return sp;
> > }
> > diff --git a/include/linux/kvm_host.h b/include/linux/kvm_host.h
> > index 01aad8b74162..efd9b38ea9a2 100644
> > --- a/include/linux/kvm_host.h
> > +++ b/include/linux/kvm_host.h
> > @@ -1362,6 +1362,7 @@ void kvm_flush_remote_tlbs(struct kvm *kvm);
> > int kvm_mmu_topup_memory_cache(struct kvm_mmu_memory_cache *mc, int min);
> > int __kvm_mmu_topup_memory_cache(struct kvm_mmu_memory_cache *mc, int capacity, int min);
> > int kvm_mmu_memory_cache_nr_free_objects(struct kvm_mmu_memory_cache *mc);
> > +int kvm_mmu_empty_memory_cache(struct kvm_mmu_memory_cache *mc);
> > void kvm_mmu_free_memory_cache(struct kvm_mmu_memory_cache *mc);
> > void *kvm_mmu_memory_cache_alloc(struct kvm_mmu_memory_cache *mc);
> > #endif
> > diff --git a/virt/kvm/kvm_main.c b/virt/kvm/kvm_main.c
> > index 13e88297f999..f2d762878b97 100644
> > --- a/virt/kvm/kvm_main.c
> > +++ b/virt/kvm/kvm_main.c
> > @@ -438,8 +438,10 @@ int kvm_mmu_memory_cache_nr_free_objects(struct kvm_mmu_memory_cache *mc)
> > return mc->nobjs;
> > }
> >
> > -void kvm_mmu_free_memory_cache(struct kvm_mmu_memory_cache *mc)
> > +int kvm_mmu_empty_memory_cache(struct kvm_mmu_memory_cache *mc)
> > {
> > + int freed = mc->nobjs;
> > +
> > while (mc->nobjs) {
> > if (mc->kmem_cache)
> > kmem_cache_free(mc->kmem_cache, mc->objects[--mc->nobjs]);
> > @@ -447,8 +449,13 @@ void kvm_mmu_free_memory_cache(struct kvm_mmu_memory_cache *mc)
> > free_page((unsigned long)mc->objects[--mc->nobjs]);
> > }
> >
> > - kvfree(mc->objects);
> > + return freed;
> > +}
> >
> > +void kvm_mmu_free_memory_cache(struct kvm_mmu_memory_cache *mc)
> > +{
> > + kvm_mmu_empty_memory_cache(mc);
> > + kvfree(mc->objects);
> > mc->objects = NULL;
> > mc->capacity = 0;
> > }
> > --
> > 2.39.0.314.g84b9a713c41-goog
> >

2022-12-29 21:52:34

by David Matlack

Subject: Re: [Patch v3 1/9] KVM: x86/mmu: Repurpose KVM MMU shrinker to purge shadow page caches

On Wed, Dec 28, 2022 at 02:07:49PM -0800, Vipin Sharma wrote:
> On Tue, Dec 27, 2022 at 10:37 AM Ben Gardon <[email protected]> wrote:
> > On Wed, Dec 21, 2022 at 6:35 PM Vipin Sharma <[email protected]> wrote:
> > >
> > > Tested this change by running dirty_log_perf_test while dropping cache
> > > via "echo 2 > /proc/sys/vm/drop_caches" at 1 second interval
> > > continuously. There were WARN_ON(!mc->nobjs) messages printed in kernel
> > > logs from kvm_mmu_memory_cache_alloc(), which is expected.
> >
> > Oh, that's not a good thing. I don't think we want to be hitting those
> > warnings. For one, kernel warnings should not be expected behavior,
> > probably for many reasons, but at least because Syzbot will find it.
> > In this particular case, we don't want to hit that because in that
> > case we'll try to do a GFP_ATOMIC, which can fail, and if it fails,
> > we'll BUG:
> >
> > void *kvm_mmu_memory_cache_alloc(struct kvm_mmu_memory_cache *mc)
> > {
> > void *p;
> >
> > if (WARN_ON(!mc->nobjs))
> > p = mmu_memory_cache_alloc_obj(mc, GFP_ATOMIC | __GFP_ACCOUNT);
> > else
> > p = mc->objects[--mc->nobjs];
> > BUG_ON(!p);
> > return p;
> > }
> >
> > Perhaps the risk of actually panicking is small, but it probably
> > indicates that we need better error handling around failed allocations
> > from the cache.
> > Or, the slightly less elegant approach might be to just hold the cache
> > lock around the cache topup and use of pages from the cache, but
> > adding better error handling would probably be cleaner.
>
> I was counting on the fact that shrinker will ideally run only in
> extreme cases, i.e. host is running on low memory. So, this WARN_ON
> will only be rarely used. I was not aware of Syzbot, it seems like it
> will be a concern if it does this kind of testing.

In an extreme low-memory situation, forcing vCPUs to do GFP_ATOMIC
allocations to handle page faults is risky. Plus it's a waste of time to
free that memory since it's just going to get immediately reallocated.

>
> I thought about keeping a mutex, taking it during topup and releasing
> it after the whole operation is done but I stopped it as the duration
> of holding mutex will be long and might block the memory shrinker
> longer. I am not sure though, if this is a valid concern.

Use mutex_trylock() to skip any vCPUs that are currently handling page
faults.
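
Roughly something like the following (rough, untested sketch: it assumes
mmu_shadow_page_cache_lock is converted from a spinlock to a mutex that
vCPUs hold across cache topup and allocation, and mmu_shrink_vcpu_cache()
is a made-up helper the shrinker would call from its kvm_for_each_vcpu()
loop):

static unsigned long mmu_shrink_vcpu_cache(struct kvm_vcpu *vcpu)
{
        unsigned long freed;

        /* Skip vCPUs that are currently handling a page fault. */
        if (!mutex_trylock(&vcpu->arch.mmu_shadow_page_cache_lock))
                return 0;

        freed = kvm_mmu_empty_memory_cache(&vcpu->arch.mmu_shadow_page_cache);
        mutex_unlock(&vcpu->arch.mmu_shadow_page_cache_lock);

        return freed;
}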

2022-12-29 22:01:56

by David Matlack

[permalink] [raw]
Subject: Re: [Patch v3 1/9] KVM: x86/mmu: Repurpose KVM MMU shrinker to purge shadow page caches

On Wed, Dec 21, 2022 at 06:34:49PM -0800, Vipin Sharma wrote:
> mmu_shrink_scan() is very disruptive to VMs. It picks the first
> VM in the vm_list, zaps the oldest page which is most likely an upper
> level SPTEs and most like to be reused. Prior to TDP MMU, this is even
> more disruptive in nested VMs case, considering L1 SPTEs will be the
> oldest even though most of the entries are for L2 SPTEs.
>
> As discussed in
> https://lore.kernel.org/lkml/[email protected]/
> shrinker logic has not be very useful in actually keeping VMs performant
> and reducing memory usage.
>
> Change mmu_shrink_scan() to free pages from the vCPU's shadow page
> cache. Freeing pages from cache doesn't cause vCPU exits, therefore, a
> VM's performance should not be affected.

Can you split this commit up? e.g. First drop the old shrinking logic in
one commit (but leave the shrinking infrastructure in place). Then a
commit to make the shrinker free the per-vCPU shadow page caches. And
then perhaps another to make the shrinker free the per-VM shadow page
cache used for eager splitting.

>
> This also allows to change cache capacities without worrying too much
> about high memory usage in cache.
>
> Tested this change by running dirty_log_perf_test while dropping cache
> via "echo 2 > /proc/sys/vm/drop_caches" at 1 second interval
> continuously. There were WARN_ON(!mc->nobjs) messages printed in kernel
> logs from kvm_mmu_memory_cache_alloc(), which is expected.
>
> Suggested-by: Sean Christopherson <[email protected]>
> Signed-off-by: Vipin Sharma <[email protected]>
> ---
> arch/x86/include/asm/kvm_host.h | 5 +
> arch/x86/kvm/mmu/mmu.c | 163 +++++++++++++++++++-------------
> arch/x86/kvm/mmu/mmu_internal.h | 2 +
> arch/x86/kvm/mmu/tdp_mmu.c | 3 +-
> include/linux/kvm_host.h | 1 +
> virt/kvm/kvm_main.c | 11 ++-
> 6 files changed, 114 insertions(+), 71 deletions(-)
>
> diff --git a/arch/x86/include/asm/kvm_host.h b/arch/x86/include/asm/kvm_host.h
> index aa4eb8cfcd7e..89cc809e4a00 100644
> --- a/arch/x86/include/asm/kvm_host.h
> +++ b/arch/x86/include/asm/kvm_host.h
> @@ -786,6 +786,11 @@ struct kvm_vcpu_arch {
> struct kvm_mmu_memory_cache mmu_shadowed_info_cache;
> struct kvm_mmu_memory_cache mmu_page_header_cache;
>
> + /*
> + * Protects change in size of mmu_shadow_page_cache cache.
> + */
> + spinlock_t mmu_shadow_page_cache_lock;
> +
> /*
> * QEMU userspace and the guest each have their own FPU state.
> * In vcpu_run, we switch between the user and guest FPU contexts.
> diff --git a/arch/x86/kvm/mmu/mmu.c b/arch/x86/kvm/mmu/mmu.c
> index 254bc46234e0..157417e1cb6e 100644
> --- a/arch/x86/kvm/mmu/mmu.c
> +++ b/arch/x86/kvm/mmu/mmu.c
> @@ -164,7 +164,10 @@ struct kvm_shadow_walk_iterator {
>
> static struct kmem_cache *pte_list_desc_cache;
> struct kmem_cache *mmu_page_header_cache;
> -static struct percpu_counter kvm_total_used_mmu_pages;
> +/*
> + * Total number of unused pages in MMU shadow page cache.
> + */
> +static struct percpu_counter kvm_total_unused_mmu_pages;
>
> static void mmu_spte_set(u64 *sptep, u64 spte);
>
> @@ -655,6 +658,22 @@ static void walk_shadow_page_lockless_end(struct kvm_vcpu *vcpu)
> }
> }
>
> +static int mmu_topup_sp_memory_cache(struct kvm_mmu_memory_cache *cache,
> + spinlock_t *cache_lock)
> +{
> + int orig_nobjs;
> + int r;
> +
> + spin_lock(cache_lock);
> + orig_nobjs = cache->nobjs;
> + r = kvm_mmu_topup_memory_cache(cache, PT64_ROOT_MAX_LEVEL);
> + if (orig_nobjs != cache->nobjs)
> + percpu_counter_add(&kvm_total_unused_mmu_pages,
> + (cache->nobjs - orig_nobjs));
> + spin_unlock(cache_lock);
> + return r;
> +}
> +
> static int mmu_topup_memory_caches(struct kvm_vcpu *vcpu, bool maybe_indirect)
> {
> int r;
> @@ -664,8 +683,8 @@ static int mmu_topup_memory_caches(struct kvm_vcpu *vcpu, bool maybe_indirect)
> 1 + PT64_ROOT_MAX_LEVEL + PTE_PREFETCH_NUM);
> if (r)
> return r;
> - r = kvm_mmu_topup_memory_cache(&vcpu->arch.mmu_shadow_page_cache,
> - PT64_ROOT_MAX_LEVEL);
> + r = mmu_topup_sp_memory_cache(&vcpu->arch.mmu_shadow_page_cache,
> + &vcpu->arch.mmu_shadow_page_cache_lock);
> if (r)
> return r;
> if (maybe_indirect) {
> @@ -678,10 +697,25 @@ static int mmu_topup_memory_caches(struct kvm_vcpu *vcpu, bool maybe_indirect)
> PT64_ROOT_MAX_LEVEL);
> }
>
> +static void mmu_free_sp_memory_cache(struct kvm_mmu_memory_cache *cache,
> + spinlock_t *cache_lock)
> +{
> + int orig_nobjs;
> +
> + spin_lock(cache_lock);
> + orig_nobjs = cache->nobjs;
> + kvm_mmu_free_memory_cache(cache);
> + if (orig_nobjs)
> + percpu_counter_sub(&kvm_total_unused_mmu_pages, orig_nobjs);
> +
> + spin_unlock(cache_lock);
> +}

It would be nice to avoid adding these wrapper functions.

Once you add a mutex to protect the caches from being freed while vCPUs
are in the middle of a page fault you can drop the spin lock. After that
the only reason to have these wrappers is to update
kvm_total_unused_mmu_pages.

Do we really need kvm_total_unused_mmu_pages? Why not just dynamically
calculate the number of unused pages in mmu_shrink_count()? Or just
estimate the count, e.g. num_vcpus * KVM_ARCH_NR_OBJS_PER_MEMORY_CACHE?
Or have per-VM or per-vCPU shrinkers to avoid needing to do any
aggregation?
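
e.g. the estimate-only version could look something like this (untested
sketch, and it glosses over whether taking kvm_lock in the count callback
is acceptable from reclaim context):

static unsigned long
mmu_shrink_count(struct shrinker *shrink, struct shrink_control *sc)
{
        unsigned long count = 0;
        struct kvm *kvm;

        /* Upper bound: assume every vCPU's shadow page cache is full. */
        mutex_lock(&kvm_lock);
        list_for_each_entry(kvm, &vm_list, vm_list)
                count += atomic_read(&kvm->online_vcpus) *
                         KVM_ARCH_NR_OBJS_PER_MEMORY_CACHE;
        mutex_unlock(&kvm_lock);

        return count;
}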

> +
> static void mmu_free_memory_caches(struct kvm_vcpu *vcpu)
> {
> kvm_mmu_free_memory_cache(&vcpu->arch.mmu_pte_list_desc_cache);
> - kvm_mmu_free_memory_cache(&vcpu->arch.mmu_shadow_page_cache);
> + mmu_free_sp_memory_cache(&vcpu->arch.mmu_shadow_page_cache,
> + &vcpu->arch.mmu_shadow_page_cache_lock);
> kvm_mmu_free_memory_cache(&vcpu->arch.mmu_shadowed_info_cache);

mmu_shadowed_info_cache can be freed by the shrinker as well.

> kvm_mmu_free_memory_cache(&vcpu->arch.mmu_page_header_cache);
> }
> @@ -1693,27 +1727,15 @@ static int is_empty_shadow_page(u64 *spt)
> }
> #endif
>
> -/*
> - * This value is the sum of all of the kvm instances's
> - * kvm->arch.n_used_mmu_pages values. We need a global,
> - * aggregate version in order to make the slab shrinker
> - * faster
> - */
> -static inline void kvm_mod_used_mmu_pages(struct kvm *kvm, long nr)
> -{
> - kvm->arch.n_used_mmu_pages += nr;
> - percpu_counter_add(&kvm_total_used_mmu_pages, nr);
> -}
> -
> static void kvm_account_mmu_page(struct kvm *kvm, struct kvm_mmu_page *sp)
> {
> - kvm_mod_used_mmu_pages(kvm, +1);
> + kvm->arch.n_used_mmu_pages++;
> kvm_account_pgtable_pages((void *)sp->spt, +1);
> }
>
> static void kvm_unaccount_mmu_page(struct kvm *kvm, struct kvm_mmu_page *sp)
> {
> - kvm_mod_used_mmu_pages(kvm, -1);
> + kvm->arch.n_used_mmu_pages--;
> kvm_account_pgtable_pages((void *)sp->spt, -1);
> }
>
> @@ -2150,8 +2172,31 @@ struct shadow_page_caches {
> struct kvm_mmu_memory_cache *page_header_cache;
> struct kvm_mmu_memory_cache *shadow_page_cache;
> struct kvm_mmu_memory_cache *shadowed_info_cache;
> + /*
> + * Protects change in size of shadow_page_cache cache.
> + */
> + spinlock_t *shadow_page_cache_lock;
> };
>
> +void *kvm_mmu_sp_memory_cache_alloc(struct kvm_mmu_memory_cache *shadow_page_cache,
> + spinlock_t *cache_lock)
> +{
> + int orig_nobjs;
> + void *page;
> +
> + if (!cache_lock) {
> + spin_lock(cache_lock);
> + orig_nobjs = shadow_page_cache->nobjs;
> + }
> + page = kvm_mmu_memory_cache_alloc(shadow_page_cache);
> + if (!cache_lock) {
> + if (orig_nobjs)
> + percpu_counter_dec(&kvm_total_unused_mmu_pages);
> + spin_unlock(cache_lock);
> + }
> + return page;
> +}
> +
> static struct kvm_mmu_page *kvm_mmu_alloc_shadow_page(struct kvm *kvm,
> struct shadow_page_caches *caches,
> gfn_t gfn,
> @@ -2161,7 +2206,8 @@ static struct kvm_mmu_page *kvm_mmu_alloc_shadow_page(struct kvm *kvm,
> struct kvm_mmu_page *sp;
>
> sp = kvm_mmu_memory_cache_alloc(caches->page_header_cache);
> - sp->spt = kvm_mmu_memory_cache_alloc(caches->shadow_page_cache);
> + sp->spt = kvm_mmu_sp_memory_cache_alloc(caches->shadow_page_cache,
> + caches->shadow_page_cache_lock);
> if (!role.direct)
> sp->shadowed_translation = kvm_mmu_memory_cache_alloc(caches->shadowed_info_cache);
>
> @@ -2218,6 +2264,7 @@ static struct kvm_mmu_page *kvm_mmu_get_shadow_page(struct kvm_vcpu *vcpu,
> .page_header_cache = &vcpu->arch.mmu_page_header_cache,
> .shadow_page_cache = &vcpu->arch.mmu_shadow_page_cache,
> .shadowed_info_cache = &vcpu->arch.mmu_shadowed_info_cache,
> + .shadow_page_cache_lock = &vcpu->arch.mmu_shadow_page_cache_lock
> };
>
> return __kvm_mmu_get_shadow_page(vcpu->kvm, vcpu, &caches, gfn, role);
> @@ -5916,6 +5963,7 @@ int kvm_mmu_create(struct kvm_vcpu *vcpu)
> vcpu->arch.mmu_page_header_cache.gfp_zero = __GFP_ZERO;
>
> vcpu->arch.mmu_shadow_page_cache.gfp_zero = __GFP_ZERO;
> + spin_lock_init(&vcpu->arch.mmu_shadow_page_cache_lock);
>
> vcpu->arch.mmu = &vcpu->arch.root_mmu;
> vcpu->arch.walk_mmu = &vcpu->arch.root_mmu;
> @@ -6051,11 +6099,6 @@ static void kvm_mmu_zap_all_fast(struct kvm *kvm)
> kvm_tdp_mmu_zap_invalidated_roots(kvm);
> }
>
> -static bool kvm_has_zapped_obsolete_pages(struct kvm *kvm)
> -{
> - return unlikely(!list_empty_careful(&kvm->arch.zapped_obsolete_pages));
> -}
> -
> static void kvm_mmu_invalidate_zap_pages_in_memslot(struct kvm *kvm,
> struct kvm_memory_slot *slot,
> struct kvm_page_track_notifier_node *node)
> @@ -6277,6 +6320,7 @@ static struct kvm_mmu_page *shadow_mmu_get_sp_for_split(struct kvm *kvm, u64 *hu
> /* Direct SPs do not require a shadowed_info_cache. */
> caches.page_header_cache = &kvm->arch.split_page_header_cache;
> caches.shadow_page_cache = &kvm->arch.split_shadow_page_cache;
> + caches.shadow_page_cache_lock = NULL;
>
> /* Safe to pass NULL for vCPU since requesting a direct SP. */
> return __kvm_mmu_get_shadow_page(kvm, NULL, &caches, gfn, role);
> @@ -6646,66 +6690,49 @@ void kvm_mmu_invalidate_mmio_sptes(struct kvm *kvm, u64 gen)
> static unsigned long
> mmu_shrink_scan(struct shrinker *shrink, struct shrink_control *sc)
> {
> - struct kvm *kvm;
> - int nr_to_scan = sc->nr_to_scan;
> + struct kvm_mmu_memory_cache *cache;
> + struct kvm *kvm, *first_kvm = NULL;
> unsigned long freed = 0;
> + /* spinlock for memory cache */
> + spinlock_t *cache_lock;
> + struct kvm_vcpu *vcpu;
> + unsigned long i;
>
> mutex_lock(&kvm_lock);
>
> list_for_each_entry(kvm, &vm_list, vm_list) {
> - int idx;
> - LIST_HEAD(invalid_list);
> -
> - /*
> - * Never scan more than sc->nr_to_scan VM instances.
> - * Will not hit this condition practically since we do not try
> - * to shrink more than one VM and it is very unlikely to see
> - * !n_used_mmu_pages so many times.
> - */
> - if (!nr_to_scan--)
> + if (first_kvm == kvm)
> break;
> - /*
> - * n_used_mmu_pages is accessed without holding kvm->mmu_lock
> - * here. We may skip a VM instance errorneosly, but we do not
> - * want to shrink a VM that only started to populate its MMU
> - * anyway.
> - */
> - if (!kvm->arch.n_used_mmu_pages &&
> - !kvm_has_zapped_obsolete_pages(kvm))
> - continue;
> + if (!first_kvm)
> + first_kvm = kvm;
> + list_move_tail(&kvm->vm_list, &vm_list);
>
> - idx = srcu_read_lock(&kvm->srcu);
> - write_lock(&kvm->mmu_lock);
> + kvm_for_each_vcpu(i, vcpu, kvm) {

What protects this from racing with vCPU creation/deletion?

> + cache = &vcpu->arch.mmu_shadow_page_cache;
> + cache_lock = &vcpu->arch.mmu_shadow_page_cache_lock;
> + if (READ_ONCE(cache->nobjs)) {
> + spin_lock(cache_lock);
> + freed += kvm_mmu_empty_memory_cache(cache);
> + spin_unlock(cache_lock);
> + }

What about freeing kvm->arch.split_shadow_page_cache as well?

>
> - if (kvm_has_zapped_obsolete_pages(kvm)) {
> - kvm_mmu_commit_zap_page(kvm,
> - &kvm->arch.zapped_obsolete_pages);
> - goto unlock;
> }
>
> - freed = kvm_mmu_zap_oldest_mmu_pages(kvm, sc->nr_to_scan);
> -
> -unlock:
> - write_unlock(&kvm->mmu_lock);
> - srcu_read_unlock(&kvm->srcu, idx);
> -
> - /*
> - * unfair on small ones
> - * per-vm shrinkers cry out
> - * sadness comes quickly
> - */
> - list_move_tail(&kvm->vm_list, &vm_list);
> - break;
> + if (freed >= sc->nr_to_scan)
> + break;
> }
>
> + if (freed)
> + percpu_counter_sub(&kvm_total_unused_mmu_pages, freed);
> mutex_unlock(&kvm_lock);
> + percpu_counter_sync(&kvm_total_unused_mmu_pages);
> return freed;
> }
>
> static unsigned long
> mmu_shrink_count(struct shrinker *shrink, struct shrink_control *sc)
> {
> - return percpu_counter_read_positive(&kvm_total_used_mmu_pages);
> + return percpu_counter_sum_positive(&kvm_total_unused_mmu_pages);
> }
>
> static struct shrinker mmu_shrinker = {
> @@ -6820,7 +6847,7 @@ int kvm_mmu_vendor_module_init(void)
> if (!mmu_page_header_cache)
> goto out;
>
> - if (percpu_counter_init(&kvm_total_used_mmu_pages, 0, GFP_KERNEL))
> + if (percpu_counter_init(&kvm_total_unused_mmu_pages, 0, GFP_KERNEL))
> goto out;
>
> ret = register_shrinker(&mmu_shrinker, "x86-mmu");
> @@ -6830,7 +6857,7 @@ int kvm_mmu_vendor_module_init(void)
> return 0;
>
> out_shrinker:
> - percpu_counter_destroy(&kvm_total_used_mmu_pages);
> + percpu_counter_destroy(&kvm_total_unused_mmu_pages);
> out:
> mmu_destroy_caches();
> return ret;
> @@ -6847,7 +6874,7 @@ void kvm_mmu_destroy(struct kvm_vcpu *vcpu)
> void kvm_mmu_vendor_module_exit(void)
> {
> mmu_destroy_caches();
> - percpu_counter_destroy(&kvm_total_used_mmu_pages);
> + percpu_counter_destroy(&kvm_total_unused_mmu_pages);
> unregister_shrinker(&mmu_shrinker);
> }
>
> diff --git a/arch/x86/kvm/mmu/mmu_internal.h b/arch/x86/kvm/mmu/mmu_internal.h
> index ac00bfbf32f6..c2a342028b6a 100644
> --- a/arch/x86/kvm/mmu/mmu_internal.h
> +++ b/arch/x86/kvm/mmu/mmu_internal.h
> @@ -325,4 +325,6 @@ void *mmu_memory_cache_alloc(struct kvm_mmu_memory_cache *mc);
> void track_possible_nx_huge_page(struct kvm *kvm, struct kvm_mmu_page *sp);
> void untrack_possible_nx_huge_page(struct kvm *kvm, struct kvm_mmu_page *sp);
>
> +void *kvm_mmu_sp_memory_cache_alloc(struct kvm_mmu_memory_cache *shadow_page_cache,
> + spinlock_t *cache_lock);
> #endif /* __KVM_X86_MMU_INTERNAL_H */
> diff --git a/arch/x86/kvm/mmu/tdp_mmu.c b/arch/x86/kvm/mmu/tdp_mmu.c
> index 764f7c87286f..4974fa96deff 100644
> --- a/arch/x86/kvm/mmu/tdp_mmu.c
> +++ b/arch/x86/kvm/mmu/tdp_mmu.c
> @@ -264,7 +264,8 @@ static struct kvm_mmu_page *tdp_mmu_alloc_sp(struct kvm_vcpu *vcpu)
> struct kvm_mmu_page *sp;
>
> sp = kvm_mmu_memory_cache_alloc(&vcpu->arch.mmu_page_header_cache);
> - sp->spt = kvm_mmu_memory_cache_alloc(&vcpu->arch.mmu_shadow_page_cache);
> + sp->spt = kvm_mmu_sp_memory_cache_alloc(&vcpu->arch.mmu_shadow_page_cache,
> + &vcpu->arch.mmu_shadow_page_cache_lock);
>
> return sp;
> }
> diff --git a/include/linux/kvm_host.h b/include/linux/kvm_host.h
> index 01aad8b74162..efd9b38ea9a2 100644
> --- a/include/linux/kvm_host.h
> +++ b/include/linux/kvm_host.h
> @@ -1362,6 +1362,7 @@ void kvm_flush_remote_tlbs(struct kvm *kvm);
> int kvm_mmu_topup_memory_cache(struct kvm_mmu_memory_cache *mc, int min);
> int __kvm_mmu_topup_memory_cache(struct kvm_mmu_memory_cache *mc, int capacity, int min);
> int kvm_mmu_memory_cache_nr_free_objects(struct kvm_mmu_memory_cache *mc);
> +int kvm_mmu_empty_memory_cache(struct kvm_mmu_memory_cache *mc);
> void kvm_mmu_free_memory_cache(struct kvm_mmu_memory_cache *mc);
> void *kvm_mmu_memory_cache_alloc(struct kvm_mmu_memory_cache *mc);
> #endif
> diff --git a/virt/kvm/kvm_main.c b/virt/kvm/kvm_main.c
> index 13e88297f999..f2d762878b97 100644
> --- a/virt/kvm/kvm_main.c
> +++ b/virt/kvm/kvm_main.c
> @@ -438,8 +438,10 @@ int kvm_mmu_memory_cache_nr_free_objects(struct kvm_mmu_memory_cache *mc)
> return mc->nobjs;
> }
>
> -void kvm_mmu_free_memory_cache(struct kvm_mmu_memory_cache *mc)
> +int kvm_mmu_empty_memory_cache(struct kvm_mmu_memory_cache *mc)
> {
> + int freed = mc->nobjs;
> +
> while (mc->nobjs) {
> if (mc->kmem_cache)
> kmem_cache_free(mc->kmem_cache, mc->objects[--mc->nobjs]);
> @@ -447,8 +449,13 @@ void kvm_mmu_free_memory_cache(struct kvm_mmu_memory_cache *mc)
> free_page((unsigned long)mc->objects[--mc->nobjs]);
> }
>
> - kvfree(mc->objects);
> + return freed;
> +}
>
> +void kvm_mmu_free_memory_cache(struct kvm_mmu_memory_cache *mc)
> +{
> + kvm_mmu_empty_memory_cache(mc);
> + kvfree(mc->objects);
> mc->objects = NULL;
> mc->capacity = 0;
> }
> --
> 2.39.0.314.g84b9a713c41-goog
>

2022-12-29 23:04:00

by David Matlack

[permalink] [raw]
Subject: Re: [Patch v3 2/9] KVM: x86/mmu: Remove zapped_obsolete_pages from struct kvm_arch{}

On Wed, Dec 21, 2022 at 06:34:50PM -0800, Vipin Sharma wrote:
> zapped_obsolete_pages list was used in struct kvm_arch{} to provide
> pages for KVM MMU shrinker. This is not needed now as KVM MMU shrinker
> has been repurposed to free shadow page caches and not
> zapped_obsolete_pages.
>
> Remove zapped_obsolete_pages from struct kvm_arch{} and use local list
> in kvm_zap_obsolete_pages().
>
> Signed-off-by: Vipin Sharma <[email protected]>

Reviewed-by: David Matlack <[email protected]>

> ---
> arch/x86/include/asm/kvm_host.h | 1 -
> arch/x86/kvm/mmu/mmu.c | 8 ++++----
> 2 files changed, 4 insertions(+), 5 deletions(-)
>
> diff --git a/arch/x86/include/asm/kvm_host.h b/arch/x86/include/asm/kvm_host.h
> index 89cc809e4a00..f89f02e18080 100644
> --- a/arch/x86/include/asm/kvm_host.h
> +++ b/arch/x86/include/asm/kvm_host.h
> @@ -1215,7 +1215,6 @@ struct kvm_arch {
> u8 mmu_valid_gen;
> struct hlist_head mmu_page_hash[KVM_NUM_MMU_PAGES];
> struct list_head active_mmu_pages;
> - struct list_head zapped_obsolete_pages;
> /*
> * A list of kvm_mmu_page structs that, if zapped, could possibly be
> * replaced by an NX huge page. A shadow page is on this list if its
> diff --git a/arch/x86/kvm/mmu/mmu.c b/arch/x86/kvm/mmu/mmu.c
> index 157417e1cb6e..3364760a1695 100644
> --- a/arch/x86/kvm/mmu/mmu.c
> +++ b/arch/x86/kvm/mmu/mmu.c
> @@ -5987,6 +5987,7 @@ static void kvm_zap_obsolete_pages(struct kvm *kvm)
> {
> struct kvm_mmu_page *sp, *node;
> int nr_zapped, batch = 0;
> + LIST_HEAD(zapped_pages);

optional nit: The common name of this is invalid_list (see other callers
of __kvm_mmu_prepare_zap_page()).

2023-01-03 17:55:17

by Vipin Sharma

[permalink] [raw]
Subject: Re: [Patch v3 1/9] KVM: x86/mmu: Repurpose KVM MMU shrinker to purge shadow page caches

On Thu, Dec 29, 2022 at 1:15 PM David Matlack <[email protected]> wrote:
>
> On Wed, Dec 28, 2022 at 02:07:49PM -0800, Vipin Sharma wrote:
> > On Tue, Dec 27, 2022 at 10:37 AM Ben Gardon <[email protected]> wrote:
> > > On Wed, Dec 21, 2022 at 6:35 PM Vipin Sharma <[email protected]> wrote:
> > > >
> > > > Tested this change by running dirty_log_perf_test while dropping cache
> > > > via "echo 2 > /proc/sys/vm/drop_caches" at 1 second interval
> > > > continuously. There were WARN_ON(!mc->nobjs) messages printed in kernel
> > > > logs from kvm_mmu_memory_cache_alloc(), which is expected.
> > >
> > > Oh, that's not a good thing. I don't think we want to be hitting those
> > > warnings. For one, kernel warnings should not be expected behavior,
> > > probably for many reasons, but at least because Syzbot will find it.
> > > In this particular case, we don't want to hit that because in that
> > > case we'll try to do a GFP_ATOMIC, which can fail, and if it fails,
> > > we'll BUG:
> > >
> > > void *kvm_mmu_memory_cache_alloc(struct kvm_mmu_memory_cache *mc)
> > > {
> > > void *p;
> > >
> > > if (WARN_ON(!mc->nobjs))
> > > p = mmu_memory_cache_alloc_obj(mc, GFP_ATOMIC | __GFP_ACCOUNT);
> > > else
> > > p = mc->objects[--mc->nobjs];
> > > BUG_ON(!p);
> > > return p;
> > > }
> > >
> > > Perhaps the risk of actually panicking is small, but it probably
> > > indicates that we need better error handling around failed allocations
> > > from the cache.
> > > Or, the slightly less elegant approach might be to just hold the cache
> > > lock around the cache topup and use of pages from the cache, but
> > > adding better error handling would probably be cleaner.
> >
> > I was counting on the fact that shrinker will ideally run only in
> > extreme cases, i.e. host is running on low memory. So, this WARN_ON
> > will only be rarely used. I was not aware of Syzbot, it seems like it
> > will be a concern if it does this kind of testing.
>
> In an extreme low-memory situation, forcing vCPUs to do GFP_ATOMIC
> allocations to handle page faults is risky. Plus it's a waste of time to
> free that memory since it's just going to get immediately reallocated.
>
> >
> > I thought about keeping a mutex, taking it during topup and releasing
> > it after the whole operation is done but I stopped it as the duration
> > of holding mutex will be long and might block the memory shrinker
> > longer. I am not sure though, if this is a valid concern.
>
> Use mutex_trylock() to skip any vCPUs that are currently handling page
> faults.

oh yeah! Thanks.

2023-01-03 18:05:38

by Vipin Sharma

[permalink] [raw]
Subject: Re: [Patch v3 1/9] KVM: x86/mmu: Repurpose KVM MMU shrinker to purge shadow page caches

On Thu, Dec 29, 2022 at 1:55 PM David Matlack <[email protected]> wrote:
>
> On Wed, Dec 21, 2022 at 06:34:49PM -0800, Vipin Sharma wrote:
> > mmu_shrink_scan() is very disruptive to VMs. It picks the first
> > VM in the vm_list, zaps the oldest page which is most likely an upper
> > level SPTEs and most like to be reused. Prior to TDP MMU, this is even
> > more disruptive in nested VMs case, considering L1 SPTEs will be the
> > oldest even though most of the entries are for L2 SPTEs.
> >
> > As discussed in
> > https://lore.kernel.org/lkml/[email protected]/
> > shrinker logic has not be very useful in actually keeping VMs performant
> > and reducing memory usage.
> >
> > Change mmu_shrink_scan() to free pages from the vCPU's shadow page
> > cache. Freeing pages from cache doesn't cause vCPU exits, therefore, a
> > VM's performance should not be affected.
>
> Can you split this commit up? e.g. First drop the old shrinking logic in
> one commit (but leave the shrinking infrastructure in place). Then a
> commit to make the shrinker free the per-vCPU shadow page caches. And
> then perhaps another to make the shrinker free the per-VM shadow page
> cache used for eager splitting.
>

Sounds good, I will separate it into two parts: one for dropping the old
logic and one for adding the per-vCPU shadow page caches. Patch 3 already
enables the shrinker to free the per-VM shadow page cache.

> >
> > This also allows to change cache capacities without worrying too much
> > about high memory usage in cache.
> >
> > Tested this change by running dirty_log_perf_test while dropping cache
> > via "echo 2 > /proc/sys/vm/drop_caches" at 1 second interval
> > continuously. There were WARN_ON(!mc->nobjs) messages printed in kernel
> > logs from kvm_mmu_memory_cache_alloc(), which is expected.
> >
> > Suggested-by: Sean Christopherson <[email protected]>
> > Signed-off-by: Vipin Sharma <[email protected]>
> > ---
> > arch/x86/include/asm/kvm_host.h | 5 +
> > arch/x86/kvm/mmu/mmu.c | 163 +++++++++++++++++++-------------
> > arch/x86/kvm/mmu/mmu_internal.h | 2 +
> > arch/x86/kvm/mmu/tdp_mmu.c | 3 +-
> > include/linux/kvm_host.h | 1 +
> > virt/kvm/kvm_main.c | 11 ++-
> > 6 files changed, 114 insertions(+), 71 deletions(-)
> >
> > diff --git a/arch/x86/include/asm/kvm_host.h b/arch/x86/include/asm/kvm_host.h
> > index aa4eb8cfcd7e..89cc809e4a00 100644
> > --- a/arch/x86/include/asm/kvm_host.h
> > +++ b/arch/x86/include/asm/kvm_host.h
> > @@ -786,6 +786,11 @@ struct kvm_vcpu_arch {
> > struct kvm_mmu_memory_cache mmu_shadowed_info_cache;
> > struct kvm_mmu_memory_cache mmu_page_header_cache;
> >
> > + /*
> > + * Protects change in size of mmu_shadow_page_cache cache.
> > + */
> > + spinlock_t mmu_shadow_page_cache_lock;
> > +
> > /*
> > * QEMU userspace and the guest each have their own FPU state.
> > * In vcpu_run, we switch between the user and guest FPU contexts.
> > diff --git a/arch/x86/kvm/mmu/mmu.c b/arch/x86/kvm/mmu/mmu.c
> > index 254bc46234e0..157417e1cb6e 100644
> > --- a/arch/x86/kvm/mmu/mmu.c
> > +++ b/arch/x86/kvm/mmu/mmu.c
> > @@ -164,7 +164,10 @@ struct kvm_shadow_walk_iterator {
> >
> > static struct kmem_cache *pte_list_desc_cache;
> > struct kmem_cache *mmu_page_header_cache;
> > -static struct percpu_counter kvm_total_used_mmu_pages;
> > +/*
> > + * Total number of unused pages in MMU shadow page cache.
> > + */
> > +static struct percpu_counter kvm_total_unused_mmu_pages;
> >
> > static void mmu_spte_set(u64 *sptep, u64 spte);
> >
> > @@ -655,6 +658,22 @@ static void walk_shadow_page_lockless_end(struct kvm_vcpu *vcpu)
> > }
> > }
> >
> > +static int mmu_topup_sp_memory_cache(struct kvm_mmu_memory_cache *cache,
> > + spinlock_t *cache_lock)
> > +{
> > + int orig_nobjs;
> > + int r;
> > +
> > + spin_lock(cache_lock);
> > + orig_nobjs = cache->nobjs;
> > + r = kvm_mmu_topup_memory_cache(cache, PT64_ROOT_MAX_LEVEL);
> > + if (orig_nobjs != cache->nobjs)
> > + percpu_counter_add(&kvm_total_unused_mmu_pages,
> > + (cache->nobjs - orig_nobjs));
> > + spin_unlock(cache_lock);
> > + return r;
> > +}
> > +
> > static int mmu_topup_memory_caches(struct kvm_vcpu *vcpu, bool maybe_indirect)
> > {
> > int r;
> > @@ -664,8 +683,8 @@ static int mmu_topup_memory_caches(struct kvm_vcpu *vcpu, bool maybe_indirect)
> > 1 + PT64_ROOT_MAX_LEVEL + PTE_PREFETCH_NUM);
> > if (r)
> > return r;
> > - r = kvm_mmu_topup_memory_cache(&vcpu->arch.mmu_shadow_page_cache,
> > - PT64_ROOT_MAX_LEVEL);
> > + r = mmu_topup_sp_memory_cache(&vcpu->arch.mmu_shadow_page_cache,
> > + &vcpu->arch.mmu_shadow_page_cache_lock);
> > if (r)
> > return r;
> > if (maybe_indirect) {
> > @@ -678,10 +697,25 @@ static int mmu_topup_memory_caches(struct kvm_vcpu *vcpu, bool maybe_indirect)
> > PT64_ROOT_MAX_LEVEL);
> > }
> >
> > +static void mmu_free_sp_memory_cache(struct kvm_mmu_memory_cache *cache,
> > + spinlock_t *cache_lock)
> > +{
> > + int orig_nobjs;
> > +
> > + spin_lock(cache_lock);
> > + orig_nobjs = cache->nobjs;
> > + kvm_mmu_free_memory_cache(cache);
> > + if (orig_nobjs)
> > + percpu_counter_sub(&kvm_total_unused_mmu_pages, orig_nobjs);
> > +
> > + spin_unlock(cache_lock);
> > +}
>
> It would be nice to avoid adding these wrapper functions.
>
> Once you add a mutex to protect the caches from being freed while vCPUs
> are in the middle of a page fault you can drop the spin lock. After that
> the only reason to have these wrappers is to update
> kvm_total_unused_mmu_pages.
>
> Do we really need kvm_total_unused_mmu_pages? Why not just dynamically
> calculate the number of unused pages in mmu_shrink_count()? Or just
> estimate the count, e.g. num_vcpus * KVM_ARCH_NR_OBJS_PER_MEMORY_CACHE?
> Or have per-VM or per-vCPU shrinkers to avoid needing to do any
> aggregation?
>

I think we can drop this; by default mmu_shrink_count() can return
num_kvms * num_vcpus * nodes * KVM_ARCH_NR_OBJS_PER_MEMORY_CACHE.

Whenever mmu_shrink_scan() is called, if there are no pages to free then
it can return SHRINK_STOP, which will stop any subsequent calls during
that time.
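
Something like this at the end of mmu_shrink_scan() (untested sketch, the
actual freeing loop is elided):

static unsigned long
mmu_shrink_scan(struct shrinker *shrink, struct shrink_control *sc)
{
        unsigned long freed = 0;
        struct kvm *kvm;

        mutex_lock(&kvm_lock);
        list_for_each_entry(kvm, &vm_list, vm_list) {
                /* ... empty per-vCPU and per-VM shadow page caches here ... */
                if (freed >= sc->nr_to_scan)
                        break;
        }
        mutex_unlock(&kvm_lock);

        /* Nothing to free, tell the shrinker core to stop calling us. */
        return freed ? freed : SHRINK_STOP;
}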


> > +
> > static void mmu_free_memory_caches(struct kvm_vcpu *vcpu)
> > {
> > kvm_mmu_free_memory_cache(&vcpu->arch.mmu_pte_list_desc_cache);
> > - kvm_mmu_free_memory_cache(&vcpu->arch.mmu_shadow_page_cache);
> > + mmu_free_sp_memory_cache(&vcpu->arch.mmu_shadow_page_cache,
> > + &vcpu->arch.mmu_shadow_page_cache_lock);
> > kvm_mmu_free_memory_cache(&vcpu->arch.mmu_shadowed_info_cache);
>
> mmu_shadowed_info_cache can be freed by the shrinker as well.
>
> > kvm_mmu_free_memory_cache(&vcpu->arch.mmu_page_header_cache);
> > }
> > @@ -1693,27 +1727,15 @@ static int is_empty_shadow_page(u64 *spt)
> > }
> > #endif
> >
> > -/*
> > - * This value is the sum of all of the kvm instances's
> > - * kvm->arch.n_used_mmu_pages values. We need a global,
> > - * aggregate version in order to make the slab shrinker
> > - * faster
> > - */
> > -static inline void kvm_mod_used_mmu_pages(struct kvm *kvm, long nr)
> > -{
> > - kvm->arch.n_used_mmu_pages += nr;
> > - percpu_counter_add(&kvm_total_used_mmu_pages, nr);
> > -}
> > -
> > static void kvm_account_mmu_page(struct kvm *kvm, struct kvm_mmu_page *sp)
> > {
> > - kvm_mod_used_mmu_pages(kvm, +1);
> > + kvm->arch.n_used_mmu_pages++;
> > kvm_account_pgtable_pages((void *)sp->spt, +1);
> > }
> >
> > static void kvm_unaccount_mmu_page(struct kvm *kvm, struct kvm_mmu_page *sp)
> > {
> > - kvm_mod_used_mmu_pages(kvm, -1);
> > + kvm->arch.n_used_mmu_pages--;
> > kvm_account_pgtable_pages((void *)sp->spt, -1);
> > }
> >
> > @@ -2150,8 +2172,31 @@ struct shadow_page_caches {
> > struct kvm_mmu_memory_cache *page_header_cache;
> > struct kvm_mmu_memory_cache *shadow_page_cache;
> > struct kvm_mmu_memory_cache *shadowed_info_cache;
> > + /*
> > + * Protects change in size of shadow_page_cache cache.
> > + */
> > + spinlock_t *shadow_page_cache_lock;
> > };
> >
> > +void *kvm_mmu_sp_memory_cache_alloc(struct kvm_mmu_memory_cache *shadow_page_cache,
> > + spinlock_t *cache_lock)
> > +{
> > + int orig_nobjs;
> > + void *page;
> > +
> > + if (!cache_lock) {
> > + spin_lock(cache_lock);
> > + orig_nobjs = shadow_page_cache->nobjs;
> > + }
> > + page = kvm_mmu_memory_cache_alloc(shadow_page_cache);
> > + if (!cache_lock) {
> > + if (orig_nobjs)
> > + percpu_counter_dec(&kvm_total_unused_mmu_pages);
> > + spin_unlock(cache_lock);
> > + }
> > + return page;
> > +}
> > +
> > static struct kvm_mmu_page *kvm_mmu_alloc_shadow_page(struct kvm *kvm,
> > struct shadow_page_caches *caches,
> > gfn_t gfn,
> > @@ -2161,7 +2206,8 @@ static struct kvm_mmu_page *kvm_mmu_alloc_shadow_page(struct kvm *kvm,
> > struct kvm_mmu_page *sp;
> >
> > sp = kvm_mmu_memory_cache_alloc(caches->page_header_cache);
> > - sp->spt = kvm_mmu_memory_cache_alloc(caches->shadow_page_cache);
> > + sp->spt = kvm_mmu_sp_memory_cache_alloc(caches->shadow_page_cache,
> > + caches->shadow_page_cache_lock);
> > if (!role.direct)
> > sp->shadowed_translation = kvm_mmu_memory_cache_alloc(caches->shadowed_info_cache);
> >
> > @@ -2218,6 +2264,7 @@ static struct kvm_mmu_page *kvm_mmu_get_shadow_page(struct kvm_vcpu *vcpu,
> > .page_header_cache = &vcpu->arch.mmu_page_header_cache,
> > .shadow_page_cache = &vcpu->arch.mmu_shadow_page_cache,
> > .shadowed_info_cache = &vcpu->arch.mmu_shadowed_info_cache,
> > + .shadow_page_cache_lock = &vcpu->arch.mmu_shadow_page_cache_lock
> > };
> >
> > return __kvm_mmu_get_shadow_page(vcpu->kvm, vcpu, &caches, gfn, role);
> > @@ -5916,6 +5963,7 @@ int kvm_mmu_create(struct kvm_vcpu *vcpu)
> > vcpu->arch.mmu_page_header_cache.gfp_zero = __GFP_ZERO;
> >
> > vcpu->arch.mmu_shadow_page_cache.gfp_zero = __GFP_ZERO;
> > + spin_lock_init(&vcpu->arch.mmu_shadow_page_cache_lock);
> >
> > vcpu->arch.mmu = &vcpu->arch.root_mmu;
> > vcpu->arch.walk_mmu = &vcpu->arch.root_mmu;
> > @@ -6051,11 +6099,6 @@ static void kvm_mmu_zap_all_fast(struct kvm *kvm)
> > kvm_tdp_mmu_zap_invalidated_roots(kvm);
> > }
> >
> > -static bool kvm_has_zapped_obsolete_pages(struct kvm *kvm)
> > -{
> > - return unlikely(!list_empty_careful(&kvm->arch.zapped_obsolete_pages));
> > -}
> > -
> > static void kvm_mmu_invalidate_zap_pages_in_memslot(struct kvm *kvm,
> > struct kvm_memory_slot *slot,
> > struct kvm_page_track_notifier_node *node)
> > @@ -6277,6 +6320,7 @@ static struct kvm_mmu_page *shadow_mmu_get_sp_for_split(struct kvm *kvm, u64 *hu
> > /* Direct SPs do not require a shadowed_info_cache. */
> > caches.page_header_cache = &kvm->arch.split_page_header_cache;
> > caches.shadow_page_cache = &kvm->arch.split_shadow_page_cache;
> > + caches.shadow_page_cache_lock = NULL;
> >
> > /* Safe to pass NULL for vCPU since requesting a direct SP. */
> > return __kvm_mmu_get_shadow_page(kvm, NULL, &caches, gfn, role);
> > @@ -6646,66 +6690,49 @@ void kvm_mmu_invalidate_mmio_sptes(struct kvm *kvm, u64 gen)
> > static unsigned long
> > mmu_shrink_scan(struct shrinker *shrink, struct shrink_control *sc)
> > {
> > - struct kvm *kvm;
> > - int nr_to_scan = sc->nr_to_scan;
> > + struct kvm_mmu_memory_cache *cache;
> > + struct kvm *kvm, *first_kvm = NULL;
> > unsigned long freed = 0;
> > + /* spinlock for memory cache */
> > + spinlock_t *cache_lock;
> > + struct kvm_vcpu *vcpu;
> > + unsigned long i;
> >
> > mutex_lock(&kvm_lock);
> >
> > list_for_each_entry(kvm, &vm_list, vm_list) {
> > - int idx;
> > - LIST_HEAD(invalid_list);
> > -
> > - /*
> > - * Never scan more than sc->nr_to_scan VM instances.
> > - * Will not hit this condition practically since we do not try
> > - * to shrink more than one VM and it is very unlikely to see
> > - * !n_used_mmu_pages so many times.
> > - */
> > - if (!nr_to_scan--)
> > + if (first_kvm == kvm)
> > break;
> > - /*
> > - * n_used_mmu_pages is accessed without holding kvm->mmu_lock
> > - * here. We may skip a VM instance errorneosly, but we do not
> > - * want to shrink a VM that only started to populate its MMU
> > - * anyway.
> > - */
> > - if (!kvm->arch.n_used_mmu_pages &&
> > - !kvm_has_zapped_obsolete_pages(kvm))
> > - continue;
> > + if (!first_kvm)
> > + first_kvm = kvm;
> > + list_move_tail(&kvm->vm_list, &vm_list);
> >
> > - idx = srcu_read_lock(&kvm->srcu);
> > - write_lock(&kvm->mmu_lock);
> > + kvm_for_each_vcpu(i, vcpu, kvm) {
>
> What protects this from racing with vCPU creation/deletion?
>
> > + cache = &vcpu->arch.mmu_shadow_page_cache;
> > + cache_lock = &vcpu->arch.mmu_shadow_page_cache_lock;
> > + if (READ_ONCE(cache->nobjs)) {
> > + spin_lock(cache_lock);
> > + freed += kvm_mmu_empty_memory_cache(cache);
> > + spin_unlock(cache_lock);
> > + }
>
> What about freeing kvm->arch.split_shadow_page_cache as well?
>
> >
> > - if (kvm_has_zapped_obsolete_pages(kvm)) {
> > - kvm_mmu_commit_zap_page(kvm,
> > - &kvm->arch.zapped_obsolete_pages);
> > - goto unlock;
> > }
> >
> > - freed = kvm_mmu_zap_oldest_mmu_pages(kvm, sc->nr_to_scan);
> > -
> > -unlock:
> > - write_unlock(&kvm->mmu_lock);
> > - srcu_read_unlock(&kvm->srcu, idx);
> > -
> > - /*
> > - * unfair on small ones
> > - * per-vm shrinkers cry out
> > - * sadness comes quickly
> > - */
> > - list_move_tail(&kvm->vm_list, &vm_list);
> > - break;
> > + if (freed >= sc->nr_to_scan)
> > + break;
> > }
> >
> > + if (freed)
> > + percpu_counter_sub(&kvm_total_unused_mmu_pages, freed);
> > mutex_unlock(&kvm_lock);
> > + percpu_counter_sync(&kvm_total_unused_mmu_pages);
> > return freed;
> > }
> >
> > static unsigned long
> > mmu_shrink_count(struct shrinker *shrink, struct shrink_control *sc)
> > {
> > - return percpu_counter_read_positive(&kvm_total_used_mmu_pages);
> > + return percpu_counter_sum_positive(&kvm_total_unused_mmu_pages);
> > }
> >
> > static struct shrinker mmu_shrinker = {
> > @@ -6820,7 +6847,7 @@ int kvm_mmu_vendor_module_init(void)
> > if (!mmu_page_header_cache)
> > goto out;
> >
> > - if (percpu_counter_init(&kvm_total_used_mmu_pages, 0, GFP_KERNEL))
> > + if (percpu_counter_init(&kvm_total_unused_mmu_pages, 0, GFP_KERNEL))
> > goto out;
> >
> > ret = register_shrinker(&mmu_shrinker, "x86-mmu");
> > @@ -6830,7 +6857,7 @@ int kvm_mmu_vendor_module_init(void)
> > return 0;
> >
> > out_shrinker:
> > - percpu_counter_destroy(&kvm_total_used_mmu_pages);
> > + percpu_counter_destroy(&kvm_total_unused_mmu_pages);
> > out:
> > mmu_destroy_caches();
> > return ret;
> > @@ -6847,7 +6874,7 @@ void kvm_mmu_destroy(struct kvm_vcpu *vcpu)
> > void kvm_mmu_vendor_module_exit(void)
> > {
> > mmu_destroy_caches();
> > - percpu_counter_destroy(&kvm_total_used_mmu_pages);
> > + percpu_counter_destroy(&kvm_total_unused_mmu_pages);
> > unregister_shrinker(&mmu_shrinker);
> > }
> >
> > diff --git a/arch/x86/kvm/mmu/mmu_internal.h b/arch/x86/kvm/mmu/mmu_internal.h
> > index ac00bfbf32f6..c2a342028b6a 100644
> > --- a/arch/x86/kvm/mmu/mmu_internal.h
> > +++ b/arch/x86/kvm/mmu/mmu_internal.h
> > @@ -325,4 +325,6 @@ void *mmu_memory_cache_alloc(struct kvm_mmu_memory_cache *mc);
> > void track_possible_nx_huge_page(struct kvm *kvm, struct kvm_mmu_page *sp);
> > void untrack_possible_nx_huge_page(struct kvm *kvm, struct kvm_mmu_page *sp);
> >
> > +void *kvm_mmu_sp_memory_cache_alloc(struct kvm_mmu_memory_cache *shadow_page_cache,
> > + spinlock_t *cache_lock);
> > #endif /* __KVM_X86_MMU_INTERNAL_H */
> > diff --git a/arch/x86/kvm/mmu/tdp_mmu.c b/arch/x86/kvm/mmu/tdp_mmu.c
> > index 764f7c87286f..4974fa96deff 100644
> > --- a/arch/x86/kvm/mmu/tdp_mmu.c
> > +++ b/arch/x86/kvm/mmu/tdp_mmu.c
> > @@ -264,7 +264,8 @@ static struct kvm_mmu_page *tdp_mmu_alloc_sp(struct kvm_vcpu *vcpu)
> > struct kvm_mmu_page *sp;
> >
> > sp = kvm_mmu_memory_cache_alloc(&vcpu->arch.mmu_page_header_cache);
> > - sp->spt = kvm_mmu_memory_cache_alloc(&vcpu->arch.mmu_shadow_page_cache);
> > + sp->spt = kvm_mmu_sp_memory_cache_alloc(&vcpu->arch.mmu_shadow_page_cache,
> > + &vcpu->arch.mmu_shadow_page_cache_lock);
> >
> > return sp;
> > }
> > diff --git a/include/linux/kvm_host.h b/include/linux/kvm_host.h
> > index 01aad8b74162..efd9b38ea9a2 100644
> > --- a/include/linux/kvm_host.h
> > +++ b/include/linux/kvm_host.h
> > @@ -1362,6 +1362,7 @@ void kvm_flush_remote_tlbs(struct kvm *kvm);
> > int kvm_mmu_topup_memory_cache(struct kvm_mmu_memory_cache *mc, int min);
> > int __kvm_mmu_topup_memory_cache(struct kvm_mmu_memory_cache *mc, int capacity, int min);
> > int kvm_mmu_memory_cache_nr_free_objects(struct kvm_mmu_memory_cache *mc);
> > +int kvm_mmu_empty_memory_cache(struct kvm_mmu_memory_cache *mc);
> > void kvm_mmu_free_memory_cache(struct kvm_mmu_memory_cache *mc);
> > void *kvm_mmu_memory_cache_alloc(struct kvm_mmu_memory_cache *mc);
> > #endif
> > diff --git a/virt/kvm/kvm_main.c b/virt/kvm/kvm_main.c
> > index 13e88297f999..f2d762878b97 100644
> > --- a/virt/kvm/kvm_main.c
> > +++ b/virt/kvm/kvm_main.c
> > @@ -438,8 +438,10 @@ int kvm_mmu_memory_cache_nr_free_objects(struct kvm_mmu_memory_cache *mc)
> > return mc->nobjs;
> > }
> >
> > -void kvm_mmu_free_memory_cache(struct kvm_mmu_memory_cache *mc)
> > +int kvm_mmu_empty_memory_cache(struct kvm_mmu_memory_cache *mc)
> > {
> > + int freed = mc->nobjs;
> > +
> > while (mc->nobjs) {
> > if (mc->kmem_cache)
> > kmem_cache_free(mc->kmem_cache, mc->objects[--mc->nobjs]);
> > @@ -447,8 +449,13 @@ void kvm_mmu_free_memory_cache(struct kvm_mmu_memory_cache *mc)
> > free_page((unsigned long)mc->objects[--mc->nobjs]);
> > }
> >
> > - kvfree(mc->objects);
> > + return freed;
> > +}
> >
> > +void kvm_mmu_free_memory_cache(struct kvm_mmu_memory_cache *mc)
> > +{
> > + kvm_mmu_empty_memory_cache(mc);
> > + kvfree(mc->objects);
> > mc->objects = NULL;
> > mc->capacity = 0;
> > }
> > --
> > 2.39.0.314.g84b9a713c41-goog
> >

2023-01-03 20:19:12

by Mingwei Zhang

[permalink] [raw]
Subject: Re: [Patch v3 1/9] KVM: x86/mmu: Repurpose KVM MMU shrinker to purge shadow page caches

On Wed, Dec 21, 2022 at 6:35 PM Vipin Sharma <[email protected]> wrote:
>
> mmu_shrink_scan() is very disruptive to VMs. It picks the first
> VM in the vm_list, zaps the oldest page which is most likely an upper
> level SPTEs and most like to be reused. Prior to TDP MMU, this is even
> more disruptive in nested VMs case, considering L1 SPTEs will be the
> oldest even though most of the entries are for L2 SPTEs.
>
> As discussed in
> https://lore.kernel.org/lkml/[email protected]/
> shrinker logic has not be very useful in actually keeping VMs performant
> and reducing memory usage.
>
> Change mmu_shrink_scan() to free pages from the vCPU's shadow page
> cache. Freeing pages from cache doesn't cause vCPU exits, therefore, a
> VM's performance should not be affected.
>
> This also allows to change cache capacities without worrying too much
> about high memory usage in cache.
>
> Tested this change by running dirty_log_perf_test while dropping cache
> via "echo 2 > /proc/sys/vm/drop_caches" at 1 second interval
> continuously. There were WARN_ON(!mc->nobjs) messages printed in kernel
> logs from kvm_mmu_memory_cache_alloc(), which is expected.
>
> Suggested-by: Sean Christopherson <[email protected]>
> Signed-off-by: Vipin Sharma <[email protected]>
> ---
> arch/x86/include/asm/kvm_host.h | 5 +
> arch/x86/kvm/mmu/mmu.c | 163 +++++++++++++++++++-------------
> arch/x86/kvm/mmu/mmu_internal.h | 2 +
> arch/x86/kvm/mmu/tdp_mmu.c | 3 +-
> include/linux/kvm_host.h | 1 +
> virt/kvm/kvm_main.c | 11 ++-
> 6 files changed, 114 insertions(+), 71 deletions(-)
>
> diff --git a/arch/x86/include/asm/kvm_host.h b/arch/x86/include/asm/kvm_host.h
> index aa4eb8cfcd7e..89cc809e4a00 100644
> --- a/arch/x86/include/asm/kvm_host.h
> +++ b/arch/x86/include/asm/kvm_host.h
> @@ -786,6 +786,11 @@ struct kvm_vcpu_arch {
> struct kvm_mmu_memory_cache mmu_shadowed_info_cache;
> struct kvm_mmu_memory_cache mmu_page_header_cache;
>
> + /*
> + * Protects change in size of mmu_shadow_page_cache cache.
> + */
> + spinlock_t mmu_shadow_page_cache_lock;
> +
> /*
> * QEMU userspace and the guest each have their own FPU state.
> * In vcpu_run, we switch between the user and guest FPU contexts.
> diff --git a/arch/x86/kvm/mmu/mmu.c b/arch/x86/kvm/mmu/mmu.c
> index 254bc46234e0..157417e1cb6e 100644
> --- a/arch/x86/kvm/mmu/mmu.c
> +++ b/arch/x86/kvm/mmu/mmu.c
> @@ -164,7 +164,10 @@ struct kvm_shadow_walk_iterator {
>
> static struct kmem_cache *pte_list_desc_cache;
> struct kmem_cache *mmu_page_header_cache;
> -static struct percpu_counter kvm_total_used_mmu_pages;
> +/*
> + * Total number of unused pages in MMU shadow page cache.
> + */
> +static struct percpu_counter kvm_total_unused_mmu_pages;
>
> static void mmu_spte_set(u64 *sptep, u64 spte);
>
> @@ -655,6 +658,22 @@ static void walk_shadow_page_lockless_end(struct kvm_vcpu *vcpu)
> }
> }
>
> +static int mmu_topup_sp_memory_cache(struct kvm_mmu_memory_cache *cache,
> + spinlock_t *cache_lock)
> +{
> + int orig_nobjs;
> + int r;
> +
> + spin_lock(cache_lock);
> + orig_nobjs = cache->nobjs;
> + r = kvm_mmu_topup_memory_cache(cache, PT64_ROOT_MAX_LEVEL);
> + if (orig_nobjs != cache->nobjs)
> + percpu_counter_add(&kvm_total_unused_mmu_pages,
> + (cache->nobjs - orig_nobjs));
> + spin_unlock(cache_lock);
> + return r;
> +}
> +
> static int mmu_topup_memory_caches(struct kvm_vcpu *vcpu, bool maybe_indirect)
> {
> int r;
> @@ -664,8 +683,8 @@ static int mmu_topup_memory_caches(struct kvm_vcpu *vcpu, bool maybe_indirect)
> 1 + PT64_ROOT_MAX_LEVEL + PTE_PREFETCH_NUM);
> if (r)
> return r;
> - r = kvm_mmu_topup_memory_cache(&vcpu->arch.mmu_shadow_page_cache,
> - PT64_ROOT_MAX_LEVEL);
> + r = mmu_topup_sp_memory_cache(&vcpu->arch.mmu_shadow_page_cache,
> + &vcpu->arch.mmu_shadow_page_cache_lock);
> if (r)
> return r;
> if (maybe_indirect) {
> @@ -678,10 +697,25 @@ static int mmu_topup_memory_caches(struct kvm_vcpu *vcpu, bool maybe_indirect)
> PT64_ROOT_MAX_LEVEL);
> }
>
> +static void mmu_free_sp_memory_cache(struct kvm_mmu_memory_cache *cache,
> + spinlock_t *cache_lock)
> +{
> + int orig_nobjs;
> +
> + spin_lock(cache_lock);
> + orig_nobjs = cache->nobjs;
> + kvm_mmu_free_memory_cache(cache);
> + if (orig_nobjs)
> + percpu_counter_sub(&kvm_total_unused_mmu_pages, orig_nobjs);
> +
> + spin_unlock(cache_lock);
> +}

I think the mmu_cache allocation and deallocation may force the use of
GFP_ATOMIC (as observed by other reviewers as well). Adding a new lock
definitely sounds like a plan, but I think it might affect performance.
Alternatively, I am wondering if we could use an mmu_cache_sequence,
similar to mmu_notifier_seq, to help avoid the concurrency issue?

Similar to mmu_notifier_seq, mmu_cache_sequence should be protected by the
mmu_lock write lock. In the page fault path, each vCPU has to collect a
snapshot of mmu_cache_sequence before calling into
mmu_topup_memory_caches() and check the value again while holding the mmu
lock. If the value is different, that means the mmu_shrinker has removed
the cache objects, and the vCPU should retry.
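
As a rough sketch of the check (mmu_cache_seq here is a hypothetical new
field in struct kvm_arch, not existing KVM code; the shrinker would bump
it under the mmu_lock write lock after emptying the caches):

static bool mmu_cache_topup_invalidated(struct kvm_vcpu *vcpu,
                                        unsigned long cache_seq)
{
        lockdep_assert_held_write(&vcpu->kvm->mmu_lock);

        /* The shrinker emptied the caches after they were topped up. */
        return unlikely(cache_seq != vcpu->kvm->arch.mmu_cache_seq);
}

The fault path would snapshot kvm->arch.mmu_cache_seq before
mmu_topup_memory_caches(), and if this returns true after taking the mmu
lock, drop the lock and redo the topup.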


> +
> static void mmu_free_memory_caches(struct kvm_vcpu *vcpu)
> {
> kvm_mmu_free_memory_cache(&vcpu->arch.mmu_pte_list_desc_cache);
> - kvm_mmu_free_memory_cache(&vcpu->arch.mmu_shadow_page_cache);
> + mmu_free_sp_memory_cache(&vcpu->arch.mmu_shadow_page_cache,
> + &vcpu->arch.mmu_shadow_page_cache_lock);
> kvm_mmu_free_memory_cache(&vcpu->arch.mmu_shadowed_info_cache);
> kvm_mmu_free_memory_cache(&vcpu->arch.mmu_page_header_cache);
> }
> @@ -1693,27 +1727,15 @@ static int is_empty_shadow_page(u64 *spt)
> }
> #endif
>
> -/*
> - * This value is the sum of all of the kvm instances's
> - * kvm->arch.n_used_mmu_pages values. We need a global,
> - * aggregate version in order to make the slab shrinker
> - * faster
> - */
> -static inline void kvm_mod_used_mmu_pages(struct kvm *kvm, long nr)
> -{
> - kvm->arch.n_used_mmu_pages += nr;
> - percpu_counter_add(&kvm_total_used_mmu_pages, nr);
> -}
> -
> static void kvm_account_mmu_page(struct kvm *kvm, struct kvm_mmu_page *sp)
> {
> - kvm_mod_used_mmu_pages(kvm, +1);
> + kvm->arch.n_used_mmu_pages++;
> kvm_account_pgtable_pages((void *)sp->spt, +1);
> }
>
> static void kvm_unaccount_mmu_page(struct kvm *kvm, struct kvm_mmu_page *sp)
> {
> - kvm_mod_used_mmu_pages(kvm, -1);
> + kvm->arch.n_used_mmu_pages--;
> kvm_account_pgtable_pages((void *)sp->spt, -1);
> }
>
> @@ -2150,8 +2172,31 @@ struct shadow_page_caches {
> struct kvm_mmu_memory_cache *page_header_cache;
> struct kvm_mmu_memory_cache *shadow_page_cache;
> struct kvm_mmu_memory_cache *shadowed_info_cache;
> + /*
> + * Protects change in size of shadow_page_cache cache.
> + */
> + spinlock_t *shadow_page_cache_lock;
> };
>
> +void *kvm_mmu_sp_memory_cache_alloc(struct kvm_mmu_memory_cache *shadow_page_cache,
> + spinlock_t *cache_lock)
> +{
> + int orig_nobjs;
> + void *page;
> +
> + if (!cache_lock) {
> + spin_lock(cache_lock);
> + orig_nobjs = shadow_page_cache->nobjs;
> + }
> + page = kvm_mmu_memory_cache_alloc(shadow_page_cache);
> + if (!cache_lock) {
> + if (orig_nobjs)
> + percpu_counter_dec(&kvm_total_unused_mmu_pages);
> + spin_unlock(cache_lock);
> + }
> + return page;
> +}
> +
> static struct kvm_mmu_page *kvm_mmu_alloc_shadow_page(struct kvm *kvm,
> struct shadow_page_caches *caches,
> gfn_t gfn,
> @@ -2161,7 +2206,8 @@ static struct kvm_mmu_page *kvm_mmu_alloc_shadow_page(struct kvm *kvm,
> struct kvm_mmu_page *sp;
>
> sp = kvm_mmu_memory_cache_alloc(caches->page_header_cache);
> - sp->spt = kvm_mmu_memory_cache_alloc(caches->shadow_page_cache);
> + sp->spt = kvm_mmu_sp_memory_cache_alloc(caches->shadow_page_cache,
> + caches->shadow_page_cache_lock);
> if (!role.direct)
> sp->shadowed_translation = kvm_mmu_memory_cache_alloc(caches->shadowed_info_cache);
>
> @@ -2218,6 +2264,7 @@ static struct kvm_mmu_page *kvm_mmu_get_shadow_page(struct kvm_vcpu *vcpu,
> .page_header_cache = &vcpu->arch.mmu_page_header_cache,
> .shadow_page_cache = &vcpu->arch.mmu_shadow_page_cache,
> .shadowed_info_cache = &vcpu->arch.mmu_shadowed_info_cache,
> + .shadow_page_cache_lock = &vcpu->arch.mmu_shadow_page_cache_lock
> };
>
> return __kvm_mmu_get_shadow_page(vcpu->kvm, vcpu, &caches, gfn, role);
> @@ -5916,6 +5963,7 @@ int kvm_mmu_create(struct kvm_vcpu *vcpu)
> vcpu->arch.mmu_page_header_cache.gfp_zero = __GFP_ZERO;
>
> vcpu->arch.mmu_shadow_page_cache.gfp_zero = __GFP_ZERO;
> + spin_lock_init(&vcpu->arch.mmu_shadow_page_cache_lock);
>
> vcpu->arch.mmu = &vcpu->arch.root_mmu;
> vcpu->arch.walk_mmu = &vcpu->arch.root_mmu;
> @@ -6051,11 +6099,6 @@ static void kvm_mmu_zap_all_fast(struct kvm *kvm)
> kvm_tdp_mmu_zap_invalidated_roots(kvm);
> }
>
> -static bool kvm_has_zapped_obsolete_pages(struct kvm *kvm)
> -{
> - return unlikely(!list_empty_careful(&kvm->arch.zapped_obsolete_pages));
> -}
> -
> static void kvm_mmu_invalidate_zap_pages_in_memslot(struct kvm *kvm,
> struct kvm_memory_slot *slot,
> struct kvm_page_track_notifier_node *node)
> @@ -6277,6 +6320,7 @@ static struct kvm_mmu_page *shadow_mmu_get_sp_for_split(struct kvm *kvm, u64 *hu
> /* Direct SPs do not require a shadowed_info_cache. */
> caches.page_header_cache = &kvm->arch.split_page_header_cache;
> caches.shadow_page_cache = &kvm->arch.split_shadow_page_cache;
> + caches.shadow_page_cache_lock = NULL;
>
> /* Safe to pass NULL for vCPU since requesting a direct SP. */
> return __kvm_mmu_get_shadow_page(kvm, NULL, &caches, gfn, role);
> @@ -6646,66 +6690,49 @@ void kvm_mmu_invalidate_mmio_sptes(struct kvm *kvm, u64 gen)
> static unsigned long
> mmu_shrink_scan(struct shrinker *shrink, struct shrink_control *sc)
> {
> - struct kvm *kvm;
> - int nr_to_scan = sc->nr_to_scan;
> + struct kvm_mmu_memory_cache *cache;
> + struct kvm *kvm, *first_kvm = NULL;
> unsigned long freed = 0;
> + /* spinlock for memory cache */
> + spinlock_t *cache_lock;
> + struct kvm_vcpu *vcpu;
> + unsigned long i;
>
> mutex_lock(&kvm_lock);
>
> list_for_each_entry(kvm, &vm_list, vm_list) {
> - int idx;
> - LIST_HEAD(invalid_list);
> -
> - /*
> - * Never scan more than sc->nr_to_scan VM instances.
> - * Will not hit this condition practically since we do not try
> - * to shrink more than one VM and it is very unlikely to see
> - * !n_used_mmu_pages so many times.
> - */
> - if (!nr_to_scan--)
> + if (first_kvm == kvm)
> break;
> - /*
> - * n_used_mmu_pages is accessed without holding kvm->mmu_lock
> - * here. We may skip a VM instance errorneosly, but we do not
> - * want to shrink a VM that only started to populate its MMU
> - * anyway.
> - */
> - if (!kvm->arch.n_used_mmu_pages &&
> - !kvm_has_zapped_obsolete_pages(kvm))
> - continue;
> + if (!first_kvm)
> + first_kvm = kvm;
> + list_move_tail(&kvm->vm_list, &vm_list);
>
> - idx = srcu_read_lock(&kvm->srcu);
> - write_lock(&kvm->mmu_lock);
> + kvm_for_each_vcpu(i, vcpu, kvm) {
> + cache = &vcpu->arch.mmu_shadow_page_cache;
> + cache_lock = &vcpu->arch.mmu_shadow_page_cache_lock;
> + if (READ_ONCE(cache->nobjs)) {
> + spin_lock(cache_lock);
> + freed += kvm_mmu_empty_memory_cache(cache);
> + spin_unlock(cache_lock);
> + }
>
> - if (kvm_has_zapped_obsolete_pages(kvm)) {
> - kvm_mmu_commit_zap_page(kvm,
> - &kvm->arch.zapped_obsolete_pages);
> - goto unlock;
> }
>
> - freed = kvm_mmu_zap_oldest_mmu_pages(kvm, sc->nr_to_scan);
> -
> -unlock:
> - write_unlock(&kvm->mmu_lock);
> - srcu_read_unlock(&kvm->srcu, idx);
> -
> - /*
> - * unfair on small ones
> - * per-vm shrinkers cry out
> - * sadness comes quickly
> - */
> - list_move_tail(&kvm->vm_list, &vm_list);
> - break;
> + if (freed >= sc->nr_to_scan)
> + break;
> }
>
> + if (freed)
> + percpu_counter_sub(&kvm_total_unused_mmu_pages, freed);
> mutex_unlock(&kvm_lock);
> + percpu_counter_sync(&kvm_total_unused_mmu_pages);
> return freed;
> }
>
> static unsigned long
> mmu_shrink_count(struct shrinker *shrink, struct shrink_control *sc)
> {
> - return percpu_counter_read_positive(&kvm_total_used_mmu_pages);
> + return percpu_counter_sum_positive(&kvm_total_unused_mmu_pages);
> }
>
> static struct shrinker mmu_shrinker = {
> @@ -6820,7 +6847,7 @@ int kvm_mmu_vendor_module_init(void)
> if (!mmu_page_header_cache)
> goto out;
>
> - if (percpu_counter_init(&kvm_total_used_mmu_pages, 0, GFP_KERNEL))
> + if (percpu_counter_init(&kvm_total_unused_mmu_pages, 0, GFP_KERNEL))
> goto out;
>
> ret = register_shrinker(&mmu_shrinker, "x86-mmu");
> @@ -6830,7 +6857,7 @@ int kvm_mmu_vendor_module_init(void)
> return 0;
>
> out_shrinker:
> - percpu_counter_destroy(&kvm_total_used_mmu_pages);
> + percpu_counter_destroy(&kvm_total_unused_mmu_pages);
> out:
> mmu_destroy_caches();
> return ret;
> @@ -6847,7 +6874,7 @@ void kvm_mmu_destroy(struct kvm_vcpu *vcpu)
> void kvm_mmu_vendor_module_exit(void)
> {
> mmu_destroy_caches();
> - percpu_counter_destroy(&kvm_total_used_mmu_pages);
> + percpu_counter_destroy(&kvm_total_unused_mmu_pages);
> unregister_shrinker(&mmu_shrinker);
> }
>
> diff --git a/arch/x86/kvm/mmu/mmu_internal.h b/arch/x86/kvm/mmu/mmu_internal.h
> index ac00bfbf32f6..c2a342028b6a 100644
> --- a/arch/x86/kvm/mmu/mmu_internal.h
> +++ b/arch/x86/kvm/mmu/mmu_internal.h
> @@ -325,4 +325,6 @@ void *mmu_memory_cache_alloc(struct kvm_mmu_memory_cache *mc);
> void track_possible_nx_huge_page(struct kvm *kvm, struct kvm_mmu_page *sp);
> void untrack_possible_nx_huge_page(struct kvm *kvm, struct kvm_mmu_page *sp);
>
> +void *kvm_mmu_sp_memory_cache_alloc(struct kvm_mmu_memory_cache *shadow_page_cache,
> + spinlock_t *cache_lock);
> #endif /* __KVM_X86_MMU_INTERNAL_H */
> diff --git a/arch/x86/kvm/mmu/tdp_mmu.c b/arch/x86/kvm/mmu/tdp_mmu.c
> index 764f7c87286f..4974fa96deff 100644
> --- a/arch/x86/kvm/mmu/tdp_mmu.c
> +++ b/arch/x86/kvm/mmu/tdp_mmu.c
> @@ -264,7 +264,8 @@ static struct kvm_mmu_page *tdp_mmu_alloc_sp(struct kvm_vcpu *vcpu)
> struct kvm_mmu_page *sp;
>
> sp = kvm_mmu_memory_cache_alloc(&vcpu->arch.mmu_page_header_cache);
> - sp->spt = kvm_mmu_memory_cache_alloc(&vcpu->arch.mmu_shadow_page_cache);
> + sp->spt = kvm_mmu_sp_memory_cache_alloc(&vcpu->arch.mmu_shadow_page_cache,
> + &vcpu->arch.mmu_shadow_page_cache_lock);
>
> return sp;
> }
> diff --git a/include/linux/kvm_host.h b/include/linux/kvm_host.h
> index 01aad8b74162..efd9b38ea9a2 100644
> --- a/include/linux/kvm_host.h
> +++ b/include/linux/kvm_host.h
> @@ -1362,6 +1362,7 @@ void kvm_flush_remote_tlbs(struct kvm *kvm);
> int kvm_mmu_topup_memory_cache(struct kvm_mmu_memory_cache *mc, int min);
> int __kvm_mmu_topup_memory_cache(struct kvm_mmu_memory_cache *mc, int capacity, int min);
> int kvm_mmu_memory_cache_nr_free_objects(struct kvm_mmu_memory_cache *mc);
> +int kvm_mmu_empty_memory_cache(struct kvm_mmu_memory_cache *mc);
> void kvm_mmu_free_memory_cache(struct kvm_mmu_memory_cache *mc);
> void *kvm_mmu_memory_cache_alloc(struct kvm_mmu_memory_cache *mc);
> #endif
> diff --git a/virt/kvm/kvm_main.c b/virt/kvm/kvm_main.c
> index 13e88297f999..f2d762878b97 100644
> --- a/virt/kvm/kvm_main.c
> +++ b/virt/kvm/kvm_main.c
> @@ -438,8 +438,10 @@ int kvm_mmu_memory_cache_nr_free_objects(struct kvm_mmu_memory_cache *mc)
> return mc->nobjs;
> }
>
> -void kvm_mmu_free_memory_cache(struct kvm_mmu_memory_cache *mc)
> +int kvm_mmu_empty_memory_cache(struct kvm_mmu_memory_cache *mc)
> {
> + int freed = mc->nobjs;
> +
> while (mc->nobjs) {
> if (mc->kmem_cache)
> kmem_cache_free(mc->kmem_cache, mc->objects[--mc->nobjs]);
> @@ -447,8 +449,13 @@ void kvm_mmu_free_memory_cache(struct kvm_mmu_memory_cache *mc)
> free_page((unsigned long)mc->objects[--mc->nobjs]);
> }
>
> - kvfree(mc->objects);
> + return freed;
> +}
>
> +void kvm_mmu_free_memory_cache(struct kvm_mmu_memory_cache *mc)
> +{
> + kvm_mmu_empty_memory_cache(mc);
> + kvfree(mc->objects);
> mc->objects = NULL;
> mc->capacity = 0;
> }
> --
> 2.39.0.314.g84b9a713c41-goog
>

2023-01-04 00:42:18

by Vipin Sharma

[permalink] [raw]
Subject: Re: [Patch v3 1/9] KVM: x86/mmu: Repurpose KVM MMU shrinker to purge shadow page caches

On Tue, Jan 3, 2023 at 10:01 AM Vipin Sharma <[email protected]> wrote:
>
> On Thu, Dec 29, 2022 at 1:55 PM David Matlack <[email protected]> wrote:
> >
> > On Wed, Dec 21, 2022 at 06:34:49PM -0800, Vipin Sharma wrote:
> > > mmu_shrink_scan() is very disruptive to VMs. It picks the first
> > > VM in the vm_list, zaps the oldest page which is most likely an upper
> > > level SPTEs and most like to be reused. Prior to TDP MMU, this is even
> > > more disruptive in nested VMs case, considering L1 SPTEs will be the
> > > oldest even though most of the entries are for L2 SPTEs.
> > >
> > > As discussed in
> > > https://lore.kernel.org/lkml/[email protected]/
> > > shrinker logic has not be very useful in actually keeping VMs performant
> > > and reducing memory usage.
> > >
> > > Change mmu_shrink_scan() to free pages from the vCPU's shadow page
> > > cache. Freeing pages from cache doesn't cause vCPU exits, therefore, a
> > > VM's performance should not be affected.
> >
> > Can you split this commit up? e.g. First drop the old shrinking logic in
> > one commit (but leave the shrinking infrastructure in place). Then a
> > commit to make the shrinker free the per-vCPU shadow page caches. And
> > then perhaps another to make the shrinker free the per-VM shadow page
> > cache used for eager splitting.
> >
>
> Sounds good, I will separate it into two parts: one for dropping the old
> logic and one for adding the per-vCPU shadow page caches. Patch 3 already
> enables the shrinker to free the per-VM shadow page cache.
>
> > >
> > > This also allows to change cache capacities without worrying too much
> > > about high memory usage in cache.
> > >
> > > Tested this change by running dirty_log_perf_test while dropping cache
> > > via "echo 2 > /proc/sys/vm/drop_caches" at 1 second interval
> > > continuously. There were WARN_ON(!mc->nobjs) messages printed in kernel
> > > logs from kvm_mmu_memory_cache_alloc(), which is expected.
> > >
> > > Suggested-by: Sean Christopherson <[email protected]>
> > > Signed-off-by: Vipin Sharma <[email protected]>
> > > ---
> > > arch/x86/include/asm/kvm_host.h | 5 +
> > > arch/x86/kvm/mmu/mmu.c | 163 +++++++++++++++++++-------------
> > > arch/x86/kvm/mmu/mmu_internal.h | 2 +
> > > arch/x86/kvm/mmu/tdp_mmu.c | 3 +-
> > > include/linux/kvm_host.h | 1 +
> > > virt/kvm/kvm_main.c | 11 ++-
> > > 6 files changed, 114 insertions(+), 71 deletions(-)
> > >
> > > diff --git a/arch/x86/include/asm/kvm_host.h b/arch/x86/include/asm/kvm_host.h
> > > index aa4eb8cfcd7e..89cc809e4a00 100644
> > > --- a/arch/x86/include/asm/kvm_host.h
> > > +++ b/arch/x86/include/asm/kvm_host.h
> > > @@ -786,6 +786,11 @@ struct kvm_vcpu_arch {
> > > struct kvm_mmu_memory_cache mmu_shadowed_info_cache;
> > > struct kvm_mmu_memory_cache mmu_page_header_cache;
> > >
> > > + /*
> > > + * Protects change in size of mmu_shadow_page_cache cache.
> > > + */
> > > + spinlock_t mmu_shadow_page_cache_lock;
> > > +
> > > /*
> > > * QEMU userspace and the guest each have their own FPU state.
> > > * In vcpu_run, we switch between the user and guest FPU contexts.
> > > diff --git a/arch/x86/kvm/mmu/mmu.c b/arch/x86/kvm/mmu/mmu.c
> > > index 254bc46234e0..157417e1cb6e 100644
> > > --- a/arch/x86/kvm/mmu/mmu.c
> > > +++ b/arch/x86/kvm/mmu/mmu.c
> > > @@ -164,7 +164,10 @@ struct kvm_shadow_walk_iterator {
> > >
> > > static struct kmem_cache *pte_list_desc_cache;
> > > struct kmem_cache *mmu_page_header_cache;
> > > -static struct percpu_counter kvm_total_used_mmu_pages;
> > > +/*
> > > + * Total number of unused pages in MMU shadow page cache.
> > > + */
> > > +static struct percpu_counter kvm_total_unused_mmu_pages;
> > >
> > > static void mmu_spte_set(u64 *sptep, u64 spte);
> > >
> > > @@ -655,6 +658,22 @@ static void walk_shadow_page_lockless_end(struct kvm_vcpu *vcpu)
> > > }
> > > }
> > >
> > > +static int mmu_topup_sp_memory_cache(struct kvm_mmu_memory_cache *cache,
> > > + spinlock_t *cache_lock)
> > > +{
> > > + int orig_nobjs;
> > > + int r;
> > > +
> > > + spin_lock(cache_lock);
> > > + orig_nobjs = cache->nobjs;
> > > + r = kvm_mmu_topup_memory_cache(cache, PT64_ROOT_MAX_LEVEL);
> > > + if (orig_nobjs != cache->nobjs)
> > > + percpu_counter_add(&kvm_total_unused_mmu_pages,
> > > + (cache->nobjs - orig_nobjs));
> > > + spin_unlock(cache_lock);
> > > + return r;
> > > +}
> > > +
> > > static int mmu_topup_memory_caches(struct kvm_vcpu *vcpu, bool maybe_indirect)
> > > {
> > > int r;
> > > @@ -664,8 +683,8 @@ static int mmu_topup_memory_caches(struct kvm_vcpu *vcpu, bool maybe_indirect)
> > > 1 + PT64_ROOT_MAX_LEVEL + PTE_PREFETCH_NUM);
> > > if (r)
> > > return r;
> > > - r = kvm_mmu_topup_memory_cache(&vcpu->arch.mmu_shadow_page_cache,
> > > - PT64_ROOT_MAX_LEVEL);
> > > + r = mmu_topup_sp_memory_cache(&vcpu->arch.mmu_shadow_page_cache,
> > > + &vcpu->arch.mmu_shadow_page_cache_lock);
> > > if (r)
> > > return r;
> > > if (maybe_indirect) {
> > > @@ -678,10 +697,25 @@ static int mmu_topup_memory_caches(struct kvm_vcpu *vcpu, bool maybe_indirect)
> > > PT64_ROOT_MAX_LEVEL);
> > > }
> > >
> > > +static void mmu_free_sp_memory_cache(struct kvm_mmu_memory_cache *cache,
> > > + spinlock_t *cache_lock)
> > > +{
> > > + int orig_nobjs;
> > > +
> > > + spin_lock(cache_lock);
> > > + orig_nobjs = cache->nobjs;
> > > + kvm_mmu_free_memory_cache(cache);
> > > + if (orig_nobjs)
> > > + percpu_counter_sub(&kvm_total_unused_mmu_pages, orig_nobjs);
> > > +
> > > + spin_unlock(cache_lock);
> > > +}
> >
> > It would be nice to avoid adding these wrapper functions.
> >
> > Once you add a mutex to protect the caches from being freed while vCPUs
> > are in the middle of a page fault you can drop the spin lock. After that
> > the only reason to have these wrappers is to update
> > kvm_total_unused_mmu_pages.
> >
> > Do we really need kvm_total_unused_mmu_pages? Why not just dynamically
> > calculate the number of of unused pages in mmu_shrink_count()? Or just
> > estimate the count, e.g. num_vcpus * KVM_ARCH_NR_OBJS_PER_MEMORY_CACHE?
> > Or have per-VM or per-vCPU shrinkers to avoid needing to do any
> > aggregation?
> >
>
> I think we can drop this, by default we can return num_kvms *
> num_vcpus * nodes * KVM_ARCH_NR_OBJS_PER_MEMORY_CACHE
>
> Whenever mmu_shrink_scan() is called if there are no pages to free
> then return SHRINK_STOP which will stop any subsequent calls during
> that time.
>
>
> > > +
> > > static void mmu_free_memory_caches(struct kvm_vcpu *vcpu)
> > > {
> > > kvm_mmu_free_memory_cache(&vcpu->arch.mmu_pte_list_desc_cache);
> > > - kvm_mmu_free_memory_cache(&vcpu->arch.mmu_shadow_page_cache);
> > > + mmu_free_sp_memory_cache(&vcpu->arch.mmu_shadow_page_cache,
> > > + &vcpu->arch.mmu_shadow_page_cache_lock);
> > > kvm_mmu_free_memory_cache(&vcpu->arch.mmu_shadowed_info_cache);
> >
> > mmu_shadowed_info_cache can be freed by the shrinker as well.
> >

Yes, I can do that as well.

> > > kvm_mmu_free_memory_cache(&vcpu->arch.mmu_page_header_cache);
> > > }
> > > @@ -1693,27 +1727,15 @@ static int is_empty_shadow_page(u64 *spt)
> > > }
> > > #endif
> > >
> > > -/*
> > > - * This value is the sum of all of the kvm instances's
> > > - * kvm->arch.n_used_mmu_pages values. We need a global,
> > > - * aggregate version in order to make the slab shrinker
> > > - * faster
> > > - */
> > > -static inline void kvm_mod_used_mmu_pages(struct kvm *kvm, long nr)
> > > -{
> > > - kvm->arch.n_used_mmu_pages += nr;
> > > - percpu_counter_add(&kvm_total_used_mmu_pages, nr);
> > > -}
> > > -
> > > static void kvm_account_mmu_page(struct kvm *kvm, struct kvm_mmu_page *sp)
> > > {
> > > - kvm_mod_used_mmu_pages(kvm, +1);
> > > + kvm->arch.n_used_mmu_pages++;
> > > kvm_account_pgtable_pages((void *)sp->spt, +1);
> > > }
> > >
> > > static void kvm_unaccount_mmu_page(struct kvm *kvm, struct kvm_mmu_page *sp)
> > > {
> > > - kvm_mod_used_mmu_pages(kvm, -1);
> > > + kvm->arch.n_used_mmu_pages--;
> > > kvm_account_pgtable_pages((void *)sp->spt, -1);
> > > }
> > >
> > > @@ -2150,8 +2172,31 @@ struct shadow_page_caches {
> > > struct kvm_mmu_memory_cache *page_header_cache;
> > > struct kvm_mmu_memory_cache *shadow_page_cache;
> > > struct kvm_mmu_memory_cache *shadowed_info_cache;
> > > + /*
> > > + * Protects change in size of shadow_page_cache cache.
> > > + */
> > > + spinlock_t *shadow_page_cache_lock;
> > > };
> > >
> > > +void *kvm_mmu_sp_memory_cache_alloc(struct kvm_mmu_memory_cache *shadow_page_cache,
> > > + spinlock_t *cache_lock)
> > > +{
> > > + int orig_nobjs;
> > > + void *page;
> > > +
> > > + if (!cache_lock) {
> > > + spin_lock(cache_lock);
> > > + orig_nobjs = shadow_page_cache->nobjs;
> > > + }
> > > + page = kvm_mmu_memory_cache_alloc(shadow_page_cache);
> > > + if (!cache_lock) {
> > > + if (orig_nobjs)
> > > + percpu_counter_dec(&kvm_total_unused_mmu_pages);
> > > + spin_unlock(cache_lock);
> > > + }
> > > + return page;
> > > +}
> > > +
> > > static struct kvm_mmu_page *kvm_mmu_alloc_shadow_page(struct kvm *kvm,
> > > struct shadow_page_caches *caches,
> > > gfn_t gfn,
> > > @@ -2161,7 +2206,8 @@ static struct kvm_mmu_page *kvm_mmu_alloc_shadow_page(struct kvm *kvm,
> > > struct kvm_mmu_page *sp;
> > >
> > > sp = kvm_mmu_memory_cache_alloc(caches->page_header_cache);
> > > - sp->spt = kvm_mmu_memory_cache_alloc(caches->shadow_page_cache);
> > > + sp->spt = kvm_mmu_sp_memory_cache_alloc(caches->shadow_page_cache,
> > > + caches->shadow_page_cache_lock);
> > > if (!role.direct)
> > > sp->shadowed_translation = kvm_mmu_memory_cache_alloc(caches->shadowed_info_cache);
> > >
> > > @@ -2218,6 +2264,7 @@ static struct kvm_mmu_page *kvm_mmu_get_shadow_page(struct kvm_vcpu *vcpu,
> > > .page_header_cache = &vcpu->arch.mmu_page_header_cache,
> > > .shadow_page_cache = &vcpu->arch.mmu_shadow_page_cache,
> > > .shadowed_info_cache = &vcpu->arch.mmu_shadowed_info_cache,
> > > + .shadow_page_cache_lock = &vcpu->arch.mmu_shadow_page_cache_lock
> > > };
> > >
> > > return __kvm_mmu_get_shadow_page(vcpu->kvm, vcpu, &caches, gfn, role);
> > > @@ -5916,6 +5963,7 @@ int kvm_mmu_create(struct kvm_vcpu *vcpu)
> > > vcpu->arch.mmu_page_header_cache.gfp_zero = __GFP_ZERO;
> > >
> > > vcpu->arch.mmu_shadow_page_cache.gfp_zero = __GFP_ZERO;
> > > + spin_lock_init(&vcpu->arch.mmu_shadow_page_cache_lock);
> > >
> > > vcpu->arch.mmu = &vcpu->arch.root_mmu;
> > > vcpu->arch.walk_mmu = &vcpu->arch.root_mmu;
> > > @@ -6051,11 +6099,6 @@ static void kvm_mmu_zap_all_fast(struct kvm *kvm)
> > > kvm_tdp_mmu_zap_invalidated_roots(kvm);
> > > }
> > >
> > > -static bool kvm_has_zapped_obsolete_pages(struct kvm *kvm)
> > > -{
> > > - return unlikely(!list_empty_careful(&kvm->arch.zapped_obsolete_pages));
> > > -}
> > > -
> > > static void kvm_mmu_invalidate_zap_pages_in_memslot(struct kvm *kvm,
> > > struct kvm_memory_slot *slot,
> > > struct kvm_page_track_notifier_node *node)
> > > @@ -6277,6 +6320,7 @@ static struct kvm_mmu_page *shadow_mmu_get_sp_for_split(struct kvm *kvm, u64 *hu
> > > /* Direct SPs do not require a shadowed_info_cache. */
> > > caches.page_header_cache = &kvm->arch.split_page_header_cache;
> > > caches.shadow_page_cache = &kvm->arch.split_shadow_page_cache;
> > > + caches.shadow_page_cache_lock = NULL;
> > >
> > > /* Safe to pass NULL for vCPU since requesting a direct SP. */
> > > return __kvm_mmu_get_shadow_page(kvm, NULL, &caches, gfn, role);
> > > @@ -6646,66 +6690,49 @@ void kvm_mmu_invalidate_mmio_sptes(struct kvm *kvm, u64 gen)
> > > static unsigned long
> > > mmu_shrink_scan(struct shrinker *shrink, struct shrink_control *sc)
> > > {
> > > - struct kvm *kvm;
> > > - int nr_to_scan = sc->nr_to_scan;
> > > + struct kvm_mmu_memory_cache *cache;
> > > + struct kvm *kvm, *first_kvm = NULL;
> > > unsigned long freed = 0;
> > > + /* spinlock for memory cache */
> > > + spinlock_t *cache_lock;
> > > + struct kvm_vcpu *vcpu;
> > > + unsigned long i;
> > >
> > > mutex_lock(&kvm_lock);
> > >
> > > list_for_each_entry(kvm, &vm_list, vm_list) {
> > > - int idx;
> > > - LIST_HEAD(invalid_list);
> > > -
> > > - /*
> > > - * Never scan more than sc->nr_to_scan VM instances.
> > > - * Will not hit this condition practically since we do not try
> > > - * to shrink more than one VM and it is very unlikely to see
> > > - * !n_used_mmu_pages so many times.
> > > - */
> > > - if (!nr_to_scan--)
> > > + if (first_kvm == kvm)
> > > break;
> > > - /*
> > > - * n_used_mmu_pages is accessed without holding kvm->mmu_lock
> > > - * here. We may skip a VM instance errorneosly, but we do not
> > > - * want to shrink a VM that only started to populate its MMU
> > > - * anyway.
> > > - */
> > > - if (!kvm->arch.n_used_mmu_pages &&
> > > - !kvm_has_zapped_obsolete_pages(kvm))
> > > - continue;
> > > + if (!first_kvm)
> > > + first_kvm = kvm;
> > > + list_move_tail(&kvm->vm_list, &vm_list);
> > >
> > > - idx = srcu_read_lock(&kvm->srcu);
> > > - write_lock(&kvm->mmu_lock);
> > > + kvm_for_each_vcpu(i, vcpu, kvm) {
> >
> > What protects this from racing with vCPU creation/deletion?
> >

vCPU deletion:
We take kvm_lock in mmu_shrink_scan(), and the same lock is taken in
kvm_destroy_vm() to remove a VM from vm_list. So, while we are
iterating vm_list we will not see any VM removal, which means no vCPU
removal either.

I didn't find any other code path for vCPU deletion except failures
during VM and vCPU setup, and a VM is only added to vm_list after
successful creation.

vCPU creation:
I think it will work.

kvm_vm_ioctl_create_vcpu() initializes the vCPU and adds it to
kvm->vcpu_array, which is an xarray managed via RCU; only after that is
online_vcpus incremented. So kvm_for_each_vcpu(), which uses RCU to
read entries, will also see a fully initialized vCPU whenever it sees
the incremented online_vcpus value.

@Sean, Paolo

Is the above explanation correct, i.e. is kvm_for_each_vcpu() safe without any lock?
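
For reference, the pattern in question boils down to the following, a
minimal sketch of the shrinker side from the hunk above with the actual
cache freeing elided:

	mutex_lock(&kvm_lock);
	list_for_each_entry(kvm, &vm_list, vm_list) {
		/*
		 * kvm_for_each_vcpu() walks kvm->vcpu_array under RCU and
		 * only up to online_vcpus, which is incremented after a vCPU
		 * is fully initialized, so a half-created vCPU is never
		 * visible here; kvm_lock keeps the VM (and its vCPUs) from
		 * going away.
		 */
		kvm_for_each_vcpu(i, vcpu, kvm) {
			/* free objects from vcpu->arch.mmu_shadow_page_cache */
		}
	}
	mutex_unlock(&kvm_lock);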

> > > + cache = &vcpu->arch.mmu_shadow_page_cache;
> > > + cache_lock = &vcpu->arch.mmu_shadow_page_cache_lock;
> > > + if (READ_ONCE(cache->nobjs)) {
> > > + spin_lock(cache_lock);
> > > + freed += kvm_mmu_empty_memory_cache(cache);
> > > + spin_unlock(cache_lock);
> > > + }
> >
> > What about freeing kvm->arch.split_shadow_page_cache as well?
> >

I am doing this in patch 3.

> > >
> > > - if (kvm_has_zapped_obsolete_pages(kvm)) {
> > > - kvm_mmu_commit_zap_page(kvm,
> > > - &kvm->arch.zapped_obsolete_pages);
> > > - goto unlock;
> > > }
> > >
> > > - freed = kvm_mmu_zap_oldest_mmu_pages(kvm, sc->nr_to_scan);
> > > -
> > > -unlock:
> > > - write_unlock(&kvm->mmu_lock);
> > > - srcu_read_unlock(&kvm->srcu, idx);
> > > -
> > > - /*
> > > - * unfair on small ones
> > > - * per-vm shrinkers cry out
> > > - * sadness comes quickly
> > > - */
> > > - list_move_tail(&kvm->vm_list, &vm_list);
> > > - break;
> > > + if (freed >= sc->nr_to_scan)
> > > + break;
> > > }
> > >
> > > + if (freed)
> > > + percpu_counter_sub(&kvm_total_unused_mmu_pages, freed);
> > > mutex_unlock(&kvm_lock);
> > > + percpu_counter_sync(&kvm_total_unused_mmu_pages);
> > > return freed;
> > > }
> > >
> > > static unsigned long
> > > mmu_shrink_count(struct shrinker *shrink, struct shrink_control *sc)
> > > {
> > > - return percpu_counter_read_positive(&kvm_total_used_mmu_pages);
> > > + return percpu_counter_sum_positive(&kvm_total_unused_mmu_pages);
> > > }
> > >
> > > static struct shrinker mmu_shrinker = {
> > > @@ -6820,7 +6847,7 @@ int kvm_mmu_vendor_module_init(void)
> > > if (!mmu_page_header_cache)
> > > goto out;
> > >
> > > - if (percpu_counter_init(&kvm_total_used_mmu_pages, 0, GFP_KERNEL))
> > > + if (percpu_counter_init(&kvm_total_unused_mmu_pages, 0, GFP_KERNEL))
> > > goto out;
> > >
> > > ret = register_shrinker(&mmu_shrinker, "x86-mmu");
> > > @@ -6830,7 +6857,7 @@ int kvm_mmu_vendor_module_init(void)
> > > return 0;
> > >
> > > out_shrinker:
> > > - percpu_counter_destroy(&kvm_total_used_mmu_pages);
> > > + percpu_counter_destroy(&kvm_total_unused_mmu_pages);
> > > out:
> > > mmu_destroy_caches();
> > > return ret;
> > > @@ -6847,7 +6874,7 @@ void kvm_mmu_destroy(struct kvm_vcpu *vcpu)
> > > void kvm_mmu_vendor_module_exit(void)
> > > {
> > > mmu_destroy_caches();
> > > - percpu_counter_destroy(&kvm_total_used_mmu_pages);
> > > + percpu_counter_destroy(&kvm_total_unused_mmu_pages);
> > > unregister_shrinker(&mmu_shrinker);
> > > }
> > >
> > > diff --git a/arch/x86/kvm/mmu/mmu_internal.h b/arch/x86/kvm/mmu/mmu_internal.h
> > > index ac00bfbf32f6..c2a342028b6a 100644
> > > --- a/arch/x86/kvm/mmu/mmu_internal.h
> > > +++ b/arch/x86/kvm/mmu/mmu_internal.h
> > > @@ -325,4 +325,6 @@ void *mmu_memory_cache_alloc(struct kvm_mmu_memory_cache *mc);
> > > void track_possible_nx_huge_page(struct kvm *kvm, struct kvm_mmu_page *sp);
> > > void untrack_possible_nx_huge_page(struct kvm *kvm, struct kvm_mmu_page *sp);
> > >
> > > +void *kvm_mmu_sp_memory_cache_alloc(struct kvm_mmu_memory_cache *shadow_page_cache,
> > > + spinlock_t *cache_lock);
> > > #endif /* __KVM_X86_MMU_INTERNAL_H */
> > > diff --git a/arch/x86/kvm/mmu/tdp_mmu.c b/arch/x86/kvm/mmu/tdp_mmu.c
> > > index 764f7c87286f..4974fa96deff 100644
> > > --- a/arch/x86/kvm/mmu/tdp_mmu.c
> > > +++ b/arch/x86/kvm/mmu/tdp_mmu.c
> > > @@ -264,7 +264,8 @@ static struct kvm_mmu_page *tdp_mmu_alloc_sp(struct kvm_vcpu *vcpu)
> > > struct kvm_mmu_page *sp;
> > >
> > > sp = kvm_mmu_memory_cache_alloc(&vcpu->arch.mmu_page_header_cache);
> > > - sp->spt = kvm_mmu_memory_cache_alloc(&vcpu->arch.mmu_shadow_page_cache);
> > > + sp->spt = kvm_mmu_sp_memory_cache_alloc(&vcpu->arch.mmu_shadow_page_cache,
> > > + &vcpu->arch.mmu_shadow_page_cache_lock);
> > >
> > > return sp;
> > > }
> > > diff --git a/include/linux/kvm_host.h b/include/linux/kvm_host.h
> > > index 01aad8b74162..efd9b38ea9a2 100644
> > > --- a/include/linux/kvm_host.h
> > > +++ b/include/linux/kvm_host.h
> > > @@ -1362,6 +1362,7 @@ void kvm_flush_remote_tlbs(struct kvm *kvm);
> > > int kvm_mmu_topup_memory_cache(struct kvm_mmu_memory_cache *mc, int min);
> > > int __kvm_mmu_topup_memory_cache(struct kvm_mmu_memory_cache *mc, int capacity, int min);
> > > int kvm_mmu_memory_cache_nr_free_objects(struct kvm_mmu_memory_cache *mc);
> > > +int kvm_mmu_empty_memory_cache(struct kvm_mmu_memory_cache *mc);
> > > void kvm_mmu_free_memory_cache(struct kvm_mmu_memory_cache *mc);
> > > void *kvm_mmu_memory_cache_alloc(struct kvm_mmu_memory_cache *mc);
> > > #endif
> > > diff --git a/virt/kvm/kvm_main.c b/virt/kvm/kvm_main.c
> > > index 13e88297f999..f2d762878b97 100644
> > > --- a/virt/kvm/kvm_main.c
> > > +++ b/virt/kvm/kvm_main.c
> > > @@ -438,8 +438,10 @@ int kvm_mmu_memory_cache_nr_free_objects(struct kvm_mmu_memory_cache *mc)
> > > return mc->nobjs;
> > > }
> > >
> > > -void kvm_mmu_free_memory_cache(struct kvm_mmu_memory_cache *mc)
> > > +int kvm_mmu_empty_memory_cache(struct kvm_mmu_memory_cache *mc)
> > > {
> > > + int freed = mc->nobjs;
> > > +
> > > while (mc->nobjs) {
> > > if (mc->kmem_cache)
> > > kmem_cache_free(mc->kmem_cache, mc->objects[--mc->nobjs]);
> > > @@ -447,8 +449,13 @@ void kvm_mmu_free_memory_cache(struct kvm_mmu_memory_cache *mc)
> > > free_page((unsigned long)mc->objects[--mc->nobjs]);
> > > }
> > >
> > > - kvfree(mc->objects);
> > > + return freed;
> > > +}
> > >
> > > +void kvm_mmu_free_memory_cache(struct kvm_mmu_memory_cache *mc)
> > > +{
> > > + kvm_mmu_empty_memory_cache(mc);
> > > + kvfree(mc->objects);
> > > mc->objects = NULL;
> > > mc->capacity = 0;
> > > }
> > > --
> > > 2.39.0.314.g84b9a713c41-goog
> > >

2023-01-04 01:18:20

by Vipin Sharma

[permalink] [raw]
Subject: Re: [Patch v3 1/9] KVM: x86/mmu: Repurpose KVM MMU shrinker to purge shadow page caches

On Tue, Jan 3, 2023 at 11:32 AM Mingwei Zhang <[email protected]> wrote:
>
> On Wed, Dec 21, 2022 at 6:35 PM Vipin Sharma <[email protected]> wrote:
> >
> > +static void mmu_free_sp_memory_cache(struct kvm_mmu_memory_cache *cache,
> > + spinlock_t *cache_lock)
> > +{
> > + int orig_nobjs;
> > +
> > + spin_lock(cache_lock);
> > + orig_nobjs = cache->nobjs;
> > + kvm_mmu_free_memory_cache(cache);
> > + if (orig_nobjs)
> > + percpu_counter_sub(&kvm_total_unused_mmu_pages, orig_nobjs);
> > +
> > + spin_unlock(cache_lock);
> > +}
>
> I think the mmu_cache allocation and deallocation may cause the usage
> of GFP_ATOMIC (as observed by other reviewers as well). Adding a new
> lock would definitely sound like a plan, but I think it might affect
> the performance. Alternatively, I am wondering if we could use a
> mmu_cache_sequence similar to mmu_notifier_seq to help avoid the
> concurrency?
>

Can you explain more about the performance impact? Each vCPU will have
its own mutex, so the only contention will be with the mmu_shrinker.
The shrinker will use mutex_trylock(), which will not block waiting for
the lock; it will just move on to the next vCPU. While the shrinker is
holding the lock, the vCPU will be blocked in the page fault path, but
I think that should not have a huge impact considering the shrinker
will run rarely and only for a short time.
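
As a rough sketch of what I am planning for the next version, assuming
mmu_shadow_page_cache_lock is turned into a mutex (the other names are
from this series):

	/* Shrinker side: skip a busy vCPU instead of waiting for it. */
	kvm_for_each_vcpu(i, vcpu, kvm) {
		if (!mutex_trylock(&vcpu->arch.mmu_shadow_page_cache_lock))
			continue;	/* vCPU is topping up or allocating */
		freed += kvm_mmu_empty_memory_cache(&vcpu->arch.mmu_shadow_page_cache);
		mutex_unlock(&vcpu->arch.mmu_shadow_page_cache_lock);
		if (freed >= sc->nr_to_scan)
			break;
	}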

> Similar to mmu_notifier_seq, mmu_cache_sequence should be protected by
> mmu write lock. In the page fault path, each vcpu has to collect a
> snapshot of mmu_cache_sequence before calling into
> mmu_topup_memory_caches() and check the value again when holding the
> mmu lock. If the value is different, that means the mmu_shrinker has
> removed the cache objects and because of that, the vcpu should retry.
>

Yeah, this can be one approach. I think it will come down to the
performance impact of using a mutex, which I don't think should be a
concern.

2023-01-04 07:21:59

by Mingwei Zhang

[permalink] [raw]
Subject: Re: [Patch v3 1/9] KVM: x86/mmu: Repurpose KVM MMU shrinker to purge shadow page caches

On Tue, Jan 3, 2023 at 10:29 PM Mingwei Zhang <[email protected]> wrote:
>
> On Tue, Jan 3, 2023 at 5:00 PM Vipin Sharma <[email protected]> wrote:
> >
> > On Tue, Jan 3, 2023 at 11:32 AM Mingwei Zhang <[email protected]> wrote:
> > >
> > > On Wed, Dec 21, 2022 at 6:35 PM Vipin Sharma <[email protected]> wrote:
> > > >
> > > > +static void mmu_free_sp_memory_cache(struct kvm_mmu_memory_cache *cache,
> > > > + spinlock_t *cache_lock)
> > > > +{
> > > > + int orig_nobjs;
> > > > +
> > > > + spin_lock(cache_lock);
> > > > + orig_nobjs = cache->nobjs;
> > > > + kvm_mmu_free_memory_cache(cache);
> > > > + if (orig_nobjs)
> > > > + percpu_counter_sub(&kvm_total_unused_mmu_pages, orig_nobjs);
> > > > +
> > > > + spin_unlock(cache_lock);
> > > > +}
> > >
> > > I think the mmu_cache allocation and deallocation may cause the usage
> > > of GFP_ATOMIC (as observed by other reviewers as well). Adding a new
> > > lock would definitely sound like a plan, but I think it might affect
> > > the performance. Alternatively, I am wondering if we could use a
> > > mmu_cache_sequence similar to mmu_notifier_seq to help avoid the
> > > concurrency?
> > >
> >
> > Can you explain more about the performance impact? Each vcpu will have
> > its own mutex. So, only contention will be with the mmu_shrinker. This
> > shrinker will use mutex_try_lock() which will not block to wait for
> > the lock, it will just pass on to the next vcpu. While shrinker is
> > holding the lock, vcpu will be blocked in the page fault path but I
> > think it should not have a huge impact considering it will execute
> > rarely and for a small time.
> >
> > > Similar to mmu_notifier_seq, mmu_cache_sequence should be protected by
> > > mmu write lock. In the page fault path, each vcpu has to collect a
> > > snapshot of mmu_cache_sequence before calling into
> > > mmu_topup_memory_caches() and check the value again when holding the
> > > mmu lock. If the value is different, that means the mmu_shrinker has
> > > removed the cache objects and because of that, the vcpu should retry.
> > >
> >
> > Yeah, this can be one approach. I think it will come down to the
> > performance impact of using mutex which I don't think should be a
> > concern.
>
> hmm, I think you are right that there is no performance overhead by
> adding a mutex and letting the shrinker using mutex_trylock(). The
> point of using a sequence counter is to avoid the new lock, since
> introducing a new lock will increase management burden. So unless it
> is necessary, we probably should choose a simple solution first.
>
> In this case, I think we do have such a choice and since a similar
> mechanism has already been used by mmu_notifiers.
>

Let me take it back. The per-vCPU sequence number in this case has to
be protected by the VM-level mmu write lock. I think this might be less
performant than using a per-vCPU mutex.

2023-01-04 07:32:27

by Mingwei Zhang

[permalink] [raw]
Subject: Re: [Patch v3 1/9] KVM: x86/mmu: Repurpose KVM MMU shrinker to purge shadow page caches

On Tue, Jan 3, 2023 at 5:00 PM Vipin Sharma <[email protected]> wrote:
>
> On Tue, Jan 3, 2023 at 11:32 AM Mingwei Zhang <[email protected]> wrote:
> >
> > On Wed, Dec 21, 2022 at 6:35 PM Vipin Sharma <[email protected]> wrote:
> > >
> > > +static void mmu_free_sp_memory_cache(struct kvm_mmu_memory_cache *cache,
> > > + spinlock_t *cache_lock)
> > > +{
> > > + int orig_nobjs;
> > > +
> > > + spin_lock(cache_lock);
> > > + orig_nobjs = cache->nobjs;
> > > + kvm_mmu_free_memory_cache(cache);
> > > + if (orig_nobjs)
> > > + percpu_counter_sub(&kvm_total_unused_mmu_pages, orig_nobjs);
> > > +
> > > + spin_unlock(cache_lock);
> > > +}
> >
> > I think the mmu_cache allocation and deallocation may cause the usage
> > of GFP_ATOMIC (as observed by other reviewers as well). Adding a new
> > lock would definitely sound like a plan, but I think it might affect
> > the performance. Alternatively, I am wondering if we could use a
> > mmu_cache_sequence similar to mmu_notifier_seq to help avoid the
> > concurrency?
> >
>
> Can you explain more about the performance impact? Each vcpu will have
> its own mutex. So, only contention will be with the mmu_shrinker. This
> shrinker will use mutex_try_lock() which will not block to wait for
> the lock, it will just pass on to the next vcpu. While shrinker is
> holding the lock, vcpu will be blocked in the page fault path but I
> think it should not have a huge impact considering it will execute
> rarely and for a small time.
>
> > Similar to mmu_notifier_seq, mmu_cache_sequence should be protected by
> > mmu write lock. In the page fault path, each vcpu has to collect a
> > snapshot of mmu_cache_sequence before calling into
> > mmu_topup_memory_caches() and check the value again when holding the
> > mmu lock. If the value is different, that means the mmu_shrinker has
> > removed the cache objects and because of that, the vcpu should retry.
> >
>
> Yeah, this can be one approach. I think it will come down to the
> performance impact of using mutex which I don't think should be a
> concern.

Hmm, I think you are right that there is no performance overhead to
adding a mutex and letting the shrinker use mutex_trylock(). The
point of using a sequence counter is to avoid the new lock, since
introducing a new lock will increase the management burden. So unless
it is necessary, we should probably choose the simpler solution first.

In this case, I think we do have such a choice, since a similar
mechanism has already been used by the mmu_notifiers.
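
Concretely, it would mirror the mmu_notifier_seq pattern in the fault
path, roughly like the sketch below (the mmu_cache_seq field name is
hypothetical, purely for illustration):

	/* vCPU, page fault path. */
	cache_seq = READ_ONCE(kvm->arch.mmu_cache_seq);

	r = mmu_topup_memory_caches(vcpu, true);
	if (r)
		return r;

	write_lock(&kvm->mmu_lock);
	if (cache_seq != kvm->arch.mmu_cache_seq) {
		/* The shrinker emptied the caches in between, retry the fault. */
		write_unlock(&kvm->mmu_lock);
		return RET_PF_RETRY;
	}
	/* ... handle the fault, consuming objects from the caches ... */

The shrinker side would bump kvm->arch.mmu_cache_seq under the mmu
write lock before freeing the cache objects.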

best
-Mingwei

2023-01-16 04:40:32

by Yujie Liu

[permalink] [raw]
Subject: Re: [Patch v3 1/9] KVM: x86/mmu: Repurpose KVM MMU shrinker to purge shadow page caches

Greetings,

FYI, we noticed BUG:sleeping_function_called_from_invalid_context_at_include/linux/sched/mm.h due to commit (built with gcc-11):

commit: 99e2853d906a7593e6a3f0e5bc7ecc503b6b9462 ("[Patch v3 1/9] KVM: x86/mmu: Repurpose KVM MMU shrinker to purge shadow page caches")
url: https://github.com/intel-lab-lkp/linux/commits/Vipin-Sharma/NUMA-aware-page-table-s-pages-allocation/20221222-104911
base: https://git.kernel.org/cgit/virt/kvm/kvm.git queue
patch link: https://lore.kernel.org/all/[email protected]/
patch subject: [Patch v3 1/9] KVM: x86/mmu: Repurpose KVM MMU shrinker to purge shadow page caches

in testcase: kvm-unit-tests-qemu
version: kvm-unit-tests-x86_64-e11a0e2-1_20230106

on test machine: 128 threads 2 sockets Intel(R) Xeon(R) Platinum 8358 CPU @ 2.60GHz (Ice Lake) with 128G memory

caused below changes (please refer to attached dmesg/kmsg for entire log/backtrace):


[ 159.416792][T16345] BUG: sleeping function called from invalid context at include/linux/sched/mm.h:274
[ 159.426638][T16345] in_atomic(): 1, irqs_disabled(): 0, non_block: 0, pid: 16345, name: qemu-system-x86
[ 159.426641][T16345] preempt_count: 1, expected: 0
[ 159.426644][T16345] CPU: 122 PID: 16345 Comm: qemu-system-x86 Not tainted 6.1.0-rc8-00451-g99e2853d906a #1
[ 159.426647][T16345] Call Trace:
[ 159.426649][T16345] <TASK>
[159.426650][T16345] dump_stack_lvl (lib/dump_stack.c:107 (discriminator 1))
[159.445592][T16345] __might_resched.cold (kernel/sched/core.c:9909)
[159.459683][T16345] ? __kvm_mmu_topup_memory_cache (arch/x86/kvm/../../../virt/kvm/kvm_main.c:411) kvm
[159.472465][T16345] __kmem_cache_alloc_node (include/linux/sched/mm.h:274 mm/slab.h:710 mm/slub.c:3318 mm/slub.c:3437)
[159.479626][T16345] ? kasan_set_track (mm/kasan/common.c:52)
[159.486869][T16345] ? __kvm_mmu_topup_memory_cache (arch/x86/kvm/../../../virt/kvm/kvm_main.c:411) kvm
[159.503129][T16345] __kmalloc_node (include/linux/kasan.h:211 mm/slab_common.c:955 mm/slab_common.c:962)
[159.510635][T16345] __kvm_mmu_topup_memory_cache (arch/x86/kvm/../../../virt/kvm/kvm_main.c:411) kvm
[159.525074][T16345] ? _raw_write_lock_irq (kernel/locking/spinlock.c:153)
[159.533706][T16345] ? down_read (arch/x86/include/asm/atomic64_64.h:34 include/linux/atomic/atomic-long.h:41 include/linux/atomic/atomic-instrumented.h:1280 kernel/locking/rwsem.c:176 kernel/locking/rwsem.c:181 kernel/locking/rwsem.c:249 kernel/locking/rwsem.c:1259 kernel/locking/rwsem.c:1269 kernel/locking/rwsem.c:1511)
[159.533710][T16345] mmu_topup_memory_caches (arch/x86/kvm/mmu/mmu.c:670 arch/x86/kvm/mmu/mmu.c:686) kvm
[159.547875][T16345] kvm_mmu_load (arch/x86/kvm/mmu/mmu.c:5436) kvm
[159.556325][T16345] vcpu_enter_guest+0x1ad7/0x30f0 kvm
[159.571283][T16345] ? ttwu_queue_wakelist (kernel/sched/core.c:3844 kernel/sched/core.c:3839)
[159.577747][T16345] ? vmx_prepare_switch_to_guest (arch/x86/kvm/vmx/vmx.c:1322) kvm_intel
[159.593219][T16345] ? kvm_check_and_inject_events (arch/x86/kvm/x86.c:10215) kvm
[159.600193][T16345] ? try_to_wake_up (include/linux/sched.h:2239 kernel/sched/core.c:4197)
[159.600197][T16345] ? kernel_fpu_begin_mask (arch/x86/kernel/fpu/core.c:137)
[159.616366][T16345] vcpu_run (arch/x86/kvm/x86.c:10687) kvm
[159.623697][T16345] ? fpu_swap_kvm_fpstate (arch/x86/kernel/fpu/core.c:368)
[159.623700][T16345] kvm_arch_vcpu_ioctl_run (arch/x86/kvm/x86.c:10908) kvm
[159.640555][T16345] kvm_vcpu_ioctl (arch/x86/kvm/../../../virt/kvm/kvm_main.c:4107) kvm
[159.649090][T16345] ? vfs_fileattr_set (fs/ioctl.c:774)
[159.649094][T16345] ? kvm_dying_cpu (arch/x86/kvm/../../../virt/kvm/kvm_main.c:4063) kvm
[159.659190][T16345] ? do_futex (kernel/futex/syscalls.c:111)
[159.673538][T16345] ? __x64_sys_get_robust_list (kernel/futex/syscalls.c:87)
[159.673542][T16345] ? __x64_sys_rt_sigaction (kernel/signal.c:4242)
[159.680957][T16345] ? _raw_spin_lock_bh (kernel/locking/spinlock.c:169)
[159.680960][T16345] ? __x64_sys_futex (kernel/futex/syscalls.c:183 kernel/futex/syscalls.c:164 kernel/futex/syscalls.c:164)
[159.697043][T16345] ? __fget_files (arch/x86/include/asm/atomic64_64.h:22 include/linux/atomic/atomic-arch-fallback.h:2363 include/linux/atomic/atomic-arch-fallback.h:2388 include/linux/atomic/atomic-arch-fallback.h:2404 include/linux/atomic/atomic-long.h:497 include/linux/atomic/atomic-instrumented.h:1854 fs/file.c:882 fs/file.c:913)
[159.697047][T16345] __x64_sys_ioctl (fs/ioctl.c:52 fs/ioctl.c:870 fs/ioctl.c:856 fs/ioctl.c:856)
[159.704290][T16345] do_syscall_64 (arch/x86/entry/common.c:50 arch/x86/entry/common.c:80)
[159.714133][T16345] entry_SYSCALL_64_after_hwframe (arch/x86/entry/entry_64.S:120)
[ 159.714136][T16345] RIP: 0033:0x7f1ca20ffcc7
[ 159.728227][T16345] Code: 00 00 00 48 8b 05 c9 91 0c 00 64 c7 00 26 00 00 00 48 c7 c0 ff ff ff ff c3 66 2e 0f 1f 84 00 00 00 00 00 b8 10 00 00 00 0f 05 <48> 3d 01 f0 ff ff 73 01 c3 48 8b 0d 99 91 0c 00 f7 d8 64 89 01 48
All code
========
0: 00 00 add %al,(%rax)
2: 00 48 8b add %cl,-0x75(%rax)
5: 05 c9 91 0c 00 add $0xc91c9,%eax
a: 64 c7 00 26 00 00 00 movl $0x26,%fs:(%rax)
11: 48 c7 c0 ff ff ff ff mov $0xffffffffffffffff,%rax
18: c3 retq
19: 66 2e 0f 1f 84 00 00 nopw %cs:0x0(%rax,%rax,1)
20: 00 00 00
23: b8 10 00 00 00 mov $0x10,%eax
28: 0f 05 syscall
2a:* 48 3d 01 f0 ff ff cmp $0xfffffffffffff001,%rax <-- trapping instruction
30: 73 01 jae 0x33
32: c3 retq
33: 48 8b 0d 99 91 0c 00 mov 0xc9199(%rip),%rcx # 0xc91d3
3a: f7 d8 neg %eax
3c: 64 89 01 mov %eax,%fs:(%rcx)
3f: 48 rex.W

Code starting with the faulting instruction
===========================================
0: 48 3d 01 f0 ff ff cmp $0xfffffffffffff001,%rax
6: 73 01 jae 0x9
8: c3 retq
9: 48 8b 0d 99 91 0c 00 mov 0xc9199(%rip),%rcx # 0xc91a9
10: f7 d8 neg %eax
12: 64 89 01 mov %eax,%fs:(%rcx)
15: 48 rex.W
[ 159.728230][T16345] RSP: 002b:00007f1ca11ea848 EFLAGS: 00000246
[ 159.736078][T16345] ORIG_RAX: 0000000000000010
[ 159.736080][T16345] RAX: ffffffffffffffda RBX: 000000000000ae80 RCX: 00007f1ca20ffcc7
[ 159.736082][T16345] RDX: 0000000000000000 RSI: 000000000000ae80 RDI: 000000000000000e
[ 159.751470][T16345] RBP: 0000555803999500 R08: 0000000000000000 R09: 0000555801cd6d80
[ 159.751472][T16345] R10: 0000000000000000 R11: 0000000000000246 R12: 0000000000000000
[ 159.761055][T16345] R13: 0000555801cdd060 R14: 00007f1ca11eab00 R15: 0000000000802000
[ 159.761058][T16345] </TASK>
[ 159.780317][T16345] x86/split lock detection: #AC: qemu-system-x86/16345 took a split_lock trap at address: 0x1e3


If you fix the issue, kindly add following tag
| Reported-by: kernel test robot <[email protected]>
| Link: https://lore.kernel.org/oe-lkp/[email protected]


To reproduce:

git clone https://github.com/intel/lkp-tests.git
cd lkp-tests
sudo bin/lkp install job.yaml # job file is attached in this email
bin/lkp split-job --compatible job.yaml # generate the yaml file for lkp run
sudo bin/lkp run generated-yaml-file

# if come across any failure that blocks the test,
# please remove ~/.lkp and /lkp dir to run from a clean state.


--
0-DAY CI Kernel Test Service
https://github.com/intel/lkp-tests


Attachments:
(No filename) (7.53 kB)
config-6.1.0-rc8-00451-g99e2853d906a (168.33 kB)
job-script (6.17 kB)
dmesg.xz (119.35 kB)
kvm-unit-tests-qemu (226.55 kB)
job.yaml (4.98 kB)
reproduce (153.00 B)

2023-01-18 17:48:17

by Sean Christopherson

[permalink] [raw]
Subject: Re: [Patch v3 1/9] KVM: x86/mmu: Repurpose KVM MMU shrinker to purge shadow page caches

On Tue, Jan 03, 2023, Mingwei Zhang wrote:
> On Tue, Jan 3, 2023 at 5:00 PM Vipin Sharma <[email protected]> wrote:
> > > I think the mmu_cache allocation and deallocation may cause the usage
> > > of GFP_ATOMIC (as observed by other reviewers as well). Adding a new
> > > lock would definitely sound like a plan, but I think it might affect
> > > the performance. Alternatively, I am wondering if we could use a
> > > mmu_cache_sequence similar to mmu_notifier_seq to help avoid the
> > > concurrency?
> > >
> >
> > Can you explain more about the performance impact? Each vcpu will have
> > its own mutex. So, only contention will be with the mmu_shrinker. This
> > shrinker will use mutex_try_lock() which will not block to wait for
> > the lock, it will just pass on to the next vcpu. While shrinker is
> > holding the lock, vcpu will be blocked in the page fault path but I
> > think it should not have a huge impact considering it will execute
> > rarely and for a small time.
> >
> > > Similar to mmu_notifier_seq, mmu_cache_sequence should be protected by
> > > mmu write lock. In the page fault path, each vcpu has to collect a
> > > snapshot of mmu_cache_sequence before calling into
> > > mmu_topup_memory_caches() and check the value again when holding the
> > > mmu lock. If the value is different, that means the mmu_shrinker has
> > > removed the cache objects and because of that, the vcpu should retry.
> > >
> >
> > Yeah, this can be one approach. I think it will come down to the
> > performance impact of using mutex which I don't think should be a
> > concern.
>
> hmm, I think you are right that there is no performance overhead by
> adding a mutex and letting the shrinker using mutex_trylock(). The
> point of using a sequence counter is to avoid the new lock, since
> introducing a new lock will increase management burden.

No, more locks don't necessarily mean higher maintenance cost. More complexity
definitely means more maintenance, but additional locks don't necessarily equate
to increased complexity.

Lockless algorithms are almost always more difficult to reason about, i.e. trying
to use a sequence counter for this case would be more complex than using a per-vCPU
mutex. The only complexity in adding another mutex is understanding why an additional
lock is necessary, and IMO that's fairly easy to explain/understand (the shrinker will
almost never succeed if it has to wait for vcpu->mutex to be dropped).
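
For example, with a dedicated per-vCPU cache mutex the two sides pair up
roughly as below (a sketch, not the exact code; reusing the existing
vcpu->mutex instead would be pointless since it is held for the entire
vCPU ioctl, so a trylock on it would almost always fail):

	/* vCPU, page fault path: sleepable, so a plain mutex_lock() is fine. */
	mutex_lock(&vcpu->arch.mmu_shadow_page_cache_lock);
	r = mmu_topup_memory_caches(vcpu, true);
	if (!r) {
		/* ... handle the fault, consuming objects from the cache ... */
	}
	mutex_unlock(&vcpu->arch.mmu_shadow_page_cache_lock);

	/* Shrinker: never sleep waiting for a vCPU that is mid-fault. */
	if (mutex_trylock(&vcpu->arch.mmu_shadow_page_cache_lock)) {
		freed += kvm_mmu_empty_memory_cache(&vcpu->arch.mmu_shadow_page_cache);
		mutex_unlock(&vcpu->arch.mmu_shadow_page_cache_lock);
	}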

> So unless it is necessary, we probably should choose a simple solution first.
>
> In this case, I think we do have such a choice and since a similar
> mechanism has already been used by mmu_notifiers.

The mmu_notifier case is very different. The invalidations affect the entire VM,
notifiers _must_ succeed and may or may not allow sleeping, the readers (vCPUs)
effectively need protection while running in the guest, and practically speaking
holding a per-VM (or global) lock of any kind while a vCPU is running in the guest
is not viable, e.g. even holding kvm->srcu is disallowed.

In other words, using a traditional locking scheme to serialize guest accesses
with host-initiated page table (or memslot) updates is simply not an option.

2023-01-18 18:02:38

by Sean Christopherson

[permalink] [raw]
Subject: Re: [Patch v3 1/9] KVM: x86/mmu: Repurpose KVM MMU shrinker to purge shadow page caches

@all, trim your replies!

On Tue, Jan 03, 2023, Vipin Sharma wrote:
> On Tue, Jan 3, 2023 at 10:01 AM Vipin Sharma <[email protected]> wrote:
> >
> > On Thu, Dec 29, 2022 at 1:55 PM David Matlack <[email protected]> wrote:
> > > > @@ -6646,66 +6690,49 @@ void kvm_mmu_invalidate_mmio_sptes(struct kvm *kvm, u64 gen)
> > > > static unsigned long
> > > > mmu_shrink_scan(struct shrinker *shrink, struct shrink_control *sc)
> > > > {
> > > > - struct kvm *kvm;
> > > > - int nr_to_scan = sc->nr_to_scan;
> > > > + struct kvm_mmu_memory_cache *cache;
> > > > + struct kvm *kvm, *first_kvm = NULL;
> > > > unsigned long freed = 0;
> > > > + /* spinlock for memory cache */
> > > > + spinlock_t *cache_lock;
> > > > + struct kvm_vcpu *vcpu;
> > > > + unsigned long i;
> > > >
> > > > mutex_lock(&kvm_lock);
> > > >
> > > > list_for_each_entry(kvm, &vm_list, vm_list) {
> > > > - int idx;
> > > > - LIST_HEAD(invalid_list);
> > > > -
> > > > - /*
> > > > - * Never scan more than sc->nr_to_scan VM instances.
> > > > - * Will not hit this condition practically since we do not try
> > > > - * to shrink more than one VM and it is very unlikely to see
> > > > - * !n_used_mmu_pages so many times.
> > > > - */
> > > > - if (!nr_to_scan--)
> > > > + if (first_kvm == kvm)
> > > > break;
> > > > - /*
> > > > - * n_used_mmu_pages is accessed without holding kvm->mmu_lock
> > > > - * here. We may skip a VM instance errorneosly, but we do not
> > > > - * want to shrink a VM that only started to populate its MMU
> > > > - * anyway.
> > > > - */
> > > > - if (!kvm->arch.n_used_mmu_pages &&
> > > > - !kvm_has_zapped_obsolete_pages(kvm))
> > > > - continue;
> > > > + if (!first_kvm)
> > > > + first_kvm = kvm;
> > > > + list_move_tail(&kvm->vm_list, &vm_list);
> > > >
> > > > - idx = srcu_read_lock(&kvm->srcu);
> > > > - write_lock(&kvm->mmu_lock);
> > > > + kvm_for_each_vcpu(i, vcpu, kvm) {
> > >
> > > What protects this from racing with vCPU creation/deletion?
> > >
>
> vCPU deletion:
> We take kvm_lock in mmu_shrink_scan(), the same lock is taken in
> kvm_destroy_vm() to remove a vm from vm_list. So, once we are
> iterating vm_list we will not see any VM removal which will means no
> vcpu removal.
>
> I didn't find any other code for vCPU deletion except failures during
> VM and VCPU set up. A VM is only added to vm_list after successful
> creation.

Yep, KVM doesn't support destroying/freeing a vCPU after it's been added.

> vCPU creation:
> I think it will work.
>
> kvm_vm_ioctl_create_vcpus() initializes the vcpu, adds it to
> kvm->vcpu_array which is of the type xarray and is managed by RCU.
> After this online_vcpus is incremented. So, kvm_for_each_vcpu() which
> uses RCU to read entries, if it sees incremented online_vcpus value
> then it will also sees all of the vcpu initialization.

Yep. The shrinker may race with a vCPU creation, e.g. not process a just-created
vCPU, but that's totally ok in this case since the shrinker path is best effort
(and purging the caches of a newly created vCPU is likely pointless).

> @Sean, Paolo
>
> Is the above explanation correct, kvm_for_each_vcpu() is safe without any lock?

Well, in this case, you do need to hold kvm_lock ;-)

But yes, iterating over vCPUs without holding the per-VM kvm->lock is safe; the
caller just needs to ensure the VM can't be destroyed, i.e. either needs to hold
a reference to the VM or needs to hold kvm_lock.