2022-11-17 17:53:49

by Paolo Bonzini

Subject: [PATCH] KVM: x86: avoid memslot check in NX hugepage recovery if it cannot be true

Since gfn_to_memslot() is relatively expensive, it helps to
skip it if the memslot cannot possibly have dirty logging
enabled. In order to do this, add to struct kvm a counter
of the number of log-page memslots. While the correct value
can only be read with slots_lock taken, the NX recovery thread
is content with using an approximate value. Therefore, the
counter is an atomic_t.

Signed-off-by: Paolo Bonzini <[email protected]>
---
arch/x86/kvm/mmu/mmu.c | 22 +++++++++++++++++++---
include/linux/kvm_host.h | 5 +++++
virt/kvm/kvm_main.c | 5 +++++
3 files changed, 29 insertions(+), 3 deletions(-)

diff --git a/arch/x86/kvm/mmu/mmu.c b/arch/x86/kvm/mmu/mmu.c
index cfff74685a25..d4ec9491d468 100644
--- a/arch/x86/kvm/mmu/mmu.c
+++ b/arch/x86/kvm/mmu/mmu.c
@@ -6878,16 +6878,32 @@ static void kvm_recover_nx_huge_pages(struct kvm *kvm)
WARN_ON_ONCE(!sp->nx_huge_page_disallowed);
WARN_ON_ONCE(!sp->role.direct);

- slot = gfn_to_memslot(kvm, sp->gfn);
- WARN_ON_ONCE(!slot);
-
/*
* Unaccount and do not attempt to recover any NX Huge Pages
* that are being dirty tracked, as they would just be faulted
* back in as 4KiB pages. The NX Huge Pages in this slot will be
* recovered, along with all the other huge pages in the slot,
* when dirty logging is disabled.
+ *
+ * Since gfn_to_memslot() is relatively expensive, it helps to
+ * skip it if the test cannot possibly return true. On the
+ * other hand, if any memslot has logging enabled, chances are
+ * good that all of them do, in which case unaccount_nx_huge_page()
+ * is much cheaper than zapping the page.
+ *
+ * If a memslot update is in progress, reading an incorrect value
+ * of kvm->nr_logpage_memslots is not a problem: if it is becoming
+ * zero, gfn_to_memslot() will be done unnecessarily; if it is
+ * becoming nonzero, the page will be zapped unnecessarily.
+ * Either way, this only affects efficiency in racy situations,
+ * and not correctness.
*/
+ slot = NULL;
+ if (atomic_read(&kvm->nr_logpage_memslots)) {
+ slot = gfn_to_memslot(kvm, sp->gfn);
+ WARN_ON_ONCE(!slot);
+ }
+
if (slot && kvm_slot_dirty_track_enabled(slot))
unaccount_nx_huge_page(kvm, sp);
else if (is_tdp_mmu_page(sp))
diff --git a/include/linux/kvm_host.h b/include/linux/kvm_host.h
index e6e66c5e56f2..b3c2b975e737 100644
--- a/include/linux/kvm_host.h
+++ b/include/linux/kvm_host.h
@@ -722,6 +722,11 @@ struct kvm {
/* The current active memslot set for each address space */
struct kvm_memslots __rcu *memslots[KVM_ADDRESS_SPACE_NUM];
struct xarray vcpu_array;
+ /*
+ * Protected by slots_lock, but can be read outside if an
+ * incorrect answer is acceptable.
+ */
+ atomic_t nr_logpage_memslots;

/* Used to wait for completion of MMU notifiers. */
spinlock_t mn_invalidate_lock;
diff --git a/virt/kvm/kvm_main.c b/virt/kvm/kvm_main.c
index 43bbe4fde078..7670ebd29bcf 100644
--- a/virt/kvm/kvm_main.c
+++ b/virt/kvm/kvm_main.c
@@ -1627,6 +1627,11 @@ static int kvm_prepare_memory_region(struct kvm *kvm,
}
}

+ atomic_set(&kvm->nr_logpage_memslots,
+ atomic_read(&kvm->nr_logpage_memslots)
+ + !!(new->flags & KVM_MEM_LOG_DIRTY_PAGES)
+ - !!(old->flags & KVM_MEM_LOG_DIRTY_PAGES));
+
r = kvm_arch_prepare_memory_region(kvm, old, new, change);

/* Free the bitmap on failure if it was allocated above. */
--
2.31.1
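
The gating idea in the patch above can be sketched in userspace C11. This is only an illustration under stated assumptions: struct vm, struct slot, lookup_slot(), should_skip_recovery() and LOG_DIRTY are hypothetical stand-ins (for struct kvm, a memslot, gfn_to_memslot() and KVM_MEM_LOG_DIRTY_PAGES), the lookup is a plain linear search, and C11 atomic_load_explicit() stands in for the kernel's atomic_read().

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdbool.h>
#include <stddef.h>

#define LOG_DIRTY 0x1u	/* hypothetical stand-in for KVM_MEM_LOG_DIRTY_PAGES */

struct slot {
	unsigned long base_gfn;
	unsigned long npages;
	unsigned int flags;
};

struct vm {
	struct slot slots[4];
	size_t nr_slots;
	atomic_int nr_dirty_logged;	/* may be stale when read without a lock */
	unsigned long nr_lookups;	/* instrumentation for this sketch only */
};

/* Stand-in for the expensive lookup: a linear search over the slots. */
static struct slot *lookup_slot(struct vm *vm, unsigned long gfn)
{
	assert(vm->nr_slots <= sizeof(vm->slots) / sizeof(vm->slots[0]));

	vm->nr_lookups++;
	for (size_t i = 0; i < vm->nr_slots; i++) {
		struct slot *s = &vm->slots[i];

		if (gfn >= s->base_gfn && gfn - s->base_gfn < s->npages)
			return s;
	}
	return NULL;
}

/*
 * Returns true if the page should only be unaccounted (its slot is being
 * dirty logged), false if it should be zapped as usual.
 */
static bool should_skip_recovery(struct vm *vm, unsigned long gfn)
{
	struct slot *slot = NULL;

	/* Pay for the lookup only if some slot might have logging enabled. */
	if (atomic_load_explicit(&vm->nr_dirty_logged, memory_order_relaxed))
		slot = lookup_slot(vm, gfn);

	return slot && (slot->flags & LOG_DIRTY);
}
```

With the counter at zero the lookup is never performed and every page takes the zap path. A stale zero while logging is being enabled only costs an unnecessary zap, which matches the reasoning in the patch comment.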



2022-11-17 18:29:03

by David Matlack

Subject: Re: [PATCH] KVM: x86: avoid memslot check in NX hugepage recovery if it cannot be true

On Thu, Nov 17, 2022 at 9:31 AM Paolo Bonzini <[email protected]> wrote:
>
> Since gfn_to_memslot() is relatively expensive, it helps to
> skip it if the memslot cannot possibly have dirty logging
> enabled. In order to do this, add to struct kvm a counter
> of the number of log-page memslots. While the correct value
> can only be read with slots_lock taken, the NX recovery thread
> is content with using an approximate value. Therefore, the
> counter is an atomic_t.

Oo, good idea to use the counter to skip gfn_to_memslot() in the steady state.

FYI I sent an earlier patch to add an equivalent counter in case
you want to use that and apply the change to
kvm_recover_nx_huge_pages() as a separate patch.

https://lore.kernel.org/kvm/[email protected]/

>
> Signed-off-by: Paolo Bonzini <[email protected]>
> ---
> arch/x86/kvm/mmu/mmu.c | 22 +++++++++++++++++++---
> include/linux/kvm_host.h | 5 +++++
> virt/kvm/kvm_main.c | 5 +++++
> 3 files changed, 29 insertions(+), 3 deletions(-)
>
> diff --git a/arch/x86/kvm/mmu/mmu.c b/arch/x86/kvm/mmu/mmu.c
> index cfff74685a25..d4ec9491d468 100644
> --- a/arch/x86/kvm/mmu/mmu.c
> +++ b/arch/x86/kvm/mmu/mmu.c
> @@ -6878,16 +6878,32 @@ static void kvm_recover_nx_huge_pages(struct kvm *kvm)
> WARN_ON_ONCE(!sp->nx_huge_page_disallowed);
> WARN_ON_ONCE(!sp->role.direct);
>
> - slot = gfn_to_memslot(kvm, sp->gfn);
> - WARN_ON_ONCE(!slot);
> -
> /*
> * Unaccount and do not attempt to recover any NX Huge Pages
> * that are being dirty tracked, as they would just be faulted
> * back in as 4KiB pages. The NX Huge Pages in this slot will be
> * recovered, along with all the other huge pages in the slot,
> * when dirty logging is disabled.
> + *
> + * Since gfn_to_memslot() is relatively expensive, it helps to
> + * skip it if the test cannot possibly return true. On the
> + * other hand, if any memslot has logging enabled, chances are
> + * good that all of them do, in which case unaccount_nx_huge_page()
> + * is much cheaper than zapping the page.
> + *
> + * If a memslot update is in progress, reading an incorrect value
> + * of kvm->nr_logpage_memslots is not a problem: if it is becoming
> + * zero, gfn_to_memslot() will be done unnecessarily; if it is
> + * becoming nonzero, the page will be zapped unnecessarily.
> + * Either way, this only affects efficiency in racy situations,
> + * and not correctness.
> */
> + slot = NULL;
> + if (atomic_read(&kvm->nr_logpage_memslots)) {
> + slot = gfn_to_memslot(kvm, sp->gfn);
> + WARN_ON_ONCE(!slot);
> + }
> +
> if (slot && kvm_slot_dirty_track_enabled(slot))
> unaccount_nx_huge_page(kvm, sp);
> else if (is_tdp_mmu_page(sp))
> diff --git a/include/linux/kvm_host.h b/include/linux/kvm_host.h
> index e6e66c5e56f2..b3c2b975e737 100644
> --- a/include/linux/kvm_host.h
> +++ b/include/linux/kvm_host.h
> @@ -722,6 +722,11 @@ struct kvm {
> /* The current active memslot set for each address space */
> struct kvm_memslots __rcu *memslots[KVM_ADDRESS_SPACE_NUM];
> struct xarray vcpu_array;
> + /*
> + * Protected by slots_lock, but can be read outside if an
> + * incorrect answer is acceptable.
> + */
> + atomic_t nr_logpage_memslots;

Can also be int + READ_ONCE(), but I do like that atomic_t forces the
reader to use atomic_read().

>
> /* Used to wait for completion of MMU notifiers. */
> spinlock_t mn_invalidate_lock;
> diff --git a/virt/kvm/kvm_main.c b/virt/kvm/kvm_main.c
> index 43bbe4fde078..7670ebd29bcf 100644
> --- a/virt/kvm/kvm_main.c
> +++ b/virt/kvm/kvm_main.c
> @@ -1627,6 +1627,11 @@ static int kvm_prepare_memory_region(struct kvm *kvm,
> }
> }
>
> + atomic_set(&kvm->nr_logpage_memslots,
> + atomic_read(&kvm->nr_logpage_memslots)
> + + !!(new->flags & KVM_MEM_LOG_DIRTY_PAGES)
> + - !!(old->flags & KVM_MEM_LOG_DIRTY_PAGES));

@new and @old can be NULL here if creating or destroying a memslot.


> +
> r = kvm_arch_prepare_memory_region(kvm, old, new, change);
>
> /* Free the bitmap on failure if it was allocated above. */
> --
> 2.31.1
>
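
The NULL case raised in this reply (a memslot being created has no old slot, one being deleted has no new slot) can be handled by treating a missing slot as having no flags. The following is a minimal userspace sketch, not the fix that was applied: struct slot, slot_flags(), dirty_log_delta() and LOG_DIRTY are hypothetical names, and the assertion that at least one of the two pointers is non-NULL is an assumption of this sketch.

```c
#include <assert.h>
#include <stddef.h>

#define LOG_DIRTY 0x1u	/* hypothetical stand-in for KVM_MEM_LOG_DIRTY_PAGES */

struct slot {
	unsigned int flags;
};

/* A slot that does not exist contributes no flags. */
static unsigned int slot_flags(const struct slot *slot)
{
	return slot ? slot->flags : 0;
}

/*
 * Change in the number of dirty-logged slots caused by replacing
 * old_slot with new_slot: +1, 0 or -1.
 */
static int dirty_log_delta(const struct slot *old_slot,
			   const struct slot *new_slot)
{
	/* Assumption of this sketch: a change involves at least one slot. */
	assert(old_slot || new_slot);

	return !!(slot_flags(new_slot) & LOG_DIRTY) -
	       !!(slot_flags(old_slot) & LOG_DIRTY);
}
```

Creating a slot with logging enabled yields +1 and deleting such a slot yields -1, without dereferencing a NULL pointer in either case.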

2022-11-18 01:12:49

by Sean Christopherson

Subject: Re: [PATCH] KVM: x86: avoid memslot check in NX hugepage recovery if it cannot be true

On Thu, Nov 17, 2022, Paolo Bonzini wrote:
> + if (atomic_read(&kvm->nr_logpage_memslots)) {

Can we use something like nr_dirty_logged_memslots? logpage doesn't precisely
capture the "dirty log" aspect, e.g. for a (very brief) second I thought this was
log(nr_memslot_pages).

> + slot = gfn_to_memslot(kvm, sp->gfn);
> + WARN_ON_ONCE(!slot);
> + }
> +
> if (slot && kvm_slot_dirty_track_enabled(slot))
> unaccount_nx_huge_page(kvm, sp);
> else if (is_tdp_mmu_page(sp))
> diff --git a/include/linux/kvm_host.h b/include/linux/kvm_host.h
> index e6e66c5e56f2..b3c2b975e737 100644
> --- a/include/linux/kvm_host.h
> +++ b/include/linux/kvm_host.h
> @@ -722,6 +722,11 @@ struct kvm {
> /* The current active memslot set for each address space */
> struct kvm_memslots __rcu *memslots[KVM_ADDRESS_SPACE_NUM];
> struct xarray vcpu_array;
> + /*
> + * Protected by slots_lock, but can be read outside if an
> + * incorrect answer is acceptable.
> + */
> + atomic_t nr_logpage_memslots;
>
> /* Used to wait for completion of MMU notifiers. */
> spinlock_t mn_invalidate_lock;
> diff --git a/virt/kvm/kvm_main.c b/virt/kvm/kvm_main.c
> index 43bbe4fde078..7670ebd29bcf 100644
> --- a/virt/kvm/kvm_main.c
> +++ b/virt/kvm/kvm_main.c
> @@ -1627,6 +1627,11 @@ static int kvm_prepare_memory_region(struct kvm *kvm,

This needs to be done in the commit stage, e.g. if kvm_arch_prepare_memory_region()
fails the count will be all kinds of wrong. Even better, since this seems to be
x86-centric, put it in kvm_mmu_slot_apply_flags() under the

if ((old_flags ^ new_flags) & KVM_MEM_LOG_DIRTY_PAGES)

to avoid atomic operations if dirty logging isn't being toggled. That would also
deal with the NULL pointer issues David pointed out.

> }
> }
>
> + atomic_set(&kvm->nr_logpage_memslots,
> + atomic_read(&kvm->nr_logpage_memslots)
> + + !!(new->flags & KVM_MEM_LOG_DIRTY_PAGES)
> + - !!(old->flags & KVM_MEM_LOG_DIRTY_PAGES));

I believe this can be:

	atomic_add(+ !!(new_flags & KVM_MEM_LOG_DIRTY_PAGES)
		   - !!(old_flags & KVM_MEM_LOG_DIRTY_PAGES), ...);

or less weirdly...

	if ((old_flags ^ new_flags) & KVM_MEM_LOG_DIRTY_PAGES) {
		...

		if (new_flags & KVM_MEM_LOG_DIRTY_PAGES)
			atomic_inc(...);
		else
			atomic_dec(...);
	}
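
The toggle-based update suggested above can be tried out in userspace with C11 atomics. This is a hedged sketch rather than the kernel change itself: apply_flags() and LOG_DIRTY are hypothetical names, atomic_fetch_add() and atomic_fetch_sub() stand in for the kernel's atomic_inc() and atomic_dec(), and the function is assumed to be called only once the memslot change can no longer fail (the commit stage), so the counter never needs to be rolled back.

```c
#include <assert.h>
#include <stdatomic.h>

#define LOG_DIRTY 0x1u	/* hypothetical stand-in for KVM_MEM_LOG_DIRTY_PAGES */

/*
 * Update the count of dirty-logged slots for a committed flags change.
 * A created slot is modelled as old_flags == 0 and a deleted slot as
 * new_flags == 0, so no slot pointer is dereferenced here.
 */
static void apply_flags(atomic_int *nr_dirty_logged,
			unsigned int old_flags, unsigned int new_flags)
{
	/* No atomic operation unless the dirty-logging bit was toggled. */
	if (!((old_flags ^ new_flags) & LOG_DIRTY))
		return;

	if (new_flags & LOG_DIRTY) {
		atomic_fetch_add(nr_dirty_logged, 1);
	} else {
		int prev = atomic_fetch_sub(nr_dirty_logged, 1);

		/* A disable must pair with an earlier enable. */
		assert(prev > 0);
	}
}
```

Because the counter changes only when the bit flips, unrelated flag updates leave it untouched.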