2022-03-21 22:07:33

by Mingwei Zhang

Subject: [PATCH 0/4] Verify dirty logging works properly with page stats

This patchset verifies that dirty logging works properly by checking page
stats from the per-VM stats interface. We discovered a performance bug in
disallowed_hugepage_adjust() which prevents KVM from recovering large pages
for the guest. The selftest logic added later in the series helps validate
the problem.

The patchset borrows two patches from Ben's series "[PATCH 00/13]
KVM: x86: Add a cap to disable NX hugepages on a VM" [1], which complete
the selftest library functions needed to use the stats interface.

[1] https://lore.kernel.org/all/[email protected]/T/
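
For reference, below is a minimal standalone sketch of how the per-VM binary
stats interface is consumed from userspace. It is illustrative only and not
part of this series; it relies solely on the uapi definitions in
<linux/kvm.h>, and error handling is kept to a minimum:

#include <fcntl.h>
#include <stdint.h>
#include <stdio.h>
#include <stdlib.h>
#include <sys/ioctl.h>
#include <unistd.h>
#include <linux/kvm.h>

int main(void)
{
	struct kvm_stats_header hdr;
	struct kvm_stats_desc *desc;
	int kvm, vm, stats;
	size_t desc_sz;
	uint64_t val;
	uint32_t i;

	kvm = open("/dev/kvm", O_RDWR);
	vm = ioctl(kvm, KVM_CREATE_VM, 0);
	stats = ioctl(vm, KVM_GET_STATS_FD, NULL);
	if (kvm < 0 || vm < 0 || stats < 0) {
		perror("kvm");
		return 1;
	}

	/* The header says where the descriptors and the data block live. */
	pread(stats, &hdr, sizeof(hdr), 0);

	/* Each descriptor is followed by a name field of hdr.name_size bytes. */
	desc_sz = sizeof(*desc) + hdr.name_size;
	desc = malloc(desc_sz);

	for (i = 0; i < hdr.num_desc; i++) {
		pread(stats, desc, desc_sz, hdr.desc_offset + (size_t)i * desc_sz);
		/* Print only the first element; most stats are scalar u64s. */
		pread(stats, &val, sizeof(val), hdr.data_offset + desc->offset);
		printf("%-40s %llu\n", desc->name, (unsigned long long)val);
	}

	free(desc);
	return 0;
}

On x86, the page size stats the new selftest logic cares about (pages_4k,
pages_2m, pages_1g) should show up in this dump alongside the other per-VM
stats.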

Ben Gardon (2):
selftests: KVM: Dump VM stats in binary stats test
selftests: KVM: Test reading a single stat

Mingwei Zhang (2):
KVM: x86/mmu: explicitly check nx_hugepage in
disallowed_hugepage_adjust()
selftests: KVM: use dirty logging to check if page stats work
correctly

arch/x86/kvm/mmu/mmu.c | 14 +-
.../selftests/kvm/dirty_log_perf_test.c | 52 +++++
.../selftests/kvm/include/kvm_util_base.h | 2 +
.../selftests/kvm/kvm_binary_stats_test.c | 6 +
tools/testing/selftests/kvm/lib/kvm_util.c | 196 ++++++++++++++++++
5 files changed, 268 insertions(+), 2 deletions(-)

--
2.35.1.894.gb6a874cedc-goog


2022-03-21 23:15:53

by Mingwei Zhang

Subject: [PATCH 3/4] KVM: x86/mmu: explicitly check nx_hugepage in disallowed_hugepage_adjust()

Add an extra check for the nx hugepage case so that KVM can reconstruct
large mappings after dirty logging is disabled. The existing code works
only for nx hugepages, but the condition is too general in that it does
not consider other use cases (such as dirty logging). Moreover, the
existing code assumes that a present PMD or PUD indicates that there
exist 'smaller SPTEs' under the paging structure. This assumption may not
be true when the MMU zaps only leaf SPTEs.

Missing the check causes KVM to incorrectly regard the faulting page as an
NX huge page and refuse to map it at the desired level, which leads to bad
performance in the shadow MMU and potentially in the TDP MMU.

Fixes: b8e8c8303ff2 ("kvm: mmu: ITLB_MULTIHIT mitigation")
Cc: [email protected]

Reviewed-by: Ben Gardon <[email protected]>
Signed-off-by: Mingwei Zhang <[email protected]>
---
arch/x86/kvm/mmu/mmu.c | 14 ++++++++++++--
1 file changed, 12 insertions(+), 2 deletions(-)

diff --git a/arch/x86/kvm/mmu/mmu.c b/arch/x86/kvm/mmu/mmu.c
index 5628d0ba637e..4d358c273f6c 100644
--- a/arch/x86/kvm/mmu/mmu.c
+++ b/arch/x86/kvm/mmu/mmu.c
@@ -2919,6 +2919,16 @@ void disallowed_hugepage_adjust(struct kvm_page_fault *fault, u64 spte, int cur_
cur_level == fault->goal_level &&
is_shadow_present_pte(spte) &&
!is_large_pte(spte)) {
+ struct kvm_mmu_page *sp;
+ u64 page_mask;
+ /*
+ * When nx hugepage flag is not set, there is no reason to
+ * go down to another level. This helps demand paging to
+ * generate large mappings.
+ */
+ sp = to_shadow_page(spte & PT64_BASE_ADDR_MASK);
+ if (!sp->lpage_disallowed)
+ return;
/*
* A small SPTE exists for this pfn, but FNAME(fetch)
* and __direct_map would like to create a large PTE
@@ -2926,8 +2936,8 @@ void disallowed_hugepage_adjust(struct kvm_page_fault *fault, u64 spte, int cur_
* patching back for them into pfn the next 9 bits of
* the address.
*/
- u64 page_mask = KVM_PAGES_PER_HPAGE(cur_level) -
- KVM_PAGES_PER_HPAGE(cur_level - 1);
+ page_mask = KVM_PAGES_PER_HPAGE(cur_level) -
+ KVM_PAGES_PER_HPAGE(cur_level - 1);
fault->pfn |= fault->gfn & page_mask;
fault->goal_level--;
}
--
2.35.1.894.gb6a874cedc-goog

2022-03-21 23:37:12

by David Matlack

Subject: Re: [PATCH 3/4] KVM: x86/mmu: explicitly check nx_hugepage in disallowed_hugepage_adjust()

On Mon, Mar 21, 2022 at 3:00 PM David Matlack <[email protected]> wrote:
>
> On Sun, Mar 20, 2022 at 5:26 PM Mingwei Zhang <[email protected]> wrote:
> >
> > Add extra check to specify the case of nx hugepage and allow KVM to
> > reconstruct large mapping after dirty logging is disabled. Existing code
> > works only for nx hugepage but the condition is too general in that does
> > not consider other usage case (such as dirty logging).
>
> KVM calls kvm_mmu_zap_collapsible_sptes() when dirty logging is
> disabled. Why is that not sufficient?

Ahh I see, kvm_mmu_zap_collapsible_sptes() only zaps the leaf SPTEs.
Could you add a blurb about this in the commit message for future
reference?

>
> > Moreover, existing
> > code assumes that a present PMD or PUD indicates that there exist 'smaller
> > SPTEs' under the paging structure. This assumption may no be true if
> > consider the zapping leafs only behavior in MMU.
>
> Good point. Although, that code just got reverted. Maybe say something like:
>
> This assumption may not be true in the future if KVM gains support
> for zapping only leaf SPTEs.

Nevermind, support for zapping leaf SPTEs already exists for zapping
collapsible SPTEs.



>
> >
> > Missing the check causes KVM incorrectly regards the faulting page as a NX
> > huge page and refuse to map it at desired level. And this leads to back
> > performance in shadow mmu and potentiall TDP mmu.
>
> s/potentiall/potentially/
>
> >
> > Fixes: b8e8c8303ff2 ("kvm: mmu: ITLB_MULTIHIT mitigation")
> > Cc: [email protected]
> >
> > Reviewed-by: Ben Gardon <[email protected]>
> > Signed-off-by: Mingwei Zhang <[email protected]>
> > ---
> > arch/x86/kvm/mmu/mmu.c | 14 ++++++++++++--
> > 1 file changed, 12 insertions(+), 2 deletions(-)
> >
> > diff --git a/arch/x86/kvm/mmu/mmu.c b/arch/x86/kvm/mmu/mmu.c
> > index 5628d0ba637e..4d358c273f6c 100644
> > --- a/arch/x86/kvm/mmu/mmu.c
> > +++ b/arch/x86/kvm/mmu/mmu.c
> > @@ -2919,6 +2919,16 @@ void disallowed_hugepage_adjust(struct kvm_page_fault *fault, u64 spte, int cur_
> > cur_level == fault->goal_level &&
> > is_shadow_present_pte(spte) &&
> > !is_large_pte(spte)) {
> > + struct kvm_mmu_page *sp;
> > + u64 page_mask;
> > + /*
> > + * When nx hugepage flag is not set, there is no reason to
> > + * go down to another level. This helps demand paging to
> > + * generate large mappings.
> > + */
> > + sp = to_shadow_page(spte & PT64_BASE_ADDR_MASK);
> > + if (!sp->lpage_disallowed)
> > + return;
> > /*
> > * A small SPTE exists for this pfn, but FNAME(fetch)
> > * and __direct_map would like to create a large PTE
> > @@ -2926,8 +2936,8 @@ void disallowed_hugepage_adjust(struct kvm_page_fault *fault, u64 spte, int cur_
> > * patching back for them into pfn the next 9 bits of
> > * the address.
> > */
> > - u64 page_mask = KVM_PAGES_PER_HPAGE(cur_level) -
> > - KVM_PAGES_PER_HPAGE(cur_level - 1);
> > + page_mask = KVM_PAGES_PER_HPAGE(cur_level) -
> > + KVM_PAGES_PER_HPAGE(cur_level - 1);
> > fault->pfn |= fault->gfn & page_mask;
> > fault->goal_level--;
> > }
> > --
> > 2.35.1.894.gb6a874cedc-goog
> >

2022-03-21 23:39:41

by David Matlack

Subject: Re: [PATCH 3/4] KVM: x86/mmu: explicitly check nx_hugepage in disallowed_hugepage_adjust()

On Sun, Mar 20, 2022 at 5:26 PM Mingwei Zhang <[email protected]> wrote:
>
> Add extra check to specify the case of nx hugepage and allow KVM to
> reconstruct large mapping after dirty logging is disabled. Existing code
> works only for nx hugepage but the condition is too general in that does
> not consider other usage case (such as dirty logging).

KVM calls kvm_mmu_zap_collapsible_sptes() when dirty logging is
disabled. Why is that not sufficient?

> Moreover, existing
> code assumes that a present PMD or PUD indicates that there exist 'smaller
> SPTEs' under the paging structure. This assumption may no be true if
> consider the zapping leafs only behavior in MMU.

Good point. Although, that code just got reverted. Maybe say something like:

This assumption may not be true in the future if KVM gains support
for zapping only leaf SPTEs.

>
> Missing the check causes KVM incorrectly regards the faulting page as a NX
> huge page and refuse to map it at desired level. And this leads to back
> performance in shadow mmu and potentiall TDP mmu.

s/potentiall/potentially/

>
> Fixes: b8e8c8303ff2 ("kvm: mmu: ITLB_MULTIHIT mitigation")
> Cc: [email protected]
>
> Reviewed-by: Ben Gardon <[email protected]>
> Signed-off-by: Mingwei Zhang <[email protected]>
> ---
> arch/x86/kvm/mmu/mmu.c | 14 ++++++++++++--
> 1 file changed, 12 insertions(+), 2 deletions(-)
>
> diff --git a/arch/x86/kvm/mmu/mmu.c b/arch/x86/kvm/mmu/mmu.c
> index 5628d0ba637e..4d358c273f6c 100644
> --- a/arch/x86/kvm/mmu/mmu.c
> +++ b/arch/x86/kvm/mmu/mmu.c
> @@ -2919,6 +2919,16 @@ void disallowed_hugepage_adjust(struct kvm_page_fault *fault, u64 spte, int cur_
> cur_level == fault->goal_level &&
> is_shadow_present_pte(spte) &&
> !is_large_pte(spte)) {
> + struct kvm_mmu_page *sp;
> + u64 page_mask;
> + /*
> + * When nx hugepage flag is not set, there is no reason to
> + * go down to another level. This helps demand paging to
> + * generate large mappings.
> + */
> + sp = to_shadow_page(spte & PT64_BASE_ADDR_MASK);
> + if (!sp->lpage_disallowed)
> + return;
> /*
> * A small SPTE exists for this pfn, but FNAME(fetch)
> * and __direct_map would like to create a large PTE
> @@ -2926,8 +2936,8 @@ void disallowed_hugepage_adjust(struct kvm_page_fault *fault, u64 spte, int cur_
> * patching back for them into pfn the next 9 bits of
> * the address.
> */
> - u64 page_mask = KVM_PAGES_PER_HPAGE(cur_level) -
> - KVM_PAGES_PER_HPAGE(cur_level - 1);
> + page_mask = KVM_PAGES_PER_HPAGE(cur_level) -
> + KVM_PAGES_PER_HPAGE(cur_level - 1);
> fault->pfn |= fault->gfn & page_mask;
> fault->goal_level--;
> }
> --
> 2.35.1.894.gb6a874cedc-goog
>

2022-03-22 05:25:08

by Mingwei Zhang

Subject: Re: [PATCH 3/4] KVM: x86/mmu: explicitly check nx_hugepage in disallowed_hugepage_adjust()

On Mon, Mar 21, 2022, David Matlack wrote:
> On Mon, Mar 21, 2022 at 3:00 PM David Matlack <[email protected]> wrote:
> >
> > On Sun, Mar 20, 2022 at 5:26 PM Mingwei Zhang <[email protected]> wrote:
> > >
> > > Add extra check to specify the case of nx hugepage and allow KVM to
> > > reconstruct large mapping after dirty logging is disabled. Existing code
> > > works only for nx hugepage but the condition is too general in that does
> > > not consider other usage case (such as dirty logging).
> >
> > KVM calls kvm_mmu_zap_collapsible_sptes() when dirty logging is
> > disabled. Why is that not sufficient?
>
> Ahh I see, kvm_mmu_zap_collapsible_sptes() only zaps the leaf SPTEs.
> Could you add a blurb about this in the commit message for future
> reference?
>

will do.

> >
> > > Moreover, existing
> > > code assumes that a present PMD or PUD indicates that there exist 'smaller
> > > SPTEs' under the paging structure. This assumption may no be true if
> > > consider the zapping leafs only behavior in MMU.
> >
> > Good point. Although, that code just got reverted. Maybe say something like:
> >
> > This assumption may not be true in the future if KVM gains support
> > for zapping only leaf SPTEs.
>
> Nevermind, support for zapping leaf SPTEs already exists for zapping
> collapsible SPTEs.
>
>
>
> >
> > >
> > > Missing the check causes KVM incorrectly regards the faulting page as a NX
> > > huge page and refuse to map it at desired level. And this leads to back
> > > performance in shadow mmu and potentiall TDP mmu.
> >
> > s/potentiall/potentially/

Thanks for that.
> >
> > >
> > > Fixes: b8e8c8303ff2 ("kvm: mmu: ITLB_MULTIHIT mitigation")
> > > Cc: [email protected]
> > >
> > > Reviewed-by: Ben Gardon <[email protected]>
> > > Signed-off-by: Mingwei Zhang <[email protected]>
> > > ---
> > > arch/x86/kvm/mmu/mmu.c | 14 ++++++++++++--
> > > 1 file changed, 12 insertions(+), 2 deletions(-)
> > >
> > > diff --git a/arch/x86/kvm/mmu/mmu.c b/arch/x86/kvm/mmu/mmu.c
> > > index 5628d0ba637e..4d358c273f6c 100644
> > > --- a/arch/x86/kvm/mmu/mmu.c
> > > +++ b/arch/x86/kvm/mmu/mmu.c
> > > @@ -2919,6 +2919,16 @@ void disallowed_hugepage_adjust(struct kvm_page_fault *fault, u64 spte, int cur_
> > > cur_level == fault->goal_level &&
> > > is_shadow_present_pte(spte) &&
> > > !is_large_pte(spte)) {
> > > + struct kvm_mmu_page *sp;
> > > + u64 page_mask;
> > > + /*
> > > + * When nx hugepage flag is not set, there is no reason to
> > > + * go down to another level. This helps demand paging to
> > > + * generate large mappings.
> > > + */
> > > + sp = to_shadow_page(spte & PT64_BASE_ADDR_MASK);
> > > + if (!sp->lpage_disallowed)
> > > + return;
> > > /*
> > > * A small SPTE exists for this pfn, but FNAME(fetch)
> > > * and __direct_map would like to create a large PTE
> > > @@ -2926,8 +2936,8 @@ void disallowed_hugepage_adjust(struct kvm_page_fault *fault, u64 spte, int cur_
> > > * patching back for them into pfn the next 9 bits of
> > > * the address.
> > > */
> > > - u64 page_mask = KVM_PAGES_PER_HPAGE(cur_level) -
> > > - KVM_PAGES_PER_HPAGE(cur_level - 1);
> > > + page_mask = KVM_PAGES_PER_HPAGE(cur_level) -
> > > + KVM_PAGES_PER_HPAGE(cur_level - 1);
> > > fault->pfn |= fault->gfn & page_mask;
> > > fault->goal_level--;
> > > }
> > > --
> > > 2.35.1.894.gb6a874cedc-goog
> > >