When zapping obsolete pages, update the running count of zapped pages
regardless of whether or not the list has become unstable due to zapping
a shadow page with its own child shadow pages. If the VM is backed by
mostly 4kb pages, KVM can zap an absurd number of SPTEs without bumping
the batch count and thus without yielding. In the worst case scenario,
this can cause an RCU stall.
rcu: INFO: rcu_sched self-detected stall on CPU
rcu: 52-....: (20999 ticks this GP) idle=7be/1/0x4000000000000000
softirq=15759/15759 fqs=5058
(t=21016 jiffies g=66453 q=238577)
NMI backtrace for cpu 52
Call Trace:
...
mark_page_accessed+0x266/0x2f0
kvm_set_pfn_accessed+0x31/0x40
handle_removed_tdp_mmu_page+0x259/0x2e0
__handle_changed_spte+0x223/0x2c0
handle_removed_tdp_mmu_page+0x1c1/0x2e0
__handle_changed_spte+0x223/0x2c0
handle_removed_tdp_mmu_page+0x1c1/0x2e0
__handle_changed_spte+0x223/0x2c0
zap_gfn_range+0x141/0x3b0
kvm_tdp_mmu_zap_invalidated_roots+0xc8/0x130
kvm_mmu_zap_all_fast+0x121/0x190
kvm_mmu_invalidate_zap_pages_in_memslot+0xe/0x10
kvm_page_track_flush_slot+0x5c/0x80
kvm_arch_flush_shadow_memslot+0xe/0x10
kvm_set_memslot+0x172/0x4e0
__kvm_set_memory_region+0x337/0x590
kvm_vm_ioctl+0x49c/0xf80
Fixes: fbb158cb88b6 ("KVM: x86/mmu: Revert "Revert "KVM: MMU: zap pages in batch""")
Reported-by: David Matlack <[email protected]>
Cc: Ben Gardon <[email protected]>
Cc: [email protected]
Signed-off-by: Sean Christopherson <[email protected]>
---
I haven't actually verified this makes David's RCU stall go away, but I did
verify that "batch" stays at "0" before and increments as expected after,
and that KVM does yield as expected after.
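
For anyone who wants to see the accounting difference in isolation, here is a
minimal user-space sketch. It is a toy model, not KVM code: toy_prepare_zap(),
toy_count_yields() and TOY_BATCH_ZAP_PAGES are hypothetical names, the threshold
value is illustrative, and the model assumes each zap reports one page plus its
children. It only mirrors the "bump batch always" vs. "bump batch only when the
list went unstable" behavior described above.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical batch threshold for the toy model; the value is illustrative. */
#define TOY_BATCH_ZAP_PAGES 10

/*
 * Toy stand-in for the zap helper: reports how many pages were zapped and
 * returns true if the list became unstable, i.e. child pages were zapped too.
 */
static bool toy_prepare_zap(int children, int *nr_zapped)
{
	assert(nr_zapped != NULL);
	*nr_zapped = 1 + children;
	return children > 0;
}

/*
 * Walk nr_pages pages, each with the given number of children, and count how
 * often the loop reaches its yield point. count_always selects the patched
 * accounting (true) or the pre-patch accounting (false).
 */
static int toy_count_yields(int nr_pages, int children, bool count_always)
{
	int batch = 0, yields = 0;

	for (int i = 0; i < nr_pages; i++) {
		int nr_zapped;
		bool unstable;

		/* Yield point: only reached once enough pages were counted. */
		if (batch >= TOY_BATCH_ZAP_PAGES) {
			yields++;
			batch = 0;
		}

		unstable = toy_prepare_zap(children, &nr_zapped);

		/* Pre-patch, the count was only bumped on an unstable list. */
		if (count_always || unstable)
			batch += nr_zapped;
	}

	return yields;
}
```

In this model, 100 childless pages give toy_count_yields(100, 0, false) == 0 and
toy_count_yields(100, 0, true) == 9, i.e. the old accounting never reaches the
yield point when no page has children.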
arch/x86/kvm/mmu/mmu.c | 10 ++++++----
1 file changed, 6 insertions(+), 4 deletions(-)
diff --git a/arch/x86/kvm/mmu/mmu.c b/arch/x86/kvm/mmu/mmu.c
index 33794379949e..89480fab09c6 100644
--- a/arch/x86/kvm/mmu/mmu.c
+++ b/arch/x86/kvm/mmu/mmu.c
@@ -5575,6 +5575,7 @@ static void kvm_zap_obsolete_pages(struct kvm *kvm)
{
struct kvm_mmu_page *sp, *node;
int nr_zapped, batch = 0;
+ bool unstable;
restart:
list_for_each_entry_safe_reverse(sp, node,
@@ -5606,11 +5607,12 @@ static void kvm_zap_obsolete_pages(struct kvm *kvm)
goto restart;
}
- if (__kvm_mmu_prepare_zap_page(kvm, sp,
- &kvm->arch.zapped_obsolete_pages, &nr_zapped)) {
- batch += nr_zapped;
+ unstable = __kvm_mmu_prepare_zap_page(kvm, sp,
+ &kvm->arch.zapped_obsolete_pages, &nr_zapped);
+ batch += nr_zapped;
+
+ if (unstable)
goto restart;
- }
}
/*
--
2.34.0.rc1.387.gb447b232ab-goog
On Thu, Nov 11, 2021 at 2:14 PM Sean Christopherson <[email protected]> wrote:
>
> When zapping obsolete pages, update the running count of zapped pages
> regardless of whether or not the list has become unstable due to zapping
> a shadow page with its own child shadow pages. If the VM is backed by
> mostly 4kb pages, KVM can zap an absurd number of SPTEs without bumping
> the batch count and thus without yielding. In the worst case scenario,
> this can cause an RCU stall.
>
> rcu: INFO: rcu_sched self-detected stall on CPU
> rcu: 52-....: (20999 ticks this GP) idle=7be/1/0x4000000000000000
> softirq=15759/15759 fqs=5058
> (t=21016 jiffies g=66453 q=238577)
> NMI backtrace for cpu 52
> Call Trace:
> ...
> mark_page_accessed+0x266/0x2f0
> kvm_set_pfn_accessed+0x31/0x40
> handle_removed_tdp_mmu_page+0x259/0x2e0
> __handle_changed_spte+0x223/0x2c0
> handle_removed_tdp_mmu_page+0x1c1/0x2e0
> __handle_changed_spte+0x223/0x2c0
> handle_removed_tdp_mmu_page+0x1c1/0x2e0
> __handle_changed_spte+0x223/0x2c0
> zap_gfn_range+0x141/0x3b0
> kvm_tdp_mmu_zap_invalidated_roots+0xc8/0x130
> kvm_mmu_zap_all_fast+0x121/0x190
> kvm_mmu_invalidate_zap_pages_in_memslot+0xe/0x10
> kvm_page_track_flush_slot+0x5c/0x80
> kvm_arch_flush_shadow_memslot+0xe/0x10
> kvm_set_memslot+0x172/0x4e0
> __kvm_set_memory_region+0x337/0x590
> kvm_vm_ioctl+0x49c/0xf80
>
> Fixes: fbb158cb88b6 ("KVM: x86/mmu: Revert "Revert "KVM: MMU: zap pages in batch""")
> Reported-by: David Matlack <[email protected]>
> Cc: Ben Gardon <[email protected]>
> Cc: [email protected]
> Signed-off-by: Sean Christopherson <[email protected]>
Reviewed-by: Ben Gardon <[email protected]>
While I can see this fixing the above stall, there's still a potential
issue where zapped_obsolete_pages can accumulate an arbitrary number
of pages from multiple batches of zaps. If this list gets very large,
we could see a stall after the loop while trying to free the pages.
I'm not aware of this ever happening, but it could be worth yielding
during that freeing process as well.
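
To make the idea concrete, below is a rough user-space sketch of freeing a long
list with periodic yield points. Everything in it is hypothetical (struct
toy_page, toy_free_list(), the yield callback standing in for a reschedule
point), and it does not address whether the lock could actually be dropped at
that point in KVM. It only shows the shape of "free in chunks, yield between
chunks".

```c
#include <assert.h>
#include <stddef.h>
#include <stdlib.h>

/* Hypothetical list node standing in for a zapped shadow page. */
struct toy_page {
	struct toy_page *next;
};

/* Counter and callback used to observe how often the free loop yields. */
static int toy_yield_count;

static void toy_count_yield(void)
{
	toy_yield_count++;
}

/*
 * Free every node on the list, calling yield_fn (if non-NULL) after every
 * 'batch' frees. Returns the number of nodes freed.
 */
static size_t toy_free_list(struct toy_page *head, size_t batch,
			    void (*yield_fn)(void))
{
	size_t freed = 0;

	assert(batch > 0);

	while (head) {
		struct toy_page *next = head->next;

		free(head);
		freed++;

		if (yield_fn && freed % batch == 0)
			yield_fn();

		head = next;
	}

	return freed;
}

/* Build a list of n nodes; returns NULL if n is 0 or an allocation fails. */
static struct toy_page *toy_build_list(size_t n)
{
	struct toy_page *head = NULL;

	for (size_t i = 0; i < n; i++) {
		struct toy_page *p = malloc(sizeof(*p));

		if (!p) {
			toy_free_list(head, 1, NULL);
			return NULL;
		}
		p->next = head;
		head = p;
	}

	return head;
}
```

Freeing a 200-node list with a batch of 64 hits the yield callback three times
(after 64, 128 and 192 frees), so the longest uninterrupted stretch is bounded
by the batch size rather than by the list length.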
> ---
>
> I haven't actually verified this makes David's RCU stall go away, but I did
> verify that "batch" stays at "0" before and increments as expected after,
> and that KVM does yield as expected after.
>
> arch/x86/kvm/mmu/mmu.c | 10 ++++++----
> 1 file changed, 6 insertions(+), 4 deletions(-)
>
> diff --git a/arch/x86/kvm/mmu/mmu.c b/arch/x86/kvm/mmu/mmu.c
> index 33794379949e..89480fab09c6 100644
> --- a/arch/x86/kvm/mmu/mmu.c
> +++ b/arch/x86/kvm/mmu/mmu.c
> @@ -5575,6 +5575,7 @@ static void kvm_zap_obsolete_pages(struct kvm *kvm)
> {
> struct kvm_mmu_page *sp, *node;
> int nr_zapped, batch = 0;
> + bool unstable;
>
> restart:
> list_for_each_entry_safe_reverse(sp, node,
> @@ -5606,11 +5607,12 @@ static void kvm_zap_obsolete_pages(struct kvm *kvm)
> goto restart;
> }
>
> - if (__kvm_mmu_prepare_zap_page(kvm, sp,
> - &kvm->arch.zapped_obsolete_pages, &nr_zapped)) {
> - batch += nr_zapped;
> + unstable = __kvm_mmu_prepare_zap_page(kvm, sp,
> + &kvm->arch.zapped_obsolete_pages, &nr_zapped);
> + batch += nr_zapped;
> +
> + if (unstable)
> goto restart;
> - }
> }
>
> /*
> --
> 2.34.0.rc1.387.gb447b232ab-goog
>
On Thu, Nov 11, 2021, Ben Gardon wrote:
> On Thu, Nov 11, 2021 at 2:14 PM Sean Christopherson <[email protected]> wrote:
> > Fixes: fbb158cb88b6 ("KVM: x86/mmu: Revert "Revert "KVM: MMU: zap pages in batch""")
> > Reported-by: David Matlack <[email protected]>
> > Cc: Ben Gardon <[email protected]>
> > Cc: [email protected]
> > Signed-off-by: Sean Christopherson <[email protected]>
>
> Reviewed-by: Ben Gardon <[email protected]>
>
> While I can see this fixing the above stall, there's still a potential
> issue where zapped_obsolete_pages can accumulate an arbitrary number
> of pages from multiple batches of zaps. If this list gets very large,
> we could see a stall after the loop while trying to free the pages.
> I'm not aware of this ever happening, but it could be worth yielding
> during that freeing process as well.
Ya. I tagged this one for stable because it's very much a regression that I
introduced when reverting the revert, i.e. the very original implementation worked.
Sadly, I did not get to do a triple revert :-)
On Mon, Nov 15, 2021 at 11:23 AM Sean Christopherson <[email protected]> wrote:
>
> On Mon, Nov 15, 2021, David Matlack wrote:
> > On Thu, Nov 11, 2021 at 2:14 PM Sean Christopherson <[email protected]> wrote:
> > >
> > > When zapping obsolete pages, update the running count of zapped pages
> > > regardless of whether or not the list has become unstable due to zapping
> > > a shadow page with its own child shadow pages. If the VM is backed by
> > > mostly 4kb pages, KVM can zap an absurd number of SPTEs without bumping
> > > the batch count and thus without yielding. In the worst case scenario,
> > > this can cause an RCU stall.
> > >
> > > rcu: INFO: rcu_sched self-detected stall on CPU
> > > rcu: 52-....: (20999 ticks this GP) idle=7be/1/0x4000000000000000
> > > softirq=15759/15759 fqs=5058
> > > (t=21016 jiffies g=66453 q=238577)
> > > NMI backtrace for cpu 52
> > > Call Trace:
> > > ...
> > > mark_page_accessed+0x266/0x2f0
> > > kvm_set_pfn_accessed+0x31/0x40
> > > handle_removed_tdp_mmu_page+0x259/0x2e0
> > > __handle_changed_spte+0x223/0x2c0
> > > handle_removed_tdp_mmu_page+0x1c1/0x2e0
> > > __handle_changed_spte+0x223/0x2c0
> > > handle_removed_tdp_mmu_page+0x1c1/0x2e0
> > > __handle_changed_spte+0x223/0x2c0
> > > zap_gfn_range+0x141/0x3b0
> > > kvm_tdp_mmu_zap_invalidated_roots+0xc8/0x130
> >
> > This is a useful patch but I don't see the connection with this stall.
> > The stall is detected in kvm_tdp_mmu_zap_invalidated_roots, which runs
> > after kvm_zap_obsolete_pages. How would rescheduling during
> > kvm_zap_obsolete_pages help?
>
> Ah shoot, I copy+pasted the wrong splat. The correct, relevant backtrace is:
Ok that makes more sense :). Also that was a soft lockup rather than
an RCU stall.
>
> mark_page_accessed+0x266/0x2e0
> kvm_set_pfn_accessed+0x31/0x40
> mmu_spte_clear_track_bits+0x136/0x1c0
> drop_spte+0x1a/0xc0
> mmu_page_zap_pte+0xef/0x120
> __kvm_mmu_prepare_zap_page+0x205/0x5e0
> kvm_mmu_zap_all_fast+0xd7/0x190
> kvm_mmu_invalidate_zap_pages_in_memslot+0xe/0x10
> kvm_page_track_flush_slot+0x5c/0x80
> kvm_arch_flush_shadow_memslot+0xe/0x10
> kvm_set_memslot+0x1a8/0x5d0
> __kvm_set_memory_region+0x337/0x590
> kvm_vm_ioctl+0xb08/0x1040
On Mon, Nov 15, 2021, David Matlack wrote:
> On Thu, Nov 11, 2021 at 2:14 PM Sean Christopherson <[email protected]> wrote:
> >
> > When zapping obsolete pages, update the running count of zapped pages
> > regardless of whether or not the list has become unstable due to zapping
> > a shadow page with its own child shadow pages. If the VM is backed by
> > mostly 4kb pages, KVM can zap an absurd number of SPTEs without bumping
> > the batch count and thus without yielding. In the worst case scenario,
> > this can cause an RCU stall.
> >
> > rcu: INFO: rcu_sched self-detected stall on CPU
> > rcu: 52-....: (20999 ticks this GP) idle=7be/1/0x4000000000000000
> > softirq=15759/15759 fqs=5058
> > (t=21016 jiffies g=66453 q=238577)
> > NMI backtrace for cpu 52
> > Call Trace:
> > ...
> > mark_page_accessed+0x266/0x2f0
> > kvm_set_pfn_accessed+0x31/0x40
> > handle_removed_tdp_mmu_page+0x259/0x2e0
> > __handle_changed_spte+0x223/0x2c0
> > handle_removed_tdp_mmu_page+0x1c1/0x2e0
> > __handle_changed_spte+0x223/0x2c0
> > handle_removed_tdp_mmu_page+0x1c1/0x2e0
> > __handle_changed_spte+0x223/0x2c0
> > zap_gfn_range+0x141/0x3b0
> > kvm_tdp_mmu_zap_invalidated_roots+0xc8/0x130
>
> This is a useful patch but I don't see the connection with this stall.
> The stall is detected in kvm_tdp_mmu_zap_invalidated_roots, which runs
> after kvm_zap_obsolete_pages. How would rescheduling during
> kvm_zap_obsolete_pages help?
Ah shoot, I copy+pasted the wrong splat. The correct, relevant backtrace is:
mark_page_accessed+0x266/0x2e0
kvm_set_pfn_accessed+0x31/0x40
mmu_spte_clear_track_bits+0x136/0x1c0
drop_spte+0x1a/0xc0
mmu_page_zap_pte+0xef/0x120
__kvm_mmu_prepare_zap_page+0x205/0x5e0
kvm_mmu_zap_all_fast+0xd7/0x190
kvm_mmu_invalidate_zap_pages_in_memslot+0xe/0x10
kvm_page_track_flush_slot+0x5c/0x80
kvm_arch_flush_shadow_memslot+0xe/0x10
kvm_set_memslot+0x1a8/0x5d0
__kvm_set_memory_region+0x337/0x590
kvm_vm_ioctl+0xb08/0x1040
On Mon, Nov 15, 2021, David Matlack wrote:
> On Mon, Nov 15, 2021 at 11:23 AM Sean Christopherson <[email protected]> wrote:
> >
> > On Mon, Nov 15, 2021, David Matlack wrote:
> > > On Thu, Nov 11, 2021 at 2:14 PM Sean Christopherson <[email protected]> wrote:
> > > >
> > > > When zapping obsolete pages, update the running count of zapped pages
> > > > regardless of whether or not the list has become unstable due to zapping
> > > > a shadow page with its own child shadow pages. If the VM is backed by
> > > > mostly 4kb pages, KVM can zap an absurd number of SPTEs without bumping
> > > > the batch count and thus without yielding. In the worst case scenario,
> > > > this can cause an RCU stall.
> > > >
> > > > rcu: INFO: rcu_sched self-detected stall on CPU
> > > > rcu: 52-....: (20999 ticks this GP) idle=7be/1/0x4000000000000000
> > > > softirq=15759/15759 fqs=5058
> > > > (t=21016 jiffies g=66453 q=238577)
> > > > NMI backtrace for cpu 52
> > > > Call Trace:
> > > > ...
> > > > mark_page_accessed+0x266/0x2f0
> > > > kvm_set_pfn_accessed+0x31/0x40
> > > > handle_removed_tdp_mmu_page+0x259/0x2e0
> > > > __handle_changed_spte+0x223/0x2c0
> > > > handle_removed_tdp_mmu_page+0x1c1/0x2e0
> > > > __handle_changed_spte+0x223/0x2c0
> > > > handle_removed_tdp_mmu_page+0x1c1/0x2e0
> > > > __handle_changed_spte+0x223/0x2c0
> > > > zap_gfn_range+0x141/0x3b0
> > > > kvm_tdp_mmu_zap_invalidated_roots+0xc8/0x130
> > >
> > > This is a useful patch but I don't see the connection with this stall.
> > > The stall is detected in kvm_tdp_mmu_zap_invalidated_roots, which runs
> > > after kvm_zap_obsolete_pages. How would rescheduling during
> > > kvm_zap_obsolete_pages help?
> >
> > Ah shoot, I copy+pasted the wrong splat. The correct, relevant backtrace is:
>
> Ok that makes more sense :). Also that was a soft lockup rather than
> an RCU stall.
*sigh* I'm not sure which blatant "this is the wrong splat" goof is worse: the
explicit tdp_mmu in the backtrace, or the fact that the legacy MMU doesn't rely
on RCU...
I'll get v2 posted.
On Thu, Nov 11, 2021 at 2:14 PM Sean Christopherson <[email protected]> wrote:
>
> When zapping obsolete pages, update the running count of zapped pages
> regardless of whether or not the list has become unstable due to zapping
> a shadow page with its own child shadow pages. If the VM is backed by
> mostly 4kb pages, KVM can zap an absurd number of SPTEs without bumping
> the batch count and thus without yielding. In the worst case scenario,
> this can cause an RCU stall.
>
> rcu: INFO: rcu_sched self-detected stall on CPU
> rcu: 52-....: (20999 ticks this GP) idle=7be/1/0x4000000000000000
> softirq=15759/15759 fqs=5058
> (t=21016 jiffies g=66453 q=238577)
> NMI backtrace for cpu 52
> Call Trace:
> ...
> mark_page_accessed+0x266/0x2f0
> kvm_set_pfn_accessed+0x31/0x40
> handle_removed_tdp_mmu_page+0x259/0x2e0
> __handle_changed_spte+0x223/0x2c0
> handle_removed_tdp_mmu_page+0x1c1/0x2e0
> __handle_changed_spte+0x223/0x2c0
> handle_removed_tdp_mmu_page+0x1c1/0x2e0
> __handle_changed_spte+0x223/0x2c0
> zap_gfn_range+0x141/0x3b0
> kvm_tdp_mmu_zap_invalidated_roots+0xc8/0x130
This is a useful patch but I don't see the connection with this stall.
The stall is detected in kvm_tdp_mmu_zap_invalidated_roots, which runs
after kvm_zap_obsolete_pages. How would rescheduling during
kvm_zap_obsolete_pages help?
> kvm_mmu_zap_all_fast+0x121/0x190
> kvm_mmu_invalidate_zap_pages_in_memslot+0xe/0x10
> kvm_page_track_flush_slot+0x5c/0x80
> kvm_arch_flush_shadow_memslot+0xe/0x10
> kvm_set_memslot+0x172/0x4e0
> __kvm_set_memory_region+0x337/0x590
> kvm_vm_ioctl+0x49c/0xf80
>
> Fixes: fbb158cb88b6 ("KVM: x86/mmu: Revert "Revert "KVM: MMU: zap pages in batch""")
> Reported-by: David Matlack <[email protected]>
> Cc: Ben Gardon <[email protected]>
> Cc: [email protected]
> Signed-off-by: Sean Christopherson <[email protected]>
> ---
>
> I haven't actually verified this makes David's RCU stall go away, but I did
> verify that "batch" stays at "0" before and increments as expected after,
> and that KVM does yield as expected after.
>
> arch/x86/kvm/mmu/mmu.c | 10 ++++++----
> 1 file changed, 6 insertions(+), 4 deletions(-)
>
> diff --git a/arch/x86/kvm/mmu/mmu.c b/arch/x86/kvm/mmu/mmu.c
> index 33794379949e..89480fab09c6 100644
> --- a/arch/x86/kvm/mmu/mmu.c
> +++ b/arch/x86/kvm/mmu/mmu.c
> @@ -5575,6 +5575,7 @@ static void kvm_zap_obsolete_pages(struct kvm *kvm)
> {
> struct kvm_mmu_page *sp, *node;
> int nr_zapped, batch = 0;
> + bool unstable;
nit: Declare unstable in the body of the loop. (So should nr_zapped
and batch but that's unrelated to your change.)
>
> restart:
> list_for_each_entry_safe_reverse(sp, node,
> @@ -5606,11 +5607,12 @@ static void kvm_zap_obsolete_pages(struct kvm *kvm)
> goto restart;
> }
>
> - if (__kvm_mmu_prepare_zap_page(kvm, sp,
> - &kvm->arch.zapped_obsolete_pages, &nr_zapped)) {
> - batch += nr_zapped;
> + unstable = __kvm_mmu_prepare_zap_page(kvm, sp,
> + &kvm->arch.zapped_obsolete_pages, &nr_zapped);
> + batch += nr_zapped;
> +
> + if (unstable)
> goto restart;
> - }
> }
>
> /*
> --
> 2.34.0.rc1.387.gb447b232ab-goog
>