2022-05-28 20:47:34

by Qian Cai

Subject: [PATCH] KVM: arm64: Fix memory leaks from stage2 pagetable

Running some SR-IOV workloads could trigger some leak reports from
kmemleak.

unreferenced object 0xffff080243cef500 (size 128):
comm "qemu-system-aar", pid 179935, jiffies 4298359506 (age 1629.732s)
hex dump (first 32 bytes):
28 00 00 00 01 00 00 00 00 e0 4c 52 03 08 ff ff (.........LR....
e0 af a4 7f 7c d1 ff ff a8 3c b3 08 00 80 ff ff ....|....<......
backtrace:
kmem_cache_alloc_trace
kvm_init_stage2_mmu
kvm_arch_init_vm
kvm_create_vm
kvm_dev_ioctl
__arm64_sys_ioctl
invoke_syscall
el0_svc_common.constprop.0
do_el0_svc
el0_svc
el0t_64_sync_handler
el0t_64_sync

Since I have yet to find a way to reproduce this at will, I did a code
inspection and found one spot where this could happen. It is unlikely to
fix my issue, since I don't see my case going through the error paths,
but we should fix it regardless.

If hardware_enable_all() or kvm_init_mmu_notifier() fails in
kvm_create_vm(), we end up leaking the stage2 pagetable memory allocated
by kvm_init_stage2_mmu(), because kvm_arch_flush_shadow_all() is never
called on those paths.
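
To make the asymmetry concrete, here is a tiny userspace mock of the flow
described above; the function names mirror their kernel counterparts, but
the bodies are stand-ins rather than the real KVM code:

/* Mock only: mirrors the call structure, not the actual implementation. */
#include <stdio.h>
#include <stdlib.h>

struct kvm { void *stage2_pgt; };

static int kvm_arch_init_vm(struct kvm *kvm)
{
	kvm->stage2_pgt = malloc(128);		/* stands in for kvm_init_stage2_mmu() */
	return kvm->stage2_pgt ? 0 : -1;
}

static void kvm_arch_flush_shadow_all(struct kvm *kvm)
{
	free(kvm->stage2_pgt);			/* stands in for kvm_free_stage2_pgd() */
	kvm->stage2_pgt = NULL;
}

static void kvm_arch_destroy_vm(struct kvm *kvm)
{
	/* frees other per-VM state, but not the stage2 pagetable */
	(void)kvm;
}

static int kvm_create_vm(struct kvm *kvm, int fail_init)
{
	if (kvm_arch_init_vm(kvm))
		return -1;
	if (fail_init) {			/* hardware_enable_all()/kvm_init_mmu_notifier() failing */
		kvm_arch_destroy_vm(kvm);	/* error path: stage2_pgt is never freed */
		return -1;
	}
	return 0;
}

int main(void)
{
	struct kvm kvm = { 0 };

	if (kvm_create_vm(&kvm, 1)) {
		printf("VM creation failed, %p leaked\n", kvm.stage2_pgt);
		return 1;
	}
	/* normal teardown reaches kvm_arch_flush_shadow_all() and frees it */
	kvm_arch_flush_shadow_all(&kvm);
	kvm_arch_destroy_vm(&kvm);
	return 0;
}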

It seems impossible to simply move kvm_free_stage2_pgd() from
kvm_arch_flush_shadow_all() into kvm_arch_destroy_vm() due to the issue
mentioned in the "Fixes" commit below. Thus, fix it by freeing the memory
in kvm_arch_destroy_vm() only when the MMU notifier was never
initialized.

Fixes: 293f293637b5 ("kvm-arm: Unmap shadow pagetables properly")
Signed-off-by: Qian Cai <[email protected]>
---
arch/arm64/kvm/arm.c | 3 +++
arch/arm64/kvm/mmu.c | 3 ++-
2 files changed, 5 insertions(+), 1 deletion(-)

diff --git a/arch/arm64/kvm/arm.c b/arch/arm64/kvm/arm.c
index 400bb0fe2745..7d12824f2034 100644
--- a/arch/arm64/kvm/arm.c
+++ b/arch/arm64/kvm/arm.c
@@ -180,6 +180,9 @@ vm_fault_t kvm_arch_vcpu_fault(struct kvm_vcpu *vcpu, struct vm_fault *vmf)
*/
void kvm_arch_destroy_vm(struct kvm *kvm)
{
+ if (!kvm->mmu_notifier.ops)
+ kvm_free_stage2_pgd(&kvm->arch.mmu);
+
bitmap_free(kvm->arch.pmu_filter);
free_cpumask_var(kvm->arch.supported_cpus);

diff --git a/arch/arm64/kvm/mmu.c b/arch/arm64/kvm/mmu.c
index f5651a05b6a8..13a527656ba7 100644
--- a/arch/arm64/kvm/mmu.c
+++ b/arch/arm64/kvm/mmu.c
@@ -1739,7 +1739,8 @@ void kvm_arch_memslots_updated(struct kvm *kvm, u64 gen)

void kvm_arch_flush_shadow_all(struct kvm *kvm)
{
- kvm_free_stage2_pgd(&kvm->arch.mmu);
+ if (kvm->mmu_notifier.ops)
+ kvm_free_stage2_pgd(&kvm->arch.mmu);
}

void kvm_arch_flush_shadow_memslot(struct kvm *kvm,
--
2.32.0



2022-05-31 17:53:50

by Qian Cai

Subject: Re: [PATCH] KVM: arm64: Fix memory leaks from stage2 pagetable

On Tue, May 31, 2022 at 06:01:58PM +0100, Will Deacon wrote:
> Have you spotted any pattern for when the leak occurs? How are you
> terminating the guest?

We just send a SIGTERM to the qemu-system-aarch64 process. Originally,
right after sending the signal, we would remove_id/unbind from vfio-pci
and then bind back to the original (ixgbe) driver. However, since the
process might take a while to clean up after itself, the bind could fail
with -EBUSY. I could reproduce it a few times one day, while being unable
to do so on other days.

Later, we changed the code to make sure the process has disappeared first
before doing the remove_id/unbind/bind. Apparently, that makes it harder
to reproduce, if it doesn't eliminate it entirely.
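
For reference, the unbind/rebind dance above is just a few sysfs writes;
roughly the following, where the PCI address and the "8086 10ed" ID are
placeholders for whichever VF is being passed through:

#include <fcntl.h>
#include <stdio.h>
#include <string.h>
#include <unistd.h>

static int sysfs_write(const char *path, const char *val)
{
	int fd = open(path, O_WRONLY);
	ssize_t n;

	if (fd < 0) {
		perror(path);
		return -1;
	}
	n = write(fd, val, strlen(val));
	if (n < 0)
		perror(path);	/* the -EBUSY mentioned above shows up here */
	close(fd);
	return n < 0 ? -1 : 0;
}

int main(void)
{
	const char *bdf = "0000:01:10.0";	/* placeholder VF address */

	/* Once the qemu process is gone: detach the VF from vfio-pci ... */
	sysfs_write("/sys/bus/pci/drivers/vfio-pci/unbind", bdf);
	sysfs_write("/sys/bus/pci/drivers/vfio-pci/remove_id", "8086 10ed");
	/* ... and hand it back to the original driver. */
	return sysfs_write("/sys/bus/pci/drivers/ixgbe/bind", bdf) ? 1 : 0;
}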

2022-05-31 20:55:05

by Will Deacon

Subject: Re: [PATCH] KVM: arm64: Fix memory leaks from stage2 pagetable

On Tue, May 31, 2022 at 05:57:11PM +0100, Will Deacon wrote:
> On Thu, May 26, 2022 at 04:39:56PM -0400, Qian Cai wrote:
> > Running some SR-IOV workloads could trigger some leak reports from
> > kmemleak.
> >
> > unreferenced object 0xffff080243cef500 (size 128):
> > comm "qemu-system-aar", pid 179935, jiffies 4298359506 (age 1629.732s)
> > hex dump (first 32 bytes):
> > 28 00 00 00 01 00 00 00 00 e0 4c 52 03 08 ff ff (.........LR....
> > e0 af a4 7f 7c d1 ff ff a8 3c b3 08 00 80 ff ff ....|....<......
> > backtrace:
> > kmem_cache_alloc_trace
> > kvm_init_stage2_mmu
>
> Hmm, I can't spot a 128-byte allocation in here so this is pretty cryptic.
> I don't really like the idea of papering over the report; we'd be better off
> trying to reproduce it.

... although the hexdump does look like {u32; u32; ptr; ptr; ptr}, which
would match 'struct kvm_pgtable'. I guess the allocation is aligned to
ARCH_DMA_MINALIGN, which could explain the size?
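
Decoding the first 32 bytes that way does give plausible values: 0x28 (40)
would be the IPA size, 1 the start level, then three kernel pointers. A
throwaway userspace check of that reading (the field interpretation is my
guess from the dump, not verified against the struct definition):

#include <stdint.h>
#include <stdio.h>
#include <string.h>

int main(void)
{
	/* First 32 bytes from the kmemleak hex dump, in memory order. */
	const uint8_t dump[32] = {
		0x28, 0x00, 0x00, 0x00, 0x01, 0x00, 0x00, 0x00,
		0x00, 0xe0, 0x4c, 0x52, 0x03, 0x08, 0xff, 0xff,
		0xe0, 0xaf, 0xa4, 0x7f, 0x7c, 0xd1, 0xff, 0xff,
		0xa8, 0x3c, 0xb3, 0x08, 0x00, 0x80, 0xff, 0xff,
	};
	uint32_t a, b;
	uint64_t p[3];

	/* assumes a little-endian host, same as the arm64 target */
	memcpy(&a, dump, sizeof(a));		/* 0x28 == 40, plausibly ia_bits */
	memcpy(&b, dump + 4, sizeof(b));	/* 1, plausibly the start level */
	memcpy(p, dump + 8, sizeof(p));		/* three kernel pointers */

	printf("%u %u %#llx %#llx %#llx\n", a, b,
	       (unsigned long long)p[0],
	       (unsigned long long)p[1],
	       (unsigned long long)p[2]);
	return 0;
}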

Have you spotted any pattern for when the leak occurs? How are you
terminating the guest?

Will

2022-06-01 21:06:27

by Will Deacon

Subject: Re: [PATCH] KVM: arm64: Fix memory leaks from stage2 pagetable

On Thu, May 26, 2022 at 04:39:56PM -0400, Qian Cai wrote:
> Running some SR-IOV workloads could trigger some leak reports from
> kmemleak.
>
> unreferenced object 0xffff080243cef500 (size 128):
> comm "qemu-system-aar", pid 179935, jiffies 4298359506 (age 1629.732s)
> hex dump (first 32 bytes):
> 28 00 00 00 01 00 00 00 00 e0 4c 52 03 08 ff ff (.........LR....
> e0 af a4 7f 7c d1 ff ff a8 3c b3 08 00 80 ff ff ....|....<......
> backtrace:
> kmem_cache_alloc_trace
> kvm_init_stage2_mmu

Hmm, I can't spot a 128-byte allocation in here so this is pretty cryptic.
I don't really like the idea of papering over the report; we'd be better off
trying to reproduce it.

Will

2022-06-01 21:28:53

by Qian Cai

Subject: Re: [PATCH] KVM: arm64: Fix memory leaks from stage2 pagetable

On Tue, May 31, 2022 at 05:57:11PM +0100, Will Deacon wrote:
> On Thu, May 26, 2022 at 04:39:56PM -0400, Qian Cai wrote:
> > Running some SR-IOV workloads could trigger some leak reports from
> > kmemleak.
> >
> > unreferenced object 0xffff080243cef500 (size 128):
> > comm "qemu-system-aar", pid 179935, jiffies 4298359506 (age 1629.732s)
> > hex dump (first 32 bytes):
> > 28 00 00 00 01 00 00 00 00 e0 4c 52 03 08 ff ff (.........LR....
> > e0 af a4 7f 7c d1 ff ff a8 3c b3 08 00 80 ff ff ....|....<......
> > backtrace:
> > kmem_cache_alloc_trace
> > kvm_init_stage2_mmu
>
> Hmm, I can't spot a 128-byte allocation in here so this is pretty cryptic.
> I don't really like the idea of papering over the report; we'd be better off
> trying to reproduce it.

As much as I would like to reproduce it, I have tried over the last few
weeks without luck. It still happens from time to time in our daily CI,
though, so I was thinking of plugging the known leaks first.