Date: Mon, 17 Jul 2017 20:23:31 +0200
From: Christoffer Dall
To: Andrea Arcangeli
Cc: Suzuki K Poulose, Alexander Graf, "kvmarm@lists.cs.columbia.edu", "kvm@vger.kernel.org", "linux-kernel@vger.kernel.org", Stable
Subject: Re: [PATCH v2] KVM: arm/arm64: Handle hva aging while destroying the vm
Message-ID: <20170717182331.GA14069@cbox>
References: <20170705085700.GA16881@e107814-lin.cambridge.arm.com>
 <20170706074513.GC18106@cbox>
 <18e7012c-a095-ecfa-470c-cf81177698a1@arm.com>
 <20170706094205.GE18106@cbox>
 <5cb34cc0-27c1-c011-a8d4-c991e47141c3@arm.com>
 <20170716195658.GA31432@cbox>
 <20170717151617.GC6344@redhat.com>
In-Reply-To: <20170717151617.GC6344@redhat.com>

On Mon, Jul 17, 2017 at 05:16:17PM +0200, Andrea Arcangeli wrote:
> On Mon, Jul 17, 2017 at 04:45:10PM +0200, Christoffer Dall wrote:
> > I would also very much like to get to the bottom of this, and at the
> > very least try to get a valid explanation as to how a thread can be
> > *running* for a process where there are zero references to the struct
> > mm?
>
> A thread shouldn't possibly be running if mm->mm_users is zero.
>

ok, good, then I don't have to re-take OS 101.

> > I guess I am asking where this mmput() can happen for a perfectly
> > running thread, which hasn't processed signals or exited itself yet.
>
> mmput runs during exit(), after that point the vcpu can't run the KVM
> ioctl anymore.
>

also very comforting that we agree on this.

> > The dump you reference above seems to indicate that it's happening
> > under memory pressure and trying to unmap memory from the VM to
> > allocate memory to the VM, but all seems to be happening within a VCPU
> > thread, or am I reading this wrong?
>
> In the oops the pgd was none while KVM vcpu ioctl was running, the
> most likely explanation is there were two VM running in parallel in
> the host, and the other one was quitting (mm_count of the other VM was
> zero, while mm_count of the VM that oopsed within the vcpu ioctl was >
> 0). The oops information itself can't tell if there was one or two VM
> running in the host so > 1 VM running is the most plausible
> explanation that doesn't break the above invariants.

That's very keenly observed, and a really good explanation.

> It'd be nice
> if Alexander can confirm it, if he remembers about that specific setup
> after a couple of months since it happened.

My guess is that this was observed on the SUSE build machines with
arm64, and Alex usually explains that these machines run *lots* of VMs
at the same time, so this sounds very likely.

Alex, can you confirm this was the type of workload?

>
> Even if there was just one VM running in the host, it would more
> likely mean something inside KVM ARM code is clearing the pgd before
> mm_users reaches zero, i.e. before the last mmput.

I don't think we have this.

>
> It's very unlikely mm_users could have been > 0 while the vcpu thread
> was running as many more things would fall apart in such case, not
> just the needed pgd check during mmu notifier post process exit.
>

That was my rationale exactly.

Thanks for confirming!
-Christoffer
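
For illustration only, the two-VM explanation above can be sketched as a
small userspace C model. This is not kernel code: struct toy_mm, struct
toy_vm and every toy_* function are hypothetical names invented for this
sketch, and the model assumes that the last mmput() is what tears down a
VM's page tables. It shows one reading of the scenario: VM B has exited
(mm_users == 0, pgd gone) while a vcpu thread of VM A is still running
and, under memory pressure, ends up aging pages that belong to B, so the
aging handler needs a pgd check.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Toy stand-in for the kernel's struct mm_struct (hypothetical, heavily reduced). */
struct toy_mm {
    int mm_users;  /* users of the address space, e.g. the process's threads */
};

/* Toy stand-in for one VM: its owning mm plus a page table root. */
struct toy_vm {
    struct toy_mm mm;
    int pgd_storage;  /* placeholder memory that the pgd points at */
    int *pgd;         /* NULL once the tables have been torn down */
};

static void toy_vm_init(struct toy_vm *vm, int nr_threads)
{
    vm->mm.mm_users = nr_threads;
    vm->pgd_storage = 0;
    vm->pgd = &vm->pgd_storage;
}

/* Models mmput(): assumed here to drop the pgd when mm_users reaches zero. */
static void toy_mmput(struct toy_vm *vm)
{
    if (--vm->mm.mm_users == 0)
        vm->pgd = NULL;
}

/* Invariant from the thread: a vcpu can only run while mm_users > 0. */
static bool toy_vcpu_can_run(const struct toy_vm *vm)
{
    return vm->mm.mm_users > 0;
}

/* Aging handler with the pgd check: bail out instead of touching a NULL pgd. */
static bool toy_age_hva(struct toy_vm *vm)
{
    if (!vm->pgd)
        return false;
    *vm->pgd += 1;  /* stand-in for walking the page tables */
    return true;
}

/* Two VMs in one host: B exits while A's vcpu thread keeps running. */
static int toy_two_vm_scenario(void)
{
    struct toy_vm a, b;

    toy_vm_init(&a, 1);
    toy_vm_init(&b, 1);
    toy_mmput(&b);  /* B's last user goes away */

    if (!toy_vcpu_can_run(&a) || toy_vcpu_can_run(&b))
        return -1;  /* would break the mm_users invariant */
    if (!toy_age_hva(&a))
        return -1;  /* A is alive, so aging must still work */
    if (toy_age_hva(&b))
        return -1;  /* B's pgd is gone, so aging must bail out */
    return 0;
}
```

Calling toy_two_vm_scenario() returns 0 when both statements hold at
once: the running vcpu thread belongs to a VM with mm_users > 0, and the
missing pgd belongs to the other, exiting VM.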