Date: Mon, 17 Jul 2017 17:16:17 +0200
From: Andrea Arcangeli
To: Christoffer Dall
Cc: Suzuki K Poulose, Alexander Graf, "kvmarm@lists.cs.columbia.edu",
    "kvm@vger.kernel.org", "linux-kernel@vger.kernel.org", Stable
Subject: Re: [PATCH v2] KVM: arm/arm64: Handle hva aging while destroying the vm
Message-ID: <20170717151617.GC6344@redhat.com>
References: <1499235631-141725-1-git-send-email-agraf@suse.de>
    <20170705085700.GA16881@e107814-lin.cambridge.arm.com>
    <20170706074513.GC18106@cbox>
    <18e7012c-a095-ecfa-470c-cf81177698a1@arm.com>
    <20170706094205.GE18106@cbox>
    <5cb34cc0-27c1-c011-a8d4-c991e47141c3@arm.com>
    <20170716195658.GA31432@cbox>

On Mon, Jul 17, 2017 at 04:45:10PM +0200, Christoffer Dall wrote:
> I would also very much like to get to the bottom of this, and at the
> very least try to get a valid explanation as to how a thread can be
> *running* for a process where there are zero references to the struct
> mm?

A thread shouldn't possibly be running if mm->mm_users is zero.

> I guess I am asking where this mmput() can happen for a perfectly
> running thread, which hasn't processed signals or exited itself yet.

mmput runs during exit(); after that point the vcpu can't run the KVM
ioctl anymore.

> The dump you reference above seems to indicate that it's happening
> under memory pressure and trying to unmap memory from the VM to
> allocate memory to the VM, but all seems to be happening within a VCPU
> thread, or am I reading this wrong?

In the oops the pgd was none while the KVM vcpu ioctl was running. The
most likely explanation is that two VMs were running in parallel in the
host and the other one was quitting (mm_users of the other VM was zero,
while mm_users of the VM that oopsed within the vcpu ioctl was > 0).
The oops information itself can't tell whether one or two VMs were
running in the host, so more than one VM running is the most plausible
explanation that doesn't break the above invariants. It'd be nice if
Alexander could confirm it, if he still remembers that specific setup
a couple of months after it happened.

Even if there was just one VM running in the host, it would more likely
mean something inside the KVM ARM code is clearing the pgd before
mm_users reaches zero, i.e. before the last mmput. It's very unlikely
mm_users could have been zero while the vcpu thread was running, as
many more things would fall apart in that case, not just the needed
pgd check in the mmu notifier after process exit.

Thanks,
Andrea
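
The refcount argument above can be illustrated with a small toy model.
This is only a userspace sketch under stated assumptions, not kernel
code: ToyMM, ToyVM, release(), age_hva() and two_vm_scenario() are all
hypothetical names, and the model only captures the two invariants
discussed in this mail (a thread can run only while mm_users > 0, and
the last mmput is what tears down the pgd).

```python
# Toy userspace model, NOT kernel code. Every name here is hypothetical
# and exists only to illustrate the mm_users reasoning in this thread.


class ToyVM:
    """Stands in for a VM; pgd models the stage-2 page table root."""

    def __init__(self, name):
        self.name = name
        self.pgd = object()  # non-None while the page tables exist

    def release(self):
        # Models the notifier release path freeing the page tables.
        self.pgd = None

    def age_hva(self):
        # Models an aging callback that may still arrive after release().
        if self.pgd is None:  # the pgd check discussed in this thread
            return False
        return True  # pretend the page was recently accessed


class ToyMM:
    """Stands in for struct mm; mm_users counts address-space users."""

    def __init__(self, vm, users=1):
        self.vm = vm
        self.mm_users = users

    def mmput(self):
        assert self.mm_users > 0, "mmput without a matching reference"
        self.mm_users -= 1
        if self.mm_users == 0:
            self.vm.release()  # last user gone: tear down the page tables

    def vcpu_can_run(self):
        # Invariant: a thread can only be running while mm_users > 0.
        return self.mm_users > 0


def two_vm_scenario():
    """One VM keeps running while a second, unrelated VM exits."""
    running = ToyMM(ToyVM("running"))
    exiting = ToyMM(ToyVM("exiting"))
    exiting.mmput()  # the other VM quits; its mm_users drops to zero
    return running, exiting
```

In this model the exiting VM is the only one whose pgd can be none, and
its vcpu can no longer run, which matches the two-VM explanation: the
aging callback has to tolerate a missing pgd, but the running vcpu's
own mm is never the one with zero users.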