Return-Path: Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand id S1752061AbdGCXlw (ORCPT ); Mon, 3 Jul 2017 19:41:52 -0400 Received: from mx1.redhat.com ([209.132.183.28]:41258 "EHLO mx1.redhat.com" rhost-flags-OK-OK-OK-OK) by vger.kernel.org with ESMTP id S1750835AbdGCXlv (ORCPT ); Mon, 3 Jul 2017 19:41:51 -0400 DMARC-Filter: OpenDMARC Filter v1.3.2 mx1.redhat.com C7F6D8046A Authentication-Results: ext-mx04.extmail.prod.ext.phx2.redhat.com; dmarc=none (p=none dis=none) header.from=redhat.com Authentication-Results: ext-mx04.extmail.prod.ext.phx2.redhat.com; spf=pass smtp.mailfrom=aarcange@redhat.com DKIM-Filter: OpenDKIM Filter v2.11.0 mx1.redhat.com C7F6D8046A Date: Tue, 4 Jul 2017 01:41:49 +0200 From: Andrea Arcangeli To: Alexander Graf Cc: Christoffer Dall , kvmarm@lists.cs.columbia.edu, linux-kernel@vger.kernel.org, kvm@vger.kernel.org Subject: Re: [PATCH] KVM: arm/arm64: Handle hva aging while destroying the vm Message-ID: <20170703234149.GE5738@redhat.com> References: <1498231319-199734-1-git-send-email-agraf@suse.de> <20170703080313.GB4066@cbox> MIME-Version: 1.0 Content-Type: text/plain; charset=us-ascii Content-Disposition: inline In-Reply-To: User-Agent: Mutt/1.8.3 (2017-05-23) X-Greylist: Sender IP whitelisted, not delayed by milter-greylist-4.5.16 (mx1.redhat.com [10.5.110.28]); Mon, 03 Jul 2017 23:41:51 +0000 (UTC) Sender: linux-kernel-owner@vger.kernel.org List-ID: X-Mailing-List: linux-kernel@vger.kernel.org Content-Length: 2946 Lines: 71 Hello, On Mon, Jul 03, 2017 at 10:48:03AM +0200, Alexander Graf wrote: > On 07/03/2017 10:03 AM, Christoffer Dall wrote: > > Hi Alex, > > > > On Fri, Jun 23, 2017 at 05:21:59PM +0200, Alexander Graf wrote: > >> If we want to age an HVA while the VM is getting destroyed, we have a > >> tiny race window during which we may end up dereferencing an invalid > >> kvm->arch.pgd value. > >> > >> CPU0 CPU1 > >> > >> kvm_age_hva() > >> kvm_mmu_notifier_release() > >> kvm_arch_flush_shadow_all() > >> kvm_free_stage2_pgd() > >> > >> stage2_get_pmd() > >> > >> set kvm->arch.pgd = 0 > >> > >> > >> stage2_get_pud() > >> arch.pgd> > >> > > I don't think this sequence, can happen, but I think kvm_age_hva() can > > be called with the mmu_lock held and kvm->pgd already being NULL. > > > > Is that possible for the mmu notifiers to be calling clear(_flush)_young > > while also calling notifier_release? > > I *think* the aging happens completely orthogonally to release. But > let's ask Andrea - I'm sure he knows :). I think the sequence can happen. All mmu notifier methods are flushed out of CPUs only through synchronize_srcu() which is called as the last step in __mmu_notifier_release/unregister. Only after _unregister returns you're sure kvm_age_hva cannot run anymore, until that point it can still run. Even during exit_mmap->mmu_notifier_release it can still run if invoked through rmap walks. So while the ->release method runs, all other mmu notifier methods could be still invoked concurrently. mmu notifier methods are only protected by srcu to prevent the mmu notifier structure to be freed from under them, but there's no additional locking to serialize them (except for the synchronize_srcu that happens as the last step of mmu_notifier_release/unregister, well after they may have called the ->release method). There's also a comment about it in __mmu_notifier_release: * runs with mm_users == 0. Other tasks may still invoke mmu notifiers * in parallel despite there being no task using this mm any more, * through the vmas outside of the exit_mmap context, such as with * vmtruncate. This serializes against mmu_notifier_unregister with And in the mmu_notifier_unregister too: * calling mmu_notifier_unregister. ->release or any other notifier * method may be invoked concurrently with mmu_notifier_unregister, * and only after mmu_notifier_unregister returned we're guaranteed * that ->release or any other method can't run anymore. Even ->release could in theory run concurrently against itself if mmu_notifier_unregister runs concurrently with mmu_notifier_release but that's purely theoretical possibility. Thanks, Andrea