Date: Tue, 4 Jul 2017 10:35:42 +0200
From: Christoffer Dall
To: Andrea Arcangeli
Cc: Alexander Graf, kvmarm@lists.cs.columbia.edu, linux-kernel@vger.kernel.org,
	kvm@vger.kernel.org
Subject: Re: [PATCH] KVM: arm/arm64: Handle hva aging while destroying the vm
Message-ID: <20170704083542.GK4066@cbox>
References: <1498231319-199734-1-git-send-email-agraf@suse.de>
	<20170703080313.GB4066@cbox>
	<20170703234149.GE5738@redhat.com>
MIME-Version: 1.0
Content-Type: text/plain; charset=us-ascii
Content-Disposition: inline
In-Reply-To: <20170703234149.GE5738@redhat.com>
User-Agent: Mutt/1.5.21 (2010-09-15)
Sender: linux-kernel-owner@vger.kernel.org
X-Mailing-List: linux-kernel@vger.kernel.org

On Tue, Jul 04, 2017 at 01:41:49AM +0200, Andrea Arcangeli wrote:
> Hello,
>
> On Mon, Jul 03, 2017 at 10:48:03AM +0200, Alexander Graf wrote:
> > On 07/03/2017 10:03 AM, Christoffer Dall wrote:
> > > Hi Alex,
> > >
> > > On Fri, Jun 23, 2017 at 05:21:59PM +0200, Alexander Graf wrote:
> > >> If we want to age an HVA while the VM is getting destroyed, we have a
> > >> tiny race window during which we may end up dereferencing an invalid
> > >> kvm->arch.pgd value.
> > >>
> > >>      CPU0                          CPU1
> > >>
> > >>   kvm_age_hva()
> > >>                                 kvm_mmu_notifier_release()
> > >>                                   kvm_arch_flush_shadow_all()
> > >>                                     kvm_free_stage2_pgd()
> > >>
> > >>     stage2_get_pmd()
> > >>
> > >>                                       set kvm->arch.pgd = 0
> > >>
> > >>     stage2_get_pud()
> > >>       <dereference invalid kvm->arch.pgd>
> > >>
> > > I don't think this sequence can happen, but I think kvm_age_hva() can
> > > be called with the mmu_lock held and kvm->pgd already being NULL.
> > >
> > > Is it possible for the mmu notifiers to be calling clear(_flush)_young
> > > while also calling notifier_release?
> >
> > I *think* the aging happens completely orthogonally to release. But
> > let's ask Andrea - I'm sure he knows :).
>
> I think the sequence can happen. All mmu notifier methods are flushed
> out of CPUs only through synchronize_srcu(), which is called as the
> last step in __mmu_notifier_release/unregister. Only after _unregister
> returns are you sure kvm_age_hva cannot run anymore; until that point
> it can still run. Even during exit_mmap->mmu_notifier_release it can
> still run if invoked through rmap walks.
>
> So while the ->release method runs, all other mmu notifier methods
> could still be invoked concurrently.
>
> mmu notifier methods are only protected by srcu to prevent the mmu
> notifier structure from being freed from under them, but there's no
> additional locking to serialize them (except for the synchronize_srcu
> that happens as the last step of mmu_notifier_release/unregister, well
> after they may have called the ->release method).
>
> There's also a comment about it in __mmu_notifier_release:
>
>  * runs with mm_users == 0. Other tasks may still invoke mmu notifiers
>  * in parallel despite there being no task using this mm any more,
>  * through the vmas outside of the exit_mmap context, such as with
>  * vmtruncate. This serializes against mmu_notifier_unregister with
>
> And in the mmu_notifier_unregister too:
>
>  * calling mmu_notifier_unregister. ->release or any other notifier
>  * method may be invoked concurrently with mmu_notifier_unregister,
>  * and only after mmu_notifier_unregister returned we're guaranteed
>  * that ->release or any other method can't run anymore.
>
> Even ->release could in theory run concurrently against itself if
> mmu_notifier_unregister runs concurrently with mmu_notifier_release,
> but that's a purely theoretical possibility.
>

Thanks for the clarification.

So Alex, I think you just need to fix the commit message of this patch.

Thanks,
-Christoffer
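
P.S. To illustrate the scenario I have in mind, here is a rough sketch of
the kind of guard being discussed (a sketch only, not your actual patch;
the handler name, its signature and the stage2 helpers are taken from my
reading of arch/arm/kvm/mmu.c in this timeframe and may not match your
patch exactly):

    /*
     * Sketch only, not the actual patch: kvm_free_stage2_pgd() clears
     * kvm->arch.pgd while holding kvm->mmu_lock, and the aging handler
     * below is also invoked with kvm->mmu_lock held, so checking for a
     * NULL pgd cannot race with the teardown itself; we simply report
     * "not young" for a VM whose stage-2 tables are already gone.
     */
    static int kvm_age_hva_handler(struct kvm *kvm, gpa_t gpa, u64 size, void *data)
    {
            pmd_t *pmd;
            pte_t *pte;

            if (!kvm->arch.pgd)             /* stage-2 tables already freed */
                    return 0;

            pmd = stage2_get_pmd(kvm, NULL, gpa);
            if (!pmd || pmd_none(*pmd))     /* nothing mapped here */
                    return 0;

            if (pmd_thp_or_huge(*pmd))      /* THP or hugetlb mapping */
                    return stage2_pmdp_test_and_clear_young(pmd);

            pte = pte_offset_kernel(pmd, gpa);
            if (pte_none(*pte))
                    return 0;

            return stage2_ptep_test_and_clear_young(pte);
    }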