Date: Thu, 8 Sep 2022 09:40:16 -0700
From: David Matlack
To: Oliver Upton
Cc: Marc Zyngier, James Morse, Alexandru Elisei, Suzuki K Poulose,
    Catalin Marinas, Will Deacon, Quentin Perret, Ricardo Koller,
    Reiji Watanabe, Ben Gardon, Paolo Bonzini, Gavin Shan, Peter Xu,
    Sean Christopherson, linux-arm-kernel@lists.infradead.org,
    kvmarm@lists.cs.columbia.edu, kvm@vger.kernel.org,
    linux-kernel@vger.kernel.org
Subject: Re: [PATCH 09/14] KVM: arm64: Free removed stage-2 tables in RCU callback
References: <20220830194132.962932-1-oliver.upton@linux.dev>
    <20220830194132.962932-10-oliver.upton@linux.dev>

On Wed, Sep 07, 2022 at 03:00:18PM -0700, David Matlack wrote:
> On Tue, Aug 30, 2022 at 07:41:27PM +0000, Oliver Upton wrote:
> > There is no real urgency to free a stage-2 subtree that was pruned.
> > Nonetheless, KVM does the tear down in the stage-2 fault path while
> > holding the MMU lock.
> >
> > Free removed stage-2 subtrees after an RCU grace period. To guarantee
> > all stage-2 table pages are freed before killing a VM, add an
> > rcu_barrier() to the flush path.
> >
> > Signed-off-by: Oliver Upton
> > ---
> >  arch/arm64/kvm/mmu.c | 35 ++++++++++++++++++++++++++++++++++-
> >  1 file changed, 34 insertions(+), 1 deletion(-)
> >
> > diff --git a/arch/arm64/kvm/mmu.c b/arch/arm64/kvm/mmu.c
> > index 91521f4aab97..265951c05879 100644
> > --- a/arch/arm64/kvm/mmu.c
> > +++ b/arch/arm64/kvm/mmu.c
> > @@ -97,6 +97,38 @@ static void *stage2_memcache_zalloc_page(void *arg)
> >  	return kvm_mmu_memory_cache_alloc(mc);
> >  }
> >
> > +#define STAGE2_PAGE_PRIVATE_LEVEL_MASK	GENMASK_ULL(2, 0)
> > +
> > +static inline unsigned long stage2_page_private(u32 level, void *arg)
> > +{
> > +	unsigned long pvt = (unsigned long)arg;
> > +
> > +	BUILD_BUG_ON(KVM_PGTABLE_MAX_LEVELS > STAGE2_PAGE_PRIVATE_LEVEL_MASK);
> > +	WARN_ON_ONCE(pvt & STAGE2_PAGE_PRIVATE_LEVEL_MASK);
> > +
> > +	return pvt | level;
> > +}
> > +
> > +static void stage2_free_removed_table_rcu_cb(struct rcu_head *head)
> > +{
> > +	struct page *page = container_of(head, struct page, rcu_head);
> > +	unsigned long pvt = page_private(page);
> > +	void *arg = (void *)(pvt & ~STAGE2_PAGE_PRIVATE_LEVEL_MASK);
> > +	u32 level = (u32)(pvt & STAGE2_PAGE_PRIVATE_LEVEL_MASK);
> > +	void *pgtable = page_to_virt(page);
> > +
> > +	kvm_pgtable_stage2_free_removed(pgtable, level, arg);
> > +}
> > +
> > +static void stage2_free_removed_table(void *pgtable, u32 level, void *arg)
> > +{
> > +	unsigned long pvt = stage2_page_private(level, arg);
> > +	struct page *page = virt_to_page(pgtable);
> > +
> > +	set_page_private(page, (unsigned long)pvt);
> > +	call_rcu(&page->rcu_head, stage2_free_removed_table_rcu_cb);
> > +}
> > +
> >  static void *kvm_host_zalloc_pages_exact(size_t size)
> >  {
> >  	return alloc_pages_exact(size, GFP_KERNEL_ACCOUNT | __GFP_ZERO);
> > @@ -627,7 +659,7 @@ static struct kvm_pgtable_mm_ops kvm_s2_mm_ops = {
> >  	.zalloc_page		= stage2_memcache_zalloc_page,
> >  	.zalloc_pages_exact	= kvm_host_zalloc_pages_exact,
> >  	.free_pages_exact	= free_pages_exact,
> > -	.free_removed_table	= kvm_pgtable_stage2_free_removed,
> > +	.free_removed_table	= stage2_free_removed_table,
> >  	.get_page		= kvm_host_get_page,
> >  	.put_page		= kvm_host_put_page,
> >  	.page_count		= kvm_host_page_count,
> > @@ -770,6 +802,7 @@ void kvm_free_stage2_pgd(struct kvm_s2_mmu *mmu)
> >  	if (pgt) {
> >  		kvm_pgtable_stage2_destroy(pgt);
> >  		kfree(pgt);
> > +		rcu_barrier();
>
> A comment here would be useful to document the behavior. e.g.
>
>   /*
>    * Wait for all stage-2 page tables that are being freed
>    * asynchronously via RCU callback because ...
>    */
>
> Speaking of, what's the reason for this rcu_barrier()? Is there any
> reason why KVM can't let in-flight stage-2 freeing RCU callbacks run at
> the end of the next grace period?

After thinking about this more I have 2 follow-up questions:

1. Should the RCU barrier come before kvm_pgtable_stage2_destroy() and
   kfree(pgt)? Otherwise an RCU callback running
   kvm_pgtable_stage2_free_removed() could access the pgt after it has
   been freed.

2. In general, is it safe for kvm_pgtable_stage2_free_removed() to run
   outside of the MMU lock? Yes, the page tables have already been
   disconnected from the tree, but kvm_pgtable_stage2_free_removed()
   also accesses shared data structures like struct kvm_pgtable. I
   *think* it might be safe after you fix (1.) but it would be more
   robust to avoid accessing shared data structures at all outside of
   the MMU lock and just do the page table freeing in the RCU callback.

>
> >  	}
> >  }
> >
> > --
> > 2.37.2.672.g94769d06f0-goog
> >
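
To make the ordering concern in (1.) concrete, here is a rough single-threaded userspace sketch. It is not kernel code and not real RCU: struct shared_pgt, struct deferred, defer_free(), drain_deferred() and teardown() are all hypothetical names made up for illustration. defer_free() stands in for call_rcu(), drain_deferred() stands in for rcu_barrier(), and pack_private() mirrors the way the patch stores the level in the low three bits of an aligned pointer.

```c
#include <assert.h>
#include <stdint.h>
#include <stdlib.h>

/* The low three bits of an 8-byte-aligned pointer are free to carry a level. */
#define LEVEL_MASK ((uintptr_t)0x7)

/* Hypothetical stand-in for struct kvm_pgtable: state shared by callbacks. */
struct shared_pgt {
	unsigned int tables_freed;
};

/* Hypothetical deferred-free record; models page_private plus rcu_head. */
struct deferred {
	struct deferred *next;
	uintptr_t pvt;			/* shared_pgt pointer | level */
};

static struct deferred *pending;	/* callbacks queued but not yet run */

static uintptr_t pack_private(unsigned int level, void *arg)
{
	uintptr_t pvt = (uintptr_t)arg;

	assert((pvt & LEVEL_MASK) == 0);	/* pointer must be aligned */
	assert(level <= LEVEL_MASK);		/* level must fit in the mask */
	return pvt | level;
}

static void *unpack_arg(uintptr_t pvt)
{
	return (void *)(pvt & ~LEVEL_MASK);
}

static unsigned int unpack_level(uintptr_t pvt)
{
	return (unsigned int)(pvt & LEVEL_MASK);
}

/* Models call_rcu(): queue the work instead of running it immediately. */
static void defer_free(unsigned int level, struct shared_pgt *pgt)
{
	struct deferred *d = malloc(sizeof(*d));

	assert(d);
	d->pvt = pack_private(level, pgt);
	d->next = pending;
	pending = d;
}

/* Models rcu_barrier(): run every queued callback, return how many ran. */
static unsigned int drain_deferred(void)
{
	unsigned int ran = 0;

	while (pending) {
		struct deferred *d = pending;
		struct shared_pgt *pgt = unpack_arg(d->pvt);

		pending = d->next;
		pgt->tables_freed++;	/* the callback touches shared state */
		free(d);
		ran++;
	}
	return ran;
}

/* Teardown in the order (1.) asks about: barrier first, then free. */
static unsigned int teardown(struct shared_pgt *pgt)
{
	unsigned int ran = drain_deferred();

	free(pgt);	/* safe: no queued callback can still reference pgt */
	return ran;
}
```

If teardown() called free(pgt) before drain_deferred(), each queued callback would dereference freed memory, which is the use-after-free I am worried about with the rcu_barrier() placed after kfree(pgt).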