Date: Fri, 17 Jun 2022 21:30:53 +0000
From: Sean Christopherson
To: Chao Peng
Cc: kvm@vger.kernel.org, linux-kernel@vger.kernel.org, linux-mm@kvack.org,
    linux-fsdevel@vger.kernel.org, linux-api@vger.kernel.org,
    linux-doc@vger.kernel.org, qemu-devel@nongnu.org, Paolo Bonzini,
    Jonathan Corbet, Vitaly Kuznetsov, Wanpeng Li, Jim Mattson,
    Joerg Roedel, Thomas Gleixner, Ingo Molnar, Borislav Petkov,
    x86@kernel.org, "H. Peter Anvin", Hugh Dickins, Jeff Layton,
    "J. Bruce Fields", Andrew Morton, Mike Rapoport, Steven Price,
    "Maciej S. Szmigiero", Vlastimil Babka, Vishal Annapurve, Yu Zhang,
    "Kirill A. Shutemov", luto@kernel.org, jun.nakajima@intel.com,
    dave.hansen@intel.com, ak@linux.intel.com, david@redhat.com,
    aarcange@redhat.com, ddutile@redhat.com, dhildenb@redhat.com,
    Quentin Perret, Michael Roth, mhocko@suse.com
Subject: Re: [PATCH v6 6/8] KVM: Handle page fault for private memory
Message-ID:
References: <20220519153713.819591-1-chao.p.peng@linux.intel.com>
 <20220519153713.819591-7-chao.p.peng@linux.intel.com>
In-Reply-To: <20220519153713.819591-7-chao.p.peng@linux.intel.com>

On Thu, May 19, 2022, Chao Peng wrote:
> @@ -4028,8 +4081,11 @@ static bool is_page_fault_stale(struct kvm_vcpu *vcpu,
>  	if (!sp && kvm_test_request(KVM_REQ_MMU_FREE_OBSOLETE_ROOTS, vcpu))
>  		return true;
> 
> -	return fault->slot &&
> -	       mmu_notifier_retry_hva(vcpu->kvm, mmu_seq, fault->hva);
> +	if (fault->is_private)
> +		return mmu_notifier_retry(vcpu->kvm, mmu_seq);

Hmm, this is somewhat undesirable, because faulting in private pfns will be
blocked by unrelated mmu_notifier updates.  The issue is mitigated to some
degree by bumping the sequence count if and only if overlap with a memslot is
detected, e.g. mapping changes that affect only userspace won't block the
guest.

It probably won't be an issue, but at the same time it's easy to solve, and I
don't like piggybacking mmu_notifier_seq, as private mappings shouldn't be
subject to the mmu_notifier.
Using a separate sequence count would also fix a theoretical bug in this patch
where mmu_notifier_retry() wouldn't be defined if CONFIG_MEMFILE_NOTIFIER=y &&
CONFIG_MMU_NOTIFIER=n.

---
 arch/x86/kvm/mmu/mmu.c   | 11 ++++++-----
 include/linux/kvm_host.h | 16 +++++++++++-----
 virt/kvm/kvm_main.c      |  2 +-
 3 files changed, 18 insertions(+), 11 deletions(-)

diff --git a/arch/x86/kvm/mmu/mmu.c b/arch/x86/kvm/mmu/mmu.c
index 0b455c16ec64..a4cbd29433e7 100644
--- a/arch/x86/kvm/mmu/mmu.c
+++ b/arch/x86/kvm/mmu/mmu.c
@@ -4100,10 +4100,10 @@ static bool is_page_fault_stale(struct kvm_vcpu *vcpu,
 		return true;
 
 	if (fault->is_private)
-		return mmu_notifier_retry(vcpu->kvm, mmu_seq);
-	else
-		return fault->slot &&
-		       mmu_notifier_retry_hva(vcpu->kvm, mmu_seq, fault->hva);
+		return memfile_notifier_retry(vcpu->kvm, mmu_seq);
+
+	return fault->slot &&
+	       mmu_notifier_retry_hva(vcpu->kvm, mmu_seq, fault->hva);
 }
 
 static int direct_page_fault(struct kvm_vcpu *vcpu, struct kvm_page_fault *fault)
@@ -4127,7 +4127,8 @@ static int direct_page_fault(struct kvm_vcpu *vcpu, struct kvm_page_fault *fault
 	if (r)
 		return r;
 
-	mmu_seq = vcpu->kvm->mmu_notifier_seq;
+	mmu_seq = fault->is_private ? vcpu->kvm->memfile_notifier_seq :
+				      vcpu->kvm->mmu_notifier_seq;
 	smp_rmb();
 
 	r = kvm_faultin_pfn(vcpu, fault);
diff --git a/include/linux/kvm_host.h b/include/linux/kvm_host.h
index 92afa5bddbc5..31f704c83099 100644
--- a/include/linux/kvm_host.h
+++ b/include/linux/kvm_host.h
@@ -773,16 +773,15 @@ struct kvm {
 	struct hlist_head irq_ack_notifier_list;
 #endif
 
-#if (defined(CONFIG_MMU_NOTIFIER) && defined(KVM_ARCH_WANT_MMU_NOTIFIER)) ||\
-	defined(CONFIG_MEMFILE_NOTIFIER)
+#if (defined(CONFIG_MMU_NOTIFIER) && defined(KVM_ARCH_WANT_MMU_NOTIFIER))
 	unsigned long mmu_notifier_seq;
-#endif
-
-#if defined(CONFIG_MMU_NOTIFIER) && defined(KVM_ARCH_WANT_MMU_NOTIFIER)
 	struct mmu_notifier mmu_notifier;
 	long mmu_notifier_count;
 	unsigned long mmu_notifier_range_start;
 	unsigned long mmu_notifier_range_end;
+#endif
+#ifdef CONFIG_MEMFILE_NOTIFIER
+	unsigned long memfile_notifier_seq;
 #endif
 	struct list_head devices;
 	u64 manual_dirty_log_protect;
@@ -1964,6 +1963,13 @@ static inline int mmu_notifier_retry_hva(struct kvm *kvm,
 }
 #endif
 
+#ifdef CONFIG_MEMFILE_NOTIFIER
+static inline bool memfile_notifier_retry(struct kvm *kvm, unsigned long mmu_seq)
+{
+	return kvm->memfile_notifier_seq != mmu_seq;
+}
+#endif
+
 #ifdef CONFIG_HAVE_KVM_IRQ_ROUTING
 
 #define KVM_MAX_IRQ_ROUTES 4096 /* might need extension/rework in the future */
diff --git a/virt/kvm/kvm_main.c b/virt/kvm/kvm_main.c
index 2b416d3bd60e..e6d34c964d51 100644
--- a/virt/kvm/kvm_main.c
+++ b/virt/kvm/kvm_main.c
@@ -898,7 +898,7 @@ static void kvm_private_mem_notifier_handler(struct memfile_notifier *notifier,
 	KVM_MMU_LOCK(kvm);
 	if (kvm_unmap_gfn_range(kvm, &gfn_range))
 		kvm_flush_remote_tlbs(kvm);
-	kvm->mmu_notifier_seq++;
+	kvm->memfile_notifier_seq++;
 	KVM_MMU_UNLOCK(kvm);
 	srcu_read_unlock(&kvm->srcu, idx);
 }

base-commit: 333ef501c7f6c6d4ef2b7678905cad0f8ef3e271
--

> +	else
> +		return fault->slot &&
> +		       mmu_notifier_retry_hva(vcpu->kvm, mmu_seq, fault->hva);
>  }
> 
>  static int direct_page_fault(struct kvm_vcpu *vcpu, struct kvm_page_fault *fault)
> @@ -4088,7 +4144,12 @@ static int direct_page_fault(struct kvm_vcpu *vcpu, struct kvm_page_fault *fault
>  		read_unlock(&vcpu->kvm->mmu_lock);
>  	else
>  		write_unlock(&vcpu->kvm->mmu_lock);
> -	kvm_release_pfn_clean(fault->pfn);
> +
> +	if (fault->is_private)
> +		kvm_private_mem_put_pfn(fault->slot, fault->pfn);

Why does the shmem path lock the page, and then unlock it here?

Same question for why this path marks it dirty?  The guest has the page mapped
so the dirty flag is immediately stale.

In other words, why does KVM need to do something different for private pfns?

> +	else
> +		kvm_release_pfn_clean(fault->pfn);
> +
>  	return r;
>  }

...

> diff --git a/arch/x86/kvm/mmu/paging_tmpl.h b/arch/x86/kvm/mmu/paging_tmpl.h
> index 7f8f1c8dbed2..1d857919a947 100644
> --- a/arch/x86/kvm/mmu/paging_tmpl.h
> +++ b/arch/x86/kvm/mmu/paging_tmpl.h
> @@ -878,7 +878,10 @@ static int FNAME(page_fault)(struct kvm_vcpu *vcpu, struct kvm_page_fault *fault
> 
>  out_unlock:
>  	write_unlock(&vcpu->kvm->mmu_lock);
> -	kvm_release_pfn_clean(fault->pfn);
> +	if (fault->is_private)

Indirect MMUs can't support private faults, i.e. this is unnecessary.
> +		kvm_private_mem_put_pfn(fault->slot, fault->pfn);
> +	else
> +		kvm_release_pfn_clean(fault->pfn);
>  	return r;
>  }
> 
> diff --git a/include/linux/kvm_host.h b/include/linux/kvm_host.h
> index 3fd168972ecd..b0a7910505ed 100644
> --- a/include/linux/kvm_host.h
> +++ b/include/linux/kvm_host.h
> @@ -2241,4 +2241,26 @@ static inline void kvm_handle_signal_exit(struct kvm_vcpu *vcpu)
>  /* Max number of entries allowed for each kvm dirty ring */
>  #define KVM_DIRTY_RING_MAX_ENTRIES  65536
> 
> +#ifdef CONFIG_HAVE_KVM_PRIVATE_MEM
> +static inline int kvm_private_mem_get_pfn(struct kvm_memory_slot *slot,
> +					  gfn_t gfn, kvm_pfn_t *pfn, int *order)
> +{
> +	int ret;
> +	pfn_t pfnt;
> +	pgoff_t index = gfn - slot->base_gfn +
> +			(slot->private_offset >> PAGE_SHIFT);
> +
> +	ret = slot->notifier.bs->get_lock_pfn(slot->private_file, index, &pfnt,
> +					      order);
> +	*pfn = pfn_t_to_pfn(pfnt);
> +	return ret;
> +}
> +
> +static inline void kvm_private_mem_put_pfn(struct kvm_memory_slot *slot,
> +					   kvm_pfn_t pfn)
> +{
> +	slot->notifier.bs->put_unlock_pfn(pfn_to_pfn_t(pfn));
> +}
> +#endif /* CONFIG_HAVE_KVM_PRIVATE_MEM */
> +
>  #endif
> -- 
> 2.25.1
> 