Received: by 2002:ac2:464d:0:0:0:0:0 with SMTP id s13csp2175584lfo; Fri, 20 May 2022 16:31:17 -0700 (PDT) X-Google-Smtp-Source: ABdhPJysz8sM4UfC2r0xP2LLpzb85sKRecqCq1y3EFgGUqR9jTanvDgrO7HVVI8ChT5h6aYVJ91N X-Received: by 2002:a05:6402:2031:b0:42a:d157:66e9 with SMTP id ay17-20020a056402203100b0042ad15766e9mr13606476edb.395.1653089477322; Fri, 20 May 2022 16:31:17 -0700 (PDT) ARC-Seal: i=1; a=rsa-sha256; t=1653089477; cv=none; d=google.com; s=arc-20160816; b=Wfm87/b7LgZOSO6xWyUu5fM8XIF0VWpPzYdVXNXEPPviUD5Oa7GwdRVQIXer3/nDru 19ALFTh2aUxwhdRSmJAFF5CD+zKvzBYA6qsbvoWqjDfL8wl2S2le//NCrU61pfWL9rV6 5RBpAnFT2mwsksxVWtxq+aSgWjOGmjCsRf4X3/fEhVQMDUrxydWhC+TREmHgZwovY6Ep JfVTDaBaGDFZvM+a1PkZWu5yoHP/BnCeFyFJbNfUemXFFyvjqqDtUDYHByKpqxMtRd61 LARXc2/27qASGSmSXOfbYosEHkEbjmhAW1q98yuRKMQQtXvSLBXoHd8/unBBXQkQ0R96 0SfA== ARC-Message-Signature: i=1; a=rsa-sha256; c=relaxed/relaxed; d=google.com; s=arc-20160816; h=list-id:precedence:content-transfer-encoding:mime-version :references:in-reply-to:message-id:date:subject:cc:to:from :dkim-signature; bh=LXweUw4tQdn27/jSnpV1Ev5TqrlgfDxW+xgaSAQAVUI=; b=kpFoOrvON+KxXQtALtDx4vQqdgqo4N+3+7sot7k8Rg5sfoXXBi5Q2VebFuOXJQx1Wd WW0iIVoymBuOAjQFof1vpG0K73Q8Od2JQypskRFesCMT9VfaWVVLKS9Qv337l7B9o0qf 0RoU6dQz2AzE+w7CS8gvEDfGQM59OyCFWtoKV3YS0Ok5J1ruZhrv/wbl9Hjn84mnuFz2 c0ppSp8JVnQUXHPY++oA4SGAdWxlo25BzHBKy54q6sgA4G3wvqtWkMO6wKaJjUxqsydd LppwAp5IMzbETvg1eQPoKloGpXxwOPqWwhOXkK7uw7aMGHBJaO+eDi7IprWqPV7+vUOV Yhsw== ARC-Authentication-Results: i=1; mx.google.com; dkim=pass header.i=@intel.com header.s=Intel header.b="fCwl/r/R"; spf=pass (google.com: domain of linux-kernel-owner@vger.kernel.org designates 2620:137:e000::1:20 as permitted sender) smtp.mailfrom=linux-kernel-owner@vger.kernel.org; dmarc=pass (p=NONE sp=NONE dis=NONE) header.from=intel.com Return-Path: Received: from out1.vger.email (out1.vger.email. [2620:137:e000::1:20]) by mx.google.com with ESMTP id do9-20020a170906c10900b006f3a97bb895si9541806ejc.588.2022.05.20.16.30.50; Fri, 20 May 2022 16:31:17 -0700 (PDT) Received-SPF: pass (google.com: domain of linux-kernel-owner@vger.kernel.org designates 2620:137:e000::1:20 as permitted sender) client-ip=2620:137:e000::1:20; Authentication-Results: mx.google.com; dkim=pass header.i=@intel.com header.s=Intel header.b="fCwl/r/R"; spf=pass (google.com: domain of linux-kernel-owner@vger.kernel.org designates 2620:137:e000::1:20 as permitted sender) smtp.mailfrom=linux-kernel-owner@vger.kernel.org; dmarc=pass (p=NONE sp=NONE dis=NONE) header.from=intel.com Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand id S241250AbiESPnI (ORCPT + 99 others); Thu, 19 May 2022 11:43:08 -0400 Received: from lindbergh.monkeyblade.net ([23.128.96.19]:46434 "EHLO lindbergh.monkeyblade.net" rhost-flags-OK-OK-OK-OK) by vger.kernel.org with ESMTP id S241391AbiESPlz (ORCPT ); Thu, 19 May 2022 11:41:55 -0400 Received: from mga12.intel.com (mga12.intel.com [192.55.52.136]) by lindbergh.monkeyblade.net (Postfix) with ESMTPS id 1BF0B5B884; Thu, 19 May 2022 08:41:54 -0700 (PDT) DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/simple; d=intel.com; i=@intel.com; q=dns/txt; s=Intel; t=1652974914; x=1684510914; h=from:to:cc:subject:date:message-id:in-reply-to: references:mime-version:content-transfer-encoding; bh=xibCq6JJQorznQt92fCArgufvi2t+qiQBqRcvVaVpXo=; b=fCwl/r/RPlg3gyaEsOh2Laxmp+Sf+/oKexB82qBb9Ow/9iEVaOqIwQjc K+Ytgtegx+qlovdZKH9kjepIyd9MaFlythqZ5lRnJQR6+3/BBKHkuv+LY 2p0Ha3ZT/jOP2iegSaywri6mJ/UeKkwKxOdq9Olp2ySxl9rs107BHa3YN w4y7x89x8Ct82WbGd39eJfWm7Ugr4S/7Tpk77etaVfL528hh3NLjBKknf 0k5hXdUN1ri5frGeYFbVa2u81JFv8FoZ6p0Xuk7sL2E8SDBFQ1cQUg0Tz ow1s3gETEjZdJyugq9CRfKKMDxjNmFlE83wzGnMpjkZDtOwtseRGpW1HT g==; X-IronPort-AV: E=McAfee;i="6400,9594,10352"; a="252143283" X-IronPort-AV: E=Sophos;i="5.91,237,1647327600"; d="scan'208";a="252143283" Received: from orsmga008.jf.intel.com ([10.7.209.65]) by fmsmga106.fm.intel.com with ESMTP/TLS/ECDHE-RSA-AES256-GCM-SHA384; 19 May 2022 08:41:53 -0700 X-ExtLoop1: 1 X-IronPort-AV: E=Sophos;i="5.91,237,1647327600"; d="scan'208";a="598635417" Received: from chaop.bj.intel.com ([10.240.192.101]) by orsmga008.jf.intel.com with ESMTP; 19 May 2022 08:41:43 -0700 From: Chao Peng To: kvm@vger.kernel.org, linux-kernel@vger.kernel.org, linux-mm@kvack.org, linux-fsdevel@vger.kernel.org, linux-api@vger.kernel.org, linux-doc@vger.kernel.org, qemu-devel@nongnu.org Cc: Paolo Bonzini , Jonathan Corbet , Sean Christopherson , Vitaly Kuznetsov , Wanpeng Li , Jim Mattson , Joerg Roedel , Thomas Gleixner , Ingo Molnar , Borislav Petkov , x86@kernel.org, "H . Peter Anvin" , Hugh Dickins , Jeff Layton , "J . Bruce Fields" , Andrew Morton , Mike Rapoport , Steven Price , "Maciej S . Szmigiero" , Vlastimil Babka , Vishal Annapurve , Yu Zhang , Chao Peng , "Kirill A . Shutemov" , luto@kernel.org, jun.nakajima@intel.com, dave.hansen@intel.com, ak@linux.intel.com, david@redhat.com, aarcange@redhat.com, ddutile@redhat.com, dhildenb@redhat.com, Quentin Perret , Michael Roth , mhocko@suse.com Subject: [PATCH v6 6/8] KVM: Handle page fault for private memory Date: Thu, 19 May 2022 23:37:11 +0800 Message-Id: <20220519153713.819591-7-chao.p.peng@linux.intel.com> X-Mailer: git-send-email 2.25.1 In-Reply-To: <20220519153713.819591-1-chao.p.peng@linux.intel.com> References: <20220519153713.819591-1-chao.p.peng@linux.intel.com> MIME-Version: 1.0 Content-Transfer-Encoding: 8bit X-Spam-Status: No, score=-4.8 required=5.0 tests=BAYES_00,DKIMWL_WL_HIGH, DKIM_SIGNED,DKIM_VALID,DKIM_VALID_EF,RCVD_IN_DNSWL_MED,SPF_HELO_PASS, SPF_NONE,T_SCC_BODY_TEXT_LINE autolearn=ham autolearn_force=no version=3.4.6 X-Spam-Checker-Version: SpamAssassin 3.4.6 (2021-04-09) on lindbergh.monkeyblade.net Precedence: bulk List-ID: X-Mailing-List: linux-kernel@vger.kernel.org A page fault can carry the information of whether the access if private or not for KVM_MEM_PRIVATE memslot, this can be filled by architecture code(like TDX code). To handle page faut for such access, KVM maps the page only when this private property matches host's view on this page which can be decided by checking whether the corresponding page is populated in the private fd or not. A page is considered as private when the page is populated in the private fd, otherwise it's shared. For a successful match, private pfn is obtained with memfile_notifier callbacks from private fd and shared pfn is obtained with existing get_user_pages. For a failed match, KVM causes a KVM_EXIT_MEMORY_FAULT exit to userspace. Userspace then can convert memory between private/shared from host's view then retry the access. Co-developed-by: Yu Zhang Signed-off-by: Yu Zhang Signed-off-by: Chao Peng --- arch/x86/kvm/mmu.h | 1 + arch/x86/kvm/mmu/mmu.c | 70 +++++++++++++++++++++++++++++++-- arch/x86/kvm/mmu/mmu_internal.h | 17 ++++++++ arch/x86/kvm/mmu/mmutrace.h | 1 + arch/x86/kvm/mmu/paging_tmpl.h | 5 ++- include/linux/kvm_host.h | 22 +++++++++++ 6 files changed, 112 insertions(+), 4 deletions(-) diff --git a/arch/x86/kvm/mmu.h b/arch/x86/kvm/mmu.h index 7e258cc94152..c84835762249 100644 --- a/arch/x86/kvm/mmu.h +++ b/arch/x86/kvm/mmu.h @@ -176,6 +176,7 @@ struct kvm_page_fault { /* Derived from mmu and global state. */ const bool is_tdp; + const bool is_private; const bool nx_huge_page_workaround_enabled; /* diff --git a/arch/x86/kvm/mmu/mmu.c b/arch/x86/kvm/mmu/mmu.c index afe18d70ece7..e18460e0d743 100644 --- a/arch/x86/kvm/mmu/mmu.c +++ b/arch/x86/kvm/mmu/mmu.c @@ -2899,6 +2899,9 @@ int kvm_mmu_max_mapping_level(struct kvm *kvm, if (max_level == PG_LEVEL_4K) return PG_LEVEL_4K; + if (kvm_slot_is_private(slot)) + return max_level; + host_level = host_pfn_mapping_level(kvm, gfn, pfn, slot); return min(host_level, max_level); } @@ -3948,10 +3951,54 @@ static bool kvm_arch_setup_async_pf(struct kvm_vcpu *vcpu, gpa_t cr2_or_gpa, kvm_vcpu_gfn_to_hva(vcpu, gfn), &arch); } +static inline u8 order_to_level(int order) +{ + enum pg_level level; + + for (level = KVM_MAX_HUGEPAGE_LEVEL; level > PG_LEVEL_4K; level--) + if (order >= page_level_shift(level) - PAGE_SHIFT) + return level; + return level; +} + +static int kvm_faultin_pfn_private(struct kvm_vcpu *vcpu, + struct kvm_page_fault *fault) +{ + int order; + struct kvm_memory_slot *slot = fault->slot; + bool private_exist = !kvm_private_mem_get_pfn(slot, fault->gfn, + &fault->pfn, &order); + + if (fault->is_private != private_exist) { + if (private_exist) + kvm_private_mem_put_pfn(slot, fault->pfn); + + vcpu->run->exit_reason = KVM_EXIT_MEMORY_FAULT; + if (fault->is_private) + vcpu->run->memory.flags = KVM_MEMORY_EXIT_FLAG_PRIVATE; + else + vcpu->run->memory.flags = 0; + vcpu->run->memory.padding = 0; + vcpu->run->memory.gpa = fault->gfn << PAGE_SHIFT; + vcpu->run->memory.size = PAGE_SIZE; + return RET_PF_USER; + } + + if (fault->is_private) { + fault->max_level = min(order_to_level(order), fault->max_level); + fault->map_writable = !(slot->flags & KVM_MEM_READONLY); + return RET_PF_FIXED; + } + + /* Fault is shared, fallthrough to the standard path. */ + return RET_PF_CONTINUE; +} + static int kvm_faultin_pfn(struct kvm_vcpu *vcpu, struct kvm_page_fault *fault) { struct kvm_memory_slot *slot = fault->slot; bool async; + int r; /* * Retry the page fault if the gfn hit a memslot that is being deleted @@ -3980,6 +4027,12 @@ static int kvm_faultin_pfn(struct kvm_vcpu *vcpu, struct kvm_page_fault *fault) return RET_PF_EMULATE; } + if (kvm_slot_is_private(slot)) { + r = kvm_faultin_pfn_private(vcpu, fault); + if (r != RET_PF_CONTINUE) + return r == RET_PF_FIXED ? RET_PF_CONTINUE : r; + } + async = false; fault->pfn = __gfn_to_pfn_memslot(slot, fault->gfn, false, &async, fault->write, &fault->map_writable, @@ -4028,8 +4081,11 @@ static bool is_page_fault_stale(struct kvm_vcpu *vcpu, if (!sp && kvm_test_request(KVM_REQ_MMU_FREE_OBSOLETE_ROOTS, vcpu)) return true; - return fault->slot && - mmu_notifier_retry_hva(vcpu->kvm, mmu_seq, fault->hva); + if (fault->is_private) + return mmu_notifier_retry(vcpu->kvm, mmu_seq); + else + return fault->slot && + mmu_notifier_retry_hva(vcpu->kvm, mmu_seq, fault->hva); } static int direct_page_fault(struct kvm_vcpu *vcpu, struct kvm_page_fault *fault) @@ -4088,7 +4144,12 @@ static int direct_page_fault(struct kvm_vcpu *vcpu, struct kvm_page_fault *fault read_unlock(&vcpu->kvm->mmu_lock); else write_unlock(&vcpu->kvm->mmu_lock); - kvm_release_pfn_clean(fault->pfn); + + if (fault->is_private) + kvm_private_mem_put_pfn(fault->slot, fault->pfn); + else + kvm_release_pfn_clean(fault->pfn); + return r; } @@ -5372,6 +5433,9 @@ int kvm_mmu_page_fault(struct kvm_vcpu *vcpu, gpa_t cr2_or_gpa, u64 error_code, return -EIO; } + if (r == RET_PF_USER) + return 0; + if (r < 0) return r; if (r != RET_PF_EMULATE) diff --git a/arch/x86/kvm/mmu/mmu_internal.h b/arch/x86/kvm/mmu/mmu_internal.h index c0e502b17ef7..14932cf97655 100644 --- a/arch/x86/kvm/mmu/mmu_internal.h +++ b/arch/x86/kvm/mmu/mmu_internal.h @@ -147,6 +147,7 @@ unsigned int pte_list_count(struct kvm_rmap_head *rmap_head); * RET_PF_RETRY: let CPU fault again on the address. * RET_PF_EMULATE: mmio page fault, emulate the instruction directly. * RET_PF_INVALID: the spte is invalid, let the real page fault path update it. + * RET_PF_USER: need to exit to userspace to handle this fault. * RET_PF_FIXED: The faulting entry has been fixed. * RET_PF_SPURIOUS: The faulting entry was already fixed, e.g. by another vCPU. * @@ -163,6 +164,7 @@ enum { RET_PF_RETRY, RET_PF_EMULATE, RET_PF_INVALID, + RET_PF_USER, RET_PF_FIXED, RET_PF_SPURIOUS, }; @@ -178,4 +180,19 @@ void *mmu_memory_cache_alloc(struct kvm_mmu_memory_cache *mc); void account_huge_nx_page(struct kvm *kvm, struct kvm_mmu_page *sp); void unaccount_huge_nx_page(struct kvm *kvm, struct kvm_mmu_page *sp); +#ifndef CONFIG_HAVE_KVM_PRIVATE_MEM +static inline int kvm_private_mem_get_pfn(struct kvm_memory_slot *slot, + gfn_t gfn, kvm_pfn_t *pfn, int *order) +{ + WARN_ON_ONCE(1); + return -EOPNOTSUPP; +} + +static inline void kvm_private_mem_put_pfn(struct kvm_memory_slot *slot, + kvm_pfn_t pfn) +{ + WARN_ON_ONCE(1); +} +#endif /* CONFIG_HAVE_KVM_PRIVATE_MEM */ + #endif /* __KVM_X86_MMU_INTERNAL_H */ diff --git a/arch/x86/kvm/mmu/mmutrace.h b/arch/x86/kvm/mmu/mmutrace.h index ae86820cef69..2d7555381955 100644 --- a/arch/x86/kvm/mmu/mmutrace.h +++ b/arch/x86/kvm/mmu/mmutrace.h @@ -58,6 +58,7 @@ TRACE_DEFINE_ENUM(RET_PF_CONTINUE); TRACE_DEFINE_ENUM(RET_PF_RETRY); TRACE_DEFINE_ENUM(RET_PF_EMULATE); TRACE_DEFINE_ENUM(RET_PF_INVALID); +TRACE_DEFINE_ENUM(RET_PF_USER); TRACE_DEFINE_ENUM(RET_PF_FIXED); TRACE_DEFINE_ENUM(RET_PF_SPURIOUS); diff --git a/arch/x86/kvm/mmu/paging_tmpl.h b/arch/x86/kvm/mmu/paging_tmpl.h index 7f8f1c8dbed2..1d857919a947 100644 --- a/arch/x86/kvm/mmu/paging_tmpl.h +++ b/arch/x86/kvm/mmu/paging_tmpl.h @@ -878,7 +878,10 @@ static int FNAME(page_fault)(struct kvm_vcpu *vcpu, struct kvm_page_fault *fault out_unlock: write_unlock(&vcpu->kvm->mmu_lock); - kvm_release_pfn_clean(fault->pfn); + if (fault->is_private) + kvm_private_mem_put_pfn(fault->slot, fault->pfn); + else + kvm_release_pfn_clean(fault->pfn); return r; } diff --git a/include/linux/kvm_host.h b/include/linux/kvm_host.h index 3fd168972ecd..b0a7910505ed 100644 --- a/include/linux/kvm_host.h +++ b/include/linux/kvm_host.h @@ -2241,4 +2241,26 @@ static inline void kvm_handle_signal_exit(struct kvm_vcpu *vcpu) /* Max number of entries allowed for each kvm dirty ring */ #define KVM_DIRTY_RING_MAX_ENTRIES 65536 +#ifdef CONFIG_HAVE_KVM_PRIVATE_MEM +static inline int kvm_private_mem_get_pfn(struct kvm_memory_slot *slot, + gfn_t gfn, kvm_pfn_t *pfn, int *order) +{ + int ret; + pfn_t pfnt; + pgoff_t index = gfn - slot->base_gfn + + (slot->private_offset >> PAGE_SHIFT); + + ret = slot->notifier.bs->get_lock_pfn(slot->private_file, index, &pfnt, + order); + *pfn = pfn_t_to_pfn(pfnt); + return ret; +} + +static inline void kvm_private_mem_put_pfn(struct kvm_memory_slot *slot, + kvm_pfn_t pfn) +{ + slot->notifier.bs->put_unlock_pfn(pfn_to_pfn_t(pfn)); +} +#endif /* CONFIG_HAVE_KVM_PRIVATE_MEM */ + #endif -- 2.25.1