From: isaku.yamahata@intel.com
To: kvm@vger.kernel.org, linux-kernel@vger.kernel.org
Cc: isaku.yamahata@intel.com, isaku.yamahata@gmail.com, Paolo Bonzini,
	erdemaktas@google.com, Sean Christopherson, Sagi Shahar,
	David Matlack, Kai Huang, Zhi Wang, chen.bo@intel.com,
	hang.yuan@intel.com, tina.zhang@intel.com, Xiaoyao Li
Subject: [RFC PATCH v5 12/16] KVM: x86/tdp_mmu, TDX: Split a large page when 4KB page within it converted to shared
Date: Mon, 16 Oct 2023 09:21:03 -0700
Message-Id: <3606f99cd9b083cdf44b7bfc81a524ed5ab85031.1697473009.git.isaku.yamahata@intel.com>
X-Mailer: git-send-email 2.25.1
In-Reply-To:
References:
MIME-Version: 1.0
Content-Transfer-Encoding: 8bit
From: Xiaoyao Li

When KVM maps a page as shared for TDX, it needs to zap the private alias.
If the private page is mapped as a large (2MB) page, it can be removed
directly only when the whole 2MB range is converted to shared.  Otherwise,
the 2MB page has to be split into 512 4KB pages, and only the pages that
were converted to shared are removed.

When a present large leaf SPTE switches to a present non-leaf SPTE, TDX
needs to split the corresponding Secure EPT (SEPT) page to reflect it.

Signed-off-by: Xiaoyao Li
Signed-off-by: Isaku Yamahata
---
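Not part of the patch itself: below is a minimal, standalone userspace
sketch of the index arithmetic the commit message relies on.  It only
illustrates that a 2MB private mapping covers 512 4KB entries and that
converting a single 4KB page to shared touches exactly one of them; the
constants mirror x86 paging, and the example GPA is an arbitrary value
chosen for illustration, not taken from KVM.

/* Illustration only, not KVM code: which 4KB entry inside a 2MB private
 * mapping is affected when one 4KB page is converted to shared.  After
 * the split, the remaining 511 entries stay private. */
#include <stdint.h>
#include <stdio.h>

#define PAGE_SHIFT	12
#define PAGE_SIZE	(1UL << PAGE_SHIFT)		/* 4KB */
#define PMD_SHIFT	21
#define PMD_SIZE	(1UL << PMD_SHIFT)		/* 2MB */
#define PTRS_PER_PMD	(PMD_SIZE / PAGE_SIZE)		/* 512 */

int main(void)
{
	/* Example GPA converted to shared (arbitrary, for illustration). */
	uint64_t shared_gpa = 0x100000000ULL + 0x3000;
	uint64_t huge_base  = shared_gpa & ~(uint64_t)(PMD_SIZE - 1);
	uint64_t idx        = (shared_gpa & (PMD_SIZE - 1)) >> PAGE_SHIFT;

	printf("2MB private mapping at 0x%llx is split into %lu 4KB entries\n",
	       (unsigned long long)huge_base, PTRS_PER_PMD);
	printf("only entry %llu becomes shared; the rest stay private\n",
	       (unsigned long long)idx);
	return 0;
}

For the real flow, see tdx_sept_split_private_spt() in the diff: it issues
TDH.MEM.PAGE.DEMOTE with the newly provided Secure EPT page so the TDX
module performs the same 2MB-to-4KB demotion on the private mapping.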
 arch/x86/include/asm/kvm-x86-ops.h |  1 +
 arch/x86/include/asm/kvm_host.h    |  2 ++
 arch/x86/kvm/mmu/tdp_mmu.c         | 21 ++++++++++++++++-----
 arch/x86/kvm/vmx/tdx.c             | 25 +++++++++++++++++++++++--
 arch/x86/kvm/vmx/tdx_arch.h        |  1 +
 arch/x86/kvm/vmx/tdx_ops.h         |  7 +++++++
 6 files changed, 50 insertions(+), 7 deletions(-)

diff --git a/arch/x86/include/asm/kvm-x86-ops.h b/arch/x86/include/asm/kvm-x86-ops.h
index ba74cb7199b3..d751c8b58c45 100644
--- a/arch/x86/include/asm/kvm-x86-ops.h
+++ b/arch/x86/include/asm/kvm-x86-ops.h
@@ -103,6 +103,7 @@ KVM_X86_OP_OPTIONAL_RET0(get_mt_mask)
 KVM_X86_OP(load_mmu_pgd)
 KVM_X86_OP_OPTIONAL(link_private_spt)
 KVM_X86_OP_OPTIONAL(free_private_spt)
+KVM_X86_OP_OPTIONAL(split_private_spt)
 KVM_X86_OP_OPTIONAL(set_private_spte)
 KVM_X86_OP_OPTIONAL(remove_private_spte)
 KVM_X86_OP_OPTIONAL(zap_private_spte)
diff --git a/arch/x86/include/asm/kvm_host.h b/arch/x86/include/asm/kvm_host.h
index bb2b4f8c0c57..d9ada89c0c55 100644
--- a/arch/x86/include/asm/kvm_host.h
+++ b/arch/x86/include/asm/kvm_host.h
@@ -1745,6 +1745,8 @@ struct kvm_x86_ops {
 				void *private_spt);
 	int (*free_private_spt)(struct kvm *kvm, gfn_t gfn, enum pg_level level,
 				void *private_spt);
+	int (*split_private_spt)(struct kvm *kvm, gfn_t gfn, enum pg_level level,
+				 void *private_spt);
 	int (*set_private_spte)(struct kvm *kvm, gfn_t gfn, enum pg_level level,
 				kvm_pfn_t pfn);
 	int (*remove_private_spte)(struct kvm *kvm, gfn_t gfn, enum pg_level level,
diff --git a/arch/x86/kvm/mmu/tdp_mmu.c b/arch/x86/kvm/mmu/tdp_mmu.c
index 366f1b58af09..6828394cedec 100644
--- a/arch/x86/kvm/mmu/tdp_mmu.c
+++ b/arch/x86/kvm/mmu/tdp_mmu.c
@@ -647,23 +647,34 @@ static int __must_check __set_private_spte_present(struct kvm *kvm, tdp_ptep_t s
 {
 	bool was_present = is_shadow_present_pte(old_spte);
 	bool is_present = is_shadow_present_pte(new_spte);
+	bool was_leaf = was_present && is_last_spte(old_spte, level);
 	bool is_leaf = is_present && is_last_spte(new_spte, level);
 	kvm_pfn_t new_pfn = spte_to_pfn(new_spte);
+	void *private_spt;
 	int ret = 0;
 
 	lockdep_assert_held(&kvm->mmu_lock);
-	/* TDP MMU doesn't change present -> present */
-	KVM_BUG_ON(was_present, kvm);
 
 	/*
 	 * Use different call to either set up middle level
 	 * private page table, or leaf.
 	 */
-	if (is_leaf)
+	if (level > PG_LEVEL_4K && was_leaf && !is_leaf) {
+		/*
+		 * Splitting a large page into 4KB pages.
+		 * tdp_mmu_split_huge_page() => tdp_mmu_link_sp()
+		 */
+		private_spt = get_private_spt(gfn, new_spte, level);
+		KVM_BUG_ON(!private_spt, kvm);
+		ret = static_call(kvm_x86_zap_private_spte)(kvm, gfn, level);
+		kvm_flush_remote_tlbs(kvm);
+		if (!ret)
+			ret = static_call(kvm_x86_split_private_spt)(kvm, gfn,
+								     level, private_spt);
+	} else if (is_leaf)
 		ret = static_call(kvm_x86_set_private_spte)(kvm, gfn, level, new_pfn);
 	else {
-		void *private_spt = get_private_spt(gfn, new_spte, level);
-
+		private_spt = get_private_spt(gfn, new_spte, level);
 		KVM_BUG_ON(!private_spt, kvm);
 		ret = static_call(kvm_x86_link_private_spt)(kvm, gfn, level, private_spt);
 	}
diff --git a/arch/x86/kvm/vmx/tdx.c b/arch/x86/kvm/vmx/tdx.c
index 2c760947ab21..1db56696ad99 100644
--- a/arch/x86/kvm/vmx/tdx.c
+++ b/arch/x86/kvm/vmx/tdx.c
@@ -1661,6 +1661,28 @@ static int tdx_sept_link_private_spt(struct kvm *kvm, gfn_t gfn,
 	return 0;
 }
 
+static int tdx_sept_split_private_spt(struct kvm *kvm, gfn_t gfn,
+				      enum pg_level level, void *private_spt)
+{
+	int tdx_level = pg_level_to_tdx_sept_level(level);
+	struct kvm_tdx *kvm_tdx = to_kvm_tdx(kvm);
+	gpa_t gpa = gfn_to_gpa(gfn) & KVM_HPAGE_MASK(level);
+	hpa_t hpa = __pa(private_spt);
+	struct tdx_module_args out;
+	u64 err;
+
+	/* See comment in tdx_sept_set_private_spte() */
+	err = tdh_mem_page_demote(kvm_tdx->tdr_pa, gpa, tdx_level, hpa, &out);
+	if (unlikely(err == TDX_ERROR_SEPT_BUSY))
+		return -EAGAIN;
+	if (KVM_BUG_ON(err, kvm)) {
+		pr_tdx_error(TDH_MEM_PAGE_DEMOTE, err, &out);
+		return -EIO;
+	}
+
+	return 0;
+}
+
 static int tdx_sept_zap_private_spte(struct kvm *kvm, gfn_t gfn,
 				     enum pg_level level)
 {
@@ -1674,8 +1696,6 @@ static int tdx_sept_zap_private_spte(struct kvm *kvm, gfn_t gfn,
 	if (unlikely(!is_hkid_assigned(kvm_tdx)))
 		return 0;
 
-	/* For now large page isn't supported yet. */
-	WARN_ON_ONCE(level != PG_LEVEL_4K);
 	err = tdh_mem_range_block(kvm_tdx->tdr_pa, gpa, tdx_level, &out);
 	if (unlikely(err == TDX_ERROR_SEPT_BUSY))
 		return -EAGAIN;
@@ -3283,6 +3303,7 @@ int __init tdx_hardware_setup(struct kvm_x86_ops *x86_ops)
 
 	x86_ops->link_private_spt = tdx_sept_link_private_spt;
 	x86_ops->free_private_spt = tdx_sept_free_private_spt;
+	x86_ops->split_private_spt = tdx_sept_split_private_spt;
 	x86_ops->set_private_spte = tdx_sept_set_private_spte;
 	x86_ops->remove_private_spte = tdx_sept_remove_private_spte;
 	x86_ops->zap_private_spte = tdx_sept_zap_private_spte;
diff --git a/arch/x86/kvm/vmx/tdx_arch.h b/arch/x86/kvm/vmx/tdx_arch.h
index 93934851610b..0c9823fcf829 100644
--- a/arch/x86/kvm/vmx/tdx_arch.h
+++ b/arch/x86/kvm/vmx/tdx_arch.h
@@ -21,6 +21,7 @@
 #define TDH_MNG_CREATE			9
 #define TDH_VP_CREATE			10
 #define TDH_MNG_RD			11
+#define TDH_MEM_PAGE_DEMOTE		15
 #define TDH_MR_EXTEND			16
 #define TDH_MR_FINALIZE			17
 #define TDH_VP_FLUSH			18
diff --git a/arch/x86/kvm/vmx/tdx_ops.h b/arch/x86/kvm/vmx/tdx_ops.h
index afc85e7ffb8e..7293510fa2e5 100644
--- a/arch/x86/kvm/vmx/tdx_ops.h
+++ b/arch/x86/kvm/vmx/tdx_ops.h
@@ -183,6 +183,13 @@ static inline u64 tdh_mng_rd(hpa_t tdr, u64 field, struct tdx_module_args *out)
 	return tdx_seamcall(TDH_MNG_RD, tdr, field, 0, 0, out);
 }
 
+static inline u64 tdh_mem_page_demote(hpa_t tdr, gpa_t gpa, int level, hpa_t page,
+				       struct tdx_module_args *out)
+{
+	tdx_clflush_page(page, PG_LEVEL_4K);
+	return tdx_seamcall_sept(TDH_MEM_PAGE_DEMOTE, gpa | level, tdr, page, 0, out);
+}
+
 static inline u64 tdh_mr_extend(hpa_t tdr, gpa_t gpa,
 				struct tdx_module_args *out)
 {
-- 
2.25.1