From: isaku.yamahata@intel.com
To: kvm@vger.kernel.org, linux-kernel@vger.kernel.org
Cc: isaku.yamahata@intel.com, isaku.yamahata@gmail.com, Paolo Bonzini,
    erdemaktas@google.com, Sean Christopherson, Sagi Shahar, David Matlack,
    Kai Huang, Zhi Wang, chen.bo@intel.com, hang.yuan@intel.com,
    tina.zhang@intel.com
Subject: [RFC PATCH v5 00/16] KVM TDX: TDP MMU: large page support
Date: Mon, 16 Oct 2023 09:20:51 -0700

From: Isaku Yamahata

This patch series is based on "v16 KVM TDX: basic feature support". It
implements large page support for the TDP MMU by allowing a large page to be
populated and then split when necessary.
Feedback on options for merging sub-pages into a large page is welcome.

Remaining TODOs
===============
* Make the nx recovery thread use TDH.MEM.RANGE.BLOCK instead of zapping the
  EPT entry.
* Record that the entry is blocked by introducing a bit in the spte. On EPT
  violation, check whether the entry is blocked. If the EPT violation is
  caused by a blocked Secure-EPT entry, trigger the page merge logic.

Splitting large pages when necessary
====================================
* Whether a GFN is private or shared is already tracked. When that changes,
  update lpage_info to prevent a large page.
* TDX provides the page level on Secure-EPT violation. Pass the page level
  down to the lower-level functions that need it.
* Even if the page is a large page in the host, only some sub-pages may be
  mapped at the EPT level. In such cases, give up mapping a large page and
  step into the sub-page level, unlike the conventional EPT.
* When zapping an spte that maps a large page, split the page and then zap,
  unlike the conventional EPT, because otherwise the protected page contents
  would be lost.

Merging small pages into a large page if possible
=================================================
On a normal EPT violation, check whether pages can be merged into a large page
after mapping it.

TDX operation
=============
The following describes the TDX operation procedures.

* EPT violation trick
Such a trick (zapping the EPT entry to trigger an EPT violation) doesn't work
for TDX. For TDX, zapping a page loses the contents of the protected page
because the protected guest page is un-associated from the guest TD. Instead,
TDX provides a different way to trigger an EPT violation without losing the
page contents, so that the VMM can detect guest TD activity by
blocking/unblocking a Secure-EPT entry: TDH.MEM.RANGE.BLOCK and
TDH.MEM.RANGE.UNBLOCK. They correspond to clearing/setting the present bit in
an EPT entry while the page contents are kept.
With TDH.MEM.RANGE.BLOCK and a TLB shoot down, the VMM can cause the guest TD
to trigger an EPT violation. After that, the VMM can unblock the entry with
TDH.MEM.RANGE.UNBLOCK and resume guest TD execution. The procedure is as
follows.

  - Block the Secure-EPT entry by TDH.MEM.RANGE.BLOCK.
  - TLB shoot down.
  - Wait for the guest TD to trigger an EPT violation.
  - Unblock the Secure-EPT entry by TDH.MEM.RANGE.UNBLOCK to resume the guest
    TD.

* merging sub-pages into a large page
The following steps are needed.

  - Ensure that all sub-pages are mapped.
  - TLB shoot down.
  - Merge sub-pages into a large page (TDH.MEM.PAGE.PROMOTE).
    This requires that all sub-pages are mapped.
  - Cache flush the Secure-EPT page used to map the sub-pages.

Thanks,

Changes from v4:
- Rebased to v16 TDX KVM v6.6-rc2 base

Changes from v3:
- Rebased to v15 TDX KVM v6.5-rc1 base

Changes from v2:
- implemented page merging path
- rebased to TDX KVM v11

Changes from v1:
- implemented page merging path
- rebased to UPM v10
- rebased to TDX KVM v10
- rebased to kvm.git queue + v6.1-rc8

Isaku Yamahata (4):
  KVM: x86/tdp_mmu: Allocate private page table for large page split
  KVM: x86/tdp_mmu: Try to merge pages into a large page
  KVM: x86/tdp_mmu: TDX: Implement merge pages into a large page
  KVM: x86/mmu: Make kvm fault handler aware of large page of private
    memslot

Xiaoyao Li (12):
  KVM: TDP_MMU: Go to next level if smaller private mapping exists
  KVM: TDX: Pass page level to cache flush before TDX SEAMCALL
  KVM: TDX: Pass KVM page level to tdh_mem_page_add() and
    tdh_mem_page_aug()
  KVM: TDX: Pass size to tdx_measure_page()
  KVM: TDX: Pass size to reclaim_page()
  KVM: TDX: Update tdx_sept_{set,drop}_private_spte() to support large
    page
  KVM: MMU: Introduce level info in PFERR code
  KVM: TDX: Pin pages via get_page() right before ADD/AUG'ed to TDs
  KVM: TDX: Pass desired page level in err code for page fault handler
  KVM: x86/tdp_mmu: Split the large page when zap leaf
  KVM: x86/tdp_mmu, TDX: Split a large page when 4KB page within it
    converted to shared
  KVM: TDX: Allow 2MB large page for TD GUEST

 arch/x86/include/asm/kvm-x86-ops.h |   3 +
 arch/x86/include/asm/kvm_host.h    |  11 ++
 arch/x86/kvm/Kconfig               |   1 +
 arch/x86/kvm/mmu/mmu.c             |  45 +++--
 arch/x86/kvm/mmu/mmu_internal.h    |  35 +++-
 arch/x86/kvm/mmu/tdp_iter.c        |  37 +++-
 arch/x86/kvm/mmu/tdp_iter.h        |   2 +
 arch/x86/kvm/mmu/tdp_mmu.c         | 283 +++++++++++++++++++++++++++--
 arch/x86/kvm/vmx/common.h          |   6 +-
 arch/x86/kvm/vmx/tdx.c             | 234 ++++++++++++++++++------
 arch/x86/kvm/vmx/tdx_arch.h        |  21 +++
 arch/x86/kvm/vmx/tdx_errno.h       |   2 +
 arch/x86/kvm/vmx/tdx_ops.h         |  50 +++--
 arch/x86/kvm/vmx/vmx.c             |   2 +-
 14 files changed, 611 insertions(+), 121 deletions(-)

-- 
2.25.1
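
As an illustration of the block/unblock and merge procedures listed under
"TDX operation", here is a minimal user-space C sketch. It is a toy model
only: it is not kernel code and not the TDX module interface. struct
sept_entry, struct sept_range and the model_*() helpers are hypothetical
names invented for this example, and each helper just flips flags to mirror
what the text above says the corresponding operation does.

```c
#include <stdbool.h>
#include <stddef.h>

#define SUBPAGES 512 /* number of 4KB pages in one 2MB large page */

/* Hypothetical stand-in for one Secure-EPT leaf entry. */
struct sept_entry {
	bool mapped;   /* a protected page is associated with the guest TD */
	bool blocked;  /* models the effect of TDH.MEM.RANGE.BLOCK */
	int contents;  /* stands in for the protected page contents */
};

/* Hypothetical stand-in for a 2MB range backed by 4KB entries. */
struct sept_range {
	struct sept_entry sub[SUBPAGES];
	bool tlb_flushed; /* a TLB shoot down was done */
	bool large;       /* the range is now mapped as one large page */
	bool pt_flushed;  /* the Secure-EPT page that mapped sub-pages was flushed */
};

/* Conventional zap: the page is un-associated, so its contents are lost. */
void model_zap(struct sept_entry *e)
{
	e->mapped = false;
	e->contents = 0;
}

/* Models TDH.MEM.RANGE.BLOCK: like clearing the present bit, contents kept. */
void model_block(struct sept_entry *e)
{
	e->blocked = true;
}

/* Models TDH.MEM.RANGE.UNBLOCK: like setting the present bit again. */
void model_unblock(struct sept_entry *e)
{
	e->blocked = false;
}

/*
 * True if a guest access would raise an EPT violation.
 * Assumption: the TLB shoot down after blocking has already happened.
 */
bool model_access_faults(const struct sept_entry *e)
{
	return !e->mapped || e->blocked;
}

/* Walks the four merge steps; fails if any sub-page is not mapped. */
bool model_merge(struct sept_range *r)
{
	for (size_t i = 0; i < SUBPAGES; i++)
		if (!r->sub[i].mapped)
			return false;  /* step 1: all sub-pages must be mapped */

	r->tlb_flushed = true;         /* step 2: TLB shoot down */
	r->large = true;               /* step 3: models TDH.MEM.PAGE.PROMOTE */
	r->pt_flushed = true;          /* step 4: cache flush the Secure-EPT page */
	return true;
}
```

In this model, model_block() on a mapped entry makes model_access_faults()
return true while contents is untouched, whereas model_zap() also faults but
clears contents. model_merge() returns false until every entry in sub[] is
mapped.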