Message-ID: <77187035-52c0-77b4-7c9e-d09ece77e0ce@linux.intel.com>
Date: Thu, 8 Jun 2023 21:01:11 -0700
Subject: Re: [PATCH v11 11/20] x86/virt/tdx: Fill out TDMRs to cover all TDX memory regions
From: Sathyanarayanan Kuppuswamy
To: Kai Huang, linux-kernel@vger.kernel.org, kvm@vger.kernel.org
Cc: linux-mm@kvack.org, dave.hansen@intel.com, kirill.shutemov@linux.intel.com,
 tony.luck@intel.com, peterz@infradead.org, tglx@linutronix.de,
 seanjc@google.com, pbonzini@redhat.com, david@redhat.com,
 dan.j.williams@intel.com, rafael.j.wysocki@intel.com, ying.huang@intel.com,
 reinette.chatre@intel.com, len.brown@intel.com, ak@linux.intel.com,
 isaku.yamahata@intel.com, chao.gao@intel.com, bagasdotme@gmail.com,
 sagis@google.com, imammedo@redhat.com
References: <927ec9871721d2a50f1aba7d1cf7c3be50e4f49b.1685887183.git.kai.huang@intel.com>
In-Reply-To: <927ec9871721d2a50f1aba7d1cf7c3be50e4f49b.1685887183.git.kai.huang@intel.com>
On 6/4/23 7:27 AM, Kai Huang wrote:
> Start to transition through the "multi-steps" of constructing a list
> of "TD Memory Regions" (TDMRs) to cover all TDX-usable memory regions.
>
> The kernel configures TDX-usable memory regions by passing a list of
> "TD Memory Regions" (TDMRs) to the TDX module. Each TDMR contains the
> base/size of a memory region, the base/size of the associated Physical
> Address Metadata Table (PAMT), and a list of reserved areas in the
> region.
>
> Do the first step to fill out a number of TDMRs to cover all TDX
> memory regions. To keep it simple, always try to use one TDMR for each
> memory region. As the first step only set up the base/size for each
> TDMR.

As a first step?

> Each TDMR must be 1G aligned and the size must be in 1G granularity.
> This implies that one TDMR could cover multiple memory regions. If a
> memory region spans the 1GB boundary and the former part is already
> covered by the previous TDMR, just use a new TDMR for the remaining
> part.
>
> TDX only supports a limited number of TDMRs. Disable TDX if all TDMRs
> are consumed but there are more memory regions to cover.
>
> There are fancier things that could be done, like trying to merge
> adjacent TDMRs. This would allow more pathological memory layouts to
> be supported. But current systems are not even close to exhausting
> the existing TDMR resources in practice. For now, keep it simple.
>
> Signed-off-by: Kai Huang
> ---
>
> v10 -> v11:
>  - No update
>
> v9 -> v10:
>  - No change.
>
> v8 -> v9:
>  - Added the last paragraph in the changelog (Dave).
>  - Removed unnecessary type cast in tdmr_entry() (Dave).
>
> ---
>  arch/x86/virt/vmx/tdx/tdx.c | 94 ++++++++++++++++++++++++++++++++++++-
>  1 file changed, 93 insertions(+), 1 deletion(-)
>
> diff --git a/arch/x86/virt/vmx/tdx/tdx.c b/arch/x86/virt/vmx/tdx/tdx.c
> index 7a20c72361e7..fa9fa8bc581a 100644
> --- a/arch/x86/virt/vmx/tdx/tdx.c
> +++ b/arch/x86/virt/vmx/tdx/tdx.c
> @@ -385,6 +385,93 @@ static void free_tdmr_list(struct tdmr_info_list *tdmr_list)
>  		tdmr_list->max_tdmrs * tdmr_list->tdmr_sz);
>  }
>
> +/* Get the TDMR from the list at the given index. */
> +static struct tdmr_info *tdmr_entry(struct tdmr_info_list *tdmr_list,
> +				    int idx)
> +{
> +	int tdmr_info_offset = tdmr_list->tdmr_sz * idx;
> +
> +	return (void *)tdmr_list->tdmrs + tdmr_info_offset;
> +}
> +
> +#define TDMR_ALIGNMENT		BIT_ULL(30)
> +#define TDMR_PFN_ALIGNMENT	(TDMR_ALIGNMENT >> PAGE_SHIFT)

TDMR_PFN_ALIGNMENT is never used. Maybe you can drop it from this patch.

> +#define TDMR_ALIGN_DOWN(_addr)	ALIGN_DOWN((_addr), TDMR_ALIGNMENT)
> +#define TDMR_ALIGN_UP(_addr)	ALIGN((_addr), TDMR_ALIGNMENT)
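[Editor's aside, not part of the patch: the 1G rounding these macros do can
be tried out in a few lines of userspace C. ALIGN_DOWN()/ALIGN() are
reimplemented below as local power-of-two helpers, and the example region
is invented: a region running from just under 2G to just past 6G gets
widened to the enclosing 1G boundaries, so a single 6G TDMR covers it.]

    #include <stdio.h>
    #include <stdint.h>

    /* 1G TDMR alignment, matching BIT_ULL(30) in the patch */
    #define TDMR_ALIGNMENT		(1ULL << 30)

    /* Local stand-ins for the kernel's power-of-two align helpers */
    #define ALIGN_DOWN_POW2(x, a)	((x) & ~((uint64_t)(a) - 1))
    #define ALIGN_UP_POW2(x, a)	ALIGN_DOWN_POW2((x) + (a) - 1, (a))

    int main(void)
    {
    	/* Made-up region: starts just under 2G, ends just past 6G */
    	uint64_t start = 0x7fe00000ULL;
    	uint64_t end   = 0x180200000ULL;

    	/* TDMR base rounds down, TDMR end rounds up, to 1G boundaries */
    	uint64_t tdmr_base = ALIGN_DOWN_POW2(start, TDMR_ALIGNMENT);
    	uint64_t tdmr_end  = ALIGN_UP_POW2(end, TDMR_ALIGNMENT);

    	/* Prints: base=0x40000000 end=0x1c0000000 size=6G */
    	printf("base=%#llx end=%#llx size=%lluG\n",
    	       (unsigned long long)tdmr_base,
    	       (unsigned long long)tdmr_end,
    	       (unsigned long long)((tdmr_end - tdmr_base) >> 30));
    	return 0;
    }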
> +
> +static inline u64 tdmr_end(struct tdmr_info *tdmr)
> +{
> +	return tdmr->base + tdmr->size;
> +}
> +
> +/*
> + * Take the memory referenced in @tmb_list and populate the
> + * preallocated @tdmr_list, following all the special alignment
> + * and size rules for TDMR.
> + */
> +static int fill_out_tdmrs(struct list_head *tmb_list,
> +			  struct tdmr_info_list *tdmr_list)
> +{
> +	struct tdx_memblock *tmb;
> +	int tdmr_idx = 0;
> +
> +	/*
> +	 * Loop over TDX memory regions and fill out TDMRs to cover them.
> +	 * To keep it simple, always try to use one TDMR to cover one
> +	 * memory region.
> +	 *
> +	 * In practice TDX1.0 supports 64 TDMRs, which is big enough to
> +	 * cover all memory regions in reality if the admin doesn't use
> +	 * 'memmap' to create a bunch of discrete memory regions. When
> +	 * there's a real problem, enhancement can be done to merge TDMRs
> +	 * to reduce the final number of TDMRs.
> +	 */
> +	list_for_each_entry(tmb, tmb_list, list) {
> +		struct tdmr_info *tdmr = tdmr_entry(tdmr_list, tdmr_idx);
> +		u64 start, end;
> +
> +		start = TDMR_ALIGN_DOWN(PFN_PHYS(tmb->start_pfn));
> +		end = TDMR_ALIGN_UP(PFN_PHYS(tmb->end_pfn));
> +
> +		/*
> +		 * A valid size indicates the current TDMR has already
> +		 * been filled out to cover the previous memory region(s).
> +		 */
> +		if (tdmr->size) {
> +			/*
> +			 * Loop to the next if the current memory region
> +			 * has already been fully covered.
> +			 */
> +			if (end <= tdmr_end(tdmr))
> +				continue;
> +
> +			/* Otherwise, skip the already covered part. */
> +			if (start < tdmr_end(tdmr))
> +				start = tdmr_end(tdmr);
> +
> +			/*
> +			 * Create a new TDMR to cover the current memory
> +			 * region, or the remaining part of it.
> +			 */
> +			tdmr_idx++;
> +			if (tdmr_idx >= tdmr_list->max_tdmrs) {
> +				pr_warn("initialization failed: TDMRs exhausted.\n");
> +				return -ENOSPC;
> +			}
> +
> +			tdmr = tdmr_entry(tdmr_list, tdmr_idx);
> +		}
> +
> +		tdmr->base = start;
> +		tdmr->size = end - start;
> +	}
> +
> +	/* @tdmr_idx is always the index of last valid TDMR. */
> +	tdmr_list->nr_consumed_tdmrs = tdmr_idx + 1;
> +
> +	return 0;
> +}
> +
>  /*
>   * Construct a list of TDMRs on the preallocated space in @tdmr_list
>   * to cover all TDX memory regions in @tmb_list based on the TDX module
> @@ -394,10 +481,15 @@ static int construct_tdmrs(struct list_head *tmb_list,
>  			   struct tdmr_info_list *tdmr_list,
>  			   struct tdsysinfo_struct *sysinfo)
>  {
> +	int ret;
> +
> +	ret = fill_out_tdmrs(tmb_list, tdmr_list);
> +	if (ret)
> +		return ret;
> +
>  	/*
>  	 * TODO:
>  	 *
> -	 *  - Fill out TDMRs to cover all TDX memory regions.
>  	 *  - Allocate and set up PAMTs for each TDMR.
>  	 *  - Designate reserved areas for each TDMR.
>  	 *

Rest looks good to me.

Reviewed-by: Kuppuswamy Sathyanarayanan

-- 
Sathyanarayanan Kuppuswamy
Linux Kernel Developer