Received: by 2002:a05:6358:d09b:b0:dc:cd0c:909e with SMTP id jc27csp8043887rwb; Wed, 23 Nov 2022 14:41:49 -0800 (PST) X-Google-Smtp-Source: AA0mqf7larZvNRnIPaYNBLrePRIYknOPA2RQBcfsXqcD6GGJID1Vzl9OpaApGSTQtPdIEE/vm5n/ X-Received: by 2002:a17:906:3396:b0:7b9:78c4:24dc with SMTP id v22-20020a170906339600b007b978c424dcmr5363567eja.754.1669243309434; Wed, 23 Nov 2022 14:41:49 -0800 (PST) ARC-Seal: i=1; a=rsa-sha256; t=1669243309; cv=none; d=google.com; s=arc-20160816; b=iztJa3Zb5hVRNDJziKaN+H3AbggOm1/63V81WBkzQ1rAm14l4X+++Wivad3A4+1ffs Fp37oURx4l/ETIrMnVNTgcfrUR+1nNwnHZshPEyVBOjUFMD1Kaksgb6uU6hrS1Cj/otE +mNKuhvkgX/XbpIi9W7iuLynPBMrVnPWxhKApA5xoIVvPljOMMPFXNrKcIXLxi9W57w9 01Zft/lDuJSPnV+uY4a8A+E+BUj+cQ8/9mHjd4OsiMd3E1Vow93i0M7hdcwYio4NIk/0 n9PsDcFmnvOAjqvivfYccbJT0tWAOlHe6PV1i1sd2hbLBQzeGBUMhNWOPCzNN0cGGRXf b++A== ARC-Message-Signature: i=1; a=rsa-sha256; c=relaxed/relaxed; d=google.com; s=arc-20160816; h=list-id:precedence:content-transfer-encoding:in-reply-to:from :references:cc:to:content-language:subject:user-agent:mime-version :date:message-id:dkim-signature; bh=P+DwiY+Gu9p8wZk5FTBhsw90xrj3qSer0dGkomiBslI=; b=u50tfjgINodEhl5a8RacNzJjBCIu4+EWjnZ3ML72kmSYbt3nXvWme3O78R8LEVUb9b XxlFAitxAZLDLGWDBw59u3H0s9u3+WNIF5Tkc6yh3oVXY9phneC8b5IN8DZe2yAbUJEn fqWywrY13mXpHGne1Mmx2uqgy6HyfaOH2XR2Kchz0nO79CV4nxu5V28/OkA9zU44sfwN gMndl1qNr4oprqME/DWjdH1SXD5dIGQUm83U32Lk1Zdi5JsqwGzsnCW/X0C7n4AxcWdh c0Mza/obS1SrbOvAZNSiwPHw9+6m7PNC1dtkb2l9ygny/Yrm1lW5JMatSVfmaGb2wnbR SmAw== ARC-Authentication-Results: i=1; mx.google.com; dkim=pass header.i=@intel.com header.s=Intel header.b=NF1T7D5A; spf=pass (google.com: domain of linux-kernel-owner@vger.kernel.org designates 2620:137:e000::1:20 as permitted sender) smtp.mailfrom=linux-kernel-owner@vger.kernel.org; dmarc=pass (p=NONE sp=NONE dis=NONE) header.from=intel.com Return-Path: Received: from out1.vger.email (out1.vger.email. [2620:137:e000::1:20]) by mx.google.com with ESMTP id oz36-20020a1709077da400b007b9eeb4879asi286414ejc.419.2022.11.23.14.41.26; Wed, 23 Nov 2022 14:41:49 -0800 (PST) Received-SPF: pass (google.com: domain of linux-kernel-owner@vger.kernel.org designates 2620:137:e000::1:20 as permitted sender) client-ip=2620:137:e000::1:20; Authentication-Results: mx.google.com; dkim=pass header.i=@intel.com header.s=Intel header.b=NF1T7D5A; spf=pass (google.com: domain of linux-kernel-owner@vger.kernel.org designates 2620:137:e000::1:20 as permitted sender) smtp.mailfrom=linux-kernel-owner@vger.kernel.org; dmarc=pass (p=NONE sp=NONE dis=NONE) header.from=intel.com Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand id S229536AbiKWWRY (ORCPT + 88 others); Wed, 23 Nov 2022 17:17:24 -0500 Received: from lindbergh.monkeyblade.net ([23.128.96.19]:46738 "EHLO lindbergh.monkeyblade.net" rhost-flags-OK-OK-OK-OK) by vger.kernel.org with ESMTP id S229457AbiKWWRW (ORCPT ); Wed, 23 Nov 2022 17:17:22 -0500 Received: from mga01.intel.com (mga01.intel.com [192.55.52.88]) by lindbergh.monkeyblade.net (Postfix) with ESMTPS id B247213D4B; Wed, 23 Nov 2022 14:17:21 -0800 (PST) DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/simple; d=intel.com; i=@intel.com; q=dns/txt; s=Intel; t=1669241841; x=1700777841; h=message-id:date:mime-version:subject:to:cc:references: from:in-reply-to:content-transfer-encoding; bh=I/BlkC1P7hgj991AnYqsD9bDqASSBCtl+waBVuIqmhQ=; b=NF1T7D5A4MeaiUo6VJttbUN8fyQ82MLlzevZtWb3GeTr8Uy6cpgvOuni +kOV/RZ89LN1J9uRxveY/jR2528BPVr5afDqy8KHC/uzxReV+O8LpdWST k/2+xB25eyOq7CmwGHZ83VZ4wt+TPHkkBs4aRU8sp8DvBBXiDSxWUvQHW iS81qNOUlhA61bmNRWAxLdr7CjxldCOATkNjbJHlbaL3tN4cJYH21pxMW oRQ/G23YJI0/rrMaaNkggUNM+eIxnjBPVBvJK9AhitCauYSFXVSbGYXi5 LPIF/VdzjEb8tqSDZOj3EKtriM4ZNe73iVPBNG+CjgmwNGgGWC5jX0Kfj Q==; X-IronPort-AV: E=McAfee;i="6500,9779,10540"; a="341055931" X-IronPort-AV: E=Sophos;i="5.96,187,1665471600"; d="scan'208";a="341055931" Received: from orsmga003.jf.intel.com ([10.7.209.27]) by fmsmga101.fm.intel.com with ESMTP/TLS/ECDHE-RSA-AES256-GCM-SHA384; 23 Nov 2022 14:17:21 -0800 X-IronPort-AV: E=McAfee;i="6500,9779,10540"; a="592677293" X-IronPort-AV: E=Sophos;i="5.96,187,1665471600"; d="scan'208";a="592677293" Received: from vcbudden-mobl3.amr.corp.intel.com (HELO [10.212.129.67]) ([10.212.129.67]) by orsmga003-auth.jf.intel.com with ESMTP/TLS/ECDHE-RSA-AES256-GCM-SHA384; 23 Nov 2022 14:17:20 -0800 Message-ID: <6d4d429a-ade2-771d-0e4c-788bef45041a@intel.com> Date: Wed, 23 Nov 2022 14:17:19 -0800 MIME-Version: 1.0 User-Agent: Mozilla/5.0 (X11; Linux x86_64; rv:102.0) Gecko/20100101 Thunderbird/102.2.2 Subject: Re: [PATCH v7 11/20] x86/virt/tdx: Add placeholder to construct TDMRs to cover all TDX memory regions Content-Language: en-US To: Kai Huang , linux-kernel@vger.kernel.org, kvm@vger.kernel.org Cc: linux-mm@kvack.org, seanjc@google.com, pbonzini@redhat.com, dan.j.williams@intel.com, rafael.j.wysocki@intel.com, kirill.shutemov@linux.intel.com, ying.huang@intel.com, reinette.chatre@intel.com, len.brown@intel.com, tony.luck@intel.com, peterz@infradead.org, ak@linux.intel.com, isaku.yamahata@intel.com, chao.gao@intel.com, sathyanarayanan.kuppuswamy@linux.intel.com, bagasdotme@gmail.com, sagis@google.com, imammedo@redhat.com References: <32c1968fe34c8cf3cb834e3a9966cd2a201efc5b.1668988357.git.kai.huang@intel.com> From: Dave Hansen In-Reply-To: <32c1968fe34c8cf3cb834e3a9966cd2a201efc5b.1668988357.git.kai.huang@intel.com> Content-Type: text/plain; charset=UTF-8 Content-Transfer-Encoding: 7bit X-Spam-Status: No, score=-7.1 required=5.0 tests=BAYES_00,DKIMWL_WL_HIGH, DKIM_SIGNED,DKIM_VALID,DKIM_VALID_AU,DKIM_VALID_EF,NICE_REPLY_A, RCVD_IN_DNSWL_HI,RCVD_IN_MSPIKE_H3,RCVD_IN_MSPIKE_WL,SPF_HELO_NONE, SPF_NONE autolearn=ham autolearn_force=no version=3.4.6 X-Spam-Checker-Version: SpamAssassin 3.4.6 (2021-04-09) on lindbergh.monkeyblade.net Precedence: bulk List-ID: X-Mailing-List: linux-kernel@vger.kernel.org On 11/20/22 16:26, Kai Huang wrote: > TDX provides increased levels of memory confidentiality and integrity. > This requires special hardware support for features like memory > encryption and storage of memory integrity checksums. Not all memory > satisfies these requirements. > > As a result, the TDX introduced the concept of a "Convertible Memory s/the TDX introduced/TDX introduces/ > Region" (CMR). During boot, the firmware builds a list of all of the > memory ranges which can provide the TDX security guarantees. The list > of these ranges is available to the kernel by querying the TDX module. > > The TDX architecture needs additional metadata to record things like > which TD guest "owns" a given page of memory. This metadata essentially > serves as the 'struct page' for the TDX module. The space for this > metadata is not reserved by the hardware up front and must be allocated > by the kernel and given to the TDX module. > > Since this metadata consumes space, the VMM can choose whether or not to > allocate it for a given area of convertible memory. If it chooses not > to, the memory cannot receive TDX protections and can not be used by TDX > guests as private memory. > > For every memory region that the VMM wants to use as TDX memory, it sets > up a "TD Memory Region" (TDMR). Each TDMR represents a physically > contiguous convertible range and must also have its own physically > contiguous metadata table, referred to as a Physical Address Metadata > Table (PAMT), to track status for each page in the TDMR range. > > Unlike a CMR, each TDMR requires 1G granularity and alignment. To > support physical RAM areas that don't meet those strict requirements, > each TDMR permits a number of internal "reserved areas" which can be > placed over memory holes. If PAMT metadata is placed within a TDMR it > must be covered by one of these reserved areas. > > Let's summarize the concepts: > > CMR - Firmware-enumerated physical ranges that support TDX. CMRs are > 4K aligned. > TDMR - Physical address range which is chosen by the kernel to support > TDX. 1G granularity and alignment required. Each TDMR has > reserved areas where TDX memory holes and overlapping PAMTs can > be put into. s/put into/represented/ > PAMT - Physically contiguous TDX metadata. One table for each page size > per TDMR. Roughly 1/256th of TDMR in size. 256G TDMR = ~1G > PAMT. > > As one step of initializing the TDX module, the kernel configures > TDX-usable memory regions by passing an array of TDMRs to the TDX module. > > Constructing the array of TDMRs consists below steps: > > 1) Create TDMRs to cover all memory regions that the TDX module can use; Slight tweak: 1) Create TDMRs to cover all memory regions that the TDX module will use for TD memory The TDX module "uses" more memory than strictly the TMDR's. > 2) Allocate and set up PAMT for each TDMR; > 3) Set up reserved areas for each TDMR. s/Set up/Designate/ > Add a placeholder to construct TDMRs to do the above steps after all > TDX memory regions are verified to be truly convertible. Always free > TDMRs at the end of the initialization (no matter successful or not) > as TDMRs are only used during the initialization. The changelog here actually looks really good to me so far. > diff --git a/arch/x86/virt/vmx/tdx/tdx.c b/arch/x86/virt/vmx/tdx/tdx.c > index 32af86e31c47..26048c6b0170 100644 > --- a/arch/x86/virt/vmx/tdx/tdx.c > +++ b/arch/x86/virt/vmx/tdx/tdx.c > @@ -445,6 +445,63 @@ static int build_tdx_memory(void) > return ret; > } > > +/* Calculate the actual TDMR_INFO size */ > +static inline int cal_tdmr_size(void) I think we can spare the bytes to add "culate" in the function name so we don't think these are California TDMRs. > +{ > + int tdmr_sz; > + > + /* > + * The actual size of TDMR_INFO depends on the maximum number > + * of reserved areas. > + * > + * Note: for TDX1.0 the max_reserved_per_tdmr is 16, and > + * TDMR_INFO size is aligned up to 512-byte. Even it is > + * extended in the future, it would be insane if TDMR_INFO > + * becomes larger than 4K. The tdmr_sz here should never > + * overflow. > + */ > + tdmr_sz = sizeof(struct tdmr_info); > + tdmr_sz += sizeof(struct tdmr_reserved_area) * > + tdx_sysinfo.max_reserved_per_tdmr; First, I think 'tdx_sysinfo' should probably be a local variable in init_tdx_module() and have its address passed in here. Having global variables always makes it more opaque about who is initializing it. Second, if this code is making assumptions about 'max_reserved_per_tdmr', then let's actually add assertions or sanity checks. For instance: if (tdx_sysinfo.max_reserved_per_tdmr > MAX_TDMRS) return -1; or even: if (tdmr_sz > PAGE_SIZE) return -1; It does almost no good to just assert what the limits are in a comment. > + /* > + * TDX requires each TDMR_INFO to be 512-byte aligned. Always > + * round up TDMR_INFO size to the 512-byte boundary. > + */ More silly comments. The place to document this is TDMR_INFO_ALIGNMENT. If anyone wants to know what the alignment is, exactly, they can look at the definition. They don't need to be told *TWICE* what TDMR_INFO_ALIGNMENT #defines to in one comment. > + return ALIGN(tdmr_sz, TDMR_INFO_ALIGNMENT); > +} > + > +static struct tdmr_info *alloc_tdmr_array(int *array_sz) > +{ > + /* > + * TDX requires each TDMR_INFO to be 512-byte aligned. > + * Use alloc_pages_exact() to allocate all TDMRs at once. > + * Each TDMR_INFO will still be 512-byte aligned since > + * cal_tdmr_size() always returns 512-byte aligned size. > + */ OK, I think you're just trolling me now. Two *MORE* mentions of the 512-byte alignment? > + *array_sz = cal_tdmr_size() * tdx_sysinfo.max_tdmrs; > + > + /* > + * Zero the buffer so 'struct tdmr_info::size' can be > + * used to determine whether a TDMR is valid. > + * > + * Note: for TDX1.0 the max_tdmrs is 64 and TDMR_INFO size > + * is 512-byte. Even they are extended in the future, it > + * would be insane if the total size exceeds 4MB. > + */ > + return alloc_pages_exact(*array_sz, GFP_KERNEL | __GFP_ZERO); > +} This looks massively over complicated. Get rid of this function entirely. Then create: static int tdmr_array_size(void) { return tdmr_size_single() * tdx_sysinfo.max_tdmrs; } The *caller* can do: tdmr_array = alloc_pages_exact(tdmr_array_size(), GFP_KERNEL | __GFP_ZERO); if (!tdmr_array) { ... Then the error path is: free_pages_exact(tdmr_array, tdmr_array_size()); Then, there are no size pointers going back and forth. Easy peasy. I'm OK with a little arithmetic being repeated. > +/* > + * Construct an array of TDMRs to cover all TDX memory ranges. > + * The actual number of TDMRs is kept to @tdmr_num. > + */ > +static int construct_tdmrs(struct tdmr_info *tdmr_array, int *tdmr_num) > +{ > + /* Return -EINVAL until constructing TDMRs is done */ > + return -EINVAL; > +} > + > /* > * Detect and initialize the TDX module. > * > @@ -454,6 +511,9 @@ static int build_tdx_memory(void) > */ > static int init_tdx_module(void) > { > + struct tdmr_info *tdmr_array; > + int tdmr_array_sz; > + int tdmr_num; I tend to write these like: "tdmr_num" is the number of *a* TDMR. "nr_tdmrs" is the number of TDMRs. > int ret; > > /* > @@ -506,11 +566,34 @@ static int init_tdx_module(void) > ret = build_tdx_memory(); > if (ret) > goto out; > + > + /* Prepare enough space to construct TDMRs */ > + tdmr_array = alloc_tdmr_array(&tdmr_array_sz); > + if (!tdmr_array) { > + ret = -ENOMEM; > + goto out_free_tdx_mem; > + } > + > + /* Construct TDMRs to cover all TDX memory ranges */ > + ret = construct_tdmrs(tdmr_array, &tdmr_num); > + if (ret) > + goto out_free_tdmrs; > + > /* > * Return -EINVAL until all steps of TDX module initialization > * process are done. > */ > ret = -EINVAL; > +out_free_tdmrs: > + /* > + * The array of TDMRs is freed no matter the initialization is > + * successful or not. They are not needed anymore after the > + * module initialization. > + */ > + free_pages_exact(tdmr_array, tdmr_array_sz); > +out_free_tdx_mem: > + if (ret) > + free_tdx_memory(); > out: > /* > * Memory hotplug checks the hot-added memory region against the > diff --git a/arch/x86/virt/vmx/tdx/tdx.h b/arch/x86/virt/vmx/tdx/tdx.h > index 8e273756098c..a737f2b51474 100644 > --- a/arch/x86/virt/vmx/tdx/tdx.h > +++ b/arch/x86/virt/vmx/tdx/tdx.h > @@ -80,6 +80,29 @@ struct tdsysinfo_struct { > }; > } __packed __aligned(TDSYSINFO_STRUCT_ALIGNMENT); > > +struct tdmr_reserved_area { > + u64 offset; > + u64 size; > +} __packed; > + > +#define TDMR_INFO_ALIGNMENT 512 > + > +struct tdmr_info { > + u64 base; > + u64 size; > + u64 pamt_1g_base; > + u64 pamt_1g_size; > + u64 pamt_2m_base; > + u64 pamt_2m_size; > + u64 pamt_4k_base; > + u64 pamt_4k_size; > + /* > + * Actual number of reserved areas depends on > + * 'struct tdsysinfo_struct'::max_reserved_per_tdmr. > + */ > + struct tdmr_reserved_area reserved_areas[0]; > +} __packed __aligned(TDMR_INFO_ALIGNMENT); > + > /* > * Do not put any hardware-defined TDX structure representations below > * this comment!