Message-ID: <695f319e637e7afb33f228a230566f0c671e3a03.camel@intel.com>
Subject: Re: [PATCH v3 12/21] x86/virt/tdx: Create TDMRs to cover all system RAM
From: Kai Huang
To: Dave Hansen, linux-kernel@vger.kernel.org, kvm@vger.kernel.org
Cc: seanjc@google.com, pbonzini@redhat.com, len.brown@intel.com,
    tony.luck@intel.com, rafael.j.wysocki@intel.com, reinette.chatre@intel.com,
    dan.j.williams@intel.com, peterz@infradead.org, ak@linux.intel.com,
    kirill.shutemov@linux.intel.com, sathyanarayanan.kuppuswamy@linux.intel.com,
    isaku.yamahata@intel.com
Date: Fri, 29 Apr 2022 19:24:40 +1200
In-Reply-To:
References: <6cc984d5c23e06c9c87b4c7342758b29f8c8c022.1649219184.git.kai.huang@intel.com>

On Thu, 2022-04-28 at 09:22 -0700, Dave Hansen wrote:
> On 4/5/22 21:49, Kai Huang wrote:
> > The kernel configures TDX usable memory regions to the TDX module via
> > an array of "TD Memory Region" (TDMR).
> 
> One bit of language that's repeated in these changelogs that I don't
> like is "configure ... to". I think that's a misuse of the word
> configure. I'd say something more like:
> 
> 	The kernel configures TDX-usable memory regions by passing an
> 	array of "TD Memory Regions" (TDMRs) to the TDX module.
> 
> Could you please take a look over this series and reword those?

Thanks, will do.

> 
> > Each TDMR entry (TDMR_INFO)
> > contains the information of the base/size of a memory region, the
> > base/size of the associated Physical Address Metadata Table (PAMT) and
> > a list of reserved areas in the region.
> > 
> > Create a number of TDMRs according to the verified e820 RAM entries.
> > As the first step only set up the base/size information for each TDMR.
> > 
> > TDMR must be 1G aligned and the size must be in 1G granularity. This
> 
>   ^ Each

OK.

> 
> > implies that one TDMR could cover multiple e820 RAM entries. If a RAM
> > entry spans the 1GB boundary and the former part is already covered by
> > the previous TDMR, just create a new TDMR for the latter part.
> > 
> > TDX only supports a limited number of TDMRs (currently 64). Abort the
> > TDMR construction process when the number of TDMRs exceeds this
> > limitation.
> 
> ... and what does this *MEAN*? Is TDX disabled? Does it throw away the
> RAM? Does it eat puppies?

How about:

	TDX only supports a limited number of TDMRs. Simply return error
	when the number of TDMRs exceeds the limitation. TDX is disabled
	in this case.

> 
> >  arch/x86/virt/vmx/tdx/tdx.c | 138 ++++++++++++++++++++++++++++++++++++
> >  1 file changed, 138 insertions(+)
> > 
> > diff --git a/arch/x86/virt/vmx/tdx/tdx.c b/arch/x86/virt/vmx/tdx/tdx.c
> > index 6b0c51aaa7f2..82534e70df96 100644
> > --- a/arch/x86/virt/vmx/tdx/tdx.c
> > +++ b/arch/x86/virt/vmx/tdx/tdx.c
> > @@ -54,6 +54,18 @@
> >  		((u32)(((_keyid_part) & 0xffffffffull) + 1))
> >  #define TDX_KEYID_NUM(_keyid_part)	((u32)((_keyid_part) >> 32))
> >  
> > +/* TDMR must be 1gb aligned */
> > +#define TDMR_ALIGNMENT		BIT_ULL(30)
> > +#define TDMR_PFN_ALIGNMENT	(TDMR_ALIGNMENT >> PAGE_SHIFT)
> > +
> > +/* Align up and down the address to TDMR boundary */
> > +#define TDMR_ALIGN_DOWN(_addr)	ALIGN_DOWN((_addr), TDMR_ALIGNMENT)
> > +#define TDMR_ALIGN_UP(_addr)	ALIGN((_addr), TDMR_ALIGNMENT)
> > +
> > +/* TDMR's start and end address */
> > +#define TDMR_START(_tdmr)	((_tdmr)->base)
> > +#define TDMR_END(_tdmr)		((_tdmr)->base + (_tdmr)->size)
> 
> Make these 'static inline's please. #defines are only for constants or
> things that can't use real functions.

OK.
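For instance, something like below (only a rough sketch of the direction,
assuming 'base' and 'size' in struct tdmr_info are u64 as used in this patch;
not the final code):

	/* Sketch: 'static inline' replacements for TDMR_START()/TDMR_END() */
	static inline u64 tdmr_start(struct tdmr_info *tdmr)
	{
		return tdmr->base;
	}

	static inline u64 tdmr_end(struct tdmr_info *tdmr)
	{
		return tdmr->base + tdmr->size;
	}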
> 
> >  /*
> >   * TDX module status during initialization
> >   */
> > @@ -813,6 +825,44 @@ static int e820_check_against_cmrs(void)
> >  	return 0;
> >  }
> >  
> > +/* The starting offset of reserved areas within TDMR_INFO */
> > +#define TDMR_RSVD_START		64
> 
>                              ^ extra whitespace

Will remove.

> 
> > +static struct tdmr_info *__alloc_tdmr(void)
> > +{
> > +	int tdmr_sz;
> > +
> > +	/*
> > +	 * TDMR_INFO's actual size depends on maximum number of reserved
> > +	 * areas that one TDMR supports.
> > +	 */
> > +	tdmr_sz = TDMR_RSVD_START + tdx_sysinfo.max_reserved_per_tdmr *
> > +		  sizeof(struct tdmr_reserved_area);
> 
> You have a structure for this. I know this because it's the return type
> of the function. You have TDMR_RSVD_START available via the structure
> itself. So, derive that 64 either via:
> 
> 	sizeof(struct tdmr_info)
> 
> or,
> 
> 	offsetof(struct tdmr_info, reserved_areas);
> 
> Which would make things look like this:
> 
> 	tdmr_base_sz = sizeof(struct tdmr_info);
> 	tdmr_reserved_area_sz = sizeof(struct tdmr_reserved_area) *
> 				tdx_sysinfo.max_reserved_per_tdmr;
> 
> 	tdmr_sz = tdmr_base_sz + tdmr_reserved_area_sz;
> 
> Could you explain why on earth you felt the need for the TDMR_RSVD_START
> #define?

Will use sizeof(struct tdmr_info). Thanks for the tip.

> 
> > +	/*
> > +	 * TDX requires TDMR_INFO to be 512 aligned. Always align up
> 
> Again, 512 what? 512 pages? 512 hippos?

Will change to 512-byte aligned.

> 
> > +	 * TDMR_INFO size to 512 so the memory allocated via kzalloc()
> > +	 * can meet the alignment requirement.
> > +	 */
> > +	tdmr_sz = ALIGN(tdmr_sz, TDMR_INFO_ALIGNMENT);
> > +
> > +	return kzalloc(tdmr_sz, GFP_KERNEL);
> > +}
> > +
> > +/* Create a new TDMR at given index in the TDMR array */
> > +static struct tdmr_info *alloc_tdmr(struct tdmr_info **tdmr_array, int idx)
> > +{
> > +	struct tdmr_info *tdmr;
> > +
> > +	if (WARN_ON_ONCE(tdmr_array[idx]))
> > +		return NULL;
> > +
> > +	tdmr = __alloc_tdmr();
> > +	tdmr_array[idx] = tdmr;
> > +
> > +	return tdmr;
> > +}
> > +
> >  static void free_tdmrs(struct tdmr_info **tdmr_array, int tdmr_num)
> >  {
> >  	int i;
> > @@ -826,6 +876,89 @@ static void free_tdmrs(struct tdmr_info **tdmr_array, int tdmr_num)
> >  	}
> >  }
> >  
> > +/*
> > + * Create TDMRs to cover all RAM entries in e820_table. The created
> > + * TDMRs are saved to @tdmr_array and @tdmr_num is set to the actual
> > + * number of TDMRs. All entries in @tdmr_array must be initially NULL.
> > + */
> > +static int create_tdmrs(struct tdmr_info **tdmr_array, int *tdmr_num)
> > +{
> > +	struct tdmr_info *tdmr;
> > +	u64 start, end;
> > +	int i, tdmr_idx;
> > +	int ret = 0;
> > +
> > +	tdmr_idx = 0;
> > +	tdmr = alloc_tdmr(tdmr_array, 0);
> > +	if (!tdmr)
> > +		return -ENOMEM;
> > +	/*
> > +	 * Loop over all RAM entries in e820 and create TDMRs to cover
> > +	 * them. To keep it simple, always try to use one TDMR to cover
> > +	 * one RAM entry.
> > +	 */
> > +	e820_for_each_mem(i, start, end) {
> > +		start = TDMR_ALIGN_DOWN(start);
> > +		end = TDMR_ALIGN_UP(end);
>                       ^ vertically align those ='s, please.

OK.

> 
> > +
> > +		/*
> > +		 * If the current TDMR's size hasn't been initialized, it
> > +		 * is a new allocated TDMR to cover the new RAM entry.
> > +		 * Otherwise the current TDMR already covers the previous
> > +		 * RAM entry. In the latter case, check whether the
> > +		 * current RAM entry has been fully or partially covered
> > +		 * by the current TDMR, since TDMR is 1G aligned.
> > +		 */
> > +		if (tdmr->size) {
> > +			/*
> > +			 * Loop to next RAM entry if the current entry
> > +			 * is already fully covered by the current TDMR.
> > +			 */
> > +			if (end <= TDMR_END(tdmr))
> > +				continue;
> 
> This loop is actually pretty well commented and looks OK. The
> TDMR_END() construct even adds to readability. *BUT*, the
> > +			/*
> > +			 * If part of current RAM entry has already been
> > +			 * covered by current TDMR, skip the already
> > +			 * covered part.
> > +			 */
> > +			if (start < TDMR_END(tdmr))
> > +				start = TDMR_END(tdmr);
> > +
> > +			/*
> > +			 * Create a new TDMR to cover the current RAM
> > +			 * entry, or the remaining part of it.
> > +			 */
> > +			tdmr_idx++;
> > +			if (tdmr_idx >= tdx_sysinfo.max_tdmrs) {
> > +				ret = -E2BIG;
> > +				goto err;
> > +			}
> > +			tdmr = alloc_tdmr(tdmr_array, tdmr_idx);
> > +			if (!tdmr) {
> > +				ret = -ENOMEM;
> > +				goto err;
> > +			}
> 
> This is a bit verbose for this loop. Why not just hide the 'max_tdmrs'
> inside the alloc_tdmr() function? That will make this loop smaller and
> easier to read.

Based on the suggestion, I'll change to use alloc_pages_exact() to allocate
those TDMRs at once, so no need to allocate for each TDMR again here. I'll
remove the alloc_tdmr() but keep the max_tdmrs check here.
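Roughly like below (just a sketch to show the idea; tdmr_info_size() here is
a placeholder for the per-TDMR size derived from sizeof(struct tdmr_info),
tdx_sysinfo.max_reserved_per_tdmr and TDMR_INFO_ALIGNMENT as discussed above,
and the exact shape of the final code may differ):

	/*
	 * Sketch: allocate the whole TDMR array in one contiguous chunk
	 * instead of one kzalloc() per TDMR.
	 */
	tdmr_array = alloc_pages_exact(tdx_sysinfo.max_tdmrs * tdmr_info_size(),
				       GFP_KERNEL | __GFP_ZERO);
	if (!tdmr_array)
		return -ENOMEM;

	/* ... and on the teardown path: */
	free_pages_exact(tdmr_array, tdx_sysinfo.max_tdmrs * tdmr_info_size());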
-- 
Thanks,
-Kai