Subject: Re: [PATCH v3 10/21] x86/virt/tdx: Add placeholder to coveret all system RAM as TDX memory
From: Kai Huang
To: Dave Hansen, linux-kernel@vger.kernel.org, kvm@vger.kernel.org
Cc: seanjc@google.com, pbonzini@redhat.com, len.brown@intel.com,
    tony.luck@intel.com, rafael.j.wysocki@intel.com,
    reinette.chatre@intel.com, dan.j.williams@intel.com,
    peterz@infradead.org, ak@linux.intel.com,
    kirill.shutemov@linux.intel.com,
    sathyanarayanan.kuppuswamy@linux.intel.com, isaku.yamahata@intel.com
Date: Thu, 28 Apr 2022 13:35:41 +1200
In-Reply-To: <1624e839-81e5-7bc7-533b-c5c838d35f47@intel.com>
References: <6230ef28be8c360ab326c8f592acf1964ac065c1.1649219184.git.kai.huang@intel.com>
    <228cfa7e5326fa378c1dde2b5e9022146f97b706.camel@intel.com>
    <1624e839-81e5-7bc7-533b-c5c838d35f47@intel.com>
X-Mailing-List: linux-kernel@vger.kernel.org

On Wed, 2022-04-27 at 18:07 -0700, Dave Hansen wrote:
> On 4/27/22 17:53, Kai Huang wrote:
> > On Wed, 2022-04-27 at 15:24 -0700, Dave Hansen wrote:
> > > On 4/5/22 21:49, Kai Huang wrote:
> > > > TDX provides increased levels of memory confidentiality
> > > > and integrity.
> > > > This requires special hardware support for features like memory
> > > > encryption and storage of memory integrity checksums. Not all memory
> > > > satisfies these requirements.
> > > >
> > > > As a result, TDX introduced the concept of a "Convertible Memory Region"
> > > > (CMR). During boot, the firmware builds a list of all of the memory
> > > > ranges which can provide the TDX security guarantees. The list of these
> > > > ranges, along with TDX module information, is available to the kernel by
> > > > querying the TDX module.
> > > >
> > > > In order to provide crypto protection to TD guests, the TDX architecture
> > >
> > > There's that "crypto protection" thing again. I'm not really a fan of
> > > the changes made to this changelog since I wrote it. :)
> >
> > Sorry about that. I'll remove "In order to provide crypto protection to TD
> > guests".
>
> Seriously, though. I took the effort to write these changelogs for you.
> They were fine. I'm not stoked about needing to proofread them again.

Yeah, that's pretty clear to me now. Thanks a lot for your time; it won't
happen again. If there's something I feel isn't right, I'll raise it rather
than quietly changing it.

>
> > > > also needs additional metadata to record things like which TD guest
> > > > "owns" a given page of memory. This metadata essentially serves as the
> > > > 'struct page' for the TDX module. The space for this metadata is not
> > > > reserved by the hardware upfront and must be allocated by the kernel
> > >
> > > ^ "up front"
> >
> > Thanks will change to "up front".
> >
> > Btw, the gmail grammar check gives me a red line if I use "up front", but it
> > doesn't complain "upfront".
>
> I'm pretty sure it's wrong. "up front" is an adverb that applies to
> "reserved". "Upfront" is an adjective and not how you used it in that
> sentence.

Thanks for explaining. Anyway, the Gmail grammar check can have bugs.
>
> > > > +	 * allocated individually within construct_tdmrs() to meet
> > > > +	 * this requirement.
> > > > +	 */
> > > > +	tdmr_array = kcalloc(tdx_sysinfo.max_tdmrs, sizeof(struct tdmr_info *),
> > > > +			GFP_KERNEL);
> > >
> > > Where, exactly is that alignment provided? A 'struct tdmr_info *' is 8
> > > bytes so a tdx_sysinfo.max_tdmrs=8 kcalloc() would only guarantee
> > > 64-byte alignment.
> >
> > The entries in the array only contain a pointer to TDMR_INFO. The actual
> > TDMR_INFO is allocated separately. The array itself is never used by TDX
> > hardware so it doesn't matter. We just need to guarantee each TDMR_INFO is
> > 512B-byte aligned.
>
> The comment was clear as mud about this. If you're going to talk about
> alignment, then do it near the allocation that guarantees the alignment,
> not in some other function near *ANOTHER* allocation.
>
> Also, considering that you're about to go allocate potentially gigabytes
> of physically contiguous memory, it seems laughable that you'd go to any
> trouble at all to allocate an array of pointers here. Why not just
>
> 	kcalloc(tdx_sysinfo.max_tdmrs, sizeof(struct tmdr_info), ...);

kmalloc() guarantees size alignment only if the size is a power of two.
TDMR_INFO (512 bytes) is itself a power of two, but
'max_tdmrs x sizeof(TDMR_INFO)' may not be. For instance, when
max_tdmrs == 3, the result is not a power of two.

Or am I wrong? I am not good at math, though.

>
> Or, heck, just vmalloc() the dang thing. Why even bother with the array
> of pointers?
>
>
> > > > +	if (!tdmr_array) {
> > > > +		ret = -ENOMEM;
> > > > +		goto out;
> > > > +	}
> > > > +
> > > > +	/* Construct TDMRs to build TDX memory */
> > > > +	ret = construct_tdmrs(tdmr_array, &tdmr_num);
> > > > +	if (ret)
> > > > +		goto out_free_tdmrs;
> > > > +
> > > >  	/*
> > > >  	 * Return -EFAULT until all steps of TDX module
> > > >  	 * initialization are done.
> > > >  	 */
> > > >  	ret = -EFAULT;
> > >
> > > There's the -EFAULT again.
> > > I'd replace these with a better error code.
> >
> > I couldn't think out a better error code. -EINVAL looks doesn't suit. -EAGAIN
> > also doesn't make sense for now since we always shutdown the TDX module in case
> > of any error so caller should never retry. I think we need some error code to
> > tell "the job isn't done yet". Perhaps -EBUSY?
>
> Is this going to retry if it sees -EFAULT or -EBUSY?

No. Currently we always shut down the module on any error, so the caller
won't be able to retry.

In the future this can be optimized: for *some* errors (e.g. -ENOMEM) we
would not shut down the module, but record an internal state when the error
happened, so the caller can retry. For now, there's no retry.

--
Thanks,
-Kai
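
As a footnote to the alignment question above, the arithmetic can be checked
with a small userspace C sketch. This is not kernel code and not part of the
patch: the helper names (is_pow2, tdmr_array_size_is_pow2, is_aligned) and the
TDMR_INFO_SIZE macro are made up for illustration, and the only inputs taken
from the thread are the 512-byte TDMR_INFO size and the statement that
kmalloc() size-aligns power-of-two sizes.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Size of one TDMR_INFO as stated in this thread (512 bytes). */
#define TDMR_INFO_SIZE 512u

/* True if x is a non-zero power of two. */
static bool is_pow2(size_t x)
{
	return x != 0 && (x & (x - 1)) == 0;
}

/*
 * True if one allocation holding 'max_tdmrs' TDMR_INFO entries has a
 * power-of-two total size, i.e. the case where size alignment applies.
 */
static bool tdmr_array_size_is_pow2(size_t max_tdmrs)
{
	return is_pow2(max_tdmrs * TDMR_INFO_SIZE);
}

/* True if 'addr' is a multiple of 'align'; 'align' must be a power of two. */
static bool is_aligned(size_t addr, size_t align)
{
	assert(is_pow2(align));
	return (addr & (align - 1)) == 0;
}
```

With max_tdmrs == 3 the total is 1536 bytes, so tdmr_array_size_is_pow2(3)
is false, while a single 512-byte entry passes the check. Separately, if the
base of a contiguous array is 512-byte aligned, every entry at
base + i * 512 is 512-byte aligned too, which is_aligned() can confirm.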