Received: by 2002:a05:6602:2086:0:0:0:0 with SMTP id a6csp4458477ioa; Wed, 27 Apr 2022 04:27:12 -0700 (PDT) X-Google-Smtp-Source: ABdhPJzzdOKziwbDW4oggGRpFgHdWDmBMat45e/VGn/3YYtTR4qE0e+wVirW43M7v0enIDB420ZA X-Received: by 2002:a17:90b:1c8f:b0:1b8:c6dc:ca61 with SMTP id oo15-20020a17090b1c8f00b001b8c6dcca61mr32236724pjb.13.1651058832536; Wed, 27 Apr 2022 04:27:12 -0700 (PDT) ARC-Seal: i=1; a=rsa-sha256; t=1651058832; cv=none; d=google.com; s=arc-20160816; b=HUJEhr4ul1HUzzxkc9FEU6HKJuZRpi5Tqnb4stNGAZ1k5fvVznlSJoFroxJUNhXV0Q NTcWVyy9V7PIGectFj/odmfDUSMBMnm7ZyOpEDnFG/5XhgbIKHkPR0/wmDO83a5D4FnF O2rC8giYQGEFweJbD0LR4CEonn78HIiMFsMWniX03jyz7FBiBgDrY+A+CJm9wBHY1Vh2 TbCYv1GWfyAJ++6UZX3ZAs+fL8lAA+uz/bHoHALkbpfd3T+7zfU7NcvCDVc3sRSrrWPq WDcyuY/+EiSbry49kv3GKzQHDTO7ctVXSK3fTDdH965fJXL6JOXkXpfrlzBCAs/PlD5V rDUg== ARC-Message-Signature: i=1; a=rsa-sha256; c=relaxed/relaxed; d=google.com; s=arc-20160816; h=list-id:precedence:content-transfer-encoding:in-reply-to:from :references:cc:to:content-language:subject:user-agent:mime-version :date:message-id:dkim-signature; bh=H8hSKKw6Fp039x14eYSBVEIzzNsbUVIT2aUxfRYI1gg=; b=NN49CRD9/ku6LJUKJvUmYchD/dLjx8Q8YmYksVCa74k+yC+HVSIPi0p3RDwgsvW7j1 kRcLSEXX0ZOk6xYDFAPvvyvr/STtbPUyKZ0w0CDC8WtGAVAU7ALfiKGangAlYBjtsNt2 lP7VCA8T7Ic4OD1sX5YrsBa7ggI91GLQl7yTr/XGM14RH91WaPo2NE/OtLHd/nnv9bWc bZes/9WhENNXpjBh5oDOt1j8jOXgyXor0ccF/N8xuyDpMxa7ROuR6CnZ2u+EpjtVON6F I6jmuayavfx+WZN930ILmrr+WOn6YdVgo1xvCFsd5e7lzhkEfr16fVaOZID4xCOF3uOB vSog== ARC-Authentication-Results: i=1; mx.google.com; dkim=pass header.i=@intel.com header.s=Intel header.b=KULwEPVq; spf=pass (google.com: domain of linux-kernel-owner@vger.kernel.org designates 2620:137:e000::1:18 as permitted sender) smtp.mailfrom=linux-kernel-owner@vger.kernel.org; dmarc=pass (p=NONE sp=NONE dis=NONE) header.from=intel.com Return-Path: Received: from lindbergh.monkeyblade.net (lindbergh.monkeyblade.net. [2620:137:e000::1:18]) by mx.google.com with ESMTPS id k8-20020a636f08000000b003ab0e9b4af6si1113594pgc.865.2022.04.27.04.27.12 (version=TLS1_3 cipher=TLS_AES_256_GCM_SHA384 bits=256/256); Wed, 27 Apr 2022 04:27:12 -0700 (PDT) Received-SPF: pass (google.com: domain of linux-kernel-owner@vger.kernel.org designates 2620:137:e000::1:18 as permitted sender) client-ip=2620:137:e000::1:18; Authentication-Results: mx.google.com; dkim=pass header.i=@intel.com header.s=Intel header.b=KULwEPVq; spf=pass (google.com: domain of linux-kernel-owner@vger.kernel.org designates 2620:137:e000::1:18 as permitted sender) smtp.mailfrom=linux-kernel-owner@vger.kernel.org; dmarc=pass (p=NONE sp=NONE dis=NONE) header.from=intel.com Received: from vger.kernel.org (vger.kernel.org [23.128.96.18]) by lindbergh.monkeyblade.net (Postfix) with ESMTP id 9385E37D0E9; Wed, 27 Apr 2022 03:24:31 -0700 (PDT) Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand id S1355150AbiDZUyJ (ORCPT + 99 others); Tue, 26 Apr 2022 16:54:09 -0400 Received: from lindbergh.monkeyblade.net ([23.128.96.19]:52442 "EHLO lindbergh.monkeyblade.net" rhost-flags-OK-OK-OK-OK) by vger.kernel.org with ESMTP id S1355181AbiDZUyD (ORCPT ); Tue, 26 Apr 2022 16:54:03 -0400 Received: from mga03.intel.com (mga03.intel.com [134.134.136.65]) by lindbergh.monkeyblade.net (Postfix) with ESMTPS id 3D6C83205C; Tue, 26 Apr 2022 13:50:54 -0700 (PDT) DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/simple; d=intel.com; i=@intel.com; q=dns/txt; s=Intel; t=1651006254; x=1682542254; h=message-id:date:mime-version:subject:to:cc:references: from:in-reply-to:content-transfer-encoding; bh=A7wf9ZwJ2Sqa+x3tSjwAOOXJmNTwe4K4AydPvl3jW+4=; b=KULwEPVqkjk7sGxbqtU8ewqxfIWz8ygXzgBIj7gLkQk/KEuh4VZ6V2Ae +UkjidiKfJjxkULPQFkl4K3n/1koxiuyUKcKf7/jYOmIiSfXfyGBrdbRQ ibr5ZgIgmOpFIkZXIOFrZ5pIZl+C+2huzCEAymdlwhj+A47u9QPDcZazS OK7iveT17F5YeAb27FjKGO+6xHHcvqyQ3GctKe9tvTJv3tdGO7Nu5SU1l PEL+9gev6b0oT8WRYw8qWJBwBMVPTCZOWtTL+XXMxMMcEjhsKN2KlwJPC tP+n4UKowlwHLzyezAjBSiDu9reUuxxBd2j8cMdLUxqbT07mOZyzX3R41 A==; X-IronPort-AV: E=McAfee;i="6400,9594,10329"; a="265524986" X-IronPort-AV: E=Sophos;i="5.90,292,1643702400"; d="scan'208";a="265524986" Received: from orsmga008.jf.intel.com ([10.7.209.65]) by orsmga103.jf.intel.com with ESMTP/TLS/ECDHE-RSA-AES256-GCM-SHA384; 26 Apr 2022 13:50:53 -0700 X-IronPort-AV: E=Sophos;i="5.90,292,1643702400"; d="scan'208";a="580152010" Received: from dsocek-mobl2.amr.corp.intel.com (HELO [10.212.69.119]) ([10.212.69.119]) by orsmga008-auth.jf.intel.com with ESMTP/TLS/ECDHE-RSA-AES256-GCM-SHA384; 26 Apr 2022 13:50:52 -0700 Message-ID: Date: Tue, 26 Apr 2022 13:53:41 -0700 MIME-Version: 1.0 User-Agent: Mozilla/5.0 (X11; Linux x86_64; rv:91.0) Gecko/20100101 Thunderbird/91.7.0 Subject: Re: [PATCH v3 04/21] x86/virt/tdx: Add skeleton for detecting and initializing TDX on demand Content-Language: en-US To: Kai Huang , linux-kernel@vger.kernel.org, kvm@vger.kernel.org Cc: seanjc@google.com, pbonzini@redhat.com, len.brown@intel.com, tony.luck@intel.com, rafael.j.wysocki@intel.com, reinette.chatre@intel.com, dan.j.williams@intel.com, peterz@infradead.org, ak@linux.intel.com, kirill.shutemov@linux.intel.com, sathyanarayanan.kuppuswamy@linux.intel.com, isaku.yamahata@intel.com References: <32dcf4c7acc95244a391458d79cd6907125c5c29.1649219184.git.kai.huang@intel.com> From: Dave Hansen In-Reply-To: <32dcf4c7acc95244a391458d79cd6907125c5c29.1649219184.git.kai.huang@intel.com> Content-Type: text/plain; charset=UTF-8 Content-Transfer-Encoding: 7bit X-Spam-Status: No, score=-4.4 required=5.0 tests=BAYES_00,DKIMWL_WL_HIGH, DKIM_SIGNED,DKIM_VALID,DKIM_VALID_AU,HEADER_FROM_DIFFERENT_DOMAINS, MAILING_LIST_MULTI,NICE_REPLY_A,RDNS_NONE,SPF_HELO_NONE autolearn=unavailable autolearn_force=no version=3.4.6 X-Spam-Checker-Version: SpamAssassin 3.4.6 (2021-04-09) on lindbergh.monkeyblade.net Precedence: bulk List-ID: X-Mailing-List: linux-kernel@vger.kernel.org On 4/5/22 21:49, Kai Huang wrote: > The TDX module is essentially a CPU-attested software module running > in the new Secure Arbitration Mode (SEAM) to protect VMs from malicious > host and certain physical attacks. The TDX module implements the > functions to build, tear down and start execution of the protected VMs > called Trusted Domains (TD). Before the TDX module can be used to > create and run TD guests, it must be loaded into the SEAM Range Register > (SEAMRR) and properly initialized. The module isn't loaded into a register, right? It's loaded into a memory area pointed to *by* the register. > The TDX module is expected to be > loaded by BIOS before booting to the kernel, and the kernel is expected > to detect and initialize it, using the SEAMCALLs defined by TDX > architecture. Wait a sec... So, what was all this gobleygook about TDX module loading and SEAMRR's if the kernel just has the TDX module *handed* to it already loaded? It looks to me like you wrote all of this before the TDX module was being loaded by the BIOS and neglected to go and update these changelogs. > The TDX module can be initialized only once in its lifetime. Instead > of always initializing it at boot time, this implementation chooses an > on-demand approach to initialize TDX until there is a real need (e.g > when requested by KVM). This avoids consuming the memory that must be > allocated by kernel and given to the TDX module as metadata (~1/256th of > the TDX-usable memory), and also saves the time of initializing the TDX > module (and the metadata) when TDX is not used at all. Initializing the > TDX module at runtime on-demand also is more flexible to support TDX > module runtime updating in the future (after updating the TDX module, it > needs to be initialized again). > > Introduce two placeholders tdx_detect() and tdx_init() to detect and > initialize the TDX module on demand, with a state machine introduced to > orchestrate the entire process (in case of multiple callers). > > To start with, tdx_detect() checks SEAMRR and TDX private KeyIDs. The > TDX module is reported as not loaded if either SEAMRR is not enabled, or > there are no enough TDX private KeyIDs to create any TD guest. The TDX > module itself requires one global TDX private KeyID to crypto protect > its metadata. This is stepping over the line into telling me what the code does instead of why. > And tdx_init() is currently empty. The TDX module will be initialized > in multi-steps defined by the TDX architecture: > > 1) Global initialization; > 2) Logical-CPU scope initialization; > 3) Enumerate the TDX module capabilities and platform configuration; > 4) Configure the TDX module about usable memory ranges and global > KeyID information; > 5) Package-scope configuration for the global KeyID; > 6) Initialize usable memory ranges based on 4). > > The TDX module can also be shut down at any time during its lifetime. > In case of any error during the initialization process, shut down the > module. It's pointless to leave the module in any intermediate state > during the initialization. > > SEAMCALL requires SEAMRR being enabled and CPU being already in VMX > operation (VMXON has been done), otherwise it generates #UD. So far > only KVM handles VMXON/VMXOFF. Choose to not handle VMXON/VMXOFF in > tdx_detect() and tdx_init() but depend on the caller to guarantee that, > since so far KVM is the only user of TDX. In the long term, more kernel > components are likely to use VMXON/VMXOFF to support TDX (i.e. TDX > module runtime update), so a reference-based approach to do VMXON/VMXOFF > is likely needed. > > Signed-off-by: Kai Huang > --- > arch/x86/include/asm/tdx.h | 4 + > arch/x86/virt/vmx/tdx/tdx.c | 222 ++++++++++++++++++++++++++++++++++++ > 2 files changed, 226 insertions(+) > > diff --git a/arch/x86/include/asm/tdx.h b/arch/x86/include/asm/tdx.h > index 1f29813b1646..c8af2ba6bb8a 100644 > --- a/arch/x86/include/asm/tdx.h > +++ b/arch/x86/include/asm/tdx.h > @@ -92,8 +92,12 @@ static inline long tdx_kvm_hypercall(unsigned int nr, unsigned long p1, > > #ifdef CONFIG_INTEL_TDX_HOST > void tdx_detect_cpu(struct cpuinfo_x86 *c); > +int tdx_detect(void); > +int tdx_init(void); > #else > static inline void tdx_detect_cpu(struct cpuinfo_x86 *c) { } > +static inline int tdx_detect(void) { return -ENODEV; } > +static inline int tdx_init(void) { return -ENODEV; } > #endif /* CONFIG_INTEL_TDX_HOST */ > > #endif /* !__ASSEMBLY__ */ > diff --git a/arch/x86/virt/vmx/tdx/tdx.c b/arch/x86/virt/vmx/tdx/tdx.c > index ba2210001ea8..53093d4ad458 100644 > --- a/arch/x86/virt/vmx/tdx/tdx.c > +++ b/arch/x86/virt/vmx/tdx/tdx.c > @@ -9,6 +9,8 @@ > > #include > #include > +#include > +#include > #include > #include > #include > @@ -45,12 +47,33 @@ > ((u32)(((_keyid_part) & 0xffffffffull) + 1)) > #define TDX_KEYID_NUM(_keyid_part) ((u32)((_keyid_part) >> 32)) > > +/* > + * TDX module status during initialization > + */ > +enum tdx_module_status_t { > + /* TDX module status is unknown */ > + TDX_MODULE_UNKNOWN, > + /* TDX module is not loaded */ > + TDX_MODULE_NONE, > + /* TDX module is loaded, but not initialized */ > + TDX_MODULE_LOADED, > + /* TDX module is fully initialized */ > + TDX_MODULE_INITIALIZED, > + /* TDX module is shutdown due to error during initialization */ > + TDX_MODULE_SHUTDOWN, > +}; > + > /* BIOS must configure SEAMRR registers for all cores consistently */ > static u64 seamrr_base, seamrr_mask; > > static u32 tdx_keyid_start; > static u32 tdx_keyid_num; > > +static enum tdx_module_status_t tdx_module_status; > + > +/* Prevent concurrent attempts on TDX detection and initialization */ > +static DEFINE_MUTEX(tdx_module_lock); > + > static bool __seamrr_enabled(void) > { > return (seamrr_mask & SEAMRR_ENABLED_BITS) == SEAMRR_ENABLED_BITS; > @@ -172,3 +195,202 @@ void tdx_detect_cpu(struct cpuinfo_x86 *c) > detect_seam(c); > detect_tdx_keyids(c); > } > + > +static bool seamrr_enabled(void) > +{ > + /* > + * To detect any BIOS misconfiguration among cores, all logical > + * cpus must have been brought up at least once. This is true > + * unless 'maxcpus' kernel command line is used to limit the > + * number of cpus to be brought up during boot time. However > + * 'maxcpus' is basically an invalid operation mode due to the > + * MCE broadcast problem, and it should not be used on a TDX > + * capable machine. Just do paranoid check here and do not > + * report SEAMRR as enabled in this case. > + */ > + if (!cpumask_equal(&cpus_booted_once_mask, > + cpu_present_mask)) > + return false; > + > + return __seamrr_enabled(); > +} > + > +static bool tdx_keyid_sufficient(void) > +{ > + if (!cpumask_equal(&cpus_booted_once_mask, > + cpu_present_mask)) > + return false; I'd move this cpumask_equal() to a helper. > + /* > + * TDX requires at least two KeyIDs: one global KeyID to > + * protect the metadata of the TDX module and one or more > + * KeyIDs to run TD guests. > + */ > + return tdx_keyid_num >= 2; > +} > + > +static int __tdx_detect(void) > +{ > + /* The TDX module is not loaded if SEAMRR is disabled */ > + if (!seamrr_enabled()) { > + pr_info("SEAMRR not enabled.\n"); > + goto no_tdx_module; > + } Why even bother with the SEAMRR stuff? It sounded like you can "ping" the module with SEAMCALL. Why not just use that directly? > + /* > + * Also do not report the TDX module as loaded if there's > + * no enough TDX private KeyIDs to run any TD guests. > + */ > + if (!tdx_keyid_sufficient()) { > + pr_info("Number of TDX private KeyIDs too small: %u.\n", > + tdx_keyid_num); > + goto no_tdx_module; > + } > + > + /* Return -ENODEV until the TDX module is detected */ > +no_tdx_module: > + tdx_module_status = TDX_MODULE_NONE; > + return -ENODEV; > +} > + > +static int init_tdx_module(void) > +{ > + /* > + * Return -EFAULT until all steps of TDX module > + * initialization are done. > + */ > + return -EFAULT; > +} > + > +static void shutdown_tdx_module(void) > +{ > + /* TODO: Shut down the TDX module */ > + tdx_module_status = TDX_MODULE_SHUTDOWN; > +} > + > +static int __tdx_init(void) > +{ > + int ret; > + > + /* > + * Logical-cpu scope initialization requires calling one SEAMCALL > + * on all logical cpus enabled by BIOS. Shutting down the TDX > + * module also has such requirement. Further more, configuring > + * the key of the global KeyID requires calling one SEAMCALL for > + * each package. For simplicity, disable CPU hotplug in the whole > + * initialization process. > + * > + * It's perhaps better to check whether all BIOS-enabled cpus are > + * online before starting initializing, and return early if not. But you did some of this cpumask checking above. Right? > + * But none of 'possible', 'present' and 'online' CPU masks > + * represents BIOS-enabled cpus. For example, 'possible' mask is > + * impacted by 'nr_cpus' or 'possible_cpus' kernel command line. > + * Just let the SEAMCALL to fail if not all BIOS-enabled cpus are > + * online. > + */ > + cpus_read_lock(); > + > + ret = init_tdx_module(); > + > + /* > + * Shut down the TDX module in case of any error during the > + * initialization process. It's meaningless to leave the TDX > + * module in any middle state of the initialization process. > + */ > + if (ret) > + shutdown_tdx_module(); > + > + cpus_read_unlock(); > + > + return ret; > +} > + > +/** > + * tdx_detect - Detect whether the TDX module has been loaded > + * > + * Detect whether the TDX module has been loaded and ready for > + * initialization. Only call this function when all cpus are > + * already in VMX operation. > + * > + * This function can be called in parallel by multiple callers. > + * > + * Return: > + * > + * * -0: The TDX module has been loaded and ready for > + * initialization. "-0", eh? > + * * -ENODEV: The TDX module is not loaded. > + * * -EPERM: CPU is not in VMX operation. > + * * -EFAULT: Other internal fatal errors. > + */ > +int tdx_detect(void) > +{ > + int ret; > + > + mutex_lock(&tdx_module_lock); > + > + switch (tdx_module_status) { > + case TDX_MODULE_UNKNOWN: > + ret = __tdx_detect(); > + break; > + case TDX_MODULE_NONE: > + ret = -ENODEV; > + break; > + case TDX_MODULE_LOADED: > + case TDX_MODULE_INITIALIZED: > + ret = 0; > + break; > + case TDX_MODULE_SHUTDOWN: > + ret = -EFAULT; > + break; > + default: > + WARN_ON(1); > + ret = -EFAULT; > + } > + > + mutex_unlock(&tdx_module_lock); > + return ret; > +} > +EXPORT_SYMBOL_GPL(tdx_detect); > + > +/** > + * tdx_init - Initialize the TDX module > + * > + * Initialize the TDX module to make it ready to run TD guests. This > + * function should be called after tdx_detect() returns successful. > + * Only call this function when all cpus are online and are in VMX > + * operation. CPU hotplug is temporarily disabled internally. > + * > + * This function can be called in parallel by multiple callers. > + * > + * Return: > + * > + * * -0: The TDX module has been successfully initialized. > + * * -ENODEV: The TDX module is not loaded. > + * * -EPERM: The CPU which does SEAMCALL is not in VMX operation. > + * * -EFAULT: Other internal fatal errors. > + */ > +int tdx_init(void) > +{ > + int ret; > + > + mutex_lock(&tdx_module_lock); > + > + switch (tdx_module_status) { > + case TDX_MODULE_NONE: > + ret = -ENODEV; > + break; > + case TDX_MODULE_LOADED: > + ret = __tdx_init(); > + break; > + case TDX_MODULE_INITIALIZED: > + ret = 0; > + break; > + default: > + ret = -EFAULT; > + break; > + } > + mutex_unlock(&tdx_module_lock); > + > + return ret; > +} > +EXPORT_SYMBOL_GPL(tdx_init); Why does this need both a tdx_detect() and a tdx_init()? Shouldn't the interface from outside just be "get TDX up and running, please?"