Subject: Re: [PATCH v1 2/3] x86/coco: Disable TDX module calls when TD partitioning is active
Date: Fri, 1 Dec 2023 16:27:27 +0100
From: Jeremi Piotrowski <jpiotrowski@linux.microsoft.com>
To: "Huang, Kai", kirill.shutemov@linux.intel.com
Cc: tim.gardner@canonical.com, cascardo@canonical.com, dave.hansen@linux.intel.com,
 thomas.lendacky@amd.com, roxana.nicolescu@canonical.com, haiyangz@microsoft.com,
 linux-kernel@vger.kernel.org, mingo@redhat.com, stable@vger.kernel.org,
 tglx@linutronix.de, stefan.bader@canonical.com, "Cui, Dexuan", nik.borisov@suse.com,
 mhkelley58@gmail.com, hpa@zytor.com, peterz@infradead.org, linux-hyperv@vger.kernel.org,
 wei.liu@kernel.org, bp@alien8.de, sashal@kernel.org, kys@microsoft.com, x86@kernel.org
References: <20231122170106.270266-1-jpiotrowski@linux.microsoft.com>
 <20231122170106.270266-2-jpiotrowski@linux.microsoft.com>
 <20231123141318.rmskhl3scc2a6muw@box.shutemov.name>
 <837fb5e9-4a35-4e49-8ec6-1fcfd5a0da30@linux.microsoft.com>
On 29/11/2023 11:37, Huang, Kai wrote:
> On Fri, 2023-11-24 at 11:38 +0100, Jeremi Piotrowski wrote:
>> On 23/11/2023 15:13, Kirill A. Shutemov wrote:
>>> On Wed, Nov 22, 2023 at 06:01:05PM +0100, Jeremi Piotrowski wrote:
>>>> Introduce CC_ATTR_TDX_MODULE_CALLS to allow code to check whether TDX module
>>>> calls are available. When TD partitioning is enabled, a L1 TD VMM handles most
>>>> TDX facilities and the kernel running as an L2 TD VM does not have access to
>>>> TDX module calls. The kernel still has access to TDVMCALL(0) which is forwarded
>>>> to the VMM for processing, which is the L1 TD VM in this case.
>>>
>>
>> Correction: it turns out TDVMCALL(0) is handled by L0 VMM.
>>>>
>
> Some thoughts after checking the spec more, to make sure we don't
> misunderstand each other:
>
> The TDX module will unconditionally exit to L1 for any TDCALL (except the
> TDVMCALL) from the L2. This is expected behaviour. Because the L2 isn't a
> true TDX guest, L1 is expected to inject a #UD or #GP or whatever error to
> L2, based on the hardware spec, to make sure L2 gets the correct
> architectural behaviour for the TDCALL instruction.
>
> I believe this is also the reason you mentioned "L2 TD VM does not have
> access to TDX module calls".

Right. Injecting #UD/#GP or returning an error might be desirable, but the L2
guest would still have no guarantee that it can rely on the functionality
provided by these TDCALLs. Here the TDCALLs lead to guest termination, but the
kernel would panic if some of them returned an error.

>
> However, the TDX module actually allows the L1 to control whether the L2 is
> allowed to execute TDVMCALL, by controlling whether the TDVMCALL from L2
> will exit to L0 or L1.
>
> I believe you mentioned "TDVMCALL(0) is handled by L0 VMM" because the L1
> hypervisor -- specifically, hyperv -- chooses to let the TDVMCALL from L2
> exit to L0?

That is correct. The L1 hypervisor here (it's not hyperv, so maybe let's keep
referring to it as a paravisor?) sets ENABLE_TDVMCALL so that TDVMCALLs exit
straight to L0. The TDVMCALLs are used for the I/O path, which is not emulated
or intercepted by the L1 hypervisor at all.

>
> But IMHO this is purely hyperv's implementation choice, i.e., KVM can choose
> not to do so and simply handle TDVMCALL the same way it handles a normal
> TDCALL -- inject the architecturally defined error to L2.
>
> Also, AFAICT there's no architectural mechanism controlled by L2 to let the
> L1 know whether L2 is expecting to use TDVMCALL or not. In other words,
> whether to support TDVMCALL is purely L1 hypervisor implementation specific.
>

Right, the only way to know whether TDVMCALL/TDCALL is allowed is to identify
the L1 hypervisor and use that knowledge.

> So to me this whole series is a hyperv-specific enlightenment for an L2
> running on a TDX guest hyperv L1. And because of that, perhaps a better way
> to do it is:
>
> 1) The default L2 should just be a normal VM that any TDX guest L1
> hypervisor should be able to handle (guaranteed by the TDX partitioning
> architecture).
>

When you say "normal VM", do you mean "legacy VM"?
"Any TDX guest L1 hypervisor" is a bit of a reach: the only TDX guest L1
hypervisor implementation that I know exists does not support guests that are
entirely unaware of TDX.

Maybe it's best if we avoid the name "TDX guest L1 hypervisor" altogether and
refer to it the way AMD does: "Secure VM Service Module". That more accurately
reflects the intention: providing certain targeted services needed in the
context of a confidential VM. No one is interested in running a full-blown
hypervisor implementation in there.

> 2) Different L2/L1 hypervisors can have their own enlightenments. We can
> even have common enlightenments across different implementations of L1
> hypervisors, but that requires cross-hypervisor cooperation.
>
> But IMHO it's not a good idea to say:
>
>   L2 is running in a TDX partitioning enabled environment, let us mark it
>   as a TDX guest but also mark it as "TDX partitioning" to disable a couple
>   of TDX functionalities.
>
> Instead, perhaps it's better to let L2 explicitly opt in to the TDX
> facilities that the underlying hypervisor supports.
>
> TDVMCALL can be the first facility to begin with.
>
> Lastly, even TDVMCALL has a bunch of leaves, and the hypervisor can choose
> to support them or not. Using a single "tdx_partitioning_active" to select
> which TDX facilities are supported doesn't seem like a good idea.
>
> That's my 2 cents w/o knowing the details of the hyperv enlightenments.
>

I think on the whole we are on the same page. Let me rephrase what I hear you
saying: 'tdx_partitioning_active' as a catch-all is bad, but
CC_ATTR_TDX_MODULE_CALLS is in the spirit of what we would like to have.

So something like:

  case CC_ATTR_TDX_MODULE_CALLS:
          return tdx_status & TDCALL;

and

  if (no_td_partitioning)
          tdx_status |= TDCALL;
  if (l1_td_vmm_supports_tdcalls)
          tdx_status |= TDCALL;

would be OK?

I can tell you right away that the next facility would control tdx_safe_halt(),
because that doesn't operate as intended (hlt traps to L1, while tdx_safe_halt()
is a TDVMCALL and goes to L0).

The other important goal of the patchset is ensuring that X86_FEATURE_TDX_GUEST
is set.
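
To make the shape of that opt-in a bit more concrete, here is a rough sketch,
written as a small self-contained user-space mock so the logic is easy to
follow. All of the names in it (tdx_status, TDX_FACILITY_*,
tdx_detect_facilities(), l1_vmm_supports_tdcall(), tdx_facility_supported())
are made up for illustration and do not exist anywhere today; only
CC_ATTR_TDX_MODULE_CALLS comes from the patch under discussion:

  /*
   * Sketch only -- every identifier here is hypothetical.
   */
  #include <stdbool.h>
  #include <stdio.h>

  #define TDX_FACILITY_TDCALL    (1UL << 0)  /* TDCALLs reach the TDX module */
  #define TDX_FACILITY_SAFE_HALT (1UL << 1)  /* TDVMCALL-based safe halt works */

  static unsigned long tdx_status;

  /* Hypothetical probes of the environment, e.g. via something the L1 exposes. */
  static bool td_partitioning_active(void)  { return true; }
  static bool l1_vmm_supports_tdcall(void)  { return false; }

  static void tdx_detect_facilities(void)
  {
          if (!td_partitioning_active()) {
                  /* Plain TDX guest: all facilities are available. */
                  tdx_status |= TDX_FACILITY_TDCALL | TDX_FACILITY_SAFE_HALT;
                  return;
          }

          /* L2 under TD partitioning: only opt in to what the L1 provides. */
          if (l1_vmm_supports_tdcall())
                  tdx_status |= TDX_FACILITY_TDCALL;
  }

  /* What a CC_ATTR_TDX_MODULE_CALLS-style check would boil down to. */
  static bool tdx_facility_supported(unsigned long facility)
  {
          return tdx_status & facility;
  }

  int main(void)
  {
          tdx_detect_facilities();
          printf("TDCALL available: %d\n",
                 tdx_facility_supported(TDX_FACILITY_TDCALL));
          printf("safe halt available: %d\n",
                 tdx_facility_supported(TDX_FACILITY_SAFE_HALT));
          return 0;
  }

The point of the per-facility bits is exactly what you describe: each facility
is opted into individually based on what the L1 actually supports, instead of
keying everything off a single "partitioning active" flag.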