Subject: Re: [PATCH v3 00/21] TDX host kernel support
From: Kai Huang
To: Dave Hansen, linux-kernel@vger.kernel.org, kvm@vger.kernel.org
Cc: seanjc@google.com, pbonzini@redhat.com, len.brown@intel.com,
    tony.luck@intel.com, rafael.j.wysocki@intel.com,
    reinette.chatre@intel.com, dan.j.williams@intel.com,
    peterz@infradead.org, ak@linux.intel.com,
    kirill.shutemov@linux.intel.com,
    sathyanarayanan.kuppuswamy@linux.intel.com, isaku.yamahata@intel.com
Date: Fri, 29 Apr 2022 13:40:13 +1200

On Thu, 2022-04-28 at 12:58 +1200, Kai Huang wrote:
> On Wed, 2022-04-27 at 17:50 -0700, Dave Hansen wrote:
> > On 4/27/22 17:37, Kai Huang wrote:
> > > On Wed, 2022-04-27 at 14:59 -0700, Dave Hansen wrote:
> > > > In 5 years, if someone takes this code and runs it on Intel hardware
> > > > with memory hotplug, CPU hotplug, NVDIMMs *AND* TDX support, what happens?
> > >
> > > I thought we could document this in the documentation saying that this code can
> > > only work on TDX machines that don't have above capabilities (SPR for now). We
> > > can change the code and the documentation when we add the support of those
> > > features in the future, and update the documentation.
> > >
> > > If 5 years later someone takes this code, he/she should take a look at the
> > > documentation and figure out that he/she should choose a newer kernel if the
> > > machine support those features.
> > >
> > > I'll think about design solutions if above doesn't look good for you.
> >
> > No, it doesn't look good to me.
> >
> > You can't just say:
> >
> > 	/*
> > 	 * This code will eat puppies if used on systems with hotplug.
> > 	 */
> >
> > and merrily await the puppy bloodbath.
> >
> > If it's not compatible, then you have to *MAKE* it not compatible in a
> > safe, controlled way.
> >
> > > > You can't just ignore the problems because they're not present on one
> > > > version of the hardware.
> >
> > Please, please read this again ^^
>
> OK. I'll think about solutions and come back later.
>

Hi Dave,

I think we have two approaches to handle the interaction between memory
hotplug and TDX module initialization.

The first approach is simple. We just block memory from being added as
system RAM managed by the page allocator when the platform supports TDX [1].
It seems we can add an arch-specific check to __add_memory_resource() and
reject the new memory resource if the platform supports TDX.
__add_memory_resource() is called by both __add_memory() and
add_memory_driver_managed(), so this prevents both adding NVDIMM as system
RAM and normal ACPI memory hotplug [2].

The second approach is relatively more complicated.
Instead of directly rejecting the new memory resource in
__add_memory_resource(), we check whether the memory resource can be added
based on the CMRs and the TDX module initialization status. This is feasible
because, with the latest public P-SEAMLDR spec, we can get the CMRs from a
P-SEAMLDR SEAMCALL [3]. So we can detect P-SEAMLDR and get the CMR info
during kernel boot. And in __add_memory_resource() we do the check below:

	tdx_init_disable();	/* similar to cpu_hotplug_disable() */
	if (tdx_module_initialized())
		// reject memory hotplug
	else if (new_memory_resource NOT in CMRs)
		// reject memory hotplug
	else
		// allow memory hotplug
	tdx_init_enable();	/* similar to cpu_hotplug_enable() */

tdx_init_disable() temporarily disables TDX module initialization by trying
to grab the mutex. If TDX module initialization is already ongoing, it
waits until the initialization completes.

This should work better for future platforms, but would require
non-trivially more code, as we need to add VMXON/VMXOFF support to the core
kernel to detect the CMRs using SEAMCALL. A side advantage is that with
VMXON in the core kernel we can shut down the TDX module in kexec().

But for this series I think the second approach is overkill and we can
choose the first, simple approach?

Any suggestions?

[1] "Platform supports TDX" means SEAMRR is enabled and there are at least
2 TDX keyIDs. Or we can just check that SEAMRR is enabled, as in practice
SEAMRR being enabled means the machine is TDX-capable, and for now a
TDX-capable machine doesn't support ACPI memory hotplug.

[2] It prevents adding legacy PMEM as system RAM too, but I think that's
fine. If the user wants legacy PMEM, then it is unlikely the user will add
it back and use it as system RAM. The user is also unlikely to use legacy
PMEM as TD guest memory directly, as TD guests are likely to use a new memfd
backend which allows private pages not accessible from userspace, so in
this way we can exclude legacy PMEM from TDMRs.

[3] Please refer to the SEAMLDR.SEAMINFO SEAMCALL in the latest P-SEAMLDR spec:
https://www.intel.com/content/dam/develop/external/us/en/documents-tps/intel-tdx-seamldr-interface-specification.pdf
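
For illustration only, the decision made by the second approach can be modelled in plain user-space C. This is a rough sketch under assumptions, not kernel code: struct cmr, range_in_cmrs() and may_hotplug_memory() are hypothetical names, the CMR list is passed in as an array rather than read via SEAMCALL, the tdx_init_disable()/tdx_init_enable() locking is left out, and a range is accepted only if it fits inside a single CMR.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical model of one Convertible Memory Region: [base, base + size). */
struct cmr {
	uint64_t base;
	uint64_t size;
};

/* Return true if [start, start + size) lies entirely inside one CMR. */
static bool range_in_cmrs(const struct cmr *cmrs, size_t nr_cmrs,
			  uint64_t start, uint64_t size)
{
	assert(cmrs != NULL || nr_cmrs == 0);

	/* Reject empty ranges and ranges that wrap around the address space. */
	if (size == 0 || start + size < start)
		return false;

	for (size_t i = 0; i < nr_cmrs; i++) {
		uint64_t cmr_end = cmrs[i].base + cmrs[i].size;

		if (start >= cmrs[i].base && start + size <= cmr_end)
			return true;
	}
	return false;
}

/*
 * Decision from the second approach, without the locking: reject once the
 * TDX module is initialized, otherwise allow only memory covered by a CMR.
 */
static bool may_hotplug_memory(bool tdx_module_initialized,
			       const struct cmr *cmrs, size_t nr_cmrs,
			       uint64_t start, uint64_t size)
{
	if (tdx_module_initialized)
		return false;

	return range_in_cmrs(cmrs, nr_cmrs, start, size);
}
```

In this model the first approach is the degenerate case where the function returns false whenever the platform supports TDX, regardless of the CMRs.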