Message-ID: <30c7cc075fb68a2830304e6e807023ba9df7c17b.camel@intel.com>
Subject: Re: [PATCH v3 00/21] TDX host kernel support
From: Kai Huang
To: Dan Williams
Cc: Dave Hansen, Linux Kernel Mailing List, KVM list, Sean Christopherson,
    Paolo Bonzini, "Brown, Len", "Luck, Tony", Rafael J Wysocki,
    Reinette Chatre, Peter Zijlstra, Andi Kleen, "Kirill A. Shutemov",
    Kuppuswamy Sathyanarayanan, Isaku Yamahata, Mike Rapoport
Date: Tue, 10 May 2022 22:25:11 +1200
X-Mailing-List: linux-kernel@vger.kernel.org

> > > > Consider the fact that end users can break the kernel by specifying
> > > > invalid memmap= command line options.
> > > > The memory hotplug code does not
> > > > take any steps to add safety in those cases because there are already
> > > > too many ways it can go wrong. TDX is just one more corner case where
> > > > the memmap= user needs to be careful. Otherwise, it is up to the
> > > > platform firmware to make sure everything in the base memory map is
> > > > TDX capable, and then all you need is documentation about the failure
> > > > mode when extending "System RAM" beyond that baseline.
> > >
> > > So the fact is, if we don't include legacy PMEMs into TDMRs, and don't do
> > > anything in memory hotplug, then if user does kmem-hot-add legacy PMEMs as
> > > system RAM, a live TD may eventually be killed.
> > >
> > > If such case is a corner case that we don't need to guarantee, then even better.
> > > And we have an additional reason that those legacy PMEMs don't need to be in
> > > TDMRs. As you suggested, we can add some documentation to point out.
> > >
> > > But the point we want to do some code check and prevent memory hotplug is, as
> > > Dave said, we want this piece of code to work on *ANY* TDX capable machines,
> > > including future machines which may, i.e. supports NVDIMM/CLX memory as TDX
> > > memory. If we don't do any code check in memory hotplug in this series, then
> > > when this code runs in future platforms, user can plug NVDIMM or CLX memory as
> > > system RAM thus break the assumption "all pages in page allocator are TDX
> > > memory", which eventually leads to live TDs being killed potentially.
> > >
> > > Dave said we need to guarantee this code can work on *ANY* TDX machines. Some
> > > documentation saying it only works one some platforms and you shouldn't do
> > > things on other platforms are not good enough:
> > >
> > > https://lore.kernel.org/lkml/cover.1649219184.git.kai.huang@intel.com/T/#m6df45b6e1702bb03dcb027044a0dabf30a86e471
> >
> > Yes, the incompatible cases cannot be ignored, but I disagree that
> > they actively need to be prevented.
> > One way to achieve that is to
> > explicitly enumerate TDX capable memory and document how mempolicy can
> > be used to avoid killing TDs.
>
> Hi Dan,
>
> Thanks for feedback.
>
> Could you elaborate what does "explicitly enumerate TDX capable memory" mean?
> How to enumerate exactly?
>
> And for "document how mempolicy can be used to avoid killing TDs", what
> mempolicy (and error reporting you mentioned below) are you referring to?
>
> I skipped to reply your below your two replies as I think they are referring to
> the same "enumerate" and "mempolicy" that I am asking above.

Hi Dan,

I guess "explicitly enumerate TDX capable memory" means getting the
Convertible Memory Regions (CMRs).

And does "document how mempolicy can be used to avoid killing TDs" mean that
we say something like the below in the documentation?

	Any non-TDX-capable memory hot-add will result in non-TDX-capable
	pages potentially being allocated to a TD, in which case a TD may
	fail to be created or a live TD may be killed at runtime.

And by "error reporting", do you mean that in the memory hot-add code path we
check whether the new memory resource is TDX capable and, if it is not, print
an error similar to the documentation text above, but still allow the memory
hot-add to happen?

Something like the below (pseudocode) in add_memory_resource()?

	if (platform_has_tdx() && new memory resource NOT in CMRs)
		pr_err("Hot-add non-TDX memory on TDX capable system. TD may fail to be created, or a live TD may be killed during runtime.\n");

	// allow memory hot-add anyway

I have the following concerns about this approach:

1) I think we should provide a consistent service to the user. That is, either
we guarantee that TD creation won't fail randomly and that a running TD won't
be killed at runtime, or we don't provide any TDX functionality at all. So I
am not sure that only "document how mempolicy can be used to avoid killing
TDs" is good enough.

2) The above check of whether a new memory resource is in the CMRs requires
the kernel to get the CMRs during boot. However, getting the CMRs requires a
SEAMCALL, which in turn requires the kernel to support VMXON/VMXOFF.
VMXON/VMXOFF is currently only handled by KVM. We'd like to avoid adding
VMXON/VMXOFF to the core-kernel now if it is not mandatory, as eventually we
will very likely need a reference-based approach to calling VMXON/VMXOFF.
This part is explained in the cover letter of this series.

Dave suggested that, to keep things simple for now, we can use a "winner take
all" approach: if TDX is initialized first, don't allow memory hotplug; if
memory hotplug happens first, don't allow TDX to be initialized.

https://lore.kernel.org/lkml/cover.1649219184.git.kai.huang@intel.com/T/#mfa6b5dcc536d8a7b78522f46ccd1230f84d52ae0

I think this is perhaps more reasonable, as we are at least providing a
consistent service to the user. And with this approach we don't need to handle
VMXON/VMXOFF in the core-kernel.

Comments?

--
Thanks,
-Kai
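
To make the "winner take all" rule concrete, here is a minimal userspace
sketch in plain C. It is only a model under stated assumptions, not kernel
code from this series: the names model_tdx_init(), model_memory_hot_add() and
model_reset(), the state enum, and the choice of -EBUSY/-EPERM as error codes
are all hypothetical, and the locking a real implementation would need is
omitted.

```c
#include <errno.h>
#include <stdio.h>

/* Hypothetical userspace model; none of these names exist in the kernel. */
enum model_state {
	MODEL_UNDECIDED,	/* neither TDX init nor hot-add has happened */
	MODEL_TDX_WON,		/* TDX initialized first: hot-add is refused */
	MODEL_HOTPLUG_WON,	/* memory hot-added first: TDX init is refused */
};

static enum model_state model_current = MODEL_UNDECIDED;

/* Models TDX initialization. Returns 0 on success, -EBUSY if hotplug won. */
int model_tdx_init(void)
{
	if (model_current == MODEL_HOTPLUG_WON) {
		fprintf(stderr, "tdx: memory was hot-added, refusing to initialize\n");
		return -EBUSY;
	}
	model_current = MODEL_TDX_WON;
	return 0;
}

/* Models a memory hot-add. Returns 0 on success, -EPERM if TDX won. */
int model_memory_hot_add(void)
{
	if (model_current == MODEL_TDX_WON) {
		fprintf(stderr, "hotplug: TDX is initialized, refusing hot-add\n");
		return -EPERM;
	}
	model_current = MODEL_HOTPLUG_WON;
	return 0;
}

/* Returns the model to its boot-time state (for experimentation only). */
void model_reset(void)
{
	model_current = MODEL_UNDECIDED;
}
```

Whichever of the two operations succeeds first fixes the state, and the other
one fails from then on. Note that neither path needs to know the CMRs, which
is why this rule avoids handling VMXON/VMXOFF outside KVM.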