From: Dan Williams
Date: Thu, 28 Apr 2022 20:04:19 -0700
Subject: Re: [PATCH v3 00/21] TDX host kernel support
To: Kai Huang
Cc: Dave Hansen, Linux Kernel Mailing List, KVM list, Sean Christopherson, Paolo Bonzini, "Brown, Len", "Luck, Tony", Rafael J Wysocki, Reinette Chatre, Peter Zijlstra, Andi Kleen, "Kirill A. Shutemov", Kuppuswamy Sathyanarayanan, Isaku Yamahata
X-Mailing-List: linux-kernel@vger.kernel.org

On Thu, Apr 28, 2022 at 6:40 PM Kai Huang wrote:
>
> On Thu, 2022-04-28 at 12:58 +1200, Kai Huang wrote:
> > On Wed, 2022-04-27 at 17:50 -0700, Dave Hansen wrote:
> > > On 4/27/22 17:37, Kai Huang wrote:
> > > > On Wed, 2022-04-27 at 14:59 -0700, Dave Hansen wrote:
> > > > > In 5 years, if someone takes this code and runs it on Intel hardware
> > > > > with memory hotplug, CPU hotplug, NVDIMMs *AND* TDX support, what happens?
> > > >
> > > > I thought we could document this in the documentation saying that this code can
> > > > only work on TDX machines that don't have the above capabilities (SPR for now).
> > > > We can change the code and the documentation when we add support for those
> > > > features in the future.
> > > >
> > > > If 5 years later someone takes this code, he/she should take a look at the
> > > > documentation and figure out that he/she should choose a newer kernel if the
> > > > machine supports those features.
> > > >
> > > > I'll think about design solutions if the above doesn't look good to you.
> > >
> > > No, it doesn't look good to me.
> > >
> > > You can't just say:
> > >
> > > /*
> > >  * This code will eat puppies if used on systems with hotplug.
> > >  */
> > >
> > > and merrily await the puppy bloodbath.
> > >
> > > If it's not compatible, then you have to *MAKE* it not compatible in a
> > > safe, controlled way.
> > >
> > > > > You can't just ignore the problems because they're not present on one
> > > > > version of the hardware.
> > >
> > > Please, please read this again ^^
> >
> > OK. I'll think about solutions and come back later.
> >
>
> Hi Dave,
>
> I think we have two approaches to handle memory hotplug interaction with the TDX
> module initialization.
>
> The first approach is simple. We just block memory from being added as system
> RAM managed by the page allocator when the platform supports TDX [1]. It seems we
> can add an arch-specific check to __add_memory_resource() and reject the new
> memory resource if the platform supports TDX. __add_memory_resource() is called
> by both __add_memory() and add_memory_driver_managed(), so it prevents adding
> NVDIMM as system RAM as well as normal ACPI memory hotplug [2].

What if the memory being added *is* TDX capable? What if someone wanted
to manage a memory range as soft-reserved and move it back and forth
from the core-mm to device access? That should be perfectly acceptable
as long as the memory is TDX capable.

> The second approach is relatively more complicated. Instead of directly
> rejecting the new memory resource in __add_memory_resource(), we check whether
> the memory resource can be added based on the CMRs and the TDX module
> initialization status. This is feasible because, with the latest public
> P-SEAMLDR spec, we can get the CMRs from a P-SEAMLDR SEAMCALL [3]. So we can
> detect P-SEAMLDR and get the CMR info during kernel boot. Then in
> __add_memory_resource() we do the check below:
>
>     tdx_init_disable(); /* similar to cpu_hotplug_disable() */
>     if (tdx_module_initialized())
>         // reject memory hotplug
>     else if (new_memory_resource NOT in CMRs)
>         // reject memory hotplug
>     else
>         // allow memory hotplug
>     tdx_init_enable(); /* similar to cpu_hotplug_enable() */
>
> tdx_init_disable() temporarily disables TDX module initialization by trying to
> grab the mutex. If TDX module initialization is already ongoing, it waits
> until it completes.
>
> This should work better for future platforms, but would require non-trivially
> more code, as we need to add VMXON/VMXOFF support to the core kernel to detect
> the CMRs using SEAMCALL. A side advantage is that with VMXON in the core kernel
> we can shut down the TDX module in kexec().
>
> But for this series I think the second approach is overkill, and we can choose
> to use the first, simple approach?

This still sounds like it is trying to solve symptoms and not the root
problem. Why must the core-mm never have non-TDX memory when VMs are
fine to operate with either core-mm pages or memory from other sources
like hugetlbfs and device-dax?
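For readers following the second approach, the check Kai sketches boils down to two pieces: a range-containment test against the CMRs, and a lock that serializes that decision against TDX module initialization. The C below is a minimal userspace sketch under stated assumptions, not kernel code: struct cmr, range_in_cmrs(), tdx_may_add_memory(), tdx_mark_initialized() and the example CMR layout are all hypothetical names invented for illustration, a pthread mutex stands in for whatever kernel mutex would back tdx_init_disable()/tdx_init_enable(), and the CMR array is assumed to be sorted by base and non-overlapping.

```c
#include <assert.h>
#include <pthread.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical CMR descriptor covering [base, base + size). */
struct cmr {
	uint64_t base;
	uint64_t size;
};

/* Hypothetical example layout: 0-2 GiB, 2-4 GiB (adjacent), 8-9 GiB. */
static const struct cmr example_cmrs[] = {
	{ 0x000000000ULL, 0x80000000ULL },
	{ 0x080000000ULL, 0x80000000ULL },
	{ 0x200000000ULL, 0x40000000ULL },
};
#define N_EXAMPLE_CMRS (sizeof(example_cmrs) / sizeof(example_cmrs[0]))

/* Stand-ins (assumed) for the TDX module init mutex and init state. */
static pthread_mutex_t tdx_init_lock = PTHREAD_MUTEX_INITIALIZER;
static bool tdx_module_initialized;

/*
 * Return true if every byte of [start, end) lies inside some CMR.
 * Assumes cmrs[] is sorted by base and non-overlapping, so a range may
 * span several adjacent CMRs but must not touch a gap between them.
 */
static bool range_in_cmrs(const struct cmr *cmrs, size_t n,
			  uint64_t start, uint64_t end)
{
	uint64_t covered = start;	/* [start, covered) is known to be in CMRs */

	assert(start <= end);
	for (size_t i = 0; i < n && covered < end; i++) {
		uint64_t cmr_end = cmrs[i].base + cmrs[i].size;

		if (cmr_end <= covered)
			continue;	/* CMR lies entirely below the covered prefix */
		if (cmrs[i].base > covered)
			return false;	/* gap in coverage before this CMR */
		covered = cmr_end;	/* extend the covered prefix */
	}
	return covered >= end;
}

/*
 * Mirrors the quoted pseudocode: hold the init lock so initialization
 * cannot start (or must already have finished) while we decide.
 */
static bool tdx_may_add_memory(const struct cmr *cmrs, size_t n,
			       uint64_t start, uint64_t end)
{
	bool allow;

	pthread_mutex_lock(&tdx_init_lock);	/* ~ tdx_init_disable() */
	if (tdx_module_initialized)
		allow = false;	/* module already initialized: reject */
	else
		allow = range_in_cmrs(cmrs, n, start, end);
	pthread_mutex_unlock(&tdx_init_lock);	/* ~ tdx_init_enable() */
	return allow;
}

/* Hypothetical hook the module-init path would call once it finishes. */
static void tdx_mark_initialized(void)
{
	pthread_mutex_lock(&tdx_init_lock);
	tdx_module_initialized = true;
	pthread_mutex_unlock(&tdx_init_lock);
}
```

With the example layout, a 1-3 GiB range that spans the two adjacent CMRs is accepted, a 3-6 GiB range that runs into the 4-8 GiB gap is rejected, and once tdx_mark_initialized() has run every request is rejected. Only the range_in_cmrs() half speaks to the question of whether the memory being added is TDX capable; the locking half merely serializes hotplug against module initialization.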