Date: Mon, 9 May 2022 13:33:56 +0300
From: Mike Rapoport
To: Kai Huang
Cc: Dan Williams, Dave Hansen, Linux Kernel Mailing List, KVM list,
    Sean Christopherson, Paolo Bonzini, "Brown, Len", "Luck, Tony",
    Rafael J Wysocki, Reinette Chatre, Peter Zijlstra, Andi Kleen,
    "Kirill A. Shutemov", Kuppuswamy Sathyanarayanan, Isaku Yamahata
Subject: Re: [PATCH v3 00/21] TDX host kernel support
In-Reply-To: <5c7196b517398e7697464fe997018e9031d15470.camel@intel.com>

On Sun, May 08, 2022 at 10:00:39PM +1200, Kai Huang wrote:
> On Fri, 2022-05-06 at 20:09 -0400, Mike Rapoport wrote:
> > On Thu, May 05, 2022 at 06:51:20AM -0700, Dan Williams wrote:
> > > [ add Mike ]
> > >
> > > On Thu, May 5, 2022 at 2:54 AM Kai Huang wrote:
> > > [..]
> > > >
> > > > Hi Dave,
> > > >
> > > > Sorry to ping (trying to close this).
> > > >
> > > > Given we don't need to consider kmem hot-add of legacy PMEM after TDX
> > > > module initialization, I think for now it's totally fine to exclude
> > > > legacy PMEMs from TDMRs. The worst case is that if someone tries to use
> > > > them directly as a TD guest backend, the TD will fail to create. IMO
> > > > that's acceptable, as no one is supposed to just use some random backend
> > > > to run a TD.
> > >
> > > The platform will already do this, right? I don't understand why this
> > > is trying to take proactive action versus documenting the error
> > > conditions and steps someone needs to take to avoid unconvertible
> > > memory. There is already CONFIG_HMEM_REPORTING that describes
> > > relative performance properties between initiators and targets; it
> > > seems fitting to also add security properties between initiators and
> > > targets so someone can enumerate the numa-mempolicy that avoids
> > > unconvertible memory.
> > >
> > > No special casing in hotplug code paths needed.
> > >
> > > >
> > > > I think, w/o needing to include legacy PMEM, it's better to get all TDX
> > > > memory blocks from memblock rather than e820. The pages managed by the
> > > > page allocator come from memblock anyway (excluding those from memory
> > > > hotplug).
> > > >
> > > > And I also think it makes more sense to introduce 'tdx_memblock' and
> > > > 'tdx_memory' data structures to gather all TDX memory blocks during boot
> > > > while memblock is still alive. When the TDX module is initialized at
> > > > runtime, TDMRs can be created based on the 'struct tdx_memory' which
> > > > contains all the TDX memory blocks we gathered from memblock during boot.
> > > > This is also more flexible for supporting TDX memory from other sources,
> > > > such as CXL memory, in the future.
> > > >
> > > > Please let me know if you have any objection. Thanks!
> > >
> > > It's already the case that x86 maintains sideband structures to
> > > preserve memory after exiting the early memblock code. Mike, correct
> > > me if I am wrong, but adding more is less desirable than just keeping
> > > the memblock around?
> >
> > TBH, I didn't read the entire thread yet, but at first glance, keeping
> > memblock around is much more preferable than adding yet another
> > { .start, .end, .flags } data structure. To keep memblock after boot,
> > all that is needed is something like
> >
> > 	select ARCH_KEEP_MEMBLOCK if INTEL_TDX_HOST
> >
> > I'll take a closer look at the entire series next week; maybe I'm
> > missing some details.
>
> Hi Mike,
>
> Thanks for the feedback.
>
> Perhaps I haven't given a lot of detail about the new TDX data structures, so
> let me point out that the two new data structures I am proposing, 'struct
> tdx_memblock' and 'struct tdx_memory', are mostly supposed to be used by TDX
> code only, which is pretty standalone. They are not supposed to be some basic
> infrastructure that can be widely used by other random kernel components.

We already have the "pretty standalone" numa_meminfo that originally was used
to set up the NUMA memory topology, but now it's used by other code as well.
And the e820 tables also contain similar data and are supposed to be used only
at boot time, but in reality there are too many callbacks into e820 well after
the system is booted.

So any additional memory representation will only add to the overall
complexity, and we'll have even more "eventually consistent" collections of
{ .start, .end, .flags } structures.

> In fact, currently the only operation we need is to allow memblock to register
> all memory regions as TDX memory blocks while memblock is still alive.
> Therefore the new data structures can even be completely invisible to other
> kernel components. For instance, TDX code can provide the API below w/o
> exposing any data structures to other kernel components:
>
> 	int tdx_add_memory_block(phys_addr_t start, phys_addr_t end, int nid);
>
> And we call the above API for each memory region in memblock while it is
> alive.
>
> TDX code internally manages those memory regions via the new data structures
> that I mentioned above, so we don't need to keep memblock after boot. The
> advantage of this approach is that it is more flexible for supporting other
> potential TDX memory resources (such as CXL memory) in the future.

Please let's keep things simple. If other TDX memory resources need different
handling, that can be implemented then.

For now, just enable ARCH_KEEP_MEMBLOCK and use memblock to track TDX memory.

> Otherwise, we can do as you suggested: select ARCH_KEEP_MEMBLOCK when
> INTEL_TDX_HOST is on, and TDX code internally uses the memblock API directly.
>
> --
> Thanks,
> -Kai

--
Sincerely yours,
Mike.
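
For readers skimming the thread, below is a minimal sketch of the registration
scheme being debated, appended for illustration only; it is not code from the
patch series. The tdx_add_memory_block() signature is the one Kai proposes
above, but everything else is a hypothetical fill-in: the field layout of
struct tdx_memblock and struct tdx_memory, the list-based bookkeeping, and the
tdx_register_boot_memory() walk that wires the proposal to memblock via
for_each_mem_pfn_range() are assumptions made for the sake of the example.

/*
 * Illustrative sketch only -- not part of the TDX patch series.
 * tdx_add_memory_block() is the interface proposed in this thread; the
 * structures and the boot-time walk are hypothetical.
 */
#include <linux/errno.h>
#include <linux/init.h>
#include <linux/list.h>
#include <linux/memblock.h>
#include <linux/numa.h>
#include <linux/pfn.h>
#include <linux/slab.h>

/* One convertible memory range gathered from memblock (assumed layout). */
struct tdx_memblock {
	struct list_head list;
	phys_addr_t start;
	phys_addr_t end;
	int nid;
};

/* All ranges gathered during boot; TDMRs would later be built from this. */
struct tdx_memory {
	struct list_head tmb_list;
};

static struct tdx_memory tdx_mem = {
	.tmb_list = LIST_HEAD_INIT(tdx_mem.tmb_list),
};

/* The API proposed in the thread: record one range as a TDX memory block. */
int tdx_add_memory_block(phys_addr_t start, phys_addr_t end, int nid)
{
	struct tdx_memblock *tmb;

	tmb = kzalloc(sizeof(*tmb), GFP_KERNEL);
	if (!tmb)
		return -ENOMEM;

	tmb->start = start;
	tmb->end = end;
	tmb->nid = nid;
	list_add_tail(&tmb->list, &tdx_mem.tmb_list);

	return 0;
}

/*
 * Hypothetical caller: walk every region memblock knows about, together
 * with the node it belongs to, and register it as a TDX memory block.
 * Hooked up here as an early initcall purely for illustration, while the
 * memblock data is still present.
 */
static int __init tdx_register_boot_memory(void)
{
	unsigned long start_pfn, end_pfn;
	int i, nid, ret;

	for_each_mem_pfn_range(i, MAX_NUMNODES, &start_pfn, &end_pfn, &nid) {
		ret = tdx_add_memory_block(PFN_PHYS(start_pfn),
					   PFN_PHYS(end_pfn), nid);
		if (ret)
			return ret;
	}

	return 0;
}
early_initcall(tdx_register_boot_memory);

Under Mike's preferred alternative, the series would instead select
ARCH_KEEP_MEMBLOCK if INTEL_TDX_HOST and run essentially the same
for_each_mem_pfn_range() walk at TDX module initialization time, so no
separate tdx_memblock bookkeeping would need to be carried past boot.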