Message-ID: <4a5143cc-3102-5e30-08b4-c07e44f1a2fc@intel.com>
Date: Fri, 29 Apr 2022 10:18:09 -0700
Subject: Re: [PATCH v3 00/21] TDX host kernel support
To: Dan Williams
Cc: Kai Huang, Linux Kernel Mailing List, KVM list, Sean Christopherson,
 Paolo Bonzini, "Brown, Len", "Luck, Tony", Rafael J Wysocki,
 Reinette Chatre, Peter Zijlstra, Andi Kleen, "Kirill A. Shutemov",
 Kuppuswamy Sathyanarayanan, Isaku Yamahata
References: <522e37eb-68fc-35db-44d5-479d0088e43f@intel.com>
 <92af7b22-fa8a-5d42-ae15-8526abfd2622@intel.com>
From: Dave Hansen
X-Mailing-List: linux-kernel@vger.kernel.org

On 4/29/22 08:18, Dan Williams wrote:
> Yes, I want to challenge the idea that all core-mm memory must be TDX
> capable. Instead, this feels more like something that wants a
> hugetlbfs / dax-device like capability to ask the kernel to gather /
> set-aside the enumerated TDX memory out of all the general purpose
> memory it knows about and then VMs use that ABI to get access to
> convertible memory. Trying to ensure that all page allocator memory is
> TDX capable feels too restrictive with all the different ways pfns can
> get into the allocator.

The KVM users are the problem here. They use a variety of ABIs to get
memory and then hand it to KVM. KVM basically just consumes the
physical addresses from the page tables.

Also, there's no _practical_ problem here today. I can't actually
think of a case where any memory that ends up in the allocator on
today's TDX systems is not TDX capable.

Tomorrow's systems are going to be the problem. They'll (presumably)
have a mix of CXL devices that will have varying capabilities. Some
will surely lack the metadata storage for checksums and TD-owner bits.
TDX use will be *safe* on those systems: if you take this code and run
it on one of tomorrow's systems, it will notice the TDX-incompatible
memory and will disable TDX.

The only way around this that I can see is to introduce ABI today that
anticipates the needs of the future systems. We could require that all
the KVM memory be "validated" before handing it to TDX. Maybe a new
syscall that says: "make sure this mapping works for TDX". Or it could
be a new sysfs ABI which specifies which NUMA nodes contain TDX-capable
memory.

But neither of those really helps with, say, a device-DAX mapping of
TDX-*IN*capable memory handed to KVM. The "new syscall" would just
throw up its hands and leave users with the same result: TDX can't be
used. The new sysfs ABI for NUMA nodes wouldn't clearly apply to
device-DAX because device-DAX mappings don't respect the NUMA policy
ABI.

I'm open to ideas here. If there's a viable ABI we can introduce to
train TDX users today that will work tomorrow too, I'm all for it.
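
To make the "validate before handing it to TDX" idea a bit more concrete, here is a rough userspace-style sketch. Everything in it is hypothetical: there is no sysfs file listing TDX-capable nodes and no validation syscall in this series, and the names parse_nodelist() and tdx_usable() are made up for illustration. It assumes the node list would use the same "0-1,3" format the kernel already uses for other node lists, and it models a device-DAX mapping as memory with no usable node information.

```python
# Hypothetical sketch: none of these names exist in the kernel or in this series.

def parse_nodelist(text):
    """Parse a kernel-style list such as "0-1,3" into a set of node IDs."""
    nodes = set()
    for part in text.strip().split(","):
        if not part:
            continue
        lo, _, hi = part.partition("-")
        nodes.update(range(int(lo), int(hi or lo) + 1))
    return nodes


def tdx_usable(mappings, tdx_nodes):
    """Return True only if every mapping is known to sit on a TDX-capable node.

    Each mapping is (name, node), where node is None when the backing
    memory has no usable NUMA node information (the device-DAX case).
    """
    return all(node is not None and node in tdx_nodes for _, node in mappings)


# Assumed contents of a hypothetical sysfs file listing TDX-capable nodes.
tdx_nodes = parse_nodelist("0-1,3\n")

guest_ok = [("anon", 0), ("hugetlbfs", 3)]
guest_cxl = [("anon", 0), ("cxl", 2)]
guest_dax = [("anon", 1), ("device-dax", None)]

for name, guest in (("ok", guest_ok), ("cxl", guest_cxl), ("dax", guest_dax)):
    print(name, "TDX usable" if tdx_usable(guest, tdx_nodes) else "TDX can't be used")
```

The device-DAX guest fails the check for the reason given above: a per-node list has nothing to say about memory that does not follow the NUMA policy ABI, so the only safe answer is that TDX can't be used.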