Message-ID: <73ed1e55-7e7c-2995-b411-8e26b711cc22@intel.com>
Date: Fri, 29 Apr 2022 12:20:40 -0700
Subject: Re: [PATCH v3 00/21] TDX host kernel support
From: Dave Hansen
To: Dan Williams
Cc: Kai Huang, Linux Kernel Mailing List, KVM list, Sean Christopherson,
    Paolo Bonzini, "Brown, Len", "Luck, Tony", Rafael J Wysocki,
    Reinette Chatre, Peter Zijlstra, Andi Kleen, "Kirill A. Shutemov",
    Kuppuswamy Sathyanarayanan, Isaku Yamahata
References: <522e37eb-68fc-35db-44d5-479d0088e43f@intel.com>
    <92af7b22-fa8a-5d42-ae15-8526abfd2622@intel.com>
    <4a5143cc-3102-5e30-08b4-c07e44f1a2fc@intel.com>
    <4d0c7316-3564-ef27-1113-042019d583dc@intel.com>
X-Mailing-List: linux-kernel@vger.kernel.org

On 4/29/22 11:47, Dan Williams wrote:
> On Fri, Apr 29, 2022 at 11:34 AM Dave Hansen wrote:
>>
>> On 4/29/22 10:48, Dan Williams wrote:
>>>> But, neither of those really help with, say, a device-DAX mapping of
>>>> TDX-*IN*capable memory handed to KVM. The "new syscall" would just
>>>> throw up its hands and leave users with the same result: TDX can't be
>>>> used. The new sysfs ABI for NUMA nodes wouldn't clearly apply to
>>>> device-DAX because they don't respect the NUMA policy ABI.
>>> They do have "target_node" attributes to associate node specific
>>> metadata, and could certainly express target_node capabilities in its
>>> own ABI. Then it's just a matter of making pfn_to_nid() do the right
>>> thing so KVM kernel side can validate the capabilities of all inbound
>>> pfns.
>>
>> Let's walk through how this would work with today's kernel on tomorrow's
>> hardware, without KVM validating PFNs:
>>
>> 1. daxaddr = mmap("/dev/dax1234")
>> 2. kvmfd = open("/dev/kvm")
>> 3. ioctl(KVM_SET_USER_MEMORY_REGION, { daxaddr });
>
> At least for a file backed mapping the capability lookup could be done
> here, no need to wait for the fault.

For DAX mappings, sure.
But, anything that's backed by page cache, you can't know until the RAM
is allocated.

...
>> Those pledges are hard for anonymous memory though. To fulfill the
>> pledge, we not only have to validate that the NUMA policy is compatible
>> at KVM_SET_USER_MEMORY_REGION, we also need to decline changes to the
>> policy that might undermine the pledge.
>
> I think it's less that the kernel needs to enforce a pledge and more
> that an interface is needed to communicate the guest death reason.
> I.e. "here is the impossible thing you asked for, next time set this
> policy to avoid this problem".

IF this code is booted on a system where non-TDX-capable memory is
discovered, do we:

 1. Disable TDX, printk() some nasty message, then boot as normal

or,

 2a. Boot normally with TDX enabled
 2b. Add enhanced error messages in case of TDH.MEM.PAGE.AUG/ADD failure
     (the "SEAMCALLs" which are the last line of defense and will reject
     the request to add non-TDX-capable memory to a guest). Or maybe an
     even earlier message.

For #1, if TDX is on, we are quite sure it will work. But, it will
probably throw up its hands on tomorrow's hardware. (This patch set).

For #2, TDX might break (guests get killed) at runtime on tomorrow's
hardware, but it also might be just fine. Users might be able to work
around things by, for instance, figuring out a NUMA policy which
excludes TDX-incapable memory. (I think what Dan is looking for)

Is that a fair summary?
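
To make the difference between #1 and #2 concrete, here is a rough
userspace model in C. It is only a sketch and not kernel code: struct
mem_range, enum boot_policy, tdx_enabled_after_boot() and
pfn_tdx_capable() are all hypothetical names made up for illustration.
In particular, pfn_tdx_capable() merely stands in for the runtime
rejection that, in the real system, would come from the
TDH.MEM.PAGE.AUG/ADD SEAMCALLs.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical description of one physical memory range (not a kernel type). */
struct mem_range {
	unsigned long start_pfn;
	unsigned long nr_pages;
	bool tdx_capable;
};

/* The two boot-time choices from the list above (names are illustrative). */
enum boot_policy {
	POLICY_DISABLE_TDX,      /* option 1: any incapable range turns TDX off */
	POLICY_DEFER_TO_RUNTIME, /* option 2: keep TDX on, fail later per page */
};

/* Decide whether TDX stays enabled once all ranges have been discovered. */
bool tdx_enabled_after_boot(enum boot_policy policy,
			    const struct mem_range *ranges, size_t n)
{
	assert(ranges != NULL || n == 0);

	if (policy == POLICY_DEFER_TO_RUNTIME)
		return true;

	for (size_t i = 0; i < n; i++) {
		if (!ranges[i].tdx_capable)
			return false;
	}
	return true;
}

/*
 * Runtime stand-in for "would adding this page to a guest be rejected?".
 * A pfn that falls in no known range is treated as incapable.
 */
bool pfn_tdx_capable(const struct mem_range *ranges, size_t n,
		     unsigned long pfn)
{
	for (size_t i = 0; i < n; i++) {
		if (pfn >= ranges[i].start_pfn &&
		    pfn - ranges[i].start_pfn < ranges[i].nr_pages)
			return ranges[i].tdx_capable;
	}
	return false;
}
```

With one capable and one incapable range, POLICY_DISABLE_TDX reports TDX
off for the whole system. POLICY_DEFER_TO_RUNTIME leaves it on, and only
pages from the incapable range fail the per-pfn check, which is the case
where a suitable NUMA policy could let guests keep working.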