From: Dan Williams
Date: Fri, 29 Apr 2022 14:20:11 -0700
Subject: Re: [PATCH v3 00/21] TDX host kernel support
To: Dave Hansen
Cc: Kai Huang, Linux Kernel Mailing List, KVM list, Sean Christopherson,
    Paolo Bonzini, "Brown, Len", "Luck, Tony", Rafael J Wysocki,
    Reinette Chatre, Peter Zijlstra, Andi Kleen, "Kirill A. Shutemov",
    Kuppuswamy Sathyanarayanan, Isaku Yamahata
In-Reply-To: <73ed1e55-7e7c-2995-b411-8e26b711cc22@intel.com>
X-Mailing-List: linux-kernel@vger.kernel.org

On Fri, Apr 29, 2022 at 12:20 PM Dave Hansen wrote:
>
> On 4/29/22 11:47, Dan Williams wrote:
> > On Fri, Apr 29, 2022 at 11:34 AM Dave Hansen wrote:
> >>
> >> On 4/29/22 10:48, Dan Williams wrote:
> >>>> But, neither of those really help with, say, a device-DAX mapping of
> >>>> TDX-*IN*capable memory handed to KVM. The "new syscall" would just
> >>>> throw up its hands and leave users with the same result: TDX can't be
> >>>> used. The new sysfs ABI for NUMA nodes wouldn't clearly apply to
> >>>> device-DAX because they don't respect the NUMA policy ABI.
> >>> They do have "target_node" attributes to associate node specific
> >>> metadata, and could certainly express target_node capabilities in its
> >>> own ABI. Then it's just a matter of making pfn_to_nid() do the right
> >>> thing so KVM kernel side can validate the capabilities of all inbound
> >>> pfns.
> >>
> >> Let's walk through how this would work with today's kernel on tomorrow's
> >> hardware, without KVM validating PFNs:
> >>
> >> 1. daxaddr = mmap("/dev/dax1234")
> >> 2. kvmfd = open("/dev/kvm")
> >> 3. ioctl(KVM_SET_USER_MEMORY_REGION, { daxaddr });
> >
> > At least for a file backed mapping the capability lookup could be done
> > here, no need to wait for the fault.
>
> For DAX mappings, sure. But, anything that's backed by page cache, you
> can't know until the RAM is allocated.
>
> ...
> >> Those pledges are hard for anonymous memory though.
To fulfill the > >> pledge, we not only have to validate that the NUMA policy is compatible > >> at KVM_SET_USER_MEMORY_REGION, we also need to decline changes to the > >> policy that might undermine the pledge. > > > > I think it's less that the kernel needs to enforce a pledge and more > > that an interface is needed to communicate the guest death reason. > > I.e. "here is the impossible thing you asked for, next time set this > > policy to avoid this problem". > > IF this code is booted on a system where non-TDX-capable memory is > discovered, do we: > 1. Disable TDX, printk() some nasty message, then boot as normal > or, > 2a. Boot normally with TDX enabled > 2b. Add enhanced error messages in case of TDH.MEM.PAGE.AUG/ADD failure > (the "SEAMCALLs" which are the last line of defense and will reject > the request to add non-TDX-capable memory to a guest). Or maybe > an even earlier message. > > For #1, if TDX is on, we are quite sure it will work. But, it will > probably throw up its hands on tomorrow's hardware. (This patch set). > > For #2, TDX might break (guests get killed) at runtime on tomorrow's > hardware, but it also might be just fine. Users might be able to work > around things by, for instance, figuring out a NUMA policy which > excludes TDX-incapable memory. (I think what Dan is looking for) > > Is that a fair summary? Yes, just the option for TDX and non-TDX to live alongside each other... although in the past I have argued to do option-1 and enforce it at the lowest level [1]. Like platform BIOS is responsible to disable CXL if CXL support for a given CPU security feature is missing. However, I think end users will want to have their confidential computing and capacity too. As long as that is not precluded to be added after the fact, option-1 can be a way forward until a concrete user for mixed mode shows up. Is there something already like this today for people that, for example, attempt to use PCI BAR mappings as memory? 
Or does KVM simply allow for garbage-in, garbage-out?

In the end, the patches shouldn't talk about whether PMEM is supported on a
platform; that's irrelevant. What matters is that misconfigurations can
happen, should be rare to non-existent on current platforms, and, if they
become a problem, the kernel can grow ABI to let userspace enumerate the
conflicts.

[1]: https://lore.kernel.org/linux-cxl/CAPcyv4jMQbHYQssaDDDQFEbOR1v14VUnejcSwOP9VGUnZSsCKw@mail.gmail.com/
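The idea raised in this thread, that the KVM side validate the capabilities of all inbound pfns and report why a request was rejected instead of just killing the guest, could look roughly like the range check below. This is only a sketch under stated assumptions, not code from the TDX patch set or from KVM: `struct tdx_capable_range` and `range_is_tdx_capable()` are hypothetical names, and the list of TDX-capable ranges is assumed to be already known, sorted by start pfn, and non-overlapping. It does not model pfn_to_nid() or any real kernel interface.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/*
 * Hypothetical descriptor for one TDX-capable physical range, expressed
 * as a half-open page-frame interval [start_pfn, end_pfn). This is not a
 * real kernel structure.
 */
struct tdx_capable_range {
	uint64_t start_pfn;
	uint64_t end_pfn;
};

/*
 * Hypothetical check: return true if every pfn in [pfn, pfn + npages) is
 * covered by ranges[], which must be sorted by start_pfn and must not
 * overlap. On failure, store the first uncovered pfn in *bad_pfn (if
 * bad_pfn is non-NULL) so the caller can report why the request failed.
 */
bool range_is_tdx_capable(const struct tdx_capable_range *ranges,
			  size_t nr_ranges, uint64_t pfn, uint64_t npages,
			  uint64_t *bad_pfn)
{
	const uint64_t end = pfn + npages;
	size_t i = 0;

	assert(ranges != NULL || nr_ranges == 0);

	while (pfn < end) {
		/* Skip ranges that end at or before the current pfn. */
		while (i < nr_ranges && ranges[i].end_pfn <= pfn)
			i++;

		/* No remaining range starts at or before pfn: pfn is uncovered. */
		if (i == nr_ranges || ranges[i].start_pfn > pfn) {
			if (bad_pfn)
				*bad_pfn = pfn;
			return false;
		}

		/* ranges[i] covers pfn; continue from the end of that range. */
		pfn = ranges[i].end_pfn;
	}
	return true;
}
```

Returning the first uncovered pfn, rather than a bare failure, is what would let a caller produce the kind of "here is the impossible thing you asked for" diagnostic discussed above.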