Date: Mon, 26 Feb 2024 07:27:15 -0800
X-Mailing-List: linux-kernel@vger.kernel.org
References: <20240110002340.485595-1-seanjc@google.com>
 <170864656017.3080257.14048100709856204250.b4-ty@google.com>
Subject: Re: [PATCH] x86/cpu: Add a VMX flag to enumerate 5-level EPT support to userspace
From: Sean Christopherson
To: Tao Su
Cc:
 Xiaoyao Li, Paolo Bonzini, Thomas Gleixner, Ingo Molnar, Borislav Petkov,
 Dave Hansen, x86@kernel.org, kvm@vger.kernel.org,
 linux-kernel@vger.kernel.org, Yi Lai, Xudong Hao
Content-Type: text/plain; charset="us-ascii"

On Mon, Feb 26, 2024, Tao Su wrote:
> On Mon, Feb 26, 2024 at 09:30:33AM +0800, Xiaoyao Li wrote:
> > On 2/23/2024 9:35 AM, Sean Christopherson wrote:
> > > On Tue, 09 Jan 2024 16:23:40 -0800, Sean Christopherson wrote:
> > > > Add a VMX flag in /proc/cpuinfo, ept_5level, so that userspace can query
> > > > whether or not the CPU supports 5-level EPT paging. EPT capabilities are
> > > > enumerated via MSR, i.e. aren't accessible to userspace without help from
> > > > the kernel, and knowing whether or not 5-level EPT is supported is sadly
> > > > necessary for userspace to correctly configure KVM VMs.
> > > >
> > > > When EPT is enabled, bits 51:49 of guest physical addresses are consumed
> > > > if and only if 5-level EPT is enabled. For CPUs with MAXPHYADDR > 48, KVM
> > > > *can't* map all legal guest memory if 5-level EPT is unsupported, e.g.
> > > > creating a VM with RAM (or anything that gets stuffed into KVM's memslots)
> > > > above bit 48 will be completely broken.
> > > >
> > > > [...]
> > >
> > > Applied to kvm-x86 vmx, with a massaged changelog to avoid presenting this as a
> > > bug fix (and finally fixed the 51:49=>51:48 goof):
> > >
> > > Add a VMX flag in /proc/cpuinfo, ept_5level, so that userspace can query
> > > whether or not the CPU supports 5-level EPT paging. EPT capabilities are
> > > enumerated via MSR, i.e. aren't accessible to userspace without help from
> > > the kernel, and knowing whether or not 5-level EPT is supported is useful
> > > for debug, triage, testing, etc.
> > > For example, when EPT is enabled, bits 51:48 of guest physical addresses
> > > are consumed by the CPU if and only if 5-level EPT is enabled.
> > > For CPUs
> > > with MAXPHYADDR > 48, KVM *can't* map all legal guest memory if 5-level
> > > EPT is unsupported, making it more or less necessary to know whether or
> > > not 5-level EPT is supported.
> > >
> > > [1/1] x86/cpu: Add a VMX flag to enumerate 5-level EPT support to userspace
> > > https://github.com/kvm-x86/linux/commit/b1a3c366cbc7
> >
> > Do we need a new KVM CAP for this? This decides how to interact with old
> > kernel without this patch. In that case, no ept_5level in /proc/cpuinfo,
> > what should we do in the absence of ept_5level? treat it only 4 level EPT
> > supported?
>
> Maybe also adding flag for 4-level EPT can be an option. If userspace
> checks both 4-level and 5-level are not in /proc/cpuinfo, it can regard
> the kernel as old.

The intent is that this is informational only, not something that userspace
can or should use to make decisions about how to configure KVM guests.

As pointed out elsewhere in the thread, simply restricting guest.MAXPHYADDR
to 48 doesn't actually create an architecturally viable VM. At the very
least, KVM needs to be configured with allow_smaller_maxphyaddr=1, and aside
from the gaping holes in KVM related to that knob, AIUI
allow_smaller_maxphyaddr=1 isn't an option in this case due to other
quirks/flaws with the CPU in question.

I don't think there's been an on-list summary posted, but the plan is to
figure out a way to inform guest firmware of the max _usable_ physical
address, so that firmware doesn't create BARs and whatnot in memory that KVM
can't map. And then have KVM relay the usable guest.MAXPHYADDR to userspace.
That way userspace doesn't need to infer the effective guest.MAXPHYADDR from
EPT knobs.
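
For debug and triage purposes only (per the above, not for configuring
guests), here is a minimal Python sketch of how the flag could be read and
what it implies for mappable guest physical address bits. It assumes the flag
shows up on the "vmx flags" line of /proc/cpuinfo under the name ept_5level,
as the changelog describes. The helper names vmx_flags() and
mappable_gpa_bits() are made up for illustration. The 48-bit and 57-bit
widths are the standard address widths of 4-level and 5-level page walks
(12 offset bits plus 9 bits per level).

```python
def vmx_flags(cpuinfo_text):
    """Return the flags on the first 'vmx flags' line as a set.

    Returns an empty set when the line is absent, e.g. on a kernel that
    does not report VMX flags or on a CPU without VMX.
    """
    for line in cpuinfo_text.splitlines():
        key, sep, value = line.partition(":")
        if sep and key.strip() == "vmx flags":
            return set(value.split())
    return set()


def mappable_gpa_bits(maxphyaddr, ept_5level):
    """Upper bound on guest physical address bits an EPT walk can map.

    Hypothetical helper: a 4-level walk translates 48 address bits and a
    5-level walk translates 57, so the mappable width is capped by both
    the walk depth and the CPU's MAXPHYADDR.
    """
    ept_bits = 57 if ept_5level else 48
    return min(maxphyaddr, ept_bits)
```

On a live system the text would come from open("/proc/cpuinfo").read(). Note
that an absent ept_5level flag is ambiguous: it can mean a CPU without
5-level EPT or a kernel without this patch, which is the concern raised in
the quoted replies.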