Date: Wed, 10 Jan 2024 08:26:25 -0800
X-Mailing-List: linux-kernel@vger.kernel.org
References: <20240110002340.485595-1-seanjc@google.com>
Subject: Re: [PATCH] x86/cpu: Add a VMX flag to enumerate 5-level EPT support to userspace
From: Sean Christopherson
To: Chao Gao
Cc: Paolo Bonzini, Thomas Gleixner, Ingo Molnar, Borislav Petkov, Dave Hansen, x86@kernel.org, kvm@vger.kernel.org, linux-kernel@vger.kernel.org, Yi Lai, Tao Su, Xudong Hao

On Wed, Jan 10, 2024, Chao Gao wrote:
> On Tue, Jan 09, 2024 at 04:23:40PM -0800, Sean Christopherson wrote:
> >Add a VMX flag in /proc/cpuinfo, ept_5level, so that userspace can query
> >whether or not the CPU supports 5-level EPT paging. EPT capabilities are
> >enumerated via MSR, i.e. aren't accessible to userspace without help from
> >the kernel, and knowing whether or not 5-level EPT is supported is sadly
> >necessary for userspace to correctly configure KVM VMs.
>
> This assumes procfs is enabled in Kconfig and userspace has permission to
> access /proc/cpuinfo. But it isn't always true. So, I think it is better to
> advertise max addressable GPA via KVM ioctls.

Hrm, so the help for PROC_FS says:

  Several programs depend on this, so everyone should say Y here.

Given that this is working around something that is borderline an erratum,
I'm inclined to say that userspace should simply assume the worst if /proc
isn't available. Practically speaking, I don't think a "real" VM is likely
to be affected; AFAIK, there's no reason for QEMU or any other VMM to _need_
to expose a memslot at GPA[51:48] unless the VM really has however much
memory that is (hundreds of terabytes?). And if someone is trying to run
such a massive VM on such a goofy CPU...

I don't think it's unreasonable for KVM selftests to require access to
/proc/cpuinfo. Or actually, they can probably do the same thing and
self-limit to 48-bit addresses if /proc/cpuinfo isn't available.

I'm not totally opposed to adding a more programmatic way for userspace to
query 5-level EPT support; it just seems unnecessary. E.g.
unlike CPUID, userspace can't directly influence whether or not KVM uses
5-level EPT. Even in hindsight, I'm not entirely sure KVM should expose such
a knob, as it raises questions around interactions between guest.MAXPHYADDR
and memslots that I would rather avoid.

And even if we do add such uAPI, enumerating 5-level EPT in /proc/cpuinfo is
definitely worthwhile; the only thing that would need to be tweaked is the
justification in the changelog.

One thing we can do irrespective of feature enumeration is have
kvm_mmu_page_fault() exit to userspace with an explicit error if the guest
faults on a GPA that KVM knows it can't map, i.e. exit with
KVM_EXIT_INTERNAL_ERROR or maybe even KVM_EXIT_MEMORY_FAULT instead of
looping indefinitely.
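For illustration only, a minimal userspace sketch of the "assume the worst"
fallback, assuming the flag shows up as an ept_5level token on the
"vmx flags" line of /proc/cpuinfo as proposed in the patch. The helpers
cpuinfo_has_ept_5level() and max_mappable_gpa_bits() are hypothetical, not
existing selftest or QEMU APIs, and the caller is assumed to already know
guest.MAXPHYADDR (e.g. from CPUID.0x80000008). If /proc/cpuinfo can't be
opened or lacks the flag, it self-limits to 48-bit GPAs, i.e. 2^48 bytes =
256 TiB of guest-physical address space.

```c
#include <stdbool.h>
#include <stdio.h>
#include <string.h>

/*
 * Hypothetical helper: true iff the "vmx flags" line of the given cpuinfo
 * file contains the whole token "ept_5level". Returns false if the file
 * can't be opened (no procfs, no permission), i.e. assume the worst.
 */
static bool cpuinfo_has_ept_5level(const char *path)
{
	char line[8192];
	bool found = false;
	FILE *f = fopen(path, "r");

	if (!f)
		return false;

	while (!found && fgets(line, sizeof(line), f)) {
		char *flags, *tok;

		/* Only inspect "vmx flags\t: ..." lines, not the generic "flags" line. */
		if (strncmp(line, "vmx flags", strlen("vmx flags")) != 0)
			continue;

		flags = strchr(line, ':');
		if (!flags)
			continue;

		/* Compare whole space-separated tokens rather than substrings. */
		for (tok = strtok(flags + 1, " \t\n"); tok; tok = strtok(NULL, " \t\n")) {
			if (strcmp(tok, "ept_5level") == 0) {
				found = true;
				break;
			}
		}
	}

	fclose(f);
	return found;
}

/*
 * Hypothetical helper: number of GPA bits a VMM can safely back with
 * memslots. 4-level EPT translates 48-bit GPAs, 5-level EPT 57-bit GPAs;
 * the result is further capped by guest.MAXPHYADDR (assumed known).
 */
static unsigned int max_mappable_gpa_bits(const char *path, unsigned int maxphyaddr)
{
	unsigned int ept_bits = cpuinfo_has_ept_5level(path) ? 57 : 48;

	return maxphyaddr < ept_bits ? maxphyaddr : ept_bits;
}
```

A VMM or selftest could then avoid creating memslots at or above
1ULL << max_mappable_gpa_bits("/proc/cpuinfo", maxphyaddr). Matching whole
tokens instead of using strstr() keeps a longer flag name that merely
contains "ept_5level" from producing a false positive.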