Date: Thu, 30 Nov 2023 17:51:28 -0800
In-Reply-To: <3ad69657ba8e1b19d150db574193619cf0cb34df.camel@redhat.com>
References: <20231110235528.1561679-1-seanjc@google.com> <20231110235528.1561679-4-seanjc@google.com> <3ad69657ba8e1b19d150db574193619cf0cb34df.camel@redhat.com>
Subject: Re: [PATCH 3/9] KVM: x86: Initialize guest cpu_caps based on guest CPUID
From: Sean Christopherson
To: Maxim Levitsky
Cc: Paolo Bonzini, kvm@vger.kernel.org, linux-kernel@vger.kernel.org

On Sun, Nov 19, 2023, Maxim Levitsky wrote:
> On Fri, 2023-11-10 at 15:55 -0800, Sean Christopherson wrote:
> > +/*
> > + * This isn't truly "unsafe", but all callers except kvm_cpu_after_set_cpuid()
> > + * should use __cpuid_entry_get_reg(), which provides compile-time validation
> > + * of the input.
> > + */
> > +static u32 cpuid_get_reg_unsafe(struct kvm_cpuid_entry2 *entry, u32 reg)
> > +{
> > +	switch (reg) {
> > +	case CPUID_EAX:
> > +		return entry->eax;
> > +	case CPUID_EBX:
> > +		return entry->ebx;
> > +	case CPUID_ECX:
> > +		return entry->ecx;
> > +	case CPUID_EDX:
> > +		return entry->edx;
> > +	default:
> > +		WARN_ON_ONCE(1);
> > +		return 0;
> > +	}
> > +}

...

> >  static void kvm_vcpu_after_set_cpuid(struct kvm_vcpu *vcpu)
> >  {
> >  	struct kvm_lapic *apic = vcpu->arch.apic;
> >  	struct kvm_cpuid_entry2 *best;
> >  	bool allow_gbpages;
> > +	int i;
> >
> > -	memset(vcpu->arch.cpu_caps, 0, sizeof(vcpu->arch.cpu_caps));
> > +	BUILD_BUG_ON(ARRAY_SIZE(reverse_cpuid) != NR_KVM_CPU_CAPS);
> > +
> > +	/*
> > +	 * Reset guest capabilities to userspace's guest CPUID definition, i.e.
> > +	 * honor userspace's definition for features that don't require KVM or
> > +	 * hardware management/support (or that KVM simply doesn't care about).
> > +	 */
> > +	for (i = 0; i < NR_KVM_CPU_CAPS; i++) {
> > +		const struct cpuid_reg cpuid = reverse_cpuid[i];
> > +
> > +		best = kvm_find_cpuid_entry_index(vcpu, cpuid.function, cpuid.index);
> > +		if (best)
> > +			vcpu->arch.cpu_caps[i] = cpuid_get_reg_unsafe(best, cpuid.reg);
>
> Why not just use __cpuid_entry_get_reg?
>
> cpuid.reg comes from read/only 'reverse_cpuid' anyway, and in fact
> it seems that all callers of __cpuid_entry_get_reg, take the reg value from
> x86_feature_cpuid() which also takes it from 'reverse_cpuid'.
>
> So if the compiler is smart enough to not complain in these cases, I don't
> see why this case is different.

It's because the input isn't a compile-time constant, and so the BUILD_BUG() in
the default path will fire. All of the compile-time assertions in
reverse_cpuid.h rely on the feature being a constant value, which allows the
compiler to optimize away the dead paths, i.e. turn __cpuid_entry_get_reg()'s
switch statement into simple pointer arithmetic and thus omit the BUILD_BUG()
code. Here, the register is read from reverse_cpuid[i] with a runtime loop
index, so the compiler can't prove the default path is dead.

> Also why not to initialize guest_caps = host_caps & userspace_cpuid?
>
> If this was the default we won't need any guest_cpu_cap_restrict and such,
> instead it will just work.

Hrm, I definitely like the idea. Unfortunately, unless we do an audit of all
~120 uses of guest_cpuid_has(), restricting those based on kvm_cpu_caps might
break userspace.

Aside from purging the governed feature nomenclature, the main goal of this
series is to provide a way to do fast lookups of all known guest CPUID bits
without needing to opt-in on a feature-by-feature basis, including for features
that are fully controlled by userspace.

It's definitely doable, but I'm not all that confident that the end result would
be a net positive, e.g. I believe we would need to special case things like the
feature bits that gate MSR_IA32_SPEC_CTRL and MSR_IA32_PRED_CMD.
MOVBE and RDPID are other features that come to mind, where KVM emulates the
feature in software but it won't be set in kvm_cpu_caps. Oof, and MONITOR and
MWAIT too, as KVM deliberately doesn't advertise those to userspace.

So yeah, I'm not opposed to trying that route at some point, but I really don't
want to do that in this series as the risk of subtly breaking something is super
high.

> Special code will only be needed in few more complex cases, like forced exposed
> of a feature to a guest due to a virtualization hole.
>
> > +		else
> > +			vcpu->arch.cpu_caps[i] = 0;
> > +	}
> >
> >  	/*
> >  	 * If TDP is enabled, let the guest use GBPAGES if they're supported in
> > @@ -342,8 +380,7 @@ static void kvm_vcpu_after_set_cpuid(struct kvm_vcpu *vcpu)
> >  	 */
> >  	allow_gbpages = tdp_enabled ? boot_cpu_has(X86_FEATURE_GBPAGES) :
> >  				      guest_cpuid_has(vcpu, X86_FEATURE_GBPAGES);
> > -	if (allow_gbpages)
> > -		guest_cpu_cap_set(vcpu, X86_FEATURE_GBPAGES);
> > +	guest_cpu_cap_change(vcpu, X86_FEATURE_GBPAGES, allow_gbpages);
>
> IMHO the original code was more readable, now I need to look up the
> 'guest_cpu_cap_change()' to understand what is going on.

The change is "necessary". The issue is that with the caps 0-initialized, the
!allow_gbpages path could simply do nothing. Now that the caps are seeded from
userspace's CPUID, KVM needs to explicitly clear the flag, i.e. would need to
do:

	if (allow_gbpages)
		guest_cpu_cap_set(vcpu, X86_FEATURE_GBPAGES);
	else
		guest_cpu_cap_clear(vcpu, X86_FEATURE_GBPAGES);

I don't much love the name either, but it pairs with cpuid_entry_change() and I
want to keep the kvm_cpu_cap, cpuid_entry, and guest_cpu_cap APIs in sync with
each other. The only reason kvm_cpu_cap_change() doesn't exist is that there
aren't any flows that need to toggle a bit.

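To make the set-vs-change point concrete, here is a minimal standalone sketch.
It is a toy userspace model, not KVM code: struct toy_vcpu, the toy_cap_*()
helpers, the toy_*_flow() functions, and the bit position used for
TOY_FEATURE_GBPAGES are all hypothetical names made up for illustration. It
only assumes what is described above, i.e. that the caps used to start out
zeroed and are now seeded from userspace's CPUID.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical stand-in for the per-vCPU capability word; not a KVM type. */
struct toy_vcpu {
	uint32_t cpu_caps;
};

/* Bit position picked for the sketch only. */
#define TOY_FEATURE_GBPAGES 26u

static void toy_cap_set(struct toy_vcpu *vcpu, unsigned int bit)
{
	assert(bit < 32);
	vcpu->cpu_caps |= UINT32_C(1) << bit;
}

static void toy_cap_clear(struct toy_vcpu *vcpu, unsigned int bit)
{
	assert(bit < 32);
	vcpu->cpu_caps &= ~(UINT32_C(1) << bit);
}

static bool toy_cap_has(const struct toy_vcpu *vcpu, unsigned int bit)
{
	assert(bit < 32);
	return (vcpu->cpu_caps >> bit) & 1u;
}

/* Set or clear in one call, i.e. the if/else from the text folded up. */
static void toy_cap_change(struct toy_vcpu *vcpu, unsigned int bit, bool has_cap)
{
	if (has_cap)
		toy_cap_set(vcpu, bit);
	else
		toy_cap_clear(vcpu, bit);
}

/* Old flow: caps start at zero, so "set if allowed" is sufficient. */
static bool toy_old_flow(bool allow)
{
	struct toy_vcpu vcpu = { .cpu_caps = 0 };

	if (allow)
		toy_cap_set(&vcpu, TOY_FEATURE_GBPAGES);
	return toy_cap_has(&vcpu, TOY_FEATURE_GBPAGES);
}

/* New flow, set-only: a bit seeded from userspace survives when !allow. */
static bool toy_new_flow_set_only(bool userspace_bit, bool allow)
{
	struct toy_vcpu vcpu = { .cpu_caps = 0 };

	toy_cap_change(&vcpu, TOY_FEATURE_GBPAGES, userspace_bit); /* seed */
	if (allow)
		toy_cap_set(&vcpu, TOY_FEATURE_GBPAGES);
	return toy_cap_has(&vcpu, TOY_FEATURE_GBPAGES);
}

/* New flow, change: the final value always tracks "allow". */
static bool toy_new_flow_change(bool userspace_bit, bool allow)
{
	struct toy_vcpu vcpu = { .cpu_caps = 0 };

	toy_cap_change(&vcpu, TOY_FEATURE_GBPAGES, userspace_bit); /* seed */
	toy_cap_change(&vcpu, TOY_FEATURE_GBPAGES, allow);
	return toy_cap_has(&vcpu, TOY_FEATURE_GBPAGES);
}
```

In this model, toy_new_flow_set_only(true, false) leaves the bit set even though
the feature isn't allowed, which is the stale-bit case the explicit clear is
there to prevent.
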
> >  static __always_inline bool guest_cpu_cap_has(struct kvm_vcpu *vcpu,
> > diff --git a/arch/x86/kvm/svm/svm.c b/arch/x86/kvm/svm/svm.c
> > index 8a99a73b6ee5..5827328e30f1 100644
> > --- a/arch/x86/kvm/svm/svm.c
> > +++ b/arch/x86/kvm/svm/svm.c
> > @@ -4315,14 +4315,14 @@ static void svm_vcpu_after_set_cpuid(struct kvm_vcpu *vcpu)
> >  	 * XSS on VM-Enter/VM-Exit. Failure to do so would effectively give
> >  	 * the guest read/write access to the host's XSS.
> >  	 */
> > -	if (boot_cpu_has(X86_FEATURE_XSAVE) &&
> > -	    boot_cpu_has(X86_FEATURE_XSAVES) &&
> > -	    guest_cpuid_has(vcpu, X86_FEATURE_XSAVE))
> > -		guest_cpu_cap_set(vcpu, X86_FEATURE_XSAVES);
> > +	guest_cpu_cap_change(vcpu, X86_FEATURE_XSAVES,
> > +			     boot_cpu_has(X86_FEATURE_XSAVE) &&
> > +			     boot_cpu_has(X86_FEATURE_XSAVES) &&
> > +			     guest_cpuid_has(vcpu, X86_FEATURE_XSAVE));
>
> In theory this change does change behavior, now the X86_FEATURE_XSAVE will
> be set iff the condition is true, but before it was set *if* the condition was true.

No, before it was set if and only if the condition was true, because in that
case caps were 0-initialized, i.e. this was/is the only way for XSAVES to be
set.

> > -	guest_cpu_cap_check_and_set(vcpu, X86_FEATURE_NRIPS);
> > -	guest_cpu_cap_check_and_set(vcpu, X86_FEATURE_TSCRATEMSR);
> > -	guest_cpu_cap_check_and_set(vcpu, X86_FEATURE_LBRV);
> > +	guest_cpu_cap_restrict(vcpu, X86_FEATURE_NRIPS);
> > +	guest_cpu_cap_restrict(vcpu, X86_FEATURE_TSCRATEMSR);
> > +	guest_cpu_cap_restrict(vcpu, X86_FEATURE_LBRV);
>
> One of the main reasons I don't like governed features is this manual list.

To be fair, the manual lists predate the governed features.

> I want to reach the point that one won't need to add anything manually,
> unless there is a good reason to do so, and there are only a few exceptions
> when the guest cap is set, while the host's isn't.

Yeah, agreed.