From: Vitaly Kuznetsov
To: "Vineeth Pillai (Google)"
Cc: Suleiman Souhlal, Masami Hiramatsu, kvm@vger.kernel.org,
	linux-kernel@vger.kernel.org, x86@kernel.org, Joel Fernandes,
	Ben Segall, Borislav Petkov, Daniel Bristot de Oliveira, Dave Hansen,
	Dietmar Eggemann, "H . Peter Anvin", Ingo Molnar, Juri Lelli,
	Mel Gorman, Paolo Bonzini, Andy Lutomirski, Peter Zijlstra,
	Sean Christopherson, Steven Rostedt, Thomas Gleixner,
	Valentin Schneider, Vincent Guittot, Wanpeng Li
Subject: Re: [RFC PATCH 1/8] kvm: x86: MSR for setting up scheduler info shared memory
In-Reply-To: <20231214024727.3503870-2-vineeth@bitbyteword.org>
References: <20231214024727.3503870-1-vineeth@bitbyteword.org>
	<20231214024727.3503870-2-vineeth@bitbyteword.org>
Date: Thu, 14 Dec 2023 11:53:06 +0100
Message-ID: <877clhkqct.fsf@redhat.com>
X-Mailing-List: linux-kernel@vger.kernel.org

"Vineeth Pillai (Google)" writes:

> Implement a kvm MSR that guest uses to provide the GPA of shared memory
> for communicating the scheduling information between host and guest.
>
> wrmsr(0) disables the feature. wrmsr(valid_gpa) enables the feature and
> uses the gpa for further communication.
>
> Also add a new cpuid feature flag for the host to advertise the feature
> to the guest.
>
> Co-developed-by: Joel Fernandes (Google)
> Signed-off-by: Joel Fernandes (Google)
> Signed-off-by: Vineeth Pillai (Google)
> ---
>  arch/x86/include/asm/kvm_host.h      | 25 ++++++++++++
>  arch/x86/include/uapi/asm/kvm_para.h | 24 +++++++++++
>  arch/x86/kvm/Kconfig                 | 12 ++++++
>  arch/x86/kvm/cpuid.c                 |  2 +
>  arch/x86/kvm/x86.c                   | 61 ++++++++++++++++++++++++++++
>  include/linux/kvm_host.h             |  5 +++
>  6 files changed, 129 insertions(+)
>
> diff --git a/arch/x86/include/asm/kvm_host.h b/arch/x86/include/asm/kvm_host.h
> index f72b30d2238a..f89ba1f07d88 100644
> --- a/arch/x86/include/asm/kvm_host.h
> +++ b/arch/x86/include/asm/kvm_host.h
> @@ -987,6 +987,18 @@ struct kvm_vcpu_arch {
>  	/* Protected Guests */
>  	bool guest_state_protected;
>
> +#ifdef CONFIG_PARAVIRT_SCHED_KVM
> +	/*
> +	 * MSR to setup a shared memory for scheduling
> +	 * information sharing between host and guest.
> +	 */
> +	struct {
> +		enum kvm_vcpu_boost_state boost_status;
> +		u64 msr_val;
> +		struct gfn_to_hva_cache data;
> +	} pv_sched;
> +#endif
> +
>  	/*
>  	 * Set when PDPTS were loaded directly by the userspace without
>  	 * reading the guest memory
> @@ -2217,4 +2229,17 @@ int memslot_rmap_alloc(struct kvm_memory_slot *slot, unsigned long npages);
>   */
>  #define KVM_EXIT_HYPERCALL_MBZ GENMASK_ULL(31, 1)
>
> +#ifdef CONFIG_PARAVIRT_SCHED_KVM
> +static inline bool kvm_arch_vcpu_pv_sched_enabled(struct kvm_vcpu_arch *arch)
> +{
> +	return arch->pv_sched.msr_val;
> +}
> +
> +static inline void kvm_arch_vcpu_set_boost_status(struct kvm_vcpu_arch *arch,
> +		enum kvm_vcpu_boost_state boost_status)
> +{
> +	arch->pv_sched.boost_status = boost_status;
> +}
> +#endif
> +
>  #endif /* _ASM_X86_KVM_HOST_H */
> diff --git a/arch/x86/include/uapi/asm/kvm_para.h b/arch/x86/include/uapi/asm/kvm_para.h
> index 6e64b27b2c1e..6b1dea07a563 100644
> --- a/arch/x86/include/uapi/asm/kvm_para.h
> +++ b/arch/x86/include/uapi/asm/kvm_para.h
> @@ -36,6 +36,7 @@
>  #define KVM_FEATURE_MSI_EXT_DEST_ID 15
>  #define KVM_FEATURE_HC_MAP_GPA_RANGE 16
>  #define KVM_FEATURE_MIGRATION_CONTROL 17
> +#define KVM_FEATURE_PV_SCHED 18
>
>  #define KVM_HINTS_REALTIME 0
>
> @@ -58,6 +59,7 @@
>  #define MSR_KVM_ASYNC_PF_INT 0x4b564d06
>  #define MSR_KVM_ASYNC_PF_ACK 0x4b564d07
>  #define MSR_KVM_MIGRATION_CONTROL 0x4b564d08
> +#define MSR_KVM_PV_SCHED 0x4b564da0
>
>  struct kvm_steal_time {
>  	__u64 steal;
> @@ -150,4 +152,26 @@ struct kvm_vcpu_pv_apf_data {
>  #define KVM_PV_EOI_ENABLED KVM_PV_EOI_MASK
>  #define KVM_PV_EOI_DISABLED 0x0
>
> +/*
> + * VCPU boost state shared between the host and guest.
> + */
> +enum kvm_vcpu_boost_state {
> +	/* Priority boosting feature disabled in host */
> +	VCPU_BOOST_DISABLED = 0,
> +	/*
> +	 * vcpu is not explicitly boosted by the host.
> +	 * (Default priority when the guest started)
> +	 */
> +	VCPU_BOOST_NORMAL,
> +	/* vcpu is boosted by the host */
> +	VCPU_BOOST_BOOSTED
> +};
> +
> +/*
> + * Structure passed in via MSR_KVM_PV_SCHED
> + */
> +struct pv_sched_data {
> +	__u64 boost_status;
> +};
> +
>  #endif /* _UAPI_ASM_X86_KVM_PARA_H */
> diff --git a/arch/x86/kvm/Kconfig b/arch/x86/kvm/Kconfig
> index 89ca7f4c1464..dbcba73fb508 100644
> --- a/arch/x86/kvm/Kconfig
> +++ b/arch/x86/kvm/Kconfig
> @@ -141,4 +141,16 @@ config KVM_XEN
>  config KVM_EXTERNAL_WRITE_TRACKING
>  	bool
>
> +config PARAVIRT_SCHED_KVM
> +	bool "Enable paravirt scheduling capability for kvm"
> +	depends on KVM
> +	help
> +	  Paravirtualized scheduling facilitates the exchange of scheduling
> +	  related information between the host and guest through shared memory,
> +	  enhancing the efficiency of vCPU thread scheduling by the hypervisor.
> +	  An illustrative use case involves dynamically boosting the priority of
> +	  a vCPU thread when the guest is executing a latency-sensitive workload
> +	  on that specific vCPU.
> +	  This config enables paravirt scheduling in the kvm hypervisor.
> +
>  endif # VIRTUALIZATION
> diff --git a/arch/x86/kvm/cpuid.c b/arch/x86/kvm/cpuid.c
> index 7bdc66abfc92..960ef6e869f2 100644
> --- a/arch/x86/kvm/cpuid.c
> +++ b/arch/x86/kvm/cpuid.c
> @@ -1113,6 +1113,8 @@ static inline int __do_cpuid_func(struct kvm_cpuid_array *array, u32 function)
>  			     (1 << KVM_FEATURE_POLL_CONTROL) |
>  			     (1 << KVM_FEATURE_PV_SCHED_YIELD) |
>  			     (1 << KVM_FEATURE_ASYNC_PF_INT);
> +		if (IS_ENABLED(CONFIG_PARAVIRT_SCHED_KVM))
> +			entry->eax |= (1 << KVM_FEATURE_PV_SCHED);
>
>  		if (sched_info_on())
>  			entry->eax |= (1 << KVM_FEATURE_STEAL_TIME);
> diff --git a/arch/x86/kvm/x86.c b/arch/x86/kvm/x86.c
> index 7bcf1a76a6ab..0f475b50ac83 100644
> --- a/arch/x86/kvm/x86.c
> +++ b/arch/x86/kvm/x86.c
> @@ -3879,6 +3879,33 @@ int kvm_set_msr_common(struct kvm_vcpu *vcpu, struct msr_data *msr_info)
>  			return 1;
>  		break;
>
> +#ifdef CONFIG_PARAVIRT_SCHED_KVM
> +	case MSR_KVM_PV_SCHED:
> +		if (!guest_pv_has(vcpu, KVM_FEATURE_PV_SCHED))
> +			return 1;
> +
> +		if (!(data & KVM_MSR_ENABLED))
> +			break;
> +
> +		if (!(data & ~KVM_MSR_ENABLED)) {
> +			/*
> +			 * Disable the feature
> +			 */
> +			vcpu->arch.pv_sched.msr_val = 0;
> +			kvm_set_vcpu_boosted(vcpu, false);
> +		} if (!kvm_gfn_to_hva_cache_init(vcpu->kvm,
> +				&vcpu->arch.pv_sched.data, data & ~KVM_MSR_ENABLED,
> +				sizeof(struct pv_sched_data))) {
> +			vcpu->arch.pv_sched.msr_val = data;
> +			kvm_set_vcpu_boosted(vcpu, false);
> +		} else {
> +			pr_warn("MSR_KVM_PV_SCHED: kvm:%p, vcpu:%p, "
> +				"msr value: %llx, kvm_gfn_to_hva_cache_init failed!\n",
> +				vcpu->kvm, vcpu, data & ~KVM_MSR_ENABLED);

As this is triggerable by the guest please drop this print (which is not
even ratelimited!). I think it would be better to just 'return 1;' in
case of kvm_gfn_to_hva_cache_init() failure but maybe you also need to
account for 'msr_info->host_initiated' to not fail setting this MSR from
the host upon migration.
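
The control flow suggested above could look roughly like the sketch
below. This is a userspace model, not kernel code and not a tested patch:
struct vcpu_model, cache_init_model() and pv_sched_set_msr_model() are
hypothetical stand-ins for the vcpu state, kvm_gfn_to_hva_cache_init()
and the MSR_KVM_PV_SCHED case. It keeps the branch order of the quoted
hunk, drops the print, and returns 1 on a failed guest write. Recording
the value on a failed host-initiated write is purely an assumption made
for illustration.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

#define MSR_ENABLED 1ULL	/* models KVM_MSR_ENABLED (bit 0) */

/* Hypothetical stand-in for the pv_sched part of the vcpu state. */
struct vcpu_model {
	uint64_t msr_val;	/* last accepted MSR value, 0 means disabled */
	uint64_t max_gpa;	/* GPAs at or above this fail the cache init */
};

/* Models kvm_gfn_to_hva_cache_init(): 0 on success, non-zero on failure. */
static int cache_init_model(const struct vcpu_model *v, uint64_t gpa)
{
	return gpa < v->max_gpa ? 0 : -1;
}

/*
 * Models the MSR write path. Returns 0 when the write is accepted and
 * 1 when it should be reported as a failed MSR write.
 */
static int pv_sched_set_msr_model(struct vcpu_model *v, uint64_t data,
				  bool host_initiated)
{
	uint64_t gpa = data & ~MSR_ENABLED;

	assert(v);

	/* Same early exit as the quoted hunk: enable bit clear, nothing to do. */
	if (!(data & MSR_ENABLED))
		return 0;

	/* Enable bit set but no GPA: disable the feature. */
	if (!gpa) {
		v->msr_val = 0;
		return 0;
	}

	/*
	 * No print on failure, since the guest can trigger it at will.
	 * Guest writes fail; host-initiated writes (e.g. state restore
	 * on migration) are not failed. Storing the value in that case
	 * is an assumption of this sketch.
	 */
	if (cache_init_model(v, gpa) && !host_initiated)
		return 1;

	v->msr_val = data;
	return 0;
}
```

In this model a guest write with a GPA that cannot be mapped returns 1
and leaves msr_val untouched, while the same write with host_initiated
set returns 0.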

> +		}
> +		break;
> +#endif
> +
>  	case MSR_KVM_POLL_CONTROL:
>  		if (!guest_pv_has(vcpu, KVM_FEATURE_POLL_CONTROL))
>  			return 1;
> @@ -4239,6 +4266,11 @@ int kvm_get_msr_common(struct kvm_vcpu *vcpu, struct msr_data *msr_info)
>
>  		msr_info->data = vcpu->arch.pv_eoi.msr_val;
>  		break;
> +#ifdef CONFIG_PARAVIRT_SCHED_KVM
> +	case MSR_KVM_PV_SCHED:
> +		msr_info->data = vcpu->arch.pv_sched.msr_val;
> +		break;
> +#endif
>  	case MSR_KVM_POLL_CONTROL:
>  		if (!guest_pv_has(vcpu, KVM_FEATURE_POLL_CONTROL))
>  			return 1;
> @@ -9820,6 +9852,29 @@ static int complete_hypercall_exit(struct kvm_vcpu *vcpu)
>  	return kvm_skip_emulated_instruction(vcpu);
>  }
>
> +#ifdef CONFIG_PARAVIRT_SCHED_KVM
> +static void record_vcpu_boost_status(struct kvm_vcpu *vcpu)
> +{
> +	u64 val = vcpu->arch.pv_sched.boost_status;
> +
> +	if (!kvm_arch_vcpu_pv_sched_enabled(&vcpu->arch))
> +		return;
> +
> +	pagefault_disable();
> +	kvm_write_guest_offset_cached(vcpu->kvm, &vcpu->arch.pv_sched.data,
> +		&val, offsetof(struct pv_sched_data, boost_status), sizeof(u64));
> +	pagefault_enable();
> +}
> +
> +void kvm_set_vcpu_boosted(struct kvm_vcpu *vcpu, bool boosted)
> +{
> +	kvm_arch_vcpu_set_boost_status(&vcpu->arch,
> +			boosted ? VCPU_BOOST_BOOSTED : VCPU_BOOST_NORMAL);
> +
> +	kvm_make_request(KVM_REQ_VCPU_BOOST_UPDATE, vcpu);
> +}
> +#endif
> +
>  int kvm_emulate_hypercall(struct kvm_vcpu *vcpu)
>  {
>  	unsigned long nr, a0, a1, a2, a3, ret;
> @@ -10593,6 +10648,12 @@ static int vcpu_enter_guest(struct kvm_vcpu *vcpu)
>  		}
>  		if (kvm_check_request(KVM_REQ_STEAL_UPDATE, vcpu))
>  			record_steal_time(vcpu);
> +
> +#ifdef CONFIG_PARAVIRT_SCHED_KVM
> +		if (kvm_check_request(KVM_REQ_VCPU_BOOST_UPDATE, vcpu))
> +			record_vcpu_boost_status(vcpu);
> +#endif
> +
>  #ifdef CONFIG_KVM_SMM
>  		if (kvm_check_request(KVM_REQ_SMI, vcpu))
>  			process_smi(vcpu);
> diff --git a/include/linux/kvm_host.h b/include/linux/kvm_host.h
> index 9d3ac7720da9..a74aeea55347 100644
> --- a/include/linux/kvm_host.h
> +++ b/include/linux/kvm_host.h
> @@ -167,6 +167,7 @@ static inline bool is_error_page(struct page *page)
>  #define KVM_REQ_VM_DEAD (1 | KVM_REQUEST_WAIT | KVM_REQUEST_NO_WAKEUP)
>  #define KVM_REQ_UNBLOCK 2
>  #define KVM_REQ_DIRTY_RING_SOFT_FULL 3
> +#define KVM_REQ_VCPU_BOOST_UPDATE 6
>  #define KVM_REQUEST_ARCH_BASE 8
>
>  /*
> @@ -2287,4 +2288,8 @@ static inline void kvm_account_pgtable_pages(void *virt, int nr)
>  /* Max number of entries allowed for each kvm dirty ring */
>  #define KVM_DIRTY_RING_MAX_ENTRIES 65536
>
> +#ifdef CONFIG_PARAVIRT_SCHED_KVM
> +void kvm_set_vcpu_boosted(struct kvm_vcpu *vcpu, bool boosted);
> +#endif
> +
>  #endif

-- 
Vitaly