From: "Vineeth Pillai (Google)"
To: Ben Segall, Borislav Petkov, Daniel Bristot de Oliveira, Dave Hansen,
	Dietmar Eggemann, "H. Peter Anvin", Ingo Molnar, Juri Lelli, Mel Gorman,
	Paolo Bonzini, Andy Lutomirski, Peter Zijlstra, Sean Christopherson,
	Thomas Gleixner, Valentin Schneider, Vincent Guittot, Vitaly Kuznetsov,
	Wanpeng Li
Cc: "Vineeth Pillai (Google)", Steven Rostedt, Joel Fernandes,
	Suleiman Souhlal, Masami Hiramatsu, himadrics@inria.fr,
	kvm@vger.kernel.org, linux-kernel@vger.kernel.org, x86@kernel.org
Subject: [RFC PATCH v2 2/5] kvm: Implement the paravirt sched framework for kvm
Date: Wed, 3 Apr 2024 10:01:13 -0400
Message-Id: <20240403140116.3002809-3-vineeth@bitbyteword.org>
In-Reply-To: <20240403140116.3002809-1-vineeth@bitbyteword.org>
References: <20240403140116.3002809-1-vineeth@bitbyteword.org>

kvm uses the kernel's paravirt sched framework to assign an available
pvsched driver for a guest. Guest vCPUs register with the pvsched driver
and call into the driver callbacks to notify it of the events it is
interested in.

This PoC doesn't do the callback on interrupt injection yet; that will be
implemented in subsequent iterations.
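For reference while reading the diff: the pvsched_vcpu_ops interface that
kvm programs against is presumably provided by <linux/pvsched.h>, added
earlier in this series. Its real definition is not part of this patch; the
sketch below is only inferred from the callbacks used in kvm_main.c and may
differ from the actual header:

	struct pvsched_vcpu_ops {
		/* Bitmask of PVSCHED_VCPU_* events the driver wants to see. */
		u32 events;
		/* Invoked when a vCPU task (identified by its struct pid) attaches. */
		void (*pvsched_vcpu_register)(struct pid *pid);
		/* Invoked when the vCPU task detaches or the driver is replaced. */
		void (*pvsched_vcpu_unregister)(struct pid *pid);
		/* Invoked on VMENTER/VMEXIT/HALT; addr is reserved for guest shared memory. */
		void (*pvsched_vcpu_notify_event)(void *addr, struct pid *pid, u32 events);
	};

The real structure presumably also carries a driver name (used by the
pvsched_get_vcpu_ops() lookup below) and a module owner; both are omitted
here. kvm_replace_pvsched_ops() looks a driver up by name and fans the
register/unregister callbacks out to every vCPU of the VM.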
Signed-off-by: Vineeth Pillai (Google)
Signed-off-by: Joel Fernandes (Google)
---
 arch/x86/kvm/Kconfig     |  13 ++++
 arch/x86/kvm/x86.c       |   3 +
 include/linux/kvm_host.h |  32 +++++++++
 virt/kvm/kvm_main.c      | 148 +++++++++++++++++++++++++++++++++++++++
 4 files changed, 196 insertions(+)

diff --git a/arch/x86/kvm/Kconfig b/arch/x86/kvm/Kconfig
index 65ed14b6540b..c1776cdb5b65 100644
--- a/arch/x86/kvm/Kconfig
+++ b/arch/x86/kvm/Kconfig
@@ -189,4 +189,17 @@ config KVM_MAX_NR_VCPUS
 	  the memory footprint of each KVM guest, regardless of how many vCPUs
 	  are created for a given VM.
 
+config PARAVIRT_SCHED_KVM
+	bool "Enable paravirt scheduling capability for kvm"
+	depends on KVM
+	default n
+	help
+	  Paravirtualized scheduling facilitates the exchange of scheduling
+	  related information between the host and guest through shared memory,
+	  enhancing the efficiency of vCPU thread scheduling by the hypervisor.
+	  An illustrative use case involves dynamically boosting the priority of
+	  a vCPU thread when the guest is executing a latency-sensitive workload
+	  on that specific vCPU.
+	  This config enables paravirt scheduling in the kvm hypervisor.
+
 endif # VIRTUALIZATION

diff --git a/arch/x86/kvm/x86.c b/arch/x86/kvm/x86.c
index ffe580169c93..d0abc2c64d47 100644
--- a/arch/x86/kvm/x86.c
+++ b/arch/x86/kvm/x86.c
@@ -10896,6 +10896,8 @@ static int vcpu_enter_guest(struct kvm_vcpu *vcpu)
 
 	preempt_disable();
 
+	kvm_vcpu_pvsched_notify(vcpu, PVSCHED_VCPU_VMENTER);
+
 	static_call(kvm_x86_prepare_switch_to_guest)(vcpu);
 
 	/*
@@ -11059,6 +11061,7 @@ static int vcpu_enter_guest(struct kvm_vcpu *vcpu)
 	guest_timing_exit_irqoff();
 
 	local_irq_enable();
+	kvm_vcpu_pvsched_notify(vcpu, PVSCHED_VCPU_VMEXIT);
 	preempt_enable();
 
 	kvm_vcpu_srcu_read_lock(vcpu);

diff --git a/include/linux/kvm_host.h b/include/linux/kvm_host.h
index 179df96b20f8..6381569f3de8 100644
--- a/include/linux/kvm_host.h
+++ b/include/linux/kvm_host.h
@@ -45,6 +45,8 @@
 #include
 #include
 
+#include <linux/pvsched.h>
+
 #ifndef KVM_MAX_VCPU_IDS
 #define KVM_MAX_VCPU_IDS KVM_MAX_VCPUS
 #endif
@@ -832,6 +834,11 @@ struct kvm {
 	bool vm_bugged;
 	bool vm_dead;
 
+#ifdef CONFIG_PARAVIRT_SCHED_KVM
+	spinlock_t pvsched_ops_lock;
+	struct pvsched_vcpu_ops __rcu *pvsched_ops;
+#endif
+
 #ifdef CONFIG_HAVE_KVM_PM_NOTIFIER
 	struct notifier_block pm_notifier;
 #endif
@@ -2413,4 +2420,29 @@ static inline int kvm_gmem_get_pfn(struct kvm *kvm,
 }
 #endif /* CONFIG_KVM_PRIVATE_MEM */
 
+#ifdef CONFIG_PARAVIRT_SCHED_KVM
+int kvm_vcpu_pvsched_notify(struct kvm_vcpu *vcpu, u32 events);
+int kvm_vcpu_pvsched_register(struct kvm_vcpu *vcpu);
+void kvm_vcpu_pvsched_unregister(struct kvm_vcpu *vcpu);
+
+int kvm_replace_pvsched_ops(struct kvm *kvm, char *name);
+#else
+static inline int kvm_vcpu_pvsched_notify(struct kvm_vcpu *vcpu, u32 events)
+{
+	return 0;
+}
+static inline int kvm_vcpu_pvsched_register(struct kvm_vcpu *vcpu)
+{
+	return 0;
+}
+static inline void kvm_vcpu_pvsched_unregister(struct kvm_vcpu *vcpu)
+{
+}
+
+static inline int kvm_replace_pvsched_ops(struct kvm *kvm, char *name)
+{
+	return 0;
+}
+#endif
+
 #endif

diff --git a/virt/kvm/kvm_main.c b/virt/kvm/kvm_main.c
index 0f50960b0e3a..0546814e4db7 100644
--- a/virt/kvm/kvm_main.c
+++ b/virt/kvm/kvm_main.c
@@ -170,6 +170,142 @@ bool kvm_is_zone_device_page(struct page *page)
 	return is_zone_device_page(page);
 }
 
+#ifdef CONFIG_PARAVIRT_SCHED_KVM
+typedef enum {
+	PVSCHED_CB_REGISTER = 1,
+	PVSCHED_CB_UNREGISTER = 2,
+	PVSCHED_CB_NOTIFY = 3
+} pvsched_vcpu_callback_t;
+
+/*
+ * Helper function to invoke the pvsched driver callback.
+ */
+static int __vcpu_pvsched_callback(struct kvm_vcpu *vcpu, u32 events,
+				   pvsched_vcpu_callback_t action)
+{
+	int ret = 0;
+	struct pid *pid;
+	struct pvsched_vcpu_ops *ops;
+
+	rcu_read_lock();
+	ops = rcu_dereference(vcpu->kvm->pvsched_ops);
+	if (!ops) {
+		ret = -ENOENT;
+		goto out;
+	}
+
+	pid = rcu_dereference(vcpu->pid);
+	if (WARN_ON_ONCE(!pid)) {
+		ret = -EINVAL;
+		goto out;
+	}
+	get_pid(pid);
+	switch(action) {
+	case PVSCHED_CB_REGISTER:
+		ops->pvsched_vcpu_register(pid);
+		break;
+	case PVSCHED_CB_UNREGISTER:
+		ops->pvsched_vcpu_unregister(pid);
+		break;
+	case PVSCHED_CB_NOTIFY:
+		if (ops->events & events) {
+			ops->pvsched_vcpu_notify_event(
+				NULL, /* TODO: Pass guest allocated sharedmem addr */
+				pid,
+				ops->events & events);
+		}
+		break;
+	default:
+		WARN_ON_ONCE(1);
+	}
+	put_pid(pid);
+
+out:
+	rcu_read_unlock();
+	return ret;
+}
+
+int kvm_vcpu_pvsched_notify(struct kvm_vcpu *vcpu, u32 events)
+{
+	return __vcpu_pvsched_callback(vcpu, events, PVSCHED_CB_NOTIFY);
+}
+
+int kvm_vcpu_pvsched_register(struct kvm_vcpu *vcpu)
+{
+	return __vcpu_pvsched_callback(vcpu, 0, PVSCHED_CB_REGISTER);
+	/*
+	 * TODO: Action if the registration fails?
+	 */
+}
+
+void kvm_vcpu_pvsched_unregister(struct kvm_vcpu *vcpu)
+{
+	__vcpu_pvsched_callback(vcpu, 0, PVSCHED_CB_UNREGISTER);
+}
+
+/*
+ * Replaces the VM's current pvsched driver.
+ * if name is NULL or empty string, unassign the
+ * current driver.
+ */
+int kvm_replace_pvsched_ops(struct kvm *kvm, char *name)
+{
+	int ret = 0;
+	unsigned long i;
+	struct kvm_vcpu *vcpu = NULL;
+	struct pvsched_vcpu_ops *ops = NULL, *prev_ops;
+
+	spin_lock(&kvm->pvsched_ops_lock);
+
+	prev_ops = rcu_dereference(kvm->pvsched_ops);
+
+	/*
+	 * Unassign operation if the passed in value is
+	 * NULL or an empty string.
+	 */
+	if (name && *name) {
+		ops = pvsched_get_vcpu_ops(name);
+		if (!ops) {
+			ret = -EINVAL;
+			goto out;
+		}
+	}
+
+	if (prev_ops) {
+		/*
+		 * Unregister current pvsched driver.
+		 */
+		kvm_for_each_vcpu(i, vcpu, kvm) {
+			kvm_vcpu_pvsched_unregister(vcpu);
+		}
+
+		pvsched_put_vcpu_ops(prev_ops);
+	}
+
+	rcu_assign_pointer(kvm->pvsched_ops, ops);
+	if (ops) {
+		/*
+		 * Register new pvsched driver.
+		 */
+		kvm_for_each_vcpu(i, vcpu, kvm) {
+			WARN_ON_ONCE(kvm_vcpu_pvsched_register(vcpu));
+		}
+	}
+
+out:
+	spin_unlock(&kvm->pvsched_ops_lock);
+
+	if (ret)
+		return ret;
+
+	synchronize_rcu();
+
+	return 0;
+}
+#endif
+
 /*
  * Returns a 'struct page' if the pfn is "valid" and backed by a refcounted
  * page, NULL otherwise. Note, the list of refcounted PG_reserved page types
@@ -508,6 +644,8 @@ static void kvm_vcpu_destroy(struct kvm_vcpu *vcpu)
 	kvm_arch_vcpu_destroy(vcpu);
 	kvm_dirty_ring_free(&vcpu->dirty_ring);
 
+	kvm_vcpu_pvsched_unregister(vcpu);
+
 	/*
 	 * No need for rcu_read_lock as VCPU_RUN is the only place that changes
 	 * the vcpu->pid pointer, and at destruction time all file descriptors
@@ -1221,6 +1359,10 @@ static struct kvm *kvm_create_vm(unsigned long type, const char *fdname)
 
 	BUILD_BUG_ON(KVM_MEM_SLOTS_NUM > SHRT_MAX);
 
+#ifdef CONFIG_PARAVIRT_SCHED_KVM
+	spin_lock_init(&kvm->pvsched_ops_lock);
+#endif
+
 	/*
 	 * Force subsequent debugfs file creations to fail if the VM directory
 	 * is not created (by kvm_create_vm_debugfs()).
@@ -1343,6 +1485,8 @@ static void kvm_destroy_vm(struct kvm *kvm)
 	int i;
 	struct mm_struct *mm = kvm->mm;
 
+	kvm_replace_pvsched_ops(kvm, NULL);
+
 	kvm_destroy_pm_notifier(kvm);
 	kvm_uevent_notify_change(KVM_EVENT_DESTROY_VM, kvm);
 	kvm_destroy_vm_debugfs(kvm);
@@ -3779,6 +3923,8 @@ bool kvm_vcpu_block(struct kvm_vcpu *vcpu)
 		if (kvm_vcpu_check_block(vcpu) < 0)
 			break;
 
+		kvm_vcpu_pvsched_notify(vcpu, PVSCHED_VCPU_HALT);
+
 		waited = true;
 		schedule();
 	}
@@ -4434,6 +4580,7 @@ static long kvm_vcpu_ioctl(struct file *filp,
 			/* The thread running this VCPU changed. */
 			struct pid *newpid;
 
+			kvm_vcpu_pvsched_unregister(vcpu);
 			r = kvm_arch_vcpu_run_pid_change(vcpu);
 			if (r)
 				break;
@@ -4442,6 +4589,7 @@ static long kvm_vcpu_ioctl(struct file *filp,
 			rcu_assign_pointer(vcpu->pid, newpid);
 			if (oldpid)
 				synchronize_rcu();
+			kvm_vcpu_pvsched_register(vcpu);
 			put_pid(oldpid);
 		}
 		r = kvm_arch_vcpu_ioctl_run(vcpu);
-- 
2.40.1
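
As an illustration of how the notifications added above would be consumed,
here is a minimal pvsched driver skeleton. It is not part of this patch:
the ops layout follows the sketch earlier in this mail, the PVSCHED_VCPU_*
event bits are the ones this patch passes to kvm_vcpu_pvsched_notify(), and
pvsched_register_vcpu_ops()/pvsched_unregister_vcpu_ops() are hypothetical
names for the framework's registration hooks expected from patch 1/5.

	/* Illustrative sketch only -- not part of this patch. */
	#include <linux/module.h>
	#include <linux/pid.h>
	#include <linux/pvsched.h>

	static void demo_vcpu_register(struct pid *pid)
	{
		pr_debug("pvsched demo: vCPU task %d attached\n", pid_nr(pid));
	}

	static void demo_vcpu_unregister(struct pid *pid)
	{
		pr_debug("pvsched demo: vCPU task %d detached\n", pid_nr(pid));
	}

	static void demo_vcpu_notify(void *addr, struct pid *pid, u32 events)
	{
		/* A real driver would boost/unboost the vCPU task here. */
		if (events & PVSCHED_VCPU_VMENTER)
			pr_debug("pvsched demo: task %d entering guest mode\n", pid_nr(pid));
		if (events & (PVSCHED_VCPU_VMEXIT | PVSCHED_VCPU_HALT))
			pr_debug("pvsched demo: task %d left guest mode\n", pid_nr(pid));
	}

	static struct pvsched_vcpu_ops demo_ops = {
		.events = PVSCHED_VCPU_VMENTER | PVSCHED_VCPU_VMEXIT | PVSCHED_VCPU_HALT,
		.pvsched_vcpu_register = demo_vcpu_register,
		.pvsched_vcpu_unregister = demo_vcpu_unregister,
		.pvsched_vcpu_notify_event = demo_vcpu_notify,
	};

	static int __init demo_init(void)
	{
		/* Hypothetical registration hook, expected from patch 1/5. */
		return pvsched_register_vcpu_ops(&demo_ops);
	}

	static void __exit demo_exit(void)
	{
		/* Hypothetical unregistration hook, expected from patch 1/5. */
		pvsched_unregister_vcpu_ops(&demo_ops);
	}

	module_init(demo_init);
	module_exit(demo_exit);
	MODULE_LICENSE("GPL");

Once such a driver is registered with the framework, kvm_replace_pvsched_ops()
(added above) can attach it to a VM by name, after which every VMENTER, VMEXIT
and HALT of that VM's vCPUs reaches the driver's notify callback.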