Hi,
This patch series is a continuation of the talk Saravana gave at LPC 2022
titled "CPUfreq/sched and VM guest workload problems" [1][2][3]. The gist
of the talk is that workloads running in a guest VM get terrible task
placement and CPUfreq behavior when compared to running the same workload
in the host. Effectively, there is no EAS (Energy Aware Scheduling) for
threads inside VMs. This makes power and performance terrible just by
running the workload in a VM, even if we assume zero virtualization
overhead.
We have been iterating over different options for communicating between
guest and host, ways of applying the information coming from the
guest/host, etc., to figure out the best performance and power improvements
we could get.
The patch series in its current state is NOT meant for landing in the
upstream kernel. We are sending this patch series to share the current
progress and data we have so far. The patch series is meant to be easy to
cherry-pick and test on various devices to see what performance and power
benefits this might give for others.
With this series, a workload running in a VM gets the same task placement
and CPUfreq behavior as it would when running in the host.
As expected, we see significant performance improvement and better
performance/power ratio. If anyone else wants to try this out on their VM
workloads and report findings, that'd be very much appreciated.
The idea is to improve VM CPUfreq/sched behavior by:
- Having the guest kernel do accurate load tracking by taking host CPU
arch/type and frequency into account.
- Sharing vCPU run queue utilization information with the host so that the
host can do proper frequency scaling and task placement on the host side.
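To make the first point concrete, here is a minimal sketch of frequency- and
capacity-invariant accounting. It assumes the guest knows the current and
maximum frequency of the host CPU, plus that CPU's capacity relative to the
biggest CPU on the usual 0-1024 scale. The helper name and its parameters are
hypothetical and are not taken from this series:

```c
#include <stdint.h>

#define CAPACITY_SCALE 1024u	/* same 0-1024 scale the scheduler uses */

/*
 * Hypothetical helper: scale a raw running-time delta so that time spent
 * at a lower frequency, or on a smaller CPU, counts for less utilization.
 */
static uint64_t scale_running_delta(uint64_t delta_us, uint32_t cur_khz,
				    uint32_t max_khz, uint32_t cpu_capacity)
{
	/* Fraction of the maximum frequency, on the 0-1024 scale. */
	uint64_t freq_scale = (uint64_t)cur_khz * CAPACITY_SCALE / max_khz;

	/* Apply frequency invariance, then CPU (arch/type) invariance. */
	delta_us = delta_us * freq_scale / CAPACITY_SCALE;
	delta_us = delta_us * cpu_capacity / CAPACITY_SCALE;

	return delta_us;
}
```

Under these assumptions, 1000 us of runtime at half the maximum frequency on
a CPU of capacity 512 is accounted as 250 us.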
Results:
========
As of right now, the best results have come from using hypercalls (more on
this below) to communicate between host and guest, and from treating the
vCPU run queue util like util_est on the host-side vCPU thread. So that's
what this patch series does.
Let's look at the results for this series first and then look at the other
options we are trying/tried out:
Use cases running Android inside a VM on a Chromebook:
======================================================
PCMark (Emulates real world usecases)
Higher is better
+-------------------+----------+------------+--------+
| Test Case (score) | Baseline | Util_guest | %delta |
+-------------------+----------+------------+--------+
| Weighted Total    | 6136     | 7274       | +19%   |
+-------------------+----------+------------+--------+
| Web Browsing      | 5558     | 6273       | +13%   |
+-------------------+----------+------------+--------+
| Video Editing     | 4921     | 5221       | +6%    |
+-------------------+----------+------------+--------+
| Writing           | 6864     | 8825       | +29%   |
+-------------------+----------+------------+--------+
| Photo Editing     | 7983     | 11593      | +45%   |
+-------------------+----------+------------+--------+
| Data Manipulation | 5814     | 6081       | +5%    |
+-------------------+----------+------------+--------+
PCMark Performance/mAh
Higher is better
+-----------+----------+------------+--------+
|           | Baseline | Util_guest | %delta |
+-----------+----------+------------+--------+
| Score/mAh | 79       | 88         | +11%   |
+-----------+----------+------------+--------+
Roblox
Higher is better
+-----+----------+------------+--------+
|     | Baseline | Util_guest | %delta |
+-----+----------+------------+--------+
| FPS | 18.25    | 28.66      | +57%   |
+-----+----------+------------+--------+
Roblox Frames/mAh
Higher is better
+------------+----------+------------+--------+
|            | Baseline | Util_guest | %delta |
+------------+----------+------------+--------+
| Frames/mAh | 91.25    | 114.64     | +26%   |
+------------+----------+------------+--------+
Use cases running a minimal system inside a VM on a Pixel 6:
============================================================
FIO
Higher is better
+----------------------+----------+------------+--------+
| Test Case (avg MB/s) | Baseline | Util_guest | %delta |
+----------------------+----------+------------+--------+
| Seq Write            | 9.27     | 12.6       | +36%   |
+----------------------+----------+------------+--------+
| Rand Write           | 9.34     | 11.9       | +27%   |
+----------------------+----------+------------+--------+
| Seq Read             | 106      | 124        | +17%   |
+----------------------+----------+------------+--------+
| Rand Read            | 33.6     | 35         | +4%    |
+----------------------+----------+------------+--------+
CPU-based ML Inference Benchmark
Lower is better
+-------------------------+----------+------------+--------+
| Test Case (ms)          | Baseline | Util_guest | %delta |
+-------------------------+----------+------------+--------+
| Cached Sample Inference | 2.57     | 1.75       | -32%   |
+-------------------------+----------+------------+--------+
| Small Sample Inference  | 6.8      | 5.57       | -18%   |
+-------------------------+----------+------------+--------+
| Large Sample Inference  | 31.2     | 26.58      | -15%   |
+-------------------------+----------+------------+--------+
These patches expect the host to:
- Affine vCPUs to specific clusters.
- Set vCPU capacity to match the host CPU they are running on.
To make this easy to do/try out, we have put up patches[4][5] to do this on
CrosVM. Once you pick up those patches, you can use options
"--host-cpu-topology" and "--virt-cpufreq" to achieve the above.
The patch series can be broken into:
Patch 1: Add util_guest as an additional PELT signal for host vCPU threads
Patch 2: Hypercall for guest to get current pCPU's frequency
Patch 3: Send vCPU run queue util to host and apply as util_guest
Patch 4: Query pCPU freq table from guest (we'll move this to DT in the
future)
Patch 5: DT bindings for virtual KVM cpufreq
Patch 6: Virtual cpufreq driver that uses the hypercalls to send util to
host and implement frequency invariance in the guest.
Alternatives we have implemented and profiled:
==============================================
util_guest vs uclamp_min
========================
One suggestion at LPC was to use uclamp_min to apply the util info coming
from the guest. As we suspected, it doesn't perform as well because
uclamp_min is not additive, whereas the actual workload on the host CPU due
to the vCPU is additive to the existing workloads on the host. Uclamp_min
also has the undesirable side-effect of threads forked from the vCPU thread
inheriting whatever uclamp_min value the vCPU thread had and then getting
stuck with that uclamp_min value.
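A simplified sketch of why additivity matters. It assumes the demand that
cpufreq sees for a run queue is either the sum of per-task utilization
(util_guest approach) or the host-tracked sum raised to a floor (uclamp_min
approach). This is a toy model with hypothetical names, not the scheduler's
actual code paths:

```c
#include <stddef.h>

static unsigned long max_ul(unsigned long a, unsigned long b)
{
	return a > b ? a : b;
}

/* Sum the host-tracked util of the other tasks on the run queue. */
static unsigned long sum_util(const unsigned long *util, size_t n)
{
	unsigned long sum = 0;
	size_t i;

	for (i = 0; i < n; i++)
		sum += util[i];
	return sum;
}

/*
 * util_guest model: the hint boosts the vCPU thread's own util, which is
 * then added to the util of everything else on the run queue.
 */
static unsigned long rq_util_guest(const unsigned long *others, size_t n,
				   unsigned long vcpu_util,
				   unsigned long guest_hint)
{
	return sum_util(others, n) + max_ul(vcpu_util, guest_hint);
}

/* uclamp_min model: the hint is only a floor on the run queue total. */
static unsigned long rq_util_uclamp_min(const unsigned long *others, size_t n,
					unsigned long vcpu_util,
					unsigned long guest_hint)
{
	return max_ul(sum_util(others, n) + vcpu_util, guest_hint);
}
```

With one other task at 300, a vCPU thread whose host-tracked util is 50 and
a guest hint of 400, the first model requests 700 while the second requests
only 400, even though the CPU has to serve both workloads.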
Below are some additional benchmark results comparing the uclamp_min
prototype (listed as Uclamp) using the same test environment as before
(including hypercalls).
As before, %delta is always comparing to baseline.
PCMark
Higher is better
+-------------------+----------+------------+--------+--------+--------+
| Test Case (score) | Baseline | Util_guest | %delta | Uclamp | %delta |
+-------------------+----------+------------+--------+--------+--------+
| Weighted Total    | 6136     | 7274       | +19%   | 6848   | +12%   |
+-------------------+----------+------------+--------+--------+--------+
| Web Browsing      | 5558     | 6273       | +13%   | 6050   | +9%    |
+-------------------+----------+------------+--------+--------+--------+
| Video Editing     | 4921     | 5221       | +6%    | 5091   | +3%    |
+-------------------+----------+------------+--------+--------+--------+
| Writing           | 6864     | 8825       | +29%   | 8523   | +24%   |
+-------------------+----------+------------+--------+--------+--------+
| Photo Editing     | 7983     | 11593      | +45%   | 9865   | +24%   |
+-------------------+----------+------------+--------+--------+--------+
| Data Manipulation | 5814     | 6081       | +5%    | 5836   | 0%     |
+-------------------+----------+------------+--------+--------+--------+
PCMark Performance/mAh
Higher is better
+-----------+----------+------------+--------+--------+--------+
|           | Baseline | Util_guest | %delta | Uclamp | %delta |
+-----------+----------+------------+--------+--------+--------+
| Score/mAh | 79       | 88         | +11%   | 83     | +7%    |
+-----------+----------+------------+--------+--------+--------+
Hypercalls vs MMIO:
===================
We realize that hypercalls are not the recommended choice for this and we
have no attachment to any communication method as long as it gives good
results.
We started off with hypercalls to see what is the best we could achieve if
we didn't have to context switch into host side userspace.
To see the impact of switching from hypercalls to MMIO, we kept util_guest
and only switched from hypercall to MMIO. So in the results below:
- Hypercall = hypercall + util_guest
- MMIO = MMIO + util_guest
As before, %delta is always comparing to baseline.
PCMark
Higher is better
+-------------------+----------+------------+--------+-------+--------+
| Test Case (score) | Baseline | Hypercall  | %delta | MMIO  | %delta |
+-------------------+----------+------------+--------+-------+--------+
| Weighted Total    | 6136     | 7274       | +19%   | 6867  | +12%   |
+-------------------+----------+------------+--------+-------+--------+
| Web Browsing      | 5558     | 6273       | +13%   | 6035  | +9%    |
+-------------------+----------+------------+--------+-------+--------+
| Video Editing     | 4921     | 5221       | +6%    | 5167  | +5%    |
+-------------------+----------+------------+--------+-------+--------+
| Writing           | 6864     | 8825       | +29%   | 8529  | +24%   |
+-------------------+----------+------------+--------+-------+--------+
| Photo Editing     | 7983     | 11593      | +45%   | 10812 | +35%   |
+-------------------+----------+------------+--------+-------+--------+
| Data Manipulation | 5814     | 6081       | +5%    | 5327  | -8%    |
+-------------------+----------+------------+--------+-------+--------+
PCMark Performance/mAh
Higher is better
+-----------+----------+-----------+--------+------+--------+
|           | Baseline | Hypercall | %delta | MMIO | %delta |
+-----------+----------+-----------+--------+------+--------+
| Score/mAh | 79       | 88        | +11%   | 83   | +7%    |
+-----------+----------+-----------+--------+------+--------+
Roblox
Higher is better
+-----+----------+------------+--------+-------+--------+
|     | Baseline | Hypercall  | %delta | MMIO  | %delta |
+-----+----------+------------+--------+-------+--------+
| FPS | 18.25    | 28.66      | +57%   | 24.06 | +32%   |
+-----+----------+------------+--------+-------+--------+
Roblox Frames/mAh
Higher is better
+------------+----------+------------+--------+--------+--------+
|            | Baseline | Hypercall  | %delta | MMIO   | %delta |
+------------+----------+------------+--------+--------+--------+
| Frames/mAh | 91.25    | 114.64     | +26%   | 103.11 | +13%   |
+------------+----------+------------+--------+--------+--------+
Next steps:
===========
We are continuing to look into communication mechanisms other than
hypercalls that are at least as efficient and avoid switching into the VMM
userspace. Any input in this regard is greatly appreciated.
Thanks,
David & Saravana
Cc: Saravana Kannan <[email protected]>
v1 -> v2:
- Added description for EAS and removed DVFS in cover letter.
- Added a v2 tag to the subject.
- Fixed up the inconsistent "units" between tables.
- Made sure everyone is To/Cc-ed for all the patches in the series.
[1] - https://lpc.events/event/16/contributions/1195/
[2] - https://lpc.events/event/16/contributions/1195/attachments/970/1893/LPC%202022%20-%20VM%20DVFS.pdf
[3] - https://www.youtube.com/watch?v=hIg_5bg6opU
[4] - https://chromium-review.googlesource.com/c/crosvm/crosvm/+/4208668
[5] - https://chromium-review.googlesource.com/c/crosvm/crosvm/+/4288027
David Dai (6):
sched/fair: Add util_guest for tasks
kvm: arm64: Add support for get_cur_cpufreq service
kvm: arm64: Add support for util_hint service
kvm: arm64: Add support for get_freqtbl service
dt-bindings: cpufreq: add bindings for virtual kvm cpufreq
cpufreq: add kvm-cpufreq driver
.../bindings/cpufreq/cpufreq-virtual-kvm.yaml | 39 +++
Documentation/virt/kvm/api.rst | 28 ++
.../virt/kvm/arm/get_cur_cpufreq.rst | 21 ++
Documentation/virt/kvm/arm/get_freqtbl.rst | 23 ++
Documentation/virt/kvm/arm/index.rst | 3 +
Documentation/virt/kvm/arm/util_hint.rst | 22 ++
arch/arm64/include/uapi/asm/kvm.h | 3 +
arch/arm64/kvm/arm.c | 3 +
arch/arm64/kvm/hypercalls.c | 60 +++++
drivers/cpufreq/Kconfig | 13 +
drivers/cpufreq/Makefile | 1 +
drivers/cpufreq/kvm-cpufreq.c | 245 ++++++++++++++++++
include/linux/arm-smccc.h | 21 ++
include/linux/sched.h | 12 +
include/uapi/linux/kvm.h | 3 +
kernel/sched/core.c | 24 +-
kernel/sched/fair.c | 15 +-
tools/arch/arm64/include/uapi/asm/kvm.h | 3 +
18 files changed, 536 insertions(+), 3 deletions(-)
create mode 100644 Documentation/devicetree/bindings/cpufreq/cpufreq-virtual-kvm.yaml
create mode 100644 Documentation/virt/kvm/arm/get_cur_cpufreq.rst
create mode 100644 Documentation/virt/kvm/arm/get_freqtbl.rst
create mode 100644 Documentation/virt/kvm/arm/util_hint.rst
create mode 100644 drivers/cpufreq/kvm-cpufreq.c
--
2.40.0.348.gf938b09366-goog
For virtualization usecases, util_est and util_avg currently tracked
on the host aren't sufficient to accurately represent the workload on
vCPU threads, which results in poor frequency selection and performance.
For example, when a large workload migrates from a busy vCPU thread to
an idle vCPU thread, it incurs additional DVFS ramp-up latencies
as util accumulates.
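As a rough illustration of that ramp-up cost, the sketch below uses an
idealized PELT model in which utilization decays with a half-life of 32
periods of roughly 1 ms each. It counts how many periods a continuously
running task needs to reach a given util when starting from zero. The
function name is hypothetical, and the real kernel uses fixed-point sums
rather than floating point:

```c
/* Per-period decay factor y, chosen so that y^32 = 0.5. */
#define PELT_DECAY	0.97857206
#define UTIL_MAX	1024.0

static unsigned int pelt_periods_to_reach(double target_util)
{
	double util = 0.0;
	unsigned int periods = 0;

	/* The geometric series approaches UTIL_MAX but never reaches it. */
	if (target_util >= UTIL_MAX)
		target_util = UTIL_MAX - 1.0;

	while (util < target_util) {
		/* Decay the old signal and add one fully busy period. */
		util = util * PELT_DECAY + UTIL_MAX * (1.0 - PELT_DECAY);
		periods++;
	}
	return periods;
}
```

In this model a thread starting from zero needs about 31 periods to reach a
util of 500 and about 98 periods to reach 900, which is the latency a guest
hint can skip.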
Introduce a new "util_guest" member as an additional PELT signal that's
independently updated by the guest. When set, it is max-aggregated with
the host-tracked values to provide a boost to both task_util and
task_util_est.
Updating task_util and task_util_est will ensure:
- Better task placement decisions for vCPU threads on the host
- Correctly updating util_est.ewma during dequeue
- Additive util with other threads on the same runqueue for more
  accurate frequency responses
Co-developed-by: Saravana Kannan <[email protected]>
Signed-off-by: Saravana Kannan <[email protected]>
Signed-off-by: David Dai <[email protected]>
---
include/linux/sched.h | 11 +++++++++++
kernel/sched/core.c | 18 +++++++++++++++++-
kernel/sched/fair.c | 15 +++++++++++++--
3 files changed, 41 insertions(+), 3 deletions(-)
diff --git a/include/linux/sched.h b/include/linux/sched.h
index 63d242164b1a..d8c346fcdf52 100644
--- a/include/linux/sched.h
+++ b/include/linux/sched.h
@@ -445,6 +445,16 @@ struct util_est {
#define UTIL_AVG_UNCHANGED 0x80000000
} __attribute__((__aligned__(sizeof(u64))));
+/*
+ * For sched_setattr_nocheck() (kernel) only
+ *
+ * Allow vCPU threads to use UTIL_GUEST as a way to hint the scheduler with more
+ * accurate utilization info. This is useful when guest kernels have some way of
+ * tracking its own runqueue's utilization.
+ *
+ */
+#define SCHED_FLAG_UTIL_GUEST 0x20000000
+
/*
* The load/runnable/util_avg accumulates an infinite geometric series
* (see __update_load_avg_cfs_rq() in kernel/sched/pelt.c).
@@ -499,6 +509,7 @@ struct sched_avg {
unsigned long load_avg;
unsigned long runnable_avg;
unsigned long util_avg;
+ unsigned long util_guest;
struct util_est util_est;
} ____cacheline_aligned;
diff --git a/kernel/sched/core.c b/kernel/sched/core.c
index 0d18c3969f90..7700ef5610c1 100644
--- a/kernel/sched/core.c
+++ b/kernel/sched/core.c
@@ -2024,6 +2024,16 @@ static inline void uclamp_post_fork(struct task_struct *p) { }
static inline void init_uclamp(void) { }
#endif /* CONFIG_UCLAMP_TASK */
+static void __setscheduler_task_util(struct task_struct *p,
+ const struct sched_attr *attr)
+{
+
+ if (likely(!(attr->sched_flags & SCHED_FLAG_UTIL_GUEST)))
+ return;
+
+ p->se.avg.util_guest = attr->sched_util_min;
+}
+
bool sched_task_on_rq(struct task_struct *p)
{
return task_on_rq_queued(p);
@@ -7561,7 +7571,7 @@ static int __sched_setscheduler(struct task_struct *p,
return -EINVAL;
}
- if (attr->sched_flags & ~(SCHED_FLAG_ALL | SCHED_FLAG_SUGOV))
+ if (attr->sched_flags & ~(SCHED_FLAG_ALL | SCHED_FLAG_SUGOV | SCHED_FLAG_UTIL_GUEST))
return -EINVAL;
/*
@@ -7583,6 +7593,9 @@ static int __sched_setscheduler(struct task_struct *p,
if (attr->sched_flags & SCHED_FLAG_SUGOV)
return -EINVAL;
+ if (attr->sched_flags & SCHED_FLAG_UTIL_GUEST)
+ return -EINVAL;
+
retval = security_task_setscheduler(p);
if (retval)
return retval;
@@ -7629,6 +7642,8 @@ static int __sched_setscheduler(struct task_struct *p,
goto change;
if (attr->sched_flags & SCHED_FLAG_UTIL_CLAMP)
goto change;
+ if (attr->sched_flags & SCHED_FLAG_UTIL_GUEST)
+ goto change;
p->sched_reset_on_fork = reset_on_fork;
retval = 0;
@@ -7718,6 +7733,7 @@ static int __sched_setscheduler(struct task_struct *p,
__setscheduler_prio(p, newprio);
}
__setscheduler_uclamp(p, attr);
+ __setscheduler_task_util(p, attr);
if (queued) {
/*
diff --git a/kernel/sched/fair.c b/kernel/sched/fair.c
index 6986ea31c984..998649554344 100644
--- a/kernel/sched/fair.c
+++ b/kernel/sched/fair.c
@@ -4276,14 +4276,16 @@ static int newidle_balance(struct rq *this_rq, struct rq_flags *rf);
static inline unsigned long task_util(struct task_struct *p)
{
- return READ_ONCE(p->se.avg.util_avg);
+ return max(READ_ONCE(p->se.avg.util_avg),
+ READ_ONCE(p->se.avg.util_guest));
}
static inline unsigned long _task_util_est(struct task_struct *p)
{
struct util_est ue = READ_ONCE(p->se.avg.util_est);
- return max(ue.ewma, (ue.enqueued & ~UTIL_AVG_UNCHANGED));
+ return max_t(unsigned long, READ_ONCE(p->se.avg.util_guest),
+ max(ue.ewma, (ue.enqueued & ~UTIL_AVG_UNCHANGED)));
}
static inline unsigned long task_util_est(struct task_struct *p)
@@ -6242,6 +6244,15 @@ enqueue_task_fair(struct rq *rq, struct task_struct *p, int flags)
*/
util_est_enqueue(&rq->cfs, p);
+ /*
+ * The normal code path for host thread enqueue doesn't take into
+ * account guest task migrations when updating cpufreq util.
+ * So, always update the cpufreq when a vCPU thread has a
+ * non-zero util_guest value.
+ */
+ if (READ_ONCE(p->se.avg.util_guest))
+ cpufreq_update_util(rq, 0);
+
/*
* If in_iowait is set, the code below may not trigger any cpufreq
* utilization updates, so do it here explicitly with the IOWAIT flag
--
2.40.0.348.gf938b09366-goog
This service allows guests to query the host for the frequency of the
physical CPU that the vCPU is currently running on.
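For reference, the function ID 0x86000040 used by this service follows from
the SMCCC function ID layout: a fast call, using the 32-bit convention, owned
by the vendor-specific hypervisor range, with function number 64. The sketch
below re-derives it with locally defined macros that mirror the layout in
include/linux/arm-smccc.h. The SKETCH_ and sketch_ names are illustrative
only:

```c
#include <stdint.h>

/* Field values of an SMCCC function ID, as used by this service. */
#define SKETCH_FAST_CALL	1u	/* bit 31: fast call */
#define SKETCH_SMC_32		0u	/* bit 30: SMC32/HVC32 convention */
#define SKETCH_OWNER_VENDOR_HYP	6u	/* bits 29:24: owning entity */

/* Pack the fields into a 32-bit function ID. */
static uint32_t sketch_smccc_call_val(uint32_t type, uint32_t conv,
				      uint32_t owner, uint32_t func_num)
{
	return (type << 31) |
	       ((conv & 0x1u) << 30) |
	       ((owner & 0x3fu) << 24) |
	       (func_num & 0xffffu);
}
```

Calling sketch_smccc_call_val(SKETCH_FAST_CALL, SKETCH_SMC_32,
SKETCH_OWNER_VENDOR_HYP, 64) yields 0x86000040.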
Co-developed-by: Saravana Kannan <[email protected]>
Signed-off-by: Saravana Kannan <[email protected]>
Signed-off-by: David Dai <[email protected]>
---
Documentation/virt/kvm/api.rst | 8 +++++++
.../virt/kvm/arm/get_cur_cpufreq.rst | 21 +++++++++++++++++++
Documentation/virt/kvm/arm/index.rst | 1 +
arch/arm64/include/uapi/asm/kvm.h | 1 +
arch/arm64/kvm/arm.c | 1 +
arch/arm64/kvm/hypercalls.c | 18 ++++++++++++++++
include/linux/arm-smccc.h | 7 +++++++
include/uapi/linux/kvm.h | 1 +
tools/arch/arm64/include/uapi/asm/kvm.h | 1 +
9 files changed, 59 insertions(+)
create mode 100644 Documentation/virt/kvm/arm/get_cur_cpufreq.rst
diff --git a/Documentation/virt/kvm/api.rst b/Documentation/virt/kvm/api.rst
index 62de0768d6aa..b0ff0ad700bf 100644
--- a/Documentation/virt/kvm/api.rst
+++ b/Documentation/virt/kvm/api.rst
@@ -8380,6 +8380,14 @@ structure.
When getting the Modified Change Topology Report value, the attr->addr
must point to a byte where the value will be stored or retrieved from.
+8.40 KVM_CAP_GET_CUR_CPUFREQ
+----------------------------
+
+:Architectures: arm64
+
+This capability indicates that KVM supports getting the
+frequency of the current CPU that the vCPU thread is running on.
+
9. Known KVM API problems
=========================
diff --git a/Documentation/virt/kvm/arm/get_cur_cpufreq.rst b/Documentation/virt/kvm/arm/get_cur_cpufreq.rst
new file mode 100644
index 000000000000..06e0ed5b3868
--- /dev/null
+++ b/Documentation/virt/kvm/arm/get_cur_cpufreq.rst
@@ -0,0 +1,21 @@
+.. SPDX-License-Identifier: GPL-2.0
+
+get_cur_cpufreq support for arm/arm64
+=====================================
+
+Get_cur_cpufreq support is used to get current frequency(in KHz) of the
+current CPU that the vCPU thread is running on.
+
+* ARM_SMCCC_VENDOR_HYP_KVM_GET_CUR_CPUFREQ_FUNC_ID: 0x86000040
+
+This hypercall uses the SMC32/HVC32 calling convention:
+
+ARM_SMCCC_VENDOR_HYP_KVM_GET_CUR_CPUFREQ_FUNC_ID
+ ============== ======== =====================================
+ Function ID: (uint32) 0x86000040
+ Return Values: (int32) NOT_SUPPORTED(-1) on error, or
+ (uint32) Frequency in KHz of current CPU that the
+ vCPU thread is running on.
+ Endianness: Must be the same endianness
+ as the host.
+ ============== ======== =====================================
diff --git a/Documentation/virt/kvm/arm/index.rst b/Documentation/virt/kvm/arm/index.rst
index e84848432158..47afc5c1f24a 100644
--- a/Documentation/virt/kvm/arm/index.rst
+++ b/Documentation/virt/kvm/arm/index.rst
@@ -11,3 +11,4 @@ ARM
hypercalls
pvtime
ptp_kvm
+ get_cur_cpufreq
diff --git a/arch/arm64/include/uapi/asm/kvm.h b/arch/arm64/include/uapi/asm/kvm.h
index f8129c624b07..ed8b63e91bdc 100644
--- a/arch/arm64/include/uapi/asm/kvm.h
+++ b/arch/arm64/include/uapi/asm/kvm.h
@@ -367,6 +367,7 @@ enum {
enum {
KVM_REG_ARM_VENDOR_HYP_BIT_FUNC_FEAT = 0,
KVM_REG_ARM_VENDOR_HYP_BIT_PTP = 1,
+ KVM_REG_ARM_VENDOR_HYP_BIT_GET_CUR_CPUFREQ = 2,
#ifdef __KERNEL__
KVM_REG_ARM_VENDOR_HYP_BMAP_BIT_COUNT,
#endif
diff --git a/arch/arm64/kvm/arm.c b/arch/arm64/kvm/arm.c
index 3bd732eaf087..f960b136c611 100644
--- a/arch/arm64/kvm/arm.c
+++ b/arch/arm64/kvm/arm.c
@@ -220,6 +220,7 @@ int kvm_vm_ioctl_check_extension(struct kvm *kvm, long ext)
case KVM_CAP_VCPU_ATTRIBUTES:
case KVM_CAP_PTP_KVM:
case KVM_CAP_ARM_SYSTEM_SUSPEND:
+ case KVM_CAP_GET_CUR_CPUFREQ:
r = 1;
break;
case KVM_CAP_SET_GUEST_DEBUG2:
diff --git a/arch/arm64/kvm/hypercalls.c b/arch/arm64/kvm/hypercalls.c
index 5da884e11337..b3f4b90c024b 100644
--- a/arch/arm64/kvm/hypercalls.c
+++ b/arch/arm64/kvm/hypercalls.c
@@ -3,6 +3,9 @@
#include <linux/arm-smccc.h>
#include <linux/kvm_host.h>
+#include <linux/cpufreq.h>
+#include <linux/sched.h>
+#include <uapi/linux/sched/types.h>
#include <asm/kvm_emulate.h>
@@ -16,6 +19,15 @@
#define KVM_ARM_SMCCC_VENDOR_HYP_FEATURES \
GENMASK(KVM_REG_ARM_VENDOR_HYP_BMAP_BIT_COUNT - 1, 0)
+static void kvm_sched_get_cur_cpufreq(struct kvm_vcpu *vcpu, u64 *val)
+{
+ unsigned long ret_freq;
+
+ ret_freq = cpufreq_get(task_cpu(current));
+
+ val[0] = ret_freq;
+}
+
static void kvm_ptp_get_time(struct kvm_vcpu *vcpu, u64 *val)
{
struct system_time_snapshot systime_snapshot;
@@ -116,6 +128,9 @@ static bool kvm_hvc_call_allowed(struct kvm_vcpu *vcpu, u32 func_id)
case ARM_SMCCC_VENDOR_HYP_KVM_PTP_FUNC_ID:
return test_bit(KVM_REG_ARM_VENDOR_HYP_BIT_PTP,
&smccc_feat->vendor_hyp_bmap);
+ case ARM_SMCCC_VENDOR_HYP_KVM_GET_CUR_CPUFREQ_FUNC_ID:
+ return test_bit(KVM_REG_ARM_VENDOR_HYP_BIT_GET_CUR_CPUFREQ,
+ &smccc_feat->vendor_hyp_bmap);
default:
return kvm_hvc_call_default_allowed(func_id);
}
@@ -213,6 +228,9 @@ int kvm_hvc_call_handler(struct kvm_vcpu *vcpu)
case ARM_SMCCC_VENDOR_HYP_KVM_PTP_FUNC_ID:
kvm_ptp_get_time(vcpu, val);
break;
+ case ARM_SMCCC_VENDOR_HYP_KVM_GET_CUR_CPUFREQ_FUNC_ID:
+ kvm_sched_get_cur_cpufreq(vcpu, val);
+ break;
case ARM_SMCCC_TRNG_VERSION:
case ARM_SMCCC_TRNG_FEATURES:
case ARM_SMCCC_TRNG_GET_UUID:
diff --git a/include/linux/arm-smccc.h b/include/linux/arm-smccc.h
index 220c8c60e021..e15f1bdcf3f1 100644
--- a/include/linux/arm-smccc.h
+++ b/include/linux/arm-smccc.h
@@ -112,6 +112,7 @@
/* KVM "vendor specific" services */
#define ARM_SMCCC_KVM_FUNC_FEATURES 0
#define ARM_SMCCC_KVM_FUNC_PTP 1
+#define ARM_SMCCC_KVM_FUNC_GET_CUR_CPUFREQ 64
#define ARM_SMCCC_KVM_FUNC_FEATURES_2 127
#define ARM_SMCCC_KVM_NUM_FUNCS 128
@@ -138,6 +139,12 @@
#define KVM_PTP_VIRT_COUNTER 0
#define KVM_PTP_PHYS_COUNTER 1
+#define ARM_SMCCC_VENDOR_HYP_KVM_GET_CUR_CPUFREQ_FUNC_ID \
+ ARM_SMCCC_CALL_VAL(ARM_SMCCC_FAST_CALL, \
+ ARM_SMCCC_SMC_32, \
+ ARM_SMCCC_OWNER_VENDOR_HYP, \
+ ARM_SMCCC_KVM_FUNC_GET_CUR_CPUFREQ)
+
/* Paravirtualised time calls (defined by ARM DEN0057A) */
#define ARM_SMCCC_HV_PV_TIME_FEATURES \
ARM_SMCCC_CALL_VAL(ARM_SMCCC_FAST_CALL, \
diff --git a/include/uapi/linux/kvm.h b/include/uapi/linux/kvm.h
index d77aef872a0a..0a1a260243bf 100644
--- a/include/uapi/linux/kvm.h
+++ b/include/uapi/linux/kvm.h
@@ -1184,6 +1184,7 @@ struct kvm_ppc_resize_hpt {
#define KVM_CAP_S390_PROTECTED_ASYNC_DISABLE 224
#define KVM_CAP_DIRTY_LOG_RING_WITH_BITMAP 225
#define KVM_CAP_PMU_EVENT_MASKED_EVENTS 226
+#define KVM_CAP_GET_CUR_CPUFREQ 512
#ifdef KVM_CAP_IRQ_ROUTING
diff --git a/tools/arch/arm64/include/uapi/asm/kvm.h b/tools/arch/arm64/include/uapi/asm/kvm.h
index f8129c624b07..ed8b63e91bdc 100644
--- a/tools/arch/arm64/include/uapi/asm/kvm.h
+++ b/tools/arch/arm64/include/uapi/asm/kvm.h
@@ -367,6 +367,7 @@ enum {
enum {
KVM_REG_ARM_VENDOR_HYP_BIT_FUNC_FEAT = 0,
KVM_REG_ARM_VENDOR_HYP_BIT_PTP = 1,
+ KVM_REG_ARM_VENDOR_HYP_BIT_GET_CUR_CPUFREQ = 2,
#ifdef __KERNEL__
KVM_REG_ARM_VENDOR_HYP_BMAP_BIT_COUNT,
#endif
--
2.40.0.348.gf938b09366-goog
This service allows guests to send the utilization of workloads on their
vCPUs to the host. Utilization is represented as an arbitrary value of
0-1024 where 1024 represents the highest performance point normalized for
frequency and architecture across all CPUs. This hint is used by the host
for scheduling vCPU threads and deciding CPU frequency.
Co-developed-by: Saravana Kannan <[email protected]>
Signed-off-by: Saravana Kannan <[email protected]>
Signed-off-by: David Dai <[email protected]>
---
Documentation/virt/kvm/api.rst | 12 ++++++++++++
Documentation/virt/kvm/arm/index.rst | 1 +
Documentation/virt/kvm/arm/util_hint.rst | 22 ++++++++++++++++++++++
arch/arm64/include/uapi/asm/kvm.h | 1 +
arch/arm64/kvm/arm.c | 1 +
arch/arm64/kvm/hypercalls.c | 20 ++++++++++++++++++++
include/linux/arm-smccc.h | 7 +++++++
include/uapi/linux/kvm.h | 1 +
tools/arch/arm64/include/uapi/asm/kvm.h | 1 +
9 files changed, 66 insertions(+)
create mode 100644 Documentation/virt/kvm/arm/util_hint.rst
diff --git a/Documentation/virt/kvm/api.rst b/Documentation/virt/kvm/api.rst
index b0ff0ad700bf..38ce33564efc 100644
--- a/Documentation/virt/kvm/api.rst
+++ b/Documentation/virt/kvm/api.rst
@@ -8388,6 +8388,18 @@ must point to a byte where the value will be stored or retrieved from.
This capability indicates that KVM supports getting the
frequency of the current CPU that the vCPU thread is running on.
+8.41 KVM_CAP_UTIL_HINT
+----------------------
+
+:Architectures: arm64
+
+This capability indicates that the KVM supports taking utilization
+hints from the guest. Utilization is represented as a value from 0-1024
+where 1024 represents the highest performance point across all physical CPUs
+after normalizing for architecture. This is useful when guests are tracking
+workload on its vCPUs. Util hints allow the host to make more accurate
+frequency selections and task placement for vCPU threads.
+
9. Known KVM API problems
=========================
diff --git a/Documentation/virt/kvm/arm/index.rst b/Documentation/virt/kvm/arm/index.rst
index 47afc5c1f24a..f83877663813 100644
--- a/Documentation/virt/kvm/arm/index.rst
+++ b/Documentation/virt/kvm/arm/index.rst
@@ -12,3 +12,4 @@ ARM
pvtime
ptp_kvm
get_cur_cpufreq
+ util_hint
diff --git a/Documentation/virt/kvm/arm/util_hint.rst b/Documentation/virt/kvm/arm/util_hint.rst
new file mode 100644
index 000000000000..262d142d62d9
--- /dev/null
+++ b/Documentation/virt/kvm/arm/util_hint.rst
@@ -0,0 +1,22 @@
+.. SPDX-License-Identifier: GPL-2.0
+
+Util_hint support for arm64
+============================
+
+Util_hint is used for sharing the utilization value from the guest
+to the host.
+
+* ARM_SMCCC_VENDOR_HYP_KVM_UTIL_HINT_FUNC_ID: 0x86000041
+
+This hypercall uses the SMC32/HVC32 calling convention:
+
+ARM_SMCCC_VENDOR_HYP_KVM_UTIL_HINT_FUNC_ID
+ ============== ======== ============================
+ Function ID: (uint32) 0x86000041
+ Arguments: (uint32) util value(0-1024) where 1024 represents
+ the highest performance point normalized
+ across all CPUs
+ Return values: (int32) NOT_SUPPORTED(-1) on error.
+ Endianness: Must be the same endianness
+ as the host.
+ ============== ======== ============================
diff --git a/arch/arm64/include/uapi/asm/kvm.h b/arch/arm64/include/uapi/asm/kvm.h
index ed8b63e91bdc..61309ecb7241 100644
--- a/arch/arm64/include/uapi/asm/kvm.h
+++ b/arch/arm64/include/uapi/asm/kvm.h
@@ -368,6 +368,7 @@ enum {
KVM_REG_ARM_VENDOR_HYP_BIT_FUNC_FEAT = 0,
KVM_REG_ARM_VENDOR_HYP_BIT_PTP = 1,
KVM_REG_ARM_VENDOR_HYP_BIT_GET_CUR_CPUFREQ = 2,
+ KVM_REG_ARM_VENDOR_HYP_BIT_UTIL_HINT = 3,
#ifdef __KERNEL__
KVM_REG_ARM_VENDOR_HYP_BMAP_BIT_COUNT,
#endif
diff --git a/arch/arm64/kvm/arm.c b/arch/arm64/kvm/arm.c
index f960b136c611..bf3c4d4b9b67 100644
--- a/arch/arm64/kvm/arm.c
+++ b/arch/arm64/kvm/arm.c
@@ -221,6 +221,7 @@ int kvm_vm_ioctl_check_extension(struct kvm *kvm, long ext)
case KVM_CAP_PTP_KVM:
case KVM_CAP_ARM_SYSTEM_SUSPEND:
case KVM_CAP_GET_CUR_CPUFREQ:
+ case KVM_CAP_UTIL_HINT:
r = 1;
break;
case KVM_CAP_SET_GUEST_DEBUG2:
diff --git a/arch/arm64/kvm/hypercalls.c b/arch/arm64/kvm/hypercalls.c
index b3f4b90c024b..01dba07b5183 100644
--- a/arch/arm64/kvm/hypercalls.c
+++ b/arch/arm64/kvm/hypercalls.c
@@ -28,6 +28,20 @@ static void kvm_sched_get_cur_cpufreq(struct kvm_vcpu *vcpu, u64 *val)
val[0] = ret_freq;
}
+static void kvm_sched_set_util(struct kvm_vcpu *vcpu, u64 *val)
+{
+ struct sched_attr attr = {
+ .sched_flags = SCHED_FLAG_UTIL_GUEST,
+ };
+ int ret;
+
+ attr.sched_util_min = smccc_get_arg1(vcpu);
+
+ ret = sched_setattr_nocheck(current, &attr);
+
+ val[0] = (u64)ret;
+}
+
static void kvm_ptp_get_time(struct kvm_vcpu *vcpu, u64 *val)
{
struct system_time_snapshot systime_snapshot;
@@ -131,6 +145,9 @@ static bool kvm_hvc_call_allowed(struct kvm_vcpu *vcpu, u32 func_id)
case ARM_SMCCC_VENDOR_HYP_KVM_GET_CUR_CPUFREQ_FUNC_ID:
return test_bit(KVM_REG_ARM_VENDOR_HYP_BIT_GET_CUR_CPUFREQ,
&smccc_feat->vendor_hyp_bmap);
+ case ARM_SMCCC_VENDOR_HYP_KVM_UTIL_HINT_FUNC_ID:
+ return test_bit(KVM_REG_ARM_VENDOR_HYP_BIT_UTIL_HINT,
+ &smccc_feat->vendor_hyp_bmap);
default:
return kvm_hvc_call_default_allowed(func_id);
}
@@ -231,6 +248,9 @@ int kvm_hvc_call_handler(struct kvm_vcpu *vcpu)
case ARM_SMCCC_VENDOR_HYP_KVM_GET_CUR_CPUFREQ_FUNC_ID:
kvm_sched_get_cur_cpufreq(vcpu, val);
break;
+ case ARM_SMCCC_VENDOR_HYP_KVM_UTIL_HINT_FUNC_ID:
+ kvm_sched_set_util(vcpu, val);
+ break;
case ARM_SMCCC_TRNG_VERSION:
case ARM_SMCCC_TRNG_FEATURES:
case ARM_SMCCC_TRNG_GET_UUID:
diff --git a/include/linux/arm-smccc.h b/include/linux/arm-smccc.h
index e15f1bdcf3f1..9f747e5025b6 100644
--- a/include/linux/arm-smccc.h
+++ b/include/linux/arm-smccc.h
@@ -113,6 +113,7 @@
#define ARM_SMCCC_KVM_FUNC_FEATURES 0
#define ARM_SMCCC_KVM_FUNC_PTP 1
#define ARM_SMCCC_KVM_FUNC_GET_CUR_CPUFREQ 64
+#define ARM_SMCCC_KVM_FUNC_UTIL_HINT 65
#define ARM_SMCCC_KVM_FUNC_FEATURES_2 127
#define ARM_SMCCC_KVM_NUM_FUNCS 128
@@ -145,6 +146,12 @@
ARM_SMCCC_OWNER_VENDOR_HYP, \
ARM_SMCCC_KVM_FUNC_GET_CUR_CPUFREQ)
+#define ARM_SMCCC_VENDOR_HYP_KVM_UTIL_HINT_FUNC_ID \
+ ARM_SMCCC_CALL_VAL(ARM_SMCCC_FAST_CALL, \
+ ARM_SMCCC_SMC_32, \
+ ARM_SMCCC_OWNER_VENDOR_HYP, \
+ ARM_SMCCC_KVM_FUNC_UTIL_HINT)
+
/* Paravirtualised time calls (defined by ARM DEN0057A) */
#define ARM_SMCCC_HV_PV_TIME_FEATURES \
ARM_SMCCC_CALL_VAL(ARM_SMCCC_FAST_CALL, \
diff --git a/include/uapi/linux/kvm.h b/include/uapi/linux/kvm.h
index 0a1a260243bf..7f667ab344ae 100644
--- a/include/uapi/linux/kvm.h
+++ b/include/uapi/linux/kvm.h
@@ -1185,6 +1185,7 @@ struct kvm_ppc_resize_hpt {
#define KVM_CAP_DIRTY_LOG_RING_WITH_BITMAP 225
#define KVM_CAP_PMU_EVENT_MASKED_EVENTS 226
#define KVM_CAP_GET_CUR_CPUFREQ 512
+#define KVM_CAP_UTIL_HINT 513
#ifdef KVM_CAP_IRQ_ROUTING
diff --git a/tools/arch/arm64/include/uapi/asm/kvm.h b/tools/arch/arm64/include/uapi/asm/kvm.h
index ed8b63e91bdc..61309ecb7241 100644
--- a/tools/arch/arm64/include/uapi/asm/kvm.h
+++ b/tools/arch/arm64/include/uapi/asm/kvm.h
@@ -368,6 +368,7 @@ enum {
KVM_REG_ARM_VENDOR_HYP_BIT_FUNC_FEAT = 0,
KVM_REG_ARM_VENDOR_HYP_BIT_PTP = 1,
KVM_REG_ARM_VENDOR_HYP_BIT_GET_CUR_CPUFREQ = 2,
+ KVM_REG_ARM_VENDOR_HYP_BIT_UTIL_HINT = 3,
#ifdef __KERNEL__
KVM_REG_ARM_VENDOR_HYP_BMAP_BIT_COUNT,
#endif
--
2.40.0.348.gf938b09366-goog
This service allows guests to query the host for the frequency table
of the CPU that the vCPU is currently running on.
Co-developed-by: Saravana Kannan <[email protected]>
Signed-off-by: Saravana Kannan <[email protected]>
Signed-off-by: David Dai <[email protected]>
---
Documentation/virt/kvm/api.rst | 8 ++++++++
Documentation/virt/kvm/arm/get_freqtbl.rst | 23 ++++++++++++++++++++++
Documentation/virt/kvm/arm/index.rst | 1 +
arch/arm64/include/uapi/asm/kvm.h | 1 +
arch/arm64/kvm/arm.c | 1 +
arch/arm64/kvm/hypercalls.c | 22 +++++++++++++++++++++
include/linux/arm-smccc.h | 7 +++++++
include/uapi/linux/kvm.h | 1 +
tools/arch/arm64/include/uapi/asm/kvm.h | 1 +
9 files changed, 65 insertions(+)
create mode 100644 Documentation/virt/kvm/arm/get_freqtbl.rst
diff --git a/Documentation/virt/kvm/api.rst b/Documentation/virt/kvm/api.rst
index 38ce33564efc..8f905456e2b4 100644
--- a/Documentation/virt/kvm/api.rst
+++ b/Documentation/virt/kvm/api.rst
@@ -8400,6 +8400,14 @@ after normalizing for architecture. This is useful when guests are tracking
workload on its vCPUs. Util hints allow the host to make more accurate
frequency selections and task placement for vCPU threads.
+8.42 KVM_CAP_GET_CPUFREQ_TBL
+---------------------------
+
+:Architectures: arm64
+
+This capability indicates that the KVM supports getting the
+frequency table of the current CPU that the vCPU thread is running on.
+
9. Known KVM API problems
=========================
diff --git a/Documentation/virt/kvm/arm/get_freqtbl.rst b/Documentation/virt/kvm/arm/get_freqtbl.rst
new file mode 100644
index 000000000000..f6832d7566e7
--- /dev/null
+++ b/Documentation/virt/kvm/arm/get_freqtbl.rst
@@ -0,0 +1,23 @@
+.. SPDX-License-Identifier: GPL-2.0
+
+get_freqtbl support for arm/arm64
+=============================
+
+Allows guest to query the frequency(in KHz) table of the current CPU that
+the vCPU thread is running on.
+
+* ARM_SMCCC_VENDOR_HYP_KVM_GET_CPUFREQ_TBL_FUNC_ID: 0x86000042
+
+This hypercall uses the SMC32/HVC32 calling convention:
+
+ARM_SMCCC_VENDOR_HYP_KVM_GET_CPUFREQ_TBL_FUNC_ID
+ ============== ======== =====================================
+ Function ID: (uint32) 0x86000042
+ Arguments: (uint32) index of the current CPU's frequency table
+ Return Values: (int32) NOT_SUPPORTED(-1) on error, or
+ (uint32) Frequency table entry (in KHz) at the
+ requested index for the current CPU,
+ returned in r1
+ Endianness: Must be the same endianness
+ as the host.
+ ============== ======== =====================================
diff --git a/Documentation/virt/kvm/arm/index.rst b/Documentation/virt/kvm/arm/index.rst
index f83877663813..e2e56bb41491 100644
--- a/Documentation/virt/kvm/arm/index.rst
+++ b/Documentation/virt/kvm/arm/index.rst
@@ -13,3 +13,4 @@ ARM
ptp_kvm
get_cur_cpufreq
util_hint
+ get_freqtbl
diff --git a/arch/arm64/include/uapi/asm/kvm.h b/arch/arm64/include/uapi/asm/kvm.h
index 61309ecb7241..ed6f593264bd 100644
--- a/arch/arm64/include/uapi/asm/kvm.h
+++ b/arch/arm64/include/uapi/asm/kvm.h
@@ -369,6 +369,7 @@ enum {
KVM_REG_ARM_VENDOR_HYP_BIT_PTP = 1,
KVM_REG_ARM_VENDOR_HYP_BIT_GET_CUR_CPUFREQ = 2,
KVM_REG_ARM_VENDOR_HYP_BIT_UTIL_HINT = 3,
+ KVM_REG_ARM_VENDOR_HYP_BIT_GET_CPUFREQ_TBL = 4,
#ifdef __KERNEL__
KVM_REG_ARM_VENDOR_HYP_BMAP_BIT_COUNT,
#endif
diff --git a/arch/arm64/kvm/arm.c b/arch/arm64/kvm/arm.c
index bf3c4d4b9b67..cd76128e4af4 100644
--- a/arch/arm64/kvm/arm.c
+++ b/arch/arm64/kvm/arm.c
@@ -222,6 +222,7 @@ int kvm_vm_ioctl_check_extension(struct kvm *kvm, long ext)
case KVM_CAP_ARM_SYSTEM_SUSPEND:
case KVM_CAP_GET_CUR_CPUFREQ:
case KVM_CAP_UTIL_HINT:
+ case KVM_CAP_GET_CPUFREQ_TBL:
r = 1;
break;
case KVM_CAP_SET_GUEST_DEBUG2:
diff --git a/arch/arm64/kvm/hypercalls.c b/arch/arm64/kvm/hypercalls.c
index 01dba07b5183..6f96579dda80 100644
--- a/arch/arm64/kvm/hypercalls.c
+++ b/arch/arm64/kvm/hypercalls.c
@@ -42,6 +42,22 @@ static void kvm_sched_set_util(struct kvm_vcpu *vcpu, u64 *val)
val[0] = (u64)ret;
}
+static void kvm_sched_get_cpufreq_table(struct kvm_vcpu *vcpu, u64 *val)
+{
+ struct cpufreq_policy *policy;
+ u32 idx = smccc_get_arg1(vcpu);
+
+ policy = cpufreq_cpu_get(task_cpu(current));
+
+ if (!policy)
+ return;
+
+ val[0] = SMCCC_RET_SUCCESS;
+ val[1] = policy->freq_table[idx].frequency;
+
+ cpufreq_cpu_put(policy);
+}
+
static void kvm_ptp_get_time(struct kvm_vcpu *vcpu, u64 *val)
{
struct system_time_snapshot systime_snapshot;
@@ -148,6 +164,9 @@ static bool kvm_hvc_call_allowed(struct kvm_vcpu *vcpu, u32 func_id)
case ARM_SMCCC_VENDOR_HYP_KVM_UTIL_HINT_FUNC_ID:
return test_bit(KVM_REG_ARM_VENDOR_HYP_BIT_UTIL_HINT,
&smccc_feat->vendor_hyp_bmap);
+ case ARM_SMCCC_VENDOR_HYP_KVM_GET_CPUFREQ_TBL_FUNC_ID:
+ return test_bit(KVM_REG_ARM_VENDOR_HYP_BIT_GET_CPUFREQ_TBL,
+ &smccc_feat->vendor_hyp_bmap);
default:
return kvm_hvc_call_default_allowed(func_id);
}
@@ -251,6 +270,9 @@ int kvm_hvc_call_handler(struct kvm_vcpu *vcpu)
case ARM_SMCCC_VENDOR_HYP_KVM_UTIL_HINT_FUNC_ID:
kvm_sched_set_util(vcpu, val);
break;
+ case ARM_SMCCC_VENDOR_HYP_KVM_GET_CPUFREQ_TBL_FUNC_ID:
+ kvm_sched_get_cpufreq_table(vcpu, val);
+ break;
case ARM_SMCCC_TRNG_VERSION:
case ARM_SMCCC_TRNG_FEATURES:
case ARM_SMCCC_TRNG_GET_UUID:
diff --git a/include/linux/arm-smccc.h b/include/linux/arm-smccc.h
index 9f747e5025b6..19fefb73a9bd 100644
--- a/include/linux/arm-smccc.h
+++ b/include/linux/arm-smccc.h
@@ -114,6 +114,7 @@
#define ARM_SMCCC_KVM_FUNC_PTP 1
#define ARM_SMCCC_KVM_FUNC_GET_CUR_CPUFREQ 64
#define ARM_SMCCC_KVM_FUNC_UTIL_HINT 65
+#define ARM_SMCCC_KVM_FUNC_GET_CPUFREQ_TBL 66
#define ARM_SMCCC_KVM_FUNC_FEATURES_2 127
#define ARM_SMCCC_KVM_NUM_FUNCS 128
@@ -152,6 +153,12 @@
ARM_SMCCC_OWNER_VENDOR_HYP, \
ARM_SMCCC_KVM_FUNC_UTIL_HINT)
+#define ARM_SMCCC_VENDOR_HYP_KVM_GET_CPUFREQ_TBL_FUNC_ID \
+ ARM_SMCCC_CALL_VAL(ARM_SMCCC_FAST_CALL, \
+ ARM_SMCCC_SMC_32, \
+ ARM_SMCCC_OWNER_VENDOR_HYP, \
+ ARM_SMCCC_KVM_FUNC_GET_CPUFREQ_TBL)
+
/* Paravirtualised time calls (defined by ARM DEN0057A) */
#define ARM_SMCCC_HV_PV_TIME_FEATURES \
ARM_SMCCC_CALL_VAL(ARM_SMCCC_FAST_CALL, \
diff --git a/include/uapi/linux/kvm.h b/include/uapi/linux/kvm.h
index 7f667ab344ae..90a7f37f046d 100644
--- a/include/uapi/linux/kvm.h
+++ b/include/uapi/linux/kvm.h
@@ -1186,6 +1186,7 @@ struct kvm_ppc_resize_hpt {
#define KVM_CAP_PMU_EVENT_MASKED_EVENTS 226
#define KVM_CAP_GET_CUR_CPUFREQ 512
#define KVM_CAP_UTIL_HINT 513
+#define KVM_CAP_GET_CPUFREQ_TBL 514
#ifdef KVM_CAP_IRQ_ROUTING
diff --git a/tools/arch/arm64/include/uapi/asm/kvm.h b/tools/arch/arm64/include/uapi/asm/kvm.h
index 61309ecb7241..ebf9a3395c1b 100644
--- a/tools/arch/arm64/include/uapi/asm/kvm.h
+++ b/tools/arch/arm64/include/uapi/asm/kvm.h
@@ -369,6 +369,7 @@ enum {
KVM_REG_ARM_VENDOR_HYP_BIT_PTP = 1,
KVM_REG_ARM_VENDOR_HYP_BIT_GET_CUR_CPUFREQ = 2,
KVM_REG_ARM_VENDOR_HYP_BIT_UTIL_HINT = 3,
+ KVM_REG_ARM_VENDOR_HYP_BIT_GET_CPUFREQ_TBL = 4,
#ifdef __KERNEL__
KVM_REG_ARM_VENDOR_HYP_BMAP_BIT_COUNT,
#endif
--
2.40.0.348.gf938b09366-goog
Add devicetree bindings for a virtual kvm cpufreq driver.
Co-developed-by: Saravana Kannan <[email protected]>
Signed-off-by: Saravana Kannan <[email protected]>
Signed-off-by: David Dai <[email protected]>
---
.../bindings/cpufreq/cpufreq-virtual-kvm.yaml | 39 +++++++++++++++++++
1 file changed, 39 insertions(+)
create mode 100644 Documentation/devicetree/bindings/cpufreq/cpufreq-virtual-kvm.yaml
diff --git a/Documentation/devicetree/bindings/cpufreq/cpufreq-virtual-kvm.yaml b/Documentation/devicetree/bindings/cpufreq/cpufreq-virtual-kvm.yaml
new file mode 100644
index 000000000000..31e64558a7f1
--- /dev/null
+++ b/Documentation/devicetree/bindings/cpufreq/cpufreq-virtual-kvm.yaml
@@ -0,0 +1,39 @@
+# SPDX-License-Identifier: GPL-2.0-only OR BSD-2-Clause
+%YAML 1.2
+---
+$id: http://devicetree.org/schemas/cpufreq/cpufreq-virtual-kvm.yaml#
+$schema: http://devicetree.org/meta-schemas/core.yaml#
+
+title: Virtual KVM CPUFreq
+
+maintainers:
+ - David Dai <[email protected]>
+
+description: |
+
+ KVM CPUFreq is a virtualized driver in guest kernels that sends utilization
+ of its vCPUs as a hint to the host. The host uses the hint to schedule vCPU
+ threads and select CPU frequency. It enables accurate Per-Entity Load
+ Tracking for tasks running in the guest by querying host CPU frequency
+ unless a virtualized FIE exists (such as AMUs).
+
+properties:
+ compatible:
+ const: virtual,kvm-cpufreq
+
+required:
+ - compatible
+
+additionalProperties: false
+
+examples:
+ - |
+ {
+ #address-cells = <2>;
+ #size-cells = <2>;
+
+ cpufreq {
+ compatible = "virtual,kvm-cpufreq";
+ };
+
+ };
--
2.40.0.348.gf938b09366-goog
Introduce a virtualized cpufreq driver for guest kernels to improve
performance and power of workloads within VMs.
This driver does two main things:
1. Sends utilization of vCPUs as a hint to the host. The host uses the
hint to schedule the vCPU threads and decide physical CPU frequency.
2. If a VM does not support a virtualized FIE (such as AMUs), it uses
hypercalls to update the guest's frequency scaling factor periodically
by querying the host CPU frequency. This enables accurate
Per-Entity Load Tracking for tasks running in the guest.
Note that because the host already employs a rate_limit_us, we set the
transition_delay_us of the cpufreq policy to a minuscule value (1)
to avoid any additional delay between a change in the runqueue's util
and the frequency response on the host.
Co-developed-by: Saravana Kannan <[email protected]>
Signed-off-by: Saravana Kannan <[email protected]>
Signed-off-by: David Dai <[email protected]>
---
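For testers who want to sanity-check numbers: below is a small user-space
sketch of the two calculations the driver performs, the frequency scale
factor written on each tick and the utilization value sent as the hint.
The function names freq_scale() and clamp_util_hint() are illustrative
only, and the zero check on max_khz is an assumption added for the sketch;
the arithmetic otherwise follows kvm_scale_freq_tick() and
kvm_cpufreq_setutil_hyp() as posted.

```c
#include <stdint.h>

/* Same value as the kernel's SCHED_CAPACITY_SHIFT; redefined for user space. */
#define CAPACITY_SHIFT 10

/* Fixed-point cur/max ratio, where 1024 means "running at max frequency". */
static uint64_t freq_scale(uint64_t cur_khz, uint64_t max_khz)
{
    if (max_khz == 0)
        return 0; /* assumption: no scaling information available */
    return (cur_khz << CAPACITY_SHIFT) / max_khz;
}

/*
 * Utilization hint: values above 75% of capacity are replaced by the
 * midpoint between 75% and 100% of capacity; lower values pass through.
 */
static uint32_t clamp_util_hint(uint32_t util, uint32_t cap)
{
    uint32_t threshold = cap - (cap >> 2);

    if (util > threshold)
        util = (cap + threshold) >> 1;
    return util;
}
```

For a vCPU with capacity 1024 the threshold is 768, so any utilization
above 768 is reported to the host as 896.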
drivers/cpufreq/Kconfig | 13 ++
drivers/cpufreq/Makefile | 1 +
drivers/cpufreq/kvm-cpufreq.c | 245 ++++++++++++++++++++++++++++++++++
include/linux/sched.h | 1 +
kernel/sched/core.c | 6 +
5 files changed, 266 insertions(+)
create mode 100644 drivers/cpufreq/kvm-cpufreq.c
diff --git a/drivers/cpufreq/Kconfig b/drivers/cpufreq/Kconfig
index 2c839bd2b051..0ef9d5be7c4d 100644
--- a/drivers/cpufreq/Kconfig
+++ b/drivers/cpufreq/Kconfig
@@ -217,6 +217,19 @@ config CPUFREQ_DT
If in doubt, say N.
+config CPUFREQ_KVM
+ tristate "KVM cpufreq driver"
+ help
+ This adds a virtualized KVM cpufreq driver for guest kernels that
+ uses hypercalls to communicate with the host. It sends utilization
+ updates to the host, which uses them to schedule vCPU threads and
+ select CPU frequency. If a VM does not support a virtualized FIE
+ such as AMUs, it updates the frequency scaling factor by polling
+ host CPU frequency to enable accurate Per-Entity Load Tracking
+ for tasks running in the guest.
+
+ If in doubt, say N.
+
config CPUFREQ_DT_PLATDEV
bool
help
diff --git a/drivers/cpufreq/Makefile b/drivers/cpufreq/Makefile
index ef8510774913..179ea8d45135 100644
--- a/drivers/cpufreq/Makefile
+++ b/drivers/cpufreq/Makefile
@@ -16,6 +16,7 @@ obj-$(CONFIG_CPU_FREQ_GOV_ATTR_SET) += cpufreq_governor_attr_set.o
obj-$(CONFIG_CPUFREQ_DT) += cpufreq-dt.o
obj-$(CONFIG_CPUFREQ_DT_PLATDEV) += cpufreq-dt-platdev.o
+obj-$(CONFIG_CPUFREQ_KVM) += kvm-cpufreq.o
# Traces
CFLAGS_amd-pstate-trace.o := -I$(src)
diff --git a/drivers/cpufreq/kvm-cpufreq.c b/drivers/cpufreq/kvm-cpufreq.c
new file mode 100644
index 000000000000..1542c9ac4119
--- /dev/null
+++ b/drivers/cpufreq/kvm-cpufreq.c
@@ -0,0 +1,245 @@
+// SPDX-License-Identifier: GPL-2.0-only
+/*
+ * Copyright (C) 2023 Google LLC
+ */
+
+#include <linux/arch_topology.h>
+#include <linux/arm-smccc.h>
+#include <linux/cpufreq.h>
+#include <linux/init.h>
+#include <linux/sched.h>
+#include <linux/kernel.h>
+#include <linux/module.h>
+#include <linux/of_address.h>
+#include <linux/of_platform.h>
+#include <linux/pm_opp.h>
+#include <linux/slab.h>
+
+static void kvm_scale_freq_tick(void)
+{
+ unsigned long scale, cur_freq, max_freq;
+ struct arm_smccc_res hvc_res;
+
+ arm_smccc_1_1_invoke(ARM_SMCCC_VENDOR_HYP_KVM_GET_CUR_CPUFREQ_FUNC_ID,
+ 0, &hvc_res);
+
+ cur_freq = hvc_res.a0;
+ max_freq = cpufreq_get_hw_max_freq(task_cpu(current));
+ scale = (cur_freq << SCHED_CAPACITY_SHIFT) / max_freq;
+
+ this_cpu_write(arch_freq_scale, (unsigned long)scale);
+}
+
+static struct scale_freq_data kvm_sfd = {
+ .source = SCALE_FREQ_SOURCE_ARCH,
+ .set_freq_scale = kvm_scale_freq_tick,
+};
+
+struct remote_data {
+ int ret;
+ struct cpufreq_frequency_table *table;
+};
+
+static void remote_get_freqtbl_num_entries(void *data)
+{
+ struct arm_smccc_res hvc_res;
+ u32 freq = 1UL;
+ int *idx = data;
+
+ while (freq != CPUFREQ_TABLE_END) {
+ arm_smccc_1_1_invoke(ARM_SMCCC_VENDOR_HYP_KVM_GET_CPUFREQ_TBL_FUNC_ID,
+ *idx, &hvc_res);
+ if (hvc_res.a0) {
+ *idx = -ENODEV;
+ return;
+ }
+ freq = hvc_res.a1;
+ (*idx)++;
+ }
+}
+
+static int kvm_cpufreq_get_freqtbl_num_entries(int cpu)
+{
+ int num_entries = 0;
+
+ smp_call_function_single(cpu, remote_get_freqtbl_num_entries, &num_entries, true);
+ return num_entries;
+}
+
+static void remote_populate_freqtbl(void *data)
+{
+ struct arm_smccc_res hvc_res;
+ struct remote_data *freq_data = data;
+ struct cpufreq_frequency_table *pos;
+ struct cpufreq_frequency_table *table = freq_data->table;
+ int idx;
+
+ cpufreq_for_each_entry_idx(pos, table, idx) {
+ arm_smccc_1_1_invoke(ARM_SMCCC_VENDOR_HYP_KVM_GET_CPUFREQ_TBL_FUNC_ID,
+ idx, &hvc_res);
+ if (hvc_res.a0) {
+ freq_data->ret = -ENODEV;
+ return;
+ }
+ pos->frequency = hvc_res.a1;
+ }
+ freq_data->ret = 0;
+}
+
+static int kvm_cpufreq_populate_freqtbl(struct cpufreq_frequency_table *table, int cpu)
+{
+ struct remote_data freq_data;
+
+ freq_data.table = table;
+ smp_call_function_single(cpu, remote_populate_freqtbl, &freq_data, true);
+ return freq_data.ret;
+}
+
+static unsigned int kvm_cpufreq_setutil_hyp(struct cpufreq_policy *policy)
+{
+ struct arm_smccc_res hvc_res;
+ u32 util = sched_cpu_util_freq(policy->cpu);
+ u32 cap = arch_scale_cpu_capacity(policy->cpu);
+ u32 threshold = cap - (cap >> 2);
+
+ if (util > threshold)
+ util = (cap + threshold) >> 1;
+
+ arm_smccc_1_1_invoke(ARM_SMCCC_VENDOR_HYP_KVM_UTIL_HINT_FUNC_ID,
+ util, &hvc_res);
+
+ return hvc_res.a0;
+}
+
+static unsigned int kvm_cpufreq_fast_switch(struct cpufreq_policy *policy,
+ unsigned int target_freq)
+{
+ kvm_cpufreq_setutil_hyp(policy);
+ return target_freq;
+}
+
+static int kvm_cpufreq_target_index(struct cpufreq_policy *policy,
+ unsigned int index)
+{
+ return kvm_cpufreq_setutil_hyp(policy);
+}
+
+static const struct of_device_id kvm_cpufreq_match[] = {
+ { .compatible = "virtual,kvm-cpufreq"},
+ {}
+};
+MODULE_DEVICE_TABLE(of, kvm_cpufreq_match);
+
+static int kvm_cpufreq_cpu_init(struct cpufreq_policy *policy)
+{
+ struct device *cpu_dev;
+ struct cpufreq_frequency_table *table;
+ int num_entries;
+
+ cpu_dev = get_cpu_device(policy->cpu);
+ if (!cpu_dev) {
+ pr_err("%s: failed to get cpu%d device\n", __func__,
+ policy->cpu);
+ return -ENODEV;
+ }
+
+ num_entries = kvm_cpufreq_get_freqtbl_num_entries(policy->cpu);
+ if (num_entries == -ENODEV)
+ return -ENODEV;
+
+ table = kcalloc(num_entries, sizeof(*table), GFP_KERNEL);
+ if (!table)
+ return -ENOMEM;
+
+ table[num_entries-1].frequency = CPUFREQ_TABLE_END;
+
+ if (kvm_cpufreq_populate_freqtbl(table, policy->cpu))
+ return -ENODEV;
+
+ policy->freq_table = table;
+ policy->dvfs_possible_from_any_cpu = false;
+ policy->fast_switch_possible = true;
+ policy->transition_delay_us = 1;
+
+ /*
+ * Only takes effect if another FIE source such as AMUs
+ * has not been registered.
+ */
+ topology_set_scale_freq_source(&kvm_sfd, policy->cpus);
+
+ return 0;
+}
+
+static int kvm_cpufreq_cpu_exit(struct cpufreq_policy *policy)
+{
+ kfree(policy->freq_table);
+ return 0;
+}
+
+static int kvm_cpufreq_online(struct cpufreq_policy *policy)
+{
+ /* Nothing to restore. */
+ return 0;
+}
+
+static int kvm_cpufreq_offline(struct cpufreq_policy *policy)
+{
+ /* Dummy offline() to avoid exit() being called and freeing resources. */
+ return 0;
+}
+
+static struct cpufreq_driver cpufreq_kvm_driver = {
+ .name = "kvm-cpufreq",
+ .init = kvm_cpufreq_cpu_init,
+ .exit = kvm_cpufreq_cpu_exit,
+ .online = kvm_cpufreq_online,
+ .offline = kvm_cpufreq_offline,
+ .verify = cpufreq_generic_frequency_table_verify,
+ .target_index = kvm_cpufreq_target_index,
+ .fast_switch = kvm_cpufreq_fast_switch,
+ .attr = cpufreq_generic_attr,
+};
+
+static int kvm_cpufreq_driver_probe(struct platform_device *pdev)
+{
+ int ret;
+
+ ret = cpufreq_register_driver(&cpufreq_kvm_driver);
+ if (ret) {
+ dev_err(&pdev->dev, "KVM CPUFreq driver failed to register: %d\n", ret);
+ return ret;
+ }
+
+ dev_dbg(&pdev->dev, "KVM CPUFreq driver initialized\n");
+ return 0;
+}
+
+static int kvm_cpufreq_driver_remove(struct platform_device *pdev)
+{
+ cpufreq_unregister_driver(&cpufreq_kvm_driver);
+ return 0;
+}
+
+static struct platform_driver kvm_cpufreq_driver = {
+ .probe = kvm_cpufreq_driver_probe,
+ .remove = kvm_cpufreq_driver_remove,
+ .driver = {
+ .name = "kvm-cpufreq",
+ .of_match_table = kvm_cpufreq_match,
+ },
+};
+
+static int __init kvm_cpufreq_init(void)
+{
+ return platform_driver_register(&kvm_cpufreq_driver);
+}
+postcore_initcall(kvm_cpufreq_init);
+
+static void __exit kvm_cpufreq_exit(void)
+{
+ platform_driver_unregister(&kvm_cpufreq_driver);
+}
+module_exit(kvm_cpufreq_exit);
+
+MODULE_DESCRIPTION("KVM cpufreq driver");
+MODULE_LICENSE("GPL");
diff --git a/include/linux/sched.h b/include/linux/sched.h
index d8c346fcdf52..bd38aa32a57c 100644
--- a/include/linux/sched.h
+++ b/include/linux/sched.h
@@ -2303,6 +2303,7 @@ static inline bool owner_on_cpu(struct task_struct *owner)
/* Returns effective CPU energy utilization, as seen by the scheduler */
unsigned long sched_cpu_util(int cpu);
+unsigned long sched_cpu_util_freq(int cpu);
#endif /* CONFIG_SMP */
#ifdef CONFIG_RSEQ
diff --git a/kernel/sched/core.c b/kernel/sched/core.c
index 7700ef5610c1..dd46f4cc629b 100644
--- a/kernel/sched/core.c
+++ b/kernel/sched/core.c
@@ -7421,6 +7421,12 @@ unsigned long sched_cpu_util(int cpu)
{
return effective_cpu_util(cpu, cpu_util_cfs(cpu), ENERGY_UTIL, NULL);
}
+
+unsigned long sched_cpu_util_freq(int cpu)
+{
+ return effective_cpu_util(cpu, cpu_util_cfs(cpu), FREQUENCY_UTIL, NULL);
+}
+
#endif /* CONFIG_SMP */
/**
--
2.40.0.348.gf938b09366-goog
On 31/03/2023 03:43, David Dai wrote:
> Add devicetree bindings for a virtual kvm cpufreq driver.
Why? Why should virtual devices be documented in DT? DT is for
non-discoverable hardware, right? You have the entire commit msg to explain
it instead of stating something easily visible from the diff.
>
> Co-developed-by: Saravana Kannan <[email protected]>
> Signed-off-by: Saravana Kannan <[email protected]>
> Signed-off-by: David Dai <[email protected]>
> ---
> .../bindings/cpufreq/cpufreq-virtual-kvm.yaml | 39 +++++++++++++++++++
> 1 file changed, 39 insertions(+)
> create mode 100644 Documentation/devicetree/bindings/cpufreq/cpufreq-virtual-kvm.yaml
>
> diff --git a/Documentation/devicetree/bindings/cpufreq/cpufreq-virtual-kvm.yaml b/Documentation/devicetree/bindings/cpufreq/cpufreq-virtual-kvm.yaml
> new file mode 100644
> index 000000000000..31e64558a7f1
> --- /dev/null
> +++ b/Documentation/devicetree/bindings/cpufreq/cpufreq-virtual-kvm.yaml
> @@ -0,0 +1,39 @@
> +# SPDX-License-Identifier: GPL-2.0-only OR BSD-2-Clause
> +%YAML 1.2
> +---
> +$id: http://devicetree.org/schemas/cpufreq/cpufreq-virtual-kvm.yaml#
> +$schema: http://devicetree.org/meta-schemas/core.yaml#
> +
> +title: Virtual KVM CPUFreq
> +
> +maintainers:
> + - David Dai <[email protected]>
> +
> +description: |
Do not need '|'.
> +
Drop stray blank line.
> + KVM CPUFreq is a virtualized driver in guest kernels that sends utilization
> + of its vCPUs as a hint to the host. The host uses hint to schedule vCPU
> + threads and select CPU frequency. It enables accurate Per-Entity Load
> + Tracking for tasks running in the guest by querying host CPU frequency
> + unless a virtualized FIE exists(Like AMUs).
No clue why you need DT bindings for this. KVM has interfaces between
host and guests.
> +
> +properties:
> + compatible:
> + const: virtual,kvm-cpufreq
> +
> +required:
> + - compatible
> +
> +additionalProperties: false
> +
> +examples:
> + - |
> + {
This is some broken syntax and/or indentation.
I don't get what this node is about.
> + #address-cells = <2>;
> + #size-cells = <2>;
Why?
> +
> + cpufreq {
> + compatible = "virtual,kvm-cpufreq";
> + };
> +
Drop stray blank lines
> + };
Best regards,
Krzysztof
On Thu, 30 Mar 2023 18:43:49 -0700, David Dai wrote:
> Add devicetree bindings for a virtual kvm cpufreq driver.
>
> Co-developed-by: Saravana Kannan <[email protected]>
> Signed-off-by: Saravana Kannan <[email protected]>
> Signed-off-by: David Dai <[email protected]>
> ---
> .../bindings/cpufreq/cpufreq-virtual-kvm.yaml | 39 +++++++++++++++++++
> 1 file changed, 39 insertions(+)
> create mode 100644 Documentation/devicetree/bindings/cpufreq/cpufreq-virtual-kvm.yaml
>
My bot found errors running 'make DT_CHECKER_FLAGS=-m dt_binding_check'
on your patch (DT_CHECKER_FLAGS is new in v5.13):
yamllint warnings/errors:
dtschema/dtc warnings/errors:
Error: Documentation/devicetree/bindings/cpufreq/cpufreq-virtual-kvm.example.dts:18.9-10 syntax error
FATAL ERROR: Unable to parse input tree
make[1]: *** [scripts/Makefile.lib:419: Documentation/devicetree/bindings/cpufreq/cpufreq-virtual-kvm.example.dtb] Error 1
make[1]: *** Waiting for unfinished jobs....
make: *** [Makefile:1512: dt_binding_check] Error 2
doc reference errors (make refcheckdocs):
See https://patchwork.ozlabs.org/project/devicetree-bindings/patch/[email protected]
The base for the series is generally the latest rc1. A different dependency
should be noted in *this* patch.
If you already ran 'make dt_binding_check' and didn't see the above
error(s), then make sure 'yamllint' is installed and dt-schema is up to
date:
pip3 install dtschema --upgrade
Please check and re-submit after running the above command yourself. Note
that DT_SCHEMA_FILES can be set to your schema file to speed up checking
your schema. However, it must be unset to test all examples with your schema.
On Thu, Mar 30, 2023 at 8:45 PM David Dai <[email protected]> wrote:
>
> Add devicetree bindings for a virtual kvm cpufreq driver.
>
> Co-developed-by: Saravana Kannan <[email protected]>
> Signed-off-by: Saravana Kannan <[email protected]>
> Signed-off-by: David Dai <[email protected]>
> ---
> .../bindings/cpufreq/cpufreq-virtual-kvm.yaml | 39 +++++++++++++++++++
> 1 file changed, 39 insertions(+)
> create mode 100644 Documentation/devicetree/bindings/cpufreq/cpufreq-virtual-kvm.yaml
>
> diff --git a/Documentation/devicetree/bindings/cpufreq/cpufreq-virtual-kvm.yaml b/Documentation/devicetree/bindings/cpufreq/cpufreq-virtual-kvm.yaml
> new file mode 100644
> index 000000000000..31e64558a7f1
> --- /dev/null
> +++ b/Documentation/devicetree/bindings/cpufreq/cpufreq-virtual-kvm.yaml
> @@ -0,0 +1,39 @@
> +# SPDX-License-Identifier: GPL-2.0-only OR BSD-2-Clause
> +%YAML 1.2
> +---
> +$id: http://devicetree.org/schemas/cpufreq/cpufreq-virtual-kvm.yaml#
> +$schema: http://devicetree.org/meta-schemas/core.yaml#
> +
> +title: Virtual KVM CPUFreq
> +
> +maintainers:
> + - David Dai <[email protected]>
> +
> +description: |
> +
> + KVM CPUFreq is a virtualized driver in guest kernels that sends utilization
> + of its vCPUs as a hint to the host. The host uses hint to schedule vCPU
> + threads and select CPU frequency. It enables accurate Per-Entity Load
> + Tracking for tasks running in the guest by querying host CPU frequency
> + unless a virtualized FIE exists(Like AMUs).
> +
> +properties:
> + compatible:
> + const: virtual,kvm-cpufreq
> +
> +required:
> + - compatible
> +
> +additionalProperties: false
> +
> +examples:
> + - |
> + {
> + #address-cells = <2>;
> + #size-cells = <2>;
> +
> + cpufreq {
> + compatible = "virtual,kvm-cpufreq";
> + };
The same thing was tried on non-virtual h/w too. This is not a device
so it doesn't go in DT. It is just an abuse of DT as a kernel driver
instantiation mechanism.
Rob
On Thu, Mar 30, 2023 at 06:43:46PM -0700, David Dai wrote:
> diff --git a/Documentation/virt/kvm/api.rst b/Documentation/virt/kvm/api.rst
> index 62de0768d6aa..b0ff0ad700bf 100644
> --- a/Documentation/virt/kvm/api.rst
> +++ b/Documentation/virt/kvm/api.rst
> @@ -8380,6 +8380,14 @@ structure.
> When getting the Modified Change Topology Report value, the attr->addr
> must point to a byte where the value will be stored or retrieved from.
>
> +8.40 KVM_CAP_GET_CUR_CPUFREQ
> +------------------------
> +
> +:Architectures: arm64
> +
> +This capability indicates that KVM supports getting the
> +frequency of the current CPU that the vCPU thread is running on.
> +
> 9. Known KVM API problems
> =========================
>
> diff --git a/Documentation/virt/kvm/arm/get_cur_cpufreq.rst b/Documentation/virt/kvm/arm/get_cur_cpufreq.rst
> new file mode 100644
> index 000000000000..06e0ed5b3868
> --- /dev/null
> +++ b/Documentation/virt/kvm/arm/get_cur_cpufreq.rst
> @@ -0,0 +1,21 @@
> +.. SPDX-License-Identifier: GPL-2.0
> +
> +get_cur_cpufreq support for arm/arm64
> +=============================
> +
> +Get_cur_cpufreq support is used to get current frequency(in KHz) of the
> +current CPU that the vCPU thread is running on.
> +
> +* ARM_SMCCC_VENDOR_HYP_KVM_GET_CUR_CPUFREQ_FUNC_ID: 0x86000040
> +
> +This hypercall uses the SMC32/HVC32 calling convention:
> +
> +ARM_SMCCC_VENDOR_HYP_KVM_GET_CUR_CPUFREQ_FUNC_ID
> + ============== ======== =====================================
> + Function ID: (uint32) 0x86000040
> + Return Values: (int32) NOT_SUPPORTED(-1) on error, or
> + (uint32) Frequency in KHz of current CPU that the
> + vCPU thread is running on.
> + Endianness: Must be the same endianness
> + as the host.
> + ============== ======== =====================================
Sphinx reports htmldocs warnings:
/home/bagas/repo/linux-kernel/Documentation/virt/kvm/api.rst:8384: WARNING: Title underline too short.
8.40 KVM_CAP_GET_CUR_CPUFREQ
------------------------
/home/bagas/repo/linux-kernel/Documentation/virt/kvm/api.rst:8404: WARNING: Title underline too short.
I have applied the fixup:
---- >8 ----
diff --git a/Documentation/virt/kvm/api.rst b/Documentation/virt/kvm/api.rst
index 8f905456e2b4a1..baf8a4c43b5839 100644
--- a/Documentation/virt/kvm/api.rst
+++ b/Documentation/virt/kvm/api.rst
@@ -8381,7 +8381,7 @@ When getting the Modified Change Topology Report value, the attr->addr
must point to a byte where the value will be stored or retrieved from.
8.40 KVM_CAP_GET_CUR_CPUFREQ
-------------------------
+----------------------------
:Architectures: arm64
diff --git a/Documentation/virt/kvm/arm/get_cur_cpufreq.rst b/Documentation/virt/kvm/arm/get_cur_cpufreq.rst
index 06e0ed5b3868d7..76f112efb99f92 100644
--- a/Documentation/virt/kvm/arm/get_cur_cpufreq.rst
+++ b/Documentation/virt/kvm/arm/get_cur_cpufreq.rst
@@ -11,11 +11,12 @@ current CPU that the vCPU thread is running on.
This hypercall uses the SMC32/HVC32 calling convention:
ARM_SMCCC_VENDOR_HYP_KVM_GET_CUR_CPUFREQ_FUNC_ID
- ============== ======== =====================================
+
+ ============== ======== ========================================
Function ID: (uint32) 0x86000040
Return Values: (int32) NOT_SUPPORTED(-1) on error, or
(uint32) Frequency in KHz of current CPU that the
vCPU thread is running on.
Endianness: Must be the same endianness
as the host.
- ============== ======== =====================================
+ ============== ======== ========================================
Thanks.
--
An old man doll... just what I always wanted! - Clara
On Thu, Mar 30, 2023 at 06:43:46PM -0700, David Dai wrote:
> +get_cur_cpufreq support for arm/arm64
> +=============================
Oops, I also have to fix this heading:
---- >8 ----
diff --git a/Documentation/virt/kvm/arm/get_cur_cpufreq.rst b/Documentation/virt/kvm/arm/get_cur_cpufreq.rst
index 76f112efb99f92..21c2b2fe0c8acf 100644
--- a/Documentation/virt/kvm/arm/get_cur_cpufreq.rst
+++ b/Documentation/virt/kvm/arm/get_cur_cpufreq.rst
@@ -1,7 +1,7 @@
.. SPDX-License-Identifier: GPL-2.0
get_cur_cpufreq support for arm/arm64
-=============================
+=====================================
Get_cur_cpufreq support is used to get current frequency(in KHz) of the
current CPU that the vCPU thread is running on.
Thanks.
On Thu, Mar 30, 2023 at 06:43:47PM -0700, David Dai wrote:
> +ARM_SMCCC_HYP_KVM_UTIL_HINT_FUNC_ID
> + ============== ========= ============================
> + Function ID: (uint32) 0x86000041
> + Arguments: (uint32) util value(0-1024) where 1024 represents
> + the highest performance point normalized
> + across all CPUs
> + Return values: (int32) NOT_SUPPORTED(-1) on error.
> + Endianness: Must be the same endianness
> + as the host.
> + ============== ======== ============================
Sphinx reports htmldocs warning:
/home/bagas/repo/linux-kernel/Documentation/virt/kvm/arm/util_hint.rst:21: WARNING: Malformed table.
Column span alignment problem in table line 8.
============== ========= ============================
Function ID: (uint32) 0x86000041
Arguments: (uint32) util value(0-1024) where 1024 represents
the highest performance point normalized
across all CPUs
Return values: (int32) NOT_SUPPORTED(-1) on error.
Endianness: Must be the same endianness
as the host.
============== ======== ============================
I have to fix the table:
---- >8 ----
diff --git a/Documentation/virt/kvm/arm/util_hint.rst b/Documentation/virt/kvm/arm/util_hint.rst
index 262d142d62d91e..99e5bf99446d90 100644
--- a/Documentation/virt/kvm/arm/util_hint.rst
+++ b/Documentation/virt/kvm/arm/util_hint.rst
@@ -11,7 +11,8 @@ to the host.
This hypercall using the SMC32/HVC32 calling convention:
ARM_SMCCC_HYP_KVM_UTIL_HINT_FUNC_ID
- ============== ========= ============================
+
+ ============== ========= ========================================
Function ID: (uint32) 0x86000041
Arguments: (uint32) util value(0-1024) where 1024 represents
the highest performance point normalized
@@ -19,4 +20,4 @@ ARM_SMCCC_HYP_KVM_UTIL_HINT_FUNC_ID
Return values: (int32) NOT_SUPPORTED(-1) on error.
Endianness: Must be the same endianness
as the host.
- ============== ======== ============================
+ ============== ========= ========================================
Thanks.
On Thu, Mar 30, 2023 at 06:43:48PM -0700, David Dai wrote:
> diff --git a/Documentation/virt/kvm/api.rst b/Documentation/virt/kvm/api.rst
> index 38ce33564efc..8f905456e2b4 100644
> --- a/Documentation/virt/kvm/api.rst
> +++ b/Documentation/virt/kvm/api.rst
> @@ -8400,6 +8400,14 @@ after normalizing for architecture. This is useful when guests are tracking
> workload on its vCPUs. Util hints allow the host to make more accurate
> frequency selections and task placement for vCPU threads.
>
> +8.42 KVM_CAP_GET_CPUFREQ_TBL
> +---------------------------
> +
> +:Architectures: arm64
> +
> +This capability indicates that the KVM supports getting the
> +frequency table of the current CPU that the vCPU thread is running on.
> +
> 9. Known KVM API problems
> =========================
>
> diff --git a/Documentation/virt/kvm/arm/get_freqtbl.rst b/Documentation/virt/kvm/arm/get_freqtbl.rst
> new file mode 100644
> index 000000000000..f6832d7566e7
> --- /dev/null
> +++ b/Documentation/virt/kvm/arm/get_freqtbl.rst
> @@ -0,0 +1,23 @@
> +.. SPDX-License-Identifier: GPL-2.0
> +
> +get_freqtbl support for arm/arm64
> +=============================
> +
> +Allows guest to query the frequency(in KHz) table of the current CPU that
> +the vCPU thread is running on.
> +
> +* ARM_SMCCC_VENDOR_HYP_KVM_GET_CPUFREQ_TBL_FUNC_ID: 0x86000042
> +
> +This hypercall uses the SMC32/HVC32 calling convention:
> +
> +ARM_SMCCC_VENDOR_HYP_KVM_GET_CPUFREQ_TBL_FUNC_ID
> + ============== ======== =====================================
> + Function ID: (uint32) 0x86000042
> + Arguments: (uint32) index of the current CPU's frequency table
> + Return Values: (int32) NOT_SUPPORTED(-1) on error, or
> + (uint32) Frequency table entry of requested index
> + in KHz
> + of current CPU(r1)
> + Endianness: Must be the same endianness
> + as the host.
> + ============== ======== =====================================
Sphinx reports htmldocs warnings:
/home/bagas/repo/linux-kernel/Documentation/virt/kvm/api.rst:8404: WARNING: Title underline too short.
8.42 KVM_CAP_GET_CPUFREQ_TBL
---------------------------
/home/bagas/repo/linux-kernel/Documentation/virt/kvm/arm/get_freqtbl.rst:4: WARNING: Title underline too short.
get_freqtbl support for arm/arm64
=============================
I have applied the fixup:
---- >8 ----
diff --git a/Documentation/virt/kvm/api.rst b/Documentation/virt/kvm/api.rst
index baf8a4c43b5839..3579c470375938 100644
--- a/Documentation/virt/kvm/api.rst
+++ b/Documentation/virt/kvm/api.rst
@@ -8401,7 +8401,7 @@ workload on its vCPUs. Util hints allow the host to make more accurate
 frequency selections and task placement for vCPU threads.
 
 8.42 KVM_CAP_GET_CPUFREQ_TBL
----------------------------
+----------------------------
 
 :Architectures: arm64
diff --git a/Documentation/virt/kvm/arm/get_freqtbl.rst b/Documentation/virt/kvm/arm/get_freqtbl.rst
index f6832d7566e7e5..215bf0f653e461 100644
--- a/Documentation/virt/kvm/arm/get_freqtbl.rst
+++ b/Documentation/virt/kvm/arm/get_freqtbl.rst
@@ -1,7 +1,7 @@
 .. SPDX-License-Identifier: GPL-2.0
 
 get_freqtbl support for arm/arm64
-=============================
+=================================
 
 Allows guest to query the frequency(in KHz) table of the current CPU that
 the vCPU thread is running on.
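
The rule behind these warnings is that a reST section underline must be at
least as long as the title text. A minimal sketch of that check follows,
assuming ASCII-only titles so that byte length equals display width.
underline_long_enough() is a hypothetical helper, not part of Sphinx or
docutils.

```c
#include <stdbool.h>
#include <string.h>

/*
 * Return true when 'underline' is at least as long as 'title'.
 * ASCII-only sketch: strlen() counts bytes, which matches the column
 * width only for single-byte characters.
 */
static bool underline_long_enough(const char *title, const char *underline)
{
	return strlen(underline) >= strlen(title);
}
```

"8.42 KVM_CAP_GET_CPUFREQ_TBL" is 28 characters and "get_freqtbl support
for arm/arm64" is 33, so the underlines need at least 28 and 33 characters
respectively.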
Thanks.
--
An old man doll... just what I always wanted! - Clara
On Fri, Mar 31, 2023 at 5:47 AM Rob Herring <[email protected]> wrote:
>
> On Thu, Mar 30, 2023 at 8:45 PM David Dai <[email protected]> wrote:
> >
> > Add devicetree bindings for a virtual kvm cpufreq driver.
> >
> > Co-developed-by: Saravana Kannan <[email protected]>
> > Signed-off-by: Saravana Kannan <[email protected]>
> > Signed-off-by: David Dai <[email protected]>
> > ---
> > .../bindings/cpufreq/cpufreq-virtual-kvm.yaml | 39 +++++++++++++++++++
> > 1 file changed, 39 insertions(+)
> > create mode 100644 Documentation/devicetree/bindings/cpufreq/cpufreq-virtual-kvm.yaml
> >
> > diff --git a/Documentation/devicetree/bindings/cpufreq/cpufreq-virtual-kvm.yaml b/Documentation/devicetree/bindings/cpufreq/cpufreq-virtual-kvm.yaml
> > new file mode 100644
> > index 000000000000..31e64558a7f1
> > --- /dev/null
> > +++ b/Documentation/devicetree/bindings/cpufreq/cpufreq-virtual-kvm.yaml
> > @@ -0,0 +1,39 @@
> > +# SPDX-License-Identifier: GPL-2.0-only OR BSD-2-Clause
> > +%YAML 1.2
> > +---
> > +$id: http://devicetree.org/schemas/cpufreq/cpufreq-virtual-kvm.yaml#
> > +$schema: http://devicetree.org/meta-schemas/core.yaml#
> > +
> > +title: Virtual KVM CPUFreq
> > +
> > +maintainers:
> > + - David Dai <[email protected]>
> > +
> > +description: |
> > +
> > + KVM CPUFreq is a virtualized driver in guest kernels that sends utilization
> > + of its vCPUs as a hint to the host. The host uses hint to schedule vCPU
> > + threads and select CPU frequency. It enables accurate Per-Entity Load
> > + Tracking for tasks running in the guest by querying host CPU frequency
> > + unless a virtualized FIE exists(Like AMUs).
> > +
> > +properties:
> > + compatible:
> > + const: virtual,kvm-cpufreq
> > +
> > +required:
> > + - compatible
> > +
> > +additionalProperties: false
> > +
> > +examples:
> > + - |
> > + {
> > + #address-cells = <2>;
> > + #size-cells = <2>;
> > +
> > + cpufreq {
> > + compatible = "virtual,kvm-cpufreq";
> > + };
>
> The same thing was tried on non-virtual h/w too. This is not a device
> so it doesn't go in DT. It is just an abuse of DT as a kernel driver
> instantiation mechanism.
Because it has no registers to read or write, right? Yeah, we just
went with this for now to make it easy for people to cherry-pick and
test it. Maybe we shouldn't have added documentation, since that made
this look too official.
In the end, I'm expecting this will be a real MMIO device. Until we
move from RFC to PATCH, feel free to ignore this patch.
-Saravana