From ACPI spec[1] below 3 modes for CPPC can be defined:
1. Non autonomous: OS scaling governor specifies operating frequency/
performance level through `Desired Performance` register and PMFW
follows that.
2. Guided autonomous: OS scaling governor specifies min and max
frequencies/ performance levels through `Minimum Performance` and
`Maximum Performance` register, and PMFW can autonomously select an
operating frequency in this range.
3. Fully autonomous: OS only hints (via EPP) to PMFW for the required
energy performance preference for the workload and PMFW autonomously
scales the frequency.
Currently (1) is supported by amd_pstate as passive mode, and (3) is
implemented by EPP support[2]. This change is to support (2).
In guided autonomous mode the min_perf is based on the input from the
scaling governor. For example, in case of schedutil this value depends
on the current utilization. And max_perf is set to max capacity.
To activate guided auto mode ``amd_pstate=guided`` command line
parameter has to be passed in the kernel.
Below are the results (normalized) of benchmarks with this patch:
System: Genoa 96C 192T
Kernel: v6.1-rc6 + patch
Scaling governor: schedutil
================ tbench ================
tbench result comparison: (higher the better)
Here results are throughput (MB/s)
Clients acpi-cpufreq amd_pst+passive amd_pst+guided
1 1.00 (0.00 pct) 1.16 (16.00 pct) 2.20 (120.00 pct)
2 1.97 (0.00 pct) 2.29 (16.24 pct) 4.38 (122.33 pct)
4 3.95 (0.00 pct) 4.51 (14.17 pct) 8.50 (115.18 pct)
8 7.83 (0.00 pct) 8.89 (13.53 pct) 16.62 (112.26 pct)
16 15.28 (0.00 pct) 16.81 (10.01 pct) 31.02 (103.01 pct)
32 41.64 (0.00 pct) 30.67 (-26.34 pct) 55.63 (33.59 pct)
64 91.29 (0.00 pct) 79.67 (-12.72 pct) 91.74 (0.49 pct)
128 118.06 (0.00 pct) 122.34 (3.62 pct) 122.04 (3.37 pct)
256 260.47 (0.00 pct) 264.31 (1.47 pct) 264.49 (1.54 pct)
512 254.16 (0.00 pct) 245.25 (-3.50 pct) 245.50 (-3.40 pct)
tbench power comparison: (lower the better)
Clients acpi-cpufreq amd_pst+passive amd_pst+guided
1 1.00 (0.00 pct) 1.00 (0.00 pct) 1.15 (15.00 pct)
2 0.99 (0.00 pct) 1.00 (1.01 pct) 1.17 (18.18 pct)
4 1.01 (0.00 pct) 1.02 (0.99 pct) 1.24 (22.77 pct)
8 1.05 (0.00 pct) 1.06 (0.95 pct) 1.36 (29.52 pct)
16 1.15 (0.00 pct) 1.13 (-1.73 pct) 1.58 (37.39 pct)
32 1.71 (0.00 pct) 1.30 (-23.97 pct) 1.96 (14.61 pct)
64 2.35 (0.00 pct) 2.15 (-8.51 pct) 2.36 (0.42 pct)
128 2.77 (0.00 pct) 2.77 (0.00 pct) 2.78 (0.36 pct)
256 3.39 (0.00 pct) 3.41 (0.58 pct) 3.43 (1.17 pct)
512 3.42 (0.00 pct) 3.40 (-0.58 pct) 3.41 (-0.29 pct)
================ dbench ================
dbench result comparison: (higher the better)
Here results are throughput (MB/s)
Clients acpi-cpufreq amd_pst+passive amd_pst+guided
1 1.00 (0.00 pct) 0.96 (-4.00 pct) 1.02 (2.00 pct)
2 1.89 (0.00 pct) 1.90 (0.52 pct) 1.91 (1.05 pct)
4 3.39 (0.00 pct) 3.31 (-2.35 pct) 3.38 (-0.29 pct)
8 5.56 (0.00 pct) 5.46 (-1.79 pct) 5.60 (0.71 pct)
16 7.25 (0.00 pct) 7.90 (8.96 pct) 8.29 (14.34 pct)
32 10.85 (0.00 pct) 10.00 (-7.83 pct) 10.40 (-4.14 pct)
64 12.30 (0.00 pct) 11.94 (-2.92 pct) 11.82 (-3.90 pct)
128 12.56 (0.00 pct) 12.30 (-2.07 pct) 12.98 (3.34 pct)
256 6.55 (0.00 pct) 6.54 (-0.15 pct) 7.38 (12.67 pct)
512 1.61 (0.00 pct) 1.58 (-1.86 pct) 1.95 (21.11 pct)
dbench power comparison: (lower the better)
Clients acpi-cpufreq amd_pst+passive amd_pst+guided
1 1.00 (0.00 pct) 1.01 (1.00 pct) 1.05 (5.00 pct)
2 1.07 (0.00 pct) 1.07 (0.00 pct) 1.09 (1.86 pct)
4 1.15 (0.00 pct) 1.15 (0.00 pct) 1.16 (0.86 pct)
8 1.26 (0.00 pct) 1.26 (0.00 pct) 1.27 (0.79 pct)
16 1.39 (0.00 pct) 1.41 (1.43 pct) 1.43 (2.87 pct)
32 1.60 (0.00 pct) 1.56 (-2.50 pct) 1.59 (-0.62 pct)
64 1.75 (0.00 pct) 1.75 (0.00 pct) 1.74 (-0.57 pct)
128 1.90 (0.00 pct) 1.91 (0.52 pct) 1.93 (1.57 pct)
256 1.76 (0.00 pct) 1.77 (0.56 pct) 1.85 (5.11 pct)
512 1.55 (0.00 pct) 1.49 (-3.87 pct) 1.73 (11.61 pct)
================ git-source ================
git-source result comparison: (higher the better)
Here results are throughput (compilations per 1000 sec)
Threads acpi-cpufreq amd_pst+passive amd_pst+guided
192 1000.00 (0.00 pct) 943.00 (-5.70 pct) 1000.00 (0.00 pct)
git-source power comparison: (lower the better)
Threads acpi-cpufreq amd_pst+passive amd_pst+guided
192 1.00 (0.00 pct) 1.03 (3.00 pct) 1.02 (2.00 pct)
================ kernbench ================
kernbench result comparison: (higher the better)
Here results are throughput (compilations per 1000 sec)
Load acpi-cpufreq amd_pst+passive amd_pst+guided
32 1.00 (0.00 pct) 0.94 (-6.00 pct) 1.02 (2.00 pct)
48 1.24 (0.00 pct) 1.16 (-6.45 pct) 1.24 (0.00 pct)
64 1.35 (0.00 pct) 1.30 (-3.70 pct) 1.39 (2.96 pct)
96 1.42 (0.00 pct) 1.28 (-9.85 pct) 1.48 (4.22 pct)
128 1.39 (0.00 pct) 1.29 (-7.19 pct) 1.41 (1.43 pct)
192 1.32 (0.00 pct) 1.18 (-10.60 pct) 1.32 (0.00 pct)
256 1.28 (0.00 pct) 1.14 (-10.93 pct) 1.29 (0.78 pct)
384 1.28 (0.00 pct) 1.13 (-11.71 pct) 1.27 (-0.78 pct)
git-source power comparison: (lower the better)
Clients acpi-cpufreq amd_pst+passive amd_pst+guided
32 1.00 (0.00 pct) 1.04 (4.00 pct) 0.95 (-5.00 pct)
48 0.83 (0.00 pct) 0.90 (8.43 pct) 0.82 (-1.20 pct)
64 0.80 (0.00 pct) 0.82 (2.50 pct) 0.75 (-6.25 pct)
96 0.77 (0.00 pct) 0.81 (5.19 pct) 0.75 (-2.59 pct)
128 0.78 (0.00 pct) 0.82 (5.12 pct) 0.75 (-3.84 pct)
192 0.84 (0.00 pct) 0.89 (5.95 pct) 0.83 (-1.19 pct)
256 0.84 (0.00 pct) 0.89 (5.95 pct) 0.84 (0.00 pct)
384 0.84 (0.00 pct) 0.90 (7.14 pct) 0.84 (0.00 pct)
[1]: https://uefi.org/sites/default/files/resources/ACPI_6_3_final_Jan30.pdf
[2]: https://lore.kernel.org/lkml/[email protected]/
Wyes Karny (4):
cpufreq: amd_pstate: Add guided autonomous mode
Documentation: amd_pstate: Move amd_pstate param to alphabetical order
cpufreq: amd_pstate: Expose sysfs interface to control state
Documentation: amd_pstate: Add amd_pstate state sysfs file
.../admin-guide/kernel-parameters.txt | 26 ++-
Documentation/admin-guide/pm/amd-pstate.rst | 11 +
drivers/cpufreq/amd-pstate.c | 202 +++++++++++++++++-
3 files changed, 217 insertions(+), 22 deletions(-)
--
2.34.1
As amd_pstate is a built-in driver (commit 456ca88d8a52 ("cpufreq:
amd-pstate: change amd-pstate driver to be built-in type")), it can't be
unloaded. However, it can be registered/unregistered from the
cpufreq-core at runtime.
Add a sysfs file to show the current amd_pstate status as well as add
register/un-register capability to amd_pstate through this file.
This sysfs interface can be used to change the driver mode dynamically.
To show current state:
#cat /sys/devices/system/cpu/amd_pstate/state
# disable [passive] guided
"[]" indicates current mode.
To disable amd_pstate driver:
#echo disable > /sys/devices/system/cpu/amd_pstate/state
To enable passive mode:
#echo passive > /sys/devices/system/cpu/amd_pstate/state
To change mode from passive to guided:
#echo guided > /sys/devices/system/cpu/amd_pstate/state
Signed-off-by: Wyes Karny <[email protected]>
---
drivers/cpufreq/amd-pstate.c | 142 +++++++++++++++++++++++++++++++++++
1 file changed, 142 insertions(+)
diff --git a/drivers/cpufreq/amd-pstate.c b/drivers/cpufreq/amd-pstate.c
index 8bffcb9f80cf..023c4384a46a 100644
--- a/drivers/cpufreq/amd-pstate.c
+++ b/drivers/cpufreq/amd-pstate.c
@@ -50,6 +50,8 @@
#define AMD_PSTATE_TRANSITION_LATENCY 20000
#define AMD_PSTATE_TRANSITION_DELAY 1000
+typedef int (*cppc_mode_transition_fn)(int);
+
enum amd_pstate_mode {
CPPC_DISABLE = 0,
CPPC_PASSIVE,
@@ -87,6 +89,8 @@ static inline int get_mode_idx_from_str(const char *str, size_t size)
return -EINVAL;
}
+static DEFINE_MUTEX(amd_pstate_driver_lock);
+
static inline int pstate_enable(bool enable)
{
return wrmsrl_safe(MSR_AMD_CPPC_ENABLE, enable);
@@ -623,6 +627,124 @@ static ssize_t show_amd_pstate_highest_perf(struct cpufreq_policy *policy,
return sprintf(&buf[0], "%u\n", perf);
}
+static int amd_pstate_driver_cleanup(void)
+{
+ amd_pstate_enable(false);
+ cppc_state = CPPC_DISABLE;
+ return 0;
+}
+
+static int amd_pstate_register_driver(int mode)
+{
+ int ret;
+
+ ret = cpufreq_register_driver(&amd_pstate_driver);
+ if (ret) {
+ amd_pstate_driver_cleanup();
+ return ret;
+ }
+
+ cppc_state = mode;
+ return 0;
+}
+
+static int amd_pstate_unregister_driver(int dummy)
+{
+ int ret;
+
+ ret = cpufreq_unregister_driver(&amd_pstate_driver);
+
+ if (ret)
+ return ret;
+
+ amd_pstate_driver_cleanup();
+ return 0;
+}
+
+static int amd_pstate_change_driver_mode(int mode)
+{
+ cppc_state = mode;
+ return 0;
+}
+
+/* Mode transition table */
+cppc_mode_transition_fn mode_state_machine[CPPC_MODE_MAX][CPPC_MODE_MAX] = {
+ [CPPC_DISABLE] = {
+ [CPPC_DISABLE] = NULL,
+ [CPPC_PASSIVE] = amd_pstate_register_driver,
+ [CPPC_GUIDED] = amd_pstate_register_driver,
+ },
+ [CPPC_PASSIVE] = {
+ [CPPC_DISABLE] = amd_pstate_unregister_driver,
+ [CPPC_PASSIVE] = NULL,
+ [CPPC_GUIDED] = amd_pstate_change_driver_mode,
+ },
+ [CPPC_GUIDED] = {
+ [CPPC_DISABLE] = amd_pstate_unregister_driver,
+ [CPPC_PASSIVE] = amd_pstate_change_driver_mode,
+ [CPPC_GUIDED] = NULL,
+ },
+};
+
+static int amd_pstate_update_status(const char *buf, size_t size)
+{
+ int mode_req = 0;
+
+ mode_req = get_mode_idx_from_str(buf, size);
+
+ if (mode_req < 0 || mode_req >= CPPC_MODE_MAX)
+ return -EINVAL;
+
+ if (mode_state_machine[cppc_state][mode_req])
+ return mode_state_machine[cppc_state][mode_req](mode_req);
+ return -EBUSY;
+}
+
+static ssize_t amd_pstate_show_status(char *buf)
+{
+ int i, j = 0;
+
+ for (i = 0; i < CPPC_MODE_MAX; ++i) {
+ if (i == cppc_state)
+ j += sprintf(buf + j, "[%s] ", amd_pstate_mode_string[i]);
+ else
+ j += sprintf(buf + j, "%s ", amd_pstate_mode_string[i]);
+ }
+ j += sprintf(buf + j, "\n");
+ return j;
+}
+
+static ssize_t state_store(struct kobject *kobj,
+ struct kobj_attribute *attr, const char *buf, size_t count)
+{
+ ssize_t ret = 0;
+ char *p = memchr(buf, '\n', count);
+
+ mutex_lock(&amd_pstate_driver_lock);
+ ret = amd_pstate_update_status(buf, p ? p - buf : count);
+ mutex_unlock(&amd_pstate_driver_lock);
+
+ return ret < 0 ? ret : count;
+}
+static ssize_t state_show(struct kobject *kobj,
+ struct kobj_attribute *attr, char *buf)
+{
+ int ret;
+
+ mutex_lock(&amd_pstate_driver_lock);
+ ret = amd_pstate_show_status(buf);
+ mutex_unlock(&amd_pstate_driver_lock);
+ return ret;
+}
+
+static struct kobj_attribute state_attr = __ATTR_RW(state);
+static struct attribute *amd_pstate_attrs[] = {
+ &state_attr.attr,
+ NULL,
+};
+
+ATTRIBUTE_GROUPS(amd_pstate);
+
cpufreq_freq_attr_ro(amd_pstate_max_freq);
cpufreq_freq_attr_ro(amd_pstate_lowest_nonlinear_freq);
@@ -648,6 +770,25 @@ static struct cpufreq_driver amd_pstate_driver = {
.attr = amd_pstate_attr,
};
+static struct kobject *amd_pstate_kobject;
+
+static void __init amd_pstate_sysfs_expose_param(void)
+{
+ int ret = 0;
+
+ amd_pstate_kobject = kobject_create_and_add("amd_pstate",
+ &cpu_subsys.dev_root->kobj);
+
+ if (WARN_ON(!amd_pstate_kobject))
+ return;
+
+ ret = sysfs_create_groups(amd_pstate_kobject, amd_pstate_groups);
+ if (ret) {
+ pr_err("sysfs group creation failed (%d)", ret);
+ return;
+ }
+}
+
static int __init amd_pstate_init(void)
{
int ret;
@@ -695,6 +836,7 @@ static int __init amd_pstate_init(void)
if (ret)
pr_err("failed to register amd_pstate_driver with return %d\n",
ret);
+ amd_pstate_sysfs_expose_param();
return ret;
}
--
2.34.1
On Wed, Dec 7, 2022 at 4:47 PM Wyes Karny <[email protected]> wrote:
>
> From ACPI spec[1] below 3 modes for CPPC can be defined:
> 1. Non autonomous: OS scaling governor specifies operating frequency/
> performance level through `Desired Performance` register and PMFW
> follows that.
> 2. Guided autonomous: OS scaling governor specifies min and max
> frequencies/ performance levels through `Minimum Performance` and
> `Maximum Performance` register, and PMFW can autonomously select an
> operating frequency in this range.
> 3. Fully autonomous: OS only hints (via EPP) to PMFW for the required
> energy performance preference for the workload and PMFW autonomously
> scales the frequency.
>
> Currently (1) is supported by amd_pstate as passive mode, and (3) is
> implemented by EPP support[2]. This change is to support (2).
Well, can you guys please agree on priorities? Like which one is more
important and so it should go in first?
At this point I'm not sure what the ordering assumptions with respect
to the EPP series are.
Also this submission is late for 6.2, so I will only regard it as 6.3
material on the condition the above gets clarified. In the meantime,
please address review comments and consider resending after 6.2-rc1 is
out.
Thanks!
Hi Rafael,
On 12/8/2022 5:18 PM, Rafael J. Wysocki wrote:
> Well, can you guys please agree on priorities? Like which one is more
> important and so it should go in first?
>
> At this point I'm not sure what the ordering assumptions with respect
> to the EPP series are.
I am working with Perry on this and we will post a new patchset.
Meanwhile, any review comments on this patchset is welcome.
>
> Also this submission is late for 6.2, so I will only regard it as 6.3
> material on the condition the above gets clarified. In the meantime,
> please address review comments and consider resending after 6.2-rc1 is
> out.
Sure, Thanks!
--
Thanks & Regards,
Wyes