2023-02-16 08:18:37

by Wyes Karny

Subject: [PATCH v7 0/6] cpufreq: amd-pstate: Add guided autonomous mode support

The ACPI spec[1] defines the following 3 modes for CPPC:
1. Non autonomous: OS scaling governor specifies operating frequency/
performance level through the `Desired Performance` register and the
platform follows that.
2. Guided autonomous: OS scaling governor specifies min and max
frequencies/performance levels through the `Minimum Performance` and
`Maximum Performance` registers, and the platform can autonomously select
an operating frequency in this range.
3. Fully autonomous: OS only hints (via EPP) to the platform the required
energy performance preference for the workload, and the platform
autonomously scales the frequency.

Currently (1) is supported by amd_pstate as passive mode, and (3) is
implemented by EPP support[2]. This series adds support for (2).

In guided autonomous mode, min_perf is based on the input from the
scaling governor. For example, with schedutil this value depends on the
current utilization. max_perf is set to the maximum capacity.

To activate guided autonomous mode, pass ``amd_pstate=guided`` on the
kernel command line.
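
The mapping described above can be sketched in plain C as follows. This is
a minimal user-space illustration, not the driver code: the names
`sketch_mode`, `sketch_req` and `sketch_build_req` are hypothetical, and the
`dynamic_governor` flag is an assumed stand-in for the driver's check that
the governor sets CPUFREQ_GOV_DYNAMIC_SWITCHING.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical stand-ins for the driver's modes; illustrative only. */
enum sketch_mode { SKETCH_PASSIVE, SKETCH_GUIDED };

/* The three performance fields of one CPPC request (abstract 0..255 scale). */
struct sketch_req {
	uint8_t min_perf;
	uint8_t des_perf;
	uint8_t max_perf;
};

static uint8_t sketch_clamp(uint8_t val, uint8_t lo, uint8_t hi)
{
	if (val < lo)
		return lo;
	if (val > hi)
		return hi;
	return val;
}

/*
 * Build a request from the governor's input. In guided mode with a
 * dynamically switching governor, the governor's desired level becomes the
 * floor (min_perf) and des_perf is cleared, so the platform is free to pick
 * any level between min_perf and max_perf.
 */
struct sketch_req sketch_build_req(enum sketch_mode mode, bool dynamic_governor,
				   uint8_t min_perf, uint8_t des_perf,
				   uint8_t max_perf)
{
	struct sketch_req req;

	assert(min_perf <= max_perf);
	des_perf = sketch_clamp(des_perf, min_perf, max_perf);

	if (mode == SKETCH_GUIDED && dynamic_governor) {
		min_perf = des_perf;
		des_perf = 0;
	}

	req.min_perf = min_perf;
	req.des_perf = des_perf;
	req.max_perf = max_perf;
	return req;
}
```

With a governor request of 120 and limits of 10 and 255, the guided case
yields min_perf = 120, des_perf = 0, max_perf = 255, while the passive case
keeps min_perf = 10 and des_perf = 120.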

Below are the results (normalized) of benchmarks with this patch:
System: Genoa 96C 192T
Kernel: 6.2.0-rc2 + EPP v12 + patch
Scaling governor: schedutil

================ dbench comparisons ================
dbench result comparison:
Here results are throughput (MB/s)
Clients:  acpi-cpufreq          amd_pst+passive       amd_pst+guided
1         1.00 (0.00 pct)       1.01 (1.00 pct)       1.02 (2.00 pct)
2         1.07 (0.00 pct)       1.06 (-0.93 pct)      1.07 (0.00 pct)
4         1.68 (0.00 pct)       1.70 (1.19 pct)       1.72 (2.38 pct)
8         2.61 (0.00 pct)       2.68 (2.68 pct)       2.76 (5.74 pct)
16        4.16 (0.00 pct)       4.24 (1.92 pct)       4.53 (8.89 pct)
32        5.98 (0.00 pct)       6.17 (3.17 pct)       7.30 (22.07 pct)
64        8.67 (0.00 pct)       8.99 (3.69 pct)       10.71 (23.52 pct)
128       11.98 (0.00 pct)      12.52 (4.50 pct)      14.67 (22.45 pct)
256       15.73 (0.00 pct)      16.13 (2.54 pct)      17.81 (13.22 pct)
512       15.77 (0.00 pct)      16.32 (3.48 pct)      16.39 (3.93 pct)
dbench power comparison:
Clients:  acpi-cpufreq          amd_pst+passive       amd_pst+guided
1         1.00 (0.00 pct)       1.00 (0.00 pct)       1.04 (4.00 pct)
2         0.99 (0.00 pct)       0.97 (-2.02 pct)      1.02 (3.03 pct)
4         0.98 (0.00 pct)       0.98 (0.00 pct)       1.02 (4.08 pct)
8         0.98 (0.00 pct)       0.99 (1.02 pct)       1.02 (4.08 pct)
16        0.99 (0.00 pct)       1.00 (1.01 pct)       1.04 (5.05 pct)
32        1.02 (0.00 pct)       1.02 (0.00 pct)       1.07 (4.90 pct)
64        1.05 (0.00 pct)       1.05 (0.00 pct)       1.11 (5.71 pct)
128       1.08 (0.00 pct)       1.08 (0.00 pct)       1.15 (6.48 pct)
256       1.12 (0.00 pct)       1.12 (0.00 pct)       1.20 (7.14 pct)
512       1.18 (0.00 pct)       1.17 (-0.84 pct)      1.26 (6.77 pct)

================ git-source comparisons ================
git-source result comparison:
Here results are throughput (compilations per 1000 sec)
Threads:  acpi-cpufreq          amd_pst+passive       amd_pst+guided
192       1.00 (0.00 pct)       0.93 (-7.00 pct)      1.00 (0.00 pct)
git-source power comparison:
Threads:  acpi-cpufreq          amd_pst+passive       amd_pst+guided
192       1.00 (0.00 pct)       1.00 (0.00 pct)       0.96 (-4.00 pct)

================ kernbench comparisons ================
kernbench result comparison:
Here results are throughput (compilations per 1000 sec)
Load:     acpi-cpufreq          amd_pst+passive       amd_pst+guided
32        1.00 (0.00 pct)       1.01 (1.00 pct)       1.02 (2.00 pct)
48        1.26 (0.00 pct)       1.28 (1.58 pct)       1.25 (-0.79 pct)
64        1.39 (0.00 pct)       1.47 (5.75 pct)       1.43 (2.87 pct)
96        1.48 (0.00 pct)       1.50 (1.35 pct)       1.49 (0.67 pct)
128       1.29 (0.00 pct)       1.32 (2.32 pct)       1.33 (3.10 pct)
192       1.17 (0.00 pct)       1.20 (2.56 pct)       1.21 (3.41 pct)
256       1.17 (0.00 pct)       1.18 (0.85 pct)       1.20 (2.56 pct)
384       1.16 (0.00 pct)       1.17 (0.86 pct)       1.21 (4.31 pct)
kernbench power comparison:
Load:     acpi-cpufreq          amd_pst+passive       amd_pst+guided
32        1.00 (0.00 pct)       0.97 (-3.00 pct)      1.00 (0.00 pct)
48        0.87 (0.00 pct)       0.81 (-6.89 pct)      0.88 (1.14 pct)
64        0.81 (0.00 pct)       0.73 (-9.87 pct)      0.77 (-4.93 pct)
96        0.75 (0.00 pct)       0.74 (-1.33 pct)      0.75 (0.00 pct)
128       0.83 (0.00 pct)       0.79 (-4.81 pct)      0.83 (0.00 pct)
192       0.92 (0.00 pct)       0.88 (-4.34 pct)      0.92 (0.00 pct)
256       0.92 (0.00 pct)       0.88 (-4.34 pct)      0.92 (0.00 pct)
384       0.92 (0.00 pct)       0.88 (-4.34 pct)      0.92 (0.00 pct)

================ tbench comparisons ================
tbench result comparison:
Here results are throughput (MB/s)
Clients:  acpi-cpufreq          amd_pst+passive       amd_pst+guided
1         1.00 (0.00 pct)       0.70 (-30.00 pct)     1.37 (37.00 pct)
2         2.64 (0.00 pct)       1.39 (-47.34 pct)     2.70 (2.27 pct)
4         4.89 (0.00 pct)       2.75 (-43.76 pct)     5.28 (7.97 pct)
8         9.46 (0.00 pct)       5.42 (-42.70 pct)     10.22 (8.03 pct)
16        19.05 (0.00 pct)      10.42 (-45.30 pct)    19.94 (4.67 pct)
32        37.50 (0.00 pct)      20.23 (-46.05 pct)    36.87 (-1.68 pct)
64        61.24 (0.00 pct)      43.08 (-29.65 pct)    62.96 (2.80 pct)
128       67.16 (0.00 pct)      69.08 (2.85 pct)      67.34 (0.26 pct)
256       154.59 (0.00 pct)     162.33 (5.00 pct)     156.78 (1.41 pct)
512       154.02 (0.00 pct)     156.74 (1.76 pct)     153.48 (-0.35 pct)
tbench power comparison:
Clients:  acpi-cpufreq          amd_pst+passive       amd_pst+guided
1         1.00 (0.00 pct)       0.97 (-3.00 pct)      1.08 (8.00 pct)
2         1.04 (0.00 pct)       0.97 (-6.73 pct)      1.11 (6.73 pct)
4         1.12 (0.00 pct)       0.99 (-11.60 pct)     1.18 (5.35 pct)
8         1.25 (0.00 pct)       1.04 (-16.80 pct)     1.31 (4.80 pct)
16        1.53 (0.00 pct)       1.13 (-26.14 pct)     1.58 (3.26 pct)
32        2.01 (0.00 pct)       1.36 (-32.33 pct)     2.03 (0.99 pct)
64        2.58 (0.00 pct)       2.14 (-17.05 pct)     2.61 (1.16 pct)
128       2.80 (0.00 pct)       2.81 (0.35 pct)       2.81 (0.35 pct)
256       3.39 (0.00 pct)       3.43 (1.17 pct)       3.42 (0.88 pct)
512       3.44 (0.00 pct)       3.44 (0.00 pct)       3.44 (0.00 pct)

Note: this series is based on top of the EPP v12 series [3].

Change log:

v6 -> v7:
- Addressed comments by Ray
- Reorder and rebase patches
- Pick up Ack by Ray

v5 -> v6:
- Don't return -EBUSY when changing to same mode

v4 -> v5:
- Rebased on top of EPP v12 series
- Addressed comments from Mario regarding documentation
- Picked up RB flags from Mario and Bagas Sanjaya

v3 -> v4:
- Fixed active mode low frequency issue reported by Peter Jung and Tor Vic
- Documentation modification suggested by Bagas Sanjaya

v2 -> v3:
- Addressed review comments from Mario.
- Picked up RB tag from Mario.
- Rebase on top of EPP v11 [3].

v1 -> v2:
- Fix issue with shared mem systems.
- Rebase on top of EPP series.

[1]: https://uefi.org/sites/default/files/resources/ACPI_6_3_final_Jan30.pdf
[2]: https://lore.kernel.org/lkml/[email protected]/
[3]: https://lore.kernel.org/linux-pm/[email protected]/

Wyes Karny (6):
acpi: cppc: Add min and max perf reg writing support
acpi: cppc: Add auto select register read/write support
Documentation: cpufreq: amd-pstate: Move amd_pstate param to
alphabetical order
cpufreq: amd-pstate: Add guided autonomous mode
cpufreq: amd-pstate: Add guided mode control support via sysfs
Documentation: cpufreq: amd-pstate: Update amd_pstate status sysfs for
guided

.../admin-guide/kernel-parameters.txt | 40 ++--
Documentation/admin-guide/pm/amd-pstate.rst | 31 ++-
drivers/acpi/cppc_acpi.c | 121 +++++++++++-
drivers/cpufreq/amd-pstate.c | 177 +++++++++++++-----
include/acpi/cppc_acpi.h | 11 ++
include/linux/amd-pstate.h | 2 +
6 files changed, 302 insertions(+), 80 deletions(-)

--
2.34.1



2023-02-16 08:19:05

by Wyes Karny

Subject: [PATCH v7 1/6] acpi: cppc: Add min and max perf reg writing support

Currently, writing of the min and max perf registers is deferred in the
cppc_set_perf function. In CPPC guided mode, these registers need to be
written to guide the platform about min and max perf levels. Add this
support to make guided mode work properly on AMD shared memory systems.
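
The write rule this patch introduces (always write the desired value, and
write min/max only when the caller passes a non-zero value) can be sketched
in user-space C as follows. This is an illustration under stated
assumptions: `sketch_regs`, `sketch_perf_ctrls` and `sketch_set_perf` are
hypothetical names, and plain struct fields stand in for the CPC registers
and the PCC handshake.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical mirror of the three control values a caller can pass. */
struct sketch_perf_ctrls {
	uint32_t max_perf;
	uint32_t min_perf;
	uint32_t desired_perf;
};

/* Plain fields standing in for the platform's CPC registers. */
struct sketch_regs {
	uint32_t max_perf;
	uint32_t min_perf;
	uint32_t desired_perf;
};

/*
 * Desired perf is always written. A zero min or max means "leave that
 * register alone", so callers that never fill in those fields keep their
 * previous behaviour.
 */
void sketch_set_perf(struct sketch_regs *regs,
		     const struct sketch_perf_ctrls *ctrls)
{
	assert(regs && ctrls);

	regs->desired_perf = ctrls->desired_perf;

	if (ctrls->min_perf)
		regs->min_perf = ctrls->min_perf;
	if (ctrls->max_perf)
		regs->max_perf = ctrls->max_perf;
}
```

One consequence of this convention is that a caller cannot request a
literal zero for min or max through this path; zero is reserved as the
"do not write" marker.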

Acked-by: Huang Rui <[email protected]>
Reviewed-by: Mario Limonciello <[email protected]>
Signed-off-by: Wyes Karny <[email protected]>
---
drivers/acpi/cppc_acpi.c | 24 ++++++++++++++++--------
1 file changed, 16 insertions(+), 8 deletions(-)

diff --git a/drivers/acpi/cppc_acpi.c b/drivers/acpi/cppc_acpi.c
index 09aa4c4f9bf5..ad19e4a91e4e 100644
--- a/drivers/acpi/cppc_acpi.c
+++ b/drivers/acpi/cppc_acpi.c
@@ -1488,7 +1488,7 @@ EXPORT_SYMBOL_GPL(cppc_set_enable);
int cppc_set_perf(int cpu, struct cppc_perf_ctrls *perf_ctrls)
{
struct cpc_desc *cpc_desc = per_cpu(cpc_desc_ptr, cpu);
- struct cpc_register_resource *desired_reg;
+ struct cpc_register_resource *desired_reg, *min_perf_reg, *max_perf_reg;
int pcc_ss_id = per_cpu(cpu_pcc_subspace_idx, cpu);
struct cppc_pcc_data *pcc_ss_data = NULL;
int ret = 0;
@@ -1499,6 +1499,8 @@ int cppc_set_perf(int cpu, struct cppc_perf_ctrls *perf_ctrls)
}

desired_reg = &cpc_desc->cpc_regs[DESIRED_PERF];
+ min_perf_reg = &cpc_desc->cpc_regs[MIN_PERF];
+ max_perf_reg = &cpc_desc->cpc_regs[MAX_PERF];

/*
* This is Phase-I where we want to write to CPC registers
@@ -1507,7 +1509,7 @@ int cppc_set_perf(int cpu, struct cppc_perf_ctrls *perf_ctrls)
* Since read_lock can be acquired by multiple CPUs simultaneously we
* achieve that goal here
*/
- if (CPC_IN_PCC(desired_reg)) {
+ if (CPC_IN_PCC(desired_reg) || CPC_IN_PCC(min_perf_reg) || CPC_IN_PCC(max_perf_reg)) {
if (pcc_ss_id < 0) {
pr_debug("Invalid pcc_ss_id\n");
return -ENODEV;
@@ -1530,13 +1532,19 @@ int cppc_set_perf(int cpu, struct cppc_perf_ctrls *perf_ctrls)
cpc_desc->write_cmd_status = 0;
}

- /*
- * Skip writing MIN/MAX until Linux knows how to come up with
- * useful values.
- */
cpc_write(cpu, desired_reg, perf_ctrls->desired_perf);

- if (CPC_IN_PCC(desired_reg))
+ /*
+ * Only write if min_perf and max_perf not zero. Some drivers pass zero
+ * value to min and max perf, but they don't mean to set the zero value,
+ * they just don't want to write to those registers.
+ */
+ if (perf_ctrls->min_perf)
+ cpc_write(cpu, min_perf_reg, perf_ctrls->min_perf);
+ if (perf_ctrls->max_perf)
+ cpc_write(cpu, max_perf_reg, perf_ctrls->max_perf);
+
+ if (CPC_IN_PCC(desired_reg) || CPC_IN_PCC(min_perf_reg) || CPC_IN_PCC(max_perf_reg))
up_read(&pcc_ss_data->pcc_lock); /* END Phase-I */
/*
* This is Phase-II where we transfer the ownership of PCC to Platform
@@ -1584,7 +1592,7 @@ int cppc_set_perf(int cpu, struct cppc_perf_ctrls *perf_ctrls)
* case during a CMD_READ and if there are pending writes it delivers
* the write command before servicing the read command
*/
- if (CPC_IN_PCC(desired_reg)) {
+ if (CPC_IN_PCC(desired_reg) || CPC_IN_PCC(min_perf_reg) || CPC_IN_PCC(max_perf_reg)) {
if (down_write_trylock(&pcc_ss_data->pcc_lock)) {/* BEGIN Phase-II */
/* Update only if there are pending write commands */
if (pcc_ss_data->pending_pcc_write_cmd)
--
2.34.1


2023-02-16 08:19:38

by Wyes Karny

Subject: [PATCH v7 2/6] acpi: cppc: Add auto select register read/write support

For some AMD shared memory based systems, the autonomous selection bit
needs to be set explicitly. Add autonomous selection register APIs to the
ACPI CPPC driver; the amd_pstate driver uses them in a later patch.

Acked-by: Huang Rui <[email protected]>
Reviewed-by: Mario Limonciello <[email protected]>
Signed-off-by: Wyes Karny <[email protected]>
---
drivers/acpi/cppc_acpi.c | 97 ++++++++++++++++++++++++++++++++++++++++
include/acpi/cppc_acpi.h | 11 +++++
2 files changed, 108 insertions(+)

diff --git a/drivers/acpi/cppc_acpi.c b/drivers/acpi/cppc_acpi.c
index ad19e4a91e4e..e2fbc3c92891 100644
--- a/drivers/acpi/cppc_acpi.c
+++ b/drivers/acpi/cppc_acpi.c
@@ -1433,6 +1433,103 @@ int cppc_set_epp_perf(int cpu, struct cppc_perf_ctrls *perf_ctrls, bool enable)
}
EXPORT_SYMBOL_GPL(cppc_set_epp_perf);

+/*
+ * cppc_get_auto_sel_caps - Read autonomous selection register.
+ * @cpunum : CPU from which to read register.
+ * @perf_caps : struct where autonomous selection register value is updated.
+ */
+int cppc_get_auto_sel_caps(int cpunum, struct cppc_perf_caps *perf_caps)
+{
+ struct cpc_desc *cpc_desc = per_cpu(cpc_desc_ptr, cpunum);
+ struct cpc_register_resource *auto_sel_reg;
+ u64 auto_sel;
+
+ if (!cpc_desc) {
+ pr_debug("No CPC descriptor for CPU:%d\n", cpunum);
+ return -ENODEV;
+ }
+
+ auto_sel_reg = &cpc_desc->cpc_regs[AUTO_SEL_ENABLE];
+
+ if (!CPC_SUPPORTED(auto_sel_reg))
+ pr_warn_once("Autonomous mode is not supported!\n");
+
+ if (CPC_IN_PCC(auto_sel_reg)) {
+ int pcc_ss_id = per_cpu(cpu_pcc_subspace_idx, cpunum);
+ struct cppc_pcc_data *pcc_ss_data = NULL;
+ int ret = 0;
+
+ if (pcc_ss_id < 0)
+ return -ENODEV;
+
+ pcc_ss_data = pcc_data[pcc_ss_id];
+
+ down_write(&pcc_ss_data->pcc_lock);
+
+ if (send_pcc_cmd(pcc_ss_id, CMD_READ) >= 0) {
+ cpc_read(cpunum, auto_sel_reg, &auto_sel);
+ perf_caps->auto_sel = (bool)auto_sel;
+ } else {
+ ret = -EIO;
+ }
+
+ up_write(&pcc_ss_data->pcc_lock);
+
+ return ret;
+ }
+
+ return 0;
+}
+EXPORT_SYMBOL_GPL(cppc_get_auto_sel_caps);
+
+/*
+ * cppc_set_auto_sel - Write autonomous selection register.
+ * @cpu : CPU to which to write register.
+ * @enable : the desired value of autonomous selection register to be updated.
+ */
+int cppc_set_auto_sel(int cpu, bool enable)
+{
+ int pcc_ss_id = per_cpu(cpu_pcc_subspace_idx, cpu);
+ struct cpc_register_resource *auto_sel_reg;
+ struct cpc_desc *cpc_desc = per_cpu(cpc_desc_ptr, cpu);
+ struct cppc_pcc_data *pcc_ss_data = NULL;
+ int ret = -EINVAL;
+
+ if (!cpc_desc) {
+ pr_debug("No CPC descriptor for CPU:%d\n", cpu);
+ return -ENODEV;
+ }
+
+ auto_sel_reg = &cpc_desc->cpc_regs[AUTO_SEL_ENABLE];
+
+ if (CPC_IN_PCC(auto_sel_reg)) {
+ if (pcc_ss_id < 0) {
+ pr_debug("Invalid pcc_ss_id\n");
+ return -ENODEV;
+ }
+
+ if (CPC_SUPPORTED(auto_sel_reg)) {
+ ret = cpc_write(cpu, auto_sel_reg, enable);
+ if (ret)
+ return ret;
+ }
+
+ pcc_ss_data = pcc_data[pcc_ss_id];
+
+ down_write(&pcc_ss_data->pcc_lock);
+ /* after writing CPC, transfer the ownership of PCC to platform */
+ ret = send_pcc_cmd(pcc_ss_id, CMD_WRITE);
+ up_write(&pcc_ss_data->pcc_lock);
+ } else {
+ ret = -ENOTSUPP;
+ pr_debug("_CPC in PCC is not supported\n");
+ }
+
+ return ret;
+}
+EXPORT_SYMBOL_GPL(cppc_set_auto_sel);
+
+
/**
* cppc_set_enable - Set to enable CPPC on the processor by writing the
* Continuous Performance Control package EnableRegister field.
diff --git a/include/acpi/cppc_acpi.h b/include/acpi/cppc_acpi.h
index 6b487a5bd638..6126c977ece0 100644
--- a/include/acpi/cppc_acpi.h
+++ b/include/acpi/cppc_acpi.h
@@ -109,6 +109,7 @@ struct cppc_perf_caps {
u32 lowest_freq;
u32 nominal_freq;
u32 energy_perf;
+ bool auto_sel;
};

struct cppc_perf_ctrls {
@@ -153,6 +154,8 @@ extern int cpc_read_ffh(int cpunum, struct cpc_reg *reg, u64 *val);
extern int cpc_write_ffh(int cpunum, struct cpc_reg *reg, u64 val);
extern int cppc_get_epp_perf(int cpunum, u64 *epp_perf);
extern int cppc_set_epp_perf(int cpu, struct cppc_perf_ctrls *perf_ctrls, bool enable);
+extern int cppc_get_auto_sel_caps(int cpunum, struct cppc_perf_caps *perf_caps);
+extern int cppc_set_auto_sel(int cpu, bool enable);
#else /* !CONFIG_ACPI_CPPC_LIB */
static inline int cppc_get_desired_perf(int cpunum, u64 *desired_perf)
{
@@ -214,6 +217,14 @@ static inline int cppc_get_epp_perf(int cpunum, u64 *epp_perf)
{
return -ENOTSUPP;
}
+static inline int cppc_set_auto_sel(int cpu, bool enable)
+{
+ return -ENOTSUPP;
+}
+static inline int cppc_get_auto_sel_caps(int cpunum, struct cppc_perf_caps *perf_caps)
+{
+ return -ENOTSUPP;
+}
#endif /* !CONFIG_ACPI_CPPC_LIB */

#endif /* _CPPC_ACPI_H*/
--
2.34.1


2023-02-16 08:19:54

by Wyes Karny

Subject: [PATCH v7 3/6] Documentation: cpufreq: amd-pstate: Move amd_pstate param to alphabetical order

Move amd_pstate command line param description to correct alphabetical
order.

Acked-by: Huang Rui <[email protected]>
Reviewed-by: Mario Limonciello <[email protected]>
Signed-off-by: Wyes Karny <[email protected]>
---
.../admin-guide/kernel-parameters.txt | 35 ++++++++++---------
1 file changed, 18 insertions(+), 17 deletions(-)

diff --git a/Documentation/admin-guide/kernel-parameters.txt b/Documentation/admin-guide/kernel-parameters.txt
index e3618dfdb36a..6ffcfb73e62f 100644
--- a/Documentation/admin-guide/kernel-parameters.txt
+++ b/Documentation/admin-guide/kernel-parameters.txt
@@ -339,6 +339,24 @@
This mode requires kvm-amd.avic=1.
(Default when IOMMU HW support is present.)

+ amd_pstate= [X86]
+ disable
+ Do not enable amd_pstate as the default
+ scaling driver for the supported processors
+ passive
+ Use amd_pstate as a scaling driver, driver requests a
+ desired performance on this abstract scale and the power
+ management firmware translates the requests into actual
+ hardware states (core frequency, data fabric and memory
+ clocks etc.)
+ active
+ Use amd_pstate_epp driver instance as the scaling driver,
+ driver provides a hint to the hardware if software wants
+ to bias toward performance (0x0) or energy efficiency (0xff)
+ to the CPPC firmware. then CPPC power algorithm will
+ calculate the runtime workload and adjust the realtime cores
+ frequency.
+
amijoy.map= [HW,JOY] Amiga joystick support
Map of devices attached to JOY0DAT and JOY1DAT
Format: <a>,<b>
@@ -7010,20 +7028,3 @@
xmon commands.
off xmon is disabled.

- amd_pstate= [X86]
- disable
- Do not enable amd_pstate as the default
- scaling driver for the supported processors
- passive
- Use amd_pstate as a scaling driver, driver requests a
- desired performance on this abstract scale and the power
- management firmware translates the requests into actual
- hardware states (core frequency, data fabric and memory
- clocks etc.)
- active
- Use amd_pstate_epp driver instance as the scaling driver,
- driver provides a hint to the hardware if software wants
- to bias toward performance (0x0) or energy efficiency (0xff)
- to the CPPC firmware. then CPPC power algorithm will
- calculate the runtime workload and adjust the realtime cores
- frequency.
--
2.34.1


2023-02-16 08:20:28

by Wyes Karny

Subject: [PATCH v7 4/6] cpufreq: amd-pstate: Add guided autonomous mode

The ACPI spec defines the following 3 modes for CPPC:
1. Non autonomous: OS scaling governor specifies operating frequency/
performance level through the `Desired Performance` register and the
platform follows that.
2. Guided autonomous: OS scaling governor specifies min and max
frequencies/performance levels through the `Minimum Performance` and
`Maximum Performance` registers, and the platform can autonomously select
an operating frequency in this range.
3. Fully autonomous: OS only hints (via EPP) to the platform the required
energy performance preference for the workload, and the platform
autonomously scales the frequency.

Currently (1) is supported by amd_pstate as passive mode, and (3) is
implemented by EPP support. This change adds support for (2).

In guided autonomous mode, min_perf is based on the input from the
scaling governor. For example, with schedutil this value depends on the
current utilization. max_perf is set to the maximum capacity.

To activate guided autonomous mode, pass ``amd_pstate=guided`` on the
kernel command line.

Acked-by: Huang Rui <[email protected]>
Reviewed-by: Mario Limonciello <[email protected]>
Signed-off-by: Wyes Karny <[email protected]>
---
.../admin-guide/kernel-parameters.txt | 15 +++++---
drivers/cpufreq/amd-pstate.c | 34 +++++++++++++++----
include/linux/amd-pstate.h | 2 ++
3 files changed, 40 insertions(+), 11 deletions(-)

diff --git a/Documentation/admin-guide/kernel-parameters.txt b/Documentation/admin-guide/kernel-parameters.txt
index 6ffcfb73e62f..e1241141535b 100644
--- a/Documentation/admin-guide/kernel-parameters.txt
+++ b/Documentation/admin-guide/kernel-parameters.txt
@@ -344,11 +344,11 @@
Do not enable amd_pstate as the default
scaling driver for the supported processors
passive
- Use amd_pstate as a scaling driver, driver requests a
- desired performance on this abstract scale and the power
- management firmware translates the requests into actual
- hardware states (core frequency, data fabric and memory
- clocks etc.)
+ Use amd_pstate with passive mode as a scaling driver.
+ In this mode autonomous selection is disabled.
+ Driver requests a desired performance level and platform
+ tries to match the same performance level if it is
+ satisfied by guaranteed performance level.
active
Use amd_pstate_epp driver instance as the scaling driver,
driver provides a hint to the hardware if software wants
@@ -356,6 +356,11 @@
to the CPPC firmware. then CPPC power algorithm will
calculate the runtime workload and adjust the realtime cores
frequency.
+ guided
+ Activate guided autonomous mode. Driver requests minimum and
+ maximum performance level and the platform autonomously
+ selects a performance level in this range and appropriate
+ to the current workload.

amijoy.map= [HW,JOY] Amiga joystick support
Map of devices attached to JOY0DAT and JOY1DAT
diff --git a/drivers/cpufreq/amd-pstate.c b/drivers/cpufreq/amd-pstate.c
index b8862afef4e4..92f8402ebb34 100644
--- a/drivers/cpufreq/amd-pstate.c
+++ b/drivers/cpufreq/amd-pstate.c
@@ -308,7 +308,22 @@ static int cppc_init_perf(struct amd_cpudata *cpudata)
cppc_perf.lowest_nonlinear_perf);
WRITE_ONCE(cpudata->lowest_perf, cppc_perf.lowest_perf);

- return 0;
+ if (cppc_state == AMD_PSTATE_ACTIVE)
+ return 0;
+
+ ret = cppc_get_auto_sel_caps(cpudata->cpu, &cppc_perf);
+ if (ret) {
+ pr_warn("failed to get auto_sel, ret: %d\n", ret);
+ return 0;
+ }
+
+ ret = cppc_set_auto_sel(cpudata->cpu,
+ (cppc_state == AMD_PSTATE_PASSIVE) ? 0 : 1);
+
+ if (ret)
+ pr_warn("failed to set auto_sel, ret: %d\n", ret);
+
+ return ret;
}

DEFINE_STATIC_CALL(amd_pstate_init_perf, pstate_init_perf);
@@ -385,12 +400,18 @@ static inline bool amd_pstate_sample(struct amd_cpudata *cpudata)
}

static void amd_pstate_update(struct amd_cpudata *cpudata, u32 min_perf,
- u32 des_perf, u32 max_perf, bool fast_switch)
+ u32 des_perf, u32 max_perf, bool fast_switch, int gov_flags)
{
u64 prev = READ_ONCE(cpudata->cppc_req_cached);
u64 value = prev;

des_perf = clamp_t(unsigned long, des_perf, min_perf, max_perf);
+
+ if ((cppc_state == AMD_PSTATE_GUIDED) && (gov_flags & CPUFREQ_GOV_DYNAMIC_SWITCHING)) {
+ min_perf = des_perf;
+ des_perf = 0;
+ }
+
value &= ~AMD_CPPC_MIN_PERF(~0L);
value |= AMD_CPPC_MIN_PERF(min_perf);

@@ -445,7 +466,7 @@ static int amd_pstate_target(struct cpufreq_policy *policy,

cpufreq_freq_transition_begin(policy, &freqs);
amd_pstate_update(cpudata, min_perf, des_perf,
- max_perf, false);
+ max_perf, false, policy->governor->flags);
cpufreq_freq_transition_end(policy, &freqs, false);

return 0;
@@ -479,7 +500,8 @@ static void amd_pstate_adjust_perf(unsigned int cpu,
if (max_perf < min_perf)
max_perf = min_perf;

- amd_pstate_update(cpudata, min_perf, des_perf, max_perf, true);
+ amd_pstate_update(cpudata, min_perf, des_perf, max_perf, true,
+ policy->governor->flags);
cpufreq_cpu_put(policy);
}

@@ -1278,7 +1300,7 @@ static int __init amd_pstate_init(void)
/* capability check */
if (boot_cpu_has(X86_FEATURE_CPPC)) {
pr_debug("AMD CPPC MSR based functionality is supported\n");
- if (cppc_state == AMD_PSTATE_PASSIVE)
+ if (cppc_state != AMD_PSTATE_ACTIVE)
current_pstate_driver->adjust_perf = amd_pstate_adjust_perf;
} else {
pr_debug("AMD CPPC shared memory based functionality is supported\n");
@@ -1340,7 +1362,7 @@ static int __init amd_pstate_param(char *str)
if (cppc_state == AMD_PSTATE_ACTIVE)
current_pstate_driver = &amd_pstate_epp_driver;

- if (cppc_state == AMD_PSTATE_PASSIVE)
+ if (cppc_state == AMD_PSTATE_PASSIVE || cppc_state == AMD_PSTATE_GUIDED)
current_pstate_driver = &amd_pstate_driver;

return 0;
diff --git a/include/linux/amd-pstate.h b/include/linux/amd-pstate.h
index f5f22418e64b..c10ebf8c42e6 100644
--- a/include/linux/amd-pstate.h
+++ b/include/linux/amd-pstate.h
@@ -97,6 +97,7 @@ enum amd_pstate_mode {
AMD_PSTATE_DISABLE = 0,
AMD_PSTATE_PASSIVE,
AMD_PSTATE_ACTIVE,
+ AMD_PSTATE_GUIDED,
AMD_PSTATE_MAX,
};

@@ -104,6 +105,7 @@ static const char * const amd_pstate_mode_string[] = {
[AMD_PSTATE_DISABLE] = "disable",
[AMD_PSTATE_PASSIVE] = "passive",
[AMD_PSTATE_ACTIVE] = "active",
+ [AMD_PSTATE_GUIDED] = "guided",
NULL,
};
#endif /* _LINUX_AMD_PSTATE_H */
--
2.34.1


2023-02-16 08:20:51

by Wyes Karny

Subject: [PATCH v7 5/6] cpufreq: amd-pstate: Add guided mode control support via sysfs

The amd_pstate driver's `status` sysfs entry lets the user change the
driver's mode at runtime. With the addition of guided mode, the number of
possible mode transitions grows to 16. Therefore, simplify the
amd_pstate_update_status function by implementing a state transition
table.

There are 4 states amd_pstate supports, namely: 'disable', 'passive',
'active', and 'guided'. The transition from any state to any other
state is possible after this change.
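
The table-driven approach can be sketched in user-space C as follows. This
is a simplified illustration, not the driver code: all `sk_*` names are
hypothetical, and each handler only records which kind of action would be
taken instead of registering or unregistering a cpufreq driver.

```c
#include <assert.h>
#include <stddef.h>

enum sk_mode { SK_DISABLE, SK_PASSIVE, SK_ACTIVE, SK_GUIDED, SK_MODE_MAX };

/* What a transition would do; stands in for the real driver calls. */
enum sk_action {
	SK_NOOP,
	SK_REGISTER,
	SK_UNREGISTER,
	SK_SWAP_DRIVER,		/* unregister one driver, register another */
	SK_SWITCH_IN_PLACE,	/* same driver, only the mode changes */
};

typedef enum sk_action (*sk_transition_fn)(enum sk_mode target);

static enum sk_mode sk_current = SK_DISABLE;

static enum sk_action sk_register(enum sk_mode target)
{
	sk_current = target;
	return SK_REGISTER;
}

static enum sk_action sk_unregister(enum sk_mode target)
{
	(void)target;
	sk_current = SK_DISABLE;
	return SK_UNREGISTER;
}

static enum sk_action sk_swap_driver(enum sk_mode target)
{
	sk_current = target;
	return SK_SWAP_DRIVER;
}

static enum sk_action sk_switch_in_place(enum sk_mode target)
{
	sk_current = target;
	return SK_SWITCH_IN_PLACE;
}

/* Rows are the current mode, columns the requested mode; NULL means no-op. */
static const sk_transition_fn sk_table[SK_MODE_MAX][SK_MODE_MAX] = {
	[SK_DISABLE] = {
		[SK_PASSIVE] = sk_register,
		[SK_ACTIVE]  = sk_register,
		[SK_GUIDED]  = sk_register,
	},
	[SK_PASSIVE] = {
		[SK_DISABLE] = sk_unregister,
		[SK_ACTIVE]  = sk_swap_driver,
		[SK_GUIDED]  = sk_switch_in_place,
	},
	[SK_ACTIVE] = {
		[SK_DISABLE] = sk_unregister,
		[SK_PASSIVE] = sk_swap_driver,
		[SK_GUIDED]  = sk_swap_driver,
	},
	[SK_GUIDED] = {
		[SK_DISABLE] = sk_unregister,
		[SK_PASSIVE] = sk_switch_in_place,
		[SK_ACTIVE]  = sk_swap_driver,
	},
};

enum sk_action sk_set_mode(enum sk_mode target)
{
	sk_transition_fn fn;

	assert((unsigned int)target < SK_MODE_MAX);
	fn = sk_table[sk_current][target];
	return fn ? fn(target) : SK_NOOP;
}
```

Passive and guided share one driver in this sketch, so moving between them
is an in-place switch, while any move to or from active swaps the driver.
Requesting the current mode hits a NULL entry and does nothing.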

Sysfs interface:

To disable amd_pstate driver:
# echo disable > /sys/devices/system/cpu/amd_pstate/status

To enable passive mode:
# echo passive > /sys/devices/system/cpu/amd_pstate/status

To change mode to active:
# echo active > /sys/devices/system/cpu/amd_pstate/status

To change mode to guided:
# echo guided > /sys/devices/system/cpu/amd_pstate/status

Acked-by: Huang Rui <[email protected]>
Reviewed-by: Mario Limonciello <[email protected]>
Signed-off-by: Wyes Karny <[email protected]>
---
drivers/cpufreq/amd-pstate.c | 143 +++++++++++++++++++++++++----------
1 file changed, 101 insertions(+), 42 deletions(-)

diff --git a/drivers/cpufreq/amd-pstate.c b/drivers/cpufreq/amd-pstate.c
index 92f8402ebb34..3ff047bbcfe6 100644
--- a/drivers/cpufreq/amd-pstate.c
+++ b/drivers/cpufreq/amd-pstate.c
@@ -106,6 +106,8 @@ static unsigned int epp_values[] = {
[EPP_INDEX_POWERSAVE] = AMD_CPPC_EPP_POWERSAVE,
};

+typedef int (*cppc_mode_transition_fn)(int);
+
static inline int get_mode_idx_from_str(const char *str, size_t size)
{
int i;
@@ -838,6 +840,98 @@ static ssize_t show_energy_performance_preference(
return sysfs_emit(buf, "%s\n", energy_perf_strings[preference]);
}

+static void amd_pstate_driver_cleanup(void)
+{
+ amd_pstate_enable(false);
+ cppc_state = AMD_PSTATE_DISABLE;
+ current_pstate_driver = NULL;
+}
+
+static int amd_pstate_register_driver(int mode)
+{
+ int ret;
+
+ if (mode == AMD_PSTATE_PASSIVE || mode == AMD_PSTATE_GUIDED)
+ current_pstate_driver = &amd_pstate_driver;
+ else if (mode == AMD_PSTATE_ACTIVE)
+ current_pstate_driver = &amd_pstate_epp_driver;
+ else
+ return -EINVAL;
+
+ cppc_state = mode;
+ ret = cpufreq_register_driver(current_pstate_driver);
+ if (ret) {
+ amd_pstate_driver_cleanup();
+ return ret;
+ }
+ return 0;
+}
+
+static int amd_pstate_unregister_driver(int dummy)
+{
+ cpufreq_unregister_driver(current_pstate_driver);
+ amd_pstate_driver_cleanup();
+ return 0;
+}
+
+static int amd_pstate_change_mode_without_dvr_change(int mode)
+{
+ int cpu = 0;
+
+ cppc_state = mode;
+
+ if (boot_cpu_has(X86_FEATURE_CPPC) || cppc_state == AMD_PSTATE_ACTIVE)
+ return 0;
+
+ for_each_present_cpu(cpu) {
+ cppc_set_auto_sel(cpu, (cppc_state == AMD_PSTATE_PASSIVE) ? 0 : 1);
+ }
+
+ return 0;
+}
+
+static int amd_pstate_change_driver_mode(int mode)
+{
+ int ret;
+
+ ret = amd_pstate_unregister_driver(0);
+ if (ret)
+ return ret;
+
+ ret = amd_pstate_register_driver(mode);
+ if (ret)
+ return ret;
+
+ return 0;
+}
+
+cppc_mode_transition_fn mode_state_machine[AMD_PSTATE_MAX][AMD_PSTATE_MAX] = {
+ [AMD_PSTATE_DISABLE] = {
+ [AMD_PSTATE_DISABLE] = NULL,
+ [AMD_PSTATE_PASSIVE] = amd_pstate_register_driver,
+ [AMD_PSTATE_ACTIVE] = amd_pstate_register_driver,
+ [AMD_PSTATE_GUIDED] = amd_pstate_register_driver,
+ },
+ [AMD_PSTATE_PASSIVE] = {
+ [AMD_PSTATE_DISABLE] = amd_pstate_unregister_driver,
+ [AMD_PSTATE_PASSIVE] = NULL,
+ [AMD_PSTATE_ACTIVE] = amd_pstate_change_driver_mode,
+ [AMD_PSTATE_GUIDED] = amd_pstate_change_mode_without_dvr_change,
+ },
+ [AMD_PSTATE_ACTIVE] = {
+ [AMD_PSTATE_DISABLE] = amd_pstate_unregister_driver,
+ [AMD_PSTATE_PASSIVE] = amd_pstate_change_driver_mode,
+ [AMD_PSTATE_ACTIVE] = NULL,
+ [AMD_PSTATE_GUIDED] = amd_pstate_change_driver_mode,
+ },
+ [AMD_PSTATE_GUIDED] = {
+ [AMD_PSTATE_DISABLE] = amd_pstate_unregister_driver,
+ [AMD_PSTATE_PASSIVE] = amd_pstate_change_mode_without_dvr_change,
+ [AMD_PSTATE_ACTIVE] = amd_pstate_change_driver_mode,
+ [AMD_PSTATE_GUIDED] = NULL,
+ },
+};
+
static ssize_t amd_pstate_show_status(char *buf)
{
if (!current_pstate_driver)
@@ -846,57 +940,22 @@ static ssize_t amd_pstate_show_status(char *buf)
return sysfs_emit(buf, "%s\n", amd_pstate_mode_string[cppc_state]);
}

-static void amd_pstate_driver_cleanup(void)
-{
- current_pstate_driver = NULL;
-}
-
static int amd_pstate_update_status(const char *buf, size_t size)
{
- int ret = 0;
int mode_idx;

- if (size > 7 || size < 6)
+ if (size > strlen("passive") || size < strlen("active"))
return -EINVAL;
- mode_idx = get_mode_idx_from_str(buf, size);

- switch(mode_idx) {
- case AMD_PSTATE_DISABLE:
- if (!current_pstate_driver)
- return -EINVAL;
- if (cppc_state == AMD_PSTATE_ACTIVE)
- return -EBUSY;
- cpufreq_unregister_driver(current_pstate_driver);
- amd_pstate_driver_cleanup();
- break;
- case AMD_PSTATE_PASSIVE:
- if (current_pstate_driver) {
- if (current_pstate_driver == &amd_pstate_driver)
- return 0;
- cpufreq_unregister_driver(current_pstate_driver);
- cppc_state = AMD_PSTATE_PASSIVE;
- current_pstate_driver = &amd_pstate_driver;
- }
+ mode_idx = get_mode_idx_from_str(buf, size);

- ret = cpufreq_register_driver(current_pstate_driver);
- break;
- case AMD_PSTATE_ACTIVE:
- if (current_pstate_driver) {
- if (current_pstate_driver == &amd_pstate_epp_driver)
- return 0;
- cpufreq_unregister_driver(current_pstate_driver);
- current_pstate_driver = &amd_pstate_epp_driver;
- cppc_state = AMD_PSTATE_ACTIVE;
- }
+ if (mode_idx < 0 || mode_idx >= AMD_PSTATE_MAX)
+ return -EINVAL;

- ret = cpufreq_register_driver(current_pstate_driver);
- break;
- default:
- ret = -EINVAL;
- break;
- }
+ if (mode_state_machine[cppc_state][mode_idx])
+ return mode_state_machine[cppc_state][mode_idx](mode_idx);

- return ret;
+ return 0;
}

static ssize_t show_status(struct kobject *kobj,
--
2.34.1


2023-02-16 08:21:30

by Wyes Karny

Subject: [PATCH v7 6/6] Documentation: cpufreq: amd-pstate: Update amd_pstate status sysfs for guided

Update amd_pstate status sysfs for guided mode.

Acked-by: Huang Rui <[email protected]>
Reviewed-by: Bagas Sanjaya <[email protected]>
Reviewed-by: Mario Limonciello <[email protected]>
Signed-off-by: Wyes Karny <[email protected]>
---
Documentation/admin-guide/pm/amd-pstate.rst | 31 ++++++++++++++++-----
1 file changed, 24 insertions(+), 7 deletions(-)

diff --git a/Documentation/admin-guide/pm/amd-pstate.rst b/Documentation/admin-guide/pm/amd-pstate.rst
index 5304adf2fc2f..95d2d0a803fe 100644
--- a/Documentation/admin-guide/pm/amd-pstate.rst
+++ b/Documentation/admin-guide/pm/amd-pstate.rst
@@ -303,13 +303,18 @@ efficiency frequency management method on AMD processors.
AMD Pstate Driver Operation Modes
=================================

-``amd_pstate`` CPPC has two operation modes: CPPC Autonomous(active) mode and
-CPPC non-autonomous(passive) mode.
-active mode and passive mode can be chosen by different kernel parameters.
-When in Autonomous mode, CPPC ignores requests done in the Desired Performance
-Target register and takes into account only the values set to the Minimum requested
-performance, Maximum requested performance, and Energy Performance Preference
-registers. When Autonomous is disabled, it only considers the Desired Performance Target.
+``amd_pstate`` CPPC has 3 operation modes: autonomous (active) mode,
+non-autonomous (passive) mode and guided autonomous (guided) mode.
+Active/passive/guided mode can be chosen by different kernel parameters.
+
+- In autonomous mode, platform ignores the desired performance level request
+ and takes into account only the values set to the minimum, maximum and energy
+ performance preference registers.
+- In non-autonomous mode, platform gets desired performance level
+ from OS directly through Desired Performance Register.
+- In guided-autonomous mode, platform sets operating performance level
+ autonomously according to the current workload and within the limits set by
+ OS through min and max performance registers.

Active Mode
------------
@@ -338,6 +343,15 @@ to the Performance Reduction Tolerance register. Above the nominal performance l
processor must provide at least nominal performance requested and go higher if current
operating conditions allow.

+Guided Mode
+-----------
+
+``amd_pstate=guided``
+
+If ``amd_pstate=guided`` is passed on the kernel command line, this mode
+is activated. In this mode, the driver requests minimum and maximum performance
+levels and the platform autonomously selects a performance level in this range
+that is appropriate to the current workload.

User Space Interface in ``sysfs``
=================================
@@ -358,6 +372,9 @@ control its functionality at the system level. They are located in the
"passive"
The driver is functional and in the ``passive mode``

+ "guided"
+ The driver is functional and in the ``guided mode``
+
"disable"
The driver is unregistered and not functional now.

--
2.34.1
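
The three modes described in the documentation hunk above differ mainly in which CPPC registers the OS fills in. The following is a minimal Python sketch of that mapping, not the driver's code: `PerfRequest`, `build_request`, the placeholder `epp` value, and the choice to program the full lowest-to-highest range in active mode are all assumptions made for illustration.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class PerfRequest:
    """Values the OS would write for one CPU (None = register not used)."""
    min_perf: Optional[int] = None
    max_perf: Optional[int] = None
    desired_perf: Optional[int] = None
    epp: Optional[int] = None


def build_request(mode, governor_perf, lowest_perf, highest_perf, epp=128):
    """Hypothetical mapping from operation mode to the registers the OS sets.

    governor_perf is the performance level the scaling governor asks for.
    The epp default of 128 is an arbitrary placeholder, not a driver default.
    """
    # Keep the governor's request inside the hardware limits.
    perf = max(lowest_perf, min(governor_perf, highest_perf))
    if mode == "passive":
        # Non-autonomous: the OS dictates the level via Desired Performance.
        return PerfRequest(desired_perf=perf)
    if mode == "guided":
        # Guided autonomous: the governor's value becomes the floor, the
        # ceiling is the maximum capacity, and the platform picks in between.
        return PerfRequest(min_perf=perf, max_perf=highest_perf)
    if mode == "active":
        # Fully autonomous: no desired level; only the limits and the EPP
        # hint are given (full range here is an assumption of this sketch).
        return PerfRequest(min_perf=lowest_perf, max_perf=highest_perf, epp=epp)
    raise ValueError(f"unknown mode: {mode!r}")


if __name__ == "__main__":
    for mode in ("passive", "guided", "active"):
        print(mode, build_request(mode, 80, lowest_perf=19, highest_perf=166))
```

With a governor request of 80 and limits of 19 and 166, the guided request carries `min_perf=80, max_perf=166` and leaves the desired level unset, which is the behaviour the cover letter describes for schedutil.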


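The `status` attribute documented in the patch above can be checked from user space. Below is a small Python sketch under stated assumptions: the path `/sys/devices/system/cpu/amd_pstate/status` is not spelled out in the hunk and is assumed here, the "active" value is taken to exist alongside the "passive", "guided" and "disable" values shown, and `read_status` and `describe` are hypothetical helper names.

```python
from pathlib import Path

# Assumed location of the global attribute; adjust if your kernel differs.
STATUS_PATH = Path("/sys/devices/system/cpu/amd_pstate/status")

DESCRIPTIONS = {
    "active": "autonomous (active) mode",
    "passive": "non-autonomous (passive) mode",
    "guided": "guided autonomous (guided) mode",
    "disable": "driver unregistered",
}


def read_status(path=STATUS_PATH):
    """Return the driver state string, or None if the file cannot be read."""
    try:
        return Path(path).read_text().strip()
    except OSError:
        return None


def describe(state):
    """Turn a raw state string into a short human-readable description."""
    if state is None:
        return "amd_pstate status not available"
    return DESCRIPTIONS.get(state, f"unrecognized state {state!r}")


if __name__ == "__main__":
    print(describe(read_status()))
```

On a machine without the driver the file is absent, so the helper returns `None` instead of raising.
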
2023-02-16 08:35:41

by Huang Rui

[permalink] [raw]
Subject: Re: [PATCH v7 0/6] cpufreq: amd-pstate: Add guided autonomous mode support

On Thu, Feb 16, 2023 at 04:17:56PM +0800, Karny, Wyes wrote:
> From ACPI spec[1] below 3 modes for CPPC can be defined:
> 1. Non autonomous: OS scaling governor specifies operating frequency/
> performance level through `Desired Performance` register and platform
> follows that.
> 2. Guided autonomous: OS scaling governor specifies min and max
> frequencies/ performance levels through `Minimum Performance` and
> `Maximum Performance` register, and platform can autonomously select an
> operating frequency in this range.
> 3. Fully autonomous: OS only hints (via EPP) to platform for the required
> energy performance preference for the workload and platform autonomously
> scales the frequency.
>
> Currently (1) is supported by amd_pstate as passive mode, and (3) is
> implemented by EPP support[2]. This change is to support (2).
>
> In guided autonomous mode the min_perf is based on the input from the
> scaling governor. For example, in case of schedutil this value depends
> on the current utilization. And max_perf is set to max capacity.
>
> To activate guided auto mode ``amd_pstate=guided`` command line
> parameter has to be passed in the kernel.
>
> Below are the results (normalized) of benchmarks with this patch:
> System: Genoa 96C 192T
> Kernel: 6.2.0-rc2 + EPP v12 + patch
> Scaling governor: schedutil
>
> ================ dbench comparisons ================
> dbench result comparison:
> Here results are throughput (MB/s)
> Clients: acpi-cpufreq amd_pst+passive amd_pst+guided
> 1 1.00 (0.00 pct) 1.01 (1.00 pct) 1.02 (2.00 pct)
> 2 1.07 (0.00 pct) 1.06 (-0.93 pct) 1.07 (0.00 pct)
> 4 1.68 (0.00 pct) 1.70 (1.19 pct) 1.72 (2.38 pct)
> 8 2.61 (0.00 pct) 2.68 (2.68 pct) 2.76 (5.74 pct)
> 16 4.16 (0.00 pct) 4.24 (1.92 pct) 4.53 (8.89 pct)
> 32 5.98 (0.00 pct) 6.17 (3.17 pct) 7.30 (22.07 pct)
> 64 8.67 (0.00 pct) 8.99 (3.69 pct) 10.71 (23.52 pct)
> 128 11.98 (0.00 pct) 12.52 (4.50 pct) 14.67 (22.45 pct)
> 256 15.73 (0.00 pct) 16.13 (2.54 pct) 17.81 (13.22 pct)
> 512 15.77 (0.00 pct) 16.32 (3.48 pct) 16.39 (3.93 pct)
> dbench power comparison:
> Clients: acpi-cpufreq amd_pst+passive amd_pst+guided
> 1 1.00 (0.00 pct) 1.00 (0.00 pct) 1.04 (4.00 pct)
> 2 0.99 (0.00 pct) 0.97 (-2.02 pct) 1.02 (3.03 pct)
> 4 0.98 (0.00 pct) 0.98 (0.00 pct) 1.02 (4.08 pct)
> 8 0.98 (0.00 pct) 0.99 (1.02 pct) 1.02 (4.08 pct)
> 16 0.99 (0.00 pct) 1.00 (1.01 pct) 1.04 (5.05 pct)
> 32 1.02 (0.00 pct) 1.02 (0.00 pct) 1.07 (4.90 pct)
> 64 1.05 (0.00 pct) 1.05 (0.00 pct) 1.11 (5.71 pct)
> 128 1.08 (0.00 pct) 1.08 (0.00 pct) 1.15 (6.48 pct)
> 256 1.12 (0.00 pct) 1.12 (0.00 pct) 1.20 (7.14 pct)
> 512 1.18 (0.00 pct) 1.17 (-0.84 pct) 1.26 (6.77 pct)
>
> ================ git-source comparisons ================
> git-source result comparison:
> Here results are throughput (compilations per 1000 sec)
> Threads: acpi-cpufreq amd_pst+passive amd_pst+guided
> 192 1.00 (0.00 pct) 0.93 (-7.00 pct) 1.00 (0.00 pct)
> git-source power comparison:
> Threads: acpi-cpufreq amd_pst+passive amd_pst+guided
> 192 1.00 (0.00 pct) 1.00 (0.00 pct) 0.96 (-4.00 pct)
>
> ================ kernbench comparisons ================
> kernbench result comparison:
> Here results are throughput (compilations per 1000 sec)
> Load: acpi-cpufreq amd_pst+passive amd_pst+guided
> 32 1.00 (0.00 pct) 1.01 (1.00 pct) 1.02 (2.00 pct)
> 48 1.26 (0.00 pct) 1.28 (1.58 pct) 1.25 (-0.79 pct)
> 64 1.39 (0.00 pct) 1.47 (5.75 pct) 1.43 (2.87 pct)
> 96 1.48 (0.00 pct) 1.50 (1.35 pct) 1.49 (0.67 pct)
> 128 1.29 (0.00 pct) 1.32 (2.32 pct) 1.33 (3.10 pct)
> 192 1.17 (0.00 pct) 1.20 (2.56 pct) 1.21 (3.41 pct)
> 256 1.17 (0.00 pct) 1.18 (0.85 pct) 1.20 (2.56 pct)
> 384 1.16 (0.00 pct) 1.17 (0.86 pct) 1.21 (4.31 pct)
> kernbench power comparison:
> Clients: acpi-cpufreq amd_pst+passive amd_pst+guided
> 32 1.00 (0.00 pct) 0.97 (-3.00 pct) 1.00 (0.00 pct)
> 48 0.87 (0.00 pct) 0.81 (-6.89 pct) 0.88 (1.14 pct)
> 64 0.81 (0.00 pct) 0.73 (-9.87 pct) 0.77 (-4.93 pct)
> 96 0.75 (0.00 pct) 0.74 (-1.33 pct) 0.75 (0.00 pct)
> 128 0.83 (0.00 pct) 0.79 (-4.81 pct) 0.83 (0.00 pct)
> 192 0.92 (0.00 pct) 0.88 (-4.34 pct) 0.92 (0.00 pct)
> 256 0.92 (0.00 pct) 0.88 (-4.34 pct) 0.92 (0.00 pct)
> 384 0.92 (0.00 pct) 0.88 (-4.34 pct) 0.92 (0.00 pct)
>
> ================ tbench comparisons ================
> tbench result comparison:
> Here results are throughput (MB/s)
> Clients: acpi-cpufreq amd_pst+passive amd_pst+guided
> 1 1.00 (0.00 pct) 0.70 (-30.00 pct) 1.37 (37.00 pct)
> 2 2.64 (0.00 pct) 1.39 (-47.34 pct) 2.70 (2.27 pct)
> 4 4.89 (0.00 pct) 2.75 (-43.76 pct) 5.28 (7.97 pct)
> 8 9.46 (0.00 pct) 5.42 (-42.70 pct) 10.22 (8.03 pct)
> 16 19.05 (0.00 pct) 10.42 (-45.30 pct) 19.94 (4.67 pct)
> 32 37.50 (0.00 pct) 20.23 (-46.05 pct) 36.87 (-1.68 pct)
> 64 61.24 (0.00 pct) 43.08 (-29.65 pct) 62.96 (2.80 pct)
> 128 67.16 (0.00 pct) 69.08 (2.85 pct) 67.34 (0.26 pct)
> 256 154.59 (0.00 pct) 162.33 (5.00 pct) 156.78 (1.41 pct)
> 512 154.02 (0.00 pct) 156.74 (1.76 pct) 153.48 (-0.35 pct)
> tbench power comparison:
> Clients: acpi-cpufreq amd_pst+passive amd_pst+guided
> 1 1.00 (0.00 pct) 0.97 (-3.00 pct) 1.08 (8.00 pct)
> 2 1.04 (0.00 pct) 0.97 (-6.73 pct) 1.11 (6.73 pct)
> 4 1.12 (0.00 pct) 0.99 (-11.60 pct) 1.18 (5.35 pct)
> 8 1.25 (0.00 pct) 1.04 (-16.80 pct) 1.31 (4.80 pct)
> 16 1.53 (0.00 pct) 1.13 (-26.14 pct) 1.58 (3.26 pct)
> 32 2.01 (0.00 pct) 1.36 (-32.33 pct) 2.03 (0.99 pct)
> 64 2.58 (0.00 pct) 2.14 (-17.05 pct) 2.61 (1.16 pct)
> 128 2.80 (0.00 pct) 2.81 (0.35 pct) 2.81 (0.35 pct)
> 256 3.39 (0.00 pct) 3.43 (1.17 pct) 3.42 (0.88 pct)
> 512 3.44 (0.00 pct) 3.44 (0.00 pct) 3.44 (0.00 pct)
>
> Note: this series is based on top of EPP v12 [3] series
>
> Change log:
>
> v6 -> v7:
> - Addressed comments by Ray
> - Reorder and rebase patches
> - Pick up Ack by Ray
>
> v5 -> v6:
> - Don't return -EBUSY when changing to same mode
>
> v4 -> v5:
> - Rebased on top of EPP v12 series
> - Addressed comments from Mario regarding documentation
> - Picked up RB flags from Mario and Bagas Sanjaya
>
> v3 -> v4:
> - Fixed active mode low frequency issue reported by Peter Jung and Tor Vic
> - Documentation modification suggested by Bagas Sanjaya
>
> v2 -> v3:
> - Addressed review comments from Mario.
> - Picked up RB tag from Mario.
> - Rebase on top of EPP v11 [3].
>
> v1 -> v2:
> - Fix issue with shared mem systems.
> - Rebase on top of EPP series.
>
> [1]: https://uefi.org/sites/default/files/resources/ACPI_6_3_final_Jan30.pdf
> [2]: https://lore.kernel.org/lkml/[email protected]/
> [3]: https://lore.kernel.org/linux-pm/[email protected]/
>
> Wyes Karny (6):
> acpi: cppc: Add min and max perf reg writing support
> acpi: cppc: Add auto select register read/write support
> Documentation: cpufreq: amd-pstate: Move amd_pstate param to
> alphabetical order
> cpufreq: amd-pstate: Add guided autonomous mode
> cpufreq: amd-pstate: Add guided mode control support via sysfs
> Documentation: cpufreq: amd-pstate: Update amd_pstate status sysfs for
> guided
>
> .../admin-guide/kernel-parameters.txt | 40 ++--
> Documentation/admin-guide/pm/amd-pstate.rst | 31 ++-
> drivers/acpi/cppc_acpi.c | 121 +++++++++++-
> drivers/cpufreq/amd-pstate.c | 177 +++++++++++++-----
> include/acpi/cppc_acpi.h | 11 ++
> include/linux/amd-pstate.h | 2 +
> 6 files changed, 302 insertions(+), 80 deletions(-)
>

Hi Rafael,

Could you please apply this series for 6.3, or kindly let us know
if you have any comments?

Thanks,
Ray

2023-02-19 12:08:08

by Oleksandr Natalenko

[permalink] [raw]
Subject: Re: [PATCH v7 0/6] cpufreq: amd-pstate: Add guided autonomous mode support

Hello.

On Thursday, 16 February 2023 9:17:56 CET Wyes Karny wrote:
> From ACPI spec[1] below 3 modes for CPPC can be defined:
> 1. Non autonomous: OS scaling governor specifies operating frequency/
> performance level through `Desired Performance` register and platform
> follows that.
> 2. Guided autonomous: OS scaling governor specifies min and max
> frequencies/ performance levels through `Minimum Performance` and
> `Maximum Performance` register, and platform can autonomously select an
> operating frequency in this range.
> 3. Fully autonomous: OS only hints (via EPP) to platform for the required
> energy performance preference for the workload and platform autonomously
> scales the frequency.
>
> Currently (1) is supported by amd_pstate as passive mode, and (3) is
> implemented by EPP support[2]. This change is to support (2).
>
> In guided autonomous mode the min_perf is based on the input from the
> scaling governor. For example, in case of schedutil this value depends
> on the current utilization. And max_perf is set to max capacity.
>
> To activate guided auto mode ``amd_pstate=guided`` command line
> parameter has to be passed in the kernel.
>
> Below are the results (normalized) of benchmarks with this patch:
> System: Genoa 96C 192T
> Kernel: 6.2.0-rc2 + EPP v12 + patch
> Scaling governor: schedutil
>
> ================ dbench comparisons ================
> dbench result comparison:
> Here results are throughput (MB/s)
> Clients: acpi-cpufreq amd_pst+passive amd_pst+guided
> 1 1.00 (0.00 pct) 1.01 (1.00 pct) 1.02 (2.00 pct)
> 2 1.07 (0.00 pct) 1.06 (-0.93 pct) 1.07 (0.00 pct)
> 4 1.68 (0.00 pct) 1.70 (1.19 pct) 1.72 (2.38 pct)
> 8 2.61 (0.00 pct) 2.68 (2.68 pct) 2.76 (5.74 pct)
> 16 4.16 (0.00 pct) 4.24 (1.92 pct) 4.53 (8.89 pct)
> 32 5.98 (0.00 pct) 6.17 (3.17 pct) 7.30 (22.07 pct)
> 64 8.67 (0.00 pct) 8.99 (3.69 pct) 10.71 (23.52 pct)
> 128 11.98 (0.00 pct) 12.52 (4.50 pct) 14.67 (22.45 pct)
> 256 15.73 (0.00 pct) 16.13 (2.54 pct) 17.81 (13.22 pct)
> 512 15.77 (0.00 pct) 16.32 (3.48 pct) 16.39 (3.93 pct)
> dbench power comparison:
> Clients: acpi-cpufreq amd_pst+passive amd_pst+guided
> 1 1.00 (0.00 pct) 1.00 (0.00 pct) 1.04 (4.00 pct)
> 2 0.99 (0.00 pct) 0.97 (-2.02 pct) 1.02 (3.03 pct)
> 4 0.98 (0.00 pct) 0.98 (0.00 pct) 1.02 (4.08 pct)
> 8 0.98 (0.00 pct) 0.99 (1.02 pct) 1.02 (4.08 pct)
> 16 0.99 (0.00 pct) 1.00 (1.01 pct) 1.04 (5.05 pct)
> 32 1.02 (0.00 pct) 1.02 (0.00 pct) 1.07 (4.90 pct)
> 64 1.05 (0.00 pct) 1.05 (0.00 pct) 1.11 (5.71 pct)
> 128 1.08 (0.00 pct) 1.08 (0.00 pct) 1.15 (6.48 pct)
> 256 1.12 (0.00 pct) 1.12 (0.00 pct) 1.20 (7.14 pct)
> 512 1.18 (0.00 pct) 1.17 (-0.84 pct) 1.26 (6.77 pct)
>
> ================ git-source comparisons ================
> git-source result comparison:
> Here results are throughput (compilations per 1000 sec)
> Threads: acpi-cpufreq amd_pst+passive amd_pst+guided
> 192 1.00 (0.00 pct) 0.93 (-7.00 pct) 1.00 (0.00 pct)
> git-source power comparison:
> Threads: acpi-cpufreq amd_pst+passive amd_pst+guided
> 192 1.00 (0.00 pct) 1.00 (0.00 pct) 0.96 (-4.00 pct)
>
> ================ kernbench comparisons ================
> kernbench result comparison:
> Here results are throughput (compilations per 1000 sec)
> Load: acpi-cpufreq amd_pst+passive amd_pst+guided
> 32 1.00 (0.00 pct) 1.01 (1.00 pct) 1.02 (2.00 pct)
> 48 1.26 (0.00 pct) 1.28 (1.58 pct) 1.25 (-0.79 pct)
> 64 1.39 (0.00 pct) 1.47 (5.75 pct) 1.43 (2.87 pct)
> 96 1.48 (0.00 pct) 1.50 (1.35 pct) 1.49 (0.67 pct)
> 128 1.29 (0.00 pct) 1.32 (2.32 pct) 1.33 (3.10 pct)
> 192 1.17 (0.00 pct) 1.20 (2.56 pct) 1.21 (3.41 pct)
> 256 1.17 (0.00 pct) 1.18 (0.85 pct) 1.20 (2.56 pct)
> 384 1.16 (0.00 pct) 1.17 (0.86 pct) 1.21 (4.31 pct)
> kernbench power comparison:
> Clients: acpi-cpufreq amd_pst+passive amd_pst+guided
> 32 1.00 (0.00 pct) 0.97 (-3.00 pct) 1.00 (0.00 pct)
> 48 0.87 (0.00 pct) 0.81 (-6.89 pct) 0.88 (1.14 pct)
> 64 0.81 (0.00 pct) 0.73 (-9.87 pct) 0.77 (-4.93 pct)
> 96 0.75 (0.00 pct) 0.74 (-1.33 pct) 0.75 (0.00 pct)
> 128 0.83 (0.00 pct) 0.79 (-4.81 pct) 0.83 (0.00 pct)
> 192 0.92 (0.00 pct) 0.88 (-4.34 pct) 0.92 (0.00 pct)
> 256 0.92 (0.00 pct) 0.88 (-4.34 pct) 0.92 (0.00 pct)
> 384 0.92 (0.00 pct) 0.88 (-4.34 pct) 0.92 (0.00 pct)
>
> ================ tbench comparisons ================
> tbench result comparison:
> Here results are throughput (MB/s)
> Clients: acpi-cpufreq amd_pst+passive amd_pst+guided
> 1 1.00 (0.00 pct) 0.70 (-30.00 pct) 1.37 (37.00 pct)
> 2 2.64 (0.00 pct) 1.39 (-47.34 pct) 2.70 (2.27 pct)
> 4 4.89 (0.00 pct) 2.75 (-43.76 pct) 5.28 (7.97 pct)
> 8 9.46 (0.00 pct) 5.42 (-42.70 pct) 10.22 (8.03 pct)
> 16 19.05 (0.00 pct) 10.42 (-45.30 pct) 19.94 (4.67 pct)
> 32 37.50 (0.00 pct) 20.23 (-46.05 pct) 36.87 (-1.68 pct)
> 64 61.24 (0.00 pct) 43.08 (-29.65 pct) 62.96 (2.80 pct)
> 128 67.16 (0.00 pct) 69.08 (2.85 pct) 67.34 (0.26 pct)
> 256 154.59 (0.00 pct) 162.33 (5.00 pct) 156.78 (1.41 pct)
> 512 154.02 (0.00 pct) 156.74 (1.76 pct) 153.48 (-0.35 pct)
> tbench power comparison:
> Clients: acpi-cpufreq amd_pst+passive amd_pst+guided
> 1 1.00 (0.00 pct) 0.97 (-3.00 pct) 1.08 (8.00 pct)
> 2 1.04 (0.00 pct) 0.97 (-6.73 pct) 1.11 (6.73 pct)
> 4 1.12 (0.00 pct) 0.99 (-11.60 pct) 1.18 (5.35 pct)
> 8 1.25 (0.00 pct) 1.04 (-16.80 pct) 1.31 (4.80 pct)
> 16 1.53 (0.00 pct) 1.13 (-26.14 pct) 1.58 (3.26 pct)
> 32 2.01 (0.00 pct) 1.36 (-32.33 pct) 2.03 (0.99 pct)
> 64 2.58 (0.00 pct) 2.14 (-17.05 pct) 2.61 (1.16 pct)
> 128 2.80 (0.00 pct) 2.81 (0.35 pct) 2.81 (0.35 pct)
> 256 3.39 (0.00 pct) 3.43 (1.17 pct) 3.42 (0.88 pct)
> 512 3.44 (0.00 pct) 3.44 (0.00 pct) 3.44 (0.00 pct)
>
> Note: this series is based on top of EPP v12 [3] series
>
> Change log:
>
> v6 -> v7:
> - Addressed comments by Ray
> - Reorder and rebase patches
> - Pick up Ack by Ray
>
> v5 -> v6:
> - Don't return -EBUSY when changing to same mode
>
> v4 -> v5:
> - Rebased on top of EPP v12 series
> - Addressed comments from Mario regarding documentation
> - Picked up RB flags from Mario and Bagas Sanjaya
>
> v3 -> v4:
> - Fixed active mode low frequency issue reported by Peter Jung and Tor Vic
> - Documentation modification suggested by Bagas Sanjaya
>
> v2 -> v3:
> - Addressed review comments from Mario.
> - Picked up RB tag from Mario.
> - Rebase on top of EPP v11 [3].
>
> v1 -> v2:
> - Fix issue with shared mem systems.
> - Rebase on top of EPP series.
>
> [1]: https://uefi.org/sites/default/files/resources/ACPI_6_3_final_Jan30.pdf
> [2]: https://lore.kernel.org/lkml/[email protected]/
> [3]: https://lore.kernel.org/linux-pm/[email protected]/
>
> Wyes Karny (6):
> acpi: cppc: Add min and max perf reg writing support
> acpi: cppc: Add auto select register read/write support
> Documentation: cpufreq: amd-pstate: Move amd_pstate param to
> alphabetical order
> cpufreq: amd-pstate: Add guided autonomous mode
> cpufreq: amd-pstate: Add guided mode control support via sysfs
> Documentation: cpufreq: amd-pstate: Update amd_pstate status sysfs for
> guided
>
> .../admin-guide/kernel-parameters.txt | 40 ++--
> Documentation/admin-guide/pm/amd-pstate.rst | 31 ++-
> drivers/acpi/cppc_acpi.c | 121 +++++++++++-
> drivers/cpufreq/amd-pstate.c | 177 +++++++++++++-----
> include/acpi/cppc_acpi.h | 11 ++
> include/linux/amd-pstate.h | 2 +
> 6 files changed, 302 insertions(+), 80 deletions(-)

```
$ lscpu | grep 'Model name'
Model name: AMD Ryzen 9 5950X 16-Core Processor

$ cat /proc/cmdline
root=/dev/mapper/system threadirqs io_delay=none zswap.enabled=0 amd_pstate=guided quiet

$ sudo cpupower frequency-info
analyzing CPU 0:
driver: amd-pstate
CPUs which run at the same hardware frequency: 0
CPUs which need to have their frequency coordinated by software: 0
maximum transition latency: 20.0 us
hardware limits: 550 MHz - 5.08 GHz
available cpufreq governors: conservative ondemand userspace powersave performance schedutil
current policy: frequency should be within 550 MHz and 5.08 GHz.
The governor "schedutil" may decide which speed to use
within this range.
current CPU frequency: Unable to call hardware
current CPU frequency: 3.95 GHz (asserted by call to kernel)
boost state support:
Supported: yes
Active: yes
AMD PSTATE Highest Performance: 166. Maximum Frequency: 5.08 GHz.
AMD PSTATE Nominal Performance: 111. Nominal Frequency: 3.40 GHz.
AMD PSTATE Lowest Non-linear Performance: 57. Lowest Non-linear Frequency: 1.74 GHz.
AMD PSTATE Lowest Performance: 19. Lowest Frequency: 550 MHz.
```
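
The performance/frequency pairs in the output above can be cross-checked with a little arithmetic. The Python sketch below assumes that frequency scales proportionally from the nominal pair (perf 111, 3.40 GHz) and uses a 10-bit fixed-point ratio; both the proportional model and the `SHIFT` value are assumptions of this sketch, and `perf_to_mhz` is a hypothetical helper, not a function quoted from the driver.

```python
SHIFT = 10  # fixed-point fraction bits; an assumption chosen for this sketch


def perf_to_mhz(perf, nominal_perf, nominal_mhz):
    """Scale the nominal frequency by perf/nominal_perf in integer arithmetic."""
    ratio = (perf << SHIFT) // nominal_perf
    return (nominal_mhz * ratio) >> SHIFT


NOMINAL_PERF, NOMINAL_MHZ = 111, 3400  # nominal pair from the output above

if __name__ == "__main__":
    for name, perf in [("highest", 166), ("lowest non-linear", 57), ("lowest", 19)]:
        mhz = perf_to_mhz(perf, NOMINAL_PERF, NOMINAL_MHZ)
        print(f"{name:>18}: perf {perf:3d} -> {mhz} MHz")
```

Under these assumptions the highest level maps to 5083 MHz and the lowest non-linear level to 1743 MHz, consistent with the reported 5.08 GHz and 1.74 GHz. The lowest level maps to 581 MHz rather than the reported 550 MHz, which suggests that value is reported separately and not derived by this scaling.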

Tested-by: Oleksandr Natalenko <[email protected]>

Thank you.

--
Oleksandr Natalenko (post-factum)