2022-10-09 08:30:32

by Yuan, Perry

[permalink] [raw]
Subject: [PATCH v2 0/9] Implement AMD Pstate EPP Driver

Hi all,

This patchset implements one new AMD CPU frequency driver
"amd-pstate-epp” instance for better performance and power control.
CPPC has a parameter called energy preference performance (EPP).
The EPP is used in the CCLK DPM controller to drive the frequency that a core
is going to operate during short periods of activity.
EPP values will be utilized for different OS profiles (balanced, performance, power savings).

AMD Energy Performance Preference (EPP) provides a hint to the hardware
if software wants to bias toward performance (0x0) or energy efficiency (0xff)
The lowlevel power firmware will calculate the runtime frequency according to the EPP preference
value. So the EPP hint will impact the CPU cores frequency responsiveness.

We use the RAPL interface with "perf" tool to get the energy data of the package power.
Performance Per Watt (PPW) Calculation:

The PPW calculation is referred by below paper:
https://nam11.safelinks.protection.outlook.com/?url=https%3A%2F%2Fsoftware.intel.com%2Fcontent%2Fdam%2Fdevelop%2Fexternal%2Fus%2Fen%2Fdocuments%2Fperformance-per-what-paper.pdf&data=04%7C01%7CPerry.Yuan%40amd.com%7Cac66e8ce98044e9b062708d9ab47c8d8%7C3dd8961fe4884e608e11a82d994e183d%7C0%7C0%7C637729147708574423%7CUnknown%7CTWFpbGZsb3d8eyJWIjoiMC4wLjAwMDAiLCJQIjoiV2luMzIiLCJBTiI6Ik1haWwiLCJXVCI6Mn0%3D%7C3000&sdata=TPOvCE%2Frbb0ptBreWNxHqOi9YnVhcHGKG88vviDLb00%3D&reserved=0

Below formula is referred from below spec to measure the PPW:

(F / t) / P = F * t / (t * E) = F / E,

"F" is the number of frames per second.
"P" is power measured in watts.
"E" is energy measured in joules.

Gitsouce Benchmark Data on ROME Server CPU
+------------------------------+------------------------------+------------+------------------+
| Kernel Module | PPW (1 / s * J) | Energy(J) | Improvement (%) |
+==============================+==============================+============+==================+
| acpi-cpufreq:schedutil | 5.85658E-05 | 17074.8 | base |
+------------------------------+------------------------------+------------+------------------+
| acpi-cpufreq:ondemand | 5.03079E-05 | 19877.6 | -14.10% |
+------------------------------+------------------------------+------------+------------------+
| acpi-cpufreq:performance | 5.88132E-05 | 17003 | 0.42% |
+------------------------------+------------------------------+------------+------------------+
| amd-pstate:ondemand | 4.60295E-05 | 21725.2 | -21.41% |
+------------------------------+------------------------------+------------+------------------+
| amd-pstate:schedutil | 4.70026E-05 | 21275.4 | -19.7% |
+------------------------------+------------------------------+------------+------------------+
| amd-pstate:performance | 5.80094E-05 | 17238.6 | -0.95% |
+------------------------------+------------------------------+------------+------------------+
| EPP:performance | 5.8292E-05 | 17155 | -0.47% |
+------------------------------+------------------------------+------------+------------------+
| EPP: balance performance: | 6.71709E-05 | 14887.4 | 14.69% |
+------------------------------+------------------------------+------------+------------------+
| EPP:power | 6.66951E-05 | 4993.6 | 13.88% |
+------------------------------+------------------------------+------------+------------------+

Tbench Benchmark Data one ROME Server CPU
+---------------------------------------------+-------------------+--------------+-------------+------------------+
| Kernel Module | PPW MB / (s * J) |Throughput(MB/s)| Energy (J) | Improvement (%)|
+=============================================+===================+==============+=============+==================+
| acpi_cpufreq: schedutil | 46.39 | 17191 | 37057.3 | base |
+---------------------------------------------+-------------------+--------------+-------------+------------------+
| acpi_cpufreq: ondemand | 51.51 | 19269.5 | 37406.5 | 11.04 % |
+---------------------------------------------+-------------------+--------------+-------------+------------------+
| acpi_cpufreq: performance | 45.96 | 17063.7 | 37123.7 | -0.74 % |
+---------------------------------------------+-------------------+--------------+-------------+------------------+
| EPP:powersave: performance(0) | 54.46 | 20263.1 | 37205 | 17.87 % |
+---------------------------------------------+-------------------+--------------+-------------+------------------+
| EPP:powersave: balance performance | 55.03 | 20481.9 | 37221.5 | 19.14 % |
+---------------------------------------------+-------------------+--------------+-------------+------------------+
| EPP:powersave: balance_power | 54.43 | 20245.9 | 37194.2 | 17.77 % |
+---------------------------------------------+-------------------+--------------+-------------+------------------+
| EPP:powersave: power(255) | 54.26 | 20181.7 | 37197.4 | 17.40 % |
+---------------------------------------------+-------------------+--------------+-------------+------------------+
| amd-pstate: schedutil | 48.22 | 17844.9 | 37006.6 | 3.80 % |
+---------------------------------------------+-------------------+--------------+-------------+------------------+
| amd-pstate: ondemand | 61.30 | 22988 | 37503.4 | 33.72 % |
+---------------------------------------------+-------------------+--------------+-------------+------------------+
| amd-pstate: performance | 54.52 | 20252.6 | 37147.8 | 17.81 % |
+---------------------------------------------+-------------------+--------------+-------------+------------------+

changes from v1:
* rebased to v6.0
* drive feedbacks from Mario for the suspend/resume patch
* drive feedbacks from Nathan for the EPP support on msr type
* fix some typos and code style indent problems
* update commit comments for patch 4/7
* change the `epp_enabled` module param name to `epp`
* set the default epp mode to be false
* try to move energy_perf_strings[] and epp_values[] into `msr-index.h`,
but this change will cause compiler errors "no such instruction...",
* add testing for the x86_energy_perf_policy utility patchset(will
send that utility patchset with another thread)

Perry Yuan (9):
ACPI: CPPC: Add AMD pstate energy performance preference cppc control
cpufreq: amd_pstate: add module parameter to load amd pstate EPP
driver
cpufreq: cpufreq: export cpufreq cpu release and acquire
x86/msr: Add the MSR definition for AMD CPPC boost state
Documentation: amd-pstate: add EPP profiles introduction
cpufreq: amd_pstate: add AMD pstate EPP support for shared memory type
processor
cpufreq: amd_pstate: add AMD Pstate EPP support for the MSR based
processors
cpufreq: amd_pstate: implement amd pstate cpu online and offline
callback
cpufreq: amd-pstate: implement suspend and resume callbacks

Documentation/admin-guide/pm/amd-pstate.rst | 19 +
arch/x86/include/asm/msr-index.h | 7 +
drivers/acpi/cppc_acpi.c | 128 ++-
drivers/cpufreq/amd-pstate.c | 950 +++++++++++++++++++-
drivers/cpufreq/cpufreq.c | 2 +
include/acpi/cppc_acpi.h | 17 +
6 files changed, 1116 insertions(+), 7 deletions(-)

--
2.34.1


2022-10-09 08:39:51

by Yuan, Perry

[permalink] [raw]
Subject: [PATCH v2 1/9] ACPI: CPPC: Add AMD pstate energy performance preference cppc control

Add the EPP(Energy Performance Preference) support for the
AMD SoCs without the dedicated CPPC MSR, those SoCs need to add this
cppc acpi functions to update EPP values and desired perf value.

In order to get EPP worked, cppc_get_epp_caps() will query EPP preference
value and cppc_set_epp_perf() will set EPP new value.
Before the EPP works, pstate driver will use cppc_set_auto_epp() to
enable EPP function from firmware firstly.

Signed-off-by: Perry Yuan <[email protected]>
---
drivers/acpi/cppc_acpi.c | 128 ++++++++++++++++++++++++++++++++++++++-
include/acpi/cppc_acpi.h | 17 ++++++
2 files changed, 144 insertions(+), 1 deletion(-)

diff --git a/drivers/acpi/cppc_acpi.c b/drivers/acpi/cppc_acpi.c
index 093675b1a1ff..b0e7817cb97f 100644
--- a/drivers/acpi/cppc_acpi.c
+++ b/drivers/acpi/cppc_acpi.c
@@ -1365,6 +1365,132 @@ int cppc_get_perf_ctrs(int cpunum, struct cppc_perf_fb_ctrs *perf_fb_ctrs)
}
EXPORT_SYMBOL_GPL(cppc_get_perf_ctrs);

+/**
+ * cppc_get_epp_caps - Get the energy preference register value.
+ * @cpunum: CPU from which to get epp preference level.
+ * @perf_caps: Return address.
+ *
+ * Return: 0 for success, -EIO otherwise.
+ */
+int cppc_get_epp_caps(int cpunum, struct cppc_perf_caps *perf_caps)
+{
+ struct cpc_desc *cpc_desc = per_cpu(cpc_desc_ptr, cpunum);
+ struct cpc_register_resource *energy_perf_reg;
+ u64 energy_perf;
+
+ if (!cpc_desc) {
+ pr_warn("No CPC descriptor for CPU:%d\n", cpunum);
+ return -ENODEV;
+ }
+
+ energy_perf_reg = &cpc_desc->cpc_regs[ENERGY_PERF];
+
+ if (!CPC_SUPPORTED(energy_perf_reg))
+ pr_warn("energy perf reg update is unsupported!\n");
+
+ if (CPC_IN_PCC(energy_perf_reg)) {
+ int pcc_ss_id = per_cpu(cpu_pcc_subspace_idx, cpunum);
+ struct cppc_pcc_data *pcc_ss_data = NULL;
+ int ret = 0;
+
+ if (pcc_ss_id < 0)
+ return -EIO;
+
+ pcc_ss_data = pcc_data[pcc_ss_id];
+
+ down_write(&pcc_ss_data->pcc_lock);
+
+ if (send_pcc_cmd(pcc_ss_id, CMD_READ) >= 0) {
+ cpc_read(cpunum, energy_perf_reg, &energy_perf);
+ perf_caps->energy_perf = energy_perf;
+ } else {
+ ret = -EIO;
+ }
+
+ up_write(&pcc_ss_data->pcc_lock);
+
+ return ret;
+ }
+
+ return 0;
+}
+EXPORT_SYMBOL_GPL(cppc_get_epp_caps);
+
+int cppc_set_auto_epp(int cpu, bool enable)
+{
+ int pcc_ss_id = per_cpu(cpu_pcc_subspace_idx, cpu);
+ struct cpc_register_resource *auto_sel_reg;
+ struct cpc_desc *cpc_desc = per_cpu(cpc_desc_ptr, cpu);
+ struct cppc_pcc_data *pcc_ss_data = NULL;
+ int ret = -EINVAL;
+
+ if (!cpc_desc) {
+ pr_warn("No CPC descriptor for CPU:%d\n", cpu);
+ return -EINVAL;
+ }
+
+ auto_sel_reg = &cpc_desc->cpc_regs[AUTO_SEL_ENABLE];
+
+ if (CPC_IN_PCC(auto_sel_reg)) {
+ if (pcc_ss_id < 0)
+ return -EIO;
+
+ ret = cpc_write(cpu, auto_sel_reg, enable);
+ if (ret)
+ return ret;
+
+ pcc_ss_data = pcc_data[pcc_ss_id];
+
+ down_write(&pcc_ss_data->pcc_lock);
+ /* after writing CPC, transfer the ownership of PCC to platform */
+ ret = send_pcc_cmd(pcc_ss_id, CMD_WRITE);
+ up_write(&pcc_ss_data->pcc_lock);
+ return ret;
+ }
+
+ return cpc_write(cpu, auto_sel_reg, enable);
+}
+EXPORT_SYMBOL_GPL(cppc_set_auto_epp);
+
+/*
+ * Set Energy Performance Preference Register value through
+ * Performance Controls Interface
+ */
+int cppc_set_epp_perf(int cpu, struct cppc_perf_ctrls *perf_ctrls)
+{
+ int pcc_ss_id = per_cpu(cpu_pcc_subspace_idx, cpu);
+ struct cpc_register_resource *epp_set_reg;
+ struct cpc_desc *cpc_desc = per_cpu(cpc_desc_ptr, cpu);
+ struct cppc_pcc_data *pcc_ss_data = NULL;
+ int ret = -EINVAL;
+
+ if (!cpc_desc) {
+ pr_warn("No CPC descriptor for CPU:%d\n", cpu);
+ return -EINVAL;
+ }
+
+ epp_set_reg = &cpc_desc->cpc_regs[ENERGY_PERF];
+
+ if (CPC_IN_PCC(epp_set_reg)) {
+ if (pcc_ss_id < 0)
+ return -EIO;
+
+ ret = cpc_write(cpu, epp_set_reg, perf_ctrls->energy_perf);
+ if (ret)
+ return ret;
+
+ pcc_ss_data = pcc_data[pcc_ss_id];
+
+ down_write(&pcc_ss_data->pcc_lock);
+ /* after writing CPC, transfer the ownership of PCC to platform */
+ ret = send_pcc_cmd(pcc_ss_id, CMD_WRITE);
+ up_write(&pcc_ss_data->pcc_lock);
+ }
+
+ return ret;
+}
+EXPORT_SYMBOL_GPL(cppc_set_epp_perf);
+
/**
* cppc_set_enable - Set to enable CPPC on the processor by writing the
* Continuous Performance Control package EnableRegister field.
@@ -1400,7 +1526,7 @@ int cppc_set_enable(int cpu, bool enable)
pcc_ss_data = pcc_data[pcc_ss_id];

down_write(&pcc_ss_data->pcc_lock);
- /* after writing CPC, transfer the ownership of PCC to platfrom */
+ /* after writing CPC, transfer the ownership of PCC to platform */
ret = send_pcc_cmd(pcc_ss_id, CMD_WRITE);
up_write(&pcc_ss_data->pcc_lock);
return ret;
diff --git a/include/acpi/cppc_acpi.h b/include/acpi/cppc_acpi.h
index c5614444031f..10d91aeedaca 100644
--- a/include/acpi/cppc_acpi.h
+++ b/include/acpi/cppc_acpi.h
@@ -108,12 +108,14 @@ struct cppc_perf_caps {
u32 lowest_nonlinear_perf;
u32 lowest_freq;
u32 nominal_freq;
+ u32 energy_perf;
};

struct cppc_perf_ctrls {
u32 max_perf;
u32 min_perf;
u32 desired_perf;
+ u32 energy_perf;
};

struct cppc_perf_fb_ctrs {
@@ -149,6 +151,9 @@ extern bool cpc_ffh_supported(void);
extern bool cpc_supported_by_cpu(void);
extern int cpc_read_ffh(int cpunum, struct cpc_reg *reg, u64 *val);
extern int cpc_write_ffh(int cpunum, struct cpc_reg *reg, u64 val);
+extern int cppc_set_auto_epp(int cpu, bool enable);
+extern int cppc_get_epp_caps(int cpunum, struct cppc_perf_caps *perf_caps);
+extern int cppc_set_epp_perf(int cpu, struct cppc_perf_ctrls *perf_ctrls);
#else /* !CONFIG_ACPI_CPPC_LIB */
static inline int cppc_get_desired_perf(int cpunum, u64 *desired_perf)
{
@@ -202,6 +207,18 @@ static inline int cpc_write_ffh(int cpunum, struct cpc_reg *reg, u64 val)
{
return -ENOTSUPP;
}
+static inline int cppc_set_auto_epp(int cpu, bool enable)
+{
+ return -ENOTSUPP;
+}
+static inline int cppc_set_epp_perf(int cpu, struct cppc_perf_ctrls *perf_ctrls)
+{
+ return -ENOTSUPP;
+}
+static inline int cppc_get_epp_caps(int cpunum, struct cppc_perf_caps *perf_caps)
+{
+ return -ENOTSUPP;
+}
#endif /* !CONFIG_ACPI_CPPC_LIB */

#endif /* _CPPC_ACPI_H*/
--
2.34.1

2022-10-09 08:40:34

by Yuan, Perry

[permalink] [raw]
Subject: [PATCH v2 4/9] x86/msr: Add the MSR definition for AMD CPPC boost state

This MSR can be used to check whether the CPU frequency boost state
is enabled in the hardware control. User can change the boost state in
the BIOS setting,amd_pstate driver will update the boost state according
to this msr value.

AMD Processor Programming Reference (PPR)
Link: https://www.amd.com/system/files/TechDocs/40332.pdf [p1095]
Link: https://www.amd.com/system/files/TechDocs/56569-A1-PUB.zip [p162]

Signed-off-by: Perry Yuan <[email protected]>
---
arch/x86/include/asm/msr-index.h | 3 +++
1 file changed, 3 insertions(+)

diff --git a/arch/x86/include/asm/msr-index.h b/arch/x86/include/asm/msr-index.h
index 6674bdb096f3..e5ea1c9f747b 100644
--- a/arch/x86/include/asm/msr-index.h
+++ b/arch/x86/include/asm/msr-index.h
@@ -569,6 +569,7 @@
#define MSR_AMD_CPPC_CAP2 0xc00102b2
#define MSR_AMD_CPPC_REQ 0xc00102b3
#define MSR_AMD_CPPC_STATUS 0xc00102b4
+#define MSR_AMD_CPPC_HW_CTL 0xc0010015

#define AMD_CPPC_LOWEST_PERF(x) (((x) >> 0) & 0xff)
#define AMD_CPPC_LOWNONLIN_PERF(x) (((x) >> 8) & 0xff)
@@ -579,6 +580,8 @@
#define AMD_CPPC_MIN_PERF(x) (((x) & 0xff) << 8)
#define AMD_CPPC_DES_PERF(x) (((x) & 0xff) << 16)
#define AMD_CPPC_ENERGY_PERF_PREF(x) (((x) & 0xff) << 24)
+#define AMD_CPPC_PRECISION_BOOST_BIT 25
+#define AMD_CPPC_PRECISION_BOOST_ENABLED BIT_ULL(AMD_CPPC_PRECISION_BOOST_BIT)

/* AMD Performance Counter Global Status and Control MSRs */
#define MSR_AMD64_PERF_CNTR_GLOBAL_STATUS 0xc0000300
--
2.34.1

2022-10-09 09:09:45

by Yuan, Perry

[permalink] [raw]
Subject: [PATCH v2 5/9] Documentation: amd-pstate: add EPP profiles introduction

The patch add AMD pstate EPP feature introduction and what EPP
preference supported for AMD processors.

User can get supported list from
energy_performance_available_preferences attribute file, or update
current profile to energy_performance_preference file

1) See all EPP profiles
$ sudo cat /sys/devices/system/cpu/cpu0/cpufreq/energy_performance_available_preferences
default performance balance_performance balance_power power

2) Check current EPP profile
$ sudo cat /sys/devices/system/cpu/cpu0/cpufreq/energy_performance_preference
performance

3) Set new EPP profile
$ sudo bash -c "echo power > /sys/devices/system/cpu/cpu0/cpufreq/energy_performance_preference"

Signed-off-by: Perry Yuan <[email protected]>
---
Documentation/admin-guide/pm/amd-pstate.rst | 19 +++++++++++++++++++
1 file changed, 19 insertions(+)

diff --git a/Documentation/admin-guide/pm/amd-pstate.rst b/Documentation/admin-guide/pm/amd-pstate.rst
index 83b58eb4ab4d..d0f0e115013b 100644
--- a/Documentation/admin-guide/pm/amd-pstate.rst
+++ b/Documentation/admin-guide/pm/amd-pstate.rst
@@ -261,6 +261,25 @@ lowest non-linear performance in `AMD CPPC Performance Capability
<perf_cap_>`_.)
This attribute is read-only.

+``energy_performance_available_preferences``
+
+All the supported EPP preference could be selected, List of the strings that
+can be set to the ``energy_performance_preference`` attribute
+those different profiles represent different energy vs efficiency hints provided
+to low-level firmware
+however, the ``default`` represents the epp value is set by platform firmware
+This attribute is read-only.
+
+``energy_performance_preference``
+
+The current energy performance preference can be read from this attribute.
+and user can change current preference according to energy or performance needs
+Please get all support profiles list from
+``energy_performance_available_preferences`` attribute, all the profiles are
+integer values defined between 0 to 255 when EPP feature is enabled by platform
+firmware, if EPP feature is disabled, driver will ignore the written value
+This attribute is read-write.
+
Other performance and frequency values can be read back from
``/sys/devices/system/cpu/cpuX/acpi_cppc/``, see :ref:`cppc_sysfs`.

--
2.34.1