2022-12-08 11:35:14

by Yuan, Perry

[permalink] [raw]
Subject: [PATCH v7 00/13] Implement AMD Pstate EPP Driver

Hi all,

This patchset implements a new AMD CPU frequency driver instance,
`amd-pstate-epp`, for better performance and power control.
CPPC has a parameter called energy performance preference (EPP).
The EPP is used by the CCLK DPM controller to drive the frequency that a core
operates at during short periods of activity.
EPP values are utilized for different OS profiles (balanced, performance, power savings).

The AMD Energy Performance Preference (EPP) provides a hint to the hardware
if software wants to bias toward performance (0x0) or energy efficiency (0xff).
The low-level power firmware calculates the runtime frequency according to the EPP
value, so the EPP hint affects the responsiveness of the CPU core frequency.
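
As a rough illustration of how the hint reaches MSR-capable hardware (the
field layout matches patch 05/13, where EPP occupies bits 31:24 of
MSR_AMD_CPPC_REQ; the helper name here is made up for the example):

#include <linux/bits.h>
#include <linux/types.h>

/* Pack an EPP hint (0x00 = max performance ... 0xff = max efficiency)
 * into a cached MSR_AMD_CPPC_REQ value; sketch only. */
static u64 example_pack_epp(u64 cppc_req_cached, u8 epp)
{
	cppc_req_cached &= ~GENMASK_ULL(31, 24);
	cppc_req_cached |= (u64)epp << 24;
	return cppc_req_cached;
}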

We use the RAPL interface with the "perf" tool to get the package power energy data.

Performance Per Watt (PPW) Calculation:

The PPW calculation follows the paper below:
https://software.intel.com/content/dam/develop/external/us/en/documents/performance-per-what-paper.pdf

The formula below, taken from that paper, measures the PPW:

PPW = (F / t) / P = (F / t) / (E / t) = F / E,

where "F" is the number of frames rendered,
"t" is the elapsed time in seconds,
"P" is the power measured in watts (P = E / t),
and "E" is the energy measured in joules.

Gitsource Benchmark Data on ROME Server CPU
+------------------------------+------------------+------------+---------------------+
| Kernel Module                | PPW (1/(s*J))    | Energy (J) | PPW Improvement (%) |
+==============================+==================+============+=====================+
| acpi-cpufreq:schedutil       | 5.85658E-05      | 17074.8    | base                |
+------------------------------+------------------+------------+---------------------+
| acpi-cpufreq:ondemand        | 5.03079E-05      | 19877.6    | -14.10%             |
+------------------------------+------------------+------------+---------------------+
| acpi-cpufreq:performance     | 5.88132E-05      | 17003      | 0.42%               |
+------------------------------+------------------+------------+---------------------+
| amd-pstate:ondemand          | 4.60295E-05      | 21725.2    | -21.41%             |
+------------------------------+------------------+------------+---------------------+
| amd-pstate:schedutil         | 4.70026E-05      | 21275.4    | -19.7%              |
+------------------------------+------------------+------------+---------------------+
| amd-pstate:performance       | 5.80094E-05      | 17238.6    | -0.95%              |
+------------------------------+------------------+------------+---------------------+
| EPP:performance              | 5.8292E-05       | 17155      | -0.47%              |
+------------------------------+------------------+------------+---------------------+
| EPP:balance_performance      | 6.71709E-05      | 14887.4    | 14.69%              |
+------------------------------+------------------+------------+---------------------+
| EPP:power                    | 6.66951E-05      | 14993.6    | 13.88%              |
+------------------------------+------------------+------------+---------------------+

Tbench Benchmark Data on ROME Server CPU
+-------------------------------------+----------------+-------------------+------------+---------------------+
| Kernel Module                       | PPW (MB/(s*J)) | Throughput (MB/s) | Energy (J) | PPW Improvement (%) |
+=====================================+================+===================+============+=====================+
| acpi_cpufreq: schedutil             | 46.39          | 17191             | 37057.3    | base                |
+-------------------------------------+----------------+-------------------+------------+---------------------+
| acpi_cpufreq: ondemand              | 51.51          | 19269.5           | 37406.5    | 11.04%              |
+-------------------------------------+----------------+-------------------+------------+---------------------+
| acpi_cpufreq: performance           | 45.96          | 17063.7           | 37123.7    | -0.74%              |
+-------------------------------------+----------------+-------------------+------------+---------------------+
| EPP:powersave: performance(0)       | 54.46          | 20263.1           | 37205      | 17.87%              |
+-------------------------------------+----------------+-------------------+------------+---------------------+
| EPP:powersave: balance_performance  | 55.03          | 20481.9           | 37221.5    | 19.14%              |
+-------------------------------------+----------------+-------------------+------------+---------------------+
| EPP:powersave: balance_power        | 54.43          | 20245.9           | 37194.2    | 17.77%              |
+-------------------------------------+----------------+-------------------+------------+---------------------+
| EPP:powersave: power(255)           | 54.26          | 20181.7           | 37197.4    | 17.40%              |
+-------------------------------------+----------------+-------------------+------------+---------------------+
| amd-pstate: schedutil               | 48.22          | 17844.9           | 37006.6    | 3.80%               |
+-------------------------------------+----------------+-------------------+------------+---------------------+
| amd-pstate: ondemand                | 61.30          | 22988             | 37503.4    | 33.72%              |
+-------------------------------------+----------------+-------------------+------------+---------------------+
| amd-pstate: performance             | 54.52          | 20252.6           | 37147.8    | 17.81%              |
+-------------------------------------+----------------+-------------------+------------+---------------------+

changes from v6:
* fix one legacy kernel hang issue when unregistering the amd-pstate driver
* add new documentation to introduce the new global sysfs attributes
* use sysfs_emit_at() to print the epp profiles array
* update commit info for v6 patch 1/11 as Mario suggested
* tried to add the EPP profiles into cpufreq.h, but it causes lots of
build failures; continue to keep cpufreq_common.h used in v7
* update commit info using the amd-pstate prefix, same as before
* remove CONFIG_ACPI for the header as Ray suggested
* move amd_pstate_kobj to where it is used, in patch "add frequency dynamic boost sysfs control"
* drive feedback from Huang Ray: remove the X86_FEATURE_CPPC check for the epp init
* drive feedback from Mario

changes from v5:
* add one common header `cpufreq_common.h` to extract the EPP profiles
definition for the amd and intel pstate drivers
* remove the epp_off value to avoid confusion
* convert some other sysfs sprintf() calls to sysfs_emit() and add one new patch
* add acpi pm server profile detection to enable dynamic boost control
* fix some code format issues with the checkpatch script
* move the EPP profile declaration into the common header file `cpufreq_common.h`
* fix commit typos

changes from v4:
* rebase the driver on v6.1-rc7
* remove the builtin changes patch because the pstate driver has been
changed to a builtin type by another patch thread
* update "Documentation: amd-pstate: add amd pstate driver mode introduction"
* replace sprintf() with sysfs_emit() instead
* fix a typo for cppc_set_epp_perf() in the cppc_acpi.h header

changes from v3:
* add one more documentation update patch for the active and passive mode
introduction
* drive most of the feedback from Mario
* drive feedback from Rafael for the cppc_acpi driver
* remove the epp raw data set/get functions
* select the amd-pstate driver by passing a kernel parameter
* leave the amd-pstate driver disabled by default if no kernel parameter
is passed at boot
* combine cppc_set_auto_epp and cppc_set_epp_perf
* pick up Reviewed-by flag from Mario

changes from v2:
* change the pstate driver from a module to a builtin type
* drop patch "export cpufreq cpu release and acquire"
* squash the shared mem patches into a single patch of the epp implementation
* add one new patch to support frequency boost control
* add a patch to expose driver working status checking
* rebase the driver onto the v6.1-rc4 kernel release
* move some declarations to amd-pstate.h
* drive feedback from Mario for the online/offline patch
* drive feedback from Mario for the suspend/resume patch
* drive feedback from Ray for the cppc_acpi and some other patches
* drive feedback from Nathan for the epp patch

changes from v1:
* rebased to v6.0
* drive feedback from Mario for the suspend/resume patch
* drive feedback from Nathan for the EPP support on the msr type
* fix some typos and code style indentation problems
* update commit comments for patch 4/7
* change the `epp_enabled` module param name to `epp`
* set the default epp mode to be false
* add testing for the x86_energy_perf_policy utility patchset (will
send that utility patchset in another thread)

v6: https://lore.kernel.org/lkml/[email protected]/
v5: https://lore.kernel.org/lkml/[email protected]/
v4: https://lore.kernel.org/lkml/[email protected]/
v3: https://lore.kernel.org/all/[email protected]/
v2: https://lore.kernel.org/all/[email protected]/
v1: https://lore.kernel.org/all/[email protected]/

Perry Yuan (13):
ACPI: CPPC: Add AMD pstate energy performance preference cppc control
Documentation: amd-pstate: add EPP profiles introduction
cpufreq: intel_pstate: use common macro definition for Energy
Preference Performance(EPP)
cpufreq: amd-pstate: fix kernel hang issue while amd-pstate
unregistering
cpufreq: amd-pstate: implement Pstate EPP support for the AMD
processors
cpufreq: amd-pstate: implement amd pstate cpu online and offline
callback
cpufreq: amd-pstate: implement suspend and resume callbacks
cpufreq: amd-pstate: add frequency dynamic boost sysfs control
cpufreq: amd-pstate: add driver working mode status sysfs entry
Documentation: amd-pstate: add amd pstate driver mode introduction
Documentation: introduce amd pstate active mode kernel command line
options
cpufreq: amd-pstate: convert sprintf with sysfs_emit()
Documentation: amd-pstate: introduce new global sysfs attributes

.../admin-guide/kernel-parameters.txt | 7 +
Documentation/admin-guide/pm/amd-pstate.rst | 83 +-
arch/x86/include/asm/msr-index.h | 4 -
drivers/acpi/cppc_acpi.c | 114 ++-
drivers/cpufreq/amd-pstate.c | 933 +++++++++++++++++-
drivers/cpufreq/intel_pstate.c | 37 +-
include/acpi/cppc_acpi.h | 12 +
include/linux/amd-pstate.h | 36 +
include/linux/cpufreq_common.h | 53 +
9 files changed, 1222 insertions(+), 57 deletions(-)
create mode 100644 include/linux/cpufreq_common.h

--
2.34.1


2022-12-08 11:35:20

by Yuan, Perry

[permalink] [raw]
Subject: [PATCH v7 04/13] cpufreq: amd-pstate: fix kernel hang issue while amd-pstate unregistering

In amd_pstate_adjust_perf(), there is one cpufreq_cpu_get() call to
increment the kobject reference count of the policy and mark it as
busy. Therefore, a corresponding call to cpufreq_cpu_put() is needed to
decrement the kobject reference count back; this resolves the kernel
hang issue seen when unregistering the amd-pstate driver and registering
the `amd_pstate_epp` driver instance.
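
For reference, a minimal sketch of the balanced pattern this change
restores (cpufreq_cpu_get()/cpufreq_cpu_put() are the existing cpufreq
core helpers; the function body is illustrative):

#include <linux/cpufreq.h>

static void example_use_policy(unsigned int cpu)
{
	struct cpufreq_policy *policy = cpufreq_cpu_get(cpu);

	if (!policy)
		return;

	/* ... work with the policy ... */

	/* Without this put, the policy kobject refcount never drops
	 * back, and driver unregistration waits forever for release. */
	cpufreq_cpu_put(policy);
}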

Signed-off-by: Perry Yuan <[email protected]>
---
drivers/cpufreq/amd-pstate.c | 1 +
1 file changed, 1 insertion(+)

diff --git a/drivers/cpufreq/amd-pstate.c b/drivers/cpufreq/amd-pstate.c
index 204e39006dda..c17bd845f5fc 100644
--- a/drivers/cpufreq/amd-pstate.c
+++ b/drivers/cpufreq/amd-pstate.c
@@ -307,6 +307,7 @@ static void amd_pstate_adjust_perf(unsigned int cpu,
max_perf = min_perf;

amd_pstate_update(cpudata, min_perf, des_perf, max_perf, true);
+ cpufreq_cpu_put(policy);
}

static int amd_get_min_freq(struct amd_cpudata *cpudata)
--
2.34.1

2022-12-08 11:36:24

by Yuan, Perry

[permalink] [raw]
Subject: [PATCH v7 10/13] Documentation: amd-pstate: add amd pstate driver mode introduction

From: Perry Yuan <[email protected]>

Introduce the two operation modes of ``amd-pstate`` CPPC:
* CPPC Autonomous (active) mode
* CPPC non-autonomous (passive) mode
Active mode and passive mode can be chosen by passing different kernel
parameters, as shown below.
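
For example, the boot-time selection added by this series looks like this
on the kernel command line (bootloader-specific configuration omitted):

    amd_pstate=active    # load the EPP (active) driver
    amd_pstate=passive   # load the passive driver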

Signed-off-by: Perry Yuan <[email protected]>
---
Documentation/admin-guide/pm/amd-pstate.rst | 26 +++++++++++++++++++--
1 file changed, 24 insertions(+), 2 deletions(-)

diff --git a/Documentation/admin-guide/pm/amd-pstate.rst b/Documentation/admin-guide/pm/amd-pstate.rst
index 33ab8ec8fc2f..62744dae3c5f 100644
--- a/Documentation/admin-guide/pm/amd-pstate.rst
+++ b/Documentation/admin-guide/pm/amd-pstate.rst
@@ -299,8 +299,30 @@ module which supports the new AMD P-States mechanism on most of the future AMD
platforms. The AMD P-States mechanism is the more performance and energy
efficiency frequency management method on AMD processors.

-Kernel Module Options for ``amd-pstate``
-=========================================
+
+AMD Pstate Driver Operation Modes
+=================================
+
+``amd_pstate`` CPPC has two operation modes: CPPC Autonomous (active) mode and
+CPPC non-autonomous (passive) mode.
+Active mode and passive mode can be chosen by passing different kernel parameters.
+In Autonomous mode, CPPC ignores requests done in the Desired Performance
+Target register and takes into account only the values set in the Minimum requested
+performance, Maximum requested performance, and Energy Performance Preference
+registers. When Autonomous mode is disabled, it considers only the Desired Performance Target.
+
+Active Mode
+------------
+
+``amd_pstate=active``
+
+This is the low-level firmware control mode which is implemented by the
+``amd_pstate_epp`` driver, with ``amd_pstate=active`` passed to the kernel command line.
+In this mode, the ``amd_pstate_epp`` driver provides a hint to the CPPC firmware if
+software wants to bias toward performance (0x0) or energy efficiency (0xff).
+The CPPC power algorithm then calculates the runtime workload and adjusts the real-time
+core frequency according to the power supply, thermal, core voltage and some other
+hardware conditions.

Passive Mode
------------
--
2.34.1

2022-12-08 11:36:38

by Yuan, Perry

[permalink] [raw]
Subject: [PATCH v7 02/13] Documentation: amd-pstate: add EPP profiles introduction

From: Perry Yuan <[email protected]>

The patch adds the AMD P-State EPP feature introduction and describes
which EPP preferences are supported on AMD processors.

Users can get the supported preference list from the
energy_performance_available_preferences attribute file, or update the
current profile via the energy_performance_preference file.

1) See all EPP profiles
$ sudo cat /sys/devices/system/cpu/cpu0/cpufreq/energy_performance_available_preferences
default performance balance_performance balance_power power

2) Check current EPP profile
$ sudo cat /sys/devices/system/cpu/cpu0/cpufreq/energy_performance_preference
performance

3) Set new EPP profile
$ sudo bash -c "echo power > /sys/devices/system/cpu/cpu0/cpufreq/energy_performance_preference"

Signed-off-by: Perry Yuan <[email protected]>
---
Documentation/admin-guide/pm/amd-pstate.rst | 19 +++++++++++++++++++
1 file changed, 19 insertions(+)

diff --git a/Documentation/admin-guide/pm/amd-pstate.rst b/Documentation/admin-guide/pm/amd-pstate.rst
index 06e23538f79c..33ab8ec8fc2f 100644
--- a/Documentation/admin-guide/pm/amd-pstate.rst
+++ b/Documentation/admin-guide/pm/amd-pstate.rst
@@ -262,6 +262,25 @@ lowest non-linear performance in `AMD CPPC Performance Capability
<perf_cap_>`_.)
This attribute is read-only.

+``energy_performance_available_preferences``
+
+A list of all the supported EPP preferences that could be used for
+``energy_performance_preference`` on this system.
+These profiles represent different hints that are provided
+to the low-level firmware about the user's desired performance vs energy
+efficiency tradeoff. ``default`` means the EPP value is set by the platform
+firmware. This attribute is read-only.
+
+``energy_performance_preference``
+
+The current energy performance preference can be read from this attribute,
+and the user can change the current preference according to energy or performance needs.
+Please get the list of all supported profiles from the
+``energy_performance_available_preferences`` attribute. All the profiles map to
+integer values defined between 0 and 255 when the EPP feature is enabled by the platform
+firmware; if the EPP feature is disabled, the driver will ignore the written value.
+This attribute is read-write.
+
Other performance and frequency values can be read back from
``/sys/devices/system/cpu/cpuX/acpi_cppc/``, see :ref:`cppc_sysfs`.

--
2.34.1

2022-12-08 11:41:39

by Yuan, Perry

[permalink] [raw]
Subject: [PATCH v7 01/13] ACPI: CPPC: Add AMD pstate energy performance preference cppc control

From: Perry Yuan <[email protected]>

Add support for setting and querying EPP preferences to the generic
CPPC driver. This enables downstream drivers such as amd-pstate to
discover and use these values.

Downstream drivers that want to use the new symbols cppc_get_epp_caps
and cppc_set_epp_perf for querying and setting EPP preferences will need
to enable the EPP (autonomous selection) function first, via the
``enable`` argument of cppc_set_epp_perf().
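
A minimal sketch of how a downstream driver might use the two new
symbols, based only on the signatures added below (the wrapper names are
illustrative):

#include <acpi/cppc_acpi.h>

static int example_set_epp(int cpu, u32 epp)
{
	struct cppc_perf_ctrls perf_ctrls = { .energy_perf = epp };

	/* 'true' also writes AUTO_SEL_ENABLE where it is supported */
	return cppc_set_epp_perf(cpu, &perf_ctrls, true);
}

static int example_get_epp(int cpu, u64 *epp)
{
	struct cppc_perf_caps caps;
	int ret = cppc_get_epp_caps(cpu, &caps);

	if (!ret)
		*epp = caps.energy_perf;
	return ret;
}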

Signed-off-by: Perry Yuan <[email protected]>
---
drivers/acpi/cppc_acpi.c | 114 +++++++++++++++++++++++++++++++++++++--
include/acpi/cppc_acpi.h | 12 +++++
2 files changed, 121 insertions(+), 5 deletions(-)

diff --git a/drivers/acpi/cppc_acpi.c b/drivers/acpi/cppc_acpi.c
index 093675b1a1ff..37fa75f25f62 100644
--- a/drivers/acpi/cppc_acpi.c
+++ b/drivers/acpi/cppc_acpi.c
@@ -1093,6 +1093,9 @@ static int cppc_get_perf(int cpunum, enum cppc_regs reg_idx, u64 *perf)
{
struct cpc_desc *cpc_desc = per_cpu(cpc_desc_ptr, cpunum);
struct cpc_register_resource *reg;
+ int pcc_ss_id = per_cpu(cpu_pcc_subspace_idx, cpunum);
+ struct cppc_pcc_data *pcc_ss_data = NULL;
+ int ret = 0;

if (!cpc_desc) {
pr_debug("No CPC descriptor for CPU:%d\n", cpunum);
@@ -1102,10 +1105,6 @@ static int cppc_get_perf(int cpunum, enum cppc_regs reg_idx, u64 *perf)
reg = &cpc_desc->cpc_regs[reg_idx];

if (CPC_IN_PCC(reg)) {
- int pcc_ss_id = per_cpu(cpu_pcc_subspace_idx, cpunum);
- struct cppc_pcc_data *pcc_ss_data = NULL;
- int ret = 0;
-
if (pcc_ss_id < 0)
return -EIO;

@@ -1125,7 +1124,7 @@ static int cppc_get_perf(int cpunum, enum cppc_regs reg_idx, u64 *perf)

cpc_read(cpunum, reg, perf);

- return 0;
+ return ret;
}

/**
@@ -1365,6 +1364,111 @@ int cppc_get_perf_ctrs(int cpunum, struct cppc_perf_fb_ctrs *perf_fb_ctrs)
}
EXPORT_SYMBOL_GPL(cppc_get_perf_ctrs);

+/**
+ * cppc_get_epp_caps - Get the energy preference register value.
+ * @cpunum: CPU from which to get epp preference level.
+ * @perf_caps: Return address.
+ *
+ * Return: 0 for success, a negative error code otherwise.
+ */
+int cppc_get_epp_caps(int cpunum, struct cppc_perf_caps *perf_caps)
+{
+ struct cpc_desc *cpc_desc = per_cpu(cpc_desc_ptr, cpunum);
+ struct cpc_register_resource *energy_perf_reg;
+ u64 energy_perf;
+
+ if (!cpc_desc) {
+ pr_debug("No CPC descriptor for CPU:%d\n", cpunum);
+ return -ENODEV;
+ }
+
+ energy_perf_reg = &cpc_desc->cpc_regs[ENERGY_PERF];
+
+ if (!CPC_SUPPORTED(energy_perf_reg))
+ pr_warn_once("energy perf reg update is unsupported!\n");
+
+ if (CPC_IN_PCC(energy_perf_reg)) {
+ int pcc_ss_id = per_cpu(cpu_pcc_subspace_idx, cpunum);
+ struct cppc_pcc_data *pcc_ss_data = NULL;
+ int ret = 0;
+
+ if (pcc_ss_id < 0)
+ return -ENODEV;
+
+ pcc_ss_data = pcc_data[pcc_ss_id];
+
+ down_write(&pcc_ss_data->pcc_lock);
+
+ if (send_pcc_cmd(pcc_ss_id, CMD_READ) >= 0) {
+ cpc_read(cpunum, energy_perf_reg, &energy_perf);
+ perf_caps->energy_perf = energy_perf;
+ } else {
+ ret = -EIO;
+ }
+
+ up_write(&pcc_ss_data->pcc_lock);
+
+ return ret;
+ }
+
+ return 0;
+}
+EXPORT_SYMBOL_GPL(cppc_get_epp_caps);
+
+/*
+ * Set Energy Performance Preference Register value through
+ * Performance Controls Interface
+ */
+int cppc_set_epp_perf(int cpu, struct cppc_perf_ctrls *perf_ctrls, bool enable)
+{
+ int pcc_ss_id = per_cpu(cpu_pcc_subspace_idx, cpu);
+ struct cpc_register_resource *epp_set_reg;
+ struct cpc_register_resource *auto_sel_reg;
+ struct cpc_desc *cpc_desc = per_cpu(cpc_desc_ptr, cpu);
+ struct cppc_pcc_data *pcc_ss_data = NULL;
+ int ret = -EINVAL;
+
+ if (!cpc_desc) {
+ pr_debug("No CPC descriptor for CPU:%d\n", cpu);
+ return -ENODEV;
+ }
+
+ auto_sel_reg = &cpc_desc->cpc_regs[AUTO_SEL_ENABLE];
+ epp_set_reg = &cpc_desc->cpc_regs[ENERGY_PERF];
+
+ if (CPC_IN_PCC(epp_set_reg) || CPC_IN_PCC(auto_sel_reg)) {
+ if (pcc_ss_id < 0) {
+ pr_debug("Invalid pcc_ss_id\n");
+ return -ENODEV;
+ }
+
+ if (CPC_SUPPORTED(auto_sel_reg)) {
+ ret = cpc_write(cpu, auto_sel_reg, enable);
+ if (ret)
+ return ret;
+ }
+
+ if (CPC_SUPPORTED(epp_set_reg)) {
+ ret = cpc_write(cpu, epp_set_reg, perf_ctrls->energy_perf);
+ if (ret)
+ return ret;
+ }
+
+ pcc_ss_data = pcc_data[pcc_ss_id];
+
+ down_write(&pcc_ss_data->pcc_lock);
+ /* after writing CPC, transfer the ownership of PCC to platform */
+ ret = send_pcc_cmd(pcc_ss_id, CMD_WRITE);
+ up_write(&pcc_ss_data->pcc_lock);
+ } else {
+ ret = -ENOTSUPP;
+ pr_debug("_CPC in PCC is not supported\n");
+ }
+
+ return ret;
+}
+EXPORT_SYMBOL_GPL(cppc_set_epp_perf);
+
/**
* cppc_set_enable - Set to enable CPPC on the processor by writing the
* Continuous Performance Control package EnableRegister field.
diff --git a/include/acpi/cppc_acpi.h b/include/acpi/cppc_acpi.h
index c5614444031f..a45bb876a19c 100644
--- a/include/acpi/cppc_acpi.h
+++ b/include/acpi/cppc_acpi.h
@@ -108,12 +108,14 @@ struct cppc_perf_caps {
u32 lowest_nonlinear_perf;
u32 lowest_freq;
u32 nominal_freq;
+ u32 energy_perf;
};

struct cppc_perf_ctrls {
u32 max_perf;
u32 min_perf;
u32 desired_perf;
+ u32 energy_perf;
};

struct cppc_perf_fb_ctrs {
@@ -149,6 +151,8 @@ extern bool cpc_ffh_supported(void);
extern bool cpc_supported_by_cpu(void);
extern int cpc_read_ffh(int cpunum, struct cpc_reg *reg, u64 *val);
extern int cpc_write_ffh(int cpunum, struct cpc_reg *reg, u64 val);
+extern int cppc_get_epp_caps(int cpunum, struct cppc_perf_caps *perf_caps);
+extern int cppc_set_epp_perf(int cpu, struct cppc_perf_ctrls *perf_ctrls, bool enable);
#else /* !CONFIG_ACPI_CPPC_LIB */
static inline int cppc_get_desired_perf(int cpunum, u64 *desired_perf)
{
@@ -202,6 +206,14 @@ static inline int cpc_write_ffh(int cpunum, struct cpc_reg *reg, u64 val)
{
return -ENOTSUPP;
}
+static inline int cppc_set_epp_perf(int cpu, struct cppc_perf_ctrls *perf_ctrls, bool enable)
+{
+ return -ENOTSUPP;
+}
+static inline int cppc_get_epp_caps(int cpunum, struct cppc_perf_caps *perf_caps)
+{
+ return -ENOTSUPP;
+}
#endif /* !CONFIG_ACPI_CPPC_LIB */

#endif /* _CPPC_ACPI_H*/
--
2.34.1

2022-12-08 11:44:28

by Yuan, Perry

[permalink] [raw]
Subject: [PATCH v7 09/13] cpufreq: amd-pstate: add driver working mode status sysfs entry

From: Perry Yuan <[email protected]>

When the amd-pstate driver is loaded in a specific driver mode, users
need a way to check which mode is currently enabled; add this sysfs
entry to show the current status:

$ cat /sys/devices/system/cpu/amd-pstate/status
active

Meanwhile, the user can switch the driver mode by writing a mode string
to the sysfs entry, as shown below.

Enable passive mode:
$ sudo bash -c "echo passive > /sys/devices/system/cpu/amd-pstate/status"

Enable active mode (EPP driver mode):
$ sudo bash -c "echo active > /sys/devices/system/cpu/amd-pstate/status"

Signed-off-by: Perry Yuan <[email protected]>
---
drivers/cpufreq/amd-pstate.c | 101 +++++++++++++++++++++++++++++++++++
1 file changed, 101 insertions(+)

diff --git a/drivers/cpufreq/amd-pstate.c b/drivers/cpufreq/amd-pstate.c
index 4cd53c010215..c90aee3ee42d 100644
--- a/drivers/cpufreq/amd-pstate.c
+++ b/drivers/cpufreq/amd-pstate.c
@@ -64,6 +64,8 @@ static bool cppc_active;
static int cppc_load __initdata;

static struct cpufreq_driver *default_pstate_driver;
+static struct cpufreq_driver amd_pstate_epp_driver;
+static struct cpufreq_driver amd_pstate_driver;
static struct amd_cpudata **all_cpu_data;
static struct amd_pstate_params global_params;

@@ -72,6 +74,7 @@ static DEFINE_MUTEX(amd_pstate_driver_lock);
struct kobject *amd_pstate_kobj;

static bool cppc_boost __read_mostly;
+static DEFINE_SPINLOCK(cppc_notify_lock);

static s16 amd_pstate_get_epp(struct amd_cpudata *cpudata, u64 cppc_req_cached)
{
@@ -629,6 +632,8 @@ static int amd_pstate_cpu_init(struct cpufreq_policy *policy)
policy->driver_data = cpudata;

amd_pstate_boost_init(cpudata);
+ if (!default_pstate_driver->adjust_perf)
+ default_pstate_driver->adjust_perf = amd_pstate_adjust_perf;

return 0;

@@ -802,6 +807,100 @@ static ssize_t store_cppc_dynamic_boost(struct kobject *a,
return count;
}

+static ssize_t amd_pstate_show_status(char *buf)
+{
+ if (!default_pstate_driver)
+ return sysfs_emit(buf, "off\n");
+
+ return sysfs_emit(buf, "%s\n", default_pstate_driver == &amd_pstate_epp_driver ?
+ "active" : "passive");
+}
+
+static void amd_pstate_clear_update_util_hook(unsigned int cpu);
+static void amd_pstate_driver_cleanup(void)
+{
+ unsigned int cpu;
+
+ cpus_read_lock();
+ for_each_online_cpu(cpu) {
+ if (all_cpu_data[cpu]) {
+ if (default_pstate_driver == &amd_pstate_epp_driver)
+ amd_pstate_clear_update_util_hook(cpu);
+
+ spin_lock(&cppc_notify_lock);
+ kfree(all_cpu_data[cpu]);
+ WRITE_ONCE(all_cpu_data[cpu], NULL);
+ spin_unlock(&cppc_notify_lock);
+ }
+ }
+ cpus_read_unlock();
+
+ default_pstate_driver = NULL;
+}
+
+static int amd_pstate_update_status(const char *buf, size_t size)
+{
+ if (size == 3 && !strncmp(buf, "off", size)) {
+ if (!default_pstate_driver)
+ return -EINVAL;
+
+ if (cppc_active)
+ return -EBUSY;
+
+ cpufreq_unregister_driver(default_pstate_driver);
+ amd_pstate_driver_cleanup();
+ return 0;
+ }
+
+ if (size == 6 && !strncmp(buf, "active", size)) {
+ if (default_pstate_driver) {
+ if (default_pstate_driver == &amd_pstate_epp_driver)
+ return 0;
+ cpufreq_unregister_driver(default_pstate_driver);
+ }
+
+ default_pstate_driver = &amd_pstate_epp_driver;
+ return cpufreq_register_driver(default_pstate_driver);
+ }
+
+ if (size == 7 && !strncmp(buf, "passive", size)) {
+ if (default_pstate_driver) {
+ if (default_pstate_driver == &amd_pstate_driver)
+ return 0;
+ cpufreq_unregister_driver(default_pstate_driver);
+ }
+ default_pstate_driver = &amd_pstate_driver;
+ return cpufreq_register_driver(default_pstate_driver);
+ }
+
+ return -EINVAL;
+}
+
+static ssize_t show_status(struct kobject *kobj,
+ struct kobj_attribute *attr, char *buf)
+{
+ ssize_t ret;
+
+ mutex_lock(&amd_pstate_driver_lock);
+ ret = amd_pstate_show_status(buf);
+ mutex_unlock(&amd_pstate_driver_lock);
+
+ return ret;
+}
+
+static ssize_t store_status(struct kobject *a, struct kobj_attribute *b,
+ const char *buf, size_t count)
+{
+ char *p = memchr(buf, '\n', count);
+ int ret;
+
+ mutex_lock(&amd_pstate_driver_lock);
+ ret = amd_pstate_update_status(buf, p ? p - buf : count);
+ mutex_unlock(&amd_pstate_driver_lock);
+
+ return ret < 0 ? ret : count;
+}
+
cpufreq_freq_attr_ro(amd_pstate_max_freq);
cpufreq_freq_attr_ro(amd_pstate_lowest_nonlinear_freq);

@@ -809,6 +908,7 @@ cpufreq_freq_attr_ro(amd_pstate_highest_perf);
cpufreq_freq_attr_rw(energy_performance_preference);
cpufreq_freq_attr_ro(energy_performance_available_preferences);
define_one_global_rw(cppc_dynamic_boost);
+define_one_global_rw(status);

static struct freq_attr *amd_pstate_attr[] = {
&amd_pstate_max_freq,
@@ -828,6 +928,7 @@ static struct freq_attr *amd_pstate_epp_attr[] = {

static struct attribute *pstate_global_attributes[] = {
&cppc_dynamic_boost.attr,
+ &status.attr,
NULL
};

--
2.34.1

2022-12-08 11:54:56

by Yuan, Perry

[permalink] [raw]
Subject: [PATCH v7 03/13] cpufreq: intel_pstate: use common macro definition for Energy Preference Performance(EPP)

Make the energy performance preference strings and profiles use one
common header for the intel_pstate driver, so that the amd_pstate epp
driver can use the common header as well. This will simplify the
intel_pstate and amd_pstate drivers.
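
A minimal sketch of how a consumer could map a profile string to its
register value with the tables this header now shares (match_string() is
the existing kernel helper; the wrapper function is illustrative):

#include <linux/cpufreq_common.h>
#include <linux/string.h>

static int example_epp_from_string(const char *s)
{
	int i = match_string(energy_perf_strings, -1, s);

	/* returns the EPP register value, or -EINVAL for unknown input */
	return i < 0 ? i : epp_values[i];
}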

Signed-off-by: Perry Yuan <[email protected]>
---
arch/x86/include/asm/msr-index.h | 4 ---
drivers/cpufreq/intel_pstate.c | 37 +---------------------
include/linux/cpufreq_common.h | 53 ++++++++++++++++++++++++++++++++
3 files changed, 54 insertions(+), 40 deletions(-)
create mode 100644 include/linux/cpufreq_common.h

diff --git a/arch/x86/include/asm/msr-index.h b/arch/x86/include/asm/msr-index.h
index 4a2af82553e4..3983378cff5b 100644
--- a/arch/x86/include/asm/msr-index.h
+++ b/arch/x86/include/asm/msr-index.h
@@ -472,10 +472,6 @@
#define HWP_MAX_PERF(x) ((x & 0xff) << 8)
#define HWP_DESIRED_PERF(x) ((x & 0xff) << 16)
#define HWP_ENERGY_PERF_PREFERENCE(x) (((unsigned long long) x & 0xff) << 24)
-#define HWP_EPP_PERFORMANCE 0x00
-#define HWP_EPP_BALANCE_PERFORMANCE 0x80
-#define HWP_EPP_BALANCE_POWERSAVE 0xC0
-#define HWP_EPP_POWERSAVE 0xFF
#define HWP_ACTIVITY_WINDOW(x) ((unsigned long long)(x & 0xff3) << 32)
#define HWP_PACKAGE_CONTROL(x) ((unsigned long long)(x & 0x1) << 42)

diff --git a/drivers/cpufreq/intel_pstate.c b/drivers/cpufreq/intel_pstate.c
index ad9be31753b6..1b842ed874ab 100644
--- a/drivers/cpufreq/intel_pstate.c
+++ b/drivers/cpufreq/intel_pstate.c
@@ -26,6 +26,7 @@
#include <linux/vmalloc.h>
#include <linux/pm_qos.h>
#include <trace/events/power.h>
+#include <linux/cpufreq_common.h>

#include <asm/cpu.h>
#include <asm/div64.h>
@@ -628,42 +629,6 @@ static int intel_pstate_set_epb(int cpu, s16 pref)
return 0;
}

-/*
- * EPP/EPB display strings corresponding to EPP index in the
- * energy_perf_strings[]
- * index String
- *-------------------------------------
- * 0 default
- * 1 performance
- * 2 balance_performance
- * 3 balance_power
- * 4 power
- */
-
-enum energy_perf_value_index {
- EPP_INDEX_DEFAULT = 0,
- EPP_INDEX_PERFORMANCE,
- EPP_INDEX_BALANCE_PERFORMANCE,
- EPP_INDEX_BALANCE_POWERSAVE,
- EPP_INDEX_POWERSAVE,
-};
-
-static const char * const energy_perf_strings[] = {
- [EPP_INDEX_DEFAULT] = "default",
- [EPP_INDEX_PERFORMANCE] = "performance",
- [EPP_INDEX_BALANCE_PERFORMANCE] = "balance_performance",
- [EPP_INDEX_BALANCE_POWERSAVE] = "balance_power",
- [EPP_INDEX_POWERSAVE] = "power",
- NULL
-};
-static unsigned int epp_values[] = {
- [EPP_INDEX_DEFAULT] = 0, /* Unused index */
- [EPP_INDEX_PERFORMANCE] = HWP_EPP_PERFORMANCE,
- [EPP_INDEX_BALANCE_PERFORMANCE] = HWP_EPP_BALANCE_PERFORMANCE,
- [EPP_INDEX_BALANCE_POWERSAVE] = HWP_EPP_BALANCE_POWERSAVE,
- [EPP_INDEX_POWERSAVE] = HWP_EPP_POWERSAVE,
-};
-
static int intel_pstate_get_energy_pref_index(struct cpudata *cpu_data, int *raw_epp)
{
s16 epp;
diff --git a/include/linux/cpufreq_common.h b/include/linux/cpufreq_common.h
new file mode 100644
index 000000000000..c1224e3bc68b
--- /dev/null
+++ b/include/linux/cpufreq_common.h
@@ -0,0 +1,53 @@
+/* SPDX-License-Identifier: GPL-2.0-only */
+/*
+ * linux/include/linux/cpufreq_common.h
+ *
+ * Copyright (C) 2022 Advanced Micro Devices, Inc.
+ *
+ * Author: Perry Yuan <[email protected]>
+ */
+
+#ifndef _LINUX_CPUFREQ_COMMON_H
+#define _LINUX_CPUFREQ_COMMON_H
+/*
+ * EPP/EPB display strings corresponding to EPP index in the
+ * energy_perf_strings[]
+ * index String
+ *-------------------------------------
+ * 0 default
+ * 1 performance
+ * 2 balance_performance
+ * 3 balance_power
+ * 4 power
+ */
+
+#define HWP_EPP_PERFORMANCE 0x00
+#define HWP_EPP_BALANCE_PERFORMANCE 0x80
+#define HWP_EPP_BALANCE_POWERSAVE 0xC0
+#define HWP_EPP_POWERSAVE 0xFF
+
+enum energy_perf_value_index {
+ EPP_INDEX_DEFAULT = 0,
+ EPP_INDEX_PERFORMANCE,
+ EPP_INDEX_BALANCE_PERFORMANCE,
+ EPP_INDEX_BALANCE_POWERSAVE,
+ EPP_INDEX_POWERSAVE,
+};
+
+static const char * const energy_perf_strings[] = {
+ [EPP_INDEX_DEFAULT] = "default",
+ [EPP_INDEX_PERFORMANCE] = "performance",
+ [EPP_INDEX_BALANCE_PERFORMANCE] = "balance_performance",
+ [EPP_INDEX_BALANCE_POWERSAVE] = "balance_power",
+ [EPP_INDEX_POWERSAVE] = "power",
+ NULL
+};
+
+static unsigned int epp_values[] = {
+ [EPP_INDEX_DEFAULT] = 0, /* Unused index */
+ [EPP_INDEX_PERFORMANCE] = HWP_EPP_PERFORMANCE,
+ [EPP_INDEX_BALANCE_PERFORMANCE] = HWP_EPP_BALANCE_PERFORMANCE,
+ [EPP_INDEX_BALANCE_POWERSAVE] = HWP_EPP_BALANCE_POWERSAVE,
+ [EPP_INDEX_POWERSAVE] = HWP_EPP_POWERSAVE,
+};
+#endif /* _LINUX_CPUFREQ_COMMON_H */
--
2.34.1

2022-12-08 11:55:01

by Yuan, Perry

[permalink] [raw]
Subject: [PATCH v7 07/13] cpufreq: amd-pstate: implement suspend and resume callbacks

From: Perry Yuan <[email protected]>

Add suspend and resume support for the AMD processors in the
amd_pstate_epp driver instance.

When CPPC is suspended, the EPP driver will set the EPP profile to the
'power' profile and set the max/min perf to the lowest perf value.
When resume happens, it will restore the MSR registers with the
previously cached values.
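
In short, the sequence implemented below is: suspend calls
amd_pstate_enable(false) to disable CPPC in the low-level firmware and
marks the CPU as suspended; resume calls amd_pstate_epp_reenable()
(added in patch 06/13), which re-enables CPPC and rewrites the cached
MSR_AMD_CPPC_REQ (or shared-memory perf controls) value.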

Signed-off-by: Perry Yuan <[email protected]>
---
drivers/cpufreq/amd-pstate.c | 40 ++++++++++++++++++++++++++++++++++++
1 file changed, 40 insertions(+)

diff --git a/drivers/cpufreq/amd-pstate.c b/drivers/cpufreq/amd-pstate.c
index 412accab7bda..ea9255bdc9ac 100644
--- a/drivers/cpufreq/amd-pstate.c
+++ b/drivers/cpufreq/amd-pstate.c
@@ -1273,6 +1273,44 @@ static int amd_pstate_epp_cpu_offline(struct cpufreq_policy *policy)
return amd_pstate_cpu_offline(policy);
}

+static int amd_pstate_epp_suspend(struct cpufreq_policy *policy)
+{
+ struct amd_cpudata *cpudata = all_cpu_data[policy->cpu];
+ int ret;
+
+ /* avoid suspending when EPP is not enabled */
+ if (!cppc_active)
+ return 0;
+
+ /* set this flag to avoid setting the core offline */
+ cpudata->suspended = true;
+
+ /* disable CPPC in lowlevel firmware */
+ ret = amd_pstate_enable(false);
+ if (ret)
+ pr_err("failed to suspend, return %d\n", ret);
+
+ return 0;
+}
+
+static int amd_pstate_epp_resume(struct cpufreq_policy *policy)
+{
+ struct amd_cpudata *cpudata = all_cpu_data[policy->cpu];
+
+ if (cpudata->suspended) {
+ mutex_lock(&amd_pstate_limits_lock);
+
+ /* enable amd pstate from the suspend state */
+ amd_pstate_epp_reenable(cpudata);
+
+ mutex_unlock(&amd_pstate_limits_lock);
+
+ cpudata->suspended = false;
+ }
+
+ return 0;
+}
+
static void amd_pstate_verify_cpu_policy(struct amd_cpudata *cpudata,
struct cpufreq_policy_data *policy)
{
@@ -1309,6 +1347,8 @@ static struct cpufreq_driver amd_pstate_epp_driver = {
.update_limits = amd_pstate_epp_update_limits,
.offline = amd_pstate_epp_cpu_offline,
.online = amd_pstate_epp_cpu_online,
+ .suspend = amd_pstate_epp_suspend,
+ .resume = amd_pstate_epp_resume,
.name = "amd_pstate_epp",
.attr = amd_pstate_epp_attr,
};
--
2.34.1

2022-12-08 12:31:37

by Yuan, Perry

[permalink] [raw]
Subject: [PATCH v7 06/13] cpufreq: amd-pstate: implement amd pstate cpu online and offline callback

From: Perry Yuan <[email protected]>

Add online and offline driver callback support to allow CPU cores to go
offline and to help restore the previous working state when the core
goes back online later, for the EPP driver mode.

Signed-off-by: Perry Yuan <[email protected]>
---
drivers/cpufreq/amd-pstate.c | 89 ++++++++++++++++++++++++++++++++++++
include/linux/amd-pstate.h | 1 +
2 files changed, 90 insertions(+)

diff --git a/drivers/cpufreq/amd-pstate.c b/drivers/cpufreq/amd-pstate.c
index 0a521be1be8a..412accab7bda 100644
--- a/drivers/cpufreq/amd-pstate.c
+++ b/drivers/cpufreq/amd-pstate.c
@@ -1186,6 +1186,93 @@ static int amd_pstate_epp_set_policy(struct cpufreq_policy *policy)
return 0;
}

+static void amd_pstate_epp_reenable(struct amd_cpudata *cpudata)
+{
+ struct cppc_perf_ctrls perf_ctrls;
+ u64 value, max_perf;
+ int ret;
+
+ ret = amd_pstate_enable(true);
+ if (ret)
+ pr_err("failed to enable amd pstate during resume, return %d\n", ret);
+
+ value = READ_ONCE(cpudata->cppc_req_cached);
+ max_perf = READ_ONCE(cpudata->highest_perf);
+
+ if (boot_cpu_has(X86_FEATURE_CPPC)) {
+ wrmsrl_on_cpu(cpudata->cpu, MSR_AMD_CPPC_REQ, value);
+ } else {
+ perf_ctrls.max_perf = max_perf;
+ perf_ctrls.energy_perf = AMD_CPPC_ENERGY_PERF_PREF(cpudata->epp_cached);
+ cppc_set_perf(cpudata->cpu, &perf_ctrls);
+ }
+}
+
+static int amd_pstate_epp_cpu_online(struct cpufreq_policy *policy)
+{
+ struct amd_cpudata *cpudata = all_cpu_data[policy->cpu];
+
+ pr_debug("AMD CPU Core %d going online\n", cpudata->cpu);
+
+ if (cppc_active) {
+ amd_pstate_epp_reenable(cpudata);
+ cpudata->suspended = false;
+ }
+
+ return 0;
+}
+
+static void amd_pstate_epp_offline(struct cpufreq_policy *policy)
+{
+ struct amd_cpudata *cpudata = all_cpu_data[policy->cpu];
+ struct cppc_perf_ctrls perf_ctrls;
+ int min_perf;
+ u64 value;
+
+ min_perf = READ_ONCE(cpudata->lowest_perf);
+ value = READ_ONCE(cpudata->cppc_req_cached);
+
+ mutex_lock(&amd_pstate_limits_lock);
+ if (boot_cpu_has(X86_FEATURE_CPPC)) {
+ cpudata->epp_policy = CPUFREQ_POLICY_UNKNOWN;
+
+ /* Set max perf same as min perf */
+ value &= ~AMD_CPPC_MAX_PERF(~0L);
+ value |= AMD_CPPC_MAX_PERF(min_perf);
+ value &= ~AMD_CPPC_MIN_PERF(~0L);
+ value |= AMD_CPPC_MIN_PERF(min_perf);
+ wrmsrl_on_cpu(cpudata->cpu, MSR_AMD_CPPC_REQ, value);
+ } else {
+ perf_ctrls.desired_perf = 0;
+ perf_ctrls.max_perf = min_perf;
+ perf_ctrls.energy_perf = AMD_CPPC_ENERGY_PERF_PREF(HWP_EPP_POWERSAVE);
+ cppc_set_perf(cpudata->cpu, &perf_ctrls);
+ }
+ mutex_unlock(&amd_pstate_limits_lock);
+}
+
+static int amd_pstate_cpu_offline(struct cpufreq_policy *policy)
+{
+ struct amd_cpudata *cpudata = all_cpu_data[policy->cpu];
+
+ pr_debug("AMD CPU Core %d going offline\n", cpudata->cpu);
+
+ if (cpudata->suspended)
+ return 0;
+
+ if (cppc_active)
+ amd_pstate_epp_offline(policy);
+
+ return 0;
+}
+
+static int amd_pstate_epp_cpu_offline(struct cpufreq_policy *policy)
+{
+ amd_pstate_clear_update_util_hook(policy->cpu);
+
+ return amd_pstate_cpu_offline(policy);
+}
+
static void amd_pstate_verify_cpu_policy(struct amd_cpudata *cpudata,
struct cpufreq_policy_data *policy)
{
@@ -1220,6 +1307,8 @@ static struct cpufreq_driver amd_pstate_epp_driver = {
.init = amd_pstate_epp_cpu_init,
.exit = amd_pstate_epp_cpu_exit,
.update_limits = amd_pstate_epp_update_limits,
+ .offline = amd_pstate_epp_cpu_offline,
+ .online = amd_pstate_epp_cpu_online,
.name = "amd_pstate_epp",
.attr = amd_pstate_epp_attr,
};
diff --git a/include/linux/amd-pstate.h b/include/linux/amd-pstate.h
index 888af62040f1..3dd26a3d104c 100644
--- a/include/linux/amd-pstate.h
+++ b/include/linux/amd-pstate.h
@@ -99,6 +99,7 @@ struct amd_cpudata {
u64 cppc_cap1_cached;
struct update_util_data update_util;
struct amd_aperf_mperf sample;
+ bool suspended;
};

/**
--
2.34.1

2022-12-08 12:33:51

by Yuan, Perry

[permalink] [raw]
Subject: [PATCH v7 12/13] cpufreq: amd-pstate: convert sprintf with sysfs_emit()

Replace sprintf() with the more generic sysfs_emit() function.

No functional change intended.
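
(For background: sysfs_emit() wraps scnprintf() with the PAGE_SIZE bound
that sysfs show() callbacks are contractually given, so it cannot overrun
the buffer the way an unchecked sprintf() could.)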

Signed-off-by: Perry Yuan <[email protected]>
---
drivers/cpufreq/amd-pstate.c | 6 +++---
1 file changed, 3 insertions(+), 3 deletions(-)

diff --git a/drivers/cpufreq/amd-pstate.c b/drivers/cpufreq/amd-pstate.c
index c90aee3ee42d..f40a312ad56c 100644
--- a/drivers/cpufreq/amd-pstate.c
+++ b/drivers/cpufreq/amd-pstate.c
@@ -694,7 +694,7 @@ static ssize_t show_amd_pstate_max_freq(struct cpufreq_policy *policy,
if (max_freq < 0)
return max_freq;

- return sprintf(&buf[0], "%u\n", max_freq);
+ return sysfs_emit(buf, "%u\n", max_freq);
}

static ssize_t show_amd_pstate_lowest_nonlinear_freq(struct cpufreq_policy *policy,
@@ -707,7 +707,7 @@ static ssize_t show_amd_pstate_lowest_nonlinear_freq(struct cpufreq_policy *poli
if (freq < 0)
return freq;

- return sprintf(&buf[0], "%u\n", freq);
+ return sysfs_emit(buf, "%u\n", freq);
}

/*
@@ -722,7 +722,7 @@ static ssize_t show_amd_pstate_highest_perf(struct cpufreq_policy *policy,

perf = READ_ONCE(cpudata->highest_perf);

- return sprintf(&buf[0], "%u\n", perf);
+ return sysfs_emit(buf, "%u\n", perf);
}

static ssize_t show_energy_performance_available_preferences(
--
2.34.1

2022-12-08 12:35:08

by Yuan, Perry

[permalink] [raw]
Subject: [PATCH v7 08/13] cpufreq: amd-pstate: add frequency dynamic boost sysfs control

From: Perry Yuan <[email protected]>

Add one sysfs entry to control the CPU core frequency boost state.
The attribute file allows the user to enable the maximum performance
boost or keep the CPU at the normal performance level.
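
Given the kobject created below ("amd-pstate" under the CPU subsystem),
usage would look like:

$ sudo bash -c "echo 1 > /sys/devices/system/cpu/amd-pstate/cppc_dynamic_boost"
$ sudo bash -c "echo 0 > /sys/devices/system/cpu/amd-pstate/cppc_dynamic_boost"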

Signed-off-by: Perry Yuan <[email protected]>
---
drivers/cpufreq/amd-pstate.c | 67 ++++++++++++++++++++++++++++++++++--
1 file changed, 65 insertions(+), 2 deletions(-)

diff --git a/drivers/cpufreq/amd-pstate.c b/drivers/cpufreq/amd-pstate.c
index ea9255bdc9ac..4cd53c010215 100644
--- a/drivers/cpufreq/amd-pstate.c
+++ b/drivers/cpufreq/amd-pstate.c
@@ -69,6 +69,7 @@ static struct amd_pstate_params global_params;

static DEFINE_MUTEX(amd_pstate_limits_lock);
static DEFINE_MUTEX(amd_pstate_driver_lock);
+struct kobject *amd_pstate_kobj;

static bool cppc_boost __read_mostly;

@@ -768,12 +769,46 @@ static ssize_t show_energy_performance_preference(
return sysfs_emit(buf, "%s\n", energy_perf_strings[preference]);
}

+static void amd_pstate_update_policies(void)
+{
+ int cpu;
+
+ for_each_possible_cpu(cpu)
+ cpufreq_update_policy(cpu);
+}
+
+static ssize_t show_cppc_dynamic_boost(struct kobject *kobj,
+ struct kobj_attribute *attr, char *buf)
+{
+ return sysfs_emit(buf, "%u\n", cppc_boost);
+}
+
+static ssize_t store_cppc_dynamic_boost(struct kobject *a,
+ struct kobj_attribute *b,
+ const char *buf, size_t count)
+{
+ bool new_state;
+ int ret;
+
+ ret = kstrtobool(buf, &new_state);
+ if (ret)
+ return -EINVAL;
+
+ mutex_lock(&amd_pstate_driver_lock);
+ cppc_boost = !!new_state;
+ amd_pstate_update_policies();
+ mutex_unlock(&amd_pstate_driver_lock);
+
+ return count;
+}
+
cpufreq_freq_attr_ro(amd_pstate_max_freq);
cpufreq_freq_attr_ro(amd_pstate_lowest_nonlinear_freq);

cpufreq_freq_attr_ro(amd_pstate_highest_perf);
cpufreq_freq_attr_rw(energy_performance_preference);
cpufreq_freq_attr_ro(energy_performance_available_preferences);
+define_one_global_rw(cppc_dynamic_boost);

static struct freq_attr *amd_pstate_attr[] = {
&amd_pstate_max_freq,
@@ -791,6 +826,15 @@ static struct freq_attr *amd_pstate_epp_attr[] = {
NULL,
};

+static struct attribute *pstate_global_attributes[] = {
+ &cppc_dynamic_boost.attr,
+ NULL
+};
+
+static const struct attribute_group amd_pstate_global_attr_group = {
+ .attrs = pstate_global_attributes,
+};
+
static inline void update_boost_state(void)
{
u64 misc_en;
@@ -1404,9 +1448,28 @@ static int __init amd_pstate_init(void)

ret = cpufreq_register_driver(default_pstate_driver);
if (ret)
- pr_err("failed to register amd pstate driver with return %d\n",
- ret);
+ pr_err("failed to register driver with return %d\n", ret);
+
+ amd_pstate_kobj = kobject_create_and_add("amd-pstate", &cpu_subsys.dev_root->kobj);
+ if (!amd_pstate_kobj) {
+ ret = -EINVAL;
+ pr_err("global sysfs registration failed.\n");
+ goto kobject_free;
+ }
+
+ ret = sysfs_create_group(amd_pstate_kobj, &amd_pstate_global_attr_group);
+ if (ret) {
+ pr_err("sysfs attribute export failed with error %d.\n", ret);
+ goto global_attr_free;
+ }
+
+ return ret;

+global_attr_free:
+ kobject_put(amd_pstate_kobj);
+kobject_free:
+ cpufreq_unregister_driver(default_pstate_driver);
+ kfree(cpudata);
return ret;
}
device_initcall(amd_pstate_init);
--
2.34.1

2022-12-08 12:36:13

by Rafael J. Wysocki

[permalink] [raw]
Subject: Re: [PATCH v7 00/13] Implement AMD Pstate EPP Driver

Hi,

On Thu, Dec 8, 2022 at 12:19 PM Perry Yuan <[email protected]> wrote:
>
> Hi all,
>
> This patchset implements one new AMD CPU frequency driver
> `amd-pstate-epp` instance for better performance and power control.
> CPPC has a parameter called energy preference performance (EPP).
> The EPP is used in the CCLK DPM controller to drive the frequency that a core
> is going to operate during short periods of activity.
> EPP values will be utilized for different OS profiles (balanced, performance, power savings).

I honestly don't think that this work is ready for 6.2.

The number of patches in the series seems to change frequently and
there are active discussions around specific patches.

Accordingly, I will not consider applying it until 6.2-rc1 is out.

Thanks!

2022-12-08 12:40:51

by Yuan, Perry

[permalink] [raw]
Subject: [PATCH v7 05/13] cpufreq: amd-pstate: implement Pstate EPP support for the AMD processors

From: Perry Yuan <[email protected]>

Add EPP driver support for AMD SoCs which support a dedicated MSR for
CPPC. EPP is used by the DPM controller to configure the frequency that
a core operates at during short periods of activity.

The SoC EPP targets are configured on a scale from 0 to 255 where 0
represents maximum performance and 255 represents maximum efficiency.

The amd-pstate driver exports profile string names to userspace that are
tied to specific EPP values.

The balance_performance string (0x80) provides the best balance between
performance and energy efficiency on most systems, but users can choose
other strings to meet their specific needs as well.

$ cat /sys/devices/system/cpu/cpufreq/policy0/energy_performance_available_preferences
default performance balance_performance balance_power power

$ cat /sys/devices/system/cpu/cpufreq/policy0/energy_performance_preference
balance_performance

To enable the driver, add `amd_pstate=active` to the kernel command
line; the kernel will then load the active mode EPP driver.
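
For reference, these strings map to the EPP register values shared in
cpufreq_common.h (patch 03/13): performance = 0x00,
balance_performance = 0x80, balance_power = 0xC0, power = 0xFF;
``default`` keeps whatever value the platform firmware programmed.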

Signed-off-by: Perry Yuan <[email protected]>
---
drivers/cpufreq/amd-pstate.c | 631 ++++++++++++++++++++++++++++++++++-
include/linux/amd-pstate.h | 35 ++
2 files changed, 660 insertions(+), 6 deletions(-)

diff --git a/drivers/cpufreq/amd-pstate.c b/drivers/cpufreq/amd-pstate.c
index c17bd845f5fc..0a521be1be8a 100644
--- a/drivers/cpufreq/amd-pstate.c
+++ b/drivers/cpufreq/amd-pstate.c
@@ -37,6 +37,7 @@
#include <linux/uaccess.h>
#include <linux/static_call.h>
#include <linux/amd-pstate.h>
+#include <linux/cpufreq_common.h>

#include <acpi/processor.h>
#include <acpi/cppc_acpi.h>
@@ -59,9 +60,125 @@
* we disable it by default to go acpi-cpufreq on these processors and add a
* module parameter to be able to enable it manually for debugging.
*/
-static struct cpufreq_driver amd_pstate_driver;
+static bool cppc_active;
static int cppc_load __initdata;

+static struct cpufreq_driver *default_pstate_driver;
+static struct amd_cpudata **all_cpu_data;
+static struct amd_pstate_params global_params;
+
+static DEFINE_MUTEX(amd_pstate_limits_lock);
+static DEFINE_MUTEX(amd_pstate_driver_lock);
+
+static bool cppc_boost __read_mostly;
+
+static s16 amd_pstate_get_epp(struct amd_cpudata *cpudata, u64 cppc_req_cached)
+{
+ s16 epp;
+ struct cppc_perf_caps perf_caps;
+ int ret;
+
+ if (boot_cpu_has(X86_FEATURE_CPPC)) {
+ if (!cppc_req_cached) {
+ epp = rdmsrl_on_cpu(cpudata->cpu, MSR_AMD_CPPC_REQ,
+ &cppc_req_cached);
+ if (epp)
+ return epp;
+ }
+ epp = (cppc_req_cached >> 24) & 0xFF;
+ } else {
+ ret = cppc_get_epp_caps(cpudata->cpu, &perf_caps);
+ if (ret < 0) {
+ pr_debug("Could not retrieve energy perf value (%d)\n", ret);
+ return -EIO;
+ }
+ epp = (s16) perf_caps.energy_perf;
+ }
+
+ return epp;
+}
+
+static int amd_pstate_get_energy_pref_index(struct amd_cpudata *cpudata)
+{
+ s16 epp;
+ int index = -EINVAL;
+
+ epp = amd_pstate_get_epp(cpudata, 0);
+ if (epp < 0)
+ return epp;
+
+ switch (epp) {
+ case HWP_EPP_PERFORMANCE:
+ index = EPP_INDEX_PERFORMANCE;
+ break;
+ case HWP_EPP_BALANCE_PERFORMANCE:
+ index = EPP_INDEX_BALANCE_PERFORMANCE;
+ break;
+ case HWP_EPP_BALANCE_POWERSAVE:
+ index = EPP_INDEX_BALANCE_POWERSAVE;
+ break;
+ case HWP_EPP_POWERSAVE:
+ index = EPP_INDEX_POWERSAVE;
+ break;
+ default:
+ break;
+ }
+
+ return index;
+}
+
+static int amd_pstate_set_epp(struct amd_cpudata *cpudata, u32 epp)
+{
+ int ret;
+ struct cppc_perf_ctrls perf_ctrls;
+
+ if (boot_cpu_has(X86_FEATURE_CPPC)) {
+ u64 value = READ_ONCE(cpudata->cppc_req_cached);
+
+ value &= ~GENMASK_ULL(31, 24);
+ value |= (u64)epp << 24;
+ WRITE_ONCE(cpudata->cppc_req_cached, value);
+
+ ret = wrmsrl_on_cpu(cpudata->cpu, MSR_AMD_CPPC_REQ, value);
+ if (!ret)
+ cpudata->epp_cached = epp;
+ } else {
+ perf_ctrls.energy_perf = epp;
+ ret = cppc_set_epp_perf(cpudata->cpu, &perf_ctrls, 1);
+ if (ret) {
+ pr_debug("failed to set energy perf value (%d)\n", ret);
+ return ret;
+ }
+ cpudata->epp_cached = epp;
+ }
+
+ return ret;
+}
+
+static int amd_pstate_set_energy_pref_index(struct amd_cpudata *cpudata,
+ int pref_index)
+{
+ int epp = -EINVAL;
+ int ret;
+
+ if (!pref_index) {
+ pr_debug("EPP pref_index is invalid\n");
+ return -EINVAL;
+ }
+
+ if (epp == -EINVAL)
+ epp = epp_values[pref_index];
+
+ if (epp > 0 && cpudata->policy == CPUFREQ_POLICY_PERFORMANCE) {
+ pr_debug("EPP cannot be set under performance policy\n");
+ return -EBUSY;
+ }
+
+ ret = amd_pstate_set_epp(cpudata, epp);
+
+ return ret;
+}
+
static inline int pstate_enable(bool enable)
{
return wrmsrl_safe(MSR_AMD_CPPC_ENABLE, enable);
@@ -70,11 +187,21 @@ static inline int pstate_enable(bool enable)
static int cppc_enable(bool enable)
{
int cpu, ret = 0;
+ struct cppc_perf_ctrls perf_ctrls;

for_each_present_cpu(cpu) {
ret = cppc_set_enable(cpu, enable);
if (ret)
return ret;
+
+ /* Enable autonomous mode for EPP */
+ if (cppc_active) {
+ /* Set desired perf as zero to allow EPP firmware control */
+ perf_ctrls.desired_perf = 0;
+ ret = cppc_set_perf(cpu, &perf_ctrls);
+ if (ret)
+ return ret;
+ }
}

return ret;
@@ -418,7 +545,7 @@ static void amd_pstate_boost_init(struct amd_cpudata *cpudata)
return;

cpudata->boost_supported = true;
- amd_pstate_driver.boost_enabled = true;
+ default_pstate_driver->boost_enabled = true;
}

static void amd_perf_ctl_reset(unsigned int cpu)
@@ -592,10 +719,61 @@ static ssize_t show_amd_pstate_highest_perf(struct cpufreq_policy *policy,
return sprintf(&buf[0], "%u\n", perf);
}

+static ssize_t show_energy_performance_available_preferences(
+ struct cpufreq_policy *policy, char *buf)
+{
+ int i = 0;
+ int offset = 0;
+
+ while (energy_perf_strings[i] != NULL)
+ offset += sysfs_emit_at(buf, offset, "%s ", energy_perf_strings[i++]);
+
+ sysfs_emit_at(buf, offset, "\n");
+
+ return offset;
+}
+
+static ssize_t store_energy_performance_preference(
+ struct cpufreq_policy *policy, const char *buf, size_t count)
+{
+ struct amd_cpudata *cpudata = policy->driver_data;
+ char str_preference[21];
+ ssize_t ret;
+
+ ret = sscanf(buf, "%20s", str_preference);
+ if (ret != 1)
+ return -EINVAL;
+
+ ret = match_string(energy_perf_strings, -1, str_preference);
+ if (ret < 0)
+ return -EINVAL;
+
+ mutex_lock(&amd_pstate_limits_lock);
+ ret = amd_pstate_set_energy_pref_index(cpudata, ret);
+ mutex_unlock(&amd_pstate_limits_lock);
+
+ return ret ?: count;
+}
+
+static ssize_t show_energy_performance_preference(
+ struct cpufreq_policy *policy, char *buf)
+{
+ struct amd_cpudata *cpudata = policy->driver_data;
+ int preference;
+
+ preference = amd_pstate_get_energy_pref_index(cpudata);
+ if (preference < 0)
+ return preference;
+
+ return sysfs_emit(buf, "%s\n", energy_perf_strings[preference]);
+}
+
cpufreq_freq_attr_ro(amd_pstate_max_freq);
cpufreq_freq_attr_ro(amd_pstate_lowest_nonlinear_freq);

cpufreq_freq_attr_ro(amd_pstate_highest_perf);
+cpufreq_freq_attr_rw(energy_performance_preference);
+cpufreq_freq_attr_ro(energy_performance_available_preferences);

static struct freq_attr *amd_pstate_attr[] = {
&amd_pstate_max_freq,
@@ -604,6 +782,424 @@ static struct freq_attr *amd_pstate_attr[] = {
NULL,
};

+static struct freq_attr *amd_pstate_epp_attr[] = {
+ &amd_pstate_max_freq,
+ &amd_pstate_lowest_nonlinear_freq,
+ &amd_pstate_highest_perf,
+ &energy_performance_preference,
+ &energy_performance_available_preferences,
+ NULL,
+};
+
+static inline void update_boost_state(void)
+{
+ u64 misc_en;
+ struct amd_cpudata *cpudata;
+
+ cpudata = all_cpu_data[0];
+ rdmsrl(MSR_K7_HWCR, misc_en);
+ global_params.cppc_boost_disabled = misc_en & BIT_ULL(25);
+}
+
+static bool amd_pstate_acpi_pm_profile_server(void)
+{
+ if (acpi_gbl_FADT.preferred_profile == PM_ENTERPRISE_SERVER ||
+ acpi_gbl_FADT.preferred_profile == PM_PERFORMANCE_SERVER)
+ return true;
+
+ return false;
+}
+
+static int amd_pstate_init_cpu(unsigned int cpunum)
+{
+ struct amd_cpudata *cpudata;
+
+ cpudata = all_cpu_data[cpunum];
+ if (!cpudata) {
+ cpudata = kzalloc(sizeof(*cpudata), GFP_KERNEL);
+ if (!cpudata)
+ return -ENOMEM;
+ WRITE_ONCE(all_cpu_data[cpunum], cpudata);
+
+ cpudata->cpu = cpunum;
+
+ if (cppc_active) {
+ if (amd_pstate_acpi_pm_profile_server())
+ cppc_boost = true;
+ }
+
+ }
+ cpudata->epp_powersave = -EINVAL;
+ cpudata->epp_policy = 0;
+ pr_debug("controlling: cpu %d\n", cpunum);
+ return 0;
+}
+
+static int __amd_pstate_cpu_init(struct cpufreq_policy *policy)
+{
+ int min_freq, max_freq, nominal_freq, lowest_nonlinear_freq, ret;
+ struct amd_cpudata *cpudata;
+ struct device *dev;
+ int rc;
+ u64 value;
+
+ rc = amd_pstate_init_cpu(policy->cpu);
+ if (rc)
+ return rc;
+
+ cpudata = all_cpu_data[policy->cpu];
+
+ dev = get_cpu_device(policy->cpu);
+ if (!dev)
+ goto free_cpudata1;
+
+ rc = amd_pstate_init_perf(cpudata);
+ if (rc)
+ goto free_cpudata1;
+
+ min_freq = amd_get_min_freq(cpudata);
+ max_freq = amd_get_max_freq(cpudata);
+ nominal_freq = amd_get_nominal_freq(cpudata);
+ lowest_nonlinear_freq = amd_get_lowest_nonlinear_freq(cpudata);
+ if (min_freq < 0 || max_freq < 0 || min_freq > max_freq) {
+ dev_err(dev, "min_freq(%d) or max_freq(%d) value is incorrect\n",
+ min_freq, max_freq);
+ ret = -EINVAL;
+ goto free_cpudata1;
+ }
+
+ policy->min = min_freq;
+ policy->max = max_freq;
+
+ policy->cpuinfo.min_freq = min_freq;
+ policy->cpuinfo.max_freq = max_freq;
+ /* It will be updated by governor */
+ policy->cur = policy->cpuinfo.min_freq;
+
+ /* Initial processor data capability frequencies */
+ cpudata->max_freq = max_freq;
+ cpudata->min_freq = min_freq;
+ cpudata->nominal_freq = nominal_freq;
+ cpudata->lowest_nonlinear_freq = lowest_nonlinear_freq;
+
+ policy->driver_data = cpudata;
+
+ update_boost_state();
+ cpudata->epp_cached = amd_pstate_get_epp(cpudata, 0);
+
+ policy->min = policy->cpuinfo.min_freq;
+ policy->max = policy->cpuinfo.max_freq;
+
+ if (boot_cpu_has(X86_FEATURE_CPPC))
+ policy->fast_switch_possible = true;
+
+ if (boot_cpu_has(X86_FEATURE_CPPC)) {
+ ret = rdmsrl_on_cpu(cpudata->cpu, MSR_AMD_CPPC_REQ, &value);
+ if (ret)
+ return ret;
+ WRITE_ONCE(cpudata->cppc_req_cached, value);
+
+ ret = rdmsrl_on_cpu(cpudata->cpu, MSR_AMD_CPPC_CAP1, &value);
+ if (ret)
+ return ret;
+ WRITE_ONCE(cpudata->cppc_cap1_cached, value);
+ }
+ amd_pstate_boost_init(cpudata);
+
+ return 0;
+
+free_cpudata1:
+ kfree(cpudata);
+ return ret;
+}
+
+static int amd_pstate_epp_cpu_init(struct cpufreq_policy *policy)
+{
+ int ret;
+
+ ret = __amd_pstate_cpu_init(policy);
+ if (ret)
+ return ret;
+ /*
+ * Set the policy to powersave to provide a valid fallback value in case
+ * the default cpufreq governor is neither powersave nor performance.
+ */
+ policy->policy = CPUFREQ_POLICY_POWERSAVE;
+
+ return 0;
+}
+
+static int amd_pstate_epp_cpu_exit(struct cpufreq_policy *policy)
+{
+ pr_debug("CPU %d exiting\n", policy->cpu);
+ policy->fast_switch_possible = false;
+ return 0;
+}
+
+static void amd_pstate_update_max_freq(unsigned int cpu)
+{
+ struct cpufreq_policy *policy = cpufreq_cpu_get(cpu);
+
+ if (!policy)
+ return;
+
+ refresh_frequency_limits(policy);
+ cpufreq_cpu_put(policy);
+}
+
+static void amd_pstate_epp_update_limits(unsigned int cpu)
+{
+ mutex_lock(&amd_pstate_driver_lock);
+ update_boost_state();
+ if (global_params.cppc_boost_disabled) {
+ for_each_possible_cpu(cpu)
+ amd_pstate_update_max_freq(cpu);
+ } else {
+ cpufreq_update_policy(cpu);
+ }
+ mutex_unlock(&amd_pstate_driver_lock);
+}
+
+static int cppc_boost_hold_time_ns = 3 * NSEC_PER_MSEC;
+
+static inline void amd_pstate_boost_up(struct amd_cpudata *cpudata)
+{
+ u64 hwp_req = READ_ONCE(cpudata->cppc_req_cached);
+ u64 hwp_cap = READ_ONCE(cpudata->cppc_cap1_cached);
+ u32 max_limit = (hwp_req & 0xff);
+ u32 min_limit = (hwp_req & 0xff00) >> 8;
+ u32 boost_level1;
+
+ /* If max and min are equal or already at max, nothing to boost */
+ if (max_limit == min_limit)
+ return;
+
+ /* Set boost max and min to initial value */
+ if (!cpudata->cppc_boost_min)
+ cpudata->cppc_boost_min = min_limit;
+
+ boost_level1 = ((AMD_CPPC_NOMINAL_PERF(hwp_cap) + min_limit) >> 1);
+
+ if (cpudata->cppc_boost_min < boost_level1)
+ cpudata->cppc_boost_min = boost_level1;
+ else if (cpudata->cppc_boost_min < AMD_CPPC_NOMINAL_PERF(hwp_cap))
+ cpudata->cppc_boost_min = AMD_CPPC_NOMINAL_PERF(hwp_cap);
+ else if (cpudata->cppc_boost_min == AMD_CPPC_NOMINAL_PERF(hwp_cap))
+ cpudata->cppc_boost_min = max_limit;
+ else
+ return;
+
+ hwp_req &= ~AMD_CPPC_MIN_PERF(~0L);
+ hwp_req |= AMD_CPPC_MIN_PERF(cpudata->cppc_boost_min);
+ wrmsrl(MSR_AMD_CPPC_REQ, hwp_req);
+ cpudata->last_update = cpudata->sample.time;
+}
+
+static inline void amd_pstate_boost_down(struct amd_cpudata *cpudata)
+{
+ bool expired;
+
+ if (cpudata->cppc_boost_min) {
+ expired = time_after64(cpudata->sample.time, cpudata->last_update +
+ cppc_boost_hold_time_ns);
+
+ if (expired) {
+ wrmsrl(MSR_AMD_CPPC_REQ, cpudata->cppc_req_cached);
+ cpudata->cppc_boost_min = 0;
+ }
+ }
+
+ cpudata->last_update = cpudata->sample.time;
+}
+
+static inline void amd_pstate_boost_update_util(struct amd_cpudata *cpudata,
+ u64 time)
+{
+ cpudata->sample.time = time;
+ if (smp_processor_id() != cpudata->cpu)
+ return;
+
+ if (cpudata->sched_flags & SCHED_CPUFREQ_IOWAIT) {
+ bool do_io = false;
+
+ cpudata->sched_flags = 0;
+ /*
+ * Set iowait_boost flag and update time. Since the IO WAIT flag
+ * is set all the time, we can't conclude that some IO-bound
+ * activity is scheduled on this CPU from just one occurrence.
+ * Only if we receive the flag in two consecutive ticks do we
+ * treat the CPU as a boost candidate. This is leveraged from
+ * the Intel P-State driver.
+ */
+ if (time_before64(time, cpudata->last_io_update + 2 * TICK_NSEC))
+ do_io = true;
+
+ cpudata->last_io_update = time;
+
+ if (do_io)
+ amd_pstate_boost_up(cpudata);
+
+ } else {
+ amd_pstate_boost_down(cpudata);
+ }
+}
+
+static inline void amd_pstate_cppc_update_hook(struct update_util_data *data,
+ u64 time, unsigned int flags)
+{
+ struct amd_cpudata *cpudata = container_of(data,
+ struct amd_cpudata, update_util);
+
+ cpudata->sched_flags |= flags;
+
+ if (smp_processor_id() == cpudata->cpu)
+ amd_pstate_boost_update_util(cpudata, time);
+}
+
+static void amd_pstate_clear_update_util_hook(unsigned int cpu)
+{
+ struct amd_cpudata *cpudata = all_cpu_data[cpu];
+
+ if (!cpudata->update_util_set)
+ return;
+
+ cpufreq_remove_update_util_hook(cpu);
+ cpudata->update_util_set = false;
+ synchronize_rcu();
+}
+
+static void amd_pstate_set_update_util_hook(unsigned int cpu_num)
+{
+ struct amd_cpudata *cpudata = all_cpu_data[cpu_num];
+
+ if (!cppc_boost) {
+ if (cpudata->update_util_set)
+ amd_pstate_clear_update_util_hook(cpudata->cpu);
+ return;
+ }
+
+ if (cpudata->update_util_set)
+ return;
+
+ cpudata->sample.time = 0;
+ cpufreq_add_update_util_hook(cpu_num, &cpudata->update_util,
+ amd_pstate_cppc_update_hook);
+ cpudata->update_util_set = true;
+}
+
+static void amd_pstate_epp_init(unsigned int cpu)
+{
+ struct amd_cpudata *cpudata = all_cpu_data[cpu];
+ u32 max_perf, min_perf;
+ u64 value;
+ s16 epp;
+
+ max_perf = READ_ONCE(cpudata->highest_perf);
+ min_perf = READ_ONCE(cpudata->lowest_perf);
+
+ value = READ_ONCE(cpudata->cppc_req_cached);
+
+ if (cpudata->policy == CPUFREQ_POLICY_PERFORMANCE)
+ min_perf = max_perf;
+
+ /* Initial min/max values for CPPC Performance Controls Register */
+ value &= ~AMD_CPPC_MIN_PERF(~0L);
+ value |= AMD_CPPC_MIN_PERF(min_perf);
+
+ value &= ~AMD_CPPC_MAX_PERF(~0L);
+ value |= AMD_CPPC_MAX_PERF(max_perf);
+
+ /* CPPC EPP feature requires the desired perf to be set to zero */
+ value &= ~AMD_CPPC_DES_PERF(~0L);
+ value |= AMD_CPPC_DES_PERF(0);
+
+ if (cpudata->epp_policy == cpudata->policy)
+ goto skip_epp;
+
+ cpudata->epp_policy = cpudata->policy;
+
+ if (cpudata->policy == CPUFREQ_POLICY_PERFORMANCE) {
+ epp = amd_pstate_get_epp(cpudata, value);
+ cpudata->epp_powersave = epp;
+ if (epp < 0)
+ goto skip_epp;
+ /* force the epp value to be zero for performance policy */
+ epp = 0;
+ } else {
+ if (cpudata->epp_powersave < 0)
+ goto skip_epp;
+ /* Get BIOS pre-defined epp value */
+ epp = amd_pstate_get_epp(cpudata, value);
+ if (epp)
+ goto skip_epp;
+ epp = cpudata->epp_powersave;
+ }
+ /* Set initial EPP value */
+ if (boot_cpu_has(X86_FEATURE_CPPC)) {
+ value &= ~GENMASK_ULL(31, 24);
+ value |= (u64)epp << 24;
+ }
+
+skip_epp:
+ WRITE_ONCE(cpudata->cppc_req_cached, value);
+ amd_pstate_set_epp(cpudata, epp);
+}
+
+static void amd_pstate_set_max_limits(struct amd_cpudata *cpudata)
+{
+ u64 hwp_cap = READ_ONCE(cpudata->cppc_cap1_cached);
+ u64 hwp_req = READ_ONCE(cpudata->cppc_req_cached);
+ u32 max_limit = (hwp_cap >> 24) & 0xff;
+
+ hwp_req &= ~AMD_CPPC_MIN_PERF(~0L);
+ hwp_req |= AMD_CPPC_MIN_PERF(max_limit);
+ wrmsrl_on_cpu(cpudata->cpu, MSR_AMD_CPPC_REQ, hwp_req);
+}
+
+static int amd_pstate_epp_set_policy(struct cpufreq_policy *policy)
+{
+ struct amd_cpudata *cpudata;
+
+ if (!policy->cpuinfo.max_freq)
+ return -ENODEV;
+
+ pr_debug("set_policy: cpuinfo.max %u policy->max %u\n",
+ policy->cpuinfo.max_freq, policy->max);
+
+ cpudata = all_cpu_data[policy->cpu];
+ cpudata->policy = policy->policy;
+
+ if (boot_cpu_has(X86_FEATURE_CPPC)) {
+ mutex_lock(&amd_pstate_limits_lock);
+
+ if (cpudata->policy == CPUFREQ_POLICY_PERFORMANCE) {
+ amd_pstate_clear_update_util_hook(policy->cpu);
+ amd_pstate_set_max_limits(cpudata);
+ } else {
+ amd_pstate_set_update_util_hook(policy->cpu);
+ }
+
+ mutex_unlock(&amd_pstate_limits_lock);
+ }
+ amd_pstate_epp_init(policy->cpu);
+
+ return 0;
+}
+
+static void amd_pstate_verify_cpu_policy(struct amd_cpudata *cpudata,
+ struct cpufreq_policy_data *policy)
+{
+ update_boost_state();
+ cpufreq_verify_within_cpu_limits(policy);
+}
+
+static int amd_pstate_epp_verify_policy(struct cpufreq_policy_data *policy)
+{
+ amd_pstate_verify_cpu_policy(all_cpu_data[policy->cpu], policy);
+ pr_debug("policy_max =%d, policy_min=%d\n", policy->max, policy->min);
+ return 0;
+}
+
static struct cpufreq_driver amd_pstate_driver = {
.flags = CPUFREQ_CONST_LOOPS | CPUFREQ_NEED_UPDATE_LIMITS,
.verify = amd_pstate_verify,
@@ -617,8 +1213,20 @@ static struct cpufreq_driver amd_pstate_driver = {
.attr = amd_pstate_attr,
};

+static struct cpufreq_driver amd_pstate_epp_driver = {
+ .flags = CPUFREQ_CONST_LOOPS,
+ .verify = amd_pstate_epp_verify_policy,
+ .setpolicy = amd_pstate_epp_set_policy,
+ .init = amd_pstate_epp_cpu_init,
+ .exit = amd_pstate_epp_cpu_exit,
+ .update_limits = amd_pstate_epp_update_limits,
+ .name = "amd_pstate_epp",
+ .attr = amd_pstate_epp_attr,
+};
+
static int __init amd_pstate_init(void)
{
+ static struct amd_cpudata **cpudata;
int ret;

if (boot_cpu_data.x86_vendor != X86_VENDOR_AMD)
@@ -645,7 +1253,8 @@ static int __init amd_pstate_init(void)
/* capability check */
if (boot_cpu_has(X86_FEATURE_CPPC)) {
pr_debug("AMD CPPC MSR based functionality is supported\n");
- amd_pstate_driver.adjust_perf = amd_pstate_adjust_perf;
+ if (!cppc_active)
+ default_pstate_driver->adjust_perf = amd_pstate_adjust_perf;
} else {
pr_debug("AMD CPPC shared memory based functionality is supported\n");
static_call_update(amd_pstate_enable, cppc_enable);
@@ -653,6 +1262,10 @@ static int __init amd_pstate_init(void)
static_call_update(amd_pstate_update_perf, cppc_update_perf);
}

+ cpudata = vzalloc(array_size(sizeof(void *), num_possible_cpus()));
+ if (!cpudata)
+ return -ENOMEM;
+ WRITE_ONCE(all_cpu_data, cpudata);
/* enable amd pstate feature */
ret = amd_pstate_enable(true);
if (ret) {
@@ -660,9 +1273,9 @@ static int __init amd_pstate_init(void)
return ret;
}

- ret = cpufreq_register_driver(&amd_pstate_driver);
+ ret = cpufreq_register_driver(default_pstate_driver);
if (ret)
- pr_err("failed to register amd_pstate_driver with return %d\n",
+ pr_err("failed to register amd pstate driver with return %d\n",
ret);

return ret;
@@ -677,8 +1290,14 @@ static int __init amd_pstate_param(char *str)
if (!strcmp(str, "disable")) {
cppc_load = 0;
pr_info("driver is explicitly disabled\n");
- } else if (!strcmp(str, "passive"))
+ } else if (!strcmp(str, "passive")) {
cppc_load = 1;
+ default_pstate_driver = &amd_pstate_driver;
+ } else if (!strcmp(str, "active")) {
+ cppc_active = 1;
+ cppc_load = 1;
+ default_pstate_driver = &amd_pstate_epp_driver;
+ }

return 0;
}
diff --git a/include/linux/amd-pstate.h b/include/linux/amd-pstate.h
index 1c4b8659f171..888af62040f1 100644
--- a/include/linux/amd-pstate.h
+++ b/include/linux/amd-pstate.h
@@ -25,6 +25,7 @@ struct amd_aperf_mperf {
u64 aperf;
u64 mperf;
u64 tsc;
+ u64 time;
};

/**
@@ -47,6 +48,18 @@ struct amd_aperf_mperf {
* @prev: Last Aperf/Mperf/tsc count value read from register
* @freq: current cpu frequency value
* @boost_supported: check whether the Processor or SBIOS supports boost mode
+ * @epp_powersave: Last saved CPPC energy performance preference
+ *                 when the policy is switched to performance
+ * @epp_policy: Last saved policy used to set energy-performance preference
+ * @epp_cached: Cached CPPC energy-performance preference value
+ * @policy: Cpufreq policy value
+ * @sched_flags: Store scheduler flags for possible cross CPU update
+ * @update_util_set: CPUFreq utility callback is set
+ * @last_update: Time stamp of the last performance state update
+ * @cppc_boost_min: Last CPPC boosted min performance state
+ * @cppc_cap1_cached: Cached value of the last CPPC Capabilities MSR
+ * @update_util: Cpufreq utility callback information
+ * @sample: the stored performance sample
*
* The amd_cpudata is key private data for each CPU thread in AMD P-State, and
* represents all the attributes and goals that AMD P-State requests at runtime.
@@ -72,6 +85,28 @@ struct amd_cpudata {

u64 freq;
bool boost_supported;
+
+ /* EPP feature related attributes*/
+ s16 epp_powersave;
+ s16 epp_policy;
+ s16 epp_cached;
+ u32 policy;
+ u32 sched_flags;
+ bool update_util_set;
+ u64 last_update;
+ u64 last_io_update;
+ u32 cppc_boost_min;
+ u64 cppc_cap1_cached;
+ struct update_util_data update_util;
+ struct amd_aperf_mperf sample;
+};
+
+/**
+ * struct amd_pstate_params - global parameters for the performance control
+ * @cppc_boost_disabled: whether the core performance boost is disabled
+ */
+struct amd_pstate_params {
+ bool cppc_boost_disabled;
};

#endif /* _LINUX_AMD_PSTATE_H */
--
2.34.1

2022-12-09 08:13:18

by Huang Rui

[permalink] [raw]
Subject: Re: [PATCH v7 03/13] cpufreq: intel_pstate: use common macro definition for Energy Preference Performance(EPP)

On Thu, Dec 08, 2022 at 07:18:42PM +0800, Yuan, Perry wrote:
> make the energy performance preference strings and profiles use one
> common header for the intel_pstate driver, then the amd_pstate EPP driver
> can use the common header as well. This will simplify the intel_pstate
> and amd_pstate drivers.
>
> Signed-off-by: Perry Yuan <[email protected]>

Please address the comment in V6:

https://lore.kernel.org/linux-pm/[email protected]/T/#md503ee2fa32858e6cc9ab4da9ec1b89a6bae6058

Thanks,
Ray

> ---
> arch/x86/include/asm/msr-index.h | 4 ---
> drivers/cpufreq/intel_pstate.c | 37 +---------------------
> include/linux/cpufreq_common.h | 53 ++++++++++++++++++++++++++++++++
> 3 files changed, 54 insertions(+), 40 deletions(-)
> create mode 100644 include/linux/cpufreq_common.h
>
> diff --git a/arch/x86/include/asm/msr-index.h b/arch/x86/include/asm/msr-index.h
> index 4a2af82553e4..3983378cff5b 100644
> --- a/arch/x86/include/asm/msr-index.h
> +++ b/arch/x86/include/asm/msr-index.h
> @@ -472,10 +472,6 @@
> #define HWP_MAX_PERF(x) ((x & 0xff) << 8)
> #define HWP_DESIRED_PERF(x) ((x & 0xff) << 16)
> #define HWP_ENERGY_PERF_PREFERENCE(x) (((unsigned long long) x & 0xff) << 24)
> -#define HWP_EPP_PERFORMANCE 0x00
> -#define HWP_EPP_BALANCE_PERFORMANCE 0x80
> -#define HWP_EPP_BALANCE_POWERSAVE 0xC0
> -#define HWP_EPP_POWERSAVE 0xFF
> #define HWP_ACTIVITY_WINDOW(x) ((unsigned long long)(x & 0xff3) << 32)
> #define HWP_PACKAGE_CONTROL(x) ((unsigned long long)(x & 0x1) << 42)
>
> diff --git a/drivers/cpufreq/intel_pstate.c b/drivers/cpufreq/intel_pstate.c
> index ad9be31753b6..1b842ed874ab 100644
> --- a/drivers/cpufreq/intel_pstate.c
> +++ b/drivers/cpufreq/intel_pstate.c
> @@ -26,6 +26,7 @@
> #include <linux/vmalloc.h>
> #include <linux/pm_qos.h>
> #include <trace/events/power.h>
> +#include <linux/cpufreq_common.h>
>
> #include <asm/cpu.h>
> #include <asm/div64.h>
> @@ -628,42 +629,6 @@ static int intel_pstate_set_epb(int cpu, s16 pref)
> return 0;
> }
>
> -/*
> - * EPP/EPB display strings corresponding to EPP index in the
> - * energy_perf_strings[]
> - * index String
> - *-------------------------------------
> - * 0 default
> - * 1 performance
> - * 2 balance_performance
> - * 3 balance_power
> - * 4 power
> - */
> -
> -enum energy_perf_value_index {
> - EPP_INDEX_DEFAULT = 0,
> - EPP_INDEX_PERFORMANCE,
> - EPP_INDEX_BALANCE_PERFORMANCE,
> - EPP_INDEX_BALANCE_POWERSAVE,
> - EPP_INDEX_POWERSAVE,
> -};
> -
> -static const char * const energy_perf_strings[] = {
> - [EPP_INDEX_DEFAULT] = "default",
> - [EPP_INDEX_PERFORMANCE] = "performance",
> - [EPP_INDEX_BALANCE_PERFORMANCE] = "balance_performance",
> - [EPP_INDEX_BALANCE_POWERSAVE] = "balance_power",
> - [EPP_INDEX_POWERSAVE] = "power",
> - NULL
> -};
> -static unsigned int epp_values[] = {
> - [EPP_INDEX_DEFAULT] = 0, /* Unused index */
> - [EPP_INDEX_PERFORMANCE] = HWP_EPP_PERFORMANCE,
> - [EPP_INDEX_BALANCE_PERFORMANCE] = HWP_EPP_BALANCE_PERFORMANCE,
> - [EPP_INDEX_BALANCE_POWERSAVE] = HWP_EPP_BALANCE_POWERSAVE,
> - [EPP_INDEX_POWERSAVE] = HWP_EPP_POWERSAVE,
> -};
> -
> static int intel_pstate_get_energy_pref_index(struct cpudata *cpu_data, int *raw_epp)
> {
> s16 epp;
> diff --git a/include/linux/cpufreq_common.h b/include/linux/cpufreq_common.h
> new file mode 100644
> index 000000000000..c1224e3bc68b
> --- /dev/null
> +++ b/include/linux/cpufreq_common.h
> @@ -0,0 +1,53 @@
> +/* SPDX-License-Identifier: GPL-2.0-only */
> +/*
> + * linux/include/linux/cpufreq_common.h
> + *
> + * Copyright (C) 2022 Advanced Micro Devices, Inc.
> + *
> + * Author: Perry Yuan <[email protected]>
> + */
> +
> +#ifndef _LINUX_CPUFREQ_COMMON_H
> +#define _LINUX_CPUFREQ_COMMON_H
> +/*
> + * EPP/EPB display strings corresponding to EPP index in the
> + * energy_perf_strings[]
> + * index String
> + *-------------------------------------
> + * 0 default
> + * 1 performance
> + * 2 balance_performance
> + * 3 balance_power
> + * 4 power
> + */
> +
> +#define HWP_EPP_PERFORMANCE 0x00
> +#define HWP_EPP_BALANCE_PERFORMANCE 0x80
> +#define HWP_EPP_BALANCE_POWERSAVE 0xC0
> +#define HWP_EPP_POWERSAVE 0xFF
> +
> +enum energy_perf_value_index {
> + EPP_INDEX_DEFAULT = 0,
> + EPP_INDEX_PERFORMANCE,
> + EPP_INDEX_BALANCE_PERFORMANCE,
> + EPP_INDEX_BALANCE_POWERSAVE,
> + EPP_INDEX_POWERSAVE,
> +};
> +
> +static const char * const energy_perf_strings[] = {
> + [EPP_INDEX_DEFAULT] = "default",
> + [EPP_INDEX_PERFORMANCE] = "performance",
> + [EPP_INDEX_BALANCE_PERFORMANCE] = "balance_performance",
> + [EPP_INDEX_BALANCE_POWERSAVE] = "balance_power",
> + [EPP_INDEX_POWERSAVE] = "power",
> + NULL
> +};
> +
> +static unsigned int epp_values[] = {
> + [EPP_INDEX_DEFAULT] = 0, /* Unused index */
> + [EPP_INDEX_PERFORMANCE] = HWP_EPP_PERFORMANCE,
> + [EPP_INDEX_BALANCE_PERFORMANCE] = HWP_EPP_BALANCE_PERFORMANCE,
> + [EPP_INDEX_BALANCE_POWERSAVE] = HWP_EPP_BALANCE_POWERSAVE,
> + [EPP_INDEX_POWERSAVE] = HWP_EPP_POWERSAVE,
> +};
> +#endif /* _LINUX_CPUFREQ_COMMON_H */
> --
> 2.34.1
>

2022-12-09 08:43:57

by Huang Rui

[permalink] [raw]
Subject: Re: [PATCH v7 01/13] ACPI: CPPC: Add AMD pstate energy performance preference cppc control

On Thu, Dec 08, 2022 at 07:18:40PM +0800, Yuan, Perry wrote:
> From: Perry Yuan <[email protected]>
>
> Add support for setting and querying EPP preferences to the generic
> CPPC driver. This enables downstream drivers such as amd-pstate to discover
> and use these values
>
> Downstream drivers that want to use the new symbols cppc_get_epp_caps
> and cppc_set_epp_perf for querying and setting EPP preferences will need
> to call cppc_set_auto_epp to enable the EPP function first.
>
> Signed-off-by: Perry Yuan <[email protected]>

Acked-by: Huang Rui <[email protected]>

> ---
> drivers/acpi/cppc_acpi.c | 114 +++++++++++++++++++++++++++++++++++++--
> include/acpi/cppc_acpi.h | 12 +++++
> 2 files changed, 121 insertions(+), 5 deletions(-)
>
> diff --git a/drivers/acpi/cppc_acpi.c b/drivers/acpi/cppc_acpi.c
> index 093675b1a1ff..37fa75f25f62 100644
> --- a/drivers/acpi/cppc_acpi.c
> +++ b/drivers/acpi/cppc_acpi.c
> @@ -1093,6 +1093,9 @@ static int cppc_get_perf(int cpunum, enum cppc_regs reg_idx, u64 *perf)
> {
> struct cpc_desc *cpc_desc = per_cpu(cpc_desc_ptr, cpunum);
> struct cpc_register_resource *reg;
> + int pcc_ss_id = per_cpu(cpu_pcc_subspace_idx, cpunum);
> + struct cppc_pcc_data *pcc_ss_data = NULL;
> + int ret = -EINVAL;
>
> if (!cpc_desc) {
> pr_debug("No CPC descriptor for CPU:%d\n", cpunum);
> @@ -1102,10 +1105,6 @@ static int cppc_get_perf(int cpunum, enum cppc_regs reg_idx, u64 *perf)
> reg = &cpc_desc->cpc_regs[reg_idx];
>
> if (CPC_IN_PCC(reg)) {
> - int pcc_ss_id = per_cpu(cpu_pcc_subspace_idx, cpunum);
> - struct cppc_pcc_data *pcc_ss_data = NULL;
> - int ret = 0;
> -
> if (pcc_ss_id < 0)
> return -EIO;
>
> @@ -1125,7 +1124,7 @@ static int cppc_get_perf(int cpunum, enum cppc_regs reg_idx, u64 *perf)
>
> cpc_read(cpunum, reg, perf);
>
> - return 0;
> + return ret;
> }
>
> /**
> @@ -1365,6 +1364,111 @@ int cppc_get_perf_ctrs(int cpunum, struct cppc_perf_fb_ctrs *perf_fb_ctrs)
> }
> EXPORT_SYMBOL_GPL(cppc_get_perf_ctrs);
>
> +/**
> + * cppc_get_epp_caps - Get the energy preference register value.
> + * @cpunum: CPU from which to get epp preference level.
> + * @perf_caps: Return address.
> + *
> + * Return: 0 for success, -EIO otherwise.
> + */
> +int cppc_get_epp_caps(int cpunum, struct cppc_perf_caps *perf_caps)
> +{
> + struct cpc_desc *cpc_desc = per_cpu(cpc_desc_ptr, cpunum);
> + struct cpc_register_resource *energy_perf_reg;
> + u64 energy_perf;
> +
> + if (!cpc_desc) {
> + pr_debug("No CPC descriptor for CPU:%d\n", cpunum);
> + return -ENODEV;
> + }
> +
> + energy_perf_reg = &cpc_desc->cpc_regs[ENERGY_PERF];
> +
> + if (!CPC_SUPPORTED(energy_perf_reg))
> + pr_warn_once("energy perf reg update is unsupported!\n");
> +
> + if (CPC_IN_PCC(energy_perf_reg)) {
> + int pcc_ss_id = per_cpu(cpu_pcc_subspace_idx, cpunum);
> + struct cppc_pcc_data *pcc_ss_data = NULL;
> + int ret = 0;
> +
> + if (pcc_ss_id < 0)
> + return -ENODEV;
> +
> + pcc_ss_data = pcc_data[pcc_ss_id];
> +
> + down_write(&pcc_ss_data->pcc_lock);
> +
> + if (send_pcc_cmd(pcc_ss_id, CMD_READ) >= 0) {
> + cpc_read(cpunum, energy_perf_reg, &energy_perf);
> + perf_caps->energy_perf = energy_perf;
> + } else {
> + ret = -EIO;
> + }
> +
> + up_write(&pcc_ss_data->pcc_lock);
> +
> + return ret;
> + }
> +
> + return 0;
> +}
> +EXPORT_SYMBOL_GPL(cppc_get_epp_caps);
> +
> +/*
> + * Set Energy Performance Preference Register value through
> + * Performance Controls Interface
> + */
> +int cppc_set_epp_perf(int cpu, struct cppc_perf_ctrls *perf_ctrls, bool enable)
> +{
> + int pcc_ss_id = per_cpu(cpu_pcc_subspace_idx, cpu);
> + struct cpc_register_resource *epp_set_reg;
> + struct cpc_register_resource *auto_sel_reg;
> + struct cpc_desc *cpc_desc = per_cpu(cpc_desc_ptr, cpu);
> + struct cppc_pcc_data *pcc_ss_data = NULL;
> + int ret = -EINVAL;
> +
> + if (!cpc_desc) {
> + pr_debug("No CPC descriptor for CPU:%d\n", cpu);
> + return -ENODEV;
> + }
> +
> + auto_sel_reg = &cpc_desc->cpc_regs[AUTO_SEL_ENABLE];
> + epp_set_reg = &cpc_desc->cpc_regs[ENERGY_PERF];
> +
> + if (CPC_IN_PCC(epp_set_reg) || CPC_IN_PCC(auto_sel_reg)) {
> + if (pcc_ss_id < 0) {
> + pr_debug("Invalid pcc_ss_id\n");
> + return -ENODEV;
> + }
> +
> + if (CPC_SUPPORTED(auto_sel_reg)) {
> + ret = cpc_write(cpu, auto_sel_reg, enable);
> + if (ret)
> + return ret;
> + }
> +
> + if (CPC_SUPPORTED(epp_set_reg)) {
> + ret = cpc_write(cpu, epp_set_reg, perf_ctrls->energy_perf);
> + if (ret)
> + return ret;
> + }
> +
> + pcc_ss_data = pcc_data[pcc_ss_id];
> +
> + down_write(&pcc_ss_data->pcc_lock);
> + /* after writing CPC, transfer the ownership of PCC to platform */
> + ret = send_pcc_cmd(pcc_ss_id, CMD_WRITE);
> + up_write(&pcc_ss_data->pcc_lock);
> + } else {
> + ret = -ENOTSUPP;
> + pr_debug("_CPC in PCC is not supported\n");
> + }
> +
> + return ret;
> +}
> +EXPORT_SYMBOL_GPL(cppc_set_epp_perf);
> +
> /**
> * cppc_set_enable - Set to enable CPPC on the processor by writing the
> * Continuous Performance Control package EnableRegister field.
> diff --git a/include/acpi/cppc_acpi.h b/include/acpi/cppc_acpi.h
> index c5614444031f..a45bb876a19c 100644
> --- a/include/acpi/cppc_acpi.h
> +++ b/include/acpi/cppc_acpi.h
> @@ -108,12 +108,14 @@ struct cppc_perf_caps {
> u32 lowest_nonlinear_perf;
> u32 lowest_freq;
> u32 nominal_freq;
> + u32 energy_perf;
> };
>
> struct cppc_perf_ctrls {
> u32 max_perf;
> u32 min_perf;
> u32 desired_perf;
> + u32 energy_perf;
> };
>
> struct cppc_perf_fb_ctrs {
> @@ -149,6 +151,8 @@ extern bool cpc_ffh_supported(void);
> extern bool cpc_supported_by_cpu(void);
> extern int cpc_read_ffh(int cpunum, struct cpc_reg *reg, u64 *val);
> extern int cpc_write_ffh(int cpunum, struct cpc_reg *reg, u64 val);
> +extern int cppc_get_epp_caps(int cpunum, struct cppc_perf_caps *perf_caps);
> +extern int cppc_set_epp_perf(int cpu, struct cppc_perf_ctrls *perf_ctrls, bool enable);
> #else /* !CONFIG_ACPI_CPPC_LIB */
> static inline int cppc_get_desired_perf(int cpunum, u64 *desired_perf)
> {
> @@ -202,6 +206,14 @@ static inline int cpc_write_ffh(int cpunum, struct cpc_reg *reg, u64 val)
> {
> return -ENOTSUPP;
> }
> +static inline int cppc_set_epp_perf(int cpu, struct cppc_perf_ctrls *perf_ctrls, bool enable)
> +{
> + return -ENOTSUPP;
> +}
> +static inline int cppc_get_epp_caps(int cpunum, struct cppc_perf_caps *perf_caps)
> +{
> + return -ENOTSUPP;
> +}
> #endif /* !CONFIG_ACPI_CPPC_LIB */
>
> #endif /* _CPPC_ACPI_H*/
> --
> 2.34.1
>

2022-12-09 09:12:32

by Huang Rui

[permalink] [raw]
Subject: Re: [PATCH v7 04/13] cpufreq: amd-pstate: fix kernel hang issue while amd-pstate unregistering

On Thu, Dec 08, 2022 at 07:18:43PM +0800, Yuan, Perry wrote:
> In amd_pstate_adjust_perf(), there is one cpufreq_cpu_get() call that
> increments the kobject reference count of the policy and marks it as
> busy. Therefore, a corresponding call to cpufreq_cpu_put() is needed to
> decrement the kobject reference count again; this resolves the kernel
> hang seen when unregistering the amd-pstate driver and registering the
> `amd_pstate_epp` driver instance.
>
> Signed-off-by: Perry Yuan <[email protected]>

Acked-by: Huang Rui <[email protected]>

> ---
> drivers/cpufreq/amd-pstate.c | 1 +
> 1 file changed, 1 insertion(+)
>
> diff --git a/drivers/cpufreq/amd-pstate.c b/drivers/cpufreq/amd-pstate.c
> index 204e39006dda..c17bd845f5fc 100644
> --- a/drivers/cpufreq/amd-pstate.c
> +++ b/drivers/cpufreq/amd-pstate.c
> @@ -307,6 +307,7 @@ static void amd_pstate_adjust_perf(unsigned int cpu,
> max_perf = min_perf;
>
> amd_pstate_update(cpudata, min_perf, des_perf, max_perf, true);
> + cpufreq_cpu_put(policy);
> }
>
> static int amd_get_min_freq(struct amd_cpudata *cpudata)
> --
> 2.34.1
>

2022-12-09 09:36:26

by Yuan, Perry

[permalink] [raw]
Subject: RE: [PATCH v7 03/13] cpufreq: intel_pstate: use common macro definition for Energy Preference Performance(EPP)

[AMD Official Use Only - General]



> -----Original Message-----
> From: Huang, Ray <[email protected]>
> Sent: Friday, December 9, 2022 4:01 PM
> To: Yuan, Perry <[email protected]>
> Cc: [email protected]; Limonciello, Mario
> <[email protected]>; [email protected]; Sharma, Deepak
> <[email protected]>; Fontenot, Nathan
> <[email protected]>; Deucher, Alexander
> <[email protected]>; Huang, Shimmer
> <[email protected]>; Du, Xiaojian <[email protected]>; Meng,
> Li (Jassmine) <[email protected]>; Karny, Wyes <[email protected]>;
> [email protected]; [email protected]
> Subject: Re: [PATCH v7 03/13] cpufreq: intel_pstate: use common macro
> definition for Energy Preference Performance(EPP)
>
> On Thu, Dec 08, 2022 at 07:18:42PM +0800, Yuan, Perry wrote:
> > make the energy performance preference strings and profiles use one
> > common header for the intel_pstate driver, then the amd_pstate EPP
> > driver can use the common header as well. This will simplify the
> > intel_pstate and amd_pstate drivers.
> >
> > Signed-off-by: Perry Yuan <[email protected]>
>
> Please address the comment in V6:
>
> https://lore.kernel.org/linux-
> pm/[email protected]/T/#md503ee2fa32858e6cc9ab4da9ec1b8
> 9a6bae6058
>
> Thanks,
> Ray

Talked with Mario as well, will fix the build failure and get this changed in V8.
Thanks for reviewing.


>
> > ---
> > arch/x86/include/asm/msr-index.h | 4 ---
> > drivers/cpufreq/intel_pstate.c | 37 +---------------------
> > include/linux/cpufreq_common.h | 53
> ++++++++++++++++++++++++++++++++
> > 3 files changed, 54 insertions(+), 40 deletions(-) create mode
> > 100644 include/linux/cpufreq_common.h
> >
> > diff --git a/arch/x86/include/asm/msr-index.h
> > b/arch/x86/include/asm/msr-index.h
> > index 4a2af82553e4..3983378cff5b 100644
> > --- a/arch/x86/include/asm/msr-index.h
> > +++ b/arch/x86/include/asm/msr-index.h
> > @@ -472,10 +472,6 @@
> > #define HWP_MAX_PERF(x) ((x & 0xff) << 8)
> > #define HWP_DESIRED_PERF(x) ((x & 0xff) << 16)
> > #define HWP_ENERGY_PERF_PREFERENCE(x) (((unsigned long long)
> x & 0xff) << 24)
> > -#define HWP_EPP_PERFORMANCE 0x00
> > -#define HWP_EPP_BALANCE_PERFORMANCE 0x80
> > -#define HWP_EPP_BALANCE_POWERSAVE 0xC0
> > -#define HWP_EPP_POWERSAVE 0xFF
> > #define HWP_ACTIVITY_WINDOW(x) ((unsigned long
> long)(x & 0xff3) << 32)
> > #define HWP_PACKAGE_CONTROL(x) ((unsigned long
> long)(x & 0x1) << 42)
> >
> > diff --git a/drivers/cpufreq/intel_pstate.c
> > b/drivers/cpufreq/intel_pstate.c index ad9be31753b6..1b842ed874ab
> > 100644
> > --- a/drivers/cpufreq/intel_pstate.c
> > +++ b/drivers/cpufreq/intel_pstate.c
> > @@ -26,6 +26,7 @@
> > #include <linux/vmalloc.h>
> > #include <linux/pm_qos.h>
> > #include <trace/events/power.h>
> > +#include <linux/cpufreq_common.h>
> >
> > #include <asm/cpu.h>
> > #include <asm/div64.h>
> > @@ -628,42 +629,6 @@ static int intel_pstate_set_epb(int cpu, s16 pref)
> > return 0;
> > }
> >
> > -/*
> > - * EPP/EPB display strings corresponding to EPP index in the
> > - * energy_perf_strings[]
> > - * index String
> > - *-------------------------------------
> > - * 0 default
> > - * 1 performance
> > - * 2 balance_performance
> > - * 3 balance_power
> > - * 4 power
> > - */
> > -
> > -enum energy_perf_value_index {
> > - EPP_INDEX_DEFAULT = 0,
> > - EPP_INDEX_PERFORMANCE,
> > - EPP_INDEX_BALANCE_PERFORMANCE,
> > - EPP_INDEX_BALANCE_POWERSAVE,
> > - EPP_INDEX_POWERSAVE,
> > -};
> > -
> > -static const char * const energy_perf_strings[] = {
> > - [EPP_INDEX_DEFAULT] = "default",
> > - [EPP_INDEX_PERFORMANCE] = "performance",
> > - [EPP_INDEX_BALANCE_PERFORMANCE] = "balance_performance",
> > - [EPP_INDEX_BALANCE_POWERSAVE] = "balance_power",
> > - [EPP_INDEX_POWERSAVE] = "power",
> > - NULL
> > -};
> > -static unsigned int epp_values[] = {
> > - [EPP_INDEX_DEFAULT] = 0, /* Unused index */
> > - [EPP_INDEX_PERFORMANCE] = HWP_EPP_PERFORMANCE,
> > - [EPP_INDEX_BALANCE_PERFORMANCE] =
> HWP_EPP_BALANCE_PERFORMANCE,
> > - [EPP_INDEX_BALANCE_POWERSAVE] =
> HWP_EPP_BALANCE_POWERSAVE,
> > - [EPP_INDEX_POWERSAVE] = HWP_EPP_POWERSAVE,
> > -};
> > -
> > static int intel_pstate_get_energy_pref_index(struct cpudata
> > *cpu_data, int *raw_epp) {
> > s16 epp;
> > diff --git a/include/linux/cpufreq_common.h
> > b/include/linux/cpufreq_common.h new file mode 100644 index
> > 000000000000..c1224e3bc68b
> > --- /dev/null
> > +++ b/include/linux/cpufreq_common.h
> > @@ -0,0 +1,53 @@
> > +/* SPDX-License-Identifier: GPL-2.0-only */
> > +/*
> > + * linux/include/linux/cpufreq_common.h
> > + *
> > + * Copyright (C) 2022 Advanced Micro Devices, Inc.
> > + *
> > + * Author: Perry Yuan <[email protected]> */
> > +
> > +#ifndef _LINUX_CPUFREQ_COMMON_H
> > +#define _LINUX_CPUFREQ_COMMON_H
> > +/*
> > + * EPP/EPB display strings corresponding to EPP index in the
> > + * energy_perf_strings[]
> > + * index String
> > + *-------------------------------------
> > + * 0 default
> > + * 1 performance
> > + * 2 balance_performance
> > + * 3 balance_power
> > + * 4 power
> > + */
> > +
> > +#define HWP_EPP_PERFORMANCE 0x00
> > +#define HWP_EPP_BALANCE_PERFORMANCE 0x80
> > +#define HWP_EPP_BALANCE_POWERSAVE 0xC0
> > +#define HWP_EPP_POWERSAVE 0xFF
> > +
> > +enum energy_perf_value_index {
> > + EPP_INDEX_DEFAULT = 0,
> > + EPP_INDEX_PERFORMANCE,
> > + EPP_INDEX_BALANCE_PERFORMANCE,
> > + EPP_INDEX_BALANCE_POWERSAVE,
> > + EPP_INDEX_POWERSAVE,
> > +};
> > +
> > +static const char * const energy_perf_strings[] = {
> > + [EPP_INDEX_DEFAULT] = "default",
> > + [EPP_INDEX_PERFORMANCE] = "performance",
> > + [EPP_INDEX_BALANCE_PERFORMANCE] = "balance_performance",
> > + [EPP_INDEX_BALANCE_POWERSAVE] = "balance_power",
> > + [EPP_INDEX_POWERSAVE] = "power",
> > + NULL
> > +};
> > +
> > +static unsigned int epp_values[] = {
> > + [EPP_INDEX_DEFAULT] = 0, /* Unused index */
> > + [EPP_INDEX_PERFORMANCE] = HWP_EPP_PERFORMANCE,
> > + [EPP_INDEX_BALANCE_PERFORMANCE] =
> HWP_EPP_BALANCE_PERFORMANCE,
> > + [EPP_INDEX_BALANCE_POWERSAVE] =
> HWP_EPP_BALANCE_POWERSAVE,
> > + [EPP_INDEX_POWERSAVE] = HWP_EPP_POWERSAVE, }; #endif /*
> > +_LINUX_CPUFREQ_COMMON_H */
> > --
> > 2.34.1
> >

2022-12-12 01:59:59

by Huang Rui

[permalink] [raw]
Subject: Re: [PATCH v7 03/13] cpufreq: intel_pstate: use common macro definition for Energy Preference Performance(EPP)

On Fri, Dec 09, 2022 at 04:54:54PM +0800, Yuan, Perry wrote:
> [AMD Official Use Only - General]
>
>
>
> > -----Original Message-----
> > From: Huang, Ray <[email protected]>
> > Sent: Friday, December 9, 2022 4:01 PM
> > To: Yuan, Perry <[email protected]>
> > Cc: [email protected]; Limonciello, Mario
> > <[email protected]>; [email protected]; Sharma, Deepak
> > <[email protected]>; Fontenot, Nathan
> > <[email protected]>; Deucher, Alexander
> > <[email protected]>; Huang, Shimmer
> > <[email protected]>; Du, Xiaojian <[email protected]>; Meng,
> > Li (Jassmine) <[email protected]>; Karny, Wyes <[email protected]>;
> > [email protected]; [email protected]
> > Subject: Re: [PATCH v7 03/13] cpufreq: intel_pstate: use common macro
> > definition for Energy Preference Performance(EPP)
> >
> > On Thu, Dec 08, 2022 at 07:18:42PM +0800, Yuan, Perry wrote:
> > > make the energy performance preference strings and profiles use one
> > > common header for the intel_pstate driver, then the amd_pstate EPP
> > > driver can use the common header as well. This will simplify the
> > > intel_pstate and amd_pstate drivers.
> > >
> > > Signed-off-by: Perry Yuan <[email protected]>
> >
> > Please address the comment in V6:
> >
> > https://lore.kernel.org/linux-
> > pm/[email protected]/T/#md503ee2fa32858e6cc9ab4da9ec1b8
> > 9a6bae6058
> >
> > Thanks,
> > Ray
>
> Talked with Mario as well, will fix the build failure and get this changed in V8.
> Thanks for reviewing.
>

Please make sure you have addressed all the comments, then send the new
version of the series.

2022-12-12 03:26:56

by Yuan, Perry

[permalink] [raw]
Subject: RE: [PATCH v7 00/13] Implement AMD Pstate EPP Driver

[AMD Official Use Only - General]

Hi Rafael.

> -----Original Message-----
> From: Rafael J. Wysocki <[email protected]>
> Sent: Thursday, December 8, 2022 7:36 PM
> To: Yuan, Perry <[email protected]>
> Cc: [email protected]; Limonciello, Mario
> <[email protected]>; Huang, Ray <[email protected]>;
> [email protected]; Sharma, Deepak <[email protected]>;
> Fontenot, Nathan <[email protected]>; Deucher, Alexander
> <[email protected]>; Huang, Shimmer
> <[email protected]>; Du, Xiaojian <[email protected]>; Meng,
> Li (Jassmine) <[email protected]>; Karny, Wyes <[email protected]>;
> [email protected]; [email protected]
> Subject: Re: [PATCH v7 00/13] Implement AMD Pstate EPP Driver
>
> Hi,
>
> On Thu, Dec 8, 2022 at 12:19 PM Perry Yuan <[email protected]> wrote:
> >
> > Hi all,
> >
> > This patchset implements one new AMD CPU frequency driver
> > `amd-pstate-epp` instance for better performance and power control.
> > CPPC has a parameter called energy preference performance (EPP).
> > The EPP is used in the CCLK DPM controller to drive the frequency that
> > a core is going to operate during short periods of activity.
> > EPP values will be utilized for different OS profiles (balanced, performance,
> power savings).
>
> I honestly don't think that this work is ready for 6.2.
>
> The number of patches in the series seems to change frequently and there
> are active discussions around specific patches.
>
> Accordingly, I will not consider applying it until 6.2-rc1 is out.
>
> Thanks!

Thanks for your feedback on this. I added some issue fixes and some documentation patches to the series, which changed the patch count.
I will work through the feedback and collect the review and ack tags before you help to merge it.

Perry.

2022-12-12 03:58:04

by Huang Rui

[permalink] [raw]
Subject: Re: [PATCH v7 01/13] ACPI: CPPC: Add AMD pstate energy performance preference cppc control

On Fri, Dec 09, 2022 at 03:55:28PM +0800, Huang Rui wrote:
> On Thu, Dec 08, 2022 at 07:18:40PM +0800, Yuan, Perry wrote:
> > From: Perry Yuan <[email protected]>
> >
> > Add support for setting and querying EPP preferences to the generic
> > CPPC driver. This enables downstream drivers such as amd-pstate to discover
> > and use these values
> >
> > Downstream drivers that want to use the new symbols cppc_get_epp_caps
> > and cppc_set_epp_perf for querying and setting EPP preferences will need
> > to call cppc_set_auto_epp to enable the EPP function first.
> >
> > Signed-off-by: Perry Yuan <[email protected]>
>
> Acked-by: Huang Rui <[email protected]>
>
> > ---
> > drivers/acpi/cppc_acpi.c | 114 +++++++++++++++++++++++++++++++++++++--
> > include/acpi/cppc_acpi.h | 12 +++++
> > 2 files changed, 121 insertions(+), 5 deletions(-)
> >
> > diff --git a/drivers/acpi/cppc_acpi.c b/drivers/acpi/cppc_acpi.c
> > index 093675b1a1ff..37fa75f25f62 100644
> > --- a/drivers/acpi/cppc_acpi.c
> > +++ b/drivers/acpi/cppc_acpi.c
> > @@ -1093,6 +1093,9 @@ static int cppc_get_perf(int cpunum, enum cppc_regs reg_idx, u64 *perf)
> > {
> > struct cpc_desc *cpc_desc = per_cpu(cpc_desc_ptr, cpunum);
> > struct cpc_register_resource *reg;
> > + int pcc_ss_id = per_cpu(cpu_pcc_subspace_idx, cpunum);
> > + struct cppc_pcc_data *pcc_ss_data = NULL;
> > + int ret = -EINVAL;
> >
> > if (!cpc_desc) {
> > pr_debug("No CPC descriptor for CPU:%d\n", cpunum);
> > @@ -1102,10 +1105,6 @@ static int cppc_get_perf(int cpunum, enum cppc_regs reg_idx, u64 *perf)
> > reg = &cpc_desc->cpc_regs[reg_idx];
> >
> > if (CPC_IN_PCC(reg)) {
> > - int pcc_ss_id = per_cpu(cpu_pcc_subspace_idx, cpunum);
> > - struct cppc_pcc_data *pcc_ss_data = NULL;
> > - int ret = 0;
> > -
> > if (pcc_ss_id < 0)
> > return -EIO;
> >
> > @@ -1125,7 +1124,7 @@ static int cppc_get_perf(int cpunum, enum cppc_regs reg_idx, u64 *perf)
> >
> > cpc_read(cpunum, reg, perf);
> >
> > - return 0;
> > + return ret;
> > }
> >
> > /**
> > @@ -1365,6 +1364,111 @@ int cppc_get_perf_ctrs(int cpunum, struct cppc_perf_fb_ctrs *perf_fb_ctrs)
> > }
> > EXPORT_SYMBOL_GPL(cppc_get_perf_ctrs);
> >
> > +/**
> > + * cppc_get_epp_caps - Get the energy preference register value.
> > + * @cpunum: CPU from which to get epp preference level.
> > + * @perf_caps: Return address.
> > + *
> > + * Return: 0 for success, -EIO otherwise.
> > + */
> > +int cppc_get_epp_caps(int cpunum, struct cppc_perf_caps *perf_caps)

Taking a look at the patch again: since energy_perf is actually one of the
members of struct cppc_perf_caps, it's better to modify the existing
cppc_get_perf_caps() to get the EPP value as well.
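
If cppc_get_perf_caps() were extended to fill in energy_perf as well, the
driver side would reduce to something like this (untested sketch, only to
illustrate the idea):

	struct cppc_perf_caps perf_caps;
	int ret;

	ret = cppc_get_perf_caps(cpudata->cpu, &perf_caps);
	if (ret < 0)
		return -EIO;

	epp = (s16)perf_caps.energy_perf;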

Thanks,
Ray

> > +{
> > + struct cpc_desc *cpc_desc = per_cpu(cpc_desc_ptr, cpunum);
> > + struct cpc_register_resource *energy_perf_reg;
> > + u64 energy_perf;
> > +
> > + if (!cpc_desc) {
> > + pr_debug("No CPC descriptor for CPU:%d\n", cpunum);
> > + return -ENODEV;
> > + }
> > +
> > + energy_perf_reg = &cpc_desc->cpc_regs[ENERGY_PERF];
> > +
> > + if (!CPC_SUPPORTED(energy_perf_reg))
> > + pr_warn_once("energy perf reg update is unsupported!\n");
> > +
> > + if (CPC_IN_PCC(energy_perf_reg)) {
> > + int pcc_ss_id = per_cpu(cpu_pcc_subspace_idx, cpunum);
> > + struct cppc_pcc_data *pcc_ss_data = NULL;
> > + int ret = 0;
> > +
> > + if (pcc_ss_id < 0)
> > + return -ENODEV;
> > +
> > + pcc_ss_data = pcc_data[pcc_ss_id];
> > +
> > + down_write(&pcc_ss_data->pcc_lock);
> > +
> > + if (send_pcc_cmd(pcc_ss_id, CMD_READ) >= 0) {
> > + cpc_read(cpunum, energy_perf_reg, &energy_perf);
> > + perf_caps->energy_perf = energy_perf;
> > + } else {
> > + ret = -EIO;
> > + }
> > +
> > + up_write(&pcc_ss_data->pcc_lock);
> > +
> > + return ret;
> > + }
> > +
> > + return 0;
> > +}
> > +EXPORT_SYMBOL_GPL(cppc_get_epp_caps);
> > +
> > +/*
> > + * Set Energy Performance Preference Register value through
> > + * Performance Controls Interface
> > + */
> > +int cppc_set_epp_perf(int cpu, struct cppc_perf_ctrls *perf_ctrls, bool enable)
> > +{
> > + int pcc_ss_id = per_cpu(cpu_pcc_subspace_idx, cpu);
> > + struct cpc_register_resource *epp_set_reg;
> > + struct cpc_register_resource *auto_sel_reg;
> > + struct cpc_desc *cpc_desc = per_cpu(cpc_desc_ptr, cpu);
> > + struct cppc_pcc_data *pcc_ss_data = NULL;
> > + int ret = -EINVAL;
> > +
> > + if (!cpc_desc) {
> > + pr_debug("No CPC descriptor for CPU:%d\n", cpu);
> > + return -ENODEV;
> > + }
> > +
> > + auto_sel_reg = &cpc_desc->cpc_regs[AUTO_SEL_ENABLE];
> > + epp_set_reg = &cpc_desc->cpc_regs[ENERGY_PERF];
> > +
> > + if (CPC_IN_PCC(epp_set_reg) || CPC_IN_PCC(auto_sel_reg)) {
> > + if (pcc_ss_id < 0) {
> > + pr_debug("Invalid pcc_ss_id\n");
> > + return -ENODEV;
> > + }
> > +
> > + if (CPC_SUPPORTED(auto_sel_reg)) {
> > + ret = cpc_write(cpu, auto_sel_reg, enable);
> > + if (ret)
> > + return ret;
> > + }
> > +
> > + if (CPC_SUPPORTED(epp_set_reg)) {
> > + ret = cpc_write(cpu, epp_set_reg, perf_ctrls->energy_perf);
> > + if (ret)
> > + return ret;
> > + }
> > +
> > + pcc_ss_data = pcc_data[pcc_ss_id];
> > +
> > + down_write(&pcc_ss_data->pcc_lock);
> > + /* after writing CPC, transfer the ownership of PCC to platform */
> > + ret = send_pcc_cmd(pcc_ss_id, CMD_WRITE);
> > + up_write(&pcc_ss_data->pcc_lock);
> > + } else {
> > + ret = -ENOTSUPP;
> > + pr_debug("_CPC in PCC is not supported\n");
> > + }
> > +
> > + return ret;
> > +}
> > +EXPORT_SYMBOL_GPL(cppc_set_epp_perf);
> > +
> > /**
> > * cppc_set_enable - Set to enable CPPC on the processor by writing the
> > * Continuous Performance Control package EnableRegister field.
> > diff --git a/include/acpi/cppc_acpi.h b/include/acpi/cppc_acpi.h
> > index c5614444031f..a45bb876a19c 100644
> > --- a/include/acpi/cppc_acpi.h
> > +++ b/include/acpi/cppc_acpi.h
> > @@ -108,12 +108,14 @@ struct cppc_perf_caps {
> > u32 lowest_nonlinear_perf;
> > u32 lowest_freq;
> > u32 nominal_freq;
> > + u32 energy_perf;
> > };
> >
> > struct cppc_perf_ctrls {
> > u32 max_perf;
> > u32 min_perf;
> > u32 desired_perf;
> > + u32 energy_perf;
> > };
> >
> > struct cppc_perf_fb_ctrs {
> > @@ -149,6 +151,8 @@ extern bool cpc_ffh_supported(void);
> > extern bool cpc_supported_by_cpu(void);
> > extern int cpc_read_ffh(int cpunum, struct cpc_reg *reg, u64 *val);
> > extern int cpc_write_ffh(int cpunum, struct cpc_reg *reg, u64 val);
> > +extern int cppc_get_epp_caps(int cpunum, struct cppc_perf_caps *perf_caps);
> > +extern int cppc_set_epp_perf(int cpu, struct cppc_perf_ctrls *perf_ctrls, bool enable);
> > #else /* !CONFIG_ACPI_CPPC_LIB */
> > static inline int cppc_get_desired_perf(int cpunum, u64 *desired_perf)
> > {
> > @@ -202,6 +206,14 @@ static inline int cpc_write_ffh(int cpunum, struct cpc_reg *reg, u64 val)
> > {
> > return -ENOTSUPP;
> > }
> > +static inline int cppc_set_epp_perf(int cpu, struct cppc_perf_ctrls *perf_ctrls, bool enable)
> > +{
> > + return -ENOTSUPP;
> > +}
> > +static inline int cppc_get_epp_caps(int cpunum, struct cppc_perf_caps *perf_caps)
> > +{
> > + return -ENOTSUPP;
> > +}
> > #endif /* !CONFIG_ACPI_CPPC_LIB */
> >
> > #endif /* _CPPC_ACPI_H*/
> > --
> > 2.34.1
> >

2022-12-12 09:06:57

by Huang Rui

[permalink] [raw]
Subject: Re: [PATCH v7 05/13] cpufreq: amd-pstate: implement Pstate EPP support for the AMD processors

On Thu, Dec 08, 2022 at 07:18:44PM +0800, Yuan, Perry wrote:
> From: Perry Yuan <[email protected]>
>
> Add EPP driver support for AMD SoCs which support a dedicated MSR for
> CPPC. EPP is used by the DPM controller to configure the frequency that
> a core operates at during short periods of activity.
>
> The SoC EPP targets are configured on a scale from 0 to 255 where 0
> represents maximum performance and 255 represents maximum efficiency.
>
> The amd-pstate driver exports profile string names to userspace that are
> tied to specific EPP values.
>
> The balance_performance string (0x80) provides the best balance for
> efficiency versus power on most systems, but users can choose other
> strings to meet their needs as well.
>
> $ cat /sys/devices/system/cpu/cpufreq/policy0/energy_performance_available_preferences
> default performance balance_performance balance_power power
>
> $ cat /sys/devices/system/cpu/cpufreq/policy0/energy_performance_preference
> balance_performance
>
> To enable the driver, `amd_pstate=active` needs to be added to the kernel
> command line; the kernel will then load the active mode EPP driver.
>
> Signed-off-by: Perry Yuan <[email protected]>
> ---
> drivers/cpufreq/amd-pstate.c | 631 ++++++++++++++++++++++++++++++++++-
> include/linux/amd-pstate.h | 35 ++
> 2 files changed, 660 insertions(+), 6 deletions(-)
>
> diff --git a/drivers/cpufreq/amd-pstate.c b/drivers/cpufreq/amd-pstate.c
> index c17bd845f5fc..0a521be1be8a 100644
> --- a/drivers/cpufreq/amd-pstate.c
> +++ b/drivers/cpufreq/amd-pstate.c
> @@ -37,6 +37,7 @@
> #include <linux/uaccess.h>
> #include <linux/static_call.h>
> #include <linux/amd-pstate.h>
> +#include <linux/cpufreq_common.h>
>
> #include <acpi/processor.h>
> #include <acpi/cppc_acpi.h>
> @@ -59,9 +60,125 @@
> * we disable it by default to go acpi-cpufreq on these processors and add a
> * module parameter to be able to enable it manually for debugging.
> */
> -static struct cpufreq_driver amd_pstate_driver;
> +static bool cppc_active;
> static int cppc_load __initdata;
>
> +static struct cpufreq_driver *default_pstate_driver;
> +static struct amd_cpudata **all_cpu_data;
> +static struct amd_pstate_params global_params;
> +
> +static DEFINE_MUTEX(amd_pstate_limits_lock);
> +static DEFINE_MUTEX(amd_pstate_driver_lock);
> +
> +static bool cppc_boost __read_mostly;
> +
> +static s16 amd_pstate_get_epp(struct amd_cpudata *cpudata, u64 cppc_req_cached)
> +{
> + s16 epp;
> + struct cppc_perf_caps perf_caps;
> + int ret;
> +
> + if (boot_cpu_has(X86_FEATURE_CPPC)) {
> + if (!cppc_req_cached) {
> + epp = rdmsrl_on_cpu(cpudata->cpu, MSR_AMD_CPPC_REQ,
> + &cppc_req_cached);
> + if (epp)
> + return epp;
> + }
> + epp = (cppc_req_cached >> 24) & 0xFF;
> + } else {
> + ret = cppc_get_epp_caps(cpudata->cpu, &perf_caps);
> + if (ret < 0) {
> + pr_debug("Could not retrieve energy perf value (%d)\n", ret);
> + return -EIO;
> + }
> + epp = (s16) perf_caps.energy_perf;

This should use the static_call structure to implement the function. Please
refer to amd_pstate_init_perf. I think it can even reuse init_perf to get
the EPP cap value.
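
Roughly like this (untested sketch of the static_call split; the
msr_get_epp()/shmem_get_epp() names are invented here, not from the patch):

	static s16 msr_get_epp(struct amd_cpudata *cpudata)
	{
		u64 value;
		int ret;

		ret = rdmsrl_on_cpu(cpudata->cpu, MSR_AMD_CPPC_REQ, &value);
		if (ret)
			return ret;

		/* EPP lives in bits 31:24 of the CPPC request MSR */
		return (value >> 24) & 0xff;
	}

	static s16 shmem_get_epp(struct amd_cpudata *cpudata)
	{
		struct cppc_perf_caps perf_caps;
		int ret;

		ret = cppc_get_epp_caps(cpudata->cpu, &perf_caps);
		if (ret < 0)
			return -EIO;

		return (s16)perf_caps.energy_perf;
	}

	DEFINE_STATIC_CALL(amd_pstate_get_epp, msr_get_epp);

	/* in amd_pstate_init(), for shared memory processors: */
	static_call_update(amd_pstate_get_epp, shmem_get_epp);

	/* callers then do: epp = static_call(amd_pstate_get_epp)(cpudata); */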

> + }
> +
> + return epp;
> +}
> +
> +static int amd_pstate_get_energy_pref_index(struct amd_cpudata *cpudata)
> +{
> + s16 epp;
> + int index = -EINVAL;
> +
> + epp = amd_pstate_get_epp(cpudata, 0);
> + if (epp < 0)
> + return epp;
> +
> + switch (epp) {
> + case HWP_EPP_PERFORMANCE:
> + index = EPP_INDEX_PERFORMANCE;
> + break;
> + case HWP_EPP_BALANCE_PERFORMANCE:
> + index = EPP_INDEX_BALANCE_PERFORMANCE;
> + break;
> + case HWP_EPP_BALANCE_POWERSAVE:
> + index = EPP_INDEX_BALANCE_POWERSAVE;
> + break;
> + case HWP_EPP_POWERSAVE:
> + index = EPP_INDEX_POWERSAVE;
> + break;
> + default:
> + break;
> + }
> +
> + return index;
> +}
> +
> +static int amd_pstate_set_epp(struct amd_cpudata *cpudata, u32 epp)
> +{
> + int ret;
> + struct cppc_perf_ctrls perf_ctrls;
> +
> + if (boot_cpu_has(X86_FEATURE_CPPC)) {
> + u64 value = READ_ONCE(cpudata->cppc_req_cached);
> +
> + value &= ~GENMASK_ULL(31, 24);
> + value |= (u64)epp << 24;
> + WRITE_ONCE(cpudata->cppc_req_cached, value);
> +
> + ret = wrmsrl_on_cpu(cpudata->cpu, MSR_AMD_CPPC_REQ, value);
> + if (!ret)
> + cpudata->epp_cached = epp;
> + } else {
> + perf_ctrls.energy_perf = epp;
> + ret = cppc_set_epp_perf(cpudata->cpu, &perf_ctrls, 1);

Since energy_perf is one of the members of struct cppc_perf_ctrls, could we
use cppc_set_perf as well?

> + if (ret) {
> + pr_debug("failed to set energy perf value (%d)\n", ret);
> + return ret;
> + }
> + cpudata->epp_cached = epp;
> + }
> +
> + return ret;
> +}

The same as above: the helpers for the different CPPC processor types (MSR
or shared memory) should be implemented via static_call.

> +
> +static int amd_pstate_set_energy_pref_index(struct amd_cpudata *cpudata,
> + int pref_index)
> +{
> + int epp = -EINVAL;
> + int ret;
> +
> + if (!pref_index) {
> + pr_debug("EPP pref_index is invalid\n");
> + return -EINVAL;
> + }
> +
> + if (epp == -EINVAL)
> + epp = epp_values[pref_index];
> +
> + if (epp > 0 && cpudata->policy == CPUFREQ_POLICY_PERFORMANCE) {
> + pr_debug("EPP cannot be set under performance policy\n");
> + return -EBUSY;
> + }
> +
> + ret = amd_pstate_set_epp(cpudata, epp);
> +
> + return ret;
> +}
> +
> static inline int pstate_enable(bool enable)
> {
> return wrmsrl_safe(MSR_AMD_CPPC_ENABLE, enable);
> @@ -70,11 +187,21 @@ static inline int pstate_enable(bool enable)
> static int cppc_enable(bool enable)
> {
> int cpu, ret = 0;
> + struct cppc_perf_ctrls perf_ctrls;
>
> for_each_present_cpu(cpu) {
> ret = cppc_set_enable(cpu, enable);
> if (ret)
> return ret;
> +
> + /* Enable autonomous mode for EPP */
> + if (!cppc_active) {
> + /* Set desired perf as zero to allow EPP firmware control */
> + perf_ctrls.desired_perf = 0;
> + ret = cppc_set_perf(cpu, &perf_ctrls);
> + if (ret)
> + return ret;
> + }
> }
>
> return ret;
> @@ -418,7 +545,7 @@ static void amd_pstate_boost_init(struct amd_cpudata *cpudata)
> return;
>
> cpudata->boost_supported = true;
> - amd_pstate_driver.boost_enabled = true;
> + default_pstate_driver->boost_enabled = true;
> }
>
> static void amd_perf_ctl_reset(unsigned int cpu)
> @@ -592,10 +719,61 @@ static ssize_t show_amd_pstate_highest_perf(struct cpufreq_policy *policy,
> return sprintf(&buf[0], "%u\n", perf);
> }
>
> +static ssize_t show_energy_performance_available_preferences(
> + struct cpufreq_policy *policy, char *buf)
> +{
> + int i = 0;
> + int offset = 0;
> +
> + while (energy_perf_strings[i] != NULL)
> + offset += sysfs_emit_at(buf, offset, "%s ", energy_perf_strings[i++]);
> +
> + sysfs_emit_at(buf, offset, "\n");
> +
> + return offset;
> +}
> +
> +static ssize_t store_energy_performance_preference(
> + struct cpufreq_policy *policy, const char *buf, size_t count)
> +{
> + struct amd_cpudata *cpudata = policy->driver_data;
> + char str_preference[21];
> + ssize_t ret;
> +
> + ret = sscanf(buf, "%20s", str_preference);
> + if (ret != 1)
> + return -EINVAL;
> +
> + ret = match_string(energy_perf_strings, -1, str_preference);
> + if (ret < 0)
> + return -EINVAL;
> +
> + mutex_lock(&amd_pstate_limits_lock);
> + ret = amd_pstate_set_energy_pref_index(cpudata, ret);
> + mutex_unlock(&amd_pstate_limits_lock);
> +
> + return ret ?: count;
> +}
> +
> +static ssize_t show_energy_performance_preference(
> + struct cpufreq_policy *policy, char *buf)
> +{
> + struct amd_cpudata *cpudata = policy->driver_data;
> + int preference;
> +
> + preference = amd_pstate_get_energy_pref_index(cpudata);
> + if (preference < 0)
> + return preference;
> +
> + return sysfs_emit(buf, "%s\n", energy_perf_strings[preference]);
> +}
> +
> cpufreq_freq_attr_ro(amd_pstate_max_freq);
> cpufreq_freq_attr_ro(amd_pstate_lowest_nonlinear_freq);
>
> cpufreq_freq_attr_ro(amd_pstate_highest_perf);
> +cpufreq_freq_attr_rw(energy_performance_preference);
> +cpufreq_freq_attr_ro(energy_performance_available_preferences);
>
> static struct freq_attr *amd_pstate_attr[] = {
> &amd_pstate_max_freq,
> @@ -604,6 +782,424 @@ static struct freq_attr *amd_pstate_attr[] = {
> NULL,
> };
>
> +static struct freq_attr *amd_pstate_epp_attr[] = {
> + &amd_pstate_max_freq,
> + &amd_pstate_lowest_nonlinear_freq,
> + &amd_pstate_highest_perf,
> + &energy_performance_preference,
> + &energy_performance_available_preferences,
> + NULL,
> +};
> +
> +static inline void update_boost_state(void)
> +{
> + u64 misc_en;
> + struct amd_cpudata *cpudata;
> +
> + cpudata = all_cpu_data[0];
> + rdmsrl(MSR_K7_HWCR, misc_en);
> + global_params.cppc_boost_disabled = misc_en & BIT_ULL(25);

We don't need to introduce the additional cppc_boost_disabled here. The
cpufreq_driver->boost_enabled and cpudata->boost_supported flags can manage
this function.

I believe it is a firmware issue that the legacy ACPI boost state impacts
the CPPC frequency. Could you move this handling into the
cpufreq_driver->set_boost callback function, so that the boost state is
enabled by default?
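
Something along these lines, as an untested sketch only (the exact shape of
the callback is up to you; bit 25 of MSR_K7_HWCR is the boost-disable bit
this patch already reads in update_boost_state()):

	static int amd_pstate_set_boost(struct cpufreq_policy *policy, int state)
	{
		struct amd_cpudata *cpudata = policy->driver_data;
		u64 misc_en;
		int ret;

		if (!cpudata->boost_supported)
			return -EOPNOTSUPP;

		ret = rdmsrl_on_cpu(cpudata->cpu, MSR_K7_HWCR, &misc_en);
		if (ret)
			return ret;

		/* clear the boost-disable bit to enable boost */
		if (state)
			misc_en &= ~BIT_ULL(25);
		else
			misc_en |= BIT_ULL(25);

		return wrmsrl_on_cpu(cpudata->cpu, MSR_K7_HWCR, misc_en);
	}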

> +}
> +
> +static bool amd_pstate_acpi_pm_profile_server(void)
> +{
> + if (acpi_gbl_FADT.preferred_profile == PM_ENTERPRISE_SERVER ||
> + acpi_gbl_FADT.preferred_profile == PM_PERFORMANCE_SERVER)
> + return true;
> +
> + return false;
> +}
> +
> +static int amd_pstate_init_cpu(unsigned int cpunum)
> +{
> + struct amd_cpudata *cpudata;
> +
> + cpudata = all_cpu_data[cpunum];
> + if (!cpudata) {
> + cpudata = kzalloc(sizeof(*cpudata), GFP_KERNEL);
> + if (!cpudata)
> + return -ENOMEM;
> + WRITE_ONCE(all_cpu_data[cpunum], cpudata);
> +
> + cpudata->cpu = cpunum;
> +
> + if (cppc_active) {

The name cppc_active is a bit confusing here; if we run the amd-pstate
driver at all, CPPC is active. I know you want to indicate which driver
mode you are running in. Please use an enumeration type to mark the
different modes, such as PASSIVE_MODE, ACTIVE_MODE, and GUIDED_MODE (as
Wyes proposed).
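
For example (sketch only; names follow the proposal above and are not from
the patch):

	enum amd_pstate_mode {
		AMD_PSTATE_PASSIVE,
		AMD_PSTATE_ACTIVE,
		AMD_PSTATE_GUIDED,
	};

	static enum amd_pstate_mode cppc_state;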

> + if (amd_pstate_acpi_pm_profile_server())
> + cppc_boost = true;
> + }
> +
> + }
> + cpudata->epp_powersave = -EINVAL;
> + cpudata->epp_policy = 0;
> + pr_debug("controlling: cpu %d\n", cpunum);
> + return 0;
> +}
> +
> +static int __amd_pstate_cpu_init(struct cpufreq_policy *policy)
> +{
> + int min_freq, max_freq, nominal_freq, lowest_nonlinear_freq, ret;
> + struct amd_cpudata *cpudata;
> + struct device *dev;
> + int rc;
> + u64 value;
> +
> + rc = amd_pstate_init_cpu(policy->cpu);
> + if (rc)
> + return rc;
> +
> + cpudata = all_cpu_data[policy->cpu];
> +
> + dev = get_cpu_device(policy->cpu);
> + if (!dev)
> + goto free_cpudata1;
> +
> + rc = amd_pstate_init_perf(cpudata);
> + if (rc)
> + goto free_cpudata1;
> +
> + min_freq = amd_get_min_freq(cpudata);
> + max_freq = amd_get_max_freq(cpudata);
> + nominal_freq = amd_get_nominal_freq(cpudata);
> + lowest_nonlinear_freq = amd_get_lowest_nonlinear_freq(cpudata);
> + if (min_freq < 0 || max_freq < 0 || min_freq > max_freq) {
> + dev_err(dev, "min_freq(%d) or max_freq(%d) value is incorrect\n",
> + min_freq, max_freq);
> + ret = -EINVAL;
> + goto free_cpudata1;
> + }
> +
> + policy->min = min_freq;
> + policy->max = max_freq;
> +
> + policy->cpuinfo.min_freq = min_freq;
> + policy->cpuinfo.max_freq = max_freq;
> + /* It will be updated by governor */
> + policy->cur = policy->cpuinfo.min_freq;
> +
> + /* Initial processor data capability frequencies */
> + cpudata->max_freq = max_freq;
> + cpudata->min_freq = min_freq;
> + cpudata->nominal_freq = nominal_freq;
> + cpudata->lowest_nonlinear_freq = lowest_nonlinear_freq;
> +
> + policy->driver_data = cpudata;
> +
> + update_boost_state();
> + cpudata->epp_cached = amd_pstate_get_epp(cpudata, value);
> +
> + policy->min = policy->cpuinfo.min_freq;
> + policy->max = policy->cpuinfo.max_freq;
> +
> + if (boot_cpu_has(X86_FEATURE_CPPC))
> + policy->fast_switch_possible = true;

Please move this line into the if-case below.

> +
> + if (boot_cpu_has(X86_FEATURE_CPPC)) {
> + ret = rdmsrl_on_cpu(cpudata->cpu, MSR_AMD_CPPC_REQ, &value);
> + if (ret)
> + return ret;
> + WRITE_ONCE(cpudata->cppc_req_cached, value);
> +
> + ret = rdmsrl_on_cpu(cpudata->cpu, MSR_AMD_CPPC_CAP1, &value);
> + if (ret)
> + return ret;
> + WRITE_ONCE(cpudata->cppc_cap1_cached, value);
> + }
> + amd_pstate_boost_init(cpudata);
> +
> + return 0;
> +
> +free_cpudata1:
> + kfree(cpudata);
> + return ret;
> +}
> +
> +static int amd_pstate_epp_cpu_init(struct cpufreq_policy *policy)
> +{
> + int ret;
> +
> + ret = __amd_pstate_cpu_init(policy);

I don't see any reason why we need to define a __amd_pstate_cpu_init()
here. The Intel P-State driver's __intel_pstate_cpu_init() is shared by
both intel_pstate_cpu_init() and intel_cpufreq_cpu_init().

> + if (ret)
> + return ret;
> + /*
> + * Set the policy to powersave to provide a valid fallback value in case
> + * the default cpufreq governor is neither powersave nor performance.
> + */
> + policy->policy = CPUFREQ_POLICY_POWERSAVE;
> +
> + return 0;
> +}
> +
> +static int amd_pstate_epp_cpu_exit(struct cpufreq_policy *policy)
> +{
> + pr_debug("CPU %d exiting\n", policy->cpu);
> + policy->fast_switch_possible = false;
> + return 0;
> +}
> +
> +static void amd_pstate_update_max_freq(unsigned int cpu)

Why do you name this function "update max frequency"?

Unlike the Intel P-State driver, we don't have distinct
cpudata->pstate.max_freq and cpudata->pstate.turbo_freq values.

I think in fact we don't update anything here.

> +{
> + struct cpufreq_policy *policy = policy = cpufreq_cpu_get(cpu);

struct cpufreq_policy *policy = cpufreq_cpu_get(cpu);

> +
> + if (!policy)
> + return;
> +
> + refresh_frequency_limits(policy);
> + cpufreq_cpu_put(policy);
> +}
> +
> +static void amd_pstate_epp_update_limits(unsigned int cpu)
> +{
> + mutex_lock(&amd_pstate_driver_lock);
> + update_boost_state();
> + if (global_params.cppc_boost_disabled) {
> + for_each_possible_cpu(cpu)
> + amd_pstate_update_max_freq(cpu);

This should be a no-op in amd-pstate.

> + } else {
> + cpufreq_update_policy(cpu);
> + }
> + mutex_unlock(&amd_pstate_driver_lock);
> +}
> +
> +static int cppc_boost_hold_time_ns = 3 * NSEC_PER_MSEC;
> +
> +static inline void amd_pstate_boost_up(struct amd_cpudata *cpudata)
> +{
> + u64 hwp_req = READ_ONCE(cpudata->cppc_req_cached);
> + u64 hwp_cap = READ_ONCE(cpudata->cppc_cap1_cached);
> + u32 max_limit = (hwp_req & 0xff);
> + u32 min_limit = (hwp_req & 0xff00) >> 8;

We can use cpudata->max_perf and cpudata->min_perf directly.

> + u32 boost_level1;
> +
> + /* If max and min are equal or already at max, nothing to boost */

I believe this check only covers the max_perf == min_perf case, not the
"already at max" one.

> + if (max_limit == min_limit)
> + return;
> +
> + /* Set boost max and min to initial value */
> + if (!cpudata->cppc_boost_min)
> + cpudata->cppc_boost_min = min_limit;
> +
> + boost_level1 = ((AMD_CPPC_NOMINAL_PERF(hwp_cap) + min_limit) >> 1);
> +
> + if (cpudata->cppc_boost_min < boost_level1)
> + cpudata->cppc_boost_min = boost_level1;
> + else if (cpudata->cppc_boost_min < AMD_CPPC_NOMINAL_PERF(hwp_cap))
> + cpudata->cppc_boost_min = AMD_CPPC_NOMINAL_PERF(hwp_cap);
> + else if (cpudata->cppc_boost_min == AMD_CPPC_NOMINAL_PERF(hwp_cap))
> + cpudata->cppc_boost_min = max_limit;
> + else
> + return;

Could you please elaborate on why you split the min_perf (cppc_boost_min)
that you write into the cppc_req register across these different cases?
Why do we pick these levels, such as (min + nominal)/2 and around nominal?
How does this help optimize the final result? Since the autonomous mode is
handled by the SMU firmware, we need some data showing how this influences
the final result.
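
For reference, this is how the staircase above walks with made-up perf
values (a standalone illustration only, not driver code; lowest=60,
nominal=120, highest=166 are arbitrary):

	#include <stdio.h>

	int main(void)
	{
		unsigned int min_limit = 60, nominal = 120, max_limit = 166;
		unsigned int boost_min = 0;

		for (int tick = 1; tick <= 4; tick++) {
			unsigned int level1 = (nominal + min_limit) >> 1;

			if (!boost_min)
				boost_min = min_limit;
			if (boost_min < level1)
				boost_min = level1;
			else if (boost_min < nominal)
				boost_min = nominal;
			else if (boost_min == nominal)
				boost_min = max_limit;
			else
				break;
			printf("boosted tick %d: min_perf -> %u\n", tick, boost_min);
		}
		return 0;
	}

This prints 90, 120 and then 166, i.e. the requested floor climbs halfway
to nominal, then to nominal, then to the maximum on consecutive IO-boosted
ticks.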

> +
> + hwp_req &= ~AMD_CPPC_MIN_PERF(~0L);
> + hwp_req |= AMD_CPPC_MIN_PERF(cpudata->cppc_boost_min);
> + wrmsrl(MSR_AMD_CPPC_REQ, hwp_req);

Do we need an update for shared memory processors? In other words, EPP is
also supported on shared memory processors. Again, we should use a
static_call to handle the MSR and cppc_acpi paths.

> + cpudata->last_update = cpudata->sample.time;
> +}
> +
> +static inline void amd_pstate_boost_down(struct amd_cpudata *cpudata)
> +{
> + bool expired;
> +
> + if (cpudata->cppc_boost_min) {
> + expired = time_after64(cpudata->sample.time, cpudata->last_update +
> + cppc_boost_hold_time_ns);
> +
> + if (expired) {
> + wrmsrl(MSR_AMD_CPPC_REQ, cpudata->cppc_req_cached);
> + cpudata->cppc_boost_min = 0;
> + }
> + }
> +
> + cpudata->last_update = cpudata->sample.time;
> +}
> +
> +static inline void amd_pstate_boost_update_util(struct amd_cpudata *cpudata,
> + u64 time)
> +{
> + cpudata->sample.time = time;
> + if (smp_processor_id() != cpudata->cpu)
> + return;
> +
> + if (cpudata->sched_flags & SCHED_CPUFREQ_IOWAIT) {
> + bool do_io = false;
> +
> + cpudata->sched_flags = 0;
> + /*
> + * Set iowait_boost flag and update time. Since the IO WAIT flag
> + * is set all the time, we can't conclude that some IO-bound
> + * activity is scheduled on this CPU from just one occurrence.
> + * Only if we receive the flag in two consecutive ticks do we
> + * treat the CPU as a boost candidate. This is leveraged from
> + * the Intel P-State driver.

I would like to know whether we can hit this case as well. If we can find
or create a use case that hits it on our platforms, I am fine with adding
it to our driver too. If not, I don't suggest we add it at this moment. I
hope we have verified each code path once we add it into the driver.
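
One way to try to provoke it (untested suggestion; the fio parameters
below are only an example): a synchronous direct-IO random read job keeps
the task waking from iowait, which is what sets SCHED_CPUFREQ_IOWAIT on
the ticks this code looks at.

	$ fio --name=iowait-boost --rw=randread --bs=4k --size=1G \
	      --ioengine=psync --direct=1 --time_based --runtime=30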

> + */
> + if (time_before64(time, cpudata->last_io_update + 2 * TICK_NSEC))
> + do_io = true;
> +
> + cpudata->last_io_update = time;
> +
> + if (do_io)
> + amd_pstate_boost_up(cpudata);
> +
> + } else {
> + amd_pstate_boost_down(cpudata);
> + }
> +}
> +
> +static inline void amd_pstate_cppc_update_hook(struct update_util_data *data,
> + u64 time, unsigned int flags)
> +{
> + struct amd_cpudata *cpudata = container_of(data,
> + struct amd_cpudata, update_util);
> +
> + cpudata->sched_flags |= flags;
> +
> + if (smp_processor_id() == cpudata->cpu)
> + amd_pstate_boost_update_util(cpudata, time);
> +}
> +
> +static void amd_pstate_clear_update_util_hook(unsigned int cpu)
> +{
> + struct amd_cpudata *cpudata = all_cpu_data[cpu];
> +
> + if (!cpudata->update_util_set)
> + return;
> +
> + cpufreq_remove_update_util_hook(cpu);
> + cpudata->update_util_set = false;
> + synchronize_rcu();
> +}
> +
> +static void amd_pstate_set_update_util_hook(unsigned int cpu_num)
> +{
> + struct amd_cpudata *cpudata = all_cpu_data[cpu_num];
> +
> + if (!cppc_boost) {
> + if (cpudata->update_util_set)
> + amd_pstate_clear_update_util_hook(cpudata->cpu);
> + return;
> + }
> +
> + if (cpudata->update_util_set)
> + return;
> +
> + cpudata->sample.time = 0;
> + cpufreq_add_update_util_hook(cpu_num, &cpudata->update_util,
> + amd_pstate_cppc_update_hook);
> + cpudata->update_util_set = true;
> +}
> +
> +static void amd_pstate_epp_init(unsigned int cpu)
> +{
> + struct amd_cpudata *cpudata = all_cpu_data[cpu];
> + u32 max_perf, min_perf;
> + u64 value;
> + s16 epp;
> +
> + max_perf = READ_ONCE(cpudata->highest_perf);
> + min_perf = READ_ONCE(cpudata->lowest_perf);
> +
> + value = READ_ONCE(cpudata->cppc_req_cached);
> +
> + if (cpudata->policy == CPUFREQ_POLICY_PERFORMANCE)
> + min_perf = max_perf;
> +
> + /* Initial min/max values for CPPC Performance Controls Register */
> + value &= ~AMD_CPPC_MIN_PERF(~0L);
> + value |= AMD_CPPC_MIN_PERF(min_perf);
> +
> + value &= ~AMD_CPPC_MAX_PERF(~0L);
> + value |= AMD_CPPC_MAX_PERF(max_perf);
> +
> + /* The CPPC EPP feature requires the desired perf field to be set to zero */
> + value &= ~AMD_CPPC_DES_PERF(~0L);
> + value |= AMD_CPPC_DES_PERF(0);
> +
> + if (cpudata->epp_policy == cpudata->policy)
> + goto skip_epp;
> +
> + cpudata->epp_policy = cpudata->policy;
> +
> + if (cpudata->policy == CPUFREQ_POLICY_PERFORMANCE) {
> + epp = amd_pstate_get_epp(cpudata, value);
> + cpudata->epp_powersave = epp;

I didn't see why we should have epp_powersave here: it is only
initialized, but it won't be used anywhere.

> + if (epp < 0)
> + goto skip_epp;
> + /* force the epp value to be zero for performance policy */
> + epp = 0;
> + } else {
> + if (cpudata->epp_powersave < 0)
> + goto skip_epp;
> + /* Get BIOS pre-defined epp value */
> + epp = amd_pstate_get_epp(cpudata, value);
> + if (epp)
> + goto skip_epp;
> + epp = cpudata->epp_powersave;
> + }
> + /* Set initial EPP value */
> + if (boot_cpu_has(X86_FEATURE_CPPC)) {
> + value &= ~GENMASK_ULL(31, 24);
> + value |= (u64)epp << 24;
> + }
> +
> +skip_epp:
> + WRITE_ONCE(cpudata->cppc_req_cached, value);
> + amd_pstate_set_epp(cpudata, epp);
> +}
> +
> +static void amd_pstate_set_max_limits(struct amd_cpudata *cpudata)
> +{
> + u64 hwp_cap = READ_ONCE(cpudata->cppc_cap1_cached);
> + u64 hwp_req = READ_ONCE(cpudata->cppc_req_cached);
> + u32 max_limit = (hwp_cap >> 24) & 0xff;
> +
> + hwp_req &= ~AMD_CPPC_MIN_PERF(~0L);
> + hwp_req |= AMD_CPPC_MIN_PERF(max_limit);
> + wrmsrl_on_cpu(cpudata->cpu, MSR_AMD_CPPC_REQ, hwp_req);
> +}
> +
> +static int amd_pstate_epp_set_policy(struct cpufreq_policy *policy)
> +{
> + struct amd_cpudata *cpudata;
> +
> + if (!policy->cpuinfo.max_freq)
> + return -ENODEV;
> +
> + pr_debug("set_policy: cpuinfo.max %u policy->max %u\n",
> + policy->cpuinfo.max_freq, policy->max);
> +
> + cpudata = all_cpu_data[policy->cpu];
> + cpudata->policy = policy->policy;
> +
> + if (boot_cpu_has(X86_FEATURE_CPPC)) {
> + mutex_lock(&amd_pstate_limits_lock);
> +
> + if (cpudata->policy == CPUFREQ_POLICY_PERFORMANCE) {
> + amd_pstate_clear_update_util_hook(policy->cpu);
> + amd_pstate_set_max_limits(cpudata);
> + } else {
> + amd_pstate_set_update_util_hook(policy->cpu);
> + }
> +
> + mutex_unlock(&amd_pstate_limits_lock);
> + }
> + amd_pstate_epp_init(policy->cpu);
> +
> + return 0;
> +}
> +
> +static void amd_pstate_verify_cpu_policy(struct amd_cpudata *cpudata,
> + struct cpufreq_policy_data *policy)
> +{
> + update_boost_state();
> + cpufreq_verify_within_cpu_limits(policy);
> +}
> +
> +static int amd_pstate_epp_verify_policy(struct cpufreq_policy_data *policy)
> +{
> + amd_pstate_verify_cpu_policy(all_cpu_data[policy->cpu], policy);
> + pr_debug("policy_max =%d, policy_min=%d\n", policy->max, policy->min);
> + return 0;
> +}

amd_pstate_verify_cpu_policy and amd_pstate_epp_verify_policy can be
squeezed into one function.
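Roughly like this (sketch, just merging the two existing bodies):

static int amd_pstate_epp_verify_policy(struct cpufreq_policy_data *policy)
{
	update_boost_state();
	cpufreq_verify_within_cpu_limits(policy);
	pr_debug("policy_max =%d, policy_min=%d\n", policy->max, policy->min);
	return 0;
}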

> +
> static struct cpufreq_driver amd_pstate_driver = {
> .flags = CPUFREQ_CONST_LOOPS | CPUFREQ_NEED_UPDATE_LIMITS,
> .verify = amd_pstate_verify,
> @@ -617,8 +1213,20 @@ static struct cpufreq_driver amd_pstate_driver = {
> .attr = amd_pstate_attr,
> };
>
> +static struct cpufreq_driver amd_pstate_epp_driver = {
> + .flags = CPUFREQ_CONST_LOOPS,
> + .verify = amd_pstate_epp_verify_policy,
> + .setpolicy = amd_pstate_epp_set_policy,
> + .init = amd_pstate_epp_cpu_init,
> + .exit = amd_pstate_epp_cpu_exit,
> + .update_limits = amd_pstate_epp_update_limits,
> + .name = "amd_pstate_epp",
> + .attr = amd_pstate_epp_attr,
> +};
> +
> static int __init amd_pstate_init(void)
> {
> + static struct amd_cpudata **cpudata;
> int ret;
>
> if (boot_cpu_data.x86_vendor != X86_VENDOR_AMD)
> @@ -645,7 +1253,8 @@ static int __init amd_pstate_init(void)
> /* capability check */
> if (boot_cpu_has(X86_FEATURE_CPPC)) {
> pr_debug("AMD CPPC MSR based functionality is supported\n");
> - amd_pstate_driver.adjust_perf = amd_pstate_adjust_perf;
> + if (!cppc_active)
> + default_pstate_driver->adjust_perf = amd_pstate_adjust_perf;
> } else {
> pr_debug("AMD CPPC shared memory based functionality is supported\n");
> static_call_update(amd_pstate_enable, cppc_enable);
> @@ -653,6 +1262,10 @@ static int __init amd_pstate_init(void)
> static_call_update(amd_pstate_update_perf, cppc_update_perf);
> }
>
> + cpudata = vzalloc(array_size(sizeof(void *), num_possible_cpus()));
> + if (!cpudata)
> + return -ENOMEM;
> + WRITE_ONCE(all_cpu_data, cpudata);

Why can't we use cpufreq_policy->driver_data to store the cpudata? The
cpudata is per-CPU and can easily be retrieved from that private data.
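For example (sketch; to_cpudata() is just an illustrative helper), since
the init path already does policy->driver_data = cpudata, each callback
could use:

static inline struct amd_cpudata *to_cpudata(struct cpufreq_policy *policy)
{
	return policy->driver_data;
}

and the global all_cpu_data array would go away.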

> /* enable amd pstate feature */
> ret = amd_pstate_enable(true);
> if (ret) {
> @@ -660,9 +1273,9 @@ static int __init amd_pstate_init(void)
> return ret;
> }
>
> - ret = cpufreq_register_driver(&amd_pstate_driver);
> + ret = cpufreq_register_driver(default_pstate_driver);
> if (ret)
> - pr_err("failed to register amd_pstate_driver with return %d\n",
> + pr_err("failed to register amd pstate driver with return %d\n",
> ret);
>
> return ret;
> @@ -677,8 +1290,14 @@ static int __init amd_pstate_param(char *str)
> if (!strcmp(str, "disable")) {
> cppc_load = 0;
> pr_info("driver is explicitly disabled\n");
> - } else if (!strcmp(str, "passive"))
> + } else if (!strcmp(str, "passive")) {
> cppc_load = 1;
> + default_pstate_driver = &amd_pstate_driver;
> + } else if (!strcmp(str, "active")) {
> + cppc_active = 1;
> + cppc_load = 1;
> + default_pstate_driver = &amd_pstate_epp_driver;
> + }
>
> return 0;
> }
> diff --git a/include/linux/amd-pstate.h b/include/linux/amd-pstate.h
> index 1c4b8659f171..888af62040f1 100644
> --- a/include/linux/amd-pstate.h
> +++ b/include/linux/amd-pstate.h
> @@ -25,6 +25,7 @@ struct amd_aperf_mperf {
> u64 aperf;
> u64 mperf;
> u64 tsc;
> + u64 time;
> };
>
> /**
> @@ -47,6 +48,18 @@ struct amd_aperf_mperf {
> * @prev: Last Aperf/Mperf/tsc count value read from register
> * @freq: current cpu frequency value
> * @boost_supported: check whether the Processor or SBIOS supports boost mode
> + * @epp_powersave: Last saved CPPC energy performance preference
> + when policy switched to performance
> + * @epp_policy: Last saved policy used to set energy-performance preference
> + * @epp_cached: Cached CPPC energy-performance preference value
> + * @policy: Cpufreq policy value
> + * @sched_flags: Store scheduler flags for possible cross CPU update
> + * @update_util_set: CPUFreq utility callback is set
> + * @last_update: Time stamp of the last performance state update
> + * @cppc_boost_min: Last CPPC boosted min performance state
> + * @cppc_cap1_cached: Cached value of the last CPPC Capabilities MSR
> + * @update_util: Cpufreq utility callback information
> + * @sample: the stored performance sample
> *
> * The amd_cpudata is key private data for each CPU thread in AMD P-State, and
> * represents all the attributes and goals that AMD P-State requests at runtime.
> @@ -72,6 +85,28 @@ struct amd_cpudata {
>
> u64 freq;
> bool boost_supported;
> +
> + /* EPP feature related attributes*/
> + s16 epp_powersave;
> + s16 epp_policy;
> + s16 epp_cached;
> + u32 policy;
> + u32 sched_flags;
> + bool update_util_set;
> + u64 last_update;
> + u64 last_io_update;
> + u32 cppc_boost_min;
> + u64 cppc_cap1_cached;
> + struct update_util_data update_util;
> + struct amd_aperf_mperf sample;
> +};
> +
> +/**
> + * struct amd_pstate_params - global parameters for the performance control
> + * @cppc_boost_disabled: whether the core performance boost is disabled
> + */
> +struct amd_pstate_params {
> + bool cppc_boost_disabled;
> };

This should not be defined in include/linux/amd-pstate.h, because it's only
used in amd-pstate.c.

Thanks,
Ray

2022-12-12 09:11:44

by Huang Rui

[permalink] [raw]
Subject: Re: [PATCH v7 06/13] cpufreq: amd-pstate: implement amd pstate cpu online and offline callback

On Thu, Dec 08, 2022 at 07:18:45PM +0800, Yuan, Perry wrote:
> From: Perry Yuan <[email protected]>
>
> Add online and offline driver callback support to allow CPU cores to go
> offline and to help restore the previous working state when a core comes
> back online later in EPP driver mode.
>
> Signed-off-by: Perry Yuan <[email protected]>
> ---
> drivers/cpufreq/amd-pstate.c | 89 ++++++++++++++++++++++++++++++++++++
> include/linux/amd-pstate.h | 1 +
> 2 files changed, 90 insertions(+)
>
> diff --git a/drivers/cpufreq/amd-pstate.c b/drivers/cpufreq/amd-pstate.c
> index 0a521be1be8a..412accab7bda 100644
> --- a/drivers/cpufreq/amd-pstate.c
> +++ b/drivers/cpufreq/amd-pstate.c
> @@ -1186,6 +1186,93 @@ static int amd_pstate_epp_set_policy(struct cpufreq_policy *policy)
> return 0;
> }
>
> +static void amd_pstate_epp_reenable(struct amd_cpudata *cpudata)
> +{
> + struct cppc_perf_ctrls perf_ctrls;
> + u64 value, max_perf;
> + int ret;
> +
> + ret = amd_pstate_enable(true);
> + if (ret)
> + pr_err("failed to enable amd pstate during resume, return %d\n", ret);
> +
> + value = READ_ONCE(cpudata->cppc_req_cached);
> + max_perf = READ_ONCE(cpudata->highest_perf);
> +
> + if (boot_cpu_has(X86_FEATURE_CPPC)) {
> + wrmsrl_on_cpu(cpudata->cpu, MSR_AMD_CPPC_REQ, value);
> + } else {
> + perf_ctrls.max_perf = max_perf;
> + perf_ctrls.energy_perf = AMD_CPPC_ENERGY_PERF_PREF(cpudata->epp_cached);
> + cppc_set_perf(cpudata->cpu, &perf_ctrls);
> + }
> +}
> +
> +static int amd_pstate_epp_cpu_online(struct cpufreq_policy *policy)
> +{
> + struct amd_cpudata *cpudata = all_cpu_data[policy->cpu];
> +
> + pr_debug("AMD CPU Core %d going online\n", cpudata->cpu);
> +
> + if (cppc_active) {
> + amd_pstate_epp_reenable(cpudata);
> + cpudata->suspended = false;
> + }
> +
> + return 0;
> +}
> +
> +static void amd_pstate_epp_offline(struct cpufreq_policy *policy)
> +{
> + struct amd_cpudata *cpudata = all_cpu_data[policy->cpu];
> + struct cppc_perf_ctrls perf_ctrls;
> + int min_perf;
> + u64 value;
> +
> + min_perf = READ_ONCE(cpudata->lowest_perf);
> + value = READ_ONCE(cpudata->cppc_req_cached);
> +
> + mutex_lock(&amd_pstate_limits_lock);
> + if (boot_cpu_has(X86_FEATURE_CPPC)) {
> + cpudata->epp_policy = CPUFREQ_POLICY_UNKNOWN;
> +
> + /* Set max perf same as min perf */
> + value &= ~AMD_CPPC_MAX_PERF(~0L);
> + value |= AMD_CPPC_MAX_PERF(min_perf);
> + value &= ~AMD_CPPC_MIN_PERF(~0L);
> + value |= AMD_CPPC_MIN_PERF(min_perf);
> + wrmsrl_on_cpu(cpudata->cpu, MSR_AMD_CPPC_REQ, value);
> + } else {
> + perf_ctrls.desired_perf = 0;
> + perf_ctrls.max_perf = min_perf;
> + perf_ctrls.energy_perf = AMD_CPPC_ENERGY_PERF_PREF(HWP_EPP_POWERSAVE);
> + cppc_set_perf(cpudata->cpu, &perf_ctrls);
> + }

Could you double-check whether these registers are cleared or modified
when the CPU cores go offline and come back online? I remember Joe ran a
test before showing that the register values are preserved even across
idle/offline.

Thanks,
Ray

> + mutex_unlock(&amd_pstate_limits_lock);
> +}
> +
> +static int amd_pstate_cpu_offline(struct cpufreq_policy *policy)
> +{
> + struct amd_cpudata *cpudata = all_cpu_data[policy->cpu];
> +
> + pr_debug("AMD CPU Core %d going offline\n", cpudata->cpu);
> +
> + if (cpudata->suspended)
> + return 0;
> +
> + if (cppc_active)
> + amd_pstate_epp_offline(policy);
> +
> + return 0;
> +}
> +
> +static int amd_pstate_epp_cpu_offline(struct cpufreq_policy *policy)
> +{
> + amd_pstate_clear_update_util_hook(policy->cpu);
> +
> + return amd_pstate_cpu_offline(policy);
> +}
> +
> static void amd_pstate_verify_cpu_policy(struct amd_cpudata *cpudata,
> struct cpufreq_policy_data *policy)
> {
> @@ -1220,6 +1307,8 @@ static struct cpufreq_driver amd_pstate_epp_driver = {
> .init = amd_pstate_epp_cpu_init,
> .exit = amd_pstate_epp_cpu_exit,
> .update_limits = amd_pstate_epp_update_limits,
> + .offline = amd_pstate_epp_cpu_offline,
> + .online = amd_pstate_epp_cpu_online,
> .name = "amd_pstate_epp",
> .attr = amd_pstate_epp_attr,
> };
> diff --git a/include/linux/amd-pstate.h b/include/linux/amd-pstate.h
> index 888af62040f1..3dd26a3d104c 100644
> --- a/include/linux/amd-pstate.h
> +++ b/include/linux/amd-pstate.h
> @@ -99,6 +99,7 @@ struct amd_cpudata {
> u64 cppc_cap1_cached;
> struct update_util_data update_util;
> struct amd_aperf_mperf sample;
> + bool suspended;
> };
>
> /**
> --
> 2.34.1
>

2022-12-12 09:41:16

by Huang Rui

[permalink] [raw]
Subject: Re: [PATCH v7 07/13] cpufreq: amd-pstate: implement suspend and resume callbacks

On Thu, Dec 08, 2022 at 07:18:46PM +0800, Yuan, Perry wrote:
> From: Perry Yuan <[email protected]>
>
> Add suspend and resume support for AMD processors to the amd_pstate_epp
> driver instance.
>
> When CPPC is suspended, the EPP driver will set the EPP profile to the
> 'power' profile and set max/min perf to the lowest perf value.
> When resume happens, it will restore the MSR registers with the
> previously cached values.
>
> Signed-off-by: Perry Yuan <[email protected]>
> ---
> drivers/cpufreq/amd-pstate.c | 40 ++++++++++++++++++++++++++++++++++++
> 1 file changed, 40 insertions(+)
>
> diff --git a/drivers/cpufreq/amd-pstate.c b/drivers/cpufreq/amd-pstate.c
> index 412accab7bda..ea9255bdc9ac 100644
> --- a/drivers/cpufreq/amd-pstate.c
> +++ b/drivers/cpufreq/amd-pstate.c
> @@ -1273,6 +1273,44 @@ static int amd_pstate_epp_cpu_offline(struct cpufreq_policy *policy)
> return amd_pstate_cpu_offline(policy);
> }
>
> +static int amd_pstate_epp_suspend(struct cpufreq_policy *policy)
> +{
> + struct amd_cpudata *cpudata = all_cpu_data[policy->cpu];
> + int ret;
> +
> + /* avoid suspending when EPP is not enabled */
> + if (!cppc_active)
> + return 0;
> +
> + /* set this flag to avoid setting core offline*/
> + cpudata->suspended = true;
> +
> + /* disable CPPC in lowlevel firmware */
> + ret = amd_pstate_enable(false);
> + if (ret)
> + pr_err("failed to suspend, return %d\n", ret);
> +
> + return 0;
> +}
> +
> +static int amd_pstate_epp_resume(struct cpufreq_policy *policy)
> +{
> + struct amd_cpudata *cpudata = all_cpu_data[policy->cpu];
> +
> + if (cpudata->suspended) {
> + mutex_lock(&amd_pstate_limits_lock);
> +
> + /* enable amd pstate from suspend state*/
> + amd_pstate_epp_reenable(cpudata);

The same comment: could you please double-check whether the perf_ctrls
registers are cleared while we execute a round of S3 suspend/resume?

> +
> + mutex_unlock(&amd_pstate_limits_lock);
> +
> + cpudata->suspended = false;
> + }
> +
> + return 0;
> +}
> +
> static void amd_pstate_verify_cpu_policy(struct amd_cpudata *cpudata,
> struct cpufreq_policy_data *policy)
> {
> @@ -1309,6 +1347,8 @@ static struct cpufreq_driver amd_pstate_epp_driver = {
> .update_limits = amd_pstate_epp_update_limits,
> .offline = amd_pstate_epp_cpu_offline,
> .online = amd_pstate_epp_cpu_online,
> + .suspend = amd_pstate_epp_suspend,
> + .resume = amd_pstate_epp_resume,
> .name = "amd_pstate_epp",
> .attr = amd_pstate_epp_attr,
> };
> --
> 2.34.1
>

2022-12-12 10:35:33

by Yuan, Perry

[permalink] [raw]
Subject: RE: [PATCH v7 01/13] ACPI: CPPC: Add AMD pstate energy performance preference cppc control

[AMD Official Use Only - General]

Hi Ray.

> -----Original Message-----
> From: Huang, Ray <[email protected]>
> Sent: Monday, December 12, 2022 11:29 AM
> To: Yuan, Perry <[email protected]>
> Cc: [email protected]; Limonciello, Mario
> <[email protected]>; [email protected]; Sharma, Deepak
> <[email protected]>; Fontenot, Nathan
> <[email protected]>; Deucher, Alexander
> <[email protected]>; Huang, Shimmer
> <[email protected]>; Du, Xiaojian <[email protected]>; Meng,
> Li (Jassmine) <[email protected]>; Karny, Wyes <[email protected]>;
> [email protected]; [email protected]
> Subject: Re: [PATCH v7 01/13] ACPI: CPPC: Add AMD pstate energy
> performance preference cppc control
>
> On Fri, Dec 09, 2022 at 03:55:28PM +0800, Huang Rui wrote:
> > On Thu, Dec 08, 2022 at 07:18:40PM +0800, Yuan, Perry wrote:
> > > From: Perry Yuan <[email protected]>
> > >
> > > Add support for setting and querying EPP preferences to the generic
> > > CPPC driver. This enables downstream drivers such as amd-pstate to
> > > discover and use these values
> > >
> > > Downstream drivers that want to use the new symbols
> > > cppc_get_epp_caps and cppc_set_epp_perf for querying and setting EPP
> > > preferences will need to call cppc_set_auto_epp to enable the EPP
> function first.
> > >
> > > Signed-off-by: Perry Yuan <[email protected]>
> >
> > Acked-by: Huang Rui <[email protected]>
> >
> > > ---
> > > drivers/acpi/cppc_acpi.c | 114
> > > +++++++++++++++++++++++++++++++++++++--
> > > include/acpi/cppc_acpi.h | 12 +++++
> > > 2 files changed, 121 insertions(+), 5 deletions(-)
> > >
> > > diff --git a/drivers/acpi/cppc_acpi.c b/drivers/acpi/cppc_acpi.c
> > > index 093675b1a1ff..37fa75f25f62 100644
> > > --- a/drivers/acpi/cppc_acpi.c
> > > +++ b/drivers/acpi/cppc_acpi.c
> > > @@ -1093,6 +1093,9 @@ static int cppc_get_perf(int cpunum, enum
> > > cppc_regs reg_idx, u64 *perf) {
> > > struct cpc_desc *cpc_desc = per_cpu(cpc_desc_ptr, cpunum);
> > > struct cpc_register_resource *reg;
> > > + int pcc_ss_id = per_cpu(cpu_pcc_subspace_idx, cpunum);
> > > + struct cppc_pcc_data *pcc_ss_data = NULL;
> > > + int ret = -EINVAL;
> > >
> > > if (!cpc_desc) {
> > > pr_debug("No CPC descriptor for CPU:%d\n", cpunum); @@
> -1102,10
> > > +1105,6 @@ static int cppc_get_perf(int cpunum, enum cppc_regs
> reg_idx, u64 *perf)
> > > reg = &cpc_desc->cpc_regs[reg_idx];
> > >
> > > if (CPC_IN_PCC(reg)) {
> > > - int pcc_ss_id = per_cpu(cpu_pcc_subspace_idx, cpunum);
> > > - struct cppc_pcc_data *pcc_ss_data = NULL;
> > > - int ret = 0;
> > > -
> > > if (pcc_ss_id < 0)
> > > return -EIO;
> > >
> > > @@ -1125,7 +1124,7 @@ static int cppc_get_perf(int cpunum, enum
> > > cppc_regs reg_idx, u64 *perf)
> > >
> > > cpc_read(cpunum, reg, perf);
> > >
> > > - return 0;
> > > + return ret;
> > > }
> > >
> > > /**
> > > @@ -1365,6 +1364,111 @@ int cppc_get_perf_ctrs(int cpunum, struct
> > > cppc_perf_fb_ctrs *perf_fb_ctrs) }
> > > EXPORT_SYMBOL_GPL(cppc_get_perf_ctrs);
> > >
> > > +/**
> > > + * cppc_get_epp_caps - Get the energy preference register value.
> > > + * @cpunum: CPU from which to get epp preference level.
> > > + * @perf_caps: Return address.
> > > + *
> > > + * Return: 0 for success, -EIO otherwise.
> > > + */
> > > +int cppc_get_epp_caps(int cpunum, struct cppc_perf_caps *perf_caps)
>
> Take a look at the patch again, due to the energy_perf is actually one of the
> members in struct cppc_perf_caps. It's better to modify the existing
> cppc_get_perf_caps() to get the epp value as well.
>
> Thanks,
> Ray

Makes sense, I will change it in the V8.

Perry.

>
> > > +{
> > > + struct cpc_desc *cpc_desc = per_cpu(cpc_desc_ptr, cpunum);
> > > + struct cpc_register_resource *energy_perf_reg;
> > > + u64 energy_perf;
> > > +
> > > + if (!cpc_desc) {
> > > + pr_debug("No CPC descriptor for CPU:%d\n", cpunum);
> > > + return -ENODEV;
> > > + }
> > > +
> > > + energy_perf_reg = &cpc_desc->cpc_regs[ENERGY_PERF];
> > > +
> > > + if (!CPC_SUPPORTED(energy_perf_reg))
> > > + pr_warn_once("energy perf reg update is unsupported!\n");
> > > +
> > > + if (CPC_IN_PCC(energy_perf_reg)) {
> > > + int pcc_ss_id = per_cpu(cpu_pcc_subspace_idx, cpunum);
> > > + struct cppc_pcc_data *pcc_ss_data = NULL;
> > > + int ret = 0;
> > > +
> > > + if (pcc_ss_id < 0)
> > > + return -ENODEV;
> > > +
> > > + pcc_ss_data = pcc_data[pcc_ss_id];
> > > +
> > > + down_write(&pcc_ss_data->pcc_lock);
> > > +
> > > + if (send_pcc_cmd(pcc_ss_id, CMD_READ) >= 0) {
> > > + cpc_read(cpunum, energy_perf_reg, &energy_perf);
> > > + perf_caps->energy_perf = energy_perf;
> > > + } else {
> > > + ret = -EIO;
> > > + }
> > > +
> > > + up_write(&pcc_ss_data->pcc_lock);
> > > +
> > > + return ret;
> > > + }
> > > +
> > > + return 0;
> > > +}
> > > +EXPORT_SYMBOL_GPL(cppc_get_epp_caps);
> > > +
> > > +/*
> > > + * Set Energy Performance Preference Register value through
> > > + * Performance Controls Interface
> > > + */
> > > +int cppc_set_epp_perf(int cpu, struct cppc_perf_ctrls *perf_ctrls,
> > > +bool enable) {
> > > + int pcc_ss_id = per_cpu(cpu_pcc_subspace_idx, cpu);
> > > + struct cpc_register_resource *epp_set_reg;
> > > + struct cpc_register_resource *auto_sel_reg;
> > > + struct cpc_desc *cpc_desc = per_cpu(cpc_desc_ptr, cpu);
> > > + struct cppc_pcc_data *pcc_ss_data = NULL;
> > > + int ret = -EINVAL;
> > > +
> > > + if (!cpc_desc) {
> > > + pr_debug("No CPC descriptor for CPU:%d\n", cpu);
> > > + return -ENODEV;
> > > + }
> > > +
> > > + auto_sel_reg = &cpc_desc->cpc_regs[AUTO_SEL_ENABLE];
> > > + epp_set_reg = &cpc_desc->cpc_regs[ENERGY_PERF];
> > > +
> > > + if (CPC_IN_PCC(epp_set_reg) || CPC_IN_PCC(auto_sel_reg)) {
> > > + if (pcc_ss_id < 0) {
> > > + pr_debug("Invalid pcc_ss_id\n");
> > > + return -ENODEV;
> > > + }
> > > +
> > > + if (CPC_SUPPORTED(auto_sel_reg)) {
> > > + ret = cpc_write(cpu, auto_sel_reg, enable);
> > > + if (ret)
> > > + return ret;
> > > + }
> > > +
> > > + if (CPC_SUPPORTED(epp_set_reg)) {
> > > + ret = cpc_write(cpu, epp_set_reg, perf_ctrls-
> >energy_perf);
> > > + if (ret)
> > > + return ret;
> > > + }
> > > +
> > > + pcc_ss_data = pcc_data[pcc_ss_id];
> > > +
> > > + down_write(&pcc_ss_data->pcc_lock);
> > > + /* after writing CPC, transfer the ownership of PCC to
> platform */
> > > + ret = send_pcc_cmd(pcc_ss_id, CMD_WRITE);
> > > + up_write(&pcc_ss_data->pcc_lock);
> > > + } else {
> > > + ret = -ENOTSUPP;
> > > + pr_debug("_CPC in PCC is not supported\n");
> > > + }
> > > +
> > > + return ret;
> > > +}
> > > +EXPORT_SYMBOL_GPL(cppc_set_epp_perf);
> > > +
> > > /**
> > > * cppc_set_enable - Set to enable CPPC on the processor by writing the
> > > * Continuous Performance Control package EnableRegister field.
> > > diff --git a/include/acpi/cppc_acpi.h b/include/acpi/cppc_acpi.h
> > > index c5614444031f..a45bb876a19c 100644
> > > --- a/include/acpi/cppc_acpi.h
> > > +++ b/include/acpi/cppc_acpi.h
> > > @@ -108,12 +108,14 @@ struct cppc_perf_caps {
> > > u32 lowest_nonlinear_perf;
> > > u32 lowest_freq;
> > > u32 nominal_freq;
> > > + u32 energy_perf;
> > > };
> > >
> > > struct cppc_perf_ctrls {
> > > u32 max_perf;
> > > u32 min_perf;
> > > u32 desired_perf;
> > > + u32 energy_perf;
> > > };
> > >
> > > struct cppc_perf_fb_ctrs {
> > > @@ -149,6 +151,8 @@ extern bool cpc_ffh_supported(void); extern
> > > bool cpc_supported_by_cpu(void); extern int cpc_read_ffh(int
> > > cpunum, struct cpc_reg *reg, u64 *val); extern int
> > > cpc_write_ffh(int cpunum, struct cpc_reg *reg, u64 val);
> > > +extern int cppc_get_epp_caps(int cpunum, struct cppc_perf_caps
> > > +*perf_caps); extern int cppc_set_epp_perf(int cpu, struct
> > > +cppc_perf_ctrls *perf_ctrls, bool enable);
> > > #else /* !CONFIG_ACPI_CPPC_LIB */
> > > static inline int cppc_get_desired_perf(int cpunum, u64
> > > *desired_perf) { @@ -202,6 +206,14 @@ static inline int
> > > cpc_write_ffh(int cpunum, struct cpc_reg *reg, u64 val) {
> > > return -ENOTSUPP;
> > > }
> > > +static inline int cppc_set_epp_perf(int cpu, struct cppc_perf_ctrls
> > > +*perf_ctrls, bool enable) {
> > > + return -ENOTSUPP;
> > > +}
> > > +static inline int cppc_get_epp_caps(int cpunum, struct
> > > +cppc_perf_caps *perf_caps) {
> > > + return -ENOTSUPP;
> > > +}
> > > #endif /* !CONFIG_ACPI_CPPC_LIB */
> > >
> > > #endif /* _CPPC_ACPI_H*/
> > > --
> > > 2.34.1
> > >

2022-12-12 10:43:57

by Huang Rui

[permalink] [raw]
Subject: Re: [PATCH v7 08/13] cpufreq: amd-pstate: add frequency dynamic boost sysfs control

On Thu, Dec 08, 2022 at 07:18:47PM +0800, Yuan, Perry wrote:
> From: Perry Yuan <[email protected]>
>
> Add one sysfs entry to control the CPU cores frequency boost state
> The attribute file can allow user to set max performance boosted or
> keeping at normal perf level.
>
> Signed-off-by: Perry Yuan <[email protected]>
> ---
> drivers/cpufreq/amd-pstate.c | 67 ++++++++++++++++++++++++++++++++++--
> 1 file changed, 65 insertions(+), 2 deletions(-)
>
> diff --git a/drivers/cpufreq/amd-pstate.c b/drivers/cpufreq/amd-pstate.c
> index ea9255bdc9ac..4cd53c010215 100644
> --- a/drivers/cpufreq/amd-pstate.c
> +++ b/drivers/cpufreq/amd-pstate.c
> @@ -69,6 +69,7 @@ static struct amd_pstate_params global_params;
>
> static DEFINE_MUTEX(amd_pstate_limits_lock);
> static DEFINE_MUTEX(amd_pstate_driver_lock);
> +struct kobject *amd_pstate_kobj;
>
> static bool cppc_boost __read_mostly;
>
> @@ -768,12 +769,46 @@ static ssize_t show_energy_performance_preference(
> return sysfs_emit(buf, "%s\n", energy_perf_strings[preference]);
> }
>
> +static void amd_pstate_update_policies(void)
> +{
> + int cpu;
> +
> + for_each_possible_cpu(cpu)
> + cpufreq_update_policy(cpu);
> +}
> +
> +static ssize_t show_cppc_dynamic_boost(struct kobject *kobj,
> + struct kobj_attribute *attr, char *buf)
> +{
> + return sysfs_emit(buf, "%u\n", cppc_boost);
> +}
> +
> +static ssize_t store_cppc_dynamic_boost(struct kobject *a,
> + struct kobj_attribute *b,
> + const char *buf, size_t count)
> +{
> + bool new_state;
> + int ret;
> +
> + ret = kstrtobool(buf, &new_state);
> + if (ret)
> + return -EINVAL;
> +
> + mutex_lock(&amd_pstate_driver_lock);
> + cppc_boost = !!new_state;
> + amd_pstate_update_policies();
> + mutex_unlock(&amd_pstate_driver_lock);

This patch should wait until we have confirmed that iowait boost is
actually needed for amd-pstate.

Thanks,
Ray

> +
> + return count;
> +}
> +
> cpufreq_freq_attr_ro(amd_pstate_max_freq);
> cpufreq_freq_attr_ro(amd_pstate_lowest_nonlinear_freq);
>
> cpufreq_freq_attr_ro(amd_pstate_highest_perf);
> cpufreq_freq_attr_rw(energy_performance_preference);
> cpufreq_freq_attr_ro(energy_performance_available_preferences);
> +define_one_global_rw(cppc_dynamic_boost);
>
> static struct freq_attr *amd_pstate_attr[] = {
> &amd_pstate_max_freq,
> @@ -791,6 +826,15 @@ static struct freq_attr *amd_pstate_epp_attr[] = {
> NULL,
> };
>
> +static struct attribute *pstate_global_attributes[] = {
> + &cppc_dynamic_boost.attr,
> + NULL
> +};
> +
> +static const struct attribute_group amd_pstate_global_attr_group = {
> + .attrs = pstate_global_attributes,
> +};
> +
> static inline void update_boost_state(void)
> {
> u64 misc_en;
> @@ -1404,9 +1448,28 @@ static int __init amd_pstate_init(void)
>
> ret = cpufreq_register_driver(default_pstate_driver);
> if (ret)
> - pr_err("failed to register amd pstate driver with return %d\n",
> - ret);
> + pr_err("failed to register driver with return %d\n", ret);
> +
> + amd_pstate_kobj = kobject_create_and_add("amd-pstate", &cpu_subsys.dev_root->kobj);
> + if (!amd_pstate_kobj) {
> + ret = -EINVAL;
> + pr_err("global sysfs registration failed.\n");
> + goto kobject_free;
> + }
> +
> + ret = sysfs_create_group(amd_pstate_kobj, &amd_pstate_global_attr_group);
> + if (ret) {
> + pr_err("sysfs attribute export failed with error %d.\n", ret);
> + goto global_attr_free;
> + }
> +
> + return ret;
>
> +global_attr_free:
> + kobject_put(amd_pstate_kobj);
> +kobject_free:
> + cpufreq_unregister_driver(default_pstate_driver);
> + kfree(cpudata);
> return ret;
> }
> device_initcall(amd_pstate_init);
> --
> 2.34.1
>

2022-12-12 11:08:36

by Huang Rui

[permalink] [raw]
Subject: Re: [PATCH v7 09/13] cpufreq: amd-pstate: add driver working mode status sysfs entry

On Thu, Dec 08, 2022 at 07:18:48PM +0800, Yuan, Perry wrote:
> From: Perry Yuan <[email protected]>
>
> When the amd-pstate driver is loaded in a specific driver mode, users
> need a way to check which mode is enabled, so add this sysfs entry to
> show the current status:
>
> $ cat /sys/devices/system/cpu/amd-pstate/status
> active
>
> Meanwhile, user can switch the pstate driver mode with writing mode
> string to sysfs entry as below.
>
> Enable passive mode:
> $ sudo bash -c "echo passive > /sys/devices/system/cpu/amd-pstate/status"
>
> Enable active mode (EPP driver mode):
> $ sudo bash -c "echo active > /sys/devices/system/cpu/amd-pstate/status"
>
> Signed-off-by: Perry Yuan <[email protected]>

I believe you should align with Wyes and send out a unified state/status
API that represents the state machine of the different working modes of
the AMD P-State driver, including active and guided mode.

Thanks,
Ray

> ---
> drivers/cpufreq/amd-pstate.c | 101 +++++++++++++++++++++++++++++++++++
> 1 file changed, 101 insertions(+)
>
> diff --git a/drivers/cpufreq/amd-pstate.c b/drivers/cpufreq/amd-pstate.c
> index 4cd53c010215..c90aee3ee42d 100644
> --- a/drivers/cpufreq/amd-pstate.c
> +++ b/drivers/cpufreq/amd-pstate.c
> @@ -64,6 +64,8 @@ static bool cppc_active;
> static int cppc_load __initdata;
>
> static struct cpufreq_driver *default_pstate_driver;
> +static struct cpufreq_driver amd_pstate_epp_driver;
> +static struct cpufreq_driver amd_pstate_driver;
> static struct amd_cpudata **all_cpu_data;
> static struct amd_pstate_params global_params;
>
> @@ -72,6 +74,7 @@ static DEFINE_MUTEX(amd_pstate_driver_lock);
> struct kobject *amd_pstate_kobj;
>
> static bool cppc_boost __read_mostly;
> +static DEFINE_SPINLOCK(cppc_notify_lock);
>
> static s16 amd_pstate_get_epp(struct amd_cpudata *cpudata, u64 cppc_req_cached)
> {
> @@ -629,6 +632,8 @@ static int amd_pstate_cpu_init(struct cpufreq_policy *policy)
> policy->driver_data = cpudata;
>
> amd_pstate_boost_init(cpudata);
> + if (!default_pstate_driver->adjust_perf)
> + default_pstate_driver->adjust_perf = amd_pstate_adjust_perf;
>
> return 0;
>
> @@ -802,6 +807,100 @@ static ssize_t store_cppc_dynamic_boost(struct kobject *a,
> return count;
> }
>
> +static ssize_t amd_pstate_show_status(char *buf)
> +{
> + if (!default_pstate_driver)
> + return sysfs_emit(buf, "off\n");
> +
> + return sysfs_emit(buf, "%s\n", default_pstate_driver == &amd_pstate_epp_driver ?
> + "active" : "passive");
> +}
> +
> +static void amd_pstate_clear_update_util_hook(unsigned int cpu);
> +static void amd_pstate_driver_cleanup(void)
> +{
> + unsigned int cpu;
> +
> + cpus_read_lock();
> + for_each_online_cpu(cpu) {
> + if (all_cpu_data[cpu]) {
> + if (default_pstate_driver == &amd_pstate_epp_driver)
> + amd_pstate_clear_update_util_hook(cpu);
> +
> + spin_lock(&cppc_notify_lock);
> + kfree(all_cpu_data[cpu]);
> + WRITE_ONCE(all_cpu_data[cpu], NULL);
> + spin_unlock(&cppc_notify_lock);
> + }
> + }
> + cpus_read_unlock();
> +
> + default_pstate_driver = NULL;
> +}
> +
> +static int amd_pstate_update_status(const char *buf, size_t size)
> +{
> + if (size == 3 && !strncmp(buf, "off", size)) {
> + if (!default_pstate_driver)
> + return -EINVAL;
> +
> + if (cppc_active)
> + return -EBUSY;
> +
> + cpufreq_unregister_driver(default_pstate_driver);
> + amd_pstate_driver_cleanup();
> + return 0;
> + }
> +
> + if (size == 6 && !strncmp(buf, "active", size)) {
> + if (default_pstate_driver) {
> + if (default_pstate_driver == &amd_pstate_epp_driver)
> + return 0;
> + cpufreq_unregister_driver(default_pstate_driver);
> + default_pstate_driver = &amd_pstate_epp_driver;
> + }
> +
> + return cpufreq_register_driver(default_pstate_driver);
> + }
> +
> + if (size == 7 && !strncmp(buf, "passive", size)) {
> + if (default_pstate_driver) {
> + if (default_pstate_driver == &amd_pstate_driver)
> + return 0;
> + cpufreq_unregister_driver(default_pstate_driver);
> + }
> + default_pstate_driver = &amd_pstate_driver;
> + return cpufreq_register_driver(default_pstate_driver);
> + }
> +
> + return -EINVAL;
> +}
> +
> +static ssize_t show_status(struct kobject *kobj,
> + struct kobj_attribute *attr, char *buf)
> +{
> + ssize_t ret;
> +
> + mutex_lock(&amd_pstate_driver_lock);
> + ret = amd_pstate_show_status(buf);
> + mutex_unlock(&amd_pstate_driver_lock);
> +
> + return ret;
> +}
> +
> +static ssize_t store_status(struct kobject *a, struct kobj_attribute *b,
> + const char *buf, size_t count)
> +{
> + char *p = memchr(buf, '\n', count);
> + int ret;
> +
> + mutex_lock(&amd_pstate_driver_lock);
> + ret = amd_pstate_update_status(buf, p ? p - buf : count);
> + mutex_unlock(&amd_pstate_driver_lock);
> +
> + return ret < 0 ? ret : count;
> +}
> +
> cpufreq_freq_attr_ro(amd_pstate_max_freq);
> cpufreq_freq_attr_ro(amd_pstate_lowest_nonlinear_freq);
>
> @@ -809,6 +908,7 @@ cpufreq_freq_attr_ro(amd_pstate_highest_perf);
> cpufreq_freq_attr_rw(energy_performance_preference);
> cpufreq_freq_attr_ro(energy_performance_available_preferences);
> define_one_global_rw(cppc_dynamic_boost);
> +define_one_global_rw(status);
>
> static struct freq_attr *amd_pstate_attr[] = {
> &amd_pstate_max_freq,
> @@ -828,6 +928,7 @@ static struct freq_attr *amd_pstate_epp_attr[] = {
>
> static struct attribute *pstate_global_attributes[] = {
> &cppc_dynamic_boost.attr,
> + &status.attr,
> NULL
> };
>
> --
> 2.34.1
>

2022-12-12 15:35:09

by Mario Limonciello

[permalink] [raw]
Subject: RE: [PATCH v7 07/13] cpufreq: amd-pstate: implement suspend and resume callbacks

[Public]



> -----Original Message-----
> From: Huang, Ray <[email protected]>
> Sent: Monday, December 12, 2022 03:05
> To: Yuan, Perry <[email protected]>
> Cc: [email protected]; Limonciello, Mario
> <[email protected]>; [email protected]; Sharma, Deepak
> <[email protected]>; Fontenot, Nathan
> <[email protected]>; Deucher, Alexander
> <[email protected]>; Huang, Shimmer
> <[email protected]>; Du, Xiaojian <[email protected]>; Meng,
> Li (Jassmine) <[email protected]>; Karny, Wyes <[email protected]>;
> [email protected]; [email protected]
> Subject: Re: [PATCH v7 07/13] cpufreq: amd-pstate: implement suspend and
> resume callbacks
>
> On Thu, Dec 08, 2022 at 07:18:46PM +0800, Yuan, Perry wrote:
> > From: Perry Yuan <[email protected]>
> >
> > add suspend and resume support for the AMD processors by
> amd_pstate_epp
> > driver instance.
> >
> > When the CPPC is suspended, EPP driver will set EPP profile to 'power'
> > profile and set max/min perf to lowest perf value.
> > When resume happens, it will restore the MSR registers with
> > previous cached value.
> >
> > Signed-off-by: Perry Yuan <[email protected]>
> > ---
> > drivers/cpufreq/amd-pstate.c | 40
> ++++++++++++++++++++++++++++++++++++
> > 1 file changed, 40 insertions(+)
> >
> > diff --git a/drivers/cpufreq/amd-pstate.c b/drivers/cpufreq/amd-pstate.c
> > index 412accab7bda..ea9255bdc9ac 100644
> > --- a/drivers/cpufreq/amd-pstate.c
> > +++ b/drivers/cpufreq/amd-pstate.c
> > @@ -1273,6 +1273,44 @@ static int amd_pstate_epp_cpu_offline(struct
> cpufreq_policy *policy)
> > return amd_pstate_cpu_offline(policy);
> > }
> >
> > +static int amd_pstate_epp_suspend(struct cpufreq_policy *policy)
> > +{
> > + struct amd_cpudata *cpudata = all_cpu_data[policy->cpu];
> > + int ret;
> > +
> > + /* avoid suspending when EPP is not enabled */
> > + if (!cppc_active)
> > + return 0;
> > +
> > + /* set this flag to avoid setting core offline*/
> > + cpudata->suspended = true;
> > +
> > + /* disable CPPC in lowlevel firmware */
> > + ret = amd_pstate_enable(false);
> > + if (ret)
> > + pr_err("failed to suspend, return %d\n", ret);
> > +
> > + return 0;
> > +}
> > +
> > +static int amd_pstate_epp_resume(struct cpufreq_policy *policy)
> > +{
> > + struct amd_cpudata *cpudata = all_cpu_data[policy->cpu];
> > +
> > + if (cpudata->suspended) {
> > + mutex_lock(&amd_pstate_limits_lock);
> > +
> > + /* enable amd pstate from suspend state*/
> > + amd_pstate_epp_reenable(cpudata);
>
> The same comment, could you please double confirm whether the
> perfo_ctrls
> registers will be cleared while we execute a round of S3 suspend/resume?

And if they are, identify what is clearing them. It might not be the same for s0i3
and S3.

>
> > +
> > + mutex_unlock(&amd_pstate_limits_lock);
> > +
> > + cpudata->suspended = false;
> > + }
> > +
> > + return 0;
> > +}
> > +
> > static void amd_pstate_verify_cpu_policy(struct amd_cpudata *cpudata,
> > struct cpufreq_policy_data *policy)
> > {
> > @@ -1309,6 +1347,8 @@ static struct cpufreq_driver
> amd_pstate_epp_driver = {
> > .update_limits = amd_pstate_epp_update_limits,
> > .offline = amd_pstate_epp_cpu_offline,
> > .online = amd_pstate_epp_cpu_online,
> > + .suspend = amd_pstate_epp_suspend,
> > + .resume = amd_pstate_epp_resume,
> > .name = "amd_pstate_epp",
> > .attr = amd_pstate_epp_attr,
> > };
> > --
> > 2.34.1
> >

2022-12-18 13:10:31

by Thomas Koch

[permalink] [raw]
Subject: Re: [PATCH v7 08/13] cpufreq: amd-pstate: add frequency dynamic boost sysfs control

Hi Perry,

in amd_pstate active mode, where is the equivalent to
/sys/devices/system/cpu/cpufreq/boost?

Is it /sys/devices/system/cpu/amd-pstate/cppc_dynamic_boost or something
else?

--
Freundliche Grüße / Kind regards,
Thomas Koch

Mail : [email protected]
Web : https://linrunner.de/tlp

2022-12-19 07:51:43

by Yuan, Perry

[permalink] [raw]
Subject: RE: [PATCH v7 08/13] cpufreq: amd-pstate: add frequency dynamic boost sysfs control

[AMD Official Use Only - General]

Hi Thomas

> -----Original Message-----
> From: Thomas Koch <[email protected]>
> Sent: Sunday, December 18, 2022 8:17 PM
> To: Yuan, Perry <[email protected]>
> Cc: Deucher, Alexander <[email protected]>; Sharma, Deepak
> <[email protected]>; Meng, Li (Jassmine) <[email protected]>;
> Limonciello, Mario <[email protected]>; Fontenot, Nathan
> <[email protected]>; Huang, Shimmer
> <[email protected]>; Du, Xiaojian <[email protected]>; linux-
> [email protected]; [email protected];
> [email protected]; Huang, Ray <[email protected]>;
> [email protected]; Karny, Wyes <[email protected]>
> Subject: Re: [PATCH v7 08/13] cpufreq: amd-pstate: add frequency dynamic
> boost sysfs control
>
> Hi Perry,
>
> in amd_pstate active mode, where is the equivalent to
> /sys/devices/system/cpu/cpufreq/boost?
>
> Is it /sys/devices/system/cpu/amd-pstate/cppc_dynamic_boost or
> something else?
>
> --
> Freundliche Grüße / Kind regards,
> Thomas Koch

In EPP driver mode there is currently no frequency boost option exposed to the user; the boost knob was created for the iowait boost feature.
The CPU core scaling range is controlled by the low-level firmware.
Next, we will consider adding an interface to influence the min/max frequency if customers need that.

Perry.

>
> Mail : [email protected]
> Web : https://linrunner.de/tlp

2022-12-19 09:28:47

by Yuan, Perry

[permalink] [raw]
Subject: RE: [PATCH v7 03/13] cpufreq: intel_pstate: use common macro definition for Energy Preference Performance(EPP)

[AMD Official Use Only - General]

Hi Ray.

> -----Original Message-----
> From: Huang, Ray <[email protected]>
> Sent: Monday, December 12, 2022 9:29 AM
> To: Yuan, Perry <[email protected]>
> Cc: [email protected]; Limonciello, Mario
> <[email protected]>; [email protected]; Sharma, Deepak
> <[email protected]>; Fontenot, Nathan
> <[email protected]>; Deucher, Alexander
> <[email protected]>; Huang, Shimmer
> <[email protected]>; Du, Xiaojian <[email protected]>; Meng,
> Li (Jassmine) <[email protected]>; Karny, Wyes <[email protected]>;
> [email protected]; [email protected]
> Subject: Re: [PATCH v7 03/13] cpufreq: intel_pstate: use common macro
> definition for Energy Preference Performance(EPP)
>
> On Fri, Dec 09, 2022 at 04:54:54PM +0800, Yuan, Perry wrote:
> > [AMD Official Use Only - General]
> >
> >
> >
> > > -----Original Message-----
> > > From: Huang, Ray <[email protected]>
> > > Sent: Friday, December 9, 2022 4:01 PM
> > > To: Yuan, Perry <[email protected]>
> > > Cc: [email protected]; Limonciello, Mario
> > > <[email protected]>; [email protected]; Sharma,
> Deepak
> > > <[email protected]>; Fontenot, Nathan
> <[email protected]>;
> > > Deucher, Alexander <[email protected]>; Huang, Shimmer
> > > <[email protected]>; Du, Xiaojian <[email protected]>;
> Meng,
> > > Li (Jassmine) <[email protected]>; Karny, Wyes
> <[email protected]>;
> > > [email protected]; [email protected]
> > > Subject: Re: [PATCH v7 03/13] cpufreq: intel_pstate: use common
> > > macro definition for Energy Preference Performance(EPP)
> > >
> > > On Thu, Dec 08, 2022 at 07:18:42PM +0800, Yuan, Perry wrote:
> > > > Make the energy performance preference strings and profiles use
> > > > one common header for the intel_pstate driver, so that the
> > > > amd_pstate EPP driver can use the common header as well. This will
> > > > simplify the intel_pstate and amd_pstate drivers.
> > > >
> > > > Signed-off-by: Perry Yuan <[email protected]>
> > >
> > > Please address the comment in V6:
> > >
> > > https://lore.kernel.org/linux-
> > >
> pm/[email protected]/T/#md503ee2fa32858e6cc9ab4da9ec1b
> 8
> > > 9a6bae6058
> > >
> > > Thanks,
> > > Ray
> >
> > Talked with Mario as well, will fix the build failure and get this changed in
> V8.
> > Thanks for reviewing.
> >
>
> Please make sure you have addressed all the comments, then send the new
> version of the series.

The common code change was made in the v8 series. Please take a look.
Thanks.

2022-12-19 10:56:29

by Yuan, Perry

[permalink] [raw]
Subject: RE: [PATCH v7 05/13] cpufreq: amd-pstate: implement Pstate EPP support for the AMD processors

[AMD Official Use Only - General]

Hi Ray.

> -----Original Message-----
> From: Huang, Ray <[email protected]>
> Sent: Monday, December 12, 2022 4:47 PM
> To: Yuan, Perry <[email protected]>
> Cc: [email protected]; Limonciello, Mario
> <[email protected]>; [email protected]; Sharma, Deepak
> <[email protected]>; Fontenot, Nathan
> <[email protected]>; Deucher, Alexander
> <[email protected]>; Huang, Shimmer
> <[email protected]>; Du, Xiaojian <[email protected]>; Meng,
> Li (Jassmine) <[email protected]>; Karny, Wyes <[email protected]>;
> [email protected]; [email protected]
> Subject: Re: [PATCH v7 05/13] cpufreq: amd-pstate: implement Pstate EPP
> support for the AMD processors
>
> On Thu, Dec 08, 2022 at 07:18:44PM +0800, Yuan, Perry wrote:
> > From: Perry Yuan <[email protected]>
> >
> > Add EPP driver support for AMD SoCs which support a dedicated MSR for
> > CPPC. EPP is used by the DPM controller to configure the frequency
> > that a core operates at during short periods of activity.
> >
> > The SoC EPP targets are configured on a scale from 0 to 255 where 0
> > represents maximum performance and 255 represents maximum efficiency.
> >
> > The amd-pstate driver exports profile string names to userspace that
> > are tied to specific EPP values.
> >
> > The balance_performance string (0x80) provides the best balance for
> > efficiency versus power on most systems, but users can choose other
> > strings to meet their needs as well.
> >
> > $ cat
> >
> /sys/devices/system/cpu/cpufreq/policy0/energy_performance_available_p
> > references default performance balance_performance balance_power
> power
> >
> > $ cat
> >
> /sys/devices/system/cpu/cpufreq/policy0/energy_performance_preference
> > balance_performance
> >
> > To enable the driver, add `amd_pstate=active` to the kernel command
> > line and the kernel will load the active mode EPP driver.
> >
> > Signed-off-by: Perry Yuan <[email protected]>
> > ---
> > drivers/cpufreq/amd-pstate.c | 631
> ++++++++++++++++++++++++++++++++++-
> > include/linux/amd-pstate.h | 35 ++
> > 2 files changed, 660 insertions(+), 6 deletions(-)
> >
> > diff --git a/drivers/cpufreq/amd-pstate.c
> > b/drivers/cpufreq/amd-pstate.c index c17bd845f5fc..0a521be1be8a
> 100644
> > --- a/drivers/cpufreq/amd-pstate.c
> > +++ b/drivers/cpufreq/amd-pstate.c
> > @@ -37,6 +37,7 @@
> > #include <linux/uaccess.h>
> > #include <linux/static_call.h>
> > #include <linux/amd-pstate.h>
> > +#include <linux/cpufreq_common.h>
> >
> > #include <acpi/processor.h>
> > #include <acpi/cppc_acpi.h>
> > @@ -59,9 +60,125 @@
> > * we disable it by default to go acpi-cpufreq on these processors and add
> a
> > * module parameter to be able to enable it manually for debugging.
> > */
> > -static struct cpufreq_driver amd_pstate_driver;
> > +static bool cppc_active;
> > static int cppc_load __initdata;
> >
> > +static struct cpufreq_driver *default_pstate_driver; static struct
> > +amd_cpudata **all_cpu_data; static struct amd_pstate_params
> > +global_params;
> > +
> > +static DEFINE_MUTEX(amd_pstate_limits_lock);
> > +static DEFINE_MUTEX(amd_pstate_driver_lock);
> > +
> > +static bool cppc_boost __read_mostly;
> > +
> > +static s16 amd_pstate_get_epp(struct amd_cpudata *cpudata, u64
> > +cppc_req_cached) {
> > + s16 epp;
> > + struct cppc_perf_caps perf_caps;
> > + int ret;
> > +
> > + if (boot_cpu_has(X86_FEATURE_CPPC)) {
> > + if (!cppc_req_cached) {
> > + epp = rdmsrl_on_cpu(cpudata->cpu,
> MSR_AMD_CPPC_REQ,
> > + &cppc_req_cached);
> > + if (epp)
> > + return epp;
> > + }
> > + epp = (cppc_req_cached >> 24) & 0xFF;
> > + } else {
> > + ret = cppc_get_epp_caps(cpudata->cpu, &perf_caps);
> > + if (ret < 0) {
> > + pr_debug("Could not retrieve energy perf value
> (%d)\n", ret);
> > + return -EIO;
> > + }
> > + epp = (s16) perf_caps.energy_perf;
>
> This should align with the static_call structure used to implement such
> functions. Please refer to amd_pstate_init_perf. I think it can even
> reuse init_perf to get the EPP cap value.
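> Untested sketch of that shape (msr_get_epp, shmem_get_epp and the
> static-call name are illustrative only, not existing symbols):
>
> static s16 msr_get_epp(struct amd_cpudata *cpudata)
> {
> 	u64 req;
>
> 	if (rdmsrl_on_cpu(cpudata->cpu, MSR_AMD_CPPC_REQ, &req))
> 		return -EIO;
>
> 	return (req >> 24) & 0xff;
> }
>
> DEFINE_STATIC_CALL(amd_pstate_epp_read, msr_get_epp);
> /* shared memory: static_call_update(amd_pstate_epp_read, shmem_get_epp); */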

amd_pstate_init_perf() is only called at driver registration, whereas
amd_pstate_get_epp() will be called frequently to update the EPP MSR value and the EPP min/max limits.
So I suggest keeping amd_pstate_get_epp() as it is for updating the EPP-related values.

The static_call method could cover both the MSR and the shared memory API call, but amd_pstate_get_epp() is simple enough for now; there is
no need to convert this EPP update function to a static call as well.
Considering the tight schedule and the merge window, I would like to keep the current way of updating EPP, otherwise the customer release schedule would be delayed.

Perry.


>
> > + }
> > +
> > + return epp;
> > +}
> > +
> > +static int amd_pstate_get_energy_pref_index(struct amd_cpudata
> > +*cpudata) {
> > + s16 epp;
> > + int index = -EINVAL;
> > +
> > + epp = amd_pstate_get_epp(cpudata, 0);
> > + if (epp < 0)
> > + return epp;
> > +
> > + switch (epp) {
> > + case HWP_EPP_PERFORMANCE:
> > + index = EPP_INDEX_PERFORMANCE;
> > + break;
> > + case HWP_EPP_BALANCE_PERFORMANCE:
> > + index = EPP_INDEX_BALANCE_PERFORMANCE;
> > + break;
> > + case HWP_EPP_BALANCE_POWERSAVE:
> > + index = EPP_INDEX_BALANCE_POWERSAVE;
> > + break;
> > + case HWP_EPP_POWERSAVE:
> > + index = EPP_INDEX_POWERSAVE;
> > + break;
> > + default:
> > + break;
> > + }
> > +
> > + return index;
> > +}
> > +
> > +static int amd_pstate_set_epp(struct amd_cpudata *cpudata, u32 epp) {
> > + int ret;
> > + struct cppc_perf_ctrls perf_ctrls;
> > +
> > + if (boot_cpu_has(X86_FEATURE_CPPC)) {
> > + u64 value = READ_ONCE(cpudata->cppc_req_cached);
> > +
> > + value &= ~GENMASK_ULL(31, 24);
> > + value |= (u64)epp << 24;
> > + WRITE_ONCE(cpudata->cppc_req_cached, value);
> > +
> > + ret = wrmsrl_on_cpu(cpudata->cpu, MSR_AMD_CPPC_REQ,
> value);
> > + if (!ret)
> > + cpudata->epp_cached = epp;
> > + } else {
> > + perf_ctrls.energy_perf = epp;
> > + ret = cppc_set_epp_perf(cpudata->cpu, &perf_ctrls, 1);
>
> Since the energy_perf is one of members of struct cppc_perf_ctrls, could we
> use cppc_set_perf as well?

cppc_set_epp_perf() handles the EPP value update correctly.
cppc_set_perf() is used for desired-perf updates at a very high rate by governors such as schedutil,
and it runs in two phases (Phase I and Phase II); if we implement the EPP value update in that function, I am concerned we may hit a potential
firmware or performance regression risk.
The release schedule and merge window for v6.2 are closing, and this change request came after six rounds of patch review.
I am afraid we don't have enough time to mitigate the risk of this new code change.
We can consider continuing to optimize this in a follow-up patch.

Perry.

>
> > + if (ret) {
> > + pr_debug("failed to set energy perf value (%d)\n",
> ret);
> > + return ret;
> > + }
> > + cpudata->epp_cached = epp;
> > + }
> > +
> > + return ret;
> > +}
>
> The same with above, the helpers for different cppc types of processors such
> as MSR or share memory should be implemented by static_call.
>
> > +
> > +static int amd_pstate_set_energy_pref_index(struct amd_cpudata
> *cpudata,
> > + int pref_index)
> > +{
> > + int epp = -EINVAL;
> > + int ret;
> > +
> > + if (!pref_index) {
> > + pr_debug("EPP pref_index is invalid\n");
> > + return -EINVAL;
> > + }
> > +
> > + if (epp == -EINVAL)
> > + epp = epp_values[pref_index];
> > +
> > + if (epp > 0 && cpudata->policy == CPUFREQ_POLICY_PERFORMANCE)
> {
> > + pr_debug("EPP cannot be set under performance policy\n");
> > + return -EBUSY;
> > + }
> > +
> > + ret = amd_pstate_set_epp(cpudata, epp);
> > +
> > + return ret;
> > +}
> > +
> > static inline int pstate_enable(bool enable) {
> > return wrmsrl_safe(MSR_AMD_CPPC_ENABLE, enable); @@ -70,11
> +187,21
> > @@ static inline int pstate_enable(bool enable) static int
> > cppc_enable(bool enable) {
> > int cpu, ret = 0;
> > + struct cppc_perf_ctrls perf_ctrls;
> >
> > for_each_present_cpu(cpu) {
> > ret = cppc_set_enable(cpu, enable);
> > if (ret)
> > return ret;
> > +
> > + /* Enable autonomous mode for EPP */
> > + if (!cppc_active) {
> > + /* Set desired perf as zero to allow EPP firmware
> control */
> > + perf_ctrls.desired_perf = 0;
> > + ret = cppc_set_perf(cpu, &perf_ctrls);
> > + if (ret)
> > + return ret;
> > + }
> > }
> >
> > return ret;
> > @@ -418,7 +545,7 @@ static void amd_pstate_boost_init(struct
> amd_cpudata *cpudata)
> > return;
> >
> > cpudata->boost_supported = true;
> > - amd_pstate_driver.boost_enabled = true;
> > + default_pstate_driver->boost_enabled = true;
> > }
> >
> > static void amd_perf_ctl_reset(unsigned int cpu) @@ -592,10 +719,61
> > @@ static ssize_t show_amd_pstate_highest_perf(struct cpufreq_policy
> *policy,
> > return sprintf(&buf[0], "%u\n", perf); }
> >
> > +static ssize_t show_energy_performance_available_preferences(
> > + struct cpufreq_policy *policy, char *buf) {
> > + int i = 0;
> > + int offset = 0;
> > +
> > + while (energy_perf_strings[i] != NULL)
> > + offset += sysfs_emit_at(buf, offset, "%s ",
> > +energy_perf_strings[i++]);
> > +
> > + sysfs_emit_at(buf, offset, "\n");
> > +
> > + return offset;
> > +}
> > +
> > +static ssize_t store_energy_performance_preference(
> > + struct cpufreq_policy *policy, const char *buf, size_t count) {
> > + struct amd_cpudata *cpudata = policy->driver_data;
> > + char str_preference[21];
> > + ssize_t ret;
> > +
> > + ret = sscanf(buf, "%20s", str_preference);
> > + if (ret != 1)
> > + return -EINVAL;
> > +
> > + ret = match_string(energy_perf_strings, -1, str_preference);
> > + if (ret < 0)
> > + return -EINVAL;
> > +
> > + mutex_lock(&amd_pstate_limits_lock);
> > + ret = amd_pstate_set_energy_pref_index(cpudata, ret);
> > + mutex_unlock(&amd_pstate_limits_lock);
> > +
> > + return ret ?: count;
> > +}
> > +
> > +static ssize_t show_energy_performance_preference(
> > + struct cpufreq_policy *policy, char *buf) {
> > + struct amd_cpudata *cpudata = policy->driver_data;
> > + int preference;
> > +
> > + preference = amd_pstate_get_energy_pref_index(cpudata);
> > + if (preference < 0)
> > + return preference;
> > +
> > + return sysfs_emit(buf, "%s\n", energy_perf_strings[preference]); }
> > +
> > cpufreq_freq_attr_ro(amd_pstate_max_freq);
> > cpufreq_freq_attr_ro(amd_pstate_lowest_nonlinear_freq);
> >
> > cpufreq_freq_attr_ro(amd_pstate_highest_perf);
> > +cpufreq_freq_attr_rw(energy_performance_preference);
> > +cpufreq_freq_attr_ro(energy_performance_available_preferences);
> >
> > static struct freq_attr *amd_pstate_attr[] = {
> > &amd_pstate_max_freq,
> > @@ -604,6 +782,424 @@ static struct freq_attr *amd_pstate_attr[] = {
> > NULL,
> > };
> >
> > +static struct freq_attr *amd_pstate_epp_attr[] = {
> > + &amd_pstate_max_freq,
> > + &amd_pstate_lowest_nonlinear_freq,
> > + &amd_pstate_highest_perf,
> > + &energy_performance_preference,
> > + &energy_performance_available_preferences,
> > + NULL,
> > +};
> > +
> > +static inline void update_boost_state(void) {
> > + u64 misc_en;
> > + struct amd_cpudata *cpudata;
> > +
> > + cpudata = all_cpu_data[0];
> > + rdmsrl(MSR_K7_HWCR, misc_en);
> > + global_params.cppc_boost_disabled = misc_en & BIT_ULL(25);
>
> I don't think we need to introduce the additional cppc_boost_disabled
> here. cpufreq_driver->boost_enabled and cpudata->boost_supported can
> manage this functionality.

cppc_boost_disabled marks whether PMFW core boost is disabled. If some other driver, for example a thermal or performance-limiting driver, disables core boost,
we need to update this flag so the driver knows boost is disabled.

* boost_supported is used to change the core frequency boost state.
* The EPP driver does not use the cpufreq core boost sysfs node, so boost_enabled is not used here.

>
> I believe it is a firmware issue that the legacy ACPI boost state
> impacts the CPPC frequency. Could you move this handling into the
> cpufreq_driver->set_boost callback and enable the boost state by default?
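> Something like this (untested sketch; assumes HWCR bit 25 is the CpbDis
> bit, as the update_boost_state() code above uses it):
>
> static int amd_pstate_set_boost(struct cpufreq_policy *policy, int state)
> {
> 	u64 misc_en;
>
> 	rdmsrl_on_cpu(policy->cpu, MSR_K7_HWCR, &misc_en);
> 	if (state)
> 		misc_en &= ~BIT_ULL(25);	/* clear CpbDis: boost on */
> 	else
> 		misc_en |= BIT_ULL(25);		/* set CpbDis: boost off */
> 	wrmsrl_on_cpu(policy->cpu, MSR_K7_HWCR, misc_en);
>
> 	return 0;
> }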
>
> > +}
> > +
> > +static bool amd_pstate_acpi_pm_profile_server(void)
> > +{
> > + if (acpi_gbl_FADT.preferred_profile == PM_ENTERPRISE_SERVER ||
> > + acpi_gbl_FADT.preferred_profile == PM_PERFORMANCE_SERVER)
> > + return true;
> > +
> > + return false;
> > +}
> > +
> > +static int amd_pstate_init_cpu(unsigned int cpunum) {
> > + struct amd_cpudata *cpudata;
> > +
> > + cpudata = all_cpu_data[cpunum];
> > + if (!cpudata) {
> > + cpudata = kzalloc(sizeof(*cpudata), GFP_KERNEL);
> > + if (!cpudata)
> > + return -ENOMEM;
> > + WRITE_ONCE(all_cpu_data[cpunum], cpudata);
> > +
> > + cpudata->cpu = cpunum;
> > +
> > + if (cppc_active) {
>
> The cppc_active flag is a bit confusing here: if we are running the
> amd-pstate driver at all, CPPC should be active. I know you want to
> indicate which driver mode is running; please use an enumeration type
> to mark the different modes, such as PASSIVE_MODE, ACTIVE_MODE, and
> GUIDED_MODE (as Wyes proposed).
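> For example:
>
> enum amd_pstate_mode {
> 	PASSIVE_MODE = 0,
> 	ACTIVE_MODE,
> 	GUIDED_MODE,
> };
>
> (sketch only; the exact names and whether an explicit "off" state is
> needed are up to Wyes' series)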

Aligned with Wyes; I added one new patch to support enumerated working modes in V8.


>
> > + if (amd_pstate_acpi_pm_profile_server())
> > + cppc_boost = true;
> > + }
> > +
> > + }
> > + cpudata->epp_powersave = -EINVAL;
> > + cpudata->epp_policy = 0;
> > + pr_debug("controlling: cpu %d\n", cpunum);
> > + return 0;
> > +}
> > +
> > +static int __amd_pstate_cpu_init(struct cpufreq_policy *policy) {
> > + int min_freq, max_freq, nominal_freq, lowest_nonlinear_freq, ret;
> > + struct amd_cpudata *cpudata;
> > + struct device *dev;
> > + int rc;
> > + u64 value;
> > +
> > + rc = amd_pstate_init_cpu(policy->cpu);
> > + if (rc)
> > + return rc;
> > +
> > + cpudata = all_cpu_data[policy->cpu];
> > +
> > + dev = get_cpu_device(policy->cpu);
> > + if (!dev)
> > + goto free_cpudata1;
> > +
> > + rc = amd_pstate_init_perf(cpudata);
> > + if (rc)
> > + goto free_cpudata1;
> > +
> > + min_freq = amd_get_min_freq(cpudata);
> > + max_freq = amd_get_max_freq(cpudata);
> > + nominal_freq = amd_get_nominal_freq(cpudata);
> > + lowest_nonlinear_freq = amd_get_lowest_nonlinear_freq(cpudata);
> > + if (min_freq < 0 || max_freq < 0 || min_freq > max_freq) {
> > + dev_err(dev, "min_freq(%d) or max_freq(%d) value is incorrect\n",
> > + min_freq, max_freq);
> > + ret = -EINVAL;
> > + goto free_cpudata1;
> > + }
> > +
> > + policy->min = min_freq;
> > + policy->max = max_freq;
> > +
> > + policy->cpuinfo.min_freq = min_freq;
> > + policy->cpuinfo.max_freq = max_freq;
> > + /* It will be updated by governor */
> > + policy->cur = policy->cpuinfo.min_freq;
> > +
> > + /* Initial processor data capability frequencies */
> > + cpudata->max_freq = max_freq;
> > + cpudata->min_freq = min_freq;
> > + cpudata->nominal_freq = nominal_freq;
> > + cpudata->lowest_nonlinear_freq = lowest_nonlinear_freq;
> > +
> > + policy->driver_data = cpudata;
> > +
> > + update_boost_state();
> > + cpudata->epp_cached = amd_pstate_get_epp(cpudata, value);
> > +
> > + policy->min = policy->cpuinfo.min_freq;
> > + policy->max = policy->cpuinfo.max_freq;
> > +
> > + if (boot_cpu_has(X86_FEATURE_CPPC))
> > + policy->fast_switch_possible = true;
>
> Please move this line into below if-case.


Fixed in V8.

>
> > +
> > + if (boot_cpu_has(X86_FEATURE_CPPC)) {
> > + ret = rdmsrl_on_cpu(cpudata->cpu, MSR_AMD_CPPC_REQ, &value);
> > + if (ret)
> > + return ret;
> > + WRITE_ONCE(cpudata->cppc_req_cached, value);
> > +
> > + ret = rdmsrl_on_cpu(cpudata->cpu, MSR_AMD_CPPC_CAP1, &value);
> > + if (ret)
> > + return ret;
> > + WRITE_ONCE(cpudata->cppc_cap1_cached, value);
> > + }
> > + amd_pstate_boost_init(cpudata);
> > +
> > + return 0;
> > +
> > +free_cpudata1:
> > + kfree(cpudata);
> > + return ret;
> > +}
> > +
> > +static int amd_pstate_epp_cpu_init(struct cpufreq_policy *policy) {
> > + int ret;
> > +
> > + ret = __amd_pstate_cpu_init(policy);
>
> I don't see any reason why we need to define a __amd_pstate_cpu_init()
> here. The Intel P-State driver's __intel_pstate_cpu_init() is used by both
> intel_pstate_cpu_init and intel_cpufreq_cpu_init.

Fixed in V8.

>
> > + if (ret)
> > + return ret;
> > + /*
> > + * Set the policy to powersave to provide a valid fallback value in
> case
> > + * the default cpufreq governor is neither powersave nor
> performance.
> > + */
> > + policy->policy = CPUFREQ_POLICY_POWERSAVE;
> > +
> > + return 0;
> > +}
> > +
> > +static int amd_pstate_epp_cpu_exit(struct cpufreq_policy *policy) {
> > + pr_debug("CPU %d exiting\n", policy->cpu);
> > + policy->fast_switch_possible = false;
> > + return 0;
> > +}
> > +
> > +static void amd_pstate_update_max_freq(unsigned int cpu)
>
> Why do you name this function "update max frequency"?
>
> We won't have the different cpudata->pstate.max_freq and
> cpudata->pstate.turbo_freq on Intel P-State driver.
>
> I think in fact we don't update anything here.

When core frequency boost is disabled, the function will update the frequency limits.
Currently the boost sysfs node is not added, so the max frequency is not changed.
Could we keep this code for the coming patch that adds the sysfs node for boost control?
It does no harm to the EPP driver.

>
> > +{
> > + struct cpufreq_policy *policy = policy = cpufreq_cpu_get(cpu);
>
> struct cpufreq_policy *policy = cpufreq_cpu_get(cpu);
>
> > +
> > + if (!policy)
> > + return;
> > +
> > + refresh_frequency_limits(policy);
> > + cpufreq_cpu_put(policy);
> > +}
> > +
> > +static void amd_pstate_epp_update_limits(unsigned int cpu) {
> > + mutex_lock(&amd_pstate_driver_lock);
> > + update_boost_state();
> > + if (global_params.cppc_boost_disabled) {
> > + for_each_possible_cpu(cpu)
> > + amd_pstate_update_max_freq(cpu);
>
> This should do nothing in the amd-pstate.

Currently the boost sysfs node is not added, so the max frequency is not changed.
Could we keep this code for the coming patch that adds the sysfs node for boost control?
It does no harm to the EPP driver.

>
> > + } else {
> > + cpufreq_update_policy(cpu);
> > + }
> > + mutex_unlock(&amd_pstate_driver_lock);
> > +}
> > +
> > +static int cppc_boost_hold_time_ns = 3 * NSEC_PER_MSEC;
> > +
> > +static inline void amd_pstate_boost_up(struct amd_cpudata *cpudata) {
> > + u64 hwp_req = READ_ONCE(cpudata->cppc_req_cached);
> > + u64 hwp_cap = READ_ONCE(cpudata->cppc_cap1_cached);
> > + u32 max_limit = (hwp_req & 0xff);
> > + u32 min_limit = (hwp_req & 0xff00) >> 8;
>
> We can use cpudata->max_perf and cpudata->min_perf directly.

Iowait boost code removed from V8.

>
> > + u32 boost_level1;
> > +
> > + /* If max and min are equal or already at max, nothing to boost */
>
> I believe this is the only case that max_perf == min_perf, not at max.

Iowait boost code removed from V8.

>
> > + if (max_limit == min_limit)
> > + return;
> > +
> > + /* Set boost max and min to initial value */
> > + if (!cpudata->cppc_boost_min)
> > + cpudata->cppc_boost_min = min_limit;
> > +
> > + boost_level1 = ((AMD_CPPC_NOMINAL_PERF(hwp_cap) + min_limit) >> 1);
> > +
> > + if (cpudata->cppc_boost_min < boost_level1)
> > + cpudata->cppc_boost_min = boost_level1;
> > + else if (cpudata->cppc_boost_min < AMD_CPPC_NOMINAL_PERF(hwp_cap))
> > + cpudata->cppc_boost_min = AMD_CPPC_NOMINAL_PERF(hwp_cap);
> > + else if (cpudata->cppc_boost_min == AMD_CPPC_NOMINAL_PERF(hwp_cap))
> > + cpudata->cppc_boost_min = max_limit;
> > + else
> > + return;
>
> Could you please elaborate the reason that you separate the min_perf
> (cppc_boost_min) you would like to update into cppc_req register as these
> different cases? Why we pick up these cases such as (min + nominal)/2, and
> around nominal? What's the help to optimize the final result? - I am thinking
> the autonomous mode is handled by SMU firmware, we need to provide
> some data that let us know it influences the final result.
>

Iowait boost code removed from V8.

> > +
> > + hwp_req &= ~AMD_CPPC_MIN_PERF(~0L);
> > + hwp_req |= AMD_CPPC_MIN_PERF(cpudata->cppc_boost_min);
> > + wrmsrl(MSR_AMD_CPPC_REQ, hwp_req);
>
> Do we need an update for shared-memory processors? In other words, EPP is
> also supported on shared-memory processors. Again, we should use a static
> call to handle the MSR and cppc_acpi stuff.
>
> > + cpudata->last_update = cpudata->sample.time;
> > +}
> > +
> > +static inline void amd_pstate_boost_down(struct amd_cpudata *cpudata)
> > +{
> > + bool expired;
> > +
> > + if (cpudata->cppc_boost_min) {
> > + expired = time_after64(cpudata->sample.time, cpudata->last_update +
> > + cppc_boost_hold_time_ns);
> > +
> > + if (expired) {
> > + wrmsrl(MSR_AMD_CPPC_REQ, cpudata->cppc_req_cached);
> > + cpudata->cppc_boost_min = 0;
> > + }
> > + }
> > +
> > + cpudata->last_update = cpudata->sample.time;
> > +}
> > +
> > +static inline void amd_pstate_boost_update_util(struct amd_cpudata *cpudata,
> > + u64 time)
> > +{
> > + cpudata->sample.time = time;
> > + if (smp_processor_id() != cpudata->cpu)
> > + return;
> > +
> > + if (cpudata->sched_flags & SCHED_CPUFREQ_IOWAIT) {
> > + bool do_io = false;
> > +
> > + cpudata->sched_flags = 0;
> > + /*
> > + * Set iowait_boost flag and update time. Since IO WAIT flag
> > + * is set all the time, we can't just conclude that there is
> > + * some IO bound activity is scheduled on this CPU with just
> > + * one occurrence. If we receive at least two in two
> > + * consecutive ticks, then we treat as boost candidate.
> > + * This is leveraged from Intel Pstate driver.
>
> I would like to know whether we can hit this case as well. If we can find or
> create a use case that hits it on our platforms, I am fine with adding it to our
> driver too. If not, I don't suggest we add it at this moment. I hope we have
> verified each code path once we add them into the driver.

Sure, no problem.
Iowait boost code removed from V8.


>
> > + */
> > + if (time_before64(time, cpudata->last_io_update + 2 * TICK_NSEC))
> > + do_io = true;
> > +
> > + cpudata->last_io_update = time;
> > +
> > + if (do_io)
> > + amd_pstate_boost_up(cpudata);
> > +
> > + } else {
> > + amd_pstate_boost_down(cpudata);
> > + }
> > +}
> > +
> > +static inline void amd_pstate_cppc_update_hook(struct update_util_data *data,
> > + u64 time, unsigned int flags)
> > +{
> > + struct amd_cpudata *cpudata = container_of(data,
> > + struct amd_cpudata, update_util);
> > +
> > + cpudata->sched_flags |= flags;
> > +
> > + if (smp_processor_id() == cpudata->cpu)
> > + amd_pstate_boost_update_util(cpudata, time);
> > +}
> > +
> > +static void amd_pstate_clear_update_util_hook(unsigned int cpu) {
> > + struct amd_cpudata *cpudata = all_cpu_data[cpu];
> > +
> > + if (!cpudata->update_util_set)
> > + return;
> > +
> > + cpufreq_remove_update_util_hook(cpu);
> > + cpudata->update_util_set = false;
> > + synchronize_rcu();
> > +}
> > +
> > +static void amd_pstate_set_update_util_hook(unsigned int cpu_num) {
> > + struct amd_cpudata *cpudata = all_cpu_data[cpu_num];
> > +
> > + if (!cppc_boost) {
> > + if (cpudata->update_util_set)
> > + amd_pstate_clear_update_util_hook(cpudata->cpu);
> > + return;
> > + }
> > +
> > + if (cpudata->update_util_set)
> > + return;
> > +
> > + cpudata->sample.time = 0;
> > + cpufreq_add_update_util_hook(cpu_num, &cpudata->update_util,
> > + amd_pstate_cppc_update_hook);
> > + cpudata->update_util_set = true;
> > +}
> > +
> > +static void amd_pstate_epp_init(unsigned int cpu) {
> > + struct amd_cpudata *cpudata = all_cpu_data[cpu];
> > + u32 max_perf, min_perf;
> > + u64 value;
> > + s16 epp;
> > +
> > + max_perf = READ_ONCE(cpudata->highest_perf);
> > + min_perf = READ_ONCE(cpudata->lowest_perf);
> > +
> > + value = READ_ONCE(cpudata->cppc_req_cached);
> > +
> > + if (cpudata->policy == CPUFREQ_POLICY_PERFORMANCE)
> > + min_perf = max_perf;
> > +
> > + /* Initial min/max values for CPPC Performance Controls Register */
> > + value &= ~AMD_CPPC_MIN_PERF(~0L);
> > + value |= AMD_CPPC_MIN_PERF(min_perf);
> > +
> > + value &= ~AMD_CPPC_MAX_PERF(~0L);
> > + value |= AMD_CPPC_MAX_PERF(max_perf);
> > +
> > + /* CPPC EPP feature require to set zero to the desire perf bit */
> > + value &= ~AMD_CPPC_DES_PERF(~0L);
> > + value |= AMD_CPPC_DES_PERF(0);
> > +
> > + if (cpudata->epp_policy == cpudata->policy)
> > + goto skip_epp;
> > +
> > + cpudata->epp_policy = cpudata->policy;
> > +
> > + if (cpudata->policy == CPUFREQ_POLICY_PERFORMANCE) {
> > + epp = amd_pstate_get_epp(cpudata, value);
> > + cpudata->epp_powersave = epp;
>
> I didn't see where we should have epp_powersave here. Only initial this, but
> it won't be used anywhere.

epp_powersave var was removed from V8.


>
> > + if (epp < 0)
> > + goto skip_epp;
> > + /* force the epp value to be zero for performance policy */
> > + epp = 0;
> > + } else {
> > + if (cpudata->epp_powersave < 0)
> > + goto skip_epp;
> > + /* Get BIOS pre-defined epp value */
> > + epp = amd_pstate_get_epp(cpudata, value);
> > + if (epp)
> > + goto skip_epp;
> > + epp = cpudata->epp_powersave;
> > + }
> > + /* Set initial EPP value */
> > + if (boot_cpu_has(X86_FEATURE_CPPC)) {
> > + value &= ~GENMASK_ULL(31, 24);
> > + value |= (u64)epp << 24;
> > + }
> > +
> > +skip_epp:
> > + WRITE_ONCE(cpudata->cppc_req_cached, value);
> > + amd_pstate_set_epp(cpudata, epp);
> > +}
> > +
> > +static void amd_pstate_set_max_limits(struct amd_cpudata *cpudata) {
> > + u64 hwp_cap = READ_ONCE(cpudata->cppc_cap1_cached);
> > + u64 hwp_req = READ_ONCE(cpudata->cppc_req_cached);
> > + u32 max_limit = (hwp_cap >> 24) & 0xff;
> > +
> > + hwp_req &= ~AMD_CPPC_MIN_PERF(~0L);
> > + hwp_req |= AMD_CPPC_MIN_PERF(max_limit);
> > + wrmsrl_on_cpu(cpudata->cpu, MSR_AMD_CPPC_REQ, hwp_req);
> > +}
> > +
> > +static int amd_pstate_epp_set_policy(struct cpufreq_policy *policy) {
> > + struct amd_cpudata *cpudata;
> > +
> > + if (!policy->cpuinfo.max_freq)
> > + return -ENODEV;
> > +
> > + pr_debug("set_policy: cpuinfo.max %u policy->max %u\n",
> > + policy->cpuinfo.max_freq, policy->max);
> > +
> > + cpudata = all_cpu_data[policy->cpu];
> > + cpudata->policy = policy->policy;
> > +
> > + if (boot_cpu_has(X86_FEATURE_CPPC)) {
> > + mutex_lock(&amd_pstate_limits_lock);
> > +
> > + if (cpudata->policy == CPUFREQ_POLICY_PERFORMANCE) {
> > + amd_pstate_clear_update_util_hook(policy->cpu);
> > + amd_pstate_set_max_limits(cpudata);
> > + } else {
> > + amd_pstate_set_update_util_hook(policy->cpu);
> > + }
> > +
> > + mutex_unlock(&amd_pstate_limits_lock);
> > + }
> > + amd_pstate_epp_init(policy->cpu);
> > +
> > + return 0;
> > +}
> > +
> > +static void amd_pstate_verify_cpu_policy(struct amd_cpudata *cpudata,
> > + struct cpufreq_policy_data *policy)
> > +{
> > + update_boost_state();
> > + cpufreq_verify_within_cpu_limits(policy);
> > +}
> > +
> > +static int amd_pstate_epp_verify_policy(struct cpufreq_policy_data *policy)
> > +{
> > + amd_pstate_verify_cpu_policy(all_cpu_data[policy->cpu], policy);
> > + pr_debug("policy_max =%d, policy_min=%d\n", policy->max, policy-
> >min);
> > + return 0;
> > +}
>
> amd_pstate_verify_cpu_policy and amd_pstate_epp_verify_policy can be
> squeezed in one function.

Fixed in V8.

>
> > +
> > static struct cpufreq_driver amd_pstate_driver = {
> > .flags = CPUFREQ_CONST_LOOPS |
> CPUFREQ_NEED_UPDATE_LIMITS,
> > .verify = amd_pstate_verify,
> > @@ -617,8 +1213,20 @@ static struct cpufreq_driver amd_pstate_driver = {
> > .attr = amd_pstate_attr,
> > };
> >
> > +static struct cpufreq_driver amd_pstate_epp_driver = {
> > + .flags = CPUFREQ_CONST_LOOPS,
> > + .verify = amd_pstate_epp_verify_policy,
> > + .setpolicy = amd_pstate_epp_set_policy,
> > + .init = amd_pstate_epp_cpu_init,
> > + .exit = amd_pstate_epp_cpu_exit,
> > + .update_limits = amd_pstate_epp_update_limits,
> > + .name = "amd_pstate_epp",
> > + .attr = amd_pstate_epp_attr,
> > +};
> > +
> > static int __init amd_pstate_init(void) {
> > + static struct amd_cpudata **cpudata;
> > int ret;
> >
> > if (boot_cpu_data.x86_vendor != X86_VENDOR_AMD)
> > @@ -645,7 +1253,8 @@ static int __init amd_pstate_init(void)
> > /* capability check */
> > if (boot_cpu_has(X86_FEATURE_CPPC)) {
> > pr_debug("AMD CPPC MSR based functionality is
> supported\n");
> > - amd_pstate_driver.adjust_perf = amd_pstate_adjust_perf;
> > + if (!cppc_active)
> > + default_pstate_driver->adjust_perf = amd_pstate_adjust_perf;
> > } else {
> > pr_debug("AMD CPPC shared memory based functionality is
> supported\n");
> > static_call_update(amd_pstate_enable, cppc_enable);
> > @@ -653,6 +1262,10 @@ static int __init amd_pstate_init(void)
> > static_call_update(amd_pstate_update_perf, cppc_update_perf);
> > }
> >
> > + cpudata = vzalloc(array_size(sizeof(void *), num_possible_cpus()));
> > + if (!cpudata)
> > + return -ENOMEM;
> > + WRITE_ONCE(all_cpu_data, cpudata);
>
> Why can't we use cpufreq_policy->driver_data to store the cpudata? I
> believe the cpudata is per-CPU and can easily be obtained from the private data.

cpufreq_policy->driver_data can do that, but global data can have a better cache hit rate, especially
on server clusters.
So I would prefer to use the global all_cpu_data array to store each CPU's data structure.
Could we keep it for the EPP driver?
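
(A minimal sketch of the two lookup styles under discussion, for reference; not code from the patch:)

	/* global-array lookup, as used in this series */
	struct amd_cpudata *cpudata = all_cpu_data[policy->cpu];

	/* per-policy lookup, as Ray suggests */
	struct amd_cpudata *cpudata = policy->driver_data;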


>
> > /* enable amd pstate feature */
> > ret = amd_pstate_enable(true);
> > if (ret) {
> > @@ -660,9 +1273,9 @@ static int __init amd_pstate_init(void)
> > return ret;
> > }
> >
> > - ret = cpufreq_register_driver(&amd_pstate_driver);
> > + ret = cpufreq_register_driver(default_pstate_driver);
> > if (ret)
> > - pr_err("failed to register amd_pstate_driver with
> return %d\n",
> > + pr_err("failed to register amd pstate driver with
> return %d\n",
> > ret);
> >
> > return ret;
> > @@ -677,8 +1290,14 @@ static int __init amd_pstate_param(char *str)
> > if (!strcmp(str, "disable")) {
> > cppc_load = 0;
> > pr_info("driver is explicitly disabled\n");
> > - } else if (!strcmp(str, "passive"))
> > + } else if (!strcmp(str, "passive")) {
> > cppc_load = 1;
> > + default_pstate_driver = &amd_pstate_driver;
> > + } else if (!strcmp(str, "active")) {
> > + cppc_active = 1;
> > + cppc_load = 1;
> > + default_pstate_driver = &amd_pstate_epp_driver;
> > + }
> >
> > return 0;
> > }
> > diff --git a/include/linux/amd-pstate.h b/include/linux/amd-pstate.h
> > index 1c4b8659f171..888af62040f1 100644
> > --- a/include/linux/amd-pstate.h
> > +++ b/include/linux/amd-pstate.h
> > @@ -25,6 +25,7 @@ struct amd_aperf_mperf {
> > u64 aperf;
> > u64 mperf;
> > u64 tsc;
> > + u64 time;
> > };
> >
> > /**
> > @@ -47,6 +48,18 @@ struct amd_aperf_mperf {
> > * @prev: Last Aperf/Mperf/tsc count value read from register
> > * @freq: current cpu frequency value
> > * @boost_supported: check whether the Processor or SBIOS supports boost mode
> > + * @epp_powersave: Last saved CPPC energy performance preference when policy switched to performance
> > + * @epp_policy: Last saved policy used to set energy-performance preference
> > + * @epp_cached: Cached CPPC energy-performance preference value
> > + * @policy: Cpufreq policy value
> > + * @sched_flags: Store scheduler flags for possible cross CPU update
> > + * @update_util_set: CPUFreq utility callback is set
> > + * @last_update: Time stamp of the last performance state update
> > + * @cppc_boost_min: Last CPPC boosted min performance state
> > + * @cppc_cap1_cached: Cached value of the last CPPC Capabilities MSR
> > + * @update_util: Cpufreq utility callback information
> > + * @sample: the stored performance sample
> > *
> > * The amd_cpudata is key private data for each CPU thread in AMD P-State, and
> > * represents all the attributes and goals that AMD P-State requests at runtime.
> > @@ -72,6 +85,28 @@ struct amd_cpudata {
> >
> > u64 freq;
> > bool boost_supported;
> > +
> > + /* EPP feature related attributes*/
> > + s16 epp_powersave;
> > + s16 epp_policy;
> > + s16 epp_cached;
> > + u32 policy;
> > + u32 sched_flags;
> > + bool update_util_set;
> > + u64 last_update;
> > + u64 last_io_update;
> > + u32 cppc_boost_min;
> > + u64 cppc_cap1_cached;
> > + struct update_util_data update_util;
> > + struct amd_aperf_mperf sample;
> > +};
> > +
> > +/**
> > + * struct amd_pstate_params - global parameters for the performance control
> > + * @cppc_boost_disabled: whether the core performance boost is disabled
> > + */
> > +struct amd_pstate_params {
> > + bool cppc_boost_disabled;
> > };
>
> This should not be defined in include/linux/amd-pstate.h, because it's only
> used in amd-pstate.c.

Moved into amd-pstate.c; fixed in V8.

>
> Thanks,
> Ray

2022-12-19 11:41:56

by Yuan, Perry

[permalink] [raw]
Subject: RE: [PATCH v7 06/13] cpufreq: amd-pstate: implement amd pstate cpu online and offline callback




> -----Original Message-----
> From: Huang, Ray <[email protected]>
> Sent: Monday, December 12, 2022 5:02 PM
> To: Yuan, Perry <[email protected]>
> Cc: [email protected]; Limonciello, Mario
> <[email protected]>; [email protected]; Sharma, Deepak
> <[email protected]>; Fontenot, Nathan
> <[email protected]>; Deucher, Alexander
> <[email protected]>; Huang, Shimmer
> <[email protected]>; Du, Xiaojian <[email protected]>; Meng,
> Li (Jassmine) <[email protected]>; Karny, Wyes <[email protected]>;
> [email protected]; [email protected]
> Subject: Re: [PATCH v7 06/13] cpufreq: amd-pstate: implement amd pstate
> cpu online and offline callback
>
> On Thu, Dec 08, 2022 at 07:18:45PM +0800, Yuan, Perry wrote:
> > From: Perry Yuan <[email protected]>
> >
> > Add online and offline driver callback support to allow CPU cores to go
> > offline and to help restore the previous working states when a core goes
> > back online later, for the EPP driver mode.
> >
> > Signed-off-by: Perry Yuan <[email protected]>
> > ---
> > drivers/cpufreq/amd-pstate.c | 89 ++++++++++++++++++++++++++++++++++++
> > include/linux/amd-pstate.h | 1 +
> > 2 files changed, 90 insertions(+)
> >
> > diff --git a/drivers/cpufreq/amd-pstate.c b/drivers/cpufreq/amd-pstate.c
> > index 0a521be1be8a..412accab7bda 100644
> > --- a/drivers/cpufreq/amd-pstate.c
> > +++ b/drivers/cpufreq/amd-pstate.c
> > @@ -1186,6 +1186,93 @@ static int amd_pstate_epp_set_policy(struct cpufreq_policy *policy)
> > return 0;
> > }
> >
> > +static void amd_pstate_epp_reenable(struct amd_cpudata *cpudata) {
> > + struct cppc_perf_ctrls perf_ctrls;
> > + u64 value, max_perf;
> > + int ret;
> > +
> > + ret = amd_pstate_enable(true);
> > + if (ret)
> > + pr_err("failed to enable amd pstate during resume,
> return %d\n",
> > +ret);
> > +
> > + value = READ_ONCE(cpudata->cppc_req_cached);
> > + max_perf = READ_ONCE(cpudata->highest_perf);
> > +
> > + if (boot_cpu_has(X86_FEATURE_CPPC)) {
> > + wrmsrl_on_cpu(cpudata->cpu, MSR_AMD_CPPC_REQ, value);
> > + } else {
> > + perf_ctrls.max_perf = max_perf;
> > + perf_ctrls.energy_perf = AMD_CPPC_ENERGY_PERF_PREF(cpudata->epp_cached);
> > + cppc_set_perf(cpudata->cpu, &perf_ctrls);
> > + }
> > +}
> > +
> > +static int amd_pstate_epp_cpu_online(struct cpufreq_policy *policy) {
> > + struct amd_cpudata *cpudata = all_cpu_data[policy->cpu];
> > +
> > + pr_debug("AMD CPU Core %d going online\n", cpudata->cpu);
> > +
> > + if (cppc_active) {
> > + amd_pstate_epp_reenable(cpudata);
> > + cpudata->suspended = false;
> > + }
> > +
> > + return 0;
> > +}
> > +
> > +static void amd_pstate_epp_offline(struct cpufreq_policy *policy) {
> > + struct amd_cpudata *cpudata = all_cpu_data[policy->cpu];
> > + struct cppc_perf_ctrls perf_ctrls;
> > + int min_perf;
> > + u64 value;
> > +
> > + min_perf = READ_ONCE(cpudata->lowest_perf);
> > + value = READ_ONCE(cpudata->cppc_req_cached);
> > +
> > + mutex_lock(&amd_pstate_limits_lock);
> > + if (boot_cpu_has(X86_FEATURE_CPPC)) {
> > + cpudata->epp_policy = CPUFREQ_POLICY_UNKNOWN;
> > +
> > + /* Set max perf same as min perf */
> > + value &= ~AMD_CPPC_MAX_PERF(~0L);
> > + value |= AMD_CPPC_MAX_PERF(min_perf);
> > + value &= ~AMD_CPPC_MIN_PERF(~0L);
> > + value |= AMD_CPPC_MIN_PERF(min_perf);
> > + wrmsrl_on_cpu(cpudata->cpu, MSR_AMD_CPPC_REQ, value);
> > + } else {
> > + perf_ctrls.desired_perf = 0;
> > + perf_ctrls.max_perf = min_perf;
> > + perf_ctrls.energy_perf = AMD_CPPC_ENERGY_PERF_PREF(HWP_EPP_POWERSAVE);
> > + cppc_set_perf(cpudata->cpu, &perf_ctrls);
> > + }
>
> Could you double-confirm whether these registers will be cleared or
> modified while the CPU cores enter/exit online/offline? I remember Joe ran
> a test before; the register values were preserved even when a core went back to idle/offline.
>
> Thanks,
> Ray

We cannot guarantee that the MSR values are unchanged after suspend for every BIOS/CPU combination,
but we can save and restore the MSR contents across suspend/resume.
So to be safe, saving the register so that it can be restored is the safer approach here.

The key point is that we need to set the core to its lowest perf when it goes offline, for power saving,
and restore the MSR value after it comes back online.
That is the reason why the driver saves/restores the MSR here.
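
As an illustration, one way to spot-check the MSR across an offline/online cycle from user space (assuming msr-tools and the msr module are available; 0xc00102b3 is MSR_AMD_CPPC_REQ per msr-index.h):

	# rdmsr -p 2 0xc00102b3
	# echo 0 > /sys/devices/system/cpu/cpu2/online
	# echo 1 > /sys/devices/system/cpu/cpu2/online
	# rdmsr -p 2 0xc00102b3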


>
> > + mutex_unlock(&amd_pstate_limits_lock);
> > +}
> > +
> > +static int amd_pstate_cpu_offline(struct cpufreq_policy *policy) {
> > + struct amd_cpudata *cpudata = all_cpu_data[policy->cpu];
> > +
> > + pr_debug("AMD CPU Core %d going offline\n", cpudata->cpu);
> > +
> > + if (cpudata->suspended)
> > + return 0;
> > +
> > + if (cppc_active)
> > + amd_pstate_epp_offline(policy);
> > +
> > + return 0;
> > +}
> > +
> > +static int amd_pstate_epp_cpu_offline(struct cpufreq_policy *policy)
> > +{
> > + amd_pstate_clear_update_util_hook(policy->cpu);
> > +
> > + return amd_pstate_cpu_offline(policy);
> > +}
> > +
> > static void amd_pstate_verify_cpu_policy(struct amd_cpudata *cpudata,
> > struct cpufreq_policy_data *policy)
> > {
> > @@ -1220,6 +1307,8 @@ static struct cpufreq_driver amd_pstate_epp_driver = {
> > .init = amd_pstate_epp_cpu_init,
> > .exit = amd_pstate_epp_cpu_exit,
> > .update_limits = amd_pstate_epp_update_limits,
> > + .offline = amd_pstate_epp_cpu_offline,
> > + .online = amd_pstate_epp_cpu_online,
> > .name = "amd_pstate_epp",
> > .attr = amd_pstate_epp_attr,
> > };
> > diff --git a/include/linux/amd-pstate.h b/include/linux/amd-pstate.h
> > index 888af62040f1..3dd26a3d104c 100644
> > --- a/include/linux/amd-pstate.h
> > +++ b/include/linux/amd-pstate.h
> > @@ -99,6 +99,7 @@ struct amd_cpudata {
> > u64 cppc_cap1_cached;
> > struct update_util_data update_util;
> > struct amd_aperf_mperf sample;
> > + bool suspended;
> > };
> >
> > /**
> > --
> > 2.34.1
> >

2022-12-19 11:43:38

by Yuan, Perry

[permalink] [raw]
Subject: RE: [PATCH v7 09/13] cpufreq: amd-pstate: add driver working mode status sysfs entry




> -----Original Message-----
> From: Huang, Ray <[email protected]>
> Sent: Monday, December 12, 2022 6:24 PM
> To: Yuan, Perry <[email protected]>
> Cc: [email protected]; Limonciello, Mario
> <[email protected]>; [email protected]; Sharma, Deepak
> <[email protected]>; Fontenot, Nathan
> <[email protected]>; Deucher, Alexander
> <[email protected]>; Huang, Shimmer
> <[email protected]>; Du, Xiaojian <[email protected]>; Meng,
> Li (Jassmine) <[email protected]>; Karny, Wyes <[email protected]>;
> [email protected]; [email protected]
> Subject: Re: [PATCH v7 09/13] cpufreq: amd-pstate: add driver working
> mode status sysfs entry
>
> On Thu, Dec 08, 2022 at 07:18:48PM +0800, Yuan, Perry wrote:
> > From: Perry Yuan <[email protected]>
> >
> > While the amd-pstate driver is loaded in a specific driver mode, users
> > will need to check which mode is enabled for the pstate driver; add this
> > sysfs entry to show the current status:
> >
> > $ cat /sys/devices/system/cpu/amd-pstate/status
> > active
> >
> > Meanwhile, user can switch the pstate driver mode with writing mode
> > string to sysfs entry as below.
> >
> > Enable passive mode:
> > $ sudo bash -c "echo passive > /sys/devices/system/cpu/amd-
> pstate/status"
> >
> > Enable active mode (EPP driver mode):
> > $ sudo bash -c "echo active > /sys/devices/system/cpu/amd-pstate/status"
> >
> > Signed-off-by: Perry Yuan <[email protected]>
>
> I believe you should align with Wyes and send out a unified state or status API to
> indicate the state machine of the different working modes for the AMD P-State
> driver, including the active and guided modes.
>
> Thanks,
> Ray
>

Aligned with Wyes; I added one new working-mode switch support patch in V8, covering the EPP (active), guided, and passive modes.
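
(A sketch of the resulting switch flow with the V8 sysfs interface; the mode strings follow the quoted handler, and the guided naming is an assumption:)

	$ cat /sys/devices/system/cpu/amd-pstate/status
	active
	$ sudo bash -c "echo passive > /sys/devices/system/cpu/amd-pstate/status"
	$ cat /sys/devices/system/cpu/amd-pstate/status
	passive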

> > ---
> > drivers/cpufreq/amd-pstate.c | 101 +++++++++++++++++++++++++++++++++++
> > 1 file changed, 101 insertions(+)
> >
> > diff --git a/drivers/cpufreq/amd-pstate.c b/drivers/cpufreq/amd-pstate.c
> > index 4cd53c010215..c90aee3ee42d 100644
> > --- a/drivers/cpufreq/amd-pstate.c
> > +++ b/drivers/cpufreq/amd-pstate.c
> > @@ -64,6 +64,8 @@
> > static bool cppc_active;
> > static int cppc_load __initdata;
> >
> > static struct cpufreq_driver *default_pstate_driver;
> > +static struct cpufreq_driver amd_pstate_epp_driver;
> > +static struct cpufreq_driver amd_pstate_driver;
> > static struct amd_cpudata **all_cpu_data;
> > static struct amd_pstate_params global_params;
> >
> > @@ -72,6 +74,7 @@ static DEFINE_MUTEX(amd_pstate_driver_lock);
> > struct kobject *amd_pstate_kobj;
> >
> > static bool cppc_boost __read_mostly;
> > +static DEFINE_SPINLOCK(cppc_notify_lock);
> >
> > static s16 amd_pstate_get_epp(struct amd_cpudata *cpudata, u64 cppc_req_cached)
> > {
> > @@ -629,6 +632,8 @@ static int amd_pstate_cpu_init(struct cpufreq_policy *policy)
> > policy->driver_data = cpudata;
> >
> > amd_pstate_boost_init(cpudata);
> > + if (!default_pstate_driver->adjust_perf)
> > + default_pstate_driver->adjust_perf = amd_pstate_adjust_perf;
> >
> > return 0;
> >
> > @@ -802,6 +807,100 @@ static ssize_t store_cppc_dynamic_boost(struct kobject *a,
> > return count;
> > }
> >
> > +static ssize_t amd_pstate_show_status(char *buf) {
> > + if (!default_pstate_driver)
> > + return sysfs_emit(buf, "off\n");
> > +
> > + return sysfs_emit(buf, "%s\n", default_pstate_driver ==
> &amd_pstate_epp_driver ?
> > + "active" : "passive");
> > +}
> > +
> > +static void amd_pstate_clear_update_util_hook(unsigned int cpu);
> > +static void amd_pstate_driver_cleanup(void) {
> > + unsigned int cpu;
> > +
> > + cpus_read_lock();
> > + for_each_online_cpu(cpu) {
> > + if (all_cpu_data[cpu]) {
> > + if (default_pstate_driver == &amd_pstate_epp_driver)
> > + amd_pstate_clear_update_util_hook(cpu);
> > +
> > + spin_lock(&cppc_notify_lock);
> > + kfree(all_cpu_data[cpu]);
> > + WRITE_ONCE(all_cpu_data[cpu], NULL);
> > + spin_unlock(&cppc_notify_lock);
> > + }
> > + }
> > + cpus_read_unlock();
> > +
> > + default_pstate_driver = NULL;
> > +}
> > +
> > +static int amd_pstate_update_status(const char *buf, size_t size) {
> > + if (size == 3 && !strncmp(buf, "off", size)) {
> > + if (!default_pstate_driver)
> > + return -EINVAL;
> > +
> > + if (cppc_active)
> > + return -EBUSY;
> > +
> > + cpufreq_unregister_driver(default_pstate_driver);
> > + amd_pstate_driver_cleanup();
> > + return 0;
> > + }
> > +
> > + if (size == 6 && !strncmp(buf, "active", size)) {
> > + if (default_pstate_driver) {
> > + if (default_pstate_driver == &amd_pstate_epp_driver)
> > + return 0;
> > + cpufreq_unregister_driver(default_pstate_driver);
> > + default_pstate_driver = &amd_pstate_epp_driver;
> > + }
> > +
> > + return cpufreq_register_driver(default_pstate_driver);
> > + }
> > +
> > + if (size == 7 && !strncmp(buf, "passive", size)) {
> > + if (default_pstate_driver) {
> > + if (default_pstate_driver == &amd_pstate_driver)
> > + return 0;
> > + cpufreq_unregister_driver(default_pstate_driver);
> > + }
> > + default_pstate_driver = &amd_pstate_driver;
> > + return cpufreq_register_driver(default_pstate_driver);
> > + }
> > +
> > + return -EINVAL;
> > +}
> > +
> > +static ssize_t show_status(struct kobject *kobj,
> > + struct kobj_attribute *attr, char *buf) {
> > + ssize_t ret;
> > +
> > + mutex_lock(&amd_pstate_driver_lock);
> > + ret = amd_pstate_show_status(buf);
> > + mutex_unlock(&amd_pstate_driver_lock);
> > +
> > + return ret;
> > +}
> > +
> > +static ssize_t store_status(struct kobject *a, struct kobj_attribute *b,
> > + const char *buf, size_t count) {
> > + char *p = memchr(buf, '\n', count);
> > + int ret;
> > +
> > + mutex_lock(&amd_pstate_driver_lock);
> > + ret = amd_pstate_update_status(buf, p ? p - buf : count);
> > + mutex_unlock(&amd_pstate_driver_lock);
> > +
> > + return ret < 0 ? ret : count;
> > +}
> > +
> > cpufreq_freq_attr_ro(amd_pstate_max_freq);
> > cpufreq_freq_attr_ro(amd_pstate_lowest_nonlinear_freq);
> >
> > @@ -809,6 +908,7 @@ cpufreq_freq_attr_ro(amd_pstate_highest_perf);
> > cpufreq_freq_attr_rw(energy_performance_preference);
> > cpufreq_freq_attr_ro(energy_performance_available_preferences);
> > define_one_global_rw(cppc_dynamic_boost);
> > +define_one_global_rw(status);
> >
> > static struct freq_attr *amd_pstate_attr[] = {
> > &amd_pstate_max_freq,
> > @@ -828,6 +928,7 @@ static struct freq_attr *amd_pstate_epp_attr[] = {
> >
> > static struct attribute *pstate_global_attributes[] = {
> > &cppc_dynamic_boost.attr,
> > + &status.attr,
> > NULL
> > };
> >
> > --
> > 2.34.1
> >

2022-12-19 12:01:22

by Yuan, Perry

[permalink] [raw]
Subject: RE: [PATCH v7 07/13] cpufreq: amd-pstate: implement suspend and resume callbacks




> -----Original Message-----
> From: Limonciello, Mario <[email protected]>
> Sent: Monday, December 12, 2022 11:15 PM
> To: Huang, Ray <[email protected]>; Yuan, Perry <[email protected]>
> Cc: [email protected]; [email protected]; Sharma, Deepak
> <[email protected]>; Fontenot, Nathan
> <[email protected]>; Deucher, Alexander
> <[email protected]>; Huang, Shimmer
> <[email protected]>; Du, Xiaojian <[email protected]>; Meng,
> Li (Jassmine) <[email protected]>; Karny, Wyes <[email protected]>;
> [email protected]; [email protected]
> Subject: RE: [PATCH v7 07/13] cpufreq: amd-pstate: implement suspend and
> resume callbacks
>
>
>
>
> > -----Original Message-----
> > From: Huang, Ray <[email protected]>
> > Sent: Monday, December 12, 2022 03:05
> > To: Yuan, Perry <[email protected]>
> > Cc: [email protected]; Limonciello, Mario
> > <[email protected]>; [email protected]; Sharma,
> Deepak
> > <[email protected]>; Fontenot, Nathan
> <[email protected]>;
> > Deucher, Alexander <[email protected]>; Huang, Shimmer
> > <[email protected]>; Du, Xiaojian <[email protected]>;
> Meng, Li
> > (Jassmine) <[email protected]>; Karny, Wyes <[email protected]>;
> > [email protected]; [email protected]
> > Subject: Re: [PATCH v7 07/13] cpufreq: amd-pstate: implement suspend
> > and resume callbacks
> >
> > On Thu, Dec 08, 2022 at 07:18:46PM +0800, Yuan, Perry wrote:
> > > From: Perry Yuan <[email protected]>
> > >
> > > add suspend and resume support for the AMD processors by the
> > > amd_pstate_epp driver instance.
> > >
> > > When CPPC is suspended, the EPP driver will set the EPP profile to the
> > > 'power' profile and set max/min perf to the lowest perf value.
> > > When resume happens, it will restore the MSR registers with the
> > > previously cached values.
> > >
> > > Signed-off-by: Perry Yuan <[email protected]>
> > > ---
> > > drivers/cpufreq/amd-pstate.c | 40 ++++++++++++++++++++++++++++++++++++
> > > 1 file changed, 40 insertions(+)
> > >
> > > diff --git a/drivers/cpufreq/amd-pstate.c b/drivers/cpufreq/amd-pstate.c
> > > index 412accab7bda..ea9255bdc9ac 100644
> > > --- a/drivers/cpufreq/amd-pstate.c
> > > +++ b/drivers/cpufreq/amd-pstate.c
> > > @@ -1273,6 +1273,44 @@ static int amd_pstate_epp_cpu_offline(struct cpufreq_policy *policy)
> > > return amd_pstate_cpu_offline(policy);
> > > }
> > >
> > > +static int amd_pstate_epp_suspend(struct cpufreq_policy *policy) {
> > > + struct amd_cpudata *cpudata = all_cpu_data[policy->cpu];
> > > + int ret;
> > > +
> > > + /* avoid suspending when EPP is not enabled */
> > > + if (!cppc_active)
> > > + return 0;
> > > +
> > > + /* set this flag to avoid setting core offline*/
> > > + cpudata->suspended = true;
> > > +
> > > + /* disable CPPC in lowlevel firmware */
> > > + ret = amd_pstate_enable(false);
> > > + if (ret)
> > > + pr_err("failed to suspend, return %d\n", ret);
> > > +
> > > + return 0;
> > > +}
> > > +
> > > +static int amd_pstate_epp_resume(struct cpufreq_policy *policy) {
> > > + struct amd_cpudata *cpudata = all_cpu_data[policy->cpu];
> > > +
> > > + if (cpudata->suspended) {
> > > + mutex_lock(&amd_pstate_limits_lock);
> > > +
> > > + /* enable amd pstate from suspend state*/
> > > + amd_pstate_epp_reenable(cpudata);
> >
> > The same comment: could you please double-confirm whether the
> > perf_ctrls registers will be cleared while we execute a round of S3
> > suspend/resume?

The PERF_CTL register will always be 0 if we do not load the acpi-cpufreq driver after kernel boot.
So suspend/resume will not change the PERF_CTL MSR either.


>
> And if they are, identify what is clearing them. It might not be the same for
> s0i3 and S3.

I checked PERF_CTL with suspend/resume and offline/online tests.
The MSRs of all cores are not changed while the amd-pstate driver is loaded instead of acpi-cpufreq.
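
(For instance, the check can be reproduced with msr-tools — 0xc0010062 is the legacy MSR_AMD_PERF_CTL address, taken as an assumption from msr-index.h:)

	# rdmsr -a 0xc0010062    # read PERF_CTL on all cores before suspend
	# systemctl suspend
	# rdmsr -a 0xc0010062    # values unchanged after resume with amd-pstate loaded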


>
> >
> > > +
> > > + mutex_unlock(&amd_pstate_limits_lock);
> > > +
> > > + cpudata->suspended = false;
> > > + }
> > > +
> > > + return 0;
> > > +}
> > > +
> > > static void amd_pstate_verify_cpu_policy(struct amd_cpudata *cpudata,
> > > struct cpufreq_policy_data *policy)
> > > {
> > > @@ -1309,6 +1347,8 @@ static struct cpufreq_driver amd_pstate_epp_driver = {
> > > .update_limits = amd_pstate_epp_update_limits,
> > > .offline = amd_pstate_epp_cpu_offline,
> > > .online = amd_pstate_epp_cpu_online,
> > > + .suspend = amd_pstate_epp_suspend,
> > > + .resume = amd_pstate_epp_resume,
> > > .name = "amd_pstate_epp",
> > > .attr = amd_pstate_epp_attr,
> > > };
> > > --
> > > 2.34.1
> > >

2022-12-23 07:47:55

by Huang Rui

[permalink] [raw]
Subject: Re: [PATCH v7 05/13] cpufreq: amd-pstate: implement Pstate EPP support for the AMD processors

On Mon, Dec 19, 2022 at 06:21:14PM +0800, Yuan, Perry wrote:
>
> HI ray.
>
> > -----Original Message-----
> > From: Huang, Ray <[email protected]>
> > Sent: Monday, December 12, 2022 4:47 PM
> > To: Yuan, Perry <[email protected]>
> > Cc: [email protected]; Limonciello, Mario
> > <[email protected]>; [email protected]; Sharma, Deepak
> > <[email protected]>; Fontenot, Nathan
> > <[email protected]>; Deucher, Alexander
> > <[email protected]>; Huang, Shimmer
> > <[email protected]>; Du, Xiaojian <[email protected]>; Meng,
> > Li (Jassmine) <[email protected]>; Karny, Wyes <[email protected]>;
> > [email protected]; [email protected]
> > Subject: Re: [PATCH v7 05/13] cpufreq: amd-pstate: implement Pstate EPP
> > support for the AMD processors
> >
> > On Thu, Dec 08, 2022 at 07:18:44PM +0800, Yuan, Perry wrote:
> > > From: Perry Yuan <[email protected]>
> > >
> > > Add EPP driver support for AMD SoCs which support a dedicated MSR for
> > > CPPC. EPP is used by the DPM controller to configure the frequency
> > > that a core operates at during short periods of activity.
> > >
> > > The SoC EPP targets are configured on a scale from 0 to 255 where 0
> > > represents maximum performance and 255 represents maximum efficiency.
> > >
> > > The amd-pstate driver exports profile string names to userspace that
> > > are tied to specific EPP values.
> > >
> > > The balance_performance string (0x80) provides the best balance for
> > > efficiency versus power on most systems, but users can choose other
> > > strings to meet their needs as well.
> > >
> > > $ cat /sys/devices/system/cpu/cpufreq/policy0/energy_performance_available_preferences
> > > default performance balance_performance balance_power power
> > >
> > > $ cat /sys/devices/system/cpu/cpufreq/policy0/energy_performance_preference
> > > balance_performance
> > >
> > > To enable the driver,it needs to add `amd_pstate=active` to kernel
> > > command line and kernel will load the active mode epp driver
> > >
> > > Signed-off-by: Perry Yuan <[email protected]>
> > > ---
> > > drivers/cpufreq/amd-pstate.c | 631 ++++++++++++++++++++++++++++++++++++-
> > > include/linux/amd-pstate.h | 35 ++
> > > 2 files changed, 660 insertions(+), 6 deletions(-)
> > >
> > > diff --git a/drivers/cpufreq/amd-pstate.c b/drivers/cpufreq/amd-pstate.c
> > > index c17bd845f5fc..0a521be1be8a 100644
> > > --- a/drivers/cpufreq/amd-pstate.c
> > > +++ b/drivers/cpufreq/amd-pstate.c
> > > @@ -37,6 +37,7 @@
> > > #include <linux/uaccess.h>
> > > #include <linux/static_call.h>
> > > #include <linux/amd-pstate.h>
> > > +#include <linux/cpufreq_common.h>
> > >
> > > #include <acpi/processor.h>
> > > #include <acpi/cppc_acpi.h>
> > > @@ -59,9 +60,125 @@
> > > * we disable it by default to go acpi-cpufreq on these processors and add a
> > > * module parameter to be able to enable it manually for debugging.
> > > */
> > > -static struct cpufreq_driver amd_pstate_driver;
> > > +static bool cppc_active;
> > > static int cppc_load __initdata;
> > >
> > > +static struct cpufreq_driver *default_pstate_driver;
> > > +static struct amd_cpudata **all_cpu_data;
> > > +static struct amd_pstate_params global_params;
> > > +
> > > +static DEFINE_MUTEX(amd_pstate_limits_lock);
> > > +static DEFINE_MUTEX(amd_pstate_driver_lock);
> > > +
> > > +static bool cppc_boost __read_mostly;
> > > +
> > > +static s16 amd_pstate_get_epp(struct amd_cpudata *cpudata, u64 cppc_req_cached)
> > > +{
> > > + s16 epp;
> > > + struct cppc_perf_caps perf_caps;
> > > + int ret;
> > > +
> > > + if (boot_cpu_has(X86_FEATURE_CPPC)) {
> > > + if (!cppc_req_cached) {
> > > + epp = rdmsrl_on_cpu(cpudata->cpu, MSR_AMD_CPPC_REQ,
> > > + &cppc_req_cached);
> > > + if (epp)
> > > + return epp;
> > > + }
> > > + epp = (cppc_req_cached >> 24) & 0xFF;
> > > + } else {
> > > + ret = cppc_get_epp_caps(cpudata->cpu, &perf_caps);
> > > + if (ret < 0) {
> > > + pr_debug("Could not retrieve energy perf value
> > (%d)\n", ret);
> > > + return -EIO;
> > > + }
> > > + epp = (s16) perf_caps.energy_perf;
> >
> > It should use the static_call structure to implement this function. Please
> > refer to amd_pstate_init_perf; I think it can even reuse init_perf to get the
> > EPP cap value.
>
> The amd_pstate_init_perf() is only called at driver registration,
> whereas amd_pstate_get_epp() will be called frequently to update the EPP MSR value and the EPP min/max limits.
> So I suggest keeping amd_pstate_get_epp() to update the EPP-related values as it is.
>
> The static_call method can cover both the MSR and shared-memory API calls, but amd_pstate_get_epp() is simple enough for now; there is
> no need to convert this EPP update function to a static call as well.

Using static calls is about avoiding retpolines, not about making the code simpler.

https://thenewstack.io/linux-kernel-5-10-introduces-static-calls-to-prevent-speculative-execution-attacks/
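
For illustration, a minimal sketch of that pattern with a hypothetical EPP helper, modeled on the existing amd_pstate_enable/cppc_enable static call (not code from this series):

static int msr_get_epp(struct amd_cpudata *cpudata);    /* full-MSR systems */
static int shmem_get_epp(struct amd_cpudata *cpudata);  /* shared-memory systems */

DEFINE_STATIC_CALL(amd_pstate_get_epp_call, msr_get_epp);

static inline int amd_pstate_get_epp_wrapper(struct amd_cpudata *cpudata)
{
	/* direct call patched in at runtime; no retpoline on the hot path */
	return static_call(amd_pstate_get_epp_call)(cpudata);
}

/* at init time, on shared-memory parts: */
static_call_update(amd_pstate_get_epp_call, shmem_get_epp);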

> Considering the tight schedule and the merge window, I would like to keep the current way of updating EPP; otherwise the customer release schedule will be delayed.
>

The mailing list is not the place to discuss internal schedules.

> Perry.
>
>
> >
> > > + }
> > > +
> > > + return epp;
> > > +}
> > > +
> > > +static int amd_pstate_get_energy_pref_index(struct amd_cpudata *cpudata)
> > > +{
> > > + s16 epp;
> > > + int index = -EINVAL;
> > > +
> > > + epp = amd_pstate_get_epp(cpudata, 0);
> > > + if (epp < 0)
> > > + return epp;
> > > +
> > > + switch (epp) {
> > > + case HWP_EPP_PERFORMANCE:
> > > + index = EPP_INDEX_PERFORMANCE;
> > > + break;
> > > + case HWP_EPP_BALANCE_PERFORMANCE:
> > > + index = EPP_INDEX_BALANCE_PERFORMANCE;
> > > + break;
> > > + case HWP_EPP_BALANCE_POWERSAVE:
> > > + index = EPP_INDEX_BALANCE_POWERSAVE;
> > > + break;
> > > + case HWP_EPP_POWERSAVE:
> > > + index = EPP_INDEX_POWERSAVE;
> > > + break;
> > > + default:
> > > + break;
> > > + }
> > > +
> > > + return index;
> > > +}
> > > +
> > > +static int amd_pstate_set_epp(struct amd_cpudata *cpudata, u32 epp) {
> > > + int ret;
> > > + struct cppc_perf_ctrls perf_ctrls;
> > > +
> > > + if (boot_cpu_has(X86_FEATURE_CPPC)) {
> > > + u64 value = READ_ONCE(cpudata->cppc_req_cached);
> > > +
> > > + value &= ~GENMASK_ULL(31, 24);
> > > + value |= (u64)epp << 24;
> > > + WRITE_ONCE(cpudata->cppc_req_cached, value);
> > > +
> > > + ret = wrmsrl_on_cpu(cpudata->cpu, MSR_AMD_CPPC_REQ, value);
> > > + if (!ret)
> > > + cpudata->epp_cached = epp;
> > > + } else {
> > > + perf_ctrls.energy_perf = epp;
> > > + ret = cppc_set_epp_perf(cpudata->cpu, &perf_ctrls, 1);
> >
> > Since energy_perf is one of the members of struct cppc_perf_ctrls, could we
> > use cppc_set_perf as well?
>
> cppc_set_epp_perf() can handle the EPP value update correctly.
> cppc_set_perf() is used for desired-perf updates at a very high update rate for governors such as schedutil,
> and it has two phases, Phase I and Phase II. If the EPP value update were implemented in that function, I would be concerned
> that we could hit potential firmware or performance-drop risks.

I am fine with using the separate cppc_set_epp_perf.
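
(For context, the separate path amounts to a single EPP write; a sketch matching the quoted v7 usage for the shared-memory case, error handling trimmed:)

	struct cppc_perf_ctrls perf_ctrls;

	perf_ctrls.energy_perf = epp;
	/* the last argument also flips the CPPC enable bit (assumption based on the v7 usage) */
	cppc_set_epp_perf(cpudata->cpu, &perf_ctrls, 1);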

> The release schedule and merge window are closing for v6.2, and this change request came after six rounds of patch review.
> I am afraid we do not have enough time to mitigate the risk of this new code change.
> We can consider continuing to optimize this in a follow-up patch.
>
> Perry.
>
> >
> > > + if (ret) {
> > > + pr_debug("failed to set energy perf value (%d)\n",
> > ret);
> > > + return ret;
> > > + }
> > > + cpudata->epp_cached = epp;
> > > + }
> > > +
> > > + return ret;
> > > +}
> >
> > The same with above, the helpers for different cppc types of processors such
> > as MSR or share memory should be implemented by static_call.
> >
> > > +
> > > +static int amd_pstate_set_energy_pref_index(struct amd_cpudata *cpudata,
> > > + int pref_index)
> > > +{
> > > + int epp = -EINVAL;
> > > + int ret;
> > > +
> > > + if (!pref_index) {
> > > + pr_debug("EPP pref_index is invalid\n");
> > > + return -EINVAL;
> > > + }
> > > +
> > > + if (epp == -EINVAL)
> > > + epp = epp_values[pref_index];
> > > +
> > > + if (epp > 0 && cpudata->policy == CPUFREQ_POLICY_PERFORMANCE) {
> > > + pr_debug("EPP cannot be set under performance policy\n");
> > > + return -EBUSY;
> > > + }
> > > +
> > > + ret = amd_pstate_set_epp(cpudata, epp);
> > > +
> > > + return ret;
> > > +}
> > > +
> > > static inline int pstate_enable(bool enable)
> > > {
> > > return wrmsrl_safe(MSR_AMD_CPPC_ENABLE, enable);
> > > @@ -70,11 +187,21 @@ static inline int pstate_enable(bool enable)
> > > static int cppc_enable(bool enable)
> > > {
> > > int cpu, ret = 0;
> > > + struct cppc_perf_ctrls perf_ctrls;
> > >
> > > for_each_present_cpu(cpu) {
> > > ret = cppc_set_enable(cpu, enable);
> > > if (ret)
> > > return ret;
> > > +
> > > + /* Enable autonomous mode for EPP */
> > > + if (!cppc_active) {
> > > + /* Set desired perf as zero to allow EPP firmware control */
> > > + perf_ctrls.desired_perf = 0;
> > > + ret = cppc_set_perf(cpu, &perf_ctrls);
> > > + if (ret)
> > > + return ret;
> > > + }
> > > }
> > >
> > > return ret;
> > > @@ -418,7 +545,7 @@ static void amd_pstate_boost_init(struct amd_cpudata *cpudata)
> > > return;
> > >
> > > cpudata->boost_supported = true;
> > > - amd_pstate_driver.boost_enabled = true;
> > > + default_pstate_driver->boost_enabled = true;
> > > }
> > >
> > > static void amd_perf_ctl_reset(unsigned int cpu)
> > > @@ -592,10 +719,61 @@ static ssize_t show_amd_pstate_highest_perf(struct cpufreq_policy *policy,
> > > return sprintf(&buf[0], "%u\n", perf);
> > > }
> > >
> > > +static ssize_t show_energy_performance_available_preferences(
> > > + struct cpufreq_policy *policy, char *buf) {
> > > + int i = 0;
> > > + int offset = 0;
> > > +
> > > + while (energy_perf_strings[i] != NULL)
> > > + offset += sysfs_emit_at(buf, offset, "%s ", energy_perf_strings[i++]);
> > > +
> > > + sysfs_emit_at(buf, offset, "\n");
> > > +
> > > + return offset;
> > > +}
> > > +
> > > +static ssize_t store_energy_performance_preference(
> > > + struct cpufreq_policy *policy, const char *buf, size_t count) {
> > > + struct amd_cpudata *cpudata = policy->driver_data;
> > > + char str_preference[21];
> > > + ssize_t ret;
> > > +
> > > + ret = sscanf(buf, "%20s", str_preference);
> > > + if (ret != 1)
> > > + return -EINVAL;
> > > +
> > > + ret = match_string(energy_perf_strings, -1, str_preference);
> > > + if (ret < 0)
> > > + return -EINVAL;
> > > +
> > > + mutex_lock(&amd_pstate_limits_lock);
> > > + ret = amd_pstate_set_energy_pref_index(cpudata, ret);
> > > + mutex_unlock(&amd_pstate_limits_lock);
> > > +
> > > + return ret ?: count;
> > > +}
> > > +
> > > +static ssize_t show_energy_performance_preference(
> > > + struct cpufreq_policy *policy, char *buf) {
> > > + struct amd_cpudata *cpudata = policy->driver_data;
> > > + int preference;
> > > +
> > > + preference = amd_pstate_get_energy_pref_index(cpudata);
> > > + if (preference < 0)
> > > + return preference;
> > > +
> > > + return sysfs_emit(buf, "%s\n", energy_perf_strings[preference]); }
> > > +
> > > cpufreq_freq_attr_ro(amd_pstate_max_freq);
> > > cpufreq_freq_attr_ro(amd_pstate_lowest_nonlinear_freq);
> > >
> > > cpufreq_freq_attr_ro(amd_pstate_highest_perf);
> > > +cpufreq_freq_attr_rw(energy_performance_preference);
> > > +cpufreq_freq_attr_ro(energy_performance_available_preferences);
> > >
> > > static struct freq_attr *amd_pstate_attr[] = {
> > > &amd_pstate_max_freq,
> > > @@ -604,6 +782,424 @@ static struct freq_attr *amd_pstate_attr[] = {
> > > NULL,
> > > };
> > >
> > > +static struct freq_attr *amd_pstate_epp_attr[] = {
> > > + &amd_pstate_max_freq,
> > > + &amd_pstate_lowest_nonlinear_freq,
> > > + &amd_pstate_highest_perf,
> > > + &energy_performance_preference,
> > > + &energy_performance_available_preferences,
> > > + NULL,
> > > +};
> > > +
> > > +static inline void update_boost_state(void) {
> > > + u64 misc_en;
> > > + struct amd_cpudata *cpudata;
> > > +
> > > + cpudata = all_cpu_data[0];
> > > + rdmsrl(MSR_K7_HWCR, misc_en);
> > > + global_params.cppc_boost_disabled = misc_en & BIT_ULL(25);
> >
> > We won't need to introduce the additional cppc_boost_disabled here. The
> > cpufreq_driver->boost_enabled and cpudata->boost_supported can manage
> > this functionality.
>
> The cppc_boost_disabled flag is used to mark whether the PMFW core boost has been disabled.
> If some other driver, for example a thermal or performance-limiting driver, disables core boost,

I didn't see any driver other than acpi-cpufreq that controls
MSR_K7_HWCR_CPB_DIS.

> we need to update the flag so the driver knows that boost is disabled.
>
> * boost_supported is used to change the core frequency boost state.
> * The EPP driver does not use the cpufreq core boost sysfs node, so boost_enabled is not used here.

I would like to clarify again: the core boost state is for the legacy ACPI
P-State and is configured via MSR_K7_HWCR. CPPC uses the highest perf to
map the boost frequency. However, because of some hardware/firmware issues
or quirks, the legacy boost setting still impacts the target frequency here.
So cppc_boost will confuse users about the split in functionality between
CPPC and ACPI P-State.

Can we enable the core boost configuration by default when using amd-pstate?

>
> >
> > I believe it is a firmware issue that the legacy ACPI boost state
> > impacts the CPPC frequency. Could you move this handling into the
> > cpufreq_driver->set_boost callback function to enable the boost state by default?
> >
> > > +}
> > > +
> > > +static bool amd_pstate_acpi_pm_profile_server(void)
> > > +{
> > > + if (acpi_gbl_FADT.preferred_profile == PM_ENTERPRISE_SERVER ||
> > > + acpi_gbl_FADT.preferred_profile == PM_PERFORMANCE_SERVER)
> > > + return true;
> > > +
> > > + return false;
> > > +}
> > > +
> > > +static int amd_pstate_init_cpu(unsigned int cpunum) {
> > > + struct amd_cpudata *cpudata;
> > > +
> > > + cpudata = all_cpu_data[cpunum];
> > > + if (!cpudata) {
> > > + cpudata = kzalloc(sizeof(*cpudata), GFP_KERNEL);
> > > + if (!cpudata)
> > > + return -ENOMEM;
> > > + WRITE_ONCE(all_cpu_data[cpunum], cpudata);
> > > +
> > > + cpudata->cpu = cpunum;
> > > +
> > > + if (cppc_active) {
> >
> > The cppc_active flag is a bit confusing here; if we run into the amd-pstate
> > driver, CPPC should already be active. I know you want to indicate the
> > different driver modes you are running. Please use an enumeration type to
> > mark the different modes, such as PASSIVE_MODE, ACTIVE_MODE, and
> > GUIDED_MODE (as Wyes proposed).
>
> Aligned with Wyes; I added one new patch in V8 to support enumerated working modes.
>
>
> >
> > > + if (amd_pstate_acpi_pm_profile_server())
> > > + cppc_boost = true;
> > > + }
> > > +
> > > + }
> > > + cpudata->epp_powersave = -EINVAL;
> > > + cpudata->epp_policy = 0;
> > > + pr_debug("controlling: cpu %d\n", cpunum);
> > > + return 0;
> > > +}
> > > +
> > > +static int __amd_pstate_cpu_init(struct cpufreq_policy *policy) {
> > > + int min_freq, max_freq, nominal_freq, lowest_nonlinear_freq, ret;
> > > + struct amd_cpudata *cpudata;
> > > + struct device *dev;
> > > + int rc;
> > > + u64 value;
> > > +
> > > + rc = amd_pstate_init_cpu(policy->cpu);
> > > + if (rc)
> > > + return rc;
> > > +
> > > + cpudata = all_cpu_data[policy->cpu];
> > > +
> > > + dev = get_cpu_device(policy->cpu);
> > > + if (!dev)
> > > + goto free_cpudata1;
> > > +
> > > + rc = amd_pstate_init_perf(cpudata);
> > > + if (rc)
> > > + goto free_cpudata1;
> > > +
> > > + min_freq = amd_get_min_freq(cpudata);
> > > + max_freq = amd_get_max_freq(cpudata);
> > > + nominal_freq = amd_get_nominal_freq(cpudata);
> > > + lowest_nonlinear_freq = amd_get_lowest_nonlinear_freq(cpudata);
> > > + if (min_freq < 0 || max_freq < 0 || min_freq > max_freq) {
> > > + dev_err(dev, "min_freq(%d) or max_freq(%d) value is incorrect\n",
> > > + min_freq, max_freq);
> > > + ret = -EINVAL;
> > > + goto free_cpudata1;
> > > + }
> > > +
> > > + policy->min = min_freq;
> > > + policy->max = max_freq;
> > > +
> > > + policy->cpuinfo.min_freq = min_freq;
> > > + policy->cpuinfo.max_freq = max_freq;
> > > + /* It will be updated by governor */
> > > + policy->cur = policy->cpuinfo.min_freq;
> > > +
> > > + /* Initial processor data capability frequencies */
> > > + cpudata->max_freq = max_freq;
> > > + cpudata->min_freq = min_freq;
> > > + cpudata->nominal_freq = nominal_freq;
> > > + cpudata->lowest_nonlinear_freq = lowest_nonlinear_freq;
> > > +
> > > + policy->driver_data = cpudata;
> > > +
> > > + update_boost_state();
> > > + cpudata->epp_cached = amd_pstate_get_epp(cpudata, value);
> > > +
> > > + policy->min = policy->cpuinfo.min_freq;
> > > + policy->max = policy->cpuinfo.max_freq;
> > > +
> > > + if (boot_cpu_has(X86_FEATURE_CPPC))
> > > + policy->fast_switch_possible = true;
> >
> > Please move this line into below if-case.
>
>
> Fixed in V8.
>
> >
> > > +
> > > + if (boot_cpu_has(X86_FEATURE_CPPC)) {
> > > + ret = rdmsrl_on_cpu(cpudata->cpu, MSR_AMD_CPPC_REQ, &value);
> > > + if (ret)
> > > + return ret;
> > > + WRITE_ONCE(cpudata->cppc_req_cached, value);
> > > +
> > > + ret = rdmsrl_on_cpu(cpudata->cpu, MSR_AMD_CPPC_CAP1, &value);
> > > + if (ret)
> > > + return ret;
> > > + WRITE_ONCE(cpudata->cppc_cap1_cached, value);
> > > + }
> > > + amd_pstate_boost_init(cpudata);
> > > +
> > > + return 0;
> > > +
> > > +free_cpudata1:
> > > + kfree(cpudata);
> > > + return ret;
> > > +}
> > > +
> > > +static int amd_pstate_epp_cpu_init(struct cpufreq_policy *policy) {
> > > + int ret;
> > > +
> > > + ret = __amd_pstate_cpu_init(policy);
> >
> > I don't see any reason why we need to define a __amd_pstate_cpu_init()
> > here. The Intel P-State driver's __intel_pstate_cpu_init() is used by both
> > intel_pstate_cpu_init and intel_cpufreq_cpu_init.
>
> Fixed in V8.
>
> >
> > > + if (ret)
> > > + return ret;
> > > + /*
> > > + * Set the policy to powersave to provide a valid fallback value in
> > case
> > > + * the default cpufreq governor is neither powersave nor
> > performance.
> > > + */
> > > + policy->policy = CPUFREQ_POLICY_POWERSAVE;
> > > +
> > > + return 0;
> > > +}
> > > +
> > > +static int amd_pstate_epp_cpu_exit(struct cpufreq_policy *policy) {
> > > + pr_debug("CPU %d exiting\n", policy->cpu);
> > > + policy->fast_switch_possible = false;
> > > + return 0;
> > > +}
> > > +
> > > +static void amd_pstate_update_max_freq(unsigned int cpu)
> >
> > Why do you name this function "update max frequency"?
> >
> > We won't have the different cpudata->pstate.max_freq and
> > cpudata->pstate.turbo_freq on Intel P-State driver.
> >
> > I think in fact we don't update anything here.
>
> When core frequency boost is disabled, the function will update the frequency limits.
> Currently the boost sysfs node is not added, so the max frequency is not changed.
> Could we keep this code for the coming patch that adds the sysfs node for boost control?
> It does no harm to the EPP driver.

Again, the boost frequency and state are for ACPI P-State; we cannot let
them be conflated with CPPC.

I didn't see where amd-pstate updates the frequency max/min limits, so why
keep the redundant code here?

>
> >
> > > +{
> > > + struct cpufreq_policy *policy = policy = cpufreq_cpu_get(cpu);
> >
> > struct cpufreq_policy *policy = cpufreq_cpu_get(cpu);
> >
> > > +
> > > + if (!policy)
> > > + return;
> > > +
> > > + refresh_frequency_limits(policy);
> > > + cpufreq_cpu_put(policy);
> > > +}
> > > +
> > > +static void amd_pstate_epp_update_limits(unsigned int cpu) {
> > > + mutex_lock(&amd_pstate_driver_lock);
> > > + update_boost_state();
> > > + if (global_params.cppc_boost_disabled) {
> > > + for_each_possible_cpu(cpu)
> > > + amd_pstate_update_max_freq(cpu);
> >
> > This should do nothing in the amd-pstate.
>
> Currently the boost sysfs node is not added, so the max frequency is not changed.
> Could we keep this code for the coming patch that adds the sysfs node for boost control?
> It does no harm to the EPP driver.
>
> >
> > > + } else {
> > > + cpufreq_update_policy(cpu);
> > > + }
> > > + mutex_unlock(&amd_pstate_driver_lock);
> > > +}
> > > +
> > > +static int cppc_boost_hold_time_ns = 3 * NSEC_PER_MSEC;
> > > +
> > > +static inline void amd_pstate_boost_up(struct amd_cpudata *cpudata) {
> > > + u64 hwp_req = READ_ONCE(cpudata->cppc_req_cached);
> > > + u64 hwp_cap = READ_ONCE(cpudata->cppc_cap1_cached);
> > > + u32 max_limit = (hwp_req & 0xff);
> > > + u32 min_limit = (hwp_req & 0xff00) >> 8;
> >
> > We can use cpudata->max_perf and cpudata->min_perf directly.
>
> Iowait boost code removed from V8.
>
> >
> > > + u32 boost_level1;
> > > +
> > > + /* If max and min are equal or already at max, nothing to boost */
> >
> > I believe this is the only case that max_perf == min_perf, not at max.
>
> Iowait boost code removed from V8.
>
> >
> > > + if (max_limit == min_limit)
> > > + return;
> > > +
> > > + /* Set boost max and min to initial value */
> > > + if (!cpudata->cppc_boost_min)
> > > + cpudata->cppc_boost_min = min_limit;
> > > +
> > > + boost_level1 = ((AMD_CPPC_NOMINAL_PERF(hwp_cap) + min_limit) >> 1);
> > > +
> > > + if (cpudata->cppc_boost_min < boost_level1)
> > > + cpudata->cppc_boost_min = boost_level1;
> > > + else if (cpudata->cppc_boost_min < AMD_CPPC_NOMINAL_PERF(hwp_cap))
> > > + cpudata->cppc_boost_min = AMD_CPPC_NOMINAL_PERF(hwp_cap);
> > > + else if (cpudata->cppc_boost_min == AMD_CPPC_NOMINAL_PERF(hwp_cap))
> > > + cpudata->cppc_boost_min = max_limit;
> > > + else
> > > + return;
> >
> > Could you please elaborate on why you split the min_perf
> > (cppc_boost_min) you write into the cppc_req register into these
> > different cases? Why do we pick cases such as (min + nominal)/2 and
> > around nominal? How does this help optimize the final result? I am thinking
> > that since autonomous mode is handled by the SMU firmware, we need to provide
> > some data showing how it influences the final result.
> >
>
> Iowait boost code removed from V8.
>
> > > +
> > > + hwp_req &= ~AMD_CPPC_MIN_PERF(~0L);
> > > + hwp_req |= AMD_CPPC_MIN_PERF(cpudata->cppc_boost_min);
> > > + wrmsrl(MSR_AMD_CPPC_REQ, hwp_req);
> >
> > Do we need an update for shared memory processors? In other words, EPP is
> > also supported on shared memory processors. Again, we should use static
> > calls to handle the MSR and cppc_acpi paths.
> >
> > > + cpudata->last_update = cpudata->sample.time; }
> > > +
> > > +static inline void amd_pstate_boost_down(struct amd_cpudata *cpudata)
> > > +{
> > > + bool expired;
> > > +
> > > + if (cpudata->cppc_boost_min) {
> > > + expired = time_after64(cpudata->sample.time, cpudata->last_update +
> > > + cppc_boost_hold_time_ns);
> > > +
> > > + if (expired) {
> > > + wrmsrl(MSR_AMD_CPPC_REQ, cpudata->cppc_req_cached);
> > > + cpudata->cppc_boost_min = 0;
> > > + }
> > > + }
> > > +
> > > + cpudata->last_update = cpudata->sample.time; }
> > > +
> > > +static inline void amd_pstate_boost_update_util(struct amd_cpudata *cpudata,
> > > + u64 time)
> > > +{
> > > + cpudata->sample.time = time;
> > > + if (smp_processor_id() != cpudata->cpu)
> > > + return;
> > > +
> > > + if (cpudata->sched_flags & SCHED_CPUFREQ_IOWAIT) {
> > > + bool do_io = false;
> > > +
> > > + cpudata->sched_flags = 0;
> > > + /*
> > > + * Set iowait_boost flag and update time. Since IO WAIT flag
> > > + * is set all the time, we can't just conclude that there is
> > > + * some IO bound activity is scheduled on this CPU with just
> > > + * one occurrence. If we receive at least two in two
> > > + * consecutive ticks, then we treat as boost candidate.
> > > + * This is leveraged from Intel Pstate driver.
> >
> > I would like to know whether we can hit this case as well. If we can find or
> > create a use case that hits it on our platforms, I am fine with adding it to
> > our driver. If not, I don't suggest we add it at this moment. I hope we have
> > verified each code path by the time we add it to the driver.
>
> Sure, no problem.
> Iowait boost code removed from V8.
>
>
> >
> > > + */
> > > + if (time_before64(time, cpudata->last_io_update + 2 * TICK_NSEC))
> > > + do_io = true;
> > > +
> > > + cpudata->last_io_update = time;
> > > +
> > > + if (do_io)
> > > + amd_pstate_boost_up(cpudata);
> > > +
> > > + } else {
> > > + amd_pstate_boost_down(cpudata);
> > > + }
> > > +}
> > > +
> > > +static inline void amd_pstate_cppc_update_hook(struct update_util_data *data,
> > > + u64 time, unsigned int flags)
> > > +{
> > > + struct amd_cpudata *cpudata = container_of(data,
> > > + struct amd_cpudata, update_util);
> > > +
> > > + cpudata->sched_flags |= flags;
> > > +
> > > + if (smp_processor_id() == cpudata->cpu)
> > > + amd_pstate_boost_update_util(cpudata, time); }
> > > +
> > > +static void amd_pstate_clear_update_util_hook(unsigned int cpu) {
> > > + struct amd_cpudata *cpudata = all_cpu_data[cpu];
> > > +
> > > + if (!cpudata->update_util_set)
> > > + return;
> > > +
> > > + cpufreq_remove_update_util_hook(cpu);
> > > + cpudata->update_util_set = false;
> > > + synchronize_rcu();
> > > +}
> > > +
> > > +static void amd_pstate_set_update_util_hook(unsigned int cpu_num) {
> > > + struct amd_cpudata *cpudata = all_cpu_data[cpu_num];
> > > +
> > > + if (!cppc_boost) {
> > > + if (cpudata->update_util_set)
> > > + amd_pstate_clear_update_util_hook(cpudata->cpu);
> > > + return;
> > > + }
> > > +
> > > + if (cpudata->update_util_set)
> > > + return;
> > > +
> > > + cpudata->sample.time = 0;
> > > + cpufreq_add_update_util_hook(cpu_num, &cpudata->update_util,
> > > + amd_pstate_cppc_update_hook);
> > > + cpudata->update_util_set = true;
> > > +}
> > > +
> > > +static void amd_pstate_epp_init(unsigned int cpu) {
> > > + struct amd_cpudata *cpudata = all_cpu_data[cpu];
> > > + u32 max_perf, min_perf;
> > > + u64 value;
> > > + s16 epp;
> > > +
> > > + max_perf = READ_ONCE(cpudata->highest_perf);
> > > + min_perf = READ_ONCE(cpudata->lowest_perf);
> > > +
> > > + value = READ_ONCE(cpudata->cppc_req_cached);
> > > +
> > > + if (cpudata->policy == CPUFREQ_POLICY_PERFORMANCE)
> > > + min_perf = max_perf;
> > > +
> > > + /* Initial min/max values for CPPC Performance Controls Register */
> > > + value &= ~AMD_CPPC_MIN_PERF(~0L);
> > > + value |= AMD_CPPC_MIN_PERF(min_perf);
> > > +
> > > + value &= ~AMD_CPPC_MAX_PERF(~0L);
> > > + value |= AMD_CPPC_MAX_PERF(max_perf);
> > > +
> > > + /* CPPC EPP feature requires the desired perf field to be set to zero */
> > > + value &= ~AMD_CPPC_DES_PERF(~0L);
> > > + value |= AMD_CPPC_DES_PERF(0);
> > > +
> > > + if (cpudata->epp_policy == cpudata->policy)
> > > + goto skip_epp;
> > > +
> > > + cpudata->epp_policy = cpudata->policy;
> > > +
> > > + if (cpudata->policy == CPUFREQ_POLICY_PERFORMANCE) {
> > > + epp = amd_pstate_get_epp(cpudata, value);
> > > + cpudata->epp_powersave = epp;
> >
> > I didn't see why we need epp_powersave here. It is only initialized here but
> > never used anywhere.
>
> epp_powersave var was removed from V8.
>
>
> >
> > > + if (epp < 0)
> > > + goto skip_epp;
> > > + /* force the epp value to be zero for performance policy */
> > > + epp = 0;
> > > + } else {
> > > + if (cpudata->epp_powersave < 0)
> > > + goto skip_epp;
> > > + /* Get BIOS pre-defined epp value */
> > > + epp = amd_pstate_get_epp(cpudata, value);
> > > + if (epp)
> > > + goto skip_epp;
> > > + epp = cpudata->epp_powersave;
> > > + }
> > > + /* Set initial EPP value */
> > > + if (boot_cpu_has(X86_FEATURE_CPPC)) {
> > > + value &= ~GENMASK_ULL(31, 24);
> > > + value |= (u64)epp << 24;
> > > + }
> > > +
> > > +skip_epp:
> > > + WRITE_ONCE(cpudata->cppc_req_cached, value);
> > > + amd_pstate_set_epp(cpudata, epp);
> > > +}
> > > +
> > > +static void amd_pstate_set_max_limits(struct amd_cpudata *cpudata) {
> > > + u64 hwp_cap = READ_ONCE(cpudata->cppc_cap1_cached);
> > > + u64 hwp_req = READ_ONCE(cpudata->cppc_req_cached);
> > > + u32 max_limit = (hwp_cap >> 24) & 0xff;
> > > +
> > > + hwp_req &= ~AMD_CPPC_MIN_PERF(~0L);
> > > + hwp_req |= AMD_CPPC_MIN_PERF(max_limit);
> > > + wrmsrl_on_cpu(cpudata->cpu, MSR_AMD_CPPC_REQ, hwp_req); }
> > > +
> > > +static int amd_pstate_epp_set_policy(struct cpufreq_policy *policy) {
> > > + struct amd_cpudata *cpudata;
> > > +
> > > + if (!policy->cpuinfo.max_freq)
> > > + return -ENODEV;
> > > +
> > > + pr_debug("set_policy: cpuinfo.max %u policy->max %u\n",
> > > + policy->cpuinfo.max_freq, policy->max);
> > > +
> > > + cpudata = all_cpu_data[policy->cpu];
> > > + cpudata->policy = policy->policy;
> > > +
> > > + if (boot_cpu_has(X86_FEATURE_CPPC)) {
> > > + mutex_lock(&amd_pstate_limits_lock);
> > > +
> > > + if (cpudata->policy == CPUFREQ_POLICY_PERFORMANCE) {
> > > + amd_pstate_clear_update_util_hook(policy->cpu);
> > > + amd_pstate_set_max_limits(cpudata);
> > > + } else {
> > > + amd_pstate_set_update_util_hook(policy->cpu);
> > > + }
> > > +
> > > + mutex_unlock(&amd_pstate_limits_lock);
> > > + }
> > > + amd_pstate_epp_init(policy->cpu);
> > > +
> > > + return 0;
> > > +}
> > > +
> > > +static void amd_pstate_verify_cpu_policy(struct amd_cpudata *cpudata,
> > > + struct cpufreq_policy_data *policy)
> > > +{
> > > + update_boost_state();
> > > + cpufreq_verify_within_cpu_limits(policy);
> > > +}
> > > +
> > > +static int amd_pstate_epp_verify_policy(struct cpufreq_policy_data *policy)
> > > +{
> > > + amd_pstate_verify_cpu_policy(all_cpu_data[policy->cpu], policy);
> > > + pr_debug("policy_max =%d, policy_min=%d\n", policy->max, policy-
> > >min);
> > > + return 0;
> > > +}
> >
> > amd_pstate_verify_cpu_policy and amd_pstate_epp_verify_policy can be
> > squeezed into one function.
>
> Fixed in V8.
>
> >
> > > +
> > > static struct cpufreq_driver amd_pstate_driver = {
> > > .flags = CPUFREQ_CONST_LOOPS |
> > CPUFREQ_NEED_UPDATE_LIMITS,
> > > .verify = amd_pstate_verify,
> > > @@ -617,8 +1213,20 @@ static struct cpufreq_driver amd_pstate_driver = {
> > > .attr = amd_pstate_attr,
> > > };
> > >
> > > +static struct cpufreq_driver amd_pstate_epp_driver = {
> > > + .flags = CPUFREQ_CONST_LOOPS,
> > > + .verify = amd_pstate_epp_verify_policy,
> > > + .setpolicy = amd_pstate_epp_set_policy,
> > > + .init = amd_pstate_epp_cpu_init,
> > > + .exit = amd_pstate_epp_cpu_exit,
> > > + .update_limits = amd_pstate_epp_update_limits,
> > > + .name = "amd_pstate_epp",
> > > + .attr = amd_pstate_epp_attr,
> > > +};
> > > +
> > > static int __init amd_pstate_init(void) {
> > > + static struct amd_cpudata **cpudata;
> > > int ret;
> > >
> > > if (boot_cpu_data.x86_vendor != X86_VENDOR_AMD)
> > > @@ -645,7 +1253,8 @@ static int __init amd_pstate_init(void)
> > > /* capability check */
> > > if (boot_cpu_has(X86_FEATURE_CPPC)) {
> > > pr_debug("AMD CPPC MSR based functionality is
> > supported\n");
> > > - amd_pstate_driver.adjust_perf = amd_pstate_adjust_perf;
> > > + if (!cppc_active)
> > > + default_pstate_driver->adjust_perf = amd_pstate_adjust_perf;
> > > } else {
> > > pr_debug("AMD CPPC shared memory based functionality is
> > supported\n");
> > > static_call_update(amd_pstate_enable, cppc_enable);
> > > @@ -653,6 +1262,10 @@ static int __init amd_pstate_init(void)
> > > static_call_update(amd_pstate_update_perf, cppc_update_perf);
> > > }
> > >
> > > + cpudata = vzalloc(array_size(sizeof(void *), num_possible_cpus()));
> > > + if (!cpudata)
> > > + return -ENOMEM;
> > > + WRITE_ONCE(all_cpu_data, cpudata);
> >
> > Why can't we use cpufreq_policy->driver_data to store the cpudata? I
> > believe the cpudata is per-CPU and can easily be retrieved from the private data.
>
> cpufreq_policy->driver_data can do that, but global data can have a better cache hit rate, especially
> for the server cluster.
> So I would prefer to use the global cpudata array to store each CPU's data struct.
> Could we keep it for the EPP driver?

I don't follow what "better cache hit rate especially for the server
cluster" means here. In my view, this is just a different design, not a
performance enhancement. Since we already store it in
cpufreq_policy->driver_data, I don't see any visible improvement from
changing to this approach.

Thanks,
Ray
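
For context on the storage question above: cpufreq hands every callback its
struct cpufreq_policy, so per-CPU driver data is conventionally reached
through the policy's private pointer rather than a global array. A minimal
sketch of that pattern (the helper name is hypothetical):

static inline struct amd_cpudata *policy_to_cpudata(struct cpufreq_policy *policy)
{
	/* Set once in the driver's ->init() callback:
	 *	policy->driver_data = cpudata;
	 * every other callback can then recover it from the policy. */
	return policy->driver_data;
}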

2022-12-25 16:55:24

by Yuan, Perry

[permalink] [raw]
Subject: RE: [PATCH v7 05/13] cpufreq: amd-pstate: implement Pstate EPP support for the AMD processors



> -----Original Message-----
> From: Huang, Ray <[email protected]>
> Sent: Friday, December 23, 2022 3:16 PM
> To: Yuan, Perry <[email protected]>
> Cc: Huang, Shimmer <[email protected]>; Limonciello, Mario
> <[email protected]>; Liang, Richard qi
> <[email protected]>; [email protected];
> [email protected]; Sharma, Deepak <[email protected]>;
> Fontenot, Nathan <[email protected]>; Deucher, Alexander
> <[email protected]>; Du, Xiaojian <[email protected]>;
> Meng, Li (Jassmine) <[email protected]>; Karny, Wyes
> <[email protected]>; [email protected]; linux-
> [email protected]
> Subject: Re: [PATCH v7 05/13] cpufreq: amd-pstate: implement Pstate EPP
> support for the AMD processors
>
> On Mon, Dec 19, 2022 at 06:21:14PM +0800, Yuan, Perry wrote:
> > Hi Ray,
> >
> > > -----Original Message-----
> > > From: Huang, Ray <[email protected]>
> > > Sent: Monday, December 12, 2022 4:47 PM
> > > To: Yuan, Perry <[email protected]>
> > > Cc: [email protected]; Limonciello, Mario
> > > <[email protected]>; [email protected]; Sharma,
> Deepak
> > > <[email protected]>; Fontenot, Nathan
> <[email protected]>;
> > > Deucher, Alexander <[email protected]>; Huang, Shimmer
> > > <[email protected]>; Du, Xiaojian <[email protected]>;
> Meng,
> > > Li (Jassmine) <[email protected]>; Karny, Wyes
> <[email protected]>;
> > > [email protected]; [email protected]
> > > Subject: Re: [PATCH v7 05/13] cpufreq: amd-pstate: implement Pstate
> > > EPP support for the AMD processors
> > >
> > > On Thu, Dec 08, 2022 at 07:18:44PM +0800, Yuan, Perry wrote:
> > > > From: Perry Yuan <[email protected]>
> > > >
> > > > Add EPP driver support for AMD SoCs which support a dedicated MSR
> > > > for CPPC. EPP is used by the DPM controller to configure the
> > > > frequency that a core operates at during short periods of activity.
> > > >
> > > > The SoC EPP targets are configured on a scale from 0 to 255 where
> > > > 0 represents maximum performance and 255 represents maximum efficiency.
> > > >
> > > > The amd-pstate driver exports profile string names to userspace
> > > > that are tied to specific EPP values.
> > > >
> > > > The balance_performance string (0x80) provides the best balance
> > > > for efficiency versus power on most systems, but users can choose
> > > > other strings to meet their needs as well.
> > > >
> > > > $ cat /sys/devices/system/cpu/cpufreq/policy0/energy_performance_available_preferences
> > > > default performance balance_performance balance_power power
> > > >
> > > > $ cat /sys/devices/system/cpu/cpufreq/policy0/energy_performance_preference
> > > > balance_performance
> > > >
> > > > To enable the driver, add `amd_pstate=active` to the kernel command
> > > > line; the kernel will then load the active mode EPP driver.
> > > >
> > > > Signed-off-by: Perry Yuan <[email protected]>
> > > > ---
> > > > drivers/cpufreq/amd-pstate.c | 631 ++++++++++++++++++++++++++++++++++-
> > > > include/linux/amd-pstate.h | 35 ++
> > > > 2 files changed, 660 insertions(+), 6 deletions(-)
> > > >
> > > > diff --git a/drivers/cpufreq/amd-pstate.c b/drivers/cpufreq/amd-pstate.c
> > > > index c17bd845f5fc..0a521be1be8a 100644
> > > > --- a/drivers/cpufreq/amd-pstate.c
> > > > +++ b/drivers/cpufreq/amd-pstate.c
> > > > @@ -37,6 +37,7 @@
> > > > #include <linux/uaccess.h>
> > > > #include <linux/static_call.h>
> > > > #include <linux/amd-pstate.h>
> > > > +#include <linux/cpufreq_common.h>
> > > >
> > > > #include <acpi/processor.h>
> > > > #include <acpi/cppc_acpi.h>
> > > > @@ -59,9 +60,125 @@
> > > > * we disable it by default to go acpi-cpufreq on these processors and add a
> > > > * module parameter to be able to enable it manually for debugging.
> > > > */
> > > > -static struct cpufreq_driver amd_pstate_driver;
> > > > +static bool cppc_active;
> > > > static int cppc_load __initdata;
> > > >
> > > > +static struct cpufreq_driver *default_pstate_driver;
> > > > +static struct amd_cpudata **all_cpu_data;
> > > > +static struct amd_pstate_params global_params;
> > > > +
> > > > +static DEFINE_MUTEX(amd_pstate_limits_lock);
> > > > +static DEFINE_MUTEX(amd_pstate_driver_lock);
> > > > +
> > > > +static bool cppc_boost __read_mostly;
> > > > +
> > > > +static s16 amd_pstate_get_epp(struct amd_cpudata *cpudata, u64 cppc_req_cached)
> > > > +{
> > > > + s16 epp;
> > > > + struct cppc_perf_caps perf_caps;
> > > > + int ret;
> > > > +
> > > > + if (boot_cpu_has(X86_FEATURE_CPPC)) {
> > > > + if (!cppc_req_cached) {
> > > > + epp = rdmsrl_on_cpu(cpudata->cpu, MSR_AMD_CPPC_REQ,
> > > > + &cppc_req_cached);
> > > > + if (epp)
> > > > + return epp;
> > > > + }
> > > > + epp = (cppc_req_cached >> 24) & 0xFF;
> > > > + } else {
> > > > + ret = cppc_get_epp_caps(cpudata->cpu, &perf_caps);
> > > > + if (ret < 0) {
> > > > + pr_debug("Could not retrieve energy perf value
> > > (%d)\n", ret);
> > > > + return -EIO;
> > > > + }
> > > > + epp = (s16) perf_caps.energy_perf;
> > >
> > > This should follow the static_call structure to implement the function.
> > > Please refer to amd_pstate_init_perf. I think it can even re-use
> > > init_perf to get the EPP cap value.
> >
> > amd_pstate_init_perf() is only called at driver registration, whereas
> > amd_pstate_get_epp() is called frequently to update the EPP MSR value
> > and the EPP min/max limits.
> > So I suggest keeping amd_pstate_get_epp() to update the EPP-related
> > values as it is.
> >
> > The static_call method can handle both the MSR and the shared memory
> > API call, but amd_pstate_get_epp() is simple enough for now; there is
> > no need to convert this EPP update function to static calls as well.
>
> Using static calls is about avoiding retpolines, not about making things simpler.
>
> https://thenewstack.io/linux-kernel-5-10-introduces-static-calls-to-prevent-speculative-execution-attacks/
>
> > Considering the tight schedule and merge window, I would like to keep
> > the current way of updating EPP; otherwise the customer release
> > schedule will be delayed.
> >
>
> The mailing list is not the place to discuss internal schedules.
>
> > Perry.
> >
> >
> > >
> > > > + }
> > > > +
> > > > + return epp;
> > > > +}
> > > > +
> > > > +static int amd_pstate_get_energy_pref_index(struct amd_cpudata *cpudata)
> > > > +{
> > > > + s16 epp;
> > > > + int index = -EINVAL;
> > > > +
> > > > + epp = amd_pstate_get_epp(cpudata, 0);
> > > > + if (epp < 0)
> > > > + return epp;
> > > > +
> > > > + switch (epp) {
> > > > + case HWP_EPP_PERFORMANCE:
> > > > + index = EPP_INDEX_PERFORMANCE;
> > > > + break;
> > > > + case HWP_EPP_BALANCE_PERFORMANCE:
> > > > + index = EPP_INDEX_BALANCE_PERFORMANCE;
> > > > + break;
> > > > + case HWP_EPP_BALANCE_POWERSAVE:
> > > > + index = EPP_INDEX_BALANCE_POWERSAVE;
> > > > + break;
> > > > + case HWP_EPP_POWERSAVE:
> > > > + index = EPP_INDEX_POWERSAVE;
> > > > + break;
> > > > + default:
> > > > + break;
> > > > + }
> > > > +
> > > > + return index;
> > > > +}
> > > > +
> > > > +static int amd_pstate_set_epp(struct amd_cpudata *cpudata, u32 epp)
> > > > +{
> > > > + int ret;
> > > > + struct cppc_perf_ctrls perf_ctrls;
> > > > +
> > > > + if (boot_cpu_has(X86_FEATURE_CPPC)) {
> > > > + u64 value = READ_ONCE(cpudata->cppc_req_cached);
> > > > +
> > > > + value &= ~GENMASK_ULL(31, 24);
> > > > + value |= (u64)epp << 24;
> > > > + WRITE_ONCE(cpudata->cppc_req_cached, value);
> > > > +
> > > > + ret = wrmsrl_on_cpu(cpudata->cpu, MSR_AMD_CPPC_REQ, value);
> > > > + if (!ret)
> > > > + cpudata->epp_cached = epp;
> > > > + } else {
> > > > + perf_ctrls.energy_perf = epp;
> > > > + ret = cppc_set_epp_perf(cpudata->cpu, &perf_ctrls, 1);
> > >
> > > Since energy_perf is one of the members of struct cppc_perf_ctrls,
> > > could we use cppc_set_perf as well?
> >
> > cppc_set_epp_perf() can handle the EPP value update correctly, while
> > cppc_set_perf() is used for desired perf updates at a very high rate
> > by governors such as schedutil.
> > cppc_set_perf() also has two phases, Phase I and Phase II; if we
> > implement the EPP value update in that function, I am concerned we
> > could hit potential firmware or performance regression risks.
>
> I am fine with using the separate cppc_set_epp_perf().
>
> > The merge window for v6.2 is closing, and this change request came
> > after six rounds of review of the series.
> > I am afraid we do not have enough time to mitigate the risk of this
> > new code change.
> > We can consider continuing to optimize this in a follow-up patch.
> >
> > Perry.
> >
> > >
> > > > + if (ret) {
> > > > + pr_debug("failed to set energy perf value (%d)\n",
> > > ret);
> > > > + return ret;
> > > > + }
> > > > + cpudata->epp_cached = epp;
> > > > + }
> > > > +
> > > > + return ret;
> > > > +}
> > >
> > > The same as above: the helpers for the different CPPC processor types,
> > > MSR or shared memory, should be implemented via static_call.
> > >
> > > > +
> > > > +static int amd_pstate_set_energy_pref_index(struct amd_cpudata *cpudata,
> > > > + int pref_index)
> > > > +{
> > > > + int epp = -EINVAL;
> > > > + int ret;
> > > > +
> > > > + if (!pref_index) {
> > > > + pr_debug("EPP pref_index is invalid\n");
> > > > + return -EINVAL;
> > > > + }
> > > > +
> > > > + if (epp == -EINVAL)
> > > > + epp = epp_values[pref_index];
> > > > +
> > > > + if (epp > 0 && cpudata->policy == CPUFREQ_POLICY_PERFORMANCE) {
> > > > + pr_debug("EPP cannot be set under performance policy\n");
> > > > + return -EBUSY;
> > > > + }
> > > > +
> > > > + ret = amd_pstate_set_epp(cpudata, epp);
> > > > +
> > > > + return ret;
> > > > +}
> > > > +
> > > > static inline int pstate_enable(bool enable) {
> > > > return wrmsrl_safe(MSR_AMD_CPPC_ENABLE, enable);
> > > > @@ -70,11 +187,21 @@ static inline int pstate_enable(bool enable)
> > > > static int cppc_enable(bool enable) {
> > > > int cpu, ret = 0;
> > > > + struct cppc_perf_ctrls perf_ctrls;
> > > >
> > > > for_each_present_cpu(cpu) {
> > > > ret = cppc_set_enable(cpu, enable);
> > > > if (ret)
> > > > return ret;
> > > > +
> > > > + /* Enable autonomous mode for EPP */
> > > > + if (!cppc_active) {
> > > > + /* Set desired perf as zero to allow EPP firmware control */
> > > > + perf_ctrls.desired_perf = 0;
> > > > + ret = cppc_set_perf(cpu, &perf_ctrls);
> > > > + if (ret)
> > > > + return ret;
> > > > + }
> > > > }
> > > >
> > > > return ret;
> > > > @@ -418,7 +545,7 @@ static void amd_pstate_boost_init(struct amd_cpudata *cpudata)
> > > > return;
> > > >
> > > > cpudata->boost_supported = true;
> > > > - amd_pstate_driver.boost_enabled = true;
> > > > + default_pstate_driver->boost_enabled = true;
> > > > }
> > > >
> > > > static void amd_perf_ctl_reset(unsigned int cpu)
> > > > @@ -592,10 +719,61 @@ static ssize_t show_amd_pstate_highest_perf(struct cpufreq_policy *policy,
> > > > return sprintf(&buf[0], "%u\n", perf); }
> > > >
> > > > +static ssize_t show_energy_performance_available_preferences(
> > > > + struct cpufreq_policy *policy, char *buf) {
> > > > + int i = 0;
> > > > + int offset = 0;
> > > > +
> > > > + while (energy_perf_strings[i] != NULL)
> > > > + offset += sysfs_emit_at(buf, offset, "%s ", energy_perf_strings[i++]);
> > > > +
> > > > + sysfs_emit_at(buf, offset, "\n");
> > > > +
> > > > + return offset;
> > > > +}
> > > > +
> > > > +static ssize_t store_energy_performance_preference(
> > > > + struct cpufreq_policy *policy, const char *buf, size_t count) {
> > > > + struct amd_cpudata *cpudata = policy->driver_data;
> > > > + char str_preference[21];
> > > > + ssize_t ret;
> > > > +
> > > > + ret = sscanf(buf, "%20s", str_preference);
> > > > + if (ret != 1)
> > > > + return -EINVAL;
> > > > +
> > > > + ret = match_string(energy_perf_strings, -1, str_preference);
> > > > + if (ret < 0)
> > > > + return -EINVAL;
> > > > +
> > > > + mutex_lock(&amd_pstate_limits_lock);
> > > > + ret = amd_pstate_set_energy_pref_index(cpudata, ret);
> > > > + mutex_unlock(&amd_pstate_limits_lock);
> > > > +
> > > > + return ret ?: count;
> > > > +}
> > > > +
> > > > +static ssize_t show_energy_performance_preference(
> > > > + struct cpufreq_policy *policy, char *buf) {
> > > > + struct amd_cpudata *cpudata = policy->driver_data;
> > > > + int preference;
> > > > +
> > > > + preference = amd_pstate_get_energy_pref_index(cpudata);
> > > > + if (preference < 0)
> > > > + return preference;
> > > > +
> > > > + return sysfs_emit(buf, "%s\n", energy_perf_strings[preference]);
> > > > +}
> > > > +
> > > > cpufreq_freq_attr_ro(amd_pstate_max_freq);
> > > > cpufreq_freq_attr_ro(amd_pstate_lowest_nonlinear_freq);
> > > >
> > > > cpufreq_freq_attr_ro(amd_pstate_highest_perf);
> > > > +cpufreq_freq_attr_rw(energy_performance_preference);
> > > > +cpufreq_freq_attr_ro(energy_performance_available_preferences);
> > > >
> > > > static struct freq_attr *amd_pstate_attr[] = {
> > > > &amd_pstate_max_freq,
> > > > @@ -604,6 +782,424 @@ static struct freq_attr *amd_pstate_attr[] = {
> > > > NULL,
> > > > };
> > > >
> > > > +static struct freq_attr *amd_pstate_epp_attr[] = {
> > > > + &amd_pstate_max_freq,
> > > > + &amd_pstate_lowest_nonlinear_freq,
> > > > + &amd_pstate_highest_perf,
> > > > + &energy_performance_preference,
> > > > + &energy_performance_available_preferences,
> > > > + NULL,
> > > > +};
> > > > +
> > > > +static inline void update_boost_state(void) {
> > > > + u64 misc_en;
> > > > + struct amd_cpudata *cpudata;
> > > > +
> > > > + cpudata = all_cpu_data[0];
> > > > + rdmsrl(MSR_K7_HWCR, misc_en);
> > > > + global_params.cppc_boost_disabled = misc_en & BIT_ULL(25);
> > >
> > > We don't need to introduce the additional cppc_boost_disabled here.
> > > cpufreq_driver->boost_enabled and cpudata->boost_supported can
> > > manage this functionality.
> >
> > The cppc_boost_disabled flag marks whether PMFW core boost is disabled,
> > for example if some other driver (thermal, or a performance-limiting
> > driver) disables core boost.
>
> I didn't see any other driver controlling MSR_K7_HWCR_CPB_DIS except
> acpi-cpufreq.
>
> > We need to update the flag to let the driver know that boost is disabled.
> >
> > * boost_supported is used to change the core frequency boost state.
> > * The EPP driver does not use the cpufreq core boost sysfs node, so
> > boost_enabled is not used here.
>
> I would like to clarify again that the core boost state is for legacy
> ACPI P-State and is configured by MSR_K7_HWCR. CPPC uses the highest
> perf to map the boost frequency. It is only because of some
> hardware/firmware issues or quirks that the legacy boost setting still
> impacts the target frequency. So cppc_boost will confuse users about
> the split of functionality between CPPC and ACPI P-State.

Makes sense. As far as I know, the EPP driver does not decide the max
frequency; frequency changes are entirely controlled by the PMFW, so we do
not need to set the boost state here.
The power firmware adjusts the frequency at runtime between the idle
frequency and the max frequency.
So I removed the boost state checking code in v9 to avoid confusion.

Perry.
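
For reference, a minimal sketch of the static_call pattern Ray suggests for
splitting the MSR and shared-memory (cppc_acpi) paths, modeled on the
existing amd_pstate_enable/cppc_enable updates in the patch; the callback
names below are hypothetical:

#include <linux/static_call.h>

static s16 msr_get_epp(struct amd_cpudata *cpudata);	/* hypothetical MSR path */
static s16 shmem_get_epp(struct amd_cpudata *cpudata);	/* hypothetical cppc_acpi path */

/* Default to the MSR implementation; hot call sites then use a direct
 * call instead of a retpolined indirect branch. */
DEFINE_STATIC_CALL(amd_pstate_get_epp_cb, msr_get_epp);

static void __init amd_pstate_pick_epp_backend(void)
{
	/* Shared-memory designs lack the dedicated CPPC MSRs. */
	if (!boot_cpu_has(X86_FEATURE_CPPC))
		static_call_update(amd_pstate_get_epp_cb, shmem_get_epp);
}

/* Call site:
 *	epp = static_call(amd_pstate_get_epp_cb)(cpudata);
 */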

>
> Can we enable core_boost configuration by default if using amd-pstate?
>
> >
> > >
> > > I believe it is a firmware issue that the legacy ACPI boost state
> > > impacts the CPPC frequency. Could you move this handling into the
> > > cpufreq_driver->set_boost callback function to enable the boost
> > > state by default?
> > >
> > > > +}
> > > > +
> > > > +static bool amd_pstate_acpi_pm_profile_server(void)
> > > > +{
> > > > + if (acpi_gbl_FADT.preferred_profile == PM_ENTERPRISE_SERVER ||
> > > > + acpi_gbl_FADT.preferred_profile == PM_PERFORMANCE_SERVER)
> > > > + return true;
> > > > +
> > > > + return false;
> > > > +}
> > > > +
> > > > +static int amd_pstate_init_cpu(unsigned int cpunum) {
> > > > + struct amd_cpudata *cpudata;
> > > > +
> > > > + cpudata = all_cpu_data[cpunum];
> > > > + if (!cpudata) {
> > > > + cpudata = kzalloc(sizeof(*cpudata), GFP_KERNEL);
> > > > + if (!cpudata)
> > > > + return -ENOMEM;
> > > > + WRITE_ONCE(all_cpu_data[cpunum], cpudata);
> > > > +
> > > > + cpudata->cpu = cpunum;
> > > > +
> > > > + if (cppc_active) {
> > >
> > > The cppc_active name is a bit confusing here; if we are running the
> > > amd-pstate driver, CPPC should already be active. I know you want to
> > > indicate which driver mode you are running. Please use an enumeration
> > > type to mark the different modes, such as PASSIVE_MODE, ACTIVE_MODE,
> > > and GUIDED_MODE (as Wyes proposed).
> >
> > Aligned with Wyes; I added one new patch to support enumerated working
> > modes in V8.
> >
> >
> > >
> > > > + if (amd_pstate_acpi_pm_profile_server())
> > > > + cppc_boost = true;
> > > > + }
> > > > +
> > > > + }
> > > > + cpudata->epp_powersave = -EINVAL;
> > > > + cpudata->epp_policy = 0;
> > > > + pr_debug("controlling: cpu %d\n", cpunum);
> > > > + return 0;
> > > > +}
> > > > +
> > > > +static int __amd_pstate_cpu_init(struct cpufreq_policy *policy) {
> > > > + int min_freq, max_freq, nominal_freq, lowest_nonlinear_freq, ret;
> > > > + struct amd_cpudata *cpudata;
> > > > + struct device *dev;
> > > > + int rc;
> > > > + u64 value;
> > > > +
> > > > + rc = amd_pstate_init_cpu(policy->cpu);
> > > > + if (rc)
> > > > + return rc;
> > > > +
> > > > + cpudata = all_cpu_data[policy->cpu];
> > > > +
> > > > + dev = get_cpu_device(policy->cpu);
> > > > + if (!dev)
> > > > + goto free_cpudata1;
> > > > +
> > > > + rc = amd_pstate_init_perf(cpudata);
> > > > + if (rc)
> > > > + goto free_cpudata1;
> > > > +
> > > > + min_freq = amd_get_min_freq(cpudata);
> > > > + max_freq = amd_get_max_freq(cpudata);
> > > > + nominal_freq = amd_get_nominal_freq(cpudata);
> > > > + lowest_nonlinear_freq = amd_get_lowest_nonlinear_freq(cpudata);
> > > > + if (min_freq < 0 || max_freq < 0 || min_freq > max_freq) {
> > > > + dev_err(dev, "min_freq(%d) or max_freq(%d) value is incorrect\n",
> > > > + min_freq, max_freq);
> > > > + ret = -EINVAL;
> > > > + goto free_cpudata1;
> > > > + }
> > > > +
> > > > + policy->min = min_freq;
> > > > + policy->max = max_freq;
> > > > +
> > > > + policy->cpuinfo.min_freq = min_freq;
> > > > + policy->cpuinfo.max_freq = max_freq;
> > > > + /* It will be updated by governor */
> > > > + policy->cur = policy->cpuinfo.min_freq;
> > > > +
> > > > + /* Initial processor data capability frequencies */
> > > > + cpudata->max_freq = max_freq;
> > > > + cpudata->min_freq = min_freq;
> > > > + cpudata->nominal_freq = nominal_freq;
> > > > + cpudata->lowest_nonlinear_freq = lowest_nonlinear_freq;
> > > > +
> > > > + policy->driver_data = cpudata;
> > > > +
> > > > + update_boost_state();
> > > > + cpudata->epp_cached = amd_pstate_get_epp(cpudata, value);
> > > > +
> > > > + policy->min = policy->cpuinfo.min_freq;
> > > > + policy->max = policy->cpuinfo.max_freq;
> > > > +
> > > > + if (boot_cpu_has(X86_FEATURE_CPPC))
> > > > + policy->fast_switch_possible = true;
> > >
> > > Please move this line into the if-case below.
> >
> >
> > Fixed In V8
> >
> > >
> > > > +
> > > > + if (boot_cpu_has(X86_FEATURE_CPPC)) {
> > > > + ret = rdmsrl_on_cpu(cpudata->cpu, MSR_AMD_CPPC_REQ, &value);
> > > > + if (ret)
> > > > + return ret;
> > > > + WRITE_ONCE(cpudata->cppc_req_cached, value);
> > > > +
> > > > + ret = rdmsrl_on_cpu(cpudata->cpu, MSR_AMD_CPPC_CAP1, &value);
> > > > + if (ret)
> > > > + return ret;
> > > > + WRITE_ONCE(cpudata->cppc_cap1_cached, value);
> > > > + }
> > > > + amd_pstate_boost_init(cpudata);
> > > > +
> > > > + return 0;
> > > > +
> > > > +free_cpudata1:
> > > > + kfree(cpudata);
> > > > + return ret;
> > > > +}
> > > > +
> > > > +static int amd_pstate_epp_cpu_init(struct cpufreq_policy *policy) {
> > > > + int ret;
> > > > +
> > > > + ret = __amd_pstate_cpu_init(policy);
> > >
> > > I don't see any reason that we need to define a
> > > __amd_pstate_cpu_init() here. Intel P-State driver's
> > > __intel_pstate_cpu_init() is used by both intel_pstate_cpu_init and
> > > intel_cpufreq_cpu_init.
> >
> > Fixed in V8.
> >
> > >
> > > > + if (ret)
> > > > + return ret;
> > > > + /*
> > > > + * Set the policy to powersave to provide a valid fallback value in case
> > > > + * the default cpufreq governor is neither powersave nor performance.
> > > > + */
> > > > + policy->policy = CPUFREQ_POLICY_POWERSAVE;
> > > > +
> > > > + return 0;
> > > > +}
> > > > +
> > > > +static int amd_pstate_epp_cpu_exit(struct cpufreq_policy *policy) {
> > > > + pr_debug("CPU %d exiting\n", policy->cpu);
> > > > + policy->fast_switch_possible = false;
> > > > + return 0;
> > > > +}
> > > > +
> > > > +static void amd_pstate_update_max_freq(unsigned int cpu)
> > >
> > > Why do you name this function "update max frequency"?
> > >
> > > We won't have the different cpudata->pstate.max_freq and
> > > cpudata->pstate.turbo_freq that the Intel P-State driver has.
> > >
> > > I think in fact we don't update anything here.
> >
> > When core frequency boost is disabled, the function updates the frequency limits.
> > Currently the boost sysfs node is not added, so the max frequency is not changed.
> > Could we keep the code for the coming patch that adds the sysfs node for boost control?
> > It does no harm to the EPP driver.
>
> Again, the boost frequency and state belong to ACPI P-State; we should not
> conflate them with CPPC.
>
> I don't see where amd-pstate updates the max/min frequency limits, so why
> keep this redundant code here?
>
> >
> > >
> > > > +{
> > > > + struct cpufreq_policy *policy = policy = cpufreq_cpu_get(cpu);
> > >
> > > struct cpufreq_policy *policy = cpufreq_cpu_get(cpu);
> > >
> > > > +
> > > > + if (!policy)
> > > > + return;
> > > > +
> > > > + refresh_frequency_limits(policy);
> > > > + cpufreq_cpu_put(policy);
> > > > +}
> > > > +
> > > > +static void amd_pstate_epp_update_limits(unsigned int cpu) {
> > > > + mutex_lock(&amd_pstate_driver_lock);
> > > > + update_boost_state();
> > > > + if (global_params.cppc_boost_disabled) {
> > > > + for_each_possible_cpu(cpu)
> > > > + amd_pstate_update_max_freq(cpu);
> > >
> > > This should do nothing in the amd-pstate.
> >
> > Currently the boost sysfs node is not added, so the max frequency is not changed.
> > Could we keep the code for the coming patch that adds the sysfs node for boost control?
> > It does no harm to the EPP driver.
> >
> > >
> > > > + } else {
> > > > + cpufreq_update_policy(cpu);
> > > > + }
> > > > + mutex_unlock(&amd_pstate_driver_lock);
> > > > +}
> > > > +
> > > > +static int cppc_boost_hold_time_ns = 3 * NSEC_PER_MSEC;
> > > > +
> > > > +static inline void amd_pstate_boost_up(struct amd_cpudata *cpudata)
> > > > +{
> > > > + u64 hwp_req = READ_ONCE(cpudata->cppc_req_cached);
> > > > + u64 hwp_cap = READ_ONCE(cpudata->cppc_cap1_cached);
> > > > + u32 max_limit = (hwp_req & 0xff);
> > > > + u32 min_limit = (hwp_req & 0xff00) >> 8;
> > >
> > > We can use cpudata->max_perf and cpudata->min_perf directly.
> >
> > Iowait boost code removed from V8.
> >
> > >
> > > > + u32 boost_level1;
> > > > +
> > > > + /* If max and min are equal or already at max, nothing to boost */
> > >
> > > I believe this is the only case that max_perf == min_perf, not at max.
> >
> > Iowait boost code removed from V8.
> >
> > >
> > > > + if (max_limit == min_limit)
> > > > + return;
> > > > +
> > > > + /* Set boost max and min to initial value */
> > > > + if (!cpudata->cppc_boost_min)
> > > > + cpudata->cppc_boost_min = min_limit;
> > > > +
> > > > + boost_level1 = ((AMD_CPPC_NOMINAL_PERF(hwp_cap) + min_limit) >> 1);
> > > > +
> > > > + if (cpudata->cppc_boost_min < boost_level1)
> > > > + cpudata->cppc_boost_min = boost_level1;
> > > > + else if (cpudata->cppc_boost_min < AMD_CPPC_NOMINAL_PERF(hwp_cap))
> > > > + cpudata->cppc_boost_min = AMD_CPPC_NOMINAL_PERF(hwp_cap);
> > > > + else if (cpudata->cppc_boost_min == AMD_CPPC_NOMINAL_PERF(hwp_cap))
> > > > + cpudata->cppc_boost_min = max_limit;
> > > > + else
> > > > + return;
> > >
> > > Could you please elaborate on why you split the min_perf
> > > (cppc_boost_min) you write into the cppc_req register into these
> > > different cases? Why do we pick cases such as (min + nominal)/2 and
> > > around nominal? How does this help optimize the final result? I am
> > > thinking that since autonomous mode is handled by the SMU firmware,
> > > we need to provide some data showing how it influences the final result.
> > >
> >
> > Iowait boost code removed from V8.
> >
> > > > +
> > > > + hwp_req &= ~AMD_CPPC_MIN_PERF(~0L);
> > > > + hwp_req |= AMD_CPPC_MIN_PERF(cpudata->cppc_boost_min);
> > > > + wrmsrl(MSR_AMD_CPPC_REQ, hwp_req);
> > >
> > > Do we need an update for shared memory processors? In other words,
> > > EPP is also supported on shared memory processors. Again, we should
> > > use static calls to handle the MSR and cppc_acpi paths.
> > >
> > > > + cpudata->last_update = cpudata->sample.time; }
> > > > +
> > > > +static inline void amd_pstate_boost_down(struct amd_cpudata *cpudata)
> > > > +{
> > > > + bool expired;
> > > > +
> > > > + if (cpudata->cppc_boost_min) {
> > > > + expired = time_after64(cpudata->sample.time, cpudata->last_update +
> > > > + cppc_boost_hold_time_ns);
> > > > +
> > > > + if (expired) {
> > > > + wrmsrl(MSR_AMD_CPPC_REQ, cpudata->cppc_req_cached);
> > > > + cpudata->cppc_boost_min = 0;
> > > > + }
> > > > + }
> > > > +
> > > > + cpudata->last_update = cpudata->sample.time; }
> > > > +
> > > > +static inline void amd_pstate_boost_update_util(struct amd_cpudata *cpudata,
> > > > + u64 time)
> > > > +{
> > > > + cpudata->sample.time = time;
> > > > + if (smp_processor_id() != cpudata->cpu)
> > > > + return;
> > > > +
> > > > + if (cpudata->sched_flags & SCHED_CPUFREQ_IOWAIT) {
> > > > + bool do_io = false;
> > > > +
> > > > + cpudata->sched_flags = 0;
> > > > + /*
> > > > + * Set iowait_boost flag and update time. Since IO WAIT flag
> > > > + * is set all the time, we can't just conclude that there is
> > > > + * some IO bound activity is scheduled on this CPU with just
> > > > + * one occurrence. If we receive at least two in two
> > > > + * consecutive ticks, then we treat as boost candidate.
> > > > + * This is leveraged from Intel Pstate driver.
> > >
> > > I would like to know whether we can hit this case as well. If we can
> > > find or create a use case that hits it on our platforms, I am fine
> > > with adding it to our driver. If not, I don't suggest we add it at
> > > this moment. I hope we have verified each code path by the time we
> > > add it to the driver.
> >
> > Sure, no problem.
> > Iowait boost code removed from V8.
> >
> >
> > >
> > > > + */
> > > > + if (time_before64(time, cpudata->last_io_update + 2 * TICK_NSEC))
> > > > + do_io = true;
> > > > +
> > > > + cpudata->last_io_update = time;
> > > > +
> > > > + if (do_io)
> > > > + amd_pstate_boost_up(cpudata);
> > > > +
> > > > + } else {
> > > > + amd_pstate_boost_down(cpudata);
> > > > + }
> > > > +}
> > > > +
> > > > +static inline void amd_pstate_cppc_update_hook(struct update_util_data *data,
> > > > + u64 time, unsigned int flags)
> > > > +{
> > > > + struct amd_cpudata *cpudata = container_of(data,
> > > > + struct amd_cpudata, update_util);
> > > > +
> > > > + cpudata->sched_flags |= flags;
> > > > +
> > > > + if (smp_processor_id() == cpudata->cpu)
> > > > + amd_pstate_boost_update_util(cpudata, time); }
> > > > +
> > > > +static void amd_pstate_clear_update_util_hook(unsigned int cpu) {
> > > > + struct amd_cpudata *cpudata = all_cpu_data[cpu];
> > > > +
> > > > + if (!cpudata->update_util_set)
> > > > + return;
> > > > +
> > > > + cpufreq_remove_update_util_hook(cpu);
> > > > + cpudata->update_util_set = false;
> > > > + synchronize_rcu();
> > > > +}
> > > > +
> > > > +static void amd_pstate_set_update_util_hook(unsigned int cpu_num)
> > > > +{
> > > > + struct amd_cpudata *cpudata = all_cpu_data[cpu_num];
> > > > +
> > > > + if (!cppc_boost) {
> > > > + if (cpudata->update_util_set)
> > > > + amd_pstate_clear_update_util_hook(cpudata->cpu);
> > > > + return;
> > > > + }
> > > > +
> > > > + if (cpudata->update_util_set)
> > > > + return;
> > > > +
> > > > + cpudata->sample.time = 0;
> > > > + cpufreq_add_update_util_hook(cpu_num, &cpudata->update_util,
> > > > + amd_pstate_cppc_update_hook);
> > > > + cpudata->update_util_set = true; }
> > > > +
> > > > +static void amd_pstate_epp_init(unsigned int cpu) {
> > > > + struct amd_cpudata *cpudata = all_cpu_data[cpu];
> > > > + u32 max_perf, min_perf;
> > > > + u64 value;
> > > > + s16 epp;
> > > > +
> > > > + max_perf = READ_ONCE(cpudata->highest_perf);
> > > > + min_perf = READ_ONCE(cpudata->lowest_perf);
> > > > +
> > > > + value = READ_ONCE(cpudata->cppc_req_cached);
> > > > +
> > > > + if (cpudata->policy == CPUFREQ_POLICY_PERFORMANCE)
> > > > + min_perf = max_perf;
> > > > +
> > > > + /* Initial min/max values for CPPC Performance Controls Register */
> > > > + value &= ~AMD_CPPC_MIN_PERF(~0L);
> > > > + value |= AMD_CPPC_MIN_PERF(min_perf);
> > > > +
> > > > + value &= ~AMD_CPPC_MAX_PERF(~0L);
> > > > + value |= AMD_CPPC_MAX_PERF(max_perf);
> > > > +
> > > > + /* CPPC EPP feature requires the desired perf field to be set to zero */
> > > > + value &= ~AMD_CPPC_DES_PERF(~0L);
> > > > + value |= AMD_CPPC_DES_PERF(0);
> > > > +
> > > > + if (cpudata->epp_policy == cpudata->policy)
> > > > + goto skip_epp;
> > > > +
> > > > + cpudata->epp_policy = cpudata->policy;
> > > > +
> > > > + if (cpudata->policy == CPUFREQ_POLICY_PERFORMANCE) {
> > > > + epp = amd_pstate_get_epp(cpudata, value);
> > > > + cpudata->epp_powersave = epp;
> > >
> > > I didn't see why we need epp_powersave here. It is only initialized
> > > here but never used anywhere.
> >
> > epp_powersave var was removed from V8.
> >
> >
> > >
> > > > + if (epp < 0)
> > > > + goto skip_epp;
> > > > + /* force the epp value to be zero for performance policy */
> > > > + epp = 0;
> > > > + } else {
> > > > + if (cpudata->epp_powersave < 0)
> > > > + goto skip_epp;
> > > > + /* Get BIOS pre-defined epp value */
> > > > + epp = amd_pstate_get_epp(cpudata, value);
> > > > + if (epp)
> > > > + goto skip_epp;
> > > > + epp = cpudata->epp_powersave;
> > > > + }
> > > > + /* Set initial EPP value */
> > > > + if (boot_cpu_has(X86_FEATURE_CPPC)) {
> > > > + value &= ~GENMASK_ULL(31, 24);
> > > > + value |= (u64)epp << 24;
> > > > + }
> > > > +
> > > > +skip_epp:
> > > > + WRITE_ONCE(cpudata->cppc_req_cached, value);
> > > > + amd_pstate_set_epp(cpudata, epp); }
> > > > +
> > > > +static void amd_pstate_set_max_limits(struct amd_cpudata *cpudata)
> > > > +{
> > > > + u64 hwp_cap = READ_ONCE(cpudata->cppc_cap1_cached);
> > > > + u64 hwp_req = READ_ONCE(cpudata->cppc_req_cached);
> > > > + u32 max_limit = (hwp_cap >> 24) & 0xff;
> > > > +
> > > > + hwp_req &= ~AMD_CPPC_MIN_PERF(~0L);
> > > > + hwp_req |= AMD_CPPC_MIN_PERF(max_limit);
> > > > + wrmsrl_on_cpu(cpudata->cpu, MSR_AMD_CPPC_REQ, hwp_req); }
> > > > +
> > > > +static int amd_pstate_epp_set_policy(struct cpufreq_policy *policy) {
> > > > + struct amd_cpudata *cpudata;
> > > > +
> > > > + if (!policy->cpuinfo.max_freq)
> > > > + return -ENODEV;
> > > > +
> > > > + pr_debug("set_policy: cpuinfo.max %u policy->max %u\n",
> > > > + policy->cpuinfo.max_freq, policy->max);
> > > > +
> > > > + cpudata = all_cpu_data[policy->cpu];
> > > > + cpudata->policy = policy->policy;
> > > > +
> > > > + if (boot_cpu_has(X86_FEATURE_CPPC)) {
> > > > + mutex_lock(&amd_pstate_limits_lock);
> > > > +
> > > > + if (cpudata->policy == CPUFREQ_POLICY_PERFORMANCE) {
> > > > + amd_pstate_clear_update_util_hook(policy->cpu);
> > > > + amd_pstate_set_max_limits(cpudata);
> > > > + } else {
> > > > + amd_pstate_set_update_util_hook(policy->cpu);
> > > > + }
> > > > +
> > > > + mutex_unlock(&amd_pstate_limits_lock);
> > > > + }
> > > > + amd_pstate_epp_init(policy->cpu);
> > > > +
> > > > + return 0;
> > > > +}
> > > > +
> > > > +static void amd_pstate_verify_cpu_policy(struct amd_cpudata *cpudata,
> > > > + struct cpufreq_policy_data *policy)
> > > > +{
> > > > + update_boost_state();
> > > > + cpufreq_verify_within_cpu_limits(policy);
> > > > +}
> > > > +
> > > > +static int amd_pstate_epp_verify_policy(struct cpufreq_policy_data *policy)
> > > > +{
> > > > + amd_pstate_verify_cpu_policy(all_cpu_data[policy->cpu], policy);
> > > > + pr_debug("policy_max =%d, policy_min=%d\n", policy->max, policy-
> > > >min);
> > > > + return 0;
> > > > +}
> > >
> > > amd_pstate_verify_cpu_policy and amd_pstate_epp_verify_policy can be
> > > squeezed into one function.
> >
> > Fixed in V8.
> >
> > >
> > > > +
> > > > static struct cpufreq_driver amd_pstate_driver = {
> > > > .flags = CPUFREQ_CONST_LOOPS |
> > > CPUFREQ_NEED_UPDATE_LIMITS,
> > > > .verify = amd_pstate_verify,
> > > > @@ -617,8 +1213,20 @@ static struct cpufreq_driver amd_pstate_driver = {
> > > > .attr = amd_pstate_attr,
> > > > };
> > > >
> > > > +static struct cpufreq_driver amd_pstate_epp_driver = {
> > > > + .flags = CPUFREQ_CONST_LOOPS,
> > > > + .verify = amd_pstate_epp_verify_policy,
> > > > + .setpolicy = amd_pstate_epp_set_policy,
> > > > + .init = amd_pstate_epp_cpu_init,
> > > > + .exit = amd_pstate_epp_cpu_exit,
> > > > + .update_limits = amd_pstate_epp_update_limits,
> > > > + .name = "amd_pstate_epp",
> > > > + .attr = amd_pstate_epp_attr,
> > > > +};
> > > > +
> > > > static int __init amd_pstate_init(void) {
> > > > + static struct amd_cpudata **cpudata;
> > > > int ret;
> > > >
> > > > if (boot_cpu_data.x86_vendor != X86_VENDOR_AMD)
> > > > @@ -645,7 +1253,8 @@ static int __init amd_pstate_init(void)
> > > > /* capability check */
> > > > if (boot_cpu_has(X86_FEATURE_CPPC)) {
> > > > pr_debug("AMD CPPC MSR based functionality is
> > > supported\n");
> > > > - amd_pstate_driver.adjust_perf = amd_pstate_adjust_perf;
> > > > + if (!cppc_active)
> > > > + default_pstate_driver->adjust_perf =
> > > amd_pstate_adjust_perf;
> > > > } else {
> > > > pr_debug("AMD CPPC shared memory based functionality is
> > > supported\n");
> > > > static_call_update(amd_pstate_enable, cppc_enable); @@ -
> > > 653,6
> > > > +1262,10 @@ static int __init amd_pstate_init(void)
> > > > static_call_update(amd_pstate_update_perf,
> > > cppc_update_perf);
> > > > }
> > > >
> > > > + cpudata = vzalloc(array_size(sizeof(void *), num_possible_cpus()));
> > > > + if (!cpudata)
> > > > + return -ENOMEM;
> > > > + WRITE_ONCE(all_cpu_data, cpudata);
> > >
> > > Why can't we use cpufreq_policy->driver_data to store the cpudata?
> > > I believe the cpudata is per-CPU and can easily be retrieved from the private data.
> >
> > cpufreq_policy->driver_data can do that, but global data can have a
> > better cache hit rate, especially for the server cluster.
> > So I would prefer to use the global cpudata array to store each CPU's data struct.
> > Could we keep it for the EPP driver?
>
> I don't follow what "better cache hit rate especially for the server
> cluster" means here. In my view, this is just a different design, not a
> performance enhancement. Since we already store it in
> cpufreq_policy->driver_data, I don't see any visible improvement from
> changing to this approach.
>
> Thanks,
> Ray