2021-09-08 15:03:03

by Huang Rui

Subject: [PATCH 00/19] cpufreq: introduce a new AMD CPU frequency control mechanism

Hi all,

We would like to introduce a new AMD CPU frequency control mechanism, the
"amd-pstate" driver, for modern AMD Zen based CPU series in the Linux kernel.
The new mechanism is based on Collaborative Processor Performance Control
(CPPC), which provides finer-grained frequency management than legacy ACPI
hardware P-States. Current AMD CPU platforms use the ACPI P-states driver to
manage CPU frequency and clocks, switching between only 3 P-states. AMD
P-States is intended to replace the ACPI P-states controls and provides a
flexible, low-latency interface for the Linux kernel to communicate
performance hints directly to the hardware.

"amd-pstate" leverages the Linux kernel governors such as *schedutil*,
*ondemand*, etc. to manage the performance hints that the CPPC hardware
functionality provides. The first version of amd-pstate supports one of the
Zen3 processors, and we will support more in the future after we verify the
hardware and SBIOS functionality.

There are two types of hardware implementations for amd-pstate: one has full
MSR support and the other has shared memory support. The
X86_FEATURE_AMD_CPPC_EXT feature flag is used to distinguish between the two
types.

Using the new AMD P-States method together with the kernel governors
(*schedutil*, *ondemand*, ...) to manage frequency updates is the most
appropriate bridge between AMD Zen based processors and the Linux kernel:
the processor is able to adjust to the most efficient frequency according to
the load seen by the kernel scheduler.

Performance Per Watt (PPW) Calculation:

The PPW calculation follows the paper below:
https://software.intel.com/content/dam/develop/external/us/en/documents/performance-per-what-paper.pdf

The formula below, taken from that paper, is used to measure the PPW:

(F / t) / P = F * t / (t * E) = F / E,

"F" is the number of frames per second.
"P" is power measured in watts.
"E" is energy measured in joules.

We use the RAPL interface with the "perf" tool to get the package energy
data.
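
As a rough sketch of the arithmetic above (not code from this series): if F
is read as the total work done over a run of duration t, and P = E / t, then
both sides of the formula reduce to the same value. That reading of F, and
the function and parameter names below, are assumptions made for
illustration only.

```c
#include <assert.h>

/* Left-hand side: throughput divided by average power, (F / t) / P. */
double ppw_from_power(double work, double seconds, double joules)
{
	double power_watts;

	assert(seconds > 0.0 && joules > 0.0);
	power_watts = joules / seconds;		/* P = E / t */
	return (work / seconds) / power_watts;
}

/* Right-hand side: work divided by energy, F / E. */
double ppw_from_energy(double work, double joules)
{
	assert(joules > 0.0);
	return work / joules;
}
```

The duration cancels out, so only the work figure and the energy reading are
needed to compute the PPW.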

The comparison data for the amd-pstate and acpi-cpufreq modules was measured
on an AMD Cezanne processor:

1) TBench CPU benchmark:

+---------------------------------------------------------------------+
|                                                                     |
|                    TBench (Performance Per Watt)                    |
|                          Higher is better                           |
+-------------------+------------------------+------------------------+
|                   |  Performance Per Watt  |  Performance Per Watt  |
|   Kernel Module   |      (Schedutil)       |       (Ondemand)       |
|                   |   Unit: MB / (s * J)   |   Unit: MB / (s * J)   |
+-------------------+------------------------+------------------------+
|                   |                        |                        |
|   acpi-cpufreq    |         3.022          |         2.969          |
|                   |                        |                        |
+-------------------+------------------------+------------------------+
|                   |                        |                        |
|    amd-pstate     |         3.131          |         3.284          |
|                   |                        |                        |
+-------------------+------------------------+------------------------+

2) Gitsource CPU benchmark:

+---------------------------------------------------------------------+
|                                                                     |
|                  Gitsource (Performance Per Watt)                   |
|                          Higher is better                           |
+-------------------+------------------------+------------------------+
|                   |  Performance Per Watt  |  Performance Per Watt  |
|   Kernel Module   |      (Schedutil)       |       (Ondemand)       |
|                   |   Unit: 1 / (s * J)    |   Unit: 1 / (s * J)    |
+-------------------+------------------------+------------------------+
|                   |                        |                        |
|   acpi-cpufreq    |      3.42172E-07       |      2.74508E-07       |
|                   |                        |                        |
+-------------------+------------------------+------------------------+
|                   |                        |                        |
|    amd-pstate     |      4.09141E-07       |      3.47610E-07       |
|                   |                        |                        |
+-------------------+------------------------+------------------------+

3) Speedometer 2.0 CPU benchmark:

+---------------------------------------------------------------------+
|                                                                     |
|               Speedometer 2.0 (Performance Per Watt)                |
|                          Higher is better                           |
+-------------------+------------------------+------------------------+
|                   |  Performance Per Watt  |  Performance Per Watt  |
|   Kernel Module   |      (Schedutil)       |       (Ondemand)       |
|                   |   Unit: 1 / (s * J)    |   Unit: 1 / (s * J)    |
+-------------------+------------------------+------------------------+
|                   |                        |                        |
|   acpi-cpufreq    |      0.116111767       |      0.110321664       |
|                   |                        |                        |
+-------------------+------------------------+------------------------+
|                   |                        |                        |
|    amd-pstate     |      0.115825281       |      0.122024299       |
|                   |                        |                        |
+-------------------+------------------------+------------------------+


According to the averaged data above, this solution shows better
performance per watt scaling on mobile CPU benchmarks in most cases.
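
To put the tables in relative terms, the change from acpi-cpufreq to
amd-pstate can be expressed as a percentage. The helper below is only a
sketch for readers who want to redo that arithmetic; its name is made up and
it is not part of the series.

```c
#include <assert.h>

/*
 * Percentage change in PPW when moving from acpi-cpufreq to amd-pstate.
 * A positive result means amd-pstate scored higher.
 */
double ppw_gain_percent(double acpi_cpufreq_ppw, double amd_pstate_ppw)
{
	assert(acpi_cpufreq_ppw > 0.0);
	return (amd_pstate_ppw / acpi_cpufreq_ppw - 1.0) * 100.0;
}
```

With the table values, ppw_gain_percent(3.022, 3.131) comes out at roughly
+3.6 for TBench with schedutil, while Speedometer 2.0 with schedutil
(0.116111767 vs. 0.115825281) is slightly negative, which is the one case
where amd-pstate does not come out ahead.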

This patch series depends on the "hotplug capable" CPU fix below (only a few
CPU parts with an "un-hotplug" core will encounter the issue, and Mario is
working on the fix):
https://lore.kernel.org/linux-pm/[email protected]/

The patch series is also available in the git repo below:
https://git.kernel.org/pub/scm/linux/kernel/git/rui/linux.git/log/?h=amd-pstate-dev-v1

For a detailed introduction, please see patch 19.

Thanks,
Ray

Huang Rui (18):
x86/cpufeatures: add AMD CPPC extension feature flag
x86/msr: add AMD CPPC MSR definitions
cpufreq: amd: introduce a new amd pstate driver to support future
processors
cpufreq: amd: add fast switch function for amd-pstate module
cpufreq: amd: add acpi cppc function as the backend for legacy
processors
cpufreq: amd: add trace for amd-pstate module
cpufreq: amd: add boost mode support for amd-pstate
cpufreq: amd: add amd-pstate checking support check attribute
cpufreq: amd: add amd-pstate frequencies attributes
cpufreq: amd: add amd-pstate performance attributes
cpupower: add AMD P-state capability flag
cpupower: add the function to check amd-pstate enabled
cpupower: initial AMD P-state capability
cpupower: add amd-pstate sysfs entries into libcpufreq
cpupower: enable boost state support for amd-pstate module
cpupower: add amd-pstate get data function to query the info
cpupower: print amd-pstate information on cpupower
Documentation: amd-pstate: add amd-pstate driver introduction

Jinzhou Su (1):
ACPI: CPPC: add cppc enable register function

Documentation/admin-guide/pm/amd_pstate.rst | 377 ++++++++
.../admin-guide/pm/working-state.rst | 1 +
arch/x86/include/asm/cpufeatures.h | 1 +
arch/x86/include/asm/msr-index.h | 17 +
drivers/acpi/cppc_acpi.c | 42 +
drivers/cpufreq/Kconfig.x86 | 13 +
drivers/cpufreq/Makefile | 5 +
drivers/cpufreq/amd-pstate-trace.c | 2 +
drivers/cpufreq/amd-pstate-trace.h | 96 +++
drivers/cpufreq/amd-pstate.c | 812 ++++++++++++++++++
include/acpi/cppc_acpi.h | 5 +
tools/power/cpupower/lib/cpufreq.c | 44 +-
tools/power/cpupower/lib/cpufreq.h | 16 +
tools/power/cpupower/utils/cpufreq-info.c | 27 +-
tools/power/cpupower/utils/helpers/cpuid.c | 13 +
tools/power/cpupower/utils/helpers/helpers.h | 6 +
tools/power/cpupower/utils/helpers/misc.c | 27 +
17 files changed, 1500 insertions(+), 4 deletions(-)
create mode 100644 Documentation/admin-guide/pm/amd_pstate.rst
create mode 100644 drivers/cpufreq/amd-pstate-trace.c
create mode 100644 drivers/cpufreq/amd-pstate-trace.h
create mode 100644 drivers/cpufreq/amd-pstate.c

--
2.25.1


2021-09-08 15:03:10

by Huang Rui

Subject: [PATCH 01/19] x86/cpufeatures: add AMD CPPC extension feature flag

Add Collaborative Processor Performance Control Extension feature flag
for AMD processors.

Signed-off-by: Huang Rui <[email protected]>
---
arch/x86/include/asm/cpufeatures.h | 1 +
1 file changed, 1 insertion(+)

diff --git a/arch/x86/include/asm/cpufeatures.h b/arch/x86/include/asm/cpufeatures.h
index d0ce5cfd3ac1..f7aea50e3371 100644
--- a/arch/x86/include/asm/cpufeatures.h
+++ b/arch/x86/include/asm/cpufeatures.h
@@ -313,6 +313,7 @@
#define X86_FEATURE_AMD_SSBD (13*32+24) /* "" Speculative Store Bypass Disable */
#define X86_FEATURE_VIRT_SSBD (13*32+25) /* Virtualized Speculative Store Bypass Disable */
#define X86_FEATURE_AMD_SSB_NO (13*32+26) /* "" Speculative Store Bypass is fixed in hardware. */
+#define X86_FEATURE_AMD_CPPC_EXT (13*32+27) /* Collaborative Processor Performance Control Extension */

/* Thermal and Power Management Leaf, CPUID level 0x00000006 (EAX), word 14 */
#define X86_FEATURE_DTHERM (14*32+ 0) /* Digital Thermal Sensor */
--
2.25.1
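
For readers unfamiliar with the (word*32+bit) encoding used in cpufeatures.h,
here is a small user-space sketch of how the new flag number decomposes. It
relies on word 13 of the kernel's feature array mirroring CPUID leaf
0x80000008 EBX, which is how cpufeatures.h labels that word; the helper names
are hypothetical and the EBX value is passed in rather than read from
hardware.

```c
#include <assert.h>
#include <stdint.h>

/* Same encoding as the define added by this patch. */
#define X86_FEATURE_AMD_CPPC_EXT	(13*32+27)

/* Index of the 32-bit capability word that holds the feature. */
unsigned int feature_word(unsigned int feature)
{
	return feature / 32;
}

/* Bit position of the feature inside its capability word. */
unsigned int feature_bit(unsigned int feature)
{
	return feature % 32;
}

/* Test the flag in a raw CPUID 0x80000008 EBX value supplied by the caller. */
int ebx_has_cppc_ext(uint32_t ebx_80000008)
{
	return (ebx_80000008 >> feature_bit(X86_FEATURE_AMD_CPPC_EXT)) & 1u;
}

/* Usage: the new flag lives in word 13, bit 27. */
void cppc_ext_example(void)
{
	assert(feature_word(X86_FEATURE_AMD_CPPC_EXT) == 13);
	assert(feature_bit(X86_FEATURE_AMD_CPPC_EXT) == 27);
}
```

In the kernel itself the check is simply boot_cpu_has(X86_FEATURE_AMD_CPPC_EXT),
as done in amd_pstate_init() in patch 04.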

2021-09-08 15:04:10

by Huang Rui

Subject: [PATCH 05/19] cpufreq: amd: add fast switch function for amd-pstate module

Introduce the fast switch function for the amd-pstate module on AMD
processors that support full MSR register control. It reduces the
latency of frequency updates made from interrupt context.

Signed-off-by: Huang Rui <[email protected]>
---
drivers/cpufreq/amd-pstate.c | 64 ++++++++++++++++++++++++++++++++++++
1 file changed, 64 insertions(+)

diff --git a/drivers/cpufreq/amd-pstate.c b/drivers/cpufreq/amd-pstate.c
index 4c9c9bf1d72b..32b4f6d79783 100644
--- a/drivers/cpufreq/amd-pstate.c
+++ b/drivers/cpufreq/amd-pstate.c
@@ -212,6 +212,66 @@ static int amd_pstate_target(struct cpufreq_policy *policy,
return ret;
}

+static void amd_pstate_adjust_perf(unsigned int cpu,
+ unsigned long min_perf,
+ unsigned long target_perf,
+ unsigned long capacity)
+{
+ unsigned long amd_max_perf, amd_min_perf, amd_des_perf,
+ amd_cap_perf, lowest_nonlinear_perf;
+ struct cpufreq_policy *policy = cpufreq_cpu_get(cpu);
+ struct amd_cpudata *cpudata = policy->driver_data;
+
+ amd_cap_perf = READ_ONCE(cpudata->highest_perf);
+ lowest_nonlinear_perf = READ_ONCE(cpudata->lowest_nonlinear_perf);
+
+ amd_des_perf = target_perf < capacity ?
+ DIV_ROUND_UP(amd_cap_perf * target_perf, capacity) :
+ amd_cap_perf;
+
+ amd_min_perf = READ_ONCE(cpudata->highest_perf);
+ if (min_perf < capacity)
+ amd_min_perf = DIV_ROUND_UP(amd_cap_perf * min_perf, capacity);
+
+ if (amd_min_perf < lowest_nonlinear_perf)
+ amd_min_perf = lowest_nonlinear_perf;
+
+ amd_max_perf = amd_cap_perf;
+ if (amd_max_perf < amd_min_perf)
+ amd_max_perf = amd_min_perf;
+
+ amd_des_perf = clamp_t(unsigned long, amd_des_perf,
+ amd_min_perf, amd_max_perf);
+
+ amd_pstate_update(cpudata, amd_min_perf, amd_des_perf,
+ amd_max_perf, true);
+}
+
+static unsigned int amd_pstate_fast_switch(struct cpufreq_policy *policy,
+ unsigned int target_freq)
+{
+ u64 ratio;
+ struct amd_cpudata *cpudata = policy->driver_data;
+ unsigned long amd_max_perf, amd_min_perf, amd_des_perf, nominal_perf;
+
+ if (!cpudata->max_freq)
+ return 0;
+
+ amd_max_perf = READ_ONCE(cpudata->highest_perf);
+ amd_min_perf = READ_ONCE(cpudata->lowest_nonlinear_perf);
+
+ amd_des_perf = DIV_ROUND_UP(target_freq * amd_max_perf,
+ cpudata->max_freq);
+
+ amd_pstate_update(cpudata, amd_min_perf, amd_des_perf,
+ amd_max_perf, true);
+
+ nominal_perf = READ_ONCE(cpudata->nominal_perf);
+ ratio = div_u64(amd_des_perf << SCHED_CAPACITY_SHIFT, nominal_perf);
+
+ return cpudata->nominal_freq * ratio >> SCHED_CAPACITY_SHIFT;
+}
+
static int amd_get_min_freq(struct amd_cpudata *cpudata)
{
struct cppc_perf_caps cppc_perf;
@@ -356,6 +416,8 @@ static int amd_pstate_cpu_init(struct cpufreq_policy *policy)
/* It will be updated by governor */
policy->cur = policy->cpuinfo.min_freq;

+ policy->fast_switch_possible = true;
+
ret = freq_qos_add_request(&policy->constraints, &cpudata->req[0],
FREQ_QOS_MIN, policy->cpuinfo.min_freq);
if (ret < 0) {
@@ -408,6 +470,8 @@ static struct cpufreq_driver amd_pstate_driver = {
.flags = CPUFREQ_CONST_LOOPS | CPUFREQ_NEED_UPDATE_LIMITS,
.verify = amd_pstate_verify,
.target = amd_pstate_target,
+ .fast_switch = amd_pstate_fast_switch,
+ .adjust_perf = amd_pstate_adjust_perf,
.init = amd_pstate_cpu_init,
.exit = amd_pstate_cpu_exit,
.name = "amd-pstate",
--
2.25.1
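
The scaling done in amd_pstate_adjust_perf() can be tried out in user space
with the sketch below: a scheduler request (target out of capacity) is mapped
onto the CPPC performance range with a rounding-up division and then clamped
between the minimum and maximum levels. The function names are made up, the
numbers in the usage note are arbitrary, and falling back to the highest
level when the target reaches the capacity is an assumption of this sketch.

```c
#include <assert.h>

#define DIV_ROUND_UP(n, d)	(((n) + (d) - 1) / (d))

/* Map target/capacity onto the range 0..highest_perf, rounding up. */
unsigned long scale_to_cppc_perf(unsigned long highest_perf,
				 unsigned long target,
				 unsigned long capacity)
{
	assert(capacity > 0);

	/* Assumption: a request at or above capacity means "highest". */
	if (target >= capacity)
		return highest_perf;

	return DIV_ROUND_UP(highest_perf * target, capacity);
}

/* Keep a performance level inside [lo, hi]. */
unsigned long clamp_perf(unsigned long val, unsigned long lo,
			 unsigned long hi)
{
	if (val < lo)
		return lo;
	if (val > hi)
		return hi;
	return val;
}
```

For example, with a highest level of 166 and a capacity of 1024, a target of
512 maps to level 83, and a target of 100 maps to level 17, which a
lowest-nonlinear floor of 40 would then raise to 40.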

2021-09-08 15:05:01

by Huang Rui

Subject: [PATCH 10/19] cpufreq: amd: add amd-pstate frequencies attributes

Introduce sysfs attributes that expose the processor frequency at each
level: maximum, nominal, lowest nonlinear, and minimum.

Signed-off-by: Huang Rui <[email protected]>
---
drivers/cpufreq/amd-pstate.c | 80 +++++++++++++++++++++++++++++++++++-
1 file changed, 79 insertions(+), 1 deletion(-)

diff --git a/drivers/cpufreq/amd-pstate.c b/drivers/cpufreq/amd-pstate.c
index 48dedd5af101..3c727a22cb69 100644
--- a/drivers/cpufreq/amd-pstate.c
+++ b/drivers/cpufreq/amd-pstate.c
@@ -577,16 +577,94 @@ static int amd_pstate_cpu_exit(struct cpufreq_policy *policy)
return 0;
}

-static ssize_t show_is_amd_pstate_enabled(struct cpufreq_policy *policy,
+/* Sysfs attributes */
+
+static ssize_t show_amd_pstate_max_freq(struct cpufreq_policy *policy,
char *buf)
+{
+ int ret = 0, max_freq;
+ struct amd_cpudata *cpudata;
+
+ cpudata = policy->driver_data;
+
+ max_freq = amd_get_max_freq(cpudata);
+ if (max_freq < 0)
+ return max_freq;
+
+ ret += sprintf(&buf[ret], "%u\n", max_freq);
+
+ return ret;
+}
+
+static ssize_t show_amd_pstate_nominal_freq(struct cpufreq_policy *policy,
+ char *buf)
+{
+ int ret = 0, nominal_freq;
+ struct amd_cpudata *cpudata;
+
+ cpudata = policy->driver_data;
+
+ nominal_freq = amd_get_nominal_freq(cpudata);
+ if (nominal_freq < 0)
+ return nominal_freq;
+
+ ret += sprintf(&buf[ret], "%u\n", nominal_freq);
+
+ return ret;
+}
+
+static ssize_t
+show_amd_pstate_lowest_nonlinear_freq(struct cpufreq_policy *policy, char *buf)
+{
+ int ret = 0, freq;
+ struct amd_cpudata *cpudata;
+
+ cpudata = policy->driver_data;
+
+ freq = amd_get_lowest_nonlinear_freq(cpudata);
+ if (freq < 0)
+ return freq;
+
+ ret += sprintf(&buf[ret], "%u\n", freq);
+
+ return ret;
+}
+
+static ssize_t show_amd_pstate_min_freq(struct cpufreq_policy *policy, char *buf)
+{
+ int ret = 0;
+ int freq;
+ struct amd_cpudata *cpudata;
+
+ cpudata = policy->driver_data;
+
+ freq = amd_get_min_freq(cpudata);
+ if (freq < 0)
+ return freq;
+
+ ret += sprintf(&buf[ret], "%u\n", freq);
+
+ return ret;
+}
+
+static ssize_t show_is_amd_pstate_enabled(struct cpufreq_policy *policy,
+ char *buf)
{
return sprintf(&buf[0], "%d\n", acpi_cpc_valid() ? 1 : 0);
}

cpufreq_freq_attr_ro(is_amd_pstate_enabled);
+cpufreq_freq_attr_ro(amd_pstate_max_freq);
+cpufreq_freq_attr_ro(amd_pstate_nominal_freq);
+cpufreq_freq_attr_ro(amd_pstate_lowest_nonlinear_freq);
+cpufreq_freq_attr_ro(amd_pstate_min_freq);

static struct freq_attr *amd_pstate_attr[] = {
&is_amd_pstate_enabled,
+ &amd_pstate_max_freq,
+ &amd_pstate_nominal_freq,
+ &amd_pstate_lowest_nonlinear_freq,
+ &amd_pstate_min_freq,
NULL,
};

--
2.25.1
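
A user-space reader for these attributes could look roughly like the sketch
below. Each attribute prints one unsigned value followed by a newline; the
value is in kHz, as returned by the amd_get_*_freq() helpers. The helper
names are hypothetical, and the path in the usage note assumes the usual
per-policy cpufreq directory in sysfs.

```c
#include <assert.h>
#include <errno.h>
#include <limits.h>
#include <stdio.h>
#include <stdlib.h>

/* Parse the "%u\n" text of one attribute; returns kHz, or -1 on error. */
long parse_freq_khz(const char *text)
{
	char *end;
	unsigned long val;

	assert(text != NULL);

	errno = 0;
	val = strtoul(text, &end, 10);
	if (end == text || errno != 0 || (*end != '\n' && *end != '\0'))
		return -1;
	if (val > LONG_MAX)
		return -1;

	return (long)val;
}

/* Read one attribute file; returns kHz, or -1 if it cannot be read. */
long read_freq_attr_khz(const char *path)
{
	char buf[32];
	long khz = -1;
	FILE *f = fopen(path, "r");

	if (!f)
		return -1;
	if (fgets(buf, sizeof(buf), f))
		khz = parse_freq_khz(buf);
	fclose(f);

	return khz;
}
```

On a system running this driver, a call such as
read_freq_attr_khz("/sys/devices/system/cpu/cpu0/cpufreq/amd_pstate_max_freq")
would be expected to return the maximum frequency; on any other system it
returns -1 because the file does not exist.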

2021-09-08 15:05:04

by Huang Rui

Subject: [PATCH 02/19] x86/msr: add AMD CPPC MSR definitions

The AMD CPPC (Collaborative Processor Performance Control) function uses
MSRs to manage the performance hints, so add the MSR definitions and the
field accessor macros here.

Signed-off-by: Huang Rui <[email protected]>
---
arch/x86/include/asm/msr-index.h | 17 +++++++++++++++++
1 file changed, 17 insertions(+)

diff --git a/arch/x86/include/asm/msr-index.h b/arch/x86/include/asm/msr-index.h
index a7c413432b33..ce42e15cf303 100644
--- a/arch/x86/include/asm/msr-index.h
+++ b/arch/x86/include/asm/msr-index.h
@@ -486,6 +486,23 @@

#define MSR_AMD64_VIRT_SPEC_CTRL 0xc001011f

+/* AMD Collaborative Processor Performance Control MSRs */
+#define MSR_AMD_CPPC_CAP1 0xc00102b0
+#define MSR_AMD_CPPC_ENABLE 0xc00102b1
+#define MSR_AMD_CPPC_CAP2 0xc00102b2
+#define MSR_AMD_CPPC_REQ 0xc00102b3
+#define MSR_AMD_CPPC_STATUS 0xc00102b4
+
+#define CAP1_LOWEST_PERF(x) (((x) >> 0) & 0xff)
+#define CAP1_LOWNONLIN_PERF(x) (((x) >> 8) & 0xff)
+#define CAP1_NOMINAL_PERF(x) (((x) >> 16) & 0xff)
+#define CAP1_HIGHEST_PERF(x) (((x) >> 24) & 0xff)
+
+#define REQ_MAX_PERF(x) (((x) & 0xff) << 0)
+#define REQ_MIN_PERF(x) (((x) & 0xff) << 8)
+#define REQ_DES_PERF(x) (((x) & 0xff) << 16)
+#define REQ_ENERGY_PERF_PREF(x) (((x) & 0xff) << 24)
+
/* Fam 17h MSRs */
#define MSR_F17H_IRPERF 0xc00000e9

--
2.25.1
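
To show how the field macros are meant to be used, here is a user-space
sketch that decodes a CAP1 value and rebuilds a REQ value with the same
clear-then-set pattern as amd_pstate_update() in patch 04. The macros are
copied from this patch; the register values and the update_cppc_req() helper
are invented for illustration, and operands are widened to 64 bits before
shifting.

```c
#include <assert.h>
#include <stdint.h>

#define CAP1_LOWEST_PERF(x)	(((x) >> 0) & 0xff)
#define CAP1_LOWNONLIN_PERF(x)	(((x) >> 8) & 0xff)
#define CAP1_NOMINAL_PERF(x)	(((x) >> 16) & 0xff)
#define CAP1_HIGHEST_PERF(x)	(((x) >> 24) & 0xff)

#define REQ_MAX_PERF(x)		(((x) & 0xff) << 0)
#define REQ_MIN_PERF(x)		(((x) & 0xff) << 8)
#define REQ_DES_PERF(x)		(((x) & 0xff) << 16)
#define REQ_ENERGY_PERF_PREF(x)	(((x) & 0xff) << 24)

/* Replace the min/des/max fields of a request, keeping all other bits. */
uint64_t update_cppc_req(uint64_t prev, uint8_t min_perf,
			 uint8_t des_perf, uint8_t max_perf)
{
	uint64_t value = prev;

	value &= ~REQ_MIN_PERF(UINT64_C(0xff));
	value |= REQ_MIN_PERF((uint64_t)min_perf);

	value &= ~REQ_DES_PERF(UINT64_C(0xff));
	value |= REQ_DES_PERF((uint64_t)des_perf);

	value &= ~REQ_MAX_PERF(UINT64_C(0xff));
	value |= REQ_MAX_PERF((uint64_t)max_perf);

	return value;
}

/* Usage with a made-up CAP1 value of 0xA6704010. */
void cppc_fields_example(void)
{
	uint64_t cap1 = UINT64_C(0xA6704010);

	assert(CAP1_LOWEST_PERF(cap1) == 0x10);
	assert(CAP1_LOWNONLIN_PERF(cap1) == 0x40);
	assert(CAP1_NOMINAL_PERF(cap1) == 0x70);
	assert(CAP1_HIGHEST_PERF(cap1) == 0xA6);
}
```

Each field is one byte wide, so the energy performance preference byte in
bits 31:24 is left untouched when only min/des/max are updated.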

2021-09-08 15:05:04

by Huang Rui

Subject: [PATCH 04/19] cpufreq: amd: introduce a new amd pstate driver to support future processors

amd-pstate is the AMD CPU performance scaling driver that introduces a
new CPU frequency control mechanism for AMD Zen based CPU series in the
Linux kernel. The new mechanism is based on Collaborative Processor
Performance Control (CPPC), which provides finer-grained frequency
management than legacy ACPI hardware P-States. Current AMD CPU platforms
use the ACPI P-states driver to manage CPU frequency and clocks,
switching between only 3 P-states. AMD P-States is intended to replace
the ACPI P-states controls and provides a flexible, low-latency
interface for the Linux kernel to communicate performance hints directly
to the hardware.

"amd-pstate" leverages the Linux kernel governors such as *schedutil*,
*ondemand*, etc. to manage the performance hints that the CPPC hardware
functionality provides. The first version of amd-pstate supports one of the
Zen3 processors, and we will support more in the future after we verify the
hardware and SBIOS functionality.

There are two types of hardware implementations for amd-pstate: one has full
MSR support and the other has shared memory support. The
X86_FEATURE_AMD_CPPC_EXT feature flag is used to distinguish between the two
types.

Using the new AMD P-States method together with the kernel governors
(*schedutil*, *ondemand*, ...) to manage frequency updates is the most
appropriate bridge between AMD Zen based processors and the Linux kernel:
the processor is able to adjust to the most efficient frequency according to
the load seen by the kernel scheduler.

Performance Per Watt (PPW) Calculation:

The PPW calculation follows the paper below:
https://software.intel.com/content/dam/develop/external/us/en/documents/performance-per-what-paper.pdf

The formula below, taken from that paper, is used to measure the PPW:

(F / t) / P = F * t / (t * E) = F / E,

"F" is the number of frames per second.
"P" is power measured in watts.
"E" is energy measured in joules.

We use the RAPL interface with the "perf" tool to get the package energy
data.

The comparison data for the amd-pstate and acpi-cpufreq modules was measured
on an AMD Cezanne processor:

1) TBench CPU benchmark:

+---------------------------------------------------------------------+
|                                                                     |
|                    TBench (Performance Per Watt)                    |
|                          Higher is better                           |
+-------------------+------------------------+------------------------+
|                   |  Performance Per Watt  |  Performance Per Watt  |
|   Kernel Module   |      (Schedutil)       |       (Ondemand)       |
|                   |   Unit: MB / (s * J)   |   Unit: MB / (s * J)   |
+-------------------+------------------------+------------------------+
|                   |                        |                        |
|   acpi-cpufreq    |         3.022          |         2.969          |
|                   |                        |                        |
+-------------------+------------------------+------------------------+
|                   |                        |                        |
|    amd-pstate     |         3.131          |         3.284          |
|                   |                        |                        |
+-------------------+------------------------+------------------------+

2) Gitsource CPU benchmark:

+---------------------------------------------------------------------+
|                                                                     |
|                  Gitsource (Performance Per Watt)                   |
|                          Higher is better                           |
+-------------------+------------------------+------------------------+
|                   |  Performance Per Watt  |  Performance Per Watt  |
|   Kernel Module   |      (Schedutil)       |       (Ondemand)       |
|                   |   Unit: 1 / (s * J)    |   Unit: 1 / (s * J)    |
+-------------------+------------------------+------------------------+
|                   |                        |                        |
|   acpi-cpufreq    |      3.42172E-07       |      2.74508E-07       |
|                   |                        |                        |
+-------------------+------------------------+------------------------+
|                   |                        |                        |
|    amd-pstate     |      4.09141E-07       |      3.47610E-07       |
|                   |                        |                        |
+-------------------+------------------------+------------------------+

3) Speedometer 2.0 CPU benchmark:

+---------------------------------------------------------------------+
|                                                                     |
|               Speedometer 2.0 (Performance Per Watt)                |
|                          Higher is better                           |
+-------------------+------------------------+------------------------+
|                   |  Performance Per Watt  |  Performance Per Watt  |
|   Kernel Module   |      (Schedutil)       |       (Ondemand)       |
|                   |   Unit: 1 / (s * J)    |   Unit: 1 / (s * J)    |
+-------------------+------------------------+------------------------+
|                   |                        |                        |
|   acpi-cpufreq    |      0.116111767       |      0.110321664       |
|                   |                        |                        |
+-------------------+------------------------+------------------------+
|                   |                        |                        |
|    amd-pstate     |      0.115825281       |      0.122024299       |
|                   |                        |                        |
+-------------------+------------------------+------------------------+

According to the averaged data above, this solution shows better
performance per watt scaling on mobile CPU benchmarks in most cases.

Signed-off-by: Huang Rui <[email protected]>
---
drivers/cpufreq/Kconfig.x86 | 13 +
drivers/cpufreq/Makefile | 1 +
drivers/cpufreq/amd-pstate.c | 478 +++++++++++++++++++++++++++++++++++
3 files changed, 492 insertions(+)
create mode 100644 drivers/cpufreq/amd-pstate.c

diff --git a/drivers/cpufreq/Kconfig.x86 b/drivers/cpufreq/Kconfig.x86
index 92701a18bdd9..9cd7e338bdcd 100644
--- a/drivers/cpufreq/Kconfig.x86
+++ b/drivers/cpufreq/Kconfig.x86
@@ -34,6 +34,19 @@ config X86_PCC_CPUFREQ

If in doubt, say N.

+config X86_AMD_PSTATE
+ tristate "AMD Processor P-State driver"
+ depends on X86
+ select ACPI_PROCESSOR if ACPI
+ select ACPI_CPPC_LIB if X86_64 && ACPI && SCHED_MC_PRIO
+ select CPU_FREQ_GOV_SCHEDUTIL if SMP
+ help
+ This driver adds a CPUFreq driver which utilizes a fine grain
+ processor performance frequency control range instead of legacy
+ performance levels. This driver also supports newer AMD CPUs.
+
+ If in doubt, say N.
+
config X86_ACPI_CPUFREQ
tristate "ACPI Processor P-States driver"
depends on ACPI_PROCESSOR
diff --git a/drivers/cpufreq/Makefile b/drivers/cpufreq/Makefile
index 27d3bd7ea9d4..3d4bd7141cf8 100644
--- a/drivers/cpufreq/Makefile
+++ b/drivers/cpufreq/Makefile
@@ -24,6 +24,7 @@ obj-$(CONFIG_CPUFREQ_DT_PLATDEV) += cpufreq-dt-platdev.o
# powernow-k8 can load then. ACPI is preferred to all other hardware-specific drivers.
# speedstep-* is preferred over p4-clockmod.

+obj-$(CONFIG_X86_AMD_PSTATE) += amd-pstate.o
obj-$(CONFIG_X86_ACPI_CPUFREQ) += acpi-cpufreq.o
obj-$(CONFIG_X86_POWERNOW_K8) += powernow-k8.o
obj-$(CONFIG_X86_PCC_CPUFREQ) += pcc-cpufreq.o
diff --git a/drivers/cpufreq/amd-pstate.c b/drivers/cpufreq/amd-pstate.c
new file mode 100644
index 000000000000..4c9c9bf1d72b
--- /dev/null
+++ b/drivers/cpufreq/amd-pstate.c
@@ -0,0 +1,478 @@
+// SPDX-License-Identifier: GPL-2.0-or-later
+/*
+ * amd-pstate.c - AMD Processor P-state Frequency Driver
+ *
+ * Copyright (C) 2021 Advanced Micro Devices, Inc. All Rights Reserved.
+ *
+ * This program is free software; you can redistribute it and/or
+ * modify it under the terms of the GNU General Public License
+ * as published by the Free Software Foundation; either version 2
+ * of the License, or (at your option) any later version.
+ *
+ * This program is distributed in the hope that it will be useful,
+ * but WITHOUT ANY WARRANTY; without even the implied warranty of
+ * MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the
+ * GNU General Public License for more details.
+ *
+ * You should have received a copy of the GNU General Public License along with
+ * this program; if not, write to the Free Software
+ * Foundation, Inc., 51 Franklin Street, Fifth Floor, Boston, MA 02110-1301, USA.
+ *
+ * Author: Huang Rui <[email protected]>
+ */
+
+#define pr_fmt(fmt) KBUILD_MODNAME ": " fmt
+
+#include <linux/kernel.h>
+#include <linux/module.h>
+#include <linux/init.h>
+#include <linux/smp.h>
+#include <linux/sched.h>
+#include <linux/cpufreq.h>
+#include <linux/compiler.h>
+#include <linux/dmi.h>
+#include <linux/slab.h>
+#include <linux/acpi.h>
+#include <linux/io.h>
+#include <linux/delay.h>
+#include <linux/uaccess.h>
+
+#include <acpi/processor.h>
+#include <acpi/cppc_acpi.h>
+
+#include <asm/msr.h>
+#include <asm/processor.h>
+#include <asm/cpufeature.h>
+#include <asm/cpu_device_id.h>
+
+#define AMD_PSTATE_TRANSITION_LATENCY 0x20000
+#define AMD_PSTATE_TRANSITION_DELAY 500
+
+static struct cpufreq_driver amd_pstate_driver;
+
+struct amd_cpudata {
+ int cpu;
+
+ struct freq_qos_request req[2];
+ struct cpufreq_policy *policy;
+
+ u64 cppc_req_cached;
+
+ u32 highest_perf;
+ u32 nominal_perf;
+ u32 lowest_nonlinear_perf;
+ u32 lowest_perf;
+
+ u32 max_freq;
+ u32 min_freq;
+ u32 nominal_freq;
+ u32 lowest_nonlinear_freq;
+};
+
+struct amd_pstate_perf_funcs {
+ int (*enable)(bool enable);
+ int (*init_perf)(struct amd_cpudata *cpudata);
+ void (*update_perf)(struct amd_cpudata *cpudata,
+ u32 min_perf, u32 des_perf,
+ u32 max_perf, bool fast_switch);
+};
+
+static inline int pstate_enable(bool enable)
+{
+ return wrmsrl_safe(MSR_AMD_CPPC_ENABLE, enable ? 1 : 0);
+}
+
+static int
+amd_pstate_enable(struct amd_pstate_perf_funcs *funcs, bool enable)
+{
+ if (!funcs)
+ return -EINVAL;
+
+ return funcs->enable(enable);
+}
+
+static int pstate_init_perf(struct amd_cpudata *cpudata)
+{
+ u64 cap1;
+
+ int ret = rdmsrl_safe_on_cpu(cpudata->cpu, MSR_AMD_CPPC_CAP1,
+ &cap1);
+ if (ret)
+ return ret;
+
+ /* Some AMD processors have specific power features for which the CPPC
+ * entry does not indicate the highest performance. Support for that
+ * feature will be introduced later.
+ */
+ WRITE_ONCE(cpudata->highest_perf, amd_get_highest_perf());
+
+ WRITE_ONCE(cpudata->nominal_perf, CAP1_NOMINAL_PERF(cap1));
+ WRITE_ONCE(cpudata->lowest_nonlinear_perf, CAP1_LOWNONLIN_PERF(cap1));
+ WRITE_ONCE(cpudata->lowest_perf, CAP1_LOWEST_PERF(cap1));
+
+ return 0;
+}
+
+static int amd_pstate_init_perf(struct amd_cpudata *cpudata)
+{
+ struct amd_pstate_perf_funcs *funcs = cpufreq_get_driver_data();
+
+ if (!funcs)
+ return -EINVAL;
+
+ return funcs->init_perf(cpudata);
+}
+
+static void pstate_update_perf(struct amd_cpudata *cpudata,
+ u32 min_perf, u32 des_perf, u32 max_perf,
+ bool fast_switch)
+{
+ if (fast_switch)
+ wrmsrl(MSR_AMD_CPPC_REQ, READ_ONCE(cpudata->cppc_req_cached));
+ else
+ wrmsrl_on_cpu(cpudata->cpu, MSR_AMD_CPPC_REQ,
+ READ_ONCE(cpudata->cppc_req_cached));
+}
+
+static int
+amd_pstate_update_perf(struct amd_cpudata *cpudata, u32 min_perf,
+ u32 des_perf, u32 max_perf, bool fast_switch)
+{
+ struct amd_pstate_perf_funcs *funcs = cpufreq_get_driver_data();
+
+ if (!funcs)
+ return -EINVAL;
+
+ funcs->update_perf(cpudata, min_perf, des_perf,
+ max_perf, fast_switch);
+
+ return 0;
+}
+
+static int
+amd_pstate_update(struct amd_cpudata *cpudata, u32 min_perf,
+ u32 des_perf, u32 max_perf, bool fast_switch)
+{
+ u64 prev = READ_ONCE(cpudata->cppc_req_cached);
+ u64 value = prev;
+
+ value &= ~REQ_MIN_PERF(~0L);
+ value |= REQ_MIN_PERF(min_perf);
+
+ value &= ~REQ_DES_PERF(~0L);
+ value |= REQ_DES_PERF(des_perf);
+
+ value &= ~REQ_MAX_PERF(~0L);
+ value |= REQ_MAX_PERF(max_perf);
+
+ if (value == prev)
+ return 0;
+
+ WRITE_ONCE(cpudata->cppc_req_cached, value);
+
+ return amd_pstate_update_perf(cpudata, min_perf, des_perf,
+ max_perf, fast_switch);
+}
+
+static int amd_pstate_verify(struct cpufreq_policy_data *policy)
+{
+ cpufreq_verify_within_cpu_limits(policy);
+
+ return 0;
+}
+
+static int amd_pstate_target(struct cpufreq_policy *policy,
+ unsigned int target_freq,
+ unsigned int relation)
+{
+ int ret;
+ struct cpufreq_freqs freqs;
+ struct amd_cpudata *cpudata = policy->driver_data;
+ unsigned long amd_max_perf, amd_min_perf, amd_des_perf,
+ amd_cap_perf;
+
+ if (!cpudata->max_freq)
+ return -ENODEV;
+
+ amd_cap_perf = READ_ONCE(cpudata->highest_perf);
+ amd_min_perf = READ_ONCE(cpudata->lowest_nonlinear_perf);
+ amd_max_perf = amd_cap_perf;
+
+ freqs.old = policy->cur;
+ freqs.new = target_freq;
+
+ amd_des_perf = DIV_ROUND_CLOSEST(target_freq * amd_cap_perf,
+ cpudata->max_freq);
+
+ cpufreq_freq_transition_begin(policy, &freqs);
+ ret = amd_pstate_update(cpudata, amd_min_perf, amd_des_perf,
+ amd_max_perf, false);
+ cpufreq_freq_transition_end(policy, &freqs, false);
+
+ return ret;
+}
+
+static int amd_get_min_freq(struct amd_cpudata *cpudata)
+{
+ struct cppc_perf_caps cppc_perf;
+
+ int ret = cppc_get_perf_caps(cpudata->cpu, &cppc_perf);
+ if (ret)
+ return ret;
+
+ /* Switch to khz */
+ return cppc_perf.lowest_freq * 1000;
+}
+
+static int amd_get_max_freq(struct amd_cpudata *cpudata)
+{
+ struct cppc_perf_caps cppc_perf;
+ u32 max_perf, max_freq, nominal_freq, nominal_perf;
+ u64 boost_ratio;
+
+ int ret = cppc_get_perf_caps(cpudata->cpu, &cppc_perf);
+ if (ret)
+ return ret;
+
+ nominal_freq = cppc_perf.nominal_freq;
+ nominal_perf = READ_ONCE(cpudata->nominal_perf);
+ max_perf = READ_ONCE(cpudata->highest_perf);
+
+ boost_ratio = div_u64(max_perf << SCHED_CAPACITY_SHIFT,
+ nominal_perf);
+
+ max_freq = nominal_freq * boost_ratio >> SCHED_CAPACITY_SHIFT;
+
+ /* Switch to khz */
+ return max_freq * 1000;
+}
+
+static int amd_get_nominal_freq(struct amd_cpudata *cpudata)
+{
+ struct cppc_perf_caps cppc_perf;
+ u32 nominal_freq;
+
+ int ret = cppc_get_perf_caps(cpudata->cpu, &cppc_perf);
+ if (ret)
+ return ret;
+
+ nominal_freq = cppc_perf.nominal_freq;
+
+ /* Switch to khz */
+ return nominal_freq * 1000;
+}
+
+static int amd_get_lowest_nonlinear_freq(struct amd_cpudata *cpudata)
+{
+ struct cppc_perf_caps cppc_perf;
+ u32 lowest_nonlinear_freq, lowest_nonlinear_perf,
+ nominal_freq, nominal_perf;
+ u64 lowest_nonlinear_ratio;
+
+ int ret = cppc_get_perf_caps(cpudata->cpu, &cppc_perf);
+ if (ret)
+ return ret;
+
+ nominal_freq = cppc_perf.nominal_freq;
+ nominal_perf = READ_ONCE(cpudata->nominal_perf);
+
+ lowest_nonlinear_perf = cppc_perf.lowest_nonlinear_perf;
+
+ lowest_nonlinear_ratio = div_u64(lowest_nonlinear_perf <<
+ SCHED_CAPACITY_SHIFT, nominal_perf);
+
+ lowest_nonlinear_freq = nominal_freq * lowest_nonlinear_ratio >> SCHED_CAPACITY_SHIFT;
+
+ /* Switch to khz */
+ return lowest_nonlinear_freq * 1000;
+}
+
+static int amd_pstate_init_freqs_in_cpudata(struct amd_cpudata *cpudata,
+ u32 max_freq, u32 min_freq,
+ u32 nominal_freq,
+ u32 lowest_nonlinear_freq)
+{
+ if (!cpudata)
+ return -EINVAL;
+
+ /* Initial processor data capability frequencies */
+ cpudata->max_freq = max_freq;
+ cpudata->min_freq = min_freq;
+ cpudata->nominal_freq = nominal_freq;
+ cpudata->lowest_nonlinear_freq = lowest_nonlinear_freq;
+
+ return 0;
+}
+
+static struct amd_pstate_perf_funcs pstate_funcs = {
+ .enable = pstate_enable,
+ .init_perf = pstate_init_perf,
+ .update_perf = pstate_update_perf,
+};
+
+static int amd_pstate_cpu_init(struct cpufreq_policy *policy)
+{
+ int min_freq, max_freq, nominal_freq, lowest_nonlinear_freq, ret;
+ unsigned int cpu = policy->cpu;
+ struct device *dev;
+ struct amd_cpudata *cpudata;
+
+ dev = get_cpu_device(policy->cpu);
+ if (!dev)
+ return -ENODEV;
+
+ cpudata = kzalloc(sizeof(*cpudata), GFP_KERNEL);
+ if (!cpudata)
+ return -ENOMEM;
+
+ cpudata->cpu = cpu;
+ cpudata->policy = policy;
+
+ ret = amd_pstate_init_perf(cpudata);
+ if (ret)
+ goto free_cpudata1;
+
+ min_freq = amd_get_min_freq(cpudata);
+ max_freq = amd_get_max_freq(cpudata);
+ nominal_freq = amd_get_nominal_freq(cpudata);
+ lowest_nonlinear_freq = amd_get_lowest_nonlinear_freq(cpudata);
+
+ if (min_freq < 0 || max_freq < 0 || min_freq > max_freq) {
+ dev_err(dev, "min_freq(%d) or max_freq(%d) value is incorrect\n",
+ min_freq, max_freq);
+ ret = -EINVAL;
+ goto free_cpudata1;
+ }
+
+ policy->cpuinfo.transition_latency = AMD_PSTATE_TRANSITION_LATENCY;
+ policy->transition_delay_us = AMD_PSTATE_TRANSITION_DELAY;
+
+ policy->min = min_freq;
+ policy->max = max_freq;
+
+ policy->cpuinfo.min_freq = min_freq;
+ policy->cpuinfo.max_freq = max_freq;
+
+ /* It will be updated by governor */
+ policy->cur = policy->cpuinfo.min_freq;
+
+ ret = freq_qos_add_request(&policy->constraints, &cpudata->req[0],
+ FREQ_QOS_MIN, policy->cpuinfo.min_freq);
+ if (ret < 0) {
+ dev_err(dev, "Failed to add min-freq constraint (%d)\n", ret);
+ goto free_cpudata1;
+ }
+
+ ret = freq_qos_add_request(&policy->constraints, &cpudata->req[1],
+ FREQ_QOS_MAX, policy->cpuinfo.max_freq);
+ if (ret < 0) {
+ dev_err(dev, "Failed to add max-freq constraint (%d)\n", ret);
+ goto free_cpudata2;
+ }
+
+ ret = amd_pstate_init_freqs_in_cpudata(cpudata, max_freq, min_freq,
+ nominal_freq,
+ lowest_nonlinear_freq);
+ if (ret) {
+ dev_err(dev, "Failed to init cpudata (%d)\n", ret);
+ goto free_cpudata3;
+ }
+
+ policy->driver_data = cpudata;
+
+ return 0;
+
+free_cpudata3:
+ freq_qos_remove_request(&cpudata->req[1]);
+free_cpudata2:
+ freq_qos_remove_request(&cpudata->req[0]);
+free_cpudata1:
+ kfree(cpudata);
+ return ret;
+}
+
+static int amd_pstate_cpu_exit(struct cpufreq_policy *policy)
+{
+ struct amd_cpudata *cpudata;
+
+ cpudata = policy->driver_data;
+
+ freq_qos_remove_request(&cpudata->req[1]);
+ freq_qos_remove_request(&cpudata->req[0]);
+ kfree(cpudata);
+
+ return 0;
+}
+
+static struct cpufreq_driver amd_pstate_driver = {
+ .flags = CPUFREQ_CONST_LOOPS | CPUFREQ_NEED_UPDATE_LIMITS,
+ .verify = amd_pstate_verify,
+ .target = amd_pstate_target,
+ .init = amd_pstate_cpu_init,
+ .exit = amd_pstate_cpu_exit,
+ .name = "amd-pstate",
+};
+
+static int __init amd_pstate_init(void)
+{
+ int ret;
+ struct amd_pstate_perf_funcs *funcs;
+
+ if (boot_cpu_data.x86_vendor != X86_VENDOR_AMD)
+ return -ENODEV;
+
+ if (!acpi_cpc_valid()) {
+ pr_debug("%s, the _CPC object is not present in SBIOS\n",
+ __func__);
+ return -ENODEV;
+ }
+
+ /* don't keep reloading if cpufreq_driver exists */
+ if (cpufreq_get_current_driver())
+ return -EEXIST;
+
+ /* capability check */
+ if (!boot_cpu_has(X86_FEATURE_AMD_CPPC_EXT)) {
+ pr_debug("%s, AMD CPPC extension functionality is not supported\n",
+ __func__);
+ return -ENODEV;
+ }
+
+ funcs = &pstate_funcs;
+
+ /* enable amd pstate feature */
+ ret = amd_pstate_enable(funcs, true);
+ if (ret) {
+ pr_err("%s, failed to enable amd-pstate with return %d\n",
+ __func__, ret);
+ return ret;
+ }
+
+ amd_pstate_driver.driver_data = funcs;
+
+ ret = cpufreq_register_driver(&amd_pstate_driver);
+ if (ret) {
+ pr_err("%s, failed to register driver with return %d\n", __func__, ret);
+ return ret;
+ }
+
+ return 0;
+}
+
+static void __exit amd_pstate_exit(void)
+{
+ struct amd_pstate_perf_funcs *funcs;
+
+ funcs = cpufreq_get_driver_data();
+
+ cpufreq_unregister_driver(&amd_pstate_driver);
+
+ amd_pstate_enable(funcs, false);
+}
+
+module_init(amd_pstate_init);
+module_exit(amd_pstate_exit);
+
+MODULE_AUTHOR("Huang Rui <[email protected]>");
+MODULE_DESCRIPTION("AMD Processor P-state Frequency Driver");
+MODULE_LICENSE("GPL");
--
2.25.1
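
For readers following the error handling in amd_pstate_cpu_init() above, here is a minimal user-space sketch of the same goto-unwind ladder. The struct, the function name and the fail_at parameter are hypothetical stand-ins invented for illustration; none of this is driver code, and the real function operates on kernel objects instead of flags.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical stand-ins for the resources acquired by the init path. */
struct fake_state {
	int cpudata_allocated;	/* models the kzalloc() of cpudata */
	int min_req_added;	/* models the FREQ_QOS_MIN request, req[0] */
	int max_req_added;	/* models the FREQ_QOS_MAX request, req[1] */
};

/*
 * fail_at selects which step fails (0 = none): 1 = adding the min request,
 * 2 = adding the max request, 3 = the final frequency setup.
 */
static int fake_cpu_init(struct fake_state *s, int fail_at)
{
	int ret;

	assert(s != NULL);
	s->cpudata_allocated = 1;

	if (fail_at == 1) {
		ret = -1;
		goto free_cpudata1;	/* nothing else to undo yet */
	}
	s->min_req_added = 1;

	if (fail_at == 2) {
		ret = -2;
		goto free_cpudata2;	/* undo the min request only */
	}
	s->max_req_added = 1;

	if (fail_at == 3) {
		ret = -3;
		goto free_cpudata3;	/* undo both requests */
	}
	return 0;

	/* Labels fall through, releasing in reverse order of acquisition. */
free_cpudata3:
	s->max_req_added = 0;
free_cpudata2:
	s->min_req_added = 0;
free_cpudata1:
	s->cpudata_allocated = 0;
	return ret;
}
```

Each label undoes exactly one acquisition and falls through to the next, so a failure at any step releases only what was already set up.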

2021-09-08 15:05:05

by Huang Rui

Subject: [PATCH 11/19] cpufreq: amd: add amd-pstate performance attributes

Introduce sysfs attributes that expose the amd-pstate performance
levels.

Signed-off-by: Huang Rui <[email protected]>
---
drivers/cpufreq/amd-pstate.c | 66 ++++++++++++++++++++++++++++++++++++
1 file changed, 66 insertions(+)

diff --git a/drivers/cpufreq/amd-pstate.c b/drivers/cpufreq/amd-pstate.c
index 3c727a22cb69..9c60388d45ed 100644
--- a/drivers/cpufreq/amd-pstate.c
+++ b/drivers/cpufreq/amd-pstate.c
@@ -647,6 +647,62 @@ static ssize_t show_amd_pstate_min_freq(struct cpufreq_policy *policy, char *buf
return ret;
}

+static ssize_t
+show_amd_pstate_highest_perf(struct cpufreq_policy *policy, char *buf)
+{
+ int ret = 0;
+ u32 perf;
+ struct amd_cpudata *cpudata = policy->driver_data;
+
+ perf = READ_ONCE(cpudata->highest_perf);
+
+ ret += sprintf(&buf[ret], "%u\n", perf);
+
+ return ret;
+}
+
+static ssize_t
+show_amd_pstate_nominal_perf(struct cpufreq_policy *policy, char *buf)
+{
+ int ret = 0;
+ u32 perf;
+ struct amd_cpudata *cpudata = policy->driver_data;
+
+ perf = READ_ONCE(cpudata->nominal_perf);
+
+ ret += sprintf(&buf[ret], "%u\n", perf);
+
+ return ret;
+}
+
+static ssize_t
+show_amd_pstate_lowest_nonlinear_perf(struct cpufreq_policy *policy, char *buf)
+{
+ int ret = 0;
+ u32 perf;
+ struct amd_cpudata *cpudata = policy->driver_data;
+
+ perf = READ_ONCE(cpudata->lowest_nonlinear_perf);
+
+ ret += sprintf(&buf[ret], "%u\n", perf);
+
+ return ret;
+}
+
+static ssize_t
+show_amd_pstate_lowest_perf(struct cpufreq_policy *policy, char *buf)
+{
+ int ret = 0;
+ u32 perf;
+ struct amd_cpudata *cpudata = policy->driver_data;
+
+ perf = READ_ONCE(cpudata->lowest_perf);
+
+ ret += sprintf(&buf[ret], "%u\n", perf);
+
+ return ret;
+}
+
static ssize_t show_is_amd_pstate_enabled(struct cpufreq_policy *policy,
char *buf)
{
@@ -654,17 +710,27 @@ static ssize_t show_is_amd_pstate_enabled(struct cpufreq_policy *policy,
}

cpufreq_freq_attr_ro(is_amd_pstate_enabled);
+
cpufreq_freq_attr_ro(amd_pstate_max_freq);
cpufreq_freq_attr_ro(amd_pstate_nominal_freq);
cpufreq_freq_attr_ro(amd_pstate_lowest_nonlinear_freq);
cpufreq_freq_attr_ro(amd_pstate_min_freq);

+cpufreq_freq_attr_ro(amd_pstate_highest_perf);
+cpufreq_freq_attr_ro(amd_pstate_nominal_perf);
+cpufreq_freq_attr_ro(amd_pstate_lowest_nonlinear_perf);
+cpufreq_freq_attr_ro(amd_pstate_lowest_perf);
+
static struct freq_attr *amd_pstate_attr[] = {
&is_amd_pstate_enabled,
&amd_pstate_max_freq,
&amd_pstate_nominal_freq,
&amd_pstate_lowest_nonlinear_freq,
&amd_pstate_min_freq,
+ &amd_pstate_highest_perf,
+ &amd_pstate_nominal_perf,
+ &amd_pstate_lowest_nonlinear_perf,
+ &amd_pstate_lowest_perf,
NULL,
};

--
2.25.1

2021-09-08 15:05:32

by Huang Rui

Subject: [PATCH 07/19] cpufreq: amd: add trace for amd-pstate module

Add a trace event to monitor the performance value changes that are
controlled by the CPU governors.

Signed-off-by: Huang Rui <[email protected]>
---
drivers/cpufreq/Makefile | 6 +-
drivers/cpufreq/amd-pstate-trace.c | 2 +
drivers/cpufreq/amd-pstate-trace.h | 96 ++++++++++++++++++++++++++++++
drivers/cpufreq/amd-pstate.c | 19 ++++--
4 files changed, 118 insertions(+), 5 deletions(-)
create mode 100644 drivers/cpufreq/amd-pstate-trace.c
create mode 100644 drivers/cpufreq/amd-pstate-trace.h

diff --git a/drivers/cpufreq/Makefile b/drivers/cpufreq/Makefile
index 3d4bd7141cf8..c1909475eaf9 100644
--- a/drivers/cpufreq/Makefile
+++ b/drivers/cpufreq/Makefile
@@ -17,6 +17,10 @@ obj-$(CONFIG_CPU_FREQ_GOV_ATTR_SET) += cpufreq_governor_attr_set.o
obj-$(CONFIG_CPUFREQ_DT) += cpufreq-dt.o
obj-$(CONFIG_CPUFREQ_DT_PLATDEV) += cpufreq-dt-platdev.o

+# Traces
+CFLAGS_amd-pstate-trace.o := -I$(src)
+amd_pstate-y := amd-pstate.o amd-pstate-trace.o
+
##################################################################################
# x86 drivers.
# Link order matters. K8 is preferred to ACPI because of firmware bugs in early
@@ -24,7 +28,7 @@ obj-$(CONFIG_CPUFREQ_DT_PLATDEV) += cpufreq-dt-platdev.o
# powernow-k8 can load then. ACPI is preferred to all other hardware-specific drivers.
# speedstep-* is preferred over p4-clockmod.

-obj-$(CONFIG_X86_AMD_PSTATE) += amd-pstate.o
+obj-$(CONFIG_X86_AMD_PSTATE) += amd_pstate.o
obj-$(CONFIG_X86_ACPI_CPUFREQ) += acpi-cpufreq.o
obj-$(CONFIG_X86_POWERNOW_K8) += powernow-k8.o
obj-$(CONFIG_X86_PCC_CPUFREQ) += pcc-cpufreq.o
diff --git a/drivers/cpufreq/amd-pstate-trace.c b/drivers/cpufreq/amd-pstate-trace.c
new file mode 100644
index 000000000000..891b696dcd69
--- /dev/null
+++ b/drivers/cpufreq/amd-pstate-trace.c
@@ -0,0 +1,2 @@
+#define CREATE_TRACE_POINTS
+#include "amd-pstate-trace.h"
diff --git a/drivers/cpufreq/amd-pstate-trace.h b/drivers/cpufreq/amd-pstate-trace.h
new file mode 100644
index 000000000000..50c85e150f30
--- /dev/null
+++ b/drivers/cpufreq/amd-pstate-trace.h
@@ -0,0 +1,96 @@
+/* SPDX-License-Identifier: GPL-2.0 */
+/*
+ * amd-pstate-trace.h - AMD Processor P-state Frequency Driver Tracer
+ *
+ * Copyright (C) 2021 Advanced Micro Devices, Inc. All Rights Reserved.
+ *
+ * This program is free software; you can redistribute it and/or
+ * modify it under the terms of the GNU General Public License
+ * as published by the Free Software Foundation; either version 2
+ * of the License, or (at your option) any later version.
+ *
+ * This program is distributed in the hope that it will be useful,
+ * but WITHOUT ANY WARRANTY; without even the implied warranty of
+ * MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the
+ * GNU General Public License for more details.
+ *
+ * You should have received a copy of the GNU General Public License along with
+ * this program; if not, write to the Free Software
+ * Foundation, Inc., 51 Franklin Street, Fifth Floor, Boston, MA 02110-1301, USA.
+ *
+ * Author: Huang Rui <[email protected]>
+ */
+
+#if !defined(_AMD_PSTATE_TRACE_H) || defined(TRACE_HEADER_MULTI_READ)
+#define _AMD_PSTATE_TRACE_H
+
+#include <linux/cpufreq.h>
+#include <linux/tracepoint.h>
+#include <linux/trace_events.h>
+
+#undef TRACE_SYSTEM
+#define TRACE_SYSTEM amd_cpu
+
+#undef TRACE_INCLUDE_FILE
+#define TRACE_INCLUDE_FILE amd-pstate-trace
+
+#define TPS(x) tracepoint_string(x)
+
+TRACE_EVENT(amd_pstate_perf,
+
+ TP_PROTO(unsigned long min_perf,
+ unsigned long target_perf,
+ unsigned long capacity,
+ unsigned int cpu_id,
+ u64 prev,
+ u64 value,
+ int type
+ ),
+
+ TP_ARGS(min_perf,
+ target_perf,
+ capacity,
+ cpu_id,
+ prev,
+ value,
+ type
+ ),
+
+ TP_STRUCT__entry(
+ __field(unsigned long, min_perf)
+ __field(unsigned long, target_perf)
+ __field(unsigned long, capacity)
+ __field(unsigned int, cpu_id)
+ __field(u64, prev)
+ __field(u64, value)
+ __field(int, type)
+ ),
+
+ TP_fast_assign(
+ __entry->min_perf = min_perf;
+ __entry->target_perf = target_perf;
+ __entry->capacity = capacity;
+ __entry->cpu_id = cpu_id;
+ __entry->prev = prev;
+ __entry->value = value;
+ __entry->type = type;
+ ),
+
+ TP_printk("amd_min_perf=%lu amd_des_perf=%lu amd_max_perf=%lu cpu_id=%u prev=0x%llx value=0x%llx type=%d",
+ (unsigned long)__entry->min_perf,
+ (unsigned long)__entry->target_perf,
+ (unsigned long)__entry->capacity,
+ (unsigned int)__entry->cpu_id,
+ (u64)__entry->prev,
+ (u64)__entry->value,
+ (int)__entry->type
+ )
+);
+
+#endif /* _AMD_PSTATE_TRACE_H */
+
+/* This part must be outside protection */
+#undef TRACE_INCLUDE_PATH
+#define TRACE_INCLUDE_PATH .
+
+#include <trace/define_trace.h>
diff --git a/drivers/cpufreq/amd-pstate.c b/drivers/cpufreq/amd-pstate.c
index a46cd5dd9f7c..ea965a122431 100644
--- a/drivers/cpufreq/amd-pstate.c
+++ b/drivers/cpufreq/amd-pstate.c
@@ -44,10 +44,18 @@
#include <asm/processor.h>
#include <asm/cpufeature.h>
#include <asm/cpu_device_id.h>
+#include "amd-pstate-trace.h"

#define AMD_PSTATE_TRANSITION_LATENCY 0x20000
#define AMD_PSTATE_TRANSITION_DELAY 500

+enum switch_type
+{
+ AMD_TARGET = 0,
+ AMD_ADJUST_PERF,
+ AMD_FAST_SWITCH,
+};
+
static struct cpufreq_driver amd_pstate_driver;

struct amd_cpudata {
@@ -195,7 +203,8 @@ amd_pstate_update_perf(struct amd_cpudata *cpudata, u32 min_perf,

static int
amd_pstate_update(struct amd_cpudata *cpudata, u32 min_perf,
- u32 des_perf, u32 max_perf, bool fast_switch)
+ u32 des_perf, u32 max_perf, bool fast_switch,
+ enum switch_type type)
{
u64 prev = READ_ONCE(cpudata->cppc_req_cached);
u64 value = prev;
@@ -209,6 +218,8 @@ amd_pstate_update(struct amd_cpudata *cpudata, u32 min_perf,
value &= ~REQ_MAX_PERF(~0L);
value |= REQ_MAX_PERF(max_perf);

+ trace_amd_pstate_perf(min_perf, des_perf, max_perf,
+ cpudata->cpu, prev, value, type);
if (value == prev)
return 0;

@@ -250,7 +261,7 @@ static int amd_pstate_target(struct cpufreq_policy *policy,

cpufreq_freq_transition_begin(policy, &freqs);
ret = amd_pstate_update(cpudata, amd_min_perf, amd_des_perf,
- amd_max_perf, false);
+ amd_max_perf, false, AMD_TARGET);
cpufreq_freq_transition_end(policy, &freqs, false);

return ret;
@@ -288,7 +299,7 @@ static void amd_pstate_adjust_perf(unsigned int cpu,
amd_min_perf, amd_max_perf);

amd_pstate_update(cpudata, amd_min_perf, amd_des_perf,
- amd_max_perf, true);
+ amd_max_perf, true, AMD_ADJUST_PERF);
}

static unsigned int amd_pstate_fast_switch(struct cpufreq_policy *policy,
@@ -308,7 +319,7 @@ static unsigned int amd_pstate_fast_switch(struct cpufreq_policy *policy,
cpudata->max_freq);

amd_pstate_update(cpudata, amd_min_perf, amd_des_perf,
- amd_max_perf, true);
+ amd_max_perf, true, AMD_FAST_SWITCH);

nominal_perf = READ_ONCE(cpudata->nominal_perf);
ratio = div_u64(amd_des_perf << SCHED_CAPACITY_SHIFT, nominal_perf);
--
2.25.1

2021-09-08 15:05:43

by Huang Rui

Subject: [PATCH 15/19] cpupower: add amd-pstate sysfs entries into libcpufreq

These amd-pstate sysfs entries will be used by cpupower for the amd-pstate
kernel module.

Signed-off-by: Huang Rui <[email protected]>
---
tools/power/cpupower/lib/cpufreq.c | 18 +++++++++++++++++-
1 file changed, 17 insertions(+), 1 deletion(-)

diff --git a/tools/power/cpupower/lib/cpufreq.c b/tools/power/cpupower/lib/cpufreq.c
index c3b56db8b921..3f92ddadaad2 100644
--- a/tools/power/cpupower/lib/cpufreq.c
+++ b/tools/power/cpupower/lib/cpufreq.c
@@ -69,6 +69,14 @@ enum cpufreq_value {
SCALING_MIN_FREQ,
SCALING_MAX_FREQ,
STATS_NUM_TRANSITIONS,
+ AMD_PSTATE_HIGHEST_PERF,
+ AMD_PSTATE_NOMINAL_PERF,
+ AMD_PSTATE_LOWEST_NONLINEAR_PERF,
+ AMD_PSTATE_LOWEST_PERF,
+ AMD_PSTATE_MAX_FREQ,
+ AMD_PSTATE_NOMINAL_FREQ,
+ AMD_PSTATE_LOWEST_NONLINEAR_FREQ,
+ AMD_PSTATE_MIN_FREQ,
MAX_CPUFREQ_VALUE_READ_FILES
};

@@ -80,7 +88,15 @@ static const char *cpufreq_value_files[MAX_CPUFREQ_VALUE_READ_FILES] = {
[SCALING_CUR_FREQ] = "scaling_cur_freq",
[SCALING_MIN_FREQ] = "scaling_min_freq",
[SCALING_MAX_FREQ] = "scaling_max_freq",
- [STATS_NUM_TRANSITIONS] = "stats/total_trans"
+ [STATS_NUM_TRANSITIONS] = "stats/total_trans",
+ [AMD_PSTATE_HIGHEST_PERF] = "amd_pstate_highest_perf",
+ [AMD_PSTATE_NOMINAL_PERF] = "amd_pstate_nominal_perf",
+ [AMD_PSTATE_LOWEST_NONLINEAR_PERF] = "amd_pstate_lowest_nonlinear_perf",
+ [AMD_PSTATE_LOWEST_PERF] = "amd_pstate_lowest_perf",
+ [AMD_PSTATE_MAX_FREQ] = "amd_pstate_max_freq",
+ [AMD_PSTATE_NOMINAL_FREQ] = "amd_pstate_nominal_freq",
+ [AMD_PSTATE_LOWEST_NONLINEAR_FREQ] = "amd_pstate_lowest_nonlinear_freq",
+ [AMD_PSTATE_MIN_FREQ] = "amd_pstate_min_freq"
};


--
2.25.1

2021-09-08 15:06:16

by Huang Rui

Subject: [PATCH 19/19] Documentation: amd-pstate: add amd-pstate driver introduction

Introduce the amd-pstate driver design and implementation.

Signed-off-by: Huang Rui <[email protected]>
---
Documentation/admin-guide/pm/amd_pstate.rst | 377 ++++++++++++++++++
.../admin-guide/pm/working-state.rst | 1 +
2 files changed, 378 insertions(+)
create mode 100644 Documentation/admin-guide/pm/amd_pstate.rst

diff --git a/Documentation/admin-guide/pm/amd_pstate.rst b/Documentation/admin-guide/pm/amd_pstate.rst
new file mode 100644
index 000000000000..c3659dde0cee
--- /dev/null
+++ b/Documentation/admin-guide/pm/amd_pstate.rst
@@ -0,0 +1,377 @@
+.. SPDX-License-Identifier: GPL-2.0
+.. include:: <isonum.txt>
+
+===============================================
+``amd-pstate`` CPU Performance Scaling Driver
+===============================================
+
+:Copyright: |copy| 2021 Advanced Micro Devices, Inc.
+
+:Author: Huang Rui <[email protected]>
+
+
+Introduction
+===================
+
+``amd-pstate`` is the AMD CPU performance scaling driver that introduces a
+new CPU frequency control mechanism for modern AMD APU and CPU series in the
+Linux kernel. The new mechanism is based on Collaborative Processor
+Performance Control (CPPC), which provides finer-grained frequency management
+than legacy ACPI hardware P-States. Current AMD CPU/APU platforms use the
+ACPI P-states driver to manage CPU frequency and clocks, switching among
+only 3 P-states. CPPC replaces the ACPI P-states controls and provides a
+flexible, low-latency interface for the Linux kernel to communicate
+performance hints directly to the hardware.
+
+``amd-pstate`` leverages Linux kernel governors such as ``schedutil`` and
+``ondemand`` to manage the performance hints exposed by the CPPC hardware
+functionality, which follows the hardware specification (for details,
+refer to the AMD64 Architecture Programmer's Manual
+Volume 2: System Programming [1]_). Currently, ``amd-pstate`` supports basic
+frequency control driven by the kernel governors on some
+Zen2 and Zen3 processors. More AMD-specific functions will be implemented
+in the future, once they are verified on the hardware and SBIOS.
+
+
+AMD CPPC Overview
+=======================
+
+The Collaborative Processor Performance Control (CPPC) interface enumerates a
+continuous, abstract, and unit-less performance value on a scale that is
+not tied to a specific performance state / frequency. It is an ACPI
+standard [2]_ through which software can specify application performance goals
+and hints as a target relative to the infrastructure limits. AMD processors
+provide a low-latency register model (MSR) instead of the AML code
+interpreter for performance adjustments. ``amd-pstate`` initializes a
+``struct cpufreq_driver`` instance, ``amd_pstate_driver``, with the callbacks
+that manage each performance update behavior. ::
+
+ Highest Perf ------>+-----------------------+                         +-----------------------+
+                     |                       |                         |                       |
+                     |                       |                         |                       |
+                     |                       |          Max Perf  ---->|                       |
+                     |                       |                         |                       |
+                     |                       |                         |                       |
+ Nominal Perf ------>+-----------------------+                         +-----------------------+
+                     |                       |                         |                       |
+                     |                       |                         |                       |
+                     |                       |                         |                       |
+                     |                       |                         |                       |
+                     |                       |                         |                       |
+                     |                       |                         |                       |
+                     |                       |      Desired Perf  ---->|                       |
+                     |                       |                         |                       |
+                     |                       |                         |                       |
+                     |                       |                         |                       |
+                     |                       |                         |                       |
+                     |                       |                         |                       |
+                     |                       |                         |                       |
+                     |                       |                         |                       |
+                     |                       |                         |                       |
+                     |                       |                         |                       |
+ Lowest non-         |                       |                         |                       |
+ linear perf  ------>+-----------------------+                         +-----------------------+
+                     |                       |                         |                       |
+                     |                       |       Lowest perf  ---->|                       |
+                     |                       |                         |                       |
+ Lowest perf  ------>+-----------------------+                         +-----------------------+
+                     |                       |                         |                       |
+                     |                       |                         |                       |
+                     |                       |                         |                       |
+ 0            ------>+-----------------------+                         +-----------------------+
+
+                                  AMD P-States Performance Scale
+
+
+.. _perf_cap:
+
+AMD CPPC Performance Capability
+--------------------------------
+
+Highest Performance (RO)
+.........................
+
+It is the absolute maximum performance an individual processor may reach,
+assuming ideal conditions. This performance level may not be sustainable
+for long durations and may only be achievable if other platform components
+are in a specific state; for example, it may require other processors to be in
+an idle state. This would be equivalent to the highest frequencies
+supported by the processor.
+
+Nominal (Guaranteed) Performance (RO)
+......................................
+
+It is the maximum sustained performance level of the processor, assuming
+ideal operating conditions. In absence of an external constraint (power,
+thermal, etc.) this is the performance level the processor is expected to
+be able to maintain continuously. All cores/processors are expected to be
+able to sustain their nominal performance state simultaneously.
+
+Lowest non-linear Performance (RO)
+...................................
+
+It is the lowest performance level at which nonlinear power savings are
+achieved, for example, due to the combined effects of voltage and frequency
+scaling. Above this threshold, lower performance levels should be generally
+more energy efficient than higher performance levels. This register
+effectively conveys the most efficient performance level to ``amd-pstate``.
+
+Lowest Performance (RO)
+........................
+
+It is the absolute lowest performance level of the processor. Selecting a
+performance level lower than the lowest nonlinear performance level may
+cause an efficiency penalty but should reduce the instantaneous power
+consumption of the processor.
+
+AMD CPPC Performance Control
+------------------------------
+
+``amd-pstate`` passes performance goals through these registers. The
+registers drive the behavior of the desired performance target.
+
+Minimum requested performance (RW)
+...................................
+
+``amd-pstate`` specifies the minimum allowed performance level.
+
+Maximum requested performance (RW)
+...................................
+
+``amd-pstate`` specifies a limit on the maximum performance that is expected
+to be supplied by the hardware.
+
+Desired performance target (RW)
+...................................
+
+``amd-pstate`` specifies a desired target in the CPPC performance scale as
+a relative number. This can be expressed as a percentage of nominal
+performance (infrastructure max). Below the nominal sustained performance
+level, desired performance expresses the average performance level of the
+processor subject to hardware. Above the nominal performance level, the
+processor must provide at least the nominal performance requested and go
+higher if the current operating conditions allow.
+
+Energy Performance Preference (EPP) (RW)
+.........................................
+
+Provides a hint to the hardware on whether software wants to bias toward
+performance (0x0) or energy efficiency (0xff).
+
+
+Key Governors Support
+=======================
+
+``amd-pstate`` can be used with all the (generic) scaling governors listed
+by the ``scaling_available_governors`` policy attribute in ``sysfs``. It is
+then responsible for the configuration of policy objects corresponding to
+CPUs and provides the ``CPUFreq`` core (and the scaling governors attached
+to the policy objects) with accurate information on the maximum and minimum
+operating frequencies supported by the hardware. Users can check the
+``scaling_cur_freq`` attribute, which is provided by the ``CPUFreq`` core.
+
+``amd-pstate`` mainly supports ``schedutil`` and ``ondemand`` for dynamic
+frequency control. The processor configuration in ``amd-pstate`` is tuned
+for ``schedutil`` working with the CPU CFS scheduler. ``amd-pstate``
+registers the ``adjust_perf`` callback to implement CPPC-like performance
+update behavior. It is initialized by ``sugov_start``, which populates the
+CPU's ``update_util_data`` pointer to assign ``sugov_update_single_perf`` as
+the utilization update callback function in the CPU scheduler. The CPU
+scheduler calls ``cpufreq_update_util`` and assigns the target performance
+according to the ``struct sugov_cpu`` that the utilization update belongs to.
+``amd-pstate`` then updates the desired performance to the value assigned by
+the CPU scheduler.
+
+
+Processor Support
+=======================
+
+The ``amd-pstate`` initialization fails if the ``_CPC`` object in the ACPI
+SBIOS does not exist for the detected processor; the driver uses
+``acpi_cpc_valid`` to check for it. All Zen-based processors support the legacy
+ACPI hardware P-States function, so when ``amd-pstate`` fails to
+initialize, the kernel falls back to initializing the ``acpi-cpufreq``
+driver.
+
+There are two types of hardware implementations for ``amd-pstate``: one is
+`Full MSR Support <perf_cap_>`_ and the other is `Shared Memory Support
+<perf_cap_>`_. The :c:macro:`X86_FEATURE_AMD_CPPC_EXT` feature flag
+(for details, refer to the Processor Programming Reference (PPR) for AMD Family
+19h Model 21h, Revision B0 Processors [3]_) distinguishes the two
+types. ``amd-pstate`` registers a different ``amd_pstate_perf_funcs``
+instance for each hardware implementation.
+
+Currently, some Zen2 and Zen3 processors support ``amd-pstate``. In the
+future, it will be supported on more AMD processors.
+
+Full MSR Support
+-----------------
+
+Some new Zen3 processors, such as Cezanne, provide the MSR registers directly
+when the :c:macro:`X86_FEATURE_AMD_CPPC_EXT` CPU feature flag is set.
+``amd-pstate`` can program the MSR registers to implement the fast switch
+function in ``CPUFreq``, which reduces the latency of frequency control in
+interrupt context.
+
+Shared Memory Support
+----------------------
+
+If the :c:macro:`X86_FEATURE_AMD_CPPC_EXT` CPU feature flag is not set, the
+processor supports the shared memory solution. In this case,
+``amd-pstate`` uses the ``cppc_acpi`` helper methods to implement the
+callback functions of ``amd_pstate_perf_funcs``.
+
+
+AMD P-States and ACPI hardware P-States can always both be supported on one
+processor. AMD P-States has the higher priority, however: if it is enabled
+with :c:macro:`MSR_AMD_CPPC_ENABLE` or ``cppc_set_enable``, the hardware
+responds to requests from AMD P-States.
+
+
+User Space Interface in ``sysfs``
+==================================
+
+``amd-pstate`` exposes several global attributes (files) in ``sysfs`` to
+control its functionality at the system level. They are located in the
+``/sys/devices/system/cpu/cpufreq/policyX/`` directory and affect all CPUs. ::
+
+ root@hr-test1:/home/ray# ls /sys/devices/system/cpu/cpufreq/policy0/*amd*
+ /sys/devices/system/cpu/cpufreq/policy0/amd_pstate_highest_perf
+ /sys/devices/system/cpu/cpufreq/policy0/amd_pstate_lowest_nonlinear_freq
+ /sys/devices/system/cpu/cpufreq/policy0/amd_pstate_lowest_nonlinear_perf
+ /sys/devices/system/cpu/cpufreq/policy0/amd_pstate_lowest_perf
+ /sys/devices/system/cpu/cpufreq/policy0/amd_pstate_max_freq
+ /sys/devices/system/cpu/cpufreq/policy0/amd_pstate_min_freq
+ /sys/devices/system/cpu/cpufreq/policy0/amd_pstate_nominal_freq
+ /sys/devices/system/cpu/cpufreq/policy0/amd_pstate_nominal_perf
+ /sys/devices/system/cpu/cpufreq/policy0/is_amd_pstate_enabled
+
+
+``is_amd_pstate_enabled``
+
+Indicates whether the current kernel has loaded ``amd-pstate`` to enable the
+AMD P-States functionality.
+This attribute is read-only.
+
+``amd_pstate_highest_perf / amd_pstate_max_freq``
+
+Maximum CPPC performance and CPU frequency that the driver is allowed to
+set in percent of the maximum supported CPPC performance level (the highest
+performance supported in `AMD CPPC Performance Capability <perf_cap_>`_).
+This attribute is read-only.
+
+``amd_pstate_nominal_perf / amd_pstate_nominal_freq``
+
+Nominal CPPC performance and CPU frequency that the driver is allowed to
+set in percent of the maximum supported CPPC performance level (Please see
+nominal performance in `AMD CPPC Performance Capability <perf_cap_>`_).
+This attribute is read-only.
+
+``amd_pstate_lowest_nonlinear_perf / amd_pstate_lowest_nonlinear_freq``
+
+The lowest non-linear CPPC performance and CPU frequency that the driver is
+allowed to set in percent of the maximum supported CPPC performance level
+(Please see the lowest non-linear performance in `AMD CPPC Performance
+Capability <perf_cap_>`_).
+This attribute is read-only.
+
+``amd_pstate_lowest_perf / amd_pstate_min_freq``
+
+The lowest physical CPPC performance and CPU frequency.
+This attribute is read-only.
+
+
+``amd-pstate`` vs ``acpi-cpufreq``
+======================================
+
+On the majority of AMD platforms supported by ``acpi-cpufreq``, the ACPI tables
+provided by the platform firmware are used for CPU performance scaling, but
+they provide only 3 P-states on AMD processors.
+Modern AMD APU and CPU series, however, provide Collaborative
+Processor Performance Control according to the ACPI protocol, customized
+for AMD platforms. It offers a fine-grained, continuous frequency range
+instead of the legacy hardware P-states. ``amd-pstate`` is the kernel
+module that supports the new AMD P-States mechanism on most future AMD
+platforms. The AMD P-States mechanism is the more performance- and
+energy-efficient frequency management method on AMD processors.
+
+``cpupower`` tool support for ``amd-pstate``
+===============================================
+
+``amd-pstate`` is supported by the ``cpupower`` tool, which can be used to dump
+frequency information. Support for more operations on the new
+``amd-pstate`` module with this tool is in progress. ::
+
+ root@hr-test1:/home/ray# cpupower frequency-info
+ analyzing CPU 0:
+ driver: amd-pstate
+ CPUs which run at the same hardware frequency: 0
+ CPUs which need to have their frequency coordinated by software: 0
+ maximum transition latency: 131 us
+ hardware limits: 400 MHz - 4.68 GHz
+ available cpufreq governors: ondemand conservative powersave userspace performance schedutil
+ current policy: frequency should be within 400 MHz and 4.68 GHz.
+ The governor "schedutil" may decide which speed to use
+ within this range.
+ current CPU frequency: Unable to call hardware
+ current CPU frequency: 4.02 GHz (asserted by call to kernel)
+ boost state support:
+ Supported: yes
+ Active: yes
+ AMD PSTATE Highest Performance: 166. Maximum Frequency: 4.68 GHz.
+ AMD PSTATE Nominal Performance: 117. Nominal Frequency: 3.30 GHz.
+ AMD PSTATE Lowest Non-linear Performance: 39. Lowest Non-linear Frequency: 1.10 GHz.
+ AMD PSTATE Lowest Performance: 15. Lowest Frequency: 400 MHz.
+
+
+Diagnostics and Tuning
+=======================
+
+Trace Events
+--------------
+
+There are two static trace events that can be used for ``amd-pstate``
+diagnostics. One of them is the cpu_frequency trace event generally used
+by ``CPUFreq``, and the other one is the ``amd_pstate_perf`` trace event
+specific to ``amd-pstate``. The following sequence of shell commands can
+be used to enable them and see their output (if the kernel is generally
+configured to support event tracing). ::
+
+ root@hr-test1:/home/ray# cd /sys/kernel/tracing/
+ root@hr-test1:/sys/kernel/tracing# echo 1 > events/amd_cpu/enable
+ root@hr-test1:/sys/kernel/tracing# cat trace
+ # tracer: nop
+ #
+ # entries-in-buffer/entries-written: 47827/42233061 #P:2
+ #
+ # _-----=> irqs-off
+ # / _----=> need-resched
+ # | / _---=> hardirq/softirq
+ # || / _--=> preempt-depth
+ # ||| / delay
+ # TASK-PID CPU# |||| TIMESTAMP FUNCTION
+ # | | | |||| | |
+ <idle>-0 [000] d.s. 244057.464842: amd_pstate_perf: amd_min_perf=39 amd_des_perf=39 amd_max_perf=166 cpu_id=0 prev=0x2727a6 value=0x2727a6
+ <idle>-0 [000] d.h. 244057.475436: amd_pstate_perf: amd_min_perf=39 amd_des_perf=39 amd_max_perf=166 cpu_id=0 prev=0x2727a6 value=0x2727a6
+ <idle>-0 [000] d.h. 244057.476629: amd_pstate_perf: amd_min_perf=39 amd_des_perf=39 amd_max_perf=166 cpu_id=0 prev=0x2727a6 value=0x2727a6
+ <idle>-0 [000] d.s. 244057.484847: amd_pstate_perf: amd_min_perf=39 amd_des_perf=39 amd_max_perf=166 cpu_id=0 prev=0x2727a6 value=0x2727a6
+ <idle>-0 [000] d.h. 244057.499821: amd_pstate_perf: amd_min_perf=39 amd_des_perf=39 amd_max_perf=166 cpu_id=0 prev=0x2727a6 value=0x2727a6
+ avahi-daemon-528 [000] d... 244057.513568: amd_pstate_perf: amd_min_perf=39 amd_des_perf=39 amd_max_perf=166 cpu_id=0 prev=0x2727a6 value=0x2727a6
+
+The cpu_frequency trace event will be triggered either by the ``schedutil`` scaling
+governor (for the policies it is attached to), or by the ``CPUFreq`` core (for the
+policies with other scaling governors).
+
+
+Reference
+===========
+
+.. [1] AMD64 Architecture Programmer's Manual Volume 2: System Programming,
+ https://www.amd.com/system/files/TechDocs/24593.pdf
+
+.. [2] Advanced Configuration and Power Interface Specification,
+ https://uefi.org/sites/default/files/resources/ACPI_Spec_6_4_Jan22.pdf
+
+.. [3] Processor Programming Reference (PPR) for AMD Family 19h Model 21h, Revision B0 Processors,
+ https://www.amd.com/system/files/TechDocs/55898_B1_pub_0.50.zip
+
diff --git a/Documentation/admin-guide/pm/working-state.rst b/Documentation/admin-guide/pm/working-state.rst
index f40994c422dc..28db6156b55d 100644
--- a/Documentation/admin-guide/pm/working-state.rst
+++ b/Documentation/admin-guide/pm/working-state.rst
@@ -11,6 +11,7 @@ Working-State Power Management
intel_idle
cpufreq
intel_pstate
+ amd_pstate
cpufreq_drivers
intel_epb
intel-speed-select
--
2.25.1

2021-09-08 15:06:53

by Huang Rui

Subject: [PATCH 08/19] cpufreq: amd: add boost mode support for amd-pstate

If the SBIOS supports the boost mode of amd-pstate, enable boost by
default.

Signed-off-by: Huang Rui <[email protected]>
---
drivers/cpufreq/amd-pstate.c | 50 ++++++++++++++++++++++++++++++++++++
1 file changed, 50 insertions(+)

diff --git a/drivers/cpufreq/amd-pstate.c b/drivers/cpufreq/amd-pstate.c
index ea965a122431..67a9a117f524 100644
--- a/drivers/cpufreq/amd-pstate.c
+++ b/drivers/cpufreq/amd-pstate.c
@@ -75,6 +75,8 @@ struct amd_cpudata {
u32 min_freq;
u32 nominal_freq;
u32 lowest_nonlinear_freq;
+
+ bool boost_supported;
};

struct amd_pstate_perf_funcs {
@@ -229,6 +231,19 @@ amd_pstate_update(struct amd_cpudata *cpudata, u32 min_perf,
max_perf, fast_switch);
}

+static bool amd_pstate_boost_supported(struct amd_cpudata *cpudata)
+{
+ u32 highest_perf, nominal_perf;
+
+ highest_perf = READ_ONCE(cpudata->highest_perf);
+ nominal_perf = READ_ONCE(cpudata->nominal_perf);
+
+ if (highest_perf > nominal_perf)
+ return true;
+
+ return false;
+}
+
static int amd_pstate_verify(struct cpufreq_policy_data *policy)
{
cpufreq_verify_within_cpu_limits(policy);
@@ -402,6 +417,37 @@ static int amd_get_lowest_nonlinear_freq(struct amd_cpudata *cpudata)
return lowest_nonlinear_freq * 1000;
}

+static int amd_pstate_set_boost(struct cpufreq_policy *policy, int state)
+{
+ struct amd_cpudata *cpudata = policy->driver_data;
+ int ret;
+
+ if (!cpudata->boost_supported) {
+ pr_err("Boost mode is not supported by this processor or SBIOS\n");
+ return -EINVAL;
+ }
+
+ if (state)
+ policy->cpuinfo.max_freq = cpudata->max_freq;
+ else
+ policy->cpuinfo.max_freq = cpudata->nominal_freq;
+
+ policy->max = policy->cpuinfo.max_freq;
+
+ ret = freq_qos_update_request(&cpudata->req[1],
+ policy->cpuinfo.max_freq);
+ if (ret < 0)
+ return ret;
+
+ return 0;
+}
+
+static void amd_pstate_boost_init(struct amd_cpudata *cpudata)
+{
+ cpudata->boost_supported = true;
+ amd_pstate_driver.boost_enabled = true;
+}
+
static int amd_pstate_init_freqs_in_cpudata(struct amd_cpudata *cpudata,
u32 max_freq, u32 min_freq,
u32 nominal_freq,
@@ -504,6 +550,9 @@ static int amd_pstate_cpu_init(struct cpufreq_policy *policy)

policy->driver_data = cpudata;

+ if (amd_pstate_boost_supported(cpudata))
+ amd_pstate_boost_init(cpudata);
+
return 0;

free_cpudata3:
@@ -535,6 +584,7 @@ static struct cpufreq_driver amd_pstate_driver = {
.fast_switch = amd_pstate_fast_switch,
.init = amd_pstate_cpu_init,
.exit = amd_pstate_cpu_exit,
+ .set_boost = amd_pstate_set_boost,
.name = "amd-pstate",
};

--
2.25.1

2021-09-08 15:06:54

by Huang Rui

Subject: [PATCH 16/19] cpupower: enable boost state support for amd-pstate module

The AMD P-state boost API differs from that of ACPI hardware P-states, so
implement boost state support for the amd-pstate kernel module.

Signed-off-by: Huang Rui <[email protected]>
---
tools/power/cpupower/lib/cpufreq.c | 20 ++++++++++++++++++++
tools/power/cpupower/lib/cpufreq.h | 3 +++
tools/power/cpupower/utils/helpers/misc.c | 7 +++++++
3 files changed, 30 insertions(+)

diff --git a/tools/power/cpupower/lib/cpufreq.c b/tools/power/cpupower/lib/cpufreq.c
index 3f92ddadaad2..37da87bdcfb1 100644
--- a/tools/power/cpupower/lib/cpufreq.c
+++ b/tools/power/cpupower/lib/cpufreq.c
@@ -790,3 +790,23 @@ unsigned long cpufreq_get_transitions(unsigned int cpu)
{
return sysfs_cpufreq_get_one_value(cpu, STATS_NUM_TRANSITIONS);
}
+
+int amd_pstate_boost_support(unsigned int cpu)
+{
+ unsigned int highest_perf, nominal_perf;
+
+ highest_perf = sysfs_cpufreq_get_one_value(cpu, AMD_PSTATE_HIGHEST_PERF);
+ nominal_perf = sysfs_cpufreq_get_one_value(cpu, AMD_PSTATE_NOMINAL_PERF);
+
+ return highest_perf > nominal_perf ? 1 : 0;
+}
+
+int amd_pstate_boost_enabled(unsigned int cpu)
+{
+ unsigned int cpuinfo_max, amd_pstate_max;
+
+ cpuinfo_max = sysfs_cpufreq_get_one_value(cpu, CPUINFO_MAX_FREQ);
+ amd_pstate_max = sysfs_cpufreq_get_one_value(cpu, AMD_PSTATE_MAX_FREQ);
+
+ return cpuinfo_max == amd_pstate_max ? 1 : 0;
+}
diff --git a/tools/power/cpupower/lib/cpufreq.h b/tools/power/cpupower/lib/cpufreq.h
index 95f4fd9e2656..d54d02a7a4f4 100644
--- a/tools/power/cpupower/lib/cpufreq.h
+++ b/tools/power/cpupower/lib/cpufreq.h
@@ -203,6 +203,9 @@ int cpufreq_modify_policy_governor(unsigned int cpu, char *governor);
int cpufreq_set_frequency(unsigned int cpu,
unsigned long target_frequency);

+int amd_pstate_boost_support(unsigned int cpu);
+int amd_pstate_boost_enabled(unsigned int cpu);
+
#ifdef __cplusplus
}
#endif
diff --git a/tools/power/cpupower/utils/helpers/misc.c b/tools/power/cpupower/utils/helpers/misc.c
index 07d80775fb68..aba979320760 100644
--- a/tools/power/cpupower/utils/helpers/misc.c
+++ b/tools/power/cpupower/utils/helpers/misc.c
@@ -10,6 +10,7 @@
#if defined(__i386__) || defined(__x86_64__)

#include "cpupower_intern.h"
+#include "cpufreq.h"

#define MSR_AMD_HWCR 0xc0010015

@@ -39,6 +40,12 @@ int cpufreq_has_boost_support(unsigned int cpu, int *support, int *active,
if (ret)
return ret;
}
+ } if ((cpupower_cpu_info.caps & CPUPOWER_CAP_AMD_PSTATE) &&
+ amd_pstate_boost_support(cpu)) {
+ *support = 1;
+
+ if (amd_pstate_boost_enabled(cpu))
+ *active = 1;
} else if (cpupower_cpu_info.caps & CPUPOWER_CAP_INTEL_IDA)
*support = *active = 1;
return 0;
--
2.25.1

2021-09-08 15:07:31

by Huang Rui

Subject: [PATCH 09/19] cpufreq: amd: add amd-pstate checking support check attribute

The amd-pstate hardware support check will be needed by cpupower to know
whether amd-pstate is enabled and supported.

Signed-off-by: Huang Rui <[email protected]>
---
drivers/cpufreq/amd-pstate.c | 14 ++++++++++++++
1 file changed, 14 insertions(+)

diff --git a/drivers/cpufreq/amd-pstate.c b/drivers/cpufreq/amd-pstate.c
index 67a9a117f524..48dedd5af101 100644
--- a/drivers/cpufreq/amd-pstate.c
+++ b/drivers/cpufreq/amd-pstate.c
@@ -577,6 +577,19 @@ static int amd_pstate_cpu_exit(struct cpufreq_policy *policy)
return 0;
}

+static ssize_t show_is_amd_pstate_enabled(struct cpufreq_policy *policy,
+ char *buf)
+{
+ return sprintf(&buf[0], "%d\n", acpi_cpc_valid() ? 1 : 0);
+}
+
+cpufreq_freq_attr_ro(is_amd_pstate_enabled);
+
+static struct freq_attr *amd_pstate_attr[] = {
+ &is_amd_pstate_enabled,
+ NULL,
+};
+
static struct cpufreq_driver amd_pstate_driver = {
.flags = CPUFREQ_CONST_LOOPS | CPUFREQ_NEED_UPDATE_LIMITS,
.verify = amd_pstate_verify,
@@ -586,6 +599,7 @@ static struct cpufreq_driver amd_pstate_driver = {
.exit = amd_pstate_cpu_exit,
.set_boost = amd_pstate_set_boost,
.name = "amd-pstate",
+ .attr = amd_pstate_attr,
};

static int __init amd_pstate_init(void)
--
2.25.1

2021-09-08 17:34:16

by Nathan Fontenot

Subject: Re: [PATCH 16/19] cpupower: enable boost state support for amd-pstate module

On 9/8/2021 9:59 AM, Huang Rui wrote:
> The AMD P-state boost API is different from ACPI hardware P-states, so
> implement the support for amd-pstate kernel module.
>
> Signed-off-by: Huang Rui <[email protected]>
> ---
> tools/power/cpupower/lib/cpufreq.c | 20 ++++++++++++++++++++
> tools/power/cpupower/lib/cpufreq.h | 3 +++
> tools/power/cpupower/utils/helpers/misc.c | 7 +++++++
> 3 files changed, 30 insertions(+)
>
> diff --git a/tools/power/cpupower/lib/cpufreq.c b/tools/power/cpupower/lib/cpufreq.c
> index 3f92ddadaad2..37da87bdcfb1 100644
> --- a/tools/power/cpupower/lib/cpufreq.c
> +++ b/tools/power/cpupower/lib/cpufreq.c
> @@ -790,3 +790,23 @@ unsigned long cpufreq_get_transitions(unsigned int cpu)
> {
> return sysfs_cpufreq_get_one_value(cpu, STATS_NUM_TRANSITIONS);
> }
> +
> +int amd_pstate_boost_support(unsigned int cpu)
> +{
> + unsigned int highest_perf, nominal_perf;
> +
> + highest_perf = sysfs_cpufreq_get_one_value(cpu, AMD_PSTATE_HIGHEST_PERF);
> + nominal_perf = sysfs_cpufreq_get_one_value(cpu, AMD_PSTATE_NOMINAL_PERF);
> +
> + return highest_perf > nominal_perf ? 1 : 0;
> +}
> +
> +int amd_pstate_boost_enabled(unsigned int cpu)
> +{
> + unsigned int cpuinfo_max, amd_pstate_max;
> +
> + cpuinfo_max = sysfs_cpufreq_get_one_value(cpu, CPUINFO_MAX_FREQ);
> + amd_pstate_max = sysfs_cpufreq_get_one_value(cpu, AMD_PSTATE_MAX_FREQ);
> +
> + return cpuinfo_max == amd_pstate_max ? 1 : 0;
> +}
> diff --git a/tools/power/cpupower/lib/cpufreq.h b/tools/power/cpupower/lib/cpufreq.h
> index 95f4fd9e2656..d54d02a7a4f4 100644
> --- a/tools/power/cpupower/lib/cpufreq.h
> +++ b/tools/power/cpupower/lib/cpufreq.h
> @@ -203,6 +203,9 @@ int cpufreq_modify_policy_governor(unsigned int cpu, char *governor);
> int cpufreq_set_frequency(unsigned int cpu,
> unsigned long target_frequency);
>
> +int amd_pstate_boost_support(unsigned int cpu);
> +int amd_pstate_boost_enabled(unsigned int cpu);
> +
> #ifdef __cplusplus
> }
> #endif
> diff --git a/tools/power/cpupower/utils/helpers/misc.c b/tools/power/cpupower/utils/helpers/misc.c
> index 07d80775fb68..aba979320760 100644
> --- a/tools/power/cpupower/utils/helpers/misc.c
> +++ b/tools/power/cpupower/utils/helpers/misc.c
> @@ -10,6 +10,7 @@
> #if defined(__i386__) || defined(__x86_64__)
>
> #include "cpupower_intern.h"
> +#include "cpufreq.h"
>
> #define MSR_AMD_HWCR 0xc0010015
>
> @@ -39,6 +40,12 @@ int cpufreq_has_boost_support(unsigned int cpu, int *support, int *active,
> if (ret)
> return ret;
> }
> + } if ((cpupower_cpu_info.caps & CPUPOWER_CAP_AMD_PSTATE) &&

Shouldn't this be

} else if ((cpupower_cpu_info.caps & CPUPOWER_CAP_AMD_PSTATE) &&

-Nathan
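
To illustrate the difference, here is a minimal user-space sketch of the
control flow only. The capability bits and both helper functions are
hypothetical stand-ins, not cpupower code. With a plain `if`, the second test
starts a new chain: it is evaluated even after the first branch has run, and
the trailing `else if` pairs with it rather than with the first `if`.

```c
/* Hypothetical capability bits, for illustration only. */
#define CAP_CPB        0x1u
#define CAP_AMD_PSTATE 0x2u
#define CAP_INTEL_IDA  0x4u

/* Counts the branches taken when the second test is a plain `if`. */
static int branches_without_else(unsigned int caps)
{
	int taken = 0;

	if (caps & CAP_CPB) {
		taken++;
	} if (caps & CAP_AMD_PSTATE) {	/* new, independent chain */
		taken++;
	} else if (caps & CAP_INTEL_IDA) {
		taken++;
	}

	return taken;
}

/* Counts the branches taken when all tests form one else-if chain. */
static int branches_with_else(unsigned int caps)
{
	int taken = 0;

	if (caps & CAP_CPB) {
		taken++;
	} else if (caps & CAP_AMD_PSTATE) {
		taken++;
	} else if (caps & CAP_INTEL_IDA) {
		taken++;
	}

	return taken;
}
```

When more than one bit is set, the first variant can take two branches while
the second takes at most one.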

> + amd_pstate_boost_support(cpu)) {
> + *support = 1;
> +
> + if (amd_pstate_boost_enabled(cpu))
> + *active = 1;
> } else if (cpupower_cpu_info.caps & CPUPOWER_CAP_INTEL_IDA)
> *support = *active = 1;
> return 0;
> --
> 2.25.1
>

2021-09-08 17:45:14

by Huang Rui

Subject: [PATCH 03/19] ACPI: CPPC: add cppc enable register function

From: Jinzhou Su <[email protected]>

Export the cppc enable register function for future use.

Signed-off-by: Jinzhou Su <[email protected]>
Signed-off-by: Huang Rui <[email protected]>
---
drivers/acpi/cppc_acpi.c | 42 ++++++++++++++++++++++++++++++++++++++++
include/acpi/cppc_acpi.h | 5 +++++
2 files changed, 47 insertions(+)

diff --git a/drivers/acpi/cppc_acpi.c b/drivers/acpi/cppc_acpi.c
index a4d4eebba1da..de4b30545215 100644
--- a/drivers/acpi/cppc_acpi.c
+++ b/drivers/acpi/cppc_acpi.c
@@ -1220,6 +1220,48 @@ int cppc_get_perf_ctrs(int cpunum, struct cppc_perf_fb_ctrs *perf_fb_ctrs)
}
EXPORT_SYMBOL_GPL(cppc_get_perf_ctrs);

+/**
+ * cppc_set_enable - Set to enable CPPC register.
+ * @cpu: CPU for which to enable CPPC register.
+ * @enable: enable field to write into share memory.
+ *
+ * Return: 0 for success, -ERRNO otherwise.
+ */
+int cppc_set_enable(int cpu, u32 enable)
+{
+ int pcc_ss_id = per_cpu(cpu_pcc_subspace_idx, cpu);
+ struct cpc_register_resource *enable_reg;
+ struct cpc_desc *cpc_desc = per_cpu(cpc_desc_ptr, cpu);
+ struct cppc_pcc_data *pcc_ss_data = NULL;
+ int ret = -1;
+
+ if (!cpc_desc) {
+ pr_debug("No CPC descriptor for CPU:%d\n", cpu);
+ return -ENODEV;
+ }
+
+ enable_reg = &cpc_desc->cpc_regs[ENABLE];
+
+ if (CPC_IN_PCC(enable_reg)) {
+
+ if (pcc_ss_id < 0)
+ return -EIO;
+
+ ret = cpc_write(cpu, enable_reg, enable);
+ if (ret)
+ return ret;
+
+ pcc_ss_data = pcc_data[pcc_ss_id];
+
+ down_write(&pcc_ss_data->pcc_lock);
+ send_pcc_cmd(pcc_ss_id, CMD_WRITE);
+ up_write(&pcc_ss_data->pcc_lock);
+ }
+
+ return ret;
+}
+EXPORT_SYMBOL_GPL(cppc_set_enable);
+
/**
* cppc_set_perf - Set a CPU's performance controls.
* @cpu: CPU for which to set performance controls.
diff --git a/include/acpi/cppc_acpi.h b/include/acpi/cppc_acpi.h
index 9f4985b4d64d..3fdae40a75fc 100644
--- a/include/acpi/cppc_acpi.h
+++ b/include/acpi/cppc_acpi.h
@@ -137,6 +137,7 @@ struct cppc_cpudata {
extern int cppc_get_desired_perf(int cpunum, u64 *desired_perf);
extern int cppc_get_perf_ctrs(int cpu, struct cppc_perf_fb_ctrs *perf_fb_ctrs);
extern int cppc_set_perf(int cpu, struct cppc_perf_ctrls *perf_ctrls);
+extern int cppc_set_enable(int cpu, u32 enable);
extern int cppc_get_perf_caps(int cpu, struct cppc_perf_caps *caps);
extern bool acpi_cpc_valid(void);
extern int acpi_get_psd_map(unsigned int cpu, struct cppc_cpudata *cpu_data);
@@ -157,6 +158,10 @@ static inline int cppc_set_perf(int cpu, struct cppc_perf_ctrls *perf_ctrls)
{
return -ENOTSUPP;
}
+static inline int cppc_set_enable(int cpu, u32 enable)
+{
+ return -ENOTSUPP;
+}
static inline int cppc_get_perf_caps(int cpu, struct cppc_perf_caps *caps)
{
return -ENOTSUPP;
--
2.25.1

2021-09-08 17:45:15

by Huang Rui

Subject: [PATCH 06/19] cpufreq: amd: add acpi cppc function as the backend for legacy processors

Some older Zen-based processors use the shared memory interface that is
exposed by the ACPI SBIOS.

Signed-off-by: Jinzhou Su <[email protected]>
Signed-off-by: Huang Rui <[email protected]>
---
drivers/cpufreq/amd-pstate.c | 63 ++++++++++++++++++++++++++++++++----
1 file changed, 57 insertions(+), 6 deletions(-)

diff --git a/drivers/cpufreq/amd-pstate.c b/drivers/cpufreq/amd-pstate.c
index 32b4f6d79783..a46cd5dd9f7c 100644
--- a/drivers/cpufreq/amd-pstate.c
+++ b/drivers/cpufreq/amd-pstate.c
@@ -82,6 +82,19 @@ static inline int pstate_enable(bool enable)
return wrmsrl_safe(MSR_AMD_CPPC_ENABLE, enable ? 1 : 0);
}

+static int cppc_enable(bool enable)
+{
+ int cpu, ret = 0;
+
+ for_each_online_cpu(cpu) {
+ ret = cppc_set_enable(cpu, enable ? 1 : 0);
+ if (ret)
+ return ret;
+ }
+
+ return ret;
+}
+
static int
amd_pstate_enable(struct amd_pstate_perf_funcs *funcs, bool enable)
{
@@ -113,6 +126,24 @@ static int pstate_init_perf(struct amd_cpudata *cpudata)
return 0;
}

+static int cppc_init_perf(struct amd_cpudata *cpudata)
+{
+ struct cppc_perf_caps cppc_perf;
+
+ int ret = cppc_get_perf_caps(cpudata->cpu, &cppc_perf);
+ if (ret)
+ return ret;
+
+ WRITE_ONCE(cpudata->highest_perf, amd_get_highest_perf());
+
+ WRITE_ONCE(cpudata->nominal_perf, cppc_perf.nominal_perf);
+ WRITE_ONCE(cpudata->lowest_nonlinear_perf,
+ cppc_perf.lowest_nonlinear_perf);
+ WRITE_ONCE(cpudata->lowest_perf, cppc_perf.lowest_perf);
+
+ return 0;
+}
+
static int amd_pstate_init_perf(struct amd_cpudata *cpudata)
{
struct amd_pstate_perf_funcs *funcs = cpufreq_get_driver_data();
@@ -134,6 +165,19 @@ static void pstate_update_perf(struct amd_cpudata *cpudata,
READ_ONCE(cpudata->cppc_req_cached));
}

+static void cppc_update_perf(struct amd_cpudata *cpudata,
+ u32 min_perf, u32 des_perf,
+ u32 max_perf, bool fast_switch)
+{
+ struct cppc_perf_ctrls perf_ctrls;
+
+ perf_ctrls.max_perf = max_perf;
+ perf_ctrls.min_perf = min_perf;
+ perf_ctrls.desired_perf = des_perf;
+
+ cppc_set_perf(cpudata->cpu, &perf_ctrls);
+}
+
static int
amd_pstate_update_perf(struct amd_cpudata *cpudata, u32 min_perf,
u32 des_perf, u32 max_perf, bool fast_switch)
@@ -370,6 +414,12 @@ static struct amd_pstate_perf_funcs pstate_funcs = {
.update_perf = pstate_update_perf,
};

+static struct amd_pstate_perf_funcs cppc_funcs = {
+ .enable = cppc_enable,
+ .init_perf = cppc_init_perf,
+ .update_perf = cppc_update_perf,
+};
+
static int amd_pstate_cpu_init(struct cpufreq_policy *policy)
{
int min_freq, max_freq, nominal_freq, lowest_nonlinear_freq, ret;
@@ -416,7 +466,8 @@ static int amd_pstate_cpu_init(struct cpufreq_policy *policy)
/* It will be updated by governor */
policy->cur = policy->cpuinfo.min_freq;

- policy->fast_switch_possible = true;
+ if (boot_cpu_has(X86_FEATURE_AMD_CPPC_EXT))
+ policy->fast_switch_possible = true;

ret = freq_qos_add_request(&policy->constraints, &cpudata->req[0],
FREQ_QOS_MIN, policy->cpuinfo.min_freq);
@@ -471,7 +522,6 @@ static struct cpufreq_driver amd_pstate_driver = {
.verify = amd_pstate_verify,
.target = amd_pstate_target,
.fast_switch = amd_pstate_fast_switch,
- .adjust_perf = amd_pstate_adjust_perf,
.init = amd_pstate_cpu_init,
.exit = amd_pstate_cpu_exit,
.name = "amd-pstate",
@@ -496,14 +546,15 @@ static int __init amd_pstate_init(void)
return -EEXIST;

/* capability check */
- if (!boot_cpu_has(X86_FEATURE_AMD_CPPC_EXT)) {
+ if (boot_cpu_has(X86_FEATURE_AMD_CPPC_EXT)) {
pr_debug("%s, AMD CPPC extension functionality is supported\n",
__func__);
- return -ENODEV;
+ funcs = &pstate_funcs;
+ amd_pstate_driver.adjust_perf = amd_pstate_adjust_perf;
+ } else {
+ funcs = &cppc_funcs;
}

- funcs = &pstate_funcs;
-
/* enable amd pstate feature */
ret = amd_pstate_enable(funcs, true);
if (ret) {
--
2.25.1

2021-09-08 17:46:12

by Huang Rui

Subject: [PATCH 13/19] cpupower: add the function to check amd-pstate enabled

Introduce cpupower_amd_pstate_enabled() to check whether amd-pstate is
enabled in the kernel.

Signed-off-by: Huang Rui <[email protected]>
---
tools/power/cpupower/utils/helpers/helpers.h | 5 +++++
tools/power/cpupower/utils/helpers/misc.c | 20 ++++++++++++++++++++
2 files changed, 25 insertions(+)

diff --git a/tools/power/cpupower/utils/helpers/helpers.h b/tools/power/cpupower/utils/helpers/helpers.h
index b4813efdfb00..eb43c14d1728 100644
--- a/tools/power/cpupower/utils/helpers/helpers.h
+++ b/tools/power/cpupower/utils/helpers/helpers.h
@@ -136,6 +136,11 @@ extern int decode_pstates(unsigned int cpu, int boost_states,

extern int cpufreq_has_boost_support(unsigned int cpu, int *support,
int *active, int * states);
+
+/* AMD PSTATE enabling **************************/
+
+extern unsigned long cpupower_amd_pstate_enabled(unsigned int cpu);
+
/*
* CPUID functions returning a single datum
*/
diff --git a/tools/power/cpupower/utils/helpers/misc.c b/tools/power/cpupower/utils/helpers/misc.c
index fc6e34511721..07d80775fb68 100644
--- a/tools/power/cpupower/utils/helpers/misc.c
+++ b/tools/power/cpupower/utils/helpers/misc.c
@@ -83,6 +83,26 @@ int cpupower_intel_set_perf_bias(unsigned int cpu, unsigned int val)
return 0;
}

+unsigned long cpupower_amd_pstate_enabled(unsigned int cpu)
+{
+ char linebuf[MAX_LINE_LEN];
+ char path[SYSFS_PATH_MAX];
+ unsigned long val;
+ char *endp;
+
+ snprintf(path, sizeof(path),
+ PATH_TO_CPU "cpu%u/cpufreq/is_amd_pstate_enabled", cpu);
+
+ if (cpupower_read_sysfs(path, linebuf, MAX_LINE_LEN) == 0)
+ return 0;
+
+ val = strtoul(linebuf, &endp, 0);
+ if (endp == linebuf || errno == ERANGE)
+ return 0;
+
+ return val;
+}
+
#endif /* #if defined(__i386__) || defined(__x86_64__) */

/* get_cpustate
--
2.25.1

2021-09-08 17:46:18

by Huang Rui

Subject: [PATCH 17/19] cpupower: add amd-pstate get data function to query the info

Frequency-info needs an interface to query the current amd-pstate data.

Signed-off-by: Huang Rui <[email protected]>
---
tools/power/cpupower/lib/cpufreq.c | 6 ++++++
tools/power/cpupower/lib/cpufreq.h | 13 +++++++++++++
2 files changed, 19 insertions(+)

diff --git a/tools/power/cpupower/lib/cpufreq.c b/tools/power/cpupower/lib/cpufreq.c
index 37da87bdcfb1..1443080868da 100644
--- a/tools/power/cpupower/lib/cpufreq.c
+++ b/tools/power/cpupower/lib/cpufreq.c
@@ -810,3 +810,9 @@ int amd_pstate_boost_enabled(unsigned int cpu)

return cpuinfo_max == amd_pstate_max ? 1 : 0;
}
+
+unsigned amd_pstate_get_data(unsigned int cpu, enum amd_pstate_param param)
+{
+ return sysfs_cpufreq_get_one_value(cpu,
+ param + AMD_PSTATE_HIGHEST_PERF);
+}
diff --git a/tools/power/cpupower/lib/cpufreq.h b/tools/power/cpupower/lib/cpufreq.h
index d54d02a7a4f4..954e72704fc0 100644
--- a/tools/power/cpupower/lib/cpufreq.h
+++ b/tools/power/cpupower/lib/cpufreq.h
@@ -206,6 +206,19 @@ int cpufreq_set_frequency(unsigned int cpu,
int amd_pstate_boost_support(unsigned int cpu);
int amd_pstate_boost_enabled(unsigned int cpu);

+enum amd_pstate_param {
+ HIGHEST_PERF,
+ NOMINAL_PERF,
+ LOWEST_NONLINEAR_PERF,
+ LOWEST_PERF,
+ MAX_FREQ,
+ NOMINAL_FREQ,
+ LOWEST_NONLINEAR_FREQ,
+ MIN_FREQ,
+};
+
+unsigned amd_pstate_get_data(unsigned int cpu, enum amd_pstate_param param);
+
#ifdef __cplusplus
}
#endif
--
2.25.1

2021-09-08 17:46:22

by Huang Rui

Subject: [PATCH 18/19] cpupower: print amd-pstate information on cpupower

The amd-pstate kernel module uses fine-grained frequency ranges instead of
ACPI hardware P-states, so the performance and frequency values should be
printed in frequency-info.

Signed-off-by: Huang Rui <[email protected]>
---
tools/power/cpupower/utils/cpufreq-info.c | 27 ++++++++++++++++++++---
1 file changed, 24 insertions(+), 3 deletions(-)

diff --git a/tools/power/cpupower/utils/cpufreq-info.c b/tools/power/cpupower/utils/cpufreq-info.c
index f9895e31ff5a..9eabed209adc 100644
--- a/tools/power/cpupower/utils/cpufreq-info.c
+++ b/tools/power/cpupower/utils/cpufreq-info.c
@@ -183,9 +183,30 @@ static int get_boost_mode_x86(unsigned int cpu)
printf(_(" Supported: %s\n"), support ? _("yes") : _("no"));
printf(_(" Active: %s\n"), active ? _("yes") : _("no"));

- if ((cpupower_cpu_info.vendor == X86_VENDOR_AMD &&
- cpupower_cpu_info.family >= 0x10) ||
- cpupower_cpu_info.vendor == X86_VENDOR_HYGON) {
+ if (cpupower_cpu_info.vendor == X86_VENDOR_AMD &&
+ cpupower_cpu_info.caps & CPUPOWER_CAP_AMD_PSTATE) {
+ printf(_(" AMD PSTATE Highest Performance: %u. Maximum Frequency: "),
+ amd_pstate_get_data(cpu, HIGHEST_PERF));
+ print_speed(amd_pstate_get_data(cpu, MAX_FREQ));
+ printf(".\n");
+
+ printf(_(" AMD PSTATE Nominal Performance: %u. Nominal Frequency: "),
+ amd_pstate_get_data(cpu, NOMINAL_PERF));
+ print_speed(amd_pstate_get_data(cpu, NOMINAL_FREQ));
+ printf(".\n");
+
+ printf(_(" AMD PSTATE Lowest Non-linear Performance: %u. Lowest Non-linear Frequency: "),
+ amd_pstate_get_data(cpu, LOWEST_NONLINEAR_PERF));
+ print_speed(amd_pstate_get_data(cpu, LOWEST_NONLINEAR_FREQ));
+ printf(".\n");
+
+ printf(_(" AMD PSTATE Lowest Performance: %u. Lowest Frequency: "),
+ amd_pstate_get_data(cpu, LOWEST_PERF));
+ print_speed(amd_pstate_get_data(cpu, MIN_FREQ));
+ printf(".\n");
+ } else if ((cpupower_cpu_info.vendor == X86_VENDOR_AMD &&
+ cpupower_cpu_info.family >= 0x10) ||
+ cpupower_cpu_info.vendor == X86_VENDOR_HYGON) {
ret = decode_pstates(cpu, b_states, pstates, &pstate_no);
if (ret)
return ret;
--
2.25.1

2021-09-08 17:47:20

by Huang Rui

Subject: [PATCH 14/19] cpupower: initial AMD P-state capability

If the kernel enables AMD P-state, cpupower no longer needs to handle the
ACPI hardware P-states functions.

Signed-off-by: Huang Rui <[email protected]>
---
tools/power/cpupower/utils/helpers/cpuid.c | 13 +++++++++++++
1 file changed, 13 insertions(+)

diff --git a/tools/power/cpupower/utils/helpers/cpuid.c b/tools/power/cpupower/utils/helpers/cpuid.c
index 72eb43593180..78218c54acca 100644
--- a/tools/power/cpupower/utils/helpers/cpuid.c
+++ b/tools/power/cpupower/utils/helpers/cpuid.c
@@ -149,6 +149,19 @@ int get_cpu_info(struct cpupower_cpu_info *cpu_info)
if (ext_cpuid_level >= 0x80000008 &&
cpuid_ebx(0x80000008) & (1 << 4))
cpu_info->caps |= CPUPOWER_CAP_AMD_RDPRU;
+
+ if (cpupower_amd_pstate_enabled(0)) {
+ cpu_info->caps |= CPUPOWER_CAP_AMD_PSTATE;
+
+ /*
+ * If AMD P-state is enabled, the firmware will treat
+ * AMD P-state function as high priority.
+ */
+ cpu_info->caps &= ~CPUPOWER_CAP_AMD_CPB;
+ cpu_info->caps &= ~CPUPOWER_CAP_AMD_CPB_MSR;
+ cpu_info->caps &= ~CPUPOWER_CAP_AMD_HW_PSTATE;
+ cpu_info->caps &= ~CPUPOWER_CAP_AMD_PSTATEDEF;
+ }
}

if (cpu_info->vendor == X86_VENDOR_INTEL) {
--
2.25.1

2021-09-08 18:12:20

by Huang Rui

Subject: [PATCH 12/19] cpupower: add AMD P-state capability flag

Add an AMD P-state capability flag to cpupower to indicate support for the
new AMD P-state kernel module on Ryzen processors.

Signed-off-by: Huang Rui <[email protected]>
---
tools/power/cpupower/utils/helpers/helpers.h | 1 +
1 file changed, 1 insertion(+)

diff --git a/tools/power/cpupower/utils/helpers/helpers.h b/tools/power/cpupower/utils/helpers/helpers.h
index 33ffacee7fcb..b4813efdfb00 100644
--- a/tools/power/cpupower/utils/helpers/helpers.h
+++ b/tools/power/cpupower/utils/helpers/helpers.h
@@ -73,6 +73,7 @@ enum cpupower_cpu_vendor {X86_VENDOR_UNKNOWN = 0, X86_VENDOR_INTEL,
#define CPUPOWER_CAP_AMD_HW_PSTATE 0x00000100
#define CPUPOWER_CAP_AMD_PSTATEDEF 0x00000200
#define CPUPOWER_CAP_AMD_CPB_MSR 0x00000400
+#define CPUPOWER_CAP_AMD_PSTATE 0x00000800

#define CPUPOWER_AMD_CPBDIS 0x02000000

--
2.25.1

2021-09-08 18:17:58

by Nathan Fontenot

Subject: Re: [PATCH 10/19] cpufreq: amd: add amd-pstate frequencies attributes

On 9/8/2021 9:59 AM, Huang Rui wrote:
> Introduce sysfs attributes to get the different level processor
> frequencies.
>
> Signed-off-by: Huang Rui <[email protected]>
> ---
> drivers/cpufreq/amd-pstate.c | 80 +++++++++++++++++++++++++++++++++++-
> 1 file changed, 79 insertions(+), 1 deletion(-)
>
> diff --git a/drivers/cpufreq/amd-pstate.c b/drivers/cpufreq/amd-pstate.c
> index 48dedd5af101..3c727a22cb69 100644
> --- a/drivers/cpufreq/amd-pstate.c
> +++ b/drivers/cpufreq/amd-pstate.c
> @@ -577,16 +577,94 @@ static int amd_pstate_cpu_exit(struct cpufreq_policy *policy)
> return 0;
> }
>
> -static ssize_t show_is_amd_pstate_enabled(struct cpufreq_policy *policy,
> +/* Sysfs attributes */
> +
> +static ssize_t show_amd_pstate_max_freq(struct cpufreq_policy *policy,
> char *buf)
> +{
> + int ret = 0, max_freq;
> + struct amd_cpudata *cpudata;
> +
> + cpudata = policy->driver_data;
> +
> + max_freq = amd_get_max_freq(cpudata);
> + if (max_freq < 0)
> + return max_freq;
> +
> + ret += sprintf(&buf[ret], "%u\n", max_freq);
> +
> + return ret;

Here, and in the functions below, you could just do

return sprintf(buf, "%u\n", max_freq);

and get rid of the intermediary 'ret' variable.

-Nathan
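
As a rough illustration, the simplified shape could look like the user-space
sketch below. `show_freq()` is a hypothetical stand-in: the real callbacks
take a struct cpufreq_policy and obtain the value from the driver data, which
is replaced here by a plain `freq` argument. Once `ret` is gone the output is
written at the start of `buf`, and the sketch prints with `%d` because `freq`
is a signed int.

```c
#include <stdio.h>
#include <sys/types.h>	/* ssize_t */

/*
 * Hypothetical user-space stand-in for a sysfs show callback.
 * A negative freq models an error code returned by the frequency getter.
 */
static ssize_t show_freq(int freq, char *buf)
{
	if (freq < 0)
		return freq;	/* propagate the error unchanged */

	/* sprintf() returns the number of characters written. */
	return sprintf(buf, "%d\n", freq);
}
```

The return value is still the byte count that sysfs expects, so callers see
no behavioural change.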

> +}
> +
> +static ssize_t show_amd_pstate_nominal_freq(struct cpufreq_policy *policy,
> + char *buf)
> +{
> + int ret = 0, nominal_freq;
> + struct amd_cpudata *cpudata;
> +
> + cpudata = policy->driver_data;
> +
> + nominal_freq = amd_get_nominal_freq(cpudata);
> + if (nominal_freq < 0)
> + return nominal_freq;
> +
> + ret += sprintf(&buf[ret], "%u\n", nominal_freq);
> +
> + return ret;
> +}
> +
> +static ssize_t
> +show_amd_pstate_lowest_nonlinear_freq(struct cpufreq_policy *policy, char *buf)
> +{
> + int ret = 0, freq;
> + struct amd_cpudata *cpudata;
> +
> + cpudata = policy->driver_data;
> +
> + freq = amd_get_lowest_nonlinear_freq(cpudata);
> + if (freq < 0)
> + return freq;
> +
> + ret += sprintf(&buf[ret], "%u\n", freq);
> +
> + return ret;
> +}
> +
> +static ssize_t show_amd_pstate_min_freq(struct cpufreq_policy *policy, char *buf)
> +{
> + int ret = 0;
> + int freq;
> + struct amd_cpudata *cpudata;
> +
> + cpudata = policy->driver_data;
> +
> + freq = amd_get_min_freq(cpudata);
> + if (freq < 0)
> + return freq;
> +
> + ret += sprintf(&buf[ret], "%u\n", freq);
> +
> + return ret;
> +}
> +
> +static ssize_t show_is_amd_pstate_enabled(struct cpufreq_policy *policy,
> + char *buf)
> {
> return sprintf(&buf[0], "%d\n", acpi_cpc_valid() ? 1 : 0);
> }
>
> cpufreq_freq_attr_ro(is_amd_pstate_enabled);
> +cpufreq_freq_attr_ro(amd_pstate_max_freq);
> +cpufreq_freq_attr_ro(amd_pstate_nominal_freq);
> +cpufreq_freq_attr_ro(amd_pstate_lowest_nonlinear_freq);
> +cpufreq_freq_attr_ro(amd_pstate_min_freq);
>
> static struct freq_attr *amd_pstate_attr[] = {
> &is_amd_pstate_enabled,
> + &amd_pstate_max_freq,
> + &amd_pstate_nominal_freq,
> + &amd_pstate_lowest_nonlinear_freq,
> + &amd_pstate_min_freq,
> NULL,
> };
>
> --
> 2.25.1
>

2021-09-08 19:06:18

by Nathan Fontenot

Subject: Re: [PATCH 11/19] cpufreq: amd: add amd-pstate performance attributes

On 9/8/2021 9:59 AM, Huang Rui wrote:
> Introduce sysfs attributes to get the different level amd-pstate
> performances.
>
> Signed-off-by: Huang Rui <[email protected]>
> ---
> drivers/cpufreq/amd-pstate.c | 66 ++++++++++++++++++++++++++++++++++++
> 1 file changed, 66 insertions(+)
>
> diff --git a/drivers/cpufreq/amd-pstate.c b/drivers/cpufreq/amd-pstate.c
> index 3c727a22cb69..9c60388d45ed 100644
> --- a/drivers/cpufreq/amd-pstate.c
> +++ b/drivers/cpufreq/amd-pstate.c
> @@ -647,6 +647,62 @@ static ssize_t show_amd_pstate_min_freq(struct cpufreq_policy *policy, char *buf
> return ret;
> }
>
> +static ssize_t
> +show_amd_pstate_highest_perf(struct cpufreq_policy *policy, char *buf)

Here (and in the other functions) the function return type and name should
be on the same line.

> +{
> + int ret = 0;
> + u32 perf;
> + struct amd_cpudata *cpudata = policy->driver_data;
> +
> + perf = READ_ONCE(cpudata->highest_perf);
> +
> + ret += sprintf(&buf[ret], "%u\n", perf);
> +
> + return ret;

Same comment as the previous patch here and in the functions below, just do

return sprintf(buf, "%u\n", perf);

and get rid of the intermediary 'ret' variable.

-Nathan

> +}
> +
> +static ssize_t
> +show_amd_pstate_nominal_perf(struct cpufreq_policy *policy, char *buf)
> +{
> + int ret = 0;
> + u32 perf;
> + struct amd_cpudata *cpudata = policy->driver_data;
> +
> + perf = READ_ONCE(cpudata->nominal_perf);
> +
> + ret += sprintf(&buf[ret], "%u\n", perf);
> +
> + return ret;
> +}
> +
> +static ssize_t
> +show_amd_pstate_lowest_nonlinear_perf(struct cpufreq_policy *policy, char *buf)
> +{
> + int ret = 0;
> + u32 perf;
> + struct amd_cpudata *cpudata = policy->driver_data;
> +
> + perf = READ_ONCE(cpudata->lowest_nonlinear_perf);
> +
> + ret += sprintf(&buf[ret], "%u\n", perf);
> +
> + return ret;
> +}
> +
> +static ssize_t
> +show_amd_pstate_lowest_perf(struct cpufreq_policy *policy, char *buf)
> +{
> + int ret = 0;
> + u32 perf;
> + struct amd_cpudata *cpudata = policy->driver_data;
> +
> + perf = READ_ONCE(cpudata->lowest_perf);
> +
> + ret += sprintf(&buf[ret], "%u\n", perf);
> +
> + return ret;
> +}
> +
> static ssize_t show_is_amd_pstate_enabled(struct cpufreq_policy *policy,
> char *buf)
> {
> @@ -654,17 +710,27 @@ static ssize_t show_is_amd_pstate_enabled(struct cpufreq_policy *policy,
> }
>
> cpufreq_freq_attr_ro(is_amd_pstate_enabled);
> +
> cpufreq_freq_attr_ro(amd_pstate_max_freq);
> cpufreq_freq_attr_ro(amd_pstate_nominal_freq);
> cpufreq_freq_attr_ro(amd_pstate_lowest_nonlinear_freq);
> cpufreq_freq_attr_ro(amd_pstate_min_freq);
>
> +cpufreq_freq_attr_ro(amd_pstate_highest_perf);
> +cpufreq_freq_attr_ro(amd_pstate_nominal_perf);
> +cpufreq_freq_attr_ro(amd_pstate_lowest_nonlinear_perf);
> +cpufreq_freq_attr_ro(amd_pstate_lowest_perf);
> +
> static struct freq_attr *amd_pstate_attr[] = {
> &is_amd_pstate_enabled,
> &amd_pstate_max_freq,
> &amd_pstate_nominal_freq,
> &amd_pstate_lowest_nonlinear_freq,
> &amd_pstate_min_freq,
> + &amd_pstate_highest_perf,
> + &amd_pstate_nominal_perf,
> + &amd_pstate_lowest_nonlinear_perf,
> + &amd_pstate_lowest_perf,
> NULL,
> };
>
> --
> 2.25.1
>

2021-09-08 19:06:36

by Nathan Fontenot

Subject: Re: [PATCH 08/19] cpufreq: amd: add boost mode support for amd-pstate

On 9/8/2021 9:59 AM, Huang Rui wrote:
> If the sbios supports the boost mode of amd-pstate, let's switch to
> boost enabled by default.
>
> Signed-off-by: Huang Rui <[email protected]>
> ---
> drivers/cpufreq/amd-pstate.c | 50 ++++++++++++++++++++++++++++++++++++
> 1 file changed, 50 insertions(+)
>
> diff --git a/drivers/cpufreq/amd-pstate.c b/drivers/cpufreq/amd-pstate.c
> index ea965a122431..67a9a117f524 100644
> --- a/drivers/cpufreq/amd-pstate.c
> +++ b/drivers/cpufreq/amd-pstate.c
> @@ -75,6 +75,8 @@ struct amd_cpudata {
> u32 min_freq;
> u32 nominal_freq;
> u32 lowest_nonlinear_freq;
> +
> + bool boost_supported;
> };
>
> struct amd_pstate_perf_funcs {
> @@ -229,6 +231,19 @@ amd_pstate_update(struct amd_cpudata *cpudata, u32 min_perf,
> max_perf, fast_switch);
> }
>
> +static bool amd_pstate_boost_supported(struct amd_cpudata *cpudata)
> +{
> + u32 highest_perf, nominal_perf;
> +
> + highest_perf = READ_ONCE(cpudata->highest_perf);
> + nominal_perf = READ_ONCE(cpudata->nominal_perf);
> +
> + if (highest_perf > nominal_perf)
> + return true;
> +
> + return false;
> +}
> +
> static int amd_pstate_verify(struct cpufreq_policy_data *policy)
> {
> cpufreq_verify_within_cpu_limits(policy);
> @@ -402,6 +417,37 @@ static int amd_get_lowest_nonlinear_freq(struct amd_cpudata *cpudata)
> return lowest_nonlinear_freq * 1000;
> }
>
> +static int amd_pstate_set_boost(struct cpufreq_policy *policy, int state)
> +{
> + struct amd_cpudata *cpudata = policy->driver_data;
> + int ret;
> +
> + if (!cpudata->boost_supported) {
> + pr_err("Boost mode is not supported by this processor or SBIOS\n");
> + return -EINVAL;
> + }
> +
> + if (state)
> + policy->cpuinfo.max_freq = cpudata->max_freq;
> + else
> + policy->cpuinfo.max_freq = cpudata->nominal_freq;
> +
> + policy->max = policy->cpuinfo.max_freq;
> +
> + ret = freq_qos_update_request(&cpudata->req[1],
> + policy->cpuinfo.max_freq);
> + if (ret < 0)
> + return ret;
> +
> + return 0;
> +}
> +
> +static void amd_pstate_boost_init(struct amd_cpudata *cpudata)
> +{
> + cpudata->boost_supported = true;
> + amd_pstate_driver.boost_enabled = true;
> +}
> +
> static int amd_pstate_init_freqs_in_cpudata(struct amd_cpudata *cpudata,
> u32 max_freq, u32 min_freq,
> u32 nominal_freq,
> @@ -504,6 +550,9 @@ static int amd_pstate_cpu_init(struct cpufreq_policy *policy)
>
> policy->driver_data = cpudata;
>
> + if (amd_pstate_boost_supported(cpudata))
> + amd_pstate_boost_init(cpudata);

Is there any reason to not merge amd_pstate_boost_supported() and
amd_pstate_boost_init() into a single function? I don't see that
amd_pstate_boost_supported() is called anywhere else.

-Nathan
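
A merged helper might look roughly like the following user-space sketch. The
two structs are reduced, hypothetical stand-ins for struct amd_cpudata and
struct cpufreq_driver, and the driver is passed as a parameter instead of
using the global amd_pstate_driver. The real code would also keep the
READ_ONCE() accesses.

```c
#include <stdbool.h>
#include <stdint.h>

/* Reduced stand-ins for the driver structures; only the fields used here. */
struct cpudata_sketch {
	uint32_t highest_perf;
	uint32_t nominal_perf;
	bool boost_supported;
};

struct driver_sketch {
	bool boost_enabled;
};

/* Enable boost only when highest_perf exceeds nominal_perf. */
static void boost_init_sketch(struct cpudata_sketch *cpudata,
			      struct driver_sketch *drv)
{
	if (cpudata->highest_perf <= cpudata->nominal_perf)
		return;

	cpudata->boost_supported = true;
	drv->boost_enabled = true;
}
```

The caller in the init path would then call the helper unconditionally.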

> +
> return 0;
>
> free_cpudata3:
> @@ -535,6 +584,7 @@ static struct cpufreq_driver amd_pstate_driver = {
> .fast_switch = amd_pstate_fast_switch,
> .init = amd_pstate_cpu_init,
> .exit = amd_pstate_cpu_exit,
> + .set_boost = amd_pstate_set_boost,
> .name = "amd-pstate",
> };
>
> --
> 2.25.1
>

2021-09-08 19:16:56

by Nathan Fontenot

Subject: Re: [PATCH 03/19] ACPI: CPPC: add cppc enable register function

On 9/8/2021 9:59 AM, Huang Rui wrote:
> From: Jinzhou Su <[email protected]>
>
> Export the cppc enable register function for future use.
>
> Signed-off-by: Jinzhou Su <[email protected]>
> Signed-off-by: Huang Rui <[email protected]>
> ---
> drivers/acpi/cppc_acpi.c | 42 ++++++++++++++++++++++++++++++++++++++++
> include/acpi/cppc_acpi.h | 5 +++++
> 2 files changed, 47 insertions(+)
>
> diff --git a/drivers/acpi/cppc_acpi.c b/drivers/acpi/cppc_acpi.c
> index a4d4eebba1da..de4b30545215 100644
> --- a/drivers/acpi/cppc_acpi.c
> +++ b/drivers/acpi/cppc_acpi.c
> @@ -1220,6 +1220,48 @@ int cppc_get_perf_ctrs(int cpunum, struct cppc_perf_fb_ctrs *perf_fb_ctrs)
> }
> EXPORT_SYMBOL_GPL(cppc_get_perf_ctrs);
>
> +/**
> + * cppc_set_enable - Set to enable CPPC register.
> + * @cpu: CPU for which to enable CPPC register.
> + * @enable: enable field to write into share memory.
> + *
> + * Return: 0 for success, -ERRNO otherwise.
> + */
> +int cppc_set_enable(int cpu, u32 enable)
> +{
> + int pcc_ss_id = per_cpu(cpu_pcc_subspace_idx, cpu);
> + struct cpc_register_resource *enable_reg;
> + struct cpc_desc *cpc_desc = per_cpu(cpc_desc_ptr, cpu);
> + struct cppc_pcc_data *pcc_ss_data = NULL;
> + int ret = -1;
> +
> + if (!cpc_desc) {
> + pr_debug("No CPC descriptor for CPU:%d\n", cpu);
> + return -ENODEV;
> + }
> +
> + enable_reg = &cpc_desc->cpc_regs[ENABLE];
> +
> + if (CPC_IN_PCC(enable_reg)) {
> +
> + if (pcc_ss_id < 0)
> + return -EIO;
> +
> + ret = cpc_write(cpu, enable_reg, enable);
> + if (ret)
> + return ret;
> +
> + pcc_ss_data = pcc_data[pcc_ss_id];
> +
> + down_write(&pcc_ss_data->pcc_lock);
> + send_pcc_cmd(pcc_ss_id, CMD_WRITE);

Shouldn't we be checking the return value from send_pcc_cmd()?

Also, if the call to send_pcc_cmd() fails do we need to update
enable_reg? i.e. cpc_write(..., !enable);

-Nathan
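
A rough userspace sketch of the error handling being asked about, assuming the rollback is done by writing back !enable as suggested. The mock_* functions and fake_* variables are hypothetical stand-ins for cpc_write(), send_pcc_cmd() and the hardware state, and the pcc_lock handling is left out:

```c
#include <errno.h>
#include <stdint.h>

/* Hypothetical stand-ins so the sketch compiles outside the kernel. */
static uint32_t fake_enable_reg;	/* models the ENABLE register */
static int fake_pcc_result;		/* value the mock PCC command returns */

static int mock_cpc_write(uint32_t val)
{
	fake_enable_reg = val;
	return 0;
}

static int mock_send_pcc_cmd(void)
{
	return fake_pcc_result;
}

/* Write the enable value, then undo the write if the PCC command fails. */
static int set_enable_checked(uint32_t enable)
{
	int ret;

	ret = mock_cpc_write(enable);
	if (ret)
		return ret;

	ret = mock_send_pcc_cmd();
	if (ret)
		mock_cpc_write(!enable);	/* roll back on failure */

	return ret;
}
```

Whether writing !enable is the right recovery depends on what the register held before the call; saving and restoring the previous value would be an alternative.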

> + up_write(&pcc_ss_data->pcc_lock);
> + }
> +
> + return ret;
> +}
> +EXPORT_SYMBOL_GPL(cppc_set_enable);
> +
> /**
> * cppc_set_perf - Set a CPU's performance controls.
> * @cpu: CPU for which to set performance controls.
> diff --git a/include/acpi/cppc_acpi.h b/include/acpi/cppc_acpi.h
> index 9f4985b4d64d..3fdae40a75fc 100644
> --- a/include/acpi/cppc_acpi.h
> +++ b/include/acpi/cppc_acpi.h
> @@ -137,6 +137,7 @@ struct cppc_cpudata {
> extern int cppc_get_desired_perf(int cpunum, u64 *desired_perf);
> extern int cppc_get_perf_ctrs(int cpu, struct cppc_perf_fb_ctrs *perf_fb_ctrs);
> extern int cppc_set_perf(int cpu, struct cppc_perf_ctrls *perf_ctrls);
> +extern int cppc_set_enable(int cpu, u32 enable);
> extern int cppc_get_perf_caps(int cpu, struct cppc_perf_caps *caps);
> extern bool acpi_cpc_valid(void);
> extern int acpi_get_psd_map(unsigned int cpu, struct cppc_cpudata *cpu_data);
> @@ -157,6 +158,10 @@ static inline int cppc_set_perf(int cpu, struct cppc_perf_ctrls *perf_ctrls)
> {
> return -ENOTSUPP;
> }
> +static inline int cppc_set_enable(int cpu, u32 enable)
> +{
> + return -ENOTSUPP;
> +}
> static inline int cppc_get_perf_caps(int cpu, struct cppc_perf_caps *caps)
> {
> return -ENOTSUPP;
> --
> 2.25.1
>

2021-09-08 20:11:28

by Shuah Khan

Subject: Re: [PATCH 01/19] x86/cpufreatures: add AMD CPPC extension feature flag

On 9/8/21 8:59 AM, Huang Rui wrote:
> Add Collaborative Processor Performance Control Extension feature flag
> for AMD processors.
>

Please add a couple of sentences about the feature and what it does.

> Signed-off-by: Huang Rui <[email protected]>
> ---
> arch/x86/include/asm/cpufeatures.h | 1 +
> 1 file changed, 1 insertion(+)
>
> diff --git a/arch/x86/include/asm/cpufeatures.h b/arch/x86/include/asm/cpufeatures.h
> index d0ce5cfd3ac1..f7aea50e3371 100644
> --- a/arch/x86/include/asm/cpufeatures.h
> +++ b/arch/x86/include/asm/cpufeatures.h
> @@ -313,6 +313,7 @@
> #define X86_FEATURE_AMD_SSBD (13*32+24) /* "" Speculative Store Bypass Disable */
> #define X86_FEATURE_VIRT_SSBD (13*32+25) /* Virtualized Speculative Store Bypass Disable */
> #define X86_FEATURE_AMD_SSB_NO (13*32+26) /* "" Speculative Store Bypass is fixed in hardware. */
> +#define X86_FEATURE_AMD_CPPC_EXT (13*32+27) /* Collaborative Processor Performance Control Extension */
>
> /* Thermal and Power Management Leaf, CPUID level 0x00000006 (EAX), word 14 */
> #define X86_FEATURE_DTHERM (14*32+ 0) /* Digital Thermal Sensor */
>

thanks,
-- Shuah

2021-09-09 00:58:24

by Shuah Khan

Subject: Re: [PATCH 03/19] ACPI: CPPC: add cppc enable register function

On 9/8/21 8:59 AM, Huang Rui wrote:
> From: Jinzhou Su <[email protected]>
>
> Export the cppc enable register function for future use.

This patch also adds a new function. How about saying something about
adding a new function?

>
> Signed-off-by: Jinzhou Su <[email protected]>
> Signed-off-by: Huang Rui <[email protected]>
> ---
> drivers/acpi/cppc_acpi.c | 42 ++++++++++++++++++++++++++++++++++++++++
> include/acpi/cppc_acpi.h | 5 +++++
> 2 files changed, 47 insertions(+)
>
> diff --git a/drivers/acpi/cppc_acpi.c b/drivers/acpi/cppc_acpi.c
> index a4d4eebba1da..de4b30545215 100644
> --- a/drivers/acpi/cppc_acpi.c
> +++ b/drivers/acpi/cppc_acpi.c
> @@ -1220,6 +1220,48 @@ int cppc_get_perf_ctrs(int cpunum, struct cppc_perf_fb_ctrs *perf_fb_ctrs)
> }
> EXPORT_SYMBOL_GPL(cppc_get_perf_ctrs);
>
> +/**
> + * cppc_set_enable - Set to enable CPPC register.

Please make this more descriptive: does it write to a register?
What is the behavior in error paths, etc.?

> + * @cpu: CPU for which to enable CPPC register.
> + * @enable: enable field to write into share memory.

What should this be? What are the valid values to write?
Also, aren't we adding this to the header file where the prototype
is defined these days?

> + *
> + * Return: 0 for success, -ERRNO otherwise.
> + */
> +int cppc_set_enable(int cpu, u32 enable)
> +{
> + int pcc_ss_id = per_cpu(cpu_pcc_subspace_idx, cpu);
> + struct cpc_register_resource *enable_reg;
> + struct cpc_desc *cpc_desc = per_cpu(cpc_desc_ptr, cpu);
> + struct cppc_pcc_data *pcc_ss_data = NULL;
> + int ret = -1;
> +
> + if (!cpc_desc) {
> + pr_debug("No CPC descriptor for CPU:%d\n", cpu);
> + return -ENODEV;
> + }
> +

Don't we need to do some error checking on input args? What is the
valid range for cpu and enable?
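
As a sketch of the kind of check meant here: the helper below rejects out-of-range arguments before anything is written. SKETCH_NR_CPUS and validate_enable_args() are hypothetical names, and treating only 0 and 1 as valid for enable is an assumption that the patch does not confirm:

```c
#include <errno.h>
#include <stdint.h>

#define SKETCH_NR_CPUS 8	/* hypothetical CPU count for this sketch */

/* Return 0 if both arguments are in range, -EINVAL otherwise. */
static int validate_enable_args(int cpu, uint32_t enable)
{
	if (cpu < 0 || cpu >= SKETCH_NR_CPUS)
		return -EINVAL;

	/* Assumption: the enable field is a simple on/off flag. */
	if (enable > 1)
		return -EINVAL;

	return 0;
}
```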

> + enable_reg = &cpc_desc->cpc_regs[ENABLE];
> +
> + if (CPC_IN_PCC(enable_reg)) {
> +
> + if (pcc_ss_id < 0)
> + return -EIO;
> +
> + ret = cpc_write(cpu, enable_reg, enable);
> + if (ret)
> + return ret;
> +
> + pcc_ss_data = pcc_data[pcc_ss_id];
> +
> + down_write(&pcc_ss_data->pcc_lock);
> + send_pcc_cmd(pcc_ss_id, CMD_WRITE);

Could this fail?

> + up_write(&pcc_ss_data->pcc_lock);
> + }
> +
> + return ret;
> +}
> +EXPORT_SYMBOL_GPL(cppc_set_enable);
> +
> /**
> * cppc_set_perf - Set a CPU's performance controls.
> * @cpu: CPU for which to set performance controls.
> diff --git a/include/acpi/cppc_acpi.h b/include/acpi/cppc_acpi.h
> index 9f4985b4d64d..3fdae40a75fc 100644
> --- a/include/acpi/cppc_acpi.h
> +++ b/include/acpi/cppc_acpi.h
> @@ -137,6 +137,7 @@ struct cppc_cpudata {
> extern int cppc_get_desired_perf(int cpunum, u64 *desired_perf);
> extern int cppc_get_perf_ctrs(int cpu, struct cppc_perf_fb_ctrs *perf_fb_ctrs);
> extern int cppc_set_perf(int cpu, struct cppc_perf_ctrls *perf_ctrls);
> +extern int cppc_set_enable(int cpu, u32 enable);
> extern int cppc_get_perf_caps(int cpu, struct cppc_perf_caps *caps);
> extern bool acpi_cpc_valid(void);
> extern int acpi_get_psd_map(unsigned int cpu, struct cppc_cpudata *cpu_data);
> @@ -157,6 +158,10 @@ static inline int cppc_set_perf(int cpu, struct cppc_perf_ctrls *perf_ctrls)
> {
> return -ENOTSUPP;
> }
> +static inline int cppc_set_enable(int cpu, u32 enable)
> +{
> + return -ENOTSUPP;
> +}
> static inline int cppc_get_perf_caps(int cpu, struct cppc_perf_caps *caps)
> {
> return -ENOTSUPP;
>

thanks,
-- Shuah

2021-09-09 09:48:01

by Huang Rui

Subject: Re: [PATCH 01/19] x86/cpufreatures: add AMD CPPC extension feature flag

On Thu, Sep 09, 2021 at 04:00:04AM +0800, Shuah Khan wrote:
> On 9/8/21 8:59 AM, Huang Rui wrote:
> > Add Collaborative Processor Performance Control Extension feature flag
> > for AMD processors.
> >
>
> Please add a couple of sentences about the feature and what it does.

OK, I will describe the details in V2.

Thanks,
Ray

2021-09-09 09:53:02

by Huang Rui

Subject: Re: [PATCH 03/19] ACPI: CPPC: add cppc enable register function

On Thu, Sep 09, 2021 at 03:14:37AM +0800, Fontenot, Nathan wrote:
> On 9/8/2021 9:59 AM, Huang Rui wrote:
> > From: Jinzhou Su <[email protected]>
> >
> > Export the cppc enable register function for future use.
> >
> > Signed-off-by: Jinzhou Su <[email protected]>
> > Signed-off-by: Huang Rui <[email protected]>
> > ---
> > drivers/acpi/cppc_acpi.c | 42 ++++++++++++++++++++++++++++++++++++++++
> > include/acpi/cppc_acpi.h | 5 +++++
> > 2 files changed, 47 insertions(+)
> >
> > diff --git a/drivers/acpi/cppc_acpi.c b/drivers/acpi/cppc_acpi.c
> > index a4d4eebba1da..de4b30545215 100644
> > --- a/drivers/acpi/cppc_acpi.c
> > +++ b/drivers/acpi/cppc_acpi.c
> > @@ -1220,6 +1220,48 @@ int cppc_get_perf_ctrs(int cpunum, struct cppc_perf_fb_ctrs *perf_fb_ctrs)
> > }
> > EXPORT_SYMBOL_GPL(cppc_get_perf_ctrs);
> >
> > +/**
> > + * cppc_set_enable - Set to enable CPPC register.
> > + * @cpu: CPU for which to enable CPPC register.
> > + * @enable: enable field to write into share memory.
> > + *
> > + * Return: 0 for success, -ERRNO otherwise.
> > + */
> > +int cppc_set_enable(int cpu, u32 enable)
> > +{
> > + int pcc_ss_id = per_cpu(cpu_pcc_subspace_idx, cpu);
> > + struct cpc_register_resource *enable_reg;
> > + struct cpc_desc *cpc_desc = per_cpu(cpc_desc_ptr, cpu);
> > + struct cppc_pcc_data *pcc_ss_data = NULL;
> > + int ret = -1;
> > +
> > + if (!cpc_desc) {
> > + pr_debug("No CPC descriptor for CPU:%d\n", cpu);
> > + return -ENODEV;
> > + }
> > +
> > + enable_reg = &cpc_desc->cpc_regs[ENABLE];
> > +
> > + if (CPC_IN_PCC(enable_reg)) {
> > +
> > + if (pcc_ss_id < 0)
> > + return -EIO;
> > +
> > + ret = cpc_write(cpu, enable_reg, enable);
> > + if (ret)
> > + return ret;
> > +
> > + pcc_ss_data = pcc_data[pcc_ss_id];
> > +
> > + down_write(&pcc_ss_data->pcc_lock);
> > + send_pcc_cmd(pcc_ss_id, CMD_WRITE);
>
> Shouldn't we be checking the return value from send_pcc_cmd()?
>
> Also, if the call to send_pcc_cmd() fails do we need to update
> enable_reg? i.e. cpc_write(..., !enable);
>

Sounds reasonable. I will modify this in V2.

Thanks,
Ray

2021-09-09 10:00:09

by Huang Rui

Subject: Re: [PATCH 03/19] ACPI: CPPC: add cppc enable register function

On Thu, Sep 09, 2021 at 08:21:48AM +0800, Shuah Khan wrote:
> On 9/8/21 8:59 AM, Huang Rui wrote:
> > From: Jinzhou Su <[email protected]>
> >
> > Export the cppc enable register function for future use.
>
> This patch also adds a new function. How about saying something about
> adding a new function?
>
> >
> > Signed-off-by: Jinzhou Su <[email protected]>
> > Signed-off-by: Huang Rui <[email protected]>
> > ---
> > drivers/acpi/cppc_acpi.c | 42 ++++++++++++++++++++++++++++++++++++++++
> > include/acpi/cppc_acpi.h | 5 +++++
> > 2 files changed, 47 insertions(+)
> >
> > diff --git a/drivers/acpi/cppc_acpi.c b/drivers/acpi/cppc_acpi.c
> > index a4d4eebba1da..de4b30545215 100644
> > --- a/drivers/acpi/cppc_acpi.c
> > +++ b/drivers/acpi/cppc_acpi.c
> > @@ -1220,6 +1220,48 @@ int cppc_get_perf_ctrs(int cpunum, struct cppc_perf_fb_ctrs *perf_fb_ctrs)
> > }
> > EXPORT_SYMBOL_GPL(cppc_get_perf_ctrs);
> >
> > +/**
> > + * cppc_set_enable - Set to enable CPPC register.
>
> Please make this more descriptive: does it write to a register?
> What is the behavior in error paths, etc.?
>
> > + * @cpu: CPU for which to enable CPPC register.
> > + * @enable: enable field to write into share memory.
>
> What should this be? What are the valid values to write?
> Also, aren't we adding this to the header file where the prototype
> is defined these days?

Thank you for the suggestions. I will refine the comments and commit log
in V2.

>
> > + *
> > + * Return: 0 for success, -ERRNO otherwise.
> > + */
> > +int cppc_set_enable(int cpu, u32 enable)
> > +{
> > + int pcc_ss_id = per_cpu(cpu_pcc_subspace_idx, cpu);
> > + struct cpc_register_resource *enable_reg;
> > + struct cpc_desc *cpc_desc = per_cpu(cpc_desc_ptr, cpu);
> > + struct cppc_pcc_data *pcc_ss_data = NULL;
> > + int ret = -1;
> > +
> > + if (!cpc_desc) {
> > + pr_debug("No CPC descriptor for CPU:%d\n", cpu);
> > + return -ENODEV;
> > + }
> > +
>
> Don't we need to do some error checking on input args? What is the
> valid range for cpu and enable?

Good point.

>
> > + enable_reg = &cpc_desc->cpc_regs[ENABLE];
> > +
> > + if (CPC_IN_PCC(enable_reg)) {
> > +
> > + if (pcc_ss_id < 0)
> > + return -EIO;
> > +
> > + ret = cpc_write(cpu, enable_reg, enable);
> > + if (ret)
> > + return ret;
> > +
> > + pcc_ss_data = pcc_data[pcc_ss_id];
> > +
> > + down_write(&pcc_ss_data->pcc_lock);
> > + send_pcc_cmd(pcc_ss_id, CMD_WRITE);
>
> Could this fail?

Will add error handling in V2.

Thanks,
Ray

2021-09-09 10:03:07

by Huang Rui

Subject: Re: [PATCH 16/19] cpupower: enable boost state support for amd-pstate module

On Thu, Sep 09, 2021 at 01:32:37AM +0800, Fontenot, Nathan wrote:
> On 9/8/2021 9:59 AM, Huang Rui wrote:
> > The AMD P-state boost API is different from ACPI hardware P-states, so
> > implement the support for amd-pstate kernel module.
> >
> > Signed-off-by: Huang Rui <[email protected]>
> > ---
> > tools/power/cpupower/lib/cpufreq.c | 20 ++++++++++++++++++++
> > tools/power/cpupower/lib/cpufreq.h | 3 +++
> > tools/power/cpupower/utils/helpers/misc.c | 7 +++++++
> > 3 files changed, 30 insertions(+)
> >
> > diff --git a/tools/power/cpupower/lib/cpufreq.c b/tools/power/cpupower/lib/cpufreq.c
> > index 3f92ddadaad2..37da87bdcfb1 100644
> > --- a/tools/power/cpupower/lib/cpufreq.c
> > +++ b/tools/power/cpupower/lib/cpufreq.c
> > @@ -790,3 +790,23 @@ unsigned long cpufreq_get_transitions(unsigned int cpu)
> > {
> > return sysfs_cpufreq_get_one_value(cpu, STATS_NUM_TRANSITIONS);
> > }
> > +
> > +int amd_pstate_boost_support(unsigned int cpu)
> > +{
> > + unsigned int highest_perf, nominal_perf;
> > +
> > + highest_perf = sysfs_cpufreq_get_one_value(cpu, AMD_PSTATE_HIGHEST_PERF);
> > + nominal_perf = sysfs_cpufreq_get_one_value(cpu, AMD_PSTATE_NOMINAL_PERF);
> > +
> > + return highest_perf > nominal_perf ? 1 : 0;
> > +}
> > +
> > +int amd_pstate_boost_enabled(unsigned int cpu)
> > +{
> > + unsigned int cpuinfo_max, amd_pstate_max;
> > +
> > + cpuinfo_max = sysfs_cpufreq_get_one_value(cpu, CPUINFO_MAX_FREQ);
> > + amd_pstate_max = sysfs_cpufreq_get_one_value(cpu, AMD_PSTATE_MAX_FREQ);
> > +
> > + return cpuinfo_max == amd_pstate_max ? 1 : 0;
> > +}
> > diff --git a/tools/power/cpupower/lib/cpufreq.h b/tools/power/cpupower/lib/cpufreq.h
> > index 95f4fd9e2656..d54d02a7a4f4 100644
> > --- a/tools/power/cpupower/lib/cpufreq.h
> > +++ b/tools/power/cpupower/lib/cpufreq.h
> > @@ -203,6 +203,9 @@ int cpufreq_modify_policy_governor(unsigned int cpu, char *governor);
> > int cpufreq_set_frequency(unsigned int cpu,
> > unsigned long target_frequency);
> >
> > +int amd_pstate_boost_support(unsigned int cpu);
> > +int amd_pstate_boost_enabled(unsigned int cpu);
> > +
> > #ifdef __cplusplus
> > }
> > #endif
> > diff --git a/tools/power/cpupower/utils/helpers/misc.c b/tools/power/cpupower/utils/helpers/misc.c
> > index 07d80775fb68..aba979320760 100644
> > --- a/tools/power/cpupower/utils/helpers/misc.c
> > +++ b/tools/power/cpupower/utils/helpers/misc.c
> > @@ -10,6 +10,7 @@
> > #if defined(__i386__) || defined(__x86_64__)
> >
> > #include "cpupower_intern.h"
> > +#include "cpufreq.h"
> >
> > #define MSR_AMD_HWCR 0xc0010015
> >
> > @@ -39,6 +40,12 @@ int cpufreq_has_boost_support(unsigned int cpu, int *support, int *active,
> > if (ret)
> > return ret;
> > }
> > + } if ((cpupower_cpu_info.caps & CPUPOWER_CAP_AMD_PSTATE) &&
>
> Shouldn't this be
>
> } else if ((cpupower_cpu_info.caps & CPUPOWER_CAP_AMD_PSTATE) &&
>

Nice catch, it's my typo.
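
To spell out why the missing "else" matters: "} if (...)" starts a second, independent statement, so both blocks can run for the same CPU. A small standalone sketch (the function names are made up for the example):

```c
#include <stdbool.h>

/* With "} if", the second test is a separate statement: both can fire. */
static int count_branches_if(bool a, bool b)
{
	int taken = 0;

	if (a) {
		taken++;
	} if (b) {
		taken++;
	}

	return taken;
}

/* With "} else if", the second test is only reached when the first fails. */
static int count_branches_else_if(bool a, bool b)
{
	int taken = 0;

	if (a) {
		taken++;
	} else if (b) {
		taken++;
	}

	return taken;
}
```

When both conditions hold, the first variant runs two blocks and the second variant runs one.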

Thanks,
Ray

2021-09-09 10:15:03

by Huang Rui

Subject: Re: [PATCH 08/19] cpufreq: amd: add boost mode support for amd-pstate

On Thu, Sep 09, 2021 at 02:24:54AM +0800, Fontenot, Nathan wrote:
> On 9/8/2021 9:59 AM, Huang Rui wrote:
> > If the sbios supports the boost mode of amd-pstate, let's switch to
> > boost enabled by default.
> >
> > Signed-off-by: Huang Rui <[email protected]>
> > ---
> > drivers/cpufreq/amd-pstate.c | 50 ++++++++++++++++++++++++++++++++++++
> > 1 file changed, 50 insertions(+)
> >
> > diff --git a/drivers/cpufreq/amd-pstate.c b/drivers/cpufreq/amd-pstate.c
> > index ea965a122431..67a9a117f524 100644
> > --- a/drivers/cpufreq/amd-pstate.c
> > +++ b/drivers/cpufreq/amd-pstate.c
> > @@ -75,6 +75,8 @@ struct amd_cpudata {
> > u32 min_freq;
> > u32 nominal_freq;
> > u32 lowest_nonlinear_freq;
> > +
> > + bool boost_supported;
> > };
> >
> > struct amd_pstate_perf_funcs {
> > @@ -229,6 +231,19 @@ amd_pstate_update(struct amd_cpudata *cpudata, u32 min_perf,
> > max_perf, fast_switch);
> > }
> >
> > +static bool amd_pstate_boost_supported(struct amd_cpudata *cpudata)
> > +{
> > + u32 highest_perf, nominal_perf;
> > +
> > + highest_perf = READ_ONCE(cpudata->highest_perf);
> > + nominal_perf = READ_ONCE(cpudata->nominal_perf);
> > +
> > + if (highest_perf > nominal_perf)
> > + return true;
> > +
> > + return false;
> > +}
> > +
> > static int amd_pstate_verify(struct cpufreq_policy_data *policy)
> > {
> > cpufreq_verify_within_cpu_limits(policy);
> > @@ -402,6 +417,37 @@ static int amd_get_lowest_nonlinear_freq(struct amd_cpudata *cpudata)
> > return lowest_nonlinear_freq * 1000;
> > }
> >
> > +static int amd_pstate_set_boost(struct cpufreq_policy *policy, int state)
> > +{
> > + struct amd_cpudata *cpudata = policy->driver_data;
> > + int ret;
> > +
> > + if (!cpudata->boost_supported) {
> > + pr_err("Boost mode is not supported by this processor or SBIOS\n");
> > + return -EINVAL;
> > + }
> > +
> > + if (state)
> > + policy->cpuinfo.max_freq = cpudata->max_freq;
> > + else
> > + policy->cpuinfo.max_freq = cpudata->nominal_freq;
> > +
> > + policy->max = policy->cpuinfo.max_freq;
> > +
> > + ret = freq_qos_update_request(&cpudata->req[1],
> > + policy->cpuinfo.max_freq);
> > + if (ret < 0)
> > + return ret;
> > +
> > + return 0;
> > +}
> > +
> > +static void amd_pstate_boost_init(struct amd_cpudata *cpudata)
> > +{
> > + cpudata->boost_supported = true;
> > + amd_pstate_driver.boost_enabled = true;
> > +}
> > +
> > static int amd_pstate_init_freqs_in_cpudata(struct amd_cpudata *cpudata,
> > u32 max_freq, u32 min_freq,
> > u32 nominal_freq,
> > @@ -504,6 +550,9 @@ static int amd_pstate_cpu_init(struct cpufreq_policy *policy)
> >
> > policy->driver_data = cpudata;
> >
> > + if (amd_pstate_boost_supported(cpudata))
> > + amd_pstate_boost_init(cpudata);
>
> Is there any reason to not merge amd_pstate_boost_supported() and
> amd_pstate_boost_init() into a single function? I don't see that
> amd_pstate_boost_supported() is called anywhere else.
>

Sounds reasonable. Will update it in V2 as well.

Thanks,
Ray

2021-09-09 15:05:11

by Peter Zijlstra

Subject: Re: [PATCH 04/19] cpufreq: amd: introduce a new amd pstate driver to support future processors

On Wed, Sep 08, 2021 at 10:59:46PM +0800, Huang Rui wrote:

> +struct amd_pstate_perf_funcs {
> + int (*enable)(bool enable);
> + int (*init_perf)(struct amd_cpudata *cpudata);
> + void (*update_perf)(struct amd_cpudata *cpudata,
> + u32 min_perf, u32 des_perf,
> + u32 max_perf, bool fast_switch);
> +};

> +static int
> +amd_pstate_enable(struct amd_pstate_perf_funcs *funcs, bool enable)
> +{
> + if (!funcs)
> + return -EINVAL;
> +
> + return funcs->enable(enable);
> +}

> +static int amd_pstate_init_perf(struct amd_cpudata *cpudata)
> +{
> + struct amd_pstate_perf_funcs *funcs = cpufreq_get_driver_data();
> +
> + if (!funcs)
> + return -EINVAL;
> +
> + return funcs->init_perf(cpudata);
> +}

> +static int
> +amd_pstate_update_perf(struct amd_cpudata *cpudata, u32 min_perf,
> + u32 des_perf, u32 max_perf, bool fast_switch)
> +{
> + struct amd_pstate_perf_funcs *funcs = cpufreq_get_driver_data();
> +
> + if (!funcs)
> + return -EINVAL;
> +
> + funcs->update_perf(cpudata, min_perf, des_perf,
> + max_perf, fast_switch);
> +
> + return 0;
> +}

> +static struct amd_pstate_perf_funcs pstate_funcs = {
> + .enable = pstate_enable,
> + .init_perf = pstate_init_perf,
> + .update_perf = pstate_update_perf,
> +};

> +static int __init amd_pstate_init(void)
> +{
> + int ret;
> + struct amd_pstate_perf_funcs *funcs;

> +
> + funcs = &pstate_funcs;

What is the purpose of this seemingly pointless indirection? Showing off
how good AMD hardware is at doing retpolines or something?
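
For comparison, a sketch of the same wrapper without the function-pointer table, assuming there is only a single backend at this point in the series. The body of pstate_enable() is a placeholder and cppc_enabled is a hypothetical variable standing in for the hardware state:

```c
#include <stdbool.h>

static bool cppc_enabled;	/* hypothetical stand-in for hardware state */

/* The only backend in this sketch; the real body would touch the MSR. */
static int pstate_enable(bool enable)
{
	cppc_enabled = enable;
	return 0;
}

/* Direct call: no table lookup, no NULL check, no indirect branch. */
static int amd_pstate_enable(bool enable)
{
	return pstate_enable(enable);
}
```

If a second backend (for example the shared memory one mentioned in the cover letter) is added later, the table may be justified, but that reasoning would need to be stated in the patch.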

2021-09-09 15:07:05

by Peter Zijlstra

Subject: Re: [PATCH 04/19] cpufreq: amd: introduce a new amd pstate driver to support future processors

On Wed, Sep 08, 2021 at 10:59:46PM +0800, Huang Rui wrote:

> +static int pstate_init_perf(struct amd_cpudata *cpudata)
> +{
> + u64 cap1;
> +
> + int ret = rdmsrl_safe_on_cpu(cpudata->cpu, MSR_AMD_CPPC_CAP1,
> + &cap1);
> + if (ret)
> + return ret;
> +
> + /* Some AMD processors has specific power features that the cppc entry
> + * doesn't indicate the highest performance. It will introduce the
> + * feature in following days.
> + */

Wrong comment style; also imagine reading this comment half a year from
now...
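
For reference, the usual multi-line comment layout has an otherwise empty opening line, and the text avoids time-relative wording such as "in following days". A sketch with a hypothetical helper (pick_highest_perf() and its override parameter are invented for the example and are not part of the patch):

```c
#include <stdint.h>

/*
 * Hypothetical helper: a non-zero override takes precedence over the
 * highest performance value read from the capability register.
 */
static uint32_t pick_highest_perf(uint32_t cppc_highest, uint32_t override)
{
	/*
	 * The comment states what the code does today, so it still makes
	 * sense to a reader who finds it long after the patch was merged.
	 */
	return override ? override : cppc_highest;
}
```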

> + WRITE_ONCE(cpudata->highest_perf, amd_get_highest_perf());
> +
> + WRITE_ONCE(cpudata->nominal_perf, CAP1_NOMINAL_PERF(cap1));
> + WRITE_ONCE(cpudata->lowest_nonlinear_perf, CAP1_LOWNONLIN_PERF(cap1));
> + WRITE_ONCE(cpudata->lowest_perf, CAP1_LOWEST_PERF(cap1));
> +
> + return 0;
> +}

2021-09-09 17:59:27

by Borislav Petkov

Subject: Re: [PATCH 01/19] x86/cpufreatures: add AMD CPPC extension feature flag

On Wed, Sep 08, 2021 at 10:59:43PM +0800, Huang Rui wrote:
> Add Collaborative Processor Performance Control Extension feature flag
> for AMD processors.
>
> Signed-off-by: Huang Rui <[email protected]>
> ---
> arch/x86/include/asm/cpufeatures.h | 1 +
> 1 file changed, 1 insertion(+)
>
> diff --git a/arch/x86/include/asm/cpufeatures.h b/arch/x86/include/asm/cpufeatures.h
> index d0ce5cfd3ac1..f7aea50e3371 100644
> --- a/arch/x86/include/asm/cpufeatures.h
> +++ b/arch/x86/include/asm/cpufeatures.h
> @@ -313,6 +313,7 @@
> #define X86_FEATURE_AMD_SSBD (13*32+24) /* "" Speculative Store Bypass Disable */
> #define X86_FEATURE_VIRT_SSBD (13*32+25) /* Virtualized Speculative Store Bypass Disable */
> #define X86_FEATURE_AMD_SSB_NO (13*32+26) /* "" Speculative Store Bypass is fixed in hardware. */
> +#define X86_FEATURE_AMD_CPPC_EXT (13*32+27) /* Collaborative Processor Performance Control Extension */

Why not simply X86_FEATURE_AMD_CPPC ?

--
Regards/Gruss,
Boris.

SUSE Software Solutions Germany GmbH, GF: Felix Imendörffer, HRB 36809, AG Nürnberg

2021-09-09 19:56:02

by Nathan Fontenot

Subject: Re: [PATCH 04/19] cpufreq: amd: introduce a new amd pstate driver to support future processors


On 9/8/2021 9:59 AM, Huang Rui wrote:
> amd-pstate is the AMD CPU performance scaling driver that introduces a
> new CPU frequency control mechanism on AMD Zen based CPU series in Linux
> kernel. The new mechanism is based on Collaborative processor
> performance control (CPPC) which is finer grain frequency management
> than legacy ACPI hardware P-States. Current AMD CPU platforms are using
> the ACPI P-states driver to manage CPU frequency and clocks with
> switching only in 3 P-states. AMD P-States is to replace the ACPI
> P-states controls, allows a flexible, low-latency interface for the
> Linux kernel to directly communicate the performance hints to hardware.
>

This patch seems like it is just enabling CPPC on AMD and not a new mechanism
based on CPPC. Can you clarify?

Also, if this is just enabling CPPC, shouldn't the driver be named something
like amd_cppc and not amd_pstate? This isn't using P-states.

> "amd-pstate" leverages the Linux kernel governors such as *schedutil*,
> *ondemand*, etc. to manage the performance hints which are provided by CPPC
> hardware functionality. The first version for amd-pstate is to support one
> of the Zen3 processors, and we will support more in future after we verify
> the hardware and SBIOS functionalities.
>
> There are two types of hardware implementations for amd-pstate: one is full
> MSR support and another is shared memory support. It can use
> X86_FEATURE_AMD_CPPC_EXT feature flag to distinguish the different types.
>

Looking at the drivers/acpi code for CPPC I don't think this distinction
between MSRs and shared memory requires a feature flag. Shouldn't this be
handled properly in cpc_read|write if the ACPI tables are set up correctly?
Please correct me if I'm wrong.

This would also remove the need for the additional indirection pointed
out by Peter.
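
To illustrate the idea only: if each register carries its own access type, a single write helper can pick the MSR or shared memory path per register, and no driver-level feature flag or function table is needed. Everything below is a hypothetical userspace model (the sketch_* names and fake_* variables are invented) and not the actual cpc_write() code:

```c
#include <errno.h>
#include <stdint.h>

/* Hypothetical tags standing in for the register's ACPI address space. */
enum sketch_space {
	SKETCH_SPACE_FFH,	/* register reached through an MSR */
	SKETCH_SPACE_PCC,	/* register reached through shared memory */
};

struct sketch_reg {
	enum sketch_space space;
};

static uint64_t fake_msr;	/* models the MSR-backed register */
static uint64_t fake_shmem;	/* models the shared memory slot */

/* One entry point; the register description selects the access path. */
static int sketch_cpc_write(const struct sketch_reg *reg, uint64_t val)
{
	switch (reg->space) {
	case SKETCH_SPACE_FFH:
		fake_msr = val;
		return 0;
	case SKETCH_SPACE_PCC:
		fake_shmem = val;
		return 0;
	default:
		return -EINVAL;
	}
}
```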

Could you also provide an explanation as to why a new CPPC driver is needed
instead of updating the existing cppc_cpufreq driver?

-Nathan

2021-09-09 22:18:00

by Shuah Khan

Subject: Re: [PATCH 13/19] cpupower: add the function to check amd-pstate enabled

On 9/8/21 8:59 AM, Huang Rui wrote:
> Introduce the cpupower_amd_pstate_enabled() to check whether the kernel
> mode enables amd-pstate.
>

What does "kernel mode" mean? Are you referring to kernel vs.
firmware mode?

> Signed-off-by: Huang Rui <[email protected]>
> ---
> tools/power/cpupower/utils/helpers/helpers.h | 5 +++++
> tools/power/cpupower/utils/helpers/misc.c | 20 ++++++++++++++++++++
> 2 files changed, 25 insertions(+)
>
> diff --git a/tools/power/cpupower/utils/helpers/helpers.h b/tools/power/cpupower/utils/helpers/helpers.h
> index b4813efdfb00..eb43c14d1728 100644
> --- a/tools/power/cpupower/utils/helpers/helpers.h
> +++ b/tools/power/cpupower/utils/helpers/helpers.h
> @@ -136,6 +136,11 @@ extern int decode_pstates(unsigned int cpu, int boost_states,
>
> extern int cpufreq_has_boost_support(unsigned int cpu, int *support,
> int *active, int * states);
> +
> +/* AMD PSTATE enabling **************************/
> +
> +extern unsigned long cpupower_amd_pstate_enabled(unsigned int cpu);
> +
> /*
> * CPUID functions returning a single datum
> */
> diff --git a/tools/power/cpupower/utils/helpers/misc.c b/tools/power/cpupower/utils/helpers/misc.c
> index fc6e34511721..07d80775fb68 100644
> --- a/tools/power/cpupower/utils/helpers/misc.c
> +++ b/tools/power/cpupower/utils/helpers/misc.c
> @@ -83,6 +83,26 @@ int cpupower_intel_set_perf_bias(unsigned int cpu, unsigned int val)
> return 0;
> }
>
> +unsigned long cpupower_amd_pstate_enabled(unsigned int cpu)
> +{
> + char linebuf[MAX_LINE_LEN];
> + char path[SYSFS_PATH_MAX];
> + unsigned long val;
> + char *endp;
> +
> + snprintf(path, sizeof(path),
> + PATH_TO_CPU "cpu%u/cpufreq/is_amd_pstate_enabled", cpu);
> +
> + if (cpupower_read_sysfs(path, linebuf, MAX_LINE_LEN) == 0)
> + return 0;
> +
> + val = strtoul(linebuf, &endp, 0);
> + if (endp == linebuf || errno == ERANGE)
> + return 0;
> +
> + return val;
> +}
> +
> #endif /* #if defined(__i386__) || defined(__x86_64__) */
>
> /* get_cpustate
>

2021-09-09 22:18:31

by Shuah Khan

Subject: Re: [PATCH 14/19] cpupower: initial AMD P-state capability

On 9/8/21 8:59 AM, Huang Rui wrote:
> If kernel enables AMD P-state, cpupower won't need to respond ACPI
> hardware P-states function anymore.
>

This commit log doesn't seem to match the code change. I see it
calling cpupower_amd_pstate_enabled() and setting flags.

> Signed-off-by: Huang Rui <[email protected]>
> ---
> tools/power/cpupower/utils/helpers/cpuid.c | 13 +++++++++++++
> 1 file changed, 13 insertions(+)
>
> diff --git a/tools/power/cpupower/utils/helpers/cpuid.c b/tools/power/cpupower/utils/helpers/cpuid.c
> index 72eb43593180..78218c54acca 100644
> --- a/tools/power/cpupower/utils/helpers/cpuid.c
> +++ b/tools/power/cpupower/utils/helpers/cpuid.c
> @@ -149,6 +149,19 @@ int get_cpu_info(struct cpupower_cpu_info *cpu_info)
> if (ext_cpuid_level >= 0x80000008 &&
> cpuid_ebx(0x80000008) & (1 << 4))
> cpu_info->caps |= CPUPOWER_CAP_AMD_RDPRU;
> +
> + if (cpupower_amd_pstate_enabled(0)) {

What is the reason for calling this function with cpu id = 0?


> + cpu_info->caps |= CPUPOWER_CAP_AMD_PSTATE;
> +
> + /*
> + * If AMD P-state is enabled, the firmware will treat
> + * AMD P-state function as high priority.
> + */
> + cpu_info->caps &= ~CPUPOWER_CAP_AMD_CPB;
> + cpu_info->caps &= ~CPUPOWER_CAP_AMD_CPB_MSR;
> + cpu_info->caps &= ~CPUPOWER_CAP_AMD_HW_PSTATE;
> + cpu_info->caps &= ~CPUPOWER_CAP_AMD_PSTATEDEF;
> + }
> }
>
> if (cpu_info->vendor == X86_VENDOR_INTEL) {
>

thanks,
-- Shuah

2021-09-09 22:27:55

by Shuah Khan

Subject: Re: [PATCH 15/19] cpupower: add amd-pstate sysfs entries into libcpufreq

On 9/8/21 8:59 AM, Huang Rui wrote:
> These amd-pstate sysfs entries will be used on cpupower for amd-pstate
> kernel module.
>

This commit log doesn't make sense. If these sysfs entries are used
for the amd-pstate kernel module, why are they defined here?

Describe how these are used and the relationship between these defines
and the amd-pstate kernel module.

> Signed-off-by: Huang Rui <[email protected]>
> ---
> tools/power/cpupower/lib/cpufreq.c | 18 +++++++++++++++++-
> 1 file changed, 17 insertions(+), 1 deletion(-)
>
> diff --git a/tools/power/cpupower/lib/cpufreq.c b/tools/power/cpupower/lib/cpufreq.c
> index c3b56db8b921..3f92ddadaad2 100644
> --- a/tools/power/cpupower/lib/cpufreq.c
> +++ b/tools/power/cpupower/lib/cpufreq.c
> @@ -69,6 +69,14 @@ enum cpufreq_value {
> SCALING_MIN_FREQ,
> SCALING_MAX_FREQ,
> STATS_NUM_TRANSITIONS,
> + AMD_PSTATE_HIGHEST_PERF,
> + AMD_PSTATE_NOMINAL_PERF,
> + AMD_PSTATE_LOWEST_NONLINEAR_PERF,
> + AMD_PSTATE_LOWEST_PERF,
> + AMD_PSTATE_MAX_FREQ,
> + AMD_PSTATE_NOMINAL_FREQ,
> + AMD_PSTATE_LOWEST_NONLINEAR_FREQ,
> + AMD_PSTATE_MIN_FREQ,
> MAX_CPUFREQ_VALUE_READ_FILES
> };
>

These are AMD-specific values being added to common code.

> @@ -80,7 +88,15 @@ static const char *cpufreq_value_files[MAX_CPUFREQ_VALUE_READ_FILES] = {
> [SCALING_CUR_FREQ] = "scaling_cur_freq",
> [SCALING_MIN_FREQ] = "scaling_min_freq",
> [SCALING_MAX_FREQ] = "scaling_max_freq",
> - [STATS_NUM_TRANSITIONS] = "stats/total_trans"
> + [STATS_NUM_TRANSITIONS] = "stats/total_trans",
> + [AMD_PSTATE_HIGHEST_PERF] = "amd_pstate_highest_perf",
> + [AMD_PSTATE_NOMINAL_PERF] = "amd_pstate_nominal_perf",
> + [AMD_PSTATE_LOWEST_NONLINEAR_PERF] = "amd_pstate_lowest_nonlinear_perf",
> + [AMD_PSTATE_LOWEST_PERF] = "amd_pstate_lowest_perf",
> + [AMD_PSTATE_MAX_FREQ] = "amd_pstate_max_freq",
> + [AMD_PSTATE_NOMINAL_FREQ] = "amd_pstate_nominal_freq",
> + [AMD_PSTATE_LOWEST_NONLINEAR_FREQ] = "amd_pstate_lowest_nonlinear_freq",
> + [AMD_PSTATE_MIN_FREQ] = "amd_pstate_min_freq"
> };
>
>
>

These are AMD-specific values being added to common code.
It doesn't sound right. What happens if there is a conflict
between AMD values and another vendor's values?

This doesn't seem like a good place to add these.
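
One possible direction, sketched under the assumption that the AMD entries get their own enum and lookup table instead of extending the generic cpufreq_value list. The names amd_pstate_value, MAX_AMD_PSTATE_VALUE_READ_FILES and amd_pstate_value_file() are hypothetical, and only three of the entries are shown:

```c
#include <stddef.h>

/* Hypothetical AMD-only enum, kept apart from the generic value list. */
enum amd_pstate_value {
	AMD_PSTATE_HIGHEST_PERF,
	AMD_PSTATE_NOMINAL_PERF,
	AMD_PSTATE_MAX_FREQ,
	MAX_AMD_PSTATE_VALUE_READ_FILES
};

/* sysfs file names taken from the patch; the table itself is a sketch. */
static const char *amd_pstate_value_files[MAX_AMD_PSTATE_VALUE_READ_FILES] = {
	[AMD_PSTATE_HIGHEST_PERF] = "amd_pstate_highest_perf",
	[AMD_PSTATE_NOMINAL_PERF] = "amd_pstate_nominal_perf",
	[AMD_PSTATE_MAX_FREQ]     = "amd_pstate_max_freq",
};

/* Bounds-checked lookup; returns NULL for an unknown index. */
static const char *amd_pstate_value_file(unsigned int which)
{
	if (which >= MAX_AMD_PSTATE_VALUE_READ_FILES)
		return NULL;

	return amd_pstate_value_files[which];
}
```

With a separate table, another vendor's entries cannot collide with the AMD indices, and the common enum stays vendor neutral.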

thanks,
-- Shuah


2021-09-09 22:55:16

by Shuah Khan

Subject: Re: [PATCH 18/19] cpupower: print amd-pstate information on cpupower

On 9/8/21 9:00 AM, Huang Rui wrote:
> amd-pstate kernel module is using the fine grain frequency instead of
> acpi hardware pstate. So the performance and frequency values should be
> printed in frequency-info.
>
> Signed-off-by: Huang Rui <[email protected]>
> ---
> tools/power/cpupower/utils/cpufreq-info.c | 27 ++++++++++++++++++++---
> 1 file changed, 24 insertions(+), 3 deletions(-)
>
> diff --git a/tools/power/cpupower/utils/cpufreq-info.c b/tools/power/cpupower/utils/cpufreq-info.c
> index f9895e31ff5a..9eabed209adc 100644
> --- a/tools/power/cpupower/utils/cpufreq-info.c
> +++ b/tools/power/cpupower/utils/cpufreq-info.c
> @@ -183,9 +183,30 @@ static int get_boost_mode_x86(unsigned int cpu)
> printf(_(" Supported: %s\n"), support ? _("yes") : _("no"));
> printf(_(" Active: %s\n"), active ? _("yes") : _("no"));
>
> - if ((cpupower_cpu_info.vendor == X86_VENDOR_AMD &&
> - cpupower_cpu_info.family >= 0x10) ||
> - cpupower_cpu_info.vendor == X86_VENDOR_HYGON) {
> + if (cpupower_cpu_info.vendor == X86_VENDOR_AMD &&
> + cpupower_cpu_info.caps & CPUPOWER_CAP_AMD_PSTATE) {
> + printf(_(" AMD PSTATE Highest Performance: %u. Maximum Frequency: "),
> + amd_pstate_get_data(cpu, HIGHEST_PERF));
> + print_speed(amd_pstate_get_data(cpu, MAX_FREQ));
> + printf(".\n");
> +
> + printf(_(" AMD PSTATE Nominal Performance: %u. Nominal Frequency: "),
> + amd_pstate_get_data(cpu, NOMINAL_PERF));
> + print_speed(amd_pstate_get_data(cpu, NOMINAL_FREQ));
> + printf(".\n");
> +
> + printf(_(" AMD PSTATE Lowest Non-linear Performance: %u. Lowest Non-linear Frequency: "),
> + amd_pstate_get_data(cpu, LOWEST_NONLINEAR_PERF));
> + print_speed(amd_pstate_get_data(cpu, LOWEST_NONLINEAR_FREQ));
> + printf(".\n");
> +
> + printf(_(" AMD PSTATE Lowest Performance: %u. Lowest Frequency: "),
> + amd_pstate_get_data(cpu, LOWEST_PERF));
> + print_speed(amd_pstate_get_data(cpu, MIN_FREQ));
> + printf(".\n");
> + } else if ((cpupower_cpu_info.vendor == X86_VENDOR_AMD &&
> + cpupower_cpu_info.family >= 0x10) ||
> + cpupower_cpu_info.vendor == X86_VENDOR_HYGON) {
> ret = decode_pstates(cpu, b_states, pstates, &pstate_no);
> if (ret)
> return ret;
>

Same issue here - AMD-specific code sprinkled all over the common routines.
This needs fixing.

thanks,
-- Shuah

2021-09-09 22:57:23

by Shuah Khan

Subject: Re: [PATCH 17/19] cpupower: add amd-pstate get data function to query the info

On 9/8/21 8:59 AM, Huang Rui wrote:
> Frequency-info needs an interface to query the current amd-pstate data.
>

Let's add more information here.
> Signed-off-by: Huang Rui <[email protected]>
> ---
> tools/power/cpupower/lib/cpufreq.c | 6 ++++++
> tools/power/cpupower/lib/cpufreq.h | 13 +++++++++++++
> 2 files changed, 19 insertions(+)
>
> diff --git a/tools/power/cpupower/lib/cpufreq.c b/tools/power/cpupower/lib/cpufreq.c
> index 37da87bdcfb1..1443080868da 100644
> --- a/tools/power/cpupower/lib/cpufreq.c
> +++ b/tools/power/cpupower/lib/cpufreq.c
> @@ -810,3 +810,9 @@ int amd_pstate_boost_enabled(unsigned int cpu)
>
> return cpuinfo_max == amd_pstate_max ? 1 : 0;
> }
> +
> +unsigned amd_pstate_get_data(unsigned int cpu, enum amd_pstate_param param)
> +{
> + return sysfs_cpufreq_get_one_value(cpu,
> + param + AMD_PSTATE_HIGHEST_PERF);
> +}
> diff --git a/tools/power/cpupower/lib/cpufreq.h b/tools/power/cpupower/lib/cpufreq.h
> index d54d02a7a4f4..954e72704fc0 100644
> --- a/tools/power/cpupower/lib/cpufreq.h
> +++ b/tools/power/cpupower/lib/cpufreq.h
> @@ -206,6 +206,19 @@ int cpufreq_set_frequency(unsigned int cpu,
> int amd_pstate_boost_support(unsigned int cpu);
> int amd_pstate_boost_enabled(unsigned int cpu);
>
> +enum amd_pstate_param {
> + HIGHEST_PERF,
> + NOMINAL_PERF,
> + LOWEST_NONLINEAR_PERF,
> + LOWEST_PERF,
> + MAX_FREQ,
> + NOMINAL_FREQ,
> + LOWEST_NONLINEAR_FREQ,
> + MIN_FREQ,
> +};
> +
> +unsigned amd_pstate_get_data(unsigned int cpu, enum amd_pstate_param param);
> +
> #ifdef __cplusplus
> }
> #endif
>

AMD-specific things added to common files? I would like to see this patch
series redone to follow the existing common vs. vendor-specific separation.

thanks,
-- Shuah


2021-09-09 23:23:10

by Shuah Khan

Subject: Re: [PATCH 16/19] cpupower: enable boost state support for amd-pstate module

On 9/8/21 8:59 AM, Huang Rui wrote:
> The AMD P-state boost API is different from ACPI hardware P-states, so
> implement the support for amd-pstate kernel module.
>

This commit log doesn't make sense. If these sysfs entries are used
for the amd-pstate kernel module, why are they defined here?

Describe how these are used and the relationship between these defines
and the amd-pstate kernel module.

> Signed-off-by: Huang Rui <[email protected]>
> ---
> tools/power/cpupower/lib/cpufreq.c | 20 ++++++++++++++++++++
> tools/power/cpupower/lib/cpufreq.h | 3 +++
> tools/power/cpupower/utils/helpers/misc.c | 7 +++++++
> 3 files changed, 30 insertions(+)
>
> diff --git a/tools/power/cpupower/lib/cpufreq.c b/tools/power/cpupower/lib/cpufreq.c
> index 3f92ddadaad2..37da87bdcfb1 100644
> --- a/tools/power/cpupower/lib/cpufreq.c
> +++ b/tools/power/cpupower/lib/cpufreq.c
> @@ -790,3 +790,23 @@ unsigned long cpufreq_get_transitions(unsigned int cpu)
> {
> return sysfs_cpufreq_get_one_value(cpu, STATS_NUM_TRANSITIONS);
> }
> +
> +int amd_pstate_boost_support(unsigned int cpu)
> +{
> + unsigned int highest_perf, nominal_perf;
> +
> + highest_perf = sysfs_cpufreq_get_one_value(cpu, AMD_PSTATE_HIGHEST_PERF);
> + nominal_perf = sysfs_cpufreq_get_one_value(cpu, AMD_PSTATE_NOMINAL_PERF);
> +
> + return highest_perf > nominal_perf ? 1 : 0;
> +}
> +
> +int amd_pstate_boost_enabled(unsigned int cpu)
> +{
> + unsigned int cpuinfo_max, amd_pstate_max;
> +
> + cpuinfo_max = sysfs_cpufreq_get_one_value(cpu, CPUINFO_MAX_FREQ);
> + amd_pstate_max = sysfs_cpufreq_get_one_value(cpu, AMD_PSTATE_MAX_FREQ);
> +
> + return cpuinfo_max == amd_pstate_max ? 1 : 0;
> +}

Why are these AMD-specific routines added to a common file?
Why not add them to tools/power/cpupower/utils/helpers/amd.c?
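
For illustration, the two boost checks could be written as vendor-local
helpers that do not depend on the common cpufreq value enum at all. This is
a minimal sketch under stated assumptions: the sysfs_read_fn callback type
and the fake_read_value() stand-in are hypothetical, and the numbers it
returns are made up purely so the logic can be exercised. The comparisons
themselves mirror the ones in the patch.

```c
#include <string.h>

/*
 * Hypothetical reader type: returns the numeric value of one per-CPU
 * cpufreq sysfs file. The signature is an assumption for this sketch.
 */
typedef unsigned long (*sysfs_read_fn)(unsigned int cpu, const char *file);

/* Boost is supported when the highest perf level exceeds the nominal one. */
static int amd_pstate_boost_support(unsigned int cpu, sysfs_read_fn read_value)
{
	unsigned long highest = read_value(cpu, "amd_pstate_highest_perf");
	unsigned long nominal = read_value(cpu, "amd_pstate_nominal_perf");

	return highest > nominal;
}

/* Boost is active when the policy maximum equals the amd-pstate maximum. */
static int amd_pstate_boost_enabled(unsigned int cpu, sysfs_read_fn read_value)
{
	unsigned long cpuinfo_max = read_value(cpu, "cpuinfo_max_freq");
	unsigned long pstate_max = read_value(cpu, "amd_pstate_max_freq");

	return cpuinfo_max == pstate_max;
}

/* Stand-in reader with made-up values, only so the sketch can be exercised. */
static unsigned long fake_read_value(unsigned int cpu, const char *file)
{
	(void)cpu;
	if (strcmp(file, "amd_pstate_highest_perf") == 0)
		return 166;
	if (strcmp(file, "amd_pstate_nominal_perf") == 0)
		return 117;
	if (strcmp(file, "cpuinfo_max_freq") == 0)
		return 4000000;
	if (strcmp(file, "amd_pstate_max_freq") == 0)
		return 4000000;
	return 0;
}
```

Passing the reader in keeps the helpers free of any table shared with other
vendors; whether that is the right interface for amd.c is a separate
question.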

> diff --git a/tools/power/cpupower/lib/cpufreq.h b/tools/power/cpupower/lib/cpufreq.h
> index 95f4fd9e2656..d54d02a7a4f4 100644
> --- a/tools/power/cpupower/lib/cpufreq.h
> +++ b/tools/power/cpupower/lib/cpufreq.h
> @@ -203,6 +203,9 @@ int cpufreq_modify_policy_governor(unsigned int cpu, char *governor);
> int cpufreq_set_frequency(unsigned int cpu,
> unsigned long target_frequency);
>
> +int amd_pstate_boost_support(unsigned int cpu);
> +int amd_pstate_boost_enabled(unsigned int cpu);
> +
> #ifdef __cplusplus
> }
> #endif
> diff --git a/tools/power/cpupower/utils/helpers/misc.c b/tools/power/cpupower/utils/helpers/misc.c
> index 07d80775fb68..aba979320760 100644
> --- a/tools/power/cpupower/utils/helpers/misc.c
> +++ b/tools/power/cpupower/utils/helpers/misc.c
> @@ -10,6 +10,7 @@
> #if defined(__i386__) || defined(__x86_64__)
>
> #include "cpupower_intern.h"
> +#include "cpufreq.h"
>
> #define MSR_AMD_HWCR 0xc0010015
>
> @@ -39,6 +40,12 @@ int cpufreq_has_boost_support(unsigned int cpu, int *support, int *active,
>

This logic here calls amd_pci_get_num_boost_states() ---
There is another routine called decode_pstates() in
tools/power/cpupower/utils/helpers/amd.c

if (ret)
> return ret;
> }
> + } if ((cpupower_cpu_info.caps & CPUPOWER_CAP_AMD_PSTATE) &&
> + amd_pstate_boost_support(cpu)) {

Coupled with the above routines, the naming amd_pstate_boost_support()
is rather confusing.

Also why is this amd_pstate_boost_support() added to
> + *support = 1;
> +
> + if (amd_pstate_boost_enabled(cpu))
> + *active = 1;
> } else if (cpupower_cpu_info.caps & CPUPOWER_CAP_INTEL_IDA)
> *support = *active = 1;
> return 0;
>

thanks,
-- Shuah

2021-09-13 08:12:56

by Huang Rui

Subject: Re: [PATCH 04/19] cpufreq: amd: introduce a new amd pstate driver to support future processors

On Thu, Sep 09, 2021 at 11:01:41PM +0800, Peter Zijlstra wrote:
> On Wed, Sep 08, 2021 at 10:59:46PM +0800, Huang Rui wrote:
>
> > +struct amd_pstate_perf_funcs {
> > + int (*enable)(bool enable);
> > + int (*init_perf)(struct amd_cpudata *cpudata);
> > + void (*update_perf)(struct amd_cpudata *cpudata,
> > + u32 min_perf, u32 des_perf,
> > + u32 max_perf, bool fast_switch);
> > +};
>
> > +static int
> > +amd_pstate_enable(struct amd_pstate_perf_funcs *funcs, bool enable)
> > +{
> > + if (!funcs)
> > + return -EINVAL;
> > +
> > + return funcs->enable(enable);
> > +}
>
> > +static int amd_pstate_init_perf(struct amd_cpudata *cpudata)
> > +{
> > + struct amd_pstate_perf_funcs *funcs = cpufreq_get_driver_data();
> > +
> > + if (!funcs)
> > + return -EINVAL;
> > +
> > + return funcs->init_perf(cpudata);
> > +}
>
> > +static int
> > +amd_pstate_update_perf(struct amd_cpudata *cpudata, u32 min_perf,
> > + u32 des_perf, u32 max_perf, bool fast_switch)
> > +{
> > + struct amd_pstate_perf_funcs *funcs = cpufreq_get_driver_data();
> > +
> > + if (!funcs)
> > + return -EINVAL;
> > +
> > + funcs->update_perf(cpudata, min_perf, des_perf,
> > + max_perf, fast_switch);
> > +
> > + return 0;
> > +}
>
> > +static struct amd_pstate_perf_funcs pstate_funcs = {
> > + .enable = pstate_enable,
> > + .init_perf = pstate_init_perf,
> > + .update_perf = pstate_update_perf,
> > +};
>
> > +static int __init amd_pstate_init(void)
> > +{
> > + int ret;
> > + struct amd_pstate_perf_funcs *funcs;
>
> > +
> > + funcs = &pstate_funcs;
>
> What is the purpose of this seemingly pointless indirection? Showing off
> how good AMD hardware is at doing retpolines or something?

Hi Peter,

Thanks for looking at our code again. We adopted the suggestion you raised
about two years ago: use the kernel governors, such as schedutil, to manage
frequency control in the new cpufreq driver.

We will have two approaches to implementing the amd-pstate driver,
depending on the AMD processor hardware. (Please see the details in Patch
19.)

1) Full MSR Support
If the current hardware has full MSR support, we register the "pstate_funcs"
callback functions, which use MSR operations to control the clocks.

The reason we keep this path separate is that it lets us implement the
fast_switch or adjust_perf functions for schedutil and other governors. The
fast switch function provides better performance and lower latency
during frequency control.

2) Shared Memory Support
If the current hardware doesn't have full MSR support, it only provides
shared memory support. We will use the APIs in the cppc_acpi library, through
"cppc_funcs", to implement the target function for frequency control.


The main reasons that we propose a new amd-pstate driver, rather than using
the existing acpi-cpufreq or cppc-cpufreq driver, are below:

1. As mentioned above, the amd-pstate driver can implement the
fast_switch/adjust_perf functions with full MSR operations, which perform
better with schedutil and other governors.

2. We will implement AMD-specific features such as Energy Performance
Preference, Preferred Core, etc. in the amd-pstate driver as a next step.

3. acpi-cpufreq and cppc-cpufreq are very good drivers that provide a
general solution for the ACPI standards. However,
- <i> if we add amd-pstate-like support to the acpi-cpufreq driver, it will
impact the legacy P-States function on old Intel and AMD processors.
- <ii> if we add amd-pstate-like support to the cppc-cpufreq driver, it
will bring a lot of x86-specific or AMD-specific changes (quirks or
AMD-specific handling) into that driver. The cppc-cpufreq driver then
won't be general anymore, and the changes will impact the existing
ARM SoCs. Rafael also didn't want me to add x86/AMD-specific
things to cppc-acpi before.

4. AMD will do performance and power tuning or profiling on each AMD
CPU chip in the future, and different types of chips will have different
policies. For example, mobile chips and performance desktop chips will
probably have different frequency control policies.

5. We can maintain the amd-pstate driver and handle the bugs that are
reported by the community. Making sure it is validated on each future and
previous AMD processor can reduce the upstream maintenance workload. :-)

Best Regards,
Ray

2021-09-13 09:50:24

by Huang Rui

Subject: Re: [PATCH 01/19] x86/cpufreatures: add AMD CPPC extension feature flag

Hi Boris,

On Fri, Sep 10, 2021 at 01:58:19AM +0800, Borislav Petkov wrote:
> On Wed, Sep 08, 2021 at 10:59:43PM +0800, Huang Rui wrote:
> > Add Collaborative Processor Performance Control Extension feature flag
> > for AMD processors.
> >
> > Signed-off-by: Huang Rui <[email protected]>
> > ---
> > arch/x86/include/asm/cpufeatures.h | 1 +
> > 1 file changed, 1 insertion(+)
> >
> > diff --git a/arch/x86/include/asm/cpufeatures.h b/arch/x86/include/asm/cpufeatures.h
> > index d0ce5cfd3ac1..f7aea50e3371 100644
> > --- a/arch/x86/include/asm/cpufeatures.h
> > +++ b/arch/x86/include/asm/cpufeatures.h
> > @@ -313,6 +313,7 @@
> > #define X86_FEATURE_AMD_SSBD (13*32+24) /* "" Speculative Store Bypass Disable */
> > #define X86_FEATURE_VIRT_SSBD (13*32+25) /* Virtualized Speculative Store Bypass Disable */
> > #define X86_FEATURE_AMD_SSB_NO (13*32+26) /* "" Speculative Store Bypass is fixed in hardware. */
> > +#define X86_FEATURE_AMD_CPPC_EXT (13*32+27) /* Collaborative Processor Performance Control Extension */
>
> Why not simply X86_FEATURE_AMD_CPPC ?

This feature flag indicates the full MSR hardware solution of AMD P-States.
If it is not set, we fall back to the shared memory hardware solution. That
is why we named it an extension. I will explain the details in the
commit log in V2. ;-)
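
For readers unfamiliar with the encoding in the quoted hunk, the value
(13*32+27) packs a capability word index and a bit position into one
number. The sketch below shows that arithmetic in plain userspace C. The
FEATURE*() macros and feature_is_set() are hypothetical helpers written for
this illustration, not the kernel's boot_cpu_has() machinery; only the
13 and 27 come from the patch.

```c
/* A feature number packs a word index and a bit position as word * 32 + bit. */
#define FEATURE(word, bit)	((word) * 32 + (bit))
#define FEATURE_WORD(f)		((f) / 32)
#define FEATURE_BIT(f)		((f) % 32)

/* Same encoding as the value proposed in the patch. */
#define X86_FEATURE_AMD_CPPC_EXT	FEATURE(13, 27)

/* Test one feature against an array of 32-bit capability words. */
static int feature_is_set(const unsigned int *caps, unsigned int feature)
{
	return (caps[FEATURE_WORD(feature)] >> FEATURE_BIT(feature)) & 1u;
}
```

A driver would then choose the MSR path when the bit is set and the shared
memory path otherwise, as described above.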

Thanks,
Ray

2021-09-13 10:19:16

by Peter Zijlstra

Subject: Re: [PATCH 04/19] cpufreq: amd: introduce a new amd pstate driver to support future processors

On Mon, Sep 13, 2021 at 04:11:34PM +0800, Huang Rui wrote:
> On Thu, Sep 09, 2021 at 11:01:41PM +0800, Peter Zijlstra wrote:

> > What is the purpose of this seemingly pointless indirection? Showing off
> > how good AMD hardware is at doing retpolines or something?
>
> Hi Petter,
>
> Thanks to look at our codes again. We adopt your suggestion which raised
> about two year ago that using the kernel governors such as schedutil to
> manage frequency control for new cpufreq driver.

Indeed, no objections there :-)

> We will have two approaches (it depends on different AMD processor
> hardware) to implement the amd-pstate driver. (Please see details in Patch
> 19)

Patch 19 is RST and as such I will not read it. But I think you're
referring to patch 6, which adds another amd_pstate_perf_funcs instance,
which I seem to have missed the last time.

As such, perhaps you could do with something like the below.

> 1) Full MSR Support
> If current hardware has the full MSR support, we register "pstate_funcs"
> callback functions to implement the MSR operations to control the clocks.

What's the WRMSR cost for those? I've not really kept track of the MSR
costs on AMD platforms, but on Intel it has (luckily) been coming down
quite a bit.

> 2) Shared Memory Support
> If current hardware doesn't have the full MSR support, that means it only
> provides share memory support. We will leverage APIs in cppc_acpi libs with
> "cppc_funcs" to implement the target function for the frequency control.

Right, the mailbox thing. How is the performance of this vs MSR accesses?

> The mainly reasons that we proposed a new amd-pstate driver, not use the
> existing acpi-freq or cppc-cpufreq driver are below:

I wasn't really questioning that, much seems similar to having
intel-pstate, but since you brought it up, a few questions :-)

> 1. As mentioned above, amd-pstate driver can implement
> fast_switch/adjust_perf function with full MSR operations that have better
> performance for schedutil and other governors.

Why couldn't the existing cppc-cpufreq grow this?

> 2. We will implement the AMD specific features such as Energy Performance
> Preference, Preferred Core, and etc. in the amd-pstate driver next step.

That's the ITMT stuff, right?


---

--- a/drivers/cpufreq/amd-pstate.c
+++ b/drivers/cpufreq/amd-pstate.c
@@ -79,14 +79,6 @@ struct amd_cpudata {
bool boost_supported;
};

-struct amd_pstate_perf_funcs {
- int (*enable)(bool enable);
- int (*init_perf)(struct amd_cpudata *cpudata);
- void (*update_perf)(struct amd_cpudata *cpudata,
- u32 min_perf, u32 des_perf,
- u32 max_perf, bool fast_switch);
-};
-
static inline int pstate_enable(bool enable)
{
return wrmsrl_safe(MSR_AMD_CPPC_ENABLE, enable ? 1 : 0);
@@ -105,13 +97,12 @@ static int cppc_enable(bool enable)
return ret;
}

-static int
-amd_pstate_enable(struct amd_pstate_perf_funcs *funcs, bool enable)
-{
- if (!funcs)
- return -EINVAL;
+static DEFINE_STATIC_CALL(amd_pstate_enable, pstate_enable);

- return funcs->enable(enable);
+static inline int
+amd_pstate_enable(bool enable)
+{
+ return static_call(amd_pstate_enable)(enable);
}

static int pstate_init_perf(struct amd_cpudata *cpudata)
@@ -154,14 +145,11 @@ static int cppc_init_perf(struct amd_cpu
return 0;
}

-static int amd_pstate_init_perf(struct amd_cpudata *cpudata)
-{
- struct amd_pstate_perf_funcs *funcs = cpufreq_get_driver_data();
+static DEFINE_STATIC_CALL(amd_pstate_init_perf, pstate_init_perf);

- if (!funcs)
- return -EINVAL;
-
- return funcs->init_perf(cpudata);
+static inline int amd_pstate_init_perf(struct amd_cpudata *cpudata)
+{
+ return static_call(amd_pstate_init_perf)(cpudata);
}

static void pstate_update_perf(struct amd_cpudata *cpudata,
@@ -188,19 +176,14 @@ static void cppc_update_perf(struct amd_
cppc_set_perf(cpudata->cpu, &perf_ctrls);
}

-static int
+static DEFINE_STATIC_CALL(amd_pstate_update_perf, pstate_update_perf);
+
+static inline int
amd_pstate_update_perf(struct amd_cpudata *cpudata, u32 min_perf,
u32 des_perf, u32 max_perf, bool fast_switch)
{
- struct amd_pstate_perf_funcs *funcs = cpufreq_get_driver_data();
-
- if (!funcs)
- return -EINVAL;
-
- funcs->update_perf(cpudata, min_perf, des_perf,
- max_perf, fast_switch);
-
- return 0;
+ return static_call(amd_pstate_update_perf)(cpudata, min_perf, des_perf,
+ max_perf, fast_switch);
}

static int
@@ -465,18 +448,6 @@ static int amd_pstate_init_freqs_in_cpud
return 0;
}

-static struct amd_pstate_perf_funcs pstate_funcs = {
- .enable = pstate_enable,
- .init_perf = pstate_init_perf,
- .update_perf = pstate_update_perf,
-};
-
-static struct amd_pstate_perf_funcs cppc_funcs = {
- .enable = cppc_enable,
- .init_perf = cppc_init_perf,
- .update_perf = cppc_update_perf,
-};
-
static int amd_pstate_cpu_init(struct cpufreq_policy *policy)
{
int min_freq, max_freq, nominal_freq, lowest_nonlinear_freq, ret;
@@ -749,7 +720,6 @@ static struct cpufreq_driver amd_pstate_
static int __init amd_pstate_init(void)
{
int ret;
- struct amd_pstate_perf_funcs *funcs;

if (boot_cpu_data.x86_vendor != X86_VENDOR_AMD)
return -ENODEV;
@@ -768,22 +738,21 @@ static int __init amd_pstate_init(void)
if (boot_cpu_has(X86_FEATURE_AMD_CPPC_EXT)) {
pr_debug("%s, AMD CPPC extension functionality is supported\n",
__func__);
- funcs = &pstate_funcs;
amd_pstate_driver.adjust_perf = amd_pstate_adjust_perf;
} else {
- funcs = &cppc_funcs;
+ static_call_update(amd_pstate_enable, cppc_enable);
+ static_call_update(amd_pstate_init_perf, cppc_init_perf);
+ static_call_update(amd_pstate_update_perf, cppc_update_perf);
}

/* enable amd pstate feature */
- ret = amd_pstate_enable(funcs, true);
+ ret = amd_pstate_enable(true);
if (ret) {
pr_err("%s, failed to enable amd-pstate with return %d\n",
__func__, ret);
return ret;
}

- amd_pstate_driver.driver_data = funcs;
-
ret = cpufreq_register_driver(&amd_pstate_driver);
if (ret) {
pr_err("%s, return %d\n", __func__, ret);
@@ -795,13 +764,8 @@ static int __init amd_pstate_init(void)

static void __exit amd_pstate_exit(void)
{
- struct amd_pstate_perf_funcs *funcs;
-
- funcs = cpufreq_get_driver_data();
-
cpufreq_unregister_driver(&amd_pstate_driver);
-
- amd_pstate_enable(funcs, false);
+ amd_pstate_enable(false);
}

module_init(amd_pstate_init);

2021-09-13 10:59:07

by Huang Rui

Subject: Re: [PATCH 04/19] cpufreq: amd: introduce a new amd pstate driver to support future processors

On Mon, Sep 13, 2021 at 04:56:24PM +0800, Peter Zijlstra wrote:
> On Mon, Sep 13, 2021 at 04:11:34PM +0800, Huang Rui wrote:
> > On Thu, Sep 09, 2021 at 11:01:41PM +0800, Peter Zijlstra wrote:
>
> > > What is the purpose of this seemingly pointless indirection? Showing off
> > > how good AMD hardware is at doing retpolines or something?
> >
> > Hi Petter,
> >
> > Thanks to look at our codes again. We adopt your suggestion which raised
> > about two year ago that using the kernel governors such as schedutil to
> > manage frequency control for new cpufreq driver.
>
> Indeed, no objections there :-)
>
> > We will have two approaches (it depends on different AMD processor
> > hardware) to implement the amd-pstate driver. (Please see details in Patch
> > 19)
>
> Patch 19 is RST and as such I will not read it. But I think you're
> referring to patch 6, which adds another amd_pstate_perf_funcs instance,
> which I seem to have missed the last time.

Yes, right. No problem. ;-)

>
> As such, perhaps you could do with something like the below.
>
> > 1) Full MSR Support
> > If current hardware has the full MSR support, we register "pstate_funcs"
> > callback functions to implement the MSR operations to control the clocks.
>
> What's the WRMSR cost for those? I've not really kept track of the MSR
> costs on AMD platforms, but on Intel it has (luckily) been coming down
> quite a bit.

Good to know. I haven't had a chance to check this yet. May I know how
you tested this latency? The MSR interface is a new hardware design for this
solution; as the designer mentioned, WRMSR is a low-latency register model
that is faster than the ACPI AML code interpreter.

>
> > 2) Shared Memory Support
> > If current hardware doesn't have the full MSR support, that means it only
> > provides share memory support. We will leverage APIs in cppc_acpi libs with
> > "cppc_funcs" to implement the target function for the frequency control.
>
> Right, the mailbox thing. How is the performance of this vs MSR accesses?

I will check. If you have an existing test method that can be used, I
can check it quickly.

>
> > The mainly reasons that we proposed a new amd-pstate driver, not use the
> > existing acpi-freq or cppc-cpufreq driver are below:
>
> I wasn't really questioning that, much seems similar to having
> intel-pstate, but since you brought it up, a few questions: -)

Thank you!

>
> > 1. As mentioned above, amd-pstate driver can implement
> > fast_switch/adjust_perf function with full MSR operations that have better
> > performance for schedutil and other governors.
>
> Why couldn't the existing cppc-cpufreq grow this?

Because fast_switch can adjust the frequency directly in interrupt
context. If we use the ACPI CPPC handling with the shared memory solution
there, it will deadlock. So fast switch needs direct register control,
like acpi-cpufreq and intel-pstate.

>
> > 2. We will implement the AMD specific features such as Energy Performance
> > Preference, Preferred Core, and etc. in the amd-pstate driver next step.
>
> That's the ITMT stuff, right?

Similar to ITMT. :-)

>
>
> ---
>
> --- a/drivers/cpufreq/amd-pstate.c
> +++ b/drivers/cpufreq/amd-pstate.c
> @@ -79,14 +79,6 @@ struct amd_cpudata {
> bool boost_supported;
> };
>
> -struct amd_pstate_perf_funcs {
> - int (*enable)(bool enable);
> - int (*init_perf)(struct amd_cpudata *cpudata);
> - void (*update_perf)(struct amd_cpudata *cpudata,
> - u32 min_perf, u32 des_perf,
> - u32 max_perf, bool fast_switch);
> -};
> -
> static inline int pstate_enable(bool enable)
> {
> return wrmsrl_safe(MSR_AMD_CPPC_ENABLE, enable ? 1 : 0);
> @@ -105,13 +97,12 @@ static int cppc_enable(bool enable)
> return ret;
> }
>
> -static int
> -amd_pstate_enable(struct amd_pstate_perf_funcs *funcs, bool enable)
> -{
> - if (!funcs)
> - return -EINVAL;
> +static DEFINE_STATIC_CALL(amd_pstate_enable, pstate_enable);
>
> - return funcs->enable(enable);
> +static inline int
> +amd_pstate_enable(bool enable)
> +{
> + return static_call(amd_pstate_enable)(enable);
> }
>
> static int pstate_init_perf(struct amd_cpudata *cpudata)
> @@ -154,14 +145,11 @@ static int cppc_init_perf(struct amd_cpu
> return 0;
> }
>
> -static int amd_pstate_init_perf(struct amd_cpudata *cpudata)
> -{
> - struct amd_pstate_perf_funcs *funcs = cpufreq_get_driver_data();
> +static DEFINE_STATIC_CALL(amd_pstate_init_perf, pstate_init_perf);
>
> - if (!funcs)
> - return -EINVAL;
> -
> - return funcs->init_perf(cpudata);
> +static inline int amd_pstate_init_perf(struct amd_cpudata *cpudata)
> +{
> + return static_call(amd_pstate_init_perf)(cpudata);
> }
>
> static void pstate_update_perf(struct amd_cpudata *cpudata,
> @@ -188,19 +176,14 @@ static void cppc_update_perf(struct amd_
> cppc_set_perf(cpudata->cpu, &perf_ctrls);
> }
>
> -static int
> +static DEFINE_STATIC_CALL(amd_pstate_update_perf, pstate_update_perf);
> +
> +static inline int
> amd_pstate_update_perf(struct amd_cpudata *cpudata, u32 min_perf,
> u32 des_perf, u32 max_perf, bool fast_switch)
> {
> - struct amd_pstate_perf_funcs *funcs = cpufreq_get_driver_data();
> -
> - if (!funcs)
> - return -EINVAL;
> -
> - funcs->update_perf(cpudata, min_perf, des_perf,
> - max_perf, fast_switch);
> -
> - return 0;
> + return static_call(amd_pstate_update_perf)(cpudata, min_perf, des_perf,
> + max_perf, fast_switch);
> }
>
> static int
> @@ -465,18 +448,6 @@ static int amd_pstate_init_freqs_in_cpud
> return 0;
> }
>
> -static struct amd_pstate_perf_funcs pstate_funcs = {
> - .enable = pstate_enable,
> - .init_perf = pstate_init_perf,
> - .update_perf = pstate_update_perf,
> -};
> -
> -static struct amd_pstate_perf_funcs cppc_funcs = {
> - .enable = cppc_enable,
> - .init_perf = cppc_init_perf,
> - .update_perf = cppc_update_perf,
> -};
> -
> static int amd_pstate_cpu_init(struct cpufreq_policy *policy)
> {
> int min_freq, max_freq, nominal_freq, lowest_nonlinear_freq, ret;
> @@ -749,7 +720,6 @@ static struct cpufreq_driver amd_pstate_
> static int __init amd_pstate_init(void)
> {
> int ret;
> - struct amd_pstate_perf_funcs *funcs;
>
> if (boot_cpu_data.x86_vendor != X86_VENDOR_AMD)
> return -ENODEV;
> @@ -768,22 +738,21 @@ static int __init amd_pstate_init(void)
> if (boot_cpu_has(X86_FEATURE_AMD_CPPC_EXT)) {
> pr_debug("%s, AMD CPPC extension functionality is supported\n",
> __func__);
> - funcs = &pstate_funcs;
> amd_pstate_driver.adjust_perf = amd_pstate_adjust_perf;
> } else {
> - funcs = &cppc_funcs;
> + static_call_update(amd_pstate_enable, cppc_enable);
> + static_call_update(amd_pstate_init_perf, cppc_init_perf);
> + static_call_update(amd_pstate_update_perf, cppc_update_perf);

Thanks again for the detailed example. I will switch to this approach in V2.

Best Regards,
Ray

> }
>
> /* enable amd pstate feature */
> - ret = amd_pstate_enable(funcs, true);
> + ret = amd_pstate_enable(true);
> if (ret) {
> pr_err("%s, failed to enable amd-pstate with return %d\n",
> __func__, ret);
> return ret;
> }
>
> - amd_pstate_driver.driver_data = funcs;
> -
> ret = cpufreq_register_driver(&amd_pstate_driver);
> if (ret) {
> pr_err("%s, return %d\n", __func__, ret);
> @@ -795,13 +764,8 @@ static int __init amd_pstate_init(void)
>
> static void __exit amd_pstate_exit(void)
> {
> - struct amd_pstate_perf_funcs *funcs;
> -
> - funcs = cpufreq_get_driver_data();
> -
> cpufreq_unregister_driver(&amd_pstate_driver);
> -
> - amd_pstate_enable(funcs, false);
> + amd_pstate_enable(false);
> }
>
> module_init(amd_pstate_init);

2021-09-13 11:37:48

by Huang Rui

Subject: Re: [PATCH 13/19] cpupower: add the function to check amd-pstate enabled

On Fri, Sep 10, 2021 at 06:16:21AM +0800, Shuah Khan wrote:
> On 9/8/21 8:59 AM, Huang Rui wrote:
> > Introduce the cpupower_amd_pstate_enabled() to check whether the kernel
> > mode enables amd-pstate.
> >
>
> What does "kernel mode" mean? Are you referring to kernel vs.
> firmware mode?

I am referring to the kernel. In fact, a processor that supports AMD
P-States also supports the legacy ACPI P-States. So this API checks
whether the kernel driver in use is amd-pstate or acpi-cpufreq.

I should have explained this, sorry for the confusion. I will explain the
details in the commit log in V2.
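
In practice the check comes down to reading one sysfs attribute and parsing
its text as a number. A minimal sketch of just the parsing step follows.
The helper name parse_sysfs_ulong() is hypothetical. One point it tries to
make explicit: strtoul() only sets errno on failure, so a test for
errno == ERANGE is only reliable if errno is cleared beforehand.

```c
#include <errno.h>
#include <stdlib.h>

/*
 * Parse the text of a sysfs attribute as an unsigned long.
 * Returns 0 on success and stores the value in *out; returns -1 if the
 * text holds no digits or the value does not fit in an unsigned long.
 */
static int parse_sysfs_ulong(const char *text, unsigned long *out)
{
	char *endp;
	unsigned long val;

	errno = 0;	/* clear any stale error before calling strtoul() */
	val = strtoul(text, &endp, 0);
	if (endp == text || errno == ERANGE)
		return -1;

	*out = val;
	return 0;
}
```

Returning the status separately from the value also lets a caller tell
"attribute reads 0" apart from "attribute could not be parsed", which a
single unsigned long return cannot express.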

Thanks,
Ray

>
> > Signed-off-by: Huang Rui <[email protected]>
> > ---
> > tools/power/cpupower/utils/helpers/helpers.h | 5 +++++
> > tools/power/cpupower/utils/helpers/misc.c | 20 ++++++++++++++++++++
> > 2 files changed, 25 insertions(+)
> >
> > diff --git a/tools/power/cpupower/utils/helpers/helpers.h b/tools/power/cpupower/utils/helpers/helpers.h
> > index b4813efdfb00..eb43c14d1728 100644
> > --- a/tools/power/cpupower/utils/helpers/helpers.h
> > +++ b/tools/power/cpupower/utils/helpers/helpers.h
> > @@ -136,6 +136,11 @@ extern int decode_pstates(unsigned int cpu, int boost_states,
> >
> > extern int cpufreq_has_boost_support(unsigned int cpu, int *support,
> > int *active, int * states);
> > +
> > +/* AMD PSTATE enabling **************************/
> > +
> > +extern unsigned long cpupower_amd_pstate_enabled(unsigned int cpu);
> > +
> > /*
> > * CPUID functions returning a single datum
> > */
> > diff --git a/tools/power/cpupower/utils/helpers/misc.c b/tools/power/cpupower/utils/helpers/misc.c
> > index fc6e34511721..07d80775fb68 100644
> > --- a/tools/power/cpupower/utils/helpers/misc.c
> > +++ b/tools/power/cpupower/utils/helpers/misc.c
> > @@ -83,6 +83,26 @@ int cpupower_intel_set_perf_bias(unsigned int cpu, unsigned int val)
> > return 0;
> > }
> >
> > +unsigned long cpupower_amd_pstate_enabled(unsigned int cpu)
> > +{
> > + char linebuf[MAX_LINE_LEN];
> > + char path[SYSFS_PATH_MAX];
> > + unsigned long val;
> > + char *endp;
> > +
> > + snprintf(path, sizeof(path),
> > + PATH_TO_CPU "cpu%u/cpufreq/is_amd_pstate_enabled", cpu);
> > +
> > + if (cpupower_read_sysfs(path, linebuf, MAX_LINE_LEN) == 0)
> > + return 0;
> > +
> > + val = strtoul(linebuf, &endp, 0);
> > + if (endp == linebuf || errno == ERANGE)
> > + return 0;
> > +
> > + return val;
> > +}
> > +
> > #endif /* #if defined(__i386__) || defined(__x86_64__) */
> >
> > /* get_cpustate
> >
>

2021-09-13 12:00:15

by Peter Zijlstra

Subject: Re: [PATCH 04/19] cpufreq: amd: introduce a new amd pstate driver to support future processors

On Mon, Sep 13, 2021 at 06:54:58PM +0800, Huang Rui wrote:
> On Mon, Sep 13, 2021 at 04:56:24PM +0800, Peter Zijlstra wrote:

> > > 1) Full MSR Support
> > > If current hardware has the full MSR support, we register "pstate_funcs"
> > > callback functions to implement the MSR operations to control the clocks.
> >
> > What's the WRMSR cost for those? I've not really kept track of the MSR
> > costs on AMD platforms, but on Intel it has (luckily) been coming down
> > quite a bit.
>
> Good to know this, I didn't have a chance to give a check. May I know how
> did you test this latency? But MSR is new hardware design for this
> solution, as designer mentioned, the WRMSR is low-latency register model is
> faster than ACPI AML code interpreter.
>
> >
> > > 2) Shared Memory Support
> > > If current hardware doesn't have the full MSR support, that means it only
> > > provides share memory support. We will leverage APIs in cppc_acpi libs with
> > > "cppc_funcs" to implement the target function for the frequency control.
> >
> > Right, the mailbox thing. How is the performance of this vs MSR accesses?
>
> I will give a check. If you have a existing test method that can be used, I
> can check it quickly.

Oh, I was mostly wondering if using the mailbox as MMIO would be faster
than an MSR, but you've already answered that above. Also:

> > > 1. As mentioned above, amd-pstate driver can implement
> > > fast_switch/adjust_perf function with full MSR operations that have better
> > > performance for schedutil and other governors.
> >
> > Why couldn't the existing cppc-cpufreq grow this?
>
> Because fast_switch can adjust the frequency directly in the interrupt
> context, if we use the acpi cppc handling with shared memory solution, it
> will have a deadlock. So fast switch needs the control with registers
> directly like acpi-cpufreq and intel-pstate.

Aah, I see, you're only doing fast_switch support when you have MSRs.
That was totally non-obvious.. :/

But then amd_pstate_adjust_perf() could just direct call the pstate
methods and we don't need that indirection *at*all*, right?

2021-09-13 12:54:02

by Huang Rui

Subject: Re: [PATCH 04/19] cpufreq: amd: introduce a new amd pstate driver to support future processors

On Fri, Sep 10, 2021 at 03:31:26AM +0800, Fontenot, Nathan wrote:
>
> On 9/8/2021 9:59 AM, Huang Rui wrote:
> > amd-pstate is the AMD CPU performance scaling driver that introduces a
> > new CPU frequency control mechanism on AMD Zen based CPU series in Linux
> > kernel. The new mechanism is based on Collaborative processor
> > performance control (CPPC) which is finer grain frequency management
> > than legacy ACPI hardware P-States. Current AMD CPU platforms are using
> > the ACPI P-states driver to manage CPU frequency and clocks with
> > switching only in 3 P-states. AMD P-States is to replace the ACPI
> > P-states controls, allows a flexible, low-latency interface for the
> > Linux kernel to directly communicate the performance hints to hardware.
> >
>
> This patch seems like it is just enabling CPPC on AMD and not a new mechanism
> based on CPPC. Can you clarify?
>
> Also, if this is just enabling CPPC, shouldn't the driver be named something
> like amd_cppc and not amd_pstate? This isn't using P-states.

That's just a name. We use "amd-pstate" to indicate the new driver that
uses the kernel governors such as schedutil and others for frequency
control, while "amd_cppc" would indicate the legacy solution that relies on
a userspace tool for frequency control.

>
> > "amd-pstate" leverages the Linux kernel governors such as *schedutil*,
> > *ondemand*, etc. to manage the performance hints which are provided by CPPC
> > hardware functionality. The first version for amd-pstate is to support one
> > of the Zen3 processors, and we will support more in future after we verify
> > the hardware and SBIOS functionalities.
> >
> > There are two types of hardware implementations for amd-pstate: one is full
> > MSR support and another is shared memory support. It can use
> > X86_FEATURE_AMD_CPPC_EXT feature flag to distinguish the different types.
> >
>
> Looking at the drivers/acpi code for CPPC I don't think this distinction
> between MSRs and shared memory requires a feature flag. Shouldn't this be
> handled properly in cpc_read|write if the ACPI tables are set up correctly?
> Please correct me if I'm wrong.

MSRs can be used to implement the fast switch function, which performs
better with schedutil and other governors.

>
> This would also remove the need for the additional indirection pointed
> out by Peter.
>
> Could you also provide an explanation as to why a new CPPC driver is need
> instead of updating the existing cppc_cpufreq driver.
>

Good question. I explained this in my last mail to Peter.

Thanks,
Ray

2021-09-13 13:06:27

by Huang Rui

Subject: Re: [PATCH 14/19] cpupower: initial AMD P-state capability

On Fri, Sep 10, 2021 at 06:16:50AM +0800, Shuah Khan wrote:
> On 9/8/21 8:59 AM, Huang Rui wrote:
> > If kernel enables AMD P-state, cpupower won't need to respond ACPI
> > hardware P-states function anymore.
> >
>
> This commit log doesn't seem to match the code change. I see it
> calling cpupower_amd_pstate_enabled() and setting flags.

Hmm, yes, I should reword this as well. If the kernel uses the amd-pstate
module, we only need the CPUPOWER_CAP_AMD_PSTATE flag, and we clear the
legacy acpi-cpufreq related flags.

>
> > Signed-off-by: Huang Rui <[email protected]>
> > ---
> > tools/power/cpupower/utils/helpers/cpuid.c | 13 +++++++++++++
> > 1 file changed, 13 insertions(+)
> >
> > diff --git a/tools/power/cpupower/utils/helpers/cpuid.c b/tools/power/cpupower/utils/helpers/cpuid.c
> > index 72eb43593180..78218c54acca 100644
> > --- a/tools/power/cpupower/utils/helpers/cpuid.c
> > +++ b/tools/power/cpupower/utils/helpers/cpuid.c
> > @@ -149,6 +149,19 @@ int get_cpu_info(struct cpupower_cpu_info *cpu_info)
> > if (ext_cpuid_level >= 0x80000008 &&
> > cpuid_ebx(0x80000008) & (1 << 4))
> > cpu_info->caps |= CPUPOWER_CAP_AMD_RDPRU;
> > +
> > + if (cpupower_amd_pstate_enabled(0)) {
>
> What is the reason for calling this function with cpu id = 0?

No specific reason. Any CPU id should be OK here.

Thanks,
Ray

>
>
> > + cpu_info->caps |= CPUPOWER_CAP_AMD_PSTATE;
> > +
> > + /*
> > + * If AMD P-state is enabled, the firmware will treat
> > + * AMD P-state function as high priority.
> > + */
> > + cpu_info->caps &= ~CPUPOWER_CAP_AMD_CPB;
> > + cpu_info->caps &= ~CPUPOWER_CAP_AMD_CPB_MSR;
> > + cpu_info->caps &= ~CPUPOWER_CAP_AMD_HW_PSTATE;
> > + cpu_info->caps &= ~CPUPOWER_CAP_AMD_PSTATEDEF;
> > + }
> > }
> >
> > if (cpu_info->vendor == X86_VENDOR_INTEL) {
> >
>
> thanks,
> -- Shuah
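
The capability handling discussed in this message (set the amd-pstate flag, clear the legacy acpi-cpufreq related flags) can be sketched as a pure function on a bitmask. The SK_CAP_* names and their bit positions are assumptions made only for this sketch; they mirror the CPUPOWER_CAP_* names in the quoted patch but not necessarily their real values.

```c
#include <stdint.h>

/* Hypothetical bit positions, chosen only for this sketch. */
#define SK_CAP_AMD_CPB        (1u << 0)
#define SK_CAP_AMD_CPB_MSR    (1u << 1)
#define SK_CAP_AMD_HW_PSTATE  (1u << 2)
#define SK_CAP_AMD_PSTATEDEF  (1u << 3)
#define SK_CAP_AMD_PSTATE     (1u << 4)
#define SK_CAP_AMD_RDPRU      (1u << 5)

/* All flags that describe the legacy ACPI hardware P-state path. */
#define SK_CAP_LEGACY_MASK (SK_CAP_AMD_CPB | SK_CAP_AMD_CPB_MSR | \
			    SK_CAP_AMD_HW_PSTATE | SK_CAP_AMD_PSTATEDEF)

/*
 * When amd-pstate is active, advertise it and drop the legacy flags;
 * unrelated flags (such as RDPRU here) are left untouched.
 */
static uint32_t sk_apply_amd_pstate(uint32_t caps, int amd_pstate_enabled)
{
	if (!amd_pstate_enabled)
		return caps;

	caps |= SK_CAP_AMD_PSTATE;
	caps &= ~SK_CAP_LEGACY_MASK;
	return caps;
}
```

Grouping the legacy flags in one mask keeps the set and clear steps next to each other, which is the relationship the reworded commit log is meant to describe.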

2021-09-13 13:12:01

by Borislav Petkov

Subject: Re: [PATCH 01/19] x86/cpufreatures: add AMD CPPC extension feature flag

On Mon, Sep 13, 2021 at 05:48:51PM +0800, Huang Rui wrote:
> This feature flag indicates the full MSR hardware solution of AMD
> P-States, if it is not set, that means we will go with in shared
> memory hardware solution. So we name this as extension.

Nobody cares whether it is an extension except you guys. Also, having
AMD_CPPC_EXT suggests there already is AMD_CPPC. But there isn't.

So call it X86_FEATURE_AMD_CPPC, please, for simplicity's sake.

Thx.

--
Regards/Gruss,
Boris.

https://people.kernel.org/tglx/notes-about-netiquette

2021-09-13 17:59:20

by Huang Rui

Subject: Re: [PATCH 04/19] cpufreq: amd: introduce a new amd pstate driver to support future processors

On Thu, Sep 09, 2021 at 11:03:54PM +0800, Peter Zijlstra wrote:
> On Wed, Sep 08, 2021 at 10:59:46PM +0800, Huang Rui wrote:
>
> > +static int pstate_init_perf(struct amd_cpudata *cpudata)
> > +{
> > + u64 cap1;
> > +
> > + int ret = rdmsrl_safe_on_cpu(cpudata->cpu, MSR_AMD_CPPC_CAP1,
> > + &cap1);
> > + if (ret)
> > + return ret;
> > +
> > + /* Some AMD processors has specific power features that the cppc entry
> > + * doesn't indicate the highest performance. It will introduce the
> > + * feature in following days.
> > + */
>
> Wrong comment style; also imagine reading this comment half a year from
> now...

How about using a "TODO" to indicate the next step here?

Thanks,
Ray

>
> > + WRITE_ONCE(cpudata->highest_perf, amd_get_highest_perf());
> > +
> > + WRITE_ONCE(cpudata->nominal_perf, CAP1_NOMINAL_PERF(cap1));
> > + WRITE_ONCE(cpudata->lowest_nonlinear_perf, CAP1_LOWNONLIN_PERF(cap1));
> > + WRITE_ONCE(cpudata->lowest_perf, CAP1_LOWEST_PERF(cap1));
> > +
> > + return 0;
> > +}
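
As an illustration of the two review points (comment style and a comment that still makes sense later), here is a userspace sketch of decoding the capability value with a TODO written in the usual multi-line kernel comment form. The field layout (four 8-bit fields in the low 32 bits) and all SK_* and sk_* names are assumptions for this sketch. Unlike the quoted patch, which takes the highest performance from amd_get_highest_perf(), the sketch simply reads the field.

```c
#include <stdint.h>

/* Assumed layout for this sketch: four 8-bit fields in the low 32 bits. */
#define SK_CAP1_LOWEST_PERF(x)    (((x) >> 0) & 0xff)
#define SK_CAP1_LOWNONLIN_PERF(x) (((x) >> 8) & 0xff)
#define SK_CAP1_NOMINAL_PERF(x)   (((x) >> 16) & 0xff)
#define SK_CAP1_HIGHEST_PERF(x)   (((x) >> 24) & 0xff)

struct sk_perf {
	uint8_t highest;
	uint8_t nominal;
	uint8_t lowest_nonlinear;
	uint8_t lowest;
};

static struct sk_perf sk_decode_cap1(uint64_t cap1)
{
	struct sk_perf p;

	/*
	 * TODO: on some processors this field does not report the real
	 * highest performance. The sketch uses the raw field until that
	 * case is handled.
	 */
	p.highest = SK_CAP1_HIGHEST_PERF(cap1);
	p.nominal = SK_CAP1_NOMINAL_PERF(cap1);
	p.lowest_nonlinear = SK_CAP1_LOWNONLIN_PERF(cap1);
	p.lowest = SK_CAP1_LOWEST_PERF(cap1);

	return p;
}
```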

2021-09-16 08:49:02

by Huang Rui

Subject: Re: [PATCH 15/19] cpupower: add amd-pstate sysfs entries into libcpufreq

On Fri, Sep 10, 2021 at 06:26:06AM +0800, Shuah Khan wrote:
> On 9/8/21 8:59 AM, Huang Rui wrote:
> > These amd-pstate sysfs entries will be used on cpupower for amd-pstate
> > kernel module.
> >
>
> This commit log doesn't make sense. If these sysfs entries are used
> for amd-pstate kernel module, why are they defined here.
>
> Describe how these are used and the relationship between these defines
> and the amd-pstate kernel module
>
> > Signed-off-by: Huang Rui <[email protected]>
> > ---
> > tools/power/cpupower/lib/cpufreq.c | 18 +++++++++++++++++-
> > 1 file changed, 17 insertions(+), 1 deletion(-)
> >
> > diff --git a/tools/power/cpupower/lib/cpufreq.c b/tools/power/cpupower/lib/cpufreq.c
> > index c3b56db8b921..3f92ddadaad2 100644
> > --- a/tools/power/cpupower/lib/cpufreq.c
> > +++ b/tools/power/cpupower/lib/cpufreq.c
> > @@ -69,6 +69,14 @@ enum cpufreq_value {
> > SCALING_MIN_FREQ,
> > SCALING_MAX_FREQ,
> > STATS_NUM_TRANSITIONS,
> > + AMD_PSTATE_HIGHEST_PERF,
> > + AMD_PSTATE_NOMINAL_PERF,
> > + AMD_PSTATE_LOWEST_NONLINEAR_PERF,
> > + AMD_PSTATE_LOWEST_PERF,
> > + AMD_PSTATE_MAX_FREQ,
> > + AMD_PSTATE_NOMINAL_FREQ,
> > + AMD_PSTATE_LOWEST_NONLINEAR_FREQ,
> > + AMD_PSTATE_MIN_FREQ,
> > MAX_CPUFREQ_VALUE_READ_FILES
> > };
> >
>
> These are AMD specific values being added to a common code.
>
> > @@ -80,7 +88,15 @@ static const char *cpufreq_value_files[MAX_CPUFREQ_VALUE_READ_FILES] = {
> > [SCALING_CUR_FREQ] = "scaling_cur_freq",
> > [SCALING_MIN_FREQ] = "scaling_min_freq",
> > [SCALING_MAX_FREQ] = "scaling_max_freq",
> > - [STATS_NUM_TRANSITIONS] = "stats/total_trans"
> > + [STATS_NUM_TRANSITIONS] = "stats/total_trans",
> > + [AMD_PSTATE_HIGHEST_PERF] = "amd_pstate_highest_perf",
> > + [AMD_PSTATE_NOMINAL_PERF] = "amd_pstate_nominal_perf",
> > + [AMD_PSTATE_LOWEST_NONLINEAR_PERF] = "amd_pstate_lowest_nonlinear_perf",
> > + [AMD_PSTATE_LOWEST_PERF] = "amd_pstate_lowest_perf",
> > + [AMD_PSTATE_MAX_FREQ] = "amd_pstate_max_freq",
> > + [AMD_PSTATE_NOMINAL_FREQ] = "amd_pstate_nominal_freq",
> > + [AMD_PSTATE_LOWEST_NONLINEAR_FREQ] = "amd_pstate_lowest_nonlinear_freq",
> > + [AMD_PSTATE_MIN_FREQ] = "amd_pstate_min_freq"
> > };
> >
> >
> >
>
> These are AMD specific values being added to a common code.
> It doesn't sound right. What happens if there is a conflict
> between AMD values and another vendor values?
>
> This doesn't seem a good place to add these.
>

Shuah, thanks for your suggestion. I went through the cpupower patches
again, and yes, we should not mix the AMD-specific and common code.

Could I expose a helper similar to sysfs_cpufreq_get_one_value in the
cpupower_intern.h header, and move the amd_pstate_* function implementations
into utils/helpers/amd.c? That would keep lib/cpufreq general.

Thanks,
Ray
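
The split proposed above can be sketched as follows: a generic "read one value from a file" helper that knows nothing about vendors, plus an AMD-only table and accessor that would live in the AMD helper file. The file names come from the quoted patch; the sk_* function names, the SK_* enum, and the idea of passing the directory as a parameter are assumptions made so the sketch can run against any directory.

```c
#include <stdio.h>
#include <stdlib.h>

/* Generic part: read one unsigned value from <dir>/<fname>; 0 on any failure. */
static unsigned long sk_read_one_value(const char *dir, const char *fname)
{
	char path[512];
	char buf[64];
	unsigned long val = 0;
	FILE *f;
	int n;

	n = snprintf(path, sizeof(path), "%s/%s", dir, fname);
	if (n < 0 || (size_t)n >= sizeof(path))
		return 0;

	f = fopen(path, "r");
	if (!f)
		return 0;
	if (fgets(buf, sizeof(buf), f))
		val = strtoul(buf, NULL, 10);
	fclose(f);

	return val;
}

/* AMD-specific part: the enum and file table stay out of the common code. */
enum sk_amd_pstate_value {
	SK_AMD_PSTATE_HIGHEST_PERF,
	SK_AMD_PSTATE_MAX_FREQ,
	SK_AMD_PSTATE_MIN_FREQ,
	SK_AMD_PSTATE_MAX_VALUE
};

static const char *sk_amd_pstate_files[SK_AMD_PSTATE_MAX_VALUE] = {
	[SK_AMD_PSTATE_HIGHEST_PERF] = "amd_pstate_highest_perf",
	[SK_AMD_PSTATE_MAX_FREQ] = "amd_pstate_max_freq",
	[SK_AMD_PSTATE_MIN_FREQ] = "amd_pstate_min_freq",
};

static unsigned long sk_amd_pstate_get_data(const char *dir,
					    enum sk_amd_pstate_value which)
{
	if (which >= SK_AMD_PSTATE_MAX_VALUE)
		return 0;
	return sk_read_one_value(dir, sk_amd_pstate_files[which]);
}
```

With this shape, another vendor could add its own table next to its own helpers without touching the generic reader, which addresses the concern about conflicting values in common code.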

2021-09-16 09:31:16

by Huang Rui

Subject: Re: [PATCH 18/19] cpupower: print amd-pstate information on cpupower

On Fri, Sep 10, 2021 at 06:46:39AM +0800, Shuah Khan wrote:
> On 9/8/21 9:00 AM, Huang Rui wrote:
> > amd-pstate kernel module is using the fine grain frequency instead of
> > acpi hardware pstate. So the performance and frequency values should be
> > printed in frequency-info.
> >
> > Signed-off-by: Huang Rui <[email protected]>
> > ---
> > tools/power/cpupower/utils/cpufreq-info.c | 27 ++++++++++++++++++++---
> > 1 file changed, 24 insertions(+), 3 deletions(-)
> >
> > diff --git a/tools/power/cpupower/utils/cpufreq-info.c b/tools/power/cpupower/utils/cpufreq-info.c
> > index f9895e31ff5a..9eabed209adc 100644
> > --- a/tools/power/cpupower/utils/cpufreq-info.c
> > +++ b/tools/power/cpupower/utils/cpufreq-info.c
> > @@ -183,9 +183,30 @@ static int get_boost_mode_x86(unsigned int cpu)
> > printf(_(" Supported: %s\n"), support ? _("yes") : _("no"));
> > printf(_(" Active: %s\n"), active ? _("yes") : _("no"));
> >
> > - if ((cpupower_cpu_info.vendor == X86_VENDOR_AMD &&
> > - cpupower_cpu_info.family >= 0x10) ||
> > - cpupower_cpu_info.vendor == X86_VENDOR_HYGON) {
> > + if (cpupower_cpu_info.vendor == X86_VENDOR_AMD &&
> > + cpupower_cpu_info.caps & CPUPOWER_CAP_AMD_PSTATE) {
> > + printf(_(" AMD PSTATE Highest Performance: %u. Maximum Frequency: "),
> > + amd_pstate_get_data(cpu, HIGHEST_PERF));
> > + print_speed(amd_pstate_get_data(cpu, MAX_FREQ));
> > + printf(".\n");
> > +
> > + printf(_(" AMD PSTATE Nominal Performance: %u. Nominal Frequency: "),
> > + amd_pstate_get_data(cpu, NOMINAL_PERF));
> > + print_speed(amd_pstate_get_data(cpu, NOMINAL_FREQ));
> > + printf(".\n");
> > +
> > + printf(_(" AMD PSTATE Lowest Non-linear Performance: %u. Lowest Non-linear Frequency: "),
> > + amd_pstate_get_data(cpu, LOWEST_NONLINEAR_PERF));
> > + print_speed(amd_pstate_get_data(cpu, LOWEST_NONLINEAR_FREQ));
> > + printf(".\n");
> > +
> > + printf(_(" AMD PSTATE Lowest Performance: %u. Lowest Frequency: "),
> > + amd_pstate_get_data(cpu, LOWEST_PERF));
> > + print_speed(amd_pstate_get_data(cpu, MIN_FREQ));
> > + printf(".\n");
> > + } else if ((cpupower_cpu_info.vendor == X86_VENDOR_AMD &&
> > + cpupower_cpu_info.family >= 0x10) ||
> > + cpupower_cpu_info.vendor == X86_VENDOR_HYGON) {
> > ret = decode_pstates(cpu, b_states, pstates, &pstate_no);
> > if (ret)
> > return ret;
> >
>
> Same issue here - amd specific code sprinkled all over the common routines.
> Needs fixing.

I will add a function in utils/helpers/amd.c that prints the amd_pstate status.

Thanks,
Ray
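
A rough sketch of such a helper: one function formats a performance/frequency pair, so the common cpufreq-info code only needs a single call per row. The function name, its parameters, and the plain kHz output are hypothetical; the quoted patch uses print_speed() for the frequency instead.

```c
#include <stdio.h>

/*
 * Formats one status row into buf, for example
 * "Highest Performance: 166. Maximum Frequency: 4000000 kHz."
 * Returns what snprintf() returns: the length the full row needs.
 */
static int sk_format_amd_pstate_row(char *buf, size_t len,
				    const char *perf_label, unsigned int perf,
				    const char *freq_label,
				    unsigned long freq_khz)
{
	return snprintf(buf, len, "%s Performance: %u. %s Frequency: %lu kHz.",
			perf_label, perf, freq_label, freq_khz);
}
```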

2021-09-16 09:31:24

by Huang Rui

Subject: Re: [PATCH 16/19] cpupower: enable boost state support for amd-pstate module

On Fri, Sep 10, 2021 at 06:42:42AM +0800, Shuah Khan wrote:
> On 9/8/21 8:59 AM, Huang Rui wrote:
> > The AMD P-state boost API is different from ACPI hardware P-states, so
> > implement the support for amd-pstate kernel module.
> >
>
> This commit log doesn't make sense. If these sysfs entries are used
> for amd-pstate kernel module, why are they defined here.
>
> Describe how these are used and the relationship between these defines
> and the amd-pstate kernel module

Will refine the commit log in V2.

>
> > Signed-off-by: Huang Rui <[email protected]>
> > ---
> > tools/power/cpupower/lib/cpufreq.c | 20 ++++++++++++++++++++
> > tools/power/cpupower/lib/cpufreq.h | 3 +++
> > tools/power/cpupower/utils/helpers/misc.c | 7 +++++++
> > 3 files changed, 30 insertions(+)
> >
> > diff --git a/tools/power/cpupower/lib/cpufreq.c b/tools/power/cpupower/lib/cpufreq.c
> > index 3f92ddadaad2..37da87bdcfb1 100644
> > --- a/tools/power/cpupower/lib/cpufreq.c
> > +++ b/tools/power/cpupower/lib/cpufreq.c
> > @@ -790,3 +790,23 @@ unsigned long cpufreq_get_transitions(unsigned int cpu)
> > {
> > return sysfs_cpufreq_get_one_value(cpu, STATS_NUM_TRANSITIONS);
> > }
> > +
> > +int amd_pstate_boost_support(unsigned int cpu)
> > +{
> > + unsigned int highest_perf, nominal_perf;
> > +
> > + highest_perf = sysfs_cpufreq_get_one_value(cpu, AMD_PSTATE_HIGHEST_PERF);
> > + nominal_perf = sysfs_cpufreq_get_one_value(cpu, AMD_PSTATE_NOMINAL_PERF);
> > +
> > + return highest_perf > nominal_perf ? 1 : 0;
> > +}
> > +
> > +int amd_pstate_boost_enabled(unsigned int cpu)
> > +{
> > + unsigned int cpuinfo_max, amd_pstate_max;
> > +
> > + cpuinfo_max = sysfs_cpufreq_get_one_value(cpu, CPUINFO_MAX_FREQ);
> > + amd_pstate_max = sysfs_cpufreq_get_one_value(cpu, AMD_PSTATE_MAX_FREQ);
> > +
> > + return cpuinfo_max == amd_pstate_max ? 1 : 0;
> > +}
>
> Why are these amd specific routines added to common file.
> Why not add them to tools/power/cpupower/utils/helpers/amd.c?

You're right. As mentioned in my last mail, I will move all the amd_pstate*
functions from lib/cpufreq.c to utils/helpers/amd.c.

>
> > diff --git a/tools/power/cpupower/lib/cpufreq.h b/tools/power/cpupower/lib/cpufreq.h
> > index 95f4fd9e2656..d54d02a7a4f4 100644
> > --- a/tools/power/cpupower/lib/cpufreq.h
> > +++ b/tools/power/cpupower/lib/cpufreq.h
> > @@ -203,6 +203,9 @@ int cpufreq_modify_policy_governor(unsigned int cpu, char *governor);
> > int cpufreq_set_frequency(unsigned int cpu,
> > unsigned long target_frequency);
> >
> > +int amd_pstate_boost_support(unsigned int cpu);
> > +int amd_pstate_boost_enabled(unsigned int cpu);
> > +
> > #ifdef __cplusplus
> > }
> > #endif
> > diff --git a/tools/power/cpupower/utils/helpers/misc.c b/tools/power/cpupower/utils/helpers/misc.c
> > index 07d80775fb68..aba979320760 100644
> > --- a/tools/power/cpupower/utils/helpers/misc.c
> > +++ b/tools/power/cpupower/utils/helpers/misc.c
> > @@ -10,6 +10,7 @@
> > #if defined(__i386__) || defined(__x86_64__)
> >
> > #include "cpupower_intern.h"
> > +#include "cpufreq.h"
> >
> > #define MSR_AMD_HWCR 0xc0010015
> >
> > @@ -39,6 +40,12 @@ int cpufreq_has_boost_support(unsigned int cpu, int *support, int *active,
> >
>
> This logic here calls amd_pci_get_num_boost_states() ---
> There is another routine called decode_pstates() in
> tools/power/cpupower/utils/helpers/amd.c
>

decode_pstates() is for the legacy ACPI hardware P-states (AMD only has 3),
but the new amd_pstate functions support a finer-grained frequency range.
It's a different hardware design, so there is no P-state number anymore.

> > if (ret)
> > return ret;
> > }
> > + } if ((cpupower_cpu_info.caps & CPUPOWER_CAP_AMD_PSTATE) &&
> > + amd_pstate_boost_support(cpu)) {
>
> Coupled with the above routines, the naming amd_pstate_boost_support()
> is rather confusing.
>
> Also why is this amd_pstate_boost_support() added to
> > + *support = 1;
> > +
> > + if (amd_pstate_boost_enabled(cpu))
> > + *active = 1;

OK, yes, these can be merged into one function here. Will update this in V2.

If boost is not enabled, the maximum perf will be the normal perf
(similar to P0 before).

Thanks,
Ray
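
One possible shape for the merged function, using the same two comparisons as the quoted patch: boost is supported when highest_perf exceeds nominal_perf, and it is active when cpuinfo_max_freq equals the amd-pstate maximum frequency. The struct, the function name, and passing the values in (instead of reading sysfs) are assumptions made to keep the sketch self-contained.

```c
/* Values the real code would read from sysfs; passed in for this sketch. */
struct sk_amd_pstate_vals {
	unsigned int highest_perf;
	unsigned int nominal_perf;
	unsigned long cpuinfo_max_khz;
	unsigned long amd_pstate_max_khz;
};

/*
 * Fills both outputs in one call:
 *   *support = 1 when the highest perf is above the nominal perf,
 *   *active  = 1 when boost is supported and the policy maximum equals
 *              the amd-pstate maximum frequency.
 */
static void sk_amd_pstate_boost_init(const struct sk_amd_pstate_vals *v,
				     int *support, int *active)
{
	*support = v->highest_perf > v->nominal_perf;
	*active = *support && v->cpuinfo_max_khz == v->amd_pstate_max_khz;
}
```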

2021-09-16 09:33:36

by Huang Rui

Subject: Re: [PATCH 01/19] x86/cpufreatures: add AMD CPPC extension feature flag

On Mon, Sep 13, 2021 at 09:04:42PM +0800, Borislav Petkov wrote:
> On Mon, Sep 13, 2021 at 05:48:51PM +0800, Huang Rui wrote:
> > This feature flag indicates the full MSR hardware solution of AMD
> > P-States, if it is not set, that means we will go with in shared
> > memory hardware solution. So we name this as extension.
>
> Nobody cares whether it is an extension except you guys. Also, having
> AMD_CPPC_EXT suggests there already is AMD_CPPC. But there isn't.
>
> So call it X86_FEATURE_AMD_CPPC, please, for simplicity's sake.
>

OK, no problem. I will update this in V2.

Thanks,
Ray

2021-09-16 10:11:39

by Huang Rui

Subject: Re: [PATCH 04/19] cpufreq: amd: introduce a new amd pstate driver to support future processors

On Mon, Sep 13, 2021 at 07:56:32PM +0800, Peter Zijlstra wrote:
> On Mon, Sep 13, 2021 at 06:54:58PM +0800, Huang Rui wrote:
> > On Mon, Sep 13, 2021 at 04:56:24PM +0800, Peter Zijlstra wrote:
>
> > > > 1) Full MSR Support
> > > > If current hardware has the full MSR support, we register "pstate_funcs"
> > > > callback functions to implement the MSR operations to control the clocks.
> > >
> > > What's the WRMSR cost for those? I've not really kept track of the MSR
> > > costs on AMD platforms, but on Intel it has (luckily) been coming down
> > > quite a bit.
> >
> > Good to know this, I didn't have a chance to give a check. May I know how
> > did you test this latency? But MSR is new hardware design for this
> > solution, as designer mentioned, the WRMSR is low-latency register model is
> > faster than ACPI AML code interpreter.
> >
> > >
> > > > 2) Shared Memory Support
> > > > If current hardware doesn't have the full MSR support, that means it only
> > > > provides share memory support. We will leverage APIs in cppc_acpi libs with
> > > > "cppc_funcs" to implement the target function for the frequency control.
> > >
> > > Right, the mailbox thing. How is the performance of this vs MSR accesses?
> >
> > I will give a check. If you have a existing test method that can be used, I
> > can check it quickly.
>
> Oh, I was mostly wondering if using the mailbox as MMIO would be faster
> than an MSR, but you've already answered that above. Also:
>
> > > > 1. As mentioned above, amd-pstate driver can implement
> > > > fast_switch/adjust_perf function with full MSR operations that have better
> > > > performance for schedutil and other governors.
> > >
> > > Why couldn't the existing cppc-cpufreq grow this?
> >
> > Because fast_switch can adjust the frequency directly in the interrupt
> > context, if we use the acpi cppc handling with shared memory solution, it
> > will have a deadlock. So fast switch needs the control with registers
> > directly like acpi-cpufreq and intel-pstate.
>
> Aah, I see, you're only doing fast_switch support when you have MSRs.
> That was totally non-obvious.. :/

Yes, I should have written a comment there. :-)
Will update this in V2.

>
> But then amd_pstate_adjust_perf() could just direct call the pstate
> methods and we don't need that indirection *at*all*, right?

Hmm, yes, if we use amd_pstate_adjust_perf here, we won't need to call
amd_pstate_fast_switch. I saw that intel_pstate has both adjust_perf and
fast_switch; would you mind letting me know how to distinguish these
two use cases on Intel processors?

Thanks,
Ray

2021-09-16 11:23:40

by Rafael J. Wysocki

Subject: Re: [PATCH 04/19] cpufreq: amd: introduce a new amd pstate driver to support future processors

On Thu, Sep 16, 2021 at 12:09 PM Huang Rui <[email protected]> wrote:
>
> On Mon, Sep 13, 2021 at 07:56:32PM +0800, Peter Zijlstra wrote:
> > On Mon, Sep 13, 2021 at 06:54:58PM +0800, Huang Rui wrote:
> > > On Mon, Sep 13, 2021 at 04:56:24PM +0800, Peter Zijlstra wrote:
> >
> > > > > 1) Full MSR Support
> > > > > If current hardware has the full MSR support, we register "pstate_funcs"
> > > > > callback functions to implement the MSR operations to control the clocks.
> > > >
> > > > What's the WRMSR cost for those? I've not really kept track of the MSR
> > > > costs on AMD platforms, but on Intel it has (luckily) been coming down
> > > > quite a bit.
> > >
> > > Good to know this, I didn't have a chance to give a check. May I know how
> > > did you test this latency? But MSR is new hardware design for this
> > > solution, as designer mentioned, the WRMSR is low-latency register model is
> > > faster than ACPI AML code interpreter.
> > >
> > > >
> > > > > 2) Shared Memory Support
> > > > > If current hardware doesn't have the full MSR support, that means it only
> > > > > provides share memory support. We will leverage APIs in cppc_acpi libs with
> > > > > "cppc_funcs" to implement the target function for the frequency control.
> > > >
> > > > Right, the mailbox thing. How is the performance of this vs MSR accesses?
> > >
> > > I will give a check. If you have a existing test method that can be used, I
> > > can check it quickly.
> >
> > Oh, I was mostly wondering if using the mailbox as MMIO would be faster
> > than an MSR, but you've already answered that above. Also:
> >
> > > > > 1. As mentioned above, amd-pstate driver can implement
> > > > > fast_switch/adjust_perf function with full MSR operations that have better
> > > > > performance for schedutil and other governors.
> > > >
> > > > Why couldn't the existing cppc-cpufreq grow this?
> > >
> > > Because fast_switch can adjust the frequency directly in the interrupt
> > > context, if we use the acpi cppc handling with shared memory solution, it
> > > will have a deadlock. So fast switch needs the control with registers
> > > directly like acpi-cpufreq and intel-pstate.
> >
> > Aah, I see, you're only doing fast_switch support when you have MSRs.
> > That was totally non-obvious.. :/
>
> Yes, I should have written a comment to there. :-)
> Will update this in V2.
>
> >
> > But then amd_pstate_adjust_perf() could just direct call the pstate
> > methods and we don't need that indirection *at*all*, right?
>
> Hmm, yes, if we use amd_pstate_adjust_perf here, we won't need to call
> amd_pstate_fast_switch. I saw intel_pstate had adjust_perf and fast_switch
> at the same time, would you mind to let me know how to distinguish these
> two use scenario on intel processors?

The ->fast_switch() callback is for the use cases in which
->adjust_perf() cannot be installed, that is basically systems without
HWP enabled.
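
The relationship described above can be sketched as a small selection routine: prefer ->adjust_perf() when the driver installs it, fall back to ->fast_switch() when it does not, and use the regular (slow) path otherwise. The struct, the simplified callback signatures, and all sk_* names are stand-ins for this sketch and are not the kernel's actual definitions.

```c
#include <stddef.h>

/* Simplified stand-in for a cpufreq driver's optional fast callbacks. */
struct sk_driver {
	void (*adjust_perf)(unsigned int cpu, unsigned long min_perf,
			    unsigned long target_perf, unsigned long capacity);
	unsigned int (*fast_switch)(unsigned int cpu, unsigned int target_freq);
};

enum sk_path {
	SK_PATH_ADJUST_PERF, /* performance-level interface is installed */
	SK_PATH_FAST_SWITCH, /* frequency-based fast path only */
	SK_PATH_SLOW         /* neither callback is available */
};

/* ->fast_switch() is only considered when ->adjust_perf() is not installed. */
static enum sk_path sk_pick_path(const struct sk_driver *drv)
{
	if (drv->adjust_perf)
		return SK_PATH_ADJUST_PERF;
	if (drv->fast_switch)
		return SK_PATH_FAST_SWITCH;
	return SK_PATH_SLOW;
}

/* Example callbacks that do nothing; they exist only to populate a driver. */
static void sk_noop_adjust_perf(unsigned int cpu, unsigned long min_perf,
				unsigned long target_perf,
				unsigned long capacity)
{
	(void)cpu;
	(void)min_perf;
	(void)target_perf;
	(void)capacity;
}

static unsigned int sk_noop_fast_switch(unsigned int cpu,
					unsigned int target_freq)
{
	(void)cpu;
	return target_freq;
}
```

A driver may therefore provide both callbacks, as intel_pstate is said to do in this thread, and only one of them is used on a given system.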

2021-09-17 16:58:21

by Huang Rui

Subject: Re: [PATCH 04/19] cpufreq: amd: introduce a new amd pstate driver to support future processors

On Thu, Sep 16, 2021 at 07:19:21PM +0800, Rafael J. Wysocki wrote:
> On Thu, Sep 16, 2021 at 12:09 PM Huang Rui <[email protected]> wrote:
> >
> > On Mon, Sep 13, 2021 at 07:56:32PM +0800, Peter Zijlstra wrote:
> > > On Mon, Sep 13, 2021 at 06:54:58PM +0800, Huang Rui wrote:
> > > > On Mon, Sep 13, 2021 at 04:56:24PM +0800, Peter Zijlstra wrote:
> > >
> > > > > > 1) Full MSR Support
> > > > > > If current hardware has the full MSR support, we register "pstate_funcs"
> > > > > > callback functions to implement the MSR operations to control the clocks.
> > > > >
> > > > > What's the WRMSR cost for those? I've not really kept track of the MSR
> > > > > costs on AMD platforms, but on Intel it has (luckily) been coming down
> > > > > quite a bit.
> > > >
> > > > Good to know this, I didn't have a chance to give a check. May I know how
> > > > did you test this latency? But MSR is new hardware design for this
> > > > solution, as designer mentioned, the WRMSR is low-latency register model is
> > > > faster than ACPI AML code interpreter.
> > > >
> > > > >
> > > > > > 2) Shared Memory Support
> > > > > > If current hardware doesn't have the full MSR support, that means it only
> > > > > > provides share memory support. We will leverage APIs in cppc_acpi libs with
> > > > > > "cppc_funcs" to implement the target function for the frequency control.
> > > > >
> > > > > Right, the mailbox thing. How is the performance of this vs MSR accesses?
> > > >
> > > > I will give a check. If you have a existing test method that can be used, I
> > > > can check it quickly.
> > >
> > > Oh, I was mostly wondering if using the mailbox as MMIO would be faster
> > > than an MSR, but you've already answered that above. Also:
> > >
> > > > > > 1. As mentioned above, amd-pstate driver can implement
> > > > > > fast_switch/adjust_perf function with full MSR operations that have better
> > > > > > performance for schedutil and other governors.
> > > > >
> > > > > Why couldn't the existing cppc-cpufreq grow this?
> > > >
> > > > Because fast_switch can adjust the frequency directly in the interrupt
> > > > context, if we use the acpi cppc handling with shared memory solution, it
> > > > will have a deadlock. So fast switch needs the control with registers
> > > > directly like acpi-cpufreq and intel-pstate.
> > >
> > > Aah, I see, you're only doing fast_switch support when you have MSRs.
> > > That was totally non-obvious.. :/
> >
> > Yes, I should have written a comment to there. :-)
> > Will update this in V2.
> >
> > >
> > > But then amd_pstate_adjust_perf() could just direct call the pstate
> > > methods and we don't need that indirection *at*all*, right?
> >
> > Hmm, yes, if we use amd_pstate_adjust_perf here, we won't need to call
> > amd_pstate_fast_switch. I saw intel_pstate had adjust_perf and fast_switch
> > at the same time, would you mind to let me know how to distinguish these
> > two use scenario on intel processors?
>
> The ->fast_switch() callback is for the use cases in which
> ->adjust_perf() cannot be installed, that is basically systems without
> HWP enabled.

OK, I see. Thanks for clarifying this.

Thanks,
Ray