Hi All,
The HWP calibration in intel_pstate is needed to map HWP performance levels to
frequencies, which are used in the cpufreq sysfs interface, in a reliable way.
On all non-hybrid "core" platforms it is sufficient to multiply the HWP
performance levels by 100000 to obtain the corresponding frequencies, but on
hybrid ones there is a difference between P-cores and E-cores.
Previous attempts to make this work were based on using CPPC (and in particular
the nominal performance values provided by _CPC), but it turns out that the
CPPC information is not sufficiently reliable for this purpose and the only
way to do it is to use a hard-coded scaling factors for P-cores and for E-cores
(which fortunately is the same as in the non-hybrid case). Fortunately, the
same scaling factor for P-cores works on all of the hybrid platforms to date.
The first patch in the series ensures that all of the CPUs will use correct
information from MSRs by avoiding the situations in which an MSR values read
on one CPU will be used for performance scaling of another CPU.
The second one implements the approach outlined above.
Please see the changelogs for details.
Thanks!
On Mon, 2022-10-24 at 21:18 +0200, Rafael J. Wysocki wrote:
> Hi All,
>
> The HWP calibration in intel_pstate is needed to map HWP performance
> levels to
> frequencies, which are used in the cpufreq sysfs interface, in a
> reliable way.
> On all non-hybrid "core" platforms it is sufficient to multiply the
> HWP
> performance levels by 100000 to obtain the corresponding frequencies,
> but on
> hybrid ones there is a difference between P-cores and E-cores.
>
> Previous attempts to make this work were based on using CPPC (and in
> particular
> the nominal performance values provided by _CPC), but it turns out
> that the
> CPPC information is not sufficiently reliable for this purpose and
> the only
> way to do it is to use a hard-coded scaling factors for P-cores and
> for E-cores
> (which fortunately is the same as in the non-hybrid case).
> Fortunately, the
> same scaling factor for P-cores works on all of the hybrid platforms
> to date.
>
> The first patch in the series ensures that all of the CPUs will use
> correct
> information from MSRs by avoiding the situations in which an MSR
> values read
> on one CPU will be used for performance scaling of another CPU.
>
> The second one implements the approach outlined above.
>
> Please see the changelogs for details.
Acked-by: Srinivas Pandruvada <[email protected]>
Tested-by: Srinivas Pandruvada <[email protected]>
>
> Thanks!
>
>
>
From: Rafael J. Wysocki <[email protected]>
Some of the MSR accesses in intel_pstate are carried out on the CPU
that is running the code, but the values coming from them are used
for the performance scaling of the other CPUs.
This is problematic, for example, on hybrid platforms where
MSR_TURBO_RATIO_LIMIT for P-cores and E-cores is different, so the
values read from it on a P-core are generally not applicable to E-cores
and the other way around.
For this reason, make the driver access all MSRs on the target CPU on
platforms using the "core" pstate_funcs callbacks which is the case for
all of the hybrid platforms released to date. For this purpose, pass
a CPU argument to the ->get_max(), ->get_max_physical(), ->get_min()
and ->get_turbo() pstate_funcs callbacks and from there pass it to
rdmsrl_on_cpu() or rdmsrl_safe_on_cpu() to access the MSR on the target
CPU.
Fixes: 46573fd6369f ("cpufreq: intel_pstate: hybrid: Rework HWP calibration")
Signed-off-by: Rafael J. Wysocki <[email protected]>
---
drivers/cpufreq/intel_pstate.c | 66 ++++++++++++++++++++---------------------
1 file changed, 33 insertions(+), 33 deletions(-)
Index: linux-pm/drivers/cpufreq/intel_pstate.c
===================================================================
--- linux-pm.orig/drivers/cpufreq/intel_pstate.c
+++ linux-pm/drivers/cpufreq/intel_pstate.c
@@ -280,10 +280,10 @@ static struct cpudata **all_cpu_data;
* structure is used to store those callbacks.
*/
struct pstate_funcs {
- int (*get_max)(void);
- int (*get_max_physical)(void);
- int (*get_min)(void);
- int (*get_turbo)(void);
+ int (*get_max)(int cpu);
+ int (*get_max_physical)(int cpu);
+ int (*get_min)(int cpu);
+ int (*get_turbo)(int cpu);
int (*get_scaling)(void);
int (*get_cpu_scaling)(int cpu);
int (*get_aperf_mperf_shift)(void);
@@ -531,12 +531,12 @@ static void intel_pstate_hybrid_hwp_adju
{
int perf_ctl_max_phys = cpu->pstate.max_pstate_physical;
int perf_ctl_scaling = cpu->pstate.perf_ctl_scaling;
- int perf_ctl_turbo = pstate_funcs.get_turbo();
+ int perf_ctl_turbo = pstate_funcs.get_turbo(cpu->cpu);
int turbo_freq = perf_ctl_turbo * perf_ctl_scaling;
int scaling = cpu->pstate.scaling;
pr_debug("CPU%d: perf_ctl_max_phys = %d\n", cpu->cpu, perf_ctl_max_phys);
- pr_debug("CPU%d: perf_ctl_max = %d\n", cpu->cpu, pstate_funcs.get_max());
+ pr_debug("CPU%d: perf_ctl_max = %d\n", cpu->cpu, pstate_funcs.get_max(cpu->cpu));
pr_debug("CPU%d: perf_ctl_turbo = %d\n", cpu->cpu, perf_ctl_turbo);
pr_debug("CPU%d: perf_ctl_scaling = %d\n", cpu->cpu, perf_ctl_scaling);
pr_debug("CPU%d: HWP_CAP guaranteed = %d\n", cpu->cpu, cpu->pstate.max_pstate);
@@ -1740,7 +1740,7 @@ static void intel_pstate_hwp_enable(stru
intel_pstate_update_epp_defaults(cpudata);
}
-static int atom_get_min_pstate(void)
+static int atom_get_min_pstate(int not_used)
{
u64 value;
@@ -1748,7 +1748,7 @@ static int atom_get_min_pstate(void)
return (value >> 8) & 0x7F;
}
-static int atom_get_max_pstate(void)
+static int atom_get_max_pstate(int not_used)
{
u64 value;
@@ -1756,7 +1756,7 @@ static int atom_get_max_pstate(void)
return (value >> 16) & 0x7F;
}
-static int atom_get_turbo_pstate(void)
+static int atom_get_turbo_pstate(int not_used)
{
u64 value;
@@ -1834,23 +1834,23 @@ static void atom_get_vid(struct cpudata
cpudata->vid.turbo = value & 0x7f;
}
-static int core_get_min_pstate(void)
+static int core_get_min_pstate(int cpu)
{
u64 value;
- rdmsrl(MSR_PLATFORM_INFO, value);
+ rdmsrl_on_cpu(cpu, MSR_PLATFORM_INFO, &value);
return (value >> 40) & 0xFF;
}
-static int core_get_max_pstate_physical(void)
+static int core_get_max_pstate_physical(int cpu)
{
u64 value;
- rdmsrl(MSR_PLATFORM_INFO, value);
+ rdmsrl_on_cpu(cpu, MSR_PLATFORM_INFO, &value);
return (value >> 8) & 0xFF;
}
-static int core_get_tdp_ratio(u64 plat_info)
+static int core_get_tdp_ratio(int cpu, u64 plat_info)
{
/* Check how many TDP levels present */
if (plat_info & 0x600000000) {
@@ -1860,13 +1860,13 @@ static int core_get_tdp_ratio(u64 plat_i
int err;
/* Get the TDP level (0, 1, 2) to get ratios */
- err = rdmsrl_safe(MSR_CONFIG_TDP_CONTROL, &tdp_ctrl);
+ err = rdmsrl_safe_on_cpu(cpu, MSR_CONFIG_TDP_CONTROL, &tdp_ctrl);
if (err)
return err;
/* TDP MSR are continuous starting at 0x648 */
tdp_msr = MSR_CONFIG_TDP_NOMINAL + (tdp_ctrl & 0x03);
- err = rdmsrl_safe(tdp_msr, &tdp_ratio);
+ err = rdmsrl_safe_on_cpu(cpu, tdp_msr, &tdp_ratio);
if (err)
return err;
@@ -1883,7 +1883,7 @@ static int core_get_tdp_ratio(u64 plat_i
return -ENXIO;
}
-static int core_get_max_pstate(void)
+static int core_get_max_pstate(int cpu)
{
u64 tar;
u64 plat_info;
@@ -1891,10 +1891,10 @@ static int core_get_max_pstate(void)
int tdp_ratio;
int err;
- rdmsrl(MSR_PLATFORM_INFO, plat_info);
+ rdmsrl_on_cpu(cpu, MSR_PLATFORM_INFO, &plat_info);
max_pstate = (plat_info >> 8) & 0xFF;
- tdp_ratio = core_get_tdp_ratio(plat_info);
+ tdp_ratio = core_get_tdp_ratio(cpu, plat_info);
if (tdp_ratio <= 0)
return max_pstate;
@@ -1903,7 +1903,7 @@ static int core_get_max_pstate(void)
return tdp_ratio;
}
- err = rdmsrl_safe(MSR_TURBO_ACTIVATION_RATIO, &tar);
+ err = rdmsrl_safe_on_cpu(cpu, MSR_TURBO_ACTIVATION_RATIO, &tar);
if (!err) {
int tar_levels;
@@ -1918,13 +1918,13 @@ static int core_get_max_pstate(void)
return max_pstate;
}
-static int core_get_turbo_pstate(void)
+static int core_get_turbo_pstate(int cpu)
{
u64 value;
int nont, ret;
- rdmsrl(MSR_TURBO_RATIO_LIMIT, value);
- nont = core_get_max_pstate();
+ rdmsrl_on_cpu(cpu, MSR_TURBO_RATIO_LIMIT, &value);
+ nont = core_get_max_pstate(cpu);
ret = (value) & 255;
if (ret <= nont)
ret = nont;
@@ -1952,13 +1952,13 @@ static int knl_get_aperf_mperf_shift(voi
return 10;
}
-static int knl_get_turbo_pstate(void)
+static int knl_get_turbo_pstate(int cpu)
{
u64 value;
int nont, ret;
- rdmsrl(MSR_TURBO_RATIO_LIMIT, value);
- nont = core_get_max_pstate();
+ rdmsrl_on_cpu(cpu, MSR_TURBO_RATIO_LIMIT, &value);
+ nont = core_get_max_pstate(cpu);
ret = (((value) >> 8) & 0xFF);
if (ret <= nont)
ret = nont;
@@ -2025,10 +2025,10 @@ static void intel_pstate_max_within_limi
static void intel_pstate_get_cpu_pstates(struct cpudata *cpu)
{
- int perf_ctl_max_phys = pstate_funcs.get_max_physical();
+ int perf_ctl_max_phys = pstate_funcs.get_max_physical(cpu->cpu);
int perf_ctl_scaling = pstate_funcs.get_scaling();
- cpu->pstate.min_pstate = pstate_funcs.get_min();
+ cpu->pstate.min_pstate = pstate_funcs.get_min(cpu->cpu);
cpu->pstate.max_pstate_physical = perf_ctl_max_phys;
cpu->pstate.perf_ctl_scaling = perf_ctl_scaling;
@@ -2044,8 +2044,8 @@ static void intel_pstate_get_cpu_pstates
}
} else {
cpu->pstate.scaling = perf_ctl_scaling;
- cpu->pstate.max_pstate = pstate_funcs.get_max();
- cpu->pstate.turbo_pstate = pstate_funcs.get_turbo();
+ cpu->pstate.max_pstate = pstate_funcs.get_max(cpu->cpu);
+ cpu->pstate.turbo_pstate = pstate_funcs.get_turbo(cpu->cpu);
}
if (cpu->pstate.scaling == perf_ctl_scaling) {
@@ -3221,9 +3221,9 @@ static unsigned int force_load __initdat
static int __init intel_pstate_msrs_not_valid(void)
{
- if (!pstate_funcs.get_max() ||
- !pstate_funcs.get_min() ||
- !pstate_funcs.get_turbo())
+ if (!pstate_funcs.get_max(0) ||
+ !pstate_funcs.get_min(0) ||
+ !pstate_funcs.get_turbo(0))
return -ENODEV;
return 0;
From: Rafael J. Wysocki <[email protected]>
Commit 46573fd6369f ("cpufreq: intel_pstate: hybrid: Rework HWP
calibration") attempted to use the information from CPPC (the nominal
performance in particular) to obtain the scaling factor allowing the
frequency to be computed if the HWP performance level of the given CPU
is known or vice versa.
However, it turns out that on some platforms this doesn't work, because
the CPPC information on them does not align with the contents of the
MSR_HWP_CAPABILITIES registers.
This basically means that the only way to make intel_pstate work on all
of the hybrid platforms to date is to use the observation that on all
of them the scaling factor between the HWP performance levels and
frequency for P-cores is 78741 (approximately 100000/1.27). For
E-cores it is 100000, which is the same as for all of the non-hybrid
"core" platforms and does not require any changes.
Accordingly, make intel_pstate use 78741 as the scaling factor between
HWP performance levels and frequency for P-cores on all hybrid platforms
and drop the dependency of the HWP calibration code on CPPC.
Fixes: 46573fd6369f ("cpufreq: intel_pstate: hybrid: Rework HWP calibration")
Reported-by: Srinivas Pandruvada <[email protected]>
Signed-off-by: Rafael J. Wysocki <[email protected]>
---
drivers/cpufreq/intel_pstate.c | 69 ++++++++---------------------------------
1 file changed, 15 insertions(+), 54 deletions(-)
Index: linux-pm/drivers/cpufreq/intel_pstate.c
===================================================================
--- linux-pm.orig/drivers/cpufreq/intel_pstate.c
+++ linux-pm/drivers/cpufreq/intel_pstate.c
@@ -27,6 +27,7 @@
#include <linux/pm_qos.h>
#include <trace/events/power.h>
+#include <asm/cpu.h>
#include <asm/div64.h>
#include <asm/msr.h>
#include <asm/cpu_device_id.h>
@@ -398,16 +399,6 @@ static int intel_pstate_get_cppc_guarant
return cppc_perf.nominal_perf;
}
-
-static u32 intel_pstate_cppc_nominal(int cpu)
-{
- u64 nominal_perf;
-
- if (cppc_get_nominal_perf(cpu, &nominal_perf))
- return 0;
-
- return nominal_perf;
-}
#else /* CONFIG_ACPI_CPPC_LIB */
static inline void intel_pstate_set_itmt_prio(int cpu)
{
@@ -532,34 +523,17 @@ static void intel_pstate_hybrid_hwp_adju
int perf_ctl_max_phys = cpu->pstate.max_pstate_physical;
int perf_ctl_scaling = cpu->pstate.perf_ctl_scaling;
int perf_ctl_turbo = pstate_funcs.get_turbo(cpu->cpu);
- int turbo_freq = perf_ctl_turbo * perf_ctl_scaling;
int scaling = cpu->pstate.scaling;
pr_debug("CPU%d: perf_ctl_max_phys = %d\n", cpu->cpu, perf_ctl_max_phys);
- pr_debug("CPU%d: perf_ctl_max = %d\n", cpu->cpu, pstate_funcs.get_max(cpu->cpu));
pr_debug("CPU%d: perf_ctl_turbo = %d\n", cpu->cpu, perf_ctl_turbo);
pr_debug("CPU%d: perf_ctl_scaling = %d\n", cpu->cpu, perf_ctl_scaling);
pr_debug("CPU%d: HWP_CAP guaranteed = %d\n", cpu->cpu, cpu->pstate.max_pstate);
pr_debug("CPU%d: HWP_CAP highest = %d\n", cpu->cpu, cpu->pstate.turbo_pstate);
pr_debug("CPU%d: HWP-to-frequency scaling factor: %d\n", cpu->cpu, scaling);
- /*
- * If the product of the HWP performance scaling factor and the HWP_CAP
- * highest performance is greater than the maximum turbo frequency
- * corresponding to the pstate_funcs.get_turbo() return value, the
- * scaling factor is too high, so recompute it to make the HWP_CAP
- * highest performance correspond to the maximum turbo frequency.
- */
- cpu->pstate.turbo_freq = cpu->pstate.turbo_pstate * scaling;
- if (turbo_freq < cpu->pstate.turbo_freq) {
- cpu->pstate.turbo_freq = turbo_freq;
- scaling = DIV_ROUND_UP(turbo_freq, cpu->pstate.turbo_pstate);
- cpu->pstate.scaling = scaling;
-
- pr_debug("CPU%d: refined HWP-to-frequency scaling factor: %d\n",
- cpu->cpu, scaling);
- }
-
+ cpu->pstate.turbo_freq = rounddown(cpu->pstate.turbo_pstate * scaling,
+ perf_ctl_scaling);
cpu->pstate.max_freq = rounddown(cpu->pstate.max_pstate * scaling,
perf_ctl_scaling);
@@ -1965,37 +1939,24 @@ static int knl_get_turbo_pstate(int cpu)
return ret;
}
-#ifdef CONFIG_ACPI_CPPC_LIB
-static u32 hybrid_ref_perf;
-
-static int hybrid_get_cpu_scaling(int cpu)
+static void hybrid_get_type(void *data)
{
- return DIV_ROUND_UP(core_get_scaling() * hybrid_ref_perf,
- intel_pstate_cppc_nominal(cpu));
+ u8 *cpu_type = data;
+
+ *cpu_type = get_this_hybrid_cpu_type();
}
-static void intel_pstate_cppc_set_cpu_scaling(void)
+static int hybrid_get_cpu_scaling(int cpu)
{
- u32 min_nominal_perf = U32_MAX;
- int cpu;
+ u8 cpu_type = 0;
- for_each_present_cpu(cpu) {
- u32 nominal_perf = intel_pstate_cppc_nominal(cpu);
+ smp_call_function_single(cpu, hybrid_get_type, &cpu_type, 1);
+ /* P-cores have a smaller perf level-to-freqency scaling factor. */
+ if (cpu_type == 0x40)
+ return 78741;
- if (nominal_perf && nominal_perf < min_nominal_perf)
- min_nominal_perf = nominal_perf;
- }
-
- if (min_nominal_perf < U32_MAX) {
- hybrid_ref_perf = min_nominal_perf;
- pstate_funcs.get_cpu_scaling = hybrid_get_cpu_scaling;
- }
+ return core_get_scaling();
}
-#else
-static inline void intel_pstate_cppc_set_cpu_scaling(void)
-{
-}
-#endif /* CONFIG_ACPI_CPPC_LIB */
static void intel_pstate_set_pstate(struct cpudata *cpu, int pstate)
{
@@ -3450,7 +3411,7 @@ static int __init intel_pstate_init(void
default_driver = &intel_pstate;
if (boot_cpu_has(X86_FEATURE_HYBRID_CPU))
- intel_pstate_cppc_set_cpu_scaling();
+ pstate_funcs.get_cpu_scaling = hybrid_get_cpu_scaling;
goto hwp_cpu_matched;
}
On Tue, Oct 25, 2022 at 1:58 AM srinivas pandruvada
<[email protected]> wrote:
>
> On Mon, 2022-10-24 at 21:18 +0200, Rafael J. Wysocki wrote:
> > Hi All,
> >
> > The HWP calibration in intel_pstate is needed to map HWP performance
> > levels to
> > frequencies, which are used in the cpufreq sysfs interface, in a
> > reliable way.
> > On all non-hybrid "core" platforms it is sufficient to multiply the
> > HWP
> > performance levels by 100000 to obtain the corresponding frequencies,
> > but on
> > hybrid ones there is a difference between P-cores and E-cores.
> >
> > Previous attempts to make this work were based on using CPPC (and in
> > particular
> > the nominal performance values provided by _CPC), but it turns out
> > that the
> > CPPC information is not sufficiently reliable for this purpose and
> > the only
> > way to do it is to use a hard-coded scaling factors for P-cores and
> > for E-cores
> > (which fortunately is the same as in the non-hybrid case).
> > Fortunately, the
> > same scaling factor for P-cores works on all of the hybrid platforms
> > to date.
> >
> > The first patch in the series ensures that all of the CPUs will use
> > correct
> > information from MSRs by avoiding the situations in which an MSR
> > values read
> > on one CPU will be used for performance scaling of another CPU.
> >
> > The second one implements the approach outlined above.
> >
> > Please see the changelogs for details.
>
> Acked-by: Srinivas Pandruvada <[email protected]>
> Tested-by: Srinivas Pandruvada <[email protected]>
Thank you!
As discussed offline, I'm going to fast-track this series as urgent
fixes to cover systems in the field that are likely to have problems
related to it.