Hi,
Even after the changes made very recently, the handling of the powersave
governor is not exactly as expected when intel_pstate operates in the
"passive" mode with HWP enabled.
Namely, in that case HWP is not limited to the policy min frequency, but it
can scale the frequency up to the policy max limit and it cannot be constrained
currently, because there are no provisions for that in the framework.
To address that, patches [1-3/4] add a new governor flag to indicate that this
governor wants the target frequency to be set to the exact value passed to the
driver, if possible, and change the powersave and performance governors to have
that flag set.
The last patch makes intel_pstate take that flag into account when programming
the HWP Request MSR.
Thanks!
From: Rafael J. Wysocki <[email protected]>
Make intel_pstate take the new CPUFREQ_GOV_FLAG_STRICT_TARGET
governor flag into account when it operates in the passive mode with
HWP enabled, so as to fix the "powersave" governor behavior in that
case (currently, HWP is allowed to scale the performance all the way
up to the policy max limit when the "powersave" governor is used,
but it should be constrained to the policy min limit then).
Signed-off-by: Rafael J. Wysocki <[email protected]>
---
drivers/cpufreq/intel_pstate.c | 16 +++++++++-------
1 file changed, 9 insertions(+), 7 deletions(-)
Index: linux-pm/drivers/cpufreq/intel_pstate.c
===================================================================
--- linux-pm.orig/drivers/cpufreq/intel_pstate.c
+++ linux-pm/drivers/cpufreq/intel_pstate.c
@@ -2527,7 +2527,7 @@ static void intel_cpufreq_trace(struct c
}
static void intel_cpufreq_adjust_hwp(struct cpudata *cpu, u32 target_pstate,
- bool fast_switch)
+ bool strict, bool fast_switch)
{
u64 prev = READ_ONCE(cpu->hwp_req_cached), value = prev;
@@ -2539,7 +2539,7 @@ static void intel_cpufreq_adjust_hwp(str
* field in it, so opportunistically update the max too if needed.
*/
value &= ~HWP_MAX_PERF(~0L);
- value |= HWP_MAX_PERF(cpu->max_perf_ratio);
+ value |= HWP_MAX_PERF(strict ? target_pstate : cpu->max_perf_ratio);
if (value == prev)
return;
@@ -2562,14 +2562,16 @@ static void intel_cpufreq_adjust_perf_ct
pstate_funcs.get_val(cpu, target_pstate));
}
-static int intel_cpufreq_update_pstate(struct cpudata *cpu, int target_pstate,
- bool fast_switch)
+static int intel_cpufreq_update_pstate(struct cpufreq_policy *policy,
+ int target_pstate, bool fast_switch)
{
+ struct cpudata *cpu = all_cpu_data[policy->cpu];
int old_pstate = cpu->pstate.current_pstate;
target_pstate = intel_pstate_prepare_request(cpu, target_pstate);
if (hwp_active) {
- intel_cpufreq_adjust_hwp(cpu, target_pstate, fast_switch);
+ intel_cpufreq_adjust_hwp(cpu, target_pstate,
+ policy->strict_target, fast_switch);
cpu->pstate.current_pstate = target_pstate;
} else if (target_pstate != old_pstate) {
intel_cpufreq_adjust_perf_ctl(cpu, target_pstate, fast_switch);
@@ -2609,7 +2611,7 @@ static int intel_cpufreq_target(struct c
break;
}
- target_pstate = intel_cpufreq_update_pstate(cpu, target_pstate, false);
+ target_pstate = intel_cpufreq_update_pstate(policy, target_pstate, false);
freqs.new = target_pstate * cpu->pstate.scaling;
@@ -2628,7 +2630,7 @@ static unsigned int intel_cpufreq_fast_s
target_pstate = DIV_ROUND_UP(target_freq, cpu->pstate.scaling);
- target_pstate = intel_cpufreq_update_pstate(cpu, target_pstate, true);
+ target_pstate = intel_cpufreq_update_pstate(policy, target_pstate, true);
return target_pstate * cpu->pstate.scaling;
}
On 09-11-20, 17:55, Rafael J. Wysocki wrote:
> From: Rafael J. Wysocki <[email protected]>
>
> Make intel_pstate take the new CPUFREQ_GOV_FLAG_STRICT_TARGET
> governor flag into account when it operates in the passive mode with
> HWP enabled, so as to fix the "powersave" governor behavior in that
> case (currently, HWP is allowed to scale the performance all the way
> up to the policy max limit when the "powersave" governor is used,
> but it should be constrained to the policy min limit then).
>
> Signed-off-by: Rafael J. Wysocki <[email protected]>
> ---
> drivers/cpufreq/intel_pstate.c | 16 +++++++++-------
> 1 file changed, 9 insertions(+), 7 deletions(-)
>
> Index: linux-pm/drivers/cpufreq/intel_pstate.c
> ===================================================================
> --- linux-pm.orig/drivers/cpufreq/intel_pstate.c
> +++ linux-pm/drivers/cpufreq/intel_pstate.c
> @@ -2527,7 +2527,7 @@ static void intel_cpufreq_trace(struct c
> }
>
> static void intel_cpufreq_adjust_hwp(struct cpudata *cpu, u32 target_pstate,
> - bool fast_switch)
> + bool strict, bool fast_switch)
> {
> u64 prev = READ_ONCE(cpu->hwp_req_cached), value = prev;
>
> @@ -2539,7 +2539,7 @@ static void intel_cpufreq_adjust_hwp(str
> * field in it, so opportunistically update the max too if needed.
> */
> value &= ~HWP_MAX_PERF(~0L);
> - value |= HWP_MAX_PERF(cpu->max_perf_ratio);
> + value |= HWP_MAX_PERF(strict ? target_pstate : cpu->max_perf_ratio);
>
> if (value == prev)
> return;
> @@ -2562,14 +2562,16 @@ static void intel_cpufreq_adjust_perf_ct
> pstate_funcs.get_val(cpu, target_pstate));
> }
>
> -static int intel_cpufreq_update_pstate(struct cpudata *cpu, int target_pstate,
> - bool fast_switch)
> +static int intel_cpufreq_update_pstate(struct cpufreq_policy *policy,
> + int target_pstate, bool fast_switch)
> {
> + struct cpudata *cpu = all_cpu_data[policy->cpu];
> int old_pstate = cpu->pstate.current_pstate;
>
> target_pstate = intel_pstate_prepare_request(cpu, target_pstate);
> if (hwp_active) {
> - intel_cpufreq_adjust_hwp(cpu, target_pstate, fast_switch);
> + intel_cpufreq_adjust_hwp(cpu, target_pstate,
> + policy->strict_target, fast_switch);
> cpu->pstate.current_pstate = target_pstate;
> } else if (target_pstate != old_pstate) {
> intel_cpufreq_adjust_perf_ctl(cpu, target_pstate, fast_switch);
> @@ -2609,7 +2611,7 @@ static int intel_cpufreq_target(struct c
> break;
> }
>
> - target_pstate = intel_cpufreq_update_pstate(cpu, target_pstate, false);
> + target_pstate = intel_cpufreq_update_pstate(policy, target_pstate, false);
>
> freqs.new = target_pstate * cpu->pstate.scaling;
>
> @@ -2628,7 +2630,7 @@ static unsigned int intel_cpufreq_fast_s
>
> target_pstate = DIV_ROUND_UP(target_freq, cpu->pstate.scaling);
>
> - target_pstate = intel_cpufreq_update_pstate(cpu, target_pstate, true);
> + target_pstate = intel_cpufreq_update_pstate(policy, target_pstate, true);
>
> return target_pstate * cpu->pstate.scaling;
> }
Acked-by: Viresh Kumar <[email protected]>
--
viresh
Hi,
On Monday, November 9, 2020 5:49:49 PM CET Rafael J. Wysocki wrote:
>
> Even after the changes made very recently, the handling of the powersave
> governor is not exactly as expected when intel_pstate operates in the
> "passive" mode with HWP enabled.
>
> Namely, in that case HWP is not limited to the policy min frequency, but it
> can scale the frequency up to the policy max limit and it cannot be constrained
> currently, because there are no provisions for that in the framework.
>
> To address that, patches [1-3/4] add a new governor flag to indicate that this
> governor wants the target frequency to be set to the exact value passed to the
> driver, if possible, and change the powersave and performance governors to have
> that flag set.
>
> The last patch makes intel_pstate take that flag into account when programming
> the HWP Request MSR.
The v3 simply uses different names for the new governor flags.
Thanks!
From: Rafael J. Wysocki <[email protected]>
Introduce a new governor flag, CPUFREQ_GOV_STRICT_TARGET, for the
governors that want the target frequency to be set exactly to the
given value without leaving any room for adjustments on the hardware
side and set this flag for the powersave and performance governors.
Signed-off-by: Rafael J. Wysocki <[email protected]>
Acked-by: Viresh Kumar <[email protected]>
---
drivers/cpufreq/cpufreq_performance.c | 1 +
drivers/cpufreq/cpufreq_powersave.c | 1 +
include/linux/cpufreq.h | 3 +++
3 files changed, 5 insertions(+)
Index: linux-pm/drivers/cpufreq/cpufreq_performance.c
===================================================================
--- linux-pm.orig/drivers/cpufreq/cpufreq_performance.c
+++ linux-pm/drivers/cpufreq/cpufreq_performance.c
@@ -20,6 +20,7 @@ static void cpufreq_gov_performance_limi
static struct cpufreq_governor cpufreq_gov_performance = {
.name = "performance",
.owner = THIS_MODULE,
+ .flags = CPUFREQ_GOV_STRICT_TARGET,
.limits = cpufreq_gov_performance_limits,
};
Index: linux-pm/drivers/cpufreq/cpufreq_powersave.c
===================================================================
--- linux-pm.orig/drivers/cpufreq/cpufreq_powersave.c
+++ linux-pm/drivers/cpufreq/cpufreq_powersave.c
@@ -21,6 +21,7 @@ static struct cpufreq_governor cpufreq_g
.name = "powersave",
.limits = cpufreq_gov_powersave_limits,
.owner = THIS_MODULE,
+ .flags = CPUFREQ_GOV_STRICT_TARGET,
};
MODULE_AUTHOR("Dominik Brodowski <[email protected]>");
Index: linux-pm/include/linux/cpufreq.h
===================================================================
--- linux-pm.orig/include/linux/cpufreq.h
+++ linux-pm/include/linux/cpufreq.h
@@ -575,6 +575,9 @@ struct cpufreq_governor {
/* For governors which change frequency dynamically by themselves */
#define CPUFREQ_GOV_DYNAMIC_SWITCHING BIT(0)
+/* For governors wanting the target frequency to be set exactly */
+#define CPUFREQ_GOV_STRICT_TARGET BIT(1)
+
/* Pass a target to the cpufreq driver */
unsigned int cpufreq_driver_fast_switch(struct cpufreq_policy *policy,
From: Rafael J. Wysocki <[email protected]>
A new cpufreq governor flag will be added subsequently, so replace
the bool dynamic_switching fleid in struct cpufreq_governor with a
flags field and introduce CPUFREQ_GOV_DYNAMIC_SWITCHING to set for
the "dynamic switching" governors instead of it.
No intentional functional impact.
Signed-off-by: Rafael J. Wysocki <[email protected]>
Acked-by: Viresh Kumar <[email protected]>
---
drivers/cpufreq/cpufreq.c | 2 +-
drivers/cpufreq/cpufreq_governor.h | 2 +-
include/linux/cpufreq.h | 9 +++++++--
kernel/sched/cpufreq_schedutil.c | 2 +-
4 files changed, 10 insertions(+), 5 deletions(-)
Index: linux-pm/drivers/cpufreq/cpufreq.c
===================================================================
--- linux-pm.orig/drivers/cpufreq/cpufreq.c
+++ linux-pm/drivers/cpufreq/cpufreq.c
@@ -2254,7 +2254,7 @@ static int cpufreq_init_governor(struct
return -EINVAL;
/* Platform doesn't want dynamic frequency switching ? */
- if (policy->governor->dynamic_switching &&
+ if (policy->governor->flags & CPUFREQ_GOV_DYNAMIC_SWITCHING &&
cpufreq_driver->flags & CPUFREQ_NO_AUTO_DYNAMIC_SWITCHING) {
struct cpufreq_governor *gov = cpufreq_fallback_governor();
Index: linux-pm/drivers/cpufreq/cpufreq_governor.h
===================================================================
--- linux-pm.orig/drivers/cpufreq/cpufreq_governor.h
+++ linux-pm/drivers/cpufreq/cpufreq_governor.h
@@ -156,7 +156,7 @@ void cpufreq_dbs_governor_limits(struct
#define CPUFREQ_DBS_GOVERNOR_INITIALIZER(_name_) \
{ \
.name = _name_, \
- .dynamic_switching = true, \
+ .flags = CPUFREQ_GOV_DYNAMIC_SWITCHING, \
.owner = THIS_MODULE, \
.init = cpufreq_dbs_governor_init, \
.exit = cpufreq_dbs_governor_exit, \
Index: linux-pm/include/linux/cpufreq.h
===================================================================
--- linux-pm.orig/include/linux/cpufreq.h
+++ linux-pm/include/linux/cpufreq.h
@@ -565,12 +565,17 @@ struct cpufreq_governor {
char *buf);
int (*store_setspeed) (struct cpufreq_policy *policy,
unsigned int freq);
- /* For governors which change frequency dynamically by themselves */
- bool dynamic_switching;
struct list_head governor_list;
struct module *owner;
+ u8 flags;
};
+/* Governor flags */
+
+/* For governors which change frequency dynamically by themselves */
+#define CPUFREQ_GOV_DYNAMIC_SWITCHING BIT(0)
+
+
/* Pass a target to the cpufreq driver */
unsigned int cpufreq_driver_fast_switch(struct cpufreq_policy *policy,
unsigned int target_freq);
Index: linux-pm/kernel/sched/cpufreq_schedutil.c
===================================================================
--- linux-pm.orig/kernel/sched/cpufreq_schedutil.c
+++ linux-pm/kernel/sched/cpufreq_schedutil.c
@@ -881,7 +881,7 @@ static void sugov_limits(struct cpufreq_
struct cpufreq_governor schedutil_gov = {
.name = "schedutil",
.owner = THIS_MODULE,
- .dynamic_switching = true,
+ .flags = CPUFREQ_GOV_DYNAMIC_SWITCHING,
.init = sugov_init,
.exit = sugov_exit,
.start = sugov_start,
On 2020.11.10 09:22 Rafael J. Wysocki wrote:
> On Monday, November 9, 2020 5:49:49 PM CET Rafael J. Wysocki wrote:
>>
>> Even after the changes made very recently, the handling of the powersave
>> governor is not exactly as expected when intel_pstate operates in the
>> "passive" mode with HWP enabled.
>>
>> Namely, in that case HWP is not limited to the policy min frequency, but it
>> can scale the frequency up to the policy max limit and it cannot be constrained
>> currently, because there are no provisions for that in the framework.
>>
>> To address that, patches [1-3/4] add a new governor flag to indicate that this
>> governor wants the target frequency to be set to the exact value passed to the
>> driver, if possible, and change the powersave and performance governors to have
>> that flag set.
>>
>> The last patch makes intel_pstate take that flag into account when programming
>> the HWP Request MSR.
>
> The v3 simply uses different names for the new governor flags.
Thank you.
I tested v2, with positive results, as reported for v1. I do not have time to
re-test v3.
My input is to also default this flag to be set for the userspace and ondemand governors.
userspace: I tested with and without this flag set, and the flag is needed if
the user expects the scaling_setspeed to be enforced.
Disclaimer: I don't normally actually use the userspace governor.
ondemand: from my tests, the ondemand response more closely mimics acpi-ondemand with the flag set.
Power consumption has been better for the limited testing done.
However, it is also a function of work/sleep frequency for periodic workflows and a function of
INTEL_CPUFREQ_TRANSITION_DELAY_HWP. I am saying that my ability to support the suggestion to default
to setting the flag is a little weak.
... Doug