Hi All,
There is a problem in intel_pstate that if it works in the passive mode with
HWP enabled, changing the policy max frequency may not cause the HWP max limit
to be updated accordingly which is quite confusing and may be incorrect.
That happens because of two checks, one in the cpufreq core and one in the
driver itself, that are there to avoid unnecessary HW/FW updates when the
current frequency doesn't change. Of course, that is the case when the
policy max limit changes under the "powersave" governor (which sets the
current frequency to the policy min limit) and in that particular case,
the checks turn out to be harmful.
There is also an analogous problem in the schedutil governor that prevents
driver callbacks from being invoked if the target frequency doesn't change
and which also affects intel_pstate in the passive mode with HWP enabled
(see the changelog of patch [4/4]).
The v2 addresses some review comments from Viresh and adds patches [3-4/4] to
address the schedutil issue.
The cleanup posted as the [2/2] previously will be applied independently and
it is not included in the v2.
Thanks!
From: Rafael J. Wysocki <[email protected]>
If the cpufreq policy max limit is changed when intel_pstate operates
in the passive mode with HWP enabled and the "powersave" governor is
used on top of it, the HWP max limit is not updated as appropriate.
Namely, in the "powersave" governor case, the target P-state
is always equal to the policy min limit, so if the latter does
not change, intel_cpufreq_adjust_hwp() is not invoked to update
the HWP Request MSR due to the "target_pstate != old_pstate" check
in intel_cpufreq_update_pstate(), so the HWP max limit is not
updated as a result.
Also, if the CPUFREQ_NEED_UPDATE_LIMITS flag is not set for the
driver and the target frequency does not change along with the
policy max limit, the "target_freq == policy->cur" check in
__cpufreq_driver_target() prevents the driver's ->target() callback
from being invoked at all, so the HWP max limit is not updated.
To prevent that occurring, set the CPUFREQ_NEED_UPDATE_LIMITS flag
in the intel_cpufreq driver structure if HWP is enabled and modify
intel_cpufreq_update_pstate() to do the "target_pstate != old_pstate"
check only in the non-HWP case and let intel_cpufreq_adjust_hwp()
always run in the HWP case (it will update HWP Request only if the
cached value of the register is different from the new one including
the limits, so if neither the target P-state value nor the max limit
changes, the register write will still be avoided).
Fixes: f6ebbcf08f37 ("cpufreq: intel_pstate: Implement passive mode with HWP enabled")
Reported-by: Zhang Rui <[email protected]>
Cc: 5.9+ <[email protected]> # 5.9+
Signed-off-by: Rafael J. Wysocki <[email protected]>
---
The v2 is just the intel_pstate changes (without the core changes) and setting
the new flag.
---
drivers/cpufreq/intel_pstate.c | 13 ++++++-------
1 file changed, 6 insertions(+), 7 deletions(-)
Index: linux-pm/drivers/cpufreq/intel_pstate.c
===================================================================
--- linux-pm.orig/drivers/cpufreq/intel_pstate.c
+++ linux-pm/drivers/cpufreq/intel_pstate.c
@@ -2550,14 +2550,12 @@ static int intel_cpufreq_update_pstate(s
int old_pstate = cpu->pstate.current_pstate;
target_pstate = intel_pstate_prepare_request(cpu, target_pstate);
- if (target_pstate != old_pstate) {
+ if (hwp_active) {
+ intel_cpufreq_adjust_hwp(cpu, target_pstate, fast_switch);
+ cpu->pstate.current_pstate = target_pstate;
+ } else if (target_pstate != old_pstate) {
+ intel_cpufreq_adjust_perf_ctl(cpu, target_pstate, fast_switch);
cpu->pstate.current_pstate = target_pstate;
- if (hwp_active)
- intel_cpufreq_adjust_hwp(cpu, target_pstate,
- fast_switch);
- else
- intel_cpufreq_adjust_perf_ctl(cpu, target_pstate,
- fast_switch);
}
intel_cpufreq_trace(cpu, fast_switch ? INTEL_PSTATE_TRACE_FAST_SWITCH :
@@ -3014,6 +3012,7 @@ static int __init intel_pstate_init(void
hwp_mode_bdw = id->driver_data;
intel_pstate.attr = hwp_cpufreq_attrs;
intel_cpufreq.attr = hwp_cpufreq_attrs;
+ intel_cpufreq.flags |= CPUFREQ_NEED_UPDATE_LIMITS;
if (!default_driver)
default_driver = &intel_pstate;
From: Rafael J. Wysocki <[email protected]>
Generally, a cpufreq driver may need to update some internal upper
and lower frequency boundaries on policy max and min changes,
respectively, but currently this does not work if the target
frequency does not change along with the policy limit.
Namely, if the target frequency does not change along with the
policy min or max, the "target_freq == policy->cur" check in
__cpufreq_driver_target() prevents driver callbacks from being
invoked and they do not even have a chance to update the
corresponding internal boundary.
This particularly affects the "powersave" and "performance"
governors that always set the target frequency to one of the
policy limits and it never changes when the other limit is updated.
To allow cpufreq the drivers needing to update internal frequency
boundaries on policy limits changes to avoid this issue, introduce
a new driver flag, CPUFREQ_NEED_UPDATE_LIMITS, that (when set) will
neutralize the check mentioned above.
Signed-off-by: Rafael J. Wysocki <[email protected]>
---
New patch in v2.
---
drivers/cpufreq/cpufreq.c | 3 ++-
include/linux/cpufreq.h | 10 +++++++++-
2 files changed, 11 insertions(+), 2 deletions(-)
Index: linux-pm/include/linux/cpufreq.h
===================================================================
--- linux-pm.orig/include/linux/cpufreq.h
+++ linux-pm/include/linux/cpufreq.h
@@ -297,7 +297,7 @@ __ATTR(_name, 0644, show_##_name, store_
struct cpufreq_driver {
char name[CPUFREQ_NAME_LEN];
- u8 flags;
+ u16 flags;
void *driver_data;
/* needed by all drivers */
@@ -417,6 +417,14 @@ struct cpufreq_driver {
*/
#define CPUFREQ_IS_COOLING_DEV BIT(7)
+/*
+ * Set by drivers that need to update internale upper and lower boundaries along
+ * with the target frequency and so the core and governors should also invoke
+ * the diver if the target frequency does not change, but the policy min or max
+ * may have changed.
+ */
+#define CPUFREQ_NEED_UPDATE_LIMITS BIT(8)
+
int cpufreq_register_driver(struct cpufreq_driver *driver_data);
int cpufreq_unregister_driver(struct cpufreq_driver *driver_data);
Index: linux-pm/drivers/cpufreq/cpufreq.c
===================================================================
--- linux-pm.orig/drivers/cpufreq/cpufreq.c
+++ linux-pm/drivers/cpufreq/cpufreq.c
@@ -2191,7 +2191,8 @@ int __cpufreq_driver_target(struct cpufr
* exactly same freq is called again and so we can save on few function
* calls.
*/
- if (target_freq == policy->cur)
+ if (target_freq == policy->cur &&
+ !(cpufreq_driver->flags & CPUFREQ_NEED_UPDATE_LIMITS))
return 0;
if (cpufreq_driver->target)
On 23-10-20, 17:35, Rafael J. Wysocki wrote:
> From: Rafael J. Wysocki <[email protected]>
>
> If the cpufreq policy max limit is changed when intel_pstate operates
> in the passive mode with HWP enabled and the "powersave" governor is
> used on top of it, the HWP max limit is not updated as appropriate.
>
> Namely, in the "powersave" governor case, the target P-state
> is always equal to the policy min limit, so if the latter does
> not change, intel_cpufreq_adjust_hwp() is not invoked to update
> the HWP Request MSR due to the "target_pstate != old_pstate" check
> in intel_cpufreq_update_pstate(), so the HWP max limit is not
> updated as a result.
>
> Also, if the CPUFREQ_NEED_UPDATE_LIMITS flag is not set for the
> driver and the target frequency does not change along with the
> policy max limit, the "target_freq == policy->cur" check in
> __cpufreq_driver_target() prevents the driver's ->target() callback
> from being invoked at all, so the HWP max limit is not updated.
>
> To prevent that occurring, set the CPUFREQ_NEED_UPDATE_LIMITS flag
> in the intel_cpufreq driver structure if HWP is enabled and modify
> intel_cpufreq_update_pstate() to do the "target_pstate != old_pstate"
> check only in the non-HWP case and let intel_cpufreq_adjust_hwp()
> always run in the HWP case (it will update HWP Request only if the
> cached value of the register is different from the new one including
> the limits, so if neither the target P-state value nor the max limit
> changes, the register write will still be avoided).
>
> Fixes: f6ebbcf08f37 ("cpufreq: intel_pstate: Implement passive mode with HWP enabled")
> Reported-by: Zhang Rui <[email protected]>
> Cc: 5.9+ <[email protected]> # 5.9+
> Signed-off-by: Rafael J. Wysocki <[email protected]>
> ---
>
> The v2 is just the intel_pstate changes (without the core changes) and setting
> the new flag.
>
> ---
> drivers/cpufreq/intel_pstate.c | 13 ++++++-------
> 1 file changed, 6 insertions(+), 7 deletions(-)
>
> Index: linux-pm/drivers/cpufreq/intel_pstate.c
> ===================================================================
> --- linux-pm.orig/drivers/cpufreq/intel_pstate.c
> +++ linux-pm/drivers/cpufreq/intel_pstate.c
> @@ -2550,14 +2550,12 @@ static int intel_cpufreq_update_pstate(s
> int old_pstate = cpu->pstate.current_pstate;
>
> target_pstate = intel_pstate_prepare_request(cpu, target_pstate);
> - if (target_pstate != old_pstate) {
> + if (hwp_active) {
> + intel_cpufreq_adjust_hwp(cpu, target_pstate, fast_switch);
> + cpu->pstate.current_pstate = target_pstate;
> + } else if (target_pstate != old_pstate) {
> + intel_cpufreq_adjust_perf_ctl(cpu, target_pstate, fast_switch);
> cpu->pstate.current_pstate = target_pstate;
> - if (hwp_active)
> - intel_cpufreq_adjust_hwp(cpu, target_pstate,
> - fast_switch);
> - else
> - intel_cpufreq_adjust_perf_ctl(cpu, target_pstate,
> - fast_switch);
> }
>
> intel_cpufreq_trace(cpu, fast_switch ? INTEL_PSTATE_TRACE_FAST_SWITCH :
> @@ -3014,6 +3012,7 @@ static int __init intel_pstate_init(void
> hwp_mode_bdw = id->driver_data;
> intel_pstate.attr = hwp_cpufreq_attrs;
> intel_cpufreq.attr = hwp_cpufreq_attrs;
> + intel_cpufreq.flags |= CPUFREQ_NEED_UPDATE_LIMITS;
> if (!default_driver)
> default_driver = &intel_pstate;
Acked-by: Viresh Kumar <[email protected]>
--
viresh
On 23-10-20, 17:35, Rafael J. Wysocki wrote:
> From: Rafael J. Wysocki <[email protected]>
>
> Generally, a cpufreq driver may need to update some internal upper
> and lower frequency boundaries on policy max and min changes,
> respectively, but currently this does not work if the target
> frequency does not change along with the policy limit.
>
> Namely, if the target frequency does not change along with the
> policy min or max, the "target_freq == policy->cur" check in
> __cpufreq_driver_target() prevents driver callbacks from being
> invoked and they do not even have a chance to update the
> corresponding internal boundary.
>
> This particularly affects the "powersave" and "performance"
> governors that always set the target frequency to one of the
> policy limits and it never changes when the other limit is updated.
>
> To allow cpufreq the drivers needing to update internal frequency
> boundaries on policy limits changes to avoid this issue, introduce
> a new driver flag, CPUFREQ_NEED_UPDATE_LIMITS, that (when set) will
> neutralize the check mentioned above.
>
> Signed-off-by: Rafael J. Wysocki <[email protected]>
> ---
>
> New patch in v2.
>
> ---
> drivers/cpufreq/cpufreq.c | 3 ++-
> include/linux/cpufreq.h | 10 +++++++++-
> 2 files changed, 11 insertions(+), 2 deletions(-)
>
> Index: linux-pm/include/linux/cpufreq.h
> ===================================================================
> --- linux-pm.orig/include/linux/cpufreq.h
> +++ linux-pm/include/linux/cpufreq.h
> @@ -297,7 +297,7 @@ __ATTR(_name, 0644, show_##_name, store_
>
> struct cpufreq_driver {
> char name[CPUFREQ_NAME_LEN];
> - u8 flags;
> + u16 flags;
> void *driver_data;
>
> /* needed by all drivers */
> @@ -417,6 +417,14 @@ struct cpufreq_driver {
> */
> #define CPUFREQ_IS_COOLING_DEV BIT(7)
>
> +/*
> + * Set by drivers that need to update internale upper and lower boundaries along
> + * with the target frequency and so the core and governors should also invoke
> + * the diver if the target frequency does not change, but the policy min or max
> + * may have changed.
> + */
> +#define CPUFREQ_NEED_UPDATE_LIMITS BIT(8)
> +
> int cpufreq_register_driver(struct cpufreq_driver *driver_data);
> int cpufreq_unregister_driver(struct cpufreq_driver *driver_data);
>
> Index: linux-pm/drivers/cpufreq/cpufreq.c
> ===================================================================
> --- linux-pm.orig/drivers/cpufreq/cpufreq.c
> +++ linux-pm/drivers/cpufreq/cpufreq.c
> @@ -2191,7 +2191,8 @@ int __cpufreq_driver_target(struct cpufr
> * exactly same freq is called again and so we can save on few function
> * calls.
> */
> - if (target_freq == policy->cur)
> + if (target_freq == policy->cur &&
> + !(cpufreq_driver->flags & CPUFREQ_NEED_UPDATE_LIMITS))
I was wondering if the same change should be made in the target_index part as we
do this kind of check again ? But then I thought that since we know there are no
users of that right now, why bother :)
Acked-by: Viresh Kumar <[email protected]>
--
viresh
On Fri, 2020-10-23 at 17:35 +0200, Rafael J. Wysocki wrote:
> From: Rafael J. Wysocki <[email protected]>
>
> If the cpufreq policy max limit is changed when intel_pstate operates
> in the passive mode with HWP enabled and the "powersave" governor is
> used on top of it, the HWP max limit is not updated as appropriate.
>
> Namely, in the "powersave" governor case, the target P-state
> is always equal to the policy min limit, so if the latter does
> not change, intel_cpufreq_adjust_hwp() is not invoked to update
> the HWP Request MSR due to the "target_pstate != old_pstate" check
> in intel_cpufreq_update_pstate(), so the HWP max limit is not
> updated as a result.
>
> Also, if the CPUFREQ_NEED_UPDATE_LIMITS flag is not set for the
> driver and the target frequency does not change along with the
> policy max limit, the "target_freq == policy->cur" check in
> __cpufreq_driver_target() prevents the driver's ->target() callback
> from being invoked at all, so the HWP max limit is not updated.
>
> To prevent that occurring, set the CPUFREQ_NEED_UPDATE_LIMITS flag
> in the intel_cpufreq driver structure if HWP is enabled and modify
> intel_cpufreq_update_pstate() to do the "target_pstate != old_pstate"
> check only in the non-HWP case and let intel_cpufreq_adjust_hwp()
> always run in the HWP case (it will update HWP Request only if the
> cached value of the register is different from the new one including
> the limits, so if neither the target P-state value nor the max limit
> changes, the register write will still be avoided).
>
> Fixes: f6ebbcf08f37 ("cpufreq: intel_pstate: Implement passive mode
> with HWP enabled")
> Reported-by: Zhang Rui <[email protected]>
> Cc: 5.9+ <[email protected]> # 5.9+
> Signed-off-by: Rafael J. Wysocki <[email protected]>
I have confirmed that the problem is gone with this patch series
applied.
The HWP register is updated after changing the scaling_max_freq sysfs
attribute, with powersave governor.
Tested-by: Zhang Rui <[email protected]>
thanks,
rui
> ---
>
> The v2 is just the intel_pstate changes (without the core changes)
> and setting
> the new flag.
>
> ---
> drivers/cpufreq/intel_pstate.c | 13 ++++++-------
> 1 file changed, 6 insertions(+), 7 deletions(-)
>
> Index: linux-pm/drivers/cpufreq/intel_pstate.c
> ===================================================================
> --- linux-pm.orig/drivers/cpufreq/intel_pstate.c
> +++ linux-pm/drivers/cpufreq/intel_pstate.c
> @@ -2550,14 +2550,12 @@ static int intel_cpufreq_update_pstate(s
> int old_pstate = cpu->pstate.current_pstate;
>
> target_pstate = intel_pstate_prepare_request(cpu,
> target_pstate);
> - if (target_pstate != old_pstate) {
> + if (hwp_active) {
> + intel_cpufreq_adjust_hwp(cpu, target_pstate,
> fast_switch);
> + cpu->pstate.current_pstate = target_pstate;
> + } else if (target_pstate != old_pstate) {
> + intel_cpufreq_adjust_perf_ctl(cpu, target_pstate,
> fast_switch);
> cpu->pstate.current_pstate = target_pstate;
> - if (hwp_active)
> - intel_cpufreq_adjust_hwp(cpu, target_pstate,
> - fast_switch);
> - else
> - intel_cpufreq_adjust_perf_ctl(cpu,
> target_pstate,
> - fast_switch);
> }
>
> intel_cpufreq_trace(cpu, fast_switch ?
> INTEL_PSTATE_TRACE_FAST_SWITCH :
> @@ -3014,6 +3012,7 @@ static int __init intel_pstate_init(void
> hwp_mode_bdw = id->driver_data;
> intel_pstate.attr = hwp_cpufreq_attrs;
> intel_cpufreq.attr = hwp_cpufreq_attrs;
> + intel_cpufreq.flags |=
> CPUFREQ_NEED_UPDATE_LIMITS;
> if (!default_driver)
> default_driver = &intel_pstate;
>
>
>
>