Hi All,
The last two patches in the v3 needed to be updated to take re-enabling of HWP
after an ACPI S3 suspend/resume cycle into account appropriately. The first
three patches are the same as before.
The purpose of this series is to address some peculiarities related to
taking CPUs offline/online and switching between different operation
modes with HWP enabled that have become visible after allowing the
driver to work in the passive mode with HWP enabled in 5.9-rc1 (and
one that was there earlier, but can be addressed easily after the
changes made in 5.9-rc1).
Please refer to the patch changelogs for details.
For easier testing/review, the series is available from the git branch at:
git://git.kernel.org/pub/scm/linux/kernel/git/rafael/linux-pm.git \
intel_pstate-testing
I've done my best to address all of the possible corner cases, but the test
matrix is quite extensive and I may have missed something, so go ahead
and test.
Thanks,
Rafael
From: "Rafael J. Wysocki" <[email protected]>
After commit f6ebbcf08f37 ("cpufreq: intel_pstate: Implement passive
mode with HWP enabled") it is possible to change the driver status
to "off" via sysfs with HWP enabled, which effectively causes the
driver to unregister itself, but HWP remains active and it forces the
minimum performance, so even if another cpufreq driver is loaded,
it will not be able to control the CPU frequency.
For this reason, make the driver refuse to change the status to
"off" with HWP enabled.
Signed-off-by: Rafael J. Wysocki <[email protected]>
---
v1 -> v4: No changes
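For illustration only (not part of the patch): a minimal user-space sketch of the "off" handling described in the changelog above. The struct and function names are hypothetical stand-ins for the driver's global state, and only the "off" branch is modeled.

```c
#include <errno.h>
#include <stdbool.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical user-space stand-ins for the driver's global state. */
struct status_model {
	bool driver_registered;	/* models intel_pstate_driver != NULL */
	bool hwp_active;	/* models the hwp_active flag */
};

/*
 * Models only the "off" branch of the status update. Returns 0 when the
 * driver would be unregistered, or a negative errno when the write is
 * refused.
 */
static int model_status_off(struct status_model *m, const char *buf,
			    size_t size)
{
	if (size != 3 || strncmp(buf, "off", size))
		return -EINVAL;	/* not an "off" request; other modes not modeled */

	if (!m->driver_registered)
		return -EINVAL;	/* nothing to unregister */

	if (m->hwp_active)
		return -EBUSY;	/* HWP would keep forcing minimum performance */

	m->driver_registered = false;
	return 0;
}
```

With HWP active the model refuses the request with -EBUSY and leaves the driver registered, which is the behavior the patch below introduces.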
---
drivers/cpufreq/intel_pstate.c | 12 +++++++++---
1 file changed, 9 insertions(+), 3 deletions(-)
diff --git a/drivers/cpufreq/intel_pstate.c b/drivers/cpufreq/intel_pstate.c
index e0220a6fbc69..bcda1e700a73 100644
--- a/drivers/cpufreq/intel_pstate.c
+++ b/drivers/cpufreq/intel_pstate.c
@@ -2716,9 +2716,15 @@ static int intel_pstate_update_status(const char *buf, size_t size)
{
int ret;
- if (size == 3 && !strncmp(buf, "off", size))
- return intel_pstate_driver ?
- intel_pstate_unregister_driver() : -EINVAL;
+ if (size == 3 && !strncmp(buf, "off", size)) {
+ if (!intel_pstate_driver)
+ return -EINVAL;
+
+ if (hwp_active)
+ return -EBUSY;
+
+ return intel_pstate_unregister_driver();
+ }
if (size == 6 && !strncmp(buf, "active", size)) {
if (intel_pstate_driver) {
--
2.26.2
From: "Rafael J. Wysocki" <[email protected]>
Add ->offline and ->online driver callbacks to prepare for taking a
CPU offline and to restore its working configuration when it goes
back online, respectively, to avoid invoking the ->init callback on
every CPU online, which adds quite a bit of unnecessary overhead.
Define ->offline and ->online so that they can be used in the
passive mode as well as in the active mode. Because ->offline
will do the majority of the ->stop_cpu work, the passive mode does
not need that callback any more, so drop it from there.
Also modify the active mode ->suspend and ->resume callbacks to
prevent them from interfering with the new ->offline and ->online
ones in case the latter are invoked within the system-wide suspend
and resume code flow and make the passive mode use them too.
Signed-off-by: Rafael J. Wysocki <[email protected]>
---
v1 -> v2: Rearrange intel_pstate_init_cpu() to restore some of the previous
behavior of it to retain the current active-mode EPP management.
v2 -> v3:
* Fold the previous [5/5] in, rework intel_pstate_resume(), add
intel_pstate_suspend().
* Drop intel_pstate_hwp_save_state() and drop epp_saved from struct cpudata.
* Update the changelog.
v3 -> v4:
* Re-enable HWP in "online" and "resume" (if "online" didn't do that).
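For illustration only (not part of the patch): a minimal user-space sketch of how the new "suspended" flag is meant to coordinate the four callbacks. All names are hypothetical, and the counters merely stand in for the MSR writes done by the real driver.

```c
#include <stdbool.h>

/* Hypothetical model of the per-CPU state touched by the callbacks. */
struct cpu_model {
	bool suspended;		/* models cpudata->suspended */
	int hwp_reenable_count;	/* times HWP would have been re-enabled */
	int forced_min_count;	/* times minimum performance was forced */
};

/* ->suspend: only record that a system-wide suspend is in progress. */
static void model_suspend(struct cpu_model *c)
{
	c->suspended = true;
}

/* ->offline: leave the hardware alone during system-wide suspend. */
static void model_offline(struct cpu_model *c)
{
	if (c->suspended)
		return;

	c->forced_min_count++;
}

/* ->online: re-enable HWP and tell ->resume that it need not do that. */
static void model_online(struct cpu_model *c, bool hwp_active)
{
	if (hwp_active) {
		c->hwp_reenable_count++;
		c->suspended = false;
	}
}

/* ->resume: re-enable HWP only if ->online has not done it already. */
static void model_resume(struct cpu_model *c, bool hwp_active)
{
	if (c->suspended && hwp_active)
		c->hwp_reenable_count++;

	c->suspended = false;
}
```

In this model a suspend, offline, online, resume sequence re-enables HWP once and never forces the minimum performance, while a plain offline outside of suspend does force it.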
---
drivers/cpufreq/intel_pstate.c | 143 ++++++++++++++++++++++-----------
1 file changed, 94 insertions(+), 49 deletions(-)
diff --git a/drivers/cpufreq/intel_pstate.c b/drivers/cpufreq/intel_pstate.c
index b308c39b6204..8181a1f1dc79 100644
--- a/drivers/cpufreq/intel_pstate.c
+++ b/drivers/cpufreq/intel_pstate.c
@@ -219,14 +219,13 @@ struct global_params {
* @epp_policy: Last saved policy used to set EPP/EPB
* @epp_default: Power on default HWP energy performance
* preference/bias
- * @epp_saved: Saved EPP/EPB during system suspend or CPU offline
- * operation
* @epp_cached Cached HWP energy-performance preference value
* @hwp_req_cached: Cached value of the last HWP Request MSR
* @hwp_cap_cached: Cached value of the last HWP Capabilities MSR
* @last_io_update: Last time when IO wake flag was set
* @sched_flags: Store scheduler flags for possible cross CPU update
* @hwp_boost_min: Last HWP boosted min performance
+ * @suspended: Whether or not the driver has been suspended.
*
* This structure stores per CPU instance data for all CPUs.
*/
@@ -258,13 +257,13 @@ struct cpudata {
s16 epp_powersave;
s16 epp_policy;
s16 epp_default;
- s16 epp_saved;
s16 epp_cached;
u64 hwp_req_cached;
u64 hwp_cap_cached;
u64 last_io_update;
unsigned int sched_flags;
u32 hwp_boost_min;
+ bool suspended;
};
static struct cpudata **all_cpu_data;
@@ -871,12 +870,6 @@ static void intel_pstate_hwp_set(unsigned int cpu)
cpu_data->epp_policy = cpu_data->policy;
- if (cpu_data->epp_saved >= 0) {
- epp = cpu_data->epp_saved;
- cpu_data->epp_saved = -EINVAL;
- goto update_epp;
- }
-
if (cpu_data->policy == CPUFREQ_POLICY_PERFORMANCE) {
epp = intel_pstate_get_epp(cpu_data, value);
cpu_data->epp_powersave = epp;
@@ -903,7 +896,6 @@ static void intel_pstate_hwp_set(unsigned int cpu)
epp = cpu_data->epp_powersave;
}
-update_epp:
if (boot_cpu_has(X86_FEATURE_HWP_EPP)) {
value &= ~GENMASK_ULL(31, 24);
value |= (u64)epp << 24;
@@ -915,14 +907,24 @@ static void intel_pstate_hwp_set(unsigned int cpu)
wrmsrl_on_cpu(cpu, MSR_HWP_REQUEST, value);
}
-static void intel_pstate_hwp_force_min_perf(int cpu)
+static void intel_pstate_hwp_offline(struct cpudata *cpu)
{
- u64 value;
+ u64 value = READ_ONCE(cpu->hwp_req_cached);
int min_perf;
- value = all_cpu_data[cpu]->hwp_req_cached;
+ if (boot_cpu_has(X86_FEATURE_HWP_EPP)) {
+ /*
+ * In case the EPP has been set to "performance" by the
+ * active mode "performance" scaling algorithm, replace that
+ * temporary value with the cached EPP one.
+ */
+ value &= ~GENMASK_ULL(31, 24);
+ value |= HWP_ENERGY_PERF_PREFERENCE(cpu->epp_cached);
+ WRITE_ONCE(cpu->hwp_req_cached, value);
+ }
+
value &= ~GENMASK_ULL(31, 0);
- min_perf = HWP_LOWEST_PERF(all_cpu_data[cpu]->hwp_cap_cached);
+ min_perf = HWP_LOWEST_PERF(cpu->hwp_cap_cached);
/* Set hwp_max = hwp_min */
value |= HWP_MAX_PERF(min_perf);
@@ -932,19 +934,7 @@ static void intel_pstate_hwp_force_min_perf(int cpu)
if (boot_cpu_has(X86_FEATURE_HWP_EPP))
value |= HWP_ENERGY_PERF_PREFERENCE(HWP_EPP_POWERSAVE);
- wrmsrl_on_cpu(cpu, MSR_HWP_REQUEST, value);
-}
-
-static int intel_pstate_hwp_save_state(struct cpufreq_policy *policy)
-{
- struct cpudata *cpu_data = all_cpu_data[policy->cpu];
-
- if (!hwp_active)
- return 0;
-
- cpu_data->epp_saved = intel_pstate_get_epp(cpu_data, 0);
-
- return 0;
+ wrmsrl_on_cpu(cpu->cpu, MSR_HWP_REQUEST, value);
}
#define POWER_CTL_EE_ENABLE 1
@@ -971,8 +961,28 @@ static void set_power_ctl_ee_state(bool input)
static void intel_pstate_hwp_enable(struct cpudata *cpudata);
+static void intel_pstate_hwp_reenable(struct cpudata *cpu)
+{
+ intel_pstate_hwp_enable(cpu);
+ wrmsrl_on_cpu(cpu->cpu, MSR_HWP_REQUEST, READ_ONCE(cpu->hwp_req_cached));
+}
+
+static int intel_pstate_suspend(struct cpufreq_policy *policy)
+{
+ struct cpudata *cpu = all_cpu_data[policy->cpu];
+
+ pr_debug("CPU %d suspending\n", cpu->cpu);
+
+ cpu->suspended = true;
+
+ return 0;
+}
+
static int intel_pstate_resume(struct cpufreq_policy *policy)
{
+ struct cpudata *cpu = all_cpu_data[policy->cpu];
+
+ pr_debug("CPU %d resuming\n", cpu->cpu);
/* Only restore if the system default is changed */
if (power_ctl_ee_state == POWER_CTL_EE_ENABLE)
@@ -980,18 +990,16 @@ static int intel_pstate_resume(struct cpufreq_policy *policy)
else if (power_ctl_ee_state == POWER_CTL_EE_DISABLE)
set_power_ctl_ee_state(false);
- if (!hwp_active)
- return 0;
+ if (cpu->suspended && hwp_active) {
+ mutex_lock(&intel_pstate_limits_lock);
- mutex_lock(&intel_pstate_limits_lock);
-
- if (policy->cpu == 0)
- intel_pstate_hwp_enable(all_cpu_data[policy->cpu]);
+ /* Re-enable HWP, because "online" has not done that. */
+ intel_pstate_hwp_reenable(cpu);
- all_cpu_data[policy->cpu]->epp_policy = 0;
- intel_pstate_hwp_set(policy->cpu);
+ mutex_unlock(&intel_pstate_limits_lock);
+ }
- mutex_unlock(&intel_pstate_limits_lock);
+ cpu->suspended = false;
return 0;
}
@@ -1440,7 +1448,6 @@ static void intel_pstate_hwp_enable(struct cpudata *cpudata)
wrmsrl_on_cpu(cpudata->cpu, MSR_HWP_INTERRUPT, 0x00);
wrmsrl_on_cpu(cpudata->cpu, MSR_PM_ENABLE, 0x1);
- cpudata->epp_policy = 0;
if (cpudata->epp_default == -EINVAL)
cpudata->epp_default = intel_pstate_get_epp(cpudata, 0);
}
@@ -2111,7 +2118,6 @@ static int intel_pstate_init_cpu(unsigned int cpunum)
cpu->epp_default = -EINVAL;
cpu->epp_powersave = -EINVAL;
- cpu->epp_saved = -EINVAL;
}
cpu = all_cpu_data[cpunum];
@@ -2122,6 +2128,7 @@ static int intel_pstate_init_cpu(unsigned int cpunum)
const struct x86_cpu_id *id;
intel_pstate_hwp_enable(cpu);
+ cpu->epp_policy = 0;
id = x86_match_cpu(intel_pstate_hwp_boost_ids);
if (id && intel_pstate_acpi_pm_profile_server())
@@ -2308,28 +2315,61 @@ static int intel_pstate_verify_policy(struct cpufreq_policy_data *policy)
return 0;
}
-static void intel_cpufreq_stop_cpu(struct cpufreq_policy *policy)
+static int intel_pstate_cpu_offline(struct cpufreq_policy *policy)
{
+ struct cpudata *cpu = all_cpu_data[policy->cpu];
+
+ pr_debug("CPU %d going offline\n", cpu->cpu);
+
+ if (cpu->suspended)
+ return 0;
+
+ /*
+ * If the CPU is an SMT thread and it goes offline with the performance
+ * settings different from the minimum, it will prevent its sibling
+ * from getting to lower performance levels, so force the minimum
+ * performance on CPU offline to prevent that from happening.
+ */
if (hwp_active)
- intel_pstate_hwp_force_min_perf(policy->cpu);
+ intel_pstate_hwp_offline(cpu);
else
- intel_pstate_set_min_pstate(all_cpu_data[policy->cpu]);
+ intel_pstate_set_min_pstate(cpu);
+
+ intel_pstate_exit_perf_limits(policy);
+
+ return 0;
+}
+
+static int intel_pstate_cpu_online(struct cpufreq_policy *policy)
+{
+ struct cpudata *cpu = all_cpu_data[policy->cpu];
+
+ pr_debug("CPU %d going online\n", cpu->cpu);
+
+ intel_pstate_init_acpi_perf_limits(policy);
+
+ if (hwp_active) {
+ /*
+ * Re-enable HWP and clear the "suspended" flag to let "resume"
+ * know that it need not do that.
+ */
+ intel_pstate_hwp_reenable(cpu);
+ cpu->suspended = false;
+ }
+
+ return 0;
}
static void intel_pstate_stop_cpu(struct cpufreq_policy *policy)
{
- pr_debug("CPU %d exiting\n", policy->cpu);
+ pr_debug("CPU %d stopping\n", policy->cpu);
intel_pstate_clear_update_util_hook(policy->cpu);
- if (hwp_active)
- intel_pstate_hwp_save_state(policy);
-
- intel_cpufreq_stop_cpu(policy);
}
static int intel_pstate_cpu_exit(struct cpufreq_policy *policy)
{
- intel_pstate_exit_perf_limits(policy);
+ pr_debug("CPU %d exiting\n", policy->cpu);
policy->fast_switch_possible = false;
@@ -2403,11 +2443,13 @@ static struct cpufreq_driver intel_pstate = {
.flags = CPUFREQ_CONST_LOOPS,
.verify = intel_pstate_verify_policy,
.setpolicy = intel_pstate_set_policy,
- .suspend = intel_pstate_hwp_save_state,
+ .suspend = intel_pstate_suspend,
.resume = intel_pstate_resume,
.init = intel_pstate_cpu_init,
.exit = intel_pstate_cpu_exit,
.stop_cpu = intel_pstate_stop_cpu,
+ .offline = intel_pstate_cpu_offline,
+ .online = intel_pstate_cpu_online,
.update_limits = intel_pstate_update_limits,
.name = "intel_pstate",
};
@@ -2662,7 +2704,10 @@ static struct cpufreq_driver intel_cpufreq = {
.fast_switch = intel_cpufreq_fast_switch,
.init = intel_cpufreq_cpu_init,
.exit = intel_cpufreq_cpu_exit,
- .stop_cpu = intel_cpufreq_stop_cpu,
+ .offline = intel_pstate_cpu_offline,
+ .online = intel_pstate_cpu_online,
+ .suspend = intel_pstate_suspend,
+ .resume = intel_pstate_resume,
.update_limits = intel_pstate_update_limits,
.name = "intel_cpufreq",
};
--
2.26.2
From: "Rafael J. Wysocki" <[email protected]>
When intel_pstate switches the operation mode from "active" to
"passive" or the other way around, freeing its data structures
representing CPUs and allocating them again from scratch is
unnecessary and wasteful. Moreover, if these data structures are
preserved, the cached HWP Request MSR value stored in them can be
written back to the MSR up front to reinitialize it and help
restore the previously set EPP value (the EPP is set to 0xFF when
CPUs go offline to allow their SMT siblings to use the full range of
EPP values, and that also happens when the driver gets unregistered).
Accordingly, modify the driver to only do a full cleanup on driver
object registration errors and when its status is changed to "off"
via sysfs and to write the cached HWP Request MSR value back to
the MSR on CPU init if the data structure representing the given
CPU is still there.
Signed-off-by: Rafael J. Wysocki <[email protected]>
---
v1 -> v2: Rearrange intel_pstate_init_cpu() to restore some of the previous
behavior of it to retain the current active-mode EPP management.
v2 -> v3:
* Rebase (it was [4/5] previously).
v3 -> v4:
* Re-enable HWP in "init" even if the data structures are in there.
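For illustration only (not part of the patch): a minimal user-space sketch of why keeping the cached HWP Request value around helps. It builds the request used while a CPU is offline (EPP forced to 0xFF, min and max both set to the lowest level) without modifying the cached value, so the previous EPP can be written back later. The names are hypothetical, and the field layout (min in bits 7:0, max in bits 15:8, EPP in bits 31:24) is my reading of the HWP Request MSR layout rather than something spelled out in this series.

```c
#include <stdint.h>

#define MODEL_EPP_SHIFT		24
#define MODEL_EPP_POWERSAVE	UINT64_C(0xFF)

/*
 * Build the HWP request used while a CPU is offline: keep the upper
 * 32 bits of the cached value, clear the lower 32 bits, then set
 * min == max == lowest_perf and the EPP to "powersave" (0xFF).
 */
static uint64_t model_offline_request(uint64_t cached, uint8_t lowest_perf)
{
	uint64_t value = cached & ~UINT64_C(0xFFFFFFFF);

	value |= (uint64_t)lowest_perf;			/* min, bits 7:0 */
	value |= (uint64_t)lowest_perf << 8;		/* max, bits 15:8 */
	value |= MODEL_EPP_POWERSAVE << MODEL_EPP_SHIFT;	/* EPP, bits 31:24 */

	return value;
}

/* Extract the EPP field (bits 31:24) from a request value. */
static uint8_t model_request_epp(uint64_t value)
{
	return (uint8_t)((value >> MODEL_EPP_SHIFT) & 0xFF);
}
```

Because the offline request is derived from the cached value rather than stored over it, writing the cached value back on CPU init restores the EPP that was in effect before.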
---
drivers/cpufreq/intel_pstate.c | 57 ++++++++++++++--------------------
1 file changed, 24 insertions(+), 33 deletions(-)
diff --git a/drivers/cpufreq/intel_pstate.c b/drivers/cpufreq/intel_pstate.c
index 8181a1f1dc79..c92c085fc495 100644
--- a/drivers/cpufreq/intel_pstate.c
+++ b/drivers/cpufreq/intel_pstate.c
@@ -2116,25 +2116,31 @@ static int intel_pstate_init_cpu(unsigned int cpunum)
all_cpu_data[cpunum] = cpu;
- cpu->epp_default = -EINVAL;
- cpu->epp_powersave = -EINVAL;
- }
+ cpu->cpu = cpunum;
- cpu = all_cpu_data[cpunum];
+ cpu->epp_default = -EINVAL;
- cpu->cpu = cpunum;
+ if (hwp_active) {
+ const struct x86_cpu_id *id;
- if (hwp_active) {
- const struct x86_cpu_id *id;
+ intel_pstate_hwp_enable(cpu);
- intel_pstate_hwp_enable(cpu);
- cpu->epp_policy = 0;
-
- id = x86_match_cpu(intel_pstate_hwp_boost_ids);
- if (id && intel_pstate_acpi_pm_profile_server())
- hwp_boost = true;
+ id = x86_match_cpu(intel_pstate_hwp_boost_ids);
+ if (id && intel_pstate_acpi_pm_profile_server())
+ hwp_boost = true;
+ }
+ } else if (hwp_active) {
+ /*
+ * Re-enable HWP in case this happens after a resume from ACPI
+ * S3 if the CPU was offline during the whole suspend/resume
+ * cycle.
+ */
+ intel_pstate_hwp_reenable(cpu);
}
+ cpu->epp_powersave = -EINVAL;
+ cpu->epp_policy = 0;
+
intel_pstate_get_cpu_pstates(cpu);
pr_debug("controlling: cpu %d\n", cpunum);
@@ -2730,9 +2736,6 @@ static void intel_pstate_driver_cleanup(void)
}
put_online_cpus();
- if (intel_pstate_driver == &intel_pstate)
- intel_pstate_sysfs_hide_hwp_dynamic_boost();
-
intel_pstate_driver = NULL;
}
@@ -2758,14 +2761,6 @@ static int intel_pstate_register_driver(struct cpufreq_driver *driver)
return 0;
}
-static int intel_pstate_unregister_driver(void)
-{
- cpufreq_unregister_driver(intel_pstate_driver);
- intel_pstate_driver_cleanup();
-
- return 0;
-}
-
static ssize_t intel_pstate_show_status(char *buf)
{
if (!intel_pstate_driver)
@@ -2777,8 +2772,6 @@ static ssize_t intel_pstate_show_status(char *buf)
static int intel_pstate_update_status(const char *buf, size_t size)
{
- int ret;
-
if (size == 3 && !strncmp(buf, "off", size)) {
if (!intel_pstate_driver)
return -EINVAL;
@@ -2786,7 +2779,8 @@ static int intel_pstate_update_status(const char *buf, size_t size)
if (hwp_active)
return -EBUSY;
- return intel_pstate_unregister_driver();
+ cpufreq_unregister_driver(intel_pstate_driver);
+ intel_pstate_driver_cleanup();
}
if (size == 6 && !strncmp(buf, "active", size)) {
@@ -2794,9 +2788,7 @@ static int intel_pstate_update_status(const char *buf, size_t size)
if (intel_pstate_driver == &intel_pstate)
return 0;
- ret = intel_pstate_unregister_driver();
- if (ret)
- return ret;
+ cpufreq_unregister_driver(intel_pstate_driver);
}
return intel_pstate_register_driver(&intel_pstate);
@@ -2807,9 +2799,8 @@ static int intel_pstate_update_status(const char *buf, size_t size)
if (intel_pstate_driver == &intel_cpufreq)
return 0;
- ret = intel_pstate_unregister_driver();
- if (ret)
- return ret;
+ cpufreq_unregister_driver(intel_pstate_driver);
+ intel_pstate_sysfs_hide_hwp_dynamic_boost();
}
return intel_pstate_register_driver(&intel_cpufreq);
--
2.26.2
From: "Rafael J. Wysocki" <[email protected]>
Modify the EPP sysfs interface to reject attempts to change the EPP
to values different from 0 ("performance") in the active mode with
the "performance" policy (i.e. scaling_governor set to "performance"),
to avoid situations in which the kernel appears to discard data
passed to it via the EPP sysfs attribute.
Signed-off-by: Rafael J. Wysocki <[email protected]>
Reviewed-by: Artem Bityutskiy <[email protected]>
---
v2 -> v3: New patch
v3 -> v4: Add the R-by from Artem
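For illustration only (not part of the patch): a minimal user-space sketch of the check added to the EPP sysfs path. The enum and function names are hypothetical; only the rejection rule itself is taken from the patch below.

```c
#include <errno.h>

/* Hypothetical stand-ins for the cpufreq policy constants. */
enum model_policy {
	MODEL_POLICY_POWERSAVE,
	MODEL_POLICY_PERFORMANCE,
};

/*
 * Returns 0 if the requested EPP may be written, or -EBUSY if it would
 * be overridden anyway because the "performance" policy is in effect.
 */
static int model_check_epp(enum model_policy policy, int epp)
{
	if (epp > 0 && policy == MODEL_POLICY_PERFORMANCE)
		return -EBUSY;

	return 0;
}
```

An EPP of 0 ("performance") is still accepted under the "performance" policy, so user space only sees an error when the value it asked for could not take effect.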
---
Documentation/admin-guide/pm/intel_pstate.rst | 4 +++-
drivers/cpufreq/intel_pstate.c | 8 ++++++++
2 files changed, 11 insertions(+), 1 deletion(-)
diff --git a/Documentation/admin-guide/pm/intel_pstate.rst b/Documentation/admin-guide/pm/intel_pstate.rst
index cdd1a9a7f9a2..5072e7064d13 100644
--- a/Documentation/admin-guide/pm/intel_pstate.rst
+++ b/Documentation/admin-guide/pm/intel_pstate.rst
@@ -123,7 +123,9 @@ Energy-Performance Bias (EPB) knob (otherwise), which means that the processor's
internal P-state selection logic is expected to focus entirely on performance.
This will override the EPP/EPB setting coming from the ``sysfs`` interface
-(see `Energy vs Performance Hints`_ below).
+(see `Energy vs Performance Hints`_ below). Moreover, any attempts to change
+the EPP/EPB to a value different from 0 ("performance") via ``sysfs`` in this
+configuration will be rejected.
Also, in this configuration the range of P-states available to the processor's
internal P-state selection logic is always restricted to the upper boundary
diff --git a/drivers/cpufreq/intel_pstate.c b/drivers/cpufreq/intel_pstate.c
index e540448e0bd0..b308c39b6204 100644
--- a/drivers/cpufreq/intel_pstate.c
+++ b/drivers/cpufreq/intel_pstate.c
@@ -684,6 +684,14 @@ static int intel_pstate_set_energy_pref_index(struct cpudata *cpu_data,
else if (epp == -EINVAL)
epp = epp_values[pref_index - 1];
+ /*
+ * To avoid confusion, refuse to set EPP to any values different
+ * from 0 (performance) if the current policy is "performance",
+ * because those values would be overridden.
+ */
+ if (epp > 0 && cpu_data->policy == CPUFREQ_POLICY_PERFORMANCE)
+ return -EBUSY;
+
ret = intel_pstate_set_epp(cpu_data, epp);
} else {
if (epp == -EINVAL)
--
2.26.2
On Tue, 2020-09-01 at 19:23 +0200, Rafael J. Wysocki wrote:
> Hi All,
>
> The last two patches in the v3 needed to be updated to take re-enabling of HWP
> after an ACPI S3 suspend/resume cycle into account appropriately. The first
> three patches are the same as before.
>
> The purpose of this series is to address some peculiarities related to
> taking CPUs offline/online and switching between different operation
> modes with HWP enabled that have become visible after allowing the
> driver to work in the passive mode with HWP enabled in 5.9-rc1 (and
> one that was there earlier, but can be addressed easily after the
> changes made in 5.9-rc1).
>
> Please refer to the patch changelogs for details.
>
> For easier testing/review, the series is available from the git branch at:
>
> git://git.kernel.org/pub/scm/linux/kernel/git/rafael/linux-pm.git \
> intel_pstate-testing
>
> I've done my best to address all of the possible corner cases, but the test
> matrix is quite extensive and I may have missed something, so go ahead
> and test.
Acked-by: Srinivas Pandruvada <[email protected]>
>
> Thanks,
> Rafael