2024-01-03 04:13:31

by Ricardo Neri

[permalink] [raw]
Subject: [PATCH v2 0/4] thermal: intel: hfi: Fix memory corruption on resume from hibernation

Hi,

This is v2 of this series. You can read the overview and motivation in the
cover letter of v1 [1].

I smoke-retested this version on a Meteor Lake system. It completed 50
cycles of suspend-to-disk and resume. I completed longer testing for v1.

Changes since v1:
* Added locking to hfi_pm_notify() to serialize RMW operations on the
MSR_IA32_HW_FEEDBACK_CONFIG register. (Rafael)
* Relocated a comment regarding the reallocation of HFI memory to
patch 3/4. (Rafael)
* Clarified that patch 1/4 does not introduce functional changes.
(Rafael)
* Indicated the first stable version on which this patchset should be
backported.
* Renamed hfi_do_pm_[enable|disable]() as hfi_do_[enable|disable]() for
future reuse. (Stan)
* Registered the HFI suspend notifier towards the end of
intel_hfi_init(). (Stan)

Thanks and BR,
Ricardo

[1]. https://lore.kernel.org/all/[email protected]/

Ricardo Neri (4):
thermal: intel: hfi: Refactor enabling code into helper functions
thermal: intel: hfi: Enable an HFI instance from its first online CPU
thermal: intel: hfi: Disable an HFI instance when all its CPUs go
offline
thermal: intel: hfi: Add a suspend notifier

drivers/thermal/intel/intel_hfi.c | 153 +++++++++++++++++++++++++-----
1 file changed, 127 insertions(+), 26 deletions(-)

--
2.25.1



2024-01-03 04:13:48

by Ricardo Neri

[permalink] [raw]
Subject: [PATCH v2 1/4] thermal: intel: hfi: Refactor enabling code into helper functions

In preparation for the addition of a suspend notifier, wrap the logic to
enable HFI and program its memory buffer into helper functions. Both the
CPU hotplug callback and the suspend notifier will use them.

This refactoring does not introduce functional changes.

Cc: Chen Yu <[email protected]>
Cc: Len Brown <[email protected]>
Cc: Srinivas Pandruvada <[email protected]>
Cc: Stanislaw Gruszka <[email protected]>
Cc: Zhang Rui <[email protected]>
Cc: Zhao Liu <[email protected]>
Cc: [email protected]
Cc: [email protected] # 6.1
Signed-off-by: Ricardo Neri <[email protected]>
---
Changes since v1:
* Warned callers of hfi_enable() to hold hfi_instance_lock.
* Relocated comment about not freeing the HFI pages to the patch that
introduces hfi_disable(). (Rafael)
* Updated the changeset description to clarify that there are no
functional changes. (Rafael)
---
drivers/thermal/intel/intel_hfi.c | 43 ++++++++++++++++---------------
1 file changed, 22 insertions(+), 21 deletions(-)

diff --git a/drivers/thermal/intel/intel_hfi.c b/drivers/thermal/intel/intel_hfi.c
index c69db6c90869..820613e293cd 100644
--- a/drivers/thermal/intel/intel_hfi.c
+++ b/drivers/thermal/intel/intel_hfi.c
@@ -347,6 +347,26 @@ static void init_hfi_instance(struct hfi_instance *hfi_instance)
hfi_instance->data = hfi_instance->hdr + hfi_features.hdr_size;
}

+/* Caller must hold hfi_instance_lock. */
+static void hfi_enable(void)
+{
+ u64 msr_val;
+
+ rdmsrl(MSR_IA32_HW_FEEDBACK_CONFIG, msr_val);
+ msr_val |= HW_FEEDBACK_CONFIG_HFI_ENABLE_BIT;
+ wrmsrl(MSR_IA32_HW_FEEDBACK_CONFIG, msr_val);
+}
+
+static void hfi_set_hw_table(struct hfi_instance *hfi_instance)
+{
+ phys_addr_t hw_table_pa;
+ u64 msr_val;
+
+ hw_table_pa = virt_to_phys(hfi_instance->hw_table);
+ msr_val = hw_table_pa | HW_FEEDBACK_PTR_VALID_BIT;
+ wrmsrl(MSR_IA32_HW_FEEDBACK_PTR, msr_val);
+}
+
/**
* intel_hfi_online() - Enable HFI on @cpu
* @cpu: CPU in which the HFI will be enabled
@@ -364,8 +384,6 @@ void intel_hfi_online(unsigned int cpu)
{
struct hfi_instance *hfi_instance;
struct hfi_cpu_info *info;
- phys_addr_t hw_table_pa;
- u64 msr_val;
u16 die_id;

/* Nothing to do if hfi_instances are missing. */
@@ -409,8 +427,6 @@ void intel_hfi_online(unsigned int cpu)
if (!hfi_instance->hw_table)
goto unlock;

- hw_table_pa = virt_to_phys(hfi_instance->hw_table);
-
/*
* Allocate memory to keep a local copy of the table that
* hardware generates.
@@ -420,16 +436,6 @@ void intel_hfi_online(unsigned int cpu)
if (!hfi_instance->local_table)
goto free_hw_table;

- /*
- * Program the address of the feedback table of this die/package. On
- * some processors, hardware remembers the old address of the HFI table
- * even after having been reprogrammed and re-enabled. Thus, do not free
- * the pages allocated for the table or reprogram the hardware with a
- * new base address. Namely, program the hardware only once.
- */
- msr_val = hw_table_pa | HW_FEEDBACK_PTR_VALID_BIT;
- wrmsrl(MSR_IA32_HW_FEEDBACK_PTR, msr_val);
-
init_hfi_instance(hfi_instance);

INIT_DELAYED_WORK(&hfi_instance->update_work, hfi_update_work_fn);
@@ -438,13 +444,8 @@ void intel_hfi_online(unsigned int cpu)

cpumask_set_cpu(cpu, hfi_instance->cpus);

- /*
- * Enable the hardware feedback interface and never disable it. See
- * comment on programming the address of the table.
- */
- rdmsrl(MSR_IA32_HW_FEEDBACK_CONFIG, msr_val);
- msr_val |= HW_FEEDBACK_CONFIG_HFI_ENABLE_BIT;
- wrmsrl(MSR_IA32_HW_FEEDBACK_CONFIG, msr_val);
+ hfi_set_hw_table(hfi_instance);
+ hfi_enable();

unlock:
mutex_unlock(&hfi_instance_lock);
--
2.25.1


2024-01-03 04:13:51

by Ricardo Neri

[permalink] [raw]
Subject: [PATCH v2 2/4] thermal: intel: hfi: Enable an HFI instance from its first online CPU

Previously, HFI instances were never disabled once enabled. A CPU in an
instance only had to check during boot whether another CPU had previously
initialized the instance and its corresponding data structure.

A subsequent changeset will add functionality to disable instances
to support hibernation. Such change will also make possible to disable an
HFI instance during runtime via CPU hotplug.

Enable an HFI instance from the first of its CPUs that comes online. This
covers the boot, CPU hotplug, and resume-from-suspend cases. It also covers
systems with one or more HFI instances (i.e., packages).

Cc: Chen Yu <[email protected]>
Cc: Len Brown <[email protected]>
Cc: Srinivas Pandruvada <[email protected]>
Cc: Stanislaw Gruszka <[email protected]>
Cc: Zhang Rui <[email protected]>
Cc: Zhao Liu <[email protected]>
Cc: [email protected]
Cc: [email protected] # 6.1
Signed-off-by: Ricardo Neri <[email protected]>
---
Changes since v1:
* None
---
drivers/thermal/intel/intel_hfi.c | 17 ++++++++++-------
1 file changed, 10 insertions(+), 7 deletions(-)

diff --git a/drivers/thermal/intel/intel_hfi.c b/drivers/thermal/intel/intel_hfi.c
index 820613e293cd..713da8befd40 100644
--- a/drivers/thermal/intel/intel_hfi.c
+++ b/drivers/thermal/intel/intel_hfi.c
@@ -410,13 +410,12 @@ void intel_hfi_online(unsigned int cpu)
/*
* Now check if the HFI instance of the package/die of @cpu has been
* initialized (by checking its header). In such case, all we have to
- * do is to add @cpu to this instance's cpumask.
+ * do is to add @cpu to this instance's cpumask and enable the instance
+ * if needed.
*/
mutex_lock(&hfi_instance_lock);
- if (hfi_instance->hdr) {
- cpumask_set_cpu(cpu, hfi_instance->cpus);
- goto unlock;
- }
+ if (hfi_instance->hdr)
+ goto enable;

/*
* Hardware is programmed with the physical address of the first page
@@ -442,10 +441,14 @@ void intel_hfi_online(unsigned int cpu)
raw_spin_lock_init(&hfi_instance->table_lock);
raw_spin_lock_init(&hfi_instance->event_lock);

+enable:
cpumask_set_cpu(cpu, hfi_instance->cpus);

- hfi_set_hw_table(hfi_instance);
- hfi_enable();
+ /* Enable this HFI instance if this is its first online CPU. */
+ if (cpumask_weight(hfi_instance->cpus) == 1) {
+ hfi_set_hw_table(hfi_instance);
+ hfi_enable();
+ }

unlock:
mutex_unlock(&hfi_instance_lock);
--
2.25.1


2024-01-03 04:14:15

by Ricardo Neri

[permalink] [raw]
Subject: [PATCH v2 3/4] thermal: intel: hfi: Disable an HFI instance when all its CPUs go offline

In preparation to support hibernation, add functionality to disable an HFI
instance during CPU offline. The last CPU of an instance that goes offline
will disable such instance.

The Intel Software Development Manual states that the operating system must
wait for the hardware to set MSR_IA32_PACKAGE_THERM_STATUS[26] after
disabling an HFI instance to ensure that it will no longer write on the HFI
memory. Some processors, however, do not ever set such bit. Wait a minimum
of 2ms to give time hardware to complete any pending memory writes.

Cc: Chen Yu <[email protected]>
Cc: Len Brown <[email protected]>
Cc: Srinivas Pandruvada <[email protected]>
Cc: Stanislaw Gruszka <[email protected]>
Cc: Zhang Rui <[email protected]>
Cc: Zhao Liu <[email protected]>
Cc: [email protected]
Cc: [email protected] # 6.1
Signed-off-by: Ricardo Neri <[email protected]>
---
Changes since v1:
* Relocated here comment about not freeing the HFI pages. (Rafael)
---
drivers/thermal/intel/intel_hfi.c | 35 +++++++++++++++++++++++++++++++
1 file changed, 35 insertions(+)

diff --git a/drivers/thermal/intel/intel_hfi.c b/drivers/thermal/intel/intel_hfi.c
index 713da8befd40..22445403b520 100644
--- a/drivers/thermal/intel/intel_hfi.c
+++ b/drivers/thermal/intel/intel_hfi.c
@@ -24,6 +24,7 @@
#include <linux/bitops.h>
#include <linux/cpufeature.h>
#include <linux/cpumask.h>
+#include <linux/delay.h>
#include <linux/gfp.h>
#include <linux/io.h>
#include <linux/kernel.h>
@@ -367,6 +368,32 @@ static void hfi_set_hw_table(struct hfi_instance *hfi_instance)
wrmsrl(MSR_IA32_HW_FEEDBACK_PTR, msr_val);
}

+/* Caller must hold hfi_instance_lock. */
+static void hfi_disable(void)
+{
+ u64 msr_val;
+ int i;
+
+ rdmsrl(MSR_IA32_HW_FEEDBACK_CONFIG, msr_val);
+ msr_val &= ~HW_FEEDBACK_CONFIG_HFI_ENABLE_BIT;
+ wrmsrl(MSR_IA32_HW_FEEDBACK_CONFIG, msr_val);
+
+ /*
+ * Wait for hardware to acknowledge the disabling of HFI. Some
+ * processors may not do it. Wait for ~2ms. This is a reasonable
+ * time for hardware to complete any pending actions on the HFI
+ * memory.
+ */
+ for (i = 0; i < 2000; i++) {
+ rdmsrl(MSR_IA32_PACKAGE_THERM_STATUS, msr_val);
+ if (msr_val & PACKAGE_THERM_STATUS_HFI_UPDATED)
+ break;
+
+ udelay(1);
+ cpu_relax();
+ }
+}
+
/**
* intel_hfi_online() - Enable HFI on @cpu
* @cpu: CPU in which the HFI will be enabled
@@ -420,6 +447,10 @@ void intel_hfi_online(unsigned int cpu)
/*
* Hardware is programmed with the physical address of the first page
* frame of the table. Hence, the allocated memory must be page-aligned.
+ *
+ * Some processors do not forget the initial address of the HFI table
+ * even after having been reprogrammed. Keep using the same pages. Do
+ * not free them.
*/
hfi_instance->hw_table = alloc_pages_exact(hfi_features.nr_table_pages,
GFP_KERNEL | __GFP_ZERO);
@@ -488,6 +519,10 @@ void intel_hfi_offline(unsigned int cpu)

mutex_lock(&hfi_instance_lock);
cpumask_clear_cpu(cpu, hfi_instance->cpus);
+
+ if (!cpumask_weight(hfi_instance->cpus))
+ hfi_disable();
+
mutex_unlock(&hfi_instance_lock);
}

--
2.25.1


2024-01-03 04:14:15

by Ricardo Neri

[permalink] [raw]
Subject: [PATCH v2 4/4] thermal: intel: hfi: Add a suspend notifier

The kernel gives the HFI hardware a memory region that the latter uses to
provide updates to the HFI table. The kernel allocates this memory region
at boot. It remains constant throughout runtime time.

When resuming from suspend or hibernation, the restore kernel allocates a
second memory buffer and reprograms the HFI hardware with the new location
as part of a normal boot. The location of the second memory buffer may
differ from the one allocated by the image kernel. Subsequently, when the
restore kernel transfers control to the image kernel, the second buffer
becomes invalid, potentially leading to memory corruption if the hardware
writes to it (hardware continues using the buffer from the restore kernel).

Add a suspend notifier to disable all HFI instances before jumping to the
image kernel and enable them once the image kernel has been restored. Use
the memory buffer that the image kernel allocated.

For non-boot CPUs, rely on the CPU hotplug callbacks as CPUs are disabled
and enabled during suspend and resume, respectively.

The CPU hotplug callbacks do not cover the boot CPU. Handle the HFI
instance of the boot CPU from the suspend notifier callback.

Cc: Chen Yu <[email protected]>
Cc: Len Brown <[email protected]>
Cc: Srinivas Pandruvada <[email protected]>
Cc: Stanislaw Gruszka <[email protected]>
Cc: Zhang Rui <[email protected]>
Cc: Zhao Liu <[email protected]>
Cc: [email protected]
Cc: [email protected] # 6.1
Signed-off-by: Ricardo Neri <[email protected]>
--
Changes since v1:
* Moved registration of the suspend notifier towards the end of
intel_hfi_init(). (Stan)
* Renamed hfi_do_pm_[enable|disable]() to hfi_do_[enable|disable](). Stan
will use these functions outside the suspend notifier. (Stan)
* Added locking to calls to hfi_[enable|disable]() from the suspend
notifier. (Rafael)
---
drivers/thermal/intel/intel_hfi.c | 62 +++++++++++++++++++++++++++++++
1 file changed, 62 insertions(+)

diff --git a/drivers/thermal/intel/intel_hfi.c b/drivers/thermal/intel/intel_hfi.c
index 22445403b520..8d6e4f8dc67a 100644
--- a/drivers/thermal/intel/intel_hfi.c
+++ b/drivers/thermal/intel/intel_hfi.c
@@ -30,11 +30,13 @@
#include <linux/kernel.h>
#include <linux/math.h>
#include <linux/mutex.h>
+#include <linux/notifier.h>
#include <linux/percpu-defs.h>
#include <linux/printk.h>
#include <linux/processor.h>
#include <linux/slab.h>
#include <linux/spinlock.h>
+#include <linux/suspend.h>
#include <linux/string.h>
#include <linux/topology.h>
#include <linux/workqueue.h>
@@ -571,6 +573,60 @@ static __init int hfi_parse_features(void)
return 0;
}

+static void hfi_do_enable(void *info)
+{
+ struct hfi_instance *hfi_instance = info;
+
+ hfi_set_hw_table(hfi_instance);
+ hfi_enable();
+}
+
+static void hfi_do_disable(void *info)
+{
+ hfi_disable();
+}
+
+static int hfi_pm_notify(struct notifier_block *nb,
+ unsigned long mode, void *unused)
+{
+ struct hfi_cpu_info *info = &per_cpu(hfi_cpu_info, 0);
+ struct hfi_instance *hfi = info->hfi_instance;
+ int ret = 0;
+
+ /* HFI may not be in use. */
+ if (!hfi)
+ return ret;
+
+ mutex_lock(&hfi_instance_lock);
+ /*
+ * Only handle the HFI instance of the package of the boot CPU. The
+ * instances of other packages are handled in the CPU hotplug callbacks.
+ */
+ switch (mode) {
+ case PM_HIBERNATION_PREPARE:
+ case PM_SUSPEND_PREPARE:
+ case PM_RESTORE_PREPARE:
+ ret = smp_call_function_single(0, hfi_do_disable, NULL, true);
+ break;
+
+ case PM_POST_RESTORE:
+ case PM_POST_HIBERNATION:
+ case PM_POST_SUSPEND:
+ ret = smp_call_function_single(0, hfi_do_enable, hfi, true);
+ break;
+ default:
+ return -EINVAL;
+ }
+
+ mutex_unlock(&hfi_instance_lock);
+
+ return ret;
+}
+
+static struct notifier_block hfi_pm_nb = {
+ .notifier_call = hfi_pm_notify,
+};
+
void __init intel_hfi_init(void)
{
struct hfi_instance *hfi_instance;
@@ -602,8 +658,14 @@ void __init intel_hfi_init(void)
if (!hfi_updates_wq)
goto err_nomem;

+ if (register_pm_notifier(&hfi_pm_nb))
+ goto err_pm_notif;
+
return;

+err_pm_notif:
+ destroy_workqueue(hfi_updates_wq);
+
err_nomem:
for (j = 0; j < i; ++j) {
hfi_instance = &hfi_instances[j];
--
2.25.1


2024-01-03 13:15:56

by Rafael J. Wysocki

[permalink] [raw]
Subject: Re: [PATCH v2 0/4] thermal: intel: hfi: Fix memory corruption on resume from hibernation

On Wed, Jan 3, 2024 at 5:13 AM Ricardo Neri
<[email protected]> wrote:
>
> Hi,
>
> This is v2 of this series. You can read the overview and motivation in the
> cover letter of v1 [1].
>
> I smoke-retested this version on a Meteor Lake system. It completed 50
> cycles of suspend-to-disk and resume. I completed longer testing for v1.
>
> Changes since v1:
> * Added locking to hfi_pm_notify() to serialize RMW operations on the
> MSR_IA32_HW_FEEDBACK_CONFIG register. (Rafael)
> * Relocated a comment regarding the reallocation of HFI memory to
> patch 3/4. (Rafael)
> * Clarified that patch 1/4 does not introduce functional changes.
> (Rafael)
> * Indicated the first stable version on which this patchset should be
> backported.
> * Renamed hfi_do_pm_[enable|disable]() as hfi_do_[enable|disable]() for
> future reuse. (Stan)
> * Registered the HFI suspend notifier towards the end of
> intel_hfi_init(). (Stan)
>
> Thanks and BR,
> Ricardo
>
> [1]. https://lore.kernel.org/all/[email protected]/
>
> Ricardo Neri (4):
> thermal: intel: hfi: Refactor enabling code into helper functions
> thermal: intel: hfi: Enable an HFI instance from its first online CPU
> thermal: intel: hfi: Disable an HFI instance when all its CPUs go
> offline
> thermal: intel: hfi: Add a suspend notifier
>
> drivers/thermal/intel/intel_hfi.c | 153 +++++++++++++++++++++++++-----
> 1 file changed, 127 insertions(+), 26 deletions(-)
>
> --

I've queued up the first 3 patches from the series for 6.8 as they
make sense even without the [4/4] IMO.

I still have some comments on the last one, though, but let me reply
to it directly.

2024-01-03 13:35:01

by Rafael J. Wysocki

[permalink] [raw]
Subject: Re: [PATCH v2 4/4] thermal: intel: hfi: Add a suspend notifier

The subject should say "add a PM notifier" to indicate that
hibernation is covered too.

On Wed, Jan 3, 2024 at 5:13 AM Ricardo Neri
<[email protected]> wrote:
>
> The kernel gives the HFI hardware a memory region that the latter uses to
> provide updates to the HFI table. The kernel allocates this memory region
> at boot. It remains constant throughout runtime time.
>
> When resuming from suspend or hibernation, the restore kernel allocates a

The restore kernel is only used during resume from hibernation, so
this particular problem is hibernation-specific.

It is possible, at least in principle, that the address of the HFI
table is "lost" by the processor during resume from "deep" suspend
(ACPI S3), in which case it may not survive the firmware-driven part
of the suspend-resume cycle. It is thus prudent to disable HFI on
suspend and re-enable it on resume for the boot CPU (under the
assumption that the other CPUs will be taken care of by CPU offline),
but for a somewhat different reason than in the hibernation case.

> second memory buffer and reprograms the HFI hardware with the new location
> as part of a normal boot. The location of the second memory buffer may
> differ from the one allocated by the image kernel. Subsequently, when the
> restore kernel transfers control to the image kernel, the second buffer
> becomes invalid, potentially leading to memory corruption if the hardware
> writes to it (hardware continues using the buffer from the restore kernel).
>
> Add a suspend notifier to disable all HFI instances before jumping to the
> image kernel and enable them once the image kernel has been restored. Use
> the memory buffer that the image kernel allocated.
>
> For non-boot CPUs, rely on the CPU hotplug callbacks as CPUs are disabled
> and enabled during suspend and resume, respectively.
>
> The CPU hotplug callbacks do not cover the boot CPU. Handle the HFI
> instance of the boot CPU from the suspend notifier callback.
>
> Cc: Chen Yu <[email protected]>
> Cc: Len Brown <[email protected]>
> Cc: Srinivas Pandruvada <[email protected]>
> Cc: Stanislaw Gruszka <[email protected]>
> Cc: Zhang Rui <[email protected]>
> Cc: Zhao Liu <[email protected]>
> Cc: [email protected]
> Cc: [email protected] # 6.1
> Signed-off-by: Ricardo Neri <[email protected]>
> --
> Changes since v1:
> * Moved registration of the suspend notifier towards the end of
> intel_hfi_init(). (Stan)
> * Renamed hfi_do_pm_[enable|disable]() to hfi_do_[enable|disable](). Stan
> will use these functions outside the suspend notifier. (Stan)
> * Added locking to calls to hfi_[enable|disable]() from the suspend
> notifier. (Rafael)
> ---
> drivers/thermal/intel/intel_hfi.c | 62 +++++++++++++++++++++++++++++++
> 1 file changed, 62 insertions(+)
>
> diff --git a/drivers/thermal/intel/intel_hfi.c b/drivers/thermal/intel/intel_hfi.c
> index 22445403b520..8d6e4f8dc67a 100644
> --- a/drivers/thermal/intel/intel_hfi.c
> +++ b/drivers/thermal/intel/intel_hfi.c
> @@ -30,11 +30,13 @@
> #include <linux/kernel.h>
> #include <linux/math.h>
> #include <linux/mutex.h>
> +#include <linux/notifier.h>
> #include <linux/percpu-defs.h>
> #include <linux/printk.h>
> #include <linux/processor.h>
> #include <linux/slab.h>
> #include <linux/spinlock.h>
> +#include <linux/suspend.h>
> #include <linux/string.h>
> #include <linux/topology.h>
> #include <linux/workqueue.h>
> @@ -571,6 +573,60 @@ static __init int hfi_parse_features(void)
> return 0;
> }
>
> +static void hfi_do_enable(void *info)
> +{
> + struct hfi_instance *hfi_instance = info;
> +
> + hfi_set_hw_table(hfi_instance);
> + hfi_enable();
> +}
> +
> +static void hfi_do_disable(void *info)
> +{
> + hfi_disable();
> +}
> +
> +static int hfi_pm_notify(struct notifier_block *nb,
> + unsigned long mode, void *unused)
> +{
> + struct hfi_cpu_info *info = &per_cpu(hfi_cpu_info, 0);
> + struct hfi_instance *hfi = info->hfi_instance;
> + int ret = 0;
> +
> + /* HFI may not be in use. */
> + if (!hfi)
> + return ret;
> +
> + mutex_lock(&hfi_instance_lock);
> + /*
> + * Only handle the HFI instance of the package of the boot CPU. The
> + * instances of other packages are handled in the CPU hotplug callbacks.
> + */
> + switch (mode) {
> + case PM_HIBERNATION_PREPARE:
> + case PM_SUSPEND_PREPARE:
> + case PM_RESTORE_PREPARE:
> + ret = smp_call_function_single(0, hfi_do_disable, NULL, true);
> + break;
> +
> + case PM_POST_RESTORE:
> + case PM_POST_HIBERNATION:
> + case PM_POST_SUSPEND:
> + ret = smp_call_function_single(0, hfi_do_enable, hfi, true);
> + break;

Because this handles the boot CPU only, one has to wonder if it should
be a syscore op rather than a PM notifier.

It does not sleep AFAICS, so it can run in that context, and it is
guaranteed to run on the boot CPU then, so it is not necessary to
force that. Moreover, syscore ops are guaranteed to be
non-concurrent, so locking is not needed.

In addition, disabling HFI from a PM notifier is generally observable
by user space, because PM notifiers run before user space is frozen,
but doing it from a syscore op wouldn't be.

2024-01-03 13:41:46

by Rafael J. Wysocki

[permalink] [raw]
Subject: Re: [PATCH v2 4/4] thermal: intel: hfi: Add a suspend notifier

On Wed, Jan 3, 2024 at 2:34 PM Rafael J. Wysocki <[email protected]> wrote:
>
> The subject should say "add a PM notifier" to indicate that
> hibernation is covered too.
>
> On Wed, Jan 3, 2024 at 5:13 AM Ricardo Neri
> <[email protected]> wrote:
> >
> > The kernel gives the HFI hardware a memory region that the latter uses to
> > provide updates to the HFI table. The kernel allocates this memory region
> > at boot. It remains constant throughout runtime time.
> >
> > When resuming from suspend or hibernation, the restore kernel allocates a
>
> The restore kernel is only used during resume from hibernation, so
> this particular problem is hibernation-specific.
>
> It is possible, at least in principle, that the address of the HFI
> table is "lost" by the processor during resume from "deep" suspend
> (ACPI S3), in which case it may not survive the firmware-driven part
> of the suspend-resume cycle. It is thus prudent to disable HFI on
> suspend and re-enable it on resume for the boot CPU (under the
> assumption that the other CPUs will be taken care of by CPU offline),
> but for a somewhat different reason than in the hibernation case.
>
> > second memory buffer and reprograms the HFI hardware with the new location
> > as part of a normal boot. The location of the second memory buffer may
> > differ from the one allocated by the image kernel. Subsequently, when the
> > restore kernel transfers control to the image kernel, the second buffer
> > becomes invalid, potentially leading to memory corruption if the hardware
> > writes to it (hardware continues using the buffer from the restore kernel).
> >
> > Add a suspend notifier to disable all HFI instances before jumping to the
> > image kernel and enable them once the image kernel has been restored. Use
> > the memory buffer that the image kernel allocated.
> >
> > For non-boot CPUs, rely on the CPU hotplug callbacks as CPUs are disabled
> > and enabled during suspend and resume, respectively.
> >
> > The CPU hotplug callbacks do not cover the boot CPU. Handle the HFI
> > instance of the boot CPU from the suspend notifier callback.
> >
> > Cc: Chen Yu <[email protected]>
> > Cc: Len Brown <[email protected]>
> > Cc: Srinivas Pandruvada <[email protected]>
> > Cc: Stanislaw Gruszka <[email protected]>
> > Cc: Zhang Rui <[email protected]>
> > Cc: Zhao Liu <[email protected]>
> > Cc: [email protected]
> > Cc: [email protected] # 6.1
> > Signed-off-by: Ricardo Neri <[email protected]>
> > --
> > Changes since v1:
> > * Moved registration of the suspend notifier towards the end of
> > intel_hfi_init(). (Stan)
> > * Renamed hfi_do_pm_[enable|disable]() to hfi_do_[enable|disable](). Stan
> > will use these functions outside the suspend notifier. (Stan)
> > * Added locking to calls to hfi_[enable|disable]() from the suspend
> > notifier. (Rafael)
> > ---
> > drivers/thermal/intel/intel_hfi.c | 62 +++++++++++++++++++++++++++++++
> > 1 file changed, 62 insertions(+)
> >
> > diff --git a/drivers/thermal/intel/intel_hfi.c b/drivers/thermal/intel/intel_hfi.c
> > index 22445403b520..8d6e4f8dc67a 100644
> > --- a/drivers/thermal/intel/intel_hfi.c
> > +++ b/drivers/thermal/intel/intel_hfi.c
> > @@ -30,11 +30,13 @@
> > #include <linux/kernel.h>
> > #include <linux/math.h>
> > #include <linux/mutex.h>
> > +#include <linux/notifier.h>
> > #include <linux/percpu-defs.h>
> > #include <linux/printk.h>
> > #include <linux/processor.h>
> > #include <linux/slab.h>
> > #include <linux/spinlock.h>
> > +#include <linux/suspend.h>
> > #include <linux/string.h>
> > #include <linux/topology.h>
> > #include <linux/workqueue.h>
> > @@ -571,6 +573,60 @@ static __init int hfi_parse_features(void)
> > return 0;
> > }
> >
> > +static void hfi_do_enable(void *info)
> > +{
> > + struct hfi_instance *hfi_instance = info;
> > +
> > + hfi_set_hw_table(hfi_instance);
> > + hfi_enable();
> > +}
> > +
> > +static void hfi_do_disable(void *info)
> > +{
> > + hfi_disable();
> > +}
> > +
> > +static int hfi_pm_notify(struct notifier_block *nb,
> > + unsigned long mode, void *unused)
> > +{
> > + struct hfi_cpu_info *info = &per_cpu(hfi_cpu_info, 0);
> > + struct hfi_instance *hfi = info->hfi_instance;
> > + int ret = 0;
> > +
> > + /* HFI may not be in use. */
> > + if (!hfi)
> > + return ret;
> > +
> > + mutex_lock(&hfi_instance_lock);
> > + /*
> > + * Only handle the HFI instance of the package of the boot CPU. The
> > + * instances of other packages are handled in the CPU hotplug callbacks.
> > + */
> > + switch (mode) {
> > + case PM_HIBERNATION_PREPARE:
> > + case PM_SUSPEND_PREPARE:
> > + case PM_RESTORE_PREPARE:
> > + ret = smp_call_function_single(0, hfi_do_disable, NULL, true);
> > + break;
> > +
> > + case PM_POST_RESTORE:
> > + case PM_POST_HIBERNATION:
> > + case PM_POST_SUSPEND:
> > + ret = smp_call_function_single(0, hfi_do_enable, hfi, true);
> > + break;
>
> Because this handles the boot CPU only, one has to wonder if it should
> be a syscore op rather than a PM notifier.
>
> It does not sleep AFAICS, so it can run in that context, and it is
> guaranteed to run on the boot CPU then, so it is not necessary to
> force that. Moreover, syscore ops are guaranteed to be
> non-concurrent, so locking is not needed.
>
> In addition, disabling HFI from a PM notifier is generally observable
> by user space, because PM notifiers run before user space is frozen,
> but doing it from a syscore op wouldn't be.

One more thing: PM notifiers run on all variants of system suspend and
resume, including suspend-to-idle in which case HFI need not be
disabled/enabled IIUC and syscore ops only run in hibernation and
"deep" suspend cycles, so they cover the cases in which the special
handling is really needed and don't add useless overhead otherwise.

2024-01-03 17:27:06

by Stanislaw Gruszka

[permalink] [raw]
Subject: Re: [PATCH v2 4/4] thermal: intel: hfi: Add a suspend notifier

On Wed, Jan 03, 2024 at 02:34:26PM +0100, Rafael J. Wysocki wrote:
> > +static int hfi_pm_notify(struct notifier_block *nb,
> > + unsigned long mode, void *unused)
> > +{
> > + struct hfi_cpu_info *info = &per_cpu(hfi_cpu_info, 0);
> > + struct hfi_instance *hfi = info->hfi_instance;
> > + int ret = 0;
> > +
> > + /* HFI may not be in use. */
> > + if (!hfi)
> > + return ret;
> > +
> > + mutex_lock(&hfi_instance_lock);
> > + /*
> > + * Only handle the HFI instance of the package of the boot CPU. The
> > + * instances of other packages are handled in the CPU hotplug callbacks.
> > + */
> > + switch (mode) {
> > + case PM_HIBERNATION_PREPARE:
> > + case PM_SUSPEND_PREPARE:
> > + case PM_RESTORE_PREPARE:
> > + ret = smp_call_function_single(0, hfi_do_disable, NULL, true);
> > + break;
> > +
> > + case PM_POST_RESTORE:
> > + case PM_POST_HIBERNATION:
> > + case PM_POST_SUSPEND:
> > + ret = smp_call_function_single(0, hfi_do_enable, hfi, true);
> > + break;
>
> Because this handles the boot CPU only, one has to wonder if it should
> be a syscore op rather than a PM notifier.
>
> It does not sleep AFAICS, so it can run in that context, and it is
> guaranteed to run on the boot CPU then, so it is not necessary to
> force that. Moreover, syscore ops are guaranteed to be
> non-concurrent, so locking is not needed.

There are below warnings in smp_call_function_single() :

/*
* Can deadlock when called with interrupts disabled.
* We allow cpu's that are not yet online though, as no one else can
* send smp call function interrupt to this cpu and as such deadlocks
* can't happen.
*/
WARN_ON_ONCE(cpu_online(this_cpu) && irqs_disabled()
&& !oops_in_progress);

/*
* When @wait we can deadlock when we interrupt between llist_add() and
* arch_send_call_function_ipi*(); when !@wait we can deadlock due to
* csd_lock() on because the interrupt context uses the same csd
* storage.
*/
WARN_ON_ONCE(!in_task());

And this one in syscore_suspend():

WARN_ONCE(!irqs_disabled(),
"Interrupts enabled before system core suspend.\n");

So seems they are not compatible.

Regards
Stanislaw

2024-01-03 18:21:32

by Rafael J. Wysocki

[permalink] [raw]
Subject: Re: [PATCH v2 4/4] thermal: intel: hfi: Add a suspend notifier

On Wed, Jan 3, 2024 at 6:26 PM Stanislaw Gruszka
<[email protected]> wrote:
>
> On Wed, Jan 03, 2024 at 02:34:26PM +0100, Rafael J. Wysocki wrote:
> > > +static int hfi_pm_notify(struct notifier_block *nb,
> > > + unsigned long mode, void *unused)
> > > +{
> > > + struct hfi_cpu_info *info = &per_cpu(hfi_cpu_info, 0);
> > > + struct hfi_instance *hfi = info->hfi_instance;
> > > + int ret = 0;
> > > +
> > > + /* HFI may not be in use. */
> > > + if (!hfi)
> > > + return ret;
> > > +
> > > + mutex_lock(&hfi_instance_lock);
> > > + /*
> > > + * Only handle the HFI instance of the package of the boot CPU. The
> > > + * instances of other packages are handled in the CPU hotplug callbacks.
> > > + */
> > > + switch (mode) {
> > > + case PM_HIBERNATION_PREPARE:
> > > + case PM_SUSPEND_PREPARE:
> > > + case PM_RESTORE_PREPARE:
> > > + ret = smp_call_function_single(0, hfi_do_disable, NULL, true);
> > > + break;
> > > +
> > > + case PM_POST_RESTORE:
> > > + case PM_POST_HIBERNATION:
> > > + case PM_POST_SUSPEND:
> > > + ret = smp_call_function_single(0, hfi_do_enable, hfi, true);
> > > + break;
> >
> > Because this handles the boot CPU only, one has to wonder if it should
> > be a syscore op rather than a PM notifier.
> >
> > It does not sleep AFAICS, so it can run in that context, and it is
> > guaranteed to run on the boot CPU then, so it is not necessary to
> > force that. Moreover, syscore ops are guaranteed to be
> > non-concurrent, so locking is not needed.
>
> There are below warnings in smp_call_function_single() :
>
> /*
> * Can deadlock when called with interrupts disabled.
> * We allow cpu's that are not yet online though, as no one else can
> * send smp call function interrupt to this cpu and as such deadlocks
> * can't happen.
> */
> WARN_ON_ONCE(cpu_online(this_cpu) && irqs_disabled()
> && !oops_in_progress);
>
> /*
> * When @wait we can deadlock when we interrupt between llist_add() and
> * arch_send_call_function_ipi*(); when !@wait we can deadlock due to
> * csd_lock() on because the interrupt context uses the same csd
> * storage.
> */
> WARN_ON_ONCE(!in_task());
>
> And this one in syscore_suspend():
>
> WARN_ONCE(!irqs_disabled(),
> "Interrupts enabled before system core suspend.\n");
>
> So seems they are not compatible.

But smp_call_function_single() need not be used in syscore ops at all,
because they always run on the boot CPU.

2024-01-04 03:01:36

by Ricardo Neri

[permalink] [raw]
Subject: Re: [PATCH v2 4/4] thermal: intel: hfi: Add a suspend notifier

On Wed, Jan 03, 2024 at 02:38:26PM +0100, Rafael J. Wysocki wrote:
> On Wed, Jan 3, 2024 at 2:34 PM Rafael J. Wysocki <[email protected]> wrote:
> >
> > The subject should say "add a PM notifier" to indicate that
> > hibernation is covered too.
> >
> > On Wed, Jan 3, 2024 at 5:13 AM Ricardo Neri
> > <[email protected]> wrote:
> > >
> > > The kernel gives the HFI hardware a memory region that the latter uses to
> > > provide updates to the HFI table. The kernel allocates this memory region
> > > at boot. It remains constant throughout runtime time.
> > >
> > > When resuming from suspend or hibernation, the restore kernel allocates a
> >
> > The restore kernel is only used during resume from hibernation, so
> > this particular problem is hibernation-specific.
> >
> > It is possible, at least in principle, that the address of the HFI
> > table is "lost" by the processor during resume from "deep" suspend
> > (ACPI S3), in which case it may not survive the firmware-driven part
> > of the suspend-resume cycle. It is thus prudent to disable HFI on
> > suspend and re-enable it on resume for the boot CPU (under the
> > assumption that the other CPUs will be taken care of by CPU offline),
> > but for a somewhat different reason than in the hibernation case.
> >
> > > second memory buffer and reprograms the HFI hardware with the new location
> > > as part of a normal boot. The location of the second memory buffer may
> > > differ from the one allocated by the image kernel. Subsequently, when the
> > > restore kernel transfers control to the image kernel, the second buffer
> > > becomes invalid, potentially leading to memory corruption if the hardware
> > > writes to it (hardware continues using the buffer from the restore kernel).
> > >
> > > Add a suspend notifier to disable all HFI instances before jumping to the
> > > image kernel and enable them once the image kernel has been restored. Use
> > > the memory buffer that the image kernel allocated.
> > >
> > > For non-boot CPUs, rely on the CPU hotplug callbacks as CPUs are disabled
> > > and enabled during suspend and resume, respectively.
> > >
> > > The CPU hotplug callbacks do not cover the boot CPU. Handle the HFI
> > > instance of the boot CPU from the suspend notifier callback.
> > >
> > > Cc: Chen Yu <[email protected]>
> > > Cc: Len Brown <[email protected]>
> > > Cc: Srinivas Pandruvada <[email protected]>
> > > Cc: Stanislaw Gruszka <[email protected]>
> > > Cc: Zhang Rui <[email protected]>
> > > Cc: Zhao Liu <[email protected]>
> > > Cc: [email protected]
> > > Cc: [email protected] # 6.1
> > > Signed-off-by: Ricardo Neri <[email protected]>
> > > --
> > > Changes since v1:
> > > * Moved registration of the suspend notifier towards the end of
> > > intel_hfi_init(). (Stan)
> > > * Renamed hfi_do_pm_[enable|disable]() to hfi_do_[enable|disable](). Stan
> > > will use these functions outside the suspend notifier. (Stan)
> > > * Added locking to calls to hfi_[enable|disable]() from the suspend
> > > notifier. (Rafael)
> > > ---
> > > drivers/thermal/intel/intel_hfi.c | 62 +++++++++++++++++++++++++++++++
> > > 1 file changed, 62 insertions(+)
> > >
> > > diff --git a/drivers/thermal/intel/intel_hfi.c b/drivers/thermal/intel/intel_hfi.c
> > > index 22445403b520..8d6e4f8dc67a 100644
> > > --- a/drivers/thermal/intel/intel_hfi.c
> > > +++ b/drivers/thermal/intel/intel_hfi.c
> > > @@ -30,11 +30,13 @@
> > > #include <linux/kernel.h>
> > > #include <linux/math.h>
> > > #include <linux/mutex.h>
> > > +#include <linux/notifier.h>
> > > #include <linux/percpu-defs.h>
> > > #include <linux/printk.h>
> > > #include <linux/processor.h>
> > > #include <linux/slab.h>
> > > #include <linux/spinlock.h>
> > > +#include <linux/suspend.h>
> > > #include <linux/string.h>
> > > #include <linux/topology.h>
> > > #include <linux/workqueue.h>
> > > @@ -571,6 +573,60 @@ static __init int hfi_parse_features(void)
> > > return 0;
> > > }
> > >
> > > +static void hfi_do_enable(void *info)
> > > +{
> > > + struct hfi_instance *hfi_instance = info;
> > > +
> > > + hfi_set_hw_table(hfi_instance);
> > > + hfi_enable();
> > > +}
> > > +
> > > +static void hfi_do_disable(void *info)
> > > +{
> > > + hfi_disable();
> > > +}
> > > +
> > > +static int hfi_pm_notify(struct notifier_block *nb,
> > > + unsigned long mode, void *unused)
> > > +{
> > > + struct hfi_cpu_info *info = &per_cpu(hfi_cpu_info, 0);
> > > + struct hfi_instance *hfi = info->hfi_instance;
> > > + int ret = 0;
> > > +
> > > + /* HFI may not be in use. */
> > > + if (!hfi)
> > > + return ret;
> > > +
> > > + mutex_lock(&hfi_instance_lock);
> > > + /*
> > > + * Only handle the HFI instance of the package of the boot CPU. The
> > > + * instances of other packages are handled in the CPU hotplug callbacks.
> > > + */
> > > + switch (mode) {
> > > + case PM_HIBERNATION_PREPARE:
> > > + case PM_SUSPEND_PREPARE:
> > > + case PM_RESTORE_PREPARE:
> > > + ret = smp_call_function_single(0, hfi_do_disable, NULL, true);
> > > + break;
> > > +
> > > + case PM_POST_RESTORE:
> > > + case PM_POST_HIBERNATION:
> > > + case PM_POST_SUSPEND:
> > > + ret = smp_call_function_single(0, hfi_do_enable, hfi, true);
> > > + break;
> >
> > Because this handles the boot CPU only, one has to wonder if it should
> > be a syscore op rather than a PM notifier.
> >
> > It does not sleep AFAICS, so it can run in that context, and it is
> > guaranteed to run on the boot CPU then, so it is not necessary to
> > force that. Moreover, syscore ops are guaranteed to be
> > non-concurrent, so locking is not needed.
> >
> > In addition, disabling HFI from a PM notifier is generally observable
> > by user space, because PM notifiers run before user space is frozen,
> > but doing it from a syscore op wouldn't be.

Yes, we only have to handle the boot CPU. The rest are handled via CPU
offline. Then syscore ops look like a good fit for me.

>
> One more thing: PM notifiers run on all variants of system suspend and
> resume, including suspend-to-idle in which case HFI need not be
> disabled/enabled IIUC and syscore ops only run in hibernation and
> "deep" suspend cycles, so they cover the cases in which the special
> handling is really needed and don't add useless overhead otherwise.

I verified that the HFI configuration survives suspend-to-idle. No extra
handling is needed.

2024-01-04 03:02:39

by Ricardo Neri

[permalink] [raw]
Subject: Re: [PATCH v2 4/4] thermal: intel: hfi: Add a suspend notifier

On Wed, Jan 03, 2024 at 02:34:26PM +0100, Rafael J. Wysocki wrote:
> The subject should say "add a PM notifier" to indicate that
> hibernation is covered too.
>
> On Wed, Jan 3, 2024 at 5:13 AM Ricardo Neri
> <[email protected]> wrote:
> >
> > The kernel gives the HFI hardware a memory region that the latter uses to
> > provide updates to the HFI table. The kernel allocates this memory region
> > at boot. It remains constant throughout runtime time.
> >
> > When resuming from suspend or hibernation, the restore kernel allocates a
>
> The restore kernel is only used during resume from hibernation, so
> this particular problem is hibernation-specific.

Agreed. This correction is pertinent.
>
> It is possible, at least in principle, that the address of the HFI
> table is "lost" by the processor during resume from "deep" suspend
> (ACPI S3), in which case it may not survive the firmware-driven part
> of the suspend-resume cycle. It is thus prudent to disable HFI on
> suspend and re-enable it on resume for the boot CPU (under the
> assumption that the other CPUs will be taken care of by CPU offline),

Indeed, this patch aims to handle this scenario.

> but for a somewhat different reason than in the hibernation case.

I will correct the commit message to reflect this reasoning