2018-07-25 18:22:31

by Eduardo Valentin

Subject: [PATCH 1/1] x86: tsc: avoid system instability in hibernation

System instability is seen during resume from hibernation when the
system is under heavy CPU load. This is due to stale sched clock data:
the scheduler then thinks that CPU-hog tasks need more time on the CPU,
causing the system to freeze during the unfreezing of tasks. For
example, threaded IRQs and kernel threads servicing network interfaces
may be delayed for several tens of seconds, making the system
unreachable.

Situations like this can be reported by lockup detectors such as the
workqueue lockup detector:

Message from syslogd@ip-172-31-67-114 at May 7 18:23:21 ...
kernel:BUG: workqueue lockup - pool cpus=0 node=0 flags=0x0 nice=0 stuck for 57s!

Message from syslogd@ip-172-31-67-114 at May 7 18:23:21 ...
kernel:BUG: workqueue lockup - pool cpus=1 node=0 flags=0x0 nice=0 stuck for 57s!

Message from syslogd@ip-172-31-67-114 at May 7 18:23:21 ...
kernel:BUG: workqueue lockup - pool cpus=3 node=0 flags=0x1 nice=0 stuck for 57s!

Message from syslogd@ip-172-31-67-114 at May 7 18:29:06 ...
kernel:BUG: workqueue lockup - pool cpus=3 node=0 flags=0x1 nice=0 stuck for 403s!

The fix for this situation is to mark the sched clock as unstable as
early as possible in the resume path, and to leave it unstable for the
duration of the resume process. This forces the scheduler to keep the
sched clock aligned across CPUs using the delta against the time of
day, updating the sched clock data. On the post-hibernation event, the
sched clock can then be marked stable again, avoiding unnecessary syncs
with the time of day on systems where the TSC is reliable.

Cc: Thomas Gleixner <[email protected]>
Cc: Ingo Molnar <[email protected]>
Cc: "H. Peter Anvin" <[email protected]>
Cc: Peter Zijlstra <[email protected]>
Cc: Dou Liyang <[email protected]>
Cc: Len Brown <[email protected]>
Cc: "Rafael J. Wysocki" <[email protected]>
Cc: Eduardo Valentin <[email protected]>
Cc: "[email protected]" <[email protected]>
Cc: Rajvi Jingar <[email protected]>
Cc: Pavel Tatashin <[email protected]>
Cc: Philippe Ombredanne <[email protected]>
Cc: Kate Stewart <[email protected]>
Cc: Greg Kroah-Hartman <[email protected]>
Cc: [email protected]
Cc: [email protected]
Signed-off-by: Eduardo Valentin <[email protected]>
---
 arch/x86/kernel/tsc.c       | 29 +++++++++++++++++++++++++++++
 include/linux/sched/clock.h |  5 +++++
 kernel/sched/clock.c        |  4 ++--
 3 files changed, 36 insertions(+), 2 deletions(-)

diff --git a/arch/x86/kernel/tsc.c b/arch/x86/kernel/tsc.c
index 8ea117f8142e..f197c9742fef 100644
--- a/arch/x86/kernel/tsc.c
+++ b/arch/x86/kernel/tsc.c
@@ -13,6 +13,7 @@
 #include <linux/percpu.h>
 #include <linux/timex.h>
 #include <linux/static_key.h>
+#include <linux/suspend.h>
 
 #include <asm/hpet.h>
 #include <asm/timer.h>
@@ -1377,3 +1378,31 @@ unsigned long calibrate_delay_is_known(void)
 	return 0;
 }
 #endif
+
+static int tsc_pm_notifier(struct notifier_block *notifier,
+			   unsigned long pm_event, void *unused)
+{
+	switch (pm_event) {
+	case PM_HIBERNATION_PREPARE:
+		clear_sched_clock_stable();
+		break;
+	case PM_POST_HIBERNATION:
+		/* Set back to the default */
+		if (!check_tsc_unstable())
+			set_sched_clock_stable();
+		break;
+	}
+
+	return 0;
+}
+
+static struct notifier_block tsc_pm_notifier_block = {
+	.notifier_call = tsc_pm_notifier,
+};
+
+static int tsc_setup_pm_notifier(void)
+{
+	return register_pm_notifier(&tsc_pm_notifier_block);
+}
+
+subsys_initcall(tsc_setup_pm_notifier);
diff --git a/include/linux/sched/clock.h b/include/linux/sched/clock.h
index 867d588314e0..902654ac5f7e 100644
--- a/include/linux/sched/clock.h
+++ b/include/linux/sched/clock.h
@@ -32,6 +32,10 @@ static inline void clear_sched_clock_stable(void)
 {
 }
 
+static inline void set_sched_clock_stable(void)
+{
+}
+
 static inline void sched_clock_idle_sleep_event(void)
 {
 }
@@ -51,6 +55,7 @@ static inline u64 local_clock(void)
 }
 #else
 extern int sched_clock_stable(void);
+extern void set_sched_clock_stable(void);
 extern void clear_sched_clock_stable(void);
 
 /*
diff --git a/kernel/sched/clock.c b/kernel/sched/clock.c
index e086babe6c61..8453440e236c 100644
--- a/kernel/sched/clock.c
+++ b/kernel/sched/clock.c
@@ -131,7 +131,7 @@ static void __scd_stamp(struct sched_clock_data *scd)
 	scd->tick_raw = sched_clock();
 }
 
-static void __set_sched_clock_stable(void)
+void set_sched_clock_stable(void)
 {
 	struct sched_clock_data *scd;
 
@@ -228,7 +228,7 @@ static int __init sched_clock_init_late(void)
 	smp_mb(); /* matches {set,clear}_sched_clock_stable() */
 
 	if (__sched_clock_stable_early)
-		__set_sched_clock_stable();
+		set_sched_clock_stable();
 
 	return 0;
 }
--
2.18.0



2018-07-26 08:57:12

by Rafael J. Wysocki

Subject: Re: [PATCH 1/1] x86: tsc: avoid system instability in hibernation

Hi Eduardo,

On Wednesday, July 25, 2018 8:18:46 PM CEST Eduardo Valentin wrote:
> System instability is seen during resume from hibernation when the
> system is under heavy CPU load. This is due to stale sched clock data:
> the scheduler then thinks that CPU-hog tasks need more time on the CPU,
> causing the system to freeze during the unfreezing of tasks. For
> example, threaded IRQs and kernel threads servicing network interfaces
> may be delayed for several tens of seconds, making the system
> unreachable.
>
> Situations like this can be reported by lockup detectors such as the
> workqueue lockup detector:
>
> Message from syslogd@ip-172-31-67-114 at May 7 18:23:21 ...
> kernel:BUG: workqueue lockup - pool cpus=0 node=0 flags=0x0 nice=0 stuck for 57s!
>
> Message from syslogd@ip-172-31-67-114 at May 7 18:23:21 ...
> kernel:BUG: workqueue lockup - pool cpus=1 node=0 flags=0x0 nice=0 stuck for 57s!
>
> Message from syslogd@ip-172-31-67-114 at May 7 18:23:21 ...
> kernel:BUG: workqueue lockup - pool cpus=3 node=0 flags=0x1 nice=0 stuck for 57s!
>
> Message from syslogd@ip-172-31-67-114 at May 7 18:29:06 ...
> kernel:BUG: workqueue lockup - pool cpus=3 node=0 flags=0x1 nice=0 stuck for 403s!
>
> The fix for this situation is to mark the sched clock as unstable as
> early as possible in the resume path, and to leave it unstable for the
> duration of the resume process. This forces the scheduler to keep the
> sched clock aligned across CPUs using the delta against the time of
> day, updating the sched clock data. On the post-hibernation event, the
> sched clock can then be marked stable again, avoiding unnecessary syncs
> with the time of day on systems where the TSC is reliable.
>
> Cc: Thomas Gleixner <[email protected]>
> Cc: Ingo Molnar <[email protected]>
> Cc: "H. Peter Anvin" <[email protected]>
> Cc: Peter Zijlstra <[email protected]>
> Cc: Dou Liyang <[email protected]>
> Cc: Len Brown <[email protected]>
> Cc: "Rafael J. Wysocki" <[email protected]>
> Cc: Eduardo Valentin <[email protected]>
> Cc: "[email protected]" <[email protected]>
> Cc: Rajvi Jingar <[email protected]>
> Cc: Pavel Tatashin <[email protected]>
> Cc: Philippe Ombredanne <[email protected]>
> Cc: Kate Stewart <[email protected]>
> Cc: Greg Kroah-Hartman <[email protected]>
> Cc: [email protected]
> Cc: [email protected]
> Signed-off-by: Eduardo Valentin <[email protected]>

Can you please resend this with a CC to linux-pm?

Cheers,
Rafael


2018-07-26 16:26:55

by Eduardo Valentin

Subject: Re: [PATCH 1/1] x86: tsc: avoid system instability in hibernation

Hey Rafael,

On Thu, Jul 26, 2018 at 10:54:06AM +0200, Rafael J. Wysocki wrote:
> Hi Eduardo,
>
> On Wednesday, July 25, 2018 8:18:46 PM CEST Eduardo Valentin wrote:
>

> Can you please resend this with a CC to linux-pm?

Sure, resent here:

https://lkml.org/lkml/2018/7/26/607
https://marc.info/?l=linux-pm&m=153262063331515&w=2

I missed it, probably because linux-pm did not show up in the
get_maintainer.pl output.

>
> Cheers,
> Rafael
>
>

--
All the best,
Eduardo Valentin

2018-07-27 07:52:16

by Peter Zijlstra

Subject: Re: [PATCH 1/1] x86: tsc: avoid system instability in hibernation

On Wed, Jul 25, 2018 at 11:18:46AM -0700, Eduardo Valentin wrote:
> System instability is seen during resume from hibernation when the
> system is under heavy CPU load. This is due to stale sched clock data:
> the scheduler then thinks that CPU-hog tasks need more time on the CPU,
> causing the system to freeze during the unfreezing of tasks. For
> example, threaded IRQs and kernel threads servicing network interfaces
> may be delayed for several tens of seconds, making the system
> unreachable.

> +static int tsc_pm_notifier(struct notifier_block *notifier,
> +			   unsigned long pm_event, void *unused)
> +{
> +	switch (pm_event) {
> +	case PM_HIBERNATION_PREPARE:
> +		clear_sched_clock_stable();
> +		break;
> +	case PM_POST_HIBERNATION:
> +		/* Set back to the default */
> +		if (!check_tsc_unstable())
> +			set_sched_clock_stable();
> +		break;
> +	}

I've not looked at this in detail yet, but this is an absolute no go,
not going to happen, full stop.

If we _ever_ mark the thing unstable, that's it, the end. Allowing it to
go back to stable is a source of utter fail.