2021-03-11 12:23:29

by Petr Mladek

[permalink] [raw]
Subject: [PATCH v2 0/7] watchdog/softlockup: Report overall time and some cleanup

Resending. The original post is [1].

I dug deep into the softlockup watchdog history when time permitted
this year. And reworked the patchset that fixed timestamps and
cleaned up the code[2].

I split it into very small steps and did even more code clean up.
The result looks quite strightforward and I am pretty confident
with the changes.

[1] v2: https://lore.kernel.org/r/[email protected]
[2] v1: https://lore.kernel.org/r/[email protected]

Petr Mladek (7):
watchdog: Rename __touch_watchdog() to a better descriptive name
watchdog: Explicitly update timestamp when reporting softlockup
watchdog/softlockup: Report the overall time of softlockups
watchdog/softlockup: Remove logic that tried to prevent repeated
reports
watchdog: Fix barriers when printing backtraces from all CPUs
watchdog: Cleanup handling of false positives
Test softlockup

fs/proc/consoles.c | 5 +++
fs/proc/version.c | 7 ++++
kernel/watchdog.c | 97 ++++++++++++++++++++++++++--------------------
3 files changed, 66 insertions(+), 43 deletions(-)

--
2.26.2


2021-03-11 12:23:35

by Petr Mladek

[permalink] [raw]
Subject: [PATCH v2 3/7] watchdog/softlockup: Report the overall time of softlockups

The softlockup detector currently shows the time spent since the last
report. As a result it is not clear whether a CPU is infinitely hogged
by a single task or if it is a repeated event.

The situation can be simulated with a simply busy loop:

while (true)
cpu_relax();

The softlockup detector produces:

[ 168.277520] watchdog: BUG: soft lockup - CPU#1 stuck for 22s! [cat:4865]
[ 196.277604] watchdog: BUG: soft lockup - CPU#1 stuck for 22s! [cat:4865]
[ 236.277522] watchdog: BUG: soft lockup - CPU#1 stuck for 23s! [cat:4865]

But it should be, something like:

[ 480.372418] watchdog: BUG: soft lockup - CPU#2 stuck for 26s! [cat:4943]
[ 508.372359] watchdog: BUG: soft lockup - CPU#2 stuck for 52s! [cat:4943]
[ 548.372359] watchdog: BUG: soft lockup - CPU#2 stuck for 89s! [cat:4943]
[ 576.372351] watchdog: BUG: soft lockup - CPU#2 stuck for 115s! [cat:4943]

For the better output, add an additional timestamp of the last report.
Only this timestamp is reset when the watchdog is intentionally
touched from slow code paths or when printing the report.

Signed-off-by: Petr Mladek <[email protected]>
---
kernel/watchdog.c | 40 ++++++++++++++++++++++++++++------------
1 file changed, 28 insertions(+), 12 deletions(-)

diff --git a/kernel/watchdog.c b/kernel/watchdog.c
index 7776d53a015c..6259590d6474 100644
--- a/kernel/watchdog.c
+++ b/kernel/watchdog.c
@@ -154,7 +154,11 @@ static void lockup_detector_update_enable(void)

#ifdef CONFIG_SOFTLOCKUP_DETECTOR

-#define SOFTLOCKUP_RESET ULONG_MAX
+/*
+ * Delay the soflockup report when running a known slow code.
+ * It does _not_ affect the timestamp of the last successdul reschedule.
+ */
+#define SOFTLOCKUP_DELAY_REPORT ULONG_MAX

#ifdef CONFIG_SMP
int __read_mostly sysctl_softlockup_all_cpu_backtrace;
@@ -169,7 +173,10 @@ unsigned int __read_mostly softlockup_panic =
static bool softlockup_initialized __read_mostly;
static u64 __read_mostly sample_period;

+/* Timestamp taken after the last successful reschedule. */
static DEFINE_PER_CPU(unsigned long, watchdog_touch_ts);
+/* Timestamp of the last softlockup report. */
+static DEFINE_PER_CPU(unsigned long, watchdog_report_ts);
static DEFINE_PER_CPU(struct hrtimer, watchdog_hrtimer);
static DEFINE_PER_CPU(bool, softlockup_touch_sync);
static DEFINE_PER_CPU(bool, soft_watchdog_warn);
@@ -235,10 +242,16 @@ static void set_sample_period(void)
watchdog_update_hrtimer_threshold(sample_period);
}

+static void update_report_ts(void)
+{
+ __this_cpu_write(watchdog_report_ts, get_timestamp());
+}
+
/* Commands for resetting the watchdog */
static void update_touch_ts(void)
{
__this_cpu_write(watchdog_touch_ts, get_timestamp());
+ update_report_ts();
}

/**
@@ -252,10 +265,10 @@ static void update_touch_ts(void)
notrace void touch_softlockup_watchdog_sched(void)
{
/*
- * Preemption can be enabled. It doesn't matter which CPU's timestamp
- * gets zeroed here, so use the raw_ operation.
+ * Preemption can be enabled. It doesn't matter which CPU's watchdog
+ * report period gets restarted here, so use the raw_ operation.
*/
- raw_cpu_write(watchdog_touch_ts, SOFTLOCKUP_RESET);
+ raw_cpu_write(watchdog_report_ts, SOFTLOCKUP_DELAY_REPORT);
}

notrace void touch_softlockup_watchdog(void)
@@ -279,23 +292,23 @@ void touch_all_softlockup_watchdogs(void)
* the softlockup check.
*/
for_each_cpu(cpu, &watchdog_allowed_mask)
- per_cpu(watchdog_touch_ts, cpu) = SOFTLOCKUP_RESET;
+ per_cpu(watchdog_report_ts, cpu) = SOFTLOCKUP_DELAY_REPORT;
wq_watchdog_touch(-1);
}

void touch_softlockup_watchdog_sync(void)
{
__this_cpu_write(softlockup_touch_sync, true);
- __this_cpu_write(watchdog_touch_ts, SOFTLOCKUP_RESET);
+ __this_cpu_write(watchdog_report_ts, SOFTLOCKUP_DELAY_REPORT);
}

-static int is_softlockup(unsigned long touch_ts)
+static int is_softlockup(unsigned long touch_ts, unsigned long period_ts)
{
unsigned long now = get_timestamp();

if ((watchdog_enabled & SOFT_WATCHDOG_ENABLED) && watchdog_thresh){
/* Warn about unreasonable delays. */
- if (time_after(now, touch_ts + get_softlockup_thresh()))
+ if (time_after(now, period_ts + get_softlockup_thresh()))
return now - touch_ts;
}
return 0;
@@ -341,6 +354,7 @@ static int softlockup_fn(void *data)
static enum hrtimer_restart watchdog_timer_fn(struct hrtimer *hrtimer)
{
unsigned long touch_ts = __this_cpu_read(watchdog_touch_ts);
+ unsigned long period_ts = __this_cpu_read(watchdog_report_ts);
struct pt_regs *regs = get_irq_regs();
int duration;
int softlockup_all_cpu_backtrace = sysctl_softlockup_all_cpu_backtrace;
@@ -362,7 +376,8 @@ static enum hrtimer_restart watchdog_timer_fn(struct hrtimer *hrtimer)
/* .. and repeat */
hrtimer_forward_now(hrtimer, ns_to_ktime(sample_period));

- if (touch_ts == SOFTLOCKUP_RESET) {
+ /* Reset the interval when touched externally by a known slow code. */
+ if (period_ts == SOFTLOCKUP_DELAY_REPORT) {
if (unlikely(__this_cpu_read(softlockup_touch_sync))) {
/*
* If the time stamp was touched atomically
@@ -374,7 +389,8 @@ static enum hrtimer_restart watchdog_timer_fn(struct hrtimer *hrtimer)

/* Clear the guest paused flag on watchdog reset */
kvm_check_and_clear_guest_paused();
- update_touch_ts();
+ update_report_ts();
+
return HRTIMER_RESTART;
}

@@ -384,7 +400,7 @@ static enum hrtimer_restart watchdog_timer_fn(struct hrtimer *hrtimer)
* indicate it is getting cpu time. If it hasn't then
* this is a good indication some task is hogging the cpu
*/
- duration = is_softlockup(touch_ts);
+ duration = is_softlockup(touch_ts, period_ts);
if (unlikely(duration)) {
/*
* If a virtual machine is stopped by the host it can look to
@@ -410,7 +426,7 @@ static enum hrtimer_restart watchdog_timer_fn(struct hrtimer *hrtimer)
}

/* Start period for the next softlockup warning. */
- update_touch_ts();
+ update_report_ts();

pr_emerg("BUG: soft lockup - CPU#%d stuck for %us! [%s:%d]\n",
smp_processor_id(), duration,
--
2.26.2

2021-03-11 12:23:47

by Petr Mladek

[permalink] [raw]
Subject: [PATCH v2 1/7] watchdog: Rename __touch_watchdog() to a better descriptive name

There are many touch_*watchdog() functions. They are called in situations
where the watchdog could report false positives or create unnecessary
noise. For example, when CPU is entering idle mode, a virtual machine
is stopped, or a lot of messages are printed in the atomic context.

These functions set SOFTLOCKUP_RESET instead of a real timestamp. It allows
to call them even in a context where jiffies might be outdated. For example,
in an atomic context.

The real timestamp is set by __touch_watchdog() that is called from
the watchdog timer callback.

Rename this callback to update_touch_ts(). It better describes the effect
and clearly distinguish is from the other touch_*watchdog() functions.

Another motivation is that two timestamps are going to be used. One will
be used for the total softlockup time. The other will be used to measure
time since the last report. The new function name will help to distinguish
which timestamp is being updated.

Signed-off-by: Petr Mladek <[email protected]>
---
kernel/watchdog.c | 8 ++++----
1 file changed, 4 insertions(+), 4 deletions(-)

diff --git a/kernel/watchdog.c b/kernel/watchdog.c
index 71109065bd8e..c58244064de8 100644
--- a/kernel/watchdog.c
+++ b/kernel/watchdog.c
@@ -236,7 +236,7 @@ static void set_sample_period(void)
}

/* Commands for resetting the watchdog */
-static void __touch_watchdog(void)
+static void update_touch_ts(void)
{
__this_cpu_write(watchdog_touch_ts, get_timestamp());
}
@@ -331,7 +331,7 @@ static DEFINE_PER_CPU(struct cpu_stop_work, softlockup_stop_work);
*/
static int softlockup_fn(void *data)
{
- __touch_watchdog();
+ update_touch_ts();
complete(this_cpu_ptr(&softlockup_completion));

return 0;
@@ -374,7 +374,7 @@ static enum hrtimer_restart watchdog_timer_fn(struct hrtimer *hrtimer)

/* Clear the guest paused flag on watchdog reset */
kvm_check_and_clear_guest_paused();
- __touch_watchdog();
+ update_touch_ts();
return HRTIMER_RESTART;
}

@@ -460,7 +460,7 @@ static void watchdog_enable(unsigned int cpu)
HRTIMER_MODE_REL_PINNED_HARD);

/* Initialize timestamp */
- __touch_watchdog();
+ update_touch_ts();
/* Enable the perf event */
if (watchdog_enabled & NMI_WATCHDOG_ENABLED)
watchdog_nmi_enable(cpu);
--
2.26.2

2021-03-11 12:23:58

by Petr Mladek

[permalink] [raw]
Subject: [PATCH v2 5/7] watchdog: Fix barriers when printing backtraces from all CPUs

Any parallel softlockup reports are skipped when one CPU is already
printing backtraces from all CPUs.

The exclusive rights are synchronized using one bit in
soft_lockup_nmi_warn. There is also one memory barrier that
does not make much sense.

Use two barriers on the right location to prevent mixing two
reports.

Signed-off-by: Petr Mladek <[email protected]>
---
kernel/watchdog.c | 24 +++++++++++++++---------
1 file changed, 15 insertions(+), 9 deletions(-)

diff --git a/kernel/watchdog.c b/kernel/watchdog.c
index dc8a0bf943f5..6dc1f79e36aa 100644
--- a/kernel/watchdog.c
+++ b/kernel/watchdog.c
@@ -409,12 +409,18 @@ static enum hrtimer_restart watchdog_timer_fn(struct hrtimer *hrtimer)
if (kvm_check_and_clear_guest_paused())
return HRTIMER_RESTART;

+ /*
+ * Prevent multiple soft-lockup reports if one cpu is already
+ * engaged in dumping all cpu back traces.
+ */
if (softlockup_all_cpu_backtrace) {
- /* Prevent multiple soft-lockup reports if one cpu is already
- * engaged in dumping cpu back traces
- */
if (test_and_set_bit(0, &soft_lockup_nmi_warn))
return HRTIMER_RESTART;
+ /*
+ * Make sure that reports are serialized. Start
+ * printing after getting the exclusive rights.
+ */
+ smp_mb__after_atomic();
}

/* Start period for the next softlockup warning. */
@@ -431,14 +437,14 @@ static enum hrtimer_restart watchdog_timer_fn(struct hrtimer *hrtimer)
dump_stack();

if (softlockup_all_cpu_backtrace) {
- /* Avoid generating two back traces for current
- * given that one is already made above
- */
trigger_allbutself_cpu_backtrace();
-
+ /*
+ * Make sure that everything is printed before
+ * another CPU is allowed to report lockup again.
+ */
+ smp_mb__before_atomic();
+ /* Allow a further report. */
clear_bit(0, &soft_lockup_nmi_warn);
- /* Barrier to sync with other cpus */
- smp_mb__after_atomic();
}

add_taint(TAINT_SOFTLOCKUP, LOCKDEP_STILL_OK);
--
2.26.2

2021-03-11 12:23:59

by Petr Mladek

[permalink] [raw]
Subject: [PATCH v2 4/7] watchdog/softlockup: Remove logic that tried to prevent repeated reports

The softlockup detector does some gymnastic with the variable
soft_watchdog_warn. It was added by the commit 58687acba59266735ad
("lockup_detector: Combine nmi_watchdog and softlockup detector").

The purpose is not completely clear. There are the following clues.
They describe the situation how it looked after the above mentioned
commit:

1. The variable was checked with a comment "only warn once".

2. The variable was set when softlockup was reported. It was cleared
only when the CPU was not longer in the softlockup state.

3. watchdog_touch_ts was not explicitly updated when the softlockup
was reported. Without this variable, the report would normally
be printed again during every following watchdog_timer_fn()
invocation.

The logic has got even more tangled up by the commit ed235875e2ca98
("kernel/watchdog.c: print traces for all cpus on lockup detection").
After this commit, soft_watchdog_warn is set only when
softlockup_all_cpu_backtrace is enabled. But multiple reports
from all CPUs are prevented by a new variable soft_lockup_nmi_warn.

Conclusion:

The variable probably never worked as intended. In each case, it has
not worked last many years because the softlockup was reported
repeatedly after the full period defined by watchdog_thresh.

The reason is that watchdog gets touched in many known slow paths,
for example, in printk_stack_address(). This code is called also when
printing the softlockup report. It means that the watchdog timestamp
gets updated after each report.

Solution:

Simply remove the logic. People want the periodic report anyway.

Signed-off-by: Petr Mladek <[email protected]>
---
kernel/watchdog.c | 14 ++------------
1 file changed, 2 insertions(+), 12 deletions(-)

diff --git a/kernel/watchdog.c b/kernel/watchdog.c
index 6259590d6474..dc8a0bf943f5 100644
--- a/kernel/watchdog.c
+++ b/kernel/watchdog.c
@@ -179,7 +179,6 @@ static DEFINE_PER_CPU(unsigned long, watchdog_touch_ts);
static DEFINE_PER_CPU(unsigned long, watchdog_report_ts);
static DEFINE_PER_CPU(struct hrtimer, watchdog_hrtimer);
static DEFINE_PER_CPU(bool, softlockup_touch_sync);
-static DEFINE_PER_CPU(bool, soft_watchdog_warn);
static DEFINE_PER_CPU(unsigned long, hrtimer_interrupts);
static DEFINE_PER_CPU(unsigned long, hrtimer_interrupts_saved);
static unsigned long soft_lockup_nmi_warn;
@@ -410,19 +409,12 @@ static enum hrtimer_restart watchdog_timer_fn(struct hrtimer *hrtimer)
if (kvm_check_and_clear_guest_paused())
return HRTIMER_RESTART;

- /* only warn once */
- if (__this_cpu_read(soft_watchdog_warn) == true)
- return HRTIMER_RESTART;
-
if (softlockup_all_cpu_backtrace) {
/* Prevent multiple soft-lockup reports if one cpu is already
* engaged in dumping cpu back traces
*/
- if (test_and_set_bit(0, &soft_lockup_nmi_warn)) {
- /* Someone else will report us. Let's give up */
- __this_cpu_write(soft_watchdog_warn, true);
+ if (test_and_set_bit(0, &soft_lockup_nmi_warn))
return HRTIMER_RESTART;
- }
}

/* Start period for the next softlockup warning. */
@@ -452,9 +444,7 @@ static enum hrtimer_restart watchdog_timer_fn(struct hrtimer *hrtimer)
add_taint(TAINT_SOFTLOCKUP, LOCKDEP_STILL_OK);
if (softlockup_panic)
panic("softlockup: hung tasks");
- __this_cpu_write(soft_watchdog_warn, true);
- } else
- __this_cpu_write(soft_watchdog_warn, false);
+ }

return HRTIMER_RESTART;
}
--
2.26.2

2021-03-11 12:25:12

by Petr Mladek

[permalink] [raw]
Subject: [PATCH v2 7/7] Test softlockup

Trigger busy loop by:
$> cat /proc/version

Stop the busy loop by:
$> cat /proc/consoles

The code also shows the first touch*watchdog() function that hides
softlockup on a "well known" location.

Signed-off-by: Petr Mladek <[email protected]>
---
fs/proc/consoles.c | 5 +++++
fs/proc/version.c | 7 +++++++
2 files changed, 12 insertions(+)

diff --git a/fs/proc/consoles.c b/fs/proc/consoles.c
index dfe6ce3505ce..213c0a209a7c 100644
--- a/fs/proc/consoles.c
+++ b/fs/proc/consoles.c
@@ -9,6 +9,8 @@
#include <linux/seq_file.h>
#include <linux/tty_driver.h>

+extern volatile bool proc_version_wait;
+
/*
* This is handler for /proc/consoles
*/
@@ -30,6 +32,9 @@ static int show_console_dev(struct seq_file *m, void *v)
unsigned int a;
dev_t dev = 0;

+ printk("%s: Going to break /proc/version infinite loop\n", __func__);
+ proc_version_wait = false;
+
if (con->device) {
const struct tty_driver *driver;
int index;
diff --git a/fs/proc/version.c b/fs/proc/version.c
index b449f186577f..15ec6a502589 100644
--- a/fs/proc/version.c
+++ b/fs/proc/version.c
@@ -6,8 +6,15 @@
#include <linux/seq_file.h>
#include <linux/utsname.h>

+volatile bool proc_version_wait;
+
static int version_proc_show(struct seq_file *m, void *v)
{
+ printk("%s: Going to wait until stopped\n", __func__);
+ proc_version_wait = true;
+ while (proc_version_wait)
+ cpu_relax();
+
seq_printf(m, linux_proc_banner,
utsname()->sysname,
utsname()->release,
--
2.26.2

2021-03-11 12:26:01

by Petr Mladek

[permalink] [raw]
Subject: [PATCH v2 6/7] watchdog: Cleanup handling of false positives

The commit d6ad3e286d2c075 ("softlockup: Add sched_clock_tick() to avoid
kernel warning on kgdb resume") introduced touch_softlockup_watchdog_sync().

It solved a problem when the watchdog was touched in an atomic context,
the timer callback was proceed right after releasing interrupts,
and the local clock has not been updated yet. In this case,
sched_clock_tick() was called in watchdog_timer_fn() before
updating the timer.

So far so good.

Later the commit 5d1c0f4a80a6df73 ("watchdog: add check for suspended vm
in softlockup detector") added two kvm_check_and_clear_guest_paused()
calls. They touch the watchdog when the guest has been sleeping.

The code makes my head spin around.

Scenario 1:

+ guest did sleep:
+ PVCLOCK_GUEST_STOPPED is set

+ 1st watchdog_timer_fn() invocation:
+ the watchdog is not touched yet
+ is_softlockup() returns too big delay
+ kvm_check_and_clear_guest_paused():
+ clear PVCLOCK_GUEST_STOPPED
+ call touch_softlockup_watchdog_sync()
+ set SOFTLOCKUP_DELAY_REPORT
+ set softlockup_touch_sync
+ return from the timer callback

+ 2nd watchdog_timer_fn() invocation:

+ call sched_clock_tick() even though it is not needed.
The timer callback was invoked again only because the clock
has already been updated in the meantime.

+ call kvm_check_and_clear_guest_paused() that does nothing
because PVCLOCK_GUEST_STOPPED has been cleared already.

+ call update_report_ts() and return. This is fine. Except
that sched_clock_tick() might allow to set it already
during the 1st invocation.

Scenario 2:

+ guest did sleep

+ 1st watchdog_timer_fn() invocation
+ same as in 1st scenario

+ guest did sleep again:
+ set PVCLOCK_GUEST_STOPPED again

+ 2nd watchdog_timer_fn() invocation
+ SOFTLOCKUP_DELAY_REPORT is set from 1st invocation
+ call sched_clock_tick()
+ call kvm_check_and_clear_guest_paused()
+ clear PVCLOCK_GUEST_STOPPED
+ call touch_softlockup_watchdog_sync()
+ set SOFTLOCKUP_DELAY_REPORT
+ set softlockup_touch_sync
+ call update_report_ts() (set real timestamp immediately)
+ return from the timer callback

+ 3rd watchdog_timer_fn() invocation
+ timestamp is set from 2nd invocation
+ softlockup_touch_sync is set but not checked because
the real timestamp is already set

Make the code more straightforward:

1. Always call kvm_check_and_clear_guest_paused() at the very
beginning to handle PVCLOCK_GUEST_STOPPED. It touches the watchdog
when the quest did sleep.

2. Handle the situation when the watchdog has been touched
(SOFTLOCKUP_DELAY_REPORT is set).

Call sched_clock_tick() when touch_*sync() variant was used. It makes
sure that the timestamp will be up to date even when it has been
touched in atomic context or quest did sleep.

As a result, kvm_check_and_clear_guest_paused() is called on a single
location. And the right timestamp is always set when returning from
the timer callback.

Signed-off-by: Petr Mladek <[email protected]>
---
kernel/watchdog.c | 20 ++++++++------------
1 file changed, 8 insertions(+), 12 deletions(-)

diff --git a/kernel/watchdog.c b/kernel/watchdog.c
index 6dc1f79e36aa..c050323fcd33 100644
--- a/kernel/watchdog.c
+++ b/kernel/watchdog.c
@@ -375,7 +375,14 @@ static enum hrtimer_restart watchdog_timer_fn(struct hrtimer *hrtimer)
/* .. and repeat */
hrtimer_forward_now(hrtimer, ns_to_ktime(sample_period));

- /* Reset the interval when touched externally by a known slow code. */
+ /*
+ * If a virtual machine is stopped by the host it can look to
+ * the watchdog like a soft lockup. Check to see if the host
+ * stopped the vm before we process the timestamps.
+ */
+ kvm_check_and_clear_guest_paused();
+
+ /* Reset the interval when touched by known problematic code. */
if (period_ts == SOFTLOCKUP_DELAY_REPORT) {
if (unlikely(__this_cpu_read(softlockup_touch_sync))) {
/*
@@ -386,10 +393,7 @@ static enum hrtimer_restart watchdog_timer_fn(struct hrtimer *hrtimer)
sched_clock_tick();
}

- /* Clear the guest paused flag on watchdog reset */
- kvm_check_and_clear_guest_paused();
update_report_ts();
-
return HRTIMER_RESTART;
}

@@ -401,14 +405,6 @@ static enum hrtimer_restart watchdog_timer_fn(struct hrtimer *hrtimer)
*/
duration = is_softlockup(touch_ts, period_ts);
if (unlikely(duration)) {
- /*
- * If a virtual machine is stopped by the host it can look to
- * the watchdog like a soft lockup, check to see if the host
- * stopped the vm before we issue the warning
- */
- if (kvm_check_and_clear_guest_paused())
- return HRTIMER_RESTART;
-
/*
* Prevent multiple soft-lockup reports if one cpu is already
* engaged in dumping all cpu back traces.
--
2.26.2

2021-03-11 12:55:22

by Petr Mladek

[permalink] [raw]
Subject: [PATCH v2 2/7] watchdog: Explicitly update timestamp when reporting softlockup

The softlockup situation might stay for a long time or even forever.
When it happens, the softlockup debug messages are printed in regular
intervals defined by get_softlockup_thresh().

There is a mystery. The repeated message is printed after the full interval
that is defined by get_softlockup_thresh(). But the timer callback is called
more often as defined by sample_period. The code looks like the soflockup
should get reported in every sample_period when it was once behind the thresh.

It works only by chance. The watchdog is touched when printing the stall
report, for example, in printk_stack_address().

Make the behavior clear and predictable by explicitly updating
the timestamp in watchdog_timer_fn() when the report gets printed.

Signed-off-by: Petr Mladek <[email protected]>
---
kernel/watchdog.c | 3 +++
1 file changed, 3 insertions(+)

diff --git a/kernel/watchdog.c b/kernel/watchdog.c
index c58244064de8..7776d53a015c 100644
--- a/kernel/watchdog.c
+++ b/kernel/watchdog.c
@@ -409,6 +409,9 @@ static enum hrtimer_restart watchdog_timer_fn(struct hrtimer *hrtimer)
}
}

+ /* Start period for the next softlockup warning. */
+ update_touch_ts();
+
pr_emerg("BUG: soft lockup - CPU#%d stuck for %us! [%s:%d]\n",
smp_processor_id(), duration,
current->comm, task_pid_nr(current));
--
2.26.2

2021-03-15 18:36:47

by kernel test robot

[permalink] [raw]
Subject: c928e9b143: BUG:soft_lockup-CPU##stuck_for#s![perf:#]



Greeting,

FYI, we noticed the following commit (built with gcc-9):

commit: c928e9b1439de4d74b942abd30d5c838a40af777 ("[PATCH v2 7/7] Test softlockup")
url: https://github.com/0day-ci/linux/commits/Petr-Mladek/watchdog-softlockup-Report-overall-time-and-some-cleanup/20210311-205501
base: https://git.kernel.org/cgit/linux/kernel/git/torvalds/linux.git a74e6a014c9d4d4161061f770c9b4f98372ac778

in testcase: stress-ng
version: stress-ng-x86_64-0.11-06_20210314
with following parameters:

nr_threads: 10%
disk: 1HDD
testtime: 60s
fs: ext4
class: filesystem
test: binderfs
cpufreq_governor: performance
ucode: 0x42e



on test machine: 48 threads Intel(R) Xeon(R) CPU E5-2697 v2 @ 2.70GHz with 112G memory

caused below changes (please refer to attached dmesg/kmsg for entire log/backtrace):



If you fix the issue, kindly add following tag
Reported-by: kernel test robot <[email protected]>


[ 70.666742] watchdog: BUG: soft lockup - CPU#8 stuck for 26s! [perf:1794]
[ 70.675062] Modules linked in: dm_mod xfs libcrc32c sd_mod t10_pi sg intel_rapl_msr intel_rapl_common sb_edac x86_pkg_temp_thermal intel_powerclamp coretemp kvm_intel kvm irqbypass mgag200 crct10dif_pclmul crc32_pclmul crc32c_intel drm_kms_helper ghash_clmulni_intel isci syscopyarea sysfillrect rapl sysimgblt libsas fb_sys_fops ahci intel_cstate ipmi_si scsi_transport_sas libahci mei_me ipmi_devintf ioatdma drm intel_uncore ipmi_msghandler libata mei joydev dca wmi ip_tables
[ 70.725024] CPU: 8 PID: 1794 Comm: perf Not tainted 5.12.0-rc2-00303-gc928e9b1439d #1
[ 70.734501] Hardware name: Intel Corporation S2600WP/S2600WP, BIOS SE5C600.86B.02.02.0002.122320131210 12/23/2013
[ 70.746688] RIP: 0010:version_proc_show (kbuild/src/consumer/fs/proc/version.c:15)
[ 70.752690] Code: c3 0f 1f 44 00 00 55 48 c7 c6 00 dc 24 82 48 89 fd 48 c7 c7 a8 ed 57 82 e8 af 5d ff ff c6 05 90 60 ba 01 01 8a 05 8a 60 ba 01 <84> c0 74 04 f3 90 eb f2 65 48 8b 04 25 00 6f 01 00 48 8b 80 98 0b
All code
========
0: c3 retq
1: 0f 1f 44 00 00 nopl 0x0(%rax,%rax,1)
6: 55 push %rbp
7: 48 c7 c6 00 dc 24 82 mov $0xffffffff8224dc00,%rsi
e: 48 89 fd mov %rdi,%rbp
11: 48 c7 c7 a8 ed 57 82 mov $0xffffffff8257eda8,%rdi
18: e8 af 5d ff ff callq 0xffffffffffff5dcc
1d: c6 05 90 60 ba 01 01 movb $0x1,0x1ba6090(%rip) # 0x1ba60b4
24: 8a 05 8a 60 ba 01 mov 0x1ba608a(%rip),%al # 0x1ba60b4
2a:* 84 c0 test %al,%al <-- trapping instruction
2c: 74 04 je 0x32
2e: f3 90 pause
30: eb f2 jmp 0x24
32: 65 48 8b 04 25 00 6f mov %gs:0x16f00,%rax
39: 01 00
3b: 48 rex.W
3c: 8b .byte 0x8b
3d: 80 .byte 0x80
3e: 98 cwtl
3f: 0b .byte 0xb

Code starting with the faulting instruction
===========================================
0: 84 c0 test %al,%al
2: 74 04 je 0x8
4: f3 90 pause
6: eb f2 jmp 0xfffffffffffffffa
8: 65 48 8b 04 25 00 6f mov %gs:0x16f00,%rax
f: 01 00
11: 48 rex.W
12: 8b .byte 0x8b
13: 80 .byte 0x80
14: 98 cwtl
15: 0b .byte 0xb
[ 70.775248] RSP: 0018:ffffc9000b84bdd0 EFLAGS: 00000202
[ 70.781821] RAX: 0000000000000001 RBX: ffff888111af7ca8 RCX: 0000000000000000
[ 70.790549] RDX: 0000000000000000 RSI: ffff888f02a177f0 RDI: ffff888f02a177f0
[ 70.799294] RBP: ffff888111af7ca8 R08: ffff888f02a177f0 R09: ffffc9000b84bbf0
[ 70.808019] R10: 0000000000000001 R11: 0000000000000001 R12: ffffc9000b84be88
[ 70.816751] R13: ffffc9000b84be60 R14: ffff888111af7cd0 R15: 0000000000000001
[ 70.825493] FS: 00007f3b1f0397c0(0000) GS:ffff888f02a00000(0000) knlGS:0000000000000000
[ 70.835310] CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033
[ 70.842505] CR2: 00005650280be178 CR3: 0000000194ba2004 CR4: 00000000001706e0
[ 70.851261] Call Trace:
[ 70.854739] seq_read_iter (kbuild/src/consumer/fs/seq_file.c:227)
[ 70.859674] proc_reg_read_iter (kbuild/src/consumer/fs/proc/inode.c:311)
[ 70.864901] new_sync_read (kbuild/src/consumer/fs/read_write.c:416 (discriminator 1))
[ 70.869838] vfs_read (kbuild/src/consumer/fs/read_write.c:496)
[ 70.874326] ksys_read (kbuild/src/consumer/fs/read_write.c:634)
[ 70.878653] do_syscall_64 (kbuild/src/consumer/arch/x86/entry/common.c:46)
[ 70.883389] entry_SYSCALL_64_after_hwframe (kbuild/src/consumer/arch/x86/entry/entry_64.S:112)
[ 70.889715] RIP: 0033:0x7f3b1fcd5461
[ 70.894432] Code: fe ff ff 50 48 8d 3d fe d0 09 00 e8 e9 03 02 00 66 0f 1f 84 00 00 00 00 00 48 8d 05 99 62 0d 00 8b 00 85 c0 75 13 31 c0 0f 05 <48> 3d 00 f0 ff ff 77 57 c3 66 0f 1f 44 00 00 41 54 49 89 d4 55 48
All code
========
0: fe (bad)
1: ff (bad)
2: ff 50 48 callq *0x48(%rax)
5: 8d 3d fe d0 09 00 lea 0x9d0fe(%rip),%edi # 0x9d109
b: e8 e9 03 02 00 callq 0x203f9
10: 66 0f 1f 84 00 00 00 nopw 0x0(%rax,%rax,1)
17: 00 00
19: 48 8d 05 99 62 0d 00 lea 0xd6299(%rip),%rax # 0xd62b9
20: 8b 00 mov (%rax),%eax
22: 85 c0 test %eax,%eax
24: 75 13 jne 0x39
26: 31 c0 xor %eax,%eax
28: 0f 05 syscall
2a:* 48 3d 00 f0 ff ff cmp $0xfffffffffffff000,%rax <-- trapping instruction
30: 77 57 ja 0x89
32: c3 retq
33: 66 0f 1f 44 00 00 nopw 0x0(%rax,%rax,1)
39: 41 54 push %r12
3b: 49 89 d4 mov %rdx,%r12
3e: 55 push %rbp
3f: 48 rex.W

Code starting with the faulting instruction
===========================================
0: 48 3d 00 f0 ff ff cmp $0xfffffffffffff000,%rax
6: 77 57 ja 0x5f
8: c3 retq
9: 66 0f 1f 44 00 00 nopw 0x0(%rax,%rax,1)
f: 41 54 push %r12
11: 49 89 d4 mov %rdx,%r12
14: 55 push %rbp
15: 48 rex.W
[ 70.916824] RSP: 002b:00007fffa9898db8 EFLAGS: 00000246 ORIG_RAX: 0000000000000000
[ 70.922741] watchdog: BUG: soft lockup - CPU#27 stuck for 27s! [perf:1882]
[ 70.925999] RAX: ffffffffffffffda RBX: 0000565027ffc970 RCX: 00007f3b1fcd5461
[ 70.934426] Modules linked in: dm_mod
[ 70.943145] RDX: 0000000000000400 RSI: 0000565027ffcc20 RDI: 0000000000000004
[ 70.943147] RBP: 0000000000000d68 R08: 0000000000000001 R09: 0000000000000000
[ 70.947963] xfs
[ 70.956663] R10: 00007f3b1f0397c0 R11: 0000000000000246 R12: 00007f3b1fda2760
[ 70.965389] libcrc32c
[ 70.968210] R13: 00007f3b1fda32a0 R14: 0000000000000fff R15: 0000565027ffc970
[ 70.976908] sd_mod
[ 70.980313] Kernel panic - not syncing: softlockup: hung tasks
[ 70.989033] t10_pi
[ 70.992181] CPU: 8 PID: 1794 Comm: perf Tainted: G L 5.12.0-rc2-00303-gc928e9b1439d #1
[ 70.999457] sg
[ 71.002521] Hardware name: Intel Corporation S2600WP/S2600WP, BIOS SE5C600.86B.02.02.0002.122320131210 12/23/2013
[ 71.013538] intel_rapl_msr
[ 71.016248] Call Trace:
[ 71.028495] intel_rapl_common
[ 71.032402] <IRQ>
[ 71.035857] sb_edac
[ 71.039990] dump_stack (kbuild/src/consumer/lib/dump_stack.c:122)
[ 71.042930] x86_pkg_temp_thermal
[ 71.046073] panic (kbuild/src/consumer/kernel/panic.c:249)
[ 71.050431] intel_powerclamp
[ 71.054752] watchdog_timer_fn.cold (kbuild/src/consumer/kernel/watchdog.c:433)
[ 71.058764] coretemp
[ 71.062645] ? softlockup_fn (kbuild/src/consumer/kernel/watchdog.c:354)
[ 71.068018] kvm_intel
[ 71.071163] __hrtimer_run_queues (kbuild/src/consumer/kernel/time/hrtimer.c:1519 kbuild/src/consumer/kernel/time/hrtimer.c:1583)
[ 71.075919] kvm
[ 71.079161] hrtimer_interrupt (kbuild/src/consumer/kernel/time/hrtimer.c:1648)
[ 71.084497] irqbypass
[ 71.087153] __sysvec_apic_timer_interrupt (kbuild/src/consumer/arch/x86/include/asm/jump_label.h:25 kbuild/src/consumer/include/linux/jump_label.h:200 kbuild/src/consumer/arch/x86/include/asm/trace/irq_vectors.h:41 kbuild/src/consumer/arch/x86/kernel/apic/apic.c:1107)
[ 71.092326] mgag200
[ 71.095499] sysvec_apic_timer_interrupt (kbuild/src/consumer/arch/x86/kernel/apic/apic.c:1100 (discriminator 14))
[ 71.101652] crct10dif_pclmul
[ 71.104609] </IRQ>
[ 71.110501] crc32_pclmul
[ 71.114365] asm_sysvec_apic_timer_interrupt (kbuild/src/consumer/arch/x86/include/asm/idtentry.h:632)
[ 71.117264] crc32c_intel
[ 71.120677] RIP: 0010:version_proc_show (kbuild/src/consumer/fs/proc/version.c:15)
[ 71.126951] drm_kms_helper
[ 71.130406] Code: c3 0f 1f 44 00 00 55 48 c7 c6 00 dc 24 82 48 89 fd 48 c7 c7 a8 ed 57 82 e8 af 5d ff ff c6 05 90 60 ba 01 01 8a 05 8a 60 ba 01 <84> c0 74 04 f3 90 eb f2 65 48 8b 04 25 00 6f 01 00 48 8b 80 98 0b
All code
========
0: c3 retq
1: 0f 1f 44 00 00 nopl 0x0(%rax,%rax,1)
6: 55 push %rbp
7: 48 c7 c6 00 dc 24 82 mov $0xffffffff8224dc00,%rsi
e: 48 89 fd mov %rdi,%rbp
11: 48 c7 c7 a8 ed 57 82 mov $0xffffffff8257eda8,%rdi
18: e8 af 5d ff ff callq 0xffffffffffff5dcc
1d: c6 05 90 60 ba 01 01 movb $0x1,0x1ba6090(%rip) # 0x1ba60b4
24: 8a 05 8a 60 ba 01 mov 0x1ba608a(%rip),%al # 0x1ba60b4
2a:* 84 c0 test %al,%al <-- trapping instruction
2c: 74 04 je 0x32
2e: f3 90 pause
30: eb f2 jmp 0x24
32: 65 48 8b 04 25 00 6f mov %gs:0x16f00,%rax
39: 01 00
3b: 48 rex.W
3c: 8b .byte 0x8b
3d: 80 .byte 0x80
3e: 98 cwtl
3f: 0b .byte 0xb

Code starting with the faulting instruction
===========================================
0: 84 c0 test %al,%al
2: 74 04 je 0x8
4: f3 90 pause
6: eb f2 jmp 0xfffffffffffffffa
8: 65 48 8b 04 25 00 6f mov %gs:0x16f00,%rax
f: 01 00
11: 48 rex.W
12: 8b .byte 0x8b
13: 80 .byte 0x80
14: 98 cwtl
15: 0b .byte 0xb
[ 71.136243] ghash_clmulni_intel
[ 71.139898] RSP: 0018:ffffc9000b84bdd0 EFLAGS: 00000202
[ 71.162013] isci
[ 71.166220]
[ 71.172580] syscopyarea
[ 71.175291] RAX: 0000000000000001 RBX: ffff888111af7ca8 RCX: 0000000000000000
[ 71.177465] sysfillrect
[ 71.180792] RDX: 0000000000000000 RSI: ffff888f02a177f0 RDI: ffff888f02a177f0
[ 71.189291] rapl
[ 71.192595] RBP: ffff888111af7ca8 R08: ffff888f02a177f0 R09: ffffc9000b84bbf0
[ 71.201111] sysimgblt
[ 71.203764] R10: 0000000000000001 R11: 0000000000000001 R12: ffffc9000b84be88
[ 71.212307] libsas
[ 71.215453] R13: ffffc9000b84be60 R14: ffff888111af7cd0 R15: 0000000000000001
[ 71.223956] fb_sys_fops
[ 71.226799] seq_read_iter (kbuild/src/consumer/fs/seq_file.c:227)
[ 71.235309] ahci
[ 71.238628] proc_reg_read_iter (kbuild/src/consumer/fs/proc/inode.c:311)
[ 71.243337] intel_cstate
[ 71.246008] new_sync_read (kbuild/src/consumer/fs/read_write.c:416 (discriminator 1))
[ 71.250990] ipmi_si
[ 71.254390] vfs_read (kbuild/src/consumer/fs/read_write.c:496)
[ 71.259089] scsi_transport_sas
[ 71.262024] ksys_read (kbuild/src/consumer/fs/read_write.c:634)
[ 71.266253] libahci
[ 71.270260] do_syscall_64 (kbuild/src/consumer/arch/x86/entry/common.c:46)
[ 71.274349] mei_me
[ 71.277282] entry_SYSCALL_64_after_hwframe (kbuild/src/consumer/arch/x86/entry/entry_64.S:112)
[ 71.281743] ipmi_devintf
[ 71.284552] RIP: 0033:0x7f3b1fcd5461
[ 71.290647] ioatdma
[ 71.294094] Code: fe ff ff 50 48 8d 3d fe d0 09 00 e8 e9 03 02 00 66 0f 1f 84 00 00 00 00 00 48 8d 05 99 62 0d 00 8b 00 85 c0 75 13 31 c0 0f 05 <48> 3d 00 f0 ff ff 77 57 c3 66 0f 1f 44 00 00 41 54 49 89 d4 55 48
All code
========
0: fe (bad)
1: ff (bad)
2: ff 50 48 callq *0x48(%rax)
5: 8d 3d fe d0 09 00 lea 0x9d0fe(%rip),%edi # 0x9d109
b: e8 e9 03 02 00 callq 0x203f9
10: 66 0f 1f 84 00 00 00 nopw 0x0(%rax,%rax,1)
17: 00 00
19: 48 8d 05 99 62 0d 00 lea 0xd6299(%rip),%rax # 0xd62b9
20: 8b 00 mov (%rax),%eax
22: 85 c0 test %eax,%eax
24: 75 13 jne 0x39
26: 31 c0 xor %eax,%eax
28: 0f 05 syscall
2a:* 48 3d 00 f0 ff ff cmp $0xfffffffffffff000,%rax <-- trapping instruction
30: 77 57 ja 0x89
32: c3 retq
33: 66 0f 1f 44 00 00 nopw 0x0(%rax,%rax,1)
39: 41 54 push %r12
3b: 49 89 d4 mov %rdx,%r12
3e: 55 push %rbp
3f: 48 rex.W

Code starting with the faulting instruction
===========================================
0: 48 3d 00 f0 ff ff cmp $0xfffffffffffff000,%rax
6: 77 57 ja 0x5f
8: c3 retq
9: 66 0f 1f 44 00 00 nopw 0x0(%rax,%rax,1)
f: 41 54 push %r12
11: 49 89 d4 mov %rdx,%r12
14: 55 push %rbp
15: 48 rex.W


To reproduce:

git clone https://github.com/intel/lkp-tests.git
cd lkp-tests
bin/lkp install job.yaml # job file is attached in this email
bin/lkp split-job --compatible job.yaml
bin/lkp run compatible-job.yaml



---
0DAY/LKP+ Test Infrastructure Open Source Technology Center
https://lists.01.org/hyperkitty/list/[email protected] Intel Corporation

Thanks,
Oliver Sang


Attachments:
(No filename) (14.17 kB)
config-5.12.0-rc2-00303-gc928e9b1439d (175.56 kB)
job-script (8.37 kB)
dmesg.xz (26.72 kB)
job.yaml (5.56 kB)
Download all attachments

2021-03-16 15:40:56

by Petr Mladek

[permalink] [raw]
Subject: Re: c928e9b143: BUG:soft_lockup-CPU##stuck_for#s![perf:#]

On Mon 2021-03-15 22:04:41, kernel test robot wrote:
>
>
> Greeting,
>
> FYI, we noticed the following commit (built with gcc-9):
>
> commit: c928e9b1439de4d74b942abd30d5c838a40af777 ("[PATCH v2 7/7] Test softlockup")
> url: https://github.com/0day-ci/linux/commits/Petr-Mladek/watchdog-softlockup-Report-overall-time-and-some-cleanup/20210311-205501
> base: https://git.kernel.org/cgit/linux/kernel/git/torvalds/linux.git a74e6a014c9d4d4161061f770c9b4f98372ac778

Just for record. Please, ignore this error. The patch is not intended
for upstream.

This patch was provided only for testing purposes. It causes
softlockup, so it works as expected.

Best Regards,
Petr

2021-05-16 10:49:47

by Sergey Senozhatsky

[permalink] [raw]
Subject: Re: [PATCH v2 6/7] watchdog: Cleanup handling of false positives

Hi,

// This was never in my inbox, so sorry if I mess up the "Reply-to"
// Original message: https://lore.kernel.org/lkml/[email protected]/


>@@ -375,7 +375,14 @@ static enum hrtimer_restart watchdog_timer_fn(struct hrtimer *hrtimer)
> /* .. and repeat */
> hrtimer_forward_now(hrtimer, ns_to_ktime(sample_period));
>
> - /* Reset the interval when touched externally by a known slow code. */
> + /*
> + * If a virtual machine is stopped by the host it can look to
> + * the watchdog like a soft lockup. Check to see if the host
> + * stopped the vm before we process the timestamps.
> + */
> + kvm_check_and_clear_guest_paused();
> +
[..]
>@@ -401,14 +405,6 @@ static enum hrtimer_restart watchdog_timer_fn(struct hrtimer *hrtimer)
> */
> duration = is_softlockup(touch_ts, period_ts);
> if (unlikely(duration)) {
> - /*
> - * If a virtual machine is stopped by the host it can look to
> - * the watchdog like a soft lockup, check to see if the host
> - * stopped the vm before we issue the warning
> - */
> - if (kvm_check_and_clear_guest_paused())
> - return HRTIMER_RESTART;

This looks racy to me. I believe kvm_check_and_clear_guest_paused()
was in the right place.

VCPU can be scheduled out/preepmpted any time at any point; and then
guest VM (or even the entire system) can be suspended. When we resume
the VM we continue from where we were preempted (from VCPU POW).

So what the old code did

watchdog_timer_fn()
{
...
<<!!>>

// Suppose we are suspended here. When we are getting resumed
// jiffies jump forward, which may look like a soft lockup.
duration = is_softlockup(touch_ts, period_ts);
if (unlikely(duration)) {
// And this is where kvm_check_and_clear_guest_paused()
// jumps in. We know already that jiffies have jumped,
// we don't know if jiffies jumped because the VM was
// suspended. And this is what we figure out here and
// bail out
if (kvm_check_and_clear_guest_paused())
return HRTIMER_RESTART;
}
}

The new code does the following

watchdog_timer_fn()
{
...
kvm_check_and_clear_guest_paused(); // PVCLOCK_GUEST_STOPPED is not set

<<!!>>

// Suppose the VM got suspended at this point. PVCLOCK_GUEST_STOPPED
// is set, but we don't check it. jiffies will jump and this will look
// like a lockup, but we don't check if jiffies jumped because the VM
// was suspended
duration = is_softlockup(touch_ts, period_ts);
if (unlikely(duration)) {
// report the lockup and perhaps panic the system,
// depending on the configuration
}
}

What am I missing?

2021-05-18 10:09:12

by Petr Mladek

[permalink] [raw]
Subject: Re: [PATCH v2 6/7] watchdog: Cleanup handling of false positives

On Sun 2021-05-16 19:46:21, Sergey Senozhatsky wrote:
> Hi,
>
> // This was never in my inbox, so sorry if I mess up the "Reply-to"
> // Original message: https://lore.kernel.org/lkml/[email protected]/
>
>
> >@@ -375,7 +375,14 @@ static enum hrtimer_restart watchdog_timer_fn(struct hrtimer *hrtimer)
> > /* .. and repeat */
> > hrtimer_forward_now(hrtimer, ns_to_ktime(sample_period));
> >
> > - /* Reset the interval when touched externally by a known slow code. */
> > + /*
> > + * If a virtual machine is stopped by the host it can look to
> > + * the watchdog like a soft lockup. Check to see if the host
> > + * stopped the vm before we process the timestamps.
> > + */
> > + kvm_check_and_clear_guest_paused();
> > +
> [..]
> >@@ -401,14 +405,6 @@ static enum hrtimer_restart watchdog_timer_fn(struct hrtimer *hrtimer)
> > */
> > duration = is_softlockup(touch_ts, period_ts);
> > if (unlikely(duration)) {
> > - /*
> > - * If a virtual machine is stopped by the host it can look to
> > - * the watchdog like a soft lockup, check to see if the host
> > - * stopped the vm before we issue the warning
> > - */
> > - if (kvm_check_and_clear_guest_paused())
> > - return HRTIMER_RESTART;
>
> This looks racy to me. I believe kvm_check_and_clear_guest_paused()
> was in the right place.
>
> VCPU can be scheduled out/preepmpted any time at any point; and then
> guest VM (or even the entire system) can be suspended. When we resume
> the VM we continue from where we were preempted (from VCPU POW).
>
> So what the old code did
>
> watchdog_timer_fn()
> {
> ...
> <<!!>>
>
> // Suppose we are suspended here. When we are getting resumed
> // jiffies jump forward, which may look like a soft lockup.
> duration = is_softlockup(touch_ts, period_ts);
> if (unlikely(duration)) {
> // And this is where kvm_check_and_clear_guest_paused()
> // jumps in. We know already that jiffies have jumped,
> // we don't know if jiffies jumped because the VM was
> // suspended. And this is what we figure out here and
> // bail out
> if (kvm_check_and_clear_guest_paused())
> return HRTIMER_RESTART;
> }
> }
>
> The new code does the following
>
> watchdog_timer_fn()
> {
> ...
> kvm_check_and_clear_guest_paused(); // PVCLOCK_GUEST_STOPPED is not set
>
> <<!!>>
>
> // Suppose the VM got suspended at this point. PVCLOCK_GUEST_STOPPED
> // is set, but we don't check it. jiffies will jump and this will look
> // like a lockup, but we don't check if jiffies jumped because the VM
> // was suspended
> duration = is_softlockup(touch_ts, period_ts);
> if (unlikely(duration)) {
> // report the lockup and perhaps panic the system,
> // depending on the configuration
> }
> }
>
> What am I missing?

Great catch! You have a point.

Well, I think that the entire code is racy. touch_ts and period_ts are
set by:

unsigned long touch_ts = __this_cpu_read(watchdog_touch_ts);
unsigned long period_ts = __this_cpu_read(watchdog_report_ts);

They are neither volatile not there are any barriers. It means that
period_ts might be re-read in these two checks:

/* Reset the interval when touched by known problematic code. */
if (period_ts == SOFTLOCKUP_DELAY_REPORT) {
update_report_ts();
return HRTIMER_RESTART;
}

and

duration = is_softlockup(touch_ts, period_ts);


where:

static int is_softlockup(unsigned long touch_ts, unsigned long period_ts)
{
unsigned long now = get_timestamp();

if ((watchdog_enabled & SOFT_WATCHDOG_ENABLED) && watchdog_thresh){
/* Warn about unreasonable delays. */
if (time_after(now, period_ts + get_softlockup_thresh()))
return now - touch_ts;
}
return 0;
}

Now, if the watchdog is touched from NMI. period_ts might be
SOFTLOCKUP_DELAY_REPORT. It is ULONG_MAX.

As a result period_ts + get_softlockup_thresh() would overflow and
we could report softlockup even when there is none.

I probably does not happen because the per-CPU variable is read only
once. And watchdogs are not typically touched from NMI. Except that
show_regs() actually touch the watchdog.

That said. This patch most likely made things worse and should be
reverted. But even better solution would be to remove this race.
I mean to make the code safe and sane at the same time.

Best Regards,
Petr