2020-01-03 15:11:54

by Waiman Long

Subject: [PATCH v2] watchdog: Fix possible soft lockup warning at bootup

It was found that a watchdog soft lockup warning was displayed on a
Cavium ThunderX2 Sabre arm64 system at bootup time:

[ 25.496379] watchdog: BUG: soft lockup - CPU#14 stuck for 22s! [swapper/14:0]
[ 25.496381] Modules linked in:
[ 25.496386] CPU: 14 PID: 0 Comm: swapper/14 Tainted: G W L --------- - - 4.18.0-rhel8.1+ #9
[ 25.496388] pstate: 60000009 (nZCv daif -PAN -UAO)
[ 25.496393] pc : arch_cpu_idle+0x34/0x140
[ 25.496395] lr : arch_cpu_idle+0x30/0x140
[ 25.496397] sp : ffff000021f4ff10
[ 25.496398] x29: ffff000021f4ff10 x28: 0000000000000000
[ 25.496401] x27: 0000000000000000 x26: ffff809f483c0000
[ 25.496404] x25: 0000000000000000 x24: ffff00001145c03c
[ 25.496407] x23: ffff00001110c9f8 x22: ffff000011453708
[ 25.496410] x21: ffff00001145bffc x20: 0000000000004000
[ 25.496413] x19: ffff0000110f0018 x18: 0000000000000010
[ 25.496416] x17: 0000000000000cc8 x16: 0000000000000000
[ 25.496419] x15: ffffffffffffffff x14: ffff000011453708
[ 25.496422] x13: ffff000091cc5caf x12: ffff000011cc5cb7
[ 25.496424] x11: 6572203030642072 x10: 0000000000000d10
[ 25.496427] x9 : ffff000021f4fe80 x8 : ffff809f483c0d70
[ 25.496430] x7 : 00000000b123f581 x6 : 00000000ffff8ae1
[ 25.496433] x5 : 00000000ffffffff x4 : 0000809f6ac90000
[ 25.496436] x3 : 4000000000000000 x2 : ffff809f7bd9e9c0
[ 25.496439] x1 : ffff0000110f0018 x0 : ffff000021f4ff10
[ 25.496441] Call trace:
[ 25.496444] arch_cpu_idle+0x34/0x140
[ 25.496447] do_idle+0x210/0x288
[ 25.496449] cpu_startup_entry+0x2c/0x30
[ 25.496452] secondary_start_kernel+0x124/0x138

Further analysis of the situation revealed that the smp_init() call
itself took more than 20s for that 2-socket 56-core and 224-thread
server.

[ 0.115632] CPU1: Booted secondary processor 0x0000000100 [0x431f0af1]
:
[ 27.177282] CPU223: Booted secondary processor 0x0000011b03 [0x431f0af1]

By adding some instrumentation code, it was found that for cpu 14,
watchdog_enable() was called early with a timestamp of 1. That activates
the watchdog time checking logic. It was also found that the monotonic
time measured during the smp_init() phase runs much slower than the
real elapsed time as shown by the below debug printf output:

[ 1.138522] run_queues, watchdog_timer_fn: now = 170000000
[ 25.519391] run_queues, watchdog_timer_fn: now = 4170000000

In this particular case, it took about 24.4s of elapsed time for the
clock to advance 4s which is the soft expiration time that is required
to trigger the calling of watchdog_timer_fn(). That clock slowdown
stopped once the smp_init() call was done and the clock time ran at
the same rate as the elapsed time afterward.
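Conceptually, the instrumentation boils down to something like the sketch
below (the helper name is made up; this is not the actual debug patch):
print clock MONOTONIC next to sched_clock() each time the watchdog hrtimer
fires, so the divergence between the two clocks shows up directly in dmesg.

/*
 * Debug sketch only, not the actual instrumentation: compare clock
 * MONOTONIC, which drives the hrtimer based watchdog, with sched_clock(),
 * which drives the printk timestamps. Can be called from
 * watchdog_timer_fn() or __hrtimer_run_queues().
 */
#include <linux/timekeeping.h>
#include <linux/sched/clock.h>
#include <linux/printk.h>

static void watchdog_debug_clocks(const char *where)
{
        u64 mono  = ktime_get_ns();     /* clock MONOTONIC in ns */
        u64 sched = sched_clock();      /* sched clock in ns */

        pr_info("%s: mono=%llu sched=%llu delta=%lld\n",
                where, mono, sched, (s64)(sched - mono));
}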

On a comparable CN9980 system from HPE, there was also a bit of clock
slowdown but not as much as the Cavium system:

[ 1.177068] run_queues, watchdog_timer_fn: now = 1010000000
[ 5.577925] run_queues, watchdog_timer_fn: now = 5010000000

Similar clock slowdown was not observed on x86-64 systems.

The clock slowdown in arm64 systems seems to be hardware specific.

Given the fact that the clock is less reliable during the smp_init()
call and that call can take a rather long time, one simple workaround
to avoid this soft lockup problem is to move lockup_detector_init()
call after smp_init() when the clock is reliable. This does fix the
soft lockup problem for that Cavium system and does not seem to affect
the operation of the watchdog. This is much easier than trying to fix
the clock slowdown problem in smp_init() as the watchdog timer is the
only timer function that can be active at such early boot stage anyway.

Signed-off-by: Waiman Long <[email protected]>
---
init/main.c | 2 +-
1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/init/main.c b/init/main.c
index 1ecfd43ed464..82e231d69265 100644
--- a/init/main.c
+++ b/init/main.c
@@ -1204,9 +1204,9 @@ static noinline void __init kernel_init_freeable(void)
init_mm_internals();

do_pre_smp_initcalls();
- lockup_detector_init();

smp_init();
+ lockup_detector_init();
sched_init_smp();

page_alloc_init_late();
--
2.18.1


2020-01-16 02:36:46

by Thomas Gleixner

Subject: Re: [PATCH v2] watchdog: Fix possible soft lockup warning at bootup

Waiman Long <[email protected]> writes:

> [ 25.496379] watchdog: BUG: soft lockup - CPU#14 stuck for 22s! [swapper/14:0]
>
> Further analysis of the situation revealed that the smp_init() call
> itself took more than 20s for that 2-socket 56-core and 224-thread
> server.
>
> [ 0.115632] CPU1: Booted secondary processor 0x0000000100 [0x431f0af1]
> :
> [ 27.177282] CPU223: Booted secondary processor 0x0000011b03 [0x431f0af1]

So what?

> By adding some instrumentation code, it was found that for cpu 14,
> watchdog_enable() was called early with a timestamp of 1. That activates
> the watchdog time checking logic. It was also found that the monotonic
> time measured during the smp_init() phase runs much slower than the
> real elapsed time as shown by the below debug printf output:
>
> [ 1.138522] run_queues, watchdog_timer_fn: now = 170000000
> [ 25.519391] run_queues, watchdog_timer_fn: now = 4170000000
>
> In this particular case, it took about 24.4s of elapsed time for the
> clock to advance 4s which is the soft expiration time that is required
> to trigger the calling of watchdog_timer_fn(). That clock slowdown
> stopped once the smp_init() call was done and the clock time ran at
> the same rate as the elapsed time afterward.

So the consequence is that you defer the soft lockup detector
initialization instead of figuring out WHY this time difference
happens. There is no clock slowdown. The clock keeps ticking at the same
frequency so there must be something else which causes this deviation.

> On a comparable CN9980 system from HPE, there was also a bit of clock
> slowdown but not as much as the Cavium system:
>
> [ 1.177068] run_queues, watchdog_timer_fn: now = 1010000000
> [ 5.577925] run_queues, watchdog_timer_fn: now = 5010000000
>
> Similar clock slowdown was not observed on x86-64 systems.
>
> The clock slowdown in arm64 systems seems to be hardware specific.

Maybe, but that's not a reasonable explanation.

> Given the fact that the clock is less reliable during the smp_init()

Which fact? An observation is not a fact.

> call and that call can take a rather long time, one simple workaround
> to avoid this soft lockup problem is to move lockup_detector_init()
> call after smp_init() when the clock is reliable. This does fix the
> soft lockup problem for that Cavium system and does not seem to affect
> the operation of the watchdog. This is much easier than trying to fix
> the clock slowdown problem in smp_init() as the watchdog timer is the
> only timer function that can be active at such early boot stage anyway.

Sure, it's much easier to weaken debug features instead of actually
analyzing why there is a deviation between clocks which should not have
any deviation in the first place. Root cause analysis is overrated and
just a pet peeve of grumpy old greybeards, right?

Andrew, please drop this hackery.

Thanks,

tglx

2020-01-16 11:45:27

by Thomas Gleixner

Subject: Re: [PATCH v2] watchdog: Fix possible soft lockup warning at bootup

Thomas Gleixner <[email protected]> writes:

Added ARM64 and ThunderX folks

> Waiman Long <[email protected]> writes:
>> By adding some instrumentation code, it was found that for cpu 14,
>> watchdog_enable() was called early with a timestamp of 1. That activates
>> the watchdog time checking logic. It was also found that the monotonic
>> time measured during the smp_init() phase runs much slower than the
>> real elapsed time as shown by the below debug printf output:
>>
>> [ 1.138522] run_queues, watchdog_timer_fn: now = 170000000
>> [ 25.519391] run_queues, watchdog_timer_fn: now = 4170000000
>>
>> In this particular case, it took about 24.4s of elapsed time for the
>> clock to advance 4s which is the soft expiration time that is required
>> to trigger the calling of watchdog_timer_fn(). That clock slowdown
>> stopped once the smp_init() call was done and the clock time ran at
>> the same rate as the elapsed time afterward.

And looking at this with a more awake brain, the root cause is pretty
obvious.

sched_clock() advances by 24 seconds, but clock MONOTONIC on which the
watchdog timer is based does not. As the timestamps you printed have 7
trailing zeros, it's pretty clear that timekeeping is still jiffies
based at this point and HZ is set to 100.

So while bringing up the non-boot CPUs the boot CPU loses ~2000 timer
interrupts. That needs to be fixed and not papered over.

Thanks,

tglx

2020-01-16 15:13:46

by Robert Richter

Subject: Re: [PATCH v2] watchdog: Fix possible soft lockup warning at bootup

Thanks Thomas for cc'ing me.

On 16.01.20 12:44:07, Thomas Gleixner wrote:
> Thomas Gleixner <[email protected]> writes:
>
> Added ARM64 and ThunderX folks
>
> > Waiman Long <[email protected]> writes:
> >> By adding some instrumentation code, it was found that for cpu 14,
> >> watchdog_enable() was called early with a timestamp of 1. That activates
> >> the watchdog time checking logic. It was also found that the monotonic
> >> time measured during the smp_init() phase runs much slower than the
> >> real elapsed time as shown by the below debug printf output:
> >>
> >> [ 1.138522] run_queues, watchdog_timer_fn: now = 170000000
> >> [ 25.519391] run_queues, watchdog_timer_fn: now = 4170000000
> >>
> >> In this particular case, it took about 24.4s of elapsed time for the
> >> clock to advance 4s which is the soft expiration time that is required
> >> to trigger the calling of watchdog_timer_fn(). That clock slowdown
> >> stopped once the smp_init() call was done and the clock time ran at
> >> the same rate as the elapsed time afterward.
>
> And looking at this with a more awake brain, the root cause is pretty
> obvious.
>
> sched_clock() advances by 24 seconds, but clock MONOTONIC on which the
> watchdog timer is based does not. As the timestamps you printed have 7
> trailing zeros, it's pretty clear that timekeeping is still jiffies
> based at this point and HZ is set to 100.
>
> So while bringing up the non-boot CPUs the boot CPU loses ~2000 timer
> interrupts. That needs to be fixed and not papered over.

I have looked into this back in December and observed the following
soft lockups on cpu #15 to #155 (out of 224):

[ 22.697242] watchdog: BUG: soft lockup - CPU#15 stuck for 20s! [swapper/15:0]
[ 22.990042] watchdog: BUG: soft lockup - CPU#16 stuck for 20s! [swapper/16:0]
[ 23.139546] watchdog: BUG: soft lockup - CPU#17 stuck for 20s! [swapper/17:0]
[ 23.280614] watchdog: BUG: soft lockup - CPU#18 stuck for 20s! [swapper/18:0]
[ 23.431378] watchdog: BUG: soft lockup - CPU#20 stuck for 20s! [swapper/20:0]
[ 23.431747] watchdog: BUG: soft lockup - CPU#19 stuck for 20s! [swapper/19:0]
[ 23.722134] watchdog: BUG: soft lockup - CPU#21 stuck for 21s! [swapper/21:0]
[ 23.723440] watchdog: BUG: soft lockup - CPU#22 stuck for 21s! [swapper/22:0]
[ 24.010887] watchdog: BUG: soft lockup - CPU#36 stuck for 20s! [swapper/36:0]
...
[ 25.149322] watchdog: BUG: soft lockup - CPU#150 stuck for 11s! [swapper/150:0]
[ 25.155822] watchdog: BUG: soft lockup - CPU#151 stuck for 11s! [swapper/151:0]
[ 25.162345] watchdog: BUG: soft lockup - CPU#152 stuck for 11s! [swapper/152:0]
[ 25.178463] watchdog: BUG: soft lockup - CPU#153 stuck for 11s! [swapper/153:0]
[ 25.184929] watchdog: BUG: soft lockup - CPU#154 stuck for 11s! [swapper/154:0]
[ 25.191382] watchdog: BUG: soft lockup - CPU#155 stuck for 11s! [swapper/155:0]

The reason it starts at #15 is that this is the first cpu where the
timer starts at 1 sec, cpus #1 to #14 have a value of 0 which is
handled special (see watchdog_timer_fn(), if (touch_ts == 0) ...).
Otherwise those cpus would also show the soft lockup.
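For reference, the touch_ts handling in watchdog_timer_fn() looks roughly
like this at the time (paraphrased from kernel/watchdog.c): a timestamp of
0 is taken as "watchdog was just touched", so the lockup check is skipped
and the timestamp is simply refreshed.

        if (touch_ts == 0) {
                if (unlikely(__this_cpu_read(softlockup_touch_sync))) {
                        /* Timestamp was touched atomically, resync now */
                        __this_cpu_write(softlockup_touch_sync, false);
                        sched_clock_tick();
                }

                /* Clear the guest paused flag on watchdog reset */
                kvm_check_and_clear_guest_paused();
                __touch_watchdog();
                return HRTIMER_RESTART;
        }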

The reason it stops after cpu #155 is that the boot process advanced
(25 secs) so that watchdog_touch_ts is now updated properly.

During secondary boot cpu bringup I have seen the hrtimer callback
watchdog_timer_fn() running correctly. get_timestamp() is also
correct. But watchdog_touch_ts is not updated since softlockup_fn() is
not running for more than 23 secs, it still keeps the original time
from watchdog_enable() (e.g. 1 for cpu #15). So the high prio
stop_worker thread which calls softlockup_fn() is not scheduled for
some reason. But it looks like scheduling starts around the time the
primary cpu has scheduled all secondary cpus to become online
(smp_init()).

[ 23.431441] watchdog: __touch_watchdog:277: cpu 20, old: 1, new: 21, from: softlockup_fn+0x14/0x38, thresh: 5
[ 23.877843] CPU223: Booted secondary processor 0x0000011b03 [0x431f0af2]
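For reference, softlockup_fn() (which calls __touch_watchdog() above) is
roughly the following in kernel/watchdog.c of that era; it is the only
thing that moves watchdog_touch_ts forward:

/* Runs as cpu_stop work on the watched CPU. If the stopper thread never
 * gets to run, the timestamp stays at its watchdog_enable() value. */
static int softlockup_fn(void *data)
{
        __this_cpu_write(soft_lockup_hrtimer_cnt,
                         __this_cpu_read(hrtimer_interrupts));
        __touch_watchdog();
        complete(this_cpu_ptr(&softlockup_completion));

        return 0;
}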

So your theory the MONOTONIC clock runs differently/wrongly could
explain that (assuming this drives the sched clock). Though, I am
wondering what state the scheduler is in up to the point it finally
starts running, idle?

On the other hand, could it be that the scheduler starts running only
after the primary cpu finishes smp_init()? Meaning, if that takes
longer than the soft_thresh we will see soft lockups?

Note: We have seen this with a BIOS change that caused longer delays
when onlining the cpus via the fw call.

Thanks,

-Robert

2020-01-16 15:35:54

by Waiman Long

Subject: Re: [PATCH v2] watchdog: Fix possible soft lockup warning at bootup

On 1/16/20 6:44 AM, Thomas Gleixner wrote:
> Thomas Gleixner <[email protected]> writes:
>
> Added ARM64 and ThunderX folks
>
>> Waiman Long <[email protected]> writes:
>>> By adding some instrumentation code, it was found that for cpu 14,
>>> watchdog_enable() was called early with a timestamp of 1. That activates
>>> the watchdog time checking logic. It was also found that the monotonic
>>> time measured during the smp_init() phase runs much slower than the
>>> real elapsed time as shown by the below debug printf output:
>>>
>>> [ 1.138522] run_queues, watchdog_timer_fn: now = 170000000
>>> [ 25.519391] run_queues, watchdog_timer_fn: now = 4170000000
>>>
>>> In this particular case, it took about 24.4s of elapsed time for the
>>> clock to advance 4s which is the soft expiration time that is required
>>> to trigger the calling of watchdog_timer_fn(). That clock slowdown
>>> stopped once the smp_init() call was done and the clock time ran at
>>> the same rate as the elapsed time afterward.
> And looking at this with a more awake brain, the root cause is pretty
> obvious.
>
> sched_clock() advances by 24 seconds, but clock MONOTONIC on which the
> watchdog timer is based does not. As the timestamps you printed have 7
> trailing zeros, it's pretty clear that timekeeping is still jiffies
> based at this point and HZ is set to 100.
>
> So while bringing up the non-boot CPUs the boot CPU loses ~2000 timer
> interrupts. That needs to be fixed and not papered over.
>
You are right that the root-causing effort wasn't complete and I took
the easy way out. As I have limited knowledge of how the timer code
works, I was not sure how to start further investigation at that time.
Your insight gives me a hint on where to start now. So I will dig in
further to see what causes this.

Thanks,
Longman

2020-01-16 16:59:14

by Thomas Gleixner

Subject: Re: [PATCH v2] watchdog: Fix possible soft lockup warning at bootup

Robert Richter <[email protected]> writes:
> On 16.01.20 12:44:07, Thomas Gleixner wrote:
>> And looking at this with a more awake brain, the root cause is pretty
>> obvious.
>>
>> sched_clock() advances by 24 seconds, but clock MONOTONIC on which the
>> watchdog timer is based does not. As the timestamps you printed have 7
>> trailing zeros, it's pretty clear that timekeeping is still jiffies
>> based at this point and HZ is set to 100.
>>
>> So while bringing up the non-boot CPUs the boot CPU loses ~2000 timer
>> interrupts. That needs to be fixed and not papered over.
>
> I have looked into this back in December and observed the following
> soft lockups on cpu #15 to #155 (out of 224):
>
> [ 22.697242] watchdog: BUG: soft lockup - CPU#15 stuck for 20s! [swapper/15:0]
> ...
> [ 25.191382] watchdog: BUG: soft lockup - CPU#155 stuck for 11s! [swapper/155:0]
>
> The reason it starts at #15 is that this is the first cpu where the
> timer starts at 1 sec, cpus #1 to #14 have a value of 0 which is
> handled special (see watchdog_timer_fn(), if (touch_ts == 0) ...).
> Otherwise those cpus would also show the soft lockup.

Bah. That's broken to begin with, but trivial to fix.

> The reason it stops after cpu #155 is that the boot process advanced
> (25 secs) so that watchdog_touch_ts is now updated properly.
>
> During secondary boot cpu bringup I have seen the hrtimer callback
> watchdog_timer_fn() running correctly. get_timestamp() is also

On all the upcoming CPUs?

> correct. But watchdog_touch_ts is not updated since softlockup_fn() is
> not running for more than 23 secs, it still keeps the original time
> from watchdog_enable() (e.g. 1 for cpu #15). So the high prio
> stop_worker thread which calls softlockup_fn() is not scheduled for
> some reason. But it looks like scheduling starts around the time the
> primary cpu has scheduled all secondary cpus to become online
> (smp_init()).

So what prevents the stopper thread from running? It should be runnable right
after the point when the upcoming CPU reaches cpu_startup_entry(). The
stopper thread is unparked _before_ the hotplug thread on the upcoming
CPU is woken.

> So your theory the MONOTONIC clock runs differently/wrongly could
> explain that (assuming this drives the sched clock). Though, I am

No. sched_clock() is separate. It uses a raw timestamp (in your case
from the ARM arch timer) and converts it to something which is close to
proper time. So my assumption was based on the printout Waiman had:

[ 1... ] CPU.... watchdog_fn now 170000000
[ 25.. ] CPU.... watchdog_fn now 4170000000

I assumed that now comes from ktime_get() or something like
that. Waiman?

The printk timestamps come from sched_clock().

> wondering what state the scheduler is in up to the point it finally
> starts running, idle?

The scheduler is active right at the point where the upcoming CPU
reaches cpu_startup_entry().

> On the other hand, could it be that the scheduler starts running only
> after the primary cpu finishes smp_init()? Meaning, if that takes
> longer than the soft_thresh we will see soft lockups?

No.

> Note: We have seen this with a BIOS change that caused longer delays
> when onlining the cpus via the fw call.

The onlining through the firmware call is just a tiny part of the
hotplug procedure. Hotplug does:

  Control CPU                           Upcoming CPU

  prepare_stuff_for_cpu()
  kick_cpu_alive()  <- Firmware call, IPI, whatever

  wait_for_idle()
                                        Low level init
                                        cpu_startup_entry()
  unpark_stopper()
  kick_hotplug_thread()
                                        thread context init

  wait_for_online()
                                        final state reached

But IF the firmware call takes ages, then the boot CPU which is
responsible for timekeeping at this point might actually miss timer
interrupts en masse. Let's assume the firmware call takes one second and
we have HZ 100.

  Real time     Kernel time     Context
  0.0           0.0             firmware call
  1.0           0.01            timer interrupt
                                hotplug stuff
  1.01          0.02            timer interrupt
  ...
  1.05          0.05            firmware call
  2.05          0.06            timer interrupt

That sums up quickly. And that would obviously cause the watchdog timers
to expire late as well, as they are based on kernel time.

If the watchdog timer then fires after a while, the timestamp is already
off by more than the threshold and the lockup detector triggers.
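For reference, the check is roughly the following (kernel/watchdog.c at
that time); both values are in seconds derived from the scheduler clock,
so once softlockup_fn() has not refreshed touch_ts for more than the soft
threshold, the detector fires:

static int is_softlockup(unsigned long touch_ts)
{
        unsigned long now = get_timestamp();    /* seconds, scheduler clock */

        if ((watchdog_enabled & SOFT_WATCHDOG_ENABLED) && watchdog_thresh) {
                /* Warn about unreasonable delays. */
                if (time_after(now, touch_ts + get_softlockup_thresh()))
                        return now - touch_ts;
        }
        return 0;
}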

So yes, this would be a very reasonable explanation. Let's do the math
for Waiman's example:

While sched clock advanced by 24.4 seconds, kernel time advanced only by
4 seconds.

The whole hotplug machinery takes:

[ 0.115632] CPU1: Booted secondary processor 0x0000000100 [0x431f0af1]
[ 27.177282] CPU223: Booted secondary processor 0x0000011b03 [0x431f0af1]

27 seconds, which means ~120ms per CPU.

From the time delta we know that we lost ~2000 timer interrupts in 24.4
seconds, which is about 24.4 / 0.12 ~= 203 CPUs.

Now 2000 timer interrupts (HZ=100) are 20 seconds which means that the
firmware call took about 100ms (20/203) per CPU.

Waiman also said that on an HPE machine the clock delta was observable
as well, but not as bad, which means the firmware call there is
significantly shorter.

Of course this cannot be observed on x86 because all x86 does to kick a
secondary CPU into life is to send an IPI to it.

Hmm?

Thanks,

tglx

2020-01-16 18:18:59

by Thomas Gleixner

[permalink] [raw]
Subject: [PATCH] watchdog/softlockup: Enforce that timestamp is valid on boot

Robert reported that during boot the watchdog timestamp is set to 0 for one
second which is the indicator for a watchdog reset.

The reason for this is that the timestamp is in seconds and the time is
taken from sched clock and divided by ~1e9. sched clock starts at 0 which
means that for the first second during boot the watchdog timestamp is 0,
i.e. reset.

Use ULONG_MAX as the reset indicator value so the watchdog works correctly
right from the start. ULONG_MAX would only conflict with a real timestamp
if the system reaches an uptime of 136 years on 32bit and almost eternity
on 64bit.

Reported-by: Robert Richter <[email protected]>
Signed-off-by: Thomas Gleixner <[email protected]>
---
kernel/watchdog.c | 10 ++++++----
1 file changed, 6 insertions(+), 4 deletions(-)

--- a/kernel/watchdog.c
+++ b/kernel/watchdog.c
@@ -161,6 +161,8 @@ static void lockup_detector_update_enabl

#ifdef CONFIG_SOFTLOCKUP_DETECTOR

+#define SOFTLOCKUP_RESET ULONG_MAX
+
/* Global variables, exported for sysctl */
unsigned int __read_mostly softlockup_panic =
CONFIG_BOOTPARAM_SOFTLOCKUP_PANIC_VALUE;
@@ -272,7 +274,7 @@ notrace void touch_softlockup_watchdog_s
* Preemption can be enabled. It doesn't matter which CPU's timestamp
* gets zeroed here, so use the raw_ operation.
*/
- raw_cpu_write(watchdog_touch_ts, 0);
+ raw_cpu_write(watchdog_touch_ts, SOFTLOCKUP_RESET);
}

notrace void touch_softlockup_watchdog(void)
@@ -296,14 +298,14 @@ void touch_all_softlockup_watchdogs(void
* the softlockup check.
*/
for_each_cpu(cpu, &watchdog_allowed_mask)
- per_cpu(watchdog_touch_ts, cpu) = 0;
+ per_cpu(watchdog_touch_ts, cpu) = SOFTLOCKUP_RESET;
wq_watchdog_touch(-1);
}

void touch_softlockup_watchdog_sync(void)
{
__this_cpu_write(softlockup_touch_sync, true);
- __this_cpu_write(watchdog_touch_ts, 0);
+ __this_cpu_write(watchdog_touch_ts, SOFTLOCKUP_RESET);
}

static int is_softlockup(unsigned long touch_ts)
@@ -379,7 +381,7 @@ static enum hrtimer_restart watchdog_tim
/* .. and repeat */
hrtimer_forward_now(hrtimer, ns_to_ktime(sample_period));

- if (touch_ts == 0) {
+ if (touch_ts == SOFTLOCKUP_RESET) {
if (unlikely(__this_cpu_read(softlockup_touch_sync))) {
/*
* If the time stamp was touched atomically

2020-01-16 19:13:12

by Thomas Gleixner

Subject: Re: [PATCH v2] watchdog: Fix possible soft lockup warning at bootup

Waiman Long <[email protected]> writes:

> On 1/16/20 11:57 AM, Thomas Gleixner wrote:
>>> So your theory the MONOTONIC clock runs differently/wrongly could
>>> explain that (assuming this drives the sched clock). Though, I am
>> No. sched_clock() is separate. It uses a raw timestamp (in your case
>> from the ARM arch timer) and converts it to something which is close to
>> proper time. So my assumption was based on the printout Waiman had:
>>
>> [ 1... ] CPU.... watchdog_fn now 170000000
>> [ 25.. ] CPU.... watchdog_fn now 4170000000
>>
>> I assumed that now comes from ktime_get() or something like
>> that. Waiman?
>
> I printed out the now parameter of the __hrtimer_run_queues() call.

Yes. That's clock MONOTONIC.

> So from the timer perspective, it is losing time. For watchdog, the soft
> expiry time is 4s. The watchdog function won't be called until the
> timer's time advances 4s or more. That corresponds to about 24s in
> timestamp time for that particular class of systems.

Right. And assumed that the firmware call is the culprit this has an
explanation.

Could you please take sched_clock() timestamps before and after the
firmware call which kicks the secondary CPUs into life to verify that?

They should sum up to the amount of time which gets lost across
smp_init().
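
Something along these lines would do, assuming the secondary CPUs on that
machine are brought up via PSCI (sketch only; the pr_info() is just added
debugging and sched_clock() needs linux/sched/clock.h):

static int cpu_psci_cpu_boot(unsigned int cpu)
{
        u64 t0, t1;
        int err;

        t0 = sched_clock();
        err = psci_ops.cpu_on(cpu_logical_map(cpu),
                              __pa_symbol(secondary_entry));
        t1 = sched_clock();

        /* Debug only: time spent inside the firmware call */
        pr_info("CPU%u: PSCI CPU_ON took %llu ns\n", cpu, t1 - t0);

        if (err)
                pr_err("failed to boot CPU%d (%d)\n", cpu, err);

        return err;
}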

Thanks,

tglx

2020-01-16 21:26:24

by Waiman Long

Subject: Re: [PATCH v2] watchdog: Fix possible soft lockup warning at bootup

On 1/16/20 11:57 AM, Thomas Gleixner wrote:
>> So your theory the MONOTONIC clock runs differently/wrongly could
>> explain that (assuming this drives the sched clock). Though, I am
> No. sched_clock() is separate. It uses a raw timestamp (in your case
> from the ARM arch timer) and converts it to something which is close to
> proper time. So my assumption was based on the printout Waiman had:
>
> [ 1... ] CPU.... watchdog_fn now 170000000
> [ 25.. ] CPU.... watchdog_fn now 4170000000
>
> I assumed that now comes from ktime_get() or something like
> that. Waiman?

I printed out the now parameter of the __hrtimer_run_queues() call. So
from the timer perspective, it is losing time. For watchdog, the soft
expiry time is 4s. The watchdog function won't be called until the
timer's time advances 4s or more. That corresponds to about 24s in
timestamp time for that particular class of systems.

Cheers,
Longman

2020-01-16 23:23:13

by Waiman Long

Subject: Re: [PATCH v2] watchdog: Fix possible soft lockup warning at bootup

On 1/16/20 2:10 PM, Thomas Gleixner wrote:
> Waiman Long <[email protected]> writes:
>
>> On 1/16/20 11:57 AM, Thomas Gleixner wrote:
>>>> So your theory the MONOTONIC clock runs differently/wrongly could
>>>> explain that (assuming this drives the sched clock). Though, I am
>>> No. sched_clock() is separate. It uses a raw timestamp (in your case
>>> from the ARM arch timer) and converts it to something which is close to
>>> proper time. So my assumption was based on the printout Waiman had:
>>>
>>> [ 1... ] CPU.... watchdog_fn now 170000000
>>> [ 25.. ] CPU.... watchdog_fn now 4170000000
>>>
>>> I assumed that now comes from ktime_get() or something like
>>> that. Waiman?
>> I printed out the now parameter of the __hrtimer_run_queues() call.
> Yes. That's clock MONOTONIC.
>
>> So from the timer perspective, it is losing time. For watchdog, the soft
>> expiry time is 4s. The watchdog function won't be called until the
>> timer's time advances 4s or more. That corresponds to about 24s in
>> timestamp time for that particular class of systems.
> Right. And assumed that the firmware call is the culprit this has an
> explanation.
>
> Could you please take sched_clock() timestamps before and after the
> firmware call which kicks the secondary CPUs into life to verify that?
>
> They should sum up to the amount of time which gets lost across
> smp_init().

Sure, I will do that after I get hold of the arm64 system that can
reproduce the issue. That system is currently used by another engineer.

Cheers,
Longman

Subject: [tip: core/core] watchdog/softlockup: Enforce that timestamp is valid on boot

The following commit has been merged into the core/core branch of tip:

Commit-ID: 11e31f608b499f044f24b20be73f1dcab3e43f8a
Gitweb: https://git.kernel.org/tip/11e31f608b499f044f24b20be73f1dcab3e43f8a
Author: Thomas Gleixner <[email protected]>
AuthorDate: Thu, 16 Jan 2020 19:17:02 +01:00
Committer: Thomas Gleixner <[email protected]>
CommitterDate: Fri, 17 Jan 2020 11:19:22 +01:00

watchdog/softlockup: Enforce that timestamp is valid on boot

Robert reported that during boot the watchdog timestamp is set to 0 for one
second which is the indicator for a watchdog reset.

The reason for this is that the timestamp is in seconds and the time is
taken from sched clock and divided by ~1e9. sched clock starts at 0 which
means that for the first second during boot the watchdog timestamp is 0,
i.e. reset.

Use ULONG_MAX as the reset indicator value so the watchdog works correctly
right from the start. ULONG_MAX would only conflict with a real timestamp
if the system reaches an uptime of 136 years on 32bit and almost eternity
on 64bit.

Reported-by: Robert Richter <[email protected]>
Signed-off-by: Thomas Gleixner <[email protected]>
Link: https://lore.kernel.org/r/[email protected]

---
kernel/watchdog.c | 10 ++++++----
1 file changed, 6 insertions(+), 4 deletions(-)

diff --git a/kernel/watchdog.c b/kernel/watchdog.c
index e3774e9..b6b1f54 100644
--- a/kernel/watchdog.c
+++ b/kernel/watchdog.c
@@ -161,6 +161,8 @@ static void lockup_detector_update_enable(void)

#ifdef CONFIG_SOFTLOCKUP_DETECTOR

+#define SOFTLOCKUP_RESET ULONG_MAX
+
/* Global variables, exported for sysctl */
unsigned int __read_mostly softlockup_panic =
CONFIG_BOOTPARAM_SOFTLOCKUP_PANIC_VALUE;
@@ -272,7 +274,7 @@ notrace void touch_softlockup_watchdog_sched(void)
* Preemption can be enabled. It doesn't matter which CPU's timestamp
* gets zeroed here, so use the raw_ operation.
*/
- raw_cpu_write(watchdog_touch_ts, 0);
+ raw_cpu_write(watchdog_touch_ts, SOFTLOCKUP_RESET);
}

notrace void touch_softlockup_watchdog(void)
@@ -296,14 +298,14 @@ void touch_all_softlockup_watchdogs(void)
* the softlockup check.
*/
for_each_cpu(cpu, &watchdog_allowed_mask)
- per_cpu(watchdog_touch_ts, cpu) = 0;
+ per_cpu(watchdog_touch_ts, cpu) = SOFTLOCKUP_RESET;
wq_watchdog_touch(-1);
}

void touch_softlockup_watchdog_sync(void)
{
__this_cpu_write(softlockup_touch_sync, true);
- __this_cpu_write(watchdog_touch_ts, 0);
+ __this_cpu_write(watchdog_touch_ts, SOFTLOCKUP_RESET);
}

static int is_softlockup(unsigned long touch_ts)
@@ -379,7 +381,7 @@ static enum hrtimer_restart watchdog_timer_fn(struct hrtimer *hrtimer)
/* .. and repeat */
hrtimer_forward_now(hrtimer, ns_to_ktime(sample_period));

- if (touch_ts == 0) {
+ if (touch_ts == SOFTLOCKUP_RESET) {
if (unlikely(__this_cpu_read(softlockup_touch_sync))) {
/*
* If the time stamp was touched atomically