2023-08-05 16:21:15

by Liu Song

Subject: [PATCH] watchdog/hardlockup: set watchdog_hardlockup_warned to true as early as possible

Since we want to ensure only printing hardlockups once, it is necessary
to set "watchdog_hardlockup_warned" to true as early as possible.

Signed-off-by: Liu Song <[email protected]>
---
kernel/watchdog.c | 4 ++--
1 file changed, 2 insertions(+), 2 deletions(-)

diff --git a/kernel/watchdog.c b/kernel/watchdog.c
index 25d5627a6580..c4795f2d148c 100644
--- a/kernel/watchdog.c
+++ b/kernel/watchdog.c
@@ -180,6 +180,8 @@ void watchdog_hardlockup_check(unsigned int cpu, struct pt_regs *regs)
 		/* Only print hardlockups once. */
 		if (per_cpu(watchdog_hardlockup_warned, cpu))
 			return;
+		else
+			per_cpu(watchdog_hardlockup_warned, cpu) = true;

 		pr_emerg("Watchdog detected hard LOCKUP on cpu %d\n", cpu);
 		print_modules();
@@ -206,8 +208,6 @@ void watchdog_hardlockup_check(unsigned int cpu, struct pt_regs *regs)

 		if (hardlockup_panic)
 			nmi_panic(regs, "Hard LOCKUP");
-
-		per_cpu(watchdog_hardlockup_warned, cpu) = true;
 	} else {
 		per_cpu(watchdog_hardlockup_warned, cpu) = false;
 	}
--
2.19.1.6.gb485710b



2023-08-05 18:47:38

by Andrew Morton

Subject: Re: [PATCH] watchdog/hardlockup: set watchdog_hardlockup_warned to true as early as possible

On Sun, 6 Aug 2023 00:01:44 +0800 Liu Song <[email protected]> wrote:

> Since we want to ensure only printing hardlockups once, it is necessary
> to set "watchdog_hardlockup_warned" to true as early as possible.
>
> ...
>
> --- a/kernel/watchdog.c
> +++ b/kernel/watchdog.c
> @@ -180,6 +180,8 @@ void watchdog_hardlockup_check(unsigned int cpu, struct pt_regs *regs)
>  		/* Only print hardlockups once. */
>  		if (per_cpu(watchdog_hardlockup_warned, cpu))
>  			return;
> +		else
> +			per_cpu(watchdog_hardlockup_warned, cpu) = true;

The "else" is unneeded.

>  		pr_emerg("Watchdog detected hard LOCKUP on cpu %d\n", cpu);
>  		print_modules();
> @@ -206,8 +208,6 @@ void watchdog_hardlockup_check(unsigned int cpu, struct pt_regs *regs)
>
>  		if (hardlockup_panic)
>  			nmi_panic(regs, "Hard LOCKUP");
> -
> -		per_cpu(watchdog_hardlockup_warned, cpu) = true;
>  	} else {
>  		per_cpu(watchdog_hardlockup_warned, cpu) = false;
>  	}

When resending, please tell us some more about the effects of the
change. Presumably there are circumstances in which excess output is
produced? If so, describe these circumstances and the observed
effects.
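
A minimal userspace sketch of why the "else" is unneeded, assuming a plain
struct in place of the per-CPU variable. The names struct warn_state,
report_once_with_else(), report_once() and check_variants_agree() are
hypothetical and are not kernel code:

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical stand-in for one CPU's watchdog_hardlockup_warned flag. */
struct warn_state {
	bool warned;
	int reports;	/* how many times the "report" path ran */
};

/* Shape used in the patch: "else" after a branch that returns. */
static void report_once_with_else(struct warn_state *s)
{
	if (s->warned)
		return;
	else
		s->warned = true;

	s->reports++;	/* stands in for pr_emerg(), print_modules(), ... */
}

/* Same behaviour without the "else": the "if" branch already returned. */
static void report_once(struct warn_state *s)
{
	if (s->warned)
		return;
	s->warned = true;

	s->reports++;
}

/* Calls each variant three times; both must report exactly once. */
static void check_variants_agree(void)
{
	struct warn_state a = { false, 0 };
	struct warn_state b = { false, 0 };

	for (int i = 0; i < 3; i++) {
		report_once_with_else(&a);
		report_once(&b);
	}
	assert(a.warned && b.warned);
	assert(a.reports == 1);
	assert(b.reports == 1);
}
```

Calling check_variants_agree() from a small main() exercises both variants.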


2023-08-06 03:47:10

by Liu Song

Subject: Re: [PATCH] watchdog/hardlockup: set watchdog_hardlockup_warned to true as early as possible


On 2023/8/6 01:17, Andrew Morton wrote:
> When resending, please tell us some more about the effects of the
> change. Presumably there are circumstances in which excess output is
> produced? If so, describe these circumstances and the observed
> effects.

Hi,

I haven't seen duplicate warnings in a real environment.

However, since a system that hits a hard lockup is already in an abnormal
state, it seems more reasonable to set "watchdog_hardlockup_warned" to true
up front, rather than waiting until all of the information has been printed.


Thanks


2023-08-07 15:03:39

by Petr Mladek

Subject: Re: [PATCH] watchdog/hardlockup: set watchdog_hardlockup_warned to true as early as possible

On Sun 2023-08-06 10:52:57, Liu Song wrote:
>
> On 2023/8/6 01:17, Andrew Morton wrote:
> > When resending, please tell us some more about the effects of the
> > change. Presumably there are circumstances in which excess output is
> > produced? If so, describe these circumstances and the observed
> > effects.
>
> Hi,
>
> I haven't seen duplicate warnings in a real environment.
>
> However, since a system that hits a hard lockup is already in an abnormal
> state, it seems more reasonable to set "watchdog_hardlockup_warned" to true
> up front, rather than waiting until all of the information has been printed.

I believe that this is not needed.

watchdog_hardlockup_check(cpu, regs) is called on a CPU periodically.
There are two callers:

+ buddy detector checks the particular CPU when the softlockup's
hrtimer callback is called. See watchdog_hardlockup_kick()
in watchdog_timer_fn().

+ perf detector checks the particular CPU from a perf callback,
see watchdog_overflow_callback().

Neither timer nor perf callbacks can be nested. They are naturally
serialized on a given CPU, so races are not possible in this case.

Best Regards,
Petr
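
The serialization argument above can be sketched in userspace C. This is only
a model under stated assumptions: struct cpu_model, check_flag_late(),
check_flag_early(), count_reports() and check_orderings_agree() are
hypothetical names, and the locked_up parameter stands in for whatever
condition guards the block shown in the diff. When the check is only ever
called one invocation after another, setting the flag before or after the
report gives the same number of reports:

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical stand-in for one CPU's state; not a kernel structure. */
struct cpu_model {
	bool warned;	/* models watchdog_hardlockup_warned */
	int reports;	/* number of times the lockup was reported */
};

typedef void (*check_fn)(struct cpu_model *c, bool locked_up);

/* Ordering before the patch: the flag is set after the report. */
static void check_flag_late(struct cpu_model *c, bool locked_up)
{
	if (locked_up) {
		if (c->warned)
			return;
		c->reports++;		/* stands in for the printing */
		c->warned = true;
	} else {
		c->warned = false;
	}
}

/* Ordering proposed by the patch: the flag is set before the report. */
static void check_flag_early(struct cpu_model *c, bool locked_up)
{
	if (locked_up) {
		if (c->warned)
			return;
		c->warned = true;
		c->reports++;
	} else {
		c->warned = false;
	}
}

/* Calls the check once per sample, strictly one call after another. */
static int count_reports(check_fn check, const bool *samples, size_t n)
{
	struct cpu_model c = { false, 0 };

	for (size_t i = 0; i < n; i++)
		check(&c, samples[i]);
	return c.reports;
}

/* Two separate lockup episodes; each ordering reports each episode once. */
static void check_orderings_agree(void)
{
	const bool samples[] = { false, true, true, true, false, true, true };
	size_t n = sizeof(samples) / sizeof(samples[0]);

	assert(count_reports(check_flag_late, samples, n) == 2);
	assert(count_reports(check_flag_early, samples, n) == 2);
}
```

The model deliberately has no concurrency: it illustrates that, without nested
or parallel calls for the same CPU, the position of the assignment inside the
block does not change how often the lockup is reported.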