2016-03-18 15:32:06

by Peter Zijlstra

[permalink] [raw]
Subject: [PATCH 1/2] watchdog: Fix output

The pr_crap() functions generate idiotic output; use printk().

Broken output:

[ 3538.718135] NMI watchdog: BUG: soft lockup - CPU#2 stuck for 22s! [perf_fuzzer:2646]
[ 3383.233583] NMI watchdog: Watchdog detected hard LOCKUP on cpu 27

Fixed output:

[14180.328194] BUG: soft lockup - CPU#2 stuck for 23s! [perf_fuzzer:13650]
[ 1064.914925] BUG: NMI Watchdog detected hard LOCKUP on cpu 11

Cc: Don Zickus <[email protected]>
Signed-off-by: Peter Zijlstra (Intel) <[email protected]>
---
kernel/watchdog.c | 4 ++--
1 file changed, 2 insertions(+), 2 deletions(-)

--- a/kernel/watchdog.c
+++ b/kernel/watchdog.c
@@ -350,7 +350,7 @@ static void watchdog_overflow_callback(s
if (__this_cpu_read(hard_watchdog_warn) == true)
return;

- pr_emerg("Watchdog detected hard LOCKUP on cpu %d", this_cpu);
+ printk(KERN_EMERG "BUG: NMI detected hard LOCKUP on cpu %d", this_cpu);
print_modules();
print_irqtrace_events(current);
if (regs)
@@ -467,7 +467,7 @@ static enum hrtimer_restart watchdog_tim
}
}

- pr_emerg("BUG: soft lockup - CPU#%d stuck for %us! [%s:%d]\n",
+ printk(KERN_EMERG "BUG: soft lockup - CPU#%d stuck for %us! [%s:%d]\n",
smp_processor_id(), duration,
current->comm, task_pid_nr(current));
__this_cpu_write(softlockup_task_ptr_saved, current);



2016-03-18 16:37:59

by Don Zickus

[permalink] [raw]
Subject: Re: [PATCH 1/2] watchdog: Fix output

On Fri, Mar 18, 2016 at 04:28:03PM +0100, Peter Zijlstra wrote:
> The pr_crap() functions generate idiotic output; use printk().
>
> Broken output:
>
> [ 3538.718135] NMI watchdog: BUG: soft lockup - CPU#2 stuck for 22s! [perf_fuzzer:2646]
> [ 3383.233583] NMI watchdog: Watchdog detected hard LOCKUP on cpu 27
>
> Fixed output:
>
> [14180.328194] BUG: soft lockup - CPU#2 stuck for 23s! [perf_fuzzer:13650]
> [ 1064.914925] BUG: NMI Watchdog detected hard LOCKUP on cpu 11

Would something like this be a better patch?

Cheers,
Don



diff --git a/kernel/watchdog.c b/kernel/watchdog.c
index b3ace6e..e3302a0 100644
--- a/kernel/watchdog.c
+++ b/kernel/watchdog.c
@@ -9,7 +9,7 @@
* to those contributors as well.
*/

-#define pr_fmt(fmt) "NMI watchdog: " fmt
+#define pr_fmt(fmt) "Lockup detector: " fmt

#include <linux/mm.h>
#include <linux/cpu.h>
@@ -350,7 +350,7 @@ static void watchdog_overflow_callback(struct perf_event *event,
if (__this_cpu_read(hard_watchdog_warn) == true)
return;

- pr_emerg("Watchdog detected hard LOCKUP on cpu %d", this_cpu);
+ pr_emerg("Detected hard LOCKUP on cpu %d", this_cpu);
print_modules();
print_irqtrace_events(current);
if (regs)
@@ -467,7 +467,7 @@ static enum hrtimer_restart watchdog_timer_fn(struct hrtimer *hrtimer)
}
}

- pr_emerg("BUG: soft lockup - CPU#%d stuck for %us! [%s:%d]\n",
+ pr_emerg("Detected soft lockup - CPU#%d stuck for %us! [%s:%d]\n",
smp_processor_id(), duration,
current->comm, task_pid_nr(current));
__this_cpu_write(softlockup_task_ptr_saved, current);
>
> Cc: Don Zickus <[email protected]>
> Signed-off-by: Peter Zijlstra (Intel) <[email protected]>
> ---
> kernel/watchdog.c | 4 ++--
> 1 file changed, 2 insertions(+), 2 deletions(-)
>
> --- a/kernel/watchdog.c
> +++ b/kernel/watchdog.c
> @@ -350,7 +350,7 @@ static void watchdog_overflow_callback(s
> if (__this_cpu_read(hard_watchdog_warn) == true)
> return;
>
> - pr_emerg("Watchdog detected hard LOCKUP on cpu %d", this_cpu);
> + printk(KERN_EMERG "BUG: NMI detected hard LOCKUP on cpu %d", this_cpu);
> print_modules();
> print_irqtrace_events(current);
> if (regs)
> @@ -467,7 +467,7 @@ static enum hrtimer_restart watchdog_tim
> }
> }
>
> - pr_emerg("BUG: soft lockup - CPU#%d stuck for %us! [%s:%d]\n",
> + printk(KERN_EMERG "BUG: soft lockup - CPU#%d stuck for %us! [%s:%d]\n",
> smp_processor_id(), duration,
> current->comm, task_pid_nr(current));
> __this_cpu_write(softlockup_task_ptr_saved, current);
>
>

2016-03-18 16:46:07

by Peter Zijlstra

[permalink] [raw]
Subject: Re: [PATCH 1/2] watchdog: Fix output

On Fri, Mar 18, 2016 at 12:37:48PM -0400, Don Zickus wrote:
> On Fri, Mar 18, 2016 at 04:28:03PM +0100, Peter Zijlstra wrote:
> > The pr_crap() functions generate idiotic output; use printk().
> >
> > Broken output:
> >
> > [ 3538.718135] NMI watchdog: BUG: soft lockup - CPU#2 stuck for 22s! [perf_fuzzer:2646]
> > [ 3383.233583] NMI watchdog: Watchdog detected hard LOCKUP on cpu 27
> >
> > Fixed output:
> >
> > [14180.328194] BUG: soft lockup - CPU#2 stuck for 23s! [perf_fuzzer:13650]
> > [ 1064.914925] BUG: NMI Watchdog detected hard LOCKUP on cpu 11
>
> Would something like this be a better patch?

> -#define pr_fmt(fmt) "NMI watchdog: " fmt
> +#define pr_fmt(fmt) "Lockup detector: " fmt

> - pr_emerg("Watchdog detected hard LOCKUP on cpu %d", this_cpu);
> + pr_emerg("Detected hard LOCKUP on cpu %d", this_cpu);

That still results in something like:

Lockup detector: Detected hard LOCKUP on cpu 22

Which is still weird, doesn't mention NMI nor BUG (and when the NMI
triggers it really rather frequently is a dead system).

2016-03-18 16:49:06

by Peter Zijlstra

[permalink] [raw]
Subject: Re: [PATCH 1/2] watchdog: Fix output

On Fri, Mar 18, 2016 at 05:44:41PM +0100, Peter Zijlstra wrote:
> On Fri, Mar 18, 2016 at 12:37:48PM -0400, Don Zickus wrote:
> > Would something like this be a better patch?
>
> > -#define pr_fmt(fmt) "NMI watchdog: " fmt
> > +#define pr_fmt(fmt) "Lockup detector: " fmt

You might want to audit all the other pr_ nonsense in that file; a lot
of the msgs don't particularly make sense when combined with the prefix.

Some are for the NMI thing, others for the soft lockup one.

I would just chuck pr_crap() altogether and use sensible text.

2016-03-18 17:42:09

by Joe Perches

[permalink] [raw]
Subject: Re: [PATCH 1/2] watchdog: Fix output

On Fri, 2016-03-18 at 17:48 +0100, Peter Zijlstra wrote:
> On Fri, Mar 18, 2016 at 05:44:41PM +0100, Peter Zijlstra wrote:
> >
> > On Fri, Mar 18, 2016 at 12:37:48PM -0400, Don Zickus wrote:
> > >
> > > Would something like this be a better patch?
> > >
> > > -#define pr_fmt(fmt) "NMI watchdog: " fmt
> > > +#define pr_fmt(fmt) "Lockup detector: " fmt
> You might want to audit all the other pr_ nonsense in that file; a
> lot
> of the msgs don't particularly make sense when combined with the
> prefix.
>
> Some are for the NMI thing, others for the soft lockup one.
>
> I would just chuck pr_crap() altogether and use sensible text.

or change to #define pr_fmt(fmt) fmt

and output whatever unprefixed content you want.

btw: the first printk/pr_emerg should add a newline


pr_emerg("Watchdog detected hard LOCKUP on cpu %d", this_cpu);
sb:
pr_emerg("Watchdog detected hard LOCKUP on cpu %d\n", this_cpu);

2016-03-18 17:50:00

by Peter Zijlstra

[permalink] [raw]
Subject: Re: [PATCH 1/2] watchdog: Fix output

On Fri, Mar 18, 2016 at 12:37:48PM -0400, Don Zickus wrote:
> +++ b/kernel/watchdog.c
> @@ -9,7 +9,7 @@
> * to those contributors as well.
> */
>
> -#define pr_fmt(fmt) "NMI watchdog: " fmt
> +#define pr_fmt(fmt) "Lockup detector: " fmt
>
> #include <linux/mm.h>
> #include <linux/cpu.h>
> @@ -350,7 +350,7 @@ static void watchdog_overflow_callback(struct perf_event *event,
> if (__this_cpu_read(hard_watchdog_warn) == true)
> return;
>
> - pr_emerg("Watchdog detected hard LOCKUP on cpu %d", this_cpu);
> + pr_emerg("Detected hard LOCKUP on cpu %d", this_cpu);
> print_modules();
> print_irqtrace_events(current);
> if (regs)

It was Jiri who made this mess by replacing WARN(), which has a very
distinct format, with this custom stuff.

I think we should go back to the WARN() thing.

2016-03-18 18:12:13

by Don Zickus

[permalink] [raw]
Subject: Re: [PATCH 1/2] watchdog: Fix output

On Fri, Mar 18, 2016 at 05:48:58PM +0100, Peter Zijlstra wrote:
> On Fri, Mar 18, 2016 at 05:44:41PM +0100, Peter Zijlstra wrote:
> > On Fri, Mar 18, 2016 at 12:37:48PM -0400, Don Zickus wrote:
> > > Would something like this be a better patch?
> >
> > > -#define pr_fmt(fmt) "NMI watchdog: " fmt
> > > +#define pr_fmt(fmt) "Lockup detector: " fmt
>
> You might want to audit all the other pr_ nonsense in that file; a lot
> of the msgs don't particularly make sense when combined with the prefix.
>
> Some are for the NMI thing, others for the soft lockup one.
>
> I would just chuck pr_crap() altogether and use sensible text.

Hehe. I'll look into that next week. :-)

Cheers,
Don

2016-03-19 00:11:51

by Jiri Kosina

[permalink] [raw]
Subject: Re: [PATCH 1/2] watchdog: Fix output

On Fri, 18 Mar 2016, Peter Zijlstra wrote:

> > -#define pr_fmt(fmt) "NMI watchdog: " fmt
> > +#define pr_fmt(fmt) "Lockup detector: " fmt
> >
> > #include <linux/mm.h>
> > #include <linux/cpu.h>
> > @@ -350,7 +350,7 @@ static void watchdog_overflow_callback(struct perf_event *event,
> > if (__this_cpu_read(hard_watchdog_warn) == true)
> > return;
> >
> > - pr_emerg("Watchdog detected hard LOCKUP on cpu %d", this_cpu);
> > + pr_emerg("Detected hard LOCKUP on cpu %d", this_cpu);
> > print_modules();
> > print_irqtrace_events(current);
> > if (regs)
>
> It was Jiri who made this mess by replacing WARN(), which has a very
> distinct format, with this custom stuff.

Right, this was in 55537871e ("kernel/watchdog.c: perform all-CPU
backtrace in case of hard lockup").

To be really honest, I don't really have 100% reliable explanation why I
did it this way; I *think* it was because otherwise, in case WARN is
preserved, we'll either

- have different output format for current (warning) CPU and all the
others (trigger_allbutself_cpu_backtrace())

or

- have duplicated backtrace for current CPU, one coming from WARN and one
coming from the all-cpu backtrace

> I think we should go back to the WARN() thing.

I don't really have strong opinion on this, but someone should pick the
poison from the two options above; I tried in the mentioned commit, but
apparently not to general satisfaction :)

Thanks,

--
Jiri Kosina
SUSE Labs

2016-03-19 00:39:19

by Jiri Kosina

[permalink] [raw]
Subject: Re: [PATCH 1/2] watchdog: Fix output

On Sat, 19 Mar 2016, Jiri Kosina wrote:

> > It was Jiri who made this mess by replacing WARN(), which has a very
> > distinct format, with this custom stuff.
>
> Right, this was in 55537871e ("kernel/watchdog.c: perform all-CPU
> backtrace in case of hard lockup").
>
> To be really honest, I don't really have 100% reliable explanation why I
> did it this way; I *think* it was because otherwise, in case WARN is
> preserved, we'll either
>
> - have different output format for current (warning) CPU and all the
> others (trigger_allbutself_cpu_backtrace())
>
> or
>
> - have duplicated backtrace for current CPU, one coming from WARN and one
> coming from the all-cpu backtrace
>
> > I think we should go back to the WARN() thing.
>
> I don't really have strong opinion on this, but someone should pick the
> poison from the two options above; I tried in the mentioned commit, but
> apparently not to general satisfaction :)

Hmm, or we can actually make the whole thing behave the way that was
introduced by 55537871e only in case of
sysctl_hardlockup_all_cpu_backtrace being set, and use WARN() otherwise.

I'll send out a patch for this during / after the weekend if noone else
beats me to do it.

--
Jiri Kosina
SUSE Labs