2012-06-06 18:09:48

by Nathan Zimmer

[permalink] [raw]
Subject: [PATCH] watchdog: reduce "NMI watchdog enabled, takes one hw-pmu counter." messages

watchdog: reduces some noise on a large system
The printk buffer can be flooded with with redundant
"NMI watchdog enabled, takes one hw-pmu counter." messages.
It doesn't add any value beyond the first.

Note the message needs logged a second time if the watchdog was disabled then
reenabled.

Cc: Andrew Morton <[email protected]>
Cc: Don Zickus <[email protected]>
Cc: Peter Zijlstra <[email protected]>
Signed-off-by: Nathan Zimmer <[email protected]>
---
kernel/watchdog.c | 8 +++++++-
1 files changed, 7 insertions(+), 1 deletions(-)

diff --git a/kernel/watchdog.c b/kernel/watchdog.c
index e5e1d85..404b141 100644
--- a/kernel/watchdog.c
+++ b/kernel/watchdog.c
@@ -372,6 +372,8 @@ static int watchdog(void *unused)


#ifdef CONFIG_HARDLOCKUP_DETECTOR
+static bool already_notified;
+
static int watchdog_nmi_enable(int cpu)
{
struct perf_event_attr *wd_attr;
@@ -391,7 +393,10 @@ static int watchdog_nmi_enable(int cpu)
/* Try to register using hardware perf events */
event = perf_event_create_kernel_counter(wd_attr, cpu, NULL, watchdog_overflow_callback, NULL);
if (!IS_ERR(event)) {
- pr_info("enabled, takes one hw-pmu counter.\n");
+ if (!already_notified) {
+ pr_info("enabled, takes one hw-pmu counter.\n");
+ already_notified = true;
+ }
goto out_save;
}

@@ -426,6 +431,7 @@ static void watchdog_nmi_disable(int cpu)

/* should be in cleanup, but blocks oprofile */
perf_event_release_kernel(event);
+ already_notified = false;
}
return;
}
--
1.6.0.2


2012-06-06 21:15:43

by Don Zickus

[permalink] [raw]
Subject: Re: [PATCH] watchdog: reduce "NMI watchdog enabled, takes one hw-pmu counter." messages

On Wed, Jun 06, 2012 at 01:09:46PM -0500, Nathan Zimmer wrote:
> watchdog: reduces some noise on a large system
> The printk buffer can be flooded with with redundant
> "NMI watchdog enabled, takes one hw-pmu counter." messages.
> It doesn't add any value beyond the first.
>
> Note the message needs logged a second time if the watchdog was disabled then
> reenabled.

Hi Nathan,

Thanks for the patch. I added something similar to RHEL-6 a while ago
that solved the same problem in a more robust way (I think). IOW, I dealt
with the watchdog failures too (for virt and bios issues).

It doesn't cover the nmi_disable case like your patch does, but is easy to
add.

I attached it below. Let me know if this meets your needs or not?

Cheers,
Don

-----------------------------8<-----------------------
From: Don Zickus <[email protected]>
Date: Wed, 6 Jun 2012 15:17:19 -0400
Subject: [PATCH] watchdog: quiet down the boot messages

A bunch of bugzillas have complained how noisy the nmi_watchdog is during
boot-up especially with its expected failure cases (like virt and bios
resource contention).

This is my attempt to quiet them down and keep it less confusing for the end
user. What I did is print the message for cpu0 and save it for future
comparisions. If future cpus have an identical message as cpu0, then don't
print the redundant info. However, if a future cpu has a different message,
happily print that loudly.

Before the change, you would see something like:

..TIMER: vector=0x30 apic1=0 pin1=2 apic2=-1 pin2=-1
CPU0: Intel(R) Core(TM)2 Quad CPU Q9550 @ 2.83GHz stepping 0a
Performance Events: PEBS fmt0+, Core2 events, Intel PMU driver.
... version: 2
... bit width: 40
... generic registers: 2
... value mask: 000000ffffffffff
... max period: 000000007fffffff
... fixed-purpose events: 3
... event mask: 0000000700000003
NMI watchdog enabled, takes one hw-pmu counter.
Booting Node 0, Processors #1
NMI watchdog enabled, takes one hw-pmu counter.
#2
NMI watchdog enabled, takes one hw-pmu counter.
#3 Ok.
NMI watchdog enabled, takes one hw-pmu counter.
Brought up 4 CPUs
Total of 4 processors activated (22607.24 BogoMIPS).

After the change, it is simlified to:

..TIMER: vector=0x30 apic1=0 pin1=2 apic2=-1 pin2=-1
CPU0: Intel(R) Core(TM)2 Quad CPU Q9550 @ 2.83GHz stepping 0a
Performance Events: PEBS fmt0+, Core2 events, Intel PMU driver.
... version: 2
... bit width: 40
... generic registers: 2
... value mask: 000000ffffffffff
... max period: 000000007fffffff
... fixed-purpose events: 3
... event mask: 0000000700000003
NMI watchdog enabled, takes one hw-pmu counter.
Booting Node 0, Processors #1 #2 #3 Ok.
Brought up 4 CPUs

Signed-off-by: Don Zickus <[email protected]>
---
kernel/watchdog.c | 20 +++++++++++++++++++-
1 files changed, 19 insertions(+), 1 deletions(-)

diff --git a/kernel/watchdog.c b/kernel/watchdog.c
index e5e1d85..79ff671 100644
--- a/kernel/watchdog.c
+++ b/kernel/watchdog.c
@@ -377,6 +377,14 @@ static int watchdog_nmi_enable(int cpu)
struct perf_event_attr *wd_attr;
struct perf_event *event = per_cpu(watchdog_ev, cpu);

+ /*
+ * People like the simple clean cpu node info
+ * on boot. Simplify the noise from the watchdog
+ * by only printing messages that are different than
+ * what cpu0 displayed
+ */
+ static unsigned long err0 = 0;
+
/* is it already setup and enabled? */
if (event && event->state > PERF_EVENT_STATE_OFF)
goto out;
@@ -390,11 +398,21 @@ static int watchdog_nmi_enable(int cpu)

/* Try to register using hardware perf events */
event = perf_event_create_kernel_counter(wd_attr, cpu, NULL, watchdog_overflow_callback, NULL);
+
+ /* save cpu0 error for future comparision */
+ if (!cpu)
+ err0 = (IS_ERR(event) ? PTR_ERR(event) : 0);
+
if (!IS_ERR(event)) {
- pr_info("enabled, takes one hw-pmu counter.\n");
+ /* only print for cpu0 or different than cpu0 */
+ if (!cpu || err0)
+ pr_info("enabled, takes one hw-pmu counter.\n");
goto out_save;
}

+ /* skip displaying the same error again */
+ if ((PTR_ERR(event) == err0) && cpu)
+ return PTR_ERR(event);

/* vary the KERN level based on the returned errno */
if (PTR_ERR(event) == -EOPNOTSUPP)
--
1.7.7.6

2012-06-07 17:14:36

by Nathan Zimmer

[permalink] [raw]
Subject: Re: [PATCH] watchdog: reduce "NMI watchdog enabled, takes one hw-pmu counter." messages

On Wed, Jun 06, 2012 at 05:15:23PM -0400, Don Zickus wrote:
> On Wed, Jun 06, 2012 at 01:09:46PM -0500, Nathan Zimmer wrote:
> > watchdog: reduces some noise on a large system
> > The printk buffer can be flooded with with redundant
> > "NMI watchdog enabled, takes one hw-pmu counter." messages.
> > It doesn't add any value beyond the first.
> >
> > Note the message needs logged a second time if the watchdog was disabled then
> > reenabled.
>
> Hi Nathan,
>
> Thanks for the patch. I added something similar to RHEL-6 a while ago
> that solved the same problem in a more robust way (I think). IOW, I dealt
> with the watchdog failures too (for virt and bios issues).
>
> It doesn't cover the nmi_disable case like your patch does, but is easy to
> add.
>
> I attached it below. Let me know if this meets your needs or not?
>
> Cheers,
> Don

Yes it does seem to meet my needs quite well.

Thanks,
Nate