2011-03-03 20:34:07

by Don Zickus

Subject: [PATCH 1/2] watchdog, nmi: Allow hardlockup to panic by default

Add a Kconfig option to allow users to set the hardlockup to panic
by default. Also add in a 'nmi_watchdog=nopanic' to override this.

Signed-off-by: Don Zickus <[email protected]>

---
Forgot to cc lkml, sorry for the spam

---
Documentation/kernel-parameters.txt | 5 +++--
kernel/watchdog.c | 5 ++++-
lib/Kconfig.debug | 17 +++++++++++++++++
3 files changed, 24 insertions(+), 3 deletions(-)

diff --git a/Documentation/kernel-parameters.txt b/Documentation/kernel-parameters.txt
index 89835a4..ae0b499 100644
--- a/Documentation/kernel-parameters.txt
+++ b/Documentation/kernel-parameters.txt
@@ -1577,11 +1577,12 @@ and is between 256 and 4096 characters. It is defined in the file
Format: [state][,regs][,debounce][,die]

nmi_watchdog= [KNL,BUGS=X86] Debugging features for SMP kernels
- Format: [panic,][num]
+ Format: [panic,][nopanic,][num]
Valid num: 0
0 - turn nmi_watchdog off
When panic is specified, panic when an NMI watchdog
- timeout occurs.
+ timeout occurs (or 'nopanic' to override the opposite
+ default).
This is useful when you use a panic=... timeout and
need the box quickly up again.

diff --git a/kernel/watchdog.c b/kernel/watchdog.c
index 18bb157..f7c0272 100644
--- a/kernel/watchdog.c
+++ b/kernel/watchdog.c
@@ -48,12 +48,15 @@ static DEFINE_PER_CPU(struct perf_event *, watchdog_ev);
* Should we panic when a soft-lockup or hard-lockup occurs:
*/
#ifdef CONFIG_HARDLOCKUP_DETECTOR
-static int hardlockup_panic;
+static int hardlockup_panic =
+ CONFIG_BOOTPARAM_HARDLOCKUP_PANIC_VALUE;

static int __init hardlockup_panic_setup(char *str)
{
if (!strncmp(str, "panic", 5))
hardlockup_panic = 1;
+ else if (!strncmp(str, "nopanic", 7))
+ hardlockup_panic = 0;
else if (!strncmp(str, "0", 1))
watchdog_enabled = 0;
return 1;
diff --git a/lib/Kconfig.debug b/lib/Kconfig.debug
index 2b97418..80bd292 100644
--- a/lib/Kconfig.debug
+++ b/lib/Kconfig.debug
@@ -176,6 +176,23 @@ config HARDLOCKUP_DETECTOR
def_bool LOCKUP_DETECTOR && PERF_EVENTS && HAVE_PERF_EVENTS_NMI && \
!ARCH_HAS_NMI_WATCHDOG

+config BOOTPARAM_HARDLOCKUP_PANIC
+ bool "Panic (Reboot) On Soft Lockups"
+ depends on LOCKUP_DETECTOR
+ help
+ Say Y here to enable the kernel to panic on "hard lockups",
+ which are bugs that cause the kernel to loop in kernel
+ mode with interrupts disabled for more than 60 seconds.
+
+ Say N if unsure.
+
+config BOOTPARAM_HARDLOCKUP_PANIC_VALUE
+ int
+ depends on LOCKUP_DETECTOR
+ range 0 1
+ default 0 if !BOOTPARAM_HARDLOCKUP_PANIC
+ default 1 if BOOTPARAM_HARDLOCKUP_PANIC
+
config BOOTPARAM_SOFTLOCKUP_PANIC
bool "Panic (Reboot) On Soft Lockups"
depends on LOCKUP_DETECTOR
--
1.7.3.5
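
The option handling in hardlockup_panic_setup() can be tried outside the kernel. The following is a minimal userspace sketch, not the kernel code: struct watchdog_opts and parse_nmi_watchdog() are hypothetical names standing in for the static variables and the __init handler in kernel/watchdog.c. It assumes the same if/else order as the patch, but takes each compare length from the keyword itself with sizeof so the two cannot drift apart.

```c
#include <string.h>

/* Hypothetical userspace stand-ins for the kernel variables in the patch. */
struct watchdog_opts {
	int hardlockup_panic;	/* 1: panic on a hard lockup, 0: only report it */
	int watchdog_enabled;	/* 0: NMI watchdog turned off */
};

/*
 * Same if/else chain as hardlockup_panic_setup(), with the compare
 * length derived from each keyword. Unknown strings change nothing.
 */
static void parse_nmi_watchdog(const char *str, struct watchdog_opts *o)
{
	if (!strncmp(str, "panic", sizeof("panic") - 1))
		o->hardlockup_panic = 1;
	else if (!strncmp(str, "nopanic", sizeof("nopanic") - 1))
		o->hardlockup_panic = 0;
	else if (!strncmp(str, "0", 1))
		o->watchdog_enabled = 0;
}
```

With hardlockup_panic initialised to 1 (the BOOTPARAM_HARDLOCKUP_PANIC=y case), passing "nopanic" clears it, while "panic" sets it regardless of the configured default.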


2011-03-03 20:34:12

by Don Zickus

Subject: [PATCH 2/2] watchdog: Always return NOTIFY_OK during cpu up/down events

This patch addresses two problems. The first is that when the
hardlockup detector failed to start, the softlockup detector was not
started either. There are valid cases in which the hardlockup detector
should not start (no lapic, bios controls perf counters), and those
should not block the softlockup detector.

The second problem is that when the hardlockup detector failed to
start (again from a no lapic or bios controlled perf counter case), it
reported the failure to the cpu notifier chain. This stopped the
notifier chain from starting other, more critical pieces of cpu
bring-up (in our case, based on a 2.6.32 fork, it was the mce). As a
result, during soft cpu online/offline testing, the system would panic
when a cpu was offlined: the cpu notifier would successfully process
the watchdog disable cpu event and then panic in the mce case because
of variables left un-initialized by a cpu up event that never ran.

I realized the hardlockup/softlockup cases are really just debugging
aids and should never impede the progress of a cpu up/down event.
Therefore I modified the code to always return NOTIFY_OK and instead
rely on printks to inform the user of problems.

Signed-off-by: Don Zickus <[email protected]>
---
Forgot to cc lkml, sorry for the spam

---
kernel/watchdog.c | 22 ++++++++++++++++------
1 files changed, 16 insertions(+), 6 deletions(-)

diff --git a/kernel/watchdog.c b/kernel/watchdog.c
index f7c0272..c52645b 100644
--- a/kernel/watchdog.c
+++ b/kernel/watchdog.c
@@ -418,19 +418,22 @@ static int watchdog_prepare_cpu(int cpu)
static int watchdog_enable(int cpu)
{
struct task_struct *p = per_cpu(softlockup_watchdog, cpu);
- int err;
+ int err = 0;

/* enable the perf event */
err = watchdog_nmi_enable(cpu);
- if (err)
- return err;
+
+ /* Regardless of err above, fall through and start softlockup */

/* create the watchdog thread */
if (!p) {
p = kthread_create(watchdog, (void *)(unsigned long)cpu, "watchdog/%d", cpu);
if (IS_ERR(p)) {
printk(KERN_ERR "softlockup watchdog for %i failed\n", cpu);
- return PTR_ERR(p);
+ if (!err)
+ /* if hardlockup hasn't already set this */
+ err = PTR_ERR(p);
+ goto out;
}
kthread_bind(p, cpu);
per_cpu(watchdog_touch_ts, cpu) = 0;
@@ -438,7 +441,8 @@ static int watchdog_enable(int cpu)
wake_up_process(p);
}

- return 0;
+out:
+ return err;
}

static void watchdog_disable(int cpu)
@@ -550,7 +554,13 @@ cpu_callback(struct notifier_block *nfb, unsigned long action, void *hcpu)
break;
#endif /* CONFIG_HOTPLUG_CPU */
}
- return notifier_from_errno(err);
+
+ /*
+ * hardlockup and softlockup are not important enough
+ * to block cpu bring up. Just always succeed and
+ * rely on printk output to flag problems.
+ */
+ return NOTIFY_OK;
}

static struct notifier_block __cpuinitdata cpu_nfb = {
--
1.7.3.5
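
The control flow this patch introduces can be illustrated in isolation. Below is a minimal userspace sketch under stated assumptions: start_hard(), start_soft(), watchdog_enable_sketch(), cpu_callback_sketch() and NOTIFY_OK_SKETCH are all hypothetical stand-ins, and the failure flags simulate the "no lapic" and kthread_create() error cases. It shows the two ideas from the description: the softlockup side is always attempted, with the first error preserved, and the notifier callback reports success regardless.

```c
#include <errno.h>

#define NOTIFY_OK_SKETCH 1	/* hypothetical stand-in for the kernel's NOTIFY_OK */

/* Simulated hardlockup start: fails with -ENODEV when asked to. */
static int start_hard(int fail)
{
	return fail ? -ENODEV : 0;
}

/* Simulated softlockup start: records success in *started. */
static int start_soft(int fail, int *started)
{
	if (fail)
		return -ENOMEM;
	*started = 1;
	return 0;
}

/* Always try both detectors; report the first error that occurred. */
static int watchdog_enable_sketch(int hard_fails, int soft_fails, int *soft_started)
{
	int err = start_hard(hard_fails);
	int soft_err = start_soft(soft_fails, soft_started);

	if (!err)	/* keep the hardlockup error if it was set first */
		err = soft_err;
	return err;
}

/* The callback ignores the error so later notifiers still run. */
static int cpu_callback_sketch(int hard_fails, int soft_fails, int *soft_started)
{
	(void)watchdog_enable_sketch(hard_fails, soft_fails, soft_started);
	return NOTIFY_OK_SKETCH;
}
```

The sketch omits the "thread already exists" check and the printk calls from the real function; in the patch, those messages are the only remaining signal that a detector failed to start.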

2011-03-04 23:15:28

by Jack Stone

Subject: Re: [PATCH 1/2] watchdog, nmi: Allow hardlockup to panic by default

On 03/03/2011 20:33, Don Zickus wrote:
> Add a Kconfig option to allow users to set the hardlockup to panic
> by default. Also add in a 'nmi_watchdog=nopanic' to override this.
>
> Signed-off-by: Don Zickus <[email protected]>
>
[snip]
> diff --git a/lib/Kconfig.debug b/lib/Kconfig.debug
> index 2b97418..80bd292 100644
> --- a/lib/Kconfig.debug
> +++ b/lib/Kconfig.debug
> @@ -176,6 +176,23 @@ config HARDLOCKUP_DETECTOR
> def_bool LOCKUP_DETECTOR && PERF_EVENTS && HAVE_PERF_EVENTS_NMI && \
> !ARCH_HAS_NMI_WATCHDOG
>
> +config BOOTPARAM_HARDLOCKUP_PANIC
> + bool "Panic (Reboot) On Soft Lockups"

Did you mean Hard Lockups here?

Thanks,

Jack

2011-03-07 21:37:28

by Don Zickus

Subject: Re: [PATCH 1/2] watchdog, nmi: Allow hardlockup to panic by default

On Fri, Mar 04, 2011 at 11:15:21PM +0000, Jack Stone wrote:
> On 03/03/2011 20:33, Don Zickus wrote:
> > Add a Kconfig option to allow users to set the hardlockup to panic
> > by default. Also add in a 'nmi_watchdog=nopanic' to override this.
> >
> > Signed-off-by: Don Zickus <[email protected]>
> >
> [snip]
> > diff --git a/lib/Kconfig.debug b/lib/Kconfig.debug
> > index 2b97418..80bd292 100644
> > --- a/lib/Kconfig.debug
> > +++ b/lib/Kconfig.debug
> > @@ -176,6 +176,23 @@ config HARDLOCKUP_DETECTOR
> > def_bool LOCKUP_DETECTOR && PERF_EVENTS && HAVE_PERF_EVENTS_NMI && \
> > !ARCH_HAS_NMI_WATCHDOG
> >
> > +config BOOTPARAM_HARDLOCKUP_PANIC
> > + bool "Panic (Reboot) On Soft Lockups"
>
> Did you mean Hard Lockups here?

Yes. Thanks! :-)

Cheers,
Don