2008-10-26 14:59:48

by Bernhard Walle

Subject: [PATCH] [WATCHDOG] Fix kdump when using hpwdt

When the "hpwdt" module is loaded (even if the /dev/watchdog device is not
opened), kdump does not work. The panic kernel either does not start at
all or crashes in various places.

The problem is that hpwdt_pretimeout is registered with register_die_notifier()
with the highest possible priority. Because it returns NOTIFY_STOP, the
crash_nmi_callback which is also registered with register_die_notifier()
is never executed. This causes the shutdown of other CPUs to fail.

Reversing the order is not an option: the crash_nmi_callback executes HLT
and so never returns normally. Because of that, it must be executed as
the last notifier, which is currently the case.

So this patch returns NOTIFY_OK if allow_kdump is set as a module parameter
in the hpwdt module. It also changes the default of allow_kdump to 1. Kdump is
quite common and should work by default.
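
To illustrate the ordering problem described above, here is a minimal userspace
sketch of a priority-ordered notifier chain. It is not kernel code: every
sim_* name is hypothetical, the return-code values are illustrative, and
priority 0 for the crash callback is an assumption standing in for "runs last".
The real crash_nmi_callback executes HLT and never returns; the stand-in
simply records that it was reached.

```c
#include <assert.h>
#include <limits.h>
#include <stddef.h>

/* Illustrative stand-ins for the kernel's notifier return codes. */
enum { SIM_NOTIFY_OK = 1, SIM_NOTIFY_STOP = 2 };

struct sim_notifier {
	const char *name;
	int priority;		/* higher priority runs first */
	int (*call)(void);
};

#define SIM_MAX 8
static struct sim_notifier *sim_chain[SIM_MAX];
static size_t sim_count;

/* Insert a notifier, keeping the chain sorted by descending priority. */
static void sim_register(struct sim_notifier *nb)
{
	size_t i = sim_count;

	assert(sim_count < SIM_MAX);
	while (i > 0 && sim_chain[i - 1]->priority < nb->priority) {
		sim_chain[i] = sim_chain[i - 1];
		i--;
	}
	sim_chain[i] = nb;
	sim_count++;
}

/* Walk the chain in order; a SIM_NOTIFY_STOP result ends the walk. */
static void sim_call_chain(void)
{
	for (size_t i = 0; i < sim_count; i++) {
		if (sim_chain[i]->call() == SIM_NOTIFY_STOP)
			break;
	}
}

static int sim_allow_kdump;	/* mirrors the module parameter */
static int sim_crash_ran;	/* set once the crash callback is reached */

/* Mirrors the patched return statement in hpwdt_pretimeout(). */
static int sim_hpwdt_pretimeout(void)
{
	return sim_allow_kdump ? SIM_NOTIFY_OK : SIM_NOTIFY_STOP;
}

/* Stand-in for crash_nmi_callback: only records that it was called. */
static int sim_crash_nmi_callback(void)
{
	sim_crash_ran = 1;
	return SIM_NOTIFY_STOP;
}

static struct sim_notifier sim_hpwdt = { "hpwdt", INT_MAX, sim_hpwdt_pretimeout };
static struct sim_notifier sim_crash = { "crash", 0, sim_crash_nmi_callback };

/* Rebuild the chain and report whether the crash callback was reached. */
static int sim_crash_reached(int allow_kdump)
{
	sim_count = 0;
	sim_crash_ran = 0;
	sim_allow_kdump = allow_kdump;
	sim_register(&sim_crash);
	sim_register(&sim_hpwdt);
	sim_call_chain();
	return sim_crash_ran;
}
```

In this sketch, sim_crash_reached(0) reports that the crash callback is never
called, which corresponds to the old unconditional NOTIFY_STOP, while
sim_crash_reached(1) reports that it is called.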

Signed-off-by: Bernhard Walle <[email protected]>
Cc: Wim Van Sebroeck <[email protected]>
Cc: Thomas Mingarelli <[email protected]>
Cc: Vivek Goyal <[email protected]>
---
drivers/watchdog/hpwdt.c | 10 +++++++---
1 files changed, 7 insertions(+), 3 deletions(-)

diff --git a/drivers/watchdog/hpwdt.c b/drivers/watchdog/hpwdt.c
index a3765e0..65e7102 100644
--- a/drivers/watchdog/hpwdt.c
+++ b/drivers/watchdog/hpwdt.c
@@ -116,7 +116,7 @@ static unsigned int reload; /* the computed soft_margin */
static int nowayout = WATCHDOG_NOWAYOUT;
static char expect_release;
static unsigned long hpwdt_is_open;
-static unsigned int allow_kdump;
+static unsigned int allow_kdump = 1;

static void __iomem *pci_mem_addr; /* the PCI-memory address */
static unsigned long __iomem *hpwdt_timer_reg;
@@ -482,7 +482,11 @@ static int hpwdt_pretimeout(struct notifier_block *nb, unsigned long ulReason,
"Management Log for details.\n");
}

- return NOTIFY_STOP;
+ /*
+ * for kdump, we must return NOTIFY_OK here to execute the
+ * crash_nmi_callback afterwards, see arch/x86/kernel/crash.c
+ */
+ return allow_kdump ? NOTIFY_OK : NOTIFY_STOP;
}

/*
@@ -759,7 +763,7 @@ MODULE_ALIAS_MISCDEV(WATCHDOG_MINOR);
module_param(soft_margin, int, 0);
MODULE_PARM_DESC(soft_margin, "Watchdog timeout in seconds");

-module_param(allow_kdump, int, 0);
+module_param(allow_kdump, int, 1);
MODULE_PARM_DESC(allow_kdump, "Start a kernel dump after NMI occurs");

module_param(nowayout, int, 0);
--
1.6.0.2


2008-10-27 19:31:08

by Wim Van Sebroeck

Subject: Re: [PATCH] [WATCHDOG] Fix kdump when using hpwdt

Hi Bernhard,

Tom will have a look at the fix, but the code below is wrong:

> -module_param(allow_kdump, int, 0);
> +module_param(allow_kdump, int, 1);
> MODULE_PARM_DESC(allow_kdump, "Start a kernel dump after NMI occurs");

the syntax is: #define module_param(name, type, perm)
perm sets the visibility in sysfs: 000 means it's not there,
read bits mean it's readable, write bits mean it's writable.

perm is not the default value for the integer but a file permission attribute.
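
As a rough userspace illustration of that reading of perm (not kernel code;
struct perm_view and describe_perm are hypothetical names), the sketch below
decodes a perm value with the usual POSIX mode bits from <sys/stat.h>. Under
these assumptions, a perm of 1 is just the others-execute bit: it sets
neither a read nor a write bit, and it has nothing to do with the parameter's
default value.

```c
#include <assert.h>
#include <sys/stat.h>

/* How a perm value would be interpreted, per the description above. */
struct perm_view {
	int in_sysfs;	/* 0 when perm is 000, i.e. no sysfs file */
	int readable;	/* any read bit set */
	int writable;	/* any write bit set */
};

static struct perm_view describe_perm(unsigned int perm)
{
	struct perm_view v;

	/* Only file mode bits are meaningful here. */
	assert((perm & ~07777u) == 0);

	v.in_sysfs = perm != 0;
	v.readable = (perm & (S_IRUSR | S_IRGRP | S_IROTH)) != 0;
	v.writable = (perm & (S_IWUSR | S_IWGRP | S_IWOTH)) != 0;
	return v;
}
```

For example, describe_perm(0644) comes out as readable and writable,
describe_perm(0) as not present in sysfs, and describe_perm(1) as neither
readable nor writable. The default itself comes from the variable's
initializer, as in the first hunk of the patch.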

Kind regards,
Wim.

2008-10-27 21:52:44

by Bernhard Walle

Subject: Re: [PATCH] [WATCHDOG] Fix kdump when using hpwdt

Hi Wim,

* Wim Van Sebroeck [2008-10-27 20:30]:

> > -module_param(allow_kdump, int, 0);
> > +module_param(allow_kdump, int, 1);
> > MODULE_PARM_DESC(allow_kdump, "Start a kernel dump after NMI occurs");
>
> the syntax is: #define module_param(name, type, perm)
> perm sets the visibility in sysfs: 000 means it's not there,
> read bits mean it's readable, write bits mean it's writable.
>
> perm is not the default value for the integer but a file permission attribute.

Yeah, thanks for spotting that. I was in a bit of a hurry while creating the
patch ...

I'll come up with an updated patch with the suggestions from Vivek.
Probably tomorrow.


Regards,
Bernhard
--
Bernhard Walle, SUSE LINUX Products GmbH, Architecture Development