2008-11-23 13:15:41

by Bernhard Walle

Subject: [PATCH] [WATCHDOG] Fix kdump when using hpwdt

When the "hpwdt" module is loaded (even if the /dev/watchdog device is not
opened), kdump does not work. The panic kernel either does not start at
all or crashes in various places.

The problem is that hpwdt_pretimeout is registered with register_die_notifier()
at the highest possible priority. Because it returns NOTIFY_STOP, the
crash_nmi_callback, which is also registered with register_die_notifier(), is
never executed. This causes the shutdown of the other CPUs to fail.

Reversing the order is not an option: the crash_nmi_callback executes HLT and so
never returns normally. Because of that, it must be executed as the last
notifier, which is currently the case.

So this patch returns NOTIFY_OK so that the crash_nmi_callback still gets executed.
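
To illustrate the ordering problem described above, here is a minimal userspace
sketch of a priority-ordered notifier chain. It is not kernel code: struct
notifier, call_chain(), watchdog_handler(), crash_handler() and
crash_handler_reached() are hypothetical names made up for this illustration;
only the NOTIFY_* values are taken from include/linux/notifier.h. The real
crash_nmi_callback halts the CPU instead of returning, which the sketch
approximates by returning NOTIFY_STOP.

```c
/* Userspace sketch of a priority-ordered notifier chain; not kernel code. */
#include <assert.h>
#include <stddef.h>

/* Values mirror include/linux/notifier.h. */
#define NOTIFY_DONE      0x0000
#define NOTIFY_OK        0x0001
#define NOTIFY_STOP_MASK 0x8000
#define NOTIFY_STOP      (NOTIFY_OK | NOTIFY_STOP_MASK)

/* Hypothetical stand-in for struct notifier_block. */
struct notifier {
	int (*call)(void);
	int priority;	/* higher priority runs first */
};

static int watchdog_ret;	/* what the stand-in watchdog handler returns */
static int crash_handler_ran;	/* set once the crash handler was called */

/* Stand-in for hpwdt_pretimeout: just returns the configured value. */
static int watchdog_handler(void)
{
	return watchdog_ret;
}

/* Stand-in for crash_nmi_callback (the real one executes HLT). */
static int crash_handler(void)
{
	crash_handler_ran = 1;
	return NOTIFY_STOP;
}

/* Walk a chain that is already sorted by descending priority. */
static int call_chain(const struct notifier *chain, size_t n)
{
	int ret = NOTIFY_DONE;

	for (size_t i = 0; i < n; i++) {
		assert(chain[i].call != NULL);
		ret = chain[i].call();
		/* A handler that sets the stop bit ends the walk here. */
		if (ret & NOTIFY_STOP_MASK)
			break;
	}
	return ret;
}

/* Returns 1 if the crash handler ran for the given watchdog return value. */
static int crash_handler_reached(int wd_ret)
{
	const struct notifier chain[] = {
		{ watchdog_handler, 0x7fffffff },	/* highest priority */
		{ crash_handler, 0 },			/* runs last */
	};

	watchdog_ret = wd_ret;
	crash_handler_ran = 0;
	call_chain(chain, sizeof(chain) / sizeof(chain[0]));
	return crash_handler_ran;
}
```

With these definitions, crash_handler_reached(NOTIFY_STOP) yields 0 and
crash_handler_reached(NOTIFY_OK) yields 1, which is the difference the patch
relies on.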


Signed-off-by: Bernhard Walle <[email protected]>
Cc: Wim Van Sebroeck <[email protected]>
Cc: Thomas Mingarelli <[email protected]>
Cc: Vivek Goyal <[email protected]>
---
drivers/watchdog/hpwdt.c | 2 +-
1 files changed, 1 insertions(+), 1 deletions(-)

diff --git a/drivers/watchdog/hpwdt.c b/drivers/watchdog/hpwdt.c
index a3765e0..21fe202 100644
--- a/drivers/watchdog/hpwdt.c
+++ b/drivers/watchdog/hpwdt.c
@@ -482,7 +482,7 @@ static int hpwdt_pretimeout(struct notifier_block *nb, unsigned long ulReason,
 			"Management Log for details.\n");
 	}

-	return NOTIFY_STOP;
+	return NOTIFY_OK;
 }

 /*
--
1.6.0.2


2008-11-25 14:28:52

by Vivek Goyal

Subject: Re: [PATCH] [WATCHDOG] Fix kdump when using hpwdt

On Sun, Nov 23, 2008 at 02:15:24PM +0100, Bernhard Walle wrote:
> When the "hpwdt" module is loaded (even if the /dev/watchdog device is not
> opened), kdump does not work. The panic kernel either does not start at
> all or crashes in various places.
>
> The problem is that hpwdt_pretimeout is registered with register_die_notifier()
> at the highest possible priority. Because it returns NOTIFY_STOP, the
> crash_nmi_callback, which is also registered with register_die_notifier(), is
> never executed. This causes the shutdown of the other CPUs to fail.
>
> Reversing the order is not an option: the crash_nmi_callback executes HLT and so
> never returns normally. Because of that, it must be executed as the last
> notifier, which is currently the case.
>
> So this patch returns NOTIFY_OK so that the crash_nmi_callback still gets executed.

Hi Bernhard,

Why does this handler need to run after a crash? IOW, even if the kdump NMI
handler halts the CPU and this handler never gets a chance to run, is
that an issue?

I am getting back to the previous discussion of dropping the priority of
hpwdt. You mentioned that dropping the priority will not work because the
kdump handler halts the CPUs. But my point is that the kdump handler is
registered dynamically, only after a system crash. Does hpwdt need to run then?

The above patch as such should fix the kdump issue (assuming the handler of
this driver always returns), but I don't understand why it needs to run
after a crash.

Thanks
Vivek

>
>
> Signed-off-by: Bernhard Walle <[email protected]>
> Cc: Wim Van Sebroeck <[email protected]>
> Cc: Thomas Mingarelli <[email protected]>
> Cc: Vivek Goyal <[email protected]>
> ---
> drivers/watchdog/hpwdt.c | 2 +-
> 1 files changed, 1 insertions(+), 1 deletions(-)
>
> diff --git a/drivers/watchdog/hpwdt.c b/drivers/watchdog/hpwdt.c
> index a3765e0..21fe202 100644
> --- a/drivers/watchdog/hpwdt.c
> +++ b/drivers/watchdog/hpwdt.c
> @@ -482,7 +482,7 @@ static int hpwdt_pretimeout(struct notifier_block *nb, unsigned long ulReason,
>  			"Management Log for details.\n");
>  	}
>
> -	return NOTIFY_STOP;
> +	return NOTIFY_OK;
>  }
>
>  /*
> --
> 1.6.0.2

2008-11-25 14:32:19

by Bernhard Walle

Subject: Re: [PATCH] [WATCHDOG] Fix kdump when using hpwdt

* Vivek Goyal [2008-11-25 09:27]:
>
> On Sun, Nov 23, 2008 at 02:15:24PM +0100, Bernhard Walle wrote:
> > When the "hpwdt" module is loaded (even if the /dev/watchdog device is not
> > opened), kdump does not work. The panic kernel either does not start at
> > all or crashes in various places.
> >
> > The problem is that hpwdt_pretimeout is registered with register_die_notifier()
> > at the highest possible priority. Because it returns NOTIFY_STOP, the
> > crash_nmi_callback, which is also registered with register_die_notifier(), is
> > never executed. This causes the shutdown of the other CPUs to fail.
> >
> > Reversing the order is not an option: the crash_nmi_callback executes HLT and so
> > never returns normally. Because of that, it must be executed as the last
> > notifier, which is currently the case.
> >
> > So this patch returns NOTIFY_OK so that the crash_nmi_callback still gets executed.
>
> Hi Bernhard,
>
> > Why does this handler need to run after a crash? IOW, even if the kdump NMI
> > handler halts the CPU and this handler never gets a chance to run, is
> > that an issue?

Hi Vivek,

Because otherwise the crash kernel receives NMIs and crashes ... it just
doesn't work. The watchdog guys should be able to provide the technical
details here.


Regards,
Bernhard